You May Also Enjoy
A Bibliography Database for Machine Learning
2 minute read
Getting the correct BibTeX entry for a conference paper (e.g. one published at NeurIPS, ICML, or ICLR) is annoyingly hard: if you search for the title, you will often find a link to arXiv or to the PDF file, but not to the conference website that hosts the BibTeX entry.
How to jointly tune learning rate and weight decay for AdamW
15 minute read
TL;DR: AdamW is often considered a method that decouples weight decay from the learning rate. In this blog post, we show that this is not true for the specific way AdamW is implemented in PyTorch. We also show how to adapt the tuning strategy accordingly: when doubling the learning rate, the weight decay should be halved.
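For intuition (a minimal sketch, not the code from the post): PyTorch's AdamW applies weight decay by multiplying each parameter by 1 - lr * weight_decay, so the effective per-step decay is the product lr * weight_decay. Keeping that product fixed during a learning-rate sweep reproduces the halving rule; the helper below is hypothetical.

```python
# In PyTorch's AdamW, the decay step is p <- p * (1 - lr * weight_decay),
# so the effective decay strength per step is the product lr * weight_decay.

def coupled_weight_decay(lr: float, target: float) -> float:
    """Hypothetical helper: choose weight_decay so that
    lr * weight_decay stays at a fixed target as lr changes."""
    return target / lr

# Doubling lr halves weight_decay while lr * weight_decay stays at 1e-4.
for lr in [1e-4, 2e-4, 4e-4]:
    wd = coupled_weight_decay(lr, target=1e-4)
    print(f"lr={lr:.0e}  weight_decay={wd:.2f}  product={lr * wd:.0e}")
```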
Optimization Nuggets: Stochastic Polyak Step-size, Part 2
less than 1 minute read
Fabian Pedregosa invited me to write a joint blog post on a convergence proof for the stochastic Polyak step size (SPS).
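For context (a minimal sketch of the standard SPS update, not the proof from the post): SPS sets the step size from the current stochastic loss and gradient, gamma_t = (f_i(x_t) - f_i^*) / ||grad f_i(x_t)||^2, illustrated here on an interpolating least-squares problem where f_i^* = 0.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 5))
x_true = rng.standard_normal(5)
b = A @ x_true  # interpolating least-squares problem, so f_i^* = 0

x = np.zeros(5)
for t in range(1000):
    i = rng.integers(len(b))
    residual = A[i] @ x - b[i]
    f_i = 0.5 * residual**2            # stochastic loss f_i(x)
    grad = residual * A[i]             # its gradient
    # Stochastic Polyak step size: gamma = (f_i(x) - f_i^*) / ||grad||^2
    gamma = f_i / (grad @ grad + 1e-12)
    x = x - gamma * grad

print(np.linalg.norm(x - x_true))  # distance to the interpolating solution
```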
Solve it all and solve it fast: using numba for optimization in Python
6 minute read
When implementing optimization algorithms, we typically have to balance several competing goals.
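As a taste (a minimal sketch, not the benchmark from the post): decorating the inner loop with numba's @njit compiles plain-Python gradient descent to machine code on the first call, so the code stays readable while running far faster than interpreted Python.

```python
import numpy as np
from numba import njit

@njit
def gradient_descent(AtA, Atb, step_size, n_iter):
    """Gradient descent on f(x) = 0.5 * ||A x - b||^2, written as a
    plain Python loop and JIT-compiled by numba on the first call."""
    x = np.zeros(AtA.shape[0])
    for _ in range(n_iter):
        grad = AtA @ x - Atb          # gradient of the least-squares loss
        x = x - step_size * grad
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((200, 10))
b = rng.standard_normal(200)
x = gradient_descent(A.T @ A, A.T @ b, step_size=1e-3, n_iter=1000)
print(np.linalg.norm(A.T @ (A @ x - b)))  # gradient norm at the result
```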