Posts by Tags

Optimization Nuggets: Stochastic Polyak Step-size, Part 2

less than 1 minute read

Published: November 19, 2023

Fabian Pedregosa invited me to write a joint blog post on a convergence proof for the stochastic Polyak step size (SPS).

Decay No More

less than 1 minute read

Published: May 01, 2023

I wrote a blog post that was published in the ICLR 2023 Blog Post Track. The post is titled Decay No More and explains the details of AdamW and its weight decay mechanism. Check it out here.

How to jointly tune learning rate and weight decay for AdamW

14 minute read

Published: February 19, 2024

TL;DR: AdamW is often considered a method that decouples weight decay and learning rate. In this blog post, we show that this is not true for the specific way AdamW is implemented in PyTorch. We also show how to adapt the tuning strategy to fix this: when the learning rate is doubled, the weight decay should be halved.
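To make the TL;DR concrete, here is a minimal sketch. It assumes the decay step that PyTorch's `torch.optim.AdamW` applies to each parameter, `p <- p * (1 - lr * weight_decay)`, and it ignores the Adam gradient term entirely. The helper names are hypothetical and chosen for illustration only. Because the shrink factor depends on the product `lr * weight_decay`, doubling `lr` strengthens the decay unless `weight_decay` is halved.

```python
import math


def decay_factor(lr: float, weight_decay: float) -> float:
    """Per-step multiplicative shrink, assuming the PyTorch-style AdamW
    decay p <- p * (1 - lr * weight_decay). Hypothetical helper."""
    return 1.0 - lr * weight_decay


def weight_after_decay(steps: int, lr: float, weight_decay: float, w0: float = 1.0) -> float:
    """Value of a single weight after `steps` pure decay steps.
    The Adam gradient term is deliberately ignored in this sketch."""
    return w0 * decay_factor(lr, weight_decay) ** steps


base = weight_after_decay(1000, lr=1e-3, weight_decay=0.1)
doubled_lr = weight_after_decay(1000, lr=2e-3, weight_decay=0.1)   # decay gets stronger
compensated = weight_after_decay(1000, lr=2e-3, weight_decay=0.05)  # same lr * wd as base

print(f"base={base:.4f}  doubled_lr={doubled_lr:.4f}  compensated={compensated:.4f}")
print("base matches compensated:", math.isclose(base, compensated, rel_tol=1e-9))
```

Only the product `lr * weight_decay` enters the decay factor, so in this simplified view the effective regularization strength stays fixed when the two are scaled inversely.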

A Bibliography Database for Machine Learning

2 minute read

Published: December 16, 2024

Getting the correct BibTeX entry for a conference paper (e.g. one published at NeurIPS, ICML, or ICLR) is annoyingly hard: if you search for the title, you will often find a link to arXiv or to the PDF, but not to the conference website that has the BibTeX entry.

A collection of resources for creating open-source software packages

7 minute read

Published: January 03, 2022

Making your research code open-source, tested, and documented is quite simple nowadays. This post gives an overview of the most important steps and collects useful resources, e.g. tutorials for Read the Docs, Sphinx (Gallery), and unit testing in Python.

Solve it all and solve it fast: using numba for optimization in Python

6 minute read

Published: June 06, 2022

When implementing optimization algorithms, we typically have to balance the following goals:

Fabian Schaipp

Posts by Tags

Polyak step size

Optimization Nuggets: Stochastic Polyak Step-size, Part 2

adam

Decay No More

adamw

How to jointly tune learning rate and weight decay for AdamW

datasets

A Bibliography Database for Machine Learning

machine learning

Decay No More

open-source

A collection of resources for creating open-source software packages

optimization

Optimization Nuggets: Stochastic Polyak Step-size, Part 2

Decay No More

Solve it all and solve it fast: using numba for optimization in Python

python

Solve it all and solve it fast: using numba for optimization in Python

software

Solve it all and solve it fast: using numba for optimization in Python

A collection of resources for creating open-source software packages

weight decay

How to jointly tune learning rate and weight decay for AdamW