Seminar Cycles of the Statistical Physics Group

Fisica Statistica

Nonequilibrium Aspects of Stochastic Gradient Descent

by Prof. Alkan Kabakçıoğlu (Dept. of Physics, Koç University, İstanbul)

1/1-3 - Aula B (Dipartimento di Fisica e Astronomia - Edificio Marzolo)


Description

We use the Fokker-Planck equation to explore the dynamics under stochastic gradient descent (SGD), the most common algorithm for training artificial neural networks. SGD has close parallels to natural processes that navigate a high-dimensional parameter space, such as protein folding and evolution. However, in contrast to its biophysical analogues, it leads to a nonequilibrium stationary state exhibiting persistent currents in the space of network parameters. We find that the effective loss landscape that determines the shape of the stationary distribution depends sensitively on whether the minibatches are selected with or without replacement. We demonstrate that the stationary state satisfies the integral fluctuation theorem, a nonequilibrium generalization of the second law of thermodynamics, and finally propose a "thermalization" procedure as an efficient method to implement Bayesian machine learning.
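The distinction between minibatch selection with and without replacement can be made concrete with a small sketch. The code below is only an illustration under stated assumptions, not the speaker's setup: it uses a toy linear least-squares loss, and the names `minibatch_indices` and `sgd` are hypothetical helpers. "With replacement" is taken to mean that every sample index is drawn independently at each step, and "without replacement" to mean that the data are reshuffled once per epoch and split into disjoint batches.

```python
import numpy as np

def minibatch_indices(n_samples, batch_size, n_steps, with_replacement, rng):
    """Yield one array of sample indices per SGD step (hypothetical helper)."""
    if batch_size > n_samples:
        raise ValueError("batch_size must not exceed n_samples")
    if with_replacement:
        # Assumption: each index is drawn i.i.d., so repeats are possible.
        for _ in range(n_steps):
            yield rng.integers(0, n_samples, size=batch_size)
    else:
        # Assumption: reshuffle once per epoch, then take disjoint slices.
        produced = 0
        while produced < n_steps:
            perm = rng.permutation(n_samples)
            for start in range(0, n_samples - batch_size + 1, batch_size):
                if produced == n_steps:
                    return
                yield perm[start:start + batch_size]
                produced += 1

def sgd(X, y, lr, batch_size, n_steps, with_replacement, seed=0):
    """Run plain SGD on a mean-squared-error loss and return the weight trajectory."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    trajectory = np.empty((n_steps, X.shape[1]))
    batches = minibatch_indices(len(y), batch_size, n_steps, with_replacement, rng)
    for t, idx in enumerate(batches):
        residual = X[idx] @ w - y[idx]
        grad = X[idx].T @ residual / batch_size  # gradient of 0.5 * mean(residual**2)
        w = w - lr * grad
        trajectory[t] = w
    return trajectory
```

Running `sgd` twice on the same noisy data, once with `with_replacement=True` and once with `False`, and comparing histograms of the late part of each trajectory gives a rough empirical view of the two stationary distributions. Whether a toy model like this reproduces the sensitivity described in the abstract is not something the sketch establishes.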