We use the Fokker-Planck equation to explore the dynamics under stochastic gradient descent (SGD), the most common algorithm for training artificial neural networks. SGD has close parallels to natural processes that navigate a high-dimensional parameter space, such as protein folding and evolution. However, in contrast to its biophysical analogues, it leads to a nonequilibrium stationary state exhibiting persistent currents in the space of network parameters. We find that the effective loss landscape that determines the shape of the stationary distribution depends sensitively on whether the minibatches are selected with or without replacement. We demonstrate that the stationary state satisfies the integral fluctuation theorem, a nonequilibrium generalization of the second law of thermodynamics, and finally propose a "thermalization" procedure as an efficient method to implement Bayesian machine learning.
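For orientation, a minimal sketch of the setup summarized above, in notation introduced here (the symbols $\eta$, $B$, $D$, $\Sigma$, and $L$ are our labels, not necessarily those used in the body of the paper): treating the minibatch gradient as the full gradient plus zero-mean noise, the SGD update $\theta_{t+1} = \theta_t - \eta\,\hat{\nabla} L_B(\theta_t)$ with learning rate $\eta$ and minibatch size $B$ can be approximated by a Langevin equation with diffusion matrix $D(\theta) \propto (\eta/B)\,\Sigma(\theta)$, where $\Sigma(\theta)$ is the covariance of the minibatch gradients. The corresponding Fokker-Planck equation for the density $P(\theta,t)$ over network parameters reads
\begin{equation}
\partial_t P(\theta,t) \;=\; \nabla \cdot \Big[\, P(\theta,t)\,\nabla L(\theta) \;+\; \nabla \cdot \big( D(\theta)\, P(\theta,t) \big) \Big].
\end{equation}
Because $D(\theta)$ is in general anisotropic and parameter-dependent, the stationary solution $P_{\mathrm{ss}}(\theta)$ need not be a Boltzmann distribution in $L$ and generically supports a nonzero probability current $J_{\mathrm{ss}} = -P_{\mathrm{ss}}\nabla L - \nabla \cdot (D\,P_{\mathrm{ss}}) \neq 0$, which is the sense in which the stationary state above is out of equilibrium. The integral fluctuation theorem referred to above then takes the standard stochastic-thermodynamics form $\langle e^{-\Delta s_{\mathrm{tot}}} \rangle = 1$, where $\Delta s_{\mathrm{tot}}$ is the total entropy production along a trajectory.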