Understanding Contrastive Learning through Variational Analysis and Neural Network Optimization Perspectives
Journal: arXiv
Published Date: Mar 13, 2025
Abstract
The SimCLR method for contrastive learning of invariant visual
representations has become widely used in supervised, semi-supervised, and
unsupervised settings, due to its ability to uncover patterns and structures in
image data that are not directly present in the pixel representations. However,
the reason for this success is not well explained, since it is not guaranteed
by invariance alone. In this paper, we conduct a mathematical analysis of the
SimCLR method with the goal of better understanding the geometric properties of
the learned latent distribution. Our findings reveal two things: (1) the SimCLR
loss alone is not sufficient to select a good minimizer -- there are minimizers
that give trivial latent distributions, even when the original data is highly
clustered -- and (2) in order to understand the success of contrastive learning
methods like SimCLR, it is necessary to analyze the neural network training
dynamics induced by minimizing a contrastive learning loss. Our preliminary
analysis of a one-hidden-layer neural network shows that clustering structure
can be present for a substantial period of time during training, even if the
network eventually converges to a trivial minimizer. We also present numerical
results that confirm our theoretical predictions.
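
For readers unfamiliar with the loss the abstract refers to, the following is a minimal NumPy sketch of the NT-Xent objective as it is commonly stated for SimCLR. It is an illustration under assumptions, not necessarily the exact form analyzed in the paper: the function name nt_xent_loss, the default temperature of 0.5, and the use of raw embedding arrays in place of a trained encoder are all choices made here for the example.

```python
import numpy as np


def nt_xent_loss(z1, z2, temperature=0.5):
    """NT-Xent loss for N positive pairs (z1[i], z2[i]); illustrative sketch only."""
    n = z1.shape[0]
    z = np.concatenate([z1, z2], axis=0)              # stack both views: (2N, d)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # project onto the unit sphere
    sim = (z @ z.T) / temperature                     # temperature-scaled cosine similarity
    np.fill_diagonal(sim, -np.inf)                    # a sample is never its own negative

    # Row i's positive is the other view of the same sample.
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])

    # Numerically stable log-sum-exp over each row.
    row_max = sim.max(axis=1, keepdims=True)
    log_norm = row_max[:, 0] + np.log(np.exp(sim - row_max).sum(axis=1))

    log_prob = sim[np.arange(2 * n), pos] - log_norm
    return -log_prob.mean()


# Toy check: correctly paired views score lower than mismatched pairs.
views = np.eye(4)
aligned = nt_xent_loss(views, views)
mismatched = nt_xent_loss(views, np.roll(views, 1, axis=0))
print(aligned < mismatched)
```

The toy check only shows that the loss rewards agreement between paired views. It says nothing about which minimizer training actually reaches, which is the question the paper addresses.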