Superposition through Active Learning lens
Journal: arXiv
Published Date: Dec 5, 2024
Abstract
Superposition and Neuron Polysemanticity are important concepts in the field
of interpretability, and one might say they are the most intricately beautiful
blockers in our path to decoding the Machine Learning black box. The idea
behind this paper is to examine whether it is possible to decode Superposition
using Active Learning methods. While it seems that Superposition is an attempt
to arrange more features in a smaller space to better utilize limited
resources, it might be worth inspecting whether Superposition depends on any
other factors. This paper uses the CIFAR-10 and Tiny ImageNet image datasets
with the ResNet18 model, compares Baseline and Active Learning models, and
inspects the presence of Superposition in them across multiple criteria,
including t-SNE visualizations, cosine similarity histograms, Silhouette
Scores, and Davies-Bouldin Indexes. Contrary to our expectations, the active
learning model did not significantly outperform the baseline in terms of
feature separation and overall accuracy. This suggests that non-informative
sample selection and potential overfitting to uncertain samples may have
hindered the active learning model's ability to generalize, and that
more sophisticated approaches might be needed to decode Superposition and
potentially reduce it.
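
The abstract names three of its feature-separation criteria without defining them. The following is a minimal sketch of how pairwise cosine similarities, the Silhouette Score, and the Davies-Bouldin Index could be computed. It assumes the features are embeddings extracted from the model (for example from a late layer), given as plain lists of floats with one class label each. The function names and the toy data are hypothetical and are not taken from the paper. In practice, `silhouette_score` and `davies_bouldin_score` from `sklearn.metrics` would likely be used instead.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def pairwise_cosine(features):
    """Cosine similarity for every unordered pair; these values can be binned into a histogram."""
    return [cosine_similarity(u, v) for u, v in combinations(features, 2)]


def group_by_label(features, labels):
    """Map each label to the list of feature vectors carrying it."""
    clusters = {}
    for x, y in zip(features, labels):
        clusters.setdefault(y, []).append(x)
    return clusters


def silhouette(features, labels):
    """Mean silhouette coefficient (Euclidean); higher means better-separated classes."""
    clusters = group_by_label(features, labels)
    scores = []
    for x, y in zip(features, labels):
        own = clusters[y]
        if len(own) == 1:
            scores.append(0.0)  # convention for singleton clusters
            continue
        # a: mean distance to the other members of x's own cluster
        a = sum(math.dist(x, o) for o in own) / (len(own) - 1)
        # b: smallest mean distance from x to any other cluster
        b = min(
            sum(math.dist(x, o) for o in pts) / len(pts)
            for lab, pts in clusters.items()
            if lab != y
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


def davies_bouldin(features, labels):
    """Davies-Bouldin Index (Euclidean); lower means better-separated classes."""
    clusters = group_by_label(features, labels)
    centroids = {
        lab: [sum(col) / len(pts) for col in zip(*pts)]
        for lab, pts in clusters.items()
    }
    # scatter: mean distance of a cluster's points to its centroid
    scatter = {
        lab: sum(math.dist(p, centroids[lab]) for p in pts) / len(pts)
        for lab, pts in clusters.items()
    }
    worst_ratios = [
        max(
            (scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j])
            for j in clusters
            if j != i
        )
        for i in clusters
    ]
    return sum(worst_ratios) / len(worst_ratios)


# Toy data (hypothetical): two tight, well-separated classes in 2-D.
toy_features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
toy_labels = [0, 0, 1, 1]

toy_cosines = pairwise_cosine(toy_features)
toy_silhouette = silhouette(toy_features, toy_labels)
toy_db = davies_bouldin(toy_features, toy_labels)
```

Under this reading, features that are more entangled across classes would show up as a lower Silhouette Score, a higher Davies-Bouldin Index, and more between-class cosine similarities far from zero. Whether that entanglement reflects Superposition specifically is the paper's question, not something the sketch settles.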