Fidelity Isn't Accuracy: When Linearly Decodable Functions Fail to Match the Ground Truth
Journal: arXiv
Published Date: Jun 13, 2025
Abstract
Neural networks excel as function approximators, but their complexity often
obscures the nature of the functions they learn. In this work, we propose the
linearity score $\lambda(f)$, a simple and interpretable diagnostic that
quantifies how well a regression network's output can be mimicked by a linear
model. Defined as the $R^2$ between the network's predictions and those of a
linear surrogate trained to mimic them, $\lambda(f)$ measures the linear
decodability of the learned function. We evaluate this framework on a
synthetic dataset ($y = x \sin(x) + \epsilon$) and three real-world datasets
(Medical Insurance, Concrete, California Housing), using dataset-specific
networks and surrogates. Our findings show that while high $\lambda(f)$ scores indicate
strong linear alignment, they do not necessarily imply predictive accuracy with
respect to the ground truth. This underscores both the promise and the
limitations of using linear surrogates to understand nonlinear model behavior,
particularly in high-stakes regression tasks.
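As a minimal sketch of how $\lambda(f)$ might be computed, the Python/NumPy code below assumes the surrogate $g$ is an ordinary least-squares model with an intercept, fit on the same inputs to the network's own predictions, and takes the network's predictions as the reference in the $R^2$, i.e. $\lambda(f) = 1 - \sum_i (f(x_i) - g(x_i))^2 / \sum_i (f(x_i) - \bar{f})^2$. The network is a hypothetical stand-in (a one-hidden-layer ReLU network with random, frozen hidden weights and a ridge-trained output layer), not the architecture used in this work; the input range, noise level, and helper names (`fit_random_feature_net`, `linearity_score`, `r2_score`) are likewise illustrative assumptions.

```python
import numpy as np


def r2_score(target: np.ndarray, pred: np.ndarray) -> float:
    """Coefficient of determination R^2 of `pred` with respect to `target`."""
    ss_res = np.sum((target - pred) ** 2)
    ss_tot = np.sum((target - target.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


def add_intercept(X: np.ndarray) -> np.ndarray:
    """Append a column of ones so the linear surrogate has a bias term."""
    return np.hstack([X, np.ones((X.shape[0], 1))])


def fit_random_feature_net(X, y, n_hidden=200, ridge=1e-3, seed=0):
    """Hypothetical stand-in for a trained regression network: a one-hidden-layer
    ReLU network with random, frozen hidden weights and a ridge-trained output layer."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], n_hidden))
    b = rng.uniform(-3.0, 3.0, size=n_hidden)

    def hidden(Z):
        return np.maximum(Z @ W + b, 0.0)

    H = hidden(X)
    # Ridge solution for the output weights: (H^T H + ridge * I)^{-1} H^T y
    w_out = np.linalg.solve(H.T @ H + ridge * np.eye(n_hidden), H.T @ y)
    return lambda Z: hidden(Z) @ w_out


def linearity_score(f, X) -> float:
    """lambda(f): R^2 between the network's predictions f(X) and those of a
    linear surrogate fit by least squares to mimic f(X) (assumed OLS + intercept)."""
    f_pred = f(X)
    A = add_intercept(X)
    coef, *_ = np.linalg.lstsq(A, f_pred, rcond=None)
    g_pred = A @ coef
    return r2_score(f_pred, g_pred)


# Synthetic data y = x sin(x) + eps (range and noise scale are assumptions).
rng = np.random.default_rng(42)
X = rng.uniform(-6.0, 6.0, size=(500, 1))
y = X[:, 0] * np.sin(X[:, 0]) + rng.normal(scale=0.5, size=500)

f = fit_random_feature_net(X, y)
lam = linearity_score(f, X)  # fidelity: how well a linear model mimics f
acc = r2_score(y, f(X))      # accuracy: how well f matches the noisy labels
print(f"lambda(f) = {lam:.3f}, R^2(f, y) = {acc:.3f}")
```

Because $\lambda(f)$ compares the surrogate against $f$'s own outputs while the second score compares $f$ against the noisy labels $y$, the two can diverge. For an OLS surrogate with an intercept evaluated on its training inputs, $\lambda(f)$ always lies in $[0, 1]$.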