Asymptotic theory of in-context learning by linear attention.

Journal: Proceedings of the National Academy of Sciences of the United States of America

Abstract

Transformers have a remarkable ability to learn and execute tasks based on examples provided within the input itself, without explicit prior training. It has been argued that this capability, known as in-context learning (ICL), is a cornerstone of Transformers' success, yet questions about the necessary sample complexity, pretraining task diversity, and context length for successful ICL remain unresolved. Here, we provide a precise answer to these questions in an exactly solvable model of ICL of a linear regression task by linear attention. We derive sharp asymptotics for the learning curve in a phenomenologically rich scaling regime where the token dimension is taken to infinity; the context length and pretraining task diversity scale proportionally with the token dimension; and the number of pretraining examples scales quadratically. We demonstrate a double-descent learning curve with increasing pretraining examples, and uncover a phase transition in the model's behavior between low and high task diversity regimes: in the low diversity regime, the model tends toward memorization of training tasks, whereas in the high diversity regime, it achieves genuine ICL and generalization beyond the scope of pretrained tasks. These theoretical insights are empirically validated through experiments with both linear attention and full nonlinear Transformer architectures.
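To make the setup concrete, the following is a minimal sketch of in-context learning of linear regression with a simplified linear-attention-style readout. It assumes a merged, flattened readout matrix Gamma fit by ridge-regularized least squares over pretraining contexts; the paper's exact parameterization, scaling limits, and training procedure may differ, and the dimensions below are illustrative rather than the asymptotic regime analyzed in the paper.

```python
# Minimal sketch (not the paper's exact model): ICL of linear regression with a
# simplified linear-attention readout. Each pretraining example is a context
# {(x_i, y_i)}_{i=1..ell} with y_i = <w, x_i>, where the task vector w is drawn
# fresh for each context (drawing from a finite pool instead would model limited
# pretraining task diversity). The assumed predictor is
#   y_hat = x_q^T Gamma h,  with  h = (1/ell) * sum_i y_i x_i,
# and Gamma in R^{d x d} fit by least squares over n pretraining contexts.

import numpy as np

rng = np.random.default_rng(0)

d = 20          # token (covariate) dimension
ell = 40        # context length
n_train = 4000  # number of pretraining contexts
n_test = 500

def make_context(w):
    """Sample one regression context plus a query point for task vector w."""
    X = rng.standard_normal((ell, d)) / np.sqrt(d)
    y = X @ w
    x_q = rng.standard_normal(d) / np.sqrt(d)
    y_q = x_q @ w
    h = (y[:, None] * X).mean(axis=0)        # context summary: (1/ell) sum_i y_i x_i
    return np.outer(x_q, h).ravel(), y_q     # flattened features for the readout, target

def make_dataset(n, task_pool=None):
    feats, targets = [], []
    for _ in range(n):
        w = rng.standard_normal(d) if task_pool is None else task_pool[rng.integers(len(task_pool))]
        f, t = make_context(w)
        feats.append(f)
        targets.append(t)
    return np.array(feats), np.array(targets)

# Pretrain the flattened readout Gamma by ridge-regularized least squares.
F_train, t_train = make_dataset(n_train)
lam = 1e-3
gamma = np.linalg.solve(F_train.T @ F_train + lam * np.eye(d * d), F_train.T @ t_train)

# Evaluate ICL generalization on contexts generated from unseen task vectors.
F_test, t_test = make_dataset(n_test)
mse = np.mean((F_test @ gamma - t_test) ** 2)
print(f"test ICL mean-squared error: {mse:.4f}")
```

In this toy version, sweeping the number of pretraining contexts or replacing the fresh task vectors with a small fixed pool gives a qualitative handle on the sample-complexity and task-diversity effects the abstract describes; the paper's sharp asymptotics are derived for the proportional scaling regime, not for fixed small dimensions like these.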

Authors

  • Yue M. Lu
    John A. Paulson School of Engineering and Applied Sciences, Harvard University, Cambridge, MA 02138, USA (yuelu@seas.harvard.edu).
  • Mary Letey
    John A. Paulson School of Engineering and Applied Sciences, Harvard University, Cambridge, MA 02138, USA.
  • Jacob A. Zavatone-Veth
    Department of Physics and Center for Brain Science, Harvard University, Cambridge, MA 02138, USA (jzavatoneveth@g.harvard.edu).
  • Anindita Maiti
    Perimeter Institute for Theoretical Physics, Waterloo, ON N2L 2Y5, Canada.
  • Cengiz Pehlevan
    Janelia Research Campus, Howard Hughes Medical Institute, Ashburn, VA 20147, and Simons Center for Analysis, Simons Foundation, New York, NY 10010, USA (cpehlevan@simonsfoundation.org).
