Concise One-Layer Transformers Can Do Function Evaluation (Sometimes)
Journal:
arXiv
Published Date:
Mar 28, 2025
Abstract
While transformers have proven enormously successful in a range of tasks,
their fundamental properties as models of computation are not well understood.
This paper contributes to the study of the expressive capacity of transformers,
focusing on their ability to perform the fundamental computational task of
evaluating an arbitrary function from $[n]$ to $[n]$ at a given argument. We
prove that concise 1-layer transformers (i.e., with a polylog bound on the
product of the number of heads, the embedding dimension, and precision) are
capable of doing this task under some representations of the input, but not
when the function's inputs and values are only encoded in different input
positions. Concise 2-layer transformers can perform the task even with the more
difficult input representation. Experimentally, we find a rough alignment
between what we have proven can be computed by concise transformers and what
can be practically learned.