Exploring the Impact of Temperature on Large Language Models:Hot or Cold?
Journal:
arXiv
Published Date:
Jun 8, 2025
Abstract
The sampling temperature, a critical hyperparameter in large language models
(LLMs), modifies the logits before the softmax layer, thereby reshaping the
distribution of output tokens. Recent studies have challenged the Stochastic
Parrots analogy by demonstrating that LLMs are capable of understanding
semantics rather than merely memorizing data and that randomness, modulated by
sampling temperature, plays a crucial role in model inference. In this study,
we systematically evaluated the impact of temperature in the range of 0 to 2 on
data sets designed to assess six different capabilities, conducting statistical
analyses on open source models of three different sizes: small (1B--4B), medium
(6B--13B), and large (40B--80B). Our findings reveal distinct skill-specific
effects of temperature on model performance, highlighting the complexity of
optimal temperature selection in practical applications. To address this
challenge, we propose a BERT-based temperature selector that takes advantage of
these observed effects to identify the optimal temperature for a given prompt.
We demonstrate that this approach can significantly improve the performance of
small and medium models in the SuperGLUE datasets. Furthermore, our study
extends to FP16 precision inference, revealing that temperature effects are
consistent with those observed in 4-bit quantized models. By evaluating
temperature effects up to 4.0 in three quantized models, we find that the
Mutation Temperature -- the point at which significant performance changes
occur -- increases with model size.