Fast Autoregressive Models for Continuous Latent Generation
Journal: arXiv
Published Date: Apr 24, 2025
Abstract
Autoregressive models have demonstrated remarkable success in sequential data
generation, particularly in NLP, but their extension to continuous-domain image
generation presents significant challenges. A recent approach, the masked
autoregressive model (MAR), bypasses quantization by modeling per-token
distributions in continuous space with a diffusion head, but it suffers from
slow inference because the iterative denoising process is computationally
expensive. To address this, we propose the Fast AutoRegressive model (FAR), a
novel framework that replaces MAR's diffusion head with a lightweight shortcut
head, enabling efficient few-step sampling while preserving autoregressive
principles. Additionally, FAR seamlessly integrates with causal Transformers,
extending them from discrete to continuous token generation without requiring
architectural modifications. Experiments demonstrate that FAR achieves
$2.3\times$ faster inference than MAR while maintaining competitive FID and IS
scores. This work establishes the first efficient autoregressive paradigm for
high-fidelity continuous-space image generation, bridging the critical gap
between quality and scalability in visual autoregressive modeling.
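
The abstract does not specify how the shortcut head is parameterized, so the following is only a minimal sketch of few-step sampling under one assumption: that the head, like a generic shortcut model, is conditioned on a step size d and predicts the direction for a jump from time t to t + d. The names `toy_shortcut_head` and `sample_token`, and the conditioning vector `z`, are hypothetical and stand in for FAR's learned head and the autoregressive backbone's output.

```python
import numpy as np


def toy_shortcut_head(x, z, t, d):
    """Hypothetical stand-in for a learned shortcut head (not FAR's network).

    A trained head would be a small network taking the noisy token x, the
    conditioning vector z from the autoregressive backbone, the time t, and
    the step size d. This toy version treats z itself as the clean token
    and returns the straight-line velocity from x toward z, so d is unused.
    """
    return (z - x) / (1.0 - t)


def sample_token(head, z, num_steps, rng):
    """Draw one continuous token using num_steps head evaluations."""
    x = rng.standard_normal(z.shape)  # start from Gaussian noise at t = 0
    d = 1.0 / num_steps               # uniform step size over [0, 1]
    for i in range(num_steps):
        t = i * d
        x = x + d * head(x, z, t, d)  # jump from t to t + d
    return x


rng = np.random.default_rng(0)
z = np.array([0.5, -1.0, 2.0])  # made-up conditioning vector
one_step = sample_token(toy_shortcut_head, z, 1, rng)
four_step = sample_token(toy_shortcut_head, z, 4, rng)
```

With this toy head, both the one-step and the four-step samplers land on `z` up to floating-point rounding, because the straight-line velocity is exact. A learned head would only approximate the jump, so in practice the step count would trade sample quality against the number of head evaluations per token.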