TATAA: Programmable Mixed-Precision Transformer Acceleration with a Transformable Arithmetic Architecture

Journal: arXiv

Published Date: Nov 6, 2024

Abstract

Modern transformer-based deep neural networks present unique technical challenges for effective acceleration in real-world applications. Apart from the vast number of linear operations required by their size, modern transformer models increasingly rely on precise non-linear computations, which makes traditional low-bitwidth quantization methods and fixed-dataflow matrix accelerators ineffective for end-to-end acceleration. To address the need to accelerate both linear and non-linear operations in a unified and programmable framework, this paper introduces TATAA. TATAA employs 8-bit integer (int8) arithmetic for quantized linear layer operations through post-training quantization, while it relies on bfloat16 floating-point arithmetic to approximate the non-linear layers of a transformer model. TATAA hardware features a transformable arithmetic architecture that supports both formats at runtime with minimal overhead, enabling it to switch between a systolic array mode for int8 matrix multiplications and a SIMD mode for vectorized bfloat16 operations. An end-to-end compiler is presented to enable flexible mapping from emerging transformer models to the proposed hardware. Experimental results indicate that our mixed-precision design incurs an accuracy drop of only 0.14% to 1.16% compared with the pre-trained single-precision transformer models across a range of vision, language, and generative text applications. Our prototype implementation on the Alveo U280 FPGA currently achieves 2935.2 GOPS throughput on linear layers and a maximum of 189.5 GFLOPS for non-linear operations, outperforming related works by up to 1.45x in end-to-end throughput and 2.29x in DSP efficiency, while achieving 2.19x higher power efficiency than a modern NVIDIA RTX4090 GPU.
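The mixed-precision split described in the abstract can be illustrated in software. The following is a minimal sketch, not the TATAA hardware, compiler, or quantization scheme: it assumes symmetric per-tensor int8 quantization for a linear operation and emulates bfloat16 rounding for a softmax, one example of a non-linear layer. All function names (quantize_int8, int8_dot, to_bfloat16, softmax_bf16) are hypothetical and chosen only for illustration.

```python
import math
import struct


def quantize_int8(values):
    """Symmetric per-tensor quantization (an assumed scheme): returns (codes, scale)."""
    max_abs = max(abs(v) for v in values) or 1.0
    scale = max_abs / 127.0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def int8_dot(a_codes, a_scale, b_codes, b_scale):
    """Dot product with integer multiply-accumulate and a single final rescale."""
    acc = sum(x * y for x, y in zip(a_codes, b_codes))  # exact integer accumulation
    return acc * a_scale * b_scale


def to_bfloat16(x):
    """Round a finite float to the nearest bfloat16 value (round-to-nearest-even).

    bfloat16 keeps the upper 16 bits of an IEEE-754 float32. NaN inputs are
    not handled in this sketch.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits += 0x7FFF + ((bits >> 16) & 1)  # rounding bias, ties go to even
    bits &= 0xFFFF0000                   # drop the lower 16 mantissa bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def softmax_bf16(scores):
    """Softmax with the main intermediates rounded to bfloat16 precision."""
    m = max(scores)
    exps = [to_bfloat16(math.exp(to_bfloat16(s - m))) for s in scores]
    total = to_bfloat16(sum(exps))
    return [to_bfloat16(e / total) for e in exps]


if __name__ == "__main__":
    a = [1.0, -0.5, 0.25]
    b = [0.5, 0.5, 1.0]
    a_q, a_s = quantize_int8(a)
    b_q, b_s = quantize_int8(b)
    print("int8 dot product:", int8_dot(a_q, a_s, b_q, b_s))
    print("bfloat16 softmax:", softmax_bf16([1.0, 2.0, 3.0]))
```

In this sketch the linear part uses only integer multiply-accumulate operations, the kind of work the abstract assigns to the systolic array mode, while the softmax is evaluated elementwise in reduced-precision floating point, the kind of work it assigns to the SIMD mode.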

Authors

Jiajun Wu
Mo Song
Jingmin Zhao
Yizhao Gao
Jia Li
Hayden Kwok-Hay So

External Resources

View on arXiv (http://arxiv.org/abs/2411.03697v1)
