Why Compress What You Can Generate? When GPT-4o Generation Ushers in Image Compression Fields
Journal:
arXiv
Published Date:
Apr 30, 2025
Abstract
The rapid development of AIGC foundation models has revolutionized the
paradigm of image compression, which paves the way for the abandonment of most
pixel-level transform and coding, compelling us to ask: why compress what you
can generate if the AIGC foundation model is powerful enough to faithfully
generate intricate structure and fine-grained details from nothing more than
some compact descriptors, i.e., texts, or cues. Fortunately, recent GPT-4o
image generation of OpenAI has achieved impressive cross-modality generation,
editing, and design capabilities, which motivates us to answer the above
question by exploring its potential in image compression fields. In this work,
we investigate two typical compression paradigms: textual coding and multimodal
coding (i.e., text + extremely low-resolution image), where all/most
pixel-level information is generated instead of compressing via the advanced
GPT-4o image generation function. The essential challenge lies in how to
maintain semantic and structure consistency during the decoding process. To
overcome this, we propose a structure raster-scan prompt engineering mechanism
to transform the image into textual space, which is compressed as the condition
of GPT-4o image generation. Extensive experiments have shown that the
combination of our designed structural raster-scan prompts and GPT-4o's image
generation function achieved the impressive performance compared with recent
multimodal/generative image compression at ultra-low bitrate, further
indicating the potential of AIGC generation in image compression fields.