AstroCompress: A benchmark dataset for multi-purpose compression of astronomical data
Journal: arXiv
Published Date: Jun 10, 2025
Abstract
The site conditions that make astronomical observatories in space and on the
ground so desirable -- cold and dark -- demand a physical remoteness that leads
to limited data transmission capabilities. Such transmission limitations
directly bottleneck the amount of data acquired, and in an era of costly modern
observatories, any improvement in lossless data compression has the potential
to scale to billions of dollars' worth of additional science that can be
accomplished on the same instrument. Traditional lossless methods for
compressing astrophysical data are manually designed. Neural data compression,
on the other hand, holds the promise of learning compression algorithms
end-to-end from data and outperforming classical techniques by leveraging the
unique spatial, temporal, and wavelength structures of astronomical images.
This paper introduces AstroCompress: a neural compression challenge for
astrophysics data, featuring four new datasets (and one legacy dataset) with
16-bit unsigned integer imaging data in various modes: space-based,
ground-based, multi-wavelength, and time-series imaging. We provide code to
easily access the data and benchmark seven lossless compression methods (three
neural and four non-neural, including all practical state-of-the-art
algorithms). Our results indicate that lossless neural compression techniques
can enhance data collection at observatories, and they provide guidance on the
adoption of neural compression in scientific applications. Though the scope of
this paper is restricted to lossless
compression, we also comment on the potential exploration of lossy compression
methods in future studies.
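
To make the benchmarking idea concrete, the following is a minimal sketch of how a lossless compression ratio could be measured on 16-bit unsigned integer image data. It is not the AstroCompress code and does not use the paper's datasets or its seven benchmarked methods: the synthetic frame, the function names (synthetic_frame, pack_uint16, unpack_uint16, benchmark), and the choice of two general-purpose codecs from the Python standard library (zlib and lzma) are all assumptions made for illustration.

```python
import lzma
import random
import struct
import zlib


def synthetic_frame(width=64, height=64, seed=0):
    """Hypothetical stand-in for a 16-bit detector frame: a smooth
    background gradient plus small random noise, clipped to uint16."""
    rng = random.Random(seed)
    pixels = []
    for y in range(height):
        for x in range(width):
            background = 1000 + 2 * x + y
            noise = rng.randint(-8, 8)
            pixels.append(max(0, min(65535, background + noise)))
    return pixels


def pack_uint16(pixels):
    """Serialize pixel values as little-endian unsigned 16-bit integers."""
    return struct.pack(f"<{len(pixels)}H", *pixels)


def unpack_uint16(raw):
    """Inverse of pack_uint16."""
    return list(struct.unpack(f"<{len(raw) // 2}H", raw))


def benchmark(pixels):
    """Return {codec name: compression ratio} where the ratio is
    raw size divided by compressed size. Each codec is checked for an
    exact round trip, which is what makes the comparison lossless."""
    raw = pack_uint16(pixels)
    codecs = {
        "zlib": (zlib.compress, zlib.decompress),
        "lzma": (lzma.compress, lzma.decompress),
    }
    ratios = {}
    for name, (encode, decode) in codecs.items():
        blob = encode(raw)
        if unpack_uint16(decode(blob)) != pixels:
            raise ValueError(f"{name} round trip was not lossless")
        ratios[name] = len(raw) / len(blob)
    return ratios


if __name__ == "__main__":
    for name, ratio in benchmark(synthetic_frame()).items():
        print(f"{name}: compression ratio {ratio:.2f}")
```

The ratio reported here is the uncompressed size divided by the compressed size, so larger is better. Real astronomical frames have different noise and source statistics from this toy gradient, so the numbers it prints say nothing about the results reported in the paper.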