POLARIS: A High-contrast Polarimetric Imaging Benchmark Dataset for Exoplanetary Disk Representation Learning
Journal: arXiv
Published Date: Jun 4, 2025
Abstract
State-of-the-art high-contrast imagers (e.g., Gemini Planet Imager, VLT/SPHERE)
have collected over 1,000,000 images from more than 10,000 exposures in the
search for exoplanets. Can artificial intelligence (AI) serve as a
transformative tool for imaging Earth-like exoplanets in the coming decade? In
this paper, we introduce a benchmark and explore this question from a
polarimetric image representation learning perspective. Despite extensive
investments over the past decade, only a few new exoplanets have been directly
imaged. Existing imaging approaches rely heavily on labor-intensive labeling of
reference stars, which serve as the background used to extract circumstellar objects
(disks or exoplanets) around target stars. With our POLARIS (POlarized Light
dAta for total intensity Representation learning of direct Imaging of
exoplanetary Systems) dataset, we classify reference star and circumstellar
disk images using the full public SPHERE/IRDIS polarized-light archive since
2014, requiring less than 10 percent manual labeling. We evaluate a range of
models, including statistical, generative, and large vision-language models, and
provide baseline performance. We also propose an unsupervised generative
representation learning framework that integrates these models, achieving
superior performance and enhanced representational power. To our knowledge,
this is the first uniformly reduced, high-quality exoplanet imaging dataset, a
resource that is rare in both astrophysics and machine learning. By releasing this dataset and
baselines, we aim to equip astrophysicists with new tools and engage data
scientists in advancing direct exoplanet imaging, catalyzing major
interdisciplinary breakthroughs.