AstroLLaVA: towards the unification of astronomical data and natural language
Journal:
arXiv
Published Date:
Apr 11, 2025
Abstract
We present AstroLLaVA, a vision language model for astronomy that enables
interaction with astronomical imagery through natural dialogue. By fine-tuning
the LLaVA model on a diverse dataset of $\sim$30k images with captions and
question-answer pairs sourced from NASA's `Astronomy Picture of the Day', the
European Southern Observatory, and the NASA/ESA Hubble Space Telescope, we
create a model capable of answering open-ended questions about astronomical
concepts depicted visually. Our two-stage fine-tuning process adapts the model
to both image captioning and visual question answering in the astronomy domain.
We demonstrate AstroLLaVA's performance on an astronomical visual question
answering benchmark and release the model weights, code, and training set to
encourage further open source work in this space. Finally, we suggest a roadmap
towards general astronomical data alignment with pre-trained language models,
and provide an open space for collaboration towards this end for interested
researchers.