BioReason: Incentivizing Multimodal Biological Reasoning within a DNA-LLM Model
Journal:
arXiv
Published Date:
May 29, 2025
Abstract
Unlocking deep, interpretable biological reasoning from complex genomic data
is a major AI challenge hindering scientific discovery. Current DNA foundation
models, despite strong sequence representation, struggle with multi-step
reasoning and lack inherent transparent, biologically intuitive explanations.
We introduce BioReason, a pioneering architecture that, for the first time,
deeply integrates a DNA foundation model with a Large Language Model (LLM).
This novel connection enables the LLM to directly process and reason with
genomic information as a fundamental input, fostering a new form of multimodal
biological understanding. BioReason's sophisticated multi-step reasoning is
developed through supervised fine-tuning and targeted reinforcement learning,
guiding the system to generate logical, biologically coherent deductions. On
biological reasoning benchmarks including KEGG-based disease pathway prediction
- where accuracy improves from 88% to 97% - and variant effect prediction,
BioReason demonstrates an average 15% performance gain over strong
single-modality baselines. BioReason reasons over unseen biological entities
and articulates decision-making through interpretable, step-by-step biological
traces, offering a transformative approach for AI in biology that enables
deeper mechanistic insights and accelerates testable hypothesis generation from
genomic data. Data, code, and checkpoints are publicly available at
https://github.com/bowang-lab/BioReason