G2PDiffusion: Cross-Species Genotype-to-Phenotype Prediction via Evolutionary Diffusion
Journal: arXiv
Published Date: Feb 7, 2025
Abstract
Understanding how genes influence phenotype across species is a fundamental
challenge in genetic engineering, and progress on it would facilitate advances
in fields such as crop breeding, conservation biology, and personalized medicine.
However, current phenotype prediction models are limited to individual species
and rely on an expensive phenotype labeling process, making genotype-to-phenotype
prediction a highly domain-dependent and data-scarce problem. To this end, we
suggest taking images as morphological proxies, facilitating cross-species
generalization through large-scale multimodal pretraining. We propose the first
genotype-to-phenotype diffusion model (G2PDiffusion) that generates
morphological images from DNA considering two critical evolutionary signals,
i.e., multiple sequence alignments (MSA) and environmental contexts. The model
contains three novel components: 1) an MSA retrieval engine that identifies
conserved and co-evolutionary patterns; 2) an environment-aware MSA conditional
encoder that effectively models complex genotype-environment interactions; and
3) an adaptive phenomic alignment module to improve genotype-phenotype
consistency. Extensive experiments show that integrating evolutionary signals
with environmental context enriches the model's understanding of phenotype
variability across species, thereby offering a valuable and promising
exploration into advanced AI-assisted genomic analysis.
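
To make the idea of retrieving related sequences and reading off conserved positions concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the retrieval engine described in the abstract. The function names (`retrieve_homologs`, `column_conservation`), the k-mer Jaccard similarity used for ranking, and the entropy-based conservation score are all hypothetical stand-ins chosen for brevity. A real pipeline would likely use a dedicated homology search tool and a proper alignment step.

```python
from collections import Counter
from math import log2


def kmers(seq: str, k: int = 3) -> set[str]:
    """Return the set of overlapping k-mers in a DNA string."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def retrieve_homologs(query: str, database: dict[str, str],
                      top_n: int = 2, k: int = 3) -> list[str]:
    """Rank database entries by k-mer Jaccard similarity to the query.

    Assumption: this is a toy stand-in for real homology search.
    """
    query_kmers = kmers(query, k)

    def jaccard(name: str) -> float:
        entry_kmers = kmers(database[name], k)
        union = query_kmers | entry_kmers
        return len(query_kmers & entry_kmers) / len(union) if union else 0.0

    return sorted(database, key=jaccard, reverse=True)[:top_n]


def column_conservation(aligned: list[str]) -> list[float]:
    """Score each alignment column in [0, 1].

    The score is 1 minus the Shannon entropy of the column, normalized by
    log2(5) for the alphabet A, C, G, T, and gap. Input sequences are
    assumed to be already aligned and of equal length.
    """
    scores = []
    for column in zip(*aligned):
        counts = Counter(column)
        total = len(column)
        entropy = -sum((c / total) * log2(c / total) for c in counts.values())
        scores.append(1.0 - entropy / log2(5))
    return scores


if __name__ == "__main__":
    db = {"a": "ACGTACGT", "b": "ACGTACGA", "c": "TTTTTTTT"}
    print(retrieve_homologs("ACGTACGT", db))
    print(column_conservation(["ACGT", "ACGA", "ACGT"]))
```

In this sketch, a column where all sequences agree scores 1.0, and columns with more variation score lower. Such per-column scores are one simple way to expose conserved positions to a downstream conditional encoder.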