GraphSeqLM: A Unified Graph Language Framework for Omic Graph Learning
Journal:
arXiv
Published Date:
Dec 20, 2024
Abstract
The integration of multi-omic data is pivotal for understanding complex
diseases, but its high dimensionality and noise present significant challenges.
Graph Neural Networks (GNNs) offer a robust framework for analyzing large-scale
signaling pathways and protein-protein interaction networks, yet they face
limitations in expressivity when capturing intricate biological relationships.
To address this, we propose Graph Sequence Language Model (GraphSeqLM), a
framework that enhances GNNs with biological sequence embeddings generated by
Large Language Models (LLMs). These embeddings encode structural and biological
properties of DNA, RNA, and proteins, augmenting GNNs with enriched features
for analyzing sample-specific multi-omic data. By integrating topological,
sequence-derived, and biological information, GraphSeqLM demonstrates superior
predictive accuracy and outperforms existing methods, paving the way for more
effective multi-omic data integration in precision medicine.