CellAwareGNN: Single-Cell Enhanced Knowledge Graph Foundation Model for Drug Indication Prediction
Journal: bioRxiv
Published Date: Feb 23, 2026
Abstract
Graph foundation models have emerged as powerful tools for drug repurposing by enabling the prediction of novel drug-disease indications from large biomedical knowledge graphs. A representative example is TxGNN, which was trained on PrimeKG, a comprehensive biomedical knowledge graph covering over 17,000 diseases. Although TxGNN demonstrates strong performance, existing biomedical knowledge graphs largely lack fine-grained, cell-type-specific genomic context. This gap limits their ability to capture disease mechanisms driven by dysregulated cellular programs, such as immune cell-specific pathways in autoimmune diseases. Moreover, prior evaluations typically test only randomly selected subsets of diseases, leaving many diseases unexamined and limiting conclusions about model performance across the full disease spectrum. To address these limitations, we first update PrimeKG to PrimeKG-U by incorporating expanded and curated biomedical knowledge, and then develop TxGNN-U as a stronger graph-based baseline. Building on this foundation, we introduce CellAwareGNN, a graph foundation model that integrates single-cell genomics into PrimeKG-U. We construct a single-cell-enhanced knowledge graph, scPrimeKG, by incorporating cell-type-specific genetic associations from the OneK1K dataset, expanding PrimeKG from approximately 8.1 million edges and 129k nodes to over 14 million edges and 140k nodes. CellAwareGNN is pre-trained on all relation types in scPrimeKG and evaluated on drug indication prediction with explicit coverage of all diseases in the knowledge graph. CellAwareGNN consistently outperforms TxGNN and TxGNN-U. For drug indication prediction, CellAwareGNN achieves an AUPRC of 0.826, a relative improvement of 1.2% over TxGNN-U (0.816) and 3.4% over TxGNN (0.799). Notably, for autoimmune diseases, CellAwareGNN attains an AUPRC of 0.864, a relative improvement of 2.0% over TxGNN-U (0.847) and 6.0% over TxGNN (0.815).
Importantly, CellAwareGNN prioritizes promising repurposing candidates, including Ocrelizumab for Pemphigus via CD20-expressing B cells, Methotrexate for Pemphigus through DHFR and ATIC activity in T and B cells, and Rosiglitazone for Rheumatoid Arthritis through PPAR-γ activation. These results demonstrate the value of incorporating cell-type-specific genomic context to improve both predictive performance and biological interpretability in graph-based drug repurposing.
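The percentage gains quoted in the abstract are consistent with relative improvements in AUPRC (gain divided by the baseline score) rather than absolute differences. The following is a minimal sketch, assuming that reading, which recomputes the percentages from the reported scores. The names `REPORTED`, `relative_gain_pct`, and `summarize` are illustrative and do not come from the paper or its code.

```python
# AUPRC values exactly as reported in the abstract.
REPORTED = {
    "all_diseases": {"CellAwareGNN": 0.826, "TxGNN-U": 0.816, "TxGNN": 0.799},
    "autoimmune": {"CellAwareGNN": 0.864, "TxGNN-U": 0.847, "TxGNN": 0.815},
}


def relative_gain_pct(new: float, baseline: float) -> float:
    """Relative improvement of `new` over `baseline`, in percent.

    Assumption: the abstract's percentages are relative gains,
    i.e. 100 * (new - baseline) / baseline.
    """
    return 100.0 * (new - baseline) / baseline


def summarize(scores: dict) -> dict:
    """Gain of CellAwareGNN over every other model, rounded to one decimal."""
    model_score = scores["CellAwareGNN"]
    return {
        name: round(relative_gain_pct(model_score, value), 1)
        for name, value in scores.items()
        if name != "CellAwareGNN"
    }


if __name__ == "__main__":
    for subset, scores in REPORTED.items():
        print(subset, summarize(scores))
```

Under this assumption the recomputed gains match the stated 1.2%, 3.4%, 2.0%, and 6.0%. The corresponding absolute differences are 0.010, 0.027, 0.017, and 0.049 AUPRC.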