GENPHIRE: Enhancing Disease Risk Prediction Using Large Language Model

Journal: medRxiv

Published Date: Jan 1, 2025

Abstract

Estimating an individual’s liability to a disease is a fundamental problem in genome research. By exploiting findings from genome-wide association studies (GWASs), many powerful polygenic risk scores (PRSs) have been developed to predict disease risk from an individual’s genetic profile. Despite much success, the performance of PRS models is hindered by their inability to capture complex, nonlinear effects and interactions among variants. In this study, we introduce GENPHIRE, or Genetic–Phenotypic Representation, a novel machine learning framework designed for disease risk prediction. The central idea of GENPHIRE is to translate an individual’s genotype profile into a “sentence” consisting of basic clinical information together with an ordered list of the top phenotypes for which the individual is found to carry an elevated number of risk alleles. After translation, the sentence is converted to an embedding vector by a pre-trained large language model (LLM), and this vector is used to assess the individual’s disease risk. We tested GENPHIRE using UK Biobank data across a broad range of diseases and found that it outperforms state-of-the-art PRS models more than 80% of the time. Our results demonstrate that LLM-derived embeddings can be leveraged for disease risk prediction when an individual’s genotype profile is effectively represented. Our findings highlight a promising alternative strategy that complements existing PRS approaches.
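
The translation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Individual` class, its field names, the sentence template, and the rule of ranking phenotypes by raw risk-allele count are all assumptions made for illustration, since the abstract does not specify them. The subsequent step, in which a pre-trained LLM turns the sentence into an embedding vector, is not shown.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Individual:
    """Hypothetical container for one person's data (not from the paper)."""
    age: int
    sex: str
    # Assumed input: phenotype name -> number of risk alleles carried.
    risk_allele_counts: Dict[str, int]


def genotype_to_sentence(person: Individual, top_k: int = 3) -> str:
    """Build a text 'sentence' from clinical info and top-ranked phenotypes.

    Assumption: phenotypes are ordered by descending risk-allele count,
    with ties broken alphabetically so the output is deterministic.
    """
    ranked = sorted(
        person.risk_allele_counts.items(),
        key=lambda item: (-item[1], item[0]),
    )
    top_phenotypes = [name for name, _ in ranked[:top_k]]
    return (
        f"A {person.age}-year-old {person.sex} with elevated genetic risk for: "
        f"{', '.join(top_phenotypes)}."
    )


# Made-up example values, used only to show the shape of the output.
example = Individual(
    age=54,
    sex="female",
    risk_allele_counts={
        "type 2 diabetes": 12,
        "hypertension": 18,
        "asthma": 5,
        "coronary artery disease": 12,
    },
)
sentence = genotype_to_sentence(example)
print(sentence)
```

In a full pipeline, a sentence like this would be passed to an embedding model and the resulting vector fed to a downstream risk classifier; both choices are left open here because the abstract does not name them.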

Authors

Danwei Yao; Chang Liu; Shifan Yan; Jiayi Zhang; Yan V. Sun; Zhaohui S. Qin

