Large language models identify causal genes in complex trait GWAS

Journal: medRxiv

Published Date: Jan 1, 2025

Abstract

Identifying causal genes at genome-wide association study (GWAS) loci remains a major challenge. Literature evidence for disease-gene co-occurrence, whether through automated approaches or human expert annotation, is one way of nominating causal genes at GWAS loci. However, current automated approaches are limited in accuracy and generalizability, and expert annotation is not scalable to hundreds of thousands of significant findings. Here, we demonstrate that large language models (LLMs) can accurately prioritize likely causal genes at GWAS loci. We rigorously evaluated several widely available general-purpose LLMs using a benchmark of high-confidence causal gene annotations, including a novel set of 26 previously unpublished GWAS. Our results show that LLMs outperform current state-of-the-art methods and substantially augment their performance. These findings establish LLMs as a powerful, efficient, and scalable approach to causal gene discovery.

Authors

Suyash S. Shringarpure; Wei Wang; Sotiris Karagounis; Xin Wang; Anna C. Reisetter; Adam Auton; Aly A. Khan

