Corpus-wide causality: Algorithm design & application for aggregating gene-disease causal evidence

Journal: bioRxiv
Published Date:

Abstract

Identifying causal relationships, rather than mere associations, is essential for applications such as finding the genes that drive diseases and guiding drug discovery towards disease mechanisms rather than symptom management. Although many studies extract biomedical relations from large literature corpora such as PubMed, fewer focus on extracting causal relations from abstracts, and fewer still summarize corpus-level evidence for causal links. Large Language Models (LLMs) are increasingly used for biomedical summarization and relation extraction, but explicit benchmarks comparing generalized LLMs against specialized, domain-aware methods for corpus-wide causal inference are lacking. We develop a method to infer a Corpus-Wide Causal Score (CWCS) for gene-disease (G-D) pairs by integrating two evidence sources: (i) network-based causal signals in a prior gene regulatory network, quantified as a CWCS-Net score using an existing centrality algorithm; and (ii) corpus-wide literature evidence, quantified as a CWCS-TD score using a newly developed Truth Discovery (TD) algorithm. The CWCS-TD algorithm jointly and iteratively estimates causal scores for multiple G-D pairs while modeling the reliabilities of the relevant abstracts, and it advances the field of TD by incorporating bibliometric features to address sparse causal evidence. Using OMIM (Online Mendelian Inheritance in Man) as an expert-curated reference to evaluate the classification of G-D pairs as causal or not across ten diseases, CWCS outperformed the tested LLMs, GPT-4o and MMed-Llama 3 (e.g., an F1 score of 0.600 for CWCS versus 0.505-0.522 for the LLMs, which exhibit high recall but relatively low precision). Together, these evaluations and ablation studies demonstrate CWCS's potential for integrating network- and literature-based evidence to infer biomedical causal relations.
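
To make the idea of jointly estimating causal scores and abstract reliabilities more concrete, the following is a minimal sketch of a generic truth discovery loop. It is not the CWCS-TD algorithm itself: it omits the bibliometric features and the network-based CWCS-Net component, and the function name `truth_discovery`, the input format, and the reliability update (one minus mean absolute error) are assumptions chosen purely for illustration.

```python
from collections import defaultdict


def truth_discovery(claims, n_iter=20, eps=1e-6):
    """Generic truth discovery sketch (hypothetical, not CWCS-TD).

    claims: list of (abstract_id, gd_pair, value) with value in [0, 1],
            where 1 means the abstract asserts a causal link.
    Returns (score per G-D pair, reliability per abstract).
    """
    sources = {s for s, _, _ in claims}
    reliability = {s: 1.0 for s in sources}  # start with equal trust
    score = {}
    for _ in range(n_iter):
        # Step 1: score each pair as the reliability-weighted mean of its claims.
        num, den = defaultdict(float), defaultdict(float)
        for s, pair, v in claims:
            num[pair] += reliability[s] * v
            den[pair] += reliability[s]
        score = {pair: num[pair] / den[pair] for pair in num}
        # Step 2: an abstract is reliable if its claims agree with current scores.
        err, cnt = defaultdict(float), defaultdict(int)
        for s, pair, v in claims:
            err[s] += abs(v - score[pair])
            cnt[s] += 1
        reliability = {s: max(eps, 1.0 - err[s] / cnt[s]) for s in sources}
    return score, reliability


# Toy input (made up): two abstracts agree with each other, one disagrees.
toy_claims = [
    ("A1", ("G1", "D1"), 1.0), ("A1", ("G2", "D1"), 0.0),
    ("A2", ("G1", "D1"), 1.0), ("A2", ("G2", "D1"), 0.0),
    ("A3", ("G1", "D1"), 0.0), ("A3", ("G2", "D1"), 1.0),
]
score, reliability = truth_discovery(toy_claims)
```

In this toy input the two agreeing abstracts end up with higher reliability than the dissenting one, and the pair they support receives the higher score. With sparse evidence (few abstracts per pair) such agreement-based updates become unstable, which is the gap the abstract says bibliometric features are meant to address.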

Authors

  • Bansal, N.
  • Parsodkar, A. P.
  • Pathak, A.
  • Narayanan, M.

Categories