Accelerating Causal Network Discovery of Alzheimer Disease Biomarkers via Scientific Literature-based Retrieval Augmented Generation
Journal: arXiv
Published Date: Apr 1, 2025
Abstract
The causal relationships between biomarkers are essential for disease
diagnosis and medical treatment planning. One notable application is
Alzheimer's disease (AD) diagnosis, where certain biomarkers may influence the
presence of others, enabling early detection, precise disease staging, targeted
treatments, and improved monitoring of disease progression. However,
understanding these causal relationships is complex and requires extensive
research. Constructing a comprehensive causal network of biomarkers demands
significant effort from human experts, who must analyze a vast number of
research papers and may be biased in their understanding of disease biomarkers
and their relations. This raises an important question: Can advanced large language models
(LLMs), such as those utilizing retrieval-augmented generation (RAG), assist in
building causal networks of biomarkers for further medical analysis? To explore
this, we collected 200 AD-related research papers published over the past 25
years and then integrated scientific literature with RAG to extract AD
biomarkers and generate causal relations among them. Given the high-stakes nature
of medical diagnosis, we applied uncertainty estimation to assess the
reliability of the generated causal edges and examined the faithfulness and
scientificness of LLM reasoning using both automatic and human evaluation. We
find that RAG helps LLMs generate more accurate causal
networks from scientific papers. However, the overall performance of LLMs in
identifying causal relations of AD biomarkers is still limited. We hope this
study will inspire further foundational research on AI-driven causal network
discovery for AD biomarkers.