Clinical Trial Eligibility Criteria Decomposition and Parsing with Large Language Models.

Journal: Studies in health technology and informatics
Published Date:

Abstract

Clinical trial eligibility criteria are often written as complex free text, which makes them difficult to process automatically. This study introduces a Decomposition and Parsing (DP) workflow that systematically breaks criteria down into "study traits" (the smallest meaningful units) and structures each trait with components such as entities, modifiers, constraints, and negations. Using large language models (LLMs) such as GPT-4o and Llama3.3 with Chain-of-Thought prompting, the workflow processes Alzheimer's disease trial datasets and achieves strong performance on tasks such as logical relationship extraction and trait computability determination. Challenges remain, however, in capturing nuanced elements such as modifiers. The study also proposes new evaluation metrics that outperform traditional approaches in assessing the quality of automated extractions. This scalable and intuitive framework advances the representation of clinical trial eligibility criteria, supports improved biomedical informatics applications, and highlights the need for domain-specific fine-tuning and broader dataset integration.
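
To make the idea of a "study trait" more concrete, the following is a minimal illustrative sketch in Python of how a decomposed criterion might be represented. It is not the authors' schema or implementation: the class names (`StudyTrait`, `DecomposedCriterion`), the field names, the `describe` helper, and the example criterion are all assumptions chosen for illustration, and the decomposition shown is written by hand rather than produced by an LLM.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class StudyTrait:
    """Hypothetical container for one study trait (smallest meaningful unit)."""
    entity: str                                          # core clinical concept
    modifiers: List[str] = field(default_factory=list)   # qualifiers on the entity
    constraint: Optional[str] = None                     # value or temporal restriction
    negated: bool = False                                # True if the trait is negated


@dataclass
class DecomposedCriterion:
    """Hypothetical container for one criterion split into traits."""
    text: str                                            # original free-text criterion
    logic: str                                           # logical relationship, e.g. "AND" or "OR"
    traits: List[StudyTrait] = field(default_factory=list)


def describe(trait: StudyTrait) -> str:
    """Render a trait as a compact, human-readable string."""
    parts = []
    if trait.negated:
        parts.append("NOT")
    parts.extend(trait.modifiers)
    parts.append(trait.entity)
    if trait.constraint:
        parts.append(f"[{trait.constraint}]")
    return " ".join(parts)


# Hand-written decomposition of an invented example criterion.
example = DecomposedCriterion(
    text="Age 50 to 85 years and no history of stroke within the past 6 months",
    logic="AND",
    traits=[
        StudyTrait(entity="age", constraint="50 to 85 years"),
        StudyTrait(
            entity="stroke",
            modifiers=["history of"],
            constraint="within the past 6 months",
            negated=True,
        ),
    ],
)

# Join the rendered traits with the criterion-level logical operator.
summary = f" {example.logic} ".join(describe(t) for t in example.traits)
print(summary)
```

In this sketch the logical relationship is stored once per criterion, while negation is stored per trait, so a negated trait can sit inside a conjunctive criterion without changing the criterion-level operator.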

Authors

  • Hongyu Chen
    Key Laboratory of Chemical Biology and Traditional Chinese Medicine Research (Ministry of Education), College of Chemistry and Chemical Engineering, Hunan Normal University, Changsha, 410081, China.
  • Lingfei Qian
    Department of Biomedical Informatics and Data Science, School of Medicine, Yale University, New Haven, CT 06510, United States.
  • Xing He
    University of Florida, Gainesville, Florida, USA.
  • Aokun Chen
    Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, USA; Cancer Informatics Shared Resource, University of Florida Health Cancer Center, Gainesville, FL, USA.
  • Yu Huang
    School of Data Science and Software Engineering, Qingdao University, Qingdao 266021, China.
  • Qinling Gou
    Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, USA.
  • Yuxuan Wang
    Department of Maxillofacial and Otorhinolaryngology Oncology, Tianjin's Clinical Research Center for Cancer, National Clinical Research Center for Cancer, Tianjin Medical University Cancer Institute and Hospital, Tianjin Medical University, Tianjin, China.
  • Yan Wang
    College of Animal Science and Technology, Beijing University of Agriculture, Beijing, China.
  • Xuguang Ai
    Department of Biomedical Informatics and Data Science, Yale School of Medicine, Yale University, New Haven, CT, USA.
  • Yujia Zhou
    Department of Biomedical Informatics and Data Science, School of Medicine, Yale University, New Haven, CT 06510, United States.
  • Inessa Cohen
    Department of Emergency Medicine, Yale School of Medicine, New Haven, Connecticut, USA.
  • Qingyu Chen
    Department of Biomedical Informatics and Data Science, Yale School of Medicine, Yale University, New Haven, CT, USA.
  • Hua Xu
    Department of Urology, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China.
  • Jiang Bian
    Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, Florida, United States of America.