Pandemic-Potential Viruses are a Blind Spot for Frontier Open-Source LLMs

Journal: medRxiv
Published Date:

Abstract

We study large language models (LLMs) for front-line, pre-diagnostic infectious-disease triage, a critically understudied stage in clinical interventions, public health, and biothreat containment. We focus specifically on the operational decision of classifying symptomatic cases as viral vs. non-viral at first clinical contact, a key decision point for resource allocation, quarantine strategy, and antibiotic use. In collaboration with multiple healthcare clinics in Nigeria, we create a benchmark dataset of first-encounter cases that captures high-risk viral presentations in low-resource settings with limited data. Our evaluations of frontier open-source LLMs reveal that (1) LLMs underperform standard tabular models and (2) case summaries and Retrieval-Augmented Generation (RAG) yield only modest gains, suggesting that naïve information enrichment is insufficient in this setting. To address this, we show that models aligned with Group Relative Policy Optimization (GRPO) using a triage-oriented reward consistently improve over baseline performance. Our results highlight persistent failure modes of general-purpose LLMs in pre-diagnostic triage and demonstrate how targeted reward-based alignment can help close this gap.
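For readers unfamiliar with GRPO, the sketch below illustrates its core idea under stated assumptions: several answers are sampled for the same case, each is scored by a reward, and each answer's advantage is its reward minus the group mean, divided by the group standard deviation. The reward function `triage_reward` and its `missed_viral_penalty` parameter are hypothetical illustrations, not the authors' triage-oriented reward; they simply assume that missing a viral case should be penalized more than other errors. This uses the population standard deviation; other implementations may differ.

```python
import statistics


def triage_reward(prediction: str, label: str, missed_viral_penalty: float = 1.0) -> float:
    """Hypothetical triage-oriented reward (an assumption, not the paper's definition).

    Correct viral/non-viral call -> 1.0
    Viral case called non-viral (missed viral case) -> -missed_viral_penalty
    Any other wrong or unparseable answer -> 0.0
    """
    pred = prediction.strip().lower()
    if pred == label:
        return 1.0
    if label == "viral" and pred == "non-viral":
        return -missed_viral_penalty
    return 0.0


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantages: (reward - group mean) / (group std + eps)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std over the sampled group
    return [(r - mean) / (std + eps) for r in rewards]


# One case whose true label is "viral", with a group of four sampled model answers.
samples = ["viral", "non-viral", "Viral", "unsure"]
rewards = [triage_reward(s, "viral") for s in samples]
advantages = group_relative_advantages(rewards)
print(rewards)
print([round(a, 3) for a in advantages])
```

Answers that beat the group average receive positive advantages and are reinforced, while the missed viral case receives the most negative advantage; because advantages are computed relative to the group, no separate value model is needed.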

Authors

  • Laura Luebbert; Yasha Ektefaie; Arya S. Rao; Colby Wilkason; Dolo Nosamiefan; Olivia Achonduh-Atijegbe; Harouna Soumare; Adefoye Precious Adebayo; Olufemi Olulaja; Judith Amadi; Nicholas Oyejide; Funmilayo Olayiwola; Etim Henshaw; Yusuf Okocha; Nkechinyere Nwachukwu; Elechi Friday Ewah; Sylvanus Okoro; Ebenezer Nwakpakpa; Peter Okokhere; Kelly Iraoyah; Joseph Okoeguale; Ireti Dada; Andy Burris; Karlie Zhao; Ellory Laning; Chase van Amburg; Paul Cronan; Ben Fry; Christian Happi; Al Ozonoff; Pardis C. Sabeti