Integrating etiological insights with machine learning for precision diagnosis of obstructive jaundice: Findings from a high-volume center

Journal: medRxiv
Published Date:

Abstract

Large-scale cohort studies exploring the etiology of obstructive jaundice (OJ) are scarce, and current serum-based diagnostic markers offer suboptimal performance. This study leverages the largest retrospective cohort of OJ patients to date to investigate the disease spectrum of OJ and to develop a novel diagnostic system.

This study involves two retrospective observational cohorts. The biliary surgery cohort (BS cohort, n=349) was used for initial data exploration and for external validation of the machine learning (ML) models. The large general cohort (LG cohort, n=5726) enabled an in-depth analysis of etiologies and the determination of relevant diagnostic indicators, and supported ML model development. Interpretable ML techniques were employed to derive insights from the models.

The LG cohort revealed a diverse disease spectrum of OJ, with cholangiocarcinoma (10.39% distal, 10.01% perihilar, 5.59% intrahepatic), pancreatic adenocarcinoma (19.11%), and common bile duct stones (18.27%) as the leading causes. Traditional serum markers such as CA 19-9 and CEA lacked standalone diagnostic accuracy. Two ML-based models (collectively termed the MOLT model) were developed: a binary classifier to differentiate benign from malignant causes (AUROC=0.862) and a multi-class model to further stratify malignant and benign diseases (ACC=0.777). Interpretable ML tools clarified which features were critical to the predictions, offering actionable insights and making the decision process more transparent.

This study elucidates the etiological spectrum of OJ and provides a practical, interpretable ML-based diagnostic tool. By leveraging large-scale clinical data, the model provides a rapid and reliable primary assessment for patients with OJ, enabling clinicians to identify potential etiologies and guide further diagnostic workup.

Large-scale cohort studies and practical diagnostic models for identifying the etiology of OJ are currently lacking.

This study is the largest cohort study of OJ to date and delineates the spectrum of diseases associated with this condition. Interpretable ML models based on common clinical laboratory tests were developed, collectively termed the MOLT model. The MOLT model not only distinguishes between benign and malignant obstructions, but also further differentiates between calculous benign lesions, non-calculous benign lesions, metastatic malignancies, pancreato-biliary malignancies, and other types of malignancies.

These findings can support the identification of the underlying etiology of OJ in primary clinical settings, helping clinicians make well-informed decisions.

This is a retrospective cohort study, preregistered with the Open Science Framework (registration DOI: https://doi.org/10.17605/OSF.IO/DC4B8).
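The abstract reports two performance metrics: AUROC for the benign-versus-malignant classifier and ACC (accuracy) for the multi-class model. The following is a minimal sketch of how these two metrics are conventionally computed, using only the Python standard library. The function names, labels, and scores are hypothetical toy values chosen for illustration; they are not data or code from the study, and the study's own evaluation pipeline may differ.

```python
from typing import Hashable, Sequence


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the probability that a randomly chosen positive case
    (1 = malignant) scores higher than a randomly chosen negative case
    (0 = benign). Tied scores count as half a win."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Fraction of cases whose predicted class equals the true class."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy binary example (made-up values): 1 = malignant, 0 = benign.
binary_labels = [1, 1, 1, 0, 0, 0]
binary_scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
binary_auroc = auroc(binary_labels, binary_scores)

# Toy multi-class example (made-up values) using the five categories
# named in the abstract.
true_classes = [
    "calculous benign",
    "non-calculous benign",
    "metastatic malignancy",
    "pancreato-biliary malignancy",
    "other malignancy",
]
pred_classes = [
    "calculous benign",
    "non-calculous benign",
    "pancreato-biliary malignancy",  # one misclassified case
    "pancreato-biliary malignancy",
    "other malignancy",
]
multiclass_acc = accuracy(true_classes, pred_classes)

print(round(binary_auroc, 3), multiclass_acc)
```

The pairwise AUROC loop is quadratic in the number of cases, which is fine for illustration; for a cohort of several thousand patients a rank-based implementation from an established statistics library would be the more practical choice.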

Authors

  • Ningyuan Wen; Yaoqun Wang; Xianze Xiong; Jianrong Xu; Shaofeng Wang; Yuan Tian; Di Zeng; Xingyu Pu; Geng Liu; Bei Li; Jiong Lu; Nansheng Cheng