Machine learning models for tuberculous pleural effusion diagnosis in Africa setting

Journal: medRxiv
Published Date:

Abstract

Traditional diagnostic methods for tuberculous pleural effusion (TPE) are often limited by their invasiveness, low sensitivity, and poor accessibility. This study aims to develop and evaluate machine learning (ML) models for diagnosing TPE in African settings using readily available clinical and laboratory data. A cross-sectional study carried out in Yaoundé, Cameroon (2018-2023) included patients with non-purulent exudative pleural effusion. Pleural fluid was analysed for total protein, lactate dehydrogenase (LDH), glucose, C-reactive protein (CRP), and cytology. TPE diagnosis relied on detection of tuberculous bacilli or tuberculous granuloma. Five ML models, namely Random Forest (RF), XGBoost, Logistic Regression (LR), Support Vector Machine (SVM), and Multilayer Perceptron (MLP), were tested for binary classification (TPE vs non-TPE) in Python. Model performance was evaluated using the area under the receiver operating characteristic curve (AUC), F1 score, accuracy, sensitivity, and precision. Of the 302 participants included, 175 (57.9%) were male, and the median age (interquartile range) was 46 (34-61) years. Overall, 58.9% of participants had TPE, 15.9% had pleural metastasis, and 25.2% had other causes of pleural effusion. Patients with TPE were younger, more often male, and had a higher prevalence of HIV infection. They also had higher pleural protein and CRP levels. The RF model showed the best overall performance, with an AUC of 0.846 and an F1 score of 0.811 in the testing sample. Sensitivity was highest for MLP (0.944) and precision was highest for LR (0.806). Key predictors identified by the RF model were pleural CRP levels, age, pleural LDH levels, body mass index, and pleural protein levels. In conclusion, the RF model performed best overall, while MLP and LR achieved the highest sensitivity and precision, respectively. These models can be used to improve the diagnosis of TPE in African settings.
Diagnosis of tuberculous pleural effusion (TPE) is invasive and lacks sensitivity in resource-limited African settings, and few ML models are available for it. This study developed and evaluated five ML models for TPE diagnosis using clinical and pleural fluid data from an African setting. The Random Forest model outperformed the others in diagnosing TPE, with pleural CRP levels, age, pleural LDH levels, body mass index, and pleural protein levels as its key predictors. Machine learning models can enhance TPE diagnosis in Africa using accessible and reliable biomarkers.
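
For readers unfamiliar with the evaluation metrics named in the abstract (AUC, F1 score, accuracy, sensitivity, and precision), the following is a minimal sketch of how they can be computed for a binary TPE vs non-TPE classifier. It is not the authors' code: the function names, the 0.5 decision threshold, and the eight toy labels and scores are hypothetical and chosen only for illustration. They are not study data.

```python
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Sensitivity, precision, accuracy and F1 for binary labels (1 = TPE, 0 = non-TPE)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    # Sensitivity (recall): share of true TPE cases the model detects.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # Precision: share of predicted TPE cases that are truly TPE.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    accuracy = (tp + tn) / len(y_true)
    # F1: harmonic mean of precision and sensitivity.
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "precision": precision,
        "accuracy": accuracy,
        "f1": f1,
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win (the Mann-Whitney formulation of the AUC).
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy example (NOT study data): 4 TPE and 4 non-TPE cases.
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]
# Assumed decision threshold of 0.5 to turn scores into class predictions.
toy_preds = [1 if s >= 0.5 else 0 for s in toy_scores]

print(classification_metrics(toy_labels, toy_preds))
print(roc_auc(toy_labels, toy_scores))
```

Sensitivity, precision, accuracy, and F1 depend on the chosen threshold, whereas the AUC summarises ranking quality across all thresholds. This is one reason different models can lead on different metrics, as reported above for RF, MLP, and LR.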

Authors

  • Eric Walter Pefura-Yone; Adamou Dodo Balkissou; Laurent-Mireille Endale-Mangamba; Jodie Bane; Massongo Massongo; Marie Elisabeth Ngah Komo; Virginie Poka-Mayap; Alain Kuaban; Abdou Wouliyou Nsounfon; Djenabou Amadou; Arnaud Laurel Ntyo’o’ Nkoumou; Paul Ledoux Tendap-Ndam; Marie Josiane Ntsama-Essomba; Marie Christine Ekongolo; Christian Mbobara-Yapele; Vicky Jocelyne Ama-Moor