Using Machine Learning Algorithms to Predict Hepatitis B Surface Antigen Seroclearance.

Journal: Computational and mathematical methods in medicine
PMID:

Abstract

Hepatitis B surface antigen (HBsAg) seroclearance during treatment is associated with a better prognosis among patients with chronic hepatitis B (CHB). Significant gaps remain in our understanding of how to predict HBsAg seroclearance accurately and efficiently from obtainable clinical information. This study aimed to identify the optimal model for predicting HBsAg seroclearance. We obtained laboratory and demographic information for 2,235 patients with CHB from the South China Hepatitis Monitoring and Administration (SCHEMA) cohort. HBsAg seroclearance occurred in 106 of these patients. We developed models based on four algorithms: extreme gradient boosting (XGBoost), random forest (RF), decision tree (DCT), and logistic regression (LR). The optimal model was identified by comparing the area under the receiver operating characteristic curve (AUC). The AUCs for the XGBoost, RF, DCT, and LR models were 0.891, 0.829, 0.619, and 0.680, respectively, with XGBoost showing the best predictive performance. The variable importance plot of the XGBoost model indicated that the HBsAg level was the most important predictor, followed by age and the hepatitis B virus (HBV) DNA level. These results show the potential of machine learning algorithms, XGBoost in particular, for predicting HBsAg seroclearance from obtainable clinical data.
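
The model comparison described in the abstract, ranking candidate models by AUC and keeping the highest, can be illustrated with a minimal sketch. This is not the authors' code: the function names (roc_auc, best_model_by_auc), the model labels, and the toy labels and scores are all hypothetical and made up for illustration. The AUC is computed here from its pairwise definition, the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half.

```python
from typing import Dict, Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC via its pairwise (Mann-Whitney) definition; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def best_model_by_auc(
    labels: Sequence[int], model_scores: Dict[str, Sequence[float]]
) -> Tuple[str, Dict[str, float]]:
    """Return the name of the model with the highest AUC, plus all AUCs."""
    aucs = {name: roc_auc(labels, s) for name, s in model_scores.items()}
    best = max(aucs, key=aucs.get)
    return best, aucs


# Hypothetical toy data (NOT from the SCHEMA cohort): 1 = seroclearance.
toy_labels = [0, 0, 1, 0, 1, 0, 0, 1]
toy_scores = {
    "model_a": [0.1, 0.3, 0.8, 0.2, 0.7, 0.4, 0.05, 0.9],
    "model_b": [0.5, 0.6, 0.4, 0.2, 0.7, 0.8, 0.1, 0.3],
}
best_name, toy_aucs = best_model_by_auc(toy_labels, toy_scores)
print(best_name, toy_aucs)
```

The pairwise loop is quadratic in the number of cases, which is acceptable for a sketch; a rank-based formula or an established library routine would be the usual choice for a cohort of a few thousand patients. AUC is also a natural criterion when the outcome is rare, as here (106 of 2,235 patients), because it does not depend on a single classification threshold.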

Authors

  • Xiaolu Tian
    Department of Medical Statistics and Epidemiology & Health Information Research Center & Guangdong Key Laboratory of Medicine, School of Public Health, Sun Yat-sen University, Guangzhou 510080, China.
  • Yutian Chong
    Department of Infectious Disease, The Third Affiliated Hospital of Sun Yat-sen University, No. 600, Tianhe Road, Guangzhou 510630, China.
  • Yutao Huang
    School of Data and Computer Science, Sun Yat-sen University, Guangzhou 510006, China.
  • Pi Guo
    Department of Preventive Medicine, Shantou University Medical College, Shantou, China.
  • Mengjie Li
    Department of Medical Statistics and Epidemiology & Health Information Research Center & Guangdong Key Laboratory of Medicine, School of Public Health, Sun Yat-sen University, Guangzhou 510080, China.
  • Wangjian Zhang
    Department of Environmental Health Sciences, School of Public Health, University at Albany, State University of New York, Rensselaer 12144, USA.
  • Zhicheng Du
    Department of Medical Statistics and Epidemiology & Health Information Research Center & Guangdong Key Laboratory of Medicine, School of Public Health, Sun Yat-sen University, Guangzhou 510080, China.
  • Xiangyong Li
    Department of Gastrointestinal Surgery, The Second Affiliated Hospital of Soochow University, Suzhou City, Jiangsu Province, People's Republic of China.
  • Yuantao Hao
    Department of Medical Statistics and Epidemiology & Health Information Research Center & Guangdong Key Laboratory of Medicine, School of Public Health, Sun Yat-sen University, Guangzhou 510080, China.