Machine-learning-based identification of patients with IgA nephropathy using a computerized medical billing database.

Journal: PloS one
PMID:

Abstract

Although the billing database of the universal healthcare system in Japan potentially includes large-cohort data on patients with immunoglobulin A nephropathy, diagnosis codes assigned for billing purposes should not be used directly for clinical research because of the risk of misdiagnosis. To solve this problem, we aimed to develop a novel method for identifying patients with immunoglobulin A nephropathy from billing data using machine learning. The medical records and bills of 3,743 patients who consulted nephrologists at a single center were extracted. Patients were labeled as having been diagnosed with immunoglobulin A nephropathy through a review of medical records. Both a manual analysis of diagnostic accuracy and machine learning were performed. For machine learning, the datasets were preprocessed in three patterns and fed to XGBoost with five-fold cross-validation. Of all the participants, 437 were labeled as having been diagnosed with immunoglobulin A nephropathy. Billing codes for immunoglobulin A nephropathy had been assigned to approximately half of them. The manually created criteria, consisting of the examinations and treatments recommended in the Japanese guidelines for immunoglobulin A nephropathy, showed specificity and sensitivity both below 0.8. In contrast, in receiver operating characteristic curve analysis, machine learning yielded area under the curve values over 0.9 when the data were preprocessed from a clinical viewpoint. Applying machine learning to a dataset preprocessed from a clinical viewpoint thus achieved high performance in detecting patients with immunoglobulin A nephropathy. This methodology can contribute to the construction of disease-specific cohorts from large-scale billing data.
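The abstract compares two kinds of evaluation: sensitivity and specificity for a fixed rule-based criterion, and area under the receiver operating characteristic curve for model scores under five-fold cross-validation. The sketch below illustrates those three pieces using only the Python standard library. It is not the authors' pipeline and does not train XGBoost; the function names, the round-robin fold assignment, and all data values are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Tuple


def sensitivity_specificity(labels: Sequence[int],
                            preds: Sequence[int]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for binary labels and predictions."""
    pairs = list(zip(labels, preds))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive case outscores a
    random negative case, counting ties as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def stratified_folds(labels: Sequence[int], k: int = 5) -> List[List[int]]:
    """Deal the indices of each class round-robin into k folds so that
    every fold keeps roughly the overall class balance."""
    folds: List[List[int]] = [[] for _ in range(k)]
    for cls in (0, 1):
        class_indices = [i for i, y in enumerate(labels) if y == cls]
        for j, i in enumerate(class_indices):
            folds[j % k].append(i)
    return folds


# Hypothetical toy data: 1 = chart-review diagnosis of IgA nephropathy.
labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
# Hypothetical output of a hand-written guideline-based rule.
rule_preds = [1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
# Hypothetical model scores (in the study these would come from XGBoost).
scores = [0.9, 0.8, 0.7, 0.6, 0.3,
          0.4, 0.2, 0.2, 0.1, 0.1, 0.1, 0.05, 0.05, 0.05, 0.05]

sens, spec = sensitivity_specificity(labels, rule_preds)
auc = roc_auc(labels, scores)
folds = stratified_folds(labels, k=5)
print(f"rule: sensitivity={sens:.2f}, specificity={spec:.2f}")
print(f"scores: AUC={auc:.2f}")
print("fold sizes:", [len(f) for f in folds])
```

In a real analysis, each fold would be held out in turn, a classifier trained on the remaining four, and the held-out scores pooled or averaged before computing the area under the curve. Keeping the class ratio similar across folds matters here because only about 12% of the cohort (437 of 3,743) carried the diagnosis.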

Authors

  • Ryoya Tsunoda
    Faculty of Medicine, Department of Nephrology, University of Tsukuba, Tsukuba, Japan.
  • Keitaro Kume
    Graduate School of Systems and Information Engineering, University of Tsukuba, Tsukuba, Japan.
  • Rina Kagawa
    Department of Biomedical Informatics, Graduate School of Medicine, The University of Tokyo, Bunkyo-ku, Tokyo, Japan.
  • Masaru Sanuki
    Faculty of Medicine, Department of Clinical Medicine, University of Tsukuba, Tsukuba, Japan.
  • Hiroyuki Kitagawa
    Graduate School of Systems and Information Engineering, University of Tsukuba, Tsukuba, Japan.
  • Kaori Mase
    Faculty of Medicine, Department of Nephrology, University of Tsukuba, Tsukuba, Japan.
  • Kunihiro Yamagata
    Department of Nephrology, Faculty of Medicine, University of Tsukuba, Ibaraki, Japan.