NLP-ROPCare: predicting retinopathy of prematurity with admission notes using natural language processing.

Journal: BMJ open ophthalmology
Published Date:

Abstract

OBJECTIVES: Retinopathy of prematurity (ROP) is a leading cause of childhood blindness worldwide, and more efficient models are needed to help predict treatment-requiring ROP. Our study aimed to develop a new prediction model for ROP occurrence and severity, named NLP-ROPCare, using natural language processing (NLP).
METHODS AND ANALYSIS: This was a retrospective observational study. Infants with a gestational age ≤32 weeks or birth weight ≤2000 g were enrolled at Guangdong Women and Children Hospital from 2013 to 2022, giving 3922 preterm infants, of whom 1106 had ROP. Four pretrained language models - BERT (Bidirectional Encoder Representations from Transformers), RoBERTa (Robustly Optimized BERT pretraining Approach), MC-BERT (language pre-training via a Meta Controller) and NEZHA (NEural contextualiZed representation for CHinese lAnguage understanding) - were used to develop NLP prediction models based on free-form texts in the admission notes. For comparison, two machine learning methods (Random Forest and Support Vector Machine) were used to construct prediction models based on 20 structured characteristics previously extracted from the admission notes. Performance evaluation metrics included accuracy, precision, recall, F1 score and area under the curve (AUC).
RESULTS: The NLP prediction models for ROP occurrence outperformed those for severity. The NEZHA model demonstrated the highest accuracy in predicting ROP occurrence, achieving an F1 score of 89.35% and an AUC of 0.90. It also outperformed the two machine learning models, whose highest F1 score was 78% with an AUC of 0.87. For predicting ROP severity, the F1 score of RoBERTa (78.44%) was slightly higher than that of NEZHA (77.81%), and RoBERTa also achieved the highest AUC (0.91).
CONCLUSION: NLP-ROPCare combines the language models NEZHA and RoBERTa to enable early prediction of ROP occurrence and severity from unstructured free-form texts in the admission notes of preterm infants, highlighting its value in early prevention of ROP. Further external validation should be carried out to better adjust the model.
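
For readers less familiar with the evaluation metrics named in the abstract, the following is a minimal sketch of how accuracy, precision, recall, F1 score and AUC can be computed for a binary task such as ROP occurrence. It is not the authors' code: the function names `classification_metrics` and `roc_auc` are hypothetical, and the labels and scores are toy values, not data from the study.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive case receives a higher
    score than a random negative case (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example (illustrative values only, not study data).
y_true = [1, 1, 1, 0, 0, 0]               # 1 = ROP, 0 = no ROP
y_pred = [1, 1, 0, 1, 0, 0]               # thresholded model predictions
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2]   # predicted probabilities

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
```

Precision, recall and F1 depend on the chosen decision threshold, whereas AUC is computed from the raw scores, which is one reason a model can rank first on one metric and not on another.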

Authors

  • Yulin Zhang
  • Shuai Zhao
    Xi'an Medical University, Xi'an Shaanxi, 710068, P.R.China.
  • Jianbing Ren
    Chongqing University of Science and Technology, No. 20, University City East Road, Chongqing, 401331, China.
  • Yuwen Li
  • Xinyu Zhao
    AU MRI Research Center, Department of Electrical and Computer Engineering, Auburn University, Auburn, AL, USA.
  • Jie Sun
    College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, People's Republic of China.
  • Chuan Nie
    National Key Clinical Specialty Construction Project/Department of Neonatology, Guangdong Women and Children Hospital, Guangzhou, Guangdong, China.
  • Suzhen Xie
    Department of Ophthalmology, Guangdong Women and Children Hospital, Guangzhou, Guangdong, China.
  • Xuelin Huang
    Department of Ophthalmology, Guangdong Women and Children Hospital, Guangzhou, Guangdong, China.
  • Jinming Wen
    School of Mathematics, Jilin University, Changchun, Jilin, China.
  • Xianqiong Luo
    National Key Clinical Specialty Construction Project/Department of Neonatology, Guangdong Women and Children Hospital, Guangzhou, Guangdong, China.
  • Guoming Zhang
    Shenzhen Eye Hospital; Shenzhen Key Ophthalmic Laboratory, Health Science Center, Shenzhen University, The Second Affiliated Hospital of Jinan University, Shenzhen, China.