A Computable Phenotype for Acute Respiratory Distress Syndrome Using Natural Language Processing and Machine Learning.
Journal:
AMIA ... Annual Symposium proceedings. AMIA Symposium
Published Date:
Dec 5, 2018
Abstract
Acute Respiratory Distress Syndrome (ARDS) is a syndrome of respiratory failure that may be identified using text from radiology reports. The objective of this study was to determine whether natural language processing (NLP) with machine learning performs better than a traditional keyword model for ARDS identification. Reports underwent linguistic pre-processing, and the resulting text features were used as inputs to machine learning classifiers, which were tuned using 10-fold cross-validation on 80% of the sample and tested on the remaining 20%. A cohort of 533 patients was evaluated, with a data corpus of 9,255 radiology reports. The traditional model had an accuracy of 67.3% (95% CI: 58.3-76.3) with a positive predictive value (PPV) of 41.7% (95% CI: 27.7-55.6). The best NLP model had an accuracy of 83.0% (95% CI: 75.9-90.2) with a PPV of 71.4% (95% CI: 52.1-90.8). A computable phenotype for ARDS based on NLP may identify more cases than the traditional keyword model.
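The abstract reports accuracy and PPV with 95% confidence intervals but does not state how the intervals were computed. The following is a minimal sketch of how such figures can be derived from a test-set confusion matrix, assuming a normal-approximation (Wald) interval. The function names `wald_ci` and `accuracy_and_ppv` and the example counts are hypothetical and are not taken from the study.

```python
import math


def wald_ci(successes, total, z=1.96):
    """Return (proportion, lower, upper) using a normal-approximation
    (Wald) interval, clipped to [0, 1]. z=1.96 gives a ~95% interval.
    Assumption: the study's interval method is not stated in the abstract."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


def accuracy_and_ppv(tp, fp, tn, fn):
    """Compute accuracy and positive predictive value, each with a Wald CI.

    accuracy = (TP + TN) / all cases
    PPV      = TP / (TP + FP)
    """
    n = tp + fp + tn + fn
    return {
        "accuracy": wald_ci(tp + tn, n),
        "ppv": wald_ci(tp, tp + fp),
    }


# Hypothetical confusion-matrix counts, invented for illustration only.
results = accuracy_and_ppv(tp=30, fp=10, tn=50, fn=10)
for name, (p, lo, hi) in results.items():
    print(f"{name}: {100 * p:.1f}% (95% CI: {100 * lo:.1f}-{100 * hi:.1f})")
```

The Wald interval is simple but can behave poorly for small samples or for proportions near 0 or 1; a Wilson or exact interval may be preferable in those settings.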
Keywords
Adult
Aged
Area Under Curve
Cohort Studies
Diagnosis, Computer-Assisted
Electronic Health Records
Female
Humans
Length of Stay
Male
Middle Aged
Natural Language Processing
Predictive Value of Tests
Radiography, Thoracic
Respiratory Distress Syndrome
Risk Factors
Supervised Machine Learning
Unified Medical Language System