Automatic prediction of coronary artery disease from clinical narratives.

Journal: Journal of biomedical informatics
Published Date:

Abstract

Coronary Artery Disease (CAD) is not only the most common form of heart disease, but also the leading cause of death in both men and women (Coronary Artery Disease: MedlinePlus, 2015). We present a system that automatically predicts whether patients will develop coronary artery disease based on their narrative medical histories, i.e., clinical free text. Although the free text in medical records has been used in several studies to identify risk factors for coronary artery disease, to the best of our knowledge our work marks the first attempt at automatically predicting the development of CAD. We tackle this task on a small corpus of diabetic patients. Because the corpus is small, the number of features must be limited in order to avoid overfitting. We propose an ontology-guided approach to feature extraction and compare it with two classic feature selection techniques. Our system achieves a state-of-the-art F1 score of 77.4%.
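
The abstract does not describe how the ontology-guided extraction works, so the following is only a minimal sketch of the general idea under stated assumptions: features are restricted to tokens that map to a concept in a small lexicon, which keeps the feature space small, and predictions are scored with F1 (the harmonic mean of precision and recall). The lexicon `LEXICON`, the concept names, and the functions `extract_features` and `f1_score` are hypothetical and are not taken from the paper.

```python
from collections import Counter
from typing import Sequence

# Hypothetical mini-lexicon standing in for an ontology. The surface terms and
# concept names are illustrative assumptions, not the paper's resources.
LEXICON = {
    "hypertension": "HYPERTENSION",
    "htn": "HYPERTENSION",
    "smoker": "SMOKING",
    "smoking": "SMOKING",
    "metformin": "DIABETES_MEDICATION",
    "insulin": "DIABETES_MEDICATION",
}


def extract_features(note: str) -> Counter:
    """Count only tokens that map to a lexicon concept; ignore everything else."""
    tokens = (tok.strip(".,;:()").lower() for tok in note.split())
    return Counter(LEXICON[tok] for tok in tokens if tok in LEXICON)


def f1_score(gold: Sequence[int], predicted: Sequence[int]) -> float:
    """F1 for the positive class (label 1)."""
    tp = sum(1 for g, p in zip(gold, predicted) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, predicted) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, predicted) if g == 1 and p == 0)
    if tp == 0:
        # No true positives: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

With this sketch, a note such as "Pt is a smoker with HTN, on metformin and insulin." yields three concept features regardless of how many other words the note contains, which is the property that matters when the corpus is small.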

Authors

  • Kevin Buchan
    Department of Information Science, State University of New York at Albany, NY, USA. Electronic address: kbuchan@albany.edu.
  • Michele Filannino
    Department of Computer Science, State University of New York at Albany, NY, USA.
  • Ozlem Uzuner
    Department of Information Studies, University at Albany, SUNY. Albany, NY.