A disease inference method based on symptom extraction and bidirectional Long Short Term Memory networks.
Journal:
Methods (San Diego, Calif.)
Published Date:
Jul 10, 2019
Abstract
The wide applications of automatic disease inference in many medical fields improve the efficiency of medical treatments. Many efforts have been made to predict patients' future health conditions according to their full clinical texts, clinical measurements or medical codes. Symptoms reflect the onset of diseases and can provide credible information for disease diagnosis. In this study, we propose a new disease inference method by extracting symptoms and integrating two symptom representation approaches. To reduce the uncertainty and irregularity of symptom descriptions in Electronic Medical Records (EMR), a comprehensive clinical knowledge database consisting of massive amount of data about diseases, symptoms, and their relationships, we extract symptoms with existing nature language process tool Metamap which is designed for biomedical texts. To take advantages of the complex relationship between symptoms and diseases to enhance the accuracy of disease inference, we present two symptom representation models: term frequency-inverse document frequency (TF-IDF) model for the representation of the relationship between symptoms and diseases and Word2Vec for the expression of the semantic relationship between symptoms. Based on these two symptom representations, we employ the bidirectional Long Short Term Memory networks (BiLSTMs) to model symptom sequences in EMR. Our proposed model shows a significant improvement in term of AUC (0.895) and F1 (0.572) for 50 diseases in MIMIC-III dataset. The results illustrate that the model with the combination of the two symptom representations perform better than the one with only one of them.