Early Disease Prediction Using a Text-Numerical Hybrid Model Using Large-Scale Clinical Real-World Data.
Journal:
AMIA ... Annual Symposium proceedings. AMIA Symposium
Published Date:
May 22, 2025
Abstract
To assist physicians in predicting diseases, most natural language processing (NLP) models have focused on progress notes in electronic medical records, which contain full descriptions from the initial stage of patient diagnosis to the final stage of discharge. However, accurately predicting diseases at an early stage from initial notes is challenging because the available information is limited. To address this, a text-numerical hybrid method was developed to improve disease prediction accuracy. The method identifies "reliably predicted diseases" (RPD): diseases that the NLP and Random Forest models can predict robustly even when the numerical data contain missing values or the amount of text data is small. Among the disease groups predicted by the two models, diseases matching the RPD are preferentially adopted and integrated. The developed method achieved an accuracy of 67.0% as measured by Precision@10, higher than that of the traditional NLP model.
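The integration step and the evaluation metric can be illustrated with a minimal sketch. The abstract does not specify how the two models' outputs are merged, so everything here is an assumption made for illustration: the function names `merge_with_rpd_priority` and `precision_at_k`, the interleaving of the two ranked lists, and the toy disease names are all hypothetical. The sketch only shows the general idea of placing RPD-matching diseases ahead of the rest, and the standard definition of Precision@k (relevant items in the top k, divided by k).

```python
from typing import List, Set


def merge_with_rpd_priority(
    nlp_ranked: List[str],
    rf_ranked: List[str],
    rpd: Set[str],
    k: int = 10,
) -> List[str]:
    """Merge two ranked disease lists, placing RPD matches first.

    Hypothetical merge rule (not taken from the paper): interleave the
    two rankings, then emit RPD diseases before non-RPD diseases while
    keeping the interleaved order within each group.
    """
    # Interleave so that neither model's ranking dominates the other.
    interleaved: List[str] = []
    for i in range(max(len(nlp_ranked), len(rf_ranked))):
        if i < len(nlp_ranked):
            interleaved.append(nlp_ranked[i])
        if i < len(rf_ranked):
            interleaved.append(rf_ranked[i])

    merged: List[str] = []
    seen: Set[str] = set()
    # First pass collects RPD matches, second pass collects the rest.
    for want_rpd in (True, False):
        for disease in interleaved:
            if (disease in rpd) == want_rpd and disease not in seen:
                seen.add(disease)
                merged.append(disease)
    return merged[:k]


def precision_at_k(predicted: List[str], relevant: Set[str], k: int = 10) -> float:
    """Fraction of the top-k predictions that are in the relevant set."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for disease in predicted[:k] if disease in relevant)
    return hits / k


# Toy data, invented for illustration only.
nlp_ranked = ["pneumonia", "asthma", "copd"]
rf_ranked = ["diabetes", "pneumonia", "anemia"]
rpd = {"diabetes", "copd"}

top3 = merge_with_rpd_priority(nlp_ranked, rf_ranked, rpd, k=3)
score = precision_at_k(top3, relevant={"diabetes", "pneumonia"}, k=3)
```

With this toy input, the two RPD diseases move to the front of the merged list, and the metric counts how many of the top three are among the patient's true diagnoses. A real system would likely merge on model scores rather than rank position; that choice is left open here.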