Comparing natural language processing representations of coded disease sequences for prediction in electronic health records.
Journal: Journal of the American Medical Informatics Association : JAMIA
PMID: 38719204
Abstract
OBJECTIVE: Natural language processing (NLP) algorithms are increasingly being applied to obtain unsupervised representations of electronic health record (EHR) data, but their comparative performance at predicting clinical endpoints remains unclear. Our objective was to compare the performance of unsupervised representations of sequences of disease codes generated by bag-of-words versus sequence-based NLP algorithms at predicting clinically relevant outcomes.
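The distinction between the two families of representation compared here can be illustrated with a minimal sketch. It is not the study's pipeline: the code strings, the two toy patients, and the helper names (`build_vocab`, `bag_of_words`, `index_sequence`) are all assumptions made for illustration. A bag-of-words vector records only which codes occur and how often, whereas a sequence-based input keeps the order in which the codes were recorded.

```python
from collections import Counter

def build_vocab(sequences):
    """Map each distinct code to an integer index, in sorted order."""
    codes = sorted({code for seq in sequences for code in seq})
    return {code: i for i, code in enumerate(codes)}

def bag_of_words(seq, vocab):
    """Count vector with one slot per code; order of occurrence is discarded."""
    counts = Counter(seq)
    return [counts.get(code, 0) for code in vocab]

def index_sequence(seq, vocab):
    """Ordered list of code indices, the kind of input a sequence model consumes."""
    return [vocab[code] for code in seq]

# Hypothetical patients: the same three placeholder codes, recorded in opposite order.
patient_a = ["E11", "I10", "N18"]
patient_b = ["N18", "I10", "E11"]

vocab = build_vocab([patient_a, patient_b])

bow_a = bag_of_words(patient_a, vocab)
bow_b = bag_of_words(patient_b, vocab)
seq_a = index_sequence(patient_a, vocab)
seq_b = index_sequence(patient_b, vocab)

print("bag-of-words:", bow_a, bow_b)
print("sequences:   ", seq_a, seq_b)
```

In this toy case the two patients receive identical bag-of-words vectors but different index sequences, so only the sequence-based input can, in principle, distinguish the order in which the conditions were recorded.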