Biomedical text normalization through generative modeling.
Journal: Journal of Biomedical Informatics
Published Date: Jul 1, 2025
Abstract
OBJECTIVE: A large proportion of electronic health record (EHR) data consists of unstructured medical text. Because this text is often formatted flexibly and inconsistently, it is difficult to use for predictive modeling, clinical decision support, and data mining. Large language models (LLMs) can understand context and semantic variation, which makes them promising tools for standardizing medical text. In this study, we develop and assess clinical text normalization pipelines built using LLMs.