A patient-centered approach to developing and validating a natural language processing model for extracting patient-reported symptoms.
Journal: Scientific Reports
Published Date: Jul 29, 2025
Abstract
Patient-reported symptoms provide valuable insights into patient experiences and can enhance healthcare quality; however, effectively capturing them remains challenging. Although natural language processing (NLP) models have been developed to extract adverse events and symptoms from medical records written by healthcare professionals, few studies have focused on models designed for patient-generated narratives. This study developed an NLP model to extract patient-reported symptoms from pharmaceutical care records and validated its effectiveness in analyzing diverse patient-generated narratives. The target dataset comprised "Subjective" sections of pharmaceutical care records created by community pharmacists for patients prescribed anticancer drugs. Two annotation guidelines were applied to develop robust ground-truth data, which was used to develop and evaluate a new transformer-based named entity recognition model. Model performance was compared with that of an existing tool for Japanese clinical texts and tested on external patient-generated blog data to evaluate generalizability. The newly developed BERT-CRF model significantly outperformed the existing model, achieving an F1 score > 0.8 on pharmaceutical care records and extracting > 98% of physical symptom entries from patient blogs, with a 20% improvement over the existing tool. These findings highlight the importance of fine-tuning models using patient-specific narrative data to capture nuanced and colloquial symptom expressions.
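For readers unfamiliar with how an F1 score is typically computed for a named entity recognition model, the following is a minimal sketch and not the authors' evaluation code. It assumes BIO-style tags and exact-match scoring of entity spans; the abstract specifies neither the tagging scheme nor the matching criterion. The label `SYM` and the function names `bio_to_spans` and `entity_f1` are hypothetical, chosen only for illustration.

```python
from typing import List, Set, Tuple

# (start index, end index exclusive, entity label)
Span = Tuple[int, int, str]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from one BIO tag sequence (assumed scheme)."""
    spans: Set[Span] = set()
    start, label = None, None
    # A sentinel "O" closes any span still open at the end of the sequence.
    for i, tag in enumerate(tags + ["O"]):
        continues = tag.startswith("I-") and label == tag[2:]
        if start is not None and not continues:
            spans.add((start, i, label))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return spans


def entity_f1(gold: List[List[str]], pred: List[List[str]]) -> Tuple[float, float, float]:
    """Exact-match, entity-level precision, recall, and F1 over a corpus."""
    tp = n_gold = n_pred = 0
    for gold_tags, pred_tags in zip(gold, pred):
        g, p = bio_to_spans(gold_tags), bio_to_spans(pred_tags)
        tp += len(g & p)       # spans with identical boundaries and label
        n_gold += len(g)
        n_pred += len(p)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy example: two gold symptom mentions, one of which is predicted.
gold = [["B-SYM", "I-SYM", "O", "B-SYM"]]
pred = [["B-SYM", "I-SYM", "O", "O"]]
precision, recall, f1 = entity_f1(gold, pred)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

In the toy example the single predicted span is correct, so precision is 1.0, while only one of two gold spans is recovered, so recall is 0.5. Exact-match scoring is the stricter convention; partial-overlap scoring would credit near-miss boundaries and generally yields higher values.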