Hybrid natural language processing tool for semantic annotation of medical texts in Spanish.

Journal: BMC bioinformatics
Published Date:

Abstract

BACKGROUND: Natural language processing (NLP) enables the extraction of information embedded in unstructured texts, such as clinical case reports and trial eligibility criteria. By identifying relevant medical concepts, NLP facilitates the generation of structured and actionable data, supporting complex tasks such as cohort identification and the analysis of clinical records. To support these tasks, we introduce a hybrid named entity recognition (NER) tool for texts in Spanish that combines deep learning and lexicon-based methods. It performs medical NER and normalization, medication information extraction, and detection of temporal entities, negation and speculation, and temporality or experiencer attributes (Age, Contraindicated, Negated, Speculated, Hypothetical, Future, Family_member, Patient and Other). We built the tool with a dedicated lexicon and rules adapted from NegEx and HeidelTime. Using these resources, we annotated a corpus of 1200 texts, with high inter-annotator agreement (average F1 = 0.841 ± 0.045 for entities, and average F1 = 0.881 ± 0.032 for attributes). We used this corpus to train Transformer-based models (RoBERTa-based models, mBERT and mDeBERTa). We integrated them with the dictionary-based system in a hybrid tool, and we distribute the models via the Hugging Face hub. For internal validation, we used a held-out test set and conducted an error analysis. For external validation, eight medical professionals evaluated the system by revising the annotations of 200 new texts not used in development.
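
The inter-annotator agreement figures above are reported as an average F1 with a standard deviation. The abstract does not state the matching criterion or how scores were averaged, so the following is only a minimal sketch of one common way to compute such a figure. It assumes exact matching on (start, end, label) tuples and averaging over all annotator pairs. The function names, the example spans and the labels are hypothetical and do not come from the paper.

```python
from itertools import combinations
from statistics import mean, stdev


def pairwise_f1(spans_a, spans_b):
    """F1 between two annotators' entity sets.

    Assumption: an entity counts as agreed only if start offset,
    end offset and label all match exactly.
    """
    a, b = set(spans_a), set(spans_b)
    if not a and not b:
        return 1.0  # both annotators marked nothing: full agreement
    # With one annotator as reference and the other as prediction,
    # F1 reduces to this symmetric form.
    return 2 * len(a & b) / (len(a) + len(b))


def agreement(annotations):
    """Mean and sample standard deviation of F1 over all annotator pairs."""
    scores = [pairwise_f1(x, y) for x, y in combinations(annotations, 2)]
    spread = stdev(scores) if len(scores) > 1 else 0.0
    return mean(scores), spread


# Hypothetical annotations of one text by two annotators.
ann_a = {(0, 9, "DISO"), (15, 26, "CHEM"), (30, 38, "PROC")}
ann_b = {(0, 9, "DISO"), (15, 26, "CHEM")}

score = pairwise_f1(ann_a, ann_b)
```

Here the two annotators share two of their five annotations in total, so the pairwise score is 2 × 2 / (3 + 2) = 0.8. A relaxed (overlap-based) matching criterion would generally give higher values than the exact-match variant sketched here.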

Authors

  • Leonardo Campillos-Llanos
    Computational Linguistics Laboratory, Universidad Autónoma de Madrid, C/Francisco Tomás y Valiente 1. Cantoblanco Campus, 28049, Madrid, Spain. leonardo.campillos@uam.es.
  • Ana Valverde-Mateos
    Medical Terminology Unit, Spanish Royal Academy of Medicine, C/Arrieta 12, 28013, Madrid, Spain.
  • Adrián Capllonch-Carrión
    Complejo Asistencial Hospital Benito Menni, C/Jardines 1, 28350, Ciempozuelos, Madrid, Spain.