Hierarchical embedding attention for overall survival prediction in lung cancer from unstructured EHRs.

Journal: BMC Medical Informatics and Decision Making
PMID:

Abstract

The automated processing of Electronic Health Records (EHRs) poses a significant challenge due to their unstructured nature, rich in valuable yet disorganized information. Natural Language Processing (NLP), particularly Named Entity Recognition (NER), has been instrumental in extracting structured information from EHR data. However, the existing literature primarily focuses on extracting handcrafted clinical features through NLP and NER methods without delving into the representations these models learn. In this work, we explore the untapped potential of these representations, considering their contextual richness and entity-specific information. Our proposed methodology extracts the representations generated by a transformer-based NER model on EHR data, combines them using a hierarchical attention mechanism, and uses the resulting enriched representation as input to a clinical prediction model. Specifically, this study addresses Overall Survival (OS) prediction in Non-Small Cell Lung Cancer (NSCLC) using unstructured EHR data collected at an Italian clinical centre, comprising 838 records from 231 lung cancer patients. Although our study is applied to EHRs written in Italian, it serves as a use case to demonstrate the effectiveness of extracting and employing high-level textual representations that capture relevant information as named entities. Our methodology is interpretable because the hierarchical attention mechanism highlights the information in the EHRs that the model considers most crucial during decision-making. We validated this interpretability through a questionnaire measuring the agreement of domain experts with the importance that the hierarchical attention mechanism assigns to EHR information. Results demonstrate the effectiveness of our method, showing statistically significant improvements over traditional, manually extracted clinical features.
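The abstract does not specify the exact form of the hierarchical attention mechanism. The following is only a minimal NumPy sketch of the general idea, assuming additive (tanh-based) attention applied at two levels: first over the entity embeddings within each EHR record, then over the resulting record vectors of one patient. The function names, the two-level entity/record split, the dimensions and the random parameters are all hypothetical; in a real model the parameters would be learned jointly with the survival predictor, and the input vectors would come from the transformer-based NER model. The returned attention weights are the kind of quantities that can be inspected to see which records and entities the model weighs most heavily.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D array of scores.
    z = x - np.max(x)
    e = np.exp(z)
    return e / e.sum()

def attention_pool(H, W, b, u):
    """Additive attention pooling (hypothetical formulation).

    H: (n, d) input vectors; W: (d, k); b: (k,); u: (k,).
    Returns the attention-weighted sum of the rows of H, shape (d,),
    and the attention weights, shape (n,), which sum to 1.
    """
    scores = np.tanh(H @ W + b) @ u  # one scalar score per row of H
    alpha = softmax(scores)
    return alpha @ H, alpha

def hierarchical_attention(records, params_low, params_high):
    """records: list of (n_entities_i, d) arrays, one per EHR record."""
    record_vecs, entity_weights = [], []
    for H in records:
        # Level 1: pool the entity embeddings of a single record.
        v, a = attention_pool(H, *params_low)
        record_vecs.append(v)
        entity_weights.append(a)
    R = np.stack(record_vecs)  # (n_records, d)
    # Level 2: pool the record vectors into one patient-level vector.
    patient_vec, record_weights = attention_pool(R, *params_high)
    return patient_vec, record_weights, entity_weights

# Hypothetical usage: random stand-ins for learned parameters and NER embeddings.
rng = np.random.default_rng(0)
d, k = 8, 4
records = [rng.normal(size=(n, d)) for n in (5, 2, 7)]  # 3 records, 5/2/7 entities
params_low = (rng.normal(size=(d, k)), np.zeros(k), rng.normal(size=k))
params_high = (rng.normal(size=(d, k)), np.zeros(k), rng.normal(size=k))
patient_vec, record_w, entity_w = hierarchical_attention(records, params_low, params_high)
```

In this sketch, `patient_vec` would be the enriched representation passed to a downstream survival model, while `record_w` and `entity_w` are the per-record and per-entity importance scores that could be shown to clinicians.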

Authors

  • Domenico Paolo
    Unit of Computer Systems & Bioinformatics, Università Campus Bio-Medico di Roma, Italy.
  • Carlo Greco
    Operative Research Unit of Radiation Oncology, Fondazione Policlinico Universitario Campus Bio-Medico, Italy.
  • Alessio Cortellini
    Operative Research Unit of Medical Oncology, Fondazione Policlinico Universitario Campus Bio-Medico, Via Alvaro del Portillo, Rome, Italy.
  • Sara Ramella
    Operative Research Unit of Radiation Oncology, Fondazione Policlinico Universitario Campus Bio-Medico, Italy.
  • Paolo Soda
    Unit of Computer Systems and Bioinformatics, Department of Engineering, University Campus Bio-Medico of Rome, Italy; Department of Radiation Sciences, Radiation Physics, Biomedical Engineering, Umeå University, Umeå, Sweden. Electronic address: paolo.soda@umu.se.
  • Alessandro Bria
    Department of Engineering, University Campus Bio-Medico of Rome, Rome, Italy.
  • Rosa Sicilia
    Unit of Computer Systems & Bioinformatics, Università Campus Bio-Medico di Roma, Italy.