SSMT-PANBERT: A single-stage multitask model for phenotype extraction and assertion negation detection in unstructured clinical text.
Journal: Computers in Biology and Medicine
Published Date: Jun 22, 2025
Abstract
Automatic phenotype extraction and assertion negation detection from large-scale, accessible Electronic Health Records (EHRs), such as discharge summaries and radiology reports, is a crucial task for healthcare applications including disease diagnosis and treatment planning. The unstructured nature of these documents makes manual processing challenging. However, prior studies have several limitations: some are restricted to a single label per sentence, and others omit the extraction and negation of medical concepts, which makes them prone to failure in complex scenarios. In this paper, we build on recent advances in state-of-the-art pre-trained language models (PLMs) to propose a single-stage multitask solution that jointly learns, end to end, to extract phenotypes and to detect whether each one is asserted or negated. Our approach aims to provide practical assistance to healthcare professionals by handling complex and diverse clinical scenarios. We evaluate our method on a validation set derived from an annotated, balanced, and validated dataset based on MIMIC-III clinical notes, whose annotations were rigorously reviewed by domain experts to ensure high reliability. The top-performing model in our experiments, SSMT-PANBERT, achieves an average Macro F1 score of 92.33% and a Micro F1 score of 91.66% on the validation set. It outperforms traditional pipeline approaches in Macro F1 (92.33% vs. 91.66%) while reducing training time by 37%, inference time by 18.2%, and GPU memory usage by 57%. These results demonstrate the effectiveness of our unified approach in handling complex clinical scenarios while offering significant computational advantages for real-world applications. Finally, we analyze the model's performance in depth and identify potential areas for future improvement.
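The abstract reports both Macro F1 and Micro F1, which weight labels differently. The following is a minimal sketch, not the authors' evaluation code. It assumes a multi-label setting in which each sentence carries a set of labels of the hypothetical form "phenotype:status" (for example "depression:negated"). Macro F1 averages the per-label F1 scores, so every label counts equally. Micro F1 pools true positives, false positives, and false negatives across all labels before computing a single F1.

```python
from typing import Dict, List, Set, Tuple


def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """F1 = 2TP / (2TP + FP + FN); returns 0.0 when the label never occurs or is predicted."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_micro_f1(
    gold: List[Set[str]], pred: List[Set[str]], labels: List[str]
) -> Tuple[float, float]:
    """Compute (macro F1, micro F1) for multi-label, per-sentence predictions."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must cover the same sentences")
    # Per-label counts stored as [tp, fp, fn].
    counts: Dict[str, List[int]] = {lab: [0, 0, 0] for lab in labels}
    for g, p in zip(gold, pred):
        for lab in labels:
            if lab in g and lab in p:
                counts[lab][0] += 1  # correctly predicted label
            elif lab in p:
                counts[lab][1] += 1  # predicted but not in gold
            elif lab in g:
                counts[lab][2] += 1  # in gold but missed
    # Macro: unweighted mean of per-label F1.
    macro = sum(f1_from_counts(*c) for c in counts.values()) / len(labels)
    # Micro: pool counts across labels, then compute one F1.
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    return macro, f1_from_counts(tp, fp, fn)


if __name__ == "__main__":
    # Hypothetical labels and sentences, for illustration only.
    labels = ["depression:asserted", "depression:negated", "obesity:asserted"]
    gold = [
        {"depression:asserted"},
        {"depression:negated", "obesity:asserted"},
        {"obesity:asserted"},
        set(),
    ]
    pred = [
        {"depression:asserted"},
        {"depression:asserted", "obesity:asserted"},  # negation missed
        {"obesity:asserted"},
        set(),
    ]
    macro, micro = macro_micro_f1(gold, pred, labels)
    print(f"macro={macro:.4f} micro={micro:.4f}")  # macro=0.5556 micro=0.7500
```

In this toy example, a single missed negation zeroes the F1 of the rare "depression:negated" label. That pulls Macro F1 well below Micro F1, which is dominated by the more frequent labels. The abstract does not state exactly how the reported "average" Macro F1 was aggregated (for example, across runs or label groups), so this sketch shows only the per-evaluation computation.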