Accelerating Clinical NLP at Scale with a Hybrid Framework with Reduced GPU Demands: A Case Study in Dementia Identification
Journal: arXiv
Published Date: Apr 16, 2025
Abstract
Clinical natural language processing (NLP) is increasingly in demand in both
clinical research and operational practice. However, most
state-of-the-art solutions are transformer-based and require substantial
computational resources, limiting their accessibility. We propose a hybrid NLP
framework that integrates rule-based filtering, a Support Vector Machine (SVM)
classifier, and a BERT-based model to improve efficiency while maintaining
accuracy. We applied this framework in a dementia identification case study
involving 4.9 million veterans with incident hypertension, analyzing 2.1
billion clinical notes. At the patient level, our method achieved a precision
of 0.90, a recall of 0.84, and an F1-score of 0.87. Additionally, this NLP
approach identified over three times as many dementia cases as structured data
methods. All processing was completed in approximately two weeks using a single
machine with dual A40 GPUs. This study demonstrates the feasibility of hybrid
NLP solutions for large-scale clinical text analysis, making state-of-the-art
methods more accessible to healthcare organizations with limited computational
resources.
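
The abstract does not spell out how the three components are combined. One plausible reading is a cascade in which inexpensive stages discard most notes before the GPU-bound model runs. The sketch below illustrates that idea only. The keyword pattern, the 0.5 threshold, and the two stand-in scoring functions are hypothetical placeholders and not the authors' rules, SVM, or BERT model.

```python
import re
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical keyword pattern; the actual rules are not given in the abstract.
KEYWORDS = re.compile(r"\b(dementia|alzheimer|cognitive impairment)\b", re.IGNORECASE)


def rule_filter(note: str) -> bool:
    """Stage 1 (CPU, cheapest): keep only notes that mention a target term."""
    return KEYWORDS.search(note) is not None


def cascade(
    notes: Iterable[str],
    cheap_score: Callable[[str], float],
    expensive_predict: Callable[[str], bool],
    threshold: float = 0.5,  # assumed cut-off, chosen only for illustration
) -> Tuple[List[str], Dict[str, int]]:
    """Run notes through rules, then a cheap classifier, then a costly one.

    Returns the notes judged positive plus how many notes reached each stage.
    """
    all_notes = list(notes)
    after_rules = [n for n in all_notes if rule_filter(n)]
    # Stage 2 (CPU): in a real system this could be an SVM over text features.
    after_cheap = [n for n in after_rules if cheap_score(n) >= threshold]
    # Stage 3 (GPU): only the remaining candidates reach the expensive model.
    positives = [n for n in after_cheap if expensive_predict(n)]
    counts = {
        "input": len(all_notes),
        "after_rules": len(after_rules),
        "sent_to_expensive_model": len(after_cheap),
        "positive": len(positives),
    }
    return positives, counts


def toy_svm_score(note: str) -> float:
    """Stand-in for an SVM score: low when the mention is explicitly negated."""
    return 0.1 if "no evidence of" in note.lower() else 0.9


def toy_bert_predict(note: str) -> bool:
    """Stand-in for a BERT-based model: rejects family-history mentions."""
    return "family history" not in note.lower()


notes = [
    "Patient diagnosed with dementia last year.",
    "Blood pressure controlled on lisinopril.",
    "No evidence of dementia on exam.",
    "Family history of Alzheimer's disease in mother.",
]
positives, counts = cascade(notes, toy_svm_score, toy_bert_predict)
print(positives)
print(counts)
```

In this toy run only two of the four notes reach the final stage, which is the kind of saving a cascade is meant to deliver: the expensive model sees a fraction of the corpus. The reported metrics are also internally consistent, since a precision of 0.90 and a recall of 0.84 give F1 = 2(0.90)(0.84)/(0.90 + 0.84) ≈ 0.87.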