A Hybrid Natural Language Processing Platform for Multi-Site RWD Studies.
Journal:
Studies in health technology and informatics
Published Date:
Aug 7, 2025
Abstract
Real-world data (RWD) obtained from electronic medical records have become a valuable resource for healthcare research. However, integrating unstructured free-text clinical data remains a significant challenge. Although natural language processing (NLP) offers a promising solution, its implementation is frequently hampered by high computational costs. Moreover, privacy concerns complicate data integration in multi-site RWD studies. This study proposes a hybrid platform that combines centralized NLP with robust privacy protection, facilitating effective information extraction from free-text data across institutions. We conducted comparative experiments on 500 sample reports to assess the proposed hybrid platform against a fully distributed method using on-site servers. The central graphics processing unit (GPU) server significantly outperformed the on-site central processing unit (CPU) servers, processing reports in 0.12 s compared with an average of 64.23 s. Additionally, the central server showed only a small and consistent increase in processing time across report lengths, highlighting its efficiency and scalability. The hybrid platform enhances computational efficiency while addressing privacy and data governance issues.
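
To put the two reported timings on a common scale, the relative speedup can be computed directly from them. The sketch below is illustrative only: it assumes both figures (0.12 s and 64.23 s) were measured over the same unit of work, which the abstract does not state explicitly, and the function and constant names are hypothetical rather than part of the platform.

```python
# Timings quoted in the abstract, in seconds.
# Assumption: both values refer to the same unit of work.
CENTRAL_GPU_SECONDS = 0.12
SITE_CPU_MEAN_SECONDS = 64.23


def speedup(baseline_seconds: float, improved_seconds: float) -> float:
    """Return how many times faster the improved setup is than the baseline."""
    if baseline_seconds <= 0 or improved_seconds <= 0:
        raise ValueError("timings must be positive")
    return baseline_seconds / improved_seconds


ratio = speedup(SITE_CPU_MEAN_SECONDS, CENTRAL_GPU_SECONDS)
print(f"Central server is roughly {ratio:.0f}x faster than the on-site average")
```

Under that assumption, the ratio comes to roughly 535, meaning the on-site average is about two and a half orders of magnitude slower than the central server.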