Mining Unstructured Medical Texts With Conformal Active Learning
Journal:
arXiv
Published Date:
Feb 5, 2025
Abstract
The extraction of relevant data from Electronic Health Records (EHRs) is
crucial to identifying symptoms and automating epidemiological surveillance
processes. By harnessing the vast amount of unstructured text in EHRs, we can
detect patterns that indicate the onset of disease outbreaks, enabling faster,
more targeted public health responses. Our proposed framework provides a
flexible and efficient solution for mining data from unstructured texts,
significantly reducing the need for extensive manual labeling by specialists.
Experiments show that our framework achieving strong performance with as few as
200 manually labeled texts, even for complex classification problems.
Additionally, our approach can function with simple lightweight models,
achieving competitive and occasionally even better results compared to more
resource-intensive deep learning models. This capability not only accelerates
processing times but also preserves patient privacy, as the data can be
processed on weaker on-site hardware rather than being transferred to external
systems. Our methodology, therefore, offers a practical, scalable, and
privacy-conscious approach to real-time epidemiological monitoring, equipping
health institutions to respond rapidly and effectively to emerging health
threats.