RelCAT: Advancing Extraction of Clinical Inter-Entity Relationships from Unstructured Electronic Health Records
Journal: arXiv
Published Date: Jan 27, 2025
Abstract
This study introduces RelCAT (Relation Concept Annotation Toolkit), an
interactive tool, library, and workflow designed to classify relations between
entities extracted from clinical narratives. Building upon the CogStack MedCAT
framework, RelCAT addresses the challenge of capturing complete clinical
relations dispersed within text. The toolkit implements state-of-the-art
machine learning models such as BERT and Llama along with proven evaluation and
training methods. We demonstrate a dataset annotation tool (built within
MedCATTrainer) and model training, and we evaluate our methodology on both
openly available gold-standard datasets and real-world clinical datasets from UK
National Health Service (NHS) hospitals. We perform extensive experiments and a
comparative analysis of various publicly available models under different
fine-tuning approaches. Finally, we achieve a macro F1-score of 0.977 on the
gold-standard n2c2 dataset, surpassing the previous state-of-the-art
performance, and F1-scores of ≥0.93 on our NHS-gathered datasets.
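The headline result is a macro F1-score, which averages per-class F1 without weighting by class frequency, so rare relation types count as much as common ones. The abstract does not say exactly which label set is averaged; the minimal sketch below assumes single-label relation classification and an unweighted mean over every label seen in the gold or predicted data. The function name `macro_f1` and the relation labels are hypothetical and purely illustrative, not RelCAT's API. Per-class F1 is computed as 2·TP / (2·TP + FP + FN).

```python
from collections import Counter


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 over all labels in gold or pred (illustrative)."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    labels = sorted(set(gold) | set(pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1  # correct prediction for class g
        else:
            fp[p] += 1  # predicted p, but the gold label differs
            fn[g] += 1  # missed an instance of gold class g
    per_class = []
    for c in labels:
        denom = 2 * tp[c] + fp[c] + fn[c]
        per_class.append(2 * tp[c] / denom if denom else 0.0)
    # Each class contributes equally, regardless of how often it occurs.
    return sum(per_class) / len(per_class)


# Hypothetical relation labels for four entity pairs.
gold = ["Dosage-Drug", "Dosage-Drug", "Reason-Drug", "Route-Drug"]
pred = ["Dosage-Drug", "Reason-Drug", "Reason-Drug", "Route-Drug"]
print(round(macro_f1(gold, pred), 4))  # 0.7778 (i.e. 7/9)
```

In this toy example, accuracy (which equals micro-F1 for single-label classification) is 0.75, while macro-F1 is 7/9 ≈ 0.778, because the perfectly classified `Route-Drug` class is weighted equally with the two classes that share the single error. Under default settings, scikit-learn's `f1_score(gold, pred, average="macro")` is expected to give the same value.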