Development of a natural language processing application to extract and categorize mentions of violence from mental healthcare records text

Journal: medRxiv

Published Date: Mar 26, 2026

Abstract

Background: Experiences of violence are reported frequently by mental health service users, victims of violence are at greater risk of mental health disorders, and violence may sometimes occur as a consequence of a mental disorder. Electronic health records (EHRs) are an important source of information about healthcare and its social context. Occurrences of violence are not routinely recorded as structured data in EHRs, but they are recorded in the free-text narrative.

Objective: Our objective was to address this research gap by creating a natural language processing (NLP) application that extracts information related to various forms of violence (physical (non-sexual), sexual, emotional, and financial) from the EHR of a large South London mental health service. Additionally, we aimed to extract features concerning the patient's role (victimization vs. perpetration), timing (recent vs. historic), domestic context, presence (actual, threat, or unclear), and polarity (affirmed, abstract, or negated) of the violent behaviors.

Methods: After rigorous training with a pre-defined, approved coding book provided by senior professionals, two raters independently annotated 6,500 randomly selected segments of clinical notes from a large mental healthcare provider in South London. Each segment contained a violence-related keyword and was 400 characters long (approximately 200 characters before and after the keyword). We used 90% of the annotated data to fine-tune a multi-label BERT model (with 5-fold cross-validation) and reserved the remaining 10% for a blind test.

Results: The model performed well on the blind test set for emotional violence (F1 = 0.89), financial violence (0.88), physical (non-sexual) violence (0.84), and unspecified violence (0.81), as well as for the patient role (0.89 as perpetrator; 0.84 as victim), polarity (0.89 for affirmed behavior), presence (0.95 for actual violence), and domestic settings (0.88). We were unable to achieve satisfactory results in capturing temporal aspects (0.65 for past violence).

Conclusions: We were able to improve substantially on previously developed NLP for ascertaining violence in routine mental health records, providing novel opportunities for both surveillance and research.

Keywords: Electronic mental health records, EHR, violence, NLP, BERT
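The segment construction and data split described in the Methods can be illustrated with a minimal sketch. It is not the authors' code: the function names, the keyword list, the centring of the window on the keyword, and the random seed are all assumptions made for illustration, and the abstract does not say how the folds were actually drawn.

```python
import random
import re


def keyword_windows(text, keywords, radius=200):
    """Return one segment of up to 2 * radius characters around each keyword hit.

    Assumption: the window is centred on the midpoint of the matched keyword.
    """
    pattern = re.compile("|".join(re.escape(k) for k in keywords), re.IGNORECASE)
    segments = []
    for match in pattern.finditer(text):
        centre = (match.start() + match.end()) // 2
        start = max(0, centre - radius)
        end = min(len(text), centre + radius)
        segments.append(text[start:end])
    return segments


def split_and_fold(items, test_fraction=0.1, n_folds=5, seed=0):
    """Hold out a blind test set, then partition the rest into n_folds folds."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    test, dev = shuffled[:n_test], shuffled[n_test:]
    # Each fold serves once as the validation set during cross-validation.
    folds = [dev[i::n_folds] for i in range(n_folds)]
    return dev, test, folds
```

With 6,500 annotated segments, this split would leave 650 segments for the blind test and 5,850 for fine-tuning, so each of the five folds would hold 1,170 segments.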

Authors

Li L.; Sondh S.; Sondh H. K.; Stewart R.; Roberts A.
