Data Quality Matters: Suicide Intention Detection on Social Media Posts Using RoBERTa-CNN.

Journal: Annual International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE Engineering in Medicine and Biology Society. Annual International Conference
PMID:

Abstract

Suicide remains a pressing global health concern, necessitating innovative approaches for early detection and intervention. This paper focuses on identifying suicidal intentions in posts from the SuicideWatch subreddit by proposing a novel deep-learning approach that utilizes the state-of-the-art RoBERTa-CNN model. The Robustly Optimized BERT Pretraining Approach (RoBERTa) excels at capturing textual nuances and forming semantic relationships within the text. The other component, a Convolutional Neural Network (CNN) head, enhances RoBERTa's capacity to discern critical patterns from extensive datasets. To evaluate RoBERTa-CNN, we conducted experiments on the Suicide and Depression Detection dataset, yielding promising results. For instance, RoBERTa-CNN achieves a mean accuracy of 98% with a standard deviation (STD) of 0.0009. Additionally, we found that data quality significantly impacts the training of a robust model. To improve data quality, we removed noise from the text data while preserving its contextual content, either by manual cleaning or by using the OpenAI API.
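
The abstract does not specify how noise was removed, so the following is only a minimal sketch of what rule-based cleaning of a post might look like, not the authors' procedure. The function name clean_post, the regular expressions, and the choice of what counts as noise (HTML entities, URLs, Reddit user and subreddit mentions, stray symbols) are all assumptions made for illustration.

```python
import html
import re

# All patterns below are illustrative assumptions, not rules from the paper.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
# Reddit-style mentions such as u/name or /r/subreddit.
MENTION_RE = re.compile(r"(?<!\w)/?[ur]/[A-Za-z0-9_-]+")
# Anything that is not a word character, whitespace, or common punctuation.
NON_TEXT_RE = re.compile(r"[^\w\s.,!?;:'\"()-]")
WHITESPACE_RE = re.compile(r"\s+")


def clean_post(text: str) -> str:
    """Remove common noise from a post while keeping its wording intact."""
    text = html.unescape(text)          # decode entities such as &amp;
    text = URL_RE.sub(" ", text)        # drop links
    text = MENTION_RE.sub(" ", text)    # drop user and subreddit mentions
    text = NON_TEXT_RE.sub(" ", text)   # drop stray symbols and emoji
    return WHITESPACE_RE.sub(" ", text).strip()  # collapse whitespace


if __name__ == "__main__":
    raw = "Nobody   listens &amp; I am tired ★ https://example.com/x?y=1"
    print(clean_post(raw))
```

Sentence punctuation and casing are deliberately left untouched here, since the abstract stresses preserving contextual content; a stricter or looser rule set may suit a different dataset.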

Authors

  • Emily Lin
    Division of Gynecology, Department of Obstetrics and Gynecology, University of Texas Southwestern Medical Center, Dallas, Texas, USA.
  • Jian Sun
    Department of Computer Science, University of Denver, 2155 E Wesley Ave, Denver, Colorado, 80210, United States of America.
  • Hsingyu Chen
  • Mohammad H Mahoor
    Department of Computer Engineering, University of Denver, 2155 E Wesley Ave, Denver, Colorado, 80210, United States of America.