Machine learning and natural language processing for the early detection of potential mental disorders among school-age children: a prospective birth cohort study
Journal: medRxiv
Published Date: Jan 1, 2025
Abstract
Early detection of childhood mental health disorders remains challenging because current screening approaches lack sensitivity to subtle psychological indicators and rely heavily on observable behaviors. We investigated whether integrating machine learning with natural language processing of children's written expressions could enhance early detection of potential mental disorders among school-age children. This prospective birth cohort study used National Child Development Study (NCDS) data, analyzing 8,981 children born in 1958 in the United Kingdom. Mental health outcomes were assessed at age 11 using the Bristol Social Adjustment Guide (BSAG) and the Rutter A Scale, with cases defined at two thresholds: scores above the 95th percentile and scores above the 90th percentile. Predictive models combined traditional risk factors with natural language features extracted from children's essays describing their imagined future at age 25. We developed eight machine learning models using different predictor combinations and evaluated performance with the area under the receiver operating characteristic curve (reported below as ROC). Using the BSAG 95th percentile threshold, models combining the top five selected variables with essay features achieved significantly higher predictive capability (ROC: 0.77, 95% CI: 0.71-0.83) than models using all variables (ROC: 0.70, 95% CI: 0.63-0.76) or essay features alone (ROC: 0.67, 95% CI: 0.60-0.74). At the 90th percentile threshold, this integrated approach showed a similar improvement (ROC: 0.81, 95% CI: 0.78-0.85). Key predictors included gestational length, maternal parity, parental age, residential characteristics, parental engagement metrics, and children's BMI. Sensitivity analyses using the Rutter A Scale confirmed these findings. Combining machine learning with natural language processing of children's future-oriented essays offers a promising approach for early detection of childhood mental health disorders. This integrated screening method could facilitate more timely intervention, though validation in contemporary populations is needed before clinical implementation.
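The abstract defines cases by percentile cutoffs and reports discrimination as ROC values with 95% confidence intervals, but it does not say how either was computed. The Python sketch below, using only the standard library, shows one plausible way to do both. The function names, the nearest-rank percentile convention, the strict "above the cutoff" rule, and the percentile bootstrap for the interval are all assumptions made for illustration; the authors may have used a different percentile definition, DeLong intervals, or a library such as scikit-learn.

```python
import math
import random
from typing import List, Sequence, Tuple


def percentile_cutoff(scores: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile (an assumed convention; the abstract does not specify one)."""
    ordered = sorted(scores)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def define_cases(scores: Sequence[float], pct: float) -> List[int]:
    """Hypothetical helper: label a child a case (1) if their score is strictly above the cutoff."""
    cutoff = percentile_cutoff(scores, pct)
    return [1 if s > cutoff else 0 for s in scores]


def roc_auc(labels: Sequence[int], risk: Sequence[float]) -> float:
    """Area under the ROC curve via the Mann-Whitney rank-sum identity; ties get average ranks."""
    n = len(risk)
    n_pos = sum(labels)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one case and one non-case")
    order = sorted(range(n), key=lambda i: risk[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Extend j over the block of tied risk scores that starts at sorted position i.
        j = i
        while j + 1 < n and risk[order[j + 1]] == risk[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank shared by the tied block
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def bootstrap_auc_ci(
    labels: Sequence[int],
    risk: Sequence[float],
    n_boot: int = 1000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile-bootstrap confidence interval for the AUC (an assumed method)."""
    roc_auc(labels, risk)  # fail fast if only one class is present
    rng = random.Random(seed)
    n = len(labels)
    aucs = []
    while len(aucs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # resample children with replacement
        y = [labels[i] for i in idx]
        if 0 < sum(y) < n:  # skip resamples that lack either cases or non-cases
            aucs.append(roc_auc(y, [risk[i] for i in idx]))
    aucs.sort()
    lo = aucs[int(alpha / 2 * n_boot)]
    hi = aucs[min(n_boot - 1, int((1 - alpha / 2) * n_boot) - 1)]
    return lo, hi
```

Running `bootstrap_auc_ci` on each model's predicted risks for the same held-out children gives one interval per model, similar in form to those reported above. A formal comparison between two models would additionally need a paired test (for example DeLong's) computed on those same children.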