An integrated analytical framework for gender-based violence research: A simulation study combining machine learning and causal inference
Journal: medRxiv
Published Date: Jan 1, 2025
Abstract
Current research on gender-based violence (GBV) typically treats predictive machine learning and causal inference as separate analytical silos. Yet understanding the multi-level determinants of violence requires an approach that can both identify high-value predictors and disentangle the mechanisms through which they operate. This study develops and validates an integrated five-phase analytical framework to show how combining machine learning with structural causal modeling improves the detection of determinants within the ecological model. Using a Monte Carlo simulation (N = 3,000) parameterized to replicate the prevalence patterns of the 2022 Kenya Demographic and Health Survey (KDHS), the framework was applied in five phases: descriptive epidemiology, Random Forest variable selection, logistic regression for adjusted associations, mediation analysis, and evidence synthesis. The Random Forest model recovered the programmed data structure, achieving 74.6% prediction accuracy (AUC = 0.732) and correctly ranking partner alcohol use, childhood trauma, and marital conflict as the top predictors. Subsequent logistic regression confirmed these predictors, yielding stable adjusted odds ratios for partner alcohol use (OR = 6.60) and childhood trauma (OR = 1.99). The mediation phase further quantified the indirect pathway operating through marital conflict, demonstrating the framework's capacity to capture indirect mechanisms that traditional single-model analyses often miss. These findings indicate that integrating the variable-selection power of machine learning with the rigor of causal inference offers a more robust tool for analyzing complex health data than either method alone. The proposed framework provides a replicable blueprint for future research, helping ensure that GBV interventions are grounded in both predictive precision and causal understanding.
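The simulation, regression, and mediation phases can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' code: the prevalences, data-generating coefficients, variable names, the choice of marital conflict as a continuous mediator of partner alcohol use, and the helper functions `simulate`, `fit_logistic`, and `analyse` are all hypothetical. The Random Forest phase is omitted so that the sketch depends only on NumPy. The indirect effect is estimated with the mediation formula (natural direct and indirect effects on the risk-difference scale), which is one standard option among several.

```python
import numpy as np

def expit(x):
    return 1.0 / (1.0 + np.exp(-x))

def fit_logistic(X, y, n_iter=25):
    """Logistic regression by Newton-Raphson; X must already contain an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = expit(X @ beta)
        w = p * (1.0 - p)
        grad = X.T @ (y - p)                 # score vector
        hess = X.T @ (X * w[:, None])        # observed information X'WX
        beta = beta + np.linalg.solve(hess, grad)
    return beta

def simulate(n=3000, seed=1):
    # HYPOTHETICAL data-generating values, NOT the study's KDHS-calibrated parameters.
    rng = np.random.default_rng(seed)
    trauma = rng.binomial(1, 0.30, n)
    alcohol = rng.binomial(1, 0.25, n)
    # Assumed continuous mediator: alcohol -> conflict -> GBV.
    conflict = 0.8 * alcohol + 0.3 * trauma + rng.normal(0.0, 1.0, n)
    logit = -1.5 + 1.2 * alcohol + 0.6 * trauma + 0.5 * conflict
    gbv = rng.binomial(1, expit(logit))
    return trauma, alcohol, conflict, gbv

def analyse(trauma, alcohol, conflict, gbv):
    one = np.ones(len(gbv))
    # Adjusted odds ratios from the outcome model.
    X_out = np.column_stack([one, alcohol, trauma, conflict])
    b = fit_logistic(X_out, gbv)
    odds_ratios = dict(zip(["alcohol", "trauma", "conflict"], np.exp(b[1:])))
    # Mediator model by ordinary least squares.
    X_med = np.column_stack([one, alcohol, trauma])
    a, *_ = np.linalg.lstsq(X_med, conflict, rcond=None)
    resid = conflict - X_med @ a

    def mediator(level):
        # Counterfactual mediator M(level), keeping each person's own residual.
        return a[0] + a[1] * level + a[2] * trauma + resid

    def risk(x_level, m_level):
        # Mean predicted P(GBV) with exposure set to x_level and mediator set to M(m_level).
        return expit(b[0] + b[1] * x_level + b[2] * trauma + b[3] * mediator(m_level)).mean()

    nde = risk(1, 0) - risk(0, 0)   # natural direct effect
    nie = risk(1, 1) - risk(1, 0)   # natural indirect effect via conflict
    return odds_ratios, nde, nie

odds_ratios, nde, nie = analyse(*simulate())
print({k: round(float(v), 2) for k, v in odds_ratios.items()})
print(f"NDE={nde:.3f}, NIE={nie:.3f}, proportion mediated={nie / (nde + nie):.2f}")
```

Because the simulated truth is known, the estimated odds ratios can be checked against the exponentiated programmed coefficients (about 3.3, 1.8, and 1.6 here). This is the same logic the framework uses when it asks whether each phase recovers the programmed data structure.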