Limited Predictability of Client Attendance in a Support Program for HIV Vertical Transmission Prevention: A Comparison of Machine Learning and Community Health Worker Predictions

Journal: medRxiv
Published Date:

Abstract

Client attendance is vital for the success of HIV vertical transmission prevention programs, yet 23.4% of clients missed follow-up appointments after enrolling in a community health worker-led program (n=24,807, Aug-Dec 2022). Predicting which clients are most likely to miss appointments could enable targeted interventions to improve retention. While machine learning appears well-suited to this prediction task, its effectiveness compared to community health worker judgment remains unexplored. We evaluated three machine learning approaches (logistic regression, balanced random forest, and gradient-boosted trees) trained on client enrollment records (n=51,297 training; n=18,577 test) and compared their performance to predictions by community health workers (n=61), who interact directly with clients and have contextual insight. Machine learning models achieved modest predictive performance, with the balanced random forest achieving the highest ROC AUC (0.689). Community health worker predictions similarly showed low accuracy despite their rich contextual knowledge, suggesting that attendance is fundamentally difficult to predict. Qualitative insights identified complex and dynamic barriers, including stigma, transportation difficulties, and competing personal commitments, that are often unpredictable at enrollment. These findings highlight fundamental limitations in predicting client attendance and suggest that merely accumulating additional data might not enhance predictive accuracy. Instead, resources might be better allocated to addressing the systemic barriers identified by community health workers.
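
To make the reported metric concrete: ROC AUC can be read as the probability that a randomly chosen client who missed an appointment receives a higher predicted risk score than a randomly chosen client who attended, with ties counted as one half. A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking. The minimal Python sketch below computes this quantity by direct pairwise comparison. The function name `roc_auc` and the toy labels and scores are hypothetical and chosen only for illustration; they are not the study's data or code.

```python
def roc_auc(labels, scores):
    """ROC AUC by pairwise comparison.

    Returns the fraction of (positive, negative) pairs in which the
    positive case has the higher score, counting ties as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (not from the study):
# label 1 = missed appointment, label 0 = attended.
labels = [1, 0, 1, 0, 0, 1, 0, 0]
scores = [0.9, 0.2, 0.4, 0.5, 0.1, 0.7, 0.3, 0.6]

auc = roc_auc(labels, scores)
print(f"{auc:.3f}")  # 0.867 (13 of 15 pairs ranked correctly)
```

The pairwise loop is quadratic in the number of clients, so it is suitable only for small illustrations; for test sets of the size reported here, a rank-based implementation from an established library would be the practical choice.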

Authors

  • Matthew Olckers; Alexander Lam; Lazola Makhupula; Mhlasakululeka Mvubu