When Timing Matters: Evaluating Temporal Leakage in Machine Learning Models of Football Pass Turnovers.

Journal: Research Quarterly for Exercise and Sport

Abstract

The Expected Pass Turnovers (xPT) model advances the quantification of turnover probability in professional football, but its inclusion of post-pass descriptive features such as ball speed and distance moved introduces temporal leakage and limits real-time tactical utility. This study compares the original (default) xPT framework with leakage-corrected alternatives across four modeling approaches: mixed-effects logistic regression, penalized logistic regression, random forest, and gradient boosting. Using 256,433 passes from the 2020-21 English Premier League, we evaluated leakage-inclusive and leakage-corrected feature sets under grouped cross-validation, reporting cross-validated ROC-AUC, accuracy, sensitivity, specificity, F-measure, and Brier score. Removing post-execution features reduced ROC-AUC by 0.082-0.183 (mean = 0.136), with tree-based methods losing more performance than logistic approaches. The best-performing leakage-corrected model, gradient boosting (ROC-AUC = 0.742), approached the default mixed-effects logistic model (ROC-AUC = 0.789), indicating that substantial predictive signal remains after leakage correction. The leakage-corrected models retained acceptable sensitivity and calibration, supporting their use in prospective tactical deployment. SHAP analysis showed that the default models relied heavily on four post-pass descriptive variables, whereas the leakage-corrected models shifted toward pressing intensity and tactical context variables that are available before pass execution. The findings suggest that default xPT-style models are well suited to retrospective analysis, while leakage-corrected models are more appropriate for real-time tactical decision-making.
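
The comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline. It assumes synthetic passes, two hypothetical features (`pressure`, known before the pass, and `distance_moved`, observed only after it), folds grouped by a hypothetical match identifier, and a small hand-rolled logistic regression standing in for the four model families. The only point is to show how the cross-validated ROC-AUC gap between a leakage-inclusive and a leakage-corrected feature set can be measured.

```python
import math
import random


def make_passes(n=1200, n_matches=20, seed=0):
    """Synthetic passes (illustrative only): (match_id, [pressure, distance_moved], turnover)."""
    rng = random.Random(seed)
    rows = []
    for i in range(n):
        match_id = i % n_matches
        pressure = rng.gauss(0.0, 1.0)  # assumed to be known before the pass
        p_turnover = 1.0 / (1.0 + math.exp(-(0.8 * pressure - 1.0)))
        turnover = 1 if rng.random() < p_turnover else 0
        # Assumed post-pass feature: generated from the outcome, so it leaks the label.
        distance_moved = rng.gauss(-1.5 if turnover else 1.5, 1.0)
        rows.append((match_id, [pressure, distance_moved], turnover))
    return rows


def grouped_folds(groups, k):
    """Yield (train, test) index lists so that no match appears in both."""
    for fold in range(k):
        test = [i for i, g in enumerate(groups) if g % k == fold]
        train = [i for i, g in enumerate(groups) if g % k != fold]
        yield train, test


def predict(w, x):
    """Logistic prediction; the last weight is the intercept."""
    z = w[-1] + sum(wj * xj for wj, xj in zip(w, x))
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, epochs=60, lr=0.5):
    """Plain batch gradient descent on the logistic loss."""
    w = [0.0] * (len(X[0]) + 1)
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict(w, xi) - yi
            for j, xij in enumerate(xi):
                grad[j] += err * xij
            grad[-1] += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w


def roc_auc(scores, labels):
    """Pairwise ROC-AUC: share of (positive, negative) pairs ranked correctly."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def cv_auc(rows, cols, k=5):
    """Mean ROC-AUC over grouped folds, using only the feature columns in `cols`."""
    groups = [r[0] for r in rows]
    X = [[r[1][c] for c in cols] for r in rows]
    y = [r[2] for r in rows]
    aucs = []
    for train, test in grouped_folds(groups, k):
        w = fit_logistic([X[i] for i in train], [y[i] for i in train])
        scores = [predict(w, X[i]) for i in test]
        aucs.append(roc_auc(scores, [y[i] for i in test]))
    return sum(aucs) / k


rows = make_passes()
auc_leaky = cv_auc(rows, cols=[0, 1])  # leakage-inclusive: pre-pass + post-pass feature
auc_clean = cv_auc(rows, cols=[0])     # leakage-corrected: pre-pass feature only
print(f"leakage-inclusive AUC: {auc_leaky:.3f}")
print(f"leakage-corrected AUC: {auc_clean:.3f}")
print(f"AUC drop after correction: {auc_leaky - auc_clean:.3f}")
```

Grouping folds by match keeps passes from the same game out of both the training and the test split, which is the usual motivation for grouped cross-validation. The size of the printed gap depends entirely on the synthetic data-generating assumptions and says nothing about the values reported in the study.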
