Comparison of machine learning and validation methods for high-dimensional accelerometer data to detect foot lesions in dairy cattle.
Journal:
PloS one
Published Date:
Jun 27, 2025
Abstract
Lameness is one of the major production diseases affecting dairy cattle. It is associated with poor welfare in affected cattle, economic losses at the farm level, and adverse effects on sustainability. Prompt identification of lameness is necessary to facilitate early treatment, enhance animal welfare, and mitigate the short- and long-term production impacts of the disease. In recent years, automated detection systems have emerged as a potential solution for identifying early signs of lameness. Among these systems, accelerometers have been widely adopted because they continuously capture data on animal movement. Analyzing accelerometer data is challenging because of its wide, high-dimensional structure: it has many features but typically far fewer animals or samples, which reduces the utility of many machine learning (ML) models and increases the risk of overfitting. To handle this, researchers often summarize accelerometer data into indices such as step counts, which simplifies analysis but may sacrifice details needed for accurate prediction of lameness. Dimension reduction techniques, such as principal component analysis (PCA) and functional principal component analysis (fPCA), offer an alternative: they reduce the dimensionality of the data while retaining key information, which allows a broader set of ML approaches to be applied. Using 20,000 recordings from 383 dairy cows in 11 dairy herds, this study evaluated the effectiveness of ML methods in detecting foot lesions in dairy cows from accelerometer data, with a focus on dimensionality reduction approaches and cross-validation strategies. Our study offers practical insights for the dairy industry by highlighting the potential benefits of combining dimensionality reduction with cross-validation strategies to improve the performance of ML methods applied to wide accelerometer data.
In addition, our study highlights the impact and importance of using data from independent farms. A by-farm approach to cross-validation will likely give a more robust, realistic estimate of general model performance.
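
The by-farm idea can be illustrated with a minimal sketch. It is not the authors' code: the function name `by_farm_folds`, the greedy balancing rule, and the toy farm labels are all assumptions made for illustration. The sketch only shows the defining property of by-farm cross-validation, namely that every recording from a given farm lands in the same fold, so a model is always tested on farms it did not see during training.

```python
from collections import defaultdict


def by_farm_folds(farm_ids, n_folds):
    """Yield (train_idx, test_idx) pairs in which no farm is on both sides.

    farm_ids: one farm label per recording (hypothetical input layout).
    n_folds:  number of folds, between 2 and the number of distinct farms.
    """
    # Group row indices by the farm they came from.
    farm_to_rows = defaultdict(list)
    for row, farm in enumerate(farm_ids):
        farm_to_rows[farm].append(row)

    if n_folds < 2 or n_folds > len(farm_to_rows):
        raise ValueError("n_folds must be between 2 and the number of farms")

    # Greedy balancing (an assumption, not from the paper): place the largest
    # farms first, each into whichever fold currently holds the fewest rows.
    folds = [[] for _ in range(n_folds)]
    largest_first = sorted(
        farm_to_rows, key=lambda f: (-len(farm_to_rows[f]), str(f))
    )
    for farm in largest_first:
        min(folds, key=len).extend(farm_to_rows[farm])

    # Each fold serves once as the held-out set; the rest form the training set.
    for k, test_idx in enumerate(folds):
        train_idx = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield sorted(train_idx), sorted(test_idx)


# Toy example: eight recordings from four made-up farms.
farm_ids = ["A", "A", "B", "B", "B", "C", "D", "D"]
splits = list(by_farm_folds(farm_ids, n_folds=4))

for train_idx, test_idx in splits:
    held_out = sorted({farm_ids[i] for i in test_idx})
    print(f"held-out farms: {held_out}, test rows: {test_idx}")
```

By contrast, a cross-validation split that assigns individual recordings to folds at random can place recordings from the same farm in both the training and test sets, which is the situation the by-farm approach is meant to avoid. Any dimensionality reduction step, such as PCA, would likewise need to be fitted on the training rows of each fold only.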