Determination of milk yield in water buffaloes using multi-class logistic regression and machine learning methods.
Journal:
Tropical Animal Health and Production
Published Date:
Jul 28, 2025
Abstract
In this study, Random Forest (RF), Gradient Boosting Machines (GBM), Support Vector Machines (SVM), and Multi-Class Logistic Regression (MCLR) models were comparatively evaluated for predicting milk yield in water buffaloes. The main aim was to compare how accurately each model predicted milk yield. In response to reviewer feedback, the methodology was extended to include stratified 8-fold cross-validation, hyperparameter tuning for RF, GBM, and SVM, and removal of the AGE variable because of multicollinearity. The revised dataset comprised the following features: lactation period (LP), lactation milk yield (LDMY), age at first pregnancy (1stPregAge), and feed type. Models were trained and evaluated in Python 3.7 using the Pandas, NumPy, Scikit-learn, and Matplotlib libraries. In the updated results, the GBM model outperformed the other models, achieving an average accuracy of 64.63%, weighted precision of 0.6578, recall of 0.6463, F1-score of 0.6311, and ROC AUC of 0.6625. Although predictive performance remains moderate, these results demonstrate the potential of advanced machine learning methods.
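The evaluation protocol summarized above (stratified 8-fold cross-validation, hyperparameter tuning of RF, GBM, and SVM, and weighted multi-class metrics) can be sketched with Scikit-learn roughly as follows. This is a minimal illustration under stated assumptions, not the authors' code. The data are a randomly generated stand-in; the feature scales, the feed-type categories (`A`, `B`, `C`), the three-class yield label, the helper `make_pipeline_for`, and all hyperparameter grids are hypothetical, because the abstract does not report them. The multi-class ROC AUC is assumed to be the one-vs-rest, class-weighted variant. MCLR gets a single-value grid so that it is effectively left untuned, as the abstract describes.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.svm import SVC

# Hypothetical synthetic stand-in for the buffalo records (NOT the study's data).
rng = np.random.default_rng(42)
n = 240
df = pd.DataFrame({
    "LP": rng.normal(270, 30, n),                 # lactation period, assumed scale
    "LDMY": rng.normal(7.0, 1.5, n),              # milk yield feature, assumed scale
    "1stPregAge": rng.normal(30, 4, n),           # age at first pregnancy, assumed scale
    "feed_type": rng.choice(["A", "B", "C"], n),  # hypothetical feed categories
})
# Hypothetical 3-class yield label (0 = low, 1 = medium, 2 = high) from features plus noise.
latent = 0.5 * df["LDMY"] + 0.02 * df["LP"] - 0.05 * df["1stPregAge"] + rng.normal(0, 1, n)
y = pd.qcut(latent, q=3, labels=False)
X = df[["LP", "LDMY", "1stPregAge", "feed_type"]]  # AGE already excluded


def make_pipeline_for(estimator):
    """Scale numeric features, one-hot encode feed type, then classify."""
    pre = ColumnTransformer([
        ("num", StandardScaler(), ["LP", "LDMY", "1stPregAge"]),
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["feed_type"]),
    ])
    return Pipeline([("pre", pre), ("clf", estimator)])


# Candidate models with small illustrative grids (assumed, not the paper's grids).
models = {
    "MCLR": (LogisticRegression(max_iter=1000), {"clf__C": [1.0]}),  # untuned
    "RF": (RandomForestClassifier(random_state=0),
           {"clf__n_estimators": [100, 200], "clf__max_depth": [None, 5]}),
    "GBM": (GradientBoostingClassifier(random_state=0),
            {"clf__n_estimators": [100, 200], "clf__learning_rate": [0.05, 0.1]}),
    "SVM": (SVC(probability=True, random_state=0),  # probabilities needed for ROC AUC
            {"clf__C": [0.1, 1.0, 10.0], "clf__kernel": ["rbf", "linear"]}),
}

# Weighted multi-class metrics corresponding to those named in the abstract.
scoring = {
    "accuracy": "accuracy",
    "precision": "precision_weighted",
    "recall": "recall_weighted",
    "f1": "f1_weighted",
    "roc_auc": "roc_auc_ovr_weighted",
}

# Stratified 8-fold CV keeps the class proportions roughly equal in every fold.
cv = StratifiedKFold(n_splits=8, shuffle=True, random_state=0)
results = {}
for name, (estimator, grid) in models.items():
    search = GridSearchCV(make_pipeline_for(estimator), grid, scoring=scoring,
                          refit="accuracy", cv=cv)
    search.fit(X, y)
    best = search.best_index_
    # Mean of each metric over the 8 folds for the most accurate configuration.
    results[name] = {m: search.cv_results_["mean_test_" + m][best] for m in scoring}

for name, scores in results.items():
    print(name, {m: round(float(v), 4) for m, v in scores.items()})
```

Because each model is tuned and scored on the same folds, the scores of the selected configuration are somewhat optimistic; wrapping the `GridSearchCV` in an outer cross-validation loop (nested cross-validation) would give a less biased estimate at higher cost. Note also that weighted recall is algebraically equal to accuracy, since each class's recall is weighted by that class's share of the samples; this is consistent with the abstract reporting a recall of 0.6463 alongside an accuracy of 64.63%.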