Leveraging unsupervised machine learning techniques for detecting outliers in the daily milk yield data of dairy cows.

Journal: Journal of Dairy Science
Published Date:

Abstract

The lactation curve is essential for developing effective feeding plans, optimizing breeding, and strategizing milk production for dairy farms. However, health disorders, as well as external factors such as heat stress, dietary changes, and certain management practices can cause perturbations (temporary drops in milk yield) that shift the fitted lactation curve downward, making it difficult to accurately estimate the potential lactation ability of dairy cows. This study aims to evaluate the applicability of unsupervised machine learning techniques for detecting outliers in daily milk yield data and estimating the expected lactation curve in the absence of perturbations, referred to as the unperturbed lactation curve (ULC). Using the Wood model as the baseline lactation curve, we compared ULC derived from 3 unsupervised machine learning models (UMLM), specifically one-class support vector machines, isolation forest, and local outlier factor, with those from 2 previously proposed models: the perturbed lactation model (PLM) and the iterative Wood model (IWM). We first conducted a simulation study using 1,000 simulated lactations over a 305-d period, each including 1 to 15 perturbations (mean ± SD: 4.00 ± 1.46), to assess perturbation detection performance. Across all UMLM, sensitivities (∼61%), precisions (∼82%), and their harmonic means (F scores, ∼70%) did not differ significantly. The UMLM outperformed the baseline Wood model in sensitivity (51.5%) and F score (64.2%) while maintaining comparable precision (83.8%). Their F scores also exceeded those of the PLM (53.2%) and IWM (66.8%), indicating more balanced curve adjustment and improved perturbation detection. We then applied the models to observed daily milk yield data from 2,831 lactation records of 1,636 Holstein cows collected over a 10-year period at the University of Wisconsin-Madison Agricultural Research Station. 
The comparison focused on the goodness-of-fit of ULC, computational efficiency, curve shape, and the validity of identified perturbations. The UMLM demonstrated relatively high computational efficiency in establishing the ULC, and these ULC showed better goodness-of-fit and shapes more consistent with the baseline Wood curve than the PLM and IWM. The upward shifts in the ULC from the UMLM were more conservative than those from the IWM and PLM, yet seemed reasonable based on previous reports on the impact of health disorders on milk yield. Additionally, these upward shifts by the UMLM may help identify potential perturbations that went undetected with the baseline Wood curve. In contrast, the PLM and IWM showed limitations in detecting potential perturbations, especially during early lactation. These findings suggest that unsupervised machine learning techniques can effectively detect potential outliers in daily milk yield data and adequately estimate the expected lactation curve in the absence of perturbations. However, the generalizability of the findings may be limited by the use of data from only Holstein cows at a single farm and the absence of health, environmental, and management records. Moreover, the current UMLM do not account for fixed effects (e.g., breed, parity, calving season) or long-term impacts of health disorders, which may hinder accurate lactation curve modeling. Future studies should consider incorporating more flexible modeling approaches and multifarm datasets with detailed background records.
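
As an illustration of two quantities named in the abstract, the following minimal sketch writes out the standard Wood lactation curve, y(t) = a · t^b · exp(−c · t), and the F score as the harmonic mean of precision and sensitivity. The parameter values (a, b, c) are hypothetical and chosen only for demonstration; they are not taken from the study, and the sketch does not reproduce the authors' fitting or outlier-detection procedure.

```python
import math


def wood(t, a, b, c):
    """Wood lactation curve: expected daily yield on day in milk t (t > 0)."""
    return a * t ** b * math.exp(-c * t)


def f_score(precision, sensitivity):
    """Harmonic mean of precision and sensitivity."""
    if precision + sensitivity == 0:
        return 0.0
    return 2 * precision * sensitivity / (precision + sensitivity)


# Hypothetical Wood parameters (assumed for illustration, not from the study).
a, b, c = 20.0, 0.2, 0.004

# Evaluate the curve over a 305-d lactation, days 1 to 305.
days = range(1, 306)
curve = [wood(t, a, b, c) for t in days]

# The Wood curve peaks where its derivative is zero, i.e. at t = b / c.
peak_day = max(days, key=lambda t: wood(t, a, b, c))
print("peak day:", peak_day, "analytic b/c:", b / c)

# Approximate precision and sensitivity reported for the UMLM in the abstract.
print("F score:", round(f_score(0.82, 0.61), 3))
```

With the rounded values quoted in the abstract (precision about 82%, sensitivity about 61%), the harmonic mean comes out close to the reported F score of about 70%. Because the reported figures may be averages over individual lactations, applying the formula to the averaged precision and sensitivity need not match the reported F score exactly for every model.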

Authors

  • Shogo Higaki
    National Institute of Animal Health, National Agriculture and Food Research Organization, Tsukuba, Ibaraki, 305-0856, Japan.
  • Eduardo Noronha de Andrade Freitas
    Department of Animal and Dairy Sciences, University of Wisconsin-Madison, Madison, WI 53706; Department of Informatic, Federal Institute of Goiás, Goiânia, Goiás 74130-012, Brazil.
  • Ariana Negreiro
    Department of Animal and Dairy Sciences, University of Wisconsin-Madison, Madison, WI 53703.
  • João R R Dórea
    Department of Animal and Dairy Sciences, University of Wisconsin-Madison, Madison, WI.
  • Victor E Cabrera
    Department of Animal and Dairy Sciences, University of Wisconsin-Madison, Madison, WI 53706. Electronic address: vcabrera@wisc.edu.