Interpretable machine learning for cattle breed classification and SNP prioritization.

Journal: Genetics, selection, evolution : GSE

Abstract

BACKGROUND: The conservation of endangered cattle breeds is a priority for maintaining biodiversity and preserving unique genetic resources. Traditional conservation methods are often not precise enough to assign animals accurately among closely related breeds. The aim of this study was to develop a machine learning classification model based on single nucleotide polymorphisms (SNPs) to improve breed identification and to identify breed-discriminating markers.

RESULTS: We applied a tuned Light Gradient Boosting Machine (LightGBM) and a Random Forest (RF) classifier to genome-wide SNP data from 6850 individuals representing 11 endangered Austrian cattle breeds. Hyperparameters were tuned within the training sets using five-fold cross-validation. The final models were then evaluated on independent test sets (20% of the data) across six random seeds, yielding mean classification accuracies of 0.842 for LightGBM and 0.837 for RF. SHapley Additive exPlanations (SHAP) values were used to interpret the model predictions. Feature importance analysis identified the top 100 SNPs contributing most to breed separation. Some SNPs were highly specific for individual breeds, while others reflected broader population genetic structure. In particular, ARS-BFGL-NGS-97995 distinguished major breed clusters, whereas DIAS-308, ARS-BFGL-NGS-843, and ARS-BFGL-NGS-3513 were highly breed-specific for Fleckvieh, Brown Swiss, and Original Braunvieh, respectively.

CONCLUSIONS: The machine learning methods provide scalable tools for accurate breed prediction from genomic data, and the identified SNPs provide practical markers that can support breed management, monitoring programs, and policy strategies. By combining efficient classification with interpretable feature analysis, our study offers a useful framework for genomic conservation.
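The evaluation protocol described in the RESULTS (a 20% hold-out test set, repeated over six random seeds, with accuracies averaged) can be illustrated with a minimal standard-library sketch. Everything in it is an assumption made for illustration: the abstract does not say whether the splits were stratified by breed, the toy genotypes and breed names are synthetic, and the nearest-centroid classifier is only a stand-in for the slot where a tuned LightGBM or RF model would go. Hyperparameter tuning and SHAP analysis are not shown.

```python
import random
from collections import defaultdict
from statistics import mean


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx), sampling the test share within each breed."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    train_idx, test_idx = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        n_test = max(1, round(len(idx) * test_fraction))
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return train_idx, test_idx


def nearest_centroid_fit_predict(X_train, y_train, X_test):
    """Stand-in classifier (NOT the study's model): assign to the nearest breed centroid."""
    sums, counts = {}, defaultdict(int)
    for x, label in zip(X_train, y_train):
        if label not in sums:
            sums[label] = [0.0] * len(x)
        sums[label] = [s + v for s, v in zip(sums[label], x)]
        counts[label] += 1
    centroids = {c: [s / counts[c] for s in sums[c]] for c in sums}

    def predict(x):
        return min(
            centroids,
            key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])),
        )

    return [predict(x) for x in X_test]


def mean_accuracy_over_seeds(X, y, fit_predict, seeds=range(6), test_fraction=0.2):
    """Repeat the hold-out evaluation once per seed and average the test accuracy."""
    accuracies = []
    for seed in seeds:
        train_idx, test_idx = stratified_split(y, test_fraction, seed)
        predictions = fit_predict(
            [X[i] for i in train_idx],
            [y[i] for i in train_idx],
            [X[i] for i in test_idx],
        )
        correct = sum(p == y[i] for p, i in zip(predictions, test_idx))
        accuracies.append(correct / len(test_idx))
    return mean(accuracies), accuracies


def make_toy_genotypes(n_per_breed=50, n_snps=20, seed=42):
    """Synthetic 0/1/2 genotypes; breed names and allele frequencies are made up."""
    rng = random.Random(seed)
    X, y = [], []
    for breed, freq in (("breed_A", 0.2), ("breed_B", 0.5), ("breed_C", 0.8)):
        for _ in range(n_per_breed):
            # Each genotype is the count of alternate alleles over two draws.
            X.append([(rng.random() < freq) + (rng.random() < freq) for _ in range(n_snps)])
            y.append(breed)
    return X, y


X, y = make_toy_genotypes()
mean_acc, per_seed = mean_accuracy_over_seeds(X, y, nearest_centroid_fit_predict)
print(f"mean accuracy over {len(per_seed)} seeds: {mean_acc:.3f}")
```

Because `mean_accuracy_over_seeds` only requires a callable taking `(X_train, y_train, X_test)` and returning predicted labels, a wrapper around any other classifier could be passed in place of the stand-in without changing the evaluation loop.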
