Evaluation of six machine learning classification algorithms in pig breed identification using SNPs array data.

Journal: Animal genetics
Published Date:

Abstract

Breed identification using multiple information sources and methods is widely applied in animal genetics and breeding. With the development of artificial intelligence, high-throughput genomic data are increasingly combined with machine learning techniques for breed identification. In this context, we used 654 individuals from 15 pig breeds to evaluate the performance of machine learning classifiers and stacking ensemble learning classifiers, as well as the role of feature selection and anomaly detection in different scenarios. Our results showed that, with a training set of 16 individuals per breed and 32 features (SNPs), the accuracy of breed identification with feature selection (eXtreme Gradient Boosting, XGBoost) could exceed 95.00% (nine breeds), an improvement of 7.04% over random selection. For stacking ensemble learning, feature selection methods (including random selection) were applied before the different base learners. When the base learners' training set had 16 individuals per breed and 32 features, the accuracy of stacking ensemble learning improved by 9.24% over the best base learner (nine breeds), but did not show a significant advantage over the models with XGBoost feature selection. With a training set of 16 individuals per breed and 512 features, breed identification with anomaly detection (local outlier factor, LOF) and random selection could achieve an accuracy of 89.06% (15 breeds). These results show that machine learning can be an effective tool for breed identification, and this study provides useful information for the application of machine learning in animal genetics and breeding.
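The abstract names the local outlier factor (LOF) as its anomaly detection method but does not describe how it was configured. The following is a minimal, illustrative sketch of the LOF computation in plain Python, not the authors' pipeline. The data are synthetic and every setting is an assumption made for the example: genotypes coded as 0/1/2 allele counts, 16 reference individuals, 32 SNPs, allele frequencies of 0.1 and 0.9, and k = 5 neighbours. The function names (lof_scores, simulate_genotypes) are hypothetical. As a simplification, each point uses exactly k neighbours, without the tie handling of the full LOF definition.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two genotype vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def lof_scores(points, k):
    """Local outlier factor of every point; values near 1 indicate inliers."""
    n = len(points)
    dist = [[euclidean(points[i], points[j]) for j in range(n)] for i in range(n)]

    # k nearest neighbours of each point (itself excluded) and its k-distance.
    neighbours, k_dist = [], []
    for i in range(n):
        order = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        neighbours.append(order[:k])
        k_dist.append(dist[i][order[k - 1]])

    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], dist[i][j]) for j in neighbours[i]]
        mean_reach = sum(reach) / k
        lrd.append(1.0 / mean_reach if mean_reach > 0 else float("inf"))

    # LOF: mean density of the neighbours divided by the point's own density.
    return [sum(lrd[j] for j in neighbours[i]) / (k * lrd[i]) for i in range(n)]


def simulate_genotypes(rng, n_individuals, n_snps, allele_freq):
    """Synthetic 0/1/2 genotypes: two independent allele draws per SNP."""
    return [
        [int(rng.random() < allele_freq) + int(rng.random() < allele_freq)
         for _ in range(n_snps)]
        for _ in range(n_individuals)
    ]


rng = random.Random(0)
# Assumed toy setting: 16 reference individuals and 32 SNPs, plus one
# individual drawn from very different allele frequencies.
reference = simulate_genotypes(rng, 16, 32, allele_freq=0.1)
stranger = simulate_genotypes(rng, 1, 32, allele_freq=0.9)
genotypes = reference + stranger

scores = lof_scores(genotypes, k=5)
most_anomalous = max(range(len(scores)), key=lambda i: scores[i])
print("index of most anomalous individual:", most_anomalous)
```

In this toy setting the last individual is the one drawn from different allele frequencies, so it should receive the largest score. Any cut-off for declaring an individual anomalous would have to be tuned on real data; the abstract does not report one.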

Authors

  • Ruiqi Liu
    National Engineering Research Center for Breeding Swine Industry, Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, Guangdong Laboratory of Lingnan Modern Agriculture, College of Animal Science, South China Agricultural University, Guangzhou, China.
  • Zhiting Xu
    National Engineering Research Center for Breeding Swine Industry, Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, Guangdong Laboratory of Lingnan Modern Agriculture, College of Animal Science, South China Agricultural University, Guangzhou, China.
  • Jinyan Teng
    National Engineering Research Center for Breeding Swine Industry, Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, Guangdong Laboratory of Lingnan Modern Agriculture, College of Animal Science, South China Agricultural University, Guangzhou, China.
  • Xiangchun Pan
    National Engineering Research Center for Breeding Swine Industry, Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, Guangdong Laboratory of Lingnan Modern Agriculture, College of Animal Science, South China Agricultural University, Guangzhou, China.
  • Qing Lin
    National Engineering Research Center for Breeding Swine Industry, Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, Guangdong Laboratory of Lingnan Modern Agriculture, College of Animal Science, South China Agricultural University, Guangzhou, China.
  • Xiaodian Cai
    National Engineering Research Center for Breeding Swine Industry, Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, Guangdong Laboratory of Lingnan Modern Agriculture, College of Animal Science, South China Agricultural University, Guangzhou, China.
  • Shuqi Diao
    National Engineering Research Center for Breeding Swine Industry, Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, Guangdong Laboratory of Lingnan Modern Agriculture, College of Animal Science, South China Agricultural University, Guangzhou, China.
  • Xueyan Feng
    National Engineering Research Center for Breeding Swine Industry, Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, Guangdong Laboratory of Lingnan Modern Agriculture, College of Animal Science, South China Agricultural University, Guangzhou, China.
  • Xiaolong Yuan
    National Engineering Research Center for Breeding Swine Industry, Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, Guangdong Laboratory of Lingnan Modern Agriculture, College of Animal Science, South China Agricultural University, Guangzhou, China.
  • Jiaqi Li
    Department of Critical Care Medicine, Shanghai East Hospital, Tongji University School of Medicine, Shanghai, 200120, People's Republic of China.
  • Zhe Zhang
    Department of Urology, The First Affiliated Hospital of China Medical University, Shenyang, Liaoning 110001, China.