A high-throughput screening method for selecting feature SNPs to evaluate breed diversity and infer ancestry.

Journal: Genome Research
Published Date:

Abstract

As the scale of deep whole-genome sequencing (WGS) data has grown exponentially, hundreds of millions of single nucleotide polymorphisms (SNPs) have been identified in livestock. Using these massive SNP data sets in population stratification analysis, ancestry prediction, and breed diversity assessment leads to overfitting in computational models and creates computational bottlenecks. Selecting highly informative genetic variants for population diversity studies and ancestry inference is therefore critically important. Here, we develop a method, HITSNP, that combines feature selection and machine learning algorithms to select highly representative SNPs that can effectively estimate breed diversity and infer ancestry. HITSNP outperforms existing feature selection methods in estimation accuracy and computational stability. Furthermore, HITSNP offers a new algorithm that predicts the number and composition of ancestral populations from a small number of SNPs while avoiding the need to calculate the number of clusters. Taken together, HITSNP facilitates research on population structure, animal breeding, and animal resource protection.

Authors

  • Meilin Zhang
    National Engineering Laboratory for Animal Breeding, Key Laboratory of Animal Genetics, Breeding and Reproduction, Ministry of Agriculture; State Key Laboratory of Animal Biotech Breeding; Frontiers Science Center for Molecular Design Breeding (MOE); College of Animal Science and Technology, China Agricultural University, Beijing 100193, China.
  • Heng Du
    National Engineering Laboratory for Animal Breeding, Key Laboratory of Animal Genetics, Breeding and Reproduction, Ministry of Agriculture; State Key Laboratory of Animal Biotech Breeding; Frontiers Science Center for Molecular Design Breeding (MOE); College of Animal Science and Technology, China Agricultural University, Beijing 100193, China.
  • Yu Zhang
    College of Marine Electrical Engineering, Dalian Maritime University, Dalian, China.
  • Yue Zhuo
    Gordon Center for Medical Imaging, Department of Radiology, Massachusetts General Hospital, Boston, United States; Harvard Medical School, Boston, United States.
  • Zhen Liu
    School of Pharmacy, Fudan University, PR China; Analytical Service Unit, WuXi AppTec (Shanghai) Co., Ltd, Shanghai, 200131, PR China.
  • Yahui Xue
    College of Animal Science and Technology, China Agricultural University, Beijing, 100193, China.
  • Lei Zhou
    Department of Gastroenterology, The Central Hospital of Wuhan, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China.
  • Sixuan Zhou
    Institute of Animal Husbandry and Veterinary Sciences, Guizhou Academy of Agricultural Sciences, Guiyang, Guizhou 550005, China.
  • Wanying Li
    Academy of Military Medical Sciences, Beijing 100850, China.
  • Jian-Feng Liu
    National Engineering Laboratory for Animal Breeding, Key Laboratory of Animal Genetics, Breeding and Reproduction, Ministry of Agriculture; State Key Laboratory of Animal Biotech Breeding; Frontiers Science Center for Molecular Design Breeding (MOE); College of Animal Science and Technology, China Agricultural University, Beijing 100193, China; liujf@cau.edu.cn.
