Genetic structure analysis of a Northwest Chinese population using a self-developed DIP system and machine learning for forensic ancestry inference.
Journal:
BMC Genomics
Published Date:
Feb 16, 2026
Abstract
This study investigates forensic ancestry inference in admixed populations using a self-developed panel of 60 deletion/insertion polymorphisms (DIPs), analyzing Kyrgyz samples from Northwest China (CNK) and 26 reference populations from the 1000 Genomes Project. Genetic analyses (PCA, ADMIXTURE, TreeMix) indicated that the CNK group shares core ancestry with East Asian populations while exhibiting a partially distinct genetic structure, consistent with previous genome-wide studies. Although inferences regarding deep demographic history are preliminary given the limited number of markers, the DIP panel achieved high forensic efficiency, with a cumulative probability of discrimination > 0.99999999999 and a probability of exclusion > 0.9996. In ancestry modeling, machine learning algorithms significantly outperformed traditional supervised dimensionality reduction methods. At the continental level, XGBoost achieved the highest accuracy (0.919) and performed strongly across all major ancestries, with near-perfect discrimination of African and East Asian populations. For East Asian subregional classification, random forest performed best (accuracy = 0.709) and showed the highest precision for the CNK group. The loci rs5891435 and rs35171885 were key for continental and subregional differentiation. The results support the East Asian background of the CNK group and show that machine learning with an optimized DIP panel can substantially improve ancestry inference in the admixed CNK group and in similar settings. The approach is a promising and practically useful tool for forensic biogeographical ancestry inference that warrants further validation in broader datasets.
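The cumulative forensic statistics quoted above are conventionally built from per-locus values under the assumption that loci are independent: the cumulative value is one minus the product of the per-locus complements. The Python sketch below illustrates that standard calculation only. The genotype frequencies are hypothetical, the function names are introduced here for illustration, and nothing in it reproduces the study's data or pipeline.

```python
from math import prod


def locus_pd(genotype_freqs):
    """Probability of discrimination for one locus:
    1 minus the sum of squared genotype frequencies."""
    return 1.0 - sum(f * f for f in genotype_freqs)


def cumulative_complement(per_locus_values):
    """Product of (1 - v_i) over loci, assuming independent loci.

    The cumulative statistic is 1 minus this product. The complement is
    returned because, for large panels, 1 - product rounds to exactly 1.0
    in double precision and hides how close to 1 the value really is.
    """
    return prod(1.0 - v for v in per_locus_values)


# Hypothetical biallelic DIP locus in Hardy-Weinberg proportions,
# with an assumed insertion-allele frequency of 0.5.
p = 0.5
genotype_freqs = [p * p, 2 * p * (1 - p), (1 - p) ** 2]

pd_single = locus_pd(genotype_freqs)

# Hypothetical panel of 60 identical, independent loci.
complement_60 = cumulative_complement([pd_single] * 60)

print(f"per-locus PD: {pd_single}")
print(f"1 - cumulative PD over 60 loci: {complement_60:.3e}")
```

The same combination rule applies to per-locus probabilities of exclusion. Real loci differ in allele frequency and may be linked, so a reported cumulative value depends on the observed frequencies in the population studied.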