Large-scale population variant data is often used to filter and aid interpretation of variant calls in a single sample. These approaches do not incorporate population information directly into the process of variant calling, and are often limited to ...
Machine learning-based pathogenicity prediction helps interpret rare missense variants of BRCA1 and BRCA2, which are associated with hereditary cancers. Recent studies have shown that classifiers trained using variants of a specific gene or a set of ...
The detection of somatic DNA variants in tumor samples with low tumor purity or sequencing depth remains a daunting challenge despite numerous attempts to address this problem. In this study, we constructed a substantially extended set of actual posi...
MOTIVATION: Random forest is a popular machine learning approach for the analysis of high-dimensional data because it is flexible and provides variable importance measures for the selection of relevant features. However, the complex relationships bet...
INTRODUCTION: In silico tools capable of predicting the functional consequences of genomic differences between individuals, many of which are AI-driven, have been the most effective over the past two decades for non-synonymous single nucleotide varia...
Insertion/deletion (InDel) polymorphisms can be used as one of the ancestry-informative markers in ancestry analysis. In this study, a self-developed panel consisting of 56 ancestry-informative InDels was used to investigate the genetic structures an...
Inferring past demographic history of natural populations from genomic data is of central concern in many studies across research fields. Previously, our group had developed dadi, a widely used demographic history inference method based on the allele...
Small insertions and deletions (indels) are critical yet challenging genetic variations with significant clinical implications. However, the identification of pathogenic indels from neutral variants in clinical contexts remains an understudied proble...
MOTIVATION: Selective sweeps can successfully be distinguished from neutral genetic data using summary statistics and likelihood-based methods that analyze single nucleotide polymorphisms (SNPs). However, these methods are sensitive to confounding fa...
Deep learning classification models based on Convolutional Neural Networks (CNNs) are increasingly used in population genetic inference for detecting signatures of natural selection. Prevailing detection methods treat the design of the classifier as ...