Single nucleotide exact amplicon sequence variants (ASVs) of the human gut microbiome were used to evaluate whether individuals with a depression phenotype (DEPR) could be distinguished from healthy reference subjects (NODEP). Microbial DNA in stool samples o...
In many research areas scientists are interested in clustering objects within small datasets while making use of prior knowledge from large reference datasets. We propose a method to apply the machine learning concept of transfer learning to unsuperv...
BACKGROUND: Calling genetic variations from sequence reads is an important problem in genomics. There are many existing methods for calling various types of variations. Recently, Google developed a method for calling single nucleotide polymorphisms (...
BACKGROUND: In short-read DNA sequencing experiments, the read coverage is a key parameter to successfully assemble the reads and reconstruct the sequence of the input DNA. When coverage is very low, the original sequence reconstruction from the read...
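As a point of reference for the coverage parameter mentioned above, mean read coverage is commonly computed as C = N * L / G (number of reads times read length, divided by genome length). The sketch below is only an illustration under a Poisson approximation (Lander-Waterman style); the function names are hypothetical and it is not the method of the paper summarized here.

```python
import math


def expected_coverage(num_reads: int, read_length: int, genome_length: int) -> float:
    """Mean per-base coverage C = N * L / G."""
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    return num_reads * read_length / genome_length


def fraction_uncovered(coverage: float) -> float:
    """Poisson approximation: probability that a base is hit by zero reads, exp(-C)."""
    return math.exp(-coverage)


if __name__ == "__main__":
    # Hypothetical numbers chosen only for illustration.
    c = expected_coverage(num_reads=1000, read_length=100, genome_length=50_000)
    print(f"coverage = {c:.1f}x, uncovered fraction ~ {fraction_uncovered(c):.3f}")
```

Under this approximation, the fraction of bases with no read at all grows quickly as coverage drops, which is consistent with the reconstruction difficulty at very low coverage described above.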
Alignment-free methods, more time and memory efficient than alignment-based methods, have been widely used for comparing genome sequences or raw sequencing samples without assembly. However, in this study, we show that alignment-free dissimilarity ca...
Blood-borne small non-coding RNAs (sncRNAs) are among the prominent candidates for blood-based diagnostic tests. Often, high-throughput approaches are applied to discover biomarker signatures. These have to be validated in larger cohorts and evaluated by ...
The performance of most error-correction (EC) algorithms that operate on genomics reads depends on the proper choice of their configuration parameters, such as the value of k in k-mer based techniques. In this work, we target the problem of findin...
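To make the role of k concrete, a minimal sketch of k-mer counting over a set of reads follows. It only illustrates how the k-mer spectrum changes with k; the helper name `kmer_counts` is hypothetical and this is not the parameter-selection method of the paper summarized here.

```python
from collections import Counter


def kmer_counts(reads, k):
    """Count every length-k substring (k-mer) across all reads."""
    if k <= 0:
        raise ValueError("k must be positive")
    counts = Counter()
    for read in reads:
        # A read of length n contains n - k + 1 overlapping k-mers.
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


if __name__ == "__main__":
    reads = ["ACGTACGT"]  # toy input, for illustration only
    for k in (3, 4):
        print(k, dict(kmer_counts(reads, k)))
```

Larger values of k yield more specific k-mers that occur less often, while smaller values yield more frequent but more ambiguous ones.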
Copy number variants (CNV) are associated with phenotypic variation in several species. However, properly detecting changes in copy numbers of sequences remains a difficult problem, especially in lower quality or lower coverage next-generation sequen...
IEEE/ACM transactions on computational biology and bioinformatics
Nov 4, 2019
The development of next-generation sequencing (NGS) technologies has led to massive numbers of VCF (Variant Call Format) files, the standard format developed with the 1000 Genomes Project. At the same time, with the widespread use of...
Forensic science international. Genetics
Oct 28, 2019
We present Fragsifier, a machine learning approach to short tandem repeat (STR) sequence detection and extraction from massively parallel sequencing data. Using this approach, STRs are detected on each read by first locating the longest repeat ...
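For orientation, one simple way to locate the longest tandem repeat stretch on a single read is a brute-force scan over candidate repeat unit lengths. The sketch below is an assumption-laden illustration, not Fragsifier's algorithm: the function name, the unit length range of 2 to 6, and the minimum copy number are all hypothetical choices.

```python
def longest_tandem_repeat(read, min_unit=2, max_unit=6, min_copies=2):
    """Return (start, end, unit) for the longest run of back-to-back copies
    of a repeat unit in `read`, or None if no run has `min_copies` copies.
    `end` is exclusive. Brute force: try every start and unit length."""
    best = None
    n = len(read)
    for unit_len in range(min_unit, max_unit + 1):
        for start in range(n - unit_len + 1):
            unit = read[start:start + unit_len]
            end = start + unit_len
            # Extend the run while the next chunk is an exact copy of the unit.
            while read[end:end + unit_len] == unit:
                end += unit_len
            copies = (end - start) // unit_len
            if copies >= min_copies and (best is None or end - start > best[1] - best[0]):
                best = (start, end, unit)
    return best


if __name__ == "__main__":
    # Toy read: flank + five copies of GATA + flank (illustrative only).
    read = "ACGT" + "GATA" * 5 + "TTCA"
    print(longest_tandem_repeat(read))
```

This exact-match scan ignores sequencing errors and interrupted repeats, and it treats homopolymer runs as repeats of a two-base unit.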