Recent advances in machine learning promise to yield novel insights by interrogation of large datasets ranging from gene expression and mutation data to CRISPR knockouts and drug screens. We combined existing and new algorithms with available experim...
Computational and mathematical methods in medicine
Apr 24, 2021
Ensemble learning combines multiple learners, which offers good flexibility and higher generalization performance. To achieve higher-quality cancer classification, in this study, the fast correlation-based...
Gliomas are primary malignant brain tumors. Monocytes have been shown to actively participate in tumor growth. Weighted gene co-expression network analysis was used to identify meaningful monocyte-related genes for clustering. Neural network and SVM...
IEEE/ACM transactions on computational biology and bioinformatics
Apr 8, 2021
There is often a limited amount of omics data to design predictive models in biomedicine. Knowing that these omics data come from underlying processes that may share common pathways and disease mechanisms, it may be beneficial for designing a more ac...
Inferring a phylogenetic tree is a fundamental challenge in evolutionary studies. Current paradigms for phylogenetic tree reconstruction rely on performing costly likelihood optimizations. With the aim of making tree inference feasible for problems i...
The human microbiome is increasingly mined for diagnostic and therapeutic biomarkers using machine learning (ML). However, metagenomics-specific software is scarce, and overoptimistic evaluation and limited cross-study generalization are prevailing i...
Colorectal cancer (CRC) is one of the most common cancers, and early detection of CRC is essential to improve the survival rate of patients. To identify diagnostic markers for CRC by screening differentially expressed proteins ...
Deep learning algorithms have been utilized to achieve enhanced performance in pattern-recognition tasks. The ability to learn complex patterns in data has tremendous implications in immunogenomics. T-cell receptor (TCR) sequencing assesses the diver...
Celiac disease (CeD) is a common autoimmune disorder caused by an abnormal immune response to dietary gluten proteins. The disease has high heritability. HLA is the major susceptibility factor, and the HLA effect is mediated via presentation of deami...
Controlling quality of next-generation sequencing (NGS) data files is a necessary but complex task. To address this problem, we statistically characterize common NGS quality features and develop a novel quality control procedure involving tree-based ...