Sequence alignment is an essential component of bioinformatics, used to identify regions of similarity that may indicate functional, structural, or evolutionary relationships between sequences. Genome-based diagnostics relying on DNA sequencing ha...
The discovery of selective and potent kinase inhibitors is crucial for the treatment of various diseases, but the process is challenging due to the high structural similarity among kinases. Efficient kinome-wide bioactivity profiling is essential for...
Recent breakthroughs in protein structure prediction have increasingly relied on deep neural networks. These methods are notable in that they produce 3-D atomic coordinates as a direct output of the networks, a feature which present...
We report the results of the "UM-TBM" and "Zheng" groups in CASP15 for protein monomer and complex structure prediction. These prediction sets were obtained using the D-I-TASSER and DMFold-Multimer algorithms, respectively. For monomer structure pred...
Selecting the best model of sequence evolution for a multiple sequence alignment (MSA) is the first step of phylogenetic tree reconstruction. Common approaches for inferring nucleotide models typically apply maximum likelihood (ML) methods, ...
Exploiting sequence-structure-function relationships in biotechnology requires improved methods for aligning proteins that have low sequence similarity to previously annotated proteins. We develop two deep learning methods to address this gap, TM-Vec...
AlphaFold2 has revolutionized structural biology by accurately predicting single structures of proteins. However, a protein's biological function often depends on multiple conformational substates, and disease-causing point mutations often ca...
Phylogenetic tree inference is a classic, fundamental task in evolutionary biology: inferring the evolutionary relationships among targets from a multiple sequence alignment (MSA). Maximum likelihood (ML) and Bayesian inference (BI) methods ...
Compared with proteins, DNA and RNA are more difficult languages to interpret because sequences over a four-letter alphabet carry less information per symbol than protein sequences over a 20-letter alphabet. While BERT (Bidirectional Encoder Representations from Tra...