Whereas alignment has been fundamental to sequence-based assessments of protein homology, it is ineffective for intrinsically disordered regions (IDRs) due to their lowered sequence conservation and unique sequence properties. Here, we present a web ...
Phylogenetic inference aims at reconstructing the tree describing the evolution of a set of sequences descending from a common ancestor. The high computational cost of state-of-the-art maximum likelihood and Bayesian inference methods limits their us...
Amino acid substitution models play an important role in studying the evolutionary relationships among species from protein sequences. The amino acid substitution model consists of a large number of parameters; therefore, it is estimated from hundred...
MOTIVATION: Multiple sequence alignments (MSAs) are extensively used in biology, from phylogenetic reconstruction to structure and function prediction. Here, we suggest an out-of-the-box approach for the inference of MSAs, which relies on algorithms ...
Understanding the genetic basis of phenotypic variation is fundamental to biology. Here we introduce GAP, a novel machine learning framework for predicting binary phenotypes from gaps in multi-species sequence alignments. GAP employs a neural network...
MOTIVATION: Currently used methods for estimating branch support in phylogenetic analyses often rely on the classic Felsenstein's bootstrap, parametric tests, or their approximations. As these branch support scores are widely used in phylogenetic ana...
MOTIVATION: Deep-learning models are transforming biological research, including many bioinformatics and comparative genomics algorithms, such as sequence alignments, phylogenetic tree inference, and automatic classification of protein functions. Amo...
Protein sequence design can provide valuable insights into biopharmaceuticals and disease treatments. Currently, most protein sequence design methods based on deep learning focus on network architecture optimization, while ignoring protein-specific p...
Compared with proteins, DNA and RNA are more difficult languages to interpret because four-letter coded DNA/RNA sequences have less information content than 20-letter coded protein sequences. While BERT (Bidirectional Encoder Representations from Tra...
Join thousands of healthcare professionals staying informed about the latest AI breakthroughs in medicine. Get curated insights delivered to your inbox.