Redundancy-weighting the PDB for detailed secondary structure prediction using deep-learning models.

Journal: Bioinformatics (Oxford, England)

Published Date: Jun 1, 2020

Abstract

MOTIVATION: The Protein Data Bank (PDB), the ultimate source for data in structural biology, is inherently imbalanced. To alleviate biases, virtually all structural biology studies use nonredundant (NR) subsets of the PDB, which include only a fraction of the available data. An alternative approach, dubbed redundancy-weighting (RW), down-weights redundant entries rather than discarding them. This approach may be particularly helpful for machine-learning (ML) methods that use the PDB as their source for data. Methods for secondary structure prediction (SSP) have greatly improved over the years with recent studies achieving above 70% accuracy for eight-class (DSSP) prediction. As these methods typically incorporate ML techniques, training on RW datasets might improve accuracy, as well as pave the way toward larger and more informative secondary structure classes.
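The contrast between discarding redundant entries and down-weighting them can be illustrated with a minimal sketch. It assumes that entries have already been grouped into redundancy clusters and that each entry receives a weight of one over its cluster size; this inverse-cluster-size scheme is an illustrative assumption, not necessarily the weighting used by the authors. The names `redundancy_weights`, `weighted_accuracy`, and the toy data are hypothetical.

```python
from collections import Counter


def redundancy_weights(cluster_ids):
    """Assign each entry a weight of 1 / (size of its cluster).

    Assumption: a simple inverse-cluster-size scheme, so that every
    cluster contributes a total weight of 1 regardless of its size.
    """
    sizes = Counter(cluster_ids)
    return [1.0 / sizes[c] for c in cluster_ids]


def weighted_accuracy(correct, weights):
    """Fraction of total weight that falls on correctly predicted entries."""
    total = sum(weights)
    return sum(w for ok, w in zip(correct, weights) if ok) / total


# Hypothetical toy data: five chains, three of them near-duplicates (cluster "A").
cluster_ids = ["A", "A", "A", "B", "C"]
weights = redundancy_weights(cluster_ids)

# Suppose a predictor is right only on the over-represented cluster.
correct = [True, True, True, False, False]

unweighted = sum(correct) / len(correct)          # all entries count equally
reweighted = weighted_accuracy(correct, weights)  # each cluster counts equally

print(f"unweighted accuracy: {unweighted:.3f}")
print(f"redundancy-weighted accuracy: {reweighted:.3f}")
```

In this toy case all five entries stay in the dataset, but the three near-duplicates together carry the same total weight as each singleton. The same per-entry weights could in principle be passed as sample weights to a training loss.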

Authors

Tomer Sidi
Chen Keasar

Keywords

Computational Biology
Databases, Protein
Deep Learning
Machine Learning
Protein Structure, Secondary

External Resources

PubMed ID: 32186698
