ProteinNet: a standardized data set for machine learning of protein structure.

Journal: BMC bioinformatics

Published Date: Jun 11, 2019

Abstract

BACKGROUND: Rapid progress in deep learning has spurred its application to bioinformatics problems including protein structure prediction and design. In classic machine learning problems like computer vision, progress has been driven by standardized data sets that facilitate fair assessment of new methods and lower the barrier to entry for non-domain experts. While data sets of protein sequence and structure exist, they lack certain components critical for machine learning, including high-quality multiple sequence alignments and insulated training/validation splits that account for deep but only weakly detectable homology across protein space.

Authors

Mohammed AlQuraishi

Laboratory of Systems Pharmacology, Harvard Medical School, Boston, MA 02115, USA; Department of Systems Biology, Harvard Medical School, Boston, MA 02115, USA. Electronic address: alquraishi@hms.harvard.edu.

Keywords

Amino Acid Sequence Databases, Protein Machine Learning Proteins Reference Standards Sequence Alignment

External Resources

View on PubMed Access via DOI PubMed (31185886)

ProteinNet: a standardized data set for machine learning of protein structure.

Abstract

Authors

Keywords

External Resources

Popular Topics

Recent Journals