Impact of Multi-Factor Features on Protein Secondary Structure Prediction.

Journal: Biomolecules
PMID:

Abstract

Protein secondary structure prediction (PSSP) plays a crucial role in understanding protein functions and properties. The field has advanced significantly in recent years, and one important technical route for improving prediction accuracy is the use of a variety of protein-related features, including amino acid sequences, position-specific scoring matrices (PSSM), amino acid properties, and secondary structure trend factors. However, a comprehensive evaluation of how these factor features affect secondary structure prediction is still lacking. This study quantitatively analyzes the impact of several major factors on secondary structure prediction models using four types of more interpretable machine learning methods. It examines in detail how well each factor suits the different types of methods, how effectively each method exploits each factor, and how multi-factor combinations perform. The experiments and analyses show that PSSM performs best in methods that handle high-dimensional features well and have strong feature extraction capabilities, whereas amino acid sequences, although performing poorly overall, perform relatively well in methods with strong linear processing capabilities. In addition, combining amino acid properties with trend factors significantly improved prediction performance. These results provide empirical evidence that future researchers can use to optimize multi-factor feature combinations and thereby enhance the performance of protein secondary structure prediction models.
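To make the idea of a multi-factor feature combination concrete, the following is a minimal sketch of how per-residue feature blocks might be concatenated before being passed to a classifier. It is not the authors' pipeline: the function names, the feature dimensions (7 property values, 3 trend values), and the random placeholder matrices are all assumptions made for illustration, and only the one-hot sequence encoding is computed from real input.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot_sequence(seq):
    """Encode a sequence as an (L, 20) one-hot matrix.

    Residues outside the 20 standard letters are left as all-zero rows.
    """
    out = np.zeros((len(seq), len(AMINO_ACIDS)), dtype=np.float32)
    for pos, aa in enumerate(seq):
        idx = AA_INDEX.get(aa)
        if idx is not None:
            out[pos, idx] = 1.0
    return out


def combine_features(*blocks):
    """Concatenate per-residue blocks of shape (L, d_i) along the feature axis."""
    lengths = {block.shape[0] for block in blocks}
    if len(lengths) != 1:
        raise ValueError("all feature blocks must cover the same number of residues")
    return np.concatenate(blocks, axis=1)


# Hypothetical example: the PSSM, property, and trend-factor blocks below are
# random placeholders with assumed widths, not values from the study.
seq = "MKTAYIAK"
rng = np.random.default_rng(0)
pssm = rng.normal(size=(len(seq), 20)).astype(np.float32)
props = rng.normal(size=(len(seq), 7)).astype(np.float32)
trend = rng.normal(size=(len(seq), 3)).astype(np.float32)

features = combine_features(one_hot_sequence(seq), pssm, props, trend)
```

Under these assumptions, each residue ends up with one feature row whose width is the sum of the block widths, so dropping or adding a block is a simple way to compare single-factor and multi-factor inputs.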

Authors

  • Benzhi Dong
    Information and Computer Engineering College, Northeast Forestry University, Harbin, China. Electronic address: nefu_dbz@163.com.
  • Zheng Liu
    ICSC World Laboratory, Geneva, Switzerland.
  • Dali Xu
    College of Information and Computer Engineering, Northeast Forestry University, Harbin 150000, China.
  • Chang Hou
    Cardiovascular Department, Peking University People's Hospital, Beijing, China.
  • Na Niu
    Department of Nuclear Medicine, State Key Laboratory of Complex Severe and Rare Diseases, Beijing Key Laboratory of Molecular Targeted Diagnosis and Therapy in Nuclear Medicine, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences, Beijing, 100730, China.
  • Guohua Wang
    School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China.