Prediction of amyloid aggregation rates by machine learning and feature selection.

Journal: The Journal of chemical physics
PMID:

Abstract

This paper reports a novel data-driven machine learning algorithm for predicting amyloid aggregation rates. To capture the highly nonlinear mapping from 16 intrinsic features of a protein and 4 extrinsic features of the environment to the protein aggregation rate, a feedforward fully connected neural network (FCN) with one hidden layer is trained on a dataset comprising 21 different kinds of amyloid proteins and tested on the remaining 4 proteins. The FCN achieves an average accuracy higher than 90% and performs much better than traditional algorithms such as multivariable linear regression and support vector regression. Furthermore, correlation analysis and principal component analysis show that seven key features (folding energy; HP patterns for helices, sheets, and membrane-crossing helices; pH; ionic strength; and protein concentration) constitute a minimum feature set for characterizing amyloid aggregation kinetics.
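
As a rough illustration of the kind of model the abstract describes, the sketch below builds a one-hidden-layer fully connected network that maps 20 input features (16 intrinsic plus 4 extrinsic) to a single rate value and trains it by gradient descent. It is not the authors' implementation. The hidden width, the tanh activation, the mean-squared-error loss, the learning rate, the sample counts, and all function names are assumptions, and the data are synthetic placeholders rather than the paper's dataset.

```python
import numpy as np

N_INTRINSIC, N_EXTRINSIC = 16, 4      # feature counts stated in the abstract
N_FEATURES = N_INTRINSIC + N_EXTRINSIC
N_HIDDEN = 8                          # assumption: hidden width is not given in the abstract


def init_params(rng, n_in=N_FEATURES, n_hidden=N_HIDDEN):
    """Small random weights for a one-hidden-layer network with a scalar output."""
    return {
        "W1": rng.normal(0.0, 0.1, size=(n_in, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 0.1, size=(n_hidden, 1)),
        "b2": np.zeros(1),
    }


def forward(params, X):
    """Return (predictions, hidden activations) for a batch X of shape (n, n_in)."""
    H = np.tanh(X @ params["W1"] + params["b1"])    # assumption: tanh activation
    y_hat = (H @ params["W2"] + params["b2"]).ravel()
    return y_hat, H


def train(params, X, y, lr=0.05, epochs=500):
    """Full-batch gradient descent on mean squared error; returns the loss history."""
    n = len(y)
    losses = []
    for _ in range(epochs):
        y_hat, H = forward(params, X)
        err = y_hat - y
        losses.append(float(np.mean(err ** 2)))
        # Backpropagate the MSE gradient through the output and hidden layers.
        d_out = (2.0 / n) * err[:, None]                         # shape (n, 1)
        d_hidden = (d_out @ params["W2"].T) * (1.0 - H ** 2)     # shape (n, n_hidden)
        params["W2"] -= lr * (H.T @ d_out)
        params["b2"] -= lr * d_out.sum(axis=0)
        params["W1"] -= lr * (X.T @ d_hidden)
        params["b1"] -= lr * d_hidden.sum(axis=0)
    return losses


# Synthetic placeholder data (NOT the paper's dataset); sample counts are arbitrary.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(64, N_FEATURES))
y_train = np.sin(X_train[:, 0]) + 0.5 * X_train[:, 1]   # made-up nonlinear target
X_test = rng.normal(size=(8, N_FEATURES))

params = init_params(rng)
losses = train(params, X_train, y_train)
y_pred, _ = forward(params, X_test)
```

In practice the features would be standardized first, and the held-out set would consist of proteins that do not appear in the training data, as in the train/test split the abstract describes.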

Authors

  • Wuyue Yang
    Zhou Pei-Yuan Center for Applied Mathematics, Tsinghua University, Beijing 100084, China.
  • Pengzhen Tan
    Zhou Pei-Yuan Center for Applied Mathematics, Tsinghua University, Beijing 100084, China.
  • Xianjun Fu
    Institute for Literature and Culture of Chinese Medicine, Shandong University of Traditional Chinese Medicine, Jinan 250355, China.
  • Liu Hong
    Graduate School of Science, Osaka University, 1-1 Machikaneyama, Toyonaka, Osaka, 560-0043, Japan.