idDock+: Integrating Machine Learning in Probabilistic Search for Protein-Protein Docking.
Journal: Journal of computational biology : a journal of computational molecular cell biology
Published Date: Jul 29, 2015
Abstract
Predicting the three-dimensional native structures of protein dimers, a problem known as protein-protein docking, is key to understanding molecular interactions. Docking is computationally challenging because of the diversity of interactions and the high dimensionality of the configuration space. Existing methods draw configurations from this space either systematically or at random. The inaccuracy of the scoring functions used to evaluate drawn configurations presents additional challenges. Evidence is growing that optimizing a scoring function is effective only once the drawn configuration is sufficiently similar to the native structure. Therefore, in this article we present a method that optimizes a sophisticated energy function, FoldX, only to locally improve a promising configuration. The main question, how promising configurations are identified, is addressed through a machine learning method trained a priori on an extensive dataset of functionally diverse protein dimers. To deal with the vast configuration space, a probabilistic search algorithm operates on top of the learner, feeding it configurations drawn at random. We refer to our method as idDock+, for informatics-driven Docking. idDock+ is tested on 15 dimers of different sizes and functional classes. Analysis shows that on all 15 systems idDock+ finds a near-native structure and is comparable in accuracy to other state-of-the-art methods. idDock+ represents one of the first highly efficient hybrid methods that combine fast machine learning models with demanding optimization of sophisticated energy scoring functions. Our results indicate that this is a promising direction for improving both efficiency and accuracy in docking.
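The abstract describes a division of labor: random sampling proposes configurations, a fast learned filter decides which look promising, and only those receive the expensive energy optimization. The Python sketch below illustrates that control flow under loudly stated assumptions. It is not idDock+ itself. The six-number pose parameterization and the functions `sample_config`, `local_optimize` (a greedy Monte Carlo descent), `hybrid_dock`, `toy_energy`, and `toy_is_promising` are all hypothetical. A squared distance to a made-up native pose stands in for FoldX, and a distance cutoff stands in for the trained classifier. Rotation angles are treated as plain coordinates without periodic wrapping, which is a simplification.

```python
import math
import random
from typing import Callable, List, Tuple

# Hypothetical pose parameterization: (tx, ty, tz, rx, ry, rz), a translation
# plus three rotation angles of one monomer relative to the other.
Config = Tuple[float, ...]


def sample_config(rng: random.Random, box: float = 20.0) -> Config:
    """Draw a pose uniformly at random: translation in a cube, angles in [-pi, pi]."""
    translation = [rng.uniform(-box, box) for _ in range(3)]
    angles = [rng.uniform(-math.pi, math.pi) for _ in range(3)]
    return tuple(translation + angles)


def local_optimize(x: Config, energy: Callable[[Config], float], rng: random.Random,
                   steps: int = 200, step_size: float = 0.5) -> Tuple[Config, float]:
    """Greedy Monte Carlo descent: keep a random perturbation only if it lowers the energy."""
    best, best_e = x, energy(x)
    for _ in range(steps):
        # Angles are perturbed like any other coordinate (no wrapping; a simplification).
        cand = tuple(v + rng.gauss(0.0, step_size) for v in best)
        cand_e = energy(cand)
        if cand_e < best_e:
            best, best_e = cand, cand_e
    return best, best_e


def hybrid_dock(is_promising: Callable[[Config], bool],
                energy: Callable[[Config], float],
                rng: random.Random,
                n_samples: int = 5000,
                max_refine: int = 20) -> List[Tuple[Config, float]]:
    """Random sampling feeds a fast filter; only accepted poses get the costly refinement."""
    refined = []
    for _ in range(n_samples):
        x = sample_config(rng)
        if is_promising(x):  # stand-in for the learner's prediction
            refined.append(local_optimize(x, energy, rng))  # expensive energy optimization
            if len(refined) >= max_refine:
                break
    refined.sort(key=lambda pair: pair[1])  # lowest energy first
    return refined


# Toy stand-ins (hypothetical): a hidden "native" pose, a squared-distance energy
# in place of FoldX, and a distance cutoff in place of a trained classifier.
NATIVE: Config = (1.0, 2.0, 3.0, 0.1, 0.2, 0.3)


def toy_energy(x: Config) -> float:
    return math.dist(x, NATIVE) ** 2


def toy_is_promising(x: Config) -> bool:
    return math.dist(x, NATIVE) < 10.0


results = hybrid_dock(toy_is_promising, toy_energy, random.Random(0))
best_pose, best_energy = results[0]
```

In a real pipeline, `is_promising` would wrap a trained model's prediction and `energy` would call an external scoring program. The point of the structure is that the costly refinement runs at most `max_refine` times, however many poses are sampled.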