Protein function prediction using functional inter-relationship.

Journal: Computational biology and chemistry
Published Date:

Abstract

With the growth of high-throughput sequencing techniques, generating protein sequences has become fast and cheap, leading to a huge increase in the number of known proteins. However, identifying the functions performed by these newly discovered proteins remains challenging. Machine learning techniques have improved the efficiency of traditional methods by suggesting relevant functions, but they fail to perform well when the number of functions to be predicted becomes large. In this work, we propose a machine learning-based approach for predicting a large set of protein functions that uses the inter-relationships between functions to improve the model's predictive performance. These inter-relationships are used to reduce the redundancy caused by highly correlated functions. The proposed model is trained on the reduced set of non-redundant functions, which limits the ambiguity caused by inter-related functions. We use two statistical measures of correlation to remove redundant functions: (1) Pearson's correlation coefficient and (2) the Jaccard similarity coefficient. For a fair evaluation of the proposed model, we recreate the original function set by inverse transforming the reduced set using two proposed approaches: direct mapping and an ensemble approach. The model is tested using different feature sets and function sets of biological processes and molecular functions, and gives promising results on the DeepGO and CAFA3 datasets. The proposed model is able to predict specific functions for the test data that the other compared methods could not predict. The experimental models, code and other relevant data are available at https://github.com/richadhanuka/PFP-using-Functional-interrelationship.
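
The redundancy-reduction idea described above can be illustrated with a minimal sketch. This is not the authors' implementation (see the linked repository for that); it assumes a binary protein-by-function label matrix, a greedy grouping rule, and a similarity threshold of 0.9, none of which are specified in the abstract. The function names (pearson_similarity, jaccard_similarity, reduce_functions, direct_mapping) are hypothetical, and the ensemble approach is not covered here.

```python
import numpy as np

def pearson_similarity(Y):
    """Pairwise Pearson correlation between the function columns of Y.
    Constant columns would yield NaN; NaN never passes the threshold test."""
    return np.corrcoef(Y, rowvar=False)

def jaccard_similarity(Y):
    """Pairwise Jaccard similarity between the binary function columns of Y."""
    Y = Y.astype(np.int64)
    inter = Y.T @ Y                                   # co-annotation counts
    counts = Y.sum(axis=0)                            # annotations per function
    union = counts[:, None] + counts[None, :] - inter
    out = np.zeros(inter.shape, dtype=float)
    return np.divide(inter, union, out=out, where=union > 0)

def reduce_functions(sim, threshold=0.9):
    """Greedy grouping (an assumed rule): each function not yet assigned
    becomes a representative, and later functions whose similarity to it
    is at least `threshold` are mapped onto it."""
    n = sim.shape[0]
    rep = np.full(n, -1)
    for j in range(n):
        if rep[j] != -1:
            continue
        rep[j] = j
        for k in range(j + 1, n):
            if rep[k] == -1 and sim[j, k] >= threshold:
                rep[k] = j
    kept = np.flatnonzero(rep == np.arange(n))
    return kept, rep

def direct_mapping(Y_reduced, kept, rep):
    """Inverse transform: every original function copies the column of
    its representative in the reduced matrix."""
    col_of = {int(f): i for i, f in enumerate(kept)}
    cols = [col_of[int(r)] for r in rep]
    return Y_reduced[:, cols]

# Toy label matrix: 5 proteins x 4 functions; functions 0 and 1 always co-occur.
Y = np.array([
    [1, 1, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
    [0, 0, 1, 1],
    [1, 1, 0, 0],
])

kept_j, rep_j = reduce_functions(jaccard_similarity(Y), threshold=0.9)
kept_p, rep_p = reduce_functions(pearson_similarity(Y), threshold=0.9)

Y_reduced = Y[:, kept_j]          # labels a model would be trained on
Y_restored = direct_mapping(Y_reduced, kept_j, rep_j)
print(kept_j, rep_j, Y_restored.shape)
```

In this toy case both measures merge function 1 into function 0, and direct mapping recovers the original four columns from the three retained ones. In practice the reduced matrix passed to the inverse step would hold model predictions rather than ground-truth labels.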

Authors

  • Richa Dhanuka
    Department of Computer Science and Engineering, National Institute of Technology Patna, India. Electronic address: richa.dhanuka@gmail.com.
  • Jyoti Prakash Singh
    Department of Computer Science and Engineering, National Institute of Technology Patna, India. Electronic address: jps@nitp.ac.in.