Topology-driven negative sampling enhances generalizability in protein-protein interaction prediction.

Journal: Bioinformatics (Oxford, England)

Abstract

MOTIVATION: Unraveling the human interactome to uncover disease-specific patterns and discover drug targets hinges on accurate protein-protein interaction (PPI) prediction. However, machine learning (ML) models for this task still face three challenges: a scarcity of high-quality hard negative samples, shortcut learning, and limited generalizability to novel proteins.

Authors

  • Ayan Chatterjee
    Department of Information and Communication Technology, Centre for e-Health, University of Agder, 4604 Kristiansand, Norway.
  • Babak Ravandi
    Bioinformatics and Data Science, Alexion AstraZeneca Rare Disease, Boston, MA 02210, United States.
  • Parham Haddadi
    Bioinformatics and Data Science, Alexion AstraZeneca Rare Disease, Boston, MA 02210, United States.
  • Naomi H Philip
    Bioinformatics and Data Science, Alexion AstraZeneca Rare Disease, Boston, MA 02210, United States.
  • Mario Abdelmessih
    Bioinformatics and Data Science, Alexion AstraZeneca Rare Disease, Boston, MA 02210, United States.
  • William R Mowrey
    Bioinformatics and Data Science, Alexion AstraZeneca Rare Disease, Boston, MA 02210, United States.
  • Piero Ricchiuto
    Alexion Pharmaceuticals, Boston, Massachusetts, USA.
  • Yupu Liang
    Bioinformatics and Data Science, Alexion AstraZeneca Rare Disease, Boston, MA 02210, United States.
  • Wei Ding
    Division of Stem Cell and Tissue Engineering, Regenerative Medicine Research Center, West China Hospital, Sichuan University, Chengdu, Sichuan 610041, P.R. China.
  • Juan Carlos Mobarec
    Protein Structure and Biophysics, Discovery Sciences, R&D, AstraZeneca, Cambridge, UK.
  • Tina Eliassi-Rad
    Network Science Institute, Northeastern University, Boston, MA, USA.