Interpretable bioinformatics approaches for pheochromocytoma bioactivity and protein interaction analysis.

Journal: Computers in biology and medicine

Abstract

Pheochromocytoma (PCC) is a rare neuroendocrine tumor driven by complex molecular mechanisms, notably involving the oncogenic c-Myc/Max and c-Myc/c-Max protein complexes. Despite their pivotal role in tumor progression, the molecular interactions of these complexes and the bioactive compounds that specifically target them remain inadequately characterized. This study presents an integrative computational pipeline combining interpretable bioinformatics, network biology, and machine learning to elucidate key molecular mechanisms and bioactive motifs associated with PCC. A curated dataset of 5000 bioactive molecules was obtained from ChEMBL, and structural motifs associated with bioactivity were identified using a genetic programming-based approach. Random Forest, Support Vector Machine, and Gradient Boosting classifiers were trained and evaluated with 10-fold cross-validation to predict pIC50 values, achieving high performance (mean accuracy: 0.98, AUC >0.97). Feature importance analysis consistently identified pIC50, molecular weight (MW), lipophilicity (LogP), and hydrogen-bonding properties as primary determinants of bioactivity. Protein-protein interaction (PPI) networks were built from STRING's experimentally validated interactions and refined using BioGRID and literature cross-validation. Network centrality analysis and community detection with the Girvan-Newman algorithm revealed MYC, MAX, and EP300 as central hubs, with associated protein modules significantly enriched for biological processes including transcriptional regulation, cell cycle control, ubiquitination, and apoptosis. To enhance model interpretability, explainable artificial intelligence (XAI) methods, including SHAP and DALEX, were employed to quantify the contribution of individual molecular descriptors, providing mechanistic insight into compound-target interactions. Despite its robustness, this computational framework has not been experimentally validated or tested on independent external datasets. Additionally, STRING's uniform confidence scores limited the precision of edge weights in the network visualizations. Nevertheless, this study demonstrates the potential of a multi-layered computational approach to deepen the understanding of MYC-driven oncogenesis in PCC. By integrating motif discovery, network biology, and interpretable machine learning, the work identifies actionable molecular signatures and critical protein targets, providing a foundation for future experimental validation and the development of targeted therapies in pheochromocytoma as well as other rare cancers.
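
The abstract reports classification metrics (accuracy, AUC) for pIC50 prediction under 10-fold cross-validation but does not say how pIC50 values were turned into classes or how the folds were drawn. The sketch below shows one plausible reading, not the author's implementation. The conversion pIC50 = -log10(IC50 in mol/L) is the standard definition. The activity cutoff of 6.0 (1 µM), the unshuffled contiguous folds, and the function names pic50_from_ic50_nm, label_activity, and k_fold_indices are all assumptions made for illustration.

```python
import math


def pic50_from_ic50_nm(ic50_nm: float) -> float:
    """Convert an IC50 given in nanomolar to pIC50 = -log10(IC50 in mol/L)."""
    if ic50_nm <= 0:
        raise ValueError("IC50 must be a positive concentration")
    # 1 nM = 1e-9 mol/L, so -log10(ic50_nm * 1e-9) = 9 - log10(ic50_nm)
    return 9.0 - math.log10(ic50_nm)


def label_activity(pic50: float, threshold: float = 6.0) -> str:
    """Binarize pIC50 into a class label.

    The 6.0 cutoff (IC50 of 1 micromolar) is an assumed convention,
    not a value taken from the abstract.
    """
    return "active" if pic50 >= threshold else "inactive"


def k_fold_indices(n_samples: int, k: int = 10):
    """Yield (train_indices, test_indices) for k contiguous, unshuffled folds.

    Each sample lands in exactly one test fold; real pipelines usually
    shuffle or stratify first, which this sketch omits.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


if __name__ == "__main__":
    # Hypothetical IC50 values in nM, used only to exercise the functions.
    for ic50 in (50.0, 1000.0, 25000.0):
        p = pic50_from_ic50_nm(ic50)
        print(f"IC50 = {ic50:>8.1f} nM  pIC50 = {p:.2f}  {label_activity(p)}")

    # Fold sizes for a dataset the size of the one described (5000 molecules).
    folds = list(k_fold_indices(5000, k=10))
    print(len(folds), "folds; first test fold has", len(folds[0][1]), "samples")
```

With 5000 molecules and k = 10, each fold holds out 500 compounds and trains on the remaining 4500. A classifier would be fitted once per fold, and the per-fold scores averaged to give a mean accuracy of the kind quoted above.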

Authors

  • İlhan Uysal
    Department of Information Systems and Technologies, Bucak Zeliha Tolunay School of Applied Technology and Business, Burdur Mehmet Akif Ersoy University, Burdur, Turkey.
