FusionPath: Gene fusion pathogenicity prediction using protein structural data and contextual protein embeddings
Journal: bioRxiv
Published Date: Jan 26, 2026
Abstract
Accurate prediction of gene fusion pathogenicity is critical for understanding oncogenic mechanisms and advancing precision oncology. While existing computational methods provide valuable insights, their performance remains limited by incomplete integration of multi-scale biological features and a lack of interpretability. We present FusionPath, a novel deep learning framework for gene fusion pathogenicity prediction. FusionPath uniquely integrates embeddings from multiple pretrained protein language models, including FusON-pLM and ProtBERT, with retained protein domains and Gene Ontology (GO) functional annotations. A hierarchical attention mechanism dynamically weights the contribution of each feature type, supporting both accurate prediction and biological interpretability. The model was trained and rigorously validated on a large-scale dataset of clinically annotated pathogenic and benign fusions. FusionPath significantly outperformed state-of-the-art methods, achieving a higher AUC on independent test sets. Crucially, SHAP analysis revealed that protein domains and GO terms contribute non-redundant, biologically interpretable signals, with specific domains and GO processes carrying high predictive weight for pathogenicity. By effectively leveraging complementary sequence, structural, and functional information, FusionPath establishes a new standard for gene fusion pathogenicity prediction. Its attention-driven interpretability provides actionable insights into the molecular determinants of fusion oncogenicity, facilitating biological discovery and clinical variant prioritization. The framework is publicly available to accelerate research in cancer genomics and therapeutic target identification.
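The abstract describes the hierarchical attention mechanism only at a high level. As an illustration of the general idea, the following is a minimal sketch of two-level attention fusion, not FusionPath's published architecture. At level 1, the items within each feature type (for example, per-residue embeddings, per-domain vectors, per-GO-term vectors) are pooled into one summary vector. At level 2, the feature-type summaries are weighted against each other. Several details are assumptions: every input is taken to be already projected to a shared dimension, each feature type contributes at least one vector, scaled dot-product scoring is used, the two query vectors are fixed (they would be learned in a trained model), and the logistic output head is hypothetical. The feature names `fuson_plm`, `protbert`, `domains` and `go_terms` are illustrative labels only.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max score before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def attention_pool(vectors: Sequence[Vector], query: Vector) -> Tuple[Vector, List[float]]:
    """Scaled dot-product attention pooling.

    Each vector v gets weight softmax(query . v / sqrt(d)); the result is the
    weighted sum of the vectors, returned together with the weights.
    """
    d = len(query)
    weights = softmax([dot(query, v) / math.sqrt(d) for v in vectors])
    pooled = [sum(w * v[k] for w, v in zip(weights, vectors)) for k in range(d)]
    return pooled, weights


def hierarchical_fusion(
    features: Dict[str, List[Vector]],
    item_query: Vector,
    type_query: Vector,
) -> Tuple[Vector, Dict[str, float]]:
    """Two-level attention: pool items within each feature type, then pool across types."""
    # Level 1 (assumed design): summarize each feature type into one vector.
    pooled = {name: attention_pool(vecs, item_query)[0] for name, vecs in features.items()}
    # Level 2: weight the feature-type summaries against each other.
    names = list(pooled)
    fused, type_weights = attention_pool([pooled[n] for n in names], type_query)
    return fused, dict(zip(names, type_weights))


def predict_pathogenicity(fused: Vector, weights: Vector, bias: float) -> float:
    """Hypothetical logistic head: probability that the fusion is pathogenic."""
    z = dot(weights, fused) + bias
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    # Toy 4-dimensional inputs; real embeddings would first be projected to a shared size.
    toy = {
        "fuson_plm": [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]],
        "protbert": [[0.5, 0.5, 0.0, 0.0]],
        "domains": [[0.0, 0.0, 1.0, 0.0]],
        "go_terms": [[0.0, 0.0, 0.0, 1.0], [0.0, 0.0, 1.0, 1.0]],
    }
    fused, type_weights = hierarchical_fusion(toy, [1.0] * 4, [1.0, 0.0, 1.0, 0.0])
    print(type_weights)
    print(predict_pathogenicity(fused, [0.2, 0.2, 0.2, 0.2], -0.1))
```

The level-2 weights are returned alongside the prediction, so the contribution of each feature type can be inspected fusion by fusion. This per-type readout is the kind of interpretability the abstract attributes to attention. Post hoc attribution methods such as SHAP remain a separate analysis step.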