Deep learning-based computational approach for predicting ncRNAs-disease associations in metaplastic breast cancer diagnosis.

Journal: BMC cancer
PMID:

Abstract

Non-coding RNAs (ncRNAs) play a crucial role in breast cancer progression, necessitating advanced computational approaches for precise disease classification. This study introduces a Deep Reinforcement Learning (DRL)-based framework for predicting ncRNA-disease associations in metaplastic breast cancer (MBC) using a multi-dimensional descriptor system (ncRNADS) integrating 550 sequence-based features and 1,150 target gene descriptors (miRDB score ≥ 90). The model achieved 96.20% accuracy, 96.48% precision, 96.10% recall, and a 96.29% F1-score, outperforming traditional classifiers such as support vector machines (SVM) and neural networks. Feature selection and optimization reduced dimensionality by 42.5% (from 4,430 to 2,545 features) while maintaining high accuracy, demonstrating computational efficiency. External validation confirmed model specificity to breast cancer subtypes (87-96.5% accuracy) and minimal cross-reactivity with unrelated diseases such as Alzheimer's disease (8-9% accuracy), supporting the model's robustness. SHAP analysis identified key sequence motifs (e.g., "UUG") and structural free energy (ΔG = -12.3 kcal/mol) as critical predictors, validated by PCA (82% variance) and t-SNE clustering. Survival analysis using TCGA data revealed prognostic significance for MALAT1, HOTAIR, and NEAT1 (associated with poor survival, HR = 1.76-2.71) and GAS5 (protective effect, HR = 0.60). The DRL model demonstrated rapid training (0.08 s/epoch) and cloud deployment compatibility, underscoring its scalability for large-scale applications. These findings establish ncRNA-driven classification as a cornerstone for precision oncology, enabling patient stratification, survival prediction, and therapeutic target identification in MBC.
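
As a quick consistency check on the figures reported in the abstract, the F1-score can be recomputed from the stated precision and recall, and the dimensionality reduction from the stated feature counts. The sketch below is illustrative only: the function names are hypothetical, and it assumes the standard harmonic-mean definition of F1, which the abstract does not spell out.

```python
# Illustrative check of summary statistics quoted in the abstract.
# Function names are hypothetical; only the input numbers come from the abstract.

def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (standard F1 definition, assumed here)."""
    return 2 * precision * recall / (precision + recall)

def percent_reduction(before: int, after: int) -> float:
    """Percentage decrease going from `before` features to `after` features."""
    return 100.0 * (before - after) / before

# Values reported in the abstract (percentages and feature counts).
precision, recall = 96.48, 96.10
f1 = f1_score(precision, recall)
reduction = percent_reduction(4430, 2545)

print(f"F1 from precision and recall: {f1:.2f}%")
print(f"Dimensionality reduction:     {reduction:.2f}%")
```

Under these assumptions the recomputed F1 rounds to the reported 96.29%, and the feature reduction evaluates to about 42.55%, which the abstract states as 42.5%.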

Authors

  • Saleem Ahmad
    Department of Cell Biology and Physiology, University of Kansas Medical Center, Kansas City, KS, 66160, USA.
  • Imran Zafar
    Department of Bioinformatics and Computational Biology, Virtual University Pakistan, 44000, Pakistan. Electronic address: bioinfo.pk@gmail.com.
  • Shaista Shafiq
    Department of Biochemistry and Biotechnology, Faculty of Science, The University of Faisalabad (TUF), Faisalabad, Punjab, Pakistan.
  • Laila Sehar
    National Centre for Bioinformatics, Quaid-E-Azam University Islamabad, Islamabad, Pakistan.
  • Hafsa Khalil
    National Centre for Bioinformatics, Quaid-E-Azam University Islamabad, Islamabad, Pakistan.
  • Nida Matloob
    COMSATS University, Islamabad, Pakistan.
  • Mehvish Hina
    Institute of Molecular Biology and Biotechnology, University of Lahore, Lahore, Pakistan.
  • Sidra Tul Muntaha
    Institute of Biotechnology and Genetic Engineering, The University of Agriculture, Peshawar, Pakistan.
  • Hamid Khan
    Faculty of Biological Sciences, Department of Biochemistry, Quaid-E-Azam University, Islamabad, Pakistan.
  • Najeeb Ullah Khan
    Institute of Biotechnology and Genetic Engineering, The University of Agriculture, Peshawar, Pakistan.
  • Samreen Rana
    Department of Bioinformatics, School of Interdisciplinary Engineering & Sciences, NUST, Islamabad, Pakistan.
  • Ahsanullah Unar
    Department of Precision Medicine, University of Campania 'L. Vanvitelli', Naples, Italy.
  • Muhammad Azmat
    School of Civil and Environmental Engineering (SCEE), National University of Sciences and Technology (NUST), Islamabad, Pakistan.
  • Muhammad Shafiq
    Department of Electrical & Computer Engineering, Sultan Qaboos University, Muscat, Oman. Electronic address: mshafiq@squ.edu.om.
  • Yousef A Bin Jardan
    Department of Pharmaceutics, College of Pharmacy, King Saud University, P.O. Box 11451, Riyadh, Saudi Arabia.
  • Musaab Dauelbait
    University of Bahr el Ghazal, Freedom Street, Wau, 91113, South Sudan.
  • Mohammed Bourhia
    Laboratory of Biotechnology and Natural Resources Valorization, Faculty of Sciences, Ibn Zohr University, 80060, Agadir, Morocco.