GeneAI 3.0: powerful, novel, generalized hybrid and ensemble deep learning frameworks for miRNA species classification of stationary patterns from nucleotides.

Journal: Scientific Reports
PMID:

Abstract

Due to the intricate relationship between small non-coding microRNA (miRNA) sequences, the classification of miRNA species, namely Human, Gorilla, Rat, and Mouse, is challenging. Previous methods lack robustness and accuracy. In this study, we present AtheroPoint's GeneAI 3.0, a powerful, novel, and generalized method for extracting features from the fixed patterns of purines and pyrimidines in each miRNA sequence, used within ensemble machine learning (EML) and convolutional neural network (CNN)-based ensemble deep learning (EDL) frameworks. GeneAI 3.0 used five conventional features (Entropy, Dissimilarity, Energy, Homogeneity, and Contrast) and three contemporary features (Shannon entropy, Hurst exponent, and Fractal dimension) to generate a composite feature set from the given miRNA sequences, which was then passed into our ML and DL classification frameworks. A set of 11 new classifiers was designed, consisting of 5 EML and 6 EDL models, for binary and multiclass classification. It was benchmarked against 9 solo ML (SML), 6 solo DL (SDL), and 12 hybrid DL (HDL) models, for a total of 11 + 27 = 38 models. Four hypotheses were formulated and validated using explainable AI (XAI) as well as reliability and statistical tests. The order of the mean performance, measured by accuracy (ACC) and area-under-the-curve (AUC), of the 24 DL classifiers was: EDL > HDL > SDL. The mean ACC/AUC of EDL models with CNN layers was superior to that of EDL models without CNN layers by 0.73%/0.92%. The mean performance of EML models was superior to that of SML models, with ACC/AUC improvements of 6.24%/6.46%. EDL models performed significantly better than EML models, with a mean increase in ACC/AUC of 7.09%/6.96%. The GeneAI 3.0 tool produced the expected XAI feature plots, and the statistical tests showed significant p-values. Ensemble models with composite features are highly effective, generalized models for classifying miRNA sequences.
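
To make the feature-extraction idea in the abstract more concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes each nucleotide is encoded as 1 (purine: A, G) or 0 (pyrimidine: C, U, T), computes Shannon entropy over that binary sequence, and derives Energy, Contrast, and Homogeneity from a co-occurrence table of adjacent symbols using textbook texture-feature definitions. The function names, the binary encoding, and the adjacent-pair co-occurrence scheme are all assumptions made for illustration; the abstract does not specify how GeneAI 3.0 computes these features.

```python
import math
from collections import Counter

# Assumed encoding (not stated in the abstract): purine -> 1, pyrimidine -> 0.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "U", "T"}


def encode_purine_pyrimidine(seq):
    """Map each nucleotide to 1 (purine) or 0 (pyrimidine)."""
    encoded = []
    for base in seq.upper():
        if base in PURINES:
            encoded.append(1)
        elif base in PYRIMIDINES:
            encoded.append(0)
        else:
            raise ValueError(f"unexpected nucleotide: {base!r}")
    return encoded


def shannon_entropy(symbols):
    """Shannon entropy in bits of the symbol distribution."""
    n = len(symbols)
    counts = Counter(symbols)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def cooccurrence_features(symbols):
    """Energy, contrast, and homogeneity from adjacent-symbol pairs.

    Hypothetical 1-D analogue of texture features; the paper's exact
    definitions may differ.
    """
    pairs = Counter(zip(symbols, symbols[1:]))
    total = sum(pairs.values())
    probs = {pair: c / total for pair, c in pairs.items()}
    return {
        "energy": sum(p * p for p in probs.values()),
        "contrast": sum((i - j) ** 2 * p for (i, j), p in probs.items()),
        "homogeneity": sum(p / (1 + (i - j) ** 2) for (i, j), p in probs.items()),
    }


def composite_features(seq):
    """Concatenate the illustrative features into one dictionary."""
    symbols = encode_purine_pyrimidine(seq)
    features = {"shannon_entropy": shannon_entropy(symbols)}
    features.update(cooccurrence_features(symbols))
    return features
```

Calling `composite_features` on a sequence string returns a small dictionary that could serve as one row of a feature matrix for any downstream classifier. The remaining features named in the abstract (Dissimilarity, Hurst exponent, Fractal dimension) are omitted here to keep the sketch short.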

Authors

  • Jaskaran Singh
    College of Medicine, University of Saskatchewan, Saskatoon, Saskatchewan, Canada.
  • Narendra N Khanna
    Cardiology Department, Apollo Hospitals, New Delhi, India.
  • Ranjeet K Rout
    Department of Computer Science and Engineering, NIT Srinagar, Hazratbal, Srinagar, India.
  • Narpinder Singh
    Department of Food Science, Graphic Era Deemed to be University, Dehradun, Uttarakhand, India.
  • John R Laird
    UC Davis Vascular Center, University of California, Davis, CA, USA.
  • Inder M Singh
    Stroke Monitoring and Diagnostic Division, AtheroPoint™, Roseville, 95747, CA, USA.
  • Mannudeep K Kalra
  • Laura E Mantella
    Department of Biomedical and Molecular Sciences, Queen's University, Kingston, ON, Canada.
  • Amer M Johri
    Division of Cardiology, Department of Medicine, Queen's University, Kingston, ON, Canada.
  • Esma R Isenovic
    Laboratory for Molecular Genetics and Radiobiology, University of Belgrade, Belgrade, Serbia.
  • Mostafa M Fouda
    Department of Electrical and Computer Engineering, College of Science and Engineering, Idaho State University, Pocatello, ID 83209, USA.
  • Luca Saba
    Department of Radiology, A.O.U., Italy.
  • Mostafa Fatemi
    Department of Physiology and Biomedical Engineering, Mayo Clinic College of Medicine and Science, Rochester, MN, 55902, USA.
  • Jasjit S Suri
    Advanced Knowledge Engineering Center, Global Biomedical Technologies, Inc., Roseville, CA, USA. Electronic address: jsuri@comcast.net.