A comprehensive machine learning for high throughput Tuberculosis sequence analysis, functional annotation, and visualization.

Journal: Scientific Reports
Published Date:

Abstract

Machine learning (ML), a branch of artificial intelligence (AI), allows computers, with human guidance, to learn from data, detect trends, and make predictions, so that software can adapt and improve as new information arrives. Applied to imaging scans, pattern recognition can help predict outcomes, diagnose disorders, and suggest treatments. Tuberculosis (TB) remains the most common bacterial disease affecting humans. The World Health Organization reported that 1.3 million people died from tuberculosis in 2022, and the death rate can reach 66% if proper treatment is not provided. We trained supervised ML algorithms (XGBoost, Logistic Regression, Random Forest Classifier, AdaBoost, and Support Vector Machine) to classify TB patients from large RNA-sequence count data. These algorithms achieved prediction accuracies of 0.963, 0.739, 0.773, 0.866, and 0.866, respectively. This article highlights feature importance techniques based on XGBoost, the model with the highest prediction accuracy (0.963), to identify significant genes in TB RNA-sequence count data. Using the key features selected by machine learning, we identified 20 pathways, 24 gene ontologies, 20 hub genes, and 22 drugs. We then applied advanced computational techniques, including pathway analysis, gene ontology (GO) analysis, hub-protein and protein-protein interaction (PPI) analysis, transcriptomic and miRNA interaction analysis, and drug-protein interaction analysis, to analyze 100 highly expressed genes.
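The workflow summarized above (compare several supervised classifiers, then rank genes by feature importance from the boosted-tree model) can be illustrated with a minimal sketch. This is not the authors' code or data: the count matrix is synthetic, the gene names and the three "informative" genes are hypothetical, and scikit-learn's GradientBoostingClassifier is swapped in as a stand-in for XGBoost so that the example needs only NumPy and scikit-learn.

```python
import numpy as np
from sklearn.ensemble import (
    AdaBoostClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic stand-in for an RNA-seq count matrix (assumption, not real data).
n_samples, n_genes = 200, 50
gene_names = [f"gene_{i}" for i in range(n_genes)]  # hypothetical names
labels = rng.integers(0, 2, size=n_samples)  # 1 = TB, 0 = control

# Assumption: three genes are strongly up-regulated in TB samples.
informative = [0, 1, 2]
means = np.full((n_samples, n_genes), 20.0)
for g in informative:
    means[labels == 1, g] = 60.0
counts = rng.poisson(means)

# Log-transform counts, a common variance-stabilizing step for count data.
X = np.log1p(counts)
X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.3, stratify=labels, random_state=0
)

boosted_key = "GradientBoosting (XGBoost stand-in)"
models = {
    boosted_key: GradientBoostingClassifier(random_state=0),
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "RandomForest": RandomForestClassifier(n_estimators=100, random_state=0),
    "AdaBoost": AdaBoostClassifier(random_state=0),
    "SVM": SVC(kernel="linear"),
}

# Fit every model on the training split and score it on the held-out split.
accuracies = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    accuracies[name] = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: accuracy = {accuracies[name]:.3f}")

# Rank genes by the boosted model's impurity-based feature importance.
importances = models[boosted_key].feature_importances_
top_idx = np.argsort(importances)[::-1][:5]
top_genes = [gene_names[i] for i in top_idx]
print("Top genes by importance:", top_genes)
```

If the xgboost package is installed, `xgboost.XGBClassifier` follows the same scikit-learn interface and also exposes `feature_importances_`, so it could replace the stand-in here. The genes ranked in this way would then feed downstream steps such as pathway or PPI analysis.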

Authors

  • Md Saddam Hossain
    Department of Statistics, Research Division, Population Council, Dhaka, Bangladesh.
  • Md Parvez Khandocar
    Department of Biomedical Engineering, Faculty of Engineering and Technology, Islamic University, Kushtia, 7003, Bangladesh.
  • Farzana Akter Riti
    Department of Biomedical Engineering, Faculty of Engineering and Technology, Islamic University, Kushtia, 7003, Bangladesh.
  • Md Yeakub Ali
    Department of Biomedical Engineering, Faculty of Engineering and Technology, Islamic University, Kushtia, 7003, Bangladesh.
  • Prithbey Raj Dey
    Department of Industrial and Production Engineering, Faculty of Mechanical Engineering, Dhaka University of Engineering and Technology, Gazipur, 1707, Bangladesh.
  • S M Jahurul Haque
    Department of Biomedical Engineering, Faculty of Engineering and Technology, Islamic University, Kushtia, 7003, Bangladesh.
  • Amira Metouekel
    University of Technology of Compiègne, EA 4297 TIMR, 60205 Compiègne Cedex, France.
  • Atrsaw Asrat Mengistie
    Department of Biology, Bahir Dar University, P.O. Box 79, Bahir Dar, Ethiopia. smartresercher@gmail.com.
  • Mohammed Bourhia
    Laboratory of Biotechnology and Natural Resources Valorization, Faculty of Sciences, Ibn Zohr University, 80060, Agadir, Morocco.
  • Farid Khallouki
    Ethnopharmacology and Pharmacognosy Team, Department of Biology, Moulay Ismail University of Meknes, Errachidia, Morocco.
  • Khalid S Almaary
    Department of Botany and Microbiology, College of Science, King Saud University, P. O. BOX 2455, 11451, Riyadh, Saudi Arabia. kalmaary@ksu.edu.sa.