A comprehensive machine learning for high throughput Tuberculosis sequence analysis, functional annotation, and visualization.

Journal: Scientific Reports

Published Date: Jul 16, 2025

Abstract

With human guidance, computers now use machine learning (ML), a branch of artificial intelligence (AI), to learn from data, detect trends, and make predictions, and software can adapt and improve as new information arrives. Applied to imaging scans, pattern recognition helps predict outcomes, diagnose disorders, and suggest treatments. Tuberculosis (TB) remains the most common bacterial disease affecting humans. The World Health Organisation reported that 1.3 million people died from tuberculosis in 2022, and the death rate can potentially reach 66% if proper treatment isn't provided. We trained supervised ML algorithms, namely XGBoost, Logistic Regression, Random Forest Classifier, AdaBoost, and Support Vector Machine, to help classify TB patients from large RNA-sequence count data. These algorithms achieved prediction accuracies of 0.963, 0.739, 0.773, 0.866, and 0.866, respectively. This article highlights feature importance techniques based on XGBoost, the model with the highest prediction accuracy (0.963), to identify significant genes in TB RNA-sequence count data. Using the key machine learning features, we identified 20 pathways, 24 gene ontologies, 20 hub genes, and 22 drugs. We then applied advanced computational techniques, including pathway analysis, gene ontology (GO) analysis, hub-protein and protein-protein interaction (PPI) analysis, transcriptomic and miRNA interactions, and drug-protein interactions, to help analyze 100 highly expressed genes.
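The model comparison and gene selection described in the abstract can be illustrated with a minimal sketch. It pairs the five reported accuracies with the models in the order the abstract lists them, picks the best-scoring model, and selects the top-ranked genes from a table of importance scores. The function names, the gene names, and the importance values are hypothetical placeholders, not data from the study; in practice the scores would come from a trained model such as XGBoost.

```python
from typing import Dict, List, Sequence

# Accuracies as reported in the abstract, paired with the models in listed order.
REPORTED_ACCURACY: Dict[str, float] = {
    "XGBoost": 0.963,
    "Logistic Regression": 0.739,
    "Random Forest": 0.773,
    "AdaBoost": 0.866,
    "Support Vector Machine": 0.866,
}

# Hypothetical importance scores for placeholder genes (NOT from the study).
EXAMPLE_IMPORTANCE: Dict[str, float] = {
    "GENE_A": 0.41,
    "GENE_B": 0.07,
    "GENE_C": 0.29,
    "GENE_D": 0.23,
}


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of samples whose predicted label matches the true label."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("label sequences must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def best_model(scores: Dict[str, float]) -> str:
    """Name of the model with the highest score."""
    return max(scores, key=lambda name: scores[name])


def top_genes(importance: Dict[str, float], n: int) -> List[str]:
    """The n genes with the largest importance, ties broken alphabetically."""
    ranked = sorted(importance.items(), key=lambda kv: (-kv[1], kv[0]))
    return [gene for gene, _ in ranked[:n]]


if __name__ == "__main__":
    print(best_model(REPORTED_ACCURACY))
    print(top_genes(EXAMPLE_IMPORTANCE, 2))
    print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))
```

Calling `top_genes` with `n=100` on a full importance table would mirror the selection of 100 genes mentioned in the abstract, assuming the selection was a simple top-N cut, which the abstract does not state explicitly.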

Authors

Md Saddam Hossain
Department of Statistics, Research Division, Population Council, Dhaka, Bangladesh.

Md Parvez Khandocar
Department of Biomedical Engineering, Faculty of Engineering and Technology, Islamic University, Kushtia, 7003, Bangladesh.

Farzana Akter Riti
Department of Biomedical Engineering, Faculty of Engineering and Technology, Islamic University, Kushtia, 7003, Bangladesh.

Md Yeakub Ali
Department of Biomedical Engineering, Faculty of Engineering and Technology, Islamic University, Kushtia, 7003, Bangladesh.

Prithbey Raj Dey
Department of Industrial and Production Engineering, Faculty of Mechanical Engineering, Dhaka University of Engineering and Technology, Gazipur, 1707, Bangladesh.

S M Jahurul Haque
Department of Biomedical Engineering, Faculty of Engineering and Technology, Islamic University, Kushtia, 7003, Bangladesh.

Amira Metouekel
University of Technology of Compiègne, EA 4297 TIMR, 60205 Compiègne Cedex, France.

Atrsaw Asrat Mengistie
Department of Biology, Bahir Dar University, P.O. Box 79, Bahir Dar, Ethiopia. smartresercher@gmail.com.

Mohammed Bourhia
Laboratory of Biotechnology and Natural Resources Valorization, Faculty of Sciences, Ibn Zohr University, 80060, Agadir, Morocco.

Farid Khallouki
Ethnopharmacology and Pharmacognosy Team, Department of Biology, Moulay Ismail University of Meknes, Errachidia, Morocco.

Khalid S Almaary
Department of Botany and Microbiology, College of Science, King Saud University, P.O. Box 2455, 11451, Riyadh, Saudi Arabia. kalmaary@ksu.edu.sa.

Keywords

Algorithms; Computational Biology; Humans; Machine Learning; Molecular Sequence Annotation; Mycobacterium tuberculosis; Support Vector Machine; Tuberculosis

External Resources

PubMed ID: 40670587
