A deep neural network-based approach for prediction of mutagenicity of compounds.

Journal: Environmental science and pollution research international
Published Date:

Abstract

We are exposed almost every day to chemical compounds present in the environment, cosmetics, and drugs. Mutagenicity is an important property that plays a significant role in establishing a chemical compound's safety. Exposure to and handling of mutagenic chemicals in the environment pose a high health risk; therefore, identification and screening of these chemicals are essential. Given time constraints and the pressure to avoid the use of laboratory animals, a shift to alternative methodologies that enable rapid and cost-effective detection without undue over-conservatism seems critical. In this regard, computational detection and identification of mutagens in samples such as drugs, pesticides, dyes, reagents, wastewater, cosmetics, and other substances is vital. Over the last two decades, there have been numerous efforts to develop prediction models for mutagenicity, and machine learning methods have so far demonstrated noteworthy performance and reliability. However, the accuracy of such prediction models has always been a major concern for researchers working in this area. In this study, mutagenicity prediction models were developed using a deep neural network (DNN), support vector machine, k-nearest neighbor, and random forest. The classifiers were trained on 3039 compounds and validated on 1014 compounds, each encoded with 1597 molecular features. The DNN-based prediction model yielded the highest prediction accuracy: 92.95% on the training data and 83.81% on the test data. The areas under the receiver operating characteristic curve and the precision-recall curve were 0.894 and 0.838, respectively. The DNN-based classifier not only fits the data better than the traditional machine learning algorithms, viz., support vector machine, k-nearest neighbor, and random forest (with and without feature reduction), but also yields better performance metrics.
In the current work, we propose a DNN-based model to predict the mutagenicity of compounds.
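
The abstract reports three kinds of metric for a binary mutagen/non-mutagen classifier: accuracy, area under the receiver operating characteristic curve, and area under the precision-recall curve. The following is a minimal sketch of how such metrics can be computed from predicted scores, using only the Python standard library. The function names and the toy labels and scores are illustrative assumptions; they are not the authors' code or data, and the precision-recall area is approximated here by average precision, which may differ from the estimator used in the paper.

```python
from typing import Sequence

# Dataset sizes quoted in the abstract; together they imply a roughly 75/25 split.
N_TRAIN, N_TEST = 3039, 1014
TRAIN_FRACTION = N_TRAIN / (N_TRAIN + N_TEST)


def accuracy(y_true: Sequence[int], y_score: Sequence[float],
             threshold: float = 0.5) -> float:
    """Fraction of compounds whose thresholded score matches the label."""
    hits = sum((s >= threshold) == bool(t) for t, s in zip(y_true, y_score))
    return hits / len(y_true)


def roc_auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """ROC AUC as the probability that a random positive outranks a random
    negative, counting ties as one half. Needs both classes to be present."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """Step-wise area under the precision-recall curve: the mean of the
    precision measured at the rank of each positive. Assumes untied scores
    and at least one positive label."""
    ranked = sorted(zip(y_score, y_true), key=lambda pair: -pair[0])
    true_positives = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            true_positives += 1
            precision_sum += true_positives / rank
    return precision_sum / true_positives


# Hypothetical toy data: 1 = mutagenic, 0 = non-mutagenic.
labels = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]

print(f"train fraction:    {TRAIN_FRACTION:.3f}")
print(f"accuracy:          {accuracy(labels, scores):.4f}")
print(f"ROC AUC:           {roc_auc(labels, scores):.4f}")
print(f"average precision: {average_precision(labels, scores):.4f}")
```

The pairwise ROC AUC above is quadratic in the number of compounds, which is fine for a test set of about a thousand compounds; a rank-based implementation would scale better for larger sets.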

Authors

  • Rajnish Kumar
    Department of Medical Laboratory Technology, School of Allied Health Sciences, Delhi Pharmaceutical Sciences and Research University, Delhi 110017, India.
  • Farhat Ullah Khan
    Computer and Information Sciences Department, Universiti Teknologi Petronas, 32610, Seri Iskander, Perak, Malaysia.
  • Anju Sharma
    Amity Institute of Biotechnology, Amity University Uttar Pradesh, Lucknow, 226028, Uttar Pradesh, India.
  • Mohammed Haris Siddiqui
    Department of Bioengineering, Integral University, Dasauli, P.O. Basha, Kursi Road, Lucknow, Uttar Pradesh, India.
  • Izzatdin Ba Aziz
    Computer and Information Sciences Department, Universiti Teknologi Petronas, 32610, Seri Iskander, Perak, Malaysia.
  • Mohammad Amjad Kamal
    West China School of Nursing / Institutes for Systems Genetics, Frontiers Science Center for Disease-related Molecular Network, West China Hospital, Sichuan University, Chengdu 610041, Sichuan, China.
  • Ghulam Md Ashraf
    Pre-Clinical Research Unit, King Fahd Medical Research Center, King Abdulaziz University, Jeddah, Saudi Arabia. gashraf@kau.edu.sa.
  • Badrah S Alghamdi
    Pre-Clinical Research Unit, King Fahd Medical Research Center, King Abdulaziz University, Jeddah, Saudi Arabia.
  • Md Sahab Uddin
    Department of Pharmacy, Southeast University, Dhaka, Bangladesh. msu-neuropharma@hotmail.com.