A Machine Learning Approach to Molecular Initiating Event Prediction Using High-Throughput Transcriptomic Chemical Screening Data.

Journal: Journal of chemical information and modeling
Published Date:

Abstract

Improved scalability of high-throughput RNA-sequencing technologies has contributed to their proposed use in regulatory contexts for chemical hazard identification. However, the high dimensionality and size of these transcriptomic data sets present a formidable obstacle to using these data to address gaps in chemical safety information. New bioinformatic approaches for extracting mechanistic insight from transcriptomic data sets are critically needed. Here, we present a framework for predicting molecular initiating events (MIEs) from transcriptomic chemical bioactivity screens. This framework trains models to predict MIE activation using Machine Learning (MIEML). Classifiers were trained by integrating gene expression profiles, derived from chemical exposures in MCF-7 cells, with chemical-MIE annotations from RefChemDB, a database that identifies reference chemicals for molecular targets by aggregating data across multiple sources. Classifiers were trained to predict the activation of nine distinct MIEs using profiles derived from reference chemical exposures. Of the nine MIEs modeled, three yielded classifiers that performed significantly better than classifiers trained on randomly selected profiles (p value ≤ 0.1) and correctly predicted MIEs of training-excluded reference chemicals. These classifiers were then used to generate MIE predictions for 1750 test chemicals. These predictions showed significant overlap with those derived from targeted molecular bioassay data for estrogen receptor agonism, aryl hydrocarbon receptor agonism, and glucocorticoid receptor agonism. These results demonstrate the utility of this framework in predicting MIE activation from large transcriptomic chemical screens.
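
The comparison against classifiers trained on randomly selected profiles can be illustrated with an empirical p value. The abstract does not say how the p value was computed, so the following is only a minimal sketch under an assumed convention: the function name `empirical_p_value`, the add-one correction, and all numeric values are hypothetical and are not taken from the study.

```python
from typing import Sequence


def empirical_p_value(observed: float, null_scores: Sequence[float]) -> float:
    """Fraction of null classifiers scoring at least as well as the observed one.

    Assumption: higher scores are better, and an add-one correction is used so
    that the estimate is never exactly zero.
    """
    if not null_scores:
        raise ValueError("null_scores must contain at least one value")
    # Count null classifiers (trained on randomly selected profiles) that
    # match or beat the classifier trained on reference chemical profiles.
    n_as_good = sum(1 for score in null_scores if score >= observed)
    return (n_as_good + 1) / (len(null_scores) + 1)


# Hypothetical performance scores for illustration only (not study results).
null_scores = [0.48, 0.52, 0.55, 0.50, 0.47, 0.53, 0.51, 0.49, 0.54]
observed_score = 0.80

p = empirical_p_value(observed_score, null_scores)
# Threshold of 0.1 taken from the abstract's significance criterion.
print(f"empirical p value = {p:.2f}; passes threshold: {p <= 0.1}")
```

With only nine null classifiers, the smallest attainable value under this convention is 0.1, so in practice a larger null set would be needed to resolve smaller p values.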

Authors

  • Joseph L Bundy
    Center for Computational Toxicology & Exposure, Office of Research and Development, US Environmental Protection Agency, 109 TW Alexander Dr, Durham, North Carolina 27709, United States.
  • Jesse D Rogers
    Center for Computational Toxicology & Exposure, Office of Research and Development, US Environmental Protection Agency, 109 TW Alexander Dr, Durham, North Carolina 27709, United States.
  • Imran Shah
    National Center for Computational Toxicology, Office of Research and Development, U.S. Environmental Protection Agency, Research Triangle Park, Durham, North Carolina 27711, United States.
  • Richard J Judson
    Center for Computational Toxicology & Exposure, Office of Research and Development, US Environmental Protection Agency, 109 TW Alexander Dr, Durham, North Carolina 27709, United States.
  • Logan J Everett
    Center for Computational Toxicology & Exposure, Office of Research and Development, US Environmental Protection Agency, 109 TW Alexander Dr, Durham, North Carolina 27709, United States.
  • Joshua A Harrill
    Center for Computational Toxicology & Exposure, Office of Research and Development, US Environmental Protection Agency, 109 TW Alexander Dr, Durham, North Carolina 27709, United States.