AI for predicting chemical-effect associations at the chemical universe level: deepFPlearn.

Journal: Briefings in bioinformatics
PMID:

Abstract

Many chemicals are present in our environment, and all living species are exposed to them. However, numerous chemicals pose risks, such as the development of severe diseases, if they occur at the wrong time in the wrong place. For the majority of chemicals, these risks are not known. Chemical risk assessment and subsequent regulation of use require efficient and systematic strategies. Lab-based methods, even high-throughput ones, are too slow to keep up with the pace of chemical innovation. Existing computational approaches are designed for specific chemical classes or sub-problems and cannot be applied on a large scale. Further, the application range of these approaches is limited by the small amount of available labeled training data. We present the ready-to-use, stand-alone program deepFPlearn, which predicts the association between chemical structures and effects on the gene/pathway level using a combined deep learning approach. deepFPlearn uses a deep autoencoder for feature reduction before training a deep feed-forward neural network to predict the target association. We achieved good prediction quality and showed that our feature compression preserves relevant chemical structural information. Using a vast chemical inventory (unlabeled data) as input for the autoencoder did not reduce prediction quality but allowed us to capture a much more comprehensive range of chemical structures. On unseen data, we predict meaningful, experimentally verified associations between chemicals and effects. deepFPlearn classifies hundreds of thousands of chemicals in seconds. We provide deepFPlearn as an open-source, flexible tool that can be easily retrained and customized to different application settings at https://github.com/yigbt/deepFPlearn.
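The two-stage design described above (an autoencoder compresses structural features, then a feed-forward network predicts the association from the compressed features) can be illustrated with a small NumPy sketch. It is not deepFPlearn's implementation or API (see the repository for that), and several things are assumptions made only for illustration: inputs are binary structural fingerprints, both networks have a single hidden layer rather than being deep, and the sizes, learning rates, synthetic "scaffold" data, and labeling rule are all hypothetical. As in the abstract, the autoencoder is fit on a larger unlabeled set, and only its encoder is reused for the labeled classification task.

```python
import numpy as np

# Hypothetical sizes chosen for illustration; not deepFPlearn's configuration.
N_BITS = 64    # length of a binary structural fingerprint (assumed input format)
N_CODE = 16    # size of the compressed (encoded) representation
N_HIDDEN = 8   # hidden units in the feed-forward classifier

rng = np.random.default_rng(0)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def bce(p, y, eps=1e-7):
    """Binary cross-entropy summed over outputs, averaged over samples."""
    p = np.clip(p, eps, 1 - eps)
    return float(-np.mean(np.sum(y * np.log(p) + (1 - y) * np.log(1 - p), axis=1)))


def train_autoencoder(X, epochs=300, lr=0.1):
    """Stage 1: fit an encoder/decoder on unlabeled fingerprints (full-batch GD)."""
    n = X.shape[0]
    W1 = rng.normal(0, 0.1, (N_BITS, N_CODE))
    b1 = np.zeros(N_CODE)
    W2 = rng.normal(0, 0.1, (N_CODE, N_BITS))
    b2 = np.zeros(N_BITS)
    losses = []
    for _ in range(epochs):
        H = sigmoid(X @ W1 + b1)          # compressed code
        X_hat = sigmoid(H @ W2 + b2)      # reconstructed fingerprint
        losses.append(bce(X_hat, X))
        dZ2 = (X_hat - X) / n             # dLoss/d(decoder logits)
        dZ1 = (dZ2 @ W2.T) * H * (1 - H)  # backpropagate through the encoder
        W2 -= lr * (H.T @ dZ2)
        b2 -= lr * dZ2.sum(axis=0)
        W1 -= lr * (X.T @ dZ1)
        b1 -= lr * dZ1.sum(axis=0)
    # Only the encoder is returned; the decoder served as the training signal.
    return lambda A: sigmoid(A @ W1 + b1), losses


def train_classifier(C, y, epochs=1000, lr=0.2):
    """Stage 2: one-hidden-layer feed-forward network on the encoded features."""
    n = C.shape[0]
    y = y.reshape(-1, 1)
    V1 = rng.normal(0, 0.5, (N_CODE, N_HIDDEN))
    c1 = np.zeros(N_HIDDEN)
    V2 = rng.normal(0, 0.5, (N_HIDDEN, 1))
    c2 = np.zeros(1)
    losses = []
    for _ in range(epochs):
        A1 = np.tanh(C @ V1 + c1)
        p = sigmoid(A1 @ V2 + c2)         # predicted association probability
        losses.append(bce(p, y))
        dZ2 = (p - y) / n
        dZ1 = (dZ2 @ V2.T) * (1 - A1 ** 2)
        V2 -= lr * (A1.T @ dZ2)
        c2 -= lr * dZ2.sum(axis=0)
        V1 -= lr * (C.T @ dZ1)
        c1 -= lr * dZ1.sum(axis=0)
    return lambda Cn: sigmoid(np.tanh(Cn @ V1 + c1) @ V2 + c2).ravel(), losses


def make_fingerprints(n, prototypes, flip=0.05):
    """Noisy copies of a few prototype fingerprints (hypothetical 'scaffolds')."""
    idx = rng.integers(0, len(prototypes), size=n)
    noise = rng.random((n, N_BITS)) < flip
    return np.logical_xor(prototypes[idx], noise).astype(float), idx


prototypes = rng.random((4, N_BITS)) < 0.3
X_unlabeled, _ = make_fingerprints(2000, prototypes)   # large unlabeled inventory
X_labeled, scaffold = make_fingerprints(200, prototypes)
y_labeled = (scaffold < 2).astype(float)  # hypothetical rule: scaffolds 0, 1 "active"

encode, ae_losses = train_autoencoder(X_unlabeled)
predict, clf_losses = train_classifier(encode(X_labeled), y_labeled)
probs = predict(encode(X_labeled[:5]))
```

The split mirrors the point made in the abstract: the autoencoder needs no labels, so it can be trained on a much larger chemical inventory than the classifier, which only ever sees the compressed features.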

Authors

  • Jana Schor
    Department Computational Biology, Helmholtz Centre for Environmental Research (UFZ), Permoserstr. 15, 04318 Leipzig, Saxony, Germany.
  • Patrick Scheibe
    Department of Neurophysics, Max Planck Institute for Human Cognitive and Brain Sciences, Stephanstraße 1a, 04103 Leipzig, Saxony, Germany.
  • Matthias Bernt
    Department Computational Biology and Chemistry - Helmholtz Centre for Environmental Research (UFZ), Permoserstraße 15, 04318, Leipzig, Germany.
  • Wibke Busch
    Department Ecotoxicology - Helmholtz Centre for Environmental Research (UFZ), Permoserstraße 15, 04318, Leipzig, Germany.
  • Chih Lai
    Graduate Program in Software & School of Engineering, University of St. Thomas, 2115 Summit Ave, St. Paul, MN 55105, Minnesota, USA.
  • Jörg Hackermüller
    Department Computational Biology, Helmholtz Centre for Environmental Research (UFZ), Permoserstr. 15, 04318 Leipzig, Saxony, Germany.