Compression of molecular fingerprints with autoencoder networks.

Journal: Molecular informatics
Published Date:

Abstract

Several binary molecular fingerprints were compressed using an autoencoder neural network. We analyzed the impact of compression on fingerprint performance in downstream classification and regression tasks. Classifiers trained on compressed fingerprints were negligibly affected. Regression models benefitted from compression, especially of long fingerprints (Morgan, RDK). However, their performance dropped rapidly for compression levels exceeding 90 %. Property co-learning positively influenced the predictive power of the compressed fingerprints, with a mean score improvement of up to 20 %. This suggests that autoencoder compression with property co-learning biases the molecular representation toward the predicted target, facilitating downstream training.
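
To make the idea of compression with property co-learning concrete, the following is a minimal sketch rather than the authors' implementation. The abstract does not specify the architecture, so everything here is an assumption made for illustration: the single-layer encoder and decoder, the tanh and sigmoid activations, the loss weighting `alpha`, the names `latent_size` and `CoLearningAutoencoder`, and the reading of "compression level" as one minus the ratio of latent width to fingerprint length. It shows only the forward pass and the combined loss (bit reconstruction plus property prediction from the latent code), not training.

```python
import numpy as np


def latent_size(n_bits: int, compression: float) -> int:
    """Latent width for a compression level (assumed: 1 - latent / input)."""
    return max(1, round(n_bits * (1.0 - compression)))


def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))


class CoLearningAutoencoder:
    """Hypothetical one-layer autoencoder with an extra property head."""

    def __init__(self, n_bits: int, compression: float, seed: int = 0):
        rng = np.random.default_rng(seed)
        k = latent_size(n_bits, compression)
        self.W_enc = rng.normal(0.0, 0.05, (n_bits, k))   # fingerprint -> latent
        self.W_dec = rng.normal(0.0, 0.05, (k, n_bits))   # latent -> fingerprint
        self.w_prop = rng.normal(0.0, 0.05, (k,))         # latent -> property

    def encode(self, x: np.ndarray) -> np.ndarray:
        """Compressed fingerprint used by downstream models."""
        return np.tanh(x @ self.W_enc)

    def loss(self, x: np.ndarray, y: np.ndarray, alpha: float = 0.5) -> float:
        """Reconstruction loss plus alpha-weighted property loss."""
        z = self.encode(x)
        x_hat = sigmoid(z @ self.W_dec)   # predicted bit probabilities
        y_hat = z @ self.w_prop           # predicted property value
        eps = 1e-9                        # avoids log(0)
        bce = -np.mean(x * np.log(x_hat + eps)
                       + (1.0 - x) * np.log(1.0 - x_hat + eps))
        mse = np.mean((y - y_hat) ** 2)
        return float(bce + alpha * mse)


# Usage with random stand-in data (not real fingerprints or properties).
rng = np.random.default_rng(1)
x = rng.integers(0, 2, size=(4, 2048)).astype(float)
y = rng.normal(size=4)
model = CoLearningAutoencoder(n_bits=2048, compression=0.9)
z = model.encode(x)
total = model.loss(x, y)
```

Setting `alpha` to zero in this sketch gives a plain autoencoder objective; a positive value adds the property term, so the latent code is also shaped by the prediction target.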

Authors

  • Agnieszka Ilnicka
    Department of Chemistry and Applied Biosciences, ETH Zürich, Vladimir-Prelog-Weg 4, 8093, Zurich, Switzerland.
  • Gisbert Schneider
    Swiss Federal Institute of Technology (ETH), Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 4, CH-8093, Zurich, Switzerland.