Fuzz Testing Molecular Representation Using Deep Variational Anomaly Generation.

Journal: Journal of Chemical Information and Modeling

Published Date: Feb 5, 2025

Abstract

Researchers are developing increasingly robust molecular representations, motivating the need for thorough methods to stress-test and validate them. Here, we use a variational auto-encoder (VAE), an unsupervised deep learning model, to generate anomalous examples of SELF-referencIng Embedded Strings (SELFIES), a popular molecular string format. These anomalies defy the assertion that all SELFIES convert into valid SMILES strings. Interestingly, we find specific regions within the VAE's internal landscape (latent space), whose decoding frequently generates inconvertible SELFIES anomalies. The model's internal landscape self-organization helps with exploring factors affecting molecular representation reliability. We show how VAEs and similar anomaly generation methods can empirically stress-test molecular representation robustness. Additionally, we investigate reasons for the invalidity of some discovered SELFIES strings (version 2.1.1) and suggest changes to improve them, aiming to spark ongoing molecular representation improvement.
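
The fuzz-testing loop described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function `fuzz_latent_space`, the standard-normal sampling of latent vectors, and the grouping of samples by the sign pattern of their coordinates are all choices made for this sketch. The decoder and the convertibility check are passed in as callables, and the two `toy_*` functions are hypothetical stand-ins for a trained VAE decoder and a real SELFIES-to-SMILES check.

```python
import random
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Latent = Tuple[float, ...]
Region = Tuple[int, ...]


def fuzz_latent_space(
    decode: Callable[[Latent], str],
    is_convertible: Callable[[str], bool],
    n_samples: int = 1000,
    latent_dim: int = 2,
    seed: int = 0,
) -> Tuple[List[Tuple[Latent, str]], Dict[Region, float]]:
    """Sample latent vectors, decode them, and record conversion failures.

    Returns the anomalous (latent vector, string) pairs and the failure
    rate per region. A "region" here is the sign pattern of the latent
    coordinates, which is an assumption of this sketch.
    """
    rng = random.Random(seed)
    anomalies: List[Tuple[Latent, str]] = []
    totals: Dict[Region, int] = defaultdict(int)
    failures: Dict[Region, int] = defaultdict(int)

    for _ in range(n_samples):
        # Draw a latent vector from a standard normal prior (assumed).
        z = tuple(rng.gauss(0.0, 1.0) for _ in range(latent_dim))
        region = tuple(1 if c >= 0 else -1 for c in z)
        candidate = decode(z)
        totals[region] += 1
        if not is_convertible(candidate):
            failures[region] += 1
            anomalies.append((z, candidate))

    rates = {region: failures[region] / totals[region] for region in totals}
    return anomalies, rates


def toy_decode(z: Latent) -> str:
    # Hypothetical stand-in for a trained VAE decoder; "[Bad]" is a
    # made-up token, not real SELFIES or real model behavior.
    return "[C][Bad]" if z[0] > 1.0 else "[C][C]"


def toy_is_convertible(candidate: str) -> bool:
    # Hypothetical stand-in for a SELFIES-to-SMILES validity check.
    return "[Bad]" not in candidate


if __name__ == "__main__":
    found, rates = fuzz_latent_space(toy_decode, toy_is_convertible, n_samples=500)
    print(f"{len(found)} anomalous strings found")
    for region, rate in sorted(rates.items()):
        print(region, f"{rate:.2f}")
```

In practice, `decode` would wrap the trained VAE's decoder, and `is_convertible` could, for example, call `selfies.decoder` on the string and then test whether RDKit's `Chem.MolFromSmiles` returns a molecule rather than `None`. Regions with a high failure rate are candidates for closer inspection.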

Authors

Victor H R Nogueira
São Carlos Institute of Physics, University of São Paulo, São Paulo 13563-120, Brazil.

Rishabh Sharma
Mechanical Engineering Department, The NorthCap University, Gurugram, Haryana, India.

Rafael V C Guido
São Carlos Institute of Physics, University of São Paulo, São Paulo 13563-120, Brazil.

Michael J Keiser
Department of Pharmaceutical Chemistry, Department of Bioengineering and Therapeutic Sciences, Institute for Neurodegenerative Diseases and Bakar Institute for Computational Health Sciences, University of California-San Francisco, 675 Nelson Rising Lane, San Francisco, California 94158, United States.

Keywords

Deep Learning

External Resources

PubMed ID: 39908426
