Evolutionary Tree in Chemical Space of Natural Products
Journal: bioRxiv
Published Date: Jan 1, 2025
Abstract
Natural products (NPs) are key to biological function and adaptation, and their distribution is shaped by complex evolutionary and ecological forces. While it may seem reasonable to assume that closely related species produce chemically similar NPs, this assumption has not been systematically tested at a broad taxonomic scale. Here, we evaluate whether evolutionary (taxonomic) proximity correlates with chemical similarity in large-scale data from the LOTUS database of NPs. We use five deep learning-based encoders, including Chemformer and SMILES Transformer, to embed NPs into a high-dimensional “chemical space.” Our results demonstrate that, for flowering plants (Magnoliopsida) and conifers (Pinopsida), species separated by shorter taxonomic distances tend to produce significantly more similar NPs. Similar trends are observed for Fungi and Metazoa, albeit with some complications, possibly due to horizontal gene transfer, convergent evolution, and/or incomplete coverage of the NP dataset. Our findings suggest that the evolutionary tree can be statistically recovered in a chemical space of NPs, provided that this space is constructed with appropriate deep learning techniques, and they provide a new computational framework for investigating the evolutionary dynamics of secondary metabolism. These results can inform drug design strategies, for example by enabling the reconstruction of NPs from poorly studied or extinct species.
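The comparison described in the abstract, taxonomic distance against chemical similarity, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the three-dimensional vectors are invented stand-ins for encoder outputs such as those of Chemformer, taxonomic distance is counted as the number of ranks below the deepest shared rank, chemical distance is the cosine distance between per-species mean embeddings, and the association is summarized with a Spearman rank correlation. The abstract does not state which distance or statistic the authors actually use.

```python
from itertools import combinations
from math import sqrt


def taxonomic_distance(lineage_a, lineage_b):
    """Number of ranks below the deepest rank shared by two lineages."""
    shared = 0
    for a, b in zip(lineage_a, lineage_b):
        if a != b:
            break
        shared += 1
    return len(lineage_a) - shared


def mean_vector(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def cosine_distance(u, v):
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = sqrt(sum(x * x for x in u)) * sqrt(sum(y * y for y in v))
    return 1.0 - dot / norm


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Toy input: lineage (class, order, family, genus) plus INVENTED NP embeddings.
species = {
    "Mentha sp.": (("Magnoliopsida", "Lamiales", "Lamiaceae", "Mentha"),
                   [[1.0, 0.1, 0.0], [0.9, 0.2, 0.0]]),
    "Salvia sp.": (("Magnoliopsida", "Lamiales", "Lamiaceae", "Salvia"),
                   [[0.8, 0.3, 0.1]]),
    "Rosa sp.": (("Magnoliopsida", "Rosales", "Rosaceae", "Rosa"),
                 [[0.5, 0.6, 0.2]]),
    "Pinus sp.": (("Pinopsida", "Pinales", "Pinaceae", "Pinus"),
                  [[0.0, 0.3, 1.0]]),
}

tax_dists, chem_dists = [], []
for a, b in combinations(species, 2):
    (lin_a, emb_a), (lin_b, emb_b) = species[a], species[b]
    tax_dists.append(taxonomic_distance(lin_a, lin_b))
    chem_dists.append(cosine_distance(mean_vector(emb_a), mean_vector(emb_b)))

rho = spearman(tax_dists, chem_dists)
print(f"Spearman rho between taxonomic and chemical distance: {rho:.3f}")
```

A positive rho on real data would correspond to the trend reported for Magnoliopsida and Pinopsida: species that are closer taxonomically sit closer together in the embedding space. With real encoders the vectors would have hundreds of dimensions, and a significance test that accounts for the non-independence of species pairs (for example a permutation test) would be needed before drawing conclusions.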