Learning Universal Representations of Intermolecular Interactions with ATOMICA
Journal: bioRxiv
Published Date: Mar 16, 2026
Abstract
Molecular interactions underlie nearly all biological processes, but many representation learning models either focus on single entities or are trained for a narrow set of interaction settings. Here, we introduce ATOMICA, a geometric deep learning model that learns atomic-scale representations of intermolecular interfaces across five modalities: proteins, small molecules, metal ions, lipids, and nucleic acids. ATOMICA is trained on 2,037,972 interaction complexes to generate embeddings of interaction interfaces at the levels of atoms, chemical blocks, and molecular interfaces. The latent space is multiscale and reflects physicochemical features shared across molecular classes. On the RNAGlib 3D structure-function benchmark, ATOMICA attains the best performance across four tasks. In protein pocket ligand classification, ATOMICA improves upon established protein pocket encoders and is comparable to protein language models. Using the shared embedding space, we embed orthosteric PPI inhibitors and find that inhibitor embeddings are more similar to interface embeddings proximal to the native binding site across protein-peptide and protein-protein complexes. We use ATOMICA to suggest putative ligands for pockets in the dark proteome, the set of proteins lacking known function. In total, ligands are predicted for 2,646 dark protein pockets, and heme binding is experimentally confirmed for five ATOMICA predictions. ATOMICA opens new avenues for learning representations of intermolecular interactions.
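
The comparison of inhibitor embeddings with interface embeddings in a shared space can be illustrated with a minimal sketch. The abstract does not state which similarity measure is used, so cosine similarity is an assumption here. The function names (`cosine_similarity`, `rank_by_similarity`) and the three-dimensional toy vectors are hypothetical placeholders, not ATOMICA outputs or part of its API.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def rank_by_similarity(query, candidates):
    """Return (name, similarity) pairs sorted from most to least similar."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors standing in for embeddings (made up for illustration only).
inhibitor = [0.9, 0.1, 0.3]
interfaces = {
    "near_binding_site": [0.8, 0.2, 0.4],
    "distal_site": [-0.1, 0.9, 0.2],
}

ranking = rank_by_similarity(inhibitor, interfaces)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

In this toy setup, the interface whose vector points in nearly the same direction as the inhibitor vector ranks first. Real embeddings would be much higher-dimensional and would come from the trained model.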