polyGen: A Learning Framework for Atomic-level Polymer Structure Generation
Journal: arXiv
Published Date: Apr 24, 2025
Abstract
Synthetic polymeric materials underpin fundamental technologies in the
energy, electronics, consumer goods, and medical sectors, yet their development
still suffers from prolonged design timelines. Although polymer informatics
tools have accelerated this process, polymer simulation protocols still face a
significant challenge: the on-demand generation of realistic 3D atomic structures
that respect the conformational diversity of polymers. Generative
algorithms for 3D structures of inorganic crystals, bio-polymers, and small
molecules exist, but have not addressed synthetic polymers. In this work, we
introduce polyGen, the first latent diffusion model designed specifically to
generate realistic polymer structures from minimal inputs such as the repeat
unit chemistry alone, leveraging a molecular encoding that captures polymer
connectivity throughout the architecture. Because only 3855 DFT-optimized
polymer structures are available, we augment training with DFT-optimized
molecular structures and observe improved joint learning across similar
chemical structures. We also establish structure matching criteria to benchmark
our approach on this novel problem. polyGen effectively generates diverse
conformations of both linear chains and complex branched structures, though its
performance decreases when handling repeat units with a high atom count. Given
these initial results, polyGen represents a paradigm shift in atomic-level
structure generation for polymer science: the first proof of concept for
predicting realistic atomic-level polymer conformations while accounting for
their intrinsic structural flexibility.