Contrastive Language-Structure Pre-training Driven by Materials Science Literature
Journal: arXiv
Published Date: Jan 22, 2025
Abstract
Understanding structure-property relationships is an essential yet
challenging aspect of materials discovery and development. To facilitate this
process, recent studies in materials informatics have sought latent embedding
spaces of crystal structures to capture their similarities based on properties
and functionalities. However, abstract feature-based embedding spaces are
hard for humans to interpret and hinder intuitive and efficient exploration of the vast
materials space. Here we introduce Contrastive Language-Structure Pre-training
(CLaSP), a learning paradigm for constructing crossmodal embedding spaces
between crystal structures and texts. CLaSP aims to achieve material embeddings
that 1) capture property- and functionality-related similarities between
crystal structures and 2) allow intuitive retrieval of materials via
user-provided description texts as queries. To compensate for the lack of
sufficient datasets linking crystal structures with textual descriptions, CLaSP
leverages a dataset of over 400,000 published crystal structures and
corresponding publication records, including paper titles and abstracts, for
training. We demonstrate the effectiveness of CLaSP through text-based crystal
structure screening and embedding space visualization.
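
To make the idea of a crossmodal embedding space concrete, the following is a minimal sketch of how paired crystal-structure and text embeddings could be aligned and then queried. It assumes a CLIP-style symmetric contrastive (InfoNCE) objective with a temperature of 0.07 and cosine-similarity retrieval; the abstract does not specify the loss, the encoders, or any hyperparameters, so the function names (`symmetric_contrastive_loss`, `retrieve`) and all values here are hypothetical illustrations, not the authors' implementation. The embeddings are taken as given arrays; the structure and text encoders that would produce them are out of scope.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products equal cosine similarity."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)


def log_softmax(logits, axis):
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def symmetric_contrastive_loss(structure_emb, text_emb, temperature=0.07):
    """CLIP-style loss (an assumption, not confirmed by the abstract).

    Row i of structure_emb and row i of text_emb are treated as a matched
    pair; every other row in the batch serves as a negative.
    """
    s = l2_normalize(structure_emb)
    t = l2_normalize(text_emb)
    logits = s @ t.T / temperature  # shape (batch, batch)
    idx = np.arange(len(s))
    # Structure-to-text: each row should put its mass on the diagonal entry.
    loss_s2t = -log_softmax(logits, axis=1)[idx, idx].mean()
    # Text-to-structure: same, but normalized over columns.
    loss_t2s = -log_softmax(logits, axis=0)[idx, idx].mean()
    return 0.5 * (loss_s2t + loss_t2s)


def retrieve(query_emb, structure_emb, k=3):
    """Return indices of the k structures most similar to one text query."""
    q = query_emb / max(np.linalg.norm(query_emb), 1e-12)
    scores = l2_normalize(structure_emb) @ q  # cosine similarity per structure
    return np.argsort(-scores)[:k]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    structures = rng.normal(size=(8, 16))  # stand-in for structure-encoder output
    # Stand-in for text-encoder output: noisy copies of the paired structures.
    texts = structures + 0.1 * rng.normal(size=(8, 16))
    print("loss:", symmetric_contrastive_loss(structures, texts))
    print("top matches for text 5:", retrieve(texts[5], structures))
```

In this sketch, text-based screening reduces to embedding a query description and ranking stored structure embeddings by cosine similarity, which is one plausible reading of "retrieval of materials via user-provided description texts as queries."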