Generative Artificial Intelligence GPT-4 Accelerates Knowledge Mining and Machine Learning for Synthetic Biology.

Journal: ACS Synthetic Biology
Published Date:

Abstract

Knowledge mining from synthetic biology journal articles for machine learning (ML) applications is a labor-intensive process. The development of natural language processing (NLP) tools, such as GPT-4, can accelerate the extraction of published information related to microbial performance under complex strain engineering and bioreactor conditions. As a proof of concept, we proposed prompt engineering for a GPT-4 workflow pipeline to extract knowledge from 176 publications on two oleaginous yeasts. After human intervention, the pipeline obtained a total of 2037 data instances. The structured data sets and feature selections enabled ML approaches (e.g., a random forest model) to predict fermentation titers with decent accuracy (R² of 0.86 for unseen test data). Via transfer learning, the trained model could assess the production potential of an engineered nonconventional yeast for which there are fewer published reports. This work demonstrated the potential of generative artificial intelligence to streamline information extraction from research articles, thereby facilitating fermentation predictions and biomanufacturing development.
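
The step from model output to structured data instances, with human review of doubtful rows, might look roughly like the sketch below. It is an illustration only, not the authors' pipeline: the field names (product, carbon_source, titer_g_per_l, temperature_c), the parse_extraction helper, and the example reply are all hypothetical, and the sketch assumes the language model has been prompted to answer with a JSON list of records.

```python
import json
from dataclasses import dataclass
from typing import Optional

# Hypothetical schema: these field names are illustrative assumptions,
# not the feature set used in the paper.
REQUIRED_FIELDS = ("product", "carbon_source", "titer_g_per_l")


@dataclass
class DataInstance:
    product: str
    carbon_source: str
    titer_g_per_l: float
    temperature_c: Optional[float] = None  # optional bioreactor condition


def parse_extraction(raw_json: str):
    """Split a JSON reply into clean instances and rows that need human review."""
    accepted, needs_review = [], []
    for row in json.loads(raw_json):
        # A required field counts as missing if it is absent, None, or empty.
        missing = [f for f in REQUIRED_FIELDS if row.get(f) in (None, "")]
        try:
            if missing:
                raise ValueError(f"missing fields: {missing}")
            titer = float(row["titer_g_per_l"])  # fails on text like "not reported"
            if titer < 0:
                raise ValueError("negative titer")
            temp = row.get("temperature_c")
            accepted.append(DataInstance(
                product=str(row["product"]),
                carbon_source=str(row["carbon_source"]),
                titer_g_per_l=titer,
                temperature_c=None if temp is None else float(temp),
            ))
        except (ValueError, TypeError) as err:
            # Keep the raw row and the reason so a person can correct it later.
            needs_review.append({"row": row, "reason": str(err)})
    return accepted, needs_review


# Made-up reply text for illustration only; not data from the publications.
example_reply = json.dumps([
    {"product": "lipids", "carbon_source": "glucose",
     "titer_g_per_l": "12.5", "temperature_c": 30},
    {"product": "lipids", "carbon_source": "xylose",
     "titer_g_per_l": "not reported"},
    {"product": "", "carbon_source": "glycerol", "titer_g_per_l": 3.1},
])

accepted, needs_review = parse_extraction(example_reply)
print(len(accepted), "accepted;", len(needs_review), "flagged for review")
```

Rows that fail validation are kept with a reason instead of being dropped, so a human curator can correct them before they enter the training set. The accepted instances could then be turned into a feature table for a regression model such as a random forest.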

Authors

  • Zhengyang Xiao
    Department of Energy, Environmental, and Chemical Engineering, Washington University in St. Louis, St. Louis, Missouri 63130, United States.
  • Wenyu Li
    School of Marxism, Capital Normal University, Beijing, China.
  • Hannah Moon
    ImpactDB LLC, St. Louis, Missouri 63105, United States.
  • Garrett W Roell
    ImpactDB LLC, St. Louis, Missouri 63105, United States.
  • Yixin Chen
    Department of Computer Science and Engineering, Washington University in St Louis, St Louis, MO, 63110, USA.
  • Yinjie J Tang
    Department of Energy, Environmental and Chemical Engineering, Washington University in Saint Louis, Saint Louis, MO 63130, USA. Electronic address: yinjie.tang@wustl.edu.