Biological Sequence with Language Model Prompting: A Survey
Journal: arXiv
Published Date: Mar 6, 2025
Abstract
Large language models (LLMs) have emerged as powerful tools for addressing
challenges across diverse domains. Notably, recent studies have demonstrated
that LLMs significantly enhance the efficiency of biomolecular
analysis and synthesis, attracting widespread attention from academia and
medicine. In this paper, we systematically investigate the application of
prompt-based methods with LLMs to biological sequences, covering DNA, RNA,
and proteins, as well as to drug discovery tasks. Specifically, we focus on how prompt
engineering enables LLMs to tackle domain-specific problems, such as promoter
sequence prediction, protein structure modeling, and drug-target binding
affinity prediction, often with limited labeled data. Furthermore, our
discussion highlights the transformative potential of prompting in
bioinformatics while addressing key challenges such as data scarcity,
multimodal fusion, and computational resource limitations. We intend this
paper to serve both as a foundational primer for newcomers and as a catalyst
for continued innovation in this dynamic field.