Deciphering genomic codes using advanced natural language processing techniques: a scoping review.
Journal: Journal of the American Medical Informatics Association : JAMIA
PMID: 39998912
Abstract
OBJECTIVES: The vast and complex nature of human genomic sequencing data presents challenges for effective analysis. This review investigates the application of natural language processing (NLP) techniques, particularly large language models (LLMs) and transformer architectures, to deciphering genomic codes, focusing on tokenization, transformer models, and regulatory annotation prediction. The goal is to assess data and model accessibility in the most recent literature and thereby better understand the existing capabilities and constraints of these tools in processing genomic sequencing data.