NLP for computational insights into nutritional impacts on colorectal cancer care.

Journal: SLAS technology
Published Date:

Abstract

Colorectal cancer (CRC) is one of the most common cancers worldwide, with its incidence rising among younger adults due to improved screening practices. However, existing algorithms for CRC prediction are frequently trained on datasets that primarily reflect older adults, which limits their usefulness in more diverse populations. Additionally, the role of nutrition in CRC prevention and management is gaining significant attention, although computational approaches to analyzing the impact of diet on CRC remain underdeveloped. This research introduces the Nutritional Impact on CRC Prediction Framework (NICRP-Framework), which combines Natural Language Processing (NLP) techniques with Adaptive Tunicate Swarm Optimized Large Language Models (ATSO-LLMs) to provide insights into the role of diet in CRC care across diverse populations. The colorectal cancer dietary and lifestyle dataset, encompassing >1000 participants, is collected from multiple regions and sources. The dataset includes both structured and unstructured data, including textual descriptions of food ingredients. These descriptions are standardized through stop word removal, lowercasing, and punctuation elimination. Relevant terms are then extracted and visualized in a word cloud. The dataset also contains an imbalanced binary CRC outcome, which is rebalanced using random oversampling. ATSO-LLMs are employed to analyze the processed dietary data, identifying key nutritional factors and predicting CRC and non-CRC phenotypes based on dietary patterns. The results show that combining NLP-derived features with ATSO-LLMs significantly enhances prediction accuracy (98.4%), sensitivity (97.6%), specificity (96.9%), and F1-score (96.2%), with minimal misclassification rates. This framework represents a transformative advance in the life sciences by offering a new, data-driven approach to understanding the nutritional determinants of CRC, enabling healthcare professionals to make more precise predictions and design tailored dietary interventions for diverse populations.
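
The preprocessing, rebalancing, and evaluation steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation and does not reproduce the ATSO-LLM model. The stop word list and the function names (preprocess, random_oversample, binary_metrics) are assumptions made for illustration, as is the choice to replace punctuation with spaces so that hyphenated ingredient names split into separate tokens.

```python
import random
import string
from collections import Counter

# Illustrative stop word list (an assumption; the abstract does not specify one).
STOP_WORDS = {"a", "an", "and", "the", "of", "with", "in", "on", "or", "to"}

# Map every punctuation character to a space so "whole-grain" becomes two tokens.
_PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def preprocess(text):
    """Lowercase a description, remove punctuation, and drop stop words."""
    tokens = text.lower().translate(_PUNCT_TO_SPACE).split()
    return [tok for tok in tokens if tok not in STOP_WORDS]


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen samples of smaller classes until all classes match the largest."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_samples, out_labels = list(samples), list(labels)
    for cls, n in counts.items():
        pool = [s for s, y in zip(samples, labels) if y == cls]
        for _ in range(target - n):
            out_samples.append(rng.choice(pool))
            out_labels.append(cls)
    return out_samples, out_labels


def _ratio(num, den):
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def binary_metrics(y_true, y_pred):
    """Compute accuracy, sensitivity, specificity, and F1 for labels coded 1 (CRC) and 0 (non-CRC)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    sensitivity = _ratio(tp, tp + fn)  # recall on the CRC class
    precision = _ratio(tp, tp + fp)
    return {
        "accuracy": _ratio(tp + tn, len(pairs)),
        "sensitivity": sensitivity,
        "specificity": _ratio(tn, tn + fp),
        "f1": _ratio(2 * precision * sensitivity, precision + sensitivity),
    }
```

In a sketch like this, oversampling would normally be applied only to the training split, so that duplicated records cannot appear in both training and evaluation data. The abstract does not state how the authors handled this.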

Authors

  • Shengnan Gong
    The Affiliated Hospital of Nantong University, Nantong University, Nantong, Jiangsu 226001, PR China. Electronic address: gsn6574172@163.com.
  • Xiaohong Jin
    School of Electronic Information, Guangxi University for Nationalities, Nanning 530000, China.
  • Yujie Guo
    Shandong Cancer Hospital and Institute, Shandong First Medical University and Shandong Academy of Medical Sciences, Jinan, 250117, People's Republic of China.
  • Jie Yu
    Institute of Animal Nutrition, Sichuan Agricultural University, Key Laboratory for Animal Disease-Resistance Nutrition of China Ministry of Education, Key Laboratory of Animal Disease-resistant Nutrition and Feed of China Ministry of Agriculture and Rural Affairs, Key Laboratory of Animal Disease-resistant Nutrition of Sichuan Province, Ya'an, 625014, China.