A genetic algorithm-based ensemble model for efficiently identifying interleukin 6 inducing peptides.
Journal:
Scientific Reports
Published Date:
Jul 1, 2025
Abstract
Interleukin-6 (IL-6) is a cytokine with diverse biological activities that contribute to a variety of physiologic and immune responses. IL-6-inducing peptides are short protein fragments that play an important role in these biological processes. Extensive research has advanced the development of IL-6-inducing peptides, but identifying them experimentally remains time-consuming, labor-intensive, and costly. Computational prediction has therefore gained attention as an alternative. Several computational methods have already been developed, but they suffer from insufficient accuracy and inadequate feature engineering. In this study, we developed PredIL6, an ensemble learning model that identifies IL-6-inducing peptides by combining probability scores from 148 baseline machine learning and deep learning models using a genetic algorithm-based meta-classifier. A forward feature selection method was used to construct the ensemble model, which consists of 20 baseline (single-feature) models, including models based on AAINDEX, BLOSUM62, and language models (ESM-2 and word2vec). PredIL6 outperformed existing state-of-the-art methods, achieving accuracies of 0.934 and 0.899 on the training and test sets, respectively. PredIL6 is thus a useful tool for expediting the identification of IL-6-inducing peptides. A freely available web application and a standalone PredIL6 program are provided.
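To illustrate the idea of a genetic algorithm acting as a meta-classifier over baseline probability scores, the following is a minimal sketch and not the PredIL6 implementation. The abstract does not specify the encoding, fitness function, or genetic operators, so everything here is an assumption: the function names (`ensemble_score`, `accuracy`, `evolve_weights`, `make_toy_scores`), the weighted-average combination rule, the 0.5 threshold, the population settings, and the synthetic data with four hypothetical baseline models.

```python
import random

def ensemble_score(weights, probs):
    """Weighted average of the baseline-model probabilities for one sample."""
    return sum(w * p for w, p in zip(weights, probs)) / sum(weights)

def accuracy(weights, prob_matrix, labels, threshold=0.5):
    """Fraction of samples whose thresholded ensemble score matches the label."""
    correct = 0
    for probs, y in zip(prob_matrix, labels):
        pred = 1 if ensemble_score(weights, probs) >= threshold else 0
        correct += int(pred == y)
    return correct / len(labels)

def evolve_weights(prob_matrix, labels, pop_size=30, generations=40,
                   mutation_rate=0.2, seed=0):
    """Search for per-model weights with a simple elitist genetic algorithm."""
    rng = random.Random(seed)
    n_models = len(prob_matrix[0])
    population = [[rng.uniform(0.01, 1.0) for _ in range(n_models)]
                  for _ in range(pop_size)]
    # Assumption: seed the population with equal weights, so the result is
    # never worse than a plain average on the data used for fitness.
    population[0] = [1.0] * n_models

    def fitness(w):
        return accuracy(w, prob_matrix, labels)

    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        elite = ranked[: pop_size // 2]          # the best half survives unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            # Uniform crossover: each gene comes from one of the two parents.
            child = [x if rng.random() < 0.5 else y for x, y in zip(a, b)]
            # Gaussian mutation, clamped so every weight stays positive.
            child = [min(1.0, max(1e-6, g + rng.gauss(0, 0.1)))
                     if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        population = elite + children
    return max(population, key=fitness)

def make_toy_scores(n_samples=200, seed=1):
    """Synthetic stand-in for baseline outputs: 2 informative models, 2 noisy ones."""
    rng = random.Random(seed)
    labels = [rng.randint(0, 1) for _ in range(n_samples)]
    matrix = []
    for y in labels:
        informative = [min(1.0, max(0.0, rng.gauss(0.65 if y else 0.35, 0.2)))
                       for _ in range(2)]
        noise = [rng.random() for _ in range(2)]
        matrix.append(informative + noise)
    return matrix, labels

toy_probs, toy_labels = make_toy_scores()
best_weights = evolve_weights(toy_probs, toy_labels)
print("equal-weight accuracy:", accuracy([1.0] * 4, toy_probs, toy_labels))
print("evolved-weight accuracy:", accuracy(best_weights, toy_probs, toy_labels))
```

In this sketch the fitness is computed on the same data used for the search, which overstates performance; a realistic setup would evaluate fitness with cross-validation and report results on a held-out test set, as the abstract does with separate training and test accuracies.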