Cysteine pattern barcoding-based dataset filtration enhances the machine learning-assisted interpretation of Conus venom peptide therapeutics.
Journal:
PLOS ONE
Published Date:
Jul 11, 2025
Abstract
Crude cone snail venom is a rich source of bioactive compounds with significant therapeutic potential. In this study, we analyzed 5,985 cone snail peptides from 82 Conus species to identify unique cysteine (Cys) patterns and their associated frameworks. Classifying these Cys patterns by conserved framework combinations enabled the generation of species-level pattern barcodes, which were then used to assess how individual sequences correlate with species. From 151 known Conus peptide PDB files, we computed Cys disulfide linkages to assess overall stability profiles. Incorporating the barcode data allowed us to filter the dataset and prepare it for machine learning (ML) processing. Random Forest (RF) modeling, a supervised learning technique, was used to predict the therapeutic potential of venom peptides. Feature extraction was based on approved venom-derived peptide drugs. The dataset was split into training and test sets at a 70:30 ratio. A total of 6,430 peptides (5,985 from cone snails and 445 from other venomous species) were used to evaluate the model's predictive capability. The proposed model achieved an accuracy of 90.48% in classifying peptides by therapeutic potential. Model outputs then underwent structural and binding pattern analysis against known targets, which revealed significant similarities between the binding patterns of approved and novel peptides. Model performance could be further improved by incorporating additional datasets and optimizing feature selection, potentially broadening its applicability to larger peptide datasets. Overall, this study underscores the potential of ML in advancing pharmacological research on diverse venom peptides.
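The abstract does not spell out how a Cys pattern or a species-level barcode is encoded, so the following is only a minimal sketch under stated assumptions. It assumes a pattern is written in the common framework notation, where adjacent cysteines are joined ("CC") and separated ones are split by a dash ("C-C"), and that a barcode is simply the sorted set of patterns seen in a species. The function names `cys_pattern` and `species_barcodes`, the species labels, and the toy sequences are all hypothetical and are not taken from the study.

```python
from collections import defaultdict


def cys_pattern(sequence: str) -> str:
    """Return the cysteine pattern of a peptide, e.g. 'CC-C-C'.

    Assumed notation: adjacent cysteines are written together,
    cysteines separated by other residues are joined with '-'.
    """
    positions = [i for i, aa in enumerate(sequence.upper()) if aa == "C"]
    if not positions:
        return ""  # no cysteines, so no pattern
    parts = ["C"]
    for prev, cur in zip(positions, positions[1:]):
        # Directly adjacent cysteines extend the current run; otherwise start a new one.
        parts.append("C" if cur - prev == 1 else "-C")
    return "".join(parts)


def species_barcodes(records):
    """Group patterns by species (hypothetical barcode definition).

    `records` is an iterable of (species, sequence) pairs. The barcode
    is assumed to be the sorted tuple of distinct patterns per species.
    """
    patterns = defaultdict(set)
    for species, sequence in records:
        pattern = cys_pattern(sequence)
        if pattern:
            patterns[species].add(pattern)
    return {sp: tuple(sorted(ps)) for sp, ps in patterns.items()}


if __name__ == "__main__":
    # Toy input for illustration only, not the study's dataset.
    toy_records = [
        ("species_A", "ECCNPACGRHYSC"),
        ("species_A", "CKGKGAKCSRLMYDCCTGSCRSGKC"),
        ("species_B", "GCCSDPRCAWRC"),
    ]
    for species, barcode in species_barcodes(toy_records).items():
        print(species, barcode)
```

A filter in the spirit of the abstract could then keep only sequences whose pattern appears in the barcode of their annotated species before passing them to a classifier. The actual filtering criteria used in the study are not stated in the abstract.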