Prediction of peptide binding to MHC using machine learning with sequence and structure-based feature sets.

Journal: Biochimica et biophysica acta. General subjects

Abstract

Selecting peptides that bind strongly to the major histocompatibility complex (MHC) for inclusion in a vaccine has therapeutic potential for infections and tumors. Machine learning models trained on sequence data already exist for peptide:MHC (p:MHC) binding prediction. Here, we train support vector machine classifier (SVMC) models on physicochemical sequence-based and structure-based descriptor sets to predict peptide binding to a well-studied model mouse MHC I allele, H-2Db. We also perform recursive feature elimination and two-way forward feature selection. Although lower in sensitivity than current state-of-the-art algorithms, models based on physicochemical descriptor sets achieve specificity and precision comparable to the most popular sequence-based algorithms. The best-performing model uses a hybrid descriptor set containing both sequence-based and structure-based descriptors. Interestingly, close to half of the physicochemical sequence-based descriptors remaining in the hybrid model describe properties of the anchor positions, residues 5 and 9 of the peptide sequence. In contrast, the residues flanking position 5 make little to no residue-specific contribution to the binding affinity prediction. These results suggest that machine-learned models incorporating both sequence-based descriptors and structural data may reveal the specific physicochemical properties that determine binding affinity.
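The abstract does not list the individual descriptors, so the following is only a minimal sketch of two ideas it mentions: per-position physicochemical features for a 9-mer (restricted to the anchor positions 5 and 9) and the sensitivity, specificity and precision used to compare classifiers. The Kyte-Doolittle hydropathy scale and the helper names `hydropathy_features`, `anchor_features` and `binary_metrics` are illustrative assumptions, not the authors' descriptor set or code.

```python
# Minimal sketch (assumption): per-position physicochemical descriptors for
# 9-mer peptides plus the classification metrics named in the abstract.
# Kyte-Doolittle hydropathy is an illustrative stand-in for the paper's
# (unspecified) descriptor set; all function names are hypothetical.

KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_features(peptide: str) -> list[float]:
    """Return one hydropathy value per residue position (P1..P9 for a 9-mer)."""
    return [KYTE_DOOLITTLE[aa] for aa in peptide.upper()]


def anchor_features(peptide: str, anchors: tuple[int, ...] = (5, 9)) -> list[float]:
    """Keep only the 1-based anchor positions (P5 and P9 by default)."""
    feats = hydropathy_features(peptide)
    return [feats[p - 1] for p in anchors]


def binary_metrics(y_true: list[int], y_pred: list[int]) -> dict[str, float]:
    """Sensitivity, specificity and precision, with 1 = binder, 0 = non-binder."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,  # recall on binders
        "specificity": tn / (tn + fp) if tn + fp else 0.0,  # recall on non-binders
        "precision": tp / (tp + fp) if tp + fp else 0.0,    # fraction of predicted binders that bind
    }


if __name__ == "__main__":
    print(anchor_features("ASNENMETM"))  # hydropathy at P5 (N) and P9 (M)
    print(binary_metrics([1, 1, 1, 0, 0, 0], [1, 0, 0, 0, 0, 1]))
```

In the toy metrics call, the classifier recovers only one of three binders (sensitivity 1/3) yet rejects two of three non-binders (specificity 2/3), which shows how a model can trail on sensitivity while staying competitive on specificity and precision.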

Authors

  • Michelle P Aranha
    Department of Biochemistry and Cellular and Molecular Biology, University of Tennessee, Knoxville, TN 37996, United States of America; University of Tennessee/Oak Ridge National Laboratory Center for Molecular Biophysics, Oak Ridge National Laboratory, Oak Ridge, TN 37830, United States of America.
  • Catherine Spooner
    Department of Mathematics and Computer Science, Fayetteville State University, Fayetteville, NC 28301, United States of America.
  • Omar Demerdash
    University of Tennessee/Oak Ridge National Laboratory Center for Molecular Biophysics, Oak Ridge National Laboratory, Oak Ridge, TN 37830, United States of America; Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, United States of America.
  • Bogdan Czejdo
    Department of Mathematics and Computer Science, Fayetteville State University, Fayetteville, NC 28301, United States of America.
  • Jeremy C Smith
    Department of Biochemistry and Cellular and Molecular Biology, University of Tennessee, Knoxville, TN 37996, United States of America; University of Tennessee/Oak Ridge National Laboratory Center for Molecular Biophysics, Oak Ridge National Laboratory, Oak Ridge, TN 37830, United States of America.
  • Julie C Mitchell
    Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, United States of America. Electronic address: mitchelljc@ornl.gov.