Machine learning method using position-specific mutation based classification outperforms one hot coding for disease severity prediction in haemophilia 'A'.

Journal: Genomics
PMID:

Abstract

Haemophilia is an X-linked genetic disorder whose most common forms, types A and B, result from the absence or deficiency of clotting factors VIII and IX, respectively. The severity of the disease depends on the underlying mutation. Available Machine Learning (ML) methods that predict mutational severity using traditional encoding approaches generally have high time complexity and compromised accuracy. In this study, a Haemophilia 'A' patient mutation dataset containing 7784 mutations was encoded with the proposed Position-Specific Mutation (PSM) technique and with One-Hot Encoding (OHE) to predict disease severity. The datasets produced by the PSM and OHE methods were analyzed and used to train various ML algorithms to classify mutation severity level. Surprisingly, PSM outperformed OHE in both time efficiency and accuracy, improving training time by approximately 91 to 98% and prediction time by approximately 80 to 99%. Severity prediction accuracy also improved when PSM was used with different ML algorithms.
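The abstract does not define the PSM encoding, so the following is only a rough sketch of why a mutation-centred representation can be much smaller than a one-hot encoding of the whole sequence. The toy sequence, the function names and the three-value compact feature vector (position, wild-type residue, mutant residue) are illustrative assumptions, not the authors' method.

```python
# Illustrative comparison of feature-vector sizes; NOT the paper's PSM definition.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def apply_substitution(seq, position, wild_type, mutant):
    """Return seq with a single substitution at a 1-based position."""
    if seq[position - 1] != wild_type:
        raise ValueError("wild-type residue does not match the sequence")
    return seq[:position - 1] + mutant + seq[position:]


def one_hot_sequence(seq):
    """One-hot encode every residue: len(seq) * 20 binary features."""
    width = len(AMINO_ACIDS)
    vec = [0] * (len(seq) * width)
    for pos, aa in enumerate(seq):
        vec[pos * width + AA_INDEX[aa]] = 1
    return vec


def compact_mutation_features(position, wild_type, mutant):
    """Hypothetical compact encoding: position plus the two residue indices."""
    return [position, AA_INDEX[wild_type], AA_INDEX[mutant]]


# Arbitrary 10-residue toy sequence (assumption, not factor VIII).
toy_seq = "ACDEFGHIKL"
mutated = apply_substitution(toy_seq, 3, "D", "N")

ohe = one_hot_sequence(mutated)
compact = compact_mutation_features(3, "D", "N")

print(len(ohe), len(compact))
```

In this toy setting the one-hot vector grows with sequence length, while the compact vector stays at a fixed size per mutation. A smaller input of this kind is one plausible reason for shorter training and prediction times, although the abstract itself does not state the mechanism.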

Authors

  • Vikalp Kumar Singh
    Department of Computer Science and Engineering, Motilal Nehru National Institute of Technology Allahabad, UP 211004, India.
  • Neha Shree Maurya
    Department of Biotechnology, Motilal Nehru National Institute of Technology Allahabad, UP 211004, India.
  • Ashutosh Mani
    Department of Biotechnology, Motilal Nehru National Institute of Technology Allahabad, UP 211004, India. Electronic address: amani@mnnit.ac.in.
  • Rama Shankar Yadav
    Department of Computer Science and Engineering, Motilal Nehru National Institute of Technology Allahabad, UP 211004, India.