An improved poly(A) motifs recognition method based on decision level fusion.

Journal: Computational biology and chemistry
Published Date:

Abstract

Polyadenylation is the addition of a poly(A) tail to the 3' end of an mRNA. Identifying the motifs that control polyadenylation is essential for improving genome annotation accuracy and for better understanding the mechanisms governing gene regulation. Bioinformatics methods for poly(A) motif recognition have demonstrated that information extracted from the sequences surrounding candidate motifs can effectively distinguish true motifs from false ones. However, these methods depend on either domain features or string kernels. To date, no method has combined information from different sources. Here, we propose an improved poly(A) motif recognition method that combines different sources through decision level fusion. First, two novel prediction methods based on the support vector machine (SVM) were proposed: one uses domain-specific features together with principal component analysis (PCA) to eliminate redundancy (PCA-SVM); the other is based on the Oligo string kernel (Oligo-SVM). We then proposed a novel machine-learning method for poly(A) motif prediction that combines four poly(A) motif recognition methods: two state-of-the-art methods (Random Forest (RF) and HMM-SVM) and the two newly proposed methods (PCA-SVM and Oligo-SVM). A decision level information fusion method was employed to combine the decision values of the different classifiers by applying Dempster-Shafer (DS) evidence theory. We evaluated our method on a comprehensive poly(A) dataset consisting of 14,740 samples covering 12 variants of poly(A) motifs and 2,750 samples containing none of these motifs. Our method achieved an accuracy of up to 86.13%. Compared with the four individual classifiers, our evidence-theory-based method reduces the average error rate by about 30%, 27%, 26% and 16%, respectively. The experimental results suggest that the proposed method is more effective for poly(A) motif recognition.
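
The decision level fusion step can be illustrated with a minimal sketch of Dempster's rule of combination over the two-hypothesis frame {true motif, false motif}. The abstract does not say how classifier decision values are turned into belief masses, so the conversion used here (a probability-like score discounted by a per-classifier reliability) is an assumption, as are the names `Mass`, `to_mass`, `combine` and `fuse` and every number in the example.

```python
from functools import reduce
from typing import Iterable, NamedTuple


class Mass(NamedTuple):
    """Belief masses over the frame {true motif, false motif}."""
    true: float     # mass committed to "true motif"
    false: float    # mass committed to "false motif"
    unknown: float  # mass left on the whole frame (ignorance)


def to_mass(p_true: float, reliability: float) -> Mass:
    """Assumed conversion: discount a score in [0, 1] by a reliability in [0, 1]."""
    return Mass(reliability * p_true,
                reliability * (1.0 - p_true),
                1.0 - reliability)


def combine(m1: Mass, m2: Mass) -> Mass:
    """Dempster's rule of combination for two mass functions."""
    # Conflict: one source supports "true" while the other supports "false".
    conflict = m1.true * m2.false + m1.false * m2.true
    norm = 1.0 - conflict
    if norm <= 1e-12:
        raise ValueError("sources are in total conflict; rule is undefined")
    return Mass(
        (m1.true * m2.true + m1.true * m2.unknown + m1.unknown * m2.true) / norm,
        (m1.false * m2.false + m1.false * m2.unknown + m1.unknown * m2.false) / norm,
        (m1.unknown * m2.unknown) / norm,
    )


def fuse(masses: Iterable[Mass]) -> Mass:
    """Combine any number of sources; the rule is associative and commutative."""
    return reduce(combine, masses)


# Hypothetical scores and reliabilities for four classifiers on one candidate motif.
scores = [(0.9, 0.8), (0.7, 0.8), (0.6, 0.7), (0.4, 0.9)]
fused = fuse(to_mass(p, r) for p, r in scores)
label = "true motif" if fused.true > fused.false else "false motif"
print(fused, label)
```

In this sketch the mass left on `unknown` lets a less reliable classifier contribute weaker evidence rather than being dropped, which is one common reason for fusing at the decision level instead of averaging scores.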

Authors

  • Shanxin Zhang
    School of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an 710049, PR China.
  • Jiuqiang Han
    School of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an 710049, PR China. Electronic address: jqhan@mail.xjtu.edu.cn.
  • Jun Liu
    Department of Radiology, Second Xiangya Hospital, Changsha, Hunan, China.
  • Jiguang Zheng
    School of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an 710049, PR China.
  • Ruiling Liu
    School of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an 710049, PR China.