Analysis of Expression Pattern of snoRNAs in Different Cancer Types with Machine Learning Algorithms.

Journal: International journal of molecular sciences
Published Date:

Abstract

Small nucleolar RNAs (snoRNAs) are a new type of functional small RNA involved in the chemical modification of rRNAs, tRNAs, and small nuclear RNAs. They have been reported to play important roles in tumorigenesis via various regulatory modes. snoRNAs can participate in the regulation of methylation and pseudouridylation and can also regulate the expression patterns of their host genes. This research investigated the expression patterns of snoRNAs in eight major cancer types in The Cancer Genome Atlas (TCGA) using several machine learning algorithms. The expression levels of snoRNAs were first analyzed with a powerful feature selection method, Monte Carlo feature selection (MCFS), which yielded a ranked feature list and a set of informative features. Incremental feature selection (IFS) was then applied to the feature list to extract the optimal features (snoRNAs), that is, those with which the support vector machine (SVM) yielded its best performance. The discriminative snoRNAs included HBII-52-14, HBII-336, SNORD123, HBII-85-29, HBII-420, U3, HBI-43, SNORD116, SNORA73B, SCARNA4, and HBII-85-20, among others; with these features, the SVM achieved a Matthews correlation coefficient (MCC) of 0.881 in predicting the eight cancer types. In addition, the informative features were fed into the Johnson reducer and repeated incremental pruning to produce error reduction (RIPPER) algorithms to generate classification rules, which clearly show the different snoRNA expression patterns in different cancer types. The results indicate that the extracted discriminative snoRNAs can be important for identifying cancer samples of different types, and that the expression patterns of snoRNAs in different cancer types can be partly uncovered by quantitative recognition rules.
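
The IFS step and the multi-class MCC described in the abstract can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' pipeline: a nearest-centroid classifier stands in for the SVM, leave-one-out evaluation is assumed because the abstract does not state the validation scheme, and the ranked list is supplied by hand rather than produced by MCFS. All function names and the toy data are hypothetical.

```python
import math
from collections import Counter


def multiclass_mcc(y_true, y_pred):
    """Multi-class Matthews correlation coefficient (Gorodkin's generalization)."""
    s = len(y_true)                                   # total samples
    c = sum(t == p for t, p in zip(y_true, y_pred))   # correctly predicted samples
    t_counts, p_counts = Counter(y_true), Counter(y_pred)
    labels = set(t_counts) | set(p_counts)
    cov_tp = c * s - sum(t_counts[k] * p_counts[k] for k in labels)
    cov_tt = s * s - sum(t_counts[k] ** 2 for k in labels)
    cov_pp = s * s - sum(p_counts[k] ** 2 for k in labels)
    denom = math.sqrt(cov_tt * cov_pp)
    return cov_tp / denom if denom else 0.0           # 0.0 when MCC is undefined


def nearest_centroid_predict(train_X, train_y, test_X):
    """Stand-in classifier (assumption): the paper uses an SVM here."""
    sums, counts = {}, Counter(train_y)
    for x, label in zip(train_X, train_y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for j, v in enumerate(x):
            acc[j] += v
    centroids = {lab: [v / counts[lab] for v in acc] for lab, acc in sums.items()}

    def sq_dist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))

    return [min(centroids, key=lambda lab: sq_dist(x, centroids[lab]))
            for x in test_X]


def incremental_feature_selection(X, y, ranked, predict=nearest_centroid_predict):
    """Score the top-k ranked features for k = 1..N and keep the best prefix."""
    best_k, best_mcc = 0, -1.0
    for k in range(1, len(ranked) + 1):
        cols = ranked[:k]
        Xk = [[row[j] for j in cols] for row in X]
        preds = []
        for i in range(len(Xk)):                      # leave-one-out (assumption)
            train_X = Xk[:i] + Xk[i + 1:]
            train_y = y[:i] + y[i + 1:]
            preds.append(predict(train_X, train_y, [Xk[i]])[0])
        score = multiclass_mcc(y, preds)
        if score > best_mcc:                          # ties keep the smaller subset
            best_k, best_mcc = k, score
    return ranked[:best_k], best_mcc


# Toy data (hypothetical): column 0 separates the three classes, columns 1-2 are noise.
X = [
    [0.0, 5.0, 1.0], [0.1, 1.0, 9.0], [0.2, 7.0, 3.0],
    [5.0, 2.0, 8.0], [5.1, 6.0, 2.0], [5.2, 3.0, 7.0],
    [10.0, 4.0, 4.0], [10.1, 8.0, 6.0], [10.2, 0.0, 5.0],
]
y = ["A", "A", "A", "B", "B", "B", "C", "C", "C"]
ranked = [0, 1, 2]  # would come from a ranking method such as MCFS

selected, score = incremental_feature_selection(X, y, ranked)
print(selected, score)
```

The loop keeps the shortest prefix of the ranked list that reaches the highest MCC, which mirrors the idea of extracting the feature subset on which the classifier performs best. Swapping `predict` for a function that wraps an SVM would bring the sketch closer to the setup described above.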

Authors

  • Xiaoyong Pan
    Department of Veterinary Clinical and Animal Sciences, University of Copenhagen, Copenhagen, Denmark. xypan172436@gmail.com.
  • Lei Chen
    Department of Chemistry, Stony Brook University, Stony Brook, NY, USA.
  • Kai-Yan Feng
    Department of Computer Science, Guangdong AIB Polytechnic, Guangzhou 510507, China. addland@126.com.
  • Xiao-Hua Hu
    Department of Biostatistics and Computational Biology, School of Life Sciences, Fudan University, Shanghai 200438, China. xhhu@fudan.edu.cn.
  • Yu-Hang Zhang
    Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China.
  • Xiang-Yin Kong
    Shanghai Institute of Nutrition and Health, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China. xykong@sibs.ac.cn.
  • Tao Huang
    The Second Clinical Medical College of Guangzhou University of Chinese Medicine, Guangzhou, China.
  • Yu-Dong Cai
    College of Life Science, Shanghai University, Shanghai, People's Republic of China.