Integrating multiple fitting regression and Bayes decision for cancer diagnosis with transcriptomic data from tumor-educated blood platelets.

Journal: The Analyst
Published Date:

Abstract

The application of machine learning to cancer diagnostics has shown great promise and is important in clinical settings. Here we apply machine learning methods to transcriptomic data derived from tumor-educated platelets (TEPs) of individuals with different types of cancer. We aim to define a reliability measure for diagnostic predictions that could help facilitate personalized treatment. To this end, we present a novel classification method called MFRB (Multiple Fitting Regression and Bayes decision), which integrates multiple fitting regression (MFR) with Bayes decision theory. MFR first maps the multidimensional features of the transcriptomic data onto a one-dimensional feature. The probability density function of each class in this mapped space is then approximated by a Gaussian probability density function. Finally, Bayes decision theory is applied to the estimated probability density functions to build a probabilistic classifier. The output of MFRB determines which class a sample belongs to and also assigns a reliability measure to that class assignment. We evaluate the proposed method against the classical support vector machine (SVM) and the probabilistic SVM (PSVM) on simulated and real TEP datasets. Our results indicate that MFRB outperforms both SVM and PSVM, mainly owing to its strong generalization ability on limited, imbalanced, and noisy data.
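The three stages described above (project to one dimension, fit a Gaussian per class, apply Bayes' rule) can be illustrated with a minimal sketch. The abstract does not say how MFR performs its regression, so this sketch assumes a stand-in: ridge-regularized least squares of a 0/1 class label on the features, restricted to the binary case. It is not the authors' MFR. The function names (`fit_projection`, `fit_class_gaussians`, `posterior`) and the synthetic data are hypothetical. For each class c with prior P(c) and Gaussian parameters (mu_c, sigma_c^2) on the 1-D score z, the posterior is P(c | z) = P(c) N(z; mu_c, sigma_c^2) / sum_k P(k) N(z; mu_k, sigma_k^2). The predicted class is the argmax, and the maximum posterior serves as a reliability value.

```python
import numpy as np


def fit_projection(X, y, ridge=1.0):
    """Map features to one dimension (ASSUMPTION: ridge regression on 0/1 labels
    stands in for MFR, which the abstract does not define)."""
    Xc = np.hstack([X, np.ones((X.shape[0], 1))])  # append intercept column
    penalty = ridge * np.eye(Xc.shape[1])
    penalty[-1, -1] = 0.0  # leave the intercept unpenalized
    return np.linalg.solve(Xc.T @ Xc + penalty, Xc.T @ y)


def project(X, w):
    """Return the 1-D score for each sample."""
    return np.hstack([X, np.ones((X.shape[0], 1))]) @ w


def fit_class_gaussians(z, y):
    """Estimate (prior, mean, variance) of the 1-D score for each class."""
    params = {}
    for c in np.unique(y):
        zc = z[y == c]
        # small constant keeps the variance strictly positive
        params[c] = (len(zc) / len(z), zc.mean(), zc.var(ddof=1) + 1e-9)
    return params


def log_gaussian(z, mean, var):
    """Log of the Gaussian density N(z; mean, var)."""
    return -0.5 * np.log(2 * np.pi * var) - (z - mean) ** 2 / (2 * var)


def posterior(z, params):
    """Bayes' rule: P(c | z) is proportional to P(c) * N(z; mu_c, sigma_c^2)."""
    classes = sorted(params)
    log_joint = np.column_stack(
        [np.log(params[c][0]) + log_gaussian(z, params[c][1], params[c][2])
         for c in classes]
    )
    log_joint -= log_joint.max(axis=1, keepdims=True)  # avoid underflow
    joint = np.exp(log_joint)
    return classes, joint / joint.sum(axis=1, keepdims=True)


# Illustrative synthetic data: few samples, many features, imbalanced labels.
rng = np.random.default_rng(0)
n, p = 60, 200
X = rng.normal(size=(n, p))
y = (rng.random(n) < 0.3).astype(float)
X[y == 1, :5] += 1.5  # a handful of informative features

w = fit_projection(X, y, ridge=10.0)
z = project(X, w)
params = fit_class_gaussians(z, y)
classes, post = posterior(z, params)
pred = np.array(classes)[post.argmax(axis=1)]  # Bayes decision
reliability = post.max(axis=1)                 # per-sample reliability
```

The posterior is computed in log space so that very narrow class densities, which can arise when features greatly outnumber samples, do not underflow to zero. Multiclass problems, such as distinguishing several cancer types, would need a different projection scheme. This sketch does not attempt one.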

Authors

  • Guangzao Huang
    Department of Automation, Xiamen University, Xiamen 361005, Fujian, China. glji@xmu.edu.cn and Interdisciplinary Program in Genetics, Texas A&M University, College Station, Texas 77843, USA.
  • Mingshun Yuan
    Department of Automation, Xiamen University, Xiamen 361005, Fujian, China.
  • Moliang Chen
    Department of Automation, Xiamen University, Xiamen 361005, Fujian, China.
  • Lei Li
    Department of Thoracic Surgery, The Affiliated Huaian No.1 People's Hospital of Nanjing Medical University, Huai'an, China.
  • Wenjie You
    School of Electronics and Information Engineering, Fujian Normal University, Fuqing 350300, Fujian, China.
  • Hanjie Li
    Xiamen LifeInt Technology Co., Ltd., Xiamen 361000, Fujian, China and Department of Immunology, Weizmann Institute, Rehovot 76100, Israel.
  • James J Cai
    Interdisciplinary Program in Genetics, Texas A&M University, College Station, Texas 77843, USA.
  • Guoli Ji
    Department of Automation, Xiamen University, Xiamen 361005, Fujian, China. glji@xmu.edu.cn and Innovation Center for Cell Signaling Network, Xiamen University, Xiamen 361102, Fujian, China and Xiamen Research Institute of National Center of Healthcare Big Data, Xiamen 361005, Fujian, China.