A deep learning model to classify neoplastic state and tissue origin from transcriptomic data.

Journal: Scientific Reports
Published Date:

Abstract

Applying deep learning methods to transcriptomic data has the potential to improve the accuracy and efficiency of tissue classification and cell state identification. Here, we developed a multitask deep learning model for tissue classification that combines publicly available whole-transcriptome (RNA-seq) datasets of non-neoplastic, neoplastic and peri-neoplastic tissue to classify disease state, tissue origin and neoplastic subclass. RNA-seq data from 10,116 patient samples, processed through a common pipeline, were used for model training and validation. The model achieved 99% accuracy for disease state classification (ROC-AUC of 0.98), 97% accuracy for tissue origin (ROC-AUC of 0.99) and 92% accuracy for neoplastic subclassification (ROC-AUC of 0.95). This is the first multitask deep learning algorithm developed for tissue classification that applies a uniform analysis pipeline to transcriptomic data with multiple tissue classifiers. The model serves as a framework for incorporating large transcriptomic datasets across conditions to facilitate clinical diagnosis and cell-based treatment strategies.
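
The multitask setup described in the abstract can be illustrated with a minimal sketch: one shared layer reads an expression vector, and three separate softmax heads predict disease state, tissue origin and neoplastic subclass, with the per-task cross-entropy losses summed. This is not the authors' implementation. The layer sizes, the class counts for tissue origin and subclass, the random weights, and all names (`MultitaskClassifier`, `multitask_loss`, `TASK_SIZES`) are assumptions made for illustration only.

```python
import math
import random


def softmax(logits):
    """Convert raw scores to probabilities (shifted by the max for stability)."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def linear(x, weights, bias):
    """Dense layer: one weight row and one bias term per output unit."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(values):
    return [max(0.0, v) for v in values]


def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases (illustrative, untrained)."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


class MultitaskClassifier:
    """Shared trunk followed by one classification head per task."""

    def __init__(self, n_genes, n_hidden, task_sizes, seed=0):
        rng = random.Random(seed)
        self.trunk = init_layer(n_genes, n_hidden, rng)
        self.heads = {task: init_layer(n_hidden, n_classes, rng)
                      for task, n_classes in task_sizes.items()}

    def forward(self, expression):
        # The hidden representation is computed once and reused by every head.
        hidden = relu(linear(expression, *self.trunk))
        return {task: softmax(linear(hidden, *head))
                for task, head in self.heads.items()}


def multitask_loss(probs, labels):
    """Sum of per-task cross-entropy losses for a single sample."""
    return sum(-math.log(probs[task][labels[task]]) for task in labels)


# Three disease states follow the abstract (non-neoplastic, neoplastic,
# peri-neoplastic); the other class counts are hypothetical placeholders.
TASK_SIZES = {"disease_state": 3, "tissue_origin": 5, "neoplastic_subclass": 4}

model = MultitaskClassifier(n_genes=8, n_hidden=4, task_sizes=TASK_SIZES)
sample_rng = random.Random(1)
sample = [sample_rng.random() for _ in range(8)]  # stand-in expression values
probs = model.forward(sample)
loss = multitask_loss(
    probs,
    {"disease_state": 0, "tissue_origin": 2, "neoplastic_subclass": 1},
)
```

Sharing the trunk lets all three tasks learn from the same expression features. Summing the losses with equal weight is only one possible choice, and the abstract does not say how the tasks were weighted.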

Authors

  • James Hong
    Krembil Research Institute, University Health Network, 399 Bathurst Street, Suite 4W-449, Toronto, ON, M5T 2S8, Canada.
  • Laureen D Hachem
    Krembil Research Institute, University Health Network, 399 Bathurst Street, Suite 4W-449, Toronto, ON, M5T 2S8, Canada.
  • Michael G Fehlings
    Division of Neurosurgery, Department of Surgery, University of Toronto, Toronto, ON, Canada.