Multiple instance neural networks based on sparse attention for cancer detection using T-cell receptor sequences.

Journal: BMC bioinformatics
Published Date:

Abstract

Early detection of cancer has been widely studied because of its paramount importance in biomedicine. Among the types of data used for this task, T cell receptors (TCRs) have recently come into the spotlight owing to a growing appreciation of the role of the host immune system in tumor biology. However, the one-to-many correspondence between a patient and multiple TCR sequences prevents researchers from directly applying classical statistical and machine learning methods. Recent work has modeled this type of data within the framework of multiple instance learning (MIL). Although applying MIL to cancer detection from TCR sequences is novel and has shown adequate performance in several tumor types, there is still room for improvement, especially for certain cancer types. Furthermore, explainable neural network models have not been fully investigated for this application. In this article, we propose multiple instance neural networks based on sparse attention (MINN-SA) to improve both predictive performance and explainability in cancer detection. The sparse attention structure drops out uninformative instances in each bag and, in combination with a skip connection, achieves both interpretability and better predictive performance. Our experiments show that, compared with existing MIL approaches, MINN-SA yields the highest average area under the ROC curve across 10 different types of cancer. Moreover, the estimated attention weights show that MINN-SA can identify the TCRs that are specific for tumor antigens within the same T cell repertoire.
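
To make the idea of sparse attention over a bag of instances concrete, the following is a minimal sketch, not the authors' implementation. It assumes sparsemax as the sparsifying map (the abstract does not say which sparse attention mechanism MINN-SA uses) and a hypothetical linear scoring vector `weights`; the names `sparsemax` and `sparse_attention_pool` and the toy numbers are illustrative only. The skip connection mentioned in the abstract is not modeled here.

```python
def sparsemax(scores):
    """Project scores onto the probability simplex (sparsemax).

    Unlike softmax, low-scoring entries receive exactly zero weight.
    ASSUMPTION: sparsemax stands in for the paper's sparse attention.
    """
    ordered = sorted(scores, reverse=True)
    cumsum = 0.0
    tau = 0.0
    for k, z_k in enumerate(ordered, start=1):
        cumsum += z_k
        # The support is the largest k for which this condition holds.
        if 1.0 + k * z_k > cumsum:
            tau = (cumsum - 1.0) / k
    return [max(s - tau, 0.0) for s in scores]


def sparse_attention_pool(bag, weights):
    """Pool one bag (e.g. one patient's TCR embeddings) into one vector.

    bag:     list of instance embeddings, each a list of floats
    weights: hypothetical linear scoring vector (learned in practice)
    """
    # Score every instance, then turn the scores into sparse attention.
    scores = [sum(w * x for w, x in zip(weights, h)) for h in bag]
    attention = sparsemax(scores)
    # The bag representation is the attention-weighted sum of instances.
    dim = len(bag[0])
    pooled = [sum(a * h[d] for a, h in zip(attention, bag)) for d in range(dim)]
    return attention, pooled


# Toy bag of three 2-dimensional instance embeddings (made-up numbers).
bag = [[2.0, 0.0], [1.0, 1.0], [-1.0, 3.0]]
weights = [0.5, 0.0]
attention, pooled = sparse_attention_pool(bag, weights)
print(attention)  # the third instance gets exactly zero attention
print(pooled)
```

In this toy bag the lowest-scoring instance is assigned an attention weight of exactly zero, which is the sense in which a sparse attention layer can drop uninformative instances while the nonzero weights remain readable as instance-level importance.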

Authors

  • Younghoon Kim
  • Tao Wang
    Department of Urology, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China.
  • Danyi Xiong
    Department of Statistical Science, Southern Methodist University, Dallas, TX, USA.
  • Xinlei Wang
    School of Electronic Engineering, Beijing University of Posts and Telecommunications, Beijing 100876, China.
  • Seongoh Park
    School of Mathematics, Statistics and Data Science, Sungshin Women's University, Seoul, Korea. spark6@sungshin.ac.kr.