Childhood Leukemia Classification via Information Bottleneck Enhanced Hierarchical Multi-Instance Learning.

Journal: IEEE Transactions on Medical Imaging
Published Date:

Abstract

Leukemia classification relies on a detailed cytomorphological examination of Bone Marrow (BM) smears. However, applying existing deep-learning methods to this task faces two significant limitations. First, these methods require large-scale datasets with expert cell-level annotations to achieve good results, and they typically generalize poorly. Second, they treat BM cytomorphological examination simply as a multi-class cell classification task, and thus fail to exploit the correlations among leukemia subtypes across different hierarchies. Consequently, BM cytomorphological examination, a time-consuming and repetitive process, still has to be performed manually by experienced cytologists. Recently, Multi-Instance Learning (MIL) has made considerable progress in data-efficient medical image processing, as it requires only patient-level labels, which can be extracted from clinical reports. In this paper, we propose a hierarchical MIL framework and equip it with an Information Bottleneck (IB) to tackle the above limitations. First, to handle patient-level labels, our hierarchical MIL framework uses attention-based learning to identify cells with high diagnostic value for leukemia classification at different hierarchies. Then, following the information bottleneck principle, we propose a hierarchical IB to constrain and refine the representations at different hierarchies for better accuracy and generalization. By applying our framework to a large-scale childhood acute leukemia dataset with corresponding BM smear images and clinical reports, we show that it can identify diagnosis-related cells without cell-level annotations and that it outperforms the compared methods. Furthermore, evaluation on an independent test cohort demonstrates the high generalizability of our framework.
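
To make the idea of attention-based MIL more concrete, the following is a minimal sketch of generic attention pooling over a bag of cell embeddings, where only a patient-level label would be needed for training. It is not the authors' hierarchical framework and includes no information bottleneck term. The function name `attention_mil_pool`, the parameters `V` and `w`, the feature sizes, and the random inputs are all assumptions made for illustration.

```python
import numpy as np


def attention_mil_pool(instances, V, w):
    """Pool a bag of instance embeddings into a single bag embedding.

    instances: (n_cells, d) array of per-cell features (hypothetical).
    V: (d, h) projection matrix; w: (h,) scoring vector. In a real model
    both would be learned; here they are random placeholders.
    """
    # One unnormalized attention score per cell.
    scores = np.tanh(instances @ V) @ w
    # Softmax over the cells in the bag, shifted for numerical stability.
    scores = scores - scores.max()
    weights = np.exp(scores) / np.exp(scores).sum()
    # The bag embedding is the attention-weighted sum of cell embeddings.
    bag = weights @ instances
    return bag, weights


rng = np.random.default_rng(0)
cells = rng.normal(size=(50, 16))   # 50 cells from one patient, 16-dim features
V = rng.normal(size=(16, 8))
w = rng.normal(size=8)

bag_embedding, attn = attention_mil_pool(cells, V, w)
```

In such a setup, cells with larger entries in `attn` contribute more to the patient-level representation, which is how attention weights can point to diagnostically relevant cells without cell-level annotations.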

Authors

  • Zeyu Gao
    School of Computer Science and Technology, Xi'an Jiaotong University, Xi'an 710049, China; Shaanxi Provincial Key Laboratory of Big Data Knowledge Engineering, Xi'an Jiaotong University, Xi'an 710049, China.
  • Anyu Mao
  • Kefei Wu
  • Yang Li
    Occupation of Chinese Center for Disease Control and Prevention, Beijing, China.
  • Liebin Zhao
    School of Public Health, Shanghai Jiao Tong University School of Medicine, Shanghai, China.
  • Xianli Zhang
    National Engineering Lab for Big Data Analytics, School of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China.
  • Jialun Wu
    School of Computer Science and Technology, Xi'an Jiaotong University, Xi'an 710049, China; Shaanxi Provincial Key Laboratory of Big Data Knowledge Engineering, Xi'an Jiaotong University, Xi'an 710049, China.
  • Lisha Yu
  • Chao Xing
  • Tieliang Gong
    School of Computer Science and Technology, Xi'an Jiaotong University, Xi'an 710049, China; Shaanxi Provincial Key Laboratory of Big Data Knowledge Engineering, Xi'an Jiaotong University, Xi'an 710049, China.
  • Yefeng Zheng
  • Deyu Meng
  • Min Zhou
    Department of Respiratory and Critical Care Medicine, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai 200025, China.
  • Chen Li
    School of Computer Science and Technology, Xi'an Jiaotong University, Xi'an, Shaanxi, China.