Deep local-to-global feature learning for medical image super-resolution.

Journal: Computerized medical imaging and graphics : the official journal of the Computerized Medical Imaging Society

Abstract

Medical images play a vital role in medical analysis by providing crucial information about patients' pathological conditions. However, the quality of these images can be compromised by many factors, such as the limited resolution of imaging instruments, motion artifacts, and the complexity of the scanned areas. As a result, low-resolution (LR) images cannot provide sufficient information for diagnosis. To address this issue, researchers have applied image super-resolution (SR) techniques to restore high-resolution (HR) images from their LR counterparts. However, these techniques are designed for generic images and thus face challenges unique to medical images. An obvious one is the diversity of the scanned objects; for example, organs, tissues, and vessels typically appear in different sizes and shapes, making them hard to restore with standard convolutional neural networks (CNNs). In this paper, we develop a dynamic-local learning framework, built on deformable convolutions with adjustable kernel shapes, to capture the details of these diverse areas. Moreover, the global information between tissues and organs is vital for medical diagnosis. To preserve it, we propose pixel-to-pixel and patch-to-patch global learning using a non-local mechanism and a vision transformer (ViT), respectively. The result is a novel CNN-ViT neural network with Local-to-Global feature learning for medical image SR, referred to as LGSR, which accurately restores both local details and global information. We evaluate our method on six public datasets and one large-scale private dataset, covering five types of medical images (Ultrasound, OCT, Endoscope, CT, and MRI). Experiments show that the proposed method achieves superior PSNR/SSIM and visual quality compared with state-of-the-art methods, at competitive computational cost measured in network parameters, runtime, and FLOPs. Furthermore, an experiment on the downstream task of OCT image segmentation demonstrates that LGSR has a significant positive effect on segmentation performance.
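
The sketch below is a minimal, illustrative PyTorch rendering of the local-to-global idea described in the abstract: a deformable-convolution branch for local details, a non-local (pixel-to-pixel) attention block, a small ViT-style (patch-to-patch) encoder, and pixel-shuffle upsampling. Module names, channel sizes, and the exact composition are assumptions for illustration, not the authors' LGSR definition.

```python
# Illustrative sketch only: not the authors' LGSR architecture or training setup.
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d


class DeformableLocalBlock(nn.Module):
    """Local branch: a deformable conv whose sampling offsets are predicted per pixel."""
    def __init__(self, channels, kernel_size=3):
        super().__init__()
        pad = kernel_size // 2
        # 2 offsets (x, y) per kernel element, predicted from the input feature map.
        self.offset_conv = nn.Conv2d(channels, 2 * kernel_size * kernel_size,
                                     kernel_size, padding=pad)
        self.deform_conv = DeformConv2d(channels, channels, kernel_size, padding=pad)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        offsets = self.offset_conv(x)
        return self.act(self.deform_conv(x, offsets)) + x


class NonLocalBlock(nn.Module):
    """Pixel-to-pixel global branch: embedded-Gaussian non-local attention."""
    def __init__(self, channels):
        super().__init__()
        inner = channels // 2
        self.theta = nn.Conv2d(channels, inner, 1)
        self.phi = nn.Conv2d(channels, inner, 1)
        self.g = nn.Conv2d(channels, inner, 1)
        self.out = nn.Conv2d(inner, channels, 1)

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.theta(x).flatten(2).transpose(1, 2)      # (b, hw, c')
        k = self.phi(x).flatten(2)                        # (b, c', hw)
        v = self.g(x).flatten(2).transpose(1, 2)          # (b, hw, c')
        attn = torch.softmax(q @ k / (k.shape[1] ** 0.5), dim=-1)
        y = (attn @ v).transpose(1, 2).reshape(b, -1, h, w)
        return self.out(y) + x


class PatchViTBlock(nn.Module):
    """Patch-to-patch global branch: one transformer encoder layer over patch tokens."""
    def __init__(self, channels, patch_size=4, num_heads=4):
        super().__init__()
        self.patch_size = patch_size
        dim = channels * patch_size * patch_size
        self.unfold = nn.Unfold(patch_size, stride=patch_size)
        self.encoder = nn.TransformerEncoderLayer(d_model=dim, nhead=num_heads,
                                                  batch_first=True)

    def forward(self, x):
        b, c, h, w = x.shape
        tokens = self.unfold(x).transpose(1, 2)           # (b, num_patches, dim)
        tokens = self.encoder(tokens)
        y = nn.functional.fold(tokens.transpose(1, 2), (h, w),
                               self.patch_size, stride=self.patch_size)
        return y + x


class LocalToGlobalSR(nn.Module):
    """Toy SR network chaining local -> pixel-level global -> patch-level global blocks."""
    def __init__(self, in_ch=1, channels=32, scale=2):
        super().__init__()
        self.head = nn.Conv2d(in_ch, channels, 3, padding=1)
        self.body = nn.Sequential(DeformableLocalBlock(channels),
                                  NonLocalBlock(channels),
                                  PatchViTBlock(channels))
        self.up = nn.Sequential(nn.Conv2d(channels, in_ch * scale * scale, 3, padding=1),
                                nn.PixelShuffle(scale))

    def forward(self, lr):
        return self.up(self.body(self.head(lr)))


if __name__ == "__main__":
    lr = torch.randn(1, 1, 64, 64)        # e.g., a low-resolution grayscale CT slice
    print(LocalToGlobalSR()(lr).shape)    # torch.Size([1, 1, 128, 128])
```

In this sketch the three blocks are simply chained with residual connections; how LGSR actually fuses its local and global branches, and how the network is trained, is described in the paper itself.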

Authors

  • Wenfeng Huang
    State Key Laboratory of Precision Spectroscopy, Quantum Institute for Light and Atoms, Department of Physics and Electronic Science, East China Normal University, Shanghai 200062, China.
  • Xiangyun Liao
    Guangdong Provincial Key Laboratory of Machine Vision and Virtual Reality Technology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China. Electronic address: xyunliao@gmail.com.
  • Hao Chen
    The First School of Medicine, Wenzhou Medical University, Wenzhou, China.
  • Ying Hu
    Department of Ultrasonography, The First Affiliated Hospital, College of Medicine, Zhejiang University, Qingchun Road No. 79, Hangzhou, Zhejiang 310003, China.
  • Wenjing Jia
    School of Electrical and Data Engineering (SEDE), University of Technology Sydney, 2007, Sydney, Australia.
  • Qiong Wang
    Beijing Meiling Biotechnology Corporation, Beijing, 102600, PR China.