HyFormer: Hybrid Transformer and CNN for Pixel-Level Multispectral Image Land Cover Classification.

Journal: International Journal of Environmental Research and Public Health
PMID:

Abstract

Most convolutional neural networks (CNNs) cannot take pixelwise input in remote sensing (RS) classification and do not adequately represent spectral sequence information. To address both problems, we propose HyFormer, a new Transformer-based framework for multispectral RS image classification. First, we design a network that combines fully connected (FC) layers with a CNN: the 1D pixelwise spectral sequences produced by the FC layers are reshaped into a 3D spectral feature matrix that serves as the CNN input. The FC layers raise the dimensionality of the features and increase their expressiveness, and the reshaping solves the problem that a 2D CNN cannot perform pixel-level classification. Second, features from three levels of the CNN are extracted and combined with the linearly transformed spectral information to strengthen the representation. The combined features are fed to the Transformer encoder, whose powerful global modelling capability refines the CNN features, and skip connections between adjacent encoders enhance the fusion of information from different levels. The pixel classification results are obtained from an MLP head. We focus on the distribution of land cover features in the eastern part of Changxing County and the central part of Nanxun District, Zhejiang Province, and conduct experiments on Sentinel-2 multispectral RS images. In the Changxing County study area, HyFormer reaches an overall accuracy of 95.37%, compared with 94.15% for the Transformer (ViT). In the Nanxun District study area, HyFormer reaches an overall accuracy of 95.4%, compared with 94.69% for the Transformer (ViT). On the Sentinel-2 dataset, HyFormer therefore outperforms the Transformer.
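
The first step described in the abstract, expanding one pixel's spectral vector with a fully connected layer and reshaping the result into a 3D feature matrix for a 2D CNN, can be illustrated with a minimal sketch. It uses only the Python standard library. The band count, the target shape, the random weights, and the function names `fully_connected` and `reshape_3d` are assumptions chosen for illustration; the abstract does not state the actual layer sizes, and this is not the authors' implementation.

```python
import random

def fully_connected(spectrum, weights, bias):
    """Linear layer: one output per weight row (y = W x + b)."""
    return [sum(w * x for w, x in zip(row, spectrum)) + b
            for row, b in zip(weights, bias)]

def reshape_3d(flat, channels, height, width):
    """Reshape a flat vector into a channels x height x width nested list."""
    if len(flat) != channels * height * width:
        raise ValueError("vector length does not match target shape")
    return [[[flat[(c * height + h) * width + w] for w in range(width)]
             for h in range(height)]
            for c in range(channels)]

# Hypothetical sizes: 10 spectral bands per pixel, expanded to a 4 x 8 x 8 cube.
BANDS, CHANNELS, HEIGHT, WIDTH = 10, 4, 8, 8
rng = random.Random(0)
out_dim = CHANNELS * HEIGHT * WIDTH

# Untrained random weights stand in for a learned FC layer.
weights = [[rng.uniform(-0.1, 0.1) for _ in range(BANDS)] for _ in range(out_dim)]
bias = [0.0] * out_dim

pixel = [rng.random() for _ in range(BANDS)]  # one pixel's band values
cube = reshape_3d(fully_connected(pixel, weights, bias), CHANNELS, HEIGHT, WIDTH)
print(len(cube), len(cube[0]), len(cube[0][0]))  # prints: 4 8 8
```

Because every pixel yields its own small 3D matrix, a 2D CNN can convolve over that matrix and still emit one label per pixel, which is the property the abstract attributes to the FC and reshape stage.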

Authors

  • Chuan Yan
    School of Automation, Guangxi University of Science and Technology, Liuzhou 545006, China.
  • Xiangsuo Fan
    College of Automation, Guangxi University of Science and Technology, Liuzhou 545006, China.
  • Jinlong Fan
    National Satellite Meteorological Center, China Meteorological Administration, Beijing 100081, China.
  • Ling Yu
    School of Automation, Guangxi University of Science and Technology, Liuzhou 545006, China.
  • Nayi Wang
    School of Automation, Guangxi University of Science and Technology, Liuzhou 545006, China.
  • Lin Chen
    College of Sports, Nanjing Tech University, Nanjing, China.
  • Xuyang Li
    School of Automation, Guangxi University of Science and Technology, Liuzhou 545006, China.