EViT: An Eagle Vision Transformer With Bi-Fovea Self-Attention.

Journal: IEEE Transactions on Cybernetics
Published Date:

Abstract

Owing to advancements in deep learning technology, vision transformers (ViTs) have demonstrated impressive performance in various computer vision tasks. Nonetheless, ViTs still face some challenges, such as high computational complexity and the absence of desirable inductive biases. To alleviate these issues, the potential advantages of combining eagle vision with ViTs are explored. A bi-fovea visual interaction (BFVI) structure inspired by the unique physiological and visual characteristics of eagle eyes is introduced. Based on this structural design approach, a novel bi-fovea self-attention (BFSA) mechanism and bi-fovea feedforward network (BFFN) are proposed. These components are employed to mimic the hierarchical and parallel information processing scheme of the biological visual cortex, thereby enabling networks to learn the feature representations of the targets in a coarse-to-fine manner. Furthermore, a bionic eagle vision (BEV) block is designed as the basic building unit based on the BFSA mechanism and the BFFN. By stacking the BEV blocks, a unified and efficient family of pyramid backbone networks called eagle ViTs (EViTs) is developed. Experimental results indicate that the EViTs exhibit highly competitive performance in various computer vision tasks, demonstrating their potential as backbone networks. In terms of computational efficiency and scalability, EViTs show significant advantages compared with their counterparts. The developed code is available at https://github.com/nkusyl/EViT.

Authors

  • Yulong Shi
    CAS Key Laboratory of Receptor Research, State Key Laboratory of Drug Research; Drug Discovery and Design Center, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, 201203, China.
  • Mingwei Sun
    Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun 130012, China.
  • Yongshuai Wang
  • Jiahao Ma
  • Zengqiang Chen
    College of Artificial Intelligence, Nankai University, Tianjin 300350, China.