EViT: An Eagle Vision Transformer With Bi-Fovea Self-Attention
Journal:
IEEE Transactions on Cybernetics
Published Date:
Mar 6, 2025
Abstract
Owing to advancements in deep learning technology, vision transformers (ViTs) have demonstrated impressive performance in various computer vision tasks. Nonetheless, ViTs still face challenges such as high computational complexity and the absence of desirable inductive biases. To alleviate these issues, the potential advantages of combining eagle vision with ViTs are explored. A bi-fovea visual interaction (BFVI) structure, inspired by the unique physiological and visual characteristics of eagle eyes, is introduced. Based on this design, a novel bi-fovea self-attention (BFSA) mechanism and a bi-fovea feedforward network (BFFN) are proposed. These components mimic the hierarchical and parallel information processing scheme of the biological visual cortex, enabling networks to learn feature representations of targets in a coarse-to-fine manner. Furthermore, a bionic eagle vision (BEV) block is designed as the basic building unit, combining the BFSA mechanism and the BFFN. By stacking BEV blocks, a unified and efficient family of pyramid backbone networks called eagle ViTs (EViTs) is developed. Experimental results indicate that EViTs achieve highly competitive performance on various computer vision tasks, demonstrating their potential as backbone networks. In terms of computational efficiency and scalability, EViTs show significant advantages over comparable networks. The code is available at https://github.com/nkusyl/EViT.
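The abstract does not spell out the internals of the BFSA mechanism or the BEV block; the authors' actual implementation is in the linked repository. The PyTorch sketch below is a minimal, hypothetical reading of the parallel, coarse-to-fine two-branch design the abstract describes: a coarse "fovea" branch attends over a pooled token grid while a fine branch attends at full resolution, and their outputs are fused before a feedforward sublayer. All module names and parameters here (BiFoveaSelfAttention, BEVBlock, pool, mlp_ratio) are illustrative assumptions, not the paper's definitions.

import torch
import torch.nn as nn
import torch.nn.functional as F

class BiFoveaSelfAttention(nn.Module):
    # Hypothetical sketch: a coarse branch attends over a downsampled token
    # grid, a fine branch attends at full resolution, and the two are fused.
    def __init__(self, dim, num_heads=4, pool=2):
        super().__init__()
        self.pool = pool
        self.coarse = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.fine = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.fuse = nn.Linear(2 * dim, dim)

    def forward(self, x, hw):
        B, N, C = x.shape
        H, W = hw
        # Coarse "fovea": pool the token grid, attend, upsample back.
        g = x.transpose(1, 2).reshape(B, C, H, W)
        g = F.avg_pool2d(g, self.pool).flatten(2).transpose(1, 2)
        g, _ = self.coarse(g, g, g)
        g = g.transpose(1, 2).reshape(B, C, H // self.pool, W // self.pool)
        g = F.interpolate(g, size=(H, W), mode="nearest")
        g = g.flatten(2).transpose(1, 2)
        # Fine "fovea": full-resolution self-attention.
        f, _ = self.fine(x, x, x)
        return self.fuse(torch.cat([g, f], dim=-1))

class BEVBlock(nn.Module):
    # Hypothetical BEV block: pre-norm attention plus a feedforward
    # sublayer, each wrapped in a residual connection.
    def __init__(self, dim, num_heads=4, mlp_ratio=4):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = BiFoveaSelfAttention(dim, num_heads)
        self.norm2 = nn.LayerNorm(dim)
        self.ffn = nn.Sequential(
            nn.Linear(dim, mlp_ratio * dim),
            nn.GELU(),
            nn.Linear(mlp_ratio * dim, dim),
        )

    def forward(self, x, hw):
        x = x + self.attn(self.norm1(x), hw)
        x = x + self.ffn(self.norm2(x))
        return x

# Usage: a batch of two 14x14 token grids with 64-dim embeddings.
tokens = torch.randn(2, 14 * 14, 64)
block = BEVBlock(dim=64)
out = block(tokens, hw=(14, 14))
print(out.shape)  # torch.Size([2, 196, 64])

A pyramid backbone in the EViT style would stack such blocks in stages, reducing spatial resolution and increasing channel width between stages; the stage layout and the exact fusion used in BFSA should be taken from the released code rather than this sketch.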