Enabling scale and rotation invariance in convolutional neural networks with retina-like transformation.
Journal:
Neural Networks: The Official Journal of the International Neural Network Society
PMID:
40121784
Abstract
Traditional convolutional neural networks (CNNs) struggle with scale and rotation transformations, resulting in reduced performance on transformed images. Previous research has focused on designing specific CNN modules to extract transformation-invariant features. However, these methods lack versatility and do not generalize across a wide range of scenarios. Drawing inspiration from human visual invariance, we propose a novel brain-inspired approach to tackle the invariance problem in CNNs. If we regard a CNN as the visual cortex, we can design an "eye" that exhibits transformation invariance, allowing the CNN to perceive the world consistently. We therefore propose a retina module and integrate it into CNNs to create transformation-invariant CNNs (TICNN), achieving scale and rotation invariance. The retina module comprises a retina-like transformation and a transformation-aware neural network (TANN). The retina-like transformation supports flexible image warping, while the TANN controls the scaling and rotation that it applies. Specifically, we propose a reference-based training method (RBTM) in which the retina module learns to align input images with a reference scale and rotation, thereby achieving invariance. Furthermore, we provide a mathematical justification of the retina module to confirm its feasibility. Experimental results also demonstrate that our method outperforms existing methods in recognizing images with scale and rotation variations. The code will be released at https://github.com/JiaHongZ/TICNN.
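The abstract describes the mechanism only at a high level. For a concrete picture, the following is a minimal PyTorch sketch of the general idea, assuming a small CNN stands in for the TANN and a spatial-transformer-style affine warp stands in for the retina-like transformation; the class name RetinaModule and all layer choices are illustrative assumptions, not the authors' released implementation (see the linked repository for that).

```python
# Hypothetical sketch of the retina-module idea, NOT the authors' code.
# Assumption: the TANN is approximated by a small CNN predicting a scale
# factor s and rotation angle theta, and the retina-like transformation is
# approximated by a differentiable affine warp (spatial-transformer style).
import torch
import torch.nn as nn
import torch.nn.functional as F

class RetinaModule(nn.Module):
    def __init__(self):
        super().__init__()
        # Small CNN standing in for the TANN: outputs (log-scale, angle).
        self.tann = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, 2),
        )

    def forward(self, x):
        params = self.tann(x)
        s = torch.exp(params[:, 0])   # scale factor, kept positive
        theta = params[:, 1]          # rotation angle in radians
        cos, sin = torch.cos(theta), torch.sin(theta)
        zero = torch.zeros_like(s)
        # 2x3 affine matrices that undo the estimated scale and rotation,
        # mapping the input toward a reference scale and orientation.
        mat = torch.stack([
            torch.stack([cos / s, -sin / s, zero], dim=1),
            torch.stack([sin / s,  cos / s, zero], dim=1),
        ], dim=1)
        grid = F.affine_grid(mat, x.size(), align_corners=False)
        return F.grid_sample(x, grid, align_corners=False)

# The warped image is then fed to an ordinary CNN backbone. Under a
# reference-based training scheme like the RBTM described above, the module
# would be trained so its output matches a reference scale and rotation.
canonical = RetinaModule()(torch.randn(4, 3, 64, 64))
```

Because the warp is differentiable, such a module can be trained end to end with the downstream classifier, which is presumably what lets the retina module learn the alignment behavior the abstract describes.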