Explicitly Modeling Pre-Cortical Vision with a Neuro-Inspired Front-End Improves CNN Robustness
Journal: arXiv
Published Date: Sep 25, 2024
Abstract
While convolutional neural networks (CNNs) excel at clean image
classification, they struggle to classify images affected by common
corruptions, limiting their real-world applicability. Recent work has
shown that incorporating a CNN front-end block that simulates some features of
the primate primary visual cortex (V1) can improve overall model robustness.
Here, we expand on this approach by introducing two novel biologically-inspired
CNN model families that incorporate a new front-end block designed to simulate
pre-cortical visual processing. RetinaNet, a hybrid architecture containing the
novel front-end followed by a standard CNN back-end, shows a relative
robustness improvement of 12.3% when compared to the standard model; and EVNet,
which further adds a V1 block after the pre-cortical front-end, shows a
relative gain of 18.5%. The robustness improvement was observed across all
corruption categories, although it was accompanied by a small decrease in
clean image accuracy, and it generalized to a different back-end architecture.
These findings show that simulating multiple stages of early visual processing
in the early layers of a CNN provides cumulative benefits for model robustness.
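
The relative gains quoted above are percentages of a baseline score rather than absolute differences in accuracy. The abstract does not state the formula or the underlying scores, so the sketch below assumes the usual definition of relative change, (model - baseline) / baseline. The function name `relative_gain` and both accuracy values are hypothetical and chosen only to illustrate the arithmetic.

```python
def relative_gain(model_score: float, baseline_score: float) -> float:
    """Return the improvement of model_score over baseline_score, in percent.

    Assumes the standard relative-change definition; the abstract does not
    spell out how its relative gains were computed.
    """
    if baseline_score <= 0:
        raise ValueError("baseline_score must be positive")
    return 100.0 * (model_score - baseline_score) / baseline_score


# Hypothetical accuracies on corrupted images, for illustration only.
# These are NOT values reported in the paper.
baseline_acc = 0.40
variant_acc = 0.45

gain = relative_gain(variant_acc, baseline_acc)
print(f"relative gain: {gain:.1f}%")
```

Under this definition, an absolute increase of five accuracy points on a 40% baseline is a relative gain of about 12.5%, which is why a relative figure can look larger than the corresponding absolute change.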