Bidirectionally self-normalizing neural networks.

Journal: Neural networks : the official journal of the International Neural Network Society

PMID: 37666186

Abstract

The problem of vanishing and exploding gradients has been a long-standing obstacle that hinders the effective training of neural networks. Despite various tricks and techniques that have been employed to alleviate the problem in practice, satisfactory theories and provable solutions are still lacking. In this paper, we address the problem from the perspective of high-dimensional probability theory. We provide a rigorous result that shows, under mild conditions, how the vanishing/exploding gradients problem disappears with high probability if the neural networks have sufficient width. Our main idea is to constrain both forward and backward signal propagation in a nonlinear neural network through a new class of activation functions, namely Gaussian-Poincaré normalized functions, and orthogonal weight matrices. Experiments on both synthetic and real-world data validate our theory and confirm its effectiveness on very deep neural networks when applied in practice.
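The abstract does not define Gaussian-Poincaré normalized functions, so the sketch below assumes one reading of the definition: an activation φ is Gaussian-Poincaré normalized if E[φ(X)²] = E[φ'(X)²] = 1 for X ~ N(0, 1), and an arbitrary activation can be rescaled into one as φ̂(x) = a(φ(x) + b). Because φ̂' = aφ', the derivative condition fixes a = 1/√E[φ'(X)²]; the remaining condition is a quadratic in b, which has a real root because the Gaussian-Poincaré inequality gives Var[φ(X)] ≤ E[φ'(X)²]. The code computes a and b for tanh by Gauss-Hermite quadrature, then stacks the rescaled tanh with random orthogonal weight matrices to check that the forward signal norm and the backpropagated gradient norm stay close to their input values. The helper names (gauss_mean, gpn_constants, random_orthogonal, propagate) and the width, depth, and seed are hypothetical choices for illustration, not taken from the paper.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

# Probabilists' Gauss-Hermite rule: sum(w * f(x)) approximates the integral of
# f(x) * exp(-x**2 / 2); dividing by sqrt(2*pi) gives E[f(X)] for X ~ N(0, 1).
_NODES, _WEIGHTS = hermegauss(80)
_NORM = np.sqrt(2.0 * np.pi)


def gauss_mean(f):
    """Approximate E[f(X)] for X ~ N(0, 1)."""
    return float(np.sum(_WEIGHTS * f(_NODES)) / _NORM)


def gpn_constants(phi, dphi):
    """Return (a, b) so that phi_hat(x) = a * (phi(x) + b) satisfies the
    ASSUMED conditions E[phi_hat(X)^2] = E[phi_hat'(X)^2] = 1."""
    m1 = gauss_mean(phi)                     # E[phi]
    m2 = gauss_mean(lambda x: phi(x) ** 2)   # E[phi^2]
    d2 = gauss_mean(lambda x: dphi(x) ** 2)  # E[phi'^2]
    a = 1.0 / np.sqrt(d2)                    # makes E[phi_hat'^2] = 1
    # E[(phi + b)^2] = d2  <=>  b^2 + 2*m1*b + (m2 - d2) = 0.
    # Its discriminant / 4 is d2 - Var[phi] >= 0 (Gaussian-Poincare inequality).
    disc = d2 - (m2 - m1 ** 2)
    b = -m1 + np.sqrt(max(disc, 0.0))
    return a, b


def random_orthogonal(n, rng):
    """Haar-random orthogonal matrix: QR of a Gaussian matrix, column signs fixed."""
    q, r = np.linalg.qr(rng.standard_normal((n, n)))
    return q * np.sign(np.diag(r))


def propagate(depth=20, width=1024, seed=0):
    """Run layers h = phi_hat(W h) with orthogonal W, forward then backward.
    Returns (||h_L||^2 / width, ||dL/dx|| / ||dL/dh_L||)."""
    rng = np.random.default_rng(seed)
    a, b = gpn_constants(np.tanh, lambda x: 1.0 - np.tanh(x) ** 2)

    def phi_hat(z):
        return a * (np.tanh(z) + b)

    def dphi_hat(z):
        return a * (1.0 - np.tanh(z) ** 2)

    # Input rescaled so that ||x||^2 / width = 1.
    x = rng.standard_normal(width)
    x *= np.sqrt(width) / np.linalg.norm(x)

    h, weights, pre_acts = x, [], []
    for _ in range(depth):
        w = random_orthogonal(width, rng)
        z = w @ h
        weights.append(w)
        pre_acts.append(z)
        h = phi_hat(z)
    forward_ratio = float(np.linalg.norm(h) ** 2 / width)

    # Backpropagate a random unit-norm upstream gradient dL/dh_L to the input.
    g = rng.standard_normal(width)
    g /= np.linalg.norm(g)
    for w, z in zip(reversed(weights), reversed(pre_acts)):
        g = w.T @ (dphi_hat(z) * g)  # chain rule through phi_hat, then through W
    return forward_ratio, float(np.linalg.norm(g))


if __name__ == "__main__":
    print("forward / backward ratios:", propagate())
```

Under these assumptions, both ratios returned by propagate() are expected to stay within a modest factor of 1 rather than shrinking or growing geometrically with depth, and increasing the width should tighten them, which matches the abstract's statement that the result holds with high probability for sufficiently wide networks. Orthogonal matrices are used because they preserve Euclidean norms exactly in both directions (‖Wx‖ = ‖x‖ and ‖Wᵀg‖ = ‖g‖), so any remaining change in norm comes only from the activation.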

Authors

Yao Lu

Department of Laboratory Medicine, The First Affiliated Hospital of Ningbo University, Ningbo First Hospital, Ningbo, China.
Stephen Gould

Department of Biological Chemistry, Johns Hopkins University School of Medicine, Baltimore, MD USA; Extracellular RNA Communication Consortium (ERCC); American Society for Exosomes and Microvesicles (ASEMV)
Thalaiyasingam Ajanthan

Australian National University, Australia; Amazon. Electronic address: thalaiyasingam.ajanthan@anu.edu.au.

Keywords

Neural Networks, Computer; Normal Distribution
