Volume-preserving Neural Networks (1911.09576v3)
Abstract: We propose a novel approach to addressing the vanishing (or exploding) gradient problem in deep neural networks. We construct a new architecture for deep neural networks where all layers (except the output layer) of the network are a combination of rotation, permutation, diagonal, and activation sublayers which are all volume preserving. Our approach replaces the standard weight matrix of a neural network with a combination of diagonal, rotational and permutation matrices, all of which are volume-preserving. We introduce a coupled activation function allowing us to preserve volume even in the activation function portion of a neural network layer. This control on the volume forces the gradient (on average) to maintain equilibrium and not explode or vanish. To demonstrate our architecture we apply our volume-preserving neural network model to two standard datasets.
Collections
Sign up for free to add this paper to one or more collections.
Paper Prompts
Sign up for free to create and run prompts on this paper using GPT-5.