Deep Learning with S-shaped Rectified Linear Activation Units (1512.07030v1)

Published 22 Dec 2015 in cs.CV

Abstract: Rectified linear activation units are important components for state-of-the-art deep convolutional networks. In this paper, we propose a novel S-shaped rectified linear activation unit (SReLU) to learn both convex and non-convex functions, imitating the multiple function forms given by the two fundamental laws, namely the Weber-Fechner law and the Stevens law, in psychophysics and neural sciences. Specifically, SReLU consists of three piecewise linear functions, which are formulated by four learnable parameters. The SReLU is learned jointly with the training of the whole deep network through back propagation. During the training phase, to initialize SReLU in different layers, we propose a "freezing" method to degenerate SReLU into a predefined leaky rectified linear unit in the initial several training epochs and then adaptively learn the good initial values. SReLU can be universally used in the existing deep networks with negligible additional parameters and computation cost. Experiments with two popular CNN architectures, Network in Network and GoogLeNet on scale-various benchmarks including CIFAR10, CIFAR100, MNIST and ImageNet demonstrate that SReLU achieves remarkable improvement compared to other activation functions.



Summary

The paper introduces SReLU, a novel activation function that models both convex and non-convex relationships using a three-piece linear design.
It applies a "freezing" method that holds SReLU as a leaky ReLU for the first few training epochs, then sets the initial parameter values adaptively for each layer.
Experiments with Network in Network and GoogLeNet report lower error rates than ReLU and its variants on CIFAR-10, CIFAR-100, MNIST, and ImageNet.

Overview of Deep Learning with S-shaped Rectified Linear Activation Units

The paper introduces the S-shaped Rectified Linear Activation Unit (SReLU), a novel activation function designed to enhance the learning capability of deep neural networks by capturing both convex and non-convex transformation functions. The design is motivated by the Weber-Fechner and Stevens laws from psychophysics and neural sciences, which state that the perceived intensity of a stimulus follows a logarithmic law and a power law of its physical magnitude, respectively.

In recent convolutional neural networks (CNNs), non-saturated activation functions such as Rectified Linear Units (ReLU) have played a pivotal role in improving model convergence and mitigating the vanishing gradient problem associated with traditional saturated functions. SReLU goes beyond existing alternatives such as Leaky ReLU (LReLU), Parametric ReLU (PReLU), and Maxout by using a piecewise linear function with three segments, defined by four learnable parameters. The extra parameters make the unit more flexible, so it can model more complex non-linear relationships.
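For reference, the three-segment function can be written out as below. The symbols $t^{r}, a^{r}, t^{l}, a^{l}$ (right and left thresholds and slopes) are notation chosen here for illustration, the identity middle segment follows the usual statement of SReLU, and per-channel indices are omitted.

```latex
h(x) =
\begin{cases}
t^{r} + a^{r}\,(x - t^{r}), & x \ge t^{r} \\
x, & t^{l} < x < t^{r} \\
t^{l} + a^{l}\,(x - t^{l}), & x \le t^{l}
\end{cases}
```

Under this form, setting $a^{r} = 1$, $t^{l} = 0$ and $0 < a^{l} < 1$ reduces $h$ to a leaky ReLU for any $t^{r} > 0$, while $a^{r} < 1$ and $a^{l} < 1$ with $t^{l} < 0 < t^{r}$ gives an S-shaped, non-convex curve.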

Key Contributions

Piecewise Linear Design: SReLU is constructed from three linear segments, characterized by four learnable parameters: two threshold points and the slopes of the two outer segments. This configuration allows it to adaptively imitate various non-linear functions, including convex and non-convex forms, akin to those given by the Weber-Fechner and Stevens laws.
Adaptive Initialization: A novel "freezing" method holds the SReLU parameters fixed during the initial training epochs so that the unit behaves as a Leaky ReLU; good initial values are then learned adaptively from the data. This matters because input magnitudes can vary significantly between network layers, so a single fixed initialization would not suit every layer.
Universal Applicability: SReLU can integrate seamlessly with existing deep network architectures such as Network in Network (NIN) and GoogLeNet, incurring negligible additional computation or parameter costs.
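The design and the freezing step listed above can be made concrete with a minimal sketch in plain Python. It assumes the usual form of SReLU (identity between the two thresholds), and all names (`SReLUParams`, `srelu`, `frozen_init`, `kth_largest_threshold`) are hypothetical. The frozen setting `a_r = 1`, `t_l = 0` is one choice that makes the unit coincide with a leaky ReLU; taking the k-th largest observed input as the data-driven value for `t_r` is an assumption about how the adaptive step might be done, not something this summary specifies.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class SReLUParams:
    """Four learnable parameters of one SReLU unit (illustrative names)."""
    t_r: float  # right threshold
    a_r: float  # slope above the right threshold
    t_l: float  # left threshold
    a_l: float  # slope below the left threshold


def srelu(x: float, p: SReLUParams) -> float:
    """Three-segment activation: identity between the two thresholds."""
    if x >= p.t_r:
        return p.t_r + p.a_r * (x - p.t_r)
    if x <= p.t_l:
        return p.t_l + p.a_l * (x - p.t_l)
    return x


def leaky_relu(x: float, slope: float) -> float:
    """Reference leaky ReLU used to check the frozen configuration."""
    return x if x > 0 else slope * x


def frozen_init(leak: float = 0.2, t_r: float = 1.0) -> SReLUParams:
    """Parameters under which SReLU coincides with a leaky ReLU.

    With a_r = 1 the right segment is the identity, and with t_l = 0 the
    left segment is leak * x, so any positive t_r gives the same function.
    """
    return SReLUParams(t_r=t_r, a_r=1.0, t_l=0.0, a_l=leak)


def kth_largest_threshold(activations: Sequence[float], k: int) -> float:
    """Assumed data-driven reset of t_r after the frozen epochs:
    the k-th largest input observed for this unit (k is 1-based)."""
    return sorted(activations, reverse=True)[k - 1]


if __name__ == "__main__":
    p = frozen_init(leak=0.2)
    for x in (-2.0, 0.0, 0.5, 3.0):
        print(x, srelu(x, p), leaky_relu(x, 0.2))
```

During the frozen epochs the parameters would simply be excluded from updates; afterwards `t_r` could be reset per unit from the observed inputs and all four parameters trained by back propagation together with the rest of the network.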

Empirical Results

The empirical evaluation shows that networks employing SReLU consistently outperform those using existing activation functions such as ReLU, LReLU, and APL (adaptive piecewise linear units) across several datasets:

CIFAR-10 and CIFAR-100: Without data augmentation, NIN with SReLU reduces error by over 1% on CIFAR-10 and by 4.86% on CIFAR-100 compared with the same network using ReLU.
MNIST: NIN with SReLU reaches an error rate of 0.35% on MNIST, matching the best previously reported result, obtained with DSN (deeply-supervised nets), and improving on PReLU and LReLU.
ImageNet: For large-scale data, incorporating SReLU into GoogLeNet led to a 1.24% accuracy improvement, confirming its scalability and efficacy in deep networks operating on complex datasets.

Implications and Future Directions

The introduction of SReLU has both theoretical and practical implications for deep learning research. Theoretically, it ties the design of activation functions to psychophysical laws of perception. Practically, the added modeling capacity of SReLU could be useful in deep learning applications beyond image recognition, such as natural language processing, where complex data relationships prevail.

Future work may explore SReLU’s applications across diverse domains and extend it to more advanced neural network architectures. As neural networks become an integral component of many machine learning systems, enhancing their expressive power through activation functions like SReLU remains an important area of study.
