Sensitivity and Generalization in Neural Networks: an Empirical Study

Published 23 Feb 2018 in stat.ML, cs.AI, cs.LG, and cs.NE | (1802.08760v3)

Abstract: In practice it is often found that large over-parameterized neural networks generalize better than their smaller counterparts, an observation that appears to conflict with classical notions of function complexity, which typically favor smaller models. In this work, we investigate this tension between complexity and generalization through an extensive empirical exploration of two natural metrics of complexity related to sensitivity to input perturbations. Our experiments survey thousands of models with various fully-connected architectures, optimizers, and other hyper-parameters, as well as four different image classification datasets. We find that trained neural networks are more robust to input perturbations in the vicinity of the training data manifold, as measured by the norm of the input-output Jacobian of the network, and that it correlates well with generalization. We further establish that factors associated with poor generalization, such as full-batch training or using random labels, correspond to lower robustness, while factors associated with good generalization, such as data augmentation and ReLU non-linearities, give rise to more robust functions. Finally, we demonstrate how the input-output Jacobian norm can be predictive of generalization at the level of individual test points.


Citations (430)


Summary

The paper shows that lower sensitivity to input perturbations, quantified by the norm of the input-output Jacobian, correlates well with better generalization.
It surveys thousands of fully-connected models across architectures, optimizers, and other hyper-parameters on four image classification datasets, and relates sensitivity to factors such as data augmentation, random labels, and full-batch versus mini-batch training.
The study suggests that prioritizing robustness to input perturbations over traditional complexity measures can enhance model selection and uncertainty estimation.

Sensitivity and Generalization in Neural Networks: An Empirical Study

A long-standing puzzle in deep learning is that large, over-parameterized neural networks often generalize better than smaller ones, even though classical complexity measures such as parameter count and capacity favor the smaller models. The paper "Sensitivity and Generalization in Neural Networks: an Empirical Study" investigates this tension empirically, focusing on two complexity metrics based on sensitivity to input perturbations.

Key Findings

The researchers examine thousands of fully-connected networks across varied architectures, optimizers, and hyper-parameters. The central finding is that networks with lower sensitivity to input perturbations, as measured by the norm of the input-output Jacobian, tend to generalize better. Trained networks are also more robust in the vicinity of the training data manifold, and the Jacobian norm measured there correlates well with generalization.
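As a reading aid, the sensitivity measure can be written out explicitly. The notation below is an assumption for illustration (the symbols $f$, $J$, and $D_{\text{test}}$ do not appear in the summary above): $f$ maps an input $x$ to the network's output vector, and the Jacobian norm is averaged over test points.

$$
J(x) = \frac{\partial f(x)}{\partial x^{\top}}, \qquad
\lVert J(x) \rVert_F = \sqrt{\sum_{i,j} J_{ij}(x)^2}, \qquad
\text{sensitivity} = \mathbb{E}_{x \sim D_{\text{test}}}\big[\lVert J(x) \rVert_F\big]
$$

A small value means that a small change in the input produces only a small change in the output around typical data points.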

The study also finds that training choices known to improve generalization, such as data augmentation and mini-batch (rather than full-batch) optimization, produce more robust networks, as shown by lower sensitivity. This link between robustness and generalization is at odds with classical notions of complexity, which would predict poorer generalization from larger models.

Experimental Approach and Results

The researchers employ two major sensitivity metrics:

The norm of the input-output Jacobian.
The number of transitions or linear region changes along sampled input paths.
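The two metrics listed above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the paper's implementation: the network is a randomly initialized ReLU MLP with softmax outputs, the helper names (`init_mlp`, `forward`, `jacobian_norm`, `count_transitions`) are hypothetical, the path is a straight line between two random points, and transitions are approximated by counting ReLU units that flip between consecutive samples on the path.

```python
import numpy as np

def init_mlp(rng, sizes):
    """Random fully-connected network; returns a list of (W, b) pairs."""
    return [(rng.normal(0.0, np.sqrt(2.0 / m), size=(n, m)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def forward(params, x):
    """Return softmax probabilities and the ReLU on/off pattern per hidden layer."""
    pattern, h = [], x
    for W, b in params[:-1]:
        z = W @ h + b
        mask = z > 0
        pattern.append(mask)
        h = z * mask                      # ReLU
    W, b = params[-1]
    logits = W @ h + b
    e = np.exp(logits - logits.max())     # numerically stable softmax
    return e / e.sum(), pattern

def jacobian_norm(params, x):
    """Frobenius norm of d(probabilities)/d(input) at x, via the chain rule."""
    p, pattern = forward(params, x)
    J = np.eye(x.size)
    for (W, _), mask in zip(params[:-1], pattern):
        J = (W * mask[:, None]) @ J       # diag(mask) @ W @ J
    J = params[-1][0] @ J                 # d(logits)/d(input)
    J = (np.diag(p) - np.outer(p, p)) @ J # softmax Jacobian
    return np.linalg.norm(J)              # Frobenius norm for a 2-D array

def count_transitions(params, path):
    """Count ReLU units that flip between consecutive points on the path."""
    codes = [np.concatenate(forward(params, x)[1]) for x in path]
    return int(sum(np.sum(a != b) for a, b in zip(codes[:-1], codes[1:])))

# Hypothetical usage on random data (assumed sizes: 8 inputs, 4 classes).
rng = np.random.default_rng(0)
params = init_mlp(rng, [8, 16, 16, 4])
x0, x1 = rng.normal(size=8), rng.normal(size=8)
path = [(1 - t) * x0 + t * x1 for t in np.linspace(0.0, 1.0, 200)]
print("Jacobian norm at x0:", jacobian_norm(params, x0))
print("Transitions along path:", count_transitions(params, path))
```

Because a ReLU network is piecewise linear, the Jacobian can be assembled exactly from the activation pattern at `x`; an automatic-differentiation library would give the same result for smoother non-linearities.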

Across experiments on four image classification datasets (including CIFAR10 and MNIST), good generalization consistently coincides with reduced sensitivity. Conditions known to hurt generalization, such as random labels or full-batch training, also increase sensitivity. Conversely, ReLU non-linearities, data augmentation, and mini-batch stochastic optimization tend to decrease sensitivity, in line with their better generalization.

The study also examines individual test points, asking whether the Jacobian norm at a point predicts how well the network generalizes there. The relationship is not perfectly clean, but points with higher sensitivity (as indicated by the Jacobian norm) tend to be classified less confidently, which suggests potential applications in uncertainty estimation and active learning.
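The per-point idea can be illustrated on a toy model. The sketch below is an assumption-laden example, not the paper's setup: it uses a linear softmax classifier with random weights (where the Jacobian has the closed form `(diag(p) - p pᵀ) W`), and the function names `per_point_jacobian_norms` and `flag_most_sensitive` are hypothetical. It ranks inputs by Jacobian norm and flags the most sensitive ones as candidates for review or labeling.

```python
import numpy as np

def softmax(z):
    """Row-wise numerically stable softmax."""
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def per_point_jacobian_norms(W, b, X):
    """Frobenius norm of d(probabilities)/d(input) for each row of X."""
    P = softmax(X @ W.T + b)                    # shape (n_points, n_classes)
    norms = [np.linalg.norm((np.diag(p) - np.outer(p, p)) @ W) for p in P]
    return np.array(norms), P

def flag_most_sensitive(norms, fraction=0.1):
    """Indices of the most sensitive points, highest norm first."""
    k = max(1, int(round(fraction * len(norms))))
    return np.argsort(norms)[::-1][:k]

# Hypothetical data: 3 classes, 5 features, 200 test inputs.
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 5))
b = np.zeros(3)
X = rng.normal(size=(200, 5))

norms, P = per_point_jacobian_norms(W, b, X)
flagged = flag_most_sensitive(norms, fraction=0.05)
print("mean max-probability, flagged points:", P[flagged].max(axis=1).mean())
print("mean max-probability, all points:    ", P.max(axis=1).mean())
```

In this toy model the Jacobian vanishes as the predicted distribution approaches one-hot, so a high norm can only occur where the prediction is not fully confident. Whether the same ranking is useful for a trained deep network is an empirical question of the kind the paper investigates.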

Implications and Speculations

The findings present the Jacobian norm as a practical sensitivity metric for gauging model robustness and, through it, generalization. On the theoretical side, they suggest that complexity measures should account for how sensitive a function is near the data manifold, rather than relying only on architectural quantities such as parameter count.

Practically, these results suggest that robustness to input perturbations deserves weight in model selection alongside standard validation metrics. The relationship between model capacity, trainability, and generalization highlighted in the paper could also inform hyper-parameter tuning strategies and regularization schemes.

Future Directions

Building on these insights, future research could extend the empirical analysis to more complex architectures, including convolutional and transformer-based networks, examining how different model configurations and tasks influence sensitivity. Another potential avenue is exploring sensitivity's role in adversarial robustness, offering broader applications in secure AI development.

Overall, the study offers empirical evidence that sensitivity near the data manifold, rather than model size alone, tracks generalization in neural networks, and it motivates theoretical frameworks built around that observation.
