- The paper demonstrates the effectiveness of Selective-Backprop in dynamically prioritizing high-loss samples to optimize training efficiency.
- It employs rigorous experiments on CIFAR10, CIFAR100, and SVHN with various architectures, showing significant reductions in wall-clock training time.
- Selective-Backprop achieves up to a 15.9x speedup, highlighting its potential to reduce computational costs while maintaining model accuracy.
A Comprehensive Analysis of Selective-Backprop in Neural Network Training
The paper presents an in-depth exploration of the Selective-Backprop (SB) algorithm for neural network training, focusing on its efficacy across varying training conditions and architectures. The approach is evaluated against conventional training methodologies to assess its impact on computational efficiency and accuracy.
The central premise of Selective-Backprop is to improve training efficiency by prioritizing certain samples during the backpropagation phase. The algorithm dynamically selects examples based on the network's current state, distinguishing it from traditional uniform sampling, in which all examples are treated equally regardless of their loss or potential contribution to learning.
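The selection rule can be sketched in a few lines. The following is an illustrative reconstruction based on the high-level description above, not the authors' code: it assumes a sliding window of recent per-example losses and keeps an example for the backward pass with probability equal to the loss's empirical percentile raised to a power `beta` (all names, and the window/`beta` defaults, are ours).

```python
import numpy as np

class LossHistory:
    """Sliding window of recently observed per-example losses."""
    def __init__(self, capacity=1024):
        self.capacity = capacity
        self.losses = []

    def add(self, batch_losses):
        self.losses.extend(batch_losses)
        self.losses = self.losses[-self.capacity:]

    def percentile_of(self, loss):
        # Empirical CDF: fraction of recent losses at or below this loss.
        if not self.losses:
            return 1.0
        return np.mean(np.asarray(self.losses) <= loss)

def select_for_backprop(batch_losses, history, beta=2.0, rng=None):
    """Return indices of batch examples kept for the backward pass.

    High-loss examples (high percentile) are kept with probability close
    to 1; low-loss examples are usually skipped. Larger beta -> more
    aggressive filtering.
    """
    rng = rng or np.random.default_rng(0)
    keep = []
    for i, loss in enumerate(batch_losses):
        prob = history.percentile_of(loss) ** beta
        if rng.random() < prob:
            keep.append(i)
    history.add(batch_losses)  # update the window with the new losses
    return keep
```

In a real training loop, the forward pass and per-example loss computation would still run on the full batch; only the examples returned by `select_for_backprop` would contribute to the backward pass.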
Experimental Setup and Results
Selective-Backprop's performance was assessed using several benchmark datasets including CIFAR10, CIFAR100, and SVHN, with its effectiveness tested across multiple neural network architectures such as ResNet18, DenseNet, and MobileNetV2. The experiments underscored several key findings:
- Variance in Sample Selection: The research highlights the variance in relative losses that arises when sampling techniques are integrated, contrasting this with conventional training. Because Selective-Backprop bases its selection on the most up-to-date network state, importance estimates track the network as it evolves rather than relying on stale loss values [Figure \ref{fig:cifar10-forgetting}].
- Accelerated Learning Rate Schedule: Under an accelerated learning-rate schedule, Selective-Backprop demonstrated a notable reduction in the wall-clock time required to reach target error rates. This is particularly evident when the learning-rate decay is applied at earlier epochs for each dataset, indicating the potential of Selective-Backprop to further compress training time [Figure \ref{fig:strategy-seconds-lr3}].
- Computational Cost Asymmetry: The paper also examines the asymmetry between the costs of the backward and forward passes on modern GPU architectures, noting that the backward pass can take up to 2.5x as long as the forward pass. This asymmetry is what makes skipping backward passes worthwhile, substantiating the computational advantages offered by SB [Figure \ref{fig:asymmetry}].
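The cost asymmetry implies a simple per-epoch bound, which can be sketched as follows. This is our own back-of-the-envelope estimate, not a formula from the paper: it assumes the forward pass still runs on every example, while the backward pass (costing `r` times the forward pass) runs only on the selected fraction. End-to-end speedups such as the reported 15.9x also reflect faster convergence, not just this per-epoch saving.

```python
def epoch_speedup(backward_to_forward_ratio, backprop_fraction):
    """Per-epoch speedup from skipping the backward pass on some examples.

    Baseline cost per example: 1 (forward) + r (backward).
    Selective cost per example: 1 (forward) + r * s (backward on fraction s).
    """
    r = backward_to_forward_ratio
    s = backprop_fraction
    return (1 + r) / (1 + r * s)
```

With the paper's stated ratio of up to 2.5x, backpropagating on a third of the examples gives `epoch_speedup(2.5, 1/3)`, roughly a 1.9x per-epoch saving; the limit as the fraction goes to zero is `1 + r`, or 3.5x.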
Performance Metrics and Comparative Analysis
The speedup achieved by Selective-Backprop over alternative strategies such as Stale-SB and Kath18 was quantified as the reduction in wall-clock time needed to reach target final error rates. Notably, SB delivers comparable accuracy at a substantially lower computational cost, reaching up to a 15.9x speedup on the SVHN dataset [Table \ref{table:speedup-abs}].
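The time-to-target-error metric behind these comparisons can be illustrated with two hypothetical helpers (the function names and synthetic inputs are ours; the paper reports measured numbers):

```python
def time_to_error(epoch_times, error_per_epoch, target):
    """Wall-clock time until the test error first reaches `target`.

    Returns None if the target error is never reached.
    """
    elapsed = 0.0
    for t, err in zip(epoch_times, error_per_epoch):
        elapsed += t
        if err <= target:
            return elapsed
    return None

def speedup(baseline_times, baseline_errors, sb_times, sb_errors, target):
    """Ratio of baseline to SB wall-clock time to reach the target error."""
    tb = time_to_error(baseline_times, baseline_errors, target)
    ts = time_to_error(sb_times, sb_errors, target)
    if tb is None or ts is None:
        return None
    return tb / ts
```

A method that both shortens each epoch and reaches the target error in fewer epochs compounds the two effects, which is how end-to-end speedups can exceed the per-epoch bound set by the forward/backward cost asymmetry.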
Implications and Future Directions
This research provides compelling evidence that Selective-Backprop can enhance training efficiency without compromising model accuracy. The approach demonstrates significant promise in contexts where computational resources are limited, or in applications requiring rapid model deployment and iteration cycles.
However, the findings also prompt further inquiry into the balance between training efficiency and model generalization, particularly when applying Selective-Backprop to more complex machine learning tasks beyond image classification. Future work might evaluate the method on more diverse datasets and more sophisticated architectures, or integrate it with other sampling techniques to further improve sample importance estimation.
Overall, Selective-Backprop offers a compelling method for reducing the computational burden of neural network training, with broader implications for scalable and efficient AI deployment. Its demonstrated ability to adapt sample selection to example difficulty on the fly could play a critical role in the future landscape of machine learning optimization.