Efficient Neural Architecture Search via Parameter Sharing (1802.03268v2)

Published 9 Feb 2018 in cs.LG, cs.CL, cs.CV, cs.NE, and stat.ML

Abstract: We propose Efficient Neural Architecture Search (ENAS), a fast and inexpensive approach for automatic model design. In ENAS, a controller learns to discover neural network architectures by searching for an optimal subgraph within a large computational graph. The controller is trained with policy gradient to select a subgraph that maximizes the expected reward on the validation set. Meanwhile the model corresponding to the selected subgraph is trained to minimize a canonical cross entropy loss. Thanks to parameter sharing between child models, ENAS is fast: it delivers strong empirical performances using much fewer GPU-hours than all existing automatic model design approaches, and notably, 1000x less expensive than standard Neural Architecture Search. On the Penn Treebank dataset, ENAS discovers a novel architecture that achieves a test perplexity of 55.8, establishing a new state-of-the-art among all methods without post-training processing. On the CIFAR-10 dataset, ENAS designs novel architectures that achieve a test error of 2.89%, which is on par with NASNet (Zoph et al., 2018), whose test error is 2.65%.


Summary

The paper presents ENAS, which speeds up neural architecture search by sharing parameters across child models, making it about 1000x less expensive than standard Neural Architecture Search.
ENAS uses an LSTM controller to sample subgraphs from a larger DAG, and applies this to the design of both recurrent and convolutional networks.
On Penn Treebank, ENAS reaches a test perplexity of 55.8; on CIFAR-10, it reaches a test error of 2.89%, on par with NASNet's 2.65%.

The paper presents Efficient Neural Architecture Search (ENAS), a method for speeding up Neural Architecture Search (NAS), which has historically been computationally expensive. The primary advance is parameter sharing across child models during the architecture search, in contrast with traditional approaches in which each candidate model is trained independently.

Central to ENAS is the observation that candidate neural network architectures can be viewed as subgraphs of a larger Directed Acyclic Graph (DAG). This graph represents every configuration in the designated search space. A controller, specifically an LSTM, samples subgraphs from it and is trained with policy gradient to maximize the expected reward on the validation set, while the model corresponding to the sampled subgraph is trained to minimize a standard cross-entropy loss.
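The policy-gradient training of the controller can be illustrated with a minimal sketch. It is not the paper's implementation: the LSTM controller is replaced by a single softmax over a handful of candidate subgraphs, the reward table `TOY_REWARDS` is a hypothetical stand-in for validation accuracy, and the moving-average baseline and all function names are assumptions made for the example.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train_controller(reward_fn, num_choices, steps=2000, lr=0.1, seed=0):
    """REINFORCE over one categorical decision (a stand-in for the LSTM controller)."""
    rng = random.Random(seed)
    logits = [0.0] * num_choices
    baseline = 0.0  # moving average of past rewards, used to reduce variance
    for _ in range(steps):
        probs = softmax(logits)
        choice = rng.choices(range(num_choices), weights=probs)[0]
        reward = reward_fn(choice)  # in ENAS this would come from the validation set
        advantage = reward - baseline
        baseline = 0.9 * baseline + 0.1 * reward
        # Gradient of log pi(choice) w.r.t. the logits is one_hot(choice) - probs.
        for k in range(num_choices):
            grad_log_prob = (1.0 if k == choice else 0.0) - probs[k]
            logits[k] += lr * advantage * grad_log_prob
    return softmax(logits)


# Hypothetical validation rewards for three candidate subgraphs.
TOY_REWARDS = [0.2, 0.9, 0.5]
final_probs = train_controller(lambda k: TOY_REWARDS[k], num_choices=3)
```

After training, `final_probs` should concentrate on the candidate with the highest reward, which mirrors how the controller shifts probability mass toward subgraphs that perform well on validation data.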

Methodology

In ENAS, the search space is represented as a DAG. Each node in the DAG signifies a local computation, while the edges represent the flow of information. The LSTM controller navigates this DAG, making decisions on operations and connections, ultimately crafting network architectures.

Recurrent Cells

To design recurrent cells, ENAS uses a DAG with $N$ nodes. For each node, the LSTM controller samples two decisions:

Which previous node to use as input.
Which activation function to apply.

This dynamic sampling enables the design of complex, flexible RNN architectures beyond the constraints of pre-fixed structures such as binary trees. The benefit of this flexibility is evidenced in the empirical performance improvements seen in the results.
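The two per-node decisions and the role of shared parameters can be sketched as follows. This is a toy under stated assumptions rather than the paper's model: values are scalars instead of vectors, each possible edge carries one shared scalar weight, the cell output is taken as the average of nodes that feed no other node, and the names `make_shared_weights`, `sample_cell`, and `run_cell` are hypothetical.

```python
import math
import random

# Illustrative activation set for the example.
ACTIVATIONS = {
    "tanh": math.tanh,
    "relu": lambda v: max(0.0, v),
    "sigmoid": lambda v: 1.0 / (1.0 + math.exp(-v)),
    "identity": lambda v: v,
}


def make_shared_weights(num_nodes, rng):
    """One scalar weight per possible edge src -> dst (src < dst); node 0 is the input."""
    return {(src, dst): rng.uniform(-1.0, 1.0)
            for dst in range(1, num_nodes + 1)
            for src in range(dst)}


def sample_cell(num_nodes, rng):
    """For each node 1..N, sample a previous node index and an activation name."""
    names = sorted(ACTIVATIONS)
    return [(rng.randrange(i), rng.choice(names)) for i in range(1, num_nodes + 1)]


def run_cell(cell, shared, x):
    """Evaluate a sampled cell, reading weights from the shared pool."""
    values = [x]           # values[0] is the cell input
    used_as_input = set()  # nodes consumed by some later node
    for i, (prev, act) in enumerate(cell, start=1):
        values.append(ACTIVATIONS[act](shared[(prev, i)] * values[prev]))
        used_as_input.add(prev)
    # Assumption: average the nodes that no other node consumes.
    loose_ends = [values[i] for i in range(1, len(values)) if i not in used_as_input]
    return sum(loose_ends) / len(loose_ends)


rng = random.Random(0)
shared = make_shared_weights(4, rng)
cell_a = sample_cell(4, rng)
cell_b = sample_cell(4, rng)
out_a = run_cell(cell_a, shared, 0.5)
out_b = run_cell(cell_b, shared, 0.5)
```

Both `cell_a` and `cell_b` read from the same `shared` dictionary, so any edge they have in common uses the same weight. That reuse is the point of parameter sharing: a newly sampled cell does not start training from scratch.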

Convolutional Networks

For convolutional architectures, ENAS employs two primary search spaces:

Macro - This entails designing the entire network.
Micro - This involves designing smaller convolutional and reduction cells, which are then composed to form the complete architecture.
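The difference between the two search spaces can be made concrete with a small sketch. Everything here is an assumption made for illustration: the operation names in `OPS`, the encoding of a macro network as one `(operation, skip connections)` pair per layer, the 0.5 skip probability, and the micro layout in which a reduction cell sits between blocks of convolutional cells.

```python
import random

# Hypothetical operation names; the real candidate set is defined by the search space.
OPS = ["conv3x3", "conv5x5", "max_pool", "avg_pool"]


def sample_macro_network(num_layers, rng):
    """Macro: decide every layer's operation and which earlier layers it skips from."""
    network = []
    for layer in range(num_layers):
        op = rng.choice(OPS)
        skips = [prev for prev in range(layer) if rng.random() < 0.5]
        network.append((op, skips))
    return network


def compose_micro_network(num_blocks, conv_cells_per_block):
    """Micro: repeat one searched conv cell, with a reduction cell between blocks."""
    layout = []
    for block in range(num_blocks):
        layout.extend(["conv_cell"] * conv_cells_per_block)
        if block < num_blocks - 1:
            layout.append("reduction_cell")
    return layout


macro = sample_macro_network(num_layers=6, rng=random.Random(0))
micro = compose_micro_network(num_blocks=3, conv_cells_per_block=2)
```

In the macro case the controller's decisions grow with network depth, while in the micro case only the two cell designs are searched and the overall layout is fixed by hand.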

Experimental Results

ENAS demonstrated strong empirical performance on two distinct tasks, language modeling and image classification, while using far fewer computational resources than earlier automatic model design approaches.

Penn Treebank

ENAS was applied to design a recurrent cell for language modeling on the Penn Treebank dataset. The resulting architecture achieved a test perplexity of 55.8, a new state of the art among methods that do not use post-training processing, and did so at about 1000x lower cost than standard NAS.

CIFAR-10

For image classification on CIFAR-10, ENAS was tested in both its macro and micro search spaces. In the macro space, ENAS achieved a test error of 4.23%, and 3.87% when the number of filters was increased. In the micro space, ENAS achieved a test error of 3.54%, reduced further to 2.89% with the CutOut data augmentation method. The 2.89% result is on par with NASNet (Zoph et al., 2018), whose test error is 2.65%, and comparable to state-of-the-art manually designed architectures, at a fraction of the computational cost.

Implications and Future Directions

ENAS has profound implications for the field of automated model design. By drastically reducing the computational resources required for NAS, ENAS democratizes access to architecture search, making it feasible for broader use beyond large-scale industrial applications. The demonstrated efficiency and effectiveness of parameter sharing across child models also open up further avenues for optimizing search processes in other domains and tasks.

From a theoretical standpoint, ENAS challenges the assumption that each candidate model in NAS must be trained independently to convergence. This shift could inspire new methods that make further use of shared computations and representations.

In conclusion, ENAS is a significant contribution to the efficient design of neural architectures, showing that parameter sharing makes architecture search practical at a fraction of the usual computational cost. Future work may explore more sophisticated controller mechanisms or integration with other meta-learning techniques, which could further improve the performance and efficiency of NAS.
