Single-Path NAS: Designing Hardware-Efficient ConvNets in less than 4 Hours (1904.02877v1)

Published 5 Apr 2019 in cs.LG, cs.CV, and stat.ML

Abstract: Can we automatically design a Convolutional Network (ConvNet) with the highest image classification accuracy under the runtime constraint of a mobile device? Neural architecture search (NAS) has revolutionized the design of hardware-efficient ConvNets by automating this process. However, the NAS problem remains challenging due to the combinatorially large design space, causing a significant searching time (at least 200 GPU-hours). To alleviate this complexity, we propose Single-Path NAS, a novel differentiable NAS method for designing hardware-efficient ConvNets in less than 4 hours. Our contributions are as follows: 1. Single-path search space: Compared to previous differentiable NAS methods, Single-Path NAS uses one single-path over-parameterized ConvNet to encode all architectural decisions with shared convolutional kernel parameters, hence drastically decreasing the number of trainable parameters and the search cost down to few epochs. 2. Hardware-efficient ImageNet classification: Single-Path NAS achieves 74.96% top-1 accuracy on ImageNet with 79ms latency on a Pixel 1 phone, which is state-of-the-art accuracy compared to NAS methods with similar constraints (<80ms). 3. NAS efficiency: Single-Path NAS search cost is only 8 epochs (30 TPU-hours), which is up to 5,000x faster compared to prior work. 4. Reproducibility: Unlike all recent mobile-efficient NAS methods which only release pretrained models, we open-source our entire codebase at: https://github.com/dstamoulis/single-path-nas.

Citations (276)

Summary

  • The paper introduces Single-Path NAS, which reduces neural architecture search time to under 4 hours using a unified single-path over-parameterized ConvNet framework.
  • The method achieved 74.96% top-1 accuracy on ImageNet with a 79ms inference latency on a Pixel 1 device, setting a state-of-the-art efficiency benchmark.
  • By leveraging a shared kernel parameter approach, the technique simplifies the NAS search space and offers scalable design solutions for resource-constrained environments.

Single-Path NAS: Advancements in Efficient ConvNet Design

The paper "Single-Path NAS: Designing Hardware-Efficient ConvNets in less than 4 Hours" presents a significant methodological advancement in the domain of Neural Architecture Search (NAS). Authored by Dimitrios Stamoulis and collaborators, this work addresses the challenge of optimizing convolutional networks (ConvNets) for both computational efficiency and accuracy under stringent hardware limitations, such as those imposed by mobile devices.

Core Contributions

The paper introduces the Single-Path NAS method, a novel differentiable approach that significantly reduces the computational cost of NAS for hardware-efficient ConvNets. Traditional NAS approaches, relying on multi-path methods, suffer from high computational demands due to the expansive search spaces they create, often requiring upwards of 200 GPU-hours. Single-Path NAS mitigates these inefficiencies by proposing a unified single-path over-parameterized ConvNet framework that models all potential architectural variations using shared convolutional kernel parameters. This adjustment drastically reduces both the trainable parameters and the time required for architecture search, achieving results in under 4 hours.
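
To make the hardware constraint concrete, the search optimizes the task loss jointly with a runtime penalty, so gradients can trade accuracy against latency. Below is a minimal sketch of such a latency-aware objective, assuming a differentiable latency estimate is available; the function names and the trade-off weight are illustrative, not taken from the released codebase.

```python
import torch
import torch.nn.functional as F

# Minimal sketch of a latency-aware search objective in the spirit of the
# paper: task loss plus a runtime penalty. `predicted_latency_ms` stands in
# for a differentiable latency model; LAMBDA is an assumed trade-off weight,
# not a value from the released codebase.

LAMBDA = 0.1

def nas_loss(logits: torch.Tensor,
             labels: torch.Tensor,
             predicted_latency_ms: torch.Tensor) -> torch.Tensor:
    ce = F.cross_entropy(logits, labels)
    # The latency estimate must be differentiable w.r.t. the architecture
    # indicators so that gradient-based search can trade accuracy for speed.
    return ce + LAMBDA * predicted_latency_ms
```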

Key results include 74.96% top-1 accuracy on the ImageNet dataset at an inference latency of 79 ms on a Pixel 1 device, state-of-the-art among NAS methods under comparable latency constraints (<80 ms). The search itself completes in just 8 epochs, about 30 TPU-hours, which the authors report is up to 5,000x faster than prior work.

Methodology

The pivotal innovation of this work lies in the reimagined search space: the NAS task is recast as selecting which subset of each layer's convolutional kernel weights to use. This is achieved through a parameter-sharing mechanism within an over-parameterized "superkernel" that is both memory- and computation-efficient compared to traditional approaches, which maintain separate paths for each candidate operation.
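
The following PyTorch sketch illustrates the superkernel idea under stated assumptions: a single shared 5x5 weight tensor whose centered 3x3 slice doubles as the 3x3 candidate kernel, so both kernel-size options share parameters instead of occupying separate paths. The class name and interface are hypothetical, not the released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SuperKernelConv(nn.Module):
    """Sketch of a single-path 'superkernel' (illustrative only): one shared
    5x5 weight tensor whose centered 3x3 slice doubles as the 3x3 candidate
    kernel, so both kernel-size options share parameters instead of living
    on separate paths."""

    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.weight = nn.Parameter(0.01 * torch.randn(out_ch, in_ch, 5, 5))
        # Mask selecting only the centered 3x3 region of the 5x5 kernel.
        mask = torch.zeros(1, 1, 5, 5)
        mask[..., 1:4, 1:4] = 1.0
        self.register_buffer("mask_3x3", mask)

    def forward(self, x: torch.Tensor, use_5x5: torch.Tensor) -> torch.Tensor:
        # use_5x5 is a (relaxed) indicator in [0, 1]: the effective kernel is
        # the inner 3x3 part plus, if selected, the outer 5x5-minus-3x3 ring.
        inner = self.weight * self.mask_3x3
        outer = self.weight * (1.0 - self.mask_3x3)
        return F.conv2d(x, inner + use_5x5 * outer, padding=2)
```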

The authors introduce a differentiation-based approach to navigate this search space, leveraging differentiable indicator functions that decide the active set of kernel weights during training. This transforms the architecture search into a single weight-optimization problem, eliminating the need for separate architectural weights and thereby simplifying the optimization landscape and reducing the computational burden.
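
A rough sketch of such a differentiable indicator follows, assuming the norm-threshold relaxation described in the paper: a hard decision 1(||w||^2 > t) has no useful gradient, so a sigmoid surrogate is used during search. The function and variable names here are illustrative.

```python
import torch

def relaxed_indicator(subset_weights: torch.Tensor,
                      threshold: torch.Tensor) -> torch.Tensor:
    """Sketch of the differentiable 'use this weight subset?' decision.
    A hard indicator 1(||w||^2 > t) has no useful gradient, so a sigmoid
    surrogate is used during the search; both the kernel weights and the
    trainable threshold t receive gradients. Names are illustrative."""
    norm_sq = (subset_weights ** 2).sum()
    return torch.sigmoid(norm_sq - threshold)

# Usage sketch: decide whether to grow a layer's kernel from 3x3 to 5x5
# based on the norm of the outer 5x5-minus-3x3 ring of shared weights.
t_5x5 = torch.nn.Parameter(torch.tensor(1.0))              # trainable threshold
outer_ring = torch.randn(32, 16, 5, 5, requires_grad=True)
use_5x5 = relaxed_indicator(outer_ring, t_5x5)             # value in (0, 1)
```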

Implications and Future Directions

The advancements demonstrated by Single-Path NAS have significant implications for both the theoretical understanding and practical deployment of NAS. By moving towards a more efficient encoding of NAS design spaces, this work opens avenues for exploring more complex and constrained NAS problems previously deemed computationally prohibitive. For instance, applying this single-path methodology could be extended to optimize architectures across a broader set of constraints such as energy efficiency and real-time processing capabilities on edge devices.

In terms of future developments, the single-path framework could be integrated into other NAS paradigms, such as reinforcement learning and evolutionary strategies, bringing similar reductions in search cost to those approaches while preserving accuracy. The methodology's adaptability to various hardware platforms is another promising frontier, extending its applicability beyond mobile phones to other resource-constrained environments such as IoT devices and embedded systems.

In conclusion, this work sets a compelling precedent for future NAS research by demonstrating an effective balance between network performance and search efficiency. The open-sourcing of their implementation lays a foundation for further exploration and validation by the AI research community. This contribution is poised to fuel further innovation in the design of hardware-efficient neural networks, leveraging the informed selection of network architectures that align with specific hardware constraints.