Progressive Differentiable Architecture Search: Bridging the Depth Gap between Search and Evaluation (1904.12760v1)

Published 29 Apr 2019 in cs.CV and cs.LG

Abstract: Recently, differentiable search methods have made major progress in reducing the computational costs of neural architecture search. However, these approaches often report lower accuracy in evaluating the searched architecture or transferring it to another dataset. This is arguably due to the large gap between the architecture depths in search and evaluation scenarios. In this paper, we present an efficient algorithm which allows the depth of searched architectures to grow gradually during the training procedure. This brings two issues, namely, heavier computational overheads and weaker search stability, which we solve using search space approximation and regularization, respectively. With a significantly reduced search time (~7 hours on a single GPU), our approach achieves state-of-the-art performance on both the proxy dataset (CIFAR10 or CIFAR100) and the target dataset (ImageNet). Code is available at https://github.com/chenxin061/pdarts.

Summary

The paper introduces Progressive DARTS (P-DARTS), which incrementally deepens network architectures during the search phase to close the evaluation gap.
It uses search space approximation to limit the computational load and operation-level dropout to mitigate the bias towards parameter-free operations such as skip connections.
The search takes about 0.3 GPU-days (~7 hours on a single GPU) on the proxy dataset, and the resulting architecture achieves a top-1 error of 24.4% on ImageNet in the mobile setting, at a small fraction of the search cost of earlier NAS methods.

Progressive Differentiable Architecture Search: Bridging the Depth Gap between Search and Evaluation

The paper "Progressive Differentiable Architecture Search: Bridging the Depth Gap between Search and Evaluation" introduces an approach to neural architecture search (NAS) that addresses a key weakness of differentiable search methods. Approaches such as DARTS often lose accuracy when the searched architecture is evaluated or transferred to another dataset, which the authors attribute, arguably, to the gap between the network depth used during search and the depth used during evaluation. In DARTS, for example, the search network stacks 8 cells while the evaluation network stacks 20.

Key Contributions

The authors propose a method termed Progressive DARTS (P-DARTS) that incrementally increases the depth of the architectures during the search process, aiming to close this depth gap. This strategy involves several stages where network depth is progressively increased, resulting in architectures better suited for deep network evaluations.
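The staged growth can be pictured as a small schedule in which depth increases while the number of candidate operations per edge shrinks. The sketch below is only an illustration: the stage values (5, 11, 17 cells with 8, 5, 3 candidate operations) and the cost proxy (cells times candidate operations) are assumptions made for this example, not figures stated in this summary, and the names `Stage`, `SCHEDULE`, and `relative_cost` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    num_cells: int      # depth of the search network in this stage
    ops_per_edge: int   # candidate operations kept on each edge


# Illustrative schedule (assumed values, see the note above).
SCHEDULE = [Stage(5, 8), Stage(11, 5), Stage(17, 3)]


def relative_cost(stage: Stage, base: Stage) -> float:
    """Crude cost proxy: (cells x candidate ops), relative to `base`."""
    return (stage.num_cells * stage.ops_per_edge) / (
        base.num_cells * base.ops_per_edge
    )


if __name__ == "__main__":
    base = SCHEDULE[0]
    for index, stage in enumerate(SCHEDULE, start=1):
        print(
            f"stage {index}: {stage.num_cells} cells, "
            f"{stage.ops_per_edge} ops/edge, "
            f"relative cost {relative_cost(stage, base):.3f}"
        )
    # For comparison: the deepest network with the full operation set.
    unpruned = Stage(SCHEDULE[-1].num_cells, base.ops_per_edge)
    print(f"unpruned final stage: relative cost {relative_cost(unpruned, base):.3f}")
```

Under this proxy, shrinking the candidate set keeps the cost of the deepest stage close to that of the first one, whereas the same depth with the full operation set would cost several times more.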

To counter the increased computational demands of deeper architectures and maintain search stability, the paper introduces two main techniques:

Search Space Approximation: This approach reduces the number of candidate operations based on their performance in previous stages, thereby managing the computational overhead effectively.
Search Space Regularization: Operation-level dropout mitigates the bias towards parameter-free operations such as skip connections. These operations speed up gradient descent, so the search tends to favor them, yet they offer limited learning capacity. The regularization keeps the exploration of the operation space balanced.
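The two techniques above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: `prune_edge` assumes that "performance in previous stages" is measured by the softmax of per-edge architecture parameters, as in DARTS-style methods, and `skip_with_dropout` applies standard inverted dropout to the output of a skip connection. The function names, the operation names, and the choice of element-wise dropout are all hypothetical choices made for this example.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def prune_edge(alpha, keep):
    """Search space approximation for one edge.

    alpha: dict mapping operation name -> architecture parameter (logit).
    Returns the `keep` operation names with the largest softmax weight,
    which become the candidate set for the next, deeper stage.
    """
    names = list(alpha)
    weights = softmax([alpha[name] for name in names])
    ranked = sorted(zip(names, weights), key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in ranked[:keep]]


def skip_with_dropout(x, drop_rate, training=True, rng=random):
    """Skip connection followed by dropout (assumed element-wise, inverted).

    During search, part of the identity path is zeroed so that the
    parameter-free shortcut is less dominant; at evaluation time the
    input passes through unchanged.
    """
    if not 0.0 <= drop_rate < 1.0:
        raise ValueError("drop_rate must be in [0, 1)")
    if not training or drop_rate == 0.0:
        return list(x)
    scale = 1.0 / (1.0 - drop_rate)  # keeps the expected output equal to x
    return [0.0 if rng.random() < drop_rate else v * scale for v in x]


if __name__ == "__main__":
    # Hypothetical architecture parameters for a single edge.
    alpha = {
        "skip_connect": 1.2,
        "sep_conv_3x3": 0.9,
        "max_pool_3x3": -0.3,
        "dil_conv_3x3": 0.1,
    }
    print(prune_edge(alpha, keep=2))
    print(skip_with_dropout([1.0, 2.0, 3.0, 4.0], drop_rate=0.5))
```

Ranking by softmax weight means the pruning decision depends only on the relative order of the architecture parameters on each edge, so any monotonic score would select the same operations.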

Performance and Results

The authors demonstrate the effectiveness of P-DARTS by achieving state-of-the-art results on both CIFAR10 and ImageNet datasets. The proposed method achieves a top-1 error of 24.4% on ImageNet within the mobile setting, showing significant improvements over standard DARTS and other contemporary approaches.

Furthermore, the search method is remarkably efficient. With a search time of about 0.3 GPU-days, it significantly outpaces prior methods, such as AmoebaNet, which required thousands of GPU-days. This acceleration is particularly noteworthy given the competitive accuracy results.
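The 0.3 GPU-day figure follows directly from the "~7 hours on a single GPU" quoted in the abstract. The short calculation below makes the conversion explicit. The baseline of 1000 GPU-days is an assumed lower bound standing in for "thousands of GPU-days", not a measured value, so the resulting ratio is only an order-of-magnitude indication.

```python
HOURS_PER_DAY = 24

search_hours = 7   # "~7 hours on a single GPU", from the abstract
num_gpus = 1

# GPU-days = wall-clock hours x number of GPUs / 24
search_gpu_days = search_hours * num_gpus / HOURS_PER_DAY

# Assumed lower bound for "thousands of GPU-days" (illustrative only).
baseline_gpu_days = 1000
speedup_lower_bound = baseline_gpu_days / search_gpu_days

if __name__ == "__main__":
    print(f"search cost: {search_gpu_days:.2f} GPU-days")
    print(f"speed-up over the assumed baseline: at least {speedup_lower_bound:.0f}x")
```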

Implications and Future Directions

The improvement in search efficiency and accuracy underlines the potential of P-DARTS for advancing automatic model design in deep learning. Practically, because the search runs on a small proxy dataset in a few GPU-hours, the method is cheap to apply, and the reported results indicate that the architecture it finds transfers to ImageNet. Theoretically, it prompts further investigation into progressive search strategies and their application to tasks beyond image classification.

Looking ahead, future research could explore integrating more sophisticated regularization schemes or leveraging larger operation spaces. Moreover, adapting this methodology to handle more complex datasets and tasks could further establish its relevance in generalizing NAS applications.

By focusing on the critical "depth gap," this paper provides both a practical and theoretical contribution to the domain of NAS, representing a meaningful step forward in automated architecture optimization.