Multi-objective Differentiable Neural Architecture Search (2402.18213v3)

Published 28 Feb 2024 in cs.LG, cs.CV, and stat.ML

Abstract: Pareto front profiling in multi-objective optimization (MOO), i.e., finding a diverse set of Pareto optimal solutions, is challenging, especially with expensive objectives that require training a neural network. Typically, in MOO for neural architecture search (NAS), we aim to balance performance and hardware metrics across devices. Prior NAS approaches simplify this task by incorporating hardware constraints into the objective function, but profiling the Pareto front necessitates a computationally expensive search for each constraint. In this work, we propose a novel NAS algorithm that encodes user preferences to trade-off performance and hardware metrics, yielding representative and diverse architectures across multiple devices in just a single search run. To this end, we parameterize the joint architectural distribution across devices and multiple objectives via a hypernetwork that can be conditioned on hardware features and preference vectors, enabling zero-shot transferability to new devices. Extensive experiments involving up to 19 hardware devices and 3 different objectives demonstrate the effectiveness and scalability of our method. Finally, we show that, without any additional costs, our method outperforms existing MOO NAS methods across a broad range of qualitatively different search spaces and datasets, including MobileNetV3 on ImageNet-1k, an encoder-decoder transformer space for machine translation and a decoder-only space for language modelling.


Summary

  • The paper introduces a unified hardware-aware and gradient-based NAS method that generates diverse Pareto-optimal architectures for multiple devices in one search.
  • It employs a MetaHypernetwork conditioned on user preferences and device embeddings to convert continuous architecture distributions into differentiable discrete designs.
  • Empirical results demonstrate superior Pareto front quality and search efficiency, outperforming baseline methods across various tasks and objectives.

Multi-objective Differentiable Neural Architecture Search: MODNAS

Introduction and Motivation

Multi-objective optimization (MOO) for neural architecture search (NAS) is essential for navigating tradeoffs between predictive performance and hardware efficiency metrics (e.g., latency, energy consumption, memory footprint) as modern neural architectures are increasingly deployed across heterogeneous computing devices. However, mainstream NAS approaches commonly address this challenge by integrating hardware constraints directly into the search objective, yielding only a single solution per constraint and per device—requiring repeated, costly searches for profiling the entire Pareto front across devices or tradeoff constraints.

"Multi-objective Differentiable Neural Architecture Search" (2402.18213) proposes a unified, hardware-aware, and gradient-based NAS formulation that produces diverse, distributed Pareto-optimal architecture sets for varying user preference scalarizations and different devices in a single search, thereby achieving substantial search efficiency improvements and state-of-the-art Pareto set quality.

Methodological Overview: MODNAS Formulation

The proposed method, MODNAS, frames hardware-aware multi-objective NAS as a multi-task, multi-objective bi-level optimization problem. Each device is treated as a separate optimization task with $M$ potentially conflicting objectives (e.g., accuracy, latency, energy). The Pareto front sampling is controlled by a user-specified preference vector (scalarization) $r$ in the $M$-dimensional simplex, enabling convex combinations of objectives and flexible navigation of tradeoffs.
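As a concrete illustration (a hedged reconstruction of the notation, not the paper's exact formulation), a linear scalarization under a preference vector on the simplex combines the $M$ objectives as

$$
f_r(\alpha) \;=\; \sum_{m=1}^{M} r_m\, f_m(\alpha),
\qquad
r \in \Delta^{M-1} := \Big\{ r \in \mathbb{R}_{\ge 0}^{M} : \textstyle\sum_{m=1}^{M} r_m = 1 \Big\},
$$

so that sweeping $r$ over the simplex traces out different tradeoffs between, e.g., accuracy, latency, and energy on a given device.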

A key innovation is the use of a MetaHypernetwork $H_\Phi$ conditioned on both a preference vector $r$ and learnable device embeddings $d_t$. This hypernetwork outputs a continuous, unnormalized architectural distribution $\tilde{\alpha}$, which the Architect module transforms into differentiable discrete architectures via the ReinMax estimator, eliminating the need for expensive search restarts for every device or constraint (Figure 1).

Figure 1: MODNAS system diagram, featuring the MetaHypernetwork $H_\Phi(r, d_t)$ to produce architecture distributions conditioned on user preferences and device characteristics, facilitating efficient multi-objective optimization across devices.
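The sketch below illustrates this conditioning mechanism as a minimal PyTorch module. The module name, layer sizes, and the `num_edges`/`num_ops` layout are illustrative assumptions, and the straight-through Gumbel-softmax is a stand-in for the ReinMax estimator used by the paper's Architect; none of this reproduces the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MetaHypernetwork(nn.Module):
    """Hedged sketch: maps (preference vector r, device embedding d_t)
    to unnormalized architecture logits alpha_tilde per edge/operation."""
    def __init__(self, num_objectives, device_dim, num_edges, num_ops, hidden=128):
        super().__init__()
        self.num_edges, self.num_ops = num_edges, num_ops
        self.net = nn.Sequential(
            nn.Linear(num_objectives + device_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, num_edges * num_ops),
        )

    def forward(self, r, d_t):
        # r: (M,) preference vector on the simplex; d_t: (device_dim,) embedding
        x = torch.cat([r, d_t], dim=-1)
        return self.net(x).view(self.num_edges, self.num_ops)  # alpha_tilde

def discretize(alpha_tilde, tau=1.0):
    """Differentiable 'Architect' step. Stand-in: straight-through
    Gumbel-softmax (the paper uses the ReinMax estimator instead)."""
    return F.gumbel_softmax(alpha_tilde, tau=tau, hard=True, dim=-1)

# Usage sketch: sample a preference and emit a one-hot architecture per edge.
hypernet = MetaHypernetwork(num_objectives=2, device_dim=16, num_edges=6, num_ops=5)
r = torch.distributions.Dirichlet(torch.ones(2)).sample()
d_t = torch.randn(16)
arch = discretize(hypernet(r, d_t))  # (6, 5) one-hot rows; gradients flow to hypernet
```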

The Supernetwork ties this mechanism together via parameter sharing, supporting memory efficiency and scalable architecture evaluation. For non-differentiable or costly hardware metrics, a pre-trained MetaPredictor regresses device-specific objective values from architecture and device embeddings, enabling gradient-based updates for all objectives.
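A hedged sketch of the hardware-metric surrogate: an MLP that regresses a device-specific metric (e.g., latency) from a flattened architecture encoding and a device embedding, so the prediction can be backpropagated into the hypernetwork. Names, dimensions, and the placeholder inputs are assumptions for illustration, not the paper's implementation.

```python
import torch
import torch.nn as nn

class MetaPredictor(nn.Module):
    """Hedged sketch: pre-trained surrogate mapping (architecture encoding,
    device embedding) to a scalar hardware metric such as latency."""
    def __init__(self, arch_dim, device_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(arch_dim + device_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, arch_encoding, d_t):
        # arch_encoding: flattened (relaxed or one-hot) architecture
        return self.net(torch.cat([arch_encoding, d_t], dim=-1)).squeeze(-1)

# Usage sketch with placeholder inputs: the predicted latency stays
# differentiable w.r.t. the architecture, so it can serve as one objective.
arch_encoding = torch.rand(6 * 5)   # placeholder relaxed architecture
d_t = torch.randn(16)               # placeholder device embedding
predictor = MetaPredictor(arch_dim=6 * 5, device_dim=16)
latency = predictor(arch_encoding, d_t)
```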

Optimization employs Multiple Gradient Descent (MGD), which uses Frank-Wolfe-driven convex combinations of per-task gradients to find updates for $H_\Phi$ that yield Pareto improvements on all devices and objectives. The algorithm performs bi-level, stochastic updates: the outer level updates $H_\Phi$ (the architecture distribution), the inner level updates the Supernetwork weights, and both exploit scalarizations sampled from a Dirichlet prior over the simplex.
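The aggregation step can be illustrated with a minimal min-norm solver: given one flattened hypernetwork gradient per device, Frank-Wolfe iterations find convex weights whose combination serves as a common descent direction. This is a generic MGDA-style sketch consistent with the description above, not the paper's exact update rule; the function name and shapes are assumptions.

```python
import numpy as np

def min_norm_weights(grads, iters=50):
    """Frank-Wolfe search for convex weights w minimizing ||sum_t w_t g_t||^2.
    grads: array of shape (T, D), one flattened gradient per device/task."""
    T = grads.shape[0]
    w = np.full(T, 1.0 / T)                 # start from uniform weights
    gram = grads @ grads.T                  # (T, T) Gram matrix of gradients
    for _ in range(iters):
        t_star = int(np.argmin(gram @ w))   # vertex minimizing the linearization
        u = grads.T @ w                     # current combined direction
        v = grads[t_star]
        duv = u - v
        denom = float(duv @ duv)
        if denom < 1e-12:
            break
        gamma = np.clip(float(duv @ u) / denom, 0.0, 1.0)  # exact line search
        e = np.zeros(T)
        e[t_star] = 1.0
        w = (1.0 - gamma) * w + gamma * e
    return w

# Usage sketch: combine per-device gradients into a single Pareto-improving update.
per_device_grads = np.random.randn(3, 1000)   # T=3 devices, D=1000 parameters
w = min_norm_weights(per_device_grads)
update_direction = w @ per_device_grads       # convex combination, shape (1000,)
```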

Experimental Evaluation

Large-scale Multi-device and Multi-objective Benchmarks

MODNAS is evaluated on NAS-Bench-201 (19 devices, up to 3 objectives), the MobileNetV3/OFA space (12 devices, ImageNet-1k), and the hardware-aware Transformer (HAT) search space (WMT'14 En-De, 3 devices). Each experiment assesses Pareto front quality using hypervolume (HV), generational distance (GD), inverted generational distance (IGD), and variants thereof, profiling Pareto fronts on held-out test devices (zero-shot) by passing preference vectors and device embeddings through the trained MetaHypernetwork (Figure 2).

Figure 2: MODNAS achieves broad Pareto front coverage (radar plot hypervolume) across 19 devices—significantly outperforming random and multi-run constraint-based baselines on NAS-Bench-201.
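For reference, hypervolume in the bi-objective case reduces to summing the rectangles dominated by the front relative to a reference point. The routine below is a minimal sketch for two minimization objectives (e.g., error and latency); the paper's exact reference points and normalization are not specified here.

```python
def hypervolume_2d(points, ref):
    """Hypervolume of bi-objective points (minimization) w.r.t. a reference
    point ref = (r1, r2); larger is better. Minimal sketch only."""
    # Keep only points that strictly dominate the reference point.
    pts = [(a, b) for a, b in points if a < ref[0] and b < ref[1]]
    # Sort by the first objective and accumulate the non-dominated 'staircase'.
    pts.sort()
    hv, best_b = 0.0, ref[1]
    for a, b in pts:
        if b < best_b:                  # non-dominated so far
            hv += (ref[0] - a) * (best_b - b)
            best_b = b
    return hv

# Usage sketch: front of (error, latency) pairs against reference (1.0, 1.0).
front = [(0.20, 0.70), (0.25, 0.40), (0.35, 0.20)]
print(hypervolume_2d(front, ref=(1.0, 1.0)))  # 0.595
```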

The results indicate that MODNAS outperforms all tested baselines (e.g., random search, random hypernetwork, MetaD2A+HELP) in hypervolume, especially on unseen/test devices, and achieves better front diversity versus prior approaches focused primarily on accuracy. Notably, MODNAS yields these results via a single search, irrespective of the number of objectives ($M$) or devices ($T$).

Gradient Aggregation Schemes and Robustness

The paper systematically compares gradient aggregation schemes for updating the MetaHypernetwork: mean gradient, sequential updates, MC-sampled updates, and MGD. The MGD approach exhibits superior convergence speed and final HV in all tested scenarios (Figure 3).

Figure 3: Comparison of search hypervolume progression across gradient schemes—MGD delivers faster and higher HV convergence than baseline aggregation methods.

Three-objective Scalability

MODNAS is further validated on a tri-objective scenario (accuracy, latency, energy) on NAS-Bench-201 (FPGA and Eyeriss). The method demonstrates near-optimal HV and high-quality Pareto fronts without added search complexity (Figure 4).

Figure 4: Three-objective setting: MODNAS matches or surpasses baselines on hypervolume and Pareto coverage when scaling to tri-objective MOO.

Modulation via User Priors

By introducing explicit scalarized or hard constraints (e.g., latency caps) into the MetaHypernetwork inputs or the training procedure, MODNAS adapts the shape and focus of the sampled Pareto front, for example producing more performant architectures under tighter hardware requirements, which demonstrates both flexibility and practical deployment relevance (Figure 5).

Figure 5: Pareto and HV sensitivity—MODNAS dynamically modulates fronts with latency constraints, highlighting front concentration and accuracy tradeoff.
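One way to realize such priors, as an illustrative assumption rather than the paper's exact mechanism, is to bias the Dirichlet concentration toward latency-sensitive preferences and to add a hinge penalty on the predicted metric whenever it exceeds a user-specified cap:

```python
import torch

def constrained_loss(task_loss, predicted_latency, latency_cap, penalty_weight=1.0):
    """Hedged sketch: augment the scalarized loss with a hinge penalty that
    activates only when the surrogate's predicted latency exceeds the cap."""
    violation = torch.clamp(predicted_latency - latency_cap, min=0.0)
    return task_loss + penalty_weight * violation

# Usage sketch: bias preference sampling toward the latency objective (index 1)
# via a skewed Dirichlet concentration (an assumption, not the paper's prior).
concentration = torch.tensor([1.0, 4.0])      # favors latency-heavy preferences
r = torch.distributions.Dirichlet(concentration).sample()
loss = constrained_loss(task_loss=torch.tensor(0.8),
                        predicted_latency=torch.tensor(12.5),
                        latency_cap=10.0)
```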

Transferability and Generalization

Across tasks and predictors (including large vision and sequence datasets), MODNAS transfers zero-shot to new devices with strong Pareto coverage and accuracy, supported by reliable MetaPredictor performance on hardware metrics (Figure 6).

Figure 6: HAT Transformer search space—HV plots show MODNAS maintaining dominant coverage across multiple hardware domains.

ImageNet/OFA Validation

On the MobileNetV3/OFA search space with 12 devices, MODNAS attains higher Pareto set HV across all devices than OFA+HELP while requiring substantially less GPU time per device and constraint covered (Figure 7).

Figure 7: MODNAS hypervolume (radar) for MobileNetV3—across accuracy-latency axes and target devices, MODNAS consistently dominates baseline methods.

Implications and Future Applications

MODNAS unifies multi-objective, hardware-aware NAS using a differentiable, meta-learned approach that, for the first time, delivers full Pareto front approximation, device transfer, and constraint flexibility in a single search. The method's scalability to many objectives and devices, strong zero-shot transfer, and search efficiency directly address long-standing obstacles in deploying deep architectures under real-world compute and energy constraints.

MODNAS's framework is immediately relevant for automated deployment pipelines, where variable device constraints and user priorities are non-stationary and must be navigated with limited search budgets. The approach's extension to fairness, robustness, and other multi-objective domains is straightforward, and future work can exploit learned device/embedding spaces for rapid adaptation in dynamic or edge environments.

The methodology offers theoretical connections to meta-learning, multi-task optimization, and hypernetwork generalization. Its plugin architecture supports integration with arbitrary gradient-based NAS spaces, search spaces with complex architectural topologies, and meta-predictors for further hardware, performance, or fairness objectives.

Conclusion

MODNAS (2402.18213) introduces a principled, scalable, and hardware-aware differentiable NAS paradigm that profiles the global multi-objective Pareto front across arbitrary devices and user priorities in a single search. Empirical evidence on benchmarks with up to 19 devices and three objectives demonstrates robust generalizability, superior Pareto set quality, and reduced search cost, establishing a new state of the art for practical multi-objective NAS. The design is compatible with diverse objectives and offers strong potential for extension to fairness-aware and deployment-centric AutoML.
