Scaling Up Quantization-Aware Neural Architecture Search for Efficient Deep Learning on the Edge (2401.12350v1)

Published 22 Jan 2024 in cs.CV and cs.LG

Abstract: Neural Architecture Search (NAS) has become the de facto approach for designing accurate and efficient networks for edge devices. Since models are typically quantized for edge deployment, recent work has investigated quantization-aware NAS (QA-NAS) to search for highly accurate and efficient quantized models. However, existing QA-NAS approaches, particularly few-bit mixed-precision (FB-MP) methods, do not scale to larger tasks. Consequently, QA-NAS has mostly been limited to low-scale tasks and tiny networks. In this work, we present an approach to enable QA-NAS (INT8 and FB-MP) on large-scale tasks by leveraging the block-wise formulation introduced by block-wise NAS. We demonstrate strong results for the semantic segmentation task on the Cityscapes dataset, finding FB-MP models 33% smaller and INT8 models 17.6% faster than DeepLabV3 (INT8) without compromising task performance.
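To make the block-wise idea from the abstract more concrete, below is a minimal, hypothetical Python/PyTorch sketch of how a block-wise, quantization-aware search over bit-widths might be organized: each candidate block is fake-quantized to a chosen bit-width and scored against a full-precision teacher block on calibration data. All names, the fake-quantization scheme, the bit-width choices, and the size penalty are illustrative assumptions and do not reproduce the paper's actual formulation or objectives.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def fake_quantize(x: torch.Tensor, num_bits: int) -> torch.Tensor:
    # Symmetric uniform fake-quantization, used here purely for illustration.
    qmax = 2 ** (num_bits - 1) - 1
    scale = x.abs().max().clamp(min=1e-8) / qmax
    return torch.round(x / scale).clamp(-qmax, qmax) * scale


class QuantizedCandidateBlock(nn.Module):
    # A hypothetical candidate block whose weights are fake-quantized to
    # `num_bits` during the forward pass.
    def __init__(self, channels: int, num_bits: int):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.num_bits = num_bits

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = fake_quantize(self.conv.weight, self.num_bits)
        return F.relu(F.conv2d(x, w, self.conv.bias, padding=1))


def search_block(teacher_block: nn.Module,
                 channels: int,
                 calib_batches: list,
                 bit_choices=(8, 4, 2),
                 size_weight: float = 0.01) -> nn.Module:
    # Score each bit-width candidate by how well it mimics the corresponding
    # full-precision teacher block on calibration data, plus a crude size cost.
    # Per-block training of the candidates is omitted to keep the sketch short,
    # and `size_weight` is an arbitrary assumption.
    best, best_score = None, float("inf")
    for bits in bit_choices:
        candidate = QuantizedCandidateBlock(channels, bits)
        with torch.no_grad():
            distill_loss = sum(
                F.mse_loss(candidate(x), teacher_block(x)).item()
                for x in calib_batches
            )
        score = distill_loss + size_weight * bits
        if score < best_score:
            best, best_score = candidate, score
    return best
```

In this toy setting, a full network would be split into blocks, each block searched independently against its teacher counterpart, and the per-block winners stitched back together into a quantized network. The paper's actual per-block supervision, FB-MP bit assignment, and latency/size objectives for semantic segmentation on Cityscapes are more involved than this sketch.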
