
Scaling Up Quantization-Aware Neural Architecture Search for Efficient Deep Learning on the Edge (2401.12350v1)

Published 22 Jan 2024 in cs.CV and cs.LG

Abstract: Neural Architecture Search (NAS) has become the de facto approach for designing accurate and efficient networks for edge devices. Since models are typically quantized for edge deployment, recent work has investigated quantization-aware NAS (QA-NAS) to search for highly accurate and efficient quantized models. However, existing QA-NAS approaches, particularly few-bit mixed-precision (FB-MP) methods, do not scale to larger tasks. Consequently, QA-NAS has mostly been limited to low-scale tasks and tiny networks. In this work, we present an approach to enable QA-NAS (INT8 and FB-MP) on large-scale tasks by leveraging the block-wise formulation introduced by block-wise NAS. We demonstrate strong results for the semantic segmentation task on the Cityscapes dataset, finding FB-MP models 33% smaller and INT8 models 17.6% faster than DeepLabV3 (INT8) without compromising task performance.
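To illustrate the block-wise idea the abstract describes, the sketch below shows how, once each block is scored independently (for example via a per-block distillation loss against a full-precision teacher), choosing a quantized candidate and bit-width per block becomes a small combinatorial selection problem rather than a search over the whole network. This is a minimal illustrative sketch, not the paper's implementation; all names (`BlockCandidate`, `select_blockwise`), the size budget, and the toy loss values are assumptions.

```python
# Hypothetical sketch of block-wise quantization-aware candidate selection.
# Assumption: each block already has per-candidate scores (e.g., distillation
# loss vs. the FP32 teacher block) and quantized sizes; this is NOT the
# paper's actual API or algorithm.

from dataclasses import dataclass
from itertools import product
from typing import List

@dataclass
class BlockCandidate:
    name: str        # candidate sub-block architecture (illustrative)
    bits: int        # weight bit-width, e.g. 8 for INT8, 4 for few-bit
    loss: float      # per-block distillation loss vs. the teacher block
    size_kb: float   # quantized weight size of this block

def select_blockwise(blocks: List[List[BlockCandidate]],
                     size_budget_kb: float) -> List[BlockCandidate]:
    """Pick one candidate per block, minimizing summed block loss under a
    total size budget. Exhaustive enumeration for clarity; a real system
    would use a knapsack-style DP or evolutionary search for large spaces."""
    best_choice, best_loss = None, float("inf")
    for choice in product(*blocks):
        size = sum(c.size_kb for c in choice)
        loss = sum(c.loss for c in choice)
        if size <= size_budget_kb and loss < best_loss:
            best_choice, best_loss = list(choice), loss
    return best_choice

if __name__ == "__main__":
    # Toy example: two blocks, each with an INT8 and a 4-bit candidate.
    blocks = [
        [BlockCandidate("blk0_a", 8, loss=0.10, size_kb=120),
         BlockCandidate("blk0_a", 4, loss=0.14, size_kb=60)],
        [BlockCandidate("blk1_b", 8, loss=0.08, size_kb=200),
         BlockCandidate("blk1_b", 4, loss=0.09, size_kb=100)],
    ]
    picked = select_blockwise(blocks, size_budget_kb=200)
    for c in picked:
        print(c.name, f"{c.bits}-bit", c.size_kb, "kB")
```

In this toy run the budget forces both blocks to the 4-bit candidates, mirroring how a mixed-precision assignment trades a small per-block loss increase for a smaller model; the actual search strategy and objective in the paper may differ.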
