MobileNetV4 -- Universal Models for the Mobile Ecosystem (2404.10518v2)
Abstract: We present the latest generation of MobileNets, MobileNetV4 (MNv4), featuring universally efficient architecture designs for mobile devices. At its core, we introduce the Universal Inverted Bottleneck (UIB) search block, a unified and flexible structure that merges the Inverted Bottleneck (IB), ConvNext, Feed Forward Network (FFN), and a novel Extra Depthwise (ExtraDW) variant. Alongside UIB, we present Mobile MQA, an attention block tailored for mobile accelerators that delivers a significant 39% speedup. We also introduce an optimized neural architecture search (NAS) recipe that improves MNv4 search effectiveness. The integration of UIB, Mobile MQA, and the refined NAS recipe yields a new suite of MNv4 models that are mostly Pareto-optimal across mobile CPUs, DSPs, GPUs, and specialized accelerators such as the Apple Neural Engine and Google Pixel EdgeTPU, a characteristic not found in any of the other models tested. Finally, to further boost accuracy, we introduce a novel distillation technique. Enhanced by this technique, our MNv4-Hybrid-Large model delivers 87% ImageNet-1K accuracy with a Pixel 8 EdgeTPU runtime of just 3.8 ms.
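To make the UIB description concrete, here is a minimal PyTorch sketch of a Universal Inverted Bottleneck block, assuming the structure the abstract describes: an inverted bottleneck extended with two optional depthwise convolutions, one before the pointwise expansion and one between expansion and projection. All class and argument names (`UIB`, `start_dw`, `mid_dw`, and so on) are our own illustration, not the reference implementation in the TensorFlow Model Garden.

```python
# Hedged sketch, not the official implementation. Assumes the four UIB
# variants differ only in which of two optional depthwise convs are present.
import torch
import torch.nn as nn

class UIB(nn.Module):
    """Universal Inverted Bottleneck (illustrative).

    (start_dw, mid_dw) selects the variant:
      (False, True)  -> Inverted Bottleneck (IB)
      (True,  False) -> ConvNext-like
      (True,  True)  -> ExtraDW
      (False, False) -> FFN (pointwise convs only)
    """
    def __init__(self, c_in, c_out, expand=4, kernel=3,
                 start_dw=True, mid_dw=True, stride=1):
        super().__init__()
        assert stride == 1 or start_dw or mid_dw, "FFN variant cannot downsample"
        c_mid = c_in * expand
        layers = []
        if start_dw:
            # Depthwise conv before expansion; carries the stride when there
            # is no mid depthwise (ConvNext-like variant).
            s = stride if not mid_dw else 1
            layers += [nn.Conv2d(c_in, c_in, kernel, stride=s,
                                 padding=kernel // 2, groups=c_in, bias=False),
                       nn.BatchNorm2d(c_in)]
        # Pointwise expansion.
        layers += [nn.Conv2d(c_in, c_mid, 1, bias=False),
                   nn.BatchNorm2d(c_mid), nn.ReLU(inplace=True)]
        if mid_dw:
            # Depthwise conv between expansion and projection (IB/ExtraDW).
            layers += [nn.Conv2d(c_mid, c_mid, kernel, stride=stride,
                                 padding=kernel // 2, groups=c_mid, bias=False),
                       nn.BatchNorm2d(c_mid), nn.ReLU(inplace=True)]
        # Linear pointwise projection.
        layers += [nn.Conv2d(c_mid, c_out, 1, bias=False),
                   nn.BatchNorm2d(c_out)]
        self.block = nn.Sequential(*layers)
        self.use_residual = (stride == 1 and c_in == c_out)

    def forward(self, x):
        y = self.block(x)
        return x + y if self.use_residual else y
```

Toggling the two flags recovers the four variants named in the abstract; for example, `UIB(64, 128, start_dw=True, mid_dw=True, stride=2)` is the ExtraDW configuration with 2x spatial downsampling.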
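The abstract's other building block, Mobile MQA, builds on multi-query attention (Shazeer, 2019), in which many query heads share a single key/value head, shrinking the KV tensors that dominate memory traffic on mobile accelerators. Below is a hedged sketch of that idea over a 2D feature map, with optional spatial downsampling of keys and values; the head count, the average-pool downsampling stand-in, and all layer names are our assumptions, not the paper's exact block design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MobileMQA(nn.Module):
    """Multi-query attention over a feature map (illustrative sketch).

    All query heads attend to one shared key/value head; `kv_stride`
    optionally downsamples keys/values spatially. The paper's Mobile MQA
    block differs in detail (e.g., how the KV downsampling is implemented).
    """
    def __init__(self, dim, num_heads=4, kv_stride=2):
        super().__init__()
        assert dim % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.q = nn.Conv2d(dim, dim, 1)                 # per-head queries
        self.kv = nn.Conv2d(dim, 2 * self.head_dim, 1)  # one shared K/V head
        self.out = nn.Conv2d(dim, dim, 1)
        self.kv_stride = kv_stride

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.q(x).reshape(b, self.num_heads, self.head_dim, h * w)
        # Average pooling as a simple stand-in for the paper's KV downsampling.
        kv_in = F.avg_pool2d(x, self.kv_stride) if self.kv_stride > 1 else x
        k, v = self.kv(kv_in).chunk(2, dim=1)   # each (b, head_dim, h', w')
        k, v = k.flatten(2), v.flatten(2)       # (b, head_dim, n_kv)
        # Shared K/V broadcast across all query heads.
        attn = torch.einsum('bhdq,bdk->bhqk', q, k) * self.head_dim ** -0.5
        attn = attn.softmax(dim=-1)
        y = torch.einsum('bhqk,bdk->bhdq', attn, v)
        return self.out(y.reshape(b, c, h, w))
```

On a (1, 64, 32, 32) feature map with `kv_stride=2`, keys and values are computed on a 16x16 grid, cutting attention memory traffic roughly 4x; reduced KV traffic is a plausible source of the 39% speedup the abstract reports, though the exact mechanism is detailed in the paper itself.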