Emergent Mind

MobileNetV4 -- Universal Models for the Mobile Ecosystem

(2404.10518)
Published Apr 16, 2024 in cs.CV

Abstract

We present the latest generation of MobileNets, known as MobileNetV4 (MNv4), featuring universally efficient architecture designs for mobile devices. At its core, we introduce the Universal Inverted Bottleneck (UIB) search block, a unified and flexible structure that merges Inverted Bottleneck (IB), ConvNext, Feed Forward Network (FFN), and a novel Extra Depthwise (ExtraDW) variant. Alongside UIB, we present Mobile MQA, an attention block tailored for mobile accelerators, delivering a significant 39% speedup. An optimized neural architecture search (NAS) recipe is also introduced which improves MNv4 search effectiveness. The integration of UIB, Mobile MQA and the refined NAS recipe results in a new suite of MNv4 models that are mostly Pareto optimal across mobile CPUs, DSPs, GPUs, as well as specialized accelerators like Apple Neural Engine and Google Pixel EdgeTPU - a characteristic not found in any other models tested. Finally, to further boost accuracy, we introduce a novel distillation technique. Enhanced by this technique, our MNv4-Hybrid-Large model delivers 87% ImageNet-1K accuracy, with a Pixel 8 EdgeTPU runtime of just 3.8ms.

MNv4 models deliver mostly Pareto optimal efficiency across a wide range of mobile hardware, a property not shared by the other models tested.

Overview

  • MobileNetV4 introduces significant enhancements such as the Universal Inverted Bottleneck (UIB), optimized Mobile Multi-Query Attention (MQA), and an improved neural architecture search (NAS) for better mobile deployment.

  • The model achieves mostly Pareto optimal performance on various hardware platforms including CPUs, DSPs, GPUs, and specialized accelerators like the Apple Neural Engine and Google Pixel EdgeTPU.

  • The MNv4-Hybrid-Large model achieves 87% top-1 accuracy on ImageNet-1K with a 3.8ms runtime on the Pixel 8 EdgeTPU.

  • Future improvements may focus on further optimization of UIB and Mobile MQA, and adapting NAS to accommodate new hardware developments.

MobileNetV4: Enhancements for Optimal Mobile Ecosystem Deployment

Introduction to MobileNetV4

The latest addition to the MobileNets series, MobileNetV4 (MNv4), offers significant innovations in mobile device architectures that address the trade-off between efficiency and accuracy. The crucial advancements include the Universal Inverted Bottleneck (UIB), an optimized Mobile Multi-Query Attention (MQA) block tailored for mobile accelerators, and an improved neural architecture search (NAS) recipe. Among these, the UIB and Mobile MQA are pivotal in achieving a universally efficient architecture designed to be mostly Pareto optimal across diverse mobile platforms, including CPUs, DSPs, GPUs, and specialized accelerators like the Apple Neural Engine and Google Pixel EdgeTPU.

Key Contributions

  • Universal Inverted Bottleneck (UIB): The UIB is an evolution of the Inverted Bottleneck block, integrating features from ConvNext and Feed Forward Networks. This block allows flexibility in spatial and channel mixing, the option to extend the receptive field, and improved computational efficiency.
  • Mobile MQA: A novel attention block providing a 39% inference speedup on mobile accelerators. It exploits the efficiency of shared keys and values across all attention heads, significantly improving the operational intensity, which is crucial for performance on mobile devices.
  • Optimized Neural Architecture Search (NAS): A refined NAS process enhances MNv4's search effectiveness. The combination of a coarse-grained and a fine-grained search, along with an offline distilled dataset, improves the discovery of robust architectures, making the search process more efficient and effective.
  • Mostly Pareto Optimal Performance: MNv4 achieves mostly Pareto optimal performance across a wide range of hardware, establishing a new benchmark for multi-platform deployment without platform-specific tuning.
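The UIB design space described above can be sketched as a block with two optional depthwise convolutions: one before the pointwise expansion and one between expansion and projection. The variant names follow the abstract; the 3x3 kernel and the expansion ratio of 4 below are placeholder assumptions for illustration, not values taken from the paper.

```python
def uib_variant(dw_before: bool, dw_mid: bool) -> str:
    """Map the two optional depthwise (DW) convs of a UIB block to the
    named instantiation it reduces to (per the MobileNetV4 abstract)."""
    return {
        (False, True):  "IB",        # expand -> DW -> project (classic inverted bottleneck)
        (True,  False): "ConvNext",  # DW -> expand -> project
        (False, False): "FFN",       # pointwise expand -> project only
        (True,  True):  "ExtraDW",   # DW -> expand -> DW -> project
    }[(dw_before, dw_mid)]


def uib_ops(dw_before: bool, dw_mid: bool, expand_ratio: int = 4) -> list:
    """Return the op sequence of one UIB block as a list of op names.
    Kernel size and expand_ratio are illustrative assumptions."""
    ops = []
    if dw_before:
        ops.append("dw3x3")                     # optional DW before expansion
    ops.append(f"pw_expand x{expand_ratio}")    # 1x1 pointwise expansion
    if dw_mid:
        ops.append("dw3x3")                     # optional DW after expansion
    ops.append("pw_project")                    # 1x1 pointwise projection
    return ops
```

Framed this way, the NAS search over UIB blocks amounts to choosing, per layer, which of the two depthwise convolutions to keep.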

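The key idea behind Mobile MQA, sharing a single key/value head across all query heads, can be illustrated with a minimal NumPy sketch of plain multi-query attention. This is not the paper's exact Mobile MQA block (which adds further accelerator-specific optimizations); all shapes and names below are illustrative assumptions.

```python
import numpy as np


def multi_query_attention(x, wq, wk, wv, num_heads):
    """Multi-query attention: every query head attends over ONE shared
    key/value projection, shrinking KV compute and memory traffic.
    x: (seq, d_model); wq: (d_model, d_model); wk, wv: (d_model, head_dim)."""
    seq, d_model = x.shape
    head_dim = d_model // num_heads
    q = (x @ wq).reshape(seq, num_heads, head_dim)   # per-head queries
    k = x @ wk                                       # shared keys   (seq, head_dim)
    v = x @ wv                                       # shared values (seq, head_dim)
    out = np.empty_like(q)
    for h in range(num_heads):
        scores = q[:, h] @ k.T / np.sqrt(head_dim)             # (seq, seq)
        w = np.exp(scores - scores.max(axis=-1, keepdims=True))
        w /= w.sum(axis=-1, keepdims=True)                     # softmax
        out[:, h] = w @ v                                      # attend over shared values
    return out.reshape(seq, d_model)
```

Because `wk` and `wv` project to a single head's width instead of `d_model`, the KV tensors are `num_heads` times smaller than in standard multi-head attention, which raises operational intensity on memory-bound mobile accelerators.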
Results and Implications

Improved Hardware Efficiency

MNv4 models showcase exceptional efficiency across hardware. Specifically, the MNv4-Hybrid-Large model achieved 87% top-1 accuracy on ImageNet-1K with a notable 3.8ms runtime on the Pixel 8 EdgeTPU. This performance reflects the universality of its design, benefiting practitioners who deploy across a variety of mobile platforms without extensive per-platform customization.

Benchmarks Across Devices

The MNv4 models were rigorously benchmarked across the major mobile processing environments, and sat on or near the Pareto frontier in almost all hardware scenarios tested. This consistency across CPUs, DSPs, GPUs, and specialized accelerators underscores MNv4's broad applicability in the mobile ecosystem.

Future Outlook

Building on the insights gained from MNv4, future research could explore further optimization of the UIB and Mobile MQA components to enhance model efficiency and accuracy. Additionally, expanding the NAS methodology to seamlessly integrate emergent hardware capabilities could sustain the evolution of highly efficient mobile-specific models. As mobile devices continue to diversify and as their computing capabilities expand, maintaining a focus on universal model performance will remain paramount.

In conclusion, MobileNetV4's introduction of the UIB block, enhanced Mobile MQA, and refined NAS represent significant steps forward in the design of neural network architectures for mobile devices. Its mostly Pareto optimal performance across diverse hardware platforms not only enhances its applicability but also sets a new standard in mobile neural network efficiency and effectiveness.
