MicroNet: Improving Image Recognition with Extremely Low FLOPs
(2108.05894v1)
Published 12 Aug 2021 in cs.CV and cs.LG
Abstract: This paper aims at addressing the problem of substantial performance degradation at extremely low computational cost (e.g. 5M FLOPs on ImageNet classification). We found that two factors, sparse connectivity and dynamic activation function, are effective to improve the accuracy. The former avoids the significant reduction of network width, while the latter mitigates the detriment of reduction in network depth. Technically, we propose micro-factorized convolution, which factorizes a convolution matrix into low rank matrices, to integrate sparse connectivity into convolution. We also present a new dynamic activation function, named Dynamic Shift Max, to improve the non-linearity via maxing out multiple dynamic fusions between an input feature map and its circular channel shift. Building upon these two new operators, we arrive at a family of networks, named MicroNet, that achieves significant performance gains over the state of the art in the low FLOP regime. For instance, under the constraint of 12M FLOPs, MicroNet achieves 59.4% top-1 accuracy on ImageNet classification, outperforming MobileNetV3 by 9.6%. Source code is at https://github.com/liyunsheng13/micronet.
The paper introduces MicroNet, a family of networks that runs at extremely low computational cost (4M to 21M FLOPs) while outperforming baselines such as MobileNetV3.
It presents Micro-Factorized Convolution, a low-rank factorization that reduces FLOPs while preserving channel connectivity through an adaptively chosen number of groups.
It introduces Dynamic Shift-Max, an activation function that dynamically fuses shifted channel groups, enhancing non-linearity and improving accuracy on tasks such as ImageNet classification.
Overview of the MicroNet Approach in Image Recognition
The paper "MicroNet: Improving Image Recognition with Extremely Low FLOPs" addresses the challenge of maintaining image recognition accuracy while drastically reducing computational cost. It targets efficient Convolutional Neural Networks (CNNs) that operate with only 4M to 21M FLOPs, well below the budgets of standard mobile models such as MobileNetV3, whose full-width variants use tens to hundreds of millions of FLOPs.
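To make these budgets concrete, here is a minimal sketch that counts multiply-accumulate operations for a convolution layer, assuming the common convention in this literature of reporting multiply-adds as FLOPs. The helper names and the layer shape below are hypothetical illustrations, not taken from the paper's architecture.

```python
def conv_output_size(size: int, kernel: int, stride: int, padding: int) -> int:
    """Spatial output size of a convolution along one axis."""
    return (size + 2 * padding - kernel) // stride + 1


def conv2d_macs(h: int, w: int, c_in: int, c_out: int, kernel: int,
                stride: int = 1, padding: int = 0, groups: int = 1) -> int:
    """Multiply-accumulates of a (grouped) 2D convolution, ignoring bias."""
    h_out = conv_output_size(h, kernel, stride, padding)
    w_out = conv_output_size(w, kernel, stride, padding)
    # Each output value reads (c_in / groups) channels over a kernel x kernel window.
    return h_out * w_out * c_out * (c_in // groups) * kernel * kernel


# Hypothetical stem: a standard 3x3, stride-2 convolution with 16 output
# channels applied to a 224x224 RGB image.
stem = conv2d_macs(224, 224, 3, 16, kernel=3, stride=2, padding=1)
print(f"stem: {stem / 1e6:.2f}M MACs")
```

A single ordinary stem layer of this shape already costs about 5.4M multiply-adds, more than the smallest budgets the paper considers, which illustrates why layers must be redesigned rather than simply narrowed.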
Technical Contributions
The authors introduce two primary technical innovations: Micro-Factorized Convolution and the Dynamic Shift-Max activation function. Together, these components define a new family of networks, termed MicroNets, that outperform established models at the same or lower computational cost.
Micro-Factorized Convolution: The paper factorizes convolutions into low-rank components to balance the number of channels against input-output connectivity, and applies this to both pointwise and depthwise convolutions. A pointwise convolution over C channels is decomposed into two group convolutions, the first compressing the channels by a ratio R and the second expanding them back, with a channel permutation between them. The number of groups is set adaptively as $G=\sqrt{C/R}$: each output channel then reaches $C^2/(RG^2)=C$ input channels, so full input-output connectivity is kept while the FLOPs drop well below those of a full pointwise convolution.
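The group-count rule and the role of the permutation can be checked with a small connectivity count. This is a minimal sketch under our own assumptions: the first group convolution maps C inputs to C/R hidden channels, the second maps them back to C outputs, both use G groups, and the permutation is a standard channel shuffle. The depthwise part assumes a k×k kernel factorized into k×1 and 1×k kernels. All function names are hypothetical, and costs are multiply-adds per spatial position.

```python
from __future__ import annotations

import math


def microfactorized_groups(channels: int, ratio: int) -> int:
    """Group count G = sqrt(C / R); requires C / R to be a perfect square."""
    g = math.isqrt(channels // ratio)
    if g * g * ratio != channels:
        raise ValueError("C / R must be a perfect square for an integer G")
    return g


def connected_inputs(channels: int, ratio: int, shuffle: bool = True) -> list[int]:
    """For each output channel, count how many input channels can reach it."""
    g = microfactorized_groups(channels, ratio)
    hidden = channels // ratio
    in_per_group = channels // g   # inputs (and outputs) per group
    hid_per_group = hidden // g    # hidden channels per group

    def inputs_of_hidden(h: int) -> set[int]:
        # Hidden channel h belongs to group h // hid_per_group of the first conv.
        grp = h // hid_per_group
        return set(range(grp * in_per_group, (grp + 1) * in_per_group))

    def source_hidden(p: int) -> int:
        # Channel shuffle: position p after the shuffle holds this hidden channel.
        return (p % g) * hid_per_group + p // g if shuffle else p

    counts = []
    for out in range(channels):
        out_grp = out // in_per_group  # group of the second (expanding) conv
        reached: set[int] = set()
        for p in range(out_grp * hid_per_group, (out_grp + 1) * hid_per_group):
            reached |= inputs_of_hidden(source_hidden(p))
        counts.append(len(reached))
    return counts


def pointwise_macs(channels: int) -> int:
    return channels * channels


def microfactorized_pointwise_macs(channels: int, ratio: int) -> int:
    g = microfactorized_groups(channels, ratio)
    hidden = channels // ratio
    # Each group conv costs (number of outputs) x (inputs per group).
    return hidden * (channels // g) + channels * (hidden // g)


def depthwise_macs(channels: int, k: int, factorized: bool = False) -> int:
    return channels * (2 * k if factorized else k * k)


print(microfactorized_groups(64, 4))
print(set(connected_inputs(64, 4)), set(connected_inputs(64, 4, shuffle=False)))
print(pointwise_macs(64), microfactorized_pointwise_macs(64, 4))
print(depthwise_macs(64, 3), depthwise_macs(64, 3, factorized=True))
```

With C = 64 and R = 4 this gives G = 4. With the shuffle every output channel reaches all 64 inputs, without it only 16, while the factorized pointwise cost is 512 multiply-adds per position versus 4096 for a full 1×1 convolution.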
Dynamic Shift-Max: This new activation function strengthens non-linearity by computing several dynamic fusions of an input feature map with its circular channel shifts (shifts by whole channel groups) and taking their maximum. The fusion weights are generated from the input, so the operator adapts which channel groups are combined for each sample, increasing representation power at a computational cost small enough for the extremely low FLOP regime.
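As we read the paper, Dynamic Shift-Max computes $y_i=\max_{1\le k\le K}\sum_{j=0}^{J-1} a^{k}_{i,j}(x)\,x_{(i+jC/G)\bmod C}$, where the coefficients $a^{k}_{i,j}(x)$ come from a small hypernetwork applied to the input. The NumPy sketch below implements that fusion for a single feature map of shape (C, H, W). The coefficient generator (global average pooling, two fully connected layers, a tanh bound) is a hypothetical squeeze-and-excitation-style stand-in, not the paper's exact design, and the sizes are arbitrary.

```python
import numpy as np


def dynamic_shift_max(x: np.ndarray, coeffs: np.ndarray, groups: int) -> np.ndarray:
    """x: (C, H, W); coeffs: (K, J, C); returns the max over K fusions, (C, H, W)."""
    c = x.shape[0]
    num_shifts = coeffs.shape[1]
    step = c // groups
    # shifted[j][i] = x[(i + j * C/G) mod C]: circular shifts by multiples of C/G.
    shifted = np.stack([np.roll(x, -j * step, axis=0) for j in range(num_shifts)])
    # fused[k] = sum over j of coeffs[k, j] (per channel) * shifted[j].
    fused = np.einsum("kjc,jchw->kchw", coeffs, shifted)
    return fused.max(axis=0)


def coefficient_generator(x, w1, w2, k, j):
    """Hypothetical hypernetwork: pool -> FC -> ReLU -> FC -> tanh."""
    pooled = x.mean(axis=(1, 2))             # (C,)
    hidden = np.maximum(w1 @ pooled, 0.0)    # (C // r,)
    return np.tanh(w2 @ hidden).reshape(k, j, x.shape[0])  # (K, J, C)


rng = np.random.default_rng(0)
C, H, W, G, K, J, r = 8, 4, 4, 2, 2, 2, 4
x = rng.standard_normal((C, H, W))
w1 = rng.standard_normal((C // r, C)) * 0.1
w2 = rng.standard_normal((K * J * C, C // r)) * 0.1
y = dynamic_shift_max(x, coefficient_generator(x, w1, w2, K, J), groups=G)
print(y.shape)

# Sanity check: with K = 2, J = 1 and coefficients (1, 0), the operator is ReLU.
relu_coeffs = np.zeros((2, 1, C))
relu_coeffs[0, 0, :] = 1.0
assert np.allclose(dynamic_shift_max(x, relu_coeffs, groups=G), np.maximum(x, 0.0))
```

The ReLU check shows the design idea: ReLU is a fixed maximum over two fusions, and Dynamic Shift-Max generalizes it by making the fusion weights input-dependent and by mixing in shifted channel groups. Beyond the small hypernetwork, the fusion adds only K·J multiply-adds per output element, which keeps it cheap relative to the convolutions.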
Empirical Results
MicroNets are shown empirically to outperform baselines on ImageNet classification in the low-FLOP regime. For instance, under a 12M FLOP budget, MicroNet reaches 59.4% top-1 accuracy, 9.6% higher (absolute) than MobileNetV3. The paper also reports gains on object detection and keypoint detection, suggesting that the benefits carry over to other vision tasks with tight computational budgets.
Implications and Future Directions
The implications of this research are notable in environments where power or computational resources are limited, such as edge devices and mobile applications. By reducing the FLOPs while preserving or enhancing performance, MicroNets present a significant advance for efficient model design.
For future developments, several avenues could be explored:
Hardware Optimization: The implementation efficiency might benefit from hardware-specific optimizations, especially for the group convolutions and dynamic operations.
Automated Architecture Search: Integrating MicroNet design principles with architecture search methodologies could potentially yield networks with even more optimal performance-efficiency trade-offs.
Cross-Domain Applications: Extending this approach to other domains where computational efficiency is critical could be beneficial.
In conclusion, the MicroNet paper makes a well-motivated and practically impactful contribution to the design of efficient CNNs for image recognition. Its Micro-Factorized Convolution and Dynamic Shift-Max activation function offer a promising direction for future research and applications in efficient deep learning.