Efficient Long-Range Attention Network for Image Super-resolution

Published 13 Mar 2022 in cs.CV | (2203.06697v1)

Abstract: Recently, transformer-based methods have demonstrated impressive results in various vision tasks, including image super-resolution (SR), by exploiting the self-attention (SA) for feature extraction. However, the computation of SA in most existing transformer based models is very expensive, while some employed operations may be redundant for the SR task. This limits the range of SA computation and consequently the SR performance. In this work, we propose an efficient long-range attention network (ELAN) for image SR. Specifically, we first employ shift convolution (shift-conv) to effectively extract the image local structural information while maintaining the same level of complexity as 1x1 convolution, then propose a group-wise multi-scale self-attention (GMSA) module, which calculates SA on non-overlapped groups of features using different window sizes to exploit the long-range image dependency. A highly efficient long-range attention block (ELAB) is then built by simply cascading two shift-conv with a GMSA module, which is further accelerated by using a shared attention mechanism. Without bells and whistles, our ELAN follows a fairly simple design by sequentially cascading the ELABs. Extensive experiments demonstrate that ELAN obtains even better results against the transformer-based SR models but with significantly less complexity. The source code can be found at https://github.com/xindongzhang/ELAN.

Summary

The paper introduces ELAN, which combines shift convolution (shift-conv) with group-wise multi-scale self-attention (GMSA) to capture long-range dependencies for image super-resolution.
An accelerated self-attention computation and a shared attention mechanism, which reuses attention maps instead of recomputing them, reduce the computational overhead.
Experiments on standard benchmarks report better PSNR and SSIM than transformer-based SR models at significantly lower complexity.

An Evaluation of Efficient Long-Range Attention Network for Image Super-resolution

The paper "Efficient Long-Range Attention Network for Image Super-resolution" proposes a way to improve both the computational efficiency and the reconstruction quality of transformer-based single image super-resolution (SR). The authors' Efficient Long-Range Attention Network (ELAN) targets the cost of self-attention (SA) in existing models, which grows quickly on the large input feature maps that SR tasks involve and therefore limits the range over which SA can be computed.
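To make the cost concern concrete, the sketch below counts multiply-adds for the two attention matrix products (the score matrix and the weighted sum) under global attention and under attention restricted to non-overlapping windows. The formulas are the standard ones for window attention, not figures taken from the paper. The function names and the feature-map sizes are illustrative assumptions, and the projection layers are left out.

```python
def global_sa_cost(height: int, width: int, channels: int) -> int:
    """Multiply-adds for Q.K^T plus attention.V over all height*width tokens."""
    tokens = height * width
    return 2 * tokens * tokens * channels


def window_sa_cost(height: int, width: int, channels: int, window: int) -> int:
    """Same two products, restricted to non-overlapping window x window patches."""
    assert height % window == 0 and width % window == 0
    num_windows = (height // window) * (width // window)
    tokens_per_window = window * window
    return num_windows * 2 * tokens_per_window * tokens_per_window * channels


if __name__ == "__main__":
    # Hypothetical feature-map size and channel count, chosen only for illustration.
    h, w, c = 256, 256, 64
    print(global_sa_cost(h, w, c) // window_sa_cost(h, w, c, 16))  # 256
```

Under these assumptions the global variant costs H*W / M^2 times more than the windowed one for window size M, which is why the attention range has to be restricted on large feature maps.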

Methodological Innovations

The ELAN framework is deliberately simple and consists of three components: shallow feature extraction, deep feature extraction through a cascade of Efficient Long-Range Attention Blocks (ELABs), and high-resolution (HR) image reconstruction. The ELAB is the central contribution: it combines shift convolution (shift-conv) with a group-wise multi-scale self-attention (GMSA) module.

Shift Convolution (Shift-Conv): Extracts local structural information at the same complexity as a 1x1 convolution. Because feature channels are spatially shifted before the 1x1 convolution, each output pixel can draw on its neighbors, giving a larger receptive field than a plain 1x1 convolution.
Group-wise Multi-Scale Self-Attention (GMSA): Splits the features into non-overlapping groups and computes SA in each group with a different window size. Restricting attention to windows reduces the computational burden, while the larger windows capture long-range dependencies in the image.
Accelerated Self-Attention (ASA) and Shared Attention Mechanism: ASA streamlines the SA calculation to cut its overhead on large feature maps. The shared attention mechanism saves further computation by reusing attention maps across layers instead of recomputing them.
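The first two components above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation (their code is in the linked repository). The five-way channel split with one-pixel shifts, the window sizes (4, 8, 16), and the use of the input itself as query, key, and value are all assumptions made to keep the example short. The function names are hypothetical, and ASA and attention sharing are not modeled.

```python
import numpy as np


def shift_conv(x, weight):
    """x: (C, H, W) feature map; weight: (C_out, C) kernel of a 1x1 convolution.

    Assumed layout: channels are split into five groups; four are shifted by
    one pixel (left, right, up, down) with zero fill, the rest are untouched.
    """
    g = x.shape[0] // 5
    shifted = np.zeros_like(x)
    shifted[0 * g:1 * g, :, :-1] = x[0 * g:1 * g, :, 1:]   # shift left
    shifted[1 * g:2 * g, :, 1:] = x[1 * g:2 * g, :, :-1]   # shift right
    shifted[2 * g:3 * g, :-1, :] = x[2 * g:3 * g, 1:, :]   # shift up
    shifted[3 * g:4 * g, 1:, :] = x[3 * g:4 * g, :-1, :]   # shift down
    shifted[4 * g:] = x[4 * g:]                            # unshifted group
    # A 1x1 convolution is a per-pixel matrix product over channels.
    return np.einsum("oc,chw->ohw", weight, shifted)


def window_self_attention(x, window):
    """Softmax self-attention inside non-overlapping window x window patches.

    x: (C, H, W) with H and W divisible by `window`. Q, K and V are all x
    itself (no learned projections), which is a simplification.
    """
    c, h, w = x.shape
    nh, nw = h // window, w // window
    # (C, nh, win, nw, win) -> (nh, nw, win*win, C): one token matrix per window
    t = x.reshape(c, nh, window, nw, window).transpose(1, 3, 2, 4, 0)
    t = t.reshape(nh, nw, window * window, c)
    scores = t @ t.transpose(0, 1, 3, 2) / np.sqrt(c)
    scores = scores - scores.max(axis=-1, keepdims=True)   # numerical stability
    attn = np.exp(scores)
    attn = attn / attn.sum(axis=-1, keepdims=True)
    out = attn @ t                                         # (nh, nw, win*win, C)
    out = out.reshape(nh, nw, window, window, c).transpose(4, 0, 2, 1, 3)
    return out.reshape(c, h, w)


def gmsa(x, windows=(4, 8, 16)):
    """Split channels into one group per window size and attend within each."""
    groups = np.array_split(x, len(windows), axis=0)
    outputs = [window_self_attention(g, win) for g, win in zip(groups, windows)]
    return np.concatenate(outputs, axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((30, 16, 16))      # hypothetical 30-channel feature map
    w1 = rng.standard_normal((30, 30))         # random 1x1 convolution weights
    print(gmsa(shift_conv(x, w1)).shape)       # (30, 16, 16)
```

The point of the grouping is that each channel group pays only for its own window size, so the 16x16 group supplies the long-range context while the smaller windows stay cheap.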

Empirical Evaluation

The authors benchmark ELAN against state-of-the-art CNN-based and transformer-based SR models on Set5, Set14, BSD100, Urban100, and Manga109. The reported results show ELAN surpassing competing models in both PSNR and SSIM across the tested scaling factors. It does so at significantly lower computational cost, as evidenced by reduced FLOPs and latency.
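For readers unfamiliar with the headline metric, PSNR can be computed as in the minimal sketch below. SR papers commonly evaluate on the luminance channel with a cropped border, but this summary does not state the protocol used, so the example is plain full-image PSNR. The function name and the default peak value of 255 are assumptions.

```python
import numpy as np


def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB between two images of equal shape."""
    ref = np.asarray(reference, dtype=np.float64)
    est = np.asarray(estimate, dtype=np.float64)
    mse = np.mean((ref - est) ** 2)            # mean squared error
    if mse == 0:
        return float("inf")                    # identical images
    return float(10.0 * np.log10(peak ** 2 / mse))
```

Higher is better, and because the scale is logarithmic, small differences in dB correspond to noticeable differences in mean squared error.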

Implications and Future Directions

ELAN's design is a meaningful step toward balancing reconstruction quality and computational cost in SR, which matters most when models are deployed in resource-constrained environments. Because it models long-range dependencies without prohibitive cost, the same approach may carry over to other vision tasks that depend on large-scale feature interactions and structure learning.

Future research could apply ELAN's architecture to other low-level vision tasks, such as denoising, inpainting, and texture synthesis. Refining the shared attention mechanism and further optimizing the GMSA strategy could also yield greater efficiency, potentially enabling real-time use on edge devices and mobile platforms.

In summary, ELAN contributes an effective and efficient architecture for image super-resolution, achieving high-quality reconstruction through shift-conv, GMSA, and shared attention. The findings suggest that resource-conscious attention designs are a promising direction for transformer-based SR models.
