
Dual Aggregation Transformer for Image Super-Resolution (2308.03364v2)

Published 7 Aug 2023 in cs.CV

Abstract: Transformer has recently gained considerable popularity in low-level vision tasks, including image super-resolution (SR). These networks utilize self-attention along different dimensions, spatial or channel, and achieve impressive performance. This inspires us to combine the two dimensions in Transformer for a more powerful representation capability. Based on the above idea, we propose a novel Transformer model, Dual Aggregation Transformer (DAT), for image SR. Our DAT aggregates features across spatial and channel dimensions, in the inter-block and intra-block dual manner. Specifically, we alternately apply spatial and channel self-attention in consecutive Transformer blocks. The alternate strategy enables DAT to capture the global context and realize inter-block feature aggregation. Furthermore, we propose the adaptive interaction module (AIM) and the spatial-gate feed-forward network (SGFN) to achieve intra-block feature aggregation. AIM complements two self-attention mechanisms from corresponding dimensions. Meanwhile, SGFN introduces additional non-linear spatial information in the feed-forward network. Extensive experiments show that our DAT surpasses current methods. Code and models are obtainable at https://github.com/zhengchen1999/DAT.


Summary

  • The paper introduces a novel DAT that employs inter-block and intra-block aggregation to capture both spatial and channel dependencies.
  • It leverages an Adaptive Interaction Module and a Spatial-Gate Feed-Forward Network to blend global and local features for superior reconstruction.
  • Experiments on benchmark datasets show DAT’s significant PSNR and SSIM gains while reducing computational overhead compared to state-of-the-art models.

Overview of "Dual Aggregation Transformer for Image Super-Resolution"

The paper introduces the Dual Aggregation Transformer (DAT), a novel Transformer-based model that enhances image super-resolution (SR) by effectively aggregating spatial and channel features. It addresses a limitation of traditional convolutional approaches, which often struggle to capture the global dependencies crucial for high-quality image reconstruction.

Methodology

The authors propose the Dual Aggregation Transformer (DAT), which is characterized by its ability to perform feature aggregation across spatial and channel dimensions via both inter-block and intra-block mechanisms.

  1. Inter-Block Feature Aggregation:
    • DAT alternates between using spatial window self-attention (SW-SA) and channel-wise self-attention (CW-SA) across successive Transformer blocks. This strategy allows the model to capture comprehensive spatial and channel contexts, optimizing the representation capabilities needed for SR tasks.
  2. Intra-Block Feature Aggregation:
    • An Adaptive Interaction Module (AIM) is introduced to enhance the fusion of features from self-attention and convolutional branches. AIM uses spatial and channel interaction mechanisms to adaptively combine global and local information.
    • The Spatial-Gate Feed-Forward Network (SGFN) is incorporated to integrate additional nonlinear spatial information and address channel redundancy, strengthening the traditional feed-forward network's ability to handle spatial features.
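The core contrast between the two attention types can be illustrated with a minimal NumPy sketch (a simplification: single-head attention without learned projections or windowing, which the actual DAT blocks use):

```python
import numpy as np

def softmax(z, axis=-1):
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def spatial_self_attention(x):
    # x: (N, C), N spatial tokens with C channels each.
    # The attention map is (N, N): every token attends to every position.
    attn = softmax(x @ x.T / np.sqrt(x.shape[1]))
    return attn @ x

def channel_self_attention(x):
    # "Transposed" attention: the map is (C, C), so information is
    # mixed across channels rather than across spatial positions.
    attn = softmax(x.T @ x / np.sqrt(x.shape[0]))
    return x @ attn.T

# Alternate the two block types, as in DAT's inter-block aggregation.
x = np.random.default_rng(0).standard_normal((16, 8))  # 16 tokens, 8 channels
for block in [spatial_self_attention, channel_self_attention] * 2:
    x = block(x)
print(x.shape)  # (16, 8)
```

Note the cost asymmetry this exposes: the spatial map grows quadratically with the number of tokens, while the channel map grows quadratically with channel count, which is why alternating them covers both kinds of context at moderate cost.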

Overall, these dual aggregation strategies are designed to achieve superior feature representation, facilitating high-quality image reconstruction.
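The two intra-block components can likewise be sketched in simplified form. This is a hypothetical illustration only: in the paper, AIM's interaction maps and SGFN's spatial gate are small learned convolution layers, whereas here fixed means, a sigmoid, and a 4-neighbour average stand in for them:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def adaptive_interaction(attn_feat, conv_feat):
    # AIM-style fusion of a global (attention) and a local (conv) branch.
    # Spatial interaction: a per-pixel map from the conv branch
    # re-weights the attention features.
    spatial_map = sigmoid(conv_feat.mean(axis=-1, keepdims=True))      # (H, W, 1)
    # Channel interaction: a per-channel vector from the attention
    # branch re-weights the convolution features.
    channel_vec = sigmoid(attn_feat.mean(axis=(0, 1), keepdims=True))  # (1, 1, C)
    return attn_feat * spatial_map + conv_feat * channel_vec

def spatial_gate_ffn(x, w1, w2):
    # SGFN-style feed-forward: expand channels, split into two halves,
    # gate one half with a spatially mixed version of the other.
    h = np.maximum(x @ w1, 0.0)        # channel expansion + nonlinearity
    a, b = np.split(h, 2, axis=-1)     # split along the channel axis
    p = np.pad(b, ((1, 1), (1, 1), (0, 0)))
    blur = (p[1:-1, 1:-1] + p[:-2, 1:-1] + p[2:, 1:-1]
            + p[1:-1, :-2] + p[1:-1, 2:]) / 5.0  # cheap depthwise-conv stand-in
    return (a * blur) @ w2             # gated product, project back to C channels

rng = np.random.default_rng(0)
feat = adaptive_interaction(rng.standard_normal((8, 8, 16)),
                            rng.standard_normal((8, 8, 16)))
out = spatial_gate_ffn(feat, 0.1 * rng.standard_normal((16, 64)),
                       0.1 * rng.standard_normal((32, 16)))
print(out.shape)  # (8, 8, 16)
```

The gating structure is the key point: because the gate depends on neighbouring pixels, the feed-forward network injects spatial information that a purely channel-wise MLP would lack.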

Experimental Results

The authors conduct extensive experiments across several benchmark datasets, using upscaling factors of ×2, ×3, and ×4. The results indicate that DAT consistently outperforms existing state-of-the-art methods, notably on challenging datasets such as Urban100 and Manga109.

  • Performance Gains: The paper reports significant improvements in PSNR and SSIM metrics, showcasing DAT’s capability in generating sharper and more accurate images compared to other SR methods.
  • Computational Efficiency: Compared with models such as SwinIR and CAT-A, DAT delivers competitive or better performance with lower computational complexity (FLOPs) and fewer parameters.

Implications and Future Directions

The introduction of DAT marks an important step in leveraging Transformers for low-level vision tasks, specifically image super-resolution. The dual aggregation approach is shown to effectively integrate spatial and channel information, providing a robust framework for future research.

Potential developments may include:

  • Further refinement of AIM and SGFN modules to optimize computational overhead.
  • Exploration of DAT applications in other related vision tasks requiring enhanced detail preservation and reconstruction.
  • Investigation into the integration of additional attention mechanisms to further augment the model’s adaptability and efficiency.

In conclusion, the Dual Aggregation Transformer presents a significant advancement in image super-resolution through innovative feature aggregation strategies, standing as a promising foundation for future enhancements in both theoretical models and practical applications.
