NAFSSR: Stereo Image Super-Resolution Using NAFNet (2204.08714v2)

Published 19 Apr 2022 in cs.CV

Abstract: Stereo image super-resolution aims at enhancing the quality of super-resolution results by utilizing the complementary information provided by binocular systems. To obtain reasonable performance, most methods focus on finely designing modules, loss functions, and etc. to exploit information from another viewpoint. This has the side effect of increasing system complexity, making it difficult for researchers to evaluate new ideas and compare methods. This paper inherits a strong and simple image restoration model, NAFNet, for single-view feature extraction and extends it by adding cross attention modules to fuse features between views to adapt to binocular scenarios. The proposed baseline for stereo image super-resolution is noted as NAFSSR. Furthermore, training/testing strategies are proposed to fully exploit the performance of NAFSSR. Extensive experiments demonstrate the effectiveness of our method. In particular, NAFSSR outperforms the state-of-the-art methods on the KITTI 2012, KITTI 2015, Middlebury, and Flickr1024 datasets. With NAFSSR, we won 1st place in the NTIRE 2022 Stereo Image Super-resolution Challenge. Codes and models will be released at https://github.com/megvii-research/NAFNet.

Summary

The paper introduces NAFSSR, a framework that extends NAFNet using cross-attention to fuse stereo views for enhanced super-resolution.
It is markedly more efficient than prior methods, cutting parameters by up to 79% and running up to 5.11 times faster while delivering superior PSNR and SSIM scores.
The framework offers practical benefits for applications in autonomous driving, VR, and robotics by simplifying architecture without sacrificing performance.

NAFSSR: Stereo Image Super-Resolution Using NAFNet

The paper "NAFSSR: Stereo Image Super-Resolution Using NAFNet" introduces an innovative framework for stereo image super-resolution (SR), aiming to enhance low-resolution stereo image pairs by leveraging both intra-view and cross-view information. The authors build on the existing NAFNet architecture designed for single image restoration and extend it to stereo image scenarios through the integration of cross-attention modules.

Technical Overview

Stereo image super-resolution seeks to recover high-resolution images from low-resolution counterparts, leveraging the additional viewpoint information available in stereo pairs. Existing stereo SR models tend to rely on intricate module designs and loss functions, which raises system complexity and makes it harder to evaluate new ideas and compare methods. Instead, the paper proposes NAFSSR, a baseline adapted from NAFNet, a simple model with competitive performance in single-image restoration tasks.

The NAFSSR architecture is composed of three main components:

Intra-view Feature Extraction: Utilizing NAFNet blocks (NAFBlocks) for extracting features from each stereo view, maintaining weight-sharing to promote consistency.
Cross-view Feature Fusion: Implemented via Stereo Cross Attention Modules (SCAMs), which fuse features from the left and right images, focusing on interactions along the horizontal epipolar lines specific to stereo images.
Reconstruction: Leveraging convolutional layers and pixel shuffle operations to upscale the features to a high-resolution format.
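The cross-view fusion and reconstruction steps above can be illustrated with a minimal NumPy sketch. It is not the authors' SCAM: it assumes rectified stereo pairs (so matching pixels share a row), uses channels-last arrays, and omits the normalisation, learned projections and learned scaling that a trained module would have. The function names `cross_view_fusion` and `pixel_shuffle` are hypothetical and chosen only for this illustration.

```python
import numpy as np

def softmax(x, axis):
    # Numerically stable softmax along one axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_view_fusion(feat_left, feat_right):
    """Bidirectional cross attention restricted to image rows.

    feat_left, feat_right: arrays of shape (H, W, C).
    Each pixel attends only to pixels on the same row of the other
    view, i.e. along the horizontal epipolar line.
    """
    channels = feat_left.shape[-1]
    # attn[h, i, j]: similarity of left pixel i and right pixel j on row h.
    attn = np.einsum("hic,hjc->hij", feat_left, feat_right) / np.sqrt(channels)
    # Left view gathers right-view features (weights sum to 1 over j).
    right_to_left = np.einsum("hij,hjc->hic", softmax(attn, axis=2), feat_right)
    # Right view gathers left-view features (weights sum to 1 over i).
    left_to_right = np.einsum("hij,hic->hjc", softmax(attn, axis=1), feat_left)
    # Residual fusion keeps each view's own features.
    return feat_left + right_to_left, feat_right + left_to_right

def pixel_shuffle(x, r):
    """Rearrange (H, W, C*r*r) features into an (H*r, W*r, C) image."""
    h, w, c_in = x.shape
    c = c_in // (r * r)
    x = x.reshape(h, w, c, r, r)      # split channels into (c, i, j)
    x = x.transpose(0, 3, 1, 4, 2)    # reorder to (h, i, w, j, c)
    return x.reshape(h * r, w * r, c)
```

Because the attention matrix is computed per row, its cost grows with W squared per row rather than with (H·W) squared, which is the practical benefit of restricting attention to the epipolar line.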

NAFSSR uses stochastic depth as a regularizer to reduce overfitting, which matters because training data for stereo tasks is limited. In addition, channel shuffle is used for data augmentation, and a test-time local statistics converter (TLSC) is applied to reduce the inconsistency between training and testing.
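The two training-time techniques can be sketched as follows. This is an illustration under stated assumptions, not the released training code: stochastic depth is shown in one common variant (skip the residual branch with probability `drop_prob` during training and rescale it when it is kept), and channel shuffle is read here as a random permutation of the RGB channels applied identically to both views. The names `stochastic_depth` and `channel_shuffle_pair` are hypothetical.

```python
import numpy as np

def stochastic_depth(x, block, drop_prob, training, rng):
    """Residual block that is randomly skipped during training."""
    if not training:
        return x + block(x)              # always use the block at test time
    if rng.random() < drop_prob:
        return x                         # skip the block: identity mapping
    # Rescale so the expected output matches the test-time output.
    return x + block(x) / (1.0 - drop_prob)

def channel_shuffle_pair(left, right, rng):
    """Apply one random channel permutation to both views, shape (H, W, 3)."""
    perm = rng.permutation(left.shape[-1])
    # Sharing the permutation keeps the two views photometrically consistent.
    return left[..., perm], right[..., perm]
```

Using the same permutation for the left and right images is a design choice of this sketch: permuting them independently would break the colour correspondence that cross-view fusion relies on.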

Numerical Results and Claims

The authors provide extensive experimental results demonstrating that NAFSSR outperforms contemporary state-of-the-art models across multiple datasets, including KITTI 2012, KITTI 2015, Middlebury, and Flickr1024. Notably, NAFSSR achieves a substantial reduction in parameter count—up to 79% compared to existing models—while delivering superior PSNR and SSIM scores. This parameter efficiency, coupled with speed improvements of up to 5.11 times over previous methods, highlights the framework's practical applicability.
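For readers unfamiliar with the metric, PSNR can be computed as in the sketch below. It shows only the standard definition; the evaluation protocol used for the reported numbers (border cropping, colour space, which view is scored) is not described in this summary, so the `data_range` default and the function name `psnr` are assumptions made for illustration.

```python
import numpy as np

def psnr(reference, estimate, data_range=1.0):
    """Peak signal-to-noise ratio in dB between two same-shaped images."""
    ref = np.asarray(reference, dtype=np.float64)
    est = np.asarray(estimate, dtype=np.float64)
    mse = np.mean((ref - est) ** 2)
    if mse == 0:
        return float("inf")              # identical images
    return 10.0 * np.log10(data_range ** 2 / mse)
```

Higher values indicate a reconstruction closer to the ground truth; because the scale is logarithmic, a gain of a few tenths of a dB is usually treated as meaningful in SR benchmarks.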

Implications and Future Directions

The implications of this research are multi-faceted. Practically, the proposed NAFSSR framework offers a more efficient and effective solution for stereo image super-resolution, with potential applications in fields such as autonomous driving, virtual reality, and robotics where stereo vision systems are prevalent.

Theoretically, this research shows how cross-attention can be adapted to stereo images by restricting it to horizontal epipolar lines, and that such an attention module combines well with a simple single-view backbone.

Future research could explore further optimizations of the SCAM to capture more intricate details across varying image disparities or investigate the application of NAFSSR to other tasks involving stereo or multi-view images. Additionally, scaling models and training techniques to accommodate larger datasets without sacrificing performance could be a valuable direction.

Overall, "NAFSSR: Stereo Image Super-Resolution Using NAFNet" provides a comprehensive and efficient approach to tackling the challenges of stereo image super-resolution, emphasizing both the simplicity and robustness necessary for practical deployment.

GitHub

GitHub - megvii-research/NAFNet: The state-of-the-art image restoration model without nonlinear activation functions. (2,045 stars)