
SGNet: Structure Guided Network via Gradient-Frequency Awareness for Depth Map Super-Resolution (2312.05799v3)

Published 10 Dec 2023 in cs.CV

Abstract: Depth super-resolution (DSR) aims to restore high-resolution (HR) depth from low-resolution (LR) one, where RGB image is often used to promote this task. Recent image guided DSR approaches mainly focus on spatial domain to rebuild depth structure. However, since the structure of LR depth is usually blurry, only considering spatial domain is not very sufficient to acquire satisfactory results. In this paper, we propose structure guided network (SGNet), a method that pays more attention to gradient and frequency domains, both of which have the inherent ability to capture high-frequency structure. Specifically, we first introduce the gradient calibration module (GCM), which employs the accurate gradient prior of RGB to sharpen the LR depth structure. Then we present the Frequency Awareness Module (FAM) that recursively conducts multiple spectrum differencing blocks (SDB), each of which propagates the precise high-frequency components of RGB into the LR depth. Extensive experimental results on both real and synthetic datasets demonstrate the superiority of our SGNet, reaching the state-of-the-art. Codes and pre-trained models are available at https://github.com/yanzq95/SGNet.


Summary

  • The paper introduces SGNet, a novel architecture that leverages gradient and frequency cues to enhance depth map super-resolution.
  • It employs a Gradient Calibration Module and a Frequency Awareness Module to embed high-frequency details and sharpen depth reconstruction.
  • Experiments on real and synthetic benchmarks demonstrate consistent improvements over prior state-of-the-art DSR methods, with applications in 3D reconstruction, VR, and AR.

Introduction

Depth map super-resolution (DSR) is widely used in applications such as 3D reconstruction, virtual reality, and augmented reality. Essentially, DSR aims to generate a high-resolution depth map from a low-resolution one. Standard approaches often leverage an accompanying high-resolution RGB image to guide the reconstruction of depth structure. However, because the structure of low-resolution depth is inherently blurry, focusing solely on the spatial domain often falls short. Recognizing this limitation, this work introduces an approach that additionally harnesses the gradient and frequency domains, both of which naturally capture high-frequency structural details.

Gradient and Frequency Learning

The proposed system, SGNet, includes two novel components: the Gradient Calibration Module (GCM) and the Frequency Awareness Module (FAM).

Gradient Domain

GCM utilizes the clear gradient features of a high-resolution RGB image to rectify and enhance the blurry structure of low-resolution depth maps. This process involves mapping both RGB and low-resolution depth images into a gradient domain and then applying the refined RGB gradient information to improve the depth structure. A gradient-aware loss function further sharpens the structure by reducing the discrepancy between the intermediate features of GCM and the target high-resolution depth in the gradient domain.
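To make the gradient-guidance idea concrete, here is a minimal NumPy sketch. It is an illustrative stand-in, not the paper's GCM: the actual module operates on learned features, whereas this toy version uses fixed Sobel operators and a hypothetical blending weight `alpha` to nudge an upsampled depth map toward the sharper RGB gradient magnitudes.

```python
import numpy as np

def sobel_grad(img):
    """Horizontal/vertical Sobel gradients via explicit 2D convolution
    (loop-based for clarity; fine for small illustrative images)."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float64)
    ky = kx.T
    pad = np.pad(img, 1, mode="edge")
    h, w = img.shape
    gx = np.zeros((h, w))
    gy = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            patch = pad[i:i + 3, j:j + 3]
            gx[i, j] = (patch * kx).sum()
            gy[i, j] = (patch * ky).sum()
    return gx, gy

def gradient_calibrate(depth_lr_up, rgb_gray, alpha=0.1):
    """Toy gradient calibration: add the residual between the RGB and
    depth gradient magnitudes back onto the depth map, so regions where
    RGB has sharp structure but depth is blurry get sharpened."""
    gx_d, gy_d = sobel_grad(depth_lr_up)
    gx_r, gy_r = sobel_grad(rgb_gray)
    residual = np.hypot(gx_r, gy_r) - np.hypot(gx_d, gy_d)
    return depth_lr_up + alpha * residual
```

The gradient-aware loss described above would then compare such gradient maps of the predicted and ground-truth depth, penalizing their discrepancy.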

Frequency Domain

FAM introduces a series of Spectrum Differencing Blocks (SDB), applied recursively to embed the high-frequency components of the RGB image into the low-resolution depth. Each block maps the RGB and depth features into frequency space, where the difference between their high-frequency spectra is emphasized and merged back to enhance the depth structure. A frequency-aware loss function further constrains FAM's prediction in the frequency domain.
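The spectrum-differencing idea can be sketched with a plain FFT. Again, this is a hypothetical simplification of an SDB: the real block works on learned feature maps with learned fusion, while this version uses a fixed radial mask (the `radius_frac` and `weight` parameters are assumptions) to move the depth map's high-frequency spectrum toward the RGB's by their spectral difference.

```python
import numpy as np

def spectrum_differencing(depth_feat, rgb_feat, weight=1.0, radius_frac=0.25):
    """Toy spectrum differencing block: blend the RGB high-frequency
    spectrum into the depth spectrum, leaving low frequencies (the
    coarse depth layout) untouched."""
    F_d = np.fft.fftshift(np.fft.fft2(depth_feat))
    F_r = np.fft.fftshift(np.fft.fft2(rgb_feat))
    h, w = depth_feat.shape
    yy, xx = np.ogrid[:h, :w]
    # Boolean mask selecting frequencies beyond a radius from the center.
    high = np.hypot(yy - h / 2, xx - w / 2) > radius_frac * min(h, w)
    # On the high-frequency band, add the spectral difference (RGB - depth),
    # scaled by `weight`; elsewhere keep the depth spectrum as-is.
    F_out = np.where(high, F_d + weight * (F_r - F_d), F_d)
    return np.fft.ifft2(np.fft.ifftshift(F_out)).real
```

With `weight=0` the block is an identity on the depth features; stacking several such blocks recursively, as FAM does, would repeatedly refine the injected high-frequency content.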

Related Work

DSR has been the subject of extensive research, with most methods operating in the spatial domain to exploit the rich structural information of RGB images for depth map reconstruction. However, previous methods have not fully explored the potential of gradient and frequency information. By comparison, SGNet stands out for its use of these domains to guide the structural recovery of depth maps.

Experiments and Results

Comprehensive testing on both real-world and synthetic datasets demonstrates that SGNet surpasses state-of-the-art methods, significantly improving depth map quality. Notably, on multiple benchmarks, SGNet improves upon the next-best methods by considerable margins.

Conclusion

SGNet proposes an advanced approach to depth map super-resolution by extending beyond the spatial domain to incorporate insights from gradient and frequency domains. It has shown notable performance gains over existing methods, highlighting the importance of multi-domain information exploitation in DSR tasks. With its publicly available codes and pre-trained models, SGNet is poised to be a valuable contribution to the research community.