
Self-Supervised Depth Completion Guided by 3D Perception and Geometry Consistency (2312.15263v1)

Published 23 Dec 2023 in cs.CV

Abstract: Depth completion, which aims to predict dense depth maps from sparse depth measurements, plays a crucial role in many computer vision applications. Deep learning approaches have demonstrated remarkable success in this task. However, high-precision depth completion without relying on ground-truth data, which is usually costly to obtain, remains challenging. The reason lies in the neglect of 3D structural information in most previous unsupervised solutions, causing inaccurate spatial propagation and mixed-depth problems. To alleviate these challenges, this paper explores the use of 3D perceptual features and multi-view geometry consistency to devise a high-precision self-supervised depth completion method. Firstly, a 3D perceptual spatial propagation algorithm is constructed with a point cloud representation and an attention weighting mechanism to capture more reasonable and favorable neighboring features during the iterative depth propagation process. Secondly, multi-view geometric constraints between adjacent views are explicitly incorporated to guide the optimization of the whole depth completion model in a self-supervised manner. Extensive experiments on the NYU-Depthv2 and VOID benchmark datasets demonstrate that the proposed model achieves state-of-the-art depth completion performance compared with other unsupervised methods, and competitive performance compared with previous supervised methods.
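
To make the second component concrete, the sketch below shows a generic multi-view photometric consistency loss of the kind commonly used in self-supervised depth training: the predicted dense depth and a relative camera pose warp an adjacent view into the current view, and the photometric difference supervises the depth completion network. This is a minimal PyTorch sketch under assumed pinhole intrinsics; the function name, tensor layout, and the plain L1 penalty are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def photometric_consistency_loss(depth, img_cur, img_adj, K, K_inv, T_cur_to_adj):
    """Hypothetical self-supervised loss: warp the adjacent view into the
    current view using the predicted dense depth and relative pose, then
    penalize photometric differences.
    Shapes: depth (B,1,H,W); images (B,3,H,W); K, K_inv (B,3,3);
    T_cur_to_adj (B,3,4) = [R | t] mapping current-frame points to the adjacent frame."""
    B, _, H, W = depth.shape

    # Homogeneous pixel grid of the current view: (B, 3, H*W).
    ys, xs = torch.meshgrid(
        torch.arange(H, device=depth.device, dtype=depth.dtype),
        torch.arange(W, device=depth.device, dtype=depth.dtype),
        indexing="ij")
    pix = torch.stack([xs, ys, torch.ones_like(xs)], dim=0).view(1, 3, -1).expand(B, -1, -1)

    # Back-project pixels to 3D points in the current camera frame.
    cam_pts = (K_inv @ pix) * depth.view(B, 1, -1)

    # Rigidly transform into the adjacent camera frame and project.
    R, t = T_cur_to_adj[:, :, :3], T_cur_to_adj[:, :, 3:]
    proj = K @ (R @ cam_pts + t)
    uv = proj[:, :2] / proj[:, 2:].clamp(min=1e-6)

    # Normalize coordinates to [-1, 1] and sample the adjacent image.
    u = 2.0 * uv[:, 0] / (W - 1) - 1.0
    v = 2.0 * uv[:, 1] / (H - 1) - 1.0
    grid = torch.stack([u, v], dim=-1).view(B, H, W, 2)
    warped = F.grid_sample(img_adj, grid, padding_mode="border", align_corners=True)

    # L1 photometric error between the warped adjacent view and the current view.
    return (warped - img_cur).abs().mean()
```

For intuition about the first component, a minimal sketch of attention-weighted iterative depth propagation follows. Note that the paper's propagation selects neighbors from a 3D point cloud representation, whereas this simplified version uses a fixed 2D 3x3 neighborhood; all names and shapes here are hypothetical.

```python
import torch
import torch.nn.functional as F

def attention_weighted_propagation(depth, affinity, num_iters=6):
    """Hypothetical iterative refinement: each pixel's depth becomes an
    attention-weighted average of its 3x3 neighborhood.
    depth: (B,1,H,W); affinity: (B,9,H,W) raw scores for the 9 neighbors."""
    B, _, H, W = depth.shape
    weights = F.softmax(affinity, dim=1)  # per-pixel attention over the 9 neighbors
    for _ in range(num_iters):
        # Gather every pixel's 3x3 neighborhood: (B, 9, H, W).
        neighbors = F.unfold(depth, kernel_size=3, padding=1).view(B, 9, H, W)
        # Attention-weighted combination of neighboring depths.
        depth = (weights * neighbors).sum(dim=1, keepdim=True)
    return depth
```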

Authors (4)
  1. Yu Cai (45 papers)
  2. Tianyu Shen (6 papers)
  3. Shi-Sheng Huang (9 papers)
  4. Hua Huang (70 papers)
