Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
98 tokens/sec
GPT-4o
8 tokens/sec
Gemini 2.5 Pro Pro
47 tokens/sec
o3 Pro
5 tokens/sec
GPT-4.1 Pro
38 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

DS-NeRV: Implicit Neural Video Representation with Decomposed Static and Dynamic Codes (2403.15679v1)

Published 23 Mar 2024 in cs.CV and cs.MM

Abstract: Implicit neural representations for video (NeRV) have recently become a novel way for high-quality video representation. However, existing works employ a single network to represent the entire video, which implicitly confuse static and dynamic information. This leads to an inability to effectively compress the redundant static information and lack the explicitly modeling of global temporal-coherent dynamic details. To solve above problems, we propose DS-NeRV, which decomposes videos into sparse learnable static codes and dynamic codes without the need for explicit optical flow or residual supervision. By setting different sampling rates for two codes and applying weighted sum and interpolation sampling methods, DS-NeRV efficiently utilizes redundant static information while maintaining high-frequency details. Additionally, we design a cross-channel attention-based (CCA) fusion module to efficiently fuse these two codes for frame decoding. Our approach achieves a high quality reconstruction of 31.2 PSNR with only 0.35M parameters thanks to separate static and dynamic codes representation and outperforms existing NeRV methods in many downstream tasks. Our project website is at https://haoyan14.github.io/DS-NeRV.

Definition Search Book Streamline Icon: https://streamlinehq.com
References (55)
  1. Cisco: Service provider solutions. https://www.cisco.com/c/en/us/solutions/service-provider/index.html#~products-and-solutions.
  2. 2023 global internet phenomena report. https://www.sandvine.com/phenomena.
  3. Scale-space flow for end-to-end optimized video compression. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 8503–8512, 2020.
  4. Ps-nerv: Patch-wise stylized neural representations for videos. In 2023 IEEE International Conference on Image Processing (ICIP), pages 41–45. IEEE, 2023.
  5. Mip-nerf: A multiscale representation for anti-aliasing neural radiance fields. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 5855–5864, 2021.
  6. Nope-nerf: Optimising neural radiance field with no pose prior. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4160–4169, 2023.
  7. Optimizing the latent space of generative networks. arXiv preprint arXiv:1707.05776, 2017.
  8. Tensorf: Tensorial radiance fields. In European Conference on Computer Vision, pages 333–350. Springer, 2022a.
  9. Nerv: Neural representations for videos. Advances in Neural Information Processing Systems, 34:21557–21568, 2021.
  10. Cnerv: Content-adaptive neural representation for visual data. arXiv preprint arXiv:2211.10421, 2022b.
  11. Hnerv: A hybrid neural representation for videos. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10270–10279, 2023.
  12. Deepcoder: A deep neural network based video compression. In IEEE Visual Communications and Image Processing Conference, 2017.
  13. An overview of core coding tools in the av1 video codec. In 2018 picture coding symposium (PCS), pages 41–45. IEEE, 2018.
  14. Learning for video compression. IEEE Transactions on Circuits and Systems for Video Technology, 30(2):566–576, 2019.
  15. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929, 2020.
  16. Neural radiance flow for 4d view synthesis and video processing. In 2021 IEEE/CVF International Conference on Computer Vision (ICCV), pages 14304–14314. IEEE Computer Society, 2021.
  17. K-planes: Explicit radiance fields in space, time, and appearance. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 12479–12488, 2023.
  18. Towards scalable neural representation for diverse videos. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 6132–6142, 2023.
  19. Few-view object reconstruction with unknown categories and camera poses. arXiv preprint arXiv:2212.04492, 2022.
  20. Scalable neural video representations with learnable positional features. Advances in Neural Information Processing Systems, 35:12718–12731, 2022.
  21. Ffnerv: Flow-guided frame-wise neural representations for videos. arXiv preprint arXiv:2212.12294, 2022.
  22. D LEGALL. A video compression standard for multimedia applications. Commun. ACM, 34:226–252, 1993.
  23. Deep contextual video compression. Advances in Neural Information Processing Systems, 34:18114–18125, 2021.
  24. Neural 3d video synthesis from multi-view video. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5521–5531, 2022a.
  25. E-nerv: Expedite neural video representation with disentangled spatial-temporal context. In European Conference on Computer Vision, pages 267–284. Springer, 2022b.
  26. M-lvc: Multiple frames prediction for learned video compression. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 3546–3554, 2020.
  27. Cu partition mode decision for hevc hardwired intra encoder using convolution neural network. IEEE Transactions on Image Processing, 25(11):5088–5103, 2016.
  28. Dvc: An end-to-end deep video compression framework. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 11006–11015, 2019.
  29. Video frame interpolation with transformer. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 3532–3542, 2022.
  30. Nirvana: Neural implicit representations of videos with adaptive networks and autoregressive patch-wise modeling. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 14378–14387, 2023.
  31. Nerf in the wild: Neural radiance fields for unconstrained photo collections. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 7210–7219, 2021.
  32. Uvg dataset: 50/120fps 4k sequences for video codec analysis and development. In Proceedings of the 11th ACM Multimedia Systems Conference, pages 297–302, 2020.
  33. Occupancy networks: Learning 3d reconstruction in function space. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 4460–4470, 2019.
  34. Nerf: Representing scenes as neural radiance fields for view synthesis. Communications of the ACM, 65(1):99–106, 2021.
  35. Instant neural graphics primitives with a multiresolution hash encoding. ACM Transactions on Graphics (ToG), 41(4):1–15, 2022.
  36. Deepsdf: Learning continuous signed distance functions for shape representation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 165–174, 2019.
  37. A benchmark dataset and evaluation methodology for video object segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 724–732, 2016.
  38. D-nerf: Neural radiance fields for dynamic scenes. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10318–10327, 2021.
  39. Elf-vc: Efficient learned flexible-rate video coding. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 14479–14488, 2021.
  40. Ton Roosendaal. Big buck bunny. In ACM SIGGRAPH ASIA 2008 computer animation festival, pages 62–62. 2008.
  41. Implicit neural representations with periodic activation functions. Advances in neural information processing systems, 33:7462–7473, 2020.
  42. Neural network-based arithmetic coding of intra prediction modes in hevc, 2017.
  43. Overview of the high efficiency video coding (hevc) standard. IEEE Transactions on circuits and systems for video technology, 22(12):1649–1668, 2012.
  44. Block-nerf: Scalable large scene neural view synthesis. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 8248–8258, 2022.
  45. Non-rigid neural radiance fields: Reconstruction and novel view synthesis of a dynamic scene from monocular video. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 12959–12970, 2021.
  46. Multiscale structural similarity for image quality assessment. In The Thrity-Seventh Asilomar Conference on Signals, Systems & Computers, 2003, pages 1398–1402. Ieee, 2003.
  47. Overview of the h. 264/avc video coding standard. IEEE Transactions on circuits and systems for video technology, 13(7):560–576, 2003.
  48. Cbam: Convolutional block attention module. In Proceedings of the European conference on computer vision (ECCV), pages 3–19, 2018.
  49. Video compression through image interpolation. In Proceedings of the European Conference on Computer Vision (ECCV), 2018.
  50. Logan: Latent optimisation for generative adversarial networks. arXiv preprint arXiv:1912.00953, 2019.
  51. Adan: Adaptive nesterov momentum algorithm for faster optimizing deep models. arXiv preprint arXiv:2208.06677, 2022.
  52. Learning for video compression with hierarchical quality and recurrent enhancement. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 6628–6637, 2020.
  53. Extracting motion and appearance via inter-frame attention for efficient video frame interpolation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5682–5692, 2023.
  54. Dnerv: Modeling inherent dynamics via difference neural representation for videos. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 2031–2040, 2023.
  55. Cross-view transformers for real-time map-view semantic segmentation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 13760–13769, 2022.
Citations (2)

Summary

We haven't generated a summary for this paper yet.

X Twitter Logo Streamline Icon: https://streamlinehq.com