GaussianHead: High-fidelity Head Avatars with Learnable Gaussian Derivation (2312.01632v4)

Published 4 Dec 2023 in cs.CV

Abstract: Constructing vivid 3D head avatars for given subjects and realizing a series of animations on them is valuable yet challenging. This paper presents GaussianHead, which models the actional human head with anisotropic 3D Gaussians. In our framework, a motion deformation field and multi-resolution tri-plane are constructed respectively to deal with the head's dynamic geometry and complex texture. Notably, we impose an exclusive derivation scheme on each Gaussian, which generates its multiple doppelgangers through a set of learnable parameters for position transformation. With this design, we can compactly and accurately encode the appearance information of Gaussians, even those fitting the head's particular components with sophisticated structures. In addition, an inherited derivation strategy for newly added Gaussians is adopted to facilitate training acceleration. Extensive experiments show that our method can produce high-fidelity renderings, outperforming state-of-the-art approaches in reconstruction, cross-identity reenactment, and novel view synthesis tasks. Our code is available at: https://github.com/chiehwangs/gaussian-head.


Summary

  • The paper introduces GaussianHead, which models dynamic head geometry with anisotropic 3D Gaussian primitives and a novel learnable derivation scheme for each Gaussian.
  • It employs a motion deformation field and hierarchical radiance decoding to capture facial expressions and deliver vivid view-dependent colors.
  • Experimental evaluations demonstrate improved metrics and detailed reconstructions, outperforming methods like NeRFBlendShape and PointAvatar.

High-fidelity Head Avatars with Learnable Gaussian Derivation: A Technical Examination

The paper "GaussianHead: High-fidelity Head Avatars with Learnable Gaussian Derivation" presents a framework for modeling dynamic, high-fidelity 3D head avatars built around anisotropic 3D Gaussian primitives. The work advances head avatar construction by coupling geometric and appearance-based representations, and demonstrates strong results on self-reconstruction, cross-identity reenactment, and novel-view synthesis.

Technical Contributions

The principal contribution of the work is the use of anisotropic 3D Gaussians as the geometric representation of the head avatar. Because each Gaussian can deform to follow head motion, this representation sidesteps limitations of prior approaches based on signed distance fields or point clouds. The framework comprises three components:

  1. Motion Deformation Field: This component deforms the canonical-space Gaussians so that head dynamics captured from a monocular video can be expressed. By conditioning on expression parameters derived from facial movements, it models the dynamic geometry of the head.
  2. Learnable Gaussian Derivation: To counter the feature dilution that arises from axis-aligned mappings in explicit data structures, each core Gaussian generates multiple derived copies ("doppelgangers") through learnable rotational transforms before querying the multi-resolution tri-plane. Fusing the features sampled at these derived positions yields a precise encoding of even the head's most intricate structures and textures.
  3. Hierarchical Radiance Decoding: Compact MLPs decode the fused features into opacity and spherical harmonic coefficients, yielding detailed, view-dependent color and vivid, realistic renderings of facial features.
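The derivation-and-fusion step above can be illustrated with a minimal PyTorch sketch. This is an illustrative reconstruction, not the authors' code: the number of derivations, the rotations shared across Gaussians (rather than per-Gaussian), and the single-resolution planes are simplifying assumptions made here for brevity.

```python
import torch
import torch.nn.functional as F

def quaternion_to_matrix(q):
    """Convert a batch of quaternions (K, 4) to rotation matrices (K, 3, 3)."""
    q = F.normalize(q, dim=-1)
    w, x, y, z = q.unbind(-1)
    return torch.stack([
        1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y),
        2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x),
        2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y),
    ], dim=-1).reshape(-1, 3, 3)

class LearnableDerivation(torch.nn.Module):
    """Rotate each Gaussian center into K derived positions ("doppelgangers"),
    sample axis-aligned feature planes at each, and average the results."""
    def __init__(self, num_derivations=4, feat_dim=16, res=64):
        super().__init__()
        # one learnable rotation (as a quaternion) per derivation, near identity
        self.quats = torch.nn.Parameter(
            torch.randn(num_derivations, 4) * 0.01
            + torch.tensor([1.0, 0.0, 0.0, 0.0]))
        # three axis-aligned planes (xy, xz, yz), a single-resolution tri-plane
        self.planes = torch.nn.Parameter(torch.randn(3, feat_dim, res, res) * 0.1)

    def sample_plane(self, plane, uv):
        """Bilinearly sample one plane at uv in [-1, 1]^2; (N, 2) -> (N, C)."""
        grid = uv.view(1, -1, 1, 2)
        feats = F.grid_sample(plane.unsqueeze(0), grid, align_corners=True)
        return feats.squeeze(0).squeeze(-1).t()

    def forward(self, xyz):
        """xyz: canonical Gaussian centers in [-1, 1]^3, shape (N, 3)."""
        R = quaternion_to_matrix(self.quats)           # (K, 3, 3)
        derived = torch.einsum('kij,nj->kni', R, xyz)  # (K, N, 3) derived positions
        feats = []
        for pts in derived:  # sample the tri-plane at each derived position
            f = (self.sample_plane(self.planes[0], pts[:, [0, 1]])
                 + self.sample_plane(self.planes[1], pts[:, [0, 2]])
                 + self.sample_plane(self.planes[2], pts[:, [1, 2]]))
            feats.append(f)
        return torch.stack(feats).mean(0)              # (N, C) fused feature
```

The fused per-Gaussian feature would then feed the radiance-decoding MLPs; in the paper, the rotations are optimized jointly with the rest of the model so each Gaussian's derivations spread its queries across the planes rather than collapsing onto one axis-aligned projection.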

Experimental Evaluation

Extensive evaluations demonstrate that GaussianHead surpasses contemporary methods such as NeRFBlendShape, PointAvatar, and INSTA, producing visually coherent and quantitatively superior results. The method is notably effective at retaining intricate details, including hair strands and skin texture, while reconstructing complex expressions and head poses.

GaussianHead's learnable derivation and encoding strategy mitigates the "feature dilution" issue prevalent in axis-aligned data representations. Empirically, this translates into consistent gains in L1 error, PSNR, SSIM, and LPIPS over existing benchmarks.
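For reference, PSNR, the headline metric here, is a simple function of the mean squared error; a minimal implementation for images normalized to [0, 1]:

```python
import torch

def psnr(pred: torch.Tensor, target: torch.Tensor,
         max_val: float = 1.0) -> torch.Tensor:
    """Peak signal-to-noise ratio in decibels; higher is better."""
    mse = torch.mean((pred - target) ** 2)
    return 10.0 * torch.log10(torch.tensor(max_val) ** 2 / mse)
```

Note that L1 and LPIPS are error metrics (lower is better), while PSNR and SSIM are similarity metrics (higher is better), so "improvement" runs in opposite directions across the four.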

Implications and Future Directions

The advancements presented in this paper have palpable implications across numerous domains, including virtual reality, telecommunications, and digital simulation. The fidelity with which GaussianHead captures highly dynamic and finely detailed features offers the potential for more immersive and personalized virtual interactions.

Looking forward, the research opens several pathways for future exploration. Among these, the disentanglement of head and torso movements within the framework could be explored for even finer motion control. Additionally, further optimization could be directed towards real-time applications, reducing computational overhead while maintaining quality.

In summary, GaussianHead provides a substantive contribution to the domain of 3D head modeling by intersecting cutting-edge computer graphics methodologies with flexible and scalable machine learning-driven architectures. This fusion yields a powerful tool capable of rendering high-fidelity avatars for both reconstructive and generative digital experiences.