TrAME: Trajectory-Anchored Multi-View Editing for Text-Guided 3D Gaussian Splatting Manipulation (2407.02034v2)
Abstract: Despite significant strides in the field of 3D scene editing, current methods encounter substantial challenges, particularly in preserving 3D consistency during the multi-view editing process. To tackle this challenge, we propose a progressive 3D editing strategy that ensures multi-view consistency via a Trajectory-Anchored Scheme (TAS) with a dual-branch editing mechanism. Specifically, TAS facilitates a tightly coupled iterative process between 2D view editing and 3D updating, preventing the accumulation of errors arising from the text-to-image process. Additionally, we explore the relationship between optimization-based methods and reconstruction-based methods, offering a unified perspective for selecting superior design choices and supporting the rationale behind the designed TAS. We further present a tuning-free View-Consistent Attention Control (VCAC) module that leverages cross-view semantic and geometric references from the source branch to yield aligned views from the target branch during the editing of 2D views. To validate the effectiveness of our method, we analyze 2D examples to demonstrate the improved consistency achieved with the VCAC module. Further extensive quantitative and qualitative results in text-guided 3D scene editing indicate that our method achieves superior editing quality compared to state-of-the-art methods. We will make the complete codebase publicly available following the conclusion of the review process.
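The tightly coupled alternation between 2D view editing and 3D updating described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the scene is reduced to a 1D NumPy vector, each "view" is a window onto it, and `render`, `edit_2d`, `update_3d`, and `progressive_edit` are hypothetical stand-ins for a renderer, a text-guided 2D editor, and a 3D update step. The only point it makes is that each view along the trajectory is rendered from the already-updated scene, so overlapping regions inherit earlier edits.

```python
import numpy as np

def render(scene, view):
    """Toy 'render' (assumption): a view sees a contiguous window of the scene."""
    start, stop = view
    return scene[start:stop].copy()

def edit_2d(image, target):
    """Hypothetical stand-in for a text-guided 2D editor: move halfway to the target."""
    return image + 0.5 * (target - image)

def update_3d(scene, view, edited):
    """Hypothetical stand-in for a 3D update: write the edited view back into the scene."""
    start, stop = view
    scene[start:stop] = edited
    return scene

def progressive_edit(scene, trajectory, target, rounds=1):
    """Alternate 2D editing and 3D updating, one view at a time along the trajectory."""
    scene = scene.copy()  # leave the caller's scene untouched
    for _ in range(rounds):
        for view in trajectory:
            image = render(scene, view)             # render from the *current* scene
            edited = edit_2d(image, target)         # edit that single view
            scene = update_3d(scene, view, edited)  # update 3D before the next view
    return scene

scene0 = np.zeros(8)
trajectory = [(0, 4), (2, 6), (4, 8)]  # overlapping toy views
edited_scene = progressive_edit(scene0, trajectory, target=1.0)
print(edited_scene)
```

In this toy setup, regions shared by consecutive views start the next 2D edit from the previous result instead of from the unedited scene, which is the coupling the abstract attributes to TAS; the actual editor, representation, and update rule in the paper are not specified here.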