
FurniScene: A Large-scale 3D Room Dataset with Intricate Furnishing Scenes (2401.03470v2)

Published 7 Jan 2024 in cs.CV and cs.AI

Abstract: Indoor scene generation has attracted significant attention recently as it is crucial for applications of gaming, virtual reality, and interior design. Current indoor scene generation methods can produce reasonable room layouts but often lack diversity and realism. This is primarily due to the limited coverage of existing datasets, including only large furniture without tiny furnishings in daily life. To address these challenges, we propose FurniScene, a large-scale 3D room dataset with intricate furnishing scenes from interior design professionals. Specifically, the FurniScene consists of 11,698 rooms and 39,691 unique furniture CAD models with 89 different types, covering things from large beds to small teacups on the coffee table. To better suit fine-grained indoor scene layout generation, we introduce a novel Two-Stage Diffusion Scene Model (TSDSM) and conduct an evaluation benchmark for various indoor scene generation based on FurniScene. Quantitative and qualitative evaluations demonstrate the capability of our method to generate highly realistic indoor scenes. Our dataset and code will be publicly available soon.


Summary

  • The paper introduces a novel two-stage diffusion model (TSDSM) that leverages the FurniScene dataset to generate realistic indoor scenes.
  • The dataset comprises 11,698 rooms and 39,691 unique furniture models across 89 categories, significantly enhancing scene diversity and detail.
  • Benchmarking results indicate that TSDSM outperforms previous methods, improving metrics such as FID, KID, and scene classification accuracy.

FurniScene: A Comprehensive Analysis of a Large-Scale 3D Room Dataset

Introduction

The paper "FurniScene: A Large-scale 3D Room Dataset with Intricate Furnishing Scenes" presents a significant advancement in the domain of indoor scene generation by introducing the FurniScene dataset. This dataset is designed to support applications in gaming, virtual reality, and interior design by providing a diverse and realistic collection of 3D room models enriched with intricate furnishings. The authors address the limitations of existing datasets in terms of diversity and realism by emphasizing smaller decorative objects that are often overlooked. A new generative model, the Two-Stage Diffusion Scene Model (TSDSM), is proposed to leverage this comprehensive dataset effectively.

Data Collection and Dataset Characteristics

The FurniScene dataset is built through a multi-step data collection pipeline, outlined in Figure 1.

Figure 1: The pipeline of building FurniScene. Our data collection framework consists of purchasing SketchUp scenes, extracting CAD models, rendering and labeling in UE, performing data augmentation, and generating point clouds.

  1. Data Acquisition: Raw SketchUp models are sourced from interior designers and processed in 3DMax to ensure geometric accuracy and semantic richness.
  2. Data Augmentation: Enhancements such as rotation, deletion, and replacement ensure a wide variety of layouts.
  3. Point Cloud Generation: Mesh data is converted into point clouds to facilitate various applications, such as 3D semantic segmentation.
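The point-cloud generation step (item 3 above) can be sketched with standard area-weighted surface sampling. This is a generic technique, not the paper's released code; the function name and defaults are illustrative.

```python
import numpy as np

def sample_point_cloud(vertices, faces, n_points=2048, seed=0):
    """Uniformly sample a point cloud from a triangle mesh.

    Triangles are chosen with probability proportional to their area,
    then a point is placed inside each chosen triangle via barycentric
    coordinates (the square-root trick keeps the distribution uniform).
    """
    rng = np.random.default_rng(seed)
    tris = vertices[faces]                                   # (F, 3, 3)
    # Triangle areas from the cross product of two edge vectors.
    cross = np.cross(tris[:, 1] - tris[:, 0], tris[:, 2] - tris[:, 0])
    areas = 0.5 * np.linalg.norm(cross, axis=1)
    idx = rng.choice(len(faces), size=n_points, p=areas / areas.sum())
    # Uniform barycentric weights for each sampled triangle.
    u, v = rng.random(n_points), rng.random(n_points)
    su = np.sqrt(u)
    w0, w1, w2 = 1.0 - su, su * (1.0 - v), su * v
    t = tris[idx]
    return w0[:, None] * t[:, 0] + w1[:, None] * t[:, 1] + w2[:, None] * t[:, 2]

# Example: sample 1024 points from a single triangle in the z = 0 plane.
verts = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
faces = np.array([[0, 1, 2]])
pts = sample_point_cloud(verts, faces, n_points=1024)
print(pts.shape)  # (1024, 3)
```

The same routine applied per scene, with per-point semantic labels inherited from the source CAD model, yields point clouds suitable for 3D semantic segmentation.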

The dataset contains 11,698 rooms with 39,691 unique high-quality furniture CAD models spanning 89 object types, making it notably more comprehensive than previous datasets such as 3D-FRONT.

Indoor Scene Generation Methodology

The proposed Two-Stage Diffusion Scene Model (TSDSM) is a generative framework designed to enhance the realism and diversity of generated indoor scenes; its architecture is summarized in Figure 2.

Figure 2: Model architecture. First, FLGM generates a furniture list from the text prompt. FRS then uses this list to retrieve furniture models, which LGM takes as input to generate the layout information.

  1. Furniture List Generation: This stage employs a diffusion model to progressively generate a furniture list from a text prompt, detailing the size and category of each object.
  2. Layout Generation: Utilizing the furniture list, a separate model generates detailed layout information, optimizing the placement of furniture and ensuring cohesiveness in the scene.
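The two stages above can be sketched as a minimal pipeline skeleton. Everything here is illustrative, not the paper's API: the diffusion sampling in each stage is replaced by random draws so the structure stays runnable, and the category vocabulary is a toy stand-in for the dataset's 89 types.

```python
import numpy as np

CATEGORIES = ["bed", "nightstand", "lamp", "teacup"]  # toy vocabulary

def generate_furniture_list(prompt, n_items=4, seed=0):
    """Stage 1 stand-in (FLGM): emit (category, size) entries for a scene."""
    rng = np.random.default_rng(seed)
    return [{"category": CATEGORIES[int(rng.integers(len(CATEGORIES)))],
             "size": rng.uniform(0.05, 2.0, size=3).round(2).tolist()}
            for _ in range(n_items)]

def generate_layout(furniture_list, room=(4.0, 4.0), seed=0):
    """Stage 2 stand-in (LGM): assign each item a 2D position and a yaw angle."""
    rng = np.random.default_rng(seed)
    return [{**item,
             "position": rng.uniform([0.0, 0.0], room).round(2).tolist(),
             "yaw": round(float(rng.uniform(0.0, 2.0 * np.pi)), 2)}
            for item in furniture_list]

scene = generate_layout(generate_furniture_list("a cozy bedroom"))
print(len(scene), sorted(scene[0]))  # 4 ['category', 'position', 'size', 'yaw']
```

Decoupling the stages means the layout model conditions only on the furniture list, so either component can be retrained or swapped independently.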

The TSDSM's architecture emphasizes modularity and adaptability, allowing for efficient scaling and application to various contexts beyond the initial benchmarks.

Benchmarking and Evaluation

The paper benchmarks several indoor scene generation methods on the FurniScene dataset. The results demonstrate superior performance of TSDSM over existing methods on metrics such as FID, KID, and scene classification accuracy (Table 1).

Figure 3: Qualitative results of unconditional scene generation. The top row represents the generated results for bedrooms, the middle row for living rooms, and the bottom row for dining rooms.
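Of the reported metrics, FID measures the Fréchet distance between Gaussians fitted to real and generated feature distributions. The sketch below uses a simplified diagonal-covariance variant to avoid the matrix square root of the full formula (Heusel et al., 2017); it is a didactic approximation, not the evaluation code used in the paper.

```python
import numpy as np

def fid_diagonal(feats_a, feats_b):
    """FID between two feature sets under a diagonal-covariance assumption.

    With diagonal covariances the general formula
        FID = ||mu_a - mu_b||^2 + Tr(C_a + C_b - 2 (C_a C_b)^(1/2))
    reduces to
        FID = ||mu_a - mu_b||^2 + sum_i (sqrt(var_a_i) - sqrt(var_b_i))^2.
    """
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    var_a, var_b = feats_a.var(axis=0), feats_b.var(axis=0)
    return float(np.sum((mu_a - mu_b) ** 2)
                 + np.sum((np.sqrt(var_a) - np.sqrt(var_b)) ** 2))

# Sanity check on synthetic features: a shifted distribution scores worse.
rng = np.random.default_rng(0)
real = rng.normal(0.0, 1.0, size=(1000, 64))
fake = rng.normal(0.5, 1.0, size=(1000, 64))
score_same = fid_diagonal(real, real.copy())   # ~0: identical statistics
score_diff = fid_diagonal(real, fake)          # large: means differ by 0.5 per dim
print(score_same < score_diff)  # True
```

KID replaces the Gaussian assumption with an unbiased polynomial-kernel MMD estimate, which behaves better for small sample sizes.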

Implications and Future Work

The implications of FurniScene are multifaceted:

  • Enhanced Realism: The inclusion of detailed decorative items bridges the gap between synthetic and real-world scenes, offering better model training for applications such as autonomous interior design and realistic virtual environments.
  • Research Opportunities: The dataset and associated benchmarks provide a foundation for developing and evaluating new generative models, particularly those focused on high-fidelity scene synthesis.
  • Scalability and Generalization: The modular TSDSM enables flexible adaptation to various generative tasks beyond room layout generation.

Conclusion

FurniScene addresses a critical gap in the field of 3D indoor scene datasets by providing an unprecedented level of detail and diversity. Coupled with the introduction of TSDSM, it sets a new standard for indoor scene generation research, enhancing the capability to produce realistic and versatile virtual environments. Future developments could explore the integration of dynamic elements into the dataset and model to simulate evolving environments for more complex applications.
