- The paper introduces SDR, a novel method that integrates contextual structure into synthetic data generation to improve object detection.
- The methodology employs probabilistic models with global parameters and context splines to realistically simulate environments for robust car detection.
- Experiments on KITTI show that detectors trained on SDR data outperform those trained on other synthetic datasets and are competitive with real data from other domains. SDR pretraining also helps when real labeled data is limited.
Structured Domain Randomization: Bridging the Reality Gap by Context-Aware Synthetic Data
Introduction
Structured domain randomization (SDR) improves the generation of synthetic data for training deep neural networks by incorporating contextual structure into the randomization process. Traditional domain randomization (DR) places objects and distractors according to uniform probability distributions, without regard to the scene context. SDR instead uses problem-specific probability distributions, so that randomization is context-aware: for example, cars are placed on road lanes rather than anywhere in the image. The approach is demonstrated for 2D bounding box car detection, where training on synthetic data alone achieves competitive results on real datasets such as KITTI.
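For reference, the context-free placement used by traditional DR can be sketched in a few lines. This is a simplified illustration, not the original DR implementation. It works in normalized 2D image coordinates rather than a 3D scene, and the names (`Placement`, `sample_dr_scene`) and the count and scale ranges are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass
class Placement:
    kind: str     # "car" or "distractor"
    x: float      # horizontal position, normalized to [0, 1]
    y: float      # vertical position, normalized to [0, 1]
    scale: float  # apparent size relative to image height (hypothetical range)


def sample_dr_scene(rng: random.Random, max_cars: int = 10,
                    max_distractors: int = 20) -> list[Placement]:
    """Context-free DR: every placement is uniform and ignores the scene."""
    placements = []
    for kind, max_count in (("car", max_cars), ("distractor", max_distractors)):
        for _ in range(rng.randint(0, max_count)):
            placements.append(Placement(
                kind=kind,
                x=rng.uniform(0.0, 1.0),
                y=rng.uniform(0.0, 1.0),  # nothing stops a car from floating in the sky
                scale=rng.uniform(0.05, 0.5),
            ))
    return placements


scene = sample_dr_scene(random.Random(0))
print(len(scene), "placements")
```

Nothing here ties a car's position to a road. That missing dependency is what SDR adds.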
Methodology
SDR generates synthetic images by sampling from a probabilistic model with three layers: global parameters, context splines, and objects. As depicted in Figure 1, a scenario (urban, suburban, or rural) is first chosen at random. Global parameters such as road geometry and lighting conditions are then sampled.
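Read top to bottom, this hierarchy corresponds to a factorized joint distribution. One way to write it, following the dependency order described for Figure 1, is shown below. The symbols are chosen here for illustration: scenario $s$, global parameters $g$, context splines $c_1, \dots, c_{n_c}$, objects $o_1, \dots, o_{n_o}$, rendered image $I$, and $k(j)$, the index of the spline that object $j$ is placed on. The choice of $k(j)$ is left implicit.

```latex
p\left(I, s, g, c_{1:n_c}, o_{1:n_o}\right)
  = p\left(I \mid s, g, c_{1:n_c}, o_{1:n_o}\right)
    \prod_{j=1}^{n_o} p\left(o_j \mid c_{k(j)}\right)
    \prod_{i=1}^{n_c} p\left(c_i \mid g\right)\,
    p\left(g \mid s\right)\, p(s)
```

Sampling proceeds from right to left. First $s$ is drawn, then $g$ given $s$, then each spline given $g$, then each object given its spline. The first factor stands for rendering the image from the sampled scene. Under DR, by contrast, each $o_j$ would be drawn without conditioning on any $c_i$.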
Figure 1: In Structured Domain Randomization (SDR), a scenario is chosen at random, then global parameters, such as road curvature and lighting, generate context splines, upon which objects like cars and pedestrians are placed.
The context splines define the structure of the scene, such as road lanes and sidewalks, and objects are placed along them. Because placement respects this structure, a neural network can learn from both the appearance of objects and their contextual environment.
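The sequence of scenario, global parameters, context splines and objects can be sketched as a small sampler. This is a hypothetical illustration, not the authors' generator. It models the road as a constant-curvature arc in a top-down 2D frame and assumes right-hand traffic. Every name (`GlobalParams`, `ContextSpline`, `sample_scene`), range, and density below is an assumption. Cars are placed on lane splines, facing along the spline tangent with a minimum gap between them. Pedestrians are placed on sidewalk splines.

```python
import math
import random
from dataclasses import dataclass

SCENARIOS = ("urban", "suburban", "rural")
# Hypothetical range of lanes per direction for each scenario.
LANES_PER_SIDE = {"urban": (1, 3), "suburban": (1, 2), "rural": (1, 1)}


@dataclass
class GlobalParams:
    scenario: str
    curvature: float      # road curvature in 1/m; 0 means a straight road
    lanes_per_side: int
    lane_width: float     # metres
    sun_elevation: float  # degrees; a stand-in for lighting


@dataclass
class ContextSpline:
    kind: str             # "lane" or "sidewalk"
    offset: float         # lateral offset from the road centreline in m (left is positive)
    curvature: float
    oncoming: bool = False

    def pose(self, s: float) -> tuple[float, float, float]:
        """(x, y, heading) at centreline arc length s, in a top-down road frame."""
        k = self.curvature
        if abs(k) < 1e-9:
            cx, cy, heading = s, 0.0, 0.0
        else:
            heading = k * s
            cx, cy = math.sin(heading) / k, (1.0 - math.cos(heading)) / k
        # Shift sideways along the centreline's left-pointing normal.
        x = cx - self.offset * math.sin(heading)
        y = cy + self.offset * math.cos(heading)
        return x, y, heading + (math.pi if self.oncoming else 0.0)


@dataclass
class PlacedObject:
    kind: str
    x: float
    y: float
    heading: float


def sample_global(rng: random.Random) -> GlobalParams:
    scenario = rng.choice(SCENARIOS)  # the scenario is drawn first
    return GlobalParams(
        scenario=scenario,
        curvature=rng.uniform(-0.01, 0.01),
        lanes_per_side=rng.randint(*LANES_PER_SIDE[scenario]),
        lane_width=rng.uniform(3.0, 3.7),
        sun_elevation=rng.uniform(5.0, 90.0),
    )


def sample_splines(g: GlobalParams) -> list[ContextSpline]:
    splines = []
    for i in range(g.lanes_per_side):
        centre = (i + 0.5) * g.lane_width
        # Right-hand traffic: same-direction lanes on the right, oncoming on the left.
        splines.append(ContextSpline("lane", -centre, g.curvature))
        splines.append(ContextSpline("lane", centre, g.curvature, oncoming=True))
    if g.scenario != "rural":  # assume rural roads have no sidewalks
        edge = g.lanes_per_side * g.lane_width + 1.0
        splines += [ContextSpline("sidewalk", side * edge, g.curvature) for side in (-1, 1)]
    return splines


def place_objects(rng: random.Random, splines: list[ContextSpline],
                  road_length: float = 100.0) -> list[PlacedObject]:
    objects = []
    for spline in splines:
        kind, min_gap = ("car", 8.0) if spline.kind == "lane" else ("pedestrian", 1.5)
        s = rng.uniform(0.0, min_gap)
        while s < road_length:
            if rng.random() < 0.5:  # hypothetical occupancy probability
                x, y, heading = spline.pose(s)
                objects.append(PlacedObject(kind, x, y, heading))
            s += min_gap + rng.expovariate(0.1)  # gaps of at least min_gap metres
    return objects


def sample_scene(seed: int):
    rng = random.Random(seed)
    g = sample_global(rng)
    splines = sample_splines(g)
    return g, splines, place_objects(rng, splines)
```

Each layer only reads the layer above it. Swapping in richer splines, such as intersections or parking rows, therefore would not change how objects are placed.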
Comparative Evaluation
The effectiveness of SDR was compared against several synthetic data generation techniques using Faster-RCNN for 2D vehicle detection on the KITTI dataset. Compared to traditional domain randomization and other synthetic datasets like Virtual KITTI, SDR achieved superior results across all difficulty levels (Easy, Moderate, Hard).
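To make the difficulty levels concrete: the KITTI object benchmark assigns each ground-truth box to Easy, Moderate, or Hard using thresholds on 2D box height, occlusion level, and truncation. A car detection counts as correct at an IoU of at least 0.7. The sketch below encodes those thresholds and the IoU test. It is a simplified illustration, not the official devkit, which also handles ignored classes, DontCare regions, and the precision-recall computation. The function names are hypothetical.

```python
from dataclasses import dataclass

# KITTI object benchmark thresholds per level:
# (min 2D box height in pixels, max occlusion level, max truncation fraction).
# Occlusion: 0 = fully visible, 1 = partly occluded, 2 = largely occluded.
KITTI_DIFFICULTY = {
    "Easy": (40.0, 0, 0.15),
    "Moderate": (25.0, 1, 0.30),
    "Hard": (25.0, 2, 0.50),
}
CAR_MIN_IOU = 0.7  # KITTI requires 70% overlap for cars


@dataclass
class Box:
    x1: float
    y1: float
    x2: float
    y2: float

    def area(self) -> float:
        return max(0.0, self.x2 - self.x1) * max(0.0, self.y2 - self.y1)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a.x2, b.x2) - max(a.x1, b.x1))
    ih = max(0.0, min(a.y2, b.y2) - max(a.y1, b.y1))
    inter = iw * ih
    union = a.area() + b.area() - inter
    return inter / union if union > 0 else 0.0


def counts_at(level: str, gt: Box, occlusion: int, truncation: float) -> bool:
    """True if a ground-truth car is evaluated at this level (levels are cumulative)."""
    min_height, max_occlusion, max_truncation = KITTI_DIFFICULTY[level]
    return (gt.y2 - gt.y1 >= min_height
            and occlusion <= max_occlusion
            and truncation <= max_truncation)


def is_car_match(pred: Box, gt: Box) -> bool:
    return iou(pred, gt) >= CAR_MIN_IOU
```

The levels are nested. A car that qualifies for Easy is also evaluated at Moderate and Hard, so Hard covers the most objects, including the heavily occluded ones.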
Figure 2: Synthetic datasets used for training object detection models. SDR produces realistic images with large variety, enhancing model robustness.
Synthetic data from SDR was also compared against real datasets. Real data usually has the advantage when its distribution matches the test set. Nevertheless, SDR was competitive with real data from a different domain, such as BDD100K. This is attributed to the variety and context of the synthetic images.
Figure 3: Qualitative results on KITTI of Faster-RCNN trained only on SDR-generated synthetic data. Note the successful detection of severely occluded vehicles.
Initialization and Fine-Tuning
SDR’s utility as an initialization strategy was explored by pretraining networks before fine-tuning with limited real labeled data. Results showed that models pretrained with SDR exhibited enhanced performance over those trained directly on the same volume of real data.
Ablation Study
An ablation study of the SDR parameters showed that context and texture-related attributes (such as saturation and contrast) play a critical role in model performance. Variations in scene, lighting, and object count were also significant contributors.
Figure 4: Ablation study of full SDR, examining the impact of the various randomization parameters.
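One way to organize such an ablation is leave-one-out: start from the full SDR configuration and disable one randomization factor per run. The sketch below only builds the run configurations. The flag names are hypothetical and mirror the factors named above (context, saturation, contrast, scene, lighting, object count). Generating data, training Faster-RCNN, and evaluating on KITTI for each run are outside its scope.

```python
from dataclasses import dataclass, fields, replace


@dataclass(frozen=True)
class SDRConfig:
    # Hypothetical switches; True means the factor is randomized,
    # False means it is held at a fixed default.
    context: bool = True
    saturation: bool = True
    contrast: bool = True
    scene: bool = True
    lighting: bool = True
    object_count: bool = True


def leave_one_out(full: SDRConfig) -> dict[str, SDRConfig]:
    """Full configuration plus one variant per factor with that factor disabled."""
    runs = {"full SDR": full}
    for f in fields(full):
        runs[f"without {f.name}"] = replace(full, **{f.name: False})
    return runs


for name, cfg in leave_one_out(SDRConfig()).items():
    print(name, cfg)
```

Each variant's KITTI AP can then be compared with the full configuration. The drop is attributed to the factor that was removed.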
Conclusion
SDR integrates structural context into synthetic data generation processes, offering robust improvements over conventional DR in object detection tasks. The increased variety and contextual integrity in synthetic scenes facilitate better generalization across real-world domains, as evidenced by the strong KITTI dataset results. Future work will explore SDR applications across different computer vision tasks, such as multi-class detection and segmentation, broadening the scope of its applicability.