- The paper presents a novel synthetic fog generation pipeline that simulates realistic fog effects using refined depth maps and atmospheric light estimation.
- It demonstrates improved CNN performance, with fine-tuning on synthetic foggy images raising mean IoU from 34.9% to 37.8% on real-world foggy scenes.
- It introduces a semi-supervised strategy using pseudo-labels from clear-weather images to further enhance segmentation accuracy while reducing annotation costs.
Semantic Foggy Scene Understanding with Synthetic Data
Introduction
This paper tackles Semantic Foggy Scene Understanding (SFSU), an area that has received limited attention compared to general semantic scene understanding and image dehazing. The central challenge in SFSU is the difficulty of collecting and annotating real foggy images. To address this, the authors add synthetic fog to existing clear-weather images and leverage the resulting partially synthetic images for SFSU with state-of-the-art Convolutional Neural Networks (CNNs).
Methodology
Fog Generation Pipeline
The authors develop an automatic pipeline to add synthetic fog to clear-weather images using depth information, extending a standard optical model for daytime fog. This model considers both the attenuation of scene radiance and the contribution of atmospheric light to simulate fog effects. Key steps include:
- Depth Map Calculation and Refinement: Initial depth maps are obtained using stereo matching algorithms. Severe artifacts and noise in these maps are handled using a novel superpixel-matching optimization and guided filtering.
- Transmission Estimation: The transmission map, which dictates the visibility of objects based on distance, is computed using the refined depth maps.
- Fog Simulation: The final foggy image is generated by combining the clear-weather image with the computed transmission map and the estimated atmospheric light (a code sketch follows this list).
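The pipeline instantiates the standard optical model for daytime fog, I(x) = R(x)·t(x) + L·(1 − t(x)), where R is the clear-scene radiance, L the atmospheric light, and t(x) = exp(−β·ℓ(x)) the transmission at scene distance ℓ(x) for attenuation coefficient β. Below is a minimal sketch of the simulation step in Python (NumPy/OpenCV); the guided-filter call is a stand-in for the paper's full superpixel-based depth refinement, and all parameter values are illustrative assumptions.

```python
import numpy as np
import cv2  # guidedFilter requires opencv-contrib-python

def simulate_fog(clear_bgr, depth_m, beta=0.01, L=255.0):
    """Sketch of the standard optical model for daytime fog:
    I(x) = R(x) * t(x) + L * (1 - t(x)),  t(x) = exp(-beta * depth(x)).

    clear_bgr : HxWx3 uint8 clear-weather image
    depth_m   : HxW float32 scene distance in meters
    beta      : attenuation coefficient; larger values mean denser fog
    L         : atmospheric light, assumed constant and bright
    """
    t = np.exp(-beta * depth_m).astype(np.float32)  # transmission in (0, 1]

    # Smooth the transmission with the clear image as guidance; this stands
    # in for the paper's full depth denoising, not a reproduction of it.
    guide = clear_bgr.astype(np.float32) / 255.0
    t = cv2.ximgproc.guidedFilter(guide=guide, src=t, radius=20, eps=1e-3)

    t = t[..., None]  # broadcast over the three color channels
    foggy = clear_bgr.astype(np.float32) * t + L * (1.0 - t)
    return np.clip(foggy, 0, 255).astype(np.uint8)
```

Since t = exp(−βℓ), β directly controls fog density: meteorological visibility is roughly 2.996/β, so β = 0.01 corresponds to about 300 m of visibility.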
Datasets
Two datasets are created to facilitate SFSU:
- Foggy Cityscapes: Derived from the Cityscapes dataset, this includes 550 high-quality synthetic foggy images with fine annotations (Foggy Cityscapes-refined), plus an additional 20,000 images with heavier fog but without fine annotations (Foggy Cityscapes-coarse).
- Foggy Driving: Comprising 101 real-world foggy images with pixel-level annotations for semantic segmentation and object detection.
Learning Strategies
Supervised Learning
The effectiveness of supervised learning with synthetic foggy images is demonstrated through experiments with modern CNN models such as the Dilated Convolutional Network (DCN). Fine-tuning these models on Foggy Cityscapes-refined improves performance on the real foggy images of Foggy Driving; for example, fine-tuning DCN raises mean Intersection over Union (IoU) from 34.9% to 37.8%. Notably, the benefit is most pronounced for distant parts of the scene, which matches the properties of the training data: transmission decays with distance, so far-away regions are the ones most degraded by synthetic fog.
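As a rough illustration (not the authors' exact training code), fine-tuning amounts to initializing from clear-weather weights and continuing training on the synthetic foggy images; the loss setup, ignore label, and hyperparameters below are assumptions:

```python
import torch
import torch.nn as nn

def finetune_on_foggy(model, foggy_loader, epochs=10, lr=1e-4):
    """Hypothetical fine-tuning loop: `model` is a segmentation CNN
    pre-trained on clear-weather Cityscapes, and `foggy_loader` yields
    (foggy image, fine annotation) pairs from Foggy Cityscapes-refined."""
    criterion = nn.CrossEntropyLoss(ignore_index=255)  # 255 assumed as void label
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    model.train()
    for _ in range(epochs):                   # few passes: only 550 refined images
        for foggy_img, labels in foggy_loader:
            logits = model(foggy_img)         # N x C x H x W class scores
            loss = criterion(logits, labels)  # per-pixel cross-entropy
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return model
```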
Semi-Supervised Learning
To reduce reliance on annotated foggy images, a semi-supervised learning approach that transfers supervision from clear weather is proposed. A segmentation model trained on clear-weather images predicts labels on unlabeled clear-weather images, and these predictions serve as pseudo-labels for the synthetic foggy versions of the same images; the model is then fine-tuned on the combined foggy data. This approach improves mean IoU from 46.3% to 49.7%, validating the strategy of transferring supervision from clear to foggy conditions.
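A minimal sketch of the pseudo-labeling step, assuming paired clear/foggy images as produced by the fog pipeline and a model already trained on clear weather; all names are illustrative:

```python
import torch

@torch.no_grad()
def make_pseudo_labels(model, clear_loader):
    """Predict on clear-weather images; the argmax predictions serve as
    pseudo-labels for the synthetic foggy versions of the same images."""
    model.eval()
    pseudo = []
    for clear_img, image_id in clear_loader:
        logits = model(clear_img)      # N x C x H x W class scores
        labels = logits.argmax(dim=1)  # N x H x W hard pseudo-labels
        pseudo.append((image_id, labels.cpu()))
    return pseudo
```

Fine-tuning then proceeds as in the supervised case, mixing human-annotated foggy images with (foggy image, pseudo-label) pairs.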
Dehazing and its Utility
The paper also investigates dehazing as a preprocessing step, using methods such as MSCNN, DCP, and non-local dehazing. Results indicate at best a marginal benefit: training directly on synthetic foggy images without dehazing often matches or exceeds the dehazing-based pipelines. This is consistent with the difficulty of applying dehazing algorithms out-of-the-box to real-world foggy images, where their standard assumptions may not hold.
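For context, the dark channel prior (DCP), one of the tested baselines, estimates transmission from the per-patch minimum over color channels. A compact sketch follows; the constants are assumptions, and real implementations add atmospheric-light estimation and edge-aware refinement of the transmission map:

```python
import numpy as np
import cv2

def dark_channel(img_bgr, patch=15):
    """Per-pixel minimum over B, G, R, then a local minimum filter.
    The prior: haze-free outdoor patches have dark channels near zero."""
    min_rgb = img_bgr.min(axis=2).astype(np.float32)
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (patch, patch))
    return cv2.erode(min_rgb, kernel)  # erosion == local minimum filter

def estimate_transmission(img_bgr, A, omega=0.95, patch=15):
    """DCP transmission estimate: t = 1 - omega * dark_channel(I / A),
    where A is the atmospheric light (3-vector) and omega < 1 retains
    a trace of haze for visual realism."""
    normalized = img_bgr.astype(np.float32) / A.reshape(1, 1, 3)
    return 1.0 - omega * dark_channel(normalized, patch)
```

Note the symmetry with the fog simulation above: these dehazing methods invert the same optical model, which is why their standard assumptions matter so much when they fail on real fog.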
Implications and Future Work
The research contributes a novel approach to generating and utilizing synthetic foggy images for semantic scene understanding. The datasets and models are publicly available, facilitating further research on scene understanding in adverse weather. Future directions include integrating dehazing and SFSU into a unified, end-to-end trainable pipeline for robust understanding across all weather conditions.
Conclusion
The authors convincingly demonstrate the value of synthetic data for SFSU under both supervised and semi-supervised learning paradigms. The marginal utility of off-the-shelf image dehazing underscores the need to integrate preprocessing more tightly into the learning framework. Beyond advancing the state of SFSU, the paper offers practical datasets and models for future exploration.