- The paper presents a novel synthetic fog generation pipeline that simulates realistic fog effects using refined depth maps and atmospheric light estimation.
- It demonstrates improved CNN performance, with fine-tuning on synthetic foggy images raising mean IoU from 34.9% to 37.8% on real-world foggy scenes.
- It introduces a semi-supervised strategy using pseudo-labels from clear-weather images to further enhance segmentation accuracy while reducing annotation costs.
Semantic Foggy Scene Understanding with Synthetic Data
Introduction
This paper tackles Semantic Foggy Scene Understanding (SFSU), an area that has received limited attention compared to general semantic scene understanding and image dehazing. The central challenge in SFSU is the difficulty of collecting and annotating real foggy images. To address this, the authors add synthetic fog to existing clear-weather images and leverage the resulting partially synthetic images for SFSU with state-of-the-art Convolutional Neural Networks (CNNs).
Methodology
Fog Generation Pipeline
The authors develop an automatic pipeline to add synthetic fog to clear-weather images using depth information, extending a standard optical model for daytime fog. This model considers both the attenuation of scene radiance and the contribution of atmospheric light to simulate fog effects. Key steps include:
- Depth Map Calculation and Refinement: Initial depth maps are obtained using stereo matching algorithms. Severe artifacts and noise in these maps are handled using a novel superpixel-matching optimization and guided filtering.
- Transmission Estimation: The transmission map, which dictates the visibility of objects based on distance, is computed using the refined depth maps.
- Fog Simulation: The final foggy image is generated by combining the clear-weather image with the computed transmission map and the estimated atmospheric light (a code sketch follows this list).
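The pipeline instantiates the standard optical model for daytime fog, I(x) = R(x)·t(x) + L·(1 − t(x)), where R is the clear-scene radiance, L the atmospheric light, and t(x) = exp(−β·ℓ(x)) the transmission at scene distance ℓ(x) for attenuation coefficient β. Below is a minimal sketch of the simulation step in Python (NumPy/OpenCV); the guided-filter call is a stand-in for the paper's full superpixel-based depth refinement, and all parameter values are illustrative assumptions.

```python
import numpy as np
import cv2  # guidedFilter requires opencv-contrib-python

def simulate_fog(clear_bgr, depth_m, beta=0.01, L=255.0):
    """Sketch of the standard optical model for daytime fog:
    I(x) = R(x) * t(x) + L * (1 - t(x)),  t(x) = exp(-beta * depth(x)).

    clear_bgr : HxWx3 uint8 clear-weather image
    depth_m   : HxW float32 scene distance in meters
    beta      : attenuation coefficient; larger values mean denser fog
    L         : atmospheric light, assumed constant and bright
    """
    t = np.exp(-beta * depth_m).astype(np.float32)  # transmission in (0, 1]

    # Smooth the transmission with the clear image as guidance; this stands
    # in for the paper's full depth denoising, not a reproduction of it.
    guide = clear_bgr.astype(np.float32) / 255.0
    t = cv2.ximgproc.guidedFilter(guide=guide, src=t, radius=20, eps=1e-3)

    t = t[..., None]  # broadcast over the three color channels
    foggy = clear_bgr.astype(np.float32) * t + L * (1.0 - t)
    return np.clip(foggy, 0, 255).astype(np.uint8)
```

Since t = exp(−βℓ), β directly controls fog density: meteorological visibility is roughly 2.996/β, so β = 0.01 corresponds to about 300 m of visibility.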
Datasets
Two datasets are created to facilitate SFSU:
- Foggy Cityscapes: Derived from the Cityscapes dataset, this includes 550 high-quality synthetic foggy images with fine annotations (Foggy Cityscapes-refined), plus an additional 20,000 images with heavier fog but without fine annotations (Foggy Cityscapes-coarse).
- Foggy Driving: Comprising 101 real-world foggy images with pixel-level annotations for semantic segmentation and object detection.
Learning Strategies
Supervised Learning
The effectiveness of supervised learning with synthetic foggy images is demonstrated through experiments with modern CNN models such as the Dilated Convolutional Network (DCN). Fine-tuning these models on Foggy Cityscapes-refined improves performance on the real foggy images of Foggy Driving; for example, fine-tuning DCN raises mean Intersection over Union (IoU) from 34.9% to 37.8%. Notably, the benefit is most pronounced for distant parts of the scene, which matches the properties of the training data: transmission decays with distance, so far-away regions are the ones most degraded by synthetic fog.
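As a rough illustration (not the authors' exact training code), fine-tuning amounts to initializing from clear-weather weights and continuing training on the synthetic foggy images; the loss setup, ignore label, and hyperparameters below are assumptions:

```python
import torch
import torch.nn as nn

def finetune_on_foggy(model, foggy_loader, epochs=10, lr=1e-4):
    """Hypothetical fine-tuning loop: `model` is a segmentation CNN
    pre-trained on clear-weather Cityscapes, and `foggy_loader` yields
    (foggy image, fine annotation) pairs from Foggy Cityscapes-refined."""
    criterion = nn.CrossEntropyLoss(ignore_index=255)  # 255 assumed as void label
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    model.train()
    for _ in range(epochs):                   # few passes: only 550 refined images
        for foggy_img, labels in foggy_loader:
            logits = model(foggy_img)         # N x C x H x W class scores
            loss = criterion(logits, labels)  # per-pixel cross-entropy
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return model
```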
Semi-Supervised Learning
To reduce reliance on annotated foggy images, a semi-supervised learning approach that transfers supervision from clear weather is proposed. A segmentation model trained on clear-weather images predicts labels on unlabeled clear-weather images, and these predictions serve as pseudo-labels for the synthetic foggy versions of the same images; the model is then fine-tuned on the combined foggy data. This approach improves mean IoU from 46.3% to 49.7%, validating the strategy of transferring supervision from clear to foggy conditions.
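A minimal sketch of the pseudo-labeling step, assuming paired clear/foggy images as produced by the fog pipeline and a model already trained on clear weather; all names are illustrative:

```python
import torch

@torch.no_grad()
def make_pseudo_labels(model, clear_loader):
    """Predict on clear-weather images; the argmax predictions serve as
    pseudo-labels for the synthetic foggy versions of the same images."""
    model.eval()
    pseudo = []
    for clear_img, image_id in clear_loader:
        logits = model(clear_img)      # N x C x H x W class scores
        labels = logits.argmax(dim=1)  # N x H x W hard pseudo-labels
        pseudo.append((image_id, labels.cpu()))
    return pseudo
```

Fine-tuning then proceeds as in the supervised case, mixing human-annotated foggy images with (foggy image, pseudo-label) pairs.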
Dehazing and its Utility
The paper also investigates dehazing as a preprocessing step, using methods such as MSCNN, DCP, and non-local dehazing. Results indicate at best a marginal benefit: training directly on synthetic foggy images without dehazing often matches or exceeds the dehazing-based pipelines. This is consistent with the difficulty of applying dehazing algorithms out-of-the-box to real-world foggy images, where their standard assumptions may not hold.
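For context, the dark channel prior (DCP), one of the tested baselines, estimates transmission from the per-patch minimum over color channels. A compact sketch follows; the constants are assumptions, and real implementations add atmospheric-light estimation and edge-aware refinement of the transmission map:

```python
import numpy as np
import cv2

def dark_channel(img_bgr, patch=15):
    """Per-pixel minimum over B, G, R, then a local minimum filter.
    The prior: haze-free outdoor patches have dark channels near zero."""
    min_rgb = img_bgr.min(axis=2).astype(np.float32)
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (patch, patch))
    return cv2.erode(min_rgb, kernel)  # erosion == local minimum filter

def estimate_transmission(img_bgr, A, omega=0.95, patch=15):
    """DCP transmission estimate: t = 1 - omega * dark_channel(I / A),
    where A is the atmospheric light (3-vector) and omega < 1 retains
    a trace of haze for visual realism."""
    normalized = img_bgr.astype(np.float32) / A.reshape(1, 1, 3)
    return 1.0 - omega * dark_channel(normalized, patch)
```

Note the symmetry with the fog simulation above: these dehazing methods invert the same optical model, which is why their standard assumptions matter so much when they fail on real fog.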
Implications and Future Work
The research contributes a novel approach to generating and utilizing synthetic foggy images for semantic scene understanding. The datasets and models are publicly available, facilitating further research on scene understanding in adverse weather. Future directions include integrating dehazing and SFSU into a unified, end-to-end trainable pipeline for robust understanding across all weather conditions.
Conclusion
The authors convincingly demonstrate the value of synthetic data for SFSU under both supervised and semi-supervised learning paradigms. The marginal utility of off-the-shelf image dehazing underscores the need to integrate preprocessing more tightly into the learning framework. Beyond advancing the state of SFSU, the paper offers practical datasets and models for future exploration.