- The paper introduces CosimNet, a framework that leverages siamese networks and deep metric learning to measure dissimilarity between image pairs directly for scene change detection.
- It presents a Thresholded Contrastive Loss (TCL) that adaptively tolerates noisy changes arising from illumination and viewpoint variations, significantly enhancing detection robustness.
- Experiments on benchmark datasets demonstrate that the Euclidean distance metric and multi-layer side outputs improve performance and generalizability in change detection tasks.
Learning to Measure Change: Fully Convolutional Siamese Metric Networks for Scene Change Detection
Introduction
The paper introduces a novel approach to scene change detection (SCD), a core problem in computer vision, by proposing the fully convolutional siamese metric network (CosimNet). The central challenge this work addresses is distinguishing semantic changes from noisy changes, which typically arise from variations in illumination, shadows, and camera viewpoint. Unlike traditional FCN-based models that learn a decision boundary, CosimNet measures change with customized metrics that directly evaluate the dissimilarity of an image pair.
CosimNet Architecture
The proposed CosimNet framework uses a siamese network to extract deep feature pairs from images of the same scene taken at different times. The features are compared with a predefined distance metric such as Euclidean or cosine distance. The key idea, inspired by deep metric learning, is to optimize this metric with a contrastive loss that pulls unchanged feature pairs together and pushes changed feature pairs apart, recasting change detection as a metric learning problem.
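To make the pipeline concrete, below is a minimal PyTorch sketch of the siamese extraction and pixel-wise contrastive loss described above. The VGG16 backbone, margin value, and tensor shapes are illustrative assumptions, not the authors' exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision

class SiameseFeatureExtractor(nn.Module):
    """Shared-weight backbone applied to both images; the VGG16
    choice here is illustrative, not the paper's exact network."""
    def __init__(self):
        super().__init__()
        self.backbone = torchvision.models.vgg16(weights=None).features

    def forward(self, img_t0, img_t1):
        # The same weights process both time steps: the "siamese" property.
        feat_t0 = self.backbone(img_t0)  # (B, C, H', W')
        feat_t1 = self.backbone(img_t1)
        return feat_t0, feat_t1

def contrastive_loss(feat_t0, feat_t1, change_mask, margin=2.0):
    """Pixel-wise contrastive loss over dense feature maps.

    change_mask: (B, H', W'), 1 = changed, 0 = unchanged, already
    downsampled to the feature resolution. margin is an assumed value.
    """
    # Euclidean distance between corresponding feature vectors.
    dist = torch.norm(feat_t0 - feat_t1, p=2, dim=1)      # (B, H', W')
    pull = (1 - change_mask) * dist.pow(2)                # pull unchanged pairs together
    push = change_mask * F.relu(margin - dist).pow(2)     # push changed pairs beyond the margin
    return (pull + push).mean()
```

At test time no loss is needed: the pixel-wise distance map between the two feature maps can be upsampled and thresholded to produce the binary change mask.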
Thresholded Contrastive Loss
To handle noisy changes caused by large viewpoint differences, a major limitation of current SCD methods, the authors introduce the Thresholded Contrastive Loss (TCL). TCL adds a tolerance threshold that permits some variance within unchanged feature pairs, improving robustness to camera rotation and zoom that the standard contrastive loss does not handle well. The network can thus remain invariant to certain kinds of noise while staying sensitive to semantic changes.
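The following sketch shows how the tolerance threshold changes the unchanged-pair term relative to the standard contrastive loss above. The specific squaring and the values of tau and margin are assumptions for illustration; the paper's exact formulation may differ in these details.

```python
import torch
import torch.nn.functional as F

def thresholded_contrastive_loss(feat_t0, feat_t1, change_mask,
                                 tau=0.1, margin=2.0):
    """Contrastive loss with a tolerance threshold tau on unchanged pairs.

    Unchanged pixels are penalized only when their feature distance
    exceeds tau, so small distances caused by viewpoint or illumination
    noise are tolerated rather than forced to zero.
    tau and margin values here are assumptions for illustration.
    """
    dist = torch.norm(feat_t0 - feat_t1, p=2, dim=1)
    # Unchanged pairs: penalize only the part of the distance above tau.
    pull = (1 - change_mask) * F.relu(dist - tau).pow(2)
    # Changed pairs: same margin term as the standard contrastive loss.
    push = change_mask * F.relu(margin - dist).pow(2)
    return (pull + push).mean()
```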
Experimental Evaluation
The approach was evaluated on three benchmark datasets: CDnet, PCD2015, and VL-CMU-CD. CosimNet achieved state-of-the-art performance on PCD2015 and VL-CMU-CD, with competitive results on CDnet. Notably, TCL significantly outperformed the standard contrastive loss under extreme viewpoint variations.
Across datasets, CosimNet delivered substantial gains in change detection accuracy, particularly in environments with varying illumination and camera perspectives. The Euclidean distance outperformed the cosine distance, attributed to its stronger ability to separate changed from unchanged pairs. The experiments also showed that multi-layer side outputs (MLSO), which apply the metric loss at several feature depths, further increased feature discriminability and robustness under challenging conditions.
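A sketch of how multi-layer side outputs might be wired: the same metric loss (for instance the contrastive loss sketched earlier) is applied to feature maps from several backbone depths, with the ground-truth mask resized to each resolution. The equal layer weights and nearest-neighbor resizing are illustrative assumptions, not the paper's stated configuration.

```python
import torch
import torch.nn.functional as F

def multi_layer_side_output_loss(feats_t0, feats_t1, change_mask,
                                 loss_fn, weights=None):
    """Sum a metric loss over feature maps from several backbone depths.

    feats_t0 / feats_t1: lists of (B, C_i, H_i, W_i) feature maps.
    change_mask: (B, H, W) ground-truth mask at input resolution.
    Equal weights and nearest-neighbor resizing are assumptions.
    """
    weights = weights if weights is not None else [1.0] * len(feats_t0)
    total = 0.0
    for w, f0, f1 in zip(weights, feats_t0, feats_t1):
        # Resize the mask to this layer's spatial resolution.
        mask = F.interpolate(change_mask.unsqueeze(1).float(),
                             size=f0.shape[-2:], mode="nearest").squeeze(1)
        total = total + w * loss_fn(f0, f1, mask)
    return total
```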
Implications and Limitations
The paper argues that CosimNet's architecture can serve not only traditional change detection but also other tasks that require robust discrimination between semantically similar and distinct images. Embedding deep metric learning in a unified architecture offers a pathway toward more adaptive and generalizable change detection mechanisms.
However, challenges remain in tuning the TCL tolerance threshold so that it absorbs noise under diverse conditions without eroding the separability of changed and unchanged pairs. The need to calibrate the distance metric carefully and the computational cost of processing two images through a siamese network are further considerations for practical deployment.
Conclusion
CosimNet makes a significant contribution to scene change detection by addressing the intertwined challenge of separating semantic from noisy changes through deep metric learning. Its results on real-world datasets demonstrate meaningful advances in detecting and segmenting scene changes under complex and variable conditions, and suggest broader applications in computer vision and remote sensing. The thresholded contrastive loss, in particular, is a useful refinement of the standard contrastive loss, paving the way for more nuanced interpretations of visual change.