- The paper introduces a novel deep fusion framework that integrates CNN and Laplacian propagation to effectively combine RGB and depth features.
- It extracts multi-scale saliency features from superpixels and leverages CNN to generate robust hyper-features for detection.
- Experimental results on benchmark datasets show enhanced precision, recall, and F-measure compared to state-of-the-art methods.
Deep Fusion Framework for RGBD Salient Object Detection
The paper, "RGBD Salient Object Detection via Deep Fusion," introduces a novel approach to salient object detection in RGBD images by leveraging deep fusion of low-level saliency cues through a Convolutional Neural Network (CNN) integrated with a Laplacian propagation framework. Authored by Liangqiong Qu et al., this paper addresses the challenges associated with effectively combining RGB and depth information to enhance the detection of salient objects, a task critical for numerous computer vision applications such as image classification, retargeting, and object recognition.
Overview of the Proposed Method
The proposed framework comprises three primary components:
- Saliency Feature Vector Extraction: The framework first segments the RGBD image into superpixels and computes several saliency feature vectors, including local and global contrast, background prior, and color compactness. These features encapsulate the salient cues from both the RGB and depth channels and form the input to the subsequent fusion stage (a minimal sketch of this step appears after this list).
- Hyper-feature Extraction via CNN: Unlike conventional methods that feed raw pixel data into a CNN, this paper uses the hand-designed saliency feature vectors as the CNN input. The network learns interactions among these cues and produces hyper-features that are more representative and discriminative for salient object detection. The architecture consists of convolutional and fully connected layers optimized for identifying salient regions (see the PyTorch sketch after this list).
- Laplacian Propagation for Consistency: To address spatial inconsistencies and noise in the saliency map produced by the CNN, a Laplacian propagation framework is employed. It starts from high-confidence salient regions and propagates their saliency over a graph whose edge weights encode color and depth affinities between regions, yielding a spatially consistent map (see the propagation sketch after this list).
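To make the first stage concrete, below is a minimal Python sketch of how per-superpixel contrast features might be computed. It assumes SLIC superpixels and a Gaussian spatial weighting between regions; the paper's exact cue definitions (local contrast, background prior, color compactness) differ in detail and are not reproduced here.

```python
# Hedged sketch: per-superpixel statistics and a global contrast cue.
# The weighting scheme and normalization are assumptions for illustration.
import numpy as np
from skimage.segmentation import slic

def superpixel_stats(rgb, depth, n_segments=300):
    """Segment the RGBD image and return per-superpixel mean color,
    mean depth, and normalized centroid position."""
    labels = slic(rgb, n_segments=n_segments, compactness=10)
    feats = []
    for i in np.unique(labels):
        mask = labels == i
        ys, xs = np.nonzero(mask)
        feats.append((rgb[mask].mean(axis=0),               # mean RGB
                      depth[mask].mean(),                   # mean depth
                      np.array([ys.mean() / rgb.shape[0],   # centroid in [0, 1]
                                xs.mean() / rgb.shape[1]])))
    return labels, feats

def global_contrast(feats, sigma_pos=0.25):
    """Global contrast: each superpixel's color/depth distance to all others,
    down-weighted by spatial distance (an assumed Gaussian weighting)."""
    n = len(feats)
    contrast = np.zeros(n)
    for i in range(n):
        ci, di, pi = feats[i]
        for j in range(n):
            if i == j:
                continue
            cj, dj, pj = feats[j]
            w = np.exp(-np.sum((pi - pj) ** 2) / (2 * sigma_pos ** 2))
            contrast[i] += w * (np.linalg.norm(ci - cj) + abs(di - dj))
    return contrast / contrast.max()
```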
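The second stage can be illustrated with a small PyTorch model that consumes a fixed-length vector of hand-crafted saliency cues per superpixel rather than raw pixels. The layer sizes, input arrangement, and output head below are assumptions for illustration, not the paper's exact architecture.

```python
# Hedged sketch: a CNN over hand-crafted saliency features, not raw pixels.
import torch
import torch.nn as nn

class HyperFeatureNet(nn.Module):
    def __init__(self, n_features=12):
        super().__init__()
        # 1-D convolutions learn interactions among the stacked cues.
        self.conv = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv1d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
        )
        # Fully connected layers map the learned hyper-features to a score.
        self.fc = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * n_features, 64), nn.ReLU(),
            nn.Linear(64, 1), nn.Sigmoid(),
        )

    def forward(self, x):                # x: (batch, n_features) cues per superpixel
        h = self.conv(x.unsqueeze(1))    # (batch, 32, n_features) hyper-features
        return self.fc(h).squeeze(1)     # per-superpixel saliency in [0, 1]

# Usage: scores = HyperFeatureNet()(torch.rand(300, 12))  # one score per superpixel
```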
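Finally, the propagation stage can be sketched as a graph-regularized least-squares problem on the superpixel graph, seeded by high-confidence CNN scores. The affinity definition and the fidelity weight `mu` below are assumptions; the paper's energy may be normalized differently.

```python
# Hedged sketch: Laplacian propagation of saliency over the superpixel graph.
import numpy as np

def laplacian_propagation(color, depth, seed_saliency, seed_mask,
                          sigma_c=0.1, sigma_d=0.1, mu=0.1):
    """color: (n, 3) mean colors, depth: (n,) mean depths,
    seed_saliency: (n,) CNN saliency scores,
    seed_mask: (n,) bool, True for high-confidence superpixels."""
    # Pairwise affinity combining color and depth similarity.
    dc = np.linalg.norm(color[:, None, :] - color[None, :, :], axis=2) ** 2
    dd = (depth[:, None] - depth[None, :]) ** 2
    W = np.exp(-dc / (2 * sigma_c ** 2) - dd / (2 * sigma_d ** 2))
    np.fill_diagonal(W, 0.0)
    L = np.diag(W.sum(axis=1)) - W          # unnormalized graph Laplacian
    # Minimize  S^T L S + mu * ||M (S - Y)||^2, with M selecting the seeds:
    # smoothness over the graph plus fidelity to confident CNN scores.
    M = np.diag(seed_mask.astype(float))
    Y = seed_saliency * seed_mask
    S = np.linalg.solve(L + mu * M, mu * Y)
    return (S - S.min()) / (S.max() - S.min() + 1e-8)
```

Solving this linear system spreads the saliency of confident superpixels to neighbors that are similar in both color and depth, which is what yields the spatially consistent final map described above.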
Experimental Validation and Results
The authors validate their approach on three benchmark datasets—NLPR, NJUDS2000, and LFSD—demonstrating superior performance compared to existing state-of-the-art methods. The proposed deep fusion framework consistently achieves higher precision, recall, and F-measure scores across these datasets. Notably, the Laplacian propagation not only enhances the spatial consistency of the detected saliency maps but also improves existing methods when applied to refine their outputs.
Implications and Future Directions
This work substantiates the efficacy of deep learning models in fusing multi-modal cues for visual tasks, particularly underlining the potential of feature-level fusion approaches over traditional pixel-level techniques. The use of a CNN with saliency feature vectors as input could inspire further research on tailoring deep networks to leverage domain-specific knowledge effectively.
The framework's adaptability suggests opportunities for extension to other settings, such as incorporating additional cues like object motion or context features. Moreover, exploring deeper architectures or integrating attention mechanisms may yield further improvements in accuracy and robustness. The theoretical basis of the Laplacian propagation also invites exploration of more sophisticated graph-based regularization techniques, which could benefit saliency detection more broadly.
Overall, this paper offers valuable insights into building more effective and generalizable salient object detection frameworks by bridging traditional computer vision techniques with modern deep learning methodologies.