Beyond Classification: Directly Training Spiking Neural Networks for Semantic Segmentation (2110.07742v1)

Published 14 Oct 2021 in cs.CV

Abstract: Spiking Neural Networks (SNNs) have recently emerged as the low-power alternative to Artificial Neural Networks (ANNs) because of their sparse, asynchronous, and binary event-driven processing. Due to their energy efficiency, SNNs have a high possibility of being deployed for real-world, resource-constrained systems such as autonomous vehicles and drones. However, owing to their non-differentiable and complex neuronal dynamics, most previous SNN optimization methods have been limited to image recognition. In this paper, we explore the SNN applications beyond classification and present semantic segmentation networks configured with spiking neurons. Specifically, we first investigate two representative SNN optimization techniques for recognition tasks (i.e., ANN-SNN conversion and surrogate gradient learning) on semantic segmentation datasets. We observe that, when converted from ANNs, SNNs suffer from high latency and low performance due to the spatial variance of features. Therefore, we directly train networks with surrogate gradient learning, resulting in lower latency and higher performance than ANN-SNN conversion. Moreover, we redesign two fundamental ANN segmentation architectures (i.e., Fully Convolutional Networks and DeepLab) for the SNN domain. We conduct experiments on two public semantic segmentation benchmarks including the PASCAL VOC2012 dataset and the DDD17 event-based dataset. In addition to showing the feasibility of SNNs for semantic segmentation, we show that SNNs can be more robust and energy-efficient compared to their ANN counterparts in this domain.

Citations (66)

View on Semantic Scholar

Summary

The paper demonstrates direct training of SNNs for semantic segmentation by redesigning FCN and DeepLab architectures.
The study leverages surrogate gradient learning to address non-differentiability issues in spiking neural networks.
Experimental results reveal over 2× energy efficiency and improved noise robustness compared to traditional ANNs.

Beyond Classification: Directly Training Spiking Neural Networks for Semantic Segmentation

Introduction

The paper "Beyond Classification: Directly Training Spiking Neural Networks for Semantic Segmentation" (2110.07742) introduces an alternative method to artificial neural networks (ANNs) by implementing spiking neural networks (SNNs) for semantic segmentation tasks. SNNs are recognized for their asynchronous, event-driven processing, which significantly lowers power consumption compared to ANNs. This characteristic makes SNNs ideal for deployment in resource-constrained environments such as autonomous vehicles. The paper addresses the challenges associated with the non-differentiability and complex dynamics of SNNs that have traditionally limited their application to image recognition. The authors propose a solution by redesigning two fundamental ANN segmentation architectures, Fully Convolutional Networks (FCN) and DeepLab, for direct training using SNNs.

Figure 1: The concept of image classification and semantic segmentation. For a given image, the classification network provides a class prediction, while the semantic segmentation network assigns every pixel in the image its own class, resulting in a two-dimensional prediction map.

Methodology

The paper explores two representative methods for optimizing SNNs: ANN-SNN conversion and surrogate gradient learning. In the ANN-SNN conversion, pre-trained ANN models are transformed into SNNs by replacing ReLU activation neurons with Integrate-and-Fire (IF) neurons. However, this approach faces limitations in capturing temporal spike information, resulting in high latency and degraded performance, particularly with tasks beyond basic image classification. In contrast, surrogate gradient learning directly trains SNNs using approximated gradient functions, leading to lower latency and improved performance.

Figure 2: The neuronal dynamics of SNNs, illustrating the spike train transmission and membrane potential dynamics within neurons.

To enhance performance, the authors introduce redesigned architectures of FCN and DeepLab tailored for the SNN domain. Spiking-FCN comprises a downsampling and upsampling network with skip connections, while Spiking-DeepLab employs dilated convolutions and bilinear interpolation. These modifications focus on maintaining high spatial resolution and increasing receptive fields in deep network layers to achieve effective semantic segmentation.

Figure 3: Performance of converted SNNs with respect to the number of time-steps, indicating degraded performance due to large activation variance.

Experimental Setup

The paper provides extensive experimental evaluation using the PASCAL VOC2012 and DDD17 datasets. The PASCAL VOC2012 dataset consists of static image data, which is processed using Poisson rate coding to convert images into spike trains. DDD17 features event-based data captured by dynamic vision sensors (DVS), which naturally provides spike streams allowing direct training without the need for additional coding schemes. Evaluations of the Spiking-FCN and Spiking-DeepLab architectures demonstrate the feasibility of SNNs for semantic segmentation, with enhanced robustness to noise and energy efficiency compared to ANNs.

Figure 4: Illustration of normalized maximum activations across channel and spatial axes in DeepLab architecture.

Results and Analysis

Quantitative results indicate that while ANN counterparts achieve slightly superior performance, the advantages of SNNs are evident in areas of robustness and energy-efficiency. The Spiking-DeepLab and Spiking-FCN architectures deliver robust performance against Gaussian noise due to their spike-based input encoding. Energy consumption analysis highlights that SNN architectures offer more than $2\times$ energy efficiency compared to standard ANNs, particularly when processing static image datasets.

Figure 5: Examples of input representations, showcasing the conversion of static images into Poisson spikes and the accumulation of spikes in event-based camera input.

Figure 6: Spiking-DeepLab architecture unrolled over multiple time-steps, detailing neuron dynamics and spike accumulation in dilated layers.

Conclusion

The paper successfully expands the application of SNNs from image recognition to semantic segmentation, presenting Spiking-FCN and Spiking-DeepLab as viable, energy-efficient alternatives to traditional ANN models. While gaps remain in performance, the inherent robustness and efficiency of SNNs highlight their potential for deployment in constrained environments and future scene-understanding tasks. This foundational research opens doors for further development of SNN-centric learning mechanisms to bridge current performance discrepancies and optimize SNN capabilities for complex vision applications.