Combining Convolution and Delay Learning in Recurrent Spiking Neural Networks

TL;DR

Combining convolutional recurrent connections with delay learning in RSNNs yields a 52x inference speedup and roughly 99% savings in recurrent parameters on audio classification tasks.

cs.NE · 2026-04-17
Lúcio Folly Sanches Zebendo, Eleonora Cicciarella, Michele Rossi
Spiking Neural Networks · Convolutional Recurrent Architectures · Delay Learning · Speech Recognition · Neuromorphic Computing

Key Findings

Methodology

This paper introduces a novel Convolutional Recurrent Spiking Neural Network (CRSNN) architecture that combines convolutional connections with delay learning. By replacing dense recurrent connections with lightweight 1D convolutions and integrating the DelRec delay learning mechanism, CRSNN effectively reduces parameter overhead while maintaining efficient temporal modeling capabilities. This approach is particularly suitable for signals with local temporal correlations, such as audio spectrograms.
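
To make the parameter savings concrete, here is a minimal PyTorch sketch contrasting a dense recurrent weight matrix with a lightweight 1D convolution over the neuron axis. The hidden size and kernel size are illustrative assumptions, not values from the paper, and the paper's exact convolution configuration may differ:

    import torch.nn as nn

    N, K = 256, 5  # hidden neurons and conv kernel size (illustrative values)

    dense_rec = nn.Linear(N, N, bias=False)                    # dense recurrence: N*N weights
    conv_rec = nn.Conv1d(1, 1, K, padding=K // 2, bias=False)  # one shared 1D kernel: K weights

    n_dense = sum(p.numel() for p in dense_rec.parameters())   # 65536
    n_conv = sum(p.numel() for p in conv_rec.parameters())     # 5
    print(f"dense: {n_dense}, conv: {n_conv}, savings: {1 - n_conv / n_dense:.2%}")

With these toy numbers the recurrent parameter count drops by well over 99%, in line with the savings the paper reports.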

Key Results

  • On the Spiking Heidelberg Digits (SHD) dataset, CRSNN achieved an accuracy of 91.51%, closely matching DelRec's 91.72%, but with a 99.6% reduction in recurrent parameters. Additionally, inference time was reduced by 52x, demonstrating potential for online streaming applications.
  • On the Spiking Speech Commands (SSC) dataset, CRSNN achieved 78.59% accuracy using baseline hyperparameters from the original paper, showing a gap of almost 4 percentage points compared to DelRec, but with significant improvements in parameter efficiency.
  • Ablation studies on delay learning showed that removing learnable delays led to significant performance drops, especially on the SHD dataset, where accuracy decreased by more than 2 percentage points, highlighting the importance of adaptive temporal dynamics.

Significance

This research has significant implications for both academia and industry, particularly for resource-constrained edge computing devices. By shrinking the recurrent parameter count and accelerating inference, CRSNN makes real-time applications feasible. It also helps mitigate the longstanding vanishing- and exploding-gradient issues of traditional recurrent networks, improving training stability and temporal modeling.

Technical Contribution

The main technical contribution is a convolutional recurrent spiking neural network architecture with learnable axonal delays. The method surpasses existing approaches in parameter efficiency and opens new engineering possibilities, particularly for deployment on neuromorphic hardware. By exploiting local temporal correlations, CRSNN cuts the recurrent parameter count dramatically while giving up almost no accuracy on SHD.

Novelty

This paper is the first to combine convolutional recurrent connections with learnable delays, forming a new spiking neural network architecture. Compared to existing methods, CRSNN reduces parameter overhead by leveraging local temporal correlations while maintaining efficient temporal modeling capabilities, which is crucial for processing signals like audio spectrograms.

Limitations

  • The accuracy on the Spiking Speech Commands dataset is lower than DelRec, indicating potential areas for further optimization in certain tasks.
  • Despite significant parameter reduction, there may be overfitting in some cases, particularly when the number of layers increases.
  • The current implementation is primarily targeted at audio data, and other types of temporal sequence data may require adjustments.

Future Work

Future research directions include exploring the performance of CRSNN on other types of temporal sequence data, such as gesture recognition and biomedical signal processing. Additionally, further optimization of the delay learning mechanism could enhance generalization across different tasks. Investigating how to maintain performance on larger-scale datasets is also an important direction.

AI Executive Summary

Spiking Neural Networks (SNNs) have gained increasing attention as a biologically inspired computational framework, particularly in resource-constrained edge systems. Traditional artificial neural networks face challenges such as vanishing gradients and high computational overhead when processing complex temporal data, whereas SNNs offer an efficient solution by encoding information as sparse, asynchronous binary events.

This paper introduces a novel Convolutional Recurrent Spiking Neural Network (CRSNN) architecture to address the limitations of existing recurrent SNNs in handling long-range dependencies. By incorporating convolutional recurrent connections, CRSNN exploits local correlations in temporal signals, significantly reducing parameter overhead while maintaining efficient temporal modeling capabilities.

In experiments, CRSNN achieved an accuracy of 91.51% on the Spiking Heidelberg Digits (SHD) dataset, closely matching DelRec's 91.72% but with a 99.6% reduction in recurrent parameters. Additionally, inference time was reduced by 52x, demonstrating potential for online streaming applications. On the Spiking Speech Commands (SSC) dataset, CRSNN achieved 78.59% accuracy using baseline hyperparameters, showing a gap of almost 4 percentage points compared to DelRec but with significant improvements in parameter efficiency.

For resource-constrained edge devices these savings are decisive: with far fewer recurrent parameters and a much faster inference path, real-time deployment becomes practical, and the architecture's treatment of temporal dynamics helps stabilize training where dense recurrent networks suffer from vanishing or exploding gradients.

However, there is room for improvement in CRSNN's performance on certain tasks, particularly on the Spiking Speech Commands dataset. Future research directions include exploring CRSNN's performance on other types of temporal sequence data, such as gesture recognition and biomedical signal processing. Additionally, further optimization of the delay learning mechanism could enhance generalization across different tasks. Investigating how to maintain performance on larger-scale datasets is also an important direction.

Deep Analysis

Background

Spiking Neural Networks (SNNs) have shown tremendous potential in processing temporal sequence data. Unlike traditional Artificial Neural Networks (ANNs), SNNs offer an efficient computational framework by encoding information as sparse, asynchronous binary events, making them particularly suitable for neuromorphic hardware. Recurrent Spiking Neural Networks (RSNNs) enhance temporal modeling capabilities by incorporating recurrent connections and neuron state dynamics, allowing for information integration over extended time horizons. However, training RSNNs on complex temporal tasks remains challenging, especially with gradient-based optimization methods, where gradients can vanish or explode over long sequences. Recent advances have been made by enhancing spiking neuron models and incorporating learnable delay mechanisms.
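
As a brief refresher (standard recurrent-network analysis, not specific to this paper), backpropagation through time multiplies one Jacobian per step, which is where the vanishing and exploding behavior comes from:

    \frac{\partial \mathcal{L}}{\partial h_t}
      = \frac{\partial \mathcal{L}}{\partial h_T}
        \prod_{k=t}^{T-1} \frac{\partial h_{k+1}}{\partial h_k},
    \qquad
    \Bigl\lVert \prod_{k=t}^{T-1} \frac{\partial h_{k+1}}{\partial h_k} \Bigr\rVert
      \le \gamma^{\,T-t}

If every per-step Jacobian norm is bounded by some γ < 1, the product decays exponentially with sequence length (vanishing gradients); if it exceeds 1, the product can grow exponentially (exploding gradients).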

Core Problem

Traditional Recurrent Neural Networks (RNNs) suffer from vanishing gradients and high computational overhead on complex temporal sequence data. SNNs offer a more efficient substrate but still struggle with long-range dependencies. In particular, existing RSNNs trade parameter efficiency against temporal modeling capability, a trade-off that is especially limiting on resource-constrained edge devices. How to reduce parameter overhead while retaining strong temporal modeling is therefore the central question.

Innovation

The core innovations of this paper center on the new Convolutional Recurrent Spiking Neural Network (CRSNN) architecture:

  • Replacing dense recurrent connections with lightweight 1D convolutions exploits local temporal correlations and sharply reduces parameter overhead.
  • Integrating the DelRec delay learning mechanism preserves the ability to model long-range temporal dependencies.
  • The combination is particularly well suited to signals with local temporal correlations, such as audio spectrograms, and lends itself to neuromorphic hardware deployment.

Methodology

The design of CRSNN involves several key steps (a code sketch follows the list):

  • Replace dense recurrent connections with lightweight 1D convolutions to cut parameter overhead.
  • Integrate the DelRec delay learning mechanism, adding learnable axonal delays for temporal modeling.
  • Manage recurrent inputs with a circular buffer, made differentiable with respect to the delays via a triangular spread function.
  • Train with surrogate gradient learning so that standard backpropagation can optimize the spiking network efficiently.
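
The following PyTorch sketch shows one plausible reading of this mechanism: a circular buffer holds recent spikes, a triangular kernel reads the buffer at learnable fractional delays, and a small shared 1D convolution replaces the dense recurrent matrix. Shapes, the per-neuron delay parameterization, and the exact kernel form are our assumptions; the authors' reference implementation is in the linked repository.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class DelayedConvRecurrence(nn.Module):
        """One recurrent step: buffered spikes are read out at learnable
        delays through a triangular spread, then mixed across neighboring
        neurons by a lightweight 1D convolution."""

        def __init__(self, n_neurons, max_delay=25, kernel_size=5):
            super().__init__()
            self.max_delay = max_delay
            # One learnable delay per neuron; a real implementation would
            # clamp or reparameterize to keep delays in [0, max_delay - 1].
            self.delays = nn.Parameter(torch.rand(n_neurons) * (max_delay - 1))
            # A single shared kernel sliding over the neuron axis:
            # kernel_size parameters instead of n_neurons**2.
            self.conv = nn.Conv1d(1, 1, kernel_size,
                                  padding=kernel_size // 2, bias=False)

        def triangular_read(self, buffer, write_ptr):
            """Each slot of age k contributes max(0, 1 - |k - d_i|), a
            triangular window that is differentiable in the delay d_i."""
            T = buffer.shape[0]
            ages = (write_ptr - torch.arange(T, device=buffer.device)) % T
            w = F.relu(1.0 - (ages.unsqueeze(1).float()
                              - self.delays.unsqueeze(0)).abs())  # (T, N)
            return (buffer * w.unsqueeze(1)).sum(dim=0)           # (B, N)

        def forward(self, spikes_t, buffer, write_ptr):
            buffer = buffer.clone()          # keep autograd history intact
            buffer[write_ptr] = spikes_t     # write newest spikes (age 0)
            delayed = self.triangular_read(buffer, write_ptr)
            rec = self.conv(delayed.unsqueeze(1)).squeeze(1)
            write_ptr = (write_ptr + 1) % self.max_delay
            return rec, buffer, write_ptr

    # usage: 128 neurons, batch of 4, a 25-step delay window
    layer = DelayedConvRecurrence(128)
    buf, ptr = torch.zeros(25, 4, 128), 0
    spikes = (torch.rand(4, 128) < 0.1).float()
    rec, buf, ptr = layer(spikes, buf, ptr)  # rec feeds the neurons' input current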

Experiments

The experimental design covers two neuromorphic audio benchmarks, Spiking Heidelberg Digits (SHD) and Spiking Speech Commands (SSC):

  • On SHD, CRSNN was evaluated with 2 and 4 hidden layers and compared against DelRec.
  • On SSC, training used the baseline hyperparameters from the original paper.
  • Training used standard cross-entropy loss with tuned hyperparameters (a readout sketch follows the list).
  • Ablation studies assessed the impact of learnable delays on performance.
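
One common readout for these benchmarks is to accumulate the output layer's activity over time and apply cross-entropy to the totals; this is an assumption about the readout, as the paper may instead classify from membrane potentials or another statistic:

    import torch
    import torch.nn.functional as F

    T, B, C = 100, 4, 20                     # time steps, batch, classes (illustrative)
    out = torch.rand(T, B, C)                # stand-in for output-layer activity
    labels = torch.randint(0, C, (B,))

    logits = out.sum(dim=0)                  # accumulate evidence over all time steps
    loss = F.cross_entropy(logits, labels)   # standard cross-entropy objective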

Results

On the SHD dataset, CRSNN achieved an accuracy of 91.51%, closely matching DelRec's 91.72% but with a 99.6% reduction in recurrent parameters. Additionally, inference time was reduced by 52x, demonstrating potential for online streaming applications. On the SSC dataset, CRSNN achieved 78.59% accuracy using baseline hyperparameters, showing a gap of almost 4 percentage points compared to DelRec but with significant improvements in parameter efficiency. Ablation studies showed that removing learnable delays led to significant performance drops, especially on the SHD dataset, where accuracy decreased by more than 2 percentage points, highlighting the importance of adaptive temporal dynamics.

Applications

CRSNN has broad application potential in resource-constrained edge computing:

  • Real-time audio processing and speech recognition, where it provides fast, efficient inference.
  • Neuromorphic hardware and IoT deployments, thanks to reduced parameter overhead and energy consumption.
  • Future work can extend CRSNN to other temporal sequence data, such as gesture recognition and biomedical signal processing.

Limitations & Outlook

Despite CRSNN's impressive parameter efficiency and inference speed, there is room for improvement on certain tasks, particularly the Spiking Speech Commands dataset:

  • Increasing the number of layers may lead to overfitting, calling for further tuning of the model structure.
  • The current implementation primarily targets audio data; other types of temporal sequence data may require adjustments.

Future research can explore how to maintain performance on larger-scale datasets and how to optimize the delay learning mechanism for better generalization.

Plain Language (accessible to non-experts)

Imagine you're cooking in a kitchen. Traditional neural networks are like a pot that needs constant stirring, mixing all ingredients together, and over time, it might burn. Spiking Neural Networks (SNNs) are like a smart oven that only heats when needed, saving energy. Recurrent Spiking Neural Networks (RSNNs) are like a multi-layer oven that can handle different ingredients at different times, ensuring each ingredient is cooked properly. However, traditional RSNNs might have too many ingredients in the pot, making it hard to control. The CRSNN proposed in this paper is like an oven with smart temperature control, using convolutional connections to reduce unnecessary stirring while maintaining precise control over the ingredients. This way, not only do we save energy, but we also ensure that each dish is served at its best timing.

ELI14 (explained like you're 14)

Hey there, young explorers! Today we're diving into something super cool—Spiking Neural Networks (SNNs). Imagine you're playing a super complex video game where you need to control many characters at once. Traditional neural networks are like a controller that needs you to press many buttons at the same time, and your hands get tired. But SNNs are like a smart game controller that only presses buttons when needed, super energy-saving!

Now, Recurrent Spiking Neural Networks (RSNNs) are even cooler. They're like a super controller that remembers your previous moves, helping you make smarter decisions in the game. But sometimes, there are too many buttons on the controller, and you might press the wrong one. So, scientists invented a new controller—CRSNN. It's like a controller with smart touch controls, where you just need to swipe lightly to perform complex actions.

This new controller not only helps you react quickly in the game but also saves battery life. Isn't that awesome? However, this controller might need some tweaks for certain games, like language games, where it might need more tuning. But overall, it provides us with a more efficient and smarter gaming experience. Isn't that cool?

Glossary

Spiking Neural Networks

SNNs are biologically inspired neural network models that encode information as sparse, asynchronous binary events, making them particularly suitable for neuromorphic hardware.

In this paper, SNNs are used to process complex temporal sequence data, providing an efficient computational framework.

Recurrent Spiking Neural Networks

RSNNs enhance temporal modeling capabilities by incorporating recurrent connections and neuron state dynamics, allowing for information integration over extended time horizons.

This paper explores the limitations of RSNNs in handling long-range dependencies and proposes improvements.

Convolutional Recurrent Spiking Neural Networks

CRSNN combines convolutional connections with learnable delay mechanisms, exploiting local temporal correlations to significantly reduce parameter overhead.

This paper proposes CRSNN as a new architecture to enhance temporal modeling capabilities.

DelRec

DelRec is a recurrent spiking neural network architecture that incorporates learnable axonal delays to enhance temporal modeling capabilities.

This paper builds on DelRec by introducing convolutional connections in CRSNN.

Learnable Delays

In neural networks, delays refer to the time required for a signal to travel from one neuron to another. Learnable delays allow the network to automatically adjust these times during training to optimize performance.

This paper enhances temporal modeling capabilities by introducing learnable delays.
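
One standard way to make a discrete delay trainable (the formula below is our reading of the triangular spread function mentioned in the methodology; the paper's exact kernel may differ) is to read past activity through a triangular window centered on the learnable delay d_i:

    \tilde{s}_i(t) \;=\; \sum_{k=0}^{D-1} \max\bigl(0,\, 1 - \lvert k - d_i \rvert\bigr)\, s_i(t - k)

Because each weight is piecewise-linear in d_i, the gradient with respect to the delay is defined almost everywhere, so d_i can be trained by backpropagation alongside the synaptic weights.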

Surrogate Gradient Learning

SGL is a technique for training SNNs: the derivative of the Heaviside spike function, which is zero almost everywhere, is replaced with a smooth approximation during the backward pass, enabling standard backpropagation to optimize SNN parameters.

This paper uses SGL to optimize CRSNN parameters, ensuring efficient model training.
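
A minimal PyTorch sketch of the idea, using a fast-sigmoid surrogate (one common choice; the paper's exact surrogate shape is not specified here). The forward pass is the exact Heaviside step; the backward pass substitutes a smooth derivative:

    import torch

    class SpikeFn(torch.autograd.Function):
        """Heaviside forward, fast-sigmoid surrogate backward."""

        @staticmethod
        def forward(ctx, v, slope=10.0):
            ctx.save_for_backward(v)
            ctx.slope = slope
            return (v > 0).float()  # exact spike: non-differentiable step

        @staticmethod
        def backward(ctx, grad_out):
            (v,) = ctx.saved_tensors
            # derivative of the fast sigmoid: 1 / (slope * |v| + 1)^2
            surrogate = 1.0 / (ctx.slope * v.abs() + 1.0) ** 2
            return grad_out * surrogate, None  # no gradient for the slope

    spike = SpikeFn.apply  # usable inside any autograd graph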

Spiking Heidelberg Digits Dataset

The SHD dataset is a neuromorphic audio benchmark for evaluating spiking neural networks, containing recordings of spoken digits in English and German converted to spike trains with an artificial cochlea model.

This paper evaluates CRSNN's performance on the SHD dataset.

Spiking Speech Commands Dataset

The SSC dataset is a neuromorphic audio benchmark for spoken-command recognition, derived from the Google Speech Commands dataset by converting English spoken words to spike trains; it is larger and more challenging than SHD.

This paper evaluates CRSNN's performance on the SSC dataset.

Axonal Delays

Axonal delays refer to the time delay in signal transmission between neurons, affecting the temporal modeling capabilities of neural networks.

This paper enhances CRSNN's temporal modeling capabilities by introducing learnable axonal delays.

Circular Buffer

A circular buffer is a fixed-size array with a wrap-around write pointer. In recurrent networks it stores the recent spike history, so that recurrent inputs can be read back at any delay within the window.

This paper uses a circular buffer to manage recurrent inputs in CRSNN.
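
A minimal ring-buffer sketch (shapes and API are illustrative assumptions): a single write pointer advances modulo the buffer depth, so keeping the last D time steps costs O(D) memory with O(1) writes per step:

    import torch

    class RingBuffer:
        """Keeps the last `depth` spike vectors; lookback(0) is the newest."""

        def __init__(self, depth, batch, n_neurons):
            self.buf = torch.zeros(depth, batch, n_neurons)
            self.ptr = 0  # next slot to overwrite

        def push(self, spikes):
            self.buf[self.ptr] = spikes
            self.ptr = (self.ptr + 1) % self.buf.shape[0]

        def lookback(self, age):
            """Return the spikes emitted `age` steps ago."""
            return self.buf[(self.ptr - 1 - age) % self.buf.shape[0]]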

Leaky Integrate-and-Fire Neuron

The LIF neuron is a commonly used spiking neuron model that balances biological realism and computational efficiency, suitable for neuromorphic computing.

This paper uses LIF neurons in CRSNN for temporal modeling.
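
A minimal discrete-time LIF update in PyTorch (a textbook form; the paper's neuron parameters and reset scheme are not specified here): the membrane potential decays by a factor beta, integrates the input current, and is reset after a spike:

    import torch

    def lif_step(v, i_in, beta=0.9, threshold=1.0):
        """One Euler step of a leaky integrate-and-fire neuron."""
        v = beta * v + i_in                  # leak, then integrate the input
        spikes = (v >= threshold).float()    # fire where the threshold is crossed
        v = v - spikes * threshold           # soft reset by subtraction
        return spikes, v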

Temporal Modeling

Temporal modeling refers to processing and analyzing temporal sequence data in neural networks to capture temporal dependencies.

This paper enhances CRSNN's temporal modeling capabilities by combining convolution and delay learning.

Neuromorphic Hardware

Neuromorphic hardware is a hardware architecture specifically designed to run spiking neural networks, efficiently processing sparse, asynchronous binary events.

This paper explores the potential application of CRSNN on neuromorphic hardware.

Vanishing Gradients

The vanishing gradient problem is the progressive shrinking of gradients over long sequences in recurrent networks, which makes it difficult for the model to learn long-range dependencies.

This paper alleviates vanishing gradients by introducing learnable delays.

Exploding Gradients

The exploding gradient problem is the unbounded growth of gradients over long sequences in recurrent networks, which destabilizes training.

This paper alleviates exploding gradients by introducing learnable delays.

Open Questions (unanswered questions from this research)

  1. CRSNN performs well on audio data, but its behavior on other temporal sequence data remains to be verified; gesture recognition and biomedical signal processing are natural next targets.
  2. CRSNN's accuracy on the Spiking Speech Commands dataset trails DelRec, indicating room for optimization; adjusting hyperparameters and model structure could improve generalization across tasks.
  3. The current implementation primarily targets neuromorphic hardware; running CRSNN efficiently on conventional hardware and scaling to larger datasets need further study.
  4. The generalization of the delay learning mechanism beyond audio remains to be verified on other types of temporal sequence data.
  5. CRSNN may overfit as the number of layers grows, especially under a reduced parameter budget; improving generalization without adding parameters is an open problem.

Applications

Immediate Applications

Real-time Audio Processing

CRSNN enables fast, efficient audio processing on resource-constrained edge devices, suitable for applications such as speech recognition and audio classification.

Neuromorphic Hardware Applications

By reducing parameter overhead and energy consumption, CRSNN demonstrates potential for IoT device applications, particularly in low-power scenarios.

Online Streaming Applications

CRSNN's fast inference capabilities make it suitable for online streaming applications, allowing efficient real-time processing of audio data.

Long-term Vision

Gesture Recognition

Future research could explore CRSNN's application in gesture recognition, enhancing modeling capabilities for complex temporal sequence data by combining convolution and delay learning.

Biomedical Signal Processing

CRSNN demonstrates significant potential in biomedical signal processing, providing efficient solutions for handling complex biomedical temporal sequence data.

Abstract

Spiking neural networks (SNNs) are rapidly gaining momentum as an alternative to conventional artificial neural networks in resource-constrained edge systems. In this work, we continue a recent research line on recurrent SNNs where axonal delays are learned at runtime along with the other network parameters. The first proposed approach, dubbed DelRec, demonstrated the benefit of recurrent delay learning in SNNs. Here, we extend it by advocating the use of convolutional recurrent connections in conjunction with the DelRec delay learning mechanism. According to our tests on an audio classification task, this leads to a streamlined architecture with smaller memory footprint (around 99% savings in terms of number of recurrent parameters) and a much faster (52x) inference time, while retaining DelRec's accuracy. Our code is available at: https://github.com/luciozebendo/delrec_snn/tree/conv_delays


References (9)

  • Alexandre Queant, Ulysse Rançon, Benoit R. Cottereau et al. (2025). DelRec: learning delays in recurrent spiking neural networks.
  • Ilyass Hammouamri, Ismail Khalfaoui Hassani, T. Masquelier (2023). Learning Delays in Spiking Neural Networks using Dilated Convolutions with Learnable Spacings.
  • J. Eshraghian, Max Ward, Emre O. Neftci et al. (2021). Training Spiking Neural Networks Using Lessons From Deep Learning.
  • Emre O. Neftci, H. Mostafa, Friedemann Zenke (2019). Surrogate Gradient Learning in Spiking Neural Networks: Bringing the Power of Gradient-Based Optimization to Spiking Neural Networks.
  • Maximilian Baronig, Romain Ferrand, Silvester Sabathiel et al. (2024). Advancing spatio-temporal processing through adaptation in spiking neural networks.
  • Shang Xu, Jiayu Zhang, Ziming Wang et al. (2025). ASRC-SNN: Adaptive Skip Recurrent Connection Spiking Neural Network.
  • Benjamin Cramer, Yannik Stradmann, J. Schemmel et al. (2019). The Heidelberg Spiking Data Sets for the Systematic Evaluation of Spiking Neural Networks.
  • Julian Goltz, Jimmy Weber, Laura Kriener et al. (2024). DelGrad: exact event-based gradients for training delays and weights on spiking neuromorphic hardware.
  • Lucas Deckers, Lauren Damme, Ing Jyh Tsang et al. (2023). Co-learning synaptic delays, weights and adaptation in spiking neural networks.