SRAM-Based Compute-in-Memory Accelerator for Linear-decay Spiking Neural Networks
Proposes an SRAM-based CIM accelerator for linear-decay SNNs, achieving a 15.9x to 69x energy-efficiency improvement.
Key Findings
Methodology
This paper proposes an SRAM-based Compute-in-Memory (CIM) architecture to optimize linear-decay Spiking Neural Networks (SNNs). At the algorithmic level, a linear decay approximation replaces the conventional exponential membrane decay, converting costly multiplications into simple additions with only about 1% accuracy drop. At the architectural level, an in-memory parallel update scheme is designed to perform in-place decay directly within the SRAM array, eliminating the need for global sequential updates. This method significantly enhances the energy efficiency of SNN inference.
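The core algorithmic change can be sketched in a few lines: the conventional LIF update multiplies the membrane potential by a decay factor each timestep, while the linear-decay variant subtracts a constant instead. The sketch below is illustrative, not the paper's implementation; the parameter names (`beta`, `d`, `v_th`) and the reset-to-zero behavior are assumptions.

```python
import numpy as np

def lif_step_exponential(v, inputs, beta=0.9, v_th=1.0):
    """Conventional LIF: exponential decay needs a multiply per neuron."""
    v = beta * v + inputs                 # multiplication dominates update cost
    spikes = (v >= v_th).astype(np.float32)
    v = np.where(spikes > 0, 0.0, v)      # assumed reset-to-zero after firing
    return v, spikes

def lif_step_linear(v, inputs, d=0.1, v_th=1.0):
    """LD-LIF-style update: linear decay replaces the multiply with a
    subtraction, clamped at zero so the potential never decays below rest."""
    v = np.maximum(v - d, 0.0) + inputs   # addition only, no multiplier needed
    spikes = (v >= v_th).astype(np.float32)
    v = np.where(spikes > 0, 0.0, v)
    return v, spikes
```

In hardware terms, the subtraction maps to a simple adder (or an in-array decrement), which is what enables the in-memory parallel update described above.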
Key Results
- Evaluated on benchmark SNN workloads, the proposed method achieves a 1.1x to 16.7x reduction in SOP energy consumption and 15.9x to 69x higher energy efficiency, with negligible accuracy loss.
- Tested on N-MNIST, SHD, and DVS Gesture datasets, the linear decay model's accuracy decreases by only 0.96%, 1.11%, and 0.38%, respectively, demonstrating the method's effectiveness.
- Compared to a traditional digital implementation of the v-LIF model, the LD-LIF module achieves an approximately 5.2x improvement in energy efficiency in a TSMC 65 nm process.
Significance
This research significantly improves the energy efficiency and real-time processing capability of Compute-in-Memory architectures by optimizing the state update process in Spiking Neural Networks. In traditional SNN inference, state updates are the primary latency and energy bottleneck. This method successfully addresses this issue through linear decay and in-memory parallel updates. It holds significant academic value and provides a low-power, scalable solution for neuromorphic processing in the industry.
Technical Contribution
Technically, this paper contributes a novel linear-decay spiking neural network model that simplifies computational complexity without significantly reducing accuracy. The proposed SRAM in-memory parallel update scheme overcomes the bottleneck of state updates in traditional CIM architectures, offering new engineering possibilities for efficient neuromorphic processors.
Novelty
This work is the first to introduce a linear decay model in CIM architectures to optimize the state-update process of spiking neural networks. Compared to previous exponential decay models, linear decay is simpler to implement in hardware and significantly enhances energy efficiency without compromising accuracy.
Limitations
- While the linear decay model excels in energy efficiency, it may not fully replace traditional exponential decay models in certain applications requiring highly precise neural activity simulation.
- The method's performance on more complex SNN architectures or larger datasets has not been verified, potentially requiring further optimization and adjustments.
- Due to the use of SRAM technology, there may be limitations in cost and scalability, especially in large-scale commercial applications.
Future Work
Future research directions include validating the method's effectiveness on more complex spiking neural network architectures and exploring the application of other memory technologies (such as RRAM, PCM) in CIM architectures. Further optimization of linear decay model parameters to adapt to different application scenarios and requirements is also an important research direction.
AI Executive Summary
Spiking Neural Networks (SNNs) have emerged as a biologically inspired alternative to conventional deep networks, offering event-driven and energy-efficient computation. However, their throughput is constrained by the serial update of neuron membrane states. While many hardware accelerators and Compute-in-Memory (CIM) architectures efficiently parallelize the synaptic operation (W x I), achieving O(1) complexity for matrix-vector multiplication, the subsequent state update step still requires O(N) time to refresh all neuron membrane potentials. This mismatch makes state update the dominant latency and energy bottleneck in SNN inference.
To address this challenge, this paper proposes an SRAM-based CIM architecture with Linear Decay Leaky Integrate-and-Fire (LD-LIF) Neurons that co-optimizes algorithm and hardware. At the algorithmic level, the conventional exponential membrane decay is replaced with a linear decay approximation, converting costly multiplications into simple additions while accuracy drops only around 1%. At the architectural level, an in-memory parallel update scheme is introduced to perform in-place decay directly within the SRAM array, eliminating the need for global sequential updates.
Evaluated on benchmark SNN workloads, the proposed method achieves a 1.1x to 16.7x reduction in SOP energy consumption, while providing 15.9x to 69x more energy efficiency, with negligible accuracy loss. Experimental results show that the LD-LIF neurons achieve accuracy decreases of only 0.96%, 1.11%, and 0.38% on N-MNIST, SHD, and DVS Gesture datasets, respectively.
This research holds significant academic value and provides a low-power, scalable solution for neuromorphic processing in the industry by optimizing the state update process in Spiking Neural Networks. It significantly improves the energy efficiency and real-time processing capability of Compute-in-Memory architectures.
However, while the linear decay model excels in energy efficiency, it may not fully replace traditional exponential decay models in certain applications requiring highly precise neural activity simulation. Future research directions include validating the method's effectiveness on more complex SNN architectures and exploring the application of other memory technologies in CIM architectures.
Deep Analysis
Background
Spiking Neural Networks (SNNs) have gained considerable attention due to their biologically inspired characteristics and energy-efficient computation capabilities. Traditional deep neural networks, despite their excellent performance in many tasks, are limited by high energy consumption and the need for large-scale data, restricting their use in real-time applications. SNNs, by simulating the pulse firing mechanism of biological neurons, achieve event-driven computation, significantly reducing energy consumption. However, the efficiency of SNN inference is limited by the serial update of neuron membrane states, which becomes a bottleneck in large-scale applications. Many studies have attempted to improve the computational efficiency of SNNs through hardware accelerators and Compute-in-Memory (CIM) architectures, but the energy and latency issues of state updates remain.
Core Problem
The core problem of Spiking Neural Networks lies in the high energy consumption and latency of their state update process. Although synaptic operations (W x I) can be efficiently parallelized through CIM architectures, each neuron's membrane potential state update still requires O(N) time. This mismatch makes state updates the primary bottleneck in SNN inference, limiting their potential in real-time applications. Solving this problem is crucial for achieving large-scale, low-power neuromorphic processors.
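The O(1)-versus-O(N) mismatch can be made concrete with a toy cycle-count model. This is a back-of-the-envelope sketch, not the paper's timing model; the assumption that both the CIM matrix-vector multiply and the parallel in-memory update take one cycle each is illustrative.

```python
def inference_cycles(n_neurons, timesteps, parallel_update=False):
    """Toy cycle model: the CIM array does the W x I matrix-vector multiply
    in one cycle per timestep (O(1)), but a sequential state update costs
    one cycle per neuron (O(N)). An in-memory parallel update restores O(1)."""
    mvm_cycles = 1
    update_cycles = 1 if parallel_update else n_neurons
    return timesteps * (mvm_cycles + update_cycles)

# With 1024 neurons over 10 timesteps, the sequential update dominates:
seq = inference_cycles(1024, timesteps=10)                        # 10 * (1 + 1024)
par = inference_cycles(1024, timesteps=10, parallel_update=True)  # 10 * (1 + 1)
```

Even in this crude model, the sequential update accounts for over 99% of the cycles at 1024 neurons, which is why the paper targets the update step rather than the already-parallel synaptic operation.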
Innovation
The core innovations of this paper include:
1. Introduction of a Linear Decay Model: A linear decay approximation replaces the traditional exponential membrane decay, converting costly multiplications into simple additions, significantly reducing computational complexity.
2. In-Memory Parallel Update Scheme: A scheme is designed to perform in-place decay directly within the SRAM array, eliminating the need for global sequential updates, greatly enhancing the efficiency of state updates.
3. Co-optimization of Algorithm and Hardware: The algorithm and hardware are co-optimized to achieve efficient inference of spiking neural networks.
Methodology
The methodology of this paper includes the following key steps:
- Algorithm Optimization: A linear decay approximation replaces the traditional exponential membrane decay, converting costly multiplications into simple additions, with only about a 1% accuracy drop.
- Hardware Architecture Design: An SRAM-based CIM architecture is designed to support in-memory parallel updates, performing in-place decay directly within the SRAM array.
- Experimental Validation: Experiments are conducted on the N-MNIST, SHD, and DVS Gesture datasets to evaluate the method's energy efficiency and accuracy.
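The quality of the linear approximation can be illustrated by comparing the two decay curves directly. The sketch below is a toy comparison, not the paper's analysis; matching the linear step to the exponential curve's first decrement is one plausible choice of decay constant, not the paper's stated parameterization.

```python
def decay_traces(v0=1.0, beta=0.9, steps=10):
    """Membrane potential under exponential vs. linear decay with no input.
    The linear step d is chosen to match the exponential's first decrement
    (an illustrative assumption), and the potential is clamped at zero."""
    d = v0 * (1.0 - beta)              # first-step decrement of the exponential
    exp_trace, lin_trace = [v0], [v0]
    for _ in range(steps):
        exp_trace.append(exp_trace[-1] * beta)
        lin_trace.append(max(lin_trace[-1] - d, 0.0))
    return exp_trace, lin_trace
```

The two curves agree exactly on the first step and diverge gradually afterward; because spiking dynamics are dominated by recent inputs near threshold, this divergence translates into the small (about 1%) accuracy drop reported above.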
Experiments
The experimental design tests the proposed LD-LIF model on the N-MNIST, SHD, and DVS Gesture datasets. The baseline is the traditional v-LIF model, with accuracy, energy consumption, and latency as evaluation metrics. The impact of weight quantization is also analyzed, with MLP-1 using 3-bit quantization and MLP-2 and the CNN using 4-bit quantization. The learned linear-decay parameters are analyzed as well.
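The 3-bit and 4-bit weight quantization mentioned above can be sketched with a generic symmetric uniform quantizer. This is an assumption for illustration: the summary does not specify the paper's exact quantization scheme, and the function name and max-abs scaling are hypothetical.

```python
import numpy as np

def quantize_weights(w, n_bits):
    """Symmetric uniform quantization to n_bits (a generic sketch; the
    paper's exact quantizer is not specified in this summary)."""
    q_max = 2 ** (n_bits - 1) - 1            # 3 levels/side for 3-bit, 7 for 4-bit
    scale = np.max(np.abs(w)) / q_max        # max-abs scaling (assumed)
    codes = np.clip(np.round(w / scale), -q_max, q_max).astype(int)
    return codes * scale, codes              # dequantized weights, integer codes

w = np.array([0.8, -0.3, 0.05, -0.71])
w3, codes3 = quantize_weights(w, n_bits=3)   # MLP-1 style: 3-bit
w4, codes4 = quantize_weights(w, n_bits=4)   # MLP-2/CNN style: 4-bit
```

Integer codes of this form are what a CIM bit-cell array would store; the narrower 3-bit format trades a coarser weight grid for smaller, lower-energy array columns.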
Results
Experimental results show that the LD-LIF model's accuracy decreases by only 0.96%, 1.11%, and 0.38% on the N-MNIST, SHD, and DVS Gesture datasets, respectively. Compared to the traditional v-LIF model, the LD-LIF module achieves an approximately 5.2x improvement in energy efficiency in a TSMC 65 nm process. SOP energy consumption is reduced by 1.1x to 16.7x, and energy efficiency is improved by 15.9x to 69x, with negligible accuracy loss.
Applications
This method has direct application potential in low-power, real-time neuromorphic processors, particularly in scenarios requiring high energy efficiency and fast response, such as smart surveillance, autonomous driving, and IoT devices. By optimizing the state update process, this method can significantly enhance the inference efficiency of SNNs without significantly reducing accuracy.
Limitations & Outlook
While the linear decay model excels in energy efficiency, it may not fully replace traditional exponential decay models in certain applications requiring highly precise neural activity simulation. Additionally, the method's performance on more complex SNN architectures or larger datasets has not been verified, potentially requiring further optimization and adjustments. Due to the use of SRAM technology, there may be limitations in cost and scalability, especially in large-scale commercial applications.
Plain Language (accessible to non-experts)
Imagine you're in a kitchen cooking a meal. Traditional spiking neural networks are like a complex recipe that needs to be followed step by step, with precise measurements and mixing, much like the exponential decay model requiring complex calculations. The method proposed in this paper is like a simplified recipe that only needs simple additions and stirring to achieve almost the same taste effect. This is akin to the linear decay model, which saves time and energy by simplifying the calculation steps. What's more, this new recipe can handle multiple dishes at once, similar to the in-memory parallel update scheme that can update multiple neuron states simultaneously, greatly improving efficiency. This way, you can not only cook faster but also save energy, isn't that great?
ELI14 (explained like you're 14)
Hey there, friends! Imagine you're playing a super cool game where your character needs to level up constantly to defeat enemies. The traditional way of leveling up is like completing tasks one by one, which is time-consuming and laborious. But this paper proposes a new method, like giving you a super booster that lets you complete multiple tasks at once and level up quickly! This is like using a linear decay model instead of the traditional exponential model, simple and efficient. Plus, this method can help you save energy, like making your gaming console more power-efficient so you can play longer! Isn't that awesome?
Glossary
Spiking Neural Networks (SNNs)
A type of neural network that mimics the pulse firing mechanism of biological neurons, characterized by high energy efficiency and event-driven computation.
Used for low-power neuromorphic computing.
Compute-in-Memory (CIM)
An architecture that integrates computing functions directly into memory, aiming to reduce energy consumption and latency of data transfer.
Used to optimize synaptic operations in SNNs.
Linear Decay
A model that approximates traditional exponential decay with a linear function, simplifying computational complexity.
Used to replace traditional exponential membrane decay.
SRAM (Static Random Access Memory)
A high-speed, low-power memory technology suitable for frequent read-write operations.
Used to implement the in-memory parallel update scheme.
Leaky Integrate-and-Fire (LIF)
A neuron model that simulates the firing mechanism of neurons through accumulation and decay of membrane potential.
Describes the dynamics of neuron membrane potential.
Energy Efficiency
Refers to the amount of computation that can be completed per unit of energy consumed, often used to evaluate hardware performance.
Measures the performance improvement of CIM architectures.
State Update
Refers to the process of updating neuron membrane potential after receiving input, a key step in SNN inference.
The primary energy and latency bottleneck in SNN inference.
Synaptic Operation
Refers to the computation of weight and input product in neural networks, a fundamental operation in SNN inference.
Achieved efficient parallelization in CIM architectures.
Energy Consumption
Refers to the amount of energy consumed in executing a specific computational task, an important indicator of hardware efficiency.
Used to evaluate the performance of the LD-LIF model.
Parallel Update
A technique for updating multiple neuron states simultaneously, aiming to improve computational efficiency.
Implemented in SRAM arrays for in-place decay.
Open Questions (unanswered questions from this research)
1. While the linear decay model excels in energy efficiency, it may not fully replace traditional exponential decay models in applications requiring highly precise neural activity simulation. Future research needs to explore how to further optimize the linear decay model without significantly reducing accuracy.
2. The method's performance on more complex SNN architectures or larger datasets has not been verified, potentially requiring further optimization and adjustments. Future research needs to validate its effectiveness in larger-scale applications and explore potential performance bottlenecks.
3. Due to the use of SRAM technology, there may be limitations in cost and scalability, especially in large-scale commercial applications. Future research needs to explore the application of other memory technologies (such as RRAM and PCM) in CIM architectures to improve cost-effectiveness and scalability.
4. The choice of linear decay parameters significantly impacts model performance, but there is currently a lack of systematic methods to determine optimal parameters. Future research needs to develop automated parameter optimization methods to improve model adaptability and robustness.
5. Although the in-memory parallel update scheme significantly enhances the efficiency of state updates, latency bottlenecks may remain in certain complex neural network architectures. Future research needs to explore more efficient parallel computing techniques to further improve computational efficiency.
Applications
Immediate Applications
Smart Surveillance
By optimizing the energy efficiency and response speed of SNNs, this method can be used in real-time surveillance systems to improve the accuracy and efficiency of event detection.
Autonomous Driving
In autonomous driving systems, this method can be used to process sensor data in real-time, improving vehicle response speed and safety.
IoT Devices
This method can be used in low-power IoT devices to extend battery life and improve data processing efficiency.
Long-term Vision
Smart Cities
By applying this method in smart city infrastructure, more efficient resource management and real-time data analysis can be achieved, enhancing the level of urban intelligence.
Brain-Computer Interfaces
In brain-computer interface technology, this method can be used to process neural signals in real-time, improving device response speed and user experience.
Abstract
Spiking Neural Networks (SNNs) have emerged as a biologically inspired alternative to conventional deep networks, offering event-driven and energy-efficient computation. However, their throughput remains constrained by the serial update of neuron membrane states. While many hardware accelerators and Compute-in-Memory (CIM) architectures efficiently parallelize the synaptic operation (W x I), achieving O(1) complexity for matrix-vector multiplication, the subsequent state update step still requires O(N) time to refresh all neuron membrane potentials. This mismatch makes state update the dominant latency and energy bottleneck in SNN inference. To address this challenge, we propose an SRAM-based CIM for SNN with Linear Decay Leaky Integrate-and-Fire (LD-LIF) Neuron that co-optimizes algorithm and hardware. At the algorithmic level, we replace the conventional exponential membrane decay with a linear decay approximation, converting costly multiplications into simple additions while accuracy drops only around 1%. At the architectural level, we introduce an in-memory parallel update scheme that performs in-place decay directly within the SRAM array, eliminating the need for global sequential updates. Evaluated on benchmark SNN workloads, the proposed method achieves a 1.1x to 16.7x reduction of SOP energy consumption, while providing 15.9x to 69x more energy efficiency, with negligible accuracy loss relative to original decay models. This work highlights that beyond accelerating the (W x I) computation, optimizing state-update dynamics within CIM architectures is essential for scalable, low-power, and real-time neuromorphic processing.