Training a Predictive Coding Network on ImageNet using Equilibrium Propagation
This paper introduces an equilibrium propagation (EP)-based training method for deep predictive coding networks (PCNs), achieving 13.23% Top-5 error on ImageNet with a 10-layer VGG model, close to the 12.2% baseline of backpropagation.
Key Findings
Methodology
This study combines the centered variant of EP with a novel equilibration scheme tailored to PCNs, which makes it possible to train deep convolutional networks on large-scale datasets. The energy function combines the network's prediction errors with a cost function (cross-entropy or MSE). Training alternates between a free phase (β=0), where the network reaches equilibrium via forward passes, and a nudged phase (β≠0), where a small perturbation (nudging) is applied to the output layer and the network is equilibrated again. Finite-difference schemes (random, centered, forward, backward) then estimate the gradient of the loss with respect to the model parameters from the differences between the energy's parameter derivatives at these equilibria. Hyperparameters such as the perturbation strength β and the number of iterations K are tuned systematically. GPU-based simulations show that the method reaches a Top-5 error of 13.23% on full-size ImageNet, close to the 12.2% obtained with backpropagation, a milestone for the scalability of both EP and PCNs.
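The free and nudged phases described above can be sketched on a toy problem. The example below is a minimal illustration under stated assumptions, not the paper's implementation: it uses a hypothetical two-layer linear network, a squared-error cost, plain gradient descent on the states as the equilibration procedure, and made-up names (`equilibrate`, `energy_grads`). It shows that the centered EP estimate agrees with the analytic gradient of the loss.

```python
import numpy as np

# Toy setup (all values are illustrative assumptions).
rng = np.random.default_rng(0)
x = rng.normal(size=4)                # clamped input
y = rng.normal(size=2)                # target
W1 = 0.3 * rng.normal(size=(3, 4))    # hidden-layer weights
W2 = 0.3 * rng.normal(size=(2, 3))    # output-layer weights


def equilibrate(h, o, beta, steps=3000, lr=0.1):
    """Relax the states (h, o) by gradient descent on F = E + beta * C, where
    E = 0.5*|h - W1 x|^2 + 0.5*|o - W2 h|^2 and C = 0.5*|o - y|^2."""
    for _ in range(steps):
        eps_h = h - W1 @ x                 # prediction error at the hidden layer
        eps_o = o - W2 @ h                 # prediction error at the output layer
        grad_h = eps_h - W2.T @ eps_o      # dF/dh
        grad_o = eps_o + beta * (o - y)    # dF/do (nudging acts on the output only)
        h = h - lr * grad_h
        o = o - lr * grad_o
    return h, o


def energy_grads(h, o):
    """Partial derivatives of E with respect to W1 and W2 at fixed states."""
    dW1 = -np.outer(h - W1 @ x, x)
    dW2 = -np.outer(o - W2 @ h, h)
    return dW1, dW2


beta = 1e-3
# Free phase (beta = 0): converges to the feedforward states.
h0, o0 = equilibrate(np.zeros(3), np.zeros(2), beta=0.0)
# Two nudged phases with opposite signs, warm-started from the free equilibrium.
g_pos = energy_grads(*equilibrate(h0, o0, beta=+beta))
g_neg = energy_grads(*equilibrate(h0, o0, beta=-beta))

# Centered finite-difference estimate of the loss gradient.
ep_dW1 = (g_pos[0] - g_neg[0]) / (2 * beta)
ep_dW2 = (g_pos[1] - g_neg[1]) / (2 * beta)

# Analytic gradient of the loss 0.5*|W2 W1 x - y|^2 for comparison.
out = W2 @ (W1 @ x)
bp_dW1 = np.outer(W2.T @ (out - y), x)
bp_dW2 = np.outer(out - y, W1 @ x)

print("max |EP - analytic| for W1:", np.abs(ep_dW1 - bp_dW1).max())
print("max |EP - analytic| for W2:", np.abs(ep_dW2 - bp_dW2).max())
```

In this sketch the parameter update uses only the layer-wise prediction errors and states at the two nudged equilibria, so no backward pass through the network is needed.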
Key Results
- The VGG10 model trained on the full-size ImageNet dataset achieved a Top-5 error rate of 13.23%, close to the backpropagation baseline of 12.2%, in what the authors describe as the first demonstration of EP and PCNs at ImageNet scale.
- Different perturbation schemes (random, centered, forward, backward) were compared, revealing that the random scheme remains competitive even at full scale, challenging previous assumptions favoring the centered scheme.
- Hyperparameter sensitivity analysis showed that perturbation strength β in the range 0.0002 to 0.1 and at least 4 iterations (K≥4) are sufficient for stable training, validating the robustness of the approach across various settings.
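For reference, the perturbation schemes compared above can be written as finite-difference estimators of the loss gradient. The notation is an assumption introduced here, not taken from the paper: F is the total energy, s_β the equilibrium state at nudging strength β, and L the loss at the free equilibrium. The summary does not define the "random" scheme; the last line assumes one common reading, in which the sign of β is drawn at random at each training step.

```latex
% Assumed notation (not from the paper):
%   F(\theta,\beta,s) = E(\theta,s) + \beta\,C(s,y)
%   s_\beta = \arg\min_s F(\theta,\beta,s), \qquad \mathcal{L}(\theta) = C(s_0,y)
\begin{align*}
\frac{d\mathcal{L}}{d\theta}
  &= \left.\frac{d}{d\beta}\right|_{\beta=0}
     \frac{\partial F}{\partial\theta}(\theta,\beta,s_\beta)
  && \text{(quantity being estimated)} \\
\hat g_{\mathrm{forward}}
  &= \frac{1}{\beta}\left[
     \frac{\partial F}{\partial\theta}(\theta,\beta,s_{\beta})
     - \frac{\partial F}{\partial\theta}(\theta,0,s_{0})\right]
  && \text{bias } O(\beta) \\
\hat g_{\mathrm{backward}}
  &= \frac{1}{\beta}\left[
     \frac{\partial F}{\partial\theta}(\theta,0,s_{0})
     - \frac{\partial F}{\partial\theta}(\theta,-\beta,s_{-\beta})\right]
  && \text{bias } O(\beta) \\
\hat g_{\mathrm{centered}}
  &= \frac{1}{2\beta}\left[
     \frac{\partial F}{\partial\theta}(\theta,\beta,s_{\beta})
     - \frac{\partial F}{\partial\theta}(\theta,-\beta,s_{-\beta})\right]
  && \text{bias } O(\beta^2) \\
\hat g_{\mathrm{random}}
  &= \begin{cases}
       \hat g_{\mathrm{forward}} & \text{with probability } 1/2 \\
       \hat g_{\mathrm{backward}} & \text{with probability } 1/2
     \end{cases}
  && \text{(assumed definition)}
\end{align*}
```

Under this reading, the first-order bias terms of the forward and backward estimators have opposite signs, so they cancel on average across steps. That would be one plausible reason for the random scheme to stay competitive with the centered one while needing a single nudged phase per step.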
Significance
This work significantly advances the practical applicability of energy-based models and EP in large-scale visual recognition tasks. It demonstrates that EP can scale beyond small datasets and hardware prototypes, opening pathways for energy-efficient, neuromorphic hardware implementations. The near-parity with backpropagation performance on ImageNet suggests that EP could serve as a biologically plausible alternative for training deep neural networks, potentially reducing energy consumption and hardware complexity. Moreover, these results foster deeper understanding of the computational properties of physical systems governed by variational principles, bridging neuroscience-inspired models with mainstream deep learning. The successful training of a 10-layer convolutional network on ImageNet sets a new benchmark, inspiring future research on even deeper and more complex architectures.
Technical Contribution
The core technical contribution is the integration of the centered EP scheme with a novel equilibration scheme tailored to PCNs, which enables stable training of deep networks on large datasets. Key contributions include:
- Designing an energy function that combines prediction errors with a cost term suitable for supervised learning.
- Implementing multiple perturbation schemes (random, centered, forward, backward) and analyzing their empirical performance.
- Developing an efficient GPU-based simulation pipeline that accelerates energy minimization and gradient estimation, overcoming previous computational bottlenecks.
These advances show that EP can be scaled to real-world large-scale tasks, broadening its applicability in both theoretical and practical contexts.
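An energy function of the kind described above, combining prediction errors with a cost term, is often written in the following form in the predictive coding literature. This is a sketch under assumed notation; the paper's exact energy, layer functions, and cost may differ.

```latex
% Assumed notation (not from the paper):
%   s_0 = x is the clamped input, s_1, \dots, s_N are the layer states,
%   f_k is the prediction made for layer k with parameters \theta_k,
%   C is the supervised cost (e.g. cross-entropy or MSE), y is the target.
\begin{align*}
E(\theta, s) &= \sum_{k=1}^{N} \tfrac{1}{2}\,
   \bigl\| s_k - f_k(s_{k-1};\theta_k) \bigr\|^2 \\
F(\theta, \beta, s) &= E(\theta, s) + \beta\, C(s_N, y)
\end{align*}
```

With this form, setting s_k = f_k(s_{k-1}; θ_k) layer by layer makes every prediction error vanish, so at β = 0 a single feedforward pass already yields a minimum of E. The nudged phases (β ≠ 0) are the ones that require iterative relaxation of the states.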
Novelty
This research is the first to train a deep convolutional network (VGG10) on full-size ImageNet using EP, bridging the gap between small-scale energy-based models and large-scale visual recognition. Unlike prior work limited to low-resolution datasets or small networks, this study validates EP’s scalability and robustness in a challenging real-world setting. In addition, the systematic comparison of multiple perturbation schemes and hyperparameters provides new insight into the dynamics and stability of EP-based training, challenging the assumption that centered schemes are always superior. The integration of a novel equilibration scheme with the traditional EP framework is a significant methodological advance for energy-based learning.
Limitations
- Although performance is close to the backpropagation baseline, training remains computationally intensive, requiring extensive GPU resources and long training times, which may hinder practical deployment.
- The physical implementation of EP in hardware systems still faces significant challenges, such as energy consumption, noise robustness, and device variability, which are not addressed in this simulation-based study.
- The current approach has been validated primarily on convolutional architectures; extending it to transformer-based models or deeper networks like ResNets may encounter stability and convergence issues that need further investigation.
Future Work
Future research will focus on developing hardware-compatible EP implementations, exploring memristor-based or optical energy systems for real-time, low-power training. Additionally, extending the framework to more complex architectures such as ResNets and transformers, incorporating batch normalization and attention mechanisms, will be a priority. Investigating EP’s applicability to unsupervised and reinforcement learning tasks, as well as theoretical analyses of convergence guarantees and energy efficiency, will further solidify its role as a biologically plausible alternative to backpropagation.
AI Executive Summary
The rapid growth of deep learning models has brought about remarkable achievements in tasks like image recognition and natural language processing. However, the reliance on backpropagation (BP) as the core training algorithm poses significant challenges in terms of energy consumption, scalability, and hardware implementation. As models deepen and datasets expand, the need for more biologically plausible, energy-efficient learning algorithms becomes urgent.
Equilibrium propagation (EP), a physics-inspired framework, offers a promising alternative by leveraging the energy minimization principles of physical systems. Historically, EP has demonstrated success in small-scale energy models such as Hopfield networks and resistor circuits, but its application to large-scale neural networks has been limited by computational complexity and stability issues.
This study trains a 10-layer convolutional predictive coding network (VGG10) on the full ImageNet dataset using EP. The authors combine the centered EP scheme with a new equilibration scheme tailored to PCNs, enabling stable and efficient training at scale. They systematically compare perturbation schemes (random, centered, forward, backward) and hyperparameters, finding that the random scheme remains competitive even at full scale. The trained model achieves a Top-5 error rate of 13.23%, close to the 12.2% BP baseline, which supports EP’s potential for large-scale applications.
The experimental results are significant for both theoretical and practical reasons. They challenge the conventional wisdom that centered schemes are always superior, showing that simpler random schemes can perform equally well. The robustness of the approach across datasets and hyperparameters underscores its viability. Importantly, this work paves the way for energy-efficient, hardware-friendly neural networks, especially in resource-constrained environments such as edge devices and neuromorphic chips.
Despite these advances, challenges remain. The training process is computationally intensive, and translating these simulation results into physical hardware implementations involves addressing issues like device variability, energy consumption, and noise robustness. Future work will focus on hardware realization, extending the framework to deeper and more complex architectures, and exploring applications beyond supervised image classification. Overall, this research significantly advances the frontier of energy-based learning, bridging neuroscience-inspired models with mainstream deep learning, and setting the stage for sustainable, scalable AI systems.
Deep Dive
Abstract
Equilibrium Propagation (EP) is a physics-based training framework that has primarily been employed in energy-based models, including continuous Hopfield networks, nonlinear resistive networks and coupled phase oscillators. However, EP's practical applications have so far remained limited to relatively small-scale problems. Predictive coding networks (PCNs), another class of energy-based models rooted in computational neuroscience, are typically trained with a specialized algorithm and have likewise not yet been demonstrated at large scale. In this work, we develop an EP-based training method for PCNs which combines the centered variant of EP with a novel equilibration scheme for PCNs. Using this approach, we train a 10-layer convolutional PCN (VGG10) on full-size ImageNet, achieving 13.23% test error rate on the top-5 classification task, close to the 12.2% backpropagation baseline. To our knowledge, this is the first demonstration of both PCNs and EP-based training at ImageNet scale. These results significantly extend the scalability of both approaches and suggest that the primary challenges in scaling EP in other physical systems may come more from the computational properties of these systems than from inherent limitations of the EP framework.