Training a Predictive Coding Network on ImageNet using Equilibrium Propagation

TL;DR

This paper introduces an equilibrium propagation (EP)-based training method for deep predictive coding networks (PCNs), achieving 13.23% Top-5 error on ImageNet with a 10-layer VGG model, close to the 12.2% baseline of backpropagation.

cs.LG 2026-06-02

Tugdual Kerjan, Rasmus Høier, Benjamin Scellier


Keywords: deep learning, energy-based models, predictive coding, equilibrium propagation, large-scale training

Key Findings

Methodology

This study combines the centered variant of EP with a novel equilibration scheme tailored to PCNs, enabling deep convolutional networks to be trained on large-scale datasets. Training alternates between a free phase (β = 0), in which the network reaches equilibrium through forward passes, and a nudged phase (β ≠ 0), in which a small perturbation applied to the output layer pulls it toward the target and the network equilibrates again. The energy function combines the network's prediction errors with a cost function (cross-entropy or MSE) weighted by β. The gradient of the loss with respect to the model parameters is then estimated by a finite difference of the energy's parameter derivative between equilibria; four difference schemes are compared: random, centered, forward and backward. Hyperparameters such as the perturbation strength β and the number of iterations K are tuned systematically. In GPU-based simulations, the method reaches a Top-5 error of 13.23% on full ImageNet, close to the 12.2% of backpropagation, a significant milestone for the scalability of EP and PCNs.
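The free/nudged procedure can be sketched on a toy fully connected PCN. This is a minimal sketch under stated assumptions, not the paper's VGG10 pipeline or its equilibration scheme: tanh activations, an MSE cost, a nudged phase relaxed by plain gradient descent on the layer states, and "random" read as one-sided nudging with a random sign of β. All function names (free_phase, nudged_phase, ep_grad, backprop_grad) are hypothetical. The EP estimates are compared with a hand-written backpropagation gradient of the same loss.

```python
import numpy as np


def phi(v):
    return np.tanh(v)


def dphi(v):
    return 1.0 - np.tanh(v) ** 2


def pred_errors(W, s):
    # e_l = s_l - W_l phi(s_{l-1}) for l = 1..L; s[0] is the clamped input x.
    return [s[l] - W[l - 1] @ phi(s[l - 1]) for l in range(1, len(s))]


def dE_dW(W, s):
    # Partial derivative of E = sum_l 0.5 * ||e_l||^2 w.r.t. W_l: -e_l phi(s_{l-1})^T.
    e = pred_errors(W, s)
    return [-np.outer(e[l - 1], phi(s[l - 1])) for l in range(1, len(s))]


def free_phase(W, x):
    # beta = 0: every prediction error can be zeroed, so the equilibrium is the forward pass.
    s = [x]
    for Wl in W:
        s.append(Wl @ phi(s[-1]))
    return s


def nudged_phase(W, s_free, y, beta, n_steps=1000, lr=0.2):
    # Relax F = E + beta * C, C = 0.5 * ||s_L - y||^2, by gradient descent on s_1..s_L,
    # warm-started from the free equilibrium (hypothetical relaxation scheme).
    s = [v.copy() for v in s_free]
    L = len(W)
    for _ in range(n_steps):
        e = pred_errors(W, s)
        grads = []
        for l in range(1, L + 1):
            g = e[l - 1].copy()
            if l < L:
                g -= dphi(s[l]) * (W[l].T @ e[l])  # effect of s_l on the next layer's error
            else:
                g += beta * (s[l] - y)             # nudging force on the output layer
            grads.append(g)
        for l in range(1, L + 1):
            s[l] = s[l] - lr * grads[l - 1]
    return s


def ep_grad(W, x, y, beta, scheme="centered", rng=None):
    # Finite-difference EP estimate of dC/dW_l from two equilibria s_{b_hi} and s_{b_lo}.
    pairs = {"centered": (beta, -beta), "forward": (beta, 0.0), "backward": (0.0, -beta)}
    if scheme == "random":
        # Assumption: one-sided nudging with a fair-coin sign of beta.
        b_hi, b_lo = (beta, 0.0) if rng.random() < 0.5 else (0.0, -beta)
    else:
        b_hi, b_lo = pairs[scheme]
    s0 = free_phase(W, x)
    s_hi = s0 if b_hi == 0.0 else nudged_phase(W, s0, y, b_hi)
    s_lo = s0 if b_lo == 0.0 else nudged_phase(W, s0, y, b_lo)
    return [(a - b) / (b_hi - b_lo) for a, b in zip(dE_dW(W, s_hi), dE_dW(W, s_lo))]


def backprop_grad(W, x, y):
    # Reference: backpropagated gradient of C = 0.5 * ||s_L - y||^2 through the forward pass.
    s = free_phase(W, x)
    delta = s[-1] - y
    grads = [None] * len(W)
    for l in range(len(W), 0, -1):
        grads[l - 1] = np.outer(delta, phi(s[l - 1]))
        if l > 1:
            delta = dphi(s[l - 1]) * (W[l - 1].T @ delta)
    return grads


rng = np.random.default_rng(0)
sizes = [4, 6, 3]
W = [0.3 * rng.standard_normal((m, n)) / np.sqrt(n) for n, m in zip(sizes[:-1], sizes[1:])]
x, y = rng.standard_normal(4), rng.standard_normal(3)
bp = backprop_grad(W, x, y)
for scheme in ["forward", "backward", "centered", "random"]:
    ep = ep_grad(W, x, y, beta=1e-3, scheme=scheme, rng=rng)
    print(scheme, max(np.abs(a - b).max() for a, b in zip(ep, bp)))
```

At the free equilibrium every prediction error is zero, so ∂E/∂W vanishes there and the forward estimate reduces to the nudged term alone. The centered estimate cancels the O(β) bias of one-sided nudging, so its printed error should be much smaller than the forward and backward ones.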

Key Results

The VGG10 model trained on the full-size ImageNet dataset achieved a Top-5 error rate of 13.23%, the first successful large-scale application of EP and PCNs, with performance close to the backpropagation baseline of 12.2%.
Different perturbation schemes (random, centered, forward, backward) were compared, revealing that the random scheme remains competitive even at full scale, challenging previous assumptions favoring the centered scheme.
Hyperparameter sensitivity analysis showed that a perturbation strength β anywhere from 0.0002 to 0.1 and at least 4 iterations (K ≥ 4) suffice for stable training, indicating that the approach is robust across settings.
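Each of the compared schemes is a finite difference of the map β ↦ ∂E/∂θ evaluated at the β-nudged equilibrium, so the way their bias depends on β can be illustrated on any smooth stand-in for that map. The sketch uses the hypothetical choice g(β) = exp(β), whose true slope at 0 is 1, over β values spanning the reported range. It assumes the "random" scheme means one-sided nudging with a fair-coin sign of β; under that reading its expected value equals the centered estimate exactly, which offers one possible intuition for why it can stay competitive.

```python
import math


def g(beta):
    # Hypothetical stand-in for beta -> dE/dtheta at the beta-nudged equilibrium.
    return math.exp(beta)


TRUE_SLOPE = 1.0  # g'(0) for g = exp; plays the role of the true loss gradient


def forward(beta):
    return (g(beta) - g(0.0)) / beta


def backward(beta):
    return (g(0.0) - g(-beta)) / beta


def centered(beta):
    return (g(beta) - g(-beta)) / (2.0 * beta)


def random_sign_mean(beta):
    # Expected one-sided estimate when the sign of beta is drawn by a fair coin flip.
    return 0.5 * forward(beta) + 0.5 * backward(beta)


for beta in [0.0002, 0.001, 0.01, 0.1]:
    print(f"beta={beta:<7} forward_err={abs(forward(beta) - TRUE_SLOPE):.1e} "
          f"centered_err={abs(centered(beta) - TRUE_SLOPE):.1e}")
```

The forward (and backward) error shrinks roughly linearly with β, the centered error roughly quadratically, so the gap between schemes is largest at the top of the β range.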

Significance

This work advances the practical applicability of energy-based models and EP to large-scale visual recognition. It shows that EP can scale beyond small datasets and hardware prototypes, opening pathways toward energy-efficient neuromorphic hardware implementations. The near-parity with backpropagation on ImageNet suggests that EP could serve as a biologically plausible alternative for training deep neural networks, potentially reducing energy consumption and hardware complexity. These results also deepen our understanding of the computational properties of physical systems governed by variational principles, bridging neuroscience-inspired models and mainstream deep learning. Training a 10-layer convolutional network on ImageNet sets a new benchmark for EP and motivates research on even deeper and more complex architectures.

Technical Contribution

The core technical innovation is the integration of the centered EP scheme with a novel equilibration scheme tailored to PCNs, which enables stable training of deep networks on large datasets. Key contributions include:
• Designing an energy function that combines prediction errors with a cost term suitable for supervised learning;
• Implementing multiple perturbation schemes (random, centered, forward, backward) and analyzing their empirical performance;
• Developing an efficient GPU-based simulation pipeline that accelerates energy minimization and gradient estimation, overcoming previous computational bottlenecks.
These advances show that EP can be scaled and optimized for large real-world tasks, broadening its applicability in both theoretical and practical contexts.
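For concreteness, a standard way to write a PCN energy, the supervised cost and the EP estimators is given below. This is the textbook formulation under assumed notation (layer states s_l, weights W_l, activation φ, parameters θ = {W_l}), not a transcription of the paper's exact equations, which may include biases and convolutional operators.

```latex
\begin{aligned}
E(\theta, s) &= \sum_{l=1}^{L} \tfrac{1}{2}\,\bigl\lVert s_l - W_l\,\phi(s_{l-1}) \bigr\rVert^2, \qquad s_0 = x, \\
F(\theta, s, \beta) &= E(\theta, s) + \beta\, C(s_L, y), \qquad
s_\star^{\beta} = \arg\min_{s_1, \dots, s_L} F(\theta, s, \beta), \\
C_{\mathrm{MSE}}(s_L, y) &= \tfrac{1}{2}\lVert s_L - y \rVert^2, \qquad
C_{\mathrm{CE}}(s_L, y) = -\sum_k y_k \log \operatorname{softmax}(s_L)_k, \\
\widehat{\nabla}_\theta^{\mathrm{fwd}} &= \frac{1}{\beta}\Bigl(\partial_\theta E(\theta, s_\star^{\beta}) - \partial_\theta E(\theta, s_\star^{0})\Bigr), \qquad
\widehat{\nabla}_\theta^{\mathrm{ctr}} = \frac{1}{2\beta}\Bigl(\partial_\theta E(\theta, s_\star^{\beta}) - \partial_\theta E(\theta, s_\star^{-\beta})\Bigr).
\end{aligned}
```

Because C does not depend on θ, ∂_θ F = ∂_θ E. Both estimators converge to the loss gradient ∇_θ C(s_⋆^0, y) as β → 0; the forward estimator carries an O(β) bias and the centered one an O(β²) bias, and the backward scheme replaces β by −β in the forward formula.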

Novelty

This research is the first to train a deep convolutional network (VGG10) on full-size ImageNet with EP, bridging the gap between small-scale energy-based models and large-scale visual recognition. Unlike prior work limited to low-resolution datasets or small networks, it validates EP’s scalability and robustness in a challenging real-world setting. The systematic comparison of perturbation schemes and hyperparameters also offers new insight into the dynamics and stability of EP-based training, challenging the assumption that centered schemes are always superior. Integrating a novel equilibration scheme for PCNs with the centered variant of EP is a significant methodological advance that opens new avenues for energy-based learning.

Limitations

Despite approaching the backpropagation baseline, the training process remains computationally intensive, requiring extensive GPU resources and long training times, which may hinder practical deployment.
The physical implementation of EP in hardware systems still faces significant challenges, such as energy consumption, noise robustness, and device variability, which are not addressed in this simulation-based study.
The current approach has been validated primarily on convolutional architectures; extending it to transformer-based models or deeper networks like ResNets may encounter stability and convergence issues that need further investigation.

Future Work

Future research will focus on developing hardware-compatible EP implementations, exploring memristor-based or optical energy systems for real-time, low-power training. Additionally, extending the framework to more complex architectures such as ResNets and transformers, incorporating batch normalization and attention mechanisms, will be a priority. Investigating EP’s applicability to unsupervised and reinforcement learning tasks, as well as theoretical analyses of convergence guarantees and energy efficiency, will further solidify its role as a biologically plausible alternative to backpropagation.

AI Executive Summary

The rapid growth of deep learning models has brought about remarkable achievements in tasks like image recognition and natural language processing. However, the reliance on backpropagation (BP) as the core training algorithm poses significant challenges in terms of energy consumption, scalability, and hardware implementation. As models deepen and datasets expand, the need for more biologically plausible, energy-efficient learning algorithms becomes urgent.

Equilibrium propagation (EP), a physics-inspired framework, offers a promising alternative by leveraging the energy minimization principles of physical systems. Historically, EP has demonstrated success in small-scale energy models such as Hopfield networks and resistor circuits, but its application to large-scale neural networks has been limited by computational complexity and stability issues.

This study marks a pivotal breakthrough by successfully training a 10-layer convolutional predictive coding network (VGG10) on the full ImageNet dataset using EP. The authors combine the centered EP scheme with a new equilibration scheme tailored to PCNs, enabling stable and efficient training at scale. They systematically compare perturbation schemes (random, centered, forward and backward) and hyperparameters, and find that the random scheme remains competitive even at full scale. The trained model achieves a Top-5 error rate of 13.23%, close to the 12.2% baseline of BP, which supports EP’s potential for large-scale applications.
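Part of what makes equilibration tractable in PCNs is a standard property: with β = 0, every prediction error can be driven to zero, and the unique minimiser of the energy is the feedforward pass. The sketch checks this numerically by relaxing the free phase from random states, assuming a fully connected PCN with tanh activations and the squared-error energy; the helper names relax_free_phase and forward_pass are hypothetical. It illustrates this general property only and is not the paper's equilibration scheme, whose details the summary does not give.

```python
import numpy as np


def forward_pass(W, x):
    # Feedforward states: s_l = W_l tanh(s_{l-1}), with s_0 = x.
    s = [x]
    for Wl in W:
        s.append(Wl @ np.tanh(s[-1]))
    return s


def relax_free_phase(W, x, n_steps=3000, lr=0.1, seed=1):
    # Gradient descent on E(s) = sum_l 0.5 * ||s_l - W_l tanh(s_{l-1})||^2 over s_1..s_L,
    # starting from random states (s_0 = x stays clamped, beta = 0).
    rng = np.random.default_rng(seed)
    s = [x] + [rng.standard_normal(Wl.shape[0]) for Wl in W]
    L = len(W)
    for _ in range(n_steps):
        e = [s[l] - W[l - 1] @ np.tanh(s[l - 1]) for l in range(1, L + 1)]
        grads = []
        for l in range(1, L + 1):
            g = e[l - 1].copy()
            if l < L:
                g -= (1.0 - np.tanh(s[l]) ** 2) * (W[l].T @ e[l])
            grads.append(g)
        for l in range(1, L + 1):
            s[l] = s[l] - lr * grads[l - 1]
    return s


rng = np.random.default_rng(0)
sizes = [4, 6, 5, 3]
W = [0.3 * rng.standard_normal((m, n)) / np.sqrt(n) for n, m in zip(sizes[:-1], sizes[1:])]
x = rng.standard_normal(4)
relaxed, ff = relax_free_phase(W, x), forward_pass(W, x)
print(max(np.abs(a - b).max() for a, b in zip(relaxed[1:], ff[1:])))
```

Since the only critical point of this energy is the zero-error state, the iterative relaxation and the single forward pass agree; the forward pass is simply the cheap way to reach it.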

The experimental results are significant for both theoretical and practical reasons. They challenge the conventional wisdom that centered schemes are always superior, showing that the simpler random scheme remains competitive. The robustness of the approach across datasets and hyperparameters underscores its viability. Importantly, this work paves the way for energy-efficient, hardware-friendly neural networks, especially in resource-constrained environments such as edge devices and neuromorphic chips.

Despite these advances, challenges remain. The training process is computationally intensive, and translating these simulation results into physical hardware implementations involves addressing issues like device variability, energy consumption, and noise robustness. Future work will focus on hardware realization, extending the framework to deeper and more complex architectures, and exploring applications beyond supervised image classification. Overall, this research significantly advances the frontier of energy-based learning, bridging neuroscience-inspired models with mainstream deep learning, and setting the stage for sustainable, scalable AI systems.

Deep Dive

Abstract

Equilibrium Propagation (EP) is a physics-based training framework that has primarily been employed in energy-based models, including continuous Hopfield networks, nonlinear resistive networks and coupled phase oscillators. However, EP's practical applications have so far remained limited to relatively small-scale problems. Predictive coding networks (PCNs), another class of energy-based models rooted in computational neuroscience, are typically trained with a specialized algorithm and have likewise not yet been demonstrated at large scale. In this work, we develop an EP-based training method for PCNs which combines the centered variant of EP with a novel equilibration scheme for PCNs. Using this approach, we train a 10-layer convolutional PCN (VGG10) on full-size ImageNet, achieving 13.23% test error rate on the top-5 classification task, close to the 12.2% backpropagation baseline. To our knowledge, this is the first demonstration of both PCNs and EP-based training at ImageNet scale. These results significantly extend the scalability of both approaches and suggest that the primary challenges in scaling EP in other physical systems may come more from the computational properties of these systems than from inherent limitations of the EP framework.

cs.LG cond-mat.dis-nn cs.NE
