Pruning-induced phases in fully-connected neural networks: the eumentia, the dementia, and the amentia
Study reveals three phases in fully-connected neural networks through dropout pruning: eumentia, dementia, and amentia.
Key Findings
Methodology
The study investigates fully-connected neural networks by independently varying dropout rates during training and evaluation to map the phase diagram. Three distinct phases are identified: eumentia (network learns), dementia (network forgets), and amentia (network cannot learn). These phases are sharply distinguished by the power-law scaling of cross-entropy loss with training dataset size.
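The train/eval dropout manipulation described above can be sketched in a few lines. This is a minimal NumPy illustration of inverted dropout with independently chosen rates, under assumed example values; it is not the paper's implementation, and the rates and sizes are arbitrary.

```python
import numpy as np

def dropout(x, p, rng):
    """Inverted dropout: zero each unit with probability p and rescale
    the survivors by 1/(1-p) so the expected activation is unchanged."""
    if p <= 0.0:
        return x
    mask = rng.random(x.shape) >= p
    return x * mask / (1.0 - p)

rng = np.random.default_rng(0)
x = np.ones(10_000)

# The study varies these two rates independently to map the phase
# diagram; the particular values below are illustrative only.
p_train, p_eval = 0.3, 0.6
h_train = dropout(x, p_train, rng)  # pruning applied while fitting weights
h_eval = dropout(x, p_eval, rng)    # pruning applied again at evaluation
```

Because the two rates are decoupled, a network can be trained lightly pruned but evaluated heavily pruned (or vice versa), which is what makes a two-dimensional phase diagram possible.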
Key Results
- Result 1: In the eumentia phase, the cross-entropy loss decays algebraically with more data, consistent with quasi-long-range order in statistical mechanics.
- Result 2: The transition between eumentia and dementia phases is accompanied by scale invariance, exhibiting characteristics of a Berezinskii-Kosterlitz-Thouless (BKT) transition.
- Result 3: The phase structure is robust across different network widths and depths, demonstrating that dropout-induced pruning provides a concrete framework for understanding neural network behavior.
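The power-law diagnostic behind these results can be sketched as a straight-line fit in log-log space. The data below are synthetic and the exponent 0.5 is an illustrative choice, not a value measured in the paper.

```python
import numpy as np

# Synthetic losses following L(N) = a * N**(-alpha), with a = 3.0 and
# alpha = 0.5 chosen purely for illustration.
sizes = np.array([1_000, 2_000, 5_000, 10_000, 20_000, 50_000])
losses = 3.0 * sizes ** -0.5

# In the eumentia phase, log L vs. log N is a straight line whose slope
# is -alpha; a least-squares fit in log-log space recovers the exponent.
slope, intercept = np.polyfit(np.log(sizes), np.log(losses), 1)
alpha = -slope
```

A clean linear fit (algebraic decay) signals the eumentia phase, while departures from it mark the other phases, which is how the paper distinguishes the three regimes.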
Significance
This research reveals phase transitions in neural networks during pruning from a statistical mechanics perspective, offering a new theoretical framework for understanding overparameterized networks. The findings are significant for academia and industry, providing new insights into model compression and optimization, especially in resource-constrained environments.
Technical Contribution
The technical contribution lies in the first demonstration of phase transitions in neural networks induced by dropout pruning, with detailed experimental validation of their existence and characteristics. The study offers a theoretical explanation of network behavior under different pruning intensities, expanding the understanding of neural network structure and function.
Novelty
This is the first systematic study of phase transitions in neural networks through dropout pruning. Unlike previous studies, this paper not only focuses on the practical effects of pruning but also explores its deeper impact on network behavior from a theoretical perspective.
Limitations
- Limitation 1: The study focuses primarily on fully-connected neural networks and the MNIST dataset, which may not generalize to other types of networks and datasets.
- Limitation 2: While phase transitions are identified, their manifestation in more complex network structures requires further investigation.
- Limitation 3: Verification of the BKT transition relies on finite-size experiments; larger-scale experiments may be needed for confirmation.
Future Work
Future research could extend to other network architectures and datasets to explore the universality of these phase transitions. Additionally, the impact of these transitions on practical applications, such as model compression and optimization, warrants further exploration.
AI Executive Summary
Modern neural networks are often overparameterized, leading to a significant number of redundant neurons and connections. Pruning techniques aim to remove these redundancies to compress the network while maintaining performance. However, whether pruning induces sharp phase transitions in neural networks and to what universality class they belong remain open questions.
This study investigates fully-connected neural networks trained on the MNIST dataset by independently varying dropout rates during training and evaluation to map the phase diagram. Three distinct phases are identified: eumentia (network learns), dementia (network forgets), and amentia (network cannot learn). These phases are sharply distinguished by the power-law scaling of cross-entropy loss with training dataset size.
In the eumentia phase, the cross-entropy loss decays algebraically with more data, consistent with quasi-long-range order in statistical mechanics. The transition between eumentia and dementia phases is accompanied by scale invariance, exhibiting characteristics of a Berezinskii-Kosterlitz-Thouless (BKT) transition. The phase structure is robust across different network widths and depths, demonstrating that dropout-induced pruning provides a concrete framework for understanding neural network behavior.
This research reveals phase transitions in neural networks during pruning from a statistical mechanics perspective, offering a new theoretical framework for understanding overparameterized networks. The findings are significant for academia and industry, providing new insights into model compression and optimization, especially in resource-constrained environments.
However, the study focuses primarily on fully-connected neural networks and the MNIST dataset, which may not generalize to other types of networks and datasets. While phase transitions are identified, their manifestation in more complex network structures requires further investigation. Future research could extend to other network architectures and datasets to explore the universality of these phase transitions. Additionally, the impact of these transitions on practical applications, such as model compression and optimization, warrants further exploration.
Deep Analysis
Background
Modern neural networks are characterized by overparameterization, where the number of parameters far exceeds what is needed to fit the training data. This redundancy is particularly evident in large language models, convolutional networks, and Transformers. While pruning techniques have been developed to reduce this redundancy, whether pruning induces phase transitions in neural networks and to what universality class they belong remain open questions. Recent research has focused on the practical utility of pruning, but the underlying physical mechanisms have not been fully explored.
Core Problem
The core problem is understanding whether pruning induces phase transitions in neural networks and what universality class these transitions belong to. While practical pruning methods are well-developed, it remains unclear whether they cause sharp changes in network behavior. Furthermore, explaining these changes through the lens of statistical mechanics is a challenging problem. Understanding these phase transitions is crucial for optimizing network structures and improving model efficiency.
Innovation
The core innovations of this study include:
- Systematic investigation of phase transitions in neural networks through dropout pruning.
- Identification of three phases: eumentia, dementia, and amentia.
- Introduction of a method to distinguish these phases through the power-law scaling of cross-entropy loss with training dataset size.
- Discovery of BKT-like characteristics in the transition between the eumentia and dementia phases.
Methodology
- Use fully-connected neural networks (FCNNs) trained on the MNIST dataset for experiments.
- Independently vary dropout rates during training and evaluation to map the phase diagram.
- Distinguish phases through the power-law scaling of cross-entropy loss with training dataset size.
- Validate the existence and characteristics of the BKT transition through finite-size experiments.
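The phase-diagram sweep can be sketched as a grid over the two independent dropout rates. Here `train_and_eval` is a hypothetical placeholder for the full train-then-evaluate pipeline, and the rate grid is an assumed example; the paper's actual grid and metric are not specified here.

```python
import itertools

def map_phase_diagram(train_and_eval, rates=(0.0, 0.2, 0.4, 0.6, 0.8)):
    """Sweep training and evaluation dropout rates independently and
    record the metric returned by train_and_eval for each pair.

    train_and_eval is a stand-in for the real pipeline: it should train
    with dropout rate p_train, then evaluate with dropout rate p_eval.
    """
    return {
        (p_train, p_eval): train_and_eval(p_train, p_eval)
        for p_train, p_eval in itertools.product(rates, rates)
    }

# With a trivial stub in place of real training, the sweep yields a
# 5x5 grid of (p_train, p_eval) cells.
grid = map_phase_diagram(lambda p_tr, p_ev: p_tr + p_ev)
```

Each cell of the resulting grid is then classified into one of the three phases by the scaling analysis described above.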
Experiments
The experiments use the MNIST dataset with a fully-connected neural network architecture.
- Dropout rates during training and evaluation are varied independently.
- Cross-entropy loss and classification accuracy serve as evaluation metrics.
- Finite-size experiments are conducted to verify the existence of a BKT transition.
Results
Results show that:
- In the eumentia phase, cross-entropy loss decays algebraically with more data.
- The transition between the eumentia and dementia phases exhibits BKT-like characteristics.
- The phase structure is robust across different network widths and depths.
Applications
Application scenarios include:
- Model compression and optimization, especially in resource-constrained environments.
- Understanding and optimizing other types of neural network architectures.
Limitations & Outlook
- The study focuses primarily on fully-connected neural networks and the MNIST dataset, which may not generalize to other network types and datasets.
- Verification of the BKT transition relies on finite-size experiments; larger-scale experiments may be needed for confirmation.
- Future research could extend to other architectures and datasets to probe the universality of these phase transitions.
Plain Language (Accessible to non-experts)
Imagine a factory with many machines and workers. The goal of this factory is to produce high-quality products, but sometimes there are too many machines and workers, leading to inefficiencies. To improve efficiency, the factory manager decides to reduce some unnecessary machines and workers, similar to the pruning process in neural networks. By removing the excess parts, the factory can improve production efficiency without affecting product quality. However, the manager needs to be careful because if too much is removed, the factory might not function properly. This process is like the eumentia, dementia, and amentia phases mentioned in the study: in the eumentia phase, the factory works well; in the dementia phase, the factory starts having problems; in the amentia phase, the factory can hardly operate. Through this analogy, we can understand the phase transitions in neural networks during pruning.
ELI14 (Explained like you're 14)
Hey there! Imagine you're playing a super complex block-building game. This game has way more blocks than you actually need. To make your tower more stable, you decide to remove some of the less important blocks. That's what scientists do with neural networks. They found that if you remove some unnecessary parts, the network can still work well, just like your block tower. But if you remove too much, the network might forget what it learned or even fail to learn new things. It's like if you take apart your block tower too much, and it falls over. Scientists also found that this process is a bit like some mysterious phenomena in physics. How cool is that!
Glossary
Dropout
A regularization technique that randomly drops neurons during training to prevent overfitting.
In this paper, dropout is used as a pruning method to study phase transitions in neural networks.
Fully-connected Neural Network
A neural network architecture where each neuron in one layer is connected to every neuron in the next layer.
The paper uses fully-connected neural networks for experiments on the MNIST dataset.
Cross-entropy Loss
A loss function used for classification tasks that measures the difference between predicted and true probability distributions.
Cross-entropy loss is used to distinguish different phases.
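As a concrete illustration of this loss (a generic definition, not code from the paper), the mean negative log-probability of the true class can be computed as follows; the example inputs are made up.

```python
import numpy as np

def cross_entropy(probs, labels):
    """Mean negative log-probability assigned to the true class.

    probs:  (n_samples, n_classes) predicted distributions (rows sum to 1)
    labels: (n_samples,) integer class indices
    """
    eps = 1e-12  # guard against log(0)
    return -np.mean(np.log(probs[np.arange(len(labels)), labels] + eps))

# A uniform guess over 10 classes (as in MNIST) gives log(10) ~ 2.303,
# while a confident correct prediction gives a loss near zero.
uniform = np.full((4, 10), 0.1)
labels = np.array([0, 3, 5, 9])
loss_uniform = cross_entropy(uniform, labels)

confident = np.zeros((1, 10))
confident[0, 2] = 1.0
loss_confident = cross_entropy(confident, np.array([2]))
```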
Berezinskii-Kosterlitz-Thouless Transition
A type of phase transition typically occurring in two-dimensional systems, involving the binding and unbinding of topological defects.
The transition between eumentia and dementia phases exhibits BKT-like characteristics.
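For background on the terminology (a standard statistical-mechanics fact, not a fitted form from this paper): approaching a BKT transition from the disordered side, the correlation length diverges with an essential singularity rather than a power law,

```latex
\xi(t) \sim \exp\!\left(\frac{b}{\sqrt{t}}\right), \qquad
t = \frac{T - T_{\mathrm{BKT}}}{T_{\mathrm{BKT}}} \to 0^{+},
```

and correlations decay algebraically throughout the ordered phase, which is the quasi-long-range order referred to elsewhere in this summary.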
Eumentia Phase
In this paper, it refers to the phase where the network can learn effectively.
In the eumentia phase, cross-entropy loss decays algebraically with more data.
Dementia Phase
In this paper, it refers to the phase where the network forgets what it has learned.
In the dementia phase, network performance worsens with more data.
Amentia Phase
In this paper, it refers to the phase where the network cannot learn.
In the amentia phase, the network cannot learn effectively.
Neural Scaling Laws
Empirical rules describing how neural network performance scales with model size and data quantity.
In the eumentia phase, the power-law decay of cross-entropy loss is consistent with neural scaling laws.
Overparameterization
Refers to a neural network having more parameters than necessary to fit the training data.
Modern neural networks are often characterized by overparameterization.
Statistical Mechanics
A branch of physics that uses statistical methods to study the behavior of systems with a large number of particles.
The paper studies phase transitions in neural networks from a statistical mechanics perspective.
Open Questions (Unanswered questions from this research)
1. The study focuses primarily on fully-connected neural networks and the MNIST dataset, so it is unclear whether these phase transitions carry over to other network types and datasets.
2. How the identified phase transitions manifest in more complex network structures, and in particular how to verify a BKT transition in deeper networks, remains open.
3. Verification of the BKT transition relies on finite-size experiments; how to carry out such verification in large-scale networks is worth exploring.
4. Dropout rate is the control parameter used here; whether other parameters probe phase transitions in neural networks more effectively requires further theoretical and experimental study.
5. How do these phase transitions affect model compression and optimization in practice? Answering this would have significant impact on industrial applications.
Applications
Immediate Applications
Model Compression
By identifying and removing redundant parts, enhance the efficiency and performance of neural networks, applicable in resource-constrained environments.
Optimizing Network Structures
By understanding phase transitions, optimize the structure and parameters of neural networks to improve training and inference efficiency.
Theoretical Guidance
Provide theoretical guidance for the design and optimization of neural networks, helping researchers better understand and apply pruning techniques.
Long-term Vision
Insights into Biological Neural Networks
The study's findings may offer new perspectives for understanding synaptic pruning processes in biological neural networks, advancing neuroscience.
Cross-disciplinary Applications
Understanding phase transitions may find applications in other fields (e.g., physics, chemistry), promoting interdisciplinary research.
Abstract
Modern neural networks are heavily overparameterized, and pruning, which removes redundant neurons or connections, has emerged as a key approach to compressing them without sacrificing performance. However, while practical pruning methods are well developed, whether pruning induces sharp phase transitions in the neural networks and, if so, to what universality class they belong, remain open questions. To address this, we study fully-connected neural networks trained on MNIST, independently varying the dropout (i.e., removing neurons) rate at both the training and evaluation stages to map the phase diagram. We identify three distinct phases: eumentia (the network learns), dementia (the network has forgotten), and amentia (the network cannot learn), sharply distinguished by the power-law scaling of the cross-entropy loss with the training dataset size. In the eumentia phase, the algebraic decay of the loss, documented in the machine learning literature as neural scaling laws, is from the perspective of statistical mechanics the hallmark of quasi-long-range order. We demonstrate that the transition between the eumentia and dementia phases is accompanied by scale invariance, with a diverging length scale that exhibits hallmarks of a Berezinskii-Kosterlitz-Thouless-like transition; the phase structure is robust across different network widths and depths. Our results establish that dropout-induced pruning provides a concrete setting in which neural network behavior can be understood through the lens of statistical mechanics.