Graph Energy Matching: Transport-Aligned Energy-Based Modeling for Graph Generation

TL;DR

Introduces Graph Energy Matching (GEM), an energy-based framework that matches or exceeds discrete diffusion models in molecular graph generation.

cs.LG · Advanced · 2026-03-25
Michal Balcerak, Suprosanna Shit, Chinmay Prabhakar, Sebastian Kaltenbach, Michael S. Albergo, Yilun Du, Bjoern Menze
graph generation · energy model · discrete sampling · molecular graph · generative model

Key Findings

Methodology

Graph Energy Matching (GEM) is a generative framework for graphs that learns a permutation-invariant potential energy to provide transport-aligned guidance from noise to data and refines samples within high data likelihood regions. GEM combines the transport map optimization perspective of the Jordan-Kinderlehrer-Otto (JKO) scheme with an energy-driven sampling protocol that enables rapid transport to high-probability regions and exploration of the learned graph distribution.
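The summary does not specify GEM's network architecture, so as a minimal sketch of what a permutation-invariant energy means, the toy scorer below (hypothetical weights, sum pooling, numpy only) assigns the same scalar to a graph under any relabeling of its nodes:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-layer scorer; the paper's actual architecture is not
# given in this summary. Any network that aggregates nodes/edges with a
# symmetric pooling (sum, mean, max) is permutation invariant.
d = 2                                   # node feature dimension
W1 = rng.normal(size=(2 * d, 16))
W2 = rng.normal(size=(16, 1))

def energy(A: np.ndarray, X: np.ndarray) -> float:
    """Scalar energy of a graph with adjacency A (n x n) and features X (n x d)."""
    node_pool = X.sum(axis=0)           # order-independent node summary
    edge_pool = (A @ X).sum(axis=0)     # order-independent neighborhood summary
    h = np.tanh(np.concatenate([node_pool, edge_pool]) @ W1)
    return float(h @ W2)

# Relabeling the nodes leaves the energy unchanged: E(PAP^T, PX) == E(A, X).
n = 5
A = rng.integers(0, 2, size=(n, n))
A = ((A + A.T) > 0).astype(float)       # symmetric adjacency
X = rng.normal(size=(n, d))
P = np.eye(n)[rng.permutation(n)]       # random permutation matrix
assert abs(energy(A, X) - energy(P @ A @ P.T, P @ X)) < 1e-9
```

Sum pooling is what buys the invariance here; message-passing or attention-based energies achieve the same property with more expressive aggregation.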

Key Results

  • On molecular graph benchmarks, GEM matches or exceeds strong discrete diffusion baselines in sample quality. Specifically, on the MOSES dataset, GEM outperforms baseline models in Valid-Unique-Novel (V.U.N.) metrics and achieves lower Fréchet ChemNet Distance (FCD).
  • On the QM9 dataset, GEM achieves V.U. scores comparable to discrete diffusion baselines under noise initialization but significantly surpasses them in novelty, indicating effective exploration from random initialization.
  • GEM maintains strong validity, uniqueness, and novelty in property optimization tasks while improving constraint satisfaction relative to baselines.

Significance

The introduction of GEM is significant for both academia and industry. It addresses the historical gap in sampling efficiency and sample quality of discrete energy models, providing a more efficient framework for graph generation. By explicitly modeling relative likelihood, GEM enables targeted exploration during inference, facilitating compositional generation, property-constrained sampling, and geodesic interpolation between graphs. This capability is particularly important in fields like drug discovery and materials design.

Technical Contribution

GEM provides a novel discrete energy model framework that fundamentally differs from existing discrete diffusion methods. By introducing a JKO-style transport map optimization perspective combined with an energy-driven sampling protocol, it addresses the shortcomings of discrete energy models in sampling efficiency and sample quality. Additionally, GEM's explicit relative likelihood modeling allows for direct integration of constraints and property objectives during inference without retraining.
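One reason explicit energies admit inference-time constraints without retraining is a generic EBM property, sketched below (the helper and its names are illustrative, not the paper's API): energies compose by addition.

```python
def compose_energy(e_data, penalties, weights):
    """Combine a learned energy with weighted constraint penalties.

    Sampling from exp(-(e_data + sum_i w_i * p_i)) reweights the learned
    distribution toward samples satisfying the constraints -- the learned
    e_data itself is never retrained.
    """
    def e(x):
        return e_data(x) + sum(w * p(x) for w, p in zip(weights, penalties))
    return e

# Toy usage: quadratic "learned" energy plus a penalty pulling x toward 1.
e_data = lambda x: x * x
penalty = lambda x: abs(x - 1.0)
e = compose_energy(e_data, [penalty], [2.0])
assert e(1.0) == 1.0   # 1 + 2*0
assert e(0.0) == 2.0   # 0 + 2*1
```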

Novelty

GEM is the first framework to apply transport-aligned energy models in graph generation. Its fundamental innovation compared to existing discrete diffusion models lies in its energy-driven sampling protocol and explicit relative likelihood modeling, enabling more flexible and efficient exploration during inference.

Limitations

  • GEM's sampling efficiency may still be limited on certain complex graph structures, particularly in high-dimensional discrete spaces.
  • The computational cost may significantly increase when handling very large graphs, affecting its practicality in large-scale applications.
  • The training and sampling process of the model may require fine-tuning of hyperparameters to ensure optimal performance.

Future Work

Future research directions include further optimizing GEM's sampling efficiency, particularly in high-dimensional discrete spaces, extending GEM to other graph types such as social networks, and developing more efficient training and sampling algorithms to reduce computational cost.

AI Executive Summary

Graph generation is a central challenge across many scientific and industrial applications, particularly in drug discovery and materials design. Current state-of-the-art models predominantly rely on discrete diffusion processes, generating the target data distribution by progressively denoising samples. However, these methods define distributions only implicitly via noisy intermediate states; without an explicit model on clean graphs, enforcing user-specified properties or constraints during inference is difficult.

Graph Energy Matching (GEM) is a novel generative framework designed to address these issues. GEM learns a permutation-invariant potential energy that provides transport-aligned guidance from noise to data and refines samples within high data likelihood regions. Its energy-driven sampling protocol enables rapid transport to high-probability regions and exploration of the learned graph distribution.

GEM demonstrates outstanding performance on molecular graph benchmarks, matching or exceeding strong discrete diffusion baselines in sample quality. On the MOSES dataset, GEM outperforms baseline models in Valid-Unique-Novel (V.U.N.) metrics and achieves lower Fréchet ChemNet Distance (FCD). On the QM9 dataset, GEM achieves V.U. scores comparable to discrete diffusion baselines under noise initialization but significantly surpasses them in novelty, indicating effective exploration from random initialization.

The introduction of GEM is significant for both academia and industry. It addresses the historical gap in sampling efficiency and sample quality of discrete energy models, providing a more efficient framework for graph generation. By explicitly modeling relative likelihood, GEM enables targeted exploration during inference, facilitating compositional generation, property-constrained sampling, and geodesic interpolation between graphs.

However, GEM's sampling efficiency may still be limited on certain complex graph structures, particularly in high-dimensional discrete spaces, and its computational cost may grow significantly on very large graphs, limiting practicality at scale. Future work includes further optimizing sampling efficiency in high-dimensional discrete spaces, extending GEM to other graph types such as social networks, and developing more efficient training and sampling algorithms to reduce computational cost.

Deep Analysis

Background

Graph generation is a central challenge across many scientific and industrial applications, particularly in drug discovery and materials design. Current state-of-the-art models predominantly rely on discrete diffusion processes, generating the target data distribution by progressively denoising samples. However, these methods define distributions only implicitly via noisy intermediate states; without an explicit model on clean graphs, enforcing user-specified properties or constraints during inference is difficult. Energy-based models offer a complementary perspective by explicitly representing the probability structure of the data distribution through a scalar energy function, enabling direct incorporation of constraints, priors, or property-based objectives at inference time without retraining.

Core Problem

Discrete energy models have historically struggled with sampling efficiency and sample quality, particularly in high-dimensional discrete spaces, where off-support regions contain spurious local minima that trap samplers and destabilize training. Diffusion-based methods, in turn, define distributions only implicitly via noisy intermediate states; without an explicit model on clean graphs, enforcing user-specified properties or constraints during inference is difficult.

Innovation

GEM is the first framework to apply transport-aligned energy models in graph generation. Its fundamental innovations include:


  • Energy-driven sampling protocol: an energy-based switch bridges rapid, gradient-guided transport toward high-probability regions with a mixing regime for exploring the learned graph distribution.

  • Explicit relative likelihood modeling: enables flexible and efficient exploration during inference, facilitating compositional generation, property-constrained sampling, and geodesic interpolation between graphs.

  • JKO-style transport map optimization: motivates learning a permutation-invariant potential energy that provides transport-aligned guidance from noise to data and refines samples within regions of high data likelihood.

Methodology

GEM's methodology involves several key steps:


  • Learning a permutation-invariant potential energy: motivated by the JKO transport-map optimization perspective, the energy provides transport-aligned guidance from noise to data and refines samples within regions of high data likelihood.

  • Energy-driven sampling protocol: an energy-based switch bridges rapid, gradient-guided transport toward high-probability regions with a mixing regime for exploring the learned graph distribution.

  • Explicit relative likelihood modeling: the energy directly represents the probability structure of the data distribution, so constraints, priors, or property objectives can be incorporated at inference time without retraining.

  • Experimental validation: sample quality and sampling efficiency are evaluated on molecular graph benchmarks against discrete diffusion baselines.
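The two-stage sampling protocol can be illustrated on a toy continuous energy; the quadratic energy and switch threshold below are stand-ins, not the paper's actual graph energy or switching criterion:

```python
import numpy as np

rng = np.random.default_rng(1)
mu = np.array([2.0, -1.0])

def E(x):                       # stand-in energy with its mode ("data") at mu
    return float(((x - mu) ** 2).sum())

def grad_E(x):
    return 2.0 * (x - mu)

def sample(x0, switch=0.5, lr=0.1, mix_steps=200, step=0.2):
    x = np.asarray(x0, dtype=float).copy()
    # Stage (i): rapid, gradient-guided transport toward high-probability regions.
    while E(x) > switch:
        x -= lr * grad_E(x)
    # Stage (ii): Metropolis mixing to explore around the mode.
    for _ in range(mix_steps):
        prop = x + step * rng.normal(size=x.shape)
        if rng.random() < np.exp(E(x) - E(prop)):   # accept per energy ratio
            x = prop
    return x

x = sample(rng.normal(size=2) * 5.0)   # start far from the mode
assert E(x) < 10.0                     # ends near the high-likelihood region
```

The energy-based switch (here, crossing the `switch` threshold) is what hands a sample off from deterministic transport to the exploratory mixing regime.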

Experiments

The experiments validate GEM on two molecular graph datasets, QM9 and MOSES. QM9 comprises approximately 134,000 small organic molecules, while MOSES is a larger drug-like benchmark with roughly 1.9 million molecules. Baselines include discrete diffusion models and existing energy models. Key metrics are Valid-Unique-Novel (V.U.N.) and Fréchet ChemNet Distance (FCD). The experiments also cover property optimization tasks and graph-to-graph geodesic interpolation analysis.
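The V.U.N. metrics reduce to simple set arithmetic over generated samples; the sketch below assumes canonical string representations (e.g. canonical SMILES) and a caller-supplied validity predicate (in practice RDKit), since exact conventions vary slightly between papers:

```python
def vun_metrics(generated, training_set, is_valid):
    """Validity, uniqueness, novelty, and the combined V.U.N. fraction.

    generated:    list of canonical strings for generated graphs
    training_set: iterable of canonical strings seen during training
    is_valid:     predicate deciding chemical/structural validity
    """
    valid = [g for g in generated if is_valid(g)]
    unique = set(valid)
    novel = unique - set(training_set)
    validity = len(valid) / len(generated)
    uniqueness = len(unique) / len(valid) if valid else 0.0
    novelty = len(novel) / len(unique) if unique else 0.0
    # combined score: fraction of all samples that are valid, unique, AND novel
    return validity, uniqueness, novelty, len(novel) / len(generated)

# Toy check: 4 samples, one invalid ("XX"), one duplicate, one seen in training.
v, u, n, vun = vun_metrics(["CC", "CC", "CO", "XX"], {"CO"}, lambda s: s != "XX")
assert (v, u, n, vun) == (0.75, 2 / 3, 0.5, 0.25)
```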

Results

The experimental results show that GEM matches or exceeds strong discrete diffusion baselines in sample quality. On the MOSES dataset, GEM outperforms baseline models in Valid-Unique-Novel (V.U.N.) metrics and achieves lower Fréchet ChemNet Distance (FCD). On the QM9 dataset, GEM achieves V.U. scores comparable to discrete diffusion baselines under noise initialization but significantly surpasses them in novelty, indicating effective exploration from random initialization. Additionally, GEM maintains strong validity, uniqueness, and novelty in property optimization tasks while improving constraint satisfaction relative to baselines.

Applications

GEM has broad applications in fields like drug discovery and materials design. Its explicit relative likelihood modeling enables targeted exploration during inference, facilitating compositional generation, property-constrained sampling, and geodesic interpolation between graphs. This capability is particularly important when generating molecular structures that meet specific properties or constraints. Additionally, GEM's efficient sampling protocol makes it advantageous for large-scale graph generation tasks.

Limitations & Outlook

Despite GEM's strong sample quality and sampling efficiency, sampling may still be limited on certain complex graph structures, particularly in high-dimensional discrete spaces, and computational cost may grow significantly on very large graphs, limiting practicality at scale. Future research directions include further optimizing sampling efficiency in high-dimensional discrete spaces, extending GEM to other graph types such as social networks, and developing more efficient training and sampling algorithms to reduce computational cost.

Plain Language Accessible to non-experts

Imagine you're in a kitchen cooking a meal. You have a recipe that tells you what ingredients you need and how to combine them step by step to create a delicious dish. Graph Energy Matching (GEM) is like this recipe, but instead of making a dish, it's generating graphs. Graphs are complex structures made up of nodes (ingredients) and edges (connections). GEM works by using something called an energy model to guide the generation process. It's like having a smart chef who can adjust the recipe based on different conditions to make sure the dish turns out perfect every time. In this way, GEM can generate high-quality graph structures that meet specific properties or constraints, just like a chef can adjust a dish to suit a customer's taste.

ELI14 Explained like you're 14

Hey there! Did you know that scientists sometimes play with building blocks like LEGO? But instead of building towers or cars, they're building graphs! Graphs are like networks made up of lots of points and lines, like social networks or molecular structures. Graph Energy Matching (GEM) is a new method scientists use to create these graphs. Imagine having a super-smart assistant while you're playing LEGO, who helps you pick the best blocks and shows you how to build the coolest structures! That's what GEM does—it can create perfect graph structures based on different needs. Isn't that awesome?

Glossary

Graph Energy Matching (GEM)

A generative framework for graphs that learns a permutation-invariant potential energy to provide transport-aligned guidance from noise to data and refines samples within high data likelihood regions.

GEM is the core method proposed in this paper to address the shortcomings of discrete energy models in sampling efficiency and sample quality.

Energy-Based Model (EBM)

Represents the probability structure of the data distribution through a scalar energy function, enabling direct incorporation of constraints, priors, or property-based objectives at inference time without retraining.

EBMs provide the foundation for GEM, allowing it to perform targeted exploration during inference.

Jordan-Kinderlehrer-Otto (JKO) Scheme

A transport map optimization perspective used to define transport-aligned guidance from noise to data.

The JKO scheme provides the theoretical basis for GEM's transport alignment.
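Concretely, one JKO step selects the next density by balancing a Wasserstein movement cost against a free-energy objective (standard form of the scheme; here $\tau$ is the step size and $\mathcal{F}$ the free-energy functional):

```latex
\rho_{k+1} \;=\; \operatorname*{arg\,min}_{\rho}\; \frac{1}{2\tau}\, W_2^2(\rho, \rho_k) \;+\; \mathcal{F}(\rho)
```

Iterating this step traces a discrete-time gradient flow of $\mathcal{F}$ in Wasserstein space, which is the sense in which GEM's guidance is "transport-aligned."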

Fréchet ChemNet Distance (FCD)

A metric used to evaluate the similarity between generated molecules and real molecule distributions.

FCD is a key metric used in this paper to evaluate GEM's performance in molecular graph generation tasks.
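FCD follows the standard Fréchet (2-Wasserstein) distance between Gaussian fits of ChemNet activations, with $(\mu_g, \Sigma_g)$ from generated and $(\mu_r, \Sigma_r)$ from reference molecules:

```latex
\mathrm{FCD} \;=\; \lVert \mu_g - \mu_r \rVert_2^2 \;+\; \operatorname{Tr}\!\left( \Sigma_g + \Sigma_r - 2\,(\Sigma_g \Sigma_r)^{1/2} \right)
```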

Valid-Unique-Novel (V.U.N.)

Evaluates the performance of generated graph structures in terms of validity, uniqueness, and novelty.

V.U.N. is a key metric used in this paper to evaluate GEM's performance in molecular graph generation tasks.

Discrete Diffusion Model

Generates target data distributions by progressively denoising samples.

Discrete diffusion models are one of the baseline comparisons for GEM.

Sampling Protocol

Mechanism for rapid transport to high-probability regions and exploration of the learned graph distribution.

GEM's sampling protocol combines rapid transport and mixing exploration mechanisms.

Transport-Aligned Guidance

Guidance whose direction agrees with an optimal-transport path from the noise distribution to the data distribution, obtained here from a learned permutation-invariant potential energy.

Transport-aligned guidance is one of GEM's core innovations.

Permutation-Invariant Potential Energy

An energy function that is invariant to node permutations, used to guide the graph generation process.

Permutation-invariant potential energy is a core component of GEM.

Geodesic Interpolation

Interpolation between two graphs along a path that stays within high-likelihood regions of the learned distribution, yielding plausible intermediate structures.

Geodesic interpolation is one of GEM's capabilities for targeted exploration during inference.

Open Questions Unanswered questions from this research

  1. How can GEM's sampling efficiency be further optimized, particularly in high-dimensional discrete spaces? Existing methods may still be limited on certain complex graph structures, requiring more efficient sampling algorithms.
  2. How can computational costs be reduced when handling very large graphs? GEM's computational cost may increase significantly, affecting its practicality in large-scale applications and necessitating more efficient training and sampling algorithms.
  3. What is GEM's potential in other types of graph structures, such as social network graphs? Its performance and applicability across different graph domains remain to be explored.
  4. How can GEM's hyperparameter tuning process be simplified without affecting performance? Training and sampling may require careful hyperparameter tuning to ensure optimal results.
  5. How can more domain knowledge be integrated into GEM to enhance performance in specific applications? Better ways to encode domain knowledge into energy models are needed.

Applications

Immediate Applications

Drug Discovery

GEM can be used to generate molecular structures that meet specific pharmacological properties, helping to accelerate the drug discovery process.

Materials Design

By generating material structures with specific physical or chemical properties, GEM can play a significant role in the design and development of new materials.

Chemical Synthesis Pathway Optimization

GEM can be used to generate molecular structures that meet specific synthesis pathway constraints, optimizing the chemical synthesis process.

Long-term Vision

Personalized Medicine Design

By generating molecular structures that meet individual-specific needs, GEM can play a significant role in personalized medicine design.

Smart Materials Development

GEM can be used to generate material structures with adaptive or smart properties, driving the development and application of smart materials.

Abstract

Energy-based models for discrete domains, such as graphs, explicitly capture relative likelihoods, naturally enabling composable probabilistic inference tasks like conditional generation or enforcing constraints at test-time. However, discrete energy-based models typically struggle with efficient and high-quality sampling, as off-support regions often contain spurious local minima, trapping samplers and causing training instabilities. This has historically resulted in a fidelity gap relative to discrete diffusion models. We introduce Graph Energy Matching (GEM), a generative framework for graphs that closes this fidelity gap. Motivated by the transport map optimization perspective of the Jordan-Kinderlehrer-Otto (JKO) scheme, GEM learns a permutation-invariant potential energy that simultaneously provides transport-aligned guidance from noise toward data and refines samples within regions of high data likelihood. Further, we introduce a sampling protocol that leverages an energy-based switch to seamlessly bridge: (i) rapid, gradient-guided transport toward high-probability regions to (ii) a mixing regime for exploration of the learned graph distribution. On molecular graph benchmarks, GEM matches or exceeds strong discrete diffusion baselines. Beyond sample quality, explicit modeling of relative likelihood enables targeted exploration at inference time, facilitating compositional generation, property-constrained sampling, and geodesic interpolation between graphs.

cs.LG · cs.AI · stat.ML

References (20)

1. GraphEBM: Molecular Graph Generation with Energy-Based Models — Meng Liu, Keqiang Yan, Bora Oztekin et al., 2021
2. Informed Proposals for Local MCMC in Discrete Spaces — Giacomo Zanella, 2017
3. Oops I Took A Gradient: Scalable Sampling for Discrete Distributions — Will Grathwohl, Kevin Swersky, Milad Hashemi et al., 2021
4. Molecular Sets (MOSES): A Benchmarking Platform for Molecular Generation Models — Daniil Polykovskiy, Alexander Zhebrak, Benjamín Sánchez-Lengeling et al., 2018
5. Training Products of Experts by Minimizing Contrastive Divergence — Geoffrey E. Hinton, 2002
6. DeFoG: Discrete Flow Matching for Graph Generation — Yiming Qin, Manuel Madeira, D. Thanou et al., 2024
7. A Langevin-like Sampler for Discrete Distributions — Ruqi Zhang, Xingchao Liu, Qiang Liu, 2022
8. Energy Matching: Unifying Flow Matching and Energy-Based Models for Generative Modeling — M. Balcerak, Tamaz Amiranashvili, Suprosanna Shit et al., 2025
9. Graph Diffusion that can Insert and Delete — Matteo Ninniri, M. Podda, Davide Bacciu, 2025
10. First-Order Conditions for Optimization in the Wasserstein Space — Nicolas Lanzetti, S. Bolognani, Florian Dörfler, 2022
11. Pre-training Molecular Graph Representation with 3D Geometry — Shengchao Liu, Hanchen Wang, Weiyang Liu et al., 2021
12. Controlled Generation with Equivariant Variational Flow Matching — Floor Eijkelboom, Heiko Zimmermann, Sharvaree P. Vadgama et al., 2025
13. Stochastic Interpolants with Data-Dependent Couplings — M. S. Albergo, Mark Goldstein, N. Boffi et al., 2023
14. Energy-Based Diffusion Language Models for Text Generation — Minkai Xu, Tomas Geffner, Karsten Kreis et al., 2024
15. DiGress: Discrete Denoising Diffusion for Graph Generation — Clément Vignac, Igor Krawczuk, Antoine Siraudin et al., 2022
16. Autoregressive Language Models are Secretly Energy-Based Models: Insights into the Lookahead Capabilities of Next-Token Prediction — Mathieu Blondel, Michael E. Sander, Germain Vivier-Ardisson et al., 2025
17. Junction Tree Variational Autoencoder for Molecular Graph Generation — Wengong Jin, R. Barzilay, T. Jaakkola, 2018
18. Graph Inductive Biases in Transformers without Message Passing — Liheng Ma, Chen Lin, Derek Lim et al., 2023
19. DockRMSD: An Open-Source Tool for Atom Mapping and RMSD Calculation of Symmetric Molecules through Graph Isomorphism — Eric W. Bell, Yang Zhang, 2019
20. Learning Diffusion at Lightspeed — Antonio Terpin, Nicolas Lanzetti, Florian Dörfler, 2024