A Data-Free Symbolic Regression Approach for Solving Equations

TL;DR

The SES framework optimizes symbolic solutions directly from the equations, without training data, and recovers explicit expressions for the solutions of algebraic and differential equations.

cs.NE 🔴 Advanced 2026-06-05
Sergei Garmaev, Vinay Sharma, Olga Fink
symbolic regression, differential equations, data-free optimization, symbolic expressions

Key Findings

Methodology

SES (Symbolic Equation Solver) formulates the problem of solving equations as an optimization task within a differentiable symbolic model space. It constructs a loss function based solely on the residuals of the governing equations and auxiliary boundary or initial conditions, eliminating the need for paired input-output data. The core model is a neural network-like symbolic structure, inspired by Equation Learner (EQL), which uses a library of symbolic operators (e.g., identity, constants, powers, exponential, tanh, multiplication). During training, the parameters of this symbolic network are optimized via gradient descent to minimize the residuals evaluated at collocation points sampled from the domain. Automatic differentiation computes derivatives needed for residuals involving differential operators. The training process involves three phases: initial residual minimization, sparsity promotion via L1 regularization with pruning, and fine-tuning. After training, the symbolic parameters are converted into explicit expressions, providing interpretable solutions. This approach is applicable to algebraic equations, transcendental equations, and various classes of differential equations, including PDEs with boundary conditions.
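As a rough illustration of how such a data-free loss can be assembled, the sketch below builds the residual loss for an example ODE, dy/dt = 1 − y^2 with y(0) = 0, and evaluates it for two candidate functions. It is not the SES implementation: a central finite difference stands in for automatic differentiation, the collocation points lie on a uniform grid rather than being sampled, and the names `derivative`, `residual_loss`, and `ic_weight` are assumptions made for this example.

```python
import math

def derivative(f, t, h=1e-5):
    """Central finite difference; a stand-in for automatic differentiation."""
    return (f(t + h) - f(t - h)) / (2.0 * h)

def residual_loss(candidate, collocation_points, ic_weight=1.0):
    """Mean squared ODE residual plus a penalty on the condition y(0) = 0."""
    # Residual of dy/dt = 1 - y^2, written as dy/dt - (1 - y^2).
    residuals = [derivative(candidate, t) - (1.0 - candidate(t) ** 2)
                 for t in collocation_points]
    equation_term = sum(r * r for r in residuals) / len(residuals)
    condition_term = candidate(0.0) ** 2
    return equation_term + ic_weight * condition_term

# Collocation points on a uniform grid over [0, 2] (an assumed domain).
points = [2.0 * i / 49 for i in range(50)]

loss_exact = residual_loss(math.tanh, points)     # y(t) = tanh(t)
loss_wrong = residual_loss(lambda t: t, points)   # y(t) = t meets y(0) = 0 only
print(f"tanh(t): {loss_exact:.3e}   t: {loss_wrong:.3e}")
```

No input-output pairs appear anywhere: the loss is close to zero for tanh(t) and clearly positive for y(t) = t, so the equation and its initial condition alone are enough to rank candidates.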

Key Results

  • In solving a linear system (2x+3y=7, x−y=1), SES accurately recovered the constants x=2 and y=1, with errors below 10^-5, demonstrating its capacity for multi-variable symbolic solutions.
  • For the transcendental equation (x + x^3 = e + 1), SES identified the exact symbolic solution x=1, with residuals that vanish to five decimal places, showing that it can handle equations that cannot be solved algebraically.
  • In differential equations, SES successfully recovered the analytical solutions: for dy/dt=1−y^2 with y(0)=0, it obtained y(t)=tanh(t); for the transport equation with u(x,0)=x^2, it recovered u(x,t)=(x−t)^2; and for the 2D Poisson equation, it approximated u(x,y)=x^2 + y^2 + xy^2 with high accuracy. These results confirm its effectiveness across diverse equation types.
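The linear-system result in the first bullet can be reproduced in spirit with plain gradient descent on the squared residuals. The sketch below optimizes two free constants directly rather than an SES symbolic network, and the starting point, step size, and iteration count are assumptions picked for this small problem.

```python
def residuals(x, y):
    """Residuals of the system 2x + 3y = 7 and x - y = 1."""
    return 2.0 * x + 3.0 * y - 7.0, x - y - 1.0

def solve_linear_system(steps=500, lr=0.05):
    x, y = 0.0, 0.0  # arbitrary starting guess
    for _ in range(steps):
        r1, r2 = residuals(x, y)
        # Gradient of the loss r1^2 + r2^2 with respect to x and y.
        grad_x = 2.0 * (2.0 * r1 + r2)
        grad_y = 2.0 * (3.0 * r1 - r2)
        x -= lr * grad_x
        y -= lr * grad_y
    return x, y

x, y = solve_linear_system()
print(x, y)  # converges to values very close to x = 2, y = 1
```

The step size matters here: the loss is quadratic with its largest curvature near 26, so steps much above 0.07 would diverge.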

Significance

This work introduces a paradigm shift by enabling direct symbolic solution recovery from equations without relying on data. It bridges the gap between symbolic regression and physics-informed neural networks, offering explicit, interpretable solutions for complex algebraic and differential equations. The ability to derive symbolic expressions solely from residual minimization addresses longstanding challenges in automated mathematical reasoning, potentially transforming scientific discovery, engineering modeling, and educational tools. It opens avenues for automatic derivation of governing laws, symbolic simplification, and analytical insight, fostering a new era of AI-assisted symbolic mathematics.

Technical Contribution

The primary technical innovation lies in embedding symbolic expressions into a differentiable optimization framework, leveraging a library of symbolic operators and automatic differentiation. Unlike traditional symbolic regression that fits data, SES directly minimizes residuals of the governing equations, enabling the extraction of explicit formulas. Its multi-phase training with sparsity regularization ensures concise, interpretable solutions. The framework supports multi-variable systems and PDEs with boundary conditions, extending the applicability of symbolic optimization. This approach offers a new theoretical foundation for equation-constrained symbolic learning, combining symbolic algebra with gradient-based optimization, and paves the way for scalable, data-free symbolic reasoning.

Novelty

This is the first framework that combines residual-based physics-informed optimization with symbolic expression search, removing the dependency on input-output data pairs. Unlike prior symbolic regression methods that rely on large datasets, SES directly encodes the governing equations into the loss function, enabling it to recover solutions purely from the equations themselves. Its ability to produce explicit symbolic expressions for nonlinear equations, transcendental equations, and PDEs represents a significant advance in symbolic AI and scientific computing.

Limitations

  • The expressivity of the symbolic operation library limits the complexity of recoverable expressions; extremely high-order or deeply nested formulas may be challenging to capture.
  • Optimization may encounter local minima, especially in high-dimensional or highly nonlinear problems, requiring careful hyperparameter tuning and multiple restarts.
  • For very stiff or boundary-condition-sensitive PDEs, the current approach may struggle to converge to exact symbolic solutions, necessitating further methodological improvements.

Future Work

Future directions include expanding the symbolic operation library to encompass more complex functions, integrating adaptive sampling strategies for better convergence, and developing hybrid methods combining symbolic and neural approaches. Additionally, automating the selection of optimal model architectures and extending to stochastic or parameterized equations could further enhance its versatility. The ultimate goal is to create a fully autonomous symbolic reasoning system capable of discovering, verifying, and simplifying equations in scientific research.

AI Executive Summary

Mathematics and physics rely heavily on equations to describe the natural world, yet many complex equations resist traditional analytical solutions. Numerical methods provide approximate solutions but often lack interpretability and insight into the underlying structure. Symbolic solutions, on the other hand, offer clarity and analytical power but are difficult to obtain for nonlinear, transcendental, or high-dimensional equations. Existing symbolic solvers are limited by their reliance on specific algebraic structures, while symbolic regression methods require large datasets of input-output pairs, which are often unavailable or impractical.

This paper introduces the Symbolic Equation Solver (SES), a novel framework that addresses these limitations by formulating equation solving as an optimization problem within a differentiable symbolic model space. Unlike conventional methods, SES does not depend on pre-collected data; instead, it uses the residuals of the governing equations and boundary conditions as the sole supervision signals. The core idea is to represent candidate solutions as symbolic networks inspired by the Equation Learner (EQL), which incorporate a library of symbolic operators such as identity, constants, powers, exponentials, and hyperbolic tangent functions.
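To make the idea of a symbolic network more concrete, here is a heavily simplified sketch of a single layer built from an operator library, together with a routine that renders its weights as a formula. The names `LIBRARY`, `evaluate`, and `to_expression`, the one-layer form, and the choice of operators are assumptions for illustration; EQL-style models stack several such layers and are not reproduced here.

```python
import math

# Hypothetical operator library: (name, function, string template).
LIBRARY = [
    ("identity", lambda z: z,     "{}"),
    ("square",   lambda z: z * z, "({})^2"),
    ("exp",      math.exp,        "exp({})"),
    ("tanh",     math.tanh,       "tanh({})"),
]

def evaluate(x, inner, outer, bias):
    """Compute y(x) = bias + sum_k outer[k] * op_k(inner[k] * x)."""
    return bias + sum(c * op(a * x)
                      for (_, op, _), a, c in zip(LIBRARY, inner, outer))

def to_expression(inner, outer, bias, tol=1e-6):
    """Render the terms with non-negligible weight as a readable formula."""
    terms = []
    for (_, _, template), a, c in zip(LIBRARY, inner, outer):
        if abs(c) < tol:
            continue  # a pruned or vanished term contributes nothing
        arg = "x" if abs(a - 1.0) < tol else f"{a:g}*x"
        body = template.format(arg)
        terms.append(body if abs(c - 1.0) < tol else f"{c:g}*{body}")
    if abs(bias) >= tol:
        terms.append(f"{bias:g}")
    return " + ".join(terms) if terms else "0"

inner = [1.0, 1.0, 1.0, 1.0]
outer = [0.0, 0.0, 0.0, 1.0]   # only the tanh unit survives
print(to_expression(inner, outer, bias=0.0))  # prints: tanh(x)
```

Because every unit is a differentiable function of its weights, the same parameters that gradient descent adjusts can be read off afterwards as an explicit expression.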

The process begins with constructing a loss function based on the residuals of the equations and auxiliary conditions evaluated at sampled collocation points. The symbolic model parameters are then optimized via gradient descent, with automatic differentiation ensuring accurate derivative computations for differential operators. The training proceeds in three phases: initial residual minimization, sparsity promotion through L1 regularization and pruning, and fine-tuning for improved accuracy. After training, the symbolic parameters are converted into explicit formulas, providing human-readable solutions.
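The three-phase schedule can be sketched on a toy problem. The example below solves dy/dt = 2t with y(0) = 0 using a polynomial library y(t) = w0 + w1 t + w2 t^2 + w3 t^3, whose exact solution is y(t) = t^2. Everything specific here is an assumption for illustration, not taken from the paper: the polynomial library, the hand-written gradient in place of automatic differentiation, the learning rate, the L1 strength of 1e-3, the pruning threshold of 0.05, and the step counts.

```python
def sign(v):
    return (v > 0) - (v < 0)

def gradient(w, mask, points, l1):
    """Gradient of mean residual^2 + w0^2 + l1 * sum|w_i| over active weights."""
    g = [0.0] * 4
    for t in points:
        basis = [0.0, 1.0, 2.0 * t, 3.0 * t * t]  # d/dt of 1, t, t^2, t^3
        r = sum(wi * bi for wi, bi in zip(w, basis)) - 2.0 * t  # dy/dt - 2t
        for i in range(4):
            g[i] += 2.0 * r * basis[i] / len(points)
    g[0] += 2.0 * w[0]  # initial condition y(0) = w0 = 0
    # Pruned weights receive no gradient and therefore stay at zero.
    return [gi + l1 * sign(wi) if active else 0.0
            for gi, wi, active in zip(g, w, mask)]

def train(w, mask, points, steps, lr=0.1, l1=0.0):
    for _ in range(steps):
        g = gradient(w, mask, points, l1)
        w = [wi - lr * gi for wi, gi in zip(w, g)]
    return w

points = [i / 20 for i in range(21)]   # collocation grid on [0, 1]
w = [0.3, 0.3, 0.3, 0.3]
mask = [True] * 4

w = train(w, mask, points, steps=3000)            # phase 1: residual only
w = train(w, mask, points, steps=1000, l1=1e-3)   # phase 2: add L1 penalty
mask = [abs(wi) >= 0.05 for wi in w]              # prune small weights
w = [wi if keep else 0.0 for wi, keep in zip(w, mask)]
w = train(w, mask, points, steps=500)             # phase 3: fine-tune survivors
print(mask, w)
```

In this toy run only the t^2 coefficient survives pruning, and the final fine-tuning phase removes the small bias that the L1 penalty introduced, leaving y(t) = t^2.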

Extensive experiments demonstrate SES's ability to recover exact or near-exact symbolic solutions across diverse classes of equations. For linear systems, it precisely identified constant solutions; for transcendental equations, it recovered the exact symbolic form; and for differential equations, it accurately reconstructed the analytical solutions, including nonlinear and PDE cases. All these results were achieved without any supervised input-output data, solely relying on the equations themselves.

This work signifies a major step forward in symbolic mathematics and scientific computing. By enabling direct, data-free symbolic solution discovery through residual-based optimization, SES bridges the gap between symbolic regression and physics-informed neural networks. Its capacity to derive interpretable, explicit formulas from complex equations opens new horizons for automated scientific discovery, model simplification, and educational tools. Despite current limitations in handling extremely complex expressions and potential optimization challenges, future research aims to expand the symbolic operation library, improve training robustness, and integrate adaptive sampling techniques. Overall, SES offers a powerful new paradigm for AI-driven symbolic reasoning, promising to accelerate progress across scientific disciplines.

Deep Dive

Abstract

Many equations arising in science currently cannot be solved by available analytical techniques and are therefore solved numerically, without yielding explicit symbolic expressions. Existing symbolic regression approaches can recover symbolic expressions, but require training data obtained from the underlying process, rather than the governing equation alone. We propose the Symbolic Equation Solver (SES), a framework that formulates equation solving as an optimization problem over differentiable symbolic models. SES constructs its objective from the equation together with initial or boundary conditions, eliminating the need for paired input-output data. The learned model is expressed in explicit symbolic form, enabling further analysis. We evaluate SES on representative algebraic and differential equations, including a system of algebraic equations, an equation with transcendental terms, an ordinary differential equation, and partial differential equations with different initial or boundary conditions. Across these settings, SES recovers compact symbolic expressions that match the corresponding analytical solutions.

cs.NE cs.SC
