Prism: Symbolic Superoptimization of Tensor Programs

TL;DR

Prism uses sGraph for symbolic superoptimization of tensor programs, achieving up to 2.2x speedup.

cs.PL · Advanced · 2026-04-17
Mengdi Wu Xiaoyu Jiang Oded Padon Zhihao Jia
symbolic superoptimization tensor programs sGraph machine learning GPU acceleration

Key Findings

Methodology

Prism employs sGraph, a symbolic hierarchical representation, to achieve superoptimization of tensor programs. sGraph compactly encodes large classes of tensor programs by symbolically representing execution parameters. Prism organizes optimization as a two-level search: constructing symbolic graphs representing program families and instantiating them into concrete implementations. This enables structured pruning of provably suboptimal regions using symbolic reasoning over operator semantics, algebraic identities, and hardware constraints. Key techniques include efficient symbolic graph generation, equivalence verification via e-graph rewriting, and parameter instantiation through auto-tuning.
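The two-level search described above can be sketched as follows. This is a minimal illustration, not Prism's actual interface: the `SymbolicMatmul` family, the tile-size parameter space, and the analytic cost model are all hypothetical stand-ins (the real system measures instantiated kernels via auto-tuning).

```python
from itertools import product

# Upper level: a symbolic "program family" leaves execution parameters
# (here, tile sizes) as symbols rather than fixed numbers.
class SymbolicMatmul:
    def __init__(self, m, n, k):
        self.m, self.n, self.k = m, n, k
        self.params = {"tile_m": [16, 32, 64], "tile_n": [16, 32, 64]}

    # Lower level: instantiate the family into concrete implementations.
    def instantiate(self):
        names = list(self.params)
        for values in product(*(self.params[n] for n in names)):
            yield dict(zip(names, values))

# Toy analytic cost model standing in for on-device measurement.
def cost(prog, cfg):
    tm, tn = cfg["tile_m"], cfg["tile_n"]
    # penalize tile sizes that do not divide the problem evenly
    waste = (-prog.m % tm) * prog.n + (-prog.n % tn) * prog.m
    return prog.m * prog.n * prog.k + waste

prog = SymbolicMatmul(48, 48, 64)
best = min(prog.instantiate(), key=lambda cfg: cost(prog, cfg))
print(best)  # {'tile_m': 16, 'tile_n': 16}
```

The point of the upper level is that one `SymbolicMatmul` object stands for all nine concrete tilings at once, so symbolic reasoning can act on the family before any member is materialized.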

Key Results

  • Prism achieves up to 2.2x speedup over the best superoptimizers and 4.9x over the best compiler-based approaches on five commonly used LLM workloads, while reducing end-to-end optimization time by up to 3.4x.
  • Through symbolic reasoning, Prism effectively prunes the search space, handling complex optimization problems that traditional enumeration methods cannot solve.
  • Prism preserves optimality guarantees; its pruning process is sound and does not eliminate optimal solutions.

Significance

Prism significantly enhances the execution efficiency of tensor programs, particularly in modern ML workloads. By leveraging symbolic superoptimization, Prism automatically discovers efficient tensor program implementations, reducing reliance on manually designed rules. This decreases engineering effort for supporting new operators or hardware targets and expands the optimization space to capture combinatorial interactions that human intuition might miss.

Technical Contribution

Prism introduces sGraph, a symbolic representation that organizes the search space without concretely enumerating candidate programs. Compared to existing enumeration and sampling methods, Prism explores larger search spaces and discovers higher-quality optimizations. Additionally, Prism achieves structured pruning through symbolic reasoning, ensuring optimality guarantees.

Novelty

Prism is the first symbolic superoptimizer for tensor programs, introducing sGraph to compactly encode program families. Unlike existing enumeration and sampling methods, Prism achieves more efficient search and pruning through symbolic reasoning.

Limitations

  • Prism may have limitations in handling specific complex hardware constraints, as these constraints might not be effectively expressed and pruned through symbolic reasoning.
  • Prism's performance depends on the efficiency of symbolic graph generation and equivalence verification, which may become bottlenecks in certain scenarios.
  • Prism's symbolic superoptimization approach may not significantly outperform existing methods on certain specific tensor programs.

Future Work

Future directions include extending Prism to support more types of hardware constraints, optimizing the efficiency of symbolic graph generation and equivalence verification, and exploring more symbolic reasoning techniques to further enhance optimization effectiveness.

AI Executive Summary

Efficient execution of machine learning models on GPUs is crucial for modern AI applications. However, existing optimization methods often rely on manually designed rules and scheduling templates, which struggle to capture complex combinatorial interactions. Prism introduces a new approach to symbolic superoptimization by leveraging sGraph, a symbolic representation that compactly encodes large classes of tensor programs. Through symbolic reasoning, Prism achieves structured pruning, eliminating suboptimal regions in the search space.

Prism's optimization process is organized as a two-level search: constructing symbolic graphs representing program families and instantiating them into concrete implementations. This allows Prism to explore larger search spaces without concretely enumerating candidate programs, discovering higher-quality optimizations. Key techniques include efficient symbolic graph generation, equivalence verification via e-graph rewriting, and parameter instantiation through auto-tuning.

In experiments, Prism demonstrated significant performance improvements on five commonly used LLM workloads. Compared to the best superoptimizers, Prism achieved up to 2.2x speedup; compared to the best compiler-based approaches, Prism achieved up to 4.9x speedup. Additionally, Prism reduced end-to-end optimization time by up to 3.4x. These performance gains indicate that Prism effectively handles complex optimization problems in modern ML workloads.

The significance of Prism lies in its ability to enhance the execution efficiency of tensor programs, reducing reliance on manually designed rules. This decreases engineering effort for supporting new operators or hardware targets and expands the optimization space to capture combinatorial interactions that human intuition might miss. By leveraging symbolic superoptimization, Prism provides an efficient optimization method for modern ML workloads.

However, Prism may have limitations in handling specific complex hardware constraints, and its performance depends on the efficiency of symbolic graph generation and equivalence verification. Future directions include extending Prism to support more types of hardware constraints, optimizing the efficiency of symbolic graph generation and equivalence verification, and exploring more symbolic reasoning techniques to further enhance optimization effectiveness.

Deep Analysis

Background

In modern machine learning systems, tensor programs are typically represented as directed acyclic graphs (DAGs), where nodes correspond to tensor operators and edges denote tensors (i.e., multi-dimensional arrays). Existing ML systems generally optimize tensor programs through transformation rules and scheduling templates manually designed by domain experts. However, this approach has two fundamental limitations: first, it requires substantial engineering effort to support new operators or hardware targets; second, manually designed rules explore only a limited portion of the optimization space, failing to capture the combinatorial interactions among algebraic transformations, data layouts, and hardware-specific scheduling decisions. Superoptimization is an approach to automatically discover fast tensor programs without relying on manually specified rules. TASO pioneered this approach by automatically generating graph substitutions and applying them to optimize tensor programs. Mirage applies superoptimization across multiple levels of the GPU execution hierarchy through a unified representation, enabling coordinated algebraic and scheduling transformations. Recently, large language models (LLMs) have been used to generate optimized GPU kernels, and AlphaEvolve has been proposed as a general-purpose superoptimizer leveraging LLMs to guide the search. However, their applicability to optimizing tensor programs has not been explored yet.
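The DAG representation described above can be made concrete with a small sketch. The `Node` type and operator set are illustrative only (scalars stand in for multi-dimensional tensors to keep the example dependency-free):

```python
from dataclasses import dataclass, field

# A tensor program as a DAG: nodes are operators, edges carry values.
@dataclass
class Node:
    op: str                 # "input", "add", or "mul"
    inputs: list = field(default_factory=list)
    value: float = 0.0      # only used by "input" nodes

def evaluate(node):
    if node.op == "input":
        return node.value
    args = [evaluate(i) for i in node.inputs]
    if node.op == "add":
        return args[0] + args[1]
    if node.op == "mul":
        return args[0] * args[1]
    raise ValueError(node.op)

a = Node("input", value=2.0)
b = Node("input", value=3.0)
c = Node("input", value=4.0)
# (a * b) + (a * c): node 'a' is shared, so this is a DAG, not a tree
out = Node("add", [Node("mul", [a, b]), Node("mul", [a, c])])
print(evaluate(out))  # 14.0
```

The shared input `a` is what makes this a DAG rather than a tree, and such sharing is exactly where transformation rules (e.g., distributivity) create optimization opportunities.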

Core Problem

Existing enumeration and sampling-based superoptimization methods face scalability bottlenecks when dealing with large or deeply nested programs: the number of candidate programs grows combinatorially with the number of operators and levels in the execution hierarchy, making exhaustive enumeration impractical. Learning-based superoptimizers like AlphaEvolve use learned priors to guide the search, enabling exploration over substantially larger spaces. However, these methods treat the optimization landscape as largely unstructured, which can lead to unstable search behavior and provides limited guarantees on coverage or completeness of the explored program space.

Innovation

Prism is the first symbolic superoptimizer for tensor programs. Rather than enumerating concrete candidate programs via brute-force enumeration or stochastic sampling, Prism organizes the search space into a two-level hierarchy. At the upper level, it constructs a representation called sGraph, which compactly encodes entire families of tensor programs; at the lower level, each sGraph is instantiated into many concrete implementations. This symbolic formulation enables Prism to explore substantially larger search spaces and discover higher-quality optimizations compared to prior enumeration- and sampling-based approaches for two key reasons. First, sGraph encodes operator semantics, algebraic identities, and hardware constraints as symbolic expressions, allowing Prism to prune provably suboptimal regions of the search space before materializing concrete programs. This structured pruning enables scalability to optimization problems that are intractable for exhaustive enumeration. Second, Prism preserves optimality guarantees: the pruning process is sound and does not eliminate optimal solutions. This property fundamentally distinguishes symbolic superoptimization from prior sampling-based approaches.
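The soundness claim above (pruning never discards an optimal solution) resembles branch-and-bound: a program family is skipped only when a provable lower bound on the cost of *every* member already exceeds the best concrete cost found. The sketch below is a hypothetical illustration of that invariant, not Prism's algorithm; the `lower_bound` function stands in for symbolic reasoning over operator semantics and hardware constraints.

```python
# Families mapped to the concrete costs of their instantiations.
families = {
    "fam_a": [7, 9, 12],
    "fam_b": [15, 18],
    "fam_c": [5, 6, 30],
}

def lower_bound(costs):
    # Stand-in for a symbolic bound; any *true* lower bound on every
    # member's cost keeps the pruning sound.
    return min(costs) - 1

best, pruned = float("inf"), []
for name, costs in families.items():
    if lower_bound(costs) >= best:
        pruned.append(name)          # provably suboptimal: skip entirely
        continue
    best = min(best, min(costs))     # otherwise instantiate and measure
print(best, pruned)  # 5 ['fam_b']
```

Because `fam_b`'s lower bound (14) exceeds the best cost already found (7), its two members are never materialized, yet the global optimum (5, in `fam_c`) is still reached.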

Methodology

  • Prism employs sGraph, a symbolic hierarchical representation, to achieve superoptimization of tensor programs. sGraph compactly encodes large classes of tensor programs by symbolically representing execution parameters.

  • Prism organizes optimization as a two-level search: constructing symbolic graphs representing program families and instantiating them into concrete implementations.

  • Through symbolic reasoning, Prism achieves structured pruning, eliminating suboptimal regions in the search space.

  • Key techniques include efficient symbolic graph generation, equivalence verification via e-graph rewriting, and parameter instantiation through auto-tuning.

  • Prism's symbolic formulation allows it to explore larger search spaces and discover higher-quality optimizations.

Experiments

Prism was evaluated on five commonly used LLM workloads, including fused normalization-linear layers, gated MLPs, and group-query attention. In these experiments, Prism achieved up to 2.2x speedup over the best superoptimizers and 4.9x over the best compiler-based approaches, while reducing end-to-end optimization time by up to 3.4x. The experimental design involved symbolic reasoning to effectively prune the search space, enabling Prism to handle complex optimization problems that traditional enumeration methods cannot solve. The results demonstrate that Prism significantly enhances the execution efficiency of tensor programs, particularly in modern ML workloads.

Results

In experiments, Prism demonstrated significant performance improvements on five commonly used LLM workloads. Compared to the best superoptimizers, Prism achieved up to 2.2x speedup; compared to the best compiler-based approaches, Prism achieved up to 4.9x speedup. Additionally, Prism reduced end-to-end optimization time by up to 3.4x. These performance gains indicate that Prism effectively handles complex optimization problems in modern ML workloads. Through symbolic reasoning, Prism effectively prunes the search space, handling complex optimization problems that traditional enumeration methods cannot solve.

Applications

Prism's application scenarios include tensor program optimization in modern ML workloads. By leveraging symbolic superoptimization, Prism automatically discovers efficient tensor program implementations, reducing reliance on manually designed rules. This decreases engineering effort for supporting new operators or hardware targets and expands the optimization space to capture combinatorial interactions that human intuition might miss. Prism provides an efficient optimization method for modern ML workloads, significantly enhancing the execution efficiency of tensor programs.

Limitations & Outlook

Prism may have limitations in handling specific complex hardware constraints, as these constraints might not be effectively expressed and pruned through symbolic reasoning. Prism's performance depends on the efficiency of symbolic graph generation and equivalence verification, which may become bottlenecks in certain scenarios. Prism's symbolic superoptimization approach may not significantly outperform existing methods on certain specific tensor programs. Future directions include extending Prism to support more types of hardware constraints, optimizing the efficiency of symbolic graph generation and equivalence verification, and exploring more symbolic reasoning techniques to further enhance optimization effectiveness.

Plain Language (Accessible to non-experts)

Imagine you're in a kitchen, cooking a meal. You have a recipe that tells you what ingredients you need, how to chop them, and how to cook them. Traditional tools follow the recipe step by step, though there are often faster ways, like chopping vegetables while the soup is boiling. Prism is like a smart kitchen assistant that automatically finds these faster methods. Instead of writing out every possible sequence of cooking steps one by one, it uses something called sGraph to describe whole families of cooking plans at once, then uses careful reasoning to rule out the plans that provably can't be the fastest. By doing this, it finds the quickest way to cook without ever missing the best plan. Prism's strength lies in its ability to handle very complex recipes, even ones that would overwhelm a chef trying every variation by hand. In this way, Prism makes the work more efficient, like adding a supercharger to your cooking process.

ELI14 (Explained like you're 14)

Hey there! Have you ever wondered why some AI apps respond instantly while others lag? Under the hood, they all run huge amounts of math on graphics cards (GPUs), and there are countless ways to arrange that math, some much faster than others. Prism is like a smart assistant that hunts for the fastest arrangement. Instead of trying every option one at a time, which would take forever, it uses something called sGraph to describe whole groups of options at once, then uses logic to cross out the slow groups without ever running them. The cool part is that it can prove the options it throws away were never the best, so it never accidentally skips the fastest one. In the paper's tests, programs tuned by Prism ran up to 2.2x faster than those produced by the previous best tools. Isn't that awesome?

Glossary

Symbolic Superoptimization

A method that uses symbolic reasoning to automatically discover efficient program implementations, organizing the search space without concretely enumerating candidate programs.

Prism achieves efficient execution of tensor programs through symbolic superoptimization.

sGraph

A symbolic hierarchical representation that compactly encodes large classes of tensor programs, enabling structured pruning through symbolic reasoning.

Prism uses sGraph to represent and optimize tensor programs.

e-graph Rewriting

A method for program equivalence verification using equivalence class graphs, enabling efficient equivalence checking in symbolic representations.

Prism performs equivalence verification of symbolic graphs via e-graph rewriting.
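A full e-graph (as in the egg library) stores many equivalent expression forms compactly in shared equivalence classes. The sketch below is a much simpler stand-in: it applies one algebraic identity, distributivity, to a tuple-encoded expression and checks equivalence by evaluating both forms on random inputs. The encoding and function names are illustrative, not Prism's or egg's API.

```python
import random

# Rewrite a*b + a*c  ->  a*(b+c) on tuple-encoded expressions.
def rewrite(expr):
    if (isinstance(expr, tuple) and expr[0] == "+"
            and all(isinstance(t, tuple) and t[0] == "*" for t in expr[1:])
            and expr[1][1] == expr[2][1]):
        shared = expr[1][1]
        return ("*", shared, ("+", expr[1][2], expr[2][2]))
    return expr

def ev(expr, env):
    if isinstance(expr, str):          # a variable
        return env[expr]
    op, left, right = expr
    return ev(left, env) * ev(right, env) if op == "*" \
        else ev(left, env) + ev(right, env)

before = ("+", ("*", "a", "b"), ("*", "a", "c"))
after = rewrite(before)
rng = random.Random(0)
env = {v: rng.randint(-100, 100) for v in "abc"}
assert ev(before, env) == ev(after, env)   # same result on random inputs
print(after)  # ('*', 'a', ('+', 'b', 'c'))
```

Random testing only provides evidence of equivalence; e-graph-based verification, as used by Prism, establishes it by construction, since every rewrite applied is itself a proven identity.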

Auto-tuning

A method that automatically searches for the best parameter configurations to improve program execution efficiency.

Prism employs auto-tuning to instantiate parameters for optimized GPU kernels.
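A minimal grid-search auto-tuner looks like the sketch below. The parameter space and the `measure` function are hypothetical: in a real tuner, `measure` would time a generated GPU kernel on hardware rather than evaluate an analytic formula.

```python
from itertools import product

# Hypothetical kernel parameter space.
space = {"block": [64, 128, 256], "unroll": [1, 2, 4]}

def measure(cfg):
    # Toy cost standing in for an on-device timing run: cheapest at
    # block=128, unroll=2.
    return abs(cfg["block"] - 128) + 10 * abs(cfg["unroll"] - 2)

def autotune(space):
    keys = list(space)
    configs = (dict(zip(keys, v)) for v in product(*space.values()))
    return min(configs, key=measure)   # keep the cheapest configuration

best_cfg = autotune(space)
print(best_cfg)  # {'block': 128, 'unroll': 2}
```

Exhaustive grids grow combinatorially with the number of parameters, which is exactly why Prism prunes symbolically before instantiating configurations for measurement.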

Tensor Program

A program used to represent multi-dimensional array computations, commonly used in machine learning and deep learning.

Prism targets tensor programs for symbolic superoptimization.

Algebraic Identity

A mathematical equation that expresses the equality of two expressions under all circumstances, often used in computation optimization.

Prism uses algebraic identities for symbolic reasoning and pruning.

Hardware Constraint

A hardware-specific feature or condition that limits program execution, potentially affecting performance and optimization space.

Prism considers hardware constraints in symbolic reasoning for optimization.

GPU Kernel

A parallel computation unit executed on a GPU, typically used to accelerate large-scale data processing.

Prism generates optimized GPU kernels through symbolic superoptimization.

Parallelization Parameter

A parameter used to control the parallel execution of a program, affecting computation distribution and scheduling.

Prism optimizes through symbolic parallelization parameters.

Equivalence Verification

A method to verify whether two programs are functionally equivalent, ensuring the correctness of the optimization process.

Prism ensures the correctness of symbolic graphs through equivalence verification.

Open Questions (Unanswered questions from this research)

  1. How can complex hardware constraints be better expressed and handled in symbolic superoptimization? Current methods may not effectively capture the nuances of these constraints, affecting optimization outcomes.
  2. How can the efficiency of symbolic graph generation and equivalence verification be further improved? These steps may become performance bottlenecks in certain scenarios, limiting Prism's applicability.
  3. How can Prism be extended to support more types of tensor programs and operators? Current methods may not significantly outperform existing methods on certain specific tensor programs.
  4. How can symbolic reasoning techniques be further developed to achieve more efficient search and pruning? Current methods may not fully leverage the potential of symbolic reasoning in some cases.
  5. How can machine learning techniques be better integrated into symbolic superoptimization to enhance optimization outcomes? Current methods may not fully exploit the advantages of machine learning in some scenarios.

Applications

Immediate Applications

Machine Learning Model Optimization

Prism can be used to optimize tensor programs in machine learning models, improving execution efficiency and reducing training and inference times.

GPU Acceleration

By generating optimized GPU kernels, Prism can significantly speed up large-scale data processing, applicable to scientific computing and data analysis.

Automated Software Optimization

Prism can be used for automated software optimization, reducing reliance on manually designed rules and improving software performance and scalability.

Long-term Vision

General Computing Optimization

Prism's symbolic superoptimization method can be extended to broader computing domains, achieving automated optimization for general computing.

Intelligent System Development

By integrating machine learning techniques, Prism can be used to develop more intelligent systems, achieving automated program optimization and performance enhancement.

Abstract

This paper presents Prism, the first symbolic superoptimizer for tensor programs. The key idea is sGraph, a symbolic, hierarchical representation that compactly encodes large classes of tensor programs by symbolically representing some execution parameters. Prism organizes optimization as a two-level search: it constructs symbolic graphs that represent families of programs, and then instantiates them into concrete implementations. This formulation enables structured pruning of provably suboptimal regions of the search space using symbolic reasoning over operator semantics, algebraic identities, and hardware constraints. We develop techniques for efficient symbolic graph generation, equivalence verification via e-graph rewriting, and parameter instantiation through auto-tuning. Together, these components allow Prism to bridge the rigor of exhaustive search with the scalability required for modern ML workloads. Evaluation on five commonly used LLM workloads shows that Prism achieves up to $2.2\times$ speedup over best superoptimizers and $4.9\times$ over best compiler-based approaches, while reducing end-to-end optimization time by up to $3.4\times$.

cs.PL cs.AI cs.LG

References (20)

  • egg: Fast and extensible equality saturation — Max Willsey, Chandrakana Nandi, Y. Wang et al., 2020
  • AlphaEvolve: A coding agent for scientific and algorithmic discovery — Alexander Novikov, Ngân Vũ, Marvin Eisenberger et al., 2025
  • Mirage: A Multi-Level Superoptimizer for Tensor Programs — Mengdi Wu, Xinhao Cheng, Shengyu Liu et al., 2024
  • TASO: optimizing deep learning computation with automatic generation of graph substitutions — Zhihao Jia, Oded Padon, James J. Thomas et al., 2019
  • Ansor: Generating High-Performance Tensor Programs for Deep Learning — Lianmin Zheng, Chengfan Jia, Minmin Sun et al., 2020
  • Automatically scheduling halide image processing pipelines — Ravi Teja Mullapudi, Andrew Adams, Dillon Sharlet et al., 2016
  • Beyond Data and Model Parallelism for Deep Neural Networks — Zhihao Jia, M. Zaharia, A. Aiken, 2018
  • KernelFoundry: Hardware-aware evolutionary GPU kernel optimization — Nina Wiedemann, Quentin Leboutet, Michael Paulitsch et al., 2026
  • EINNET: Optimizing Tensor Programs with Derivation-Based Transformations — Liyan Zheng, Haojie Wang, Jidong Zhai et al., 2023
  • PET: Optimizing Tensor Programs with Partially Equivalent Transformations and Automated Corrections — Haojie Wang, Jidong Zhai, Mingyu Gao et al., 2021
  • Unity: Accelerating DNN Training Through Joint Optimization of Algebraic Transformations and Parallelization — Colin Unger, Zhihao Jia, Wei Wu et al., 2022
  • Learning to Optimize Tensor Programs — Tianqi Chen, Lianmin Zheng, Eddie Q. Yan et al., 2018
  • NVIDIA Tensor Core Programmability, Performance & Precision — S. Markidis, Steven W. D. Chien, E. Laure et al., 2018
  • Superoptimizer: a look at the smallest program — H. Massalin, 1987
  • TVM: End-to-End Optimization Stack for Deep Learning — Tianqi Chen, T. Moreau, Ziheng Jiang et al., 2018
  • GraphPipe: Improving Performance and Scalability of DNN Training with Graph Pipeline Parallelism — Byungsoo Jeon, Mengdi Wu, Shiyi Cao et al., 2024
  • Astra: A Multi-Agent System for GPU Kernel Performance Optimization — Anjiang Wei, Tianran Sun, Yogesh Seenichamy et al., 2025
  • Transformer — Mukund R. Patel, 2020
  • Equality Saturation for Tensor Graph Superoptimization — Yicheng Yang, Mangpo Phitchaya Phothilimtha, Y. Wang et al., 2021
  • TensorFlow: A system for large-scale machine learning — Martín Abadi, P. Barham, Jianmin Chen et al., 2016