Prism: Symbolic Superoptimization of Tensor Programs
Prism uses sGraph for symbolic superoptimization of tensor programs, achieving up to 2.2x speedup over the best existing superoptimizers.
Key Findings
Methodology
Prism employs sGraph, a symbolic hierarchical representation, to achieve superoptimization of tensor programs. sGraph compactly encodes large classes of tensor programs by symbolically representing execution parameters. Prism organizes optimization as a two-level search: constructing symbolic graphs representing program families and instantiating them into concrete implementations. This enables structured pruning of provably suboptimal regions using symbolic reasoning over operator semantics, algebraic identities, and hardware constraints. Key techniques include efficient symbolic graph generation, equivalence verification via e-graph rewriting, and parameter instantiation through auto-tuning.
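The paper's concrete data layout for sGraph is not reproduced here; the minimal Python sketch below, with invented names (SymDim, SymOp, SGraph), illustrates the core idea: execution parameters are symbolic variables with finite domains, so a single graph object stands for an entire family of concrete programs.

```python
from dataclasses import dataclass
from itertools import product

@dataclass(frozen=True)
class SymDim:
    """A symbolic execution parameter (e.g. a tile size) with its legal values."""
    name: str
    domain: tuple  # candidate concrete values

@dataclass
class SymOp:
    """A tensor operator whose scheduling parameters may be symbolic."""
    kind: str    # e.g. "matmul", "rmsnorm"
    inputs: list # producer operators (DAG edges)
    params: dict # name -> SymDim or concrete value

@dataclass
class SGraph:
    """A symbolic graph: one object encodes a family of concrete programs."""
    ops: list

    def free_dims(self):
        seen = {}
        for op in self.ops:
            for v in op.params.values():
                if isinstance(v, SymDim):
                    seen[v.name] = v
        return list(seen.values())

    def instantiate(self):
        """Enumerate the concrete programs this sGraph encodes."""
        dims = self.free_dims()
        for values in product(*(d.domain for d in dims)):
            binding = dict(zip([d.name for d in dims], values))
            yield [SymOp(op.kind, op.inputs,
                         {k: binding[v.name] if isinstance(v, SymDim) else v
                          for k, v in op.params.items()})
                   for op in self.ops]

# One sGraph standing in for 3 x 3 = 9 concrete tiled matmul kernels.
tile_m = SymDim("tile_m", (32, 64, 128))
tile_n = SymDim("tile_n", (32, 64, 128))
g = SGraph([SymOp("matmul", [], {"tile_m": tile_m, "tile_n": tile_n})])
print(sum(1 for _ in g.instantiate()))  # 9
```

Note that instantiate enumerates the family only for illustration; Prism's point is precisely that pruning and reasoning happen on the symbolic object before any such enumeration.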
Key Results
- Prism achieves up to 2.2x speedup over the best superoptimizers and 4.9x over the best compiler-based approaches on five commonly used LLM workloads, while reducing end-to-end optimization time by up to 3.4x.
- Through symbolic reasoning, Prism effectively prunes the search space, tackling complex optimization problems that traditional enumeration methods cannot handle.
- Prism preserves optimality guarantees: its pruning is sound and never eliminates an optimal solution (a branch-and-bound-style sketch of such sound pruning follows this list).
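The sketch below, a minimal branch-and-bound loop over a toy cost model, illustrates what soundness means here: a family is discarded only when a provable lower bound on its best member already meets or exceeds the best cost found so far, so an optimal program can never be thrown away. The lower_bound and best_member_cost callbacks are stand-ins; Prism's actual pruning reasons symbolically over operator semantics and hardware constraints.

```python
def optimize(families, lower_bound, best_member_cost):
    """Search program families, pruning only provably suboptimal ones.

    lower_bound(f)      -- a cost no member of family f can beat (the proof
                           obligation; here just a callback)
    best_member_cost(f) -- cost of the best concrete program in family f
    """
    best = float("inf")
    # Visiting promising families first tightens the bound quickly.
    for fam in sorted(families, key=lower_bound):
        if lower_bound(fam) >= best:
            continue  # sound: every member is provably no better than `best`
        best = min(best, best_member_cost(fam))
    return best

# Toy usage: families as (lower bound, member costs) pairs.
families = [(3, [5, 4, 7]), (1, [2, 6]), (8, [9])]
print(optimize(families, lambda f: f[0], lambda f: min(f[1])))  # 2
```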
Significance
Prism significantly enhances the execution efficiency of tensor programs, particularly in modern ML workloads. By leveraging symbolic superoptimization, Prism automatically discovers efficient tensor program implementations, reducing reliance on manually designed rules. This decreases engineering effort for supporting new operators or hardware targets and expands the optimization space to capture combinatorial interactions that human intuition might miss.
Technical Contribution
Prism introduces sGraph, a symbolic representation that organizes the search space without concretely enumerating candidate programs. Compared to existing enumeration and sampling methods, Prism explores larger search spaces and discovers higher-quality optimizations. Additionally, Prism's structured pruning through symbolic reasoning is sound, preserving optimality guarantees.
Novelty
Prism is the first symbolic superoptimizer for tensor programs, introducing sGraph to compactly encode program families. Unlike existing enumeration and sampling methods, Prism achieves more efficient search and pruning through symbolic reasoning.
Limitations
- Certain complex hardware constraints may not be effectively expressible, and therefore not prunable, through Prism's symbolic reasoning.
- Prism's performance depends on the efficiency of symbolic graph generation and equivalence verification, which may become bottlenecks in some scenarios.
- On some tensor programs, Prism's symbolic superoptimization may not significantly outperform existing methods.
Future Work
Future directions include extending Prism to support more types of hardware constraints, optimizing the efficiency of symbolic graph generation and equivalence verification, and exploring more symbolic reasoning techniques to further enhance optimization effectiveness.
AI Executive Summary
Efficient execution of machine learning models on GPUs is crucial for modern AI applications. However, existing optimization methods often rely on manually designed rules and scheduling templates, which struggle to capture complex combinatorial interactions. Prism introduces symbolic superoptimization for tensor programs, leveraging sGraph, a symbolic representation that compactly encodes large classes of tensor programs. Through symbolic reasoning, Prism achieves structured pruning, eliminating provably suboptimal regions of the search space.
Prism's optimization process is organized as a two-level search: constructing symbolic graphs representing program families and instantiating them into concrete implementations. This allows Prism to explore larger search spaces without concretely enumerating candidate programs, discovering higher-quality optimizations. Key techniques include efficient symbolic graph generation, equivalence verification via e-graph rewriting, and parameter instantiation through auto-tuning.
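As a drastically simplified illustration of the last step, parameter instantiation through auto-tuning, the sketch below times each concrete instantiation of a family and keeps the fastest. The candidate set and timing harness are stand-ins for illustration, not Prism's tuner:

```python
import time

def autotune(candidates, run):
    """Return the fastest concrete instantiation of a program family."""
    best, best_t = None, float("inf")
    for prog in candidates:
        t0 = time.perf_counter()
        run(prog)  # execute (or simulate) this candidate
        dt = time.perf_counter() - t0
        if dt < best_t:
            best, best_t = prog, dt
    return best, best_t

# Toy usage: "candidates" are tile sizes; the workload is a stand-in loop
# whose running time depends on the tile choice.
print(autotune([32, 64, 128], lambda tile: sum(range(1_000_000 // tile))))
```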
In experiments, Prism demonstrated significant performance improvements on five commonly used LLM workloads. Compared to the best superoptimizers, Prism achieved up to 2.2x speedup; compared to the best compiler-based approaches, Prism achieved up to 4.9x speedup. Additionally, Prism reduced end-to-end optimization time by up to 3.4x. These performance gains indicate that Prism effectively handles complex optimization problems in modern ML workloads.
The significance of Prism lies in its ability to enhance the execution efficiency of tensor programs, reducing reliance on manually designed rules. This decreases engineering effort for supporting new operators or hardware targets and expands the optimization space to capture combinatorial interactions that human intuition might miss. By leveraging symbolic superoptimization, Prism provides an efficient optimization method for modern ML workloads.
However, Prism may have limitations in handling specific complex hardware constraints, and its performance depends on the efficiency of symbolic graph generation and equivalence verification. Future directions include extending Prism to support more types of hardware constraints, optimizing the efficiency of symbolic graph generation and equivalence verification, and exploring more symbolic reasoning techniques to further enhance optimization effectiveness.
Deep Analysis
Background
In modern machine learning systems, tensor programs are typically represented as directed acyclic graphs (DAGs), where nodes correspond to tensor operators and edges denote tensors (i.e., multi-dimensional arrays). Existing ML systems generally optimize tensor programs through transformation rules and scheduling templates manually designed by domain experts. However, this approach has two fundamental limitations: first, it requires substantial engineering effort to support new operators or hardware targets; second, manually designed rules explore only a limited portion of the optimization space, failing to capture the combinatorial interactions among algebraic transformations, data layouts, and hardware-specific scheduling decisions. Superoptimization is an approach to automatically discover fast tensor programs without relying on manually specified rules. TASO pioneered this approach by automatically generating graph substitutions and applying them to optimize tensor programs. Mirage applies superoptimization across multiple levels of the GPU execution hierarchy through a unified representation, enabling coordinated algebraic and scheduling transformations. Recently, large language models (LLMs) have been used to generate optimized GPU kernels, and AlphaEvolve has been proposed as a general-purpose superoptimizer leveraging LLMs to guide the search. However, their applicability to optimizing tensor programs has not been explored yet.
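To make the DAG view concrete, here is an illustrative encoding of a gated MLP block (one of the LLM workload patterns evaluated later) as operators connected by tensor edges; the dictionary representation and operator names are generic, not Prism's IR:

```python
# Nodes are operators, edges are the tensors flowing between them.
gated_mlp = {
    "x":    ("input",  []),
    "up":   ("matmul", ["x"]),          # x @ W_up
    "gate": ("matmul", ["x"]),          # x @ W_gate
    "act":  ("silu",   ["gate"]),       # nonlinearity on the gate branch
    "mul":  ("mul",    ["act", "up"]),  # elementwise gating
    "out":  ("matmul", ["mul"]),        # down-projection: mul @ W_down
}

def topo(dag):
    """A topological order: every operator runs after its producers."""
    order, seen = [], set()
    def visit(n):
        if n in seen:
            return
        seen.add(n)
        for producer in dag[n][1]:
            visit(producer)
        order.append(n)
    for n in dag:
        visit(n)
    return order

print(topo(gated_mlp))  # ['x', 'up', 'gate', 'act', 'mul', 'out']
```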
Core Problem
Existing enumeration- and sampling-based superoptimization methods face scalability bottlenecks on large or deeply nested programs: the number of candidate programs grows combinatorially with the number of operators and with the levels of the execution hierarchy, making exhaustive enumeration impractical. Learning-based superoptimizers like AlphaEvolve use learned priors to guide the search, enabling exploration over substantially larger spaces. However, these methods treat the optimization landscape as largely unstructured, which can lead to unstable search behavior and offers limited guarantees on the coverage or completeness of the explored program space.
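A back-of-the-envelope calculation with made-up numbers shows why flat enumeration breaks down:

```python
# Illustrative only: with c algebraic/scheduling choices per operator,
# a flat enumerator faces on the order of c**n candidates for n operators,
# and nesting across levels of the execution hierarchy compounds this.
c = 8  # made-up number of choices per operator
for n in (5, 10, 20):
    print(f"{n} operators, {c} choices each: ~{c**n:.1e} candidates")
```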
Innovation
Prism is the first symbolic superoptimizer for tensor programs. Rather than enumerating concrete candidate programs via brute-force enumeration or stochastic sampling, Prism organizes the search space into a two-level hierarchy. At the upper level, it constructs a representation called sGraph, which compactly encodes entire families of tensor programs; at the lower level, each sGraph is instantiated into many concrete implementations. This symbolic formulation enables Prism to explore substantially larger search spaces and discover higher-quality optimizations compared to prior enumeration- and sampling-based approaches for two key reasons. First, sGraph encodes operator semantics, algebraic identities, and hardware constraints as symbolic expressions, allowing Prism to prune provably suboptimal regions of the search space before materializing concrete programs. This structured pruning enables scalability to optimization problems that are intractable for exhaustive enumeration. Second, Prism preserves optimality guarantees: the pruning process is sound and does not eliminate optimal solutions. This property fundamentally distinguishes symbolic superoptimization from prior sampling-based approaches.
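The following schematic of the two-level search is a sketch written against hypothetical callback interfaces (gen_sgraphs, verify_equiv, prune, and tune are all invented names), not Prism's actual API:

```python
def two_level_search(workload, gen_sgraphs, verify_equiv, prune, tune):
    """Upper level: symbolic graphs (program families).
       Lower level: instantiation of the surviving families."""
    best_prog, best_cost = None, float("inf")
    for sg in gen_sgraphs(workload):        # symbolic graph generation
        if not verify_equiv(sg, workload):  # e-graph-style equivalence check
            continue
        if prune(sg, best_cost):            # symbolic, provably sound pruning
            continue
        prog, cost = tune(sg)               # auto-tune symbolic parameters
        if cost < best_cost:
            best_prog, best_cost = prog, cost
    return best_prog

# Toy usage with stub callbacks:
print(two_level_search(
    "gated_mlp",
    gen_sgraphs=lambda w: ["sg0", "sg1"],
    verify_equiv=lambda sg, w: True,
    prune=lambda sg, best: False,
    tune=lambda sg: (f"kernel_for_{sg}", {"sg0": 3.0, "sg1": 2.0}[sg]),
))  # kernel_for_sg1
```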
Methodology
- Prism employs sGraph, a symbolic hierarchical representation, to achieve superoptimization of tensor programs. sGraph compactly encodes large classes of tensor programs by symbolically representing execution parameters.
- Prism organizes optimization as a two-level search: constructing symbolic graphs representing program families and instantiating them into concrete implementations.
- Through symbolic reasoning, Prism achieves structured pruning, eliminating provably suboptimal regions of the search space.
- Key techniques include efficient symbolic graph generation, equivalence verification via e-graph rewriting, and parameter instantiation through auto-tuning (a toy rewriting example follows this list).
- Prism's symbolic formulation allows it to explore larger search spaces and discover higher-quality optimizations.
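As promised above, here is a toy of rewrite-based equivalence checking. Prism verifies equivalence via e-graph rewriting (equality saturation, in the style of egg); the sketch below is a much weaker stand-in that normalizes both expressions with two standard matrix identities and compares the results, shown only to convey the flavor:

```python
def rewrite(e):
    """Apply two identities bottom-up until no rule fires."""
    if not isinstance(e, tuple):
        return e
    e = (e[0],) + tuple(rewrite(a) for a in e[1:])
    # transpose(transpose(X)) -> X
    if e[0] == "T" and isinstance(e[1], tuple) and e[1][0] == "T":
        return rewrite(e[1][1])
    # transpose(matmul(A, B)) -> matmul(transpose(B), transpose(A))
    if e[0] == "T" and isinstance(e[1], tuple) and e[1][0] == "mm":
        _, a, b = e[1]
        return rewrite(("mm", ("T", b), ("T", a)))
    return e

def equivalent(e1, e2):
    return rewrite(e1) == rewrite(e2)

# (A @ B)^T is equivalent to B^T @ A^T
print(equivalent(("T", ("mm", "A", "B")),
                 ("mm", ("T", "B"), ("T", "A"))))  # True
```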
Experiments
Prism was evaluated on five commonly used LLM workloads, including fused normalization-linear layers, gated MLPs, and group-query attention. In these experiments, Prism achieved up to 2.2x speedup over the best superoptimizers and 4.9x over the best compiler-based approaches, while reducing end-to-end optimization time by up to 3.4x. Symbolic reasoning pruned the search space effectively, enabling Prism to handle optimization problems that traditional enumeration methods cannot solve. The results demonstrate that Prism significantly enhances the execution efficiency of tensor programs in modern ML workloads.
Results
In experiments, Prism demonstrated significant performance improvements on five commonly used LLM workloads. Compared to the best superoptimizers, Prism achieved up to 2.2x speedup; compared to the best compiler-based approaches, up to 4.9x. Additionally, Prism reduced end-to-end optimization time by up to 3.4x. These gains stem from symbolic pruning of the search space, which lets Prism handle complex optimization problems that traditional enumeration methods cannot.
Applications
Prism applies to tensor program optimization in modern ML workloads. By leveraging symbolic superoptimization, Prism automatically discovers efficient tensor program implementations, reducing reliance on manually designed rules. This decreases engineering effort for supporting new operators or hardware targets and expands the optimization space to capture combinatorial interactions that human intuition might miss. Prism thus provides an efficient optimization method for modern ML workloads, significantly enhancing the execution efficiency of tensor programs.
Limitations & Outlook
Prism may have limitations in handling specific complex hardware constraints, as these constraints might not be effectively expressed and pruned through symbolic reasoning. Prism's performance depends on the efficiency of symbolic graph generation and equivalence verification, which may become bottlenecks in certain scenarios. Prism's symbolic superoptimization approach may not significantly outperform existing methods on certain specific tensor programs. Future directions include extending Prism to support more types of hardware constraints, optimizing the efficiency of symbolic graph generation and equivalence verification, and exploring more symbolic reasoning techniques to further enhance optimization effectiveness.
Plain Language Accessible to non-experts
Imagine you're in a kitchen, cooking a meal. You have a recipe that tells you what ingredients you need, how to chop them, and how to cook them. Traditional methods are like following the recipe step by step, but sometimes there are faster ways, like chopping vegetables while the soup is boiling. Prism is like a smart kitchen assistant that not only follows the recipe but also automatically finds these faster methods. Instead of trying every possible cooking plan one at a time (there are far too many), it uses something called sGraph to describe whole families of cooking plans at once, and then uses careful reasoning to rule out the families that provably can't be the fastest. By doing this, it finds the fastest correct way to cook. Prism's strength lies in handling very complex recipes, even ones that ordinary chefs find overwhelming. In this way, Prism makes kitchen work more efficient, like adding a supercharger to the cooking process.
ELI14 Explained like you're 14
Hey there! Have you ever wondered why some AI apps respond instantly while others lag? Under the hood they run huge math programs on graphics chips, and there's a mind-boggling number of ways to arrange that math. Prism is like a smart assistant that finds the fastest arrangement. Instead of testing every arrangement one by one (there are way too many for that), it uses something called sGraph to describe whole groups of arrangements at once, then uses logic to throw out the groups that provably can't win. This way, the program runs at lightning speed without anyone waiting on an endless search. Prism is really cool because it can handle super complex programs, even ones that other tools choke on. It's like adding a supercharger to the math behind your favorite apps. Isn't that awesome?
Glossary
Symbolic Superoptimization
A method that uses symbolic reasoning to automatically discover efficient program implementations, organizing the search space without concretely enumerating candidate programs.
Prism achieves efficient execution of tensor programs through symbolic superoptimization.
sGraph
A symbolic hierarchical representation that compactly encodes large classes of tensor programs, enabling structured pruning through symbolic reasoning.
Prism uses sGraph to represent and optimize tensor programs.
e-graph Rewriting
A technique for program equivalence verification that compactly represents classes of equivalent expressions in an e-graph and grows them by applying rewrite rules (equality saturation), enabling efficient equivalence checking over symbolic representations.
Prism performs equivalence verification of symbolic graphs via e-graph rewriting.
Auto-tuning
A method that automatically searches for the best parameter configurations to improve program execution efficiency.
Prism employs auto-tuning to instantiate parameters for optimized GPU kernels.
Tensor Program
A program used to represent multi-dimensional array computations, commonly used in machine learning and deep learning.
Prism targets tensor programs for symbolic superoptimization.
Algebraic Identity
A mathematical equation that holds for all values of its variables, for example (AB)^T = B^T A^T for matrix multiplication; such identities are widely used in computation optimization.
Prism uses algebraic identities for symbolic reasoning and pruning.
Hardware Constraint
A hardware-specific feature or condition that limits program execution, potentially affecting performance and optimization space.
Prism considers hardware constraints in symbolic reasoning for optimization.
GPU Kernel
A parallel computation unit executed on a GPU, typically used to accelerate large-scale data processing.
Prism generates optimized GPU kernels through symbolic superoptimization.
Parallelization Parameter
A parameter used to control the parallel execution of a program, affecting computation distribution and scheduling.
Prism optimizes through symbolic parallelization parameters.
Equivalence Verification
A method to verify whether two programs are functionally equivalent, ensuring the correctness of the optimization process.
Prism ensures the correctness of symbolic graphs through equivalence verification.
Open Questions Unanswered questions from this research
1. How can complex hardware constraints be better expressed and handled in symbolic superoptimization? Current methods may not effectively capture the nuances of these constraints, affecting optimization outcomes.
2. How can the efficiency of symbolic graph generation and equivalence verification be further improved? These steps may become performance bottlenecks in certain scenarios, limiting Prism's applicability.
3. How can Prism be extended to support more types of tensor programs and operators? Current methods may not significantly outperform existing approaches on some tensor programs.
4. How can symbolic reasoning techniques be further developed to achieve more efficient search and pruning? Current methods may not fully leverage the potential of symbolic reasoning in some cases.
5. How can machine learning techniques be better integrated into symbolic superoptimization to enhance optimization outcomes? Current methods may not fully exploit the advantages of machine learning in some scenarios.
Applications
Immediate Applications
Machine Learning Model Optimization
Prism can be used to optimize tensor programs in machine learning models, improving execution efficiency and reducing training and inference times.
GPU Acceleration
By generating optimized GPU kernels, Prism can significantly speed up large-scale data processing, applicable to scientific computing and data analysis.
Automated Software Optimization
Prism can be used for automated software optimization, reducing reliance on manually designed rules and improving software performance and scalability.
Long-term Vision
General Computing Optimization
Prism's symbolic superoptimization method can be extended to broader computing domains, achieving automated optimization for general computing.
Intelligent System Development
By integrating machine learning techniques, Prism can be used to develop more intelligent systems, achieving automated program optimization and performance enhancement.
Abstract
This paper presents Prism, the first symbolic superoptimizer for tensor programs. The key idea is sGraph, a symbolic, hierarchical representation that compactly encodes large classes of tensor programs by symbolically representing some execution parameters. Prism organizes optimization as a two-level search: it constructs symbolic graphs that represent families of programs, and then instantiates them into concrete implementations. This formulation enables structured pruning of provably suboptimal regions of the search space using symbolic reasoning over operator semantics, algebraic identities, and hardware constraints. We develop techniques for efficient symbolic graph generation, equivalence verification via e-graph rewriting, and parameter instantiation through auto-tuning. Together, these components allow Prism to bridge the rigor of exhaustive search with the scalability required for modern ML workloads. Evaluation on five commonly used LLM workloads shows that Prism achieves up to $2.2\times$ speedup over best superoptimizers and $4.9\times$ over best compiler-based approaches, while reducing end-to-end optimization time by up to $3.4\times$.
References (20)
egg: Fast and extensible equality saturation
Max Willsey, Chandrakana Nandi, Y. Wang et al.
AlphaEvolve: A coding agent for scientific and algorithmic discovery
Alexander Novikov, Ngân Vũ, Marvin Eisenberger et al.
Mirage: A Multi-Level Superoptimizer for Tensor Programs
Mengdi Wu, Xinhao Cheng, Shengyu Liu et al.
TASO: optimizing deep learning computation with automatic generation of graph substitutions
Zhihao Jia, Oded Padon, James J. Thomas et al.
Ansor: Generating High-Performance Tensor Programs for Deep Learning
Lianmin Zheng, Chengfan Jia, Minmin Sun et al.
Automatically scheduling halide image processing pipelines
Ravi Teja Mullapudi, Andrew Adams, Dillon Sharlet et al.
Beyond Data and Model Parallelism for Deep Neural Networks
Zhihao Jia, M. Zaharia, A. Aiken
KernelFoundry: Hardware-aware evolutionary GPU kernel optimization
Nina Wiedemann, Quentin Leboutet, Michael Paulitsch et al.
EINNET: Optimizing Tensor Programs with Derivation-Based Transformations
Liyan Zheng, Haojie Wang, Jidong Zhai et al.
PET: Optimizing Tensor Programs with Partially Equivalent Transformations and Automated Corrections
Haojie Wang, Jidong Zhai, Mingyu Gao et al.
Unity: Accelerating DNN Training Through Joint Optimization of Algebraic Transformations and Parallelization
Colin Unger, Zhihao Jia, Wei Wu et al.
Learning to Optimize Tensor Programs
Tianqi Chen, Lianmin Zheng, Eddie Q. Yan et al.
NVIDIA Tensor Core Programmability, Performance & Precision
S. Markidis, Steven W. D. Chien, E. Laure et al.
Superoptimizer: a look at the smallest program
H. Massalin
TVM: End-to-End Optimization Stack for Deep Learning
Tianqi Chen, T. Moreau, Ziheng Jiang et al.
GraphPipe: Improving Performance and Scalability of DNN Training with Graph Pipeline Parallelism
Byungsoo Jeon, Mengdi Wu, Shiyi Cao et al.
Astra: A Multi-Agent System for GPU Kernel Performance Optimization
Anjiang Wei, Tianran Sun, Yogesh Seenichamy et al.
Transformer
Mukund R. Patel
Equality Saturation for Tensor Graph Superoptimization
Yichen Yang, Phitchaya Mangpo Phothilimthana, Y. Wang et al.
TensorFlow: A system for large-scale machine learning
Martín Abadi, P. Barham, Jianmin Chen et al.