Runtime Analysis of Cartesian Genetic Programming in Evolving Boolean Functions

TL;DR

This paper provides the first runtime analysis of Cartesian Genetic Programming (CGP) in evolving Boolean functions, establishing bounds of O(nD^5) for conjunctions and exponential time for XOR, highlighting the impact of selection strategies.

cs.NE 🔴 Advanced 2026-06-15 34 views
Duc-Cuong Dang Roman Kalkreuth Andre Opris
Genetic Programming Runtime Analysis Boolean Functions Graph-Based Representation Algorithm Complexity

Key Findings

Methodology

This study employs a rigorous asymptotic analysis combining fitness-level partitioning, Markov chain modeling, and drift theorems to analyze the runtime of (1+1) CGP and its variant (1+1) CGP*. The analysis considers the effects of strict versus non-strict selection mechanisms, with the former only accepting better solutions and the latter allowing equal fitness solutions, including redundant inactive gates. The core algorithm utilizes single active-gene mutation (SAM), which selectively mutates active genes, ensuring efficient exploration of the search space. Theoretical bounds are derived for the expected number of fitness evaluations needed to evolve conjunctions of n variables using at most D ≥ n−1 binary AND gates, with proofs extending to the negative case of XOR functions via the negative drift theorem. The analysis is complemented by experiments on randomly generated Boolean functions, validating the bounds and illustrating the influence of selection strategies on convergence speed.

Key Results

  • For constructing conjunctions (Andn) with D ≥ n−1, the expected number of fitness evaluations for (1+1) CGP and (1+1) CGP* is bounded by O(nD^5) and O(nD^4), respectively. When D scales linearly with n, the complexity remains polynomial, with empirical data showing that increasing D beyond n significantly raises the evaluation count. The results demonstrate that non-strict selection accelerates convergence by accepting solutions with redundant gates that do not contribute to fitness, thus speeding up the search process.
  • In contrast, the evolution of XOR functions requires exponential time with high probability, as proved via the negative drift theorem. This indicates a fundamental difficulty in evolving non-linear, non-conjunctive Boolean functions with CGP, highlighting the limitations of the approach for certain target functions.
  • Additional experiments with incomplete training sets reveal that sampling a subset of the truth table reduces the average evaluation count while maintaining good generalization, suggesting practical benefits in real-world scenarios where full truth tables are computationally infeasible.

Significance

This work bridges the gap between empirical observations and theoretical understanding of CGP's performance in Boolean function synthesis. By establishing polynomial bounds for conjunctions and demonstrating exponential difficulty for XOR, it clarifies the conditions under which CGP is efficient. These insights inform the design of more effective genetic programming strategies, especially in hardware synthesis and logic circuit optimization, where understanding the complexity landscape is crucial. The findings also highlight the importance of selection mechanisms, providing a foundation for future algorithmic improvements and theoretical advancements in genetic programming.

Technical Contribution

The paper introduces a comprehensive theoretical framework for analyzing CGP's runtime in Boolean function evolution, employing fitness-level partitioning, Markov chain analysis, and drift theorems. It rigorously proves polynomial bounds for conjunctions with D≥n−1 and exponential bounds for XOR functions, considering both strict and non-strict selection strategies. The analysis clarifies how accepting solutions with redundant, non-contributing gates can accelerate convergence, offering a new perspective on search dynamics. Additionally, the work provides space complexity bounds and explores the impact of incomplete training sets, contributing to the broader understanding of GP complexity and guiding practical algorithm design.

Novelty

This is the first rigorous theoretical analysis of CGP's runtime in Boolean function synthesis, particularly distinguishing between strict and non-strict selection strategies. Unlike prior empirical or limited analyses focused on tree-based GP, this work models CGP as a graph-based representation with specific mutation operators (SAM), deriving explicit bounds for different target functions. The novel application of the negative drift theorem to demonstrate exponential complexity for XOR functions and the detailed comparison of selection mechanisms represent significant advances in understanding the computational limits of CGP. These contributions establish a new foundation for future theoretical and practical research in genetic programming.

Limitations

  • The analysis assumes access to complete truth tables for target functions, which is often impractical in real-world applications where only partial data is available. This may limit the direct applicability of the theoretical bounds.
  • The focus on specific functions (conjunctions and XOR) leaves open questions about the complexity for other classes of Boolean functions, especially those with more complex or composite structures.
  • The theoretical results do not account for hardware-specific factors such as parallelization, memory constraints, or implementation overhead, which can influence actual runtime performance.
  • The current study considers static training sets and single-objective optimization; extending to dynamic environments or multi-objective scenarios remains an open challenge.

Future Work

Future research will explore extending the analysis to broader classes of Boolean functions, including non-linear and composite functions, and incorporating partial or noisy training data. Investigations into multi-objective optimization, dynamic environments, and hardware-accelerated implementations are also planned. Additionally, developing adaptive selection mechanisms that balance exploration and exploitation based on theoretical insights could further improve CGP efficiency. The integration of deep learning techniques with genetic programming for complex circuit design and automated reasoning tasks represents another promising direction, aiming to bridge theoretical guarantees with practical scalability.

AI Executive Summary

Genetic Programming (GP) has long been celebrated for its ability to automatically discover solutions to complex problems by mimicking natural evolution. Among its various forms, Cartesian Genetic Programming (CGP) stands out due to its graph-based representation, which offers enhanced flexibility and expressive power, especially in logic circuit synthesis. Despite its empirical success, the theoretical understanding of CGP's performance, particularly in evolving Boolean functions, has remained limited. This paper addresses this gap by providing the first rigorous runtime analysis of CGP in this domain.

The core contribution lies in establishing asymptotic bounds on the expected number of fitness evaluations needed for CGP to evolve specific Boolean functions, such as conjunctions and XORs. Using a combination of fitness-level analysis, Markov chain modeling, and the negative drift theorem, the authors derive that constructing an n-input conjunction with D ≥ n−1 AND gates requires O(nD^5) evaluations under strict selection, which improves to O(nD^4) with non-strict selection. These bounds are significant because they demonstrate that polynomial runtime guarantees are achievable for certain classes of functions, provided the algorithm employs strategies that accept solutions with equal fitness, including redundant, non-contributing gates.

Conversely, the analysis reveals a stark contrast when evolving XOR functions. The negative results show that CGP requires exponential time with high probability, highlighting the inherent difficulty of non-linear, non-conjunctive Boolean functions. This negative result underscores the importance of understanding the structure of target functions when applying genetic programming techniques.

Complementing the theoretical work, the authors conduct experiments on randomly sampled Boolean functions. These experiments confirm the theoretical bounds and show that using incomplete training sets—sampling only a portion of the truth table—can further reduce the average evaluation count while maintaining good generalization. This insight is particularly relevant for practical applications, where full truth tables are often infeasible due to exponential size.

Overall, this research advances the theoretical foundation of CGP, clarifying the conditions under which it can efficiently evolve Boolean functions. It highlights the critical role of selection strategies and provides a roadmap for future improvements. The findings have broad implications for logic circuit design, hardware synthesis, and automated reasoning, paving the way for more robust, theoretically grounded genetic programming algorithms in complex domains. Future work will extend these analyses to more diverse functions, dynamic environments, and hardware-accelerated implementations, aiming to bridge the gap between theory and practice in evolutionary computation.

Deep Dive

Abstract

Cartesian Genetic Programming (CGP) is among the practical and popular forms of Genetic Programming as it uses a graph-based representation of programs. This paper presents a first runtime analysis of CGP in evolving Boolean functions using complete training sets. We prove an asymptotic bound $O(n D^5)$ for the expected number of fitness evaluations of CGP to construct a conjunction of $n$ inputs using at most $D \geq n-1$ binary gates, a minimal function set, and even with a strict survival selection. When the non-strict selection is used, the bound is improved to $O(n D^4)$. Our analysis reveals interesting characteristics of CGP induced search, which have been only observed empirically. In particular, enabling the acceptance of equally good solutions, including those with connected gates non-contributing to fitness, can lead to a speedup, and consequently a better asymptotic time bound. In contrast to conjunctions, we also prove a negative result which shows that CGP requires exponential time to evolve an exclusive disjunction. Experiments evolving conjunctions complement our theoretical findings. The use of incomplete training sets is found to further reduce the average number of fitness evaluations while maintaining a good level of generalisation.

cs.NE cs.AI cs.LG