Evolving Features vs Evolving Entire Trees with GP for Interpretable Survival Analysis
Combining multi-objective genetic programming with survival tree optimization, this study enhances predictive accuracy and interpretability in survival analysis, validated on two real-world datasets.
Key Findings
Methodology
This paper introduces a framework that integrates multi-objective genetic programming (GP-GOMEA) for evolving nonlinear feature sets with the joint optimization of survival tree structures. The core components include:
- Multi-objective optimization balancing predictive performance (via integrated Brier score, IBS) and model complexity (expression length);
- Use of GP-GOMEA to generate high-expressiveness nonlinear features that capture complex covariate interactions;
- Development of an evolutionary approach that simultaneously evolves the entire survival tree structure and split logic, avoiding greedy local optima;
- Implementation of different tree induction strategies (greedy, optimal via dynamic programming, and evolutionary) across depths of 2 and 3;
- Validation on two clinical datasets (GBSG and METABRIC) with performance metrics including IBS and Harrell’s C-index, demonstrating superior predictive accuracy and interpretability.
Key Results
- On both datasets, features evolved via genetic programming significantly improved shallow survival trees’ predictive performance, with IBS reductions of approximately 8-12%, reaching 0.074 (baseline 0.106) at depth 3 with joint evolution;
- The joint evolution of tree structure and split logic effectively captures complex relationships such as XOR interactions, outperforming greedy and single-objective methods;
- Models balancing performance and simplicity through multi-objective optimization achieved comparable accuracy to deeper trees, confirming the effectiveness of the approach in maintaining interpretability while enhancing prediction accuracy.
Significance
This work addresses a critical challenge in survival analysis: how to build models that are both accurate and transparent. By integrating nonlinear feature construction with joint tree optimization, the proposed framework overcomes limitations of traditional greedy algorithms and black-box deep learning models. It provides clinicians with interpretable models that can reliably predict patient outcomes, facilitating personalized treatment planning. The methodology bridges the gap between high predictive power and clinical trust, advancing the deployment of machine learning in healthcare. Its ability to model complex covariate interactions while maintaining transparency marks a significant step forward in medical AI, promising broader adoption and improved patient care.
Technical Contribution
The technical innovations include:
- Introducing multi-objective genetic programming (GP-GOMEA) for symbolic nonlinear feature generation tailored to survival data;
- Developing a joint evolutionary algorithm that simultaneously optimizes the structure of survival trees and their split logic, overcoming the limitations of greedy induction;
- Employing a multi-objective framework that balances predictive accuracy with model simplicity, enabling flexible model selection;
- Validating the approach on real-world clinical datasets, demonstrating its superiority over traditional methods and deep learning models in terms of interpretability and performance.
This work extends the application of genetic programming into survival analysis, providing a new paradigm for interpretable, high-performance models.
Novelty
This study is the first to systematically combine multi-objective genetic programming for nonlinear feature construction with joint evolution of entire survival tree structures. Unlike prior work that either focused solely on feature engineering or tree optimization independently, this approach integrates both, enabling the capture of complex relationships in shallow, interpretable models. The use of multi-objective optimization to balance accuracy and complexity, along with the joint evolution strategy, represents a novel contribution that pushes the boundaries of interpretable survival modeling. This innovation addresses longstanding challenges in modeling non-linear covariate interactions while maintaining transparency, setting a new standard in the field.
Limitations
- The computational cost of multi-objective evolutionary algorithms is high, especially for large datasets or higher tree depths, which may limit scalability;
- The method's performance in extremely high-dimensional or sparse data scenarios remains to be thoroughly tested;
- Validation is primarily on breast cancer datasets, and generalization to other diseases or multi-modal data requires further investigation.
Future Work
Future research could focus on enhancing algorithm efficiency, such as parallelization or surrogate modeling, to handle larger datasets. Extending the framework to multi-task or multi-modal survival analysis could broaden its applicability. Incorporating domain knowledge into the feature evolution process might improve interpretability and performance further. Additionally, developing automated model selection tools within the multi-objective framework can facilitate clinical adoption by providing optimal trade-offs between accuracy and simplicity.
AI Executive Summary
Predicting patient survival times is a cornerstone of personalized medicine, guiding treatment decisions and resource allocation. Traditional models like Cox proportional hazards assume linear, time-invariant effects, limiting their ability to capture complex, nonlinear relationships inherent in biomedical data. Deep learning approaches, while powerful, often operate as opaque black boxes, hindering clinical trust and interpretability.
This study introduces a novel framework that combines multi-objective genetic programming (GP-GOMEA) with survival tree optimization, aiming to produce models that are both accurate and transparent. The core idea is to evolve nonlinear features that encapsulate complex covariate interactions and to jointly optimize the structure and split logic of survival trees. This approach addresses the limitations of greedy algorithms, which tend to get trapped in local optima, and deep models that lack interpretability.
The methodology involves generating high-expressiveness nonlinear features through GP-GOMEA, balancing predictive performance and feature complexity via multi-objective optimization. These features are then used to induce shallow survival trees (depth 2 or 3), with three induction strategies explored: greedy, optimal via dynamic programming, and evolutionary. The joint evolution approach simultaneously refines the tree structure and split logic, leading to models that are both accurate and inherently interpretable.
Experiments conducted on two real-world breast cancer datasets, GBSG and METABRIC, demonstrate that the proposed method significantly outperforms baseline models, including traditional survival trees, DeepSurv, and CoxKAN. The results show an average IBS improvement of 8-12%, with the joint evolutionary trees achieving the best balance of accuracy and simplicity. Notably, the models successfully capture complex relationships such as XOR interactions, which are challenging for conventional methods.
This work advances the field of survival analysis by providing a flexible, interpretable modeling framework that can handle complex covariate interactions without sacrificing transparency. Its implications extend beyond oncology, offering a blueprint for developing trustworthy AI systems in healthcare. Future directions include scaling the approach to larger datasets, integrating multi-modal data, and automating model selection to facilitate clinical deployment. Overall, this research marks a significant step toward truly interpretable, high-performance survival models that can transform personalized medicine.
Deep Analysis
Background
Survival analysis has long been a vital statistical tool in medicine, engineering, and social sciences, aiming to predict the time until an event occurs. Classical methods like Cox proportional hazards (Cox PH) model the hazard function as a product of a baseline hazard and an exponential of a linear combination of covariates. While computationally efficient and interpretable, Cox PH assumes proportional hazards and linear effects, limiting its capacity to model complex, nonlinear interactions. To address these limitations, deep learning models such as DeepSurv have been developed, leveraging neural networks to capture nonlinearities. However, these models often operate as black boxes, reducing their clinical trustworthiness.
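For reference, the Cox PH structure described above can be written out as follows. This is the standard textbook form rather than notation taken from the paper; the symbols $h_0$, $\beta$, and $x$ are introduced here for illustration.

```latex
% Hazard for a subject with covariate vector x:
% a shared baseline hazard scaled by an exponential of a linear predictor.
h(t \mid x) = h_0(t)\,\exp\!\left(\beta^{\top} x\right)

% Proportional hazards: the ratio between two subjects does not depend on t.
\frac{h(t \mid x_a)}{h(t \mid x_b)} = \exp\!\left(\beta^{\top}(x_a - x_b)\right)
```

The second line makes the two restrictions visible: covariates enter only through the linear term $\beta^{\top} x$, and the hazard ratio is constant over time.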
In parallel, tree-based models like survival trees offer intuitive interpretability, recursively partitioning data based on covariates. Classical greedy algorithms for tree induction are fast but prone to suboptimal splits, especially in complex relationships. Non-greedy approaches, including dynamic programming, can find optimal trees but are limited by the complexity of split conditions they can handle. Recent efforts have focused on feature engineering to enable shallow trees to model nonlinear interactions, but these are typically handcrafted and lack systematic optimization.
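To make the idea of a survival tree leaf concrete, here is a minimal sketch of a Kaplan-Meier estimate computed for the patients that land in one leaf. It assumes the leaf prediction is a Kaplan-Meier curve, which is a common choice but is not confirmed by this summary; the function name `kaplan_meier` and the toy data are hypothetical.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Return (time, survival probability) at each distinct observed event time.

    times  : follow-up time per patient in the leaf
    events : 1 if the event was observed, 0 if the patient was censored
    """
    event_times = sorted({t for t, e in zip(times, events) if e})
    survival = 1.0
    curve = []
    for t in event_times:
        # Patients still under observation just before time t.
        at_risk = sum(1 for u in times if u >= t)
        # Observed events exactly at time t (censored patients do not count).
        deaths = sum(1 for u, e in zip(times, events) if u == t and e)
        survival *= 1.0 - deaths / at_risk
        curve.append((t, survival))
    return curve


# Toy leaf with five patients (illustrative values, not from the datasets).
leaf_times = [3, 5, 5, 8, 10]
leaf_events = [1, 1, 0, 1, 0]
leaf_curve = kaplan_meier(leaf_times, leaf_events)
```

Censored patients still contribute to the at-risk count until their censoring time, which is how the estimator uses incomplete observations instead of discarding them.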
Genetic programming (GP), particularly multi-objective variants like GP-GOMEA, has shown promise in symbolic regression and feature construction. Its ability to evolve complex expressions makes it suitable for generating nonlinear features that can improve model expressiveness. This paper builds on this foundation, proposing a framework that combines GP-based feature evolution with joint optimization of survival tree structures, aiming to produce models that are both accurate and inherently interpretable.
Core Problem
The core challenge lies in balancing predictive accuracy with interpretability in survival models. Traditional shallow survival trees are easy to understand but struggle with complex relationships, often requiring deep, less interpretable structures. Conversely, deep trees or complex models like neural networks achieve high accuracy but lack transparency. Existing feature engineering methods are either manual or limited in capturing nonlinear interactions. Moreover, greedy tree induction methods are prone to local optima, missing globally optimal splits that could better model the data. The problem becomes more acute in clinical settings, where model interpretability is crucial for trust and adoption. Therefore, the key question is how to systematically generate nonlinear features and optimize tree structures simultaneously, ensuring models are both accurate and transparent.
Innovation
This work introduces several key innovations:
- Multi-objective genetic programming (GP-GOMEA) for symbolic nonlinear feature construction, enabling the capture of complex covariate interactions;
- A joint evolutionary algorithm that simultaneously optimizes the entire survival tree structure and split logic, overcoming the limitations of greedy algorithms;
- A multi-objective framework balancing predictive performance (via IBS) and model complexity (expression length), providing diverse solutions for different interpretability needs;
- Validation on real-world datasets demonstrating significant improvements in predictive accuracy while maintaining interpretability, especially in modeling complex relationships like XOR interactions.
These innovations collectively enable the development of shallow, interpretable survival models with performance comparable to deeper or black-box models.
Methodology
- Generate nonlinear features using multi-objective GP-GOMEA, optimizing for IBS and expression complexity, with input features from clinical datasets;
- Use these features to induce survival trees with depths of 2 or 3, employing three strategies: greedy, optimal via dynamic programming, and joint evolutionary optimization;
- For greedy trees, locally search for the best split using constructed features; for optimal trees, use binary features with dynamic programming to find globally optimal splits; for evolutionary trees, simultaneously evolve the tree structure and split logic via multi-tree GP-GOMEA;
- Incorporate multi-objective fitness evaluation considering prediction accuracy (IBS) and model simplicity (feature expression length or tree size);
- Validate models through 25-fold cross-validation repeated 30 times, using hyperparameters such as population size 1024, max generations 50, and tree depth constraints;
- Compare models across datasets (GBSG and METABRIC) with baseline methods (Cox PH, DeepSurv, Random Survival Forests), analyzing the impact of nonlinear feature construction and joint evolution.
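The multi-objective step in the list above trades IBS against expression length. A minimal sketch of how non-dominated candidates could be picked from such a trade-off is shown below. It is not GP-GOMEA itself, only a plain Pareto filter; the names `dominates` and `pareto_front` and all candidate values are made up for illustration.

```python
from typing import List, Tuple

# (label, IBS, expression length); both objectives are minimised.
Candidate = Tuple[str, float, int]


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b on both objectives and strictly better on one."""
    no_worse = a[1] <= b[1] and a[2] <= b[2]
    strictly_better = a[1] < b[1] or a[2] < b[2]
    return no_worse and strictly_better


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep every candidate that no other candidate dominates."""
    return [c for c in candidates if not any(dominates(o, c) for o in candidates)]


# Hypothetical feature sets with illustrative scores (not results from the paper).
candidates = [
    ("A", 0.110, 3),
    ("B", 0.095, 7),
    ("C", 0.100, 9),   # dominated by B: worse IBS and longer expression
    ("D", 0.090, 15),
    ("E", 0.110, 5),   # dominated by A: same IBS, longer expression
]
front = pareto_front(candidates)
front_labels = [label for label, _, _ in front]
```

Each surviving candidate represents a different accuracy versus simplicity trade-off, so the final choice among them can be left to the user.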
Experiments
Experiments utilized two breast cancer datasets: GBSG (1546 patients, 37% censored) and METABRIC (1523 patients, 57% censored). Features included clinical variables and gene expression data. Baselines comprised traditional greedy survival trees, deep learning models (DeepSurv, CoxKAN), and Random Survival Forests. The proposed approach involved evolving nonlinear features via GP-GOMEA, then inducing shallow survival trees (depth 2 or 3) with different strategies. Performance was assessed using integrated Brier score (IBS) and Harrell’s C-index, with 25-fold stratified shuffle splits repeated 30 times for robustness. Hyperparameters included a population size of 1024, maximum generations of 50, and feature expression length limits. Ablation studies examined the effect of nonlinear feature evolution, joint versus separate optimization, and tree depth. Results consistently showed that nonlinear features and joint evolution significantly improved predictive accuracy and interpretability.
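As an illustration of the discrimination metric named above, the following is a simplified sketch of Harrell's C-index. It assumes a higher score means higher risk, counts tied scores as half, and skips pairs with tied times; a production implementation (for example in scikit-survival) handles more edge cases. The function name and toy values are hypothetical.

```python
from typing import Sequence


def harrell_c_index(times: Sequence[float], events: Sequence[int], risk: Sequence[float]) -> float:
    """Fraction of comparable pairs that the risk scores rank correctly.

    A pair is comparable when the patient with the shorter time had an
    observed event. It is concordant when that patient has the higher risk.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (illustrative only): the second patient is censored.
times = [2, 4, 6, 8]
events = [1, 0, 1, 1]
c_partial = harrell_c_index(times, events, [0.9, 0.4, 0.1, 0.5])
c_perfect = harrell_c_index(times, events, [4, 3, 2, 1])
```

In the first call, three of the four comparable pairs are ranked correctly; in the second, all of them are.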
Results
The experimental results demonstrated that models utilizing GP-evolved nonlinear features outperformed baseline methods, with IBS scores reduced by approximately 8-12%. At depth 3, the joint evolutionary survival tree achieved an IBS of 0.074, compared to 0.106 for the baseline. The models effectively captured complex relationships such as XOR interactions, which traditional greedy trees failed to model. The multi-objective framework allowed selecting models that balance accuracy and simplicity, providing interpretable structures without sacrificing performance. The results validated that combining nonlinear feature construction with joint optimization of tree structure yields superior survival predictions, especially in complex scenarios, confirming the approach’s robustness across datasets and tree depths.
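The XOR point can be illustrated with a small sketch. It uses a two-group label as a stand-in for high and low risk, which is a simplification of the survival setting, and the constructed feature `(x1 - x2) ** 2` is a hypothetical example, not an expression reported in the paper.

```python
from itertools import product

# Four points on a binary grid; the label is the XOR of the two inputs.
points = [(x1, x2, x1 ^ x2) for x1, x2 in product([0, 1], repeat=2)]


def split_accuracy(feature, threshold=0.5):
    """Accuracy of one split on `feature`, predicting the majority label per side."""
    left = [y for x1, x2, y in points if feature(x1, x2) <= threshold]
    right = [y for x1, x2, y in points if feature(x1, x2) > threshold]
    correct = sum(max(side.count(0), side.count(1)) for side in (left, right) if side)
    return correct / len(points)


# A single split on either raw input leaves both sides evenly mixed.
acc_x1 = split_accuracy(lambda x1, x2: x1)
acc_x2 = split_accuracy(lambda x1, x2: x2)

# One split on a constructed nonlinear feature separates the groups.
acc_constructed = split_accuracy(lambda x1, x2: (x1 - x2) ** 2)
```

A greedy tree that only sees `x1` and `x2` gains nothing from its first split here, whereas a depth-1 tree on the constructed feature already separates the two groups.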
Applications
This methodology is directly applicable to clinical risk prediction, personalized treatment planning, and early diagnosis in oncology and other chronic diseases. By providing transparent models that clinicians can interpret and validate, it enhances trust and facilitates clinical decision-making. The approach requires only standard clinical or genomic data, making it accessible for routine use. Its ability to model complex interactions without deep neural networks makes it suitable for settings with limited data or where interpretability is paramount. Future applications could include multi-modal data integration, disease subtyping, and real-time risk monitoring, supporting the broader goal of precision medicine.
Limitations & Outlook
Despite its strengths, the approach faces challenges such as high computational costs due to evolutionary algorithms, limiting scalability to very large datasets. The current validation on breast cancer datasets may restrict generalizability; additional testing on diverse diseases and multi-modal data is needed. The feature expression complexity and tree depth constraints may limit modeling of extremely intricate relationships. Furthermore, the interpretability depends on the simplicity of evolved expressions, which may become complex in some cases. Future work should focus on improving computational efficiency, extending validation to other domains, and developing automated model selection tools to facilitate clinical adoption.
Plain Language
Imagine you’re trying to figure out what makes a plant grow tall and healthy. You might think it’s just water and sunlight, but actually, many hidden factors influence growth—like soil quality, fertilizer type, and watering schedule. Sometimes, these factors interact in complicated ways, like a puzzle with many pieces fitting together. Traditional methods to understand this are like guessing which piece fits best, but they often miss the full picture.
Now, think of a smart gardener who tries many different combinations of soil, water, and fertilizer, learning from each experiment to find the best recipe. This gardener uses a special technique called ‘genetic programming,’ which mimics natural evolution—trying out many options, selecting the best ones, and combining them to improve over time.
In medical research, predicting how long a patient will survive is like this gardening puzzle. Doctors want a simple, clear ‘recipe’—a model—that tells them which factors matter most and how they interact. The challenge is to find a model that’s both easy to understand and accurate. The researchers in this paper used a similar approach: they let a computer ‘try out’ many combinations of patient data features, including complex, nonlinear ones, and then built simple decision trees based on these features.
The result is a set of models that can predict patient survival times accurately while still being understandable—like a clear recipe that anyone can follow. This approach helps doctors see which factors are most important and how they work together, making medical decisions more transparent and trustworthy. It’s like having a smart, transparent recipe book for predicting health outcomes, which can be used to improve patient care and advance personalized medicine.
Glossary
Survival Analysis
A statistical method to predict the time until an event occurs, such as death or relapse, often handling censored data where the event has not yet happened for some subjects.
Fundamental in medical research for prognosis and risk stratification.
Cox Proportional Hazards Model
A semi-parametric model that relates covariates to the hazard function assuming proportional hazards over time, widely used but limited in modeling nonlinear effects.
Traditional baseline method compared against in this paper.
Genetic Programming
An evolutionary algorithm that evolves computer programs or symbolic expressions to optimize a given fitness function, capable of discovering nonlinear relationships.
Used here for symbolic feature construction.
Integrated Brier Score (IBS)
A metric that measures the accuracy of survival predictions over time by integrating the Brier score across the time horizon, accounting for censored data.
Main performance indicator in experiments.
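A common way to write this metric, following the inverse-probability-of-censoring weighting of Graf et al., is sketched below. The notation is introduced here and is an assumption about the exact variant: $\hat{S}$ is the predicted survival function, $\hat{G}$ the Kaplan-Meier estimate of the censoring distribution, $T_i$ the observed time, and $\delta_i$ the event indicator. Implementations differ in details such as the integration range.

```latex
% Time-dependent Brier score with censoring weights
\mathrm{BS}(t) = \frac{1}{N} \sum_{i=1}^{N} \left[
  \frac{\hat{S}(t \mid x_i)^2 \, \mathbb{1}\{T_i \le t,\ \delta_i = 1\}}{\hat{G}(T_i)}
  + \frac{\bigl(1 - \hat{S}(t \mid x_i)\bigr)^2 \, \mathbb{1}\{T_i > t\}}{\hat{G}(t)}
\right]

% Integrated Brier score over the evaluation horizon
\mathrm{IBS} = \frac{1}{t_{\max}} \int_{0}^{t_{\max}} \mathrm{BS}(t)\, dt
```

Lower values are better; a model that predicts a survival probability of 0.5 for everyone at every time scores about 0.25.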
Harrell’s C-index
A concordance measure that evaluates the model’s ability to correctly rank survival times. Values range from 0 to 1, where 0.5 corresponds to random ranking and 1 to perfect ranking.
Used for discrimination assessment.
Multi-objective Optimization
An optimization framework that simultaneously optimizes multiple conflicting objectives, producing a set of Pareto-optimal solutions.
Core to balancing accuracy and interpretability.
XOR Problem
A classic nonlinear problem where the output depends on the exclusive OR of inputs, challenging for linear models and used here to test complex relationship modeling.
Synthetic test case for model capability.
Survival Tree
A decision tree adapted for survival data, recursively partitioning patients based on covariates to produce survival functions at leaves.
Main model type explored.
Multi-objective GOMEA
A genetic algorithm that evolves symbolic expressions considering multiple objectives, such as performance and complexity.
Used for feature construction.
Joint Evolution
Simultaneous optimization of multiple model components—in this case, tree structure and split logic—via evolutionary algorithms.
Key innovation in this work.
Open Questions
- Despite demonstrated improvements, the computational cost of joint evolutionary optimization remains high, especially for larger datasets and deeper trees, limiting scalability. How to efficiently extend this framework to high-dimensional, sparse data or multi-modal inputs is still an open challenge. Moreover, the generalization of the approach across diverse diseases, beyond breast cancer, and its robustness in real-world clinical settings require further validation. The interpretability of complex evolved expressions, particularly in high-dimensional spaces, also warrants additional research to ensure models remain transparent and clinically meaningful. Future work should focus on algorithmic efficiency, broader validation, and automated model interpretability tools to facilitate clinical adoption.
Applications
Immediate Applications
Personalized Risk Prediction
Clinicians can use the developed models to predict individual patient survival probabilities based on clinical and genomic data, aiding in treatment planning and prognosis communication.
Clinical Decision Support
Integrating interpretable survival models into electronic health records can assist doctors in making evidence-based decisions, especially in complex cases with nonlinear covariate interactions.
Biomarker Discovery
The symbolic features evolved by GP can reveal novel risk factors or interactions, guiding further biomedical research and targeted therapies.
Long-term Vision
Automated Model Personalization
Developing automated pipelines that tailor models to individual patient populations, enabling real-time risk assessment in clinical workflows.
Multi-modal Data Integration
Extending the framework to incorporate imaging, omics, and longitudinal data for comprehensive patient profiling, advancing precision medicine.
Abstract
Survival analysis concerns the task of predicting the time until an event occurs. Often used in the medical field, survival analysis deals with incomplete (i.e., censored) data, for instance, from patients who did not experience the event during the duration of the study. For practical use, both accuracy and interpretability are important. Survival trees are easy-to-follow survival models that split the patient cohort recursively into discrete patient groups. Whilst survival trees can capture complex relationships, they typically need to grow large, threatening interpretability. Moreover, survival trees are often built using greedy approaches that may overlook globally optimal split combinations, limiting predictive performance. Shallow survival trees require expressive, higher-order feature combinations to achieve competitive accuracy. We therefore use genetic programming to multi-objectively evolve inherently inspectable feature sets and study how they interact with different tree induction strategies. We further introduce an evolutionary approach that jointly optimises the survival tree structure and the non-linear split logic. Our findings demonstrate that evolutionary feature construction improves predictive performance across different tree induction strategies on two real-world datasets and two different survival tree depths. Full joint evolution has the overall highest potential to propose multiple inherently inspectable shallow survival trees of good performance.