CLVAE: A Variational Autoencoder for Long-Term Customer Revenue Forecasting
CLVAE is a variational-autoencoder-based model that improves the accuracy of long-term customer revenue forecasting.
Key Findings
Methodology
The paper proposes a variational autoencoder (VAE)-based model, CLVAE, for predicting long-term customer revenue. This model retains the process-based likelihood of traditional attrition-transaction-spend models while replacing the restrictive parametric mixing distribution with a flexible latent representation learned by encoder-decoder networks. CLVAE can remain reliable without contextual covariates and flexibly incorporate rich covariates and nonlinear effects when available.
Key Results
- Result 1: Across multiple real-world datasets and prediction horizons, the CLVAE model improved prediction accuracy over state-of-the-art benchmarks; on one dataset, prediction error fell by 15%.
- Result 2: The CLVAE model can maintain stability without contextual covariates and flexibly incorporate nonlinear effects when rich covariates are available.
- Result 3: Ablation studies confirmed the robustness and accuracy of the CLVAE model across different datasets, particularly excelling in long-term predictions.
Significance
This research holds significant implications for both academia and industry. It addresses the trade-off between structural stability and flexibility in long-term revenue forecasting, providing a more accurate tool for marketing resource allocation in non-contractual settings. Businesses benefit directly as better assessments of future customer revenues improve campaign targeting efficiency.
Technical Contribution
The technical contribution lies in embedding domain-specific models into the VAE framework, enabling flexible representation learning while retaining an econometrically meaningful process structure. Compared to state-of-the-art methods, the CLVAE model demonstrates greater robustness and scalability when handling sparse and irregular transaction data.
Novelty
The CLVAE model is, to the authors' knowledge, the first to combine traditional probabilistic customer base models with deep learning techniques in this way, offering a nonparametric extension. Compared to related work, CLVAE relaxes restrictive assumptions on latent heterogeneity by learning flexible latent representations.
Limitations
- Limitation 1: The CLVAE model may perform poorly with extremely sparse data, as it relies on a certain amount of historical data to learn latent representations.
- Limitation 2: Training the model can be time-consuming, especially on large-scale datasets.
- Limitation 3: The model's performance may be affected by the choice of covariates and parameter tuning.
Future Work
Future research directions include exploring the application of the CLVAE model across broader industries and data environments, and further optimizing computational efficiency. Additionally, integrating more contextual information and dynamic factors into the model is a crucial direction.
AI Executive Summary
In non-contractual settings, predicting customers' long-term revenue is crucial for effective marketing resource allocation. However, existing approaches face a trade-off between structural stability and flexibility. Traditional probabilistic models offer robust long-term forecasts through strong structural assumptions, while flexible machine learning models require substantial training data and careful tuning.
This paper proposes a variational autoencoder (VAE)-based model, CLVAE, which retains the process-based likelihood of traditional attrition-transaction-spend models while replacing the restrictive parametric mixing distribution with a flexible latent representation learned by encoder-decoder networks. The CLVAE model provides a single model for customer attrition, transactions, and spending, remains reliable without contextual covariates, and flexibly incorporates rich covariates and nonlinear effects when available.
The core technical principle of the CLVAE model lies in utilizing the VAE's generative latent-variable model, achieving flexible high-dimensional data modeling through variational inference. By compressing observed recency and frequency data into latent variables, the model achieves a nonparametric extension of traditional probabilistic models.
Across multiple real-world datasets and prediction horizons, the CLVAE model improved prediction accuracy over the latest benchmarks; on one dataset, prediction error fell by 15%. This improvement directly benefits businesses, as better assessments of future customer revenues enhance campaign targeting efficiency.
This research provides significant guidance for academia and industry, demonstrating how to embed domain-specific models into the VAE framework, enabling flexible representation learning while retaining an econometrically meaningful process structure. Future research directions include exploring the application of the CLVAE model across broader industries and data environments, and further optimizing computational efficiency.
Deep Analysis
Background
In non-contractual settings, firms must routinely infer customers' long-term future revenue from transaction data that record only purchase timing and monetary value. Customers differ substantially in their underlying purchase propensities, spending levels, and attrition risks, yet these differences are only indirectly reflected in sparse transaction data. Attrition itself is not directly observed, so a period without purchases is inherently ambiguous: a customer may be temporarily inactive or may have permanently discontinued purchasing. The structure of the observed transaction records makes learning these differences difficult. Additionally, purchase behavior is highly heterogeneous, with up to 50% of customers purchasing only once while others transact repeatedly. Even among repeat buyers, transaction records are sparse and irregular: purchases occur at uneven intervals and are separated by long stretches with no transactions. Finally, observation windows vary with customer tenure and are often short relative to the forecasting horizon. As a result, the observed records contain limited information to infer customer-specific propensities and predict long-run revenue.
Core Problem
In non-contractual settings, predicting customers' long-term revenue is crucial for effective marketing resource allocation. However, existing approaches face a trade-off between structural stability and flexibility. Traditional probabilistic models offer robust long-term forecasts through strong structural assumptions, while flexible machine learning models require substantial training data and careful tuning. The challenge is whether it is possible to retain the structural strengths of traditional probabilistic models while relaxing their restrictive assumptions about latent heterogeneity by leveraging deep learning techniques.
Innovation
This paper proposes a variational autoencoder (VAE)-based model, CLVAE, which combines the process-based likelihood of traditional attrition-transaction-spend models with a flexible latent representation learned by encoder-decoder networks. The CLVAE model provides a single model for customer attrition, transactions, and spending, remains reliable without contextual covariates, and flexibly incorporates rich covariates and nonlinear effects when available. Compared to state-of-the-art methods, the CLVAE model demonstrates greater robustness and scalability when handling sparse and irregular transaction data.
Methodology
- The CLVAE model is based on the variational autoencoder (VAE) framework, using a generative latent-variable model fitted by variational inference to model high-dimensional data flexibly.
- It retains the process-based likelihood of traditional attrition-transaction-spend models while replacing the restrictive parametric mixing distribution with a flexible latent representation learned by encoder-decoder networks.
- The model provides a single framework for customer attrition, transactions, and spending, remaining reliable without contextual covariates and flexibly incorporating nonlinear effects when rich covariates are available.
- By compressing observed recency and frequency data into latent variables, the model achieves a nonparametric extension of traditional probabilistic models.
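The steps above can be sketched as a minimal, illustrative VAE with a process-style decoder head. This is a generic sketch under assumed names and dimensions, not the paper's implementation: the encoder maps (recency, frequency) summaries to a latent distribution, a reparameterized sample is drawn, and the decoder emits a positive purchase rate whose Poisson likelihood stands in for one component of the process-based likelihood.

```python
import numpy as np

rng = np.random.default_rng(0)

def encoder(x, W, b):
    """Map (recency, frequency) features to a latent mean and log-variance."""
    h = np.tanh(x @ W["h"] + b["h"])
    return h @ W["mu"] + b["mu"], h @ W["logvar"] + b["logvar"]

def decoder(z, W, b):
    """Map a latent sample to a positive per-customer purchase rate."""
    h = np.tanh(z @ W["h"] + b["h"])
    return np.exp(h @ W["rate"] + b["rate"])

def elbo(x, counts, enc_W, enc_b, dec_W, dec_b):
    """Evidence lower bound: Poisson reconstruction term minus KL to N(0, I)."""
    mu, logvar = encoder(x, enc_W, enc_b)
    eps = rng.standard_normal(mu.shape)
    z = mu + np.exp(0.5 * logvar) * eps          # reparameterization trick
    lam = decoder(z, dec_W, dec_b)
    recon = np.sum(counts * np.log(lam) - lam)   # Poisson log-lik, up to a constant
    kl = 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar)
    return recon - kl

# Tiny random initialization (dimensions are illustrative).
d_in, d_h, d_z = 2, 4, 2
enc_W = {"h": 0.1 * rng.normal(size=(d_in, d_h)),
         "mu": 0.1 * rng.normal(size=(d_h, d_z)),
         "logvar": 0.1 * rng.normal(size=(d_h, d_z))}
enc_b = {"h": np.zeros(d_h), "mu": np.zeros(d_z), "logvar": np.zeros(d_z)}
dec_W = {"h": 0.1 * rng.normal(size=(d_z, d_h)),
         "rate": 0.1 * rng.normal(size=(d_h, 1))}
dec_b = {"h": np.zeros(d_h), "rate": np.zeros(1)}

x = np.array([[0.2, 5.0], [0.9, 1.0]])   # (recency, frequency) per customer
counts = np.array([[5.0], [1.0]])        # observed transaction counts
print(elbo(x, counts, enc_W, enc_b, dec_W, dec_b))
```

In a full model, the decoder would emit parameters for all three processes (attrition, transactions, spending), and the weights would be trained by maximizing the ELBO with gradient ascent.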
Experiments
The paper evaluates the CLVAE model across multiple real-world datasets and prediction horizons. The experimental design includes using actual transaction datasets, setting different prediction horizons, and comparing with the latest benchmark models. The results show that the CLVAE model outperforms existing state-of-the-art methods in prediction accuracy, particularly excelling in long-term predictions. Ablation studies confirm the robustness and accuracy of the CLVAE model across different datasets.
Results
Across multiple real-world datasets and prediction horizons, the CLVAE model improved prediction accuracy over the latest benchmarks; on one dataset, prediction error fell by 15%. This improvement directly benefits businesses, as better assessments of future customer revenues enhance campaign targeting efficiency. Ablation studies confirmed the robustness and accuracy of the CLVAE model across datasets, with particularly strong gains in long-term predictions.
Applications
The CLVAE model can be directly applied to marketing resource allocation in non-contractual settings, improving campaign targeting efficiency. By better assessing future customer revenues, businesses can optimize resource allocation, enhancing customer retention and revenue. Additionally, the CLVAE model can be used in other fields requiring long-term predictions, such as financial risk assessment and customer relationship management.
Limitations & Outlook
Despite the CLVAE model's excellent performance across multiple real-world datasets, it may perform poorly with extremely sparse data, as it relies on a certain amount of historical data to learn latent representations. Training the model can be time-consuming, especially on large-scale datasets. Additionally, the model's performance may be affected by the choice of covariates and parameter tuning. Future research directions include exploring the application of the CLVAE model across broader industries and data environments, and further optimizing computational efficiency.
Plain Language (accessible to non-experts)
Imagine you're working in a large supermarket, and your task is to predict how much each customer will spend over the next year. You only have their past shopping records, like the last time they shopped, how often they shop, and how much they spend each time. Traditional methods are like using a fixed formula to predict each customer's spending, but this might not be flexible enough because each customer has different shopping habits.
Now, we have a new method, like a smart assistant, that can automatically adjust the prediction formula based on each customer's shopping habits. This method is called a Variational Autoencoder (VAE), and it learns the shopping habits from customers' shopping records and uses a flexible way to predict their future spending.
The advantage of this method is that it can make accurate predictions even with limited data and can make better use of additional information when available. For example, if you know a customer has recently moved, this method can automatically adjust the prediction formula to reflect this change.
In summary, this method is like a smart shopping assistant that can make more accurate predictions based on each customer's shopping habits and changes, helping the supermarket better manage inventory and promotions.
ELI14 (explained like you're 14)
Hey there! Imagine you're playing a super cool game where you need to predict how much your friends will spend in the next game. You have their past game records, like when they last played, how often they play, and how much they spend each time.
Traditional methods are like using a fixed formula to predict each friend's game spending, but this might not be flexible enough because each friend has different gaming habits.
Now, we have a new method, like a smart assistant, that can automatically adjust the prediction formula based on each friend's gaming habits. This method is called a Variational Autoencoder (VAE), and it learns the gaming habits from your friends' game records and uses a flexible way to predict their future spending.
The advantage of this method is that it can make accurate predictions even with limited data and can make better use of additional information when available. For example, if you know a friend has recently switched to a new game, this method can automatically adjust the prediction formula to reflect this change.
In summary, this method is like a smart gaming assistant that can make more accurate predictions based on each friend's gaming habits and changes, helping you better plan your game strategy and resources.
Glossary
Variational Autoencoder
A variational autoencoder is a generative latent-variable model that achieves flexible high-dimensional data modeling through variational inference. It learns latent structures and patterns from data for prediction, compression, and simulation.
In this paper, the variational autoencoder is used to replace the restrictive parametric mixing distribution of traditional probabilistic models, enabling flexible representation learning.
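A key ingredient of any VAE is the sampling step that makes latent draws differentiable. The following is a generic illustration of that reparameterization trick (not the paper's code), with made-up encoder outputs for one customer:

```python
import numpy as np

rng = np.random.default_rng(42)

# Illustrative encoder outputs for one customer: latent mean and log-variance.
mu = np.array([0.5, -1.0])
logvar = np.array([0.0, -2.0])

# Reparameterization trick: z = mu + sigma * eps with eps ~ N(0, I), which makes
# the sampling step differentiable with respect to mu and logvar during training.
eps = rng.standard_normal((10_000, 2))
z = mu + np.exp(0.5 * logvar) * eps

print(z.mean(axis=0))  # approximately mu
print(z.std(axis=0))   # approximately exp(0.5 * logvar)
```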
Attrition-Transaction-Spend Model
The attrition-transaction-spend model is a probabilistic model used to predict customer behavior by decomposing customer attrition, transactions, and spending processes.
In this paper, the process-based likelihood of this model is retained in the CLVAE model for predicting long-term customer revenue.
Latent Representation
Latent representation refers to the hidden variables learned by a model, capturing heterogeneity and structural information in the data.
In the CLVAE model, latent representation is used to replace the restrictive parametric mixing distribution of traditional models, enabling flexible representation learning.
Encoder-Decoder Network
An encoder-decoder network is a neural network architecture used to encode input data into latent representations and decode them back into output data.
In the CLVAE model, the encoder-decoder network is used to learn flexible latent representations.
Non-Contractual Setting
A non-contractual setting refers to a scenario where there is no formal contract binding customers to a company, allowing customers to stop purchasing at any time.
In this paper, the CLVAE model is designed for predicting customer revenue in non-contractual settings.
Covariate
A covariate is an additional variable used in a model to explain or predict the target variable.
In the CLVAE model, covariates can be flexibly incorporated to improve prediction accuracy.
Variational Inference
Variational inference is a technique for estimating latent variables in complex probabilistic models by optimizing a lower bound to approximate the posterior distribution.
In the CLVAE model, variational inference is used to learn latent representations.
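In standard notation (a general statement of the technique, not specific to this paper's derivation), variational inference maximizes the evidence lower bound (ELBO) on the marginal likelihood:

```latex
\log p_\theta(x) \;\ge\;
\mathbb{E}_{q_\phi(z \mid x)}\bigl[\log p_\theta(x \mid z)\bigr]
\;-\; \mathrm{KL}\bigl(q_\phi(z \mid x) \,\|\, p(z)\bigr)
```

Maximizing the first (reconstruction) term fits the process-based likelihood, while the KL term keeps the learned latent representation close to the prior.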
Generative Model
A generative model is a model that learns the probability distribution of data to generate new data.
In this paper, the CLVAE model is used as a generative model to predict long-term customer revenue.
Data Sparsity
Data sparsity refers to the lack of sufficient information or observations in a dataset.
The CLVAE model is designed to handle sparse and irregular transaction data.
Nonparametric Method
A nonparametric method is a statistical method that does not rely on specific parametric distribution assumptions.
The CLVAE model achieves a nonparametric extension of traditional probabilistic models by learning flexible latent representations.
Open Questions (unanswered questions from this research)
- Open Question 1: How can the CLVAE model's prediction accuracy be improved in extremely sparse data environments? Current methods rely on a certain amount of historical data to learn latent representations, which may degrade performance in extremely sparse cases.
- Open Question 2: How can the computational efficiency of the CLVAE model be further optimized? Training can be time-consuming, especially on large-scale datasets, posing a challenge for practical applications.
- Open Question 3: How can more contextual information and dynamic factors be integrated into the CLVAE model? The current model primarily relies on static covariates, while dynamic factors may significantly affect prediction accuracy.
- Open Question 4: How can the CLVAE model be applied across broader industries and data environments? Current research focuses on specific non-contractual settings, and its applicability in other fields remains to be explored.
- Open Question 5: How should the choice of covariates and parameter tuning be handled in the CLVAE model? Performance may depend on both, requiring further research on optimization strategies.
Applications
Immediate Applications
Marketing Resource Allocation
The CLVAE model can be used for marketing resource allocation in non-contractual settings, improving campaign targeting efficiency by better assessing future customer revenues.
Customer Relationship Management
Businesses can use the CLVAE model to optimize customer relationship management strategies, enhancing customer retention and revenue.
Financial Risk Assessment
The CLVAE model can be applied in the financial industry for risk assessment, helping businesses develop better risk management strategies by predicting long-term customer revenue.
Long-term Vision
Cross-Industry Applications
The CLVAE model has the potential to be applied in more industries, such as retail, insurance, and telecommunications, optimizing resource allocation and customer relationship management by predicting long-term customer revenue.
Integration of Dynamic Factors
In the future, the CLVAE model can integrate more dynamic factors, improving prediction accuracy and applicability, helping businesses better respond to market changes.
Abstract
Predicting customers' long-term revenue from sparse and irregular transaction data is central to marketing resource allocation in non-contractual settings, yet existing approaches face a trade-off. Traditional probabilistic customer base models deliver robust long-horizon forecasts by imposing strong structural assumptions, while flexible machine-learning models often require substantial training data and careful tuning. We propose a variational-autoencoder-based model that preserves the process-based likelihood of established attrition-transaction-spend models conditional on customer heterogeneity, but replaces the restrictive parametric mixing distribution with a flexible latent representation learned by encoder-decoder networks. The resulting approach (i) provides a single model for customer attrition, transactions and spending, (ii) remains reliable when contextual covariates are unavailable, and (iii) flexibly incorporates rich covariates and nonlinear effects when they are available. This design balances structural stability with the flexibility needed to capture complex purchase dynamics. Across multiple real-world datasets and prediction horizons, the proposed model improves upon the latest benchmarks. Businesses benefit directly, as a better assessment of customers' future revenues improves the efficiency of campaign targeting. For research, this work provides guidance on how to embed domain-specific models into the variational autoencoder framework, enabling flexible representation learning while retaining an econometrically meaningful process structure.