Meituan Merchant Business Diagnosis via Policy-Guided Dual-Process User Simulation
Policy-Guided Hybrid Simulation (PGHS) achieves 8.80% group simulation error on Meituan, improving over the best reasoning-based and fitting-based baselines by 45.8% and 40.9%, respectively.
Key Findings
Methodology
The paper introduces a dual-process framework called Policy-Guided Hybrid Simulation (PGHS) for counterfactual evaluation of merchant strategies. The framework mines transferable decision policies from behavioral trajectories to form a shared alignment layer. This layer anchors an LLM-based reasoning branch to prevent over-rationalization and an ML-based fitting branch to absorb implicit regularities. Group-level predictions from both branches are fused for complementary correction.
Key Results
- PGHS was deployed on the Meituan platform with 101 merchants and over 26,000 trajectories, achieving a group simulation error of 8.80%, improving by 45.8% and 40.9% over the best reasoning-based and fitting-based baselines, respectively.
- In experiments, PGHS outperformed other baseline models across different merchant categories and traffic tiers, especially in data-sparse tail merchants.
- Ablation studies showed that policy guidance and dual-process fusion reduced simulation error by 15.4% for the LLM branch and 9.7% for the ML branch, respectively.
Significance
The study addresses two structural challenges in the counterfactual evaluation of merchant strategies: information incompleteness and mechanism duality. With its policy-guided dual-process framework, PGHS simulates group-level user behavior accurately without costly online experiments. This makes merchant strategy evaluation more efficient and may inform other fields that require group behavior prediction.
Technical Contribution
The technical contribution of PGHS is the combination of two simulation paradigms in a single dual-process framework. By integrating LLM reasoning and ML fitting through a shared policy alignment layer, it overcomes the limitation that no single paradigm captures both interpretable preferences and implicit statistical regularities. PGHS is also reported to handle long-tail data sparsity and semantic shifts well, which makes it a more robust option for merchant diagnosis.
Novelty
PGHS is presented as the first application of a policy-guided dual-process framework to user behavior simulation, combining the strengths of LLMs and ML. Compared to traditional methods, it captures explicit decision logic and also absorbs implicit environmental regularities, giving a more complete view for merchant strategy evaluation.
Limitations
- PGHS may face instability in predictions when dealing with extreme tail merchants due to data sparsity.
- The complexity of the model increases computational costs, which may pose challenges for real-time applications.
- During the policy mining phase, the diversity of user groups may limit the generalization ability of the policy alignment layer.
Future Work
Future research directions include developing adaptive fusion mechanisms to better handle diverse data across different scenarios, studying temporal policy dynamics to enhance model timeliness, and validating PGHS on public e-commerce benchmarks to expand its applicability and impact.
AI Executive Summary
In modern e-commerce platforms, merchants need to constantly adjust their strategies to improve user conversion rates. However, traditional online experiments like A/B testing are costly and risky. As a result, user behavior simulation has become a viable alternative. Existing simulation methods face issues of information incompleteness and mechanism duality, making it difficult to capture both explicit user preferences and implicit regularities.
This paper introduces a new framework called Policy-Guided Hybrid Simulation (PGHS), which extracts transferable decision policies from user behavior trajectories to form a shared alignment layer. This layer anchors an LLM-based reasoning branch and an ML-based fitting branch. The reasoning branch uses policy text to constrain LLM inference, preventing over-rationalization, while the fitting branch uses policy vectors to absorb implicit environmental regularities.
PGHS was deployed on the Meituan platform, involving 101 merchants and over 26,000 user trajectories. Experimental results show that PGHS achieved a group simulation error of 8.80%, improving by 45.8% and 40.9% over the best reasoning-based and fitting-based baselines, respectively. These gains suggest that PGHS is a practical tool for merchant strategy evaluation.
The dual-process framework of PGHS not only improves the accuracy of merchant strategy evaluation but also provides new perspectives for user behavior simulation. Its successful application may positively impact other fields requiring group behavior prediction, such as recommendation systems and market analysis.
Despite its outstanding performance in experiments, PGHS's complexity increases computational costs, which may pose challenges for real-time applications. Additionally, the model may face instability in predictions when dealing with extreme tail merchants due to data sparsity. Future research directions include developing adaptive fusion mechanisms and studying temporal policy dynamics to enhance model timeliness and applicability.
Deep Analysis
Background
With the rapid development of e-commerce platforms, merchants need to constantly optimize their operational strategies to improve user conversion rates. Traditional online experiments like A/B testing are costly and carry significant risks, such as degraded user experience and financial loss. Moreover, modern strategies involve high-dimensional semantic variables like promotional copywriting and visual layouts, creating a combinatorial space that physical experiments cannot cover. As a result, simulation-based counterfactual evaluation has become a viable alternative. In recent years, user behavior simulation has evolved from rule-based and statistical approaches to LLM-driven agents. However, these agents often behave as hyper-rational decision-makers and fail to replicate the bounded rationality of real users.
Core Problem
Building a trustworthy group-level user behavior simulator for merchant strategy evaluation faces two structural challenges. First, information incompleteness causes reasoning-based simulators to over-rationalize when unobserved factors such as offline context and implicit habits are missing. Second, mechanism duality requires capturing both interpretable preferences and implicit statistical regularities, which no single paradigm achieves alone. Despite their complementary strengths, effectively bridging the reasoning and fitting paradigms remains an unexplored challenge.
Innovation
The policy-guided dual-process framework of PGHS rests on three core innovations:
1. Policy Mining: Extracts transferable decision policies from user behavior trajectories to form a shared alignment layer.
2. Dual-Process Simulation: Integrates LLM reasoning and ML fitting through the shared policy alignment layer, overcoming the limitations of single paradigms that cannot simultaneously capture interpretable preferences and implicit statistical regularities.
3. Group-Level Fusion: Fuses group-level predictions from both branches through the shared policy space for complementary correction.
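To make the idea of a shared alignment layer more concrete, here is a minimal sketch of how one mined policy might be exposed in two forms: as text for the LLM branch and as a vector for the ML branch. The `DecisionPolicy` schema, the prompt wording, the helper names, and the example values are all assumptions made for illustration; the source does not specify any of them.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class DecisionPolicy:
    """One mined policy in two forms (hypothetical schema, not from the paper)."""
    name: str
    text: str                # natural-language rule consumed by the LLM branch
    vector: Sequence[float]  # numeric encoding consumed by the ML branch


def build_llm_prompt(policy: DecisionPolicy, scene: str) -> str:
    """Reasoning branch: constrain the LLM with the policy text."""
    return (
        f"You are a user who follows this decision policy: {policy.text}\n"
        f"Scene: {scene}\n"
        "Which merchant do you choose? Answer with one merchant id."
    )


def build_ml_features(policy: DecisionPolicy,
                      scene_features: Sequence[float]) -> List[float]:
    """Fitting branch: concatenate scene features with the policy vector."""
    return list(scene_features) + list(policy.vector)


# Illustrative policy; the rule and its encoding are made up for this sketch.
price_first = DecisionPolicy(
    name="price_first",
    text="Pick the cheapest option unless its rating is clearly poor.",
    vector=[1.0, 0.0, 0.0],
)

prompt = build_llm_prompt(price_first, "Three noodle shops, one with a discount.")
features = build_ml_features(price_first, [0.2, 0.7])
```

Because both branches read the same policy object, their group-level predictions are defined over the same policy space, which is what would make the later fusion step possible.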
Methodology
PGHS operates in three phases:
- Decision Policy Abstraction: Distills noisy online behavioral logs into a concise set of transferable decision policies that serve as explicit proxies for the latent mechanism.
- Dual-Process Simulation: Deploys two parallel branches anchored by the shared policy representations: one captures explicit decision logic via an LLM, and the other captures implicit regularities via a supervised model.
- Group-Level Aggregation: Estimates the policy distribution for each scene using its historical visitor log, conducts Monte Carlo simulation, and fuses predictions from both branches.
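The aggregation phase can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: the two branch functions are stand-ins (a real system would call an LLM and a fitted model), the policy names are invented, and the fusion rule is a plain weighted average chosen only because the source does not describe the actual fusion mechanism.

```python
import random
from collections import Counter
from typing import Callable, Dict, List

# A branch maps (policy name, RNG) to a chosen merchant id (hypothetical interface).
ChoiceFn = Callable[[str, random.Random], str]


def estimate_policy_distribution(visitor_policies: List[str]) -> Dict[str, float]:
    """Empirical share of each policy among a scene's historical visitors."""
    counts = Counter(visitor_policies)
    total = sum(counts.values())
    return {policy: n / total for policy, n in counts.items()}


def monte_carlo_choice_rates(policy_dist: Dict[str, float], choose: ChoiceFn,
                             merchants: List[str], n_samples: int,
                             seed: int = 0) -> Dict[str, float]:
    """Sample simulated users by policy and tally which merchant each picks."""
    rng = random.Random(seed)
    policies = list(policy_dist)
    weights = [policy_dist[p] for p in policies]
    counts: Counter = Counter()
    for _ in range(n_samples):
        policy = rng.choices(policies, weights=weights, k=1)[0]
        counts[choose(policy, rng)] += 1
    return {m: counts[m] / n_samples for m in merchants}


def fuse(llm_rates: Dict[str, float], ml_rates: Dict[str, float],
         llm_weight: float = 0.5) -> Dict[str, float]:
    """Assumed fusion rule: fixed-weight average of the two branches."""
    return {m: llm_weight * llm_rates[m] + (1.0 - llm_weight) * ml_rates[m]
            for m in llm_rates}


def llm_branch(policy: str, rng: random.Random) -> str:
    # Stand-in for an LLM call: deterministic choice per policy.
    return "A" if policy == "price_first" else "B"


def ml_branch(policy: str, rng: random.Random) -> str:
    # Stand-in for a fitted model: samples from a per-policy probability.
    p_a = 0.8 if policy == "price_first" else 0.3
    return "A" if rng.random() < p_a else "B"


merchants = ["A", "B"]
visitor_log = ["price_first", "price_first", "price_first", "rating_first"]
policy_dist = estimate_policy_distribution(visitor_log)
llm_rates = monte_carlo_choice_rates(policy_dist, llm_branch, merchants, 5000)
ml_rates = monte_carlo_choice_rates(policy_dist, ml_branch, merchants, 5000)
fused = fuse(llm_rates, ml_rates)
```

Sampling users by policy rather than replaying individual visitors is what lets the simulator evaluate a counterfactual scene: the policy mix comes from history, while the choices come from the two branches.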
Experiments
Experiments were conducted on the Meituan platform, covering 101 merchants and 26,461 user decision trajectories across five culinary categories and three traffic tiers. Each trajectory records a complete search-to-purchase sequence, and user and merchant profiles are encoded into dense embeddings via a pre-trained encoder. The fitting-based baselines listed are Logistic Regression, XGBoost, Gradient Boosting, and a 3-layer DNN. The evaluation metric is Group Simulation Error (GSE), defined as the mean absolute error between predicted and ground-truth per-merchant choice rates.
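As an illustration of the GSE definition, the sketch below computes the mean absolute error over per-merchant choice rates. The function name, the dictionary layout, and the example rates are assumptions made for this example; the numbers are invented and are not taken from the experiments.

```python
from typing import Dict


def group_simulation_error(predicted: Dict[str, float],
                           observed: Dict[str, float]) -> float:
    """Mean absolute error between predicted and observed per-merchant choice rates."""
    if predicted.keys() != observed.keys():
        raise ValueError("predicted and observed must cover the same merchants")
    errors = [abs(predicted[m] - observed[m]) for m in observed]
    return sum(errors) / len(errors)


# Made-up choice rates for three merchants.
predicted = {"A": 0.30, "B": 0.50, "C": 0.20}
observed = {"A": 0.25, "B": 0.60, "C": 0.15}
gse = group_simulation_error(predicted, observed)  # (0.05 + 0.10 + 0.05) / 3
```

Because the metric averages absolute errors per merchant, a large miss on one merchant cannot be cancelled out by an opposite miss on another.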
Results
Experimental results show that PGHS outperformed other baseline models across different merchant categories and traffic tiers, especially in data-sparse tail merchants. PGHS achieved a group simulation error of 8.80% with a standard deviation of 6.62%, representing relative improvements of 45.8% over the best LLM baseline and 40.9% over the best ML baseline. Ablation studies showed that policy guidance and dual-process fusion reduced simulation error by 15.4% for the LLM branch and 9.7% for the ML branch, respectively.
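The relative improvements can be turned back into approximate baseline errors with a short calculation. This assumes that relative improvement is defined as (baseline error − PGHS error) / baseline error, which the summary does not state explicitly, so the resulting values are back-calculated estimates and not figures reported in the source.

```python
def implied_baseline_error(model_error: float, relative_improvement: float) -> float:
    """Invert improvement = (baseline - model) / baseline to recover the baseline."""
    return model_error / (1.0 - relative_improvement)


pghs_gse = 8.80  # percent, as reported

# Assumed definition of relative improvement; see the note above.
best_llm_gse = implied_baseline_error(pghs_gse, 0.458)
best_ml_gse = implied_baseline_error(pghs_gse, 0.409)

print(f"implied best LLM baseline GSE: {best_llm_gse:.2f}%")
print(f"implied best ML baseline GSE:  {best_ml_gse:.2f}%")
```

Under that assumption, the best reasoning-based and fitting-based baselines would sit at roughly 16.2% and 14.9% GSE, respectively.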
Applications
Application scenarios for PGHS in merchant strategy evaluation include:
- Optimizing pricing, menu structure, and promotional page design by simulating user behavior under different strategies.
- Evaluating the potential impact of new strategies without conducting costly online experiments, reducing risk.
- Providing new perspectives for group behavior prediction in other fields, such as recommendation systems and market analysis.
Limitations & Outlook
Despite its outstanding performance in experiments, PGHS's complexity increases computational costs, which may pose challenges for real-time applications. Additionally, the model may face instability in predictions when dealing with extreme tail merchants due to data sparsity. During the policy mining phase, the diversity of user groups may limit the generalization ability of the policy alignment layer. Future research directions include developing adaptive fusion mechanisms and studying temporal policy dynamics to enhance model timeliness and applicability.
Plain Language: Accessible to non-experts
Imagine you manage a large shopping mall where each store tries to attract customers with a different strategy, such as discounts, gifts, or a store redesign. You need to decide which strategy is the most effective, but you can't let every store try its strategy for real, because that might hurt the customer experience.
This is where a tool called PGHS comes in. PGHS is like a virtual shopping mall that can simulate customer behavior under different strategies. It analyzes past shopping data to find patterns in how customers make decisions and uses these patterns to predict customer choices under different strategies.
PGHS has two parts: one part acts like a smart customer, making rational choices by analyzing store strategies; the other part acts like an observer, recording customers' implicit preferences, such as which store style they prefer. By combining these two parts, PGHS can more accurately predict customer behavior, helping the mall manager make better decisions.
In short, PGHS is like a smart assistant that helps the mall manager find the most effective store strategy without affecting the customer experience.
ELI14: Explained like you're 14
Hey there! Today I'm going to tell you about something super cool called PGHS. Imagine you're playing a simulation game where you manage a shopping mall with lots of stores. Each store has its own strategy, like giving discounts, offering gifts, or redecorating. Your job is to find the best strategy to make customers happy!
But you can't just let every store try their strategy because it might make customers unhappy. That's where PGHS comes in! It's like a virtual shopping mall that can simulate customer behavior under different strategies. It analyzes past data to find patterns in how customers make choices and uses these patterns to predict what customers would do under different strategies.
PGHS has two smart parts: one part is like a super rational customer that analyzes store strategies to make the best choice; the other part is like an observer that records customers' hidden preferences, like which store style they like more. By combining these two parts, PGHS can predict customer behavior more accurately, helping you make better decisions.
So, PGHS is like your super assistant, helping you find the best store strategy in the game and making your shopping mall awesome!
Glossary
Policy-Guided Hybrid Simulation (PGHS)
A dual-process framework combining large language models and machine learning for simulating user behavior and evaluating merchant strategies.
Used to address information incompleteness and mechanism duality in merchant strategy evaluation.
Large Language Model (LLM)
A deep learning-based model capable of understanding and generating natural language text.
Used in PGHS for the reasoning branch to prevent over-rationalization.
Machine Learning (ML)
A technology that trains models on data to make predictions and decisions.
Used in PGHS for the fitting branch to absorb implicit regularities.
Group Simulation Error (GSE)
The mean absolute error between predicted and ground-truth per-merchant choice rates.
Used to evaluate PGHS's performance across different merchant categories and traffic tiers.
Decision Policy
Transferable decision rules extracted from user behavior trajectories, serving as explicit proxies for latent mechanisms.
Guides the LLM and ML branches in PGHS.
Information Incompleteness
Insufficient decision information due to unobserved factors like offline context and implicit habits.
One of the structural challenges PGHS addresses.
Mechanism Duality
The need to capture both interpretable preferences and implicit statistical regularities.
One of the structural challenges PGHS addresses.
Monte Carlo Simulation
A method of numerical simulation through random sampling to estimate the behavior of complex systems.
Used in PGHS for group-level aggregation.
Tail Merchants
Merchants with fewer interaction records in the dataset.
PGHS outperforms other baseline models when handling these merchants.
Ablation Study
Evaluating the impact of removing or modifying model components on overall performance.
Used to analyze the contributions of policy guidance and dual-process fusion in PGHS.
Open Questions: Unanswered questions from this research
1. How can PGHS improve prediction stability for extreme tail merchants? The current model may be unstable under data sparsity, requiring new strategies to enhance its robustness.
2. How can PGHS reduce computational costs to support real-time applications? The complexity of the model increases computational costs, necessitating algorithm optimization or hardware acceleration to improve efficiency.
3. How can the generalization ability of the policy alignment layer be improved during the policy mining phase? The diversity of user groups may limit the generalization ability of the policy alignment layer, requiring exploration of new policy mining methods.
4. How can PGHS's applicability be validated on public e-commerce benchmarks? Current research focuses primarily on the Meituan platform, requiring validation on other platforms to expand its applicability.
5. How can adaptive fusion mechanisms be developed to handle diverse data across different scenarios? The current fusion mechanism may perform poorly in specific scenarios, requiring exploration of new adaptive strategies.
Applications
Immediate Applications
Merchant Strategy Optimization
Simulating user behavior under different strategies to help merchants optimize pricing, menu structure, and promotional page design.
Risk Assessment
Evaluating the potential impact of new strategies without conducting costly online experiments, reducing risk.
Market Analysis
Providing new perspectives for group behavior prediction in other fields, such as recommendation systems and market analysis.
Long-term Vision
Cross-Platform Application
Applying PGHS to other e-commerce platforms to expand its applicability and impact.
Real-Time Decision Support
Achieving real-time merchant strategy evaluation and optimization by reducing computational costs and improving efficiency.
Abstract
Simulating group-level user behavior enables scalable counterfactual evaluation of merchant strategies without costly online experiments. However, building a trustworthy simulator faces two structural challenges. First, information incompleteness causes reasoning-based simulators to over-rationalize when unobserved factors such as offline context and implicit habits are missing. Second, mechanism duality requires capturing both interpretable preferences and implicit statistical regularities, which no single paradigm achieves alone. We propose Policy-Guided Hybrid Simulation (PGHS), a dual-process framework that mines transferable decision policies from behavioral trajectories and uses them as a shared alignment layer. This layer anchors an LLM-based reasoning branch that prevents over-rationalization and an ML-based fitting branch that absorbs implicit regularities. Group-level predictions from both branches are fused for complementary correction. We deploy PGHS on Meituan with 101 merchants and over 26,000 trajectories. PGHS achieves a group simulation error of 8.80%, improving over the best reasoning-based and fitting-based baselines by 45.8% and 40.9% respectively.