Meituan Merchant Business Diagnosis via Policy-Guided Dual-Process User Simulation

TL;DR

Policy-Guided Hybrid Simulation (PGHS) achieves an 8.80% group simulation error on Meituan, improving over the best reasoning-based and fitting-based baselines by 45.8% and 40.9%, respectively.

cs.AI | Advanced | 2026-04-17

Ziyang Chen, Renbing Chen, Daowei Li, Jinzhi Liao, Jiashen Sun, Ke Zeng, Xiang Zhao

User Behavior Simulation, Policy-Guided, Large Language Models, Machine Learning, Merchant Diagnosis

Key Findings

Methodology

The paper introduces a dual-process framework called Policy-Guided Hybrid Simulation (PGHS) for counterfactual evaluation of merchant strategies. The framework mines transferable decision policies from behavioral trajectories to form a shared alignment layer. This layer anchors an LLM-based reasoning branch to prevent over-rationalization and an ML-based fitting branch to absorb implicit regularities. Group-level predictions from both branches are fused for complementary correction.

Key Results

PGHS was deployed on the Meituan platform with 101 merchants and over 26,000 trajectories, achieving a group simulation error of 8.80%, improving by 45.8% and 40.9% over the best reasoning-based and fitting-based baselines, respectively.
In experiments, PGHS outperformed the baseline models across merchant categories and traffic tiers, especially on data-sparse tail merchants.
Ablation studies showed that policy guidance and dual-process fusion reduced simulation error by 15.4% for the LLM branch and 9.7% for the ML branch, respectively.

Significance

The study addresses two structural challenges in counterfactual evaluation of merchant strategies: information incompleteness and mechanism duality. Its policy-guided dual-process framework enables accurate simulation of group-level user behavior without costly online experiments. This makes merchant strategy evaluation more efficient and offers an approach that may carry over to other fields requiring group behavior prediction.

Technical Contribution

The technical contribution of PGHS is a dual-process design that integrates LLM reasoning and ML fitting through a shared policy alignment layer. This overcomes the limitation of single paradigms, which cannot simultaneously capture interpretable preferences and implicit statistical regularities. PGHS also handles long-tail data sparsity and semantic shifts well, offering a more robust solution for merchant diagnosis.

Novelty

PGHS is presented as the first application of a policy-guided dual-process framework to user behavior simulation, combining the strengths of LLMs and ML. Compared with traditional methods, PGHS captures explicit decision logic and also absorbs implicit environmental regularities, giving a more complete basis for merchant strategy evaluation.

Limitations

PGHS may face instability in predictions when dealing with extreme tail merchants due to data sparsity.
The complexity of the model increases computational costs, which may pose challenges for real-time applications.
During the policy mining phase, the diversity of user groups may limit the generalization ability of the policy alignment layer.

Future Work

Future research directions include developing adaptive fusion mechanisms to better handle diverse data across different scenarios, studying temporal policy dynamics to enhance model timeliness, and validating PGHS on public e-commerce benchmarks to expand its applicability and impact.

AI Executive Summary

In modern e-commerce platforms, merchants need to constantly adjust their strategies to improve user conversion rates. However, traditional online experiments like A/B testing are costly and risky. As a result, user behavior simulation has become a viable alternative. Existing simulation methods face issues of information incompleteness and mechanism duality, making it difficult to capture both explicit user preferences and implicit regularities.

This paper introduces a new framework called Policy-Guided Hybrid Simulation (PGHS), which extracts transferable decision policies from user behavior trajectories to form a shared alignment layer. This layer anchors an LLM-based reasoning branch and an ML-based fitting branch. The reasoning branch uses policy text to constrain LLM inference, preventing over-rationalization, while the fitting branch uses policy vectors to absorb implicit environmental regularities.

PGHS was deployed on the Meituan platform, involving 101 merchants and over 26,000 user trajectories. Experimental results show that PGHS achieved a group simulation error of 8.80%, improving by 45.8% and 40.9% over the best reasoning-based and fitting-based baselines, respectively. The gain suggests that PGHS can serve as an offline alternative to online experiments for merchant strategy evaluation.

The dual-process framework of PGHS not only improves the accuracy of merchant strategy evaluation but also provides new perspectives for user behavior simulation. Its successful application may positively impact other fields requiring group behavior prediction, such as recommendation systems and market analysis.

Despite its outstanding performance in experiments, PGHS's complexity increases computational costs, which may pose challenges for real-time applications. Additionally, the model may face instability in predictions when dealing with extreme tail merchants due to data sparsity. Future research directions include developing adaptive fusion mechanisms and studying temporal policy dynamics to enhance model timeliness and applicability.

Deep Analysis

Background

With the rapid development of e-commerce platforms, merchants need to constantly optimize their operational strategies to improve user conversion rates. Traditional online experiments like A/B testing are costly and carry significant risks, such as degraded user experience and financial loss. Moreover, modern strategies involve high-dimensional semantic variables like promotional copywriting and visual layouts, creating a combinatorial space that physical experiments cannot cover. As a result, simulation-based counterfactual evaluation has become a viable alternative. In recent years, user behavior simulation has evolved from rule-based and statistical approaches to LLM-driven agents. However, these agents often behave as hyper-rational decision-makers that fail to replicate bounded rationality.

Core Problem

Building a trustworthy group-level user behavior simulator for merchant strategy evaluation faces two structural challenges. First, information incompleteness causes reasoning-based simulators to over-rationalize when unobserved factors such as offline context and implicit habits are missing. Second, mechanism duality requires capturing both interpretable preferences and implicit statistical regularities, which no single paradigm achieves alone. Despite their complementary strengths, effectively bridging the reasoning and fitting paradigms remains an unexplored challenge.

Innovation

The core innovations of PGHS include its policy-guided dual-process framework:

1. Policy Mining: Extracts transferable decision policies from user behavior trajectories to form a shared alignment layer.

2. Dual-Process Simulation: Integrates LLM reasoning and ML fitting through the shared policy alignment layer, overcoming the limitations of single paradigms that cannot simultaneously capture interpretable preferences and implicit statistical regularities.

3. Group-Level Fusion: Fuses group-level predictions from both branches through the shared policy space for complementary correction.
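The shared alignment layer can be pictured as each mined policy carrying two views: a text form that the reasoning branch reads and a vector form that the fitting branch reads. The following is a minimal sketch of that idea only. The class name, field names, prompt wording, and feature layout are illustrative assumptions and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DecisionPolicy:
    """One mined policy exposed in two views (all names are hypothetical)."""
    policy_id: int
    text: str            # natural-language rule, read by the LLM branch
    vector: List[float]  # numeric encoding, read by the ML branch


def build_llm_prompt(policy: DecisionPolicy, scene: str) -> str:
    """Reasoning branch: the policy text constrains what the LLM may assume."""
    return (
        f"You are a user who follows this decision policy: {policy.text}\n"
        f"Scene: {scene}\n"
        "Answer with the merchant you would choose."
    )


def build_ml_features(policy: DecisionPolicy, scene_features: List[float]) -> List[float]:
    """Fitting branch: scene features concatenated with the policy vector."""
    return list(scene_features) + list(policy.vector)


# Made-up example policy and scene, for illustration only.
policy = DecisionPolicy(0, "Prefers the cheapest option nearby.", [1.0, 0.0, 0.3])
prompt = build_llm_prompt(policy, "Three noodle shops, one with a discount banner.")
features = build_ml_features(policy, [0.5, 0.2])
```

Because both branches consume the same policy object, their predictions are indexed by the same policy space, which is what makes group-level fusion possible later on.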

Methodology

PGHS operates in three phases:

1. Decision Policy Abstraction: Distills noisy online behavioral logs into a concise set of transferable decision policies that serve as explicit proxies for the latent mechanism.
2. Dual-Process Simulation: Deploys two parallel branches anchored by the shared policy representations: one captures explicit decision logic via an LLM, and the other captures implicit regularities via a supervised model.
3. Group-Level Aggregation: Estimates the policy distribution for each scene using its historical visitor log, conducts Monte Carlo simulation, and fuses predictions from both branches.
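The group-level aggregation phase can be sketched roughly as follows. This is an illustration under stated assumptions, not the paper's implementation: the policy distribution is estimated by simple frequency counting over a visitor log, each branch is stood in for by a lookup table of per-policy choice probabilities, and fusion is a fixed-weight average. The summary does not specify any of these details, and all function names are hypothetical.

```python
import random
from collections import Counter
from typing import Callable, Dict, List


def estimate_policy_distribution(visitor_policy_ids: List[int]) -> Dict[int, float]:
    """Empirical share of each policy among a scene's historical visitors."""
    counts = Counter(visitor_policy_ids)
    total = sum(counts.values())
    return {pid: c / total for pid, c in counts.items()}


def monte_carlo_choice_rate(
    policy_dist: Dict[int, float],
    choose_prob: Callable[[int], float],
    n_samples: int = 10_000,
    seed: int = 0,
) -> float:
    """Sample simulated visitors by policy and count how many choose the merchant."""
    rng = random.Random(seed)
    ids = list(policy_dist)
    weights = [policy_dist[i] for i in ids]
    hits = 0
    for pid in rng.choices(ids, weights=weights, k=n_samples):
        if rng.random() < choose_prob(pid):
            hits += 1
    return hits / n_samples


def fuse(llm_rate: float, ml_rate: float, weight: float = 0.5) -> float:
    """Assumed fusion rule: convex combination of the two branch predictions."""
    return weight * llm_rate + (1.0 - weight) * ml_rate


# Stand-in branch outputs per policy id (made-up numbers).
llm_probs = {0: 0.2, 1: 0.6, 2: 0.4}
ml_probs = {0: 0.3, 1: 0.5, 2: 0.4}

dist = estimate_policy_distribution([0, 0, 1, 2])
llm_rate = monte_carlo_choice_rate(dist, llm_probs.__getitem__)
ml_rate = monte_carlo_choice_rate(dist, ml_probs.__getitem__, seed=1)
fused_rate = fuse(llm_rate, ml_rate)
```

With the toy numbers above, the exact expected choice rate for the LLM stand-in is 0.5 * 0.2 + 0.25 * 0.6 + 0.25 * 0.4 = 0.35, so the Monte Carlo estimate should land close to that value.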

Experiments

Experiments were conducted on the Meituan platform, covering 101 merchants and 26,461 user decision trajectories across five culinary categories and three traffic tiers. Each trajectory records a complete search-to-purchase sequence, and user and merchant profiles are encoded into dense embeddings via a pre-trained encoder. Baseline models include Logistic Regression, XGBoost, Gradient Boosting, and a 3-layer DNN. The evaluation metric is Group Simulation Error (GSE), defined as the mean absolute error between predicted and ground-truth per-merchant choice rates.
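The GSE definition above translates directly into a few lines of code. The sketch below assumes choice rates are given as fractions keyed by merchant id. The function name and the sample values are illustrative and do not come from the paper.

```python
from typing import Dict


def group_simulation_error(predicted: Dict[str, float], actual: Dict[str, float]) -> float:
    """Mean absolute error between predicted and ground-truth per-merchant choice rates."""
    if not predicted:
        raise ValueError("at least one merchant is required")
    if predicted.keys() != actual.keys():
        raise ValueError("predicted and actual must cover the same merchants")
    return sum(abs(predicted[m] - actual[m]) for m in predicted) / len(predicted)


# Made-up choice rates for three merchants.
predicted = {"m1": 0.30, "m2": 0.10, "m3": 0.50}
actual = {"m1": 0.25, "m2": 0.20, "m3": 0.50}
gse = group_simulation_error(predicted, actual)  # (0.05 + 0.10 + 0.00) / 3, about 0.05
```

Because the error is averaged per merchant rather than per trajectory, a tail merchant with few visits counts as much as a head merchant in this metric.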

Results

Experimental results show that PGHS outperformed the baseline models across merchant categories and traffic tiers, especially on data-sparse tail merchants. PGHS achieved a group simulation error of 8.80% with a standard deviation of 6.62%, representing relative improvements of 45.8% over the best LLM baseline and 40.9% over the best ML baseline. Ablation studies showed that policy guidance and dual-process fusion reduced simulation error by 15.4% for the LLM branch and 9.7% for the ML branch, respectively.
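The baseline error values themselves are not stated in this summary. As a back-of-the-envelope check, they can be inferred if relative improvement is assumed to mean (baseline error - PGHS error) / baseline error. That definition is an assumption, and the figures below are derived, not reported.

```python
def implied_baseline_error(model_error: float, relative_improvement: float) -> float:
    """Invert improvement = (baseline - model) / baseline to recover the baseline."""
    return model_error / (1.0 - relative_improvement)


pghs_gse = 8.80  # percent, as reported

best_llm_baseline = implied_baseline_error(pghs_gse, 0.458)  # roughly 16.24 percent
best_ml_baseline = implied_baseline_error(pghs_gse, 0.409)   # roughly 14.89 percent
```

Under that assumption, the best reasoning-based baseline would sit near 16.2% GSE and the best fitting-based baseline near 14.9%.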

Applications

Application scenarios for PGHS in merchant strategy evaluation include:

1. Optimizing pricing, menu structure, and promotional page design by simulating user behavior under different strategies.
2. Evaluating the potential impact of new strategies without conducting costly online experiments, reducing risk.
3. Providing new perspectives for group behavior prediction in other fields, such as recommendation systems and market analysis.

Limitations & Outlook

Despite its outstanding performance in experiments, PGHS's complexity increases computational costs, which may pose challenges for real-time applications. Additionally, the model may face instability in predictions when dealing with extreme tail merchants due to data sparsity. During the policy mining phase, the diversity of user groups may limit the generalization ability of the policy alignment layer. Future research directions include developing adaptive fusion mechanisms and studying temporal policy dynamics to enhance model timeliness and applicability.

Plain Language (accessible to non-experts)

Imagine you manage a large shopping mall where each store tries to attract customers with different strategies, such as discounts, gifts, or store redesigns. You need to decide which strategy is the most effective, but you can't let every store try out its strategy on real customers because that might hurt the customer experience.

This is where a tool called PGHS comes in. PGHS is like a virtual shopping mall that can simulate customer behavior under different strategies. It analyzes past shopping data to find patterns in how customers make decisions and uses these patterns to predict customer choices under different strategies.

PGHS has two parts: one part acts like a smart customer, making rational choices by analyzing store strategies; the other part acts like an observer, recording customers' implicit preferences, such as which store style they prefer. By combining these two parts, PGHS can more accurately predict customer behavior, helping the mall manager make better decisions.

In short, PGHS is like a smart assistant that helps the mall manager find the most effective store strategy without affecting the customer experience.

ELI14 (explained like you're 14)

Hey there! Today I'm going to tell you about something super cool called PGHS. Imagine you're playing a simulation game where you manage a shopping mall with lots of stores. Each store has its own strategy, like giving discounts, offering gifts, or redecorating. Your job is to find the best strategy to make customers happy!

But you can't just let every store try their strategy because it might make customers unhappy. That's where PGHS comes in! It's like a virtual shopping mall that can simulate customer behavior under different strategies. It analyzes past data to find patterns in how customers make choices and uses these patterns to predict what customers would do under different strategies.

PGHS has two smart parts: one part is like a super rational customer that analyzes store strategies to make the best choice; the other part is like an observer that records customers' hidden preferences, like which store style they like more. By combining these two parts, PGHS can predict customer behavior more accurately, helping you make better decisions.

So, PGHS is like your super assistant, helping you find the best store strategy in the game and making your shopping mall awesome!

Glossary

Policy-Guided Hybrid Simulation (PGHS)

A dual-process framework combining large language models and machine learning for simulating user behavior and evaluating merchant strategies.

Used to address information incompleteness and mechanism duality in merchant strategy evaluation.

Large Language Model (LLM)

A deep learning-based model capable of understanding and generating natural language text.

Used in PGHS as the reasoning branch, where policy text constrains inference to prevent over-rationalization.

Machine Learning (ML)

A technology that trains models on data to make predictions and decisions.

Used in PGHS for the fitting branch to absorb implicit regularities.

Group Simulation Error (GSE)

The mean absolute error between predicted and ground-truth per-merchant choice rates.

Used to evaluate PGHS's performance across different merchant categories and traffic tiers.

Decision Policy

Transferable decision rules extracted from user behavior trajectories, serving as explicit proxies for latent mechanisms.

Guides the LLM and ML branches in PGHS.

Information Incompleteness

Insufficient decision information due to unobserved factors like offline context and implicit habits.

One of the structural challenges PGHS addresses.

Mechanism Duality

The need to capture both interpretable preferences and implicit statistical regularities.

One of the structural challenges PGHS addresses.

Monte Carlo Simulation

A method of numerical simulation through random sampling to estimate the behavior of complex systems.

Used in PGHS for group-level aggregation.

Tail Merchants

Merchants with fewer interaction records in the dataset.

PGHS outperforms the baseline models on these merchants.

Ablation Study

Evaluating the impact of removing or modifying model components on overall performance.

Used to analyze the contributions of policy guidance and dual-process fusion in PGHS.

Open Questions (unanswered questions from this research)

1. How can PGHS improve prediction stability for extreme tail merchants? The current model may be unstable under data sparsity, requiring new strategies to enhance its robustness.
2. How can PGHS reduce computational costs to support real-time applications? The complexity of the model increases computational costs, necessitating algorithm optimization or hardware acceleration to improve efficiency.
3. How can the generalization ability of the policy alignment layer be improved during the policy mining phase? The diversity of user groups may limit the generalization ability of the policy alignment layer, requiring exploration of new policy mining methods.
4. How can PGHS's applicability be validated on public e-commerce benchmarks? Current research focuses primarily on the Meituan platform, requiring validation on other platforms to expand its applicability.
5. How can adaptive fusion mechanisms be developed to handle diverse data across different scenarios? The current fusion mechanism may perform poorly in specific scenarios, requiring exploration of new adaptive strategies.

Applications

Immediate Applications

Merchant Strategy Optimization

Simulating user behavior under different strategies to help merchants optimize pricing, menu structure, and promotional page design.

Risk Assessment

Evaluating the potential impact of new strategies without conducting costly online experiments, reducing risk.

Market Analysis

Providing new perspectives for group behavior prediction in other fields, such as recommendation systems and market analysis.

Long-term Vision

Cross-Platform Application

Applying PGHS to other e-commerce platforms to expand its applicability and impact.

Real-Time Decision Support

Achieving real-time merchant strategy evaluation and optimization by reducing computational costs and improving efficiency.

Abstract

Simulating group-level user behavior enables scalable counterfactual evaluation of merchant strategies without costly online experiments. However, building a trustworthy simulator faces two structural challenges. First, information incompleteness causes reasoning-based simulators to over-rationalize when unobserved factors such as offline context and implicit habits are missing. Second, mechanism duality requires capturing both interpretable preferences and implicit statistical regularities, which no single paradigm achieves alone. We propose Policy-Guided Hybrid Simulation (PGHS), a dual-process framework that mines transferable decision policies from behavioral trajectories and uses them as a shared alignment layer. This layer anchors an LLM-based reasoning branch that prevents over-rationalization and an ML-based fitting branch that absorbs implicit regularities. Group-level predictions from both branches are fused for complementary correction. We deploy PGHS on Meituan with 101 merchants and over 26,000 trajectories. PGHS achieves a group simulation error of 8.80%, improving over the best reasoning-based and fitting-based baselines by 45.8% and 40.9% respectively.

cs.AI cs.CL
