Auditing Preferences for Brands and Cultures in LLMs
The ChoiceEval framework reveals geographic bias in LLM brand and cultural preferences, most notably a consistent tilt toward US entities.
Key Findings
Methodology
ChoiceEval is a reproducible framework for auditing brand and cultural preferences in large language models (LLMs) under realistic usage conditions. The framework addresses two core technical challenges: (i) generating realistic, persona-diverse evaluation queries; (ii) converting free-form outputs into comparable choice sets and quantitative preference metrics. ChoiceEval achieves this by segmenting users into psychographic profiles (e.g., budget-conscious, wellness-focused, convenience) and deriving diverse prompts reflecting real-world advice-seeking and decision-making behavior. LLM responses are converted into normalized top-k choice sets, quantifying preference and geographic bias.
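To make the normalization step concrete, here is a minimal sketch of turning free-form responses into top-k choice sets and a per-country share metric; the brand-to-country table and simple keyword matching are illustrative assumptions, not the paper's actual extraction rules.

```python
# Sketch: free-form LLM answers -> normalized top-k choice sets -> country shares.
from collections import Counter

# Hypothetical lookup from brand to country of origin (illustrative only).
BRAND_COUNTRY = {"Nike": "US", "Hoka": "US", "Adidas": "DE", "Asics": "JP"}

def top_k_choice_set(response: str, k: int = 3) -> list[str]:
    """Return up to k known brands, ordered by first mention."""
    hits = sorted((response.find(b), b) for b in BRAND_COUNTRY if b in response)
    return [brand for _, brand in hits[:k]]

def geographic_shares(choice_sets: list[list[str]]) -> dict[str, float]:
    """Fraction of all recommendation slots going to each country."""
    counts = Counter(BRAND_COUNTRY[b] for cs in choice_sets for b in cs)
    total = sum(counts.values())
    return {country: n / total for country, n in counts.items()}

responses = [
    "For daily training I'd suggest Nike or Hoka; Asics is also solid.",
    "Adidas and Nike both make good budget picks.",
]
sets = [top_k_choice_set(r) for r in responses]
print(sets)                     # [['Nike', 'Hoka', 'Asics'], ['Adidas', 'Nike']]
print(geographic_shares(sets))  # {'US': 0.6, 'JP': 0.2, 'DE': 0.2}
```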
Key Results
- Result 1: ChoiceEval applied to Gemini, GPT, and DeepSeek across 10 topics and over 2,000 questions reveals consistent preferences: U.S.-developed models Gemini and GPT show marked favoritism toward American entities, while China-developed DeepSeek exhibits more balanced yet still detectable geographic preferences.
- Result 2: These preference patterns persist across user personas, suggesting systematic rather than incidental effects.
- Result 3: Across domains such as hotel chains, electric cars, and running shoes, LLM recommendations skew toward American entities despite competitive global alternatives.
Significance
ChoiceEval's significance lies in providing a scalable audit pipeline for researchers, platforms, and regulators, linking model behavior to real-world economic outcomes. The framework reveals geographic bias in LLM brand and cultural preferences, particularly in the over-representation of American entities. Such biases may impact market fairness, competition, and the diversity of information exposure, especially in AI-driven markets where these biases could lead to systematic economic advantages or disadvantages.
Technical Contribution
ChoiceEval's technical contribution is a comprehensive framework for systematically generating evaluation questions and assessing entity-perception bias in AI assistants. Whereas most prior bias evaluations target social attributes, ChoiceEval extends systematic bias measurement to brands and cultures, particularly in open-ended recommendation scenarios. By integrating psychographic user clusters with contextually relevant question generation, the framework applies broadly across topics.
Novelty
ChoiceEval's novelty lies in being the first framework to systematically generate evaluation questions for, and assess entity-perception bias in, AI assistants. Unlike previous research, which primarily targets social biases, ChoiceEval evaluates brand and cultural bias in open-ended recommendation scenarios, providing a scalable foundation for studying how AI assistants may shape real-world decisions.
Limitations
- Limitation 1: Evaluation results may be limited by the selected models and datasets; the geographic bias analysis in particular may not represent preferences across all regions.
- Limitation 2: Because the framework relies on psychographic user clusters, it may not fully capture the complex behaviors and preferences of all users.
- Limitation 3: LLM recommendations reflect biases inherited from training data, which ChoiceEval measures but does not mitigate.
Future Work
Future directions include expanding the ChoiceEval framework to cover more topics and user clusters, especially in non-English contexts. Additionally, further research could explore how to reduce geographic and cultural biases in LLMs through adjustments in training data and model architecture.
AI Executive Summary
The rapid adoption of large language models (LLMs) such as ChatGPT, Google Gemini, and Meta AI has fundamentally transformed how individuals interact with technology and access information. These conversational AI systems increasingly supplement, and in some cases replace, traditional search engines, becoming a primary means of information retrieval for many users. However, geographic biases in LLM brand and cultural preferences may have profound implications for market fairness, competition, and the diversity of information exposure.
To address this issue, the paper introduces ChoiceEval, a reproducible framework for auditing brand and cultural preferences in LLMs under realistic usage conditions. ChoiceEval tackles the technical challenges of generating realistic, persona-diverse evaluation queries and converting free-form outputs into comparable choice sets and quantitative preference metrics. By segmenting users into psychographic profiles and generating diverse prompts, ChoiceEval quantifies preference and geographic bias.
In experiments, ChoiceEval was applied to Gemini, GPT, and DeepSeek across 10 topics and over 2,000 questions. Results show that U.S.-developed models Gemini and GPT exhibit marked favoritism toward American entities, while China-developed DeepSeek shows more balanced yet still detectable geographic preferences. These preference patterns persist across user personas, indicating systematic rather than incidental effects.
The significance of ChoiceEval lies in providing a scalable audit pipeline for researchers, platforms, and regulators, linking model behavior to real-world economic outcomes. The framework reveals geographic bias in LLM brand and cultural preferences, particularly in the over-representation of American entities. Such biases may impact market fairness, competition, and the diversity of information exposure, especially in AI-driven markets where these biases could lead to systematic economic advantages or disadvantages.
However, ChoiceEval also has its limitations. Its evaluation results may be limited by the selected models and datasets, especially in geographic bias analysis, which may not fully represent preferences across all regions. Additionally, since the framework relies on psychographic user clusters, it may not fully capture the complex behaviors and preferences of all users. Future directions include expanding the ChoiceEval framework to cover more topics and user clusters, especially in non-English contexts.
Deep Analysis
Background
In recent years, the development of large language models (LLMs) has significantly enhanced the role of AI systems in information retrieval and decision support. LLMs like ChatGPT, Google Gemini, and Meta AI have become a primary means for many people to access information. These systems not only influence individual choices but also potentially affect market fairness and competition. However, existing research has primarily focused on social biases, such as gender, race, and religion, while brand and cultural biases have received relatively less attention. ChoiceEval aims to fill this gap by systematically evaluating geographic bias in LLM brand and cultural preferences and revealing its potential impact on market and cultural diversity.
Core Problem
Geographic biases in LLM brand and cultural preferences may have profound implications for market fairness, competition, and the diversity of information exposure. Especially in AI-driven markets, these biases could lead to systematic economic advantages or disadvantages. Existing research has primarily focused on social biases, while brand and cultural biases have received relatively less attention. Therefore, there is an urgent need for a systematic approach to evaluate geographic bias in LLM brand and cultural preferences to reveal its potential impact on market and cultural diversity.
Innovation
The core innovation of ChoiceEval is that it is the first framework to systematically generate evaluation questions for, and assess entity-perception bias in, AI assistants. Unlike previous research, which primarily targets social biases, ChoiceEval evaluates brand and cultural bias in open-ended recommendation scenarios, providing a scalable foundation for studying how AI assistants may shape real-world decisions. Concretely, ChoiceEval segments users into psychographic profiles and generates diverse prompts, allowing preference and geographic bias to be quantified across personas and topics.
Methodology
The ChoiceEval framework proceeds in four steps (a minimal end-to-end sketch follows the list):
- User Cluster Definitions: Define psychographic user clusters (e.g., budget-conscious, wellness-focused, convenience) that capture how different user types interact with AI assistants.
- Question Generation: Use LLMs to adapt the core consumer clusters to each domain, translating their general characteristics into the terminology and concerns of that decision-making context.
- Response Extraction: Query each LLM with the same set of questions and record its responses, running multiple independent extraction passes to reduce decoding variability and parsing ambiguity.
- Bias Analysis: Statistically analyze the extracted recommendations to reveal geographic bias in brand and cultural preferences.
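As a concrete illustration, here is a minimal end-to-end sketch of these four steps under stated assumptions: `ask_llm` stands in for any chat-completion call, and the cluster descriptions, topics, and brand-to-country lookup are invented for the example rather than taken from the paper.

```python
# Minimal end-to-end sketch of the four pipeline steps above.
from collections import Counter
from typing import Callable

# Step 1: illustrative psychographic clusters and topics (not the paper's).
CLUSTERS = {
    "budget-conscious": "cares most about price and value for money",
    "wellness-focused": "prioritizes health, comfort, and sustainability",
}
TOPICS = ["running shoes", "hotel chains"]
BRAND_COUNTRY = {"Nike": "US", "Adidas": "DE", "Marriott": "US", "Accor": "FR"}

def make_questions(topic: str, trait: str, n: int = 3) -> list[str]:
    # Step 2: adapt a cluster's general traits to the topic's vocabulary.
    # (The paper uses an LLM for this adaptation; a template keeps the
    # sketch self-contained.)
    return [
        f"As someone who {trait}, which {topic} would you recommend? (v{i})"
        for i in range(n)
    ]

def extract_top_k(response: str, k: int = 3) -> list[str]:
    # Step 3: normalize a free-form answer into an ordered top-k choice set.
    hits = sorted((response.find(b), b) for b in BRAND_COUNTRY if b in response)
    return [brand for _, brand in hits[:k]]

def audit(ask_llm: Callable[[str], str]) -> Counter:
    # One pass over every (topic, cluster) pair; the returned country
    # counts feed the step-4 bias analysis.
    counts: Counter = Counter()
    for topic in TOPICS:
        for trait in CLUSTERS.values():
            for question in make_questions(topic, trait):
                for brand in extract_top_k(ask_llm(question)):
                    counts[BRAND_COUNTRY[brand]] += 1
    return counts

if __name__ == "__main__":
    # A canned responder in place of a real model, for demonstration.
    fake_llm = lambda q: "I'd suggest Nike, or Marriott if you're travelling."
    print(audit(fake_llm))  # Counter({'US': 24})
```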
Experiments
In the experiments, the ChoiceEval framework was applied to Gemini, GPT, and DeepSeek across 10 topics and over 2,000 questions. The experimental design includes:
- Datasets: 10 topics spanning commerce and culture, such as hotel chains, electric cars, and running shoes.
- Models under audit: Gemini, GPT, and DeepSeek, compared to reveal geographic bias in brand and cultural preferences.
- Evaluation Metrics: Normalized top-k choice sets that quantify preference and geographic bias.
- Question budget: 23 questions per topic-cluster pair, for 2,070 questions in total (a quick consistency check follows the list).
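The reported numbers are mutually consistent: 23 questions per (topic, cluster) pair and 2,070 questions overall imply 90 pairs, i.e., 9 user clusters across the 10 topics. The cluster count is inferred here, not quoted from the paper:

```python
# Consistency check on the reported question counts.
topics = 10
questions_per_pair = 23
total_questions = 2070

pairs = total_questions // questions_per_pair  # 2070 / 23 = 90 (topic, cluster) pairs
clusters = pairs // topics                     # 90 / 10 = 9 user clusters (inferred)
assert topics * clusters * questions_per_pair == total_questions
print(f"{topics} topics x {clusters} clusters x {questions_per_pair} questions = {total_questions}")
```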
Results
Experimental results show that the U.S.-developed models Gemini and GPT exhibit marked favoritism toward American entities, while the China-developed DeepSeek shows more balanced yet still detectable geographic preferences. These patterns persist across user personas, indicating systematic rather than incidental effects. Across domains such as hotel chains, electric cars, and running shoes, recommendations skew toward American entities despite competitive global alternatives.
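The paper's exact statistical procedure is not spelled out in this summary, so the following is only one plausible sketch of such an analysis: a chi-square goodness-of-fit test comparing observed country counts in the extracted choice sets against an assumed neutral baseline (here, a made-up share of competitive brands per country).

```python
# Illustrative bias test: do recommendation slots deviate from a neutral
# baseline? Counts and baseline shares below are made-up for demonstration.
from scipy.stats import chisquare

observed = [310, 95, 60, 35]                # slots won by US, JP, DE, other brands
baseline_shares = [0.40, 0.25, 0.20, 0.15]  # assumed share of competitive brands

n = sum(observed)
expected = [share * n for share in baseline_shares]
stat, pvalue = chisquare(f_obs=observed, f_exp=expected)
print(f"chi2 = {stat:.1f}, p = {pvalue:.3g}")  # small p: shares deviate from baseline
```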
Applications
The application scenarios of the ChoiceEval framework include:
- Market Auditing: Help researchers and regulators evaluate geographic bias in LLM brand and cultural preferences and its potential impact on market and cultural diversity.
- Model Improvement: Give AI developers feedback on LLM preferences so they can improve the fairness and diversity of recommendations.
- Consumer Protection: Help consumers identify biases in AI recommendations and make more informed decisions.
Limitations & Outlook
The limitations of the ChoiceEval framework include:
- Evaluation results may be limited by the selected models and datasets; the geographic bias analysis in particular may not represent preferences across all regions.
- Because the framework relies on psychographic user clusters, it may not fully capture the complex behaviors and preferences of all users.
- LLM recommendations reflect biases inherited from training data, which ChoiceEval measures but does not mitigate.
Looking ahead, future directions include expanding ChoiceEval to more topics and user clusters, especially in non-English contexts.
Plain Language (accessible to non-experts)
Imagine you're in a large shopping mall with a variety of stores and brands. You want to buy a pair of running shoes but don't know which brand to choose. A friend (like a large language model) recommends a few brands, but they always tend to recommend the ones they're familiar with, like American brands. This preference might influence your choice, because you might overlook other equally good brands. ChoiceEval acts like a detector, helping you spot these preferences so you can see all the options, not just the favored brands. In this way, ChoiceEval helps you make more informed decisions rather than being swayed by a recommender's habits.
ELI14 (explained like you're 14)
Hey there! You know how sometimes when we're online looking for stuff, like buying shoes or finding travel spots, AI assistants give us recommendations? But these AI helpers sometimes have a favorite, just like you might have a favorite game character! This might make us miss out on other cool stuff! ChoiceEval is like a super detective that helps us spot these favorites, so we can see more options. That way, we're not stuck in a tiny circle but can see a bigger world! Isn't that cool?
Glossary
Large Language Model (LLM)
A large language model is an AI system based on deep learning that can generate and understand natural language text.
In this paper, LLMs are the systems being audited: their recommendations are analyzed for brand and cultural preferences.
ChoiceEval
ChoiceEval is a framework for auditing brand and cultural preferences in LLMs, capable of generating evaluation questions and quantifying preference and geographic bias.
ChoiceEval is used to evaluate geographic bias in LLM brand and cultural preferences.
Psychographic Profile
A psychographic profile is a user classification based on consumer values and lifestyles, used to capture different user types' behaviors and preferences.
In ChoiceEval, psychographic profiles are used to generate diverse evaluation questions.
Geographic Bias
Geographic bias refers to an AI system's preference for entities from certain geographic regions in recommendations, potentially leading to market unfairness.
ChoiceEval is used to detect geographic bias in LLM brand and cultural preferences.
Normalized Top-k Choice Set
A normalized top-k choice set refers to the top k recommendations extracted from LLM responses, used to quantify preference and geographic bias.
In ChoiceEval, it is used to evaluate LLM recommendation preferences.
Brand Preference
Brand preference refers to the tendency of consumers or AI systems to favor certain brands, influenced by various factors.
ChoiceEval is used to evaluate brand preferences in LLM recommendations.
Cultural Preference
Cultural preference refers to the tendency of consumers or AI systems to favor certain cultural entities, potentially affecting cultural diversity.
ChoiceEval is used to evaluate cultural preferences in LLM recommendations.
Open-ended Recommendation Scenario
An open-ended recommendation scenario is a context where users request suggestions from AI systems without explicit constraints.
ChoiceEval is used to evaluate preferences in open-ended recommendation scenarios.
Entity-perception Bias
Entity-perception bias refers to an AI system's bias in describing or recommending entities, potentially influencing user decisions.
ChoiceEval is used to detect entity-perception bias in LLMs.
Market Fairness
Market fairness refers to the equal opportunity for market participants to compete without bias or unfair practices.
ChoiceEval is used to evaluate the impact of LLMs on market fairness.
Open Questions (unanswered questions from this research)
1. How can geographic and cultural biases in LLMs be reduced without affecting model performance? Existing methods focus on data and model architecture adjustments, but these may impact overall model performance.
2. Do LLM brand and cultural preferences manifest differently in non-English contexts? Existing research primarily focuses on English contexts, while preferences in non-English contexts remain underexplored.
3. How can preferences in LLMs be more effectively detected and quantified in open-ended recommendation scenarios? Current methods rely on standardized choice sets, which may not be effective in open-ended scenarios.
4. In multilingual environments, are LLM brand and cultural preferences influenced by the query language? Existing research primarily focuses on single-language environments, while preferences in multilingual environments remain underexplored.
5. How can the diversity of LLM recommendations be improved without affecting user experience? Existing methods focus on model and data adjustments, but these may impact overall user experience.
Applications
Immediate Applications
Market Auditing
ChoiceEval can help researchers and regulators evaluate geographic bias in LLM brand and cultural preferences, revealing its potential impact on market and cultural diversity.
Model Improvement
ChoiceEval provides feedback to AI developers on LLM preferences to improve model fairness and diversity, ensuring diversity and fairness in recommendations.
Consumer Protection
ChoiceEval can help consumers identify biases in AI recommendations to make more informed decisions, avoiding being swayed by certain preferences.
Long-term Vision
Global Market Fairness
By reducing geographic and cultural biases in LLMs, ChoiceEval contributes to achieving global market fairness, ensuring equal competition opportunities for all market participants.
Cultural Diversity Protection
ChoiceEval promotes cultural diversity protection by revealing cultural preferences in LLMs, ensuring visibility and representation of different cultures globally.
Abstract
AI systems based on large language models (LLMs) increasingly mediate what billions of people see, choose and buy. This creates an urgent need to quantify the systemic risks of LLM-driven market intermediation, including its implications for market fairness, competition, and the diversity of information exposure. This paper introduces ChoiceEval, a reproducible framework for auditing preferences for brands and cultures in large language models (LLMs) under realistic usage conditions. ChoiceEval addresses two core technical challenges: (i) generating realistic, persona-diverse evaluation queries and (ii) converting free-form outputs into comparable choice sets and quantitative preference metrics. For a given topic (e.g. running shoes, hotel chains, travel destinations), the framework segments users into psychographic profiles (e.g., budget-conscious, wellness-focused, convenience), and then derives diverse prompts that reflect real-world advice-seeking and decision-making behaviour. LLM responses are converted into normalised top-k choice sets. Preference and geographic bias are then quantified using comparable metrics across topics and personas. Thus, ChoiceEval provides a scalable audit pipeline for researchers, platforms, and regulators, linking model behaviour to real-world economic outcomes. Applied to Gemini, GPT, and DeepSeek across 10 topics spanning commerce and culture and more than 2,000 questions, ChoiceEval reveals consistent preferences: U.S.-developed models Gemini and GPT show marked favouritism toward American entities, while China-developed DeepSeek exhibits more balanced yet still detectable geographic preferences. These patterns persist across user personas, suggesting systematic rather than incidental effects.