Auditing Preferences for Brands and Cultures in LLMs

TL;DR

The ChoiceEval framework reveals geographic bias in LLM brand and cultural preferences, most notably favoritism toward US entities.

cs.HC · 2026-03-19
Jasmine Rienecker, Katarina Mpofu, Naman Goel, Siddhartha Datta, Jun Zhao, Oscar Danielsson, Fredrik Thorsen
LLM preference auditing · brand bias · cultural bias · geographic bias

Key Findings

Methodology

ChoiceEval is a reproducible framework for auditing brand and cultural preferences in large language models (LLMs) under realistic usage conditions. The framework addresses two core technical challenges: (i) generating realistic, persona-diverse evaluation queries; (ii) converting free-form outputs into comparable choice sets and quantitative preference metrics. ChoiceEval achieves this by segmenting users into psychographic profiles (e.g., budget-conscious, wellness-focused, convenience) and deriving diverse prompts reflecting real-world advice-seeking and decision-making behavior. LLM responses are converted into normalized top-k choice sets, quantifying preference and geographic bias.
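
The conversion from free-form responses to normalized top-k choice sets can be sketched as follows. This is an illustrative reconstruction, not the paper's implementation: the `BRAND_ORIGIN` map, brand names, and `k=3` are placeholder assumptions.

```python
from collections import Counter

# Hypothetical brand -> country-of-origin map; the paper's actual
# entity list and origin assignments are not specified here.
BRAND_ORIGIN = {"nike": "US", "hoka": "US", "adidas": "DE", "asics": "JP"}

def top_k_choice_set(response_brands, k=3):
    """Keep the first k distinct known brands mentioned, in order."""
    seen, choices = set(), []
    for b in response_brands:
        b = b.lower()
        if b in BRAND_ORIGIN and b not in seen:
            seen.add(b)
            choices.append(b)
        if len(choices) == k:
            break
    return choices

def geographic_shares(all_responses, k=3):
    """Normalized share of top-k recommendation slots per country."""
    counts = Counter()
    for brands in all_responses:
        for b in top_k_choice_set(brands, k):
            counts[BRAND_ORIGIN[b]] += 1
    total = sum(counts.values()) or 1
    return {country: n / total for country, n in counts.items()}

# Two toy responses: shares sum to 1 across countries.
print(geographic_shares([["Nike", "Asics", "Hoka"], ["Adidas", "Nike", "Asics"]]))
```

Normalizing by the total number of filled slots makes the shares comparable across topics and personas, even when some responses mention fewer than k known entities.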

Key Results

  • Result 1: ChoiceEval applied to Gemini, GPT, and DeepSeek across 10 topics and over 2,000 questions reveals consistent preferences: U.S.-developed models Gemini and GPT show marked favoritism toward American entities, while China-developed DeepSeek exhibits more balanced yet still detectable geographic preferences.
  • Result 2: These preference patterns persist across user personas, suggesting systematic rather than incidental effects.
  • Result 3: In multiple domains such as hotel chains, electric cars, and running shoes, LLM recommendations are skewed towards American entities despite the existence of competitive global alternatives.

Significance

ChoiceEval's significance lies in providing a scalable audit pipeline for researchers, platforms, and regulators, linking model behavior to real-world economic outcomes. The framework reveals geographic bias in LLM brand and cultural preferences, particularly in the over-representation of American entities. Such biases may impact market fairness, competition, and the diversity of information exposure, especially in AI-driven markets where these biases could lead to systematic economic advantages or disadvantages.

Technical Contribution

ChoiceEval's technical contributions include a comprehensive framework for systematically generating evaluation questions and assessing entity-perception bias in AI assistants. The approach extends methods previously applied to social biases to brand and cultural biases, particularly in open-ended recommendation scenarios. ChoiceEval ensures broader applicability by integrating psychographic user clusters and generating contextually relevant evaluation questions.

Novelty

ChoiceEval's novelty lies in being the first to systematically generate evaluation questions and assess entity-perception bias in AI assistants. Unlike previous research, which focused primarily on social biases, ChoiceEval targets brand and cultural bias in open-ended recommendation scenarios, providing a scalable foundation for evaluating how AI assistants may shape real-world decisions.

Limitations

  • Limitation 1: The evaluation results of the ChoiceEval framework may be limited by the selected models and datasets, especially in geographic bias analysis, which may not fully represent preferences across all regions.
  • Limitation 2: Since the framework relies on psychographic user clusters, it may not fully capture the complex behaviors and preferences of all users.
  • Limitation 3: In some cases, LLM recommendations may be influenced by inherent biases in the training data, which ChoiceEval may not fully eliminate.

Future Work

Future directions include expanding the ChoiceEval framework to cover more topics and user clusters, especially in non-English contexts. Additionally, further research could explore how to reduce geographic and cultural biases in LLMs through adjustments in training data and model architecture.

AI Executive Summary

The rapid adoption of large language models (LLMs) such as ChatGPT, Google Gemini, and Meta AI has fundamentally transformed how individuals interact with technology and access information. These conversational AI systems increasingly supplement, and sometimes replace, traditional search engines, becoming a primary means of information retrieval for many users. However, geographic biases in LLM brand and cultural preferences may have profound implications for market fairness, competition, and the diversity of information exposure.

To address this issue, the paper introduces ChoiceEval, a reproducible framework for auditing brand and cultural preferences in LLMs under realistic usage conditions. ChoiceEval tackles the technical challenges of generating realistic, persona-diverse evaluation queries and converting free-form outputs into comparable choice sets and quantitative preference metrics. By segmenting users into psychographic profiles and generating diverse prompts, ChoiceEval quantifies preference and geographic bias.

In experiments, ChoiceEval was applied to Gemini, GPT, and DeepSeek across 10 topics and over 2,000 questions. Results show that U.S.-developed models Gemini and GPT exhibit marked favoritism toward American entities, while China-developed DeepSeek shows more balanced yet still detectable geographic preferences. These preference patterns persist across user personas, indicating systematic rather than incidental effects.

ChoiceEval thus provides a scalable audit pipeline for researchers, platforms, and regulators, linking model behavior to real-world economic outcomes. In AI-driven markets, the over-representation of American entities it uncovers could translate into systematic economic advantages or disadvantages, with consequences for market fairness, competition, and the diversity of information exposure.

However, ChoiceEval has limitations: its results depend on the selected models and datasets, especially in geographic bias analysis, and may not fully represent preferences across all regions; and its psychographic user clusters may not capture the full complexity of user behavior. Future directions include expanding the framework to cover more topics and user clusters, especially in non-English contexts.

Deep Analysis

Background

In recent years, the development of large language models (LLMs) has significantly enhanced the role of AI systems in information retrieval and decision support. LLMs like ChatGPT, Google Gemini, and Meta AI have become the primary means for many people to access information. These systems not only influence individual choices but also potentially affect market fairness and competition. However, existing research has primarily focused on social biases, such as gender, race, and religion, while brand and cultural biases have received relatively less attention. The introduction of the ChoiceEval framework aims to fill this research gap by systematically evaluating geographic bias in LLM brand and cultural preferences, revealing its potential impact on market and cultural diversity.

Core Problem

Geographic biases in LLM brand and cultural preferences may have profound implications for market fairness, competition, and the diversity of information exposure. Especially in AI-driven markets, these biases could lead to systematic economic advantages or disadvantages. Existing research has primarily focused on social biases, while brand and cultural biases have received relatively less attention. Therefore, there is an urgent need for a systematic approach to evaluate geographic bias in LLM brand and cultural preferences to reveal its potential impact on market and cultural diversity.

Innovation

The core innovation of ChoiceEval is that it is the first framework to systematically generate evaluation questions and assess entity-perception bias in AI assistants. Unlike previous research, which focused primarily on social biases, ChoiceEval targets brand and cultural bias in open-ended recommendation scenarios, providing a scalable foundation for evaluating how AI assistants may shape real-world decisions. Concretely, ChoiceEval segments users into psychographic profiles and generates diverse prompts, allowing preference and geographic bias to be quantified.

Methodology

The implementation of the ChoiceEval framework includes the following steps:

  • User Cluster Definitions: Use psychographic user clusters (e.g., budget-conscious, wellness-focused, convenience) to capture how different user types interact with AI assistants.
  • Question Generation: Use LLMs to adapt core consumer clusters for each domain, converting their general characteristics into terminology and concerns specific to that decision-making context.
  • Response Extraction: Query each LLM with the same set of questions and record its responses. Simulate expert evaluation with multiple independent extraction runs to reduce decoding variability and parsing ambiguities.
  • Bias Analysis: Perform statistical analysis on extracted recommendations to reveal geographic bias in AI assistant brand and cultural preferences.
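
The first three stages above can be sketched as a simple audit loop. This is a hedged sketch, not the paper's code: the `CLUSTERS` list, the templated `generate_questions`, and the `ask`/`extract_brands` callables are placeholders standing in for the real LLM-driven question generation and extraction.

```python
# Placeholder psychographic clusters; the paper's full cluster set
# is not enumerated here.
CLUSTERS = ["budget-conscious", "wellness-focused", "convenience"]

def generate_questions(topic, cluster, n=3):
    # Stage 2: in the real framework an LLM rewrites each cluster's
    # traits into domain-specific questions; here we just template.
    return [f"As a {cluster} shopper, which {topic} would you pick? (variant {i})"
            for i in range(n)]

def run_audit(topics, ask, extract_brands, runs=3):
    """Stages 1-3: query the model several times per question to
    reduce decoding variability, then pool the extractions."""
    records = []
    for topic in topics:
        for cluster in CLUSTERS:
            for q in generate_questions(topic, cluster):
                extractions = [extract_brands(ask(q)) for _ in range(runs)]
                records.append({"topic": topic, "cluster": cluster,
                                "question": q, "extractions": extractions})
    return records
```

Running multiple independent extraction passes per question, as the Response Extraction step describes, lets the bias analysis average over parsing ambiguity rather than trusting a single decode.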

Experiments

In experiments, the ChoiceEval framework was applied to Gemini, GPT, and DeepSeek across 10 topics and over 2,000 questions. The experimental design includes:

  • Datasets: 10 topics covering commerce and culture, such as hotel chains, electric cars, and running shoes.
  • Models: Gemini, GPT, and DeepSeek are audited and compared to reveal geographic bias in brand and cultural preferences.
  • Evaluation Metrics: Normalized top-k choice sets quantify preference and geographic bias.
  • Hyperparameters: 23 questions per topic-and-user-cluster pair, for 2,070 questions in total.
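
The stated question counts are internally consistent, and imply a cluster count the text does not state directly. Assuming every topic uses the same number of clusters (an inference from the arithmetic, not a figure quoted from the paper):

```python
topics = 10
questions_per_pair = 23
total_questions = 2070

pairs = total_questions // questions_per_pair  # topic-cluster pairs
clusters_per_topic = pairs // topics           # inferred clusters per topic

print(pairs, clusters_per_topic)  # 90 pairs, implying 9 clusters per topic
```

So the three example clusters named earlier (budget-conscious, wellness-focused, convenience) would be a subset of roughly nine per topic, under this assumption.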

Results

Experimental results show that U.S.-developed models Gemini and GPT exhibit marked favoritism toward American entities, while China-developed DeepSeek shows more balanced yet still detectable geographic preferences. These patterns persist across user personas, indicating systematic rather than incidental effects. Across domains such as hotel chains, electric cars, and running shoes, recommendations skew toward American entities despite the existence of competitive global alternatives.
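
One standard way to make "skewed toward American entities" statistically concrete is a chi-square goodness-of-fit test on per-country recommendation counts against a baseline. The counts and the uniform baseline below are illustrative assumptions, not the paper's numbers or its exact test; a market-share baseline would be an equally reasonable choice.

```python
def chi_square_stat(observed, expected):
    """Chi-square goodness-of-fit statistic: sum of (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [120, 40, 25, 15]   # toy counts, e.g. US, EU, JP, CN
total = sum(observed)
expected = [total / len(observed)] * len(observed)  # uniform baseline

stat = chi_square_stat(observed, expected)
print(round(stat, 1))  # 137.0 -- large values indicate geographic skew
```

With 3 degrees of freedom, a statistic this large would reject the no-skew hypothesis at any conventional significance level; the substantive work is in choosing a defensible expected distribution.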

Applications

The application scenarios of the ChoiceEval framework include:

  • Market Auditing: Help researchers and regulators evaluate geographic bias in LLM brand and cultural preferences to reveal its potential impact on market and cultural diversity.
  • Model Improvement: Provide feedback to AI developers on LLM preferences so that recommendations can be made fairer and more diverse.
  • Consumer Protection: Help consumers identify biases in AI recommendations so they can make more informed decisions rather than being swayed by particular preferences.

Limitations & Outlook

The limitations of the ChoiceEval framework include:

  • Evaluation results may be limited by the selected models and datasets, especially in geographic bias analysis, and may not fully represent preferences across all regions.
  • Since the framework relies on psychographic user clusters, it may not fully capture the complex behaviors and preferences of all users.
  • LLM recommendations may be influenced by inherent biases in the training data, which ChoiceEval can measure but not eliminate.

Future directions include expanding the ChoiceEval framework to cover more topics and user clusters, especially in non-English contexts.

Plain Language (Accessible to non-experts)

Imagine you're in a large shopping mall with a variety of stores and brands. You want to buy a pair of running shoes but don't know which brand to choose. A friend (like a large language model) recommends a few brands, but they always tend to recommend the ones they're familiar with, like American brands. This preference might influence your choice, because you might overlook other equally good brands. ChoiceEval acts like a detector, helping you identify these preferences and ensuring you can see all the options, not just the favored brands. In this way, ChoiceEval helps you make more informed decisions rather than being swayed by certain preferences.

ELI14 (Explained like you're 14)

Hey there! You know how sometimes when we're online looking for stuff, like buying shoes or finding travel spots, AI assistants give us recommendations? But these AI helpers sometimes have a favorite, just like you might have a favorite game character! This might make us miss out on other cool stuff! ChoiceEval is like a super detective that helps us spot these favorites, so we can see more options. That way, we're not stuck in a tiny circle but can see a bigger world! Isn't that cool?

Glossary

Large Language Model (LLM)

A large language model is an AI system based on deep learning that can generate and understand natural language text.

In this paper, LLMs are used to generate recommendations for brand and cultural preferences.

ChoiceEval

ChoiceEval is a framework for auditing brand and cultural preferences in LLMs, capable of generating evaluation questions and quantifying preference and geographic bias.

ChoiceEval is used to evaluate geographic bias in LLM brand and cultural preferences.

Psychographic Profile

A psychographic profile is a user classification based on consumer values and lifestyles, used to capture different user types' behaviors and preferences.

In ChoiceEval, psychographic profiles are used to generate diverse evaluation questions.

Geographic Bias

Geographic bias refers to an AI system's preference for entities from certain geographic regions in recommendations, potentially leading to market unfairness.

ChoiceEval is used to detect geographic bias in LLM brand and cultural preferences.

Normalized Top-k Choice Set

A normalized top-k choice set refers to the top k recommendations extracted from LLM responses, used to quantify preference and geographic bias.

In ChoiceEval, used to evaluate LLM recommendation preferences.

Brand Preference

Brand preference refers to the tendency of consumers or AI systems to favor certain brands, influenced by various factors.

ChoiceEval is used to evaluate brand preferences in LLM recommendations.

Cultural Preference

Cultural preference refers to the tendency of consumers or AI systems to favor certain cultural entities, potentially affecting cultural diversity.

ChoiceEval is used to evaluate cultural preferences in LLM recommendations.

Open-ended Recommendation Scenario

An open-ended recommendation scenario is a context where users request suggestions from AI systems without explicit constraints.

ChoiceEval is used to evaluate preferences in open-ended recommendation scenarios.

Entity-perception Bias

Entity-perception bias refers to an AI system's bias in describing or recommending entities, potentially influencing user decisions.

ChoiceEval is used to detect entity-perception bias in LLMs.

Market Fairness

Market fairness refers to the equal opportunity for market participants to compete without bias or unfair practices.

ChoiceEval is used to evaluate the impact of LLMs on market fairness.

Open Questions (Unanswered questions from this research)

  • 1. How can geographic and cultural biases in LLMs be reduced without degrading model performance? Existing mitigations adjust training data and model architecture, but these may affect overall performance.
  • 2. Do LLM brand and cultural preferences manifest differently in non-English contexts? Existing research focuses primarily on English; preferences in other languages remain underexplored.
  • 3. How can preferences in LLMs be detected and quantified more effectively in open-ended recommendation scenarios? Current methods rely on standardized choice sets, which may not transfer to fully open-ended outputs.
  • 4. Are LLM brand and cultural preferences influenced by the query language in multilingual environments? Existing research focuses primarily on single-language settings.
  • 5. How can the diversity of LLM recommendations be improved without degrading user experience? Existing methods adjust models and data, but these may impact the overall user experience.

Applications

Immediate Applications

Market Auditing

ChoiceEval can help researchers and regulators evaluate geographic bias in LLM brand and cultural preferences, revealing its potential impact on market and cultural diversity.

Model Improvement

ChoiceEval provides feedback to AI developers on LLM preferences to improve model fairness and diversity, ensuring diversity and fairness in recommendations.

Consumer Protection

ChoiceEval can help consumers identify biases in AI recommendations to make more informed decisions, avoiding being swayed by certain preferences.

Long-term Vision

Global Market Fairness

By reducing geographic and cultural biases in LLMs, ChoiceEval contributes to achieving global market fairness, ensuring equal competition opportunities for all market participants.

Cultural Diversity Protection

ChoiceEval promotes cultural diversity protection by revealing cultural preferences in LLMs, ensuring visibility and representation of different cultures globally.

Abstract

Large language models (LLMs) based AI systems increasingly mediate what billions of people see, choose and buy. This creates an urgent need to quantify the systemic risks of LLM-driven market intermediation, including its implications for market fairness, competition, and the diversity of information exposure. This paper introduces ChoiceEval, a reproducible framework for auditing preferences for brands and cultures in large language models (LLMs) under realistic usage conditions. ChoiceEval addresses two core technical challenges: (i) generating realistic, persona-diverse evaluation queries and (ii) converting free-form outputs into comparable choice sets and quantitative preference metrics. For a given topic (e.g. running shoes, hotel chains, travel destinations), the framework segments users into psychographic profiles (e.g., budget-conscious, wellness-focused, convenience), and then derives diverse prompts that reflect real-world advice-seeking and decision-making behaviour. LLM responses are converted into normalised top-k choice sets. Preference and geographic bias are then quantified using comparable metrics across topics and personas. Thus, ChoiceEval provides a scalable audit pipeline for researchers, platforms, and regulators, linking model behaviour to real-world economic outcomes. Applied to Gemini, GPT, and DeepSeek across 10 topics spanning commerce and culture and more than 2,000 questions, ChoiceEval reveals consistent preferences: U.S.-developed models Gemini and GPT show marked favouritism toward American entities, while China-developed DeepSeek exhibits more balanced yet still detectable geographic preferences. These patterns persist across user personas, suggesting systematic rather than incidental effects.

cs.HC cs.AI cs.CY cs.IR cs.LG
