Representational Harms in LLM-Generated Narratives Against Global Majority Nationalities
This study uses a Question Answering (QA) model to surface representational harms against Global Majority nationalities across 500,000 LLM-generated stories.
Key Findings
Methodology
The study employs a Question Answering (QA) model to identify nationality identity markers in narratives. By analyzing 500,000 stories generated by GPT-3.5, GPT-4, Llama 2, Claude 2.0, and PaLM 2, the research team evaluated these models' nationality biases in a U.S. context. The study also introduces a new dataset of 292,500 narratives generated by GPT-4.1 Nano across 195 globally recognized nations to compare Global Majority and Global Minority representations.
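The extraction step can be illustrated with a much simpler stand-in. The sketch below is hypothetical: the study uses a trained QA model over open-ended narratives, whereas this toy version uses keyword matching to show the core idea of tagging each nationality marker as referring to a person or to a non-person entity (cuisine, language, travel destination):

```python
import re

# Hypothetical demonym list; the actual study covers 195 recognized nations.
DEMONYMS = ["American", "Mexican", "Indian", "Nigerian", "Chinese"]

# Crude proxy for the paper's QA model: a marker counts as a person
# reference when the following noun is person-like, otherwise as a
# non-person reference (cuisine, language, travel destination, ...).
PERSON_NOUNS = {"man", "woman", "chef", "doctor", "student", "character"}

def extract_markers(story: str):
    """Return (demonym, is_person) pairs found in a generated story."""
    markers = []
    for demonym in DEMONYMS:
        for match in re.finditer(rf"\b{demonym}\b\s+(\w+)", story):
            following = match.group(1).lower()
            markers.append((demonym, following in PERSON_NOUNS))
    return markers

story = "A Mexican chef served Indian cuisine to an American student."
print(extract_markers(story))
# Each tuple: (nationality marker, attached to a person?)
```

In this toy example, "Indian" is tagged as a non-person reference (cuisine), mirroring the pattern the study reports: non-U.S. nationality mentions frequently attach to objects rather than to characters.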
Key Results
- Result 1: Of the 500,000 stories, only 5.4% mention a country or nationality other than the U.S., and only 1.8% attach that mention to a character's identity. In power-neutral scenarios, non-U.S. nationalities are 50 times more likely to refer to cuisine, language, or travel destinations than to a person.
- Result 2: In power-laden stories, characters with non-U.S. nationalities are far more likely to be portrayed in subordinate roles (98.6%) than in dominant roles (1.6%).
- Result 3: Analysis in a global context shows that characters from Global Majority countries are more likely to be portrayed as needing help than as dominant figures.
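The "over fifty times" disproportionality cited elsewhere in this summary follows directly from the role percentages reported in Result 2. A quick back-of-the-envelope check:

```python
# Shares of non-U.S. characters in power-laden stories, as reported
# in the study (the two categories need not sum to 100%, since some
# portrayals are neither clearly subordinate nor clearly dominant).
subordinate_pct = 98.6
dominant_pct = 1.6

# Subordinated portrayals appear this many times more often than
# dominant ones -- the "over fifty times" figure from the abstract.
ratio = subordinate_pct / dominant_pct
print(f"{ratio:.1f}x")  # 61.6x, i.e. over 50x
```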
Significance
This research highlights the potential biases and representational harms in LLM-generated narratives against Global Majority nationalities. By thoroughly analyzing how these models depict nationality identities in different contexts, the study provides crucial insights into understanding and mitigating cultural biases in AI systems. This is particularly significant for enterprise and government applications that rely on AI for decision-making, as these biases could impact policy-making and social equity.
Technical Contribution
The technical contributions of the study include the development of a QA model to extract nationality identity markers from complex open-ended narratives. This approach overcomes the reliance on templated or explicit identity prompts in previous studies, allowing for a more accurate assessment of biases in natural language generation tasks. Additionally, the study reveals representational differences between Global Majority and Minority countries by comparing story generations across different national contexts.
Novelty
This study is the first to systematically analyze nationality biases in LLM-generated narratives, particularly in the context of Global Majority countries. Unlike previous research that primarily focused on gender, race, and sexual orientation biases, this study expands the analysis to nationality identities, revealing new dimensions of cultural bias.
Limitations
- Limitation 1: The study focuses primarily on English-language models, so its findings may not fully reflect biases in models operating in other languages.
- Limitation 2: Due to dataset limitations, the study may not capture nuanced differences in specific national or cultural contexts.
- Limitation 3: The study relies on existing models and datasets, which may not capture improvements in the latest models.
Future Work
Future research could expand to multilingual models to assess biases across different languages and cultural contexts. Additionally, new methods could be developed to automatically detect and mitigate cultural biases in AI-generated content, especially in high-stakes applications. This would help enhance the fairness and inclusivity of AI systems.
AI Executive Summary
Large language models (LLMs) are increasingly used for text generation tasks from everyday use to high-stakes enterprise and government applications, including simulated interviews with asylum seekers. However, these technologies are not value-neutral; they may encode and perpetuate harmful biases against non-dominant communities worldwide. To better evaluate and mitigate such harms, more research examining how LLMs portray diverse individuals is needed.
In this study, we analyze how national origin identities are portrayed by widely-adopted LLMs in response to open-ended narrative generation prompts. Our findings demonstrate the presence of persistent representational harms by national origin, including harmful stereotypes, erasure, and one-dimensional portrayals of Global Majority identities. Minoritized national identities are simultaneously underrepresented in power-neutral stories and overrepresented in subordinated character portrayals, which are over fifty times more likely to appear than dominant portrayals.
The degree of harm is amplified when US nationality cues (e.g., “American”) are present in input prompts. Notably, we find that the harms we identify cannot be explained away via sycophancy, as US-centric biases persist even when replacing US nationality cues with non-US national identities in the prompts. Based on our findings, we call for further exploration of cultural harms in LLMs through methodologies that center Global Majority perspectives and challenge the uncritical adoption of US-based LLMs for the classification, surveillance, and misrepresentation of the majority of our planet.
Deep Dive
Abstract
Large language models (LLMs) are increasingly used for text generation tasks from everyday use to high-stakes enterprise and government applications, including simulated interviews with asylum seekers. While many works highlight the new potential applications of LLMs, there are risks of LLMs encoding and perpetuating harmful biases about non-dominant communities across the globe. To better evaluate and mitigate such harms, more research examining how LLMs portray diverse individuals is needed. In this work, we study how national origin identities are portrayed by widely-adopted LLMs in response to open-ended narrative generation prompts. Our findings demonstrate the presence of persistent representational harms by national origin, including harmful stereotypes, erasure, and one-dimensional portrayals of Global Majority identities. Minoritized national identities are simultaneously underrepresented in power-neutral stories and overrepresented in subordinated character portrayals, which are over fifty times more likely to appear than dominant portrayals. The degree of harm is amplified when US nationality cues (e.g., “American”) are present in input prompts. Notably, we find that the harms we identify cannot be explained away via sycophancy, as US-centric biases persist even when replacing US nationality cues with non-US national identities in the prompts. Based on our findings, we call for further exploration of cultural harms in LLMs through methodologies that center Global Majority perspectives and challenge the uncritical adoption of US-based LLMs for the classification, surveillance, and misrepresentation of the majority of our planet.