The Collaboration Gap in Human-AI Work

TL;DR

Introduces a framework for understanding the fragility of human-AI collaboration, analyzing grounding conditions and repair burden.

cs.HC 🟡 Intermediate 2026-04-20
Varad Vishwarupe, Marina Jirotka, Nigel Shadbolt, Ivan Flechais
human-AI collaboration · LLM · common ground · repair · HCI

Key Findings

Methodology

The study employs a constructivist grounded theory analysis based on 16 interviews with designers, developers, and applied AI practitioners. It introduces a framework to analyze grounding conditions and repair burden in human-AI collaboration by distinguishing three interaction structures: one-shot assistance, weak collaboration, and grounded collaboration.

Key Results

  • The study finds that collaboration often breaks down when the appearance of partnership outpaces the grounding capacity of the interaction. Across the 16 interviews, participants consistently described current LLM systems as falling well short of supporting deep collaboration.
  • In one-shot assistance, the user provides a prompt and the system produces an output, but shared understanding remains low; this structure suits only low-risk tasks.
  • In grounded collaboration, the system helps surface assumptions and track context, making collaboration more stable.

Significance

This research offers a new perspective on the fragility of human-AI collaboration, emphasizing the role of grounding conditions and repair burden. The framework can help researchers design and evaluate human-AI collaboration systems, and it gives industry practitioners guidance for improving collaboration experiences in deployed applications.

Technical Contribution

The main technical contribution is a framework for analyzing grounding and repair in human-AI collaboration. Unlike prior approaches, it emphasizes the grounding capacity of the interaction and how the repair burden is distributed, rather than focusing solely on model performance.

Novelty

This study is the first to systematically analyze grounding conditions and repair burden in human-AI collaboration, providing a new framework for understanding and mitigating the fragility of such collaborations.

Limitations

  • The study is limited by a small sample size of only 16 interviews, which may not fully represent all LLM application scenarios.
  • The applicability of the framework needs to be validated in more practical applications.

Future Work

Future research could expand the sample size, validate the framework's applicability in different application scenarios, and explore how design improvements can enhance the grounding capacity of collaboration.

AI Executive Summary

In today's technological landscape, large language models (LLMs) are widely used in fields such as programming, design, writing, and analysis. Despite being considered potential collaborators, the actual collaboration experience often falls short of expectations. Users frequently need to diagnose misunderstandings, reconstruct missing assumptions, and repeatedly repair misaligned responses. This phenomenon reflects a broader issue of fragility in collaborative AI.

This paper introduces a new conceptual framework to understand this fragility, based on an analysis of 16 interviews with designers, developers, and applied AI practitioners. The study shows that stable collaboration depends not only on model capability but also on the interaction's grounding conditions. By distinguishing three structures of human-AI collaboration (one-shot assistance, weak collaboration, and grounded collaboration), the research explains why collaboration fails.

In one-shot assistance, the interaction structure remains close to a request-and-response pattern: the user provides a prompt and the system produces an output, but shared understanding remains low. This structure works for low-risk tasks such as summarization or boilerplate generation but does not support deeper collaboration.

Weak collaboration emerges when interaction becomes iterative. Users refine prompts, correct outputs, add context, or ask for revisions. These interactions may appear collaborative, but the burden of repair remains largely human. The user must infer what has gone wrong, reconstruct missing assumptions, and guide the system back toward the task.

In grounded collaboration, the interaction begins to support explicit clarification, signaling, and mutual repair. The system helps surface assumptions, track context, and make misalignment more visible. Final authority may still remain with the human, but the interaction itself becomes more balanced because repair no longer depends entirely on human improvisation.
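
To make the contrast concrete, here is a minimal sketch of the two extremes as Python pseudocode. It is not from the paper: the function names and the stubbed call_model are hypothetical stand-ins for a real LLM call, and the sketch only illustrates the structural point that a grounded loop surfaces and checks assumptions before acting.

    from dataclasses import dataclass, field

    def call_model(prompt: str) -> str:
        # Hypothetical stand-in; a real system would query an LLM here.
        return f"<draft answer for: {prompt!r}>"

    def one_shot_assistance(prompt: str) -> str:
        # Request-and-response: no shared assumptions, no clarification.
        # Any misalignment must be noticed and repaired by the user afterwards.
        return call_model(prompt)

    @dataclass
    class GroundedSession:
        # Assumptions the system has surfaced and the user has confirmed so far.
        common_ground: list = field(default_factory=list)

        def turn(self, prompt: str, ask_user) -> str:
            # 1. Surface an assumption: make the system's reading of the task visible.
            assumption = f"I assume the goal of {prompt!r} is a one-page summary"
            # 2. Signal and check the assumption instead of silently acting on it.
            reply = ask_user(assumption + ". Is that right? (y/n) ")
            if reply.strip().lower().startswith("n"):
                # 3. Mutual repair: the correction enters the shared context,
                #    so the user does not have to re-explain it next turn.
                assumption = ask_user("What should I assume instead? ")
            self.common_ground.append(assumption)
            context = "; ".join(self.common_ground)
            return call_model(f"[shared context: {context}] {prompt}")

    if __name__ == "__main__":
        print(one_shot_assistance("summarise the Q3 report"))
        session = GroundedSession()
        print(session.turn("summarise the Q3 report", ask_user=input))

The structural difference is the point: in the grounded loop, the clarifying question and the recorded correction move part of the repair burden out of the user's after-the-fact improvisation and into the interaction itself.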

This research provides a new perspective on the design of human-AI collaboration, emphasizing the importance of grounding conditions and repair burden. By reframing the collaboration gap as a grounding and repair problem, this work offers a conceptual lens for rethinking how human-AI collaboration is designed.

Deep Analysis

Background

With the rapid development of artificial intelligence technology, large language models (LLMs) have been widely applied in professional workflows such as programming, design, writing, and analysis. These models are not only seen as tools but also as potential collaborators. However, the actual collaboration experience often falls short of expectations, with users frequently needing to diagnose misunderstandings, reconstruct missing assumptions, and repeatedly repair misaligned responses. This phenomenon reflects a broader issue of fragility in collaborative AI. Studies have shown that while LLMs perform well in isolated environments, their performance may degrade when required to collaborate, a phenomenon known as the 'collaboration gap.'

Core Problem

The core problem lies in the fragility of human-AI collaboration. Although LLMs perform well in many tasks, in collaborative environments, users often need to diagnose misunderstandings, reconstruct missing assumptions, and repeatedly repair misaligned responses. This collaboration failure is not merely about whether outputs are correct but about participants' inability to reliably establish shared assumptions, interpret task state, or efficiently repair misunderstandings.

Innovation

The core innovation of this paper is a new framework for analyzing grounding conditions and repair burden in human-AI collaboration. By distinguishing three interaction structures (one-shot assistance, weak collaboration, and grounded collaboration), the study explains why collaboration fails. The framework emphasizes the grounding capacity of interactions and the distribution of repair burden, rather than focusing solely on model performance.

Methodology

  • Employs constructivist grounded theory analysis based on 16 interviews with designers, developers, and applied AI practitioners.
  • Iterative coding of practitioner accounts to identify common structures and issues in collaboration.
  • Distinguishes three interaction structures: one-shot assistance, weak collaboration, and grounded collaboration.
  • Analyzes the grounding capacity and repair burden distribution of each structure.

Experiments

The study is based on 16 semi-structured interviews with designers, developers, and applied AI practitioners. The interviews cover the use of LLMs in workflows involving drafting, ideation, coding, evaluation, and decision support. Participants described when collaboration with the model felt productive, when it became fragile, and how they responded when outputs diverged from task requirements or expectations.

Results

The study finds that collaboration often breaks down when the appearance of partnership outpaces the grounding capacity of the interaction. Across the 16 interviews, participants consistently described current LLM systems as falling well short of supporting deep collaboration. In one-shot assistance, the user provides a prompt and the system produces an output, but shared understanding remains low, so the structure suits only low-risk tasks. In grounded collaboration, the system helps surface assumptions and track context, making collaboration more stable.

Applications

The framework can be used to improve the collaboration experience of LLMs in fields such as programming, design, writing, and analysis. By improving the grounding capacity of interactions and rebalancing how the repair burden is distributed, systems can better support deep collaboration, reducing the user's burden of diagnosing misunderstandings and repairing misaligned responses.
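
The paper itself is qualitative and does not define a metric, but a team applying the framework could operationalize repair burden in a simple way: the share of user turns spent correcting the model rather than advancing the task. The sketch below is a hypothetical illustration; the keyword heuristic is illustrative only, and a real evaluation would hand-label turns.

    # Markers that crudely flag a user turn as a repair rather than a new request.
    CORRECTION_MARKERS = ("no,", "that's wrong", "i meant", "actually", "try again")

    def repair_burden(user_turns: list[str]) -> float:
        # Fraction of user turns that are repairs of the system's output.
        # Lower values suggest the interaction structure carries more of the
        # grounding work; higher values mean the human does.
        if not user_turns:
            return 0.0
        corrections = sum(
            any(marker in turn.lower() for marker in CORRECTION_MARKERS)
            for turn in user_turns
        )
        return corrections / len(user_turns)

    # Example: two of four turns are repairs, so the burden is 0.5.
    print(repair_burden([
        "Summarise the Q3 report.",
        "No, focus on revenue, not headcount.",
        "Now draft an email about the findings.",
        "I meant an internal email, try again.",
    ]))

Comparing this number across one-shot, weak, and grounded interaction designs would give a rough quantitative handle on the repair asymmetry the paper describes.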

Limitations & Outlook

The study is limited by a small sample size of only 16 interviews, which may not fully represent all LLM application scenarios. The applicability of the framework needs to be validated in more practical applications. Additionally, the research primarily focuses on the perspectives of designers and developers, potentially overlooking the needs and challenges of other user groups.

Plain Language (accessible to non-experts)

Imagine you're in a kitchen cooking a meal. You have an assistant, a large language model, who can help you chop vegetables, season dishes, and even suggest new recipes. However, sometimes the assistant misunderstands your instructions, like using salt instead of sugar or baking when you wanted to boil. This is similar to the problems in human-AI collaboration: while the assistant is smart, it doesn't always understand your intentions.

To make the collaboration smoother, you need to constantly communicate your ideas to the assistant, check its work, and correct it when it makes mistakes. This is what the paper refers to as the 'repair burden.' If you always have to spend a lot of time correcting the assistant's errors, the collaboration becomes tiring.

To improve this situation, the study suggests some methods, like having the assistant repeat your instructions or ask questions when it's unsure. It's like in the kitchen, where you have the assistant confirm each step to ensure it understands correctly.

Through these methods, human-AI collaboration can become more seamless, just like in the kitchen, where you and the assistant can work together to create delicious dishes.

ELI14 (explained like you're 14)

Hey there! Imagine you're playing a super cool online multiplayer game. You have an AI teammate who can help you fight monsters, level up, and collect gear. Sounds awesome, right? But sometimes, this AI teammate messes up, like defending when you need it to attack or running left when you want it to go right.

This is like the problems in human-AI collaboration: the AI is smart, but sometimes it doesn't get what you're thinking. To make the game go smoothly, you need to keep telling the AI your plans, check its actions, and correct it when it makes mistakes.

Researchers found that if the AI can better understand your instructions or ask you when it's unsure, the collaboration would be smoother. It's like in the game, where you have the AI confirm each step to make sure it understands correctly.

With these methods, the gaming experience can be more fun, just like you and your AI teammate can work together to defeat all the enemies!

Glossary

Large Language Model (LLM)

A type of AI model trained on vast amounts of text data, capable of generating and understanding natural language.

In this paper, LLMs are used as collaborators in fields like programming and design.

Human-AI Collaboration

The process where humans and AI systems work together to complete tasks, emphasizing interaction and cooperation.

The paper explores the fragility and reasons for failure in human-AI collaboration.

Common Ground

Refers to shared beliefs, assumptions, and goals among participants, essential for effective communication and collaboration.

The study emphasizes the importance of common ground in stable collaboration.

Repair Burden

The responsibility for identifying and correcting misunderstandings in collaboration.

The paper analyzes the distribution of repair burden in different collaboration structures.

One-shot Assistance

A simple request-and-response pattern where the user provides a prompt, and the system produces an output.

Suitable for low-risk tasks but does not support deep collaboration.

Weak Collaboration

An interaction that appears collaborative, where users need to constantly adjust prompts and correct outputs.

The study points out that the repair burden in weak collaboration is primarily human.

Grounded Collaboration

An interaction that supports explicit clarification and mutual repair, with the system helping to surface assumptions and track context.

Grounded collaboration makes human-AI collaboration more stable.

Constructivist Grounded Theory

A qualitative method that develops theory from data through iterative coding, treating theoretical categories as constructed by the researcher in dialogue with the data rather than simply discovered in it.

The paper uses this method to analyze interview data.

Signaling

The process of enhancing shared understanding through explicit feedback and confirmation.

The study suggests using signaling to improve the grounding capacity of collaboration.

Design Mechanisms

Specific methods and strategies designed to enhance system functionality and user experience.

The paper proposes three design mechanisms to improve human-AI collaboration.

Open Questions (unanswered questions from this research)

  1. Current LLM systems are significantly lacking in supporting deep collaboration, particularly in establishing and maintaining common ground. Research needs to explore how to enhance the system's grounding capacity to reduce the user's burden of repairing misunderstandings.
  2. Although the paper proposes a new framework to analyze grounding and repair issues in human-AI collaboration, its applicability needs to be validated in more practical applications. Future research should expand the sample size and test the framework's effectiveness in different application scenarios.
  3. The study primarily focuses on the perspectives of designers and developers, potentially overlooking the needs and challenges of other user groups. Future work should include more diverse user groups to fully understand the challenges of human-AI collaboration.
  4. The current research relies mainly on interview data, lacking quantitative experimental support. Future studies could test the framework's hypotheses through experiments and quantify the effects of different collaboration structures.
  5. The distribution of repair burden may vary across application scenarios, and research needs to explore how to optimize its distribution in each.

Applications

Immediate Applications

Programming Assistant

By enhancing the grounding capacity of LLMs, develop smarter programming assistants to help developers debug and optimize code more effectively.

Design Tools

Integrate LLMs into design tools to provide smarter design suggestions and automation features, enhancing designers' productivity.

Writing Assistant

Develop smarter writing assistants to help users quickly generate high-quality text content and provide real-time feedback and correction suggestions.

Long-term Vision

Intelligent Collaboration Platform

Create an intelligent collaboration platform integrating multiple AI technologies to support cross-domain team collaboration and innovation.

Personalized Learning System

Develop a personalized learning system based on LLMs, providing customized learning content and feedback according to users' learning styles and needs.

Abstract

LLMs are increasingly presented as collaborators in programming, design, writing, and analysis. Yet the practical experience of working with them often falls short of this promise. In many settings, users must diagnose misunderstandings, reconstruct missing assumptions, and repeatedly repair misaligned responses. This poster introduces a conceptual framework for understanding why such collaboration remains fragile. Drawing on a constructivist grounded theory analysis of 16 interviews with designers, developers, and applied AI practitioners working on LLM-enabled systems, and informed by literature on human-AI collaboration, we argue that stable collaboration depends not only on model capability but on the interaction's grounding conditions. We distinguish three recurrent structures of human-AI work: one-shot assistance, weak collaboration with asymmetric repair, and grounded collaboration. We propose that collaboration breaks down when the appearance of partnership outpaces the grounding capacity of the interaction and contribute a framework for discussing grounding, repair, and interaction structure in LLM-enabled work.

cs.HC cs.AI cs.IR cs.LG

References (19)

The Construction of Shared Knowledge in Collaborative Problem Solving

J. Roschelle, Stephanie D. Teasley

1995 2244 citations

Interpreting Interpretability: Understanding Data Scientists' Use of Interpretability Tools for Machine Learning

Harmanpreet Kaur, H. Nori, Samuel Jenkins et al.

2020 574 citations

Constructing Grounded Theory

Kathy Charmaz

2014 10449 citations

"To LLM, or Not to LLM?": How Designers and Developers Navigate LLMs as Tools or Teammates

Varad V. Vishwarupe, Ivan Flechais, Nigel Shadbolt et al.

2026 1 citation

Does the Whole Exceed its Parts? The Effect of AI Explanations on Complementary Team Performance

Gagan Bansal, Tongshuang Sherry Wu, Joyce Zhou et al.

2020 851 citations

Human-Centered Artificial Intelligence: Reliable, Safe & Trustworthy

B. Shneiderman

2020 1122 citations

On Clark and Schaefer’s Contribution Model and its applicability to Human-Computer Collaboration

D. Traum

2007 7 citations

Coordination of knowledge in communication: effects of speakers' assumptions about what others know.

Susan R. Fussell, R. Krauss

1992 403 citations

The use of visual information in shared visual spaces: informing the development of virtual co-presence

R. Kraut, Darren Gergle, Susan R. Fussell

2002 251 citations

Using Visual Information for Grounding and Awareness in Collaborative Tasks

Darren Gergle, R. Kraut, Susan R. Fussell

2012 176 citations

Bringing Transparency Design into Practice

Malin Eiband, H. Schneider, Mark Bilandzic et al.

2018 230 citations

Guidelines for Human-AI Interaction

Saleema Amershi, Daniel S. Weld, Mihaela Vorvoreanu et al.

2019 1977 citations

Grounding Gaps in Language Model Generations

Omar Shaikh, Kristina Gligorić, Ashna Khetan et al.

2023 54 citations

Questioning the AI: Informing Design Practices for Explainable AI User Experiences

Q. Liao, D. Gruen, Sarah Miller

2020 905 citations

Grounding in communication

H. Clark, S. Brennan

1991 4758 citations

A "speech acts" approach to grounding in conversation

D. Traum, James F. Allen

1992 82 citations

Referring as a Collaborative Process

Philip R. Cohen, J. Morgan, M. Pollack

2003 1082 citations

On Using Language

C. K. Grant

1956 5243 citations

The Collaboration Gap

Tim R. Davidson, Adam Fourney, Saleema Amershi et al.

2025 5 citations