COFFAIL: A Dataset of Successful and Anomalous Robot Skill Executions in the Context of Coffee Preparation

TL;DR

The COFFAIL dataset contains successful and anomalous robot skill executions recorded during coffee preparation, and it supports imitation learning.

cs.RO Intermediate 2026-04-20

Alex Mitrevski Ayush Salunke


robot learning anomaly detection imitation learning dataset bimanual manipulation

Key Findings

Methodology

The COFFAIL dataset was collected in a kitchen environment using a bimanual mobile robot named Jessie. It covers seven domestic-related skills, including both successful and anomalous executions. Data was collected through hand-coded scripts and kinaesthetic teaching. Anomalies include missing objects, camera occlusions, and collisions. The dataset records camera images, joint data, end effector positions, and delta actions.
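
The recorded modalities listed above can be pictured as a per-timestep record grouped into labelled episodes. The sketch below is only an illustration: the class and field names (`Step`, `Episode`, `image_paths`, and so on) are hypothetical and do not reflect the released file layout, and it assumes that a delta action is the change in end effector position between consecutive timesteps, which the summary does not spell out.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Step:
    """One recorded timestep (hypothetical layout, not the released schema)."""
    image_paths: List[str]        # one image per camera
    joint_positions: List[float]  # joint data for the arm(s)
    ee_position: Vec3             # end effector position


@dataclass
class Episode:
    skill: str       # e.g. "cup_pickup" (hypothetical label)
    anomalous: bool  # True for an anomalous execution
    steps: List[Step] = field(default_factory=list)


def delta_actions(episode: Episode) -> List[Vec3]:
    """Assumed definition: delta at t = ee_position[t + 1] - ee_position[t]."""
    positions = [s.ee_position for s in episode.steps]
    return [
        tuple(b - a for a, b in zip(p0, p1))
        for p0, p1 in zip(positions, positions[1:])
    ]


# Usage: a three-step episode yields two delta actions.
episode = Episode(
    skill="cup_pickup",
    anomalous=False,
    steps=[
        Step(["cam0_000.png"], [0.0, 0.0], (0.0, 0.0, 0.0)),
        Step(["cam0_001.png"], [0.1, 0.0], (0.5, 0.0, 0.0)),
        Step(["cam0_002.png"], [0.1, 0.2], (0.5, 0.25, 0.0)),
    ],
)
deltas = delta_actions(episode)
```

Keeping the `anomalous` flag at the episode level makes it straightforward to select only successful episodes for policy learning and to hold out anomalous ones for detection experiments.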

Key Results

Result 1: The COFFAIL dataset documents successful and failed instances of seven skills, including cup pickup, pouring, and stirring. The number of episodes per skill is detailed in Table I.
Result 2: A convolutional neural network (CNN) policy was trained on COFFAIL data by imitation learning with a mean squared error (MSE) loss, and its predicted actions were close to the recorded ones.
Result 3: Experimental results demonstrate the potential of the COFFAIL dataset for anomaly detection and failure recovery strategy research.

Significance

The release of the COFFAIL dataset provides a significant resource for the field of robot learning, particularly in anomaly detection and failure recovery. Existing datasets typically focus on successful executions, but COFFAIL addresses this gap by including anomalous data. This is crucial for developing robust robot strategies capable of handling real-world uncertainties and failures. By offering a diverse range of skills and anomalies, COFFAIL aids in advancing robot learning algorithms, especially in domestic automation.

Technical Contribution

The COFFAIL dataset's technical contributions lie in its diversity and practicality. Unlike existing datasets, COFFAIL includes a variety of skills and records anomalous executions, which are essential for developing smarter robotic systems. Additionally, the dataset's design considers bimanual manipulation and multi-camera perspectives, providing rich data support for studying complex robotic operations.

Novelty

The novelty of the COFFAIL dataset lies in its inclusion of both successful and anomalous execution instances across multiple domestic skills. This design makes it an ideal resource for studying imitation learning and anomaly detection, addressing the shortcomings of existing datasets in terms of diversity and practicality.

Limitations

Limitation 1: The COFFAIL dataset was collected primarily in static environments, so it may transfer poorly to dynamic settings.
Limitation 2: The types of anomalies included in the dataset are limited and may not cover all possible failure modes.
Limitation 3: Due to the dataset's scale and complexity, processing and analysis may require significant computational resources.

Future Work

Future research could expand the applications of the COFFAIL dataset, including developing more complex anomaly detection algorithms and failure recovery strategies. Additionally, exploring how to collect data in dynamic environments and how to leverage human-robot interaction to enhance the dataset's diversity and practicality are worthwhile directions.

AI Executive Summary

The release of the COFFAIL dataset brings new possibilities to the field of robot learning, particularly in anomaly detection and failure recovery. Existing datasets typically focus on successful executions, limiting researchers' options when developing robust robot strategies. COFFAIL addresses this gap by including successful and anomalous execution instances across multiple skills.

The COFFAIL dataset was collected in a kitchen environment using a bimanual mobile robot named Jessie. It covers seven domestic-related skills, including cup pickup, pouring, and stirring. The number of episodes per skill is detailed in Table I. Data was collected through hand-coded scripts and kinaesthetic teaching, ensuring diversity and accuracy.

Technically, the COFFAIL dataset records camera images, joint data, end effector positions, and delta actions. These data provide rich resources for studying imitation learning, anomaly detection, and failure recovery. Experimental results show that using the COFFAIL dataset effectively supports imitation learning, with a convolutional neural network (CNN) policy achieving predictions close to real actions.

The release of the COFFAIL dataset is significant for both academia and industry. It offers researchers a diverse resource to help develop smarter, more robust robotic systems. This is particularly important for domestic automation and other applications requiring high reliability.

However, the COFFAIL dataset also has its limitations. Since it is primarily collected in static environments, it lacks data from dynamic scenarios, which may limit its applicability in dynamic settings. Additionally, the types of anomalies included are limited and may not cover all possible failure modes. Future research can overcome these limitations by expanding the dataset's diversity and practicality.

Deep Analysis

Background

Robot skill execution datasets are crucial in the field of robot learning. In recent years, with the advancement of machine learning technologies, more research has focused on improving the flexibility and robustness of robotic operations through data-driven methods. However, existing large-scale learning datasets typically only include successful executions, posing challenges for researchers developing failure detection and recovery techniques. While some datasets include anomalous data, they often focus on industrial applications or specific everyday skills, such as pouring or object handover. The release of the COFFAIL dataset aims to fill this gap by providing a diverse range of skills and anomalies, advancing robot learning algorithms.

Core Problem

Most existing robot learning datasets focus on successful executions, lacking records of failures and anomalies. This limits researchers' options when developing robust robot strategies, as real-world robotic operations inevitably encounter various failures and anomalies. The goal of the COFFAIL dataset is to provide a diverse resource by documenting both successful and anomalous execution instances, helping researchers develop robot strategies that can handle real-world uncertainties and failures.

Innovation

The core innovations of the COFFAIL dataset lie in its diversity and practicality. First, the dataset includes both successful and anomalous execution instances, which are crucial for studying imitation learning and anomaly detection. Second, the dataset covers multiple domestic skills, including bimanual manipulation, providing rich data support for studying complex robotic operations. Additionally, the dataset's design considers multi-camera perspectives, recording camera images, joint data, end effector positions, and delta actions, offering a comprehensive observational perspective for researchers.

Methodology

The COFFAIL dataset was collected in a kitchen environment using a bimanual mobile robot named Jessie.
The dataset covers seven domestic-related skills, including cup pickup, pouring, and stirring.
Data was collected through hand-coded scripts and kinaesthetic teaching, ensuring diversity and accuracy.
Recorded data includes camera images, joint data, end effector positions, and delta actions.
Anomalous instances include missing objects, camera occlusions, and collisions, providing rich resources for anomaly detection research.

Experiments

In the experiments, a convolutional neural network (CNN) policy was trained on COFFAIL data for the cup pickup skill. Training used a mean squared error (MSE) loss with the Adam optimizer and a learning rate of 1e-5. The trained policy predicted actions close to the recorded ones. The experiments also included preliminary work on anomaly detection and failure recovery, indicating the dataset's potential in these areas.
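
To make the training recipe concrete, here is a minimal sketch of imitation learning with an MSE loss and Adam at a learning rate of 1e-5. It is not the paper's implementation: a one-dimensional linear policy stands in for the CNN so the example runs with the standard library alone, and the function name `train_linear_policy` and the toy data are assumptions. The Adam hyperparameters other than the learning rate are the defaults from the Adam paper.

```python
import math
from typing import List, Tuple


def mse(pred: List[float], target: List[float]) -> float:
    """Mean of the squared differences between predictions and targets."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def train_linear_policy(
    obs: List[float],
    actions: List[float],
    lr: float = 1e-5,      # learning rate stated in the summary
    steps: int = 2000,
    beta1: float = 0.9,    # Adam defaults (Kingma and Ba)
    beta2: float = 0.999,
    eps: float = 1e-8,
) -> Tuple[List[float], List[float]]:
    """Fit action ~ w * obs + b with Adam on an MSE loss.

    Returns the parameters [w, b] and the loss recorded before each update.
    """
    params = [0.0, 0.0]  # [w, b]
    m = [0.0, 0.0]       # first-moment estimates
    v = [0.0, 0.0]       # second-moment estimates
    history: List[float] = []
    n = len(obs)
    for t in range(1, steps + 1):
        w, b = params
        pred = [w * x + b for x in obs]
        history.append(mse(pred, actions))
        # Gradients of the MSE loss with respect to w and b.
        errs = [p - a for p, a in zip(pred, actions)]
        grads = [
            2.0 * sum(e * x for e, x in zip(errs, obs)) / n,
            2.0 * sum(errs) / n,
        ]
        for i, g in enumerate(grads):
            m[i] = beta1 * m[i] + (1 - beta1) * g
            v[i] = beta2 * v[i] + (1 - beta2) * g * g
            m_hat = m[i] / (1 - beta1 ** t)  # bias correction
            v_hat = v[i] / (1 - beta2 ** t)
            params[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return params, history


# Usage with toy demonstrations generated from action = 0.5 * obs + 0.1.
obs = [0.0, 1.0, 2.0, 3.0]
actions = [0.5 * x + 0.1 for x in obs]
params, history = train_linear_policy(obs, actions, steps=500)
```

With such a small learning rate each Adam update moves a parameter by roughly 1e-5, so the loss falls slowly; a real run would need many more updates than this toy loop performs.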

Results

Experimental results show that the COFFAIL dataset supports imitation learning: the trained convolutional neural network (CNN) policy predicted actions close to the recorded ones for the cup pickup skill. The anomalous instances in the dataset are a resource for research on anomaly detection and failure recovery. The dataset itself spans multiple skills, including cup pickup, pouring, and stirring, with the number of episodes per skill detailed in Table I.

Applications

Application scenarios for the COFFAIL dataset include domestic automation, anomaly detection, and failure recovery. The diverse skills and anomalous instances provided by the dataset lay the foundation for developing smarter, more robust robotic systems. This is particularly important for applications requiring high reliability, such as home service robots and industrial automation.

Limitations & Outlook

The COFFAIL dataset was collected primarily in static environments, so it may transfer poorly to dynamic settings. The types of anomalies it contains are also limited and may not cover all possible failure modes. Future research can address these limitations by expanding the dataset's diversity and practicality.

Plain Language (accessible to non-experts)

Imagine you're in a kitchen preparing coffee. You need to do many things, like picking up a cup, pouring water, stirring, and so on. Now, imagine there's a robot assistant that can help you with these tasks. The COFFAIL dataset is like this robot's memory bank, recording its successes and failures while performing these tasks. By analyzing these records, researchers can teach the robot how to perform tasks better, just like you improve your skills through observation and practice.

The COFFAIL dataset doesn't just record successful operations; it also includes runs where something went wrong, such as a missing object, a blocked camera view, or a collision. These records are crucial because they help researchers identify where the robot needs improvement. Just as you learn from mistakes during practice, the robot can improve its abilities by analyzing these failures.

In summary, the COFFAIL dataset is like a rich learning resource, helping robots become smarter and more reliable when performing domestic tasks. Through continuous learning and improvement, robots can better adapt to various challenges in the real world.

ELI14 (explained like you're 14)

Hey there, friends! Today we're going to talk about a dataset called COFFAIL. Imagine you're at home making coffee, and sometimes you accidentally spill water or drop a cup. COFFAIL is like a diary kept for a robot, recording its successes and failures while it does coffee-making tasks.

This dataset is super cool because it doesn't just record when the robot succeeds; it also records when it fails. Why is this important? Because by analyzing these failures, scientists can teach the robot how to avoid these mistakes, just like you practice a game until you beat it!

The COFFAIL dataset helps scientists develop smarter robots that can do many things at home, like pouring water, stirring coffee, and more. Imagine one day you could have a robot make breakfast for you—how awesome would that be?

So next time you see a robot, think about the COFFAIL dataset. It's helping these robots become smarter and more reliable, making our lives more convenient and fun!

Glossary

COFFAIL

COFFAIL is a dataset containing successful and anomalous robot skill executions in coffee preparation, aimed at supporting imitation learning and anomaly detection research.

In the paper, COFFAIL is used to train and evaluate robot policies.

Imitation Learning

Imitation learning is a machine learning method where tasks are learned by observing and mimicking the behavior of humans or other agents.

The COFFAIL dataset is used for imitation learning to train robots to perform specific skills.

Anomaly Detection

Anomaly detection is the process of identifying unusual patterns or behaviors in data that do not conform to expected norms.

Anomalous instances in the COFFAIL dataset are used to study anomaly detection algorithms.
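
As a generic illustration of the idea (not a method taken from the paper), one simple baseline fits a threshold on scores from successful executions and flags any episode whose score exceeds it. The score is left abstract here; it could be, for example, a policy's mean action prediction error over an episode. The function names and the choice of three standard deviations are assumptions.

```python
from statistics import mean, stdev
from typing import List


def fit_threshold(nominal_scores: List[float], k: float = 3.0) -> float:
    """Threshold = mean + k * sample standard deviation of nominal scores."""
    return mean(nominal_scores) + k * stdev(nominal_scores)


def is_anomalous(score: float, threshold: float) -> bool:
    """Flag an episode whose score lies above the fitted threshold."""
    return score > threshold


# Usage: scores from successful episodes define what counts as "normal".
nominal = [1.0, 1.2, 0.8, 1.1, 0.9]
threshold = fit_threshold(nominal)
flags = [is_anomalous(s, threshold) for s in (1.1, 3.0)]
```

A dataset with labelled anomalous episodes makes it possible to check how often such a detector misses real anomalies or raises false alarms.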

Bimanual Manipulation

Bimanual manipulation involves using two robotic arms simultaneously to perform tasks, enhancing operational flexibility and efficiency.

The COFFAIL dataset includes skills involving bimanual manipulation, such as pouring.

Convolutional Neural Network (CNN)

A CNN is a deep learning model that excels at processing image data by extracting features through convolutional layers.

In the paper, CNNs are used to train robot policies.
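
The core operation of a convolutional layer can be sketched in a few lines. This is a plain-Python illustration with a hypothetical function name, using no padding and a stride of one; like most deep learning libraries, it computes a cross-correlation (the kernel is not flipped).

```python
from typing import List

Matrix = List[List[float]]


def conv2d_valid(image: Matrix, kernel: Matrix) -> Matrix:
    """Slide `kernel` over `image` and sum the elementwise products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(
                image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh)
                for dj in range(kw)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


# Usage: a horizontal-difference kernel responds where dark meets bright.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
edges = conv2d_valid(image, [[-1, 1]])
# edges == [[0, 1, 0], [0, 1, 0], [0, 1, 0]]
```

In a trained CNN the kernel values are learned rather than hand-picked, and many kernels are stacked across layers to extract progressively more abstract features.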

Mean Squared Error (MSE)

MSE is a loss function that averages the squared differences between predicted and actual values.

MSE is used as the loss function when training CNN policies.
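
For reference, the standard definition can be written as follows. The symbols are assumptions introduced here for illustration: N is the number of training samples, \hat{a}_i is the action predicted by the policy for sample i, and a_i is the recorded action.

```latex
\mathrm{MSE} = \frac{1}{N} \sum_{i=1}^{N} \left\lVert \hat{a}_i - a_i \right\rVert^2
```

Some libraries additionally average over the action dimensions, which only rescales the loss by a constant factor.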

Kinaesthetic Teaching

Kinaesthetic teaching is a method of collecting example data by physically guiding a robot to perform tasks.

Part of the COFFAIL dataset is collected through kinaesthetic teaching.

Static Environment

A static environment refers to a scenario where the environment remains unchanged during data collection.

The COFFAIL dataset is primarily collected in static environments.

Adam Optimizer

Adam is an adaptive learning rate optimization algorithm widely used for training deep learning models.

The Adam optimizer is used when training CNN policies.

End Effector

An end effector is the part of a robotic arm that interacts with the environment.

The COFFAIL dataset records the positions and actions of end effectors.

Open Questions (unanswered questions from this research)

1. Effectively collecting and utilizing data in dynamic environments remains an open question. Existing datasets are primarily collected in static environments, lacking adaptability to dynamic changes. Researchers need to develop new methods to capture and analyze data in dynamic environments to enhance robots' applicability in the real world.
2. Existing anomaly detection algorithms still face limitations when handling complex anomaly patterns. The COFFAIL dataset provides some anomalous instances, but developing smarter algorithms to identify and handle these anomalies remains a challenge.
3. Incorporating human-robot interaction into the process of collecting and using datasets to enhance their diversity and practicality is a direction worth exploring. By interacting with humans, robots can receive more feedback and guidance, improving their learning and adaptability.
4. Effectively processing and analyzing large-scale datasets with limited resources remains a challenge. The scale and complexity of the COFFAIL dataset may require significant computational resources, necessitating the development of more efficient data processing and analysis methods.
5. Further research is needed to develop more complex failure recovery strategies using the COFFAIL dataset. Existing research primarily focuses on anomaly detection, while effectively recovering tasks after detecting anomalies is an important research direction.

Applications

Immediate Applications

Domestic Automation

The COFFAIL dataset provides a foundation for developing domestic automation robots, helping them perform tasks like pouring and stirring more effectively.

Anomaly Detection

By analyzing anomalous instances in the COFFAIL dataset, researchers can develop smarter anomaly detection algorithms to enhance the robustness of robotic systems.

Failure Recovery

The COFFAIL dataset provides rich resources for studying failure recovery strategies, helping robots effectively recover tasks when encountering anomalies.

Long-term Vision

Smart Homes

With the application of the COFFAIL dataset, future smart home systems can better collaborate with robots, enhancing the efficiency and convenience of home automation.

Human-Robot Collaboration

By leveraging the COFFAIL dataset, future human-robot collaboration systems can achieve more efficient interaction and collaboration, improving robots' adaptability and flexibility.

Abstract

In the context of robot learning for manipulation, curated datasets are an important resource for advancing the state of the art; however, available datasets typically only include successful executions or are focused on one particular type of skill. In this short paper, we briefly describe a dataset of various skills performed in the context of coffee preparation. The dataset, which we call COFFAIL, includes both successful and anomalous skill execution episodes collected with a physical robot in a kitchen environment, a couple of which are performed with bimanual manipulation. In addition to describing the data collection setup and the collected data, the paper illustrates the use of the data in COFFAIL to learn a robot policy using imitation learning.


