EVENT5Ws: A Large Dataset for Open-Domain Event Extraction from Documents

TL;DR

EVENT5Ws: A large dataset for open-domain event extraction, manually annotated and statistically verified.

cs.CL · Advanced · 2026-04-24
Praval Sharma Ashok Samal Leen-Kiat Soh Deepti Joshi
event extraction dataset open-domain NLP machine learning

Key Findings

Methodology

This study introduces EVENT5Ws, a large open-domain event extraction dataset that is manually annotated and validated through inter-coder reliability (ICR). The dataset is built with a systematic annotation pipeline covering the five key aspects of events: where, when, what, who, and why. By evaluating state-of-the-art pre-trained large language models, the study establishes a benchmark for future research and shows that models trained on EVENT5Ws generalize effectively to datasets from different geographical contexts.
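As a rough sketch of what a single 5Ws annotation might look like (the field names and record layout below are illustrative assumptions, not the paper's actual schema):

```python
from dataclasses import dataclass

@dataclass
class Event5W:
    """One document-level event annotation under the 5Ws framework.

    Field names are hypothetical; the real EVENT5Ws schema may differ.
    """
    where: str   # location of the event
    when: str    # time expression
    what: str    # free-text event description (open-domain: no fixed type schema)
    who: list    # participants involved
    why: str     # stated or inferred cause / motivation

# Example record for a hypothetical news report
record = Event5W(
    where="Lincoln, Nebraska",
    when="Tuesday morning",
    what="a water main break flooded several downtown streets",
    who=["city utility crews", "downtown businesses"],
    why="aging pipe infrastructure failed under winter pressure",
)
print(record.what)
```

Because 'what' and 'why' are free text rather than labels drawn from a schema, they are naturally harder to match against a gold answer, which is consistent with the reported results.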

Key Results

  • Result 1: Pre-trained large language models evaluated on EVENT5Ws perform well on 'where', 'when', and 'who', but struggle with 'what' and 'why', highlighting the challenges of open-domain event extraction.
  • Result 2: Models trained on EVENT5Ws generalize well to datasets from different geographical and textual contexts, demonstrating the dataset's potential for developing generalizable open-domain event extraction algorithms.

Significance

The EVENT5Ws dataset addresses a gap in open-domain event extraction, where large-scale, manually verified datasets have been lacking. It provides a new benchmark for researchers, and models trained on it generalize across different geographical contexts, advancing natural language understanding and downstream applications. Through a systematic annotation process and an evaluation of existing models, the study also offers practical recommendations for future large-scale dataset development.

Technical Contribution

The technical contributions of this study include: 1) the development of a large-scale open-domain event extraction dataset, EVENT5Ws, 2) the proposal of a systematic annotation pipeline, providing empirical insights into annotation complexity, and 3) the evaluation of state-of-the-art pre-trained large language models, establishing a benchmark for future open-domain event extraction research.

Novelty

EVENT5Ws is the first large-scale manually annotated and ICR-verified open-domain event extraction dataset. Unlike existing datasets based on predefined event schemas, EVENT5Ws employs the 5Ws framework, supporting the extraction of unconstrained event types and offering new possibilities for natural language understanding.

Limitations

  • Limitation 1: Although models trained on EVENT5Ws generalize across geographical contexts, 'what' and 'why' remain difficult aspects and may require more sophisticated algorithms to improve performance.
  • Limitation 2: The dataset's creation relies on manual annotation, which, despite ICR verification, may still contain subjective biases.
  • Limitation 3: Because the dataset is primarily drawn from news reports, models trained on it may transfer less effectively to other types of documents.

Future Work

Future directions include: 1) developing more complex algorithms to improve performance on 'what' and 'why', 2) expanding the dataset to cover more types of documents and events, and 3) exploring automated annotation methods to reduce subjective bias in manual annotations.

AI Executive Summary

Event extraction involves identifying the central aspects of events from text, which is crucial for improving situational awareness, emergency management, and decision-making. However, existing datasets have limitations in event type coverage and lack large-scale manually verified datasets in open-domain settings.

To address these issues, researchers have developed EVENT5Ws, a large-scale, manually annotated, and inter-coder reliability (ICR) verified open-domain event extraction dataset. This dataset employs the 5Ws framework, covering the five key aspects of events: where, when, what, who, and why, and follows a systematic annotation pipeline.

By evaluating existing pre-trained large language models, the study establishes a benchmark for future research. Experimental results show that models perform well on 'where', 'when', and 'who', but struggle with 'what' and 'why', highlighting the challenges of open-domain event extraction.

Furthermore, the study demonstrates the models' generalization capabilities across different geographical and textual contexts, showing the potential of the EVENT5Ws dataset for developing generalizable algorithms. These experiments also yield practical insights and recommendations for future large-scale dataset development.

Although models trained on EVENT5Ws generalize across geographical contexts, 'what' and 'why' remain challenging and may require more sophisticated algorithms. Future directions include developing such algorithms, expanding the dataset to cover more types of documents and events, and exploring automated annotation methods to reduce subjective bias in manual annotation.

Deep Analysis

Background

Event extraction is a crucial task in the field of natural language processing, aiming to identify the central aspects of events from text. With the increasing volume of information, automated event extraction becomes increasingly important, especially in emergency management and decision-making. Existing datasets are mostly closed-domain, based on predefined event schemas, limiting the algorithms' generalization capabilities to unseen event types. Moreover, there is a lack of large-scale manually verified datasets in open-domain settings, limiting the development of deep learning methods. To address this, researchers have developed EVENT5Ws, a large-scale, manually annotated, and ICR-verified open-domain event extraction dataset, aiming to support the extraction of unconstrained event types.

Core Problem

Existing event extraction datasets are primarily closed-domain, using predefined event schemas, limiting the algorithms' generalization capabilities to unseen event types. Moreover, there is a lack of large-scale manually verified datasets in open-domain settings, limiting the development of deep learning methods. As real-world events are diverse and constantly evolving, enumerating all possible event types is impractical, necessitating the development of open-domain event extraction datasets that support the extraction of unconstrained event types.

Innovation

The development of the EVENT5Ws dataset has the following innovations: 1) It employs the 5Ws framework, covering the five key aspects of events: where, when, what, who, and why, supporting the extraction of unconstrained event types; 2) It follows a systematic annotation pipeline and is ICR-verified, ensuring the quality and reliability of the dataset; 3) It provides a new benchmark for evaluating existing pre-trained large language models on open-domain event extraction tasks.

Methodology

  • Coder Selection and Recruitment: Coders familiar with the geographical and cultural context were recruited through university courses, mailing lists, and student associations.

  • Annotation Platform: Dataturks, an open-source web application, was used for text annotation.

  • Annotation Guidelines: Clear guidelines were created, along with illustrative examples to help coders understand the task.

  • Resolution Policy: A policy was established to systematically handle disagreements among coders, ensuring the final dataset's accuracy and reliability.

  • Dataset Construction Process: Consisted of four steps: training, dataset preparation, annotation, and resolution of disagreements.
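The resolution step could work along the lines of the following sketch. Majority voting with escalation is an assumed policy for illustration; the paper's actual resolution rules are not reproduced here:

```python
from collections import Counter
from typing import Optional

def resolve_annotation(coder_answers: list) -> Optional[str]:
    """Resolve one event aspect from multiple coders' answers.

    Assumed policy (hypothetical): accept the strict majority answer;
    return None when no majority exists, signalling that the item
    should be escalated to an adjudicator for manual resolution.
    """
    if not coder_answers:
        return None
    answer, freq = Counter(coder_answers).most_common(1)[0]
    return answer if freq > len(coder_answers) / 2 else None

print(resolve_annotation(["flood", "flood", "storm"]))  # majority -> flood
print(resolve_annotation(["flood", "storm", "fire"]))   # no majority -> None
```

Free-text aspects like 'what' and 'why' rarely match exactly across coders, so a real policy would likely compare normalized or span-overlapping answers rather than raw strings.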

Experiments

The experimental design includes evaluating several state-of-the-art pre-trained large language models on the EVENT5Ws dataset, including Gemma 3, Llama 3.1, Qwen 3, Mistral v0.3, and T5 Large. Experiments were conducted under zero-shot and five-shot prompting, using metrics such as exact match and ROUGE-L to evaluate models' performance on the five aspects: where, when, what, who, and why. Experiments also included testing generalization capabilities across different geographical and textual contexts.
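A zero-shot setup of this kind might be sketched as follows. The prompt wording and question templates are assumptions for illustration, not the paper's actual prompts; five-shot prompting would prepend solved examples in the same format:

```python
# Hypothetical question templates for the five aspects; the paper's
# actual prompt wording is not reproduced here.
QUESTIONS = {
    "where": "Where did the event take place?",
    "when": "When did the event occur?",
    "what": "What happened?",
    "who": "Who was involved?",
    "why": "Why did it happen?",
}

def build_zero_shot_prompt(document: str, aspect: str) -> str:
    """Build a zero-shot extraction prompt for one of the 5W aspects."""
    return (
        "Read the document and answer the question about its main event.\n\n"
        f"Document: {document}\n\n"
        f"Question: {QUESTIONS[aspect]}\n"
        "Answer:"
    )

doc = "Heavy rain flooded downtown Omaha on Friday, closing two schools."
print(build_zero_shot_prompt(doc, "where"))
```

The model's completion would then be scored with exact match for the more constrained aspects ('where', 'when', 'who') and ROUGE-L for the free-text aspects ('what' and 'why').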

Results

Experimental results show that pre-trained large language models perform well on 'where', 'when', and 'who', but struggle with 'what' and 'why', highlighting the challenges of open-domain event extraction. Models also generalize well across datasets from different geographical and textual contexts, demonstrating the potential of EVENT5Ws for developing generalizable algorithms. These experiments yield practical insights and recommendations for future large-scale dataset development.

Applications

Application scenarios for the EVENT5Ws dataset include: 1) Supporting the development of more complex algorithms to improve performance in open-domain event extraction; 2) Serving as a benchmark for evaluating existing pre-trained large language models; 3) Testing models' generalization capabilities across different geographical and textual contexts, advancing natural language understanding and downstream applications.

Limitations & Outlook

Although models trained on EVENT5Ws generalize across geographical contexts, 'what' and 'why' remain challenging and may require more sophisticated algorithms. The dataset's creation also relies on manual annotation, which, despite ICR verification, may still contain subjective biases. And because the dataset is primarily drawn from news reports, models trained on it may transfer less effectively to other types of documents. Future directions include developing more sophisticated algorithms, expanding the dataset to cover more document and event types, and exploring automated annotation methods to reduce subjective bias.

Plain Language (accessible to non-experts)

Imagine you're in a kitchen preparing a meal. Event extraction is like picking out the ingredients you need from a pantry to make a dish. You need to know where to find these ingredients (where), when you need them (when), what dish you're making (what), who is helping you (who), and why you're making this dish (why). In this study, researchers developed a dataset called EVENT5Ws, which is like a detailed recipe that helps you better select and use these ingredients. This dataset is manually annotated and verified, ensuring that the ingredients you pick are correct and can be used in different kitchens (geographical contexts). However, some ingredients might be hard to find, like special spices (what and why), which require more complex skills to handle. In the future, we hope to develop better tools to help you find and use these ingredients more easily in the kitchen.

ELI14 (explained like you're 14)

Hey there! Have you ever wondered how news stories are written? Well, every news story has some key questions: where did it happen? When did it happen? What happened? Who was involved? And why did it happen? These questions are like a big detective story! Researchers have developed a dataset called EVENT5Ws to help computers find these answers in news stories, just like a detective finding clues. This dataset is like a super detailed detective guide, helping computers find clues faster and more accurately. But sometimes, computers face tough challenges, like figuring out why something happened, which is like solving a complex puzzle. In the future, we hope to make computers even smarter to help them solve these puzzles better!

Glossary

Event Extraction

The process of identifying and extracting the central aspects of events from text, such as time, location, and participants.

Used in the paper to describe the process of extracting event information from documents.

Open-Domain

Datasets or algorithms that are not restricted to specific domains or event types.

Describes the EVENT5Ws dataset's support for extracting unconstrained event types.

5Ws Framework

A framework for describing events using five key aspects: where, when, what, who, and why.

Used in the paper to guide the annotation process of the dataset.

Inter-Coder Reliability (ICR)

A metric used to assess the consistency of annotations between different coders.

Used to verify the quality of annotations in the EVENT5Ws dataset.
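For two coders over nominal labels, one common ICR statistic is Cohen's kappa, sketched below. This is illustrative only: the paper's actual ICR statistic is not stated here, and its reference list includes Krippendorff's work on multi-valued coding, which suggests a different (alpha-style) measure may have been used:

```python
def cohens_kappa(coder_a: list, coder_b: list) -> float:
    """Cohen's kappa: chance-corrected agreement between two coders.

    kappa = (p_observed - p_expected) / (1 - p_expected)
    """
    assert coder_a and len(coder_a) == len(coder_b)
    n = len(coder_a)
    # Raw agreement rate.
    p_observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Agreement expected by chance, from each coder's label distribution.
    labels = set(coder_a) | set(coder_b)
    p_expected = sum(
        (coder_a.count(lab) / n) * (coder_b.count(lab) / n) for lab in labels
    )
    if p_expected == 1.0:  # degenerate case: both coders always agree by chance
        return 1.0
    return (p_observed - p_expected) / (1 - p_expected)

print(cohens_kappa(["x", "x", "y", "y"], ["x", "x", "y", "x"]))  # -> 0.5
```

Kappa of 1.0 means perfect agreement beyond chance; values near 0 mean agreement no better than chance.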

Pre-trained Large Language Models

Language models trained on large-scale data, capable of performing various natural language processing tasks.

Evaluated for performance on the EVENT5Ws dataset.

Exact Match (EM)

An evaluation metric that determines whether a predicted result exactly matches the corresponding gold standard answer.

Used to evaluate model performance on the EVENT5Ws dataset.
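A minimal exact-match scorer might look like the following. The normalization steps (lowercasing, whitespace collapsing) are assumptions; the paper's exact normalization, e.g. punctuation or article handling, may differ:

```python
def exact_match(prediction: str, gold: str) -> float:
    """Binary exact-match score after light normalization (assumed)."""
    def normalize(s: str) -> str:
        # Lowercase and collapse runs of whitespace.
        return " ".join(s.strip().lower().split())
    return float(normalize(prediction) == normalize(gold))

print(exact_match("  Lincoln,  Nebraska ", "lincoln, nebraska"))  # -> 1.0
print(exact_match("Omaha", "Lincoln"))                            # -> 0.0
```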

ROUGE-L

A text evaluation metric based on the longest common subsequence, used to assess the quality of generated text.

Used to evaluate model performance on 'what' and 'why'.
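A simplified word-level ROUGE-L F-score can be computed from the longest common subsequence, as sketched below. The full ROUGE package adds preprocessing (e.g. stemming) that is omitted here:

```python
def _lcs_length(a: list, b: list) -> int:
    """Length of the longest common subsequence via dynamic programming."""
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[m][n]

def rouge_l_f1(prediction: str, reference: str) -> float:
    """Word-level ROUGE-L F1: harmonic mean of LCS precision and recall."""
    pred, ref = prediction.lower().split(), reference.lower().split()
    if not pred or not ref:
        return 0.0
    lcs = _lcs_length(pred, ref)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(pred), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)

print(rouge_l_f1("the cat sat", "the cat sat down"))  # ~0.857 (6/7)
```

Unlike exact match, ROUGE-L gives partial credit for overlapping word order, which is why it suits free-text aspects like 'what' and 'why'.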

Dataset

A collection of data used for training and evaluating algorithms.

EVENT5Ws is a dataset for open-domain event extraction.

Manual Annotation

The process of marking and annotating data by human coders.

Used to create the EVENT5Ws dataset.

Generalization Ability

The ability of a model to perform well on unseen data or in different contexts.

Describes model performance across different geographical and textual contexts.

Open Questions (unanswered questions from this research)

  • 1 The challenge of open-domain event extraction lies in improving model performance on 'what' and 'why'. Existing models perform poorly in these areas, potentially requiring more complex algorithms and richer datasets to improve performance.
  • 2 Although the EVENT5Ws dataset demonstrates good generalization capabilities across different geographical contexts, its performance on other types of documents remains to be further verified. This requires expanding the dataset to cover more types of documents and events.
  • 3 Manual annotation, although verified by ICR, may still contain subjective biases. Future research could explore automated annotation methods to reduce subjective bias in manual annotations.
  • 4 Existing pre-trained large language models have limited performance on open-domain event extraction tasks, potentially requiring the development of specialized algorithms to improve performance. This requires further research and optimization of the models.
  • 5 When dealing with complex events, models may struggle to identify implicit causal relationships. This requires the development of more complex algorithms to improve models' ability to recognize and understand complex events.

Applications

Immediate Applications

Emergency Management

The EVENT5Ws dataset can be used to develop more complex algorithms to improve event identification and information extraction capabilities in emergency management.

News Analysis

By using the EVENT5Ws dataset, news organizations can more accurately analyze and report events, improving the quality and efficiency of news reporting.

Natural Language Understanding

The EVENT5Ws dataset provides a new benchmark for natural language understanding, advancing the development and application of related algorithms.

Long-term Vision

Intelligent Decision Support Systems

By improving the accuracy and efficiency of event extraction, the EVENT5Ws dataset is expected to drive the development of intelligent decision support systems, providing more accurate information services across industries.

Cross-Cultural Information Exchange

The generalization capabilities of the EVENT5Ws dataset can facilitate cross-cultural information exchange, improving information understanding and sharing across different cultural contexts.

Abstract

Event extraction identifies the central aspects of events from text. It supports event understanding and analysis, which is crucial for tasks such as informed decision-making in emergencies. Therefore, it is necessary to develop automated event extraction approaches. However, existing datasets for algorithm development have limitations, including limited coverage of event types in closed-domain settings and a lack of large, manually verified datasets in open-domain settings. To address these limitations, we create EVENT5Ws, a large, manually annotated, and statistically verified open-domain event extraction dataset. We design a systematic annotation pipeline to create the dataset and provide empirical insights into annotation complexity. Using EVENT5Ws, we evaluate state-of-the-art pre-trained large language models and establish a benchmark for future research. We further show that models trained on EVENT5Ws generalize effectively to datasets from different geographical contexts, which demonstrates its potential for developing generalizable algorithms. Finally, we summarize the lessons learned during the dataset development and provide recommendations to support future large-scale dataset development.


References (20)

DocEE: A Large-Scale and Fine-grained Benchmark for Document-level Event Extraction

Meihan Tong, Bin Xu, Shuai Wang et al.

2022 58 citations ⭐ Influential

Use of Ranks in One-Criterion Variance Analysis

W. Kruskal, W. A. Wallis

1952 12480 citations ⭐ Influential

ROUGE: A Package for Automatic Evaluation of Summaries

Chin-Yew Lin

2004 20065 citations ⭐ Influential

Open-Domain Event Detection using Distant Supervision

J. Araki, T. Mitamura

2018 43 citations ⭐ Influential

Giveme5W1H: A Universal System for Extracting Main Events from News Articles

Felix Hamborg, Corinna Breitinger, Bela Gipp

2019 52 citations ⭐ Influential

Experiments with crowdsourced re-annotation of a POS tagging data set

Dirk Hovy, Barbara Plank, Anders Søgaard

2014 49 citations

Multi-Sentence Argument Linking

Seth Ebner, Patrick Xia, Ryan Culkin et al.

2019 203 citations

Utility data annotation with Amazon Mechanical Turk

A. Sorokin, D. Forsyth

2008 713 citations

Open Domain Event Extraction Using Neural Latent Variable Models

Xiao Liu, Heyan Huang, Yue Zhang

2019 64 citations

Augmenting Open-Domain Event Detection with Synthetic Data from GPT-2

Amir Pouran Ben Veyseh, Minh Nguyen, Bonan Min et al.

2021 19 citations

“Making the News”: Identifying Noteworthy Events in News Articles

Shyam Upadhyay, Christos Christodoulopoulos, Dan Roth

2016 13 citations

Open domain event extraction from twitter

Alan Ritter, Mausam, Oren Etzioni et al.

2012 659 citations

The Reliability of Multi-Valued Coding of Data

K. Krippendorff, R. Craggs

2016 38 citations

Topic Detection and Tracking Pilot Study Final Report

James Allan, J. Carbonell, G. Doddington et al.

1998 1204 citations

Citizen Science for Mining the Biomedical Literature

Ginger Tsueng, Steven M. Nanis, Jennifer T. Fouquier et al.

2016 26 citations

MEANTIME, the NewsReader Multilingual Event and Time Corpus

Anne-Lyse Minard, Manuela Speranza, Ruben Urizar et al.

2016 147 citations

Lessons Learned from a Citizen Science Project for Natural Language Processing

Jan-Christoph Klie, Ji-Ung Lee, Kevin Stowe et al.

2023 5 citations

Spatiotemporal event detection: a review

Manzhu Yu, M. Bambacus, G. Cervone et al.

2020 93 citations

Literary Event Detection

Matthew Sims, Jongho Park, David Bamman

2019 98 citations

Overview of Linguistic Resources for the TAC KBP 2017 Evaluations: Methodologies and Results

Jeremy Getman, Joe Ellis, Zhiyi Song et al.

2017 30 citations