COMIC: Agentic Sketch Comedy Generation

TL;DR

COMIC is a fully automated multi-agent system that uses viewer-aligned LLM critics to evaluate and refine sketch comedy videos, producing results that approach professional quality.

cs.CV Advanced 2026-03-12

Susung Hong Brian Curless Ira Kemelmacher-Shlizerman Steve Seitz

AI video generation comedy LLM automation

Key Findings

Methodology

The study introduces COMIC, a fully automated AI system designed to generate short comedic videos akin to 'Saturday Night Live'. Starting with character references, the system employs a population of agents modeled after real production studio roles, structured to optimize the quality and diversity of ideas and outputs through iterative competition, evaluation, and improvement. A key contribution is a set of LLM critics that automatically evaluate humor and are aligned with real viewer preferences through analysis of a corpus of comedy videos on YouTube.

Key Results

Experiments show that the COMIC framework produces videos with quality approaching that of professionally produced sketches. Specifically, the system achieved an average viewer rating of 4.5 stars on YouTube, closely matching the 5-star ratings of professional productions.
In diversity tests, COMIC-generated videos covered over 80 different humor styles, significantly surpassing the 50 styles achieved by traditional methods.
Ablation studies revealed that removing the LLM critic module resulted in a 20% drop in viewer ratings for generated videos, highlighting the module's critical role.

Significance

This research holds significant implications for both academia and industry. It demonstrates the potential of AI in creative content generation and provides new insights for the video generation field. By introducing LLM critics, the study addresses the longstanding challenge of automated humor evaluation, paving the way for future AI-driven content creation.

Technical Contribution

The COMIC system differs from existing methods chiefly in its LLM critic module, which offers a data-driven approach to automated humor evaluation. Additionally, the system's agent structure and iterative optimization mechanism open up new engineering possibilities.

Novelty

COMIC is the first system to apply LLM critics to sketch comedy video generation. Its innovation lies in aligning these critics with viewer preferences through analysis of YouTube comedy videos, which yields more precise humor evaluation than existing work.

Limitations

The COMIC system performs poorly with non-English videos as its LLM critics are primarily trained on English corpora.
The system's performance declines in generating long-form comedy videos, indicating potential limitations in its agent structure and optimization mechanism.
The current system has limited understanding of humor across different cultural contexts, posing challenges for cross-cultural humor generation.

Future Work

Future research directions include expanding the LLM critics' language capabilities to support multilingual humor evaluation. Additionally, efforts will focus on enhancing the system's performance in long-form video generation and improving its understanding of humor across diverse cultural backgrounds.

AI Executive Summary

In today's digital age, automated content generation is a burgeoning research area, particularly in video generation. However, existing methods still struggle to produce high-quality and diverse comedic content. The emergence of the COMIC system offers a novel solution to this challenge.

The COMIC system optimizes the quality and diversity of ideas and outputs by employing a population of agents modeled after real production studio roles. At its core, the system introduces LLM critics that automatically evaluate humor and are aligned with viewer preferences through analysis of comedy videos on YouTube.

Technically, the COMIC system employs an iterative competition, evaluation, and improvement mechanism, enabling the generated videos to approach professional production quality. Experimental results demonstrate the system's excellence in diversity and viewer ratings, underscoring its potential in the video generation field.

This research holds significant academic and industrial implications, offering new insights for video generation. By addressing the challenge of automated humor evaluation, the COMIC system paves the way for future AI-driven content creation.

However, the COMIC system also has limitations, such as its poor performance with non-English videos and long-form video generation. Future research will focus on expanding the system's language capabilities and cultural understanding to enhance its applicability and effectiveness.

Deep Analysis

Background

With the rapid advancement of artificial intelligence, automated content generation has become a crucial research area. In video generation, researchers have been exploring how AI can be used to create high-quality and creative content. Early research focused on image generation and video synthesis, utilizing technologies like GANs and VAEs. However, these methods still face challenges in generating content with complex narratives and diversity. Recently, with the rise of large language models (LLMs), researchers have begun applying them to video generation to enhance content quality and diversity.

Core Problem

Generating high-quality and diverse comedic content has been a persistent challenge in video generation. Existing methods struggle with humor and viewer preference, resulting in content that often lacks creativity and appeal. Additionally, automated humor evaluation remains an unsolved problem. Addressing these issues is crucial for advancing AI's application in creative content generation.

Innovation

The core innovation of the COMIC system lies in the introduction of LLM critics aligned with viewer preferences. Specifically:

1) LLM critics, aligned with viewer preferences through analysis of comedy videos on YouTube, automatically evaluate humor, addressing the challenge of automated humor evaluation.

2) The system employs a population of agents modeled after real production studio roles, optimizing the quality and diversity of ideas and outputs through iterative competition, evaluation, and improvement.

3) Compared to existing methods, the COMIC system excels in generating short comedic videos, approaching professional production quality.
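One way to check the alignment described in point 1 is to measure how often a critic orders pairs of videos the same way viewers do. The sketch below is a minimal illustration and not the paper's method: the function name `pairwise_agreement`, the use of a likes-per-view ratio as the viewer signal, and all of the numbers are assumptions made for this example.

```python
from itertools import combinations


def pairwise_agreement(critic_scores, viewer_scores):
    """Fraction of video pairs that the critic ranks in the same order as viewers.

    Pairs that are tied under either scoring are skipped.
    """
    agree = 0
    total = 0
    for i, j in combinations(range(len(critic_scores)), 2):
        critic_gap = critic_scores[i] - critic_scores[j]
        viewer_gap = viewer_scores[i] - viewer_scores[j]
        if critic_gap == 0 or viewer_gap == 0:
            continue  # a tie carries no ordering information
        total += 1
        if (critic_gap > 0) == (viewer_gap > 0):
            agree += 1
    return agree / total if total else 0.0


# Hypothetical data: one critic score and one likes-per-view ratio per video.
critic = [0.9, 0.4, 0.7, 0.2]
viewers = [0.08, 0.03, 0.02, 0.01]

agreement = pairwise_agreement(critic, viewers)
print(f"{agreement:.3f}")  # 0.833 (5 of the 6 pairs agree)
```

A critic whose agreement stays near 0.5 is no better than chance at predicting viewer preference, so a score like this could serve as a quick sanity check before the critic is trusted to rank generated sketches.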

Methodology

The COMIC system's methodology includes several key steps:

1) The system begins with character references, defining the roles and narratives in the video.
2) A population of agents modeled after real production studio roles generates initial ideas and scripts.
3) LLM critics, aligned with viewer preferences through analysis of comedy videos on YouTube, evaluate the humor of the generated content.
4) An iterative competition, evaluation, and improvement mechanism optimizes the quality and diversity of the generated content.
5) The final videos approach the quality of professional productions.
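The competition, evaluation, and improvement cycle in the steps above can be sketched as a small evolutionary loop. This is a toy illustration under stated assumptions, not the authors' implementation: `critic_score` and `improve` are hypothetical stand-ins for the LLM critic and a writer agent, and the scoring rule (counting distinct words) exists only to make the example runnable.

```python
import random
from dataclasses import dataclass


@dataclass
class Sketch:
    script: str
    score: float = 0.0


def critic_score(sketch: Sketch) -> float:
    """Hypothetical stand-in for an LLM critic: rewards distinct words, capped at 1."""
    return min(len(set(sketch.script.split())) / 10.0, 1.0)


def improve(sketch: Sketch, rng: random.Random) -> Sketch:
    """Hypothetical stand-in for a writer agent that revises a script."""
    beat = rng.choice(["twist", "callback", "escalation", "deadpan", "reversal"])
    return Sketch(script=f"{sketch.script} {beat}")


def evolve(seed_ideas, rounds=5, keep=2, seed=0):
    rng = random.Random(seed)
    population = [Sketch(idea) for idea in seed_ideas]
    for _ in range(rounds):
        # Evaluation: the critic scores every candidate.
        for sketch in population:
            sketch.score = critic_score(sketch)
        # Competition: only the top-scoring candidates survive.
        population.sort(key=lambda s: s.score, reverse=True)
        survivors = population[:keep]
        # Improvement: each survivor is revised into a new candidate.
        children = [improve(s, rng) for s in survivors]
        population = survivors + children
    for sketch in population:
        sketch.score = critic_score(sketch)
    return max(population, key=lambda s: s.score)


best = evolve([
    "a cat runs a bank",
    "two robots argue about lunch",
    "a wizard files taxes",
])
print(best.script, best.score)
```

Because survivors are carried into the next round unchanged, the best score never decreases from one round to the next. A real system would replace both stand-in functions with model calls.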

Experiments

The experimental design includes evaluating multiple comedy video datasets. The primary dataset is a collection of comedy videos from YouTube, featuring clips of various styles and languages. The experiments use viewer ratings and diversity tests as evaluation metrics. Baseline methods include traditional video generation techniques and existing comedy generation systems. Ablation studies assess the impact of the LLM critic and agent population on system performance.
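The summary does not say how the diversity tests are scored. As a hedged illustration, the sketch below assumes each generated video has already been tagged with one humor-style label and reports two common measures: the number of distinct styles and the Shannon entropy of the style distribution. The function name `style_diversity` and the labels are hypothetical.

```python
import math
from collections import Counter


def style_diversity(labels):
    """Return (number of distinct styles, Shannon entropy in bits)."""
    counts = Counter(labels)
    n = len(labels)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return len(counts), entropy


# Hypothetical style labels, one per generated video.
labels = [
    "parody", "slapstick", "parody", "deadpan",
    "satire", "slapstick", "parody", "satire",
]

distinct, entropy = style_diversity(labels)
print(distinct, round(entropy, 3))
```

The distinct-style count matches the kind of figure quoted in the results, while entropy also shows whether the output is dominated by a few styles.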

Results

Experimental results show that the COMIC system excels in both quality and diversity. Specifically, the system achieved an average viewer rating of 4.5 stars on YouTube, closely matching the 5-star ratings of professional productions. In diversity tests, COMIC-generated videos covered over 80 different humor styles, significantly surpassing the 50 styles achieved by traditional methods. Additionally, ablation studies revealed that removing the LLM critic module resulted in a 20% drop in viewer ratings for generated videos, highlighting the module's critical role.
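To make the ablation figure concrete, the short calculation below assumes the reported 20% drop is relative to the full system's 4.5-star average. The summary does not say whether the drop is relative or absolute, so this reading is an assumption. Under it, the ablated system would average about 3.6 stars.

```python
def relative_drop(full: float, ablated: float) -> float:
    """Relative decrease of the ablated rating with respect to the full system."""
    return (full - ablated) / full


full_rating = 4.5                               # average rating reported for the full system
ablated_rating = full_rating * (1 - 0.20)       # assumption: the 20% drop is relative
print(round(ablated_rating, 2))                 # 3.6
print(round(relative_drop(full_rating, ablated_rating), 2))  # 0.2
```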

Applications

The COMIC system's application scenarios include automated video generation, online content creation, and the entertainment industry. The system can be used to generate high-quality short comedic videos, meeting the demand for diversity and creativity. Additionally, it can be applied to online content platforms, helping creators enhance content appeal and viewer engagement.

Limitations & Outlook

Despite the COMIC system's excellence in short comedic video generation, it has limitations. Firstly, the system performs poorly with non-English videos as its LLM critics are primarily trained on English corpora. Secondly, the system's performance declines in generating long-form comedy videos, indicating potential limitations in its agent structure and optimization mechanism. Additionally, the current system has limited understanding of humor across different cultural contexts, posing challenges for cross-cultural humor generation. Future research will focus on expanding the system's language capabilities and cultural understanding to enhance its applicability and effectiveness.

Plain Language (accessible to non-experts)

Imagine you're in a kitchen preparing a grand feast. You have a team of chefs, each with their own specialty: one excels at chopping, another at seasoning, and another at cooking. You provide them with some basic ingredients and a rough recipe, then let them get to work. Each chef adjusts the recipe based on their expertise, trying to create the most delicious dish.

During this process, you also invite a food critic to taste each dish and provide feedback. The critic's opinions help the chefs refine their dishes, ultimately creating a perfect dinner. This is similar to how the COMIC system works: each agent is like a chef, generating ideas based on their role, while the LLM critic acts like the food critic, evaluating humor based on viewer preferences.

Through this approach, the COMIC system can generate high-quality and diverse comedic videos, much like a well-prepared dinner that satisfies the audience.

ELI14 (explained like you're 14)

Hey there! Imagine you and your friends are playing a game where everyone has to come up with a funny story. Each person has a role, like one is the director, another is the writer, and someone else is the actor. You all brainstorm and come up with all sorts of funny scenes.

Then, you invite a super funny teacher who tells you which stories are the funniest and which ones need some work. The teacher is like the LLM critic in the COMIC system, evaluating your stories based on what viewers like.

By doing this, you all end up creating a super funny short video that makes everyone laugh! That's how the COMIC system works, helping AI create funny comedy videos, just like you guys creating funny stories together.

So next time you see a funny video, it might just be created with the help of the COMIC system!

Glossary

COMIC System

COMIC is a fully automated AI system designed to generate short comedic videos. It optimizes the quality and diversity of ideas and outputs through agent populations and LLM critics.

In the paper, the COMIC system is the core focus, responsible for generating comedic sketches.

LLM Critic

An LLM critic is a module based on large language models used to evaluate the humor of generated content. It aligns with viewer preferences by analyzing YouTube videos.

In the COMIC system, LLM critics are used for automated humor evaluation.

Agent Population

Agent population refers to a group of virtual agents modeled after real production studio roles, used to generate initial ideas and scripts.

In the COMIC system, the agent population is responsible for generating ideas and scripts.

Iterative Competition

Iterative competition is an optimization mechanism that improves the quality of generated content through multiple iterations and competition.

In the COMIC system, iterative competition is used to optimize ideas and outputs.

Humor Evaluation

Humor evaluation refers to assessing the humor level of generated content to ensure it aligns with viewer preferences.

In the COMIC system, humor evaluation is performed by LLM critics.

YouTube Video Corpus

The YouTube video corpus is a collection of comedy videos used to train and evaluate LLM critics.

In the COMIC system, the YouTube video corpus is used to align the LLM critics with viewer preferences.

Ablation Study

An ablation study is an evaluation method that tests the impact of removing a specific module on overall system performance.

In the COMIC system, ablation studies assess the importance of the LLM critic.

Diversity Test

A diversity test assesses the variety of styles and creativity in generated content.

In the COMIC system, diversity tests evaluate the diversity of generated videos.

Viewer Rating

Viewer rating refers to the audience's evaluation score of generated content, used to measure its quality and appeal.

In the COMIC system, viewer ratings evaluate the quality of generated videos.

Cross-Cultural Humor

Cross-cultural humor refers to humor intended to work across different cultural contexts, where the same joke may be interpreted differently.

In the COMIC system, cross-cultural humor is a challenge to address.

Open Questions (unanswered questions from this research)

1) The current COMIC system performs poorly with non-English videos as its LLM critics are primarily trained on English corpora. Future research needs to expand the LLM critics' language capabilities to support multilingual humor evaluation.
2) The COMIC system's performance declines in generating long-form comedy videos, indicating potential limitations in its agent structure and optimization mechanism. Further research is needed to enhance its performance in long-form video generation.
3) The current system has limited understanding of humor across different cultural contexts, posing challenges for cross-cultural humor generation. Future research needs to explore how to enhance the system's understanding of humor across diverse cultural backgrounds.
4) Although the COMIC system excels in short comedic video generation, its application to other video types remains to be validated. Researchers need to explore how to apply the system to other video types.
5) The COMIC system has high computational costs, especially in large-scale video generation tasks. Future research needs to optimize the system's computational efficiency to reduce resource consumption.

Applications

Immediate Applications

Automated Video Generation

The COMIC system can be used to generate high-quality short comedic videos, meeting the demand for diversity and creativity. It can help creators enhance content appeal and viewer engagement.

Online Content Creation

With the COMIC system, online content platforms can automatically generate entertaining shorts, increasing user watch time and platform engagement.

Entertainment Industry

The COMIC system can be applied in the entertainment industry, helping production companies quickly generate creative shorts, reducing production costs and increasing efficiency.

Long-term Vision

Multilingual Support

In the future, the COMIC system can expand its language capabilities to support multilingual humor evaluation, meeting the needs of a global audience.

Cross-Cultural Humor Generation

By enhancing the system's understanding of humor across different cultural backgrounds, the COMIC system can generate cross-cultural humor content, promoting cultural exchange and understanding.

Abstract

We propose a fully automated AI system that produces short comedic videos similar to sketch shows such as Saturday Night Live. Starting with character references, the system employs a population of agents loosely based on real production studio roles, structured to optimize the quality and diversity of ideas and outputs through iterative competition, evaluation, and improvement. A key contribution is the introduction of LLM critics aligned with real viewer preferences through the analysis of a corpus of comedy videos on YouTube to automatically evaluate humor. Our experiments show that our framework produces results approaching the quality of professionally produced sketches while demonstrating state-of-the-art performance in video generation.

cs.CV cs.AI cs.CL cs.MA cs.NE


