TextSeal: A Localized LLM Watermark for Provenance & Distillation Protection

Key Findings

Methodology

TextSeal builds upon Gumbel-max sampling by introducing dual-key generation to restore output diversity and entropy-weighted scoring for improved detection. It supports speculative decoding and multi-token prediction without adding inference overhead. TextSeal significantly outperforms baselines like SynthID-text in detection strength and maintains confident localized detection in mixed human/AI documents.
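As a rough illustration of the Gumbel-max foundation mentioned above, the sketch below implements the classic single-key Gumbel-max watermark in plain Python. It is not TextSeal's dual-key scheme, whose details this summary does not give. The function names, the SHA-256 keyed hash, the 4-token context window, and the uniform 50-token toy "model" are all assumptions made for the example.

```python
import hashlib
import math
import random

def keyed_uniform(key, context, token_id):
    """Pseudorandom value in (0, 1) derived from the key, the context and a token id."""
    message = f"{key}|{','.join(map(str, context))}|{token_id}".encode()
    digest = hashlib.sha256(message).digest()
    integer = int.from_bytes(digest[:8], "big") >> 12  # keep 52 bits
    return (integer + 0.5) / 2**52  # strictly between 0 and 1

def watermarked_choice(probs, key, context):
    """Pick argmax_i log(r_i) / p_i, the keyed form of Gumbel-max sampling."""
    best_token, best_value = -1, -math.inf
    for token_id, p in enumerate(probs):
        if p <= 0.0:
            continue  # zero-probability tokens can never be selected
        value = math.log(keyed_uniform(key, context, token_id)) / p
        if value > best_value:
            best_token, best_value = token_id, value
    return best_token

def generate(key, prompt, length, vocab_size=50, window=4):
    """Toy generator (assumption): the model is uniform over the vocabulary."""
    tokens = list(prompt)
    probs = [1.0 / vocab_size] * vocab_size
    while len(tokens) < length:
        tokens.append(watermarked_choice(probs, key, tokens[-window:]))
    return tokens

def z_score(tokens, key, window=4):
    """Each term -log(1 - r) is Exp(1) without the watermark, so z is near 0."""
    total, scored = 0.0, 0
    for position in range(window, len(tokens)):
        context = tokens[position - window:position]
        r = keyed_uniform(key, context, tokens[position])
        total += -math.log(1.0 - r)
        scored += 1
    return (total - scored) / math.sqrt(scored)

def random_tokens(seed, length, vocab_size=50):
    """Unwatermarked baseline: tokens drawn without any key."""
    rng = random.Random(seed)
    return [rng.randrange(vocab_size) for _ in range(length)]
```

Under these assumptions, `z_score(generate("secret", [1, 2, 3, 4], 200), "secret")` should come out strongly positive, while `z_score(random_tokens(0, 200), "secret")` should stay close to zero.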

Key Results

TextSeal was evaluated across multiple reasoning benchmarks and preserves downstream performance. On the MATH benchmark, for instance, it scored 79.8, the same as the non-watermarked condition, which is consistent with the scheme being distortion-free.
In a multilingual human evaluation (6000 A/B comparisons across 5 languages), TextSeal showed no perceptible quality difference compared to non-watermarked outputs.
Experiments also showed that TextSeal's watermark signal transfers through model distillation, which enables detection of unauthorized use and opens new possibilities for data protection.

Significance

TextSeal's significance lies in providing an effective watermarking scheme for large language models that achieves efficient content detection without affecting generation quality. This is particularly important for production systems that must comply with regulations requiring machine-detectable marking of AI outputs. Additionally, TextSeal's 'radioactive' property allows its watermark signal to persist through model distillation, offering a novel technical means to prevent unauthorized model use.

Technical Contribution

TextSeal addresses the determinism issue of traditional Gumbel-max methods through innovative dual-key generation and entropy-weighted scoring mechanisms, enhancing detection robustness. Its multi-region localization technique significantly boosts detection capability in mixed documents. Furthermore, TextSeal's design supports speculative decoding and multi-token prediction without adding inference overhead, enabling large-scale production deployment.

Novelty

TextSeal is the first to introduce dual-key generation and entropy-weighted scoring in LLM watermarking. Compared to existing methods, it significantly enhances detection strength and robustness without affecting generation quality. Its multi-region localization technique is also novel, enabling more precise watermark detection in mixed documents.

Limitations

TextSeal's detection capability may decrease in extremely low-entropy environments, where high-confidence model outputs can weaken the watermark signal.
While TextSeal performs well in multilingual settings, its applicability to specific languages or domains may require further validation.
In some complex generation tasks, TextSeal may require parameter adjustments to ensure optimal performance.

Future Work

Future research directions include optimizing TextSeal's performance in low-entropy environments and exploring its application to more languages and domains. Investigating how the watermark signal transfers during model distillation, and maintaining efficient watermark detection in more complex generation tasks, are also promising areas for exploration.

AI Executive Summary

The widespread adoption of large language models (LLMs) has raised significant concerns about content provenance and unauthorized model use. Existing watermarking techniques often face a trade-off between detection strength and generation diversity, making it challenging to achieve efficient detection without compromising output quality.

TextSeal introduces a novel watermarking scheme for LLMs, addressing the determinism issue of traditional methods through dual-key generation and entropy-weighted scoring. Its design supports speculative decoding and multi-token prediction without adding inference overhead, making it suitable for large-scale production deployment.

At the core of TextSeal are dual-key generation based on Gumbel-max sampling and entropy-weighted scoring, which enhance detection strength while preserving generation diversity and quality. Additionally, TextSeal's multi-region localization technique significantly improves detection capability in mixed documents.

Experimental results demonstrate that TextSeal preserves downstream performance across multiple reasoning benchmarks and shows no perceptible quality difference in a multilingual human evaluation. This indicates that TextSeal achieves efficient content detection without affecting generation quality.

TextSeal's 'radioactive' property allows its watermark signal to persist through model distillation, providing a novel technical means to prevent unauthorized model use. This is particularly important for production systems that must comply with regulations requiring machine-detectable marking of AI outputs.

Despite its strong performance in detection strength and generation diversity, TextSeal's performance in extremely low-entropy environments requires further optimization. Future research directions include enhancing its performance in such environments and exploring its application in more languages and domains.

Deep Analysis

Background

With the rapid advancement of AI technologies, large language models (LLMs) have become widely used in various applications. However, this has also raised new challenges in ensuring content provenance and preventing unauthorized model use. Traditional watermarking techniques often face a trade-off between detection strength and generation diversity, making it difficult to achieve efficient detection without compromising output quality. For example, methods like SynthID-text perform well in certain scenarios but struggle to maintain effective detection in mixed documents. Additionally, existing methods often fail to maintain watermark signals during model distillation, limiting their potential for data protection applications. Therefore, developing a watermarking scheme that can achieve efficient detection without affecting generation quality is a critical research direction.

Core Problem

The widespread use of LLMs has raised issues of content provenance and model use authorization. Existing watermarking techniques face a trade-off between detection strength and generation diversity, making it challenging to achieve efficient detection without compromising output quality. Additionally, maintaining watermark signals during model distillation is an unsolved problem. Solving these issues is crucial for production systems that must comply with regulations requiring machine-detectable marking of AI outputs.

Innovation

TextSeal addresses the limitations of existing watermarking techniques through the following innovations:

- Dual-key generation: restores the output diversity that the determinism of plain Gumbel-max watermarking removes, without affecting generation quality.

- Entropy-weighted scoring: scores tokens according to local text entropy, which improves detection strength.

- Multi-region localization: locates watermarked regions inside mixed human/AI documents, so detection stays confident even when the watermarked text is heavily diluted.

Methodology

TextSeal combines the following components:

- Dual-key generation based on Gumbel-max sampling: restores output diversity without affecting generation quality.

- Entropy-weighted scoring: scores tokens according to local text entropy, which improves detection strength.

- Multi-region localization: locates watermarked regions inside mixed human/AI documents.

- Support for speculative decoding and multi-token prediction: the design is compatible with these serving optimizations and adds no inference overhead, making it suitable for large-scale production deployment.

Experiments

TextSeal was evaluated across multiple reasoning benchmarks, including MATH, GSM8K, and HumanEval. Results show that it preserves downstream performance, and a multilingual human evaluation found no perceptible quality difference. TextSeal's watermark signal also transfers through model distillation, which enables detection of unauthorized use and opens new possibilities for data protection.

Results

Experimental results show that TextSeal preserves downstream performance across multiple reasoning benchmarks. On the MATH benchmark, for instance, it scored 79.8, the same as the non-watermarked condition, which is consistent with the scheme being distortion-free. In a multilingual human evaluation (6000 A/B comparisons across 5 languages), TextSeal showed no perceptible quality difference compared to non-watermarked outputs. Experiments also showed that the watermark signal transfers through model distillation, which enables detection of unauthorized use and opens new possibilities for data protection.

Applications

TextSeal's application scenarios include:

- Content provenance detection: In production systems that must comply with regulations, TextSeal can provide machine-detectable marking of AI-generated content, enhancing content traceability.

- Prevention of unauthorized model use: TextSeal's 'radioactive' property allows its watermark signal to persist through model distillation, offering a novel technical means to prevent unauthorized model use.

- Data protection: By achieving efficient detection without affecting generation quality, TextSeal provides new possibilities for data protection, especially in scenarios requiring protection of generated data.

Limitations & Outlook

Despite TextSeal's strong performance in detection strength and generation diversity, its performance in extremely low-entropy environments requires further optimization. In such cases, high-confidence model outputs can weaken the watermark signal. Additionally, while TextSeal performs well in multilingual settings, its applicability to specific languages or domains may require further validation. In some complex generation tasks, TextSeal may require parameter adjustments to ensure optimal performance. Future research directions include optimizing its performance in low-entropy environments and exploring its application in more languages and domains.
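The low-entropy limitation can be made concrete with a small simulation. The sketch below is an illustration for a generic Gumbel-max watermark, not TextSeal's own detector: it draws the per-token values `r` from an ordinary seeded random generator instead of a keyed hash, and the function name and trial count are assumptions. It estimates the average detection score `-log(1 - r)` of the selected token, which equals 1 when no watermark signal is present.

```python
import math
import random

def mean_watermark_score(probs, trials=20000, seed=0):
    """Average -log(1 - r) of the token chosen by the Gumbel-max rule."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        best_r, best_value = 0.0, -math.inf
        for p in probs:
            if p <= 0.0:
                continue  # zero-probability tokens are never chosen
            r = rng.random()
            while r == 0.0:  # avoid log(0)
                r = rng.random()
            value = math.log(r) / p  # argmax of r ** (1 / p)
            if value > best_value:
                best_r, best_value = r, value
        total += -math.log(1.0 - best_r)
    return total / trials

# A confident (low-entropy) next-token distribution versus a flat one.
peaked = mean_watermark_score([0.98, 0.01, 0.01])
flat = mean_watermark_score([0.25, 0.25, 0.25, 0.25])
```

With a peaked distribution the dominant token wins almost regardless of `r`, so `peaked` stays close to the no-watermark value of 1, whereas `flat` is clearly larger. That gap is the signal a detector relies on.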

Plain Language: Accessible to non-experts

Imagine you're in a kitchen, cooking a meal. You have a secret recipe (like TextSeal's dual-key) that makes your dishes unique and delicious. Every time you cook, you use this recipe to ensure the taste is consistent but still diverse, just like how TextSeal maintains diversity and quality in generated content.

In the kitchen, you also adjust the amount of seasoning based on different ingredients (like different text environments), ensuring each dish reaches its best flavor. This method ensures that even in different situations, your dishes remain tasty and appealing.

Moreover, when you need to use the same seasoning across multiple dishes, you carefully distribute it to ensure each dish has enough seasoning (similar to TextSeal's multi-region localization technique). This way, even in a complex banquet, each of your dishes can be recognized and appreciated by guests.

In summary, TextSeal is like a smart chef who gives every dish a recognizable signature without changing how it tastes.

ELI14: Explained like you're 14

Hey there! Did you know there's something called TextSeal that works like an invisible signature for computer-written articles?

Imagine you're playing a game and you give your character an invisible cloak so others can't see you, but you know where you are. That's what TextSeal does! It adds a secret mark to computer-generated text, so even if it's mixed with a bunch of human-written stuff, we can still find it.

The coolest part is that this mark doesn't make the text weird or ugly, just like the invisible cloak doesn't make your character clumsy. It can even work in different languages and situations, super cool!

Sometimes, in really complicated cases, this mark might be a bit hard to find, like looking for a tiny treasure on a huge map. But don't worry, scientists are working hard to make it even better!

Glossary

Gumbel-max sampling

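A way to sample from a categorical distribution: add independent Gumbel noise to each log-probability and take the argmax. Watermarking schemes replace the noise with values derived from a secret key, so a detector holding the key can recompute them later.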

Used in TextSeal for dual-key generation.
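For readers unfamiliar with the trick, here is a minimal sketch of plain (unkeyed) Gumbel-max sampling using only the Python standard library. The function names are illustrative, and nothing here is specific to TextSeal.

```python
import math
import random

def gumbel_max_sample(probs, rng):
    """Draw one index from `probs` by adding Gumbel noise to log-probabilities."""
    best_index, best_value = -1, -math.inf
    for index, p in enumerate(probs):
        if p <= 0.0:
            continue  # zero-probability entries can never win
        u = rng.random()
        while u == 0.0:  # avoid log(0)
            u = rng.random()
        gumbel = -math.log(-math.log(u))  # standard Gumbel(0, 1) noise
        value = math.log(p) + gumbel
        if value > best_value:
            best_index, best_value = index, value
    return best_index

def empirical_frequencies(probs, draws, seed=0):
    """Repeat the draw many times to check that it follows `probs`."""
    rng = random.Random(seed)
    counts = [0] * len(probs)
    for _ in range(draws):
        counts[gumbel_max_sample(probs, rng)] += 1
    return [count / draws for count in counts]
```

Calling `empirical_frequencies([0.5, 0.3, 0.2, 0.0], 20000)` should return frequencies close to the input probabilities, which is the property that lets a keyed version of the noise carry a watermark without changing the distribution on average.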

Dual-key generation

Generates output using two keys, increasing generation diversity and detection robustness.

A core technique of TextSeal.

Entropy-weighted scoring

Scores based on local text entropy to improve watermark detection accuracy.

Used in TextSeal's detector, alongside multi-region localization.
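The summary does not give TextSeal's scoring formula, so the following is only one plausible form of entropy weighting, offered as a sketch. It assumes each token has a per-token score that is Exp(1)-distributed without a watermark (mean 1, variance 1) and a known entropy for the model's distribution at that position. The weighted statistic and all names are assumptions.

```python
import math

def shannon_entropy(probs):
    """Entropy in nats of one next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def unweighted_z(scores):
    """Plain z-score: every token counts equally."""
    return sum(s - 1.0 for s in scores) / math.sqrt(len(scores))

def entropy_weighted_z(scores, entropies):
    """Weighted z-score: sum w * (s - 1) / sqrt(sum w^2), with w = entropy."""
    numerator = sum(w * (s - 1.0) for s, w in zip(scores, entropies))
    denominator = math.sqrt(sum(w * w for w in entropies))
    return numerator / denominator if denominator > 0.0 else 0.0

# Hypothetical document: 10 high-entropy tokens that carry signal,
# followed by 30 near-deterministic tokens that carry none.
scores = [3.0] * 10 + [1.0] * 30
entropies = [2.0] * 10 + [0.1] * 30
plain = unweighted_z(scores)
weighted = entropy_weighted_z(scores, entropies)
```

In this made-up example the weighted statistic is roughly twice the plain one, because the near-deterministic tokens no longer dilute the signal. Both statistics have unit variance under the no-watermark assumption, so they can be compared against the same threshold.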

Multi-region localization

A technique for identifying watermark signals in mixed documents, enabling more precise detection.

An innovative technique in TextSeal.
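How TextSeal performs localization is not described in this summary. As a generic illustration of the idea, the sketch below finds disjoint high-scoring spans in a sequence of per-token detection scores using a maximum-sum segment search (Kadane's algorithm). The `baseline` and `min_total` parameters and the function names are hypothetical.

```python
import math

def best_segment(values):
    """Kadane's algorithm: (start, end_exclusive, total) of the maximum-sum run."""
    best = (0, 0, 0.0)
    run_start, run_total = 0, 0.0
    for index, value in enumerate(values):
        if run_total <= 0.0:
            run_start, run_total = index, 0.0  # start a fresh run here
        run_total += value
        if run_total > best[2]:
            best = (run_start, index + 1, run_total)
    return best

def localize_regions(scores, baseline=1.5, min_total=5.0):
    """Greedily peel off disjoint spans whose excess score reaches `min_total`."""
    excess = [s - baseline for s in scores]
    regions = []
    while True:
        start, end, total = best_segment(excess)
        if total < min_total or end == start:
            break
        regions.append((start, end))
        for index in range(start, end):
            excess[index] = -math.inf  # block this span from later passes
    return sorted(regions)

# Hypothetical mixed document: two watermarked spans inside human text.
scores = [1.0] * 20 + [3.0] * 10 + [1.0] * 40 + [3.0] * 8 + [1.0] * 10
regions = localize_regions(scores)
```

Subtracting a baseline above the no-watermark mean makes unwatermarked stretches contribute negative values, so a span only grows while it keeps accumulating evidence.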

Speculative decoding

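A serving optimization in which a small draft model proposes several tokens that the target model then verifies in parallel, reducing latency without changing the target model's output distribution.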

An optimization supported by TextSeal.
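For context, the standard accept-or-resample rule behind speculative decoding can be sketched as follows. This shows the generic, unwatermarked rule only; how TextSeal keeps its watermark compatible with it is not covered in this summary. The function names and the toy distributions are assumptions.

```python
import random

def speculative_step(target_probs, draft_probs, rng):
    """One draft-then-verify step; the result is distributed as `target_probs`."""
    token_ids = range(len(draft_probs))
    draft_token = rng.choices(token_ids, weights=draft_probs)[0]
    ratio = target_probs[draft_token] / draft_probs[draft_token]
    if rng.random() < min(1.0, ratio):
        return draft_token, True  # draft token accepted
    # Rejected: resample from the normalized positive part of (target - draft).
    residual = [max(0.0, p - q) for p, q in zip(target_probs, draft_probs)]
    return rng.choices(token_ids, weights=residual)[0], False

def empirical_distribution(target_probs, draft_probs, draws, seed=0):
    """Frequency of each returned token over many independent steps."""
    rng = random.Random(seed)
    counts = [0] * len(target_probs)
    for _ in range(draws):
        token, _accepted = speculative_step(target_probs, draft_probs, rng)
        counts[token] += 1
    return [count / draws for count in counts]
```

Running `empirical_distribution([0.6, 0.3, 0.1], [0.2, 0.5, 0.3], 30000)` should give frequencies close to the target distribution even though every proposal comes from the draft distribution.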

Multi-token prediction

Predicts multiple tokens in a single forward step, improving generation efficiency.

An optimization supported by TextSeal.

Model distillation

Trains a smaller model to approximate a larger model, reducing computational costs while maintaining performance.

TextSeal's watermark signal transfers during model distillation.

Radioactive

Refers to the property of watermark signals persisting through model distillation.

An important property of TextSeal.

Distortion-free

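Averaged over the secret key, the watermarked sampler produces the same output distribution as the unwatermarked model, so the watermark does not degrade generation quality.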

An important property of TextSeal.

SynthID-text

A baseline LLM watermarking method that performs well in certain scenarios but struggles in mixed documents.

Compared with TextSeal as a baseline method.

Open Questions: Unanswered questions from this research

1. How can TextSeal's detection capability be improved in extremely low-entropy environments? High-confidence model outputs weaken the watermark signal, so further optimization is needed.
2. How well does TextSeal apply to specific languages or domains? It performs well in multilingual settings, but individual languages or domains may need further validation.
3. How can TextSeal maintain efficient watermark detection in more complex generation tasks? Some complex tasks may require parameter adjustments to ensure optimal performance.
4. What is the mechanism by which TextSeal's signal transfers during model distillation? This remains an important direction for further investigation.
5. How can TextSeal's multi-region localization technique be optimized without increasing computational costs? Achieving efficient detection in complex documents requires further research.

Applications

Immediate Applications

Content provenance detection

TextSeal can provide machine-detectable marking of AI-generated content in production systems that must comply with regulations, enhancing content traceability.

Prevention of unauthorized model use

TextSeal's 'radioactive' property allows its watermark signal to persist through model distillation, offering a novel technical means to prevent unauthorized model use.

Data protection

By achieving efficient detection without affecting generation quality, TextSeal provides new possibilities for data protection, especially in scenarios requiring protection of generated data.

Long-term Vision

Multilingual support

Further expand TextSeal's application in more languages and domains, enhancing its applicability and impact globally.

Complex generation tasks

Optimize TextSeal's performance in complex generation tasks, ensuring efficient watermark detection across various generation scenarios.

Abstract

We introduce TextSeal, a state-of-the-art watermark for large language models. Building on Gumbel-max sampling, TextSeal introduces dual-key generation to restore output diversity, along with entropy-weighted scoring and multi-region localization for improved detection. It supports serving optimizations such as speculative decoding and multi-token prediction, and does not add any inference overhead. TextSeal strictly dominates baselines like SynthID-text in detection strength and is robust to dilution, maintaining confident localized detection even in heavily mixed human/AI documents. The scheme is theoretically distortion-free, and evaluation across reasoning benchmarks confirms that it preserves downstream performance, while a multilingual human evaluation (6000 A/B comparisons, 5 languages) shows no perceptible quality difference. Beyond its use for provenance detection, TextSeal is also 'radioactive': its watermark signal transfers through model distillation, enabling detection of unauthorized use.

cs.CR cs.CL cs.LG
