STAMP: Selective Task-Aware Mechanism for Text Privacy

TL;DR

The STAMP framework combines task-aware privacy budget allocation with the Polar mechanism to achieve a better privacy-utility trade-off in text privatization.

cs.LG, Advanced, 2026-03-13

Fengwei Tian, Payel Bhattacharjee, Heidi Hanson, Geoffrey D. Rubin, Joseph Y. Lo, Ravi Tandon

privacy protection, text processing, polar mechanism, differential privacy, task-aware

Key Findings

Methodology

The STAMP framework integrates task-aware privacy allocation with the Polar mechanism. By considering each token's importance to downstream tasks and its privacy sensitivity, STAMP achieves fine-grained privacy budget allocation. The Polar mechanism perturbs only the direction of embeddings while preserving their magnitude, ensuring semantic neighborhoods are maintained.

Key Results

On SQuAD, STAMP combined with the Polar mechanism reached a cosine similarity of 0.833, while the Laplace mechanism reached only 0.343 under the same privacy budget.
On Yelp, STAMP reached an accuracy of 0.560 under the same conditions, compared with 0.220 for the Laplace mechanism.
On AG News, STAMP reached an accuracy of 0.800, compared with 0.520 for the Laplace mechanism, so the advantage holds across all three datasets.
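To put the reported numbers side by side, the short sketch below computes the absolute and relative gains of STAMP over the Laplace baseline. It uses only the six scores quoted above; the dictionary layout and the function name `summarize` are illustrative choices, not anything from the paper.

```python
# Utility scores quoted in the Key Results: (STAMP + Polar, Laplace baseline).
results = {
    "SQuAD (cosine similarity)": (0.833, 0.343),
    "Yelp (accuracy)": (0.560, 0.220),
    "AG News (accuracy)": (0.800, 0.520),
}


def summarize(scores):
    """Return {dataset: (absolute gain, ratio)}, rounded to three decimals."""
    out = {}
    for name, (stamp, laplace) in scores.items():
        out[name] = (round(stamp - laplace, 3), round(stamp / laplace, 3))
    return out


gains = summarize(results)
for name, (diff, ratio) in gains.items():
    print(f"{name}: +{diff:.3f} absolute, {ratio:.2f}x the Laplace score")
```

On these figures the gap is largest in relative terms on Yelp and smallest on AG News, although the metrics differ across datasets and are not directly comparable.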

Significance

STAMP addresses the severe utility loss of traditional text privatization methods, and its task-aware budget allocation lets the level of protection adapt to the task at hand. Because the allocation principle is modular, the framework offers a practical way to balance privacy protection and task utility across a range of applications.

Technical Contribution

The technical contributions of the STAMP framework lie in its innovative combination of task-aware privacy budget allocation and the Polar mechanism. Unlike existing isotropic noise mechanisms, the Polar mechanism significantly improves downstream utility by preserving semantic neighborhoods. Additionally, the STAMP framework offers a modular principle for privacy budget distribution that can be integrated with other privacy mechanisms.

Novelty

STAMP introduces task-aware privacy budget allocation and the Polar mechanism to text privacy protection. Its novelty lies in adjusting the intensity of privacy protection to the requirements of the task, which improves the privacy-utility balance.

Limitations

The STAMP framework may make boundary detection errors when handling multi-token entities. Although the Polar mechanism is robust to slight inconsistencies, such errors can still reduce overall performance.
The framework has slightly higher computational complexity than traditional methods, especially during task-aware grouping and budget allocation.
STAMP's applicability may be limited in scenarios with specialized privacy requirements.

Future Work

Future research directions include optimizing the computational efficiency of the STAMP framework, exploring its performance in more practical application scenarios, and integrating it with other privacy protection mechanisms to enhance its applicability. Further research on dynamically adjusting privacy budgets across different tasks is also a promising direction.

AI Executive Summary

In the era of big data, protecting user privacy has become a crucial research topic. Traditional text privacy protection methods often severely compromise task utility while safeguarding privacy, posing significant challenges in practical applications. The STAMP framework was proposed to address this issue. By integrating task-aware privacy budget allocation and the Polar mechanism, STAMP can maximize task utility while protecting privacy.

The core of the STAMP framework lies in its innovative privacy budget allocation strategy. By analyzing each token's importance to downstream tasks and its privacy sensitivity, STAMP achieves fine-grained privacy budget allocation. This method not only enhances the flexibility of privacy protection but also ensures adaptability across different task scenarios.

The Polar mechanism is another major innovation of the STAMP framework. Unlike traditional isotropic noise mechanisms, the Polar mechanism perturbs only the direction of embeddings while preserving their magnitude, ensuring semantic neighborhoods are maintained. This mechanism significantly improves downstream task utility, making STAMP outperform traditional methods on multiple datasets.

Experimental results show that STAMP achieves superior privacy-utility trade-offs on datasets such as SQuAD, Yelp, and AG News. Under the same privacy budget, STAMP's utility metrics significantly surpass those of the traditional Laplace mechanism, demonstrating its stability and applicability across different scenarios.

However, the STAMP framework also has some limitations. Its computational complexity is slightly higher than traditional methods, especially during task-aware grouping and budget allocation. Additionally, for certain specific privacy requirement scenarios, the applicability of STAMP may be limited.

Future research directions include optimizing the computational efficiency of the STAMP framework, exploring its performance in more practical application scenarios, and integrating it with other privacy protection mechanisms to enhance its applicability. The STAMP framework provides a new approach for text privacy protection, with broad application potential.

Deep Analysis

Background

With the rapid development of big data and artificial intelligence technologies, text privacy protection has become an active research area. Traditional mechanisms that add isotropic Gaussian or Laplace noise often severely compromise task utility while safeguarding privacy. In recent years, researchers have begun exploring how to balance the two. STAMP was proposed against this background, integrating task-aware privacy budget allocation with the Polar mechanism.

Core Problem

The core problem of text privacy protection is how to maximize task utility while protecting user privacy. Traditional methods often employ uniform privacy budget allocation strategies, ignoring the importance and privacy sensitivity of different tokens in tasks. This leads to excessive or insufficient privacy protection, affecting downstream task utility. Therefore, achieving fine-grained privacy budget allocation has become a pressing issue.

Innovation

The core innovations of the STAMP framework lie in its task-aware privacy budget allocation strategy and the Polar mechanism.
• Task-aware privacy budget allocation: By analyzing each token's importance to downstream tasks and its privacy sensitivity, STAMP achieves fine-grained privacy budget allocation. This method not only enhances the flexibility of privacy protection but also ensures adaptability across different task scenarios.
• Polar mechanism: Unlike traditional isotropic noise mechanisms, the Polar mechanism perturbs only the direction of embeddings while preserving their magnitude, ensuring semantic neighborhoods are maintained. This mechanism significantly improves downstream task utility.
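The allocation idea can be sketched as a lookup from a token's group to a per-group budget. Everything specific here is an assumption made for illustration: the `Token` fields, the 0.5 importance threshold, and the epsilon values in `GROUP_EPSILON` are hypothetical, and the summary does not say how STAMP scores importance or sensitivity or which budgets it assigns.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    text: str
    importance: float  # hypothetical task-relevance score in [0, 1]
    sensitive: bool    # hypothetical flag, e.g. from an entity tagger


# Hypothetical per-group budgets, keyed by (important, sensitive).
# A larger epsilon means less noise.
GROUP_EPSILON = {
    (True, True): 4.0,    # task-relevant and sensitive
    (True, False): 16.0,  # task-relevant, not sensitive
    (False, True): 1.0,   # not task-relevant but sensitive: noise most heavily
    (False, False): 8.0,  # neither
}


def allocate_budgets(tokens, importance_threshold=0.5):
    """Pair each token's text with the epsilon of its (important, sensitive) group."""
    budgets = []
    for tok in tokens:
        group = (tok.importance >= importance_threshold, tok.sensitive)
        budgets.append((tok.text, GROUP_EPSILON[group]))
    return budgets


tokens = [
    Token("Alice", 0.2, True),
    Token("visited", 0.4, False),
    Token("Paris", 0.9, True),
    Token("yesterday", 0.7, False),
]
for text, eps in allocate_budgets(tokens):
    print(f"{text}: epsilon = {eps}")
```

Grouping keeps the number of distinct budgets small, which matches the group-wise control described in the abstract; the ordering of the four budgets is a design choice a practitioner would tune.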

Methodology

The implementation of the STAMP framework includes the following key steps:
• Task-aware privacy budget allocation: By analyzing each token's importance to downstream tasks and its privacy sensitivity, STAMP achieves fine-grained privacy budget allocation.
• Polar mechanism: By perturbing only the direction of embeddings while preserving their magnitude, the semantic neighborhoods are maintained.
• Decoding process: Cosine nearest-neighbor search is used for decoding, ensuring alignment between perturbation geometry and decoding geometry.
• Combination strategy: Task-aware privacy budget allocation is combined with the Polar mechanism to achieve superior privacy-utility trade-offs.
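The decoding step can be illustrated with a brute-force cosine nearest-neighbor search. This is a minimal sketch in pure standard-library Python; the three-word vocabulary and its 3-dimensional embeddings are made up for the example, and a real system would likely use an approximate nearest-neighbor index rather than a linear scan.

```python
import math


def unit(v):
    """Scale a vector to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    if n == 0.0:
        raise ValueError("zero vector has no direction")
    return [x / n for x in v]


def cosine_decode(noisy, vocab_embeddings):
    """Return the vocabulary token whose embedding is closest in cosine similarity."""
    q = unit(noisy)
    best_token, best_score = None, -2.0  # cosine similarity is never below -1
    for token, emb in vocab_embeddings.items():
        score = sum(a * b for a, b in zip(q, unit(emb)))
        if score > best_score:
            best_token, best_score = token, score
    return best_token


# Hypothetical 3-dimensional vocabulary, for illustration only.
vocab_embeddings = {
    "cat": [1.0, 0.1, 0.0],
    "dog": [0.9, 0.3, 0.0],
    "car": [0.0, 0.2, 1.0],
}
noisy = [0.95, 0.12, 0.05]
print(cosine_decode(noisy, vocab_embeddings))
# Rescaling the query does not change the decoded token.
print(cosine_decode([10.0 * x for x in noisy], vocab_embeddings))
```

Because cosine similarity ignores vector length, decoding depends only on direction, which is the same quantity a direction-only perturbation changes.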

Experiments

The experimental design includes evaluations on SQuAD, Yelp, and AG News.
• Datasets: The widely used SQuAD, Yelp, and AG News datasets are selected to test the generality of the STAMP framework.
• Baseline methods: STAMP is compared with the traditional Laplace mechanism to evaluate its utility improvement.
• Evaluation metrics: Cosine similarity and classification accuracy are used as utility metrics to measure the performance of different methods under the same privacy budget.
• Hyperparameter settings: Experiments are conducted under different privacy budgets to analyze the performance of STAMP in different scenarios.

Results

Experimental results show that STAMP achieves superior privacy-utility trade-offs on multiple datasets.
• On the SQuAD dataset, STAMP combined with the Polar mechanism achieved a cosine similarity of 0.833 under the same privacy budget, while the traditional Laplace mechanism only reached 0.343.
• On the Yelp dataset, STAMP achieved an accuracy of 0.560 under the same conditions, compared to only 0.220 with the Laplace mechanism.
• On the AG News dataset, STAMP achieved an accuracy of 0.800, significantly outperforming the Laplace mechanism's 0.520.

Applications

The STAMP framework has broad application scenarios.
• Data privacy protection: Suitable for text processing tasks that require user privacy protection, such as medical records and customer support.
• Task-aware text processing: Capable of dynamically adjusting privacy protection intensity based on task requirements, improving task utility.
• Industrial applications: Significant in scenarios requiring a balance between privacy protection and task utility, such as finance and e-commerce.

Limitations & Outlook

Although the STAMP framework achieves a good balance between privacy protection and task utility, there are still some limitations.
• Computational complexity: The computational complexity of STAMP is slightly higher than traditional methods, especially during task-aware grouping and budget allocation.
• Applicability: STAMP's applicability may be limited in scenarios with specialized privacy requirements.
• Boundary detection: When handling multi-token entities, boundary detection errors may occur. Although the Polar mechanism is robust to slight inconsistencies, such errors can still reduce overall performance.

Plain Language (accessible to non-experts)

Imagine you're in a library and want to borrow some books, but you don't want others to know which books you've borrowed. Traditional methods are like covering all the books with a cloth, so no one can see what you've borrowed, but you also find it hard to locate the books you want. The STAMP framework is like a smart librarian who knows the content and importance of each book and can selectively hide some book information based on your needs while keeping the visibility of the books you need.

In this process, the librarian decides which books need higher privacy protection and which can be open for you to view based on each book's content and your needs. This is like the task-aware privacy budget allocation strategy in the STAMP framework, achieving fine-grained privacy budget allocation by analyzing each token's importance to downstream tasks and its privacy sensitivity.

Moreover, the librarian ensures that while hiding information, your understanding of the book content is not affected. This is like the Polar mechanism in the STAMP framework, which perturbs only the direction of embeddings while preserving their magnitude, ensuring semantic neighborhoods are maintained.

Overall, the STAMP framework is like a smart librarian who can maximize task utility while protecting privacy, providing a new solution for text privacy protection.

ELI14 (explained like you're 14)

Hey there, friends! Today I'm going to tell you about something super cool called the STAMP framework. Imagine you're playing a game where you need to protect your secrets from other players, but you also have to complete tasks. STAMP is like your super assistant, helping you complete tasks while keeping your secrets safe!

First, STAMP looks at every word in your message and asks two things: how much does this word matter for the task, and how secret is it? Then it decides how much protection each word gets. This way, you can complete important tasks without revealing your secrets!

Next, STAMP has a special skill called the Polar mechanism. It's like a magic shield that nudges the direction each word points in while keeping its size the same. That hides your secrets but keeps words close to others with similar meanings, so you stay strong in the game!

In short, STAMP is like your super assistant, helping you protect your secrets while completing tasks in the game. Isn't that cool?

Glossary

Privacy Budget

A privacy budget is a parameter used in differential privacy to control the strength of privacy protection. A smaller privacy budget means stronger privacy protection.

In the STAMP framework, the privacy budget is used to control the privacy protection strength of each token.

Differential Privacy

Differential privacy is a formal privacy framework in which calibrated random noise bounds how much any single input can change the distribution of the output, limiting what an observer can infer about an individual.

The STAMP framework uses differential privacy techniques to protect sensitive information in text.
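For reference, the standard definition of epsilon-differential privacy can be written as below. This is the textbook form; the summary does not state which variant (for example a local or metric formulation) the paper adopts, so the notation here is an assumption.

```latex
% Standard epsilon-differential privacy for a randomized mechanism M.
% x and x' are neighboring inputs; S is any measurable set of outputs.
\Pr[\,M(x) \in S\,] \;\le\; e^{\varepsilon}\,\Pr[\,M(x') \in S\,]
\quad \text{for all neighboring } x, x' \text{ and all } S.
```

A smaller epsilon forces the two output distributions closer together, which is why a smaller privacy budget means stronger protection.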

Polar Mechanism

The Polar mechanism is a privacy protection method that perturbs only the direction of embeddings while preserving their magnitude, ensuring semantic neighborhoods are maintained.

In the STAMP framework, the Polar mechanism is used to privatize individual token embeddings.
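The geometric idea of changing direction while keeping magnitude can be sketched as follows. This is only an illustrative stand-in: the Gaussian jitter and the `noise_scale` parameter are assumptions, the code is not the paper's Polar mechanism, and it carries no differential privacy guarantee, because the summary does not give the noise distribution or its calibration to a privacy budget.

```python
import math
import random


def norm(v):
    """Euclidean length of a vector."""
    return math.sqrt(sum(x * x for x in v))


def perturb_direction(embedding, noise_scale, rng):
    """Jitter the unit direction of `embedding`, then restore its original magnitude.

    Illustrative stand-in only: Gaussian jitter is an assumption, not the
    paper's mechanism, and gives no formal privacy guarantee.
    """
    r = norm(embedding)
    if r == 0.0:
        raise ValueError("zero vector has no direction")
    unit = [x / r for x in embedding]
    noisy = [u + rng.gauss(0.0, noise_scale) for u in unit]
    n = norm(noisy)
    return [r * x / n for x in noisy]


rng = random.Random(0)
original = [3.0, 4.0, 0.0]
perturbed = perturb_direction(original, 0.1, rng)
print(norm(original), norm(perturbed))  # the two lengths agree up to rounding
```

Only the angle between the original and perturbed vectors changes, so a cosine-based decoder sees exactly the quantity that was perturbed.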

Task-Aware

Task-aware refers to the ability to adjust system behavior based on task requirements. In privacy protection, it means adjusting the strength of protection for each token based on that token's importance to the downstream task.

The STAMP framework achieves superior privacy-utility balance through task-aware privacy budget allocation.

Cosine Similarity

Cosine similarity measures the similarity of two vectors as the cosine of the angle between them. It ranges from -1 to 1, and values closer to 1 indicate more similar directions.

In the STAMP framework experiments, cosine similarity is used to evaluate the utility of text privacy protection.
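A minimal implementation of the metric, written in pure standard-library Python so it runs without dependencies; the function name and the example vectors are illustrative.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


print(cosine_similarity([1.0, 2.0], [2.0, 4.0]))    # parallel vectors: close to 1
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))    # orthogonal vectors: 0
print(cosine_similarity([1.0, 2.0], [-1.0, -2.0]))  # opposite vectors: close to -1
```

The value depends only on direction, so rescaling either vector leaves it unchanged.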

Isotropic Noise

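Isotropic noise is noise whose distribution is the same in every direction of the embedding space, such as spherical Gaussian noise. It is commonly used in traditional privacy protection methods and changes both the direction and the magnitude of an embedding.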

The STAMP framework avoids the utility loss caused by isotropic noise through the Polar mechanism.

Semantic Neighborhood

A semantic neighborhood is a set of semantically similar tokens in embedding space. Maintaining semantic neighborhoods helps improve downstream task utility.

The Polar mechanism improves the utility of the STAMP framework by maintaining semantic neighborhoods.

Decoding Geometry

Decoding geometry is the notion of distance or similarity used to map a perturbed embedding back to a token.

The STAMP framework perturbs directions on the unit sphere and decodes via cosine nearest-neighbor search, so the perturbation geometry and the decoding geometry are aligned.

Laplace Mechanism

The Laplace mechanism achieves differential privacy by adding noise drawn from a Laplace distribution, with the noise scale set to the sensitivity of the released quantity divided by the privacy budget.

In the STAMP framework experiments, the Laplace mechanism is used as a baseline method for comparison.
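A sketch of a coordinate-wise Laplace perturbation of an embedding is given below. How the paper calibrates its Laplace baseline is not stated in this summary, so the `sensitivity` parameter and the per-coordinate application are assumptions; the sampler uses the fact that the difference of two independent exponential variables is Laplace distributed.

```python
import math
import random


def laplace_noise(scale, rng):
    """Draw one sample from a zero-mean Laplace distribution with the given scale."""
    # The difference of two i.i.d. exponential variables with rate 1/scale
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def laplace_perturb(embedding, epsilon, sensitivity, rng):
    """Add independent Laplace noise of scale sensitivity/epsilon to each coordinate."""
    scale = sensitivity / epsilon
    return [x + laplace_noise(scale, rng) for x in embedding]


rng = random.Random(0)
original = [3.0, 4.0, 0.0]
noisy = laplace_perturb(original, epsilon=1.0, sensitivity=1.0, rng=rng)
length = math.sqrt(sum(x * x for x in noisy))
print(noisy)
print(length)  # generally differs from the original length of 5.0
```

Unlike a direction-only perturbation, this noise changes the magnitude of the embedding as well as its direction, and a smaller epsilon produces larger noise.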

Task Utility

Task utility is a performance metric of the system when performing a specific task. High task utility means the system performs well in the task.

The STAMP framework improves task utility through task-aware privacy budget allocation.

Token

A token is a basic unit in text, which can be a word, character, or symbol.

In the STAMP framework, tokens are the basic unit for privacy budget allocation.

Embedding

An embedding is a representation method that maps text data into vector space to capture semantic information.

The STAMP framework perturbs embeddings through the Polar mechanism to achieve privacy protection.

Semantic Similarity

Semantic similarity is the degree of similarity between two tokens in semantic space.

In the STAMP framework, semantic similarity is used to evaluate the utility of privacy protection.

Task Importance

Task importance measures how much a token matters to the downstream task, as measured via a task- or query-specific representation.

The STAMP framework achieves fine-grained privacy budget allocation by analyzing task importance.

Privacy Sensitivity

Privacy sensitivity measures how likely a token is to reveal private information; names, dates, and identifiers are typical examples.

The STAMP framework achieves fine-grained privacy budget allocation by analyzing privacy sensitivity.

Open Questions (unanswered questions from this research)

1. How to dynamically adjust privacy budgets across different tasks remains an open question. The current STAMP framework primarily optimizes for a single task, and further research is needed to allocate privacy budgets effectively in multi-task scenarios.
2. How to improve boundary detection accuracy for multi-token entities still needs exploration. Although the STAMP framework is robust to slight inconsistencies, more precise boundary detection would help overall performance.
3. How to improve computational efficiency while preserving privacy protection is unresolved. The computational complexity of the STAMP framework is slightly higher than traditional methods, and future research can explore more efficient implementations.
4. How to extend STAMP to scenarios with specialized privacy requirements, where its applicability may currently be limited, is a research direction worth exploring.
5. How to integrate other privacy protection mechanisms into the STAMP framework still requires further research. Different mechanisms have different strengths and weaknesses, and combining them effectively to achieve a better privacy-utility balance is an important topic.

Applications

Immediate Applications

Medical Record Protection

The STAMP framework can be used to protect sensitive information in medical records, ensuring patient privacy is not compromised when sharing data while retaining the research value of the data.

Customer Support Systems

In customer support systems, the STAMP framework can protect customers' personal information, preventing sensitive data leakage while ensuring customer service representatives can access necessary information to resolve issues.

Financial Data Processing

In the financial industry, the STAMP framework can be used to protect customers' financial information, preventing data leaks while ensuring the accuracy and effectiveness of financial analysis.

Long-term Vision

Intelligent Privacy Protection Systems

In the future, the STAMP framework can evolve into an intelligent privacy protection system that dynamically adjusts privacy protection strategies based on different scenarios and needs, achieving a more efficient privacy-utility balance.

Cross-Domain Privacy Protection

The successful application of the STAMP framework can promote the development of cross-domain privacy protection, facilitating the realization of unified privacy protection standards and technologies across different fields.

Abstract

We present STAMP (Selective Task-Aware Mechanism for Text Privacy), a new framework for task-aware text privatization that achieves an improved privacy-utility trade-off. STAMP selectively allocates privacy budgets across tokens by jointly considering (i) each token's importance to the downstream task (as measured via a task- or query-specific representation), and (ii) its privacy sensitivity (e.g., names, dates, identifiers). This token-level partitioning enables fine-grained, group-wise control over the level of noise applied to different parts of the input, balancing privacy protection with task relevance. To privatize individual token embeddings, we introduce the polar mechanism, which perturbs only the direction of embeddings on the unit sphere while preserving their magnitude. Decoding is performed via cosine nearest-neighbor search, aligning the perturbation geometry with the decoding geometry. Unlike isotropic noise mechanisms, the polar mechanism maintains semantic neighborhoods in the embedding space and better preserves downstream utility. Experimental evaluations on SQuAD, Yelp, and AG News datasets demonstrate that STAMP, when combined with the normalized polar mechanism, consistently achieves superior privacy-utility trade-offs across varying per-token privacy budgets.

cs.LG cs.CR cs.IT
