Your UnEmbedding Matrix is Secretly a Feature Lens for Text Embeddings

TL;DR

Proposes EmbedFilter, a linear transformation that filters the latent subspace encoding high-frequency, uninformative tokens, improving zero-shot text embedding performance by up to 14%.

cs.CL Advanced 2026-06-06

Songhao Wu Zhongxin Chen Yuxuan Liu Heng Cui Cong Li Rui Yan


Large Language Models Text Embedding Mechanistic Interpretability Feature Filtering Dimensionality Reduction

Key Findings

Methodology

This work analyzes the unembedding matrix of large language models (LLMs) and shows that it encodes a latent subspace associated with an 'average' token, biased toward high-frequency but semantically uninformative words. Using Logit Lens and Logit Spectroscopy, the authors identify the 'edge spectrum', the spectral region responsible for driving high-frequency token expression. They then design EmbedFilter, a simple linear transformation that filters out this subspace without additional training. Experiments on Qwen-2.5, Llama-3.1, and Mistral evaluate the impact on multiple downstream tasks within the MTEB benchmark, demonstrating performance gains, robustness, and effective dimensionality reduction.

Key Results

Applying EmbedFilter (τ=2) to Qwen-2.5 yields a 14.1% average improvement across 49 MTEB datasets, and performance is maintained or improved even when the embedding is reduced to 25% of its original dimensionality. Gains are also observed on Llama-3.1 (about 3%) and Mistral, indicating that the method transfers across backbones.
Filtering the edge spectrum suppresses the influence of high-frequency, uninformative tokens, resulting in embeddings with enhanced semantic discriminability. The technique acts as a post-processing step, requiring minimal computational overhead, and achieves significant embedding compression while preserving or boosting task performance.
The distance-preserving nature of the spectral filtering enables natural embedding dimensionality reduction, reducing storage and speeding up retrieval in large-scale information retrieval systems, with minimal performance trade-offs.

Significance

This research uncovers a fundamental mechanism behind the suboptimal performance of LLM-based text embeddings. By revealing that the unembedding matrix encodes a latent subspace biased toward high-frequency, low-information tokens, it provides a mechanistic explanation for the anisotropy and semantic collapse observed in embeddings. The proposed EmbedFilter offers a simple yet powerful post-processing solution that enhances semantic quality, reduces storage costs, and accelerates retrieval, bridging the gap between model interpretability and practical deployment. The findings open new avenues for understanding internal model representations and optimizing large-scale embedding systems, with broad implications for information retrieval, semantic search, and NLP applications.

Technical Contribution

The core technical contribution lies in identifying the latent subspace within the unembedding matrix that encodes high-frequency, uninformative tokens. The authors leverage spectral analysis tools—Logit Spectroscopy—to pinpoint the 'edge spectrum' responsible for this bias. They then formulate EmbedFilter as a linear transformation based on the spectral components, which effectively filters out this subspace. This approach is novel in that it does not require retraining or fine-tuning, yet yields significant improvements in embedding quality and efficiency. The method also provides a theoretical guarantee of distance preservation, enabling natural dimensionality reduction. This work bridges mechanistic interpretability with practical embedding optimization, offering a new paradigm for post hoc model refinement.
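
The distance-preservation claim can be made concrete with a short derivation. This is a sketch under assumed notation, not the paper's own statement: $V_k \in \mathbb{R}^{d \times k}$ is taken to hold the $k$ singular directions that survive filtering as orthonormal columns, $P$ is the resulting $d \times d$ filter, and $z_x$, $z_y$ are the $k$-dimensional coordinates of two embeddings $x$ and $y$.

```latex
\begin{aligned}
P &= V_k V_k^{\top}, \qquad V_k^{\top} V_k = I_k, \\
z_x &= V_k^{\top} x, \qquad z_y = V_k^{\top} y, \\
\lVert z_x - z_y \rVert_2^2
  &= (x - y)^{\top} V_k V_k^{\top} (x - y) \\
  &= (x - y)^{\top} P^{\top} P \,(x - y)
   = \lVert P x - P y \rVert_2^2 .
\end{aligned}
```

The last step uses $P^{\top} P = V_k (V_k^{\top} V_k) V_k^{\top} = P$. Under these assumptions, storing the $k$-dimensional coordinates loses nothing relative to the filtered $d$-dimensional embeddings, which is why the filtering step doubles as dimensionality reduction.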

Novelty

This study is the first to systematically interpret the unembedding matrix as a feature lens that encodes a latent 'average' token subspace. It innovatively applies spectral analysis—Logit Spectroscopy—to identify the 'edge spectrum' responsible for high-frequency bias, and introduces EmbedFilter as a simple, effective linear filter. Unlike prior work focused on prompt engineering or fine-tuning, this approach offers a mechanism-based, post-processing solution that enhances embedding quality and efficiency without additional training. Its novelty lies in combining mechanistic interpretability with practical embedding optimization, providing both theoretical insights and engineering benefits.

Limitations

The linear filtering approach assumes the subspace responsible for high-frequency bias can be effectively captured by spectral components, which may not hold in cases with more complex semantic structures or domain-specific vocabularies. Its effectiveness might diminish in highly specialized tasks or languages with different frequency distributions.
Parameter tuning, such as the filtering ratio τ, requires empirical adjustment and may affect performance across models and tasks, indicating a sensitivity to hyperparameters.
The current analysis predominantly focuses on static text embedding tasks; its applicability to generative tasks, multi-modal models, or tasks involving complex reasoning remains to be validated. Further research is needed to extend the approach to these scenarios.

Future Work

Future directions include exploring adaptive, non-linear filtering strategies that dynamically adjust the subspace based on input context. Extending the spectral analysis to multi-modal models and integrating it with fine-tuning or prompt engineering could further enhance robustness. Investigating the impact of filtering on tasks requiring complex reasoning or generation, and developing automated methods for optimal subspace identification, are promising avenues. Additionally, applying this mechanistic interpretability framework to other model components may yield deeper insights into internal representations and lead to more efficient, transparent NLP systems.

AI Executive Summary

The rapid advancement of large language models (LLMs) has revolutionized natural language processing, enabling unprecedented capabilities in understanding and generating human language. Despite their impressive performance across a broad spectrum of tasks, a persistent challenge remains: their effectiveness as off-the-shelf text embedding tools is limited. This shortfall is particularly evident in zero-shot scenarios, where models often produce embeddings biased toward high-frequency, semantically uninformative tokens. Such biases hinder the models’ ability to capture nuanced semantic distinctions, thereby constraining their utility in applications like information retrieval, semantic search, and downstream classification tasks.

To address this fundamental issue, the authors of this study embarked on a mechanistic exploration of the internal representations within LLMs. They focused on the unembedding matrix—the component responsible for mapping hidden states back to vocabulary logits—and discovered that it encodes a latent subspace associated with an 'average' token. This subspace is characterized by spectral components at the edges of the spectrum, which disproportionately influence the expression of high-frequency, low-information words. Using tools like Logit Lens and Logit Spectroscopy, they identified this 'edge spectrum' as the driver of the bias.
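
As a rough illustration of the Logit Lens projection mentioned above, the sketch below multiplies an embedding by an unembedding matrix and lists the highest-scoring tokens. The six-token vocabulary, the random matrix `W_U`, and the helper `top_tokens` are hypothetical stand-ins rather than anything taken from the paper, and the final normalization layer that real models usually apply before unembedding is omitted.

```python
import numpy as np

def top_tokens(embedding, W_U, vocab, k=3):
    """Project an embedding onto the vocabulary and return the k highest-logit tokens."""
    logits = W_U @ embedding             # shape: (vocab_size,)
    order = np.argsort(logits)[::-1]     # token indices by descending logit
    return [vocab[i] for i in order[:k]]

# Hypothetical toy setup: 6 tokens, hidden size 4, unit-norm unembedding rows.
rng = np.random.default_rng(0)
vocab = ["the", ",", "of", "cat", "quantum", "violin"]
W_U = rng.normal(size=(len(vocab), 4))
W_U /= np.linalg.norm(W_U, axis=1, keepdims=True)

# An embedding aligned with the row for "the", mimicking the reported
# tendency of text embeddings to align with frequent tokens.
embedding = 3.0 * W_U[0]
top = top_tokens(embedding, W_U, vocab)
print(top)
```

In this toy case the frequent token ranks first by construction; with a real model, the same projection is what exposes which tokens an embedding expresses most strongly.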

Building on this insight, the authors proposed EmbedFilter, a simple yet effective linear transformation designed to filter out the identified subspace. By removing the influence of this latent component, EmbedFilter suppresses the over-representation of uninformative tokens, thereby enhancing the semantic richness and discriminability of the resulting embeddings. Remarkably, this approach requires no additional training or fine-tuning, making it a practical post-processing step applicable across various models.

Experiments on Qwen-2.5, Llama-3.1, and Mistral showed that EmbedFilter consistently improves zero-shot performance on the MTEB benchmark, with an average gain of 14.1% on Qwen-2.5. The filtering process also enables dimensionality reduction, decreasing storage requirements and accelerating retrieval without sacrificing accuracy. This combination of better performance and lower cost makes EmbedFilter a compelling tool for real-world deployment.

Beyond empirical results, this work offers a new mechanistic perspective on how internal model components influence semantic representations. It bridges the gap between interpretability and practical optimization, paving the way for more transparent and efficient NLP systems. Future research may extend these ideas to dynamic filtering, multi-modal models, and tasks involving complex reasoning, further unlocking the potential of large language models in diverse applications.

Deep Analysis

Background

The evolution of NLP has been marked by the transition from static word embeddings like Word2Vec and GloVe to contextualized representations from Transformer-based models such as BERT, GPT, and LLaMA. These models leverage self-attention mechanisms to capture long-range dependencies and complex semantic structures, leading to significant performance improvements in tasks like question answering, summarization, and translation. Pretraining on massive corpora enables models to learn rich internal representations, which can be fine-tuned or directly used as embeddings for downstream tasks.

However, despite these advances, the effectiveness of LLMs as off-the-shelf text encoders remains limited. Empirical studies have shown that raw embeddings tend to be anisotropic, concentrated in a narrow cone, and biased toward high-frequency tokens that do not carry meaningful semantic information. Prior efforts, such as prompt engineering (e.g., PromptEOL, ECHO), attempted to improve embedding quality by crafting specific prompts or input modifications, but these methods are heuristic, sensitive to prompt design, and often yield modest gains.

Recent mechanistic interpretability tools such as Logit Lens and Logit Spectroscopy have provided new insights into the internal workings of LLMs. These tools project intermediate activations into the vocabulary space, revealing biases and structural properties of embeddings. Nonetheless, a comprehensive understanding of how these internal biases affect downstream performance, and of how to mitigate them systematically, has been lacking. This work aims to fill that gap by analyzing the spectral properties of the unembedding matrix and their impact on the quality of semantic representations.

Core Problem

The core challenge addressed in this paper is the suboptimal performance of large language models when used directly as off-the-shelf text embedding tools. Despite their impressive zero-shot capabilities, embeddings generated from raw models are often dominated by high-frequency, semantically uninformative tokens, leading to poor semantic discriminability. This bias stems from the internal structure of the unembedding matrix, which encodes a latent subspace associated with an 'average' token. This subspace exerts disproportionate influence, pulling embeddings toward a common centroid and causing anisotropy in the embedding space.
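
The pull toward a common centroid can be illustrated with mean pairwise cosine similarity, a common anisotropy measure. The sketch below uses synthetic data only: `semantic` and `shared` are hypothetical stand-ins for content-specific directions and a shared 'average'-token component, and the numbers say nothing about any real model.

```python
import numpy as np

def mean_pairwise_cosine(X):
    """Average cosine similarity over all distinct pairs of rows in X."""
    unit = X / np.linalg.norm(X, axis=1, keepdims=True)
    sims = unit @ unit.T
    n = len(X)
    return (sims.sum() - n) / (n * (n - 1))   # subtract the diagonal of ones

rng = np.random.default_rng(0)
semantic = rng.normal(size=(200, 64))   # hypothetical content-specific directions
shared = 5.0 * np.ones(64)              # hypothetical component shared by every embedding

isotropic_score = mean_pairwise_cosine(semantic)
biased_score = mean_pairwise_cosine(semantic + shared)
print(round(isotropic_score, 3), round(biased_score, 3))
```

Without the shared component the score sits near zero; with it, every pair looks highly similar, which is the loss of discriminability described above.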

Such biases limit the models’ ability to produce nuanced semantic representations, impairing downstream tasks like semantic similarity, clustering, and retrieval. Existing solutions like prompt engineering or fine-tuning are either heuristic or computationally expensive, and do not address the root cause. Therefore, a mechanism-based approach that can identify and filter out the biased subspace is urgently needed to unlock the full potential of LLM-based embeddings.

Innovation

This work introduces a novel mechanistic interpretation of the unembedding matrix as a feature lens that encodes a latent 'average' token subspace. The key innovation is the identification of the 'edge spectrum'—spectral components at the extremes of the singular value spectrum—that drive high-frequency, uninformative token expression. Leveraging Logit Spectroscopy, the authors systematically analyze the spectral properties of the unembedding matrix, revealing how this subspace biases embeddings.

Building on this insight, they propose EmbedFilter, a simple linear transformation that filters out the edge spectrum components. Unlike traditional prompt engineering or fine-tuning, EmbedFilter operates as a post-processing step, requiring no additional training. Its design is grounded in spectral analysis, ensuring that the filtering preserves semantic information while suppressing bias. This approach not only improves embedding quality but also enables natural dimensionality reduction, reducing storage and computational costs.

The combination of mechanistic interpretability and spectral filtering represents a significant advancement in understanding and optimizing large language models, bridging theoretical insights with practical engineering solutions.

Methodology

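Identify the latent subspace: apply Logit Spectroscopy to the singular value decomposition of the unembedding matrix to identify the edge-spectrum subspace that drives high-frequency, uninformative tokens.
Reverse-derive the 'average' token: combine the token frequency distribution of the training corpus with the pseudo-inverse of the unembedding matrix to recover the latent vector that represents the 'average' token.
Identify the edge spectrum: project and filter the 'average' token within spectral subspaces, observe the effect on the logits of high-frequency tokens, and compute the logit shift.
Design EmbedFilter: based on the spectral analysis, construct a linear transformation matrix that filters out the components lying in the edge-spectrum subspace.
Evaluate performance: apply EmbedFilter as a post-processing step to text embeddings from Qwen-2.5, Llama-3.1, and Mistral, and evaluate multi-task performance on the MTEB benchmark, including semantic similarity, classification, clustering, and retrieval.
Reduce dimensionality: use the distance-preserving property of orthogonal matrices to project the embedding space to a lower dimension, reducing storage and retrieval time, and verify the effect of the reduction.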
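
The steps listed above can be sketched on toy data. Everything specific here is an assumption made for illustration and is not the authors' implementation: the random matrix `W_U`, the Dirichlet-sampled `freq`, the use of log-frequency as the target logits for the 'average' token, and the rule of dropping the first and last singular directions (`drop_idx`). The paper's actual selection rule and the role of τ are not reproduced.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, d = 50, 8
W_U = rng.normal(size=(vocab_size, d))        # toy unembedding matrix (hypothetical)
freq = rng.dirichlet(np.ones(vocab_size))     # toy token frequency distribution

# 'Average' token (assumed form): least-squares hidden vector whose
# logits best match the log-frequencies.
h_avg = np.linalg.pinv(W_U) @ np.log(freq)

# Spectral decomposition; rows of Vt are the right singular vectors.
_, S, Vt = np.linalg.svd(W_U, full_matrices=False)   # Vt has shape (d, d)

# Assumed selection rule: drop the two directions at the ends of the spectrum.
drop_idx = np.array([0, d - 1])
keep_idx = np.setdiff1d(np.arange(d), drop_idx)
V_keep = Vt[keep_idx].T                       # (d, k) orthonormal basis of kept directions
P = V_keep @ V_keep.T                         # d x d linear filter (orthogonal projector)

# Logit shift that filtering induces on the 'average' token.
logit_shift = W_U @ (P @ h_avg) - W_U @ h_avg

# k-dimensional coordinates preserve distances between filtered embeddings.
x, y = rng.normal(size=d), rng.normal(size=d)
dist_filtered = np.linalg.norm(P @ x - P @ y)
dist_reduced = np.linalg.norm(V_keep.T @ x - V_keep.T @ y)
print(V_keep.shape, np.isclose(dist_filtered, dist_reduced))
```

In practice the reduced embedding for a batch `E` of shape `(n, d)` would be `E @ V_keep`, so the filter costs one matrix multiplication per batch.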

Experiments

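The experiments use Qwen-2.5, Llama-3.1, and Mistral and evaluate multi-task performance on the MTEB benchmark. The filtering ratio τ is varied to test how performance changes at different embedding dimensions. Metrics cover semantic textual similarity (STS), classification accuracy, clustering purity, and retrieval recall. Unfiltered models are compared with models processed by EmbedFilter to verify the statistical significance of the gains. Ablation studies analyze the effect of different spectral subspaces and the method's adaptability across model scales and task types. The results show that EmbedFilter compresses the embedding space while maintaining or improving performance, supporting its robustness and practicality.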

Results

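On Qwen-2.5, applying EmbedFilter (τ=2) raises average performance by 14.1% across 49 datasets. Llama-3.1 under the same processing improves by about 3% and performs well on multiple tasks. Mistral shows a similar trend. After filtering, the embedding dimension can be reduced to 25% of the original while still outperforming the unfiltered model. The experiments indicate that filtering the edge spectrum suppresses the bias toward high-frequency, uninformative tokens and improves semantic discriminability. The distance-preserving transformation compresses the embedding space without loss, greatly reducing storage and retrieval costs and supporting the method's practical value.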
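
To give a sense of scale for the 25% figure, the arithmetic below estimates the size of a dense float32 index before and after reduction. The corpus size and embedding width are hypothetical values chosen for illustration, and `index_size_gib` is a helper defined here, not part of any library.

```python
def index_size_gib(num_vectors, dim, bytes_per_value=4):
    """Size in GiB of a dense index storing num_vectors vectors of width dim."""
    return num_vectors * dim * bytes_per_value / 2**30

num_vectors = 10_000_000      # hypothetical corpus size
full_dim = 4096               # hypothetical embedding width
reduced_dim = full_dim // 4   # 25% of the original dimensions

full = index_size_gib(num_vectors, full_dim)
reduced = index_size_gib(num_vectors, reduced_dim)
print(f"{full:.1f} GiB -> {reduced:.1f} GiB")   # prints "152.6 GiB -> 38.1 GiB"
```

Brute-force similarity search scales linearly with the vector width, so the same factor of four applies to the number of multiply-adds per query.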

Applications

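The technique suits large-scale information retrieval, semantic search, knowledge graph construction, and similar scenarios. Filtering out the irrelevant high-frequency token subspace can markedly improve retrieval efficiency and accuracy and reduce storage costs, which makes the method suitable for deployment on resource-constrained edge devices or cloud platforms. In the future it could be combined with multi-modal models to optimize cross-modal semantic representations and advance applications such as intelligent question answering and content recommendation.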

Limitations & Outlook

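The method currently relies mainly on linear filtering and assumes that the high-frequency bias subspace can be captured effectively by spectral analysis, so its effect may be limited for highly complex semantics or specialized domains. The filtering parameter (such as τ) needs tuning, and different models and tasks may behave differently. The experiments focus mainly on static text tasks and have not yet fully validated the method in complex scenarios such as generation and reasoning. Future work should explore non-linear filtering and dynamic adjustment strategies to improve adaptability and generalization.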

Plain Language Accessible to non-experts

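Imagine you work in a factory where many machines produce all kinds of products. Sometimes the factory turns out useless or redundant parts that take up space and hurt efficiency. To keep the factory running faster and more smoothly, you use a special sieve to screen out those useless parts. This is like filtering out, inside a model, the words that appear frequently but carry no real meaning.

In this factory, all the machines are constantly making and assembling parts. Some parts show up often but serve little purpose and only make production messier. We found that the model's internal structure is like the factory's operating manual, which holds the 'secret' of these useless parts. By analyzing the manual, we can find the parts that appear often but carry no real meaning and then screen them out with a special tool (EmbedFilter).

As a result, the production line becomes more efficient and product quality improves. The model's embedding space becomes cleaner and its semantics clearer, just as a factory without excess waste turns out purer, more valuable products. The method not only makes the factory run faster but also saves storage space and time, making the whole system smarter and more efficient.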

Abstract

Large language models exhibit impressive zero-shot capabilities across a wide range of downstream tasks. However, they struggle to function as off-the-shelf embedding models, leading to suboptimal performance on massive text embedding benchmarks. In this paper, we identify a potential cause underlying this deficiency. Our motivation stems from an unexpected observation: text embeddings tend to align with frequent but uninformative tokens when projected onto the vocabulary space. We argue that this excessive expression of high-frequency tokens suppresses the model's ability to capture nuanced semantics. To address this, we introduce EmbedFilter, a simple linear transformation designed to refine text embeddings derived directly from LLMs. Specifically, we uncover that the unembedding matrix within LLMs encodes a latent subspace that actively writes these frequent tokens into the embedding space. By filtering out this subspace, EmbedFilter suppresses the influence of high-frequency tokens, thereby enhancing semantic representations. As a compelling byproduct, this enables an inherent dimensionality reduction, lowering index storage and speeding up retrieval while fully preserving the refined embedding quality. Our experiments across multiple LLM backbones demonstrate that LLMs equipped with EmbedFilter achieve superior zero-shot downstream performance even with significantly reduced embedding dimensions. We hope our findings provide deeper insights into the mechanisms of LLM-based representations and inspire more principled designs to improve text embedding training. Our code is available at https://github.com/CentreChen/EmbFilter.

cs.CL cs.IR

