FASE: Fast Adaptive Semantic Entropy for Code Quality

TL;DR

FASE uses graph-based analysis of code embeddings to approximate code correctness, reporting an average 25% improvement in Spearman correlation over LLM-entailment-based semantic entropy at roughly 0.3% of its runtime cost.

cs.SE 2026-06-09

Shizhe Lin Ladan Tahvildari

Artificial Intelligence Code Generation Uncertainty Quantification Semantic Entropy Multi-Agent Systems

Key Findings

Methodology

FASE uses a code embedding model to map each generated code sample into a continuous semantic space. It builds a pairwise distance matrix from the cosine similarity between embeddings, then extracts a minimum spanning tree (MST) that retains the core semantic relationships among samples. The distribution of MST edge weights, estimated with Gaussian kernel density estimation, sets the neighborhood threshold for an adaptive density-based clustering step. The resulting clusters serve as semantic equivalence classes, so no LLM-based bidirectional entailment checks are needed. Entropy is then computed over the distribution of samples across these classes and used as a proxy for functional correctness. Replacing the LLM equivalence checks in this way substantially reduces computational overhead while preserving the quality of the uncertainty estimate, which makes the method suitable for large-scale multi-agent workflows.
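The last two steps of this pipeline (grouping samples into equivalence classes and computing entropy over them) can be illustrated with a minimal sketch. It is not the authors' implementation: it substitutes a simple cut of MST edges above a threshold (equivalent to single-linkage clustering) for the paper's density-based clustering, it uses empirical class frequencies as probabilities, and the function names, edge list, and threshold value are all hypothetical.

```python
import math
from collections import Counter

def clusters_from_mst(n_samples, mst_edges, threshold):
    """Group samples by keeping only MST edges with weight <= threshold.

    Assumption: this single-linkage cut stands in for the density-based
    clustering described in the text. Returns one cluster label per sample.
    """
    parent = list(range(n_samples))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j, weight in mst_edges:
        if weight <= threshold:
            parent[find(i)] = find(j)
    return [find(k) for k in range(n_samples)]

def semantic_entropy(labels):
    """Shannon entropy (in nats) of the empirical cluster distribution."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log(c / total) for c in counts.values())

# Hypothetical MST over four code samples: two tight pairs, one long bridge.
toy_edges = [(0, 1, 0.006), (1, 3, 0.78), (3, 2, 0.006)]

labels_split = clusters_from_mst(4, toy_edges, threshold=0.37)
entropy_split = semantic_entropy(labels_split)

labels_merged = clusters_from_mst(4, toy_edges, threshold=1.0)
entropy_merged = semantic_entropy(labels_merged)
```

With the lower threshold the bridge edge is cut, giving two equally sized classes and an entropy of ln 2; with the higher threshold all samples fall into one class and the entropy is zero, which would indicate that the samples agree semantically.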

Key Results

On HumanEval and BigCodeBench datasets, FASE outperformed existing semantic entropy methods with an average 25% increase in Spearman correlation and a 19% rise in ROCAUC scores. The method demonstrated strong capability to distinguish functionally correct code samples, aligning well with ground-truth correctness labels, and showed robustness across different embedding models such as All-MiniLM-L6-v2, GTE-ModernBERT, and Llama-Embed-Nemotron.
By eliminating expensive LLM entailment checks, FASE achieved a computational cost only 0.3% of traditional approaches, enabling near real-time uncertainty estimation in multi-agent systems. Experimental results confirmed that FASE maintains high predictive accuracy while substantially reducing runtime, making it practical for deployment in real-world software engineering workflows.
Across various datasets and models, the pairwise semantic distances captured by embeddings correlated strongly with code functionality. The MST-based abstraction preserved essential semantic structure, enabling effective clustering of functionally equivalent code samples. These findings validated the effectiveness of the embedding-driven graph analysis framework in estimating code correctness, with potential for broad application in automated code review and quality assurance.

Significance

This research addresses the critical bottleneck of high computational cost in uncertainty quantification for code generation. By introducing a graph-based semantic entropy that leverages pre-trained embeddings, FASE offers a scalable, cost-effective alternative to LLM-dependent methods. Its ability to accurately estimate code correctness without ground-truth test cases significantly enhances the reliability of autonomous software development systems. The approach bridges the gap between syntactic structural analysis and semantic fidelity, providing a practical tool for real-world multi-agent workflows, and advancing the state of the art in AI-driven software engineering.

Technical Contribution

FASE combines code embedding models, minimum spanning tree abstraction, and adaptive density clustering to produce a fast, scalable semantic entropy measure. Unlike prior methods that rely on costly LLM entailment, it works from semantic distances in embedding space, which keeps computation cheap. The adaptive clustering mechanism, based on the distribution of MST edge weights, adjusts to task-specific semantic structure, improving the accuracy of equivalence class estimation. The MST is intended to preserve the most informative semantic relationships while reducing computational complexity, enabling large-scale deployment in multi-agent systems.

Novelty

This work is the first to integrate graph-theoretic structures, specifically MST, with embedding-based semantic distances for code uncertainty estimation. It departs from traditional reliance on LLM entailment checks, offering a lightweight, model-agnostic alternative. The adaptive density clustering guided by MST edge weight distribution introduces a novel mechanism for dynamically determining semantic equivalence classes, significantly improving scalability and robustness. Overall, it provides a new paradigm for uncertainty quantification in code generation, combining efficiency with semantic fidelity.

Limitations

The effectiveness of FASE depends heavily on the quality of the underlying code embeddings; poor embedding representations may lead to inaccurate semantic distances and thus unreliable entropy estimates.
In highly complex or ambiguous code samples, the MST abstraction might oversimplify semantic relationships, potentially missing subtle correctness issues.
While computationally efficient, the method still requires pre-trained embedding models and clustering hyperparameters, which may need fine-tuning for different programming languages or domains.

Future Work

Future research will explore integrating static and dynamic analysis data to enhance semantic similarity measures. Developing domain-specific embedding models and fine-tuning clustering parameters could further improve accuracy. Additionally, extending the framework to support multiple programming languages and incorporating real-time feedback mechanisms will be key directions. The goal is to build a comprehensive, scalable uncertainty quantification system that can adapt to evolving codebases and diverse development environments, ultimately enabling fully autonomous, trustworthy software engineering workflows.

AI Executive Summary

In the rapidly evolving landscape of autonomous software development, ensuring the correctness and reliability of generated code remains a fundamental challenge. Traditional validation methods, such as executing test cases, are effective but become prohibitively expensive and less scalable as systems grow in complexity and size. Recent advances in large language models (LLMs) have enabled automated code generation, yet their outputs are often plagued by hallucinations and uncertainty, which can propagate errors across multi-agent workflows.

To address this, researchers have turned to uncertainty quantification techniques, notably semantic entropy, which measures the variability in code outputs based on their functional equivalence rather than mere textual similarity. However, existing semantic entropy methods rely heavily on costly LLM-based bidirectional entailment checks, limiting their scalability and real-time applicability in complex development pipelines.

This paper introduces FASE (Fast Adaptive Semantic Entropy), a novel approach that leverages code embeddings and graph analysis to efficiently estimate the functional correctness of generated code. Instead of expensive LLM inference, FASE maps code samples into a continuous semantic space using pre-trained embedding models like Qwen3-Embedding-8B. It computes pairwise semantic distances via cosine similarity, then constructs a dense distance matrix. From this matrix, a minimum spanning tree (MST) is extracted, capturing the most significant semantic relationships while discarding less informative connections.
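To make the distance-matrix and MST steps concrete, here is a minimal sketch using only the Python standard library. The 2-D vectors are toy stand-ins for real embeddings (which would come from a model such as the one named above), Prim's algorithm is one common choice of MST algorithm rather than something the summary specifies, and all function names are hypothetical.

```python
import math

def cosine_distance(u, v):
    """1 - cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def distance_matrix(embeddings):
    """Dense symmetric matrix of pairwise cosine distances."""
    n = len(embeddings)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = cosine_distance(embeddings[i], embeddings[j])
            dist[i][j] = dist[j][i] = d
    return dist

def minimum_spanning_tree(dist):
    """Prim's algorithm on a dense matrix; returns (i, j, weight) edges."""
    n = len(dist)
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known edge into each vertex
    parent = [-1] * n       # tree endpoint of that cheapest edge
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the cheapest vertex not yet in the tree.
        u = min((k for k in range(n) if not in_tree[k]), key=lambda k: best[k])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u, dist[parent[u]][u]))
        for v in range(n):
            if not in_tree[v] and dist[u][v] < best[v]:
                best[v] = dist[u][v]
                parent[v] = u
    return edges

# Toy "embeddings": samples 0 and 1 point one way, samples 2 and 3 another.
toy_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
dist = distance_matrix(toy_embeddings)
mst_edges = minimum_spanning_tree(dist)
mst_weights = sorted(w for _, _, w in mst_edges)
```

For n samples the MST keeps n - 1 of the n(n - 1)/2 pairwise distances. In this toy case it keeps the two short within-group edges plus a single long edge bridging the groups, which is the kind of structure a threshold on edge weights can then separate.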

The core innovation lies in using the distribution of MST edge weights to adaptively determine clustering thresholds through Gaussian kernel density estimation. This dynamic threshold guides a density-based clustering algorithm, which groups code samples into semantic equivalence classes without relying on explicit LLM equivalence checks. The resulting class distribution enables the calculation of semantic entropy, providing a measure of uncertainty that correlates well with the functional correctness of code samples.
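One way such an adaptive threshold could be derived is sketched below. The rule used here (place the threshold at the deepest valley of the estimated edge-weight density) is an assumption made for illustration, since the summary does not state the exact selection rule; the function names, the grid search, the fallback behaviour, and the explicit bandwidth in the example are likewise hypothetical.

```python
import math
import statistics

def gaussian_kde(samples, bandwidth):
    """Return a 1-D Gaussian kernel density estimate as a callable."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density

def adaptive_threshold(edge_weights, bandwidth=None, grid_size=200):
    """Pick a distance threshold at the deepest valley of the weight density.

    Assumption: short (within-class) and long (between-class) MST edges form
    separate density modes. If no interior valley exists, the largest weight
    is returned, so every sample ends up in a single class.
    """
    lo, hi = min(edge_weights), max(edge_weights)
    if hi == lo:
        return hi  # all edges identical: nothing to separate
    if bandwidth is None:
        # Silverman's rule of thumb as a default bandwidth.
        bandwidth = 1.06 * statistics.stdev(edge_weights) * len(edge_weights) ** -0.2
    density = gaussian_kde(edge_weights, bandwidth)
    step = (hi - lo) / (grid_size - 1)
    grid = [lo + k * step for k in range(grid_size)]
    values = [density(x) for x in grid]
    # Interior grid points that are strict local minima of the density.
    valleys = [
        k for k in range(1, grid_size - 1)
        if values[k] < values[k - 1] and values[k] < values[k + 1]
    ]
    if not valleys:
        return hi
    deepest = min(valleys, key=lambda k: values[k])
    return grid[deepest]

# Hypothetical MST edge weights: four short edges and two long ones.
toy_weights = [0.01, 0.02, 0.015, 0.02, 0.70, 0.75]
threshold = adaptive_threshold(toy_weights, bandwidth=0.05)
flat_threshold = adaptive_threshold([0.3, 0.3, 0.3])
```

The bandwidth matters: a rule-of-thumb bandwidth on strongly bimodal weights can smooth the valley away, which is why the example passes a narrower one explicitly. This is one instance of the hyperparameter sensitivity noted under Limitations.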

Extensive experiments on benchmark datasets such as HumanEval and BigCodeBench demonstrate that FASE outperforms state-of-the-art semantic entropy methods, achieving an average 25% improvement in Spearman correlation and a 19% increase in ROCAUC scores when predicting code correctness. Importantly, FASE's computational overhead is only about 0.3% of traditional methods, making it highly suitable for real-time, large-scale multi-agent workflows.
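For readers unfamiliar with the two evaluation metrics, the following sketch shows how an entropy score could be compared against Pass@1. The numbers are invented for illustration and are not results from the paper; negating entropy so that higher means "more confident" and binarizing Pass@1 at 0.5 for the ROC AUC are both assumptions, and the function names are hypothetical.

```python
import math

def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def pearson(x, y):
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))

def roc_auc(scores, labels):
    """Probability that a random positive outscores a random negative."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(
        1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg
    )
    return wins / (len(pos) * len(neg))

# Invented per-task values, for illustration only.
entropies = [0.0, 0.69, 1.10, 0.0, 1.39]
pass_at_1 = [1.0, 0.6, 0.2, 0.9, 0.0]

confidence = [-e for e in entropies]             # low entropy = high confidence
rho = spearman(confidence, pass_at_1)
is_correct = [1 if p >= 0.5 else 0 for p in pass_at_1]
auc = roc_auc(confidence, is_correct)
```

Spearman correlation only checks whether tasks with lower entropy tend to have higher Pass@1, regardless of scale, while ROC AUC checks how well the score separates mostly-correct tasks from mostly-incorrect ones.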

The significance of this work extends beyond efficiency gains. By providing a reliable, scalable, and cost-effective uncertainty estimation framework, FASE enhances the trustworthiness of autonomous code generation systems. It bridges the gap between structural syntactic analysis and semantic fidelity, offering a practical solution for real-world software engineering challenges. Future directions include integrating multi-modal data, refining embedding models, and extending the framework to support diverse programming languages, paving the way for fully autonomous, trustworthy software development pipelines.

Deep Dive

Abstract

Multi-agent code generation offers a promising paradigm for autonomous software development by simulating the human software engineering lifecycle. However, system reliability remains hindered by LLM hallucinations and error propagation across interacting agents. While semantic entropy provides a principled way to quantify uncertainty without ground-truth answers, current methods often rely on costly LLM-driven equivalence checks. In this work, we introduce Fast Adaptive Semantic Entropy (FASE), a novel metric that approximates functional correctness based on the minimum spanning tree of structural and semantic dissimilarity graphs. Evaluations on HumanEval and BigCodeBench demonstrate that FASE outperforms state-of-the-art semantic entropy by LLM entailment, achieving a 25% average improvement in Spearman correlation and a 19% increase in ROCAUC score against Pass@1 from ground-truth test cases when using the Qwen3-Embedding-8B model. Furthermore, by eliminating costly LLM-driven equivalence evaluation, FASE incurs negligible computational overhead, requiring only approximately 0.3% of the runtime cost of traditional semantic entropy approaches. These results position FASE as a practical, cost-effective solution for optimizing uncertainty quantification in real-world multi-agent workflows.

cs.SE cs.AI cs.MA
