Discovering a Shared Logical Subspace: Steering LLM Logical Reasoning via Alignment of Natural-Language and Symbolic Views

TL;DR

Discovering a shared logical subspace in LLMs improves logical reasoning accuracy by up to 11 percentage points via alignment of natural-language and symbolic views.

cs.CL · Advanced · 2026-04-22
Feihao Fang My T. Thai Yuanyuan Lei
Large Language Models Logical Reasoning Symbolic Reasoning Subspace Alignment Training-Free Method

Key Findings

Methodology

The paper introduces a novel approach that employs Canonical Correlation Analysis (CCA) on the residual activations of natural-language and symbolic-language reasoning chains to learn a low-dimensional subspace with maximum cross-view correlation. This training-free method steers LLMs' reasoning chains along this logical subspace, leveraging complementary reasoning signals from both views.
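The CCA step can be sketched in plain NumPy. This is an illustrative toy, not the authors' code: the matrices `X_nl` and `X_sym` are random stand-ins for the paired natural-language and symbolic residual activations, and the dimensions and subspace size `k` are assumptions.

```python
# Illustrative sketch (not the authors' released code): CCA over paired
# activations from two "views". X_nl and X_sym are random stand-ins for
# the paper's natural-language and symbolic residual activations.
import numpy as np

def cca_subspace(X, Y, k, reg=1e-6):
    """Plain-NumPy CCA: top-k canonical correlations and projection bases."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    Sxx = X.T @ X / n + reg * np.eye(X.shape[1])   # view-1 covariance (regularized)
    Syy = Y.T @ Y / n + reg * np.eye(Y.shape[1])   # view-2 covariance (regularized)
    Sxy = X.T @ Y / n                              # cross-covariance

    def inv_sqrt(S):
        w, V = np.linalg.eigh(S)
        return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

    Kx, Ky = inv_sqrt(Sxx), inv_sqrt(Syy)
    U, s, Vt = np.linalg.svd(Kx @ Sxy @ Ky)        # singular values = canonical correlations
    return s[:k], Kx @ U[:, :k], Ky @ Vt.T[:, :k]

rng = np.random.default_rng(0)
n, d, k = 500, 32, 4                               # tokens, hidden size, subspace dim (toy)
shared = rng.normal(size=(n, k))                   # latent "shared logic" signal
X_nl = shared @ rng.normal(size=(k, d)) + 0.1 * rng.normal(size=(n, d))
X_sym = shared @ rng.normal(size=(k, d)) + 0.1 * rng.normal(size=(n, d))

corrs, W_nl, W_sym = cca_subspace(X_nl, X_sym, k)
print(np.round(corrs, 2))                          # near 1.0: a strongly shared subspace
```

When the two views really do carry a common signal, the leading canonical correlations come out close to 1, which is the evidence the paper's hypothesis predicts.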

Key Results

  • In experiments across four logical reasoning benchmarks, the proposed method improved accuracy by up to 11 percentage points; for example, it raised Phi-3-Mini's accuracy from 87.2% to 93.2% on the PrOntoQA dataset.
  • Compared to Greedy-CoT, LSS-CoT improved accuracy from 51.7% to 61.1% on the FOLIO dataset using the Llama-3.1-8B model.
  • On PrOntoQA and ProofWriter datasets, LSS-CoT matched or slightly outperformed SC-3 using the Gemma-2-9B model.

Significance

This research significantly enhances multi-step logical reasoning accuracy by discovering and utilizing a shared logical subspace within LLMs. It holds substantial academic significance by advancing the integration of natural language processing and symbolic reasoning, and offers new insights for industrial applications requiring multi-step decision-making, such as mathematics, scientific analysis, planning, and coding.

Technical Contribution

The technical contribution lies in the novel use of CCA to discover a shared logical subspace within LLMs and leveraging this subspace for reasoning guidance. This method does not rely on additional training or external symbolic solvers, providing a new engineering possibility to enhance reasoning performance without altering model weights.

Novelty

This study is the first to propose discovering a shared logical subspace within LLMs through the alignment of natural-language and symbolic views. This innovation contrasts sharply with existing single-view heuristic methods and those relying on external symbolic components, offering a training-free approach to reasoning enhancement.

Limitations

  • The method may underperform in complex reasoning tasks, especially when alignment between natural-language and symbolic expressions is weak.
  • Its generalizability might be limited due to reliance on existing model architectures and datasets.
  • The method's effectiveness may be less pronounced in non-logical reasoning tasks.

Future Work

Future research directions include exploring the application of this method to a broader range of reasoning tasks and further optimizing the subspace learning process. Additionally, investigating how to implement this method on larger-scale models and datasets is a promising area for exploration.

AI Executive Summary

In the field of natural language processing, large language models (LLMs) still face challenges in multi-step logical reasoning. Existing approaches either refine the reasoning chain purely in natural language form or rely on external symbolic solvers. However, these methods fail to fully exploit the potential shared logical subspace within LLMs.

This paper introduces a novel approach that employs Canonical Correlation Analysis (CCA) on the residual activations of natural-language and symbolic-language reasoning chains to learn a low-dimensional subspace with maximum cross-view correlation. This training-free method steers LLMs' reasoning chains along this logical subspace, leveraging complementary reasoning signals from both views.

Experimental results demonstrate that the proposed method improves accuracy by up to 11 percentage points across four logical reasoning benchmarks; for example, it raises Phi-3-Mini's accuracy from 87.2% to 93.2% on the PrOntoQA dataset. Moreover, the method generalizes well to out-of-domain reasoning problems.

This research significantly enhances multi-step logical reasoning accuracy by discovering and utilizing a shared logical subspace within LLMs. It holds substantial academic significance by advancing the integration of natural language processing and symbolic reasoning, and offers new insights for industrial applications requiring multi-step decision-making, such as mathematics, scientific analysis, planning, and coding.

However, the method may underperform in complex reasoning tasks, especially when alignment between natural-language and symbolic expressions is weak. Future research directions include exploring the application of this method to a broader range of reasoning tasks and further optimizing the subspace learning process. Additionally, investigating how to implement this method on larger-scale models and datasets is a promising area for exploration.

Deep Analysis

Background

Large language models (LLMs) have made significant strides in the field of natural language processing, particularly in text generation and understanding tasks. Despite their success in many areas, LLMs still struggle with complex multi-step logical reasoning problems. Traditional approaches often rely on optimizing reasoning chains in natural language form or utilizing external symbolic solvers, but these methods fail to fully leverage the potential capabilities within LLMs. As the demand for integrating natural language processing with symbolic reasoning grows, discovering and utilizing a shared logical subspace within LLMs has become a crucial research direction.

Core Problem

The core problem with LLMs in multi-step logical reasoning is their difficulty in establishing effective alignment between natural-language and symbolic-language expressions. This lack of alignment leads to information loss and incorrect inferences during the reasoning process. Particularly in tasks involving complex rules and multi-step reasoning, existing methods often fail to provide sufficient accuracy and robustness. Therefore, discovering and leveraging a shared logical subspace within LLMs to enhance their reasoning capabilities is a pressing issue that needs to be addressed.

Innovation

The core innovations of this paper include:

  • Proposing to discover a shared logical subspace within LLMs through the alignment of natural-language and symbolic views.
  • Utilizing Canonical Correlation Analysis (CCA) on the residual activations of natural-language and symbolic-language reasoning chains to learn a low-dimensional subspace.
  • Maximizing cross-view correlation to capture the logical reasoning capabilities shared across views.
  • Steering LLMs' reasoning chains along this logical subspace during generation, leveraging complementary reasoning signals without additional training.

Methodology

The methodology involves several key steps:

  • Canonical Correlation Analysis (CCA): analyzing the residual activations of natural-language and symbolic-language reasoning chains to learn a low-dimensional subspace.
  • Subspace steering: guiding LLMs' reasoning chains along this logical subspace during generation, leveraging complementary reasoning signals.
  • Training-free approach: no additional training is required; reasoning guidance is achieved by linearly amplifying the projection of each token's activation onto the learned subspace.
  • Experimental validation: experiments across four logical reasoning benchmarks evaluate the method's effectiveness.
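The "linearly amplifying the projection" step can be sketched as follows. This is a minimal illustration, not the authors' implementation: the orthonormal basis `W`, the steering strength `alpha`, and the exact update rule are all assumptions for exposition.

```python
# Illustrative sketch of subspace steering: amplify each token's
# projection onto the learned subspace. W, alpha, and the update rule
# are assumptions for illustration, not the authors' implementation.
import numpy as np

def steer(h, W, alpha=1.5):
    """Scale the component of activation h inside span(W) by alpha;
    W must have orthonormal columns. alpha = 1 leaves h unchanged."""
    proj = W @ (W.T @ h)              # projection onto the logical subspace
    return h + (alpha - 1.0) * proj   # amplify the in-subspace component only

rng = np.random.default_rng(1)
d, k = 16, 3                          # toy hidden size and subspace dimension
W, _ = np.linalg.qr(rng.normal(size=(d, k)))  # random orthonormal basis (stand-in)
h = rng.normal(size=d)                # one token's activation

h_steered = steer(h, W, alpha=2.0)
# The in-subspace norm doubles while the orthogonal component is untouched.
print(round(float(np.linalg.norm(W.T @ h_steered) / np.linalg.norm(W.T @ h)), 6))  # → 2.0
```

Because the update only rescales the in-subspace component, it leaves everything orthogonal to the subspace intact, which is what makes such steering cheap and weight-free.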

Experiments

The experimental design includes validation across four logical reasoning benchmarks, including the FOLIO, PrOntoQA, and ProofWriter datasets. Models used include Meta-Llama-3.1-8B-Instruct, Llama-3.2-3B-Instruct, and Llama-2-13B-Chat, as well as Phi-3-Mini and Gemma-2-9B. The experiments compare against baseline methods such as Greedy-CoT, 3-shot-CoT, and SC-3, evaluating accuracy improvements across different models and datasets. Key hyperparameters include the dimensionality of the subspace and the steering strength, with ablation studies conducted to analyze the contribution of each component.

Results

Experimental results show that the proposed method significantly improves accuracy across multiple logical reasoning benchmarks. For instance, on the PrOntoQA dataset, Phi-3-Mini's accuracy increased from 87.2% to 93.2%. On the FOLIO dataset, using the Llama-3.1-8B model, accuracy improved from 51.7% to 61.1%. Additionally, LSS-CoT matched or slightly outperformed SC-3 on the PrOntoQA and ProofWriter datasets using the Gemma-2-9B model. These results demonstrate the method's ability to effectively utilize the shared logical subspace within LLMs to enhance reasoning performance.

Applications

The method has direct applications in various fields, including mathematics, scientific analysis, planning, and coding tasks that require multi-step decision-making. By enhancing LLMs' logical reasoning capabilities, the method provides higher accuracy and robustness in these areas. Additionally, the method is training-free and applicable to existing LLM architectures, offering low computational overhead.

Limitations & Outlook

Despite the method's impressive performance across multiple benchmarks, it may underperform in complex reasoning tasks, especially when alignment between natural-language and symbolic expressions is weak. Additionally, its generalizability might be limited due to reliance on existing model architectures and datasets. The method's effectiveness may also be less pronounced in non-logical reasoning tasks. Future research could explore how to apply this method to a broader range of reasoning tasks and further optimize the subspace learning process.

Plain Language (Accessible to non-experts)

Imagine a large factory with many different machines, each with its own task. Large language models are like this factory, responsible for handling various language tasks. However, when it comes to complex multi-step reasoning, the machines in the factory might make mistakes due to a lack of coordination. The method in this paper is like a clever scheduler who can find commonalities between different machines and use these commonalities to improve overall efficiency. By analyzing the reasoning processes in natural language and symbolic language, the scheduler finds a hidden channel that allows different machines to work on the same line, thereby improving reasoning accuracy. This channel is the shared logical subspace. The scheduler doesn't need to change the internal structure of the machines; it just needs to guide them along this channel at critical moments to significantly boost the factory's productivity.

ELI14 (Explained like you're 14)

Hey there, young friends! Did you know that large language models are like super-smart robots that help us handle all sorts of language problems? But sometimes, they mess up when solving complex logic problems. Imagine you're playing a puzzle game that requires multi-step reasoning, and your robot assistant keeps taking the wrong path. Isn't that annoying?

Well, here's some good news! Scientists have found a new way to make this robot smarter! By analyzing the robot's thought process, they've discovered a hidden 'wisdom channel.' This channel is like a shortcut in the game that helps the robot find the answer faster and more accurately.

What's even cooler is that this method doesn't require a major overhaul of the robot. Just a gentle nudge at the right moment can make it perform better. It's like giving your little helper a hint in the game, letting it know which way to go.

So, next time you're playing a puzzle game, don't forget to give your robot assistant a little hint to make it follow the 'wisdom channel.' That way, you can win the game faster together!

Glossary

Large Language Model (LLM)

A large language model is a deep learning-based natural language processing model capable of handling and generating natural language text.

In this paper, LLMs are used for multi-step logical reasoning.

Canonical Correlation Analysis (CCA)

Canonical Correlation Analysis is a statistical method used to analyze the correlation between two sets of multivariate data.

CCA is used in this paper to analyze the residual activations of natural-language and symbolic-language reasoning chains.

Residual Activation

Residual activation refers to the difference between the output and input of each layer in a neural network.

The paper analyzes residual activations to discover the shared logical subspace.

Logical Subspace

A logical subspace is a low-dimensional space within LLMs that captures logical reasoning capabilities.

The paper learns a shared logical subspace through CCA.

Reasoning Chain

A reasoning chain refers to the sequence of steps from premises to conclusions in logical reasoning.

The paper studies reasoning chains in both natural language and symbolic language.

Training-Free Method

A training-free method is an algorithm or technique that does not require additional training of the model.

The paper's method achieves reasoning guidance in a training-free manner.

PrOntoQA

PrOntoQA is a dataset used to test logical reasoning capabilities.

The paper validates its method on the PrOntoQA dataset.

FOLIO

FOLIO is a dataset containing natural language stories and first-order logic formalizations.

The paper conducts experiments on the FOLIO dataset.

ProofWriter

ProofWriter is a dataset providing multi-step reasoning questions paired with natural language proofs.

The paper conducts experiments on the ProofWriter dataset.

Greedy-CoT

Greedy-CoT is a zero-shot reasoning method that generates reasoning chains through greedy decoding.

The paper uses Greedy-CoT as a baseline method for comparison.

Open Questions (Unanswered questions from this research)

  1. How can this method be applied to a broader range of reasoning tasks? Currently, the method is primarily validated on logical reasoning tasks and has not been widely tested on other types of reasoning tasks. Future research could explore how to apply this method across different domains.
  2. What is the method's effectiveness in non-logical reasoning tasks? While the method performs well in logical reasoning tasks, its effectiveness in non-logical reasoning tasks may be less pronounced. Future research could explore how to improve this method to accommodate a wider range of tasks.
  3. How can the subspace learning process be further optimized? Currently, the method relies on existing model architectures and datasets. Future research could explore how to implement this method on larger-scale models and datasets.
  4. How does the method perform in complex reasoning tasks? The method may underperform when alignment between natural-language and symbolic expressions is weak. Future research could explore how to improve the method's performance in these tasks.
  5. What is the method's generalizability? Due to reliance on existing model architectures and datasets, the method's generalizability might be limited. Future research could explore how to enhance the method's generalizability.

Applications

Immediate Applications

Mathematical Computation

By enhancing LLMs' logical reasoning capabilities, this method can provide higher accuracy and robustness in mathematical computation tasks, helping solve complex mathematical problems.

Scientific Analysis

In scientific analysis tasks, this method can help researchers perform data reasoning and conclusion derivation more accurately, improving the efficiency of scientific research.

Planning and Coding

In planning and coding tasks, this method can help developers make multi-step decisions more efficiently, improving the quality and speed of software development.

Long-term Vision

Intelligent Assistants

By enhancing LLMs' logical reasoning capabilities, this method can be applied to the development of intelligent assistants, enabling them to better understand and solve users' problems.

Automated Decision Systems

This method can be applied to the development of automated decision systems, improving the accuracy and efficiency of decision-making, and advancing automation technology.

Abstract

Large Language Models (LLMs) still struggle with multi-step logical reasoning. Existing approaches either purely refine the reasoning chain in natural language form or attach a symbolic solver as an external module. In this work, we instead ask whether LLMs contain a shared internal logical subspace that simultaneously aligns natural-language and symbolic-language views of the reasoning process. Our hypothesis is that this logical subspace captures logical reasoning capabilities in LLMs that are shared across views while remaining independent of surface forms. To verify this, we employ Canonical Correlation Analysis on the paired residual activations from natural-language and symbolic-language reasoning chains, learning a low-dimensional subspace with maximum cross-view correlation. Furthermore, we design a training-free approach that steers LLMs' reasoning chains along this logical subspace, thereby leveraging the complementary reasoning signals from both views. Experiments on four logical reasoning benchmarks demonstrate the effectiveness of our approach, improving accuracy by up to 11 percentage points and generalizing well on out-of-domain problems.

References (20)

  • H. Hotelling (1936). Relations Between Two Sets of Variates.
  • Gemma Team: Morgane Riviere, Shreya Pathak, Pier Giuseppe Sessa et al. (2024). Gemma 2: Improving Open Language Models at a Practical Size.
  • Abulhair Saparov, He He (2022). Language Models Are Greedy Reasoners: A Systematic Formal Analysis of Chain-of-Thought.
  • Oyvind Tafjord, Bhavana Dalvi, Peter Clark (2020). ProofWriter: Generating Implications, Proofs, and Abductive Statements over Natural Language.
  • M. Raghu, J. Gilmer, J. Yosinski et al. (2017). SVCCA: Singular Vector Canonical Correlation Analysis for Deep Learning Dynamics and Interpretability.
  • Liangming Pan, Alon Albalak, Xinyi Wang et al. (2023). Logic-LM: Empowering Large Language Models with Symbolic Solvers for Faithful Logical Reasoning.
  • A. M. Turner, Lisa Thiergart, Gavin Leech et al. (2023). Steering Language Models With Activation Engineering.
  • Nina Rimsky, Nick Gabrieli, Julia Schulz et al. (2023). Steering Llama 2 via Contrastive Activation Addition.
  • Takeshi Kojima, S. Gu, Machel Reid et al. (2022). Large Language Models are Zero-Shot Reasoners.
  • Yu Pei, Yongping Du, Xingnan Jin (2025). FoVer: First-Order Logic Verification for Natural Language Reasoning.
  • Jieyi Long (2023). Large Language Model Guided Tree-of-Thought.
  • Fangzhi Xu, Qika Lin, Jiawei Han et al. (2023). Are Large Language Models Really Good Logical Reasoners? A Comprehensive Evaluation and Beyond.
  • Yuanyuan Lei, Ruihong Huang (2024). Boosting Logical Fallacy Reasoning in LLMs via Logical Structure Tree.
  • Yujun Zhou, Jiayi Ye, Zipeng Ling et al. (2025). Dissecting Logical Reasoning in LLMs: A Fine-Grained Evaluation and Supervision Study.
  • Hugo Touvron, Louis Martin, Kevin R. Stone et al. (2023). Llama 2: Open Foundation and Fine-Tuned Chat Models.
  • Tom B. Brown, Benjamin Mann, Nick Ryder et al. (2020). Language Models are Few-Shot Learners.
  • Rowan Zellers, Ari Holtzman, Yonatan Bisk et al. (2019). HellaSwag: Can a Machine Really Finish Your Sentence?
  • Hanmeng Liu, Ruoxi Ning, Zhiyang Teng et al. (2023). Evaluating the Logical Reasoning Ability of ChatGPT and GPT-4.
  • G. Sutcliffe, C. Suttner (1994). The TPTP Problem Library.
  • Fengxiang Cheng, Haoxuan Li, Fenrong Liu et al. (2024). Empowering LLMs with Logical Reasoning: A Comprehensive Survey.