DeepSWIP: Quotient-WMC Counterfactuals for Neural Probabilistic Logic Programs

TL;DR

DeepSWIP uses neural materialization and quotient WMC for exact single-world counterfactual inference in neural probabilistic logic programs, with a 2.14× inference speedup over a DeepTwin construction on MPI3D.

cs.AI 2026-06-19

Saimun Habib, Vaishak Belle, Fengxiang He

Neurosymbolic Systems, Counterfactual Reasoning, Probabilistic Logic, Weighted Model Counting, Causal Inference

Key Findings

Methodology

DeepSWIP uses neural materialization to convert neural predicates into categorical probabilistic choices, which turns a neural probabilistic logic program into a standard ProbLog model. Each neural network is evaluated once under fixed parameters and a fixed input context, and the resulting probability distribution replaces the neural predicate as a categorical choice. SWIP (Single World Intervention Program) transformations are then applied: rules defining intervened atoms are removed, downstream rules are redirected to the fixed intervention values, and the modified program is used for counterfactual inference. The core computation is a quotient of weighted model counts (WMC) over the transformed program, which expresses the counterfactual probability as $\Phi^{x}_{Y,E}(p) / \Phi^{x}_{E}(p)$. The framework assumes finite grounding and unique supported models, under which the result is exact relative to the learned materialized functional causal model (FCM). The paper further analyzes active neural probabilities, intervention cleaning, calibration sensitivity, and rare-evidence instability, showing how neural prediction errors propagate through logical inference.
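The pipeline described above can be sketched end to end on a toy program. This is a minimal illustration under stated assumptions, not the authors' implementation: the network is a hypothetical lookup stub (`wet_road_net`), the program, atom names, and probabilities are invented for the example, rules are assumed acyclic and listed in dependency order, and WMC is done by brute-force enumeration rather than by a compiled circuit. Evidence and query are both read off the single transformed program, as the summary describes.

```python
from itertools import product

# Hypothetical stand-in for a trained network evaluated on one fixed input.
def wet_road_net(image_id):
    return {"img_17": 0.8}[image_id]  # P(wet | image) under fixed parameters

def materialize(facts, neural):
    """Evaluate each neural predicate once and store it as an ordinary choice."""
    out = dict(facts)
    for atom, (net, context) in neural.items():
        out[atom] = net(context)
    return out

def swip(rules, do):
    """Remove every rule whose head is an intervened atom."""
    return [rule for rule in rules if rule[0] not in do]

def model(rules, world, do):
    """Unique model of an acyclic program; rules must be in dependency order."""
    val = dict(world)
    val.update(do)  # downstream rules read the fixed intervention value
    for head, pos, neg in rules:
        body = all(val.get(a, False) for a in pos) and not any(
            val.get(a, False) for a in neg
        )
        val[head] = val.get(head, False) or body
    return val

def wmc(facts, rules, do, cond):
    """Sum the weights of all total choices whose model satisfies cond."""
    names = sorted(facts)
    total = 0.0
    for bits in product([True, False], repeat=len(names)):
        world = dict(zip(names, bits))
        weight = 1.0
        for n in names:
            weight *= facts[n] if world[n] else 1.0 - facts[n]
        m = model(rules, world, do)
        if all(m.get(a, False) == v for a, v in cond.items()):
            total += weight
    return total

def counterfactual(facts, rules, do, query, evidence):
    """Quotient of two weighted model counts over one transformed program."""
    cleaned = swip(rules, do)
    numerator = wmc(facts, cleaned, do, {**evidence, **query})
    denominator = wmc(facts, cleaned, do, evidence)
    return numerator / denominator

# Toy program (all atoms and numbers are illustrative assumptions).
facts = materialize(
    {"u_brake": 0.3, "u_ice": 0.1, "u_noise": 0.2},
    {"wet": (wet_road_net, "img_17")},
)
rules = [
    ("brake", ["u_brake"], []),
    ("alarm", ["wet"], []),
    ("alarm", ["u_noise"], []),
    ("slip", ["wet"], ["brake"]),
    ("slip", ["u_ice"], []),
    ("crash", ["slip"], []),
]

cf_no_brake = counterfactual(facts, rules, {"brake": False}, {"crash": True}, {"alarm": True})
cf_brake = counterfactual(facts, rules, {"brake": True}, {"crash": True}, {"alarm": True})
print(cf_no_brake, cf_brake)
```

Under `do(brake=True)` the rule `slip :- wet, not brake` can never fire, so the materialized probability of `wet` cancels out of the quotient. This is a small instance of the intervention cleaning the text refers to.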

Key Results

On the MPI3D dataset, DeepSWIP agreed with a DeepTwin construction on all 12,000 queries, which validates the transformation. It also delivered a 2.14× inference speedup (from 7.65 ms to 3.57 ms per query) by avoiding the endogenous duplication of the Twin construction. The experiments further showed that neural calibration errors materially affect counterfactual estimates, and that intervention cleaning removes mechanisms made redundant by the intervention. In the SUMO traffic scenario, neural calibration degradation biased plug-in estimates; a correctly scoped randomized-policy AIPW estimator removed most of the first-order bias for the population mean and ATE estimands. Together, the results support the method's correctness and efficiency on the MPI3D benchmark and in a simulated traffic setting.

The algebraic formulation of counterfactuals as WMC quotients makes intervention effects, calibration, and rare-event sensitivity easier to interpret. The theoretical analysis shows that only active neural probabilities (those that survive intervention pruning) affect the counterfactual outcome, which enables precise sensitivity analysis and bias correction. The experiments indicate that DeepSWIP speeds up inference while matching the DeepTwin results, making it a promising tool for scalable causal reasoning in neural-symbolic systems.
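The sensitivity claim can be made concrete with the ordinary quotient rule. The following is a sketch in notation introduced here (the symbol $Q$ and the index $i$ are assumptions, not taken from the paper), for a single materialized probability $p_i$:

```latex
Q(p) = \frac{\Phi^{x}_{Y,E}(p)}{\Phi^{x}_{E}(p)},
\qquad
\frac{\partial Q}{\partial p_i}
  = \frac{1}{\Phi^{x}_{E}(p)}
    \left(
      \frac{\partial \Phi^{x}_{Y,E}}{\partial p_i}
      - Q(p)\,\frac{\partial \Phi^{x}_{E}}{\partial p_i}
    \right)
```

If $p_i$ appears in neither polynomial after intervention pruning, both partial derivatives vanish and the counterfactual is insensitive to it. The prefactor $1/\Phi^{x}_{E}(p)$ also suggests why a small evidence probability can inflate the effect of a calibration error.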

Significance

This work advances the field of neurosymbolic reasoning by bridging neural perception and causal inference through an algebraic, model-counting-based approach. By formalizing counterfactuals as quotients of WMC polynomials, it offers a transparent, mathematically grounded framework that addresses longstanding challenges in integrating neural uncertainty with symbolic causal semantics. The ability to perform exact, efficient, and interpretable counterfactual reasoning opens new avenues for deploying AI systems in safety-critical applications such as autonomous driving, medical diagnosis, and decision support, where understanding causal effects and counterfactual scenarios is crucial. Moreover, the theoretical insights into calibration sensitivity and rare evidence instability contribute to the broader understanding of neural-symbolic robustness and reliability.

Technical Contribution

The paper introduces neural materialization, transforming neural predicates into categorical choices within ProbLog, enabling standard causal transformations. It formalizes counterfactual inference as a quotient of WMC evaluations, leveraging algebraic model counting (AMC) to unify probabilistic inference, gradient computation, and sensitivity analysis. The framework rigorously proves that, under finite grounding and unique supported models, the counterfactual probability is exactly represented by the quotient of two multilinear WMC polynomials. It also provides algebraic results for intervention cleaning, active neural set identification, and local calibration sensitivity, linking neural prediction errors to counterfactual bias. The integration of SWIP with AMC offers a scalable, interpretable, and precise method for causal reasoning in neural-symbolic systems.
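The way algebraic model counting can serve both inference and gradient computation can be illustrated by evaluating one sum of products over two semirings. The sketch below is an assumption-laden toy: the class and function names are hypothetical, the formula and probabilities are invented, and worlds are enumerated directly, whereas practical AMC systems evaluate a compiled circuit. The gradient semiring carries (value, derivative) pairs and applies the product rule in `times`.

```python
from itertools import product

class ProbSemiring:
    """Ordinary probabilities: labels are p or 1 - p."""
    zero, one = 0.0, 1.0

    @staticmethod
    def plus(a, b):
        return a + b

    @staticmethod
    def times(a, b):
        return a * b

    @staticmethod
    def label(p, positive, is_target):
        return p if positive else 1.0 - p

class GradSemiring:
    """Pairs (value, d value / d target probability)."""
    zero, one = (0.0, 0.0), (1.0, 0.0)

    @staticmethod
    def plus(a, b):
        return (a[0] + b[0], a[1] + b[1])

    @staticmethod
    def times(a, b):
        # Product rule on the derivative component.
        return (a[0] * b[0], a[0] * b[1] + a[1] * b[0])

    @staticmethod
    def label(p, positive, is_target):
        d = 1.0 if is_target else 0.0
        return (p, d) if positive else (1.0 - p, -d)

def amc(facts, formula, semiring, target=None):
    """Sum over satisfying worlds of the product of literal labels."""
    names = sorted(facts)
    total = semiring.zero
    for bits in product([True, False], repeat=len(names)):
        world = dict(zip(names, bits))
        if not formula(world):
            continue
        weight = semiring.one
        for n in names:
            weight = semiring.times(
                weight, semiring.label(facts[n], world[n], n == target)
            )
        total = semiring.plus(total, weight)
    return total

# Illustrative query: slip holds if the road is wet or icy.
facts = {"wet": 0.8, "ice": 0.1}
slip = lambda w: w["wet"] or w["ice"]

prob = amc(facts, slip, ProbSemiring)
value, d_wet = amc(facts, slip, GradSemiring, target="wet")
print(prob, value, d_wet)
```

The same `amc` routine is reused unchanged; only the semiring differs. Applying it to the numerator and denominator of the quotient would give the ingredients for a sensitivity calculation.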

Novelty

This work is the first to formulate neural probabilistic logic program counterfactuals explicitly as a quotient of WMC polynomials, moving beyond Twin network duplication. It innovatively combines neural materialization with SWIP transformations to achieve exact, single-world counterfactual inference efficiently. The algebraic model counting perspective unifies intervention, calibration, and rare evidence analysis within a single mathematical framework, providing new theoretical insights and practical tools. These contributions significantly extend the capabilities of existing probabilistic logic programming and neural-symbolic reasoning methods, enabling scalable, interpretable, and precise causal inference in complex neural systems.

Limitations

The approach relies on the assumptions of finite grounding and unique supported models, which may limit scalability to very large or highly relational programs. Extending to infinite or cyclic models remains challenging.
Neural calibration errors directly influence counterfactual accuracy; current calibration techniques may not suffice in noisy real-world data, potentially affecting robustness.
Handling continuous variables or path-specific counterfactuals requires additional discretization or approximation strategies, which could introduce errors or complexity.
The method's computational cost grows with the number of active neural probabilities and program size; approximate WMC methods could be explored to improve scalability in future work.

Future Work

Future research will focus on extending DeepSWIP to handle continuous variables and more complex causal structures, including path-specific effects. Incorporating approximate WMC algorithms could improve scalability for large-scale models. Developing more robust neural calibration techniques will enhance reliability in noisy environments. Additionally, exploring multi-world and multi-intervention scenarios, as well as integrating the framework with deep reinforcement learning, could broaden its applicability. Further theoretical work on relaxing assumptions and improving computational efficiency will be crucial for deploying these methods in real-world, large-scale AI systems.

AI Executive Summary

In recent years, neurosymbolic AI has emerged as a promising paradigm that combines neural perception modules with symbolic reasoning frameworks. Systems like DeepProbLog exemplify this approach by integrating neural networks with probabilistic logic programming, enabling robust prediction under perceptual uncertainty. However, a significant limitation has been the inability to perform causal and counterfactual reasoning within these systems, which is essential for explanation, robustness, and fairness.

Traditional probabilistic logic programs can incorporate causal semantics through transformations such as Twin networks, but these methods are computationally expensive and awkward, especially when neural predicates are involved. The duplication inherent in Twin constructions complicates neural predicate sharing and obscures the single-world semantics necessary for meaningful interventions. To address this, the authors introduce DeepSWIP, a novel framework that leverages neural materialization and algebraic model counting to enable exact, efficient, and interpretable counterfactual inference.

DeepSWIP begins by evaluating neural predicates once under fixed parameters and input contexts, replacing them with categorical probabilistic choices. This transforms the neural probabilistic logic program into a standard ProbLog model, which is amenable to causal transformations. The core innovation is applying SWIP (single-world intervention) transformations directly to this materialized program. SWIP operations remove mechanisms defining intervened atoms, redirect downstream rules to fixed intervention values, and preserve the logical structure. The counterfactual probability then emerges as a quotient of weighted model counts, expressed algebraically as $\Phi^{x}_{Y,E}(p) / \Phi^{x}_{E}(p)$, where both $\Phi$ terms are multilinear WMC polynomials.

The authors rigorously prove that, under assumptions of finite grounding and unique supported models, this quotient precisely captures the counterfactual distribution relative to the learned causal model. They analyze how neural activation probabilities influence the quotient, revealing that only active, non-intervened neural probabilities matter after intervention pruning. The algebraic framework also enables sensitivity analysis, showing that calibration errors in neural predictions can be amplified by the logical structure, especially under rare evidence conditions.
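Rare-evidence instability can be seen in a two-variable toy quotient. This is an illustrative sketch, not an example from the paper: the query is a neural atom `a` with materialized probability `p`, the evidence is `a or b` with an exogenous `b` of probability `q`, and all numbers are assumptions chosen for the demonstration.

```python
def counterfactual(p, q):
    """Quotient Phi_{Y,E} / Phi_E for query a and evidence (a or b)."""
    numerator = p                      # worlds where a holds (evidence holds too)
    denominator = p + (1.0 - p) * q    # worlds where a or b holds
    return numerator / denominator

def shift(p, q, delta):
    """Change in the quotient when p is miscalibrated by delta."""
    return abs(counterfactual(p + delta, q) - counterfactual(p, q))

delta = 0.005                          # same absolute calibration error in both cases
common = shift(0.5, 0.5, delta)        # evidence probability 0.75
rare = shift(0.01, 0.01, delta)        # evidence probability about 0.02
print(common, rare)
```

With identical calibration error, the quotient moves far more in the rare-evidence case because the denominator is small, in line with the amplification effect described above.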

Experimental validation on MPI3D shows that DeepSWIP matches a DeepTwin construction on 12,000 queries while running 2.14 times faster, because it avoids endogenous duplication. The experiments also show the impact of neural calibration errors on counterfactual estimates, and a SUMO traffic simulation shows how a scoped AIPW estimator mitigates the resulting bias in population-level estimates. These results support the method's effectiveness in both symbolic and statistical settings.
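The bias-correction idea can be sketched with the textbook AIPW estimator under a randomized policy with known treatment probability. Everything below is an assumption for illustration: the data are synthetic, the outcome models `m1` and `m0` are deliberately miscalibrated stand-ins for a degraded neural predictor, and the paper's specific notion of "scoped" is not reproduced here.

```python
import random

def plug_in_ate(rows, m1, m0):
    """Average of model-predicted treatment effects (no correction)."""
    return sum(m1(x) - m0(x) for x, _, _ in rows) / len(rows)

def aipw_ate(rows, m1, m0, e):
    """Textbook AIPW with known randomization probability e."""
    n = len(rows)
    mu1 = sum(m1(x) + t * (y - m1(x)) / e for x, t, y in rows) / n
    mu0 = sum(m0(x) + (1 - t) * (y - m0(x)) / (1 - e) for x, t, y in rows) / n
    return mu1 - mu0

# Synthetic randomized experiment with a true ATE of 2.0 (assumed setup).
rng = random.Random(0)
E = 0.5
rows = []
for _ in range(20000):
    x = rng.random()
    t = 1 if rng.random() < E else 0
    y = 2.0 * t + x + rng.gauss(0.0, 0.1)
    rows.append((x, t, y))

m1 = lambda x: 1.5 + x   # miscalibrated: underestimates the treated outcome by 0.5
m0 = lambda x: x         # correct for the control arm

plug_in = plug_in_ate(rows, m1, m0)
aipw = aipw_ate(rows, m1, m0, E)
print(plug_in, aipw)
```

The plug-in estimate inherits the full model error, while the AIPW residual term uses the observed outcomes and the known randomization probability to correct it.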

Overall, DeepSWIP represents a substantial step forward in neural-symbolic causal reasoning. Its algebraic, quotient-based formulation offers interpretability, scalability, and theoretical guarantees, making it suitable for real-world applications requiring explainability and robustness. Future work aims to extend the framework to continuous variables, path-specific effects, and approximate inference techniques, broadening its applicability across diverse AI domains. This research paves the way for more trustworthy, causally-aware AI systems capable of reasoning about hypothetical scenarios with high efficiency and clarity.

Deep Dive

Abstract

Neurosymbolic systems such as DeepProbLog combine neural perception with probabilistic logic, but standard inference is associational. Counterfactual reasoning additionally requires a causal semantics for interventions and evidence. We introduce DeepSWIP, a single-world counterfactual semantics for DeepProbLog programs. Using neural materialization, we reduce fixed-context neural predicates to ordinary ProbLog choices, apply Single World Intervention Programs (SWIPs), and compute counterfactuals by weighted model counting (WMC) over a single transformed program. Under finite grounding and unique-supported-model assumptions, DeepSWIP is exact relative to the learned materialized FCM. The standard quotient-WMC form of ProbLog conditionals identifies active neural probabilities and explains intervention cleaning, calibration sensitivity, and rare-evidence instability. Experiments on MPI3D confirm the transformation against a DeepTwin construction on 12,000 queries, as predicted, and show a 2.14$\times$ inference speedup from avoiding the Twin's endogenous duplication. A SUMO HOV experiment shows that neural calibration degradation biases plug-in estimates, while a correctly scoped randomized-policy AIPW estimator removes most first-order bias for population mean and ATE estimands. Code is at https://github.com/saibib/deep_SWIP.
