From Boolean Verdicts to Quantitative Witnesses: Why DAG Topology Needs a Trail Semantics

Tue, 19 May 2026 03:30:00 +0800

Since Pearl (2009), causal inference on DAGs has crystallized around a powerful but austere toolkit: boolean d-separation and do-calculus. Does evidence flow? Full stop. Does it flow after an intervention? Full stop. This framework is sufficient for causal identification—determining whether an effect is estimable from observed data. But it is curiously silent on a question that seems equally natural: how much flows, through which channels, and with what residual structure?

I want to argue that this silence is not a minor gap; it is a symptom of two communities talking past each other. And the bridge between them is a rigorous notion of trail as witness.

The Boolean Trap

Traditional DAG analysis gives you a verdict at the endpoints. $X$ is d-separated from $Y$ given $Z$: True or False. The intervened distribution $P(y \mid do(x))$ is identifiable: Yes or No. This is the “full stop” regime.

But a boolean verdict discards the path. It tells you that water reaches the tap, but not whether it travelled through lead pipes, detoured through a cistern, or was siphoned off midway. To know the texture of a causal chain—to distinguish a default association from a derived conclusion, or to know whether an observed correlation persists because of the presence of one variable or the absence of another—you need to traverse the trail in slow motion, node by node, junction by junction.

Trail traversal gives the texture of causal chains; endpoint booleans do not.
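To make the "full stop" regime concrete, here is a minimal sketch of a boolean d-separation test, using the standard reduction to reachability in the moralized ancestral graph. The function names and the toy graph are my own illustrative choices, and the sketch assumes the two endpoints are not in the conditioning set. Note what it returns: a single True or False, with no record of which trail was responsible.

```python
from collections import deque

def ancestors_of(parents, nodes):
    """Return `nodes` together with all of their ancestors."""
    seen, stack = set(nodes), list(nodes)
    while stack:
        n = stack.pop()
        for p in parents.get(n, ()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen

def d_separated(edges, x, y, z):
    """Boolean verdict: is x d-separated from y given the set z?"""
    z = set(z)
    parents = {}
    for a, b in edges:                      # edge a -> b
        parents.setdefault(b, set()).add(a)
    keep = ancestors_of(parents, {x, y} | z)
    # Moralize the ancestral subgraph: link co-parents, drop directions.
    nbrs = {n: set() for n in keep}
    for child in keep:
        ps = list(parents.get(child, ()))   # parents of a kept node are kept too
        for p in ps:
            nbrs[p].add(child)
            nbrs[child].add(p)
        for i, p in enumerate(ps):
            for q in ps[i + 1:]:
                nbrs[p].add(q)
                nbrs[q].add(p)
    # x and y are d-separated iff removing z disconnects them.
    queue, seen = deque([x]), {x}
    while queue:
        n = queue.popleft()
        if n == y:
            return False
        for m in nbrs[n]:
            if m not in seen and m not in z:
                seen.add(m)
                queue.append(m)
    return True

# Toy collider u -> w <- v with a descendant w -> d.
edges = [("u", "w"), ("v", "w"), ("w", "d")]
print(d_separated(edges, "u", "v", set()))   # True
print(d_separated(edges, "u", "v", {"d"}))   # False
```

The verdict flips when the descendant `d` is observed, but the function gives no hint that the collider at `w` is what opened.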

Two Communities, One Missing Link

Remarkably, two intellectual neighborhoods have been circling this problem from opposite ends without quite meeting in the middle:

The Causal Inference community (Pearl et al.) is structurally obsessed. It abstracts the world into absolute black-and-white: if d-separation holds, the answer is True; otherwise, False. They are plumbers who care only whether the pipe is open, not what contaminant rides the flow. After 2009, the field’s theoretical engine on this particular front seems to have stalled—perhaps because every remaining graph-theoretic challenge starts to look like the Four Color Problem, solvable only by brute force unless a new representational insight appears.

The Quantitative Information Flow (QIF) community (Alvim, Palamidessi, Smith et al.) is capacity-obsessed. They compute leakage in bits, bound it with Shannon capacity, and seek KKT certificates for optimality. But their channel models are often toy simplifications, stripped of topological depth. They measure the volume of water without being able to trace the pipe’s winding route through the labyrinth.

Both communities study DAGs plus information flow. Yet:

Causal inference has the topology but no quantitative bound.
QIF has the quantitative machinery but no trail-level topological witness.

No one has connected them. Why? Because the causal side asks “Is it identifiable?” and the QIF side asks “How many bits leak?”, and neither side has a formalism that answers both at once while retaining the path as a first-class object.

Lemma 1: The Collider Ancestor Leak & Rerouting

Consider a collider $u \rightarrow w \leftarrow v$. Textbook d-separation says evidence can pass through $w$ only when $w$ or a descendant is observed. But this description conflates two distinct phenomena.

Suppose we are testing whether $X$ and $Y$ are d-separated given $Z$. If a descendant of $w$ leads only to $X$ or $Y$ and not to $Z$, then the collider is not “activated by $Z$” in any global sense. It is activated because it creates a connected route from a source to a destination. The conditioning set is, locally, a red herring.

Path Normalization / Rerouting

Claim: If a descendant of $w$ leads to $X$, the original long path was never necessary; the descendant path itself provides a shorter route.

Proof sketch. Take any active trail from $X$ to $Y$ that passes through collider $w$, and suppose $w$ has a directed path $w \leadsto d \leadsto X$ none of whose nodes lies in $Z$. Replace the subpath between $X$ and $w$ with that directed path. On the rerouted trail $w$ is an ordinary chain node rather than a collider, so it stays open without any help from $Z$, and the endpoints are unchanged. By minimality of active trails (or induction on cutset size), the original trail was non-minimal: it contained a redundant detour that can be shortcut through the collider’s own descendant. ∎

The upshot: collider “activation” is often just topological connectivity leaking outward, not a special global event mediated by the conditioning set.

Lemma 2: The Junction Obligation Problem

If we accept that trails matter, we need a local criterion for their validity that does not require re-scanning the entire graph at every step.

Decompiling the Trail

Decompose an active trail into a stateful path type—call it ActiveRoute or BayesBallPathT. Each traversal step carries a direction tag:

outOf: leaving via an outgoing edge.
into: entering via an incoming edge.

These tags are not bookkeeping; they encode the obligation imposed by the junction just traversed.

Global Topology → Local Type Constraints

In the global formulation, a junction $(A, B, C)$ is valid only after inspecting the whole graph and the conditioning set. Under the state-machine view:

Obligations are pushed to interfaces. The direction label at a boundary encodes what junction type is expected on the other side.
Composition is type checking. Concatenating two path segments requires only that the output state of the first matches the input obligation of the second. No global inspection needed.
Local consistency implies global consistency. If every adjacent pair of segments satisfies their shared interface obligation, the entire trail is valid by construction.

The global topological constraint of d-separation becomes a local type-system constraint on path segments. The type of a segment is its pair of boundary states; composition is well-typed iff obligations align.
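A minimal sketch of this typed-segment view, under my own assumptions: I read a step's tag as describing how the step meets the node it arrives at (`into` when the edge points at that node, `outOf` when it points away), and the names `Segment`, `step`, `compose` and `with_ancestors` are hypothetical. A segment's type is its pair of boundary tags, and composition inspects only the shared boundary.

```python
from dataclasses import dataclass

INTO, OUT_OF = "into", "outOf"   # hypothetical tag names

@dataclass(frozen=True)
class Segment:
    nodes: tuple  # nodes visited, in order
    first: str    # tag of the first step
    last: str     # tag of the last step

def step(edges, a, b):
    """One-step segment from a to b, tagged by the edge's orientation."""
    if (a, b) in edges:
        return Segment((a, b), INTO, INTO)
    if (b, a) in edges:
        return Segment((a, b), OUT_OF, OUT_OF)
    raise ValueError(f"no edge between {a!r} and {b!r}")

def with_ancestors(edges, z):
    """z plus every ancestor of z: the nodes at which a collider is open."""
    seen, stack = set(z), list(z)
    while stack:
        n = stack.pop()
        for a, b in edges:
            if b == n and a not in seen:
                seen.add(a)
                stack.append(a)
    return seen

def junction_ok(node, arrive, leave, z, opens):
    """Local rule: reads only the two boundary tags and the node's labels."""
    if arrive == INTO and leave == OUT_OF:   # collider: -> node <-
        return node in opens
    return node not in z                     # chain or fork

def compose(s1, s2, z, opens):
    """Glue two segments if the shared boundary type-checks, else None."""
    if s1.nodes[-1] != s2.nodes[0]:
        return None
    if not junction_ok(s1.nodes[-1], s1.last, s2.first, z, opens):
        return None
    return Segment(s1.nodes + s2.nodes[1:], s1.first, s2.last)

edges = {("u", "w"), ("v", "w"), ("w", "d")}
z = {"d"}
opens = with_ancestors(edges, z)             # computed once per conditioning set
trail = compose(step(edges, "u", "w"), step(edges, "w", "v"), z, opens)
print(trail.nodes)                                                          # ('u', 'w', 'v')
print(compose(step(edges, "u", "w"), step(edges, "w", "v"), set(), set()))  # None
```

One caveat in this sketch: the collider rule consults `opens`, a per-node label derived from the conditioning set. That label is computed once up front, so composition itself stays local, but the sketch does not remove the dependence on $Z$ altogether.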

What Is Still Missing

Lemmas 1 and 2 give us a cleaner, more local way to reason about whether a trail is active. But they do not yet answer the quantitative question:

Not “does information flow?” but “how many bits flow through this specific trail?”

That question requires machinery that currently lives only in QIF:

Channel capacity between observables and secrets along a specific topological route.
KKT conditions to certify that a given leakage bound is optimal under the graph’s structural constraints.
Shannon bounds that respect the DAG’s conditional-independence structure rather than assuming a flat channel matrix.
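As a rough illustration of the first two ingredients, the sketch below treats a trail as a cascade of two discrete channels, computes the mutual information that survives end to end, and checks the standard KKT condition for capacity of a discrete memoryless channel: an input distribution is optimal when the divergence $D(W(\cdot \mid x) \,\|\, q)$ equals the mutual information on its support and does not exceed it elsewhere. The binary symmetric hops and their crossover probabilities are made-up values, and the channel here is still a flat matrix, so this does not address the DAG-aware bounds in the third item.

```python
import math

def compose_channels(w1, w2):
    """Channel w1 followed by w2 (rows index input symbols): a matrix product."""
    return [[sum(w1[i][k] * w2[k][j] for k in range(len(w2)))
             for j in range(len(w2[0]))] for i in range(len(w1))]

def kl_bits(p, q):
    """Kullback-Leibler divergence D(p || q) in bits."""
    total = 0.0
    for a, b in zip(p, q):
        if a > 0:
            if b == 0:
                return math.inf
            total += a * math.log2(a / b)
    return total

def output_dist(prior, w):
    """Output distribution induced by `prior` through channel w."""
    return [sum(prior[i] * w[i][j] for i in range(len(w)))
            for j in range(len(w[0]))]

def mutual_information(prior, w):
    """I(X;Y) in bits, written as an average of divergences from the output law."""
    q = output_dist(prior, w)
    return sum(prior[i] * kl_bits(w[i], q)
               for i in range(len(w)) if prior[i] > 0)

def kkt_certifies(prior, w, tol=1e-9):
    """KKT check: D(W(.|x) || q) = I on the support of prior, <= I off it."""
    q = output_dist(prior, w)
    i_val = mutual_information(prior, w)
    for x, px in enumerate(prior):
        d = kl_bits(w[x], q)
        if px > 0 and abs(d - i_val) > tol:
            return False
        if px == 0 and d > i_val + tol:
            return False
    return True

def bsc(p):
    """Binary symmetric channel with crossover probability p."""
    return [[1 - p, p], [p, 1 - p]]

hop1, hop2 = bsc(0.1), bsc(0.2)          # illustrative crossover probabilities
trail = compose_channels(hop1, hop2)     # end-to-end channel along the route
uniform = [0.5, 0.5]
print(mutual_information(uniform, trail))   # bits surviving both hops
print(kkt_certifies(uniform, trail))        # True
print(kkt_certifies([0.9, 0.1], trail))     # False
```

The end-to-end figure can never exceed what either hop carries on its own (the data-processing inequality), which is the kind of per-trail statement a boolean verdict cannot express.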

What does not yet exist—and what I am groping toward—is a framework where:

The DAG provides the topological syntax.
The trail provides the witness (the specific path whose capacity we measure).
KKT + channel capacity provide the quantitative certificate.
A proof assistant (Lean 4, Coq) checks both the topological type constraints (Lemma 2) and the information-theoretic bounds.

Takeaway

Pearl’s boolean tools are not wrong; they are insufficient for anyone who wants to know the texture of a causal chain. QIF’s quantitative tools are not wrong; they are topologically blind. The missing piece is a trail semantics that makes the path a first-class object—so that we can ask not only whether an intervention opens a channel, but how wide that channel is, what contaminants it carries, and whether the leakage is bounded.

We need to move from “full stop” to “slow-motion replay.” The trail is the witness.

Constructing El Gamal & Kim Proof Chain for CutSetBound.lean

Fri, 15 May 2026 10:00:00 +0800

Reconstructing the Relay Channel: Modernizing the Cut-set Bound and Degraded Capacity Proofs

When aiming to “extract dependencies and compress proof chains,” few starting points are as effective as the core results from Chapter 16 of El Gamal and Kim: the cut-set upper bound for general discrete memoryless relay channels and the capacity theorem for physically degraded relay channels. These results, originating from the landmark 1979 Cover–El Gamal paper, hold immense historical significance but carry a structural “debt” that allows for substantial modernization and simplification.

Why Reconstruct?

The achievability proof in the original 1979 paper relies on a “random partition + ambiguity set intersection” strategy. While elegant for its time, it is no longer the shortest path to the result.

If our goal is simply to reach the same Decode–Forward rate, we can mount a much more direct attack (a “dimensionality-reduction strike,” to borrow the Chinese idiom) through regular encoding and backward decoding. This approach entirely eliminates the need for random partitioning, binning analysis, and the lengthy Slepian–Wolf style derivations. Simultaneously, the converse can be modularized into three clear steps: two applications of Fano’s inequality, a causal Markov chain argument, and a single-letterization lemma via concavity.

The core logic of this reconstruction is that degradedness is not a prerequisite for achievability. The Decode–Forward construction holds for general relay channels; the degradedness assumption is only required in the final step of the converse to tighten the bound.

Selecting Theorems: Where is the Room for Compression?

Within the landscape of network information theory, I have selected several candidates for modular reconstruction. The criterion is not just the fame of the conclusion, but the potential for the proof chain to be further abstracted and streamlined.

Candidate Theorem	Text Location	Core Statement	Simplification Potential
Capacity of Physically Degraded Relay Channels	§16.4, p.386	$C = \max_{p(x_1,x_2)} \min \{I(X_1;Y_2 \mid X_2),\ I(X_1,X_2;Y_3)\}$
Cut-set Bound for General Relay Channels	§16.2, p.384	The outer bound for capacity	High. Causality arguments can be modularized without coupling to specific coding schemes.
Gel’fand–Pinsker Theorem	§7.6, p.178	Capacity with non-causal state information at the encoder	High. The auxiliary variable selection and Csiszár sum identity offer excellent room for abstraction.

Reorganizing the Logical Chain

We compress the entire proof into the following logical path:

The Two-Cut Converse:
- Cut 1: At the receiver, use Fano’s inequality to bound the rate by $I(X_1, X_2; Y_3)$.
- Cut 2: In an “enhanced” system where the relay’s observations are shared with the receiver, bound the rate by $I(X_1; Y_2, Y_3 | X_2)$.
- This relies only on Fano, causality, and memorylessness—no degradedness required.
Achievability via Backward Decoding:
- Utilize block-Markov superposition coding.
- The relay decodes forward (block-by-block), while the destination decodes backward.
- The “magic” of backward decoding is that once the next block’s message is known, the current block’s decision becomes a standard single-user problem, bypassing the need for binning.
Specializing to Degraded Channels:
- Introduce the physical degradedness Markov chain $X_1 \to (X_2, Y_2) \to Y_3$.
- The second cut’s mutual information term $I(X_1; Y_2, Y_3 | X_2)$ collapses to $I(X_1; Y_2 | X_2)$, closing the gap between the upper and lower bounds.
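Written out, the three steps of this path fit in a few lines. The display below is a summary sketch in the notation used above, not a substitute for the full proof: the first two lines are the cut-set upper bound and the Decode–Forward lower bound, and the last two show why they meet under physical degradedness.

```latex
\begin{align*}
C &\le \max_{p(x_1,x_2)} \min\{\, I(X_1,X_2;Y_3),\; I(X_1;Y_2,Y_3 \mid X_2) \,\}
  && \text{(cut-set bound)} \\
C &\ge \max_{p(x_1,x_2)} \min\{\, I(X_1,X_2;Y_3),\; I(X_1;Y_2 \mid X_2) \,\}
  && \text{(decode--forward)} \\
I(X_1;Y_2,Y_3 \mid X_2)
  &= I(X_1;Y_2 \mid X_2) + I(X_1;Y_3 \mid X_2,Y_2)
  && \text{(chain rule)} \\
  &= I(X_1;Y_2 \mid X_2)
  && \text{since } X_1 \to (X_2,Y_2) \to Y_3 .
\end{align*}
```

The second term of the chain rule vanishes because, given $(X_2, Y_2)$, the receiver’s output $Y_3$ carries no further information about $X_1$.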

Next Steps: Toward Formalization

The current proof draft is mathematically closed. The remaining refinements involve a detailed bookkeeping of the $\delta(\varepsilon)$ terms in the typicality analysis and a formalization of the notation for extending the three-node model to general time-expanded Directed Acyclic Graphs (DAGs).

For formalization in a proof assistant such as Lean, the most elegant path is to decompose this into three independent lemmas: the Two-Cut Converse Lemma, the Backward Decoding Achievability Lemma, and the Additive Decomposition for Orthogonal Networks. This represents the purest and most modular form of these classic information-theoretic results.

Explorative Thoughts on My Hugo Project