ClawCapsule Proof Viewer

Deterministic runtime memory for long-running agents

ClawCapsule is a local-first runtime for agent execution that treats structured state as authoritative, compiles deterministic, bounded context for each step, and supports checkpoint and restore as explicit runtime operations. This viewer organises the current evidence package for technical review, beginning with the runtime substrate, then the bounded controller path, and then the supporting operational evidence from the validated path to date.

The core proposition examined here is that long-running agent continuity can be implemented through authoritative structured state, deterministic compile, ordered persistence, and recoverable execution semantics within a bounded validation envelope.
  • Structured state is authoritative
  • Deterministic bounded compile
  • Checkpoint and restore are runtime operations
  • Validated evidence is bounded and explicit

What this page is for

This is a compact view of the current proof package for ClawCapsule. It focuses on whether the runtime claim is real, bounded, and supported by inspectable evidence.

The proof layers remain separate so the boundary between substrate evidence, controller evidence, and operational evidence stays visible.

Boundary discipline

ClawCapsule is the runtime substrate: compile, persist, checkpoint, restore, and state semantics. claw-agent-ref is the proving and benchmark harness used to exercise that substrate under controlled conditions.

The runtime claim should stand on its own. Evidence from the harness is used to validate behaviour, not to redefine the product boundary.

Four outcomes that matter

Four conclusions supported by the current evidence.

Outcome 1

Deterministic compile is real

The runtime is designed to return byte-stable bundle_text for the same authoritative state, regardless of delay, restart, or repeated invocation. That matters because it turns context assembly into a governed runtime property rather than a hidden prompt artefact.

Outcome 2

Checkpoint and restore are runtime operations

Checkpointing is not a logging convenience. Restore re-establishes prior authoritative state and removes later mutations that are outside the restored checkpoint, which is a stronger claim than simple history retention.

Outcome 3

Validation extends beyond a single endurance run

The evidence packet covers core mechanism proofs, clutter-pressure and replay exercises, limited live model-in-loop reference validation, and operational telemetry. The result is a broader validation story than one long run alone.

Outcome 4

The live controller claim is bounded

There is a constrained model-in-loop reference lane showing that a canonical controller can operate against the substrate with real persistence and replay semantics. It is evidence of viability, not a blanket autonomy claim.

Current proof stack

The proof stack is organised by evidence type.

Layer 1

Runtime-only control-lane endurance proof

This is the cleanest substrate lane. It focuses on deterministic compile, persistence, checkpoint, restore, and endurance behaviour without relying on an external daemon-backed proving packet.

Runtime substrate | 50k validated | 100k telemetry available

It shows that the runtime can hold its state contract under sustained operation and that compile remains a governed system function rather than a conversational side effect.

Sources: Endurance Gate Note (50k)

Layer 2

Daemon-backed replay packet

This packet covers replay and clutter-pressure behaviour, with base mechanism proofs, pressure tests, and a capped extension out to 500 steps. It demonstrates replay semantics and retained-state discipline under more adversarial conditions.

Replay packet | Base proofs | Capped extension to 500

It strengthens the case that state durability is operationally meaningful, but it should still be read as a packet with explicit boundaries, not as a claim of general intelligence.

Sources: packet README, packet summary, proof plan

Layer 3

Live model-in-loop reference validation

The reference lane shows a canonical controller operating against the substrate with grounded writes, replay, and bounded evidence admission. This is the bridge from substrate contract to live controller use.

Canonical controller | Reference validation | Bounded claim

It supports a practical viability reading, but it should not be read as proof of broad autonomous capability across open-ended tasks.

Sources: Reference validation report

Layer 4

Operational validation and telemetry

Supporting reports add latency, throughput, soak, and implementation behaviour detail. These are important because runtime credibility depends not only on semantic correctness but also on whether the system can operate with acceptable practical characteristics.

Operational evidence | Overnight runs

This layer helps a reader judge whether the design looks governable and investable in real systems, even though it is not yet a full production benchmark suite.

Sources: Overnight run suite report

Replay packet map

This section expands the daemon-backed replay packet into the underlying cases. It keeps the mechanism layer legible by grouping the evidence into base proofs, clutter-pressure proofs, and the single capped endurance extension.

Base mechanism proofs

Five compact cases that make the core substrate mechanics easy to inspect.

Disciplined ingress (3 of 3 steps)

Proves event-path admission over the current daemon HTTP surface.

The post-event compile adds 1 decision id, 1 fact id, and 1 artifact hash. The bundle includes incident_code=OMEGA-9921 and src/auth/webhook_guard.rs.

Artifacts: note, trace

Authoritative retention (3 of 3 steps)

Shows that the compiled known-facts view resolves to the latest retained value instead of echoing conflicting values as equally current.

The final compile includes 1 fact id, contains release_state=approved, and excludes release_state=draft.

Artifacts: note, trace
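
One way to picture this behaviour is a last-write-wins fact table, where each key holds a single current value, so the compiled view cannot present draft and approved as equally current. The sketch below is illustrative only: `FactStore`, `assert_fact`, and `compile_known_facts` are hypothetical names, not ClawCapsule's API, and the actual retention rule may differ.

```python
from dataclasses import dataclass, field


@dataclass
class FactStore:
    """Hypothetical store: one retained value per fact key (last write wins)."""

    facts: dict[str, str] = field(default_factory=dict)

    def assert_fact(self, key: str, value: str) -> None:
        # A later assertion for the same key supersedes the earlier one.
        self.facts[key] = value

    def compile_known_facts(self) -> str:
        # Sorted keys keep the rendered view independent of write order.
        return "\n".join(f"{k}={v}" for k, v in sorted(self.facts.items()))


store = FactStore()
store.assert_fact("release_state", "draft")
store.assert_fact("release_state", "approved")
known_facts = store.compile_known_facts()
```

Under these assumptions, `known_facts` carries only the latest value for `release_state`, which is the shape of result the case reports.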

Deterministic compile (4 of 4 steps)

Shows deterministic compile over unchanged authoritative state.

compile_a and compile_b have identical bundle_text and identical included_items. Each compile emits 2 decision ids, 2 fact ids, and 1 artifact hash.

Artifacts: note, trace
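
The property can be sketched as a pure function of state with a canonical ordering. This is an assumption about how a compile could be made byte-stable, not a description of ClawCapsule internals: `compile_state`, the `digest` field, and the state layout are hypothetical, while `bundle_text` and `included_items` mirror the field names used above.

```python
import hashlib
import json


def compile_state(state: dict) -> dict:
    """Hypothetical compile: a pure function of state with canonical ordering."""
    # Sort items by id so the result never depends on insertion order.
    items = sorted(state["items"], key=lambda item: item["id"])
    bundle_text = "\n".join(
        json.dumps(item, sort_keys=True, separators=(",", ":")) for item in items
    )
    return {
        "bundle_text": bundle_text,
        "included_items": [item["id"] for item in items],
        "digest": hashlib.sha256(bundle_text.encode("utf-8")).hexdigest(),
    }


state = {
    "items": [
        {"id": "fact-2", "kind": "fact", "text": "incident_code=OMEGA-9921"},
        {"id": "decision-1", "kind": "decision", "text": "re-enable HMAC verification"},
    ]
}

compile_a = compile_state(state)
# Same content with items supplied in a different order, as if reloaded after a restart.
compile_b = compile_state({"items": list(reversed(state["items"]))})
```

Comparing a digest of `bundle_text` is one convenient way to check byte-stability across runs without diffing the full text.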

Checkpoint and restore (7 of 7 steps)

Shows interruption survival through checkpoint and restore.

The mutated compile differs from the checkpoint-state compile, and the restored compile matches the checkpoint-state compile exactly. The restored bundle contains "re-enable HMAC verification" and excludes "leave temporary bypass enabled".

Artifacts: note, trace
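
The three-way comparison in this case (checkpoint-state, mutated, restored) can be illustrated with a minimal in-memory sketch. The `Runtime` class and its methods are hypothetical stand-ins rather than the real runtime surface; the point is only that restore replaces live state wholesale instead of appending to history.

```python
import copy


class Runtime:
    """Hypothetical in-memory runtime with explicit checkpoint and restore."""

    def __init__(self) -> None:
        self.decisions: list[str] = []
        self._checkpoints: dict[str, list[str]] = {}

    def record(self, decision: str) -> None:
        self.decisions.append(decision)

    def checkpoint(self, name: str) -> None:
        # Snapshot a copy so later mutations cannot leak into the checkpoint.
        self._checkpoints[name] = copy.deepcopy(self.decisions)

    def restore(self, name: str) -> None:
        # Replace live state wholesale; post-checkpoint mutations are discarded.
        self.decisions = copy.deepcopy(self._checkpoints[name])

    def compile(self) -> str:
        return "\n".join(self.decisions)


rt = Runtime()
rt.record("re-enable HMAC verification")
rt.checkpoint("cp1")
compile_checkpoint = rt.compile()

rt.record("leave temporary bypass enabled")
compile_mutated = rt.compile()

rt.restore("cp1")
compile_restored = rt.compile()
```

In this sketch the mutated compile differs from the checkpoint-state compile, and the restored compile equals it exactly, mirroring the comparison the case describes.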

Useful re-emission (4 of 4 steps)

Shows that compile output is useful to downstream execution rather than only a storage artifact.

The final compile emits 3 decision ids, 3 fact ids, and 2 artifact hashes. The bundle contains OMEGA-9921, src/auth/webhook_guard.rs, and "re-enable HMAC verification".

Artifacts: note, trace

Clutter-pressure proofs

Three cases that keep the substrate honest under growing but valid same-task clutter.

Light retention (15 of 15 steps)

Shows bounded relevant retention under light clutter pressure.

The final compile token estimate is 564. It emits 3 decision ids, 4 fact ids, and 5 artifact hashes. Relevant auth-fix strings remain present while sampled clutter strings remain absent.

Artifacts: note, trace
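
A bounded compile of this kind can be pictured as deterministic selection under a token budget. The sketch below is a loose illustration under stated assumptions: the `priority` field, the word-count token estimate, and `bounded_compile` are all hypothetical, and the runtime's actual relevance and budgeting rules are not described in this packet.

```python
def estimate_tokens(text: str) -> int:
    # Crude stand-in: whitespace-separated words. Real estimators differ.
    return len(text.split())


def bounded_compile(items: list[dict], budget: int) -> list[str]:
    """Hypothetical selection: highest priority first, never exceeding the budget."""
    selected: list[str] = []
    used = 0
    # Ties are broken by text so the selection order is deterministic.
    for item in sorted(items, key=lambda i: (-i["priority"], i["text"])):
        cost = estimate_tokens(item["text"])
        if used + cost > budget:
            continue  # Skip anything that would push the bundle past the bound.
        selected.append(item["text"])
        used += cost
    return selected


items = [{"priority": 10, "text": "re-enable HMAC verification"}]
# Valid but low-priority same-task clutter, growing with the run.
items += [
    {"priority": 1, "text": f"clutter note {n} about unrelated logging"}
    for n in range(50)
]

bundle = bounded_compile(items, budget=20)
total = sum(estimate_tokens(text) for text in bundle)
```

However many clutter items are appended, `total` stays at or below the budget and the high-priority entry is selected first. The reported case is stricter than this toy: its sampled clutter strings are absent from the bundle altogether.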

Medium compile and re-emission (36 of 36 steps)

Shows deterministic compile and useful re-emission under medium clutter pressure.

compile_a and compile_b remain identical in both bundle_text and included_items. Each compile token estimate is 567, and the final bundle keeps the relevant auth-fix subset while omitting sampled clutter strings.

Artifacts: note, trace

Heavy restore continuity (64 of 64 steps)

Shows restore continuity after heavy valid clutter plus post-checkpoint mutations.

The checkpoint-state compile, compile_restored_a, and compile_restored_b are identical in both bundle_text and included_items. The restored bundle returns to the retained auth-fix subset and excludes the temporary bypass lane.

Artifacts: note, trace

Capped endurance extension

One bounded extension that demonstrates a longer clutter run without inflating it into a broader benchmark family.

500-step restore and re-emission (500 of 500 steps)

This is the only endurance extension in the packet, and it remains explicitly capped.

compile_100_a and compile_100_b match, compile_250_a and compile_250_b match, and compile_500_a and compile_500_b match. The final restored compile matches the checkpointed retained subset. The mutated compile shows the temporary proxy bypass lane, which disappears after restore.

  • final restored bundle contains OMEGA-9921, src/auth/webhook_guard.rs, "re-enable HMAC verification", and "reject unsigned webhook replay"
  • final restored bundle excludes src/gateway/proxy.rs, "leave temporary bypass enabled", and "allow unsigned replay until Monday"
  • bounded token estimate is 449 at step 100, 566 at step 250, 580 for the mutated compile, and 566 at the final restored compile

Artifacts: note, trace

How to read the stack

The stack is organised from core runtime proof outward: substrate validation first, replay validation second, bounded live controller evidence third, and supporting operational evidence after that.

Safe reading

The current evidence supports a serious architectural claim: durable agent continuity can be implemented as explicit runtime machinery with deterministic state compilation, persistence, and recoverable execution semantics.

It also supports a narrower practical claim: a controller can operate on that substrate in a controlled reference lane without collapsing the runtime boundary.

Unsafe reading

The evidence should not be read as proof of broad autonomous reliability, universal coding superiority, general product readiness, or a solved comparison against every transcript-centric system in the market.

Those are future diligence questions, not current validated conclusions.

Trust boundaries and current limits

The current evidence establishes a bounded but substantive validation envelope for the ClawCapsule runtime and its supporting proof stack.

Validated now

The runtime substrate has evidence for deterministic compile, authoritative checkpoint and restore semantics, and sustained endurance in the runtime-only lane. The validation package also includes a daemon-backed replay packet, limited but meaningful model-in-loop reference validation, and supporting operational telemetry and latency evidence.

Not validated now

The current package does not yet establish broad comparative dominance over the best transcript-centric systems, full production hardening across varied workloads, or unconstrained long-horizon autonomous task completion in the general case.

Why that matters

Current diligence position: The package supports technical review of the core runtime claim, while leaving broader comparative, deployment, and generalisation questions open for further diligence.

Core evidence documents

These documents form the primary evidence path for the runtime claim: claim boundaries, methodology, exact gate proof, and substrate packet summary.

Supporting technical evidence

These reports provide supporting operational and bounded integration context. They sit behind the core runtime evidence and do not widen the primary validation claim.