ClawCapsule Proof Viewer

Deterministic runtime memory for long-running agents

ClawCapsule is a local-first runtime for agent execution built on a different view of continuity. Rather than relying mainly on an accumulating transcript, it treats structured state as authoritative, compiles bounded context deterministically for each step, and exposes checkpoint and restore as explicit runtime operations. This page is a compact route through the current technical evidence.

The core proposition examined here is that long-running agent continuity can be implemented through authoritative structured state, deterministic compile, ordered persistence, and recoverable execution semantics within a bounded validation envelope.

split_control 50k reference gate is banked
Replay packet makes compile, restore, and re-emission inspectable
Live seam now includes bounded banked slices, not just seam theory
Limits remain explicit instead of being blurred by narrative shorthand

What this is

This is a compact view of the current proof package for ClawCapsule. It focuses on whether the runtime claim is real, bounded, and supported by inspectable evidence.

Runtime proof, replay proof, bounded live-seam proof, and operational telemetry are kept separate so each can carry the right weight.

Why it matters

If long-running agents are to behave more like durable systems than elongated chat sessions, then continuity, recovery, and context assembly need to be runtime properties. The current pack shows that direction in executable, reviewable form rather than only in architecture prose.

Read the stack in order: runtime contract first, replay packet second, bounded live-seam results third, and the boundary documents after that.

Four outcomes that matter

Four conclusions that hold when the evidence is read conservatively and in sequence.

Outcome 1

Deterministic compile is real

The runtime is designed to return byte-stable bundle_text for the same authoritative state, regardless of delay, restart, or repeated invocation. That matters because it turns context assembly into a governed runtime property rather than a hidden prompt artefact.
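As a rough illustration of what byte-stable compile means in practice, the sketch below renders a state dictionary with a canonical ordering, so that insertion order and repeated invocation cannot change the output. The function names, the state layout, and the section names are hypothetical and are not taken from the ClawCapsule codebase.

```python
import hashlib
import json


def compile_bundle(state: dict) -> str:
    """Render state into bundle text using a fixed, canonical ordering.

    Hypothetical sketch: sections and item ids are sorted, and values are
    serialised with sorted keys, so equal state always yields equal text.
    """
    lines = []
    for section in sorted(state):
        lines.append(f"[{section}]")
        for item_id in sorted(state[section]):
            value = json.dumps(
                state[section][item_id],
                sort_keys=True,
                separators=(",", ":"),
                ensure_ascii=True,
            )
            lines.append(f"{item_id}={value}")
    return "\n".join(lines) + "\n"


def bundle_digest(bundle_text: str) -> str:
    """Hash the UTF-8 bytes of the bundle so byte stability is easy to compare."""
    return hashlib.sha256(bundle_text.encode("utf-8")).hexdigest()


# Same content, different insertion order (illustrative values only).
state_a = {
    "facts": {"f1": "incident_code=OMEGA-9921"},
    "decisions": {"d1": "re-enable HMAC verification"},
}
state_b = {
    "decisions": {"d1": "re-enable HMAC verification"},
    "facts": {"f1": "incident_code=OMEGA-9921"},
}

compile_a = compile_bundle(state_a)
compile_b = compile_bundle(state_b)
assert compile_a == compile_b
```

The design point is that nothing time-dependent or order-dependent (timestamps, dictionary insertion order, random ids) enters the rendered text, which is what allows two compiles over unchanged state to be compared byte for byte.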

Outcome 2

Checkpoint and restore are runtime operations

Checkpointing is not a logging convenience. Restore re-establishes prior authoritative state and removes later mutations that are outside the restored checkpoint, which is a stronger claim than simple history retention.
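The difference between restore and simple history retention can be sketched with a small in-memory store. This is a minimal illustration under assumed names (`StateStore`, `checkpoint`, `restore`); it does not describe the runtime's actual persistence layer or API.

```python
import copy


class StateStore:
    """Hypothetical in-memory store with explicit checkpoint and restore."""

    def __init__(self):
        self._state = {}
        self._checkpoints = {}

    def put(self, key, value):
        self._state[key] = value

    def snapshot(self):
        """Return a copy of the current authoritative state."""
        return copy.deepcopy(self._state)

    def checkpoint(self, name):
        """Record the full state under a name."""
        self._checkpoints[name] = copy.deepcopy(self._state)

    def restore(self, name):
        """Replace current state with the checkpoint; later mutations are dropped."""
        self._state = copy.deepcopy(self._checkpoints[name])


store = StateStore()
store.put("decision", "re-enable HMAC verification")
store.checkpoint("cp1")

# Mutations made after the checkpoint (illustrative values only).
store.put("decision", "leave temporary bypass enabled")
store.put("artifact", "src/gateway/proxy.rs")
mutated = store.snapshot()

store.restore("cp1")
restored = store.snapshot()

assert mutated != restored
assert restored == {"decision": "re-enable HMAC verification"}
```

Note that restore replaces the state wholesale rather than appending a compensating entry: the key added after the checkpoint is absent afterwards, not merely superseded.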

Outcome 3

Validation extends beyond a single endurance run

The evidence packet covers core mechanism proofs, clutter-pressure and replay exercises, limited live model-in-loop reference validation, and operational telemetry. The result is a broader validation story than one long run alone.

Outcome 4

The live controller claim is bounded

There is a constrained model-in-loop reference lane showing that a canonical controller can operate against the substrate with real persistence and replay semantics. It is evidence of viability, not a blanket autonomy claim.

Current proof stack

The pack is arranged by evidence type so the runtime case and the bounded live-seam case can be assessed without collapsing into a single oversized claim.

Layer 1

Runtime-only control-lane endurance proof

This is the cleanest banked proof of the runtime contract itself. The validated result is the exact split_control 50k reference gate on the recorded runtime-only lane, with compile, persistence, checkpoint, restore, and restart-open behaviour tested inside a tightly governed scope.

Runtime substrate | 50k validated | 100k telemetry available

It shows that the runtime can hold its state contract under sustained operation and that compile remains a governed system function rather than a conversational side effect.

Sources: 50k gate note, 100k soak telemetry

Layer 2

Daemon-backed replay packet

This layer makes the runtime legible under inspection. Rather than relying on live model calls, it replays bounded HTTP interactions across the daemon surface and shows disciplined ingress, authoritative retention, deterministic compile, checkpoint and restore, useful re-emission, clutter-pressure behaviour, and a capped 500-step replay extension.

Replay packet | Base proofs | Capped extension to 500

This matters because it turns the substrate from an architectural assertion into an inspectable mechanism. It should still be read as substrate proof rather than as live controller or model capability evidence, but it materially strengthens the claim that the runtime behaviour is real.

Sources: packet README, packet summary, proof plan

Layer 3

Live model-in-loop reference validation

This is the bridge from substrate proof to live controller use. The live seam should now be read as a bounded proof pack, not only as an early anchor with caveats. The strongest current reading is a canonical controller on the locked path with a small but meaningful set of banked bounded results: cross-family portability at a stricter rung, same-surface endurance through 400 turns, short-horizon signer/fallback preservation, and narrow fixed-dossier research transfer.

Canonical controller | Reference rounds + backfill | 400-turn boundary | Bounded claim

This does not establish broad autonomous capability. It does establish that the live path is no longer merely structural or speculative. It now contains inspectable, banked slices that can be interrogated directly.

Sources: Round 2 anchor, portability report, endurance to 400, signer/fallback Slice 2, fixed-dossier transfer

Layer 4

Operational validation and telemetry

Operational reports add latency, soak, determinism, and implementation-behaviour context around the validated lanes. They help show how the runtime behaves under longer-running conditions and they record both positive signals and practical cautions.

Operational evidence | Overnight runs | 100k soak telemetry

This layer is operational support, not a substitute for the substrate or live-seam proof. Its value is that it adds operating-context evidence around the validated path without widening either claim.

Sources: overnight suite report, 100k soak telemetry

Replay packet map

The replay packet is grouped into base mechanism proofs, clutter-pressure proofs, and one capped endurance extension so the mechanism layer can be read directly rather than through summary shorthand.

Base mechanism proofs

Five compact cases that make the core substrate mechanics easy to inspect.

Disciplined ingress | 3 of 3 steps

Proves event-path admission over the current daemon HTTP surface.

The post-event compile adds 1 decision id, 1 fact id, and 1 artifact hash. The bundle includes incident_code=OMEGA-9921 and src/auth/webhook_guard.rs.

Artifacts: note, trace

Authoritative retention | 3 of 3 steps

Shows that the compiled known-facts view resolves to the latest retained value instead of echoing conflicting values as equally current.

The final compile includes 1 fact id, contains release_state=approved, and excludes release_state=draft.

Artifacts: note, trace
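The retention behaviour in this case can be pictured as a latest-value-wins resolution over admitted events. The sketch below assumes a hypothetical event shape of `(sequence, key, value)`; the actual admission and retention rules are defined by the runtime and its trace, not by this example.

```python
def known_facts(events):
    """Resolve each fact key to its most recently admitted value.

    Hypothetical sketch: events are (sequence, key, value) tuples with
    unique sequence numbers, and a later sequence overrides an earlier one.
    """
    latest = {}
    for _seq, key, value in sorted(events):
        latest[key] = value
    return latest


# Events may arrive out of order; sequence numbers decide which is current.
events = [
    (2, "release_state", "approved"),
    (1, "release_state", "draft"),
]

facts = known_facts(events)
bundle_text = "\n".join(f"{key}={value}" for key, value in sorted(facts.items()))

assert "release_state=approved" in bundle_text
assert "release_state=draft" not in bundle_text
```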

Deterministic compile | 4 of 4 steps

Shows deterministic compile over unchanged authoritative state.

compile_a and compile_b have identical bundle_text and identical included_items. Each compile emits 2 decision ids, 2 fact ids, and 1 artifact hash.

Artifacts: note, trace

Checkpoint and restore | 7 of 7 steps

Shows interruption survival through checkpoint and restore.

The mutated compile differs from the checkpoint-state compile, and the restored compile matches the checkpoint-state compile exactly. The restored bundle contains re-enable HMAC verification and excludes leave temporary bypass enabled.

Artifacts: note, trace

Useful re-emission | 4 of 4 steps

Shows that compile output is useful to downstream execution rather than only a storage artifact.

The final compile emits 3 decision ids, 3 fact ids, and 2 artifact hashes. The bundle contains OMEGA-9921, src/auth/webhook_guard.rs, and re-enable HMAC verification.

Artifacts: note, trace

Clutter-pressure proofs

Three cases that keep the substrate honest under growing but valid same-task clutter.

Light retention | 15 of 15 steps

Shows bounded relevant retention under light clutter pressure.

The final compile token estimate is 564. It emits 3 decision ids, 4 fact ids, and 5 artifact hashes. Relevant auth-fix strings remain present while sampled clutter strings remain absent.

Artifacts: note, trace

Medium compile and re-emission | 36 of 36 steps

Shows deterministic compile and useful re-emission under medium clutter pressure.

compile_a and compile_b remain identical in both bundle_text and included_items. Each compile token estimate is 567, and the final bundle keeps the relevant auth-fix subset while omitting sampled clutter strings.

Artifacts: note, trace

Heavy restore continuity | 64 of 64 steps

Shows restore continuity after heavy valid clutter plus post-checkpoint mutations.

The checkpoint-state compile, compile_restored_a, and compile_restored_b are identical in both bundle_text and included_items. The restored bundle returns to the retained auth-fix subset and excludes the temporary bypass lane.

Artifacts: note, trace

Capped endurance extension

One bounded extension that demonstrates a longer clutter run without inflating it into a broader benchmark family.

500-step restore and re-emission | 500 of 500 steps

This is the only endurance extension in the packet, and it remains explicitly capped.

compile_100_a and compile_100_b match, compile_250_a and compile_250_b match, and compile_500_a and compile_500_b match. The final restored compile matches the checkpointed retained subset. The mutated compile shows the temporary proxy bypass lane, which then disappears after restore.

Final restored bundle contains OMEGA-9921, src/auth/webhook_guard.rs, re-enable HMAC verification, and reject unsigned webhook replay.
Final restored bundle excludes src/gateway/proxy.rs, leave temporary bypass enabled, and allow unsigned replay until Monday.
Bounded token estimate is 449 at step 100, 566 at step 250, 580 for the mutated compile, and 566 at the final restored compile.

Artifacts: note, trace
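A reviewer working from the trace could express the contains and excludes conditions above as a simple string check. The helper below is a hypothetical sketch: `check_bundle` is not part of the packet, and `sample_bundle` is a stand-in string rather than real restored output. Only the required and excluded strings are taken from the case description.

```python
def check_bundle(bundle_text, must_contain, must_exclude):
    """Return (missing, leaked): required strings absent, excluded strings present."""
    missing = [s for s in must_contain if s not in bundle_text]
    leaked = [s for s in must_exclude if s in bundle_text]
    return missing, leaked


MUST_CONTAIN = [
    "OMEGA-9921",
    "src/auth/webhook_guard.rs",
    "re-enable HMAC verification",
    "reject unsigned webhook replay",
]
MUST_EXCLUDE = [
    "src/gateway/proxy.rs",
    "leave temporary bypass enabled",
    "allow unsigned replay until Monday",
]

# Stand-in text for illustration only; a real check would load bundle_text
# from the restored compile recorded in the trace.
sample_bundle = (
    "incident_code=OMEGA-9921\n"
    "file=src/auth/webhook_guard.rs\n"
    "decision=re-enable HMAC verification\n"
    "decision=reject unsigned webhook replay\n"
)

missing, leaked = check_bundle(sample_bundle, MUST_CONTAIN, MUST_EXCLUDE)
assert missing == [] and leaked == []
```

Running the same check against the mutated compile would be expected to report the bypass-lane strings as leaked, which is how the before and after restore difference becomes directly visible.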

How to read the stack

The stack is strongest when read from substrate outward. Within the live layer, banked results, seam foundation, correction notes, and non-cleared boundaries are separated so each document can be used at the right level.

Safe reading

The current evidence supports a substantial architectural claim: durable agent continuity can be implemented as explicit runtime machinery with deterministic state compilation, persistence, and recoverable execution semantics, and that machinery can support a bounded canonical controller path on top of it.

It also supports a narrower practical claim: the live seam now contains a small set of real, inspectable carried-task results rather than only a structural argument for why such a seam should exist.

Unsafe reading

The evidence should not be read as proof of broad autonomous reliability, universal coding superiority, product readiness in the general case, or a solved comparison against every transcript-centric approach in the market.

Nor should it be compressed into shorthand that implies live 500-turn clearance, restart clearance, full Slice 3 follow-through, quality100 success, or broad research generalisation.

Live seam proof map

The bounded model-in-loop pack is grouped in the order most useful for review: strongest current bounded results first, seam foundation second, corrections and blocker localisation third, and upper boundaries last.

Banked bounded results

The current live-seam case begins here. These backfilled reports are the strongest outward-facing controller artefacts in the current repo snapshot.

Portable runtime-bundle-only closure | pass / 2 families

A stricter runtime-bundle-only confirmed-alternative-closure rung transferred across itsdangerous and blinker under the same bounded aided regime.

The safe read is cross-family transfer of a specific rung, not broad or unaided portability.

Artifact: portability reference report

Endurance boundary to 400 | pass / banked to 400

Same-surface endurance is banked through 400 turns on the frozen itsdangerous surface.

The value of the report is precision: 500 turns remains non-cleared and restart proof remains separate.

Artifact: endurance reference report

Signer/fallback Slice 2 | pass / short horizon

The integrated signer/fallback burden could be converted and preserved over the short Slice 2 attribution horizon on the locked path.

This is a meaningful practical result because the burden is explicit and integrated, but the scope remains short-horizon and benchmark-local.

Artifact: Slice 2 reference report

Fixed-dossier research transfer | pass / 3 turns

Three dossier-bounded research turns completed on one run object with preserved provenance and a clean bounded transfer read.

The report also sharpens the boundary: the guarded live artefact did not durably prove the stricter turn-2 recommendation-only zero-write continuation.

Artifact: Phase 4 reference report

Seam foundation

The original reference rounds remain important because they show where the live claim began, what the seam established, and why the later backfill matters.

Round 1 | partial pass

Established canonical-envelope discipline, grounded zero-write, and replay equivalence, but the bounded patch turns were semantically weaker than the intended write-backed contract.

Artifact: reference_validation_report.md

Round 2 | partial pass

Confirmed the seam stayed fail-closed and that single-step checkpoint-before-write could clear, but the bounded turn-2 patch paths stopped at a duplicate-decision checkpoint boundary.

That makes Round 2 a structural anchor and a useful failure-localisation report, not the full live-seam story on its own.

Artifact: reference_validation_report.md

Corrections and localisation

These notes make the later pack safer to interpret. They correct older reads and localise real blockers rather than allowing ambiguity to carry the story.

Foundational corrections | zero-write + retention

The pack retains the March 11 canonical zero-write fix and the later PLAN_SLICE retention correction because later seam claims depend on those corrections being read correctly.

Artifacts: PR4 zero-write fix, PLAN_SLICE retention correction

Early family confirmation | corrected pluggy rerun

A corrected metric read converted the locked pluggy delayed re-entry family from a false negative into a properly grounded supported write-backed success.

Artifact: pluggy corrected-metric rerun

Blocker localisation | late Phase 3B boundary

The late Phase 3B consolidation kept the active blocker on delayed final-turn conversion under sparse continuity, not on runtime/storage failure.

Artifact: cross-family delayed final-turn boundary

Current upper boundaries

The live pack is materially stronger than it was, but it remains bounded. These items keep the non-cleared and limit-defining reads visible without letting them obscure the positive case.

Signer/fallback Slice 3 | upper boundary

The original signer/fallback contract could be repaired on turn 4 on the frozen same-file surface, but the full follow-on burden was not cleared.

This is a useful upper-boundary behavioural result, not a full practical-proof clearance.

Artifact: Slice 3 upper-boundary report

Localised and negative boundaries | restart, quality, delayed final-turn

The retained negative pack matters because it keeps the live-seam story technically honest: delayed final-turn conversion was the late Phase 3B blocker, restart survival did not become restart clearance, the dense-churn proving gate stayed no signal, and quality100 remains a corrective failure read.

Artifacts: delayed final-turn boundary, restart boundary, dense-churn proving gate, quality100 postmortem

Navigation and claim controls | viewer index, gap matrix, claim map

These documents keep evidence strength, section order, and safe wording explicit now that the live pack is broader than the original seam rounds. They govern interpretation, but they do not replace the core technical story.

Artifacts: viewer index, gap matrix, claim map

Excluded readings: no broad unaided portability, no live 500-turn endurance clearance, no restart clearance, no Slice 3 clearance, no quality100 success, and no broad open-web research proof.

Trust boundaries and current limits

The current evidence establishes a bounded but substantive validation envelope for the ClawCapsule runtime and a bounded canonical controller path.

Validated now

The runtime substrate has evidence for deterministic compile, authoritative checkpoint and restore semantics, a banked 50k runtime-only reference lane, and a replay packet that makes the restore and re-emission claim easy to inspect. The live seam now extends to bounded portability, endurance through 400 turns on one frozen surface, short-horizon signer/fallback preservation, and narrow three-turn fixed-dossier transfer.

Still intentionally bounded

The current package does not establish broad comparative dominance over transcript-centric systems, unconstrained long-horizon autonomous task completion, live restart clearance, live 500-turn continuity, full signer/fallback follow-through, or broad research transfer beyond a fixed dossier.

Why the separation matters

The pack is more credible because those limits stay explicit. It supports serious technical diligence on the runtime and bounded controller case while leaving broader comparative, deployment, and generalisation questions open rather than implying more than has been shown.

Core evidence documents

Shortest path through the pack: exact runtime proof first, replay packet second, live-seam foundation third, strongest current bounded reports fourth, and interpretation or release-boundary documents after that.

analysis/runtime_only_probe/20260313_runtime_only_50k_split_control_evidence_gate.md

Records the exact current reference-lane 50k runtime-only gate clearance, including run id, checkpoints, challenger-lane non-clearance, and the narrow scope of the banked claim.

Runtime proof | 50k gate

analysis/runtime_substrate_proof/SUMMARY.md

Summarises the replay packet, including base mechanism proofs, clutter-pressure cases, and the capped 500-step extension that demonstrates retained restore and re-emission under bounded clutter.

Replay packet | Mechanism summary

analysis/model_in_loop_reference_validation/20260313_round2/reference_validation_report.md

The original bounded live-seam anchor report: canonical controller, grounded writeback, fail-closed envelope discipline, checkpoint-before-write clearance on the single-step lane, and the turn-2 checkpoint boundary that kept the broader claim narrow.

Live seam | Reference anchor

Backfilled bounded live-seam reports

The current bounded controller pack is now surfaced as formal reports covering portable runtime-bundle-only closure, endurance to 400, signer/fallback Slice 2, Slice 3 upper boundary, and fixed-dossier research transfer.

Live seam | Current bounded reports

docs/Runtime-Validation-and-Evidence-Note.md

Explains the runtime validation methodology, conservative reading discipline, and how the runtime-only evidence should be interpreted relative to broader claims.

Authority | Runtime methodology

docs/Validation-Posture-and-Release-Boundaries.md

Defines the release-boundary reading, including what the current repo does not yet justify saying about storage paths, model-in-loop behaviour, and broader scaling or readiness.

Authority | Release boundary

Supporting technical evidence

These documents sharpen, qualify, or organise the main proof path. They help govern interpretation; they do not replace the core evidence.

Replay packet review docs

The replay packet is easier to trust when its review method is visible. Start with the packet README and proof plan if you want to inspect case design, reading method, and packet boundaries directly rather than reading only the summary.

Replay packet | Review method

Live-seam navigation and claim controls

The viewer index, gap matrix, and claim map keep report strength, section order, and safe wording explicit now that the live pack is broader than the original seam rounds.

Live seam | Navigation + discipline

Foundational live-seam corrections

The PR4 zero-write fix and PLAN_SLICE retention correction matter because later seam claims depend on these corrections being understood correctly.

Live seam | Corrections

Family proof and blocker localisation

The pluggy corrected-metric rerun adds a real early family success once the false metric read was repaired, and the delayed final-turn boundary note keeps the late Phase 3B blocker tied to conversion behaviour rather than runtime or storage.

Live seam | Family support

Negative and non-cleared technical notes

The restart boundary, dense-churn proving gate, and quality100 postmortem remain visible because the pack is more credible when non-success results are easy to inspect.

Corrections | Negative boundaries

Operational telemetry and overnight runs

The overnight runtime-only suite report adds phase-outcome and determinism context, while the 100k soak telemetry report adds longer-run latency and storage growth context, including the recorded acceleration caution.

Operational | Overnight + 100k soak