# Runtime Validation and Evidence Note

This note explains how the current runtime evidence is produced and how to interpret it conservatively.

## 1. Evidence posture

- Runtime claims should distinguish:
  - code/test coverage
  - runtime-only harness evidence
  - bounded inference
- Historical notes remain useful, but the current evidence posture should be anchored to the latest reviewed gate artifacts.
- In the current repo snapshot, the latest reviewed runtime-only 50k reference-lane gate is recorded in:
  - `analysis/runtime_only_probe/20260313_runtime_only_50k_split_control_evidence_gate.md`
  - run id `guarded-segmented-runtime-only-50k-validation-20260313-002`

## 2. Validation methodology currently used

- The repo uses layered validation:
  - unit and integration tests for determinism, crash recovery, restart safety, diagnostics behavior, and storage fail-closed rules
  - targeted daemon tests for split and guarded-segmented load/write paths
  - runtime-only probe harnesses that exercise the real HTTP runtime without model calls
- The current runtime-only 50k methodology uses:
  - preflight checks
  - staged totals at `20000` and `50000`
  - periodic compile checkpoints every `5000` accepted events
  - stage-end compile checkpoints
  - restart/open checkpoints after stage boundaries
  - per-lane runtime logs, compile samples, and summary artifacts
- The current harness compares:
  - `split_control` as the control/reference lane
  - `guarded_segmented` as a challenger lane
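
The staged methodology above can be sketched as a checkpoint schedule. This is a minimal illustration, not the harness's actual code: the function name `checkpoint_schedule` and the `periodic_` label prefix are hypothetical, and the sketch assumes that a periodic checkpoint falling exactly on a stage boundary is covered by the stage-end checkpoint. Only the `stage_end_` and `restart_` label formats, the stage totals, and the `5000` interval come from this note.

```python
from typing import List, Sequence


def checkpoint_schedule(
    stage_totals: Sequence[int] = (20000, 50000),
    periodic_every: int = 5000,
) -> List[str]:
    """Return checkpoint labels in the order a staged run would reach them.

    Hypothetical helper: the real harness may order or name periodic
    checkpoints differently.
    """
    labels: List[str] = []
    accepted = 0  # accepted events at the start of the current stage
    for total in stage_totals:
        # Periodic compile checkpoints strictly inside the stage.
        first = (accepted // periodic_every + 1) * periodic_every
        for n in range(first, total, periodic_every):
            labels.append(f"periodic_{n:06d}")  # assumed label format
        # Stage-end compile checkpoint, then restart/open checkpoint.
        labels.append(f"stage_end_{total:06d}")
        labels.append(f"restart_{total:06d}")
        accepted = total
    return labels


schedule = checkpoint_schedule()
```

Under these assumptions each lane (`split_control` and `guarded_segmented`) would walk the same schedule independently, which is what makes per-lane comparison possible.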

## 3. What the current 50k runtime-only gate proves

- It proves that the current control/reference lane cleared the exact staged runtime-only gate on the recorded configuration for run `guarded-segmented-runtime-only-50k-validation-20260313-002`.
- The following control-lane checkpoints completed successfully in that run:
  - `stage_end_020000`
  - `restart_020000`
  - `stage_end_050000`
  - `restart_050000`
- It proves that, on that run:
  - control-lane restart stayed healthy
  - split-unit cold reconstruction stayed disabled at the reviewed restart checkpoints
  - retained bytes stayed `0 / 0` at the validated 50k stage end for the control lane
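
A lane-scoped check of this claim might look like the sketch below. The summary shape (a dict with `lane` and `completed_checkpoints` keys) and the function names are assumptions made for illustration; the real summary artifacts may be structured differently. The four checkpoint names are the ones listed above.

```python
from typing import Dict, Iterable, List

# Checkpoint names taken from the control-lane list in this note.
REQUIRED_CONTROL_CHECKPOINTS = (
    "stage_end_020000",
    "restart_020000",
    "stage_end_050000",
    "restart_050000",
)


def missing_checkpoints(
    completed: Iterable[str],
    required: Iterable[str] = REQUIRED_CONTROL_CHECKPOINTS,
) -> List[str]:
    """Return required checkpoints that are absent, in required order."""
    done = set(completed)
    return [name for name in required if name not in done]


def lane_cleared(lane_summary: Dict) -> bool:
    """True only if this one lane completed every required checkpoint.

    `lane_summary` is a hypothetical shape, e.g.
    {"lane": "split_control", "completed_checkpoints": [...]}.
    A result for one lane says nothing about any other lane.
    """
    return not missing_checkpoints(lane_summary.get("completed_checkpoints", []))


# Illustrative input only, not recorded data from the reviewed run.
example = {
    "lane": "split_control",
    "completed_checkpoints": list(REQUIRED_CONTROL_CHECKPOINTS),
}
partial = {"lane": "example_lane", "completed_checkpoints": ["stage_end_020000"]}
```

Keeping the check per lane mirrors the scope of the gate itself: a pass for `split_control` is never reused as a pass for another lane.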

## 4. What the current 50k runtime-only gate does not prove

- It does not prove that all runtime storage paths passed 50k.
- It does not clear `guarded_segmented`.
- It does not justify promoting `guarded_segmented` to the reference lane.
- It does not prove unbounded long-horizon scaling.
- It does not prove equivalent behavior across different hardware, OS, storage media, or future commits.
- It does not prove model-in-the-loop behavior, since the runtime-only harness does not include real model calls.
- It does not make diagnostics `dump-bundle` a substitute for the standard `/v1/compile` gate.

## 5. How to interpret performance and scaling evidence conservatively

- Treat reported latency and throughput figures as bounded envelopes tied to:
  - a specific run id
  - a specific harness
  - a specific machine/configuration
  - a specific lane
- Prefer wording such as:
  - `validated on the current split_control reference lane`
  - `not cleared for guarded_segmented`
  - `bounded to the reviewed runtime-only gate`
- Avoid wording such as:
  - `all paths passed`
  - `scales indefinitely`
  - `production-proven across configurations`
- Keep event, compile, restart/open, and diagnostics evidence separated. A healthy event lane does not automatically prove a healthy diagnostics lane, and vice versa.
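
One way to apply the wording guidance is a simple phrase check over draft text. This is a hypothetical sketch, not an existing repo tool: the function name is invented, and the phrase list contains only the three examples from this section, so it is a reminder rather than a complete filter.

```python
from typing import List

# The "avoid" examples from this section, lowercased for matching.
OVERBROAD_PHRASES = (
    "all paths passed",
    "scales indefinitely",
    "production-proven across configurations",
)


def find_overbroad_phrases(text: str) -> List[str]:
    """Return the listed overbroad phrases found in `text`, case-insensitively."""
    lowered = text.lower()
    return [phrase for phrase in OVERBROAD_PHRASES if phrase in lowered]


flagged = find_overbroad_phrases("The storage layer Scales Indefinitely.")
clean = find_overbroad_phrases("validated on the current split_control reference lane")
```

A plain substring match will miss paraphrases, so a check like this supplements review of evidence wording and does not replace it.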

## 6. Historical versus current evidence

- `analysis/runtime_only_probe/20260312_runtime_only_50k_validation_note.md` remains historically correct for its failed run.
- The later control-lane gate supersedes it only for interpreting the current 50k status of the control lane.
- Historical negative evidence should not be erased, but later positive evidence must be read with equal care and with lane-specific scope.

## 7. Current caution areas

- `guarded_segmented` remains outside the acceptable storage/restart envelope according to the currently reviewed evidence.
- Foundational public docs still need addenda so that evidence posture and implementation status are explicit rather than inferred.
- Until a future ADR says otherwise, storage-lane promotion and retirement remain governance questions, not settled architecture.
