# Model-In-The-Loop Reference Validation Round 1

Date: 2026-03-13

## 1. Executive judgement
partial pass

## 2. Exact scope run
- runtime lane: split_control via explicit storage-root-scoped split gate on http://127.0.0.1:7341
- agent lane: claw-agent-ref canonical
- provider: openai (gpt-4.1-mini)
- task set:
  - inspect_only: realistic_coding_tool_backed (reference-round1-20260313-realistic-coding)
  - single_step_patch_verify: realistic_write_backed_fix_verify (reference-round1-20260313-realistic-write-backed)
  - negative_zero_write_ungrounded: hypothesis_only_zero_write (reference-round1-20260313-hypothesis-only)
  - negative_zero_write_stale_write_ref: unknown_write_file_ref_zero_write (reference-round1-20260313-unknown-write-file-ref)
  - two_turn_inspect_then_patch_verify: bounded_two_turn_realistic (reference-round1-20260313-bounded-two-turn)
  - bounded_follow_up_fix: bounded_follow_up_realistic (reference-round1-20260313-bounded-follow-up)

## 3. What was proven
- The live split_control daemon and canonical claw-agent-ref lane exchanged runtime-managed bundle_text over /v1/compile on every turn.
- All observed OpenAI outputs stayed inside the strict canonical envelope without repairs.
- Inspect-only and single-step patch/verify tasks admitted grounded controller_fact_decision events using current-turn tool refs.
- Negative and ungrounded turns zero-wrote cleanly with no hidden event write.
- Duplicate event replays were success-equivalent: every replay of an admitted event returned an HTTP 409 response with deduplicated=true.
- No legacy_action fallback or generic runtime event type was observed.
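
The duplicate-replay finding above can be illustrated with a minimal sketch of how a client might classify write responses. The `WriteResponse` type and `is_success_equivalent` helper are hypothetical names introduced for illustration only. The one detail taken from this report is that a replay returns HTTP 409 with `deduplicated=true`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WriteResponse:
    """Hypothetical, simplified view of an event-write response."""
    status: int          # HTTP status code
    deduplicated: bool   # mirrors the deduplicated=true flag reported for replays


def is_success_equivalent(resp: WriteResponse) -> bool:
    """Treat a first-time 2xx write and a deduplicated 409 replay as the same outcome."""
    if 200 <= resp.status < 300:
        return True
    # A 409 only counts as success when the server says it recognised a duplicate.
    return resp.status == 409 and resp.deduplicated
```

Under this sketch, a 409 without the dedupe flag is still treated as a genuine conflict, so a retry loop would not silently swallow real write collisions.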

## 4. What failed or remained weak
- The bounded two-turn and bounded follow-up run_loop scenarios did not materialize write_file evidence, did not checkpoint, and did not execute the intended auto-write-backed patch turns.
- Those bounded patch turns still admitted controller_fact_decision outputs grounded only in read_file/run_tests evidence, so the live result was semantically weaker than the intended patch/verify contract.
- Checkpoint-before-write was therefore not proven live on the bounded auto-write path.
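
The intended patch/verify contract described above could be enforced with a gate along the following lines. This is a sketch under stated assumptions, not the controller's actual policy: the function name and the reason strings are hypothetical, and the only grounded detail is the `tool:write_file#` ref prefix, which appears in the per-turn outcomes of this report.

```python
WRITE_REF_PREFIX = "tool:write_file#"  # prefix seen in this report's evidence refs


def admit_bounded_patch_turn(evidence_refs: list[str], checkpoint_count: int) -> tuple[str, str]:
    """Return (admission, reason) for a bounded patch turn.

    Hypothetical policy: a patch turn may only write back when it carries
    write_file evidence and at least one checkpoint was taken beforehand.
    """
    has_write_evidence = any(ref.startswith(WRITE_REF_PREFIX) for ref in evidence_refs)
    if not has_write_evidence:
        # read_file/run_tests evidence alone does not prove a patch happened.
        return ("zero", "missing_write_file_evidence")
    if checkpoint_count < 1:
        return ("zero", "missing_checkpoint")
    return ("controller_fact_decision", "grounded_patch")
```

With such a gate, the bounded patch turns in this round would have zero-written rather than admitting a fact/decision grounded only in read_file/run_tests evidence. The sketch does not check that a ref belongs to the current turn; a real gate would also need that.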

## 5. Failure attribution
- primary: controller/runtime boundary issues
- secondary: model-following issues
- not primary: task-design issues, provider issues
- detail: The critical miss was a boundary-format mismatch: the controller helper that checks whether runtime bundle_text records the prior bounded inspection only parses dash-prefixed section items, while the live runtime compile output uses FACT/DECISION lines. That prevented host-side write planning and checkpointing on bounded patch turns. The model also failed to zero-write when write_file evidence was absent, but that weakness was exposed by the boundary mismatch rather than by provider instability.
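
To make the boundary-format mismatch concrete, the sketch below shows a helper that accepts both shapes: dash-prefixed items and FACT/DECISION lines. The exact bundle_text layout is not reproduced in this report, so the line formats (`- ...`, `FACT: ...`, `DECISION: ...`), the function names, and the keyword match on "inspection" are all assumptions for illustration.

```python
import re

# Matches either "- <body>" or "FACT <body>" / "DECISION: <body>" (assumed formats).
_ITEM = re.compile(r"^\s*(?:-\s+|(?:FACT|DECISION)\b[:\s]*)(?P<body>.+)$")


def extract_items(bundle_text: str) -> list[str]:
    """Collect item bodies from dash-prefixed and FACT/DECISION lines alike."""
    items = []
    for line in bundle_text.splitlines():
        match = _ITEM.match(line)
        if match:
            items.append(match.group("body").strip())
    return items


def records_prior_inspection(bundle_text: str, path: str) -> bool:
    """Hypothetical check: does any item mention an inspection of `path`?"""
    return any(
        "inspection" in item.lower() and path in item
        for item in extract_items(bundle_text)
    )
```

A helper restricted to the first alternative (`-\s+`) would return no items for FACT/DECISION output, which is the kind of silent miss that would keep host-side write planning from triggering.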

## 6. Whether the canonical seam remained intact throughout
- status: intact_but_policy_weak
- detail: The seam remained structurally canonical throughout: compile -> canonical envelope -> zero or one controller_* event, with no legacy fallback. The weakened area was the bounded write-backed controller policy layered on that seam, not the seam shape itself.

## 7. Whether a second broader round is justified next
- justified: false
- reason: A broader second model-in-the-loop round should wait for a fix and rerun of the bounded patch/verify lane, because the current narrow reference round did not prove live checkpoint-before-write or actual bounded auto-write-backed execution.

## 8. Exact per-turn outcomes
| scenario | class | turn | admission | event_count | checkpoint_count | duplicate_replay | exact_envelope | summary |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| realistic_coding_tool_backed | inspect_only | 1 | controller_fact_decision | 1 | 0 | 409/dedupe=True | True | Completed current-turn sandbox repo inspection |
| realistic_write_backed_fix_verify | single_step_patch_verify | 1 | controller_fact_decision | 1 | 0 | 409/dedupe=True | True | Completed single-file shipping threshold fix and verification |
| hypothesis_only_zero_write | negative_zero_write_ungrounded | 1 | zero | 0 | 0 | n/a | True | Zero-write: no authoritative fact or decision survived admission |
| unknown_write_file_ref_zero_write | negative_zero_write_stale_write_ref | 1 | zero | 0 | 0 | n/a | True | Zero-write: unknown controller-issued evidence ref tool:write_file#stale-turn-t9999 |
| bounded_two_turn_realistic | two_turn_inspect_then_patch_verify | 1 | controller_fact_decision | 1 | 0 | 409/dedupe=True | True | Completed bounded inspection for shop/shipping.py |
| bounded_two_turn_realistic | two_turn_inspect_then_patch_verify | 2 | controller_fact_decision | 1 | 0 | 409/dedupe=True | True | Update condition in shop/shipping.py from '>' to '>=' on line 5 to include exact threshold 4000 cents for free shipping. |
| bounded_follow_up_realistic | bounded_follow_up_fix | 1 | controller_fact_decision | 1 | 0 | 409/dedupe=True | True | Fix shipping fee threshold behavior by changing comparisons from > to >= to correctly apply free shipping at exact threshold values. |
| bounded_follow_up_realistic | bounded_follow_up_fix | 2 | controller_fact_decision | 1 | 0 | 409/dedupe=True | True | Fix shipping fee threshold behavior by changing comparisons from > to >= to correctly apply free shipping at exact threshold values. |

## 9. Objective status
- 1_compile_bundle_text_authoritative_memory_text: pass - All 8 turns issued /v1/compile and received non-empty bundle_text; compile requests remained bounded to task_id/budget/focus/format, and second-turn compile output in the bounded two-turn run contained the first-turn inspection fact/decision.
- 2_strict_canonical_envelope: pass - All 8 observed model outputs parsed as exact top-level canonical envelopes with schema_version clawcapsule.controller.response.v1alpha1. No repair turns were needed.
- 3_tool_backed_grounding_and_zero_or_one_writeback: partial_pass - Inspect-only and single-step patch/verify scenarios admitted grounded controller_fact_decision events with current-turn tool refs, and all 8 turns stayed zero-or-one-event. However, the bounded patch turns never materialized write_file evidence and still admitted non-empty fact/decision outputs.
- 4_negative_or_ungrounded_zero_write: pass - The hypothesis-only scenario zero-wrote with reason non_authoritative_only, and the stale write_file ref scenario zero-wrote with reason unknown_evidence_ref.
- 5_duplicate_mutation_retry_success_equivalent: pass - Duplicate replays for every admitted event returned deduplicated=true with HTTP 409 success-shaped responses, including bounded-turn events.
- 6_no_hidden_legacy_or_generic_writeback_fallback: pass - All admitted writes were controller_fact_decision events. No tool_result, test_result, code_change, or legacy_action-style generic writeback appeared.
- 7_checkpoint_before_write_on_auto_write_backed_turns: fail - No bounded auto-write turn emitted a checkpoint or a current-turn write_file ref. Direct host-side probing on the same generated repos showed the write plan was available, but the live bounded flow failed to recognize the prior inspection state from actual bundle_text and therefore never reached checkpoint-before-write.
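
Objectives 2, 3, and 6 amount to a per-turn structural check, sketched below. The record shape is hypothetical: the `events` key and the per-event `type` field are assumptions, since this report does not show the envelope body. The schema_version string and the list of disallowed generic types are taken from the objectives above.

```python
SCHEMA_VERSION = "clawcapsule.controller.response.v1alpha1"
FORBIDDEN_TYPES = {"tool_result", "test_result", "code_change", "legacy_action"}


def check_turn(record: dict) -> list[str]:
    """Return a list of structural problems for one turn record (empty means clean)."""
    problems = []
    if record.get("schema_version") != SCHEMA_VERSION:
        problems.append("unexpected schema_version")
    events = record.get("events", [])  # hypothetical key for admitted events
    if len(events) > 1:
        problems.append("more than one event")
    for event in events:
        event_type = event.get("type", "")  # hypothetical field name
        if event_type in FORBIDDEN_TYPES or not event_type.startswith("controller_"):
            problems.append(f"non-canonical event type: {event_type!r}")
    return problems
```

A check of this shape only covers structure. It would pass the bounded patch turns in this round, which is why objective 7 needs a separate evidence and checkpoint check.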

## 10. Exact files changed, if any
- No tracked source files in either repo were edited by this round.
- Exact generated-file list: c:\Users\shaun\software_development\ClawCapsule\analysis\model_in_loop_reference_validation\20260313_round1\reference_validation_file_changes.txt

## 11. Exact evidence artifacts produced
- raw report: c:\Users\shaun\software_development\ClawCapsule\analysis\model_in_loop_reference_validation\20260313_round1\reference_validation_raw.json
- summary json: c:\Users\shaun\software_development\ClawCapsule\analysis\model_in_loop_reference_validation\20260313_round1\reference_validation_summary.json
- markdown judgement: c:\Users\shaun\software_development\ClawCapsule\analysis\model_in_loop_reference_validation\20260313_round1\reference_validation_report.md
- artifact manifest: c:\Users\shaun\software_development\ClawCapsule\analysis\model_in_loop_reference_validation\20260313_round1\reference_validation_artifacts.txt
- file-change manifest: c:\Users\shaun\software_development\ClawCapsule\analysis\model_in_loop_reference_validation\20260313_round1\reference_validation_file_changes.txt
