the-grid-cc 1.7.2 → 1.7.4

@@ -0,0 +1,867 @@
1
+ # Feature: Auto-Verify Default
2
+
3
+ ## Research Summary
4
+
5
+ ### Industry Patterns from CI/CD and Production Systems
6
+
7
+ From my research on verification patterns in modern CI/CD pipelines and production deployment systems, I identified several critical patterns:
8
+
9
+ **1. Verification as Default, Not Optional (Harness CV, 2025)**
10
+ - Modern CD platforms like Harness implement "Continuous Verification" as an automatic step that validates deployments using APM integration and ML-based anomaly detection
11
+ - Verification triggers automatic rollbacks if anomalies are found
12
+ - The pattern: verification is the default behavior after any deployment, not something teams must remember to add
13
+
14
+ **2. Smoke Tests as Immediate Post-Deployment Gates**
15
+ - Industry standard: smoke tests execute IMMEDIATELY after deployment completes
16
+ - Purpose: rapid validation that core functionality works before proceeding
17
+ - LaunchDarkly (2024): "Smoke testing confirms build stability before full testing begins"
18
+ - New Relic (2024): Synthetic monitors continuously verify production deployments automatically
19
+ - These are NOT opt-in; they're built into the deployment pipeline
20
+
21
+ **3. Fast Feedback Loops Over Manual Verification**
22
+ - Dev.to (2026): CI/CD pipelines prioritize "automated builds and tests" immediately after code commits
23
+ - The faster the feedback, the cheaper the fix
24
+ - Manual verification creates bottlenecks and is reserved for truly non-automatable scenarios
25
+
26
+ **4. Blocking vs Non-Blocking Verification**
27
+ - Production systems use both patterns depending on risk:
28
+ - **Blocking:** Verification completes BEFORE next wave proceeds (deployment gates)
29
+ - **Non-blocking:** Verification runs in parallel with next steps (canary deployments with monitoring)
30
+ - Grid's wave-based execution aligns with the blocking pattern: verify Wave 1 before spawning Wave 2
31
+
32
+ **5. Verification Scope: Structural vs Runtime**
33
+ - CI/CD distinguishes between:
34
+ - **Verification:** "Did we build the thing right?" (structural checks, unit tests)
+ - **Validation:** "Did we build the right thing? Does it work for users?" (integration tests, E2E)
36
+ - Recognizer's three-level artifact verification (Exist → Substantive → Wired) mirrors this structural verification pattern
37
+
38
+ **6. Opt-Out Not Opt-In**
39
+ - Modern frameworks make verification the default path
40
+ - Teams must explicitly skip verification (e.g., `--skip-tests`, `verify: false` flags)
41
+ - This adds deliberate friction to skipping safety checks, which tends to reduce incidents
42
+
43
+ **Key Takeaway:** The industry has converged on **automatic verification as the default behavior** after any code execution. Manual opt-in verification is a legacy pattern that creates risk.
44
+
45
+ ---
46
+
47
+ ## Current Protocol
48
+
49
+ ### How It Works Now
50
+
51
+ From `mc.md` lines 310-363, the current protocol documents the "execute-and-verify primitive":
52
+
53
+ ```markdown
+ ## EXECUTE-AND-VERIFY PRIMITIVE
+
+ **Executor + Recognizer is the atomic unit.** Don't spawn Executor without planning to verify.
+
+ def execute_and_verify(plan_content, state_content, warmth=None):
+     """Execute a plan and verify the result. Returns combined output."""
+
+     # 1. Spawn Executor
+     exec_result = Task(...)
+
+     # 2. If checkpoint hit, return early (don't verify incomplete work)
+     if "CHECKPOINT REACHED" in exec_result:
+         return exec_result
+
+     # 3. Read the SUMMARY for verification context
+     summary = read(f".grid/phases/{block_dir}/{block}-SUMMARY.md")
+
+     # 4. Spawn Recognizer
+     verify_result = Task(...)
+
+     return {
+         "execution": exec_result,
+         "verification": verify_result
+     }
+ ```
79
+
80
+ **Problem:** This is documented but NOT enforced. MC must manually remember to:
81
+ 1. Check if Executor returned CHECKPOINT
82
+ 2. Decide whether to verify
83
+ 3. Spawn Recognizer if appropriate
84
+
85
+ This creates gaps:
86
+ - Verification can be forgotten under cognitive load
87
+ - Manual decision adds friction and delay
88
+ - The atomic "execute-and-verify" primitive isn't actually atomic in practice
89
+
90
+ ### Current Spawning Pattern (lines 229-236)
91
+
92
+ ```python
93
+ # Parallel execution - all three spawn simultaneously
94
+ Task(prompt="...", subagent_type="general-purpose", description="Execute plan 01")
95
+ Task(prompt="...", subagent_type="general-purpose", description="Execute plan 02")
96
+ Task(prompt="...", subagent_type="general-purpose", description="Execute plan 03")
97
+ ```
98
+
99
+ Programs spawn in parallel, MC waits for all to complete, then manually decides next steps.
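The spawn-then-wait pattern is a fan-out followed by a barrier. The sketch below models it in plain Python under stated assumptions: it is not how MC actually issues `Task()` calls. Each Executor is replaced by an ordinary callable (`run_plan` is a hypothetical stub returning the Executor's status text), and `concurrent.futures` supplies the parallelism.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def run_wave_in_parallel(plan_ids: List[str],
                         run_plan: Callable[[str], str]) -> Dict[str, str]:
    """Start every plan in the wave at once, then block until all finish.

    `run_plan` is a hypothetical stand-in for one Executor Task() call.
    """
    with ThreadPoolExecutor(max_workers=max(1, len(plan_ids))) as pool:
        # Fan-out: submit every plan before waiting on any of them
        futures = {pid: pool.submit(run_plan, pid) for pid in plan_ids}
        # Barrier: .result() blocks until each plan is done (and re-raises
        # if that plan's callable raised)
        return {pid: fut.result() for pid, fut in futures.items()}
```

For example, `run_wave_in_parallel(["plan-01", "plan-02"], lambda pid: f"{pid}: SUCCESS")` returns both statuses keyed by plan ID, and only after both have finished. That final barrier is where auto-verification would hook in.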
100
+
101
+ ### Current Recognizer Usage (lines 757-774)
102
+
103
+ ```markdown
104
+ ## VERIFICATION (RECOGNIZER)
105
+
106
+ After execution completes, spawn Recognizer for goal-backward verification:
107
+
108
+ **Three-Level Artifact Check:**
109
+ 1. **Existence** - Does the file exist?
110
+ 2. **Substantive** - Is it real code (not stub)? Min lines, no TODO/FIXME
111
+ 3. **Wired** - Is it connected to the system?
112
+
113
+ If Recognizer finds gaps, spawn Planner with `--gaps` flag to create closure plans.
114
+ ```
115
+
116
+ Again, documented but not automatic. "After execution completes" is vague—WHEN exactly? Who remembers?
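Here is a rough sketch of the three-level check for a single artifact. The thresholds are assumptions, not the Recognizer's actual rules. "Substantive" is approximated as at least `min_lines` non-blank lines with no TODO/FIXME markers. "Wired" is approximated as some other project file mentioning the artifact's file stem. `check_artifact` and its return labels are hypothetical. A real check would likely follow imports or route tables rather than run a text search.

```python
import re
from pathlib import Path


def check_artifact(path: Path, project_root: Path, min_lines: int = 10) -> str:
    """Return the first level an artifact fails, or "WIRED" if it passes all three.

    Levels follow the Recognizer protocol: Existence -> Substantive -> Wired.
    The heuristics for each level are illustrative assumptions.
    """
    # Level 1: Existence
    if not path.is_file():
        return "MISSING"

    # Level 2: Substantive (enough real lines, no stub markers)
    text = path.read_text(encoding="utf-8", errors="replace")
    real_lines = [ln for ln in text.splitlines() if ln.strip()]
    if len(real_lines) < min_lines or re.search(r"\b(TODO|FIXME)\b", text):
        return "STUB"

    # Level 3: Wired (referenced by name from at least one other file)
    name = path.stem
    for other in project_root.rglob("*"):
        if other.is_file() and other.resolve() != path.resolve():
            if name in other.read_text(encoding="utf-8", errors="replace"):
                return "WIRED"
    return "ORPHANED"
```

The levels short-circuit in order. A stub is never checked for wiring, which mirrors how the excerpt treats each level as a prerequisite for the next.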
117
+
118
+ ---
119
+
120
+ ## Proposed Changes
121
+
122
+ ### 1. Make Verification Automatic by Default
123
+
124
+ **BEFORE (mc.md lines 310-363):**
125
+ ```markdown
126
+ ## EXECUTE-AND-VERIFY PRIMITIVE
127
+
128
+ **Executor + Recognizer is the atomic unit.** Don't spawn Executor without planning to verify.
129
+
130
+ def execute_and_verify(plan_content, state_content, warmth=None):
+     """Execute a plan and verify the result. Returns combined output."""
+     [current implementation]
133
+ ```
134
+
135
+ **AFTER:**
136
+ ````markdown
+ ## EXECUTE-AND-VERIFY PRIMITIVE
+
+ **Verification is AUTOMATIC after successful execution.** The atomic unit is:
+ ```
+ Executor → (if SUCCESS) → Recognizer → (if GAPS) → Planner --gaps
+ ```
+
+ ### Protocol
+
+ **1. Executor completes with status:**
+ - `SUCCESS` → Auto-spawn Recognizer (default path)
+ - `CHECKPOINT` → Return to MC, don't verify incomplete work
+ - `FAILURE` → Return to MC with structured failure report
+
+ **2. Recognizer spawns AUTOMATICALLY unless:**
+ - Executor returned CHECKPOINT (incomplete work, nothing to verify yet)
+ - Executor returned FAILURE (broken build, fix first)
+ - Plan frontmatter contains `verify: false` (rare override; excludes that plan, and skips the wave only if every plan in it opts out)
+ - User explicitly said "skip verification" for this wave or for this session
+
+ **3. Verification timing:**
+ - **Wave-level verification:** After the entire wave completes, verify all plans in that wave
+ - Recognizer receives ALL wave execution summaries for holistic goal verification
+ - This prevents redundant verification of interdependent plans
+
+ ### Implementation Pattern
+
+ ```python
+ import yaml  # PyYAML: serializes must-haves into the Recognizer prompt
+
+
+ def execute_wave(wave_plans, state_content, warmth=None):
+     """Execute a wave and auto-verify results."""
+
+     # 1. Spawn all Executors in wave (parallel).
+     #    MC issues these Task() calls together in one message; the loop
+     #    only shows what each call contains.
+     exec_results = []
+     for plan in wave_plans:
+         result = Task(
+             prompt=f"""
+ First, read ~/.claude/agents/grid-executor.md for your role.
+
+ <state>{state_content}</state>
+ <plan>{plan.content}</plan>
+ {f'<warmth>{warmth}</warmth>' if warmth else ''}
+
+ Execute the plan. Include lessons_learned in your SUMMARY.
+ Return one of: SUCCESS | CHECKPOINT | FAILURE
+ """,
+             subagent_type="general-purpose",
+             model=get_model("executor"),
+             description=f"Execute {plan.id}"
+         )
+         exec_results.append((plan, result))
+
+     # 2. Analyze wave results (match the status tokens requested above;
+     #    "EXECUTION FAILED" is also accepted as a failure marker)
+     checkpoints = [r for r in exec_results if "CHECKPOINT" in r[1]]
+     failures = [r for r in exec_results
+                 if "FAILURE" in r[1] or "EXECUTION FAILED" in r[1]]
+     successes = [r for r in exec_results if "SUCCESS" in r[1]]
+
+     # 3. Handle non-success states (most blocking first)
+     if checkpoints:
+         return {"status": "CHECKPOINT", "details": checkpoints}
+     if failures:
+         return {"status": "FAILURE", "details": failures}
+
+     # 4. Auto-verify successes (unless explicitly skipped)
+     if should_skip_verification(wave_plans):
+         return {"status": "SUCCESS", "verification": "SKIPPED"}
+
+     # 5. Collect summaries and must-haves for every plan that didn't opt out
+     summaries = []
+     must_haves = []
+     for plan, result in successes:
+         if extract_frontmatter(plan.content).get("verify") is False:
+             continue  # Plan-level opt-out excludes only this plan
+
+         summary = read(f".grid/phases/{plan.phase_dir}/{plan.block}-SUMMARY.md")
+         summaries.append(summary)
+
+         # Extract must-haves from plan frontmatter
+         plan_must_haves = extract_must_haves(plan.content)
+         must_haves.extend(plan_must_haves)
+
+     # Join outside the f-string (backslashes aren't allowed in f-string
+     # expressions before Python 3.12) and keep summaries visually separate
+     joined_summaries = "\n\n".join(summaries)
+
+     # 6. Spawn Recognizer (AUTOMATIC)
+     verify_result = Task(
+         prompt=f"""
+ First, read ~/.claude/agents/grid-recognizer.md for your role.
+
+ PATROL MODE: Wave {wave_plans[0].wave} verification
+
+ <wave_summaries>
+ {joined_summaries}
+ </wave_summaries>
+
+ <must_haves>
+ {yaml.dump(must_haves)}
+ </must_haves>
+
+ Verify goal achievement for this wave. Check all artifacts against three levels:
+ 1. Existence
+ 2. Substantive (not stubs)
+ 3. Wired (connected to system)
+
+ Return status: CLEAR | GAPS_FOUND | CRITICAL_ANOMALY
+ """,
+         subagent_type="general-purpose",
+         model=get_model("recognizer"),
+         description=f"Verify wave {wave_plans[0].wave}"
+     )
+
+     # 7. Handle verification results
+     if "GAPS_FOUND" in verify_result:
+         # Auto-spawn Planner with --gaps flag
+         gaps = extract_gaps_from_verification(verify_result)
+         gap_closure_plan = spawn_planner_gaps(gaps, state_content)
+         return {
+             "status": "GAPS_FOUND",
+             "verification": verify_result,
+             "gap_closure": gap_closure_plan
+         }
+
+     if "CRITICAL_ANOMALY" in verify_result:
+         # Needs human eyes: present the items to User via I/O Tower
+         return {
+             "status": "HUMAN_VERIFICATION_NEEDED",
+             "verification": verify_result
+         }
+
+     return {
+         "status": "VERIFIED",
+         "verification": verify_result
+     }
+
+
+ def should_skip_verification(wave_plans):
+     """Check if verification should be skipped for this whole wave."""
+
+     # Skip only if EVERY plan has verify: false; a single opted-out
+     # plan is excluded in step 5 of execute_wave instead
+     if all(extract_frontmatter(p.content).get("verify") is False
+            for p in wave_plans):
+         return True
+
+     # Wave-level override: skip only the wave currently executing
+     if session_state.get("skip_verification_wave") == wave_plans[0].wave:
+         return True
+
+     # Session-level override: skip for the rest of this session
+     if session_state.get("skip_verification"):
+         return True
+
+     return False  # Default: always verify
+ ```
+
+ ### Opt-Out Mechanism
+
+ Users can skip verification via:
+
+ **A. Plan-level override (in PLAN.md frontmatter):**
+ ```yaml
+ ---
+ phase: 01-foundation
+ plan: 02
+ wave: 1
+ verify: false # Skip verification for this plan
+ verify_reason: "Prototype/throwaway code"
+ ---
+ ```
+
+ **B. Session-level override:**
+ ```
+ User: "Skip verification for the rest of this session"
+ MC: "Verification disabled for this session. Will re-enable on next /grid invocation. End of Line."
+ ```
+
+ **C. Wave-level override (rare):**
+ ```python
+ # In MC during wave execution: applies to the current wave only
+ if user_said_skip_this_wave:
+     session_state["skip_verification_wave"] = current_wave
+ ```
+ ````
302
+
303
+ ### 2. Update Wave Execution Documentation
304
+
305
+ **BEFORE (mc.md lines 238-248):**
306
+ ```markdown
307
+ ### Wave-Based Execution
308
+
309
+ Plans are assigned **wave numbers** during planning (not execution). Execute waves sequentially, plans within each wave in parallel:
310
+
311
+ WAVE 1: [plan-01, plan-02] → Spawn both in parallel
312
+ ↓ (wait for completion)
313
+ WAVE 2: [plan-03] → Spawn after Wave 1
314
+ ↓ (wait for completion)
315
+ WAVE 3: [plan-04, plan-05] → Spawn both in parallel
316
+ ```
317
+
318
+ **AFTER:**
319
+ ````markdown
+ ### Wave-Based Execution with Auto-Verification
+
+ Plans are assigned **wave numbers** during planning. Execute waves sequentially, with automatic verification after each wave:
+
+ ```
+ WAVE 1: [plan-01, plan-02]
+ ├─ Spawn Executors (parallel)
+ ├─ Wait for completion
+ ├─ Auto-spawn Recognizer (wave-level verification)
+ └─ If GAPS_FOUND → Spawn Planner --gaps
+
+ WAVE 2: [plan-03]
+ ├─ Spawn Executor
+ ├─ Wait for completion
+ ├─ Auto-spawn Recognizer
+ └─ If CLEAR → Proceed
+
+ WAVE 3: [plan-04, plan-05]
+ ├─ Spawn Executors (parallel)
+ ├─ Wait for completion
+ └─ Auto-spawn Recognizer
+ ```
+
+ **Verification Timing:** Wave-level, not plan-level. This prevents redundant checks on interdependent plans.
+
+ **Verification Skipped When:**
+ - Executor returned CHECKPOINT (incomplete work)
+ - Executor returned FAILURE (broken state)
+ - Plan frontmatter has `verify: false`
+ - User said "skip verification"
+ ````
351
+
352
+ ### 3. Update Rules Section
353
+
354
+ **BEFORE (mc.md line 892):**
355
+ ```markdown
356
+ 8. **Execute and verify** - Executor + Recognizer is atomic
357
+ ```
358
+
359
+ **AFTER:**
360
+ ```markdown
361
+ 8. **Auto-verify by default** - Recognizer spawns automatically after successful execution (opt-out not opt-in)
362
+ ```
363
+
364
+ ### 4. Update Progress Updates Format
365
+
366
+ **BEFORE (mc.md lines 724-740):**
367
+ ```markdown
368
+ ## PROGRESS UPDATES
369
+
370
+ Never leave User in darkness. Show what's happening:
371
+
372
+ Spawning Executor Programs...
373
+ ├─ Wave 1: plan-01, plan-02 (parallel)
374
+ │ ├─ plan-01: Creating components...
375
+ │ └─ plan-02: Writing API routes...
376
+ ├─ Wave 1 complete
377
+ ├─ Wave 2: plan-03
378
+ │ └─ plan-03: Integrating auth...
379
+ └─ All waves complete
380
+ ```
381
+
382
+ **AFTER:**
383
+ ````markdown
+ ## PROGRESS UPDATES
+
+ Never leave User in darkness. Show what's happening (including automatic verification):
+
+ ```
+ Executing Wave 1...
+ ├─ Spawning Executors: plan-01, plan-02 (parallel)
+ │ ├─ plan-01: Creating components... ✓
+ │ └─ plan-02: Writing API routes... ✓
+ ├─ Executors complete
+ ├─ Auto-spawning Recognizer...
+ │ └─ Verifying artifacts and goal achievement... ✓ CLEAR
+ └─ Wave 1 verified
+
+ Executing Wave 2...
+ ├─ Spawning Executor: plan-03
+ │ └─ plan-03: Integrating auth... ✓
+ ├─ Auto-spawning Recognizer...
+ │ └─ Verifying artifacts... ⚠ GAPS_FOUND
+ ├─ Spawning Planner for gap closure...
+ │ └─ Creating closure plan... ✓
+ └─ Wave 2 needs fixes (gap closure plan ready)
+ ```
+
+ The "Auto-spawning Recognizer" line shows it's automatic, not manual.
+ ````
410
+
411
+ ### 5. Update Quick Reference
412
+
413
+ **BEFORE (mc.md line 914):**
414
+ ```markdown
415
+ Checkpoints: Present via I/O Tower, spawn fresh with warmth
416
+ ```
417
+
418
+ **AFTER:**
419
+ ```markdown
420
+ Checkpoints: Present via I/O Tower, spawn fresh with warmth
421
+ Verification: Automatic after SUCCESS (wave-level, opt-out via verify: false)
422
+ ```
423
+
424
+ ---
425
+
426
+ ## Rationale
427
+
428
+ ### Why This Is Better
429
+
430
+ **1. Reduced Cognitive Load**
431
+ - MC no longer needs to remember to verify
432
+ - The decision tree collapses: SUCCESS → verify (always)
433
+ - Mental overhead shifts from "should I verify?" to "is this a rare case where I skip?"
434
+
435
+ **2. Aligns with Industry Standards**
436
+ - Modern CI/CD pipelines don't ask "should we run tests?" — they just do
437
+ - Verification gates are the default in production deployment systems
438
+ - Grid moves from a legacy "manual QA" pattern to modern "continuous verification"
439
+
440
+ **3. Prevents Silent Gaps**
441
+ - Current risk: MC forgets to verify under time pressure or complexity
442
+ - New behavior: Gaps are caught automatically before User sees "BUILD COMPLETE"
443
+ - Shift-left principle: catch issues immediately after creation
444
+
445
+ **4. Psychological Forcing Function**
446
+ - Opt-out (not opt-in) creates friction to skip verification
447
+ - Skipping requires an explicit `verify: false` in frontmatter, with the justification recorded in `verify_reason`
448
+ - This mirrors production safety patterns (e.g., required PR reviews)
449
+
450
+ **5. Better User Experience**
451
+ - User sees verification happening in progress updates
452
+ - Trust increases: "Grid verified this automatically"
453
+ - Reduced surprises: fewer "wait, this doesn't work" moments post-delivery
454
+
455
+ **6. Wave-Level Verification Reduces Redundancy**
456
+ - Verifying plan-01 then plan-02 separately is wasteful when they're interdependent
457
+ - Wave-level verification checks the COMBINED result of parallel work
458
+ - Recognizer sees the full picture, not partial snapshots
459
+
460
+ **7. Enables Automatic Gap Closure**
461
+ - Current: MC sees verification results, manually decides to spawn Planner
462
+ - New: Verification → GAPS_FOUND → auto-spawn Planner --gaps
463
+ - Complete automation of the "build → verify → fix gaps" cycle
464
+
465
+ **8. Preserves Escape Hatches**
466
+ - Not draconian: three ways to opt out (plan, session, wave)
467
+ - Checkpoints and failures naturally skip verification (smart defaults)
468
+ - Power users can disable for rapid prototyping
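Item 7's build → verify → fix gaps cycle needs a stopping condition, so that a gap the Planner cannot close does not loop forever. The sketch below makes two assumptions. First, it caps closure attempts with `max_rounds`, a hypothetical parameter that is not part of the proposal. Second, plain callables stand in for the Recognizer and Planner/Executor spawns. A Planner failure returns the gaps intact, in the spirit of Edge Case 5.

```python
from typing import Callable, Dict, List


def verify_and_close(verify: Callable[[], List[str]],
                     plan_and_execute_closure: Callable[[List[str]], None],
                     max_rounds: int = 2) -> Dict[str, object]:
    """Verify; while gaps remain, run a closure attempt and re-verify.

    `verify` returns the open gaps (an empty list means CLEAR).
    """
    gaps = verify()
    rounds = 0
    while gaps and rounds < max_rounds:
        rounds += 1
        try:
            plan_and_execute_closure(gaps)
        except Exception as e:  # Planner failed: the gaps are still real
            return {"status": "GAPS_FOUND", "gaps": gaps, "error": str(e)}
        gaps = verify()  # Re-verify after every closure attempt

    if gaps:
        return {"status": "GAPS_FOUND", "gaps": gaps, "rounds": rounds}
    return {"status": "VERIFIED", "rounds": rounds}
```

Re-verifying after each closure is what keeps the cycle honest. A closure plan that ran is not the same as a gap that closed.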
469
+
470
+ ---
471
+
472
+ ## Edge Cases Considered
473
+
474
+ ### Edge Case 1: Executor Returns CHECKPOINT
475
+
476
+ **Scenario:** Executor hits a checkpoint mid-wave (e.g., "verify login flow manually").
477
+
478
+ **Handling:**
479
+ ```python
480
+ if "CHECKPOINT REACHED" in exec_result:
+     # DON'T verify: work is incomplete
+     return {"status": "CHECKPOINT", "details": checkpoint_data}
483
+ ```
484
+
485
+ **Why:** Checkpoints indicate incomplete work. Verifying it would report spurious gaps (false alarms for work that simply isn't finished yet). Wait for User to resolve the checkpoint, then verify on continuation.
486
+
487
+ **User Experience:**
488
+ ```
489
+ Wave 1 Execution...
490
+ ├─ plan-01: ✓ SUCCESS
491
+ ├─ plan-02: ⏸ CHECKPOINT (needs User action)
492
+ └─ Verification skipped (checkpoint pending)
493
+
494
+ [MC presents checkpoint to User]
495
+ ```
496
+
497
+ ### Edge Case 2: Executor Returns FAILURE
498
+
499
+ **Scenario:** Executor can't complete due to error (broken build, missing dependency).
500
+
501
+ **Handling:**
502
+ ```python
503
+ if "EXECUTION FAILED" in exec_result:
+     # DON'T verify: nothing meaningful to verify
+     return {"status": "FAILURE", "details": failure_report}
506
+ ```
507
+
508
+ **Why:** Verification assumes there's work to verify. A failed execution produces no artifacts to check. Spawn retry with failure context instead.
509
+
510
+ **User Experience:**
511
+ ```
512
+ Wave 1 Execution...
513
+ ├─ plan-01: ✓ SUCCESS
514
+ ├─ plan-02: ✗ FAILURE (missing prisma client)
515
+ └─ Verification skipped (fix failures first)
516
+
517
+ Spawning retry for plan-02 with failure context...
518
+ ```
519
+
520
+ ### Edge Case 3: Multiple Plans, Mixed Results
521
+
522
+ **Scenario:** Wave has 3 plans. Two succeed, one checkpoints.
523
+
524
+ **Handling:**
525
+ ```python
526
+ # Prioritize the most blocking state
+ if any_checkpoints:
+     return CHECKPOINT  # Block on checkpoint first
+ elif any_failures:
+     return FAILURE  # Fix failures next
+ else:
+     verify(successes)  # Only verify if all succeeded
533
+ ```
534
+
535
+ **Why:** Verification should see COMPLETE wave results. If the wave is only partially done, wait until the checkpoint resolves (or failures are fixed) before verifying.
536
+
537
+ **User Experience:**
538
+ ```
539
+ Wave 1: 3 plans
540
+ ├─ plan-01: ✓ SUCCESS
541
+ ├─ plan-02: ✓ SUCCESS
542
+ ├─ plan-03: ⏸ CHECKPOINT
543
+ └─ Verification deferred until checkpoint resolves
544
+
545
+ [User resolves checkpoint]
546
+
547
+ Resuming Wave 1...
548
+ ├─ plan-03: ✓ SUCCESS
549
+ ├─ Auto-spawning Recognizer...
550
+ │ └─ Verifying all 3 plans... ✓ CLEAR
551
+ └─ Wave 1 verified
552
+ ```
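The prioritization rule (checkpoint, then failure, then verify) is small enough to write as a pure function. This sketch assumes, as the implementation pattern does, that each Executor's output contains its status token as plain text. `route_wave` and its action names are hypothetical. One deliberate addition is labeled in the code: an output with no recognizable status is treated as a failure rather than silently verified.

```python
from typing import Dict, List, Tuple

# Most blocking state first; the order matters when a wave has mixed results
PRIORITY = ("CHECKPOINT", "FAILURE")


def route_wave(results: Dict[str, str]) -> Tuple[str, List[str]]:
    """Decide MC's next step from {plan_id: Executor output}.

    Returns (action, plan_ids):
      ("CHECKPOINT", [...]) -> present checkpoints, defer verification
      ("FAILURE", [...])    -> retry/escalate, skip verification
      ("VERIFY", [...])     -> every plan succeeded, auto-spawn Recognizer
    """
    for status in PRIORITY:
        hits = [pid for pid, out in results.items() if status in out]
        if hits:
            return status, hits

    # Assumption: no recognizable SUCCESS token means don't verify blindly
    unknown = [pid for pid, out in results.items() if "SUCCESS" not in out]
    if unknown:
        return "FAILURE", unknown
    return "VERIFY", list(results)
```

For the scenario above (two successes, one checkpoint), `route_wave` returns `("CHECKPOINT", ["plan-03"])`, and verification waits.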
553
+
554
+ ### Edge Case 4: Verification Itself Fails
555
+
556
+ **Scenario:** Recognizer crashes or times out.
557
+
558
+ **Handling:**
559
+ ```python
560
+ try:
+     verify_result = Task(...)
+ except Exception as e:
+     log_error(f"Recognizer failed: {e}")
+     return {
+         "status": "VERIFICATION_FAILED",
+         "error": str(e),
+         "recommendation": "Manual verification needed"
+     }
569
+ ```
570
+
571
+ **Why:** Don't block progress on verification tooling failure. Surface to User as anomaly.
572
+
573
+ **User Experience:**
574
+ ```
575
+ Wave 1: Execution complete
576
+ ├─ Auto-spawning Recognizer... ✗ FAILED (timeout)
577
+ └─ Verification tool error (manual check recommended)
578
+
579
+ MC: Recognizer encountered an error. Execution completed but verification failed.
580
+ Manual inspection recommended before proceeding. End of Line.
581
+ ```
582
+
583
+ ### Edge Case 5: Verification Finds Gaps, Planner Fails
584
+
585
+ **Scenario:** Recognizer finds gaps → spawns Planner --gaps → Planner fails.
586
+
587
+ **Handling:**
588
+ ```python
589
+ if "GAPS_FOUND" in verify_result:
+     try:
+         gap_closure = spawn_planner_gaps(...)
+         return {"status": "GAPS_FOUND", "verification": verify_result, "gap_closure": gap_closure}
+     except Exception as e:
+         return {
+             "status": "GAPS_FOUND",
+             "verification": verify_result,  # Gaps are still surfaced to User
+             "gap_closure": None,
+             "error": f"Planner failed ({e}); manual gap closure needed"
+         }
599
+ ```
600
+
601
+ **Why:** Gaps are still real even if automated closure planning fails. Surface gaps to User.
602
+
603
+ **User Experience:**
604
+ ```
605
+ Wave 1: Verification complete
606
+ ├─ Status: GAPS_FOUND
607
+ │ └─ Missing: Auth token validation
608
+ ├─ Spawning Planner for gap closure... ✗ FAILED
609
+ └─ Gaps identified but automated closure failed
610
+
611
+ MC: Recognizer found gaps. Automatic closure planning failed.
612
+ See VERIFICATION.md for details. Manual fix needed. End of Line.
613
+ ```
614
+
615
+ ### Edge Case 6: Parallel Waves Completing Out of Order
616
+
617
+ **Scenario:** Due to Task() batching, Wave 2 might complete before Wave 1 verification.
618
+
619
+ **Handling:**
620
+ ```python
621
+ # Waves execute SEQUENTIALLY (per current protocol)
+ # Wave 2 doesn't spawn until Wave 1 is fully verified
+
+ def execute_all_waves(waves, state_content):
+     for wave in waves:
+         result = execute_wave(wave, state_content)  # Includes auto-verification
+
+         if result["status"] == "CHECKPOINT":
+             return result  # Block and return to User
+         elif result["status"] == "FAILURE":
+             return retry_or_escalate(result)  # Never start the next wave on a failed one
+         elif result["status"] == "GAPS_FOUND":
+             result = execute_gap_closure(result["gap_closure"])  # Runs closure plan, then re-verifies
+             if result["status"] != "VERIFIED":
+                 return result  # Gaps still open: stop here
+         # Only proceed to next wave if verified
635
+ ```
636
+
637
+ **Why:** Wave-based execution is ALREADY sequential (mc.md line 245). Verification is just the last step of each wave.
638
+
639
+ **User Experience:**
640
+ ```
641
+ Wave 1: Execute → Verify ✓
642
+
643
+ Wave 2: Execute → Verify ✓
644
+
645
+ Wave 3: Execute → Verify ✓
646
+ ```
647
+
648
+ No change from current behavior: verification simply becomes the automatic final step of each wave.
649
+
650
+ ### Edge Case 7: User Requests Mid-Session Verification Skip
651
+
652
+ **Scenario:** User says "just skip verification for now, I'll check later."
653
+
654
+ **Handling:**
655
+ ```python
656
+ # Set session flag
657
+ session_state["skip_verification"] = True
658
+
659
+ # Inform User
660
+ print("Verification disabled for this session.")
661
+ print("Will re-enable automatically on next /grid invocation.")
662
+ print("End of Line.")
663
+ ```
664
+
665
+ **Why:** Respect User agency. Power users prototyping may want speed over safety temporarily.
666
+
667
+ **User Experience:**
668
+ ```
669
+ User: "Skip verification for now, I'm just prototyping"
670
+
671
+ MC: Verification disabled for this session.
672
+ Will re-enable automatically on next /grid invocation.
673
+ End of Line.
674
+
675
+ [All subsequent waves skip verification]
676
+
677
+ User: /clear
678
+ User: /grid
679
+ User: "Build X"
680
+
681
+ MC: [Verification automatically re-enabled — fresh session]
682
+ ```
683
+
684
+ ### Edge Case 8: Verification Takes Too Long
685
+
686
+ **Scenario:** Large codebase, Recognizer takes 5+ minutes to verify.
687
+
688
+ **Handling:**
689
+ ```python
690
+ # Add a timeout to the verification Task
+ # (assumes the Task runtime accepts a timeout in milliseconds)
+ verify_result = Task(
+     prompt="...",
+     timeout=300000,  # 5 minutes
+     ...
+ )
+
+ if verify_result == TIMEOUT:
+     return {
+         "status": "VERIFICATION_TIMEOUT",
+         "recommendation": "Manual verification or increase timeout"
+     }
702
+ ```
703
+
704
+ **Why:** Don't block progress indefinitely. Surface timeout and let User decide.
705
+
706
+ **User Experience:**
707
+ ```
708
+ Wave 1: Execution complete
709
+ ├─ Auto-spawning Recognizer...
710
+ │ └─ Verifying... (large codebase, this may take a few minutes)
711
+ │ └─ Timeout after 5 minutes
712
+ └─ Verification incomplete (manual check recommended)
713
+
714
+ MC: Verification timed out. Execution completed successfully.
715
+ Recommend manual inspection of key artifacts. End of Line.
716
+ ```
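Edge Cases 4 and 8 can share one wrapper. This sketch uses the standard library's `concurrent.futures` instead of relying on a `timeout` argument to `Task()`. `run_recognizer` is a hypothetical callable standing in for the Recognizer spawn, and the `COMPLETED` status name is also hypothetical. One caveat: Python cannot forcibly kill a running thread. On timeout, the wrapper stops waiting and reports, but the worker may keep running in the background.

```python
import concurrent.futures
from typing import Callable, Dict


def verify_with_timeout(run_recognizer: Callable[[], str],
                        timeout_s: float = 300.0) -> Dict[str, str]:
    """Run verification, degrading gracefully on crash (Edge Case 4)
    or timeout (Edge Case 8) instead of blocking the wave forever."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(run_recognizer)
    try:
        return {"status": "COMPLETED", "verification": future.result(timeout=timeout_s)}
    except concurrent.futures.TimeoutError:
        return {
            "status": "VERIFICATION_TIMEOUT",
            "recommendation": "Manual verification or increase timeout",
        }
    except Exception as e:  # The Recognizer itself raised
        return {
            "status": "VERIFICATION_FAILED",
            "error": str(e),
            "recommendation": "Manual verification needed",
        }
    finally:
        # Don't wait on a hung worker; just stop accepting new work
        pool.shutdown(wait=False)
```

The `TimeoutError` branch comes first because it is itself an `Exception` subclass. Reversing the order would report every timeout as a crash.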
717
+
718
+ ### Edge Case 9: Verification Finds CRITICAL_ANOMALY
719
+
720
+ **Scenario:** Recognizer can't determine goal achievement programmatically (needs human verification).
721
+
722
+ **Handling:**
723
+ ```python
724
+ if "CRITICAL_ANOMALY" in verify_result:
+     # Present to User via I/O Tower
+     return {
+         "status": "HUMAN_VERIFICATION_NEEDED",
+         "details": verify_result  # Recognizer output lists the items needing human eyes
+     }
730
+ ```
731
+
732
+ **Why:** Some things (visual, UX, external integrations) need human eyes. Don't treat this as a failure; surface the items via I/O Tower and wait for User confirmation before the next wave.
733
+
734
+ **User Experience:**
735
+ ```
736
+ Wave 1: Execution complete
737
+ ├─ Auto-spawning Recognizer... ✓
738
+ └─ Status: HUMAN_VERIFICATION_NEEDED
739
+
740
+ Human Verification Required:
741
+ 1. Check login UI renders correctly (screenshot at .grid/refinement/screenshots/login.png)
742
+ 2. Test email delivery works (external service)
743
+
744
+ MC: Automated checks passed. Manual verification needed for items above.
745
+ Confirm when ready to proceed. End of Line.
746
+
747
+ User: "Looks good"
748
+ MC: Proceeding to Wave 2. End of Line.
749
+ ```
750
+
751
+ ### Edge Case 10: Opt-Out via Frontmatter But Verification Needed Anyway
752
+
753
+ **Scenario:** User sets `verify: false` in a plan, then later says "wait, verify that."
754
+
755
+ **Handling:**
756
+ ```python
757
+ # Respect explicit User command over frontmatter
+ if user_says_verify_now:
+     print("Verification override: Spawning Recognizer despite verify: false in plan.")
+     spawn_recognizer(...)  # Override frontmatter setting
761
+ ```
762
+
763
+ **Why:** User intent in conversation overrides static config. Be flexible.
764
+
765
+ **User Experience:**
766
+ ```
767
+ [Wave completes with verify: false in plan]
768
+
769
+ MC: Wave 1 complete. Verification skipped (verify: false in plan). End of Line.
770
+
771
+ User: "Actually, verify that wave"
772
+
773
+ MC: Verification override: Spawning Recognizer despite verify: false in plan.
774
+ [Recognizer runs...]
775
+ End of Line.
776
+ ```
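Edge Cases 1, 2, 7, and 10 together imply a precedence order for the verify/skip decision. The sketch below encodes one reading of that order, under stated assumptions. `VerifyInputs` and its fields are hypothetical names for signals MC already tracks. Two orderings are assumptions: an explicit "verify now" outranks an earlier session skip, and incomplete or failed execution outranks even an explicit request.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class VerifyInputs:
    executor_status: str = "SUCCESS"     # SUCCESS | CHECKPOINT | FAILURE
    user_says_verify_now: bool = False   # Edge Case 10: explicit request
    session_skip: bool = False           # Edge Case 7: "skip verification for now"
    all_plans_opted_out: bool = False    # every plan has verify: false


def decide_verification(inp: VerifyInputs) -> Tuple[bool, str]:
    """Return (should_verify, reason), checking the highest-precedence rule first."""
    if inp.executor_status != "SUCCESS":
        # Nothing complete to verify (Edge Cases 1 and 2)
        return False, f"executor returned {inp.executor_status}"
    if inp.user_says_verify_now:
        return True, "User override: conversation beats config"
    if inp.session_skip:
        return False, "skipped for this session"
    if inp.all_plans_opted_out:
        return False, "verify: false in plan frontmatter"
    return True, "default: auto-verify"
```

Creating a fresh `VerifyInputs()` per `/grid` invocation gives Edge Case 7's reset-on-next-session behavior without any extra cleanup step.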
777
+
778
+ ---
779
+
780
+ ## Implementation Checklist
781
+
782
+ Before merging this feature into production mc.md:
783
+
784
+ - [ ] Update EXECUTE-AND-VERIFY PRIMITIVE section with new protocol
785
+ - [ ] Update Wave-Based Execution section with auto-verification flow
786
+ - [ ] Add `should_skip_verification()` helper function to Quick Reference
787
+ - [ ] Update RULES section (rule #8)
788
+ - [ ] Update PROGRESS UPDATES with verification output
789
+ - [ ] Add verification opt-out patterns to documentation
790
+ - [ ] Update Quick Reference with verification timing note
791
+ - [ ] Test edge cases:
792
+ - [ ] Executor returns CHECKPOINT → verify skipped
793
+ - [ ] Executor returns FAILURE → verify skipped
794
+ - [ ] Mixed wave results → correct prioritization
795
+ - [ ] Verification finds gaps → Planner spawns
796
+ - [ ] User says "skip verification" → session flag set
797
+ - [ ] Plan has `verify: false` → skipped
798
+ - [ ] Verification timeout → graceful degradation
799
+ - [ ] Update grid-executor.md to return explicit SUCCESS status
800
+ - [ ] Update grid-recognizer.md to handle wave-level summaries
801
+ - [ ] Add verification metrics to STATE.md (optional):
802
+ ```yaml
803
+ verification_stats:
+   waves_verified: 3
+   gaps_found: 1
+   gaps_closed: 1
+   verification_skipped: 0
808
+ ```
809
+
810
+ ---
811
+
812
+ ## Migration Path
813
+
814
+ This feature is **backward compatible**:
815
+
816
+ 1. **Existing behavior still works:** MC can still manually spawn Recognizer if needed
817
+ 2. **New projects get automatic verification:** Fresh `/grid` sessions use auto-verify
818
+ 3. **Old projects unaffected:** No changes to existing .grid/ state
819
+ 4. **Gradual rollout:** Ship to npm, users adopt on next `npm update`
820
+
821
+ No breaking changes. Pure enhancement.
822
+
823
+ ---
824
+
825
+ ## Success Metrics
826
+
827
+ After shipping, measure:
828
+
829
+ 1. **Gap detection rate:** % of waves where Recognizer finds gaps
830
+ - Hypothesis: Will increase initially (catching silent gaps), then decrease (quality improves)
831
+
832
+ 2. **Verification skip rate:** % of waves with `verify: false`
833
+ - Target: <5% (verification should be rare to skip)
834
+
835
+ 3. **User-initiated verification skips:** % of sessions where User says "skip verification"
836
+ - Target: <10% (should be exceptional, not common)
837
+
838
+ 4. **Time-to-verification:** Median time from Executor SUCCESS to Recognizer spawn
839
+ - Target: <2 seconds (nearly instant)
840
+
841
+ 5. **Gap closure success rate:** % of GAPS_FOUND that lead to successful closure plan execution
842
+ - Target: >80% (most gaps should be auto-fixable)
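The sketch below shows how metrics 1, 2, and 5 could be computed from per-wave records. The record shape (`verification`, `gaps_found`, `closure_succeeded`) is an assumption; nothing in the current STATE.md format stores these fields yet. Rates are percentages rounded to one decimal, and an empty denominator yields 0.0 rather than an error.

```python
from typing import Dict, List


def verification_metrics(waves: List[Dict]) -> Dict[str, float]:
    """Compute rates (0-100%) from per-wave records of the hypothetical shape
    {"verification": "RAN" | "SKIPPED", "gaps_found": bool, "closure_succeeded": bool}.
    """
    verified = [w for w in waves if w["verification"] == "RAN"]
    with_gaps = [w for w in verified if w["gaps_found"]]
    closed = [w for w in with_gaps if w.get("closure_succeeded")]

    def pct(part: int, whole: int) -> float:
        return round(100.0 * part / whole, 1) if whole else 0.0

    return {
        "gap_detection_rate": pct(len(with_gaps), len(verified)),
        "verification_skip_rate": pct(len(waves) - len(verified), len(waves)),
        "gap_closure_success_rate": pct(len(closed), len(with_gaps)),
    }
```

Gap detection is measured only over waves that were actually verified. Skipped waves would otherwise dilute the rate and hide the "increase, then decrease" trend the hypothesis predicts.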
843
+
844
+ ---
845
+
846
+ ## Future Enhancements (Out of Scope)
847
+
848
+ These are NOT part of this feature but could build on it later:
849
+
850
+ 1. **Predictive Verification:** Recognizer prioritizes checks based on past gap patterns
851
+ 2. **Partial Wave Verification:** Verify successes even if checkpoint pending (if independent)
852
+ 3. **Verification Metrics Dashboard:** `.grid/metrics.json` tracking verification health
853
+ 4. **Smart Verification Skipping:** Auto-skip verification for trivial changes (doc updates)
854
+ 5. **Verification Confidence Scores:** Recognizer returns 0-100% confidence in each check
855
+ 6. **Parallel Verification:** Spawn multiple Recognizers to verify different aspects simultaneously
856
+
857
+ ---
858
+
859
+ ## Conclusion
860
+
861
+ This feature transforms verification from a **manual afterthought** to an **automatic safety gate**. By making verification opt-out (not opt-in), we align with industry best practices and substantially reduce the risk of silent gaps reaching Users.
862
+
863
+ The execute-and-verify primitive becomes truly atomic: Executor → Recognizer happens automatically unless there's a specific reason not to (checkpoint, failure, explicit skip).
864
+
865
+ User experience improves: they see verification happening automatically, trust increases, and surprises decrease.
866
+
867
+ End of Line.