the-grid-cc 1.7.3 → 1.7.4
- package/.grid-drafts/01-plan-execute-pipeline.md +709 -0
- package/.grid-drafts/02-auto-verify-default.md +867 -0
- package/.grid-drafts/03-quick-mode-detection.md +589 -0
- package/.grid-drafts/04-scratchpad-enforcement.md +669 -0
- package/README.md +13 -1
- package/assets/terminal-v3.svg +112 -0
- package/commands/grid/VERSION +1 -1
- package/commands/grid/mc.md +166 -7
- package/package.json +1 -1
- package/assets/terminal.svg +0 -120
@@ -0,0 +1,867 @@
# Feature: Auto-Verify Default

## Research Summary

### Industry Patterns from CI/CD and Production Systems

From my research on verification patterns in modern CI/CD pipelines and production deployment systems, I identified several critical patterns:

**1. Verification as Default, Not Optional (Harness CV, 2025)**
- Modern CD platforms like Harness implement "Continuous Verification" as an automatic step that validates deployments using APM integration and ML-based anomaly detection
- Verification triggers automatic rollbacks if anomalies are found
- The pattern: verification is the default behavior after any deployment, not something teams must remember to add

**2. Smoke Tests as Immediate Post-Deployment Gates**
- Industry standard: smoke tests execute IMMEDIATELY after deployment completes
- Purpose: rapid validation that core functionality works before proceeding
- LaunchDarkly (2024): "Smoke testing confirms build stability before full testing begins"
- New Relic (2024): Synthetic monitors continuously verify production deployments automatically
- These are NOT opt-in; they're built into the deployment pipeline

**3. Fast Feedback Loops Over Manual Verification**
- Dev.to (2026): CI/CD pipelines prioritize "automated builds and tests" immediately after code commits
- The faster the feedback, the cheaper the fix
- Manual verification creates bottlenecks and is reserved for truly non-automatable scenarios

**4. Blocking vs Non-Blocking Verification**
- Production systems use both patterns depending on risk:
  - **Blocking:** Verification completes BEFORE next wave proceeds (deployment gates)
  - **Non-blocking:** Verification runs in parallel with next steps (canary deployments with monitoring)
- Grid's wave-based execution aligns with blocking pattern: verify Wave 1 before spawning Wave 2

**5. Verification Scope: Structural vs Runtime**
- CI/CD distinguishes between:
  - **Verification:** "Did we build the thing right?" (structural checks, unit tests)
  - **Validation:** "Does it work for users?" (integration tests, E2E)
- Recognizer's three-level artifact verification (Exist → Substantive → Wired) mirrors this structural verification pattern

**6. Opt-Out Not Opt-In**
- Modern frameworks make verification the default path
- Teams must explicitly skip verification (e.g., `--skip-tests`, `verify: false` flags)
- This creates psychological friction to skip safety checks, reducing incidents

**Key Takeaway:** The industry has converged on **automatic verification as the default behavior** after any code execution. Manual opt-in verification is a legacy pattern that creates risk.

---

## Current Protocol

### How It Works Now

From `mc.md` lines 310-363, the current protocol documents the "execute-and-verify primitive":

```python
## EXECUTE-AND-VERIFY PRIMITIVE

**Executor + Recognizer is the atomic unit.** Don't spawn Executor without planning to verify.

def execute_and_verify(plan_content, state_content, warmth=None):
    """Execute a plan and verify the result. Returns combined output."""

    # 1. Spawn Executor
    exec_result = Task(...)

    # 2. If checkpoint hit, return early (don't verify incomplete work)
    if "CHECKPOINT REACHED" in exec_result:
        return exec_result

    # 3. Read the SUMMARY for verification context
    summary = read(f".grid/phases/{block_dir}/{block}-SUMMARY.md")

    # 4. Spawn Recognizer
    verify_result = Task(...)

    return {
        "execution": exec_result,
        "verification": verify_result
    }
```

**Problem:** This is documented but NOT enforced. MC must manually remember to:
1. Check if Executor returned CHECKPOINT
2. Decide whether to verify
3. Spawn Recognizer if appropriate

This creates gaps:
- Verification can be forgotten under cognitive load
- Manual decision adds friction and delay
- The atomic "execute-and-verify" primitive isn't actually atomic in practice

### Current Spawning Pattern (lines 229-236)

```python
# Parallel execution - all three spawn simultaneously
Task(prompt="...", subagent_type="general-purpose", description="Execute plan 01")
Task(prompt="...", subagent_type="general-purpose", description="Execute plan 02")
Task(prompt="...", subagent_type="general-purpose", description="Execute plan 03")
```

Programs spawn in parallel, MC waits for all to complete, then manually decides next steps.
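
To make the "spawn in parallel, wait for all" step concrete, here is a minimal sketch using Python's standard thread pool. `run_executor` is a hypothetical stand-in for a `Task(...)` call, not part of Grid; the point is only that nothing in this pattern forces a verification step afterwards.

```python
from concurrent.futures import ThreadPoolExecutor


def run_executor(plan_id: str) -> str:
    """Hypothetical stand-in for a Task(...) call; returns a status string."""
    return f"{plan_id}: SUCCESS"


def spawn_wave(plan_ids: list[str]) -> list[str]:
    """Run every plan in the wave concurrently and wait for all of them."""
    with ThreadPoolExecutor(max_workers=max(1, len(plan_ids))) as pool:
        # pool.map preserves input order, so results line up with plan_ids.
        return list(pool.map(run_executor, plan_ids))


results = spawn_wave(["plan-01", "plan-02", "plan-03"])
# At this point the caller still has to remember to verify; that is the gap.
```
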

### Current Recognizer Usage (lines 757-774)

```markdown
## VERIFICATION (RECOGNIZER)

After execution completes, spawn Recognizer for goal-backward verification:

**Three-Level Artifact Check:**
1. **Existence** - Does the file exist?
2. **Substantive** - Is it real code (not stub)? Min lines, no TODO/FIXME
3. **Wired** - Is it connected to the system?

If Recognizer finds gaps, spawn Planner with `--gaps` flag to create closure plans.
```

Again, documented but not automatic. "After execution completes" is vague: WHEN exactly? Who remembers?

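
The three-level artifact check quoted above can be sketched roughly as follows. This is an illustration under stated assumptions, not the Recognizer's actual logic: the function name `check_artifact`, the five-line minimum, and the "another file mentions the module name" test for wiring are all hypothetical choices. The usage at the bottom builds a throwaway project in a temp directory and collects one verdict per file in `report`.

```python
import re
import tempfile
from pathlib import Path

STUB_MARKERS = re.compile(r"\b(TODO|FIXME)\b")


def check_artifact(path: Path, project_root: Path, min_lines: int = 5) -> str:
    """Return the first level the artifact fails, or "WIRED" if it passes all three."""
    # Level 1: existence
    if not path.is_file():
        return "MISSING"
    text = path.read_text(encoding="utf-8")
    # Level 2: substantive (enough non-blank lines, no stub markers)
    real_lines = [line for line in text.splitlines() if line.strip()]
    if len(real_lines) < min_lines or STUB_MARKERS.search(text):
        return "STUB"
    # Level 3: wired (assumption: some other file mentions the module name)
    for other in project_root.rglob("*.py"):
        if other != path and path.stem in other.read_text(encoding="utf-8"):
            return "WIRED"
    return "ORPHANED"


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    body = "\n".join(f"value_{i} = {i}" for i in range(6))
    (root / "auth.py").write_text(body, encoding="utf-8")
    (root / "main.py").write_text("import auth\n", encoding="utf-8")
    (root / "draft.py").write_text("# TODO: implement\n", encoding="utf-8")
    report = {
        name: check_artifact(root / name, root)
        for name in ("auth.py", "draft.py", "absent.py")
    }
```
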

---

## Proposed Changes

### 1. Make Verification Automatic by Default

**BEFORE (mc.md lines 310-363):**
```markdown
## EXECUTE-AND-VERIFY PRIMITIVE

**Executor + Recognizer is the atomic unit.** Don't spawn Executor without planning to verify.

def execute_and_verify(plan_content, state_content, warmth=None):
    """Execute a plan and verify the result. Returns combined output."""
    [current implementation]
```

**AFTER:**
````markdown
## EXECUTE-AND-VERIFY PRIMITIVE

**Verification is AUTOMATIC after successful execution.** The atomic unit is:
```
Executor → (if SUCCESS) → Recognizer → (if GAPS) → Planner --gaps
```

### Protocol

**1. Executor completes with status:**
- `SUCCESS` → Auto-spawn Recognizer (default path)
- `CHECKPOINT` → Return to MC, don't verify incomplete work
- `FAILURE` → Return to MC with structured failure report

**2. Recognizer spawns AUTOMATICALLY unless:**
- Executor returned CHECKPOINT (incomplete work, nothing to verify yet)
- Executor returned FAILURE (broken build, fix first)
- Plan frontmatter contains `verify: false` (rare override)
- User explicitly said "skip verification" in this session

**3. Verification timing:**
- **Wave-level verification:** After entire wave completes, verify all plans in that wave
- Recognizer receives ALL wave execution summaries for holistic goal verification
- This prevents redundant verification of interdependent plans

### Implementation Pattern

```python
def execute_wave(wave_plans, state_content, warmth=None):
    """Execute a wave and auto-verify results."""

    # 1. Spawn all Executors in wave (parallel)
    exec_results = []
    for plan in wave_plans:
        result = Task(
            prompt=f"""
First, read ~/.claude/agents/grid-executor.md for your role.

<state>{state_content}</state>
<plan>{plan.content}</plan>
{f'<warmth>{warmth}</warmth>' if warmth else ''}

Execute the plan. Include lessons_learned in your SUMMARY.
Return one of: SUCCESS | CHECKPOINT | FAILURE
""",
            subagent_type="general-purpose",
            model=get_model("executor"),
            description=f"Execute {plan.id}"
        )
        exec_results.append((plan, result))

    # 2. Analyze wave results
    checkpoints = [r for r in exec_results if "CHECKPOINT" in r[1]]
    failures = [r for r in exec_results if "EXECUTION FAILED" in r[1]]
    successes = [r for r in exec_results if "SUCCESS" in r[1]]

    # 3. Handle non-success states
    if checkpoints:
        return {"status": "CHECKPOINT", "details": checkpoints}
    if failures:
        return {"status": "FAILURE", "details": failures}

    # 4. Auto-verify successes (unless explicitly skipped)
    if should_skip_verification(wave_plans):
        return {"status": "SUCCESS", "verification": "SKIPPED"}

    # 5. Collect all summaries for wave
    summaries = []
    must_haves = []
    for plan, result in successes:
        summary = read(f".grid/phases/{plan.phase_dir}/{plan.block}-SUMMARY.md")
        summaries.append(summary)

        # Extract must-haves from plan frontmatter
        plan_must_haves = extract_must_haves(plan.content)
        must_haves.extend(plan_must_haves)

    # 6. Spawn Recognizer (AUTOMATIC)
    verify_result = Task(
        prompt=f"""
First, read ~/.claude/agents/grid-recognizer.md for your role.

PATROL MODE: Wave {wave_plans[0].wave} verification

<wave_summaries>
{''.join(summaries)}
</wave_summaries>

<must_haves>
{yaml.dump(must_haves)}
</must_haves>

Verify goal achievement for this wave. Check all artifacts against three levels:
1. Existence
2. Substantive (not stubs)
3. Wired (connected to system)

Return status: CLEAR | GAPS_FOUND | CRITICAL_ANOMALY
""",
        subagent_type="general-purpose",
        model=get_model("recognizer"),
        description=f"Verify wave {wave_plans[0].wave}"
    )

    # 7. Handle verification results
    if "GAPS_FOUND" in verify_result:
        # Auto-spawn Planner with --gaps flag
        gaps = extract_gaps_from_verification(verify_result)
        gap_closure_plan = spawn_planner_gaps(gaps, state_content)
        return {
            "status": "GAPS_FOUND",
            "verification": verify_result,
            "gap_closure": gap_closure_plan
        }

    return {
        "status": "VERIFIED",
        "verification": verify_result
    }


def should_skip_verification(wave_plans):
    """Check if verification should be skipped for this wave."""

    # Check each plan's frontmatter for verify: false
    for plan in wave_plans:
        frontmatter = extract_frontmatter(plan.content)
        if frontmatter.get("verify") is False:
            return True

    # Check session state for global skip flag
    if session_state.get("skip_verification"):
        return True

    return False  # Default: always verify
```

### Opt-Out Mechanism

Users can skip verification via:

**A. Plan-level override (in PLAN.md frontmatter):**
```yaml
---
phase: 01-foundation
plan: 02
wave: 1
verify: false  # Skip verification for this plan
verify_reason: "Prototype/throwaway code"
---
```

**B. Session-level override:**
```
User: "Skip verification for the rest of this session"
MC: "Verification disabled for this session. Will re-enable on next /grid invocation. End of Line."
```

**C. Wave-level override (rare):**
```python
# In MC during wave execution
if user_said_skip_verification:
    session_state["skip_verification"] = True
```
````
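
The plan-level opt-out depends on reading `verify: false` out of the plan's frontmatter. The draft names an `extract_frontmatter` helper without defining it, so the version below is a hypothetical, dependency-free sketch that handles only flat `key: value` pairs; a real implementation would more likely use a YAML parser. `plan_opts_out` and the `PLAN` sample are likewise illustrative.

```python
def extract_frontmatter(plan_content: str) -> dict[str, str]:
    """Parse flat `key: value` pairs between the leading `---` markers."""
    lines = plan_content.strip().splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        key, sep, value = line.partition(":")
        if sep:
            # Naive cleanup: drop a trailing comment and surrounding quotes.
            value = value.split("#", 1)[0].strip().strip('"')
            fields[key.strip()] = value
    return fields


def plan_opts_out(plan_content: str) -> bool:
    """True only when the frontmatter explicitly says `verify: false`."""
    return extract_frontmatter(plan_content).get("verify", "true").lower() == "false"


PLAN = """---
phase: 01-foundation
plan: 02
wave: 1
verify: false  # Skip verification for this plan
verify_reason: "Prototype/throwaway code"
---
# Plan body
"""

opted_out = plan_opts_out(PLAN)
```

Because this sketch keeps every value as a string, it compares against the text `"false"`; a YAML parser would instead yield the boolean `False`, which is what the `frontmatter.get("verify")` check in the draft assumes.
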

### 2. Update Wave Execution Documentation

**BEFORE (mc.md lines 238-248):**
```markdown
### Wave-Based Execution

Plans are assigned **wave numbers** during planning (not execution). Execute waves sequentially, plans within each wave in parallel:

WAVE 1: [plan-01, plan-02] → Spawn both in parallel
    ↓ (wait for completion)
WAVE 2: [plan-03] → Spawn after Wave 1
    ↓ (wait for completion)
WAVE 3: [plan-04, plan-05] → Spawn both in parallel
```

**AFTER:**
````markdown
### Wave-Based Execution with Auto-Verification

Plans are assigned **wave numbers** during planning. Execute waves sequentially, with automatic verification after each wave:

```
WAVE 1: [plan-01, plan-02]
  ├─ Spawn Executors (parallel)
  ├─ Wait for completion
  ├─ Auto-spawn Recognizer (wave-level verification)
  └─ If GAPS_FOUND → Spawn Planner --gaps
    ↓
WAVE 2: [plan-03]
  ├─ Spawn Executor
  ├─ Wait for completion
  ├─ Auto-spawn Recognizer
  └─ If CLEAR → Proceed
    ↓
WAVE 3: [plan-04, plan-05]
  ├─ Spawn Executors (parallel)
  ├─ Wait for completion
  └─ Auto-spawn Recognizer
```

**Verification Timing:** Wave-level, not plan-level. This prevents redundant checks on interdependent plans.

**Verification Skipped When:**
- Executor returned CHECKPOINT (incomplete work)
- Executor returned FAILURE (broken state)
- Plan frontmatter has `verify: false`
- User said "skip verification"
````
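
As a rough sketch of the flow above, plans can be grouped by their wave number and each wave executed and verified before the next one starts. The `Plan` dataclass and the `execute` / `verify` callables are hypothetical placeholders for the Executor and Recognizer spawns; the sketch stops at the first wave whose verification is not `CLEAR`.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable


@dataclass
class Plan:
    plan_id: str
    wave: int


def run_waves(
    plans: list[Plan],
    execute: Callable[[list[str]], None],
    verify: Callable[[int], str],
) -> list[tuple[int, str]]:
    """Execute waves in ascending order, verifying each before the next starts."""
    waves: dict[int, list[str]] = defaultdict(list)
    for plan in plans:
        waves[plan.wave].append(plan.plan_id)

    log: list[tuple[int, str]] = []
    for number in sorted(waves):
        execute(waves[number])   # all plans of this wave (parallel in Grid)
        status = verify(number)  # wave-level verification, always run
        log.append((number, status))
        if status != "CLEAR":
            break                # blocking gate: later waves do not start
    return log


executed: list[list[str]] = []
plans = [Plan("plan-03", 2), Plan("plan-01", 1), Plan("plan-02", 1), Plan("plan-04", 3)]
log = run_waves(
    plans,
    execute=executed.append,
    verify=lambda wave: "GAPS_FOUND" if wave == 2 else "CLEAR",
)
```

In this run Wave 3 never starts, because Wave 2 reports gaps; that is the blocking behavior the diagram describes.
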

### 3. Update Rules Section

**BEFORE (mc.md line 892):**
```markdown
8. **Execute and verify** - Executor + Recognizer is atomic
```

**AFTER:**
```markdown
8. **Auto-verify by default** - Recognizer spawns automatically after successful execution (opt-out not opt-in)
```

### 4. Update Progress Updates Format

**BEFORE (mc.md lines 724-740):**
```markdown
## PROGRESS UPDATES

Never leave User in darkness. Show what's happening:

Spawning Executor Programs...
├─ Wave 1: plan-01, plan-02 (parallel)
│  ├─ plan-01: Creating components...
│  └─ plan-02: Writing API routes...
├─ Wave 1 complete
├─ Wave 2: plan-03
│  └─ plan-03: Integrating auth...
└─ All waves complete
```

**AFTER:**
````markdown
## PROGRESS UPDATES

Never leave User in darkness. Show what's happening (including automatic verification):

```
Executing Wave 1...
├─ Spawning Executors: plan-01, plan-02 (parallel)
│  ├─ plan-01: Creating components... ✓
│  └─ plan-02: Writing API routes... ✓
├─ Executors complete
├─ Auto-spawning Recognizer...
│  └─ Verifying artifacts and goal achievement... ✓ CLEAR
└─ Wave 1 verified

Executing Wave 2...
├─ Spawning Executor: plan-03
│  └─ plan-03: Integrating auth... ✓
├─ Auto-spawning Recognizer...
│  └─ Verifying artifacts... ⚠ GAPS_FOUND
├─ Spawning Planner for gap closure...
│  └─ Creating closure plan... ✓
└─ Wave 2 needs fixes (gap closure plan ready)
```

The "Auto-spawning Recognizer" line shows it's automatic, not manual.
````

### 5. Update Quick Reference

**BEFORE (mc.md line 914):**
```markdown
Checkpoints: Present via I/O Tower, spawn fresh with warmth
```

**AFTER:**
```markdown
Checkpoints: Present via I/O Tower, spawn fresh with warmth
Verification: Automatic after SUCCESS (wave-level, opt-out via verify: false)
```

---

## Rationale

### Why This Is Better

**1. Reduced Cognitive Load**
- MC no longer needs to remember to verify
- The decision tree collapses: SUCCESS → verify (always)
- Mental overhead shifts from "should I verify?" to "is this a rare case where I skip?"

**2. Aligns with Industry Standards**
- Modern CI/CD pipelines don't ask "should we run tests?"; they just do
- Verification gates are the default in production deployment systems
- Grid moves from legacy "manual QA" pattern to modern "continuous verification"

**3. Prevents Silent Gaps**
- Current risk: MC forgets to verify under time pressure or complexity
- New behavior: Gaps are caught automatically before User sees "BUILD COMPLETE"
- Shift-left principle: catch issues immediately after creation

**4. Psychological Forcing Function**
- Opt-out (not opt-in) creates friction to skip verification
- Teams must explicitly justify skipping with `verify: false` in frontmatter
- This mirrors production safety patterns (e.g., required PR reviews)

**5. Better User Experience**
- User sees verification happening in progress updates
- Trust increases: "Grid verified this automatically"
- Reduced surprises: fewer "wait, this doesn't work" moments post-delivery

**6. Wave-Level Verification Reduces Redundancy**
- Verifying plan-01 then plan-02 separately is wasteful when they're interdependent
- Wave-level verification checks the COMBINED result of parallel work
- Recognizer sees the full picture, not partial snapshots

**7. Enables Automatic Gap Closure**
- Current: MC sees verification results, manually decides to spawn Planner
- New: Verification → GAPS_FOUND → auto-spawn Planner --gaps
- Complete automation of the "build → verify → fix gaps" cycle

**8. Preserves Escape Hatches**
- Not draconian: three ways to opt out (plan, session, wave)
- Checkpoints and failures naturally skip verification (smart defaults)
- Power users can disable for rapid prototyping

---

## Edge Cases Considered

### Edge Case 1: Executor Returns CHECKPOINT

**Scenario:** Executor hits a checkpoint mid-wave (e.g., "verify login flow manually").

**Handling:**
```python
if "CHECKPOINT REACHED" in exec_result:
    # DON'T verify: work is incomplete
    return {"status": "CHECKPOINT", "details": checkpoint_data}
```

**Why:** Checkpoints indicate incomplete work. Verifying incomplete work produces spurious gap reports (gaps that exist only because the work isn't finished). Wait for User to resolve the checkpoint, then verify on continuation.

**User Experience:**
```
Wave 1 Execution...
├─ plan-01: ✓ SUCCESS
├─ plan-02: ⏸ CHECKPOINT (needs User action)
└─ Verification skipped (checkpoint pending)

[MC presents checkpoint to User]
```

### Edge Case 2: Executor Returns FAILURE

**Scenario:** Executor can't complete due to error (broken build, missing dependency).

**Handling:**
```python
if "EXECUTION FAILED" in exec_result:
    # DON'T verify: nothing meaningful to verify
    return {"status": "FAILURE", "details": failure_report}
```

**Why:** Verification assumes there's work to verify. A failed execution produces no artifacts to check. Spawn retry with failure context instead.

**User Experience:**
```
Wave 1 Execution...
├─ plan-01: ✓ SUCCESS
├─ plan-02: ✗ FAILURE (missing prisma client)
└─ Verification skipped (fix failures first)

Spawning retry for plan-02 with failure context...
```

### Edge Case 3: Multiple Plans, Mixed Results

**Scenario:** Wave has 3 plans. Two succeed, one checkpoints.

**Handling:**
```python
# Prioritize most blocking state
if any_checkpoints:
    return CHECKPOINT  # Block on checkpoint first
elif any_failures:
    return FAILURE  # Fix failures next
else:
    verify(successes)  # Only verify if all succeeded
```

**Why:** Verification should see COMPLETE wave results. If wave is partial, wait until checkpoint resolves.

**User Experience:**
```
Wave 1: 3 plans
├─ plan-01: ✓ SUCCESS
├─ plan-02: ✓ SUCCESS
├─ plan-03: ⏸ CHECKPOINT
└─ Verification deferred until checkpoint resolves

[User resolves checkpoint]

Resuming Wave 1...
├─ plan-03: ✓ SUCCESS
├─ Auto-spawning Recognizer...
│  └─ Verifying all 3 plans... ✓ CLEAR
└─ Wave 1 verified
```
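
The prioritization in Edge Case 3 can be written as a small runnable function. This is a sketch only: `resolve_wave_status` and the `"VERIFY"` return value are hypothetical names, and per-plan statuses are assumed to be already normalized to `SUCCESS`, `CHECKPOINT`, or `FAILURE`.

```python
# Most blocking state first, as in the pseudocode above.
BLOCKING_ORDER = ("CHECKPOINT", "FAILURE")


def resolve_wave_status(plan_statuses: dict[str, str]) -> str:
    """Collapse per-plan statuses into one wave-level decision."""
    seen = set(plan_statuses.values())
    for status in BLOCKING_ORDER:
        if status in seen:
            return status
    # Every plan succeeded, so the whole wave can go to the Recognizer.
    return "VERIFY"


mixed = resolve_wave_status(
    {"plan-01": "SUCCESS", "plan-02": "FAILURE", "plan-03": "CHECKPOINT"}
)
clean = resolve_wave_status({"plan-01": "SUCCESS", "plan-02": "SUCCESS"})
```

Here `mixed` resolves to the checkpoint even though a failure is also present, which matches the "block on checkpoint first" ordering.
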

### Edge Case 4: Verification Itself Fails

**Scenario:** Recognizer crashes or times out.

**Handling:**
```python
try:
    verify_result = Task(...)
except Exception as e:
    log_error(f"Recognizer failed: {e}")
    return {
        "status": "VERIFICATION_FAILED",
        "error": str(e),
        "recommendation": "Manual verification needed"
    }
```

**Why:** Don't block progress on verification tooling failure. Surface to User as anomaly.

**User Experience:**
```
Wave 1: Execution complete
├─ Auto-spawning Recognizer... ✗ FAILED (timeout)
└─ Verification tool error (manual check recommended)

MC: Recognizer encountered an error. Execution completed but verification failed.
Manual inspection recommended before proceeding. End of Line.
```

### Edge Case 5: Verification Finds Gaps, Planner Fails

**Scenario:** Recognizer finds gaps → spawns Planner --gaps → Planner fails.

**Handling:**
```python
if "GAPS_FOUND" in verify_result:
    try:
        gap_closure = spawn_planner_gaps(...)
        # Same "gap_closure" key that execute_wave() returns
        return {"status": "GAPS_FOUND", "gap_closure": gap_closure}
    except Exception as e:
        return {
            "status": "GAPS_FOUND",
            "gap_closure": None,
            "error": "Planner failed, manual gap closure needed"
        }
```

**Why:** Gaps are still real even if automated closure planning fails. Surface gaps to User.

**User Experience:**
```
Wave 1: Verification complete
├─ Status: GAPS_FOUND
│  └─ Missing: Auth token validation
├─ Spawning Planner for gap closure... ✗ FAILED
└─ Gaps identified but automated closure failed

MC: Recognizer found gaps. Automatic closure planning failed.
See VERIFICATION.md for details. Manual fix needed. End of Line.
```

### Edge Case 6: Parallel Waves Completing Out of Order

**Scenario:** Due to Task() batching, Wave 2 might complete before Wave 1 verification.

**Handling:**
```python
# Waves execute SEQUENTIALLY (per current protocol)
# Wave 2 doesn't spawn until Wave 1 is fully verified

def execute_all_waves(waves):
    for wave in waves:
        result = execute_wave(wave)  # Includes auto-verification

        if result["status"] == "CHECKPOINT":
            return result  # Block and return to User
        elif result["status"] == "FAILURE":
            retry_or_escalate()
        elif result["status"] == "GAPS_FOUND":
            # Key matches the dict returned by execute_wave()
            execute_gap_closure(result["gap_closure"])
        # Only proceed to next wave if verified
```

**Why:** Wave-based execution is ALREADY sequential (mc.md line 245). Verification is just the last step of each wave.

**User Experience:**
```
Wave 1: Execute → Verify ✓
    ↓
Wave 2: Execute → Verify ✓
    ↓
Wave 3: Execute → Verify ✓
```

No change from current behavior: verification simply becomes the automatic final step.

### Edge Case 7: User Requests Mid-Session Verification Skip

**Scenario:** User says "just skip verification for now, I'll check later."

**Handling:**
```python
# Set session flag
session_state["skip_verification"] = True

# Inform User
print("Verification disabled for this session.")
print("Will re-enable automatically on next /grid invocation.")
print("End of Line.")
```

**Why:** Respect User agency. Power users prototyping may want speed over safety temporarily.

**User Experience:**
```
User: "Skip verification for now, I'm just prototyping"

MC: Verification disabled for this session.
Will re-enable automatically on next /grid invocation.
End of Line.

[All subsequent waves skip verification]

User: /clear
User: /grid
User: "Build X"

MC: [Verification automatically re-enabled: fresh session]
```

### Edge Case 8: Verification Takes Too Long

**Scenario:** Large codebase, Recognizer takes 5+ minutes to verify.

**Handling:**
```python
# Add timeout to verification Task
verify_result = Task(
    prompt="...",
    timeout=300000,  # 5 minutes (in milliseconds)
    ...
)

if verify_result == TIMEOUT:
    return {
        "status": "VERIFICATION_TIMEOUT",
        "recommendation": "Manual verification or increase timeout"
    }
```

**Why:** Don't block progress indefinitely. Surface timeout and let User decide.

**User Experience:**
```
Wave 1: Execution complete
├─ Auto-spawning Recognizer...
│  └─ Verifying... (large codebase, this may take a few minutes)
│  └─ Timeout after 5 minutes
└─ Verification incomplete (manual check recommended)

MC: Verification timed out. Execution completed successfully.
Recommend manual inspection of key artifacts. End of Line.
```
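
Edge Cases 4 and 8 both come down to "wrap the Recognizer call, and turn a crash or a timeout into a structured result instead of blocking". The sketch below shows that shape with Python's standard `concurrent.futures`; it is an illustration, not how `Task()` behaves. `verify_with_timeout`, the `recognizer` callable, and the `"COMPLETED"` status are hypothetical names.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout
from typing import Callable


def verify_with_timeout(recognizer: Callable[[], str], timeout_s: float) -> dict:
    """Run the recognizer, mapping timeouts and crashes to structured results."""
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(recognizer)
    try:
        return {"status": "COMPLETED", "verification": future.result(timeout=timeout_s)}
    except FutureTimeout:
        return {
            "status": "VERIFICATION_TIMEOUT",
            "recommendation": "Manual verification or increase timeout",
        }
    except Exception as exc:
        return {
            "status": "VERIFICATION_FAILED",
            "error": str(exc),
            "recommendation": "Manual verification needed",
        }
    finally:
        # Do not wait for a still-running worker; a thread cannot be killed,
        # so it simply finishes in the background.
        pool.shutdown(wait=False)


def slow_recognizer() -> str:
    time.sleep(0.2)  # stands in for a long verification run
    return "CLEAR"


outcome = verify_with_timeout(slow_recognizer, timeout_s=0.05)
```

Either failure status leaves the execution result intact, so MC can surface the problem to the User and let them decide, as both edge cases recommend.
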

### Edge Case 9: Verification Finds CRITICAL_ANOMALY

**Scenario:** Recognizer can't determine goal achievement programmatically (needs human verification).

**Handling:**
```python
if verify_result["status"] == "CRITICAL_ANOMALY":
    # Present to User via I/O Tower
    return {
        "status": "HUMAN_VERIFICATION_NEEDED",
        "details": verify_result["human_verification_items"]
    }
```

**Why:** Some things (visual, UX, external integrations) need human eyes. Don't block, surface.

**User Experience:**
```
Wave 1: Execution complete
├─ Auto-spawning Recognizer... ✓
└─ Status: HUMAN_VERIFICATION_NEEDED

Human Verification Required:
1. Check login UI renders correctly (screenshot at .grid/refinement/screenshots/login.png)
2. Test email delivery works (external service)

MC: Automated checks passed. Manual verification needed for items above.
Confirm when ready to proceed. End of Line.

User: "Looks good"
MC: Proceeding to Wave 2. End of Line.
```
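
The snippets in this draft read the Recognizer's verdict in two different ways (substring checks such as `"GAPS_FOUND" in verify_result`, and dictionary access such as `verify_result["status"]`). One way to reconcile them is to parse the status once into a fixed vocabulary. The sketch below assumes, hypothetically, that the Recognizer's text output contains a line of the form `Status: <TOKEN>`; the function name and the `UNKNOWN` fallback are illustrative as well.

```python
import re

RECOGNIZER_STATUSES = ("CLEAR", "GAPS_FOUND", "CRITICAL_ANOMALY")

# Matches a whole line such as "Status: GAPS_FOUND" (case-insensitive).
STATUS_LINE = re.compile(r"^\s*status:\s*([A-Z_]+)\s*$", re.IGNORECASE | re.MULTILINE)


def parse_recognizer_status(output: str) -> str:
    """Return one known status token, or "UNKNOWN" if none is stated clearly."""
    match = STATUS_LINE.search(output)
    if match and match.group(1).upper() in RECOGNIZER_STATUSES:
        return match.group(1).upper()
    return "UNKNOWN"


# What the protocol in this draft does next for each status.
NEXT_STEP = {
    "CLEAR": "proceed to next wave",
    "GAPS_FOUND": "spawn Planner --gaps",
    "CRITICAL_ANOMALY": "present to User via I/O Tower",
    "UNKNOWN": "treat as verification failure",
}

status = parse_recognizer_status("Wave 1 report\nStatus: GAPS_FOUND\n")
action = NEXT_STEP[status]
```

Matching a dedicated status line avoids false hits from prose that merely mentions a token, which a plain substring check cannot rule out.
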
|
|
750
|
+
|
|
751
|
+
### Edge Case 10: Opt-Out via Frontmatter But Verification Needed Anyway
|
|
752
|
+
|
|
753
|
+
**Scenario:** User sets `verify: false` but User later says "wait, verify that."
|
|
754
|
+
|
|
755
|
+
**Handling:**
|
|
756
|
+
```python
|
|
757
|
+
# Respect explicit User command over frontmatter
|
|
758
|
+
if user_says_verify_now:
|
|
759
|
+
    spawn_recognizer(...)  # Override frontmatter setting
|
|
760
|
+
    print("Verification override: Spawning Recognizer despite verify: false in plan.")
|
|
761
|
+
```
|
|
762
|
+
|
|
763
|
+
**Why:** User intent in conversation overrides static config. Be flexible.
|
|
764
|
+
|
|
765
|
+
**User Experience:**
|
|
766
|
+
```
|
|
767
|
+
[Wave completes with verify: false in plan]
|
|
768
|
+
|
|
769
|
+
MC: Wave 1 complete. Verification skipped (verify: false in plan). End of Line.
|
|
770
|
+
|
|
771
|
+
User: "Actually, verify that wave"
|
|
772
|
+
|
|
773
|
+
MC: Verification override: Spawning Recognizer despite verify: false in plan.
|
|
774
|
+
[Recognizer runs...]
|
|
775
|
+
End of Line.
|
|
776
|
+
```
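
The precedence described in this edge case (conversation beats static config) can be written down as one small decision function. This is a minimal sketch under assumptions: `resolve_verification`, its parameters, and the reason strings are hypothetical, and the position of the session-level skip flag in the ordering is a guess, since the draft only states that an explicit User request overrides frontmatter.

```python
def resolve_verification(plan_frontmatter, session_skip=False, user_says_verify_now=False):
    """Return (should_verify, reason) for a completed wave.

    Assumed precedence, highest first:
      1. Explicit User request in conversation ("verify that")
      2. Session-level skip flag (User said "skip verification")
      3. Plan frontmatter `verify: false`
      4. Default: verify
    """
    if user_says_verify_now:
        return True, "user override"
    if session_skip:
        return False, "session skip flag"
    if plan_frontmatter.get("verify", True) is False:
        return False, "verify: false in plan"
    return True, "default"
```

With `{"verify": False}` and `user_says_verify_now=True`, the function returns `(True, "user override")`, matching the transcript above.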
|
|
777
|
+
|
|
778
|
+
---
|
|
779
|
+
|
|
780
|
+
## Implementation Checklist
|
|
781
|
+
|
|
782
|
+
Before merging this feature into production mc.md:
|
|
783
|
+
|
|
784
|
+
- [ ] Update EXECUTE-AND-VERIFY PRIMITIVE section with new protocol
|
|
785
|
+
- [ ] Update Wave-Based Execution section with auto-verification flow
|
|
786
|
+
- [ ] Add `should_skip_verification()` helper function to Quick Reference
|
|
787
|
+
- [ ] Update RULES section (rule #8)
|
|
788
|
+
- [ ] Update PROGRESS UPDATES with verification output
|
|
789
|
+
- [ ] Add verification opt-out patterns to documentation
|
|
790
|
+
- [ ] Update Quick Reference with verification timing note
|
|
791
|
+
- [ ] Test edge cases:
|
|
792
|
+
  - [ ] Executor returns CHECKPOINT → verify skipped
|
|
793
|
+
  - [ ] Executor returns FAILURE → verify skipped
|
|
794
|
+
  - [ ] Mixed wave results → correct prioritization
|
|
795
|
+
  - [ ] Verification finds gaps → Planner spawns
|
|
796
|
+
  - [ ] User says "skip verification" → session flag set
|
|
797
|
+
  - [ ] Plan has `verify: false` → skipped
|
|
798
|
+
  - [ ] Verification timeout → graceful degradation
|
|
799
|
+
- [ ] Update grid-executor.md to return explicit SUCCESS status
|
|
800
|
+
- [ ] Update grid-recognizer.md to handle wave-level summaries
|
|
801
|
+
- [ ] Add verification metrics to STATE.md (optional):
|
|
802
|
+
```yaml
|
|
803
|
+
verification_stats:
|
|
804
|
+
  waves_verified: 3
|
|
805
|
+
  gaps_found: 1
|
|
806
|
+
  gaps_closed: 1
|
|
807
|
+
  verification_skipped: 0
|
|
808
|
+
```
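
The executor-status cases in the checklist above (CHECKPOINT, FAILURE, mixed results) could be exercised against a small gating helper like the one below. It is a sketch, not the draft's `should_skip_verification()`: the name `wave_allows_verification` and the reason strings are hypothetical, and the order in which FAILURE and CHECKPOINT are reported is an assumption that does not change the verify/skip outcome.

```python
def wave_allows_verification(executor_statuses):
    """Return (should_verify, reason) from the statuses of one wave's Executors.

    Verification runs only when every Executor reported SUCCESS; a wave
    containing any CHECKPOINT or FAILURE is skipped as a whole.
    """
    statuses = list(executor_statuses)
    if not statuses:
        return False, "no executor results"
    if "FAILURE" in statuses:
        return False, "executor failure"
    if "CHECKPOINT" in statuses:
        return False, "checkpoint pending"
    if all(status == "SUCCESS" for status in statuses):
        return True, "all executors succeeded"
    return False, "unrecognized status"
```

A mixed wave such as `["SUCCESS", "CHECKPOINT"]` is skipped here, which is consistent with partial wave verification being listed as out of scope for this feature.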
|
|
809
|
+
|
|
810
|
+
---
|
|
811
|
+
|
|
812
|
+
## Migration Path
|
|
813
|
+
|
|
814
|
+
This feature is **backward compatible**:
|
|
815
|
+
|
|
816
|
+
1. **Existing behavior still works:** MC can still manually spawn Recognizer if needed
|
|
817
|
+
2. **New projects get automatic verification:** Fresh `/grid` sessions use auto-verify
|
|
818
|
+
3. **Old projects unaffected:** No changes to existing .grid/ state
|
|
819
|
+
4. **Gradual rollout:** Ship to npm, users adopt on next `npm update`
|
|
820
|
+
|
|
821
|
+
No breaking changes. Pure enhancement.
|
|
822
|
+
|
|
823
|
+
---
|
|
824
|
+
|
|
825
|
+
## Success Metrics
|
|
826
|
+
|
|
827
|
+
After shipping, measure:
|
|
828
|
+
|
|
829
|
+
1. **Gap detection rate:** % of waves where Recognizer finds gaps
|
|
830
|
+
- Hypothesis: Will increase initially (catching silent gaps), then decrease (quality improves)
|
|
831
|
+
|
|
832
|
+
2. **Verification skip rate:** % of waves with `verify: false`
|
|
833
|
+
- Target: <5% (skipping verification should be rare)
|
|
834
|
+
|
|
835
|
+
3. **User-initiated verification skips:** % of sessions where User says "skip verification"
|
|
836
|
+
- Target: <10% (should be exceptional, not common)
|
|
837
|
+
|
|
838
|
+
4. **Time-to-verification:** Median time from Executor SUCCESS to Recognizer spawn
|
|
839
|
+
- Target: <2 seconds (nearly instant)
|
|
840
|
+
|
|
841
|
+
5. **Gap closure success rate:** % of GAPS_FOUND results that lead to a successfully executed closure plan
|
|
842
|
+
- Target: >80% (most gaps should be auto-fixable)
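
The metrics above can be computed from per-wave records. The following is a rough sketch under assumptions: the record keys (`verified`, `gaps_found`, `gap_closed`, `spawn_delay_s`) and the function `summarize_verification` are hypothetical, and the gap detection rate uses verified waves as its denominator, which the draft does not specify. The user-initiated skip metric is per session and is left out here.

```python
from statistics import median


def _pct(numerator, denominator):
    # Percentage that avoids division by zero on empty inputs.
    return 100.0 * numerator / denominator if denominator else 0.0


def summarize_verification(waves):
    """Summarize verification health from a list of per-wave dict records."""
    verified = [w for w in waves if w["verified"]]
    with_gaps = [w for w in verified if w["gaps_found"]]
    closed = [w for w in with_gaps if w["gap_closed"]]
    delays = [w["spawn_delay_s"] for w in verified if w["spawn_delay_s"] is not None]
    return {
        "gap_detection_rate_pct": _pct(len(with_gaps), len(verified)),
        "verification_skip_rate_pct": _pct(len(waves) - len(verified), len(waves)),
        "gap_closure_rate_pct": _pct(len(closed), len(with_gaps)),
        "median_spawn_delay_s": median(delays) if delays else None,
    }


sample_waves = [
    {"verified": True, "gaps_found": True, "gap_closed": True, "spawn_delay_s": 1.0},
    {"verified": True, "gaps_found": False, "gap_closed": False, "spawn_delay_s": 2.0},
    {"verified": True, "gaps_found": False, "gap_closed": False, "spawn_delay_s": 3.0},
    {"verified": False, "gaps_found": False, "gap_closed": False, "spawn_delay_s": None},
]
summary = summarize_verification(sample_waves)
```

For the sample data, one of four waves was skipped, one of three verified waves had gaps, and that gap was closed, so the skip rate is 25%, the closure rate is 100%, and the median spawn delay is 2.0 seconds.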
|
|
843
|
+
|
|
844
|
+
---
|
|
845
|
+
|
|
846
|
+
## Future Enhancements (Out of Scope)
|
|
847
|
+
|
|
848
|
+
These are NOT part of this feature but could build on it later:
|
|
849
|
+
|
|
850
|
+
1. **Predictive Verification:** Recognizer prioritizes checks based on past gap patterns
|
|
851
|
+
2. **Partial Wave Verification:** Verify successes even if checkpoint pending (if independent)
|
|
852
|
+
3. **Verification Metrics Dashboard:** `.grid/metrics.json` tracking verification health
|
|
853
|
+
4. **Smart Verification Skipping:** Auto-skip verification for trivial changes (doc updates)
|
|
854
|
+
5. **Verification Confidence Scores:** Recognizer returns 0-100% confidence in each check
|
|
855
|
+
6. **Parallel Verification:** Spawn multiple Recognizers to verify different aspects simultaneously
|
|
856
|
+
|
|
857
|
+
---
|
|
858
|
+
|
|
859
|
+
## Conclusion
|
|
860
|
+
|
|
861
|
+
This feature transforms verification from a **manual afterthought** into an **automatic safety gate**. By making verification opt-out rather than opt-in, we align with the CI/CD practices surveyed in the Research Summary and reduce the risk of silent gaps reaching Users.
|
|
862
|
+
|
|
863
|
+
The execute-and-verify primitive becomes truly atomic: Executor → Recognizer happens automatically unless there's a specific reason not to (checkpoint, failure, explicit skip).
|
|
864
|
+
|
|
865
|
+
The User experience improves as well: Users see verification run automatically, which builds trust and reduces surprises.
|
|
866
|
+
|
|
867
|
+
End of Line.
|