@ai-dev-methodologies/rlp-desk 0.2.3 → 0.3.0

@@ -45,12 +45,34 @@ Read these files in order:
  - No file creation or modification outside the project root.
  - Do not modify this prompt file or any PRD/test-spec files.
 
+ ## Test-First Approach (read test-spec BEFORE coding)
+ 1. Read test-spec "Impacted Tests" — if TODO (first iteration), skip to step 2 and fill this section during your work. Otherwise, run these FIRST to confirm they pass before your changes.
+ 2. Read test-spec "Required New Tests" — write these. They SHOULD FAIL initially.
+ 3. Implement minimum code to make all tests pass.
+ 4. Run ALL tests (impacted + new) to confirm nothing is broken.
+
+ ## Forbidden Shortcuts (Verifier will check these)
+ - Do not mock external services when L2 integration test is required by test-spec.
+ - Do not delete or weaken existing assertions to make tests pass.
+ - Do not add test-specific logic (code that detects it is running in a test).
+ - Do not skip boundary cases listed in the PRD.
+ - Do not claim "code inspection" as verification — run the actual command.
+ - Do not say "too simple to test" — simple code breaks. A test takes 30 seconds.
+ - Do not say "I'll test after" — tests written after the code pass immediately, which proves nothing.
+ - Do not say "already manually tested" — ad-hoc checks are not systematic and leave no record.
+ - Do not say "partial check is enough" — partial proves nothing about the whole.
+ - Do not say "I'm confident" — confidence is not evidence.
+ - Do not say "existing code has no tests" — you are improving it, add tests.
+ - Do not write code before tests — if you did, delete it and start with tests.
+
  ## Iteration rules
  - Use fresh context only; do NOT depend on prior chat history.
  - Execute exactly the work specified in the Next Iteration Contract.
  - Refresh context file with the current frontier.
  - Rewrite campaign memory in full.
  - Write evidence artifacts.
+ - **After writing tests, update test-spec Criteria Mapping with actual test file paths and function names** (replace placeholder -k filters).
+ - Ensure **each AC has >= 3 tests** (happy + negative + boundary). Do not just meet the total count — distribute evenly per AC.
  - **Commit all changes when the iteration is complete** (include iteration number and story ID in commit message).
 
  MANDATORY: When done with this iteration, write the following signal file:
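The Test-First Approach in this hunk expects newly written tests to fail first (RED) and to pass only after implementation (GREEN). A minimal bash sketch of that check follows. `expect_phase` is a hypothetical helper, not something rlp-desk ships, and it looks only at the exit code, so it cannot tell a failing assertion apart from, say, a missing test runner.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of rlp-desk): run a test command and check
# that its exit code matches the expected Test-First phase.
#   red   -> the command must fail (non-zero exit), e.g. before implementing
#   green -> the command must succeed (exit 0), e.g. after implementing
expect_phase() {
  local phase="$1"; shift
  local code=0
  "$@" >/dev/null 2>&1 || code=$?   # capture the exit code without tripping `set -e`
  case "$phase" in
    red)   [[ $code -ne 0 ]] ;;
    green) [[ $code -eq 0 ]] ;;
    *)     echo "unknown phase: $phase" >&2; return 2 ;;
  esac
}

# Usage (commands are illustrative):
#   expect_phase red   pytest tests/test_add.py || echo "new test did not fail first"
#   expect_phase green pytest tests/            || echo "suite is not green"
```

A test that already passes in the `red` phase is exactly the "tests passing immediately prove nothing" case from the Forbidden Shortcuts list.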
@@ -68,6 +90,25 @@ MANDATORY: When done with this iteration, write the following signal file:
  - Do NOT signal "continue" when a US is done — always signal "verify" per US.
  - Signal "continue" ONLY when you have more work to do within the same US (e.g., a multi-step task).
 
+ ## Done Claim Format
+ When writing done-claim JSON, ALWAYS include execution_steps — what you did, in what order, with evidence:
+ \`\`\`json
+ {
+ "us_id": "US-NNN",
+ "claims": ["AC1: ...", "AC2: ..."],
+ "execution_steps": [
+ {"step": "write_test", "ac_id": "AC1", "command": null, "exit_code": null, "summary": "wrote tests/test_add.py with 3 tests"},
+ {"step": "verify_red", "ac_id": "AC1", "command": "pytest tests/...", "exit_code": 1, "summary": "RED: test fails as expected"},
+ {"step": "implement", "ac_id": "AC1", "command": null, "exit_code": null, "summary": "created add() function"},
+ {"step": "verify_green", "ac_id": "AC1", "command": "pytest tests/...", "exit_code": 0, "summary": "GREEN: 3 passed"},
+ {"step": "verify_e2e", "ac_id": "AC1", "command": "python -c '...'", "exit_code": 0, "summary": "E2E output matches expected"},
+ {"step": "commit", "ac_id": "AC1", "command": "git commit ...", "exit_code": 0, "summary": "committed abc1234"}
+ ]
+ }
+ \`\`\`
+ This is NOT optional. Every done-claim must include the steps you took and the evidence for each.
+ execution_steps MUST be a JSON array of objects (not a dict with string keys). Each object MUST have: "step", "ac_id", "command", "exit_code", "summary".
+
  ## Stop behavior
  - Single US achieved → write done-claim JSON to $DESK/memos/$SLUG-done-claim.json with the specific US, signal verify, exit
  - All US achieved → write done-claim JSON with all US, signal verify with us_id "ALL", exit
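The Done Claim Format requires every `execution_steps` entry to be an object carrying all five keys, with `null` where no command was run. One way to keep that invariant when a claim is assembled from a shell script is sketched below. `add_step` and `steps_json` are hypothetical names, and the sketch performs no JSON escaping: it assumes values contain no double quotes or backslashes, so a real implementation should use a JSON tool such as `jq`.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: build the execution_steps array so that every object
# carries "step", "ac_id", "command", "exit_code" and "summary".
# Assumption: values contain no double quotes or backslashes (no escaping here).

steps=()

# add_step STEP AC_ID COMMAND EXIT_CODE SUMMARY
# Pass "" for COMMAND and EXIT_CODE when no command was run; both become null.
add_step() {
  local cmd='null' code='null'
  [[ -n "$3" ]] && cmd="\"$3\""
  [[ -n "$4" ]] && code="$4"
  steps+=("{\"step\": \"$1\", \"ac_id\": \"$2\", \"command\": $cmd, \"exit_code\": $code, \"summary\": \"$5\"}")
}

# Join the collected objects with commas into a single JSON array.
steps_json() {
  local IFS=','
  printf '[%s]\n' "${steps[*]}"
}

# Usage (values are illustrative):
#   add_step write_test AC1 "" "" "wrote tests/test_add.py with 3 tests"
#   add_step verify_red AC1 "pytest tests/test_add.py" 1 "RED: test fails as expected"
#   steps_json
```

Because `add_step` always emits all five keys, steps such as `write_test` or `implement` carry `"exit_code": null` instead of omitting the field.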
@@ -86,6 +127,17 @@ if [[ ! -f "$F" ]]; then
  cat > "$F" <<EOF
  Independent verifier for Ralph Desk: $SLUG
 
+ ## Iron Law (ABSOLUTE — no exceptions)
+ > NO COMPLETION CLAIMS WITHOUT FRESH VERIFICATION EVIDENCE
+ > "should pass", "probably works", "seems to" = automatic FAIL
+
+ ## Evidence Gate (MANDATORY before any verdict)
+ 1. IDENTIFY: What command proves this claim?
+ 2. RUN: Execute the FULL command (fresh, complete)
+ 3. READ: Full output, check exit code, count failures
+ 4. VERIFY: Does output confirm the claim?
+ 5. ONLY THEN: Issue verdict
+
  Required reads:
  - PRD: $DESK/plans/prd-$SLUG.md
  - Test Spec: $DESK/plans/test-spec-$SLUG.md
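The Iron Law and Evidence Gate in this hunk can be approximated in bash as below. This is a sketch under stated assumptions: `claim_is_hedged` and `gate_verdict` are hypothetical helpers, the hedging check matches only the three phrases quoted in the Iron Law, and the verdict is driven by the exit code alone, so the READ and VERIFY steps (reading the full output and confirming it supports the specific claim) still fall to the verifier.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the verifier's Iron Law + Evidence Gate.
#   gate_verdict CLAIM LOG_FILE COMMAND [ARGS...]
# Prints "fail (hedged claim)", "pass", or "fail (exit N)".

# Iron Law: hedging phrases in a claim are an automatic FAIL.
claim_is_hedged() {
  grep -Eiq 'should pass|probably works|seems to' <<<"$1"
}

gate_verdict() {
  local claim="$1" log="$2"; shift 2
  if claim_is_hedged "$claim"; then
    echo "fail (hedged claim)"
    return
  fi
  # RUN: execute the full command fresh, keeping the complete output as evidence.
  local code=0
  "$@" >"$log" 2>&1 || code=$?
  # READ (partially): only the exit code decides here; the log holds the output.
  if [[ $code -eq 0 ]]; then
    echo "pass"
  else
    echo "fail (exit $code)"
  fi
}

# Usage (paths and commands are illustrative):
#   gate_verdict "AC1: add(2,3) returns 5" /tmp/ac1.log pytest tests/test_add.py
```

The log file passed as the second argument is the kind of artifact that could be listed under `evidence_paths` in the verdict JSON.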
@@ -100,14 +152,24 @@ Check the iter-signal.json "us_id" field:
  - If us_id is "ALL": verify ALL acceptance criteria from the PRD (final full verify).
  - If us_id is absent or null: verify all criteria in the done-claim (legacy/batch mode).
 
- Process:
+ ## Verification Process
  1. Read PRD acceptance criteria (scoped to us_id if present)
  2. Read done claim
  3. Identify scope: run \`git diff --name-only\` to find changed files, then read those files + related imports only
- 4. Run fresh verification: build, test, lint, typecheck (per test-spec tools)
- 5. Check each criterion against fresh evidence (only for the scoped US, or all if us_id=ALL)
- 6. Run smoke test if defined in PRD
- 7. Write verdict JSON to: $DESK/memos/$SLUG-verify-verdict.json
+ 4. **Scope Lock check**: (a) Read the Next Iteration Contract from campaign memory to identify the contracted US. (b) Run \`git diff --name-only\` to list all changed files. (c) For each changed file, verify it is plausibly related to the contracted US's acceptance criteria. (d) Flag files that appear unrelated. (e) Shared infrastructure (types, configs, common utilities) and dependency files are permitted if the AC implies them.
+ 5. **Layer Enforcement**: check test-spec L1/L2/L3/L4 sections. ANY section still containing TODO or left blank = FAIL (IL-3); a layer that does not apply must instead say "N/A" with a reason.
+ 6. Run fresh verification: execute ALL commands from test-spec verification layers (L1, L2, L3, L4 as applicable)
+ 7. Check each criterion against fresh evidence (only for the scoped US, or all if us_id=ALL)
+ 8. Run smoke test if defined in PRD
+ 9. **Test Sufficiency (IL-4)**: count test functions exercising each AC. Count < 3 per AC = FAIL.
+ Check diversity: at least 2 of 3 categories (happy, negative, boundary) per AC.
+ 10. **Anti-Gaming Detection**:
+ - Assertion integrity: compare assertion count/strength via \`git diff HEAD~1\` — assertions not deleted or weakened
+ - Test-specific logic: no environment-detection patterns
+ - "Code inspection" claims: Worker must run actual commands
+ - Tautological tests: expected values that mirror implementation logic
+ 11. **Reproducibility check**: verify lock file committed, clean install succeeds, security scan passes, env vars documented (per test-spec Reproducibility Gate). Skip if test-spec says "N/A."
+ 12. Write verdict JSON to: $DESK/memos/$SLUG-verify-verdict.json
 
  Verdict JSON:
  {
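Step 9 (Test Sufficiency) reduces to counting tests and test categories per AC. The awk-based sketch below assumes a hypothetical input of one `AC_ID CATEGORY` line per test function (for example, derived from the Traceability Matrix) and applies the verifier's rule from this hunk: at least 3 tests and at least 2 of the 3 categories per AC.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the Test Sufficiency (IL-4) check.
# Input (stdin): one line per test function, "AC_ID CATEGORY", where CATEGORY
# is happy, negative, or boundary.
# Output: "AC_ID TEST_COUNT CATEGORY_COUNT pass|fail", one line per AC, sorted.
check_sufficiency() {
  awk '
    NF >= 2 { n[$1]++; seen[$1, $2] = 1 }
    END {
      for (ac in n) {
        cats = 0
        if ((ac, "happy") in seen)    cats++
        if ((ac, "negative") in seen) cats++
        if ((ac, "boundary") in seen) cats++
        # >= 3 tests for this AC and at least 2 of the 3 categories
        verdict = (n[ac] >= 3 && cats >= 2) ? "pass" : "fail"
        print ac, n[ac], cats, verdict
      }
    }' | sort
}

# Usage (input is illustrative):
#   printf '%s\n' 'AC1 happy' 'AC1 negative' 'AC1 boundary' | check_sufficiency
```

An AC with no tests at all produces no output line, so the result should be compared against the AC list in the PRD rather than read on its own.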
@@ -118,6 +180,14 @@ Verdict JSON:
  "criteria_results": [{"criterion":"...","met":true/false,"evidence":"..."}],
  "missing_evidence": [],
  "issues": [{"id":"...","severity":"critical|major|minor","description":"...","fix_hint":"(suggestion, non-authoritative)"}],
+ "reasoning": [
+ {"check": "IL-1 Evidence Gate", "decision": "pass|fail", "basis": "what command was run, what output confirmed the decision"},
+ {"check": "Layer Enforcement", "decision": "pass|fail", "basis": "which layers checked, any TODO found"},
+ {"check": "Test Sufficiency", "decision": "pass|fail", "basis": "test count per AC, category coverage"},
+ {"check": "Anti-Gaming", "decision": "pass|fail", "basis": "what was checked, any suspicious patterns"}
+ ],
+ "layer_status": {"L1":"pass|fail|todo|na","L2":"pass|fail|todo|na","L3":"pass|fail|todo|na","L4":"pass|fail|todo|na"},
+ "test_quality": {"test_count":0,"ac_count":0,"sufficiency":"pass|fail","anti_patterns_found":[]},
  "recommended_state_transition": "complete|continue|blocked",
  "next_iteration_contract": "...",
  "evidence_paths": []
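The Anti-Gaming entry in `reasoning` depends on the assertion-integrity check from step 10, which compares assertions through `git diff HEAD~1`. A rough bash heuristic is sketched below. It assumes assertions can be found by the substring `assert` (this matches pytest `assert` and unittest `assertEqual`, but not Jest's `expect(...)`), and because it counts lines it cannot catch a weakened assertion that keeps the same line count. A net loss is a reason to read the diff, not a verdict by itself.

```bash
#!/usr/bin/env bash
# Hypothetical heuristic for the Anti-Gaming "assertion integrity" check.
# Reads a unified diff on stdin and prints "ADDED REMOVED": the number of
# added and removed lines that mention "assert". File headers are skipped.
assertion_delta() {
  awk '
    /^\+\+\+ / || /^--- / { next }
    /^\+/ && /assert/ { added++ }
    /^-/  && /assert/ { removed++ }
    END { print added + 0, removed + 0 }'
}

# Usage (illustrative): flag the iteration for review if assertions shrank.
#   read -r added removed < <(git diff HEAD~1 -- tests/ | assertion_delta)
#   if (( removed > added )); then echo "assertions removed: inspect the diff"; fi
```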
@@ -129,6 +199,7 @@ Rules:
  - Campaign Memory is for orientation only — do NOT use it as source of truth for AC verification.
  - Deterministic checks (type hints, linting, security) delegate to test-spec tools; focus on AC verification + semantic review + smoke test.
  - Do NOT modify code or write sentinel files.
+ - If Worker claims "inspection" or "review" for an AC that requires an automated command, verdict = FAIL.
  EOF
  echo " + $F"
  else echo " · $F"; fi
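The new rule that an "inspection" or "review" claim is a FAIL can be partly automated by looking for verification steps that record no command. The bash sketch below is hypothetical and relies on an assumption taken from the Done Claim template: one `execution_steps` object per line. For arbitrary JSON formatting a real parser such as `jq` would be needed.

```bash
#!/usr/bin/env bash
# Hypothetical helper for the "inspection claims = FAIL" rule.
# Prints every verify_* step in a done-claim file whose "command" is null.
# Assumption: one execution_steps object per line, as in the Done Claim template.
uncommanded_verifications() {
  grep -E '"step": *"verify_[a-z0-9_]*"' "$1" | grep -E '"command": *null' || true
}

# Usage (illustrative):
#   hits="$(uncommanded_verifications "$DESK/memos/$SLUG-done-claim.json")"
#   [[ -z "$hits" ]] || echo "FAIL: verification step without a command"
```

Steps such as `write_test` or `implement` legitimately have a null command, which is why the sketch only inspects `verify_*` steps.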
@@ -199,16 +270,29 @@ $OBJECTIVE
  ### US-001: [Title]
  - **Priority**: P0
  - **Size**: S|M|L
+ - **Type**: code|visual|content|integration|infra
+ - **Risk**: LOW|MEDIUM|HIGH|CRITICAL (governance §1c)
  - **Depends on**: []
- - **Acceptance Criteria**:
- - [ ] [Specific, testable criterion]
+ - **Acceptance Criteria** (Given/When/Then — domain language only):
+ - AC1:
+ - Given: [precondition in domain language]
+ - When: [action in domain language]
+ - Then: [expected outcome with quantitative criteria]
+ - AC2:
+ - Given: [precondition]
+ - When: [action]
+ - Then: [expected outcome with quantitative criteria]
+ - **Boundary Cases**: [edge cases — empty input, max values, error conditions, concurrent access]
+ - **Verification Layers**: [Fill per Risk level — LOW: L1+L3, MEDIUM: L1+L2(if ext deps)+L3, HIGH: L1+L2+L3+L4, CRITICAL: L1+L2+L3+L4+mutation (governance §1c)]
  - **Status**: not started
 
  ## Non-Goals
  ## Technical Constraints
  ## Done When
- - All acceptance criteria pass
- - Independent verifier confirms
+ - All acceptance criteria pass with quantitative evidence
+ - All boundary cases covered
+ - All required verification layers executed (no TODO remaining)
+ - Independent verifier confirms via Evidence Gate (governance §1b)
  EOF
  echo " + $F"
  else echo " · $F"; fi
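The PRD template ties each story's Risk level to the verification layers it must fill. The bash sketch below encodes only the one-line summary given in the template (governance §1c itself is not shown here). `required_layers` is a hypothetical helper; its second argument says whether the story has external dependencies, which the template makes relevant only for MEDIUM risk.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the PRD template's Risk -> Verification Layers rule.
#   required_layers RISK [HAS_EXT_DEPS]   (HAS_EXT_DEPS: "yes" or "no", default "no")
required_layers() {
  local risk="$1" ext_deps="${2:-no}"
  case "$risk" in
    LOW)      echo "L1 L3" ;;
    MEDIUM)
      # L2 only when the story has external dependencies
      if [[ "$ext_deps" == yes ]]; then echo "L1 L2 L3"; else echo "L1 L3"; fi ;;
    HIGH)     echo "L1 L2 L3 L4" ;;
    CRITICAL) echo "L1 L2 L3 L4 mutation" ;;
    *)        echo "unknown risk: $risk" >&2; return 1 ;;
  esac
}

# Usage (illustrative):
#   required_layers MEDIUM yes
```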
@@ -219,6 +303,12 @@ if [[ ! -f "$F" ]]; then
  cat > "$F" <<EOF
  # Test Specification: $SLUG
 
+ ## Iron Law Reference
+ > IL-3: NO PASS WITH TODO IN ANY REQUIRED VERIFICATION LAYER
+ > IL-4: NO PASS WITHOUT TEST COUNT >= AC COUNT x 3
+
+ ---
+
  ## Verification Commands
  ### Build
  \`\`\`bash
@@ -233,10 +323,129 @@ if [[ ! -f "$F" ]]; then
  # TODO
  \`\`\`
 
+ ---
+
+ ## Verification Context (fill BEFORE implementation)
+
+ ### Target Behavior
+ What behavior does this project change or introduce?
+ - TODO
+
+ ### Impacted Tests
+ Existing tests that may break due to this change:
+ - TODO (acceptable at init; Worker fills during first iteration)
+
+ ### Required New Tests
+ Tests that MUST be written (minimum 3 per AC: happy + negative + boundary):
+ - TODO
+
+ ### Forbidden Shortcuts (see Worker prompt for full list)
+ - Do not mock external services when L2 integration test is required
+ - Do not delete or weaken existing assertions to make tests pass
+ - Do not add test-specific logic (code that detects it is running in a test)
+ - Do not skip boundary cases listed in the PRD
+ - Do not claim "code inspection" as verification — run the actual command
+ - Do not say "too simple to test" — simple code breaks
+ - Do not say "I'll test after" — tests written after the code pass immediately, which proves nothing
+ - Do not say "already manually tested" — ad-hoc checks are not systematic
+ - Do not say "partial check is enough" — partial proves nothing
+ - Do not say "I'm confident" — confidence is not evidence
+ - Do not say "existing code has no tests" — you are improving it, add tests
+ - Do not write code before tests — delete it and start with tests
+
+ ### Pass/Fail Evidence Format
+ - Command output with exit code 0
+ - Quantitative result matching expected value
+ - Screenshot comparison (for visual tasks)
+
+ ---
+
+ ## Verification Layers (ALL required sections — TODO in required layer = Verifier FAIL)
+
+ ### L1: Unit Test (REQUIRED)
+ \`\`\`bash
+ # TODO — unit test command (e.g., pytest, jest, go test)
+ \`\`\`
+
+ ### L2: Integration (required if external services exist, otherwise "N/A — reason")
+ \`\`\`bash
+ # TODO — integration test command, or write: N/A — no external services (pure computation/transformation)
+ \`\`\`
+
+ ### L3: E2E Simulation (REQUIRED)
+ Known input → full pipeline → quantitative output comparison.
+ Must cover ALL AC types: happy path + boundary + error path.
+ - **Happy path input**: TODO (specific test data)
+ - **Happy path expected output**: TODO (quantitative value)
+ - **Happy path command**:
+ \`\`\`bash
+ # TODO — E2E happy path command
+ \`\`\`
+ - **Error path input**: TODO (invalid/boundary input that triggers error)
+ - **Error path expected**: TODO (error type + non-zero exit code)
+ - **Error path command**:
+ \`\`\`bash
+ # TODO — E2E error path command (expected exit ≠ 0)
+ \`\`\`
+
+ ### L4: Deploy Verification (required if deploying, otherwise "N/A — reason")
+ \`\`\`bash
+ # TODO — deploy verification command, or write: N/A — no deployment (library/tool, local-only change)
+ \`\`\`
+
+ ---
+
+ ## Mutation Testing Gate (CRITICAL risk only)
+ - Required: only for CRITICAL risk classification (governance §1c)
+ - Tool: TODO (e.g., mutmut, Stryker, go-mutesting) or "N/A — not CRITICAL risk"
+ - Target: >= 60% mutation score on core business logic (project default; override in PRD if justified)
+ - Scope: core business logic files (not config/tests/docs)
+ - Command:
+ \`\`\`bash
+ # TODO — mutation testing command, or write: N/A — not CRITICAL risk
+ \`\`\`
+
+ ---
+
+ ## Test Quality Checklist (Verifier checks these)
+ - [ ] Tests verify behavior, not implementation details
+ - [ ] Each test has meaningful assertions (not just "no error thrown")
+ - [ ] Boundary cases covered (empty, max, zero, null, concurrent)
+ - [ ] No tautological tests (expected value copied from implementation)
+ - [ ] Mock usage limited to external boundaries only
+ - [ ] No test-specific logic in production code
+ - [ ] Each AC has >= 3 tests (happy + negative + boundary) per IL-4
+
+ ## Traceability Matrix (Worker fills during implementation)
+
+ | US | AC | Test File :: Function | Layer | Evidence | Status |
+ |----|----|----------------------|-------|----------|--------|
+ | US-001 | AC1 | TODO | L1 | TODO | pending |
+
+ ---
+
+ ## Code Quality Gates (defaults — override in PRD with justification)
+ - **Code duplication**: <= 3% (project-appropriate tool, e.g., jscpd, pylint, sonar)
+ - **Mock ratio**: mock-based assertions <= 30% of total assertions
+ - **Cyclomatic complexity**: <= 10 per function
+ - **Function length**: <= 50 lines per function
+ - **File length**: <= 800 lines per file
+
+ ---
+
+ ## Reproducibility Gate
+ - [ ] Lock file exists and committed (package-lock.json, poetry.lock, go.sum, etc.) or "N/A — no external dependencies"
+ - [ ] Clean install succeeds (npm ci, pip install, etc.) or "N/A — no external dependencies"
+ - [ ] Security scan passes (or known vulnerabilities documented and acknowledged in PRD) or "N/A — no dependencies"
+ - [ ] Environment variables documented (.env.example or equivalent) or "N/A — no env vars"
+
+ ---
+
  ## Criteria → Verification Mapping
- | Criterion | Method | Command |
- |-----------|--------|---------|
- | US-001 AC1 | TODO | TODO |
+
+ | US | AC | Layer | Method | Command | Expected Output | Pass Criteria |
+ |----|----|-------|--------|---------|-----------------|---------------|
+ | US-001 | AC1 | L1 | TODO | TODO | TODO | TODO |
  EOF
  echo " + $F"
  else echo " · $F"; fi
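The test-spec template leaves every layer section as TODO until someone fills it in, and the verifier's Layer Enforcement step fails any section still in that state. A hypothetical awk-based scan of a generated test-spec file is sketched below. It assumes the section headings keep the `### L1:` to `### L4:` form used in this template and that a section ends at the next `##`-level heading or `---` rule.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the Verifier's Layer Enforcement (IL-3) scan.
# For each "### L1:" .. "### L4:" section of a test-spec file, prints
#   "LAYER todo"   if the section still contains TODO,
#   "LAYER na"     if it is marked N/A,
#   "LAYER filled" otherwise.
# Heading lines themselves are not scanned (their text may mention N/A).
layer_status() {
  awk '
    function flush() {
      if (layer != "") {
        status = todo ? "todo" : (na ? "na" : "filled")
        print layer, status
      }
      layer = ""; todo = 0; na = 0
    }
    /^### L[1-4]:/ { flush(); layer = substr($2, 1, 2); next }
    /^##/ || /^---/ { flush(); next }
    layer != "" && /TODO/ { todo = 1 }
    layer != "" && /N\/A/ { na = 1 }
    END { flush() }' "$1"
}

# Usage (illustrative):
#   layer_status "$DESK/plans/test-spec-$SLUG.md"
```

This only tells you whether a layer was filled in at all; populating the verdict's `layer_status` with pass or fail still requires running each filled layer's commands.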