@ai-dev-methodologies/rlp-desk 0.8.0 → 0.9.0

Files changed (4)

package/docs/plans/validated-snacking-crayon.md +407 -0
package/package.json +1 -1
package/src/commands/rlp-desk.md +10 -1
package/src/governance.md +1 -1

package/docs/plans/validated-snacking-crayon.md ADDED

@@ -0,0 +1,407 @@
+# Plan: Worker Planning, Preset Sync, Brainstorm Exploration, Memory Bridge & Coding Principles
+## Context
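+Apply five improvements to rlp-desk's Worker/Verifier prompts and its brainstorm/init flow.
+As a follow-up to the existing iron law policy framework, this builds proven patterns into the Worker/Verifier fresh context.
+**Problems:**
+1. `print_run_presets()` is out of sync with the rlp-desk.md option interface (stale flags, wrong defaults)
+2. The Worker jumps straight to TDD as soon as it has read the files (no planning step)
+3. Brainstorm proposes user stories without looking at the code
+4. Brainstorm results are not kept in campaign memory (the first Worker rediscovers them)
+5. The Worker/Verifier run without coding-principle guidelines (they cannot rely on the global CLAUDE.md)
+**Branch:** `improve/worker-planning-and-preset-sync`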
+---
+## Changes
+### Change 1: Fix Run Preset Desync
+**File:** `src/scripts/init_ralph_desk.zsh` lines 197-238
+Rewrite `print_run_presets()` to match `src/commands/rlp-desk.md` lines 142-200.
+**Desync table:**
+| current (init_ralph_desk.zsh) | canonical (rlp-desk.md) |
+|---|---|
+| `--final-consensus` (line 207) | `--consensus final-only` |
+| `gpt-5.3-codex-spark:high` (line 210) | `spark:high` |
+| `--verify-consensus` (line 232) | `--consensus off\|all\|final-only` |
+| worker default `sonnet` (line 230) | `haiku` |
+| verifier default `opus` (line 231) | per-US `sonnet`, final `opus` |
+| Missing `--mode tmux` in recommended | Present |
+| Missing 6 options | `--lock-worker-model`, `--consensus-model`, `--final-consensus-model`, `--cb-threshold`, `--iter-timeout`, `--final-verifier-model` |
+**Action:** Replace lines 197-238 with a function that mirrors rlp-desk.md lines 142-200.
+### Change 2: Add Worker Planning Step
+**Files:**
+- `src/scripts/init_ralph_desk.zsh` Worker prompt — insert between line 316 and line 318
+- `src/governance.md` line 217 — add `plan` to step types
+- `src/scripts/init_ralph_desk.zsh` Verifier prompt — add audit after line 478
+**Insert after line 316 ("Execute the plan for $SLUG."), before line 318 ("## Before you start"):**
+```
+## Planning (before writing any code)
+After reading all files, BEFORE writing any test or code:
+1. List the specific files you will create or modify
+2. For each AC in the contract, state your approach in 1 sentence
+3. Identify ordering constraints (which AC depends on which)
+4. Record as first execution_step: {"step": "plan", "ac_id": "all", "command": null, "exit_code": null, "summary": "Plan: [files], [approach], [order]"}
+Keep planning lightweight — 1-2 sentences per AC, not a detailed analysis.
+If the plan reveals the contract is unclear or infeasible, signal "blocked" immediately.
+```
+**governance.md line 217:** Change from:
+```
+- Step types: `write_test`, `verify_red`, `implement`, `verify_green`, `refactor`, `commit`, `verify`, `verify_existing`
+```
+to:
+```
+- Step types: `plan`, `write_test`, `verify_red`, `implement`, `verify_green`, `refactor`, `commit`, `verify`, `verify_existing`
+```
+**Verifier prompt after line 478 (Worker Process Audit):** Add:
+```
+   - Planning step presence: done-claim execution_steps should include a `plan` step as the first entry. If missing, record in reasoning as {"check": "Planning Step", "decision": "info", "basis": "plan step present/absent"} — informational only (does not affect pass/fail verdict)
+```
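The audit above could be approximated outside the Verifier with a small shell check. This is only a sketch: the helper names `first_step_type` and `audit_plan_step` are hypothetical, and it assumes `done-claim.json` stores each step's type under a `"step"` key in recorded order, as in the example step shown earlier. It matches text rather than parsing JSON, so treat it as a heuristic.

```shell
# Hypothetical helper: print the type of the first execution step in a done-claim file.
# Assumes the first "step" key in the file belongs to the first recorded step.
first_step_type() {
  grep -o '"step"[[:space:]]*:[[:space:]]*"[^"]*"' "$1" \
    | head -n 1 \
    | sed 's/.*"\([^"]*\)"$/\1/'
}

# Hypothetical helper: report plan-step presence as an informational finding.
# Always returns 0, because this check must not affect the pass/fail verdict.
audit_plan_step() {
  if [ "$(first_step_type "$1")" = "plan" ]; then
    echo "info: plan step present"
  else
    echo "info: plan step absent"
  fi
  return 0
}
```

Returning 0 in both branches mirrors the "informational only" rule: a missing plan step is recorded, not failed.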
+### Change 3: Brainstorm Exploration Phase
+**File:** `src/commands/rlp-desk.md` — insert between line 25 and line 26
+**Insert after line 25 ("2. **Objective**") and before line 26 ("3. **User Stories**"):**
+```
+2.5. **Codebase Exploration** — Before proposing user stories, examine the project:
+   - Read the project's entry points, key modules, and test structure
+   - Identify architectural patterns in use (frameworks, conventions, test setup)
+   - Note constraints the Worker will encounter (dependencies, build system, existing code style)
+   - Present findings: "I explored the codebase and found: [patterns], [constraints], [existing tests]. This informs the US breakdown below."
+   - If the project is new/empty, skip this step and note "greenfield project."
+```
+### Change 4: Memory Bridge
+**Files:**
+- `src/commands/rlp-desk.md` line 131
+- `src/scripts/init_ralph_desk.zsh` lines 578-580 (campaign memory template)
+- `src/scripts/init_ralph_desk.zsh` line 355 area (Worker prompt iteration rules)
+**rlp-desk.md line 131:** Change from:
+```
+If brainstorm was done, auto-fill PRD and test-spec with the results.
+```
+to:
+```
+If brainstorm was done, auto-fill:
+- PRD and test-spec with the brainstorm results
+- Campaign memory "Key Decisions" with architectural decisions from brainstorm
+- Campaign memory "Patterns Discovered" with codebase exploration findings (from step 2.5)
+```
+**init_ralph_desk.zsh lines 578-580:** Change from:
+```
+## Key Decisions
+## Patterns Discovered
+```
+to:
+```
+## Key Decisions
+(seeded from brainstorm — do not erase, only append)
+## Patterns Discovered
+(seeded from brainstorm codebase exploration — do not erase, only append)
+```
+**init_ralph_desk.zsh Worker prompt, after line 355 ("- Rewrite campaign memory in full."):** Add:
+```
+- When rewriting campaign memory, PRESERVE the Key Decisions and Patterns Discovered sections from prior iterations — append new entries, do not erase existing ones.
+```
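The append-only rule for the two seeded sections can be checked mechanically by comparing the campaign memory before and after a Worker iteration. The sketch below is an illustration, not part of the package: `section_body` and `memory_section_preserved` are hypothetical names, and it assumes sections are delimited by `## ` headings as in the template above.

```shell
# Hypothetical helper: print the non-empty lines under "## HEADING",
# stopping at the next "## " heading.
section_body() {
  awk -v h="## $2" '
    $0 == h      { in_sec = 1; next }
    /^## /       { in_sec = 0 }
    in_sec && NF { print }
  ' "$1"
}

# Hypothetical helper: succeed only if every line that OLD had in the section
# still appears, unchanged, in the same section of NEW (append-only check).
memory_section_preserved() {
  local line
  while IFS= read -r line; do
    [ -n "$line" ] || continue
    section_body "$2" "$3" | grep -Fxq -- "$line" || return 1
  done <<EOF
$(section_body "$1" "$3")
EOF
  return 0
}
```

A call such as `memory_section_preserved old-memory.md new-memory.md "Key Decisions"` would fail when a Worker rewrote the file and dropped an earlier decision, which is the failure Scenario 3 below simulates.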
+### Change 5: Coding Principles (Karpathy Guidelines)
+**Files:**
+- `src/scripts/init_ralph_desk.zsh` Worker prompt — insert after line 316, before Change 2's Planning section
+- `src/scripts/init_ralph_desk.zsh` Verifier prompt — insert after line 429
+**Worker prompt — insert after line 316 ("Execute the plan for $SLUG."), as first section:**
+```
+## Coding Principles (applies to ALL work in this iteration)
+1. Think Before Coding
+   Don't assume. Don't hide confusion. Surface tradeoffs.
+   - State assumptions explicitly. If uncertain, signal blocked with your options
+     listed — do not guess.
+   - If multiple interpretations exist, present them in blocked signal — do not
+     pick silently.
+   - If a simpler approach exists, note it in your plan.
+   - If something important is unclear, stop and name what is confusing.
+2. Simplicity First
+   Minimum code that solves the problem. Nothing speculative.
+   - No features beyond what was asked.
+   - No abstractions for single-use code.
+   - No configurability that was not specified.
+   - No defensive handling for implausible scenarios unless the context requires it.
+   - If 200 lines could be 50, rewrite it.
+   Ask: "Would a strong senior engineer call this overcomplicated?" If yes, simplify.
+3. Surgical Changes
+   Touch only what you must. Clean up only your own mess.
+   - Do not improve adjacent code, comments, or formatting unless required by the task.
+   - Do not refactor unrelated code.
+   - Match the local style unless there is a compelling reason not to.
+   - If unrelated dead code is noticed, mention it in done-claim — do not delete it.
+   - Remove imports, variables, or functions that YOUR changes made unused.
+   - Do not remove pre-existing dead code.
+   Test: every changed line should trace directly to the contract.
+4. Goal-Driven Execution
+   Define success criteria. Loop until verified.
+   These principles are enforced by the TDD Mandate and Planning step below.
+   If success criteria for any AC are unclear, signal blocked.
+```
+**Verifier prompt — insert after line 429 ("Independent verifier for Ralph Desk: $SLUG"), before line 431 ("## Iron Law"):**
+```
+## Verification Principles
+1. Think Before Judging
+   Don't assume. Don't default to PASS or FAIL without evidence.
+   - State your assumptions about what PASS looks like for each AC before
+     checking evidence.
+   - If evidence is ambiguous or incomplete, say what is unclear and why —
+     do not default to either verdict.
+   - If multiple interpretations of an AC exist, flag it as a spec issue.
+2. Goal-Driven Verification
+   Define the specific evidence required for PASS before you start checking.
+   - For each AC, state: "PASS requires [specific evidence]."
+   - Verify against that criteria, not against a general impression of code quality.
+   - If success criteria are unclear, note it in reasoning — do not invent criteria.
+```
+---
+## Implementation Sequence
+| Wave | Changes | Files | Risk |
+|------|---------|-------|------|
+| 1 | Change 1 (run preset desync) | init_ralph_desk.zsh | LOW |
+| 2 | Change 5 (coding principles) | init_ralph_desk.zsh | LOW |
+| 2 | Change 2 (planning step) | init_ralph_desk.zsh + governance.md | LOW-MED |
+| 3 | Change 3 (brainstorm exploration) | rlp-desk.md | LOW |
+| 3 | Change 4 (memory bridge) | rlp-desk.md + init_ralph_desk.zsh | MEDIUM |
+**Order rationale:**
+- Wave 1: Standalone bugfix, no dependencies
+- Wave 2: Coding Principles first (top of prompt), then Planning step (uses principles). Both in init_ralph_desk.zsh Worker prompt.
+- Wave 3: rlp-desk.md changes. Change 4 depends on Change 3 (exploration produces findings that get seeded).
+---
+## TDD Verification Plan
+Each change has tests written FIRST, verified to fail, then implementation, then re-verify.
+### Test Script: `tests/test_template_generation.sh`
+```bash
+#!/bin/bash
+# TDD tests for template generation changes
+# Run: bash tests/test_template_generation.sh
+set -euo pipefail
+SCRIPT="src/scripts/init_ralph_desk.zsh"
+CMD="src/commands/rlp-desk.md"
+GOV="src/governance.md"
+PASS=0; FAIL=0; TOTAL=0
+assert_contains() {
+  local file="$1" pattern="$2" label="$3"
+  TOTAL=$((TOTAL+1))
+  if grep -q "$pattern" "$file" 2>/dev/null; then
+    echo "  PASS: $label"; PASS=$((PASS+1))
+  else
+    echo "  FAIL: $label (pattern not found: $pattern)"; FAIL=$((FAIL+1))
+  fi
+}
+assert_not_contains() {
+  local file="$1" pattern="$2" label="$3"
+  TOTAL=$((TOTAL+1))
+  if grep -q "$pattern" "$file" 2>/dev/null; then
+    echo "  FAIL: $label (stale pattern still present: $pattern)"; FAIL=$((FAIL+1))
+  else
+    echo "  PASS: $label"; PASS=$((PASS+1))
+  fi
+}
+echo "=== Change 1: Run Preset Desync ==="
+assert_not_contains "$SCRIPT" "\-\-final-consensus" "C1: no --final-consensus"
+assert_not_contains "$SCRIPT" "gpt-5.3-codex-spark" "C1: no gpt-5.3-codex-spark"
+assert_not_contains "$SCRIPT" "\-\-verify-consensus" "C1: no --verify-consensus"
+assert_contains "$SCRIPT" "\-\-consensus final-only" "C1: --consensus final-only present"
+assert_contains "$SCRIPT" "spark:high" "C1: spark:high present"
+assert_contains "$SCRIPT" "default: haiku" "C1: worker default haiku"
+assert_contains "$SCRIPT" "\-\-lock-worker-model" "C1: --lock-worker-model in options"
+assert_contains "$SCRIPT" "\-\-cb-threshold" "C1: --cb-threshold in options"
+assert_contains "$SCRIPT" "\-\-iter-timeout" "C1: --iter-timeout in options"
+assert_contains "$SCRIPT" "\-\-consensus-model" "C1: --consensus-model in options"
+assert_contains "$SCRIPT" "\-\-mode tmux" "C1: --mode tmux in recommended"
+echo ""
+echo "=== Change 2: Worker Planning Step ==="
+assert_contains "$SCRIPT" "## Planning" "C2: Planning section in Worker prompt"
+assert_contains "$SCRIPT" "step.*plan.*ac_id.*all" "C2: plan execution_step format"
+assert_contains "$SCRIPT" "Keep planning lightweight" "C2: lightweight constraint"
+assert_contains "$GOV" "plan.*write_test.*verify_red" "C2: plan in §1f step types"
+assert_contains "$SCRIPT" "Planning Step.*decision.*info" "C2: Verifier plan audit"
+echo ""
+echo "=== Change 3: Brainstorm Exploration ==="
+assert_contains "$CMD" "Codebase Exploration" "C3: exploration step present"
+assert_contains "$CMD" "greenfield project" "C3: greenfield skip path"
+assert_contains "$CMD" "entry points.*key modules" "C3: exploration instructions"
+echo ""
+echo "=== Change 4: Memory Bridge ==="
+assert_contains "$CMD" "Campaign memory.*Key Decisions" "C4: init seeds memory instruction"
+assert_contains "$SCRIPT" "seeded from brainstorm" "C4: seed markers in template"
+assert_contains "$SCRIPT" "PRESERVE the Key Decisions" "C4: Worker preservation instruction"
+echo ""
+echo "=== Change 5: Coding Principles ==="
+assert_contains "$SCRIPT" "## Coding Principles" "C5: Worker coding principles section"
+assert_contains "$SCRIPT" "Think Before Coding" "C5: principle 1 in Worker"
+assert_contains "$SCRIPT" "Simplicity First" "C5: principle 2 in Worker"
+assert_contains "$SCRIPT" "Surgical Changes" "C5: principle 3 in Worker"
+assert_contains "$SCRIPT" "Goal-Driven Execution" "C5: principle 4 in Worker"
+assert_contains "$SCRIPT" "## Verification Principles" "C5: Verifier principles section"
+assert_contains "$SCRIPT" "Think Before Judging" "C5: Verifier principle 1"
+assert_contains "$SCRIPT" "Goal-Driven Verification" "C5: Verifier principle 2"
+echo ""
+echo "=== RESULTS ==="
+echo "PASS: $PASS / $TOTAL"
+echo "FAIL: $FAIL / $TOTAL"
+[ $FAIL -eq 0 ] && echo "ALL TESTS PASSED" || echo "SOME TESTS FAILED"
+exit $FAIL
+```
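One caveat with substring assertions such as the `spark:high` check: `grep` also finds that text inside the stale `gpt-5.3-codex-spark:high` string, so the check cannot tell the two apart. A sketch of a stricter variant follows, using a hypothetical helper `has_token` that accepts a match only when it is not preceded by a letter, digit, dot, or hyphen.

```shell
# Hypothetical helper: succeed if FILE contains TOKEN as a standalone name,
# i.e. not glued to a longer identifier such as "gpt-5.3-codex-spark:high".
# TOKEN is placed inside an extended regex, so it should contain no regex
# metacharacters.
has_token() {
  grep -Eq -- "(^|[^[:alnum:].-])$2" "$1"
}
```

With this helper, `has_token "$SCRIPT" "spark:high"` would fail on the stale preset text and pass only once the short model name appears on its own.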
+### TDD Flow Per Wave
+**Wave 1 (Change 1):**
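+1. Write test → run → expect up to 11 FAIL (stale patterns present, new patterns absent); the `spark:high` check may already pass, because the stale `gpt-5.3-codex-spark:high` string contains it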
+2. Implement Change 1
+3. Run test → expect 11 PASS
+4. `zsh -n src/scripts/init_ralph_desk.zsh` (syntax check; the script is zsh, so check it with zsh rather than bash)
+**Wave 2 (Changes 5, 2):**
+1. Run test → expect Change 5 (8 tests) + Change 2 (5 tests) = 13 FAIL
+2. Implement Change 5 (Worker + Verifier principles)
+3. Run test → expect Change 5 PASS, Change 2 still FAIL
+4. Implement Change 2 (Planning step + governance + Verifier audit)
+5. Run test → expect all PASS
+6. `zsh -n src/scripts/init_ralph_desk.zsh` (syntax check; the script is zsh, so check it with zsh rather than bash)
+**Wave 3 (Changes 3, 4):**
+1. Run test → expect Change 3 (3 tests) + Change 4 (3 tests) = 6 FAIL
+2. Implement Change 3 (brainstorm exploration)
+3. Run test → expect Change 3 PASS, Change 4 still FAIL
+4. Implement Change 4 (memory bridge — rlp-desk.md + init)
+5. Run test → expect all PASS
+### Artifact-Based End-to-End Verification
+After all waves, run init on a test slug and verify generated artifacts:
+```bash
+# E2E: generate artifacts and verify
+set -euo pipefail  # abort on the first failed check instead of silently ignoring it
+TEST_SLUG="test-karpathy-e2e"
+TEST_DIR=$(mktemp -d)
+cd "$TEST_DIR" && git init && mkdir -p .claude/ralph-desk
+zsh /path/to/src/scripts/init_ralph_desk.zsh "$TEST_SLUG" "test objective"
+# Check Worker prompt
+grep -q "## Coding Principles" .claude/ralph-desk/prompts/$TEST_SLUG.worker.prompt.md
+grep -q "## Planning" .claude/ralph-desk/prompts/$TEST_SLUG.worker.prompt.md
+grep -q "Think Before Coding" .claude/ralph-desk/prompts/$TEST_SLUG.worker.prompt.md
+grep -q "PRESERVE the Key Decisions" .claude/ralph-desk/prompts/$TEST_SLUG.worker.prompt.md
+# Check Verifier prompt
+grep -q "## Verification Principles" .claude/ralph-desk/prompts/$TEST_SLUG.verifier.prompt.md
+grep -q "Think Before Judging" .claude/ralph-desk/prompts/$TEST_SLUG.verifier.prompt.md
+# Check campaign memory
+grep -q "seeded from brainstorm" .claude/ralph-desk/memos/$TEST_SLUG-memory.md
+# Check run presets (capture init output)
+# ... verify --consensus, spark:high, haiku defaults appear
+rm -rf "$TEST_DIR"
+```
+---
+## Self-Verification Gate (CLAUDE.md mandatory)
+3 scenarios required because `governance.md`, `rlp-desk.md`, `init_ralph_desk.zsh` all change.
+**Scenario 1: LOW risk — greenfield campaign, brainstorm skipped**
+- Init with test slug, no brainstorm
+- Verify: Worker prompt has Coding Principles + Planning section, run presets correct, campaign memory has default template (seed markers present but empty), Verifier has Verification Principles
+- Layers: L1 (grep tests) + L3 (E2E artifact check)
+**Scenario 2: MEDIUM risk — full brainstorm flow**
+- Brainstorm + init with codex installed
+- Verify: exploration step in brainstorm, init seeds memory, Worker preserves seeds, run presets show cross-engine commands, Verifier audits plan step
+- Layers: L1 + L2 (real integration) + L3
+**Scenario 3: CRITICAL risk — governance change verification**
+- Verify governance §1f has `plan` in step types
+- Simulate: Worker without plan step → Verifier records `info` (not fail)
+- Simulate: Worker erases Key Decisions → next Worker loses context
+- Layers: L1 + L2 + L3 + governance compliance
+---
+## Post-Commit Checklist
+1. Local file sync (ALL distributable files):
+```bash
+cp src/commands/rlp-desk.md ~/.claude/commands/rlp-desk.md
+cp src/governance.md ~/.claude/ralph-desk/governance.md
+cp src/scripts/init_ralph_desk.zsh ~/.claude/ralph-desk/init_ralph_desk.zsh
+cp src/scripts/run_ralph_desk.zsh ~/.claude/ralph-desk/run_ralph_desk.zsh
+cp src/scripts/lib_ralph_desk.zsh ~/.claude/ralph-desk/lib_ralph_desk.zsh
+cp README.md ~/.claude/ralph-desk/README.md
+```
+2. Verify sync:
+```bash
+diff -q src/commands/rlp-desk.md ~/.claude/commands/rlp-desk.md
+diff -q src/governance.md ~/.claude/ralph-desk/governance.md
+diff -q src/scripts/init_ralph_desk.zsh ~/.claude/ralph-desk/init_ralph_desk.zsh
+diff -q src/scripts/run_ralph_desk.zsh ~/.claude/ralph-desk/run_ralph_desk.zsh
+diff -q src/scripts/lib_ralph_desk.zsh ~/.claude/ralph-desk/lib_ralph_desk.zsh
+diff -q README.md ~/.claude/ralph-desk/README.md
+```
+All must produce no output.
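The six `diff -q` calls above can be folded into one loop that reports only the pairs that differ. This is a sketch under assumptions: `verify_sync` is a hypothetical helper, and each argument is a `source:destination` pair, so paths must not themselves contain a colon.

```shell
# Hypothetical helper: compare "source:destination" pairs and list any that differ.
# Prints nothing and returns 0 when every pair is byte-identical.
verify_sync() {
  local pair src dest rc=0
  for pair in "$@"; do
    src=${pair%%:*}   # text before the first colon
    dest=${pair#*:}   # text after the first colon
    if ! cmp -s "$src" "$dest"; then
      echo "out of sync: $src -> $dest"
      rc=1
    fi
  done
  return "$rc"
}
```

For example, `verify_sync "src/governance.md:$HOME/.claude/ralph-desk/governance.md"` would stay silent when the installed copy matches and print one line when it does not; a missing destination also counts as out of sync.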
+---
+## Critical Files
+| File | Changes |
+|------|---------|
+| `src/scripts/init_ralph_desk.zsh` | C1 (lines 197-238), C2 (lines 316-318, 478), C4 (lines 355, 578-580), C5 (lines 316, 429) |
+| `src/commands/rlp-desk.md` | C3 (lines 25-26), C4 (line 131) |
+| `src/governance.md` | C2 (line 217) |
+| `tests/test_template_generation.sh` | New — TDD test script |

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "@ai-dev-methodologies/rlp-desk",
-  "version": "0.8.0",
+  "version": "0.9.0",
   "description": "Fresh-context iterative loops for Claude Code — autonomous task completion with independent verification",
   "scripts": {
     "postinstall": "node scripts/postinstall.js",

package/src/commands/rlp-desk.md CHANGED

@@ -23,6 +23,12 @@ Present your suggestion, then wait for the user's confirmation or change.
 Ask about these items one by one (or in small groups):
 1. **Slug** — short identifier (e.g., `auth-refactor`). Suggest one, ask if OK.
 2. **Objective** — what the loop achieves
+2.5. **Codebase Exploration** — Before proposing user stories, examine the project:
+   - Read the project's entry points, key modules, and test structure
+   - Identify architectural patterns in use (frameworks, conventions, test setup)
+   - Note constraints the Worker will encounter (dependencies, build system, existing code style)
+   - Present findings: "I explored the codebase and found: [patterns], [constraints], [existing tests]. This informs the US breakdown below."
+   - If the project is new/empty, skip this step and note "greenfield project."
 3. **User Stories** — discrete units with testable acceptance criteria. Propose a breakdown, ask the user to confirm/modify.
    - Apply INVEST criteria: each US must be Independent, Negotiable, Valuable, Estimable, Small, Testable.
    - **Task Sizing (governance §1c)**: Size each US within the Worker's comfortable zone — smaller than what the Worker can handle, not at its ceiling. Max 3-4 ACs, max 2 files. If a US feels "just barely doable" for the target model, split it further.
@@ -128,7 +134,10 @@ Do NOT auto-decide iteration unit — the user MUST explicitly choose.
 ## `init <slug> [objective]`
 Run: `~/.claude/ralph-desk/init_ralph_desk.zsh <slug> "<objective>" [--mode fresh|improve]`
-If brainstorm was done, auto-fill PRD and test-spec with the results.
+If brainstorm was done, auto-fill:
+- PRD and test-spec with the brainstorm results
+- Campaign memory "Key Decisions" with architectural decisions from brainstorm
+- Campaign memory "Patterns Discovered" with codebase exploration findings (from step 2.5)
 **After init completes, STOP. Do NOT auto-run the loop.**

package/src/governance.md CHANGED

@@ -214,7 +214,7 @@ This is the default behavior, not an optional flag. Without it, IL-1 (Evidence M
 ### Worker: execution_steps in done-claim.json
 Worker records what was done, in what order, with command evidence in `done-claim.json`:
 - Each step includes: what action, which AC, command executed, exit code, summary
-- Step types: `write_test`, `verify_red`, `implement`, `verify_green`, `refactor`, `commit`, `verify`, `verify_existing`
+- Step types: `plan`, `write_test`, `verify_red`, `implement`, `verify_green`, `refactor`, `commit`, `verify`, `verify_existing`
 - This proves the Worker followed test-first approach and did not skip steps
 - **Existing implementation rule**: When code already exists from a prior iteration/campaign, Worker MAY use `verify_existing` instead of `write_test → verify_red → implement → verify_green`. `verify_existing` requires: run all existing tests, record exit codes, confirm all AC are covered by passing tests. Worker MUST NOT skip recording evidence — `verify_existing` is evidence that existing code satisfies AC, not a shortcut to skip verification.