pi-rnd 0.2.1

Files changed (63)
  1. package/LICENSE +21 -0
  2. package/README.md +74 -0
  3. package/agents/rnd-builder.md +98 -0
  4. package/agents/rnd-integrator.md +104 -0
  5. package/agents/rnd-planner.md +208 -0
  6. package/agents/rnd-verifier.md +164 -0
  7. package/dist/doctor.js +166 -0
  8. package/dist/doctor.js.map +1 -0
  9. package/dist/gates/bash-discipline.js +27 -0
  10. package/dist/gates/bash-discipline.js.map +1 -0
  11. package/dist/gates/read-evidence-pack.js +23 -0
  12. package/dist/gates/read-evidence-pack.js.map +1 -0
  13. package/dist/gates/registry.js +24 -0
  14. package/dist/gates/registry.js.map +1 -0
  15. package/dist/gates/rnd-dir-required.js +31 -0
  16. package/dist/gates/rnd-dir-required.js.map +1 -0
  17. package/dist/index.js +20 -0
  18. package/dist/index.js.map +1 -0
  19. package/dist/orchestrator/prompts.js +58 -0
  20. package/dist/orchestrator/prompts.js.map +1 -0
  21. package/dist/orchestrator/rnd-dir.js +20 -0
  22. package/dist/orchestrator/rnd-dir.js.map +1 -0
  23. package/dist/orchestrator/spawn.js +67 -0
  24. package/dist/orchestrator/spawn.js.map +1 -0
  25. package/dist/orchestrator/start.js +195 -0
  26. package/dist/orchestrator/start.js.map +1 -0
  27. package/dist/orchestrator/state.js +15 -0
  28. package/dist/orchestrator/state.js.map +1 -0
  29. package/dist/orchestrator/types.js +2 -0
  30. package/dist/orchestrator/types.js.map +1 -0
  31. package/docs/PI-API.md +574 -0
  32. package/docs/PORTING.md +105 -0
  33. package/package.json +57 -0
  34. package/skills/fp-practices/SKILL.md +128 -0
  35. package/skills/fp-practices/bash.md +114 -0
  36. package/skills/fp-practices/duckdb.md +116 -0
  37. package/skills/fp-practices/elixir.md +115 -0
  38. package/skills/fp-practices/javascript.md +119 -0
  39. package/skills/fp-practices/koka.md +120 -0
  40. package/skills/fp-practices/lean.md +120 -0
  41. package/skills/fp-practices/postgresql.md +120 -0
  42. package/skills/fp-practices/python.md +120 -0
  43. package/skills/fp-practices/svelte.md +114 -0
  44. package/skills/kiss-practices/SKILL.md +41 -0
  45. package/skills/kiss-practices/bash.md +70 -0
  46. package/skills/kiss-practices/duckdb.md +30 -0
  47. package/skills/kiss-practices/elixir.md +38 -0
  48. package/skills/kiss-practices/javascript.md +43 -0
  49. package/skills/kiss-practices/koka.md +34 -0
  50. package/skills/kiss-practices/lean.md +45 -0
  51. package/skills/kiss-practices/markdown.md +20 -0
  52. package/skills/kiss-practices/postgresql.md +31 -0
  53. package/skills/kiss-practices/python.md +64 -0
  54. package/skills/kiss-practices/svelte.md +59 -0
  55. package/skills/rnd-building/SKILL.md +256 -0
  56. package/skills/rnd-decomposition/SKILL.md +188 -0
  57. package/skills/rnd-experiments/SKILL.md +197 -0
  58. package/skills/rnd-failure-modes/SKILL.md +222 -0
  59. package/skills/rnd-iteration/SKILL.md +170 -0
  60. package/skills/rnd-orchestration/SKILL.md +314 -0
  61. package/skills/rnd-scaling/SKILL.md +188 -0
  62. package/skills/rnd-verification/SKILL.md +248 -0
  63. package/skills/using-rnd-framework/SKILL.md +65 -0
@@ -0,0 +1,314 @@
1
+ ---
2
+ name: rnd-orchestration
3
+ description: "Use when coordinating multi-agent R&D pipeline execution — provides pipeline overview, agent roles, information barriers, and gate criteria"
4
+ user-invocable: false
5
+ effort: medium
6
+ ---
7
+
8
+ # R&D Orchestration Framework
9
+
10
+ ## When to activate
11
+ Activate when the user invokes any `/rnd-framework:*` command, mentions "rnd framework", or when you detect a complex multi-step coding task that would benefit from structured decomposition and verification.
12
+
13
+ ## Epistemic Foundation
14
+
15
+ This is a scientific process. Treat every claim — including your own — with skepticism until proven by evidence.
16
+
17
+ - **A result is true or false.** There is no "almost true", "mostly works", or "close enough".
18
+ - **Evidence must be reproducible.** If you can't reproduce it, it doesn't count.
19
+ - **First results are hypotheses, not conclusions.** Tests passing on the first run is a data point, not proof. What about the second run? Edge cases? Adversarial inputs?
20
+ - **Disconfirmation over confirmation.** Actively try to break things. A result that survives attempts to disprove it is stronger than one you only tried to confirm.
21
+ - **No one is served by false positives.** Passing broken work is worse than blocking correct work. When in doubt, FAIL.
22
+
23
+ ## Framework Overview
24
+
25
+ This framework applies the scientific method to structured coding:
26
+
27
+ | Scientific Method | Principle | Role |
28
+ |---|---|---|
29
+ | Hypothesis declaration | Pre-registration | Declare intent + success criteria BEFORE coding |
30
+ | Structured experimentation | Hierarchical decomposition | Break tasks into System → Module → Unit with paired verification |
31
+ | Blinded peer review | Independent verification | Builder and Verifier are separate — Verifier never sees Builder reasoning |
32
+ | Reproducible evidence | Evidence-based gates | No work proceeds without reproducible evidence |
33
+ | Dependency analysis | Parallel scheduling | Identify parallel vs sequential work |
34
+
35
+ ## Agent Roles & Information Barriers
36
+
37
+ The framework defines 10 specialized agent roles. Dedicated agents are spawned for each role.
38
+
39
+ **Planner** — Decomposes tasks, writes pre-registration docs with testable success criteria. Uses `rnd-framework:rnd-decomposition` skill.
40
+ **Orchestrator** — Analyzes dependencies, schedules parallel waves, enforces iteration budgets. Uses `rnd-framework:rnd-orchestration` skill.
41
+ **Builder** — Writes code + tests + honest self-assessment. Uses `rnd-framework:rnd-building` skill. Does NOT verify own work.
42
+ **Proof Gate** — Attempts formal Lean 4 proofs of pre-registration criteria. Advisory — results inform the Verifier but do not block the pipeline. Skips when Lean is unavailable.
43
+ **Reality Auditor** — Adversarially verifies external service contracts (SQL schemas, HTTP endpoints, env vars, SDK behavior). Blocking — INVALID_FOUND routes the task back to the Builder before the Verifier sees it.
44
+ **Verifier** — Checks output against pre-registered criteria. Uses `rnd-framework:rnd-verification` skill. Does NOT read Builder's self-assessment (enforced by `read-gate.sh` hook). In multi-judge mode, two independent Verifiers run in parallel; if they disagree, a third **Tiebreaker** Verifier receives both reports (but never self-assessments) and issues the final verdict.
45
+ **Cleanup** — Post-verification per-task entropy reduction: dead code, orphan files, duplicate implementations, stale comments. Applies mutations in-place and rolls back automatically if re-verification breaks. Uses `rnd-framework:rnd-cleanup` skill.
46
+ **Polisher** — Wave-level cross-task seam fixer: detects cross-task duplication, naming and API drift across the wave, helpers that should be lifted to shared locations, and structural inconsistencies. Runs after all per-task cleanup completes. Applies mutations in-place and rolls back automatically if re-verification breaks. Reports written to `$RND_DIR/polish/wave-<N>-polish-report.md`.
47
+ **Integrator** — Merges verified outputs, runs integration/system tests. Uses `rnd-framework:rnd-integration` skill.
48
+ **Data Scientist** — Handles numerical analysis, financial calculations, data wiring, chart generation. Uses `rnd-framework:rnd-data-science` skill. Spawned on-demand when the task requires Julia, DuckDB, or statistical analysis.
49
+
50
+ ### Critical Information Flow Rules
51
+
52
+ These barriers are what make the framework work. Violating them defeats the purpose.
53
+
54
+ - Builder → Verifier: Send code, tests, artifacts. BLOCK reasoning, self-assessment, internal notes.
55
+ - Verifier → Builder (on fail): Send actionable feedback. BLOCK suggested fixes, internal reasoning.
56
+ - The Verifier must assess work purely against the pre-registered spec.
57
+
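The Builder → Verifier barrier above can be pictured as a projection that keeps only the allowed fields. This is a minimal sketch: the `BuildOutput` shape, its field names, and `toVerifierInput` are hypothetical and not part of the package.

```typescript
// Hypothetical shape of what a Builder produces (illustration only).
interface BuildOutput {
  code: string[];          // paths of source files
  tests: string[];         // paths of test files
  artifacts: string[];     // other build artifacts
  reasoning: string;       // must never reach the Verifier
  selfAssessment: string;  // must never reach the Verifier
  internalNotes: string;   // must never reach the Verifier
}

// Only these three fields may cross the barrier.
type VerifierInput = Pick<BuildOutput, "code" | "tests" | "artifacts">;

// Build a fresh object instead of deleting fields, so anything added to
// BuildOutput later stays blocked by default.
function toVerifierInput(build: BuildOutput): VerifierInput {
  return { code: build.code, tests: build.tests, artifacts: build.artifacts };
}
```

Copying an allow-list rather than stripping a deny-list means a newly added Builder field cannot leak by accident.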
58
+ ## Pre-Registration Document Format
59
+
60
+ Every task must have this BEFORE any code is written:
61
+
62
+ ```
63
+ Task ID: T<number>
64
+ Intent: One sentence — what and why.
65
+ Approach: Brief planned implementation.
66
+ Expected outputs: Files/functions to produce.
67
+ Success criteria:
68
+ Correctness:
69
+ - [ ] Specific, testable condition 1
70
+ Quality:
71
+ - [ ] Specific, testable condition 2
72
+ Verification level: unit | integration | system
73
+ Dependencies: [list of task IDs]
74
+ Preconditions:
75
+ - [File/content assertion verified before build starts — omit if none]
76
+ External dependencies:
77
+ - system: [DB | API | file | env | service]
78
+ contract: [What is assumed about this system — schema, response shape, format, presence]
79
+ verification: [How this will be confirmed — e.g., Read actual schema, query endpoint, inspect file sample]
80
+ fulfills: [VAL-AREA-NNN, ...]
81
+ ```
82
+
83
+ ## Execution Mode
84
+
85
+ Dedicated agents are spawned for each pipeline role. The orchestrator session coordinates them, enforcing information barriers and gate criteria.
86
+
87
+ ### Dispatch Policy: Criticality-Driven Model Selection
88
+
89
+ Four agents support **per-spawn model override** based on the per-task `Criticality` field in the pre-registration. Non-adaptive agents always run at their fixed model/effort regardless of criticality.
90
+
91
+ **Per-agent criticality matrix:**
92
+
93
+ | Agent | LOW | MEDIUM | HIGH | Adaptive? |
94
+ |---|---|---|---|---|
95
+ | `rnd-planner` | opus/high | opus/high | opus/xhigh | yes |
96
+ | `rnd-verifier` | sonnet/high | opus/high | opus/xhigh | yes |
97
+ | `rnd-builder` | sonnet/high | sonnet/high | opus/high | yes |
98
+ | `rnd-debugger` | sonnet/high | sonnet/high | opus/high | yes |
99
+ | `rnd-amendment-arbiter` | opus/xhigh | opus/xhigh | opus/xhigh | no (fixed) |
100
+ | `rnd-polisher` | opus/high | opus/high | opus/high | no (per-wave, fixed) |
101
+
102
+ > **Note on non-adaptive agents:** `rnd-amendment-arbiter` and `rnd-polisher` always run at their listed model and effort — the criticality column shows the same value in every tier to make this explicit. Auxiliary agents not in this table (integrator, cleanup, reality-auditor, proof-gate, data-scientist) are also non-adaptive and always use their frontmatter `model:`.
103
+
104
+ **Fallback rule.** If the task has no `Criticality` field (or no pre-reg), the orchestrator does NOT override — the agent's frontmatter `model:` is used. Effort is NOT per-spawn overridable; it stays at the agent's frontmatter value.
105
+
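The matrix and fallback rule above can be sketched as a lookup that returns a model override only for adaptive agents with a known criticality. This is an illustration, not the package's implementation: `modelOverride` and `MODEL_MATRIX` are hypothetical names, and only the model half of each matrix cell is encoded because effort is not per-spawn overridable.

```typescript
type Criticality = "LOW" | "MEDIUM" | "HIGH";
type Model = "opus" | "sonnet" | "haiku";

// Model column of the criticality matrix, adaptive agents only.
const MODEL_MATRIX: Record<string, Record<Criticality, Model>> = {
  "rnd-planner": { LOW: "opus", MEDIUM: "opus", HIGH: "opus" },
  "rnd-verifier": { LOW: "sonnet", MEDIUM: "opus", HIGH: "opus" },
  "rnd-builder": { LOW: "sonnet", MEDIUM: "sonnet", HIGH: "opus" },
  "rnd-debugger": { LOW: "sonnet", MEDIUM: "sonnet", HIGH: "opus" },
};

// Returns the model to pass in spawn options, or undefined when the
// orchestrator should not override (no Criticality field, or a
// non-adaptive agent), leaving the agent's frontmatter `model:` in effect.
function modelOverride(agent: string, criticality?: Criticality): Model | undefined {
  if (criticality === undefined) return undefined;
  return MODEL_MATRIX[agent]?.[criticality];
}
```

A caller would add `model` to the spawn options only when the result is defined.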
106
+ **Granularity.** Builder/Verifier/Debugger spawns read the criticality of the specific task they are working on (per-task). Planner uses the overall task tree's max-criticality at plan time (or the user-stated complexity at `/rnd-start`).
107
+
108
+ **Dispatch example:**
109
+
110
+ ```typescript
111
+ // Task T7 has `Criticality: HIGH` in plan.md → spawn Builder with model="opus"
112
+ pi.events.emit("subagents:rpc:spawn", {
113
+ requestId,
114
+ type: "rnd-builder",
115
+ prompt: "Task: T7\nRND_DIR: ...\n...",
116
+ options: { description: "Build task T7", model: "opus", run_in_background: true, max_turns: 0 },
117
+ });
118
+ ```
119
+
120
+ **Frontmatter defaults (used when criticality is absent OR for non-adaptive agents):**
121
+
122
+ | Agent | Default model | Effort | Adaptive? |
123
+ |---|---|---|---|
124
+ | `rnd-planner` | opus | high | yes |
125
+ | `rnd-builder` | sonnet | high | yes |
126
+ | `rnd-verifier` | sonnet | high | yes |
127
+ | `rnd-debugger` | sonnet | high | yes |
128
+ | `rnd-proof-gate` | sonnet | low | no (advisory) |
129
+ | `rnd-reality-auditor` | sonnet | low | no |
130
+ | `rnd-amendment-arbiter` | opus | xhigh | no |
131
+ | `rnd-cleanup` | sonnet | medium | no |
132
+ | `rnd-polisher` | opus | high | no |
133
+ | `rnd-integrator` | haiku | low | no |
134
+ | `rnd-data-scientist` | sonnet | medium | no |
135
+
136
+ > **Note on RND_DIR:** The orchestrator computes the artifact directory and sets `$RND_DIR` before spawning agents. Agents receive `$RND_DIR` in their prompt.
137
+
138
+ ### Calibration Auto-Escalation
139
+
140
+ Before spawning any adaptive agent (planner, builder, verifier, debugger), the orchestrator checks whether auto-escalation is warranted based on recorded false-pass rate. When the rolling false-pass rate reaches 20%, the next spawn upgrades one tier automatically.
141
+
142
+ - **Promotion warranted:** set the effective tier to the next tier up, use it for model selection in the dispatch table.
143
+ - **No promotion:** use the original tier.
144
+ - **`RND_DISABLE_AUTO_ESCALATION=1`:** disables the entire mechanism.
145
+
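The escalation rule above can be sketched as a pure function. Several details are assumptions made for illustration: the function name `effectiveTier`, the tier names (taken from the dispatch matrix), passing the false-pass rate as a fraction, and the choice to leave HIGH unchanged because the text does not say what happens at the top tier.

```typescript
const TIERS = ["LOW", "MEDIUM", "HIGH"] as const;
type Tier = (typeof TIERS)[number];

// falsePassRate is a fraction in [0, 1]; 0.2 corresponds to the 20% threshold.
function effectiveTier(
  tier: Tier,
  falsePassRate: number,
  env: Record<string, string | undefined> = {},
): Tier {
  // The kill switch disables the whole mechanism.
  if (env["RND_DISABLE_AUTO_ESCALATION"] === "1") return tier;
  // Below the threshold: keep the original tier.
  if (falsePassRate < 0.2) return tier;
  // Promote by exactly one tier; HIGH has nowhere to go (assumption).
  const next = Math.min(TIERS.indexOf(tier) + 1, TIERS.length - 1);
  return TIERS[next];
}
```

The returned tier would then be used for model selection in the dispatch table in place of the task's declared tier.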
146
+ ## Subagent Coordination
147
+
148
+ ### Spawning Agents
149
+
150
+ All pipeline agents are spawned via PI's subagent RPC:
151
+
152
+ ```typescript
153
+ pi.events.emit("subagents:rpc:spawn", {
154
+ requestId,
155
+ type: "rnd-builder", // agent filename stem
156
+ prompt: "...",
157
+ options: { description: "Build task T7", run_in_background: true, max_turns: 0 },
158
+ });
159
+ ```
160
+
161
+ Subscribe to the reply channel BEFORE emitting the request. Await completion via `subagents:completed` / `subagents:failed` events filtered by agent ID.
162
+
163
+ - **Planner** — decomposes tasks and writes pre-registrations
164
+ - **Builder** — implements tasks with TDD discipline
165
+ - **Verifier** — independently checks outputs against pre-registered criteria
166
+ - **Cleanup** — sweeps dead code and stale artifacts per task after PASS
167
+ - **Integrator** — merges verified outputs and runs integration tests
168
+
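The subscribe-before-emit rule above can be illustrated with a small in-memory event bus. `LocalBus` is a stand-in written for this sketch and is not the PI API; the `{ id }` payload shape and the `watchAgent` helper are likewise assumptions. Only the event names `subagents:completed` and `subagents:failed` come from the text.

```typescript
type AgentEvent = { id: string }; // assumed payload shape
type Handler = (payload: AgentEvent) => void;

// Minimal synchronous event bus used only for this illustration.
class LocalBus {
  private handlers = new Map<string, Set<Handler>>();

  // Registers a handler and returns a function that removes it again.
  on(event: string, handler: Handler): () => void {
    const set = this.handlers.get(event) ?? new Set<Handler>();
    set.add(handler);
    this.handlers.set(event, set);
    return () => { set.delete(handler); };
  }

  emit(event: string, payload: AgentEvent): void {
    const set = this.handlers.get(event);
    if (!set) return;
    // Copy first so handlers can unsubscribe while we iterate.
    Array.from(set).forEach((handler) => handler(payload));
  }
}

type Outcome = "completed" | "failed";

// Registers both listeners up front, filters by agent ID, and removes
// the listeners once the first matching event arrives.
function watchAgent(bus: LocalBus, agentId: string, onDone: (outcome: Outcome) => void): void {
  const unsubscribe: Array<() => void> = [];
  const settle = (outcome: Outcome): void => {
    unsubscribe.forEach((off) => off());
    onDone(outcome);
  };
  unsubscribe.push(bus.on("subagents:completed", (p) => { if (p.id === agentId) settle("completed"); }));
  unsubscribe.push(bus.on("subagents:failed", (p) => { if (p.id === agentId) settle("failed"); }));
}
```

Calling `watchAgent` before emitting the spawn request means a completion that arrives immediately cannot be missed, which is the reason for the ordering rule.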
169
+ ### Blocking Behavior
170
+
171
+ Awaiting `subagents:completed` is how the orchestrator learns a subagent is done. Do not poll `$RND_DIR` files for progress — subscribe to events.
172
+
173
+ - **Never** use `sleep` to wait for subagents
174
+ - **Never** write bash loops to check if build artifacts exist yet
175
+ - **Do** spawn multiple agents in parallel (multiple `subagents:rpc:spawn` emits in one turn) for independent tasks within a wave
176
+ - **Do** use `run_in_background: true` in spawn options if you want to continue working while agents run, then process results when notified
177
+
178
+ ## Execution Phases
179
+
180
+ 1. **Plan** — Run environment discovery (structured checklist scan for package manager, test framework, CI, external services, env vars, secrets). Decompose the task, write pre-registrations with `fulfills` traceability, build dependency matrix. Generate Validation Contract (numbered VAL-AREA-NNN assertions with exact evidence commands). Produce enriched plan.md with sections: Task Tree, Environment Setup, Infrastructure, Testing Strategy, Worker Guidelines, Validation Contract, Pre-Registration Documents, Dependency Matrix, Execution Schedule, Iteration Budgets. Write exploration cache to `$RND_DIR/exploration/`. In multi-agent mode, the Planner agent handles this phase.
181
+ 2. **Schedule** — Create execution waves from dependency matrix. In multi-agent mode, the Orchestrator session handles scheduling directly.
182
+ 3. **Build** — Work tasks in parallel within waves. Produce code + tests + self-assessment. Builder agents are spawned per task.
183
+ 3.5. **Proof Gate** (advisory, conditional) — Attempt Lean 4 formal proofs for tasks with mathematical invariants. Only runs when:
184
+ - Task has `Proof: lean` annotation in pre-registration
185
+ - Lean is available in PATH
186
+ Results (PROVEN/UNPROVEN) are passed to the Verifier. The pipeline continues regardless.
187
+
188
+ 3.75. **Reality Audit** (blocking, conditional) — Run only when:
189
+ - Task has `External dependencies` declared in pre-registration AND
190
+ - User has not disabled via `--skip-reality-checks`
191
+ Adversarially verifies declared external references. INVALID_FOUND routes back to build.
192
+ If no external dependencies declared → auto-SKIPPED.
193
+ 4. **Verify** — Check each task against pre-registered criteria. PASS/FAIL/ITERATE. In multi-agent mode, Verifier agents are spawned independently.
194
+ 4. **Cleanup** (per task, after PASS) — Spawn a Cleanup agent for each task that passed verification. The agent detects and removes: dead functions/variables, orphan files, duplicate implementations, and stale comments. Applies mutations in-place and rolls back automatically if re-verification breaks. Reports written to `$RND_DIR/cleanup/T<id>-cleanup-report.md`. A `cleanup: rolled_back` result is not a pipeline failure.
195
+ 4.5. **Polish** (wave-level, after all per-task cleanup) — Spawn ONE Polisher agent for the entire wave. The agent detects and fixes cross-task seam issues: cross-task duplication, naming and API drift across the wave, helpers that should be lifted to a shared location, and structural inconsistencies. Applies mutations in-place and rolls back if re-verification breaks. Reports written to `$RND_DIR/polish/wave-<N>-polish-report.md`. A `polish: skipped` result is not a pipeline failure.
196
+ 5. **Iterate** — On FAIL, build phase gets feedback only (not fixes). Iteration budget is wave-scoped and tier-keyed (LOW=2, NORMAL=3, HIGH=5, by highest-criticality task in the wave); see `rnd-framework:rnd-iteration` for the table. Budget exhausted → escalate.
197
+ 6. **Integrate** — Merge verified outputs, run integration tests, system validation. In multi-agent mode, the Integrator agent handles this phase.
198
+
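The Schedule phase turns the dependency matrix into waves. One common way to do this is to layer the dependency graph: each wave holds every task whose dependencies are already done. The sketch below assumes the matrix is available as a map from task ID to the IDs it depends on; `scheduleWaves` is a hypothetical name and this is not necessarily how the package schedules.

```typescript
// deps maps each task ID to the task IDs it depends on.
function scheduleWaves(deps: Record<string, string[]>): string[][] {
  const remaining = new Set(Object.keys(deps));
  const done = new Set<string>();
  const waves: string[][] = [];

  while (remaining.size > 0) {
    // A task is ready when every one of its dependencies is already done.
    const wave = Array.from(remaining)
      .filter((task) => deps[task].every((d) => done.has(d)))
      .sort(); // plain string sort, only to make the output deterministic

    // Nothing ready while tasks remain: a cycle or a dependency on an unknown task.
    if (wave.length === 0) {
      throw new Error("unschedulable tasks: " + Array.from(remaining).sort().join(", "));
    }

    for (const task of wave) {
      remaining.delete(task);
      done.add(task);
    }
    waves.push(wave);
  }
  return waves;
}
```

Tasks inside one wave have no dependencies on each other, so they are the ones that can be spawned in parallel.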
199
+ ## Gate Criteria
200
+
201
+ **Gate 1 (post-plan):** Every task has complete pre-registration with testable criteria, `fulfills` field linking to VAL assertions, and all Validation Contract assertions are covered.
202
+ **Gate 2 (post-build):** Code + tests + artifacts submitted. Tests pass locally.
203
+ **Gate 2.5 (post-reality-audit):** Reality Audit complete for every task in the wave. Any INVALID verdict blocks pipeline progression for that task — it must return to build before proceeding to verification.
204
+ **Gate 3 (post-verify):** Verification PASS on all criteria with evidence.
205
+ **Gate 4 (post-integrate):** Integration tests pass. No regressions. System validation passes.
206
+
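Gate 1 can be sketched as a coverage check over the pre-registrations. The `PreReg` shape and `gate1Problems` are hypothetical and simplified: a real pre-registration has more fields, and whether a criterion is actually testable still needs judgment that this check cannot provide.

```typescript
// Simplified view of a pre-registration (illustration only).
interface PreReg {
  id: string;          // e.g. "T3"
  criteria: string[];  // success criteria
  fulfills: string[];  // VAL-AREA-NNN assertion IDs
}

// Returns human-readable problems; an empty array means the gate passes.
function gate1Problems(tasks: PreReg[], contract: string[]): string[] {
  const problems: string[] = [];
  const covered = new Set<string>();

  for (const task of tasks) {
    if (task.criteria.length === 0) problems.push(`${task.id}: no success criteria`);
    if (task.fulfills.length === 0) problems.push(`${task.id}: empty fulfills field`);
    task.fulfills.forEach((assertion) => covered.add(assertion));
  }

  // Every Validation Contract assertion must be claimed by at least one task.
  for (const assertion of contract) {
    if (!covered.has(assertion)) problems.push(`${assertion}: not covered by any task`);
  }
  return problems;
}
```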
207
+ ## Task Status Determination
208
+
209
+ Task status is derived from artifact files — no separate state file is needed. At each gate, check:
210
+
211
+ | Artifact exists? | Status |
212
+ |-----------------|--------|
213
+ | `$RND_DIR/integration/wave-<N>-report.md` contains SHIP | integrated |
214
+ | `$RND_DIR/verifications/T<id>-verification.md` contains `Overall Verdict: PASS` | verified |
215
+ | `$RND_DIR/verifications/T<id>-verification.md` contains NEEDS_ITERATION | iterating |
216
+ | `$RND_DIR/builds/T<id>-manifest.md` exists and is non-empty | built |
217
+ | Task in plan.md but no build artifact | planned |
218
+
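The status table can be read as a first-match-wins check from top to bottom. The sketch below takes file contents as input rather than reading `$RND_DIR`, so it stays self-contained. The `Artifacts` shape and `taskStatus` are hypothetical, and the top-down precedence is an assumption drawn from the row order.

```typescript
type Status = "integrated" | "verified" | "iterating" | "built" | "planned";

// Contents of the relevant artifact files; undefined means the file is missing.
interface Artifacts {
  integrationReport?: string; // integration/wave-<N>-report.md
  verification?: string;      // verifications/T<id>-verification.md
  manifest?: string;          // builds/T<id>-manifest.md
}

function taskStatus(a: Artifacts): Status {
  if (a.integrationReport?.includes("SHIP")) return "integrated";
  if (a.verification?.includes("Overall Verdict: PASS")) return "verified";
  if (a.verification?.includes("NEEDS_ITERATION")) return "iterating";
  // "exists and is non-empty" mirrors the `test -s` check.
  if (a.manifest !== undefined && a.manifest.length > 0) return "built";
  return "planned";
}
```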
219
+ **At each gate**, validate the expected artifact exists and is non-empty (use Bash `test -s`). If missing, notify the user via `ctx.ui.notify(text, level)` and do not proceed with that task.
220
+
221
+ **Always use pipeline IDs in user-facing output.** When displaying task references, blocked-by relationships, or status updates, always use `T<n>` pipeline IDs.
222
+
223
+ **Before scheduling each wave**, scan `$RND_DIR/builds/` and `$RND_DIR/verifications/` to determine which tasks are complete. Skip tasks that already have the expected artifacts for the current phase.
224
+
225
+ ## User Decision Points
226
+
227
+ When a phase completes and the user needs to decide what happens next, surface options via `ctx.ui.notify(text, level)` and read the user's typed response from the next turn. Present 2-4 concrete options with action-oriented labels instead of open-ended text.
228
+
229
+ Rules:
230
+ - Always include 2-4 concrete options
231
+ - Mark the recommended option first with "(Recommended)" in the label
232
+ - Use short, action-oriented labels (e.g., "Fix P0 blockers first", "Verify wave-1", "Re-plan T3")
233
+ - Put context alongside the options, not in the label
234
+
235
+ Common decision points:
236
+ - **Post-plan:** "Approve plan", "Revise criteria for T2", "Add more tasks"
237
+ - **Post-build:** "Verify this wave", "Re-build T3", "Review findings first"
238
+ - **Post-verify (mixed results):** "Fix P0 issues first (Recommended)", "Fix all issues", "Ship as-is with known issues"
239
+ - **Post-integrate:** "Ship it", "Run another verification pass", "Fix integration failures"
240
+
241
+ ## Scaling Rules
242
+
243
+ - **Small tasks (<1hr):** Collapse — one Builder + one Verifier (single judge). Lightweight pre-registration.
244
+ - **Medium tasks:** Full framework with parallel waves. Use 2-judge consensus verification per task.
245
+ - **Large tasks (multi-day):** Add design review gate between Plan and Schedule. Add sub-waves. Use 2-judge consensus verification.
246
+ - **Exploratory:** Add Phase 0 — spike 2-3 approaches with time-box before committing.
247
+ - **High-stakes:** Multi-judge verification (2 judges + tiebreaker on disagreement). Add formal invariants via Proof Gate.
248
+
249
+ ## User-Facing Briefs
250
+
251
+ Briefs are user-facing narratives — plain-language updates the user sees in real time while a non-verifier agent works in the background. They live under `$RND_DIR/briefs/` which is mechanically blocked from Verifier agents via the three PreToolUse gate hooks (`hooks/read-gate.sh`, `hooks/glob-grep-gate.sh`, `hooks/bash-gate.sh`). Only Planner, Builder, Debugger, Integrator, and the orchestrator may read or write briefs.
252
+
253
+ **Files (per agent):**
254
+ - Planner: `$RND_DIR/briefs/plan-briefs.md`
255
+ - Builder / Debugger: `$RND_DIR/briefs/T<id>-briefs.md`
256
+ - Integrator: `$RND_DIR/briefs/wave-<N>-briefs.md`
257
+
258
+ All brief files are append-only. Use the Read tool to load existing content, then Write the concatenated result. Never delete prior entries. `mkdir -p "$RND_DIR/briefs"` before first write.
259
+
260
+ **When to append a brief entry:**
261
+ - **On phase completion (always):** one entry summarizing what was built/decided/integrated, surprising findings, unverified assumptions, anything the user should know.
262
+ - **Mid-phase, on a non-trivial judgment call:** one entry capturing the choice in plain language. Pair (do not replace) with the structured `decisions.md` entry.
263
+
264
+ Skip briefs for routine micro-steps, green-tests status, or anything the user can read off the diff or manifest. Signal, not noise.
265
+
266
+ **Entry template:**
267
+
268
+ ```markdown
269
+ ## [ISO timestamp] — <Phase> <T<id>|wave-<N>>: [decision|completion] — [short title]
270
+
271
+ [One paragraph in plain language. What changed, why it matters, what the user should know. Avoid pipeline internals. If there is an unverified assumption or surprising finding, surface it here.]
272
+ ```
273
+
274
+ **Notify the orchestrator** after each brief append by including the brief context in your final response text:
275
+
276
+ ```
277
+ [user-brief] <context>: <short title> — see <file path>
278
+ ```
279
+
280
+ The orchestrator reads the latest entry and surfaces it to user chat. The orchestrator MUST NOT forward brief content into any Verifier spawn prompt — the hook layer also enforces this mechanically by blocking `/briefs/` reads when the agent is the verifier.
281
+
282
+ ## Decisions Log
283
+
284
+ Persistent, append-only record of non-trivial judgment calls shared across Planner, Builder, Debugger, and Integrator. Survives past the chat transcript so the "why we chose X" thread remains discoverable.
285
+
286
+ **File:** `$RND_DIR/briefs/decisions.md` (append-only — Read existing content, then Write the concatenated result; never delete prior entries).
287
+
288
+ **When to log an entry:**
289
+ - Architectural fork between meaningfully different approaches (not surface variations).
290
+ - Scope cut (deferring or rejecting a requirement).
291
+ - Library / framework / primitive choice when there were real alternatives.
292
+ - Interface-shape decision (API contract, function signature) callers will depend on.
293
+ - Non-obvious ordering or sequencing choice.
294
+ - A fork where the LLM-default was rejected in favor of something else — always log these.
295
+
296
+ **When NOT to log:** variable naming, formatting, micro-refactors within a function, following an already-specified path without divergence, decisions dictated by the pre-registration.
297
+
298
+ **Entry template:**
299
+
300
+ ```markdown
301
+ ## D<N>: [one-line title]
302
+
303
+ - **Phase:** Planning | Building T<id> | Debugging T<id> | Integration wave <N>
304
+ - **Context:** [what situation forced a choice — 1 sentence]
305
+ - **Considered:**
306
+ - A. [option name] — [tradeoff / why it could work]
307
+ - B. [option name] — [tradeoff / why it could work]
308
+ - C. [option name] (optional) — [tradeoff]
309
+ - **Chosen:** [letter + name]
310
+ - **Why:** [1-2 sentences, tied to constraints or evidence]
311
+ - **Would flip if:** [condition under which a different option becomes better]
312
+ ```
313
+
314
+ **Explicit-fork discipline:** when an agent makes a decision that qualifies, the agent's output MUST narrate the fork ("I considered A, B, C; chose A because...") before appending the entry. This forces critical thinking at the decision point instead of post-hoc justification.
@@ -0,0 +1,188 @@
1
+ ---
2
+ name: rnd-scaling
3
+ description: "Use when deciding how much R&D pipeline ceremony a task needs — scales from trivial to high-stakes (dual verification)"
4
+ user-invocable: false
5
+ effort: medium
6
+ ---
7
+
8
+ # R&D Scaling
9
+
10
+ ## Overview
11
+
12
+ The R&D pipeline scales to task complexity. A typo fix doesn't need the full pipeline ceremony. A security-critical feature does.
13
+
14
+ **Core principle:** Always use the pipeline. Scale it, don't skip it.
15
+
16
+ ## Scaling Tiers
17
+
18
+ ### Trivial (fix typo, add log line)
19
+
20
+ **Entry:** `/rnd-framework:rnd-start`
21
+ **Process:**
22
+ 1. Write a one-line pre-registration inline
23
+ 2. Spawn a Builder agent for the change
24
+ 3. Spawn a Verifier agent to check against criteria
25
+ 4. Done
26
+
27
+ **Skip:** Planner, dependency scheduling, Integrator
28
+ **Keep:** Pre-registration, verification
29
+
30
+ ### Small (<1 hour of work)
31
+
32
+ **Entry:** `/rnd-framework:rnd-start`
33
+ **Process:**
34
+ 1. Write a brief pre-registration inline
35
+ 2. Spawn a Builder agent with TDD (uses `rnd-framework:rnd-building`)
36
+ 3. Spawn a Verifier agent for independent verification
37
+ 4. Max 2 iterations
38
+
39
+ **Skip:** Planner subagent, dependency scheduling, Integrator
40
+ **Keep:** Pre-registration, TDD, independent verification
41
+
42
+ ### Medium (multiple components, 1-4 hours)
43
+
44
+ **Entry:** `/rnd-framework:rnd-start`
45
+ **Process:**
46
+ 1. Spawn `rnd-planner` for hierarchical decomposition
47
+ 2. Schedule waves with dependency analysis
48
+ 3. Spawn Builder(s) per wave
49
+ 4. Independent verification per task
50
+ 5. Integration testing per wave
51
+
52
+ **Full pipeline.** All agents, all gates.
53
+
54
+ ### Large (multi-day, many components)
55
+
56
+ **Entry:** `/rnd-framework:rnd-start`
57
+ **Process:**
58
+ 1. Full pipeline + design review gate between Plan and Schedule
59
+ 2. Sub-waves within large waves
60
+ 3. Proof Gate skipped unless explicitly requested (rarely needed)
61
+ 4. Reality Audit only for tasks with external dependencies
62
+
63
+ ### Multi-session (multiple days, independent deliverables)
64
+
65
+ **Entry:** `/rnd-framework:rnd-roadmap`
66
+ **Process:**
67
+ 1. Decompose the broad goal into milestones via the Planner in roadmap mode
68
+ 2. Each milestone = one pipeline session via `/rnd-framework:rnd-start`
69
+ 3. After each session's SHIP verdict, update roadmap.md and start the next milestone
70
+
71
+ **Verification:** Per-session — each milestone goes through the full pipeline independently
72
+
73
+ ### High-Stakes (security, financial, data integrity)
74
+
75
+ **Entry:** `/rnd-framework:rnd-start`
76
+ **Process:**
77
+ 1. Full pipeline
78
+ 2. Dual independent verification (two separate Verifiers)
79
+ 3. Adversarial verification: one Verifier specifically tries to break it
80
+ 4. Extended iteration budget (5 cycles instead of 3)
81
+
82
+ ## Decision Flow
83
+
84
+ ```
85
+ Is the task a single-line change?
86
+ -> Trivial tier
87
+
88
+ Can it be done in under an hour with clear criteria?
89
+ -> Small tier
90
+
91
+ Does it involve multiple components or files?
92
+ -> Medium tier
93
+
94
+ Will it take more than a day?
95
+ -> Large tier
96
+
97
+ Will it span multiple sessions/days with independent deliverables?
98
+ -> Multi-session tier
99
+
100
+ Could a failure cause security/financial/data harm?
101
+ -> High-stakes tier
102
+ ```
103
+
104
+ ## Verification Depth by Criticality
105
+
106
+ Orthogonal to task size, **criticality** determines how much verification effort each task receives. The Planner should annotate each task in the pre-registration with a criticality tier. The orchestrator reads this annotation to decide verification depth.
107
+
108
+ ### LOW criticality
109
+ **Examples:** Config changes, documentation updates, style fixes, renaming, adding log lines.
110
+ **Verification:** Single-judge verification. No Proof Gate. Quality tier is advisory-only.
111
+ **Rationale:** A defect that slips through here is cheap to fix. Over-verifying wastes tokens.
112
+
113
+ ### NORMAL criticality (default)
114
+ **Examples:** Standard features, bug fixes, refactors with clear scope.
115
+ **Verification:** Single-judge verification. Standard iteration budget (3).
116
+ **Rationale:** Most tasks live here. One independent judge catches the overwhelming majority of issues at a fraction of the token cost.
117
+
118
+ ### HIGH criticality
119
+ **Examples:** Security-sensitive code, data migrations, authentication changes, financial calculations, architectural decisions that constrain future work.
120
+ **Verification:** Single-judge by default. 2-judge consensus available via explicit opt-in (see below). Extended iteration budget (5). If Lean is available, invoke Proof Gate.
121
+ **Rationale:** A single judge running at the HIGH-tier model provides sufficient verification for most high-stakes tasks. Multi-judge verification is available when the user explicitly requests maximum confidence.
122
+
123
+ ### How the Planner annotates criticality
124
+
125
+ In the pre-registration document, add a `Criticality:` field:
126
+
127
+ ```
128
+ Task ID: T3
129
+ Intent: Add rate limiting to API endpoints
130
+ Criticality: HIGH
131
+ ```
132
+
133
+ If the Planner omits the field, the orchestrator defaults to NORMAL.
134
+
135
+ ### How the orchestrator applies it
136
+
137
+ | Criticality | Judges | Iteration budget | Proof Gate |
138
+ |-------------|--------|-----------------|------------|
139
+ | LOW | 1 | 2 | Skip |
140
+ | NORMAL | 1 | 3 | If available |
141
+ | HIGH | 1 (2 on opt-in) | 5 | If available |
142
+
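The iteration-budget column, combined with the NORMAL default and the wave-scoped rule described in `rnd-framework:rnd-orchestration`, can be sketched as follows. `waveIterationBudget` is a hypothetical name, and treating an empty wave as LOW is an assumption made only to keep the function total.

```typescript
type Criticality = "LOW" | "NORMAL" | "HIGH";

const BUDGET: Record<Criticality, number> = { LOW: 2, NORMAL: 3, HIGH: 5 };
const RANK: Record<Criticality, number> = { LOW: 0, NORMAL: 1, HIGH: 2 };

// Each entry is one task's Criticality field; undefined means the Planner
// omitted it, which defaults to NORMAL.
function waveIterationBudget(tasks: Array<Criticality | undefined>): number {
  let highest: Criticality = "LOW";
  for (const declared of tasks) {
    const tier = declared ?? "NORMAL";
    if (RANK[tier] > RANK[highest]) highest = tier;
  }
  // The budget is keyed by the highest-criticality task in the wave.
  return BUDGET[highest];
}
```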
143
+ ### Multi-Judge Opt-In
144
+
145
+ By default, all tasks use single-judge verification. To enable 2-judge consensus for a specific task, the user must explicitly request it when starting the pipeline:
146
+
147
+ ```
148
+ /rnd-framework:rnd-start --multi-judge <task description>
149
+ ```
150
+
151
+ Or add to the pre-registration:
152
+ ```
153
+ Task ID: T3
154
+ Intent: Add rate limiting to API endpoints
155
+ Criticality: HIGH
156
+ Verification: multi-judge
157
+ ```
158
+
159
+ When multi-judge is enabled, two independent Verifier agents run in parallel. If they disagree, a third tiebreaker judge resolves the conflict. See `rnd-framework:rnd-multi-judge` for the full consensus protocol.
160
+
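The agree-or-tiebreak rule above reduces to a few lines. This is a simplification for illustration only: `consensus` is a hypothetical helper, verdicts are compared by plain equality, and the full protocol lives in `rnd-framework:rnd-multi-judge`.

```typescript
// V is whatever verdict type the pipeline uses, for example "PASS" | "FAIL".
// The tiebreaker runs only on disagreement and receives both verdicts.
function consensus<V>(first: V, second: V, tiebreak: (a: V, b: V) => V): V {
  return first === second ? first : tiebreak(first, second);
}
```

Passing the tiebreaker as a callback keeps the third judge from being spawned when the first two already agree.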
161
+ This is the Sherlock principle: place verification effort where it matters most, not uniformly across all tasks.
162
+
163
+ ### Agent Model/Effort Routing by Criticality
164
+
165
+ Criticality drives both iteration budget (table above) and per-agent model selection. The authoritative source is `rnd-framework:rnd-orchestration` under "Dispatch Policy". The matrix below mirrors it for quick reference:
166
+
167
+ | Agent | LOW | MEDIUM | HIGH | Adaptive? |
168
+ |---|---|---|---|---|
169
+ | `rnd-planner` | opus/high | opus/high | opus/xhigh | yes |
170
+ | `rnd-verifier` | sonnet/high | opus/high | opus/xhigh | yes |
171
+ | `rnd-builder` | sonnet/high | sonnet/high | opus/high | yes |
172
+ | `rnd-debugger` | sonnet/high | sonnet/high | opus/high | yes |
173
+ | `rnd-amendment-arbiter` | opus/xhigh | opus/xhigh | opus/xhigh | no (fixed) |
174
+ | `rnd-polisher` | opus/high | opus/high | opus/high | no (per-wave, fixed) |
175
+
176
+ Key rules:
177
+ - `rnd-verifier` escalates to opus at MEDIUM and above; `rnd-planner` already runs on opus at every tier; `rnd-builder` and `rnd-debugger` escalate only at HIGH.
178
+ - `rnd-amendment-arbiter` and `rnd-polisher` are non-adaptive — they always run at opus regardless of task criticality.
179
+ - Effort is NOT per-spawn overridable; it stays at the agent's frontmatter value.
180
+
181
+ ## Anti-Pattern: Skipping the Pipeline
182
+
183
+ "This is too simple for the pipeline" is never true. The pipeline scales down to one pre-registration line and one verification check. That takes 30 seconds. Skipping it means unverified work.
184
+
185
+ ## Related Skills
186
+
187
+ - `rnd-framework:rnd-orchestration` — Full pipeline overview
188
+ - `rnd-framework:using-rnd-framework` — Available commands