pi-rnd 0.2.1
- package/LICENSE +21 -0
- package/README.md +74 -0
- package/agents/rnd-builder.md +98 -0
- package/agents/rnd-integrator.md +104 -0
- package/agents/rnd-planner.md +208 -0
- package/agents/rnd-verifier.md +164 -0
- package/dist/doctor.js +166 -0
- package/dist/doctor.js.map +1 -0
- package/dist/gates/bash-discipline.js +27 -0
- package/dist/gates/bash-discipline.js.map +1 -0
- package/dist/gates/read-evidence-pack.js +23 -0
- package/dist/gates/read-evidence-pack.js.map +1 -0
- package/dist/gates/registry.js +24 -0
- package/dist/gates/registry.js.map +1 -0
- package/dist/gates/rnd-dir-required.js +31 -0
- package/dist/gates/rnd-dir-required.js.map +1 -0
- package/dist/index.js +20 -0
- package/dist/index.js.map +1 -0
- package/dist/orchestrator/prompts.js +58 -0
- package/dist/orchestrator/prompts.js.map +1 -0
- package/dist/orchestrator/rnd-dir.js +20 -0
- package/dist/orchestrator/rnd-dir.js.map +1 -0
- package/dist/orchestrator/spawn.js +67 -0
- package/dist/orchestrator/spawn.js.map +1 -0
- package/dist/orchestrator/start.js +195 -0
- package/dist/orchestrator/start.js.map +1 -0
- package/dist/orchestrator/state.js +15 -0
- package/dist/orchestrator/state.js.map +1 -0
- package/dist/orchestrator/types.js +2 -0
- package/dist/orchestrator/types.js.map +1 -0
- package/docs/PI-API.md +574 -0
- package/docs/PORTING.md +105 -0
- package/package.json +57 -0
- package/skills/fp-practices/SKILL.md +128 -0
- package/skills/fp-practices/bash.md +114 -0
- package/skills/fp-practices/duckdb.md +116 -0
- package/skills/fp-practices/elixir.md +115 -0
- package/skills/fp-practices/javascript.md +119 -0
- package/skills/fp-practices/koka.md +120 -0
- package/skills/fp-practices/lean.md +120 -0
- package/skills/fp-practices/postgresql.md +120 -0
- package/skills/fp-practices/python.md +120 -0
- package/skills/fp-practices/svelte.md +114 -0
- package/skills/kiss-practices/SKILL.md +41 -0
- package/skills/kiss-practices/bash.md +70 -0
- package/skills/kiss-practices/duckdb.md +30 -0
- package/skills/kiss-practices/elixir.md +38 -0
- package/skills/kiss-practices/javascript.md +43 -0
- package/skills/kiss-practices/koka.md +34 -0
- package/skills/kiss-practices/lean.md +45 -0
- package/skills/kiss-practices/markdown.md +20 -0
- package/skills/kiss-practices/postgresql.md +31 -0
- package/skills/kiss-practices/python.md +64 -0
- package/skills/kiss-practices/svelte.md +59 -0
- package/skills/rnd-building/SKILL.md +256 -0
- package/skills/rnd-decomposition/SKILL.md +188 -0
- package/skills/rnd-experiments/SKILL.md +197 -0
- package/skills/rnd-failure-modes/SKILL.md +222 -0
- package/skills/rnd-iteration/SKILL.md +170 -0
- package/skills/rnd-orchestration/SKILL.md +314 -0
- package/skills/rnd-scaling/SKILL.md +188 -0
- package/skills/rnd-verification/SKILL.md +248 -0
- package/skills/using-rnd-framework/SKILL.md +65 -0
@@ -0,0 +1,314 @@
---
name: rnd-orchestration
description: "Use when coordinating multi-agent R&D pipeline execution — provides pipeline overview, agent roles, information barriers, and gate criteria"
user-invocable: false
effort: medium
---

# R&D Orchestration Framework

## When to activate
Activate when the user invokes any `/rnd-framework:*` command, mentions "rnd framework", or when you detect a complex multi-step coding task that would benefit from structured decomposition and verification.

## Epistemic Foundation

This is a scientific process. Treat every claim — including your own — with skepticism until proven by evidence.

- **A result is true or false.** There is no "almost true", "mostly works", or "close enough".
- **Evidence must be reproducible.** If you can't reproduce it, it doesn't count.
- **First results are hypotheses, not conclusions.** Tests passing on the first run is a data point, not proof. What about the second run? Edge cases? Adversarial inputs?
- **Disconfirmation over confirmation.** Actively try to break things. A result that survives attempts to disprove it is stronger than one you only tried to confirm.
- **No one is served by false positives.** Passing broken work is worse than blocking correct work. When in doubt, FAIL.

## Framework Overview

This framework applies the scientific method to structured coding:

| Scientific Method | Principle | Role |
|---|---|---|
| Hypothesis declaration | Pre-registration | Declare intent + success criteria BEFORE coding |
| Structured experimentation | Hierarchical decomposition | Break tasks into System → Module → Unit with paired verification |
| Blinded peer review | Independent verification | Builder and Verifier are separate — Verifier never sees Builder reasoning |
| Reproducible evidence | Evidence-based gates | No work proceeds without reproducible evidence |
| Dependency analysis | Parallel scheduling | Identify parallel vs sequential work |

## Agent Roles & Information Barriers

The framework defines 10 specialized agent roles. Dedicated agents are spawned for each role.

**Planner** — Decomposes tasks, writes pre-registration docs with testable success criteria. Uses `rnd-framework:rnd-decomposition` skill.
**Orchestrator** — Analyzes dependencies, schedules parallel waves, enforces iteration budgets. Uses `rnd-framework:rnd-orchestration` skill.
**Builder** — Writes code + tests + honest self-assessment. Uses `rnd-framework:rnd-building` skill. Does NOT verify own work.
**Proof Gate** — Attempts formal Lean 4 proofs of pre-registration criteria. Advisory — results inform the Verifier but do not block the pipeline. Skips when Lean is unavailable.
**Reality Auditor** — Adversarially verifies external service contracts (SQL schemas, HTTP endpoints, env vars, SDK behavior). Blocking — INVALID_FOUND routes the task back to the Builder before the Verifier sees it.
**Verifier** — Checks output against pre-registered criteria. Uses `rnd-framework:rnd-verification` skill. Does NOT read Builder's self-assessment (enforced by `read-gate.sh` hook). In multi-judge mode, two independent Verifiers run in parallel; if they disagree, a third **Tiebreaker** Verifier receives both reports (but never self-assessments) and issues the final verdict.
**Cleanup** — Post-verification per-task entropy reduction: dead code, orphan files, duplicate implementations, stale comments. Applies mutations in-place and rolls back automatically if re-verification breaks. Uses `rnd-framework:rnd-cleanup` skill.
**Polisher** — Wave-level cross-task seam fixer: detects cross-task duplication, naming and API drift across the wave, helpers that should be lifted to shared locations, and structural inconsistencies. Runs after all per-task cleanup completes. Applies mutations in-place and rolls back automatically if re-verification breaks. Reports written to `$RND_DIR/polish/wave-<N>-polish-report.md`.
**Integrator** — Merges verified outputs, runs integration/system tests. Uses `rnd-framework:rnd-integration` skill.
**Data Scientist** — Handles numerical analysis, financial calculations, data wiring, chart generation. Uses `rnd-framework:rnd-data-science` skill. Spawned on-demand when the task requires Julia, DuckDB, or statistical analysis.

### Critical Information Flow Rules

These barriers are what make the framework work. Violating them defeats the purpose.

- Builder → Verifier: Send code, tests, artifacts. BLOCK reasoning, self-assessment, internal notes.
- Verifier → Builder (on fail): Send actionable feedback. BLOCK suggested fixes, internal reasoning.
- The Verifier must assess work purely against the pre-registered spec.
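
The Builder → Verifier barrier above can be pictured as an allow-list filter. The following is a minimal sketch, not the package's implementation: the `BuilderSubmission` shape, its field names, and `toVerifierInput` are all hypothetical and exist only for illustration.

```typescript
// Hypothetical shape of what a Builder produces; field names are illustrative only.
interface BuilderSubmission {
  code: string[];          // paths of source files
  tests: string[];         // paths of test files
  artifacts: string[];     // other build outputs
  reasoning?: string;      // must not reach the Verifier
  selfAssessment?: string; // must not reach the Verifier
  internalNotes?: string;  // must not reach the Verifier
}

// What the Verifier is allowed to receive: code, tests, and artifacts only.
type VerifierInput = Pick<BuilderSubmission, "code" | "tests" | "artifacts">;

// Build the Verifier payload from an allow-list instead of deleting blocked
// fields, so any field added to the submission later is blocked by default.
function toVerifierInput(submission: BuilderSubmission): VerifierInput {
  return {
    code: submission.code.slice(),
    tests: submission.tests.slice(),
    artifacts: submission.artifacts.slice(),
  };
}
```

Copying allowed fields, rather than stripping forbidden ones, fails closed: a new internal field cannot leak just because nobody remembered to block it.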

## Pre-Registration Document Format

Every task must have this BEFORE any code is written:

```
Task ID: T<number>
Intent: One sentence — what and why.
Approach: Brief planned implementation.
Expected outputs: Files/functions to produce.
Success criteria:
  Correctness:
    - [ ] Specific, testable condition 1
  Quality:
    - [ ] Specific, testable condition 2
Verification level: unit | integration | system
Dependencies: [list of task IDs]
Preconditions:
  - [File/content assertion verified before build starts — omit if none]
External dependencies:
  - system: [DB | API | file | env | service]
    contract: [What is assumed about this system — schema, response shape, format, presence]
    verification: [How this will be confirmed — e.g., Read actual schema, query endpoint, inspect file sample]
fulfills: [VAL-AREA-NNN, ...]
```
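
For tooling that wants to check a pre-registration mechanically, the template above could be mirrored as a type plus a completeness check. This is a sketch under assumptions: the `PreRegistration` interface, the `checkPreRegistration` helper, and the exact ID patterns (`T<number>`, `VAL-<AREA>-<digits>`) are illustrative and not taken from the package.

```typescript
// Illustrative types mirroring some of the template fields; names are assumptions.
type VerificationLevel = "unit" | "integration" | "system";

interface PreRegistration {
  taskId: string;            // "T<number>"
  intent: string;
  expectedOutputs: string[];
  correctness: string[];     // testable conditions
  quality: string[];
  verificationLevel: VerificationLevel;
  dependencies: string[];    // task IDs
  fulfills: string[];        // VAL-AREA-NNN identifiers
}

// Return a list of problems; an empty list means the document looks complete.
function checkPreRegistration(doc: PreRegistration): string[] {
  const problems: string[] = [];
  if (!/^T\d+$/.test(doc.taskId)) problems.push("Task ID must look like T<number>");
  if (doc.intent.trim() === "") problems.push("Intent is empty");
  if (doc.expectedOutputs.length === 0) problems.push("No expected outputs");
  if (doc.correctness.length === 0) problems.push("No correctness criteria");
  if (doc.fulfills.length === 0) problems.push("fulfills is empty");
  for (const id of doc.fulfills) {
    // Assumed ID shape: VAL-<AREA>-<digits>.
    if (!/^VAL-[A-Z0-9]+-\d+$/.test(id)) problems.push("Malformed fulfills entry: " + id);
  }
  return problems;
}
```

Returning a list of problems instead of a boolean lets the caller report every gap in one pass.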

## Execution Mode

Dedicated agents are spawned for each pipeline role. The orchestrator session coordinates them, enforcing information barriers and gate criteria.

### Dispatch Policy: Criticality-Driven Model Selection

Four agents support **per-spawn model override** based on the per-task `Criticality` field in the pre-registration. Non-adaptive agents always run at their fixed model/effort regardless of criticality.

**Per-agent criticality matrix:**

| Agent | LOW | MEDIUM | HIGH | Adaptive? |
|---|---|---|---|---|
| `rnd-planner` | opus/high | opus/high | opus/xhigh | yes |
| `rnd-verifier` | sonnet/high | opus/high | opus/xhigh | yes |
| `rnd-builder` | sonnet/high | sonnet/high | opus/high | yes |
| `rnd-debugger` | sonnet/high | sonnet/high | opus/high | yes |
| `rnd-amendment-arbiter` | opus/xhigh | opus/xhigh | opus/xhigh | no (fixed) |
| `rnd-polisher` | opus/high | opus/high | opus/xhigh | no (per-wave, fixed) |

> **Note on non-adaptive agents:** `rnd-amendment-arbiter` and `rnd-polisher` always run at their listed model and effort — the criticality column shows the same value in every tier to make this explicit. Auxiliary agents not in this table (integrator, cleanup, reality-auditor, proof-gate, data-scientist) are also non-adaptive and always use their frontmatter `model:`.

**Fallback rule.** If the task has no `Criticality` field (or no pre-reg), the orchestrator does NOT override — the agent's frontmatter `model:` is used. Effort is NOT per-spawn overridable; it stays at the agent's frontmatter value.

**Granularity.** Builder/Verifier/Debugger spawns read the criticality of the specific task they are working on (per-task). Planner uses the overall task tree's max-criticality at plan time (or the user-stated complexity at `/rnd-start`).
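
The matrix and fallback rule above amount to a table lookup that yields either a model override or nothing. A minimal sketch, assuming only the model half of each `model/effort` cell matters for the override; `MODEL_BY_CRITICALITY` and `modelOverride` are hypothetical names, not part of the package.

```typescript
type Criticality = "LOW" | "MEDIUM" | "HIGH";
type Model = "sonnet" | "opus";
type AdaptiveAgent = "rnd-planner" | "rnd-verifier" | "rnd-builder" | "rnd-debugger";

// Model column of the per-agent criticality matrix, adaptive agents only.
const MODEL_BY_CRITICALITY: Record<AdaptiveAgent, Record<Criticality, Model>> = {
  "rnd-planner": { LOW: "opus", MEDIUM: "opus", HIGH: "opus" },
  "rnd-verifier": { LOW: "sonnet", MEDIUM: "opus", HIGH: "opus" },
  "rnd-builder": { LOW: "sonnet", MEDIUM: "sonnet", HIGH: "opus" },
  "rnd-debugger": { LOW: "sonnet", MEDIUM: "sonnet", HIGH: "opus" },
};

// Returns the model to pass in spawn options, or undefined to leave the
// agent's frontmatter `model:` in effect (fallback rule and non-adaptive agents).
function modelOverride(agent: string, criticality?: Criticality): Model | undefined {
  if (criticality === undefined) return undefined;
  if (!Object.prototype.hasOwnProperty.call(MODEL_BY_CRITICALITY, agent)) return undefined;
  return MODEL_BY_CRITICALITY[agent as AdaptiveAgent][criticality];
}
```

Returning `undefined` rather than a default model keeps the "do NOT override" case explicit at the call site.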

**Dispatch example:**

```typescript
// Task T7 has `Criticality: HIGH` in plan.md → spawn Builder with model="opus"
pi.events.emit("subagents:rpc:spawn", {
  requestId,
  type: "rnd-builder",
  prompt: "Task: T7\nRND_DIR: ...\n...",
  options: { description: "Build task T7", model: "opus", run_in_background: true, max_turns: 0 },
});
```

**Frontmatter defaults (used when criticality is absent OR for non-adaptive agents):**

| Agent | Default model | Effort | Adaptive? |
|---|---|---|---|
| `rnd-planner` | opus | high | yes |
| `rnd-builder` | sonnet | high | yes |
| `rnd-verifier` | sonnet | high | yes |
| `rnd-debugger` | sonnet | high | yes |
| `rnd-proof-gate` | sonnet | low | no (advisory) |
| `rnd-reality-auditor` | sonnet | low | no |
| `rnd-amendment-arbiter` | opus | xhigh | no |
| `rnd-cleanup` | sonnet | medium | no |
| `rnd-polisher` | opus | high | no |
| `rnd-integrator` | haiku | low | no |
| `rnd-data-scientist` | sonnet | medium | no |

> **Note on RND_DIR:** The orchestrator computes the artifact directory and sets `$RND_DIR` before spawning agents. Agents receive `$RND_DIR` in their prompt.

### Calibration Auto-Escalation

Before spawning any adaptive agent (planner, builder, verifier, debugger), the orchestrator checks whether auto-escalation is warranted based on recorded false-pass rate. When the rolling false-pass rate reaches 20%, the next spawn upgrades one tier automatically.

- **Promotion warranted:** set the effective tier to the next tier up, use it for model selection in the dispatch table.
- **No promotion:** use the original tier.
- **`RND_DISABLE_AUTO_ESCALATION=1`:** disables the entire mechanism.
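
The escalation rule above can be sketched as a pure function. Several details here are assumptions rather than documented behavior: the rate is taken as false passes divided by all PASS verdicts in the rolling window, HIGH is treated as the ceiling, and the `EscalationInput` shape is hypothetical.

```typescript
type Tier = "LOW" | "MEDIUM" | "HIGH";

// One tier up; HIGH is assumed to be the ceiling.
const NEXT_TIER: Record<Tier, Tier> = { LOW: "MEDIUM", MEDIUM: "HIGH", HIGH: "HIGH" };

const FALSE_PASS_THRESHOLD = 0.2; // "reaches 20%"

interface EscalationInput {
  tier: Tier;          // tier declared for the task
  falsePasses: number; // PASS verdicts in the window later found to be wrong
  totalPasses: number; // all PASS verdicts in the window (assumed denominator)
  disabled: boolean;   // true when RND_DISABLE_AUTO_ESCALATION=1
}

// Returns the tier to use for model selection on the next spawn.
function effectiveTier(input: EscalationInput): Tier {
  if (input.disabled || input.totalPasses === 0) return input.tier;
  const rate = input.falsePasses / input.totalPasses;
  return rate >= FALSE_PASS_THRESHOLD ? NEXT_TIER[input.tier] : input.tier;
}
```

Keeping the environment-variable lookup outside the function (passed in as `disabled`) makes the rule easy to test without touching process state.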

## Subagent Coordination

### Spawning Agents

All pipeline agents are spawned via PI's subagent RPC:

```typescript
pi.events.emit("subagents:rpc:spawn", {
  requestId,
  type: "rnd-builder", // agent filename stem
  prompt: "...",
  options: { description: "Build task T7", run_in_background: true, max_turns: 0 },
});
```

Subscribe to the reply channel BEFORE emitting the request. Await completion via `subagents:completed` / `subagents:failed` events filtered by agent ID.
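
Why the subscribe-first ordering matters can be shown with a toy event bus. Everything below is a stand-in: `EventBus`, `replyChannel`, and `spawnWithReply` are hypothetical, the reply-channel name is invented for the example, and PI's real event API may look different. The sketch only illustrates that a reply delivered synchronously during `emit` is lost if no handler is registered yet.

```typescript
type Handler = (payload: unknown) => void;

// Minimal synchronous event bus used as a stand-in for the real one.
class EventBus {
  private handlers: { [channel: string]: Handler[] } = {};

  // Register a handler; returns a function that unregisters it.
  on(channel: string, handler: Handler): () => void {
    const list = this.handlers[channel] ?? (this.handlers[channel] = []);
    list.push(handler);
    return () => {
      const i = list.indexOf(handler);
      if (i !== -1) list.splice(i, 1);
    };
  }

  emit(channel: string, payload: unknown): void {
    // Copy first so handlers may unsubscribe while being called.
    for (const handler of (this.handlers[channel] ?? []).slice()) handler(payload);
  }
}

// Hypothetical reply-channel naming; the real channel name is not given here.
const replyChannel = (requestId: string): string => "subagents:rpc:spawn:reply:" + requestId;

function spawnWithReply(
  bus: EventBus,
  requestId: string,
  type: string,
  onReply: (reply: unknown) => void,
): void {
  // 1. Subscribe first, so a reply delivered during emit is not missed.
  const off = bus.on(replyChannel(requestId), (reply) => {
    off();
    onReply(reply);
  });
  // 2. Only then emit the spawn request.
  bus.emit("subagents:rpc:spawn", { requestId, type, prompt: "..." });
}
```

Swapping steps 1 and 2 would silently drop the reply on any bus that dispatches synchronously, which is the failure mode the rule guards against.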

- **Planner** — decomposes tasks and writes pre-registrations
- **Builder** — implements tasks with TDD discipline
- **Verifier** — independently checks outputs against pre-registered criteria
- **Cleanup** — sweeps dead code and stale artifacts per task after PASS
- **Integrator** — merges verified outputs and runs integration tests

### Blocking Behavior

Awaiting `subagents:completed` is how the orchestrator learns a subagent is done. Do not poll `$RND_DIR` files for progress — subscribe to events.

- **Never** use `sleep` to wait for subagents
- **Never** write bash loops to check if build artifacts exist yet
- **Do** spawn multiple agents in parallel (multiple `subagents:rpc:spawn` emits in one turn) for independent tasks within a wave
- **Do** use `run_in_background: true` in spawn options if you want to continue working while agents run, then process results when notified

## Execution Phases

1. **Plan** — Run environment discovery (structured checklist scan for package manager, test framework, CI, external services, env vars, secrets). Decompose the task, write pre-registrations with `fulfills` traceability, build dependency matrix. Generate Validation Contract (numbered VAL-AREA-NNN assertions with exact evidence commands). Produce enriched plan.md with sections: Task Tree, Environment Setup, Infrastructure, Testing Strategy, Worker Guidelines, Validation Contract, Pre-Registration Documents, Dependency Matrix, Execution Schedule, Iteration Budgets. Write exploration cache to `$RND_DIR/exploration/`. In multi-agent mode, the Planner agent handles this phase.
2. **Schedule** — Create execution waves from dependency matrix. In multi-agent mode, the Orchestrator session handles scheduling directly.
3. **Build** — Work tasks in parallel within waves. Produce code + tests + self-assessment. Builder agents are spawned per task.
3.5. **Proof Gate** (advisory, conditional) — Attempt Lean 4 formal proofs for tasks with mathematical invariants. Only runs when:
   - Task has `Proof: lean` annotation in pre-registration
   - Lean is available in PATH
   Results (PROVEN/UNPROVEN) passed to Verifier. Pipeline continues regardless.

3.75. **Reality Audit** (blocking, conditional) — Run only when:
   - Task has `External dependencies` declared in pre-registration AND
   - User has not disabled via `--skip-reality-checks`
   Adversarially verifies declared external references. INVALID_FOUND routes back to build.
   If no external dependencies declared → auto-SKIPPED.
4. **Verify** — Check each task against pre-registered criteria. PASS/FAIL/ITERATE. In multi-agent mode, Verifier agents are spawned independently.
4. **Cleanup** (per task, after PASS) — Spawn a Cleanup agent for each task that passed verification. The agent detects and removes: dead functions/variables, orphan files, duplicate implementations, and stale comments. Applies mutations in-place and rolls back automatically if re-verification breaks. Reports written to `$RND_DIR/cleanup/T<id>-cleanup-report.md`. A `cleanup: rolled_back` result is not a pipeline failure.
4.5. **Polish** (wave-level, after all per-task cleanup) — Spawn ONE Polisher agent for the entire wave. The agent detects and fixes cross-task seam issues: cross-task duplication, naming and API drift across the wave, helpers that should be lifted to a shared location, and structural inconsistencies. Applies mutations in-place and rolls back if re-verification breaks. Reports written to `$RND_DIR/polish/wave-<N>-polish-report.md`. A `polish: skipped` result is not a pipeline failure.
5. **Iterate** — On FAIL, build phase gets feedback only (not fixes). Iteration budget is wave-scoped and tier-keyed (LOW=2, NORMAL=3, HIGH=5, by highest-criticality task in the wave); see `rnd-framework:rnd-iteration` for the table. Budget exhausted → escalate.
6. **Integrate** — Merge verified outputs, run integration tests, system validation. In multi-agent mode, the Integrator agent handles this phase.
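
The Schedule phase ("create execution waves from dependency matrix") can be illustrated with a small layering pass: each wave holds every task whose dependencies all sit in earlier waves. This is a sketch of one common way to do it, not the package's scheduler; `DependencyMap` and `scheduleWaves` are hypothetical names.

```typescript
// Task ID -> IDs it depends on (the dependency matrix in adjacency form).
type DependencyMap = Record<string, string[]>;

// Group tasks into waves; tasks within a wave are independent of each other.
function scheduleWaves(deps: DependencyMap): string[][] {
  let remaining = Object.keys(deps).sort();
  const done: { [id: string]: boolean } = {};
  const waves: string[][] = [];

  while (remaining.length > 0) {
    // A task is ready when every dependency was completed in an earlier wave.
    const ready = remaining.filter((id) => deps[id].every((d) => done[d] === true));
    if (ready.length === 0) {
      throw new Error("Dependency cycle or unknown dependency among: " + remaining.join(", "));
    }
    // Mark the wave as done only after it is fully computed, so tasks that
    // depend on each other never land in the same wave.
    for (const id of ready) done[id] = true;
    remaining = remaining.filter((id) => done[id] !== true);
    waves.push(ready);
  }
  return waves;
}
```

For example, with `T3` depending on `T1` and `T2`, and `T4` depending on `T3`, the first wave contains `T1` and `T2`, the second `T3`, and the third `T4`.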

## Gate Criteria

**Gate 1 (post-plan):** Every task has complete pre-registration with testable criteria, `fulfills` field linking to VAL assertions, and all Validation Contract assertions are covered.
**Gate 2 (post-build):** Code + tests + artifacts submitted. Tests pass locally.
**Gate 2.5 (post-reality-audit):** Reality Audit complete for every task in the wave. Any INVALID verdict blocks pipeline progression for that task — it must return to build before proceeding to verification.
**Gate 3 (post-verify):** Verification PASS on all criteria with evidence.
**Gate 4 (post-integrate):** Integration tests pass. No regressions. System validation passes.

## Task Status Determination

Task status is derived from artifact files — no separate state file is needed. At each gate, check:

| Artifact exists? | Status |
|-----------------|--------|
| `$RND_DIR/integration/wave-<N>-report.md` contains SHIP | integrated |
| `$RND_DIR/verifications/T<id>-verification.md` contains `Overall Verdict: PASS` | verified |
| `$RND_DIR/verifications/T<id>-verification.md` contains NEEDS_ITERATION | iterating |
| `$RND_DIR/builds/T<id>-manifest.md` exists and is non-empty | built |
| Task in plan.md but no build artifact | planned |
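
Read top to bottom, the table behaves like a first-match rule list. The sketch below assumes exactly that precedence and takes file contents as input so it stays free of filesystem calls; `TaskArtifacts` and `deriveStatus` are hypothetical names, and `undefined` stands for a file that does not exist.

```typescript
type TaskStatus = "integrated" | "verified" | "iterating" | "built" | "planned";

// Contents of the relevant artifact files; undefined means the file is absent.
interface TaskArtifacts {
  integrationReport?: string; // $RND_DIR/integration/wave-<N>-report.md
  verification?: string;      // $RND_DIR/verifications/T<id>-verification.md
  manifest?: string;          // $RND_DIR/builds/T<id>-manifest.md
}

const contains = (text: string | undefined, needle: string): boolean =>
  text !== undefined && text.indexOf(needle) !== -1;

// First matching row wins, in the same order as the table.
function deriveStatus(a: TaskArtifacts): TaskStatus {
  if (contains(a.integrationReport, "SHIP")) return "integrated";
  if (contains(a.verification, "Overall Verdict: PASS")) return "verified";
  if (contains(a.verification, "NEEDS_ITERATION")) return "iterating";
  if (a.manifest !== undefined && a.manifest.length > 0) return "built";
  return "planned";
}
```

Because status is recomputed from artifacts on every call, there is no separate state that can drift out of sync with the files.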

**At each gate**, validate the expected artifact exists and is non-empty (use Bash `test -s`). If missing, notify the user via `ctx.ui.notify(text, level)` and do not proceed with that task.

**Always use pipeline IDs in user-facing output.** When displaying task references, blocked-by relationships, or status updates, always use `T<n>` pipeline IDs.

**Before scheduling each wave**, scan `$RND_DIR/builds/` and `$RND_DIR/verifications/` to determine which tasks are complete. Skip tasks that already have the expected artifacts for the current phase.

## User Decision Points

When a phase completes and the user needs to decide what happens next, surface options via `ctx.ui.notify(text, level)` and read the user's typed response from the next turn. Present 2-4 concrete options with action-oriented labels instead of open-ended text.

Rules:
- Always include 2-4 concrete options
- Mark the recommended option first with "(Recommended)" in the label
- Use short, action-oriented labels (e.g., "Fix P0 blockers first", "Verify wave-1", "Re-plan T3")
- Put context alongside the options, not in the label

Common decision points:
- **Post-plan:** "Approve plan", "Revise criteria for T2", "Add more tasks"
- **Post-build:** "Verify this wave", "Re-build T3", "Review findings first"
- **Post-verify (mixed results):** "Fix P0 issues first (Recommended)", "Fix all issues", "Ship as-is with known issues"
- **Post-integrate:** "Ship it", "Run another verification pass", "Fix integration failures"

## Scaling Rules

- **Small tasks (<1hr):** Collapse — one Builder + one Verifier (single judge). Lightweight pre-registration.
- **Medium tasks:** Full framework with parallel waves. Use 2-judge consensus verification per task.
- **Large tasks (multi-day):** Add design review gate between Plan and Schedule. Add sub-waves. Use 2-judge consensus verification.
- **Exploratory:** Add Phase 0 — spike 2-3 approaches with time-box before committing.
- **High-stakes:** Multi-judge verification (2 judges + tiebreaker on disagreement). Add formal invariants via Proof Gate.
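
The "2 judges + tiebreaker on disagreement" rule can be sketched as follows. The `VerifierReport` shape and `resolveVerdict` are hypothetical; the tiebreaker is modeled as a callback that receives both reports and is only invoked when the two verdicts differ.

```typescript
type Verdict = "PASS" | "FAIL" | "ITERATE";

// Illustrative report shape; a real report would carry evidence per criterion.
interface VerifierReport {
  judge: string;
  verdict: Verdict;
  findings: string;
}

// Two independent verdicts; the tiebreaker sees both reports (never the
// Builder's self-assessment) and is consulted only on disagreement.
function resolveVerdict(
  a: VerifierReport,
  b: VerifierReport,
  tiebreak: (reports: [VerifierReport, VerifierReport]) => Verdict,
): Verdict {
  if (a.verdict === b.verdict) return a.verdict;
  return tiebreak([a, b]);
}
```

Calling the tiebreaker lazily means the third Verifier is only spawned, and only paid for, when the first two actually disagree.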

## User-Facing Briefs

Briefs are user-facing narratives — plain-language updates the user sees in real time while a non-verifier agent works in the background. They live under `$RND_DIR/briefs/` which is mechanically blocked from Verifier agents via the three PreToolUse gate hooks (`hooks/read-gate.sh`, `hooks/glob-grep-gate.sh`, `hooks/bash-gate.sh`). Only Planner, Builder, Debugger, Integrator, and the orchestrator may read or write briefs.

**Files (per agent):**
- Planner: `$RND_DIR/briefs/plan-briefs.md`
- Builder / Debugger: `$RND_DIR/briefs/T<id>-briefs.md`
- Integrator: `$RND_DIR/briefs/wave-<N>-briefs.md`

All brief files are append-only. Use the Read tool to load existing content, then Write the concatenated result. Never delete prior entries. `mkdir -p "$RND_DIR/briefs"` before first write.

**When to append a brief entry:**
- **On phase completion (always):** one entry summarizing what was built/decided/integrated, surprising findings, unverified assumptions, anything the user should know.
- **Mid-phase, on a non-trivial judgment call:** one entry capturing the choice in plain language. Pair (do not replace) with the structured `decisions.md` entry.

Skip briefs for routine micro-steps, green-tests status, or anything the user can read off the diff or manifest. Signal, not noise.

**Entry template:**

```markdown
## [ISO timestamp] — <Phase> <T<id>|wave-<N>>: [decision|completion] — [short title]

[One paragraph in plain language. What changed, why it matters, what the user should know. Avoid pipeline internals. If there is an unverified assumption or surprising finding, surface it here.]
```
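
The entry template and the append-only rule (read existing content, write the concatenation) can be sketched as two pure string functions. `BriefEntry`, `formatBrief`, and `appendBrief` are hypothetical names; reading and writing the file is left to the caller.

```typescript
interface BriefEntry {
  timestamp: string;               // ISO 8601
  phase: string;                   // e.g. "Build"
  scope: string;                   // "T<id>" or "wave-<N>"
  kind: "decision" | "completion";
  title: string;
  body: string;                    // one plain-language paragraph
}

// Render one entry following the template's heading line and paragraph.
function formatBrief(e: BriefEntry): string {
  return `## ${e.timestamp} — ${e.phase} ${e.scope}: ${e.kind} — ${e.title}\n\n${e.body.trim()}\n`;
}

// Append-only: the existing content is always a prefix of the result.
function appendBrief(existing: string, e: BriefEntry): string {
  const head = existing.length === 0 || existing.slice(-1) === "\n" ? existing : existing + "\n";
  const separator = head.length === 0 ? "" : "\n";
  return head + separator + formatBrief(e);
}
```

Because `appendBrief` never rewrites `existing`, prior entries survive by construction rather than by convention.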

**Notify the orchestrator** after each brief append by including the brief context in your final response text:

```
[user-brief] <context>: <short title> — see <file path>
```

The orchestrator reads the latest entry and surfaces it to user chat. The orchestrator MUST NOT forward brief content into any Verifier spawn prompt — the hook layer also enforces this mechanically by blocking `/briefs/` reads when the agent is the verifier.

## Decisions Log

Persistent, append-only record of non-trivial judgment calls shared across Planner, Builder, Debugger, and Integrator. Survives past the chat transcript so the "why we chose X" thread remains discoverable.

**File:** `$RND_DIR/briefs/decisions.md` (append-only — Read existing content, then Write the concatenated result; never delete prior entries).

**When to log an entry:**
- Architectural fork between meaningfully different approaches (not surface variations).
- Scope cut (deferring or rejecting a requirement).
- Library / framework / primitive choice when there were real alternatives.
- Interface-shape decision (API contract, function signature) callers will depend on.
- Non-obvious ordering or sequencing choice.
- A fork where the LLM-default was rejected in favor of something else — always log these.

**When NOT to log:** variable naming, formatting, micro-refactors within a function, following an already-specified path without divergence, decisions dictated by the pre-registration.

**Entry template:**

```markdown
## D<N>: [one-line title]

- **Phase:** Planning | Building T<id> | Debugging T<id> | Integration wave <N>
- **Context:** [what situation forced a choice — 1 sentence]
- **Considered:**
  - A. [option name] — [tradeoff / why it could work]
  - B. [option name] — [tradeoff / why it could work]
  - C. [option name] (optional) — [tradeoff]
- **Chosen:** [letter + name]
- **Why:** [1-2 sentences, tied to constraints or evidence]
- **Would flip if:** [condition under which a different option becomes better]
```

**Explicit-fork discipline:** when an agent makes a decision that qualifies, the agent's output MUST narrate the fork ("I considered A, B, C; chose A because...") before appending the entry. This forces critical thinking at the decision point instead of post-hoc justification.
@@ -0,0 +1,188 @@
---
name: rnd-scaling
description: "Use when deciding how much R&D pipeline ceremony a task needs — scales from trivial to high-stakes (dual verification)"
user-invocable: false
effort: medium
---

# R&D Scaling

## Overview

The R&D pipeline scales to task complexity. A typo fix doesn't need the full pipeline ceremony. A security-critical feature does.

**Core principle:** Always use the pipeline. Scale it, don't skip it.

## Scaling Tiers

### Trivial (fix typo, add log line)

**Entry:** `/rnd-framework:rnd-start`
**Process:**
1. Write a one-line pre-registration inline
2. Spawn a Builder agent for the change
3. Spawn a Verifier agent to check against criteria
4. Done

**Skip:** Planner, dependency scheduling, Integrator
**Keep:** Pre-registration, verification

### Small (<1 hour of work)

**Entry:** `/rnd-framework:rnd-start`
**Process:**
1. Write a brief pre-registration inline
2. Spawn a Builder agent with TDD (uses `rnd-framework:rnd-building`)
3. Spawn a Verifier agent for independent verification
4. Max 2 iterations

**Skip:** Planner subagent, dependency scheduling, Integrator
**Keep:** Pre-registration, TDD, independent verification

### Medium (multiple components, 1-4 hours)

**Entry:** `/rnd-framework:rnd-start`
**Process:**
1. Spawn `rnd-planner` for hierarchical decomposition
2. Schedule waves with dependency analysis
3. Spawn Builder(s) per wave
4. Independent verification per task
5. Integration testing per wave

**Full pipeline.** All agents, all gates.

### Large (multi-day, many components)

**Entry:** `/rnd-framework:rnd-start`
**Process:**
1. Full pipeline + design review gate between Plan and Schedule
2. Sub-waves within large waves
3. Proof Gate skipped unless explicitly requested (rarely needed)
4. Reality Audit only for tasks with external dependencies

### Multi-session (multiple days, independent deliverables)

**Entry:** `/rnd-framework:rnd-roadmap`
**Process:**
1. Decompose the broad goal into milestones via the Planner in roadmap mode
2. Each milestone = one pipeline session via `/rnd-framework:rnd-start`
3. After each session's SHIP verdict, update roadmap.md and start the next milestone

**Verification:** Per-session — each milestone goes through the full pipeline independently

### High-Stakes (security, financial, data integrity)

**Entry:** `/rnd-framework:rnd-start`
**Process:**
1. Full pipeline
2. Dual independent verification (two separate Verifiers)
3. Adversarial verification: one Verifier specifically tries to break it
4. Extended iteration budget (5 cycles instead of 3)

## Decision Flow

```
Is the task a single-line change?
  -> Trivial tier

Can it be done in under an hour with clear criteria?
  -> Small tier

Does it involve multiple components or files?
  -> Medium tier

Will it take more than a day?
  -> Large tier

Will it span multiple sessions/days with independent deliverables?
  -> Multi-session tier

Could a failure cause security/financial/data harm?
  -> High-stakes tier
```

## Verification Depth by Criticality

Orthogonal to task size, **criticality** determines how much verification effort each task receives. The Planner should annotate each task in the pre-registration with a criticality tier. The orchestrator reads this annotation to decide verification depth.

### LOW criticality
**Examples:** Config changes, documentation updates, style fixes, renaming, adding log lines.
**Verification:** Single-judge verification. No Proof Gate. Quality tier is advisory-only.
**Rationale:** False negatives here are cheap to fix. Over-verifying wastes tokens.

### NORMAL criticality (default)
**Examples:** Standard features, bug fixes, refactors with clear scope.
**Verification:** Single-judge verification. Standard iteration budget (3).
**Rationale:** Most tasks live here. One independent judge catches the overwhelming majority of issues at a fraction of the token cost.

### HIGH criticality
**Examples:** Security-sensitive code, data migrations, authentication changes, financial calculations, architectural decisions that constrain future work.
**Verification:** Single-judge by default. 2-judge consensus available via explicit opt-in (see below). Extended iteration budget (5). If Lean is available, invoke Proof Gate.
**Rationale:** Sonnet at high effort provides sufficient verification for most high-stakes tasks. Multi-judge available when user explicitly requests maximum confidence.

### How the Planner annotates criticality

In the pre-registration document, add a `Criticality:` field:

```
Task ID: T3
Intent: Add rate limiting to API endpoints
Criticality: HIGH
```

If the Planner omits the field, the orchestrator defaults to NORMAL.
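
A minimal sketch of reading that field, assuming the pre-registration is plain `Key: value` text as in the example. The helper names are hypothetical, and rejecting unknown values is an assumption; the document only specifies the NORMAL default for an omitted field.

```python
VALID_CRITICALITIES = ("LOW", "NORMAL", "HIGH")


def parse_preregistration(text: str) -> dict:
    """Split 'Key: value' lines into a dict; lines without a colon are ignored."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def criticality_of(text: str) -> str:
    """Return the annotated criticality, defaulting to NORMAL when omitted."""
    value = parse_preregistration(text).get("Criticality", "NORMAL").upper()
    if value not in VALID_CRITICALITIES:
        # Assumption: an unrecognised tier is treated as an error, not a default.
        raise ValueError(f"unknown criticality: {value!r}")
    return value
```

For the example above, `criticality_of` returns `"HIGH"`; dropping the `Criticality:` line yields `"NORMAL"`.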

### How the orchestrator applies it

| Criticality | Judges | Iteration budget | Proof Gate |
|-------------|--------|------------------|------------|
| LOW | 1 | 2 | Skip |
| NORMAL | 1 | 3 | If available |
| HIGH | 1 (2 on opt-in) | 5 | If available |
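
The table can be read as a lookup keyed by criticality. The sketch below is illustrative only: `VerificationPlan` and `plan_for` are hypothetical names, and applying the opt-in only at HIGH follows the table row rather than any confirmed orchestrator behaviour.

```python
from typing import NamedTuple


class VerificationPlan(NamedTuple):
    judges: int
    iteration_budget: int
    proof_gate: str  # "skip" or "if available", as in the table


# One entry per table row; values copied from the table.
PLANS = {
    "LOW": VerificationPlan(judges=1, iteration_budget=2, proof_gate="skip"),
    "NORMAL": VerificationPlan(judges=1, iteration_budget=3, proof_gate="if available"),
    "HIGH": VerificationPlan(judges=1, iteration_budget=5, proof_gate="if available"),
}


def plan_for(criticality: str = "NORMAL", multi_judge: bool = False) -> VerificationPlan:
    """Look up the verification plan; the default mirrors the NORMAL fallback."""
    plan = PLANS[criticality]
    if multi_judge and criticality == "HIGH":
        # Assumption: the opt-in raises the judge count only where the table lists it.
        plan = plan._replace(judges=2)
    return plan
```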

### Multi-Judge Opt-In

By default, all tasks use single-judge verification. To enable 2-judge consensus for a specific task, the user must explicitly request it when starting the pipeline:

```
/rnd-framework:rnd-start --multi-judge <task description>
```

Or add a `Verification:` field to the pre-registration:
```
Task ID: T3
Intent: Add rate limiting to API endpoints
Criticality: HIGH
Verification: multi-judge
```

When multi-judge is enabled, two independent Verifier agents run in parallel. If they disagree, a third tiebreaker judge resolves the conflict. See `rnd-framework:rnd-multi-judge` for the full consensus protocol.
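
The consensus step can be sketched as follows. This is not the protocol from `rnd-framework:rnd-multi-judge`, only a simplified illustration: judges are modelled as plain callables returning a verdict string, threads stand in for parallel agents, and the tiebreaker's verdict is assumed to be final. Any verdict name other than SHIP is hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# A judge takes a task ID and returns a verdict string such as "SHIP".
Judge = Callable[[str], str]


def multi_judge_verdict(task_id: str, judge_a: Judge, judge_b: Judge,
                        tiebreaker: Judge) -> str:
    """Run two judges in parallel; consult the tiebreaker only on disagreement."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        future_a = pool.submit(judge_a, task_id)
        future_b = pool.submit(judge_b, task_id)
        verdict_a, verdict_b = future_a.result(), future_b.result()
    if verdict_a == verdict_b:
        return verdict_a
    # Assumption: the third judge's verdict settles the conflict outright.
    return tiebreaker(task_id)
```

The tiebreaker is never invoked when the first two verdicts agree, so the third judge's cost is paid only on conflict.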

This is the Sherlock principle: place verification effort where it matters most, not uniformly across all tasks.

### Agent Model/Effort Routing by Criticality

Criticality drives both iteration budget (table above) and per-agent model selection. The authoritative source is `rnd-framework:rnd-orchestration` under "Dispatch Policy". The matrix below mirrors it for quick reference:

| Agent | LOW | MEDIUM | HIGH | Adaptive? |
|---|---|---|---|---|
| `rnd-planner` | opus/high | opus/high | opus/xhigh | yes |
| `rnd-verifier` | sonnet/high | opus/high | opus/xhigh | yes |
| `rnd-builder` | sonnet/high | sonnet/high | opus/high | yes |
| `rnd-debugger` | sonnet/high | sonnet/high | opus/high | yes |
| `rnd-amendment-arbiter` | opus/xhigh | opus/xhigh | opus/xhigh | no (fixed) |
| `rnd-polisher` | opus/high | opus/high | opus/xhigh | no (per-wave, fixed) |

Key rules:
- `rnd-verifier` escalates to opus at MEDIUM and above; `rnd-builder` and `rnd-debugger` escalate to opus only at HIGH; `rnd-planner` is on opus at every tier in the matrix.
- `rnd-amendment-arbiter` and `rnd-polisher` are non-adaptive: they always run at opus regardless of task criticality.
- Effort is NOT per-spawn overridable; it stays at the agent's frontmatter value.
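
For quick experimentation, the matrix can be held as a nested lookup. This is a sketch, not the dispatch code from `rnd-framework:rnd-orchestration`: the `route` helper is a hypothetical name, and mapping the NORMAL tier onto the matrix's MEDIUM column is an assumption, since the two tables use different labels for the middle tier.

```python
# (model, effort) per agent and criticality column, copied from the matrix.
ROUTING = {
    "rnd-planner": {"LOW": ("opus", "high"), "MEDIUM": ("opus", "high"), "HIGH": ("opus", "xhigh")},
    "rnd-verifier": {"LOW": ("sonnet", "high"), "MEDIUM": ("opus", "high"), "HIGH": ("opus", "xhigh")},
    "rnd-builder": {"LOW": ("sonnet", "high"), "MEDIUM": ("sonnet", "high"), "HIGH": ("opus", "high")},
    "rnd-debugger": {"LOW": ("sonnet", "high"), "MEDIUM": ("sonnet", "high"), "HIGH": ("opus", "high")},
    "rnd-amendment-arbiter": {"LOW": ("opus", "xhigh"), "MEDIUM": ("opus", "xhigh"), "HIGH": ("opus", "xhigh")},
    "rnd-polisher": {"LOW": ("opus", "high"), "MEDIUM": ("opus", "high"), "HIGH": ("opus", "xhigh")},
}

# Assumption: the NORMAL criticality tier corresponds to the MEDIUM column.
TIER_ALIASES = {"NORMAL": "MEDIUM"}


def route(agent: str, criticality: str) -> tuple:
    """Return the (model, effort) pair the matrix lists for an agent and tier."""
    column = TIER_ALIASES.get(criticality, criticality)
    return ROUTING[agent][column]


def always_opus(agent: str) -> bool:
    """True when the matrix lists opus for the agent at every tier."""
    return all(model == "opus" for model, _effort in ROUTING[agent].values())
```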

## Anti-Pattern: Skipping the Pipeline

"This is too simple for the pipeline" is never true. The pipeline scales down to one pre-registration line and one verification check. That takes 30 seconds. Skipping it means unverified work.

## Related Skills

- `rnd-framework:rnd-orchestration`: Full pipeline overview
- `rnd-framework:using-rnd-framework`: Available commands