pi-rnd 0.2.1
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/LICENSE +21 -0
- package/README.md +74 -0
- package/agents/rnd-builder.md +98 -0
- package/agents/rnd-integrator.md +104 -0
- package/agents/rnd-planner.md +208 -0
- package/agents/rnd-verifier.md +164 -0
- package/dist/doctor.js +166 -0
- package/dist/doctor.js.map +1 -0
- package/dist/gates/bash-discipline.js +27 -0
- package/dist/gates/bash-discipline.js.map +1 -0
- package/dist/gates/read-evidence-pack.js +23 -0
- package/dist/gates/read-evidence-pack.js.map +1 -0
- package/dist/gates/registry.js +24 -0
- package/dist/gates/registry.js.map +1 -0
- package/dist/gates/rnd-dir-required.js +31 -0
- package/dist/gates/rnd-dir-required.js.map +1 -0
- package/dist/index.js +20 -0
- package/dist/index.js.map +1 -0
- package/dist/orchestrator/prompts.js +58 -0
- package/dist/orchestrator/prompts.js.map +1 -0
- package/dist/orchestrator/rnd-dir.js +20 -0
- package/dist/orchestrator/rnd-dir.js.map +1 -0
- package/dist/orchestrator/spawn.js +67 -0
- package/dist/orchestrator/spawn.js.map +1 -0
- package/dist/orchestrator/start.js +195 -0
- package/dist/orchestrator/start.js.map +1 -0
- package/dist/orchestrator/state.js +15 -0
- package/dist/orchestrator/state.js.map +1 -0
- package/dist/orchestrator/types.js +2 -0
- package/dist/orchestrator/types.js.map +1 -0
- package/docs/PI-API.md +574 -0
- package/docs/PORTING.md +105 -0
- package/package.json +57 -0
- package/skills/fp-practices/SKILL.md +128 -0
- package/skills/fp-practices/bash.md +114 -0
- package/skills/fp-practices/duckdb.md +116 -0
- package/skills/fp-practices/elixir.md +115 -0
- package/skills/fp-practices/javascript.md +119 -0
- package/skills/fp-practices/koka.md +120 -0
- package/skills/fp-practices/lean.md +120 -0
- package/skills/fp-practices/postgresql.md +120 -0
- package/skills/fp-practices/python.md +120 -0
- package/skills/fp-practices/svelte.md +114 -0
- package/skills/kiss-practices/SKILL.md +41 -0
- package/skills/kiss-practices/bash.md +70 -0
- package/skills/kiss-practices/duckdb.md +30 -0
- package/skills/kiss-practices/elixir.md +38 -0
- package/skills/kiss-practices/javascript.md +43 -0
- package/skills/kiss-practices/koka.md +34 -0
- package/skills/kiss-practices/lean.md +45 -0
- package/skills/kiss-practices/markdown.md +20 -0
- package/skills/kiss-practices/postgresql.md +31 -0
- package/skills/kiss-practices/python.md +64 -0
- package/skills/kiss-practices/svelte.md +59 -0
- package/skills/rnd-building/SKILL.md +256 -0
- package/skills/rnd-decomposition/SKILL.md +188 -0
- package/skills/rnd-experiments/SKILL.md +197 -0
- package/skills/rnd-failure-modes/SKILL.md +222 -0
- package/skills/rnd-iteration/SKILL.md +170 -0
- package/skills/rnd-orchestration/SKILL.md +314 -0
- package/skills/rnd-scaling/SKILL.md +188 -0
- package/skills/rnd-verification/SKILL.md +248 -0
- package/skills/using-rnd-framework/SKILL.md +65 -0
package/skills/rnd-failure-modes/SKILL.md
@@ -0,0 +1,222 @@
---
name: rnd-failure-modes
description: Use when verifying work or reviewing your own reasoning — a catalog of failure modes and anti-patterns that cause false PASSes, missed defects, and broken quality gates
user-invocable: false
effort: low
---

# R&D Failure Modes

## Overview

A catalog of known verification failure modes — anti-patterns that cause agents to issue false PASSes, miss real defects, or abandon quality standards under pressure. Scan this catalog before writing any verdict. The goal is to catch your own reasoning failures before they propagate downstream.

**Core principle:** If you recognize one of these patterns in your own thinking, stop and correct course. Naming the failure mode is the first step to avoiding it.

## When to Use

- Before writing any PASS, FAIL, or NEEDS ITERATION verdict
- When you notice yourself wanting to be done more than wanting to be right
- When reviewing your own reasoning during verification
- When an iteration cycle feels like it should be over but the evidence is thin
- When a Builder's claim sounds plausible and you haven't verified it independently

**Do not use this catalog to**: second-guess legitimate PASSes backed by strong evidence. Its purpose is to surface rationalization, not to manufacture doubt.

---

## Failure Mode Catalog

These are the known failure modes this framework has encountered. Each entry includes how the failure manifests and what correct behavior looks like.

### 1. Premature Satisfaction

**How it manifests:** You read the code, it looks reasonable, and you write PASS without running tests or tracing execution. The "seems fine" feeling replaces evidence. You may say things like "the implementation clearly handles this case."

**Correct behavior:** Every criterion requires concrete, independently produced evidence — test output you ran yourself, code line references with traced execution paths. "Looks right" is not evidence. Run it. Break it. Trace it.

---

### 2. Trusting Agent Reports

**How it manifests:** The Builder's manifest says "all tests pass" and you accept it. You check whether the claim was made, not whether it is true. Verification becomes reading a report about verification rather than doing verification.

**Correct behavior:** Run tests yourself. Read what the tests actually assert — not just that they exist. An agent claiming tests pass does not make them pass, and a test that asserts the wrong thing can pass while the criterion remains unmet.
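That last point is worth making concrete. A minimal sketch of the trap (hypothetical `sortAscending` function and test, not from this package): the test's intent is ordering, but its assertion only checks length, so it stays green against a broken implementation.

```ts
import { strict as assert } from "node:assert";

// Buggy implementation: copies the array, never sorts it.
function sortAscending(xs: number[]): number[] {
  return [...xs];
}

// Intended criterion: "output is in ascending order".
// The assertion below only checks length, so it passes anyway.
const out = sortAscending([3, 1, 2]);
assert.equal(out.length, 3); // green, yet the criterion is unmet
```

Reading the assertion itself, rather than the test name or its green status, is what catches this.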
---

### 3. Should-Work-Now Fallacy

**How it manifests:** After seeing a fix applied, you reason forward: "the bug was X, they fixed X, therefore it works now." You skip re-running the tests because the fix looks right.

**Correct behavior:** Re-run the tests. Fixes introduce regressions. The logical chain "fix looks correct → criterion is met" is not a substitute for execution evidence. The test suite tells you what actually happened.

---

### 4. Anchoring on Builder Self-Assessment

**How it manifests:** You read (or recall) the Builder's self-assessment — their confidence levels, their "known issues" framing — and your verification becomes confirming or refuting their claims rather than independently evaluating the spec. Your findings track the Builder's narrative.

**Correct behavior:** The information barrier exists for this reason. Self-assessment files are blocked by hooks. If you have read one, discard everything you learned from it and restart verification from the pre-registration and artifacts only.

---

### 5. Incomplete Verification

**How it manifests:** You verify 4 of 5 criteria and write a verdict. The 5th criterion was "minor" or "obviously fine" or you ran out of time. You issue PASS or NEEDS ITERATION without evidence for every criterion.

**Correct behavior:** Every criterion listed in the pre-registration gets a verdict with evidence. If you lack evidence for any criterion, go back and produce it before writing the report. An incomplete report is a verification failure — it burns an iteration cycle and sends the pipeline forward with untested assumptions.

---

### 6. Exit Velocity Bias

**How it manifests:** You want to finish. The work looks good. You become motivated to find reasons to PASS rather than reasons to look harder. Your failure mode analysis becomes cursory. You stop probing before you've actually tried to break anything.

**Correct behavior:** The desire to be done is not evidence. Failure mode analysis that "reveals no issues" because you stopped early is not a clean bill of health. If the task is important enough to build, it is important enough to probe properly.

---

### 7. Scope Creep in Verification

**How it manifests:** You test beyond the pre-registered criteria. You find problems that weren't in scope and issue FAIL based on them. Or you invent quality standards not present in the pre-registration and mark criteria as unmet because of style, elegance, or unstated requirements.

**Correct behavior:** Your reference is the pre-registration, nothing else. Criteria are binary and fixed — met or not met against the spec. Note out-of-scope observations separately if useful, but they do not influence the verdict.

---

### 8. Partial Fix Acceptance

**How it manifests:** The Builder fixes the primary failure and resubmits. You check the primary failure is resolved and issue PASS, forgetting that the previous report flagged multiple failures. The other failures are still present but you didn't re-examine them.

**Correct behavior:** When verifying an iteration, re-check every previously failed criterion, not just the one the Builder addressed. Builders sometimes fix one thing and inadvertently break another, or address only the loudest failure and leave others.

---

### 9. False Precision in Evidence

**How it manifests:** You cite a line number as evidence but haven't traced what the code does at that line. You mention a test name as evidence but haven't read what it asserts. The evidence looks specific but is not actually verified.

**Correct behavior:** Evidence must be the result of active verification: code you read and understood, tests you ran and whose output you recorded, execution paths you traced. Citing identifiers without understanding is not evidence — it is the appearance of evidence.

---

### 10. Verbal PASS

**How it manifests:** You express satisfaction about the work in your reasoning ("this is well-structured", "the implementation is clean") and then write PASS. The compliments contaminate the verdict — the positive framing becomes the evidence.

**Correct behavior:** Aesthetic judgments are not criteria. Separate qualitative impressions from criterion verdicts. The only question is: does the artifact meet each pre-registered criterion? Evidence answers that question; positive impressions do not.

---

### 11. Deflection

**How it manifests:** You identify a problem but dismiss it as "pre-existing", "by design", or "not in scope" rather than reporting it. You rationalize that because the issue predates this change, or was intended, it is exempt from your verdict. The issue goes unreported and unfixed.

**Correct behavior:** Every finding must include a proposed fix. Never dismiss a finding as "pre-existing", "by design", or "not in scope" without citing specific documentation that justifies the exception. If an issue exists in the code, it is a finding regardless of when it was introduced.

---

### 12. Pipeline Ceremony Shortcut

**How it manifests:** The task looks simple — a config change, a one-line fix, a documentation update — so you skip phases. You build inline without planning, skip verification because "it's obvious," or commit without integration testing. The framework's scaling tiers exist for this exact pressure, but you bypass them.

**Correct behavior:** Even trivial tasks get at minimum a pre-registration line and single-judge verification. The rnd-scaling skill defines the minimum ceremony for each tier. "Too simple to verify" is the pipeline equivalent of "too simple to test."

---

### 13. Attention Decay Drift

**How it manifests:** Deep into a long session, you start ignoring pre-registration criteria, deviating from the planned approach, or forgetting constraints established earlier. Your outputs gradually drift from the original requirements. You may not notice because the drift is gradual — each step seems locally reasonable.

**Correct behavior:** Use SCAN re-anchoring (output a compliance statement before each criterion). Read the pre-registration again — not from memory, from the file. If context has been compacted, re-read `$RND_DIR/plan.md`. Research shows system prompt tokens command only ~1% of attention at 80K context tokens; active re-generation restores the weight.

---

### 14. Resource Hallucination

**How it manifests:** You reference an API that doesn't exist, import a module that was never created, call a function with the wrong signature, or assume a dependency is available that isn't installed. The code looks plausible but uses phantom resources.

**Correct behavior:** Before using any API, function, or module you didn't just create: verify it exists. Read the file, grep for the export, check the package.json. The pre-registration's "External dependencies" field exists to force this verification. If you're importing something, confirm it's real.
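A sketch of that discipline in Node terms (the module name `left-pad` is a stand-in, not a dependency of this package): confirm a module is both declared and resolvable before writing code that imports it.

```ts
import { readFileSync } from "node:fs";
import { createRequire } from "node:module";

const require = createRequire(import.meta.url);

// Is the module declared in package.json?
const pkg = JSON.parse(readFileSync("package.json", "utf8"));
const declared = "left-pad" in (pkg.dependencies ?? {});

// Is it actually resolvable from here?
let resolvable = true;
try {
  require.resolve("left-pad");
} catch {
  resolvable = false;
}

console.log({ declared, resolvable }); // import only if both are true
```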
---

### 15. Mapping Hallucination

**How it manifests:** You misunderstand how code parts connect — wrong data flow, incorrect call chain, confused ownership between modules. You build something that would work if the architecture were what you imagine, but it's not. Common when working with unfamiliar codebases.

**Correct behavior:** Before implementing, trace the actual data flow and call chain in the existing code. Read the files, follow the imports, understand the relationships. The exploration cache (`$RND_DIR/exploration/`) exists for this. Don't assume architecture — verify it.

---

### 16. Self-Deception Cycle

**How it manifests:** You write tests that encode the same misconceptions as your implementation. The code is wrong, the tests are wrong in the same way, and everything passes. This is especially dangerous with LLM-generated tests because the same model produces both the code and the tests.

**Correct behavior:** Write tests BEFORE implementation (TDD). Test properties and invariants rather than specific outputs when possible — properties are harder to hallucinate incorrectly. Use the rnd-building skill's Property-Based Testing guidance. When verifying, run the builder's tests but also write independent experiments from the spec alone.
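As a sketch, assuming the fast-check library (not a dependency of this package) and a `sortAscending` function under test: a property like "output is ascending and a permutation of the input" is much harder to get wrong in the same way as the implementation than any single expected-output assertion.

```ts
import fc from "fast-check";

// Implementation under test.
function sortAscending(xs: number[]): number[] {
  return [...xs].sort((a, b) => a - b);
}

// Property: for any integer array, the result is in ascending order
// and contains exactly the same elements as the input.
fc.assert(
  fc.property(fc.array(fc.integer()), (xs) => {
    const out = sortAscending(xs);
    const ascending = out.every((v, i) => i === 0 || out[i - 1] <= v);
    const sameElements =
      [...out].sort((a, b) => a - b).join(",") ===
      [...xs].sort((a, b) => a - b).join(",");
    return ascending && sameElements;
  })
);
```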
---

### 17. Observation Flooding

**How it manifests:** You run a command that produces 500 lines of output. The raw output fills your context window. Your subsequent reasoning degrades because the useful signal (3 lines of error messages) is buried in noise (497 lines of passing test output). You may not realize your capacity for the actual task has diminished.

**Correct behavior:** When tool output exceeds ~50 lines, summarize the key signal: pass/fail counts, error messages, failing test names. Don't paste or re-read the full output. The observation-mask hook will remind you, but apply this discipline proactively.
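One shape such a summary can take, as a hypothetical helper (the real observation-mask hook may format output differently):

```ts
// Reduce noisy runner output to the signal a verdict needs:
// a pass/fail count plus the first few failure lines.
function summarize(output: string): string {
  const lines = output.split("\n");
  const failures = lines.filter(
    (l) => l.includes("FAIL") || l.includes("Error") || l.includes("✕")
  );
  const passes = lines.filter((l) => l.includes("PASS") || l.includes("✓"));
  return [
    `${passes.length} passing lines, ${failures.length} failure lines`,
    ...failures.slice(0, 10), // cap even the failures
  ].join("\n");
}
```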
---

### 18. Post-Compaction Amnesia

**How it manifests:** After context compaction, you lose track of the original requirements, the current task, or constraints established earlier in the session. You continue working but on a subtly different problem. The compact-state.json restoration gives you the facts but not the nuanced understanding.

**Correct behavior:** After compaction, re-read `$RND_DIR/plan.md` and the current task's pre-registration. If you can't recall the task ID, plan summary, and iteration count without looking them up, your context has degraded — reload before continuing.

---

## Red Flag Phrases

When you find yourself writing or thinking any of the following, stop and check your reasoning:

1. **"should work now"** — assertion without evidence; run the tests
2. **"probably passes"** — probability is not evidence; verify it
3. **"clearly handles this"** — "clearly" hides an unverified assumption; trace it
4. **"looks correct"** — appearances are not evidence; execute and observe
5. **"the Builder addressed this"** — the Builder's claim is not the same as the criterion being met
6. **"this is obviously fine"** — obvious things still need evidence; if it's obvious, verification is fast
7. **"I'll check the rest next round"** — there is no free next round; report all findings now
8. **"close enough"** — criteria are binary; close enough is FAIL
9. **"the tests pass, so it works"** — inspect what the tests assert, not just that they pass
10. **"I already checked something similar"** — prior checks do not transfer; each criterion gets fresh evidence
11. **"Great!"** (before issuing verdict) — positive affect before evidence is a warning sign
12. **"I'm confident this is correct"** — confidence without evidence is the definition of Premature Satisfaction
13. **"too simple to need verification"** — the scaling skill defines minimum ceremony for every tier; nothing is exempt
14. **"I remember the requirement says..."** — memory degrades; re-read the pre-registration file, don't recall from context
15. **"this API/function should exist"** — should is not does; grep for it before using it
16. **"the architecture must work like..."** — must is not does; trace the actual code path before assuming

---

## Using This Catalog During Verification

Before writing any verdict:

1. **Scan the catalog.** Take 30 seconds to read the failure mode names. Ask: "Am I falling into any of these?"
2. **Check your evidence.** For each criterion you are about to mark PASS, ask: "What concrete, independently produced evidence do I have?" If you cannot answer with a specific test output or line reference, you do not have evidence.
3. **Scan the red flag phrases.** Review your draft reasoning. If any red flag phrase appears, revise before submitting.
4. **Check completeness.** Count your verdicts. Count the criteria in the pre-registration. They must match.

---

## Relationship to Other Quality Gates

The failure modes catalog is a **diagnostic tool**, not a process requirement. You do not need to document which failure modes you checked — you need to not commit them.

The `rnd-verification` skill defines the process. This catalog helps you execute that process without rationalizing your way to premature closure.

---

## Related Skills

- `rnd-framework:rnd-verification` — Full verification process; this catalog is a supplement to it
- `rnd-framework:rnd-iteration` — What happens when failure modes cause a false PASS that the next cycle catches
- `rnd-framework:rnd-debugging` — For root cause analysis when a failure mode leads to a real defect
package/skills/rnd-iteration/SKILL.md
@@ -0,0 +1,170 @@
---
name: rnd-iteration
description: "Use when handling build-verify feedback loops — receiving verification feedback, iteration budgets, escalation to re-planning"
user-invocable: false
effort: medium
---

# R&D Iteration

## Overview

When the Verifier issues NEEDS ITERATION or FAIL, the Builder gets feedback and revises. This cycle has a budget to prevent infinite rework.

**Core principle:** Feedback describes WHAT is wrong with evidence. The Builder reasons about HOW to fix. Repeated iteration on the same criterion is a signal the task needs re-decomposition, not more attempts.

## When to Use

- After a Verifier returns NEEDS ITERATION or FAIL
- When a Builder receives feedback from verification
- When the iteration budget is approaching or exceeded

## Information Barrier During Iteration

When passing Verifier feedback to the Builder:

**INCLUDE:**
- The "Feedback" section from the verification report
- Which criteria failed and what evidence showed the failure

**EXCLUDE:**
- The Verifier's internal reasoning
- Suggested fixes (the Verifier should not have provided these)
- Other tasks' verification results

## Builder's Response to Feedback

When receiving verification feedback:

1. **Read the feedback carefully** — Understand WHAT failed, not just that it failed
2. **Diagnose** — Use `rnd-framework:rnd-debugging` if the failure is unclear
3. **Fix ALL failed criteria** — Address every criterion marked FAIL or NEEDS ITERATION, not just the primary failure. Use a checklist: list each failed criterion, diagnose it, fix it, and mark it done. Do not move to step 4 until every failed criterion has been addressed.
4. **Check shared code paths** — Identify code paths shared between your fixes and currently-passing criteria. Re-run tests covering those paths to confirm your fixes haven't introduced regressions. If a passing criterion shares logic with a fixed one, explicitly re-verify it.
5. **Re-run ALL tests** — Run the complete test suite, not just tests related to flagged criteria. Fixes often have cross-cutting effects.
6. **Update self-assessment** — Note what changed and why
7. **Resubmit** — Same artifacts, updated code and tests

> **Learning extraction:** After a successful iteration (re-verify returns PASS), the orchestrator extracts the root cause as a gotcha and writes it to the Learning Library via the `rnd-framework:rnd-learning` skill. This closes the feedback loop — the fix that unblocked this task becomes a "Known gotcha" that prevents the same failure in future builds.

## Iteration Budget

| Tier | Max Iterations | Escalation |
|------|---------------|------------|
| Small | 2 | Report to user |
| Standard | 3 | Escalate to re-planning |
| High-stakes | 5 | Escalate to re-planning |

### Wave-Scoped Budget

Iteration is wave-scoped: the budget for a wave rebuild equals the per-task budget of the highest-criticality task in the wave. All failing tasks in the wave are rebuilt in a single pass; re-verification covers the full wave. One cycle = one wave rebuild + one wave re-verify.

Example: a wave containing LOW and NORMAL tasks uses the NORMAL budget (3 iterations max).
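A sketch of that rule, with a hypothetical criticality type and an assumed mapping onto the tier table above:

```ts
type Criticality = "LOW" | "NORMAL" | "HIGH";

// Assumed correspondence with the tier table:
// LOW → Small (2), NORMAL → Standard (3), HIGH → High-stakes (5).
const BUDGET: Record<Criticality, number> = { LOW: 2, NORMAL: 3, HIGH: 5 };

// A wave's budget is the per-task budget of its most critical task.
function waveBudget(tasks: { criticality: Criticality }[]): number {
  return Math.max(...tasks.map((t) => BUDGET[t.criticality]));
}

waveBudget([{ criticality: "LOW" }, { criticality: "NORMAL" }]); // → 3
```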
### When Budget Exhausted

If a wave still has failures after max iterations:

1. **STOP building** — More hammering won't help
2. **Report to orchestrator:** "Wave <N> exceeded iteration budget"
3. **Likely causes:**
   - Tasks were decomposed wrong
   - Success criteria were ambiguous
   - The approach is fundamentally flawed
4. **Orchestrator decision:** Re-plan failing tasks, skip them, or escalate to user

**Progress visibility:** When entering a wave iteration cycle, notify the user via `ctx.ui.notify` with the current iteration count — e.g., `"Iterating Wave <N> (2/3)"`. This keeps the user informed and prevents the "silent pipeline" problem.
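For instance, a minimal sketch assuming `ctx.ui.notify` accepts a plain message string (see docs/PI-API.md for the actual signature):

```ts
// Announce the wave iteration count before spawning the rebuild.
function announceIteration(
  ctx: { ui: { notify(message: string): void } },
  wave: number,
  cycle: number,
  maxCycles: number
): void {
  ctx.ui.notify(`Iterating Wave ${wave} (${cycle}/${maxCycles})`);
}
```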
Track wave iterations in `$RND_DIR/iteration-log.md`:

```markdown
## Wave-<N> Iteration Log

### Cycle 1
- **Failing tasks:** [T<id>, T<id>, ...]
- **Wave failure report:** [summary of per-task verdict map sent to Builder]
- **Builder response:** [what was changed across all failing tasks]
- **Result:** PASS | NEEDS_ITERATION | FAIL

### Cycle 2
...

#### T<id> detail
- **Criterion failed:** [criterion text]
- **Evidence:** [evidence summary]
- **Fix applied:** [what Builder changed]
```

## AMEND_REQUIRED Divergence

`AMEND_REQUIRED` is a distinct verdict that routes outside the normal iteration loop. It does **not** mean "try harder" — it means the Verifier believes the pre-registration itself may be wrong.

### Routing

- `AMEND_REQUIRED` → routes to the **rnd-amendment-arbiter** agent, not the Builder
- `ESCALATE_REPLAN` (arbiter output) → routes to a **Planner micro-spawn**; the task is re-decomposed, not reworked in-place

The Builder does **not** act on an `AMEND_REQUIRED` verdict until the arbiter + user gate has resolved.

### Budget rules

| Event | Iteration count |
|-------|----------------|
| `AMEND_REQUIRED` issued | Does **not** consume an iteration |
| Amendment approved → re-verify | Does **not** consume an iteration (re-verifies against amended criteria as a fresh run) |
| Amendment rejected → reverts to `NEEDS_ITERATION` | **Consumes** one iteration from the task's budget |

Amendment cycles are off-budget by design. The pipeline pauses at the arbiter, not at the Builder. Only rejection — which forces the Builder back into the normal iteration loop — counts against the budget.
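The accounting reduces to one rule, sketched here with hypothetical event names:

```ts
type AmendmentEvent = "issued" | "approved" | "rejected";

// Only rejection, which reverts the verdict to NEEDS_ITERATION,
// costs the task an iteration; issuing and approval are off-budget.
function iterationCost(event: AmendmentEvent): 0 | 1 {
  return event === "rejected" ? 1 : 0;
}
```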
## Common Rationalizations

| Excuse | Reality |
|--------|---------|
| "One more try will fix it" | You said that last time. Escalate. |
| "The Verifier is being too strict" | Strict is correct. Criteria are binary. If it's not met, it's not met. |
| "The Verifier is wrong" | Maybe. But 3 failures means the task needs re-thinking, not that the Verifier needs convincing. |
| "Just a minor tweak" | If 3 minor tweaks didn't fix it, it's not minor. |
| "It works, the test is just wrong" | Then fix the test and prove it. Claims without evidence are not results. |
| "I'll fix the other failures next round" | No. Address ALL failed criteria in one pass. Narrow fixes burn iteration budget and cause whack-a-mole cycles. Each round must converge, not punt. |
| "The Verifier issued AMEND_REQUIRED" | This is not a free retry; the arbiter still evaluates whether the spec or the code is wrong. |

## Amendment Log Artifact

When a Verifier issues `AMEND_REQUIRED`, the amendment-arbiter writes an amendment log to `$RND_DIR/briefs/T<id>-amendments.md`. This file is barrier-protected — Verifier and proof-gate agents cannot read it.

### Path pattern

```
$RND_DIR/briefs/T<id>-amendments.md
```

### Append-only protocol

Each `AMEND_REQUIRED` cycle appends one entry to the file. Entries are never edited or deleted. The file grows as a chronological record of all amendment proposals for that task.

### Required fields per entry

```markdown
## Amendment — <ISO 8601 timestamp>

**Cited defect:** <Verifier's exact cited defect from the AMEND_REQUIRED verdict's `feedback` field>

**Arbiter recommendation:** AMEND | REBUILD | ESCALATE_REPLAN

**Arbiter output:**
<full structured output from the arbiter — AMEND field patches, REBUILD rationale, or ESCALATE_REPLAN rationale>

**User decision:** approved | rejected
```

### AMEND_REQUIRED vs NEEDS_ITERATION

`AMEND_REQUIRED` is NOT an iteration cycle. It does NOT consume iteration budget. It routes through the arbiter + user gate, mutates the pre-registration (on approval), and re-verifies against the amended criteria as if it were a fresh verification. The iteration budget counter for the task does not increment.

If the user rejects the amendment proposal, the verdict reverts to `NEEDS_ITERATION` and the normal iteration budget applies.

## Related Skills

- `rnd-framework:rnd-debugging` — For diagnosing unclear failures
- `rnd-framework:rnd-building` — Builder methodology
- `rnd-framework:rnd-verification` — Verifier methodology
- `rnd-framework:rnd-decomposition` — For re-planning escalation