pi-rnd 0.2.1
- package/LICENSE +21 -0
- package/README.md +74 -0
- package/agents/rnd-builder.md +98 -0
- package/agents/rnd-integrator.md +104 -0
- package/agents/rnd-planner.md +208 -0
- package/agents/rnd-verifier.md +164 -0
- package/dist/doctor.js +166 -0
- package/dist/doctor.js.map +1 -0
- package/dist/gates/bash-discipline.js +27 -0
- package/dist/gates/bash-discipline.js.map +1 -0
- package/dist/gates/read-evidence-pack.js +23 -0
- package/dist/gates/read-evidence-pack.js.map +1 -0
- package/dist/gates/registry.js +24 -0
- package/dist/gates/registry.js.map +1 -0
- package/dist/gates/rnd-dir-required.js +31 -0
- package/dist/gates/rnd-dir-required.js.map +1 -0
- package/dist/index.js +20 -0
- package/dist/index.js.map +1 -0
- package/dist/orchestrator/prompts.js +58 -0
- package/dist/orchestrator/prompts.js.map +1 -0
- package/dist/orchestrator/rnd-dir.js +20 -0
- package/dist/orchestrator/rnd-dir.js.map +1 -0
- package/dist/orchestrator/spawn.js +67 -0
- package/dist/orchestrator/spawn.js.map +1 -0
- package/dist/orchestrator/start.js +195 -0
- package/dist/orchestrator/start.js.map +1 -0
- package/dist/orchestrator/state.js +15 -0
- package/dist/orchestrator/state.js.map +1 -0
- package/dist/orchestrator/types.js +2 -0
- package/dist/orchestrator/types.js.map +1 -0
- package/docs/PI-API.md +574 -0
- package/docs/PORTING.md +105 -0
- package/package.json +57 -0
- package/skills/fp-practices/SKILL.md +128 -0
- package/skills/fp-practices/bash.md +114 -0
- package/skills/fp-practices/duckdb.md +116 -0
- package/skills/fp-practices/elixir.md +115 -0
- package/skills/fp-practices/javascript.md +119 -0
- package/skills/fp-practices/koka.md +120 -0
- package/skills/fp-practices/lean.md +120 -0
- package/skills/fp-practices/postgresql.md +120 -0
- package/skills/fp-practices/python.md +120 -0
- package/skills/fp-practices/svelte.md +114 -0
- package/skills/kiss-practices/SKILL.md +41 -0
- package/skills/kiss-practices/bash.md +70 -0
- package/skills/kiss-practices/duckdb.md +30 -0
- package/skills/kiss-practices/elixir.md +38 -0
- package/skills/kiss-practices/javascript.md +43 -0
- package/skills/kiss-practices/koka.md +34 -0
- package/skills/kiss-practices/lean.md +45 -0
- package/skills/kiss-practices/markdown.md +20 -0
- package/skills/kiss-practices/postgresql.md +31 -0
- package/skills/kiss-practices/python.md +64 -0
- package/skills/kiss-practices/svelte.md +59 -0
- package/skills/rnd-building/SKILL.md +256 -0
- package/skills/rnd-decomposition/SKILL.md +188 -0
- package/skills/rnd-experiments/SKILL.md +197 -0
- package/skills/rnd-failure-modes/SKILL.md +222 -0
- package/skills/rnd-iteration/SKILL.md +170 -0
- package/skills/rnd-orchestration/SKILL.md +314 -0
- package/skills/rnd-scaling/SKILL.md +188 -0
- package/skills/rnd-verification/SKILL.md +248 -0
- package/skills/using-rnd-framework/SKILL.md +65 -0
|
@@ -0,0 +1,248 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: rnd-verification
|
|
3
|
+
description: "Use when independently verifying built work against pre-registered criteria — information-barrier verification with evidence-based verdicts and failure mode analysis"
|
|
4
|
+
user-invocable: false
|
|
5
|
+
allowed-tools: [Read, Write, Bash, Grep, Glob]
|
|
6
|
+
effort: medium
|
|
7
|
+
---
|
|
8
|
+
|
|
9
|
+
# R&D Verification
|
|
10
|
+
|
|
11
|
+
Independently verify a Builder's output against pre-registered success criteria — the quality gate checkpoint. Nothing proceeds without your PASS. Assess work purely against the spec, never influenced by the Builder's framing. Default mode for `Criticality: LOW` or `NORMAL` tasks; for `Criticality: HIGH` the orchestrator uses `rnd-framework:rnd-multi-judge`.
|
|
12
|
+
|
|
13
|
+
## The Iron Laws
|
|
14
|
+
|
|
15
|
+
```
|
|
16
|
+
1. NEVER READ SELF-ASSESSMENT FILES — they bias your judgment
|
|
17
|
+
2. EVERY CRITERION GETS A VERDICT WITH EVIDENCE
|
|
18
|
+
3. DESCRIBE WHAT IS WRONG, NOT HOW TO FIX IT
|
|
19
|
+
```
|
|
20
|
+
|
|
21
|
+
## Information Barrier
|
|
22
|
+
|
|
23
|
+
You receive ONLY the pre-registration, the Builder's code/tests/artifacts, and codebase context. You MUST NOT seek `$RND_DIR/builds/T<id>-self-assessment.md` (blocked by hooks), Builder reasoning, or hints about known issues.
|
|
24
|
+
|
|
25
|
+
## Two-Stage Evaluation
|
|
26
|
+
|
|
27
|
+
**Correctness tier:** Must-pass criteria. **Quality tier:** Should-pass criteria. If ANY Correctness criterion fails, Quality results are irrelevant.
|
|
28
|
+
|
|
29
|
+
| Correctness | Quality | Overall Verdict |
|
|
30
|
+
|-------------|---------|-----------------|
|
|
31
|
+
| All PASS | All PASS | PASS |
|
|
32
|
+
| All PASS | Any FAIL | PASS_QUALITY_NEEDS_ITERATION |
|
|
33
|
+
| Any FAIL (fixable) | Any | NEEDS_ITERATION |
|
|
34
|
+
| Any FAIL (unfixable) | Any | FAIL |
|
|
35
|
+
| Concrete spec defect cited | Any | AMEND_REQUIRED |
|
|
36
|
+
|
|
37
|
+
`AMEND_REQUIRED` — emit only when the Verifier can cite a concrete spec defect in the pre-registration itself (e.g., contradictory criteria, criterion referencing nonexistent system state). When in doubt between `NEEDS_ITERATION` and `AMEND_REQUIRED`, choose `NEEDS_ITERATION`. On re-verification after amendment, the Verifier receives only the (now-mutated) pre-reg with no mention of the amendment — clean-slate re-verification.
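The verdict table above can be read as a small decision function. The following is a minimal sketch, not the framework's implementation: the `fixable` flag and the `spec_defect` argument are hypothetical names for judgments the Verifier makes, and giving a cited spec defect precedence over other outcomes is an assumption.

```python
from enum import Enum


class Verdict(str, Enum):
    PASS = "PASS"
    PASS_QUALITY_NEEDS_ITERATION = "PASS_QUALITY_NEEDS_ITERATION"
    NEEDS_ITERATION = "NEEDS_ITERATION"
    FAIL = "FAIL"
    AMEND_REQUIRED = "AMEND_REQUIRED"


def overall_verdict(correctness, quality, *, fixable=False, spec_defect=None):
    """Map per-criterion results onto the overall verdict table.

    correctness, quality: lists of booleans (True = criterion passed).
    fixable: hypothetical flag for the Verifier's judgment that a
             Correctness failure can be repaired by iteration. It defaults
             to False so that an undecided case resolves to FAIL.
    spec_defect: the cited pre-registration defect, or None if none is cited.
    """
    if spec_defect:
        # Assumption: a concrete, cited spec defect takes precedence.
        return Verdict.AMEND_REQUIRED
    if not all(correctness):
        # Quality results are irrelevant once any Correctness criterion fails.
        return Verdict.NEEDS_ITERATION if fixable else Verdict.FAIL
    if not all(quality):
        return Verdict.PASS_QUALITY_NEEDS_ITERATION
    return Verdict.PASS
```

For example, `overall_verdict([True, True], [True, False])` yields `PASS_QUALITY_NEEDS_ITERATION`, matching the second row of the table.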
|
|
38
|
+
|
|
39
|
+
## Batch Wave Verification
|
|
40
|
+
|
|
41
|
+
When the orchestrator spawns the Verifier for an entire wave (all task pre-regs in one prompt), the Verifier processes all tasks in the wave in a single context window. This is the normal verification path.
|
|
42
|
+
|
|
43
|
+
**Batch flow:**
|
|
44
|
+
1. Receive all task pre-registrations for the wave in a single prompt.
|
|
45
|
+
2. For each task in the wave, execute steps 1–6 below sequentially (complete one task fully before beginning the next).
|
|
46
|
+
3. For each task: write a `T<id>-verification.md` full prose report for every verdict — PASS, FAIL, NEEDS_ITERATION, PASS_QUALITY_NEEDS_ITERATION, and AMEND_REQUIRED.
|
|
47
|
+
4. After completing all tasks in the wave, aggregate per-task verdicts into `$RND_DIR/verifications/wave-<N>-verdict-map.json`.
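Step 4 of the batch flow can be sketched as below. The source names only the output path; the JSON shape used here (`wave` plus a `verdicts` object keyed by task ID) is an assumption for illustration, and `build_verdict_map` and `verdict_map_path` are hypothetical helper names.

```python
import json
from pathlib import Path

ALLOWED_VERDICTS = {
    "PASS",
    "PASS_QUALITY_NEEDS_ITERATION",
    "NEEDS_ITERATION",
    "FAIL",
    "AMEND_REQUIRED",
}


def verdict_map_path(rnd_dir, wave):
    """Return the path $RND_DIR/verifications/wave-<N>-verdict-map.json."""
    return Path(rnd_dir) / "verifications" / f"wave-{wave}-verdict-map.json"


def build_verdict_map(wave, per_task):
    """Aggregate {task_id: verdict} into one map, rejecting unknown verdicts.

    The returned shape is assumed, not taken from the framework.
    """
    unknown = {task: v for task, v in per_task.items() if v not in ALLOWED_VERDICTS}
    if unknown:
        raise ValueError(f"unknown verdicts: {unknown}")
    return {"wave": wave, "verdicts": dict(sorted(per_task.items()))}


if __name__ == "__main__":
    verdicts = build_verdict_map(2, {"T2": "FAIL", "T1": "PASS"})
    print(verdict_map_path(".rnd", 2).as_posix())
    print(json.dumps(verdicts, indent=2))
```

Rejecting unknown verdict strings at aggregation time is a design choice of the sketch: it surfaces a typo in a per-task report before the orchestrator consumes the map.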
|
|
48
|
+
|
|
49
|
+
The information barrier applies identically to batched wave verification — the Verifier must not read self-assessment files for any task in the wave.
|
|
50
|
+
|
|
51
|
+
## Full Prose Report: Every Verdict
|
|
52
|
+
|
|
53
|
+
**On every verdict (PASS, FAIL, NEEDS_ITERATION, PASS_QUALITY_NEEDS_ITERATION, AMEND_REQUIRED):** write a full `T<id>-verification.md` prose report. No shortcuts — all verdicts produce the same prose format.
|
|
54
|
+
|
|
55
|
+
**On AMEND_REQUIRED:** the `feedback` field must include the cited spec defect verbatim. On clean-slate re-verification after an amendment is approved, the Verifier receives only the (now-mutated) pre-reg with no mention of the amendment that occurred.
|
|
56
|
+
|
|
57
|
+
## Process
|
|
58
|
+
|
|
59
|
+
### 1. Read the Pre-Registration and Validation Contract
|
|
60
|
+
Understand intent, approach, and success criteria — your ONLY reference for "correct". Note each criterion separately before proceeding. If the task has a `fulfills` field, locate the corresponding VAL-AREA-NNN assertions in the Validation Contract section of plan.md. These assertions provide exact verification commands (Tool + Evidence) — use them as your primary verification method for Correctness criteria.
|
|
61
|
+
|
|
62
|
+
### 2. Write Independent Experiment Tests
|
|
63
|
+
Before reading Builder code or tests, write one experiment test per criterion using `rnd-framework:rnd-experiments`. Derive from spec text alone — **MUST NOT** read Builder test files at this stage. Write to `$RND_DIR/verifications/T<id>-experiments/`, named `exp-<criterion-slug>.test.<ext>`.
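The naming convention above can be illustrated with a small helper. This is a sketch under assumptions: the source does not define how a criterion becomes a slug, so the lowercase-and-hyphenate rule and the 60-character cap are hypothetical, as are the function names.

```python
import re
from pathlib import Path


def criterion_slug(text, max_len=60):
    """Turn criterion text into a file-safe slug (assumed convention)."""
    slug = re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")
    return slug[:max_len].rstrip("-")


def experiment_path(rnd_dir, task_id, criterion, ext):
    """Build $RND_DIR/verifications/T<id>-experiments/exp-<slug>.test.<ext>."""
    name = f"exp-{criterion_slug(criterion)}.test.{ext}"
    return Path(rnd_dir) / "verifications" / f"T{task_id}-experiments" / name


if __name__ == "__main__":
    print(experiment_path(".rnd", "3", "Returns 404 for unknown IDs", "py").as_posix())
```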
|
|
64
|
+
|
|
65
|
+
### 3. Run Experiments and Validation Contract Evidence Commands
|
|
66
|
+
Run experiments against the implementation. Record raw output verbatim — do not paraphrase. Each failing experiment is a Correctness-tier finding. If an experiment was wrong, fix it, note the correction, keep the original. For each VAL-AREA-NNN assertion in `fulfills`, run the exact evidence command, record output, compare against expected — a mismatch is a Correctness-tier finding.
|
|
67
|
+
|
|
68
|
+
#### Evidence Pack Audit (when `RND_EVIDENCE_AUDIT=1`)
|
|
69
|
+
|
|
70
|
+
When the environment variable `RND_EVIDENCE_AUDIT=1` is set, the Verifier runs a trust-then-verify-via-hash protocol against the Builder's pre-collected evidence pack before re-running any tools.
|
|
71
|
+
|
|
72
|
+
**Step A — Locate the manifest.**
|
|
73
|
+
|
|
74
|
+
Read `$RND_DIR/evidence/T<id>/manifest.json`. This file lists every tool the Builder ran, its inputs, and where output was stored.
|
|
75
|
+
|
|
76
|
+
**Step B — Recompute input hashes.**
|
|
77
|
+
|
|
78
|
+
For each tool entry in the manifest that corresponds to a criterion requiring tool evidence:
|
|
79
|
+
|
|
80
|
+
1. Read the `inputs[]` array from the manifest entry.
|
|
81
|
+
2. For each input path, recompute its hash using `shasum -a 256` (consistent with the Builder's hashing convention). For tracked files you may also use `git hash-object <path>` as an equivalent; `shasum -a 256` is preferred because it requires no git dependency on the audit side.
|
|
82
|
+
3. Skip paths under: `node_modules`, `.rnd`, `_build`, `deps`, `.venv`, `target`, `dist`. These directories are on the skip list for both Builder and Verifier.
|
|
83
|
+
4. Compare the recomputed hash against the hash stored in `inputs[].hash`.
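Step B could look roughly like the sketch below, which uses Python's `hashlib` to produce the same hex digest that `shasum -a 256` prints. Two assumptions are made: each `inputs[]` item carries a `path` key next to the `hash` key named above, and a missing input file counts as a mismatch.

```python
import hashlib
from pathlib import Path

# Skip list shared by Builder and Verifier, as listed in step 3.
SKIP_DIRS = {"node_modules", ".rnd", "_build", "deps", ".venv", "target", "dist"}


def sha256_of(path):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_skipped(path):
    """True if any path component is on the skip list."""
    return any(part in SKIP_DIRS for part in Path(path).parts)


def mismatched_inputs(entry, root="."):
    """Return the input paths whose recomputed hash differs from the manifest.

    Assumes each item looks like {"path": ..., "hash": ...}; a missing
    file is treated as a mismatch (an assumption of this sketch).
    """
    stale = []
    for item in entry.get("inputs", []):
        if is_skipped(item["path"]):
            continue
        file_path = Path(root) / item["path"]
        if not file_path.is_file() or sha256_of(file_path) != item["hash"]:
            stale.append(item["path"])
    return stale
```

An empty result means every checked hash matches for that tool entry; a non-empty result identifies the inputs that changed.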
|
|
84
|
+
|
|
85
|
+
**Step C — Serve from pack or re-run surgically.**
|
|
86
|
+
|
|
87
|
+
| Outcome | Action |
|
|
88
|
+
|---------|--------|
|
|
89
|
+
| All hashes match | Read evidence from `stdout_path` or `structured_path` in the manifest entry. Emit a `tool_pack_served` audit event. |
|
|
90
|
+
| Any hash mismatches | Re-run **only** the affected tool. Write a delta entry to `manifest.json` with the new hashes and output paths. Emit a `tool_run_fresh` audit event. |
|
|
91
|
+
|
|
92
|
+
Do not re-run tools whose inputs all match — the pack is trusted for those entries.
|
|
93
|
+
|
|
94
|
+
**Step D — Read evidence output.**
|
|
95
|
+
|
|
96
|
+
- If the manifest entry has a `structured_path`: use `jq` to query the JSON for the fields relevant to your criterion. Do not grep structured output.
|
|
97
|
+
- If no `structured_path`: check for a `sections[]` array in the manifest entry. If present, read only the line ranges listed under `sections[]` from `stdout_path`. If absent, read `stdout_path` directly.
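For the unstructured branch of Step D, a minimal sketch of reading only the listed line ranges follows. The shape of a `sections[]` item is not specified in the source; the `{"start": ..., "end": ...}` form with 1-indexed, inclusive line numbers is an assumption, and both function names are hypothetical. Structured output is still queried with `jq` and is not covered here.

```python
def read_sections(stdout_text, sections):
    """Return only the lines covered by the given sections.

    Each section is assumed to be {"start": int, "end": int},
    1-indexed and inclusive.
    """
    lines = stdout_text.splitlines()
    picked = []
    for section in sections:
        picked.extend(lines[section["start"] - 1:section["end"]])
    return "\n".join(picked)


def select_evidence(entry, stdout_text):
    """Use sections[] when present, otherwise the whole stdout text."""
    sections = entry.get("sections")
    return read_sections(stdout_text, sections) if sections else stdout_text
```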
|
|
98
|
+
|
|
99
|
+
Append audit events to `$RND_DIR/audit.jsonl` with fields `event`, `task_id`, `tool`, and `timestamp` (ISO-8601 UTC).
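An audit line with the four fields named above might be produced as in this sketch. The field names and the two event names come from the text; the exact timestamp layout (`YYYY-MM-DDTHH:MM:SSZ`) and the function names are assumptions.

```python
import json
from datetime import datetime, timezone

KNOWN_EVENTS = {"tool_pack_served", "tool_run_fresh"}


def audit_event(event, task_id, tool, now=None):
    """Serialize one audit event as a single JSON line (no trailing newline)."""
    if event not in KNOWN_EVENTS:
        raise ValueError(f"unknown audit event: {event}")
    moment = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    return json.dumps({
        "event": event,
        "task_id": task_id,
        "tool": tool,
        # ISO-8601 in UTC; the trailing "Z" form is an assumed convention.
        "timestamp": moment.strftime("%Y-%m-%dT%H:%M:%SZ"),
    })


def append_audit_event(audit_path, event, task_id, tool):
    """Append one event to the JSONL audit log, one event per line."""
    with open(audit_path, "a", encoding="utf-8") as handle:
        handle.write(audit_event(event, task_id, tool) + "\n")
```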
|
|
100
|
+
|
|
101
|
+
### 4. Run Builder's Tests and Compare
|
|
102
|
+
Read Builder code and tests. Run the full test suite and record the output verbatim. For each criterion, check whether the Builder's test actually tests the criterion. If a Builder test passes but your experiment fails, flag it as spec divergence.
|
|
103
|
+
|
|
104
|
+
#### 4.5. Read found-issues Ledger
|
|
105
|
+
If `$RND_DIR/builds/T<id>-found-issues.jsonl` exists, read it now. Each entry with `"decision":"escalated"` must be explicitly acknowledged in your verification report — list the issue and provide a verdict justification for why letting it stand is acceptable. Any `escalated` entry that is not acknowledged causes the task to fail, regardless of other criteria.
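The ledger check can be sketched as a filter over the JSONL entries. Only the `"decision":"escalated"` field is taken from the text; the `id` field used to match acknowledgements is hypothetical, as are the function names.

```python
import json


def escalated_issues(ledger_text):
    """Return the ledger entries whose decision is "escalated"."""
    issues = []
    for line in ledger_text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines in the JSONL file
        entry = json.loads(line)
        if entry.get("decision") == "escalated":
            issues.append(entry)
    return issues


def unacknowledged(ledger_text, acknowledged_ids):
    """Escalated entries not yet acknowledged in the report.

    Matching on an "id" field is an assumption of this sketch.
    """
    return [e for e in escalated_issues(ledger_text)
            if e.get("id") not in acknowledged_ids]
```

Under the rule above, a non-empty result from `unacknowledged` would mean the task fails regardless of other criteria.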
|
|
106
|
+
|
|
107
|
+
### 5. Code Inspection, Failure Mode Analysis, and Cross-Criterion Sweep
|
|
108
|
+
Before writing any verdicts, scan for anti-patterns (see `rnd-framework:rnd-failure-modes`).
|
|
109
|
+
|
|
110
|
+
**a. Failure Mode Analysis** — probe for: boundary/edge cases, off-by-one errors, error handling, unhappy paths, race conditions, security issues, external contract conformance (query the system independently).
|
|
111
|
+
**b. Code Inspection** — check for: dead code, hardcoded values, shortcuts, missing error handling, approach deviation, hardcoded assumptions (column names, API shapes, env var values) not backed by build manifest evidence. Contracts without an "Evidence Gathered" citation are Correctness-tier failures.
|
|
112
|
+
**c. Cross-Criterion Sweep** — before writing any verdicts: (1) same defect across criteria → report as systemic; (2) multiple failures share root cause → identify explicitly; (3) passing criterion rests on invalidated assumption → flag at-risk; (4) manifest missing evidence for external dependency → flag dependents; (5) verdict + evidence for EVERY criterion — if any missing, return to steps 3-4.
|
|
113
|
+
|
|
114
|
+
**Do not proceed to Step 6 until this sweep is complete.**
|
|
115
|
+
|
|
116
|
+
### 6. Produce Verification Report
|
|
117
|
+
|
|
118
|
+
Write a full prose `T<id>-verification.md` for every verdict — PASS, FAIL, NEEDS_ITERATION, PASS_QUALITY_NEEDS_ITERATION, and AMEND_REQUIRED. Include narrative context, per-criterion evidence citations, and an overall verdict section.
|
|
119
|
+
|
|
120
|
+
```markdown
|
|
121
|
+
# Verification Report: T<id>
|
|
122
|
+
## Per-Criterion Results
|
|
123
|
+
### Correctness Tier
|
|
124
|
+
- [PASS] [exact criterion text] — [evidence]
|
|
125
|
+
- [FAIL] [exact criterion text] — [evidence]
|
|
126
|
+
### Quality Tier
|
|
127
|
+
- [PASS] [exact criterion text] — [evidence]
|
|
128
|
+
- [FAIL] [exact criterion text] — [evidence]
|
|
129
|
+
## Overall Verdict: PASS | PASS_QUALITY_NEEDS_ITERATION | NEEDS_ITERATION | FAIL | AMEND_REQUIRED
|
|
130
|
+
## Feedback (if not PASS)
|
|
131
|
+
[WHAT is wrong and WHAT evidence shows it. Do NOT suggest a fix.]
|
|
132
|
+
```
|
|
133
|
+
|
|
134
|
+
### 6.5. Save Evidence Files (conditional)
|
|
135
|
+
|
|
136
|
+
Evidence files exist to support re-verification after iteration — **only write them when they will actually be re-read**.
|
|
137
|
+
|
|
138
|
+
**Write evidence files only when:**
|
|
139
|
+
- Overall verdict is `FAIL` or `NEEDS_ITERATION` (the next Builder/Verifier cycle will consult the raw output), OR
|
|
140
|
+
- Overall verdict is `PASS_QUALITY_NEEDS_ITERATION` AND a Correctness-tier VAL assertion produced output the Builder would need for the quality iteration
|
|
141
|
+
|
|
142
|
+
**Skip evidence files when:**
|
|
143
|
+
- Overall verdict is plain `PASS` (the prose report's inline per-criterion evidence is sufficient; nobody re-reads the raw dumps)
|
|
144
|
+
- No `fulfills` field exists on the task
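The write/skip conditions above reduce to a short predicate. This is a sketch: the parameter names are hypothetical, and because the text does not say what to do for `AMEND_REQUIRED`, returning `False` for it is an assumption.

```python
def should_write_evidence(verdict, has_fulfills, builder_needs_correctness_output=False):
    """Decide whether per-assertion evidence files should be written.

    builder_needs_correctness_output: hypothetical flag meaning a
    Correctness-tier VAL assertion produced output the Builder would
    need for the quality iteration.
    """
    if not has_fulfills:
        return False  # no VAL assertions to record
    if verdict in ("FAIL", "NEEDS_ITERATION"):
        return True  # the next cycle will consult the raw output
    if verdict == "PASS_QUALITY_NEEDS_ITERATION":
        return builder_needs_correctness_output
    # Plain PASS is skipped; AMEND_REQUIRED is unspecified and skipped here.
    return False
```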
|
|
145
|
+
|
|
146
|
+
When you do write them, for each VAL-AREA-NNN assertion in the `fulfills` field, write `$RND_DIR/verifications/T<id>-evidence/VAL-AREA-NNN.txt`:
|
|
147
|
+
```
|
|
148
|
+
Assertion: VAL-AREA-NNN — [title]
|
|
149
|
+
Command: [exact command run]
|
|
150
|
+
Output:
|
|
151
|
+
[raw output verbatim — do not paraphrase or truncate]
|
|
152
|
+
```
|
|
153
|
+
Note evidence file paths in the verification report. If you skipped evidence files because the verdict was PASS, note "Evidence files: skipped (PASS — inline citations in prose report sufficient)" in the report.
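A sketch of rendering one evidence file in the layout shown above. The helper names are hypothetical, and the example assertion ID, title, and command in the demo are made up for illustration; the raw output is passed through untouched so nothing is paraphrased or truncated.

```python
from pathlib import Path


def evidence_path(rnd_dir, task_id, assertion_id):
    """Return $RND_DIR/verifications/T<id>-evidence/<assertion>.txt."""
    return Path(rnd_dir) / "verifications" / f"T{task_id}-evidence" / f"{assertion_id}.txt"


def render_evidence(assertion_id, title, command, raw_output):
    """Format one evidence file; raw_output is embedded verbatim."""
    # \u2014 is the dash used in the "Assertion:" line of the template.
    return (
        f"Assertion: {assertion_id} \u2014 {title}\n"
        f"Command: {command}\n"
        "Output:\n"
        f"{raw_output}"
    )


if __name__ == "__main__":
    # Illustrative values only.
    print(evidence_path(".rnd", "4", "VAL-API-001").as_posix())
    print(render_evidence("VAL-API-001", "health endpoint", "curl -s localhost/health", "ok"))
```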
|
|
154
|
+
|
|
155
|
+
A criterion is binary. **When in doubt between NEEDS_ITERATION and FAIL, choose FAIL** — false negatives are recoverable; false positives compound downstream.
|
|
156
|
+
|
|
157
|
+
## Evidence Standards
|
|
158
|
+
|
|
159
|
+
**Necessary:** Test output you ran yourself; code inspection with line references; VAL assertion command output. **Strong:** failure mode analysis revealed no issues; all VAL assertions pass. **Insufficient:** "Tests pass" without inspecting what they assert; "code looks correct" without tracing; skipping VAL commands. If your evidence is "it looks right" — run it, break it, trace it.
|
|
160
|
+
|
|
161
|
+
## Clean Code Checklist (shell: mandatory; others: advisory)
|
|
162
|
+
|
|
163
|
+
| Item | Violation indicator |
|
|
164
|
+
|------|---------------------|
|
|
165
|
+
| **Function purity** — compute or act, not both | Function reads/writes file or calls network API AND returns a computed value to its caller |
|
|
166
|
+
| **No unscoped globals** — narrowest scope | Shell: variable used only inside a function but declared without `local`. JS/TS: module-level `let`/`var` mutated by unrelated functions |
|
|
167
|
+
| **Side effects at edges** — I/O at call-site, not buried | Pure-looking helper contains `curl`, `read`, `write`, or DB call not reflected in its name |
|
|
168
|
+
| **Descriptive names** — identifiers say what they hold | Name ≤3 chars (excluding `i`/`j`/`k`) without comment; or uses undefined domain jargon |
|
|
169
|
+
| **No magic numbers/strings** — literals are named constants | Inline literal (e.g., `86400`, `".rnd"`) that has no named constant and whose meaning is not inferable from context |
|
|
170
|
+
| **DRY** — identical blocks appear at most once | Same logical operation in two or more places with only variable names changed |
|
|
171
|
+
| **No swallowed errors** — every error handled or explicitly ignored | Shell: fallible command without `\|\|`/`set -e` and exit code unchecked. Other: empty catch block |
|
|
172
|
+
| **Immutability by default** — immutable unless mutation required | Shell: set-once variable not `local -r`. JS/TS: once-assigned binding uses `let` |
|
|
173
|
+
| **No flag parameters** — booleans in signatures indicate two functions in one | Function signature has a boolean selecting between two distinct code paths |
|
|
174
|
+
| **No commented-out code** — dead code deleted | Code block commented out with no explanation (exception: ticket/decision references) |
|
|
175
|
+
|
|
176
|
+
## Multi-Judge Mode
|
|
177
|
+
|
|
178
|
+
For parallel judge and tiebreaker roles, see `rnd-framework:rnd-multi-judge`. The information barrier still applies in all multi-judge roles — MUST NOT read self-assessment files. See `rnd-framework:rnd-failure-modes` for the full anti-patterns catalog.
|
|
179
|
+
|
|
180
|
+
## Common Rationalizations
|
|
181
|
+
|
|
182
|
+
| Excuse | Reality |
|
|
183
|
+
|--------|---------|
|
|
184
|
+
| "Tests pass, so it works" | Tests are hypotheses. Inspect what they assert. Did you run them yourself? |
|
|
185
|
+
| "This is close enough" | Close enough is FAIL. Criteria are binary. |
|
|
186
|
+
| "The Builder probably knows best" | You're independent. Assess against spec, not Builder authority. |
|
|
187
|
+
| "I'll just glance at the self-assessment" | VIOLATION. This breaks the entire framework. |
|
|
188
|
+
| "I'll suggest a fix to save time" | Your job is WHAT is wrong. Builder reasons about HOW to fix. |
|
|
189
|
+
| "This clearly works, no need for failure mode analysis" | If it clearly works, failure mode analysis confirms that quickly. Inspect it. |
|
|
190
|
+
| "I'll catch the rest next round" | No free next round. Every incomplete report burns an iteration cycle. Report ALL findings NOW. |
|
|
191
|
+
| "This is pre-existing" / "by design" / "not in scope" | Every finding needs a proposed fix or documentation citation. An issue is a finding regardless of when it was introduced. |
|
|
192
|
+
|
|
193
|
+
## Epistemic Posture
|
|
194
|
+
|
|
195
|
+
Disciplined skepticism — not cynicism, not trust:
|
|
196
|
+
|
|
197
|
+
| Principle | Rule |
|
|
198
|
+
|---|---|
|
|
199
|
+
| Default to distrust | "The Builder says X" is not evidence that X is true. Verify independently. |
|
|
200
|
+
| Evidence over reasoning | "Should work" is not evidence. Execution trumps static analysis. |
|
|
201
|
+
| Completeness over speed | Missing a criterion is worse than being slow. Spend the iteration budget wisely. |
|
|
202
|
+
| Specificity over generality | "Tests pass" is meaningless. Cite test name, file, line, and observed output. |
|
|
203
|
+
| Independence over anchoring | Seen Builder reasoning? You are compromised. Restart from pre-registration only. |
|
|
204
|
+
|
|
205
|
+
## Critical Failure Modes
|
|
206
|
+
|
|
207
|
+
Scan before writing any verdict. The full catalog of 18 failure modes is in `rnd-framework:rnd-failure-modes`.
|
|
208
|
+
|
|
209
|
+
| # | Failure Mode | Symptom | Antidote |
|
|
210
|
+
|---|---|---|---|
|
|
211
|
+
| 1 | Premature Satisfaction | "Seems fine" replaces running tests | Run it. Break it. Trace it. Produce concrete evidence. |
|
|
212
|
+
| 2 | Trusting Agent Reports | Accept "all tests pass" without running them | Run tests yourself; read what they actually assert. |
|
|
213
|
+
| 3 | Should-Work-Now Fallacy | Skip re-running after a fix because "the fix looks right" | Re-run always. Fixes introduce regressions. |
|
|
214
|
+
| 4 | Anchoring on Self-Assessment | Verification confirms Builder's narrative instead of the spec | Self-assessment files are blocked. If you read one, restart from scratch. |
|
|
215
|
+
| 5 | Incomplete Verification | Verdict issued with one criterion skipped as "obviously fine" | Every criterion gets a verdict. Incomplete = verification failure. |
|
|
216
|
+
| 6 | Exit Velocity Bias | Failure mode analysis becomes cursory because you want to finish | Desire to be done is not evidence. Probe properly. |
|
|
217
|
+
| 7 | Partial Fix Acceptance | 3 of 4 sub-issues fixed; mark PASS as "most of the problem resolved" | Criterion is binary. One remaining sub-issue = NEEDS_ITERATION or FAIL. |
|
|
218
|
+
| 8 | Ungrounded Evidence | Cite "Test X passes" when Test X tests a different thing | Trace: criterion text → specific test → observed output. Every link must be direct. |
|
|
219
|
+
|
|
220
|
+
### Red Flag Phrases — stop and check if you write or think any of these
|
|
221
|
+
|
|
222
|
+
| Phrase | Why it's wrong |
|
|
223
|
+
|---|---|
|
|
224
|
+
| "should work now" / "probably passes" | Probability is not evidence; run the tests |
|
|
225
|
+
| "clearly handles this" / "looks correct" | "Clearly" hides an unverified assumption; trace it |
|
|
226
|
+
| "the Builder addressed this" | Builder's claim ≠ criterion met |
|
|
227
|
+
| "this is obviously fine" / "too simple to need verification" | Obvious things still need evidence; nothing is exempt |
|
|
228
|
+
| "I'll check the rest next round" | No free next round; report all findings now |
|
|
229
|
+
| "close enough" | Criteria are binary; close enough is FAIL |
|
|
230
|
+
| "the tests pass, so it works" | Inspect what the tests assert, not just that they pass |
|
|
231
|
+
| "I already checked something similar" | Prior checks don't transfer; each criterion gets fresh evidence |
|
|
232
|
+
| "Great!" (before verdict) / "I'm confident this is correct" | Positive affect before evidence = Premature Satisfaction |
|
|
233
|
+
| "I remember the requirement says..." | Memory degrades; re-read the pre-registration file |
|
|
234
|
+
|
|
235
|
+
### Before Writing Any Verdict: Quick Scan
|
|
236
|
+
|
|
237
|
+
1. **Name any failure mode you are falling into** — if you notice one, stop and correct.
|
|
238
|
+
2. **Check your evidence** — for each PASS: "What concrete, independently produced evidence do I have?" No specific test output or line reference = no evidence.
|
|
239
|
+
3. **Scan the red flag phrases** — if any appear in your draft reasoning, revise before submitting.
|
|
240
|
+
4. **Count criteria** — verdicts must match the pre-registration criterion count exactly.
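Items 3 and 4 of the quick scan are mechanical enough to sketch as a self-check. The phrase list below is a subset of the red flag table, matching is a plain case-insensitive substring test, and both function names are hypothetical; the sketch does not replace reading the draft.

```python
RED_FLAG_PHRASES = (
    "should work now",
    "probably passes",
    "clearly handles this",
    "looks correct",
    "the builder addressed this",
    "obviously fine",
    "close enough",
    "the tests pass, so it works",
)


def red_flags_in(draft):
    """Return the red flag phrases found in a draft, in table order."""
    lowered = draft.lower()
    return [phrase for phrase in RED_FLAG_PHRASES if phrase in lowered]


def missing_verdicts(criteria, verdicts):
    """Return criteria from the pre-registration that have no verdict yet.

    criteria: list of criterion texts; verdicts: {criterion text: verdict}.
    """
    return [criterion for criterion in criteria if criterion not in verdicts]
```

A draft is ready for submission only when both functions return empty lists and every PASS still cites concrete evidence, which this sketch does not check.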
|
|
241
|
+
|
|
242
|
+
## Related Skills
|
|
243
|
+
|
|
244
|
+
- `rnd-framework:rnd-experiments` — How to write independent experiment tests from spec in Step 2
|
|
245
|
+
- `rnd-framework:rnd-failure-modes` — Full catalog of 18 verification anti-patterns; scan before writing any verdict
|
|
246
|
+
- `rnd-framework:rnd-multi-judge` — Full protocol for parallel judge and tiebreaker roles
|
|
247
|
+
- `rnd-framework:rnd-debugging` — For root cause analysis of failures found during verification
|
|
248
|
+
- `rnd-framework:rnd-iteration` — For how feedback flows back to Builder
|
|
@@ -0,0 +1,65 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: using-rnd-framework
|
|
3
|
+
description: Use when starting any conversation - establishes how to find and use R&D framework skills, requiring skill invocation before ANY response
|
|
4
|
+
effort: low
|
|
5
|
+
---
|
|
6
|
+
|
|
7
|
+
Invoke a skill when it is **likely relevant** to your current task. Use judgment — don't invoke skills speculatively. If a command or agent already has the skill in its frontmatter `skills` list, it is preloaded automatically and does not need manual invocation.
|
|
8
|
+
|
|
9
|
+
**In PI Coding Agent:** Skills are auto-injected via PI's skill discovery — loaded into context automatically based on agent frontmatter. When `disable-model-invocation: true` is set, explicit `/skill:<name>` invocation is supported.
|
|
10
|
+
|
|
11
|
+
## The Epistemic Rule
|
|
12
|
+
|
|
13
|
+
This is a scientific process. Results are true or false — never "almost true". Evidence is reproducible or it doesn't exist. Your job is not to please anyone or to reach a quick win. It is to produce correct, verified work.
|
|
14
|
+
|
|
15
|
+
**Invoke relevant skills BEFORE any response or action** when the skill is clearly applicable to your task.
|
|
16
|
+
|
|
17
|
+
## Execution Mode
|
|
18
|
+
|
|
19
|
+
Pipeline work runs in specialized subagents spawned via `pi.events.emit("subagents:rpc:spawn", { requestId, type, prompt, options })` (e.g., type `"rnd-builder"`, `"rnd-verifier"`). Each agent has its own model, effort, and preloaded skills; the orchestrator collects results via `subagents:completed` events and manages phase gates. For the full agent/model/role table, see the project `CLAUDE.md` (§Architecture → Execution Model) — it is already loaded into context.
|
|
20
|
+
|
|
21
|
+
## Tool Discipline
|
|
22
|
+
|
|
23
|
+
Hooks enforce these rules in every session; getting blocked and retrying is slower than following the rules the first time. They apply to the orchestrator and every agent:
|
|
24
|
+
|
|
25
|
+
- **Temporary files:** use `$RND_DIR` — never `/tmp`. `$RND_DIR` is auto-allowed and persists across the pipeline.
|
|
26
|
+
- **File read / write:** use the `Read` and `Write` / `Edit` tools — never `cat`/`head`/`tail` or `echo >`/`printf >` redirects.
|
|
27
|
+
- **Search / listing:** use the `Grep` and `Glob` tools — never `grep`/`find`/`ls` in Bash.
|
|
28
|
+
- **Iteration:** never shell `for`/`while`/`until` loops (they hang the Bash tool). To check many names at once, make one `Grep` call with an alternation pattern (`name1|name2|name3`). To run a per-item command, make multiple parallel `Bash` calls in a single message. For non-trivial iteration, write a script file and invoke it once.
|
|
29
|
+
- **No inline interpreter code:** running project files and test runners (`python -m pytest`, `bun test`) is fine; `python -c '…'`, `node -e '…'`, `bun -e '…'` is blocked — use `jq` for JSON and the Read/Write tools for file work.
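The Iteration rule above suggests one `Grep` call with an alternation pattern instead of a shell loop. A minimal sketch of building such a pattern follows, written as a script file rather than inline interpreter code; it assumes the `Grep` tool accepts a regex dialect in which Python's `re.escape` output is valid, and the function name is hypothetical.

```python
import re


def alternation_pattern(names):
    """Join literal names into one regex alternation, escaping metacharacters."""
    return "|".join(re.escape(name) for name in names)


if __name__ == "__main__":
    # One pattern covers all names, so a single search call is enough.
    print(alternation_pattern(["name1", "name2", "name3"]))
```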
|
|
30
|
+
|
|
31
|
+
When in doubt, the block message from `bash-gate.sh` names the exact rule and the allowed alternative — read it and retry with the suggested tool.
|
|
32
|
+
|
|
33
|
+
## Data Science Tasks
|
|
34
|
+
|
|
35
|
+
When a task involves analytical or numerical work — financial calculations, data wiring, chart generation, statistical analysis, or anything requiring Julia or DuckDB:
|
|
36
|
+
|
|
37
|
+
Spawn `rnd-data-scientist` instead of `rnd-builder`. The Planner pre-registers as usual; the Verifier checks output as normal.
|
|
38
|
+
|
|
39
|
+
## Skill Priority
|
|
40
|
+
|
|
41
|
+
1. **Process skills first** (`rnd-decomposition`, `rnd-debugging`) — determine HOW to approach
|
|
42
|
+
2. **Implementation skills second** (`rnd-building`, `rnd-verification`) — guide execution
|
|
43
|
+
|
|
44
|
+
**Rigid skills** (`rnd-building`, `rnd-verification`, `rnd-debugging`): Follow exactly. **Flexible skills** (`rnd-scaling`, `rnd-completion`): Adapt to context.
|
|
45
|
+
|
|
46
|
+
## Red Flags
|
|
47
|
+
|
|
48
|
+
Stop rationalizing: "too simple for R&D" → use `/rnd-framework:rnd-start`; "I'll verify later" → verification is mandatory; "TDD will slow me down" → TDD is faster than debugging; "I already know the approach" → pre-registration prevents scope creep.
|
|
49
|
+
|
|
50
|
+
## User Interaction
|
|
51
|
+
|
|
52
|
+
**When presenting next steps or options to the user, present 2-4 concrete options with action-oriented labels.** Surface them via `ctx.ui.notify(text, level)` and read the user's typed response from the next turn. Never write open-ended text like "Would you like me to...?" — give them options to pick from.
|
|
53
|
+
|
|
54
|
+
- 2-4 concrete options, short action-oriented labels; recommended option listed first
|
|
55
|
+
- Context goes alongside the options, not in the label
|
|
56
|
+
|
|
57
|
+
After finishing any task, always present next steps to the user. Never end silently.
|
|
58
|
+
|
|
59
|
+
## User Instructions
|
|
60
|
+
|
|
61
|
+
Instructions say WHAT, not HOW. "Add X" or "Fix Y" doesn't mean skip the pipeline.
|
|
62
|
+
|
|
63
|
+
## Report Surfacing
|
|
64
|
+
|
|
65
|
+
When an agent or skill writes a report artifact (`plan.md`, `design-spec.md`, `T<id>-manifest.md`, `T<id>-verification.md`, `wave-<N>-verdict-map.json`, `T<id>-reality-report.md`, `T<id>-diagnosis.md`, `wave-<N>-report.md`, `T<id>-proof-report.md`, `T<id>-amendments.md`, `iteration-log.md`, audit/review reports, narratives, `brainstorm.md`), you MUST print its full path followed by its complete contents verbatim into chat BEFORE any next-step prompt — in the same turn, including in autonomous/loop mode. No length cap, no truncation, no summary substitution. The full Report Surfacing Protocol — including forbidden anti-patterns and the excluded-artifact list — is in the active output style (`scientific.md`, `rigorous.md`, or `pipeline.md`).
|