pi-rnd 0.2.1
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/LICENSE +21 -0
- package/README.md +74 -0
- package/agents/rnd-builder.md +98 -0
- package/agents/rnd-integrator.md +104 -0
- package/agents/rnd-planner.md +208 -0
- package/agents/rnd-verifier.md +164 -0
- package/dist/doctor.js +166 -0
- package/dist/doctor.js.map +1 -0
- package/dist/gates/bash-discipline.js +27 -0
- package/dist/gates/bash-discipline.js.map +1 -0
- package/dist/gates/read-evidence-pack.js +23 -0
- package/dist/gates/read-evidence-pack.js.map +1 -0
- package/dist/gates/registry.js +24 -0
- package/dist/gates/registry.js.map +1 -0
- package/dist/gates/rnd-dir-required.js +31 -0
- package/dist/gates/rnd-dir-required.js.map +1 -0
- package/dist/index.js +20 -0
- package/dist/index.js.map +1 -0
- package/dist/orchestrator/prompts.js +58 -0
- package/dist/orchestrator/prompts.js.map +1 -0
- package/dist/orchestrator/rnd-dir.js +20 -0
- package/dist/orchestrator/rnd-dir.js.map +1 -0
- package/dist/orchestrator/spawn.js +67 -0
- package/dist/orchestrator/spawn.js.map +1 -0
- package/dist/orchestrator/start.js +195 -0
- package/dist/orchestrator/start.js.map +1 -0
- package/dist/orchestrator/state.js +15 -0
- package/dist/orchestrator/state.js.map +1 -0
- package/dist/orchestrator/types.js +2 -0
- package/dist/orchestrator/types.js.map +1 -0
- package/docs/PI-API.md +574 -0
- package/docs/PORTING.md +105 -0
- package/package.json +57 -0
- package/skills/fp-practices/SKILL.md +128 -0
- package/skills/fp-practices/bash.md +114 -0
- package/skills/fp-practices/duckdb.md +116 -0
- package/skills/fp-practices/elixir.md +115 -0
- package/skills/fp-practices/javascript.md +119 -0
- package/skills/fp-practices/koka.md +120 -0
- package/skills/fp-practices/lean.md +120 -0
- package/skills/fp-practices/postgresql.md +120 -0
- package/skills/fp-practices/python.md +120 -0
- package/skills/fp-practices/svelte.md +114 -0
- package/skills/kiss-practices/SKILL.md +41 -0
- package/skills/kiss-practices/bash.md +70 -0
- package/skills/kiss-practices/duckdb.md +30 -0
- package/skills/kiss-practices/elixir.md +38 -0
- package/skills/kiss-practices/javascript.md +43 -0
- package/skills/kiss-practices/koka.md +34 -0
- package/skills/kiss-practices/lean.md +45 -0
- package/skills/kiss-practices/markdown.md +20 -0
- package/skills/kiss-practices/postgresql.md +31 -0
- package/skills/kiss-practices/python.md +64 -0
- package/skills/kiss-practices/svelte.md +59 -0
- package/skills/rnd-building/SKILL.md +256 -0
- package/skills/rnd-decomposition/SKILL.md +188 -0
- package/skills/rnd-experiments/SKILL.md +197 -0
- package/skills/rnd-failure-modes/SKILL.md +222 -0
- package/skills/rnd-iteration/SKILL.md +170 -0
- package/skills/rnd-orchestration/SKILL.md +314 -0
- package/skills/rnd-scaling/SKILL.md +188 -0
- package/skills/rnd-verification/SKILL.md +248 -0
- package/skills/using-rnd-framework/SKILL.md +65 -0
+++ package/skills/rnd-building/SKILL.md
@@ -0,0 +1,256 @@
---
name: rnd-building
description: "Use when implementing code within the R&D pipeline — TDD discipline, pre-registration compliance, honest self-assessment, and verification artifact production"
user-invocable: false
effort: medium
---

# R&D Building

Implement ONE assigned task against its pre-registered success criteria. Write the test first. Watch it fail. Write minimal code to pass. Produce verification artifacts for the independent Verifier.

## The Iron Laws

```
1. NO PRODUCTION CODE WITHOUT A FAILING TEST FIRST
2. NO SILENT DEVIATIONS FROM THE PRE-REGISTERED APPROACH
3. DO NOT VERIFY YOUR OWN WORK
4. USE WRITE/EDIT TOOLS TO CREATE AND MODIFY FILES — NEVER BASH HEREDOCS
5. IF SLOP-GATE RETURNS WARN OR FAIL WITH ANY SEVERITY 3+ MATCH, RE-EDIT IMMEDIATELY — DO NOT DEFER
6. WHEN YOU HIT AN ISSUE (ERROR, WARNING, BROKEN TEST, BUG, GAP), FIX IT OR LOG IT TO found-issues.jsonl WITH decision="fixed"|"escalated" — SILENT DISMISSAL IS NOT AN OPTION
7. EXPLAIN BEFORE YOU WRITE — ONE LOGICAL CHANGE PER WRITE/EDIT, NOT WALLS OF CODE
8. DO NOT EMBED PIPELINE TASK IDs IN PROJECT CODE — NO TASK IDs IN COMMENTS, TEST NAMES, OR VARIABLE NAMES (RND ARTIFACT FILES IN $RND_DIR ARE EXEMPT)
```

### 0. Resolve RND_DIR

`$RND_DIR` is set by the orchestrator in the spawn prompt. If absent, look for it in the environment. Do not attempt to compute it from plugin paths — the orchestrator owns this value.

### 1. Read Your Assignment

Find your task in `$RND_DIR/plan.md`. Read its pre-registration — especially success criteria, approach, and the `fulfills` field (which links to specific VAL-AREA-NNN assertions in the Validation Contract). Also read:

- **Environment Setup** — runtime, package manager, dependencies, install commands
- **Testing Strategy** — test framework, baseline count, exact run commands for unit/integration/live tests
- **Worker Guidelines** — project boundaries (USE/OFF-LIMITS), coding conventions, architecture notes

### 2. Verify Preconditions

If the pre-registration has a `Preconditions:` field, check each assertion before writing code (shell sketch after this list):

- **File existence:** run Glob for the declared path pattern
- **Content existence:** run Grep for the declared pattern in the declared file
- **Dependency presence:** run Read on the config file and check for the declared key
- **On failure:** report status `BLOCKED` with the specific failing assertion. Do not proceed.
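
These checks map directly onto plain shell probes. A minimal sketch of a precondition pass, assuming a hypothetical task whose preconditions declare a file pattern, a function name, and a package.json dependency; every path and name below is illustrative, not part of the plan format:

```bash
# Hypothetical precondition check for an illustrative task.
# Each probe mirrors one assertion style; any failure => report BLOCKED.

# File existence (Glob equivalent)
ls src/auth/*.ts >/dev/null 2>&1 || echo "BLOCKED: no files match src/auth/*.ts"

# Content existence (Grep equivalent)
grep -q "export function verifyToken" src/auth/token.ts \
  || echo "BLOCKED: verifyToken not found in src/auth/token.ts"

# Dependency presence (Read equivalent)
grep -q '"jsonwebtoken"' package.json \
  || echo "BLOCKED: jsonwebtoken missing from package.json"
```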
### 2.5. Read Exploration Cache

If `$RND_DIR/exploration/` exists, read it before writing code — file summaries, key patterns, dependencies. The orchestrator may also inject pitfalls from prior builds; review those as well.

### 2.75. Verify External Dependencies

Before writing any code, verify every external dependency listed in the pre-registration:

- **Read or query the actual external system** — read the DB schema, call the API endpoint, inspect the file
- **Record evidence in the build manifest** under `### Evidence Gathered` — cite file path, line range, and what was learned
- **Flag any contract mismatch as a STOP condition** — same protocol as a plan deviation
- **If inaccessible**, document it as an unverified assumption in your self-assessment

### 3. Red-Green-Refactor (per criterion)

For EACH success criterion, output in your response (not in thinking) — **SCAN** (mandatory):

> SCAN: Working on criterion [N]: [criterion text]. Approach: [approach from pre-registration].

**RED** — Write one failing test per criterion. Real code, no mocks unless unavoidable. Run it. Watch it fail. Confirm it fails for the right reason.

**GREEN** — Write minimal code to pass. Run tests after each change.

**REFACTOR** — After green only: remove duplication, improve names. Keep tests green. Don't add behavior.

**DEVIATION** — If the pre-registered approach is wrong: **STOP.** Report to the orchestrator. Minor adjustments: document them in your self-assessment.
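
To make RED concrete, here is a minimal sketch of a criterion-driven failing test. The criterion, the module path, and `parseConfig` are hypothetical, and Bun's runner is assumed for illustration; use whatever the plan's Testing Strategy names.

```typescript
// RED: written before parseConfig exists. Run it, watch it fail,
// and confirm the failure is the missing behavior, not a bad import.
import { test, expect } from "bun:test";
import { parseConfig } from "../src/config"; // hypothetical module, not yet written

test("throws ValidationError when input is null", () => {
  // Assumes the thrown error's message names ValidationError;
  // pin whatever error type the criterion actually specifies.
  expect(() => parseConfig(null)).toThrow("ValidationError");
});
```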
### 4. Produce Verification Artifacts

Save the build manifest to `$RND_DIR/builds/T<id>-manifest.md`. Write in full narrative prose — describe what was built, what decisions were made, and why. Manifest depth scales with task Criticality.

**For `Criticality: LOW` tasks (config changes, doc edits, renaming, log lines, style fixes) — skinny manifest:**

```markdown
# Build Manifest: T<id>

## Files Created/Modified
- [list with paths]

## Files written
path/to/file-a.ext
path/to/file-b.ext

## Tests Written
- [test name]: Tests [criterion text]
```

Omit Evidence Gathered, Edge Cases Covered, and External References entirely — LOW-criticality tasks don't interact with external systems by definition, and edge-case sections on trivial changes are boilerplate, not signal.

**For `Criticality: NORMAL` and `Criticality: HIGH` tasks — full manifest:**

```markdown
# Build Manifest: T<id>

## Files Created/Modified
- [list with paths]

## Files written
path/to/file-a.ext
path/to/file-b.ext

## Evidence Gathered
- `path/to/file.ext:NN-MM` — [what was learned]

## Tests Written
- [test name]: Tests [criterion text]

## Edge Cases Covered
- [list edge cases and how they're handled]

## External References
- `[reference value]` — type: [URL | email | address | API endpoint | package name | …] — provenance: [verified from user input | from existing codebase file X:line Y | generated from training data]
```

The `## Files written` section is **required** in every manifest. List one file path per line (relative to the repo root), no bullets or backticks. This section is parsed by the surgical-revert helper; omitting or misspelling it makes reverting impossible. Sections with genuinely no content may be written as `(none)` on a single line. Do NOT omit section headers entirely from the full-manifest form — the Verifier scans for their presence.

**Do not game the skinny form.** If a task is tagged LOW but you discover during implementation that it touches an external system (API call, schema assumption, env var contract), upgrade to the full manifest and flag the misclassification to the orchestrator.

### 5. Write Honest Self-Assessment

Save to `$RND_DIR/builds/T<id>-self-assessment.md` (the Verifier will NOT see this — be honest). The format depends on your build status.

**For plain `DONE` status (no concerns, all criteria met with HIGH confidence, no deviations, no unverified assumptions):** write a minimal one-line file. No sections, no boilerplate.

```markdown
# Self-Assessment: T<id>

All criteria met with HIGH confidence. No deviations. No unverified assumptions.
```

**For `DONE_WITH_CONCERNS`, `NEEDS_CONTEXT`, or `BLOCKED`:** write the full template. Any genuine uncertainty, any confidence below HIGH on any criterion, any unverified external assumption, or any deviation from plan means you are NOT plain `DONE` — use the full template even if it feels borderline.

```markdown
# Self-Assessment: T<id>

## Confidence per criterion
- [criterion 1]: HIGH / MEDIUM / LOW — [brief reason]

## Assumptions made

### Verified external assumptions
- [system]: [what was verified] — evidence: [where]

### Unverified external assumptions
- [system]: [what was assumed] — reason unverified: [why]

## Uncertainties & risks
- [what you're not sure about]

## Deviations from plan
- [any changes from pre-registered approach, with reasons]
```

**Do not game the minimal form.** If you are tempted to write the one-liner to avoid effort but you have an unverified external assumption or a MEDIUM-confidence criterion, that is dishonesty — downgrade the status to `DONE_WITH_CONCERNS` and use the full template.

## Found Issues Ledger

For every issue encountered during a build — error, warning, broken test, bug, or gap — you MUST either fix it or record it. Silently skipping or rationalizing away an issue is not allowed.

**Path:** `$RND_DIR/builds/T<id>-found-issues.jsonl`

**JSON-line schema:**

```json
{"issue": "<description>", "location": "<path:line>", "decision": "fixed"|"escalated", "reason": "<why fixed or why escalated>"}
```

**Rules:**

- Append one line per issue, even if you fix it immediately.
- `decision="escalated"` is the only legal path for issues you cannot fix within this task scope.
- Never omit an issue from the ledger because it seems minor or pre-existing.

**Example:**

```json
{"issue": "TypeScript strict null check failure on optional field access", "location": "src/parser.ts:47", "decision": "fixed", "reason": "added null guard before property access"}
```

## Good Tests

| Quality | Good | Bad |
|---------|------|-----|
| **Minimal** | One thing. "and" in the name? Split it. | `test('validates and processes')` |
| **Clear** | Name describes behavior | `test('test1')` |
| **Real** | Tests actual code | Tests mock behavior |
| **Intent** | Shows what SHOULD happen | Shows what DOES happen |

Prefer property-based tests for invariants, roundtrips, and ordering guarantees; use fast-check, hypothesis, or propcheck if available. Use specific-output tests for exact API shapes, known edge cases, error messages, and UI rendering. Summarize verbose output: for test/build output over 50 lines, extract pass/fail counts plus error messages. For large files, use offset/limit or Grep.
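
As a sketch of the property-based style recommended above, assuming fast-check is installed and using a hypothetical `slugify` helper; the invariant and names are illustrative:

```typescript
import { test, expect } from "bun:test";
import fc from "fast-check";
import { slugify } from "../src/slugify"; // hypothetical helper

test("slugify is idempotent", () => {
  // Invariant: applying slugify a second time changes nothing.
  fc.assert(
    fc.property(fc.string(), (s) => {
      expect(slugify(slugify(s))).toBe(slugify(s));
    })
  );
});
```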
## Common Rationalizations

| Excuse | Reality |
|--------|---------|
| "Too simple to test" | Simple code breaks. The test takes 30 seconds. |
| "I'll test after" | Tests that pass immediately prove nothing. |
| "TDD will slow me down" | TDD is faster than debugging. |
| "The approach is wrong but I'll adapt" | STOP. Report the deviation. Don't silently diverge. |
| "I'll verify it myself" | That's the Verifier's job. Produce artifacts, not verdicts. |

## Verification Checklist

- [ ] Every success criterion has a corresponding test
- [ ] Watched each test fail before implementing
- [ ] All tests pass
- [ ] Tried to break your own implementation with at least one edge case per criterion
- [ ] Build manifest lists all files and tests
- [ ] Self-assessment is honest about uncertainties
- [ ] No silent deviations from the pre-registered approach
- [ ] Output is clean (no errors, no warnings)
- [ ] Every external dependency verified against the actual system, with evidence in the manifest
- [ ] Build manifest's `### Evidence Gathered` contains file:line citations for each external contract
- [ ] All external references (URLs, APIs, packages, addresses) declared in the manifest's `## External References` section with type and provenance
- [ ] No pipeline-internal context in project code — comments, docstrings, test names, or variable names. This covers task/wave IDs (`T1`, `T01`, `wave-3`), planner phase or disposition labels (`Q4 disposition`, "compatibility audit"), and session artifact paths or meta-references (`research/*.md`, "the R&D session", "the pipeline"). The carve-out applies to RND artifact files only.

## Convergent Iteration

When receiving NEEDS ITERATION, address **every** failed criterion in a single pass:

1. **Inventory all failures.** List every criterion marked FAIL or NEEDS ITERATION. Nothing ships until every item is addressed.
2. **Diagnose root causes.** Multiple failures often share a root cause.
3. **Check shared code paths.** Re-verify that fixes do not regress passing criteria.
4. **Re-run ALL tests.** Run the complete test suite — not just the tests related to flagged criteria.
5. **Update the build manifest and self-assessment** to reflect all changes made in this pass.

## Status Codes

| Code | When to Use |
|------|-------------|
| `DONE` | All criteria met, all tests pass, no significant concerns. |
| `DONE_WITH_CONCERNS` | Criteria met, tests pass, but uncertainty exists. Include a `concerns:` line summarizing what the Verifier should scrutinize. |
| `NEEDS_CONTEXT` | Cannot proceed without additional information — ambiguous requirement, missing dependency, conflicting specs. |
| `BLOCKED` | Hard blocker prevents implementation. Requires orchestrator intervention. |

**Completion message format:**

```
T<id> build complete — status: DONE — manifest at $RND_DIR/builds/T<id>-manifest.md
T<id> build complete — status: DONE_WITH_CONCERNS: [brief summary] — manifest at $RND_DIR/builds/T<id>-manifest.md
```

## Evidence Pack (when RND_EVIDENCE_PACK=1)

When the orchestrator sets `RND_EVIDENCE_PACK=1`, wrap tool invocations through `run-tool.sh` to capture a reproducible evidence artifact for each test run. This is opt-in — normal builds do not require it.

**Invocation syntax:**

```bash
RND_EVIDENCE_PACK=1 RND_DIR="$RND_DIR" RND_TASK_ID="<id>" \
  run-tool.sh [--task-id <id>] -- <command> [args...]
```
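
For example, a sketch that wraps only the test runner, assuming a vitest project and a hypothetical task T3:

```bash
# Capture one evidence pack for T3's primary test run (illustrative values).
RND_EVIDENCE_PACK=1 RND_DIR="$RND_DIR" RND_TASK_ID="3" \
  run-tool.sh --task-id 3 -- npx vitest run
```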
**Artifact location:** `$RND_DIR/evidence/T<id>/` — contains `stdout.txt`, `stderr.txt`, and `manifest.json` (tool name, argv, exit code, timestamps, and sha256 hashes of all tracked input files).

**Validation:** Before the Verifier reads the pack, `evidence-pack-gate.sh` schema-validates `manifest.json`. Packs that fail validation are rejected; the Verifier receives an error rather than stale or malformed data.

**When to use:** Only when instructed by the pre-registration or the orchestrator. Do not wrap every command — only the test runner or tool that produces the primary verification artifact.

## Related Skills

- `rnd-framework:rnd-debugging` — unexpected test failures
- `rnd-framework:rnd-iteration` — when the Verifier sends back feedback
- `rnd-framework:rnd-data-science` — numerical analysis, financial calculations, data wiring, chart generation (Julia or DuckDB)

+++ package/skills/rnd-decomposition/SKILL.md
@@ -0,0 +1,188 @@
---
name: rnd-decomposition
description: "Use when breaking a complex task into sub-tasks with pre-registration documents — structured hierarchical decomposition, dependency analysis, and testable success criteria"
user-invocable: false
effort: medium
---

# R&D Decomposition

## Overview

Decompose tasks into structured sub-task trees. Every sub-task gets a pre-registration document with testable success criteria BEFORE any code is written.

**Core principle:** If you can't write testable success criteria, the task isn't understood well enough to build.

## When to Use

- Planning phase of `/rnd-framework:rnd-start` or `/rnd-framework:rnd-plan`
- Any non-trivial feature, refactor, or task with multiple moving parts or unclear success criteria

## The Iron Law

```
NO CODING WITHOUT PRE-REGISTRATION
```

## Hierarchical Decomposition

```
System Level: [Feature]    <-> [System Validation]
Module Level: [Components] <-> [Integration Tests]
Unit Level:   [Functions]  <-> [Unit Tests]
```

1. Start at the System level — what does the feature DO end-to-end?
2. Identify Modules — what components are needed?
3. Break into Units — what functions/utilities does each module need?
4. Pair each level with verification — system validation, integration tests, unit tests

A criterion is testable if a skeptical Verifier can evaluate it from evidence alone: **observable outcome** (return value, state change, error thrown), **concrete conditions** (specific inputs, thresholds, error types), **binary result** (met or not met).

### Decomposition Heuristics

- **Too big:** >5 success criteria → split
- **Too small:** single function with one criterion → merge up
- **Too vague:** "works correctly", "handles errors" → rewrite with observable outcomes
- **Uncertain:** approach unclear → add a Phase 0 spike task
- **Unverified external contract:** DB schema, API shape, or env var not confirmed → add a Phase 0 spike or a dedicated verification step before that task
- **Local expert available:** set the `Local expert` field in the pre-registration so the Builder knows to invoke it

## Decomposition Caps

Hard limits that apply after the heuristics:

- **Maximum 4 tasks per wave** — if a wave would contain more, split it into sub-waves or coalesce tasks.
- **Minimum task scope: 1 hour of work** — tasks smaller than this must coalesce with a sibling. A task that touches one line or one config key is below the minimum scope.
- **Coalescing rule:** when two tasks share the same file set and could be reviewed in a single pass, merge them unless their success criteria require different verification levels.

## Exploration Cache

Before decomposition, write structured findings to `$RND_DIR/exploration/` (`mkdir -p`). One kebab-case file per area (e.g., `hooks-architecture.md`). Each file: `## Files Examined`, `## Key Patterns`, `## Relevant Dependencies`, `## Notes for Builders`.
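
A minimal sketch of one such area file, with a hypothetical area and findings:

```markdown
<!-- $RND_DIR/exploration/hooks-architecture.md (illustrative) -->
## Files Examined
- src/hooks/registry.ts — central hook registration

## Key Patterns
- Hooks are pure functions registered at module load; no runtime mutation

## Relevant Dependencies
- none beyond the standard library

## Notes for Builders
- Add new hooks via registerHook(); do not edit the registry map directly
```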
## Pre-Registration Document

```
Task ID: T<number>
Intent: [One sentence — what this accomplishes and why]
Approach: [Brief planned implementation strategy]
Expected outputs: [List of files/functions/artifacts to produce]
Criticality: LOW | NORMAL | HIGH
Success criteria:
  Correctness:
  - [ ] [Functional requirement, test passing, or contract conformance condition]
  - [ ] [Another must-pass condition]
  Quality:
  - [ ] [Code quality, naming, patterns, or documentation condition]
Verification level: unit | integration | system
Dependencies: [Task IDs this depends on]
Preconditions:
- [File/content assertion verified before build starts]
- [Another assertion — if any fails, task is BLOCKED]
Local expert: [optional — name of project-local agent/skill to invoke, e.g., security-reviewer]
External Dependencies:
- system: [DB | API | file | env | service]
  contract: [What is assumed about this system — schema, response shape, format, presence]
  verification: [How this will be confirmed — e.g., Read actual schema, query endpoint, inspect file sample]
fulfills: [VAL-AREA-NNN, ...]
```

The `fulfills` field creates bidirectional traceability between tasks and Validation Contract assertions.

`Preconditions` declares file/content assertions the Builder verifies before writing code — if any fail, the task is immediately BLOCKED. Use concrete, tool-checkable assertions (Glob for file existence, Grep for function presence, Read for a dependency key). Omit it if the task creates new files from scratch.
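
For orientation, a filled-in sketch with hypothetical task, criteria, and VAL IDs:

```
Task ID: T4
Intent: Reject expired session tokens at the API boundary.
Approach: Add an expiry check in the auth middleware before handler dispatch.
Expected outputs: src/middleware/auth.ts, tests for expiry rejection
Criticality: NORMAL
Success criteria:
  Correctness:
  - [ ] Returns 401 for expired tokens
  - [ ] Valid tokens still reach the handler
  Quality:
  - [ ] Expiry threshold is a named constant, not a magic number
Verification level: integration
Dependencies: T2
Preconditions:
- Glob matches src/middleware/auth.ts
External Dependencies:
- system: env
  contract: SESSION_TTL_SECONDS is set and numeric
  verification: Read .env.example and Grep for SESSION_TTL_SECONDS
fulfills: [VAL-AUTH-001]
```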
### Tiered Criteria

| Correctness (must-pass) | Quality (should-pass) |
|---|---|
| "Returns 401 for expired tokens" | "Function names follow project naming convention" |
| "Throws ValidationError when input is null" | "Inline comments explain the retry logic" |
| "File exists at the declared output path" | "No magic numbers — constants are named" |

**Decision rule:** "Does a user or downstream system observe this outcome?" Yes → Correctness. Maintainability/DX only → Quality. Unmet Correctness → FAIL. Unmet Quality → NEEDS ITERATION.

## Environment Discovery

Check whether `project-facts.md` exists in `$RND_DIR`. If it is fresh (commit hash matches HEAD), use it directly. Otherwise run the checklist below and confirm findings with the user via `ctx.ui.notify`.

| Area | What to scan | How |
|------|-------------|-----|
| Package manager | package.json, Cargo.toml, mix.exs, go.mod, pyproject.toml | Glob for config files |
| Test framework | vitest, jest, pytest, ExUnit, go test configs | Grep for the test runner in configs/scripts |
| CI config | .github/workflows/, .gitlab-ci.yml, Jenkinsfile | Glob for CI files, Read to extract commands |
| External service URLs | https:// references in source code | Grep for URLs in src/ |
| Environment variables | .env.example, .env.template, CI secrets config | Read env templates, Grep for process.env/ENV/os.environ |
| Secrets and off-limits | .gitignore patterns, CI secret names, sensitive file paths | Read .gitignore, infer from CI config |

Findings feed into the **Environment Setup**, **Infrastructure**, and **Testing Strategy** sections of plan.md.
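
A sketch of shell equivalents for the scan column above, assuming a Node-style repository; the paths follow common conventions and are not guaranteed:

```bash
ls package.json Cargo.toml mix.exs go.mod pyproject.toml 2>/dev/null  # package manager
grep -E '"(vitest|jest)"' package.json                                # test framework
ls .github/workflows/*.yml .gitlab-ci.yml 2>/dev/null                 # CI config
grep -rEo 'https://[^" ]+' src/ | sort -u | head                      # external service URLs
grep -rn 'process\.env\.' src/ | head                                 # environment variables
```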
## Dependency Analysis

Build a dependency matrix. Assign tasks with zero dependencies to Wave 1, tasks depending only on Wave 1 to Wave 2, and so on. Flag parallel opportunities within each wave.
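
Wave assignment is plain topological layering: a task's wave is one more than the highest wave among its dependencies. A minimal sketch under the assumption that tasks arrive as an id-to-dependencies map; the names are illustrative:

```typescript
// Assign each task the wave number 1 + max(wave of its dependencies).
function assignWaves(deps: Record<string, string[]>): Record<string, number> {
  const wave: Record<string, number> = {};
  const visit = (id: string, path: Set<string>): number => {
    if (wave[id] !== undefined) return wave[id];
    if (path.has(id)) throw new Error(`circular dependency at ${id}`);
    path.add(id);
    const ds = deps[id] ?? [];
    wave[id] = ds.length === 0 ? 1 : 1 + Math.max(...ds.map((d) => visit(d, path)));
    return wave[id];
  };
  for (const id of Object.keys(deps)) visit(id, new Set());
  return wave;
}

// assignWaves({ T1: [], T2: [], T3: ["T1", "T2"] }) => { T1: 1, T2: 1, T3: 2 }
```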
## Output

Save to `$RND_DIR/plan.md` with these sections:

- **Task Tree** — hierarchical list with task IDs
- **Environment Setup** — runtime, package manager, dependencies, install commands
- **Infrastructure** — external services (URL + auth), off-limits items
- **Testing Strategy** — test framework, baseline count, exact run commands for unit/integration/live tests, user testing instructions
- **Worker Guidelines** — `USE`/`OFF-LIMITS` boundaries; coding conventions from CLAUDE.md/linters; architectural patterns; design decisions
- **Validation Contract** — numbered VAL-AREA-NNN assertions (see format below)
- **Pre-Registration Documents** — one per task
- **Dependency Matrix** — task dependency table
- **Execution Schedule** — wave assignments with parallel opportunities
- **Iteration Budgets** — per-task budgets based on criticality

### Validation Contract Format

```markdown
### Area: [Functional Domain]

#### VAL-AREA-NNN: [Assertion title]
[One-sentence description of what must be true]
Tool: [shell | grep | glob | read | code review]
Evidence: [Exact command + expected output pattern]
```

ID format: `VAL-` + area abbreviation (2-6 uppercase chars) + `-` + 3-digit number (e.g., `VAL-AUTH-001`). Evidence must be concrete: not "tests pass" but `npx vitest run exits 0, reports >= 50 passed`. Cross-cutting assertions go under `### Area: Cross-Area`. Every assertion must be fulfilled by at least one task; every task should fulfill at least one assertion.
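
A filled-in sketch of a single assertion, with hypothetical area and values:

```markdown
### Area: Authentication

#### VAL-AUTH-001: Expired tokens are rejected
Requests bearing an expired session token receive HTTP 401.
Tool: shell
Evidence: `npx vitest run auth` exits 0 and reports the expiry-rejection test passed
```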
## Verification Checklist

- [ ] Every task has a complete pre-registration document
- [ ] Every success criterion is testable and tagged Correctness or Quality
- [ ] No circular dependencies; waves correctly ordered; parallel opportunities identified
- [ ] Tasks with >5 criteria have been split; uncertain approaches have Phase 0 spikes
- [ ] Every task touching an external system has an `External Dependencies` field with system type, assumed contract, and verification method
- [ ] Environment Setup, Infrastructure, and Testing Strategy sections populated
- [ ] Worker Guidelines contains boundaries, conventions, and architecture notes
- [ ] Validation Contract has VAL-AREA-NNN assertions with Tool and Evidence for every Correctness criterion
- [ ] Every task has a `fulfills` field; every VAL assertion is fulfilled by at least one task

## Plan Self-Review

After writing all sections of plan.md, reread it with fresh eyes. This is a checklist you run yourself before notifying the orchestrator — not a subagent dispatch. The Verifier cannot save you from plan-level mistakes; they cascade through every downstream phase.

Run these six checks against the finished plan.md. If any fails, fix it inline and re-check.

1. **Spec coverage.** For each explicit user requirement or discovery-context constraint, point to the task(s) covering it. Gaps → add a task or explicitly note it as out-of-scope in Worker Guidelines.

2. **Placeholder scan.** Grep the plan for `TODO`, `TBD`, `???`, `XXX`, `[...]`, `handle appropriately`, `works correctly`, `as needed` (one-liner sketch after this list). Any hit → replace it with concrete content or remove it.

3. **VAL traceability.** Every `VAL-AREA-NNN` is named in at least one task's `fulfills` field, and every task has a non-empty `fulfills`. A VAL with no fulfiller, or a task fulfilling nothing, means the Validation Contract and Pre-Registration drifted apart — fix whichever side is wrong.

4. **Identifier consistency.** For each function name, file path, type name, or env var that appears in multiple tasks, confirm the spelling matches across mentions. A function named `clearLayers` in T3 and `clearFullLayers` in T7 is the most common cascading plan error.

5. **External-dependency completeness.** Any task whose Intent, Approach, or Expected outputs references a DB, API, file, env var, or external service MUST have a populated `External Dependencies` block with `system`, `contract`, and `verification`. Missing block → add it (this is also what gates the Reality Auditor).

6. **Verifier test on each Correctness criterion.** Reread each Correctness criterion as if you have no context. If you can't translate it into "run X, expect Y" in under 10 seconds, rewrite it with an observable outcome, a concrete condition, and a binary result.
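
For check 2, a one-line sketch; the pattern list is abridged (the bracketed markers need escaping), so extend it as needed:

```bash
grep -nE 'TODO|TBD|\?\?\?|XXX|handle appropriately|works correctly|as needed' "$RND_DIR/plan.md"
```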
If the plan has >10 tasks or any HIGH-criticality task, consider one additional escalation: spawn a fresh reviewer with the spec and plan.md, asking them to flag only concrete problems from this same checklist. Optional; not required.

## Related Skills

- `rnd-framework:rnd-scheduling` — For detailed wave scheduling
- `rnd-framework:rnd-scaling` — For choosing pipeline scale
- `rnd-framework:rnd-orchestration` — For pipeline overview
- `rnd-framework:rnd-data-science` — When a task involves analytical or numerical work
+++ package/skills/rnd-experiments/SKILL.md
@@ -0,0 +1,197 @@
---
name: rnd-experiments
description: "Use when independently verifying built work — defines how verifiers write experiment tests from the spec alone before reading Builder code, preventing false PASSes through mandatory independent validation"
user-invocable: false
allowed-tools: [Read, Write, Bash, Grep, Glob]
effort: medium
---

# R&D Experiments

## Overview

Experiments are independent tests written by the Verifier from the pre-registration spec alone — before reading the Builder's code or tests. They shift verification from "does this look right?" to "does this actually work?" by producing evidence that cannot be anchored to the Builder's implementation choices.

**Core principle:** Derive tests from the spec, not from what the Builder built. If your test logic mirrors the Builder's test logic, you haven't independently verified anything — you've just confirmed the Builder was internally consistent.

## When to Use

- Step 2 of the verification process (before reading Builder code)
- Any time a success criterion requires observable behavior that can be executed
- Mandatory for every criterion in the pre-registration — experiments are not on-demand

## The Iron Law

```
EXPERIMENTS ARE MANDATORY FOR EVERY CORRECTNESS CRITERION — NOT OPTIONAL, NOT ON-DEMAND
```

If the pre-registration lists N Correctness criteria, the Verifier writes N experiments. Skipping an experiment for a Correctness criterion — because it "looks simple" or "is obviously met" — is a verification failure. The point is to produce independent evidence, not to confirm what already seems true.

Quality criteria are optional for experiments — writing them is encouraged but not required when time is limited. Correctness criteria always get experiments.

## What Makes Experiments Different from Builder Tests

| Dimension | Builder Tests | Verifier Experiments |
|-----------|--------------|---------------------|
| Source of truth | Builder's understanding of the spec | Pre-registration spec text only |
| When written | Before or during implementation | Before reading the Builder's code |
| What they prove | Implementation is internally consistent | Implementation meets the declared spec |
| Who writes them | Builder agent | Verifier agent |
| Allowed to read | Anything | Only the pre-registration and external systems |

The distinction matters because a Builder can write tests that pass while implementing the wrong thing. If the Verifier derives tests from the same source as the Builder, the Verifier will reproduce the Builder's blind spots.

**The information barrier:** When writing experiments, the Verifier MUST NOT have read the Builder's test files. Reading them first anchors your experiment logic to the Builder's framing. Write experiments first, then run the Builder's tests in Step 4.

## Output Directory

Save all experiment files to:

```
$RND_DIR/verifications/T<id>-experiments/
```

Example: `$RND_DIR/verifications/T3-experiments/`

**Multi-judge mode:** When operating as one of two parallel judges, the orchestrator passes your judge identity (`judge-a`, `judge-b`, or `tiebreaker`) in the prompt. Use a subdirectory to avoid path collisions:

```
$RND_DIR/verifications/T<id>-experiments/<judge-id>/
```

Example: `$RND_DIR/verifications/T3-experiments/judge-a/`

If no judge identity is provided (single-verifier mode), use the flat path.

Create this directory before writing any experiment files:

```bash
mkdir -p "$RND_DIR/verifications/T<id>-experiments"
# or in multi-judge mode:
mkdir -p "$RND_DIR/verifications/T<id>-experiments/<judge-id>"
```

## Naming Convention

Name experiment files after the criterion they test, using kebab-case:

```
exp-<criterion-slug>.test.<ext>
```

Examples:

- `exp-file-exists-at-declared-path.test.ts`
- `exp-frontmatter-contains-name-field.test.ts`
- `exp-output-directory-is-created.test.ts`
- `exp-returns-401-for-expired-token.test.ts`

Use the same file extension and test runner as the project's existing test suite. One experiment file per criterion. Do not bundle multiple criteria into one file — if an experiment fails, you need to know exactly which criterion it covers.

## What Makes a Good Experiment

A good experiment:

1. **Is derived from the spec text alone.** Read the criterion. Write a test that checks the criterion's stated observable outcome. Do not look at the implementation first.
2. **Has exactly one assertion.** One experiment, one criterion. If you find yourself writing `expect A` and `expect B` in the same test, split it.
3. **Uses real execution, not inspection.** Run the code. Do not read the source and reason about what it probably does. Observable behavior is evidence; source analysis is interpretation.
4. **Tests the boundary, not just the happy path.** If the criterion says "returns 401 for expired tokens", also test that valid tokens are not rejected. Boundary cases reveal implementation shortcuts.
5. **Is independent of Builder test infrastructure.** Do not import helpers, fixtures, or test utilities written by the Builder. Write your own setup — or use the project's production code directly.

## Experiment Template

```typescript
// exp-<criterion-slug>.test.ts
// Criterion: <exact criterion text from pre-registration>
// Spec source: T<id> pre-registration, <Correctness|Quality> tier

import { describe, test, expect } from "bun:test";
// Import only production code, not Builder test helpers

describe("T<id>: <criterion text>", () => {
  test("<observable outcome — what should happen>", async () => {
    // Arrange: set up the minimum conditions from the spec
    // Act: invoke the behavior the criterion describes
    // Assert: check the observable outcome stated in the criterion
    expect(result).toBe(expectedValue);
  });

  // If the criterion implies a boundary or negative case, add it here
  test("<boundary case — what should NOT happen or edge input>", async () => {
    // ...
  });
});
```

Adapt the test runner imports to the project's conventions (Bun, Jest, Vitest, etc.).

## Process: Writing Experiments

For each criterion in the pre-registration:

1. **Read only the criterion text.** Do not look at Builder files yet.
2. **Identify the observable outcome.** What can be measured or asserted from outside the implementation?
3. **Identify the minimal setup.** What inputs, files, or state does the criterion require to be exercised?
4. **Write the experiment** following the template above.
5. **Identify one boundary or negative case** — what input or condition should produce a different outcome?
6. **Save to `$RND_DIR/verifications/T<id>-experiments/`** with the naming convention above.

After writing all experiments (one per criterion), proceed to Step 3 of the verification process: run the experiments against the Builder's code.

## Recording Experiment Results

When running experiments in Step 3, record the output verbatim in the verification report. Do not paraphrase. The raw output is the evidence.

If an experiment fails, that is a Correctness-tier finding — the implementation did not satisfy the spec as the experiment interpreted it. If the experiment itself was wrong (e.g., it misread the criterion), fix the experiment and note the correction, but do not delete it. The correction is part of the evidence trail.

## Reality Experiments

When the Builder's code touches external services — SQL schemas, HTTP APIs, MCP tools, environment variables — write at least one reality experiment per external contract, in addition to the functional criterion experiments.

Reality experiments are adversarial: frame them as "if this assumption is wrong, this command will show X." Design experiments to disprove the assumption, not confirm it. If the assumption survives, that is your evidence.

### When to write reality experiments

- Builder code references a specific table column, API response field, or env var
- The Builder's self-assessment lists unverified external assumptions
- The pre-registration says the Builder verified something against a live system — independently verify it

### Example: SQL Schema Verification

```
Builder code assumes: SELECT email, name FROM users
Adversarial experiment: Run PRAGMA table_info(users) and assert the output includes rows for email and name
If the output does NOT include those columns → the Builder hallucinated the schema

Command: sqlite3 ./dev.db "PRAGMA table_info(users);"
Disproving condition: the email or name row is absent, or the table does not exist
```

### Example: HTTP API Shape

```
Builder code assumes: GET /api/v2/users returns { data: User[], total: number }
Adversarial experiment: curl -s https://api.example.com/api/v2/users | head -c 500
Disproving condition: response missing "data" or "total" keys, or is not valid JSON
```

### Naming convention

Name reality experiment files with the `exp-reality-` prefix:

```
exp-reality-users-schema.test.ts
exp-reality-api-users-shape.test.ts
exp-reality-env-var-database-url.test.ts
```

Save to the same `$RND_DIR/verifications/T<id>-experiments/` directory as the functional experiments.
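
As a runnable sketch, here is the SQL example above written as an experiment file; the ./dev.db path and the users columns are the hypothetical values from that example:

```typescript
// exp-reality-users-schema.test.ts
// Assumption under attack: the users table has email and name columns.
import { test, expect } from "bun:test";
import { execSync } from "node:child_process";

test("users table really has email and name columns", () => {
  const out = execSync(`sqlite3 ./dev.db "PRAGMA table_info(users);"`).toString();
  // Disproving condition: either column row absent, or table missing (empty output).
  expect(out).toContain("email");
  expect(out).toContain("name");
});
```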
The full adversarial methodology — hypothesis structure, verdict rules (VALID/INVALID/UNCHECKED), and reality reports — is defined in `rnd-framework:rnd-reality-auditing`. This section tells you to include reality experiments in your experiment suite; that skill tells you how to structure them.

## Related Skills

- `rnd-framework:rnd-verification` — Full 6-step verification process; experiments are Steps 2 and 3
- `rnd-framework:rnd-failure-modes` — Failure modes that experiments are designed to catch (Premature Satisfaction, Trusting Agent Reports)
- `rnd-framework:rnd-iteration` — What happens when experiment failures trigger NEEDS ITERATION
- `rnd-framework:rnd-reality-auditing` — Full adversarial methodology for external service contracts; reality experiments follow its structure