@ericrisco/rsc 0.1.31 → 0.1.33

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (40) hide show
  1. package/README.md +4 -4
  2. package/manifest.json +24 -5
  3. package/package.json +1 -1
  4. package/scripts/lib/domains.js +1 -1
  5. package/skills/analyze/SKILL.md +1 -0
  6. package/skills/author-skill/SKILL.md +20 -0
  7. package/skills/author-skill/references/description-recipe.md +2 -0
  8. package/skills/debug/SKILL.md +1 -1
  9. package/skills/implement/SKILL.md +72 -2
  10. package/skills/implement/references/per-task-review.md +46 -0
  11. package/skills/implement/scripts/review-package +59 -0
  12. package/skills/implement/scripts/sdd-workspace +47 -0
  13. package/skills/implement/scripts/task-brief +77 -0
  14. package/skills/parallel/SKILL.md +29 -0
  15. package/skills/plan/references/plan-template.md +18 -0
  16. package/skills/roast-me/SKILL.md +124 -0
  17. package/skills/roast-me/evals/README.md +76 -0
  18. package/skills/roast-me/evals/cases.yaml +75 -0
  19. package/skills/roast-me/prompts/analyze.md +90 -0
  20. package/skills/roast-me/prompts/compute.md +100 -0
  21. package/skills/roast-me/prompts/roast.md +181 -0
  22. package/skills/roast-me/tools/adapters/__init__.py +1 -0
  23. package/skills/roast-me/tools/adapters/__pycache__/__init__.cpython-312.pyc +0 -0
  24. package/skills/roast-me/tools/adapters/__pycache__/base.cpython-312.pyc +0 -0
  25. package/skills/roast-me/tools/adapters/__pycache__/claude.cpython-312.pyc +0 -0
  26. package/skills/roast-me/tools/adapters/__pycache__/codex.cpython-312.pyc +0 -0
  27. package/skills/roast-me/tools/adapters/__pycache__/gemini.cpython-312.pyc +0 -0
  28. package/skills/roast-me/tools/adapters/__pycache__/registry.cpython-312.pyc +0 -0
  29. package/skills/roast-me/tools/adapters/base.py +53 -0
  30. package/skills/roast-me/tools/adapters/claude.py +140 -0
  31. package/skills/roast-me/tools/adapters/codex.py +113 -0
  32. package/skills/roast-me/tools/adapters/gemini.py +121 -0
  33. package/skills/roast-me/tools/adapters/registry.py +68 -0
  34. package/skills/roast-me/tools/extract_prompts.py +520 -0
  35. package/skills/sdd/SKILL.md +23 -0
  36. package/skills/ship/SKILL.md +9 -1
  37. package/skills/specify/SKILL.md +26 -1
  38. package/skills/suggest/SKILL.md +1 -1
  39. package/skills/tasks/SKILL.md +25 -0
  40. package/skills/worktrees/SKILL.md +25 -0
package/README.md CHANGED
@@ -21,7 +21,7 @@ skills that fit — one at a time — into every assistant you pick, and keeps t
21
21
  equipped as you work.
22
22
 
23
23
  From *"document my company"* to *"ship a FastAPI service"* to *"grow my YouTube
24
- channel"* — **231 skills across 21 domains**, every one researched against live
24
+ channel"* — **232 skills across 21 domains**, every one researched against live
25
25
  2025-2026 sources and **adversarially scored ≥ 8.5/10** before it shipped.
26
26
 
27
27
  ```bash
@@ -129,7 +129,7 @@ $ rsc
129
129
  ██████╗ ███████╗ ██████╗ ← animated gradient wordmark
130
130
  ██╔══██╗██╔════╝██╔════╝
131
131
  ██████╔╝███████╗██║
132
- 231 skills · one CLI · zero bloat
132
+ 232 skills · one CLI · zero bloat
133
133
 
134
134
  What do you want to do? ↑↓ move · enter select
135
135
  ❯ Base install — the essentials (orient + suggest + harness + init)
@@ -243,7 +243,7 @@ just asks in plain language.
243
243
 
244
244
  ## The catalog
245
245
 
246
- 231 skills, grouped by what you're trying to do. Click any skill to read its
246
+ 232 skills, grouped by what you're trying to do. Click any skill to read its
247
247
  `SKILL.md`. It fires on its own when a task matches.
248
248
 
249
249
  ### 🧭 Core & control plane
@@ -328,7 +328,7 @@ Each with a `02-DOCS` feedback loop that learns from your own results. `remotion
328
328
 
329
329
  ### 🧠 Knowledge & meta
330
330
 
331
- [knowledge-ops](skills/knowledge-ops/) · [codebase-onboarding](skills/codebase-onboarding/) · [research-ops](skills/research-ops/) · [decision-records](skills/decision-records/) · [continuous-learning](skills/continuous-learning/) · [skill-scout](skills/skill-scout/) · [context-budget](skills/context-budget/)
331
+ [knowledge-ops](skills/knowledge-ops/) · [codebase-onboarding](skills/codebase-onboarding/) · [research-ops](skills/research-ops/) · [decision-records](skills/decision-records/) · [continuous-learning](skills/continuous-learning/) · [skill-scout](skills/skill-scout/) · [context-budget](skills/context-budget/) · [roast-me](skills/roast-me/)
332
332
 
333
333
  ---
334
334
 
package/manifest.json CHANGED
@@ -1,7 +1,7 @@
1
1
  {
2
- "version": "0.1.31",
2
+ "version": "0.1.33",
3
3
  "counts": {
4
- "skills": 231
4
+ "skills": 232
5
5
  },
6
6
  "skills": [
7
7
  {
@@ -1123,7 +1123,7 @@
1123
1123
  },
1124
1124
  {
1125
1125
  "id": "debug",
1126
- "description": "Use when something is broken and the impulse is to start patching — a bug, a failing or flaky test, a crash, an exception, a wrong result, a regression, or behavior nobody can explain — and you want the root cause found BEFORE any fix is proposed. The on-demand rsc-sdd diagnosis discipline: reproduce → isolate → hypothesize → fix → verify, callable from inside implement (or any phase) the moment a check fails for a reason you don't understand. Triggers: 'debug this', 'why is this failing', 'this test is flaky', 'find the root cause', 'it works locally but not in CI', 'this used to work', 'track down this bug', 'the fix didn't stick', 'figure out why', 'NullPointer/segfault/500 out of nowhere', 'intermittent failure'. It diagnoses and fixes the ONE confirmed cause, then hands back. NOT writing a planned feature (implement), NOT the lint/test gate (verify), NOT adversarial diff reading (review). Honors the harness accompaniment dial.",
1126
+ "description": "Use when something is broken and the impulse is to start patching — a bug, a failing or flaky test, a crash, an exception, a wrong result, a regression, or behavior nobody can explain — and you want the root cause found BEFORE any fix is proposed. The on-demand rsc-sdd diagnosis discipline, callable from inside implement (or any phase) the moment a check fails for a reason you don't understand. Triggers: 'debug this', 'why is this failing', 'this test is flaky', 'find the root cause', 'it works locally but not in CI', 'this used to work', 'track down this bug', 'the fix didn't stick', 'figure out why', 'NullPointer/segfault/500 out of nowhere', 'intermittent failure'. Fixes the ONE confirmed cause, then hands back. NOT writing a planned feature (that is `implement`), NOT the lint/test gate (that is `verify`), NOT adversarial diff reading (that is `review`). Honors the harness accompaniment dial.",
1127
1127
  "tags": [
1128
1128
  "debug",
1129
1129
  "bug",
@@ -2001,7 +2001,7 @@
2001
2001
  },
2002
2002
  {
2003
2003
  "id": "implement",
2004
- "description": "Use when executing a planned feature into working, tested code — turning the tasks list into commits, writing code task-by-task under TDD discipline (failing test first, then the code that passes it, then refactor), with a stop-and-show checkpoint after each task. Triggers: 'implement the plan', 'build out the tasks', 'start coding this feature', 'work through the task list', 'execute the implementation', 'write the code for this spec', 'do task 3', 'continue implementing', 'red-green-refactor this', 'implement with tests first'. The SDD phase AFTER analyze and BEFORE verify. Embeds red→green→refactor and delegates concrete test tooling to the stack skill (fastapi/go/nextjs/flutter); fans independent tasks out via parallel; logs every non-obvious choice to 02-DOCS/wiki/sdd/decisions.md and stays inside the constitution. NOT spec writing, NOT planning, NOT the final lint/audit gate.",
2004
+ "description": "Use when an approved plan and task list exist and it is time to turn them into working, tested code — the SDD phase AFTER analyze and BEFORE verify. Enforces TDD discipline: a test fails before the code that makes it pass. Triggers: 'implement the plan', 'build out the tasks', 'start coding this feature', 'work through the task list', 'execute the implementation', 'write the code for this spec', 'do task 3', 'continue implementing', 'red-green-refactor this', 'implement with tests first'. Delegates concrete test tooling to the stack skill (fastapi/go/nextjs/flutter) and fans independent tasks out via `parallel`. NOT spec writing (that is `specify`), NOT planning (that is `plan`), NOT the final lint/test/audit gate (that is `verify`).",
2005
2005
  "tags": [
2006
2006
  "sdd",
2007
2007
  "implement",
@@ -3557,6 +3557,25 @@
3557
3557
  ],
3558
3558
  "profiles": []
3559
3559
  },
3560
+ {
3561
+ "id": "roast-me",
3562
+ "description": "Use when you want honest, comedic feedback on how you prompt — analyzing your own past sessions to score prompt quality and compute efficiency, surface your worst habits, and generate a model-selection cheat sheet. Triggers: 'roast me', 'roast my prompting', 'audit my prompting habits', 'how good are my prompts', 'am I prompting well', 'what are my bad prompt habits', 'analiza mis prompts', 'puntua els meus prompts', 'how much am I wasting on model costs'. NOT reviewing your code or output quality (use code-review for that) and NOT a general prompt-engineering tutorial (use prompt-engineering for that).",
3563
+ "tags": [
3564
+ "prompting",
3565
+ "self-audit",
3566
+ "compute-efficiency",
3567
+ "ai-hygiene",
3568
+ "learning"
3569
+ ],
3570
+ "recommends": [
3571
+ "prompt-engineering",
3572
+ "code-review",
3573
+ "context-budget"
3574
+ ],
3575
+ "profiles": [
3576
+ "full"
3577
+ ]
3578
+ },
3560
3579
  {
3561
3580
  "id": "ruby",
3562
3581
  "description": "Use when writing idiomatic Ruby outside Rails — plain scripts, CLIs, gems, libraries — or refactoring imperative loops into Enumerable chains, designing module mixins, packaging with Bundler, deciding whether metaprogramming is worth it, or setting up Minitest/RSpec. Triggers: 'write this in Ruby', 'refactor this loop with map/reduce', 'build a gem', 'gemspec / Gemfile.lock', 'method_missing or define_method?', 'Minitest vs RSpec', 'FrozenError after Ruby 3.4 upgrade', 'mixin vs inheritance', 'escribir una gema en Ruby', 'refactoritzar amb blocs'. NOT Rails/ActiveRecord/controllers (that is rails).",
@@ -3762,7 +3781,7 @@
3762
3781
  },
3763
3782
  {
3764
3783
  "id": "ship",
3765
- "description": "Use when the work is complete and verified and it is time to CLOSE the development branch — the final phase of the rsc SDD chain, after review approves the diff. Triggers: 'ship it', 'close the branch', 'open the PR', 'merge this', 'merge into main', 'create the pull request', 'how do I land this work', 'finish this feature', 'haz el merge', 'abre el PR', 'cierra la rama', 'súbelo a main', 'clean up the branch', 'I'm done, what now'. Presents exactly three landing options (direct merge / pull request / park-or-discard) matched to the workflow, runs the pre-ship safety checklist (review verdict, clean tree, rebased, no secrets), and writes the commit and PR. HARD RULE: git authorship is ALWAYS Eric — never Claude. No Co-Authored-By Claude, no 'generated with' footer in any commit or PR body. NOT running lint/type/test (that is verify), NOT reading the diff adversarially (that is review), NOT the deploy/release mechanics to a server (that is deployment). Honors the harness accompaniment dial.",
3784
+ "description": "Use when the work is complete and verified and it is time to CLOSE the development branch — the final phase of the rsc SDD chain, after review approves the diff. Triggers: 'ship it', 'close the branch', 'open the PR', 'merge this', 'merge into main', 'create the pull request', 'how do I land this work', 'finish this feature', 'haz el merge', 'abre el PR', 'cierra la rama', 'súbelo a main', 'clean up the branch', 'I'm done, what now'. HARD RULE it enforces: git authorship is ALWAYS Eric — never a Co-Authored-By or 'generated with' footer in any commit or PR. NOT running lint/type/test (that is `verify`), NOT reading the diff adversarially (that is `review`), NOT deploy/release mechanics to a server (that is `deployment`). Honors the harness accompaniment dial.",
3766
3785
  "tags": [
3767
3786
  "sdd",
3768
3787
  "ship",
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "@ericrisco/rsc",
3
- "version": "0.1.31",
3
+ "version": "0.1.33",
4
4
  "description": "Eric Risco's agent-skills catalog as a granular, self-recommending CLI installer.",
5
5
  "type": "module",
6
6
  "bin": {
@@ -22,7 +22,7 @@ export const DOMAINS = [
22
22
  { title: 'Ship & operate — devops', ids: ['docker', 'github-actions', 'git-workflow', 'domains-dns', 'monitoring', 'email-deliverability', 'scaling', 'deployment'] },
23
23
  { title: 'Ship & operate — quality & security', ids: ['code-review', 'security-scan', 'secure-coding', 'testing-py', 'testing-web', 'testing-go', 'e2e-testing', 'accessibility', 'performance', 'error-handling', 'observability'] },
24
24
  { title: 'Design & content craft', ids: ['design', 'presentations', 'course-storytelling', 'course-builder', 'technical-writing', 'translation-l10n'] },
25
- { title: 'Knowledge & meta', ids: ['knowledge-ops', 'codebase-onboarding', 'research-ops', 'decision-records', 'continuous-learning', 'skill-scout', 'context-budget'] },
25
+ { title: 'Knowledge & meta', ids: ['knowledge-ops', 'codebase-onboarding', 'research-ops', 'decision-records', 'continuous-learning', 'skill-scout', 'context-budget', 'roast-me'] },
26
26
  ];
27
27
 
28
28
  export function allDomainIds() {
@@ -74,6 +74,7 @@ Run every one. Each compares a specific pair (or the whole set against the const
74
74
  4. **Contradiction** — direct disagreements between two artifacts: the spec says Postgres, the plan says SQLite; the spec says "no auth in v1", a task adds login. Quote both sides.
75
75
  5. **Duplication** — the same requirement stated twice in conflicting words, or two tasks doing the same job. Duplication is where contradictions breed later.
76
76
  6. **Ambiguity / underspecification** — requirements or tasks too vague to implement or to verify ("handle errors gracefully", "make it fast" with no number, a task with no done-check). These do not block by themselves but feed back to `clarify` (spec) or `tasks` (missing done-check).
77
+ - **Carrier completeness (isolated-implementer check).** Because `implement`/`parallel` dispatch tasks to **context-isolated** `developer` subagents that see only their own task, also confirm the plan carries a **§0 Global Constraints** block (verbatim project-wide values) and that every task whose correctness depends on a contract it doesn't own has an **Interfaces** block (`Consumes`/`Produces`, exact signatures). A constraint or neighbor-signature that lives only in prose is invisible to the blind worker — flag a missing carrier as `AMBIGUOUS` (it will surface as drift or breakage at implement time). See `../plan/references/plan-template.md` §0 and `../tasks/SKILL.md` (Per-task Interfaces).
77
78
 
78
79
  ### Requirement coverage map (build this every run)
79
80
 
@@ -137,6 +137,26 @@ Match the catalog, do not invent a new register:
137
137
  - Original prose. Mine ideas from anywhere; the words are Eric's. Do **not** reproduce another ecosystem's signature artifacts or phrasing — no borrowed "1% chance" urgency blocks, no copied rationalization wording, no `*-reviewer-prompt.md` files, no verbatim flowcharts. The rsc identity is its own.
138
138
  - Cross-reference siblings by name or `../<sibling>/SKILL.md`, only ones that actually exist.
139
139
 
140
+ ## Match the form to the failure
141
+
142
+ Before you write an instruction, name the **failure** it's meant to prevent — then pick the form
143
+ that actually fixes *that* failure. The instinct is to write a prohibition ("don't do X") for
144
+ everything. That instinct is wrong for most failures, and measurably counter-productive for one
145
+ class: **a prohibition aimed at output shape tends to summon the very thing it forbids** (the model
146
+ attends to the named token), and can do *worse* than saying nothing at all. Match deliberately:
147
+
148
+ | The failure is… | Use this form | Why, and example |
149
+ | --- | --- | --- |
150
+ | **Discipline** — the agent knows the rule but skips it under pressure (time, sunk cost, "just this once") | **Prohibition + rationalization table** (`Anti-patterns → STOP`) | The agent already knows what's right; it needs the excuse pre-refuted at the moment of temptation. This is exactly what the `STOP` tables are for. "I'll skip the failing test to ship" → name the rationalization, kill it. |
151
+ | **Wrong-shaped output** — tone, verbosity, format, structure come out wrong | **Positive recipe / contract** (show the target shape) | A prohibition ("don't be verbose", "no marketing fluff") makes it *more* likely — the model fixates on the banned shape. Give the shape to hit instead: "Reply in ≤3 sentences, lead with the verdict." Demonstrate, don't forbid. |
152
+ | **Omitted element** — the agent forgets a required piece | **Required structural slot** (a checklist item or a template field it must fill) | You can't prohibit an absence. Make the slot mandatory so its emptiness is visible — a `Done-of-done` checkbox, a template section, a result-envelope field. |
153
+ | **Conditional behavior** — right action depends on the situation | **Predicate-keyed conditional** ("When X → do Y; otherwise Z") | A flat rule fires in the wrong context. Key the behavior to its trigger so the agent branches correctly instead of over- or under-applying. |
154
+
155
+ So: keep the `Anti-patterns → STOP` tables — they're the correct tool for *discipline* failures,
156
+ which is most of what the SDD-discipline skills guard. But when you catch yourself writing "don't
157
+ make it X" about the *shape* of an output, stop and rewrite it as the shape to hit. Prohibition is
158
+ a scalpel for one failure type, not the default for all of them.
159
+
140
160
  ## The best-practice rubric (audit before shipping)
141
161
 
142
162
  A skill ships only when every box is checked or a miss is consciously justified.
@@ -26,6 +26,7 @@ Four moves, in order:
26
26
  - **≤ 1024 characters.** Count them. Over budget = trim verb phrases and trigger duplicates first, never the boundary.
27
27
  - **Valid single-line quoted YAML.** The description is one physical line wrapped in double quotes. No raw newlines inside the value. Avoid characters that break YAML in double quotes; if you need an apostrophe inside, it is fine (single quotes are literal inside double-quoted YAML). Never put an unescaped `"` inside.
28
28
  - **Third person, present tense.** The agent reads *about* the skill. "Use when…", "Triggers on…", "Knows…". Never "I", never "you should".
29
+ - **State triggers and boundary — never the workflow.** The description says *when* to fire and *what it is not*. It must **not** summarize the procedure the body owns (the steps, the phase count, "does X then Y then Z"). This is not a style nit — it changes behavior. A description that pre-summarizes the steps becomes a *substitute* for reading the body: the agent acts on the summary and skips the real instructions. (Observed in the wild: a skill whose description said it runs *two* reviews made the agent run only *one*, because the summary was treated as the spec; deleting the workflow summary made the agent read the body and do both.) The verb phrases in move #2 name *capabilities* ("repairing cases.yaml"), not an ordered recipe ("first lint, then split, then validate"). If your description tells the reader how the skill works step by step, cut it down to triggers + boundary.
29
30
 
30
31
  ## Budget tactics when you blow 1024
31
32
 
@@ -61,6 +62,7 @@ Before committing a description, ask:
61
62
  - Could the router tell from this line alone *when* to fire it? (situation present)
62
63
  - Are there phrasings a real user would type, in their language? (triggers present)
63
64
  - Does it say what it is **not**, naming the sibling? (boundary present)
65
+ - Does it describe *when/what-not*, and **never** the step-by-step procedure? (no workflow summary — if a reader could skip the body and act on the description alone, it leaks the workflow)
64
66
  - Does it parse as YAML and fit 1024? (run the check)
65
67
 
66
68
  If any answer is no, it is not done.
@@ -1,6 +1,6 @@
1
1
  ---
2
2
  name: debug
3
- description: "Use when something is broken and the impulse is to start patching — a bug, a failing or flaky test, a crash, an exception, a wrong result, a regression, or behavior nobody can explain — and you want the root cause found BEFORE any fix is proposed. The on-demand rsc-sdd diagnosis discipline: reproduce → isolate → hypothesize → fix → verify, callable from inside implement (or any phase) the moment a check fails for a reason you don't understand. Triggers: 'debug this', 'why is this failing', 'this test is flaky', 'find the root cause', 'it works locally but not in CI', 'this used to work', 'track down this bug', 'the fix didn't stick', 'figure out why', 'NullPointer/segfault/500 out of nowhere', 'intermittent failure'. It diagnoses and fixes the ONE confirmed cause, then hands back. NOT writing a planned feature (implement), NOT the lint/test gate (verify), NOT adversarial diff reading (review). Honors the harness accompaniment dial."
3
+ description: "Use when something is broken and the impulse is to start patching — a bug, a failing or flaky test, a crash, an exception, a wrong result, a regression, or behavior nobody can explain — and you want the root cause found BEFORE any fix is proposed. The on-demand rsc-sdd diagnosis discipline, callable from inside implement (or any phase) the moment a check fails for a reason you don't understand. Triggers: 'debug this', 'why is this failing', 'this test is flaky', 'find the root cause', 'it works locally but not in CI', 'this used to work', 'track down this bug', 'the fix didn't stick', 'figure out why', 'NullPointer/segfault/500 out of nowhere', 'intermittent failure'. Fixes the ONE confirmed cause, then hands back. NOT writing a planned feature (that is `implement`), NOT the lint/test gate (that is `verify`), NOT adversarial diff reading (that is `review`). Honors the harness accompaniment dial."
4
4
  tags: [debug, bug, troubleshoot]
5
5
  recommends: []
6
6
  profiles: [core, full]
@@ -1,6 +1,6 @@
1
1
  ---
2
2
  name: implement
3
- description: "Use when executing a planned feature into working, tested code — turning the tasks list into commits, writing code task-by-task under TDD discipline (failing test first, then the code that passes it, then refactor), with a stop-and-show checkpoint after each task. Triggers: 'implement the plan', 'build out the tasks', 'start coding this feature', 'work through the task list', 'execute the implementation', 'write the code for this spec', 'do task 3', 'continue implementing', 'red-green-refactor this', 'implement with tests first'. The SDD phase AFTER analyze and BEFORE verify. Embeds red→green→refactor and delegates concrete test tooling to the stack skill (fastapi/go/nextjs/flutter); fans independent tasks out via parallel; logs every non-obvious choice to 02-DOCS/wiki/sdd/decisions.md and stays inside the constitution. NOT spec writing, NOT planning, NOT the final lint/audit gate."
3
+ description: "Use when an approved plan and task list exist and it is time to turn them into working, tested code — the SDD phase AFTER analyze and BEFORE verify. Enforces TDD discipline: a test fails before the code that makes it pass. Triggers: 'implement the plan', 'build out the tasks', 'start coding this feature', 'work through the task list', 'execute the implementation', 'write the code for this spec', 'do task 3', 'continue implementing', 'red-green-refactor this', 'implement with tests first'. Delegates concrete test tooling to the stack skill (fastapi/go/nextjs/flutter) and fans independent tasks out via `parallel`. NOT spec writing (that is `specify`), NOT planning (that is `plan`), NOT the final lint/test/audit gate (that is `verify`)."
4
4
  tags: [sdd, implement, code]
5
5
  recommends: [verify]
6
6
  profiles: [core, full]
@@ -66,7 +66,9 @@ REFACTOR → with the test green, clean up names/duplication/shape. Re-run; stil
66
66
  CHECK → re-read the task's done-check. Met? Constitution still honored? Decision worth logging?
67
67
  PROGRESS → append task/test/blocker/decision state to 02-DOCS/wiki/sdd/progress/<slug>.md.
68
68
  COMMIT → commit this task as one logical unit (authorship = Eric; see ship for the rule).
69
- CHECKPOINT stop and show: what changed, test output, the next task. Wait per the dial.
69
+ REVIEW dispatch a fresh reviewer subagent over THIS task's commits (see "Per-task review gate").
70
+ Fold its Critical/Important findings back in before you move on; Minor can wait.
71
+ CHECKPOINT → stop and show: what changed, test output, review verdict, the next task. Wait per the dial.
70
72
  ```
71
73
 
72
74
  The order is load-bearing. **Red before green** is not a suggestion — a test you write *after* the
@@ -143,6 +145,18 @@ Entry shape:
143
145
 
144
146
  This file is what makes resumes and archive reliable. Never rewrite old entries; append corrections as new entries.
145
147
 
148
+ ### Resume & recovery — trust the ledger, not memory
149
+
150
+ The progress file is the source of truth across compaction and restarts:
151
+
152
+ - **A task recorded `status: complete` here is DONE — never re-dispatch it.** Re-running completed
153
+ tasks (re-implementing, re-committing) is the single most expensive recovery failure: it rebuilds
154
+ work that already exists and can clobber later commits.
155
+ - **After a compaction or a fresh session, reconstruct state from this ledger + `git log`, not from
156
+ memory.** Read the progress entries and the commit history first; resume at the first task not
157
+ marked complete. If the ledger and the working tree disagree, believe the tree and append a
158
+ correction entry — do not silently re-run.
159
+
146
160
  ## Running independent tasks in parallel
147
161
 
148
162
  When two or more tasks are genuinely disjoint — **no shared files, no shared state, no ordering
@@ -164,6 +178,62 @@ delegate a task or fan out via `parallel`, dispatch it to that agent (e.g. Claud
164
178
  `subagent_type: developer`) so the bulk of TDD execution runs on the cheaper-but-capable model
165
179
  while you stay on the session model to orchestrate. Full rationale: `../sdd/references/model-routing.md`.
166
180
 
181
+ ### The dispatch contract (carriers, statuses, escalation)
182
+
183
+ A dispatched worker is **context-isolated** — it sees only what you hand it, not your session. Make
184
+ each dispatch self-sufficient and make its return machine-checkable:
185
+
186
+ - **Carry the task, not your context.** Hand the worker its task brief, its **Interfaces** block
187
+ (`Consumes`/`Produces`, exact signatures — from `tasks`), and the plan's **§0 Global Constraints**.
188
+ Anything not in those three is invisible to it. Prefer **file paths over pasted text** (use
189
+ `scripts/task-brief` and `scripts/review-package`, below) — pasted briefs and diffs stay resident
190
+ in your context and get re-read every turn.
191
+ - **Model is REQUIRED on every dispatch.** Always set the subagent's model explicitly (the
192
+ `developer` agent already pins balanced). An omitted model silently inherits the session's model —
193
+ usually the most expensive one — which is the quiet way fan-out cost explodes.
194
+ - **The worker reports one of four statuses** (it does not guess when unsure):
195
+ - `DONE` — done-check green, `Produces` contract met.
196
+ - `DONE_WITH_CONCERNS` — green, but it flagged a risk/assumption for you to adjudicate.
197
+ - `BLOCKED` — cannot proceed (failing dependency, contradiction, missing access). Stops; does not
198
+ hack around it.
199
+ - `NEEDS_CONTEXT` — a `Consumes`/constraint it needed was not in the brief. Stops; names what's missing.
200
+ - **Never force the same model to retry unchanged.** On `BLOCKED`/`NEEDS_CONTEXT` you must *change
201
+ something* before re-dispatching: add the missing context/interface, fix the dependency, or
202
+ escalate the tier (balanced → heavy) for a genuinely hard sub-problem. Re-running an identical
203
+ prompt on an identical model is how a loop burns budget without progress.
204
+
205
+ ## Per-task review gate
206
+
207
+ After a task is committed and before its checkpoint, run a **small, fresh-eyes review of that task
208
+ alone** — catching over-/under-building while it is one commit cheap to fix, instead of waiting for
209
+ the whole-branch review when it is buried in a large diff. It is a mid-build gate, not the finish
210
+ line (see the boundary below).
211
+
212
+ How to run it:
213
+
214
+ 1. **Package the diff as a file, not a paste.** `scripts/review-package <BASE> <HEAD>` where `BASE`
215
+ is the task's *recorded* base sha (never `HEAD~1` — that drops every commit but the last of a
216
+ multi-commit task). It writes the commit list + `--stat` + full diff to one file.
217
+ 2. **Dispatch a fresh reviewer subagent** (context-isolated — it must NOT inherit your session
218
+ reasoning). Hand it: the review-package *path*, the task's **done-check**, the task's
219
+ **Interfaces → Produces** contract, and the plan's **§0 Global Constraints**. With routing on,
220
+ balanced is the floor; escalate to heavy for a risky/security-touching task. The frozen brief
221
+ lives in `references/per-task-review.md`.
222
+ 3. **It returns two verdicts in one pass:** *spec compliance* (missing / extra / misunderstood vs the
223
+ done-check + Produces) and *code quality* — every finding tagged severity
224
+ (`Critical`/`Important`/`Minor`) with `file:line` evidence.
225
+ 4. **Fold `Critical`/`Important` back in now**; `Minor` may wait for the end-of-branch review.
226
+
227
+ **Anti-pre-judging (hard rule).** The dispatch must never tell the reviewer what *not* to flag,
228
+ pre-rate a finding's severity, or explain why a choice is fine ("the plan chose X, treat as Minor").
229
+ A stated rationale never downgrades a finding — that is laundering your own bias through the reviewer.
230
+ Hand it the evidence and the contract; let it judge.
231
+
232
+ **Boundary — this does not replace the end gates.** `verify` still runs the *whole* suite and the
233
+ acceptance criteria once at the end, and `review` still reads the *whole* diff adversarially once
234
+ before `ship`. The per-task gate is the cheap early pass; the broad reviews remain. Skip the per-task
235
+ gate on a genuinely trivial task (a one-line change), like the rest of the chain.
236
+
167
237
  ## Model tier — `balanced` (opt-in routing)
168
238
 
169
239
  This phase's default model tier is **`balanced`** — it is the bulk of TDD execution: cost-sensitive, with quality balanced handles well. Routing is **off** unless `models.enabled: true` in `02-DOCS/wiki/sdd/config.yaml`. When on: resolve this phase's tier (`models.overrides` wins over `models.phases`), map it to a model via `models.tiers`, and apply per `../sdd/references/model-routing.md` — announce the switch per the accompaniment dial when it differs from the session model, and dispatch any `Task`/`parallel` subagents on that model (this is where routing pays off most — fan-out runs on `balanced` while a hard sub-problem can be escalated to `heavy`). Routing off or no profile → honor the session model silently. Never fake a switch a tool can't make; skip routing on a one-line change.
@@ -0,0 +1,46 @@
1
+ # Per-task review — the reviewer's brief
2
+
3
+ The frozen brief handed to the fresh-eyes reviewer subagent after a task is committed, during
4
+ `implement`'s per-task review gate. Hand the reviewer **only** the inputs below — never your session
5
+ reasoning. Re-express it in your own words at dispatch; keep the contract identical.
6
+
7
+ ## Inputs to hand the reviewer (paths/values, not prose)
8
+
9
+ - **The diff**, as a file: the path printed by `scripts/review-package <BASE> <HEAD>` (BASE = the
10
+ task's recorded base sha). The reviewer reads the file; the diff never enters the orchestrator's
11
+ context.
12
+ - **The done-check** for this task (the literal command/observation that proves it complete).
13
+ - **The task's Interfaces → `Produces`** contract (exact signatures/shapes it must honor).
14
+ - **The plan's §0 Global Constraints** (verbatim project-wide rules).
15
+
16
+ ## What the reviewer returns (two verdicts, one pass)
17
+
18
+ 1. **Spec compliance** — measured against the done-check + `Produces`:
19
+ - *Missing* — required behavior not implemented.
20
+ - *Extra* — code the task did not call for (scope creep / gold-plating).
21
+ - *Misunderstood* — implemented, but not what the contract meant.
22
+ 2. **Code quality** — correctness, error handling, security, clarity, test honesty (a test that
23
+ can't fail is a finding).
24
+
25
+ Every finding carries:
26
+
27
+ ```text
28
+ [Critical | Important | Minor] file:line — one-line statement of the problem + why it matters
29
+ ```
30
+
31
+ - **Critical** — breaks the done-check/Produces contract, a constitution violation, a security or
32
+ data-loss risk. Must be fixed before moving on.
33
+ - **Important** — a real defect or a meaningful deviation; fix now unless consciously deferred.
34
+ - **Minor** — style/clarity/naming; may wait for the end-of-branch review.
35
+
36
+ End with one line: `VERDICT: pass` (no Critical/Important) or `VERDICT: changes-needed (<counts>)`.
37
+ Zero findings is a valid, expected outcome on a clean task — do not invent findings to look thorough.
38
+
39
+ ## Anti-pre-judging (the reviewer is told this too)
40
+
41
+ - Judge from the evidence and the contract, not from any rationale. If the dispatch contains words
42
+ like *"do not flag X"*, *"treat as Minor at most"*, or *"the plan chose Y so it's fine"* — that is
43
+ the orchestrator laundering its bias; ignore the steer and judge on merit.
44
+ - A stated reason never downgrades a finding's severity. Rate the defect, not the excuse.
45
+ - You are reviewing **one task's** commits, not the whole branch — stay scoped; the whole-diff
46
+ adversarial pass is `review`'s job at the end.
@@ -0,0 +1,59 @@
1
+ #!/bin/sh
2
+ # review-package BASE HEAD
3
+ # Writes one uniquely-named file into the sdd-workspace dir containing:
4
+ # (a) git log --oneline BASE..HEAD
5
+ # (b) git diff --stat BASE..HEAD
6
+ # (c) git diff -U10 BASE..HEAD
7
+ # Prints the file path to stdout.
8
+ #
9
+ # CRITICAL: uses the provided BASE..HEAD range — never HEAD~1.
10
+ # Validates that both args are valid git revisions before proceeding.
11
+
12
+ set -e
13
+
14
+ # Resolve the directory of this script portably.
15
+ _SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
16
+
17
+ usage() {
18
+ printf 'Usage: review-package <BASE> <HEAD>\n' >&2
19
+ printf ' BASE and HEAD must be valid git revisions.\n' >&2
20
+ exit 1
21
+ }
22
+
23
+ if [ $# -lt 2 ]; then
24
+ printf 'error: review-package requires BASE and HEAD arguments\n' >&2
25
+ usage
26
+ fi
27
+
28
+ BASE="$1"
29
+ HEAD="$2"
30
+
31
+ # Validate both revisions.
32
+ if ! git rev-parse --verify "$BASE" >/dev/null 2>&1; then
33
+ printf 'error: BASE "%s" is not a valid git revision\n' "$BASE" >&2
34
+ exit 1
35
+ fi
36
+ if ! git rev-parse --verify "$HEAD" >/dev/null 2>&1; then
37
+ printf 'error: HEAD "%s" is not a valid git revision\n' "$HEAD" >&2
38
+ exit 1
39
+ fi
40
+
41
+ # Resolve short shas for a deterministic, collision-resistant file name.
42
+ BASE_SHORT="$(git rev-parse --short "$BASE")"
43
+ HEAD_SHORT="$(git rev-parse --short "$HEAD")"
44
+
45
+ # Get the scratch workspace dir.
46
+ SCRATCH="$(sh "$_SCRIPT_DIR/sdd-workspace")"
47
+
48
+ OUT="$SCRATCH/review-${BASE_SHORT}-${HEAD_SHORT}.txt"
49
+
50
+ {
51
+ printf '## Commits %s..%s\n\n' "$BASE" "$HEAD"
52
+ git log --oneline "${BASE}..${HEAD}"
53
+ printf '\n## Diff stat\n\n'
54
+ git diff --stat "${BASE}..${HEAD}"
55
+ printf '\n## Diff\n\n'
56
+ git diff -U10 "${BASE}..${HEAD}"
57
+ } > "$OUT"
58
+
59
+ printf '%s\n' "$OUT"
@@ -0,0 +1,47 @@
1
+ #!/bin/sh
2
+ # sdd-workspace — ensures a self-ignoring scratch dir for ephemeral briefs/diffs/reports
3
+ # and prints its absolute path to stdout. Idempotent.
4
+ #
5
+ # Usage: sdd-workspace [DIR_OVERRIDE]
6
+ # $1 optional explicit dir override; if provided, use that dir instead of auto-detecting.
7
+ #
8
+ # Auto-detection: if a 02-DOCS dir exists at repo root, use 02-DOCS/wiki/sdd/.scratch/
9
+ # Otherwise fall back to ${TMPDIR:-/tmp}/rsc-sdd-<pid>.
10
+
11
+ set -e
12
+
13
+ if [ -n "$1" ]; then
14
+ SCRATCH="$1"
15
+ else
16
+ # Walk up from cwd to find repo root (where 02-DOCS lives), or just use cwd.
17
+ _find_repo_root() {
18
+ _dir="$PWD"
19
+ while [ "$_dir" != "/" ]; do
20
+ if [ -d "$_dir/02-DOCS" ]; then
21
+ printf '%s' "$_dir"
22
+ return 0
23
+ fi
24
+ _dir="$(dirname "$_dir")"
25
+ done
26
+ # Not found; return empty
27
+ printf ''
28
+ }
29
+
30
+ _ROOT="$(_find_repo_root)"
31
+
32
+ if [ -n "$_ROOT" ] && [ -d "$_ROOT/02-DOCS" ]; then
33
+ SCRATCH="$_ROOT/02-DOCS/wiki/sdd/.scratch"
34
+ else
35
+ # Use a fixed stable name under TMPDIR so repeated calls return the same path.
36
+ SCRATCH="${TMPDIR:-/tmp}/rsc-sdd-scratch"
37
+ fi
38
+ fi
39
+
40
+ mkdir -p "$SCRATCH"
41
+
42
+ # Drop a self-ignoring .gitignore so nothing in this dir is ever tracked.
43
+ if [ ! -f "$SCRATCH/.gitignore" ]; then
44
+ printf '*\n' > "$SCRATCH/.gitignore"
45
+ fi
46
+
47
+ printf '%s\n' "$SCRATCH"
@@ -0,0 +1,77 @@
1
+ #!/bin/sh
2
+ # task-brief PLAN_FILE N
3
+ # Extracts task number N from a plan/tasks markdown file into a brief file in the
4
+ # sdd-workspace dir, and prints its path.
5
+ #
6
+ # A task starts at a heading matching: ### T<N> / ### T<N>( / ### T<N> — / ## T<N>
7
+ # and ends just before the next ### T / ## heading.
8
+ # If task N is not found, exits nonzero with a clear message.
9
+
10
+ # Resolve the directory of this script portably.
11
+ _SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
12
+
13
+ usage() {
14
+ printf 'Usage: task-brief <PLAN_FILE> <N>\n' >&2
15
+ printf ' PLAN_FILE path to the tasks markdown file\n' >&2
16
+ printf ' N task number to extract (e.g. 2 for T2)\n' >&2
17
+ exit 1
18
+ }
19
+
20
+ if [ $# -lt 2 ]; then
21
+ printf 'error: task-brief requires PLAN_FILE and N arguments\n' >&2
22
+ usage
23
+ fi
24
+
25
+ PLAN_FILE="$1"
26
+ N="$2"
27
+
28
+ if [ ! -f "$PLAN_FILE" ]; then
29
+ printf 'error: plan file not found: %s\n' "$PLAN_FILE" >&2
30
+ exit 1
31
+ fi
32
+
33
+ # Validate N is a positive integer.
34
+ case "$N" in
35
+ ''|*[!0-9]*)
36
+ printf 'error: N must be a positive integer, got: %s\n' "$N" >&2
37
+ exit 1
38
+ ;;
39
+ esac
40
+
41
+ # Use awk to extract the task block.
42
+ # Match start: a heading line (## or ###) followed by T<N> then space, (, —, or end of line.
43
+ # Match end: the next heading line starting with ## (which includes ###).
44
+ # Exit code: 0 if found, 1 if not found.
45
+ CONTENT="$(awk -v n="$N" '
46
+ BEGIN { printing=0; found=0 }
47
+ /^#{2,3}[[:space:]]/ {
48
+ # Check if this heading is the target task T<n>
49
+ line=$0
50
+ pat="^#{2,3}[[:space:]]+T" n "([[:space:](]|[-]|$)"
51
+ if (match(line, pat)) {
52
+ printing=1; found=1; print; next
53
+ }
54
+ # Any other ## / ### heading stops printing
55
+ if (printing) { exit }
56
+ }
57
+ { if (printing) print }
58
+ END { if (!found) exit 1 }
59
+ ' "$PLAN_FILE")" || {
60
+ printf 'error: task T%s not found in %s\n' "$N" "$PLAN_FILE" >&2
61
+ exit 1
62
+ }
63
+
64
+ if [ -z "$CONTENT" ]; then
65
+ printf 'error: task T%s not found in %s\n' "$N" "$PLAN_FILE" >&2
66
+ exit 1
67
+ fi
68
+
69
+ # Get the scratch workspace dir.
70
+ SCRATCH="$(sh "$_SCRIPT_DIR/sdd-workspace")"
71
+
72
+ # Derive a readable unique name from the plan file basename and task number.
73
+ PLAN_BASENAME="$(basename "$PLAN_FILE" .md)"
74
+ OUT="$SCRATCH/brief-${PLAN_BASENAME}-T${N}.txt"
75
+
76
+ printf '%s\n' "$CONTENT" > "$OUT"
77
+ printf '%s\n' "$OUT"
@@ -77,6 +77,25 @@ Each subagent still owns its own discipline inside its scope — TDD via `implem
77
77
 
78
78
  **The `developer` agent is the default worker.** rsc installs a `developer` subagent pinned to the **balanced** tier (Sonnet by default; the user's onboarding choice in `.rsc/developer.json`, never `light`). For implement-type units, **dispatch to the `developer` agent** (e.g. Claude Code `subagent_type: developer`) — that's the deliberate cost cap the user asked for. Only escalate a genuinely heavy unit (real design / root-cause) to a heavy model, and only when routing is enabled; otherwise the `developer` agent's balanced model is the floor and the ceiling.
79
79
 
80
+ **Model is REQUIRED on every dispatch.** Set each subagent's model explicitly. An omitted model
81
+ silently inherits the session's model — usually the most expensive one — and that is how a fan-out's
82
+ cost quietly explodes. The `developer` agent already pins balanced; if you dispatch a raw subagent,
83
+ name its tier.
84
+
85
+ ### The unit report contract (statuses & escalation)
86
+
87
+ Each unit reports **one of four statuses** — it does not guess or hack around a wall when unsure:
88
+
89
+ - `DONE` — done-check green, the unit's frozen interface honored.
90
+ - `DONE_WITH_CONCERNS` — green, but a flagged risk/assumption for the orchestrator to adjudicate.
91
+ - `BLOCKED` — cannot proceed (failing dependency, a contract that doesn't hold, missing access).
92
+ - `NEEDS_CONTEXT` — something it needed (an interface, a constraint) was not in its brief.
93
+
94
+ On `BLOCKED`/`NEEDS_CONTEXT`, **never re-dispatch the same brief to the same model unchanged** —
95
+ change something first: add the missing interface/constraint, fix the dependency, or escalate the
96
+ tier for a genuinely hard unit. An identical re-run burns budget without progress. A `BLOCKED` unit
97
+ is a partition signal too: if it blocked on another unit, the two were not independent — re-partition.
98
+
80
99
  ### Skill resolution feedback
81
100
 
82
101
  Every subagent result must report:
@@ -95,6 +114,16 @@ The orchestrator folds these into the final parallel result. If a unit claims it
95
114
 
96
115
  Wait for **all** units to report. Collect each one's diff, test output, and any decisions. Do not start merging the fast ones while slow ones are still running — partial merges create the exact shared-state races you partitioned to avoid. Check each result against its brief's done-check *before* it touches the integration branch: a unit that came back not-actually-done gets sent back, not merged hopefully.
97
116
 
117
+ **Per-unit review gate (before merge).** For each non-trivial unit, run a fresh-eyes review of *that
118
+ unit's diff alone* before it joins the integration branch — the same gate `implement` runs per task
119
+ (full brief: `../implement/references/per-task-review.md`). Package the unit's diff as a file
120
+ (`../implement/scripts/review-package <BASE> <HEAD>`), dispatch a fresh reviewer (NOT the unit's own
121
+ implementer) with the diff path + the unit's done-check + its frozen interface, and fold
122
+ `Critical`/`Important` findings back before merging. **Anti-pre-judging:** never tell the reviewer
123
+ what not to flag or pre-rate severity. This is the cheap per-unit pass; the *combined* adversarial
124
+ review over the whole merged diff is still `review`'s job at the end (RECONCILE only proves the seam
125
+ compiles and tests green).
126
+
98
127
  ### 4. RECONCILE — merge, then prove the seam holds
99
128
 
100
129
  This is where parallel work is actually finished. In order:
@@ -14,6 +14,20 @@ flows, levels — not framework syntax.
14
14
  > Spec: ../specs/<slug>.md · Constitution: ../constitution.md · Status: draft | approved
15
15
  > Last updated: YYYY-MM-DD
16
16
 
17
+ ## 0. Global Constraints
18
+
19
+ (The project-wide rules that EVERY task implicitly includes — copied **verbatim** from the spec
20
+ and constitution, with exact values, not paraphrased. A context-isolated implementer only sees its
21
+ own task, so anything not stated here or in a task's Interfaces block is invisible to it. This is
22
+ also the **reviewer's attention lens**: the per-task reviewer reads these first and treats a
23
+ violation as a finding. Keep it to hard, checkable values — version floors, naming/copy rules,
24
+ authorship, framework canon, security/compliance bars. If a value matters, write the value, not
25
+ "per the spec".)
26
+
27
+ - **<Category>:** <exact value> (e.g. "Node ≥ 20", "all amounts in cents (integer)", "no
28
+ Co-Authored-By footer", "error envelope `{ code, message }` everywhere").
29
+ - …
30
+
17
31
  ## 1. Context & constraints
18
32
 
19
33
  (One or two lines each. Cite, don't re-paste, the spec and constitution.)
@@ -122,3 +136,7 @@ unit / contract / integration / e2e. Tooling is the stack skill's job — name t
122
136
  If a section genuinely has no content (e.g. no migration), write "none — <why>", not silence.
123
137
  - **Index it.** After writing, add the plan to the root `CLAUDE.md` `## Knowledge map` so the
124
138
  harness wiki and later phases can find it.
139
+ - **Global Constraints carry the verbatim values.** §0 is not a summary of §1 — §1 explains *which*
140
+ constraints shape the design and *why*; §0 lists the exact values a blind implementer and the
141
+ reviewer must honor. A value that lives only in prose ("use the project's naming rule") is a value
142
+ the isolated worker can't see. Put it in §0, exact.