@lemoncode/lemony 0.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (68)
  1. package/LICENSE +21 -0
  2. package/PRIVACY.md +147 -0
  3. package/README.md +189 -0
  4. package/catalog/VERSION +1 -0
  5. package/catalog/agents/README.md +29 -0
  6. package/catalog/agents/architect.md +81 -0
  7. package/catalog/agents/fit-assessment.md +94 -0
  8. package/catalog/agents/implementer.md +67 -0
  9. package/catalog/agents/orchestrator.md +627 -0
  10. package/catalog/agents/reviewer.md +124 -0
  11. package/catalog/agents/spec-author.md +69 -0
  12. package/catalog/agents/ui-designer.md +25 -0
  13. package/catalog/commands/add-capability.md +69 -0
  14. package/catalog/commands/bypass.md +40 -0
  15. package/catalog/commands/define.md +24 -0
  16. package/catalog/commands/hotfix.md +47 -0
  17. package/catalog/commands/pause.md +52 -0
  18. package/catalog/commands/resume.md +56 -0
  19. package/catalog/commands/spinoff.md +59 -0
  20. package/catalog/commands/triage.md +24 -0
  21. package/catalog/harness.config.schema.json +116 -0
  22. package/catalog/hooks/README.md +56 -0
  23. package/catalog/hooks/init.sh +281 -0
  24. package/catalog/hooks/lib/lemony.sh +41 -0
  25. package/catalog/hooks/lib/playbook-scan.sh +394 -0
  26. package/catalog/hooks/lib/transcript-grep.sh +56 -0
  27. package/catalog/hooks/require-playbook.sh +97 -0
  28. package/catalog/hooks/session-close.sh +232 -0
  29. package/catalog/hooks/suggest-playbook.sh +72 -0
  30. package/catalog/playbook-format.md +198 -0
  31. package/catalog/schemas/README.md +13 -0
  32. package/catalog/schemas/tier2-events-history.md +104 -0
  33. package/catalog/schemas/tier2-events.md +286 -0
  34. package/catalog/skills/README.md +62 -0
  35. package/catalog/skills/bootstrap-architecture/SKILL.md +78 -0
  36. package/catalog/skills/code-explorer/SKILL.md +76 -0
  37. package/catalog/skills/grill-with-docs/ADR-FORMAT.md +49 -0
  38. package/catalog/skills/grill-with-docs/CONTEXT-FORMAT.md +77 -0
  39. package/catalog/skills/grill-with-docs/SKILL.md +270 -0
  40. package/catalog/skills/grill-with-docs/reference.md +236 -0
  41. package/catalog/skills/mutation-testing/SKILL.md +84 -0
  42. package/catalog/skills/note-side-finding/SKILL.md +89 -0
  43. package/catalog/skills/playbook-iterate/SKILL.md +78 -0
  44. package/catalog/skills/prd-to-spec/SKILL.md +181 -0
  45. package/catalog/skills/raise-discovery/SKILL.md +112 -0
  46. package/catalog/skills/resolve-discovery/SKILL.md +123 -0
  47. package/catalog/skills/review-pr/SKILL.md +106 -0
  48. package/catalog/skills/review-pr/reference.md +105 -0
  49. package/catalog/skills/security-review/SKILL.md +90 -0
  50. package/catalog/skills/senior-review/SKILL.md +99 -0
  51. package/catalog/skills/silent-failure-hunter/SKILL.md +76 -0
  52. package/catalog/skills/spec-compliance-check/SKILL.md +74 -0
  53. package/catalog/skills/spec-to-issue/SKILL.md +88 -0
  54. package/catalog/skills/task-closeout/SKILL.md +229 -0
  55. package/catalog/skills/tdd/SKILL.md +171 -0
  56. package/catalog/skills/test-gap-report/SKILL.md +71 -0
  57. package/catalog/skills/triage-issue/SKILL.md +102 -0
  58. package/catalog/skills/update-architecture/SKILL.md +69 -0
  59. package/catalog/skills/verify/SKILL.md +90 -0
  60. package/catalog/skills/write-adr/SKILL.md +77 -0
  61. package/catalog/templates/README.md +32 -0
  62. package/catalog/templates/claude-code/.claude/settings.json.tpl +34 -0
  63. package/catalog/templates/claude-code/agents.md.tpl +109 -0
  64. package/catalog/templates/claude-code/docs/playbooks/README.md.tpl +96 -0
  65. package/catalog/templates/claude-code/harness.config.yml.tpl +59 -0
  66. package/catalog/templates/claude-code/state/history.md.tpl +6 -0
  67. package/dist/cli.mjs +5691 -0
  68. package/package.json +80 -0
package/LICENSE ADDED
@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2026 Lemoncode
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
package/PRIVACY.md ADDED
@@ -0,0 +1,147 @@
1
+ # Privacy
2
+
3
+ _Last updated: 2026-06-21 · Applies to: Lemony anonymous telemetry (v1)._
4
+
5
+ Lemony collects a small amount of **anonymous** usage telemetry to improve the
6
+ harness. **It is on by default**; you can turn it off at any time (see
7
+ [How to opt out](#how-to-opt-out)), and you can inspect exactly what would be sent
8
+ with `lemony telemetry show`. This document explains exactly what is collected, why
9
+ it is genuinely anonymous, who the parties are, and how to turn it off. It is written
10
+ for the developers who run Lemony and is versioned with the code — when the posture
11
+ changes, the change is in git history.
12
+
13
+ > **Scope.** This describes Lemony's **v1 `anonymous` tier only** — the only tier
14
+ > that ships today, on by default. A future `project` tier (identified at
15
+ > org-level, for contracted Lemoncode clients) is designed but **not built or
16
+ > shipped**; when it is, this document gains the corresponding lawful-basis,
17
+ > retention, and data-subject-rights sections, after a formal legal review.
18
+
19
+ ## Who we are
20
+
21
+ - **Data controller:** Lemoncode S.L.
22
+ - **Contact:** daniel.sanchez@lemoncode.net
23
+
24
+ ## What is collected
25
+
26
+ Telemetry events are written locally to `.claude/state/events.jsonl` as you work.
27
+ When telemetry is on, each event is **sanitized** before it leaves your machine and
28
+ only the anonymous projection is sent. The sanitizer keeps two kinds of field and
29
+ drops everything else (fail-closed — any field not explicitly allowed is dropped):
30
+
31
+ **Collected (anonymous):**
32
+
33
+ - `ts` — the event timestamp.
34
+ - `harness_version` — the Lemony version that produced the event.
35
+ - **Numeric metrics** — e.g. session duration, cycle time, review-rejection counts,
36
+ iteration counts, step counts.
37
+ - **Internal enumerations** — fixed, code-defined categories: the event `type` (one
38
+ of nine), task levels, close reasons, severities, checkpoint results.
39
+ - **`attributed_name`** — a short label naming the harness **component** (an agent,
40
+ skill, or playbook) attributed to an event; this is the metric we most care about
41
+ ("which component causes friction"). For the components Lemony ships, these are
42
+ roster names shared across every install — not your code, your project, or free
43
+ text. (It is a bounded free string, not a strictly-enforced enum; the
44
+ custom-local-component edge is covered by the anonymity caveat below.)
45
+
46
+ **Never sent:**
47
+
48
+ - `user` (your git identity) — this is marked **local-only**: it never leaves your
49
+ machine in any tier.
50
+ - `project` and task identifiers (`task_id`, `parent_task_id`) — project-correlating
51
+ identifiers, dropped from the anonymous projection.
52
+ - **Free text** — anything you typed: spec topics, follow-up descriptions, review
53
+ rejection reasons, bypass reasons.
54
+
55
+ You can see exactly what would be sent for your own data at any time:
56
+
57
+ ```bash
58
+ lemony telemetry show # prints each local event raw, then its sanitized projection
59
+ lemony telemetry status # shows whether telemetry is on/off and why
60
+ ```
61
+
62
+ The full field-by-field classification is the schema document
63
+ [`catalog/schemas/tier2-events.md`](./catalog/schemas/tier2-events.md); the
64
+ sanitizer code (`src/telemetry/`) is its executable mirror and is public.
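The fail-closed rule described above can be illustrated with a short allow-list sketch. This is not the code in `src/telemetry/`: the names `ALLOWED_FIELDS` and `sanitize`, the flat event shape, and the `duration_ms` and `reason` field names are assumptions made for illustration. The authoritative field classification remains `catalog/schemas/tier2-events.md`.

```typescript
// Hypothetical sketch, NOT the shipped sanitizer.
// Assumed for illustration: a flat event object and the field names
// `duration_ms` and `reason`.
type RawEvent = Record<string, unknown>;

// Fail-closed: only keys listed here survive; any other key is dropped.
const ALLOWED_FIELDS: ReadonlySet<string> = new Set([
  "ts",
  "harness_version",
  "type",
  "attributed_name",
  "duration_ms", // assumed name for one numeric metric
]);

function sanitize(event: RawEvent): RawEvent {
  const out: RawEvent = {};
  for (const [key, value] of Object.entries(event)) {
    if (ALLOWED_FIELDS.has(key)) {
      out[key] = value;
    }
  }
  return out;
}

const projected = sanitize({
  ts: "2026-06-21T10:00:00Z",
  harness_version: "0.1.0",
  type: "session_closed",
  duration_ms: 5400,
  user: "jane@example.com", // local-only: dropped
  task_id: "42", // project-correlating: dropped
  reason: "free text typed by the user", // free text: dropped
});
console.log(projected);
```

The design point is that a new field added to an event is dropped by default until someone adds it to the allow-list, so a forgotten classification cannot leak data.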
65
+
66
+ ## Why this is genuinely anonymous
67
+
68
+ The `anonymous` tier carries **no persistent identifier** — no account, no device
69
+ ID, no install UUID — so events cannot be linked to you or to each other across
70
+ sessions. The only non-numeric values are timestamps and fixed internal
71
+ enumerations; the one component-name field, `attributed_name`, refers to a
72
+ catalog component that is identical across all installs, so it identifies a part of
73
+ **Lemony**, not a person or a project.
74
+
75
+ Because the data carries no identifier and cannot reasonably be re-linked to an
76
+ individual, we **regard it as genuinely anonymous and, in our assessment, outside the
77
+ scope of the GDPR** — pending the formal legal review noted below. This assessment
78
+ rests on the data staying genuinely anonymous: if a future change ever introduced a
79
+ re-identification vector — e.g. unique locally-authored component names in a small
80
+ population — that change would have to revisit this classification first.
81
+
82
+ ## Cloudflare sub-processor and your IP address
83
+
84
+ Telemetry is sent over HTTPS to a Cloudflare Worker that writes the anonymous
85
+ payload to private storage.
86
+
87
+ - **Cloudflare, Inc.** acts as a **sub-processor**. Because Cloudflare terminates
88
+ TLS at its edge, it transiently sees the **source IP address** of the request.
89
+ - **We never read or store the IP address.** The Lemony Worker does not log it,
90
+ does not persist it, and there is no IP Logpush. The IP exists only in transit and
91
+ is used by Cloudflare's edge for connection handling and an abuse / rate-limit
92
+ rule.
93
+
94
+ The IP, while in transit, is the only piece of personal data touched anywhere in
95
+ this pipeline. Our lawful basis for that transient processing is **legitimate
96
+ interest** — operating the service securely and preventing abuse — and it is
97
+ minimized to the maximum: not persisted, not logged, never associated with the
98
+ anonymous payload.
99
+
100
+ ## Retention
101
+
102
+ Because the collected payload is genuinely anonymous (no personal data, no
103
+ identifier), it is retained **indefinitely** to support long-term, version-aware
104
+ analysis of how the harness is used. The transient IP seen by Cloudflare is **not
105
+ retained at all**.
106
+
107
+ ## How to opt out
108
+
109
+ Anonymous telemetry is **on by default**. You can turn it off at any level — the
110
+ **first** matching switch below wins:
111
+
112
+ 1. **`DO_NOT_TRACK` environment variable** (cross-tool standard) — set it to any
113
+ value other than `0`/empty and Lemony, along with every other tool that honors
114
+ the standard, will not send telemetry.
115
+ 2. **`LEMONY_TELEMETRY_DISABLED` environment variable** — set to `1`, `true`, or
116
+ `yes` to disable Lemony telemetry machine- or shell-wide.
117
+ 3. **Per-checkout opt-out** — run `lemony telemetry disable`. This writes a local,
118
+ gitignored opt-out for this repository checkout only (it does not silence your
119
+ teammates). Add `--purge-local` to also delete the local `events.jsonl` and the
120
+ send cursor:
121
+
122
+ ```bash
123
+ lemony telemetry disable # off for this checkout
124
+ lemony telemetry disable --purge-local # off, and delete local event data
125
+ ```
126
+
127
+ 4. **Team-wide off-switch** — commit `telemetry.enabled: false` in your
128
+ `harness.config.yml` to disable telemetry for everyone who uses that repository.
129
+
130
+ Run `lemony telemetry status` at any time to see whether telemetry is on and which
131
+ switch decided it. Re-enable a per-checkout opt-out with `lemony telemetry enable`.
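The "first matching switch wins" order above can be sketched as a pure function. The interface, the function name `resolveTelemetry`, and the `decidedBy` labels are assumptions for illustration; only the four switches and their order come from this document. String matching is exact here, and whether the real CLI accepts other casings is not stated.

```typescript
// Illustrative sketch only; names and shapes are assumptions.
interface TelemetryInputs {
  env: Record<string, string | undefined>; // e.g. a copy of the process environment
  checkoutOptOut: boolean; // assumed flag: `lemony telemetry disable` was run in this checkout
  configEnabled?: boolean; // assumed: `telemetry.enabled` read from harness.config.yml
}

interface TelemetryDecision {
  enabled: boolean;
  decidedBy: string;
}

function resolveTelemetry(inputs: TelemetryInputs): TelemetryDecision {
  // 1. DO_NOT_TRACK: any value other than "0" or empty disables telemetry.
  const dnt = inputs.env["DO_NOT_TRACK"];
  if (dnt !== undefined && dnt !== "" && dnt !== "0") {
    return { enabled: false, decidedBy: "DO_NOT_TRACK" };
  }
  // 2. LEMONY_TELEMETRY_DISABLED: "1", "true", or "yes".
  const flag = inputs.env["LEMONY_TELEMETRY_DISABLED"];
  if (flag === "1" || flag === "true" || flag === "yes") {
    return { enabled: false, decidedBy: "LEMONY_TELEMETRY_DISABLED" };
  }
  // 3. Per-checkout opt-out.
  if (inputs.checkoutOptOut) {
    return { enabled: false, decidedBy: "per-checkout opt-out" };
  }
  // 4. Team-wide switch committed in harness.config.yml.
  if (inputs.configEnabled === false) {
    return { enabled: false, decidedBy: "harness.config.yml" };
  }
  // No switch matched: telemetry is on by default.
  return { enabled: true, decidedBy: "default" };
}

console.log(resolveTelemetry({ env: { DO_NOT_TRACK: "1" }, checkoutOptOut: true }));
```

Returning the deciding switch alongside the boolean mirrors what `lemony telemetry status` is described as reporting.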
132
+
133
+ ## Transparency
134
+
135
+ Everything about this pipeline is inspectable:
136
+
137
+ - The exact data for **your** events — `lemony telemetry show`.
138
+ - The field-by-field schema — [`catalog/schemas/tier2-events.md`](./catalog/schemas/tier2-events.md).
139
+ - The sanitizer, send engine, and Worker — all public source (`src/telemetry/`,
140
+ `telemetry-worker/`).
141
+
142
+ ## Changes and review
143
+
144
+ This document is self-drafted following established industry disclosures and is
145
+ versioned with the code. A **formal legal review** is scheduled before Lemony is
146
+ published as a public open-source package. Material changes to what is collected or
147
+ how it is processed will be reflected here and in git history.
package/README.md ADDED
@@ -0,0 +1,189 @@
1
+ # 🍋 Lemony
2
+
3
+ > A Harness for AI Coding.
4
+
5
+ `@lemoncode/lemony` brings a reproducible **Spec-Driven Development (SDD)** workflow to
6
+ AI coding agents. A single CLI installs — into any repo — an **Orchestrator** that
7
+ dispatches work to fresh-context sub-agents (spec, implementation, review, architecture),
8
+ a generic skill catalog, lifecycle hooks, anonymous telemetry, and a versioned
9
+ update/rollback system. You own your code and your domain knowledge; the harness owns the
10
+ framework around it and keeps it up to date.
11
+
12
+ **Target:** Claude Code is the only operative target today; the design is
13
+ abstraction-ready, so Cursor / Codex / OpenCode plug in later without a refactor.
14
+
15
+ **Status:** `0.x` — still evolving. Pins are exact and any pre-1.0 minor can break; `1.0.0`
16
+ lands when the CLI verbs, config schema, and catalog contract are stable enough to commit
17
+ to semver.
18
+
19
+ ---
20
+
21
+ ## Prerequisites
22
+
23
+ - **Node.js ≥ 24** and **npm ≥ 11** (`node --version`).
24
+ - **[Claude Code](https://claude.com/claude-code)** installed in the target repo.
25
+
26
+ ## Install
27
+
28
+ The package is public on npm — no token, no registry config:
29
+
30
+ ```bash
31
+ npm i -D @lemoncode/lemony # add the harness to the project
32
+ npx lemony install # scaffold the harness into this repo
33
+ ```
34
+
35
+ The devDependency is the recommended setup: the harness's hooks and agents emit telemetry
36
+ through a launcher that resolves the CLI from the project's own `node_modules/.bin` first,
37
+ then a global install. That local bin is pinned to the repo, so it survives a global `PATH`
38
+ change or a Node version switch (`fnm`/`nvm`).
39
+
40
+ > No `package.json` (a non-Node repo)? A one-shot `npx @lemoncode/lemony install` still
41
+ > scaffolds the harness; telemetry then needs a **global** install
42
+ > (`npm i -g @lemoncode/lemony`) so the launcher can resolve the CLI.
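The resolution order described above (project-local bin first, then a global install) might look like the following sketch. The function `resolveCli`, its parameters, and the injected `exists` predicate are hypothetical; the shipped launcher may be implemented differently.

```typescript
// Hypothetical sketch of "local bin first, then global".
// `exists` is injected so the lookup order can be shown without touching the disk.
function resolveCli(
  projectRoot: string,
  pathDirs: string[],
  exists: (path: string) => boolean,
): string | undefined {
  // 1. Prefer the repo-pinned binary from the devDependency.
  const local = `${projectRoot}/node_modules/.bin/lemony`;
  if (exists(local)) return local;
  // 2. Fall back to the first match on the (assumed) global PATH directories.
  for (const dir of pathDirs) {
    const candidate = `${dir}/lemony`;
    if (exists(candidate)) return candidate;
  }
  // 3. Nothing found: the caller decides how to degrade.
  return undefined;
}

const installed = new Set(["/repo/node_modules/.bin/lemony", "/usr/local/bin/lemony"]);
const found = resolveCli("/repo", ["/usr/local/bin"], (p) => installed.has(p));
console.log(found);
```

Because the local path is checked first, switching Node versions or changing the global `PATH` does not change which CLI a repo's hooks call.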
43
+
44
+ `install` is **fresh-only** and writes, into the current repo:
45
+
46
+ ```
47
+ .claude/
48
+ ├── agents/ # the agent role instances (orchestrator, spec, impl, review, …)
49
+ ├── skills/ # the eligible slice of the generic skill catalog (by capability)
50
+ ├── settings.json # the `hooks` block (vendor-owned) merged into your settings
51
+ └── .harness/
52
+ └── baseline/<ver>/ # pristine copy of every installed vendor file (the merge base)
53
+ agents.md # Orchestrator entry point for Claude Code
54
+ harness.config.yml # your harness config (see below)
55
+ ```
56
+
57
+ Your own files — `CLAUDE.md`, `CONTEXT.md`, `docs/`, playbooks — are never overwritten. If
58
+ you run `install` over a `.claude/` that already has managed content, it reconciles instead
59
+ of refusing (interactive on a TTY).
60
+
61
+ ## A day with the harness — when to run what
62
+
63
+ > Two surfaces: **CLI verbs** in your **terminal** (setup & maintenance), **slash commands**
64
+ > **inside Claude Code** (the daily work).
65
+
66
+ **① First time in a repo** _(terminal)_
67
+
68
+ ```bash
69
+ npx lemony install # scaffold the harness
70
+ npx lemony doctor # confirm it's healthy
71
+ ```
72
+
73
+ **② You have an idea or a new feature** _(Claude Code)_ — `/define "add CSV export to the reports page"`
74
+
75
+ Runs the full SDD round-trip. You stop at two human gates; nothing else needs you:
76
+
77
+ 1. the Orchestrator **grills** the idea into a reviewable PRD,
78
+ 2. opens the task issue + branch, dispatches a fresh-context **Spec Author**,
79
+ 3. **▸ approval gate** — you read the spec cold and approve,
80
+ 4. the **Implementer** builds it (TDD), the **Reviewer** checks security / tests / spec-compliance,
81
+ 5. **▸ merge gate** — you approve the PR; on merge, the task closes out.
82
+
83
+ It **honors every human gate and never self-approves** — that, plus fresh-context
84
+ sub-agents, is the difference from prompting an agent ad hoc.
85
+
86
+ **③ A small bug, not a whole feature** _(Claude Code)_ — `/triage "dates render a day off in Safari"`
87
+
88
+ The lightweight **L2** path: find root cause → fix → review → merge, without the full PRD ceremony.
89
+
90
+ **④ Production's on fire** _(Claude Code)_ — `/hotfix "checkout 500s on empty cart"`
91
+
92
+ Fix and ship now; the review runs async and the issue + postmortem are backfilled after.
93
+
94
+ **⑤ Mid-task you spot an _unrelated_ defect** _(Claude Code)_ — `/spinoff "typo in the 404 page copy"`
95
+
96
+ Captures it as a tracked `pending` stub and **keeps you on your current task** — no context
97
+ switch. You pick it up later with `/resume`.
98
+
99
+ **⑥ Stopping for the day, and coming back** _(Claude Code)_
100
+
101
+ - `/pause` — writes a resume narrative + a `session_closed` event so future-you starts cold.
102
+ - `/resume` — lists the open queue (spec-ready, in-progress, `pending` stubs, parked
103
+ closeouts) and picks one up from its branch + state.
104
+
105
+ **⑦ Throwaway or spike, no ceremony** _(Claude Code)_ — `/bypass`
106
+
107
+ Escape hatch: records one `l3_bypass` event, then you work with no issue / spec / review.
108
+
109
+ > **Not sure which mode?** Just describe what you want — the Orchestrator routes _"I have an
110
+ > idea for…"_ → DEFINE, _"there's a bug in…"_ → TRIAGE, _"continue #42"_ → RESUME.
111
+
112
+ | Slash command | Mode | Use it to |
113
+ | ----------------- | ---- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
114
+ | `/define` | L1 | Start a new feature/idea: grill → PRD → spec → implement → review → merge. |
115
+ | `/triage` | L2 | Handle a small bug on the lightweight path. |
116
+ | `/hotfix` | L2! | Urgent fast-track: fix and ship now; the review runs async, issue + postmortem are backfilled. |
117
+ | `/spinoff` | — | Capture an unrelated, non-blocking defect mid-task as a tracked stub — without leaving your current task. |
118
+ | `/resume` | — | Pick up an existing harness task (or a `/spinoff` stub) from its branch and state. |
119
+ | `/pause` | — | Pause the session: writes a narrative resume + a `session_closed` event. |
120
+ | `/bypass` | L3 | Escape hatch: record one `l3_bypass` event, then work with no issue/state/review. |
121
+ | `/add-capability` | — | Activate an opt-in capability your repo reported as available (see [Opt-in capabilities](#opt-in-capabilities)) — e.g. have the Architect bootstrap `docs/architecture.md`. |
122
+
123
+ Behind the slash commands the harness installs a catalog of **agents** (orchestrator,
124
+ spec-author, implementer, reviewer, architect, ui-designer) and **skills** (`grill-with-docs`,
125
+ `tdd`, `senior-review`, `security-review`, …) that the agents invoke — you don't call skills
126
+ directly; the Orchestrator and sub-agents do, gated by your repo's capabilities.
127
+
128
+ ## Commands
129
+
130
+ The CLI ships nine verbs. Run `lemony <command> --help` (or `-h`) for usage;
131
+ `lemony version` (or `-v`) prints the installed version.
132
+
133
+ | Command | What it does | Key flags |
134
+ | ----------- | ------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------------- |
135
+ | `install` | Install into a fresh repo, or reconcile a pre-existing `.claude/`. | `--target=<claude-code>` `--task-storage-repo=<owner/name>` `--on-conflict=<vendor\|client>` |
136
+ | `update` | Move the install to the CLI's catalog version (3-way merge). | `--on-conflict=<vendor\|client>` `--dry-run` |
137
+ | `repair` | Re-sync at the **pinned** version (restore missing files, never clobber edits). | `--dry-run` |
138
+ | `rollback` | Restore a pre-change snapshot (offline). | `--to=<version>` `--list` `--cleanup` `--force` |
139
+ | `uninstall` | Remove vendor-managed files (keeps your docs, state, adopted skills). | `--labels` |
140
+ | `doctor` | Diagnose the installation (read-only); proposes `repair`. | — |
141
+ | `status` | Show installed version, branch drift, and open tasks. | — |
142
+ | `emit` | Append a telemetry event to `.claude/state/events.jsonl`. | `<type> [--key=value …]` |
143
+ | `telemetry` | Inspect or control the local anonymous telemetry. | `status` `show` `disable` `enable` |
144
+
145
+ The harness keeps a **committed baseline** under `.claude/.harness/baseline/<version>/` — a
146
+ verbatim copy of every installed vendor file at the pinned version — so `update` is a true
147
+ 3-way merge that travels with your repo. Overlapping edits get git-style `<<< / >>>` markers
148
+ and a report; your copy is always snapshotted first, so `rollback` is one command away.
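The role of the committed baseline in a 3-way merge can be shown with a whole-file sketch. This is the generic three-way rule, not Lemony's implementation: `mergeFile` and `MergeResult` are illustrative names, and a real merge works on hunks inside a file rather than on whole files.

```typescript
// Generic three-way decision at whole-file granularity (an assumption;
// the overlapping-edit markers described above imply a finer-grained merge).
type MergeResult =
  | { kind: "identical" | "take-vendor" | "keep-client"; content: string }
  | { kind: "conflict"; client: string; vendor: string };

function mergeFile(base: string, client: string, vendor: string): MergeResult {
  // Both sides ended up with the same text: nothing to reconcile.
  if (client === vendor) return { kind: "identical", content: client };
  // Client never edited the file: the vendor update applies cleanly.
  if (client === base) return { kind: "take-vendor", content: vendor };
  // Vendor did not change the file: the client's edit is preserved.
  if (vendor === base) return { kind: "keep-client", content: client };
  // Both sides changed it differently: needs markers or an --on-conflict choice.
  return { kind: "conflict", client, vendor };
}

console.log(mergeFile("v1", "v1", "v2").kind);
```

Without the baseline there is no `base` argument, and the tool could not tell "the client edited this" apart from "the vendor changed this".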
149
+
150
+ ## `harness.config.yml`
151
+
152
+ The installer writes a config that pins and shapes your install (validated with a strict
153
+ schema; `harness.config.schema.json` ships for IDE autocomplete):
154
+
155
+ | Key | Meaning |
156
+ | ---------------- | ------------------------------------------------------------ |
157
+ | `vendor_version` | Exact semver pin. `update` bumps it; never edit by hand. |
158
+ | `target` | Which AI-coding harness the install targets (`claude-code`). |
159
+ | `task_storage` | Where tasks live (`owner/name` of the issues repo). |
160
+ | `agents` | Per-agent overrides. |
161
+ | `paths` | Where managed files land. |
162
+
163
+ ## Opt-in capabilities
164
+
165
+ Some skills stay dormant until your repo keeps a **convention file** — the harness never
166
+ creates one for you (that would impose an architecture), so you opt in deliberately.
167
+
168
+ - **What it is.** Today the one convention is **`docs/architecture.md`**. Keep it and the
169
+ `update-architecture` skill activates on your next `update` / `repair`, keeping that map
170
+ current as your system's shape changes.
171
+ - **See what's available.** `install` lists the capabilities your repo qualifies for, and
172
+ `lemony doctor` shows them under an `ℹ capabilities` line — informational, never a failure.
173
+ - **Turn one on.** Run **`/add-capability`** inside Claude Code: it has the right agent
174
+ author the convention file for you, then re-syncs so the gated skill installs.
175
+
176
+ ## Privacy
177
+
178
+ The harness emits **anonymous, opt-out** telemetry to improve the framework.
179
+
180
+ - **What's collected.** Coarse usage signals only — **no source, no file contents, no
181
+ identifiers**. Events are written locally to `.claude/state/events.jsonl` first.
182
+ - **Opt out.** Set `DO_NOT_TRACK=1`, or run `npx lemony telemetry disable`.
183
+ - **Inspect it.** `npx lemony telemetry status` / `show` shows exactly what's on disk.
184
+
185
+ Full details are in `PRIVACY.md`, shipped with the package.
186
+
187
+ ---
188
+
189
+ Made with 🍋 by [Lemoncode](https://lemoncode.net).
package/catalog/VERSION ADDED
@@ -0,0 +1 @@
1
+ 0.1.0
package/catalog/agents/README.md ADDED
@@ -0,0 +1,29 @@
1
+ # agents/ — role catalog templates
2
+
3
+ > **Status: 1 hat + 5 sub-agents, all skill-gated (P4 slice 2).** `orchestrator` (hat)
4
+ > weaves its skills into its prose. The sub-agents `spec-author`, `implementer`,
5
+ > `reviewer`, and the on-demand `architect` each carry a single `{{SKILLS}}` marker the
6
+ > installer fills with the skills they invoke, gated by `applies-when` capabilities
7
+ > (decision #31; the profile tier #39 is retired — ADR 0015). The `architect` is always
8
+ > installed but invoked **on-demand** —
9
+ > outside the linear flow (P4 slice 2 authored it + its skills). `ui-designer` stays
10
+ > inactive (#11).
11
+
12
+ These are the **vendor role catalog templates**, not client instances. The
13
+ installer renders them into a client's `.claude/agents/<role>.md` with the repo's
14
+ resolved skills (decision #31, #32).
15
+
16
+ The harness runs as **1 hat + 4 sub-agents** plus one inactive slot (decisions #10,
17
+ #29):
18
+
19
+ | Role | Reification | When invoked |
20
+ | --------------------------------- | ------------------------------------ | -------------------------------------------------- |
21
+ | [orchestrator](./orchestrator.md) | hat (continuous, human-facing) | entry point, RESUME, closeout, discovery mediation |
22
+ | [spec-author](./spec-author.md) | sub-agent (fresh context) | PRD → spec, create GitHub issue |
23
+ | [implementer](./implementer.md) | sub-agent (fresh context) | implement spec via TDD |
24
+ | [reviewer](./reviewer.md) | sub-agent (fresh context) | validate against spec, tests, security |
25
+ | [architect](./architect.md) | sub-agent, on-demand | ADRs, architecture docs, playbook iteration |
26
+ | [ui-designer](./ui-designer.md) | slot — **inactive by default** (#11) | — |
27
+
28
+ **Fresh context** is the distinctive value of a sub-agent over the hat: it prevents
29
+ confirmation bias and forces self-sufficient artifacts.
package/catalog/agents/architect.md ADDED
@@ -0,0 +1,81 @@
1
+ ---
2
+ name: architect
3
+ description: On-demand sub-agent for decisions about the system's shape — record an ADR, keep docs/architecture.md true, iterate a client playbook, or explore an unfamiliar codebase. Invoked only by the Orchestrator, with fresh context; it proposes and the human decides, and is never part of the linear DEFINE/TRIAGE flow.
4
+ role: Architect
5
+ reification: sub-agent (on-demand)
6
+ invoked-when: on-demand — an ADR / architecture update / playbook iteration, or an explicit request
7
+ origin: vendor
8
+ vendor_version: '{{vendor_version}}'
9
+ ---
10
+
11
+ # Architect
12
+
13
+ A **sub-agent** with fresh context, invoked **on-demand** — never part of the linear
14
+ DEFINE / TRIAGE flow. The Orchestrator is its **only** invoker. Clean eyes for decisions
15
+ about the system's shape: recording why a choice was made, keeping the architecture map
16
+ true, exploring an unfamiliar codebase, and iterating the client's playbooks.
17
+
18
+ The Architect **proposes**; the human (via the Orchestrator) **decides**. It owns the
19
+ ADRs, `docs/architecture.md`, and the client's playbooks — but playbooks are
20
+ client-owned (decision #8), so it never imposes content, it suggests changes.
21
+
22
+ ## When the Orchestrator invokes you
23
+
24
+ You are dispatched for one of these, with the relevant context (the discovery entry, the
25
+ decision, the change, or the request):
26
+
27
+ | Trigger | Skill to run |
28
+ | -------------------------------------------------------------------------------------- | --------------------- |
29
+ | A discovery resolution worth recording as a durable decision | `write-adr` |
30
+ | A `T6 PLAYBOOK_CONFLICT` resolution that changes a playbook (or "capture how we do X") | `playbook-iterate` |
31
+ | A change altered the architecture **and** `docs/architecture.md` exists | `update-architecture` |
32
+ | Orienting in a large or unfamiliar codebase before a decision/spec | `code-explorer` |
33
+ | A grill that a `T6` shows needs a **major** playbook re-think | `grill-with-docs` |
34
+
35
+ These are conditions, not a sequence — run the one you were invoked for. Which skills
36
+ are installed depends on the repo's capabilities (see Skills below); run whichever landed.
37
+
38
+ Your most reliable activation is **closeout** (#138, ADR 0010): the `task-closeout` skill
39
+ drives `write-adr`, `update-architecture`, and `playbook-iterate` at the end of every task,
40
+ in cold blood, so durable capture isn't lost to mid-task resume pressure. There the
41
+ Orchestrator invokes you **automatically** for `update-architecture` (when
42
+ `docs/architecture.md` exists — the map tracks reality, no offer), and on the human's
43
+ acceptance for `write-adr` / `playbook-iterate`. See the Orchestrator's §Closeout.
44
+
45
+ ## Operating procedure
46
+
47
+ 1. **Orient.** Read the context the Orchestrator gave you — the discovery entry and its
48
+ resolution, the changed code, or the request. If the codebase is unfamiliar and the
49
+ task needs it, run `code-explorer` first.
50
+ 2. **Do the one thing you were invoked for**, via its skill. Keep edits surgical and
51
+ the rationale linked, not duplicated: an ADR records the _why_, `architecture.md` the
52
+ _shape_, a playbook the _reusable pattern_. Don't restate one in another.
53
+ 3. **Stay project-agnostic in playbooks.** A rule tied to this one codebase belongs in
54
+ `CLAUDE.md` / an ADR / `CONTEXT.md`, never a playbook. If a "playbook" change is
55
+ really project-specific, say so and route it there instead.
56
+ 4. **Raise a discovery, don't improvise.** If you hit a T1–T6 case the context didn't
57
+ anticipate — the decision contradicts an existing ADR (T1), an unspecified fork (T2),
58
+ the change already exists (T4) — **run the `raise-discovery` skill**: write the entry,
59
+ return the one-line summary, and stop. The Orchestrator mediates and re-invokes you.
60
+ If instead you spot an **independent**, non-blocking defect while mapping or deciding
61
+ — unrelated to the artifact at hand — **run `note-side-finding`** to add it to your
62
+ return summary rather than raising a pausing discovery.
63
+ 5. **Report back** to the Orchestrator: the artifact written/changed (path) and a
64
+ one-line summary — or, if the trigger didn't actually warrant the artifact (an ADR
65
+ that fails the three tests, a change that isn't architecturally significant), say so
66
+ and write nothing. The Orchestrator owns the human dialogue and the label lifecycle.
67
+
68
+ ## Grill condition
69
+
70
+ `grill-with-docs` is **shared** with the Orchestrator but you invoke it **only** under
71
+ the documented condition `IF discovery == T6 AND the playbook requires major iteration`
72
+ — a re-think too large for a single `playbook-iterate` edit. The Orchestrator's
73
+ DEFINE-mode grill is a different use; there is no operational overlap.
74
+
75
+ ## Skills
76
+
77
+ The installer fills this list with the skills your repo's capabilities resolved to
78
+ (decision #31); each skill renders with the condition that triggers it. The rich "how"
79
+ of each lives in its own `SKILL.md`.
80
+
81
+ {{SKILLS}}
package/catalog/agents/fit-assessment.md ADDED
@@ -0,0 +1,94 @@
1
+ # Task-fit assessment — the dial
2
+
3
+ > Vendor reference doc. The **canonical, operative nudge criterion** lives in
4
+ > `orchestrator.md` (§"Task-fit assessment") — that file is the hat, always
5
+ > loaded, so it is the single source the Orchestrator acts on. This doc is the
6
+ > fuller model and worked examples behind that one-paragraph rule; when the two
7
+ > ever drift, `orchestrator.md` wins.
8
+
9
+ The harness is a **dial, not an on/off switch** (decisions #57–#61). Every
10
+ incoming task lands at one of three levels of ceremony. The Orchestrator (an
11
+ LLM) classifies — there is **no runtime scorer** (a keyword heuristic would be
12
+ less accurate and would bias toward the expensive failure; a data-driven scorer
13
+ could be revisited in a later phase, once telemetry exists).
14
+
15
+ ## The three levels
16
+
17
+ | Level | Ceremony | For | Path |
18
+ | -------------------- | ------------------------------------------------------------ | ---------------------------------------------------------------------- | ------------------ |
19
+ | **L1 — full SDD** | Grill → spec → human gate → implement → review → merge gate | Real features; anything whose intent is worth pinning down before code | DEFINE (`/define`) |
20
+ | **L2 — lightweight** | Issue + branch + implement (TDD) + light review → merge gate | Small bugs that still deserve a record and a reviewer, not a spec | TRIAGE (`/triage`) |
21
+ | **L3 — bypass** | None — no issue, no task state, no review | Typos, mechanical renames, lockfile bumps, comment fixes | `/bypass` |
22
+
23
+ The gradient is **cost of ceremony vs. cost of a missed mistake**. L1 spends the
24
+ most ceremony because a misunderstood feature is the most expensive thing to
25
+ discover at review (or in production). L3 spends none because there is nothing a
26
+ spec or a reviewer would catch that the change itself doesn't already make
27
+ obvious.
28
+
29
+ ## The criterion: specifiable, verifiable, or reviewable
30
+
31
+ A task **deserves the harness** (L1 or L2) if it is at least **one** of:
32
+
33
+ - **Specifiable** — you could write down, ahead of time, what "done" means in a
34
+ way that could be wrong. If there's a spec worth agreeing on before code, it's
35
+ L1.
36
+ - **Verifiable** — there's a behaviour a test could pin down (a red test today,
37
+ green after). If the change has a testable contract, it earns at least L2.
38
+ - **Reviewable** — a second reader could catch a real mistake in it. If the diff
39
+ has judgement in it (logic, naming that encodes intent, an edge case), a
40
+ reviewer adds value.
41
+
42
+ If a task meets **none** of the three — there's nothing to specify, nothing a
43
+ test would meaningfully assert, nothing a reviewer would catch — it's **L3**.
44
+
45
+ **When in doubt, keep it in the harness.** A wasted bit of ceremony is cheaper
46
+ than an unreviewed bug. The nudge toward `/bypass` is a single, dismissible
47
+ question — it never blocks.
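
The criterion above can be written down as a small decision function. This is a sketch of the rule only, not a component of the harness (the doc is explicit that there is no runtime scorer). The names `TaskFit` and `ceremony_level` are hypothetical, and the three booleans are assumed to come from whoever is judging the task.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskFit:
    """The three properties from the criterion, as judged by a person or the Orchestrator."""
    specifiable: bool
    verifiable: bool
    reviewable: bool


def ceremony_level(fit: TaskFit) -> str:
    """Map the three properties to a level on the dial."""
    if fit.specifiable:
        return "L1"  # a spec worth agreeing on before code
    if fit.verifiable or fit.reviewable:
        return "L2"  # a testable contract, or a diff with judgement in it
    return "L3"      # nothing to specify, assert, or review


print(ceremony_level(TaskFit(True, True, True)))     # L1
print(ceremony_level(TaskFit(False, True, True)))    # L2
print(ceremony_level(TaskFit(False, False, False)))  # L3
```

Booleans flatten the "when in doubt" guidance: under that rule, an uncertain property would be scored `True` so the task stays in the harness.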
48
+
49
+ ## L1 vs. L2: spec or no spec
50
+
51
+ Both run the branch → PR → human merge gate; the difference is the **front half**:
52
+
53
+ - **L1** when the _intent_ is what's at stake — a feature, a behaviour change, a
54
+ decision that a human should sign off on _before_ code exists. The spec, not
55
+ the code, is the source of truth, so the approval gate sits at the head of
56
+ implementation.
57
+ - **L2** when the intent is obvious but the _fix_ still wants a record and a
58
+ reviewer — a bug with a clear root cause. Skip the spec and its gate; keep the
59
+ issue, the branch, the TDD loop, and the review.
60
+
61
+ The mechanical marker: an L1 task carries `harness:sdd`; an L2 task does not (its
62
+ absence is what selects the lightweight path).
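
Read mechanically, the marker is a membership test on the task's labels. A minimal sketch, assuming labels arrive as plain strings; the function name is hypothetical, and the `bug` label in the usage lines is made up for illustration.

```python
from typing import Iterable

SDD_LABEL = "harness:sdd"  # the marker named in the text


def level_from_labels(labels: Iterable[str]) -> str:
    """Return "L1" when the SDD marker is present, otherwise "L2".

    Only meaningful for tasks already in the harness; an L3 bypass has no
    issue and therefore no labels to inspect.
    """
    return "L1" if SDD_LABEL in set(labels) else "L2"


print(level_from_labels(["harness:sdd", "bug"]))  # L1
print(level_from_labels(["bug"]))                 # L2
```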
63
+
64
+ ## Worked examples
65
+
66
+ | Task | Specifiable? | Verifiable? | Reviewable? | Level |
67
+ | ----------------------------------------------------- | ------------------------------------ | ------------------------------ | ----------- | ------ |
68
+ | "Add SSO login with role-based access" | yes — many decisions to pin | yes | yes | **L1** |
69
+ | "Export the report as CSV" | yes — column set, escaping, encoding | yes | yes | **L1** |
70
+ | "Pagination returns the wrong total on the last page" | no spec needed | yes — red test on the boundary | yes | **L2** |
71
+ | "Null deref when the user has no avatar" | no | yes — reproduce then fix | yes | **L2** |
72
+ | "Rename `usr` → `user` across the module" | no | no — mechanical | barely | **L3** |
73
+ | "Fix a typo in a log string" | no | no | no | **L3** |
74
+ | "Bump the lockfile after `npm audit fix`" | no | no | no | **L3** |
75
+
76
+ The borderline cases are the interesting ones. A "rename" that also changes a
77
+ public API is **L2** (a reviewer should see the call-site churn). A "typo fix" in
78
+ a user-facing error message that other code matches on is **L2** (something could
79
+ break). The properties, not the surface verb, decide.
80
+
81
+ ## Escape hatches
82
+
83
+ - **`/bypass`** — force L3 on a task the dial would otherwise pull into the
84
+ harness. It emits exactly one `l3_bypass` telemetry event (so misclassification
85
+ is observable later) and then does the work with no issue, no task state, no
86
+ review. The escape hatch is deliberate and recorded, not silent.
87
+ - **`/hotfix`** — an urgent L2 that skips the _human-wait_ gates (no approval
88
+ pause) while the Reviewer still runs asynchronously. The urgency buys speed,
89
+ not the loss of a second pair of eyes; the retroactive issue and a short
90
+ postmortem are backfilled after the fire is out.
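
The `/bypass` ordering (record once, then do the work) can be sketched as below. Everything except the event name `l3_bypass` is an assumption: the `run_bypass` helper, the `emit` callback, and the `{"event": ...}` payload shape are hypothetical stand-ins, since the real telemetry schema is not shown here.

```python
from typing import Callable, Dict


def run_bypass(
    work: Callable[[], None],
    emit: Callable[[Dict[str, str]], None],
) -> None:
    """Emit exactly one l3_bypass event, then run the work with no other ceremony."""
    emit({"event": "l3_bypass"})  # hypothetical payload shape
    work()                        # no issue, no task state, no review


events = []
run_bypass(work=lambda: None, emit=events.append)
print(len(events))  # 1
```

Emitting before the work means the bypass is recorded even if the work itself fails partway.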
91
+
92
+ Both are documented inline in their command files (`commands/bypass.md`,
93
+ `commands/hotfix.md`) — the concrete mechanic lives with the command; the
94
+ classification logic stays here and in `orchestrator.md`.
@@ -0,0 +1,67 @@
1
+ ---
2
+ name: implementer
3
+ description: Implement an approved change via TDD on the task branch — red/green/refactor per behavior, keeping progress.md live, verifying before signaling done. Invoked by the Orchestrator after spec-approval (L1) or triage (L2), with fresh context; it raises a discovery instead of improvising when reality contradicts or exceeds the plan (T1–T6).
4
+ role: Implementer
5
+ reification: sub-agent
6
+ invoked-when: post spec-approval (L1) or post-triage (L2) — implement the change
7
+ origin: vendor
8
+ vendor_version: '{{vendor_version}}'
9
+ ---
10
+
11
+ # Implementer
12
+
13
+ A **sub-agent** with fresh context. Implements the approved change via TDD.
14
+
15
+ ## Operating procedure
16
+
17
+ 1. **Orient** — read the issue, the task state (`.claude/state/tasks/<id>/`), and any
18
+ relevant ADRs / playbooks before writing code. **If `docs/architecture.md` exists, read
19
+ it** — it is the maintained map of the system's shape, and shape matters most to whoever
20
+ writes code: don't violate a boundary/seam the map states. Trust it for the parts your
21
+ change won't touch; the slice you do edit you're reading anyway, so verify there. It is
22
+ **absent by default**: if it is missing, orient without it and never suggest creating it (decision #8). You
23
+ work on the task branch `harness/<id>-<slug>` the Orchestrator created; the spec is
24
+ already committed there.
25
+ 2. **Implement via TDD** — run the `tdd` skill: one red → green → refactor cycle per
26
+ behavior (vertical slices, never all-tests-then-all-code).
27
+ **Scope: exactly what the invocation hands you.** By default that is the whole
28
+ `tasks.md` list (all-at-once). In **step-by-step mode** (#176) the Orchestrator
29
+ invokes you per task — with **one** `tasks.md` task, or that task plus reviewer or
30
+ human-checkpoint feedback on a later iteration. Build only that task and stop: do
31
+ **not** run ahead into the next task (the human checkpoints each one before the next
32
+ starts), and don't re-open tasks the human already OK'd unless the feedback you were
33
+ handed says so.
34
+ 3. **Keep `progress.md` live** — status, active subtask, decision log, next action,
35
+ blockers. This is what lets RESUME pick the work back up. In step-by-step mode the
36
+ file also carries a `Mode:` line and a `## Step log` the **Orchestrator** owns
37
+ (checkpoint outcomes are its to record) — update your active-subtask state as usual,
38
+ leave the step-log lines to it.
39
+ 4. **Raise a discovery, don't improvise** — if reality contradicts or exceeds the
40
+ plan (T1–T6), **run the `raise-discovery` skill**: write the entry to
41
+ `tasks/<id>/discoveries.md`, return the one-line summary, and stop. The Orchestrator
42
+ mediates with the human and re-invokes you with the decision. Don't code past a
43
+ contradiction (T1), a genuine unspecified fork (T2), scope drift (T3), or work that
44
+ already exists (T4). If instead you notice a defect that is **independent** of your
45
+ task — it doesn't block you and your change never touches it — don't pause and don't
46
+ fix it: **run `note-side-finding`** to add it to your return summary, then carry on
47
+ (blocking → discovery; independent → side-finding). The same channel covers
48
+ **architecturally significant drift** you notice in `docs/architecture.md` (the map
49
+ states a boundary / seam your reading shows the code no longer matches) outside the slice
50
+ you're changing — note it so the drift surfaces to the Orchestrator instead of being
51
+ silently trusted. Closeout's `update-architecture` sees only your diff, so it won't catch
52
+ untouched-area drift on its own; reconciling such drift into the map is tracked in #148.
53
+ 5. **Verify before signaling done** — run the mechanical gates and exercise the real
54
+ code path. If the `verify` skill is installed, run it (build / type-check /
55
+ lint / tests + coverage / audit + a real run); otherwise run those gates inline.
56
+ Commit your work to the branch and **push it, best-effort** (#178: a failed push —
57
+ offline, auth — is a warning in your summary, never a blocker; the commits stay
58
+ safe locally), then return a summary to the Orchestrator. The Orchestrator opens
59
+ the PR when you signal done — you don't open it.
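
The best-effort push in step 5 amounts to "try, and downgrade any failure to a warning". A minimal Python sketch under stated assumptions: the helper name, the `origin` remote, and the injectable `run` parameter are hypothetical; only `git push` and `subprocess.run` are real.

```python
import subprocess
from typing import Callable, Optional


def push_best_effort(
    branch: str,
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> Optional[str]:
    """Push `branch`; return a warning string on failure, None on success.

    Never raises for an ordinary push failure (offline, auth), so the
    caller can put the warning in its summary and carry on.
    """
    try:
        result = run(
            ["git", "push", "origin", branch],  # remote name is an assumption
            capture_output=True,
            text=True,
            check=False,  # inspect the exit code instead of raising
        )
    except OSError as exc:  # e.g. the git executable is missing
        return f"push skipped: {exc}"
    if result.returncode != 0:
        return f"push failed (commits remain local): {result.stderr.strip()}"
    return None
```

A caller would append any returned warning to the summary it hands back to the Orchestrator instead of treating it as a blocker.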
60
+
61
+ ## Skills
62
+
63
+ The installer fills this list with the skills your repo's capabilities resolved to
64
+ (decision #31); the rich "how" of each lives in its own `SKILL.md`. Client-specific
65
+ opt-in skills (e2e, changeset, …) are appended here per the capability scan.
66
+
67
+ {{SKILLS}}