@lemoncode/lemony 0.1.0

Files changed (68)
  1. package/LICENSE +21 -0
  2. package/PRIVACY.md +147 -0
  3. package/README.md +189 -0
  4. package/catalog/VERSION +1 -0
  5. package/catalog/agents/README.md +29 -0
  6. package/catalog/agents/architect.md +81 -0
  7. package/catalog/agents/fit-assessment.md +94 -0
  8. package/catalog/agents/implementer.md +67 -0
  9. package/catalog/agents/orchestrator.md +627 -0
  10. package/catalog/agents/reviewer.md +124 -0
  11. package/catalog/agents/spec-author.md +69 -0
  12. package/catalog/agents/ui-designer.md +25 -0
  13. package/catalog/commands/add-capability.md +69 -0
  14. package/catalog/commands/bypass.md +40 -0
  15. package/catalog/commands/define.md +24 -0
  16. package/catalog/commands/hotfix.md +47 -0
  17. package/catalog/commands/pause.md +52 -0
  18. package/catalog/commands/resume.md +56 -0
  19. package/catalog/commands/spinoff.md +59 -0
  20. package/catalog/commands/triage.md +24 -0
  21. package/catalog/harness.config.schema.json +116 -0
  22. package/catalog/hooks/README.md +56 -0
  23. package/catalog/hooks/init.sh +281 -0
  24. package/catalog/hooks/lib/lemony.sh +41 -0
  25. package/catalog/hooks/lib/playbook-scan.sh +394 -0
  26. package/catalog/hooks/lib/transcript-grep.sh +56 -0
  27. package/catalog/hooks/require-playbook.sh +97 -0
  28. package/catalog/hooks/session-close.sh +232 -0
  29. package/catalog/hooks/suggest-playbook.sh +72 -0
  30. package/catalog/playbook-format.md +198 -0
  31. package/catalog/schemas/README.md +13 -0
  32. package/catalog/schemas/tier2-events-history.md +104 -0
  33. package/catalog/schemas/tier2-events.md +286 -0
  34. package/catalog/skills/README.md +62 -0
  35. package/catalog/skills/bootstrap-architecture/SKILL.md +78 -0
  36. package/catalog/skills/code-explorer/SKILL.md +76 -0
  37. package/catalog/skills/grill-with-docs/ADR-FORMAT.md +49 -0
  38. package/catalog/skills/grill-with-docs/CONTEXT-FORMAT.md +77 -0
  39. package/catalog/skills/grill-with-docs/SKILL.md +270 -0
  40. package/catalog/skills/grill-with-docs/reference.md +236 -0
  41. package/catalog/skills/mutation-testing/SKILL.md +84 -0
  42. package/catalog/skills/note-side-finding/SKILL.md +89 -0
  43. package/catalog/skills/playbook-iterate/SKILL.md +78 -0
  44. package/catalog/skills/prd-to-spec/SKILL.md +181 -0
  45. package/catalog/skills/raise-discovery/SKILL.md +112 -0
  46. package/catalog/skills/resolve-discovery/SKILL.md +123 -0
  47. package/catalog/skills/review-pr/SKILL.md +106 -0
  48. package/catalog/skills/review-pr/reference.md +105 -0
  49. package/catalog/skills/security-review/SKILL.md +90 -0
  50. package/catalog/skills/senior-review/SKILL.md +99 -0
  51. package/catalog/skills/silent-failure-hunter/SKILL.md +76 -0
  52. package/catalog/skills/spec-compliance-check/SKILL.md +74 -0
  53. package/catalog/skills/spec-to-issue/SKILL.md +88 -0
  54. package/catalog/skills/task-closeout/SKILL.md +229 -0
  55. package/catalog/skills/tdd/SKILL.md +171 -0
  56. package/catalog/skills/test-gap-report/SKILL.md +71 -0
  57. package/catalog/skills/triage-issue/SKILL.md +102 -0
  58. package/catalog/skills/update-architecture/SKILL.md +69 -0
  59. package/catalog/skills/verify/SKILL.md +90 -0
  60. package/catalog/skills/write-adr/SKILL.md +77 -0
  61. package/catalog/templates/README.md +32 -0
  62. package/catalog/templates/claude-code/.claude/settings.json.tpl +34 -0
  63. package/catalog/templates/claude-code/agents.md.tpl +109 -0
  64. package/catalog/templates/claude-code/docs/playbooks/README.md.tpl +96 -0
  65. package/catalog/templates/claude-code/harness.config.yml.tpl +59 -0
  66. package/catalog/templates/claude-code/state/history.md.tpl +6 -0
  67. package/dist/cli.mjs +5691 -0
  68. package/package.json +80 -0
@@ -0,0 +1,286 @@
1
+ # Tier 2 events — schema
2
+
3
+ > **Authority.** This document defines the on-wire shape of every event the harness
4
+ > writes to `.claude/state/events.jsonl`. The `src/events/` Zod schemas are the
5
+ > executable mirror; if they disagree, **this document wins**, and the Zod
6
+ > schemas are updated to match in the same change. Per-release deltas are
7
+ > recorded in [`tier2-events-history.md`](tier2-events-history.md) (forward-only,
8
+ > dispatch-on-read).
9
+
10
+ Tier 1 (client-local) writes append-only JSONL from day one so the data is
11
+ forward-compatible with the Tier 2 central backend designed in Phase 1+ (decision
12
+ \#24, #25, #27, #51).
13
+
14
+ ## Storage
15
+
16
+ - One event per line, UTF-8, in `.claude/state/events.jsonl`. The stream is
17
+ **local-only and gitignored — never committed** (ADR 0008, retiring decision
18
+ #18/#21): it sits in the managed `GITIGNORE_BLOCK` beside `current-*.md` /
19
+ `sessions/`. There is no Tier 2 consumer yet, so committing only dirtied the
20
+ base; transport to Tier 2 is the sink designed in #137.
21
+ - **Append-only, except confirmed-sent prefix-prune** (#240, ADR 0008 §Amendment).
22
+ Emitters only ever append. The send engine may **collapse the already-delivered prefix**
23
+ (`[0:cursor]`) once it exceeds ~5MB — never dropping unsent bytes — which rewrites the
24
+ file and may **reorder** the unsent tail relative to concurrent appends. Order is not a
25
+ contract (aggregation groups by version/component, the cursor counts bytes), so this is
26
+ safe; consumers must not rely on global ordering or on the file never being rewritten.
27
+ - **Atomic append.** The CLI writer (`src/events/append.ts`) calls
28
+ `fs.appendFile`, which opens with `O_APPEND` and issues a single `write(2)`.
29
+ Up to `PIPE_BUF` bytes (4096 on macOS/Linux) such a write is POSIX-atomic —
30
+ every event line in this schema stays well under that — so concurrent
31
+ writers can interleave whole lines but never tear a single one and never
32
+ lose an event.
33
+
34
+ ## Envelope
35
+
36
+ Every event line starts with this envelope. Per-type fields are added at the
37
+ same top level — there is no nested `payload`, so Zod discriminated unions key on
38
+ `type`.
39
+
40
+ | Field | Type | Required | Axis | Notes |
41
+ | ----------------- | ------ | -------- | --------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
42
+ | `type` | string | yes | `internal-enum` | One of the 9 event types listed below. Discriminator. |
43
+ | `ts` | string | yes | `metric` | UTC ISO 8601 with `Z` suffix (e.g. `2026-05-28T14:30:00.000Z`). **No local offsets.** |
44
+ | `user` | string | yes | `local-only` | `git config user.email` of the actor (decision #53). Never exported in any tier (D7). |
45
+ | `project` | string | yes | `identity` | `task_storage.repo` slug (e.g. `acme/widgets`), from `harness.config.yml`. **Never `OWNER/REPO`** — the CLI refuses to emit while that placeholder is the value (see [Placeholder guard](#placeholder-guard)). |
46
+ | `task_id` | string | no | `identity` | Task issue id (e.g. `42`) when the event has a task context. Absent on session/global events. A per-project correlator — only meaningful alongside `project`, so it shares the `identity` axis. |
47
+ | `harness_version` | string | yes | `metric` | `version` of the **installed** `@lemoncode/lemony` package — _not_ `vendor_version` from config. |
48
+
49
+ ### Placeholder guard
50
+
51
+ `lemony install` writes `task_storage.repo: OWNER/REPO` to
52
+ `harness.config.yml` only when it cannot resolve a real slug — typically a
53
+ non-TTY install in a repo with no `origin` remote and no
54
+ `--task-storage-repo=` flag. The placeholder is a sentinel: every emit call
55
+ checks for it and refuses with a friendly error rather than stamping garbage
56
+ onto telemetry. Downstream aggregators therefore never see a `project:
57
+ "OWNER/REPO"` line and don't need to filter for it.
58
+
59
+ Interactive installs avoid the placeholder entirely — the CLI prompts the user
60
+ to either enter a slug, create a GitHub repo via `gh repo create`, or
61
+ explicitly skip. The skip path falls back to the placeholder with the same
62
+ warning the non-TTY path prints (emits will block until the config is fixed).
63
+
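As an illustration of the envelope and the placeholder guard described above, here is a minimal sketch. The names `buildEnvelope`, `EnvelopeInput`, and `PLACEHOLDER_PROJECT` are hypothetical and do not come from the package; only the field names and the `OWNER/REPO` sentinel are taken from the schema. The real CLI resolves `user`, `project`, and `harness_version` itself, whereas this sketch takes them as arguments.

```typescript
// Hypothetical sketch: field names follow the envelope table, everything else is assumed.
interface Envelope {
  type: string;
  ts: string;
  user: string;
  project: string;
  task_id?: string;
  harness_version: string;
}

interface EnvelopeInput {
  type: string;
  user: string;
  project: string;
  harnessVersion: string;
  taskId?: string;
  now?: Date; // injectable clock, assumed here for testability
}

const PLACEHOLDER_PROJECT = "OWNER/REPO";

function buildEnvelope(input: EnvelopeInput): Envelope {
  // Refuse to emit while the sentinel is still configured.
  if (input.project === PLACEHOLDER_PROJECT) {
    throw new Error(
      "task_storage.repo is still OWNER/REPO; fix harness.config.yml before emitting",
    );
  }
  const envelope: Envelope = {
    type: input.type,
    // Date#toISOString always returns UTC with a trailing "Z", never a local offset.
    ts: (input.now ?? new Date()).toISOString(),
    user: input.user,
    project: input.project,
    harness_version: input.harnessVersion,
  };
  // task_id is optional on the envelope: omit the key rather than writing undefined.
  if (input.taskId !== undefined) {
    envelope.task_id = input.taskId;
  }
  return envelope;
}
```

Throwing before any object is built mirrors the stated intent that a placeholder `project` never reaches the event stream.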
64
+ ### `harness_version` source
65
+
66
+ The CLI reads its **own** `package.json` `version` field via
67
+ `import.meta.dirname` (relative to the build output). This guarantees forensic
68
+ correctness: if a client's `harness.config.yml` was pinned to `0.1.0-alpha.0`
69
+ but they upgraded the CLI to `0.2.0-alpha.0` without re-running `install`, the
70
+ event records `0.2.0-alpha.0`. Drift between the two is surfaced separately by
71
+ the `init.sh` SessionStart hook as a warning.
72
+
73
+ ### Field axes
74
+
75
+ Every `(event_type, field)` occurrence carries one of **five axes**. The axis is a
76
+ property of the **occurrence, not the field name** — the same name can differ by
77
+ event (`reason` is an `internal-enum` in `session_closed` but `free-text` in
78
+ `review_rejected` / `l3_bypass`). The sanitizer dispatches on `type`, so per-event
79
+ assignment is natural. The axis drives forward-sanitization when events are
80
+ exported to Tier 2 (`src/telemetry/sanitize.ts`); the executable mirror of this
81
+ table is `src/telemetry/sanitize.constant.ts` (`FIELD_AXIS`), kept in lock-step by
82
+ a doc-parse test.
83
+
84
+ | Axis | Export policy | Meaning |
85
+ | --------------- | -------------------------------------- | ------------------------------------------------------------------------------------ |
86
+ | `local-only` | **drop always** (every tier) | Stays in `events.jsonl` + `telemetry show`; never leaves the laptop. PII / identity. |
87
+ | `identity` | keep `project` tier / drop `anonymous` | A per-project identifier — only meaningful alongside `project`. |
88
+ | `free-text` | keep `project` tier / drop `anonymous` | Unbounded human text; a de-anonymization vector. |
89
+ | `internal-enum` | **keep always** | A bounded, low-cardinality value (a fixed enum, or a roster component name). |
90
+ | `metric` | **keep always** | Pure measurement — timestamps, counts, durations, booleans. Safest to aggregate. |
91
+
92
+ Two tiers select which axes survive: **`anonymous`** (the on-by-default floor) keeps
93
+ only `internal-enum` + `metric`; **`project`** (opt-in, for dogfood) additionally
94
+ keeps `identity` + `free-text`. `local-only` is dropped in both. `identity` and
95
+ `free-text` share today's policy but stay distinct axes for future divergence (a
96
+ hashed `user_hash` would be `identity`-with-hashing, not `free-text`). v1 wires only
97
+ the `anonymous` branch (decisions D7/D8/D9, ADR 0020).
98
+
99
+ `attributed_name` is deliberately an `internal-enum` even though Zod types it as a
100
+ bounded free string: the axis is policy-oriented (keep-always), and the field is the
101
+ moat metric (#1, "which component causes friction") — a roster component name shared
102
+ across all installs, not sensitive free text (D8). The aggregation script flags names
103
+ outside the known roster as the data-quality thermometer (see [Attribution](#attribution)).
104
+
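The axis policy above can be sketched as a small projection function. This is an illustrative reading of the table, not the package's `sanitize.ts`: the maps below cover only two event types, the names `keeps` and `sanitize` are hypothetical, and dropping fields with no known axis is an assumption (the document does not say what happens to them). Note also that the document states v1 wires only the `anonymous` branch; the `project` branch is included here purely to show the table's policy.

```typescript
type Axis = "local-only" | "identity" | "free-text" | "internal-enum" | "metric";
type Tier = "anonymous" | "project";

// Envelope axes, copied from the envelope table.
const ENVELOPE_AXES: Record<string, Axis> = {
  type: "internal-enum",
  ts: "metric",
  user: "local-only",
  project: "identity",
  task_id: "identity",
  harness_version: "metric",
};

// Illustrative subset of the per-event axes; the axis belongs to the occurrence,
// so `reason` differs between the two event types.
const FIELD_AXES: Record<string, Record<string, Axis>> = {
  session_closed: {
    session_start_ts: "metric",
    session_active_h: "metric",
    reason: "internal-enum",
    auto_close: "metric",
  },
  review_rejected: {
    reason: "free-text",
    iteration: "metric",
    step: "metric",
    attributed_kind: "internal-enum",
    attributed_name: "internal-enum",
  },
};

// Export policy per axis, as given in the axis table.
function keeps(axis: Axis, tier: Tier): boolean {
  switch (axis) {
    case "local-only":
      return false;
    case "identity":
    case "free-text":
      return tier === "project";
    case "internal-enum":
    case "metric":
      return true;
  }
}

function sanitize(event: Record<string, unknown>, tier: Tier): Record<string, unknown> {
  const perType = FIELD_AXES[String(event["type"])] ?? {};
  const out: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(event)) {
    // Per-type assignment wins over the envelope default.
    const axis = perType[key] ?? ENVELOPE_AXES[key];
    // Assumption: a field with no known axis is dropped (fail closed).
    if (axis !== undefined && keeps(axis, tier)) {
      out[key] = value;
    }
  }
  return out;
}
```

Under this sketch, an `anonymous` export of a `review_rejected` event keeps `attributed_name` and `iteration` but drops `user`, `project`, `task_id`, and the free-text `reason`.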
105
+ ---
106
+
107
+ ## Event types (9)
108
+
109
+ Five are emitted in P5. `bug_post_merge` is deferred to P8 (meta-test). `l3_bypass`
110
+ is deferred to P6 (the `/bypass` command). `followup_captured` is emitted by the
111
+ `/spinoff` command (#112). `step_completed` is emitted by the Orchestrator in
112
+ step-by-step mode (#176). The schema covers all nine so the file is
113
+ forward-compatible — readers dispatch on `type` and ignore unknowns.
114
+
115
+ ### 1. `session_closed` _(P5)_
116
+
117
+ Emitted by `session-close.sh` on `SessionEnd` or `/pause` (manual). One per
118
+ session.
119
+
120
+ | Field | Type | Required | Axis | Notes |
121
+ | ------------------ | ------- | -------- | --------------- | --------------------------------------------------------------------------------------------------------------- |
122
+ | `session_start_ts` | string | yes | `metric` | UTC ISO 8601 Z. Read from `current-<user>.md` frontmatter. |
123
+ | `session_active_h` | number | yes | `metric` | Active hours this session — `(ts − session_start_ts) / 3600s`. ≥ 0, finite. |
124
+ | `reason` | string | yes | `internal-enum` | `clear` \| `resume` \| `logout` \| `prompt_input_exit` \| `bypass_permissions_disabled` \| `other` \| `manual`. |
125
+ | `auto_close` | boolean | yes | `metric` | `true` when fired by `SessionEnd` (no narrative); `false` when fired by `/pause`. |
126
+
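The `session_active_h` formula in the table above can be worked through as follows. This is a sketch only: the function name `sessionActiveHours` is hypothetical, and rejecting a negative or non-finite result with a `RangeError` is an assumed way of enforcing the "≥ 0, finite" constraint.

```typescript
// (ts - session_start_ts) / 3600s, with both inputs as UTC ISO 8601 "Z" strings.
function sessionActiveHours(sessionStartTs: string, ts: string): number {
  const MS_PER_HOUR = 3600 * 1000;
  // Date.parse returns NaN for an unparseable string, which the check below catches.
  const hours = (Date.parse(ts) - Date.parse(sessionStartTs)) / MS_PER_HOUR;
  if (!Number.isFinite(hours) || hours < 0) {
    throw new RangeError(`session_active_h must be finite and >= 0, got ${hours}`);
  }
  return hours;
}
```

For a session that starts at `2026-05-28T13:00:00.000Z` and closes at `2026-05-28T14:30:00.000Z`, the difference is 5,400,000 ms, so the field would be `1.5`.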
127
+ ### 2. `spec_created` _(P5)_
128
+
129
+ Emitted by the `prd-to-spec` skill when the three spec files are written (the
130
+ hand-off to `spec-to-issue`).
131
+
132
+ | Field | Type | Required | Axis | Notes |
133
+ | -------------- | ------ | -------- | ----------- | ------------------------------------------------------------------------------------- |
134
+ | `task_id` | string | yes | `identity` | Required for this type (overrides the envelope's `task_id` optionality). |
135
+ | `topic` | string | yes | `free-text` | The topic slug from the spec branch (`<slug>` in `harness/<id>-<slug>`). 1-200 chars. |
136
+ | `requirements` | number | yes | `metric` | Count of EARS requirements in `requirements.md` (≥ 1, integer). |
137
+
138
+ ### 3. `spec_approved` _(P5)_
139
+
140
+ Emitted by the Orchestrator when it transitions `spec-in-progress → spec-ready`
141
+ (human approval gate cleared).
142
+
143
+ | Field | Type | Required | Axis | Notes |
144
+ | ------------ | ------ | -------- | ---------- | ------------------------------------------------------------- |
145
+ | `task_id` | string | yes | `identity` | Required for this type. |
146
+ | `iterations` | number | yes | `metric` | How many grill-or-refine cycles preceded approval (≥ 1, int). |
147
+
148
+ ### 4. `task_done` _(P5)_
149
+
150
+ Emitted by the Orchestrator at closeout (after `gh pr view` confirms `MERGED`,
151
+ before `git rm` of the task state).
152
+
153
+ | Field | Type | Required | Axis | Notes |
154
+ | ------------------- | ------ | -------- | --------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------- |
155
+ | `task_id` | string | yes | `identity` | Required for this type. |
156
+ | `level` | string | yes | `internal-enum` | `L1` \| `L2` \| `L3` — the task-fit dial value used. |
157
+ | `cycle_time_h` | number | yes | `metric` | Wall-clock hours from issue creation to merge. ≥ 0, finite. |
158
+ | `review_rejections` | number | yes | `metric` | Count of `review_rejected` events for this `task_id` (≥ 0, int). |
159
+ | `mode` | string | no | `internal-enum` | `all_at_once` \| `step_by_step` — the mode chosen at the L1 approval gate (#176). **Absent on L2** (the question only exists where `tasks.md` does). |
160
+ | `steps` | number | no | `metric` | Count of `step_completed` events for this task (≥ 1, int). Only meaningful when `mode` is `step_by_step`; < total tasks after a mid-task downgrade. |
161
+
162
+ ### 5. `review_rejected` _(P5)_
163
+
164
+ Emitted by the Reviewer when the verdict is REJECT (decision #25, transient
165
+ state — no dedicated label).
166
+
167
+ | Field | Type | Required | Axis | Notes |
168
+ | ----------------- | ------ | -------- | --------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
169
+ | `task_id` | string | yes | `identity` | Required for this type. |
170
+ | `reason` | string | yes | `free-text` | Short human-readable reason (one line; never the full review comment). 1-500 chars. |
171
+ | `iteration` | number | yes | `metric` | 1-based: the Nth rejection of this task (≥ 1, int). |
172
+ | `step` | number | no | `metric` | The step (1-based `tasks.md` task number) whose per-step review rejected (#176). **Absent** on full-pass and all-at-once rejections. |
173
+ | `attributed_kind` | string | no | `internal-enum` | `agent` \| `skill` \| `playbook` — the kind of component the friction is attributed to (#217). **Omitted when the emitter can't attribute.** |
174
+ | `attributed_name` | string | no | `internal-enum` | The component's name (free string, 1-200 chars), e.g. `implementer`. Independently optional in the schema; emitters pair it with `attributed_kind` and omit both when they can't attribute (#217). Free-string by design — see [Attribution](#attribution). |
175
+
176
+ ### 6. `bug_post_merge` _(P8 — schema only)_
177
+
178
+ Reserved for meta-test (`P8`). Schema is fixed now so readers can dispatch
179
+ forward-compatibly.
180
+
181
+ | Field | Type | Required | Axis | Notes |
182
+ | -------------- | ------ | -------- | --------------- | ------------------------------------------------ |
183
+ | `task_id` | string | yes | `identity` | The task whose merged change introduced the bug. |
184
+ | `discovered_h` | number | yes | `metric` | Hours from merge to bug discovery (≥ 0, finite). |
185
+ | `severity` | string | yes | `internal-enum` | `low` \| `medium` \| `high` \| `critical`. |
186
+
187
+ ### 7. `l3_bypass` _(P6 — schema only)_
188
+
189
+ Reserved for the `/bypass` command (`P6`). A global event — `task_id` is inherited
190
+ from the envelope (optional) and usually absent.
191
+
192
+ | Field | Type | Required | Axis | Notes |
193
+ | -------- | ------ | -------- | ----------- | ------------------------------------------------------------------ |
194
+ | `topic` | string | yes | `free-text` | One-line subject (typo, rename, lockfile bump, etc.). 1-200 chars. |
195
+ | `reason` | string | yes | `free-text` | Why the harness was bypassed. 1-500 chars (≤ 200 recommended). |
196
+
197
+ ### 8. `followup_captured` _(#112 — `/spinoff`)_
198
+
199
+ Emitted by the `/spinoff` command when a non-blocking, independent defect found
200
+ mid-task is parked as a `harness:managed` + `harness:status:pending` stub. Feeds the
201
+ "follow-up bugs per parent task" metric. **Not** `bug_post_merge` — that is a
202
+ post-merge / production signal; conflating them would dirty the post-merge metric.
203
+
204
+ | Field | Type | Required | Axis | Notes |
205
+ | ---------------- | ------ | -------- | --------------- | --------------------------------------------------------------------------------------------------- |
206
+ | `task_id` | string | yes | `identity` | The captured stub's own issue id (overrides the envelope's optionality). |
207
+ | `parent_task_id` | string | no | `identity` | The originating task id. **Absent** when `/spinoff` runs outside any active task (a deferred stub). |
208
+ | `severity` | string | no | `internal-enum` | `low` \| `medium` \| `high` \| `critical`. Best-effort — set only when cheaply inferred. |
209
+
210
+ ### 9. `step_completed` _(#176 — step-by-step mode)_
211
+
212
+ Emitted by the Orchestrator each time a human checkpoint **resolves** in
213
+ step-by-step mode (L1 opt-in, chosen at the approval gate). One event per
214
+ checkpoint, not per step: a step the human sends back ("changes") emits again
215
+ when it re-checkpoints, with the same `step`. This is the signal that justifies
216
+ (or condemns) the mode — the rate of checkpoints that catch things, and where
217
+ humans bail out (`ok_downgrade`).
218
+
219
+ | Field | Type | Required | Axis | Notes |
220
+ | ------------------- | ------ | -------- | --------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
221
+ | `task_id` | string | yes | `identity` | Required for this type. |
222
+ | `step` | number | yes | `metric` | 1-based `tasks.md` task number the checkpoint belongs to (≥ 1, int). |
223
+ | `review_iterations` | number | yes | `metric` | Reviewer invocations that preceded this checkpoint (≥ 1, int — every step is reviewed before the human; resets after a "changes"). |
224
+ | `checkpoint_result` | string | yes | `internal-enum` | `ok` \| `changes` \| `ok_downgrade` (OK and switch the remaining tasks to all-at-once). |
225
+ | `attributed_kind` | string | no | `internal-enum` | `agent` \| `skill` \| `playbook` — the kind of component the friction is attributed to (#217). **Omitted when the emitter can't attribute.** |
226
+ | `attributed_name` | string | no | `internal-enum` | The component's name (free string, 1-200 chars). Independently optional in the schema; emitters pair it with `attributed_kind` and omit both when they can't attribute (#217). Free-string by design — see [Attribution](#attribution). |
227
+
228
+ ---
229
+
230
+ ## Attribution
231
+
232
+ The two **friction** events — `review_rejected` and `step_completed` — carry an
233
+ optional `attributed_kind` (`agent` | `skill` | `playbook`) + `attributed_name`
234
+ (free string) pair (#217). They answer "_which_ component did this friction come
235
+ from", feeding the "friction attributed to a specific skill/agent" metric. They are
236
+ set by the emitting prompts (Reviewer, Orchestrator), which list the valid roster and
237
+ are instructed to **omit both when they can't confidently attribute** — a wrong guess
238
+ is worse than a gap.
239
+
240
+ `task_done` deliberately has **no** attribution field: its "where" is reconstructed
241
+ from its child `review_rejected` events, not collapsed into one fuzzy culprit.
242
+
243
+ `attributed_name` is a **free string this phase, by design** (measure-then-decide):
244
+ the CLI does not enforce a component registry. It rides the `internal-enum` axis (kept
245
+ in both tiers) because it is the moat metric and a bounded roster name, not sensitive
246
+ free text (D8) — it is the one non-numeric field that survives the `anonymous`
247
+ projection. The aggregation script (#218) reports attribution coverage and flags names
248
+ outside the known roster — that signal is the thermometer. If it shows the data quality
249
+ is poor, the cheap next step is enum validation in `lemony emit`; an MCP-backed registry
250
+ only if that proves insufficient. Historical lines (pre-#217) lack both fields, so
251
+ coverage starts near 0% and ramps.
252
+
253
+ ## Reader contract (forward-only, dispatch-on-read)
254
+
255
+ A reader of `events.jsonl`:
256
+
257
+ 1. Parses each line as JSON. Malformed lines are logged and skipped — they
258
+ never abort the stream.
259
+ 2. Dispatches on `type`. Unknown types are skipped (forward-compatibility:
260
+ new event types appear without breaking old readers).
261
+ 3. Trusts the per-release delta in `tier2-events-history.md` for field
262
+ renames / deprecations.
263
+
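A reader that follows the first two rules of this contract might look like the sketch below. The function name `readEvents`, the `Handler` type, and the `onMalformed` callback are hypothetical; the sketch works on an in-memory string and leaves file access and the history-file aliases (rule 3) out of scope.

```typescript
type Handler = (event: Record<string, unknown>) => void;

function readEvents(
  jsonl: string,
  handlers: Map<string, Handler>,
  onMalformed: (line: string) => void = () => {},
): void {
  for (const line of jsonl.split("\n")) {
    if (line.trim() === "") continue; // tolerate the trailing newline
    let parsed: unknown;
    try {
      parsed = JSON.parse(line);
    } catch (_e) {
      // Rule 1: malformed lines are reported and skipped, never fatal.
      onMalformed(line);
      continue;
    }
    if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
      onMalformed(line);
      continue;
    }
    const event = parsed as Record<string, unknown>;
    const type = event["type"];
    // Rule 2: dispatch on `type`; unknown types are silently skipped.
    const handler = typeof type === "string" ? handlers.get(type) : undefined;
    if (handler !== undefined) {
      handler(event);
    }
  }
}
```

A `Map` is used for the handlers so that a `type` value such as `constructor` cannot accidentally resolve to an inherited object property.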
264
+ ## Writer contract
265
+
266
+ A writer (the `lemony emit` CLI):
267
+
268
+ 1. Builds the envelope at write time — `ts` (now, UTC `Z`), `user` (from
269
+ `git config user.email`), `project` (from `harness.config.yml`),
270
+ `harness_version` (from the installed package).
271
+ 2. Merges per-type fields into the same top level (no nested `payload`).
272
+ 3. Validates against the Zod schema for `type` (each schema is `.strict()`,
273
+ so an unknown key — typically a typo'd `--task-iid` flag — **rejects**
274
+ loud). Never writes a partial line.
275
+ 4. Appends the JSON line via `fs.appendFile` (single `O_APPEND` `write(2)`,
276
+ POSIX-atomic up to `PIPE_BUF`) to `.claude/state/events.jsonl`. Creates the
277
+ file (and parent dir, scaffolded by `install`) when missing.
278
+
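Steps 2 and 3 of the writer contract can be sketched for a single event type. The package is described as using Zod `.strict()` schemas; the hand-rolled check below is a dependency-free stand-in with the same rejecting behaviour, not the package's code. The function name `serializeSpecApproved` is hypothetical, and the field rules are those of the `spec_approved` table.

```typescript
const ENVELOPE_KEYS = ["type", "ts", "user", "project", "task_id", "harness_version"];
const SPEC_APPROVED_KEYS = new Set<string>([...ENVELOPE_KEYS, "iterations"]);

function serializeSpecApproved(
  envelope: Record<string, unknown>,
  fields: Record<string, unknown>,
): string {
  // Step 2: per-type fields are merged at the top level, with no nested payload.
  const event: Record<string, unknown> = { ...envelope, ...fields };

  // Step 3: strict validation, so any unknown key rejects the whole event.
  for (const key of Object.keys(event)) {
    if (!SPEC_APPROVED_KEYS.has(key)) {
      throw new Error(`unknown key "${key}" for spec_approved`);
    }
  }
  if (event["type"] !== "spec_approved") {
    throw new Error("type must be spec_approved");
  }
  const taskId = event["task_id"];
  if (typeof taskId !== "string" || taskId.length === 0) {
    throw new Error("task_id is required for spec_approved");
  }
  const iterations = event["iterations"];
  if (typeof iterations !== "number" || !Number.isInteger(iterations) || iterations < 1) {
    throw new Error("iterations must be an integer >= 1");
  }

  // Only a fully validated event is serialized, so no partial line is ever produced.
  return JSON.stringify(event) + "\n";
}
```

The returned string is a complete line that could then be passed to a single append call, as step 4 describes.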
279
+ ## Out of scope (recorded for forward-design)
280
+
281
+ - Tier 2 central ingestion, export, and aggregation (Phase 1+, decision #24).
282
+ - The `project`-tier export branch: the axis table assigns the `identity` /
283
+ `free-text` policy, but the sanitizer wires only the `anonymous` branch in v1
284
+ (D8/D9). The `project` branch lands with the consent/config work (#229).
285
+ - Backward-compatible field renames — `tier2-events-history.md` records them
286
+ forward-only; readers add the alias when they care.
@@ -0,0 +1,62 @@
1
+ # skills/ — vendor skill catalog
2
+
3
+ > **Status: 18 skills (P1–P3 + P4).** P1 migrated `triage-issue`, `tdd`,
4
+ > `senior-review`. P2 migrated `grill-with-docs` and authored `prd-to-spec`,
5
+ > `spec-to-issue`, `task-closeout`. P3 authored the discovery loop (`raise-discovery`,
6
+ > `resolve-discovery`). **P4 slice 1** added the Reviewer set — `verify`,
7
+ > `silent-failure-hunter`, `security-review`, `spec-compliance-check`,
8
+ > `test-gap-report` — rewrote `senior-review` to v2, and rolled out the gating
9
+ > frontmatter (`min-profile`/`phase`/`invoked-by`) across the catalog. **P4 slice 2**
10
+ > added the Architect set — `write-adr`, `update-architecture`, `code-explorer`,
11
+ > `playbook-iterate`. All `origin: vendor`.
12
+
13
+ Generic, **project-agnostic** skills only (decision #28). Project-specific skills
14
+ (e2e, changeset, docs) live in the **client's** `.claude/skills/`, not here. Each
15
+ skill is a folder `skills/<name>/SKILL.md`.
16
+
17
+ ## Gating frontmatter (decision #31; ADR 0015)
18
+
19
+ The installer scans this catalog, parses each skill's frontmatter, and lands a skill
20
+ when every **`applies-when`** capability key holds for the repo (install-time,
21
+ deterministic). A skill with no `applies-when` is always installed — there is no
22
+ profile tier (the coarse `min-profile` filter #39 was retired in ADR 0015, when
23
+ profiles collapsed to a single capability-gated skill set). Each sub-agent's
24
+ `{{SKILLS}}` marker is filled with the skills whose `invoked-by` lists it, grouped by **`phase`**.
25
+ A skill with no `phase` is universal (e.g. `raise-discovery`).
26
+
27
+ | Field | Meaning | Default |
28
+ | ------------------- | ---------------------------------------------------------------------- | ---------------- |
29
+ | `phase` | `pre-implementation` / `during-implementation` / `post-implementation` | none ⇒ universal |
30
+ | `invoked-by` | roles whose marker lists it | `[]` |
31
+ | `applies-when` | capability keys (AND) the repo must satisfy at install | `[]` (always) |
32
+ | `trigger-condition` | per-change runtime guard rendered with the skill | — |
33
+
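The gating rules in this section can be sketched as two small functions. This is an interpretation of the prose, not the installer's code: `SkillMeta`, `shouldInstall`, and `skillsForRole` are hypothetical names, the capability key used in the usage note is invented for illustration, and labelling phase-less skills as `universal` follows the table's "none ⇒ universal" default.

```typescript
type Phase = "pre-implementation" | "during-implementation" | "post-implementation";

interface SkillMeta {
  name: string;
  phase?: Phase; // absent means universal
  invokedBy: string[]; // mirrors `invoked-by`
  appliesWhen: string[]; // mirrors `applies-when`
}

// A skill lands when every applies-when key holds (AND).
// An empty list is vacuously true, so such a skill is always installed.
function shouldInstall(skill: SkillMeta, capabilities: Set<string>): boolean {
  return skill.appliesWhen.every((key) => capabilities.has(key));
}

// The skills that would fill one role's marker, grouped by phase.
function skillsForRole(
  skills: SkillMeta[],
  role: string,
  capabilities: Set<string>,
): Map<string, string[]> {
  const grouped = new Map<string, string[]>();
  for (const skill of skills) {
    if (!shouldInstall(skill, capabilities)) continue;
    if (!skill.invokedBy.includes(role)) continue;
    const bucket = skill.phase ?? "universal";
    grouped.set(bucket, [...(grouped.get(bucket) ?? []), skill.name]);
  }
  return grouped;
}
```

For example, a skill with `appliesWhen: ["has-architecture-doc"]` (an invented key) would be skipped for a repo whose capability set is empty, while a skill with no `appliesWhen` entries would always be included.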
34
+ ## Planned MVP catalog (Phase 0)
35
+
36
+ | Skill | Role | Status |
37
+ | ----------------------- | ---------------------------------- | ------------------ |
38
+ | `grill-with-docs` | Orchestrator (shared w/ Architect) | migrated ✓ (P2) |
39
+ | `triage-issue` | Orchestrator | migrated ✓ (P1) |
40
+ | `task-closeout` | Orchestrator | authored ✓ (P2) |
41
+ | `prd-to-spec` | Spec Author | authored ✓ (P2) |
42
+ | `spec-to-issue` | Spec Author | authored ✓ (P2) |
43
+ | `tdd` | Implementer | migrated ✓ (P1) |
44
+ | `senior-review` | Reviewer | v2 ✓ (P4 s1) |
45
+ | `verify` | Implementer, Reviewer | authored ✓ (P4 s1) |
46
+ | `security-review` | Reviewer | authored ✓ (P4 s1) |
47
+ | `silent-failure-hunter` | Reviewer | authored ✓ (P4 s1) |
48
+ | `spec-compliance-check` | Reviewer | authored ✓ (P4 s1) |
49
+ | `test-gap-report` | Reviewer | authored ✓ (P4 s1) |
50
+ | `write-adr` | Architect | authored ✓ (P4 s2) |
51
+ | `update-architecture` | Architect | authored ✓ (P4 s2) |
52
+ | `code-explorer` | Architect | authored ✓ (P4 s2) |
53
+ | `playbook-iterate` | Architect | authored ✓ (P4 s2) |
54
+ | `raise-discovery` | all sub-agents (universal) | authored ✓ (P3) |
55
+ | `resolve-discovery` | Orchestrator | authored ✓ (P3) |
56
+
57
+ **Parked from the vendor MVP:** `feature-flow`, `prd-to-plan`, `prd-to-issues`,
58
+ `pr-review`, `ui-design`, `project-setup`, `write-a-skill`, `grill-me` (deprecated).
59
+
60
+ **ECC as marketplace (decision #42):** [`affaan-m/ECC`](https://github.com/affaan-m/ECC)
61
+ is MIT; when a client needs a specific skill ECC already has, copy-and-adapt it into
62
+ the client's `.claude/skills/` rather than bloating this catalog.
@@ -0,0 +1,78 @@
1
+ ---
2
+ name: bootstrap-architecture
3
+ description: Author the first docs/architecture.md — a holistic map of the system's current shape, fitted to this project. A one-time bootstrap the Architect runs when a client opts into the architecture capability; thereafter update-architecture maintains it incrementally. The harness only does this when the human asks (decision #8).
4
+ origin: vendor
5
+ vendor_version: '{{vendor_version}}'
6
+ invoked-by: [architect]
7
+ ---
8
+
9
+ # Bootstrap Architecture
10
+
11
+ ## Core Principle
12
+
13
+ `docs/architecture.md` is the **living high-level map** of the system — the shape a new
14
+ engineer reads first (contexts, boundaries/ownership, integration seams, external
15
+ dependencies, data flow). This skill authors it **for the first time**: a holistic map of
16
+ the system **as it is today**, read from the actual code.
17
+
18
+ This is the one-time **bootstrap**, the opposite of the incremental `update-architecture`:
19
+ that skill makes the smallest surgical edit after a single change; this one reads the whole
20
+ repo and writes the initial coherent map. After this lands, `update-architecture` (gated on
21
+ the file now existing) keeps it true incrementally.
22
+
23
+ It is **not gated** — it is available before any `docs/architecture.md` exists, because its
24
+ job is to create the first one. But the harness runs it **only when the human opts in**
25
+ (via `/add-capability`, or an explicit request), never on its own: the vendor gives the
26
+ framework, not the architecture (decision #8). The map is authored **from the client's real
27
+ code**, so the harness reflects the project's shape — it never imposes one.
28
+
29
+ ## What to produce
30
+
31
+ A map **fitted to this project**, not a template. The sections **emerge from the contexts
32
+ you actually find** — do **not** treat the list below as a fixed heading set to emit. These
33
+ are **lenses to look through**, not mandatory sections: cover the ones this system actually
34
+ has, drop the ones it doesn't, name them in the project's own terms. Aim for the shape, not
35
+ the detail:
36
+
37
+ - the **bounded contexts / modules** the system actually divides into, and what each owns;
38
+ - the **boundaries & ownership rules** between them ("X owns Y; others reference by id");
39
+ - the **integration seams** (sync HTTP, domain events, queues, a provider abstraction);
40
+ - the **external dependencies** that carry lock-in (a database, a broker, an auth provider);
41
+ - the **data flow** a reader would otherwise get wrong.
42
+
43
+ Keep it a **map, not a code listing**. Resist implementation detail that belongs in code,
44
+ `CLAUDE.md`, or a playbook. If the project records decisions as ADRs, link the _why_
45
+ (`see ADR-NNNN`) rather than restating it — the map holds the _shape_, the ADR the _decision_.
46
+
47
+ ## Process
48
+
49
+ 1. **Orient over the whole repo.** Read the structure to derive the real shape — the
50
+ top-level layout, the module/folder boundaries, the entry points, the external deps in
51
+ the manifest, the seams between parts. Where a `code-explorer` map is available, start
52
+ from it. This is a read of the **current** code, not a guess.
53
+ 2. **Draft the map at the right altitude** — sections matching the contexts you found,
54
+ each a few lines: what it is, what it owns, how it connects. Prefer a small, true map
55
+ over an exhaustive one; the reader wants the shape.
56
+ 3. **Write `docs/architecture.md`.** Create the file (and `docs/` if absent).
57
+ 4. **Hand back for activation.** You do not run `repair` yourself — the map alone does not
58
+ install the maintainer skill. Report to your invoker (the Orchestrator) that
59
+ `docs/architecture.md` now exists, so it runs `repair`; its re-scan detects the new file
60
+ and installs `update-architecture` to keep the map current. See the `/add-capability`
61
+ command.
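Step 3 above can be sketched as follows. This is a minimal illustration, not the skill's own implementation: the function name `write_architecture_map` and the `repo_root` parameter are assumptions introduced here, and activation via `repair` is deliberately left to the invoker, as step 4 requires.

```python
from pathlib import Path


def write_architecture_map(repo_root: Path, content: str) -> Path:
    """Write docs/architecture.md under repo_root, creating docs/ if absent.

    Hypothetical helper: it only writes the file and never runs `repair`.
    """
    target = repo_root / "docs" / "architecture.md"
    # parents=True and exist_ok=True make the call safe whether or not docs/ exists.
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(content, encoding="utf-8")
    return target
```

The returned path is what would be reported back to the invoker so that it can run `repair`.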
62
+
63
+ ## Report
64
+
65
+ Return to your invoker: the sections written, a one-line summary of the shape the map now
66
+ captures, and the explicit note that `update-architecture` should be activated via `repair`.
67
+ If the project is too small or too uniform to have a meaningful architecture (a single-purpose
68
+ script, a flat library), **say so and write nothing** — an architecture map for a project
69
+ without architecture is noise, and decision #8 means the client need not have one.
70
+
71
+ ## Uncontemplated Scenarios
72
+
73
+ When a case doesn't clearly fit:
74
+
75
+ 1. Apply the closest matching approach with reasoning.
76
+ 2. **Flag it**: "This isn't covered by the bootstrap-architecture skill. I did [approach]
77
+ because [reason]. Want to refine the skill?"
78
+ 3. Offer to add a rule for the case.
@@ -0,0 +1,76 @@
1
+ ---
2
+ name: code-explorer
3
+ description: Systematically map a large or unfamiliar codebase and return a structured orientation — entry points, modules, data flow, conventions, and the seams where bugs live. Read-only; it explores and reports, it never edits. Use when the Architect (or another agent, via the Orchestrator) needs to get oriented before a decision, a spec, or a deep change in a codebase no one has in context.
4
+ origin: vendor
5
+ vendor_version: '{{vendor_version}}'
6
+ invoked-by: [architect]
7
+ trigger-condition: orienting in a large or unfamiliar codebase
8
+ ---
9
+
10
+ # Code Explorer
11
+
12
+ ## Core Principle
13
+
14
+ Before you can decide, spec, or change anything safely in an unfamiliar codebase, you
15
+ need a map. This skill produces that map: a structured, evidence-based orientation a
16
+ fresh sub-agent can build on. It is **read-only** — it reads, traces, and reports; it
17
+ never edits. Its value is the fresh-context guard against bias that any sub-agent brings: it sees the
18
+ code as it is, not as someone hoped it was.
19
+
20
+ ## Process
21
+
22
+ Work outside-in. Cite real paths and symbols — an orientation that can't be checked is
23
+ worthless.
24
+
25
+ **Start from the map if there is one.** If the repo keeps `docs/architecture.md`, read it
26
+ first — it is the maintained high-level map of the system's shape (contexts, boundaries,
27
+ seams). Use it as your baseline: don't re-derive what it already states; deep-dive only
28
+ where it is thin or stale for the question at hand. If it is **absent**, map from scratch
29
+ as below — and never suggest creating it (it is the client's choice, decision #8). When the
30
+ map contradicts the code in an area you read (the map says X, the code does Y), call out the
31
+ staleness in your report's **Notes for the task** so the Architect (your invoker, who owns the map) can
32
+ reconcile it via `update-architecture` — don't silently trust either side.
33
+
34
+ 1. **Frame the question.** What is the exploration _for_? "Where does auth happen?",
35
+ "How does a request flow end to end?", "Is there already a solution to X?" Scope the
36
+ sweep to the question — don't map the whole repo when one slice is asked for.
37
+ 2. **Find the entry points.** The manifest (`package.json` scripts, `bin`, `main`), the
38
+ server bootstrap, the route table, the CLI dispatcher, the test setup. These anchor
39
+ everything else.
40
+ 3. **Map the modules and their boundaries.** The top-level structure, what each
41
+ significant module owns, and how they depend on each other. Note the **seams** —
42
+ where modules talk (HTTP, events, shared state, serialization). Most bugs live here.
43
+ 4. **Trace the critical path(s).** Follow the one or two flows the question cares about
44
+ from entry to effect (request → handler → service → store → response). Name the files
45
+ and functions on the path.
46
+ 5. **Read the conventions.** Naming, file layout, error handling, the test strategy,
47
+ and any `CLAUDE.md` / `CONTEXT.md` / playbooks the repo already documents. An agent
48
+ that follows existing conventions is far less likely to raise a false discovery.
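As a rough illustration of step 2 for a Node project, the manifest fields named above (`main`, `bin`, `scripts`) can be read straight from `package.json`. The function name `manifest_entry_points` is an assumption made for this sketch; the skill itself does not prescribe any tooling.

```python
import json
from pathlib import Path


def manifest_entry_points(package_json: Path) -> dict[str, object]:
    """Return the entry-point fields present in a package.json manifest.

    Hypothetical helper: only `main`, `bin`, and `scripts` are inspected,
    and missing fields are simply omitted from the result.
    """
    manifest = json.loads(package_json.read_text(encoding="utf-8"))
    return {key: manifest[key] for key in ("main", "bin", "scripts") if key in manifest}
```

The result is a starting list of anchors only; the server bootstrap, route table, and test setup still have to be found by reading the code.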
49
+
50
+ ## Report
51
+
52
+ A concise orientation, not a file dump — the conclusion, with paths to verify it:
53
+
54
+ ```
55
+ ## Code map — <scope of the exploration>
56
+
57
+ **Entry points**: <files/commands>
58
+ **Key modules**: <module → what it owns> (the few that matter)
59
+ **Critical path**: <entry → … → effect, with file:symbol references>
60
+ **Seams**: <module boundaries / integration points; where to test first>
61
+ **Conventions**: <naming, layout, error handling, test strategy, docs that exist>
62
+ **Notes for the task**: <existing solutions, risks, open questions>
63
+ ```
64
+
65
+ If the exploration surfaces a genuine T1–T6 case (e.g. the change already exists —
66
+ T4 EXISTING_SOLUTION, or the codebase makes the plan infeasible — T5), that's a
67
+ **discovery**: run `raise-discovery` rather than burying it in the report.
68
+
69
+ ## Uncontemplated Scenarios
70
+
71
+ When a case doesn't clearly fit:
72
+
73
+ 1. Apply the closest matching approach with reasoning.
74
+ 2. **Flag it**: "This isn't covered by the code-explorer skill. I did [approach] because
75
+ [reason]. Want to refine the skill?"
76
+ 3. Offer to add a rule for the case.
@@ -0,0 +1,49 @@
1
+ # ADR Format
2
+
3
+ ADRs live in `docs/adr/` and use sequential numbering: `0001-slug.md`, `0002-slug.md`, etc.
4
+
5
+ Create the `docs/adr/` directory lazily — only when the first ADR is needed.
6
+
7
+ ## Template
8
+
9
+ ```md
10
+ # NNNN — {Short title of the decision}
11
+
12
+ {1-3 sentences: what's the context, what did we decide, and why.}
13
+ ```
14
+
15
+ The `NNNN` number prefixes the title (matching the filename `NNNN-slug.md`).
16
+
17
+ That's it. An ADR can be a single paragraph. The value is in recording _that_ a decision was made and _why_ — not in filling out sections.
18
+
19
+ ## Optional sections
20
+
21
+ Only include these when they add genuine value. Most ADRs won't need them.
22
+
23
+ - **Status** frontmatter (`proposed | accepted | deprecated | superseded by ADR-NNNN`) — useful when decisions are revisited
24
+ - **Considered Options** — only when the rejected alternatives are worth remembering
25
+ - **Consequences** — only when non-obvious downstream effects need to be called out
26
+
27
+ ## Numbering
28
+
29
+ Scan `docs/adr/` for the highest existing number and increment by one, keeping the four-digit zero padding (`0001`, `0002`, ...).
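The numbering rule could be sketched like this. It is a minimal example under stated assumptions: the helper names `next_adr_number` and `adr_filename` are hypothetical, and a missing `docs/adr/` directory is treated as "no ADRs yet" because the directory is created lazily.

```python
import re
from pathlib import Path

# Matches filenames that follow the NNNN-slug.md convention, e.g. "0007-use-postgres.md".
ADR_NAME = re.compile(r"^(\d{4})-.+\.md$")


def next_adr_number(adr_dir: Path) -> int:
    """Return the next sequential ADR number for adr_dir (1 if none exist)."""
    if not adr_dir.is_dir():
        # The directory is created lazily, so its absence means no ADRs yet.
        return 1
    numbers = [
        int(match.group(1))
        for entry in adr_dir.iterdir()
        if (match := ADR_NAME.match(entry.name))
    ]
    return max(numbers, default=0) + 1


def adr_filename(number: int, slug: str) -> str:
    """Format a filename with the four-digit prefix, e.g. 0003-some-slug.md."""
    return f"{number:04d}-{slug}.md"
```

Files that do not match the pattern (a `README.md`, for instance) are ignored rather than treated as errors.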
30
+
31
+ ## When to offer an ADR
32
+
33
+ All three of these must be true:
34
+
35
+ 1. **Hard to reverse** — the cost of changing your mind later is meaningful
36
+ 2. **Surprising without context** — a future reader will look at the code and wonder "why on earth did they do it this way?"
37
+ 3. **The result of a real trade-off** — there were genuine alternatives and you picked one for specific reasons
38
+
39
+ If a decision is easy to reverse, skip it — you'll just reverse it. If it's not surprising, nobody will wonder why. If there was no real alternative, there's nothing to record beyond "we did the obvious thing".
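The three-part test reads naturally as a conjunction. The sketch below is illustrative only: the `Decision` type and both function names are assumptions, and the judgment behind each flag still belongs to the person or agent making the call.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Decision:
    """Hypothetical record of the three criteria for one decision."""

    hard_to_reverse: bool
    surprising_without_context: bool
    real_trade_off: bool


def skip_reasons(decision: Decision) -> list[str]:
    """List which criteria fail; an empty list means an ADR is worth offering."""
    reasons = []
    if not decision.hard_to_reverse:
        reasons.append("easy to reverse")
    if not decision.surprising_without_context:
        reasons.append("not surprising")
    if not decision.real_trade_off:
        reasons.append("no real alternative")
    return reasons


def should_offer_adr(decision: Decision) -> bool:
    """All three criteria must hold before an ADR is offered."""
    return not skip_reasons(decision)
```

Returning the failed criteria, rather than a bare boolean, makes it easy to say why a decision was not recorded.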
40
+
41
+ ### What qualifies
42
+
43
+ - **Architectural shape.** "We're using a monorepo." "The write model is event-sourced, the read model is projected into Postgres."
44
+ - **Integration patterns between contexts.** "Ordering and Billing communicate via domain events, not synchronous HTTP."
45
+ - **Technology choices that carry lock-in.** Database, message bus, auth provider, deployment target. Not every library — just the ones that would take a quarter to swap out.
46
+ - **Boundary and scope decisions.** "Customer data is owned by the Customer context; other contexts reference it by ID only." The explicit no-s are as valuable as the yes-s.
47
+ - **Deliberate deviations from the obvious path.** "We're using manual SQL instead of an ORM because X." Anything where a reasonable reader would assume the opposite. These stop the next engineer from "fixing" something that was deliberate.
48
+ - **Constraints not visible in the code.** "We can't use AWS because of compliance requirements." "Response times must be under 200 ms because of the partner API contract."
49
+ - **Rejected alternatives when the rejection is non-obvious.** If you considered GraphQL and picked REST for subtle reasons, record it — otherwise someone will suggest GraphQL again in six months.
@@ -0,0 +1,77 @@
1
+ # CONTEXT.md Format
2
+
3
+ ## Structure
4
+
5
+ ```md
6
+ # {Context Name}
7
+
8
+ {One or two sentence description of what this context is and why it exists.}
9
+
10
+ ## Language
11
+
12
+ **Order**:
13
+ {A concise description of the term}
14
+ _Avoid_: Purchase, transaction
15
+
16
+ **Invoice**:
17
+ A request for payment sent to a customer after delivery.
18
+ _Avoid_: Bill, payment request
19
+
20
+ **Customer**:
21
+ A person or organization that places orders.
22
+ _Avoid_: Client, buyer, account
23
+
24
+ ## Relationships
25
+
26
+ - An **Order** produces one or more **Invoices**
27
+ - An **Invoice** belongs to exactly one **Customer**
28
+
29
+ ## Example dialogue
30
+
31
+ > **Dev:** "When a **Customer** places an **Order**, do we create the **Invoice** immediately?"
32
+ > **Domain expert:** "No — an **Invoice** is only generated once a **Fulfillment** is confirmed."
33
+
34
+ ## Flagged ambiguities
35
+
36
+ - "account" was used to mean both **Customer** and **User** — resolved: these are distinct concepts.
37
+ ```
38
+
39
+ ## Rules
40
+
41
+ - **Be opinionated.** When multiple words exist for the same concept, pick the best one and list the others as aliases to avoid.
42
+ - **Flag conflicts explicitly.** If a term is used ambiguously, call it out in "Flagged ambiguities" with a clear resolution.
43
+ - **Keep definitions tight.** One sentence max. Define what it IS, not what it does.
44
+ - **Show relationships.** Use bold term names and express cardinality where obvious.
45
+ - **Only include terms specific to this project's context.** General programming concepts (timeouts, error types, utility patterns) don't belong, even if the project uses them extensively. Before adding a term, ask whether it is unique to this context; if it is not, leave it out.
46
+ - **Group terms under subheadings** when natural clusters emerge. If all terms belong to a single cohesive area, a flat list is fine.
47
+ - **Write an example dialogue.** A conversation between a dev and a domain expert that demonstrates how the terms interact naturally and clarifies boundaries between related concepts.
48
+
49
+ ## Single vs multi-context repos
50
+
51
+ **Single context (most repos):** One `CONTEXT.md` at the repo root.
52
+
53
+ **Multiple contexts:** A `CONTEXT-MAP.md` at the repo root lists the contexts, where they live, and how they relate to each other:
54
+
55
+ ```md
56
+ # Context Map
57
+
58
+ ## Contexts
59
+
60
+ - [Ordering](./src/ordering/CONTEXT.md) — receives and tracks customer orders
61
+ - [Billing](./src/billing/CONTEXT.md) — generates invoices and processes payments
62
+ - [Fulfillment](./src/fulfillment/CONTEXT.md) — manages warehouse picking and shipping
63
+
64
+ ## Relationships
65
+
66
+ - **Ordering → Fulfillment**: Ordering emits `OrderPlaced` events; Fulfillment consumes them to start picking
67
+ - **Fulfillment → Billing**: Fulfillment emits `ShipmentDispatched` events; Billing consumes them to generate invoices
68
+ - **Ordering ↔ Billing**: Shared types for `CustomerId` and `Money`
69
+ ```
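To make the shape of the `## Contexts` list concrete, here is a hedged sketch that extracts the entries from a context map written in the format shown above. The `ContextEntry` type and `parse_contexts` function are hypothetical, and the pattern assumes each entry is a single `- [Name](path)` line followed by a dash-separated summary.

```python
import re
from typing import NamedTuple


class ContextEntry(NamedTuple):
    name: str
    path: str
    summary: str


# "- [Name](path) <dash> summary"; \u2014 is the dash used in the example map.
ENTRY = re.compile(r"^- \[(?P<name>[^\]]+)\]\((?P<path>[^)]+)\)\s*\u2014\s*(?P<summary>.+)$")


def parse_contexts(markdown: str) -> list[ContextEntry]:
    """Return one entry per linked context; lines in other shapes are skipped."""
    entries = []
    for line in markdown.splitlines():
        match = ENTRY.match(line.strip())
        if match:
            entries.append(
                ContextEntry(match["name"], match["path"], match["summary"].strip())
            )
    return entries
```

Relationship lines are skipped because they carry no link, so only the context list is returned.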
70
+
71
+ The skill infers which structure applies:
72
+
73
+ - If `CONTEXT-MAP.md` exists, read it to find contexts
74
+ - If only a root `CONTEXT.md` exists, treat the repo as a single context
75
+ - If neither exists, create a root `CONTEXT.md` lazily when the first term is resolved
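The three cases above amount to a short precedence check, sketched here under assumptions: `ContextLayout` and `detect_layout` are names invented for this example, and the `NONE` case only reports that nothing exists yet, leaving the lazy creation of `CONTEXT.md` to whoever resolves the first term.

```python
from enum import Enum
from pathlib import Path


class ContextLayout(Enum):
    MULTI = "multi"    # CONTEXT-MAP.md at the repo root
    SINGLE = "single"  # only a root CONTEXT.md
    NONE = "none"      # neither file exists yet


def detect_layout(repo_root: Path) -> ContextLayout:
    """Infer which structure applies; the context map takes precedence."""
    if (repo_root / "CONTEXT-MAP.md").is_file():
        return ContextLayout.MULTI
    if (repo_root / "CONTEXT.md").is_file():
        return ContextLayout.SINGLE
    return ContextLayout.NONE
```

Checking for the map first matters: a multi-context repo may still have a root `CONTEXT.md`, and the map should win in that case.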
76
+
77
+ When multiple contexts exist, infer which one the current topic relates to. If unclear, ask.