@lemoncode/lemony 0.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (68)
  1. package/LICENSE +21 -0
  2. package/PRIVACY.md +147 -0
  3. package/README.md +189 -0
  4. package/catalog/VERSION +1 -0
  5. package/catalog/agents/README.md +29 -0
  6. package/catalog/agents/architect.md +81 -0
  7. package/catalog/agents/fit-assessment.md +94 -0
  8. package/catalog/agents/implementer.md +67 -0
  9. package/catalog/agents/orchestrator.md +627 -0
  10. package/catalog/agents/reviewer.md +124 -0
  11. package/catalog/agents/spec-author.md +69 -0
  12. package/catalog/agents/ui-designer.md +25 -0
  13. package/catalog/commands/add-capability.md +69 -0
  14. package/catalog/commands/bypass.md +40 -0
  15. package/catalog/commands/define.md +24 -0
  16. package/catalog/commands/hotfix.md +47 -0
  17. package/catalog/commands/pause.md +52 -0
  18. package/catalog/commands/resume.md +56 -0
  19. package/catalog/commands/spinoff.md +59 -0
  20. package/catalog/commands/triage.md +24 -0
  21. package/catalog/harness.config.schema.json +116 -0
  22. package/catalog/hooks/README.md +56 -0
  23. package/catalog/hooks/init.sh +281 -0
  24. package/catalog/hooks/lib/lemony.sh +41 -0
  25. package/catalog/hooks/lib/playbook-scan.sh +394 -0
  26. package/catalog/hooks/lib/transcript-grep.sh +56 -0
  27. package/catalog/hooks/require-playbook.sh +97 -0
  28. package/catalog/hooks/session-close.sh +232 -0
  29. package/catalog/hooks/suggest-playbook.sh +72 -0
  30. package/catalog/playbook-format.md +198 -0
  31. package/catalog/schemas/README.md +13 -0
  32. package/catalog/schemas/tier2-events-history.md +104 -0
  33. package/catalog/schemas/tier2-events.md +286 -0
  34. package/catalog/skills/README.md +62 -0
  35. package/catalog/skills/bootstrap-architecture/SKILL.md +78 -0
  36. package/catalog/skills/code-explorer/SKILL.md +76 -0
  37. package/catalog/skills/grill-with-docs/ADR-FORMAT.md +49 -0
  38. package/catalog/skills/grill-with-docs/CONTEXT-FORMAT.md +77 -0
  39. package/catalog/skills/grill-with-docs/SKILL.md +270 -0
  40. package/catalog/skills/grill-with-docs/reference.md +236 -0
  41. package/catalog/skills/mutation-testing/SKILL.md +84 -0
  42. package/catalog/skills/note-side-finding/SKILL.md +89 -0
  43. package/catalog/skills/playbook-iterate/SKILL.md +78 -0
  44. package/catalog/skills/prd-to-spec/SKILL.md +181 -0
  45. package/catalog/skills/raise-discovery/SKILL.md +112 -0
  46. package/catalog/skills/resolve-discovery/SKILL.md +123 -0
  47. package/catalog/skills/review-pr/SKILL.md +106 -0
  48. package/catalog/skills/review-pr/reference.md +105 -0
  49. package/catalog/skills/security-review/SKILL.md +90 -0
  50. package/catalog/skills/senior-review/SKILL.md +99 -0
  51. package/catalog/skills/silent-failure-hunter/SKILL.md +76 -0
  52. package/catalog/skills/spec-compliance-check/SKILL.md +74 -0
  53. package/catalog/skills/spec-to-issue/SKILL.md +88 -0
  54. package/catalog/skills/task-closeout/SKILL.md +229 -0
  55. package/catalog/skills/tdd/SKILL.md +171 -0
  56. package/catalog/skills/test-gap-report/SKILL.md +71 -0
  57. package/catalog/skills/triage-issue/SKILL.md +102 -0
  58. package/catalog/skills/update-architecture/SKILL.md +69 -0
  59. package/catalog/skills/verify/SKILL.md +90 -0
  60. package/catalog/skills/write-adr/SKILL.md +77 -0
  61. package/catalog/templates/README.md +32 -0
  62. package/catalog/templates/claude-code/.claude/settings.json.tpl +34 -0
  63. package/catalog/templates/claude-code/agents.md.tpl +109 -0
  64. package/catalog/templates/claude-code/docs/playbooks/README.md.tpl +96 -0
  65. package/catalog/templates/claude-code/harness.config.yml.tpl +59 -0
  66. package/catalog/templates/claude-code/state/history.md.tpl +6 -0
  67. package/dist/cli.mjs +5691 -0
  68. package/package.json +80 -0
@@ -0,0 +1,106 @@
1
+ ---
2
+ name: review-pr
3
+ description: Interactive PR review with a dry-run approval flow — fetch the diff, analyze it through the review lenses, present a numbered findings table, and post only the human-selected ones as GitHub inline review comments. Use at the merge gate (or whenever a curated inline pass is wanted) — a lightweight assist, not the heavy multi-agent fan-out.
4
+ origin: vendor
5
+ vendor_version: '{{vendor_version}}'
6
+ phase: post-implementation
7
+ invoked-by: [orchestrator]
8
+ ---
9
+
10
+ # Review PR
11
+
12
+ ## Core principle
13
+
14
+ **Show before you post.** Analyze → dry-run → human picks → post. Never submit GitHub
15
+ review comments without explicit human approval. This is a single-pass, curated assist
16
+ layered on the merge gate. An exhaustive multi-agent audit is a different,
17
+ heavier tool, and deliberately not this one.
18
+
19
+ ## Workflow
20
+
21
+ ### 0. Resolve the PR
22
+
23
+ - **URL** (`https://github.com/<org>/<repo>/pull/123`): extract `<org>/<repo>` + number.
24
+ - **Number** (`#834`, `834`): resolve the repo with `gh repo view --json nameWithOwner`.
25
+ - **No argument**: detect the open PR on the current branch with `gh pr view`.
26
+
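The three input forms above could be told apart with a small parser. This is a minimal sketch, not the skill's own code: `PrRef` and `parsePrRef` are hypothetical names, and the actual repo lookup is left to the `gh` commands the step already names.

```typescript
// Hypothetical helper: names and return shape are illustrative only.
type PrRef =
  | { kind: "url"; repo: string; number: number }
  | { kind: "number"; number: number }
  | { kind: "current-branch" };

function parsePrRef(arg: string | undefined): PrRef {
  const input = (arg ?? "").trim();

  // No argument: the caller falls back to `gh pr view` on the current branch.
  if (input === "") return { kind: "current-branch" };

  // Full URL: capture `<org>/<repo>` and the PR number.
  const url = /^https:\/\/github\.com\/([^/\s]+\/[^/\s]+)\/pull\/(\d+)/.exec(input);
  if (url) return { kind: "url", repo: url[1], number: Number(url[2]) };

  // Bare number, with or without a leading `#`; the repo still has to be
  // resolved with `gh repo view --json nameWithOwner`.
  const num = /^#?(\d+)$/.exec(input);
  if (num) return { kind: "number", number: Number(num[1]) };

  // Anything else fails loudly instead of guessing.
  throw new Error(`Unrecognized PR reference: "${input}"`);
}
```

Returning a tagged union keeps the "which lookup is still needed" decision explicit for the caller.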
27
+ Run in parallel:
28
+
29
+ - `gh pr diff <number> --repo <org>/<repo>`
30
+ - `gh pr view <number> --repo <org>/<repo> --json title,headRefName,baseRefName,files,headRefOid`
31
+
32
+ If the diff is empty, stop and tell the human.
33
+
34
+ ### 1. Read project context
35
+
36
+ Read the repo's `CLAUDE.md` (and `CONTEXT.md` if present). Summarize in ≤ 50 lines:
37
+ code-style rules, comment conventions, naming, architecture constraints. This is what
38
+ the analysis judges "against project convention" by.
39
+
40
+ ### 2. Analyze (single sub-agent, the #43 lenses as a checklist)
41
+
42
+ Spawn **one** `Explore` sub-agent with the full diff, the changed-files list, and the
43
+ project context. Its prompt walks the five review lenses as a checklist — **not** five
44
+ separate agents (that's the heavy fan-out this skill deliberately avoids):
45
+
46
+ 1. **Security** — injection, auth/authz gaps, secret exposure, unsafe input reaching a
47
+ sink, OWASP-class issues.
48
+ 2. **Silent failures** — swallowed errors, empty `catch`, a fallback that hides a bug,
49
+ a `return` where it should `throw`.
50
+ 3. **Spec compliance** — does the diff do what the issue/spec/PR says, no more and no
51
+ less? Scope drift, missing acceptance criteria.
52
+ 4. **Test gaps** — logic-bearing code with no test; a bug fix with no regression test;
53
+ assertions that don't actually pin the behavior.
54
+ 5. **Senior quality** — correctness, edge cases, naming that encodes intent, N+1s,
55
+ convention violations from step 1's summary.
56
+
57
+ The agent returns a strict JSON array of findings (see [reference.md](reference.md) for
58
+ the exact prompt and schema). It returns `[]` when the diff is clean — it must **not**
59
+ manufacture findings to seem thorough.
60
+
61
+ ### 3. Dry-run
62
+
63
+ Print the findings as a numbered table (don't post anything yet):
64
+
65
+ ```
66
+ ## Dry-run — <PR_TITLE> (#<NUMBER>)
67
+
68
+ | # | Sev | File | Line | Problem | Suggestion |
69
+ |---|-----|------|------|---------|------------|
70
+ | 1 | 🟠 | src/foo.ts | 42 | Swallowed error in the catch | Re-throw or log + surface |
71
+ ...
72
+
73
+ Total: N findings. Which do I post? ("all", "1,3,5", or "none")
74
+ ```
75
+
76
+ ### 4. Human selection
77
+
78
+ Wait for the reply. Parse:
79
+
80
+ - `all` → every finding.
81
+ - `none` → stop; post nothing.
82
+ - `1,3,5` or `1 3 5` → the selected indices.
83
+
84
+ Confirm the list before posting.
85
+
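The reply parsing above can be sketched as a pure function. `parseSelection` is a hypothetical name. Rejecting out-of-range or non-numeric tokens instead of ignoring them is an assumption, chosen so that a typo cannot silently change what gets posted.

```typescript
// Hypothetical helper: turns the human's reply into 1-based finding indices.
function parseSelection(reply: string, total: number): number[] {
  const text = reply.trim().toLowerCase();
  if (text === "none") return [];
  if (text === "all") return Array.from({ length: total }, (_, i) => i + 1);

  // Accept both "1,3,5" and "1 3 5" (and mixtures of the two).
  const tokens = text.split(/[\s,]+/).filter((t) => t.length > 0);
  if (tokens.length === 0) {
    throw new Error('Empty selection; expected "all", "none", or indices');
  }

  const picked = new Set<number>();
  for (const token of tokens) {
    if (!/^\d+$/.test(token)) throw new Error(`Not an index: "${token}"`);
    const index = Number(token);
    if (index < 1 || index > total) {
      throw new Error(`Index ${index} is outside 1..${total}`);
    }
    picked.add(index); // the Set drops duplicates such as "1,1,3"
  }
  return Array.from(picked).sort((a, b) => a - b);
}
```

The sorted, de-duplicated result is what would be echoed back to the human in the confirmation step.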
86
+ ### 5. Post the review comments
87
+
88
+ Use `headRefOid` from step 0 as `commit_id`. Post all selected comments in **one**
89
+ `gh api` call (one review thread, not N). See [reference.md](reference.md) for the
90
+ exact `gh api` invocation and the comment-body format.
91
+
92
+ ### 6. Confirm
93
+
94
+ Report the count of posted comments and the PR URL.
95
+
96
+ ## Error handling
97
+
98
+ - `gh` not available → stop, tell the human.
99
+ - A finding's line isn't in the diff (position mismatch) → skip that comment and warn.
100
+ - A post fails → show the error, offer a retry.
101
+
102
+ ## Uncontemplated scenarios
103
+
104
+ 1. Apply the closest matching rule, with reasoning.
105
+ 2. Flag it: "This case isn't covered by `review-pr`; I applied [rule] because [reason].
106
+ Want to refine the skill?"
@@ -0,0 +1,105 @@
1
+ # Review PR — reference
2
+
3
+ ## Analysis agent prompt
4
+
5
+ Pass this to the single `Explore` sub-agent. The five lenses are a **checklist one
6
+ agent walks**, not a fan-out.
7
+
8
+ ```
9
+ You are reviewing a GitHub PR diff. Find real issues worth posting as inline review
10
+ comments. Walk these five lenses in order; a finding can come from any of them:
11
+
12
+ 1. Security — injection, auth/authz gaps, secret exposure, unsafe input reaching a
13
+ sink, OWASP-class issues.
14
+ 2. Silent failures — swallowed errors, empty catch, a fallback that hides a bug, a
15
+ return where it should throw.
16
+ 3. Spec compliance — does the diff match the stated intent (issue/spec/PR)? Scope
17
+ drift, missing acceptance criteria.
18
+ 4. Test gaps — logic-bearing code with no test; a bug fix with no regression test;
19
+ assertions that don't pin the behavior.
20
+ 5. Senior quality — correctness, edge cases, naming, N+1s, convention violations.
21
+
22
+ Judge "against convention" using the project context below.
23
+
24
+ Do NOT flag:
25
+ - Style preferences not in the project context.
26
+ - TODOs explicitly marked as such.
27
+ - Changes that are clearly intentional from the diff context.
28
+
29
+ Return ONLY a JSON array (no markdown, no preamble). Return [] if the diff is clean —
30
+ do not manufacture findings.
31
+
32
+ ## Project context
33
+ <PROJECT_CONTEXT>
34
+
35
+ ## Changed files
36
+ <FILES>
37
+
38
+ ## Diff
39
+ <DIFF>
40
+ ```
41
+
42
+ ## Output schema
43
+
44
+ ```json
45
+ [
46
+ {
47
+ "file": "src/foo.ts",
48
+ "line": 42,
49
+ "side": "RIGHT",
50
+ "lens": "security|silent-failure|spec-compliance|test-gap|senior-quality",
51
+ "severity": "critical|important|suggestion",
52
+ "title": "Short title (max 60 chars)",
53
+ "issue": "What is wrong and why it matters",
54
+ "suggestion": "How to fix it (specific, not generic)"
55
+ }
56
+ ]
57
+ ```
58
+
59
+ Empty result: `[]`.
60
+
61
+ `line` must exist on the right side of the diff for that file. `side` is `"RIGHT"` for
62
+ added or modified lines.
63
+
64
+ ## Comment body format
65
+
66
+ Each GitHub review comment body:
67
+
68
+ ````markdown
69
+ **<title>** _(<lens>)_
70
+
71
+ <issue>
72
+
73
+ ```suggestion
74
+ <new line content> ← optional, only for a single-line replacement
75
+ ```
76
+ ````
77
+
78
+ ## Dry-run table
79
+
80
+ | # | Sev | File | Line | Problem | Suggestion |
81
+ | --- | --- | ---- | ---- | ------- | ---------- |
82
+
83
+ Severity emoji: 🔴 `critical` · 🟠 `important` · 🟡 `suggestion`.
84
+
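As a rough illustration, the dry-run table could be rendered from the findings array like this. The `Finding` shape is trimmed to the fields the table uses. Mapping the `Problem` column to `title` and escaping pipe characters are assumptions, not something the reference states.

```typescript
type Severity = "critical" | "important" | "suggestion";

// Trimmed-down version of the output schema: only what the table needs.
interface Finding {
  file: string;
  line: number;
  severity: Severity;
  title: string;
  suggestion: string;
}

const SEVERITY_EMOJI: Record<Severity, string> = {
  critical: "🔴",
  important: "🟠",
  suggestion: "🟡",
};

// Escape pipes and flatten newlines so finding text cannot break the table.
const cell = (text: string): string =>
  text.replace(/\|/g, "\\|").replace(/\r?\n/g, " ");

function renderDryRun(findings: Finding[]): string {
  const header = [
    "| # | Sev | File | Line | Problem | Suggestion |",
    "| --- | --- | ---- | ---- | ------- | ---------- |",
  ];
  const rows = findings.map(
    (f, i) =>
      `| ${i + 1} | ${SEVERITY_EMOJI[f.severity]} | ${cell(f.file)} | ${f.line} | ${cell(f.title)} | ${cell(f.suggestion)} |`,
  );
  return [...header, ...rows].join("\n");
}
```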
85
+ ## GitHub API — batch review
86
+
87
+ All selected comments go in a **single** API call (one review thread, not N):
88
+
89
+ ```bash
90
+ gh api repos/<org>/<repo>/pulls/<number>/reviews \
91
+ --method POST \
92
+ --field commit_id=<headRefOid> \
93
+ --field body='' \
94
+ --field event=COMMENT \
95
+ --field 'comments[][path]=src/foo.ts' \
96
+ --field 'comments[][line]=42' \
97
+ --field 'comments[][side]=RIGHT' \
98
+ --field 'comments[][body]=**Title** _(silent-failure)_ …' \
99
+ --field 'comments[][path]=src/bar.ts' \
100
+ --field 'comments[][line]=23' \
101
+ --field 'comments[][side]=RIGHT' \
102
+ --field 'comments[][body]=**Title** _(security)_ …'
103
+ ```
104
+
105
+ Repeat the four `comments[]...` fields per selected finding.
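One way to assemble that call programmatically is to build the argument list as an array, which sidesteps shell quoting of comment bodies. This is a sketch under assumptions: `SelectedComment` and `buildReviewArgs` are hypothetical names, and only the flags shown in the command above are used.

```typescript
// Hypothetical shape for a finding the human selected for posting.
interface SelectedComment {
  path: string;
  line: number;
  side: "LEFT" | "RIGHT";
  body: string;
}

// Builds the argv for a single batched `gh api` review call.
function buildReviewArgs(
  repo: string, // "<org>/<repo>"
  prNumber: number,
  headRefOid: string,
  comments: SelectedComment[],
): string[] {
  const args = [
    "api", `repos/${repo}/pulls/${prNumber}/reviews`,
    "--method", "POST",
    "--field", `commit_id=${headRefOid}`,
    "--field", "body=",
    "--field", "event=COMMENT",
  ];
  // Four fields per selected finding, in the same order as the example above.
  for (const c of comments) {
    args.push(
      "--field", `comments[][path]=${c.path}`,
      "--field", `comments[][line]=${c.line}`,
      "--field", `comments[][side]=${c.side}`,
      "--field", `comments[][body]=${c.body}`,
    );
  }
  return args;
}
```

The resulting array could be handed to something like `execFile("gh", args)` from `node:child_process`. Note that `gh api --field` interprets some values specially (for example a value starting with `@` is read as a file name), so bodies that do not start with the `**<title>**` prefix may need extra care.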
@@ -0,0 +1,90 @@
1
+ ---
2
+ name: security-review
3
+ description: Security review of a change — OWASP-style vectors plus AI/LLM-specific risks. Checks input validation, authorization/IDOR, injection, secrets & logging, transport/headers, dependencies, and prompt-injection where an LLM is in the loop. Use after `senior-review` on any change touching user input, auth, data access, or an LLM.
4
+ origin: vendor
5
+ vendor_version: '{{vendor_version}}'
6
+ phase: post-implementation
7
+ invoked-by: [reviewer]
8
+ ---
9
+
10
+ # Security Review
11
+
12
+ Review the change for security risk with **fresh context**. Scope the depth to what
13
+ the change touches — a pure refactor of internal logic needs a lighter pass than a new
14
+ endpoint or an LLM call. Consult the project's **security playbook** for project-
15
+ specific rules (deployment model, auth scheme, secret tooling); the checklist below is
16
+ the generic floor. Flag only findings you're confident are real, ranked by severity.
17
+
18
+ ## 1. Input validation at boundaries
19
+
20
+ - Every entry point (request body, query/path params, headers, file, env, third-party
21
+ response) validates **shape, type, required-ness, and max length** before use.
22
+ - External strings are validated/converted at the boundary, not asserted into the
23
+ internal type.
24
+ - No mass-assignment: unexpected fields are rejected, not spread into a model.
25
+
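A hand-rolled sketch of what these three checks might look like for one request body. `CreateUserInput`, the field names, and the 254-character cap are all hypothetical; a project would more likely use its own schema library, which the checklist does not name.

```typescript
// Hypothetical internal type the rest of the code works with.
interface CreateUserInput {
  email: string;
  displayName: string;
}

const ALLOWED_KEYS = ["email", "displayName"];
const MAX_LENGTH = 254; // illustrative cap, not a value from the checklist

function parseCreateUser(raw: unknown): CreateUserInput {
  if (typeof raw !== "object" || raw === null || Array.isArray(raw)) {
    throw new Error("Body must be a JSON object");
  }
  // Widened only for inspection; every field is still checked below.
  const record = raw as Record<string, unknown>;

  // No mass-assignment: unexpected fields are rejected, not spread into a model.
  const extra = Object.keys(record).filter((k) => ALLOWED_KEYS.indexOf(k) === -1);
  if (extra.length > 0) throw new Error(`Unexpected field(s): ${extra.join(", ")}`);

  // Shape, type, required-ness, and max length in one place.
  const readString = (key: string): string => {
    const value = record[key];
    if (typeof value !== "string" || value.length === 0) {
      throw new Error(`${key} is required and must be a non-empty string`);
    }
    if (value.length > MAX_LENGTH) {
      throw new Error(`${key} exceeds ${MAX_LENGTH} characters`);
    }
    return value;
  };

  return { email: readString("email"), displayName: readString("displayName") };
}
```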
26
+ ## 2. Authorization (not just authentication)
27
+
28
+ - **Authentication** (who are you?) and **authorization** (are you allowed?) are
29
+ distinct checks — a valid session is not permission.
30
+ - **Ownership / IDOR**: a resource fetched by id verifies it belongs to the caller. An
31
+ auth guard never substitutes for the ownership check.
32
+ - Privileged actions check role/scope server-side, never trusting a client claim.
33
+
34
+ ## 3. Injection
35
+
36
+ - **SQL/NoSQL**: parameterized queries / validated ids; never interpolate user input
37
+ into a query, never pass a raw request object to a query (object-injection).
38
+ - **Command/path**: no user input concatenated into a shell command or file path
39
+ (path traversal).
40
+ - **Output/XSS**: rendered user content is escaped by default; raw-HTML sinks
41
+ (`dangerouslySetInnerHTML`, server-generated HTML/PDF) sanitize at render time.
42
+
43
+ ## 4. Secrets & logging
44
+
45
+ - No secret, token, password, key, or PII in code, logs, or error messages (a
46
+ never-log redaction list is configured; emails in failure logs are hashed).
47
+ - No secret committed (`.env` gitignored; `.env.example` carries placeholders only).
48
+ - Constant-time comparison for secrets/tokens (no `===` on a hash); strong randomness
49
+ for generated keys.
50
+
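For the constant-time comparison point, a minimal Node.js sketch is shown here. `safeEqual` is a hypothetical helper. Hashing both inputs first is an assumption made so that the two buffers always have equal length, since `crypto.timingSafeEqual` throws when the lengths differ.

```typescript
import { createHash, timingSafeEqual } from "node:crypto";

// Compares two secrets without an early exit on the first differing byte.
function safeEqual(provided: string, expected: string): boolean {
  // SHA-256 digests are always 32 bytes, so the length check cannot fail
  // and the input lengths are not revealed by a thrown error.
  const a = createHash("sha256").update(provided, "utf8").digest();
  const b = createHash("sha256").update(expected, "utf8").digest();
  return timingSafeEqual(a, b);
}
```

A reviewer would expect to see something of this kind wherever a token or hash is currently compared with `===`.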
51
+ ## 5. Transport, headers & CSRF
52
+
53
+ - Security headers present (nosniff, frame-options/deny, HSTS in prod).
54
+ - CORS is closed by default; opened only deliberately for known origins.
55
+ - Cookie-based sessions set `httpOnly`, `secure` (prod), and `SameSite` (CSRF
56
+ defense); token/header auth flows are not CSRF-susceptible.
57
+
58
+ ## 6. Dependencies & exposure
59
+
60
+ - `npm audit` (or the ecosystem equivalent) clean at moderate+; the change adds no
61
+ vulnerable dependency.
62
+ - No source maps served publicly in production; no debug/verbose error bodies leaking
63
+ internals to clients.
64
+
65
+ ## 7. AI / LLM-specific (when an LLM is in the loop)
66
+
67
+ - **Prompt injection**: untrusted content (user text, fetched pages, tool output) is
68
+ treated as data, not instructions; system prompts are not overridable by user input.
69
+ - **Output handling**: LLM output is validated/escaped before it drives an action
70
+ (code execution, a query, a shell command, a network call) — never `eval`'d or run
71
+ unchecked.
72
+ - **Secret/PII exposure**: no secret or excessive PII placed in a prompt sent to a
73
+ third-party model; tool/function access from the model is least-privilege.
74
+ - **Resource limits**: token/cost/loop bounds on agentic calls to prevent runaway use.
75
+
76
+ ## Report
77
+
78
+ ```
79
+ ## Security Review — <task name>
80
+
81
+ **Critical**: <vector — file:line — fix> | none
82
+ **High**: <…> | none
83
+ **Medium**: <…> | none
84
+ **Low / hardening**: <…> | none
85
+
86
+ **Verdict**: no blocking issues / <N> issues to fix before merge
87
+ ```
88
+
89
+ A finding that the **spec** mandated an insecure approach is a **discovery** (the spec
90
+ needs a human decision) — run `raise-discovery` rather than silently complying.
@@ -0,0 +1,99 @@
1
+ ---
2
+ name: senior-review
3
+ description: Post-implementation code & test quality review — the "is it good?" half (mechanical gates + execution belong to `verify`). Checks test meaningfulness, integration seams, TypeScript async/type pitfalls, and a self-review checklist, gated by confidence. Use after `verify`, with fresh context, to judge the quality of just-implemented code.
4
+ origin: vendor
5
+ vendor_version: '{{vendor_version}}'
6
+ phase: post-implementation
7
+ invoked-by: [reviewer]
8
+ ---
9
+
10
+ # Senior Review
11
+
12
+ Judge the **quality** of the code just implemented, with **fresh context** — review
13
+ it as if it were someone else's work. This is the "**is it good?**" half of review;
14
+ the "**does it work?**" half (build, type-check, lint, tests, coverage, audit, and a
15
+ real run) is the `verify` skill, which the Reviewer runs **first**. Do not re-run the
16
+ gates here — assume `verify` passed and focus on judgment. Work each phase in order.
17
+
18
+ ## Confidence gating
19
+
20
+ Only raise a finding you are **>80% confident** is a real problem. State your
21
+ confidence when it's borderline. A review that flags everything trains the reader to
22
+ ignore it; a review that flags only what matters gets acted on. Style nitpicks the
23
+ formatter/linter already owns are not findings.
24
+
25
+ ## Phase 1 — Test quality
26
+
27
+ `verify` confirmed the suite passes; here you judge whether the tests are _meaningful_.
28
+
29
+ 1. Identify the logic-bearing files in the change (services, mappers, hooks,
30
+ processors, helpers, components).
31
+ 2. For each, open its `*.spec.ts(x)` and verify:
32
+ - Happy path covered.
33
+ - At least one **error** case covered.
34
+ - At least one **edge** case (empty input, null, boundary value).
35
+ - **Arrange / Act / Assert** structure present.
36
+ - `toStrictEqual` used (not `toEqual` / `toBe`) — `toStrictEqual` catches
37
+ `undefined` properties the others miss.
38
+ - Assertions test **behavior**, not the mock (no mock-and-assert, no over-mocking
39
+ of internal modules).
40
+ 3. **User-visible flow**: if the change adds/modifies a navigation / form / list /
41
+ auth flow, is there an end-to-end test (or was one updated)? See the project's
42
+ testing playbook (`docs/playbooks/testing.md`, then the global layer) for when E2E
43
+ is warranted.
44
+ 4. If a spec is missing or shallow, that is a finding — name the file and the gap.
45
+
46
+ ## Phase 2 — Integration seams
47
+
48
+ Most bugs live at module boundaries, not inside modules. Identify the seams this
49
+ change touches — event pipelines, cross-module boundaries, API↔UI contracts,
50
+ producer→consumer handoffs. For each seam touched, confirm there is at least one test
51
+ that **crosses the boundary**, not just a unit test inside one module.
52
+
53
+ ## Phase 3 — TypeScript async & type pitfalls
54
+
55
+ Read the diff for the failure modes the compiler does not catch on its own:
56
+
57
+ - [ ] **Floating promises** — every async call is `await`ed or deliberately handled;
58
+ no fire-and-forget that should have been awaited.
59
+ - [ ] **Missing `await` in `try`** — an un-awaited promise inside `try/catch` escapes
60
+ the catch.
61
+ - [ ] **`any` / unsafe casts** — no `as any`, no `as unknown as T` smuggling past the
62
+ type system; assertions are justified.
63
+ - [ ] **Non-null assertions (`!`)** that hide a genuinely possible null.
64
+ - [ ] **Unhandled rejection paths** — `Promise.all` vs `allSettled` chosen
65
+ deliberately; one rejection doesn't silently drop the rest.
66
+ - [ ] **Boundary types** — external/untrusted input is validated and narrowed at the
67
+ boundary, not asserted into the internal type.
68
+
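The "missing `await` in `try`" item is easy to misread, so here is a small illustration with hypothetical function names. Both functions compile cleanly, but only the awaited one ever reaches its `catch`.

```typescript
// Stand-in for any async call that can reject.
async function failing(): Promise<string> {
  throw new Error("boom");
}

// Pitfall: the promise is returned, not awaited, so the rejection happens
// after the try block has already been left. The catch never runs.
async function withoutAwait(): Promise<string> {
  try {
    return failing();
  } catch {
    return "recovered";
  }
}

// Fix: awaiting inside the try makes the rejection throw here, where the
// catch can see it.
async function withAwait(): Promise<string> {
  try {
    return await failing();
  } catch {
    return "recovered";
  }
}
```

Calling `withoutAwait()` yields a promise that rejects with `boom`, while `withAwait()` resolves to `"recovered"`. Whether recovering is the right behavior is a separate question; the point is only which code path gets a chance to decide.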
69
+ ## Phase 4 — Self-review checklist
70
+
71
+ Read the diff fresh, as if it were someone else's:
72
+
73
+ - [ ] No N+1 calls or unnecessary work in loops.
74
+ - [ ] Subscriptions, intervals, and event listeners cleaned up on teardown.
75
+ - [ ] No silent error swallowing (a `catch` that does nothing, optional chaining that
76
+ hides a bug) — for a deeper pass, the `silent-failure-hunter` skill.
77
+ - [ ] External/user input validated at system boundaries.
78
+ - [ ] Errors fail **explicitly** with a message that identifies the problem.
79
+ - [ ] No dead code or unused variables.
80
+ - [ ] Comments only where logic is non-obvious (not narrating every line).
81
+
82
+ ## Phase 5 — Report
83
+
84
+ Output a concise verdict the Orchestrator can act on:
85
+
86
+ ```
87
+ ## Senior Review — <task name>
88
+
89
+ **Test quality**: ✅ / ⚠️ <missing/shallow specs>
90
+ **User-visible flow**: ✅ tested / N/A / ⚠️ <gap>
91
+ **Integration seams**: ✅ covered / ⚠️ <uncrossed boundary>
92
+ **TS async/types**: ✅ / ⚠️ <finding>
93
+ **Self-review**: <findings, or "none">
94
+
95
+ **Verdict**: approve / changes requested — <one-line reason>
96
+ ```
97
+
98
+ If a finding is about the spec itself (the spec is wrong or silent on a case), that is
99
+ a **discovery**, not a rejection: run `raise-discovery` instead of failing the change.
@@ -0,0 +1,76 @@
1
+ ---
2
+ name: silent-failure-hunter
3
+ description: Hunt for silently swallowed errors and dangerous fallbacks in a change — the failures that hide bugs instead of surfacing them. Five focused passes over the diff. Use during review when a change touches error handling, async flows, or boundary parsing, before `security-review`.
4
+ origin: vendor
5
+ vendor_version: '{{vendor_version}}'
6
+ phase: post-implementation
7
+ invoked-by: [reviewer]
8
+ ---
9
+
10
+ # Silent Failure Hunter
11
+
12
+ The most expensive bugs are the ones that never raise their hand. This skill is a
13
+ focused pass for **silent failures** — code that hides a problem instead of surfacing
14
+ it. The rule it enforces (from the engineering playbook): **errors should fail loudly,
15
+ at the point of the problem, with a message that names it.** Make five passes over the
16
+ diff; for each finding, state the failure mode and the loud alternative.
17
+
18
+ ## Pass 1 — Empty / swallowing catches
19
+
20
+ - `catch` blocks that do nothing, only `console.log`, or `return null/undefined/[]`
21
+ without rethrowing or handling.
22
+ - `.catch(() => {})` on a promise.
23
+ - `try` wrapping logic where the `catch` masks the real failure.
24
+
25
+ > A `catch` is for _handling_ an error (recover, translate, add context, rethrow) —
26
+ > not for making it disappear. An expected no-op (idempotent retry) is fine **only**
27
+ > with a comment stating the intent.
28
+
29
+ ## Pass 2 — Dangerous fallbacks
30
+
31
+ - `value ?? defaultThatHidesAbsence` / `value || fallback` where the fallback masks a
32
+ missing-data bug instead of failing.
33
+ - Optional chaining (`a?.b?.c`) that silently yields `undefined` where the value was
34
+ required — the bug surfaces three layers later, detached from its cause.
35
+ - `try { JSON.parse(x) } catch { return {} }` — a parse failure becomes empty data.
36
+
37
+ ## Pass 3 — Inadequate logging / observability
38
+
39
+ - An error is caught and handled but **never logged**, so it's invisible in
40
+ production.
41
+ - Logged at the wrong level (`debug`/`info` for a real error) so it's filtered out.
42
+ - The log message lacks the context to act on it (no id, no operation, no cause).
43
+
44
+ ## Pass 4 — Propagation issues
45
+
46
+ - A floating promise (no `await`, no `.catch`) whose rejection is lost.
47
+ - `Promise.all` where one rejection discards the siblings' results while they keep running (vs `allSettled` when
47
+ partial success matters), or the reverse: swallowing a failure that should abort.
49
+ - An async function whose error path returns a _success-shaped_ value.
50
+ - An error translated to a generic message that erases the original cause (no `cause`
51
+ chain).
52
+
53
+ ## Pass 5 — Missing handling at boundaries
54
+
55
+ - External/untrusted input (API response, file, env var, user input) used without
56
+ validation, so a malformed value flows in as a silent bad state rather than a loud
57
+ rejection.
58
+ - A `switch`/branch with no `default` for an unexpected value.
59
+ - A required env var read with a silent fallback instead of failing fast at startup.
60
+
61
+ ## Report
62
+
63
+ ```
64
+ ## Silent Failure Hunt — <task name>
65
+
66
+ **Empty/swallowing catches**: <file:line — failure → loud fix> | none
67
+ **Dangerous fallbacks**: <…> | none
68
+ **Inadequate logging**: <…> | none
69
+ **Propagation**: <…> | none
70
+ **Missing boundary handling**: <…> | none
71
+
72
+ **Verdict**: clean / <N> silent failures to fix
73
+ ```
74
+
75
+ For each finding, give the loud alternative concretely (`` throw new Error(`… ${id}`) `` instead of `return null`). If the "silent" path is intentional, it must carry a comment
76
+ saying so — absent that comment, treat it as a finding.
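To make "the loud alternative" concrete, here is a sketch contrasting the Pass 2 JSON example with a version that fails at the point of the problem. `ConfigParseError`, `parseConfig`, and the `source` parameter are hypothetical; the original error is kept on a field so the cause is not erased.

```typescript
// Hypothetical error type that names the problem and keeps the original cause.
class ConfigParseError extends Error {
  constructor(message: string, readonly originalError: unknown) {
    super(message);
    this.name = "ConfigParseError";
  }
}

// Silent version (the finding): a parse failure becomes empty data.
function parseConfigSilently(text: string): unknown {
  try {
    return JSON.parse(text);
  } catch {
    return {};
  }
}

// Loud version: same parse, but a failure is translated and rethrown with
// enough context (which source was being read) to act on it.
function parseConfig(text: string, source: string): unknown {
  try {
    return JSON.parse(text);
  } catch (error) {
    throw new ConfigParseError(`Invalid JSON in ${source}`, error);
  }
}
```

With malformed input, `parseConfigSilently` hands back `{}` and the bug surfaces later, far from its cause, whereas `parseConfig` stops immediately with a message naming the source.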
@@ -0,0 +1,74 @@
1
+ ---
2
+ name: spec-compliance-check
3
+ description: Walk an L1 spec point by point against the implementation and emit a per-requirement pass/fail verdict. Use during review of an SDD task to prove every EARS requirement (including unwanted-behavior paths) is satisfied and tested — a committable audit trail. Complements `senior-review` (quality) with traceability (coverage of intent).
4
+ origin: vendor
5
+ vendor_version: '{{vendor_version}}'
6
+ phase: post-implementation
7
+ invoked-by: [reviewer]
8
+ ---
9
+
10
+ # Spec Compliance Check
11
+
12
+ `senior-review` judges whether the code is _good_; this skill proves it does what the
13
+ **spec said** — every requirement, point by point. It produces a traceability matrix:
14
+ a committable audit trail mapping each requirement to its implementation and its test.
15
+ Only meaningful for **L1 (SDD) tasks** that have a spec under
16
+ `.claude/state/tasks/<id>/spec/`; skip it for L2.
17
+
18
+ ## Process
19
+
20
+ ### 1. Load the spec
21
+
22
+ Read all three spec artifacts for the task:
23
+
24
+ - `requirements.md` — the **EARS** requirements (ubiquitous / event-driven /
25
+ state-driven / optional-feature / **unwanted-behavior**), each numbered with
26
+ acceptance criteria.
27
+ - `design.md` — the intended files, interfaces, approach, edge cases.
28
+ - `tasks.md` — the atomic checklist, each item referencing the requirements it
29
+ satisfies.
30
+
31
+ ### 2. Trace each requirement to the implementation
32
+
33
+ For **every** numbered requirement (do not sample — enterprise audit trails are
34
+ exhaustive):
35
+
36
+ 1. Locate where the change implements it (file + symbol).
37
+ 2. Locate the test that proves it (the test whose assertion maps to the acceptance
38
+ criteria).
39
+ 3. Decide the verdict:
40
+ - **PASS** — implemented **and** covered by a test that asserts the acceptance
41
+ criteria.
42
+ - **PARTIAL** — implemented but the test is missing or doesn't assert the criteria.
43
+ - **FAIL** — not implemented, or the behavior contradicts the requirement.
44
+
45
+ Pay special attention to **unwanted-behavior** requirements (`If … then …`) — these
46
+ are the ones implementations quietly skip. Each needs a test that drives the bad input
47
+ and asserts the rejection/guard.
48
+
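The verdict rules above can be expressed as a small decision function. This is only a sketch: the `Trace` record and its field names are hypothetical, and treating any `PARTIAL` as non-compliant in the summary is an assumption based on the skill routing both `FAIL` and `PARTIAL` back as findings.

```typescript
type Verdict = "PASS" | "PARTIAL" | "FAIL";

// Hypothetical record of what the reviewer found for one requirement.
interface Trace {
  req: string; // e.g. "R7"
  implementation?: string; // file + symbol, if located
  test?: string; // test file, if located
  testAssertsCriteria: boolean; // does the test pin the acceptance criteria?
  contradictsRequirement: boolean; // does the behavior contradict the requirement?
}

function decideVerdict(t: Trace): Verdict {
  // FAIL: not implemented, or the behavior contradicts the requirement.
  if (!t.implementation || t.contradictsRequirement) return "FAIL";
  // PARTIAL: implemented, but the test is missing or does not assert the criteria.
  if (!t.test || !t.testAssertsCriteria) return "PARTIAL";
  return "PASS";
}

function summarize(traces: Trace[]): string {
  const count = (v: Verdict): number =>
    traces.filter((t) => decideVerdict(t) === v).length;
  const partial = count("PARTIAL");
  const failing = count("FAIL");
  const verdict = partial === 0 && failing === 0 ? "compliant" : "non-compliant";
  return (
    `**Covered**: ${count("PASS")}/${traces.length} · **Partial**: ${partial} · **Failing**: ${failing}\n` +
    `**Verdict**: ${verdict}`
  );
}
```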
49
+ ### 3. Check for drift
50
+
51
+ - **Unspecified additions**: implementation behavior with no backing requirement →
52
+ either scope drift (a `T3` discovery) or a missing requirement.
53
+ - **Design divergence**: the change took an approach `design.md` didn't describe —
54
+ note it; if it invalidates a design decision, that's a discovery.
55
+
56
+ ### 4. Emit the matrix
57
+
58
+ ```
59
+ ## Spec Compliance — task #<id>
60
+
61
+ | Req | Summary | Status | Implementation | Test |
62
+ | --- | --- | --- | --- | --- |
63
+ | R1 | <ubiquitous …> | ✅ PASS | `x.service.ts:42` | `x.service.spec.ts` |
64
+ | R7 | <If invalid → reject> | ⚠️ PARTIAL | `x.service.ts:80` | — (no test) |
65
+ | R9 | <state-driven …> | ❌ FAIL | — | — |
66
+
67
+ **Covered**: <n>/<total> · **Partial**: <n> · **Failing**: <n>
68
+ **Verdict**: compliant / non-compliant — <one-line reason>
69
+ ```
70
+
71
+ A `FAIL` or `PARTIAL` is a review finding routed back to the Implementer. If a
72
+ requirement is itself wrong, ambiguous, or contradicted by reality, that's a
73
+ **discovery** — run `raise-discovery`, don't quietly mark it PASS or rewrite the spec
74
+ yourself.
@@ -0,0 +1,88 @@
1
+ ---
2
+ name: spec-to-issue
3
+ description: Fill a managed issue's body with the externalized spec. Use when the Spec Author publishes the completed spec into the GitHub issue the Orchestrator already created, mentions "fill the issue", "spec-to-issue", or "externalize the spec into the issue".
4
+ origin: vendor
5
+ vendor_version: '{{vendor_version}}'
6
+ phase: pre-implementation
7
+ invoked-by: [spec-author]
8
+ ---
9
+
10
+ # Spec to Issue
11
+
12
+ ## Core Principle
13
+
14
+ The GitHub issue **is** the spec, externalized. The Orchestrator already created it (a
15
+ skeleton with the harness labels) at the start of `spec-in-progress` and gave the task
16
+ its permanent `<id>`. This skill does one thing: replace that skeleton body with the
17
+ real spec, so the issue reads as a self-contained spec on GitHub.
18
+
19
+ Run this **after** `prd-to-spec` has produced `requirements.md` + `design.md` +
20
+ `tasks.md` under `.claude/state/tasks/<id>/spec/`.
21
+
22
+ ## Preconditions
23
+
24
+ - The issue already exists (the Orchestrator created it) and `<id>` is known.
25
+ - A spec exists at `.claude/state/tasks/<id>/spec/{requirements,design,tasks}.md`.
26
+ - `gh` is authenticated and `task_storage.type` is `github` (see `harness.config.yml`).
27
+
28
+ You do **not** create the issue, rename any folder (the id is real from the start —
29
+ there is no `draft-<slug>`), or touch labels. Issue creation and the entire label
30
+ lifecycle belong to the Orchestrator.
31
+
32
+ ## Process
33
+
34
+ ### 1. Build the issue body from the spec
35
+
36
+ The body is the externalized spec — self-contained enough to read on GitHub, with
37
+ links back to the committed files for the full detail:
38
+
39
+ ```markdown
40
+ ## Summary
41
+
42
+ <one paragraph: the capability and why, from the PRD>
43
+
44
+ ## Requirements (EARS)
45
+
46
+ <paste requirements.md, or its numbered list>
47
+
48
+ ## Design
49
+
50
+ See `.claude/state/tasks/<id>/spec/design.md`.
51
+
52
+ ## Tasks
53
+
54
+ See `.claude/state/tasks/<id>/spec/tasks.md`.
55
+ ```
56
+
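Assembling that body could look roughly like the following. `buildIssueBody` and its parameters are hypothetical, and the sketch assumes the summary paragraph and the requirements text have already been read from the PRD and `requirements.md`.

```typescript
// Hypothetical helper: builds the issue body in the section order shown above.
function buildIssueBody(id: string, summary: string, requirements: string): string {
  const specDir = `.claude/state/tasks/${id}/spec`;
  return [
    "## Summary",
    "",
    summary.trim(),
    "",
    "## Requirements (EARS)",
    "",
    requirements.trim(),
    "",
    "## Design",
    "",
    `See \`${specDir}/design.md\`.`,
    "",
    "## Tasks",
    "",
    `See \`${specDir}/tasks.md\`.`,
    "",
  ].join("\n");
}
```

The returned string would be written to a file and passed to `gh issue edit <id> --body-file <body>`, so the body never has to survive shell quoting.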
57
+ ### 2. Replace the skeleton body
58
+
59
+ ```bash
60
+ gh issue edit <id> --body-file <body>
61
+ ```
62
+
63
+ This overwrites the `🚧 Spec in progress …` skeleton the Orchestrator wrote at issue
64
+ creation. Labels are untouched — the issue still carries `harness:managed` +
65
+ `harness:sdd` + `harness:status:spec-in-progress`; the Orchestrator flips it to
66
+ `spec-ready` after you hand back.
67
+
68
+ ### 3. Hand back to the Orchestrator
69
+
70
+ Return a short confirmation. From here the **Orchestrator** flips the status label to
71
+ `harness:status:spec-ready`, commits and pushes the task state to the branch, and runs
72
+ the human approval gate. This skill transitions no labels.
73
+
74
+ ## Ownership boundary
75
+
76
+ - **Spec Author** (this skill): fills the issue **body** from the committed spec.
77
+ - **Orchestrator**: created the issue, owns every label transition
78
+ (`spec-in-progress → spec-ready → in-progress → in-review → done`), and closes it at
79
+ closeout.
80
+
81
+ ## Uncontemplated Scenarios
82
+
83
+ When a scenario doesn't clearly fit these rules:
84
+
85
+ 1. Apply the closest matching approach with reasoning.
86
+ 2. **Flag it**: "This scenario isn't covered by the spec-to-issue skill. I applied
87
+ [approach] because [reason]. Want to update the skill?"
88
+ 3. Offer to add a new rule for the case.