@lemoncode/lemony 0.1.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/LICENSE +21 -0
- package/PRIVACY.md +147 -0
- package/README.md +189 -0
- package/catalog/VERSION +1 -0
- package/catalog/agents/README.md +29 -0
- package/catalog/agents/architect.md +81 -0
- package/catalog/agents/fit-assessment.md +94 -0
- package/catalog/agents/implementer.md +67 -0
- package/catalog/agents/orchestrator.md +627 -0
- package/catalog/agents/reviewer.md +124 -0
- package/catalog/agents/spec-author.md +69 -0
- package/catalog/agents/ui-designer.md +25 -0
- package/catalog/commands/add-capability.md +69 -0
- package/catalog/commands/bypass.md +40 -0
- package/catalog/commands/define.md +24 -0
- package/catalog/commands/hotfix.md +47 -0
- package/catalog/commands/pause.md +52 -0
- package/catalog/commands/resume.md +56 -0
- package/catalog/commands/spinoff.md +59 -0
- package/catalog/commands/triage.md +24 -0
- package/catalog/harness.config.schema.json +116 -0
- package/catalog/hooks/README.md +56 -0
- package/catalog/hooks/init.sh +281 -0
- package/catalog/hooks/lib/lemony.sh +41 -0
- package/catalog/hooks/lib/playbook-scan.sh +394 -0
- package/catalog/hooks/lib/transcript-grep.sh +56 -0
- package/catalog/hooks/require-playbook.sh +97 -0
- package/catalog/hooks/session-close.sh +232 -0
- package/catalog/hooks/suggest-playbook.sh +72 -0
- package/catalog/playbook-format.md +198 -0
- package/catalog/schemas/README.md +13 -0
- package/catalog/schemas/tier2-events-history.md +104 -0
- package/catalog/schemas/tier2-events.md +286 -0
- package/catalog/skills/README.md +62 -0
- package/catalog/skills/bootstrap-architecture/SKILL.md +78 -0
- package/catalog/skills/code-explorer/SKILL.md +76 -0
- package/catalog/skills/grill-with-docs/ADR-FORMAT.md +49 -0
- package/catalog/skills/grill-with-docs/CONTEXT-FORMAT.md +77 -0
- package/catalog/skills/grill-with-docs/SKILL.md +270 -0
- package/catalog/skills/grill-with-docs/reference.md +236 -0
- package/catalog/skills/mutation-testing/SKILL.md +84 -0
- package/catalog/skills/note-side-finding/SKILL.md +89 -0
- package/catalog/skills/playbook-iterate/SKILL.md +78 -0
- package/catalog/skills/prd-to-spec/SKILL.md +181 -0
- package/catalog/skills/raise-discovery/SKILL.md +112 -0
- package/catalog/skills/resolve-discovery/SKILL.md +123 -0
- package/catalog/skills/review-pr/SKILL.md +106 -0
- package/catalog/skills/review-pr/reference.md +105 -0
- package/catalog/skills/security-review/SKILL.md +90 -0
- package/catalog/skills/senior-review/SKILL.md +99 -0
- package/catalog/skills/silent-failure-hunter/SKILL.md +76 -0
- package/catalog/skills/spec-compliance-check/SKILL.md +74 -0
- package/catalog/skills/spec-to-issue/SKILL.md +88 -0
- package/catalog/skills/task-closeout/SKILL.md +229 -0
- package/catalog/skills/tdd/SKILL.md +171 -0
- package/catalog/skills/test-gap-report/SKILL.md +71 -0
- package/catalog/skills/triage-issue/SKILL.md +102 -0
- package/catalog/skills/update-architecture/SKILL.md +69 -0
- package/catalog/skills/verify/SKILL.md +90 -0
- package/catalog/skills/write-adr/SKILL.md +77 -0
- package/catalog/templates/README.md +32 -0
- package/catalog/templates/claude-code/.claude/settings.json.tpl +34 -0
- package/catalog/templates/claude-code/agents.md.tpl +109 -0
- package/catalog/templates/claude-code/docs/playbooks/README.md.tpl +96 -0
- package/catalog/templates/claude-code/harness.config.yml.tpl +59 -0
- package/catalog/templates/claude-code/state/history.md.tpl +6 -0
- package/dist/cli.mjs +5691 -0
- package/package.json +80 -0
|
@@ -0,0 +1,106 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: review-pr
|
|
3
|
+
description: Interactive PR review with a dry-run approval flow — fetch the diff, analyze it through the review lenses, present a numbered findings table, and post only the human-selected ones as GitHub inline review comments. Use at the merge gate (or whenever a curated inline pass is wanted) — a lightweight assist, not the heavy multi-agent fan-out.
|
|
4
|
+
origin: vendor
|
|
5
|
+
vendor_version: '{{vendor_version}}'
|
|
6
|
+
phase: post-implementation
|
|
7
|
+
invoked-by: [orchestrator]
|
|
8
|
+
---
|
|
9
|
+
|
|
10
|
+
# Review PR
|
|
11
|
+
|
|
12
|
+
## Core principle
|
|
13
|
+
|
|
14
|
+
**Show before you post.** Analyze → dry-run → human picks → post. Never submit GitHub
|
|
15
|
+
review comments without explicit human approval. This is a single-pass, curated assist
|
|
16
|
+
layered on the merge gate — for an exhaustive multi-agent audit, that's a different,
|
|
17
|
+
heavier tool, deliberately not this one.
|
|
18
|
+
|
|
19
|
+
## Workflow
|
|
20
|
+
|
|
21
|
+
### 0. Resolve the PR
|
|
22
|
+
|
|
23
|
+
- **URL** (`https://github.com/<org>/<repo>/pull/123`): extract `<org>/<repo>` + number.
|
|
24
|
+
- **Number** (`#834`, `834`): resolve the repo with `gh repo view --json nameWithOwner`.
|
|
25
|
+
- **No argument**: detect the open PR on the current branch with `gh pr view`.
|
|
26
|
+
|
|
27
|
+
Run in parallel:
|
|
28
|
+
|
|
29
|
+
- `gh pr diff <number> --repo <org>/<repo>`
|
|
30
|
+
- `gh pr view <number> --repo <org>/<repo> --json title,headRefName,baseRefName,files,headRefOid`
|
|
31
|
+
|
|
32
|
+
If the diff is empty, stop and tell the human.
|
|
33
|
+
|
|
34
|
+
### 1. Read project context
|
|
35
|
+
|
|
36
|
+
Read the repo's `CLAUDE.md` (and `CONTEXT.md` if present). Summarize in ≤ 50 lines:
|
|
37
|
+
code-style rules, comment conventions, naming, architecture constraints. This is what
|
|
38
|
+
the analysis judges "against project convention" by.
|
|
39
|
+
|
|
40
|
+
### 2. Analyze (single sub-agent, the #43 lenses as a checklist)
|
|
41
|
+
|
|
42
|
+
Spawn **one** `Explore` sub-agent with the full diff, the changed-files list, and the
|
|
43
|
+
project context. Its prompt walks the five review lenses as a checklist — **not** five
|
|
44
|
+
separate agents (that's the heavy fan-out this skill deliberately avoids):
|
|
45
|
+
|
|
46
|
+
1. **Security** — injection, auth/authz gaps, secret exposure, unsafe input reaching a
|
|
47
|
+
sink, OWASP-class issues.
|
|
48
|
+
2. **Silent failures** — swallowed errors, empty `catch`, a fallback that hides a bug,
|
|
49
|
+
a `return` where it should `throw`.
|
|
50
|
+
3. **Spec compliance** — does the diff do what the issue/spec/PR says, no more and no
|
|
51
|
+
less? Scope drift, missing acceptance criteria.
|
|
52
|
+
4. **Test gaps** — logic-bearing code with no test; a bug fix with no regression test;
|
|
53
|
+
assertions that don't actually pin the behavior.
|
|
54
|
+
5. **Senior quality** — correctness, edge cases, naming that encodes intent, N+1s,
|
|
55
|
+
convention violations from step 1's summary.
|
|
56
|
+
|
|
57
|
+
The agent returns a strict JSON array of findings (see [reference.md](reference.md) for
|
|
58
|
+
the exact prompt and schema). It returns `[]` when the diff is clean — it must **not**
|
|
59
|
+
manufacture findings to seem thorough.
|
|
60
|
+
|
|
61
|
+
### 3. Dry-run
|
|
62
|
+
|
|
63
|
+
Print the findings as a numbered table (don't post anything yet):
|
|
64
|
+
|
|
65
|
+
```
|
|
66
|
+
## Dry-run — <PR_TITLE> (#<NUMBER>)
|
|
67
|
+
|
|
68
|
+
| # | Sev | File | Line | Problem | Suggestion |
|
|
69
|
+
|---|-----|------|------|---------|------------|
|
|
70
|
+
| 1 | 🟠 | src/foo.ts | 42 | Swallowed error in the catch | Re-throw or log + surface |
|
|
71
|
+
...
|
|
72
|
+
|
|
73
|
+
Total: N findings. Which do I post? ("all", "1,3,5", or "none")
|
|
74
|
+
```
|
|
75
|
+
|
|
76
|
+
### 4. Human selection
|
|
77
|
+
|
|
78
|
+
Wait for the reply. Parse:
|
|
79
|
+
|
|
80
|
+
- `all` → every finding.
|
|
81
|
+
- `none` → stop; post nothing.
|
|
82
|
+
- `1,3,5` or `1 3 5` → the selected indices.
|
|
83
|
+
|
|
84
|
+
Confirm the list before posting.
|
|
85
|
+
|
|
86
|
+
### 5. Post the review comments
|
|
87
|
+
|
|
88
|
+
Use `headRefOid` from step 0 as `commit_id`. Post all selected comments in **one**
|
|
89
|
+
`gh api` call (one review thread, not N). See [reference.md](reference.md) for the
|
|
90
|
+
exact `gh api` invocation and the comment-body format.
|
|
91
|
+
|
|
92
|
+
### 6. Confirm
|
|
93
|
+
|
|
94
|
+
Report the count of posted comments and the PR URL.
|
|
95
|
+
|
|
96
|
+
## Error handling
|
|
97
|
+
|
|
98
|
+
- `gh` not available → stop, tell the human.
|
|
99
|
+
- A finding's line isn't in the diff (position mismatch) → skip that comment and warn.
|
|
100
|
+
- A post fails → show the error, offer a retry.
|
|
101
|
+
|
|
102
|
+
## Uncontemplated scenarios
|
|
103
|
+
|
|
104
|
+
1. Apply the closest matching rule, with reasoning.
|
|
105
|
+
2. Flag it: "This case isn't covered by `review-pr`; I applied [rule] because [reason].
|
|
106
|
+
Want to refine the skill?"
|
|
@@ -0,0 +1,105 @@
|
|
|
1
|
+
# Review PR — reference
|
|
2
|
+
|
|
3
|
+
## Analysis agent prompt
|
|
4
|
+
|
|
5
|
+
Pass this to the single `Explore` sub-agent. The five lenses are a **checklist one
|
|
6
|
+
agent walks**, not a fan-out.
|
|
7
|
+
|
|
8
|
+
```
|
|
9
|
+
You are reviewing a GitHub PR diff. Find real issues worth posting as inline review
|
|
10
|
+
comments. Walk these five lenses in order; a finding can come from any of them:
|
|
11
|
+
|
|
12
|
+
1. Security — injection, auth/authz gaps, secret exposure, unsafe input reaching a
|
|
13
|
+
sink, OWASP-class issues.
|
|
14
|
+
2. Silent failures — swallowed errors, empty catch, a fallback that hides a bug, a
|
|
15
|
+
return where it should throw.
|
|
16
|
+
3. Spec compliance — does the diff match the stated intent (issue/spec/PR)? Scope
|
|
17
|
+
drift, missing acceptance criteria.
|
|
18
|
+
4. Test gaps — logic-bearing code with no test; a bug fix with no regression test;
|
|
19
|
+
assertions that don't pin the behavior.
|
|
20
|
+
5. Senior quality — correctness, edge cases, naming, N+1s, convention violations.
|
|
21
|
+
|
|
22
|
+
Judge "against convention" using the project context below.
|
|
23
|
+
|
|
24
|
+
Do NOT flag:
|
|
25
|
+
- Style preferences not in the project context.
|
|
26
|
+
- TODOs explicitly marked as such.
|
|
27
|
+
- Changes that are clearly intentional from the diff context.
|
|
28
|
+
|
|
29
|
+
Return ONLY a JSON array (no markdown, no preamble). Return [] if the diff is clean —
|
|
30
|
+
do not manufacture findings.
|
|
31
|
+
|
|
32
|
+
## Project context
|
|
33
|
+
<PROJECT_CONTEXT>
|
|
34
|
+
|
|
35
|
+
## Changed files
|
|
36
|
+
<FILES>
|
|
37
|
+
|
|
38
|
+
## Diff
|
|
39
|
+
<DIFF>
|
|
40
|
+
```
|
|
41
|
+
|
|
42
|
+
## Output schema
|
|
43
|
+
|
|
44
|
+
```json
|
|
45
|
+
[
|
|
46
|
+
{
|
|
47
|
+
"file": "src/foo.ts",
|
|
48
|
+
"line": 42,
|
|
49
|
+
"side": "RIGHT",
|
|
50
|
+
"lens": "security|silent-failure|spec-compliance|test-gap|senior-quality",
|
|
51
|
+
"severity": "critical|important|suggestion",
|
|
52
|
+
"title": "Short title (max 60 chars)",
|
|
53
|
+
"issue": "What is wrong and why it matters",
|
|
54
|
+
"suggestion": "How to fix it (specific, not generic)"
|
|
55
|
+
}
|
|
56
|
+
]
|
|
57
|
+
```
|
|
58
|
+
|
|
59
|
+
Empty result: `[]`.
|
|
60
|
+
|
|
61
|
+
`line` must exist on the right side of the diff for that file. `side` is `"RIGHT"` for
|
|
62
|
+
added or modified lines.
|
|
63
|
+
|
|
64
|
+
## Comment body format
|
|
65
|
+
|
|
66
|
+
Each GitHub review comment body:
|
|
67
|
+
|
|
68
|
+
````markdown
|
|
69
|
+
**<title>** _(<lens>)_
|
|
70
|
+
|
|
71
|
+
<issue>
|
|
72
|
+
|
|
73
|
+
```suggestion
|
|
74
|
+
<new line content> ← optional, only for a single-line replacement
|
|
75
|
+
```
|
|
76
|
+
````
|
|
77
|
+
|
|
78
|
+
## Dry-run table
|
|
79
|
+
|
|
80
|
+
| # | Sev | File | Line | Problem | Suggestion |
|
|
81
|
+
| --- | --- | ---- | ---- | ------- | ---------- |
|
|
82
|
+
|
|
83
|
+
Severity emoji: 🔴 `critical` · 🟠 `important` · 🟡 `suggestion`.
|
|
84
|
+
|
|
85
|
+
## GitHub API — batch review
|
|
86
|
+
|
|
87
|
+
All selected comments go in a **single** API call (one review thread, not N):
|
|
88
|
+
|
|
89
|
+
```bash
|
|
90
|
+
gh api repos/<org>/<repo>/pulls/<number>/reviews \
|
|
91
|
+
--method POST \
|
|
92
|
+
--field commit_id=<headRefOid> \
|
|
93
|
+
--field body='' \
|
|
94
|
+
--field event=COMMENT \
|
|
95
|
+
--field 'comments[][path]=src/foo.ts' \
|
|
96
|
+
--field 'comments[][line]=42' \
|
|
97
|
+
--field 'comments[][side]=RIGHT' \
|
|
98
|
+
--field 'comments[][body]=**Title** _(silent-failure)_ …' \
|
|
99
|
+
--field 'comments[][path]=src/bar.ts' \
|
|
100
|
+
--field 'comments[][line]=23' \
|
|
101
|
+
--field 'comments[][side]=RIGHT' \
|
|
102
|
+
--field 'comments[][body]=**Title** _(security)_ …'
|
|
103
|
+
```
|
|
104
|
+
|
|
105
|
+
Repeat the four `comments[]...` fields per selected finding.
|
|
@@ -0,0 +1,90 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: security-review
|
|
3
|
+
description: Security review of a change — OWASP-style vectors plus AI/LLM-specific risks. Checks input validation, authorization/IDOR, injection, secrets & logging, transport/headers, dependencies, and prompt-injection where an LLM is in the loop. Use after `senior-review` on any change touching user input, auth, data access, or an LLM.
|
|
4
|
+
origin: vendor
|
|
5
|
+
vendor_version: '{{vendor_version}}'
|
|
6
|
+
phase: post-implementation
|
|
7
|
+
invoked-by: [reviewer]
|
|
8
|
+
---
|
|
9
|
+
|
|
10
|
+
# Security Review
|
|
11
|
+
|
|
12
|
+
Review the change for security risk with **fresh context**. Scope the depth to what
|
|
13
|
+
the change touches — a pure refactor of internal logic needs a lighter pass than a new
|
|
14
|
+
endpoint or an LLM call. Consult the project's **security playbook** for project-
|
|
15
|
+
specific rules (deployment model, auth scheme, secret tooling); the checklist below is
|
|
16
|
+
the generic floor. Flag only findings you're confident are real, ranked by severity.
|
|
17
|
+
|
|
18
|
+
## 1. Input validation at boundaries
|
|
19
|
+
|
|
20
|
+
- Every entry point (request body, query/path params, headers, file, env, third-party
|
|
21
|
+
response) validates **shape, type, required-ness, and max length** before use.
|
|
22
|
+
- External strings are validated/converted at the boundary, not asserted into the
|
|
23
|
+
internal type.
|
|
24
|
+
- No mass-assignment: unexpected fields are rejected, not spread into a model.
|
|
25
|
+
|
|
26
|
+
## 2. Authorization (not just authentication)
|
|
27
|
+
|
|
28
|
+
- **Authentication** (who are you?) and **authorization** (are you allowed?) are
|
|
29
|
+
distinct checks — a valid session is not permission.
|
|
30
|
+
- **Ownership / IDOR**: a resource fetched by id verifies it belongs to the caller. An
|
|
31
|
+
auth guard never substitutes for the ownership check.
|
|
32
|
+
- Privileged actions check role/scope server-side, never trusting a client claim.
|
|
33
|
+
|
|
34
|
+
## 3. Injection
|
|
35
|
+
|
|
36
|
+
- **SQL/NoSQL**: parameterized queries / validated ids; never interpolate user input
|
|
37
|
+
into a query, never pass a raw request object to a query (object-injection).
|
|
38
|
+
- **Command/path**: no user input concatenated into a shell command or file path
|
|
39
|
+
(path traversal).
|
|
40
|
+
- **Output/XSS**: rendered user content is escaped by default; raw-HTML sinks
|
|
41
|
+
(`dangerouslySetInnerHTML`, server-generated HTML/PDF) sanitize at render time.
|
|
42
|
+
|
|
43
|
+
## 4. Secrets & logging
|
|
44
|
+
|
|
45
|
+
- No secret, token, password, key, or PII in code, logs, or error messages (a
|
|
46
|
+
never-log redaction list is configured; emails in failure logs are hashed).
|
|
47
|
+
- No secret committed (`.env` gitignored; `.env.example` carries placeholders only).
|
|
48
|
+
- Constant-time comparison for secrets/tokens (no `===` on a hash); strong randomness
|
|
49
|
+
for generated keys.
|
|
50
|
+
|
|
51
|
+
## 5. Transport, headers & CSRF
|
|
52
|
+
|
|
53
|
+
- Security headers present (nosniff, frame-options/ deny, HSTS in prod).
|
|
54
|
+
- CORS is closed by default; opened only deliberately for known origins.
|
|
55
|
+
- Cookie-based sessions set `httpOnly`, `secure` (prod), and `SameSite` (CSRF
|
|
56
|
+
defense); token/header auth flows are not CSRF-susceptible.
|
|
57
|
+
|
|
58
|
+
## 6. Dependencies & exposure
|
|
59
|
+
|
|
60
|
+
- `npm audit` (or the ecosystem equivalent) clean at moderate+; the change adds no
|
|
61
|
+
vulnerable dependency.
|
|
62
|
+
- No source maps served publicly in production; no debug/verbose error bodies leaking
|
|
63
|
+
internals to clients.
|
|
64
|
+
|
|
65
|
+
## 7. AI / LLM-specific (when an LLM is in the loop)
|
|
66
|
+
|
|
67
|
+
- **Prompt injection**: untrusted content (user text, fetched pages, tool output) is
|
|
68
|
+
treated as data, not instructions; system prompts are not overridable by user input.
|
|
69
|
+
- **Output handling**: LLM output is validated/escaped before it drives an action
|
|
70
|
+
(code execution, a query, a shell command, a network call) — never `eval`'d or run
|
|
71
|
+
unchecked.
|
|
72
|
+
- **Secret/PII exposure**: no secret or excessive PII placed in a prompt sent to a
|
|
73
|
+
third-party model; tool/function access from the model is least-privilege.
|
|
74
|
+
- **Resource limits**: token/cost/loop bounds on agentic calls to prevent runaway use.
|
|
75
|
+
|
|
76
|
+
## Report
|
|
77
|
+
|
|
78
|
+
```
|
|
79
|
+
## Security Review — <task name>
|
|
80
|
+
|
|
81
|
+
**Critical**: <vector — file:line — fix> | none
|
|
82
|
+
**High**: <…> | none
|
|
83
|
+
**Medium**: <…> | none
|
|
84
|
+
**Low / hardening**: <…> | none
|
|
85
|
+
|
|
86
|
+
**Verdict**: no blocking issues / <N> issues to fix before merge
|
|
87
|
+
```
|
|
88
|
+
|
|
89
|
+
A finding that the **spec** mandated an insecure approach is a **discovery** (the spec
|
|
90
|
+
needs a human decision) — run `raise-discovery` rather than silently complying.
|
|
@@ -0,0 +1,99 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: senior-review
|
|
3
|
+
description: Post-implementation code & test quality review — the "is it good?" half (mechanical gates + execution belong to `verify`). Checks test meaningfulness, integration seams, TypeScript async/type pitfalls, and a self-review checklist, gated by confidence. Use after `verify`, with fresh context, to judge the quality of just-implemented code.
|
|
4
|
+
origin: vendor
|
|
5
|
+
vendor_version: '{{vendor_version}}'
|
|
6
|
+
phase: post-implementation
|
|
7
|
+
invoked-by: [reviewer]
|
|
8
|
+
---
|
|
9
|
+
|
|
10
|
+
# Senior Review
|
|
11
|
+
|
|
12
|
+
Judge the **quality** of the code just implemented, with **fresh context** — review
|
|
13
|
+
it as if it were someone else's work. This is the "**is it good?**" half of review;
|
|
14
|
+
the "**does it work?**" half (build, type-check, lint, tests, coverage, audit, and a
|
|
15
|
+
real run) is the `verify` skill, which the Reviewer runs **first**. Do not re-run the
|
|
16
|
+
gates here — assume `verify` passed and focus on judgment. Work each phase in order.
|
|
17
|
+
|
|
18
|
+
## Confidence gating
|
|
19
|
+
|
|
20
|
+
Only raise a finding you are **>80% confident** is a real problem. State your
|
|
21
|
+
confidence when it's borderline. A review that flags everything trains the reader to
|
|
22
|
+
ignore it; a review that flags only what matters gets acted on. Style nitpicks the
|
|
23
|
+
formatter/linter already owns are not findings.
|
|
24
|
+
|
|
25
|
+
## Phase 1 — Test quality
|
|
26
|
+
|
|
27
|
+
`verify` confirmed the suite passes; here you judge whether the tests are _meaningful_.
|
|
28
|
+
|
|
29
|
+
1. Identify the logic-bearing files in the change (services, mappers, hooks,
|
|
30
|
+
processors, helpers, components).
|
|
31
|
+
2. For each, open its `*.spec.ts(x)` and verify:
|
|
32
|
+
- Happy path covered.
|
|
33
|
+
- At least one **error** case covered.
|
|
34
|
+
- At least one **edge** case (empty input, null, boundary value).
|
|
35
|
+
- **Arrange / Act / Assert** structure present.
|
|
36
|
+
- `toStrictEqual` used (not `toEqual` / `toBe`) — `toStrictEqual` catches
|
|
37
|
+
`undefined` properties the others miss.
|
|
38
|
+
- Assertions test **behavior**, not the mock (no mock-and-assert, no over-mocking
|
|
39
|
+
of internal modules).
|
|
40
|
+
3. **User-visible flow**: if the change adds/modifies a navigation / form / list /
|
|
41
|
+
auth flow, is there an end-to-end test (or was one updated)? See the project's
|
|
42
|
+
testing playbook (`docs/playbooks/testing.md`, then the global layer) for when E2E
|
|
43
|
+
is warranted.
|
|
44
|
+
4. If a spec is missing or shallow, that is a finding — name the file and the gap.
|
|
45
|
+
|
|
46
|
+
## Phase 2 — Integration seams
|
|
47
|
+
|
|
48
|
+
Most bugs live at module boundaries, not inside modules. Identify the seams this
|
|
49
|
+
change touches — event pipelines, cross-module boundaries, API↔UI contracts,
|
|
50
|
+
producer→consumer handoffs. For each seam touched, confirm there is at least one test
|
|
51
|
+
that **crosses the boundary**, not just a unit test inside one module.
|
|
52
|
+
|
|
53
|
+
## Phase 3 — TypeScript async & type pitfalls
|
|
54
|
+
|
|
55
|
+
Read the diff for the failure modes the compiler does not catch on its own:
|
|
56
|
+
|
|
57
|
+
- [ ] **Floating promises** — every async call is `await`ed or deliberately handled;
|
|
58
|
+
no fire-and-forget that should have been awaited.
|
|
59
|
+
- [ ] **Missing `await` in `try`** — an un-awaited promise inside `try/catch` escapes
|
|
60
|
+
the catch.
|
|
61
|
+
- [ ] **`any` / unsafe casts** — no `as any`, no `as unknown as T` smuggling past the
|
|
62
|
+
type system; assertions are justified.
|
|
63
|
+
- [ ] **Non-null assertions (`!`)** that hide a genuinely possible null.
|
|
64
|
+
- [ ] **Unhandled rejection paths** — `Promise.all` vs `allSettled` chosen
|
|
65
|
+
deliberately; one rejection doesn't silently drop the rest.
|
|
66
|
+
- [ ] **Boundary types** — external/untrusted input is validated and narrowed at the
|
|
67
|
+
boundary, not asserted into the internal type.
|
|
68
|
+
|
|
69
|
+
## Phase 4 — Self-review checklist
|
|
70
|
+
|
|
71
|
+
Read the diff fresh, as if it were someone else's:
|
|
72
|
+
|
|
73
|
+
- [ ] No N+1 calls or unnecessary work in loops.
|
|
74
|
+
- [ ] Subscriptions, intervals, and event listeners cleaned up on teardown.
|
|
75
|
+
- [ ] No silent error swallowing (a `catch` that does nothing, optional chaining that
|
|
76
|
+
hides a bug) — for a deeper pass, the `silent-failure-hunter` skill.
|
|
77
|
+
- [ ] External/user input validated at system boundaries.
|
|
78
|
+
- [ ] Errors fail **explicitly** with a message that identifies the problem.
|
|
79
|
+
- [ ] No dead code or unused variables.
|
|
80
|
+
- [ ] Comments only where logic is non-obvious (not narrating every line).
|
|
81
|
+
|
|
82
|
+
## Phase 5 — Report
|
|
83
|
+
|
|
84
|
+
Output a concise verdict the Orchestrator can act on:
|
|
85
|
+
|
|
86
|
+
```
|
|
87
|
+
## Senior Review — <task name>
|
|
88
|
+
|
|
89
|
+
**Test quality**: ✅ / ⚠️ <missing/shallow specs>
|
|
90
|
+
**User-visible flow**: ✅ tested / N/A / ⚠️ <gap>
|
|
91
|
+
**Integration seams**: ✅ covered / ⚠️ <uncrossed boundary>
|
|
92
|
+
**TS async/types**: ✅ / ⚠️ <finding>
|
|
93
|
+
**Self-review**: <findings, or "none">
|
|
94
|
+
|
|
95
|
+
**Verdict**: approve / changes requested — <one-line reason>
|
|
96
|
+
```
|
|
97
|
+
|
|
98
|
+
If a finding is about the spec itself (the spec is wrong or silent on a case), that is
|
|
99
|
+
a **discovery**, not a rejection: run `raise-discovery` instead of failing the change.
|
|
@@ -0,0 +1,76 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: silent-failure-hunter
|
|
3
|
+
description: Hunt for silently swallowed errors and dangerous fallbacks in a change — the failures that hide bugs instead of surfacing them. Five focused passes over the diff. Use during review when a change touches error handling, async flows, or boundary parsing, before `security-review`.
|
|
4
|
+
origin: vendor
|
|
5
|
+
vendor_version: '{{vendor_version}}'
|
|
6
|
+
phase: post-implementation
|
|
7
|
+
invoked-by: [reviewer]
|
|
8
|
+
---
|
|
9
|
+
|
|
10
|
+
# Silent Failure Hunter
|
|
11
|
+
|
|
12
|
+
The most expensive bugs are the ones that never raise their hand. This skill is a
|
|
13
|
+
focused pass for **silent failures** — code that hides a problem instead of surfacing
|
|
14
|
+
it. The rule it enforces (from the engineering playbook): **errors should fail loudly,
|
|
15
|
+
at the point of the problem, with a message that names it.** Make five passes over the
|
|
16
|
+
diff; for each finding, state the failure mode and the loud alternative.
|
|
17
|
+
|
|
18
|
+
## Pass 1 — Empty / swallowing catches
|
|
19
|
+
|
|
20
|
+
- `catch` blocks that do nothing, only `console.log`, or `return null/undefined/[]`
|
|
21
|
+
without rethrowing or handling.
|
|
22
|
+
- `.catch(() => {})` on a promise.
|
|
23
|
+
- `try` wrapping logic where the `catch` masks the real failure.
|
|
24
|
+
|
|
25
|
+
> A `catch` is for _handling_ an error (recover, translate, add context, rethrow) —
|
|
26
|
+
> not for making it disappear. An expected no-op (idempotent retry) is fine **only**
|
|
27
|
+
> with a comment stating the intent.
|
|
28
|
+
|
|
29
|
+
## Pass 2 — Dangerous fallbacks
|
|
30
|
+
|
|
31
|
+
- `value ?? defaultThatHidesAbsence` / `value || fallback` where the fallback masks a
|
|
32
|
+
missing-data bug instead of failing.
|
|
33
|
+
- Optional chaining (`a?.b?.c`) that silently yields `undefined` where the value was
|
|
34
|
+
required — the bug surfaces three layers later, detached from its cause.
|
|
35
|
+
- `try { JSON.parse(x) } catch { return {} }` — a parse failure becomes empty data.
|
|
36
|
+
|
|
37
|
+
## Pass 3 — Inadequate logging / observability
|
|
38
|
+
|
|
39
|
+
- An error is caught and handled but **never logged**, so it's invisible in
|
|
40
|
+
production.
|
|
41
|
+
- Logged at the wrong level (`debug`/`info` for a real error) so it's filtered out.
|
|
42
|
+
- The log message lacks the context to act on it (no id, no operation, no cause).
|
|
43
|
+
|
|
44
|
+
## Pass 4 — Propagation issues
|
|
45
|
+
|
|
46
|
+
- A floating promise (no `await`, no `.catch`) whose rejection is lost.
|
|
47
|
+
- `Promise.all` where one rejection silently aborts siblings (vs `allSettled` when
|
|
48
|
+
partial success matters) — or the reverse, swallowing a failure that should abort.
|
|
49
|
+
- An async function whose error path returns a _success-shaped_ value.
|
|
50
|
+
- An error translated to a generic message that erases the original cause (no `cause`
|
|
51
|
+
chain).
|
|
52
|
+
|
|
53
|
+
## Pass 5 — Missing handling at boundaries
|
|
54
|
+
|
|
55
|
+
- External/untrusted input (API response, file, env var, user input) used without
|
|
56
|
+
validation, so a malformed value flows in as a silent bad state rather than a loud
|
|
57
|
+
rejection.
|
|
58
|
+
- A `switch`/branch with no `default` for an unexpected value.
|
|
59
|
+
- A required env var read with a silent fallback instead of failing fast at startup.
|
|
60
|
+
|
|
61
|
+
## Report
|
|
62
|
+
|
|
63
|
+
```
|
|
64
|
+
## Silent Failure Hunt — <task name>
|
|
65
|
+
|
|
66
|
+
**Empty/swallowing catches**: <file:line — failure → loud fix> | none
|
|
67
|
+
**Dangerous fallbacks**: <…> | none
|
|
68
|
+
**Inadequate logging**: <…> | none
|
|
69
|
+
**Propagation**: <…> | none
|
|
70
|
+
**Missing boundary handling**: <…> | none
|
|
71
|
+
|
|
72
|
+
**Verdict**: clean / <N> silent failures to fix
|
|
73
|
+
```
|
|
74
|
+
|
|
75
|
+
For each finding, give the loud alternative concretely (`throw new Error(\`… ${id}\`)`instead of`return null`). If the "silent" path is intentional, it must carry a comment
|
|
76
|
+
saying so — absent that comment, treat it as a finding.
|
|
@@ -0,0 +1,74 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: spec-compliance-check
|
|
3
|
+
description: Walk an L1 spec point by point against the implementation and emit a per-requirement pass/fail verdict. Use during review of an SDD task to prove every EARS requirement (including unwanted-behavior paths) is satisfied and tested — a committable audit trail. Complements `senior-review` (quality) with traceability (coverage of intent).
|
|
4
|
+
origin: vendor
|
|
5
|
+
vendor_version: '{{vendor_version}}'
|
|
6
|
+
phase: post-implementation
|
|
7
|
+
invoked-by: [reviewer]
|
|
8
|
+
---
|
|
9
|
+
|
|
10
|
+
# Spec Compliance Check
|
|
11
|
+
|
|
12
|
+
`senior-review` judges whether the code is _good_; this skill proves it does what the
|
|
13
|
+
**spec said** — every requirement, point by point. It produces a traceability matrix:
|
|
14
|
+
a committable audit trail mapping each requirement to its implementation and its test.
|
|
15
|
+
Only meaningful for **L1 (SDD) tasks** that have a spec under
|
|
16
|
+
`.claude/state/tasks/<id>/spec/`; skip it for L2.
|
|
17
|
+
|
|
18
|
+
## Process
|
|
19
|
+
|
|
20
|
+
### 1. Load the spec
|
|
21
|
+
|
|
22
|
+
Read all three spec artifacts for the task:
|
|
23
|
+
|
|
24
|
+
- `requirements.md` — the **EARS** requirements (ubiquitous / event-driven /
|
|
25
|
+
state-driven / optional-feature / **unwanted-behavior**), each numbered with
|
|
26
|
+
acceptance criteria.
|
|
27
|
+
- `design.md` — the intended files, interfaces, approach, edge cases.
|
|
28
|
+
- `tasks.md` — the atomic checklist, each item referencing the requirements it
|
|
29
|
+
satisfies.
|
|
30
|
+
|
|
31
|
+
### 2. Trace each requirement to the implementation
|
|
32
|
+
|
|
33
|
+
For **every** numbered requirement (do not sample — enterprise audit trails are
|
|
34
|
+
exhaustive):
|
|
35
|
+
|
|
36
|
+
1. Locate where the change implements it (file + symbol).
|
|
37
|
+
2. Locate the test that proves it (the test whose assertion maps to the acceptance
|
|
38
|
+
criteria).
|
|
39
|
+
3. Decide the verdict:
|
|
40
|
+
- **PASS** — implemented **and** covered by a test that asserts the acceptance
|
|
41
|
+
criteria.
|
|
42
|
+
- **PARTIAL** — implemented but the test is missing or doesn't assert the criteria.
|
|
43
|
+
- **FAIL** — not implemented, or the behavior contradicts the requirement.
|
|
44
|
+
|
|
45
|
+
Pay special attention to **unwanted-behavior** requirements (`If … then …`) — these
|
|
46
|
+
are the ones implementations quietly skip. Each needs a test that drives the bad input
|
|
47
|
+
and asserts the rejection/guard.
|
|
48
|
+
|
|
49
|
+
### 3. Check for drift
|
|
50
|
+
|
|
51
|
+
- **Unspecified additions**: implementation behavior with no backing requirement →
|
|
52
|
+
either scope drift (a `T3` discovery) or a missing requirement.
|
|
53
|
+
- **Design divergence**: the change took an approach `design.md` didn't describe —
|
|
54
|
+
note it; if it invalidates a design decision, that's a discovery.
|
|
55
|
+
|
|
56
|
+
### 4. Emit the matrix
|
|
57
|
+
|
|
58
|
+
```
|
|
59
|
+
## Spec Compliance — task #<id>
|
|
60
|
+
|
|
61
|
+
| Req | Summary | Status | Implementation | Test |
|
|
62
|
+
| --- | --- | --- | --- | --- |
|
|
63
|
+
| R1 | <ubiquitous …> | ✅ PASS | `x.service.ts:42` | `x.service.spec.ts` |
|
|
64
|
+
| R7 | <If invalid → reject> | ⚠️ PARTIAL | `x.service.ts:80` | — (no test) |
|
|
65
|
+
| R9 | <state-driven …> | ❌ FAIL | — | — |
|
|
66
|
+
|
|
67
|
+
**Covered**: <n>/<total> · **Partial**: <n> · **Failing**: <n>
|
|
68
|
+
**Verdict**: compliant / non-compliant — <one-line reason>
|
|
69
|
+
```
|
|
70
|
+
|
|
71
|
+
A `FAIL` or `PARTIAL` is a review finding routed back to the Implementer. If a
|
|
72
|
+
requirement is itself wrong, ambiguous, or contradicted by reality, that's a
|
|
73
|
+
**discovery** — run `raise-discovery`, don't quietly mark it PASS or rewrite the spec
|
|
74
|
+
yourself.
|
|
@@ -0,0 +1,88 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: spec-to-issue
|
|
3
|
+
description: Fill a managed issue's body with the externalized spec. Use when the Spec Author publishes the completed spec into the GitHub issue the Orchestrator already created, mentions "fill the issue", "spec-to-issue", or "externalize the spec into the issue".
|
|
4
|
+
origin: vendor
|
|
5
|
+
vendor_version: '{{vendor_version}}'
|
|
6
|
+
phase: pre-implementation
|
|
7
|
+
invoked-by: [spec-author]
|
|
8
|
+
---
|
|
9
|
+
|
|
10
|
+
# Spec to Issue
|
|
11
|
+
|
|
12
|
+
## Core Principle
|
|
13
|
+
|
|
14
|
+
The GitHub issue **is** the spec, externalized. The Orchestrator already created it (a
|
|
15
|
+
skeleton with the harness labels) at the start of `spec-in-progress` and gave the task
|
|
16
|
+
its permanent `<id>`. This skill does one thing: replace that skeleton body with the
|
|
17
|
+
real spec, so the issue reads as a self-contained spec on GitHub.
|
|
18
|
+
|
|
19
|
+
Run this **after** `prd-to-spec` has produced `requirements.md` + `design.md` +
|
|
20
|
+
`tasks.md` under `.claude/state/tasks/<id>/spec/`.
|
|
21
|
+
|
|
22
|
+
## Preconditions
|
|
23
|
+
|
|
24
|
+
- The issue already exists (the Orchestrator created it) and `<id>` is known.
|
|
25
|
+
- A spec exists at `.claude/state/tasks/<id>/spec/{requirements,design,tasks}.md`.
|
|
26
|
+
- `gh` is authenticated and `task_storage.type` is `github` (see `harness.config.yml`).
|
|
27
|
+
|
|
28
|
+
You do **not** create the issue, rename any folder (the id is real from the start —
|
|
29
|
+
there is no `draft-<slug>`), or touch labels. Issue creation and the entire label
|
|
30
|
+
lifecycle belong to the Orchestrator.
|
|
31
|
+
|
|
32
|
+
## Process
|
|
33
|
+
|
|
34
|
+
### 1. Build the issue body from the spec
|
|
35
|
+
|
|
36
|
+
The body is the externalized spec — self-contained enough to read on GitHub, with
|
|
37
|
+
links back to the committed files for the full detail:
|
|
38
|
+
|
|
39
|
+
```markdown
|
|
40
|
+
## Summary
|
|
41
|
+
|
|
42
|
+
<one paragraph: the capability and why, from the PRD>
|
|
43
|
+
|
|
44
|
+
## Requirements (EARS)
|
|
45
|
+
|
|
46
|
+
<paste requirements.md, or its numbered list>
|
|
47
|
+
|
|
48
|
+
## Design
|
|
49
|
+
|
|
50
|
+
See `.claude/state/tasks/<id>/spec/design.md`.
|
|
51
|
+
|
|
52
|
+
## Tasks
|
|
53
|
+
|
|
54
|
+
See `.claude/state/tasks/<id>/spec/tasks.md`.
|
|
55
|
+
```
|
|
56
|
+
|
|
57
|
+
### 2. Replace the skeleton body
|
|
58
|
+
|
|
59
|
+
```bash
|
|
60
|
+
gh issue edit <id> --body-file <body>
|
|
61
|
+
```
|
|
62
|
+
|
|
63
|
+
This overwrites the `🚧 Spec in progress …` skeleton the Orchestrator wrote at issue
|
|
64
|
+
creation. Labels are untouched — the issue still carries `harness:managed` +
|
|
65
|
+
`harness:sdd` + `harness:status:spec-in-progress`; the Orchestrator flips it to
|
|
66
|
+
`spec-ready` after you hand back.
|
|
67
|
+
|
|
68
|
+
### 3. Hand back to the Orchestrator
|
|
69
|
+
|
|
70
|
+
Return a short confirmation. From here the **Orchestrator** flips the status label to
|
|
71
|
+
`harness:status:spec-ready`, commits and pushes the task state to the branch, and runs
|
|
72
|
+
the human approval gate. This skill transitions no labels.
|
|
73
|
+
|
|
74
|
+
## Ownership boundary
|
|
75
|
+
|
|
76
|
+
- **Spec Author** (this skill): fills the issue **body** from the committed spec.
|
|
77
|
+
- **Orchestrator**: created the issue, owns every label transition
|
|
78
|
+
(`spec-in-progress → spec-ready → in-progress → in-review → done`), and closes it at
|
|
79
|
+
closeout.
|
|
80
|
+
|
|
81
|
+
## Uncontemplated Scenarios
|
|
82
|
+
|
|
83
|
+
When a scenario doesn't clearly fit these rules:
|
|
84
|
+
|
|
85
|
+
1. Apply the closest matching approach with reasoning.
|
|
86
|
+
2. **Flag it**: "This scenario isn't covered by the spec-to-issue skill. I applied
|
|
87
|
+
[approach] because [reason]. Want to update the skill?"
|
|
88
|
+
3. Offer to add a new rule for the case.
|