@anthropologies/claudestory 0.1.60 → 0.1.62
- package/dist/cli.js +3603 -701
- package/dist/index.d.ts +66 -66
- package/dist/mcp.js +3315 -573
- package/package.json +1 -1
- package/src/skill/SKILL.md +4 -0
- package/src/skill/autonomous-mode.md +27 -0
- package/src/skill/reference.md +3 -0
- package/src/skill/review-lenses/references/judge.md +85 -7
- package/src/skill/review-lenses/references/lens-accessibility.md +91 -5
- package/src/skill/review-lenses/references/lens-api-design.md +92 -3
- package/src/skill/review-lenses/references/lens-clean-code.md +94 -3
- package/src/skill/review-lenses/references/lens-concurrency.md +92 -4
- package/src/skill/review-lenses/references/lens-error-handling.md +92 -3
- package/src/skill/review-lenses/references/lens-performance.md +96 -4
- package/src/skill/review-lenses/references/lens-security.md +136 -4
- package/src/skill/review-lenses/references/lens-test-quality.md +95 -4
- package/src/skill/review-lenses/references/merger.md +76 -3
- package/src/skill/review-lenses/references/shared-preamble.md +62 -2
- package/src/skill/review-lenses/review-lenses.md +246 -36
@@ -1,59 +1,269 @@
-
+<!--
+MAINTENANCE: Prompts exist in .ts (autonomous guide) and .md (manual fallback).
+The .ts files are the source of truth. The .md files are agent-readable copies.
+When updating a prompt, update the .ts file first, then sync the .md file.
+-->
 
-
+# Multi-Lens Review -- Evaluation Protocol
 
-
+This file is referenced from SKILL.md for `/story review-lenses` and when reviewing plans or code.
 
-
+**Skill command name:** When this file references `/story` in user-facing output, use the actual command that invoked you.
 
-##
+## When to Use
 
-
+- After writing a plan (any mode -- /story plan, native plan mode, manual)
+- Before committing code (after implementation, before merge)
+- When explicitly invoked via `/story review-lenses`
 
+The autonomous guide invokes lenses automatically during CODE_REVIEW/PLAN_REVIEW stages when `reviewBackends` includes `"lenses"`. This protocol is for manual/standalone use.
+
+## When to Combine with Single-Agent Review
+
+Lenses excel at **breadth and static analysis** -- catching duplicated code, missing validation, unused imports, schema gaps, test coverage holes. For **complex state machines, session lifecycles, or multi-file behavioral reasoning** (e.g., "what happens when session resumes after compaction?"), also run a focused single-agent review.
+
+Best workflow: single agent for deep reasoning first, then lenses for breadth/defense-in-depth. The combination is more effective than either alone.
+
+---
+
+## Path A: MCP Available (Primary)
+
+Use this path when `claudestory_review_lenses_prepare` is available as an MCP tool.
+
+### Step 1: Determine review stage
+
+- If reviewing a **plan** (plan text, implementation design, architecture doc): stage = `PLAN_REVIEW`
+- If reviewing **code** (uncommitted diff, PR, implementation): stage = `CODE_REVIEW`
+
+### Step 2: Capture the artifact
+
+- **CODE_REVIEW:** Run `git diff` to capture the current diff. Run `git diff --name-only` for changed file list.
+- **Round 2+:** Use `git diff <commit-at-last-review>..HEAD` to capture only changes since the last review.
+- **PLAN_REVIEW:** Read the plan file (from `.story/sessions/<id>/plan.md` or the current plan in context).
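
The Step 2 commands for a code review can be sketched as a small helper that only builds the `git` argument lists. This is an illustrative sketch: the function name and the `last_review_commit` parameter are hypothetical, and reusing the `<commit>..HEAD` range for the `--name-only` call in later rounds is an assumption the protocol does not spell out.

```python
from typing import Dict, List, Optional


def build_diff_commands(last_review_commit: Optional[str] = None) -> Dict[str, List[str]]:
    """Build git argument lists for the diff text and the changed-file list.

    Hypothetical helper: round 1 diffs the working tree, later rounds diff
    from the commit recorded at the last review up to HEAD.
    """
    revision_range: List[str] = []
    if last_review_commit:
        revision_range = [f"{last_review_commit}..HEAD"]
    return {
        "diff": ["git", "diff"] + revision_range,
        # Assumption: the same range applies to the file list in round 2+.
        "changed_files": ["git", "diff", "--name-only"] + revision_range,
    }
```

Each list can be passed to `subprocess.run(cmd, capture_output=True, text=True, check=True)` and the result read from `.stdout`.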
+
+### Step 3: Prepare the review
+
+Call `claudestory_review_lenses_prepare` with:
+```json
+{
+  "stage": "CODE_REVIEW",
+  "diff": "<full diff text>",
+  "changedFiles": ["src/foo.ts", "src/bar.ts"],
+  "ticketDescription": "T-XXX: description of the ticket (or 'Manual review -- brief description' if no ticket)",
+  "reviewRound": 1,
+  "priorDeferrals": []
+}
 ```
-
+
+For rounds 2+, increment `reviewRound` and pass issueKeys of findings you intentionally deferred:
+```json
+{
+  "reviewRound": 2,
+  "priorDeferrals": ["clean-code:src/foo.ts:42:dead-param", "test-quality:::handleStart-untested"]
+}
 ```
 
-
+The tool returns `lensPrompts` (one per active lens) and `metadata`.
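
A minimal sketch of assembling the `CODE_REVIEW` arguments shown above. The function and parameter names are hypothetical; only the JSON keys come from the protocol. Plan reviews are left out because the example payload only shows the code-review shape.

```python
from typing import Any, Dict, Sequence


def build_prepare_payload(
    diff: str,
    changed_files: Sequence[str],
    ticket_description: str,
    review_round: int = 1,
    prior_deferrals: Sequence[str] = (),
) -> Dict[str, Any]:
    """Assemble arguments for a CODE_REVIEW prepare call (illustrative only)."""
    if review_round < 1:
        raise ValueError("reviewRound starts at 1")
    if review_round == 1 and prior_deferrals:
        raise ValueError("priorDeferrals only make sense from round 2 onward")
    return {
        "stage": "CODE_REVIEW",
        "diff": diff,
        "changedFiles": list(changed_files),
        "ticketDescription": ticket_description,
        "reviewRound": review_round,
        "priorDeferrals": list(prior_deferrals),
    }
```

Rejecting deferrals in round 1 is a design choice of this sketch, not a documented rule of the tool.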
 
-
+### Step 4: Spawn lens agents in parallel
 
-**
-
-
-
+For each lens prompt where `cached: false`, launch a subagent in a **single message with multiple Agent tool calls**:
+- **Prompt:** The `prompt` string returned by the prepare tool + append `\n\n## Diff to review\n\n` + the `artifact` string from Step 3
+- **If `promptTruncated: true`:** The prompt was too large to include. Read `promptRef` from the skill directory (`~/.claude/skills/story/review-lenses/<promptRef>`), fill the preamble variables (see Path B Step 5), select the stage-appropriate section, and append the artifact yourself.
+- **Model:** The `model` string returned (sonnet or opus)
+- **Tools:** Read, Grep, Glob (read-only)
 
-
-4. Performance -- N+1 queries, memory leaks, algorithmic complexity
-5. API Design -- backward compat, REST conventions, error responses
-6. Concurrency -- race conditions, deadlocks, actor isolation (Opus model)
-7. Test Quality -- coverage gaps, flaky patterns, missing assertions
-8. Accessibility -- WCAG, keyboard nav, screen reader support
+Skip lenses where `cached: true` -- their findings are already available in `cachedFindings`. You will include them in Step 6.
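
The Step 4 selection logic could look roughly like this. The keys `cached`, `prompt`, `promptTruncated`, `promptRef` and `model` are named in the protocol; the `lens` key on each prompt entry and the shape of the returned job dictionaries are assumptions made for illustration.

```python
from typing import Any, Dict, List

DIFF_HEADER = "\n\n## Diff to review\n\n"
READ_ONLY_TOOLS = ["Read", "Grep", "Glob"]


def build_spawn_jobs(lens_prompts: List[Dict[str, Any]], artifact: str) -> List[Dict[str, Any]]:
    """Return one job per lens that still needs a subagent (hypothetical shape)."""
    jobs: List[Dict[str, Any]] = []
    for entry in lens_prompts:
        if entry.get("cached"):
            continue  # cached lenses are not re-run
        job: Dict[str, Any] = {
            "lens": entry["lens"],  # assumed key name
            "model": entry["model"],
            "tools": list(READ_ONLY_TOOLS),
        }
        if entry.get("promptTruncated"):
            # Prompt must be rebuilt by hand from the referenced file.
            job["prompt"] = None
            job["promptRef"] = entry["promptRef"]
        else:
            job["prompt"] = entry["prompt"] + DIFF_HEADER + artifact
        jobs.append(job)
    return jobs
```

All returned jobs are meant to be launched together, matching the "single message with multiple Agent tool calls" instruction.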
 
-
+### Step 5: Collect results
 
-
-2. **Judge** -- severity calibration + stage-aware verdict
+Each lens returns JSON: `{ "status": "complete" | "insufficient-context", "findings": [...] }`
 
-
+Combine with cached findings from Step 3. For cached lenses, create a result entry: `{ "lens": "<name>", "status": "complete", "findings": <cachedFindings from Step 3> }`. Include ALL active lenses in the results array -- both spawned and cached -- otherwise the synthesize tool will mark missing lenses as "failed."
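
One way to make sure no active lens is left out of the results array is to build it from the active-lens list itself. In this sketch the two input mappings (lens name to spawned output, lens name to cached findings) are hypothetical intermediate structures, not something the tools return in that form.

```python
from typing import Any, Dict, List, Sequence


def assemble_lens_results(
    active_lenses: Sequence[str],
    spawned: Dict[str, Dict[str, Any]],
    cached_findings: Dict[str, List[Any]],
) -> List[Dict[str, Any]]:
    """Build the results array with exactly one entry per active lens."""
    results: List[Dict[str, Any]] = []
    for lens in active_lenses:
        if lens in spawned:
            output = spawned[lens]
            results.append({
                "lens": lens,
                "status": output["status"],
                "findings": output.get("findings", []),
            })
        elif lens in cached_findings:
            # Cached lenses are reported as complete with their stored findings.
            results.append({"lens": lens, "status": "complete", "findings": cached_findings[lens]})
        else:
            raise ValueError(f"no result for active lens: {lens}")
    return results
```

Raising on a missing lens surfaces the gap locally instead of letting the synthesize step record it as failed.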
 
-
+### Step 6: Synthesize
 
+Call `claudestory_review_lenses_synthesize` with:
 ```json
 {
-"
-
-"
-
-
-
-
-
-}
-}
+  "lensResults": [
+    { "lens": "clean-code", "status": "complete", "findings": [...] },
+    { "lens": "security", "status": "complete", "findings": [...] }
+  ],
+  "activeLenses": ["clean-code", "security", "error-handling"],
+  "skippedLenses": ["performance", "api-design", "concurrency", "test-quality", "accessibility"],
+  "reviewRound": 1,
+  "reviewId": "lens-xxx"
 }
 ```
 
-
+The tool validates findings, applies blocking policy, and returns `mergerPrompt`.
+
+### Step 7: Run merger
+
+Spawn one agent with the returned `mergerPrompt`. It deduplicates findings and identifies tensions. The merger returns JSON with `findings`, `tensions`, and `mergeLog`.
+
+### Step 8: Run judge
+
+Call `claudestory_review_lenses_judge` with:
+```json
+{
+  "mergerResultRaw": "<raw JSON from merger agent>",
+  "lensesCompleted": ["clean-code", "security", "error-handling"],
+  "lensesFailed": [],
+  "lensesInsufficientContext": [],
+  "lensesSkipped": ["performance", "api-design", "concurrency", "test-quality", "accessibility"],
+  "convergenceHistory": [
+    { "round": 1, "verdict": "revise", "blocking": 3, "important": 7, "newCode": "--" }
+  ]
+}
+```
+
+The tool returns `judgePrompt` with verdict calibration rules and convergence guidance.
+
+### Step 9: Run judge agent
+
+Spawn one agent with the `judgePrompt`. It calibrates severity and generates the final verdict. The judge returns the `SynthesisResult` JSON.
+
+### Step 10: Present output
+
+Format the judge's output using the **Standardized Output Format** below.
+
+---
+
+## Path B: MCP Unavailable (Fallback)
+
+Use this path when MCP tools are not available (e.g., plugin-only install, other AI tools).
+
+### Step 1-2: Same as Path A
+
+Determine stage and capture artifact.
+
+### Step 3: Determine active lenses
+
+**Core (always):** clean-code, security, error-handling
+
+**Surface-activated by changed files:**
+- ORM imports (prisma, sequelize, typeorm, mongoose, knex), nested loops >= 2, files > 300 lines, hotPaths config -> **performance**
+- `**/api/**`, route handlers, controllers, GraphQL resolvers -> **api-design**
+- `.swift`, `.go`, `.rs`, async/await + shared state mutation -> **concurrency**
+- `*.test.*`, `*.spec.*`, `__tests__/` -> **test-quality**
+- `*.tsx`, `*.jsx`, `*.html`, `*.vue`, `*.svelte` -> **accessibility**
+
+**Exclude:** lock files, node_modules, generated code (`*.generated.*`), migrations, binaries, vendored deps.
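
The path-based part of this activation table can be approximated with glob matching. This is only a partial sketch: content signals (ORM imports, nested loops, file length, async/await with shared state) are not modelled, so the performance lens never activates here, and the exact exclusion globs are guesses at what "lock files" and "migrations" mean in practice.

```python
from fnmatch import fnmatchcase
from typing import Dict, Iterable, List

CORE_LENSES = ["clean-code", "security", "error-handling"]

# Path signals only; patterns are illustrative approximations.
PATH_SIGNALS: Dict[str, List[str]] = {
    "api-design": ["api/*", "*/api/*"],
    "concurrency": ["*.swift", "*.go", "*.rs"],
    "test-quality": ["*.test.*", "*.spec.*", "*__tests__/*"],
    "accessibility": ["*.tsx", "*.jsx", "*.html", "*.vue", "*.svelte"],
}

# Assumed exclusion globs (binaries and vendored deps are not covered).
EXCLUDED = ["*.lock", "*package-lock.json", "*node_modules/*", "*.generated.*", "*migrations/*"]


def select_lenses(changed_files: Iterable[str]) -> List[str]:
    """Return core lenses plus any lens whose path pattern matches a changed file."""
    reviewable = [
        path for path in changed_files
        if not any(fnmatchcase(path, pattern) for pattern in EXCLUDED)
    ]
    active = list(CORE_LENSES)
    for lens, patterns in PATH_SIGNALS.items():
        if any(fnmatchcase(path, p) for path in reviewable for p in patterns):
            active.append(lens)
    return active
```

`fnmatchcase` is used instead of `fnmatch` so matching stays case-sensitive on every platform; note that `*` in these patterns also matches `/`.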
+
+### Step 4: Read prompt files
+
+Read `references/shared-preamble.md` in this directory. For each active lens, read `references/lens-<name>.md`. Select the **Code Review Prompt** or **Plan Review Prompt** section based on stage.
+
+### Step 5: Fill variables and spawn
+
+Fill `{{variable}}` placeholders in the shared preamble:
+- **Required:** `{{lensName}}`, `{{lensVersion}}` (from frontmatter), `{{reviewStage}}`, `{{reviewArtifact}}`, `{{fileManifest}}`, `{{ticketDescription}}`
+- **Defaults:** `{{projectRules}}` (empty if no RULES.md), `{{knownFalsePositives}}` (empty), `{{activationReason}}` ("core lens" or file signal), `{{findingBudget}}` (10), `{{confidenceFloor}}` (0.6), `{{artifactType}}` ("diff" or "plan")
+- **Lens-specific:** `{{hotPaths}}` (performance only), `{{scannerFindings}}` (security only)
+- **Ticket description:** If you have a ticket context (from `/story`, ticket in progress), use it. Otherwise: `"Manual review -- [brief description]"`
+
+Each lens prompt ends with `Append: ## Diff to review\n\n{{reviewArtifact}}` (or `## Plan to review`). This means: when building the final prompt, append the artifact (diff or plan text) at the end after the lens instructions. Replace `{{reviewArtifact}}` with the actual content.
+
+Spawn all lens agents in parallel. Each gets: filled preamble + lens prompt section + artifact. Model from frontmatter (sonnet or opus).
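
Placeholder filling for the manual path might be sketched as a regex substitution. The required names and default values are taken from the list above; the function name is hypothetical, and leaving unknown placeholders untouched (rather than blanking them) is a choice of this sketch.

```python
import re
from typing import Dict

PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")

REQUIRED = (
    "lensName", "lensVersion", "reviewStage",
    "reviewArtifact", "fileManifest", "ticketDescription",
)

DEFAULTS: Dict[str, str] = {
    "projectRules": "",
    "knownFalsePositives": "",
    "activationReason": "core lens",
    "findingBudget": "10",
    "confidenceFloor": "0.6",
    "artifactType": "diff",
}


def fill_preamble(template: str, values: Dict[str, str]) -> str:
    """Substitute {{name}} placeholders, applying defaults for optional ones."""
    missing = [name for name in REQUIRED if name not in values]
    if missing:
        raise KeyError(f"missing required variables: {missing}")
    merged = {**DEFAULTS, **values}

    def substitute(match: "re.Match[str]") -> str:
        name = match.group(1)
        # Unknown placeholders stay visible so omissions are easy to spot.
        return str(merged.get(name, match.group(0)))

    return PLACEHOLDER.sub(substitute, template)
```

Lens-specific variables such as `hotPaths` or `scannerFindings` are passed in `values` only for the lenses that use them.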
+
+### Step 6-9: Simplified synthesis
+
+Read `references/merger.md` and `references/judge.md`. Run merger then judge as in Path A. Since the MCP synthesize/judge tools aren't available, use these blocking rules when reading the judge output:
+- Critical severity + confidence >= 0.8 = blocking
+- Everything else = non-blocking
+- "Blocking" requires a concrete failure scenario (crash, data corruption) -- not "missing test"
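
The first two fallback rules reduce to a one-line predicate, sketched below. The lowercase `"critical"` label and the `severity`/`confidence` key names are assumptions about the finding shape; the third rule (a concrete failure scenario) needs judgment and is represented only by an explicit flag the reviewer sets.

```python
from typing import Any, Dict


def is_blocking(finding: Dict[str, Any], has_failure_scenario: bool = True) -> bool:
    """Apply the Path B blocking rules to a single finding (illustrative)."""
    if not has_failure_scenario:
        # e.g. "missing test" with no crash or data-corruption path
        return False
    return finding.get("severity") == "critical" and finding.get("confidence", 0.0) >= 0.8
```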
+
+### Step 10: Present output using standardized format below.
+
+### Shortcut for small reviews
+
+If only 1-2 lenses activate and total findings < 5, skip the merger/judge and present findings directly. The synthesis pipeline adds value with 3+ lenses producing overlapping findings.
+
+---
+
+## Error Handling
+
+- **Lens returns malformed output:** Drop it, note the lens in "lensesFailed"
+- **Merger fails:** Pass raw (unmerged) findings to the judge
+- **Judge fails:** Compute verdict deterministically: any blocking + high-confidence finding = revise, else approve
+- **Core lens fails (security, error-handling, clean-code):** Never approve -- maximum verdict is "revise"
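
The last two rules combine into a small deterministic fallback, sketched here. The protocol does not define "high-confidence" numerically; the 0.8 default below is borrowed from the Path B blocking rule and is an assumption, as are the `blocking` and `confidence` key names.

```python
from typing import Any, Dict, Iterable

CORE_LENSES = {"security", "error-handling", "clean-code"}


def fallback_verdict(
    findings: Iterable[Dict[str, Any]],
    lenses_failed: Iterable[str],
    high_confidence: float = 0.8,  # assumed threshold
) -> str:
    """Verdict to use when the judge agent fails (illustrative sketch)."""
    has_blocker = any(
        f.get("blocking") and f.get("confidence", 0.0) >= high_confidence
        for f in findings
    )
    core_failed = bool(CORE_LENSES & set(lenses_failed))
    # A failed core lens caps the outcome at "revise" even with no blockers.
    return "revise" if has_blocker or core_failed else "approve"
```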
+
+---
+
+## Acknowledged Deferrals
+
+After each round, classify findings you received:
+- **"I'll fix this"** -> fixed (verify next round)
+- **"Out of scope / architectural"** -> deferred (pass issueKey to `priorDeferrals` next round, file as issue)
+- **"I disagree"** -> contested (pass to `priorDeferrals`, adds to knownFalsePositives)
+
+This prevents the same findings from being re-reported across rounds.
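
Carrying classifications into the next round can be sketched as a filter over issueKeys. The mapping from issueKey to classification is a hypothetical bookkeeping structure; only the three labels and the `priorDeferrals` destination come from the text above.

```python
from typing import Dict, List

CARRIED = {"deferred", "contested"}
KNOWN = CARRIED | {"fixed"}


def next_round_deferrals(classified: Dict[str, str]) -> List[str]:
    """Return the issueKeys to pass as priorDeferrals in the next round."""
    unknown = sorted(k for k, label in classified.items() if label not in KNOWN)
    if unknown:
        raise ValueError(f"unclassified findings: {unknown}")
    # "fixed" findings are dropped so the next round can verify the fix.
    return sorted(k for k, label in classified.items() if label in CARRIED)
```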
+
+---
+
+## Standardized Output Format
+
+Every lens review produces this structure, regardless of invocation path:
+
+```markdown
+## Multi-Lens Review
+
+**Verdict: APPROVE | REVISE | REJECT**
+_One sentence explaining what drove the verdict._
+
+**Lenses:** clean-code, security, error-handling, concurrency, test-quality (5 ran, 3 skipped)
+**Round:** R3 | **Recommend next round:** No -- blocking at 0 for 2 rounds, important stable
+
+### Blocking Findings
+
+1. **[severity] description** (lens, confidence, scope: inline/pr/architectural)
+   File: `path/to/file.ts:42` | Origin: introduced
+   Evidence: `code snippet`
+   Fix: actionable recommendation
+
+### Non-Blocking Findings
+
+| # | Severity | Lens | File | Finding | Confidence | Scope | Origin |
+|---|----------|------|------|---------|------------|-------|--------|
+| 3 | minor | clean-code | src/foo.ts:15 | Function exceeds 80 lines | 0.92 | inline | introduced |
+
+### Pre-Existing Issues Discovered
+
+_Found in surrounding code, not introduced by this diff. Filed as issues, excluded from verdict._
+
+| # | Severity | File | Finding | Filed As |
+|---|----------|------|---------|----------|
+| P1 | high | src/stages/plan.ts:42 | Unguarded loadProject | ISS-089 |
+
+### Tensions
+
+| Lens A | Lens B | File | Tradeoff |
+|--------|--------|------|----------|
+| security | performance | src/api.ts:42 | Security wants validation; performance flags overhead |
+
+### Convergence
+
+| Round | Verdict | Blocking | Important | New Code |
+|-------|---------|----------|-----------|----------|
+| R1 | revise | 5 | 9 | -- |
+| R2 | approve | 0 | 3 | 1 file, 12 lines |
+
+### Cleared
+
+- Path traversal: safe (IDs matched against in-memory state)
+- Session hijacking: safe (targetWork only on start action)
+
+### JSON Summary
+
+{ "verdict": "approve", "recommendNextRound": false, "blocking": 0, "nonBlocking": 3, "preExisting": 1, "findings": [...] }
+```
+
+### Verdict Rules
 
-
+- **APPROVE** -- No blocking findings, OR all findings are non-blocking. "Approve with findings" is valid. Do NOT use REVISE just because findings exist.
+- **REVISE** -- At least one finding has `blocking: true` after calibration. These must be addressed before merge.
+- **REJECT** -- Critical blocking finding with high confidence, or fundamental design flaw requiring replanning.
+- **"Blocking" requires a concrete failure scenario** -- "crashes the session" is blocking. "Missing test" is NOT blocking. "Missing ?? [] guard" is NOT blocking if Zod defaults protect it.
+- **Pre-existing findings** (`origin: "pre-existing"`) and **architectural-scope findings** are excluded from the verdict. File them as issues.
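
Read together, the verdict rules suggest a decision order like the sketch below. Several parts are assumptions rather than documented behaviour: the finding keys (`blocking`, `severity`, `confidence`, `origin`, `scope`), the lowercase `"critical"` label, the 0.9 cut-off standing in for "high confidence", and the `design_flaw` flag that represents the judge's own assessment.

```python
from typing import Any, Dict, Iterable


def compute_verdict(
    findings: Iterable[Dict[str, Any]],
    design_flaw: bool = False,
    reject_confidence: float = 0.9,  # hypothetical threshold
) -> str:
    """Map calibrated findings to approve / revise / reject (illustrative)."""
    if design_flaw:
        return "reject"
    # Pre-existing and architectural-scope findings do not count.
    counted = [
        f for f in findings
        if f.get("origin") != "pre-existing" and f.get("scope") != "architectural"
    ]
    blocking = [f for f in counted if f.get("blocking") is True]
    if not blocking:
        return "approve"  # "approve with findings" is valid
    if any(
        f.get("severity") == "critical" and f.get("confidence", 0.0) >= reject_confidence
        for f in blocking
    ):
        return "reject"
    return "revise"
```

Checking the exclusions before counting blockers keeps a pre-existing issue from changing the verdict, which matches the instruction to file such findings as issues instead.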