@anthropologies/claudestory 0.1.60 → 0.1.61

@@ -1,59 +1,269 @@
- # Multi-Lens Review
+ <!--
+ MAINTENANCE: Prompts exist in .ts (autonomous guide) and .md (manual fallback).
+ The .ts files are the source of truth. The .md files are agent-readable copies.
+ When updating a prompt, update the .ts file first, then sync the .md file.
+ -->
 
- The multi-lens review orchestrator runs 8 specialized review agents in parallel, each analyzing the same diff or plan through a focused perspective. Findings are deduplicated semantically by a merger, then calibrated and judged for a final verdict.
+ # Multi-Lens Review -- Evaluation Protocol
 
- ## When This Runs
+ This file is referenced from SKILL.md for `/story review-lenses` and when reviewing plans or code.
 
- The autonomous guide invokes lenses automatically when `reviewBackends` includes `"lenses"` during CODE_REVIEW or PLAN_REVIEW stages. You don't need to invoke this manually.
+ **Skill command name:** When this file references `/story` in user-facing output, use the actual command that invoked you.
 
- ## Manual Invocation
+ ## When to Use
 
- For debugging or standalone use:
+ - After writing a plan (any mode -- /story plan, native plan mode, manual)
+ - Before committing code (after implementation, before merge)
+ - When explicitly invoked via `/story review-lenses`
 
+ The autonomous guide invokes lenses automatically during CODE_REVIEW/PLAN_REVIEW stages when `reviewBackends` includes `"lenses"`. This protocol is for manual/standalone use.
+
+ ## When to Combine with Single-Agent Review
+
+ Lenses excel at **breadth and static analysis** -- catching duplicated code, missing validation, unused imports, schema gaps, test coverage holes. For **complex state machines, session lifecycles, or multi-file behavioral reasoning** (e.g., "what happens when session resumes after compaction?"), also run a focused single-agent review.
+
+ Best workflow: single agent for deep reasoning first, then lenses for breadth/defense-in-depth. The combination is more effective than either alone.
+
+ ---
+
+ ## Path A: MCP Available (Primary)
+
+ Use this path when `claudestory_review_lenses_prepare` is available as an MCP tool.
+
+ ### Step 1: Determine review stage
+
+ - If reviewing a **plan** (plan text, implementation design, architecture doc): stage = `PLAN_REVIEW`
+ - If reviewing **code** (uncommitted diff, PR, implementation): stage = `CODE_REVIEW`
+
+ ### Step 2: Capture the artifact
+
+ - **CODE_REVIEW:** Run `git diff` to capture the current diff. Run `git diff --name-only` for changed file list.
+ - **Round 2+:** Use `git diff <commit-at-last-review>..HEAD` to capture only changes since the last review.
+ - **PLAN_REVIEW:** Read the plan file (from `.story/sessions/<id>/plan.md` or the current plan in context).
+
+ ### Step 3: Prepare the review
+
+ Call `claudestory_review_lenses_prepare` with:
+ ```json
+ {
+ "stage": "CODE_REVIEW",
+ "diff": "<full diff text>",
+ "changedFiles": ["src/foo.ts", "src/bar.ts"],
+ "ticketDescription": "T-XXX: description of the ticket (or 'Manual review -- brief description' if no ticket)",
+ "reviewRound": 1,
+ "priorDeferrals": []
+ }
 ```
- /story review-lenses
+
+ For rounds 2+, increment `reviewRound` and pass issueKeys of findings you intentionally deferred:
+ ```json
+ {
+ "reviewRound": 2,
+ "priorDeferrals": ["clean-code:src/foo.ts:42:dead-param", "test-quality:::handleStart-untested"]
+ }
 ```
 
- This reads the current diff and runs the full lens pipeline outside the autonomous guide.
+ The tool returns `lensPrompts` (one per active lens) and `metadata`.
 
- ## The 8 Lenses
+ ### Step 4: Spawn lens agents in parallel
 
- **Core (always run):**
- 1. Clean Code -- structural quality, SRP, naming, duplication
- 2. Security -- OWASP top 10, injection, auth, secrets (Opus model)
- 3. Error Handling -- failure modes, missing catches, null safety
+ For each lens prompt where `cached: false`, launch a subagent in a **single message with multiple Agent tool calls**:
+ - **Prompt:** The `prompt` string returned by the prepare tool, with `\n\n## Diff to review\n\n` and the `artifact` string from Step 2 appended
+ - **If `promptTruncated: true`:** The prompt was too large to include. Read `promptRef` from the skill directory (`~/.claude/skills/story/review-lenses/<promptRef>`), fill the preamble variables (see Path B Step 5), select the stage-appropriate section, and append the artifact yourself.
+ - **Model:** The `model` string returned (sonnet or opus)
+ - **Tools:** Read, Grep, Glob (read-only)
 
- **Surface-activated (based on changed files):**
- 4. Performance -- N+1 queries, memory leaks, algorithmic complexity
- 5. API Design -- backward compat, REST conventions, error responses
- 6. Concurrency -- race conditions, deadlocks, actor isolation (Opus model)
- 7. Test Quality -- coverage gaps, flaky patterns, missing assertions
- 8. Accessibility -- WCAG, keyboard nav, screen reader support
+ Skip lenses where `cached: true` -- their findings are already available in `cachedFindings`. You will include them in Step 5.
 
- ## Synthesis Pipeline
+ ### Step 5: Collect results
 
- 1. **Merger** -- semantic dedup + conflict identification
- 2. **Judge** -- severity calibration + stage-aware verdict
+ Each lens returns JSON: `{ "status": "complete" | "insufficient-context", "findings": [...] }`
 
- ## Configuration
+ Combine with cached findings from Step 3. For cached lenses, create a result entry: `{ "lens": "<name>", "status": "complete", "findings": <cachedFindings from Step 3> }`. Include ALL active lenses in the results array -- both spawned and cached -- otherwise the synthesize tool will mark missing lenses as "failed."
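The spawned-plus-cached merge described above can be sketched as follows (an editor's illustration; `LensResult` and `collectResults` are hypothetical names, not the package's actual types):

```typescript
// Illustrative only: combine spawned lens results with cached findings so
// that every active lens appears exactly once in the results array.
interface LensResult {
  lens: string;
  status: "complete" | "insufficient-context";
  findings: unknown[];
}

function collectResults(
  spawned: LensResult[],
  cachedFindings: Record<string, unknown[]>, // keyed by lens name, from prepare
  activeLenses: string[]
): LensResult[] {
  const byLens = new Map(spawned.map((r) => [r.lens, r] as [string, LensResult]));
  // Missing entries would be marked "failed" by the synthesize tool, so
  // cached lenses get a synthetic "complete" entry instead.
  return activeLenses.map(
    (lens) =>
      byLens.get(lens) ?? { lens, status: "complete", findings: cachedFindings[lens] ?? [] }
  );
}
```

The point of the sketch: the results array is keyed off `activeLenses`, never off whichever agents happened to return.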
 
- In `.story/config.json` under `recipeOverrides`:
+ ### Step 6: Synthesize
 
+ Call `claudestory_review_lenses_synthesize` with:
 ```json
 {
- "reviewBackends": ["lenses", "codex"],
- "lensConfig": {
- "lenses": "auto",
- "maxLenses": 8,
- "hotPaths": ["src/engine/**"],
- "lensModels": {
- "default": "sonnet",
- "security": "opus"
- }
- }
+ "lensResults": [
+ { "lens": "clean-code", "status": "complete", "findings": [...] },
+ { "lens": "security", "status": "complete", "findings": [...] }
+ ],
+ "activeLenses": ["clean-code", "security", "error-handling"],
+ "skippedLenses": ["performance", "api-design", "concurrency", "test-quality", "accessibility"],
+ "reviewRound": 1,
+ "reviewId": "lens-xxx"
 }
 ```
 
- ## Prompt Files
+ The tool validates findings, applies blocking policy, and returns `mergerPrompt`.
+
+ ### Step 7: Run merger
+
+ Spawn one agent with the returned `mergerPrompt`. It deduplicates findings and identifies tensions. The merger returns JSON with `findings`, `tensions`, and `mergeLog`.
+
+ ### Step 8: Run judge
+
+ Call `claudestory_review_lenses_judge` with:
+ ```json
+ {
+ "mergerResultRaw": "<raw JSON from merger agent>",
+ "lensesCompleted": ["clean-code", "security", "error-handling"],
+ "lensesFailed": [],
+ "lensesInsufficientContext": [],
+ "lensesSkipped": ["performance", "api-design", "concurrency", "test-quality", "accessibility"],
+ "convergenceHistory": [
+ { "round": 1, "verdict": "revise", "blocking": 3, "important": 7, "newCode": "--" }
+ ]
+ }
+ ```
+
+ The tool returns `judgePrompt` with verdict calibration rules and convergence guidance.
+
+ ### Step 9: Run judge agent
+
+ Spawn one agent with the `judgePrompt`. It calibrates severity and generates the final verdict. The judge returns the `SynthesisResult` JSON.
+
+ ### Step 10: Present output
+
+ Format the judge's output using the **Standardized Output Format** below.
+
+ ---
+
+ ## Path B: MCP Unavailable (Fallback)
+
+ Use this path when MCP tools are not available (e.g., plugin-only install, other AI tools).
+
+ ### Steps 1-2: Same as Path A
+
+ Determine stage and capture artifact.
+
+ ### Step 3: Determine active lenses
+
+ **Core (always):** clean-code, security, error-handling
+
+ **Surface-activated by changed files:**
+ - ORM imports (prisma, sequelize, typeorm, mongoose, knex), nested loops >= 2, files > 300 lines, hotPaths config -> **performance**
+ - `**/api/**`, route handlers, controllers, GraphQL resolvers -> **api-design**
+ - `.swift`, `.go`, `.rs`, async/await + shared state mutation -> **concurrency**
+ - `*.test.*`, `*.spec.*`, `__tests__/` -> **test-quality**
+ - `*.tsx`, `*.jsx`, `*.html`, `*.vue`, `*.svelte` -> **accessibility**
+
+ **Exclude:** lock files, node_modules, generated code (`*.generated.*`), migrations, binaries, vendored deps.
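The filename-based part of these activation rules can be sketched as below (illustrative only: the rules above also use content signals like ORM imports and nested loops, which need file reads; the helper name and regexes are assumptions, not package code):

```typescript
// Hypothetical selector: map changed file paths to the lenses they activate.
// Core lenses always run; excluded paths are filtered out first.
function activeLenses(changedFiles: string[]): string[] {
  const lenses = new Set(["clean-code", "security", "error-handling"]); // core
  const files = changedFiles.filter(
    (f) => !/node_modules|\.generated\.|package-lock|yarn\.lock|migrations/.test(f)
  );
  for (const f of files) {
    if (/\/api\//.test(f)) lenses.add("api-design");
    if (/\.(swift|go|rs)$/.test(f)) lenses.add("concurrency");
    if (/\.(test|spec)\.|__tests__\//.test(f)) lenses.add("test-quality");
    if (/\.(tsx|jsx|html|vue|svelte)$/.test(f)) lenses.add("accessibility");
  }
  return [...lenses];
}
```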
+
+ ### Step 4: Read prompt files
+
+ Read `references/shared-preamble.md` in this directory. For each active lens, read `references/lens-<name>.md`. Select the **Code Review Prompt** or **Plan Review Prompt** section based on stage.
+
+ ### Step 5: Fill variables and spawn
+
+ Fill `{{variable}}` placeholders in the shared preamble:
+ - **Required:** `{{lensName}}`, `{{lensVersion}}` (from frontmatter), `{{reviewStage}}`, `{{reviewArtifact}}`, `{{fileManifest}}`, `{{ticketDescription}}`
+ - **Defaults:** `{{projectRules}}` (empty if no RULES.md), `{{knownFalsePositives}}` (empty), `{{activationReason}}` ("core lens" or file signal), `{{findingBudget}}` (10), `{{confidenceFloor}}` (0.6), `{{artifactType}}` ("diff" or "plan")
+ - **Lens-specific:** `{{hotPaths}}` (performance only), `{{scannerFindings}}` (security only)
+ - **Ticket description:** If you have a ticket context (from `/story`, ticket in progress), use it. Otherwise: `"Manual review -- [brief description]"`
+
+ Each lens prompt ends with `Append: ## Diff to review\n\n{{reviewArtifact}}` (or `## Plan to review`). This means: when building the final prompt, append the artifact (diff or plan text) at the end after the lens instructions. Replace `{{reviewArtifact}}` with the actual content.
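The placeholder filling amounts to a simple substitution, sketched here (hypothetical helper; the real templates in `references/` may use a different mechanism):

```typescript
// Illustrative {{variable}} substitution: explicit values win, then the
// documented defaults (findingBudget 10, confidenceFloor 0.6), then empty.
function fillTemplate(
  template: string,
  vars: Record<string, string>,
  defaults: Record<string, string> = { findingBudget: "10", confidenceFloor: "0.6" }
): string {
  return template.replace(/\{\{(\w+)\}\}/g, (_m: string, name: string) =>
    vars[name] ?? defaults[name] ?? ""
  );
}
```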
+
+ Spawn all lens agents in parallel. Each gets: filled preamble + lens prompt section + artifact. Model from frontmatter (sonnet or opus).
+
+ ### Steps 6-9: Simplified synthesis
+
+ Read `references/merger.md` and `references/judge.md`. Run merger then judge as in Path A. Since the MCP synthesize/judge tools aren't available, use these blocking rules when reading the judge output:
+ - Critical severity + confidence >= 0.8 = blocking
+ - Everything else = non-blocking
+ - "Blocking" requires a concrete failure scenario (crash, data corruption) -- not "missing test"
+
+ ### Step 10: Present output using standardized format below.
+
+ ### Shortcut for small reviews
+
+ If only 1-2 lenses activate and total findings < 5, skip the merger/judge and present findings directly. The synthesis pipeline adds value with 3+ lenses producing overlapping findings.
+
+ ---
+
+ ## Error Handling
+
+ - **Lens returns malformed output:** Drop it, note the lens in "lensesFailed"
+ - **Merger fails:** Pass raw (unmerged) findings to the judge
+ - **Judge fails:** Compute verdict deterministically: any blocking + high-confidence finding = revise, else approve
+ - **Core lens fails (security, error-handling, clean-code):** Never approve -- maximum verdict is "revise"
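The judge-failure and core-lens-failure rules combine into one deterministic fallback, sketched here (illustrative shapes, not the package's actual `SynthesisResult`):

```typescript
interface RawFinding {
  blocking: boolean;
  confidence: number;
  lens: string;
}

// Deterministic fallback when the judge agent fails: any high-confidence
// blocker forces "revise"; a failed core lens caps the verdict at "revise"
// because coverage is incomplete.
function fallbackVerdict(
  findings: RawFinding[],
  coreLensFailed: boolean
): "approve" | "revise" {
  const hasBlocker = findings.some((f) => f.blocking && f.confidence >= 0.8);
  return hasBlocker || coreLensFailed ? "revise" : "approve";
}
```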
+
+ ---
+
+ ## Acknowledged Deferrals
+
+ After each round, classify findings you received:
+ - **"I'll fix this"** -> fixed (verify next round)
+ - **"Out of scope / architectural"** -> deferred (pass issueKey to `priorDeferrals` next round, file as issue)
+ - **"I disagree"** -> contested (pass to `priorDeferrals`, adds to knownFalsePositives)
+
+ This prevents the same findings from being re-reported across rounds.
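Deriving the next round's `priorDeferrals` from these classifications can be sketched as (hypothetical helper; the issueKey strings follow the Step 3 examples):

```typescript
type Ack = "fixed" | "deferred" | "contested";

// Deferred and contested findings are carried forward so they are not
// re-reported; fixed findings are dropped and verified fresh next round.
function nextDeferrals(acks: Record<string, Ack>): string[] {
  return Object.entries(acks)
    .filter(([, ack]) => ack !== "fixed")
    .map(([issueKey]) => issueKey);
}
```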
+
+ ---
+
+ ## Standardized Output Format
+
+ Every lens review produces this structure, regardless of invocation path:
+
+ ```markdown
+ ## Multi-Lens Review
+
+ **Verdict: APPROVE | REVISE | REJECT**
+ _One sentence explaining what drove the verdict._
+
+ **Lenses:** clean-code, security, error-handling, concurrency, test-quality (5 ran, 3 skipped)
+ **Round:** R3 | **Recommend next round:** No -- blocking at 0 for 2 rounds, important stable
+
+ ### Blocking Findings
+
+ 1. **[severity] description** (lens, confidence, scope: inline/pr/architectural)
+ File: `path/to/file.ts:42` | Origin: introduced
+ Evidence: `code snippet`
+ Fix: actionable recommendation
+
+ ### Non-Blocking Findings
+
+ | # | Severity | Lens | File | Finding | Confidence | Scope | Origin |
+ |---|----------|------|------|---------|------------|-------|--------|
+ | 3 | minor | clean-code | src/foo.ts:15 | Function exceeds 80 lines | 0.92 | inline | introduced |
+
+ ### Pre-Existing Issues Discovered
+
+ _Found in surrounding code, not introduced by this diff. Filed as issues, excluded from verdict._
+
+ | # | Severity | File | Finding | Filed As |
+ |---|----------|------|---------|----------|
+ | P1 | high | src/stages/plan.ts:42 | Unguarded loadProject | ISS-089 |
+
+ ### Tensions
+
+ | Lens A | Lens B | File | Tradeoff |
+ |--------|--------|------|----------|
+ | security | performance | src/api.ts:42 | Security wants validation; performance flags overhead |
+
+ ### Convergence
+
+ | Round | Verdict | Blocking | Important | New Code |
+ |-------|---------|----------|-----------|----------|
+ | R1 | revise | 5 | 9 | -- |
+ | R2 | approve | 0 | 3 | 1 file, 12 lines |
+
+ ### Cleared
+
+ - Path traversal: safe (IDs matched against in-memory state)
+ - Session hijacking: safe (targetWork only on start action)
+
+ ### JSON Summary
+
+ { "verdict": "approve", "recommendNextRound": false, "blocking": 0, "nonBlocking": 3, "preExisting": 1, "findings": [...] }
+ ```
+
+ ### Verdict Rules
 
- Individual lens prompts are in `references/` in this directory. Each has a version in its filename (e.g., `lens-security-v1.md`). The orchestrator reads these and injects context variables.
+ - **APPROVE** -- No blocking findings, OR all findings are non-blocking. "Approve with findings" is valid. Do NOT use REVISE just because findings exist.
+ - **REVISE** -- At least one finding has `blocking: true` after calibration. These must be addressed before merge.
+ - **REJECT** -- Critical blocking finding with high confidence, or fundamental design flaw requiring replanning.
+ - **"Blocking" requires a concrete failure scenario** -- "crashes the session" is blocking. "Missing test" is NOT blocking. "Missing ?? [] guard" is NOT blocking if Zod defaults protect it.
+ - **Pre-existing findings** (`origin: "pre-existing"`) and **architectural-scope findings** are excluded from the verdict. File them as issues.