opencode-swarm 7.58.0 → 7.58.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (36)
  1. package/.opencode/skills/brainstorm/SKILL.md +142 -0
  2. package/.opencode/skills/clarify/SKILL.md +103 -0
  3. package/.opencode/skills/clarify-spec/SKILL.md +58 -0
  4. package/.opencode/skills/codebase-review-swarm/INSTALL.md +75 -0
  5. package/.opencode/skills/codebase-review-swarm/README.md +44 -0
  6. package/.opencode/skills/codebase-review-swarm/SKILL.md +65 -0
  7. package/.opencode/skills/codebase-review-swarm/agents/openai.yaml +6 -0
  8. package/.opencode/skills/codebase-review-swarm/assets/jsonl-schemas.md +239 -0
  9. package/.opencode/skills/codebase-review-swarm/assets/review-report-template.md +244 -0
  10. package/.opencode/skills/codebase-review-swarm/references/compatibility-and-research-notes.md +25 -0
  11. package/.opencode/skills/codebase-review-swarm/references/full-v7-source-prompt.md +2373 -0
  12. package/.opencode/skills/codebase-review-swarm/references/review-protocol-v8.2.md +310 -0
  13. package/.opencode/skills/codebase-review-swarm/scripts/init-review-run.py +134 -0
  14. package/.opencode/skills/codebase-review-swarm/scripts/validate-skill-package.py +62 -0
  15. package/.opencode/skills/consult/SKILL.md +16 -0
  16. package/.opencode/skills/council/SKILL.md +147 -0
  17. package/.opencode/skills/critic-gate/SKILL.md +59 -0
  18. package/.opencode/skills/deep-dive/SKILL.md +142 -0
  19. package/.opencode/skills/design-docs/SKILL.md +81 -0
  20. package/.opencode/skills/discover/SKILL.md +20 -0
  21. package/.opencode/skills/execute/SKILL.md +191 -0
  22. package/.opencode/skills/issue-ingest/SKILL.md +64 -0
  23. package/.opencode/skills/phase-wrap/SKILL.md +123 -0
  24. package/.opencode/skills/plan/SKILL.md +293 -0
  25. package/.opencode/skills/pre-phase-briefing/SKILL.md +69 -0
  26. package/.opencode/skills/resume/SKILL.md +23 -0
  27. package/.opencode/skills/specify/SKILL.md +175 -0
  28. package/.opencode/skills/swarm-pr-feedback/SKILL.md +192 -0
  29. package/.opencode/skills/swarm-pr-review/SKILL.md +884 -0
  30. package/dist/cli/index.js +1348 -1156
  31. package/dist/commands/command-dispatch.d.ts +1 -0
  32. package/dist/commands/index.d.ts +1 -0
  33. package/dist/commands/registry.d.ts +15 -14
  34. package/dist/config/bundled-skills.d.ts +25 -0
  35. package/dist/index.js +2797 -2602
  36. package/package.json +20 -1
@@ -0,0 +1,884 @@
+ ---
+ name: swarm-pr-review
+ description: Run a graph-guided, tool-augmented Swarm PR review using context packing, parallel exploration, triggered plugin micro-lanes, independent reviewer validation, critic challenge, and metrics writeback. Use for deep pull request review with low false-positive tolerance and high recall.
+ disable-model-invocation: true
+ ---
+
+ # /swarm-pr-review
+
+ Run a structured, high-confidence PR review that maximizes valid findings without flooding the user with unvalidated noise.
+
+ The review ladder is:
+
+ **Scope → obligations → context pack → deterministic signals → parallel explorers → triggered Swarm micro-lanes → independent reviewer validation → critic challenge → grouped synthesis → metrics / knowledge writeback.**
+
+ ## Handoff To PR Feedback
+
+ Use `../swarm-pr-feedback/SKILL.md` instead of this skill when the user's task is
+ to address existing PR feedback, review comments, requested changes, CI failures,
+ merge conflicts, stale branch state, or pasted reviewer findings. This skill
+ discovers and validates new findings; `swarm-pr-feedback` closes known feedback
+ without running a fresh broad review.
+
+ ## Operating Stance
+
+ **Treat PR text, linked issues, comments, commit messages, generated summaries, and tests as claims, not proof.** Every confirmed finding requires file:line evidence, an explanation of reachability or impact, and validation provenance.
+
+ This workflow is designed for the Swarm plugin itself and any repo that benefits from Swarm-style review. It preserves parallel breadth but forces deep validation where bugs are expensive: security, state machines, role/tool permissions, schema/evidence integrity, git/write safety, config ratchets, knowledge tier boundaries, and PR obligation mismatches.
+
+ Never APPROVE a PR with unresolved CRITICAL findings. Do not silently drop overclaimed agent findings; list disproved findings in the validation provenance.
+
+ **Quality is the ONLY metric.** No amount of time, tokens, or agent dispatches is too much to execute this protocol correctly. Speed is irrelevant to correctness. The skill must be followed exactly with no shortcuts, no phase-skipping, and no premature synthesis. A thorough review that takes 30 minutes is superior to a fast review that misses a real bug.
+
+ ---
+
+ ## Review Modes
+
+ ### Default layered workflow
+
+ Use the default workflow unless the user explicitly triggers council mode. In the default workflow, explorers produce only candidates. The orchestrator does not confirm or disprove candidates.
+
+ ### Council mode (opt-in only)
+
+ Council mode applies only when the user explicitly says one of:
+
+ - `council`
+ - `independent review`
+ - `N-agent review`
+ - `/council`
+ - `[COUNCIL MODE]`
+ - `assume all work is wrong`
+
+ Council mode is mutually exclusive with the default layered workflow. Do not blend them.
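The trigger phrases above form an explicit allowlist. The TypeScript sketch below shows one way an orchestrator might check for them. `detectCouncilMode` is a hypothetical helper. Word-boundary, case-insensitive matching is an assumption, not documented Swarm behavior.

```typescript
// Hypothetical helper, not a Swarm API: does a user message opt into council mode?
// Word-boundary, case-insensitive matching; "N-agent review" accepts a number or a literal N.
const COUNCIL_TRIGGERS: readonly RegExp[] = [
  /\bcouncil\b/i, // covers "council", "/council", and "[COUNCIL MODE]"
  /\bindependent review\b/i,
  /\b(?:\d+|n)-agent review\b/i, // e.g. "N-agent review", "3-agent review"
  /\bassume all work is wrong\b/i,
];

function detectCouncilMode(userMessage: string): boolean {
  return COUNCIL_TRIGGERS.some((pattern) => pattern.test(userMessage));
}

// Usage
console.log(detectCouncilMode("Run a 3-agent review of PR 42")); // true
console.log(detectCouncilMode("Review the councillor module")); // false
```

Plain pattern matching cannot distinguish "use council" from "do not use council". A real orchestrator would still need to confirm intent before leaving the default workflow.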
+
+ ---
+
+ ## Anti-Self-Review Rule
+
+ The main thread / orchestrator MUST NOT classify, confirm, disprove, or judge explorer candidates in the default workflow.
+
+ The orchestrator may:
+
+ - determine scope,
+ - build or request the context pack,
+ - launch explorers and triggered micro-lanes,
+ - route candidates to reviewers,
+ - route reviewer-confirmed findings to critics,
+ - group validated findings,
+ - prepare the final report.
+
+ The orchestrator MUST NOT:
+
+ - re-read a candidate's target code to decide if it is valid,
+ - silently downgrade or discard an explorer candidate,
+ - treat tool output as a confirmed finding,
+ - report a finding that no reviewer validated.
+
+ If the orchestrator catches itself validating code, it must stop and delegate validation to a reviewer subagent.
+
+ Exception: in explicit Council mode only, the main thread may act as the independent reviewer as described in the Council Mode section. Prefer a reviewer subagent when available.
+
+ ---
+
+ ## Scope Detection
+
+ Determine review scope using this priority:
+
+ 1. explicit user-provided PR URL, PR number, commit, branch, or file scope,
+ 2. current feature branch diff vs `origin/main`, `main`, `origin/master`, or `master`,
+ 3. staged changes,
+ 4. latest commit,
+ 5. user-specified files or directories.
+
+ Record:
+
+ - base ref,
+ - head ref,
+ - commit range,
+ - changed files,
+ - deleted files,
+ - generated files,
+ - lockfiles,
+ - test files,
+ - docs/config/schema files,
+ - whether the working tree is dirty.
+
+ If scope cannot be determined, review the narrowest safe scope available and state the limitation.
+
+ ### Pre-flight git ref availability
+
+ Before launching explorers (Phase 3), confirm the PR branch refs are available:
+ - If `head_ref` is a remote branch that is not checked out locally, fetch it via `git fetch origin <head_ref>`.
+ - **Check out the head branch locally.** Explorer agents read files from the working tree, not from git history. Passing the commit range in the delegation prompt is not sufficient because `Read` / `Glob` / `Grep` tools operate on the filesystem. Without a checkout, explorers silently read the base branch's version of changed files and produce invalid candidates. **Before checking out, verify the working tree is clean (`git status --porcelain`). If uncommitted changes exist, stash them or abort the checkout to prevent data loss.**
+ - Explicitly pass the commit range (`base_ref..head_ref`) in every explorer delegation so explorers have the revision context for `git show` commands if they need to inspect specific versions.
+
+ If refs cannot be fetched or checked out, state the limitation in the context pack.
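The pre-flight rules reduce to a small decision:

- refuse to check out over uncommitted changes,
- fetch only when the head ref is not local,
- then check out.

Below is a minimal sketch of that decision. It assumes the caller runs `git status --porcelain` and executes the returned commands itself. `planHeadCheckout` and `CheckoutPlan` are hypothetical names.

```typescript
// Hypothetical pre-flight planner (not a Swarm API). It only builds git argv
// arrays; running git and capturing `git status --porcelain` is left to the caller.
type CheckoutPlan =
  | { kind: "ready"; commands: string[][] }
  | { kind: "abort"; reason: string };

function planHeadCheckout(
  porcelainOutput: string,
  headRef: string,
  headIsLocal: boolean,
): CheckoutPlan {
  // Every non-empty porcelain line is a staged, unstaged, or untracked entry.
  // Treating untracked files as dirty is a conservative assumption.
  const entries = porcelainOutput.split("\n").filter((line) => line.trim() !== "");
  if (entries.length > 0) {
    return {
      kind: "abort",
      reason: `working tree has ${entries.length} uncommitted entries; stash them or skip the checkout`,
    };
  }
  const commands: string[][] = [];
  if (!headIsLocal) {
    commands.push(["git", "fetch", "origin", headRef]);
  }
  // With a fetched origin/<headRef> and no local branch of that name,
  // `git checkout <headRef>` creates a local tracking branch.
  commands.push(["git", "checkout", headRef]);
  return { kind: "ready", commands };
}
```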
+
+ ---
+
+ # Default Review Workflow
+
+ ## Phase 0: Context Pack and Review Signal Collection
+
+ Before launching explorers, build a compact `swarm-pr-review-context` in scratch or as a local artifact if file writes are allowed.
+
+ The context pack must include, when available:
+
+ ```json
+ {
+   "scope": {
+     "base_ref": "...",
+     "head_ref": "...",
+     "commit_range": "...",
+     "changed_files": [],
+     "changed_hunks": [],
+     "public_api_changes": [],
+     "deleted_or_renamed_files": [],
+     "generated_files": []
+   },
+   "pr_metadata": {
+     "title": "...",
+     "body_claims": [],
+     "checkboxes": [],
+     "linked_issues": [],
+     "review_comments": [],
+     "commit_messages": []
+   },
+   "obligations": [],
+   "repo_graph": {
+     "source": ".swarm/repo-graph.json or fallback search",
+     "changed_symbols": [],
+     "callers": [],
+     "callees": [],
+     "imports": [],
+     "exports": [],
+     "sibling_implementations": []
+   },
+   "deterministic_signals": {
+     "ci": [],
+     "tests": [],
+     "coverage_delta": [],
+     "lint_typecheck_build": [],
+     "security_scanners": [],
+     "dependency_audit": [],
+     "secrets_scan": [],
+     "mutation_testing": []
+   },
+   "swarm_artifacts": {
+     "evidence_bundles": [],
+     "knowledge_hits": [],
+     "phase_state": [],
+     "metrics": []
+   },
+   "risk_triggers": []
+ }
+ ```
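Below is a sketch of a completeness check that a Swarm runtime could apply to the context pack before dispatching explorers. `contextPackProblems` is hypothetical. Treating `base_ref`, `head_ref`, `commit_range`, and `changed_files` as the required scope fields is an assumption drawn from the Scope Detection record list.

```typescript
// Hypothetical completeness check for the context pack template (not a Swarm API).
// "When available" is modelled as "section present, possibly empty", so a missing
// signal shows up as [] rather than as a silently absent key.
const REQUIRED_SECTIONS = [
  "scope",
  "pr_metadata",
  "obligations",
  "repo_graph",
  "deterministic_signals",
  "swarm_artifacts",
  "risk_triggers",
] as const;

// Assumed minimum scope fields.
const REQUIRED_SCOPE_FIELDS = ["base_ref", "head_ref", "commit_range", "changed_files"] as const;

function contextPackProblems(pack: Record<string, unknown>): string[] {
  const problems: string[] = [];
  for (const section of REQUIRED_SECTIONS) {
    if (!(section in pack)) problems.push(`missing section: ${section}`);
  }
  const scope = pack["scope"];
  if (typeof scope === "object" && scope !== null) {
    for (const field of REQUIRED_SCOPE_FIELDS) {
      if (!(field in scope)) problems.push(`missing scope.${field}`);
    }
  }
  if ("obligations" in pack && !Array.isArray(pack["obligations"])) {
    problems.push("obligations must be an array");
  }
  return problems;
}
```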
+
+ ### Context pack rules
+
+ - Diff-only review is allowed for quick orientation, but not enough to confirm nontrivial findings.
+ - For every changed production file, identify at least one caller, consumer, import path, route entrypoint, or the reason none exists.
+ - If `.swarm/repo-graph.json` exists, use it to seed impact cones.
+ - If no repo graph exists, build a shallow impact cone using imports, exports, symbol search, route registration, CLI registration, or test references.
+ - Pull in relevant `.swarm/evidence/`, `.swarm/state`, `.swarm/knowledge`, or hive/project knowledge entries when present.
+ - Historical knowledge may guide candidate generation but cannot confirm a finding by itself.
+ - Mark stale, quarantined, or cross-project knowledge as advisory until independently verified in this repo.
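When `.swarm/repo-graph.json` is missing, the shallow impact cone can be as simple as one hop through an import map. The sketch below assumes such a map has already been built, for example from import statements, route or CLI registration, or test references. `shallowImpactCone` is hypothetical, and the file names in the test are illustrative.

```typescript
// Hypothetical one-hop impact cone over an import map (file -> files it imports).
// Building the map itself is assumed to happen elsewhere.
type ImportMap = Record<string, readonly string[]>;

interface ImpactEntry {
  file: string;
  consumers: string[]; // files that import the changed file (one hop)
  dependencies: string[]; // files the changed file imports
  note?: string; // recorded reason when no consumer exists
}

function shallowImpactCone(changedFiles: readonly string[], imports: ImportMap): ImpactEntry[] {
  return changedFiles.map((file) => {
    const consumers = Object.keys(imports).filter(
      (candidate) => candidate !== file && (imports[candidate] ?? []).includes(file),
    );
    const entry: ImpactEntry = { file, consumers, dependencies: (imports[file] ?? []).slice() };
    if (consumers.length === 0) {
      // The rule requires a stated reason, so flag the gap instead of skipping it.
      entry.note = "no importer found; check entrypoints, route/CLI registration, or tests";
    }
    return entry;
  });
}
```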
186
+
187
+ ---
188
+
189
+ ## Phase 1: Intent Reconstruction / Obligation Extraction
190
+
191
+ Reconstruct what the PR is obligated to deliver before looking for bugs.
192
+
193
+ Use deterministic precedence, highest to lowest:
194
+
195
+ 1. PR checkboxes and acceptance criteria,
196
+ 2. linked issues / tickets,
197
+ 3. explicit user request in the current conversation,
198
+ 4. commit scopes and commit messages,
199
+ 5. test names and test assertions,
200
+ 6. interface diff / exported API changes,
201
+ 7. changelog, README, migration, or docs edits,
202
+ 8. LLM synthesis only when no higher-precedence source exists.
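Deterministic precedence means the same inputs always produce the same ordering. Below is a minimal TypeScript encoding of the list above. It assumes each extracted claim is already tagged with its source. The source slugs and `orderClaims` are hypothetical.

```typescript
// Hypothetical encoding of the precedence list (1 = highest).
type ObligationSource =
  | "pr_checkbox" | "linked_issue" | "user_request" | "commit_message"
  | "test" | "interface_diff" | "docs" | "llm_synthesis";

const SOURCE_RANK: Record<ObligationSource, number> = {
  pr_checkbox: 1, linked_issue: 2, user_request: 3, commit_message: 4,
  test: 5, interface_diff: 6, docs: 7, llm_synthesis: 8,
};

interface SourcedClaim {
  source: ObligationSource;
  claim: string;
}

// LLM synthesis survives only when no higher-precedence source produced a claim.
function orderClaims(claims: readonly SourcedClaim[]): SourcedClaim[] {
  const hasGroundedSource = claims.some((c) => c.source !== "llm_synthesis");
  return claims
    .filter((c) => !(hasGroundedSource && c.source === "llm_synthesis"))
    .sort((a, b) => SOURCE_RANK[a.source] - SOURCE_RANK[b.source]);
}
```

`Array.prototype.sort` is stable in current JavaScript engines, so claims from the same source keep their extraction order.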
+
+ Output an obligation list:
+
+ ```text
+ O-001 | source | claim | affected files/symbols | status: UNVERIFIED | evidence refs: []
+ ```
+
+ For each obligation, record:
+
+ - source,
+ - exact claim,
+ - affected files or symbols,
+ - verification status: `UNVERIFIED → IN_PROGRESS → MET / PARTIALLY_MET / NOT_MET / UNVERIFIABLE`,
+ - linked finding ID when unmet,
+ - reason if unverifiable.
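The status field is a small state machine. The sketch below enforces two things:

- the `UNVERIFIED → IN_PROGRESS → terminal` order,
- the required linked finding or reason for each terminal state.

Reading "unmet" as covering both `NOT_MET` and `PARTIALLY_MET` is an assumption. `transition` is a hypothetical helper.

```typescript
// Hypothetical guard for the obligation lifecycle (not a Swarm API).
type ObligationStatus =
  | "UNVERIFIED" | "IN_PROGRESS" | "MET" | "PARTIALLY_MET" | "NOT_MET" | "UNVERIFIABLE";

const NEXT: Record<ObligationStatus, readonly ObligationStatus[]> = {
  UNVERIFIED: ["IN_PROGRESS"],
  IN_PROGRESS: ["MET", "PARTIALLY_MET", "NOT_MET", "UNVERIFIABLE"],
  MET: [], PARTIALLY_MET: [], NOT_MET: [], UNVERIFIABLE: [], // terminal
};

interface Obligation {
  id: string; // e.g. "O-001"
  source: string;
  claim: string;
  affected: string[];
  status: ObligationStatus;
  evidenceRefs: string[];
  linkedFinding?: string; // required when unmet (assumed: NOT_MET or PARTIALLY_MET)
  unverifiableReason?: string; // required when UNVERIFIABLE
}

function transition(o: Obligation, to: ObligationStatus, extra: Partial<Obligation> = {}): Obligation {
  if (!NEXT[o.status].includes(to)) {
    throw new Error(`${o.id}: illegal transition ${o.status} -> ${to}`);
  }
  const next: Obligation = { ...o, ...extra, status: to };
  if ((to === "NOT_MET" || to === "PARTIALLY_MET") && !next.linkedFinding) {
    throw new Error(`${o.id}: ${to} requires a linked finding ID`);
  }
  if (to === "UNVERIFIABLE" && !next.unverifiableReason) {
    throw new Error(`${o.id}: UNVERIFIABLE requires a reason`);
  }
  return next;
}
```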
+
+ Tests are claims. A passing or added test does not prove the obligation unless the reviewer inspects the assertion strength and relevant code path.
+
+ ### Quantitative claim verification
+
+ PR body numerical claims (test counts, coverage percentages, assertion counts, performance benchmarks) are obligations, not proof. For each quantitative claim:
+
+ 1. Extract the claim and its source (PR body, comment, commit message).
+ 2. Verify against actual tool output or CI artifacts when available.
+ 3. If the claim cannot be independently verified, mark the obligation `UNVERIFIABLE` with reason.
+ 4. If the claim is disproved by evidence, create a finding linking the discrepancy.
+
+ Common patterns to verify:
+ - "N tests pass" → count actual test results from CI logs or test runner output
+ - "N% coverage" → compare against coverage report
+ - "No regressions" → verify against test runner failure count
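For the "N tests pass" pattern, verification compares the claimed number with the count reported by CI. Below is a minimal sketch. It assumes the actual count has already been parsed from CI logs or runner output, with `null` meaning unavailable. `verifyTestCountClaim` is hypothetical and covers only this one claim shape.

```typescript
// Hypothetical check for one claim shape ("N tests pass"); not a Swarm API.
// `actualPassing` is assumed to come from CI logs or test runner output.
type ClaimVerdict =
  | { status: "MET"; claimed: number; actual: number }
  | { status: "NOT_MET"; claimed: number; actual: number } // disproved: create a finding
  | { status: "UNVERIFIABLE"; reason: string };

const TEST_COUNT_CLAIM = /(\d[\d,]*)\s+tests?\s+pass/i;

function verifyTestCountClaim(prBody: string, actualPassing: number | null): ClaimVerdict | null {
  const match = TEST_COUNT_CLAIM.exec(prBody);
  if (match === null) return null; // no claim of this shape in the text
  const claimed = Number(match[1].replace(/,/g, "")); // "1,204" -> 1204
  if (actualPassing === null) {
    return { status: "UNVERIFIABLE", reason: "no CI log or runner output available" };
  }
  return claimed === actualPassing
    ? { status: "MET", claimed, actual: actualPassing }
    : { status: "NOT_MET", claimed, actual: actualPassing };
}
```

Exact equality is a deliberate choice here. A mismatch in either direction becomes a finding, and a reviewer can then decide whether it matters.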
+
+ ---
+
+ ## Phase 2: Deterministic Signal Ingestion
+
+ Ingest deterministic signals as candidate generators. They are never final findings.
+
+ Use available local artifacts first. Run safe read-only or standard project validation commands only when appropriate for the environment.
+
+ Candidate signal sources include:
+
+ - CI failures and logs,
+ - test failures,
+ - coverage delta,
+ - lint/typecheck/build output,
+ - `git diff --check`,
+ - dependency audit output,
+ - lockfile diff,
+ - CodeQL alerts,
+ - Semgrep or SAST findings,
+ - secrets scan findings,
+ - license scan findings,
+ - mutation testing output,
+ - package manager warnings,
+ - generated schema diffs.
+
+ Record each signal as:
+
+ ```text
+ [TOOL_CANDIDATE] | tool | severity | file:line | claim | raw_signal_summary | confidence
+ ```
+
+ Tool candidate rules:
+
+ - Confirm reachability before reporting.
+ - Confirm PR-introducedness before reporting as a PR blocker.
+ - Confirm that a framework, schema, middleware, caller guard, or test isolation rule does not already mitigate it.
+ - Do not report scanner output verbatim without reviewer validation.
+ - Redact secrets; never paste raw credentials into the final output.
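Below is a sketch of serializing a `[TOOL_CANDIDATE]` record with redaction applied before anything reaches the report. `formatToolCandidate` and `redact` are hypothetical. The three patterns are illustrative only: GitHub classic `ghp_` tokens, AWS access key IDs, and bearer header values. A real pipeline should reuse the secrets scanner's own matches.

```typescript
// Hypothetical [TOOL_CANDIDATE] serializer with a best-effort redaction pass.
// The patterns are illustrative, not a complete secret scanner.
interface ToolCandidate {
  tool: string;
  severity: "LOW" | "MEDIUM" | "HIGH" | "CRITICAL";
  location: string; // file:line
  claim: string;
  rawSignalSummary: string;
  confidence: "LOW" | "MEDIUM" | "HIGH";
}

const SECRET_RULES: ReadonlyArray<[RegExp, string]> = [
  [/\bghp_[A-Za-z0-9]{36}\b/g, "[REDACTED]"], // GitHub classic personal access token
  [/\bAKIA[0-9A-Z]{16}\b/g, "[REDACTED]"], // AWS access key ID
  [/\b(Bearer\s+)[A-Za-z0-9._~+\/-]+=*/gi, "$1[REDACTED]"], // keep "Bearer ", drop the token
];

function redact(text: string): string {
  return SECRET_RULES.reduce((acc, [pattern, replacement]) => acc.replace(pattern, replacement), text);
}

function formatToolCandidate(c: ToolCandidate): string {
  // Free-text fields are redacted; structured fields are emitted as-is.
  return [
    "[TOOL_CANDIDATE]",
    c.tool,
    c.severity,
    c.location,
    redact(c.claim),
    redact(c.rawSignalSummary),
    c.confidence,
  ].join(" | ");
}
```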
+
+ ---
+
+ ## Phase 3: Parallel Base Explorer Lanes
+
+ Launch all base lanes in parallel in a single message with multiple Agent tool calls when the environment supports it (`run_in_background: true`). Use `Explore` subagents for exploration.
+
+ If the Agent tool is unavailable, simulate isolated passes. Do not let one lane's conclusions bias another lane.
+
+ **task_id uniqueness for parallel dispatches:** When dispatching explorer lanes in parallel, re-running a lane, or re-dispatching a failed lane, apply these rules:
+ - For Agent tools that require caller-supplied `task_id` values, every parallel explorer lane invocation and retry MUST use a unique ID across the review session, including a lane and attempt suffix (e.g. `pr_review_explore_lane1_attempt2`). Never reuse a prior `task_id` unless intentionally resuming that exact lane.
+ - If the runtime auto-generates `task_id` values (resume mode), omit the `task_id` parameter rather than fabricating one.
+ - Do not use the same `task_id` across concurrent lane dispatches; schema validation rejects duplicate `task_id` values.
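For runtimes that require caller-supplied IDs, the uniqueness rules can be enforced with a per-session registry. The sketch below works under that assumption. `TaskIdRegistry` is a hypothetical class, and it does not apply when the runtime generates IDs itself.

```typescript
// Hypothetical per-session allocator for caller-supplied task_id values.
// The naming scheme follows the example in the rules above.
class TaskIdRegistry {
  private readonly used = new Set<string>();

  private static idFor(lane: number, attempt: number): string {
    return `pr_review_explore_lane${lane}_attempt${attempt}`;
  }

  // Reserve an ID; reusing one in the same session is an error.
  allocate(lane: number, attempt: number): string {
    const id = TaskIdRegistry.idFor(lane, attempt);
    if (this.used.has(id)) {
      throw new Error(`task_id ${id} was already used in this review session`);
    }
    this.used.add(id);
    return id;
  }

  // Lowest attempt number not yet used for a lane (for retries).
  nextAttempt(lane: number): number {
    let attempt = 1;
    while (this.used.has(TaskIdRegistry.idFor(lane, attempt))) attempt += 1;
    return attempt;
  }
}

// Usage: six parallel lanes, then a retry of lane 2.
const registry = new TaskIdRegistry();
const laneIds = [1, 2, 3, 4, 5, 6].map((lane) => registry.allocate(lane, 1));
const retryId = registry.allocate(2, registry.nextAttempt(2)); // "pr_review_explore_lane2_attempt2"
```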
+
+ Explorers optimize for recall. Over-reporting is expected. Explorers produce candidates only.
+
+ | Lane | Focus | Required checks |
+ |---|---|---|
+ | Lane 1: Correctness and edge cases | Logic errors, null/undefined handling, incorrect operators, async ordering, races, off-by-one, error paths | input domain, nullability, async/await, loop termination, exception behavior, backward compatibility |
+ | Lane 2: Security and trust boundaries | Injection, authz/authn bypass, SSRF, path traversal, secret exposure, unsafe deserialization, prompt injection | untrusted input sources, sanitization, credential handling, permission boundary, private network access, output escaping |
+ | Lane 3: Dependencies and deployment safety | Import changes, version bumps, lockfile drift, breaking APIs, package scripts, runtime assumptions | lockfile consistency, new transitive deps, Node/Bun/runtime compatibility, platform assumptions, license red flags |
+ | Lane 4: Docs, intent, and drift | PR claims vs implementation, docs mismatch, migration/changelog gaps, stale examples | obligation mapping, changed behavior not documented, docs promising behavior not implemented |
+ | Lane 5: Tests and falsifiability | Weak assertions, missing edge tests, flaky patterns, mock leakage, fixture drift | assertion strength, tautology patterns (`expect(true).toBe(true)`, `expect(res).toBeDefined()` without further checks, `assertDoesNotThrow` wrapping trivial code), negative paths, isolation, deterministic timing, cross-platform path coverage |
+ | Lane 6: Performance and architecture | Complexity regressions, memory leaks, over-coupling, inefficient graph scans, global mutable state | algorithmic deltas, caching, resource lifecycle, state ownership, architectural boundary violations |
+
+ ### Explorer context contract
+
+ Every explorer must inspect or explicitly mark unavailable:
+
+ 1. the changed hunk,
+ 2. at least one caller, consumer, or downstream impact-cone node,
+ 3. at least one callee, dependency, or upstream assumption,
+ 4. at least one sibling implementation or prior pattern,
+ 5. the nearest relevant test or missing-test location,
+ 6. deterministic signal entries mapped to its files/symbols,
+ 7. relevant Swarm knowledge/evidence entries, if present,
+ 8. the commit range to analyze (`base_ref..head_ref`).
+
+ Explorer output format:
+
+ ```text
+ [CANDIDATE] | candidate_id | lane | severity | category | file:line | claim | evidence_summary | impact_context | confidence: LOW/MEDIUM/HIGH
+ ```
+
+ Explorers must not use `CONFIRMED`, `DISPROVED`, or `PRE_EXISTING`.
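Because explorer output is line-oriented, the orchestrator can reject malformed or overreaching candidates mechanically before routing. This checks format only; it does not judge the code. Below is a sketch. It assumes fields are separated by ` | ` and never contain that separator. `parseCandidate` is hypothetical, and rejecting any line that contains a reviewer-only label is deliberately strict.

```typescript
// Hypothetical validator for explorer [CANDIDATE] lines (not a Swarm API).
// Assumes fields are separated by " | " and no field contains that separator.
const CONFIDENCE_LEVELS = ["LOW", "MEDIUM", "HIGH"] as const;
type Confidence = (typeof CONFIDENCE_LEVELS)[number];

interface Candidate {
  candidateId: string;
  lane: string;
  severity: string;
  category: string;
  location: string; // file:line
  claim: string;
  evidenceSummary: string;
  impactContext: string;
  confidence: Confidence;
}

// Reviewer-only labels that explorers must never emit.
const RESERVED_LABELS = /\b(?:CONFIRMED|DISPROVED|PRE_EXISTING)\b/;

function isConfidence(value: string): value is Confidence {
  return (CONFIDENCE_LEVELS as readonly string[]).includes(value);
}

function parseCandidate(line: string): Candidate {
  const fields = line.split(" | ");
  if (fields.length !== 10 || fields[0] !== "[CANDIDATE]") {
    throw new Error(`malformed candidate line: ${line}`);
  }
  const [, candidateId, lane, severity, category, location, claim, evidenceSummary, impactContext, rawConfidence] =
    fields;
  if (RESERVED_LABELS.test(line)) {
    throw new Error(`${candidateId}: explorers must not use reviewer verdict labels`);
  }
  const confidence = rawConfidence.replace(/^confidence:\s*/, "");
  if (!isConfidence(confidence)) {
    throw new Error(`${candidateId}: invalid confidence "${rawConfidence}"`);
  }
  return { candidateId, lane, severity, category, location, claim, evidenceSummary, impactContext, confidence };
}
```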
+
+ ---
+
+ ## Phase 4: Triggered Swarm Plugin Micro-Lanes
+
+ After base lanes start, inspect the context pack risk triggers. Launch focused micro-lanes for triggered categories only. Do not launch irrelevant micro-lanes.
+
+ Each micro-lane receives:
+
+ - exact files and hunks in scope,
+ - related obligations,
+ - impact cone entries,
+ - relevant deterministic signals,
+ - related historical knowledge with quarantine/staleness status,
+ - expected invariants,
+ - output format as `[CANDIDATE]` only.
+
+ ### Swarm plugin risk trigger map
+
+ | Trigger in diff or context pack | Launch micro-lane | Invariants to check |
+ |---|---|---|
+ | `agents`, `prompts`, `templates`, prompt interpolation, role text | Architect prompt integrity | no scope escape, no system prompt leakage, safe `{{variable}}` interpolation, untrusted text isolated from instructions |
+ | `council`, `verdict`, `quorum`, `veto`, synthesis | Council orchestration | quorum math correct, veto enforced, evidence not lost, dissent preserved, no explorer result treated as final |
+ | `guardrail`, `gate`, `delegation`, `rate limit`, approval checks | Guardrail bypass paths | gates cannot be skipped, delegation cannot bypass policy, rate limits cannot be reset by user-controlled state |
+ | `schema`, `evidence`, JSONL, migrations, serializers | Evidence schema drift | backward compatibility, required fields preserved, version migration safe, malformed evidence rejected |
+ | `knowledge`, `curator`, `hive`, `quarantine`, memory | Knowledge base contract | project vs hive tiers not confused, quarantine honored, CRUD semantics stable, stale knowledge not injected as fact |
+ | `phase`, `state`, `plan`, `.swarm/state`, completion markers | Phase transition validation | ordering enforced, retro requirements handled, no premature completion, rollback safe |
+ | `model`, `role`, `prefix`, `tool`, agent config | Model-to-role mapping | role prefix enforced, tool permissions least-privilege, unauthorized tools impossible, model fallback safe |
+ | `config`, defaults, ratchet, locks, policy flags | Config ratchet semantics | once-enabled gates cannot silently disable, downgrade attempts detected, lock-state integrity preserved |
+ | `url`, `fetch`, `http`, GitHub PR/issue parsing, package fetch | URL sanitization and external fetch | scheme allowlist, credential stripping, private IP / localhost / metadata IP blocking, redirect handling, timeout safe |
+ | `git`, branch, checkout, reset, worktree, `.git` | Git safety | branch detection reliable, no unsafe `reset --hard`, `.git` protected, path normalization cross-platform, worktree state preserved |
+ | `shell`, `exec`, command parser, file writes, delete/move/copy | Shell/write authority and path containment | destructive commands gated, dry-run preferred, symlink/path escape blocked, writes scoped, command injection impossible |
+ | `test`, `bun`, mocks, fixtures, CI matrix | Test infrastructure | `bun:test` API correct, mock isolation, cross-platform paths, no hidden dependency on test order, fixtures reset |
+ | `metrics`, telemetry, logs, serialized traces | Metrics and evidence privacy | no secrets in logs, evidence reproducible, privacy preserved, counts cannot be gamed, metrics schema stable |
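Trigger detection can start as keyword matching over changed paths and hunk text. This sketch has three limits:

- it covers only five rows of the map above,
- it uses plain substring matching, which will over-trigger on common words,
- the keyword lists are paraphrased, not exhaustive.

`triggeredMicroLanes` is named for illustration only.

```typescript
// Hypothetical keyword matcher for a subset of the risk trigger map.
// Keywords are lowercase; matching is case-insensitive substring search.
const MICRO_LANE_TRIGGERS: Record<string, readonly string[]> = {
  "Council orchestration": ["council", "verdict", "quorum", "veto"],
  "Evidence schema drift": ["schema", "evidence", ".jsonl", "migration", "serializer"],
  "Git safety": ["checkout", "reset", "worktree", ".git/"],
  "URL sanitization and external fetch": ["fetch(", "http://", "https://", "new url("],
  "Test infrastructure": ["bun:test", "mock(", "fixture"],
};

function triggeredMicroLanes(changedPaths: readonly string[], hunkText: string): string[] {
  const haystack = [...changedPaths, hunkText].join("\n").toLowerCase();
  return Object.entries(MICRO_LANE_TRIGGERS)
    .filter(([, keywords]) => keywords.some((keyword) => haystack.includes(keyword)))
    .map(([lane]) => lane);
}
```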
+
+ Micro-lane output format:
+
+ ```text
+ [CANDIDATE] | candidate_id | micro_lane | severity | category | file:line | claim | invariant_violated | evidence_summary | confidence
+ ```
+
+ ---
+
+ ## Phase 5: Swarm-Native Verifier Routing
+
+ Use Swarm-native agents and artifacts when available. If exact agent names are unavailable, route the same task to the closest equivalent reviewer/critic role.
+
+ | Swarm verifier / artifact | When to use | Purpose |
+ |---|---|---|
+ | `critic_drift_verifier` | obligation-vs-code, docs-vs-code, phase/gate changes, schema/config changes | detect drift between stated behavior and actual implementation |
+ | `critic_hallucination_verifier` | external APIs, package claims, URLs, CLI flags, GitHub behavior, model/tool names | verify claims against source or mark as unverified |
+ | `curator_phase` | before exploration and after synthesis | retrieve relevant lessons; write back confirmed true positives / false positives |
+ | `test_engineer` | confirmed/borderline correctness, security, state, schema, or config findings | propose or run falsification probes and regression tests |
+ | `prm_scorer` | long or contentious reviews | score whether review trajectory is drifting toward unsupported speculation |
+ | `.swarm/repo-graph.json` | all nontrivial code changes | build impact cones and sibling-pattern checks |
+ | `.swarm/evidence/` | schema, phase, state, council, and guardrail changes | verify evidence compatibility and serialized provenance |
+ | `/swarm metrics` or stored metrics | after synthesis | record review quality and recurring false positives |
+
+ Verifier output is advisory until incorporated by the independent reviewer or critic.
+
+ ---
+
+ ## Phase 6: Independent Reviewer Confirmation
+
+ Route candidates to reviewer subagents. The reviewer must re-read the candidate's file:line evidence and relevant context pack entries directly.
+
+ ### Noise budget and universal validation
+
+ Before reviewer dispatch, the orchestrator may suppress candidates that meet ALL of the following conditions:
+ - purely stylistic without correctness, security, test, maintainability, or user-impact implications,
+ - exact duplicates of a candidate already queued for validation,
+ - explorer-stated confidence=LOW with zero structural evidence (no file:line, no code path, no invariant reference).
+
+ Every suppressed candidate must appear in the final report under "Suppressed Candidates" with the reason. Suppression without disclosure is a hard rule violation.
+
+ **All remaining candidates, regardless of severity, must be routed to independent reviewer validation.** Severity alone does not determine validation eligibility; it determines routing priority. A LOW-severity candidate with file:line evidence and a specific code path gets the same reviewer attention as a HIGH-severity candidate.
+
+ Candidates not routed to reviewers must be listed as UNVERIFIED with reason in the validation provenance. Do not silently drop them.
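The suppression test is a conjunction: a candidate is dropped before review only if all three conditions hold, and every drop is recorded. Below is a minimal sketch. It assumes the stylistic, duplicate, and evidence flags arrive with the explorer output rather than from the orchestrator re-reading code, which the Anti-Self-Review Rule forbids. `triage` and its field names are hypothetical.

```typescript
// Hypothetical pre-dispatch filter. Flags are assumed to come from explorer metadata.
interface QueuedCandidate {
  id: string;
  purelyStylistic: boolean; // no correctness/security/test/maintainability/user impact
  duplicateOf?: string; // id of an already-queued candidate with the same claim
  confidence: "LOW" | "MEDIUM" | "HIGH";
  hasStructuralEvidence: boolean; // file:line, code path, or invariant reference
}

interface Triage {
  toReviewers: QueuedCandidate[];
  suppressed: { id: string; reason: string }[]; // reported under "Suppressed Candidates"
}

function triage(candidates: readonly QueuedCandidate[]): Triage {
  const result: Triage = { toReviewers: [], suppressed: [] };
  for (const c of candidates) {
    const suppress =
      c.purelyStylistic &&
      c.duplicateOf !== undefined &&
      c.confidence === "LOW" &&
      !c.hasStructuralEvidence;
    if (suppress) {
      result.suppressed.push({
        id: c.id,
        reason: `stylistic duplicate of ${c.duplicateOf}, LOW confidence, no structural evidence`,
      });
    } else {
      result.toReviewers.push(c); // severity affects priority, never eligibility
    }
  }
  return result;
}
```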
+
+ ### Reviewer required checks
+
+ For each candidate, the reviewer must determine:
+
+ - exact file:line evidence,
+ - whether the issue is introduced by this PR or pre-existing,
+ - reachability from realistic execution paths,
+ - whether caller guards, schema validation, middleware, framework defaults, feature flags, or state-machine constraints mitigate it,
+ - whether tests cover the negative path,
+ - whether sibling files or docs must change together,
+ - whether the severity is justified,
+ - the smallest falsification probe that would prove or disprove it.
+
+ ### Reviewer classifications
+
+ | Classification | Meaning |
+ |---|---|
+ | `CONFIRMED` | Evidence is real, reachable or structurally proven, and introduced or exposed by this PR |
+ | `DISPROVED` | Candidate claim is incorrect, unreachable, mitigated, or based on a misunderstanding |
+ | `UNVERIFIED` | Available evidence is insufficient to determine validity |
+ | `PRE_EXISTING` | Issue exists on the base branch and is not materially worsened by this PR |
+
+ ### Evidence classifications
+
+ | Type | Definition |
+ |---|---|
+ | `STRUCTURALLY_PROVEN` | File:line evidence directly demonstrates the bug or violated invariant |
+ | `EXECUTION_PROVEN` | A test, trace, reproduction, or command demonstrates failure |
+ | `STATIC_TRACE_PROVEN` | Static analysis plus reviewed path/context demonstrates reachability |
+ | `PLAUSIBLE_BUT_UNVERIFIED` | Pattern suggests risk, but reachability or mitigation is unresolved |
+
+ Reviewer output format:
+
+ ```text
+ [REVIEWED] | candidate_id | classification | evidence_type | final_severity | introduced_by_pr: YES/NO/UNKNOWN | file:line | rationale | falsification_probe | reviewer_id
+ ```
+
+ `DISPROVED` findings must include the reason. `PRE_EXISTING` findings must include the base-branch evidence if available.
+
+ ---
+
+ ## Phase 7: Falsification Probe Requirement
+
+ Each confirmed nontrivial finding must include at least one falsification artifact:
+
+ - runnable failing command,
+ - proposed regression test,
+ - mutation that current tests fail to kill,
+ - static-analysis trace,
+ - minimal execution path,
+ - exact reason no runtime probe is available.
+
+ Nontrivial means any finding that affects correctness, security, state transitions, write authority, git safety, config, schema/evidence integrity, model/tool permissions, external fetches, persistence, or user-visible behavior.
+
+ A finding may still be reported without a runnable command if it is structurally proven, but the report must state why a runtime probe was not available.
+
+ ---
+
+ ## Phase 8: Critic Challenge
+
+ Route every reviewer-confirmed HIGH or CRITICAL finding to a critic. Also route borderline MEDIUM findings when they involve security, state machines, write authority, evidence integrity, model/tool permissions, git safety, or config ratchets.
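Critic routing can be expressed as a predicate over reviewer-confirmed findings. In this sketch the category slugs are hypothetical. "Borderline" is simplified to "any confirmed MEDIUM in a listed category", which is an assumption.

```typescript
// Hypothetical routing predicate for Phase 8 (not a Swarm API).
type Severity = "LOW" | "MEDIUM" | "HIGH" | "CRITICAL";

// Illustrative slugs for the sensitive areas named in the routing rule.
const CRITIC_SENSITIVE_CATEGORIES = new Set([
  "security", "state-machine", "write-authority", "evidence-integrity",
  "model-tool-permissions", "git-safety", "config-ratchet",
]);

interface ConfirmedFinding {
  id: string;
  severity: Severity;
  category: string;
}

function needsCritic(f: ConfirmedFinding): boolean {
  if (f.severity === "HIGH" || f.severity === "CRITICAL") return true; // always challenged
  return f.severity === "MEDIUM" && CRITIC_SENSITIVE_CATEGORIES.has(f.category);
}
```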
458
+
459
+ The critic must challenge:
460
+
461
+ - severity inflation,
462
+ - weak or incomplete evidence,
463
+ - missing mitigating context,
464
+ - false reachability assumptions,
465
+ - framework or middleware defaults,
466
+ - schema validation gates,
467
+ - state-machine constraints,
468
+ - feature flags or dead code,
469
+ - pre-existing status,
470
+ - non-actionable or unsafe fix recommendations,
471
+ - sibling-file gaps,
472
+ - whether multiple comments should be grouped into one root cause.
473
+
474
+ Critic output format:
475
+
476
+ ```text
477
+ [CRITIC] | finding_id | UPHELD / DOWNGRADED / DISPROVED / NEEDS_MORE_EVIDENCE | final_severity | reason | required_report_change
478
+ ```
479
+
480
+ Refuted findings become `DISPROVED` or `ADVISORY`, depending on critic rationale. Downgrades must be listed in the final validation provenance.
481
+
482
+ ---
483
+
484
+ ## Runtime-Aware False-Positive Guard Checklist
485
+
486
+ Before confirming any finding, the reviewer and critic must check all that apply:
487
+
488
+ - [ ] Schema validation gate: does schema validation reject malformed input before the flagged line?
489
+ - [ ] Middleware interception: does middleware handle the request or command before the flagged path?
490
+ - [ ] Framework default mitigation: does the framework inherently prevent this class of issue?
491
+ - [ ] Caller context correctness: who invokes this code, and can untrusted input reach it?
492
+ - [ ] Execution reachability: is the path reachable, or behind a feature flag, dead branch, build-only path, or commented-out code?
493
+ - [ ] State-machine constraints: do ordering rules, locks, mutexes, phase gates, or transition guards prevent the state?
494
+ - [ ] Permission boundary: does role/tool mapping prevent the operation?
495
+ - [ ] Data lifetime: is the flagged state persisted, serialized, logged, or only transient?
496
+ - [ ] Cross-platform behavior: does Windows/macOS/Linux path or shell behavior change the result?
497
+ - [ ] Test environment mismatch: is the finding only true under a mock or fixture that cannot occur in production?
498
+
499
+ If a mitigation applies and was not accounted for, downgrade to `ADVISORY`, `UNVERIFIED`, or `DISPROVED`.
500
+
501
+ ---
502
+
503
+ ## Phase 9: Synthesis, Grouping, and Noise Budget
504
+
505
+ Before final output:
506
+
507
+ - group duplicate candidates by root cause,
508
+ - report one finding per root cause,
509
+ - attach all affected file:line references under that finding,
510
+ - separate ship blockers from advisory notes,
511
+ - suppress pure style/nit findings unless they indicate correctness, security, test, maintainability, or user-impact risk,
512
+ - distinguish PR-introduced from pre-existing,
513
+ - distinguish confirmed from plausible-but-unverified,
514
+ - include disproved agent/tool claims,
515
+ - keep final comments actionable.
516
+
517
+ ### Finding ID format
518
+
519
+ ```text
520
+ F-001 | severity | category | root cause | affected file:line refs | reviewer | critic status
521
+ ```
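One way to implement "one finding per root cause" together with the `F-001` numbering is to key validated findings by a root-cause string. The sketch assumes reviewers supply that key. Giving a group the highest member severity is also an assumption, which the critic may still override. `groupByRootCause` is hypothetical.

```typescript
// Hypothetical root-cause grouping and F-### numbering (not a Swarm API).
type Sev = "LOW" | "MEDIUM" | "HIGH" | "CRITICAL";
const SEV_ORDER: Record<Sev, number> = { LOW: 0, MEDIUM: 1, HIGH: 2, CRITICAL: 3 };

interface ValidatedFinding {
  rootCause: string; // reviewer-supplied grouping key (assumption)
  severity: Sev;
  category: string;
  location: string; // file:line
}

interface GroupedFinding {
  id: string; // F-001, F-002, ...
  rootCause: string;
  severity: Sev;
  category: string;
  locations: string[];
}

function groupByRootCause(findings: readonly ValidatedFinding[]): GroupedFinding[] {
  const groups = new Map<string, Omit<GroupedFinding, "id">>();
  for (const f of findings) {
    const group = groups.get(f.rootCause);
    if (group === undefined) {
      groups.set(f.rootCause, { rootCause: f.rootCause, severity: f.severity, category: f.category, locations: [f.location] });
      continue;
    }
    if (!group.locations.includes(f.location)) group.locations.push(f.location);
    // Assumption: a group carries the highest severity among its members.
    if (SEV_ORDER[f.severity] > SEV_ORDER[group.severity]) group.severity = f.severity;
  }
  // Map preserves insertion order, so IDs follow first appearance.
  return Array.from(groups.values()).map((g, i) => ({ id: `F-${String(i + 1).padStart(3, "0")}`, ...g }));
}
```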
522
+
523
+ ### Suggested final grouping
524
+
525
+ 1. Ship blockers,
526
+ 2. Important non-blockers,
527
+ 3. Test / coverage gaps,
528
+ 4. Pre-existing issues,
529
+ 5. Unverified plausible risks,
530
+ 6. Disproved candidates / false positives,
531
+ 7. Clean lane summary.
532
+
533
+ ---
534
+
535
+ ## Phase 10: Metrics and Knowledge Writeback
536
+
537
+ At the end of the review, record review quality metrics when Swarm metrics or local evidence storage is available.
538
+
539
+ Record:
540
+
541
+ - raw candidates by base lane,
542
+ - raw candidates by micro-lane,
543
+ - deterministic tool candidates,
544
+ - reviewer-confirmed findings,
545
+ - reviewer-disproved findings,
546
+ - reviewer-unverified findings,
547
+ - critic-upheld findings,
548
+ - critic-downgraded findings,
549
+ - critic-disproved findings,
550
+ - final reported findings,
551
+ - suppressed non-actionable candidates,
552
+ - recurring false-positive patterns,
+ - commands or probes used,
+ - token/time cost if available,
+ - accepted/fixed findings when known.
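
A subset of these metrics can be tallied from per-candidate records, as in the Python sketch below. The `CandidateRecord` shape is a hypothetical assumption, because the protocol lists metrics but not a storage schema. Token/time cost and false-positive patterns are left out for brevity.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class CandidateRecord:
    # Hypothetical record shape; the protocol lists metrics, not a storage schema.
    lane: str                            # base lane or micro-lane that raised it
    source: str                          # "explorer" or "tool"
    reviewer_status: str                 # CONFIRMED / DISPROVED / UNVERIFIED / PRE_EXISTING
    critic_status: Optional[str] = None  # UPHELD / DOWNGRADED / DISPROVED when escalated
    reported: bool = False               # survived to the final report


def review_metrics(records):
    return {
        "raw_by_lane": Counter(r.lane for r in records),
        "tool_candidates": sum(1 for r in records if r.source == "tool"),
        "reviewer": Counter(r.reviewer_status for r in records),
        "critic": Counter(r.critic_status for r in records if r.critic_status),
        "final_reported": sum(1 for r in records if r.reported),
    }


records = [
    CandidateRecord("security", "explorer", "CONFIRMED", "UPHELD", reported=True),
    CandidateRecord("security", "tool", "DISPROVED"),
    CandidateRecord("tests", "explorer", "CONFIRMED", "DOWNGRADED", reported=True),
    CandidateRecord("docs", "explorer", "UNVERIFIED"),
]
metrics = review_metrics(records)
```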
+
+ Knowledge writeback rules:
+
+ - Write back only validated true positives or validated false-positive patterns.
+ - Include file patterns, invariant, evidence, and why it was confirmed/disproved.
+ - Mark repo-specific lessons as project-tier unless there is strong evidence they generalize.
+ - Never promote quarantined or unvalidated knowledge to hive-tier.
+ - Never store secrets, private tokens, or raw sensitive logs.
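
The writeback rules could be enforced by a filter like the Python sketch below. The entry keys (`validated`, `kind`, `generalizes`, `quarantined`) are hypothetical. The `project`/`hive` tier names follow the wording above. The regex redaction is a best-effort illustration, not a complete secret scanner.

```python
import re

# Best-effort pattern for key=value style secrets; not a complete secret scanner.
SECRET_RE = re.compile(r"(?i)\b(api[_-]?key|token|password|secret)(\s*[:=]\s*)\S+")


def redact(text):
    return SECRET_RE.sub(r"\1\2<REDACTED>", text)


def prepare_writeback(entry):
    """Return a sanitized knowledge entry, or None if it must not be written back."""
    # Only validated true positives or validated false-positive patterns qualify.
    if not entry.get("validated") or entry.get("kind") not in {"true_positive", "false_positive_pattern"}:
        return None
    # Project tier by default; hive tier only with evidence of generalization,
    # and never for quarantined knowledge.
    tier = "hive" if entry.get("generalizes") and not entry.get("quarantined") else "project"
    return {
        "kind": entry["kind"],
        "file_patterns": entry["file_patterns"],
        "invariant": entry["invariant"],
        "evidence": redact(entry["evidence"]),
        "why": entry["why"],
        "tier": tier,
    }


entry = {
    "validated": True,
    "kind": "true_positive",
    "file_patterns": ["api/*.py"],
    "invariant": "handlers require auth",
    "evidence": "log line: token=abc123 seen",
    "why": "reproduced with an unauthenticated request",
    "generalizes": False,
}
sanitized = prepare_writeback(entry)
```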
+
+ ---
+
+ ## Phase 11: Post-Fix Re-verification
+
+ When the PR author pushes fixes after a review, perform a targeted re-verification before updating the verdict.
+
+ ### Re-verification scope
+
+ Only re-verify findings the author claims to have fixed. Do not re-run the full review pipeline.
+
+ ### Re-verification steps
+
+ 1. For each finding the author claims fixed:
+ a. Read the changed file(s) from the updated branch at the specific lines referenced in the original finding.
+ b. Verify the fix addresses the root cause, not just the symptom.
+ c. Check that the fix does not introduce a new issue in the same area.
+ 2. Run CI checks on the updated branch to confirm no regressions.
+ 3. For findings the author did not address, carry forward the original finding with unchanged status.
+
+ ### Re-verification output
+
+ ```text
+ [REVERIFIED] | finding_id | FIXED / PARTIALLY_FIXED / NOT_FIXED / NEW_ISSUE | evidence | updated_severity
+ ```
+
+ - `FIXED`: the root cause is resolved and no new issue was introduced.
+ - `PARTIALLY_FIXED`: the root cause is partially addressed or a residual concern remains.
+ - `NOT_FIXED`: the root cause persists unchanged.
+ - `NEW_ISSUE`: the fix introduced a new problem at the same location.
+
+ Update the verdict only after re-verifying all previously blocking findings.
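
The Python sketch below parses `[REVERIFIED]` lines and lists which previously blocking findings still block. It assumes that only `FIXED` clears a blocker, and that a blocking finding with no `[REVERIFIED]` line is carried forward unchanged. It also assumes no field contains a literal `|`. The function names are hypothetical.

```python
REVERIFY_STATES = {"FIXED", "PARTIALLY_FIXED", "NOT_FIXED", "NEW_ISSUE"}


def parse_reverified(line):
    # Assumes no field contains a literal "|".
    parts = [part.strip() for part in line.split("|")]
    if len(parts) != 5 or parts[0] != "[REVERIFIED]" or parts[2] not in REVERIFY_STATES:
        raise ValueError(f"malformed re-verification line: {line!r}")
    _, finding_id, state, evidence, severity = parts
    return {"finding_id": finding_id, "state": state, "evidence": evidence, "severity": severity}


def remaining_blockers(blocking_ids, reverified_lines):
    """Return previously blocking findings that still block after re-verification.

    A finding with no [REVERIFIED] line was not addressed, so it is carried
    forward unchanged and keeps blocking.
    """
    states = {}
    for line in reverified_lines:
        record = parse_reverified(line)
        states[record["finding_id"]] = record["state"]
    return [fid for fid in blocking_ids if states.get(fid) != "FIXED"]


lines = [
    "[REVERIFIED] | F-001 | FIXED | auth check added at api/users.py:40 | N/A",
    "[REVERIFIED] | F-002 | PARTIALLY_FIXED | admin route still unguarded | HIGH",
]
still_blocking = remaining_blockers(["F-001", "F-002", "F-003"], lines)
```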
+
+ ---
+
+ # Council Mode Workflow
+
+ Council mode is opt-in only and adversarial.
+
+ When triggered:
+
+ 1. Build the same context pack as default mode.
+ 2. Launch all council agents in a single message with multiple Agent tool calls when supported (`run_in_background: true`).
+ 3. Each council agent assumes all work is wrong until code evidence proves otherwise.
+ 4. Each agent hunts within its lane only.
+ 5. Agents return evidence states only: `EVIDENCE_FOUND`, `SUSPICIOUS`, or `CLEAN`.
+ 6. Agents must not return `CONFIRMED`, `DISPROVED`, or final severity.
+ 7. The independent reviewer then classifies every council candidate as `CONFIRMED`, `DISPROVED`, `UNVERIFIED`, or `PRE_EXISTING`.
+ 8. Apply critic challenge to reviewer-confirmed HIGH/CRITICAL or borderline findings.
+ 9. Final synthesis distinguishes real blockers, real low-severity issues, accepted caveats, disproved council claims, and follow-up quality work.
+
+ Default council lanes:
+
+ - correctness and edge cases,
+ - security and trust boundaries,
+ - dependency and deployment safety,
+ - docs and intent-vs-actual,
+ - tests and falsifiability,
+ - performance and architecture when risk justifies it.
+
+ Council prompt requirements:
+
+ - branch and commit range,
+ - context pack summary,
+ - files owned by that lane,
+ - relevant impact cone,
+ - explicit checklist,
+ - strict output cap,
+ - `EVIDENCE_FOUND / SUSPICIOUS / CLEAN` only,
+ - file:line evidence required for `EVIDENCE_FOUND`.
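
The orchestrator could lint council output before routing it to the reviewer. The Python sketch below checks the evidence-state and file:line requirements. The `path:line` regex is a simplifying assumption about how refs are written, and the function name is hypothetical.

```python
import re

COUNCIL_STATES = {"EVIDENCE_FOUND", "SUSPICIOUS", "CLEAN"}
FILE_LINE_RE = re.compile(r"^[\w./-]+:\d+$")  # e.g. src/auth.py:42


def council_violations(state, evidence_refs):
    """List protocol violations in one council agent result."""
    problems = []
    if state not in COUNCIL_STATES:
        # CONFIRMED, DISPROVED, and severities belong to the reviewer, not the council.
        problems.append(f"disallowed state {state!r}")
    if state == "EVIDENCE_FOUND" and not any(FILE_LINE_RE.match(ref) for ref in evidence_refs):
        problems.append("EVIDENCE_FOUND without file:line evidence")
    return problems
```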
+
+ Council findings are supplementary, not authoritative overrides. Do not adopt council severities or claims without independent validation.
+
+ ---
+
+ # Merge Recommendation Table
+
+ | Verdict | Condition |
+ |---|---|
+ | `APPROVE` | zero unresolved CRITICAL findings, zero unresolved HIGH findings, all blocking obligations MET, no required validation phase failed |
+ | `APPROVE_WITH_NOTES` | zero unresolved CRITICAL findings, HIGH findings are downgraded/advisory only, obligations MET or explicitly non-blocking |
+ | `REQUEST_CHANGES` | any unresolved HIGH finding, any NOT_MET blocking obligation, multiple MEDIUM findings with the same root cause, or validation/probe evidence indicates user-impacting risk |
+ | `BLOCK` | any unresolved CRITICAL finding, unsafe write/git/security issue, evidence integrity break, role/tool permission bypass, or config ratchet violation that can disable required protections |
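
The table can be read as a strictest-row-first decision, as in the Python sketch below. The `ReviewState` fields are hypothetical summary inputs. The table lists "no required validation phase failed" only under `APPROVE` and leaves the failure case open. This sketch conservatively maps that case to `REQUEST_CHANGES`, which is an assumption.

```python
from dataclasses import dataclass


@dataclass
class ReviewState:
    # Hypothetical summary inputs; the protocol defines verdict conditions, not fields.
    unresolved_critical: int = 0
    unresolved_high: int = 0
    advisory_high: int = 0  # HIGH findings downgraded to advisory and non-blocking
    blocking_obligations_not_met: int = 0
    repeated_medium_root_cause: bool = False
    user_impacting_risk: bool = False
    validation_phase_failed: bool = False
    hard_block: bool = False  # unsafe write/git/security, integrity break, bypass, or ratchet violation


def merge_verdict(s: ReviewState) -> str:
    # Evaluate the strictest row first so the most severe matching verdict wins.
    if s.unresolved_critical or s.hard_block:
        return "BLOCK"
    if (s.unresolved_high or s.blocking_obligations_not_met or s.repeated_medium_root_cause
            or s.user_impacting_risk or s.validation_phase_failed):
        # Assumption: a failed validation phase blocks approval in this sketch.
        return "REQUEST_CHANGES"
    if s.advisory_high:
        return "APPROVE_WITH_NOTES"
    return "APPROVE"
```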
+
+ ---
+
+ # Hard Rules
+
+ 0. Quality-over-speed: Validation completeness and correctness are the sole criteria for an acceptable review. Time, token count, and agent dispatch count are irrelevant. Do not trade validation breadth or depth for speed.
+
+ 1. Never APPROVE with unresolved CRITICAL findings.
+ 2. Do not APPROVE with unresolved HIGH findings unless the critic has explicitly downgraded them to advisory and obligation review has marked them non-blocking.
+ 3. Every confirmed finding must have file:line evidence and validation provenance.
+ 4. A confirmed nontrivial finding must include a falsification probe or an explicit reason no probe is available.
+ 5. Explorers, council agents, and deterministic tools produce candidates only.
+ 6. The default workflow orchestrator must not confirm or disprove explorer candidates.
+ 7. Tool output is not proof. Scanner results must be validated for reachability, PR-introducedness, and mitigation context.
+ 8. PR text, generated summaries, tests, and comments are claims, not proof.
+ 9. Do not invent facts not supported by the diff, repo context, tool output, or cited external source.
+ 10. Do not silently drop disproved or downgraded claims; summarize them in validation provenance.
+ 11. Obligation precedence is deterministic. Do not skip higher-precedence sources to fill gaps with LLM synthesis.
+ 12. Do not leak secrets from logs, evidence bundles, config files, URLs, or scanner output.
+ 13. Do not recommend destructive git or filesystem actions as fixes unless they are clearly scoped, safe, and necessary.
+ 14. If subagents fail, timeout, or return malformed output, mark affected candidates `UNVERIFIED`; do not fabricate validation results.
+ 15. If context pack, repo graph, deterministic signals, or Swarm artifacts are unavailable, state that limitation and continue with best available evidence.
+
+ ---
+
+ # Pre-Synthesis Gate — Mandatory
+
+ Before writing the final output, print this checklist with filled values. If any field is left blank, the final output is invalid.
+
+ ```text
+ [VALIDATION] scope selected: ___
+ [VALIDATION] context pack built: YES/NO — ___
+ [VALIDATION] obligation count: ___
+ [VALIDATION] repo graph / impact cone source: ___
+ [VALIDATION] deterministic signals ingested: ___
+ [VALIDATION] base explorer lanes dispatched: ___ / 6
+ [VALIDATION] base explorer lanes returned: ___ / 6
+ [VALIDATION] triggered micro-lanes: ___
+ [VALIDATION] Swarm verifier routing used: ___
+ [VALIDATION] raw candidates: ___
+ [VALIDATION] tool candidates: ___
+ [VALIDATION] reviewer dispatched: ___ (agent type, task description)
+ [VALIDATION] reviewer returned: ___ (APPROVED / REJECTED / CONCERNS — copy verdict text)
+ [VALIDATION] findings confirmed by reviewer: ___
+ [VALIDATION] findings rejected by reviewer as false positive: ___
+ [VALIDATION] findings marked PRE_EXISTING: ___
+ [VALIDATION] findings left UNVERIFIED: ___
+ [VALIDATION] findings escalated to critic: ___
+ [VALIDATION] critic dispatched: ___ OR "SKIPPED — no reviewer-confirmed HIGH/CRITICAL or borderline findings"
+ [VALIDATION] critic returned: ___ OR "N/A"
+ [VALIDATION] findings upheld by critic: ___
+ [VALIDATION] findings downgraded by critic: ___
+ [VALIDATION] findings disproved by critic: ___
+ [VALIDATION] falsification probes included: ___
+ [VALIDATION] grouped root-cause findings: ___
+ [VALIDATION] metrics / knowledge writeback: ___
+ [VALIDATION] all explorers verified to diff against PR branch, not HEAD: YES/NO
+ [VALIDATION] noise-filter suppressed candidates: ___ (count, each with reason in final report)
+ [VALIDATION] all non-suppressed candidates routed to reviewer: YES/NO
+ ```
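
A mechanical pre-check for blank gate fields could look like the Python sketch below. It treats a leftover `___` placeholder, or a line still ending in the unresolved `YES/NO` choice, as blank. The function name is hypothetical, and the check does not judge whether filled values are correct.

```python
def blank_gate_fields(checklist: str):
    """Return the labels of [VALIDATION] lines that are still blank."""
    blanks = []
    for raw in checklist.splitlines():
        line = raw.strip()
        if not line.startswith("[VALIDATION]"):
            continue
        # A literal "___" placeholder or an unresolved YES/NO choice counts as blank.
        if "___" in line or line.endswith("YES/NO"):
            label = line[len("[VALIDATION]"):].split(":", 1)[0].strip()
            blanks.append(label)
    return blanks


checklist = """[VALIDATION] scope selected: full PR diff
[VALIDATION] obligation count: ___
[VALIDATION] all non-suppressed candidates routed to reviewer: YES/NO
"""
missing = blank_gate_fields(checklist)
```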
+
+ If the reviewer returned `REJECTED` or `CONCERNS`, route the issue back to implementation context or mark the candidate invalid with reason. Do not silently downgrade a rejection.
+
+ ---
+
+ # Final Output Format
+
+ Produce the final review in this order:
+
+ ## PR intent
+
+ Summarize the obligations and user-visible intent.
+
+ ## Implementation summary
+
+ Summarize what changed, including major files, public APIs, schemas, configs, tests, and Swarm artifacts.
+
+ ## Intended vs actual mapping
+
+ | Obligation | Source | Actual evidence | Status | Linked finding |
+ |---|---|---|---|---|
+
+ Use `MET`, `PARTIALLY_MET`, `NOT_MET`, or `UNVERIFIABLE`.
+
+ ## Validation provenance
+
+ Include:
+
+ - context pack limitations,
+ - explorer lanes launched and returned,
+ - micro-lanes triggered,
+ - deterministic signals ingested,
+ - reviewer identity / role for each finding,
+ - critic result for each escalated finding,
+ - findings DISPROVED by reviewer with reason,
+ - findings DOWNGRADED by critic with reason,
+ - findings left UNVERIFIED with reason.
+
+ If zero findings, explicitly state:
+
+ ```text
+ No confirmed findings — all validated lanes CLEAN.
+ ```
+
+ Then provide a lane-by-lane clean summary.
+
+ ## Confirmed findings
+
+ For each finding:
+
+ ```text
+ F-001 — Severity — Category — Root cause
+ Files: path:line, path:line
+ Status: CONFIRMED / critic status
+ Evidence type: STRUCTURALLY_PROVEN / EXECUTION_PROVEN / STATIC_TRACE_PROVEN
+ Why it matters:
+ Validation:
+ Falsification probe:
+ Suggested fix:
+ ```
+
+ ## Pre-existing findings
+
+ List separately from PR-introduced findings.
+
+ ## Unverified but plausible risks
+
+ Only include if useful and clearly labeled as unverified.
+
+ ## Test / coverage gaps
+
+ Focus on missing tests that would catch real risks, not generic coverage requests.
+
+ ## Disproved candidates and false positives
+
+ List concise reasons for notable false positives from explorers, tools, council agents, or reviewers.
+
+ ## Verdict
+
+ Use one of:
+
+ - `APPROVE`
+ - `APPROVE_WITH_NOTES`
+ - `REQUEST_CHANGES`
+ - `BLOCK`
+
+ ## Merge recommendation
+
+ Explain the recommendation in one short paragraph and list required actions before merge if applicable.
+
+ ---
+
+ # Reviewer Prompt Template
+
+ Use this template when dispatching reviewer subagents:
+
+ ```text
+ You are the independent reviewer. Validate only the candidates assigned below.
+ Do not search for new issues except where needed to validate reachability or mitigation.
+ Do not trust explorer severity.
+
+ Context pack summary:
+ - scope: ...
+ - obligations: ...
+ - impact cone: ...
+ - deterministic signals: ...
+ - relevant Swarm artifacts / knowledge: ...
+ - base_ref: <commit SHA of base branch>
+ - head_ref: <commit SHA of PR head branch>
+
+ Candidates:
+ - ...
+
+ For each candidate, return:
+ [REVIEWED] | candidate_id | CONFIRMED/DISPROVED/UNVERIFIED/PRE_EXISTING | evidence_type | final_severity | introduced_by_pr | file:line | rationale | falsification_probe | reviewer_id
+
+ You must check caller context, reachability, schema/middleware/framework mitigations, state-machine constraints, test coverage, PR-introducedness, and severity.
+
+ IMPORTANT: If a finding claims behavior is "new" or "introduced by the PR", you MUST read the equivalent code on the base branch (git show <base_ref>:<file>) to verify it was not present before. A reviewer claim of "this is new" is invalid without base-branch evidence. Do not compare the new code to an idealized baseline — compare it to what actually existed on the base branch at the time of the PR.
+ ```
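
The orchestrator could parse the reviewer's `[REVIEWED]` lines as in the Python sketch below. It enforces hard rules 3 and 4 on confirmed findings and falls back to `UNVERIFIED` on malformed output, per hard rule 14. Field names mirror the template, but the function names are hypothetical, and the sketch assumes no field contains a literal `|`.

```python
REVIEW_STATUSES = {"CONFIRMED", "DISPROVED", "UNVERIFIED", "PRE_EXISTING"}
REVIEW_FIELDS = ["candidate_id", "status", "evidence_type", "final_severity", "introduced_by_pr",
                 "file_line", "rationale", "falsification_probe", "reviewer_id"]


def parse_reviewed(line):
    # Assumes no field contains a literal "|".
    tag, *values = [part.strip() for part in line.split("|")]
    if tag != "[REVIEWED]" or len(values) != len(REVIEW_FIELDS):
        raise ValueError(f"malformed reviewer line: {line!r}")
    record = dict(zip(REVIEW_FIELDS, values))
    if record["status"] not in REVIEW_STATUSES:
        raise ValueError(f"unknown status {record['status']!r}")
    # Hard rules 3-4: a confirmed finding needs file:line evidence and a probe
    # (or an explicit reason no probe is available) in the probe field.
    if record["status"] == "CONFIRMED" and not (record["file_line"] and record["falsification_probe"]):
        raise ValueError("CONFIRMED finding lacks file:line or falsification probe")
    return record


def review_or_unverified(candidate_id, line):
    try:
        return parse_reviewed(line)
    except ValueError as err:
        # Hard rule 14: malformed reviewer output leaves the candidate UNVERIFIED.
        return {"candidate_id": candidate_id, "status": "UNVERIFIED", "rationale": str(err)}
```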
+
+ ---
+
+ # Critic Prompt Template
+
+ Use this template when dispatching critic subagents:
+
+ ```text
+ You are the adversarial critic. Challenge only reviewer-confirmed findings assigned below.
+ Your goal is to reduce false positives, severity inflation, and non-actionable reports.
+
+ For each finding, challenge:
+ - whether evidence proves the claim,
+ - whether the path is reachable,
+ - whether mitigations apply,
+ - whether severity is inflated,
+ - whether it is PR-introduced,
+ - whether suggested fixes are safe/actionable,
+ - whether related files were missed,
+ - whether multiple findings should be grouped.
+
+ Return:
+ [CRITIC] | finding_id | UPHELD/DOWNGRADED/DISPROVED/NEEDS_MORE_EVIDENCE | final_severity | reason | required_report_change
+ ```
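
The Python sketch below applies `[CRITIC]` results to reviewer-confirmed findings. It keeps provenance notes so downgrades and disproofs are summarized rather than silently dropped, per hard rule 10. The protocol does not say how `NEEDS_MORE_EVIDENCE` affects the report. Holding such findings out of the confirmed list is an assumption here, as are the function name and the data shapes.

```python
CRITIC_VERDICTS = {"UPHELD", "DOWNGRADED", "DISPROVED", "NEEDS_MORE_EVIDENCE"}


def apply_critic(severities, critic_lines):
    """Apply [CRITIC] results to reviewer-confirmed findings.

    severities maps finding_id -> reviewer severity. Returns the findings that
    remain reportable as confirmed, plus provenance notes for hard rule 10.
    """
    reportable = dict(severities)
    provenance = []
    for line in critic_lines:
        # Unpacking raises ValueError if the line has the wrong field count.
        tag, finding_id, verdict, final_severity, reason, _change = [p.strip() for p in line.split("|")]
        if tag != "[CRITIC]" or verdict not in CRITIC_VERDICTS:
            raise ValueError(f"malformed critic line: {line!r}")
        if verdict == "DOWNGRADED":
            reportable[finding_id] = final_severity
        elif verdict in {"DISPROVED", "NEEDS_MORE_EVIDENCE"}:
            # Assumption: findings needing more evidence are held out of the confirmed list.
            reportable.pop(finding_id, None)
        if verdict != "UPHELD":
            provenance.append(f"{finding_id}: {verdict} ({reason})")
    return reportable, provenance


severities = {"F-001": "HIGH", "F-002": "CRITICAL", "F-003": "MEDIUM"}
critic_lines = [
    "[CRITIC] | F-001 | UPHELD | HIGH | evidence holds | none",
    "[CRITIC] | F-002 | DOWNGRADED | MEDIUM | only reachable by admins | lower severity",
    "[CRITIC] | F-003 | DISPROVED | N/A | schema rejects input | move to false positives",
]
reportable, provenance = apply_critic(severities, critic_lines)
```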
+
+ ---
+
+ # Explorer Prompt Template
+
+ Use this template when dispatching base explorer or micro-lane agents:
+
+ ```text
+ You are an explorer. Optimize for recall, not final judgment.
+ Return candidates only. Do not use CONFIRMED, DISPROVED, or PRE_EXISTING.
+
+ Lane:
+ Scope:
+ Obligations:
+ Changed files/hunks:
+ Impact cone:
+ Relevant deterministic signals:
+ Relevant Swarm artifacts / knowledge:
+ Checklist:
+
+ You must inspect or mark unavailable:
+ 1. changed hunk,
+ 2. caller/consumer,
+ 3. callee/dependency,
+ 4. sibling implementation or prior pattern,
+ 5. nearest test or missing-test location,
+ 6. deterministic signals,
+ 7. Swarm artifacts/knowledge.
+
+ Return:
+ [CANDIDATE] | candidate_id | lane | severity | category | file:line | claim | evidence_summary | impact_context | confidence
+ ```
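
Explorer output could be linted before it reaches the reviewer, as in the Python sketch below. It parses a `[CANDIDATE]` line and rejects any use of the verdict words reserved for the reviewer. The whole-word, uppercase-only match is a deliberately strict assumption. The function name is hypothetical, and the sketch assumes no field contains a literal `|`.

```python
import re

CANDIDATE_FIELDS = ["candidate_id", "lane", "severity", "category", "file_line",
                    "claim", "evidence_summary", "impact_context", "confidence"]
# Verdict words reserved for the reviewer; matched as whole uppercase words.
RESERVED_RE = re.compile(r"\b(CONFIRMED|DISPROVED|PRE_EXISTING)\b")


def parse_candidate(line):
    # Assumes no field contains a literal "|".
    tag, *values = [part.strip() for part in line.split("|")]
    if tag != "[CANDIDATE]" or len(values) != len(CANDIDATE_FIELDS):
        raise ValueError(f"malformed candidate line: {line!r}")
    reserved = RESERVED_RE.search(line)
    if reserved:
        raise ValueError(f"explorer used reserved verdict {reserved.group(1)}")
    return dict(zip(CANDIDATE_FIELDS, values))
```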
+
+ Do not let speed degrade validation quality.