opencode-swarm 7.58.0 → 7.59.0
- package/.opencode/skills/brainstorm/SKILL.md +142 -0
- package/.opencode/skills/clarify/SKILL.md +103 -0
- package/.opencode/skills/clarify-spec/SKILL.md +58 -0
- package/.opencode/skills/codebase-review-swarm/INSTALL.md +75 -0
- package/.opencode/skills/codebase-review-swarm/README.md +44 -0
- package/.opencode/skills/codebase-review-swarm/SKILL.md +65 -0
- package/.opencode/skills/codebase-review-swarm/agents/openai.yaml +6 -0
- package/.opencode/skills/codebase-review-swarm/assets/jsonl-schemas.md +239 -0
- package/.opencode/skills/codebase-review-swarm/assets/review-report-template.md +244 -0
- package/.opencode/skills/codebase-review-swarm/references/compatibility-and-research-notes.md +25 -0
- package/.opencode/skills/codebase-review-swarm/references/full-v7-source-prompt.md +2373 -0
- package/.opencode/skills/codebase-review-swarm/references/review-protocol-v8.2.md +310 -0
- package/.opencode/skills/codebase-review-swarm/scripts/init-review-run.py +134 -0
- package/.opencode/skills/codebase-review-swarm/scripts/validate-skill-package.py +62 -0
- package/.opencode/skills/consult/SKILL.md +16 -0
- package/.opencode/skills/council/SKILL.md +147 -0
- package/.opencode/skills/critic-gate/SKILL.md +59 -0
- package/.opencode/skills/deep-dive/SKILL.md +142 -0
- package/.opencode/skills/design-docs/SKILL.md +81 -0
- package/.opencode/skills/discover/SKILL.md +20 -0
- package/.opencode/skills/execute/SKILL.md +191 -0
- package/.opencode/skills/issue-ingest/SKILL.md +64 -0
- package/.opencode/skills/phase-wrap/SKILL.md +123 -0
- package/.opencode/skills/plan/SKILL.md +293 -0
- package/.opencode/skills/pre-phase-briefing/SKILL.md +69 -0
- package/.opencode/skills/resume/SKILL.md +23 -0
- package/.opencode/skills/specify/SKILL.md +175 -0
- package/.opencode/skills/swarm-pr-feedback/SKILL.md +192 -0
- package/.opencode/skills/swarm-pr-review/SKILL.md +884 -0
- package/dist/agents/agent-output-schema.d.ts +1 -1
- package/dist/cli/index.js +1351 -1159
- package/dist/commands/command-dispatch.d.ts +1 -0
- package/dist/commands/index.d.ts +1 -0
- package/dist/commands/registry.d.ts +15 -14
- package/dist/config/bundled-skills.d.ts +25 -0
- package/dist/config/constants.d.ts +1 -1
- package/dist/config/schema.d.ts +42 -0
- package/dist/index.js +3517 -2673
- package/dist/memory/schema.d.ts +1 -1
- package/dist/tools/lean-turbo-run-phase.d.ts +2 -1
- package/dist/turbo/lean/index.d.ts +4 -1
- package/dist/turbo/lean/merge-back.d.ts +180 -0
- package/dist/turbo/lean/runner.d.ts +47 -1
- package/dist/turbo/lean/state.d.ts +10 -0
- package/dist/turbo/lean/worktree.d.ts +194 -0
- package/package.json +20 -1
@@ -0,0 +1,310 @@
# Review Protocol v8.2

This protocol is the portable, state-of-the-art execution contract for `codebase-review-swarm`. It is derived from the v7 source prompt and updated for current Agent Skills packaging, current ASVS 5.0.0, explicit grounding/critic fields, and non-diluting depth/resource allocation across selected tracks.

## Role

Act as the Architect/orchestrator conducting a deep codebase review. Produce a verified report and machine-readable artifacts. Do not implement fixes or modify source files.

## Review modes

After Phase 0, use one or more selected modes:

1. Complete Integrated Review — all defect-focused tracks plus enhancement opportunities.
2. Defect-Focused Comprehensive QA — all defect tracks; no enhancement catalog.
3. Security and Supply Chain Focus — AppSec, LLM/MCP security, dependency integrity, CI provenance.
4. Functionality and Correctness Focus — claims-vs-shipped, wiring, edge cases, business logic.
5. Testing and Test Quality Focus — behavioral coverage, test drift, mutation resilience, property-based gaps.
6. UI/UX and Accessibility Focus — visual hierarchy, interaction design, WCAG 2.2 AA, typography, polish, design system, UI performance, evidence-backed AI-scaffold patterns.
7. Performance and Observability Focus — runtime performance, resource use, startup, telemetry, logs, metrics, traces.
8. AI Slop and Code Provenance Focus — hallucinated APIs, phantom dependencies, confident stubs, slopsquatting, context rot, stale API usage.
9. Enhancement Opportunities Only — architecture, quality, DX, resilience, observability, UI/UX, testing. Not a bug hunt.
10. Custom Combination — user-specified tracks or subsystem.

Selecting fewer tracks narrows domain only. It never reduces depth inside selected domains.

## Depth and resource allocation contract

This contract is mandatory for every run and overrides any implicit pressure to finish quickly.

### Core invariant

Selected tracks define *domain breadth*, not *review intensity*. A selected track must receive the same or greater depth whether it is run alone, with several tracks, or as part of a complete integrated review. The orchestrator must never trade depth inside a selected track for broader track coverage.

### Focused-track expansion

When the user selects one focused track or a narrow custom track set, convert the unused breadth into deeper analysis inside that domain:

- split coverage units more granularly than the minimum when a surface, boundary, component family, test cluster, or dependency family is complex;
- trace additional caller/callee, ingress/sink, schema, config, and test relationships relevant to that track;
- run every safe deterministic command relevant to that track rather than only the fastest one;
- perform additional disproof passes for high-impact candidates and repeated patterns;
- expand runtime validation attempts when runtime behavior is central and safe to exercise;
- use more reviewer batches with smaller local reasoning scopes;
- run targeted critic passes for systemic or high-value findings even below CRITICAL/HIGH when the track is the selected focus;
- produce fuller track-specific coverage notes, limitations, and remediation/enhancement sequencing.

A single-track review should feel like a specialist audit of that domain, not a filtered version of a complete review.

### Multi-track non-dilution

When the user selects multiple tracks or all tracks, treat the run as a composition of full-depth selected-track reviews plus cross-boundary synthesis. The orchestrator must add passes, waves, and artifacts instead of shrinking per-track effort.

Forbidden multi-track shortcuts:

- using larger file batches to fit all tracks into fewer contexts;
- sampling public surfaces, trust boundaries, test clusters, component families, or AI surfaces;
- reducing caller/callee tracing because another track also needs attention;
- skipping deterministic tools that would have run in a focused version of the track;
- omitting reviewer validation or critic challenge to conserve context;
- collapsing unrelated findings into vague systemic themes without preserving exact evidence;
- writing a final report that says selected tracks ran when any selected track did not reach its own full-depth closure gate.

If the selected scope is too large for one context window or one interactive session, split by track, subsystem, coverage unit, and validation lineage. Continue only from written artifacts. If splitting still leaves a selected unit unreviewed, mark it `BLOCKED` or `SKIPPED_WITH_REASON` with exact reason and exclude unsupported conclusions from the main findings.

### Review depth plan

After 0K and before Phase 1 candidate generation, create `ledgers/review-depth-plan.md`. The plan must list each selected track, its coverage-unit basis, minimum review passes, deterministic tools to attempt, validation routing, critic routing, and cross-track dependencies. The final critic must verify this plan against completed artifacts.

Minimum per-track depth plan fields:

```text
TRACK_DEPTH_PLAN
track: <A|B|C|D|E|F|G|1X>
mode: focused | multi_track | complete_integrated | custom
coverage_unit_basis: <public_surface | trust_boundary | test_cluster | ui_component_family | hot_path | dependency_family | ai_surface | domain_component | cross_boundary_pair>
expected_units: <count or unknown_until_inventory>
granularity_rule: <how complex units are split>
required_passes: <inventory excerpts, candidate pass, deterministic tool pass, caller/callee trace, tests/claims check, validation, critic>
deterministic_tools_to_attempt: <commands/tools or N/A with reason>
runtime_validation_policy: <when to run, when to mark UNVERIFIED>
reviewer_batch_rule: <local reasoning unit definition>
critic_rule: <inline/final/enhancement/systemic>
non_dilution_check: <why this track is not shallower because of selected breadth>
END
```

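A minimal sketch of how a tool might read `TRACK_DEPTH_PLAN` blocks back out of `ledgers/review-depth-plan.md` and flag missing fields. The field names come from the block above; the parser itself and the helper names `parse_track_depth_plans` and `missing_fields` are hypothetical and not part of the package.

```python
from __future__ import annotations

# Field names copied from the TRACK_DEPTH_PLAN block in the protocol.
REQUIRED_FIELDS = (
    "track", "mode", "coverage_unit_basis", "expected_units",
    "granularity_rule", "required_passes", "deterministic_tools_to_attempt",
    "runtime_validation_policy", "reviewer_batch_rule", "critic_rule",
    "non_dilution_check",
)


def parse_track_depth_plans(text: str) -> list[dict[str, str]]:
    """Parse every TRACK_DEPTH_PLAN ... END block into a dict of its fields."""
    plans: list[dict[str, str]] = []
    current: dict[str, str] | None = None
    for raw in text.splitlines():
        line = raw.strip()
        if line == "TRACK_DEPTH_PLAN":
            current = {}
        elif line == "END" and current is not None:
            plans.append(current)
            current = None
        elif current is not None and ":" in line:
            # Split on the first colon only, so values may contain colons.
            key, _, value = line.partition(":")
            current[key.strip()] = value.strip()
    return plans


def missing_fields(plan: dict[str, str]) -> list[str]:
    """Return required fields that are absent or empty in a parsed plan."""
    return [name for name in REQUIRED_FIELDS if not plan.get(name)]
```

A final-critic step could call `missing_fields` on each parsed plan and treat a non-empty result as a reason to revise the plan before Phase 1.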
### Coverage unit completion depth

`REVIEWED` means more than “looked at.” For every selected track, the coverage unit must record `passes_completed`, `evidence_refs`, `deterministic_checks`, `runtime_checks_or_reason`, `validation_refs`, and `remaining_uncertainty`. A unit may close as `REVIEWED` only after the selected track’s depth plan has been satisfied for that unit.

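The closure rule can be sketched as a small gate over one coverage record. The six field names are taken from the paragraph above; the record shape (a plain dict with `passes_completed` as a list of pass names) and the function name `can_close_as_reviewed` are assumptions made for illustration.

```python
from __future__ import annotations

# Fields the protocol requires every coverage unit to record.
REQUIRED_UNIT_FIELDS = (
    "passes_completed", "evidence_refs", "deterministic_checks",
    "runtime_checks_or_reason", "validation_refs", "remaining_uncertainty",
)


def can_close_as_reviewed(unit: dict, required_passes: set[str]) -> tuple[bool, list[str]]:
    """Return (ok, problems) for closing one coverage unit as REVIEWED.

    `required_passes` is assumed to come from the track's depth plan.
    """
    # A field must be present, but may hold an empty value such as [] or "none".
    problems = [f"missing field: {name}" for name in REQUIRED_UNIT_FIELDS if name not in unit]
    done = set(unit.get("passes_completed", []))
    problems += [f"pass not completed: {name}" for name in sorted(required_passes - done)]
    return (not problems, problems)
```

Returning the list of problems rather than a bare boolean lets the orchestrator write the exact blocker into coverage notes.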
## Artifact root

Create one run directory before track execution:

```text
.swarm/review-v8/runs/<run_id>/
  metadata.json
  source-of-truth-packet.md
  repository-context-packet.md
  artifacts/
    claims.jsonl
    surfaces.jsonl
    boundaries.jsonl
    ai-surfaces.jsonl
    ui-inventory.jsonl
    test-inventory.jsonl
    coverage.jsonl
    candidates.jsonl
    validations.jsonl
    critic.jsonl
    disproven.jsonl
    commands.jsonl
  ledgers/
    inventory-summary.md
    candidate-summary.md
    validation-summary.md
    test-drift-review.md
    strengths-ledger.md
    review-depth-plan.md
    final-critic-check.md
  review-report.md
```

Before writing under `.swarm/`, verify `.swarm/` is ignored or locally excluded. If tracked `.swarm` files exist, warn and record the fact in `metadata.json`.

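One way the `.swarm/` preflight could be sketched, assuming `git` is on `PATH`. It relies on two standard git commands: `git check-ignore -q`, which exits 0 when the path is ignored, and `git ls-files`, which lists tracked paths. The function name `swarm_preflight` and the returned keys are hypothetical, and edge cases such as a `.swarm/` directory that does not exist yet are not handled here.

```python
from __future__ import annotations

import subprocess
from pathlib import Path
from typing import Any, Callable


def git(args: list[str], cwd: Path) -> Any:
    """Run a git subcommand; a non-zero exit is returned, not raised."""
    return subprocess.run(
        ["git", *args], cwd=cwd, capture_output=True, text=True, timeout=5
    )


def swarm_preflight(
    repo: Path, runner: Callable[[list[str], Path], Any] = git
) -> dict[str, Any]:
    """Collect the facts the protocol asks for before writing under .swarm/."""
    # Exit status 0 means at least one given path is ignored; 1 means none are.
    ignored = runner(["check-ignore", "-q", ".swarm/"], repo).returncode == 0
    # Any output here means files under .swarm are tracked in the index.
    listing = runner(["ls-files", "--", ".swarm"], repo).stdout
    tracked = [path for path in listing.splitlines() if path]
    return {
        "swarm_ignored": ignored,
        "tracked_swarm_files": tracked,
        "warn": bool(tracked),  # hypothetical flag to surface in metadata.json
    }
```

The `runner` parameter exists only so the check can be exercised without a real repository.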
## Phase 0 safe ordering

1. Run 0A alone.
2. After 0A, run 0B and 0C in parallel only if the repository is large enough to benefit.
3. After 0B, run 0D and 0E in parallel only if 0E can leave `linked_claims` blank for Architect linking in 0J. Otherwise run 0D before 0E.
4. Preferred batch order: batch 1 = 0F and 0G; batch 2 = 0H and 0I. Never exceed two Phase 0 agents.
5. Run 0F after 0E when possible.
6. Run 0G after 0B and 0C.
7. Run 0H and 0I after 0B and 0C.
8. Run 0J only after all applicable 0B-0I ledgers exist.
9. Run 0K after 0J. Stop for user track selection unless preselected.
10. Run 0L after track selection and before Phase 1 candidate generation. 0L is the last Phase 0 step before Phase 1.

Do not run dependent inventory passes merely to keep agents busy. Missing dependency context is `unknown`, not guessed.

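The ordering rules above amount to a small dependency graph with a concurrency cap of two. The sketch below encodes one reading of those rules with the standard-library `graphlib.TopologicalSorter`. The `DEPS` table and the `phase0_batches` helper are illustrative assumptions. The conditional clauses (parallelism "only if the repository is large enough", the `linked_claims` condition for 0D/0E, and the stop for user track selection after 0K) are not modeled.

```python
from __future__ import annotations

from graphlib import TopologicalSorter

# Step -> steps that must finish first, as read from the numbered list above.
DEPS: dict[str, set[str]] = {
    "0A": set(),
    "0B": {"0A"},
    "0C": {"0A"},
    "0D": {"0B"},
    "0E": {"0B"},
    "0F": {"0E"},
    "0G": {"0B", "0C"},
    "0H": {"0B", "0C"},
    "0I": {"0B", "0C"},
    "0J": {"0B", "0C", "0D", "0E", "0F", "0G", "0H", "0I"},
    "0K": {"0J"},
    "0L": {"0K"},
}


def phase0_batches(deps: dict[str, set[str]], max_agents: int = 2) -> list[list[str]]:
    """Group Phase 0 steps into sequential batches of at most `max_agents`."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    batches: list[list[str]] = []
    ready: list[str] = []
    while sorter.is_active():
        # Newly unblocked steps join any steps still waiting for an agent slot.
        ready.extend(sorter.get_ready())
        ready.sort()
        batch, ready = ready[:max_agents], ready[max_agents:]
        batches.append(batch)
        sorter.done(*batch)
    return batches


if __name__ == "__main__":
    for number, batch in enumerate(phase0_batches(DEPS), start=1):
        print(number, " + ".join(batch))
```

With this table the batches come out as 0A; 0B and 0C; 0D and 0E; 0F and 0G; 0H and 0I; then 0J, 0K, and 0L one at a time, which agrees with the preferred batch order in rule 4.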
## Phase 0 inventory

### 0A — Bootstrap and prior context

Architect reads directly. Capture current directory, git branch/head/status, prior reports (`qa-report.md`, `enhancement-report.md`, `.swarm/review-*`, `OPENCODE.md`, `CLAUDE.md`, `AGENTS.md`), package manager signals, language/workspace roots, and review type: fresh, continuation, or update.

### 0B — Directory and entry point map

Explorer maps top-level directories, source roots two levels deep, likely app/server/CLI/UI/worker/test/build entry points, generated/vendored/dependency/artifact paths, and approximate reviewable file counts. No architecture judgment.

### 0C — Manifest, dependency, tooling, and CI inventory

Explorer reads every manifest, lockfile, build script, package-manager metadata, CI workflow, Docker/container file, dependency update config, and release tool. Extract raw facts only: package manager, runtime constraints, scripts, direct dependencies, observed import/manifest mismatches, CI gates, lockfiles, provenance/attestation/signing signals. Do not judge dependency risk until Track B.

Run safe deterministic tools when available: package-manager list, lockfile integrity checks, typecheck/lint dry runs, dependency audit, OSV or equivalent, CodeQL/Semgrep if already configured, and MCP/tool scanners if AI surfaces exist. Record commands and outputs in `commands.jsonl`.

### 0D — Documentation, claims, and obligations ledger

Explorer reads README, docs, changelog, release notes, migration notes, examples, comments describing public behavior, supplied PR/issue text, and test names that claim behavior. Extract claims verbatim. Do not decide truth.

### 0E — Public surface inventory

Explorer identifies routes, controllers, commands, public exports, SDK APIs, event handlers, schemas, migrations, config keys, environment variables, jobs, queues, plugin hooks, extension points, and MCP tool/resource surfaces. Record input shapes, output shapes, auth/permission signals if locally visible, and wiring targets.

### 0F — Trust boundary and data flow inventory

Explorer maps ingress to sensitive sinks. Include HTTP, WebSocket, CLI args, environment variables, files/uploads, forms, IPC, queues, webhooks, plugins, browser storage, database reads, subprocess output, LLM prompts, retrieval context, tool schemas, MCP servers, and model outputs. Record guard/auth signals as `unknown` unless visible in the same local code region.

### 0G — Test, quality gate, and drift inventory

Test engineer, if available, inventories frameworks, commands, roots, fixtures, mocks, coverage, mutation/property/e2e/snapshot tools, CI gates, test names/comments that claim behavior, and obvious surface/test gaps.

### 0H — UI, UX, and design system inventory

Designer or Explorer determines whether UI exists and inventories UI type, framework, component/page roots, styling system, token/theme files, component library defaults, accessibility tooling, visual testing, Storybook/screenshots/design docs, and structural design signals. No critique yet.

### 0I — AI, agent, and model surface inventory

Run if 0B or 0C found AI-related names or packages (`ai`, `llm`, `prompt`, `agent`, `model`, `openai`, `anthropic`, `embedding`, `vector`, `rag`, `mcp`, `tool`, `eval`). Inventory model calls, prompts, tools, function schemas, MCP servers, autonomous loops, memory, retrieval, vector stores, evaluators, moderation, output parsers, user-controlled prompt/tool inputs, downstream sinks, limits, retries, budgets, and chain depth.

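The 0I trigger can be approximated with a keyword scan over the names collected in 0B and 0C. The keyword list is the one given above. The matching rule, whole tokens split on non-alphanumeric characters with a simple trailing-`s` plural allowance, is an assumption: the protocol does not say how names are matched, and plain substring matching would fire on words such as `main` or `toolbar`.

```python
from __future__ import annotations

import re

# Keyword list copied from the 0I trigger condition.
AI_KEYWORDS = frozenset({
    "ai", "llm", "prompt", "agent", "model", "openai", "anthropic",
    "embedding", "vector", "rag", "mcp", "tool", "eval",
})


def ai_inventory_triggers(names: list[str]) -> set[str]:
    """Return the 0I keywords found as whole tokens in paths or package names."""
    hits: set[str] = set()
    for name in names:
        for token in re.split(r"[^a-z0-9]+", name.lower()):
            # Accept the token itself or its singular form ("agents" -> "agent").
            for candidate in (token, token.removesuffix("s")):
                if candidate in AI_KEYWORDS:
                    hits.add(candidate)
    return hits
```

An empty result would mean 0I can be recorded as `NOT_APPLICABLE` with the scanned names as the reason; any hit means the inventory should run.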
### 0J — Architect synthesis

Create `source-of-truth-packet.md`, `repository-context-packet.md`, and `ledgers/inventory-summary.md`. Do not add unquoted repo facts. Verify every required Phase 0 ledger exists and is non-empty or contains explicit `NOT_APPLICABLE` reason.

Minimum adequacy gate: if fewer than five non-`NOT_APPLICABLE`, non-empty structured blocks exist across applicable Phase 0 ledgers, or inventory is too sparse to support selected scope, stop and report limitation.

The source-of-truth packet must include repo identity, tech stack, commands, public surfaces, trust boundaries, MCP/agent surfaces, claims needing verification, test gates, UI applicability, AI applicability, recommended track, and prohibited assumptions.

The repository-context packet must be concise and global: architectural style, key modules and responsibilities, primary data flows, trust boundaries, notable tech decisions, and cross-cutting patterns visible from quoted Phase 0 inventory.

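The countable half of the minimum adequacy gate can be sketched as below. It assumes each ledger has already been split into structured blocks (lists of strings) and that a not-applicable block starts with the literal `NOT_APPLICABLE`; both are illustrative assumptions. The second condition, inventory "too sparse to support selected scope", is a judgment call and is not modeled.

```python
from __future__ import annotations

MIN_USABLE_BLOCKS = 5  # "fewer than five" stops the run


def passes_adequacy_gate(ledgers: dict[str, list[str]]) -> bool:
    """Return True when at least five usable structured blocks exist overall."""
    usable = sum(
        1
        for blocks in ledgers.values()
        for block in blocks
        # Skip empty blocks and blocks that only declare NOT_APPLICABLE.
        if block.strip() and not block.strip().startswith("NOT_APPLICABLE")
    )
    return usable >= MIN_USABLE_BLOCKS
```

A `False` result corresponds to the "stop and report limitation" branch of 0J.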
### 0K — User review mode gate

Stop and present the ten review choices unless the user’s original request already selected tracks and explicitly authorized continuing. If the user selects a focused review, do not run unrelated tracks; record omitted tracks in coverage notes.

### 0L — Review depth plan

After track selection and before candidate generation, write `ledgers/review-depth-plan.md` using the `TRACK_DEPTH_PLAN` block. This is the binding execution plan for selected-track depth.

Rules:

- Focused mode must show how unused breadth becomes deeper pass structure for the selected track.
- Multi-track and complete-integrated modes must show that every selected track keeps the same closure gate it would have had as a focused review.
- If the plan cannot allocate a full-depth path for a selected track, stop before Phase 1 and report the blocker instead of running a diluted review.
- Phase 5 final critic must compare the completed run to this plan.

## Phase 1 — Candidate generation

Every dispatch includes selected track(s), exact file list or surface IDs, source-of-truth packet, repository-context packet, relevant ledgers, the applicable `TRACK_DEPTH_PLAN`, candidate format, `out_of_scope_note` rule, and anti-cursory/non-dilution reminder.

File-size rule: no more than 15 files per deep pass; no more than 8 dense files per deep pass. Dense = >300 logical lines, multiple unrelated responsibilities, or interleaved UI/state/network/security logic. No sampling inside assigned scope. Large selections require more deep passes, not larger batches or lower depth.

Candidate micro-loop:

```text
1. What exact line or config proves current state?
2. What claim, contract, boundary, or quality standard is it compared against?
3. What alternative interpretation would make the concern false?
4. Did I check that alternative interpretation?
5. Is there still at least MEDIUM confidence?
6. Grounding check: does the candidate align precisely with quoted context without overclaim, missing surrounding logic, or unsupported inference? Rate HIGH / MEDIUM / LOW.
7. If yes and grounding is not LOW, emit candidate. Otherwise record uncertainty only.
```

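The file-size rule and the last steps of the micro-loop can be sketched as follows. The limits (15 files, 8 dense files, more than 300 logical lines) come from the text above. The `ReviewFile` record, the greedy in-order packing, and the function names are assumptions. The two qualitative density criteria are collapsed into a single reviewer-supplied flag.

```python
from __future__ import annotations

from dataclasses import dataclass

MAX_FILES_PER_PASS = 15
MAX_DENSE_PER_PASS = 8
DENSE_LINE_THRESHOLD = 300


@dataclass(frozen=True)
class ReviewFile:
    path: str
    logical_lines: int
    mixed_responsibilities: bool = False  # reviewer judgment, not computed here

    @property
    def dense(self) -> bool:
        return self.logical_lines > DENSE_LINE_THRESHOLD or self.mixed_responsibilities


def plan_deep_passes(files: list[ReviewFile]) -> list[list[ReviewFile]]:
    """Pack files into deep passes in input order; every file lands in a pass."""
    passes: list[list[ReviewFile]] = []
    current: list[ReviewFile] = []
    dense_count = 0
    for item in files:
        over_files = len(current) >= MAX_FILES_PER_PASS
        over_dense = item.dense and dense_count >= MAX_DENSE_PER_PASS
        if current and (over_files or over_dense):
            # Start a new pass instead of growing the batch or dropping files.
            passes.append(current)
            current, dense_count = [], 0
        current.append(item)
        dense_count += item.dense
    if current:
        passes.append(current)
    return passes


def should_emit_candidate(confidence: str, grounding: str) -> bool:
    """Steps 5 to 7: at least MEDIUM confidence and grounding that is not LOW."""
    return confidence in {"MEDIUM", "HIGH"} and grounding in {"MEDIUM", "HIGH"}
```

Packing in input order keeps files that were listed together (for example, one route chain) in the same pass where the limits allow; a larger selection only ever produces more passes.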
### Track A — Functionality, correctness, and claims-vs-shipped

Run for modes 1, 2, 4, or custom behavior review. Build one coverage unit for every public surface. A surface counts as `REVIEWED` only when its entry point has been read, its implementation traced, its tests checked, its claims compared, and evidence captured.

Check wiring/reachability, claims vs implementation, logic correctness, async correctness, persistence/data-model drift, feature flags/config drift, cross-platform assumptions, error handling, timeouts, and happy-path-only behavior.

### Track B — Security, privacy, LLM/MCP security, and supply chain

Run for modes 1, 2, 3, or custom security review. Build one coverage unit for every trust boundary and every AI surface. In focused Track B mode, split complex boundaries by ingress, guard, sink, privilege context, data sensitivity, deployment/runtime context, and dependency or CI provenance family. A boundary counts as `REVIEWED` only when source, guard, sink, impact, callers, authz, exploitability/disproof path, relevant tests, deterministic scanner/dependency checks, and safe runtime validation have all been checked.

Apply OWASP ASVS 5.0.0 for web controls. Apply OWASP Top 10 for LLM Applications 2025 for LLM/agent/RAG/MCP surfaces: prompt injection, sensitive information disclosure, supply chain, data/model poisoning, improper output handling, excessive agency, system prompt leakage, vector/embedding weaknesses, misinformation, and unbounded consumption.

MCP-specific checks: tool description poisoning, hidden instructions in tool metadata, untrusted resource content, context exfiltration to tools/logs, server-chain lateral movement, missing allow-lists, missing per-session permissions, arbitrary server URLs, and anomalous request/response behavior.

Supply-chain checks: phantom imports, undeclared dependencies, non-existent packages, typosquatting/dependency confusion/slopsquatting, unbounded ranges, install scripts, binary downloads, native addons, pinned actions, token scopes, artifact signing, SLSA v1.2 provenance/attestation, dependency update tooling, and OpenSSF Scorecard-style hygiene.

### Track C — Testing and test quality

Run for modes 1, 2, 5, or custom testing review. Build coverage units for test clusters, fixture/helper clusters, and public surfaces with test implications. In focused Track C mode, split by behavior domain, fixture/helper family, mocking boundary, assertion style, and negative/edge-case family. Passing tests and coverage percentages are not proof. Test names are claims.

Check behavior vs implementation assertions, stale mocks/fixtures, weak assertions, snapshot masking, missing negative/edge cases, async test correctness, isolation leakage, mutation resilience, property-based opportunities, CI gates, and whether tests would fail for the claimed bug.

### Track D — UI/UX and accessibility

Run for modes 1, 2, 6, or custom UI review only if 0H found UI. Build coverage units for every component family. In focused Track D mode, split by page/route, interaction flow, component family, state variant, responsive breakpoint, accessibility mechanism, and design-token dependency. All UI passes must read component files, not infer from names.

Apply WCAG 2.2 AA. Check visual hierarchy, layout, primary actions, information architecture, interaction feedback, keyboard/focus/ARIA/contrast, typography, responsive behavior, loading/empty/error states, UI performance, consistency, design tokens, and evidence-backed unmodified AI-scaffold defaults. Never report vibe-based UI slop.

### Track E — Performance and observability

Run for modes 1, 2, 7, or custom performance/observability review. Build coverage units for hot paths, startup paths, I/O paths, resource-heavy jobs, and telemetry boundaries. In focused Track E mode, split by operation class, input cardinality, resource dimension, deployment lifecycle, and telemetry signal path; require measurement or conservative caveat for performance claims.

Check algorithmic complexity, synchronous/blocking work, memory growth, N+1 calls, caching, batching, retries/timeouts, startup cost, bundle size where applicable, logs, metrics, traces, context propagation, correlation IDs, error reporting, redaction, and production diagnosability.

### Track F — AI slop and code provenance

Run for modes 1, 2, 8, or custom AI/provenance review. Build coverage units for dependency families, recently added/generated-looking clusters only when evidence exists, repeated code patterns, public claims, tests, and AI/tool surfaces. In focused Track F mode, split by package ecosystem, API family, repeated abstraction pattern, generated-code signal with concrete evidence, claim family, mock-only test family, and AI/tool boundary.

Check phantom dependencies, hallucinated APIs, stale framework signatures, confident stubs, unsupported public claims, over-abstraction, duplicated semantic code, mock-only tests, context rot, security theater, slopsquatting, copy-paste drift, and UI scaffold defaults. Every Track F candidate requires an exact quote and a concrete consequence.

### Track G — Enhancement opportunities only

Run for mode 1, 9, or custom enhancement review. Do not hunt defects. Build coverage units by architecture/domain/component family. In focused Track G mode, split by architecture domain, code-quality cluster, developer workflow, resilience/observability concern, test improvement family, and UI improvement family when UI exists. Current code must be framed as working unless evidence proves a defect.

Evaluate architecture, code quality, simplification, developer experience, performance headroom, resilience, observability, test robustness, and UI/UX improvements. Report only high/medium-value opportunities unless user requests exhaustive low-value cleanup. Every final enhancement requires critic validation.

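The "Run for modes ..." lines in Tracks A through G imply a lookup from review mode to tracks. The sketch below writes that lookup out. The table values are taken from those lines; the function name `tracks_for_mode` and its `ui_found` parameter (standing in for the 0H result) are hypothetical. Mode 10 and other custom selections are user-specified and are not resolved here.

```python
from __future__ import annotations

# Track -> review modes that run it, from the "Run for modes ..." lines.
TRACK_MODES: dict[str, frozenset[int]] = {
    "A": frozenset({1, 2, 4}),
    "B": frozenset({1, 2, 3}),
    "C": frozenset({1, 2, 5}),
    "D": frozenset({1, 2, 6}),
    "E": frozenset({1, 2, 7}),
    "F": frozenset({1, 2, 8}),
    "G": frozenset({1, 9}),
}


def tracks_for_mode(mode: int, ui_found: bool = True) -> list[str]:
    """Return the tracks a numbered review mode (1 to 9) selects."""
    if mode == 10:
        raise ValueError("mode 10 is a custom combination; tracks come from the user")
    if not 1 <= mode <= 9:
        raise ValueError(f"unknown review mode: {mode}")
    tracks = sorted(track for track, modes in TRACK_MODES.items() if mode in modes)
    # Track D runs only if the 0H inventory found a UI.
    if not ui_found:
        tracks = [track for track in tracks if track != "D"]
    return tracks
```

For example, mode 2 resolves to tracks A through F, and mode 9 resolves to track G alone.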
### Phase 1X — Cross-boundary review

Run when two or more tracks ran and quoted cross-track evidence can be compared. For multi-track/all-track reviews, this pass is mandatory unless there is an explicit `NOT_APPLICABLE` reason proving no cross-track comparison is possible. Check caller/callee mismatches, UI/API/schema drift, docs/API/test drift, auth assumptions across middleware/handlers, config-name drift, shared-state assumptions, generated type/schema drift, package scripts calling missing files, and AI prompt/tool boundaries crossing security sinks.

## Phase 2 — Reviewer validation

Validate candidates in small local reasoning batches: same file, route chain, subsystem, dependency family, public claim, trust boundary, UI component family, or test fixture/helper. Do not validate dozens of unrelated candidates together.

Reviewer must re-open exact file and line, read raw file independently before explorer paraphrase, read enough surrounding context, check callers/callees/tests/manifests/configs/schemas/routes/generated files/docs, check mitigating controls, run safe minimal runtime validation where needed, recalibrate severity/value, record disproof reason, and mark `UNVERIFIED` when evidence is insufficient.

CRITICAL/HIGH confirmed or pre-existing findings route to inline critic. MEDIUM/LOW confirmed/pre-existing findings require reviewer finalization. Disproved and unverified items do not enter main findings.

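The routing rule in the last paragraph can be sketched as a small function. The severity names and `UNVERIFIED` appear in the protocol; the other status spellings (`CONFIRMED`, `PRE_EXISTING`, `DISPROVED`) and the returned route labels are hypothetical placeholders chosen for this example.

```python
from __future__ import annotations

EXCLUDED_STATUSES = {"DISPROVED", "UNVERIFIED"}
ROUTABLE_STATUSES = {"CONFIRMED", "PRE_EXISTING"}


def route_validated_item(severity: str, status: str) -> str:
    """Decide where a reviewer-validated defect goes after Phase 2."""
    if status in EXCLUDED_STATUSES:
        # Disproved and unverified items never enter main findings.
        return "excluded"
    if status not in ROUTABLE_STATUSES:
        raise ValueError(f"unknown status: {status}")
    if severity in {"CRITICAL", "HIGH"}:
        return "inline_critic"  # Phase 2C
    if severity in {"MEDIUM", "LOW"}:
        return "reviewer_finalization"  # Phase 2M
    raise ValueError(f"unknown severity: {severity}")
```

Checking the status before the severity means a CRITICAL item that could not be verified is still kept out of the main findings.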
## Phase 2C — Inline critic for CRITICAL/HIGH defects

Run immediately after each reviewer batch containing CRITICAL/HIGH confirmed/pre-existing findings. Critic checks whether the finding is real, severity justified, runtime validation sufficient, fix actionable, no mitigating control missed, no overclaim beyond evidence, and whether sibling coverage is required. Only `UPHELD`, `REFINED`, or `DOWNGRADED` items continue.

## Phase 2M — Reviewer finalization for MEDIUM/LOW defects

Reviewer confirms each item is not style preference, not severity-inflated, supported by evidence, actionable, and not mitigated. Only finalized/downgraded items continue.

## Phase 2E — Enhancement critic

Every report-eligible enhancement is challenged for evidence, value, concreteness, effort, complexity cost, style/intent fit, duplication, and merge/split/downgrade/reject decision. Only upheld/refined/merged/downgraded enhancements continue.

## Phase 3 — Test validation and drift review

Run if any selected track touches functionality, testing, security, public claims, CI, or behavior. If Track C did not run, limit to test drift arising from other findings. Confirm behavior assertions, fixture freshness, mock realism, snapshot quality, property-based opportunities, mutation resilience gaps, and focused commands run.

## Phase 4 — Architect synthesis

Synthesize only validated evidence. Drop disproved or overturned items. Keep unverified items only in coverage notes. Deduplicate findings that share a root cause. Merge repeated patterns only with evidence. Separate defects from enhancements, unsupported claims from code defects, and AI slop patterns from normal technical debt. Count rejected/unverified items. Create strengths ledger with quoted evidence only. Verify coverage closure. If any selected-track coverage unit is `UNASSIGNED` or `UNREVIEWED`, return to Phase 1. Verify completed artifacts against `ledgers/review-depth-plan.md`; if any selected track was diluted relative to its plan, return to the relevant phase or mark precise units blocked/skipped with reason.

## Phase 5 — Final whole-report critic

Before writing the final report, run an adversarial final critic against the planned synthesis. It must check evidence, validation routing, critic routing, severity/value calibration, defect/enhancement separation, unverified exclusion, strengths evidence, UI concreteness, security exploitability, performance measurement caveats, AI-slop evidence, claim ledger support, honest coverage notes, counts consistency, zero unreviewed coverage, selected-track completeness, and compliance with `ledgers/review-depth-plan.md` including focused-track expansion and multi-track non-dilution.

If verdict is `REVISE`, revise synthesis and rerun final critic until `PASS`.

## Phase 6 — Final report

Write `review-report.md` in the run directory only after the final critic returns `PASS`. Use `assets/review-report-template.md`. Final assistant response reports run path, selected tracks, coverage units closed, defect/enhancement counts, candidates filtered, final critic verdict, highest-risk confirmed findings, highest-value enhancements if applicable, coverage limitations, and “No source files were modified.”

@@ -0,0 +1,134 @@
#!/usr/bin/env python3
"""Create a codebase-review-swarm run directory without touching source files."""
from __future__ import annotations

import argparse
import datetime as dt
import json
from pathlib import Path
import re
import subprocess
import sys

ARTIFACTS = [
    "claims.jsonl",
    "surfaces.jsonl",
    "boundaries.jsonl",
    "ai-surfaces.jsonl",
    "ui-inventory.jsonl",
    "test-inventory.jsonl",
    "coverage.jsonl",
    "candidates.jsonl",
    "validations.jsonl",
    "critic.jsonl",
    "disproven.jsonl",
    "commands.jsonl",
]
LEDGERS = [
    "inventory-summary.md",
    "candidate-summary.md",
    "validation-summary.md",
    "test-drift-review.md",
    "strengths-ledger.md",
    "final-critic-check.md",
]
RUN_ID_RE = re.compile(r"^[A-Za-z0-9._-]{1,128}$")


def run(cmd: list[str], cwd: Path) -> str | None:
    try:
        return subprocess.check_output(
            cmd,
            cwd=cwd,
            stdin=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            text=True,
            timeout=5,
        ).strip()
    except Exception:
        return None


def git_root(cwd: Path) -> Path:
    out = run(["git", "rev-parse", "--show-toplevel"], cwd)
    return Path(out) if out else cwd


def validate_run_id(raw: str) -> str:
    if not RUN_ID_RE.fullmatch(raw) or raw in {".", ".."}:
        raise ValueError(
            "Invalid --run-id. Use 1-128 letters, numbers, dot, underscore, or dash; path segments are not allowed."
        )
    return raw


def resolve_run_dir(repo: Path, run_id: str) -> Path:
    runs_root = (repo / ".swarm" / "review-v8" / "runs").resolve()
    run_dir = (runs_root / run_id).resolve()
    try:
        run_dir.relative_to(runs_root)
    except ValueError as exc:
        raise ValueError("Invalid --run-id. Resolved run directory escapes .swarm/review-v8/runs.") from exc
    if run_dir == runs_root:
        raise ValueError("Invalid --run-id. Run id must name a child directory.")
    return run_dir


def is_swarm_ignored(repo: Path) -> bool:
    gitignore = repo / ".gitignore"
    if not gitignore.exists():
        return False
    lines = [line.strip() for line in gitignore.read_text(errors="ignore").splitlines()]
    return any(line in {".swarm", ".swarm/", "/.swarm", "/.swarm/"} for line in lines)


def main() -> int:
    parser = argparse.ArgumentParser()
    parser.add_argument("--root", default=".", help="repository root or working directory")
    parser.add_argument("--run-id", default=None, help="explicit run id; default UTC timestamp")
    parser.add_argument("--review-type", default="fresh", choices=["fresh", "continuation", "update"])
    args = parser.parse_args()

    cwd = Path(args.root).resolve()
    repo = git_root(cwd)
    try:
        run_id = validate_run_id(args.run_id or dt.datetime.utcnow().strftime("%Y%m%dT%H%M%SZ"))
        run_dir = resolve_run_dir(repo, run_id)
    except ValueError as exc:
        print(str(exc), file=sys.stderr)
        return 2
    artifacts_dir = run_dir / "artifacts"
    ledgers_dir = run_dir / "ledgers"
    artifacts_dir.mkdir(parents=True, exist_ok=True)
    ledgers_dir.mkdir(parents=True, exist_ok=True)

    for name in ARTIFACTS:
        (artifacts_dir / name).touch(exist_ok=True)
    for name in LEDGERS:
        p = ledgers_dir / name
        if not p.exists():
            p.write_text("", encoding="utf-8")

    metadata = {
        "run_id": run_id,
        "created_at_utc": dt.datetime.utcnow().replace(microsecond=0).isoformat() + "Z",
        "review_type": args.review_type,
        "repo_root": str(repo),
        "git_branch": run(["git", "branch", "--show-current"], repo),
        "git_head": run(["git", "rev-parse", "HEAD"], repo),
        "dirty_worktree": bool(run(["git", "status", "--porcelain"], repo)),
        "swarm_ignored": is_swarm_ignored(repo),
        "source_files_modified_by_skill": False,
    }
    (run_dir / "metadata.json").write_text(json.dumps(metadata, indent=2) + "\n", encoding="utf-8")
    (run_dir / "source-of-truth-packet.md").touch(exist_ok=True)
    (run_dir / "repository-context-packet.md").touch(exist_ok=True)

    print(str(run_dir))
    if not metadata["swarm_ignored"]:
        print("WARNING: .swarm/ was not found in .gitignore; record this in metadata and avoid committing review artifacts.", file=sys.stderr)
    return 0


if __name__ == "__main__":
    raise SystemExit(main())
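The script guards the run id in two layers: a character allowlist (`validate_run_id`), then a containment check on the resolved path (`resolve_run_dir`). The sketch below restates that idea in one standalone helper so it can be tried in isolation. `safe_child`, `is_accepted`, and `demo` are hypothetical names that are not part of the package, and `Path.is_relative_to` (Python 3.9+) is swapped in for the script's `relative_to`/`ValueError` pattern.

```python
from pathlib import Path
import re
import tempfile

RUN_ID_RE = re.compile(r"^[A-Za-z0-9._-]{1,128}$")  # same allowlist as the script


def safe_child(root: Path, name: str) -> Path:
    """Return root/name only if name is one safe segment that stays inside root."""
    # First layer: reject separators, empty or overlong ids, and the dot entries.
    if not RUN_ID_RE.fullmatch(name) or name in {".", ".."}:
        raise ValueError(f"rejected by allowlist: {name!r}")
    root = root.resolve()
    child = (root / name).resolve()
    # Second layer: catch anything that resolves outside root, e.g. via a symlink.
    if child == root or not child.is_relative_to(root):
        raise ValueError(f"escapes root: {name!r}")
    return child


def is_accepted(root: Path, name: str) -> bool:
    try:
        safe_child(root, name)
        return True
    except ValueError:
        return False


def demo() -> dict[str, bool]:
    candidates = ["20240101T000000Z", "..", "a/b", "", "x" * 129]
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        return {name: is_accepted(root, name) for name in candidates}
```

In `demo`, only the timestamp-style id is accepted; the other four are rejected by the allowlist before any path is resolved. Separately, `datetime.utcnow()`, which the script uses for the default run id and for `created_at_utc`, is deprecated as of Python 3.12. `dt.datetime.now(dt.timezone.utc)` is the timezone-aware replacement, though its `isoformat()` output already ends in `+00:00`, so the appended `"Z"` would need adjusting.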
@@ -0,0 +1,62 @@
#!/usr/bin/env python3
"""Validate the local Agent Skill package structure without external dependencies."""
from __future__ import annotations

from pathlib import Path
import re
import sys

REQUIRED = [
    "SKILL.md",
    "references/review-protocol-v8.2.md",
    "references/full-v7-source-prompt.md",
    "assets/jsonl-schemas.md",
    "assets/review-report-template.md",
]
NAME_RE = re.compile(r"^[a-z0-9]+(-[a-z0-9]+)*$")


def parse_frontmatter(text: str) -> dict[str, str]:
    if not text.startswith("---\n"):
        raise ValueError("SKILL.md missing YAML frontmatter")
    end = text.find("\n---", 4)
    if end == -1:
        raise ValueError("SKILL.md frontmatter not closed")
    fm = {}
    for line in text[4:end].splitlines():
        if not line or line.startswith(" ") or ":" not in line:
            continue
        k, v = line.split(":", 1)
        value = v.strip()
        if len(value) >= 2 and value[0] == value[-1] and value[0] in {"'", '"'}:
            value = value[1:-1]
        fm[k.strip()] = value
    return fm


def main() -> int:
    root = Path(sys.argv[1] if len(sys.argv) > 1 else ".").resolve()
    missing = [p for p in REQUIRED if not (root / p).exists()]
    if missing:
        print("missing required files:", ", ".join(missing), file=sys.stderr)
        return 1
    skill = (root / "SKILL.md").read_text(encoding="utf-8")
    fm = parse_frontmatter(skill)
    for field in ["name", "description"]:
        if field not in fm or not fm[field]:
            print(f"missing frontmatter field: {field}", file=sys.stderr)
            return 1
    if not NAME_RE.match(fm["name"]):
        print("invalid skill name", file=sys.stderr)
        return 1
    if fm["name"] != root.name:
        print(f"warning: directory name {root.name!r} does not match skill name {fm['name']!r}", file=sys.stderr)
    if len(fm["description"]) > 1024:
        print("description exceeds 1024 chars", file=sys.stderr)
        return 1
    print("skill package OK")
    return 0


if __name__ == "__main__":
    raise SystemExit(main())
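`parse_frontmatter` is a flat `key: value` line reader rather than a YAML parser, which matters for multi-line values. The sketch below copies the function so the example is self-contained and feeds it a folded scalar (`description: >`). `SAMPLE` and `demo` are hypothetical names, and the sample assumes the continuation line is indented, as YAML requires for a folded block scalar.

```python
def parse_frontmatter(text: str) -> dict[str, str]:
    # Same logic as the validator, restated here so the example runs on its own.
    if not text.startswith("---\n"):
        raise ValueError("SKILL.md missing YAML frontmatter")
    end = text.find("\n---", 4)
    if end == -1:
        raise ValueError("SKILL.md frontmatter not closed")
    fm: dict[str, str] = {}
    for line in text[4:end].splitlines():
        # Blank lines, indented continuation lines, and lines without a colon are skipped.
        if not line or line.startswith(" ") or ":" not in line:
            continue
        k, v = line.split(":", 1)
        value = v.strip()
        # Strip one pair of matching surrounding quotes.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in {"'", '"'}:
            value = value[1:-1]
        fm[k.strip()] = value
    return fm


# Hypothetical sample input with a folded scalar for the description.
SAMPLE = (
    "---\n"
    "name: consult\n"
    "description: >\n"
    "  Full execution protocol for MODE: CONSULT.\n"
    "---\n"
    "\n"
    "# Consult Protocol\n"
)


def demo() -> dict[str, str]:
    return parse_frontmatter(SAMPLE)
```

With this input the parsed `description` is the single character `>`, because the indented continuation line is skipped. The non-empty check still passes, and the 1024-character limit is applied to that one character rather than to the folded text.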
@@ -0,0 +1,16 @@
---
name: consult
description: >
  Full execution protocol for MODE: CONSULT -- answering advisory questions with bounded evidence and clear uncertainty.
---

# Consult Protocol

This protocol is loaded on demand by the architect stub in src/agents/architect.ts. The architect prompt keeps only activation, action, and hard safety constraints; the full execution details live here.

### MODE: CONSULT
Check .swarm/context.md for cached guidance first.
Identify 1-3 relevant domains from the task requirements.
Call the active swarm's sme agent once per domain, serially. Max 3 SME calls per project phase.
Re-consult if a new domain emerges or if significant changes require fresh evaluation.
Cache guidance in context.md.
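The CONSULT steps (check the cache, pick 1-3 domains, call the SME serially within a budget of three calls per phase, then cache the result) can be sketched as follows. Everything here is illustrative: `consult`, `call_sme`, and the in-memory `cache` dict standing in for `.swarm/context.md` are hypothetical, and treating a cache hit as not consuming an SME call is an assumption rather than something the protocol states.

```python
from typing import Callable

MAX_SME_CALLS_PER_PHASE = 3  # "Max 3 SME calls per project phase" in the protocol


def consult(
    domains: list[str],
    cache: dict[str, str],           # hypothetical stand-in for .swarm/context.md
    call_sme: Callable[[str], str],  # hypothetical SME agent call
    calls_used: int = 0,
) -> tuple[dict[str, str], int]:
    """Return guidance per domain and the updated SME call count for the phase."""
    if not 1 <= len(domains) <= 3:
        raise ValueError("identify 1-3 relevant domains")
    guidance: dict[str, str] = {}
    for domain in domains:           # serial: one domain at a time
        if domain in cache:          # assumption: cached guidance skips the SME call
            guidance[domain] = cache[domain]
            continue
        if calls_used >= MAX_SME_CALLS_PER_PHASE:
            raise RuntimeError("SME call budget for this phase is exhausted")
        answer = call_sme(domain)
        cache[domain] = answer       # cache guidance for later consults
        guidance[domain] = answer
        calls_used += 1
    return guidance, calls_used


def demo() -> tuple[dict[str, str], int, list[str]]:
    calls: list[str] = []

    def fake_sme(domain: str) -> str:
        calls.append(domain)
        return f"guidance for {domain}"

    cache = {"auth": "cached auth guidance"}
    guidance, used = consult(["auth", "storage"], cache, fake_sme)
    return guidance, used, calls
```

In `demo`, only `storage` reaches the fake SME because `auth` is already cached, so one of the three calls for the phase is consumed.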