@groupby/ai-dev 0.5.1 → 0.5.3

package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "@groupby/ai-dev",
- "version": "0.5.1",
+ "version": "0.5.3",
  "description": "Interactive installer for Rezolve Ai development content",
  "type": "module",
  "bin": {
@@ -0,0 +1,306 @@
+ # SNPD Team — SDLC Skills
+
+ A set of explicitly-invoked skills that walk a code change through the full
+ software-development lifecycle — from a Jira ticket to a draft pull request —
+ with a clean, auditable artifact at every step.
+
+ Each skill does **one** job, hands off a file, and **stops**. Nothing runs
+ automatically; every command must be explicitly invoked (`/jira-spec`,
+ `/draft-plan`, etc.). The human stays in control of every transition, and each
+ stage leaves a durable artifact the next stage reads.
+
+ ---
+
+ ## The Pipeline
+
+ ```
+ Jira ticket
+
+
+ ┌──────────────┐ writes .claude/artifacts/{KEY}_{slug}-spec.md
+ │ jira-spec │ ──────────▶ (verbatim ticket + AI repo context)
+ └──────────────┘
+
+
+ ┌──────────────┐ reads spec, discusses, writes on approval
+ │ draft-plan │ ──────────▶ .claude/artifacts/{KEY}_{slug}-plan.md
+ └──────────────┘ (high-level plan, no code)
+
+
+ ┌──────────────────┐ reads plan, branch + strict TDD, local commits
+ │ tdd-implement │ ─────▶ feature branch {KEY}-{slug}
+ └──────────────────┘ + appends Impl Details / Post-Mortem to plan
+
+
+ ┌──────────────────┐ 3 LLM models review the diff, vote on findings
+ │ council-review │ ─────▶ consensus-ranked review (in chat)
+ └──────────────────┘
+
+
+ ┌──────────────┐ pushes branch, opens DRAFT PR on GitHub
+ │ draft-pr │ ─────▶ draft pull request
+ └──────────────┘
+
+ ┌─────────────┐
+ │ docs-init │ (independent — runs on any repo at any time)
+ └─────────────┘
+ ```
+
+ The first five skills form a **linear pipeline** where each stage's output
+ becomes the next stage's input. `docs-init` is **independent** and can be run
+ on any repo to initialize or refresh project documentation.
+
+ ---
+
+ ## Skills
+
+ ### 1. `jira-spec` — Capture the ticket
+
+ | | |
+ |---|---|
+ | **Trigger** | `/jira-spec S4R-1234` or a Jira URL |
+ | **Reads** | Jira ticket (via Atlassian MCP, go-jira CLI, or curl) |
+ | **Produces** | `.claude/artifacts/{KEY}_{slug}-spec.md` |
+ | **Stops before** | Planning |
+
+ Fetches a Jira ticket and writes a development spec. The spec preserves the
+ original ticket description **verbatim** (for audit) and adds AI-gathered
+ repo context (touch points, patterns, test surface) from a **medium-depth**
+ repo scan. If Jira is unreachable, the skill refuses to proceed — it never
+ fabricates ticket content.
+
+ **Key rules:**
+ - Verbatim means verbatim — no paraphrasing the Jira description
+ - One spec file per ticket
+ - No invented metadata or acceptance criteria
+ - Does not start coding or planning
+
+ ---
+
+ ### 2. `draft-plan` — Decide the approach
+
+ | | |
+ |---|---|
+ | **Trigger** | `/draft-plan S4R-1234` or path to spec file |
+ | **Reads** | `…-spec.md` from `.claude/artifacts/` |
+ | **Produces** | `.claude/artifacts/{KEY}_{slug}-plan.md` |
+ | **Stops before** | Implementation |
+
+ Turns a spec into a discussed, approved, high-level implementation plan.
+ Performs a **deep** codebase review (deeper than jira-spec) — reads relevant
+ files end-to-end, traces data flow across layers, checks architecture and
+ conventions docs. Presents the plan in chat (Goal, Approach, Affected areas,
+ Sequencing, Test strategy, Risks, Out of scope) and **discusses it with the
+ user** before writing.
+
+ **Key rules:**
+ - High-level by default — no line numbers, no code snippets
+ - Every cited file path is verified to exist in the repo
+ - The plan must trace back to the spec's acceptance criteria
+ - Only writes the plan file on explicit user approval
+ - Never starts implementation
+
+ ---
+
+ ### 3. `tdd-implement` — Build with strict TDD
+
+ | | |
+ |---|---|
+ | **Trigger** | `/tdd-implement S4R-1234` or path to plan file |
+ | **Reads** | `…-plan.md` from `.claude/artifacts/` + `docs/conventions.md`, `docs/architecture.md` |
+ | **Produces** | Feature branch `{KEY}-{slug}` with squashed commit; plan file updated with Implementation Details + Post-Mortem |
+ | **Stops before** | Push / PR |
+
+ Implements an approved plan using strict, phase-bound TDD on a feature
+ branch. For each phase: writes failing tests → commits → implements until
+ green → commits. Supports resuming interrupted sessions. Runs the project's
+ full verification gate before declaring completion.
+
+ **Key rules:**
+ - Never assumes the default branch is `main` — detects dynamically
+ - Strict TDD per phase (never batch tests up-front)
+ - Every commit message starts with `{KEY}:`
+ - Discoveries classified as trivial/plan-affecting/spec-affecting with
+ appropriate handling
+ - Never pushes, never opens a PR
+ - Squash via `git reset --soft`, not interactive rebase
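The branch-detection and squash rules above can be sketched in Python. This is a minimal illustration under stated assumptions, not the skill's actual implementation: the source does not say how the default branch is detected, so reading `refs/remotes/origin/HEAD` is one common approach chosen here, and `parse_default_branch`, `detect_default_branch`, and `squash_commands` are hypothetical names.

```python
import subprocess


def parse_default_branch(symbolic_ref: str) -> str:
    """Extract the branch name from a `refs/remotes/origin/<branch>` ref."""
    prefix = "refs/remotes/origin/"
    ref = symbolic_ref.strip()
    if not ref.startswith(prefix):
        raise ValueError(f"unexpected ref: {ref!r}")
    return ref[len(prefix):]


def detect_default_branch(repo: str = ".") -> str:
    """Ask git where origin/HEAD points (assumes origin/HEAD is set locally)."""
    out = subprocess.run(
        ["git", "-C", repo, "symbolic-ref", "refs/remotes/origin/HEAD"],
        check=True, capture_output=True, text=True,
    ).stdout
    return parse_default_branch(out)


def squash_commands(base_commit: str, key: str, summary: str) -> list[list[str]]:
    """Build (but do not run) the soft-reset squash for a feature branch.

    `base_commit` is assumed to be the merge base with the default branch,
    e.g. the output of `git merge-base origin/<default> HEAD`.
    """
    return [
        ["git", "reset", "--soft", base_commit],
        ["git", "commit", "-m", f"{key}: {summary}"],
    ]
```

A soft reset keeps the working tree and index intact, so the single follow-up commit carries every change from the branch without an interactive rebase.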
+
+ ---
+
+ ### 4. `council-review` — Multi-model review before the PR
+
+ | | |
+ |---|---|
+ | **Trigger** | `/council-review`, "council review my changes", "multi-model review" |
+ | **Reads** | The diff (local changes, staged, branch, last N commits, or a PR) + repo config/conventions |
+ | **Produces** | A single consensus-ranked review **in chat** (no file artifact) |
+ | **Stops before** | Pushing / opening a PR |
+
+ Runs a high-signal, consensus-driven code review by spawning **three
+ independent `code-review` sub-agents on different LLM models**
+ (`claude-opus-4.8`, `gpt-5.3-codex`, `gpt-5.5`) — inspired by
+ [Karpathy's LLM Council](https://github.com/karpathy/llm-council). All three
+ get the **identical** prompt and diff, then their findings are deduplicated,
+ scored by agreement (🟢 unanimous / 🟡 majority / 🔵 solo), and ranked by
+ agreement then severity. Different models catch different things, so requiring
+ consensus drops noise and raises signal.
+
+ This is the **quality gate between `tdd-implement` and `draft-pr`**: review the
+ implemented diff, address the findings, then open the draft PR with confidence.
+
+ **Key rules:**
+ - Identical prompts for all reviewers — fair comparison is the whole point
+ - No model bias — agreement count is the only ranking signal
+ - Always launches all 3 agents in parallel, never sequentially
+ - Always shows which models agreed on each finding (transparency)
+ - Never generates hallucinated replacement code during synthesis
+ - On severity disagreement, uses the highest severity any reviewer assigned
+
+ ---
+
+ ### 5. `draft-pr` — Open the draft PR
+
+ | | |
+ |---|---|
+ | **Trigger** | `/draft-pr` |
+ | **Reads** | Plan file (preferred), spec file (for AC), or commit history (fallback) |
+ | **Produces** | Draft pull request on GitHub |
+ | **Stops before** | Marking ready / assigning reviewers |
+
+ Pushes the current branch and opens a **draft** pull request on GitHub.
+ Resolves the PR template (`.github/PULL_REQUEST_TEMPLATE.md`), derives
+ content from the plan and spec files, shows a full preview for confirmation,
+ then pushes (fast-forward only) and creates the draft PR.
+
+ **Key rules:**
+ - Always `--draft` — never marks the PR ready
+ - Shows a full preview before any push or PR creation
+ - Leaves all template `- [ ]` checkboxes unchecked
+ - Never assigns reviewers, labels, or milestones
+ - Never force-pushes
+ - Never invents acceptance criteria or test results
+
+ ---
+
+ ### 6. `docs-init` — Initialize or refresh project docs
+
+ | | |
+ |---|---|
+ | **Trigger** | `/docs-init` or "update the docs", "refresh project documentation" |
+ | **Reads** | The entire repo (build files, source, CI, deploy configs, etc.) |
+ | **Produces** | `README.md`, `CLAUDE.md`, and `docs/` folder (architecture, local-setup, conventions, operations, decisions) |
+ | **Stops before** | Writing anything — presents an audit report first |
+
+ Initializes or refreshes the canonical documentation structure. Works in
+ two modes:
+
+ - **Init mode** — `docs/` is missing or incomplete. Drafts all seven
+ canonical files from a fresh repo scan.
+ - **Refresh mode** — `docs/` exists. Audits each file for outdated content,
+ drift, and new opportunities.
+
+ Includes a **verification checklist** — every command, config key, directory
+ description, and workflow trigger is confirmed against actual source files
+ before inclusion. Presents a human-readable audit report and applies changes
+ only after explicit user approval.
+
+ **Key rules:**
+ - Never writes files before user approval
+ - `decisions.md` is append-only
+ - Preserves `> **Scope:**` blocks on every edit
+ - Verifies every factual claim against source
+ - `CLAUDE.md` ≤ 40 lines, `README.md` ≤ 30 lines
+
+ ---
+
+ ## How the Skills Connect
+
+ ### Artifact Flow
+
+ ```
+ Jira Ticket
+
+ │ /jira-spec
+
+ {KEY}_{slug}-spec.md ◄── verbatim ticket + medium-depth repo context
+
+ │ /draft-plan
+
+ {KEY}_{slug}-plan.md ◄── approved high-level plan
+
+ │ /tdd-implement
+ ├──▶ feature branch ◄── squashed commit with code changes
+
+ {KEY}_{slug}-plan.md ◄── updated with Implementation Details + Post-Mortem
+
+ │ /council-review
+
+ Consensus review (in chat) ◄── 3 models vote; address findings before the PR
+
+ │ /draft-pr
+
+ Draft Pull Request ◄── body sourced from plan + spec
+ ```
+
+ All spec and plan artifacts live in `.claude/artifacts/` within the repo,
+ using a consistent naming convention: `{KEY}_{slug}-spec.md` →
+ `{KEY}_{slug}-plan.md`. The `-spec` and `-plan` suffixes are mandatory.
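The naming convention can be written as a small helper. This is a sketch rather than code from the package: `artifact_path`, `ARTIFACT_DIR`, and `ALLOWED_KINDS` are hypothetical names, and the slug is assumed to be normalized already.

```python
from pathlib import PurePosixPath

# Hypothetical constants; the skills describe the convention but no such API.
ARTIFACT_DIR = PurePosixPath(".claude/artifacts")
ALLOWED_KINDS = ("spec", "plan")  # the two mandatory suffixes


def artifact_path(key: str, slug: str, kind: str) -> str:
    """Build the repo-relative artifact path for a ticket key and slug."""
    if kind not in ALLOWED_KINDS:
        raise ValueError(f"kind must be one of {ALLOWED_KINDS}, got {kind!r}")
    return str(ARTIFACT_DIR / f"{key}_{slug}-{kind}.md")
```

For example, `artifact_path("S4R-10453", "mongo-cluster-routing", "spec")` yields `.claude/artifacts/S4R-10453_mongo-cluster-routing-spec.md`, and swapping `"spec"` for `"plan"` gives the matching plan file.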
+
+ ### Separation of Concerns
+
+ | Boundary | Meaning |
+ |---|---|
+ | Spec ≠ Plan | The spec records *what the ticket asks*; the plan decides *how to build it* |
+ | Plan ≠ Code | The plan describes the approach at the level of files and phases — no code |
+ | Build ≠ Ship | Implementation produces a local branch only — no push, no PR |
+ | Code ≠ Review | The council reviews the diff and surfaces findings — it never edits code or ships |
+ | PR ≠ Merge | The PR is always a *draft* — marking ready is the human's call |
+
+ Each boundary is a **human checkpoint**. No skill silently rolls into the
+ next. Approval at one stage authorizes *only* that stage's output.
+
+ ### docs-init Integration
+
+ `docs-init` is not part of the linear pipeline but supports it:
+
+ - **draft-plan** reads `docs/architecture.md`, `docs/conventions.md`, and
+ `docs/decisions.md` during its deep codebase review.
+ - **tdd-implement** loads `docs/conventions.md` and `docs/architecture.md`
+ before writing any code, and references them for style and patterns.
+ - **tdd-implement** may suggest entries for `docs/decisions.md` in its
+ Post-Mortem section (but never writes them directly).
+
+ Running `/docs-init` on a repo before starting the pipeline ensures the
+ plan and implementation stages have reliable documentation to reference.
+
+ ---
+
+ ## Typical End-to-End Run
+
+ ```bash
+ /jira-spec S4R-10453 # → S4R-10453_mongo-cluster-routing-spec.md
+ /draft-plan S4R-10453 # discuss → approve → …-plan.md
+ /tdd-implement S4R-10453 # branch + TDD + squash; plan updated
+ /council-review # 3-model review of the diff; address findings
+ /draft-pr # confirm → draft PR on GitHub
+ ```
+
+ ---
+
+ ## Conventions Shared Across All Skills
+
+ - **Explicit invocation only.** None auto-trigger on keywords.
+ - **Artifacts live in the repo.** All specs and plans go to
+ `.claude/artifacts/`, never `~/.claude/`.
+ - **Each stage stops at a human checkpoint.** No skill silently continues
+ into the next.
+ - **No fabrication.** Ticket text, acceptance criteria, file paths, and test
+ results are verified or quoted — never invented.
+ - **Source control hygiene.** Commits start with the Jira key (`{KEY}: …`).
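The commit-prefix convention lends itself to a mechanical check. The sketch below is illustrative only: the regular expression assumes Jira keys look like `S4R-10453` (an uppercase project code, a hyphen, then digits), and `commit_key` is a hypothetical helper that none of the skills define.

```python
import re
from typing import Optional

# Assumed key shape: uppercase letter, then letters/digits, hyphen, number.
COMMIT_SUBJECT = re.compile(r"^(?P<key>[A-Z][A-Z0-9]*-\d+): \S")


def commit_key(subject: str) -> Optional[str]:
    """Return the Jira key a commit subject starts with, or None if absent."""
    match = COMMIT_SUBJECT.match(subject)
    return match.group("key") if match else None
```

A hook or CI step could call `commit_key` on each subject line and reject commits for which it returns `None`.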
+
+ ---
+
+ ## Also in This Folder
+
+ - `../github/PULL_REQUEST_TEMPLATE.md` — the SNPD team's PR template,
+ used by `draft-pr` when resolving templates.
@@ -0,0 +1,243 @@
+ ---
+ name: council-review
+ description: "Multi-model code review council inspired by Karpathy's LLM Council. Spawns 3 sub-agents on different models (Claude Opus 4.8, GPT-5.3 Codex, GPT-5.5) to independently review code changes, then synthesizes and votes on the best comments to produce a unified, high-signal review. Use when the user says /council-review, 'council review', 'multi-model review', 'review council', or 'LLM council'."
+ ---
+
+ # LLM Council Code Review
+
+ ## Purpose
+
+ Provide a high-quality, consensus-driven code review by running **three independent
+ reviewers on different LLM models**, then synthesizing their findings into a single
+ review ranked by agreement and severity — similar to
+ [Karpathy's LLM Council](https://github.com/karpathy/llm-council).
+
+ The insight: different models catch different things. One model may spot a race
+ condition another misses; one may flag a security issue the others gloss over.
+ By requiring consensus, noise drops and signal rises.
+
+ ## Models Used (The Council)
+
+ | Seat | Model ID | Strengths |
+ |------------|---------------------|-------------------------------------------------|
+ | Reviewer A | `claude-opus-4.8` | Deep reasoning, architecture, subtle logic bugs |
+ | Reviewer B | `gpt-5.3-codex` | Code-native, practical fixes, test gaps |
+ | Reviewer C | `gpt-5.5` | Broad knowledge, security, API design |
+
+ ## Trigger
+
+ Activate this skill when the user says any of:
+ - `/council-review`
+ - `council review my changes`
+ - `multi-model review`
+ - `LLM council review`
+ - `review council`
+
+ ## Inputs
+
+ The user may provide:
+ - **No argument** → review local uncommitted changes (staged + unstaged)
+ - **`--staged`** → review only staged changes
+ - **`--branch`** → review current branch diff vs `origin/main`
+ - **`--commits <N>`** → review the last N commits (ignores uncommitted changes)
+ - **`--commits <sha>..<sha>`** → review a specific commit range
+ - **`--pr <number>`** → review a specific GitHub PR
+ - **A file path or glob** → review only those files
+
+ Natural language also works:
+ - "review my last 2 commits" → same as `--commits 2`
+ - "review last 3 commits before I open a PR" → same as `--commits 3`
+
+ ## Workflow
+
+ ### Phase 0: Repo Discovery
+
+ Before reviewing any code, discover the **current repo's own rules**. Do not
+ carry assumptions from another repo.
+
+ 1. **Confirm repository scope:**
+ - Run `git status --short` and `git branch --show-current`.
+ - Identify the repo root and project type.
+
+ 2. **Discover review configuration:**
+ - Check for these files and read them if present:
+ - `.github/workflows/claude-pr-review.yml` or `.github/workflows/claude.yml`
+ - `.github/workflows/build-pr.yaml`
+ - `.github/PULL_REQUEST_TEMPLATE.md`
+ - `.github/CODEOWNERS`
+ - Run this skill's bundled `scripts/summarize_review_config.py` script for a
+ quick context summary — it is repo-agnostic and works on any repository.
+ Resolve the script from this skill's own base directory (shown in your skill
+ context; do not hard-code a personal/author path) and pass the current repo
+ root as its argument:
+
+ ```bash
+ # <skill-dir> = this skill's base directory, e.g.
+ # macOS/Linux: ~/.copilot/skills/council-review (or ~/.agents/skills/...)
+ # Windows: %USERPROFILE%\.copilot\skills\council-review
+ python3 "<skill-dir>/scripts/summarize_review_config.py" . # use `python` on Windows
+ ```
+
+ If Python is unavailable, fall back to reading the config files above manually.
+
+ 3. **Discover repo guidance (read if present):**
+ - `.github/copilot-instructions.md`
+ - `CLAUDE.md`, `AGENTS.md`
+ - `docs/conventions.md`, `docs/project-rule.md`, `docs/source-control.md`
+ - `README.md` (skim for architecture/setup sections)
+
+ 4. **Detect technology profile:**
+ Scan build files and guidance docs for technology markers. Apply the
+ matching profile from `references/technology-profiles.md`. Only apply
+ rules that the **current repo actually uses**.
+
+ Key markers to scan for:
+ - Java/Gradle: `build.gradle`, `build.gradle.kts`, `settings.gradle`
+ - Maven: `pom.xml`
+ - Go: `go.mod`, `Makefile`
+ - Python: `pyproject.toml`, `requirements*.txt`
+ - Node: `package.json`
+
+ 5. **Build a PROJECT_CONTEXT block** from all discovered information. This
+ block will be injected into every reviewer's prompt so all three models
+ review against the same repo-specific rules.
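The marker scan in step 4 could look roughly like the following. It is a sketch under assumptions: the marker file names come from the list above, but the profile labels and the `detect_profiles` function are illustrative, and only the repo root is inspected, not nested modules.

```python
from pathlib import Path

# Marker files as listed in Phase 0; the profile labels are illustrative.
MARKERS = {
    "java-gradle": ["build.gradle", "build.gradle.kts", "settings.gradle"],
    "maven": ["pom.xml"],
    "go": ["go.mod", "Makefile"],  # note: a lone Makefile also matches here
    "python": ["pyproject.toml", "requirements*.txt"],
    "node": ["package.json"],
}


def detect_profiles(repo_root: str) -> list[str]:
    """Return the profiles whose marker files exist at the repo root."""
    root = Path(repo_root)
    found = []
    for profile, patterns in MARKERS.items():
        # glob() handles both literal names and the requirements*.txt wildcard.
        if any(any(root.glob(pattern)) for pattern in patterns):
            found.append(profile)
    return found
```

A polyglot repo returns several profiles, which fits the rule of applying only what the current repo actually uses.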
+
+ ### Phase 1: Gather the Diff
+
+ 1. Determine the review scope based on user input:
+ - **Local changes (default):** If working tree has edits, use `git diff HEAD`
+ (includes staged + unstaged). If working tree is clean but branch has
+ commits, compare against the PR base: `git diff origin/main...HEAD`.
+ If `origin/main` is not available, inspect upstream and available remotes.
+ - **Staged only:** `git diff --cached`
+ - **Branch diff:** `git diff origin/main...HEAD`
+ - **Last N commits:** `git diff HEAD~N..HEAD` (ignores working tree entirely)
+ Example: `--commits 2` → `git diff HEAD~2..HEAD`
+ - **Commit range:** `git diff <sha1>..<sha2>` for explicit ranges
+ - **PR:** `gh pr diff <number>`
+ 2. Also gather context:
+ - `git diff --stat` for the file change summary
+ - The PROJECT_CONTEXT block built in Phase 0
+ 3. If the diff is empty, tell the user and stop.
+ 4. If the diff is very large (>5000 lines), warn the user and suggest narrowing scope.
+ 5. **Classify the change** (helps reviewers focus):
+ - API/controller, service/orchestration, repository/database, search engine,
+ Mongo query/indexing, cache, Pub/Sub/messaging, auth/security, feature flags,
+ docs, tests, build/dependency, deployment, or tooling.
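The scope-to-command mapping in step 1 and the size check in step 4 can be sketched as two pure functions. The function names, the `scope` strings, and the `base` parameter are assumptions made for illustration; the git and `gh` commands themselves are the ones listed above.

```python
def diff_command(scope: str, value: str = "", base: str = "origin/main") -> list[str]:
    """Map a review scope to the command listed in Phase 1 (nothing is executed)."""
    if scope == "local":
        return ["git", "diff", "HEAD"]  # staged + unstaged
    if scope == "staged":
        return ["git", "diff", "--cached"]
    if scope == "branch":
        return ["git", "diff", f"{base}...HEAD"]
    if scope == "commits":
        # `value` is either a count ("2") or an explicit range ("<sha1>..<sha2>").
        if value.isdigit():
            return ["git", "diff", f"HEAD~{value}..HEAD"]
        if ".." in value:
            return ["git", "diff", value]
        raise ValueError(f"expected a count or a range, got {value!r}")
    if scope == "pr":
        return ["gh", "pr", "diff", value]
    raise ValueError(f"unknown scope: {scope!r}")


def is_too_large(diff_text: str, limit: int = 5000) -> bool:
    """Apply the large-diff warning threshold from step 4."""
    return len(diff_text.splitlines()) > limit
```

Keeping `base` as a parameter leaves room for repos whose PR base is not `origin/main`, which the fallback in step 1 anticipates.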
+
+ ### Phase 2: Deploy the Council (Parallel Sub-Agents)
+
+ Launch **exactly 3 `code-review` agents in parallel** using the `task` tool, each
+ with a different `model` parameter. All three receive the **identical prompt** so
+ their reviews are directly comparable.
+
+ **CRITICAL: Launch all 3 in a single response — they run in parallel.**
+
+ Each agent receives the prompt from `references/reviewer-prompt.md`, with the
+ diff and project context injected.
+
+ ```
+ Agent A: task(agent_type="code-review", model="claude-opus-4.8", ...)
+ Agent B: task(agent_type="code-review", model="gpt-5.3-codex", ...)
+ Agent C: task(agent_type="code-review", model="gpt-5.5", ...)
+ ```
+
+ All three agents run in `mode="background"`. Wait for all three to complete
+ before proceeding to Phase 3.
+
+ ### Phase 3: Collect & Parse Reviews
+
+ Read all three agent results. Each agent returns findings in the structured
+ format defined in `references/reviewer-prompt.md`. Extract:
+ - File path and line range for each comment
+ - Severity (P1/P2/P3)
+ - Category (bug, security, performance, style, test-gap, design)
+ - The finding description and suggested fix
+
+ ### Phase 4: Council Vote — Synthesize & Rank
+
+ This is the core "council" step. Process the three reviews:
+
+ #### 4a. Deduplicate
+
+ Group comments that refer to the **same issue** (same file, overlapping lines,
+ same root cause). Two comments are "the same issue" if they:
+ - Point to the same file and overlapping line range, AND
+ - Describe the same underlying problem (even in different words)
+
+ #### 4b. Score by Agreement
+
+ For each unique issue, count how many of the 3 reviewers flagged it:
+
+ | Agreement | Label | Weight |
+ |-----------|--------------|--------|
+ | 3/3 | 🟢 Unanimous | High |
+ | 2/3 | 🟡 Majority | Medium |
+ | 1/3 | 🔵 Solo | Low |
+
+ #### 4c. Rank
+
+ Sort the final list by:
+ 1. **Agreement** (unanimous > majority > solo)
+ 2. **Severity** (P1 > P2 > P3) within each agreement tier
+ 3. Within the same tier+severity, keep the most actionable/clear version of
+ the comment (pick the best phrasing from whichever model wrote it)
+
+ #### 4d. Solo Comment Filter
+
+ Solo comments (1/3) are **not discarded** but are presented separately under
+ a "Minority Opinions" section. They may contain genuine catches the other
+ models missed, or they may be noise. Let the user decide.
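Steps 4a through 4c can be sketched as a small Python routine. This is an illustration, not the skill's implementation: in practice "same root cause" is a judgment the synthesizing agent makes, so the `root_cause` tag below is a hypothetical stand-in, as are the `Finding` class and `council_vote` function. The sketch assumes exactly three reviewers and uses simple greedy grouping.

```python
from dataclasses import dataclass

SEVERITY_ORDER = {"P1": 0, "P2": 1, "P3": 2}
LABELS = {3: "Unanimous", 2: "Majority", 1: "Solo"}


@dataclass(frozen=True)
class Finding:
    model: str
    file: str
    start: int
    end: int
    severity: str
    root_cause: str  # hypothetical normalized tag standing in for a judgment call


def same_issue(a: Finding, b: Finding) -> bool:
    """4a: same file, overlapping line range, same underlying problem."""
    overlap = a.start <= b.end and b.start <= a.end
    return a.file == b.file and overlap and a.root_cause == b.root_cause


def council_vote(findings: list[Finding]) -> list[dict]:
    groups: list[list[Finding]] = []
    for finding in findings:
        for group in groups:
            if any(same_issue(finding, other) for other in group):
                group.append(finding)
                break
        else:
            groups.append([finding])

    issues = []
    for group in groups:
        models = sorted({f.model for f in group})  # 4b: count distinct reviewers
        # On disagreement, keep the highest severity any reviewer assigned.
        severity = min((f.severity for f in group), key=SEVERITY_ORDER.__getitem__)
        issues.append({
            "file": group[0].file,
            "models": models,
            "agreement": len(models),
            "label": LABELS[len(models)],
            "severity": severity,
        })
    # 4c: agreement first, then severity within each tier.
    issues.sort(key=lambda i: (-i["agreement"], SEVERITY_ORDER[i["severity"]]))
    return issues
```

Issues whose `agreement` is 1 would then be routed to the "Minority Opinions" section rather than dropped, per step 4d.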
+
+ ### Phase 5: Present the Council Review
+
+ Output the review using the format in `references/output-format.md`.
+
+ ## Hard Rules
+
+ - **Identical prompts.** All three reviewers get exactly the same input.
+ Do not customize prompts per model — the whole point is fair comparison.
+ - **No model bias.** Do not weight one model's opinion over another during
+ voting. Agreement count is the only ranking signal.
+ - **Parallel launch.** Always launch all 3 agents in a single response.
+ Never run them sequentially.
+ - **Transparency.** Always show which models agreed on each finding.
+ - **No hallucinated code.** Do not generate suggested replacement code
+ yourself during synthesis. Use the reviewers' suggestions as-is.
+ - **Severity consistency.** If reviewers disagree on severity for the same
+ issue, use the highest severity any reviewer assigned.
+ - **Signal over noise.** The council exists to reduce noise. If a comment
+ is unclear or contradictory across reviewers, note the disagreement rather
+ than forcing consensus.
+
+ ## Configuration
+
+ The user can customize the council by telling the agent:
+ - Different models: "use Opus 4.5 instead of Opus 4.8"
+ - Different number of reviewers: "use 5 models" (but default is 3)
+ - Focus areas: "focus on security" or "focus on performance"
+ - Strictness: "be strict" (lower the noise threshold) or "only critical" (P1 only)
+
+ ## Error Handling
+
+ - If one agent fails, proceed with the remaining 2. Note the failure.
+ - If two agents fail, fall back to a single-model review and explain.
+ - If all three fail, tell the user and suggest running a simple code-review instead.
+
+ ## Phase 6 (Optional): Post-Review Verification
+
+ After presenting the council review, **offer** to run verification. Do not
+ run automatically — the user may just want the review.
+
+ If the user accepts:
+
+ 1. **Run targeted tests** for changed files using the repo's test command
+ (discovered in Phase 0). Prefer the narrowest test scope first.
+ 2. **Run the PR build command** when feasible (from `build-pr.yaml`).
+ 3. **Run `git diff --check`** for whitespace issues.
+ 4. **Check PR template compliance** — if the repo has a PR template with
+ checkboxes, note which items are affected by the change.
+ 5. **Report CODEOWNERS** — if the repo has CODEOWNERS, note which owners
+ are relevant for the changed files.
+
+ Append verification results to the review output.
@@ -0,0 +1,108 @@
+ # Council Review Output Format
+
+ Use this format when presenting the synthesized council review to the user.
+
+ ---
+
+ ## Header
+
+ ```
+ # 🏛️ LLM Council Code Review
+
+ **Scope:** <description of what was reviewed — branch, PR #, local changes>
+ **Council:** Claude Opus 4.8 · GPT-5.3 Codex · GPT-5.5
+ **Date:** <current date>
+ **Verdict:** <PASS | PASS WITH COMMENTS | NEEDS CHANGES>
+ ```
+
+ ### Verdict Rules
+ - **PASS** — No P1 or P2 issues found by any reviewer
+ - **PASS WITH COMMENTS** — No P1 issues; some P2/P3 found
+ - **NEEDS CHANGES** — At least one P1 issue found, OR 3+ P2 issues with majority agreement
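The verdict rules can be read as a short decision function. The sketch below makes two interpretive assumptions that the rules do not spell out: "majority agreement" means at least 2 of 3 reviewers, and a review with only P3 issues counts as PASS under the first rule, even though the second rule also mentions P3. The `verdict` function name and its input shape are hypothetical.

```python
def verdict(issues: list[tuple[str, int]]) -> str:
    """Derive the verdict from (severity, agreement_count) pairs, one per unique issue."""
    has_p1 = any(severity == "P1" for severity, _ in issues)
    # Assumed reading of "majority agreement": flagged by at least 2 reviewers.
    p2_with_majority = sum(
        1 for severity, agreement in issues if severity == "P2" and agreement >= 2
    )
    if has_p1 or p2_with_majority >= 3:
        return "NEEDS CHANGES"
    if not any(severity == "P2" for severity, _ in issues):
        return "PASS"  # nothing found, or P3-only under this reading
    return "PASS WITH COMMENTS"
```

Checking NEEDS CHANGES first keeps the result unambiguous when several rules could apply to the same set of issues.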
+
+ ---
+
+ ## Consensus Findings (2/3 or 3/3 agreement)
+
+ These are issues flagged by multiple models independently. High confidence.
+
+ For each finding:
+
+ ```
+ ### <N>. <One-line summary>
+ 🟢 Unanimous (3/3) | 🟡 Majority (2/3)
+ **Severity:** P1 | P2 | P3
+ **Category:** <category>
+ **File:** `<path>` (lines ~<range>)
+ **Agreed by:** Opus 4.8 ✓ · Codex 5.3 ✓ · GPT-5.5 ✓
+
+ <Best description from the reviewers. Pick the clearest, most actionable version.>
+
+ **Suggested fix:**
+ <Most concrete suggestion from any reviewer.>
+ ```
+
+ ---
+
+ ## Minority Opinions (1/3 — solo catches)
+
+ These were flagged by only one model. They may be genuine catches the others
+ missed, or false positives. Included for completeness.
+
+ For each:
+
+ ```
+ ### <N>. <One-line summary>
+ 🔵 Solo — flagged by <model name> only
+ **Severity:** P1 | P2 | P3
+ **Category:** <category>
+ **File:** `<path>` (lines ~<range>)
+
+ <Description from the flagging model.>
+
+ **Suggested fix:**
+ <Suggestion if provided.>
+ ```
+
+ ---
+
+ ## Review Statistics
+
+ ```
+ | Metric | Value |
+ |---------------------------|-------|
+ | Total unique issues | <N> |
+ | Unanimous (3/3) | <N> |
+ | Majority (2/3) | <N> |
+ | Solo (1/3) | <N> |
+ | P1 (Critical) | <N> |
+ | P2 (Important) | <N> |
+ | P3 (Minor) | <N> |
+ | Files reviewed | <N> |
+ | Lines changed | +<N> / -<N> |
+ ```
+
+ ---
+
+ ## Model Agreement Matrix (optional, for large reviews)
+
+ Show which model caught what. Only include for reviews with 5+ findings.
+
+ ```
+ | # | Finding | Opus 4.8 | Codex 5.3 | GPT-5.5 |
+ |---|--------------------------------|----------|-----------|---------|
+ | 1 | Race condition in UserService | ✓ | ✓ | ✓ |
+ | 2 | Missing null check in parser | ✓ | ✓ | |
+ | 3 | SQL injection in search filter | | ✓ | ✓ |
+ | 4 | Unused import (solo) | ✓ | | |
+ ```
+
+ ---
+
+ ## Footer
+
+ ```
+ ---
+ *Review generated by LLM Council · 3 independent models · consensus-ranked*
+ *Models may miss issues. This review supplements, not replaces, human judgment.*
+ ```