@groupby/ai-dev 0.5.1 → 0.5.3
- package/package.json +1 -1
- package/teams/snpd/skills/README.md +306 -0
- package/teams/snpd/skills/council-review/SKILL.md +243 -0
- package/teams/snpd/skills/council-review/references/output-format.md +108 -0
- package/teams/snpd/skills/council-review/references/reviewer-prompt.md +99 -0
- package/teams/snpd/skills/council-review/references/technology-profiles.md +54 -0
- package/teams/snpd/skills/council-review/scripts/summarize_review_config.py +226 -0
- package/teams/snpd/skills/docs-init/SKILL.md +71 -11
- package/teams/snpd/skills/docs-init-v2/SKILL.md +0 -402
package/package.json
CHANGED
@@ -0,0 +1,306 @@
# SNPD Team — SDLC Skills

A set of explicitly-invoked skills that walk a code change through the full
software-development lifecycle — from a Jira ticket to a draft pull request —
with a clean, auditable artifact at every step.

Each skill does **one** job, hands off a file, and **stops**. Nothing runs
automatically; every command must be explicitly invoked (`/jira-spec`,
`/draft-plan`, etc.). The human stays in control of every transition, and each
stage leaves a durable artifact the next stage reads.

---

## The Pipeline

```
  Jira ticket
       │
       ▼
┌──────────────┐             writes .claude/artifacts/{KEY}_{slug}-spec.md
│  jira-spec   │ ──────────▶ (verbatim ticket + AI repo context)
└──────────────┘
       │
       ▼
┌──────────────┐             reads spec, discusses, writes on approval
│  draft-plan  │ ──────────▶ .claude/artifacts/{KEY}_{slug}-plan.md
└──────────────┘             (high-level plan, no code)
       │
       ▼
┌──────────────────┐         reads plan, branch + strict TDD, local commits
│  tdd-implement   │ ─────▶  feature branch {KEY}-{slug}
└──────────────────┘         + appends Impl Details / Post-Mortem to plan
       │
       ▼
┌──────────────────┐         3 LLM models review the diff, vote on findings
│  council-review  │ ─────▶  consensus-ranked review (in chat)
└──────────────────┘
       │
       ▼
┌──────────────┐             pushes branch, opens DRAFT PR on GitHub
│   draft-pr   │ ─────▶      draft pull request
└──────────────┘

┌─────────────┐
│  docs-init  │  (independent — runs on any repo at any time)
└─────────────┘
```

The first five skills form a **linear pipeline** where each stage's output
becomes the next stage's input. `docs-init` is **independent** and can be run
on any repo to initialize or refresh project documentation.

---

## Skills

### 1. `jira-spec` — Capture the ticket

| | |
|---|---|
| **Trigger** | `/jira-spec S4R-1234` or a Jira URL |
| **Reads** | Jira ticket (via Atlassian MCP, go-jira CLI, or curl) |
| **Produces** | `.claude/artifacts/{KEY}_{slug}-spec.md` |
| **Stops before** | Planning |

Fetches a Jira ticket and writes a development spec. The spec preserves the
original ticket description **verbatim** (for audit) and adds AI-gathered
repo context (touch points, patterns, test surface) from a **medium-depth**
repo scan. If Jira is unreachable, the skill refuses to proceed — it never
fabricates ticket content.

**Key rules:**
- Verbatim means verbatim — no paraphrasing the Jira description
- One spec file per ticket
- No invented metadata or acceptance criteria
- Does not start coding or planning

---

### 2. `draft-plan` — Decide the approach

| | |
|---|---|
| **Trigger** | `/draft-plan S4R-1234` or path to spec file |
| **Reads** | `…-spec.md` from `.claude/artifacts/` |
| **Produces** | `.claude/artifacts/{KEY}_{slug}-plan.md` |
| **Stops before** | Implementation |

Turns a spec into a discussed, approved, high-level implementation plan.
Performs a **deep** codebase review (deeper than jira-spec) — reads relevant
files end-to-end, traces data flow across layers, checks architecture and
conventions docs. Presents the plan in chat (Goal, Approach, Affected areas,
Sequencing, Test strategy, Risks, Out of scope) and **discusses it with the
user** before writing.

**Key rules:**
- High-level by default — no line numbers, no code snippets
- Every cited file path is verified to exist in the repo
- The plan must trace back to the spec's acceptance criteria
- Only writes the plan file on explicit user approval
- Never starts implementation

---

### 3. `tdd-implement` — Build with strict TDD

| | |
|---|---|
| **Trigger** | `/tdd-implement S4R-1234` or path to plan file |
| **Reads** | `…-plan.md` from `.claude/artifacts/` + `docs/conventions.md`, `docs/architecture.md` |
| **Produces** | Feature branch `{KEY}-{slug}` with squashed commit; plan file updated with Implementation Details + Post-Mortem |
| **Stops before** | Push / PR |

Implements an approved plan using strict, phase-bound TDD on a feature
branch. For each phase: writes failing tests → commits → implements until
green → commits. Supports resuming interrupted sessions. Runs the project's
full verification gate before declaring completion.

**Key rules:**
- Never assumes the default branch is `main` — detects dynamically
- Strict TDD per phase (never batch tests up-front)
- Every commit message starts with `{KEY}:`
- Discoveries classified as trivial/plan-affecting/spec-affecting with
  appropriate handling
- Never pushes, never opens a PR
- Squash via `git reset --soft`, not interactive rebase

---

### 4. `council-review` — Multi-model review before the PR

| | |
|---|---|
| **Trigger** | `/council-review`, "council review my changes", "multi-model review" |
| **Reads** | The diff (local changes, staged, branch, last N commits, or a PR) + repo config/conventions |
| **Produces** | A single consensus-ranked review **in chat** (no file artifact) |
| **Stops before** | Pushing / opening a PR |

Runs a high-signal, consensus-driven code review by spawning **three
independent `code-review` sub-agents on different LLM models**
(`claude-opus-4.8`, `gpt-5.3-codex`, `gpt-5.5`) — inspired by
[Karpathy's LLM Council](https://github.com/karpathy/llm-council). All three
get the **identical** prompt and diff, then their findings are deduplicated,
scored by agreement (🟢 unanimous / 🟡 majority / 🔵 solo), and ranked by
agreement then severity. Different models catch different things, so requiring
consensus drops noise and raises signal.

This is the **quality gate between `tdd-implement` and `draft-pr`**: review the
implemented diff, address the findings, then open the draft PR with confidence.

**Key rules:**
- Identical prompts for all reviewers — fair comparison is the whole point
- No model bias — agreement count is the only ranking signal
- Always launches all 3 agents in parallel, never sequentially
- Always shows which models agreed on each finding (transparency)
- Never generates hallucinated replacement code during synthesis
- On severity disagreement, uses the highest severity any reviewer assigned

---

### 5. `draft-pr` — Open the draft PR

| | |
|---|---|
| **Trigger** | `/draft-pr` |
| **Reads** | Plan file (preferred), spec file (for AC), or commit history (fallback) |
| **Produces** | Draft pull request on GitHub |
| **Stops before** | Marking ready / assigning reviewers |

Pushes the current branch and opens a **draft** pull request on GitHub.
Resolves the PR template (`.github/PULL_REQUEST_TEMPLATE.md`), derives
content from the plan and spec files, shows a full preview for confirmation,
then pushes (fast-forward only) and creates the draft PR.

**Key rules:**
- Always `--draft` — never marks the PR ready
- Shows a full preview before any push or PR creation
- Leaves all template `- [ ]` checkboxes unchecked
- Never assigns reviewers, labels, or milestones
- Never force-pushes
- Never invents acceptance criteria or test results

---

### 6. `docs-init` — Initialize or refresh project docs

| | |
|---|---|
| **Trigger** | `/docs-init` or "update the docs", "refresh project documentation" |
| **Reads** | The entire repo (build files, source, CI, deploy configs, etc.) |
| **Produces** | `README.md`, `CLAUDE.md`, and `docs/` folder (architecture, local-setup, conventions, operations, decisions) |
| **Stops before** | Writing anything — presents an audit report first |

Initializes or refreshes the canonical documentation structure. Works in
two modes:

- **Init mode** — `docs/` is missing or incomplete. Drafts all seven
  canonical files from a fresh repo scan.
- **Refresh mode** — `docs/` exists. Audits each file for outdated content,
  drift, and new opportunities.

Includes a **verification checklist** — every command, config key, directory
description, and workflow trigger is confirmed against actual source files
before inclusion. Presents a human-readable audit report and applies changes
only after explicit user approval.

**Key rules:**
- Never writes files before user approval
- `decisions.md` is append-only
- Preserves `> **Scope:**` blocks on every edit
- Verifies every factual claim against source
- `CLAUDE.md` ≤ 40 lines, `README.md` ≤ 30 lines

---

## How the Skills Connect

### Artifact Flow

```
Jira Ticket
    │
    │  /jira-spec
    ▼
{KEY}_{slug}-spec.md          ◄── verbatim ticket + shallow repo context
    │
    │  /draft-plan
    ▼
{KEY}_{slug}-plan.md          ◄── approved high-level plan
    │
    │  /tdd-implement
    ├──▶ feature branch       ◄── squashed commit with code changes
    ▼
{KEY}_{slug}-plan.md          ◄── updated with Implementation Details + Post-Mortem
    │
    │  /council-review
    ▼
Consensus review (in chat)    ◄── 3 models vote; address findings before the PR
    │
    │  /draft-pr
    ▼
Draft Pull Request            ◄── body sourced from plan + spec
```

All spec and plan artifacts live in `.claude/artifacts/` within the repo,
using a consistent naming convention: `{KEY}_{slug}-spec.md` →
`{KEY}_{slug}-plan.md`. The `-spec` and `-plan` suffixes are mandatory.
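
As a rough illustration of the naming convention, the sketch below builds an artifact path from a ticket key and a title. The directory and the `{KEY}_{slug}-spec.md` / `{KEY}_{slug}-plan.md` shapes come from the README; the `slugify` rule and both function names are hypothetical, since the README does not say how slugs are derived.

```python
import re
from pathlib import Path

# Directory stated in the README; everything else here is an illustrative assumption.
ARTIFACT_DIR = Path(".claude/artifacts")


def slugify(title: str) -> str:
    """Hypothetical slug rule: lowercase, runs of non-alphanumerics become one hyphen."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def artifact_path(key: str, title: str, kind: str) -> Path:
    """Build `{KEY}_{slug}-{kind}.md`; the `-spec` / `-plan` suffix is mandatory."""
    if kind not in ("spec", "plan"):
        raise ValueError("kind must be 'spec' or 'plan'")
    return ARTIFACT_DIR / f"{key}_{slugify(title)}-{kind}.md"


if __name__ == "__main__":
    print(artifact_path("S4R-10453", "Mongo cluster routing", "spec").as_posix())
```

With the ticket key and title used in the end-to-end example, this yields a path ending in `S4R-10453_mongo-cluster-routing-spec.md`.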

### Separation of Concerns

| Boundary | Meaning |
|---|---|
| Spec ≠ Plan | The spec records *what the ticket asks*; the plan decides *how to build it* |
| Plan ≠ Code | The plan describes the approach at the level of files and phases — no code |
| Build ≠ Ship | Implementation produces a local branch only — no push, no PR |
| Code ≠ Review | The council reviews the diff and surfaces findings — it never edits code or ships |
| PR ≠ Merge | The PR is always a *draft* — marking ready is the human's call |

Each boundary is a **human checkpoint**. No skill silently rolls into the
next. Approval at one stage authorizes *only* that stage's output.

### docs-init Integration

`docs-init` is not part of the linear pipeline but supports it:

- **draft-plan** reads `docs/architecture.md`, `docs/conventions.md`, and
  `docs/decisions.md` during its deep codebase review.
- **tdd-implement** loads `docs/conventions.md` and `docs/architecture.md`
  before writing any code, and references them for style and patterns.
- **tdd-implement** may suggest entries for `docs/decisions.md` in its
  Post-Mortem section (but never writes them directly).

Running `/docs-init` on a repo before starting the pipeline ensures the
plan and implementation stages have reliable documentation to reference.

---

## Typical End-to-End Run

```bash
/jira-spec S4R-10453        # → S4R-10453_mongo-cluster-routing-spec.md
/draft-plan S4R-10453       # discuss → approve → …-plan.md
/tdd-implement S4R-10453    # branch + TDD + squash; plan updated
/council-review             # 3-model review of the diff; address findings
/draft-pr                   # confirm → draft PR on GitHub
```

---

## Conventions Shared Across All Skills

- **Explicit invocation only.** None auto-trigger on keywords.
- **Artifacts live in the repo.** All specs and plans go to
  `.claude/artifacts/`, never `~/.claude/`.
- **Each stage stops at a human checkpoint.** No skill silently continues
  into the next.
- **No fabrication.** Ticket text, acceptance criteria, file paths, and test
  results are verified or quoted — never invented.
- **Source control hygiene.** Commits start with the Jira key (`{KEY}: …`).

---

## Also in This Folder

- `../github/PULL_REQUEST_TEMPLATE.md` — the SNPD team's PR template,
  used by `draft-pr` when resolving templates.
@@ -0,0 +1,243 @@
---
name: council-review
description: "Multi-model code review council inspired by Karpathy's LLM Council. Spawns 3 sub-agents on different models (Claude Opus 4.8, GPT-5.3 Codex, GPT-5.5) to independently review code changes, then synthesizes and votes on the best comments to produce a unified, high-signal review. Use when the user says /council-review, 'council review', 'multi-model review', 'review council', or 'LLM council'."
---

# LLM Council Code Review

## Purpose

Provide a high-quality, consensus-driven code review by running **three independent
reviewers on different LLM models**, then synthesizing their findings into a single
review ranked by agreement and severity — similar to
[Karpathy's LLM Council](https://github.com/karpathy/llm-council).

The insight: different models catch different things. One model may spot a race
condition another misses; one may flag a security issue the others gloss over.
By requiring consensus, noise drops and signal rises.

## Models Used (The Council)

| Seat       | Model ID            | Strengths                                       |
|------------|---------------------|-------------------------------------------------|
| Reviewer A | `claude-opus-4.8`   | Deep reasoning, architecture, subtle logic bugs |
| Reviewer B | `gpt-5.3-codex`     | Code-native, practical fixes, test gaps         |
| Reviewer C | `gpt-5.5`           | Broad knowledge, security, API design           |

## Trigger

Activate this skill when the user says any of:
- `/council-review`
- `council review my changes`
- `multi-model review`
- `LLM council review`
- `review council`

## Inputs

The user may provide:
- **No argument** → review local uncommitted changes (staged + unstaged)
- **`--staged`** → review only staged changes
- **`--branch`** → review current branch diff vs `origin/main`
- **`--commits <N>`** → review the last N commits (ignores uncommitted changes)
- **`--commits <sha>..<sha>`** → review a specific commit range
- **`--pr <number>`** → review a specific GitHub PR
- **A file path or glob** → review only those files

Natural language also works:
- "review my last 2 commits" → same as `--commits 2`
- "review last 3 commits before I open a PR" → same as `--commits 3`

## Workflow

### Phase 0: Repo Discovery

Before reviewing any code, discover the **current repo's own rules**. Do not
carry assumptions from another repo.

1. **Confirm repository scope:**
   - Run `git status --short` and `git branch --show-current`.
   - Identify the repo root and project type.

2. **Discover review configuration:**
   - Check for these files and read them if present:
     - `.github/workflows/claude-pr-review.yml` or `.github/workflows/claude.yml`
     - `.github/workflows/build-pr.yaml`
     - `.github/PULL_REQUEST_TEMPLATE.md`
     - `.github/CODEOWNERS`
   - Run this skill's bundled `scripts/summarize_review_config.py` script for a
     quick context summary — it is repo-agnostic and works on any repository.
     Resolve the script from this skill's own base directory (shown in your skill
     context; do not hard-code a personal/author path) and pass the current repo
     root as its argument:

     ```bash
     # <skill-dir> = this skill's base directory, e.g.
     #   macOS/Linux: ~/.copilot/skills/council-review (or ~/.agents/skills/...)
     #   Windows:     %USERPROFILE%\.copilot\skills\council-review
     python3 "<skill-dir>/scripts/summarize_review_config.py" .   # use `python` on Windows
     ```

     If Python is unavailable, fall back to reading the config files above manually.

3. **Discover repo guidance (read if present):**
   - `.github/copilot-instructions.md`
   - `CLAUDE.md`, `AGENTS.md`
   - `docs/conventions.md`, `docs/project-rule.md`, `docs/source-control.md`
   - `README.md` (skim for architecture/setup sections)

4. **Detect technology profile:**
   Scan build files and guidance docs for technology markers. Apply the
   matching profile from `references/technology-profiles.md`. Only apply
   rules that the **current repo actually uses**.

   Key markers to scan for:
   - Java/Gradle: `build.gradle`, `build.gradle.kts`, `settings.gradle`
   - Maven: `pom.xml`
   - Go: `go.mod`, `Makefile`
   - Python: `pyproject.toml`, `requirements*.txt`
   - Node: `package.json`

5. **Build a PROJECT_CONTEXT block** from all discovered information. This
   block will be injected into every reviewer's prompt so all three models
   review against the same repo-specific rules.
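
The marker scan in step 4 could be sketched roughly as follows. The file patterns are the ones listed above; the profile labels and the `detect_profiles` function are hypothetical and are not taken from `references/technology-profiles.md` or the bundled script. The sketch only checks the repo root, which is an additional simplifying assumption.

```python
from pathlib import Path

# File patterns copied from the marker list; the profile labels are illustrative.
MARKERS = {
    "java-gradle": ["build.gradle", "build.gradle.kts", "settings.gradle"],
    "maven": ["pom.xml"],
    "go": ["go.mod", "Makefile"],
    "python": ["pyproject.toml", "requirements*.txt"],
    "node": ["package.json"],
}


def detect_profiles(repo_root: str) -> list:
    """Return every profile whose marker files exist directly under repo_root."""
    root = Path(repo_root)
    found = []
    for profile, patterns in MARKERS.items():
        # Path.glob accepts plain names and wildcards such as requirements*.txt.
        if any(any(root.glob(pattern)) for pattern in patterns):
            found.append(profile)
    return found


if __name__ == "__main__":
    print(detect_profiles("."))
```

Because `Makefile` appears in the Go marker list, a repo with only a Makefile would match the Go profile in this sketch; a real implementation would probably weigh markers against each other before applying a profile.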

### Phase 1: Gather the Diff

1. Determine the review scope based on user input:
   - **Local changes (default):** If working tree has edits, use `git diff HEAD`
     (includes staged + unstaged). If working tree is clean but branch has
     commits, compare against the PR base: `git diff origin/main...HEAD`.
     If `origin/main` is not available, inspect upstream and available remotes.
   - **Staged only:** `git diff --cached`
   - **Branch diff:** `git diff origin/main...HEAD`
   - **Last N commits:** `git diff HEAD~N..HEAD` (ignores working tree entirely)
     Example: `--commits 2` → `git diff HEAD~2..HEAD`
   - **Commit range:** `git diff <sha1>..<sha2>` for explicit ranges
   - **PR:** `gh pr diff <number>`
2. Also gather context:
   - `git diff --stat` for the file change summary
   - The PROJECT_CONTEXT block built in Phase 0
3. If the diff is empty, tell the user and stop.
4. If the diff is very large (>5000 lines), warn the user and suggest narrowing scope.
5. **Classify the change** (helps reviewers focus):
   - API/controller, service/orchestration, repository/database, search engine,
     Mongo query/indexing, cache, Pub/Sub/messaging, auth/security, feature flags,
     docs, tests, build/dependency, deployment, or tooling.
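
The scope-to-command mapping in step 1 can be summarized in a small helper. This is a minimal sketch: the `git` and `gh` invocations are the ones listed above, while the `diff_command` and `is_very_large` names and the scope keywords are hypothetical. The `base` parameter is an added assumption so the base ref need not be hard-coded to `origin/main`.

```python
from typing import List, Optional

LARGE_DIFF_LINES = 5000  # warning threshold from step 4


def diff_command(scope: str, value: Optional[str] = None,
                 base: str = "origin/main") -> List[str]:
    """Map a review scope to the command that produces its diff."""
    if scope == "local":
        return ["git", "diff", "HEAD"]            # staged + unstaged
    if scope == "staged":
        return ["git", "diff", "--cached"]
    if scope == "branch":
        return ["git", "diff", f"{base}...HEAD"]  # three-dot: changes since merge base
    if scope == "commits":
        if value is None:
            raise ValueError("commits scope needs a count or a range")
        if ".." in value:                         # explicit <sha1>..<sha2> range
            return ["git", "diff", value]
        count = int(value)
        if count < 1:
            raise ValueError("commit count must be positive")
        return ["git", "diff", f"HEAD~{count}..HEAD"]
    if scope == "pr":
        if value is None:
            raise ValueError("pr scope needs a PR number")
        return ["gh", "pr", "diff", value]
    raise ValueError(f"unknown scope: {scope}")


def is_very_large(diff_text: str) -> bool:
    """True when the diff exceeds the warning threshold."""
    return len(diff_text.splitlines()) > LARGE_DIFF_LINES
```

Returning an argument list rather than a shell string keeps the result safe to pass to `subprocess.run` without shell quoting.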

### Phase 2: Deploy the Council (Parallel Sub-Agents)

Launch **exactly 3 `code-review` agents in parallel** using the `task` tool, each
with a different `model` parameter. All three receive the **identical prompt** so
their reviews are directly comparable.

**CRITICAL: Launch all 3 in a single response — they run in parallel.**

Each agent receives the prompt from `references/reviewer-prompt.md`, with the
diff and project context injected.

```
Agent A: task(agent_type="code-review", model="claude-opus-4.8", ...)
Agent B: task(agent_type="code-review", model="gpt-5.3-codex", ...)
Agent C: task(agent_type="code-review", model="gpt-5.5", ...)
```

All three agents run in `mode="background"`. Wait for all three to complete
before proceeding to Phase 3.
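
The fan-out and wait pattern can be pictured with an `asyncio` analogy. This is only an illustration of the launch shape, not how the `task` tool is implemented: `run_reviewer` is a hypothetical stand-in for one sub-agent call, and the returned dictionary shape is invented for the example.

```python
import asyncio

MODELS = ["claude-opus-4.8", "gpt-5.3-codex", "gpt-5.5"]


async def run_reviewer(model: str, prompt: str) -> dict:
    """Hypothetical stand-in for one code-review sub-agent call."""
    await asyncio.sleep(0)  # placeholder for the real, slow model call
    return {"model": model, "prompt": prompt, "findings": []}


async def run_council(prompt: str) -> list:
    """Start all reviewers at once with the identical prompt, then wait for all."""
    calls = [run_reviewer(model, prompt) for model in MODELS]
    # return_exceptions=True keeps one failed reviewer from cancelling the others.
    return await asyncio.gather(*calls, return_exceptions=True)


if __name__ == "__main__":
    for result in asyncio.run(run_council("<reviewer prompt + diff + context>")):
        print(result if isinstance(result, Exception) else result["model"])
```

`asyncio.gather` returns results in the order the calls were passed in, so each result can be attributed to its model even when reviewers finish at different times.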

### Phase 3: Collect & Parse Reviews

Read all three agent results. Each agent returns findings in the structured
format defined in `references/reviewer-prompt.md`. Extract:
- File path and line range for each comment
- Severity (P1/P2/P3)
- Category (bug, security, performance, style, test-gap, design)
- The finding description and suggested fix

### Phase 4: Council Vote — Synthesize & Rank

This is the core "council" step. Process the three reviews:

#### 4a. Deduplicate

Group comments that refer to the **same issue** (same file, overlapping lines,
same root cause). Two comments are "the same issue" if they:
- Point to the same file and overlapping line range, AND
- Describe the same underlying problem (even in different words)

#### 4b. Score by Agreement

For each unique issue, count how many of the 3 reviewers flagged it:

| Agreement | Label        | Weight |
|-----------|--------------|--------|
| 3/3       | 🟢 Unanimous | High   |
| 2/3       | 🟡 Majority  | Medium |
| 1/3       | 🔵 Solo      | Low    |

#### 4c. Rank

Sort the final list by:
1. **Agreement** (unanimous > majority > solo)
2. **Severity** (P1 > P2 > P3) within each agreement tier
3. Within the same tier+severity, keep the most actionable/clear version of
   the comment (pick the best phrasing from whichever model wrote it)

#### 4d. Solo Comment Filter

Solo comments (1/3) are **not discarded** but are presented separately under
a "Minority Opinions" section. They may contain genuine catches the other
models missed, or they may be noise. Let the user decide.
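
Steps 4a to 4d can be sketched as a small grouping and sorting routine. This is a minimal sketch under stated assumptions: the `Finding` shape and function names are hypothetical, and the `issue` field stands in for the "same underlying problem" judgment, which in the skill is made by the synthesizing agent rather than by string comparison. Severity follows the hard rule of taking the highest severity any reviewer assigned.

```python
from dataclasses import dataclass

SEVERITY_ORDER = {"P1": 0, "P2": 1, "P3": 2}
LABELS = {3: "Unanimous", 2: "Majority", 1: "Solo"}


@dataclass
class Finding:
    reviewer: str
    file: str
    start: int
    end: int
    severity: str
    issue: str  # hypothetical normalized root-cause key


def same_issue(a: Finding, b: Finding) -> bool:
    """4a: same file, overlapping line range, same underlying problem."""
    overlap = a.start <= b.end and b.start <= a.end
    return a.file == b.file and overlap and a.issue == b.issue


def council_vote(findings):
    groups = []
    for finding in findings:
        for group in groups:
            if same_issue(group[0], finding):
                group.append(finding)
                break
        else:
            groups.append([finding])

    ranked = []
    for group in groups:
        reviewers = sorted({f.reviewer for f in group})
        # Severity disagreement: keep the highest severity any reviewer assigned.
        severity = min((f.severity for f in group), key=SEVERITY_ORDER.__getitem__)
        ranked.append({"file": group[0].file, "issue": group[0].issue,
                       "agreed_by": reviewers, "label": LABELS[len(reviewers)],
                       "severity": severity})

    # 4c: agreement first, then severity within each agreement tier.
    ranked.sort(key=lambda r: (-len(r["agreed_by"]), SEVERITY_ORDER[r["severity"]]))
    consensus = [r for r in ranked if len(r["agreed_by"]) >= 2]
    minority = [r for r in ranked if len(r["agreed_by"]) == 1]  # 4d: kept, shown apart
    return consensus, minority
```

Note that under this ordering a unanimous P3 finding ranks above a majority P1 finding, because agreement is the primary sort key.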

### Phase 5: Present the Council Review

Output the review using the format in `references/output-format.md`.

## Hard Rules

- **Identical prompts.** All three reviewers get exactly the same input.
  Do not customize prompts per model — the whole point is fair comparison.
- **No model bias.** Do not weight one model's opinion over another during
  voting. Agreement count is the only ranking signal.
- **Parallel launch.** Always launch all 3 agents in a single response.
  Never run them sequentially.
- **Transparency.** Always show which models agreed on each finding.
- **No hallucinated code.** Do not generate suggested replacement code
  yourself during synthesis. Use the reviewers' suggestions as-is.
- **Severity consistency.** If reviewers disagree on severity for the same
  issue, use the highest severity any reviewer assigned.
- **Signal over noise.** The council exists to reduce noise. If a comment
  is unclear or contradictory across reviewers, note the disagreement rather
  than forcing consensus.

## Configuration

The user can customize the council by telling the agent:
- Different models: "use Opus 4.5 instead of Opus 4.8"
- Different number of reviewers: "use 5 models" (but default is 3)
- Focus areas: "focus on security" or "focus on performance"
- Strictness: "be strict" (lower the noise threshold) or "only critical" (P1 only)

## Error Handling

- If one agent fails, proceed with the remaining 2. Note the failure.
- If two agents fail, fall back to a single-model review and explain.
- If all three fail, tell the user and suggest running a simple code-review instead.

## Phase 6 (Optional): Post-Review Verification

After presenting the council review, **offer** to run verification. Do not
run automatically — the user may just want the review.

If the user accepts:

1. **Run targeted tests** for changed files using the repo's test command
   (discovered in Phase 0). Prefer the narrowest test scope first.
2. **Run the PR build command** when feasible (from `build-pr.yaml`).
3. **Run `git diff --check`** for whitespace issues.
4. **Check PR template compliance** — if the repo has a PR template with
   checkboxes, note which items are affected by the change.
5. **Report CODEOWNERS** — if the repo has CODEOWNERS, note which owners
   are relevant for the changed files.

Append verification results to the review output.
@@ -0,0 +1,108 @@
# Council Review Output Format

Use this format when presenting the synthesized council review to the user.

---

## Header

```
# 🏛️ LLM Council Code Review

**Scope:** <description of what was reviewed — branch, PR #, local changes>
**Council:** Claude Opus 4.8 · GPT-5.3 Codex · GPT-5.5
**Date:** <current date>
**Verdict:** <PASS | PASS WITH COMMENTS | NEEDS CHANGES>
```

### Verdict Rules
- **PASS** — No P1 or P2 issues found by any reviewer
- **PASS WITH COMMENTS** — No P1 issues; some P2/P3 found
- **NEEDS CHANGES** — At least one P1 issue found, OR 3+ P2 issues with majority agreement
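
One way to read the verdict rules is sketched below. The `verdict` function and its input shape are hypothetical. Two interpretations are assumptions rather than something the rules state: "majority agreement" is taken to mean at least 2 of 3 reviewers, and a review with only P3 findings is treated as PASS, since the PASS and PASS WITH COMMENTS rules overlap in that case.

```python
def verdict(findings):
    """findings: list of (severity, reviewers_in_agreement) pairs, one per unique issue."""
    has_p1 = any(severity == "P1" for severity, _ in findings)
    has_p2 = any(severity == "P2" for severity, _ in findings)
    # Assumption: "majority agreement" means flagged by at least 2 of 3 reviewers.
    p2_with_majority = sum(
        1 for severity, agreed in findings if severity == "P2" and agreed >= 2
    )

    if has_p1 or p2_with_majority >= 3:
        return "NEEDS CHANGES"
    if not has_p2:
        # Assumption: P3-only (or empty) reviews count as PASS.
        return "PASS"
    return "PASS WITH COMMENTS"
```

Checking NEEDS CHANGES first keeps the strictest rule from being masked by the two softer ones.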

---

## Consensus Findings (2/3 or 3/3 agreement)

These are issues flagged by multiple models independently. High confidence.

For each finding:

```
### <N>. <One-line summary>
🟢 Unanimous (3/3) | 🟡 Majority (2/3)
**Severity:** P1 | P2 | P3
**Category:** <category>
**File:** `<path>` (lines ~<range>)
**Agreed by:** Opus 4.8 ✓ · Codex 5.3 ✓ · GPT-5.5 ✓

<Best description from the reviewers. Pick the clearest, most actionable version.>

**Suggested fix:**
<Most concrete suggestion from any reviewer.>
```

---

## Minority Opinions (1/3 — solo catches)

These were flagged by only one model. They may be genuine catches the others
missed, or false positives. Included for completeness.

For each:

```
### <N>. <One-line summary>
🔵 Solo — flagged by <model name> only
**Severity:** P1 | P2 | P3
**Category:** <category>
**File:** `<path>` (lines ~<range>)

<Description from the flagging model.>

**Suggested fix:**
<Suggestion if provided.>
```

---

## Review Statistics

```
| Metric                    | Value       |
|---------------------------|-------------|
| Total unique issues       | <N>         |
| Unanimous (3/3)           | <N>         |
| Majority (2/3)            | <N>         |
| Solo (1/3)                | <N>         |
| P1 (Critical)             | <N>         |
| P2 (Important)            | <N>         |
| P3 (Minor)                | <N>         |
| Files reviewed            | <N>         |
| Lines changed             | +<N> / -<N> |
```

---

## Model Agreement Matrix (optional, for large reviews)

Show which model caught what. Only include for reviews with 5+ findings.

```
| # | Finding                        | Opus 4.8 | Codex 5.3 | GPT-5.5 |
|---|--------------------------------|----------|-----------|---------|
| 1 | Race condition in UserService  | ✓        | ✓         | ✓       |
| 2 | Missing null check in parser   | ✓        | ✓         |         |
| 3 | SQL injection in search filter |          | ✓         | ✓       |
| 4 | Unused import (solo)           | ✓        |           |         |
```

---

## Footer

```
---
*Review generated by LLM Council · 3 independent models · consensus-ranked*
*Models may miss issues. This review supplements, not replaces, human judgment.*
```