@clipboard-health/ai-rules 2.25.0 β 2.26.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/package.json +1 -1
- package/rules/common/coreLibraries.md +0 -1
- package/skills/babysit-pr/SKILL.md +4 -4
- package/skills/babysit-pr/scripts/_sentinel.sh +1 -1
- package/skills/commit-push-pr/SKILL.md +2 -2
- package/skills/in-depth-review/SKILL.md +491 -0
- package/skills/in-depth-review/references/cross-repo-evidence.md +37 -0
- package/skills/in-depth-review/references/posting-pr-review.md +105 -0
- package/skills/simple-review/SKILL.md +472 -0
- package/skills/simple-review/references/cross-repo-evidence.md +64 -0
- package/skills/simple-review/references/posting-pr-review.md +100 -0
package/package.json
CHANGED
|
@@ -34,7 +34,6 @@ When a bug traces into a `@clipboard-health/*` library, read the source code in
|
|
|
34
34
|
- **playwright-reporter-llm**: Playwright reporter that outputs structured JSON for LLM agents. Minimal console output, flat schema, easy to filter to failures.
|
|
35
35
|
- **rules-engine**: A pure functional rules engine to keep logic-dense code simple, reliable, understandable, and explainable.
|
|
36
36
|
- **testing-core**: TypeScript-friendly testing utilities.
|
|
37
|
-
- **tribunal**: Structured second opinions from advocate, skeptic, analyst, and deliberator LLM perspectives.
|
|
38
37
|
- **util-ts**: TypeScript utilities.
|
|
39
38
|
|
|
40
39
|
<!-- END: Auto-generated by ./populateLibraries.ts -->
|
|
@@ -27,9 +27,9 @@ This skill always runs exactly one pass. It never waits or repeats internally. F
|
|
|
27
27
|
|
|
28
28
|
The skill uses two sentinels. Each is a visible footer line wrapped in `<sub>` (a π€ mark plus the token in `<code>`).
|
|
29
29
|
|
|
30
|
-
**Addressed sentinel**: `<sub>π€ <code>babysit-pr:addressed v1 core@3.
|
|
30
|
+
**Addressed sentinel**: `<sub>π€ <code>babysit-pr:addressed v1 core@3.9.0</code></sub>`. Appended on its own line at the end of every reply the skill posts (both thread replies and the review-body summary); this is how re-runs know which threads and review-body comments are already handled. Dedupe matches the version-agnostic substring `babysit-pr:addressed v1` followed by a space (also matches legacy `<!-- babysit-pr:addressed v1 ... -->` sentinels). Grep `babysit-pr:addressed v1` for any version; add `core@3.9.0` for a specific one.
|
|
31
31
|
|
|
32
|
-
**Follow-up sentinel**: `<sub>π€ <code>babysit-pr:followup v1 core@3.
|
|
32
|
+
**Follow-up sentinel**: `<sub>π€ <code>babysit-pr:followup v1 core@3.9.0</code></sub>`. Attached to replies that defer an out-of-scope comment as a tracked follow-up (see the Scope subsection and the Defer verdict in step 6). Grep `babysit-pr:followup` across PR conversation JSON to enumerate deferred items. This sentinel is additive β the post-reply scripts still append the `addressed` sentinel at the end, so a deferred thread is correctly machine-classified as addressed (the skill _has_ handled it β by deferring). Human reviewers and future sweeps distinguish deferred from resolved by looking for the follow-up sentinel.
|
|
33
33
|
|
|
34
34
|
**Sentinel recency rules.** The script emits a per-thread `activityState` with three values:
|
|
35
35
|
|
|
@@ -280,7 +280,7 @@ Body templates (the script appends the `addressed` sentinel if missing):
|
|
|
280
280
|
- **Agree**: `Addressed in <commit-url>. <one-line what-changed>.`
|
|
281
281
|
- **Disagree**: `Leaving current behavior. <reasoning>.`
|
|
282
282
|
- **Already fixed**: `Already handled by <commit-url-or-file:line>. <brief pointer>.`
|
|
283
|
-
- **Defer**: `Out of scope for this PR; this looks like follow-up work rather than something introduced or required by this change. <one-line rationale or pointer if useful>.\n\n<sub>π€ <code>babysit-pr:followup v1 core@3.
|
|
283
|
+
- **Defer**: `Out of scope for this PR; this looks like follow-up work rather than something introduced or required by this change. <one-line rationale or pointer if useful>.\n\n<sub>π€ <code>babysit-pr:followup v1 core@3.9.0</code></sub>`
|
|
284
284
|
|
|
285
285
|
For Defer replies, include the follow-up sentinel on its own line as shown. The script will append the `addressed` sentinel after it on its own line, so the final body ends with the follow-up sentinel followed by a blank line followed by the `addressed` sentinel β `grep babysit-pr:followup` finds the deferral and `grep babysit-pr:addressed` still marks the thread handled for dedupe.
|
|
286
286
|
|
|
@@ -296,7 +296,7 @@ The PR-level summary should:
|
|
|
296
296
|
|
|
297
297
|
- Group by source. Use `## Review-body findings` for step-7 work and `## Conversation-tab comments` for step-6b work. Omit a section if its list is empty.
|
|
298
298
|
- Inside each section, group verdicts under **Agree / Disagree / Already fixed / Deferred (out of scope)** subheadings. Omit a subheading if its list is empty.
|
|
299
|
-
- Under **Deferred (out of scope)**, list each deferred item as a bullet, followed on its own line by `<sub>π€ <code>babysit-pr:followup v1 core@3.
|
|
299
|
+
- Under **Deferred (out of scope)**, list each deferred item as a bullet, followed on its own line by `<sub>π€ <code>babysit-pr:followup v1 core@3.9.0</code></sub>` so grep catches them individually.
|
|
300
300
|
- Include the commit URL for fixes.
|
|
301
301
|
- End with a fenced fingerprint block listing every current fingerprint β addressed and deferred β one per line. Include both `reviewBodyComments[].fingerprint` (whole-body, one per automated review) and `activeIssueComments[].fingerprint` (per Conversation-tab comment). Future runs dedupe by matching these against `priorBabysitSentinels`.
|
|
302
302
|
|
|
@@ -9,7 +9,7 @@
|
|
|
9
9
|
# substituted at build time by embedPluginVersion.mts.
|
|
10
10
|
|
|
11
11
|
SENTINEL_PREFIX='babysit-pr:addressed v1 '
|
|
12
|
-
SENTINEL='<sub>π€ <code>babysit-pr:addressed v1 core@3.
|
|
12
|
+
SENTINEL='<sub>π€ <code>babysit-pr:addressed v1 core@3.9.0</code></sub>'
|
|
13
13
|
|
|
14
14
|
# Bot author allowlist (JSON array literal). Used by unresolvedPrComments.sh
|
|
15
15
|
# as a fallback when GraphQL's `author.__typename == "Bot"` misses a GitHub
|
|
@@ -47,7 +47,7 @@ Script paths in this procedure are written as `scripts/...`, relative to this SK
|
|
|
47
47
|
6. Check for an existing PR with `gh pr view`.
|
|
48
48
|
|
|
49
49
|
PR title format: conventional-commit type + description, with no scope, plus the Linear ticket in parentheses at the end when one applies (e.g., `feat: add resume command (STAFF-123)`). This differs from the commit subject, which keeps its scope. Derive the ticket from the branch name, commit body, or session context; omit the parenthetical when no ticket applies.
|
|
50
|
-
- No PR: create with `gh pr create` using the PR title format above. Description = the PR body shape above, followed by the session footer line if known and the agent footer `<sub>π€ <code>commit-push-pr:created v1 core@3.
|
|
51
|
-
- PR exists: if the title doesn't match the format above, correct it with `gh pr edit --title`. Refresh the body via `gh pr edit --body` so (a) the new commit's changes are reflected in the prose while existing `## Summary`, `## Validation`, and `## Notes` sections are preserved unless clearly stale, (b) any known session footer line is appended if missing, never removing or rewriting existing `Agent session: ...` or `Agent session ID: ...` lines, and (c) any existing footer carrying the substring `commit-push-pr:created v1` is preserved verbatim, appending `<sub>π€ <code>commit-push-pr:created v1 core@3.
|
|
50
|
+
- No PR: create with `gh pr create` using the PR title format above. Description = the PR body shape above, followed by the session footer line if known and the agent footer `<sub>π€ <code>commit-push-pr:created v1 core@3.9.0</code></sub>` on its own line.
|
|
51
|
+
- PR exists: if the title doesn't match the format above, correct it with `gh pr edit --title`. Refresh the body via `gh pr edit --body` so (a) the new commit's changes are reflected in the prose while existing `## Summary`, `## Validation`, and `## Notes` sections are preserved unless clearly stale, (b) any known session footer line is appended if missing, never removing or rewriting existing `Agent session: ...` or `Agent session ID: ...` lines, and (c) any existing footer carrying the substring `commit-push-pr:created v1` is preserved verbatim, appending `<sub>π€ <code>commit-push-pr:created v1 core@3.9.0</code></sub>` only if absent. Then report the URL.
|
|
52
52
|
|
|
53
53
|
7. End with one short text response: branch name and the full PR URL (e.g., `https://github.com/clipboardhealth/core-utils/pull/123`). Never use shorthand like `repo#123` β always output the complete URL.
|
|
@@ -0,0 +1,491 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: in-depth-review
|
|
3
|
+
description: Multi-agent code review of the current branch or a PR β parallel engineering, minimalist, conventions, and AntiSlop reviewers plus conditional security, database, and frontend specialists that debate, then a synthesized review posted as anchored PR comments. Use when the user asks for a deep, thorough, or multi-perspective review, reviews a large or high-stakes diff, or runs /in-depth-review [PR-number-or-URL]. For small diffs or a quick single-pass review, use simple-review instead.
|
|
4
|
+
---
|
|
5
|
+
|
|
6
|
+
# In-Depth Review
|
|
7
|
+
|
|
8
|
+
Run a multi-agent code review on the current branch _or_ on a PR identified by argument. By default dispatches four parallel reviewers (engineering+customer, minimalist, conventions, anti-AI-slop) and conditionally adds up to three domain specialists (security, database, frontend) when the diff touches their surface. The first-principles adversarial agent (Agent A) is **opt-in** β include it only when the user explicitly asks (see Invocation). Agents review in parallel, debate once, and the main agent moderator-filters and synthesizes a report. Author vs reviewer mode adapts the post-review flow.
|
|
9
|
+
|
|
10
|
+
## Invocation
|
|
11
|
+
|
|
12
|
+
- **`/in-depth-review`** β review the current branch (resolves the open PR for the branch if any, otherwise diffs against the default branch).
|
|
13
|
+
- **`/in-depth-review <PR-number-or-URL>`** β fast path for reviewing someone else's PR while sitting on `main` (typical entry from Claude Desktop in code mode on a worktree). Skips the local branch checkout, fetches diff and metadata via `gh`, forces **reviewer mode**, and skips the plan/execute path. Accepts either a bare number (resolved against the current repo) or a full GitHub PR URL (which also identifies the owner/repo, useful when the worktree's remote differs).
|
|
14
|
+
|
|
15
|
+
### Agent roster
|
|
16
|
+
|
|
17
|
+
| Letter | Name | Role | Default state |
|
|
18
|
+
| ------ | ----------- | ------------------------------------- | -------------------------------------------- |
|
|
19
|
+
| A | Adversarial | First-principles adversarial | Opt-in only (`with adversarial`, `+A`, etc.) |
|
|
20
|
+
| B | Engineering | High engineering standards + customer | Always on |
|
|
21
|
+
| C | Minimalist | Smallest-diff posture | Always on |
|
|
22
|
+
| D | Conventions | Repo-rules / docs | Always on |
|
|
23
|
+
| E | Security | Security & API surface | Conditional (auth/route/contract files) |
|
|
24
|
+
| F | Database | DB schema, queries, migrations | Conditional (migration/schema/model files) |
|
|
25
|
+
| G | Frontend | FE conventions & UX correctness | Conditional (FE files) |
|
|
26
|
+
| H | AntiSlop | Anti-AI-bias / pushback on slop | Always on |
|
|
27
|
+
|
|
28
|
+
**Refer to agents by name in everything the user sees** (Summary, Actionable items, Raised-by lines, Disagreements, Withdrawn). The letter is a compact prefix for finding IDs (`B1`, `C3`, `Adversarial`/`A1`, β¦) and a shorthand inside this document β never the only label in user-facing prose. The Round 2 `original_author` field stores the **name**, not the letter.
|
|
29
|
+
|
|
30
|
+
### Agent A (first-principles adversarial) is opt-in
|
|
31
|
+
|
|
32
|
+
The Adversarial agent (A) is skipped by default. Include it only when the user's invocation contains a clear opt-in phrase such as `with adversarial`, `with agent a`, `include adversarial`, `add the adversarial agent`, `+A`, or equivalent. If the user gives a more ambiguous instruction (e.g. "run the full review"), ask once whether to include the Adversarial agent before dispatching. Record the decision (`agent_a_enabled: true|false` + the phrase that triggered the opt-in, if any) in `/tmp/in-depth-review-meta.json` and surface it in the Summary so the user can see what ran.
|
|
33
|
+
|
|
34
|
+
## Scope
|
|
35
|
+
|
|
36
|
+
Two entry paths:
|
|
37
|
+
|
|
38
|
+
### Path A β PR-argument fast path (when an argument is provided)
|
|
39
|
+
|
|
40
|
+
- Parse the argument: bare integer β use current repo (`gh repo view --json nameWithOwner --jq .nameWithOwner`); URL β extract `<owner>/<repo>` and `<N>` from the URL.
|
|
41
|
+
- Do **not** check out the PR branch. Operate from whatever the working directory is (typically `main` in a worktree). Convention/file reads happen against a verified-fresh ref β see **Freshness preflight** below.
|
|
42
|
+
- Mode is **locked to reviewer**. Skip mode detection.
|
|
43
|
+
|
|
44
|
+
Gather in parallel:
|
|
45
|
+
|
|
46
|
+
```bash
|
|
47
|
+
gh pr view <N> --repo <owner>/<repo> --json number,url,title,body,baseRefName,author,headRefOid,headRepository,files
|
|
48
|
+
gh pr diff <N> --repo <owner>/<repo>
|
|
49
|
+
gh api user --jq .login
|
|
50
|
+
```
|
|
51
|
+
|
|
52
|
+
Persist:
|
|
53
|
+
|
|
54
|
+
- Diff stdout from `gh pr diff` β `/tmp/in-depth-review-diff.patch`.
|
|
55
|
+
- `pr.title + pr.body` β `/tmp/in-depth-review-context.md`.
|
|
56
|
+
- `pr.files[].path` β `/tmp/in-depth-review-files.txt`.
|
|
57
|
+
- Meta (PR number/url/baseRefName/author, viewer login, head SHA, owner/repo, mode=`reviewer`, conditional-agent set after classification) β `/tmp/in-depth-review-meta.json`.
|
|
58
|
+
|
|
59
|
+
If `pr.author.login == viewer.login`, still proceed in reviewer mode (the user explicitly asked for the PR-arg path; honor it) but flag this in the synthesis Summary so the user can switch to a manual `/in-depth-review` run on their checkout if they meant to implement.
|
|
60
|
+
|
|
61
|
+
### Path B β current-branch path (no argument)
|
|
62
|
+
|
|
63
|
+
- If there is an open PR for the current branch, review that PR. Base = `baseRefName`, diff = `git diff $base...HEAD`. Capture PR title + body as context.
|
|
64
|
+
- Otherwise, review the current branch vs the default branch (`main`, fall back to `master`). Capture `git log --format='%h %s%n%n%b' $base..HEAD` as context.
|
|
65
|
+
- **Never** review uncommitted working-tree changes. If the branch has no diff vs base, stop and report.
|
|
66
|
+
|
|
67
|
+
Gather in parallel:
|
|
68
|
+
|
|
69
|
+
```bash
|
|
70
|
+
git rev-parse --abbrev-ref HEAD
|
|
71
|
+
git symbolic-ref refs/remotes/origin/HEAD 2>/dev/null | sed 's|refs/remotes/origin/||'
|
|
72
|
+
gh pr list --head "$(git branch --show-current)" --state open --json number,url,title,body,baseRefName,author,headRefOid
|
|
73
|
+
gh api user --jq .login
|
|
74
|
+
gh repo view --json nameWithOwner --jq .nameWithOwner
|
|
75
|
+
git diff --name-only "$base"...HEAD
|
|
76
|
+
```
|
|
77
|
+
|
|
78
|
+
Determine **mode**:
|
|
79
|
+
|
|
80
|
+
- PR exists and `pr.author.login == viewer.login` β **author mode**.
|
|
81
|
+
- PR exists and authors differ β **reviewer mode**.
|
|
82
|
+
- No PR exists β **author mode** (reviewing your own pre-PR branch).
|
|
83
|
+
|
|
84
|
+
Persist context so subagents read it from disk:
|
|
85
|
+
|
|
86
|
+
- Diff β `/tmp/in-depth-review-diff.patch`
|
|
87
|
+
- Context (PR body or commit log) β `/tmp/in-depth-review-context.md`
|
|
88
|
+
- Changed-file list β `/tmp/in-depth-review-files.txt`
|
|
89
|
+
- Mode + repo metadata (PR number, url, base, head SHA, author, viewer, owner/repo, list of conditional agents that will run, verified-fresh `context_ref` from the freshness preflight) β `/tmp/in-depth-review-meta.json`
|
|
90
|
+
|
|
91
|
+
## Freshness preflight (runs before Diff classification β both paths)
|
|
92
|
+
|
|
93
|
+
Stale local state is the #1 cause of false-positive findings: it produces "evidence" of bugs that were already fixed on `main`. Before dispatching any agent, the main agent **must** verify that the ref it tells subagents to read for context is current with the remote, and pass that ref into subagent prompts so agents read code via `git show <ref>:<path>` and `git grep ... <ref> -- <paths>` rather than the worktree filesystem.
|
|
94
|
+
|
|
95
|
+
Run this preflight on the **primary repo** (the one containing the diff under review):
|
|
96
|
+
|
|
97
|
+
```bash
|
|
98
|
+
git fetch origin "$base" --quiet 2>&1 | tail -3
|
|
99
|
+
git rev-parse --abbrev-ref HEAD
|
|
100
|
+
git status --porcelain
|
|
101
|
+
git rev-list --left-right --count "HEAD...origin/${base}" # prints "<ahead>\t<behind>"
|
|
102
|
+
git log -1 --format='%h %ci' "origin/${base}"
|
|
103
|
+
```
|
|
104
|
+
|
|
105
|
+
Decide `context_ref` (the ref subagents must read for repo context, distinct from the diff itself):
|
|
106
|
+
|
|
107
|
+
- **Path A (reviewer mode, PR-arg):** ideal `context_ref = origin/${base}` (whatever the PR's base branch is, usually `main`). The worktree filesystem must **only** be used as `context_ref` when `HEAD == origin/${base}` AND working tree is clean AND `git fetch` succeeded.
|
|
108
|
+
- **Path B (author mode, current-branch):** `context_ref = origin/${base}`. The user's local feature-branch files are the _diff_, not the _pre-PR context_; pre-PR convention/neighbor reads must come from `origin/${base}`.
|
|
109
|
+
|
|
110
|
+
If any of the following are true, **stop and ask the user before continuing**:
|
|
111
|
+
|
|
112
|
+
1. `git fetch origin "$base"` failed (offline, auth, repo permissions). Print the error.
|
|
113
|
+
2. `HEAD` differs from `origin/${base}` (Path A only β Path B expects HEAD on a feature branch).
|
|
114
|
+
3. Working tree is dirty (`git status --porcelain` is non-empty) **and** the dirty paths overlap the diff's changed-file list or any path subagents will need to read.
|
|
115
|
+
4. `HEAD` is behind `origin/${base}` (any non-zero "behind" count).
|
|
116
|
+
5. `HEAD` is more than a small number of commits ahead of `origin/${base}` on Path A (ahead implies the user is mid-work on a branch that isn't the PR; their worktree is not a clean main).
|
|
117
|
+
|
|
118
|
+
Warn template (substitute the verified state):
|
|
119
|
+
|
|
120
|
+
> Freshness check for `<owner>/<repo>` at `<worktree-path>`:
|
|
121
|
+
>
|
|
122
|
+
> - on branch `<HEAD-branch>` (expected `<base>` for Path A)
|
|
123
|
+
> - `<N>` commit(s) ahead, `<M>` commit(s) behind `origin/<base>` (last remote commit: `<short-sha> <iso-date>`)
|
|
124
|
+
> - working tree: `<clean | dirty: N file(s)>`
|
|
125
|
+
>
|
|
126
|
+
> Reading code from this worktree may surface findings based on stale or local-only state.
|
|
127
|
+
> Reply: `proceed` (use worktree anyway and accept the risk), `use-origin` (read context via `git show origin/<base>:<path>` instead β recommended, no checkout needed), or `stop` (let me clean up and re-run).
|
|
128
|
+
|
|
129
|
+
**Never** run `git checkout`, `git stash`, `git reset`, or any other state-modifying git command on the user's behalf. The skill warns and asks; the user is the only actor that resolves local state.
|
|
130
|
+
|
|
131
|
+
Record the resolved `context_ref` (e.g. `origin/main`, or the literal string `worktree (stale, user accepted risk)`) in `/tmp/in-depth-review-meta.json`. Every subagent prompt must explicitly state which ref to read and how (see "Reading code in subagents" below).
|
|
132
|
+
|
|
133
|
+
### Reading code in subagents
|
|
134
|
+
|
|
135
|
+
When `context_ref` is `origin/<base>`:
|
|
136
|
+
|
|
137
|
+
- Read repo context via `git show "${context_ref}:<path>"` (for whole files) and `git grep -n <pattern> "${context_ref}" -- <paths>` (for searches).
|
|
138
|
+
- The Read tool reads the working tree, which may be stale; subagents must **prefer** `git show` for context reads. The working tree may only be read when the file isn't tracked at `context_ref` (e.g. brand-new file in the PR diff) and that fact is explicitly noted in the finding.
|
|
139
|
+
|
|
140
|
+
When `context_ref` is `worktree (stale, user accepted risk)`:
|
|
141
|
+
|
|
142
|
+
- Subagents may use Read on the worktree, but every finding must include a one-line "verified against: worktree-stale" caveat in the `point`, and the moderator filter must downgrade CRITICAL/MAJOR to MINOR unless the evidence is internal to the diff itself.
|
|
143
|
+
|
|
144
|
+
## Cross-repo evidence policy (binding for all agents)
|
|
145
|
+
|
|
146
|
+
A finding's evidence is "cross-repo" when it depends on code in any repo other than the one containing the diff (e.g. a consumer or producer in another service). Subagents must **NOT** silently read external repos β stale checkouts fabricate evidence and unverifiable paths can't be checked. Emit such findings with an `evidence_required` field and cap severity at **MAJOR** until the moderator verifies the evidence on a fresh ref; after Round 1 the moderator collects every `evidence_required` block and asks the user for access before Round 2.
|
|
147
|
+
|
|
148
|
+
Read [references/cross-repo-evidence.md](references/cross-repo-evidence.md) before raising or finalizing any cross-repo finding β it has the `evidence_required` schema, the moderator's access-request prompt, the freshness rules for external repos, and `skip` handling.
|
|
149
|
+
|
|
150
|
+
## Diff classification (runs before Round 1)
|
|
151
|
+
|
|
152
|
+
Match the changed-file list against these path patterns. A conditional agent runs only when its pattern matches at least one changed file:
|
|
153
|
+
|
|
154
|
+
- **E β Security** triggers on: any file under `routes/`, `controllers/`, `middleware*/`, files matching `auth*`, `*permission*`, `*acl*`, `*token*`, `*session*`, response serializers, OpenAPI / contract definitions, or new API endpoint files.
|
|
155
|
+
- **F β Database & migrations** triggers on: `migrations/`, `*.sql`, files matching `schema*`, Mongoose/Prisma model files (`models/`, `*.model.ts`, `*.schema.ts`), repository / DAL files, or files containing query builders.
|
|
156
|
+
- **G β Frontend** triggers on: `*.tsx`, `*.jsx`, `*.css`, `*.scss`, files under `pages/`, `components/`, `hooks/`, or anything importing from `react`, `@tanstack/react-query`, or a design-system package.
|
|
157
|
+
|
|
158
|
+
Record the dispatched set in `/tmp/in-depth-review-meta.json` so Round 2 knows the full agent list.
|
|
159
|
+
|
|
160
|
+
## Severity rubric (binding for all agents)
|
|
161
|
+
|
|
162
|
+
- **CRITICAL** β realistic input causes incorrect behavior, data loss, security regression, broken contract, or a paging incident.
|
|
163
|
+
- **MAJOR** β meaningful degradation of correctness/UX/observability under realistic conditions; OR a documented-convention violation with concrete downstream impact.
|
|
164
|
+
- **MINOR** β cheap, concrete improvement with a named benefit.
|
|
165
|
+
- **NIT** β only admissible if (a) repeats β₯2Γ in the diff, (b) conflicts with a documented convention, or (c) is a one-line trivial fix. Otherwise drop silently.
|
|
166
|
+
|
|
167
|
+
Every finding **must** include a `failure_mode` field: one sentence on the concrete user-, oncall-, or maintainer-visible bad outcome that would occur if not fixed. Hypotheticals like "a future caller mightβ¦" do not satisfy this; such findings will be dropped during moderation.
|
|
168
|
+
|
|
169
|
+
Run the litmus test on every candidate finding before keeping it: _"What is the concrete, current, product-visible cost of leaving this code in?"_ If you cannot answer in one sentence, drop the finding.
|
|
170
|
+
|
|
171
|
+
### Do-not-raise list (binding)
|
|
172
|
+
|
|
173
|
+
Do not surface any of:
|
|
174
|
+
|
|
175
|
+
- Speculative defensiveness at trusted internal boundaries (e.g. "what if this private helper got null?").
|
|
176
|
+
- Restating the obvious ("consider a comment explaining what this does").
|
|
177
|
+
- Hypothetical future-caller scenarios with no current caller.
|
|
178
|
+
- Style/formatting a linter or formatter already covers.
|
|
179
|
+
- Test-coverage demands on trivially-verifiable code (one-line passthroughs, simple getters).
|
|
180
|
+
- "Add observability" without naming a concrete failure mode it would help debug.
|
|
181
|
+
- Abstract SOLID-style "consider extractingβ¦" suggestions without a concrete failure mode.
|
|
182
|
+
- Aesthetic naming preferences. Only raise names that mislead about behavior.
|
|
183
|
+
|
|
184
|
+
## Rounds (hard cap: 2)
|
|
185
|
+
|
|
186
|
+
### Round 1 β parallel independent reviews
|
|
187
|
+
|
|
188
|
+
Dispatch all selected agents **in the same message** via the `Agent` tool with `subagent_type: general-purpose`. The default selected set is **Engineering, Minimalist, Conventions, AntiSlop** (B/C/D/H), plus **Security, Database, Frontend** (E/F/G) when triggered by classification. **The Adversarial agent (A) is dispatched only when `agent_a_enabled` is true** (see Invocation β "Agent A is opt-in"). Each agent gets the file paths above, their role brief, and permission to Read repo files for context. Do **not** let agents see each other's output in this round. **Each agent emits at most 8 findings, prioritized β not exhaustive.**
|
|
189
|
+
|
|
190
|
+
**Every subagent prompt must include the following input contract verbatim** (substitute `<context_ref>` with the value resolved in the Freshness preflight):
|
|
191
|
+
|
|
192
|
+
> **Context-read contract.** The verified-fresh ref for repo context is `<context_ref>`. For any file you read that is part of the primary repo's tracked content (i.e. _not_ a brand-new file in this PR's diff), use `git show "<context_ref>:<path>"` or `git grep -n <pattern> "<context_ref>" -- <paths>` rather than the Read tool on the worktree. If you read from the worktree because the file is new in the diff or untracked at `<context_ref>`, say so in the finding's `point`.
|
|
193
|
+
>
|
|
194
|
+
> **Cross-repo evidence contract.** If a finding's load-bearing evidence is in any repo other than this one, do NOT silently read another local checkout β those checkouts may be stale or on unrelated branches. Instead, emit the finding with an `evidence_required` field naming the repo(s) and the specific verification question, and cap your severity at MAJOR. The moderator will ask the user for verified access before Round 2 and re-dispatch you if needed.
|
|
195
|
+
|
|
196
|
+
Required output per finding: `id` (`A1`, `B3`, `C2`, `D4`, `E1`, `F1`, `G1`, `H1`, β¦), `severity`, `file:lines`, `title` (one sentence), `point` (one short paragraph), `failure_mode` (one sentence), optional `suggested_fix` (see schema below), and optional `evidence_required` (object with `repos: string[]` and `what_to_verify: string` β required whenever the finding depends on cross-repo state).
|
|
197
|
+
|
|
198
|
+
**`suggested_fix` schema** (when present): an object with two string fields, both code (not prose), preserving the file's actual indentation:
|
|
199
|
+
|
|
200
|
+
```json
|
|
201
|
+
"suggested_fix": {
|
|
202
|
+
"before": "<the exact current code being changed, copied verbatim from file:lines>",
|
|
203
|
+
"after": "<the replacement code that would land at the same anchor>"
|
|
204
|
+
}
|
|
205
|
+
```
|
|
206
|
+
|
|
207
|
+
- For a **pure deletion**, set `after` to the empty string `""`.
|
|
208
|
+
- For a **pure addition** at a new line (nothing replaced), set `before` to the empty string `""` and prefix `after` with a one-line comment indicating where it lands (e.g. `// add after line 142`).
|
|
209
|
+
- For a **structural change** with no clean drop-in (e.g. "move this file"), omit `suggested_fix` entirely and put the guidance in `point` / `failure_mode`.
|
|
210
|
+
- Both fields must be code snippets directly usable to compute a diff. Do not include surrounding prose, do not paraphrase, and do not elide content with `// ...`. If the change is too large to inline both, omit `suggested_fix` and describe the fix in `point`.
|
|
211
|
+
|
|
212
|
+
#### Always-on agents (Engineering / Minimalist / Conventions / AntiSlop) and the opt-in Adversarial agent
|
|
213
|
+
|
|
214
|
+
**Adversarial (A) β First-principles. Opt-in only.** Dispatch only when the user explicitly asks (see Invocation β "Agent A is opt-in"). Skip silently otherwise. Posture: _"What is the single most important thing this PR gets wrong? Then up to 7 more, ranked."_ Question whether the change actually solves the underlying problem and whether the chosen approach is right. Challenge specific library / pattern / approach choices when they don't hold up on their own merits. Do **not** challenge foundational stack choices that are out of scope for the PR. Flag internal inconsistencies inside the diff itself. Specifically also probe:
|
|
215
|
+
|
|
216
|
+
- Should this PR be split, or is it bundling unrelated tickets?
|
|
217
|
+
- Is the root cause of the bug clear, and is it fixed in _every_ place it manifests?
|
|
218
|
+
- Is a feature flag warranted but missing?
|
|
219
|
+
- Are there old feature flags or dead references this change makes deletable?
|
|
220
|
+
|
|
221
|
+
Conventions are the Conventions agent's (D) job β do not double up.
|
|
222
|
+
|
|
223
|
+
**Engineering (B) β High engineering standards + customer-centric.** Posture: _"For each change, name a realistic input or condition that would expose a bug. If you cannot, do not raise it."_ Evaluate edge cases, error paths, observability, tests (coverage of real risk, not lines), high-level security (E does the deep dive when dispatched), concurrency, performance at real scale, data integrity, accessibility, UX, degraded-mode behavior, backward compatibility, on-call implications, and domain assumptions in the diff. Specifically also probe:
|
|
224
|
+
|
|
225
|
+
- Async/await correctness β ordering matches the actual dependency between calls.
|
|
226
|
+
- Timezone correctness in date-handling code; currency variables explicitly in minor units.
|
|
227
|
+
- When API surface changed: AuthN, AuthZ, sensitive-data leakage, PII handling. Hand the deep dive to Security (E) if dispatched.
|
|
228
|
+
- When a migration is in the diff: rollback strategy, backward compatibility during rolling deploy, ETL / downstream impact. Hand to Database (F) if dispatched.
|
|
229
|
+
- When DB schema or queries changed: indexes for new query patterns, N+1 risk, cascade-delete semantics, data-type fit. Hand to Database (F) if dispatched.
|
|
230
|
+
- Telemetry covers _business / product_ value, not only engineering surface metrics (latency / error rate).
|
|
231
|
+
- Contract / backward-compatibility for any consumer-visible response shape change.
|
|
232
|
+
|
|
233
|
+
Skip items that fail the realistic-input test.
|
|
234
|
+
|
|
235
|
+
**Minimalist (C).** Posture: the smallest diff that ships the intent is the best diff. Identify unneeded abstractions, speculative generality, dead branches, redundant validation, defensive code at trusted internal boundaries, comments that restate code, new files / utilities that duplicate existing ones, tests that test the framework rather than behavior, flags / config knobs without a concrete caller. Specifically also probe:
|
|
236
|
+
|
|
237
|
+
- Duplicated error handlers in the same scope.
|
|
238
|
+
- Commented-out code or dead branches (still gated by `failure_mode`).
|
|
239
|
+
|
|
240
|
+
For every "delete this" finding, `failure_mode` must state the _concrete cost of keeping the code_.
|
|
241
|
+
|
|
242
|
+
**Conventions (D) β Repo-rules.** Sole owner of convention checks (backend / shared). Read `AGENTS.md`, `CLAUDE.md`, `.rules/`, `.cursorrules`, `docs/adr/` (or equivalent), package READMEs in the diff's path, and one or two neighboring files in the same service for in-practice patterns. Then check the diff for:
|
|
243
|
+
|
|
244
|
+
- Preferred internal packages over third-party ones (e.g. `@clipboard-health/*`, internal `lib/*`).
|
|
245
|
+
- Terminology rules (worker/workplace, not HCP/facility).
|
|
246
|
+
- `lodash` removal preference.
|
|
247
|
+
- `@clipboard-health/datetime` over `date-fns`, `date-fns-tz`, `moment`, `luxon`.
|
|
248
|
+
- Internal `Money` package for currencies; explicit minor-units variable names.
|
|
249
|
+
- Logging conventions (`longContext`, `ObjectId.toString()`, no variables in log messages).
|
|
250
|
+
- Null/undefined checks via `isDefined` / `isNil` over truthy.
|
|
251
|
+
- 4+ argument functions β params object.
|
|
252
|
+
- Error-handling patterns in the same service (typed `ServiceResult` vs throw).
|
|
253
|
+
- Test conventions, naming, file location.
|
|
254
|
+
- Business-meaningful magic constants that should be named.
|
|
255
|
+
- ENV access via the project's Configuration abstraction, not direct `process.env`.
|
|
256
|
+
- Logging library standard for the repo (e.g. Winston) β derive from existing code, not hardcoded.
|
|
257
|
+
- Pinned dependencies; lock file (`package-lock.json` / `yarn.lock`) in sync with `package.json`.
|
|
258
|
+
- RESTful route shape and uniform response format within a service.
|
|
259
|
+
- Internal inconsistency _within the diff itself_ (e.g. half of imports from one package family, half from upstream).
|
|
260
|
+
|
|
261
|
+
Frontend conventions are the Frontend agent's (G) territory when G is dispatched. If Frontend is not dispatched (no FE files), Conventions (D) may surface FE convention drift if it appears.
|
|
262
|
+
|
|
263
|
+
Tag every finding `[CONVENTION]`. Cap severity at MAJOR (only when behavior diverges as a result of the violation) or MINOR otherwise. At the top of your output, list the specific files/sources you checked.
|
|
264
|
+
|
|
265
|
+
**AntiSlop (H) β Anti-AI-bias.** Posture: _"This PR may have been written or assisted by an LLM. For each addition, ask: 'is this line earning its keep, or is it pattern-matching what code is supposed to look like?' If a reasonable senior engineer wouldn't have written it, call it out and propose deleting or shrinking it."_ Your job is to push back on AI-shaped padding that the other agents are too polite to call slop. Specifically watch for, **inside the diff itself**:
|
|
266
|
+
|
|
267
|
+
- **Defensive code at trusted internal boundaries.** Null / undefined guards on private helpers whose callers' types already guarantee non-null. `try` / `catch` wrapping a single call that doesn't itself throw, or that re-throws unchanged, or that "logs and swallows" without naming what to do next. Optional-chaining cascades through types that don't include optionality.
|
|
268
|
+
- **Defensiveness against unrealistic product / domain scenarios.** Code that guards against situations that wouldn't actually arise when the product is used as designed. Ask the litmus question: _"In the real product flow this code participates in, what user action, system event, or upstream call could land us in this branch?"_ If the answer is "none" or "I had to invent one to justify the guard", the code is slop. Concrete shapes this takes:
|
|
269
|
+
- A `null` / `undefined` guard on an ID immediately after that ID was used to load (and find) the entity.
|
|
270
|
+
- A `null` / `undefined` / `""` / `0` branch on a field whose TypeScript type permits only one of those values, or whose Zod / class-validator schema already rejected the others at the request boundary.
|
|
271
|
+
- A re-validation of a value the request DTO already validated upstream in the same request lifecycle.
|
|
272
|
+
- A consistency check (`if (a !== b) throw`) between two fields the data model forbids being unequal.
|
|
273
|
+
- A branch for a product-impossible state β e.g. "what if the worker is assigned to two workplaces at once" when the product enforces single-active-assignment; "what if the shift end is before its start" after the schema already enforces ordering; "what if the dispute amount is negative" when the validator caps it.
|
|
274
|
+
- A retry / fallback layer around an SDK or library call that already retries or already returns a typed error you can ignore.
|
|
275
|
+
- A `catch` clause for an error class the underlying code is statically known not to throw, or that "logs and swallows" without naming what to do next.
|
|
276
|
+
- A "future-proof" code path (`if (feature === "v2")`) when there is no `v2` and no concrete ticket creating one.
|
|
277
|
+
|
|
278
|
+
Don't accept "but what if upstream changes?" as a defense β that's the hypothetical-future-caller anti-pattern. The fix when upstream genuinely changes is a typed-error / schema update in the same PR, not preemptive guards.
|
|
279
|
+
|
|
280
|
+
- **Restating-the-code comments.** `// fetch the user`, `// loop through items`, `// add 1 to counter`. JSDoc on private helpers that only restates the signature. `// Note: this is important` lines. Section banners (`// === HELPERS ===`) inside short files.
|
|
281
|
+
- **Empty scaffolding.** `// TODO` comments with no owner, ticket, or trigger condition. Redundant pre-conditions (`if (!x) throw` immediately after a Zod parse). Log lines added "for debugging" that survive into the PR. Default `else { return undefined; }` after already-exhaustive branching. `_unused` parameter prefixes that should just be deletions.
|
|
282
|
+
- **Speculative generality.** A helper called once that wraps two trivial lines. A `Map` / `Set` / config object keyed by a single hardcoded value. A "strategy" / "registry" pattern with one strategy. Type unions whose only second case is `never`, a placeholder, or a string literal nobody emits.
|
|
283
|
+
- **Unused parameters / overloads / fields.** Function args destructured but never read; interface methods with empty implementations; type fields written but never consumed downstream in the diff; new optional fields with no producer or consumer.
|
|
284
|
+
- **Tests that exercise the framework, not the code.** Tests that assert "the mock returns what the mock returned" via `jest.fn().mockReturnValue(x); expect(fn()).toBe(x)`. Snapshot tests with no semantic assertion. Tests that mock the unit under test. Tests with names that describe the implementation (`it("calls foo.bar"β¦)`) rather than the behavior.
|
|
285
|
+
- **Dead AI breadcrumbs.** Variables whose only use is being logged or echoed in a debug branch; `console.log` / `console.error` calls that should have been removed before commit; commented-out alternative implementations left in the diff; "\_legacy" parameter renames where nothing actually changed about how it's read.
|
|
286
|
+
- **Tone / description mismatch.** PR description / commit message claims behavior the diff doesn't have, or describes the diff in language that pattern-matches engineering writing without naming the actual change. Variable / function names that sound right but don't reflect the value's role (`data`, `result`, `processed`, `_handle`, `doStuff`).
|
|
287
|
+
|
|
288
|
+
For every finding, `failure_mode` must name the _concrete cost of leaving the code in_ β e.g. "future readers waste cycles understanding a wrapper that does nothing", "the unused parameter masks the next refactor's mistake", "the `try` / `catch` silently swallows an unrelated runtime error", "the restating comment goes stale in the next change and misleads readers", "the speculative helper makes call-site search return one false hit per call". A finding without a concrete cost-of-keeping fails the moderator filter.
|
|
289
|
+
|
|
290
|
+
The default `suggested_fix` shape is **delete** (empty `after`) or **simplify** (smaller `after` than `before`). Suggesting "add a justifying comment" is itself slop β do not propose it. If the right move is structural (e.g. inline the wrapper), omit `suggested_fix` and describe the move in `point`.
|
|
291
|
+
|
|
292
|
+
Stay in your lane:
|
|
293
|
+
|
|
294
|
+
- Do **not** raise items that the Conventions agent (D) owns (terminology, lodash, datetime / money libraries, logging context shape) β that's their turf.
|
|
295
|
+
- Do **not** demand observability, tests, or error handling that **does not exist** in the diff β that's the inverse of your job. You only call out what's _present_ and unnecessary.
|
|
296
|
+
- Do **not** challenge whether the PR solves the right problem β that's the Adversarial agent's (A) turf when opted in.
|
|
297
|
+
- A change that's small but unnecessary is still slop. A change that's large but earns each line is not.
|
|
298
|
+
|
|
299
|
+
Cap output at 8 findings. `failure_mode` required.
|
|
300
|
+
|
|
301
|
+
**Round 2 expanded role β audit the other agents' findings, not just your own.** In Round 2 you receive every other agent's Round 1 findings. In addition to defending or withdrawing your own (the standard Round 2 contract), you are the AI-bias counterweight in the debate: audit every other agent's finding for the same slop patterns you scan code for. Mark each such finding `disagree` with a one-line `slop:` label in the `reasoning` field. Specifically watch for:
|
|
302
|
+
|
|
303
|
+
- **Asks-for-more-defensiveness.** "Add a null check", "wrap in try / catch", "validate this input", "guard against β¦". If the value is already typed, already validated upstream, or already guaranteed by a preceding call inside the same scope, tag `slop: asks for defensive guard on already-narrowed value`.
|
|
304
|
+
- **Hypothetical concerns.** "What if a future caller passes β¦", "consider the case where β¦", "if someone later β¦". If the finding cannot name a current realistic input or product flow that hits the concern, tag `slop: hypothetical future caller β no current path`.
|
|
305
|
+
- **Restating-the-obvious comment requests.** "Add a JSDoc explaining what this does", "consider a comment here". If the name + signature already convey intent, tag `slop: restating-the-obvious comment request`.
|
|
306
|
+
- **Abstract "extract this helper" / SOLID-aesthetic refactors.** Refactor suggestions whose `failure_mode` is shape-of-the-code rather than a concrete cost of keeping the inline version. Tag `slop: refactor with no concrete cost-of-keeping`.
|
|
307
|
+
- **Observability demands without a named failure mode.** "Add a log here", "emit a metric". If the finding can't name the specific failure the telemetry would help debug, tag `slop: observability without named failure mode`.
|
|
308
|
+
- **Test-coverage demands on trivially-verifiable code.** "Add a unit test for this getter / passthrough / one-line wrapper". If the code's correctness is type-evident or already exercised by a higher-level test in the same PR, tag `slop: test for trivially-verifiable code`.
|
|
309
|
+
- **Domain-impossible scenarios surfaced by other agents.** Any finding that worries about a state the product cannot produce given how it actually works. Apply the same litmus question to _the finding_ that you apply to code: "what real product flow could land us here?" If none, tag `slop: defends against a state the product cannot produce`.
|
|
310
|
+
|
|
311
|
+
The moderator filter weighs your slop tags heavily β see the moderator filter step on AntiSlop tags. The original author gets a chance to defend in their own Round 2 entry (a concrete `final_failure_mode` naming a real, product-specific cost); without that defense, the finding is dropped.
|
|
312
|
+
|
|
313
|
+
Even when your Round 1 findings are sparse because the diff is genuinely clean, do not under-spend on the Round 2 audit. The audit is often the bigger lever β it stops slop _suggestions_ from polluting the final review.
|
|
314
|
+
|
|
315
|
+
#### Conditional agents
|
|
316
|
+
|
|
317
|
+
**Security (E) β API surface.** Dispatched when the diff touches auth / route / permission / serializer / contract files. Posture: _"Where could this change leak data, bypass authorization, or expand the trust boundary?"_ Specifically check:
|
|
318
|
+
|
|
319
|
+
- AuthN: every new or modified endpoint has authenticated identity unless explicitly public.
|
|
320
|
+
- AuthZ: per-endpoint and per-resource permission checks; cross-tenant / cross-user access.
|
|
321
|
+
- Secrets in configuration β no hardcoded keys, no secrets in client bundles.
|
|
322
|
+
- No self-made or client-side cryptography.
|
|
323
|
+
- SQL / NoSQL injection vectors (string-concatenated queries, unvalidated `$where`, raw aggregation pipelines from user input).
|
|
324
|
+
- Sensitive data in response payloads β over-fetching, over-serialization, internal IDs leaked.
|
|
325
|
+
- PII handling β PII fields logged, cached, or sent to telemetry.
|
|
326
|
+
- Input validation at the boundary (Zod / class-validator) on every new endpoint.
|
|
327
|
+
|
|
328
|
+
Cap output at 8 findings. `failure_mode` required.
|
|
329
|
+
|
|
330
|
+
**Database (F) β Schema, queries, migrations.** Dispatched when the diff touches migrations / SQL / schema / model / query files. Posture: _"What breaks at production scale or during deploy?"_ Specifically check:
|
|
331
|
+
|
|
332
|
+
- Index coverage for new query patterns; missing compound indexes.
|
|
333
|
+
- N+1 query shapes; loops issuing per-iteration queries.
|
|
334
|
+
- Schema normalization β denormalization that creates write-amplification or update anomalies.
|
|
335
|
+
- Data types β range, precision, locale (numeric, date, money).
|
|
336
|
+
- Cascade-delete and FK-on-delete semantics; orphan risk.
|
|
337
|
+
- Migration rollback strategy; can the migration run online without downtime?
|
|
338
|
+
- Backward compatibility during a rolling deploy (old code reads new schema, new code reads old schema).
|
|
339
|
+
- ETL / downstream consumer impact when schema or field semantics change.
|
|
340
|
+
- New collections / tables ship with the indexes their query patterns need on day one.
|
|
341
|
+
|
|
342
|
+
Cap output at 8 findings. `failure_mode` required.
|
|
343
|
+
|
|
344
|
+
**Frontend (G) β Conventions & UX-correctness.** Dispatched when the diff touches `*.tsx` / `*.jsx` / FE component / hook / page paths. Posture: _"Does this follow our FE conventions, and will it behave correctly under realistic user conditions?"_ Specifically check:
|
|
345
|
+
|
|
346
|
+
- API calls go through v2 / custom hooks, not raw `fetch` / `axios` in components.
|
|
347
|
+
- React Query: thin wrappers (`useGetQuery`, etc.), not raw `useQuery` / `useMutation`.
|
|
348
|
+
- Falsy-value checks use `isDefined()` / `isNil()`, not `!value`.
|
|
349
|
+
- Test actions wrapped in `act()`.
|
|
350
|
+
- Zod schemas: `z.nativeEnum` used where a TS enum already exists.
|
|
351
|
+
- Prop drilling β when a value is passed through β₯2 layers, is a context preferable?
|
|
352
|
+
- Feature flags via `useCbhFlag` (or repo equivalent), not direct config reads.
|
|
353
|
+
- New components / styles pulled from the design-system library, not built from scratch.
|
|
354
|
+
- Loading / empty / error states present for any new data-fetching surface.
|
|
355
|
+
- Accessibility: keyboard navigation, ARIA roles, labels on interactive controls.
|
|
356
|
+
|
|
357
|
+
Cap output at 8 findings. `failure_mode` required.
|
|
358
|
+
|
|
359
|
+
### Round 2 β debate
|
|
360
|
+
|
|
361
|
+
Dispatch the **same set of agents** that ran in Round 1, in parallel. Each agent receives **all Round 1 outputs** (passed inline) and must:
|
|
362
|
+
|
|
363
|
+
- Re-examine each of their own comments and **withdraw** any that don't survive scrutiny.
|
|
364
|
+
- For each comment from the other agents: mark `agree`, `disagree` (with reasoning), or `refine` (propose a tighter version).
|
|
365
|
+
- Flag items where disagreement is substantive and unlikely to resolve.
|
|
366
|
+
|
|
367
|
+
Output per item: `id`, `original_author` (agent **name**, e.g. `Engineering`, `Minimalist`, `Conventions`, `AntiSlop`, `Security`, `Database`, `Frontend`, `Adversarial` β not the bare letter), `verdict` (keep | withdraw | agree | disagree | refine), `final_severity`, `final_title`, `final_failure_mode`, `reasoning`, `suggested_fix`, and rebuttals `[{from, stance, reasoning}]` where `from` is also the agent name.
|
|
368
|
+
|
|
369
|
+
**AntiSlop (H) plays an enhanced Round 2 role.** Beyond defending / withdrawing its own findings and voting on the others, AntiSlop audits every other agent's findings for AI-bias patterns β asks-for-more-defensiveness, hypothetical "future caller" concerns, restating-the-obvious comment requests, abstract "extract this helper" refactors, observability / test-coverage demands without a named failure mode, and guards-against-states-the-product-cannot-produce. AntiSlop tags those `disagree` with a one-line `slop:` label in `reasoning`. These tags feed the moderator filter (see step 2 there); the original author's Round 2 entry is their chance to rebut with a concrete, product-specific `final_failure_mode`. When other agents see an AntiSlop slop tag on one of their own findings, they should either rebut with a concrete failure mode or withdraw β not both stand on the original framing and re-assert the same wording.
|
|
370
|
+
|
|
371
|
+
**This is the last round.** Residual disagreement goes to the Disagreements section.
|
|
372
|
+
|
|
373
|
+
## Moderator filter (main agent β runs after Round 2, before synthesis)
|
|
374
|
+
|
|
375
|
+
Apply these filters in order. They are non-negotiable:
|
|
376
|
+
|
|
377
|
+
1. **Drop findings with empty or hypothetical `failure_mode`.** "A future caller mightβ¦", "in case someoneβ¦" β drop.
|
|
378
|
+
2. **Apply AntiSlop slop tags.** For every Round 2 entry where AntiSlop voted `disagree` with a `slop:` label in `reasoning`, check the original author's Round 2 rebuttal. If the author did not provide a concrete, product-specific `final_failure_mode` (a real user-, oncall-, or maintainer-visible cost realized through an actual product flow) that addresses AntiSlop's specific tag, **drop the finding**. AntiSlop's audit is not a unilateral veto, but the burden of proof shifts to the original author once AntiSlop tags slop. Record dropped findings in Withdrawn with the slop label so the user can audit AntiSlop's calls (e.g. `B4 β dropped: AntiSlop tagged "slop: asks for defensive guard on already-narrowed value", Engineering did not rebut with concrete failure_mode`).
|
|
379
|
+
3. **Drop do-not-raise items** that slipped through.
|
|
380
|
+
4. **Apply the NIT gate**. NITs that don't meet (a)/(b)/(c) β drop. NITs that do meet are kept internally but hidden by default in synthesis.
|
|
381
|
+
5. **Merge near-duplicates** across agents into one item (preserve all attributions in `Raised by:`).
|
|
382
|
+
6. **Apply hard caps**: at most **6 actionable** items (CRITICAL/MAJOR/MINOR) and **8 NITs** retained internally. Anything beyond is summarized as "N additional items omitted; ask for the full list."
|
|
383
|
+
|
|
384
|
+
Filtered items go into Withdrawn (one-liner) so nothing is invisible.
|
|
385
|
+
|
|
386
|
+
## Synthesize (main agent)
|
|
387
|
+
|
|
388
|
+
### Summary
|
|
389
|
+
|
|
390
|
+
One short paragraph: what the change does and your overall recommendation (ship / ship with changes / do not ship). Do not state the mode, name which agents ran, attribute headline verdicts to specific agents, or note whether Adversarial ran β that methodology metadata is noise for the user. Keep it to substance: the headline concerns and the recommendation.
|
|
391
|
+
|
|
392
|
+
### Actionable
|
|
393
|
+
|
|
394
|
+
Up to 6 items that survived scrutiny and the moderator filter. Format each:
|
|
395
|
+
|
|
396
|
+
- **[SEVERITY] Title** β `file:lines`
|
|
397
|
+
- **Point:** one sentence.
|
|
398
|
+
- **Why it matters:** the `failure_mode` sentence.
|
|
399
|
+
- **Counterpoint (if any):** one sentence on the rebuttal and why it didn't overturn.
|
|
400
|
+
- **Suggested fix (before β after):** two stacked fenced code blocks with language tags β first labeled `// Before` (the current code from `file:lines`), then `// After` (the replacement). Use the same language tag for both. Omit when the fix is structural (no clean drop-in) and explain in prose instead. For a pure deletion, show the `Before` block and write "_Delete these lines._" under it. For a pure addition, show only the `After` block prefixed with `// Add after line <N>`.
|
|
401
|
+
- **Raised by:** comma-separated agent **names** (e.g. `Engineering, Conventions`); **agreed by:** comma-separated agent names. Do not use bare letters in this field β the user shouldn't have to decode A/B/C.
|
|
402
|
+
|
|
403
|
+
Render template:
|
|
404
|
+
|
|
405
|
+
````markdown
|
|
406
|
+
**Suggested fix (before β after):**
|
|
407
|
+
|
|
408
|
+
```ts
|
|
409
|
+
// Before β src/foo/bar.ts:42-45
|
|
410
|
+
const x = doThing();
|
|
411
|
+
if (x !== null) {
|
|
412
|
+
return x;
|
|
413
|
+
}
|
|
414
|
+
```
|
|
415
|
+
|
|
416
|
+
```ts
|
|
417
|
+
// After
|
|
418
|
+
const x = doThing();
|
|
419
|
+
return isDefined(x) ? x : undefined;
|
|
420
|
+
```
|
|
421
|
+
````
|
|
422
|
+
|
|
423
|
+
Order by severity (CRITICAL β MAJOR β MINOR), then by file.
|
|
424
|
+
|
|
425
|
+
### Disagreements
|
|
426
|
+
|
|
427
|
+
Items where agents substantively disagreed and did not converge. One sentence per side (attribute each by agent **name**, e.g. "Engineering: β¦" / "Minimalist: β¦"), then the moderator's call with a one-line reason.
|
|
428
|
+
|
|
429
|
+
### Nits
|
|
430
|
+
|
|
431
|
+
Do **not** print the nits list by default. Print only one line: _"N nits available (M from convention audit). Reply `show nits` to see them."_ If N is 0, omit the section entirely. When the user replies `show nits`, print the gated NITs as one-liners with `file:lines`, the raising agent's **name**, and `[CONVENTION]` tag where applicable, then re-prompt user gate 1.
|
|
432
|
+
|
|
433
|
+
### Withdrawn
|
|
434
|
+
|
|
435
|
+
Terse one-liners of Round 1 comments that were withdrawn or filtered. Each line ends with the raising agent's **name** in parentheses (e.g. "C7 β superseded by C4 (Minimalist)"). For transparency only.
|
|
436
|
+
|
|
437
|
+
## User gate 1 β select items
|
|
438
|
+
|
|
439
|
+
Ask exactly: _"Which items do you want to act on? Reply with ids (e.g. `B1, C2`), `all actionable`, `show nits`, or `none`."_ Do not proceed until the user answers. `none` ends the skill cleanly. `show nits` prints the hidden list, then re-asks the same question.
|
|
440
|
+
|
|
441
|
+
## Branch on mode
|
|
442
|
+
|
|
443
|
+
### Author mode
|
|
444
|
+
|
|
445
|
+
**Plan.** For the selected ids only, produce an ordered implementation plan: steps, files touched per step, tests to add/update, verification commands. Do **not** edit any files yet.
|
|
446
|
+
|
|
447
|
+
**User gate 2 (author mode).** Ask: _"What do you want to do? (`implement-locally` / `post-as-review` / `both` / `edit-plan` / `cancel`)"_
|
|
448
|
+
|
|
449
|
+
- `implement-locally` β execute (see below).
|
|
450
|
+
- `post-as-review` β post anchored review (see below); skill ends.
|
|
451
|
+
- `both` β post the review first, then execute.
|
|
452
|
+
- `edit-plan` β revise from feedback, re-prompt.
|
|
453
|
+
- `cancel` β stop.
|
|
454
|
+
|
|
455
|
+
**Execute.** Use `TaskCreate` to track each step, apply the plan, run the verification, report results. If a step surfaces a new substantive issue not in the selected items, stop and ask before expanding scope.
|
|
456
|
+
|
|
457
|
+
### Reviewer mode
|
|
458
|
+
|
|
459
|
+
Skip the plan step entirely β you are not implementing someone else's code.
|
|
460
|
+
|
|
461
|
+
**User gate 2 (reviewer mode).** Ask: _"Post these on the PR? (`post` / `edit` / `cancel`)"_
|
|
462
|
+
|
|
463
|
+
- `post` β post anchored review (see below).
|
|
464
|
+
- `edit` β ask which items to drop or refine, then re-prompt.
|
|
465
|
+
- `cancel` β stop.
|
|
466
|
+
|
|
467
|
+
## Posting an anchored PR review (both modes)
|
|
468
|
+
|
|
469
|
+
When the user approves items to post, submit a **single** review via the GitHub Reviews API (`event: COMMENT` β never `APPROVE`/`REQUEST_CHANGES`), with every selected actionable item as an inline comment anchored to its diff line. Never use loose issue comments.
|
|
470
|
+
|
|
471
|
+
Follow [references/posting-pr-review.md](references/posting-pr-review.md) exactly β it has the `gh api` call, the payload shape, the three-block review body (attribution line, synthesis summary, apply-all prompt), and the per-comment body budget and format. After posting, print the review `html_url` and a one-line summary of how many comments were posted (and how many fell back to general notes, if any).
|
|
472
|
+
|
|
473
|
+
## Rules
|
|
474
|
+
|
|
475
|
+
- Use fresh `general-purpose` subagents for every round β do not reuse other subagent types.
|
|
476
|
+
- Issue all agent calls in a single message for each round so they run truly in parallel.
|
|
477
|
+
- Keep large content on disk (`/tmp/in-depth-review-*`) so round-2 prompts stay compact.
|
|
478
|
+
- If one agent fails or returns malformed output, re-dispatch **that agent only**; do not restart the round.
|
|
479
|
+
- If the diff is empty, stop with a one-line message β do not invent findings.
|
|
480
|
+
- Do not auto-apply recommendations and do not auto-post reviews. Both user gates are mandatory.
|
|
481
|
+
- The moderator filter is non-negotiable: empty `failure_mode` β drop, NIT not meeting the gate β drop, do-not-raise items β drop, caps applied.
|
|
482
|
+
- Hide nits by default. Only print them when the user replies `show nits`.
|
|
483
|
+
- Reviews are always `event: COMMENT`. Never approve or request changes on the user's behalf.
|
|
484
|
+
- Posted comment bodies are terse: β€60 words of prose + the code block, no bulleted lists, no prose preamble on the `suggestion`, no trailing addenda, and `Why it matters` is dropped when it would only rephrase the Point. The Example/context block is opt-out β include it only when prose + `suggestion` is genuinely ambiguous.
|
|
485
|
+
- Conditional agents (E/F/G) run only when classification matches. Do not invent triggers; if the user wants a forced full run, they will say so.
|
|
486
|
+
- The Adversarial agent (A) is opt-in. Default selected set is Engineering / Minimalist / Conventions / AntiSlop (B/C/D/H), plus the classified subset of Security / Database / Frontend (E/F/G). Dispatch Adversarial only when the user explicitly asks (`with adversarial`, `with agent a`, `+A`, etc.). Surface the Adversarial on/off decision in the Summary so the user can see what ran.
|
|
487
|
+
- **Refer to agents by name in every user-facing surface** β Summary, Actionable items (`Raised by` / `agreed by`), Disagreements, Withdrawn, and `original_author` / `rebuttals[].from` in Round 2 output. Bare letters (A/B/C/D/E/F/G) are acceptable only as ID prefixes (`B1`, `C3`) and as parenthetical shorthand next to the name. The agent-name table in the "Agent roster" section is the authoritative mapping; do not invent alternative names.
|
|
488
|
+
- Freshness preflight is mandatory before any agent dispatch. Subagents read repo context via `git show "<context_ref>:<path>"` / `git grep ... "<context_ref>"`, not the worktree filesystem, unless the resolved `context_ref` is `worktree (stale, user accepted risk)`.
|
|
489
|
+
- Never run state-modifying git commands on the user's behalf (`checkout`, `stash`, `reset`, `clean`, `pull` with merge). The skill warns and asks; the user resolves local state. `git fetch` is allowed because it does not modify the working tree.
|
|
490
|
+
- Cross-repo evidence is opt-in by the user. Subagents tag findings with `evidence_required` and cap them at MAJOR until verified; the moderator asks the user for paths or `gh:owner/repo` slugs and runs the freshness preflight on each before treating the evidence as load-bearing. `skip` downgrades the finding to MINOR with a "speculative" prefix.
|
|
491
|
+
- `suggested_fix` is always a `{ before, after }` object of language-matched code snippets (never prose). **In-chat synthesis Actionable section:** render both halves as separate language-tagged fenced blocks (`// Before` then `// After`) β the user has no GitHub renderer in their terminal, so they need the pair to review the change before approving the post. **PR-review inline comments:** render ONLY the `suggestion` block β GitHub already renders the diff vs. the anchored line(s), so a separate `// Before` block would duplicate what's on screen. Empty `before` means pure addition (still emit a `suggestion` block; prefix `after` with `// Add after line <N>`). Empty `after` means pure deletion (omit the `suggestion` block and write "_Delete the anchored line(s)._"). Omit `suggested_fix` entirely for structural changes with no clean drop-in.
|