@glrs-dev/harness-plugin-opencode 2.1.0 → 2.2.0
- package/CHANGELOG.md +70 -0
- package/README.md +39 -104
- package/dist/agents/prompts/build.md +18 -4
- package/dist/agents/prompts/build.open.md +18 -4
- package/dist/agents/prompts/{qa-thorough.md → code-reviewer-thorough.md} +34 -19
- package/dist/agents/prompts/code-reviewer.md +80 -0
- package/dist/agents/prompts/code-reviewer.open.md +68 -0
- package/dist/agents/prompts/gap-analyzer.md +2 -0
- package/dist/agents/prompts/plan-reviewer.md +3 -0
- package/dist/agents/prompts/plan.md +23 -4
- package/dist/agents/prompts/prime.md +146 -87
- package/dist/agents/prompts/research-auto.md +1 -1
- package/dist/agents/prompts/research-local.md +1 -1
- package/dist/agents/prompts/research-web.md +1 -1
- package/dist/agents/prompts/research.md +2 -0
- package/dist/agents/prompts/spec-reviewer.md +54 -0
- package/dist/agents/prompts/spec-reviewer.open.md +57 -0
- package/dist/agents/shared/index.ts +1 -0
- package/dist/agents/shared/ui-evaluation-ladder.md +50 -0
- package/dist/agents/shared/workflow-mechanics.md +5 -5
- package/dist/autopilot/prompt-template.md +80 -0
- package/dist/{chunk-VJUETC6A.js → chunk-PDMXYZM4.js} +53 -1
- package/dist/cli.js +1333 -1646
- package/dist/commands/prompts/fresh.md +27 -24
- package/dist/commands/prompts/review.md +3 -3
- package/dist/commands/prompts/ship.md +2 -0
- package/dist/index.js +106 -627
- package/dist/skills/adversarial-review-rubric/SKILL.md +47 -0
- package/dist/skills/code-quality/SKILL.md +1 -1
- package/dist/skills/root-cause-diagnosis/SKILL.md +24 -0
- package/dist/skills/spear-protocol/SKILL.md +166 -0
- package/package.json +1 -1
- package/dist/agents/prompts/pilot-assessor.md +0 -77
- package/dist/agents/prompts/pilot-builder.md +0 -40
- package/dist/agents/prompts/pilot-planner.md +0 -56
- package/dist/agents/prompts/pilot-scoper.md +0 -58
- package/dist/agents/prompts/qa-reviewer.md +0 -68
- package/dist/agents/prompts/qa-reviewer.open.md +0 -58
- package/dist/chunk-6CZPRUMJ.js +0 -869
- package/dist/chunk-DZG4D3OH.js +0 -54
- package/dist/chunk-OYRKOEXK.js +0 -88
- package/dist/commands/prompts/autopilot.md +0 -96
- package/dist/install-6775ZBDG.js +0 -13
- package/dist/paths-WZ23ZQOV.js +0 -18
package/CHANGELOG.md
CHANGED
|
@@ -1,5 +1,75 @@
|
|
|
1
1
|
# Changelog
|
|
2
2
|
|
|
3
|
+
## 2.2.0
|
|
4
|
+
|
|
5
|
+
### Minor Changes
|
|
6
|
+
|
|
7
|
+
- [#58](https://github.com/iceglober/glrs/pull/58) [`2720440`](https://github.com/iceglober/glrs/commit/2720440e76ed76f95a59b77525cb140bd673d669) Thanks [@iceglober](https://github.com/iceglober)! - Autopilot rewrite, pilot rip-out, Tier 1 visual capabilities, opencode-snip toggle, research-variant hiding.
|
|
8
|
+
|
|
9
|
+
**Breaking changes:**
|
|
10
|
+
|
|
11
|
+
- **Pilot subsystem removed.** The `glrs oc pilot` CLI subcommand, the four pilot agents (`pilot-scoper` / `planner` / `builder` / `assessor`), the pilot-planning skill references, the `pilot-plugin.ts` runtime enforcer, and all pilot state/docs are gone. Users on pilot should migrate to the CLI autopilot or plain PRIME workflow.
|
|
12
|
+
- **TUI `/autopilot` slash command removed.** Autopilot is now CLI-only: `glrs oc autopilot "<prompt>"`. Users who want autonomous looping run the CLI in any terminal; the TUI stays for interactive work.
|
|
13
|
+
- **Research-variant agents (`research-web`, `research-local`, `research-auto`) hidden from the primary-agent picker.** They now run only as subagents dispatched by `@research`. Users who previously selected them directly should select `@research` instead.
|
|
14
|
+
|
|
15
|
+
**New features:**
|
|
16
|
+
|
|
17
|
+
- **CLI autopilot (`glrs oc autopilot "<prompt>"`)** — Ralph-loop engine: sends your prompt each iteration, watches the agent's response for `<autopilot-done>` sentinel, retries the same prompt when absent. Budgets: 50 iterations / 4h / 3 zero-progress iterations / kill-switch file. Supports single-issue (`"ship ENG-1234"`) and multi-issue (`"ship every open ENG-* issue in project ROADMAP"`) prompts.
|
|
18
|
+
- **opencode-snip installer toggle** — new "Plugin add-ons" section in `glrs oc install` (parallel to existing MCP toggles). Opt-in adds `opencode-snip` to the user's `plugin` array via config-merge, no vendored code. Useful for token reduction on bash-heavy sessions. Requires the Go `snip` binary separately.
|
|
19
|
+
- **Tier 1 visual capabilities** — `@plan`, `@research`, `@gap-analyzer` now have Playwright MCP access (joining `@prime`, `@build`, `@assessor`, `@assessor-thorough`, `@plan-reviewer`). Enable via the installer's Playwright toggle.
|
|
20
|
+
- **UI evaluation ladder (graceful degradation)** — all visual-capable agents now carry a four-tier capability ladder (Playwright → curl → webfetch → source inspection). When Playwright is unavailable, agents fall through to the next tier and report which method they used. No hard failure on Playwright absence.
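The loop and budgets described in the CLI autopilot entry above can be sketched roughly as follows. This is a hypothetical illustration, not the package's implementation: `runLoop`, `IterationResult`, the `madeProgress` flag, and the injected `send` / `killSwitchPresent` / `now` callbacks are assumed names, and the changelog does not say how the real engine measures "zero progress". A real loop would presumably be asynchronous; the sketch is synchronous to stay short.

```typescript
// Hypothetical sketch of a Ralph-style loop; every name here is illustrative.
type StopReason = "done" | "kill-switch" | "max-iterations" | "timeout" | "no-progress";

interface Budgets {
  maxIterations: number;   // changelog: 50 iterations
  maxElapsedMs: number;    // changelog: 4h
  maxZeroProgress: number; // changelog: 3 zero-progress iterations
}

interface IterationResult {
  text: string;          // the agent's response for this iteration
  madeProgress: boolean; // assumption: some progress signal is available
}

const SENTINEL = "<autopilot-done>";
const DEFAULT_BUDGETS: Budgets = {
  maxIterations: 50,
  maxElapsedMs: 4 * 60 * 60 * 1000,
  maxZeroProgress: 3,
};

function runLoop(
  prompt: string,
  send: (prompt: string) => IterationResult,
  killSwitchPresent: () => boolean,
  now: () => number,
  budgets: Budgets = DEFAULT_BUDGETS,
): { reason: StopReason; iterations: number } {
  const startedAt = now();
  let iterations = 0;
  let zeroProgressStreak = 0;

  while (true) {
    // Check every budget before spending another iteration.
    if (killSwitchPresent()) return { reason: "kill-switch", iterations };
    if (iterations >= budgets.maxIterations) return { reason: "max-iterations", iterations };
    if (now() - startedAt >= budgets.maxElapsedMs) return { reason: "timeout", iterations };
    if (zeroProgressStreak >= budgets.maxZeroProgress) return { reason: "no-progress", iterations };

    const result = send(prompt); // the same prompt is re-sent each iteration
    iterations += 1;

    // The sentinel in the response ends the loop successfully.
    if (result.text.indexOf(SENTINEL) !== -1) return { reason: "done", iterations };

    zeroProgressStreak = result.madeProgress ? 0 : zeroProgressStreak + 1;
  }
}
```

Injecting the clock and the kill-switch check keeps the stop conditions testable without touching the filesystem or waiting on real time.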
|
|
21
|
+
|
|
22
|
+
**Internal:**
|
|
23
|
+
|
|
24
|
+
- Server lifecycle helpers (`startServer` / `createSession` / `sendAndWait` / `getLastAssistantMessage`) moved from `src/pilot/server.ts` to `src/lib/opencode-server.ts` (consumed by the CLI autopilot).
|
|
25
|
+
- Agent roster reduced from 20 → 16. Net −5,308 lines across 91 files. Test count 536 → 462 (pilot tests removed, visual-capability tests added).
|
|
26
|
+
|
|
27
|
+
- [#55](https://github.com/iceglober/glrs/pull/55) [`8099c49`](https://github.com/iceglober/glrs/commit/8099c498fa6a9c05c8880bfd09cb2c4fd7d1721c) Thanks [@iceglober](https://github.com/iceglober)! - Rename PRIME arc phases to SPEAR model (Scope → Plan → Execute → Assess → Resolve). Rename @qa-reviewer → @assessor, @qa-thorough → @assessor-thorough. Resolve stage auto-ships (pushes branch, opens PR) — /ship becomes a resume path for interrupted sessions.
|
|
28
|
+
|
|
29
|
+
- [#57](https://github.com/iceglober/glrs/pull/57) [`6212c48`](https://github.com/iceglober/glrs/commit/6212c483efa2cc8f0407bc6a0d8c23110498eb21) Thanks [@iceglober](https://github.com/iceglober)! - Restructure the SPEAR protocol (PRIME's five-stage arc) across four areas: Assess quality, failure discipline, skill modularity, and agent-contract hygiene.
|
|
30
|
+
|
|
31
|
+
**Breaking changes** (match the prior `@assessor` rename's hard-break pattern):
|
|
32
|
+
|
|
33
|
+
- `@assessor` is replaced by `@spec-reviewer` (first pass, returns `[PASS_SPEC]` or `[FAIL_SPEC]`) and `@code-reviewer` (second pass, runs only on PASS_SPEC, returns `[PASS]` / `[LOOP-TO-PLAN]` / `[FIX-INLINE]`). User configs referencing `@assessor` by name will fail to resolve — update to the appropriate replacement.
|
|
34
|
+
- `@assessor-thorough` is renamed to `@code-reviewer-thorough` (same role: opus-tier backstop for high-risk diffs that re-runs the full suite unconditionally).
|
|
35
|
+
- Registered agent count: 20 → 21.
|
|
36
|
+
|
|
37
|
+
**Assess rigor (two-stage review + MECE rubric):**
|
|
38
|
+
|
|
39
|
+
- Every Assess cycle now dispatches two subagents sequentially instead of one, roughly doubling the subagent calls per review cycle. The spec pass is cheaper; the code-quality pass runs only if spec passed.
|
|
40
|
+
- Assess delegations carry a five-dimension MECE rubric (Correctness, Completeness, Consistency, Safety, Scope) and a progressive-strictness signal (Level 1/2/3) that tightens across Assess iterations.
|
|
41
|
+
- PRs with red CI (typecheck, lint, or tests failing) now fail Assess regardless of whether the failure appears pre-existing. "Pre-existing" claims require three-part evidence: a specific commit SHA, `git log` output showing the failure pre-dates the branch, and merge-base reproduction. Claims without all three are auto-rejected.
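The two-stage sequencing described above (spec pass first, code-quality pass only on `[PASS_SPEC]`) might look like the sketch below. It assumes each reviewer's reply contains its verdict as a bracketed token; `parseVerdict`, `assess`, and the `runSpec` / `runCode` callbacks are hypothetical names, not functions from the package.

```typescript
// Hypothetical sketch of the two-stage Assess sequence; names are illustrative.
const SPEC_VERDICTS = ["PASS_SPEC", "FAIL_SPEC"] as const;
const CODE_VERDICTS = ["PASS", "LOOP-TO-PLAN", "FIX-INLINE"] as const;

type SpecVerdict = typeof SPEC_VERDICTS[number];
type CodeVerdict = typeof CODE_VERDICTS[number];

// Find the first allowed verdict that appears as a bracketed token in the reply.
function parseVerdict<T extends string>(reply: string, allowed: readonly T[]): T {
  for (const verdict of allowed) {
    if (reply.indexOf("[" + verdict + "]") !== -1) return verdict;
  }
  throw new Error("no recognised verdict in reply");
}

function assess(
  runSpec: () => string,
  runCode: () => string,
): { spec: SpecVerdict; code: CodeVerdict | null } {
  const spec = parseVerdict(runSpec(), SPEC_VERDICTS);
  // The code-quality pass is skipped entirely when the spec pass fails.
  if (spec === "FAIL_SPEC") return { spec, code: null };
  return { spec, code: parseVerdict(runCode(), CODE_VERDICTS) };
}
```

Because the closing bracket is part of the match, `[PASS]` is never mistaken for `[PASS_SPEC]`.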
|
|
42
|
+
|
|
43
|
+
**Failure discipline (no-defer policy):**
|
|
44
|
+
|
|
45
|
+
- The hard rule that allowed logging pre-existing failures to a plan's `## Open questions` section and deferring them is removed.
|
|
46
|
+
- `@build` now runs a mandatory root-cause diagnosis protocol on any unexpected test/lint/typecheck failure: merge-base reproduction, `git blame`, rationalization table countering common excuse patterns ("likely pre-existing", "unrelated to my change", etc.).
|
|
47
|
+
- If fixing a failure would require touching more than ~5 files outside the plan's `## File-level changes`, `@build` STOPs with a reorganization proposal for PRIME to present to the user — there is no autonomous deferral path.
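As a rough illustration of the out-of-plan threshold above, the check reduces to comparing the touched files with the plan's `## File-level changes` list. The helper names and the exact limit of 5 are assumptions; the changelog only says "~5", so the real rule is presumably a judgment call rather than a hard counter.

```typescript
// Hypothetical helpers; the plugin expresses this rule in prompt text, not code.
const OUT_OF_PLAN_LIMIT = 5; // "~5" in the changelog, so treat as approximate

// Files that were touched but are not listed in the plan.
function filesOutsidePlan(touched: string[], planned: string[]): string[] {
  return touched.filter((file) => planned.indexOf(file) === -1);
}

// True when the fix would sprawl past the limit and should become a STOP.
function mustStop(
  touched: string[],
  planned: string[],
  limit: number = OUT_OF_PLAN_LIMIT,
): boolean {
  return filesOutsidePlan(touched, planned).length > limit;
}
```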
|
|
48
|
+
|
|
49
|
+
**TDD enforcement:**
|
|
50
|
+
|
|
51
|
+
- For any plan with a `## Test plan` entry or a `tests:` field in the acceptance-criteria fence, `@build` now enforces TDD order: write the test first, verify it fails, then implement. Tests in a just-written RED state are explicitly carved out of the failure-diagnosis protocol — they're expected failures, not unexpected ones.
|
|
52
|
+
|
|
53
|
+
**New bundled skills:**
|
|
54
|
+
|
|
55
|
+
- `spear-protocol` — the full SPEAR stage logic (Bootstrap, Scope, Plan, Execute, Assess, Resolve). Loaded by PRIME at session start. Inline fallback retained in `prime.md` in case skill-loading is unavailable.
|
|
56
|
+
- `root-cause-diagnosis` — the failure-diagnosis protocol + rationalization table. Loaded by `@build` and its strict-executor variant on unexpected failures.
|
|
57
|
+
- `adversarial-review-rubric` — the MECE rubric, progressive strictness levels, Red-CI-blocks-merge rule, and three-part evidence test. Loaded by all Assess-layer agents before reviewing.
|
|
58
|
+
|
|
59
|
+
**Agent-contract changes:**
|
|
60
|
+
|
|
61
|
+
- `@build` gains a four-status return protocol: DONE / DONE_WITH_CONCERNS / NEEDS_CONTEXT / BLOCKED.
|
|
62
|
+
- `@build` now reports guidance deviations (item (e) of its return payload) when PRIME's Execute-prompt guidance permits multiple readings and `@build` picked one. Same "silence is not acceptable" bar as plan-file mutations.
|
|
63
|
+
- PRIME runs a pre-dispatch consistency check before every `@build` dispatch: re-read the Execute prompt against the plan and against any already-drafted follow-up prompts. Contradictions caught pre-dispatch avoid the downstream blame-misattribution pattern where faithful agent execution gets narrated as deviation.
|
|
64
|
+
- `@plan` bans placeholder phrases (TBD, TODO, "implement later", etc.) and runs a self-review checklist (spec coverage, placeholder scan, type/name consistency) before handing to `@plan-reviewer`.
|
|
65
|
+
- `@build`'s prompt is trimmed of orchestration context per the Minimal Contract principle (subagents perform worse when carrying parent-level workflow philosophy).
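The placeholder scan that `@plan` runs as part of its self-review (see the list above) is described only in prose. A mechanical version could look like this sketch, where the pattern list is an assumption built from the three phrases the changelog names before its "etc.", and `findPlaceholders` is a hypothetical name.

```typescript
// Hypothetical sketch of a placeholder scan over a plan file's text.
interface PlaceholderHit {
  line: number; // 1-based line number in the plan
  text: string; // the offending line, trimmed
}

// Assumed patterns: the changelog lists TBD, TODO, "implement later", then "etc."
const PLACEHOLDER_PATTERNS: RegExp[] = [/\bTBD\b/, /\bTODO\b/, /implement later/i];

function findPlaceholders(planText: string): PlaceholderHit[] {
  const hits: PlaceholderHit[] = [];
  planText.split("\n").forEach((text, index) => {
    // Record each line at most once, even if several patterns match.
    if (PLACEHOLDER_PATTERNS.some((pattern) => pattern.test(text))) {
      hits.push({ line: index + 1, text: text.trim() });
    }
  });
  return hits;
}
```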
|
|
66
|
+
|
|
67
|
+
**Other refinements:**
|
|
68
|
+
|
|
69
|
+
- PRIME's Scope grounding dispatches parallel `@code-searcher` calls in a single message when grounding touches 3+ independent subsystems.
|
|
70
|
+
- PRIME's Plan stage detects multi-subsystem requests (3+ independent subsystems with no shared interface) and asks whether to split into separate plans.
|
|
71
|
+
- Delegation prompts apply the Minimal Contract minimality test: remove any sentence that doesn't help the subagent produce a better result. Non-goals prefer positive-instruction form ("Only modify files listed above") over negative lists when the positive form is shorter.
|
|
72
|
+
|
|
3
73
|
## 2.1.0
|
|
4
74
|
|
|
5
75
|
## 2.0.1
|
package/README.md
CHANGED
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
# @glrs-dev/harness-plugin-opencode
|
|
2
2
|
|
|
3
|
-
Opinionated agent harness for [OpenCode](https://opencode.ai). Agents, tools, slash commands, and an unattended
|
|
3
|
+
Opinionated agent harness for [OpenCode](https://opencode.ai). Agents, tools, slash commands, and an unattended autopilot loop — one package.
|
|
4
4
|
|
|
5
5
|
## Quick start
|
|
6
6
|
|
|
@@ -21,7 +21,7 @@ bunx @glrs-dev/harness-plugin-opencode install
|
|
|
21
21
|
opencode
|
|
22
22
|
```
|
|
23
23
|
|
|
24
|
-
No global install. All [plugin features](#what-the-plugin-provides) load automatically. You won't have the `glrs-oc` CLI, but
|
|
24
|
+
No global install. All [plugin features](#what-the-plugin-provides) load automatically. You won't have the `glrs-oc` CLI, but you can add it later.
|
|
25
25
|
|
|
26
26
|
### Verifying the published tarball
|
|
27
27
|
|
|
@@ -43,18 +43,18 @@ Open OpenCode in any repo. The `prime` agent handles everything end-to-end.
|
|
|
43
43
|
```
|
|
44
44
|
/fresh ENG-1234
|
|
45
45
|
```
|
|
46
|
-
Wipes the worktree, creates a branch from the ticket ref, and begins the
|
|
46
|
+
Wipes the worktree, creates a branch from the ticket ref, and begins the SPEAR workflow: scope → plan → execute → assess → resolve.
|
|
47
47
|
|
|
48
48
|
**Start a task from a description:**
|
|
49
49
|
```
|
|
50
50
|
/fresh add rate limiting to the upload endpoint
|
|
51
51
|
```
|
|
52
52
|
|
|
53
|
-
**Go hands-off
|
|
53
|
+
**Go hands-off with the Ralph loop (CLI, lights-out):**
|
|
54
54
|
```
|
|
55
|
-
|
|
55
|
+
glrs oc autopilot "ship ENG-1234"
|
|
56
56
|
```
|
|
57
|
-
Runs
|
|
57
|
+
Runs PRIME in a loop: sends your prompt each iteration, watches for `<autopilot-done>` in the response, exits when the sentinel appears or a budget is hit (50 iterations / 4h / 3 zero-progress iterations / kill-switch at `.agent/autopilot-disable`). Works with multi-issue prompts too: `glrs oc autopilot "ship every open issue in Linear project ENG-ROADMAP until the project is done"`. There is no TUI slash command — if you're in the TUI and don't want the loop, just type the task normally.
|
|
58
58
|
|
|
59
59
|
**Ship when done:**
|
|
60
60
|
```
|
|
@@ -66,7 +66,7 @@ Squashes commits, pushes, opens a PR with the plan as the body.
|
|
|
66
66
|
```
|
|
67
67
|
/review 87
|
|
68
68
|
```
|
|
69
|
-
Read-only adversarial review. Fetches the diff, runs typecheck/lint, delegates to `@
|
|
69
|
+
Read-only adversarial review. Fetches the diff, runs typecheck/lint, delegates to `@assessor`, outputs a structured verdict.
|
|
70
70
|
|
|
71
71
|
**Deep codebase research:**
|
|
72
72
|
```
|
|
@@ -74,41 +74,21 @@ Read-only adversarial review. Fetches the diff, runs typecheck/lint, delegates t
|
|
|
74
74
|
```
|
|
75
75
|
Spawns parallel subagents, synthesizes findings with exact file:line references.
|
|
76
76
|
|
|
77
|
-
### Autonomous (pilot CLI)
|
|
78
|
-
|
|
79
|
-
For larger work that benefits from structured scoping and autonomous execution with self-assessment.
|
|
80
|
-
|
|
81
|
-
```bash
|
|
82
|
-
# Scope interactively — spawns OpenCode TUI with the pilot-scoper agent
|
|
83
|
-
glrs-oc pilot scope "Refactor the billing module into separate services"
|
|
84
|
-
|
|
85
|
-
# Execute autonomously — Plan → Execute → Assess → Resolve (SPEAR loop)
|
|
86
|
-
glrs-oc pilot go
|
|
87
|
-
|
|
88
|
-
# Configure models and verify commands for this repo
|
|
89
|
-
glrs-oc pilot configure
|
|
90
|
-
|
|
91
|
-
# Check workflow status
|
|
92
|
-
glrs-oc pilot status
|
|
93
|
-
```
|
|
94
|
-
|
|
95
|
-
See [Pilot mode](#pilot-mode) for the full command reference.
|
|
96
|
-
|
|
97
77
|
---
|
|
98
78
|
|
|
99
79
|
## What the plugin provides
|
|
100
80
|
|
|
101
|
-
|
|
81
|
+
16 agents, 7 slash commands, 5 tools, 5 MCPs, 11 skill bundles, 3 sub-plugins. Details below.
|
|
102
82
|
|
|
103
83
|
### Agents
|
|
104
84
|
|
|
105
85
|
| Agent | Tier | Role |
|
|
106
86
|
|-------|------|------|
|
|
107
|
-
| `prime` | deep |
|
|
87
|
+
| `prime` | deep | SPEAR end-to-end workflow (default agent) |
|
|
108
88
|
| `plan` | deep | Interactive planner with gap analysis and adversarial review |
|
|
109
89
|
| `build` | mid | Plan executor |
|
|
110
|
-
| `
|
|
111
|
-
| `
|
|
90
|
+
| `assessor` | mid | Fast adversarial code review |
|
|
91
|
+
| `assessor-thorough` | deep | Full-suite adversarial review |
|
|
112
92
|
| `plan-reviewer` | deep | Adversarial plan review |
|
|
113
93
|
| `gap-analyzer` | deep | Identifies gaps in plans |
|
|
114
94
|
| `architecture-advisor` | deep | Architecture guidance |
|
|
@@ -116,8 +96,8 @@ See [Pilot mode](#pilot-mode) for the full command reference.
|
|
|
116
96
|
| `docs-maintainer` | mid | Documentation updates |
|
|
117
97
|
| `lib-reader` | mid | Library/dependency reader |
|
|
118
98
|
| `agents-md-writer` | mid | AGENTS.md generation |
|
|
119
|
-
| `
|
|
120
|
-
| `
|
|
99
|
+
| `research` | deep | Multi-workstream research orchestrator |
|
|
100
|
+
| `research-web` / `research-local` / `research-auto` | deep | Research subagents (dispatched by `@research`) |
|
|
121
101
|
|
|
122
102
|
Tiers: **deep** = opus-class, **mid** = sonnet-class, **fast** = haiku-class. Override with [`harness.models`](#model-overrides).
|
|
123
103
|
|
|
@@ -126,13 +106,14 @@ Tiers: **deep** = opus-class, **mid** = sonnet-class, **fast** = haiku-class. Ov
|
|
|
126
106
|
| Command | What it does |
|
|
127
107
|
|---------|-------------|
|
|
128
108
|
| `/fresh <ref>` | Wipe worktree, branch from ticket or description, start PRIME |
|
|
129
|
-
| `/autopilot <ref>` | Hands-off PRIME run; stops when acceptance criteria pass |
|
|
130
109
|
| `/ship <plan>` | Squash, push, open PR |
|
|
131
110
|
| `/review <target>` | Read-only adversarial review (PR#, SHA, branch, or file) |
|
|
132
111
|
| `/research <topic>` | Parallel codebase exploration with file:line citations |
|
|
133
112
|
| `/init-deep` | Generate hierarchical AGENTS.md files |
|
|
134
113
|
| `/costs` | Show running LLM spend totals |
|
|
135
114
|
|
|
115
|
+
Autopilot is CLI-only: `glrs oc autopilot "<prompt>"` (see above).
|
|
116
|
+
|
|
136
117
|
### Tools
|
|
137
118
|
|
|
138
119
|
`ast_grep` · `tsc_check` · `eslint_check` · `todo_scan` · `comment_check`
|
|
@@ -149,94 +130,48 @@ Tiers: **deep** = opus-class, **mid** = sonnet-class, **fast** = haiku-class. Ov
|
|
|
149
130
|
|
|
150
131
|
### Sub-plugins
|
|
151
132
|
|
|
152
|
-
- **autopilot** — idle-nudge loop driver (only activates via `/autopilot`)
|
|
153
133
|
- **notify** — OS notifications when the agent asks a question
|
|
154
134
|
- **cost-tracker** — LLM spend by provider/model at `~/.glorious/opencode/costs.json`
|
|
155
|
-
- **
|
|
135
|
+
- **tool-hooks** — post-edit verification loop (tsc, eslint) + output backpressure
|
|
156
136
|
|
|
157
137
|
### Skills
|
|
158
138
|
|
|
159
|
-
`
|
|
139
|
+
`adr` · `agent-estimation` · `code-quality` · `research` · `research-auto` · `research-local` · `research-web` · `review-plan` · `vercel-composition-patterns` · `vercel-react-best-practices` · `web-design-guidelines`
|
|
160
140
|
|
|
161
141
|
---
|
|
162
142
|
|
|
163
|
-
##
|
|
164
|
-
|
|
165
|
-
Autonomous code execution using the SPEAR loop (Scope → Plan → Execute → Assess → Resolve). The user scopes interactively, then `pilot go` runs the rest autonomously with self-assessment and deployment-risk reflection.
|
|
166
|
-
|
|
167
|
-
**Prerequisites:** `git` >= 2.5, `opencode` on PATH. Plugin must be installed (auto-prompted if missing).
|
|
168
|
-
|
|
169
|
-
### Commands
|
|
143
|
+
## Enabling visual UI capabilities
|
|
170
144
|
|
|
171
|
-
|
|
172
|
-
|---------|-------------|
|
|
173
|
-
| `glrs-oc pilot scope "<goal>"` | Interactive scoping session. Produces `scope.json` with framing + acceptance criteria. |
|
|
174
|
-
| `glrs-oc pilot go` | Autonomous execution. Reads scope, runs Plan → Execute → Assess → Resolve. |
|
|
175
|
-
| `glrs-oc pilot configure` | Interactive per-phase model selection, verify commands, assess cycles, Playwright toggle. |
|
|
176
|
-
| `glrs-oc pilot status` | Workflow status from SQLite. `--workflow <id>`, `--json`. |
|
|
177
|
-
|
|
178
|
-
### SPEAR loop
|
|
179
|
-
|
|
180
|
-
1. **Scope** (interactive) — scoper agent interviews you, explores the codebase, produces acceptance criteria.
|
|
181
|
-
2. **Plan** (autonomous) — planner agent decomposes ACs into an ordered task list.
|
|
182
|
-
3. **Execute** (autonomous) — builder agent runs one task at a time, commits on verify pass.
|
|
183
|
-
4. **Assess** (autonomous) — assessor evaluates ACs + asks deployment-risk questions (what could break? unexpected consequences? what could go wrong?). If fail → re-plan the gap → re-execute → re-assess (bounded by `max_assess_cycles`).
|
|
184
|
-
5. **Resolve** (autonomous) — final summary with acknowledged risks.
|
|
145
|
+
The `@plan`, `@research`, `@gap-analyzer`, `@prime`, `@build`, `@assessor`, `@assessor-thorough`, and `@plan-reviewer` agents can verify web UIs, rendered output, and visual components when Playwright is available.
|
|
185
146
|
|
|
186
|
-
###
|
|
187
|
-
|
|
188
|
-
```
|
|
189
|
-
~/.glorious/opencode/<repo>/pilot/
|
|
190
|
-
state.sqlite # workflows + events
|
|
191
|
-
current-scope.json # pointer to active scope
|
|
192
|
-
scopes/<workflowId>/
|
|
193
|
-
scope.json # framing + acceptance criteria
|
|
194
|
-
plan.json # task list
|
|
195
|
-
assessment-cycle-N.json # assessment reports
|
|
196
|
-
```
|
|
147
|
+
### Enable Playwright MCP
|
|
197
148
|
|
|
198
|
-
|
|
199
|
-
|
|
200
|
-
### Configuration
|
|
201
|
-
|
|
202
|
-
Config lives at `.glrs/pilot.json` in your repo (not per-plan YAML):
|
|
149
|
+
During `glrs-oc install-plugin`, select **Playwright — browser automation + visual UI verification (requires Chromium)** in the MCP toggle list. Or enable it manually in `opencode.json`:
|
|
203
150
|
|
|
204
151
|
```json
|
|
205
152
|
{
|
|
206
|
-
"
|
|
207
|
-
"
|
|
208
|
-
|
|
209
|
-
"execute": "anthropic/claude-sonnet-4-6",
|
|
210
|
-
"assess": "anthropic/claude-sonnet-4-6"
|
|
211
|
-
},
|
|
212
|
-
"verify": {
|
|
213
|
-
"baseline": ["bun test", "bun run typecheck"],
|
|
214
|
-
"after_each": ["bun run typecheck"]
|
|
215
|
-
},
|
|
216
|
-
"max_assess_cycles": 3,
|
|
217
|
-
"playwright": { "enabled": false, "base_url": "http://localhost:3000" }
|
|
153
|
+
"mcp": {
|
|
154
|
+
"playwright": { "enabled": true }
|
|
155
|
+
}
|
|
218
156
|
}
|
|
219
157
|
```
|
|
220
158
|
|
|
221
|
-
|
|
159
|
+
Then install Chromium:
|
|
222
160
|
|
|
223
|
-
|
|
161
|
+
```bash
|
|
162
|
+
npx playwright install chromium
|
|
163
|
+
```
|
|
224
164
|
|
|
225
|
-
|
|
165
|
+
### Graceful degradation
|
|
226
166
|
|
|
227
|
-
|
|
228
|
-
|---|---|
|
|
229
|
-
| `pilot plan` | `pilot scope "<goal>"` |
|
|
230
|
-
| `pilot build` | `pilot go` |
|
|
231
|
-
| `pilot validate` | `pilot configure` (config validation) |
|
|
232
|
-
| `pilot status` | `pilot status` (same name, different output) |
|
|
233
|
-
| `pilot logs` | `pilot status --json` |
|
|
234
|
-
| `pilot cost` | `pilot status --json` |
|
|
235
|
-
| `pilot build-resume` | `pilot go` (re-reads scope, restarts from Plan) |
|
|
167
|
+
Agents automatically fall back when Playwright is unavailable:
|
|
236
168
|
|
|
237
|
-
|
|
169
|
+
1. **Tier A (Playwright)** — navigate, screenshot, evaluate DOM. Best signal.
|
|
170
|
+
2. **Tier B (curl)** — parse returned HTML for structure and reachability.
|
|
171
|
+
3. **Tier C (webfetch)** — built-in tool for public URLs.
|
|
172
|
+
4. **Tier D (source inspection)** — read component files and reason about rendering. Agent flags "visual verification skipped" in its final message.
|
|
238
173
|
|
|
239
|
-
|
|
174
|
+
No configuration required — agents detect capability absence from MCP errors and fall through automatically.
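The fall-through behaviour above can be pictured as an ordered list of probes, where the first one that does not fail wins and the result records which tier was used. This is a sketch under assumptions: in the plugin the ladder lives in prompt text that agents follow, and `Probe`, `evaluateUi`, and the result shape are hypothetical.

```typescript
// Hypothetical sketch of the four-tier ladder; names are illustrative.
type Tier = "playwright" | "curl" | "webfetch" | "source-inspection";

interface Probe {
  tier: Tier;
  run: (target: string) => string; // throws when the capability is unavailable
}

interface Evaluation {
  tier: Tier;                         // which method produced the evidence
  evidence: string;
  skipped: Tier[];                    // tiers that failed before this one
  visualVerificationSkipped: boolean; // true only for the source-inspection tier
}

function evaluateUi(target: string, probes: Probe[]): Evaluation {
  const skipped: Tier[] = [];
  for (const probe of probes) {
    try {
      const evidence = probe.run(target);
      return {
        tier: probe.tier,
        evidence,
        skipped,
        visualVerificationSkipped: probe.tier === "source-inspection",
      };
    } catch {
      // Capability absent or errored: fall through to the next tier.
      skipped.push(probe.tier);
    }
  }
  throw new Error("no evaluation tier succeeded for " + target);
}
```

Returning the tier alongside the evidence mirrors the requirement that agents report which method they used.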
|
|
240
175
|
|
|
241
176
|
---
|
|
242
177
|
|
|
@@ -293,7 +228,7 @@ Your opencode.json values win. Example:
|
|
|
293
228
|
| `glrs-oc install-plugin [--pin] [--dry-run]` | Register plugin in opencode.json |
|
|
294
229
|
| `glrs-oc uninstall [--dry-run]` | Remove plugin from opencode.json |
|
|
295
230
|
| `glrs-oc doctor` | Check installation health |
|
|
296
|
-
| `glrs-oc
|
|
231
|
+
| `glrs-oc autopilot "<prompt>"` | Run PRIME in a loop (lights-out) |
|
|
297
232
|
| `glrs-oc plan-dir` | Print repo-shared plan directory |
|
|
298
233
|
| `glrs-oc plan-check <path>` | Validate legacy markdown plan files |
|
|
299
234
|
|
|
@@ -324,7 +259,7 @@ bun remove -g @glrs-dev/harness-plugin-opencode # remove CLI
|
|
|
324
259
|
- `bun`
|
|
325
260
|
- `uvx` for serena + git MCPs (`brew install uv`)
|
|
326
261
|
- `node`/`npx` for memory MCP
|
|
327
|
-
- `git`
|
|
262
|
+
- `git` for version control operations
|
|
328
263
|
|
|
329
264
|
## Security & threat boundaries
|
|
330
265
|
|
|
@@ -334,8 +269,8 @@ Report vulnerabilities privately per [`SECURITY.md`](./SECURITY.md) — do NOT o
|
|
|
334
269
|
|
|
335
270
|
This is a plugin with broad local-machine access. Install it deliberately:
|
|
336
271
|
|
|
337
|
-
- **Reads and writes files** under your home directory (`~/.config/opencode/opencode.json`, `~/.cache/harness-opencode/*`, `~/.config/harness-opencode/install-id`, `~/.glorious/opencode/<repo
|
|
338
|
-
- **Runs local subprocesses** during normal operation: `git`, `gh`, `npm`/`bun`, `ast-grep`, `tsc`, `opencode`, and project-specific verify commands
|
|
272
|
+
- **Reads and writes files** under your home directory (`~/.config/opencode/opencode.json`, `~/.cache/harness-opencode/*`, `~/.config/harness-opencode/install-id`, `~/.glorious/opencode/<repo>/*`).
|
|
273
|
+
- **Runs local subprocesses** during normal operation: `git`, `gh`, `npm`/`bun`, `ast-grep`, `tsc`, `opencode`, and project-specific verify commands.
|
|
339
274
|
- **Makes outbound HTTPS calls** (all opt-out-able):
|
|
340
275
|
- `registry.npmjs.org` — daily version check. Opt out: `HARNESS_OPENCODE_UPDATE_CHECK=0`.
|
|
341
276
|
- `catwalk.charm.land` — model catalog during interactive install only. Response is schema-validated before it reaches your `opencode.json`.
|
|
@@ -47,9 +47,12 @@ Before editing any file longer than ~200 lines, run `comment_check` scoped to th
|
|
|
47
47
|
For each item in `## File-level changes`:
|
|
48
48
|
1. Make the change.
|
|
49
49
|
2. After each non-trivial change, run lint and tests for the affected files.
|
|
50
|
-
3. If a test fails, fix it before moving on.
|
|
50
|
+
3. If a test fails, fix it before moving on. Run the root-cause diagnosis protocol below before drawing any conclusion about the failure's origin.
|
|
51
51
|
4. Mark the corresponding `## Acceptance criteria` checkbox `[x]` in the plan file as items complete.
|
|
52
52
|
|
|
53
|
+
**When any test/lint/typecheck fails unexpectedly, load the `root-cause-diagnosis` skill via the Skill tool and follow its protocol.**
|
|
54
|
+
The skill contains: merge-base reproduction, git blame evidence, scope check, rationalization table, and TDD-RED exception.
|
|
55
|
+
|
|
53
56
|
**Fenced plans — TDD order.** If the plan's `## Acceptance criteria` contains a ```plan-state fence, work item-by-item in TDD order: for each acceptance item, write the test(s) named in its `tests:` field FIRST (they must fail initially), then implement the change that makes them pass, then confirm by running the item's `verify:` command. Only mark the fence item `- [x]` after the verify command exits 0. This is how fenced plans encode strict TDD — the `tests:` field is the spec; the code is secondary.
|
|
54
57
|
|
|
55
58
|
When you discover the plan is wrong:
|
|
@@ -64,7 +67,7 @@ Before returning to PRIME (or declaring complete on a top-level invocation):
|
|
|
64
67
|
- `tsc_check` on each edited file is clean (it's capped and fast — run it).
|
|
65
68
|
- `git diff --stat` matches the plan's `## File-level changes`.
|
|
66
69
|
|
|
67
|
-
Do NOT run the full test suite or a full lint pass. PRIME's
|
|
70
|
+
Do NOT run the full test suite or a full lint pass. PRIME's Assess stage delegates that to `@spec-reviewer` / `@code-reviewer` / `@code-reviewer-thorough`, which will fail you if a full-suite regression slips through. Running the full suite here duplicates that work. Per-file tests during execution (section 3) are expected; a final full-suite run is not.
|
|
68
71
|
|
|
69
72
|
## 5. Return payload
|
|
70
73
|
|
|
@@ -76,13 +79,22 @@ Return control to your caller with a structured summary:
|
|
|
76
79
|
|
|
77
80
|
**(c) Plan mutations** — any cosmetic/numeric threshold bumps you absorbed silently, any scope expansions under the 2-file limit you absorbed. Be explicit: *"Updated plan §4 line-count threshold from 200 → 260 (file ended up 258 lines; self-imposed metric)"* is a good entry; silence is not.
|
|
78
81
|
|
|
79
|
-
**(d) Unusual conditions** —
|
|
82
|
+
**(d) Unusual conditions** — files touched outside `## File-level changes` with justification, any STOP condition you hit.
|
|
83
|
+
|
|
84
|
+
**(e) Guidance deviations** — when PRIME's Execute-prompt guidance contains instructions that you interpreted in a way that could plausibly be read differently (the plan permitted multiple readings; the Execute prompt and the plan pointed in subtly different directions; two items in the Execute prompt were in tension and you picked one), surface the decision explicitly. Example entry: *"Execute prompt item #12 said 'extract common content to skill'; I read this as 'remove from agent prompts and put only in skill' and extracted fully; alternate reading was 'duplicate in skill while keeping inline as enforced default.' Chose full extraction because DRY and the rules also live in prime.md hard rules."* Silence is not acceptable — same bar as item (c). A PRIME that can't see the decision-point after the fact has no way to tell a defensible judgment from a silent disobedience.
|
|
85
|
+
|
|
86
|
+
**Return status.** Use one of these four statuses in your return:
|
|
87
|
+
|
|
88
|
+
- **DONE** — all acceptance criteria met, no concerns.
|
|
89
|
+
- **DONE_WITH_CONCERNS** — all acceptance criteria met, but you noticed issues worth PRIME's attention (e.g., a pattern inconsistency you worked around, a non-blocking lint warning, a TODO you left in place per the plan's `## Out of scope`). List concerns explicitly.
|
|
90
|
+
- **NEEDS_CONTEXT** — you hit ambiguity that requires user input before you can proceed. Describe what's needed.
|
|
91
|
+
- **BLOCKED** — a hard blocker prevents completion (missing dependency, conflicting plan, broken environment). Describe the blocker.
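One way to picture the four statuses above is as a discriminated union, so that each status carries only the detail the prompt asks for. The type and field names below (`BuildReturn`, `concerns`, `needed`, `blocker`) are hypothetical; the prompt defines the statuses in prose and does not prescribe a data shape.

```typescript
// Hypothetical modelling of @build's return status; field names are assumptions.
type BuildReturn =
  | { status: "DONE" }
  | { status: "DONE_WITH_CONCERNS"; concerns: string[] }
  | { status: "NEEDS_CONTEXT"; needed: string }
  | { status: "BLOCKED"; blocker: string };

// Render a one-line summary; the switch is exhaustive over the four statuses.
function describeReturn(result: BuildReturn): string {
  switch (result.status) {
    case "DONE":
      return "DONE";
    case "DONE_WITH_CONCERNS":
      return "DONE_WITH_CONCERNS: " + result.concerns.join("; ");
    case "NEEDS_CONTEXT":
      return "NEEDS_CONTEXT: " + result.needed;
    case "BLOCKED":
      return "BLOCKED: " + result.blocker;
  }
}
```

With this shape, a `DONE_WITH_CONCERNS` value that omits its concerns list fails to type-check, which matches the "list concerns explicitly" requirement.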
|
|
80
92
|
|
|
81
93
|
**STOP payloads.** If you hit a blocker instead of completing, make the STOP clearly labeled in your return so PRIME recognizes it as a blocker rather than a completion. Format:
|
|
82
94
|
|
|
83
95
|
> STOP: <one-sentence blocker>. <Which of the three classes this falls under: cosmetic-numeric / approach-design / scope-expansion-over-2-files>. <What PRIME needs to resolve to re-dispatch>.
|
|
84
96
|
|
|
85
|
-
PRIME owns QA dispatch. Do NOT delegate to `@
|
|
97
|
+
PRIME owns QA dispatch. Do NOT delegate to `@spec-reviewer`, `@code-reviewer`, or `@code-reviewer-thorough` yourself when invoked as a subagent — PRIME's Assess stage applies a fast-vs-thorough heuristic based on diff size + risk that you don't have full context for. When invoked top-level (`@build <plan-path>`), you may delegate to `@spec-reviewer` directly as the session's final step.
|
|
86
98
|
|
|
87
99
|
# Hard rules
|
|
88
100
|
|
|
@@ -91,3 +103,5 @@ PRIME owns QA dispatch. Do NOT delegate to `@qa-reviewer` or `@qa-thorough` your
 - **Never use `--no-verify` or `--no-gpg-sign`** to bypass pre-commit hooks. If a hook blocks you, fix the root cause (resolve TODOs, repair lint/type errors). If the hook seems genuinely wrong, STOP and ask the user.
 - Plan file mutations: mark `[x]` freely as items complete. For **cosmetic / self-imposed numeric thresholds** (line-count budgets, row caps, arbitrary `< N` limits the planner set on itself), update the threshold silently and note it in your commit message — do NOT stop. For **approach / design changes** (the interface doesn't exist, the test strategy won't work, a whole section needs restructuring), stop and use the `question` tool. For **scope expansion** (an extra file or two needed to finish the item), add to `## File-level changes` and keep going; only ask if the expansion is > ~2 files or shifts the `## Goal`.
 - The user's goals are fixed; your own metrics are revisable. If you find yourself working around the plan's *approach*, that's a design-change signal — stop and ask. If you're just bumping a threshold you set on yourself, keep moving.
+
+{UI_EVALUATION_LADDER}
@@ -37,12 +37,17 @@ Before starting, note: file count, which acceptance criteria you will verify, an

 ## 3. Execute task by task

+**Fenced plans — TDD order.** If the plan's `## Acceptance criteria` contains a ```plan-state fence, work item-by-item in TDD order: for each acceptance item, write the test(s) named in its `tests:` field FIRST (they must fail initially), then implement the change that makes them pass, then confirm by running the item's `verify:` command. Only mark the fence item `- [x]` after the verify command exits 0.
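The TDD ordering described above can be sketched as a single function. This is an illustration under stated assumptions: the `FenceItem` shape, the `Run` callback, and `completeItem` are hypothetical, and the real `plan-state` fence syntax is not shown in this diff. Only the ordering (tests fail first, then implement, then verify must exit 0) comes from the prompt.

```typescript
// Illustrative model of one acceptance item inside a plan-state fence.
// Field names mirror the `tests:` and `verify:` fields named in the prompt;
// the exact fence syntax is not shown there, so this shape is an assumption.
interface FenceItem {
  id: string;
  tests: string[]; // commands that run the tests named for this item
  verify: string;  // command that must exit 0 before the item is checked off
  done: boolean;   // corresponds to `- [x]` in the fence
}

// `run` executes a command and returns its exit code. It is injected so the
// ordering logic can be exercised without a real shell.
type Run = (command: string) => number;

function completeItem(item: FenceItem, run: Run, implement: () => void): FenceItem {
  // RED: the tests must fail before the implementation exists.
  for (const t of item.tests) {
    if (run(t) === 0) {
      throw new Error(`${item.id}: test "${t}" passed before implementation`);
    }
  }
  // GREEN: make the change, then confirm with the verify command.
  implement();
  const exitCode = run(item.verify);
  // Only mark the item done when verify exits 0.
  return { ...item, done: exitCode === 0 };
}

// Usage with a stub runner: commands fail until the change is implemented.
let implemented = false;
const stubRun: Run = () => (implemented ? 0 : 1);
const result = completeItem(
  { id: "a1", tests: ["placeholder-test-command"], verify: "placeholder-verify-command", done: false },
  stubRun,
  () => { implemented = true; },
);
console.log(result.done);
```

Throwing when a test passes before implementation is one possible way to enforce the "must fail initially" requirement; the prompt does not say how a violation should be reported.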
+
 For each item in `## File-level changes`:
 1. Make the change.
-2. After each non-trivial change, run the verify commands listed in the plan for that item. If they fail, fix and re-run.
+2. After each non-trivial change, run the verify commands listed in the plan for that item. If they fail, run the root-cause diagnosis protocol below, fix, and re-run.
 3. If a test fails, fix it before moving on.
 4. Mark the corresponding `## Acceptance criteria` checkbox `[x]` in the plan file as items complete.

+**When any test/lint/typecheck fails unexpectedly, load the `root-cause-diagnosis` skill via the Skill tool and follow its protocol.**
+The skill contains: merge-base reproduction, git blame evidence, scope check, rationalization table, and TDD-RED exception.
+
 **Verify commands.** Run the verify commands listed in the plan. If they pass, the item is done. If they fail, read the output, fix the code, and re-run. Do not mark an item `[x]` until the verify command exits 0.

 When you discover the plan is wrong:
@@ -59,7 +64,7 @@ Before returning:
 - `tsc_check` on each edited file is clean.
 - `git diff --stat` matches the plan's `## File-level changes`.

-Do NOT run the full test suite. PRIME's
+Do NOT run the full test suite. PRIME's Assess stage delegates that to `@spec-reviewer` / `@code-reviewer` / `@code-reviewer-thorough`.

 ## 5. Return payload

@@ -71,13 +76,22 @@ Return control to your caller with a structured summary:

 **(c) Plan mutations** — any changes you made to the plan file itself (threshold bumps, etc.).

-**(d) Unusual conditions** —
+**(d) Unusual conditions** — files touched outside `## File-level changes` with justification, any STOP condition.
+
+**(e) Guidance deviations** — when PRIME's Execute-prompt guidance contains instructions that you interpreted in a way that could plausibly be read differently (the plan permitted multiple readings; the Execute prompt and the plan pointed in subtly different directions; two items in the Execute prompt were in tension and you picked one), surface the decision explicitly. Example entry: *"Execute prompt item #12 said 'extract common content to skill'; I read this as 'remove from agent prompts' and extracted fully; alternate reading was 'duplicate in skill while keeping inline.' Chose full extraction because DRY."* Silence is not acceptable — same bar as item (c).
+
+**Return status.** Use one of these four statuses:
+
+- **DONE** — all acceptance criteria met, no concerns.
+- **DONE_WITH_CONCERNS** — all acceptance criteria met, but you noticed issues worth PRIME's attention. List concerns explicitly.
+- **NEEDS_CONTEXT** — ambiguity requires user input before you can proceed.
+- **BLOCKED** — a hard blocker prevents completion.
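The four statuses listed above map naturally onto a string-literal union. The sketch below is illustrative only: the prompt names the statuses but does not define a schema, so `BuildReturn` and `checkReturn` are hypothetical, and the consistency rules are one reading of "no concerns" and "List concerns explicitly".

```typescript
// The four statuses named in the prompt, modeled as a string-literal union.
type ReturnStatus = "DONE" | "DONE_WITH_CONCERNS" | "NEEDS_CONTEXT" | "BLOCKED";

// Hypothetical payload shape; the prompt lists the statuses but not a schema.
interface BuildReturn {
  status: ReturnStatus;
  concerns: string[]; // expected to be non-empty for DONE_WITH_CONCERNS
}

// Returns a list of problems with the payload; an empty list means it looks consistent.
function checkReturn(r: BuildReturn): string[] {
  const problems: string[] = [];
  if (r.status === "DONE_WITH_CONCERNS" && r.concerns.length === 0) {
    problems.push("DONE_WITH_CONCERNS must list concerns explicitly");
  }
  if (r.status === "DONE" && r.concerns.length > 0) {
    problems.push("DONE means no concerns; use DONE_WITH_CONCERNS instead");
  }
  return problems;
}

console.log(checkReturn({ status: "DONE_WITH_CONCERNS", concerns: [] }));
console.log(checkReturn({ status: "DONE", concerns: [] }));
```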

 **STOP payloads.** If you hit a blocker, label it clearly:

 > STOP: <one-sentence blocker>. <What needs to be resolved to re-dispatch>.

-PRIME owns
+PRIME owns Assess dispatch. Do NOT delegate to `@spec-reviewer`, `@code-reviewer`, or `@code-reviewer-thorough` yourself when invoked as a subagent.

 # Hard rules

@@ -1,39 +1,41 @@
 ---
-name:
-description: Thorough
+name: code-reviewer-thorough
+description: Thorough code reviewer for high-risk diffs. Re-runs full lint/test/typecheck unconditionally. Use for large or high-risk diffs. Returns [PASS], [LOOP-TO-PLAN], or [FIX-INLINE].
 mode: subagent
 model: anthropic/claude-opus-4-7
 temperature: 0.1
 ---

-You are the
+You are the Code Reviewer (thorough variant). The PRIME picks this variant for large or high-risk diffs — your job is to re-run the full lint / test / typecheck suite from scratch and independently verify every acceptance criterion, regardless of what the PRIME claims.

-Do not ask the user questions. Return `[PASS]` or `[
+Do not ask the user questions. Return `[PASS]`, `[LOOP-TO-PLAN: <summary>]`, or `[FIX-INLINE: <summary>]` only.

-You are distinct from `@
+You are distinct from `@code-reviewer`. That variant trusts the PRIME's recent green output and skips redundant re-runs. You do NOT — re-execution is the whole point of delegating to thorough.
+
+You run ONLY after `@spec-reviewer` has returned `[PASS_SPEC]` — spec/scope compliance is already confirmed.

 # Process

 1. **Read the plan** at the path provided.
 2. **Inspect the diff.** Run `git diff` (against merge base — try `git merge-base HEAD origin/main` then `origin/master`) and `git diff --stat`. Also run `git status` to see untracked files.
 3. **Plan-drift check (AUTO-FAIL).** For each modified file in the diff, verify it appears in the plan's `## File-level changes`. A modified file NOT listed in `## File-level changes` is AUTO-FAIL regardless of how "implicit" the coverage seems — the plan should have listed it. Report as `Plan drift: <path> modified but not in ## File-level changes`.
-4. **Scope-creep check.** For each UNTRACKED file (from `git status`) that is NOT in `## File-level changes`, run `git log --oneline -- <file>` to determine whether the file is pre-existing work or scope creep. Do NOT accept the PRIME's verbal "pre-existing" claim without this check. If the file has no prior commits on this branch AND isn't in the plan,
+4. **Scope-creep check.** For each UNTRACKED file (from `git status`) that is NOT in `## File-level changes`, run `git log --oneline -- <file>` to determine whether the file is pre-existing work or scope creep. Do NOT accept the PRIME's verbal "pre-existing" claim without this check. If the file has no prior commits on this branch AND isn't in the plan, LOOP-TO-PLAN with `Scope creep: <path> untracked and not in plan`.
 5. **Semantic verification.** For each item in `## File-level changes`, verify the corresponding code change exists and matches the description. For each `## Acceptance criteria` item, verify it is actually met by reading the code — do NOT trust `[x]` checkboxes.
-6. **Plan-state verify commands (fenced plans only).** Run `bunx @glrs-dev/harness-plugin-opencode plan-check --run <plan-path>` and execute each returned verify command via `bash`. Any non-zero exit →
-7. **Re-run the project's test command.** Unconditionally. Discover the invocation from `package.json` scripts / `Makefile` / `CONTRIBUTING.md` / `AGENTS.md` — typical forms: `pnpm test`, `npm test`, `bun test`, `cargo test`, `pytest`, `go test ./...`. Any failure →
-8. **Re-run the project's lint command.** Unconditionally. E.g., `pnpm lint`, `npm run lint`, `ruff check`, `golangci-lint run`. Any failure →
-9. **Re-run the project's typecheck / build command.** Unconditionally. E.g., `pnpm typecheck`, `tsc --noEmit`, `mypy`, `cargo check`. Any failure →
+6. **Plan-state verify commands (fenced plans only).** Run `bunx @glrs-dev/harness-plugin-opencode plan-check --run <plan-path>` and execute each returned verify command via `bash`. Any non-zero exit → LOOP-TO-PLAN with `Verify failed: <command> (exit N)`. If the plan has no fence (legacy), skip.
+7. **Re-run the project's test command.** Unconditionally. Discover the invocation from `package.json` scripts / `Makefile` / `CONTRIBUTING.md` / `AGENTS.md` — typical forms: `pnpm test`, `npm test`, `bun test`, `cargo test`, `pytest`, `go test ./...`. Any failure → FIX-INLINE (if trivial) or LOOP-TO-PLAN (if structural).
+8. **Re-run the project's lint command.** Unconditionally. E.g., `pnpm lint`, `npm run lint`, `ruff check`, `golangci-lint run`. Any failure → FIX-INLINE.
+9. **Re-run the project's typecheck / build command.** Unconditionally. E.g., `pnpm typecheck`, `tsc --noEmit`, `mypy`, `cargo check`. Any failure → FIX-INLINE.
 10. **Check for missed concerns:**
 - Regressions in adjacent code not mentioned in the plan
 - Missing test coverage for new behavior
 - Hardcoded values that should be config
 - Error paths not handled
-11. **AGENTS.md freshness (hierarchical docs).** For each directory touched by the change, check whether a local `AGENTS.md` exists. If yes, read it and verify its conventions/claims still match the code. If the change shifts a convention and the local `AGENTS.md` wasn't updated,
-12. **Scan for new tech debt.** Run `todo_scan` with `onlyChanged: true`. For every TODO / FIXME / HACK / XXX, check whether the plan's `## Out of scope` or `## Open questions` acknowledges it. Unacknowledged new debt →
+11. **AGENTS.md freshness (hierarchical docs).** For each directory touched by the change, check whether a local `AGENTS.md` exists. If yes, read it and verify its conventions/claims still match the code. If the change shifts a convention and the local `AGENTS.md` wasn't updated, return FIX-INLINE with: `Update <path>/AGENTS.md to reflect <specific change>`. Do not fail on unrelated staleness — only on drift caused by THIS change.
+12. **Scan for new tech debt.** Run `todo_scan` with `onlyChanged: true`. For every TODO / FIXME / HACK / XXX, check whether the plan's `## Out of scope` or `## Open questions` acknowledges it. Unacknowledged new debt → FIX-INLINE with `file:line`.
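The plan-drift check in step 3 reduces to a set difference between the modified paths and the paths the plan lists. The sketch below assumes both lists have already been collected (for example from `git diff --name-only` output and the plan's `## File-level changes` section). `findPlanDrift` is a hypothetical helper, not something the package is known to ship; only the report string follows the prompt.

```typescript
// Hypothetical helper for the plan-drift check (step 3): every modified path must
// appear in the plan's `## File-level changes`. Inputs are plain string arrays;
// how they are collected is left out of this sketch.
function findPlanDrift(modified: string[], planned: string[]): string[] {
  const plannedSet = new Set(planned.map((p) => p.trim()));
  return modified
    .map((p) => p.trim())
    .filter((p) => p.length > 0 && !plannedSet.has(p))
    .map((p) => `Plan drift: ${p} modified but not in ## File-level changes`);
}

const reports = findPlanDrift(["src/a.ts", "src/b.ts"], ["src/a.ts"]);
console.log(reports);
```

Exact string matching is deliberate here: the prompt treats "implicit" coverage as a failure, so the sketch does not attempt directory or glob matching.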

 # Output

-Exactly one of these
+Exactly one of these three formats. Nothing else.

 **If everything passes:**

@@ -43,10 +45,20 @@ Exactly one of these two formats. Nothing else.
 <2–3 sentence summary of verified changes.>
 ```

-**If
+**If structural issues require re-planning:**
+
+```
+[LOOP-TO-PLAN: <one-line summary>]

+1. <File:line> — <Specific issue requiring plan-level change>
+2. <File:line> — <Next issue>
+...
 ```
-
+
+**If trivial issues can be fixed inline:**
+
+```
+[FIX-INLINE: <one-line summary>]

 1. <File:line> — <Specific issue>
 2. <File:line> — <Next issue>
@@ -56,8 +68,11 @@ Exactly one of these two formats. Nothing else.
 # Rules

 - Never suggest fixes. Report precisely; the build agent will fix.
--
--
-- **
-- **AUTO-FAIL on scope creep.** Untracked file not in plan with no prior commits → FAIL.
+- A single failing item is enough to return a non-PASS verdict. Do not minimize.
+- **LOOP-TO-PLAN** for: new files needed, different approach required, missed acceptance criteria, structural regressions.
+- **FIX-INLINE** for: lint failures, missing test assertions, typos, AGENTS.md staleness, unacknowledged tech debt.
 - Re-run test / lint / typecheck unconditionally. That is the whole reason the PRIME picked you over the fast variant.
+- **Load the `adversarial-review-rubric` skill via the Skill tool before reviewing.**
+The skill contains: MECE rubric, progressive strictness levels, Red-CI-blocks-merge rule, and the evidence test for pre-existing claims.
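The LOOP-TO-PLAN / FIX-INLINE split in the rules above can be read as a classification over issue kinds. The following is a rough sketch: the `IssueKind` labels paraphrase the two rule bullets, and `pickVerdict` is hypothetical. The prompt does not say which verdict wins when both kinds of issue are present; this sketch assumes LOOP-TO-PLAN takes precedence because only one verdict may be returned.

```typescript
// Issue kinds paraphrased from the two rule bullets; the identifiers are illustrative.
type IssueKind =
  | "new-file-needed"
  | "different-approach"
  | "missed-acceptance-criterion"
  | "structural-regression"
  | "lint-failure"
  | "missing-test-assertion"
  | "typo"
  | "agents-md-staleness"
  | "unacknowledged-tech-debt";

type Verdict = "PASS" | "LOOP-TO-PLAN" | "FIX-INLINE";

// Kinds the rules route to LOOP-TO-PLAN; every other kind is treated as FIX-INLINE.
const STRUCTURAL: ReadonlySet<IssueKind> = new Set<IssueKind>([
  "new-file-needed",
  "different-approach",
  "missed-acceptance-criterion",
  "structural-regression",
]);

// Assumption: when both kinds of issue are present, LOOP-TO-PLAN wins, since the
// prompt allows exactly one verdict and does not state a precedence.
function pickVerdict(kinds: IssueKind[]): Verdict {
  if (kinds.length === 0) return "PASS";
  return kinds.some((k) => STRUCTURAL.has(k)) ? "LOOP-TO-PLAN" : "FIX-INLINE";
}

// Renders only the header line of the verdict, e.g. `[FIX-INLINE: <summary>]`.
function verdictHeader(kinds: IssueKind[], summary: string): string {
  const verdict = pickVerdict(kinds);
  return verdict === "PASS" ? "[PASS]" : `[${verdict}: ${summary}]`;
}

console.log(verdictHeader(["lint-failure", "typo"], "2 trivial issues"));
```

A single issue is enough to leave PASS, which matches the "single failing item" rule; the numbered issue list that follows the header in the output formats is omitted from this sketch.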
+
+{UI_EVALUATION_LADDER}
|