@glrs-dev/harness-plugin-opencode 2.1.0 → 2.3.0
- package/CHANGELOG.md +133 -0
- package/README.md +42 -106
- package/SECURITY.md +1 -1
- package/dist/agents/prompts/build.md +34 -4
- package/dist/agents/prompts/build.open.md +18 -4
- package/dist/agents/prompts/code-reviewer-thorough.md +77 -0
- package/dist/agents/prompts/code-reviewer.md +80 -0
- package/dist/agents/prompts/code-reviewer.open.md +68 -0
- package/dist/agents/prompts/debriefer.md +55 -0
- package/dist/agents/prompts/gap-analyzer.md +2 -0
- package/dist/agents/prompts/plan-reviewer.md +5 -1
- package/dist/agents/prompts/plan.md +119 -10
- package/dist/agents/prompts/prime.md +149 -88
- package/dist/agents/prompts/research-auto.md +1 -1
- package/dist/agents/prompts/research-local.md +1 -1
- package/dist/agents/prompts/research-web.md +1 -1
- package/dist/agents/prompts/research.md +2 -0
- package/dist/agents/prompts/scoper.md +129 -0
- package/dist/agents/prompts/spec-reviewer.md +53 -0
- package/dist/agents/prompts/spec-reviewer.open.md +56 -0
- package/dist/agents/shared/index.ts +1 -0
- package/dist/agents/shared/ui-evaluation-ladder.md +50 -0
- package/dist/agents/shared/workflow-mechanics.md +5 -5
- package/dist/autopilot/prompt-template.md +104 -0
- package/dist/chunk-GCWHRUOK.js +259 -0
- package/dist/chunk-MJSMBY2Y.js +87 -0
- package/dist/chunk-NIFAVPNN.js +544 -0
- package/dist/{chunk-VJUETC6A.js → chunk-PDMXYZM4.js} +53 -1
- package/dist/cli.js +1596 -1964
- package/dist/commands/prompts/fresh.md +27 -24
- package/dist/commands/prompts/review.md +3 -3
- package/dist/commands/prompts/ship.md +2 -0
- package/dist/index.js +188 -633
- package/dist/loop-session-J35NILUZ.js +30 -0
- package/dist/opencode-server-KPCDFYAX.js +22 -0
- package/dist/plan-parser-TMHEKT22.js +6 -0
- package/dist/plan-session-7VS32P52.js +117 -0
- package/dist/scoper-S77SOK7X.js +326 -0
- package/dist/skills/adversarial-review-rubric/SKILL.md +47 -0
- package/dist/skills/code-quality/SKILL.md +1 -1
- package/dist/skills/root-cause-diagnosis/SKILL.md +24 -0
- package/dist/skills/spear-protocol/SKILL.md +167 -0
- package/package.json +3 -1
- package/dist/agents/prompts/pilot-assessor.md +0 -77
- package/dist/agents/prompts/pilot-builder.md +0 -40
- package/dist/agents/prompts/pilot-planner.md +0 -56
- package/dist/agents/prompts/pilot-scoper.md +0 -58
- package/dist/agents/prompts/qa-reviewer.md +0 -68
- package/dist/agents/prompts/qa-reviewer.open.md +0 -58
- package/dist/agents/prompts/qa-thorough.md +0 -63
- package/dist/bin/plan-check.sh +0 -255
- package/dist/chunk-6CZPRUMJ.js +0 -869
- package/dist/chunk-DZG4D3OH.js +0 -54
- package/dist/chunk-OYRKOEXK.js +0 -88
- package/dist/commands/prompts/autopilot.md +0 -96
- package/dist/install-6775ZBDG.js +0 -13
- package/dist/paths-WZ23ZQOV.js +0 -18
package/CHANGELOG.md
CHANGED
|
@@ -1,5 +1,138 @@
|
|
|
1
1
|
# Changelog
|
|
2
2
|
|
|
3
|
+
## 2.3.0
|
|
4
|
+
|
|
5
|
+
### Minor Changes
|
|
6
|
+
|
|
7
|
+
- [#71](https://github.com/iceglober/glrs/pull/71) [`94704ad`](https://github.com/iceglober/glrs/commit/94704adf36b5ea36fde4557cfd7b1d8494d0e68b) Thanks [@iceglober](https://github.com/iceglober)! - Add `@debriefer` agent and post-run debrief to the autopilot CLI
|
|
8
|
+
|
|
9
|
+
After the Ralph loop exits (any exit reason — sentinel, struggle, timeout, max-iterations, kill-switch, stall, or error), the CLI now optionally spawns a `@debriefer` agent session that produces a structured five-section summary:
|
|
10
|
+
|
|
11
|
+
1. **What was accomplished** — files changed, commits made, PRs opened
|
|
12
|
+
2. **What wasn't finished** — unchecked plan items
|
|
13
|
+
3. **Cost summary** — total USD, iterations completed, exit reason
|
|
14
|
+
4. **What to do next** — actionable suggestions based on exit reason
|
|
15
|
+
5. **Session artifacts** — log file path, plan file path, session ID
|
|
16
|
+
|
|
17
|
+
The debrief runs by default. Skip it with `--no-debrief` on the CLI or by setting `GLRS_AUTOPILOT_DEBRIEF=off` in the environment.
|
|
18
|
+
|
|
19
|
+
The `@debriefer` agent is mid-tier (Sonnet-class), read-only (no file edits, bash limited to git read commands), and never throws — if the debrief session fails, a warning is printed and the CLI exits normally based on the loop result.
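The opt-out and "never throws" behavior described above can be sketched as follows. This is a minimal illustration, not the package's code: `shouldRunDebrief` and `runDebriefSafely` are hypothetical names, and the sketch assumes the environment value is matched exactly as `off`. Only the `--no-debrief` flag and the `GLRS_AUTOPILOT_DEBRIEF` variable come from the changelog.

```typescript
// Hypothetical helpers; only the flag and env var names come from the changelog.
type Env = Record<string, string | undefined>;

// The debrief runs by default and is skipped by either opt-out.
function shouldRunDebrief(argv: string[], env: Env): boolean {
  if (argv.includes("--no-debrief")) return false;
  // Assumption: the value is compared exactly, with no case folding.
  if (env["GLRS_AUTOPILOT_DEBRIEF"] === "off") return false;
  return true;
}

// Runs the debrief; a failure becomes a warning instead of propagating,
// so the caller can still exit based on the loop result.
function runDebriefSafely(debrief: () => void, warn: (msg: string) => void): void {
  try {
    debrief();
  } catch (err) {
    warn(`debrief failed: ${err instanceof Error ? err.message : String(err)}`);
  }
}
```

Taking `argv` and `env` as parameters, rather than reading `process.argv` and `process.env` directly, keeps the decision easy to test.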
|
|
20
|
+
|
|
21
|
+
- [#68](https://github.com/iceglober/glrs/pull/68) [`a5bbbba`](https://github.com/iceglober/glrs/commit/a5bbbba3819b2ba8b08bd8baed8af69670895ca9) Thanks [@iceglober](https://github.com/iceglober)! - Autopilot rewrite, pilot rip-out, Tier 1 visual capabilities, opencode-snip toggle, research-variant hiding.
|
|
22
|
+
|
|
23
|
+
**Breaking changes:**
|
|
24
|
+
|
|
25
|
+
- **Pilot subsystem removed.** The `glrs oc pilot` CLI subcommand, the four pilot agents (`pilot-scoper` / `planner` / `builder` / `assessor`), the pilot-planning skill references, the `pilot-plugin.ts` runtime enforcer, and all pilot state/docs are gone. Users on pilot should migrate to the CLI autopilot or plain PRIME workflow.
|
|
26
|
+
- **TUI `/autopilot` slash command removed.** Autopilot is now CLI-only: `glrs oc autopilot "<prompt>"`. Users who want autonomous looping run the CLI in any terminal; the TUI stays for interactive work.
|
|
27
|
+
- **Research-variant agents (`research-web`, `research-local`, `research-auto`) hidden from the primary-agent picker.** They now run only as subagents dispatched by `@research`. Users who previously selected them directly should select `@research` instead.
|
|
28
|
+
|
|
29
|
+
**New features:**
|
|
30
|
+
|
|
31
|
+
- **CLI autopilot (`glrs oc autopilot "<prompt>"`)** — Ralph-loop engine: sends your prompt each iteration, watches the agent's response for `<autopilot-done>` sentinel, retries the same prompt when absent. Budgets: 50 iterations / 4h / 3 zero-progress iterations / kill-switch file. Supports single-issue (`"ship ENG-1234"`) and multi-issue (`"ship every open ENG-* issue in project ROADMAP"`) prompts.
|
|
32
|
+
- **opencode-snip installer toggle** — new "Plugin add-ons" section in `glrs oc install` (parallel to existing MCP toggles). Opt-in adds `opencode-snip` to the user's `plugin` array via config-merge, no vendored code. Useful for token reduction on bash-heavy sessions. Requires the Go `snip` binary separately.
|
|
33
|
+
- **Tier 1 visual capabilities** — `@plan`, `@research`, `@gap-analyzer` now have Playwright MCP access (joining `@prime`, `@build`, `@assessor`, `@assessor-thorough`, `@plan-reviewer`). Enable via the installer's Playwright toggle.
|
|
34
|
+
- **UI evaluation ladder (graceful degradation)** — all visual-capable agents now carry a four-tier capability ladder (Playwright → curl → webfetch → source inspection). When Playwright is unavailable, agents fall through to the next tier and report which method they used. No hard failure on Playwright absence.
|
|
35
|
+
|
|
36
|
+
**Internal:**
|
|
37
|
+
|
|
38
|
+
- Server lifecycle helpers (`startServer` / `createSession` / `sendAndWait` / `getLastAssistantMessage`) moved from `src/pilot/server.ts` to `src/lib/opencode-server.ts` (consumed by the CLI autopilot).
|
|
39
|
+
- Agent roster reduced from 20 → 16. Net −5,308 lines across 91 files. Test count 536 → 462 (pilot tests removed, visual-capability tests added).
|
|
40
|
+
|
|
41
|
+
- [#68](https://github.com/iceglober/glrs/pull/68) [`a5bbbba`](https://github.com/iceglober/glrs/commit/a5bbbba3819b2ba8b08bd8baed8af69670895ca9) Thanks [@iceglober](https://github.com/iceglober)! - Add `glrs oc loop` as the canonical name for the Ralph-loop CLI runner (previously `glrs oc autopilot`). `autopilot` continues to work as an alias during this release cycle — no user scripts break.
|
|
42
|
+
|
|
43
|
+
A future release will diverge the two: `loop` stays as the raw-prompt Ralph-loop runner, and `autopilot` becomes an interactive scoping walkthrough that generates a structured multi-file plan and then invokes `loop` against it. This change (PR 2 of 3) lays the CLI plumbing for that split; PR 3 ships the interactive walkthrough and the structured plan format.
|
|
44
|
+
|
|
45
|
+
No behavior change in this release — both `glrs oc loop "<prompt>"` and `glrs oc autopilot "<prompt>"` do exactly what `autopilot` did before.
|
|
46
|
+
|
|
47
|
+
- [#65](https://github.com/iceglober/glrs/pull/65) [`4e20574`](https://github.com/iceglober/glrs/commit/4e205745f9d8c46180d99b3237fc038a62cf94f1) Thanks [@iceglober](https://github.com/iceglober)! - Remove the broken `plan-dir` and `plan-check` CLI subcommands and fix `@plan`'s write permission
|
|
48
|
+
|
|
49
|
+
The `bunx @glrs-dev/harness-plugin-opencode plan-dir` and `plan-check` subcommands had been dead since the standalone-invocation redirect guard was introduced in April 2026 — they exit 1 with a deprecation banner and produce no stdout when an agent invokes them via `bunx`. Every caller silently fell through, so this surface was not load-bearing. This release rips both subcommands (and the bundled `plan-check.sh` script) out of the CLI. Agents that previously resolved the plan directory via `plan-dir` now use a four-line inline bash snippet that composes `git rev-parse --git-common-dir`, `dirname`, `basename`, and `mkdir -p` to compute `~/.glorious/opencode/<repo-folder>/plans/` directly (honoring `$GLORIOUS_PLAN_DIR` as an override base). The `plan-paths.ts` library module and its `getRepoFolder`, `getPlanDir`, `migratePlans` exports remain — they were never the broken piece.
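For TypeScript callers, the path computation that the inline snippet performs can be approximated as below. This is a sketch under stated assumptions, not the shipped snippet or the `getPlanDir` export. `planDirFor` is a hypothetical name. The caller is assumed to pass the absolute POSIX output of `git rev-parse --git-common-dir`. The override base is assumed to replace `~/.glorious/opencode` while keeping the `<repo-folder>/plans` suffix. Creating the directory (the `mkdir -p` step) is left to the caller.

```typescript
// Hypothetical helper: mirrors the dirname + basename steps on an absolute
// POSIX path such as "/home/me/code/glrs/.git".
function planDirFor(gitCommonDir: string, home: string, overrideBase?: string): string {
  const trimmed = gitCommonDir.replace(/\/+$/, "");
  // dirname: strip the trailing ".git" component to get the repo root.
  const repoRoot = trimmed.slice(0, trimmed.lastIndexOf("/"));
  // basename: the repo folder name identifies the shared plan directory.
  const repoFolder = repoRoot.slice(repoRoot.lastIndexOf("/") + 1);
  // Assumption: the override replaces the default base directory.
  const base = overrideBase !== undefined && overrideBase.length > 0
    ? overrideBase
    : `${home}/.glorious/opencode`;
  return `${base}/${repoFolder}/plans`;
}
```

Because the folder name is derived from the common git directory rather than the current worktree, linked worktrees of one repository resolve to the same plan directory.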
|
|
50
|
+
|
|
51
|
+
Companion fix: `@plan`'s permission block was missing `write: "allow"`, which prevented the agent from ever creating a plan file even when `plan-dir` was conceptually working. The permission now grants `write: "allow"` plus a four-entry bash allow-list covering only the commands the inline snippet needs. The "plan writes only plan files" invariant is preserved at the prompt layer (hard-rules section).
|
|
52
|
+
|
|
53
|
+
If you were calling `bunx @glrs-dev/harness-plugin-opencode plan-dir` or `plan-check` directly in a script, switch to either (a) the inline bash snippet above or (b) importing `getPlanDir` / `migratePlans` from the library if you're writing TypeScript.
|
|
54
|
+
|
|
55
|
+
- [#68](https://github.com/iceglober/glrs/pull/68) [`a5bbbba`](https://github.com/iceglober/glrs/commit/a5bbbba3819b2ba8b08bd8baed8af69670895ca9) Thanks [@iceglober](https://github.com/iceglober)! - Add multi-file structured plan schema, @scoper agent for interactive scoping, and plan-aware progress reporting in the autopilot plugin.
|
|
56
|
+
|
|
57
|
+
- New `@scoper` primary agent for first-principles alignment before planning
|
|
58
|
+
- Multi-file plan schema: `plans/<slug>/main.md` + `phase_N.md` files for complex features
|
|
59
|
+
- `plan-parser` module: parses both single-file and multi-file plans, returns structured progress data
|
|
60
|
+
- Plan-aware heartbeat: status messages include phase progress for multi-file plans
|
|
61
|
+
- `glrs oc autopilot` is now its own interactive subcommand (diverged from `loop`)
|
|
62
|
+
- `@plan` agent updated with multi-file decision heuristic
|
|
63
|
+
- `@build` agent updated with multi-file plan navigation instructions
|
|
64
|
+
- `@plan-reviewer` agent updated with multi-file consistency validation
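As an illustration of what "structured progress data" for a multi-file plan might look like, the sketch below counts markdown checkboxes per `phase_N.md` file. It is not the `plan-parser` module: the function names, the `Progress` shape, and the assumption that plan items are written as `- [ ]` / `- [x]` checkboxes are all hypothetical.

```typescript
interface Progress {
  done: number;
  total: number;
}

// Counts "- [ ]" and "- [x]" items; assumes plan items are markdown checkboxes.
function countCheckboxes(markdown: string): Progress {
  let done = 0;
  let total = 0;
  for (const line of markdown.split("\n")) {
    const match = /^\s*[-*] \[( |x|X)\]/.exec(line);
    if (match === null) continue;
    total += 1;
    if (match[1] !== " ") done += 1;
  }
  return { done, total };
}

// Maps each phase file name to its progress; main.md and other files are skipped.
function phaseProgress(files: Record<string, string>): Record<string, Progress> {
  const result: Record<string, Progress> = {};
  for (const name of Object.keys(files)) {
    const body = files[name];
    if (body !== undefined && /^phase_\d+\.md$/.test(name)) {
      result[name] = countCheckboxes(body);
    }
  }
  return result;
}
```

A heartbeat message could then report something like "phase_1.md: 2/3" from this data, though the actual status format is not specified in the changelog.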
|
|
65
|
+
|
|
66
|
+
## 2.2.0
|
|
67
|
+
|
|
68
|
+
### Minor Changes
|
|
69
|
+
|
|
70
|
+
- [#58](https://github.com/iceglober/glrs/pull/58) [`2720440`](https://github.com/iceglober/glrs/commit/2720440e76ed76f95a59b77525cb140bd673d669) Thanks [@iceglober](https://github.com/iceglober)! - Autopilot rewrite, pilot rip-out, Tier 1 visual capabilities, opencode-snip toggle, research-variant hiding.
|
|
71
|
+
|
|
72
|
+
**Breaking changes:**
|
|
73
|
+
|
|
74
|
+
- **Pilot subsystem removed.** The `glrs oc pilot` CLI subcommand, the four pilot agents (`pilot-scoper` / `planner` / `builder` / `assessor`), the pilot-planning skill references, the `pilot-plugin.ts` runtime enforcer, and all pilot state/docs are gone. Users on pilot should migrate to the CLI autopilot or plain PRIME workflow.
|
|
75
|
+
- **TUI `/autopilot` slash command removed.** Autopilot is now CLI-only: `glrs oc autopilot "<prompt>"`. Users who want autonomous looping run the CLI in any terminal; the TUI stays for interactive work.
|
|
76
|
+
- **Research-variant agents (`research-web`, `research-local`, `research-auto`) hidden from the primary-agent picker.** They now run only as subagents dispatched by `@research`. Users who previously selected them directly should select `@research` instead.
|
|
77
|
+
|
|
78
|
+
**New features:**
|
|
79
|
+
|
|
80
|
+
- **CLI autopilot (`glrs oc autopilot "<prompt>"`)** — Ralph-loop engine: sends your prompt each iteration, watches the agent's response for `<autopilot-done>` sentinel, retries the same prompt when absent. Budgets: 50 iterations / 4h / 3 zero-progress iterations / kill-switch file. Supports single-issue (`"ship ENG-1234"`) and multi-issue (`"ship every open ENG-* issue in project ROADMAP"`) prompts.
|
|
81
|
+
- **opencode-snip installer toggle** — new "Plugin add-ons" section in `glrs oc install` (parallel to existing MCP toggles). Opt-in adds `opencode-snip` to the user's `plugin` array via config-merge, no vendored code. Useful for token reduction on bash-heavy sessions. Requires the Go `snip` binary separately.
|
|
82
|
+
- **Tier 1 visual capabilities** — `@plan`, `@research`, `@gap-analyzer` now have Playwright MCP access (joining `@prime`, `@build`, `@assessor`, `@assessor-thorough`, `@plan-reviewer`). Enable via the installer's Playwright toggle.
|
|
83
|
+
- **UI evaluation ladder (graceful degradation)** — all visual-capable agents now carry a four-tier capability ladder (Playwright → curl → webfetch → source inspection). When Playwright is unavailable, agents fall through to the next tier and report which method they used. No hard failure on Playwright absence.
|
|
84
|
+
|
|
85
|
+
**Internal:**
|
|
86
|
+
|
|
87
|
+
- Server lifecycle helpers (`startServer` / `createSession` / `sendAndWait` / `getLastAssistantMessage`) moved from `src/pilot/server.ts` to `src/lib/opencode-server.ts` (consumed by the CLI autopilot).
|
|
88
|
+
- Agent roster reduced from 20 → 16. Net −5,308 lines across 91 files. Test count 536 → 462 (pilot tests removed, visual-capability tests added).
|
|
89
|
+
|
|
90
|
+
- [#55](https://github.com/iceglober/glrs/pull/55) [`8099c49`](https://github.com/iceglober/glrs/commit/8099c498fa6a9c05c8880bfd09cb2c4fd7d1721c) Thanks [@iceglober](https://github.com/iceglober)! - Rename PRIME arc phases to SPEAR model (Scope → Plan → Execute → Assess → Resolve). Rename @qa-reviewer → @assessor, @qa-thorough → @assessor-thorough. Resolve stage auto-ships (pushes branch, opens PR) — /ship becomes a resume path for interrupted sessions.
|
|
91
|
+
|
|
92
|
+
- [#57](https://github.com/iceglober/glrs/pull/57) [`6212c48`](https://github.com/iceglober/glrs/commit/6212c483efa2cc8f0407bc6a0d8c23110498eb21) Thanks [@iceglober](https://github.com/iceglober)! - Restructure the SPEAR protocol (PRIME's five-stage arc) across four areas: Assess quality, failure discipline, skill modularity, and agent-contract hygiene.
|
|
93
|
+
|
|
94
|
+
**Breaking changes** (match the prior `@assessor` rename's hard-break pattern):
|
|
95
|
+
|
|
96
|
+
- `@assessor` is replaced by `@spec-reviewer` (first pass, returns `[PASS_SPEC]` or `[FAIL_SPEC]`) and `@code-reviewer` (second pass, runs only on PASS_SPEC, returns `[PASS]` / `[LOOP-TO-PLAN]` / `[FIX-INLINE]`). User configs referencing `@assessor` by name will fail to resolve — update to the appropriate replacement.
|
|
97
|
+
- `@assessor-thorough` is renamed to `@code-reviewer-thorough` (same role: opus-tier backstop for high-risk diffs that re-runs the full suite unconditionally).
|
|
98
|
+
- Registered agent count: 20 → 21.
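The two-pass gating described above can be sketched as follows. The verdict strings are taken from the changelog; the `assess` function and its callback parameters are hypothetical stand-ins for however PRIME actually dispatches the two subagents.

```typescript
type SpecVerdict = "[PASS_SPEC]" | "[FAIL_SPEC]";
type CodeVerdict = "[PASS]" | "[LOOP-TO-PLAN]" | "[FIX-INLINE]";

// Runs the spec pass first; the code-quality pass runs only on PASS_SPEC.
function assess(
  specReview: () => SpecVerdict,
  codeReview: () => CodeVerdict,
): SpecVerdict | CodeVerdict {
  const spec = specReview();
  if (spec === "[FAIL_SPEC]") return spec;
  return codeReview();
}
```

A failed spec pass short-circuits, so the second subagent call is only paid for when the spec pass succeeds.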
|
|
99
|
+
|
|
100
|
+
**Assess rigor (two-stage review + MECE rubric):**
|
|
101
|
+
|
|
102
|
+
- Every Assess cycle now dispatches two subagents sequentially instead of one, roughly doubling the subagent calls per review cycle. The spec pass is cheaper; the code-quality pass runs only if spec passed.
|
|
103
|
+
- Assess delegations carry a five-dimension MECE rubric (Correctness, Completeness, Consistency, Safety, Scope) and a progressive-strictness signal (Level 1/2/3) that tightens across Assess iterations.
|
|
104
|
+
- PRs with red CI (typecheck, lint, or tests failing) now fail Assess regardless of whether the failure appears pre-existing. "Pre-existing" claims require three-part evidence: a specific commit SHA, `git log` output showing the failure pre-dates the branch, and merge-base reproduction. Claims without all three are auto-rejected.
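The three-part evidence test for "pre-existing" claims can be modeled as a simple completeness check. This is a sketch only: the `PreExistingClaim` shape and both function names are hypothetical, and what a reviewer does with an accepted claim is not modeled here.

```typescript
// Hypothetical record of the evidence attached to a "pre-existing failure" claim.
interface PreExistingClaim {
  commitSha?: string;
  gitLogOutput?: string;
  mergeBaseReproduced?: boolean;
}

// Lists which of the three required pieces of evidence are absent.
function missingEvidence(claim: PreExistingClaim): string[] {
  const missing: string[] = [];
  if (claim.commitSha === undefined || claim.commitSha.trim() === "") {
    missing.push("commit SHA");
  }
  if (claim.gitLogOutput === undefined || claim.gitLogOutput.trim() === "") {
    missing.push("git log output");
  }
  if (claim.mergeBaseReproduced !== true) {
    missing.push("merge-base reproduction");
  }
  return missing;
}

// A claim is rejected unless all three parts are present.
function isClaimAccepted(claim: PreExistingClaim): boolean {
  return missingEvidence(claim).length === 0;
}
```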
|
|
105
|
+
|
|
106
|
+
**Failure discipline (no-defer policy):**
|
|
107
|
+
|
|
108
|
+
- The hard rule that allowed logging pre-existing failures to a plan's `## Open questions` section and deferring them is removed.
|
|
109
|
+
- `@build` now runs a mandatory root-cause diagnosis protocol on any unexpected test/lint/typecheck failure: merge-base reproduction, `git blame`, rationalization table countering common excuse patterns ("likely pre-existing", "unrelated to my change", etc.).
|
|
110
|
+
- If fixing a failure would require touching more than ~5 files outside the plan's `## File-level changes`, `@build` STOPs with a reorganization proposal for PRIME to present to the user — there is no autonomous deferral path.
|
|
111
|
+
|
|
112
|
+
**TDD enforcement:**
|
|
113
|
+
|
|
114
|
+
- For any plan with a `## Test plan` entry or a `tests:` field in the acceptance-criteria fence, `@build` now enforces TDD order: write the test first, verify it fails, then implement. Tests in a just-written RED state are explicitly carved out of the failure-diagnosis protocol — they're expected failures, not unexpected ones.
|
|
115
|
+
|
|
116
|
+
**New bundled skills:**
|
|
117
|
+
|
|
118
|
+
- `spear-protocol` — the full SPEAR stage logic (Bootstrap, Scope, Plan, Execute, Assess, Resolve). Loaded by PRIME at session start. Inline fallback retained in `prime.md` in case skill-loading is unavailable.
|
|
119
|
+
- `root-cause-diagnosis` — the failure-diagnosis protocol + rationalization table. Loaded by `@build` and its strict-executor variant on unexpected failures.
|
|
120
|
+
- `adversarial-review-rubric` — the MECE rubric, progressive strictness levels, Red-CI-blocks-merge rule, and three-part evidence test. Loaded by all Assess-layer agents before reviewing.
|
|
121
|
+
|
|
122
|
+
**Agent-contract changes:**
|
|
123
|
+
|
|
124
|
+
- `@build` gains a four-status return protocol: DONE / DONE_WITH_CONCERNS / NEEDS_CONTEXT / BLOCKED.
|
|
125
|
+
- `@build` now reports guidance deviations (item (e) of its return payload) when PRIME's Execute-prompt guidance permits multiple readings and `@build` picked one. Same "silence is not acceptable" bar as plan-file mutations.
|
|
126
|
+
- PRIME runs a pre-dispatch consistency check before every `@build` dispatch: re-read the Execute prompt against the plan and against any already-drafted follow-up prompts. Contradictions caught pre-dispatch avoid the downstream blame-misattribution pattern where faithful agent execution gets narrated as deviation.
|
|
127
|
+
- `@plan` bans placeholder phrases (TBD, TODO, "implement later", etc.) and runs a self-review checklist (spec coverage, placeholder scan, type/name consistency) before handing to `@plan-reviewer`.
|
|
128
|
+
- `@build`'s prompt is trimmed of orchestration context per the Minimal Contract principle (subagents perform worse when carrying parent-level workflow philosophy).
|
|
129
|
+
|
|
130
|
+
**Other refinements:**
|
|
131
|
+
|
|
132
|
+
- PRIME's Scope grounding dispatches parallel `@code-searcher` calls in a single message when grounding touches 3+ independent subsystems.
|
|
133
|
+
- PRIME's Plan stage detects multi-subsystem requests (3+ independent subsystems with no shared interface) and asks whether to split into separate plans.
|
|
134
|
+
- Delegation prompts apply the Minimal Contract minimality test: remove any sentence that doesn't help the subagent produce a better result. Non-goals prefer positive-instruction form ("Only modify files listed above") over negative lists when the positive form is shorter.
|
|
135
|
+
|
|
3
136
|
## 2.1.0
|
|
4
137
|
|
|
5
138
|
## 2.0.1
|
package/README.md
CHANGED
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
# @glrs-dev/harness-plugin-opencode
|
|
2
2
|
|
|
3
|
-
Opinionated agent harness for [OpenCode](https://opencode.ai). Agents, tools, slash commands, and an unattended
|
|
3
|
+
Opinionated agent harness for [OpenCode](https://opencode.ai). Agents, tools, slash commands, and an unattended autopilot loop — one package.
|
|
4
4
|
|
|
5
5
|
## Quick start
|
|
6
6
|
|
|
@@ -21,7 +21,7 @@ bunx @glrs-dev/harness-plugin-opencode install
|
|
|
21
21
|
opencode
|
|
22
22
|
```
|
|
23
23
|
|
|
24
|
-
No global install. All [plugin features](#what-the-plugin-provides) load automatically. You won't have the `glrs-oc` CLI, but
|
|
24
|
+
No global install. All [plugin features](#what-the-plugin-provides) load automatically. You won't have the `glrs-oc` CLI, but you can add it later.
|
|
25
25
|
|
|
26
26
|
### Verifying the published tarball
|
|
27
27
|
|
|
@@ -43,18 +43,21 @@ Open OpenCode in any repo. The `prime` agent handles everything end-to-end.
|
|
|
43
43
|
```
|
|
44
44
|
/fresh ENG-1234
|
|
45
45
|
```
|
|
46
|
-
Wipes the worktree, creates a branch from the ticket ref, and begins the
|
|
46
|
+
Wipes the worktree, creates a branch from the ticket ref, and begins the SPEAR workflow: scope → plan → execute → assess → resolve.
|
|
47
47
|
|
|
48
48
|
**Start a task from a description:**
|
|
49
49
|
```
|
|
50
50
|
/fresh add rate limiting to the upload endpoint
|
|
51
51
|
```
|
|
52
52
|
|
|
53
|
-
**Go hands-off
|
|
53
|
+
**Go hands-off with the Ralph loop (CLI, lights-out):**
|
|
54
54
|
```
|
|
55
|
-
|
|
55
|
+
glrs oc loop "ship ENG-1234"
|
|
56
56
|
```
|
|
57
|
-
|
|
57
|
+
|
|
58
|
+
Runs PRIME in a loop: sends your prompt each iteration, watches for `<autopilot-done>` in the response, exits when the sentinel appears or a budget is hit (50 iterations / 4h / 3 zero-progress iterations / kill-switch at `.agent/autopilot-disable`). Works with multi-issue prompts too: `glrs oc loop "ship every open issue in Linear project ENG-ROADMAP until the project is done"`. There is no TUI slash command — if you're in the TUI and don't want the loop, just type the task normally.
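The loop and its budgets can be sketched roughly as below. This is an illustrative model, not the CLI's implementation: `ralphLoop`, `LoopDeps`, and the `madeProgress` flag are hypothetical, the exit-reason labels are this sketch's own, and it assumes the three zero-progress iterations must be consecutive. The sentinel string and the numeric budgets come from the text above.

```typescript
type ExitReason = "sentinel" | "max-iterations" | "timeout" | "zero-progress" | "kill-switch";

interface Turn {
  text: string;
  madeProgress: boolean; // assumption: the caller decides what counts as progress
}

interface LoopDeps {
  send: (prompt: string) => Turn;  // one agent iteration
  killSwitchExists: () => boolean; // e.g. checks for .agent/autopilot-disable
  now: () => number;               // milliseconds, injectable for tests
}

const SENTINEL = "<autopilot-done>";
const MAX_ITERATIONS = 50;
const MAX_ELAPSED_MS = 4 * 60 * 60 * 1000;
const MAX_ZERO_PROGRESS = 3;

function ralphLoop(prompt: string, deps: LoopDeps): { reason: ExitReason; iterations: number } {
  const startedAt = deps.now();
  let zeroProgress = 0;
  for (let i = 1; i <= MAX_ITERATIONS; i++) {
    // Budgets are checked before each send; i - 1 iterations have completed.
    if (deps.killSwitchExists()) return { reason: "kill-switch", iterations: i - 1 };
    if (deps.now() - startedAt >= MAX_ELAPSED_MS) return { reason: "timeout", iterations: i - 1 };
    const turn = deps.send(prompt); // the same prompt is resent every iteration
    if (turn.text.includes(SENTINEL)) return { reason: "sentinel", iterations: i };
    zeroProgress = turn.madeProgress ? 0 : zeroProgress + 1;
    if (zeroProgress >= MAX_ZERO_PROGRESS) return { reason: "zero-progress", iterations: i };
  }
  return { reason: "max-iterations", iterations: MAX_ITERATIONS };
}
```

Injecting `send`, `killSwitchExists`, and `now` keeps the budget logic testable without a running agent session.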
|
|
59
|
+
|
|
60
|
+
`glrs oc autopilot` is an alias for `glrs oc loop` during the current release cycle. A future release will make `autopilot` an interactive scoping walkthrough that produces a structured plan and then invokes `loop` against it; `loop` will stay as the raw-prompt runner.
|
|
58
61
|
|
|
59
62
|
**Ship when done:**
|
|
60
63
|
```
|
|
@@ -66,7 +69,7 @@ Squashes commits, pushes, opens a PR with the plan as the body.
|
|
|
66
69
|
```
|
|
67
70
|
/review 87
|
|
68
71
|
```
|
|
69
|
-
Read-only adversarial review. Fetches the diff, runs typecheck/lint, delegates to `@
|
|
72
|
+
Read-only adversarial review. Fetches the diff, runs typecheck/lint, delegates to `@assessor`, outputs a structured verdict.
|
|
70
73
|
|
|
71
74
|
**Deep codebase research:**
|
|
72
75
|
```
|
|
@@ -74,41 +77,21 @@ Read-only adversarial review. Fetches the diff, runs typecheck/lint, delegates t
|
|
|
74
77
|
```
|
|
75
78
|
Spawns parallel subagents, synthesizes findings with exact file:line references.
|
|
76
79
|
|
|
77
|
-
### Autonomous (pilot CLI)
|
|
78
|
-
|
|
79
|
-
For larger work that benefits from structured scoping and autonomous execution with self-assessment.
|
|
80
|
-
|
|
81
|
-
```bash
|
|
82
|
-
# Scope interactively — spawns OpenCode TUI with the pilot-scoper agent
|
|
83
|
-
glrs-oc pilot scope "Refactor the billing module into separate services"
|
|
84
|
-
|
|
85
|
-
# Execute autonomously — Plan → Execute → Assess → Resolve (SPEAR loop)
|
|
86
|
-
glrs-oc pilot go
|
|
87
|
-
|
|
88
|
-
# Configure models and verify commands for this repo
|
|
89
|
-
glrs-oc pilot configure
|
|
90
|
-
|
|
91
|
-
# Check workflow status
|
|
92
|
-
glrs-oc pilot status
|
|
93
|
-
```
|
|
94
|
-
|
|
95
|
-
See [Pilot mode](#pilot-mode) for the full command reference.
|
|
96
|
-
|
|
97
80
|
---
|
|
98
81
|
|
|
99
82
|
## What the plugin provides
|
|
100
83
|
|
|
101
|
-
|
|
84
|
+
16 agents, 7 slash commands, 5 tools, 5 MCPs, 11 skill bundles, 3 sub-plugins. Details below.
|
|
102
85
|
|
|
103
86
|
### Agents
|
|
104
87
|
|
|
105
88
|
| Agent | Tier | Role |
|
|
106
89
|
|-------|------|------|
|
|
107
|
-
| `prime` | deep |
|
|
90
|
+
| `prime` | deep | SPEAR end-to-end workflow (default agent) |
|
|
108
91
|
| `plan` | deep | Interactive planner with gap analysis and adversarial review |
|
|
109
92
|
| `build` | mid | Plan executor |
|
|
110
|
-
| `
|
|
111
|
-
| `
|
|
93
|
+
| `assessor` | mid | Fast adversarial code review |
|
|
94
|
+
| `assessor-thorough` | deep | Full-suite adversarial review |
|
|
112
95
|
| `plan-reviewer` | deep | Adversarial plan review |
|
|
113
96
|
| `gap-analyzer` | deep | Identifies gaps in plans |
|
|
114
97
|
| `architecture-advisor` | deep | Architecture guidance |
|
|
@@ -116,8 +99,8 @@ See [Pilot mode](#pilot-mode) for the full command reference.
|
|
|
116
99
|
| `docs-maintainer` | mid | Documentation updates |
|
|
117
100
|
| `lib-reader` | mid | Library/dependency reader |
|
|
118
101
|
| `agents-md-writer` | mid | AGENTS.md generation |
|
|
119
|
-
| `
|
|
120
|
-
| `
|
|
102
|
+
| `research` | deep | Multi-workstream research orchestrator |
|
|
103
|
+
| `research-web` / `research-local` / `research-auto` | deep | Research subagents (dispatched by `@research`) |
|
|
121
104
|
|
|
122
105
|
Tiers: **deep** = opus-class, **mid** = sonnet-class, **fast** = haiku-class. Override with [`harness.models`](#model-overrides).
|
|
123
106
|
|
|
@@ -126,13 +109,14 @@ Tiers: **deep** = opus-class, **mid** = sonnet-class, **fast** = haiku-class. Ov
|
|
|
126
109
|
| Command | What it does |
|
|
127
110
|
|---------|-------------|
|
|
128
111
|
| `/fresh <ref>` | Wipe worktree, branch from ticket or description, start PRIME |
|
|
129
|
-
| `/autopilot <ref>` | Hands-off PRIME run; stops when acceptance criteria pass |
|
|
130
112
|
| `/ship <plan>` | Squash, push, open PR |
|
|
131
113
|
| `/review <target>` | Read-only adversarial review (PR#, SHA, branch, or file) |
|
|
132
114
|
| `/research <topic>` | Parallel codebase exploration with file:line citations |
|
|
133
115
|
| `/init-deep` | Generate hierarchical AGENTS.md files |
|
|
134
116
|
| `/costs` | Show running LLM spend totals |
|
|
135
117
|
|
|
118
|
+
Autopilot is CLI-only: `glrs oc loop "<prompt>"` (or the `glrs oc autopilot` alias during the current release cycle — see above).
|
|
119
|
+
|
|
136
120
|
### Tools
|
|
137
121
|
|
|
138
122
|
`ast_grep` · `tsc_check` · `eslint_check` · `todo_scan` · `comment_check`
|
|
@@ -149,94 +133,48 @@ Tiers: **deep** = opus-class, **mid** = sonnet-class, **fast** = haiku-class. Ov
|
|
|
149
133
|
|
|
150
134
|
### Sub-plugins
|
|
151
135
|
|
|
152
|
-
- **autopilot** — idle-nudge loop driver (only activates via `/autopilot`)
|
|
153
136
|
- **notify** — OS notifications when the agent asks a question
|
|
154
137
|
- **cost-tracker** — LLM spend by provider/model at `~/.glorious/opencode/costs.json`
|
|
155
|
-
- **
|
|
138
|
+
- **tool-hooks** — post-edit verification loop (tsc, eslint) + output backpressure
|
|
156
139
|
|
|
157
140
|
### Skills
|
|
158
141
|
|
|
159
|
-
`
|
|
142
|
+
`adr` · `agent-estimation` · `code-quality` · `research` · `research-auto` · `research-local` · `research-web` · `review-plan` · `vercel-composition-patterns` · `vercel-react-best-practices` · `web-design-guidelines`
|
|
160
143
|
|
|
161
144
|
---
|
|
162
145
|
|
|
163
|
-
##
|
|
164
|
-
|
|
165
|
-
Autonomous code execution using the SPEAR loop (Scope → Plan → Execute → Assess → Resolve). The user scopes interactively, then `pilot go` runs the rest autonomously with self-assessment and deployment-risk reflection.
|
|
166
|
-
|
|
167
|
-
**Prerequisites:** `git` >= 2.5, `opencode` on PATH. Plugin must be installed (auto-prompted if missing).
|
|
146
|
+
## Enabling visual UI capabilities
|
|
168
147
|
|
|
169
|
-
|
|
148
|
+
The `@plan`, `@research`, `@gap-analyzer`, `@prime`, `@build`, `@assessor`, `@assessor-thorough`, and `@plan-reviewer` agents can verify web UIs, rendered output, and visual components when Playwright is available.
|
|
170
149
|
|
|
171
|
-
|
|
172
|
-
|---------|-------------|
|
|
173
|
-
| `glrs-oc pilot scope "<goal>"` | Interactive scoping session. Produces `scope.json` with framing + acceptance criteria. |
|
|
174
|
-
| `glrs-oc pilot go` | Autonomous execution. Reads scope, runs Plan → Execute → Assess → Resolve. |
|
|
175
|
-
| `glrs-oc pilot configure` | Interactive per-phase model selection, verify commands, assess cycles, Playwright toggle. |
|
|
176
|
-
| `glrs-oc pilot status` | Workflow status from SQLite. `--workflow <id>`, `--json`. |
|
|
177
|
-
|
|
178
|
-
### SPEAR loop
|
|
179
|
-
|
|
180
|
-
1. **Scope** (interactive) — scoper agent interviews you, explores the codebase, produces acceptance criteria.
|
|
181
|
-
2. **Plan** (autonomous) — planner agent decomposes ACs into an ordered task list.
|
|
182
|
-
3. **Execute** (autonomous) — builder agent runs one task at a time, commits on verify pass.
|
|
183
|
-
4. **Assess** (autonomous) — assessor evaluates ACs + asks deployment-risk questions (what could break? unexpected consequences? what could go wrong?). If fail → re-plan the gap → re-execute → re-assess (bounded by `max_assess_cycles`).
|
|
184
|
-
5. **Resolve** (autonomous) — final summary with acknowledged risks.
|
|
185
|
-
|
|
186
|
-
### State storage
|
|
187
|
-
|
|
188
|
-
```
|
|
189
|
-
~/.glorious/opencode/<repo>/pilot/
|
|
190
|
-
state.sqlite # workflows + events
|
|
191
|
-
current-scope.json # pointer to active scope
|
|
192
|
-
scopes/<workflowId>/
|
|
193
|
-
scope.json # framing + acceptance criteria
|
|
194
|
-
plan.json # task list
|
|
195
|
-
assessment-cycle-N.json # assessment reports
|
|
196
|
-
```
|
|
197
|
-
|
|
198
|
-
Repo identity derived from `git rev-parse --git-common-dir` — worktrees of the same repo share state. Override with `$GLORIOUS_PILOT_DIR`.
|
|
150
|
+
### Enable Playwright MCP
|
|
199
151
|
|
|
200
|
-
|
|
201
|
-
|
|
202
|
-
Config lives at `.glrs/pilot.json` in your repo (not per-plan YAML):
|
|
152
|
+
During `glrs-oc install-plugin`, select **Playwright — browser automation + visual UI verification (requires Chromium)** in the MCP toggle list. Or enable it manually in `opencode.json`:
|
|
203
153
|
|
|
204
154
|
```json
|
|
205
155
|
{
|
|
206
|
-
"
|
|
207
|
-
"
|
|
208
|
-
|
|
209
|
-
"execute": "anthropic/claude-sonnet-4-6",
|
|
210
|
-
"assess": "anthropic/claude-sonnet-4-6"
|
|
211
|
-
},
|
|
212
|
-
"verify": {
|
|
213
|
-
"baseline": ["bun test", "bun run typecheck"],
|
|
214
|
-
"after_each": ["bun run typecheck"]
|
|
215
|
-
},
|
|
216
|
-
"max_assess_cycles": 3,
|
|
217
|
-
"playwright": { "enabled": false, "base_url": "http://localhost:3000" }
|
|
156
|
+
"mcp": {
|
|
157
|
+
"playwright": { "enabled": true }
|
|
158
|
+
}
|
|
218
159
|
}
|
|
219
160
|
```
|
|
220
161
|
|
|
221
|
-
|
|
162
|
+
Then install Chromium:
|
|
222
163
|
|
|
223
|
-
|
|
164
|
+
```bash
|
|
165
|
+
npx playwright install chromium
|
|
166
|
+
```
|
|
224
167
|
|
|
225
|
-
|
|
168
|
+
### Graceful degradation
|
|
226
169
|
|
|
227
|
-
|
|
228
|
-
|---|---|
|
|
229
|
-
| `pilot plan` | `pilot scope "<goal>"` |
|
|
230
|
-
| `pilot build` | `pilot go` |
|
|
231
|
-
| `pilot validate` | `pilot configure` (config validation) |
|
|
232
|
-
| `pilot status` | `pilot status` (same name, different output) |
|
|
233
|
-
| `pilot logs` | `pilot status --json` |
|
|
234
|
-
| `pilot cost` | `pilot status --json` |
|
|
235
|
-
| `pilot build-resume` | `pilot go` (re-reads scope, restarts from Plan) |
|
|
170
|
+
Agents automatically fall back when Playwright is unavailable:
|
|
236
171
|
|
|
237
|
-
|
|
172
|
+
1. **Tier A (Playwright)** — navigate, screenshot, evaluate DOM. Best signal.
|
|
173
|
+
2. **Tier B (curl)** — parse returned HTML for structure and reachability.
|
|
174
|
+
3. **Tier C (webfetch)** — built-in tool for public URLs.
|
|
175
|
+
4. **Tier D (source inspection)** — read component files and reason about rendering. Agent flags "visual verification skipped" in its final message.
|
|
238
176
|
|
|
239
|
-
|
|
177
|
+
No configuration required — agents detect capability absence from MCP errors and fall through automatically.
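
The fallback order above can be pictured as a simple ladder. The following is a minimal sketch under stated assumptions, not the plugin's implementation: the `Tier` type, the `Probe` signature, and the `firstAvailableTier` helper are hypothetical names, and real agents infer availability from MCP errors rather than from a function call.

```typescript
// Hypothetical sketch of the Tier A -> D fallback order described above.
// None of these names come from the plugin; they only illustrate the ladder.
type Tier = "A:playwright" | "B:curl" | "C:webfetch" | "D:source-inspection";

// A probe reports whether a tier's tool is usable; throwing counts as unavailable.
type Probe = () => boolean;

const LADDER: Tier[] = ["A:playwright", "B:curl", "C:webfetch", "D:source-inspection"];

interface TierChoice {
  tier: Tier;
  visualVerificationSkipped: boolean; // assumed flag mirroring the Tier D note
}

function firstAvailableTier(probes: Partial<Record<Tier, Probe>>): TierChoice {
  for (const tier of LADDER) {
    const probe = probes[tier];
    if (!probe) continue; // no probe registered: treat the tier as unavailable
    try {
      if (probe()) {
        return { tier, visualVerificationSkipped: tier === "D:source-inspection" };
      }
    } catch {
      // An error (for example, a missing MCP tool) means: fall through to the next tier.
    }
  }
  // Source inspection needs no external tool, so it is the floor of the ladder.
  return { tier: "D:source-inspection", visualVerificationSkipped: true };
}
```

Treating a thrown error the same as an unavailable tier is what keeps the ladder configuration-free in this sketch: the caller never has to declare which tools exist.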
|
|
240
178
|
|
|
241
179
|
---
|
|
242
180
|
|
|
@@ -293,9 +231,7 @@ Your opencode.json values win. Example:
|
|
|
293
231
|
| `glrs-oc install-plugin [--pin] [--dry-run]` | Register plugin in opencode.json |
|
|
294
232
|
| `glrs-oc uninstall [--dry-run]` | Remove plugin from opencode.json |
|
|
295
233
|
| `glrs-oc doctor` | Check installation health |
|
|
296
|
-
| `glrs-oc
|
|
297
|
-
| `glrs-oc plan-dir` | Print repo-shared plan directory |
|
|
298
|
-
| `glrs-oc plan-check <path>` | Validate legacy markdown plan files |
|
|
234
|
+
| `glrs-oc loop "<prompt>"` | Run PRIME in a Ralph loop (lights-out). `autopilot` is an alias during the current release cycle. |
|
|
299
235
|
|
|
300
236
|
`install` is an alias for `install-plugin`.
|
|
301
237
|
|
|
@@ -324,7 +260,7 @@ bun remove -g @glrs-dev/harness-plugin-opencode # remove CLI
|
|
|
324
260
|
- `bun`
|
|
325
261
|
- `uvx` for serena + git MCPs (`brew install uv`)
|
|
326
262
|
- `node`/`npx` for memory MCP
|
|
327
|
-
- `git`
|
|
263
|
+
- `git` for version control operations
|
|
328
264
|
|
|
329
265
|
## Security & threat boundaries
|
|
330
266
|
|
|
@@ -334,8 +270,8 @@ Report vulnerabilities privately per [`SECURITY.md`](./SECURITY.md) — do NOT o
|
|
|
334
270
|
|
|
335
271
|
This is a plugin with broad local-machine access. Install it deliberately:
|
|
336
272
|
|
|
337
|
-
- **Reads and writes files** under your home directory (`~/.config/opencode/opencode.json`, `~/.cache/harness-opencode/*`, `~/.config/harness-opencode/install-id`, `~/.glorious/opencode/<repo
|
|
338
|
-
- **Runs local subprocesses** during normal operation: `git`, `gh`, `npm`/`bun`, `ast-grep`, `tsc`, `opencode`, and project-specific verify commands
|
|
273
|
+
- **Reads and writes files** under your home directory (`~/.config/opencode/opencode.json`, `~/.cache/harness-opencode/*`, `~/.config/harness-opencode/install-id`, `~/.glorious/opencode/<repo>/*`).
|
|
274
|
+
- **Runs local subprocesses** during normal operation: `git`, `gh`, `npm`/`bun`, `ast-grep`, `tsc`, `opencode`, and project-specific verify commands.
|
|
339
275
|
- **Makes outbound HTTPS calls** (all opt-out-able):
|
|
340
276
|
- `registry.npmjs.org` — daily version check. Opt out: `HARNESS_OPENCODE_UPDATE_CHECK=0`.
|
|
341
277
|
- `catwalk.charm.land` — model catalog during interactive install only. Response is schema-validated before it reaches your `opencode.json`.
|
package/SECURITY.md
CHANGED
|
@@ -44,7 +44,7 @@ If a vulnerability is confirmed and fixed, we will publish a GitHub security adv
|
|
|
44
44
|
**In scope:**
|
|
45
45
|
|
|
46
46
|
- The published npm tarball (`@glrs-dev/harness-plugin-opencode`).
|
|
47
|
-
- CLI subcommands (`glrs-oc`, `harness-opencode`): `install`, `uninstall`, `doctor`, `
|
|
47
|
+
- CLI subcommands (`glrs-oc`, `harness-opencode`): `install`, `uninstall`, `doctor`, `pilot`.
|
|
48
48
|
- Plugin hooks registered via the OpenCode plugin API (`config`, `tool.execute.before/after`, `session.idle`, etc.).
|
|
49
49
|
- The MCP config writer (`src/cli/install.ts`, `src/mcp/index.ts`) and the `opencode.json` merge logic (`src/cli/merge-config.ts`).
|
|
50
50
|
- Outbound network calls the plugin makes on its own:
|
|
@@ -34,6 +34,22 @@ If ANY of these are missing, STOP and report to the user:
|
|
|
34
34
|
|
|
35
35
|
Do NOT attempt to "fill in" missing structure on behalf of the plan. The plan is the spec; if the spec is wrong, fix it explicitly — don't improvise.
|
|
36
36
|
|
|
37
|
+
## 1.5 Multi-file plan handling
|
|
38
|
+
|
|
39
|
+
If the plan path is a directory (contains `main.md`), it is a multi-file plan. Handle it as follows:
|
|
40
|
+
|
|
41
|
+
1. Read `main.md`'s `## Phases` checklist.
|
|
42
|
+
2. Find the first unchecked phase (`- [ ] phase_N.md — ...`).
|
|
43
|
+
3. Open the corresponding `phase_N.md` as the working plan for this iteration.
|
|
44
|
+
4. Execute its items per the normal workflow (sections 2–4 below).
|
|
45
|
+
5. After completing all items in the phase file, re-read it and verify all ACs are `[x]`.
|
|
46
|
+
6. Update `main.md`'s corresponding phase checkbox to `[x]`.
|
|
47
|
+
7. Proceed to the next unchecked phase.
|
|
48
|
+
|
|
49
|
+
Cross-cutting ACs in `main.md` (under `## Cross-cutting acceptance criteria`) are verified independently via their own `verify:` commands after all phases are complete.
|
|
50
|
+
|
|
51
|
+
If the plan path is a single `.md` file, skip this section and proceed normally.
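
Steps 1, 2, and 6 above amount to scanning a checklist and flipping a checkbox. A minimal sketch, assuming the `## Phases` section uses lines shaped like `- [ ] phase_N.md ...` as described; the helper names `firstUncheckedPhase` and `markPhaseDone` are hypothetical and are not part of the plugin.

```typescript
// Hypothetical helper: find the first unchecked phase in main.md's "## Phases" checklist.
// Only the checklist line shape comes from the prompt text; everything else is assumed.
function firstUncheckedPhase(mainMd: string): string | null {
  let inPhases = false;
  for (const line of mainMd.split("\n")) {
    if (/^##\s/.test(line)) {
      // A level-2 heading either opens the Phases section or closes it.
      inPhases = /^##\s+Phases\s*$/.test(line);
      continue;
    }
    if (!inPhases) continue;
    const match = /^- \[ \] (phase_\d+\.md)\b/.exec(line.trim());
    if (match) return match[1] ?? null;
  }
  return null; // every phase is checked, or there is no Phases section
}

// Hypothetical helper: flip one phase's checkbox to [x] (step 6 above).
function markPhaseDone(mainMd: string, phaseFile: string): string {
  return mainMd.replace(`- [ ] ${phaseFile}`, `- [x] ${phaseFile}`);
}
```

Restricting the scan to the `## Phases` section keeps unchecked items under other headings, such as the cross-cutting acceptance criteria, from being mistaken for phases.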
|
|
52
|
+
|
|
37
53
|
## 2. Prepare the return summary
|
|
38
54
|
|
|
39
55
|
Before starting execution, prepare a brief summary for your eventual return payload to PRIME: file count, which acceptance criteria you will verify, any unknowns. When invoked as a subagent (the common case — PRIME delegates Phase 3 to you), this summary is for PRIME to relay to the user; do not narrate to the user directly. When invoked top-level by the user (`@build <plan-path>`), you may print the summary to chat.
|
|
@@ -47,9 +63,12 @@ Before editing any file longer than ~200 lines, run `comment_check` scoped to th
|
|
|
47
63
|
For each item in `## File-level changes`:
|
|
48
64
|
1. Make the change.
|
|
49
65
|
2. After each non-trivial change, run lint and tests for the affected files.
|
|
50
|
-
3. If a test fails, fix it before moving on.
|
|
66
|
+
3. If a test fails, fix it before moving on. Run the root-cause diagnosis protocol below before drawing any conclusion about the failure's origin.
|
|
51
67
|
4. Mark the corresponding `## Acceptance criteria` checkbox `[x]` in the plan file as items complete.
|
|
52
68
|
|
|
69
|
+
**When any test/lint/typecheck fails unexpectedly, load the `root-cause-diagnosis` skill via the Skill tool and follow its protocol.**
|
|
70
|
+
The skill covers merge-base reproduction, `git blame` evidence, a scope check, a rationalization table, and the TDD-RED exception.
|
|
71
|
+
|
|
53
72
|
**Fenced plans — TDD order.** If the plan's `## Acceptance criteria` contains a ```plan-state fence, work item-by-item in TDD order: for each acceptance item, write the test(s) named in its `tests:` field FIRST (they must fail initially), then implement the change that makes them pass, then confirm by running the item's `verify:` command. Only mark the fence item `- [x]` after the verify command exits 0. This is how fenced plans encode strict TDD — the `tests:` field is the spec; the code is secondary.
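
The TDD order described above can be sketched as a loop over fence items. This is an illustration under stated assumptions: the `FenceItem` shape, the injected `Run` function, and the `Steps` callbacks are hypothetical, and using the item's `verify:` command to confirm the initial failure is an assumption of this sketch rather than something the prompt specifies.

```typescript
// Hypothetical shape of one acceptance item in a plan-state fence. The `tests` and
// `verify` fields mirror the fields named in the prompt; the rest is assumed.
interface FenceItem {
  id: string;
  tests: string[];
  verify: string;
  done: boolean;
}

// Injected so the sketch stays runnable without a shell; returns a process exit code.
type Run = (command: string) => number;

interface Steps {
  writeTests: (item: FenceItem) => void;
  implement: (item: FenceItem) => void;
}

function executeInTddOrder(items: FenceItem[], run: Run, steps: Steps): void {
  for (const item of items) {
    if (item.done) continue;
    steps.writeTests(item);
    // RED: the new tests must fail before any implementation exists.
    if (run(item.verify) === 0) {
      throw new Error(`${item.id}: tests passed before implementation`);
    }
    steps.implement(item);
    // GREEN: only a zero exit code from verify allows marking the item done.
    if (run(item.verify) !== 0) {
      throw new Error(`${item.id}: verify failed after implementation`);
    }
    item.done = true;
  }
}
```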
|
|
54
73
|
|
|
55
74
|
When you discover the plan is wrong:
|
|
@@ -64,7 +83,7 @@ Before returning to PRIME (or declaring complete on a top-level invocation):
|
|
|
64
83
|
- `tsc_check` on each edited file is clean (it's capped and fast — run it).
|
|
65
84
|
- `git diff --stat` matches the plan's `## File-level changes`.
|
|
66
85
|
|
|
67
|
-
Do NOT run the full test suite or a full lint pass. PRIME's
|
|
86
|
+
Do NOT run the full test suite or a full lint pass. PRIME's Assess stage delegates that to `@spec-reviewer` / `@code-reviewer` / `@code-reviewer-thorough`, which will fail you if a full-suite regression slips through. Running the full suite here duplicates that work. Per-file tests during execution (section 3) are expected; a final full-suite run is not.
|
|
68
87
|
|
|
69
88
|
## 5. Return payload
|
|
70
89
|
|
|
@@ -76,13 +95,22 @@ Return control to your caller with a structured summary:
|
|
|
76
95
|
|
|
77
96
|
**(c) Plan mutations** — any cosmetic/numeric threshold bumps you absorbed silently, any scope expansions under the 2-file limit you absorbed. Be explicit: *"Updated plan §4 line-count threshold from 200 → 260 (file ended up 258 lines; self-imposed metric)"* is a good entry; silence is not.
|
|
78
97
|
|
|
79
|
-
**(d) Unusual conditions** —
|
|
98
|
+
**(d) Unusual conditions** — files touched outside `## File-level changes` with justification, any STOP condition you hit.
|
|
99
|
+
|
|
100
|
+
**(e) Guidance deviations** — when PRIME's Execute-prompt guidance contains instructions that you interpreted in a way that could plausibly be read differently (the plan permitted multiple readings; the Execute prompt and the plan pointed in subtly different directions; two items in the Execute prompt were in tension and you picked one), surface the decision explicitly. Example entry: *"Execute prompt item #12 said 'extract common content to skill'; I read this as 'remove from agent prompts and put only in skill' and extracted fully; alternate reading was 'duplicate in skill while keeping inline as enforced default.' Chose full extraction because DRY and the rules also live in prime.md hard rules."* Silence is not acceptable — same bar as item (c). A PRIME that can't see the decision-point after the fact has no way to tell a defensible judgment from a silent disobedience.
|
|
101
|
+
|
|
102
|
+
**Return status.** Use one of these four statuses in your return:
|
|
103
|
+
|
|
104
|
+
- **DONE** — all acceptance criteria met, no concerns.
|
|
105
|
+
- **DONE_WITH_CONCERNS** — all acceptance criteria met, but you noticed issues worth PRIME's attention (e.g., a pattern inconsistency you worked around, a non-blocking lint warning, a TODO you left in place per the plan's `## Out of scope`). List concerns explicitly.
|
|
106
|
+
- **NEEDS_CONTEXT** — you hit ambiguity that requires user input before you can proceed. Describe what's needed.
|
|
107
|
+
- **BLOCKED** — a hard blocker prevents completion (missing dependency, conflicting plan, broken environment). Describe the blocker.
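
The four statuses above map naturally onto a closed union. A minimal sketch: the status names come from the prompt, while the `BuildReturn` field names and the `validateReturn` helper are hypothetical and only illustrate how a caller could check that a return is internally consistent.

```typescript
// The four status names come from the prompt; the field names are assumptions.
type ReturnStatus = "DONE" | "DONE_WITH_CONCERNS" | "NEEDS_CONTEXT" | "BLOCKED";

interface BuildReturn {
  status: ReturnStatus;
  concerns: string[]; // expected to be non-empty only for DONE_WITH_CONCERNS
  detail?: string;    // what is needed (NEEDS_CONTEXT) or what blocks (BLOCKED)
}

// Hypothetical consistency check; returns a list of problems, empty when the return is coherent.
function validateReturn(r: BuildReturn): string[] {
  const problems: string[] = [];
  if (r.status === "DONE" && r.concerns.length > 0) {
    problems.push("DONE must not carry concerns; use DONE_WITH_CONCERNS");
  }
  if (r.status === "DONE_WITH_CONCERNS" && r.concerns.length === 0) {
    problems.push("DONE_WITH_CONCERNS must list at least one concern");
  }
  const needsDetail = r.status === "NEEDS_CONTEXT" || r.status === "BLOCKED";
  if (needsDetail && !(r.detail && r.detail.trim().length > 0)) {
    problems.push(`${r.status} must describe what is needed or what blocks`);
  }
  return problems;
}
```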
|
|
80
108
|
|
|
81
109
|
**STOP payloads.** If you hit a blocker instead of completing, make the STOP clearly labeled in your return so PRIME recognizes it as a blocker rather than a completion. Format:
|
|
82
110
|
|
|
83
111
|
> STOP: <one-sentence blocker>. <Which of the three classes this falls under: cosmetic-numeric / approach-design / scope-expansion-over-2-files>. <What PRIME needs to resolve to re-dispatch>.
|
|
84
112
|
|
|
85
|
-
PRIME owns QA dispatch. Do NOT delegate to `@
|
|
113
|
+
PRIME owns QA dispatch. Do NOT delegate to `@spec-reviewer`, `@code-reviewer`, or `@code-reviewer-thorough` yourself when invoked as a subagent — PRIME's Assess stage applies a fast-vs-thorough heuristic based on diff size + risk that you don't have full context for. When invoked top-level (`@build <plan-path>`), you may delegate to `@spec-reviewer` directly as the session's final step.
|
|
86
114
|
|
|
87
115
|
# Hard rules
|
|
88
116
|
|
|
@@ -91,3 +119,5 @@ PRIME owns QA dispatch. Do NOT delegate to `@qa-reviewer` or `@qa-thorough` your
|
|
|
91
119
|
- **Never use `--no-verify` or `--no-gpg-sign`** to bypass pre-commit hooks. If a hook blocks you, fix the root cause (resolve TODOs, repair lint/type errors). If the hook seems genuinely wrong, STOP and ask the user.
|
|
92
120
|
- Plan file mutations: mark `[x]` freely as items complete. For **cosmetic / self-imposed numeric thresholds** (line-count budgets, row caps, arbitrary `< N` limits the planner set on itself), update the threshold silently and note it in your commit message — do NOT stop. For **approach / design changes** (the interface doesn't exist, the test strategy won't work, a whole section needs restructuring), stop and use the `question` tool. For **scope expansion** (an extra file or two needed to finish the item), add to `## File-level changes` and keep going; only ask if the expansion is > ~2 files or shifts the `## Goal`.
|
|
93
121
|
- The user's goals are fixed; your own metrics are revisable. If you find yourself working around the plan's *approach*, that's a design-change signal — stop and ask. If you're just bumping a threshold you set on yourself, keep moving.
|
|
122
|
+
|
|
123
|
+
{UI_EVALUATION_LADDER}
|
|
@@ -37,12 +37,17 @@ Before starting, note: file count, which acceptance criteria you will verify, an
|
|
|
37
37
|
|
|
38
38
|
## 3. Execute task by task
|
|
39
39
|
|
|
40
|
+
**Fenced plans — TDD order.** If the plan's `## Acceptance criteria` contains a ```plan-state fence, work item-by-item in TDD order: for each acceptance item, write the test(s) named in its `tests:` field FIRST (they must fail initially), then implement the change that makes them pass, then confirm by running the item's `verify:` command. Only mark the fence item `- [x]` after the verify command exits 0.
|
|
41
|
+
|
|
40
42
|
For each item in `## File-level changes`:
|
|
41
43
|
1. Make the change.
|
|
42
|
-
2. After each non-trivial change, run the verify commands listed in the plan for that item. If they fail, fix and re-run.
|
|
44
|
+
2. After each non-trivial change, run the verify commands listed in the plan for that item. If they fail, run the root-cause diagnosis protocol below, fix, and re-run.
|
|
43
45
|
3. If a test fails, fix it before moving on.
|
|
44
46
|
4. Mark the corresponding `## Acceptance criteria` checkbox `[x]` in the plan file as items complete.
|
|
45
47
|
|
|
48
|
+
**When any test/lint/typecheck fails unexpectedly, load the `root-cause-diagnosis` skill via the Skill tool and follow its protocol.**
|
|
49
|
+
The skill covers merge-base reproduction, `git blame` evidence, a scope check, a rationalization table, and the TDD-RED exception.
|
|
50
|
+
|
|
46
51
|
**Verify commands.** Run the verify commands listed in the plan. If they pass, the item is done. If they fail, read the output, fix the code, and re-run. Do not mark an item `[x]` until the verify command exits 0.
|
|
47
52
|
|
|
48
53
|
When you discover the plan is wrong:
|
|
@@ -59,7 +64,7 @@ Before returning:
|
|
|
59
64
|
- `tsc_check` on each edited file is clean.
|
|
60
65
|
- `git diff --stat` matches the plan's `## File-level changes`.
|
|
61
66
|
|
|
62
|
-
Do NOT run the full test suite. PRIME's
|
|
67
|
+
Do NOT run the full test suite. PRIME's Assess stage delegates that to `@spec-reviewer` / `@code-reviewer` / `@code-reviewer-thorough`.
|
|
63
68
|
|
|
64
69
|
## 5. Return payload
|
|
65
70
|
|
|
@@ -71,13 +76,22 @@ Return control to your caller with a structured summary:
|
|
|
71
76
|
|
|
72
77
|
**(c) Plan mutations** — any changes you made to the plan file itself (threshold bumps, etc.).
|
|
73
78
|
|
|
74
|
-
**(d) Unusual conditions** —
|
|
79
|
+
**(d) Unusual conditions** — files touched outside `## File-level changes` with justification, any STOP condition.
|
|
80
|
+
|
|
81
|
+
**(e) Guidance deviations** — when PRIME's Execute-prompt guidance contains instructions that you interpreted in a way that could plausibly be read differently (the plan permitted multiple readings; the Execute prompt and the plan pointed in subtly different directions; two items in the Execute prompt were in tension and you picked one), surface the decision explicitly. Example entry: *"Execute prompt item #12 said 'extract common content to skill'; I read this as 'remove from agent prompts' and extracted fully; alternate reading was 'duplicate in skill while keeping inline.' Chose full extraction because DRY."* Silence is not acceptable — same bar as item (c).
|
|
82
|
+
|
|
83
|
+
**Return status.** Use one of these four statuses:
|
|
84
|
+
|
|
85
|
+
- **DONE** — all acceptance criteria met, no concerns.
|
|
86
|
+
- **DONE_WITH_CONCERNS** — all acceptance criteria met, but you noticed issues worth PRIME's attention. List concerns explicitly.
|
|
87
|
+
- **NEEDS_CONTEXT** — ambiguity requires user input before you can proceed.
|
|
88
|
+
- **BLOCKED** — a hard blocker prevents completion.
|
|
75
89
|
|
|
76
90
|
**STOP payloads.** If you hit a blocker, label it clearly:
|
|
77
91
|
|
|
78
92
|
> STOP: <one-sentence blocker>. <What needs to be resolved to re-dispatch>.
|
|
79
93
|
|
|
80
|
-
PRIME owns
|
|
94
|
+
PRIME owns Assess dispatch. Do NOT delegate to `@spec-reviewer`, `@code-reviewer`, or `@code-reviewer-thorough` yourself when invoked as a subagent.
|
|
81
95
|
|
|
82
96
|
# Hard rules
|
|
83
97
|
|