@glrs-dev/harness-plugin-opencode (npm) 2.1.0 → 2.3.0

Files changed (57)

package/CHANGELOG.md +133 -0
package/README.md +42 -106
package/SECURITY.md +1 -1
package/dist/agents/prompts/build.md +34 -4
package/dist/agents/prompts/build.open.md +18 -4
package/dist/agents/prompts/code-reviewer-thorough.md +77 -0
package/dist/agents/prompts/code-reviewer.md +80 -0
package/dist/agents/prompts/code-reviewer.open.md +68 -0
package/dist/agents/prompts/debriefer.md +55 -0
package/dist/agents/prompts/gap-analyzer.md +2 -0
package/dist/agents/prompts/plan-reviewer.md +5 -1
package/dist/agents/prompts/plan.md +119 -10
package/dist/agents/prompts/prime.md +149 -88
package/dist/agents/prompts/research-auto.md +1 -1
package/dist/agents/prompts/research-local.md +1 -1
package/dist/agents/prompts/research-web.md +1 -1
package/dist/agents/prompts/research.md +2 -0
package/dist/agents/prompts/scoper.md +129 -0
package/dist/agents/prompts/spec-reviewer.md +53 -0
package/dist/agents/prompts/spec-reviewer.open.md +56 -0
package/dist/agents/shared/index.ts +1 -0
package/dist/agents/shared/ui-evaluation-ladder.md +50 -0
package/dist/agents/shared/workflow-mechanics.md +5 -5
package/dist/autopilot/prompt-template.md +104 -0
package/dist/chunk-GCWHRUOK.js +259 -0
package/dist/chunk-MJSMBY2Y.js +87 -0
package/dist/chunk-NIFAVPNN.js +544 -0
package/dist/{chunk-VJUETC6A.js → chunk-PDMXYZM4.js} +53 -1
package/dist/cli.js +1596 -1964
package/dist/commands/prompts/fresh.md +27 -24
package/dist/commands/prompts/review.md +3 -3
package/dist/commands/prompts/ship.md +2 -0
package/dist/index.js +188 -633
package/dist/loop-session-J35NILUZ.js +30 -0
package/dist/opencode-server-KPCDFYAX.js +22 -0
package/dist/plan-parser-TMHEKT22.js +6 -0
package/dist/plan-session-7VS32P52.js +117 -0
package/dist/scoper-S77SOK7X.js +326 -0
package/dist/skills/adversarial-review-rubric/SKILL.md +47 -0
package/dist/skills/code-quality/SKILL.md +1 -1
package/dist/skills/root-cause-diagnosis/SKILL.md +24 -0
package/dist/skills/spear-protocol/SKILL.md +167 -0
package/package.json +3 -1
package/dist/agents/prompts/pilot-assessor.md +0 -77
package/dist/agents/prompts/pilot-builder.md +0 -40
package/dist/agents/prompts/pilot-planner.md +0 -56
package/dist/agents/prompts/pilot-scoper.md +0 -58
package/dist/agents/prompts/qa-reviewer.md +0 -68
package/dist/agents/prompts/qa-reviewer.open.md +0 -58
package/dist/agents/prompts/qa-thorough.md +0 -63
package/dist/bin/plan-check.sh +0 -255
package/dist/chunk-6CZPRUMJ.js +0 -869
package/dist/chunk-DZG4D3OH.js +0 -54
package/dist/chunk-OYRKOEXK.js +0 -88
package/dist/commands/prompts/autopilot.md +0 -96
package/dist/install-6775ZBDG.js +0 -13
package/dist/paths-WZ23ZQOV.js +0 -18

package/CHANGELOG.md

@@ -1,5 +1,138 @@
 # Changelog
+## 2.3.0
+### Minor Changes
+- [#71](https://github.com/iceglober/glrs/pull/71) [`94704ad`](https://github.com/iceglober/glrs/commit/94704adf36b5ea36fde4557cfd7b1d8494d0e68b) Thanks [@iceglober](https://github.com/iceglober)! - Add `@debriefer` agent and post-run debrief to the autopilot CLI
+  After the Ralph loop exits (any exit reason — sentinel, struggle, timeout, max-iterations, kill-switch, stall, or error), the CLI now optionally spawns a `@debriefer` agent session that produces a structured five-section summary:
+  1. **What was accomplished** — files changed, commits made, PRs opened
+  2. **What wasn't finished** — unchecked plan items
+  3. **Cost summary** — total USD, iterations completed, exit reason
+  4. **What to do next** — actionable suggestions based on exit reason
+  5. **Session artifacts** — log file path, plan file path, session ID
+  The debrief runs by default. Skip it with `--no-debrief` on the CLI or by setting `GLRS_AUTOPILOT_DEBRIEF=off` in the environment.
+  The `@debriefer` agent is mid-tier (Sonnet-class), read-only (no file edits, bash limited to git read commands), and never throws — if the debrief session fails, a warning is printed and the CLI exits normally based on the loop result.
+- [#68](https://github.com/iceglober/glrs/pull/68) [`a5bbbba`](https://github.com/iceglober/glrs/commit/a5bbbba3819b2ba8b08bd8baed8af69670895ca9) Thanks [@iceglober](https://github.com/iceglober)! - Autopilot rewrite, pilot rip-out, Tier 1 visual capabilities, opencode-snip toggle, research-variant hiding.
+  **Breaking changes:**
+  - **Pilot subsystem removed.** The `glrs oc pilot` CLI subcommand, the four pilot agents (`pilot-scoper` / `planner` / `builder` / `assessor`), the pilot-planning skill references, the `pilot-plugin.ts` runtime enforcer, and all pilot state/docs are gone. Users on pilot should migrate to the CLI autopilot or plain PRIME workflow.
+  - **TUI `/autopilot` slash command removed.** Autopilot is now CLI-only: `glrs oc autopilot "<prompt>"`. Users who want autonomous looping run the CLI in any terminal; the TUI stays for interactive work.
+  - **Research-variant agents (`research-web`, `research-local`, `research-auto`) hidden from the primary-agent picker.** They now run only as subagents dispatched by `@research`. Users who previously selected them directly should select `@research` instead.
+  **New features:**
+  - **CLI autopilot (`glrs oc autopilot "<prompt>"`)** — Ralph-loop engine: sends your prompt each iteration, watches the agent's response for `<autopilot-done>` sentinel, retries the same prompt when absent. Budgets: 50 iterations / 4h / 3 zero-progress iterations / kill-switch file. Supports single-issue (`"ship ENG-1234"`) and multi-issue (`"ship every open ENG-* issue in project ROADMAP"`) prompts.
+  - **opencode-snip installer toggle** — new "Plugin add-ons" section in `glrs oc install` (parallel to existing MCP toggles). Opt-in adds `opencode-snip` to the user's `plugin` array via config-merge, no vendored code. Useful for token reduction on bash-heavy sessions. Requires the Go `snip` binary separately.
+  - **Tier 1 visual capabilities** — `@plan`, `@research`, `@gap-analyzer` now have Playwright MCP access (joining `@prime`, `@build`, `@assessor`, `@assessor-thorough`, `@plan-reviewer`). Enable via the installer's Playwright toggle.
+  - **UI evaluation ladder (graceful degradation)** — all visual-capable agents now carry a four-tier capability ladder (Playwright → curl → webfetch → source inspection). When Playwright is unavailable, agents fall through to the next tier and report which method they used. No hard failure on Playwright absence.
+  **Internal:**
+  - Server lifecycle helpers (`startServer` / `createSession` / `sendAndWait` / `getLastAssistantMessage`) moved from `src/pilot/server.ts` to `src/lib/opencode-server.ts` (consumed by the CLI autopilot).
+  - Agent roster reduced from 20 → 16. Net −5,308 lines across 91 files. Test count 536 → 462 (pilot tests removed, visual-capability tests added).
+- [#68](https://github.com/iceglober/glrs/pull/68) [`a5bbbba`](https://github.com/iceglober/glrs/commit/a5bbbba3819b2ba8b08bd8baed8af69670895ca9) Thanks [@iceglober](https://github.com/iceglober)! - Add `glrs oc loop` as the canonical name for the Ralph-loop CLI runner (previously `glrs oc autopilot`). `autopilot` continues to work as an alias during this release cycle — no user scripts break.
+  A future release will diverge the two: `loop` stays as the raw-prompt Ralph-loop runner, and `autopilot` becomes an interactive scoping walkthrough that generates a structured multi-file plan and then invokes `loop` against it. This change (PR 2 of 3) lays the CLI plumbing for that split; PR 3 ships the interactive walkthrough and the structured plan format.
+  No behavior change in this release — both `glrs oc loop "<prompt>"` and `glrs oc autopilot "<prompt>"` do exactly what `autopilot` did before.
+- [#65](https://github.com/iceglober/glrs/pull/65) [`4e20574`](https://github.com/iceglober/glrs/commit/4e205745f9d8c46180d99b3237fc038a62cf94f1) Thanks [@iceglober](https://github.com/iceglober)! - Remove the broken `plan-dir` and `plan-check` CLI subcommands and fix `@plan`'s write permission
+  The `bunx @glrs-dev/harness-plugin-opencode plan-dir` and `plan-check` subcommands had been dead since the standalone-invocation redirect guard was introduced in April 2026 — they exit 1 with a deprecation banner and produce no stdout when an agent invokes them via `bunx`. Every caller silently fell through, so this surface was not load-bearing. This release rips both subcommands (and the bundled `plan-check.sh` script) out of the CLI. Agents that previously resolved the plan directory via `plan-dir` now use a four-line inline bash snippet that composes `git rev-parse --git-common-dir`, `dirname`, `basename`, and `mkdir -p` to compute `~/.glorious/opencode/<repo-folder>/plans/` directly (honoring `$GLORIOUS_PLAN_DIR` as an override base). The `plan-paths.ts` library module and its `getRepoFolder`, `getPlanDir`, `migratePlans` exports remain — they were never the broken piece.
+  Companion fix: `@plan`'s permission block was missing `write: "allow"`, which prevented the agent from ever creating a plan file even when `plan-dir` was conceptually working. The permission now grants `write: "allow"` plus a four-entry bash allow-list covering only the commands the inline snippet needs. The "plan writes only plan files" invariant is preserved at the prompt layer (hard-rules section).
+  If you were calling `bunx @glrs-dev/harness-plugin-opencode plan-dir` or `plan-check` directly in a script, switch to either (a) the inline bash snippet above or (b) importing `getPlanDir` / `migratePlans` from the library if you're writing TypeScript.
+- [#68](https://github.com/iceglober/glrs/pull/68) [`a5bbbba`](https://github.com/iceglober/glrs/commit/a5bbbba3819b2ba8b08bd8baed8af69670895ca9) Thanks [@iceglober](https://github.com/iceglober)! - Add multi-file structured plan schema, @scoper agent for interactive scoping, and plan-aware progress reporting in the autopilot plugin.
+  - New `@scoper` primary agent for first-principles alignment before planning
+  - Multi-file plan schema: `plans/<slug>/main.md` + `phase_N.md` files for complex features
+  - `plan-parser` module: parses both single-file and multi-file plans, returns structured progress data
+  - Plan-aware heartbeat: status messages include phase progress for multi-file plans
+  - `glrs oc autopilot` is now its own interactive subcommand (diverged from `loop`)
+  - `@plan` agent updated with multi-file decision heuristic
+  - `@build` agent updated with multi-file plan navigation instructions
+  - `@plan-reviewer` agent updated with multi-file consistency validation
+## 2.2.0
+### Minor Changes
+- [#58](https://github.com/iceglober/glrs/pull/58) [`2720440`](https://github.com/iceglober/glrs/commit/2720440e76ed76f95a59b77525cb140bd673d669) Thanks [@iceglober](https://github.com/iceglober)! - Autopilot rewrite, pilot rip-out, Tier 1 visual capabilities, opencode-snip toggle, research-variant hiding.
+  **Breaking changes:**
+  - **Pilot subsystem removed.** The `glrs oc pilot` CLI subcommand, the four pilot agents (`pilot-scoper` / `planner` / `builder` / `assessor`), the pilot-planning skill references, the `pilot-plugin.ts` runtime enforcer, and all pilot state/docs are gone. Users on pilot should migrate to the CLI autopilot or plain PRIME workflow.
+  - **TUI `/autopilot` slash command removed.** Autopilot is now CLI-only: `glrs oc autopilot "<prompt>"`. Users who want autonomous looping run the CLI in any terminal; the TUI stays for interactive work.
+  - **Research-variant agents (`research-web`, `research-local`, `research-auto`) hidden from the primary-agent picker.** They now run only as subagents dispatched by `@research`. Users who previously selected them directly should select `@research` instead.
+  **New features:**
+  - **CLI autopilot (`glrs oc autopilot "<prompt>"`)** — Ralph-loop engine: sends your prompt each iteration, watches the agent's response for `<autopilot-done>` sentinel, retries the same prompt when absent. Budgets: 50 iterations / 4h / 3 zero-progress iterations / kill-switch file. Supports single-issue (`"ship ENG-1234"`) and multi-issue (`"ship every open ENG-* issue in project ROADMAP"`) prompts.
+  - **opencode-snip installer toggle** — new "Plugin add-ons" section in `glrs oc install` (parallel to existing MCP toggles). Opt-in adds `opencode-snip` to the user's `plugin` array via config-merge, no vendored code. Useful for token reduction on bash-heavy sessions. Requires the Go `snip` binary separately.
+  - **Tier 1 visual capabilities** — `@plan`, `@research`, `@gap-analyzer` now have Playwright MCP access (joining `@prime`, `@build`, `@assessor`, `@assessor-thorough`, `@plan-reviewer`). Enable via the installer's Playwright toggle.
+  - **UI evaluation ladder (graceful degradation)** — all visual-capable agents now carry a four-tier capability ladder (Playwright → curl → webfetch → source inspection). When Playwright is unavailable, agents fall through to the next tier and report which method they used. No hard failure on Playwright absence.
+  **Internal:**
+  - Server lifecycle helpers (`startServer` / `createSession` / `sendAndWait` / `getLastAssistantMessage`) moved from `src/pilot/server.ts` to `src/lib/opencode-server.ts` (consumed by the CLI autopilot).
+  - Agent roster reduced from 20 → 16. Net −5,308 lines across 91 files. Test count 536 → 462 (pilot tests removed, visual-capability tests added).
+- [#55](https://github.com/iceglober/glrs/pull/55) [`8099c49`](https://github.com/iceglober/glrs/commit/8099c498fa6a9c05c8880bfd09cb2c4fd7d1721c) Thanks [@iceglober](https://github.com/iceglober)! - Rename PRIME arc phases to SPEAR model (Scope → Plan → Execute → Assess → Resolve). Rename @qa-reviewer → @assessor, @qa-thorough → @assessor-thorough. Resolve stage auto-ships (pushes branch, opens PR) — /ship becomes a resume path for interrupted sessions.
+- [#57](https://github.com/iceglober/glrs/pull/57) [`6212c48`](https://github.com/iceglober/glrs/commit/6212c483efa2cc8f0407bc6a0d8c23110498eb21) Thanks [@iceglober](https://github.com/iceglober)! - Restructure the SPEAR protocol (PRIME's five-stage arc) across four areas: Assess quality, failure discipline, skill modularity, and agent-contract hygiene.
+  **Breaking changes** (match the prior `@assessor` rename's hard-break pattern):
+  - `@assessor` is replaced by `@spec-reviewer` (first pass, returns `[PASS_SPEC]` or `[FAIL_SPEC]`) and `@code-reviewer` (second pass, runs only on PASS_SPEC, returns `[PASS]` / `[LOOP-TO-PLAN]` / `[FIX-INLINE]`). User configs referencing `@assessor` by name will fail to resolve — update to the appropriate replacement.
+  - `@assessor-thorough` is renamed to `@code-reviewer-thorough` (same role: opus-tier backstop for high-risk diffs that re-runs the full suite unconditionally).
+  - Registered agent count: 20 → 21.
+  **Assess rigor (two-stage review + MECE rubric):**
+  - Every Assess cycle now dispatches two subagents sequentially instead of one, roughly doubling the subagent calls per review cycle. The spec pass is cheaper; the code-quality pass runs only if spec passed.
+  - Assess delegations carry a five-dimension MECE rubric (Correctness, Completeness, Consistency, Safety, Scope) and a progressive-strictness signal (Level 1/2/3) that tightens across Assess iterations.
+  - PRs with red CI (typecheck, lint, or tests failing) now fail Assess regardless of whether the failure appears pre-existing. "Pre-existing" claims require three-part evidence: a specific commit SHA, `git log` output showing the failure pre-dates the branch, and merge-base reproduction. Claims without all three are auto-rejected.
+  **Failure discipline (no-defer policy):**
+  - The hard rule that allowed logging pre-existing failures to a plan's `## Open questions` section and deferring them is removed.
+  - `@build` now runs a mandatory root-cause diagnosis protocol on any unexpected test/lint/typecheck failure: merge-base reproduction, `git blame`, rationalization table countering common excuse patterns ("likely pre-existing", "unrelated to my change", etc.).
+  - If fixing a failure would require touching more than ~5 files outside the plan's `## File-level changes`, `@build` STOPs with a reorganization proposal for PRIME to present to the user — there is no autonomous deferral path.
+  **TDD enforcement:**
+  - For any plan with a `## Test plan` entry or a `tests:` field in the acceptance-criteria fence, `@build` now enforces TDD order: write the test first, verify it fails, then implement. Tests in a just-written RED state are explicitly carved out of the failure-diagnosis protocol — they're expected failures, not unexpected ones.
+  **New bundled skills:**
+  - `spear-protocol` — the full SPEAR stage logic (Bootstrap, Scope, Plan, Execute, Assess, Resolve). Loaded by PRIME at session start. Inline fallback retained in `prime.md` in case skill-loading is unavailable.
+  - `root-cause-diagnosis` — the failure-diagnosis protocol + rationalization table. Loaded by `@build` and its strict-executor variant on unexpected failures.
+  - `adversarial-review-rubric` — the MECE rubric, progressive strictness levels, Red-CI-blocks-merge rule, and three-part evidence test. Loaded by all Assess-layer agents before reviewing.
+  **Agent-contract changes:**
+  - `@build` gains a four-status return protocol: DONE / DONE_WITH_CONCERNS / NEEDS_CONTEXT / BLOCKED.
+  - `@build` now reports guidance deviations (item (e) of its return payload) when PRIME's Execute-prompt guidance permits multiple readings and `@build` picked one. Same "silence is not acceptable" bar as plan-file mutations.
+  - PRIME runs a pre-dispatch consistency check before every `@build` dispatch: re-read the Execute prompt against the plan and against any already-drafted follow-up prompts. Contradictions caught pre-dispatch avoid the downstream blame-misattribution pattern where faithful agent execution gets narrated as deviation.
+  - `@plan` bans placeholder phrases (TBD, TODO, "implement later", etc.) and runs a self-review checklist (spec coverage, placeholder scan, type/name consistency) before handing to `@plan-reviewer`.
+  - `@build`'s prompt is trimmed of orchestration context per the Minimal Contract principle (subagents perform worse when carrying parent-level workflow philosophy).
+  **Other refinements:**
+  - PRIME's Scope grounding dispatches parallel `@code-searcher` calls in a single message when grounding touches 3+ independent subsystems.
+  - PRIME's Plan stage detects multi-subsystem requests (3+ independent subsystems with no shared interface) and asks whether to split into separate plans.
+  - Delegation prompts apply the Minimal Contract minimality test: remove any sentence that doesn't help the subagent produce a better result. Non-goals prefer positive-instruction form ("Only modify files listed above") over negative lists when the positive form is shorter.
 ## 2.1.0
 ## 2.0.1
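
Reader's note on the plan-directory change in #65: the changelog says agents now compute `~/.glorious/opencode/<repo-folder>/plans/` from `git rev-parse --git-common-dir`, `dirname`, and `basename`, with `$GLORIOUS_PLAN_DIR` as an override base. The actual inline bash snippet is not part of this diff, so the following is only a minimal TypeScript sketch of the same path arithmetic. The function names `repoFolderFromCommonDir` and `planDirFor` are hypothetical, and the assumption that the repo folder and `plans` are still appended under the override base is mine, not confirmed by the source.

```typescript
// POSIX path semantics are used explicitly so the result is the same on any host.
import { posix as path } from "node:path";

// Hypothetical helper: derive the repo folder name from the output of
// `git rev-parse --git-common-dir` (for example "/home/me/code/my-repo/.git").
function repoFolderFromCommonDir(gitCommonDir: string, cwd: string): string {
  // git may print a relative path such as ".git"; resolve it against cwd first.
  const absolute = path.resolve(cwd, gitCommonDir);
  // dirname drops the trailing ".git"; basename keeps only the folder name.
  return path.basename(path.dirname(absolute));
}

// Assumed layout: <base>/<repo-folder>/plans, where <base> is the value of
// $GLORIOUS_PLAN_DIR when set, otherwise ~/.glorious/opencode.
function planDirFor(
  gitCommonDir: string,
  cwd: string,
  home: string,
  overrideBase?: string,
): string {
  const base =
    overrideBase && overrideBase.length > 0
      ? overrideBase
      : path.join(home, ".glorious", "opencode");
  return path.join(base, repoFolderFromCommonDir(gitCommonDir, cwd), "plans");
}
```

Because the common git directory is shared by all worktrees of a repository, every worktree resolves to the same folder name under this scheme. Creating the directory (the `mkdir -p` step) is left to the caller.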

package/README.md

@@ -1,6 +1,6 @@
 # @glrs-dev/harness-plugin-opencode
-Opinionated agent harness for [OpenCode](https://opencode.ai). Agents, tools, slash commands, and an unattended pilot mode — one package.
+Opinionated agent harness for [OpenCode](https://opencode.ai). Agents, tools, slash commands, and an unattended autopilot loop — one package.
 ## Quick start
@@ -21,7 +21,7 @@ bunx @glrs-dev/harness-plugin-opencode install
 opencode
 ```
-No global install. All [plugin features](#what-the-plugin-provides) load automatically. You won't have the `glrs-oc` CLI, but pilot commands will offer to install the plugin if you add the CLI later.
+No global install. All [plugin features](#what-the-plugin-provides) load automatically. You won't have the `glrs-oc` CLI, but you can add it later.
 ### Verifying the published tarball
@@ -43,18 +43,21 @@ Open OpenCode in any repo. The `prime` agent handles everything end-to-end.
 ```
 /fresh ENG-1234
 ```
-Wipes the worktree, creates a branch from the ticket ref, and begins the five-phase workflow: understand → plan → execute → verify → handoff.
+Wipes the worktree, creates a branch from the ticket ref, and begins the SPEAR workflow: scope → plan → execute → assess → resolve.
 **Start a task from a description:**
 ```
 /fresh add rate limiting to the upload endpoint
 ```
-**Go hands-off after the plan looks good:**
+**Go hands-off with the Ralph loop (CLI, lights-out):**
 ```
-/autopilot ENG-1234
+glrs oc loop "ship ENG-1234"
 ```
-Runs the full workflow unattended. Stops when all acceptance criteria are checked off. You review, then `/ship`.
+Runs PRIME in a loop: sends your prompt each iteration, watches for `<autopilot-done>` in the response, exits when the sentinel appears or a budget is hit (50 iterations / 4h / 3 zero-progress iterations / kill-switch at `.agent/autopilot-disable`). Works with multi-issue prompts too: `glrs oc loop "ship every open issue in Linear project ENG-ROADMAP until the project is done"`. There is no TUI slash command — if you're in the TUI and don't want the loop, just type the task normally.
+`glrs oc autopilot` is an alias for `glrs oc loop` during the current release cycle. A future release will make `autopilot` an interactive scoping walkthrough that produces a structured plan and then invokes `loop` against it; `loop` will stay as the raw-prompt runner.
 **Ship when done:**
 ```
@@ -66,7 +69,7 @@ Squashes commits, pushes, opens a PR with the plan as the body.
 ```
 /review 87
 ```
-Read-only adversarial review. Fetches the diff, runs typecheck/lint, delegates to `@qa-reviewer`, outputs a structured verdict.
+Read-only adversarial review. Fetches the diff, runs typecheck/lint, delegates to `@assessor`, outputs a structured verdict.
 **Deep codebase research:**
 ```
@@ -74,41 +77,21 @@ Read-only adversarial review. Fetches the diff, runs typecheck/lint, delegates t
 ```
 Spawns parallel subagents, synthesizes findings with exact file:line references.
-### Autonomous (pilot CLI)
-For larger work that benefits from structured scoping and autonomous execution with self-assessment.
-```bash
-# Scope interactively — spawns OpenCode TUI with the pilot-scoper agent
-glrs-oc pilot scope "Refactor the billing module into separate services"
-# Execute autonomously — Plan → Execute → Assess → Resolve (SPEAR loop)
-glrs-oc pilot go
-# Configure models and verify commands for this repo
-glrs-oc pilot configure
-# Check workflow status
-glrs-oc pilot status
-```
-See [Pilot mode](#pilot-mode) for the full command reference.
 ---
 ## What the plugin provides
-14 agents, 7 slash commands, 5 tools, 5 MCPs, 5 skill bundles, 4 sub-plugins. Details below.
+16 agents, 7 slash commands, 5 tools, 5 MCPs, 11 skill bundles, 3 sub-plugins. Details below.
 ### Agents
 | Agent | Tier | Role |
 |-------|------|------|
-| `prime` | deep | Five-phase end-to-end workflow (default agent) |
+| `prime` | deep | SPEAR end-to-end workflow (default agent) |
 | `plan` | deep | Interactive planner with gap analysis and adversarial review |
 | `build` | mid | Plan executor |
-| `qa-reviewer` | mid | Fast adversarial code review |
-| `qa-thorough` | deep | Full-suite adversarial review |
+| `assessor` | mid | Fast adversarial code review |
+| `assessor-thorough` | deep | Full-suite adversarial review |
 | `plan-reviewer` | deep | Adversarial plan review |
 | `gap-analyzer` | deep | Identifies gaps in plans |
 | `architecture-advisor` | deep | Architecture guidance |
@@ -116,8 +99,8 @@ See [Pilot mode](#pilot-mode) for the full command reference.
 | `docs-maintainer` | mid | Documentation updates |
 | `lib-reader` | mid | Library/dependency reader |
 | `agents-md-writer` | mid | AGENTS.md generation |
-| `pilot-builder` | mid | Unattended task executor (pilot subsystem) |
-| `pilot-planner` | deep | Decomposes work into pilot.yaml DAGs |
+| `research` | deep | Multi-workstream research orchestrator |
+| `research-web` / `research-local` / `research-auto` | deep | Research subagents (dispatched by `@research`) |
 Tiers: **deep** = opus-class, **mid** = sonnet-class, **fast** = haiku-class. Override with [`harness.models`](#model-overrides).
@@ -126,13 +109,14 @@ Tiers: **deep** = opus-class, **mid** = sonnet-class, **fast** = haiku-class. Ov
 | Command | What it does |
 |---------|-------------|
 | `/fresh <ref>` | Wipe worktree, branch from ticket or description, start PRIME |
-| `/autopilot <ref>` | Hands-off PRIME run; stops when acceptance criteria pass |
 | `/ship <plan>` | Squash, push, open PR |
 | `/review <target>` | Read-only adversarial review (PR#, SHA, branch, or file) |
 | `/research <topic>` | Parallel codebase exploration with file:line citations |
 | `/init-deep` | Generate hierarchical AGENTS.md files |
 | `/costs` | Show running LLM spend totals |
+Autopilot is CLI-only: `glrs oc loop "<prompt>"` (or the `glrs oc autopilot` alias during the current release cycle — see above).
 ### Tools
 `ast_grep` · `tsc_check` · `eslint_check` · `todo_scan` · `comment_check`
@@ -149,94 +133,48 @@ Tiers: **deep** = opus-class, **mid** = sonnet-class, **fast** = haiku-class. Ov
 ### Sub-plugins
-- **autopilot** — idle-nudge loop driver (only activates via `/autopilot`)
 - **notify** — OS notifications when the agent asks a question
 - **cost-tracker** — LLM spend by provider/model at `~/.glorious/opencode/costs.json`
-- **pilot-plugin** — runtime invariant enforcement for pilot agents
+- **tool-hooks** — post-edit verification loop (tsc, eslint) + output backpressure
 ### Skills
-`review-plan` · `web-design-guidelines` · `vercel-react-best-practices` · `vercel-composition-patterns` · `pilot-planning`
+`adr` · `agent-estimation` · `code-quality` · `research` · `research-auto` · `research-local` · `research-web` · `review-plan` · `vercel-composition-patterns` · `vercel-react-best-practices` · `web-design-guidelines`
 ---
-## Pilot mode
-Autonomous code execution using the SPEAR loop (Scope → Plan → Execute → Assess → Resolve). The user scopes interactively, then `pilot go` runs the rest autonomously with self-assessment and deployment-risk reflection.
-**Prerequisites:** `git` >= 2.5, `opencode` on PATH. Plugin must be installed (auto-prompted if missing).
+## Enabling visual UI capabilities
-### Commands
+The `@plan`, `@research`, `@gap-analyzer`, `@prime`, `@build`, `@assessor`, `@assessor-thorough`, and `@plan-reviewer` agents can verify web UIs, rendered output, and visual components when Playwright is available.
-| Command | Description |
-|---------|-------------|
-| `glrs-oc pilot scope "<goal>"` | Interactive scoping session. Produces `scope.json` with framing + acceptance criteria. |
-| `glrs-oc pilot go` | Autonomous execution. Reads scope, runs Plan → Execute → Assess → Resolve. |
-| `glrs-oc pilot configure` | Interactive per-phase model selection, verify commands, assess cycles, Playwright toggle. |
-| `glrs-oc pilot status` | Workflow status from SQLite. `--workflow <id>`, `--json`. |
-### SPEAR loop
-1. **Scope** (interactive) — scoper agent interviews you, explores the codebase, produces acceptance criteria.
-2. **Plan** (autonomous) — planner agent decomposes ACs into an ordered task list.
-3. **Execute** (autonomous) — builder agent runs one task at a time, commits on verify pass.
-4. **Assess** (autonomous) — assessor evaluates ACs + asks deployment-risk questions (what could break? unexpected consequences? what could go wrong?). If fail → re-plan the gap → re-execute → re-assess (bounded by `max_assess_cycles`).
-5. **Resolve** (autonomous) — final summary with acknowledged risks.
-### State storage
-```
-~/.glorious/opencode/<repo>/pilot/
-  state.sqlite              # workflows + events
-  current-scope.json        # pointer to active scope
-  scopes/<workflowId>/
-    scope.json              # framing + acceptance criteria
-    plan.json               # task list
-    assessment-cycle-N.json # assessment reports
-```
-Repo identity derived from `git rev-parse --git-common-dir` — worktrees of the same repo share state. Override with `$GLORIOUS_PILOT_DIR`.
+### Enable Playwright MCP
-### Configuration
-Config lives at `.glrs/pilot.json` in your repo (not per-plan YAML):
+During `glrs-oc install-plugin`, select **Playwright — browser automation + visual UI verification (requires Chromium)** in the MCP toggle list. Or enable it manually in `opencode.json`:
 ```json
 {
-  "models": {
-    "scope": "anthropic/claude-sonnet-4-6",
-    "plan": "anthropic/claude-sonnet-4-6",
-    "execute": "anthropic/claude-sonnet-4-6",
-    "assess": "anthropic/claude-sonnet-4-6"
-  },
-  "verify": {
-    "baseline": ["bun test", "bun run typecheck"],
-    "after_each": ["bun run typecheck"]
-  },
-  "max_assess_cycles": 3,
-  "playwright": { "enabled": false, "base_url": "http://localhost:3000" }
+  "mcp": {
+    "playwright": { "enabled": true }
+  }
 }
 ```
-Run `glrs-oc pilot configure` for interactive setup with searchable model selection.
+Then install Chromium:
-### Migrating from pilot v1
+```bash
+npx playwright install chromium
+```
-If you used `pilot build` / `pilot.yaml` previously:
+### Graceful degradation
-| v1 command | v2 equivalent |
-|---|---|
-| `pilot plan` | `pilot scope "<goal>"` |
-| `pilot build` | `pilot go` |
-| `pilot validate` | `pilot configure` (config validation) |
-| `pilot status` | `pilot status` (same name, different output) |
-| `pilot logs` | `pilot status --json` |
-| `pilot cost` | `pilot status --json` |
-| `pilot build-resume` | `pilot go` (re-reads scope, restarts from Plan) |
+Agents automatically fall back when Playwright is unavailable:
-Old `.glrs/pilot.json` (v1 format with `baseline`/`after_each` at top level) is detected and a migration banner is shown. Run `pilot configure` to set up the new format.
+1. **Tier A (Playwright)** — navigate, screenshot, evaluate DOM. Best signal.
+2. **Tier B (curl)** — parse returned HTML for structure and reachability.
+3. **Tier C (webfetch)** — built-in tool for public URLs.
+4. **Tier D (source inspection)** — read component files and reason about rendering. Agent flags "visual verification skipped" in its final message.
-Old state DBs under `~/.glorious/opencode/<repo>/pilot/` are orphaned — they won't be read or migrated. You can safely delete them.
+No configuration required — agents detect capability absence from MCP errors and fall through automatically.
 ---
@@ -293,9 +231,7 @@ Your opencode.json values win. Example:
 | `glrs-oc install-plugin [--pin] [--dry-run]` | Register plugin in opencode.json |
 | `glrs-oc uninstall [--dry-run]` | Remove plugin from opencode.json |
 | `glrs-oc doctor` | Check installation health |
-| `glrs-oc pilot <verb>` | [Pilot mode](#pilot-mode) |
-| `glrs-oc plan-dir` | Print repo-shared plan directory |
-| `glrs-oc plan-check <path>` | Validate legacy markdown plan files |
+| `glrs-oc loop "<prompt>"` | Run PRIME in a Ralph loop (lights-out). `autopilot` is an alias during the current release cycle. |
 `install` is an alias for `install-plugin`.
@@ -324,7 +260,7 @@ bun remove -g @glrs-dev/harness-plugin-opencode    # remove CLI
 - `bun`
 - `uvx` for serena + git MCPs (`brew install uv`)
 - `node`/`npx` for memory MCP
-- `git` >= 2.5 for pilot worktrees
+- `git` for version control operations
 ## Security & threat boundaries
@@ -334,8 +270,8 @@ Report vulnerabilities privately per [`SECURITY.md`](./SECURITY.md) — do NOT o
 This is a plugin with broad local-machine access. Install it deliberately:
-- **Reads and writes files** under your home directory (`~/.config/opencode/opencode.json`, `~/.cache/harness-opencode/*`, `~/.config/harness-opencode/install-id`, `~/.glorious/opencode/<repo>/pilot/*`).
-- **Runs local subprocesses** during normal operation: `git`, `gh`, `npm`/`bun`, `ast-grep`, `tsc`, `opencode`, and project-specific verify commands from any `pilot.yaml` you author.
+- **Reads and writes files** under your home directory (`~/.config/opencode/opencode.json`, `~/.cache/harness-opencode/*`, `~/.config/harness-opencode/install-id`, `~/.glorious/opencode/<repo>/*`).
+- **Runs local subprocesses** during normal operation: `git`, `gh`, `npm`/`bun`, `ast-grep`, `tsc`, `opencode`, and project-specific verify commands.
 - **Makes outbound HTTPS calls** (all opt-out-able):
   - `registry.npmjs.org` — daily version check. Opt out: `HARNESS_OPENCODE_UPDATE_CHECK=0`.
   - `catwalk.charm.land` — model catalog during interactive install only. Response is schema-validated before it reaches your `opencode.json`.

package/SECURITY.md CHANGED

@@ -44,7 +44,7 @@ If a vulnerability is confirmed and fixed, we will publish a GitHub security adv
 **In scope:**
 - The published npm tarball (`@glrs-dev/harness-plugin-opencode`).
-- CLI subcommands (`glrs-oc`, `harness-opencode`): `install`, `uninstall`, `doctor`, `plan-dir`, `plan-check`, `pilot`.
+- CLI subcommands (`glrs-oc`, `harness-opencode`): `install`, `uninstall`, `doctor`, `pilot`.
 - Plugin hooks registered via the OpenCode plugin API (`config`, `tool.execute.before/after`, `session.idle`, etc.).
 - The MCP config writer (`src/cli/install.ts`, `src/mcp/index.ts`) and the `opencode.json` merge logic (`src/cli/merge-config.ts`).
 - Outbound network calls the plugin makes on its own:

package/dist/agents/prompts/build.md CHANGED

@@ -34,6 +34,22 @@ If ANY of these are missing, STOP and report to the user:
 Do NOT attempt to "fill in" missing structure on behalf of the plan. The plan is the spec; if the spec is wrong, fix it explicitly — don't improvise.
+## 1.5 Multi-file plan handling
+If the plan path is a directory (contains `main.md`), it is a multi-file plan. Handle it as follows:
+1. Read `main.md`'s `## Phases` checklist.
+2. Find the first unchecked phase (`- [ ] phase_N.md — ...`).
+3. Open the corresponding `phase_N.md` as the working plan for this iteration.
+4. Execute its items per the normal workflow (sections 2–4 below).
+5. After completing all items in the phase file, re-read it and verify all ACs are `[x]`.
+6. Update `main.md`'s corresponding phase checkbox to `[x]`.
+7. Proceed to the next unchecked phase.
+Cross-cutting ACs in `main.md` (under `## Cross-cutting acceptance criteria`) are verified independently via their own `verify:` commands after all phases are complete.
+If the plan path is a single `.md` file, skip this section and proceed normally.
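Steps 1, 2, and 6 of the multi-file procedure above amount to reading and updating a checklist. The sketch below shows one way to do that, assuming the phase lines look like `- [ ] phase_N.md` followed by a free-text description; the helper names `first_unchecked_phase` and `mark_phase_done` and the sample plan are hypothetical and not part of the plugin.

```python
import re
from typing import Optional

# Matches "- [ ] phase_2.md ..." or "- [x] phase_2.md ..."; the description
# after the file name is ignored.
PHASE_LINE = re.compile(r"^- \[(?P<mark>[ xX])\] (?P<file>phase_\d+\.md)\b")


def first_unchecked_phase(main_md: str) -> Optional[str]:
    """Return the first unchecked phase file listed under `## Phases`, or None."""
    in_phases = False
    for line in main_md.splitlines():
        if line.startswith("## "):
            in_phases = line.strip() == "## Phases"
            continue
        if not in_phases:
            continue
        match = PHASE_LINE.match(line.strip())
        if match and match.group("mark") == " ":
            return match.group("file")
    return None


def mark_phase_done(main_md: str, phase_file: str) -> str:
    """Flip the checkbox for `phase_file` from `[ ]` to `[x]`."""
    return main_md.replace(f"- [ ] {phase_file}", f"- [x] {phase_file}", 1)


EXAMPLE = """\
# Plan

## Phases
- [x] phase_1.md - scaffolding
- [ ] phase_2.md - core logic
- [ ] phase_3.md - docs

## Cross-cutting acceptance criteria
- [ ] all tests pass
"""

print(first_unchecked_phase(EXAMPLE))  # phase_2.md
```

Checkboxes outside the `## Phases` section, such as the cross-cutting criteria, are deliberately skipped, since those are verified separately after all phases are complete.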
 ## 2. Prepare the return summary
 Before starting execution, prepare a brief summary for your eventual return payload to PRIME: file count, which acceptance criteria you will verify, any unknowns. When invoked as a subagent (the common case — PRIME delegates Phase 3 to you), this summary is for PRIME to relay to the user; do not narrate to the user directly. When invoked top-level by the user (`@build <plan-path>`), you may print the summary to chat.
@@ -47,9 +63,12 @@ Before editing any file longer than ~200 lines, run `comment_check` scoped to th
 For each item in `## File-level changes`:
 1. Make the change.
 2. After each non-trivial change, run lint and tests for the affected files.
-3. If a test fails, fix it before moving on.
+3. If a test fails, fix it before moving on. Run the root-cause diagnosis protocol below before drawing any conclusion about the failure's origin.
 4. Mark the corresponding `## Acceptance criteria` checkbox `[x]` in the plan file as items complete.
+**When any test/lint/typecheck fails unexpectedly, load the `root-cause-diagnosis` skill via the Skill tool and follow its protocol.**
+The skill contains: merge-base reproduction, git blame evidence, scope check, rationalization table, and TDD-RED exception.
 **Fenced plans — TDD order.** If the plan's `## Acceptance criteria` contains a ```plan-state fence, work item-by-item in TDD order: for each acceptance item, write the test(s) named in its `tests:` field FIRST (they must fail initially), then implement the change that makes them pass, then confirm by running the item's `verify:` command. Only mark the fence item `- [x]` after the verify command exits 0. This is how fenced plans encode strict TDD — the `tests:` field is the spec; the code is secondary.
 When you discover the plan is wrong:
@@ -64,7 +83,7 @@ Before returning to PRIME (or declaring complete on a top-level invocation):
 - `tsc_check` on each edited file is clean (it's capped and fast — run it).
 - `git diff --stat` matches the plan's `## File-level changes`.
-Do NOT run the full test suite or a full lint pass. PRIME's Phase 4 delegates that to `@qa-reviewer` / `@qa-thorough`, which will fail you if a full-suite regression slips through. Running the full suite here duplicates that work. Per-file tests during execution (section 3) are expected; a final full-suite run is not.
+Do NOT run the full test suite or a full lint pass. PRIME's Assess stage delegates that to `@spec-reviewer` / `@code-reviewer` / `@code-reviewer-thorough`, which will fail you if a full-suite regression slips through. Running the full suite here duplicates that work. Per-file tests during execution (section 3) are expected; a final full-suite run is not.
 ## 5. Return payload
@@ -76,13 +95,22 @@ Return control to your caller with a structured summary:
 **(c) Plan mutations** — any cosmetic/numeric threshold bumps you absorbed silently, any scope expansions under the 2-file limit you absorbed. Be explicit: *"Updated plan §4 line-count threshold from 200 → 260 (file ended up 258 lines; self-imposed metric)"* is a good entry; silence is not.
-**(d) Unusual conditions** — pre-existing failures encountered and logged to the plan's `## Open questions` (cite the bullet verbatim), files touched outside `## File-level changes` with justification, any STOP condition you hit.
+**(d) Unusual conditions** — files touched outside `## File-level changes` with justification, any STOP condition you hit.
+**(e) Guidance deviations** — when PRIME's Execute-prompt guidance contains instructions that you interpreted in a way that could plausibly be read differently (the plan permitted multiple readings; the Execute prompt and the plan pointed in subtly different directions; two items in the Execute prompt were in tension and you picked one), surface the decision explicitly. Example entry: *"Execute prompt item #12 said 'extract common content to skill'; I read this as 'remove from agent prompts and put only in skill' and extracted fully; alternate reading was 'duplicate in skill while keeping inline as enforced default.' Chose full extraction because DRY and the rules also live in prime.md hard rules."* Silence is not acceptable — same bar as item (c). A PRIME that can't see the decision-point after the fact has no way to tell a defensible judgment from a silent disobedience.
+**Return status.** Use one of these four statuses in your return:
+- **DONE** — all acceptance criteria met, no concerns.
+- **DONE_WITH_CONCERNS** — all acceptance criteria met, but you noticed issues worth PRIME's attention (e.g., a pattern inconsistency you worked around, a non-blocking lint warning, a TODO you left in place per the plan's `## Out of scope`). List concerns explicitly.
+- **NEEDS_CONTEXT** — you hit ambiguity that requires user input before you can proceed. Describe what's needed.
+- **BLOCKED** — a hard blocker prevents completion (missing dependency, conflicting plan, broken environment). Describe the blocker.
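The four statuses above carry different obligations: concerns must be listed, and missing context or blockers must be described. A caller could check a return payload against those obligations roughly as below. This is an illustrative sketch only; `ReturnPayload`, its fields, and `problems` are hypothetical names, and the prompt does not prescribe any machine-readable payload shape.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class ReturnStatus(Enum):
    DONE = "DONE"
    DONE_WITH_CONCERNS = "DONE_WITH_CONCERNS"
    NEEDS_CONTEXT = "NEEDS_CONTEXT"
    BLOCKED = "BLOCKED"


@dataclass
class ReturnPayload:
    status: ReturnStatus
    detail: str = ""  # what is needed (NEEDS_CONTEXT) or what blocks (BLOCKED)
    concerns: List[str] = field(default_factory=list)

    def problems(self) -> List[str]:
        """List the ways this payload falls short of the status descriptions."""
        found = []
        if self.status is ReturnStatus.DONE_WITH_CONCERNS and not self.concerns:
            found.append("DONE_WITH_CONCERNS must list concerns explicitly")
        if self.status in (ReturnStatus.NEEDS_CONTEXT, ReturnStatus.BLOCKED) and not self.detail:
            found.append(f"{self.status.value} must describe what is needed or blocking")
        return found


payload = ReturnPayload(ReturnStatus.DONE_WITH_CONCERNS, concerns=["non-blocking lint warning"])
print(payload.problems())  # []
```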
 **STOP payloads.** If you hit a blocker instead of completing, make the STOP clearly labeled in your return so PRIME recognizes it as a blocker rather than a completion. Format:
 > STOP: <one-sentence blocker>. <Which of the three classes this falls under: cosmetic-numeric / approach-design / scope-expansion-over-2-files>. <What PRIME needs to resolve to re-dispatch>.
-PRIME owns QA dispatch. Do NOT delegate to `@qa-reviewer` or `@qa-thorough` yourself when invoked as a subagent — PRIME's Phase 4 applies a fast-vs-thorough heuristic based on diff size + risk that you don't have full context for. When invoked top-level (`@build <plan-path>`), you may delegate to `@qa-reviewer` directly as the session's final step.
+PRIME owns QA dispatch. Do NOT delegate to `@spec-reviewer`, `@code-reviewer`, or `@code-reviewer-thorough` yourself when invoked as a subagent — PRIME's Assess stage applies a fast-vs-thorough heuristic based on diff size + risk that you don't have full context for. When invoked top-level (`@build <plan-path>`), you may delegate to `@spec-reviewer` directly as the session's final step.
 # Hard rules
@@ -91,3 +119,5 @@ PRIME owns QA dispatch. Do NOT delegate to `@qa-reviewer` or `@qa-thorough` your
 - **Never use `--no-verify` or `--no-gpg-sign`** to bypass pre-commit hooks. If a hook blocks you, fix the root cause (resolve TODOs, repair lint/type errors). If the hook seems genuinely wrong, STOP and ask the user.
 - Plan file mutations: mark `[x]` freely as items complete. For **cosmetic / self-imposed numeric thresholds** (line-count budgets, row caps, arbitrary `< N` limits the planner set on itself), update the threshold silently and note it in your commit message — do NOT stop. For **approach / design changes** (the interface doesn't exist, the test strategy won't work, a whole section needs restructuring), stop and use the `question` tool. For **scope expansion** (an extra file or two needed to finish the item), add to `## File-level changes` and keep going; only ask if the expansion is > ~2 files or shifts the `## Goal`.
 - The user's goals are fixed; your own metrics are revisable. If you find yourself working around the plan's *approach*, that's a design-change signal — stop and ask. If you're just bumping a threshold you set on yourself, keep moving.
+{UI_EVALUATION_LADDER}
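The plan-mutation rule in the hard rules above distinguishes three kinds of change and two possible reactions. The sketch below encodes that triage as a small decision function. The names `Action` and `triage_plan_mutation` and the string labels for each kind are hypothetical, and the prompt's approximate "~2 files" limit is treated here as a hard cutoff of 2, which is an assumption.

```python
from enum import Enum


class Action(Enum):
    UPDATE_SILENTLY = "update the threshold and note it in the commit message"
    STOP_AND_ASK = "stop and ask the user"
    EXTEND_AND_CONTINUE = "add to File-level changes and keep going"


def triage_plan_mutation(kind: str, extra_files: int = 0, shifts_goal: bool = False) -> Action:
    """Map a plan mutation to the reaction the hard rules describe."""
    if kind == "cosmetic":  # self-imposed numeric thresholds
        return Action.UPDATE_SILENTLY
    if kind == "approach":  # approach or design changes
        return Action.STOP_AND_ASK
    if kind == "scope":  # extra files needed to finish the item
        if extra_files > 2 or shifts_goal:
            return Action.STOP_AND_ASK
        return Action.EXTEND_AND_CONTINUE
    raise ValueError(f"unknown mutation kind: {kind}")


print(triage_plan_mutation("scope", extra_files=1).name)  # EXTEND_AND_CONTINUE
```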

package/dist/agents/prompts/build.open.md CHANGED

@@ -37,12 +37,17 @@ Before starting, note: file count, which acceptance criteria you will verify, an
 ## 3. Execute task by task
+**Fenced plans — TDD order.** If the plan's `## Acceptance criteria` contains a ```plan-state fence, work item-by-item in TDD order: for each acceptance item, write the test(s) named in its `tests:` field FIRST (they must fail initially), then implement the change that makes them pass, then confirm by running the item's `verify:` command. Only mark the fence item `- [x]` after the verify command exits 0.
 For each item in `## File-level changes`:
 1. Make the change.
-2. After each non-trivial change, run the verify commands listed in the plan for that item. If they fail, fix and re-run.
+2. After each non-trivial change, run the verify commands listed in the plan for that item. If they fail, run the root-cause diagnosis protocol below, fix, and re-run.
 3. If a test fails, fix it before moving on.
 4. Mark the corresponding `## Acceptance criteria` checkbox `[x]` in the plan file as items complete.
+**When any test/lint/typecheck fails unexpectedly, load the `root-cause-diagnosis` skill via the Skill tool and follow its protocol.**
+The skill contains: merge-base reproduction, git blame evidence, scope check, rationalization table, and TDD-RED exception.
 **Verify commands.** Run the verify commands listed in the plan. If they pass, the item is done. If they fail, read the output, fix the code, and re-run. Do not mark an item `[x]` until the verify command exits 0.
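The gate described above (no `[x]` until the verify command exits 0) can be sketched as below. This is an assumption-laden illustration: `verify_passes` and `mark_if_verified` are hypothetical helpers, the checklist line format is simplified, and the verify commands are stand-ins that exit with a fixed code.

```python
import subprocess
import sys
from typing import List


def verify_passes(command: List[str]) -> bool:
    """Run a verify command and report whether it exited 0."""
    completed = subprocess.run(command, capture_output=True, text=True)
    return completed.returncode == 0


def mark_if_verified(plan_text: str, item: str, command: List[str]) -> str:
    """Flip `- [ ] item` to `- [x] item` only when the verify command exits 0."""
    if not verify_passes(command):
        return plan_text  # leave the box unchecked; fix the code and re-run
    return plan_text.replace(f"- [ ] {item}", f"- [x] {item}", 1)


plan = "- [ ] parser handles empty input\n"
# Stand-in verify commands: the current interpreter exiting 0 or 1.
passing = [sys.executable, "-c", "raise SystemExit(0)"]
failing = [sys.executable, "-c", "raise SystemExit(1)"]

print(mark_if_verified(plan, "parser handles empty input", failing), end="")
print(mark_if_verified(plan, "parser handles empty input", passing), end="")
```

Keeping the checkbox update behind the exit-code check means the plan file never claims more than the last successful verify run showed.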
 When you discover the plan is wrong:
@@ -59,7 +64,7 @@ Before returning:
 - `tsc_check` on each edited file is clean.
 - `git diff --stat` matches the plan's `## File-level changes`.
-Do NOT run the full test suite. PRIME's Phase 4 delegates that to `@qa-reviewer` / `@qa-thorough`.
+Do NOT run the full test suite. PRIME's Assess stage delegates that to `@spec-reviewer` / `@code-reviewer` / `@code-reviewer-thorough`.
 ## 5. Return payload
@@ -71,13 +76,22 @@ Return control to your caller with a structured summary:
 **(c) Plan mutations** — any changes you made to the plan file itself (threshold bumps, etc.).
-**(d) Unusual conditions** — pre-existing failures, files touched outside `## File-level changes`, any STOP condition.
+**(d) Unusual conditions** — files touched outside `## File-level changes` with justification, any STOP condition.
+**(e) Guidance deviations** — when PRIME's Execute-prompt guidance contains instructions that you interpreted in a way that could plausibly be read differently (the plan permitted multiple readings; the Execute prompt and the plan pointed in subtly different directions; two items in the Execute prompt were in tension and you picked one), surface the decision explicitly. Example entry: *"Execute prompt item #12 said 'extract common content to skill'; I read this as 'remove from agent prompts' and extracted fully; alternate reading was 'duplicate in skill while keeping inline.' Chose full extraction because DRY."* Silence is not acceptable — same bar as item (c).
+**Return status.** Use one of these four statuses:
+- **DONE** — all acceptance criteria met, no concerns.
+- **DONE_WITH_CONCERNS** — all acceptance criteria met, but you noticed issues worth PRIME's attention. List concerns explicitly.
+- **NEEDS_CONTEXT** — ambiguity requires user input before you can proceed.
+- **BLOCKED** — a hard blocker prevents completion.
 **STOP payloads.** If you hit a blocker, label it clearly:
 > STOP: <one-sentence blocker>. <What needs to be resolved to re-dispatch>.
-PRIME owns QA dispatch. Do NOT delegate to `@qa-reviewer` or `@qa-thorough` yourself when invoked as a subagent.
+PRIME owns Assess dispatch. Do NOT delegate to `@spec-reviewer`, `@code-reviewer`, or `@code-reviewer-thorough` yourself when invoked as a subagent.
 # Hard rules