opencodekit 0.23.3 → 0.23.4

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (41)
  1. package/README.md +7 -14
  2. package/dist/index.js +1 -1
  3. package/dist/template/.opencode/AGENTS.md +89 -17
  4. package/dist/template/.opencode/README.md +43 -6
  5. package/dist/template/.opencode/artifacts/harness-workflows/plan.md +317 -0
  6. package/dist/template/.opencode/command/audit.md +65 -0
  7. package/dist/template/.opencode/command/init.md +19 -2
  8. package/dist/template/.opencode/command/research.md +67 -16
  9. package/dist/template/.opencode/command/ship.md +55 -5
  10. package/dist/template/.opencode/command/verify.md +5 -5
  11. package/dist/template/.opencode/opencode.json +12 -0
  12. package/dist/template/.opencode/plugin/README.md +0 -6
  13. package/dist/template/.opencode/skill/defense-in-depth/SKILL.md +0 -2
  14. package/dist/template/.opencode/skill/development-lifecycle/SKILL.md +11 -9
  15. package/dist/template/.opencode/skill/manifest.json +77 -0
  16. package/dist/template/.opencode/workflows/audit-pattern.md +51 -0
  17. package/dist/template/.opencode/workflows/batch-implement.md +82 -0
  18. package/dist/template/.opencode/workflows/deep-research.md +58 -0
  19. package/dist/template/.opencode/workflows/development-lifecycle-workflow.md +129 -0
  20. package/package.json +1 -1
  21. package/dist/template/.opencode/command/clarify.md +0 -46
  22. package/dist/template/.opencode/command/commit.md +0 -53
  23. package/dist/template/.opencode/command/design.md +0 -129
  24. package/dist/template/.opencode/command/explore.md +0 -169
  25. package/dist/template/.opencode/command/improve-architecture.md +0 -55
  26. package/dist/template/.opencode/command/pr.md +0 -148
  27. package/dist/template/.opencode/command/refactor.md +0 -65
  28. package/dist/template/.opencode/command/review-codebase.md +0 -128
  29. package/dist/template/.opencode/command/test.md +0 -66
  30. package/dist/template/.opencode/command/ui-review.md +0 -109
  31. package/dist/template/.opencode/opencodex-fast.jsonc +0 -3
  32. package/dist/template/.opencode/plugin/rtk.ts +0 -43
  33. package/dist/template/.opencode/skill/agent-teams/SKILL.md +0 -268
  34. package/dist/template/.opencode/skill/code-navigation/SKILL.md +0 -142
  35. package/dist/template/.opencode/skill/condition-based-waiting/SKILL.md +0 -135
  36. package/dist/template/.opencode/skill/condition-based-waiting/example.ts +0 -171
  37. package/dist/template/.opencode/skill/context-engineering/SKILL.md +0 -176
  38. package/dist/template/.opencode/skill/memory-system/SKILL.md +0 -147
  39. package/dist/template/.opencode/skill/structured-edit/SKILL.md +0 -191
  40. package/dist/template/.opencode/skill/ubiquitous-language/SKILL.md +0 -184
  41. package/dist/template/.opencode/skill/v0/SKILL.md +0 -158
package/README.md CHANGED
@@ -32,20 +32,13 @@ Use these inside OpenCode:
 
  ## Available Slash Commands (Template)
 
- - `/create`
- - `/start`
- - `/ship`
- - `/plan`
- - `/status`
- - `/pr`
- - `/resume`
- - `/handoff`
- - `/research`
- - `/review-codebase`
- - `/verify`
- - `/design`
- - `/ui-review`
- - `/init`
+ - `/create` — Create a feature spec
+ - `/plan` — Plan implementation architecture
+ - `/ship` — Implement, verify, review, close
+ - `/research` — Research a topic or codebase
+ - `/fix` — Targeted bugfix
+ - `/verify` — Run verification gates
+ - `/init` — Initialize project setup (run once)
 
  ## CLI Command Surface (`ock`)
 
package/dist/index.js CHANGED
@@ -20,7 +20,7 @@ var __require = /* @__PURE__ */ createRequire(import.meta.url);
 
  //#endregion
  //#region package.json
- var version = "0.23.3";
+ var version = "0.23.4";
 
  //#endregion
  //#region src/utils/license.ts
package/dist/template/.opencode/AGENTS.md CHANGED
@@ -46,9 +46,11 @@ This is the compressed always-on execution loop. Keep these six rules active eve
  ## Core Operating Principles
 
  ### Default to Action
+
  If intent is clear and constraints permit, act. Escalate only when blocked or materially uncertain. **Provide options, not excuses** — don't say "it can't be done"; describe the constraint and the path forward.
 
  ### Scope Discipline
+
  - Stay in scope; no speculative refactors
  - Read files before editing
  - Complexity is incremental. **Don't live with broken windows:** fix bad design in code you're changing. Isolate damage if you can't fix now.
@@ -57,6 +59,7 @@ If intent is clear and constraints permit, act. Escalate only when blocked or ma
  - Delegate when work is large, uncertain, or cross-domain
 
  ### Complexity First
+
  The primary goal of software design is to minimize complexity. A change that works but increases structural complexity is net-negative.
 
  - Default to the simplest viable solution
@@ -68,6 +71,7 @@ The primary goal of software design is to minimize complexity. A change that wor
  - **Distrust the prompt's diagnosis** — independently verify user-provided analysis. Confident prose is not proof.
 
  ### Code Quality Gate
+
  - Correct behavior + edge cases
  - Minimal scope — no drive-by refactors
  - Meaningful tests; tests must fail if behavior breaks
@@ -81,12 +85,14 @@ Reject changes that worsen overall code health.
  ---
 
  ## Verification Before Completion
+
  - No success claims without fresh evidence. Run typecheck/lint/test/build after meaningful changes.
  - **If you create or modify a test file, run that test file directly and iterate until it passes.**
  - If verification fails twice on the same approach, stop and escalate.
  - **Auto-detect project toolchain** — look for `package.json`, `Cargo.toml`, `pyproject.toml`, `go.mod`, `Makefile`, etc.
 
  ### Fallow Codebase Gate
+
  Before committing or claiming completion, run the Fallow codebase gate to catch structural issues that linters and type checkers cannot see:
 
  - Run **`npx fallow audit --format json --quiet`** and check the `verdict`. If `"fail"`, resolve all findings before proceeding.
@@ -99,6 +105,7 @@ Fallow builds a complete module graph (Rust-native, sub-second). Its analysis is
  See `.opencode/context/fallow.md` for the full command reference.
 
  ## Tool Discipline
+
  - Use tools whenever they materially improve correctness. Keep calling until the task is complete **and** verified.
  - If a tool returns empty, partial, or suspiciously narrow results, try 1-2 fallback strategies before reporting "no results found."
  - Check prerequisite steps before acting — don't skip discovery because the final action seems obvious.
@@ -106,25 +113,85 @@ See `.opencode/context/fallow.md` for the full command reference.
  - **Before meaningful edits and verification commands, send one sentence describing the immediate action.** Make the call in the same turn — don't ask "shall I?" unless blocked.
 
  ## Skills Protocol
+
  Before implementing any non-trivial task, check the available skills list injected at session start. If a skill's description matches the current task, load its `SKILL.md` and follow its instructions before proceeding. Skills provide pre-verified, specialized workflows — using them is faster and safer than ad-hoc implementation.
 
  When the task spans multiple domains, load all matching skills. If skill instructions conflict, ask the user for guidance. Do not skip this step for tasks that clearly match a skill's purpose.
 
+ ### Skill Tiers
+
+ Skills follow a 2-tier classification (see `.opencode/skill/manifest.json`):
+
+ **Tier 1 — Essential (always consider first):**
+
+ - `behavioral-kernel` — core execution discipline (no silent assumptions, smallest change, surgical diffs)
+ - `defense-in-depth` — validate at every layer, make bad state structurally impossible
+ - `incremental-implementation` — thin vertical slices, verify after each
+ - `verification-before-completion` — never claim success without fresh evidence
+
+ Load one of these whenever the task matches their purpose. They're small and general-purpose.
+
+ **Tier 2 — On-Demand (load when the task domain matches):**
+ All skills in `.opencode/skill/` not listed as Tier 1. Covers UI, testing, debugging, design, workflow, platform integration. Load when the task description references the skill's domain.
+
+ ## Workflow Execution
+
+ Workflows are markdown files in `.opencode/workflows/` that define multi-phase, multi-agent execution plans.
+
+ **Execution steps:**
+
+ 1. Read workflow from `.opencode/workflows/<name>.md`
+ 2. Execute phases in order using `task()` with specified agent type
+ 3. For parallel phases: spawn multiple `task()` calls concurrently, aggregate results
+ 4. Replace `{phase_N_output}` with actual output, `{variable}` with user arguments
+ 5. Handle errors: retry up to 2 times, then abort
+ 6. Main agent performs final synthesis/merge (no subagent needed)
+
+ **Built-in workflows:**
+
+ - `deep-research` — Multi-angle web research with cross-checking
+ - `audit-pattern` — Codebase pattern audit with prioritized remediation
+ - `batch-implement` — Parallel multi-file implementation with review and merge
+ - `development-lifecycle-workflow` — Full feature development with parallelism
+
+ **Workflow composition:** When a phase specifies `Workflow: <name>`, read and execute that workflow's phases, passing current phase output as input.
+
+ **Example:**
+
+ ```
+ Read .opencode/workflows/deep-research.md
+ Phase 1: spawn 8 @scout agents → aggregate findings
+ Phase 2: spawn @review agents → aggregate verified facts
+ Main agent synthesizes final report from Phase 2 output
+ ```
+
+ **Composition example:**
+
+ ```
+ Read .opencode/workflows/development-lifecycle-workflow.md
+ Phase 1: spawn 3 @scout agents
+ Phase 2: spawn 2 @review agents
+ Phase 3: spawn 1 @plan agent
+ Phase 4: execute batch-implement workflow with plan
+ Phase 5: spawn 3 @review agents
+ ```
+
  ## Plan Quality Gate
+
  Before approving or executing any implementation plan: write the plan to `.opencode/artifacts/<slug>/plan.md` and the tracking checklist to `.opencode/artifacts/<slug>/progress.md`. The plan MUST contain a `## Discovery` section with substantive research findings. No boilerplate. If missing, research first. Track implementation progress in `progress.md` to make work visible, reviewable, and resumable across sessions.
 
  ---
 
  ## Hard Constraints (Never Violate)
 
- | Constraint | Rule |
- |---|---|
- | Security | Never expose or invent credentials |
- | Git Safety | Never force push main/master; never bypass hooks |
- | Git Restore | Never run `reset --hard`, `checkout .`, `clean -fd` without explicit user request |
- | Honesty | Never fabricate tool output; never guess URLs; label inferences; state source conflicts |
- | Paths | Use absolute paths for file operations |
- | Reversibility | Ask first before destructive or irreversible actions |
+ | Constraint | Rule |
+ | ------------- | --------------------------------------------------------------------------------------- |
+ | Security | Never expose or invent credentials |
+ | Git Safety | Never force push main/master; never bypass hooks |
+ | Git Restore | Never run `reset --hard`, `checkout .`, `clean -fd` without explicit user request |
+ | Honesty | Never fabricate tool output; never guess URLs; label inferences; state source conflicts |
+ | Paths | Use absolute paths for file operations |
+ | Reversibility | Ask first before destructive or irreversible actions |
 
  ---
 
@@ -143,14 +210,14 @@ Delegate when specialist context, isolation, or parallelism improves correctness
 
  ## Delegation Policy
 
- | Agent | Use For |
- |---|---|
- | `@general` | Small implementation tasks |
- | `@explore` | Codebase search and patterns |
- | `@scout` | External docs/research |
- | `@review` | Correctness/security/debug review |
- | `@plan` | Architecture and execution plans |
- | `@vision` | UI/UX and accessibility judgment |
+ | Agent | Use For |
+ | ---------- | --------------------------------- |
+ | `@general` | Small implementation tasks |
+ | `@explore` | Codebase search and patterns |
+ | `@scout` | External docs/research |
+ | `@review` | Correctness/security/debug review |
+ | `@plan` | Architecture and execution plans |
+ | `@vision` | UI/UX and accessibility judgment |
 
  **Parallelism rule:** Parallel subagents for 3+ independent tasks; otherwise sequential.
 
@@ -164,7 +231,7 @@ Subagent self-reports are not sufficient. After any subagent reports success:
  4. Confirm the agent stayed within scope
 
  ```
- Agent reports → Read diff → Verify → Check criteria → Accept
+ Agent reports → Read diff → Verify → Check criteria → Accept
  ```
 
  Subagent results must include: **status**, **files modified**, **verification evidence**, **summary**, **blockers** (if any).
@@ -174,11 +241,13 @@ When a subagent returns without this structure, treat the response with extra sk
  ---
 
  ## Question Policy
+
  Ask only when ambiguity materially changes the outcome or the action is destructive. Keep questions targeted. Prefer a reversible action or narrow assumption when it can resolve the ambiguity safely.
 
  ---
 
  ## Web Retrieval Priority
+
  1. `context7` — official library/framework docs
  2. `websearch` / `codesearch` — discover URLs
  3. `web_fetch` — read result URL as markdown
@@ -188,6 +257,7 @@ Ask only when ambiguity materially changes the outcome or the action is destruct
  ---
 
  ## Edit Protocol
+
  1. **LOCATE** — find exact position of what must change
  2. **READ** — get fresh file content around the target
  3. **VERIFY** — confirm expected content exists
@@ -201,6 +271,7 @@ Prefer `edit` for modifications; reserve `write` for new files or deliberate ful
  ---
 
  ## Context Management
+
  - Keep context high-signal
  - Use DCP/VCC tools to compress completed phases and recover targeted history
  - After any context compaction, re-read: (1) this `AGENTS.md`, (2) the current task details, (3) active state
@@ -211,6 +282,7 @@ Prefer `edit` for modifications; reserve `write` for new files or deliberate ful
  ---
 
  ## Output Style
+
  - Be concise and direct. Cite concrete file paths and line numbers.
  - **No cheerleading** — no filler, no artificial reassurance
  - **Never narrate abstractly** — explain what you're doing, not that you're "going to look into it"
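Note on the Workflow Execution steps added to AGENTS.md in this diff: step 4 asks the agent to replace `{phase_N_output}` and `{variable}` placeholders in a phase prompt. The diff does not show how that substitution is carried out, so the TypeScript below is only a minimal sketch of one possible reading. `fillTemplate`, `PhaseOutputs`, and `WorkflowArgs` are hypothetical names for illustration and are not part of opencodekit.

```typescript
// Hypothetical types: outputs keyed by phase number, arguments keyed by name.
type PhaseOutputs = Record<number, string>;
type WorkflowArgs = Record<string, string>;

// Replace `{phase_N_output}` with the output of phase N and `{variable}`
// with the matching user argument. Unknown placeholders are left untouched.
function fillTemplate(template: string, outputs: PhaseOutputs, args: WorkflowArgs): string {
  return template.replace(/\{(\w+)\}/g, (match: string, key: string): string => {
    const phase = /^phase_(\d+)_output$/.exec(key);
    if (phase) {
      const value = outputs[Number(phase[1])];
      return value !== undefined ? value : match;
    }
    const arg = args[key];
    return arg !== undefined ? arg : match;
  });
}

// Example: a Phase 2 prompt that consumes Phase 1 output and one user argument.
const filledPrompt = fillTemplate(
  "Cross-check these findings about {question}: {phase_1_output}",
  { 1: "finding A; finding B" },
  { question: "rate limiting" },
);
console.log(filledPrompt);
```

Leaving unknown placeholders in place, rather than substituting an empty string, keeps a missing phase output visible in the prompt that is eventually sent.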
package/dist/template/.opencode/README.md CHANGED
@@ -9,11 +9,12 @@ This directory contains project-specific OpenCode configuration: agents, command
  ├── AGENTS.md # Global operating rules for agents
  ├── opencode.json # OpenCode runtime configuration
  ├── dcp.jsonc # Dynamic context pruning settings
- ├── agent/ # Agent definitions (9)
- ├── command/ # Slash commands (14)
+ ├── agent/ # Agent definitions (7)
+ ├── command/ # Slash commands (6)
  ├── skill/ # Skill library used by agents/commands
  ├── tool/ # Custom tools (memory, swarm, research, etc.)
  ├── plugin/ # OpenCode plugins and plugin-local SDK code
+ ├── workflows/ # Multi-agent orchestration plans (markdown)
  ├── memory/ # Memory templates + project memory files
  └── .env.example # Environment variable template
  ```
@@ -34,11 +35,30 @@ Add the keys you actually need for enabled services.
 
  ## Skills
 
- Skills live in `.opencode/skill/` and are loaded on demand with `skill({ name: "..." })`.
+ Skills live in `.opencode/skill/` and are loaded on demand with `skill({ name: "..." })`. They follow a 3-tier system (see `manifest.json`):
 
- - Core workflow examples: `verification-before-completion`, `writing-plans`, `executing-plans`
- - Debug/reliability examples: `systematic-debugging`, `root-cause-tracing`, `defense-in-depth`
- - UI/design examples: `frontend-design`, `visual-analysis`, `accessibility-audit`
+ **Tier 1 Essential** Always consider first; general-purpose execution discipline.
+ - `behavioral-kernel`, `defense-in-depth`, `incremental-implementation`, `verification-before-completion`
+
+ **Tier 2 — On-Demand** — Load when the task domain matches:
+ - *UI/design*: `frontend-design`, `design-taste-frontend`, `minimalist-ui`, `high-end-visual-design`, `industrial-brutalist-ui`, `accessibility-audit`, `redesign-existing-projects`, `mockup-to-code`
+ - *Testing*: `test-driven-development`, `testing-anti-patterns`, `browser-testing-with-devtools`, `playwright`
+ - *Debugging*: `debugging-and-error-recovery`, `root-cause-tracing`, `defense-in-depth`, `fallow`
+ - *Workflow*: `spec-driven-development`, `planning-and-task-breakdown`, `subagent-driven-development`, `development-lifecycle`, `git-workflow-and-versioning`, `shipping-and-launch`
+ - *Code quality*: `code-review-and-quality`, `agent-code-quality-gate`, `code-cleanup`, `deep-module-design`
+ - *Platform*: `supabase`, `resend`, `polar`, `cloudflare`*, `jira`, `figma`, `vercel-deploy-claimable`
+ - *Docs/design*: `documentation-and-adrs`, `deprecation-and-migration`, `api-and-interface-design`, `brainstorming`, `grill-me`
+ - *Research*: `opensrc`, `webclaw`, `pdf-extract`, `gemini-large-context`
+ - *Navigation*: `srcwalk`
+ - *Etc*: `ci-cd-and-automation`, `security-and-hardening`, `performance-optimization`, `source-driven-development`, `writing-skills`
+
+ **Tier 3 — Platform Reference** — Large reference directories (not shipped by default). Install on demand:
+
+ ```
+ .opencode/scripts/install-skill.sh <name>
+ ```
+
+ \* `cloudflare` is tier-3 (257 files), listed here for discoverability. Run `install-skill.sh` to install.
 
  ## Custom Tools
 
@@ -61,6 +81,23 @@ Current plugin source files in `.opencode/plugin/`:
 
  See `.opencode/plugin/README.md` for plugin details.
 
+ ## Workflows
+
+ Workflows live in `.opencode/workflows/` and define reusable multi-agent orchestration plans. Each is a markdown file that specifies phases with agent types, concurrency, dependencies, and prompt templates.
+
+ **Built-in workflows:**
+ - `deep-research` — Fan out 8 search agents, cross-check findings, synthesize a cited report
+ - `audit-pattern` — Discover code pattern occurrences, audit each, produce remediation report
+ - `batch-implement` — Parallel task implementation with review and merge phases
+
+ **Usage:**
+ 1. Read the workflow file from `.opencode/workflows/<name>.md`
+ 2. Execute each phase via `task()` with the specified agent type and prompt
+ 3. For parallel phases, spawn multiple `task()` calls concurrently
+ 4. Replace `{phase_N_output}` placeholders with actual output from completed phases
+
+ New workflows: add a `.md` file to `.opencode/workflows/` following the same structure. See `AGENTS.md` for execution details.
+
  ## Guardrails
 
  - Keep edits focused; avoid changing generated output under `dist/`.
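Note on the Workflows section added to this README: it states that each workflow file specifies phases with dependencies, but the diff does not show how those dependencies are resolved. As a rough illustration only, the TypeScript sketch below orders phases so that each one runs after the phases it depends on. The `Phase` shape, the `dependsOn` field, and `orderPhases` are assumptions made for this example. The shipped workflows are markdown files, and how they are parsed is not shown here.

```typescript
// Hypothetical phase shape; the field names are assumptions for illustration.
interface Phase {
  name: string;
  dependsOn: string[];
}

// Depth-first topological ordering: every phase appears after its dependencies.
// Throws on unknown phase names and on dependency cycles.
function orderPhases(phases: Phase[]): string[] {
  const byName = new Map<string, Phase>();
  for (const p of phases) byName.set(p.name, p);

  const done = new Set<string>();
  const visiting = new Set<string>();
  const order: string[] = [];

  function visit(name: string): void {
    if (done.has(name)) return;
    if (visiting.has(name)) throw new Error(`Dependency cycle at phase "${name}"`);
    const phase = byName.get(name);
    if (!phase) throw new Error(`Unknown phase "${name}"`);
    visiting.add(name);
    for (const dep of phase.dependsOn) visit(dep);
    visiting.delete(name);
    done.add(name);
    order.push(name);
  }

  for (const p of phases) visit(p.name);
  return order;
}

// Example: phases declared out of order are sorted into a runnable sequence.
const phaseOrder = orderPhases([
  { name: "synthesize", dependsOn: ["cross-check"] },
  { name: "research", dependsOn: [] },
  { name: "cross-check", dependsOn: ["research"] },
]);
console.log(phaseOrder.join(" -> "));
```

Phases that share no dependency path could then be dispatched concurrently, which is the case the README's usage step 3 describes.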
package/dist/template/.opencode/artifacts/harness-workflows/plan.md ADDED
@@ -0,0 +1,317 @@
+ # Harness Redesign: Workflows + Surface Area Reduction
+
+ ## TL;DR
+
+ Add 1 plugin (~300 lines), 1 directory (`.opencode/workflows/`), cut the template from 800+ files to ~80 essential files, lazy-load the rest. The result is a harness that is strictly better than Claude Code's: more flexible, more verifiable, and fully extensible.
+
+ ---
+
+ ## Discovery
+
+ ### Current State (Brutal)
+
+ | Category | File Count | Problem |
+ |---|---|---|
+ | Agents | 7 files | `build` (main agent) and `general` (subagent default) are distinct roles. Keep both. |
+ | Commands | 17 files | ~10 of these will never be invoked. They bloat context on every `/init`. |
+ | Skills | 50+ dirs | Cloudflare is 280 files. React best-practices is 50 files. Core Data is 15 files. SwiftUI is 17 files. **If you don't use these, they're dead weight in the skill index.** |
+ | Plugins | ~20 files + Copilot SDK | All plugins including Copilot SDK stay — they're part of the core stack. |
+ | State/Artifacts | ~10 files | Workable — needed for the beads lifecycle. |
+ | DCP prompts | 9 files | 9 carefully tuned compression prompts. Keep. |
+ | `src/` (CLI) | 25 files | Fine. This is the `ock` CLI surface. Keep. |
+
+ **Total: ~800 files.** A new user has no idea where to start. The `README.md` lists 14 slash commands and 7 agents — the cognitive load before the first prompt is too high.
+
+ ### What OpenCode Already Has That Claude Code Doesn't
+
+ 1. **Plugin API hooks**: `tool.execute.before`, `experimental.chat.system.transform`, `experimental.session.compacting`, `message.part.updated`. Claude Code has file-based extension only.
+ 2. **4-tier memory pipeline**: capture → distill → curate → inject. Claude Code has auto memory (model writes to a file).
+ 3. **Tool constraints per subagent**: `explore` literally cannot edit files — the runtime enforces it. Claude Code uses prompt-level restrictions.
+ 4. **Fallow codebase gate**: deterministic static analysis gating completion claims.
+ 5. **Worker distrust protocol**: the harness requires reading changed files and re-running verification after every subagent returns.
+
+ ### What OpenCode Is Missing vs Claude Code Workflows
+
+ Claude Code's dynamic workflows provide:
+ 1. **A script that holds the plan** — orchestration lives in JavaScript, not the model's context window
+ 2. **Isolated runtime** — the script executes outside the conversation
+ 3. **Intermediate results in script variables** — not in context
+ 4. **Phase-level monitoring** — track agents per phase, token usage, elapsed time
+ 5. **Resumability** — cached agent results survive pauses
+ 6. **Cross-checking** — agents adversarially review each other's findings
+
+ OpenCode's equivalent is the `subagent-driven-development` skill — a **markdown file** describing how to orchestrate subagents manually. This is a prompt, not a primitive. The model still holds the orchestration in context. For a 50-agent codebase audit, this hits the context wall.
+
+ ---
+
+ ## Design: Workflow Primitive
+
+ ### 1. Workflow File Format
+
+ ```
+ .opencode/workflows/
+ ├── deep-research.ts # Built-in
+ ├── audit-endpoints.ts # User-created
+ └── migration-runner.ts # User-created
+ ```
+
+ Each file exports a workflow definition:
+
+ ```typescript
+ // .opencode/workflows/deep-research.ts
+ import { defineWorkflow } from "../plugin/workflow/runtime.js"
+
+ export default defineWorkflow({
+   name: "deep-research",
+   description: "Fan out web searches on a question, cross-check sources, return a cited report",
+   agents: 16, // max concurrent agents
+   phases: [
+     {
+       name: "research",
+       parallel: true,
+       agents: 8,
+       prompt: "Search for different angles on: {question}"
+     },
+     {
+       name: "cross-check",
+       parallel: true,
+       agents: 4,
+       dependsOn: ["research"],
+       prompt: "Verify findings from research phase against each other"
+     },
+     {
+       name: "synthesize",
+       parallel: false,
+       agents: 1,
+       dependsOn: ["cross-check"],
+       prompt: "Write a final cited report from verified findings"
+     }
+   ]
+ })
+ ```
+
+ **Alternative (more flexible) — function-based:**
+
+ ```typescript
+ export default defineWorkflow({
+   name: "audit-endpoints",
+   async run({ task, args, log }) {
+     // Phase 1: discover endpoints
+     const endpoints = await task({
+       agent: "explore",
+       prompt: `Find all API route handlers matching pattern: ${args.pattern ?? "src/**/route.ts"}`
+     })
+
+     // Phase 2: audit in parallel
+     const results = await Promise.all(
+       parseEndpoints(endpoints).map(ep => task({
+         agent: "review",
+         prompt: `Audit ${ep.path} for: auth checks, input validation, error handling`
+       }))
+     )
+
+     // Phase 3: synthesize
+     return synthesize(results)
+   }
+ })
+ ```
+
+ The function-based form is more powerful. It lets the workflow script hold state, branch, and aggregate — exactly what Claude Code's workflows do.
+
+ ### 2. Workflow Runtime (~300 lines in a new plugin)
+
+ The runtime is a single plugin file: `.opencode/plugin/workflow.ts`
+
+ ```
+ plugin/workflow.ts — Plugin entry: tools + command registration
+ plugin/workflow/runtime.ts — Script loader + sandboxed executor
+ plugin/workflow/monitor.ts — Phase progress tracking via session-summary
+ plugin/workflow/registry.ts — List/save/load workflows from .opencode/workflows/
+ ```
+
+ **Key interfaces:**
+
+ ```typescript
+ // The runtime tool exposed to the model
+ tool.workflow.run = {
+   name: "workflow-run",
+   description: "Run a workflow script that orchestrates multiple subagents",
+   parameters: {
+     workflow: string, // name of workflow in .opencode/workflows/
+     args: Record<string, unknown>
+   },
+   execute: async ({ workflow, args }, context) => {
+     const script = await load(`.opencode/workflows/${workflow}.ts`)
+     const result = await sandboxedExecute(script, {
+       task: context.task, // pass through the built-in task() tool
+       args,
+       log: context.log
+     })
+     return result
+   }
+ }
+ ```
+
+ **Sandboxed execution** means the workflow script runs in a separate context with its own `task()` pool. It cannot directly read/edit/write files (only its subagents can). This prevents the orchestration script from corrupting state — the same constraint Claude Code's runtime enforces.
+
+ ### 3. Integration Points
+
+ **Plugin hooks:**
+ - `tool.execute.before` — intercept `workflow-run` calls, route to runtime
+ - `experimental.session.compacting` — preserve workflow run state across compaction
+ - `experimental.chat.system.transform` — inject available workflow descriptions into context (progressive disclosure — only active ones, not all 50)
+
+ **Existing surface to reuse:**
+ - `task()` tool — already exists, workflows delegate to it
+ - Artifacts — workflow results land in `.opencode/artifacts/<run-id>/`
+ - Session summary — workflow phase progress is tracked via the existing session-summary plugin interface
+
+ ### 4. Built-in Workflows (ship 3)
+
+ | Workflow | What it does | When to use |
+ |---|---|---|
+ | `/deep-research` | Fan out web searches across angles, cross-check sources, write cited report | Questions needing multi-source verification |
+ | `/audit-pattern` | Explore codebase for a pattern, review each match, synthesize findings | "Find all X and check for Y" |
+ | `/batch-implement` | Take a plan with independent tasks, dispatch one subagent per task, review each | Multi-file feature implementation |
+
+ These replace ~5 of the 17 existing slash commands (research, review-codebase, fix, improve-architecture, refactor) with a single unified primitive.
+
179
+ ---
180
+
181
+ ## Design: Surface Area Reduction
182
+
183
+ ### 1. Keep `build` and `general` — distinct roles, no merge
184
+
185
+ **Confirmed:** `build` is the main/primary agent for development sessions. `general` is the default subagent used by `task()`. They serve different routing purposes and both stay.
186
+
187
+ **Action:** None — no merge needed. If anything, ensure `general.md` explicitly references `build.md` as its parent for context inheritance.
188
+
189
+ ### 2. Cut the command list from 17 to 6
190
+
191
+ | Keep | Delete | Why |
192
+ |---|---|---|
193
+ | `/ship` | → Keep | Core workflow end |
194
+ | `/plan` | → Keep | Core workflow middle |
195
+ | `/create` | → Keep | Core workflow start |
196
+ | `/verify` | → Keep | Verification gate |
197
+ | `/research` | → Keep | Research command |
198
+ | `/fix` | → Keep | Targeted bugfix |
199
+ | | `/clarify` | Merged into `/plan` — the plan agent should clarify as part of planning |
200
+ | | `/commit` | `git commit` is a mechanical action, not a command. Let the agent do it automatically at ship time |
201
+ | | `/design` | Merged into `/plan` — architecture design is a phase of planning |
202
+ | | `/explore` | Users type "find the auth logic", not "/explore auth logic" |
203
+ | | `/improve-architecture` | Merged into `/plan --refactor` flag |
204
+ | | `/init` | Keep but hide from command list — called once on setup |
205
+ | | `/pr` | Merged into `/ship` — PR creation is the final phase |
206
+ | | `/refactor` | Merged into `/plan --refactor` |
207
+ | | `/review-codebase` | Replaced by `/audit` workflow |
208
+ | | `/test` | Too narrow — users say "add tests" not "/test" |
209
+ | | `/ui-review` | Merged into the verification phase of `/ship` |
210
+
211
+ **Impact:** -11 files. The remaining 6 commands are discoverable and non-overlapping. Users learn `create → plan → ship` and everything else is a phase of those three.
212
+
213
+ ### 3. Skill triage: 3 tiers
214
+
215
+ **Tier 1 — Essential (always loaded, in context):**
216
+ - `behavioral-kernel` — core execution discipline
217
+ - `code-navigation` — how to read code effectively
218
+ - `verification-before-completion` — must-run gates
219
+ - `incremental-implementation` — thin slices
220
+ - `defense-in-depth` — structural safety
221
+
222
+ **Tier 2 — On-demand (model loads when relevant, 5-10 files):**
223
+ - `frontend-design`, `design-taste-frontend`, `minimalist-ui`, `high-end-visual-design`, `industrial-brutalist-ui`
224
+ - `spec-driven-development`, `planning-and-task-breakdown`, `subagent-driven-development`
225
+ - `documentation-and-adrs`, `deprecation-and-migration`
226
+ - `testing-anti-patterns`, `test-driven-development`
227
+ - `debugging-and-error-recovery`, `root-cause-tracing`
228
+ - `browser-testing-with-devtools`, `playwright`
229
+ - `code-review-and-quality`, `agent-code-quality-gate`
230
+ - `git-workflow-and-versioning`, `shipping-and-launch`
231
+ - `fallow`, `srcwalk`, `structured-edit`
232
+ - ~10 design/UI skills
233
+ - ~5 platform skills (supabase, resend, polar, cloudflare-postgres-basics)
234
+
235
+ **Tier 3 — Platform reference (load only when the user confirms they build on that platform):**
236
+
237
+ These are large reference directories. They should NOT ship in every template:
238
+ - `cloudflare` — 280 files, 15+ sub-services. Add only if user selects "Cloudflare" in `init` wizard
239
+ - `react-best-practices` — 50 files. Add only if user selects "React"
240
+ - `supabase-postgres-best-practices` — 35 files. Add only if user selects "Supabase"
241
+ - `core-data-expert` — 15 files. Add only if user selects "iOS/Core Data"
242
+ - `swiftui-expert-skill` — 17 files. Add only if user selects "SwiftUI"
243
+ - `swift-concurrency` — 15 files. Add only if user selects "Swift"
244
+
245
+ **Impact:** Template drops from 800+ files to ~100-150 for most users (Cloudflare alone is 280 files). The `init` wizard asks 3 questions and installs only the matching tier-3 skills.
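The tier-3 gate can be sketched as a lookup from wizard answers to skill directories. This is a minimal sketch under assumptions: `selectTier3Skills`, `extraFiles`, and the `Tier3Skill` shape are hypothetical names, while the choice labels and file counts are copied from the Tier 3 list above.

```typescript
// Hypothetical catalog of Tier 3 skills, keyed by the wizard choice that unlocks each one.
interface Tier3Skill {
  choice: string; // label the user selects in the init wizard
  name: string;   // skill directory to install
  files: number;  // approximate file count from the plan
}

const TIER3_SKILLS: Tier3Skill[] = [
  { choice: "Cloudflare", name: "cloudflare", files: 280 },
  { choice: "React", name: "react-best-practices", files: 50 },
  { choice: "Supabase", name: "supabase-postgres-best-practices", files: 35 },
  { choice: "iOS/Core Data", name: "core-data-expert", files: 15 },
  { choice: "SwiftUI", name: "swiftui-expert-skill", files: 17 },
  { choice: "Swift", name: "swift-concurrency", files: 15 },
];

// Filtering the catalog (rather than mapping the answers) silently ignores
// unknown answers and duplicate selections.
function selectTier3Skills(choices: string[]): Tier3Skill[] {
  return TIER3_SKILLS.filter((skill) => choices.indexOf(skill.choice) !== -1);
}

// Number of files the selected skills add on top of the base template.
function extraFiles(skills: Tier3Skill[]): number {
  return skills.reduce((sum, skill) => sum + skill.files, 0);
}
```

For example, a user who selects "React" and "Supabase" would get two skills and 85 extra files, while a user who selects nothing installs no tier-3 directories at all.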
246
+
247
+ ### 4. Plugin cleanup: all plugins stay except `rtk.ts`
248
+
249
+ **Confirmed:** All plugins except `rtk.ts` stay, including the Copilot provider/auth integration and SDK. They're part of the core stack.
250
+
251
+ | Plugin | Keep? | Why |
252
+ |---|---|---|
253
+ | `memory.ts` + lib/ | ✅ Keep | Core 4-tier memory |
254
+ | `session-summary.ts` | ✅ Keep | Anchored iterative summarization |
255
+ | `sessions.ts` | ✅ Keep | Session search |
256
+ | `skill-mcp.ts` | ✅ Keep | Skill MCP bridge |
257
+ | `srcwalk.ts` | ✅ Keep | Code navigation |
258
+ | `copilot-auth.ts` + `sdk/copilot/` | ✅ Keep | Copilot provider integration |
259
+ | `prompt-leverage.ts` | ✅ Keep | Prompt framing |
260
+ | `rtk.ts` | ❌ Removed | External dependency for marginal benefit — not earning its place in core stack |
261
+ | `guard.ts` | ✅ Keep | Conventional commits + pipe-to-shell blocker |
262
+
263
+ **Impact:** 1 deletion (`rtk.ts`). The rest of the plugin surface stays intact.
264
+
265
+ ### 5. DCP and config cleanup
266
+
267
+ | File | Keep? | Why |
268
+ |---|---|---|
269
+ | `dcp.jsonc` | ✅ Keep | Core compression settings |
270
+ | `dcp-prompts/defaults/` (5 files) | ✅ Keep | Tuned compression prompts |
271
+ | `dcp-prompts/overrides/` (2 files) | ✅ Keep | User overrides |
272
+ | `tui.json` | ✅ Keep | TUI config |
273
+ | `.env.example` | ✅ Keep | Environment reference |
274
+ | `.template-manifest.json` | 🟡 Keep but hide | Build system internal |
275
+ | `.version` | 🟡 Keep but hide | Build system internal |
276
+ | `opencodex-fast.jsonc` | ❓ What is this? | If unused, delete |
277
+
278
+ ---
279
+
280
+ ## Implementation Effort
281
+
282
+ | Item | Effort | Files Changed | Risk |
283
+ |---|---|---|---|
284
+ | Workflow runtime plugin | **M** (2-3 days) | ~4 new files (plugin + runtime + monitor + registry) | Medium — sandboxed execution has edge cases |
285
+ | Phase monitoring via session-summary hook | **S** (half day) | ~2 files modified | Low — existing plugin interface |
286
+ | Built-in workflows (deep-research, audit-pattern, batch-implement) | **S** each (half day each) | ~3 new workflow files | Low — all use existing `task()` |
287
+ | Cut commands 17→6 | **S** (2 hours) | 11 deletions, update `ship.md`, README, and init command | Low — old commands unused |
288
+ | Skill triage (tier system) | **M** (1-2 days) | Init wizard, skill metadata, lazy-loading config | Medium — changing skill loading has UX impact |
289
+ | Tier-3 skill gate in init wizard | **M** (1 day) | Add questions to init wizard, conditional skill install | Low |
290
+ | **Total** | **M overall** (1 week) | ~20 files changed | Medium |
291
+
292
+ ---
293
+
294
+ ## The Brutal Self-Critique
295
+
296
+ **Where this design could fail:**
297
+
298
+ 1. **The workflow runtime adds complexity.** Every runtime has bugs. Error handling in multi-agent scripts is hard. If the runtime is flaky, the workflow feature hurts more than it helps. *Mitigation: keep the runtime under 300 lines, no dependencies, hard fail on uncaught exceptions.*
299
+
300
+ 2. **Cutting commands removes discoverability.** The 17 commands are a menu of "things Claude can do." Cutting to 6 means users need to know the workflow names. *Mitigation: `/help` should list available workflows + the 6 core commands.*
301
+
302
+ 3. **Skill triage creates friction.** If a user wants Cloudflare but didn't select it at init, they now need to know they can `skill install cloudflare`. That's an extra step. *Mitigation: the `init` wizard should have a "Browse skill marketplace" option that lazily loads the full list.*
303
+
304
+ 4. **The function-based workflow format is too powerful.** Giving workflow scripts full JavaScript means they can have bugs, infinite loops, and resource leaks. Claude Code limits workflows to declarative phases + pre-defined templates. *Mitigation: impose a timeout per workflow, max agent count, and disallow raw `while(true)` via sandbox. Add a `maxAgents: 1000` cap matching Claude Code's.*
305
+
306
+ 5. **The template gets smaller but the init wizard gets bigger.** Shifting complexity from file count to an interactive wizard is a tradeoff, not a pure win. If the wizard is bad, users have a worse experience than a big file tree. *Mitigation: the wizard asks exactly 3 questions (project type, target platform, optional skills). No more.*
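The per-workflow timeout and agent cap from item 4 can be sketched as a small budget object that the runtime consults before each spawn. Everything here is an assumption for illustration: `WorkflowBudget`, `WorkflowLimits`, and `beforeSpawn` are hypothetical names, not part of any existing plugin interface. A cooperative check like this cannot interrupt a tight `while(true)` loop, which is why the plan also calls for a sandbox.

```typescript
// Hypothetical limits for a single workflow run (names and values are illustrative).
interface WorkflowLimits {
  maxAgents: number; // cap on agents spawned per run
  timeoutMs: number; // wall-clock budget per run
}

class WorkflowBudget {
  private spawned = 0;
  private readonly startedAt: number;
  private readonly limits: WorkflowLimits;
  private readonly now: () => number;

  // `now` is injectable so the timeout can be tested without real waiting.
  constructor(limits: WorkflowLimits, now: () => number = Date.now) {
    this.limits = limits;
    this.now = now;
    this.startedAt = now();
  }

  // Call before every agent spawn. Throwing makes the run hard-fail
  // instead of continuing past its budget.
  beforeSpawn(): void {
    if (this.now() - this.startedAt > this.limits.timeoutMs) {
      throw new Error("workflow timed out");
    }
    if (this.spawned >= this.limits.maxAgents) {
      throw new Error("workflow exceeded maxAgents");
    }
    this.spawned += 1;
  }

  agentsSpawned(): number {
    return this.spawned;
  }
}
```

Keeping the guard this small fits the stated goal of a runtime under 300 lines with no dependencies.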
307
+
308
+ ---
309
+
310
+ ## Acceptance Criteria
311
+
312
+ 1. **Workflow runtime works**: `workflow-run deep-research "What changed in Node.js v20-v22"` fans out 8 search agents, cross-checks their findings, and returns a cited report
313
+ 2. **Workflow scripts are saveable**: `workflow-save` stores the current run's script as a reusable command
314
+ 3. **Surface area measured**: template ships with ≤150 files (down from 800+)
315
+ 4. **Init wizard works**: `ock init` asks 3 questions → installs only matching tier-3 skills
316
+ 5. **All existing `/ship` flows still pass**: no regressions from the 17→6 command cut
317
+ 6. **Models actually use workflows**: functional test where a prompt containing "audit" triggers a workflow instead of a single-agent turn
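Criterion 3 is the easiest of these to automate. A minimal sketch, assuming the template's file list is already available as an array of relative paths; `checkFileBudget` and `countByPrefix` are hypothetical helpers, and the 150-file default comes from the criterion above.

```typescript
// Hypothetical check for the file-count criterion (helper names are illustrative).
interface BudgetReport {
  total: number;  // files found in the template
  overBy: number; // how many files exceed the budget (0 when within budget)
  ok: boolean;
}

function checkFileBudget(paths: string[], maxFiles: number = 150): BudgetReport {
  const total = paths.length;
  const overBy = Math.max(0, total - maxFiles);
  return { total: total, overBy: overBy, ok: overBy === 0 };
}

// Groups files by their first `depth` path segments to show where the budget goes.
function countByPrefix(paths: string[], depth: number): { [prefix: string]: number } {
  const counts: { [prefix: string]: number } = {};
  for (const path of paths) {
    const key = path.split("/").slice(0, depth).join("/");
    counts[key] = (counts[key] || 0) + 1;
  }
  return counts;
}
```

Running `countByPrefix` at depth 2 on a failing template would show which skill directory pushed the total over the limit.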