npm - opencodekit - Versions diffs - 0.23.2 → 0.23.4 - Mend

opencodekit 0.23.2 → 0.23.4

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (51)

package/README.md CHANGED

@@ -32,20 +32,13 @@ Use these inside OpenCode:
 ## Available Slash Commands (Template)
-- `/create`
-- `/start`
-- `/ship`
-- `/plan`
-- `/status`
-- `/pr`
-- `/resume`
-- `/handoff`
-- `/research`
-- `/review-codebase`
-- `/verify`
-- `/design`
-- `/ui-review`
-- `/init`
+- `/create` — Create a feature spec
+- `/plan` — Plan implementation architecture
+- `/ship` — Implement, verify, review, close
+- `/research` — Research a topic or codebase
+- `/fix` — Targeted bugfix
+- `/verify` — Run verification gates
+- `/init` — Initialize project setup (run once)
 ## CLI Command Surface (`ock`)

package/dist/index.js CHANGED

@@ -20,7 +20,7 @@ var __require = /* @__PURE__ */ createRequire(import.meta.url);
 //#endregion
 //#region package.json
-var version = "0.23.2";
+var version = "0.23.4";
 //#endregion
 //#region src/utils/license.ts

package/dist/template/.opencode/AGENTS.md CHANGED

@@ -13,8 +13,9 @@
 5. This `AGENTS.md`
 6. **Skills** — before non-trivial work, check the available skills list injected at session start. If a skill's purpose matches the task, load and follow it.
 7. Memory (`memory-search`, `observation`) — create observations, search/read memory
-8. Srcwalk (`srcwalk_search`, `srcwalk_read`, `srcwalk_files`, `srcwalk_map`, `srcwalk_callers`, `srcwalk_callees`, `srcwalk_flow`, `srcwalk_deps`, `srcwalk_impact`) — code navigation and intelligence
-9. Project files and codebase evidence
+8. Csearch (`csearch`) — multi-keyword code chunk search with BM25 ranking. Provide specific keywords; returns complete function/class code chunks ranked by relevance. Use when you need to find code by what it does, not by exact symbol name.
+9. Srcwalk (`srcwalk_read`, `srcwalk_map`, `srcwalk_callers`, `srcwalk_callees`, `srcwalk_flow`, `srcwalk_deps`, `srcwalk_impact`) — structural code navigation (file reading, call graphs, import analysis, directory maps, blast-radius triage). Use when you have a known file/symbol and need to trace connections.
+10. Project files and codebase evidence
 If sources conflict, state the conflict explicitly. Official docs > code > blog posts > AI-generated content.
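The Csearch entry ranks code chunks with BM25. To show roughly what that ranking does, here is a generic Okapi BM25 scorer. The chunk shape, the tokenizer, and the k1 = 1.2 / b = 0.75 parameters are assumptions and are not taken from csearch itself.

```typescript
// Generic Okapi BM25 over pre-split code chunks (illustrative, not csearch's code).
// Assumptions: one chunk per function/class, lowercase split on non-identifier
// characters, and the common defaults k1 = 1.2, b = 0.75.
type Chunk = { id: string; text: string };

const tokenize = (s: string): string[] =>
  s.toLowerCase().split(/[^a-z0-9_]+/).filter((t) => t.length > 0);

function bm25Rank(chunks: Chunk[], keywords: string[], k1 = 1.2, b = 0.75) {
  const docs = chunks.map((c) => tokenize(c.text));
  const n = docs.length;
  const avgLen = docs.reduce((sum, d) => sum + d.length, 0) / Math.max(n, 1);
  const terms = ([] as string[]).concat(...keywords.map(tokenize));

  // Document frequency: number of chunks containing each query term.
  const df = new Map<string, number>();
  for (const t of terms) df.set(t, docs.filter((d) => d.includes(t)).length);

  const scored = chunks.map((chunk, i) => {
    const doc = docs[i];
    let score = 0;
    for (const t of terms) {
      const tf = doc.filter((w) => w === t).length;
      if (tf === 0) continue;
      const dft = df.get(t) ?? 0;
      const idf = Math.log(1 + (n - dft + 0.5) / (dft + 0.5)); // always-positive IDF variant
      const norm = k1 * (1 - b + (b * doc.length) / avgLen); // document-length normalization
      score += idf * ((tf * (k1 + 1)) / (tf + norm));
    }
    return { id: chunk.id, score };
  });
  // Highest score first; chunks matching no keyword are dropped.
  return scored.filter((s) => s.score > 0).sort((x, y) => y.score - x.score);
}
```

Because this tokenizer splits only on non-identifier characters, `parseToken` stays a single token. A real code search would probably also split camelCase and snake_case identifiers.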
@@ -45,9 +46,11 @@ This is the compressed always-on execution loop. Keep these six rules active eve
 ## Core Operating Principles
 ### Default to Action
 If intent is clear and constraints permit, act. Escalate only when blocked or materially uncertain. **Provide options, not excuses** — don't say "it can't be done"; describe the constraint and the path forward.
 ### Scope Discipline
 - Stay in scope; no speculative refactors
 - Read files before editing
 - Complexity is incremental. **Don't live with broken windows:** fix bad design in code you're changing. Isolate damage if you can't fix now.
@@ -56,6 +59,7 @@ If intent is clear and constraints permit, act. Escalate only when blocked or ma
 - Delegate when work is large, uncertain, or cross-domain
 ### Complexity First
 The primary goal of software design is to minimize complexity. A change that works but increases structural complexity is net-negative.
 - Default to the simplest viable solution
@@ -67,6 +71,7 @@ The primary goal of software design is to minimize complexity. A change that wor
 - **Distrust the prompt's diagnosis** — independently verify user-provided analysis. Confident prose is not proof.
 ### Code Quality Gate
 - Correct behavior + edge cases
 - Minimal scope — no drive-by refactors
 - Meaningful tests; tests must fail if behavior breaks
@@ -80,12 +85,14 @@ Reject changes that worsen overall code health.
 ---
 ## Verification Before Completion
 - No success claims without fresh evidence. Run typecheck/lint/test/build after meaningful changes.
 - **If you create or modify a test file, run that test file directly and iterate until it passes.**
 - If verification fails twice on the same approach, stop and escalate.
 - **Auto-detect project toolchain** — look for `package.json`, `Cargo.toml`, `pyproject.toml`, `go.mod`, `Makefile`, etc.
 ### Fallow Codebase Gate
 Before committing or claiming completion, run the Fallow codebase gate to catch structural issues that linters and type checkers cannot see:
 - Run **`npx fallow audit --format json --quiet`** and check the `verdict`. If `"fail"`, resolve all findings before proceeding.
@@ -98,6 +105,7 @@ Fallow builds a complete module graph (Rust-native, sub-second). Its analysis is
 See `.opencode/context/fallow.md` for the full command reference.
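The gate comes down to three steps: run the audit, parse the JSON, and block on `"fail"`. Below is a minimal Node.js sketch. It relies only on the `verdict` field named above. It also assumes that a failing audit may exit non-zero while still printing its JSON report, which is an assumption about the CLI rather than documented behavior.

```typescript
import { execFileSync } from "node:child_process";

// Assumption: the report has a top-level string `verdict`; only "fail" blocks completion.
function parseFallowVerdict(stdout: string): { verdict: string; pass: boolean } {
  const report = JSON.parse(stdout) as { verdict?: unknown };
  if (typeof report.verdict !== "string") {
    throw new Error("fallow report has no string `verdict` field");
  }
  return { verdict: report.verdict, pass: report.verdict !== "fail" };
}

function runFallowGate(cwd: string): { verdict: string; pass: boolean } {
  let stdout: string;
  try {
    stdout = execFileSync("npx", ["fallow", "audit", "--format", "json", "--quiet"], {
      cwd,
      encoding: "utf8",
    });
  } catch (err) {
    // Assumption: a failing audit may exit non-zero but still print JSON on stdout.
    const out = (err as { stdout?: string }).stdout;
    if (!out) throw err;
    stdout = out;
  }
  return parseFallowVerdict(stdout);
}
```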
 ## Tool Discipline
 - Use tools whenever they materially improve correctness. Keep calling until the task is complete **and** verified.
 - If a tool returns empty, partial, or suspiciously narrow results, try 1-2 fallback strategies before reporting "no results found."
 - Check prerequisite steps before acting — don't skip discovery because the final action seems obvious.
@@ -105,25 +113,85 @@ See `.opencode/context/fallow.md` for the full command reference.
 - **Before meaningful edits and verification commands, send one sentence describing the immediate action.** Make the call in the same turn — don't ask "shall I?" unless blocked.
 ## Skills Protocol
 Before implementing any non-trivial task, check the available skills list injected at session start. If a skill's description matches the current task, load its `SKILL.md` and follow its instructions before proceeding. Skills provide pre-verified, specialized workflows — using them is faster and safer than ad-hoc implementation.
 When the task spans multiple domains, load all matching skills. If skill instructions conflict, ask the user for guidance. Do not skip this step for tasks that clearly match a skill's purpose.
+### Skill Tiers
+Skills follow a 2-tier classification (see `.opencode/skill/manifest.json`):
+**Tier 1 — Essential (always consider first):**
+- `behavioral-kernel` — core execution discipline (no silent assumptions, smallest change, surgical diffs)
+- `defense-in-depth` — validate at every layer, make bad state structurally impossible
+- `incremental-implementation` — thin vertical slices, verify after each
+- `verification-before-completion` — never claim success without fresh evidence
+Load one of these whenever the task matches their purpose. They're small and general-purpose.
+**Tier 2 — On-Demand (load when the task domain matches):**
+All skills in `.opencode/skill/` not listed as Tier 1. Covers UI, testing, debugging, design, workflow, platform integration. Load when the task description references the skill's domain.
+## Workflow Execution
+Workflows are markdown files in `.opencode/workflows/` that define multi-phase, multi-agent execution plans.
+**Execution steps:**
+1. Read workflow from `.opencode/workflows/<name>.md`
+2. Execute phases in order using `task()` with specified agent type
+3. For parallel phases: spawn multiple `task()` calls concurrently, aggregate results
+4. Replace `{phase_N_output}` with actual output, `{variable}` with user arguments
+5. Handle errors: retry up to 2 times, then abort
+6. Main agent performs final synthesis/merge (no subagent needed)
+**Built-in workflows:**
+- `deep-research` — Multi-angle web research with cross-checking
+- `audit-pattern` — Codebase pattern audit with prioritized remediation
+- `batch-implement` — Parallel multi-file implementation with review and merge
+- `development-lifecycle-workflow` — Full feature development with parallelism
+**Workflow composition:** When a phase specifies `Workflow: <name>`, read and execute that workflow's phases, passing current phase output as input.
+**Example:**
+```
+Read .opencode/workflows/deep-research.md
+Phase 1: spawn 8 @scout agents → aggregate findings
+Phase 2: spawn @review agents → aggregate verified facts
+Main agent synthesizes final report from Phase 2 output
+```
+**Composition example:**
+```
+Read .opencode/workflows/development-lifecycle-workflow.md
+Phase 1: spawn 3 @scout agents
+Phase 2: spawn 2 @review agents
+Phase 3: spawn 1 @plan agent
+Phase 4: execute batch-implement workflow with plan
+Phase 5: spawn 3 @review agents
+```
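The execution steps above can be sketched as a small runner. The markdown workflow format is not specified in detail here, so this sketch assumes phases have already been parsed into objects. `TaskFn` is a hypothetical stand-in for the `task()` tool. Only three pieces follow the listed steps directly: placeholder substitution, retry-then-abort, and parallel fan-out with aggregation.

```typescript
// Hypothetical shapes: phases are pre-parsed objects; `task` stands in for task().
type Phase = { agent: string; prompt: string; count: number; parallel: boolean };
type TaskFn = (agent: string, prompt: string) => Promise<string>;

// Step 4: fill {phase_N_output} from earlier phases and {variable} from user args.
function fillTemplate(tpl: string, outputs: string[], args: Record<string, string>): string {
  return tpl.replace(/\{([A-Za-z0-9_]+)\}/g, (whole, key: string) => {
    const m = /^phase_(\d+)_output$/.exec(key);
    if (m) return outputs[Number(m[1]) - 1] ?? whole; // phases are numbered from 1
    return args[key] ?? whole; // unknown placeholders are left untouched
  });
}

// Step 5: one attempt plus up to 2 retries, then abort by rethrowing the last error.
async function withRetry<T>(fn: () => Promise<T>, retries = 2): Promise<T> {
  let lastErr: unknown;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastErr = err;
    }
  }
  throw lastErr;
}

// Steps 2-3: run phases in order; parallel phases fan out concurrently and aggregate.
async function runWorkflow(phases: Phase[], args: Record<string, string>, task: TaskFn) {
  const outputs: string[] = [];
  for (const phase of phases) {
    const prompt = fillTemplate(phase.prompt, outputs, args);
    const call = () => withRetry(() => task(phase.agent, prompt));
    const results: string[] = [];
    if (phase.parallel) {
      results.push(...(await Promise.all(Array.from({ length: phase.count }, call))));
    } else {
      for (let i = 0; i < phase.count; i++) results.push(await call());
    }
    outputs.push(results.join("\n\n")); // aggregated output becomes {phase_N_output}
  }
  return outputs; // step 6: the main agent synthesizes from these outputs
}
```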
 ## Plan Quality Gate
 Before approving or executing any implementation plan: write the plan to `.opencode/artifacts/<slug>/plan.md` and the tracking checklist to `.opencode/artifacts/<slug>/progress.md`. The plan MUST contain a `## Discovery` section with substantive research findings. No boilerplate. If missing, research first. Track implementation progress in `progress.md` to make work visible, reviewable, and resumable across sessions.
 ---
 ## Hard Constraints (Never Violate)
-| Constraint | Rule |
-|---|---|
-| Security | Never expose or invent credentials |
-| Git Safety | Never force push main/master; never bypass hooks |
-| Git Restore | Never run `reset --hard`, `checkout .`, `clean -fd` without explicit user request |
-| Honesty | Never fabricate tool output; never guess URLs; label inferences; state source conflicts |
-| Paths | Use absolute paths for file operations |
-| Reversibility | Ask first before destructive or irreversible actions |
+| Constraint    | Rule                                                                                    |
+| ------------- | --------------------------------------------------------------------------------------- |
+| Security      | Never expose or invent credentials                                                      |
+| Git Safety    | Never force push main/master; never bypass hooks                                        |
+| Git Restore   | Never run `reset --hard`, `checkout .`, `clean -fd` without explicit user request       |
+| Honesty       | Never fabricate tool output; never guess URLs; label inferences; state source conflicts |
+| Paths         | Use absolute paths for file operations                                                  |
+| Reversibility | Ask first before destructive or irreversible actions                                    |
 ---
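The Git rows of the table are the kind of rule a pre-execution hook can check mechanically. Below is a naive sketch that assumes the command arrives as a single shell string. `gitViolation` is hypothetical and is not the shipped `guard.ts` logic. It cannot see the current branch, so a bare `git push --force` while on main is not caught.

```typescript
// Hypothetical check for the Git Safety and Git Restore rows. Naive whitespace
// tokenization: no shell quoting, aliases, or `git -C <dir>` handling.
function gitViolation(command: string, userRequested = false): string | null {
  const [bin, sub, ...rest] = command.trim().split(/\s+/);
  if (bin !== "git") return null;
  const hasLong = (prefix: string) => rest.some((a) => a.startsWith(prefix));
  // Matches bundled short flags, e.g. short("f") is true for "-f" and "-fd".
  const short = (c: string) => rest.some((a) => /^-[A-Za-z]+$/.test(a) && a.includes(c));

  // Git Safety: never allowed, even on request.
  const touchesMain = rest.some((a) => /(^|:)(main|master)$/.test(a));
  if (sub === "push" && (hasLong("--force") || short("f")) && touchesMain) {
    return "force push to main/master";
  }
  if (hasLong("--no-verify")) return "bypassing hooks";

  // Git Restore: allowed only when the user explicitly asked for it.
  const destructive =
    (sub === "reset" && hasLong("--hard")) ||
    (sub === "checkout" && rest.includes(".")) ||
    (sub === "clean" && short("f") && short("d"));
  if (destructive && !userRequested) return `destructive restore: git ${sub}`;
  return null;
}
```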
@@ -142,14 +210,14 @@ Delegate when specialist context, isolation, or parallelism improves correctness
 ## Delegation Policy
-| Agent | Use For |
-|---|---|
-| `@general` | Small implementation tasks |
-| `@explore` | Codebase search and patterns |
-| `@scout` | External docs/research |
-| `@review` | Correctness/security/debug review |
-| `@plan` | Architecture and execution plans |
-| `@vision` | UI/UX and accessibility judgment |
+| Agent      | Use For                           |
+| ---------- | --------------------------------- |
+| `@general` | Small implementation tasks        |
+| `@explore` | Codebase search and patterns      |
+| `@scout`   | External docs/research            |
+| `@review`  | Correctness/security/debug review |
+| `@plan`    | Architecture and execution plans  |
+| `@vision`  | UI/UX and accessibility judgment  |
 **Parallelism rule:** Parallel subagents for 3+ independent tasks; otherwise sequential.
@@ -163,7 +231,7 @@ Subagent self-reports are not sufficient. After any subagent reports success:
 4. Confirm the agent stayed within scope
 ```
-✅ Agent reports → Read diff → Verify → Check criteria → Accept
+Agent reports → Read diff → Verify → Check criteria → Accept
 ```
 Subagent results must include: **status**, **files modified**, **verification evidence**, **summary**, **blockers** (if any).
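The required result fields support a mechanical first pass before the manual diff review. Below is a sketch with hypothetical field names, since the policy lists the contents but not a format. It also checks the reported files against the paths the subagent was allowed to touch. An empty result only means the report is well-formed. The diff still has to be read and verification re-run.

```typescript
// Hypothetical field names; the policy names the contents, not a wire format.
type SubagentResult = {
  status?: "success" | "partial" | "blocked";
  filesModified?: string[];
  verificationEvidence?: string;
  summary?: string;
  blockers?: string[]; // optional: present only when something is blocking
};

// Returns reasons for extra skepticism. An empty list is not acceptance: the diff
// still has to be read and verification re-run.
function reviewReport(r: SubagentResult, allowedPaths: string[]): string[] {
  const problems: string[] = [];
  if (!r.status) problems.push("missing status");
  if (!r.filesModified) problems.push("missing files modified");
  if (!r.verificationEvidence?.trim()) problems.push("missing verification evidence");
  if (!r.summary?.trim()) problems.push("missing summary");
  // Scope check: every modified file must sit under an allowed path.
  for (const f of r.filesModified ?? []) {
    const inScope = allowedPaths.some((p) => f === p || f.startsWith(p.endsWith("/") ? p : p + "/"));
    if (!inScope) problems.push(`out of scope: ${f}`);
  }
  if (r.status === "success" && (r.blockers?.length ?? 0) > 0) {
    problems.push("claims success but lists blockers");
  }
  return problems;
}
```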
@@ -173,11 +241,13 @@ When a subagent returns without this structure, treat the response with extra sk
 ---
 ## Question Policy
 Ask only when ambiguity materially changes the outcome or the action is destructive. Keep questions targeted. Prefer a reversible action or narrow assumption when it can resolve the ambiguity safely.
 ---
 ## Web Retrieval Priority
 1. `context7` — official library/framework docs
 2. `websearch` / `codesearch` — discover URLs
 3. `web_fetch` — read result URL as markdown
@@ -187,6 +257,7 @@ Ask only when ambiguity materially changes the outcome or the action is destruct
 ---
 ## Edit Protocol
 1. **LOCATE** — find exact position of what must change
 2. **READ** — get fresh file content around the target
 3. **VERIFY** — confirm expected content exists
@@ -200,6 +271,7 @@ Prefer `edit` for modifications; reserve `write` for new files or deliberate ful
 ---
 ## Context Management
 - Keep context high-signal
 - Use DCP/VCC tools to compress completed phases and recover targeted history
 - After any context compaction, re-read: (1) this `AGENTS.md`, (2) the current task details, (3) active state
@@ -210,6 +282,7 @@ Prefer `edit` for modifications; reserve `write` for new files or deliberate ful
 ---
 ## Output Style
 - Be concise and direct. Cite concrete file paths and line numbers.
 - **No cheerleading** — no filler, no artificial reassurance
 - **Never narrate abstractly** — explain what you're doing, not that you're "going to look into it"
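The first three Edit Protocol steps (LOCATE, READ, VERIFY) can be combined into a guarded single-occurrence replace. Below is a minimal Node.js sketch. The remaining protocol steps are elided in this diff, so the final re-read is an assumption about what a post-edit check might look like. `guardedEdit` is hypothetical.

```typescript
import { readFileSync, writeFileSync } from "node:fs";

// Hypothetical helper: LOCATE → READ → VERIFY, then a single exact replacement.
function guardedEdit(path: string, expected: string, replacement: string): void {
  const fresh = readFileSync(path, "utf8"); // READ: never edit from stale content
  const first = fresh.indexOf(expected); // LOCATE: exact position of the target
  if (first === -1) throw new Error(`expected content not found in ${path}`); // VERIFY
  if (fresh.indexOf(expected, first + 1) !== -1) {
    throw new Error(`expected content matches more than once in ${path}`);
  }
  const updated = fresh.slice(0, first) + replacement + fresh.slice(first + expected.length);
  writeFileSync(path, updated, "utf8");
  // Assumed post-edit check: re-read and confirm the replacement landed.
  if (!readFileSync(path, "utf8").includes(replacement)) {
    throw new Error(`edit did not persist in ${path}`);
  }
}
```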

package/dist/template/.opencode/README.md CHANGED

@@ -9,11 +9,12 @@ This directory contains project-specific OpenCode configuration: agents, command
 ├── AGENTS.md                # Global operating rules for agents
 ├── opencode.json            # OpenCode runtime configuration
 ├── dcp.jsonc                # Dynamic context pruning settings
-├── agent/                   # Agent definitions (9)
-├── command/                 # Slash commands (14)
+├── agent/                   # Agent definitions (7)
+├── command/                 # Slash commands (6)
 ├── skill/                   # Skill library used by agents/commands
 ├── tool/                    # Custom tools (memory, swarm, research, etc.)
 ├── plugin/                  # OpenCode plugins and plugin-local SDK code
+├── workflows/               # Multi-agent orchestration plans (markdown)
 ├── memory/                  # Memory templates + project memory files
 └── .env.example             # Environment variable template
 ```
@@ -34,11 +35,30 @@ Add the keys you actually need for enabled services.
 ## Skills
-Skills live in `.opencode/skill/` and are loaded on demand with `skill({ name: "..." })`.
+Skills live in `.opencode/skill/` and are loaded on demand with `skill({ name: "..." })`. They follow a 3-tier system (see `manifest.json`):
-- Core workflow examples: `verification-before-completion`, `writing-plans`, `executing-plans`
-- Debug/reliability examples: `systematic-debugging`, `root-cause-tracing`, `defense-in-depth`
-- UI/design examples: `frontend-design`, `visual-analysis`, `accessibility-audit`
+**Tier 1 — Essential** — Always consider first; general-purpose execution discipline.
+- `behavioral-kernel`, `defense-in-depth`, `incremental-implementation`, `verification-before-completion`
+**Tier 2 — On-Demand** — Load when the task domain matches:
+- *UI/design*: `frontend-design`, `design-taste-frontend`, `minimalist-ui`, `high-end-visual-design`, `industrial-brutalist-ui`, `accessibility-audit`, `redesign-existing-projects`, `mockup-to-code`
+- *Testing*: `test-driven-development`, `testing-anti-patterns`, `browser-testing-with-devtools`, `playwright`
+- *Debugging*: `debugging-and-error-recovery`, `root-cause-tracing`, `defense-in-depth`, `fallow`
+- *Workflow*: `spec-driven-development`, `planning-and-task-breakdown`, `subagent-driven-development`, `development-lifecycle`, `git-workflow-and-versioning`, `shipping-and-launch`
+- *Code quality*: `code-review-and-quality`, `agent-code-quality-gate`, `code-cleanup`, `deep-module-design`
+- *Platform*: `supabase`, `resend`, `polar`, `cloudflare`*, `jira`, `figma`, `vercel-deploy-claimable`
+- *Docs/design*: `documentation-and-adrs`, `deprecation-and-migration`, `api-and-interface-design`, `brainstorming`, `grill-me`
+- *Research*: `opensrc`, `webclaw`, `pdf-extract`, `gemini-large-context`
+- *Navigation*: `srcwalk`
+- *Etc*: `ci-cd-and-automation`, `security-and-hardening`, `performance-optimization`, `source-driven-development`, `writing-skills`
+**Tier 3 — Platform Reference** — Large reference directories (not shipped by default). Install on demand:
+```
+.opencode/scripts/install-skill.sh <name>
+```
+\* `cloudflare` is tier-3 (257 files), listed here for discoverability. Run `install-skill.sh` to install.
 ## Custom Tools
@@ -61,6 +81,23 @@ Current plugin source files in `.opencode/plugin/`:
 See `.opencode/plugin/README.md` for plugin details.
+## Workflows
+Workflows live in `.opencode/workflows/` and define reusable multi-agent orchestration plans. Each is a markdown file that specifies phases with agent types, concurrency, dependencies, and prompt templates.
+**Built-in workflows:**
+- `deep-research` — Fan out 8 search agents, cross-check findings, synthesize a cited report
+- `audit-pattern` — Discover code pattern occurrences, audit each, produce remediation report
+- `batch-implement` — Parallel task implementation with review and merge phases
+**Usage:**
+1. Read the workflow file from `.opencode/workflows/<name>.md`
+2. Execute each phase via `task()` with the specified agent type and prompt
+3. For parallel phases, spawn multiple `task()` calls concurrently
+4. Replace `{phase_N_output}` placeholders with actual output from completed phases
+New workflows: add a `.md` file to `.opencode/workflows/` following the same structure. See `AGENTS.md` for execution details.
 ## Guardrails
 - Keep edits focused; avoid changing generated output under `dist/`.

package/dist/template/.opencode/artifacts/harness-workflows/plan.md ADDED

@@ -0,0 +1,317 @@
+# Harness Redesign: Workflows + Surface Area Reduction
+## TL;DR
+Add 1 plugin (~300 lines), 1 directory (`.opencode/workflows/`), cut the template from 800+ files to ~80 essential files, lazy-load the rest. The result is a harness that is strictly better than Claude Code's: more flexible, more verifiable, and fully extensible.
+---
+## Discovery
+### Current State (Brutal)
+| Category | File Count | Problem |
+|---|---|---|
+| Agents | 7 files | `build` (main agent) and `general` (subagent default) are distinct roles. Keep both. |
+| Commands | 17 files | ~10 of these will never be invoked. They bloat context on every `/init`. |
+| Skills | 50+ dirs | Cloudflare is 280 files. React best-practices is 50 files. Core Data is 15 files. SwiftUI is 17 files. **If you don't use these, they're dead weight in the skill index.** |
+| Plugins | ~20 files + Copilot SDK | All plugins including Copilot SDK stay — they're part of the core stack. |
+| State/Artifacts | ~10 files | Workable — needed for the beads lifecycle. |
+| DCP prompts | 9 files | 9 carefully tuned compression prompts. Keep. |
+| `src/` (CLI) | 25 files | Fine. This is the `ock` CLI surface. Keep. |
+**Total: ~800 files.** A new user has no idea where to start. The `README.md` lists 14 slash commands and 7 agents — the cognitive load before the first prompt is too high.
+### What OpenCode Already Has That Claude Code Doesn't
+1. **Plugin API hooks**: `tool.execute.before`, `experimental.chat.system.transform`, `experimental.session.compacting`, `message.part.updated`. Claude Code has file-based extension only.
+2. **4-tier memory pipeline**: capture → distill → curate → inject. Claude Code has auto memory (model writes to a file).
+3. **Tool constraints per subagent**: `explore` literally cannot edit files — the runtime enforces it. Claude Code uses prompt-level restrictions.
+4. **Fallow codebase gate**: deterministic static analysis gating completion claims.
+5. **Worker distrust protocol**: the harness requires reading changed files and re-running verification after every subagent returns.
+### What OpenCode Is Missing vs Claude Code Workflows
+Claude Code's dynamic workflows provide:
+1. **A script that holds the plan** — orchestration lives in JavaScript, not the model's context window
+2. **Isolated runtime** — the script executes outside the conversation
+3. **Intermediate results in script variables** — not in context
+4. **Phase-level monitoring** — track agents per phase, token usage, elapsed time
+5. **Resumability** — cached agent results survive pauses
+6. **Cross-checking** — agents adversarially review each other's findings
+OpenCode's equivalent is the `subagent-driven-development` skill — a **markdown file** describing how to orchestrate subagents manually. This is a prompt, not a primitive. The model still holds the orchestration in context. For a 50-agent codebase audit, this hits the context wall.
+---
+## Design: Workflow Primitive
+### 1. Workflow File Format
+```
+.opencode/workflows/
+├── deep-research.ts      # Built-in
+├── audit-endpoints.ts    # User-created
+└── migration-runner.ts   # User-created
+```
+Each file exports a workflow definition:
+```typescript
+// .opencode/workflows/deep-research.ts
+import { defineWorkflow } from "../plugin/workflow/runtime.js"
+export default defineWorkflow({
+  name: "deep-research",
+  description: "Fan out web searches on a question, cross-check sources, return a cited report",
+  agents: 16,        // max concurrent agents
+  phases: [
+    {
+      name: "research",
+      parallel: true,
+      agents: 8,
+      prompt: "Search for different angles on: {question}"
+    },
+    {
+      name: "cross-check",
+      parallel: true,
+      agents: 4,
+      dependsOn: ["research"],
+      prompt: "Verify findings from research phase against each other"
+    },
+    {
+      name: "synthesize",
+      parallel: false,
+      agents: 1,
+      dependsOn: ["cross-check"],
+      prompt: "Write a final cited report from verified findings"
+    }
+  ]
+})
+```
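In the declarative form, `dependsOn` implies an execution order. The plan does not say how `dependsOn` is resolved. One way a runtime could derive the order, assumed here, is to group phases into waves, where each wave depends only on phases in earlier waves.

```typescript
// Assumes only the `name` and `dependsOn` fields from the example definition.
type PhaseDef = { name: string; dependsOn?: string[] };

// Groups phases into waves: every phase in a wave has all of its dependencies
// in earlier waves, so phases within one wave could start together.
function executionWaves(phases: PhaseDef[]): string[][] {
  const names = new Set(phases.map((p) => p.name));
  if (names.size !== phases.length) throw new Error("duplicate phase names");
  for (const p of phases) {
    for (const d of p.dependsOn ?? []) {
      if (!names.has(d)) throw new Error(`${p.name} depends on unknown phase ${d}`);
    }
  }
  const done = new Set<string>();
  const waves: string[][] = [];
  while (done.size < phases.length) {
    const ready = phases
      .filter((p) => !done.has(p.name) && (p.dependsOn ?? []).every((d) => done.has(d)))
      .map((p) => p.name);
    // No progress possible means the remaining phases depend on each other.
    if (ready.length === 0) throw new Error("dependency cycle between phases");
    ready.forEach((name) => done.add(name));
    waves.push(ready);
  }
  return waves;
}
```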
+**Alternative (more flexible) — function-based:**
+```typescript
+export default defineWorkflow({
+  name: "audit-endpoints",
+  async run({ task, args, log }) {
+    // Phase 1: discover endpoints
+    const endpoints = await task({
+      agent: "explore",
+      prompt: `Find all API route handlers matching pattern: ${args.pattern ?? "src/**/route.ts"}`
+    })
+    // Phase 2: audit in parallel
+    const results = await Promise.all(
+      parseEndpoints(endpoints).map(ep => task({
+        agent: "review",
+        prompt: `Audit ${ep.path} for: auth checks, input validation, error handling`
+      }))
+    )
+    // Phase 3: synthesize
+    return synthesize(results)
+  }
+})
+```
+The function-based form is more powerful. It lets the workflow script hold state, branch, and aggregate — exactly what Claude Code's workflows do.
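`Promise.all` in the function-based example starts every audit at once. The declarative form, by contrast, caps concurrency with `agents: 16`. A function-based script would need something like the hypothetical `mapWithLimit` helper below to respect the same cap.

```typescript
// Hypothetical helper: run `fn` over `items` with at most `limit` calls in flight,
// preserving input order in the results (like Promise.all, but bounded).
async function mapWithLimit<T, R>(
  items: T[],
  limit: number,
  fn: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  if (!Number.isInteger(limit) || limit < 1) throw new Error("limit must be a positive integer");
  const results = new Array<R>(items.length);
  let next = 0;
  // Each worker claims the next unclaimed index until none remain.
  const worker = async () => {
    while (next < items.length) {
      const i = next++;
      results[i] = await fn(items[i], i);
    }
  };
  await Promise.all(Array.from({ length: Math.min(limit, items.length) }, worker));
  return results;
}
```

Results keep input order, so swapping it in for `Promise.all(items.map(fn))` does not change how later phases aggregate them.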
+### 2. Workflow Runtime (~300 lines in a new plugin)
+The runtime is a single plugin file: `.opencode/plugin/workflow.ts`
+```
+plugin/workflow.ts                  — Plugin entry: tools + command registration
+plugin/workflow/runtime.ts          — Script loader + sandboxed executor
+plugin/workflow/monitor.ts          — Phase progress tracking via session-summary
+plugin/workflow/registry.ts         — List/save/load workflows from .opencode/workflows/
+```
+**Key interfaces:**
+```typescript
+// The runtime tool exposed to the model
+tool.workflow.run = {
+  name: "workflow-run",
+  description: "Run a workflow script that orchestrates multiple subagents",
+  parameters: {
+    workflow: string,     // name of workflow in .opencode/workflows/
+    args: Record<string, unknown>
+  },
+  execute: async ({ workflow, args }, context) => {
+    const script = await load(`.opencode/workflows/${workflow}.ts`)
+    const result = await sandboxedExecute(script, {
+      task: context.task,    // pass through the built-in task() tool
+      args,
+      log: context.log
+    })
+    return result
+  }
+}
+```
+**Sandboxed execution** means the workflow script runs in a separate context with its own `task()` pool. It cannot directly read/edit/write files (only its subagents can). This prevents the orchestration script from corrupting state — the same constraint Claude Code's runtime enforces.
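The self-critique below proposes a per-workflow timeout and a maximum agent count. Below is a sketch of how a runtime might wrap the script's `task()` to enforce both. The names and the simplified `Task` signature are hypothetical. The sketch does not implement the file-access isolation described above, which would need a separate process or worker.

```typescript
// Hypothetical simplified task signature.
type Task = (req: { agent: string; prompt: string }) => Promise<string>;

// Enforces only an agent cap and a wall-clock timeout; no file-access isolation.
async function runGuarded<R>(
  script: (ctx: { task: Task }) => Promise<R>,
  task: Task,
  opts: { maxAgents: number; timeoutMs: number },
): Promise<R> {
  let spawned = 0;
  const cappedTask: Task = (req) => {
    spawned += 1;
    if (spawned > opts.maxAgents) {
      return Promise.reject(new Error(`agent cap of ${opts.maxAgents} exceeded`));
    }
    return task(req);
  };
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`workflow timed out after ${opts.timeoutMs} ms`)), opts.timeoutMs);
  });
  try {
    // Whichever settles first wins. Promises cannot be cancelled, so a timed-out
    // script keeps running in the background; only its result is discarded.
    return await Promise.race([script({ task: cappedTask }), timeout]);
  } finally {
    clearTimeout(timer);
  }
}
```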
+### 3. Integration Points
+**Plugin hooks:**
+- `tool.execute.before` — intercept `workflow-run` calls, route to runtime
+- `experimental.session.compacting` — preserve workflow run state across compaction
+- `experimental.chat.system.transform` — inject available workflow descriptions into context (progressive disclosure — only active ones, not all 50)
+**Existing surface to reuse:**
+- `task()` tool — already exists, workflows delegate to it
+- Artifacts — workflow results land in `.opencode/artifacts/<run-id>/`
+- Session summary — workflow phase progress is tracked via the existing session-summary plugin interface
+### 4. Built-in Workflows (ship 3)
+| Workflow | What it does | When to use |
+|---|---|---|
+| `/deep-research` | Fan out web searches across angles, cross-check sources, write cited report | Questions needing multi-source verification |
+| `/audit-pattern` | Explore codebase for a pattern, review each match, synthesize findings | "Find all X and check for Y" |
+| `/batch-implement` | Take a plan with independent tasks, dispatch one subagent per task, review each | Multi-file feature implementation |
+These replace ~5 of the 17 existing slash commands (research, review-codebase, fix, improve-architecture, refactor) with a single unified primitive.
+---
+## Design: Surface Area Reduction
+### 1. Keep `build` and `general` — distinct roles, no merge
+**Confirmed:** `build` is the main/primary agent for development sessions. `general` is the default subagent used by `task()`. They serve different routing purposes and both stay.
+**Action:** None — no merge needed. If anything, ensure `general.md` explicitly references `build.md` as its parent for context inheritance.
+### 2. Cut the command list from 17 to 6
+| Keep | Delete | Why |
+|---|---|---|
+| `/ship` | → Keep | Core workflow end |
+| `/plan` | → Keep | Core workflow middle |
+| `/create` | → Keep | Core workflow start |
+| `/verify` | → Keep | Verification gate |
+| `/research` | → Keep | Research command |
+| `/fix` | → Keep | Targeted bugfix |
+| | `/clarify` | Merged into `/plan` — the plan agent should clarify as part of planning |
+| | `/commit` | `git commit` is a mechanical action, not a command. Let the agent do it automatically at ship time |
+| | `/design` | Merged into `/plan` — architecture design is a phase of planning |
+| | `/explore` | Users type "find the auth logic", not "/explore auth logic" |
+| | `/improve-architecture` | Merged into `/plan --refactor` flag |
+| | `/init` | Keep but hide from command list — called once on setup |
+| | `/pr` | Merged into `/ship` — PR creation is the final phase |
+| | `/refactor` | Merged into `/plan --refactor` |
+| | `/review-codebase` | Replaced by `/audit` workflow |
+| | `/test` | Too narrow — users say "add tests" not "/test" |
+| | `/ui-review` | Merged into the verification phase of `/ship` |
+**Impact:** -11 files. The remaining 6 commands are discoverable and non-overlapping. Users learn `create → plan → ship` and everything else is a phase of those three.
+### 3. Skill triage: 3 tiers
+**Tier 1 — Essential (always loaded, in context):**
+- `behavioral-kernel` — core execution discipline
+- `code-navigation` — how to read code effectively
+- `verification-before-completion` — must-run gates
+- `incremental-implementation` — thin slices
+- `defense-in-depth` — structural safety
+**Tier 2 — On-demand (model loads when relevant, 5-10 files):**
+- `frontend-design`, `design-taste-frontend`, `minimalist-ui`, `high-end-visual-design`, `industrial-brutalist-ui`
+- `spec-driven-development`, `planning-and-task-breakdown`, `subagent-driven-development`
+- `documentation-and-adrs`, `deprecation-and-migration`
+- `testing-anti-patterns`, `test-driven-development`
+- `debugging-and-error-recovery`, `root-cause-tracing`
+- `browser-testing-with-devtools`, `playwright`
+- `code-review-and-quality`, `agent-code-quality-gate`
+- `git-workflow-and-versioning`, `shipping-and-launch`
+- `fallow`, `srcwalk`, `structured-edit`
+- ~10 design/UI skills
+- ~5 platform skills (supabase, resend, polar, cloudflare-postgres-basics)
+**Tier 3 — Platform reference (load only when the user confirms they build on that platform):**
+These are large reference directories. They should NOT ship in every template:
+- `cloudflare` — 280 files, 15+ sub-services. Add only if user selects "Cloudflare" in `init` wizard
+- `react-best-practices` — 50 files. Add only if user selects "React"
+- `supabase-postgres-best-practices` — 35 files. Add only if user selects "Supabase"
+- `core-data-expert` — 15 files. Add only if user selects "iOS/Core Data"
+- `swiftui-expert-skill` — 17 files. Add only if user selects "SwiftUI"
+- `swift-concurrency` — 15 files. Add only if user selects "Swift"
+**Impact:** Template drops from 800 files to ~100-150 for most users (Cloudflare alone is 280 files). The `init` wizard asks 3 questions and installs the right tier-3 skills.
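Each wizard selection listed above maps to exactly one tier-3 skill. Below is a sketch of that mapping. It emits commands for the `install-skill.sh` script mentioned in the README diff. The selection strings and function names are hypothetical.

```typescript
// Selection → tier-3 skill, taken from the list above. Selection strings are hypothetical.
const TIER3_BY_SELECTION = new Map<string, string>([
  ["Cloudflare", "cloudflare"],
  ["React", "react-best-practices"],
  ["Supabase", "supabase-postgres-best-practices"],
  ["iOS/Core Data", "core-data-expert"],
  ["SwiftUI", "swiftui-expert-skill"],
  ["Swift", "swift-concurrency"],
]);

// Deduplicated, sorted skill names; unknown selections install nothing.
function tier3Skills(selections: string[]): string[] {
  const skills = new Set<string>();
  for (const s of selections) {
    const skill = TIER3_BY_SELECTION.get(s);
    if (skill !== undefined) skills.add(skill);
  }
  return Array.from(skills).sort();
}

// One install command per skill, via the script named in the README.
function installCommands(selections: string[]): string[] {
  return tier3Skills(selections).map((name) => `.opencode/scripts/install-skill.sh ${name}`);
}
```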
+### 4. Plugin cleanup — all plugins stay
+**Confirmed:** All plugins including the Copilot provider/auth integration and SDK stay. They're part of the core stack.
+| Plugin | Keep? | Why |
+|---|---|---|
+| `memory.ts` + lib/ | ✅ Keep | Core 4-tier memory |
+| `session-summary.ts` | ✅ Keep | Anchored iterative summarization |
+| `sessions.ts` | ✅ Keep | Session search |
+| `skill-mcp.ts` | ✅ Keep | Skill MCP bridge |
+| `srcwalk.ts` | ✅ Keep | Code navigation |
+| `copilot-auth.ts` + `sdk/copilot/` | ✅ Keep | Copilot provider integration |
+| `prompt-leverage.ts` | ✅ Keep | Prompt framing |
+| `rtk.ts` | ❌ Removed | External dependency for marginal benefit — not earning its place in core stack |
+| `guard.ts` | ✅ Keep | Conventional commits + pipe-to-shell blocker |
+**Impact:** 0 deletions. The plugin surface stays intact.
+### 5. DCP and config cleanup
+| File | Keep? | Why |
+|---|---|---|
+| `dcp.jsonc` | ✅ Keep | Core compression settings |
+| `dcp-prompts/defaults/` (5 files) | ✅ Keep | Tuned compression prompts |
+| `dcp-prompts/overrides/` (2 files) | ✅ Keep | User overrides |
+| `tui.json` | ✅ Keep | TUI config |
+| `.env.example` | ✅ Keep | Environment reference |
+| `.template-manifest.json` | 🟡 Keep but hide | Build system internal |
+| `.version` | 🟡 Keep but hide | Build system internal |
+| `opencodex-fast.jsonc` | ❓ What is this? | If unused, delete |
+---
+## Implementation Effort
+| Item | Effort | Files Changed | Risk |
+|---|---|---|---|
+| Workflow runtime plugin | **M** (2-3 days) | ~4 new files (plugin + runtime + monitor + registry) | Medium — sandboxed execution has edge cases |
+| Phase monitoring via session-summary hook | **S** (half day) | ~2 files modified | Low — existing plugin interface |
+| Built-in workflows (deep-research, audit-pattern, batch-implement) | **S** each (half day each) | ~3 new workflow files | Low — all use existing `task()` |
+| Cut commands 17→6 | **S** (2 hours) | 11 deletions, update `ship.md`, README, and init command | Low — old commands unused |
+| Skill triage (tier system) | **M** (1-2 days) | Init wizard, skill metadata, lazy-loading config | Medium — changing skill loading has UX impact |
+| Tier-3 skill gate in init wizard | **M** (1 day) | Add questions to init wizard, conditional skill install | Low |
+| **Total** | **M overall** (1 week) | ~20 files changed | Medium |
+---
+## The Brutal Self-Critique
+**Where this design could fail:**
+1. **The workflow runtime adds complexity.** Every runtime has bugs. Error handling in multi-agent scripts is hard. If the runtime is flaky, the workflow feature hurts more than it helps. *Mitigation: keep the runtime under 300 lines, no dependencies, hard fail on uncaught exceptions.*
+2. **Cutting commands removes discoverability.** The 17 commands are a menu of "things Claude can do." Cutting to 6 means users need to know the workflow names. *Mitigation: `/help` should list available workflows + the 6 core commands.*
+3. **Skill triage creates friction.** If a user wants Cloudflare but didn't select it at init, they now need to know they can `skill install cloudflare`. That's an extra step. *Mitigation: the `init` wizard should have a "Browse skill marketplace" option that lazily loads the full list.*
+4. **The function-based workflow format is too powerful.** Giving workflow scripts full JavaScript means they can have bugs, infinite loops, and resource leaks. Claude Code limits workflows to declarative phases + pre-defined templates. *Mitigation: impose a timeout per workflow, max agent count, and disallow raw `while(true)` via sandbox. Add a `maxAgents: 1000` cap matching Claude Code's.*
+5. **The template gets smaller but the init wizard gets bigger.** Shifting complexity from file count to an interactive wizard is a tradeoff, not a pure win. If the wizard is bad, users have a worse experience than a big file tree. *Mitigation: the wizard asks exactly 3 questions (project type, target platform, optional skills). No more.*
+---
+## Acceptance Criteria
+1. **Workflow runtime works**: `workflow-run deep-research "What changed in Node.js v20-v22"` fans out 8 search agents, cross-checks, returns a cited report
+2. **Workflow scripts are saveable**: `workflow-save` stores the current run's script as a reusable command
+3. **Surface area measured**: template ships with ≤150 files (down from 800+)
+4. **Init wizard working**: `ock init` asks 3 questions → installs only matching tier-3 skills
+5. **All existing `/ship` flows still pass**: no regressions from the 17→6 command cut
+6. **Models actually use workflows**: functional test where a prompt containing "audit" triggers a workflow instead of a single-agent turn