npm - @kontourai/flow-agents - Versions diffs - 1.1.0 → 1.3.0 - Mend

@kontourai/flow-agents 1.1.0 → 1.3.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (119)

package/.github/workflows/ci.yml +6 -1
package/.github/workflows/kit-gates-demo.yml +6 -2
package/.github/workflows/runtime-compat.yml +5 -2
package/CHANGELOG.md +51 -0
package/CONTRIBUTING.md +30 -0
package/README.md +26 -5
package/agents/dev.json +1 -1
package/agents/tool-planner.json +1 -1
package/build/src/cli/{flow-kit.js → kit.js} +122 -108
package/build/src/cli/validate-source-tree.js +4 -4
package/build/src/cli/workflow-sidecar.js +70 -5
package/build/src/cli.js +3 -3
package/build/src/flow-kit/validate.js +89 -62
package/build/src/tools/build-universal-bundles.js +78 -17
package/build/src/tools/generate-context-map.js +49 -7
package/build/src/tools/validate-source-tree.js +32 -1
package/console.telemetry.json +1 -1
package/docs/adr/0004-gates-expect-surface-claims.md +7 -7
package/docs/adr/0007-flow-skill-kit-tool-boundary.md +169 -0
package/docs/adr/0007-skill-audit.md +112 -0
package/docs/adr/0008-kit-operation-boundary.md +88 -0
package/docs/context-map.md +18 -22
package/docs/flow-kit-repository-contract.md +5 -5
package/docs/getting-started.md +177 -0
package/docs/index.md +19 -8
package/docs/kit-authoring-guide.md +125 -13
package/docs/knowledge-kit.md +2 -2
package/docs/operating-layers.md +2 -2
package/docs/spec/runtime-hook-surface.md +1 -1
package/docs/veritas-integration.md +4 -4
package/docs/vision.md +1 -1
package/docs/workflow-eval-strategy.md +2 -2
package/docs/workflow-usage-guide.md +2 -2
package/evals/acceptance/test_opencode_harness.sh +18 -10
package/evals/acceptance/test_pi_harness.sh +10 -6
package/evals/ci/run-baseline.sh +1 -1
package/evals/fixtures/builder-kit-workflow-state/happy-path.json +2 -2
package/evals/fixtures/builder-kit-workflow-state/mid-work-resume.json +2 -2
package/evals/fixtures/console-learning-projection/artifacts/console-learning-correction/learning.json +1 -1
package/evals/fixtures/flow-kit-repository/mixed-runtime-kit/flows/runtime.flow.json +4 -4
package/evals/fixtures/flow-kit-repository/valid-local-kit/flows/review.flow.json +4 -4
package/evals/fixtures/kit-conformance-levels/k0-flows-only/flows/review.flow.json +4 -4
package/evals/fixtures/kit-conformance-levels/k1-agent-extension/flows/build.flow.json +4 -4
package/evals/fixtures/kit-conformance-levels/k2-with-evals/flows/synthesize.flow.json +4 -4
package/evals/fixtures/kit-conformance-levels/third-party-extension/flows/review.flow.json +4 -4
package/evals/fixtures/pull-work-provider/github-issues.json +5 -5
package/evals/fixtures/surface-trust/accepted-claim-trust-report.json +2 -2
package/evals/fixtures/surface-trust/artifact-absent.json +2 -2
package/evals/fixtures/surface-trust/integrity-mismatch-trust-report.json +2 -2
package/evals/fixtures/surface-trust/missing-authority-trust-report.json +2 -2
package/evals/fixtures/surface-trust/provider-absent.json +2 -2
package/evals/fixtures/surface-trust/rejected-claim-trust-report.json +2 -2
package/evals/fixtures/surface-trust/stale-claim-trust-snapshot.json +2 -2
package/evals/integration/test_activate_npx_context.sh +2 -2
package/evals/integration/test_bundle_install.sh +17 -12
package/evals/integration/test_console_learning_projection.sh +2 -2
package/evals/integration/test_flow_kit_install_git.sh +7 -7
package/evals/integration/test_flow_kit_repository.sh +4 -4
package/evals/integration/test_goal_fit_hook.sh +144 -0
package/evals/integration/test_kit_conformance_levels.sh +56 -2
package/evals/integration/test_local_flow_kit_install.sh +7 -7
package/evals/integration/test_publish_change_helper.sh +1 -1
package/evals/integration/test_pull_work_provider.sh +1 -1
package/evals/integration/test_runtime_adapter_activation.sh +3 -3
package/evals/integration/test_workflow_sidecar_writer.sh +9 -9
package/evals/lib/node.sh +2 -2
package/evals/static/test_package.sh +3 -3
package/evals/static/test_workflow_skills.sh +19 -19
package/integrations/strands/flow_agents_strands/steering.py +1 -1
package/integrations/strands-ts/src/hooks.ts +1 -1
package/kits/builder/flows/build.flow.json +48 -48
package/kits/builder/flows/shape.flow.json +36 -36
package/kits/builder/kit.json +17 -0
package/{skills → kits/builder/skills}/builder-shape/SKILL.md +4 -4
package/{skills → kits/builder/skills}/idea-to-backlog/SKILL.md +1 -1
package/kits/knowledge/adapters/obsidian-store/index.js +137 -26
package/kits/knowledge/evals/contract-suite/suite.test.js +90 -0
package/kits/knowledge/flows/compile.flow.json +12 -12
package/kits/knowledge/flows/consolidate.flow.json +16 -16
package/kits/knowledge/flows/ingest.flow.json +12 -12
package/kits/knowledge/flows/retire.flow.json +16 -16
package/kits/knowledge/flows/store-contract.flow.json +12 -12
package/kits/knowledge/flows/synthesize.flow.json +16 -16
package/kits/knowledge/kit.json +16 -9
package/kits/release-evidence/flows/release-evidence.flow.json +3 -3
package/package.json +11 -5
package/packaging/packs.json +1 -21
package/schemas/workflow-evidence.schema.json +2 -1
package/scripts/README.md +1 -1
package/scripts/hooks/stop-goal-fit.js +66 -18
package/scripts/kit.js +2 -0
package/skills/README.md +23 -0
package/src/cli/{flow-kit.ts → kit.ts} +124 -109
package/src/cli/validate-source-tree.ts +4 -4
package/src/cli/workflow-sidecar.ts +62 -4
package/src/cli.ts +3 -3
package/src/flow-kit/validate.ts +118 -58
package/src/tools/build-universal-bundles.ts +74 -13
package/src/tools/generate-context-map.ts +36 -6
package/src/tools/validate-source-tree.ts +27 -1
package/scripts/flow-kit.js +0 -2
package/skills/context-budget/SKILL.md +0 -40
package/skills/explore/SKILL.md +0 -137
package/skills/feedback-loop/SKILL.md +0 -87
package/skills/frontend-design/SKILL.md +0 -80
package/{skills → kits/builder/skills}/deliver/SKILL.md +0 -0
package/{skills → kits/builder/skills}/design-probe/SKILL.md +0 -0
package/{skills → kits/builder/skills}/evidence-gate/SKILL.md +0 -0
package/{skills → kits/builder/skills}/execute-plan/SKILL.md +0 -0
package/{skills → kits/builder/skills}/fix-bug/SKILL.md +0 -0
package/{skills → kits/builder/skills}/learning-review/SKILL.md +0 -0
package/{skills → kits/builder/skills}/pickup-probe/SKILL.md +0 -0
package/{skills → kits/builder/skills}/plan-work/SKILL.md +0 -0
package/{skills → kits/builder/skills}/pull-work/SKILL.md +0 -0
package/{skills → kits/builder/skills}/release-readiness/SKILL.md +0 -0
package/{skills → kits/builder/skills}/review-work/SKILL.md +0 -0
package/{skills → kits/builder/skills}/tdd-workflow/SKILL.md +0 -0
package/{skills → kits/builder/skills}/verify-work/SKILL.md +0 -0
package/{skills → kits/knowledge/skills}/knowledge-capture/SKILL.md +0 -0

package/console.telemetry.json CHANGED

@@ -49,7 +49,7 @@
     },
     {
       "id": "flow-agents-evidence",
-      "label": "Verification evidence",
+      "label": "Verification evidence (trust-backed refs use Hachure trust.bundle format when present)",
       "root": "product:flow-agents:.flow-agents",
       "files": [
         "evidence.json"

package/docs/adr/0004-gates-expect-surface-claims.md CHANGED

@@ -1,15 +1,15 @@
 ---
-title: ADR 0004: Gates Expect Surface Claims
+title: ADR 0004: Gates Expect Hachure Trust Bundles
 ---
-# ADR 0004: Gates Expect Surface Claims
+# ADR 0004: Gates Expect Hachure Trust Bundles
-Flow-backed kits will model rich gate evidence as claim expectations rather than provider-specific requirements. A gate expectation can require `kind: "surface.claim"`, a Surface claim type such as `repo.policy_compliance`, accepted trust statuses such as `verified`, and whether the expectation blocks the transition; project or runtime config maps claim types to trusted Surface producers and authority traces. This lets the Builder Kit use repo governance, command checks, CI, human decisions, or future producers without naming a specific provider in the Flow Definition.
+Flow-backed kits will model rich gate evidence as claim expectations using the Hachure trust.bundle format rather than provider-specific requirements. A gate expectation can require `kind: "trust.bundle"`, a domain claim type such as `builder.verify.tests`, accepted trust statuses such as `verified`, and whether the expectation blocks the transition; project or runtime config maps claim types to trusted Surface producers and authority traces. This lets the Builder Kit use repo governance, command checks, CI, human decisions, or future producers without naming a specific provider in the Flow Definition.
-**Status**: Accepted
+**Status**: Accepted (updated: vocabulary aligned to Hachure trust.bundle in hachure-align)
-**Considered Options**: Provider-aware gate rules were rejected because they would make Flow Definitions know too much about individual tools. Plain evidence strings such as `tests` or `veritas` were rejected because they cannot represent claim type, accepted status, producer authority, transparency gaps, or project-level enforcement overrides cleanly.
+**Considered Options**: Provider-aware gate rules were rejected because they would make Flow Definitions know too much about individual tools. Plain evidence strings such as `tests` or `veritas` were rejected because they cannot represent claim type, accepted status, producer authority, transparency gaps, or project-level enforcement overrides cleanly. An earlier version used `kind: "surface.claim"` and `artifact_kind: "TrustReport"/"Trust Snapshot"` — those have been renamed to `kind: "trust.bundle"` and `artifact_kind: "trust.bundle"` to align with the Hachure schema standard that Flow now ships.
-**Consequences**: Trusted producer mappings belong upstream in Flow project configuration, not Flow Agents runtime configuration. Flow Agents can help author, install, and adapt that configuration for agent runtimes, but CI, framework agents, local CLIs, and humans should all evaluate gates against the same Flow-owned authority model.
+**Consequences**: Trusted producer mappings belong upstream in Flow project configuration, not Flow Agents runtime configuration. Flow Agents can help author, install, and adapt that configuration for agent runtimes, but CI, framework agents, local CLIs, and humans should all evaluate gates against the same Flow-owned authority model. When hachure is installed as an optional dependency, referenced trust artifacts are validated against hachure's trust-bundle.schema.json at evidence-recording time.
-**Initial Shape**: Gate expectations should use `expects` entries with `id`, `kind: "surface.claim"`, `required`, `claim.type`, optional `claim.subject`, `claim.accepted_statuses`, `description`, and optional `explore_hint`. The Builder Kit should use intuitive subject strings such as `flow-run`, `flow-step`, `work-item`, `change`, `pull-request`, `release`, `decision`, and `artifact`, while the schema remains open to other subject values.
+**Initial Shape**: Gate expectations should use `expects` entries with `id`, `kind: "trust.bundle"`, `required`, `claim.type`, optional `claim.subject`, `claim.accepted_statuses`, `description`, and optional `explore_hint`. The Builder Kit should use intuitive subject strings such as `flow-run`, `flow-step`, `work-item`, `change`, `pull-request`, `release`, `decision`, and `artifact`, while the schema remains open to other subject values.
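The following is a minimal TypeScript sketch of the gate expectation shape that ADR 0004's "Initial Shape" paragraph describes, together with one possible evaluator. The field names come from the ADR. Everything else is a hypothetical illustration and not the package's API: `RecordedClaim`, `TrustedProducers`, `evaluateExpectation`, the producer names `ci` and `local-cli`, and the three-way verdict. The sketch shows how a claim type, its accepted statuses, a producer mapping from project config, and the `required` flag could combine to decide whether a transition is blocked.

```typescript
// Sketch only: field names follow ADR 0004's "Initial Shape"; the evaluator,
// RecordedClaim, and TrustedProducers are hypothetical, not the package's API.
interface GateExpectation {
  id: string;
  kind: "trust.bundle";
  required: boolean; // a required expectation blocks the transition when unmet
  claim: {
    type: string; // domain claim type, e.g. "builder.verify.tests"
    subject?: string; // e.g. "flow-run", "change", "pull-request"; open-ended
    accepted_statuses: string[]; // e.g. ["verified"]
  };
  description: string;
  explore_hint?: string;
}

// A claim as it might appear in recorded evidence (hypothetical shape).
interface RecordedClaim {
  type: string;
  subject?: string;
  status: string;
  producer: string;
}

// Project-level config: claim type -> producers trusted to assert it.
type TrustedProducers = Record<string, string[]>;

type Verdict = "satisfied" | "blocking" | "advisory-gap";

function evaluateExpectation(
  exp: GateExpectation,
  claims: RecordedClaim[],
  trusted: TrustedProducers,
): Verdict {
  const allowedProducers = trusted[exp.claim.type] ?? [];
  const met = claims.some(
    (c) =>
      c.type === exp.claim.type &&
      // An expectation without a subject accepts a claim about any subject.
      (exp.claim.subject === undefined || c.subject === exp.claim.subject) &&
      exp.claim.accepted_statuses.includes(c.status) &&
      allowedProducers.includes(c.producer),
  );
  if (met) return "satisfied";
  return exp.required ? "blocking" : "advisory-gap";
}

// Usage: a required test-evidence expectation that only CI may satisfy.
const testsPass: GateExpectation = {
  id: "tests-pass",
  kind: "trust.bundle",
  required: true,
  claim: { type: "builder.verify.tests", subject: "change", accepted_statuses: ["verified"] },
  description: "Tests passed for the change under review",
};
const trusted: TrustedProducers = { "builder.verify.tests": ["ci"] };

console.log(
  evaluateExpectation(
    testsPass,
    [{ type: "builder.verify.tests", subject: "change", status: "verified", producer: "ci" }],
    trusted,
  ),
); // prints: satisfied
```

The producer trust lives in a separate map rather than in the expectation. This mirrors the ADR's point that Flow Definitions should not name specific providers.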

package/docs/adr/0007-flow-skill-kit-tool-boundary.md ADDED

@@ -0,0 +1,169 @@
+---
+title: "ADR 0007: Flow / Skill / Kit / Tool Boundary"
+---
+# ADR 0007: Flow / Skill / Kit / Tool Boundary
+**Date:** 2026-06-15
+**Status:** Accepted
+---
+## Context
+Flow Agents has accumulated ~26 skills. When kits were first introduced, the word "skill" was used loosely for anything the agent knew how to do — from kit-specific workflow procedures to general capabilities (search, browser, dependency scanning) to cross-cutting concerns (agentic engineering principles, context budgeting). That blurred three genuinely different things: the agent's *procedural methods for flow steps*, the *raw capabilities it wields*, and the *containers that package flows with their implementations*.
+The boundary became sharper when the Flow Definition container moved to `kontourai/flow` (ADR 0001). Once Flow owns the process contract and Flow Agents owns the agent-facing implementation, a natural question emerges: where does each skill belong, and what is a skill in the first place?
+A related accumulation was an informal "general capability skill tier" — skills like `agentic-engineering`, `search-first`, `github-cli`, and `eval-rebuild` that described how the agent uses runtime capabilities, not how it implements a specific flow step. That tier was never designed; it accreted as the skill list grew. It is not a peer design concept alongside Kit-skills; it is an artifact of undifferentiated naming.
+This ADR records the conceptual model that was agreed in a design conversation with Brian Anderson on 2026-06-15 and makes it the authoritative reference for future skill placement, kit authoring, and orphan triage.
+---
+## Decision
+### The Four Definitions
+#### 1. Flow Definition = the WHAT
+A Flow Definition is a process contract: steps, gates, expected evidence, and definition-of-done. It is the container for *what* a workflow accomplishes, not *how* the agent does it.
+- Flow Definitions are owned and validated by `kontourai/flow`.
+- They are agentless-capable: a CI job or a human can satisfy a flow without an agent, as proven by the agentless gate-eval work (issues #52 and #60).
+- Flow Agents *consumes* Flow Definitions; it does not own them (ADR 0001).
+#### 2. Skill = the HOW = the *do* of a specific flow step
+A skill is the agent's procedural method for accomplishing one step or gate of a flow. Brian's framing: *"skills are the do part of the flow definition."*
+Consequences of this definition:
+- There is exactly **one kind of skill** in the intended design.
+- Every skill is bound to a flow step: it implements the agent's side of that step.
+- Every skill belongs to the kit that owns that flow.
+- A skill with no flow step behind it is not a skill in this model — it is either a tool, an orphan, or evidence of a missing flow.
+The earlier informal "general capability skill tier" was an accreted artifact, not a designed concept. It is not a peer category to Kit-skills. Skills in that tier are reclassified as tools, orphans, or — where a missing flow is strongly implied — candidates for a future kit.
+#### 3. Tool = a raw capability the agent wields
+A tool is something the agent *uses* to do the work: run bash, read/write files, drive a browser, call `gh`, look up packages, search the web. Tools are:
+- Provided by the runtime or harness, not tied to any flow.
+- The agent's *hands*, not its *method*.
+- Distinct from a skill even when the agent wraps a tool call in a named skill file — if the "skill" is just "here is how to invoke this capability," it is a tool description, not a flow-step method.
+A skill that only describes how to use a tool (e.g., `github-cli`, `browser-test`, `eval-rebuild`) belongs in harness documentation or an agent system prompt, not in the `skills/` directory as a first-class skill.
+#### 4. Kit = packages a flow with its skills
+A kit:
+- Owns one or more Flow Definitions.
+- Provides the agent-side skills that implement those flows' steps.
+- May include store adapters, evals, and docs.
+A kit's skills are the agent-side implementation of that kit's flows, in a one-to-one relationship between skills and flow steps (or a small cluster of closely related steps).
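Here is a hypothetical in-memory model that makes the Kit definition concrete. These types are assumptions and are not the package's `kit.json` schema. The model includes a check that every skill a kit declares is bound to a step of one of that kit's own flows. The `builder.build` step IDs are copied from the companion audit's list for `kits/builder/flows/build.flow.json`.

```typescript
// Hypothetical model of ADR 0007's Flow and Kit definitions (not the real schema).
interface FlowDefinition {
  id: string; // e.g. "builder.build"
  steps: string[];
}

interface SkillBinding {
  skill: string;
  flow: string; // ID of the flow whose step(s) the skill implements
  steps: string[]; // one step, or a small cluster of closely related steps
}

interface Kit {
  name: string;
  flows: FlowDefinition[];
  skills: SkillBinding[];
}

// Reports every skill binding that does not land on a step of the kit's own flows.
function bindingProblems(kit: Kit): string[] {
  const problems: string[] = [];
  for (const b of kit.skills) {
    const flow = kit.flows.find((f) => f.id === b.flow);
    if (flow === undefined) {
      problems.push(`${b.skill}: flow ${b.flow} is not owned by kit ${kit.name}`);
      continue;
    }
    if (b.steps.length === 0) problems.push(`${b.skill}: not bound to any step`);
    for (const step of b.steps) {
      if (!flow.steps.includes(step)) problems.push(`${b.skill}: ${b.flow} has no step ${step}`);
    }
  }
  return problems;
}

const builder: Kit = {
  name: "builder",
  flows: [
    {
      id: "builder.build",
      steps: ["pull-work", "design-probe", "plan", "execute", "verify",
              "merge-ready", "pr-open", "merge-ready-ci", "learn", "done"],
    },
  ],
  skills: [
    { skill: "execute-plan", flow: "builder.build", steps: ["execute"] },
    { skill: "learning-review", flow: "builder.build", steps: ["learn"] },
  ],
};

console.log(bindingProblems(builder)); // prints: []
```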
+### Core Rule
+> **A skill that has no flow step behind it is not a Kit-skill.** It is either:
+> - (a) a **TOOL** mislabeled as a skill — it describes a raw capability and should live in harness docs or an agent system prompt, or
+> - (b) an **ORPHAN** — it is procedural and agent-driven but the flow that would own its step has not been defined yet; it signals a missing or implicit flow, or
+> - (c) **scope drift** — it does not belong in this repo's skill layer at all.
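The Core Rule can be read as a small decision procedure. The sketch below assumes that a reviewer records three facts per skill: `flowStep`, `describesRawCapability`, and `belongsInSkillLayer` (all hypothetical names). The order in which TOOL, scope drift, and ORPHAN are tested is also an assumption, because the rule lists the three outcomes without ranking them.

```typescript
// Hypothetical encoding of ADR 0007's Core Rule; the recorded facts are assumptions.
type Classification =
  | { kind: "KIT-SKILL"; flow: string; step: string }
  | { kind: "TOOL" }
  | { kind: "ORPHAN" }
  | { kind: "SCOPE-DRIFT" };

interface SkillFacts {
  name: string;
  flowStep?: { flow: string; step: string }; // kit-owned step backing the skill, if any
  describesRawCapability: boolean; // "here is how to invoke this capability"
  belongsInSkillLayer: boolean; // false when the skill is out of scope for this repo
}

function classifySkill(s: SkillFacts): Classification {
  // A flow step behind the skill is the only route to KIT-SKILL.
  if (s.flowStep !== undefined) {
    return { kind: "KIT-SKILL", flow: s.flowStep.flow, step: s.flowStep.step };
  }
  if (s.describesRawCapability) return { kind: "TOOL" };
  if (!s.belongsInSkillLayer) return { kind: "SCOPE-DRIFT" };
  // Procedural and in scope, but no flow owns its step yet.
  return { kind: "ORPHAN" };
}

// Usage with three rows from the companion audit.
const githubCli = classifySkill({ name: "github-cli", describesRawCapability: true, belongsInSkillLayer: true });
const executePlan = classifySkill({
  name: "execute-plan",
  flowStep: { flow: "builder.build", step: "execute" },
  describesRawCapability: false,
  belongsInSkillLayer: true,
});
const contextBudget = classifySkill({ name: "context-budget", describesRawCapability: false, belongsInSkillLayer: true });

console.log(githubCli.kind, executePlan.kind, contextBudget.kind); // prints: TOOL KIT-SKILL ORPHAN
```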
+### Cross-Kit Skill Sharing: DEFERRED
+A question arose during this discussion: can a skill be shared across kits? For example, `knowledge-capture` is implemented as a Knowledge Kit skill (`knowledge.ingest:capture`) but is used as a support step inside Builder Kit flows (e.g., `learning-review` calls `knowledge-capture`).
+The sharing model under active consideration is an npm-dependency model: Kit B declares a peer dependency on Kit A and invokes Kit A's skills as a consumer, without absorbing them into Kit B's skill list.
+**This alternative is not adopted now.** The cross-kit sharing question is deferred. Until it is resolved:
+- Each skill belongs to exactly one kit.
+- When a skill is consumed across kits, the consumer kit calls the skill by reference; it does not duplicate or re-own it.
+- The audit table in the companion document (`0007-skill-audit.md`) reflects the intended single-kit ownership for each skill.
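The interim rules for deferred sharing can be checked mechanically. This is a minimal sketch that assumes a hypothetical manifest listing the owned and consumed skills for each kit; it is not the real `kit.json` format. The check enforces two rules. Each skill has exactly one owning kit. A skill consumed by reference must be owned by some other kit.

```typescript
// Hypothetical manifest shape, not the package's kit.json format.
interface KitManifest {
  name: string;
  ownedSkills: string[];
  consumedSkills?: string[]; // skills invoked by reference from another kit
}

function checkSingleOwnership(kits: KitManifest[]): string[] {
  const owner = new Map<string, string>();
  const errors: string[] = [];
  // Pass 1: every skill may have exactly one owning kit.
  for (const kit of kits) {
    for (const skill of kit.ownedSkills) {
      const prev = owner.get(skill);
      if (prev !== undefined) errors.push(`${skill} is owned by both ${prev} and ${kit.name}`);
      else owner.set(skill, kit.name);
    }
  }
  // Pass 2: consumed skills must be owned by a different kit (used by reference, not re-owned).
  for (const kit of kits) {
    for (const skill of kit.consumedSkills ?? []) {
      const o = owner.get(skill);
      if (o === undefined) errors.push(`${kit.name} consumes ${skill}, which no kit owns`);
      else if (o === kit.name) errors.push(`${kit.name} lists its own skill ${skill} as consumed`);
    }
  }
  return errors;
}

// Usage: the ADR's example of Builder Kit flows using knowledge-capture by reference.
console.log(
  checkSingleOwnership([
    { name: "knowledge", ownedSkills: ["knowledge-capture"] },
    { name: "builder", ownedSkills: ["learning-review"], consumedSkills: ["knowledge-capture"] },
  ]),
); // prints: []
```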
+---
+## Consequences
+### Immediate Structural Clarity
+- The `skills/` directory contains a mix of Kit-skills, tools, and orphans. The audit table in `docs/adr/0007-skill-audit.md` maps each skill to one of those three classifications.
+- Skills that are tools (`agentic-engineering`, `browser-test`, `dependency-update`, `eval-rebuild`, `github-cli`, `search-first`) should eventually migrate to agent system prompts, harness context, or be removed from `skills/` entirely. No move is made in this ADR.
+- Orphans (`context-budget`, `explore`, `feedback-loop`, `frontend-design`) each implied either a missing flow or a reclassification. All four were ruled on 2026-06-15 (see "Orphan Rulings — 2026-06-15" below); all are REMOVED, with preserved intents recorded.
+### Builder Kit Fix (Issue #62) Becomes Mechanical
+With this model, the Builder Kit fix discussed in issue #62 is straightforward: every skill in `skills/` that maps to `builder.build` or `builder.shape` steps belongs in the Builder Kit. The audit table provides the exact step-to-skill mapping. The fix is to move those skills into `kits/builder/` — but that move is explicitly out of scope for this ADR. This ADR provides the analysis that makes the move mechanical.
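Because the audit already identifies the kit for each KIT-SKILL, the move reduces to a path mapping. The hypothetical helper below follows the layout that this release adopted. The renames in the file list above show that layout, for example `skills/deliver/SKILL.md` becoming `kits/builder/skills/deliver/SKILL.md`.

```typescript
// Hypothetical helper: old -> new SKILL.md paths for moving KIT-SKILLs into their kit,
// following the layout skills/<name>/ -> kits/<kit>/skills/<name>/.
function plannedMoves(kitBySkill: Record<string, string>): Array<[string, string]> {
  return Object.entries(kitBySkill).map(
    ([skill, kit]): [string, string] => [
      `skills/${skill}/SKILL.md`,
      `kits/${kit}/skills/${skill}/SKILL.md`,
    ],
  );
}

// Usage: TOOL and ORPHAN skills are simply left out of the input map.
for (const [from, to] of plannedMoves({ deliver: "builder", "knowledge-capture": "knowledge" })) {
  console.log(`${from} -> ${to}`);
}
```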
+### Tools Are Flow-Agnostic
+Runtime-provided tools (`gh`, Playwright, bash, file read/write, package registries) are not skills. They do not belong to flows. The agent wields them as hands while executing flow steps. Packaging them as skills creates false parallelism with Kit-skills and inflates the skill list.
+### Orphan Triage Protocol
+For each orphan, one of these outcomes is required:
+1. **Define the missing flow.** If the orphan describes a coherent procedural method that deserves a kit, define the flow, create the kit, and move the skill.
+2. **Reclassify as a tool.** If the orphan is really a description of how to use a raw capability, remove it from `skills/` and fold its content into agent context or harness docs.
+3. **Accept scope drift.** If the orphan does not belong in this repo's skill layer, remove it.
+No orphan should remain permanently in `skills/` without a documented triage outcome.
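The protocol's closing requirement can be enforced as a simple check. This sketch uses hypothetical names throughout. It flags only those orphans that are still in `skills/` with no recorded outcome. It does not model whether a triaged skill has actually been moved or removed.

```typescript
// Hypothetical check for ADR 0007's Orphan Triage Protocol.
type TriageOutcome = "define-flow" | "reclassify-as-tool" | "scope-drift";

interface OrphanRecord {
  skill: string;
  outcome?: TriageOutcome; // undefined until a triage outcome is documented
}

// Returns orphans still present in skills/ that have no documented outcome.
function untriagedOrphans(orphans: OrphanRecord[], presentInSkillsDir: Set<string>): string[] {
  return orphans
    .filter((o) => presentInSkillsDir.has(o.skill) && o.outcome === undefined)
    .map((o) => o.skill);
}

// Usage with placeholder skill names.
const records: OrphanRecord[] = [
  { skill: "skill-a", outcome: "reclassify-as-tool" },
  { skill: "skill-b" },
];
console.log(untriagedOrphans(records, new Set(["skill-a", "skill-b"]))); // prints: [ 'skill-b' ]
```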
+### Orphan Rulings — 2026-06-15
+Brian ruled on all four orphans on 2026-06-15. All four are **REMOVED** for now; preserved intents are recorded below so the rationale is not lost.
+| Orphan Skill | Ruling | Preserved Intent |
+| --- | --- | --- |
+| `explore` | **REMOVED** — reclassified as a tool: parallel codebase-reading capability. | The original aim was multi-angle codebase capture (dependencies, security, runnability/testability, business logic, patterns). This preserved intent is the seed of a possible future `codebase-onboarding` flow if that capability is ever wanted as a first-class kit flow. |
+| `feedback-loop` | **REMOVED** — subsumed. Its "did the change work locally" concern is now handled by `verify-work` plus the flow route-back capability. | No separate preserved intent; the concern is covered. |
+| `context-budget` | **REMOVED** — agent self-maintenance; relates conceptually to `learning-review`. Not a flow-step skill. | Conceptually adjacent to `learning-review`; could inform a future self-maintenance flow, but is not a flow-step skill under the current model. |
+| `frontend-design` | **REMOVED** for now. | "Plan-work but for UI" — the seed of a possible future UI/Frontend Kit with design + visual-verify steps. Revisit if a UI kit is ever built. |
+These rulings do not change this ADR's status. The ADR remains **Proposed** pending Brian's separate confirmation of the whole document.
+### Knowledge Kit Boundary
+`knowledge-capture` is the one skill in `skills/` that belongs to the Knowledge Kit rather than the Builder Kit. Its canonical flow is `knowledge.ingest:capture`. Its presence in the top-level `skills/` directory (rather than in `kits/knowledge/`) is itself a consequence of the pre-ADR undifferentiated structure. Future kit authoring should place kit skills inside their kit directory.
+### Audit Table as Evidence
+The companion document `docs/adr/0007-skill-audit.md` is the authoritative mapping of every skill in `skills/` to this model. It provides the evidence base for Builder Kit issue #62 and any future kit migrations.
+---
+## Alternatives Considered
+### Keep the "General Capability Skill Tier" as a Peer Category
+Rejected. The general capability tier was never designed — it accreted. Treating it as a first-class category alongside Kit-skills would institutionalize an accident. The model is simpler with exactly one kind of skill.
+### Allow Skills to Be Defined Without Flow Binding
+Rejected. The whole value of the model is that a skill's purpose, ownership, and location are derivable from its flow step. A skill with no flow binding is undefined in the model — it either needs a flow or needs to be reclassified.
+### Cross-Kit Skill Sharing via npm Dependency (DEFERRED)
+Not rejected but not adopted now. The npm-dependency model for cross-kit sharing is the most likely future path if skills need to be consumed across kits. It is deferred until a concrete cross-kit case requires it. See "Cross-Kit Skill Sharing: DEFERRED" above.
+### Move All Tool-like Skills to Agent System Prompts Immediately
+Deferred. The tool-like skills in `skills/` have runtime users. Moving them without updating agent specs and evals would break existing behavior. This ADR records the intended direction; the migrations should be sequenced through backlog issues.
+---
+## References
+- [ADR 0001: Flow Agents Consumes Flow For Workflow Enforcement](./0001-flow-agents-consumes-flow.md) — establishes that Flow owns Flow Definitions and Flow Agents consumes them.
+- [ADR 0002: Flow Kits as Extension Unit](./0002-flow-kits-as-extension-unit.md) — establishes the kit as the packaging unit.
+- [ADR 0003: Flow Agents Coordinates Kits and Adapters](./0003-flow-agents-coordinates-kits-and-adapters.md) — establishes how Flow Agents relates to kits.
+- [docs/adr/0007-skill-audit.md](./0007-skill-audit.md) — companion skill audit table.
+- `kits/builder/flows/build.flow.json` — Builder Kit build flow step IDs.
+- `kits/builder/flows/shape.flow.json` — Builder Kit shape flow step IDs.
+- `kits/knowledge/flows/ingest.flow.json` — Knowledge Kit ingest flow step IDs.
+- GitHub issue #62 — Builder Kit skill placement fix (becomes mechanical given this model).
+- GitHub issues #52 and #60 — agentless gate-eval work proving Flow Definitions are agentless-capable.

package/docs/adr/0007-skill-audit.md ADDED

@@ -0,0 +1,112 @@
+---
+title: "Skill Audit 2026-06-15: Flow / Skill / Kit / Tool Boundary"
+---
+# Skill Audit: Flow / Skill / Kit / Tool Boundary
+**Date:** 2026-06-15
+**Companion to:** [ADR 0007](./0007-flow-skill-kit-tool-boundary.md)
+**Scope:** All 26 skills in `skills/` — no skills declared inside kit directories were found separate from those already listed here.
+---
+## Classification Key
+| Label | Meaning |
+| --- | --- |
+| **KIT-SKILL** | The agent's procedural method for one step of a kit-owned flow. Belongs in the kit that owns that flow. |
+| **TOOL** | A raw capability the agent wields. Not tied to any flow step. Should be provided by the runtime or harness, not packaged as a "skill." |
+| **ORPHAN** | Procedural but no flow step can be cited as the home. Either implies a missing/implicit flow, or signals scope drift. |
+Flow step IDs used below are from:
+- `kits/builder/flows/build.flow.json` — steps: `pull-work`, `design-probe`, `plan`, `execute`, `verify`, `merge-ready`, `pr-open`, `merge-ready-ci`, `learn`, `done`
+- `kits/builder/flows/shape.flow.json` — steps: `shape`, `breakdown`, `file-issues`, `shape-done`
+- `kits/knowledge/flows/ingest.flow.json` — steps: `capture`, `classify`, `route`
+- `kits/knowledge/flows/compile.flow.json` — steps: `select-raws`, `compile`, `link`
+- `kits/knowledge/flows/synthesize.flow.json` — steps: `detect-cluster`, `propose`, `evidence-gate`, `apply-or-reject`
+- `kits/knowledge/flows/consolidate.flow.json` — steps: `related-event`, `propose`, `evidence-gate`, `apply-or-reject`
+- `kits/knowledge/flows/retire.flow.json` — steps: `identify`, `propose-retirement`, `evidence-gate`, `apply-or-reject`
+- `kits/knowledge/flows/store-contract.flow.json` — steps: `verify-contract`
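The ADR and the audit refer to steps with a `flow:step` notation, for example `knowledge.ingest:capture` and `builder.build:design-probe`. The sketch below parses such references and checks them against the step lists above. It includes only the three flows whose IDs the audit states: `builder.build`, `builder.shape`, and `knowledge.ingest`. The step lists are hard-coded rather than read from the `*.flow.json` files, because this diff does not show their exact schema.

```typescript
// Step IDs copied from the audit's list; not read from the flow JSON files.
const FLOW_STEPS: Record<string, readonly string[]> = {
  "builder.build": ["pull-work", "design-probe", "plan", "execute", "verify",
                    "merge-ready", "pr-open", "merge-ready-ci", "learn", "done"],
  "builder.shape": ["shape", "breakdown", "file-issues", "shape-done"],
  "knowledge.ingest": ["capture", "classify", "route"],
};

// Splits "flow.id:step-id" on the last colon; returns null for malformed refs.
function parseStepRef(ref: string): { flow: string; step: string } | null {
  const i = ref.lastIndexOf(":");
  if (i <= 0 || i === ref.length - 1) return null;
  return { flow: ref.slice(0, i), step: ref.slice(i + 1) };
}

function isKnownStep(ref: string): boolean {
  const parsed = parseStepRef(ref);
  if (parsed === null) return false;
  const steps = FLOW_STEPS[parsed.flow];
  return steps !== undefined && steps.includes(parsed.step);
}

console.log(isKnownStep("builder.build:design-probe")); // prints: true
console.log(isKnownStep("builder.build:review")); // prints: false
```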
+---
+## Full Audit Table
+| Skill | What It Does | Classification | Kit + Flow Step (if KIT-SKILL) / Rationale (if TOOL or ORPHAN) |
+| --- | --- | --- | --- |
+| `agentic-engineering` | Principles for eval-first loops, task decomposition (15-minute units), model routing (Haiku/Sonnet/Opus), and session strategy. | TOOL | Documents how to use the agent's cognitive capabilities and model-selection judgment. It is guidance the agent *applies* while using tools, not a method for a specific flow step. It is not tied to any flow or kit. |
+| `browser-test` | Delegates browser automation tasks — screenshots, accessibility checks, form filling, UI testing, DOM inspection — to `tool-playwright`. | TOOL | Wraps raw access to a browser automation capability (`tool-playwright`). No flow step backs it; it is a harness/runtime capability the agent directs. Equivalent to "how to run Playwright." |
+| `builder-shape` | User-facing entry into the Builder Kit shape flow — invokes `idea-to-backlog` as a primitive and links `kits/builder/flows/shape.flow.json`; stops at the backlog gate unless issue sync is requested. | KIT-SKILL | **Kit:** builder. **Flow:** `builder.shape`. **Step:** `shape` (and through it, `breakdown` and `file-issues` via `idea-to-backlog` delegation). `builder-shape` is the agent's procedural method for satisfying the `builder.shape` flow's entry step. |
+| `context-budget` | Audits token overhead across installed Flow Agents bundles; scans components and produces a budget report with per-component breakdown and optimization suggestions. | ORPHAN | Procedural and agent-driven, but there is no flow in any kit that has a step for "audit the agent's own context budget." **Implies missing flow:** an implicit "context-health" or "self-maintenance" flow. Until that flow exists and is owned by a kit, this skill is unanchored. |
+| `deliver` | Orchestrates the full plan → execute → review → verify loop, including preflight (pull-work, pickup-probe), looping on failures, and delivery confirmation. | KIT-SKILL | **Kit:** builder. **Flow:** `builder.build`. **Steps:** orchestrates across `pull-work`, `design-probe`, `plan`, `execute`, `verify`, `merge-ready` in sequence. `deliver` is the agent's top-level orchestration method for the builder build flow. It subsumes multiple build-flow steps and is the primary orchestrator skill for that flow. |
+| `dependency-update` | Analyzes and upgrades project dependencies — delegates registry/advisory lookups to `tool-dependencies-updater`, then presents a plan and applies approved updates. | TOOL | Orchestrating a dependency scanner subagent (`tool-dependencies-updater`) is a raw capability use. There is no kit-owned flow with a "dependency-update" step. |
+| `design-probe` | Generic one-question-at-a-time alignment interview — turns unclear goals, designs, or workflow states into shared understanding before planning or execution. | KIT-SKILL | **Kit:** builder. **Flow:** `builder.build`. **Step:** `design-probe`. The skill's own SKILL.md names the Builder Kit `design-probe` step explicitly. It also applies outside Builder Kit, but the canonical flow binding is `builder.build:design-probe`. |
+| `eval-rebuild` | Defines project-specific rebuild/reinstall steps for the eval feedback loop so the `eval-builder` agent knows how to rebuild after editing a prompt or skill. | TOOL | This is harness/tooling guidance for how to run evals — a raw capability instruction with no flow step home. It is not a method for any kit-owned step; it is instructions about how the agent's own evaluation tooling works. |
+| `evidence-gate` | Evaluates whether completed work has enough trustworthy evidence, scope integrity, and provider/runtime signal to publish, continue fixing, or request a human decision. | KIT-SKILL | **Kit:** builder. **Flow:** `builder.build`. **Step:** `verify` (the gate evaluation that determines whether the verify step's evidence satisfies the gate claim `builder.verify.tests`). Also maps to `merge-ready` evidence checks. The skill explicitly separates from release-readiness and handles the `verify`-step gate logic. |
+| `execute-plan` | Parallel execution primitive — reads a plan artifact, fans out to `tool-worker` subagents in waves, and updates the session artifact between waves. | KIT-SKILL | **Kit:** builder. **Flow:** `builder.build`. **Step:** `execute`. The skill is the agent's procedural method for the `execute` step of the builder build flow. |
+| `explore` | Fans out parallel subagents to map codebase structure, entry points, dependencies, architectural patterns, config, tests, and documentation accuracy in one pass. | ORPHAN | Procedural and multi-wave but there is no kit-owned flow step for "explore a codebase." It is used as a support skill during discovery/shaping and debugging but is not anchored to a specific flow step. **Implies missing flow:** a "codebase-onboarding" or "repository-exploration" flow with an `explore` step, or it belongs as a tool-like capability rather than a flow step skill. |
+| `feedback-loop` | Verifies that completed implementation actually works by classifying the change (visual vs. integration) and delegating to the appropriate verification method (Playwright or direct command execution). | ORPHAN | There is no kit-owned flow step called "feedback-loop." It overlaps with the `verify` step of the builder build flow, but its scope is narrower (per-implementation-task confirmation) and it is used as a support skill during `execute-plan`, not as the canonical agent method for the `verify` step. **Implies missing flow:** or this is a tool-like capability (a "quick verify" affordance) that could be subsumed into `verify-work`. |
+| `fix-bug` | Bug-fix orchestrator — adds a diagnosis phase (reproduce + root-cause via `tool-planner`), then chains plan → execute → review → verify identical to `deliver`. | KIT-SKILL | **Kit:** builder. **Flow:** `builder.build`. **Steps:** `design-probe` (root-cause/diagnosis maps to alignment before planning), `plan`, `execute`, `verify`. `fix-bug` is an alternative entry into the builder build flow for defect work; it adds a diagnosis front-end and otherwise implements the same flow steps as `deliver`. |
+| `frontend-design` | Delegates frontend implementation to `tool-worker` with curated design guidelines (typography, color, motion, spatial composition, anti-patterns); requires Playwright visual verification after implementation. | ORPHAN | There is no kit-owned flow with a "frontend-design" step. This skill injects design taste into the `execute` step of the builder build flow but is not the canonical method for that step — it is used as a support layer inside `execute-plan`/`deliver`. **Implies missing flow:** a "frontend" or "UI-design" flow with dedicated design and verify steps, or this is more accurately a library of guidelines that the `execute` step (via `execute-plan`) consumes. |
+| `github-cli` | Uses the `gh` CLI to interact with GitHub — PRs, issues, repos, releases, Actions, gists, search, and arbitrary API calls. | TOOL | `gh` is a raw capability — a command-line tool the agent wields. The skill is a how-to for operating that tool, not a method for a kit-owned flow step. Used as support across many flow steps without being bound to one. |
+| `idea-to-backlog` | Turns raw product or technical ideas into a shaped, prioritized, executable GitHub issue backlog through intake, separation, opportunity review, shaping, prioritization, and issue creation. | KIT-SKILL | **Kit:** builder. **Flow:** `builder.shape`. **Steps:** `shape` (idea intake → shaped problem/outcome/constraints/non-goals/success/risk), `breakdown` (slices and thinnest meaningful slices), `file-issues` (creating GitHub issues with provider-neutral metadata). `idea-to-backlog` is the primary agent method that implements all three active steps of `builder.shape`. |
+| `knowledge-capture` | Saves durable knowledge, pointers, decisions, lessons, corrections, and source references into the knowledge base using pointer or curated-knowledge modes. | KIT-SKILL | **Kit:** knowledge. **Flow:** `knowledge.ingest`. **Step:** `capture` (the first step of `knowledge.ingest`: capture raw text → produce a raw record). This skill is the agent's method for the `capture` step of the knowledge ingest flow. |
+| `learning-review` | Captures post-merge/post-deploy/post-incident learnings and routes them back to backlog, workflow skills, tests, docs, or knowledge; includes correction telemetry and a verdict. | KIT-SKILL | **Kit:** builder. **Flow:** `builder.build`. **Step:** `learn`. The skill is the agent's method for the `learn` step of the builder build flow: turn delivery outcomes into durable learning and follow-up routing. |
+| `pickup-probe` | Builder Kit specialization of the `design-probe` flow step — records scope, provider state, WIP/conflict scan, revision freshness, decisions, unresolved questions, accepted gaps, and planning readiness for selected backlog work. | KIT-SKILL | **Kit:** builder. **Flow:** `builder.build`. **Step:** `design-probe` (the `pickup-probe` skill is explicitly described in its SKILL.md as "the Builder Kit pickup specialization of the `design-probe` flow step"). It implements the `design-probe` step for the productized pickup path. |
+| `plan-work` | Planning primitive — delegates codebase analysis and execution plan creation to `tool-planner`; produces a plan artifact, `acceptance.json`, and `handoff.json`. | KIT-SKILL | **Kit:** builder. **Flow:** `builder.build`. **Step:** `plan`. The skill is the agent's method for the `plan` step of the builder build flow. |
+| `pull-work` | Selects ready GitHub issues from the backlog, enforces WIP limits, checks dependencies, determines worktree isolation, and hands selected work to planning. | KIT-SKILL | **Kit:** builder. **Flow:** `builder.build`. **Step:** `pull-work`. The skill is the agent's method for the `pull-work` step of the builder build flow. |
+| `release-readiness` | Decides whether evidence-backed work is ready to merge, release, deploy, or hold — checks committed/pushed state, provider change record, CI/checks, rollback plan, observability, and docs; produces a structured release decision. | KIT-SKILL | **Kit:** builder. **Flow:** `builder.build`. **Steps:** `merge-ready` and `merge-ready-ci`. The skill implements the agent-facing logic for both merge readiness gates: it consumes evidence-gate output, checks operational and CI state, and produces a merge/release/deploy/hold decision. |
+| `review-work` | Report-only critique primitive: delegates to `tool-code-reviewer`, `tool-security-reviewer`, and optionally `tool-dependencies-updater`; records findings through the `critique.json` artifact/sink. | KIT-SKILL | **Kit:** builder. **Flow:** `builder.build`. **Step:** `verify` (review is part of the verify gate's quality checks), or more precisely an intermediate step between `execute` and the formal `verify` gate. The SKILL.md describes it as a gate that must be satisfied before verification. Mapped to: between `execute` and `verify` in `builder.build`. |
+| `search-first` | Research-before-coding workflow — searches the codebase, package registries, GitHub, and web in parallel; evaluates candidates; and decides to adopt, extend, or build before writing code. | TOOL | This is a research/lookup methodology, not the agent's method for a specific flow step. It is used as a support behavior across multiple steps (shaping, planning, execution) without being anchored to one. It could be seen as a harness capability (web + registry search). |
+| `tdd-workflow` | TDD orchestrator — wraps plan → execute → review → verify with test-first constraints, git checkpoints (RED/GREEN/REFACTOR), and a coverage gate (>= 80%). | KIT-SKILL | **Kit:** builder. **Flow:** `builder.build`. **Steps:** `plan`, `execute`, `verify`. `tdd-workflow` is an alternative parameterization of the builder build flow that enforces test-first discipline across those three steps. |
+| `verify-work` | Verification primitive — delegates to `tool-verifier` and `tool-playwright`; maps evidence to acceptance criteria; updates `evidence.json` and `acceptance.json`. | KIT-SKILL | **Kit:** builder. **Flow:** `builder.build`. **Step:** `verify`. The skill is the canonical agent method for the `verify` step. |
+---
+## Summary Counts
+| Category | Count | Notes |
+| --- | --- | --- |
+| **KIT-SKILL (builder kit)** | 15 | `builder-shape`, `deliver`, `design-probe`, `evidence-gate`, `execute-plan`, `fix-bug`, `idea-to-backlog`, `learning-review`, `pickup-probe`, `plan-work`, `pull-work`, `release-readiness`, `review-work`, `tdd-workflow`, `verify-work` |
+| **KIT-SKILL (knowledge kit)** | 1 | `knowledge-capture` implements `knowledge.ingest:capture` |
+| **TOOL** | 6 | `agentic-engineering`, `browser-test`, `dependency-update`, `eval-rebuild`, `github-cli`, `search-first` |
+| **ORPHAN** | 4 | `context-budget`, `explore`, `feedback-loop`, `frontend-design` |
+Note: `knowledge-capture` is also used inside builder flows, but only as a support dependency. Its canonical home is the Knowledge Kit (`knowledge.ingest:capture`), so it is counted once, under the knowledge kit.
+Final count:
+- **KIT-SKILL:** 16 total: 15 belonging to the builder kit (across the `builder.build` and `builder.shape` flows), 1 belonging to the knowledge kit (`knowledge.ingest:capture`)
+- **TOOL:** 6
+- **ORPHAN:** 4
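The counting rule behind this table (distinct names only, one home per skill) can be checked mechanically. The sketch below is a hypothetical TypeScript tally, not a tool in this repository; the category keys and the record shape are assumptions, while the skill names are copied from the audit table.

```typescript
// Hypothetical tally of the audit's Summary Counts; the record shape and the
// category keys are this sketch's own, not part of the repository.
type Category = "KIT-SKILL:builder" | "KIT-SKILL:knowledge" | "TOOL" | "ORPHAN";

const audit: Record<Category, string[]> = {
  "KIT-SKILL:builder": [
    "builder-shape", "deliver", "design-probe", "evidence-gate", "execute-plan",
    "fix-bug", "idea-to-backlog", "learning-review", "pickup-probe", "plan-work",
    "pull-work", "release-readiness", "review-work", "tdd-workflow", "verify-work",
    "fix-bug", // repeated on purpose: a second mention must not inflate the count
  ],
  "KIT-SKILL:knowledge": ["knowledge-capture"],
  TOOL: ["agentic-engineering", "browser-test", "dependency-update", "eval-rebuild", "github-cli", "search-first"],
  ORPHAN: ["context-budget", "explore", "feedback-loop", "frontend-design"],
};

// Count distinct names per category and report any name that appears under
// more than one category (each skill should have exactly one home).
function tally(a: Record<Category, string[]>) {
  const counts = {} as Record<Category, number>;
  const home = new Map<string, Category>();
  const conflicts: string[] = [];
  for (const [cat, skills] of Object.entries(a) as [Category, string[]][]) {
    const unique = Array.from(new Set(skills));
    counts[cat] = unique.length;
    for (const skill of unique) {
      if (home.has(skill)) conflicts.push(skill);
      else home.set(skill, cat);
    }
  }
  return { counts, total: home.size, conflicts };
}

const result = tally(audit);
console.log(result.counts, result.total, result.conflicts);
```

With the repeated `fix-bug` collapsed by the `Set`, the builder row comes to 15 and the four categories sum to 26 distinct skills with no cross-category conflicts.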
+---
+## Orphans With "Implies Missing Flow" Detail
+| Orphan Skill | Implication | Disposition |
+| --- | --- | --- |
+| `context-budget` | Implies missing flow: a "context-health" or "agent-self-audit" flow. No kit currently owns context-budget management as a named flow. This could eventually become a `builder.context-audit` or standalone kit flow. Alternatively, if the repo decides context budgeting is always a harness concern, this should be reclassified as a TOOL and the skill dissolved or folded into harness documentation. | **REMOVED** (2026-06-15). Agent self-maintenance; not a flow-step skill. Conceptually adjacent to `learning-review`; preserved intent noted in ADR 0007. |
+| `explore` | Implies missing flow: a "codebase-onboarding" or "repository-exploration" flow with discrete steps (structure, entry points, dependencies, patterns, docs accuracy). Alternatively, `explore` is a multi-step capability the agent uses across many flow phases — in which case it is more accurately a TOOL (raw codebase-reading capability orchestrated across subagents) than a flow-step skill. | **REMOVED** (2026-06-15). Reclassified as a tool (parallel codebase-reading capability). Preserved intent: seed of a possible future `codebase-onboarding` flow — see ADR 0007. |
+| `feedback-loop` | Implies missing flow, though more precisely it overlaps with the `verify` step of `builder.build` without being the canonical method for it. The skill is used as a lightweight per-task verification inside `execute-plan`. If the builder build flow added a sub-step or explicit "local-verify" step between `execute` and the formal `verify` gate, `feedback-loop` would map there; otherwise, it should be subsumed into `verify-work` or reclassified as support tooling. | **REMOVED** (2026-06-15). Subsumed: concern now handled by `verify-work` plus flow route-back. |
+| `frontend-design` | Implies missing flow: a "frontend" or "UI-kit" flow with steps for design direction, implementation, and visual verification. Alternatively, the design guidelines could be packaged as a context resource injected into `execute-plan`/`tool-worker` rather than as a separate "skill." If it stays a skill, it belongs in a hypothetical UI Kit that owns a `frontend.build` flow with design and verify steps. | **REMOVED** (2026-06-15). Preserved intent: "plan-work but for UI" — seed of a possible future UI/Frontend Kit with design + visual-verify steps. Revisit if a UI kit is built. |
+---
+## Implementation Record (Issue #62, 2026-06-15)
+The dispositions in this audit table were implemented under issue #62:
+- **15 KIT-SKILLs moved to Builder Kit:** `builder-shape`, `deliver`, `design-probe`, `evidence-gate`, `execute-plan`, `fix-bug`, `idea-to-backlog`, `learning-review`, `pickup-probe`, `plan-work`, `pull-work`, `release-readiness`, `review-work`, `tdd-workflow`, `verify-work`. Each moved from `skills/<name>/` to `kits/builder/skills/<name>/` and is declared in the `skills` array of `kits/builder/kit.json`.
+- **1 KIT-SKILL moved to Knowledge Kit:** `knowledge-capture` — moved to `kits/knowledge/skills/knowledge-capture/` and declared in `kits/knowledge/kit.json` `skills` array.
+- **4 ORPHANS deleted:** `context-budget`, `explore`, `feedback-loop`, `frontend-design` — removed per Brian's 2026-06-15 ruling above.
+- **6 TOOLs left in place:** `agentic-engineering`, `browser-test`, `dependency-update`, `eval-rebuild`, `github-cli`, `search-first` — remain in `skills/` pending separate reclassification. See `skills/README.md`.
+**Structural changes:**
+- `src/tools/build-universal-bundles.ts`: `collectAllSkills()` function added; bundle builders now collect skills from both `skills/` (tool-skills) and kit-declared `skills` arrays. Runtime bundles (`.claude/skills/`, `.codex/skills/`, etc.) include all kit-owned skills unchanged.
+- `src/tools/generate-context-map.ts`: `allSkillPaths()` function added; context map generation now includes kit-owned skills.
+- `src/tools/validate-source-tree.ts`: `validateLegacyRefs()` updated to skip legacy-ref matches that resolve as declared kit-owned asset subpaths.
+- `packaging/packs.json`: Skill entries limited to the 6 remaining tool-skills in `skills/`. Kit-owned skills are no longer listed in packs (they're always included in the bundle as kit assets).
+- `flow-agents kit inspect kits/builder` now reports `k1: true` (skills present).
+- `flow-agents kit inspect kits/knowledge` now reports `k1: true` (skills present).
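The bundling change recorded above can be pictured as a merge of two skill sources. This is a hypothetical sketch of what a function like `collectAllSkills()` might do, not its actual implementation: it assumes each kit manifest's `skills` array lists skill directories relative to the kit root, and it works on in-memory inputs rather than reading the filesystem.

```typescript
import * as path from "node:path";

interface KitManifest {
  id: string;
  skills?: string[]; // assumption: kit-relative skill directories
}

interface SkillEntry {
  name: string;   // directory basename, used as the skill name
  source: string; // repo-relative path to the skill's SKILL.md
  owner: string;  // "tool" for top-level skills/, otherwise the kit id
}

function collectAllSkills(
  toolSkillNames: string[],
  kits: { root: string; manifest: KitManifest }[],
): SkillEntry[] {
  const byName = new Map<string, SkillEntry>();
  const add = (entry: SkillEntry) => {
    // A name collision would mean the same skill ships twice in one bundle.
    const existing = byName.get(entry.name);
    if (existing) {
      throw new Error(`duplicate skill "${entry.name}": ${existing.source} and ${entry.source}`);
    }
    byName.set(entry.name, entry);
  };

  // Tool-skills still live under the top-level skills/ directory.
  for (const name of toolSkillNames) {
    add({ name, source: path.posix.join("skills", name, "SKILL.md"), owner: "tool" });
  }
  // Kit-owned skills come from each kit's declared skills array.
  for (const { root, manifest } of kits) {
    for (const rel of manifest.skills ?? []) {
      const dir = path.posix.join(root, rel);
      add({ name: path.posix.basename(dir), source: path.posix.join(dir, "SKILL.md"), owner: manifest.id });
    }
  }
  // Sort by name for a deterministic bundle and context-map order.
  return Array.from(byName.values()).sort((a, b) => (a.name < b.name ? -1 : a.name > b.name ? 1 : 0));
}

const skills = collectAllSkills(
  ["github-cli", "search-first"],
  [
    { root: "kits/builder", manifest: { id: "builder", skills: ["skills/deliver", "skills/verify-work"] } },
    { root: "kits/knowledge", manifest: { id: "knowledge", skills: ["skills/knowledge-capture"] } },
  ],
);
console.log(skills.map((s) => s.source));
```

The resulting paths have the same shape as the updated context-map entries (for example `kits/builder/skills/deliver/SKILL.md` next to `skills/github-cli/SKILL.md`). Failing loudly on a duplicate name is a design choice of this sketch, so that a skill left behind in `skills/` after moving into a kit cannot be bundled twice.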

package/docs/adr/0008-kit-operation-boundary.md ADDED

@@ -0,0 +1,88 @@
+---
+title: "ADR 0008: Kit Operation Boundary"
+---
+# ADR 0008: Kit Operation Boundary
+**Date:** 2026-06-15
+**Status:** Accepted
+---
+## Context
+A kit is the SEAM where Flow and Flow Agents meet: Flow owns the container (manifest + flows), Flow Agents owns the extension (skills, adapters, docs, activation), and a kit fuses both into one package. So "which layer owns an operation on a kit?" is ill-posed whenever the operation touches both halves.
+Of the kit operations: `validate` touches only the container (cleanly Flow); `activate` touches only a specific agent runtime — codex-local/strands-local (cleanly Flow Agents); `install` and `inspect` STRADDLE the seam.
+A concrete duplication motivated this decision: flow-agents reimplements Flow's container contract. `src/flow-kit/validate.ts` has its own `validateCoreContainer` (schema_version/id/name/flows), while Flow exposes the authoritative `validateKitContainer` from its `src/index.ts`, and flow-agents does not even depend on `@kontourai/flow`. The two contracts can silently drift.
+This ADR records the design decision reached with Brian Anderson on 2026-06-15.
+---
+## Decision
+### The Dividing Test
+Does the operation need to INTERPRET the agent extension (what a skill or adapter MEANS), or only the container (manifest + flows + the *names* of declared asset classes)? Container-only → Flow. Extension-interpreting → Flow Agents.
+### Flow Owns the Agent-Blind Kit Operations
+`flow kit validate`, `flow kit install` (fetch + validate + place a kit package), `flow kit inspect` (container validity + flows + declared asset-class NAMES — the K0/structural view). Flow knows NOTHING about what a skill or adapter means.
+### Flow Agents Owns the Extension Operations
+`flow-agents kit activate` (wire the extension into a runtime), plus the extension-interpreting augmentation of install (place skill/adapter assets) and inspect (interpret asset classes → K1/K2 + runtime targets). Flow Agents COMPOSES on Flow's primitives; it never reimplements them.
+### The Agent-Blind Guardrail
+Flow's kit operations must NEVER interpret extension semantics — fetch, validate, place, report-structure, full stop. Holding this line keeps Flow's operations genuinely generic even though flow-agents is currently the only consumer; the line between the layers is precisely "does it interpret the extension?"
+### DRY via Delegation
+flow-agents depends on `@kontourai/flow`, deletes `validateCoreContainer`, and delegates all container work to Flow's primitives. The container contract lives ONCE, in Flow.
+### Flow Agents Is the Reference Consumer
+Flow Agents is the worked example for any future producer building on Flow — lean on Flow's agent-blind primitives, add your own extension layer in your own CLI.
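The dividing test and the agent-blind guardrail can be expressed as a small check. In this hypothetical TypeScript sketch, the step labels (`fetch`, `validate`, `place`, `report-structure`, plus three extension steps) are this sketch's own names drawn from the ADR's wording, and the per-command step lists are an illustrative reading of the ADR, not a description of either CLI's code.

```typescript
// Agent-blind step kinds: the only work ADR 0008 lets Flow's kit operations do.
const AGENT_BLIND = new Set<string>(["fetch", "validate", "place", "report-structure"]);

// Hypothetical step labels; the last three interpret the agent extension.
type Step =
  | "fetch" | "validate" | "place" | "report-structure"
  | "place-extension-assets" | "interpret-asset-classes" | "wire-runtime";

interface KitOperation {
  cli: string;
  steps: Step[];
}

// The dividing test: container-only work belongs to Flow; anything that
// interprets the extension belongs to Flow Agents.
function requiredOwner(op: KitOperation): "flow" | "flow-agents" {
  return op.steps.every((s) => AGENT_BLIND.has(s)) ? "flow" : "flow-agents";
}

// The guardrail: a `flow kit ...` command must never need Flow Agents.
function guardrailViolations(ops: KitOperation[]): string[] {
  return ops
    .filter((op) => op.cli.startsWith("flow kit ") && requiredOwner(op) !== "flow")
    .map((op) => op.cli);
}

// Flow Agents composes on Flow's primitives, so its install/inspect reuse the
// agent-blind steps and then add extension-interpreting ones.
const operations: KitOperation[] = [
  { cli: "flow kit validate", steps: ["validate"] },
  { cli: "flow kit install", steps: ["fetch", "validate", "place"] },
  { cli: "flow kit inspect", steps: ["validate", "report-structure"] },
  { cli: "flow-agents kit install", steps: ["fetch", "validate", "place", "place-extension-assets"] },
  { cli: "flow-agents kit inspect", steps: ["validate", "report-structure", "interpret-asset-classes"] },
  { cli: "flow-agents kit activate", steps: ["wire-runtime"] },
];

for (const op of operations) console.log(`${op.cli}: ${requiredOwner(op)}`);
console.log("guardrail violations:", guardrailViolations(operations));
```

A hypothetical `flow kit activate` with a `wire-runtime` step would be reported by `guardrailViolations`, which is exactly the case the guardrail rules out.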
+---
+## Consequences
+### CLI Surface
+`flow kit <validate|install|inspect>` (container, agent-blind) and `flow-agents kit <install|inspect|activate>` (extension-composing). The standalone `flow-kit` binary and the flat `flow validate-kit` verb are deprecated with aliases.
+### Position C Adopted
+This adopts position C (generic kit operations live in Flow) over position B (whole-kit lifecycle stays in flow-agents). C was chosen because doing it twice (B now, migrate later) is wasteful, and the agent-blind guardrail removes the premature-abstraction risk that motivated B.
+### Breaking CLI Change
+Breaking CLI change on published 1.x in BOTH repos → deprecation aliases + a coordinated release.
+### Separate-Product-Ready
+Because the primitives live in Flow with flow-agents as a consumer, Flow Kits could later be productized (container + primitives + marketplace) without re-architecture.
+---
+## Alternatives Considered
+### Position B (kit lifecycle entirely in flow-agents, defer generic Flow ops)
+Rejected. Defensible short-term (flow-agents is the only layer that comprehends a whole kit today) but causes a double-migration; the agent-blind guardrail makes C's genericity real now, removing B's main justification.
+### Position A (container ops = Flow, runtime ops = flow-agents, by category)
+Rejected earlier in the discussion — kits are not pure containers, so a category split mislocates install/inspect.
+---
+## References
+- [ADR 0007: Flow / Skill / Kit / Tool Boundary](./0007-flow-skill-kit-tool-boundary.md) — the skill/tool boundary; same conversation.
+- GitHub #62 (Builder Kit skill placement), #50 / #79 (marketplace / trust layer), #52 / #60 (agentless gate-eval proving Flow Definitions are agentless-capable).
+- kontourai/flow container spec (flow PR #67) — establishes Flow owns the container contract + `validateKitContainer`.

package/docs/context-map.md CHANGED

@@ -65,18 +65,18 @@ Primary tools: `npm run workflow:sidecar`, `npm run workflow:validate-artifacts`
 | Skill | Source | When To Load |
 | --- | --- | --- |
-| deliver | skills/deliver/SKILL.md | Delivery workflow — selected work to delivered code. Ensures pull-work + pickup-probe preflight, then chains plan-work → execute-plan → review-work → verify-work → loop on failure without requiring user interaction between cleanly determ... |
-| evidence-gate | skills/evidence-gate/SKILL.md | Evaluate whether completed work is trustworthy enough for human review, merge, or release. Use after implementation, verify-work, provider checks, CI, or remediation to map acceptance criteria to evidence, inspect scope integrity, classi... |
-| execute-plan | skills/execute-plan/SKILL.md | Parallel execution primitive — plan artifact path to implemented code via tool-worker (x4). Reads plan directly. Updates session file between waves. |
-| fix-bug | skills/fix-bug/SKILL.md | Bug fix orchestrator — diagnose → plan-work → execute-plan → review-work → verify-work → loop. Diagnosis phase is unique to bugs, then chains the same primitives. |
-| idea-to-backlog | skills/idea-to-backlog/SKILL.md | Turn raw product or technical ideas into shaped, prioritized, executable GitHub issue backlog. Use for idea intake, ideation, product shaping, spike/prototype decisions, PRD-like feature briefs, prioritization, and backlog creation befor... |
-| learning-review | skills/learning-review/SKILL.md | Capture post-merge, post-deploy, or post-incident learnings and feed them back into backlog, workflow skills, tests, docs, or knowledge. Use after release readiness, post-deploy checks, retrospectives, failed gates, or repeated workflow... |
-| plan-work | skills/plan-work/SKILL.md | Code planning primitive — goal + directory to structured execution plan. Delegates to tool-planner. No resume, no ideation. |
-| pull-work | skills/pull-work/SKILL.md | Select ready GitHub issues from the executable backlog and prepare them for implementation. Use when choosing what to work on next, reviewing a kanban-style issue board, enforcing WIP limits, grouping issues, deciding worktree isolation,... |
-| release-readiness | skills/release-readiness/SKILL.md | Decide whether evidence-backed work is ready to merge, release, deploy, or hold. Use after evidence-gate PASS, before merge/release/deploy, and for post-deploy verification planning. |
-| review-work | skills/review-work/SKILL.md | Review primitive - run report-only code, security, dependency, architecture/standards, and IaC/policy critique before verification; records findings through the critique artifact/sink, currently critique.json locally. |
-| tdd-workflow | skills/tdd-workflow/SKILL.md | Test-driven development — RED → GREEN → REFACTOR with git checkpoints. Wraps plan-work → execute-plan → review-work → verify-work with test-first constraints and coverage gates. |
-| verify-work | skills/verify-work/SKILL.md | Verification primitive — session file path to structured evidence verdict via tool-verifier + tool-playwright. Reads acceptance criteria from plan artifact. |
+| deliver | kits/builder/skills/deliver/SKILL.md | Delivery workflow — selected work to delivered code. Ensures pull-work + pickup-probe preflight, then chains plan-work → execute-plan → review-work → verify-work → loop on failure without requiring user interaction between cleanly determ... |
+| evidence-gate | kits/builder/skills/evidence-gate/SKILL.md | Evaluate whether completed work is trustworthy enough for human review, merge, or release. Use after implementation, verify-work, provider checks, CI, or remediation to map acceptance criteria to evidence, inspect scope integrity, classi... |
+| execute-plan | kits/builder/skills/execute-plan/SKILL.md | Parallel execution primitive — plan artifact path to implemented code via tool-worker (x4). Reads plan directly. Updates session file between waves. |
+| fix-bug | kits/builder/skills/fix-bug/SKILL.md | Bug fix orchestrator — diagnose → plan-work → execute-plan → review-work → verify-work → loop. Diagnosis phase is unique to bugs, then chains the same primitives. |
+| idea-to-backlog | kits/builder/skills/idea-to-backlog/SKILL.md | Turn raw product or technical ideas into shaped, prioritized, executable GitHub issue backlog. Use for idea intake, ideation, product shaping, spike/prototype decisions, PRD-like feature briefs, prioritization, and backlog creation befor... |
+| learning-review | kits/builder/skills/learning-review/SKILL.md | Capture post-merge, post-deploy, or post-incident learnings and feed them back into backlog, workflow skills, tests, docs, or knowledge. Use after release readiness, post-deploy checks, retrospectives, failed gates, or repeated workflow... |
+| plan-work | kits/builder/skills/plan-work/SKILL.md | Code planning primitive — goal + directory to structured execution plan. Delegates to tool-planner. No resume, no ideation. |
+| pull-work | kits/builder/skills/pull-work/SKILL.md | Select ready GitHub issues from the executable backlog and prepare them for implementation. Use when choosing what to work on next, reviewing a kanban-style issue board, enforcing WIP limits, grouping issues, deciding worktree isolation,... |
+| release-readiness | kits/builder/skills/release-readiness/SKILL.md | Decide whether evidence-backed work is ready to merge, release, deploy, or hold. Use after evidence-gate PASS, before merge/release/deploy, and for post-deploy verification planning. |
+| review-work | kits/builder/skills/review-work/SKILL.md | Review primitive - run report-only code, security, dependency, architecture/standards, and IaC/policy critique before verification; records findings through the critique artifact/sink, currently critique.json locally. |
+| tdd-workflow | kits/builder/skills/tdd-workflow/SKILL.md | Test-driven development — RED → GREEN → REFACTOR with git checkpoints. Wraps plan-work → execute-plan → review-work → verify-work with test-first constraints and coverage gates. |
+| verify-work | kits/builder/skills/verify-work/SKILL.md | Verification primitive — session file path to structured evidence verdict via tool-verifier + tool-playwright. Reads acceptance criteria from plan artifact. |
 ## Support Skills
@@ -84,17 +84,13 @@ Primary tools: `npm run workflow:sidecar`, `npm run workflow:validate-artifacts`
 | --- | --- | --- |
 | agentic-engineering | skills/agentic-engineering/SKILL.md | Eval-first execution, task decomposition, and cost-aware model routing for AI-driven development workflows. |
 | browser-test | skills/browser-test/SKILL.md | Headless browser automation via Playwright — screenshots, accessibility checks, form filling, UI testing, DOM inspection. |
-| builder-shape | skills/builder-shape/SKILL.md | Invoke Builder Kit shape from a raw idea or the current conversation context without requiring the user to name idea-to-backlog. Delegates shaping to idea-to-backlog, records the Builder Kit Flow Definition link, and stops at the backlog... |
-| context-budget | skills/context-budget/SKILL.md | Audit token overhead across Flow Agents bundles — agent specs, skills, context files, MCP servers. Produces budget report with per-component breakdown and optimization suggestions. |
+| builder-shape | kits/builder/skills/builder-shape/SKILL.md | Invoke Builder Kit shape from a raw idea or the current conversation context without requiring the user to name idea-to-backlog. Delegates shaping to idea-to-backlog, records the Builder Kit Flow Definition link, and stops at the backlog... |
 | dependency-update | skills/dependency-update/SKILL.md | Analyze and upgrade project dependencies — latest versions, security vulnerabilities, actionable update plan across all package managers. |
-| design-probe | skills/design-probe/SKILL.md | Generic one-question-at-a-time design probing interview for turning unclear goals, designs, or workflow states into shared understanding before planning or execution. |
+| design-probe | kits/builder/skills/design-probe/SKILL.md | Generic one-question-at-a-time design probing interview for turning unclear goals, designs, or workflow states into shared understanding before planning or execution. |
 | eval-rebuild | skills/eval-rebuild/SKILL.md | Project-specific build and install commands for the eval feedback loop. Injected into eval-builder agent. Replace this skill for different build systems. |
-| explore | skills/explore/SKILL.md | Parallel codebase exploration — fans out subagents to map structure, entry points, dependencies, patterns, config, and tests in one pass. |
-| feedback-loop | skills/feedback-loop/SKILL.md | Verify implementation actually works. Visual changes → Playwright; integration changes → commands/tests. Run after completing builds. |
-| frontend-design | skills/frontend-design/SKILL.md | Create distinctive, production-grade frontend interfaces with high design quality. Use this skill when the user asks to build web components, pages, or applications. Generates creative, polished code that avoids generic AI aesthetics. |
 | github-cli | skills/github-cli/SKILL.md | Interact with GitHub via gh CLI — PRs, issues, repos, releases, workflows, gists. |
-| knowledge-capture | skills/knowledge-capture/SKILL.md | Save durable knowledge, lightweight pointers, user corrections, decisions, lessons, relationship context, or source references into the knowledge base. Use when the user says save, remember, capture, file this, bookmark context, or when... |
-| pickup-probe | skills/pickup-probe/SKILL.md | Builder Kit work-item/docs/provider-grounded Probe specialization used at the design-probe flow step before plan-work. |
+| knowledge-capture | kits/knowledge/skills/knowledge-capture/SKILL.md | Save durable knowledge, lightweight pointers, user corrections, decisions, lessons, relationship context, or source references into the knowledge base. Use when the user says save, remember, capture, file this, bookmark context, or when... |
+| pickup-probe | kits/builder/skills/pickup-probe/SKILL.md | Builder Kit work-item/docs/provider-grounded Probe specialization used at the design-probe flow step before plan-work. |
 | search-first | skills/search-first/SKILL.md | Research-before-coding workflow. Search for existing tools, libraries, and patterns before writing custom code. |
 ## Agents
@@ -129,8 +125,8 @@ Pack composition is defined in `packaging/packs.json`. The current builder expor
 | Pack | Default | Skills | Agents | Powers | Purpose |
 | --- | --- | --- | --- | --- | --- |
-| core | yes | 9 | 5 | 1 | Small default surface for reliable coding and workflow execution. |
-| development | no | 17 | 9 | 1 | Development workflow depth for backlog, release, dependency, GitHub, TDD, and frontend work. |
+| core | yes | 2 | 5 | 1 | Small default surface for reliable coding and workflow execution. |
+| development | no | 4 | 9 | 1 | Development workflow depth for backlog, release, dependency, GitHub, TDD, and frontend work. |
 ## Current Workflow State

package/docs/flow-kit-repository-contract.md CHANGED

@@ -23,11 +23,11 @@ npm run validate:source --
 Installed Flow Agents bundles include a local-only install command for Flow Kit repositories that already exist on disk:
 ```bash
-npm run flow-kit -- install-local path/to/local-kit --dest /path/to/installed-flow-agents
-npm run flow-kit -- list --dest /path/to/installed-flow-agents
-npm run flow-kit -- status --dest /path/to/installed-flow-agents
-npm run flow-kit -- status example-kit --dest /path/to/installed-flow-agents
-npm run flow-kit -- activate --dest /path/to/installed-flow-agents --format json
+npm run kit -- install path/to/local-kit --dest /path/to/installed-flow-agents
+npm run kit -- list --dest /path/to/installed-flow-agents
+npm run kit -- status --dest /path/to/installed-flow-agents
+npm run kit -- status example-kit --dest /path/to/installed-flow-agents
+npm run kit -- activate --dest /path/to/installed-flow-agents --format json
 ```
 `--dest` is the installed bundle or workspace root. When omitted, the command uses the current working directory. Tests and automation should pass a temp destination; the command does not need to write to a user home directory.
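The `--dest` fallback described above amounts to a small resolution rule. A minimal sketch, assuming a hand-rolled argument scan: `resolveDest` is a hypothetical helper, and the real kit CLI's parsing is not shown in this document and may accept other forms.

```typescript
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical helper: take the value after `--dest`, or fall back to cwd.
function resolveDest(argv: string[], cwd: string): string {
  const i = argv.indexOf("--dest");
  if (i === -1) return cwd; // omitted: use the current working directory
  const value = argv[i + 1];
  if (value === undefined || value.startsWith("--")) {
    throw new Error("--dest requires a path");
  }
  // Relative paths resolve against cwd; absolute paths pass through normalized.
  return path.resolve(cwd, value);
}

// Tests and automation can point at a temp root instead of a user home directory.
const tempRoot = path.join(os.tmpdir(), "flow-agents-install");
console.log(resolveDest(["list", "--dest", tempRoot], process.cwd()));
console.log(resolveDest(["status", "example-kit"], process.cwd()));
```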