npm - @kontourai/flow-agents - Versions diffs - 0.1.1 - Mend

@kontourai/flow-agents 0.1.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (418)

package/.githooks/pre-push +11 -0
package/.github/workflows/ci.yml +210 -0
package/.github/workflows/docs-pages.yml +52 -0
package/.github/workflows/publish-npm.yml +104 -0
package/AGENTS.md +26 -0
package/CHANGELOG.md +66 -0
package/CODE_OF_CONDUCT.md +25 -0
package/CONTEXT.md +300 -0
package/CONTRIBUTING.md +44 -0
package/LICENSE +201 -0
package/README.md +129 -0
package/SECURITY.md +33 -0
package/agent-cards/dev.json +19 -0
package/agents/dev.json +127 -0
package/agents/tool-code-reviewer.json +61 -0
package/agents/tool-dependencies-updater.json +118 -0
package/agents/tool-explore-config.json +92 -0
package/agents/tool-explore-deps.json +92 -0
package/agents/tool-explore-entry.json +92 -0
package/agents/tool-explore-patterns.json +92 -0
package/agents/tool-explore-structure.json +92 -0
package/agents/tool-explore-tests.json +92 -0
package/agents/tool-planner.json +57 -0
package/agents/tool-playwright.json +145 -0
package/agents/tool-security-reviewer.json +56 -0
package/agents/tool-verifier.json +61 -0
package/agents/tool-worker.json +58 -0
package/build/src/cli/console-learning-projection.js +123 -0
package/build/src/cli/docs-preview.js +39 -0
package/build/src/cli/effective-backlog-settings.js +102 -0
package/build/src/cli/export-bookmarks.js +38 -0
package/build/src/cli/fixture-retirement-audit.js +140 -0
package/build/src/cli/flow-kit.js +138 -0
package/build/src/cli/import-bookmarks.js +50 -0
package/build/src/cli/init.js +239 -0
package/build/src/cli/instinct-cli.js +93 -0
package/build/src/cli/promote-workflow-artifact.js +63 -0
package/build/src/cli/publish-change-helper.js +154 -0
package/build/src/cli/pull-work-provider.js +469 -0
package/build/src/cli/runtime-adapter.js +23 -0
package/build/src/cli/telemetry-doctor.js +221 -0
package/build/src/cli/usage-feedback.js +443 -0
package/build/src/cli/validate-hook-influence.js +152 -0
package/build/src/cli/validate-source-tree.js +31 -0
package/build/src/cli/validate-workflow-artifacts.js +486 -0
package/build/src/cli/veritas-governance.js +262 -0
package/build/src/cli/workflow-artifact-cleanup-audit.js +272 -0
package/build/src/cli/workflow-sidecar.js +816 -0
package/build/src/cli.js +89 -0
package/build/src/flow-kit/validate.js +75 -0
package/build/src/lib/args.js +45 -0
package/build/src/lib/fs.js +62 -0
package/build/src/lib/workflow-learning-projection.js +334 -0
package/build/src/runtime-adapters.js +146 -0
package/build/src/tools/build-universal-bundles.js +397 -0
package/build/src/tools/common.js +56 -0
package/build/src/tools/filter-installed-packs.js +132 -0
package/build/src/tools/generate-context-map.js +198 -0
package/build/src/tools/validate-package.js +64 -0
package/build/src/tools/validate-source-tree.js +622 -0
package/console.telemetry.json +176 -0
package/context/base-rules.md +17 -0
package/context/code-review-standards.md +62 -0
package/context/coding-standards.md +42 -0
package/context/common/orchestrators.md +12 -0
package/context/common/subagents.md +28 -0
package/context/contracts/artifact-contract.md +182 -0
package/context/contracts/builder-kit-workflow-state-contract.md +319 -0
package/context/contracts/delivery-contract.md +69 -0
package/context/contracts/execution-contract.md +53 -0
package/context/contracts/governance-adapter-contract.md +67 -0
package/context/contracts/planning-contract.md +85 -0
package/context/contracts/review-contract.md +104 -0
package/context/contracts/sandbox-policy.md +52 -0
package/context/contracts/verification-contract.md +134 -0
package/context/contracts/work-item-contract.md +215 -0
package/context/deferred/demo-mode.md +33 -0
package/context/deferred/languages/go.md +31 -0
package/context/deferred/languages/python.md +31 -0
package/context/deferred/languages/typescript.md +34 -0
package/context/deferred/parallelization.md +35 -0
package/context/deferred/worktree-isolation.md +24 -0
package/context/development-workflow.md +50 -0
package/context/scripts/context-budget/budget-scan.sh +166 -0
package/context/scripts/detect-tools.sh +3 -0
package/context/scripts/discover-agents.sh +28 -0
package/context/scripts/git-status.sh +49 -0
package/context/scripts/hooks/config-protection.js +79 -0
package/context/scripts/hooks/desktop-notify.sh +39 -0
package/context/scripts/hooks/governance-audit.sh +135 -0
package/context/scripts/hooks/lib/audit-transport.sh +40 -0
package/context/scripts/hooks/lib/hook-flags.js +49 -0
package/context/scripts/hooks/lib/patterns.sh +57 -0
package/context/scripts/hooks/lib/resolve-formatter.js +80 -0
package/context/scripts/hooks/post-edit-accumulator.js +66 -0
package/context/scripts/hooks/pre-commit-quality.js +194 -0
package/context/scripts/hooks/quality-gate.js +93 -0
package/context/scripts/hooks/report-only-guard.js +21 -0
package/context/scripts/hooks/run-hook.js +136 -0
package/context/scripts/hooks/stop-format-typecheck.js +141 -0
package/context/scripts/hooks/stop-goal-fit.js +337 -0
package/context/scripts/hooks/workflow-steering.js +250 -0
package/context/scripts/telemetry/console-presets.sh +14 -0
package/context/scripts/telemetry/install-console-config.sh +214 -0
package/context/scripts/telemetry/lib/config.sh +85 -0
package/context/scripts/telemetry/lib/enrich.sh +115 -0
package/context/scripts/telemetry/lib/redact.sh +22 -0
package/context/scripts/telemetry/lib/session.sh +63 -0
package/context/scripts/telemetry/lib/transport.sh +183 -0
package/context/scripts/telemetry/lib/usage.sh +29 -0
package/context/scripts/telemetry/sync-agents.sh +173 -0
package/context/scripts/telemetry/telemetry.conf +23 -0
package/context/scripts/telemetry/telemetry.sh +387 -0
package/context/scripts/validate-package.sh +89 -0
package/context/settings/backlog-provider-settings.json +54 -0
package/context/templates/core/identity.md +26 -0
package/context/templates/core/user.md +15 -0
package/docs/_config.yml +15 -0
package/docs/_layouts/default.html +87 -0
package/docs/adr/0001-flow-agents-consumes-flow.md +77 -0
package/docs/adr/0002-flow-kits-as-extension-unit.md +13 -0
package/docs/adr/0003-flow-agents-coordinates-kits-and-adapters.md +13 -0
package/docs/adr/0004-gates-expect-surface-claims.md +15 -0
package/docs/adr/0005-kubernetes-inspired-resource-contracts.md +48 -0
package/docs/adr/0006-typescript-first-source-policy.md +98 -0
package/docs/agent-system-guidebook.md +391 -0
package/docs/agent-usage-feedback-loop.md +351 -0
package/docs/assets/favicon.svg +13 -0
package/docs/assets/og-image.png +0 -0
package/docs/assets/site.css +774 -0
package/docs/assets/site.js +139 -0
package/docs/configurable-workflow-routing.md +174 -0
package/docs/context-map.md +145 -0
package/docs/developer-architecture.md +145 -0
package/docs/developer-hook-setup.md +61 -0
package/docs/fixture-ownership.md +44 -0
package/docs/flow-kit-repository-contract.md +180 -0
package/docs/index.md +129 -0
package/docs/kontour-resource-contract.md +358 -0
package/docs/migrations.md +64 -0
package/docs/north-star.md +322 -0
package/docs/operating-layers.md +110 -0
package/docs/repository-structure.md +132 -0
package/docs/sandbox-policy.md +56 -0
package/docs/skills-map.md +203 -0
package/docs/standards-register.md +96 -0
package/docs/veritas-integration.md +165 -0
package/docs/work-item-adapters.md +72 -0
package/docs/workflow-artifact-lifecycle.md +141 -0
package/docs/workflow-eval-strategy.md +295 -0
package/docs/workflow-shared-contracts.md +51 -0
package/docs/workflow-usage-guide.md +443 -0
package/evals/ARCHITECTURE.md +143 -0
package/evals/CONVENTIONS.md +58 -0
package/evals/README.md +128 -0
package/evals/acceptance/run.sh +29 -0
package/evals/acceptance/test_claude_harness.sh +242 -0
package/evals/acceptance/test_codex_harness.sh +108 -0
package/evals/acceptance/test_kiro_harness.sh +128 -0
package/evals/cases/dev/404.html +97 -0
package/evals/cases/dev/code-review.yaml +44 -0
package/evals/cases/dev/dashboard.html +300 -0
package/evals/cases/dev/deliver.yaml +66 -0
package/evals/cases/dev/dependency-update.yaml +16 -0
package/evals/cases/dev/explore.yaml +20 -0
package/evals/cases/dev/index.html +370 -0
package/evals/cases/dev/package-lock.json +28 -0
package/evals/cases/dev/package.json +16 -0
package/evals/cases/dev/plan-work.yaml +20 -0
package/evals/cases/dev/promptfooconfig.yaml +666 -0
package/evals/cases/dev/search-first.yaml +20 -0
package/evals/cases/dev/tdd-workflow.yaml +48 -0
package/evals/cases/dev/verify-work.yaml +44 -0
package/evals/cases/dev/workflow.yaml +34 -0
package/evals/ci/run-baseline.sh +283 -0
package/evals/fixtures/backlog-provider-settings/global-default.json +44 -0
package/evals/fixtures/backlog-provider-settings/project-override.json +53 -0
package/evals/fixtures/builder-kit-workflow-state/baseline-freshness-resolution-hint.json +139 -0
package/evals/fixtures/builder-kit-workflow-state/direct-primitive-stop.json +59 -0
package/evals/fixtures/builder-kit-workflow-state/empty-board-route-shape.json +55 -0
package/evals/fixtures/builder-kit-workflow-state/happy-path.json +71 -0
package/evals/fixtures/builder-kit-workflow-state/mid-work-resume.json +80 -0
package/evals/fixtures/builder-kit-workflow-state/missing-prestep-recovery.json +65 -0
package/evals/fixtures/builder-kit-workflow-state/product-build-chaining.json +60 -0
package/evals/fixtures/builder-kit-workflow-state/stale-continuation-requires-new-probe.json +57 -0
package/evals/fixtures/console-learning-projection/artifacts/console-learning-correction/learning.json +50 -0
package/evals/fixtures/console-learning-projection/artifacts/console-learning-open-route/learning.json +41 -0
package/evals/fixtures/flow-kit-repository/invalid-absolute-path/kit.json +8 -0
package/evals/fixtures/flow-kit-repository/invalid-asset-section/flows/review.flow.json +6 -0
package/evals/fixtures/flow-kit-repository/invalid-asset-section/kit.json +11 -0
package/evals/fixtures/flow-kit-repository/invalid-duplicate-flow/flows/review.flow.json +6 -0
package/evals/fixtures/flow-kit-repository/invalid-duplicate-flow/kit.json +9 -0
package/evals/fixtures/flow-kit-repository/invalid-id/flows/review.flow.json +6 -0
package/evals/fixtures/flow-kit-repository/invalid-id/kit.json +8 -0
package/evals/fixtures/flow-kit-repository/invalid-malformed-json/kit.json +8 -0
package/evals/fixtures/flow-kit-repository/invalid-missing-flow/kit.json +8 -0
package/evals/fixtures/flow-kit-repository/invalid-missing-id/flows/review.flow.json +6 -0
package/evals/fixtures/flow-kit-repository/invalid-missing-id/kit.json +7 -0
package/evals/fixtures/flow-kit-repository/invalid-missing-schema-version/flows/review.flow.json +6 -0
package/evals/fixtures/flow-kit-repository/invalid-missing-schema-version/kit.json +7 -0
package/evals/fixtures/flow-kit-repository/invalid-name/flows/review.flow.json +6 -0
package/evals/fixtures/flow-kit-repository/invalid-name/kit.json +8 -0
package/evals/fixtures/flow-kit-repository/invalid-schema-version/flows/review.flow.json +6 -0
package/evals/fixtures/flow-kit-repository/invalid-schema-version/kit.json +8 -0
package/evals/fixtures/flow-kit-repository/invalid-traversal/kit.json +8 -0
package/evals/fixtures/flow-kit-repository/mixed-runtime-kit/adapters/example.json +3 -0
package/evals/fixtures/flow-kit-repository/mixed-runtime-kit/assets/example.txt +1 -0
package/evals/fixtures/flow-kit-repository/mixed-runtime-kit/docs/README.md +3 -0
package/evals/fixtures/flow-kit-repository/mixed-runtime-kit/flows/runtime.flow.json +26 -0
package/evals/fixtures/flow-kit-repository/mixed-runtime-kit/kit-evals/example.json +3 -0
package/evals/fixtures/flow-kit-repository/mixed-runtime-kit/kit-skills/mixed/SKILL.md +3 -0
package/evals/fixtures/flow-kit-repository/mixed-runtime-kit/kit.json +44 -0
package/evals/fixtures/flow-kit-repository/valid-local-kit/docs/README.md +3 -0
package/evals/fixtures/flow-kit-repository/valid-local-kit/flows/review.flow.json +26 -0
package/evals/fixtures/flow-kit-repository/valid-local-kit/kit.json +20 -0
package/evals/fixtures/hook-influence/cases.json +336 -0
package/evals/fixtures/pull-work-provider/github-issues.json +170 -0
package/evals/fixtures/pull-work-wip-shepherding/global-wip-informs.json +43 -0
package/evals/fixtures/pull-work-wip-shepherding/personal-wip-blocks.json +42 -0
package/evals/fixtures/surface-trust/accepted-claim-trust-report.json +31 -0
package/evals/fixtures/surface-trust/artifact-absent.json +19 -0
package/evals/fixtures/surface-trust/integrity-mismatch-trust-report.json +32 -0
package/evals/fixtures/surface-trust/missing-authority-trust-report.json +27 -0
package/evals/fixtures/surface-trust/provider-absent.json +19 -0
package/evals/fixtures/surface-trust/rejected-claim-trust-report.json +30 -0
package/evals/fixtures/surface-trust/stale-claim-trust-snapshot.json +31 -0
package/evals/fixtures/usage-feedback/sample-full.jsonl +11 -0
package/evals/fixtures/usage-feedback/sample-outcomes.jsonl +1 -0
package/evals/fixtures/veritas-governance-adapter/fake-veritas-pass.sh +18 -0
package/evals/fixtures/veritas-governance-adapter/fake-veritas-secret-fail.sh +10 -0
package/evals/fixtures/veritas-governance-adapter/fake-veritas-unconfigured.sh +4 -0
package/evals/integration/test_bundle_install.sh +541 -0
package/evals/integration/test_console_learning_projection.sh +192 -0
package/evals/integration/test_context_map.sh +65 -0
package/evals/integration/test_effective_backlog_settings.sh +58 -0
package/evals/integration/test_fixture_retirement_audit.sh +58 -0
package/evals/integration/test_flow_agents_statusline.sh +93 -0
package/evals/integration/test_flow_kit_repository.sh +90 -0
package/evals/integration/test_goal_fit_hook.sh +482 -0
package/evals/integration/test_hook_category_behaviors.sh +190 -0
package/evals/integration/test_hook_influence_cases.sh +69 -0
package/evals/integration/test_local_flow_kit_install.sh +145 -0
package/evals/integration/test_publish_change_helper.sh +176 -0
package/evals/integration/test_pull_work_provider.sh +140 -0
package/evals/integration/test_runtime_adapter_activation.sh +106 -0
package/evals/integration/test_telemetry.sh +485 -0
package/evals/integration/test_telemetry_doctor.sh +193 -0
package/evals/integration/test_usage_feedback_dashboard.sh +169 -0
package/evals/integration/test_usage_feedback_global.sh +117 -0
package/evals/integration/test_usage_feedback_import.sh +227 -0
package/evals/integration/test_usage_feedback_outcomes.sh +165 -0
package/evals/integration/test_usage_feedback_report.sh +263 -0
package/evals/integration/test_veritas_governance_adapter.sh +235 -0
package/evals/integration/test_workflow_artifact_cleanup_audit.sh +287 -0
package/evals/integration/test_workflow_artifacts.sh +1247 -0
package/evals/integration/test_workflow_sidecar_writer.sh +2112 -0
package/evals/integration/test_workflow_steering_hook.sh +337 -0
package/evals/lib/assertions/delegated-to.js +40 -0
package/evals/lib/assertions/max-tool-calls.js +15 -0
package/evals/lib/assertions/no-write-tools.js +27 -0
package/evals/lib/assertions/pass-at-k.js +39 -0
package/evals/lib/assertions/telemetry-utils.js +105 -0
package/evals/lib/assertions/tool-called.js +39 -0
package/evals/lib/assertions/verify-after-fix.js +61 -0
package/evals/lib/claude-judge.sh +40 -0
package/evals/lib/claude-provider.sh +74 -0
package/evals/lib/codex-judge.sh +39 -0
package/evals/lib/codex-provider.sh +81 -0
package/evals/lib/eval-dev.sh +5 -0
package/evals/lib/eval-judge.sh +22 -0
package/evals/lib/eval-provider.sh +26 -0
package/evals/lib/eval-report.sh +73 -0
package/evals/lib/kiro-dev.sh +4 -0
package/evals/lib/kiro-judge.sh +17 -0
package/evals/lib/kiro-provider.sh +62 -0
package/evals/lib/node.sh +111 -0
package/evals/promptfooconfig.yaml +70 -0
package/evals/run.sh +309 -0
package/evals/static/test_evidence_refs.sh +141 -0
package/evals/static/test_package.sh +407 -0
package/evals/static/test_repo_hooks.sh +68 -0
package/evals/static/test_universal_bundles.sh +274 -0
package/evals/static/test_workflow_skills.sh +1207 -0
package/install.sh +64 -0
package/integrations/veritas/flow-agents.adapter.json +138 -0
package/integrations/veritas/flow-agents.authority-settings.json +26 -0
package/integrations/veritas/flow-agents.repo-standards.json +82 -0
package/kits/builder/flows/build.flow.json +218 -0
package/kits/builder/flows/shape.flow.json +127 -0
package/kits/builder/kit.json +19 -0
package/kits/catalog.json +11 -0
package/package.json +130 -0
package/packaging/README.md +60 -0
package/packaging/manifest.json +173 -0
package/packaging/packs.json +69 -0
package/powers/dependency-checker/POWER.md +20 -0
package/powers/dependency-checker/mcp.json +20 -0
package/powers/playwright/POWER.md +25 -0
package/powers/playwright/mcp.json +12 -0
package/prompts/code-audit.md +123 -0
package/prompts/kcommit.md +88 -0
package/schemas/backlog-provider-settings.schema.json +138 -0
package/schemas/workflow-acceptance.schema.json +216 -0
package/schemas/workflow-critique.schema.json +113 -0
package/schemas/workflow-evidence.schema.json +357 -0
package/schemas/workflow-handoff.schema.json +52 -0
package/schemas/workflow-learning.schema.json +223 -0
package/schemas/workflow-release.schema.json +172 -0
package/schemas/workflow-state.schema.json +80 -0
package/scripts/README.md +111 -0
package/scripts/build-universal-bundles.js +3 -0
package/scripts/check-content-boundary.cjs +99 -0
package/scripts/context-budget/budget-scan.sh +166 -0
package/scripts/detect-tools.sh +3 -0
package/scripts/discover-agents.sh +28 -0
package/scripts/effective-backlog-settings.js +2 -0
package/scripts/filter-installed-packs.js +2 -0
package/scripts/flow-kit.js +2 -0
package/scripts/generate-context-map.js +2 -0
package/scripts/git-status.sh +49 -0
package/scripts/hooks/claude-hook-adapter.js +174 -0
package/scripts/hooks/claude-telemetry-hook.js +115 -0
package/scripts/hooks/codex-hook-adapter.js +176 -0
package/scripts/hooks/codex-telemetry-hook.js +95 -0
package/scripts/hooks/config-protection.js +79 -0
package/scripts/hooks/desktop-notify.sh +39 -0
package/scripts/hooks/governance-audit.sh +135 -0
package/scripts/hooks/lib/audit-transport.sh +40 -0
package/scripts/hooks/lib/hook-flags.js +49 -0
package/scripts/hooks/lib/patterns.sh +57 -0
package/scripts/hooks/lib/resolve-formatter.js +80 -0
package/scripts/hooks/post-edit-accumulator.js +66 -0
package/scripts/hooks/pre-commit-quality.js +194 -0
package/scripts/hooks/quality-gate.js +93 -0
package/scripts/hooks/report-only-guard.js +21 -0
package/scripts/hooks/run-hook.js +136 -0
package/scripts/hooks/stop-format-typecheck.js +141 -0
package/scripts/hooks/stop-goal-fit.js +337 -0
package/scripts/hooks/workflow-steering.js +250 -0
package/scripts/install-codex-home.sh +106 -0
package/scripts/package.json +3 -0
package/scripts/promote-workflow-artifact.js +2 -0
package/scripts/publish-change-helper.js +2 -0
package/scripts/pull-work-provider.js +2 -0
package/scripts/setup-repo-hooks.sh +8 -0
package/scripts/statusline/flow-agents-statusline.js +157 -0
package/scripts/telemetry/console-presets.sh +14 -0
package/scripts/telemetry/install-console-config.sh +214 -0
package/scripts/telemetry/lib/config.sh +85 -0
package/scripts/telemetry/lib/enrich.sh +115 -0
package/scripts/telemetry/lib/redact.sh +22 -0
package/scripts/telemetry/lib/session.sh +63 -0
package/scripts/telemetry/lib/transport.sh +183 -0
package/scripts/telemetry/lib/usage.sh +29 -0
package/scripts/telemetry/sync-agents.sh +173 -0
package/scripts/telemetry/telemetry.conf +23 -0
package/scripts/telemetry/telemetry.sh +387 -0
package/scripts/usage-feedback.js +2 -0
package/scripts/validate-hook-influence-cases.js +2 -0
package/scripts/validate-package.sh +89 -0
package/scripts/validate-source-tree.js +9 -0
package/skills/agentic-engineering/SKILL.md +62 -0
package/skills/browser-test/SKILL.md +51 -0
package/skills/builder-shape/SKILL.md +76 -0
package/skills/context-budget/SKILL.md +40 -0
package/skills/deliver/SKILL.md +241 -0
package/skills/dependency-update/SKILL.md +68 -0
package/skills/design-probe/SKILL.md +107 -0
package/skills/eval-rebuild/SKILL.md +39 -0
package/skills/evidence-gate/SKILL.md +186 -0
package/skills/execute-plan/SKILL.md +110 -0
package/skills/explore/SKILL.md +137 -0
package/skills/feedback-loop/SKILL.md +87 -0
package/skills/fix-bug/SKILL.md +133 -0
package/skills/frontend-design/SKILL.md +80 -0
package/skills/github-cli/SKILL.md +63 -0
package/skills/idea-to-backlog/SKILL.md +267 -0
package/skills/knowledge-capture/SKILL.md +55 -0
package/skills/learning-review/SKILL.md +115 -0
package/skills/pickup-probe/SKILL.md +114 -0
package/skills/plan-work/SKILL.md +176 -0
package/skills/pull-work/SKILL.md +309 -0
package/skills/release-readiness/SKILL.md +121 -0
package/skills/review-work/SKILL.md +161 -0
package/skills/search-first/SKILL.md +66 -0
package/skills/tdd-workflow/SKILL.md +140 -0
package/skills/verify-work/SKILL.md +109 -0
package/src/cli/console-learning-projection.ts +140 -0
package/src/cli/effective-backlog-settings.ts +99 -0
package/src/cli/fixture-retirement-audit.ts +154 -0
package/src/cli/flow-kit.ts +139 -0
package/src/cli/init.ts +248 -0
package/src/cli/promote-workflow-artifact.ts +64 -0
package/src/cli/publish-change-helper.ts +143 -0
package/src/cli/pull-work-provider.ts +481 -0
package/src/cli/runtime-adapter.ts +24 -0
package/src/cli/telemetry-doctor.ts +243 -0
package/src/cli/usage-feedback.ts +418 -0
package/src/cli/validate-hook-influence.ts +119 -0
package/src/cli/validate-source-tree.ts +30 -0
package/src/cli/validate-workflow-artifacts.ts +411 -0
package/src/cli/veritas-governance.ts +322 -0
package/src/cli/workflow-artifact-cleanup-audit.ts +281 -0
package/src/cli/workflow-sidecar.ts +676 -0
package/src/cli.ts +95 -0
package/src/flow-kit/validate.ts +74 -0
package/src/lib/args.ts +43 -0
package/src/lib/fs.ts +62 -0
package/src/lib/workflow-learning-projection.ts +491 -0
package/src/runtime-adapters.ts +154 -0
package/src/tools/build-universal-bundles.ts +366 -0
package/src/tools/common.ts +61 -0
package/src/tools/filter-installed-packs.ts +129 -0
package/src/tools/generate-context-map.ts +199 -0
package/src/tools/validate-package.ts +57 -0
package/src/tools/validate-source-tree.ts +488 -0
package/tsconfig.json +19 -0
package/veritas.claims.json +6 -0

package/docs/north-star.md ADDED

@@ -0,0 +1,322 @@
+---
+title: Flow Agents North Star
+---
+# Flow Agents North Star
+Flow Agents is the agent-facing vertical of Kontour Flow. It makes agents more reliable than they are out of the box by surrounding them with just-in-time guidance, scoped capabilities, durable context, Flow-backed workflow enforcement, evidence gates, and self-improving feedback loops.
+The long-term goal is not to build another agent runtime, coding assistant, workflow engine, or orchestration control plane. Flow Agents should compose open standards, Kontour Flow, Kontour Veritas, and portable runtime conventions into a coherent system that works across coding, knowledge work, meetings, sales contexts, research, operations, and personal productivity.
+## Product Promise
+Flow Agents should help an agent do the right thing even when:
+- the context window is crowded
+- the conversation has drifted
+- the user request is underspecified
+- the agent is overconfident
+- tools are numerous or risky
+- prior work needs to be resumed
+- verification evidence is missing
+- the user does not know which specialized workflow to invoke
+The system earns trust by reducing the amount of agent behavior that depends on a perfect prompt, perfect memory, or perfect model output.
+## Design Principles
+### Standards First
+Use existing standards before inventing new formats:
+- `AGENTS.md` for project instructions
+- Agent Skills / `SKILL.md` for reusable capabilities
+- MCP for tools, resources, prompts, and integrations
+- OpenAPI for HTTP APIs
+- OAuth/OIDC for delegated access
+- JSON Schema for Flow Agents-owned machine-readable artifacts
+- OpenTelemetry GenAI conventions for traces and metrics
+- SARIF for code, security, and review findings
+- CycloneDX and SLSA for supply chain and provenance workflows
+- iCalendar, JSContact, JMAP, WebVTT/SRT, CommonMark, and JSON-LD where they fit knowledge and communication workflows
+Flow Agents should only invent a format when no durable standard or Kontour foundation product fits. Generic process enforcement belongs in Flow. Repo-local development governance belongs in Veritas. Any Flow Agents-owned format must be small, schema-described, human-inspectable, versionable, and exportable.
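The sketch below makes the format requirements above concrete. It defines a small, versioned, human-inspectable record that round-trips through JSON. It assumes a hypothetical `HandoffRecord` shape with `schema_version`, `summary`, and `next_steps` fields, plus a hypothetical `SUPPORTED_SCHEMA_VERSIONS` set. It is not the shape of the package's `schemas/workflow-handoff.schema.json`. A real Flow Agents-owned format would be described by a JSON Schema; this sketch hand-checks similar constraints only to stay self-contained.

```typescript
// Hypothetical record type; field names are illustrative, not the real schema.
interface HandoffRecord {
  schema_version: string; // versionable: readers check this before anything else
  summary: string; // human-inspectable: plain text rather than opaque blobs
  next_steps: string[];
}

// Versions this reader understands (assumed value).
const SUPPORTED_SCHEMA_VERSIONS = new Set<string>(["1.0"]);

// Parse untrusted JSON, rejecting unknown versions and malformed fields instead of guessing.
function parseHandoff(json: string): HandoffRecord {
  const data: unknown = JSON.parse(json);
  if (typeof data !== "object" || data === null || Array.isArray(data)) {
    throw new Error("handoff must be a JSON object");
  }
  const rec = data as Record<string, unknown>;
  const version = rec["schema_version"];
  if (typeof version !== "string" || !SUPPORTED_SCHEMA_VERSIONS.has(version)) {
    throw new Error(`unsupported schema_version: ${String(version)}`);
  }
  const summary = rec["summary"];
  if (typeof summary !== "string") throw new Error("summary must be a string");
  const steps = rec["next_steps"];
  if (!Array.isArray(steps) || !steps.every((s) => typeof s === "string")) {
    throw new Error("next_steps must be an array of strings");
  }
  return { schema_version: version, summary, next_steps: steps as string[] };
}

// Exportable: pretty-printed JSON with a trailing newline diffs cleanly in review.
function exportHandoff(rec: HandoffRecord): string {
  return JSON.stringify(rec, null, 2) + "\n";
}
```

Rejecting an unknown `schema_version` outright is deliberate. A reader that guesses at a newer format can silently drop fields it does not understand.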
+### Progressive Disclosure
+Do not load the whole operating manual into every session.
+Flow Agents should expose small discovery metadata first, then load guidance only when it is useful. Skills, powers, workflow contracts, context packs, and references should be activated just in time.
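The next sketch shows progressive disclosure for skills. It assumes each `SKILL.md` starts with frontmatter carrying `name` and `description`, as in the Agent Skills convention listed above. The `SkillRegistry` class and `readFrontmatter` helper are hypothetical. The frontmatter reader handles only flat `key: value` lines and is not a YAML parser. Only the metadata returned by `discover()` would go into context up front; `activate()` supplies the full body when a skill is actually needed.

```typescript
// Discovery metadata: the only part placed in context up front.
interface SkillMeta {
  name: string;
  description: string;
}

// Minimal frontmatter reader for flat `key: value` lines between leading `---` fences.
// A real implementation would use a proper YAML parser.
function readFrontmatter(skillMd: string): SkillMeta {
  const match = /^---\r?\n([\s\S]*?)\r?\n---/.exec(skillMd);
  if (!match) throw new Error("SKILL.md is missing frontmatter");
  const fields = new Map<string, string>();
  for (const line of (match[1] ?? "").split(/\r?\n/)) {
    const idx = line.indexOf(":");
    if (idx > 0) fields.set(line.slice(0, idx).trim(), line.slice(idx + 1).trim());
  }
  const name = fields.get("name");
  const description = fields.get("description");
  if (!name || !description) throw new Error("frontmatter needs name and description");
  return { name, description };
}

// Hypothetical registry: indexes metadata eagerly, returns full bodies only on activation.
class SkillRegistry {
  private readonly sources = new Map<string, string>();
  private readonly index: SkillMeta[] = [];

  register(skillMd: string): void {
    const meta = readFrontmatter(skillMd);
    this.sources.set(meta.name, skillMd);
    this.index.push(meta);
  }

  // Cheap: one name and one-line description per skill.
  discover(): SkillMeta[] {
    return this.index.slice();
  }

  // Expensive: the full procedure, loaded just in time (frontmatter stripped).
  activate(name: string): string {
    const source = this.sources.get(name);
    if (source === undefined) throw new Error(`unknown skill: ${name}`);
    return source.replace(/^---\r?\n[\s\S]*?\r?\n---\r?\n?/, "");
  }
}
```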
+### Reliability Over Ceremony
+Workflow gates are valuable only when they improve outcomes.
+Small tasks should stay lightweight. Larger or riskier tasks should gain more structure: planning, acceptance criteria, sandbox decisions, verification, Flow gate evidence, Veritas repo readiness when relevant, release checks, and learning capture.
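One way to express "reliability over ceremony" in code is a hypothetical `gatesFor` function. It starts every task with verification and adds gates only as size and risk grow. The `TaskProfile` fields, the gate names, and the five-file threshold are illustrative assumptions, not Flow Agents settings.

```typescript
// Hypothetical task profile; the threshold below is illustrative, not a tuned value.
interface TaskProfile {
  filesTouched: number;
  riskyTools: boolean; // e.g. deploys, credentials, destructive commands
  isRelease: boolean;
  repoHasVeritas: boolean; // "Veritas repo readiness when relevant"
}

type Gate =
  | "verification"
  | "planning"
  | "acceptance-criteria"
  | "sandbox-decision"
  | "flow-gate-evidence"
  | "veritas-readiness"
  | "release-checks"
  | "learning-capture";

// Small tasks get only verification; structure is added as size and risk grow.
function gatesFor(task: TaskProfile): Gate[] {
  const gates: Gate[] = ["verification"];
  const large = task.filesTouched > 5;
  if (large || task.riskyTools) {
    gates.push("planning", "acceptance-criteria", "flow-gate-evidence", "learning-capture");
  }
  if (task.riskyTools) gates.push("sandbox-decision");
  if ((large || task.isRelease) && task.repoHasVeritas) gates.push("veritas-readiness");
  if (task.isRelease) gates.push("release-checks");
  return gates;
}
```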
+### User Simplicity, System Depth
+The user should not need to know whether the system used a rule, skill, power, subagent, memory lookup, workflow contract, or telemetry artifact.
+Flow Agents should preserve a simple user experience while making the underlying behavior more disciplined.
+### Safe Autonomy
+Autonomy should increase only inside appropriate boundaries.
+Flow Agents should distinguish local read-only work, local edits, git worktrees, containers, cloud sandboxes, and privileged integrations. Risky tools need explicit scope, clear ownership, and evidence that the result was checked.
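These modes are better modeled as capabilities than as a single ladder: a git worktree is more isolated than a local edit, even though both write files. The sketch below assumes hypothetical `ModeCapabilities` flags and a `checkRequest` helper. The helper requires an explicit scope for unisolated writes, and both a scope and an owner for privileged integrations. The authoritative classification lives in `context/contracts/sandbox-policy.md`, which is not reproduced here.

```typescript
// Mode names follow the list above; the capability flags and rules are illustrative assumptions.
type SandboxMode =
  | "local-read-only"
  | "local-edit"
  | "git-worktree"
  | "container"
  | "cloud-sandbox"
  | "privileged-integration";

interface ModeCapabilities {
  writesLocalFiles: boolean;
  isolated: boolean; // changes stay out of the user's main checkout
  touchesExternalSystems: boolean; // e.g. production APIs, inboxes, CRMs
}

const MODES: Record<SandboxMode, ModeCapabilities> = {
  "local-read-only": { writesLocalFiles: false, isolated: false, touchesExternalSystems: false },
  "local-edit": { writesLocalFiles: true, isolated: false, touchesExternalSystems: false },
  "git-worktree": { writesLocalFiles: true, isolated: true, touchesExternalSystems: false },
  container: { writesLocalFiles: true, isolated: true, touchesExternalSystems: false },
  "cloud-sandbox": { writesLocalFiles: true, isolated: true, touchesExternalSystems: false },
  "privileged-integration": { writesLocalFiles: false, isolated: false, touchesExternalSystems: true },
};

interface ToolRequest {
  mode: SandboxMode;
  scope?: string; // explicit boundary, e.g. a path or an account (hypothetical)
  owner?: string; // who is accountable for checking the result
}

// Returns the reasons a request may not proceed yet; an empty array means it may.
function checkRequest(req: ToolRequest): string[] {
  const caps = MODES[req.mode];
  const problems: string[] = [];
  if (caps.writesLocalFiles && !caps.isolated && !req.scope) {
    problems.push("unisolated writes need an explicit scope");
  }
  if (caps.touchesExternalSystems) {
    if (!req.scope) problems.push("privileged integration needs an explicit scope");
    if (!req.owner) problems.push("privileged integration needs an owner");
  }
  return problems;
}
```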
+### Evidence Beats Confidence
+Flow Agents should prefer verifiable evidence over agent self-assessment.
+Important work should end with concrete proof: tests, lint, browser checks, runtime checks, CI, screenshots, trace evidence, review findings, or explicit `NOT_VERIFIED` gaps.
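Here is a sketch of evidence-backed completion, assuming a hypothetical `CriterionResult` record per acceptance criterion. Each criterion needs either passing evidence or an explicit `NOT_VERIFIED` gap. Any failure, or any criterion left unaccounted for, blocks completion. The names `assessCompletion` and `CompletionReport` are illustrative; they do not describe the package's `workflow-evidence.schema.json`.

```typescript
// Hypothetical evidence record; the kinds mirror the list above.
type EvidenceKind = "test" | "lint" | "browser" | "runtime" | "ci" | "screenshot" | "trace" | "review";

type CriterionResult =
  | { criterion: string; status: "PASS" | "FAIL"; kind: EvidenceKind; ref: string }
  | { criterion: string; status: "NOT_VERIFIED"; reason: string };

interface CompletionReport {
  complete: boolean;
  gaps: string[]; // explicit NOT_VERIFIED gaps, surfaced rather than hidden
  failures: string[];
  unaccounted: string[]; // criteria with neither evidence nor a declared gap
}

// Complete only when nothing failed and every criterion has evidence or an explicit gap.
// A confident summary alone never satisfies a criterion.
function assessCompletion(criteria: string[], results: CriterionResult[]): CompletionReport {
  const byCriterion = new Map(results.map((r) => [r.criterion, r] as const));
  const gaps: string[] = [];
  const failures: string[] = [];
  const unaccounted: string[] = [];
  for (const c of criteria) {
    const r = byCriterion.get(c);
    if (!r) unaccounted.push(c);
    else if (r.status === "NOT_VERIFIED") gaps.push(`${c}: ${r.reason}`);
    else if (r.status === "FAIL") failures.push(`${c}: see ${r.ref}`);
  }
  return { complete: failures.length === 0 && unaccounted.length === 0, gaps, failures, unaccounted };
}
```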
+### Learning Loops
+Every repeated failure, correction, successful pattern, or quality outcome should have a path back into the system.
+Learning may update rules, skills, prompts, evals, docs, telemetry dashboards, backlog items, or knowledge notes. The system should improve without relying on hidden memory alone.
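A minimal sketch of the learning loop follows. It assumes feedback events have already been normalized into a hypothetical `signature` key, for example by telemetry or review tooling. `proposeLearnings` counts repeated failures and corrections and emits a reviewable proposal once a signature crosses an illustrative threshold of three. Successes are skipped here for brevity, even though the principle above also covers successful patterns.

```typescript
// Hypothetical targets and event shape; "signature" is an assumed normalized key.
type LearningTarget = "rule" | "skill" | "prompt" | "eval" | "doc" | "backlog" | "knowledge";

interface FeedbackEvent {
  signature: string; // e.g. "skipped-lint-before-commit"
  kind: "failure" | "correction" | "success";
  suggestedTarget: LearningTarget;
}

interface LearningProposal {
  signature: string;
  occurrences: number;
  target: LearningTarget;
}

// Turn repeated failures and corrections into explicit, reviewable proposals
// instead of relying on hidden memory. The default threshold is illustrative.
function proposeLearnings(events: FeedbackEvent[], threshold = 3): LearningProposal[] {
  const counts = new Map<string, { occurrences: number; target: LearningTarget }>();
  for (const e of events) {
    if (e.kind === "success") continue; // successful patterns omitted in this sketch
    const entry = counts.get(e.signature) ?? { occurrences: 0, target: e.suggestedTarget };
    entry.occurrences += 1;
    counts.set(e.signature, entry);
  }
  return Array.from(counts.entries())
    .filter(([, v]) => v.occurrences >= threshold)
    .map(([signature, v]) => ({ signature, occurrences: v.occurrences, target: v.target }))
    .sort((a, b) => b.occurrences - a.occurrences);
}
```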
+## Operating Layers
+Flow Agents should converge toward a small set of clear layers.
+| Layer | Purpose | Preferred Standards |
+| --- | --- | --- |
+| Rules | Persistent guidance and constraints | `AGENTS.md`, Markdown, frontmatter |
+| Skills | Reusable task procedures | Agent Skills / `SKILL.md` |
+| Powers | Tools plus activation guidance | MCP, OpenAPI, OAuth/OIDC |
+| Agents | Specialized roles with scoped tools | Harness-native subagents, A2A where useful |
+| Workflows | Durable state, gates, and handoffs | Kontour Flow, JSON Schema, Markdown summaries |
+| Knowledge | People, orgs, meetings, decisions, notes | JSContact, iCalendar, JMAP, CommonMark, JSON-LD |
+| Evidence | Traces, evals, findings, provenance | OpenTelemetry, SARIF, CycloneDX, SLSA |
+These layers should be understandable independently and composable together.
+## Global-On Behavior
+Flow Agents should be useful as a global companion, not just a project-local coding tool.
+For development work, it should help with exploration, planning, implementation, review, verification, delivery, dependency hygiene, and release confidence.
+For personal productivity, it should help route requests into the right workflow, remember durable preferences in inspectable form, and keep scheduled or recurring work from depending on prompt recall.
+## What Flow Agents Owns
+Flow Agents does not need to own the model, runtime, IDE, agent UI, calendar store, CRM, inbox, workflow engine, process transparency kernel, or repo governance engine.
+Flow Agents owns the glue:
+- discovery of relevant context and capabilities
+- activation of the right guidance at the right time
+- scoped delegation to tools and subagents
+- Flow-backed workflow state and gate enforcement inside agent harnesses
+- evidence-backed completion
+- feedback loops that make the next run better
+- portable exports across agent harnesses
+Flow owns generic process transparency: steps, gates, transitions, Flow Runs, exceptions, continuation, and Flow Reports. Veritas owns repo-local development governance: repo standards, requirements, evidence checks, change guidance, and merge readiness. Flow Agents packages those foundations into useful agent modes, skills, provider settings, runtime adapters, hooks, and Console views.
+## Success Criteria
+Flow Agents is working when:
+- users get better outcomes without writing better prompts
+- agents recover from context drift instead of compounding it
+- workflow state survives long sessions and context compaction
+- tools are available without overwhelming the context window
+- risky work is isolated or gated
+- completed work has evidence, not just a confident summary
+- recurring corrections become system improvements
+- standards-based artifacts can move across Codex, Claude Code, Kiro, and future harnesses
+- Flow and Veritas evidence can be surfaced without making users learn their internal product vocabularies
+The system should feel simple at the surface because the complexity has been organized underneath it.
+## Roadmap
+This roadmap turns the north star above into incremental work.
+The goal is not to add ceremony. The goal is to make agents more reliable while keeping the user experience simple: users ask for outcomes, and Flow Agents supplies the right context, capabilities, tools, checks, and learning loops just in time.
+### Progress Checklist
+| Status | Workstream | Target Outcome |
+| --- | --- | --- |
+| [x] | North star | Durable direction documented in `docs/north-star.md`. |
+| [x] | Layer taxonomy | Repo vocabulary clearly separates rules, skills, powers, agents, workflows, knowledge, and evidence. |
+| [x] | Core vs optional packs | Pack composition manifest exists and generated install scripts support opt-in `FLOW_AGENTS_PACKS` filtering while preserving all-pack installs by default. |
+| [x] | Standards register | Supported standards and Flow Agents-owned formats are documented with adoption rules. |
+| [ ] | Structured workflow state | Draft schemas, contracts, validation, explicit current-session identity, delegation-safe agent event logs, sidecar writer commands, and direct workflow-skill writer instructions exist for state, acceptance, evidence, handoff, critique, release, and learning; automatic enforcement remains partial. |
+| [ ] | Context map | Generated repo/context map exists; workflow steering and the core planner/worker/verifier agents now use it, but coverage for the remaining agents is still open. |
+| [ ] | JIT guidance | Stop hook checks sidecars; workflow steering reads `state.json`, `critique.json`, context-map availability, and high-risk state after non-subagent tools; broader file- and task-aware guidance is still open. |
+| [x] | Sandbox policy | `context/contracts/sandbox-policy.md` and https://github.com/kontourai/flow-agents/blob/main/docs/sandbox-policy.md classify local read-only, local edit, worktree, container, cloud sandbox, and privileged integration modes. |
+| [ ] | Evidence integration | Evidence sidecars now carry `standard_refs` for SARIF, OpenTelemetry, JUnit/TAP, Veritas, and custom proof; a local Veritas readiness wrapper can record native Veritas reports as optional Flow Agents evidence. |
+| [ ] | Feedback loop | Runtime telemetry, outcomes, evals, and recurring corrections feed back into docs, skills, rules, or backlog. |
+| [ ] | Export validation | Codex, Claude Code, and Kiro exports preserve the same operating layers and now install telemetry, Goal Fit, and workflow steering hook wiring; adapter output, installed-command coverage, Claude live hook influence, and Kiro live strict-stop coverage exist. |
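The pack row describes opt-in `FLOW_AGENTS_PACKS` filtering that falls back to an all-pack install. The sketch below shows that selection rule under several assumptions. It treats the variable as a comma-separated list of pack names, and it treats the ids below as mirrors of `packaging/packs.json`. The name `selectPacks`, the list format, and the exact pack ids are illustrative and are not taken from the generated install scripts.

```typescript
// Hypothetical sketch: decide which packs to install from an optional
// FLOW_AGENTS_PACKS value, assumed here to be a comma-separated list.
function selectPacks(available: readonly string[], envValue: string | undefined): string[] {
  const requested = (envValue ?? "")
    .split(",")
    .map((name) => name.trim())
    .filter((name) => name.length > 0);

  // Unset or empty value: preserve the default all-pack install.
  if (requested.length === 0) {
    return [...available];
  }

  // Fail loudly on unknown names instead of silently installing less.
  const unknown = requested.filter((name) => !available.includes(name));
  if (unknown.length > 0) {
    throw new Error(`Unknown pack(s) in FLOW_AGENTS_PACKS: ${unknown.join(", ")}`);
  }

  // Keep manifest order so installs are deterministic.
  return available.filter((name) => requested.includes(name));
}

// Assumed pack ids for the core, development, knowledge, AWS, and experimental packs.
const packs = ["core", "development", "knowledge", "aws", "experimental"];
console.log(selectPacks(packs, undefined)); // all five packs
console.log(selectPacks(packs, "knowledge, core")); // [ 'core', 'knowledge' ]
```

Rejecting unknown names matters here. A silently reduced install is harder to diagnose than an explicit error.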
+### Now / Next / Later
+Use this as the pickup list for future sessions.
+| Priority | Work | Exit Signal |
+| --- | --- | --- |
+| Now | Hook influence evals | `evals/fixtures/hook-influence/cases.json` validates expected agent behavior after hook guidance, and runtime gaps are explicit instead of implied. |
+| Now | Self-validation loop | Each Flow Agents change creates or resumes a workflow artifact, then, once checks and critique are ready, uses `dogfood-pass` to record evidence, critique, state, handoff, and optional learning follow-ups. |
+| Now | Guidebook UX | The GitHub Pages guidebook explains the system with examples, diagrams, and “user says / Flow Agents does” framing. |
+| Now | Veritas spike | Run Veritas readiness through the governance adapter boundary and record native output as Flow Agents evidence without taking a dependency. |
+| Next | Runtime upgrades | Upgrade documented hook-influence gaps when Codex or Kiro expose post-tool hook guidance as model context in live harnesses. |
+| Later | Automatic learning proposals | Detect repeated workflow friction from telemetry/evidence and propose rule, skill, eval, doc, backlog, or knowledge updates. |
+| Later | Broader file-aware JIT guidance | Surface task/file-specific guidance before risky edits, not only after sidecar state indicates a problem. |
+## Phase 1: Clarify The System Shape
+**Purpose:** Make Flow Agents easy to understand before adding more machinery.
+Tasks:
+- Document the public layers: rules, skills, powers, agents, workflows, knowledge, and evidence. **Done:** see https://github.com/kontourai/flow-agents/blob/main/docs/operating-layers.md.
+- Mark which directories are canonical source, generated exports, runtime state, and optional integrations.
+- Decide which workflow skills are part of the core pack and which are optional domain packs. **Started:** `packaging/packs.json` defines core, development, knowledge, AWS, and experimental packs.
+- Add a standards register that lists each external standard, how Flow Agents uses it, and what Flow Agents-owned schemas still exist. **Done:** see https://github.com/kontourai/flow-agents/blob/main/docs/standards-register.md.
+- Add a "do not invent without checking standards" rule to contributor docs.
+Exit criteria:
+- A new contributor can explain where to put persistent guidance, a reusable skill, an MCP integration, workflow state, evidence, or knowledge notes.
+- The default mental model has fewer top-level concepts than the current repo surface.
+## Phase 2: Make Workflow State Durable
+**Purpose:** Preserve reliability when context windows are full, sessions are resumed, or agent output quality drifts.
+Tasks:
+- Define JSON Schemas for workflow state, acceptance criteria, evidence, handoff, critique, release readiness, and learning records. **Done:** draft schemas exist under `schemas/`.
+- Keep Markdown artifacts as human summaries, but make JSON sidecars the machine-readable source for gates.
+- Update `plan-work`, `execute-plan`, `verify-work`, `evidence-gate`, and `release-readiness` to read and update the sidecars. **Started:** `npm run workflow:sidecar --` provides a reusable writer for plan, state, evidence, critique, release, and learning records, and core workflow skills now direct agents to use it when available.
+- Make sidecar writes serialized or conflict-aware so concurrent critique/evidence updates cannot overwrite each other during parallel self-validation. **Started:** `npm run workflow:sidecar --` takes a per-artifact lock around writer commands.
+- Add sidecar validation to `npm run workflow:validate-artifacts --`.
+- Add eval fixtures for context-compaction and long-session recovery.
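The per-artifact lock taken by `npm run workflow:sidecar --` can be illustrated with an exclusive-create lock file. The sketch below is a minimal version under stated assumptions. The lock file name `.sidecar.lock`, the retry policy, and the `evidence.json` shape are hypothetical and are not the tool's actual internals. Stale locks left behind by a crashed writer are also not handled.

```typescript
import { closeSync, existsSync, mkdtempSync, openSync, readFileSync, unlinkSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical lock file name; the real workflow:sidecar lock may differ.
const LOCK_NAME = ".sidecar.lock";

// Try to create the lock file exclusively. "wx" fails with EEXIST when
// another writer already holds the lock; report that case as undefined.
function tryAcquire(lockPath: string): number | undefined {
  try {
    return openSync(lockPath, "wx");
  } catch (err) {
    if ((err as { code?: string }).code === "EEXIST") return undefined;
    throw err;
  }
}

// Run fn while holding the artifact lock, retrying a bounded number of times.
function withArtifactLock<T>(artifactDir: string, fn: () => T, attempts = 50, waitMs = 20): T {
  const lockPath = join(artifactDir, LOCK_NAME);
  for (let attempt = 0; attempt < attempts; attempt++) {
    const fd = tryAcquire(lockPath);
    if (fd === undefined) {
      // Simple synchronous wait; a production tool would back off asynchronously.
      const until = Date.now() + waitMs;
      while (Date.now() < until) {
        // spin
      }
      continue;
    }
    try {
      return fn();
    } finally {
      // Always release the lock, even if fn throws.
      closeSync(fd);
      unlinkSync(lockPath);
    }
  }
  throw new Error(`Timed out waiting for lock on ${artifactDir}`);
}

// Usage: the sidecar read-modify-write happens only while the lock is held.
const artifactDir = mkdtempSync(join(tmpdir(), "flow-agents-lock-"));
const evidencePath = join(artifactDir, "evidence.json");
writeFileSync(evidencePath, JSON.stringify({ checks: [] }));
withArtifactLock(artifactDir, () => {
  const evidence = JSON.parse(readFileSync(evidencePath, "utf8"));
  evidence.checks.push({ name: "typecheck", status: "passed" });
  writeFileSync(evidencePath, JSON.stringify(evidence));
});
console.log(existsSync(join(artifactDir, LOCK_NAME))); // false: lock released
```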
+Exit criteria:
+- A workflow can resume from artifacts without relying on the model remembering prior turns.
+- Goal Fit, evidence status, and next action can be read mechanically.
+- Multiple agents can share one workflow root by resolving `.flow-agents/current.json` and appending agent-local events instead of racing on root state.
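The last exit criterion implies a specific write pattern. Agents resolve the current session once, then each agent appends to its own log rather than rewriting shared root state. The sketch below rests on several assumptions: `current.json` exposes a `slug` field, events go to a hypothetical `events/<agent>.jsonl` layout, and the helper names are illustrative.

```typescript
import { appendFileSync, mkdirSync, mkdtempSync, readFileSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Assumed shape of .flow-agents/current.json; the real file may carry more fields.
interface CurrentSession {
  slug: string;
}

// Reject ids that could escape the workflow root (for example "../x").
function assertSafeId(kind: string, value: string): void {
  if (!/^[A-Za-z0-9][A-Za-z0-9._-]*$/.test(value) || value.includes("..")) {
    throw new Error(`Unsafe ${kind}: ${value}`);
  }
}

function resolveSessionDir(repoRoot: string): string {
  const currentPath = join(repoRoot, ".flow-agents", "current.json");
  const current = JSON.parse(readFileSync(currentPath, "utf8")) as CurrentSession;
  assertSafeId("session slug", current.slug);
  return join(repoRoot, ".flow-agents", current.slug);
}

// Each agent appends to its own JSONL file, so no agent rewrites shared state.
function appendAgentEvent(repoRoot: string, agentId: string, event: Record<string, unknown>): string {
  assertSafeId("agent id", agentId);
  const eventsDir = join(resolveSessionDir(repoRoot), "events"); // hypothetical layout
  mkdirSync(eventsDir, { recursive: true });
  const logPath = join(eventsDir, `${agentId}.jsonl`);
  const line = JSON.stringify({ ...event, agent: agentId, at: new Date().toISOString() });
  appendFileSync(logPath, line + "\n");
  return logPath;
}

// Usage with a throwaway repo root.
const repoRoot = mkdtempSync(join(tmpdir(), "flow-agents-repo-"));
mkdirSync(join(repoRoot, ".flow-agents"), { recursive: true });
writeFileSync(join(repoRoot, ".flow-agents", "current.json"), JSON.stringify({ slug: "add-sidecar-lock" }));
const workerLog = appendAgentEvent(repoRoot, "tool-worker", { type: "check", status: "passed" });
const verifierLog = appendAgentEvent(repoRoot, "tool-verifier", { type: "critique", status: "open" });
appendAgentEvent(repoRoot, "tool-worker", { type: "handoff" });
```

Appending small lines to per-agent files avoids read-modify-write races entirely. A reader can merge the logs by timestamp when it needs a combined view.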
+## Phase 3: Add Just-In-Time Guidance
+**Purpose:** Give the agent small, relevant guidance at the moment it matters.
+Tasks:
+- Generate a compact context map for each repo: structure, commands, test strategy, key conventions, recent workflow state, and available packs. **Started:** `npm run context-map --` writes `docs/context-map.md` and supports drift checks.
+- Extend hooks so they can surface file-specific, workflow-specific, or evidence-specific guidance without loading whole docs. **Started:** workflow steering now emits ambient reminders after non-subagent tools when sidecars show `not_verified`, `needs_decision`, `blocked`, `failed`, or `needs_user`.
+- Add skill discovery metadata that lets agents choose a skill from a short summary, then progressively load the body.
+- Add missing-evidence prompts: when a workflow is about to stop without proof, show the specific gate that failed. **Started:** the Goal Fit stop hook now reads `state.json`, `evidence.json`, and `critique.json` to report unfinished phase, next action, failed checks, `NOT_VERIFIED` gaps, and open critique findings.
+- Extend stop hooks to require sidecars in strict mode. **Started:** `FLOW_AGENTS_REQUIRE_SIDECARS=true` makes the Goal Fit hook block missing or invalid sidecars; `FLOW_AGENTS_REQUIRE_CRITIQUE=true` also requires a passing critique record.
+- Keep guidance output short enough to be useful inside a degraded or crowded context window.
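The workflow-steering behavior described above can be sketched as a small pure function. It skips subagent tool calls, looks for the listed attention statuses, and returns a short, length-capped reminder. The `SidecarSnapshot` shape, the 160-character cap, and the reminder wording are all assumptions, not the hook's actual output.

```typescript
// Statuses the roadmap lists as triggers for ambient workflow reminders.
const ATTENTION_STATUSES = new Set(["not_verified", "needs_decision", "blocked", "failed", "needs_user"]);

// Hypothetical flattened view of state.json, evidence, and critique sidecars.
interface SidecarSnapshot {
  phase: string;
  statuses: string[];
  nextAction?: string;
}

// Return a short reminder, or undefined when nothing needs attention.
function steeringReminder(
  toolWasSubagent: boolean,
  snapshot: SidecarSnapshot | undefined,
  maxLength = 160,
): string | undefined {
  // Only react after non-subagent tools, and only when sidecars exist.
  if (toolWasSubagent || snapshot === undefined) return undefined;
  const flagged = Array.from(new Set(snapshot.statuses.filter((s) => ATTENTION_STATUSES.has(s))));
  if (flagged.length === 0) return undefined;
  let text = `Workflow ${snapshot.phase}: ${flagged.join(", ")}.`;
  if (snapshot.nextAction) text += ` Next: ${snapshot.nextAction}`;
  // Cap the length so the reminder stays useful in a crowded context window.
  return text.length > maxLength ? `${text.slice(0, maxLength - 3)}...` : text;
}

console.log(
  steeringReminder(false, {
    phase: "verify",
    statuses: ["passed", "not_verified", "blocked"],
    nextAction: "run integration evals",
  }),
); // Workflow verify: not_verified, blocked. Next: run integration evals
```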
+Exit criteria:
+- Agents receive targeted reminders before risky edits, before stopping, and when proof is missing.
+- Routine tasks do not carry the full operating manual in prompt context.
+- Codex, Claude Code, and Kiro exports install equivalent Goal Fit and workflow-steering hooks for the same workflow state.
+- Claude Code, Codex, and Pi-compatible extension paths expose the loaded workflow and progress in runtime status surfaces where the host supports them.
+- Hook evals prove guidance is delivered through each runtime's hook protocol. Live harnesses prove Claude Code responds to prompt-submit workflow guidance and Kiro surfaces strict Stop gates; Codex `exec` currently remains covered by installed-command and protocol evals rather than live hook-context injection.
+- Hook-influence behavioral cases define what the agent must do after receiving guidance and classify evidence as installed-command, live-acceptance, or documented-runtime-gap.
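The strict Stop gate in Phase 3 combines two opt-in environment flags with report-only findings. The sketch below models that decision and assumes one specific split. It only blocks when `FLOW_AGENTS_REQUIRE_SIDECARS=true` or `FLOW_AGENTS_REQUIRE_CRITIQUE=true` is set and the corresponding condition fails. Unfinished phases, failed checks, `NOT_VERIFIED` gaps, and open findings are only reported. The `StopInputs` and `StopDecision` shapes are hypothetical, and each runtime's hook protocol would wrap this differently.

```typescript
// Hypothetical inputs a Stop hook might derive from state.json, evidence.json, and critique.json.
interface StopInputs {
  sidecarsPresent: boolean;
  sidecarsValid: boolean;
  critiquePassed: boolean;
  unfinishedPhase?: string;
  failedChecks: string[];
  notVerified: string[];
  openFindings: number;
}

interface StopDecision {
  block: boolean;
  messages: string[];
}

function goalFitStop(env: Record<string, string | undefined>, input: StopInputs): StopDecision {
  const messages: string[] = [];
  let block = false;

  // Strict gates, enabled only through the environment variables named in the roadmap.
  if (env["FLOW_AGENTS_REQUIRE_SIDECARS"] === "true" && (!input.sidecarsPresent || !input.sidecarsValid)) {
    block = true;
    messages.push("Blocked: workflow sidecars are missing or invalid.");
  }
  if (env["FLOW_AGENTS_REQUIRE_CRITIQUE"] === "true" && !input.critiquePassed) {
    block = true;
    messages.push("Blocked: no passing critique record.");
  }

  // Report-only findings (this sketch assumes they inform rather than block).
  if (input.unfinishedPhase) messages.push(`Unfinished phase: ${input.unfinishedPhase}`);
  if (input.failedChecks.length > 0) messages.push(`Failed checks: ${input.failedChecks.join(", ")}`);
  if (input.notVerified.length > 0) messages.push(`NOT_VERIFIED: ${input.notVerified.join(", ")}`);
  if (input.openFindings > 0) messages.push(`Open critique findings: ${input.openFindings}`);
  return { block, messages };
}

const decision = goalFitStop(process.env, {
  sidecarsPresent: true,
  sidecarsValid: true,
  critiquePassed: false,
  failedChecks: [],
  notVerified: ["live Codex hook context"],
  openFindings: 1,
});
console.log(decision);
```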
+## Phase 4: Evidence And Governance
+**Purpose:** Replace agent confidence with proof and make recurring mistakes self-correcting.
+Tasks:
+- Map Flow Agents telemetry and workflow evidence toward OpenTelemetry GenAI conventions.
+- Define how lint, review, security, and policy findings can emit SARIF or SARIF-like summaries.
+- Add a governance/evidence adapter point so an external tool can enforce repo-local rules. **Started:** see `context/contracts/governance-adapter-contract.md`.
+- Prototype optional Veritas integration for development workflows. **Next:** keep the integration on the governance adapter contract and implement any reusable bridge as TypeScript, not as a repo-specific Python wrapper.
+- Decide which checks belong in Flow Agents itself and which should be delegated to Veritas or other tools.
+- Keep the user-facing boundary documented in the Veritas Integration Boundary:
+  https://github.com/kontourai/flow-agents/blob/main/docs/veritas-integration.md
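As one illustration of "SARIF or SARIF-like summaries", here is a minimal mapping from a hypothetical internal finding to a SARIF 2.1.0 log. The fields it emits are `version`, `runs[].tool.driver.name`, `results[].ruleId`, `level`, `message.text`, and `physicalLocation`, all of which exist in the SARIF 2.1.0 object model. The `Finding` type, the severity-to-level mapping, and the tool name `flow-agents-review` are assumptions. The optional `$schema` property is omitted for brevity.

```typescript
// Hypothetical internal finding produced by a review, lint, or security pass.
interface Finding {
  rule: string;
  severity: "high" | "medium" | "low" | "info";
  file: string;
  line?: number;
  message: string;
}

// Assumed mapping; SARIF result levels are "error", "warning", "note", or "none".
const LEVELS: Record<Finding["severity"], "error" | "warning" | "note" | "none"> = {
  high: "error",
  medium: "warning",
  low: "note",
  info: "none",
};

function toSarif(toolName: string, findings: Finding[]) {
  return {
    version: "2.1.0",
    runs: [
      {
        tool: { driver: { name: toolName } },
        results: findings.map((f) => ({
          ruleId: f.rule,
          level: LEVELS[f.severity],
          message: { text: f.message },
          locations: [
            {
              physicalLocation: {
                artifactLocation: { uri: f.file },
                // Include a region only when a line number is known.
                ...(f.line !== undefined ? { region: { startLine: f.line } } : {}),
              },
            },
          ],
        })),
      },
    ],
  };
}

const log = toSarif("flow-agents-review", [
  {
    rule: "missing-eval",
    severity: "medium",
    file: "context/contracts/sidecar.md",
    line: 12,
    message: "Contract changed without an eval update.",
  },
]);
console.log(JSON.stringify(log, null, 2));
```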
+Exit criteria:
+- Evidence can answer: what changed, what proof ran, what failed, what is not verified, and what should happen next.
+- Governance checks can be introduced without baking repo-specific policy into Flow Agents core.
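The first exit criterion lists five questions an evidence record should answer. The sketch below derives one field per question from a list of check records. The `CheckRecord` and `EvidenceSummary` shapes and the wording of `next` are hypothetical. The real evidence schema lives under `schemas/`.

```typescript
type CheckStatus = "passed" | "failed" | "not_verified";

// Hypothetical check record; the real evidence schema lives under schemas/.
interface CheckRecord {
  name: string;
  status: CheckStatus;
}

// One field per exit-criteria question.
interface EvidenceSummary {
  changed: string[];
  proofRan: string[];
  failed: string[];
  notVerified: string[];
  next: string;
}

function summarizeEvidence(changedFiles: string[], checks: CheckRecord[]): EvidenceSummary {
  const names = (status: CheckStatus) => checks.filter((c) => c.status === status).map((c) => c.name);
  const failed = names("failed");
  const notVerified = names("not_verified");

  // Failures take priority over gaps when choosing the next action.
  let next = "hand off for critique";
  if (failed.length > 0) next = `fix and re-run: ${failed.join(", ")}`;
  else if (notVerified.length > 0) next = `verify or accept as NOT_VERIFIED: ${notVerified.join(", ")}`;

  return {
    changed: changedFiles,
    // "Proof ran" covers every check that executed, whether it passed or failed.
    proofRan: checks.filter((c) => c.status !== "not_verified").map((c) => c.name),
    failed,
    notVerified,
    next,
  };
}

console.log(
  summarizeEvidence(["src/cli/workflow-sidecar.ts"], [
    { name: "typecheck", status: "passed" },
    { name: "integration evals", status: "failed" },
  ]),
);
```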
+## Phase 6: Self-Improving Loop
+**Purpose:** Turn repeated usage into system improvement.
+Tasks:
+- Normalize runtime telemetry, workflow evidence, eval outcomes, and human quality feedback into one reporting model.
+- Identify recurring failures: missing tests, premature stopping, wrong skill choice, context drift, bad tool use, weak handoffs, or stale knowledge.
+- Route findings into the right improvement target: rules, skills, powers, evals, docs, backlog, or knowledge notes.
+- Add promotion gates so guidance becomes stricter only after evidence shows it helps.
+- Make dashboards answer whether Flow Agents is improving outcomes over time.
+- Dogfood the workflow artifacts on Flow Agents changes: each substantial pass should produce sidecars, run artifact validation, delegate critique, update durable docs, and route accepted critique into the next slice. **Started:** `npm run workflow:sidecar -- dogfood-pass` records evidence, required critique, optional release readiness, optional learning, state, and handoff in one fail-closed validated pass.
+- Automatically create or select the current session artifact so self-validation does not depend on the user or orchestrator hand-picking `.flow-agents/<slug>`. **Started:** `npm run workflow:sidecar -- ensure-session` creates or selects a delivery session artifact plus initial state, acceptance, and handoff sidecars.
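The `dogfood-pass` entry describes a fail-closed pass. The sketch below illustrates one way that could work: every record is validated first, nothing is written if any record fails, and each file is then replaced via write-to-temp plus rename. Several parts are assumptions. The `Validator` predicate stands in for the real JSON Schema checks, and the record list is hypothetical. Each rename is atomic per file, but the pass as a whole is all-or-nothing only at the validation step.

```typescript
import { mkdtempSync, readdirSync, renameSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Stand-in for schema validation: returns a list of problems (empty means valid).
type Validator = (record: unknown) => string[];

interface PassRecord {
  file: string;
  record: unknown; // undefined means "not supplied"
  required: boolean;
}

function dogfoodPass(dir: string, records: PassRecord[], validate: Validator): { written: string[]; problems: string[] } {
  // Phase 1: validate everything before touching disk.
  const problems: string[] = [];
  for (const r of records) {
    if (r.record === undefined) {
      if (r.required) problems.push(`${r.file}: required record missing`);
      continue;
    }
    for (const p of validate(r.record)) problems.push(`${r.file}: ${p}`);
  }
  if (problems.length > 0) return { written: [], problems }; // fail closed

  // Phase 2: write each supplied record via temp file + rename.
  const written: string[] = [];
  for (const r of records) {
    if (r.record === undefined) continue;
    const target = join(dir, r.file);
    const temp = `${target}.tmp`;
    writeFileSync(temp, JSON.stringify(r.record, null, 2) + "\n");
    renameSync(temp, target);
    written.push(r.file);
  }
  return { written, problems };
}

// Example validator: every record must be an object with a string "status".
const requireStatus: Validator = (record) =>
  typeof record === "object" && record !== null && typeof (record as { status?: unknown }).status === "string"
    ? []
    : ["missing string field: status"];

const sessionDir = mkdtempSync(join(tmpdir(), "flow-agents-pass-"));
const rejected = dogfoodPass(sessionDir, [
  { file: "evidence.json", record: { status: "passed" }, required: true },
  { file: "critique.json", record: undefined, required: true },
], requireStatus);
console.log(rejected.problems, readdirSync(sessionDir)); // problems listed, no files written

const accepted = dogfoodPass(sessionDir, [
  { file: "evidence.json", record: { status: "passed" }, required: true },
  { file: "critique.json", record: { status: "pass" }, required: true },
  { file: "learning.json", record: undefined, required: false }, // optional, skipped
], requireStatus);
console.log(accepted.written); // [ 'evidence.json', 'critique.json' ]
```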
+Exit criteria:
+- The system can show which guidance is working, which rules are noisy, and which failures keep recurring.
+- Improvements are reviewable and not hidden in opaque memory.
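Phase 6 says guidance should become stricter only after evidence shows it helps, and that the system should expose noisy rules. One way to make that mechanical is a promotion gate that compares outcomes with and without a piece of guidance. The sketch below is hypothetical throughout: the `GuidanceStats` fields, the thresholds (20 sessions, 0.1 lift, 20% dismissal rate), and the success metric are all assumed.

```typescript
interface GuidanceStats {
  sessionsWith: number;
  successesWith: number;
  sessionsWithout: number;
  successesWithout: number;
  firings: number; // times the guidance was shown
  dismissals: number; // times it was shown and then overridden
}

type Strictness = "advisory" | "blocking";

function promotionDecision(
  s: GuidanceStats,
  minSessions = 20,
  minLift = 0.1,
  maxNoise = 0.2,
): { next: Strictness; reason: string } {
  // Too little data: stay advisory.
  if (s.sessionsWith < minSessions || s.sessionsWithout < minSessions) {
    return { next: "advisory", reason: "not enough sessions to compare" };
  }
  // Guidance that is frequently dismissed is treated as noisy.
  const noise = s.firings === 0 ? 0 : s.dismissals / s.firings;
  if (noise > maxNoise) {
    return { next: "advisory", reason: `noisy: ${Math.round(noise * 100)}% of firings dismissed` };
  }
  // Success-rate difference between sessions with and without the guidance.
  const lift = s.successesWith / s.sessionsWith - s.successesWithout / s.sessionsWithout;
  if (lift < minLift) {
    return { next: "advisory", reason: `lift ${lift.toFixed(2)} below ${minLift}` };
  }
  return { next: "blocking", reason: `lift ${lift.toFixed(2)} with ${Math.round(noise * 100)}% dismissals` };
}

console.log(promotionDecision({
  sessionsWith: 30, successesWith: 24, sessionsWithout: 30, successesWithout: 18, firings: 40, dismissals: 2,
}));
```

A simple with/without comparison like this is observational, not causal. It is a reviewable starting point for a promotion proposal rather than proof.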
+## Veritas Fit Assessment
+`~/dev/github/kontourai/veritas` appears strongly aligned with the evidence and governance parts of the north star.
+Veritas already provides:
+- repo-local adapters, repo standards, and authority settings
+- lint-style feedback designed for agents
+- just-in-time `explain` guidance
+- evidence records with JSON Schemas
+- evidence checks and verification budgets
+- advisory readiness runs and eval history
+- governance blocks for AI instruction files
+- agent-agnostic activation through repo-local artifacts, hooks, and CI
+That overlaps with Flow Agents' desired evidence layer, but it should not be folded in blindly.
+Recommended stance:
+- Treat Veritas as a first-class optional integration candidate, not a vendored subsystem.
+- Use Veritas for repo-local development governance where path/surface/policy checks are valuable.
+- Keep Flow Agents responsible for cross-domain orchestration, skills, powers, global knowledge, workflow state, and harness exports.
+- Define a small adapter contract: Flow Agents can invoke Veritas and ingest its evidence, but Flow Agents does not need to own Veritas policy semantics.
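A "small adapter contract" could be as narrow as the sketch below. The adapter runs the external tool and returns an outcome plus a pointer to its native report, and Flow Agents records that pointer without reinterpreting policy. `standard_refs` is the only field name taken from this roadmap. The interfaces, the `artifact_ref` field, the `"veritas"` ref value, and the fake adapter are hypothetical. The sketch deliberately does not call the real Veritas CLI.

```typescript
// Outcome reported by an adapter; the native report stays in the tool's own format.
interface AdapterRun {
  status: "pass" | "fail" | "advisory";
  nativeReport: string;
  summary: string;
}

interface GovernanceAdapter {
  id: string;
  standardRef: string;
  run(changedFiles: string[]): AdapterRun;
}

// Hypothetical evidence entry; only `standard_refs` is named in the roadmap.
interface EvidenceEntry {
  source: string;
  status: AdapterRun["status"];
  summary: string;
  artifact_ref: string;
  standard_refs: string[];
}

function ingest(adapter: GovernanceAdapter, changedFiles: string[]): EvidenceEntry {
  const result = adapter.run(changedFiles);
  // Record the outcome and a pointer to the native report; do not reinterpret policy.
  return {
    source: adapter.id,
    status: result.status,
    summary: result.summary,
    artifact_ref: result.nativeReport,
    standard_refs: [adapter.standardRef],
  };
}

// Fake adapter for illustration only; it does not invoke Veritas.
const fakeVeritas: GovernanceAdapter = {
  id: "veritas",
  standardRef: "veritas",
  run: (files) => ({
    status: "advisory",
    nativeReport: ".veritas/report.json",
    summary: `${files.length} file(s) checked`,
  }),
};

console.log(ingest(fakeVeritas, ["AGENTS.md", "scripts/hooks/stop.js"]));
```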
+Decision gate before adopting Veritas in Flow Agents:
+- Does Flow Agents need repo-local policy enforcement for its own workflows now, or only later?
+- Can Veritas output map cleanly into Flow Agents evidence artifacts without duplicating schemas?
+- Can installation remain optional so non-development knowledge workflows stay lightweight?
+- Does Veritas' Surface terminology create confusion inside Flow Agents, or can it stay behind the adapter boundary?
+- Would using Veritas improve Flow Agents' reliability faster than building a smaller local evidence checker?
+Initial experiment:
+1. Add a Flow Agents Veritas spike issue or plan, not a dependency yet.
+2. Configure Veritas against Flow Agents in advisory readiness mode in a branch or local artifact.
+3. Test three rules: instruction governance block intact, workflow docs require eval updates when contracts change, and hook/script changes require validation evidence.
+4. Compare the output against Flow Agents' current `evidence-gate` and telemetry artifacts.
+5. Decide whether to adopt Veritas as an optional dev-governance power.
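Step 3's three rules can be prototyped as path-based checks over a change set. Such a prototype gives a baseline to compare against Veritas output in step 4. The sketch below rests on several assumptions. It uses hypothetical governance-block markers. It treats `context/contracts/` as the contract location, `evals/` as the eval location, and `scripts/` plus `.githooks/` as hook and script paths. It also passes validation evidence in as a boolean.

```typescript
// Hypothetical markers delimiting a governance block inside instruction files.
const BLOCK_START = "<!-- governance:begin -->";
const BLOCK_END = "<!-- governance:end -->";

interface ChangeSet {
  changedFiles: string[];
  hasValidationEvidence: boolean;
  instructionFiles: Record<string, string>; // path -> current text
}

interface RuleResult {
  rule: string;
  ok: boolean;
  detail: string;
}

function checkSpikeRules(cs: ChangeSet): RuleResult[] {
  const touches = (...prefixes: string[]) =>
    cs.changedFiles.some((f) => prefixes.some((p) => f.startsWith(p)));

  // Rule 1: every instruction file keeps an intact, correctly ordered governance block.
  const broken = Object.keys(cs.instructionFiles).filter((path) => {
    const text = cs.instructionFiles[path];
    const start = text.indexOf(BLOCK_START);
    const end = text.indexOf(BLOCK_END);
    return start === -1 || end === -1 || end < start;
  });

  // Rule 2: contract changes must ship with eval changes.
  const contractsChanged = touches("context/contracts/");
  const evalsChanged = touches("evals/");

  // Rule 3: hook or script changes must carry validation evidence.
  const hooksChanged = touches("scripts/", ".githooks/");

  return [
    { rule: "instruction-governance-block", ok: broken.length === 0, detail: broken.length > 0 ? `broken in ${broken.join(", ")}` : "intact" },
    { rule: "contract-changes-update-evals", ok: !contractsChanged || evalsChanged, detail: contractsChanged && !evalsChanged ? "contracts changed without eval updates" : "ok" },
    { rule: "hook-script-changes-need-evidence", ok: !hooksChanged || cs.hasValidationEvidence, detail: hooksChanged && !cs.hasValidationEvidence ? "hook/script change lacks validation evidence" : "ok" },
  ];
}

console.log(checkSpikeRules({
  changedFiles: ["context/contracts/governance-adapter-contract.md"],
  hasValidationEvidence: false,
  instructionFiles: { "AGENTS.md": `${BLOCK_START}\nrules\n${BLOCK_END}` },
}));
```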
+## First Implementation Slice
+The thinnest meaningful slice is:
+1. Add the layer taxonomy and standards register.
+2. Add the roadmap checklist to docs.
+3. Define the first JSON Schemas for workflow state and evidence.
+4. Update one workflow path, likely `plan-work -> verify-work -> evidence-gate`, to write/read sidecars.
+5. Add validation and eval fixtures for that path.
+6. Run a Veritas advisory-readiness spike separately before making it a Flow Agents dependency.
+This gives Flow Agents a concrete path toward the north star without prematurely coupling it to any one external project.

package/docs/operating-layers.md ADDED

@@ -0,0 +1,110 @@
+---
+title: Operating Layers
+---
+# Operating Layers
+Flow Agents should stay understandable by keeping a small public vocabulary. Each layer has one job, one source-of-truth pattern, and one reason to exist.
+The layers are ordered from durable context to execution evidence. When adding a new capability, choose the lowest layer that can own it cleanly. Do not create a new layer unless the existing ones make the behavior harder to understand.
+For the concrete directory-by-directory source map, generated output policy, runtime state policy, and cleanup rules, use [Repository Structure](repository-structure.md). This page explains conceptual ownership layers; the repository structure page is the durable file-placement reference.
+## Layer Map
+| Layer | Owns | Does Not Own | Source Pattern |
+| --- | --- | --- | --- |
+| Rules | Persistent guidance, conventions, boundaries, and defaults | Step-by-step task procedures or tool configuration | `AGENTS.md`, Markdown, frontmatter |
+| Skills | Reusable procedures an agent can invoke when a task matches | Always-on policy, credentials, or long-lived memory | Agent Skills / `SKILL.md` |
+| Powers | Tool bundles, MCP configs, and activation guidance | Workflow gates or repo-specific policy semantics | MCP, OpenAPI, OAuth/OIDC |
+| Agents | Role prompts, delegation boundaries, and scoped tool access | Generic task procedures that should be skills | Harness-native subagents/profiles |
+| Workflows | Durable state, gates, handoffs, acceptance criteria, and phase transitions | Domain-specific knowledge records or tool internals | JSON Schema sidecars plus Markdown summaries |
+| Knowledge | People, organizations, meetings, decisions, commitments, notes, and follow-ups | Verification verdicts or runtime telemetry | CommonMark, JSContact, iCalendar, JMAP, WebVTT/SRT, JSON-LD |
+| Evidence | Proof, telemetry, findings, evals, provenance, and quality outcomes | User-facing procedure instructions | OpenTelemetry, SARIF, CycloneDX, SLSA, JSON Schema |
+Governance tools such as Veritas belong at the Evidence boundary. Flow Agents should call them through `context/contracts/governance-adapter-contract.md`, record native artifact refs in `evidence.json`, and leave repo-specific policy semantics with the adapter.
+## Current Repo Mapping
+| Path | Layer | Notes |
+| --- | --- | --- |
+| `AGENTS.md` | Rules | Project-level source guidance for agents. |
+| `context/` | Rules / Knowledge | Shared guidance, contracts, and reusable context. Prefer specific subfolders or docs when the split becomes clearer. |
+| `skills/` | Skills | Shared `SKILL.md` packages exported to supported harnesses. |
+| `powers/` | Powers | Optional MCP and capability bundles. A power means a tool surface plus activation guidance, not guaranteed credentials. |
+| `agents/` | Agents | Canonical role and specialist definitions. Keep public agent count small; prefer skills for reusable procedures. |
+| `agent-cards/` | Agents | Discovery metadata for routable orchestrators. |
+| `kits/` | Flow Kits | Kit Catalog entries, Flow Kit manifests, Flow Definitions, and supporting assets. Builder Kit is the first proof point. |
+| `prompts/` | Skills / Rules | Saved invocations. Promote repeatable procedures into skills when they grow stable. |
+| `docs/workflow-*.md` | Workflows | Human-readable workflow contracts and usage guidance. |
+| `.flow-agents/` | Workflows | Cross-session task artifacts. Runtime state stays local and ignored; durable outcomes are promoted into docs, source, schemas, or provider records before merge. |
+| `scripts/` | Evidence / Workflows / Packaging | Validation, build, telemetry, hooks, and artifact tooling. |
+| `src/` | Workflows / Evidence / Packaging | TypeScript CLI, runtime adapter, Flow Kit, shared library, build, validation, context-map, packaging, and CLI helper source compiled into `build/src/`. |
+| `evals/` | Evidence | Static, behavioral, integration, and acceptance checks. |
+| `.telemetry/` | Evidence | Runtime telemetry, outcomes, and reports. |
+| `packaging/` | Packaging | Cross-harness manifest and bundle docs. |
+| `dist/` | Packaging | Generated exports. Never edit by hand. |
+| `build/` | Packaging | Generated TypeScript compiler output. Never edit by hand. |
+| `_site/` | Docs / Packaging | Generated GitHub Pages output from `docs/`. Never edit by hand. |
+## Flow Kit Coordination
+Flow owns Flow Definition semantics: gates use typed `expects` entries, Surface requirements use `kind: "surface.claim"`, and project configuration owns trusted producer mappings plus gate overrides. Flow Agents should author, install, adapt, and control those assets for local runtimes; it should not become the authority source for claim trust or override semantics.
+The Kit Catalog is the Flow Agents index of installable Flow Kits. A Flow Kit can contain Flow Definitions, skills, docs, adapters, and evals, but the catalog points at those assets instead of defining gate behavior itself. Builder Kit is the first Kontour-authored kit and proves the path from shaping through build, verification, merge readiness, and learning.
+Local kit repositories must follow the Flow Kit Repository Contract:
+https://github.com/kontourai/flow-agents/blob/main/docs/flow-kit-repository-contract.md
+The contract requires a root `kit.json`, declared Flow Definition paths, declared asset paths, and local path-safety rules. Flow Agents validates that repository shape, installs validated local repositories as runtime overlay state, and records provenance metadata; Flow validates the Flow Definition semantics.
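The "local path-safety rules" part of the contract can be illustrated with a lexical check. Every declared Flow Definition or asset path must be relative and must resolve inside the kit root. The `flows` and `assets` manifest fields are hypothetical stand-ins for whatever `kit.json` actually declares. This check does not resolve symlinks. A stricter validator would also compare `realpath` results.

```typescript
import { isAbsolute, relative, resolve, sep } from "node:path";

// Hypothetical manifest fields; the Flow Kit Repository Contract defines the real kit.json shape.
interface KitManifest {
  id: string;
  flows: string[];
  assets: string[];
}

// Return one problem per declared path that is absolute, empty, or escapes the kit root.
function unsafeKitPaths(kitRoot: string, manifest: KitManifest): string[] {
  const root = resolve(kitRoot);
  const problems: string[] = [];
  for (const declared of manifest.flows.concat(manifest.assets)) {
    if (declared.trim() === "" || isAbsolute(declared)) {
      problems.push(`${declared || "(empty)"}: must be a non-empty relative path`);
      continue;
    }
    const rel = relative(root, resolve(root, declared));
    // "" is the root itself; ".." or "../x" escapes it; an absolute result means another drive on Windows.
    if (rel === "" || rel === ".." || rel.startsWith(`..${sep}`) || isAbsolute(rel)) {
      problems.push(`${declared}: must resolve inside the kit root`);
    }
  }
  return problems;
}

console.log(unsafeKitPaths("/tmp/builder-kit", {
  id: "builder-kit",
  flows: ["flows/build.json"],
  assets: ["docs/../docs/guide.md", "../shared/secret.json", "/etc/passwd"],
}));
```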
+Builder Kit vocabulary should be used in public and internal guidance:
+- Flow Kit: installable workflow bundle.
+- Kit Catalog: index of Flow Kits and their runtime assets.
+- Builder Kit: the coding/building kit shipped by this repo.
+- Probe: question-driven design and context challenge step, surfaced as `design-probe`.
+Builder Kit evidence gates can reference Surface trust state without naming a provider. A trust-backed gate may attach a TrustReport or Trust Snapshot ref for the relevant Surface claim, while Flow keeps authority over gate evaluation, trusted producer mapping, and route-back behavior. Surface remains the portable trust-state layer, and Veritas remains an optional producer rather than a required Builder Kit dependency.
+## Placement Rules
+- Put persistent behavior that should apply before any task starts in **Rules**.
+- Put repeatable procedures with activation criteria in **Skills**.
+- Put external tools, MCP servers, API integrations, and credentialed capabilities in **Powers**.
+- Put role identity, delegation boundaries, and scoped tool access in **Agents**.
+- Put phase state, gate decisions, acceptance criteria, and resumable handoffs in **Workflows**.
+- Put people, orgs, meetings, notes, decisions, and relationship context in **Knowledge**.
+- Put proof, findings, traces, eval results, and quality outcomes in **Evidence**.
+Workflow artifacts have their own lifecycle policy:
+https://github.com/kontourai/flow-agents/blob/main/docs/workflow-artifact-lifecycle.md
+Use `.flow-agents/<slug>/` for local runtime/session state. If in-progress planning needs review or handoff, promote the durable decision, behavior, and evidence summary into normal docs or provider records before merge; keep runtime artifacts out of git.
+If a proposed artifact seems to belong to multiple layers, split it. For example, a dependency-checking capability may have:
+- a power for the dependency tool
+- a skill for the update procedure
+- workflow state for a specific update task
+- evidence for the scan result
+## Core Surface And Kit Filtering
+The default Flow Agents surface should remain small. Flow Kits add workflow depth without making every installation carry every concept. The current install implementation still has legacy composition metadata under `packaging/`; treat that as compatibility/build mechanics while the Kit Catalog becomes the product-facing vocabulary.
+Do not duplicate full membership lists in prose. Update the canonical kit and packaging metadata, then regenerate the Context Map for the current skill, agent, power, and Flow Kit counts:
+https://github.com/kontourai/flow-agents/blob/main/docs/context-map.md
+Before making install filtering the default behavior, validate kit boundaries against usage data and context budget impact, and confirm that users can predict what will load.
+## Design Checks
+Before adding or changing a capability, answer:
+- Which layer owns this?
+- Is there already a standard for the artifact shape?
+- Does this need to be globally available, project-local, or task-local?
+- Can it be loaded just in time instead of always-on?
+- What evidence will show it improved outcomes?
+- Does it belong in core, a Flow Kit, or an optional integration?

package/docs/repository-structure.md ADDED

@@ -0,0 +1,132 @@
+---
+title: Repository Structure
+---
+# Repository Structure
+This is the canonical developer-facing map for the Flow Agents repository. Use it to decide where a change belongs, whether a path is source or generated output, and which cleanup decisions are safe.
+## Source Of Truth Rules
+- Edit canonical source in the repo root areas listed below, then regenerate derived output with the documented commands.
+- Do not edit `dist/`, `build/`, or `_site/` by hand. They are generated from tracked source.
+- Do not commit local runtime state from `.flow-agents/<slug>/`, `.codex/`, `.claude/`, `.omx/`, `.promptfoo/`, `.telemetry/`, `.surface/`, `.veritas/`, or tool caches.
+- Runtime workflow artifacts stay local and ignored; promote reviewable or durable outcomes to docs, source, schemas, or provider records before merging to `main`.
+- Treat generated exports and installed runtime config as products of `packaging/manifest.json`, `src/tools/build-universal-bundles.ts`, `scripts/install-*.sh`, and the source directories they copy.
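The rules above are easy to check mechanically before a commit. The sketch below flags paths under the runtime roots and generated-output roots listed in this section. Its input could be the output of `git diff --cached --name-only`. The function name and the violation shape are hypothetical, and this is not a description of the repository's own `.githooks/` scripts.

```typescript
// Roots the Source Of Truth Rules say must not be committed or hand-edited.
const RUNTIME_ROOTS = [".flow-agents/", ".codex/", ".claude/", ".omx/", ".promptfoo/", ".telemetry/", ".surface/", ".veritas/"];
const GENERATED_ROOTS = ["dist/", "build/", "_site/"];

interface PathViolation {
  path: string;
  reason: "runtime-state" | "generated-output";
}

function findViolations(paths: string[]): PathViolation[] {
  const violations: PathViolation[] = [];
  for (const raw of paths) {
    // Normalize Windows separators and a leading "./" before prefix matching.
    const path = raw.replace(/\\/g, "/").replace(/^\.\//, "");
    if (RUNTIME_ROOTS.some((root) => path.startsWith(root))) {
      violations.push({ path, reason: "runtime-state" });
    } else if (GENERATED_ROOTS.some((root) => path.startsWith(root))) {
      violations.push({ path, reason: "generated-output" });
    }
  }
  return violations;
}

console.log(findViolations(["src/cli.ts", ".flow-agents/add-lock/state.json", "dist/codex/AGENTS.md"]));
```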
+## Target Layout
+```text
+/
+  README.md                    # human entry point
+  src/                         # TypeScript CLI and runtime source
+  src/tools/                   # TypeScript build, packaging, validation, and context-map tooling
+  scripts/                     # public wrappers, shell tools, hooks, telemetry, installers
+  agents/ agent-cards/         # canonical agent specs and discovery cards
+  skills/ context/ powers/ prompts/ # canonical workflow bundle content
+  kits/                        # Flow Kit catalog and bundled kit assets
+  schemas/                     # JSON sidecar and provider schemas
+  packaging/                   # bundle/export manifests and pack definitions
+  evals/                       # eval harness, fixtures, static checks, integration checks
+  docs/                        # durable docs and GitHub Pages source
+  integrations/                # optional external integration config
+  dist/ build/ _site/          # generated output; ignored
+  .flow-agents/ .codex/ .claude/ ... # local runtime state; ignored by default
+```
+## Top-Level Inventory
+| Path | Classification | Source of truth | Generated or runtime policy | Safe cleanup rule |
+| --- | --- | --- | --- | --- |
+| `.flow-agents/` | runtime state | Workflow tools write local session artifacts. | Ignored. | Do not commit task runtime roots; promote durable decisions to docs, source, schemas, or providers before merge. |
+| `.claude/` | installed runtime config | Generated bundle or local runtime install. | Ignored. | Reinstall from `dist/claude-code/` instead of editing as source. |
+| `.codex/` | installed runtime config | Generated bundle or local runtime install. | Ignored. | Reinstall from `dist/codex/` or `scripts/install-codex-home.sh`; do not treat local hooks as canonical. |
+| `.githooks/` | canonical repo tooling | Tracked repository hook scripts. | Source, not runtime agent hooks. | Keep compatible with `npm run setup:repo-hooks` and `npm run validate:repo-hooks --`. |
+| `.github/` | canonical CI config | GitHub workflow files. | Source. | Preserve workflow command names and artifact expectations. |
+| `.ai/`, `.omx/`, `.promptfoo/`, `.surface/`, `.telemetry/`, `.veritas/` | runtime, cache, or integration output | Local tools and optional integrations. | Ignored runtime state. | Clean locally when not needed; promote only stable integration config under `integrations/` or durable docs. |
+| `.venv/`, `node_modules/`, `test-results/`, `__pycache__/` | dependency/cache output | Package managers and test tools. | Ignored. | Safe local cleanup; recreate with normal install or test commands. |
+| `_site/` | generated docs output | Built from `docs/`. | Ignored. | Recreate with docs preview/build tooling. |
+| `agent-cards/` | canonical source | Discovery metadata for routable agents. | Exported into runtime bundles. | Do not delete without checking bundle manifests and evals. |
+| `agents/` | canonical source | Source agent definitions. | Exported to Kiro, Claude Code, Codex, and compatible harnesses. | Keep public agent names compatible or provide shims. |
+| `build/` | generated output | TypeScript compiler output from `src/`. | Ignored. | Recreate with `npm run build --`. |
+| `context/` | canonical source | Shared contracts, settings, templates, hooks context, and reusable guidance. | Exported to bundles. | Contract changes require validation and docs review. |
+| `dist/` | generated bundle output | Created by `npm run build:bundles --`. | Ignored. | Never edit by hand; rebuild from source and packaging metadata. |
+| `docs/` | canonical docs/site source | Durable developer and product documentation. | Source for GitHub Pages and context docs. | Update when behavior or boundaries change; regenerate context map when relevant. |
+| `evals/` | canonical eval source plus ignored results | Harness, cases, fixtures, static checks, integration checks. | `evals/results/*.json`, reports, and CI logs are generated output unless intentionally tracked fixtures. | Do not remove fixtures without reference proof; generated results can be local cleanup candidates. |
+| `integrations/` | optional integration source | Integration config shipped with the repo. | Source; local run state belongs under ignored runtime roots. | Keep optional and adapter-driven. |
+| `kits/` | canonical Flow Kit source | Kit Catalog and bundled Builder Kit assets. | Exported and validated by Flow Kit commands. | Preserve catalog paths and validation coverage. |
+| `packaging/` | canonical packaging source | Manifest, pack definitions, and packaging docs. | Drives generated bundles under `dist/`. | Update before changing export shape. |
+| `powers/` | canonical source | Optional MCP/tool capability bundles. | Exported where supported. | Keep activation guidance separate from credentials. |
+| `prompts/` | canonical source | Saved prompt entry points. | Exported where supported. | Promote stable procedures into skills when needed. |
+| `schemas/` | canonical source | JSON schemas for sidecars and provider/resource records. | Used by validators and workflow tooling. | Schema changes require artifact validation. |
+| `scripts/` | canonical source and compatibility surface | Shell and JavaScript wrappers, installers, hooks, telemetry, workflow tooling. | Some scripts wrap compiled `build/` output. | Public wrappers are compatibility-sensitive; see [`scripts/README.md`](../scripts/README.md) before moving. |
+| `src/tools/` | canonical TypeScript tooling source | Build, packaging, context-map, validators, and utility modules imported by `src/cli.ts`. | Compiled to `build/src/tools/`. | Keep public wrappers in `scripts/` stable when tooling internals move. |
+| `skills/` | canonical source | Reusable workflow skills. | Exported to runtime bundles. | Skill renames need compatibility and docs updates. |
+| `src/` | canonical TypeScript product source | CLI, runtime adapters, Flow Kit helpers, and shared libraries. | Compiled to `build/src/`. | Preserve public bin command behavior. |
+| root files | canonical metadata | `package.json`, `tsconfig.json`, `install.sh`, license, contribution docs, security docs, and repo instructions. | Source. | Keep command names and install behavior compatible. |
+| `veritas.claims.json` | optional integration source | Repo-local Veritas claim configuration. | Source for optional governance evidence. | Keep optional; local Veritas run output stays ignored. |
+## Regeneration And Validation Commands
+| Need | Command |
+| --- | --- |
+| Compile TypeScript | `npm run build --` |
+| Validate source tree | `npm run validate:source --` |
+| Regenerate context map | `npm run context-map --` |
+| Check context map drift | `npm run context-map:check --` |
+| Rebuild runtime bundles | `npm run build:bundles --` |
+| Validate packaging | `npm run validate:package -- <package-prefix>` |
+| Run static evals | `bash evals/run.sh static` |
+| Run integration evals | `bash evals/run.sh integration` |
+| Validate repo Git hooks | `npm run validate:repo-hooks --` |
+| Audit fixture retirement candidates | `npm run fixture:retirement-audit --` |
+## Where Changes Belong
+Use this table before adding a new file or moving behavior. Prefer the most
+specific row that matches the change.
+| Change | Put it here | Validate with |
+| --- | --- | --- |
+| Agent behavior, specialist instructions, model/tool routing | `agents/`, `agent-cards/`, and the relevant `skills/` or `context/` contract | `npm run validate:source --` and `bash evals/run.sh static` |
+| Product CLI behavior or reusable implementation logic | `src/cli/`, `src/tools/`, `src/lib/`, or a product-specific `src/` module | `npm run typecheck --` and the relevant integration eval |
+| Stable command path for existing callers | Thin launcher in `scripts/`; implementation remains in `src/` | `npm run validate:source --` |
+| Runtime hook adapter or policy behavior | `scripts/hooks/` for runtime JS/shell hooks; shared helpers under `scripts/hooks/lib/` | `npm run validate:source --` plus the relevant hook integration eval |
+| Bundle/export shape | `packaging/`, `src/tools/build-universal-bundles.ts`, and source directories copied into bundles | `bash evals/static/test_universal_bundles.sh` |
+| Installer or local runtime setup behavior | `scripts/install-*.sh`, package bins, and generated bundle install scripts | `bash evals/integration/test_bundle_install.sh` |
+| Workflow artifact, sidecar, or provider contract | `context/contracts/`, `schemas/`, `src/cli/workflow-*`, and matching eval fixtures | `npm run workflow:validate-artifacts --` and workflow integration evals |
+| Flow Kit catalog or bundled kit content | `kits/`, Flow Definition files, and kit repository fixtures | `npm run flow-kit -- validate` or `bash evals/integration/test_flow_kit_repository.sh` |
+| Durable developer guidance | `docs/`; regenerate/check the context map when navigation or durable contracts change | `npm run context-map:check --` |
+| Eval scenario or fixture | `evals/static/`, `evals/integration/`, `evals/fixtures/`, or `evals/cases/` | The owning eval plus `bash evals/run.sh static` when contracts are touched |
+| Optional external integration configuration | `integrations/` or `veritas.claims.json`; keep local run output ignored | The integration-specific eval or documented dry run |
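The table can also be read as a routing function from changed paths to validation commands. The sketch below encodes a subset of its rows as path prefixes, using only commands that appear in the table. It approximates "prefer the most specific row" by choosing the longest matching prefix. The prefix list is an assumed simplification: it omits rows without a single concrete command, and it omits the unnamed "relevant integration eval" follow-ups.

```typescript
// Path prefix -> commands, taken from rows of the "Where Changes Belong" table.
const ROUTES: Array<[string, string[]]> = [
  ["agents/", ["npm run validate:source --", "bash evals/run.sh static"]],
  ["agent-cards/", ["npm run validate:source --", "bash evals/run.sh static"]],
  ["src/", ["npm run typecheck --"]],
  ["scripts/", ["npm run validate:source --"]],
  ["scripts/install-", ["bash evals/integration/test_bundle_install.sh"]],
  ["packaging/", ["bash evals/static/test_universal_bundles.sh"]],
  ["src/tools/build-universal-bundles.ts", ["bash evals/static/test_universal_bundles.sh"]],
  ["context/contracts/", ["npm run workflow:validate-artifacts --"]],
  ["schemas/", ["npm run workflow:validate-artifacts --"]],
  ["src/cli/workflow-", ["npm run workflow:validate-artifacts --"]],
  ["kits/", ["npm run flow-kit -- validate"]],
  ["docs/", ["npm run context-map:check --"]],
  ["evals/", ["bash evals/run.sh static"]],
];

// Longest matching prefix wins; commands are deduplicated in first-seen order.
function validationCommands(changedFiles: string[]): string[] {
  const commands: string[] = [];
  for (const file of changedFiles) {
    let best: [string, string[]] | undefined;
    for (const route of ROUTES) {
      if (file.startsWith(route[0]) && (best === undefined || route[0].length > best[0].length)) {
        best = route;
      }
    }
    if (best === undefined) continue; // unmatched paths need a human decision
    for (const cmd of best[1]) {
      if (commands.indexOf(cmd) === -1) commands.push(cmd);
    }
  }
  return commands;
}

console.log(validationCommands(["src/cli/workflow-sidecar.ts", "docs/north-star.md"]));
```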
+## Runtime And TypeScript Policy
+The package requires Node `>=22`, and GitHub Actions runs CI on Node 22. Keep `@types/node` on the Node 22 major line while CI remains the runtime baseline. Moving to a newer Node type major should be paired with an explicit runtime policy update and CI validation.
+## Generated And Runtime Boundaries
+`dist/`, `build/`, and `_site/` are generated output. `dist/` mirrors canonical bundle source for runtime installation; `build/` mirrors TypeScript compilation output; `_site/` mirrors the docs site build. If any of these are stale, rebuild them instead of patching them. Static bundle validation builds the same source into two fresh output directories and diffs the results so reproducibility regressions fail with evidence.
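The reproducibility check described above amounts to hashing every file in two independent builds and reporting any path that differs. The sketch below assumes a SHA-256 content comparison. It uses hypothetical helpers (`hashTree`, `diffManifests`, `diffBuilds`) to illustrate the idea and is not the repository's actual bundle validator.

```typescript
import { createHash } from "node:crypto";
import { readdirSync, readFileSync } from "node:fs";
import { join, relative } from "node:path";

// Map every regular file under `root` (keyed by relative path) to a SHA-256 digest.
function hashTree(root: string): Map<string, string> {
  const digests = new Map<string, string>();
  const walk = (dir: string): void => {
    for (const entry of readdirSync(dir, { withFileTypes: true })) {
      const full = join(dir, entry.name);
      if (entry.isDirectory()) {
        walk(full);
      } else if (entry.isFile()) {
        const digest = createHash("sha256").update(readFileSync(full)).digest("hex");
        digests.set(relative(root, full), digest);
      }
    }
  };
  walk(root);
  return digests;
}

// List files that exist in only one build or whose bytes differ between builds.
function diffManifests(first: Map<string, string>, second: Map<string, string>): string[] {
  const problems: string[] = [];
  for (const [file, digest] of first) {
    if (!second.has(file)) {
      problems.push(`only in first build: ${file}`);
    } else if (second.get(file) !== digest) {
      problems.push(`content differs: ${file}`);
    }
  }
  for (const file of second.keys()) {
    if (!first.has(file)) {
      problems.push(`only in second build: ${file}`);
    }
  }
  return problems.sort();
}

// Compare two fresh output directories built from the same source.
function diffBuilds(firstDir: string, secondDir: string): string[] {
  return diffManifests(hashTree(firstDir), hashTree(secondDir));
}
```

The sketch returns the sorted list of mismatches rather than a boolean. That way a failing run can print the offending paths as evidence.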
+`scripts/` is a compatibility surface, not a dumping ground for implementation logic. Public JavaScript wrappers are documented in `scripts/README.md` and checked by `npm run validate:source --`; keep wrappers thin and move behavior into `src/cli/` or `src/tools/`.
+`.codex/` and `.claude/` at the repo root are installed runtime configuration surfaces. They can be useful for local testing, but canonical hook scripts and runtime config live in `scripts/hooks/`, `context/`, `packaging/`, and generated bundle output. A past incident with a stale local `.codex/hooks.json` came from treating an installed runtime file as if it were canonical source. The fix is to regenerate or reinstall the runtime config, and to update the canonical builder and install sources when behavior must change.
+`.flow-agents/<slug>/` is workflow working memory. Keep plans, sidecars, evidence, and handoffs there while work is active. Promote stable outcomes into `docs/`, schemas, source, or provider records before final acceptance.
+`evals/fixtures/` ownership is tracked in [Fixture Ownership](fixture-ownership.md). Do not delete or add fixture directories without updating that inventory and the owning eval evidence.
+## Dead-Code Cleanup Policy
+Do not delete production source in a cleanup pass unless repeatable proof shows it is unused and validators still pass. Minimum proof for a candidate:
+1. `git ls-files <path>` shows whether the path is tracked source or local ignored state.
+2. `rg -n "<path-or-command-name>" README.md docs context agents agent-cards skills powers prompts scripts src evals packaging kits integrations package.json install.sh .github .githooks` has no live references, or all references are updated in the same change.
+3. Public commands, package bins, installers, bundle manifests, kit catalogs, and evals do not depend on the path, or compatibility shims remain in place.
+4. Generated output has a documented source and regeneration command before removal.
+5. Relevant validation passes after cleanup.
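One way to keep the checklist above honest is to record the evidence for each candidate and derive blockers from it. This is a hypothetical sketch: the `CleanupEvidence` shape and the `cleanupBlockers` function are illustrative. Gathering the inputs (running `git ls-files`, `rg`, and the validators) is assumed to happen elsewhere.

```typescript
// Hypothetical summary of the evidence gathered for one deletion candidate.
interface CleanupEvidence {
  path: string;
  tracked: boolean; // step 1: `git ls-files <path>` printed the path (tracked source vs. ignored local state)
  unresolvedReferences: string[]; // step 2: `rg` hits that are not updated in the same change
  dependents: string[]; // step 3: commands, bins, installers, manifests, kit catalogs, evals
  shimsRetained: boolean; // step 3: compatibility shims kept for those dependents
  generated: boolean;
  regenerationCommand?: string; // step 4: required before removing generated output
  validationPassed: boolean; // step 5
}

// Returns every reason the candidate cannot be deleted yet; empty means the proof is complete.
function cleanupBlockers(e: CleanupEvidence): string[] {
  const blockers: string[] = [];
  if (e.unresolvedReferences.length > 0) {
    blockers.push(`${e.path} is still referenced by: ${e.unresolvedReferences.join(", ")}`);
  }
  if (e.dependents.length > 0 && !e.shimsRetained) {
    blockers.push(`${e.path} has dependents without shims: ${e.dependents.join(", ")}`);
  }
  if (e.generated && !e.regenerationCommand) {
    blockers.push(`${e.path} is generated output with no documented regeneration command`);
  }
  if (!e.validationPassed) {
    blockers.push("validation has not passed after cleanup");
  }
  return blockers;
}
```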
+Current low-risk cleanup candidates are ignored local caches and generated result payloads, not production source. Keep `evals/fixtures/` and tracked `.gitkeep` or baseline ignore files unless a separate eval migration proves they are obsolete.

package/docs/sandbox-policy.md ADDED

@@ -0,0 +1,56 @@
+---
+title: Sandbox Policy
+---
+# Sandbox Policy
+Flow Agents workflows should choose the smallest execution boundary that can produce useful evidence.
+The policy is not a replacement for a runtime's permission model. It is the shared vocabulary agents use when planning, delegating, asking for approval, and explaining why a task needs isolation or escalation.
+The canonical contract lives in `context/contracts/sandbox-policy.md`. This page is the human-facing explanation of the same vocabulary.
+## Modes
+| Mode | Use When | Allowed Shape | Approval / Evidence |
+| --- | --- | --- | --- |
+| `local-read-only` | Research, review, planning, verification that does not mutate files or external systems | Read files, inspect configs, run safe read-only commands | Record sources read and checks attempted |
+| `local-edit` | Small changes in the current workspace with low conflict and rollback risk | Edit files inside the active workspace; run local checks | Record modified files, commands, and evidence |
+| `worktree` | Parallel work, risky refactors, overlapping file ownership, generated artifacts, or tasks likely to outlive one session | Create or use an isolated git worktree and branch | Record worktree path, branch, owner, and merge/cleanup plan |
+| `container` | Untrusted dependencies, destructive build steps, generated-code experiments, or tools that may pollute the host | Run in a disposable container or equivalent isolated environment | Record image/context, mounted paths, and copied outputs |
+| `cloud-sandbox` | Cloud resources, remote preview environments, infrastructure plans, or account-scoped experiments | Use scoped cloud accounts/projects/environments with explicit owner and teardown | Record account/project, region, permissions, cost risk, and teardown evidence |
+| `privileged-integration` | Actions that mutate production-like systems, send external messages, access sensitive data, approve releases, or require elevated local permissions | Use the narrowest tool/action scope; request explicit approval | Record approval reason, target, expected effect, rollback, and post-action verification |
+## Selection Rules
+- Start at `local-read-only` for discovery and planning.
+- Use `local-edit` only when the active workspace is the intended edit surface and conflict risk is low.
+- Prefer `worktree` when work overlaps with another active task, touches broad/shared files, or may need independent review.
+- Prefer `container` when dependency installation, generation, or destructive tooling could change host state outside the repo.
+- Use `cloud-sandbox` for cloud experiments instead of real/shared environments unless the user explicitly authorizes otherwise.
+- Use `privileged-integration` only with explicit scope, approval reason, and evidence that the action completed or was rolled back.
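The selection rules can be read as a decision procedure. It checks the strongest needs first and otherwise falls back to the smallest boundary. In this sketch the `TaskTraits` fields and `selectSandboxMode` are hypothetical. Sending every mutating task that fails the `local-edit` conditions to `worktree` is an interpretation of the rules, not something the policy states.

```typescript
type SandboxMode =
  | "local-read-only"
  | "local-edit"
  | "worktree"
  | "container"
  | "cloud-sandbox"
  | "privileged-integration";

// Hypothetical task traits; field names are illustrative, not part of any Flow Agents schema.
interface TaskTraits {
  mutatesFiles: boolean; // false for discovery, planning, review, read-only verification
  editsActiveWorkspace: boolean; // the active workspace is the intended edit surface
  lowConflictRisk: boolean;
  overlapsOtherTask: boolean; // overlapping work, broad/shared files, or independent review
  mayChangeHostState: boolean; // dependency installs, generation, or destructive tooling
  needsCloudResources: boolean;
  touchesPrivilegedSystems: boolean; // production-like systems, external sends, sensitive data, releases
}

function selectSandboxMode(task: TaskTraits): SandboxMode {
  // Strongest requirements win; privileged-integration still needs explicit approval.
  if (task.touchesPrivilegedSystems) return "privileged-integration";
  if (task.needsCloudResources) return "cloud-sandbox";
  if (task.mayChangeHostState) return "container";
  // Nothing to mutate: stay at the default starting mode.
  if (!task.mutatesFiles) return "local-read-only";
  // Assumption: any edit that fails the local-edit conditions moves to an isolated worktree.
  if (task.overlapsOtherTask || !task.editsActiveWorkspace || !task.lowConflictRisk) {
    return "worktree";
  }
  return "local-edit";
}
```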
+## Required Records
+Planning or execution artifacts should record:
+- `sandbox_mode`: one of the modes above
+- `scope`: files, systems, resources, accounts, or branches in scope
+- `owner`: agent, human, or integration responsible for the action
+- `approval`: not required, requested, granted, denied, or blocked
+- `rollback`: how to revert or clean up if the action fails
+- `evidence`: commands, checks, logs, links, screenshots, or sidecars proving the outcome
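A workflow tool could validate these fields when it reads a planning or execution artifact. The sketch assumes the record is parsed JSON with the field names listed above. Two details are assumptions: the exact serialized spelling of the approval values (for example `not required` versus a hyphenated form), and allowing an empty `evidence` list during planning. The canonical contract is `context/contracts/sandbox-policy.md`. `validateSandboxRecord` is a hypothetical name.

```typescript
// Vocabulary from this page; the serialized spelling of approval values is assumed.
const SANDBOX_MODES: readonly string[] = [
  "local-read-only",
  "local-edit",
  "worktree",
  "container",
  "cloud-sandbox",
  "privileged-integration",
];
const APPROVAL_STATES: readonly string[] = ["not required", "requested", "granted", "denied", "blocked"];

function isNonEmptyString(value: unknown): boolean {
  return typeof value === "string" && value.trim() !== "";
}

function isStringList(value: unknown): boolean {
  return Array.isArray(value) && value.every((item) => typeof item === "string");
}

// Check an untyped record (for example, parsed from a JSON sidecar) and list what is wrong.
function validateSandboxRecord(raw: Record<string, unknown>): string[] {
  const problems: string[] = [];
  const mode = raw["sandbox_mode"];
  if (typeof mode !== "string" || !SANDBOX_MODES.includes(mode)) {
    problems.push("sandbox_mode must be one of the documented modes");
  }
  const scope = raw["scope"];
  if (!isStringList(scope) || (scope as string[]).length === 0) {
    problems.push("scope must list at least one file, system, resource, account, or branch");
  }
  if (!isNonEmptyString(raw["owner"])) {
    problems.push("owner must name the responsible agent, human, or integration");
  }
  const approval = raw["approval"];
  if (typeof approval !== "string" || !APPROVAL_STATES.includes(approval)) {
    problems.push("approval must be one of: " + APPROVAL_STATES.join(", "));
  }
  if (!isNonEmptyString(raw["rollback"])) {
    problems.push("rollback must describe how to revert or clean up");
  }
  // Assumption: an empty evidence list is acceptable while the artifact is still a plan.
  if (!isStringList(raw["evidence"])) {
    problems.push("evidence must be a list of commands, checks, logs, or links");
  }
  return problems;
}
```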
+## Stop Conditions
+Stop and route back to planning or user approval when:
+- the needed mode is stronger than the plan recorded
+- the task requires destructive git operations, external sends, production data access, or cloud mutations without explicit approval
+- the rollback path is unknown
+- evidence cannot distinguish success from partial completion
+- the runtime sandbox blocks a required action and no safer equivalent exists
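The stop conditions map onto a function that returns every reason to halt, so one check can report all of them at once. This sketch assumes the Modes table lists modes from weakest to strongest, which the page implies but does not define. `ExecutionState` and `stopReasons` are hypothetical names.

```typescript
// Assumed strength order, weakest first, taken from the Modes table.
const MODE_ORDER = [
  "local-read-only",
  "local-edit",
  "worktree",
  "container",
  "cloud-sandbox",
  "privileged-integration",
] as const;
type SandboxMode = (typeof MODE_ORDER)[number];

// Hypothetical snapshot of an in-progress task; field names are illustrative.
interface ExecutionState {
  plannedMode: SandboxMode;
  neededMode: SandboxMode;
  needsSensitiveApproval: boolean; // destructive git ops, external sends, production data, cloud mutations
  approvalGranted: boolean;
  rollbackKnown: boolean;
  evidenceDistinguishesOutcome: boolean; // can tell success from partial completion
  blockedByRuntimeSandbox: boolean;
  saferEquivalentExists: boolean;
}

// Empty result means work may continue; otherwise route back to planning or user approval.
function stopReasons(s: ExecutionState): string[] {
  const reasons: string[] = [];
  if (MODE_ORDER.indexOf(s.neededMode) > MODE_ORDER.indexOf(s.plannedMode)) {
    reasons.push(`needed mode ${s.neededMode} is stronger than planned mode ${s.plannedMode}`);
  }
  if (s.needsSensitiveApproval && !s.approvalGranted) {
    reasons.push("sensitive action lacks explicit approval");
  }
  if (!s.rollbackKnown) {
    reasons.push("rollback path is unknown");
  }
  if (!s.evidenceDistinguishesOutcome) {
    reasons.push("evidence cannot distinguish success from partial completion");
  }
  if (s.blockedByRuntimeSandbox && !s.saferEquivalentExists) {
    reasons.push("runtime sandbox blocks a required action and no safer equivalent exists");
  }
  return reasons;
}
```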
+## Relationship To Worktrees
+`pull-work` owns the first worktree decision. `execute-plan` must respect it and may upgrade to `worktree`, `container`, `cloud-sandbox`, or `privileged-integration` if implementation risk increases. Downgrades require a reason in the workflow artifact.
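The upgrade and downgrade rule can be checked mechanically whenever a workflow artifact records a mode change. This sketch assumes the Modes table lists modes from weakest to strongest, which the page implies but does not define. `ModeChange` and `checkModeChange` are hypothetical names. An upgrade to `privileged-integration` would still need the explicit approval that the selection rules require.

```typescript
// Assumed strength order, weakest first, taken from the Modes table.
const MODE_ORDER = [
  "local-read-only",
  "local-edit",
  "worktree",
  "container",
  "cloud-sandbox",
  "privileged-integration",
] as const;
type SandboxMode = (typeof MODE_ORDER)[number];

// Hypothetical record of a mode change made during execute-plan.
interface ModeChange {
  from: SandboxMode;
  to: SandboxMode;
  reason?: string;
}

// Returns an error message, or null when the change is allowed.
function checkModeChange(change: ModeChange): string | null {
  const delta = MODE_ORDER.indexOf(change.to) - MODE_ORDER.indexOf(change.from);
  const hasReason = (change.reason ?? "").trim() !== "";
  // Upgrades (and unchanged modes) pass; downgrades must carry a recorded reason.
  if (delta < 0 && !hasReason) {
    return `downgrade from ${change.from} to ${change.to} needs a recorded reason`;
  }
  return null;
}
```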