@kontourai/flow-agents 0.1.1
- package/.githooks/pre-push +11 -0
- package/.github/workflows/ci.yml +210 -0
- package/.github/workflows/docs-pages.yml +52 -0
- package/.github/workflows/publish-npm.yml +104 -0
- package/AGENTS.md +26 -0
- package/CHANGELOG.md +66 -0
- package/CODE_OF_CONDUCT.md +25 -0
- package/CONTEXT.md +300 -0
- package/CONTRIBUTING.md +44 -0
- package/LICENSE +201 -0
- package/README.md +129 -0
- package/SECURITY.md +33 -0
- package/agent-cards/dev.json +19 -0
- package/agents/dev.json +127 -0
- package/agents/tool-code-reviewer.json +61 -0
- package/agents/tool-dependencies-updater.json +118 -0
- package/agents/tool-explore-config.json +92 -0
- package/agents/tool-explore-deps.json +92 -0
- package/agents/tool-explore-entry.json +92 -0
- package/agents/tool-explore-patterns.json +92 -0
- package/agents/tool-explore-structure.json +92 -0
- package/agents/tool-explore-tests.json +92 -0
- package/agents/tool-planner.json +57 -0
- package/agents/tool-playwright.json +145 -0
- package/agents/tool-security-reviewer.json +56 -0
- package/agents/tool-verifier.json +61 -0
- package/agents/tool-worker.json +58 -0
- package/build/src/cli/console-learning-projection.js +123 -0
- package/build/src/cli/docs-preview.js +39 -0
- package/build/src/cli/effective-backlog-settings.js +102 -0
- package/build/src/cli/export-bookmarks.js +38 -0
- package/build/src/cli/fixture-retirement-audit.js +140 -0
- package/build/src/cli/flow-kit.js +138 -0
- package/build/src/cli/import-bookmarks.js +50 -0
- package/build/src/cli/init.js +239 -0
- package/build/src/cli/instinct-cli.js +93 -0
- package/build/src/cli/promote-workflow-artifact.js +63 -0
- package/build/src/cli/publish-change-helper.js +154 -0
- package/build/src/cli/pull-work-provider.js +469 -0
- package/build/src/cli/runtime-adapter.js +23 -0
- package/build/src/cli/telemetry-doctor.js +221 -0
- package/build/src/cli/usage-feedback.js +443 -0
- package/build/src/cli/validate-hook-influence.js +152 -0
- package/build/src/cli/validate-source-tree.js +31 -0
- package/build/src/cli/validate-workflow-artifacts.js +486 -0
- package/build/src/cli/veritas-governance.js +262 -0
- package/build/src/cli/workflow-artifact-cleanup-audit.js +272 -0
- package/build/src/cli/workflow-sidecar.js +816 -0
- package/build/src/cli.js +89 -0
- package/build/src/flow-kit/validate.js +75 -0
- package/build/src/lib/args.js +45 -0
- package/build/src/lib/fs.js +62 -0
- package/build/src/lib/workflow-learning-projection.js +334 -0
- package/build/src/runtime-adapters.js +146 -0
- package/build/src/tools/build-universal-bundles.js +397 -0
- package/build/src/tools/common.js +56 -0
- package/build/src/tools/filter-installed-packs.js +132 -0
- package/build/src/tools/generate-context-map.js +198 -0
- package/build/src/tools/validate-package.js +64 -0
- package/build/src/tools/validate-source-tree.js +622 -0
- package/console.telemetry.json +176 -0
- package/context/base-rules.md +17 -0
- package/context/code-review-standards.md +62 -0
- package/context/coding-standards.md +42 -0
- package/context/common/orchestrators.md +12 -0
- package/context/common/subagents.md +28 -0
- package/context/contracts/artifact-contract.md +182 -0
- package/context/contracts/builder-kit-workflow-state-contract.md +319 -0
- package/context/contracts/delivery-contract.md +69 -0
- package/context/contracts/execution-contract.md +53 -0
- package/context/contracts/governance-adapter-contract.md +67 -0
- package/context/contracts/planning-contract.md +85 -0
- package/context/contracts/review-contract.md +104 -0
- package/context/contracts/sandbox-policy.md +52 -0
- package/context/contracts/verification-contract.md +134 -0
- package/context/contracts/work-item-contract.md +215 -0
- package/context/deferred/demo-mode.md +33 -0
- package/context/deferred/languages/go.md +31 -0
- package/context/deferred/languages/python.md +31 -0
- package/context/deferred/languages/typescript.md +34 -0
- package/context/deferred/parallelization.md +35 -0
- package/context/deferred/worktree-isolation.md +24 -0
- package/context/development-workflow.md +50 -0
- package/context/scripts/context-budget/budget-scan.sh +166 -0
- package/context/scripts/detect-tools.sh +3 -0
- package/context/scripts/discover-agents.sh +28 -0
- package/context/scripts/git-status.sh +49 -0
- package/context/scripts/hooks/config-protection.js +79 -0
- package/context/scripts/hooks/desktop-notify.sh +39 -0
- package/context/scripts/hooks/governance-audit.sh +135 -0
- package/context/scripts/hooks/lib/audit-transport.sh +40 -0
- package/context/scripts/hooks/lib/hook-flags.js +49 -0
- package/context/scripts/hooks/lib/patterns.sh +57 -0
- package/context/scripts/hooks/lib/resolve-formatter.js +80 -0
- package/context/scripts/hooks/post-edit-accumulator.js +66 -0
- package/context/scripts/hooks/pre-commit-quality.js +194 -0
- package/context/scripts/hooks/quality-gate.js +93 -0
- package/context/scripts/hooks/report-only-guard.js +21 -0
- package/context/scripts/hooks/run-hook.js +136 -0
- package/context/scripts/hooks/stop-format-typecheck.js +141 -0
- package/context/scripts/hooks/stop-goal-fit.js +337 -0
- package/context/scripts/hooks/workflow-steering.js +250 -0
- package/context/scripts/telemetry/console-presets.sh +14 -0
- package/context/scripts/telemetry/install-console-config.sh +214 -0
- package/context/scripts/telemetry/lib/config.sh +85 -0
- package/context/scripts/telemetry/lib/enrich.sh +115 -0
- package/context/scripts/telemetry/lib/redact.sh +22 -0
- package/context/scripts/telemetry/lib/session.sh +63 -0
- package/context/scripts/telemetry/lib/transport.sh +183 -0
- package/context/scripts/telemetry/lib/usage.sh +29 -0
- package/context/scripts/telemetry/sync-agents.sh +173 -0
- package/context/scripts/telemetry/telemetry.conf +23 -0
- package/context/scripts/telemetry/telemetry.sh +387 -0
- package/context/scripts/validate-package.sh +89 -0
- package/context/settings/backlog-provider-settings.json +54 -0
- package/context/templates/core/identity.md +26 -0
- package/context/templates/core/user.md +15 -0
- package/docs/_config.yml +15 -0
- package/docs/_layouts/default.html +87 -0
- package/docs/adr/0001-flow-agents-consumes-flow.md +77 -0
- package/docs/adr/0002-flow-kits-as-extension-unit.md +13 -0
- package/docs/adr/0003-flow-agents-coordinates-kits-and-adapters.md +13 -0
- package/docs/adr/0004-gates-expect-surface-claims.md +15 -0
- package/docs/adr/0005-kubernetes-inspired-resource-contracts.md +48 -0
- package/docs/adr/0006-typescript-first-source-policy.md +98 -0
- package/docs/agent-system-guidebook.md +391 -0
- package/docs/agent-usage-feedback-loop.md +351 -0
- package/docs/assets/favicon.svg +13 -0
- package/docs/assets/og-image.png +0 -0
- package/docs/assets/site.css +774 -0
- package/docs/assets/site.js +139 -0
- package/docs/configurable-workflow-routing.md +174 -0
- package/docs/context-map.md +145 -0
- package/docs/developer-architecture.md +145 -0
- package/docs/developer-hook-setup.md +61 -0
- package/docs/fixture-ownership.md +44 -0
- package/docs/flow-kit-repository-contract.md +180 -0
- package/docs/index.md +129 -0
- package/docs/kontour-resource-contract.md +358 -0
- package/docs/migrations.md +64 -0
- package/docs/north-star.md +322 -0
- package/docs/operating-layers.md +110 -0
- package/docs/repository-structure.md +132 -0
- package/docs/sandbox-policy.md +56 -0
- package/docs/skills-map.md +203 -0
- package/docs/standards-register.md +96 -0
- package/docs/veritas-integration.md +165 -0
- package/docs/work-item-adapters.md +72 -0
- package/docs/workflow-artifact-lifecycle.md +141 -0
- package/docs/workflow-eval-strategy.md +295 -0
- package/docs/workflow-shared-contracts.md +51 -0
- package/docs/workflow-usage-guide.md +443 -0
- package/evals/ARCHITECTURE.md +143 -0
- package/evals/CONVENTIONS.md +58 -0
- package/evals/README.md +128 -0
- package/evals/acceptance/run.sh +29 -0
- package/evals/acceptance/test_claude_harness.sh +242 -0
- package/evals/acceptance/test_codex_harness.sh +108 -0
- package/evals/acceptance/test_kiro_harness.sh +128 -0
- package/evals/cases/dev/404.html +97 -0
- package/evals/cases/dev/code-review.yaml +44 -0
- package/evals/cases/dev/dashboard.html +300 -0
- package/evals/cases/dev/deliver.yaml +66 -0
- package/evals/cases/dev/dependency-update.yaml +16 -0
- package/evals/cases/dev/explore.yaml +20 -0
- package/evals/cases/dev/index.html +370 -0
- package/evals/cases/dev/package-lock.json +28 -0
- package/evals/cases/dev/package.json +16 -0
- package/evals/cases/dev/plan-work.yaml +20 -0
- package/evals/cases/dev/promptfooconfig.yaml +666 -0
- package/evals/cases/dev/search-first.yaml +20 -0
- package/evals/cases/dev/tdd-workflow.yaml +48 -0
- package/evals/cases/dev/verify-work.yaml +44 -0
- package/evals/cases/dev/workflow.yaml +34 -0
- package/evals/ci/run-baseline.sh +283 -0
- package/evals/fixtures/backlog-provider-settings/global-default.json +44 -0
- package/evals/fixtures/backlog-provider-settings/project-override.json +53 -0
- package/evals/fixtures/builder-kit-workflow-state/baseline-freshness-resolution-hint.json +139 -0
- package/evals/fixtures/builder-kit-workflow-state/direct-primitive-stop.json +59 -0
- package/evals/fixtures/builder-kit-workflow-state/empty-board-route-shape.json +55 -0
- package/evals/fixtures/builder-kit-workflow-state/happy-path.json +71 -0
- package/evals/fixtures/builder-kit-workflow-state/mid-work-resume.json +80 -0
- package/evals/fixtures/builder-kit-workflow-state/missing-prestep-recovery.json +65 -0
- package/evals/fixtures/builder-kit-workflow-state/product-build-chaining.json +60 -0
- package/evals/fixtures/builder-kit-workflow-state/stale-continuation-requires-new-probe.json +57 -0
- package/evals/fixtures/console-learning-projection/artifacts/console-learning-correction/learning.json +50 -0
- package/evals/fixtures/console-learning-projection/artifacts/console-learning-open-route/learning.json +41 -0
- package/evals/fixtures/flow-kit-repository/invalid-absolute-path/kit.json +8 -0
- package/evals/fixtures/flow-kit-repository/invalid-asset-section/flows/review.flow.json +6 -0
- package/evals/fixtures/flow-kit-repository/invalid-asset-section/kit.json +11 -0
- package/evals/fixtures/flow-kit-repository/invalid-duplicate-flow/flows/review.flow.json +6 -0
- package/evals/fixtures/flow-kit-repository/invalid-duplicate-flow/kit.json +9 -0
- package/evals/fixtures/flow-kit-repository/invalid-id/flows/review.flow.json +6 -0
- package/evals/fixtures/flow-kit-repository/invalid-id/kit.json +8 -0
- package/evals/fixtures/flow-kit-repository/invalid-malformed-json/kit.json +8 -0
- package/evals/fixtures/flow-kit-repository/invalid-missing-flow/kit.json +8 -0
- package/evals/fixtures/flow-kit-repository/invalid-missing-id/flows/review.flow.json +6 -0
- package/evals/fixtures/flow-kit-repository/invalid-missing-id/kit.json +7 -0
- package/evals/fixtures/flow-kit-repository/invalid-missing-schema-version/flows/review.flow.json +6 -0
- package/evals/fixtures/flow-kit-repository/invalid-missing-schema-version/kit.json +7 -0
- package/evals/fixtures/flow-kit-repository/invalid-name/flows/review.flow.json +6 -0
- package/evals/fixtures/flow-kit-repository/invalid-name/kit.json +8 -0
- package/evals/fixtures/flow-kit-repository/invalid-schema-version/flows/review.flow.json +6 -0
- package/evals/fixtures/flow-kit-repository/invalid-schema-version/kit.json +8 -0
- package/evals/fixtures/flow-kit-repository/invalid-traversal/kit.json +8 -0
- package/evals/fixtures/flow-kit-repository/mixed-runtime-kit/adapters/example.json +3 -0
- package/evals/fixtures/flow-kit-repository/mixed-runtime-kit/assets/example.txt +1 -0
- package/evals/fixtures/flow-kit-repository/mixed-runtime-kit/docs/README.md +3 -0
- package/evals/fixtures/flow-kit-repository/mixed-runtime-kit/flows/runtime.flow.json +26 -0
- package/evals/fixtures/flow-kit-repository/mixed-runtime-kit/kit-evals/example.json +3 -0
- package/evals/fixtures/flow-kit-repository/mixed-runtime-kit/kit-skills/mixed/SKILL.md +3 -0
- package/evals/fixtures/flow-kit-repository/mixed-runtime-kit/kit.json +44 -0
- package/evals/fixtures/flow-kit-repository/valid-local-kit/docs/README.md +3 -0
- package/evals/fixtures/flow-kit-repository/valid-local-kit/flows/review.flow.json +26 -0
- package/evals/fixtures/flow-kit-repository/valid-local-kit/kit.json +20 -0
- package/evals/fixtures/hook-influence/cases.json +336 -0
- package/evals/fixtures/pull-work-provider/github-issues.json +170 -0
- package/evals/fixtures/pull-work-wip-shepherding/global-wip-informs.json +43 -0
- package/evals/fixtures/pull-work-wip-shepherding/personal-wip-blocks.json +42 -0
- package/evals/fixtures/surface-trust/accepted-claim-trust-report.json +31 -0
- package/evals/fixtures/surface-trust/artifact-absent.json +19 -0
- package/evals/fixtures/surface-trust/integrity-mismatch-trust-report.json +32 -0
- package/evals/fixtures/surface-trust/missing-authority-trust-report.json +27 -0
- package/evals/fixtures/surface-trust/provider-absent.json +19 -0
- package/evals/fixtures/surface-trust/rejected-claim-trust-report.json +30 -0
- package/evals/fixtures/surface-trust/stale-claim-trust-snapshot.json +31 -0
- package/evals/fixtures/usage-feedback/sample-full.jsonl +11 -0
- package/evals/fixtures/usage-feedback/sample-outcomes.jsonl +1 -0
- package/evals/fixtures/veritas-governance-adapter/fake-veritas-pass.sh +18 -0
- package/evals/fixtures/veritas-governance-adapter/fake-veritas-secret-fail.sh +10 -0
- package/evals/fixtures/veritas-governance-adapter/fake-veritas-unconfigured.sh +4 -0
- package/evals/integration/test_bundle_install.sh +541 -0
- package/evals/integration/test_console_learning_projection.sh +192 -0
- package/evals/integration/test_context_map.sh +65 -0
- package/evals/integration/test_effective_backlog_settings.sh +58 -0
- package/evals/integration/test_fixture_retirement_audit.sh +58 -0
- package/evals/integration/test_flow_agents_statusline.sh +93 -0
- package/evals/integration/test_flow_kit_repository.sh +90 -0
- package/evals/integration/test_goal_fit_hook.sh +482 -0
- package/evals/integration/test_hook_category_behaviors.sh +190 -0
- package/evals/integration/test_hook_influence_cases.sh +69 -0
- package/evals/integration/test_local_flow_kit_install.sh +145 -0
- package/evals/integration/test_publish_change_helper.sh +176 -0
- package/evals/integration/test_pull_work_provider.sh +140 -0
- package/evals/integration/test_runtime_adapter_activation.sh +106 -0
- package/evals/integration/test_telemetry.sh +485 -0
- package/evals/integration/test_telemetry_doctor.sh +193 -0
- package/evals/integration/test_usage_feedback_dashboard.sh +169 -0
- package/evals/integration/test_usage_feedback_global.sh +117 -0
- package/evals/integration/test_usage_feedback_import.sh +227 -0
- package/evals/integration/test_usage_feedback_outcomes.sh +165 -0
- package/evals/integration/test_usage_feedback_report.sh +263 -0
- package/evals/integration/test_veritas_governance_adapter.sh +235 -0
- package/evals/integration/test_workflow_artifact_cleanup_audit.sh +287 -0
- package/evals/integration/test_workflow_artifacts.sh +1247 -0
- package/evals/integration/test_workflow_sidecar_writer.sh +2112 -0
- package/evals/integration/test_workflow_steering_hook.sh +337 -0
- package/evals/lib/assertions/delegated-to.js +40 -0
- package/evals/lib/assertions/max-tool-calls.js +15 -0
- package/evals/lib/assertions/no-write-tools.js +27 -0
- package/evals/lib/assertions/pass-at-k.js +39 -0
- package/evals/lib/assertions/telemetry-utils.js +105 -0
- package/evals/lib/assertions/tool-called.js +39 -0
- package/evals/lib/assertions/verify-after-fix.js +61 -0
- package/evals/lib/claude-judge.sh +40 -0
- package/evals/lib/claude-provider.sh +74 -0
- package/evals/lib/codex-judge.sh +39 -0
- package/evals/lib/codex-provider.sh +81 -0
- package/evals/lib/eval-dev.sh +5 -0
- package/evals/lib/eval-judge.sh +22 -0
- package/evals/lib/eval-provider.sh +26 -0
- package/evals/lib/eval-report.sh +73 -0
- package/evals/lib/kiro-dev.sh +4 -0
- package/evals/lib/kiro-judge.sh +17 -0
- package/evals/lib/kiro-provider.sh +62 -0
- package/evals/lib/node.sh +111 -0
- package/evals/promptfooconfig.yaml +70 -0
- package/evals/run.sh +309 -0
- package/evals/static/test_evidence_refs.sh +141 -0
- package/evals/static/test_package.sh +407 -0
- package/evals/static/test_repo_hooks.sh +68 -0
- package/evals/static/test_universal_bundles.sh +274 -0
- package/evals/static/test_workflow_skills.sh +1207 -0
- package/install.sh +64 -0
- package/integrations/veritas/flow-agents.adapter.json +138 -0
- package/integrations/veritas/flow-agents.authority-settings.json +26 -0
- package/integrations/veritas/flow-agents.repo-standards.json +82 -0
- package/kits/builder/flows/build.flow.json +218 -0
- package/kits/builder/flows/shape.flow.json +127 -0
- package/kits/builder/kit.json +19 -0
- package/kits/catalog.json +11 -0
- package/package.json +130 -0
- package/packaging/README.md +60 -0
- package/packaging/manifest.json +173 -0
- package/packaging/packs.json +69 -0
- package/powers/dependency-checker/POWER.md +20 -0
- package/powers/dependency-checker/mcp.json +20 -0
- package/powers/playwright/POWER.md +25 -0
- package/powers/playwright/mcp.json +12 -0
- package/prompts/code-audit.md +123 -0
- package/prompts/kcommit.md +88 -0
- package/schemas/backlog-provider-settings.schema.json +138 -0
- package/schemas/workflow-acceptance.schema.json +216 -0
- package/schemas/workflow-critique.schema.json +113 -0
- package/schemas/workflow-evidence.schema.json +357 -0
- package/schemas/workflow-handoff.schema.json +52 -0
- package/schemas/workflow-learning.schema.json +223 -0
- package/schemas/workflow-release.schema.json +172 -0
- package/schemas/workflow-state.schema.json +80 -0
- package/scripts/README.md +111 -0
- package/scripts/build-universal-bundles.js +3 -0
- package/scripts/check-content-boundary.cjs +99 -0
- package/scripts/context-budget/budget-scan.sh +166 -0
- package/scripts/detect-tools.sh +3 -0
- package/scripts/discover-agents.sh +28 -0
- package/scripts/effective-backlog-settings.js +2 -0
- package/scripts/filter-installed-packs.js +2 -0
- package/scripts/flow-kit.js +2 -0
- package/scripts/generate-context-map.js +2 -0
- package/scripts/git-status.sh +49 -0
- package/scripts/hooks/claude-hook-adapter.js +174 -0
- package/scripts/hooks/claude-telemetry-hook.js +115 -0
- package/scripts/hooks/codex-hook-adapter.js +176 -0
- package/scripts/hooks/codex-telemetry-hook.js +95 -0
- package/scripts/hooks/config-protection.js +79 -0
- package/scripts/hooks/desktop-notify.sh +39 -0
- package/scripts/hooks/governance-audit.sh +135 -0
- package/scripts/hooks/lib/audit-transport.sh +40 -0
- package/scripts/hooks/lib/hook-flags.js +49 -0
- package/scripts/hooks/lib/patterns.sh +57 -0
- package/scripts/hooks/lib/resolve-formatter.js +80 -0
- package/scripts/hooks/post-edit-accumulator.js +66 -0
- package/scripts/hooks/pre-commit-quality.js +194 -0
- package/scripts/hooks/quality-gate.js +93 -0
- package/scripts/hooks/report-only-guard.js +21 -0
- package/scripts/hooks/run-hook.js +136 -0
- package/scripts/hooks/stop-format-typecheck.js +141 -0
- package/scripts/hooks/stop-goal-fit.js +337 -0
- package/scripts/hooks/workflow-steering.js +250 -0
- package/scripts/install-codex-home.sh +106 -0
- package/scripts/package.json +3 -0
- package/scripts/promote-workflow-artifact.js +2 -0
- package/scripts/publish-change-helper.js +2 -0
- package/scripts/pull-work-provider.js +2 -0
- package/scripts/setup-repo-hooks.sh +8 -0
- package/scripts/statusline/flow-agents-statusline.js +157 -0
- package/scripts/telemetry/console-presets.sh +14 -0
- package/scripts/telemetry/install-console-config.sh +214 -0
- package/scripts/telemetry/lib/config.sh +85 -0
- package/scripts/telemetry/lib/enrich.sh +115 -0
- package/scripts/telemetry/lib/redact.sh +22 -0
- package/scripts/telemetry/lib/session.sh +63 -0
- package/scripts/telemetry/lib/transport.sh +183 -0
- package/scripts/telemetry/lib/usage.sh +29 -0
- package/scripts/telemetry/sync-agents.sh +173 -0
- package/scripts/telemetry/telemetry.conf +23 -0
- package/scripts/telemetry/telemetry.sh +387 -0
- package/scripts/usage-feedback.js +2 -0
- package/scripts/validate-hook-influence-cases.js +2 -0
- package/scripts/validate-package.sh +89 -0
- package/scripts/validate-source-tree.js +9 -0
- package/skills/agentic-engineering/SKILL.md +62 -0
- package/skills/browser-test/SKILL.md +51 -0
- package/skills/builder-shape/SKILL.md +76 -0
- package/skills/context-budget/SKILL.md +40 -0
- package/skills/deliver/SKILL.md +241 -0
- package/skills/dependency-update/SKILL.md +68 -0
- package/skills/design-probe/SKILL.md +107 -0
- package/skills/eval-rebuild/SKILL.md +39 -0
- package/skills/evidence-gate/SKILL.md +186 -0
- package/skills/execute-plan/SKILL.md +110 -0
- package/skills/explore/SKILL.md +137 -0
- package/skills/feedback-loop/SKILL.md +87 -0
- package/skills/fix-bug/SKILL.md +133 -0
- package/skills/frontend-design/SKILL.md +80 -0
- package/skills/github-cli/SKILL.md +63 -0
- package/skills/idea-to-backlog/SKILL.md +267 -0
- package/skills/knowledge-capture/SKILL.md +55 -0
- package/skills/learning-review/SKILL.md +115 -0
- package/skills/pickup-probe/SKILL.md +114 -0
- package/skills/plan-work/SKILL.md +176 -0
- package/skills/pull-work/SKILL.md +309 -0
- package/skills/release-readiness/SKILL.md +121 -0
- package/skills/review-work/SKILL.md +161 -0
- package/skills/search-first/SKILL.md +66 -0
- package/skills/tdd-workflow/SKILL.md +140 -0
- package/skills/verify-work/SKILL.md +109 -0
- package/src/cli/console-learning-projection.ts +140 -0
- package/src/cli/effective-backlog-settings.ts +99 -0
- package/src/cli/fixture-retirement-audit.ts +154 -0
- package/src/cli/flow-kit.ts +139 -0
- package/src/cli/init.ts +248 -0
- package/src/cli/promote-workflow-artifact.ts +64 -0
- package/src/cli/publish-change-helper.ts +143 -0
- package/src/cli/pull-work-provider.ts +481 -0
- package/src/cli/runtime-adapter.ts +24 -0
- package/src/cli/telemetry-doctor.ts +243 -0
- package/src/cli/usage-feedback.ts +418 -0
- package/src/cli/validate-hook-influence.ts +119 -0
- package/src/cli/validate-source-tree.ts +30 -0
- package/src/cli/validate-workflow-artifacts.ts +411 -0
- package/src/cli/veritas-governance.ts +322 -0
- package/src/cli/workflow-artifact-cleanup-audit.ts +281 -0
- package/src/cli/workflow-sidecar.ts +676 -0
- package/src/cli.ts +95 -0
- package/src/flow-kit/validate.ts +74 -0
- package/src/lib/args.ts +43 -0
- package/src/lib/fs.ts +62 -0
- package/src/lib/workflow-learning-projection.ts +491 -0
- package/src/runtime-adapters.ts +154 -0
- package/src/tools/build-universal-bundles.ts +366 -0
- package/src/tools/common.ts +61 -0
- package/src/tools/filter-installed-packs.ts +129 -0
- package/src/tools/generate-context-map.ts +199 -0
- package/src/tools/validate-package.ts +57 -0
- package/src/tools/validate-source-tree.ts +488 -0
- package/tsconfig.json +19 -0
- package/veritas.claims.json +6 -0
@@ -0,0 +1,186 @@
---
name: "evidence-gate"
description: "Evaluate whether completed work is trustworthy enough for human review, merge, or release. Use after implementation, verify-work, provider checks, CI, or remediation to map acceptance criteria to evidence, inspect scope integrity, classify failures, assess check health, and produce a confidence report."
---

# Evidence Gate

Build confidence with falsifiable evidence, not process completion.

Evidence Gate is not Release Readiness. It asks whether completed work has enough trustworthy evidence, scope integrity, and provider/runtime signal to publish the change, continue fixing, or ask for a human decision. Release Readiness comes later and decides whether a published branch/provider change should merge, release, deploy, hold, or roll back.

## Contract

- Review evidence after implementation and verification.
- Do not fix code.
- Do not mark unverified work as passing.
- Treat `NOT_VERIFIED` as a first-class outcome.
- Separate evidence provenance: human-authored, agent-authored, CI-generated, runtime-observed.
- Do not approve release readiness.
- After a clean local evidence verdict, require a publish-change gate before `release-readiness`: verified diff committed, branch pushed, provider change opened or updated by the active `ChangeProvider` or an explicit no-provider-change reason recorded, closing refs recorded, provider checks known, and evidence refs linked.
- Provider-facing summaries, PR/change descriptions, issue comments, closure comments, and final acceptance comments that claim implementation behavior must include an `Acceptance Evidence` table with columns `AC id`, `Status`, `Command/Test Evidence`, `Source Evidence / Permalinks`, and `Gaps`.

## Inputs

- Work brief or selected GitHub issue.
- Execution plan.
- Verification report.
- Provider change / branch / check run links when available.
- Changed-file summary.
- Active TODOs, issue links, and release/rollback notes.

## Artifact Contract

Write or update `.flow-agents/<slug>/<slug>--evidence-gate.md` with:

- `intent`: issue/brief, acceptance criteria, non-goals, risk class
- `evidence_manifest`: command/check name, source, timestamp, result, link/output pointer
- `test_map`: acceptance criterion to evidence tier and gaps
- `integrity_report`: scope drift, weakened tests/config, sensitive files
- `ci_report`: checks, reruns, flakes, failures, skipped checks
- `risk_assessment`: residual risks and required human review
- `verdict`: PASS, FAIL, or NOT_VERIFIED
- `next_step`: publish-change, release-readiness, verify-work, execute-plan, plan-work, CI remediation, or human decision

Also write or update structured sidecars:

- `state.json`: phase `evidence`, current status, and required next action
- `acceptance.json`: final criterion statuses and goal-fit status
- `evidence.json`: normalized checks, `standard_refs`, external evidence refs, not-verified gaps, and verdict
- `handoff.json`: next step and blockers when verdict is not a clean pass

Prefer `npm run workflow:sidecar --` for sidecar updates when available, then validate the artifact directory before reporting a clean pass.
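
As a rough illustration of the artifact contract, the sketch below builds the artifact path from a slug and reports which required sections are still missing from a report object. The path pattern and section names come from the list above; the function names and the idea of representing the report as a plain object are assumptions made for this example, not part of the package.

```typescript
// Section names are copied from the artifact contract above.
// Everything else in this sketch is hypothetical.
const REQUIRED_SECTIONS = [
  "intent",
  "evidence_manifest",
  "test_map",
  "integrity_report",
  "ci_report",
  "risk_assessment",
  "verdict",
  "next_step",
] as const;

type SectionName = (typeof REQUIRED_SECTIONS)[number];

// Builds `.flow-agents/<slug>/<slug>--evidence-gate.md` for a given slug.
function evidenceGatePath(slug: string): string {
  return `.flow-agents/${slug}/${slug}--evidence-gate.md`;
}

// Returns the required sections that are absent or empty in the report.
function missingSections(
  report: Partial<Record<SectionName, unknown>>,
): SectionName[] {
  return REQUIRED_SECTIONS.filter(
    (name) => report[name] === undefined || report[name] === "",
  );
}
```

A caller could refuse to report a clean pass while `missingSections` returns a non-empty list.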

## Workflow

### 1. Anchor To Intent

Restate:

- original problem
- acceptance criteria
- non-goals
- expected risk class
- authoritative artifacts

If acceptance criteria changed after implementation began, flag scope drift unless the decision is documented.

### 2. Build Test Map

For each acceptance criterion, map evidence to one of:

- existing automated test
- new or modified automated test
- browser/runtime check
- static analysis
- CI check
- manual/human verification
- `NOT_VERIFIED` with rationale

Block clean pass if high-risk criteria have only indirect evidence.
Every acceptance criterion must map to evidence or `NOT_VERIFIED`.
For implementation-behavior claims, each criterion must map to both command/test proof and structured source evidence refs. Source refs require `kind: "source"`, `file`, `line_start`, `line_end`, and `excerpt`; include immutable GitHub blob permalinks pinned to a commit SHA in `url` when a pushed commit/provider URL exists. Local file/line refs are acceptable only as pre-publish fallback evidence.

Use this table shape in evidence-gate summaries and provider/closure comments:

| AC id | Status | Command/Test Evidence | Source Evidence / Permalinks | Gaps |
| --- | --- | --- | --- | --- |

Rows must preserve the original AC ids. If source evidence is missing for a behavior claim, the row must say `NOT_VERIFIED` or name an accepted gap; do not issue a clean pass from prose-only claims.
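
The table rule can be sketched in a few lines. The `SourceRef` fields (`kind`, `file`, `line_start`, `line_end`, `excerpt`, `url`) and the column headers are taken from the text above; the `AcceptanceRow` type, the function names, and the choice to treat any listed gap as an accepted gap are assumptions for illustration only. The permalink helper uses the usual GitHub blob URL form with a line-range fragment.

```typescript
type Status = "PASS" | "FAIL" | "NOT_VERIFIED";

interface SourceRef {
  kind: "source";
  file: string;
  line_start: number;
  line_end: number;
  excerpt: string;
  url?: string; // immutable blob permalink, once a pushed commit exists
}

// Hypothetical row shape for this sketch.
interface AcceptanceRow {
  acId: string;
  status: Status;
  commandEvidence: string[];
  sourceRefs: SourceRef[];
  gaps: string[];
}

// Builds a commit-pinned GitHub blob permalink for a source ref.
function githubPermalink(repoUrl: string, commitSha: string, ref: SourceRef): string {
  const lines =
    ref.line_start === ref.line_end
      ? `L${ref.line_start}`
      : `L${ref.line_start}-L${ref.line_end}`;
  return `${repoUrl}/blob/${commitSha}/${ref.file}#${lines}`;
}

// A claimed PASS without both kinds of proof is downgraded unless a gap is named.
// Assumption: any gap listed on the row counts as an accepted gap.
function effectiveStatus(row: AcceptanceRow): Status {
  const missingProof = row.commandEvidence.length === 0 || row.sourceRefs.length === 0;
  if (row.status === "PASS" && missingProof && row.gaps.length === 0) {
    return "NOT_VERIFIED";
  }
  return row.status;
}

// Renders the Acceptance Evidence table, preserving the original AC ids.
function renderAcceptanceTable(rows: AcceptanceRow[]): string {
  const cell = (items: string[]) => (items.length > 0 ? items.join("; ") : "-");
  const header = [
    "| AC id | Status | Command/Test Evidence | Source Evidence / Permalinks | Gaps |",
    "| --- | --- | --- | --- | --- |",
  ];
  const body = rows.map((row) => {
    // Prefer the permalink; fall back to a local file/line ref before publishing.
    const sources = row.sourceRefs.map(
      (ref) => ref.url ?? `${ref.file}:${ref.line_start}-${ref.line_end}`,
    );
    return `| ${row.acId} | ${effectiveStatus(row)} | ${cell(row.commandEvidence)} | ${cell(sources)} | ${cell(row.gaps)} |`;
  });
  return [...header, ...body].join("\n");
}
```

Downgrading inside the renderer rather than trusting the caller's status keeps a prose-only claim from reaching a provider comment as a pass.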

### 3. Scope And Integrity Check

Check for process gaming or accidental drift:

- scope expanded beyond issue/brief
- acceptance criteria changed after implementation
- tests removed or weakened
- verification config altered
- CI config altered
- required CI bypassed
- sensitive files touched without review

Sensitive areas include auth, security middleware, data migrations, CI config, deployment scripts, feature flags, test helpers, lint/type config, payment, crypto, and filesystem/network operations.
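
One way to surface sensitive files from a changed-file summary is a small pattern table. The area names below come from the list above, but every path pattern is a guess made for this sketch; a real repository would need its own patterns, and the table covers only a few of the listed areas.

```typescript
// Hypothetical path patterns for a few of the sensitive areas named above.
// These are illustrative guesses, not patterns shipped by the package.
const SENSITIVE_PATTERNS: Array<{ area: string; pattern: RegExp }> = [
  { area: "auth", pattern: /(^|\/)auth/i },
  { area: "data migrations", pattern: /(^|\/)migrations?\// },
  { area: "CI config", pattern: /^\.github\/workflows\// },
  { area: "deployment scripts", pattern: /(^|\/)deploy/i },
  {
    area: "lint/type config",
    pattern: /(^|\/)(tsconfig.*\.json|\.eslintrc.*|eslint\.config\.[cm]?js)$/,
  },
];

interface SensitiveHit {
  path: string;
  area: string;
}

// Returns one hit per changed path that matches a sensitive area (first match wins).
function flagSensitive(changedPaths: string[]): SensitiveHit[] {
  const hits: SensitiveHit[] = [];
  for (const path of changedPaths) {
    const match = SENSITIVE_PATTERNS.find((entry) => entry.pattern.test(path));
    if (match) {
      hits.push({ path, area: match.area });
    }
  }
  return hits;
}
```

Each hit would then be listed in `integrity_report` with a note on whether the change was reviewed.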

### 4. CI And Flake Assessment

Use `github-cli` / `gh` when available.

Record:

- check names
- pass/fail/skipped
- rerun count
- flake suspicion
- logs or artifact links
- failure class
- standard evidence refs when CI emits SARIF, JUnit, TAP, OpenTelemetry, Veritas, or another native proof format

For Flow Agents source changes, prefer the GitHub Actions `Flow Agents CI / Builder Kit Baseline` provider check when present. Its local equivalent is `bash evals/ci/run-baseline.sh`, which writes `evals/results/ci-baseline/summary.md` and command logs. Treat skipped live GitHub mutation checks, LLM acceptance, or unavailable Veritas/governance evidence as explicit skip or `NOT_VERIFIED` entries based on the work's risk class; do not convert the baseline summary into proof that those live lanes ran.

Treat passed-after-rerun as degraded confidence unless explained.
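
The rerun rule can be expressed as a small classifier. The recorded fields (check name, result, rerun count) follow the list above; the `CheckRun` type, the `rerunExplanation` field, and the three confidence labels are hypothetical names chosen for this sketch.

```typescript
type CheckResult = "pass" | "fail" | "skipped";

// Hypothetical record for one provider or CI check.
interface CheckRun {
  name: string;
  result: CheckResult;
  rerunCount: number;
  rerunExplanation?: string; // why a rerun was needed, if known
}

type Confidence = "full" | "degraded" | "none";

// A pass that needed an unexplained rerun counts as degraded confidence.
// Failed and skipped checks contribute no confidence in this sketch.
function checkConfidence(run: CheckRun): Confidence {
  if (run.result !== "pass") {
    return "none";
  }
  if (run.rerunCount > 0 && !run.rerunExplanation) {
    return "degraded";
  }
  return "full";
}

// Lists the checks that should be called out in the CI report.
function checksNeedingAttention(runs: CheckRun[]): string[] {
  return runs
    .filter((run) => checkConfidence(run) !== "full")
    .map((run) => `${run.name}: ${run.result}, reruns=${run.rerunCount}`);
}
```

Skipped checks are reported rather than silently dropped, so they can be recorded as explicit skips or `NOT_VERIFIED` entries.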

### 5. Evidence Tiers

Classify evidence:

- Tier 0: claim only, no artifact.
- Tier 1: local command output.
- Tier 2: automated test tied to acceptance criterion.
- Tier 3: CI-confirmed test on a clean environment.
- Tier 4: runtime/browser/production-like verification with trace or log artifact.
- Tier 5: post-deploy telemetry confirms expected behavior.

Higher-risk work requires stronger tiers.

When an evidence source already has a standard format, keep that format as the native artifact and reference it from `evidence.json`:

- SARIF: static analysis, security, code review, and policy findings.
- OpenTelemetry logs/traces: runtime behavior, tool/model calls, workflow telemetry, and post-deploy events.
- JUnit/TAP: test results.
- Veritas: optional evidence checks, repo standards, and authority settings. Flow Agents records the Veritas reference and verdict but does not own Veritas policy semantics.

Use `context/contracts/governance-adapter-contract.md` before invoking Veritas or any similar governance provider. If the adapter is unavailable, record `NOT_VERIFIED` unless the user explicitly accepts skipping that governance evidence.
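
The tier ladder lends itself to a numeric comparison. The six tiers mirror the list above; the risk class names (`low`, `medium`, `high`) and the minimum tier assigned to each are assumptions for this sketch, since the text says only that higher-risk work requires stronger tiers.

```typescript
// Tier numbers mirror the list above.
const TIER = {
  claimOnly: 0,
  localCommand: 1,
  automatedTest: 2,
  ciConfirmed: 3,
  runtimeVerified: 4,
  postDeployTelemetry: 5,
} as const;

type EvidenceTier = (typeof TIER)[keyof typeof TIER];

// Hypothetical risk classes and thresholds; the source does not define them.
type RiskClass = "low" | "medium" | "high";

const MIN_TIER: Record<RiskClass, EvidenceTier> = {
  low: TIER.localCommand,
  medium: TIER.automatedTest,
  high: TIER.ciConfirmed,
};

// Returns the strongest tier among the evidence items (tier 0 when there are none).
function strongestTier(tiers: EvidenceTier[]): number {
  return Math.max(TIER.claimOnly, ...tiers);
}

// True when the strongest available evidence meets the minimum for the risk class.
function meetsRisk(tiers: EvidenceTier[], risk: RiskClass): boolean {
  return strongestTier(tiers) >= MIN_TIER[risk];
}
```

A criterion that fails `meetsRisk` would be recorded as `NOT_VERIFIED` with the missing tier named as the gap.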
|
|
144
|
+
|
|
145
|
+
### 6. Verdict
|
|
146
|
+
|
|
147
|
+
Produce:
|
|
148
|
+
|
|
149
|
+
- `PASS`: evidence satisfies risk and acceptance criteria.
|
|
150
|
+
- `FAIL`: evidence shows the work is wrong or unsafe.
|
|
151
|
+
- `NOT_VERIFIED`: evidence is missing, indirect, blocked, or inconclusive.
|
|
152
|
+
|
|
153
|
+
For failures, classify:
|
|
154
|
+
|
|
155
|
+
- implementation defect
|
|
156
|
+
- bad plan
|
|
157
|
+
- bad acceptance criteria
|
|
158
|
+
- flaky infrastructure
|
|
159
|
+
- missing environment
|
|
160
|
+
- security concern
|
|
161
|
+
- product ambiguity
|
|
162
|
+
- scope drift
|
|
163
|
+
|
|
164
|
+
Include required next evidence and whether to return to `plan-work`, `execute-plan`, `verify-work`, `remediate-ci`, or human decision.
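
A verdict record along these lines could be checked for completeness before it is handed off. This is a sketch only: the `Verdict` class and its field names are hypothetical, and requiring a return target and next evidence for every non-`PASS` verdict is an assumption about how strictly the rule above is applied.

```python
from dataclasses import dataclass, field
from typing import List, Optional

VERDICTS = {"PASS", "FAIL", "NOT_VERIFIED"}
FAILURE_CLASSES = {
    "implementation defect", "bad plan", "bad acceptance criteria",
    "flaky infrastructure", "missing environment", "security concern",
    "product ambiguity", "scope drift",
}
RETURN_TARGETS = {"plan-work", "execute-plan", "verify-work", "remediate-ci", "human decision"}


@dataclass
class Verdict:
    """Hypothetical verdict record; field names are illustrative."""
    verdict: str
    failure_class: Optional[str] = None
    next_evidence: List[str] = field(default_factory=list)
    return_to: Optional[str] = None

    def problems(self) -> List[str]:
        """List what is missing or inconsistent in this record."""
        found = []
        if self.verdict not in VERDICTS:
            found.append(f"unknown verdict: {self.verdict}")
        if self.verdict == "FAIL" and self.failure_class not in FAILURE_CLASSES:
            found.append("FAIL needs a failure class from the list above")
        if self.verdict != "PASS" and self.return_to not in RETURN_TARGETS:
            found.append("non-PASS verdicts need a return target")
        if self.verdict != "PASS" and not self.next_evidence:
            found.append("non-PASS verdicts need required next evidence")
        return found
```

An empty list from `problems()` means the record names a verdict, a failure class where one applies, and where the work goes next.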

### 7. Publish Change Gate

If the evidence verdict is otherwise `PASS` but the verified diff is not committed, pushed, and represented by a provider change record or an explicit no-provider-change reason, set `next_step` to `publish-change` instead of `release-readiness`.

Use `git` and the active `ChangeProvider` adapter when available to:

- confirm the working tree contains only the verified scope
- commit the verified diff with a clear message
- push the branch
- open or update the provider change record linked to the issue/brief, closing refs, and evidence artifact, or record why no provider change is required
- include or update the provider-facing `Acceptance Evidence` table, upgrading local source refs to immutable GitHub blob permalinks when the commit SHA and repository URL are known
- collect provider check/CI links and statuses, or record why provider checks are unavailable
- keep GitHub PRs as the first `ChangeProvider` adapter example: for GitHub, open or update a PR and collect PR checks

If commit, push, provider change publication, or provider checks are blocked, keep the release path at `NOT_VERIFIED` or `HOLD` until the blocker is resolved or explicitly accepted by the user.

## Gate

Evidence passes only when acceptance criteria, scope integrity, CI/runtime evidence, and residual risk are sufficient for the risk class.

After `PASS`, hand off to `publish-change` when the work is still local, or to `release-readiness` when the verified commit, pushed branch, provider change record or no-provider-change reason, provider checks, closing refs, structured evidence refs, and `Acceptance Evidence` table are available. After `FAIL` or `NOT_VERIFIED`, stop and name the missing work or evidence.
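
The hand-off rule can be sketched as a small decision function. The `PublishState` flags and the `"stop"` return label are hypothetical names chosen for illustration; only `publish-change` and `release-readiness` come from the text.

```python
from dataclasses import dataclass


@dataclass
class PublishState:
    """Hypothetical flags for the publication facts the gate asks about."""
    committed: bool = False
    pushed: bool = False
    provider_record_or_reason: bool = False  # change record, or an explicit no-provider-change reason
    provider_checks_collected: bool = False
    closing_refs: bool = False
    evidence_refs: bool = False
    acceptance_evidence_table: bool = False


def next_step(verdict: str, state: PublishState) -> str:
    """Pick the hand-off target described in the Gate section."""
    if verdict != "PASS":
        # FAIL and NOT_VERIFIED stop here; the caller names the missing work or evidence.
        return "stop"
    published = all(vars(state).values())
    return "release-readiness" if published else "publish-change"
```

Treating every flag as mandatory is the strictest reading of the gate; a real adapter might relax some of them when provider checks are recorded as unavailable.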
@@ -0,0 +1,110 @@
---
name: "execute-plan"
description: "Parallel execution primitive — plan artifact path to implemented code via tool-worker (x4). Reads plan directly. Updates session file between waves."
---

# Execute

Plan artifact in, implemented code out. Fans out to tool-worker subagents in parallel waves.

## Agents

| Agent | Role |
|---|---|
| tool-worker | Implementation per task spec (up to 4 parallel) |

## Orchestrator Rule

You do not write source files. You read the plan artifact, fan out tasks to tool-worker, and update the session file between waves.

## Shared Contracts

Follow:
- `context/contracts/artifact-contract.md`
- `context/contracts/execution-contract.md`
- `context/contracts/planning-contract.md` for the plan artifact and Definition Of Done
- `context/contracts/sandbox-policy.md`

This skill owns orchestration between waves. The contracts own artifact continuity, worker task expectations, conflict handling, validation expectations, and completion rules.

## Input

- **Plan artifact path**: path to the `-plan.md` file in `.flow-agents/<slug>/`
- **Session file path**: the session file to update with progress

## Workflow

1. Read the plan artifact directly
2. Confirm the plan follows `context/contracts/planning-contract.md`, including `## Definition Of Done`. If missing, return to `plan-work` before implementation.
3. Confirm the plan records an appropriate `sandbox_mode` using `context/contracts/sandbox-policy.md`. If missing, infer the smallest safe mode and record it before delegation.
4. Confirm execution traceability before any worker starts:
   - acceptance criteria have stable ids, preferably matching `acceptance.json`
   - every wave/task lists the acceptance ids it supports
   - the session/deliver file copies or links the criteria and includes a `Requirements Trace` or equivalent mapping
   - each worker prompt includes the relevant acceptance ids and required evidence, not only a loose task title
   - if traceability is missing, update the session file and/or send the plan back for refinement before delegation
5. Set session file `status: executing` and use `npm run workflow:sidecar -- advance-state <artifact-dir> --status in_progress --phase execution --summary ... --next-action ...` when the repository provides it
6. **Frontend design check:** If any tasks involve UI, CSS, layouts, components, or visual design, read the `frontend-design` skill and include its aesthetics guidelines in the tool-worker prompts for those tasks
7. Fan out each wave to tool-worker subagents (up to 4 parallel):
   - Delegate to the exact `tool-worker` role for every implementation worker. Do not spawn unnamed/default implementation agents.
   ```
   Each tool-worker gets:
   - Task description from plan
   - Files to create/modify
   - Acceptance criteria
   - Acceptance criterion ids and requirement ids this task supports
   - Required evidence for those criteria
   - Definition Of Done items that this task supports
   - Sandbox mode, approval assumptions, rollback expectations, and escalation stop conditions
   - Context from plan + prior wave results
   - Plan artifact path (so it can read full context directly)
   ```
8. Between waves:
   - Collect results from all tool-worker subagents
   - Check for conflicts before next wave
   - Feed completed wave context forward
   - **Checkpoint**: update session file with completed tasks and next wave
   - Record worker progress with `npm run workflow:sidecar -- record-agent-event --artifact-dir <artifact-dir> --agent-id <worker-id> --kind evidence --status active|done --summary ...`
9. After all waves: set session file `status: executed` and update `state.json` / `handoff.json` with `advance-state`

The orchestrator owns root `state.json` updates. Workers should receive the workflow artifact root explicitly and append agent events under that root instead of inferring the slug or rewriting shared sidecars.
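
The traceability check in step 4 could look roughly like the sketch below. The function name and input shapes are hypothetical, and flagging acceptance ids that no task supports goes slightly beyond what the workflow states, so treat that part as an assumption.

```python
from typing import Dict, List, Set


def traceability_gaps(acceptance_ids: Set[str], tasks: Dict[str, List[str]]) -> List[str]:
    """Return human-readable gaps that should block delegation.

    `tasks` maps a task name to the acceptance ids it claims to support.
    """
    gaps = []
    covered = set()
    for name, ids in sorted(tasks.items()):
        if not ids:
            gaps.append(f"task '{name}' lists no acceptance ids")
        unknown = sorted(set(ids) - acceptance_ids)
        if unknown:
            gaps.append(f"task '{name}' references unknown ids: {', '.join(unknown)}")
        covered.update(ids)
    # Assumption: every criterion should be supported by at least one task.
    for missing in sorted(acceptance_ids - covered):
        gaps.append(f"{missing} is not supported by any task")
    return gaps
```

A non-empty result would mean updating the session file or sending the plan back for refinement before any worker starts.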

## Session File Updates

Between each wave, append to the session file:

```markdown
## Execution Progress

### Wave 1 (completed)
- [x] Task A — done. Supports: AC1, AC2. Evidence: <test/check/artifact>. Modified files: `<path>`.
- [x] Task B — done. Supports: AC3. Evidence: <test/check/artifact>. Modified files: `<path>`.

### Wave 2 (in progress)
- [ ] Task C. Supports: AC4, AC5. Required evidence: <test/check/artifact>.
- [ ] Task D. Supports: AC6. Required evidence: <test/check/artifact>.

## Requirements Trace

- R1 <requirement>. Acceptance: AC1, AC2.
- R2 <requirement>. Acceptance: AC3.

## Modified Files / Scope

- Record changed paths in the session/deliver artifact and worker event summaries after each wave.
- Do not add ad hoc `modified_files` keys to `state.json` unless the sidecar schema explicitly supports them.
- Verification and optional governance providers such as Veritas should consume this scope from the session/evidence artifacts or a dedicated evidence sidecar, not from invalid state fields.
```

This is the recovery point. If context is lost, a new session reads this and knows which waves are done.

## Output

- Implemented code in the working directory
- Session file updated with execution progress and `status: executed`
- Execution progress follows `context/contracts/execution-contract.md`
- Structured state/handoff sidecars advanced when `npm run workflow:sidecar --` is available

If `advance-state` or artifact validation is unavailable or blocked, record that exact blocker in the session file and do not mark execution as cleanly complete.

{context?}
@@ -0,0 +1,137 @@
---
name: "explore"
description: "Parallel codebase exploration — fans out subagents to map structure, entry points, dependencies, patterns, config, and tests in one pass."
---

# Codebase Exploration

Efficiently gather context about repositories by running parallel exploration tasks.

## Harness Limit

Some harnesses cap a single delegation batch at 4 subagents.
- Respect the current harness limit.
- If the limit is unknown, assume 4.
- Never submit more than 4 subagents in one batch.
- Use multiple waves when needed rather than overfilling the first fan-out.
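
The batching rule above amounts to chunking the task list. A minimal sketch, assuming a hypothetical `plan_batches` helper and a default limit of 4 when the harness limit is unknown:

```python
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")
DEFAULT_HARNESS_LIMIT = 4  # assumed when the harness limit is unknown


def plan_batches(tasks: Sequence[T], harness_limit: Optional[int] = None) -> List[List[T]]:
    """Split tasks into consecutive batches no larger than the harness limit."""
    limit = harness_limit if harness_limit is not None else DEFAULT_HARNESS_LIMIT
    if limit < 1:
        raise ValueError("harness limit must be at least 1")
    return [list(tasks[i:i + limit]) for i in range(0, len(tasks), limit)]
```

Seven exploration dimensions with no known limit would therefore run as one batch of four followed by one batch of three.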

## Exploration Strategy

Spawn MULTIPLE subagents IN PARALLEL to investigate different dimensions:

### Wave 1A (parallel, up to 4 subagents)
1. **Structure Scout** - Map directory structure, identify key folders (src, lib, tests, config)
2. **Entry Point Finder** - Locate main files, CLI entry points, API routes, exports
3. **Dependency Analyzer** - Parse package.json, requirements.txt, go.mod, Cargo.toml, pom.xml
4. **Pattern Detective** - Identify architectural patterns, frameworks, coding conventions

### Wave 1B (parallel, after Wave 1A if needed)
5. **Config Inspector** - Find and summarize configuration files, env vars, build configs
6. **Test Mapper** - Locate test files, understand testing strategy and coverage areas
7. **Documentation Auditor** - Cross-reference all documentation against actual file system state:
   - README agent tables vs actual `agents/*.agent-spec.json` files (ghost agents? missing agents?)
   - README skill lists vs actual `skills/*/SKILL.md` files
   - README dependency lists vs `Config` file declarations
   - AGENTS.md shared sections consistency across packages (paths, naming examples, model references)
   - All `.md` and `.json` files: grep for references to agents, skills, or paths that don't exist
   - Agent spec `resources` paths: verify referenced context files exist
   - Agent spec `model` fields: verify they follow conventions (orchestrators=opus, tools=haiku/sonnet)
   - Typos and spelling errors in documentation files
   - Empty directories or dead skill/SOP stubs

### Wave 2 (after Wave 1A/1B — needs dependency list)
8. **Tech Stack Researcher** - Research the identified tech stack using web search tools (`web_search`, `web_fetch`) and `tool-dependencies-updater` (audit-only — do NOT apply updates). Goals:
   - Identify outdated or deprecated dependencies and how significant an upgrade would be (patch vs minor vs major, breaking changes)
   - Discover new features in the current stack that the project could leverage
   - Assess whether any part of the stack is irrelevant, superseded, or approaching EOL
   - Surface project-specific context (migration guides, EOL announcements, known issues)

## Execution Model

```
[User Request]
      |
      v
[Wave 1A: Spawn first 4 dimensions in parallel]
      |
      v
[Wave 1B: Spawn remaining dimensions in parallel if needed]
      |
      v
[Aggregate Wave 1 findings]
      |
      v
[Wave 2: Spawn Tech Stack Researcher with dependency list from Wave 1]
  - tool-dependencies-updater: audit-only scan for outdated packages, version gaps, security advisories
  - web search: research key frameworks/libraries for new features, deprecation, relevance
      |
      v
[Final Synthesis]
```

## Subagent Prompts (use these as templates)

Wave 1A:
- "Explore the directory structure of this repo. List key folders and their purposes. Focus on: [specific area if provided]"
- "Find all entry points in this codebase - main files, CLI commands, API routes, exported modules"
- "Analyze dependencies - what frameworks, libraries, and tools does this project use?"
- "Identify architectural patterns - is this MVC, microservices, monolith? What conventions are used?"

Wave 1B:
- "Find and summarize all configuration files - what can be configured and how?"
- "Map the test structure - where are tests, what testing frameworks, what's the coverage strategy?"
- "Audit all documentation for accuracy: (1) List every agent-spec.json file and cross-reference against README agent tables — flag any agents listed in docs but missing from disk or vice versa. (2) List every skills/*/SKILL.md and cross-reference against README skill lists. (3) Compare Config dependency declarations against README dependency sections. (4) Grep all .md and .json files for references to agent names and verify each referenced agent exists as an agent-spec.json. (5) Check AGENTS.md files across packages for inconsistent paths, naming examples, or model references. (6) Flag empty directories, typos, and dead stubs."
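
At its core, the documentation audit prompt is a pair of set differences between names found in docs and names found on disk. The sketch below shows only that comparison; the function name is hypothetical, and collecting the two sets (parsing README tables, listing files) is left out.

```python
from typing import Dict, List, Set


def audit_references(documented: Set[str], on_disk: Set[str]) -> Dict[str, List[str]]:
    """Compare names mentioned in docs against names found on disk."""
    return {
        "ghost_references": sorted(documented - on_disk),   # in docs, not on disk
        "missing_from_docs": sorted(on_disk - documented),  # on disk, not in docs
    }
```

For example, if the docs mention `tool-planner` and a made-up `tool-ghost` while the disk holds `tool-planner` and `tool-worker`, then `tool-ghost` is reported as a ghost reference and `tool-worker` as missing from the docs.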

Wave 2 (spawn these two in parallel):
- tool-dependencies-updater: "Scan this project for all dependency manifests, check every dependency against the latest available version, run security advisory checks on outdated packages, and report findings grouped by risk level (critical/major/minor). Do NOT apply any updates — audit only."
- web search: "Research the following tech stack: [list key frameworks/libraries from Wave 1]. For each, find: (1) latest stable version and what's new, (2) any deprecation or EOL announcements, (3) notable new features that could benefit this project, (4) whether any component has been superseded by a better alternative. Cite sources."

## Output Format

After all subagents complete, synthesize into:

```
## Codebase Overview
[1-2 sentence summary]

## Key Findings
- **Tech Stack**: [languages, frameworks, tools]
- **Architecture**: [pattern, structure]
- **Entry Points**: [main files, commands]
- **Configuration**: [key config files]
- **Testing**: [strategy, frameworks]

## Tech Stack Health
- **Outdated (Critical)**: [packages with security vulnerabilities]
- **Outdated (Major)**: [packages with major version bumps available — note breaking change risk]
- **Outdated (Minor)**: [packages with minor/patch updates]
- **New Features Available**: [notable new capabilities in current stack]
- **Deprecation/EOL Warnings**: [anything approaching end of life]
- **Upgrade Effort Summary**: [overall assessment — low/medium/high effort to get current]

## Recommended Starting Points
[Files to read first for understanding]

## Potential Concerns
[Any issues, outdated deps, missing tests, etc.]

## Documentation Audit
- **Ghost references**: [agents/skills/paths mentioned in docs but not on disk]
- **Missing from docs**: [agents/skills that exist on disk but aren't documented]
- **Stale content**: [outdated descriptions, wrong dependency lists, inconsistent AGENTS.md sections]
- **Config mismatches**: [README deps vs Config file deps]
- **Path inconsistencies**: [resource paths in agent specs that don't follow conventions]
- **Empty/dead artifacts**: [empty directories, stub files with no content]
- **Typos**: [spelling errors found in documentation]
```

## Key Principles

- ALWAYS run explorations in PARALLEL within the current harness limit - this is the whole point
- Never exceed 4 subagents in one batch unless the harness explicitly allows more
- Wave 2 (Tech Stack Researcher) runs AFTER Wave 1A/1B completes because it needs the dependency list
- tool-dependencies-updater is used in AUDIT-ONLY mode — never apply updates during explore
- Be thorough but efficient - don't read entire files, scan for structure
- Focus on what helps someone GET STARTED quickly
- Flag anything unusual or concerning
- If a specific area is requested, weight exploration toward that area
@@ -0,0 +1,87 @@
---
name: "feedback-loop"
description: "Verify implementation actually works. Visual changes → Playwright; integration changes → commands/tests. Run after completing builds."
---

# Feedback Loop

Verify that what you claim to have built actually works. Don't just say "done" — prove it.

## When to Use

- After implementing changes, before declaring them complete
- When the user asks you to verify or prove your work
- As the final step of any implementation workflow
- When you're uncertain whether your changes actually function correctly

## Workflow

### Step 1: IDENTIFY CHANGES

Determine what was just built:
- Check `git diff` for modified/added files
- Review the active TODO list for context on what was implemented
- Identify the nature of the change: what should be different now?

### Step 2: CLASSIFY

Determine the verification method:

| Change Type | Method | Examples |
|---|---|---|
| **Visual** | Playwright via `tool-playwright` | UI components, pages, styles, layouts, forms, visual regressions |
| **Integration** | Commands, tests, execution | APIs, CLIs, libraries, configs, build scripts, data processing |

If changes span both, run both verification paths.
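
One rough way to automate the classification is a file-suffix heuristic over the changed paths. The suffix list below is an assumption made for illustration, not part of the skill; a component file can carry non-visual logic, so the result is a starting point rather than a decision.

```python
from pathlib import PurePosixPath
from typing import Iterable, Set

# Assumed heuristic: these suffixes are treated as visual changes. Adjust per project.
VISUAL_SUFFIXES = {".css", ".scss", ".html", ".jsx", ".tsx", ".vue", ".svelte"}


def verification_paths(changed_files: Iterable[str]) -> Set[str]:
    """Return which verification paths ("visual", "integration") the change set needs."""
    paths = set()
    for name in changed_files:
        suffix = PurePosixPath(name).suffix.lower()
        paths.add("visual" if suffix in VISUAL_SUFFIXES else "integration")
    return paths
```

A change set touching both a stylesheet and a CLI script yields both paths, which matches the rule that mixed changes run both verifications.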

### Step 3: VERIFY

#### Visual Path (frontend/UI changes)
Delegate to `tool-playwright`:
1. Load the relevant URL (local dev server, preview, etc.)
2. Take an accessibility snapshot to confirm elements exist and are structured correctly
3. Take a screenshot for visual confirmation
4. If interactive — click, type, navigate to exercise the changed behavior
5. Compare against expected state: are the right elements present? Does the layout match intent?

If the dev server isn't running, start it (or tell the user to) before proceeding.

#### Integration Path (non-visual changes)
Use the most direct verification available, in priority order:
1. **Run existing tests** — if tests cover the changed code, run them
2. **Execute the code** — run the CLI command, call the API endpoint, import the module
3. **Check build** — compile/lint to confirm no syntax or type errors
4. **Inspect output** — verify the output matches expected behavior

Always capture actual output as evidence.

### Step 4: REPORT

State clearly:
- **What was verified** — which changes, which method
- **Evidence** — actual output, screenshots, test results, command output
- **Verdict** — ✅ confirmed working, or ❌ found issues with specifics

If verification fails, fix the issue and re-verify. Don't report failure without attempting a fix first.

## Persistence Rule

**Keep trying until the user says stop.** This is the core behavior of the feedback loop.

- If a verification method fails (Playwright won't connect, tests error out, server won't start), **debug and retry**. Don't downgrade to a weaker method or declare "good enough."
- If visual verification is required and Playwright is having issues, fix the Playwright issue. Don't fall back to "well the build passes so it's probably fine."
- If integration tests fail, diagnose why, fix, and re-run. Don't report partial success.
- Cycle: **verify → fail → diagnose → fix → verify again**. Repeat until either:
  1. ✅ All verification methods pass with evidence, OR
  2. 🛑 The user explicitly says to stop or skip a method

Never self-exit the loop. Never decide on the user's behalf that a failure is acceptable. The user breaks the loop, not the agent.
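
The cycle can be sketched as a loop with exactly two exits. The three callbacks are hypothetical stand-ins for running verification, attempting a fix, and checking for a user stop; there is deliberately no attempt cap, since the rule above leaves that decision to the user.

```python
from typing import Callable, Tuple


def feedback_loop(
    verify: Callable[[], bool],
    diagnose_and_fix: Callable[[], None],
    user_says_stop: Callable[[], bool],
) -> Tuple[str, int]:
    """Run verify -> diagnose -> fix until verification passes or the user stops it.

    Returns the exit reason and the number of verification attempts.
    """
    attempts = 0
    while True:
        attempts += 1
        if verify():
            return "passed", attempts
        if user_says_stop():
            return "stopped_by_user", attempts
        diagnose_and_fix()
```

Checking for a user stop only after a failed verification is one possible ordering; an interactive harness might poll for it at other points as well.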

## Key Principles

- **Evidence over assertion.** Show output, not just "it works."
- **Never settle.** If a verification method should work but isn't, that's a bug to fix — not a reason to skip it.
- **Fix before reporting.** If verification reveals a bug you introduced, fix it and re-run.
- **Match the medium.** UI changes need visual proof. Backend changes need execution proof.
- **Be specific.** "Tests pass" is weak. "Ran `npm test` — 14 tests passed, 0 failed, output attached" is strong.
- **Don't skip this.** The whole point is catching the gap between "I wrote the code" and "the code works."
@@ -0,0 +1,133 @@
---
name: "fix-bug"
description: "Bug fix orchestrator — diagnose → plan-work → execute-plan → review-work → verify-work → loop. Diagnosis phase is unique to bugs, then chains the same primitives."
---

# Bug Fix

Diagnose a bug, then chain the same plan → execute → verify loop. The diagnosis phase is what makes this different from deliver.

## Agents

Inherited from primitives + diagnosis:

| Agent | Used by |
|---|---|
| tool-planner | diagnosis + plan-work |
| tool-worker (x4) | execute-plan |
| tool-code-reviewer | review-work |
| tool-security-reviewer | review-work (conditional — security-sensitive changes) |
| tool-verifier | verify-work |
| tool-playwright | diagnosis (reproduce) + verify-work |

## Orchestrator Rule

You never use `read`, `glob`, `grep`, or `code` on source files. All codebase analysis goes through tool-planner. All review goes through review-work. All verification goes through tool-verifier or tool-playwright.

## Input

- **Bug report**: screenshot, error log, user description, or all three
- **Directory**: working directory

## Session File

Filename: `<branch>--fix-bug-<slug>.md`

```markdown
# BUG: <one-liner>

branch: <branch>
worktree: <worktree>
created: <date>
status: diagnosing | planning | fixing | verifying | resolved
type: fix-bug
iteration: 0

## Bug Report

Source: screenshot | error log | user description
<original report, pasted verbatim>

## Diagnosis

Root cause from tool-planner.

## Plan

(populated by plan-work)

## Execution Progress

(populated by execute-plan)

## Verification Report

(populated by verify-work)

## History

- iteration 1: partial — fix applied but regression in sidebar
- iteration 2: pass — bug fixed, no regressions
```

## Workflow

### 1. Create session file

Paste the bug report verbatim. Set `status: diagnosing`.

### 2. Diagnose (unique to bugs)

1. **Reproduce** (if visual) — delegate to tool-playwright to confirm the bug is visible. Screenshot the broken state.
2. **Find root cause** — delegate to tool-planner:
   ```
   Bug: <description>
   Reproduction: <steps or screenshot evidence>
   Directory: <working directory>
   todo_file: <session file path>
   Find the root cause and propose a fix plan.
   ```
3. Read the diagnosis from tool-planner's output
4. Paste into session file `## Diagnosis`
5. Present to user: "Here's what's broken and how I'd fix it. Agree?"
6. On approval → proceed to plan

### 3. Plan (plan-work)

Invoke plan-work with: diagnosis + fix goal, directory, session file path.

### 4. Execute (execute-plan)

Invoke execute-plan with the plan artifact path and session file path.

### 5. Review (review-work)

Invoke `review-work` with the session file path. It must delegate to `tool-code-reviewer`, and to `tool-security-reviewer` when security triggers are present. CRITICAL/HIGH findings block and loop back to Execute unless explicitly accepted.

### 6. Verify (verify-work)

Invoke verify-work with the session file path. tool-verifier must verify:
1. **Bug is fixed** — the specific issue from the report
2. **No regressions** — build passes, existing tests pass, related functionality works

### 7. Route on verdict

- **All PASS** → resolve
- **Any FAIL** → loop
- **Any NOT_VERIFIED** → surface to user
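
The routing rule can be written as a small function over the collected verdicts. The return labels are hypothetical, and two choices here are assumptions the list above does not settle: `FAIL` takes precedence when `FAIL` and `NOT_VERIFIED` appear together, and an empty or unrecognized set of verdicts is surfaced to the user.

```python
from typing import Iterable


def route(verdicts: Iterable[str]) -> str:
    """Map the verification verdicts to the next workflow step."""
    seen = set(verdicts)
    if "FAIL" in seen:
        return "loop"  # assumption: FAIL takes precedence over NOT_VERIFIED
    if seen == {"PASS"}:
        return "resolve"
    # NOT_VERIFIED, no verdicts at all, or anything unrecognized goes to the user.
    return "surface-to-user"
```

Requiring the set to equal `{"PASS"}` keeps a missing or misspelled verdict from being read as success.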

### 8. Loop (on failure)

1. Summarize what failed
2. Increment `iteration`
3. Re-invoke plan-work with: original diagnosis + failure summary → updated fix plan
4. Back to step 4

### 9. Resolve

1. Include verification report verbatim
2. Show before/after evidence (screenshots if visual)
3. `git diff --stat`
4. Set `status: resolved`

{context?}