@kontourai/flow-agents 0.1.1

Files changed (418)
  1. package/.githooks/pre-push +11 -0
  2. package/.github/workflows/ci.yml +210 -0
  3. package/.github/workflows/docs-pages.yml +52 -0
  4. package/.github/workflows/publish-npm.yml +104 -0
  5. package/AGENTS.md +26 -0
  6. package/CHANGELOG.md +66 -0
  7. package/CODE_OF_CONDUCT.md +25 -0
  8. package/CONTEXT.md +300 -0
  9. package/CONTRIBUTING.md +44 -0
  10. package/LICENSE +201 -0
  11. package/README.md +129 -0
  12. package/SECURITY.md +33 -0
  13. package/agent-cards/dev.json +19 -0
  14. package/agents/dev.json +127 -0
  15. package/agents/tool-code-reviewer.json +61 -0
  16. package/agents/tool-dependencies-updater.json +118 -0
  17. package/agents/tool-explore-config.json +92 -0
  18. package/agents/tool-explore-deps.json +92 -0
  19. package/agents/tool-explore-entry.json +92 -0
  20. package/agents/tool-explore-patterns.json +92 -0
  21. package/agents/tool-explore-structure.json +92 -0
  22. package/agents/tool-explore-tests.json +92 -0
  23. package/agents/tool-planner.json +57 -0
  24. package/agents/tool-playwright.json +145 -0
  25. package/agents/tool-security-reviewer.json +56 -0
  26. package/agents/tool-verifier.json +61 -0
  27. package/agents/tool-worker.json +58 -0
  28. package/build/src/cli/console-learning-projection.js +123 -0
  29. package/build/src/cli/docs-preview.js +39 -0
  30. package/build/src/cli/effective-backlog-settings.js +102 -0
  31. package/build/src/cli/export-bookmarks.js +38 -0
  32. package/build/src/cli/fixture-retirement-audit.js +140 -0
  33. package/build/src/cli/flow-kit.js +138 -0
  34. package/build/src/cli/import-bookmarks.js +50 -0
  35. package/build/src/cli/init.js +239 -0
  36. package/build/src/cli/instinct-cli.js +93 -0
  37. package/build/src/cli/promote-workflow-artifact.js +63 -0
  38. package/build/src/cli/publish-change-helper.js +154 -0
  39. package/build/src/cli/pull-work-provider.js +469 -0
  40. package/build/src/cli/runtime-adapter.js +23 -0
  41. package/build/src/cli/telemetry-doctor.js +221 -0
  42. package/build/src/cli/usage-feedback.js +443 -0
  43. package/build/src/cli/validate-hook-influence.js +152 -0
  44. package/build/src/cli/validate-source-tree.js +31 -0
  45. package/build/src/cli/validate-workflow-artifacts.js +486 -0
  46. package/build/src/cli/veritas-governance.js +262 -0
  47. package/build/src/cli/workflow-artifact-cleanup-audit.js +272 -0
  48. package/build/src/cli/workflow-sidecar.js +816 -0
  49. package/build/src/cli.js +89 -0
  50. package/build/src/flow-kit/validate.js +75 -0
  51. package/build/src/lib/args.js +45 -0
  52. package/build/src/lib/fs.js +62 -0
  53. package/build/src/lib/workflow-learning-projection.js +334 -0
  54. package/build/src/runtime-adapters.js +146 -0
  55. package/build/src/tools/build-universal-bundles.js +397 -0
  56. package/build/src/tools/common.js +56 -0
  57. package/build/src/tools/filter-installed-packs.js +132 -0
  58. package/build/src/tools/generate-context-map.js +198 -0
  59. package/build/src/tools/validate-package.js +64 -0
  60. package/build/src/tools/validate-source-tree.js +622 -0
  61. package/console.telemetry.json +176 -0
  62. package/context/base-rules.md +17 -0
  63. package/context/code-review-standards.md +62 -0
  64. package/context/coding-standards.md +42 -0
  65. package/context/common/orchestrators.md +12 -0
  66. package/context/common/subagents.md +28 -0
  67. package/context/contracts/artifact-contract.md +182 -0
  68. package/context/contracts/builder-kit-workflow-state-contract.md +319 -0
  69. package/context/contracts/delivery-contract.md +69 -0
  70. package/context/contracts/execution-contract.md +53 -0
  71. package/context/contracts/governance-adapter-contract.md +67 -0
  72. package/context/contracts/planning-contract.md +85 -0
  73. package/context/contracts/review-contract.md +104 -0
  74. package/context/contracts/sandbox-policy.md +52 -0
  75. package/context/contracts/verification-contract.md +134 -0
  76. package/context/contracts/work-item-contract.md +215 -0
  77. package/context/deferred/demo-mode.md +33 -0
  78. package/context/deferred/languages/go.md +31 -0
  79. package/context/deferred/languages/python.md +31 -0
  80. package/context/deferred/languages/typescript.md +34 -0
  81. package/context/deferred/parallelization.md +35 -0
  82. package/context/deferred/worktree-isolation.md +24 -0
  83. package/context/development-workflow.md +50 -0
  84. package/context/scripts/context-budget/budget-scan.sh +166 -0
  85. package/context/scripts/detect-tools.sh +3 -0
  86. package/context/scripts/discover-agents.sh +28 -0
  87. package/context/scripts/git-status.sh +49 -0
  88. package/context/scripts/hooks/config-protection.js +79 -0
  89. package/context/scripts/hooks/desktop-notify.sh +39 -0
  90. package/context/scripts/hooks/governance-audit.sh +135 -0
  91. package/context/scripts/hooks/lib/audit-transport.sh +40 -0
  92. package/context/scripts/hooks/lib/hook-flags.js +49 -0
  93. package/context/scripts/hooks/lib/patterns.sh +57 -0
  94. package/context/scripts/hooks/lib/resolve-formatter.js +80 -0
  95. package/context/scripts/hooks/post-edit-accumulator.js +66 -0
  96. package/context/scripts/hooks/pre-commit-quality.js +194 -0
  97. package/context/scripts/hooks/quality-gate.js +93 -0
  98. package/context/scripts/hooks/report-only-guard.js +21 -0
  99. package/context/scripts/hooks/run-hook.js +136 -0
  100. package/context/scripts/hooks/stop-format-typecheck.js +141 -0
  101. package/context/scripts/hooks/stop-goal-fit.js +337 -0
  102. package/context/scripts/hooks/workflow-steering.js +250 -0
  103. package/context/scripts/telemetry/console-presets.sh +14 -0
  104. package/context/scripts/telemetry/install-console-config.sh +214 -0
  105. package/context/scripts/telemetry/lib/config.sh +85 -0
  106. package/context/scripts/telemetry/lib/enrich.sh +115 -0
  107. package/context/scripts/telemetry/lib/redact.sh +22 -0
  108. package/context/scripts/telemetry/lib/session.sh +63 -0
  109. package/context/scripts/telemetry/lib/transport.sh +183 -0
  110. package/context/scripts/telemetry/lib/usage.sh +29 -0
  111. package/context/scripts/telemetry/sync-agents.sh +173 -0
  112. package/context/scripts/telemetry/telemetry.conf +23 -0
  113. package/context/scripts/telemetry/telemetry.sh +387 -0
  114. package/context/scripts/validate-package.sh +89 -0
  115. package/context/settings/backlog-provider-settings.json +54 -0
  116. package/context/templates/core/identity.md +26 -0
  117. package/context/templates/core/user.md +15 -0
  118. package/docs/_config.yml +15 -0
  119. package/docs/_layouts/default.html +87 -0
  120. package/docs/adr/0001-flow-agents-consumes-flow.md +77 -0
  121. package/docs/adr/0002-flow-kits-as-extension-unit.md +13 -0
  122. package/docs/adr/0003-flow-agents-coordinates-kits-and-adapters.md +13 -0
  123. package/docs/adr/0004-gates-expect-surface-claims.md +15 -0
  124. package/docs/adr/0005-kubernetes-inspired-resource-contracts.md +48 -0
  125. package/docs/adr/0006-typescript-first-source-policy.md +98 -0
  126. package/docs/agent-system-guidebook.md +391 -0
  127. package/docs/agent-usage-feedback-loop.md +351 -0
  128. package/docs/assets/favicon.svg +13 -0
  129. package/docs/assets/og-image.png +0 -0
  130. package/docs/assets/site.css +774 -0
  131. package/docs/assets/site.js +139 -0
  132. package/docs/configurable-workflow-routing.md +174 -0
  133. package/docs/context-map.md +145 -0
  134. package/docs/developer-architecture.md +145 -0
  135. package/docs/developer-hook-setup.md +61 -0
  136. package/docs/fixture-ownership.md +44 -0
  137. package/docs/flow-kit-repository-contract.md +180 -0
  138. package/docs/index.md +129 -0
  139. package/docs/kontour-resource-contract.md +358 -0
  140. package/docs/migrations.md +64 -0
  141. package/docs/north-star.md +322 -0
  142. package/docs/operating-layers.md +110 -0
  143. package/docs/repository-structure.md +132 -0
  144. package/docs/sandbox-policy.md +56 -0
  145. package/docs/skills-map.md +203 -0
  146. package/docs/standards-register.md +96 -0
  147. package/docs/veritas-integration.md +165 -0
  148. package/docs/work-item-adapters.md +72 -0
  149. package/docs/workflow-artifact-lifecycle.md +141 -0
  150. package/docs/workflow-eval-strategy.md +295 -0
  151. package/docs/workflow-shared-contracts.md +51 -0
  152. package/docs/workflow-usage-guide.md +443 -0
  153. package/evals/ARCHITECTURE.md +143 -0
  154. package/evals/CONVENTIONS.md +58 -0
  155. package/evals/README.md +128 -0
  156. package/evals/acceptance/run.sh +29 -0
  157. package/evals/acceptance/test_claude_harness.sh +242 -0
  158. package/evals/acceptance/test_codex_harness.sh +108 -0
  159. package/evals/acceptance/test_kiro_harness.sh +128 -0
  160. package/evals/cases/dev/404.html +97 -0
  161. package/evals/cases/dev/code-review.yaml +44 -0
  162. package/evals/cases/dev/dashboard.html +300 -0
  163. package/evals/cases/dev/deliver.yaml +66 -0
  164. package/evals/cases/dev/dependency-update.yaml +16 -0
  165. package/evals/cases/dev/explore.yaml +20 -0
  166. package/evals/cases/dev/index.html +370 -0
  167. package/evals/cases/dev/package-lock.json +28 -0
  168. package/evals/cases/dev/package.json +16 -0
  169. package/evals/cases/dev/plan-work.yaml +20 -0
  170. package/evals/cases/dev/promptfooconfig.yaml +666 -0
  171. package/evals/cases/dev/search-first.yaml +20 -0
  172. package/evals/cases/dev/tdd-workflow.yaml +48 -0
  173. package/evals/cases/dev/verify-work.yaml +44 -0
  174. package/evals/cases/dev/workflow.yaml +34 -0
  175. package/evals/ci/run-baseline.sh +283 -0
  176. package/evals/fixtures/backlog-provider-settings/global-default.json +44 -0
  177. package/evals/fixtures/backlog-provider-settings/project-override.json +53 -0
  178. package/evals/fixtures/builder-kit-workflow-state/baseline-freshness-resolution-hint.json +139 -0
  179. package/evals/fixtures/builder-kit-workflow-state/direct-primitive-stop.json +59 -0
  180. package/evals/fixtures/builder-kit-workflow-state/empty-board-route-shape.json +55 -0
  181. package/evals/fixtures/builder-kit-workflow-state/happy-path.json +71 -0
  182. package/evals/fixtures/builder-kit-workflow-state/mid-work-resume.json +80 -0
  183. package/evals/fixtures/builder-kit-workflow-state/missing-prestep-recovery.json +65 -0
  184. package/evals/fixtures/builder-kit-workflow-state/product-build-chaining.json +60 -0
  185. package/evals/fixtures/builder-kit-workflow-state/stale-continuation-requires-new-probe.json +57 -0
  186. package/evals/fixtures/console-learning-projection/artifacts/console-learning-correction/learning.json +50 -0
  187. package/evals/fixtures/console-learning-projection/artifacts/console-learning-open-route/learning.json +41 -0
  188. package/evals/fixtures/flow-kit-repository/invalid-absolute-path/kit.json +8 -0
  189. package/evals/fixtures/flow-kit-repository/invalid-asset-section/flows/review.flow.json +6 -0
  190. package/evals/fixtures/flow-kit-repository/invalid-asset-section/kit.json +11 -0
  191. package/evals/fixtures/flow-kit-repository/invalid-duplicate-flow/flows/review.flow.json +6 -0
  192. package/evals/fixtures/flow-kit-repository/invalid-duplicate-flow/kit.json +9 -0
  193. package/evals/fixtures/flow-kit-repository/invalid-id/flows/review.flow.json +6 -0
  194. package/evals/fixtures/flow-kit-repository/invalid-id/kit.json +8 -0
  195. package/evals/fixtures/flow-kit-repository/invalid-malformed-json/kit.json +8 -0
  196. package/evals/fixtures/flow-kit-repository/invalid-missing-flow/kit.json +8 -0
  197. package/evals/fixtures/flow-kit-repository/invalid-missing-id/flows/review.flow.json +6 -0
  198. package/evals/fixtures/flow-kit-repository/invalid-missing-id/kit.json +7 -0
  199. package/evals/fixtures/flow-kit-repository/invalid-missing-schema-version/flows/review.flow.json +6 -0
  200. package/evals/fixtures/flow-kit-repository/invalid-missing-schema-version/kit.json +7 -0
  201. package/evals/fixtures/flow-kit-repository/invalid-name/flows/review.flow.json +6 -0
  202. package/evals/fixtures/flow-kit-repository/invalid-name/kit.json +8 -0
  203. package/evals/fixtures/flow-kit-repository/invalid-schema-version/flows/review.flow.json +6 -0
  204. package/evals/fixtures/flow-kit-repository/invalid-schema-version/kit.json +8 -0
  205. package/evals/fixtures/flow-kit-repository/invalid-traversal/kit.json +8 -0
  206. package/evals/fixtures/flow-kit-repository/mixed-runtime-kit/adapters/example.json +3 -0
  207. package/evals/fixtures/flow-kit-repository/mixed-runtime-kit/assets/example.txt +1 -0
  208. package/evals/fixtures/flow-kit-repository/mixed-runtime-kit/docs/README.md +3 -0
  209. package/evals/fixtures/flow-kit-repository/mixed-runtime-kit/flows/runtime.flow.json +26 -0
  210. package/evals/fixtures/flow-kit-repository/mixed-runtime-kit/kit-evals/example.json +3 -0
  211. package/evals/fixtures/flow-kit-repository/mixed-runtime-kit/kit-skills/mixed/SKILL.md +3 -0
  212. package/evals/fixtures/flow-kit-repository/mixed-runtime-kit/kit.json +44 -0
  213. package/evals/fixtures/flow-kit-repository/valid-local-kit/docs/README.md +3 -0
  214. package/evals/fixtures/flow-kit-repository/valid-local-kit/flows/review.flow.json +26 -0
  215. package/evals/fixtures/flow-kit-repository/valid-local-kit/kit.json +20 -0
  216. package/evals/fixtures/hook-influence/cases.json +336 -0
  217. package/evals/fixtures/pull-work-provider/github-issues.json +170 -0
  218. package/evals/fixtures/pull-work-wip-shepherding/global-wip-informs.json +43 -0
  219. package/evals/fixtures/pull-work-wip-shepherding/personal-wip-blocks.json +42 -0
  220. package/evals/fixtures/surface-trust/accepted-claim-trust-report.json +31 -0
  221. package/evals/fixtures/surface-trust/artifact-absent.json +19 -0
  222. package/evals/fixtures/surface-trust/integrity-mismatch-trust-report.json +32 -0
  223. package/evals/fixtures/surface-trust/missing-authority-trust-report.json +27 -0
  224. package/evals/fixtures/surface-trust/provider-absent.json +19 -0
  225. package/evals/fixtures/surface-trust/rejected-claim-trust-report.json +30 -0
  226. package/evals/fixtures/surface-trust/stale-claim-trust-snapshot.json +31 -0
  227. package/evals/fixtures/usage-feedback/sample-full.jsonl +11 -0
  228. package/evals/fixtures/usage-feedback/sample-outcomes.jsonl +1 -0
  229. package/evals/fixtures/veritas-governance-adapter/fake-veritas-pass.sh +18 -0
  230. package/evals/fixtures/veritas-governance-adapter/fake-veritas-secret-fail.sh +10 -0
  231. package/evals/fixtures/veritas-governance-adapter/fake-veritas-unconfigured.sh +4 -0
  232. package/evals/integration/test_bundle_install.sh +541 -0
  233. package/evals/integration/test_console_learning_projection.sh +192 -0
  234. package/evals/integration/test_context_map.sh +65 -0
  235. package/evals/integration/test_effective_backlog_settings.sh +58 -0
  236. package/evals/integration/test_fixture_retirement_audit.sh +58 -0
  237. package/evals/integration/test_flow_agents_statusline.sh +93 -0
  238. package/evals/integration/test_flow_kit_repository.sh +90 -0
  239. package/evals/integration/test_goal_fit_hook.sh +482 -0
  240. package/evals/integration/test_hook_category_behaviors.sh +190 -0
  241. package/evals/integration/test_hook_influence_cases.sh +69 -0
  242. package/evals/integration/test_local_flow_kit_install.sh +145 -0
  243. package/evals/integration/test_publish_change_helper.sh +176 -0
  244. package/evals/integration/test_pull_work_provider.sh +140 -0
  245. package/evals/integration/test_runtime_adapter_activation.sh +106 -0
  246. package/evals/integration/test_telemetry.sh +485 -0
  247. package/evals/integration/test_telemetry_doctor.sh +193 -0
  248. package/evals/integration/test_usage_feedback_dashboard.sh +169 -0
  249. package/evals/integration/test_usage_feedback_global.sh +117 -0
  250. package/evals/integration/test_usage_feedback_import.sh +227 -0
  251. package/evals/integration/test_usage_feedback_outcomes.sh +165 -0
  252. package/evals/integration/test_usage_feedback_report.sh +263 -0
  253. package/evals/integration/test_veritas_governance_adapter.sh +235 -0
  254. package/evals/integration/test_workflow_artifact_cleanup_audit.sh +287 -0
  255. package/evals/integration/test_workflow_artifacts.sh +1247 -0
  256. package/evals/integration/test_workflow_sidecar_writer.sh +2112 -0
  257. package/evals/integration/test_workflow_steering_hook.sh +337 -0
  258. package/evals/lib/assertions/delegated-to.js +40 -0
  259. package/evals/lib/assertions/max-tool-calls.js +15 -0
  260. package/evals/lib/assertions/no-write-tools.js +27 -0
  261. package/evals/lib/assertions/pass-at-k.js +39 -0
  262. package/evals/lib/assertions/telemetry-utils.js +105 -0
  263. package/evals/lib/assertions/tool-called.js +39 -0
  264. package/evals/lib/assertions/verify-after-fix.js +61 -0
  265. package/evals/lib/claude-judge.sh +40 -0
  266. package/evals/lib/claude-provider.sh +74 -0
  267. package/evals/lib/codex-judge.sh +39 -0
  268. package/evals/lib/codex-provider.sh +81 -0
  269. package/evals/lib/eval-dev.sh +5 -0
  270. package/evals/lib/eval-judge.sh +22 -0
  271. package/evals/lib/eval-provider.sh +26 -0
  272. package/evals/lib/eval-report.sh +73 -0
  273. package/evals/lib/kiro-dev.sh +4 -0
  274. package/evals/lib/kiro-judge.sh +17 -0
  275. package/evals/lib/kiro-provider.sh +62 -0
  276. package/evals/lib/node.sh +111 -0
  277. package/evals/promptfooconfig.yaml +70 -0
  278. package/evals/run.sh +309 -0
  279. package/evals/static/test_evidence_refs.sh +141 -0
  280. package/evals/static/test_package.sh +407 -0
  281. package/evals/static/test_repo_hooks.sh +68 -0
  282. package/evals/static/test_universal_bundles.sh +274 -0
  283. package/evals/static/test_workflow_skills.sh +1207 -0
  284. package/install.sh +64 -0
  285. package/integrations/veritas/flow-agents.adapter.json +138 -0
  286. package/integrations/veritas/flow-agents.authority-settings.json +26 -0
  287. package/integrations/veritas/flow-agents.repo-standards.json +82 -0
  288. package/kits/builder/flows/build.flow.json +218 -0
  289. package/kits/builder/flows/shape.flow.json +127 -0
  290. package/kits/builder/kit.json +19 -0
  291. package/kits/catalog.json +11 -0
  292. package/package.json +130 -0
  293. package/packaging/README.md +60 -0
  294. package/packaging/manifest.json +173 -0
  295. package/packaging/packs.json +69 -0
  296. package/powers/dependency-checker/POWER.md +20 -0
  297. package/powers/dependency-checker/mcp.json +20 -0
  298. package/powers/playwright/POWER.md +25 -0
  299. package/powers/playwright/mcp.json +12 -0
  300. package/prompts/code-audit.md +123 -0
  301. package/prompts/kcommit.md +88 -0
  302. package/schemas/backlog-provider-settings.schema.json +138 -0
  303. package/schemas/workflow-acceptance.schema.json +216 -0
  304. package/schemas/workflow-critique.schema.json +113 -0
  305. package/schemas/workflow-evidence.schema.json +357 -0
  306. package/schemas/workflow-handoff.schema.json +52 -0
  307. package/schemas/workflow-learning.schema.json +223 -0
  308. package/schemas/workflow-release.schema.json +172 -0
  309. package/schemas/workflow-state.schema.json +80 -0
  310. package/scripts/README.md +111 -0
  311. package/scripts/build-universal-bundles.js +3 -0
  312. package/scripts/check-content-boundary.cjs +99 -0
  313. package/scripts/context-budget/budget-scan.sh +166 -0
  314. package/scripts/detect-tools.sh +3 -0
  315. package/scripts/discover-agents.sh +28 -0
  316. package/scripts/effective-backlog-settings.js +2 -0
  317. package/scripts/filter-installed-packs.js +2 -0
  318. package/scripts/flow-kit.js +2 -0
  319. package/scripts/generate-context-map.js +2 -0
  320. package/scripts/git-status.sh +49 -0
  321. package/scripts/hooks/claude-hook-adapter.js +174 -0
  322. package/scripts/hooks/claude-telemetry-hook.js +115 -0
  323. package/scripts/hooks/codex-hook-adapter.js +176 -0
  324. package/scripts/hooks/codex-telemetry-hook.js +95 -0
  325. package/scripts/hooks/config-protection.js +79 -0
  326. package/scripts/hooks/desktop-notify.sh +39 -0
  327. package/scripts/hooks/governance-audit.sh +135 -0
  328. package/scripts/hooks/lib/audit-transport.sh +40 -0
  329. package/scripts/hooks/lib/hook-flags.js +49 -0
  330. package/scripts/hooks/lib/patterns.sh +57 -0
  331. package/scripts/hooks/lib/resolve-formatter.js +80 -0
  332. package/scripts/hooks/post-edit-accumulator.js +66 -0
  333. package/scripts/hooks/pre-commit-quality.js +194 -0
  334. package/scripts/hooks/quality-gate.js +93 -0
  335. package/scripts/hooks/report-only-guard.js +21 -0
  336. package/scripts/hooks/run-hook.js +136 -0
  337. package/scripts/hooks/stop-format-typecheck.js +141 -0
  338. package/scripts/hooks/stop-goal-fit.js +337 -0
  339. package/scripts/hooks/workflow-steering.js +250 -0
  340. package/scripts/install-codex-home.sh +106 -0
  341. package/scripts/package.json +3 -0
  342. package/scripts/promote-workflow-artifact.js +2 -0
  343. package/scripts/publish-change-helper.js +2 -0
  344. package/scripts/pull-work-provider.js +2 -0
  345. package/scripts/setup-repo-hooks.sh +8 -0
  346. package/scripts/statusline/flow-agents-statusline.js +157 -0
  347. package/scripts/telemetry/console-presets.sh +14 -0
  348. package/scripts/telemetry/install-console-config.sh +214 -0
  349. package/scripts/telemetry/lib/config.sh +85 -0
  350. package/scripts/telemetry/lib/enrich.sh +115 -0
  351. package/scripts/telemetry/lib/redact.sh +22 -0
  352. package/scripts/telemetry/lib/session.sh +63 -0
  353. package/scripts/telemetry/lib/transport.sh +183 -0
  354. package/scripts/telemetry/lib/usage.sh +29 -0
  355. package/scripts/telemetry/sync-agents.sh +173 -0
  356. package/scripts/telemetry/telemetry.conf +23 -0
  357. package/scripts/telemetry/telemetry.sh +387 -0
  358. package/scripts/usage-feedback.js +2 -0
  359. package/scripts/validate-hook-influence-cases.js +2 -0
  360. package/scripts/validate-package.sh +89 -0
  361. package/scripts/validate-source-tree.js +9 -0
  362. package/skills/agentic-engineering/SKILL.md +62 -0
  363. package/skills/browser-test/SKILL.md +51 -0
  364. package/skills/builder-shape/SKILL.md +76 -0
  365. package/skills/context-budget/SKILL.md +40 -0
  366. package/skills/deliver/SKILL.md +241 -0
  367. package/skills/dependency-update/SKILL.md +68 -0
  368. package/skills/design-probe/SKILL.md +107 -0
  369. package/skills/eval-rebuild/SKILL.md +39 -0
  370. package/skills/evidence-gate/SKILL.md +186 -0
  371. package/skills/execute-plan/SKILL.md +110 -0
  372. package/skills/explore/SKILL.md +137 -0
  373. package/skills/feedback-loop/SKILL.md +87 -0
  374. package/skills/fix-bug/SKILL.md +133 -0
  375. package/skills/frontend-design/SKILL.md +80 -0
  376. package/skills/github-cli/SKILL.md +63 -0
  377. package/skills/idea-to-backlog/SKILL.md +267 -0
  378. package/skills/knowledge-capture/SKILL.md +55 -0
  379. package/skills/learning-review/SKILL.md +115 -0
  380. package/skills/pickup-probe/SKILL.md +114 -0
  381. package/skills/plan-work/SKILL.md +176 -0
  382. package/skills/pull-work/SKILL.md +309 -0
  383. package/skills/release-readiness/SKILL.md +121 -0
  384. package/skills/review-work/SKILL.md +161 -0
  385. package/skills/search-first/SKILL.md +66 -0
  386. package/skills/tdd-workflow/SKILL.md +140 -0
  387. package/skills/verify-work/SKILL.md +109 -0
  388. package/src/cli/console-learning-projection.ts +140 -0
  389. package/src/cli/effective-backlog-settings.ts +99 -0
  390. package/src/cli/fixture-retirement-audit.ts +154 -0
  391. package/src/cli/flow-kit.ts +139 -0
  392. package/src/cli/init.ts +248 -0
  393. package/src/cli/promote-workflow-artifact.ts +64 -0
  394. package/src/cli/publish-change-helper.ts +143 -0
  395. package/src/cli/pull-work-provider.ts +481 -0
  396. package/src/cli/runtime-adapter.ts +24 -0
  397. package/src/cli/telemetry-doctor.ts +243 -0
  398. package/src/cli/usage-feedback.ts +418 -0
  399. package/src/cli/validate-hook-influence.ts +119 -0
  400. package/src/cli/validate-source-tree.ts +30 -0
  401. package/src/cli/validate-workflow-artifacts.ts +411 -0
  402. package/src/cli/veritas-governance.ts +322 -0
  403. package/src/cli/workflow-artifact-cleanup-audit.ts +281 -0
  404. package/src/cli/workflow-sidecar.ts +676 -0
  405. package/src/cli.ts +95 -0
  406. package/src/flow-kit/validate.ts +74 -0
  407. package/src/lib/args.ts +43 -0
  408. package/src/lib/fs.ts +62 -0
  409. package/src/lib/workflow-learning-projection.ts +491 -0
  410. package/src/runtime-adapters.ts +154 -0
  411. package/src/tools/build-universal-bundles.ts +366 -0
  412. package/src/tools/common.ts +61 -0
  413. package/src/tools/filter-installed-packs.ts +129 -0
  414. package/src/tools/generate-context-map.ts +199 -0
  415. package/src/tools/validate-package.ts +57 -0
  416. package/src/tools/validate-source-tree.ts +488 -0
  417. package/tsconfig.json +19 -0
  418. package/veritas.claims.json +6 -0
@@ -0,0 +1,186 @@
+ ---
+ name: "evidence-gate"
+ description: "Evaluate whether completed work is trustworthy enough for human review, merge, or release. Use after implementation, verify-work, provider checks, CI, or remediation to map acceptance criteria to evidence, inspect scope integrity, classify failures, assess check health, and produce a confidence report."
+ ---
+
+ # Evidence Gate
+
+ Build confidence with falsifiable evidence, not process completion.
+
+ Evidence Gate is not Release Readiness. It asks whether completed work has enough trustworthy evidence, scope integrity, and provider/runtime signal to publish the change, continue fixing, or ask for a human decision. Release Readiness comes later and decides whether a published branch/provider change should merge, release, deploy, hold, or roll back.
+
+ ## Contract
+
+ - Review evidence after implementation and verification.
+ - Do not fix code.
+ - Do not mark unverified work as passing.
+ - Treat `NOT_VERIFIED` as a first-class outcome.
+ - Separate evidence provenance: human-authored, agent-authored, CI-generated, runtime-observed.
+ - Do not approve release readiness.
+ - After a clean local evidence verdict, require a publish-change gate before `release-readiness`: verified diff committed, branch pushed, provider change opened or updated by the active `ChangeProvider` or an explicit no-provider-change reason recorded, closing refs recorded, provider checks known, and evidence refs linked.
+ - Provider-facing summaries, PR/change descriptions, issue comments, closure comments, and final acceptance comments that claim implementation behavior must include an `Acceptance Evidence` table with columns `AC id`, `Status`, `Command/Test Evidence`, `Source Evidence / Permalinks`, and `Gaps`.
+
+ ## Inputs
+
+ - Work brief or selected GitHub issue.
+ - Execution plan.
+ - Verification report.
+ - Provider change / branch / check run links when available.
+ - Changed-file summary.
+ - Active TODOs, issue links, and release/rollback notes.
+
+ ## Artifact Contract
+
+ Write or update `.flow-agents/<slug>/<slug>--evidence-gate.md` with:
+
+ - `intent`: issue/brief, acceptance criteria, non-goals, risk class
+ - `evidence_manifest`: command/check name, source, timestamp, result, link/output pointer
+ - `test_map`: acceptance criterion to evidence tier and gaps
+ - `integrity_report`: scope drift, weakened tests/config, sensitive files
+ - `ci_report`: checks, reruns, flakes, failures, skipped checks
+ - `risk_assessment`: residual risks and required human review
+ - `verdict`: PASS, FAIL, or NOT_VERIFIED
+ - `next_step`: publish-change, release-readiness, verify-work, execute-plan, plan-work, CI remediation, or human decision
+
+ Also write or update structured sidecars:
+
+ - `state.json`: phase `evidence`, current status, and required next action
+ - `acceptance.json`: final criterion statuses and goal-fit status
+ - `evidence.json`: normalized checks, `standard_refs`, external evidence refs, not-verified gaps, and verdict
+ - `handoff.json`: next step and blockers when verdict is not a clean pass
+
+ Prefer `npm run workflow:sidecar --` for sidecar updates when available, then validate the artifact directory before reporting a clean pass.
+
+ ## Workflow
+
+ ### 1. Anchor To Intent
+
+ Restate:
+
+ - original problem
+ - acceptance criteria
+ - non-goals
+ - expected risk class
+ - authoritative artifacts
+
+ If acceptance criteria changed after implementation began, flag scope drift unless the decision is documented.
+
+ ### 2. Build Test Map
+
+ For each acceptance criterion, map evidence to one of:
+
+ - existing automated test
+ - new or modified automated test
+ - browser/runtime check
+ - static analysis
+ - CI check
+ - manual/human verification
+ - `NOT_VERIFIED` with rationale
+
+ Block clean pass if high-risk criteria have only indirect evidence.
+ Every acceptance criterion must map to evidence or `NOT_VERIFIED`.
+ For implementation-behavior claims, each criterion must map to both command/test proof and structured source evidence refs. Source refs require `kind: "source"`, `file`, `line_start`, `line_end`, and `excerpt`; include immutable GitHub blob permalinks pinned to a commit SHA in `url` when a pushed commit/provider URL exists. Local file/line refs are acceptable only as pre-publish fallback evidence.
+
+ Use this table shape in evidence-gate summaries and provider/closure comments:
+
+ | AC id | Status | Command/Test Evidence | Source Evidence / Permalinks | Gaps |
+ | --- | --- | --- | --- | --- |
+
+ Rows must preserve the original AC ids. If source evidence is missing for a behavior claim, the row must say `NOT_VERIFIED` or name an accepted gap; do not issue a clean pass from prose-only claims.
+
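The source-ref fields and table shape described in step 2 can be sketched as follows. This is a minimal illustration, not the package's implementation: the `SourceRef` field names come from the skill text, while `AcceptanceRow`, `effectiveStatus`, `formatRef`, `renderTable`, and the `PASS`/`FAIL`/`NOT_VERIFIED` values for the `Status` column are assumptions made for the example.

```typescript
// Field names follow the skill text; the type itself is hypothetical.
interface SourceRef {
  kind: "source";
  file: string;
  line_start: number;
  line_end: number;
  excerpt: string;
  url?: string; // immutable permalink pinned to a commit SHA, once one exists
}

// Assumed status values for the table's Status column.
type RowStatus = "PASS" | "FAIL" | "NOT_VERIFIED";

// Hypothetical row model for the Acceptance Evidence table.
interface AcceptanceRow {
  acId: string; // original AC id, preserved verbatim
  status: RowStatus;
  commandEvidence: string[];
  sourceRefs: SourceRef[];
  gaps: string[];
}

// A claimed PASS without both kinds of proof and without a named gap
// is downgraded to NOT_VERIFIED instead of being reported as clean.
function effectiveStatus(row: AcceptanceRow): RowStatus {
  const missingProof =
    row.commandEvidence.length === 0 || row.sourceRefs.length === 0;
  if (row.status === "PASS" && missingProof && row.gaps.length === 0) {
    return "NOT_VERIFIED";
  }
  return row.status;
}

// Local file/line refs are flagged as pre-publish fallback evidence.
function formatRef(ref: SourceRef): string {
  const span = `${ref.file}:${ref.line_start}-${ref.line_end}`;
  return ref.url ? `[${span}](${ref.url})` : `${span} (local, pre-publish)`;
}

function renderTable(rows: AcceptanceRow[]): string {
  const header = [
    "| AC id | Status | Command/Test Evidence | Source Evidence / Permalinks | Gaps |",
    "| --- | --- | --- | --- | --- |",
  ];
  const body = rows.map((row) => {
    const cells = [
      row.acId,
      effectiveStatus(row),
      row.commandEvidence.join("; ") || "none",
      row.sourceRefs.map(formatRef).join("; ") || "none",
      row.gaps.join("; ") || "none",
    ];
    return `| ${cells.join(" | ")} |`;
  });
  return header.concat(body).join("\n");
}
```

Computing the status at render time, instead of trusting the stored value, keeps a prose-only claim from reaching a provider comment as a clean pass.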
+ ### 3. Scope And Integrity Check
+
+ Check for process gaming or accidental drift:
+
+ - scope expanded beyond issue/brief
+ - acceptance criteria changed after implementation
+ - tests removed or weakened
+ - verification config altered
+ - CI config altered
+ - required CI bypassed
+ - sensitive files touched without review
+
+ Sensitive areas include auth, security middleware, data migrations, CI config, deployment scripts, feature flags, test helpers, lint/type config, payment, crypto, and filesystem/network operations.
+
+ ### 4. CI And Flake Assessment
+
+ Use `github-cli` / `gh` when available.
+
+ Record:
+
+ - check names
+ - pass/fail/skipped
+ - rerun count
+ - flake suspicion
+ - logs or artifact links
+ - failure class
+ - standard evidence refs when CI emits SARIF, JUnit, TAP, OpenTelemetry, Veritas, or another native proof format
+
+ For Flow Agents source changes, prefer the GitHub Actions `Flow Agents CI / Builder Kit Baseline` provider check when present. Its local equivalent is `bash evals/ci/run-baseline.sh`, which writes `evals/results/ci-baseline/summary.md` and command logs. Treat skipped live GitHub mutation checks, LLM acceptance, or unavailable Veritas/governance evidence as explicit skip or `NOT_VERIFIED` entries based on the work's risk class; do not convert the baseline summary into proof that those live lanes ran.
+
+ Treat passed-after-rerun as degraded confidence unless explained.
+
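The passed-after-rerun rule can be made concrete with a small sketch. The `CheckRecord` shape and the `normal`/`degraded`/`none` confidence labels are hypothetical names chosen for illustration; the skill text only requires that an unexplained rerun lowers confidence.

```typescript
type CheckResult = "pass" | "fail" | "skipped";

// Hypothetical record for one CI check, using the fields step 4 asks to record.
interface CheckRecord {
  name: string;
  result: CheckResult;
  rerunCount: number;
  rerunExplanation?: string; // why a rerun was needed, if known
}

// Assumed confidence labels; the skill does not name them.
type Confidence = "normal" | "degraded" | "none";

function checkConfidence(check: CheckRecord): Confidence {
  // A failed or skipped check contributes no positive signal.
  if (check.result !== "pass") return "none";
  // Passed only after a rerun, with no explanation: degraded confidence.
  if (check.rerunCount > 0 && !check.rerunExplanation) return "degraded";
  return "normal";
}
```

Keeping `skipped` distinct from `pass` matters here: a skipped live lane should surface as an explicit skip or `NOT_VERIFIED` entry, never as evidence that the lane ran.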
123
+ ### 5. Evidence Tiers
124
+
125
+ Classify evidence:
126
+
127
+ - Tier 0: claim only, no artifact.
128
+ - Tier 1: local command output.
129
+ - Tier 2: automated test tied to acceptance criterion.
130
+ - Tier 3: CI-confirmed test on a clean environment.
131
+ - Tier 4: runtime/browser/production-like verification with trace or log artifact.
132
+ - Tier 5: post-deploy telemetry confirms expected behavior.
133
+
134
+ Higher-risk work requires stronger tiers.
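The tier rule above can be read as a minimum-tier check per acceptance criterion. The sketch below is one way to express that, not the package's implementation: the `RiskClass` names, the `MIN_TIER` thresholds, and the `underEvidenced` helper are all hypothetical, since the skill text only says that higher-risk work needs stronger tiers.

```typescript
// Evidence tiers as listed in the skill: 0 = claim only ... 5 = post-deploy telemetry.
type EvidenceTier = 0 | 1 | 2 | 3 | 4 | 5;

// Hypothetical risk classes and thresholds; the skill does not define this mapping.
type RiskClass = "low" | "medium" | "high";

const MIN_TIER: Record<RiskClass, EvidenceTier> = {
  low: 1,
  medium: 3,
  high: 4,
};

interface EvidenceItem {
  acId: string; // original acceptance criterion id, e.g. "AC1"
  tier: EvidenceTier; // strongest tier backing this criterion
}

// Return the AC ids whose strongest evidence falls below the tier required for the risk class.
function underEvidenced(items: EvidenceItem[], risk: RiskClass): string[] {
  const required = MIN_TIER[risk];
  return items.filter((item) => item.tier < required).map((item) => item.acId);
}

const tierGaps = underEvidenced(
  [
    { acId: "AC1", tier: 3 },
    { acId: "AC2", tier: 1 },
  ],
  "medium",
);
// tierGaps is ["AC2"]: AC2 has only local command output (Tier 1), below the assumed Tier 3 floor.
```

Under these assumed thresholds, any id returned here would be a candidate for a `NOT_VERIFIED` row rather than a clean pass.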
135
+
136
+ When an evidence source already has a standard format, keep that format as the native artifact and reference it from `evidence.json`:
137
+
138
+ - SARIF: static analysis, security, code review, and policy findings.
139
+ - OpenTelemetry logs/traces: runtime behavior, tool/model calls, workflow telemetry, and post-deploy events.
140
+ - JUnit/TAP: test results.
141
+ - Veritas: optional evidence checks, repo standards, and authority settings. Flow Agents records the Veritas reference and verdict but does not own Veritas policy semantics.
142
+
143
+ Use `context/contracts/governance-adapter-contract.md` before invoking Veritas or any similar governance provider. If the adapter is unavailable, record `NOT_VERIFIED` unless the user explicitly accepts skipping that governance evidence.
144
+
145
+ ### 6. Verdict
146
+
147
+ Produce:
148
+
149
+ - `PASS`: evidence satisfies risk and acceptance criteria.
150
+ - `FAIL`: evidence shows the work is wrong or unsafe.
151
+ - `NOT_VERIFIED`: evidence is missing, indirect, blocked, or inconclusive.
152
+
153
+ For failures, classify:
154
+
155
+ - implementation defect
156
+ - bad plan
157
+ - bad acceptance criteria
158
+ - flaky infrastructure
159
+ - missing environment
160
+ - security concern
161
+ - product ambiguity
162
+ - scope drift
163
+
164
+ Include required next evidence and whether to return to `plan-work`, `execute-plan`, `verify-work`, `remediate-ci`, or human decision.
165
+
166
+ ### 7. Publish Change Gate
167
+
168
+ If the evidence verdict is otherwise `PASS` but the verified diff is not committed, pushed, and represented by a provider change record or an explicit no-provider-change reason, set `next_step` to `publish-change` instead of `release-readiness`.
169
+
170
+ Use `git` and the active `ChangeProvider` adapter when available to:
171
+
172
+ - confirm the working tree contains only the verified scope
173
+ - commit the verified diff with a clear message
174
+ - push the branch
175
+ - open or update the provider change record linked to the issue/brief, closing refs, and evidence artifact, or record why no provider change is required
176
+ - include or update the provider-facing `Acceptance Evidence` table, upgrading local source refs to immutable GitHub blob permalinks when the commit SHA and repository URL are known
177
+ - collect provider check/CI links and statuses, or record why provider checks are unavailable
178
+ - keep GitHub PRs as the first `ChangeProvider` adapter example: for GitHub, open or update a PR and collect PR checks
179
+
180
+ If commit, push, provider change publication, or provider checks are blocked, keep the release path at `NOT_VERIFIED` or `HOLD` until the blocker is resolved or explicitly accepted by the user.
181
+
182
+ ## Gate
183
+
184
+ Evidence passes only when acceptance criteria, scope integrity, CI/runtime evidence, and residual risk are sufficient for the risk class.
185
+
186
+ After `PASS`, hand off to `publish-change` when the work is still local, or to `release-readiness` when the verified commit, pushed branch, provider change record or no-provider-change reason, provider checks, closing refs, structured evidence refs, and `Acceptance Evidence` table are available. After `FAIL` or `NOT_VERIFIED`, stop and name the missing work or evidence.
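The handoff rule in this gate can be sketched as a small routing function. This is a simplified illustration under stated assumptions: `PublishState` collapses the full list of publication requirements (commit, push, provider change record or no-provider-change reason, provider checks, closing refs, evidence refs, `Acceptance Evidence` table) into three hypothetical flags, and the `"stop"` value is an invented stand-in for "stop and name the missing work or evidence".

```typescript
type EvidenceVerdict = "PASS" | "FAIL" | "NOT_VERIFIED";

// Hypothetical summary of publication state; the field names are illustrative only.
interface PublishState {
  committed: boolean;
  pushed: boolean;
  // True when a provider change record exists or a no-provider-change reason is recorded.
  changeRecorded: boolean;
}

type GateNextStep = "publish-change" | "release-readiness" | "stop";

// Route the gate outcome: only a PASS on fully published work moves to release-readiness.
function nextGateStep(verdict: EvidenceVerdict, state: PublishState): GateNextStep {
  if (verdict !== "PASS") {
    return "stop"; // FAIL or NOT_VERIFIED: name the missing work or evidence instead
  }
  const published = state.committed && state.pushed && state.changeRecorded;
  return published ? "release-readiness" : "publish-change";
}

const localOnly = nextGateStep("PASS", { committed: true, pushed: false, changeRecorded: false });
// localOnly is "publish-change" because the verified diff has not been pushed yet.
```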
@@ -0,0 +1,110 @@
1
+ ---
2
+ name: "execute-plan"
3
+ description: "Parallel execution primitive — plan artifact path to implemented code via tool-worker (x4). Reads plan directly. Updates session file between waves."
4
+ ---
5
+
6
+ # Execute
7
+
8
+ Plan artifact in, implemented code out. Fans out to tool-worker subagents in parallel waves.
9
+
10
+ ## Agents
11
+
12
+ | Agent | Role |
13
+ |---|---|
14
+ | tool-worker | Implementation per task spec (up to 4 parallel) |
15
+
16
+ ## Orchestrator Rule
17
+
18
+ You do not write source files. You read the plan artifact, fan out tasks to tool-worker, and update the session file between waves.
19
+
20
+ ## Shared Contracts
21
+
22
+ Follow:
23
+ - `context/contracts/artifact-contract.md`
24
+ - `context/contracts/execution-contract.md`
25
+ - `context/contracts/planning-contract.md` for the plan artifact and Definition Of Done
26
+ - `context/contracts/sandbox-policy.md`
27
+
28
+ This skill owns orchestration between waves. The contracts own artifact continuity, worker task expectations, conflict handling, validation expectations, and completion rules.
29
+
30
+ ## Input
31
+
32
+ - **Plan artifact path**: path to the `-plan.md` file in `.flow-agents/<slug>/`
33
+ - **Session file path**: the session file to update with progress
34
+
35
+ ## Workflow
36
+
37
+ 1. Read the plan artifact directly
38
+ 2. Confirm the plan follows `context/contracts/planning-contract.md`, including `## Definition Of Done`. If missing, return to `plan-work` before implementation.
39
+ 3. Confirm the plan records an appropriate `sandbox_mode` using `context/contracts/sandbox-policy.md`. If missing, infer the smallest safe mode and record it before delegation.
40
+ 4. Confirm execution traceability before any worker starts:
41
+ - acceptance criteria have stable ids, preferably matching `acceptance.json`
42
+ - every wave/task lists the acceptance ids it supports
43
+ - the session/deliver file copies or links the criteria and includes a `Requirements Trace` or equivalent mapping
44
+ - each worker prompt includes the relevant acceptance ids and required evidence, not only a loose task title
45
+ - if traceability is missing, update the session file and/or send the plan back for refinement before delegation
46
+ 5. Set session file `status: executing` and use `npm run workflow:sidecar -- advance-state <artifact-dir> --status in_progress --phase execution --summary ... --next-action ...` when the repository provides it
47
+ 6. **Frontend design check:** If any tasks involve UI, CSS, layouts, components, or visual design, read the `frontend-design` skill and include its aesthetics guidelines in the tool-worker prompts for those tasks
48
+ 7. Fan out each wave to tool-worker subagents (up to 4 parallel):
49
+ - Delegate to the exact `tool-worker` role for every implementation worker. Do not spawn unnamed/default implementation agents.
50
+ ```
51
+ Each tool-worker gets:
52
+ - Task description from plan
53
+ - Files to create/modify
54
+ - Acceptance criteria
55
+ - Acceptance criterion ids and requirement ids this task supports
56
+ - Required evidence for those criteria
57
+ - Definition Of Done items that this task supports
58
+ - Sandbox mode, approval assumptions, rollback expectations, and escalation stop conditions
59
+ - Context from plan + prior wave results
60
+ - Plan artifact path (so it can read full context directly)
61
+ ```
62
+ 8. Between waves:
63
+ - Collect results from all tool-worker subagents
64
+ - Check for conflicts before next wave
65
+ - Feed completed wave context forward
66
+ - **Checkpoint**: update session file with completed tasks and next wave
67
+ - Record worker progress with `npm run workflow:sidecar -- record-agent-event --artifact-dir <artifact-dir> --agent-id <worker-id> --kind evidence --status active|done --summary ...`
68
+ 9. After all waves: set session file `status: executed` and update `state.json` / `handoff.json` with `advance-state`
69
+
70
+ The orchestrator owns root `state.json` updates. Workers should receive the workflow artifact root explicitly and append agent events under that root instead of inferring the slug or rewriting shared sidecars.
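The traceability check in workflow step 4 could be approximated before delegation with a sketch like the following. The `PlanTask` shape and `checkTraceability` helper are hypothetical and do not reflect the plan artifact's real schema. The "every criterion is supported by at least one task" check is an added assumption; the skill itself only requires that every wave/task lists the acceptance ids it supports.

```typescript
// Hypothetical in-memory view of a plan task; not the plan artifact's actual format.
interface PlanTask {
  title: string;
  wave: number;
  supports: string[]; // acceptance criterion ids this task supports, e.g. ["AC1", "AC2"]
}

interface TraceReport {
  tasksWithoutIds: string[]; // task titles that list no acceptance ids
  uncoveredCriteria: string[]; // acceptance ids no task claims to support (assumed check)
}

function checkTraceability(criteria: string[], tasks: PlanTask[]): TraceReport {
  return {
    tasksWithoutIds: tasks
      .filter((task) => task.supports.length === 0)
      .map((task) => task.title),
    uncoveredCriteria: criteria.filter(
      (id) => !tasks.some((task) => task.supports.indexOf(id) !== -1),
    ),
  };
}

const traceReport = checkTraceability(
  ["AC1", "AC2", "AC3"],
  [
    { title: "Task A", wave: 1, supports: ["AC1", "AC2"] },
    { title: "Task B", wave: 1, supports: [] },
  ],
);
// traceReport.tasksWithoutIds is ["Task B"]; traceReport.uncoveredCriteria is ["AC3"].
```

A non-empty report would be a reason to update the session file or send the plan back for refinement before any worker starts.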
71
+
72
+ ## Session File Updates
73
+
74
+ Between each wave, append to the session file:
75
+
76
+ ```markdown
77
+ ## Execution Progress
78
+
79
+ ### Wave 1 (completed)
80
+ - [x] Task A — done. Supports: AC1, AC2. Evidence: <test/check/artifact>. Modified files: `<path>`.
81
+ - [x] Task B — done. Supports: AC3. Evidence: <test/check/artifact>. Modified files: `<path>`.
82
+
83
+ ### Wave 2 (in progress)
84
+ - [ ] Task C. Supports: AC4, AC5. Required evidence: <test/check/artifact>.
85
+ - [ ] Task D. Supports: AC6. Required evidence: <test/check/artifact>.
86
+
87
+ ## Requirements Trace
88
+
89
+ - R1 <requirement>. Acceptance: AC1, AC2.
90
+ - R2 <requirement>. Acceptance: AC3.
91
+
92
+ ## Modified Files / Scope
93
+
94
+ - Record changed paths in the session/deliver artifact and worker event summaries after each wave.
95
+ - Do not add ad hoc `modified_files` keys to `state.json` unless the sidecar schema explicitly supports them.
96
+ - Verification and optional governance providers such as Veritas should consume this scope from the session/evidence artifacts or a dedicated evidence sidecar, not from invalid state fields.
97
+ ```
98
+
99
+ This is the recovery point. If context is lost, a new session reads this and knows which waves are done.
100
+
101
+ ## Output
102
+
103
+ - Implemented code in the working directory
104
+ - Session file updated with execution progress and `status: executed`
105
+ - Execution progress follows `context/contracts/execution-contract.md`
106
+ - Structured state/handoff sidecars advanced when `npm run workflow:sidecar --` is available
107
+
108
+ If `advance-state` or artifact validation is unavailable or blocked, record that exact blocker in the session file and do not mark execution as cleanly complete.
109
+
110
+ {context?}
@@ -0,0 +1,137 @@
1
+ ---
2
+ name: "explore"
3
+ description: "Parallel codebase exploration — fans out subagents to map structure, entry points, dependencies, patterns, config, and tests in one pass."
4
+ ---
5
+
6
+ # Codebase Exploration
7
+
8
+ Efficiently gather context about repositories by running parallel exploration tasks.
9
+
10
+ ## Harness Limit
11
+
12
+ Some harnesses cap a single delegation batch at 4 subagents.
13
+ - Respect the current harness limit.
14
+ - If the limit is unknown, assume 4.
15
+ - Never submit more than 4 subagents in one batch.
16
+ - Use multiple waves when needed rather than overfilling the first fan-out.
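The batching rule above amounts to chunking the exploration dimensions into waves of at most four. A minimal sketch, assuming a hypothetical `toBatches` helper and treating an unknown harness limit as 4:

```typescript
const MAX_BATCH = 4; // the skill's stated ceiling for one delegation batch

// Split work items into waves no larger than the harness limit (capped at MAX_BATCH).
function toBatches<T>(items: T[], harnessLimit?: number): T[][] {
  const requested = Math.floor(harnessLimit ?? MAX_BATCH); // unknown limit: assume 4
  const limit = Math.max(1, Math.min(requested, MAX_BATCH));
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += limit) {
    batches.push(items.slice(i, i + limit));
  }
  return batches;
}

// Illustrative labels for the seven Wave 1 dimensions.
const dimensions = ["structure", "entry", "deps", "patterns", "config", "tests", "docs"];
const exploreWaves = toBatches(dimensions);
// exploreWaves has two batches: the first holds 4 items, the second holds 3.
```

This sketch always caps at 4; a harness that explicitly allows more would need the cap raised deliberately.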
17
+
18
+ ## Exploration Strategy
19
+
20
+ Spawn MULTIPLE subagents IN PARALLEL to investigate different dimensions:
21
+
22
+ ### Wave 1A (parallel, up to 4 subagents)
23
+ 1. **Structure Scout** - Map directory structure, identify key folders (src, lib, tests, config)
24
+ 2. **Entry Point Finder** - Locate main files, CLI entry points, API routes, exports
25
+ 3. **Dependency Analyzer** - Parse package.json, requirements.txt, go.mod, Cargo.toml, pom.xml
26
+ 4. **Pattern Detective** - Identify architectural patterns, frameworks, coding conventions
27
+
28
+ ### Wave 1B (parallel, after Wave 1A if needed)
29
+ 5. **Config Inspector** - Find and summarize configuration files, env vars, build configs
30
+ 6. **Test Mapper** - Locate test files, understand testing strategy and coverage areas
31
+ 7. **Documentation Auditor** - Cross-reference all documentation against actual file system state:
32
+ - README agent tables vs actual `agents/*.agent-spec.json` files (ghost agents? missing agents?)
33
+ - README skill lists vs actual `skills/*/SKILL.md` files
34
+ - README dependency lists vs `Config` file declarations
35
+ - AGENTS.md shared sections consistency across packages (paths, naming examples, model references)
36
+ - All `.md` and `.json` files: grep for references to agents, skills, or paths that don't exist
37
+ - Agent spec `resources` paths: verify referenced context files exist
38
+ - Agent spec `model` fields: verify they follow conventions (orchestrators=opus, tools=haiku/sonnet)
39
+ - Typos and spelling errors in documentation files
40
+ - Empty directories or dead skill/SOP stubs
41
+
42
+ ### Wave 2 (after Wave 1A/1B — needs dependency list)
43
+ 8. **Tech Stack Researcher** - Research the identified tech stack using web search tools (`web_search`, `web_fetch`) and `tool-dependencies-updater` (audit-only; do NOT apply updates). Goals:
44
+ - Identify outdated or deprecated dependencies and how significant an upgrade would be (patch vs minor vs major, breaking changes)
45
+ - Discover new features in the current stack that the project could leverage
46
+ - Assess whether any part of the stack is irrelevant, superseded, or approaching EOL
47
+ - Surface project-specific context (migration guides, EOL announcements, known issues)
48
+
49
+ ## Execution Model
50
+
51
+ ```
52
+ [User Request]
53
+ |
54
+ v
55
+ [Wave 1A: Spawn first 4 dimensions in parallel]
56
+ |
57
+ v
58
+ [Wave 1B: Spawn remaining dimensions in parallel if needed]
59
+ |
60
+ v
61
+ [Aggregate Wave 1 findings]
62
+ |
63
+ v
64
+ [Wave 2: Spawn Tech Stack Researcher with dependency list from Wave 1]
65
+ - tool-dependencies-updater: audit-only scan for outdated packages, version gaps, security advisories
66
+ - web search: research key frameworks/libraries for new features, deprecation, relevance
67
+ |
68
+ v
69
+ [Final Synthesis]
70
+ ```
71
+
72
+ ## Subagent Prompts (use these as templates)
73
+
74
+ Wave 1A:
75
+ - "Explore the directory structure of this repo. List key folders and their purposes. Focus on: [specific area if provided]"
76
+ - "Find all entry points in this codebase - main files, CLI commands, API routes, exported modules"
77
+ - "Analyze dependencies - what frameworks, libraries, and tools does this project use?"
78
+ - "Identify architectural patterns - is this MVC, microservices, monolith? What conventions are used?"
79
+
80
+ Wave 1B:
81
+ - "Find and summarize all configuration files - what can be configured and how?"
82
+ - "Map the test structure - where are tests, what testing frameworks, what's the coverage strategy?"
83
+ - "Audit all documentation for accuracy: (1) List every agent-spec.json file and cross-reference against README agent tables — flag any agents listed in docs but missing from disk or vice versa. (2) List every skills/*/SKILL.md and cross-reference against README skill lists. (3) Compare Config dependency declarations against README dependency sections. (4) Grep all .md and .json files for references to agent names and verify each referenced agent exists as an agent-spec.json. (5) Check AGENTS.md files across packages for inconsistent paths, naming examples, or model references. (6) Flag empty directories, typos, and dead stubs."
84
+
85
+ Wave 2 (spawn these two in parallel):
86
+ - tool-dependencies-updater: "Scan this project for all dependency manifests, check every dependency against the latest available version, run security advisory checks on outdated packages, and report findings grouped by risk level (critical/major/minor). Do NOT apply any updates — audit only."
87
+ - web search: "Research the following tech stack: [list key frameworks/libraries from Wave 1]. For each, find: (1) latest stable version and what's new, (2) any deprecation or EOL announcements, (3) notable new features that could benefit this project, (4) whether any component has been superseded by a better alternative. Cite sources."
88
+
89
+ ## Output Format
90
+
91
+ After all subagents complete, synthesize into:
92
+
93
+ ```
94
+ ## Codebase Overview
95
+ [1-2 sentence summary]
96
+
97
+ ## Key Findings
98
+ - **Tech Stack**: [languages, frameworks, tools]
99
+ - **Architecture**: [pattern, structure]
100
+ - **Entry Points**: [main files, commands]
101
+ - **Configuration**: [key config files]
102
+ - **Testing**: [strategy, frameworks]
103
+
104
+ ## Tech Stack Health
105
+ - **Outdated (Critical)**: [packages with security vulnerabilities]
106
+ - **Outdated (Major)**: [packages with major version bumps available — note breaking change risk]
107
+ - **Outdated (Minor)**: [packages with minor/patch updates]
108
+ - **New Features Available**: [notable new capabilities in current stack]
109
+ - **Deprecation/EOL Warnings**: [anything approaching end of life]
110
+ - **Upgrade Effort Summary**: [overall assessment — low/medium/high effort to get current]
111
+
112
+ ## Recommended Starting Points
113
+ [Files to read first for understanding]
114
+
115
+ ## Potential Concerns
116
+ [Any issues, outdated deps, missing tests, etc.]
117
+
118
+ ## Documentation Audit
119
+ - **Ghost references**: [agents/skills/paths mentioned in docs but not on disk]
120
+ - **Missing from docs**: [agents/skills that exist on disk but aren't documented]
121
+ - **Stale content**: [outdated descriptions, wrong dependency lists, inconsistent AGENTS.md sections]
122
+ - **Config mismatches**: [README deps vs Config file deps]
123
+ - **Path inconsistencies**: [resource paths in agent specs that don't follow conventions]
124
+ - **Empty/dead artifacts**: [empty directories, stub files with no content]
125
+ - **Typos**: [spelling errors found in documentation]
126
+ ```
127
+
128
+ ## Key Principles
129
+
130
+ - ALWAYS run explorations in PARALLEL within the current harness limit - this is the whole point
131
+ - Never exceed 4 subagents in one batch unless the harness explicitly allows more
132
+ - Wave 2 (Tech Stack Researcher) runs AFTER Wave 1A and Wave 1B complete because it needs the dependency list
133
+ - tool-dependencies-updater is used in AUDIT-ONLY mode — never apply updates during explore
134
+ - Be thorough but efficient - don't read entire files, scan for structure
135
+ - Focus on what helps someone GET STARTED quickly
136
+ - Flag anything unusual or concerning
137
+ - If a specific area is requested, weight exploration toward that area
@@ -0,0 +1,87 @@
1
+ ---
2
+ name: "feedback-loop"
3
+ description: "Verify implementation actually works. Visual changes → Playwright; integration changes → commands/tests. Run after completing builds."
4
+ ---
5
+
6
+ # Feedback Loop
7
+
8
+ Verify that what you claim to have built actually works. Don't just say "done" — prove it.
9
+
10
+ ## When to Use
11
+
12
+ - After implementing changes, before declaring them complete
13
+ - When the user asks you to verify or prove your work
14
+ - As the final step of any implementation workflow
15
+ - When you're uncertain whether your changes actually function correctly
16
+
17
+ ## Workflow
18
+
19
+ ### Step 1: IDENTIFY CHANGES
20
+
21
+ Determine what was just built:
22
+ - Check `git diff` for modified/added files
23
+ - Review the active TODO list for context on what was implemented
24
+ - Identify the nature of the change: what should be different now?
25
+
26
+ ### Step 2: CLASSIFY
27
+
28
+ Determine the verification method:
29
+
30
+ | Change Type | Method | Examples |
31
+ |---|---|---|
32
+ | **Visual** | Playwright via `tool-playwright` | UI components, pages, styles, layouts, forms, visual regressions |
33
+ | **Integration** | Commands, tests, execution | APIs, CLIs, libraries, configs, build scripts, data processing |
34
+
35
+ If changes span both, run both verification paths.
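The classification step could be seeded from the changed file list. The sketch below is only a heuristic under stated assumptions: the extension pattern and the `classifyChanges` helper are hypothetical, and the skill does not prescribe any file-based rule. A real classification should still consider what the change does, not just where it lives.

```typescript
type VerificationPath = "visual" | "integration";

// Hypothetical heuristic: extensions treated as likely UI/visual changes.
const VISUAL_FILE_PATTERN = /\.(css|scss|html|tsx|jsx|vue|svelte)$/i;

// Return every verification path the change set touches; both can apply at once.
function classifyChanges(changedFiles: string[]): VerificationPath[] {
  let visual = false;
  let integration = false;
  for (const file of changedFiles) {
    if (VISUAL_FILE_PATTERN.test(file)) {
      visual = true;
    } else {
      integration = true;
    }
  }
  const paths: VerificationPath[] = [];
  if (visual) paths.push("visual");
  if (integration) paths.push("integration");
  return paths;
}

const verificationPaths = classifyChanges(["src/App.tsx", "src/api/users.ts"]);
// verificationPaths is ["visual", "integration"], so both verification paths run.
```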
36
+
37
+ ### Step 3: VERIFY
38
+
39
+ #### Visual Path (frontend/UI changes)
40
+ Delegate to `tool-playwright`:
41
+ 1. Load the relevant URL (local dev server, preview, etc.)
42
+ 2. Take an accessibility snapshot to confirm elements exist and are structured correctly
43
+ 3. Take a screenshot for visual confirmation
44
+ 4. If interactive — click, type, navigate to exercise the changed behavior
45
+ 5. Compare against expected state: are the right elements present? Does the layout match intent?
46
+
47
+ If the dev server isn't running, start it (or tell the user to) before proceeding.
48
+
49
+ #### Integration Path (non-visual changes)
50
+ Use the most direct verification available, in priority order:
51
+ 1. **Run existing tests** — if tests cover the changed code, run them
52
+ 2. **Execute the code** — run the CLI command, call the API endpoint, import the module
53
+ 3. **Check build** — compile/lint to confirm no syntax or type errors
54
+ 4. **Inspect output** — verify the output matches expected behavior
55
+
56
+ Always capture actual output as evidence.
57
+
58
+ ### Step 4: REPORT
59
+
60
+ State clearly:
61
+ - **What was verified** — which changes, which method
62
+ - **Evidence** — actual output, screenshots, test results, command output
63
+ - **Verdict** — ✅ confirmed working, or ❌ found issues with specifics
64
+
65
+ If verification fails, fix the issue and re-verify. Don't report failure without attempting a fix first.
66
+
67
+ ## Persistence Rule
68
+
69
+ **Keep trying until the user says stop.** This is the core behavior of the feedback loop.
70
+
71
+ - If a verification method fails (Playwright won't connect, tests error out, server won't start), **debug and retry**. Don't downgrade to a weaker method or declare "good enough."
72
+ - If visual verification is required and Playwright is having issues, fix the Playwright issue. Don't fall back to "well the build passes so it's probably fine."
73
+ - If integration tests fail, diagnose why, fix, and re-run. Don't report partial success.
74
+ - Cycle: **verify → fail → diagnose → fix → verify again**. Repeat until either:
75
+ 1. ✅ All verification methods pass with evidence, OR
76
+ 2. 🛑 The user explicitly says to stop or skip a method
77
+
78
+ Never self-exit the loop. Never decide unilaterally that a failure is acceptable. The user breaks the loop, not the agent.
79
+
80
+ ## Key Principles
81
+
82
+ - **Evidence over assertion.** Show output, not just "it works."
83
+ - **Never settle.** If a verification method should work but isn't, that's a bug to fix — not a reason to skip it.
84
+ - **Fix before reporting.** If verification reveals a bug you introduced, fix it and re-run.
85
+ - **Match the medium.** UI changes need visual proof. Backend changes need execution proof.
86
+ - **Be specific.** "Tests pass" is weak. "Ran `npm test` — 14 tests passed, 0 failed, output attached" is strong.
87
+ - **Don't skip this.** The whole point is catching the gap between "I wrote the code" and "the code works."
@@ -0,0 +1,133 @@
1
+ ---
2
+ name: "fix-bug"
3
+ description: "Bug fix orchestrator — diagnose → plan-work → execute-plan → review-work → verify-work → loop. Diagnosis phase is unique to bugs, then chains the same primitives."
4
+ ---
5
+
6
+ # Bug Fix
7
+
8
+ Diagnose a bug, then chain the same plan → execute → verify loop. The diagnosis phase is what makes this different from deliver.
9
+
10
+ ## Agents
11
+
12
+ Inherited from primitives + diagnosis:
13
+
14
+ | Agent | Used by |
15
+ |---|---|
16
+ | tool-planner | diagnosis + plan-work |
17
+ | tool-worker (x4) | execute-plan |
18
+ | tool-code-reviewer | review-work |
19
+ | tool-security-reviewer | review-work (conditional — security-sensitive changes) |
20
+ | tool-verifier | verify-work |
21
+ | tool-playwright | diagnosis (reproduce) + verify-work |
22
+
23
+ ## Orchestrator Rule
24
+
25
+ You never use `read`, `glob`, `grep`, or `code` on source files. All codebase analysis goes through tool-planner. All review goes through review-work. All verification goes through tool-verifier or tool-playwright.
26
+
27
+ ## Input
28
+
29
+ - **Bug report**: screenshot, error log, user description, or all three
30
+ - **Directory**: working directory
31
+
32
+ ## Session File
33
+
34
+ Filename: `<branch>--fix-bug-<slug>.md`
35
+
36
+ ```markdown
37
+ # BUG: <one-liner>
38
+
39
+ branch: <branch>
40
+ worktree: <worktree>
41
+ created: <date>
42
+ status: diagnosing | planning | fixing | verifying | resolved
43
+ type: fix-bug
44
+ iteration: 0
45
+
46
+ ## Bug Report
47
+
48
+ Source: screenshot | error log | user description
49
+ <original report, pasted verbatim>
50
+
51
+ ## Diagnosis
52
+
53
+ Root cause from tool-planner.
54
+
55
+ ## Plan
56
+
57
+ (populated by plan-work)
58
+
59
+ ## Execution Progress
60
+
61
+ (populated by execute-plan)
62
+
63
+ ## Verification Report
64
+
65
+ (populated by verify-work)
66
+
67
+ ## History
68
+
69
+ - iteration 1: partial — fix applied but regression in sidebar
70
+ - iteration 2: pass — bug fixed, no regressions
71
+ ```
72
+
73
+ ## Workflow
74
+
75
+ ### 1. Create session file
76
+
77
+ Paste the bug report verbatim. Set `status: diagnosing`.
78
+
79
+ ### 2. Diagnose (unique to bugs)
80
+
81
+ 1. **Reproduce** (if visual) — delegate to tool-playwright to confirm the bug is visible. Screenshot the broken state.
82
+ 2. **Find root cause** — delegate to tool-planner:
83
+ ```
84
+ Bug: <description>
85
+ Reproduction: <steps or screenshot evidence>
86
+ Directory: <working directory>
87
+ todo_file: <session file path>
88
+ Find the root cause and propose a fix plan.
89
+ ```
90
+ 3. Read the diagnosis from tool-planner's output
91
+ 4. Paste into session file `## Diagnosis`
92
+ 5. Present to user: "Here's what's broken and how I'd fix it. Agree?"
93
+ 6. On approval → proceed to plan
94
+
95
+ ### 3. Plan (plan-work)
96
+
97
+ Invoke plan-work with: diagnosis + fix goal, directory, session file path.
98
+
99
+ ### 4. Execute (execute-plan)
100
+
101
+ Invoke execute-plan with the plan artifact path and session file path.
102
+
103
+ ### 5. Review (review-work)
104
+
105
+ Invoke `review-work` with the session file path. It must delegate to `tool-code-reviewer`, and to `tool-security-reviewer` when security triggers are present. CRITICAL/HIGH findings block and loop back to Execute unless explicitly accepted.
106
+
107
+ ### 6. Verify (verify-work)
108
+
109
+ Invoke verify-work with the session file path. tool-verifier must verify:
110
+ 1. **Bug is fixed** — the specific issue from the report
111
+ 2. **No regressions** — build passes, existing tests pass, related functionality works
112
+
113
+ ### 7. Route on verdict
114
+
115
+ - **All PASS** → resolve
116
+ - **Any FAIL** → loop
117
+ - **Any NOT_VERIFIED** → surface to user
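The routing rules above leave the precedence open when a report contains both `FAIL` and `NOT_VERIFIED` entries. The sketch below assumes `FAIL` wins, and that an empty report is surfaced to the user rather than treated as a pass. Both choices, along with the `routeOnVerdict` name and the `Route` labels, are assumptions rather than anything the skill specifies.

```typescript
type CheckVerdict = "PASS" | "FAIL" | "NOT_VERIFIED";
type Route = "resolve" | "loop" | "surface-to-user";

function routeOnVerdict(verdicts: CheckVerdict[]): Route {
  if (verdicts.length === 0) {
    return "surface-to-user"; // assumption: no verdicts means nothing was verified
  }
  if (verdicts.some((v) => v === "FAIL")) {
    return "loop"; // assumption: a known failure takes precedence over missing evidence
  }
  if (verdicts.some((v) => v === "NOT_VERIFIED")) {
    return "surface-to-user";
  }
  return "resolve"; // every verdict is PASS
}

const bugRoute = routeOnVerdict(["PASS", "NOT_VERIFIED"]);
// bugRoute is "surface-to-user": the fix may work, but part of it lacks evidence.
```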
118
+
119
+ ### 8. Loop (on failure)
120
+
121
+ 1. Summarize what failed
122
+ 2. Increment `iteration`
123
+ 3. Re-invoke plan-work with: original diagnosis + failure summary → updated fix plan
124
+ 4. Back to step 4
125
+
126
+ ### 9. Resolve
127
+
128
+ 1. Include verification report verbatim
129
+ 2. Show before/after evidence (screenshots if visual)
130
+ 3. `git diff --stat`
131
+ 4. Set `status: resolved`
132
+
133
+ {context?}