all-hands-cli 0.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (305)
  1. package/.allhands/README.md +75 -0
  2. package/.allhands/agents/compounder.yaml +15 -0
  3. package/.allhands/agents/coordinator.yaml +17 -0
  4. package/.allhands/agents/documentor.yaml +15 -0
  5. package/.allhands/agents/e2e-test-planner.yaml +17 -0
  6. package/.allhands/agents/emergent.yaml +22 -0
  7. package/.allhands/agents/executor.yaml +14 -0
  8. package/.allhands/agents/ideation.yaml +11 -0
  9. package/.allhands/agents/initiative-steering.yaml +19 -0
  10. package/.allhands/agents/judge.yaml +13 -0
  11. package/.allhands/agents/planner.yaml +19 -0
  12. package/.allhands/agents/pr-reviewer.yaml +15 -0
  13. package/.allhands/docs.json +5 -0
  14. package/.allhands/docs.local.json +26 -0
  15. package/.allhands/flows/COMPOUNDING.md +203 -0
  16. package/.allhands/flows/COORDINATION.md +89 -0
  17. package/.allhands/flows/CORE.md +87 -0
  18. package/.allhands/flows/DOCUMENTATION.md +218 -0
  19. package/.allhands/flows/E2E_TEST_PLAN_BUILDING.md +140 -0
  20. package/.allhands/flows/EMERGENT_PLANNING.md +57 -0
  21. package/.allhands/flows/IDEATION_SCOPING.md +154 -0
  22. package/.allhands/flows/INITIATIVE_STEERING.md +110 -0
  23. package/.allhands/flows/JUDGE_REVIEWING.md +79 -0
  24. package/.allhands/flows/PROMPT_TASK_EXECUTION.md +68 -0
  25. package/.allhands/flows/PR_REVIEWING.md +43 -0
  26. package/.allhands/flows/SPEC_PLANNING.md +216 -0
  27. package/.allhands/flows/harness/WRITING_HARNESS_FLOWS.md +27 -0
  28. package/.allhands/flows/harness/WRITING_HARNESS_KNOWLEDGE.md +27 -0
  29. package/.allhands/flows/harness/WRITING_HARNESS_ORCHESTRATION.md +27 -0
  30. package/.allhands/flows/harness/WRITING_HARNESS_SKILLS.md +27 -0
  31. package/.allhands/flows/harness/WRITING_HARNESS_TOOLS.md +27 -0
  32. package/.allhands/flows/harness/WRITING_HARNESS_VALIDATION_TOOLING.md +27 -0
  33. package/.allhands/flows/shared/CODEBASE_UNDERSTANDING.md +72 -0
  34. package/.allhands/flows/shared/CREATE_HARNESS_SPEC.md +48 -0
  35. package/.allhands/flows/shared/CREATE_SPEC.md +41 -0
  36. package/.allhands/flows/shared/CREATE_VALIDATION_TOOLING_SPEC.md +70 -0
  37. package/.allhands/flows/shared/DOCUMENTATION_DISCOVERY.md +123 -0
  38. package/.allhands/flows/shared/DOCUMENTATION_WRITER.md +101 -0
  39. package/.allhands/flows/shared/EMERGENT_REFINEMENT_ANALYSIS.md +76 -0
  40. package/.allhands/flows/shared/EXTERNAL_TECH_GUIDANCE.md +97 -0
  41. package/.allhands/flows/shared/IDEATION_CODEBASE_GROUNDING.md +49 -0
  42. package/.allhands/flows/shared/PLAN_DEEPENING.md +152 -0
  43. package/.allhands/flows/shared/PROMPT_TASKS_CURATION.md +113 -0
  44. package/.allhands/flows/shared/PROMPT_VALIDATION_REVIEW.MD +99 -0
  45. package/.allhands/flows/shared/QUICK_PREMORTEM.md +70 -0
  46. package/.allhands/flows/shared/RESEARCH_GUIDANCE.md +38 -0
  47. package/.allhands/flows/shared/REVIEW_OPTIONS_BREAKDOWN.md +68 -0
  48. package/.allhands/flows/shared/SKILL_EXTRACTION.md +84 -0
  49. package/.allhands/flows/shared/SPEC_FLOW_ANALYSIS.md +119 -0
  50. package/.allhands/flows/shared/TDD_WORKFLOW.md +109 -0
  51. package/.allhands/flows/shared/UTILIZE_VALIDATION_TOOLING.md +84 -0
  52. package/.allhands/flows/shared/WRITING_HARNESS_FLOWS.md +11 -0
  53. package/.allhands/flows/shared/WRITING_HARNESS_MCP_TOOLS.md +84 -0
  54. package/.allhands/flows/shared/jury/ARCHITECTURE_REVIEW.md +91 -0
  55. package/.allhands/flows/shared/jury/BEST_PRACTICES_REVIEW.md +80 -0
  56. package/.allhands/flows/shared/jury/CLAIM_VERIFICATION_REVIEW.md +101 -0
  57. package/.allhands/flows/shared/jury/EXPECTATIONS_FIT_REVIEW.md +78 -0
  58. package/.allhands/flows/shared/jury/MAINTAINABILITY_REVIEW.md +110 -0
  59. package/.allhands/flows/shared/jury/PROMPTS_EXPECTATIONS_FIT.md +74 -0
  60. package/.allhands/flows/shared/jury/PROMPTS_FLOW_ANALYSIS.md +92 -0
  61. package/.allhands/flows/shared/jury/PROMPTS_YAGNI.md +78 -0
  62. package/.allhands/flows/shared/jury/PROMPT_PREMORTEM.md +125 -0
  63. package/.allhands/flows/shared/jury/SECURITY_REVIEW.md +86 -0
  64. package/.allhands/flows/shared/jury/YAGNI_REVIEW.md +82 -0
  65. package/.allhands/flows/wip/DEBUG_INVESTIGATION.md +162 -0
  66. package/.allhands/flows/wip/MEMORY_RECALL.md +62 -0
  67. package/.allhands/harness/ah +131 -0
  68. package/.allhands/harness/package-lock.json +5292 -0
  69. package/.allhands/harness/package.json +52 -0
  70. package/.allhands/harness/src/__tests__/e2e/commands.test.ts +307 -0
  71. package/.allhands/harness/src/__tests__/e2e/event-loop.test.ts +539 -0
  72. package/.allhands/harness/src/__tests__/e2e/hooks.test.ts +427 -0
  73. package/.allhands/harness/src/__tests__/e2e/new-initiative-routing.test.ts +137 -0
  74. package/.allhands/harness/src/__tests__/e2e/run-e2e.ts +109 -0
  75. package/.allhands/harness/src/__tests__/e2e/specs-type.test.ts +210 -0
  76. package/.allhands/harness/src/__tests__/e2e/validation-hooks.test.ts +669 -0
  77. package/.allhands/harness/src/__tests__/e2e/validation-path-consistency.test.ts +354 -0
  78. package/.allhands/harness/src/__tests__/e2e/validation.test.ts +528 -0
  79. package/.allhands/harness/src/__tests__/harness/assertions.ts +318 -0
  80. package/.allhands/harness/src/__tests__/harness/cli-runner.ts +359 -0
  81. package/.allhands/harness/src/__tests__/harness/fixture.ts +384 -0
  82. package/.allhands/harness/src/__tests__/harness/hook-runner.ts +411 -0
  83. package/.allhands/harness/src/__tests__/harness/index.ts +122 -0
  84. package/.allhands/harness/src/cli.ts +36 -0
  85. package/.allhands/harness/src/commands/complexity.ts +177 -0
  86. package/.allhands/harness/src/commands/context7.ts +202 -0
  87. package/.allhands/harness/src/commands/docs.ts +557 -0
  88. package/.allhands/harness/src/commands/hooks.ts +24 -0
  89. package/.allhands/harness/src/commands/index.ts +51 -0
  90. package/.allhands/harness/src/commands/knowledge.ts +382 -0
  91. package/.allhands/harness/src/commands/memories.ts +302 -0
  92. package/.allhands/harness/src/commands/notify.ts +61 -0
  93. package/.allhands/harness/src/commands/oracle.ts +158 -0
  94. package/.allhands/harness/src/commands/perplexity.ts +220 -0
  95. package/.allhands/harness/src/commands/planning.ts +245 -0
  96. package/.allhands/harness/src/commands/schema.ts +73 -0
  97. package/.allhands/harness/src/commands/skills.ts +128 -0
  98. package/.allhands/harness/src/commands/solutions.ts +353 -0
  99. package/.allhands/harness/src/commands/spawn.ts +158 -0
  100. package/.allhands/harness/src/commands/specs.ts +532 -0
  101. package/.allhands/harness/src/commands/tavily.ts +226 -0
  102. package/.allhands/harness/src/commands/tools.ts +579 -0
  103. package/.allhands/harness/src/commands/trace.ts +327 -0
  104. package/.allhands/harness/src/commands/tui.ts +960 -0
  105. package/.allhands/harness/src/commands/validate.ts +143 -0
  106. package/.allhands/harness/src/commands/validation-tools.ts +108 -0
  107. package/.allhands/harness/src/hooks/context.ts +1442 -0
  108. package/.allhands/harness/src/hooks/enforcement.ts +170 -0
  109. package/.allhands/harness/src/hooks/index.ts +54 -0
  110. package/.allhands/harness/src/hooks/lifecycle.ts +229 -0
  111. package/.allhands/harness/src/hooks/notification.ts +104 -0
  112. package/.allhands/harness/src/hooks/observability.ts +551 -0
  113. package/.allhands/harness/src/hooks/session.ts +88 -0
  114. package/.allhands/harness/src/hooks/shared.ts +815 -0
  115. package/.allhands/harness/src/hooks/transcript-parser.ts +208 -0
  116. package/.allhands/harness/src/hooks/validation.ts +617 -0
  117. package/.allhands/harness/src/lib/__tests__/ctags.test.ts +244 -0
  118. package/.allhands/harness/src/lib/__tests__/docs-validation.test.ts +344 -0
  119. package/.allhands/harness/src/lib/__tests__/mcp-runtime.test.ts +190 -0
  120. package/.allhands/harness/src/lib/__tests__/schema.test.ts +861 -0
  121. package/.allhands/harness/src/lib/base-command.ts +198 -0
  122. package/.allhands/harness/src/lib/cli-daemon.ts +343 -0
  123. package/.allhands/harness/src/lib/compaction.ts +313 -0
  124. package/.allhands/harness/src/lib/ctags.ts +497 -0
  125. package/.allhands/harness/src/lib/docs-validation.ts +907 -0
  126. package/.allhands/harness/src/lib/event-loop.ts +662 -0
  127. package/.allhands/harness/src/lib/flows.ts +155 -0
  128. package/.allhands/harness/src/lib/git.ts +276 -0
  129. package/.allhands/harness/src/lib/knowledge-worker.ts +72 -0
  130. package/.allhands/harness/src/lib/knowledge.ts +810 -0
  131. package/.allhands/harness/src/lib/llm.ts +255 -0
  132. package/.allhands/harness/src/lib/mcp-client.ts +432 -0
  133. package/.allhands/harness/src/lib/mcp-daemon.ts +486 -0
  134. package/.allhands/harness/src/lib/mcp-runtime.ts +418 -0
  135. package/.allhands/harness/src/lib/notification.ts +115 -0
  136. package/.allhands/harness/src/lib/opencode/index.ts +70 -0
  137. package/.allhands/harness/src/lib/opencode/profiles.ts +300 -0
  138. package/.allhands/harness/src/lib/opencode/prompts/codesearch.md +98 -0
  139. package/.allhands/harness/src/lib/opencode/prompts/knowledge-aggregator.md +67 -0
  140. package/.allhands/harness/src/lib/opencode/runner.ts +281 -0
  141. package/.allhands/harness/src/lib/oracle.ts +926 -0
  142. package/.allhands/harness/src/lib/planning-utils.ts +150 -0
  143. package/.allhands/harness/src/lib/planning.ts +605 -0
  144. package/.allhands/harness/src/lib/pr-review.ts +225 -0
  145. package/.allhands/harness/src/lib/prompts.ts +522 -0
  146. package/.allhands/harness/src/lib/schema.ts +418 -0
  147. package/.allhands/harness/src/lib/schemas/agent-profile.ts +141 -0
  148. package/.allhands/harness/src/lib/schemas/template-vars.ts +138 -0
  149. package/.allhands/harness/src/lib/session.ts +164 -0
  150. package/.allhands/harness/src/lib/specs.ts +348 -0
  151. package/.allhands/harness/src/lib/tldr.ts +829 -0
  152. package/.allhands/harness/src/lib/tmux.ts +1051 -0
  153. package/.allhands/harness/src/lib/trace-store.ts +714 -0
  154. package/.allhands/harness/src/mcp/__tests__/index.test.ts +46 -0
  155. package/.allhands/harness/src/mcp/_template.ts +47 -0
  156. package/.allhands/harness/src/mcp/filesystem.ts +33 -0
  157. package/.allhands/harness/src/mcp/index.ts +69 -0
  158. package/.allhands/harness/src/mcp/playwright.ts +34 -0
  159. package/.allhands/harness/src/mcp/xcodebuild.ts +29 -0
  160. package/.allhands/harness/src/schemas/docs.schema.json +44 -0
  161. package/.allhands/harness/src/schemas/settings.schema.json +214 -0
  162. package/.allhands/harness/src/tui/actions.ts +227 -0
  163. package/.allhands/harness/src/tui/file-viewer-modal.ts +270 -0
  164. package/.allhands/harness/src/tui/index.ts +1574 -0
  165. package/.allhands/harness/src/tui/modal.ts +232 -0
  166. package/.allhands/harness/src/tui/prompts-pane.ts +186 -0
  167. package/.allhands/harness/src/tui/status-pane.ts +434 -0
  168. package/.allhands/harness/tsconfig.json +22 -0
  169. package/.allhands/harness/vitest.config.ts +13 -0
  170. package/.allhands/pillars.md +33 -0
  171. package/.allhands/principles.md +88 -0
  172. package/.allhands/schemas/alignment.yaml +51 -0
  173. package/.allhands/schemas/documentation.yaml +10 -0
  174. package/.allhands/schemas/prompt.yaml +92 -0
  175. package/.allhands/schemas/skill.yaml +34 -0
  176. package/.allhands/schemas/solution.yaml +131 -0
  177. package/.allhands/schemas/spec.yaml +67 -0
  178. package/.allhands/schemas/validation-suite.yaml +49 -0
  179. package/.allhands/schemas/workflow.yaml +51 -0
  180. package/.allhands/settings.json +57 -0
  181. package/.allhands/skills/claude-code-patterns/SKILL.md +60 -0
  182. package/.allhands/skills/claude-code-patterns/docs/context-hygiene.md +19 -0
  183. package/.allhands/skills/harness-maintenance/SKILL.md +449 -0
  184. package/.allhands/skills/harness-maintenance/references/core-architecture.md +187 -0
  185. package/.allhands/skills/harness-maintenance/references/harness-skills.md +87 -0
  186. package/.allhands/skills/harness-maintenance/references/knowledge-compounding.md +78 -0
  187. package/.allhands/skills/harness-maintenance/references/tools-commands-mcp-hooks.md +115 -0
  188. package/.allhands/skills/harness-maintenance/references/validation-tooling.md +77 -0
  189. package/.allhands/skills/harness-maintenance/references/writing-flows.md +84 -0
  190. package/.allhands/validation/browser-automation.md +109 -0
  191. package/.allhands/validation/xcode-automation.md +195 -0
  192. package/.allhands/workflows/documentation.md +86 -0
  193. package/.allhands/workflows/investigation.md +81 -0
  194. package/.allhands/workflows/milestone.md +91 -0
  195. package/.allhands/workflows/optimization.md +85 -0
  196. package/.allhands/workflows/refactor.md +99 -0
  197. package/.allhands/workflows/triage.md +81 -0
  198. package/.claude/README.md +1 -0
  199. package/.claude/agents/explorer.md +10 -0
  200. package/.claude/agents/researcher.md +11 -0
  201. package/.claude/agents/task-runner.md +8 -0
  202. package/.claude/settings.json +231 -0
  203. package/.env.ai.example +7 -0
  204. package/.github/workflows/npm-publish.yml +69 -0
  205. package/.internal.json +45 -0
  206. package/.tldr/config.json +11 -0
  207. package/.tldrignore +90 -0
  208. package/CLAUDE.md +6 -0
  209. package/README.md +98 -0
  210. package/bin/sync-cli.js +7552 -0
  211. package/concerns.md +7 -0
  212. package/docs/README.md +41 -0
  213. package/docs/agents/README.md +24 -0
  214. package/docs/agents/agent-configuration-system.md +86 -0
  215. package/docs/agents/execution-agents.md +50 -0
  216. package/docs/agents/knowledge-agents.md +61 -0
  217. package/docs/agents/orchestration-agent.md +57 -0
  218. package/docs/agents/planning-agents.md +84 -0
  219. package/docs/agents/quality-review-agents.md +67 -0
  220. package/docs/agents/workflow-agent-orchestration.md +69 -0
  221. package/docs/flows/README.md +44 -0
  222. package/docs/flows/compounding.md +126 -0
  223. package/docs/flows/coordination.md +72 -0
  224. package/docs/flows/core-harness-integration.md +63 -0
  225. package/docs/flows/documentation-orchestration.md +98 -0
  226. package/docs/flows/e2e-test-plan-building.md +83 -0
  227. package/docs/flows/emergent-refinement.md +104 -0
  228. package/docs/flows/flow-authoring-and-mcp-tools.md +89 -0
  229. package/docs/flows/judge-reviewing.md +112 -0
  230. package/docs/flows/plan-deepening-and-research.md +107 -0
  231. package/docs/flows/plan-review-jury.md +114 -0
  232. package/docs/flows/pr-reviewing.md +54 -0
  233. package/docs/flows/prompt-task-execution.md +119 -0
  234. package/docs/flows/spec-planning.md +162 -0
  235. package/docs/flows/type-specific-scoping-flows.md +49 -0
  236. package/docs/flows/validation-and-skills-integration.md +145 -0
  237. package/docs/flows/wip/wip-flows.md +102 -0
  238. package/docs/harness/README.md +23 -0
  239. package/docs/harness/agent-profiles.md +84 -0
  240. package/docs/harness/cli/README.md +24 -0
  241. package/docs/harness/cli/cli-entry-and-command-discovery.md +91 -0
  242. package/docs/harness/cli/docs-command.md +87 -0
  243. package/docs/harness/cli/knowledge-command.md +91 -0
  244. package/docs/harness/cli/minor-cli-commands.md +65 -0
  245. package/docs/harness/cli/oracle-command.md +113 -0
  246. package/docs/harness/cli/planning-command.md +95 -0
  247. package/docs/harness/cli/schema-and-validation-commands.md +154 -0
  248. package/docs/harness/cli/search-commands.md +97 -0
  249. package/docs/harness/cli/spawn-command.md +136 -0
  250. package/docs/harness/cli/specs-command.md +102 -0
  251. package/docs/harness/cli/tools-command.md +122 -0
  252. package/docs/harness/cli/trace-command.md +122 -0
  253. package/docs/harness/cli-daemon.md +92 -0
  254. package/docs/harness/event-loop.md +184 -0
  255. package/docs/harness/hooks/README.md +15 -0
  256. package/docs/harness/hooks/context-hooks.md +96 -0
  257. package/docs/harness/hooks/lifecycle-and-observability-hooks.md +135 -0
  258. package/docs/harness/hooks/validation-hooks.md +97 -0
  259. package/docs/harness/test-harness.md +149 -0
  260. package/docs/harness/tui.md +176 -0
  261. package/docs/memories.md +20 -0
  262. package/docs/solutions/agentic-issues/premature-agent-deletion-tui-action-dependency-20260130.md +49 -0
  263. package/docs/solutions/agentic-issues/ref-anchor-scope-mismatch-skill-references-20260131.md +55 -0
  264. package/docs/solutions/agentic-issues/tautological-tests-routing-20260131.md +52 -0
  265. package/docs/solutions/integration_issue/blocktool-output-format-mismatch-hook-runner-20260130.md +52 -0
  266. package/docs/solutions/integration_issue/dual-validation-path-divergence-schema-20260130.md +66 -0
  267. package/docs/solutions/security-issues/unsanitized-domain-path-join-20260131.md +52 -0
  268. package/docs/solutions/test-failures/event-loop-mock-ordering-checkAgentWindows-20260130.md +63 -0
  269. package/docs/sync-cli/README.md +19 -0
  270. package/docs/sync-cli/cli-entrypoint-and-commands.md +39 -0
  271. package/docs/sync-cli/commands/README.md +11 -0
  272. package/docs/sync-cli/commands/pull-manifest-command.md +36 -0
  273. package/docs/sync-cli/commands/push-command.md +84 -0
  274. package/docs/sync-cli/commands/sync-command.md +71 -0
  275. package/docs/sync-cli/systems/README.md +14 -0
  276. package/docs/sync-cli/systems/git-and-github-integration.md +49 -0
  277. package/docs/sync-cli/systems/interactive-ui.md +43 -0
  278. package/docs/sync-cli/systems/manifest-and-distribution.md +51 -0
  279. package/docs/sync-cli/systems/path-resolution.md +42 -0
  280. package/package.json +46 -0
  281. package/scripts/install-shim.sh +40 -0
  282. package/scripts/pre-pack.sh +25 -0
  283. package/specs/harness-maintenance-skill.spec.md +138 -0
  284. package/specs/roadmap/git-spec-lifecycle-management.spec.md +113 -0
  285. package/specs/sync-init-flag.spec.md +117 -0
  286. package/specs/unified-workflow-orchestration.spec.md +250 -0
  287. package/specs/validation-tooling-practice.spec.md +98 -0
  288. package/specs/workflow-domain-configuration.spec.md +265 -0
  289. package/src/commands/pull-manifest.ts +31 -0
  290. package/src/commands/push.ts +344 -0
  291. package/src/commands/sync.ts +289 -0
  292. package/src/lib/constants.ts +10 -0
  293. package/src/lib/dotfiles.ts +36 -0
  294. package/src/lib/fs-utils.ts +18 -0
  295. package/src/lib/gh.ts +40 -0
  296. package/src/lib/git.ts +63 -0
  297. package/src/lib/gitignore.ts +167 -0
  298. package/src/lib/manifest.ts +121 -0
  299. package/src/lib/marker-sync.ts +39 -0
  300. package/src/lib/paths.ts +38 -0
  301. package/src/lib/target-lines.ts +66 -0
  302. package/src/lib/ui.ts +78 -0
  303. package/src/sync-cli.ts +120 -0
  304. package/target-lines.json +23 -0
  305. package/tsconfig.json +20 -0
@@ -0,0 +1,250 @@
---
name: unified-workflow-orchestration
domain_name: infrastructure
status: completed
dependencies: []
branch: feature/unified-workflow-orchestration
---

# Unified Workflow Orchestration

## Motivation

The harness currently has one workflow: milestone-based feature development. The entire TUI, event loop, planning system, and agent registry are hardcoded to this pipeline: Ideation → Spec → Planner → Execution Loop → Emergent → Review → PR → Compound. This works exceptionally well for milestone-based development but creates three structural problems:

1. **No entry point for exploratory work** — Debugging, optimization, refactoring, documentation, and triage workflows have no way to leverage the harness's prompt-based execution loop. The ideation flow is tailored to milestone spec creation (6-dimension deep interview). The planner assumes all open questions must be resolved via engineer interview before execution. The event loop requires all planned prompts to complete before emergent agents spawn. None of this fits exploratory work, where execution *is* discovery.

2. **Emergent refinement is coupled to execution** — The current emergent agent (`EMERGENT_REFINEMENT_EXECUTION.md`) both formulates hypotheses AND implements them. This requires a separate agent type (`emergent.yaml`), a separate flow, special spawn logic in the event loop, special window tracking, and a dedicated TUI toggle. All of this exists because the emergent agent is a planner-executor hybrid, even though the executor agent already knows how to execute prompts.

3. **Workflow configuration is over-engineered** — The `workflows/` directory with YAML configs, the `workflows.ts` loader, the `workflow.ts` Zod schema, and the `emergentEnabled` state across TUIState/EventLoopState/ToggleState all exist to control one thing: when emergent agents spawn and what domains they explore. This machinery serves a concern that can be expressed as a field on the spec and a simple condition in the event loop.

The engineer has validated the current milestone pipeline extensively and wants to extend the harness to support the full SDLC — debugging, optimization, refactoring, documentation, triage — while simplifying orchestration. The core insight: all workflows share the same execution primitive (prompt-based loop with hypothesis-driven emergence). They differ only in planning depth and how initial prompts are created, both of which are driven by the spec type and the planner's behavior.

## Goals

### 1. Two-Agent Execution Model: Hypothesis Planner + Executors

Replace the current emergent agent (which plans AND executes in one lifecycle) with a separation of concerns:

- **Hypothesis Planner** — A new agent that reads the alignment doc + prior prompt summaries, formulates 1-N non-overlapping hypotheses, creates prompt files for each, and dies. It is a non-prompt-scoped agent (one instance, not one-per-prompt). It produces prompts that regular executors pick up.
- **Executors** — Unchanged. Pick up prompts from the planning directory and execute them. No distinction between "planned" prompts and "hypothesis" prompts at the executor level.

The event loop becomes: pick pending prompt → spawn executor. When no prompts remain and nothing is in progress → spawn hypothesis planner → it creates more prompts → loop picks them up on next tick. One decision path, not two.

The hypothesis planner can create **multiple prompts per invocation**, which is better than the current one-emergent-one-prompt model. It sees the full picture and can plan a batch of non-overlapping hypotheses. Executors can then run those in parallel if the parallel toggle is on. Compounding happens between planner invocations via the alignment doc.

If the hypothesis planner determines nothing valuable remains (all open questions addressed, goals met, no meaningful gaps), it creates 0 prompts and the loop idles until the user acts.

### 2. Unified Event Loop — No Emergent Concept

Simplify `checkPromptLoop()` to have one code path:

```
loop enabled + status ready + pending prompts exist → pick next, spawn executor
loop enabled + status ready + no pending + no in_progress → spawn hypothesis planner
loop enabled + status ready + no pending + in_progress exist → wait (executors still working)
loop disabled → nothing
```
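
The four rules above can be read as one pure decision function. This is a minimal sketch, not the harness's `checkPromptLoop()`: every type, field, and status name in it is hypothetical. It also makes two assumptions the rules leave implicit: with the Parallel toggle off, a pending prompt waits for the running executor, and the loop tracks whether a hypothesis planner is running or has just produced zero prompts, so it idles instead of respawning the planner on every tick.

```typescript
// Hypothetical sketch of the single decision path; not the harness's actual types.
type PromptStatus = "pending" | "in_progress" | "done"; // status names assumed

interface LoopSnapshot {
  loopEnabled: boolean;
  parallelEnabled: boolean;
  statusReady: boolean; // status.yaml reports "ready"
  prompts: { id: string; status: PromptStatus }[];
  plannerRunning: boolean; // assumed: a hypothesis planner is still alive
  plannerProducedNothing: boolean; // assumed: its last run created 0 prompts
}

type LoopAction =
  | { kind: "spawn-executor"; promptId: string }
  | { kind: "spawn-hypothesis-planner" }
  | { kind: "wait" }
  | { kind: "idle" };

function decideLoopAction(s: LoopSnapshot): LoopAction {
  // "loop disabled -> nothing"; treating a non-ready status the same way is an assumption.
  if (!s.loopEnabled || !s.statusReady) return { kind: "idle" };

  const next = s.prompts.find((p) => p.status === "pending"); // list order assumed
  const inProgress = s.prompts.some((p) => p.status === "in_progress");

  // Pending prompts exist -> spawn an executor for the next one.
  if (next !== undefined) {
    // Assumption: with Parallel off, only one executor runs at a time.
    if (inProgress && !s.parallelEnabled) return { kind: "wait" };
    return { kind: "spawn-executor", promptId: next.id };
  }

  // No pending prompts, but executors (or the planner) are still working -> wait.
  if (inProgress || s.plannerRunning) return { kind: "wait" };

  // Nothing pending, nothing in progress -> spawn the hypothesis planner,
  // unless its last run created 0 prompts (the loop then idles until the user acts).
  return s.plannerProducedNothing
    ? { kind: "idle" }
    : { kind: "spawn-hypothesis-planner" };
}
```

Keeping the decision pure (snapshot in, action out) leaves the actual spawning to the caller and makes each rule testable in isolation.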

Remove: `emergentEnabled` from EventLoopState/TUIState/ToggleState, emergent toggle from TUI actions, emergent-specific spawn callbacks, emergent window prefix checks in `checkAgentWindows()`, emergent agent profile, `EMERGENT_REFINEMENT_EXECUTION.md` flow, and all code that distinguishes emergent from executor spawning.

The hypothesis planner is just another agent the event loop can spawn — the same way it spawns executors. The spawn condition is "no prompts left" instead of "prompt available."

No `emergent_trigger` flag on status.yaml. No mode field. The loop's behavior is identical for all spec types. The difference in timing is entirely determined by how many initial prompts the planner creates (many for milestone, few for exploratory, zero for documentation), which naturally determines when hypothesis planning kicks in.

### 3. Spec Type Field + User-Selected Hypothesis Domains

Add two fields to the spec schema:

- `type` — Enum: `milestone`, `investigation`, `optimization`, `refactor`, `documentation`, `triage`. Determines planner behavior (planning depth) and branch prefix convention.
- `hypothesis_domains` — String array, optional. Selected by the user during spec creation based on the agent's recommendation from the available domains in `settings.json`. Falls back to `settings.json` global defaults if absent. Constrains what kinds of hypotheses the hypothesis planner can create.
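
A minimal TypeScript sketch of the two fields and the fallback rule, assuming the relevant part of `settings.json` exposes a list of available domains and a list of global defaults. The `HypothesisSettings` keys, the check that selected domains were actually offered, and the function names are assumptions rather than the real schema.

```typescript
// Hypothetical sketch; only `type` and `hypothesis_domains` come from this spec.
const SPEC_TYPES = [
  "milestone",
  "investigation",
  "optimization",
  "refactor",
  "documentation",
  "triage",
] as const;
type SpecType = (typeof SPEC_TYPES)[number];

interface SpecFrontmatter {
  name: string;
  type: SpecType;
  hypothesis_domains?: string[]; // absent -> fall back to global defaults
}

// Assumed shape of the relevant settings.json section (key names are illustrative).
interface HypothesisSettings {
  availableDomains: string[];
  defaultDomains: string[];
}

function isSpecType(value: string): value is SpecType {
  return (SPEC_TYPES as readonly string[]).includes(value);
}

function resolveHypothesisDomains(
  spec: SpecFrontmatter,
  settings: HypothesisSettings,
): string[] {
  const chosen = spec.hypothesis_domains ?? settings.defaultDomains;
  // Assumption: reject domains that settings.json never offered to the user.
  const unknown = chosen.filter((d) => !settings.availableDomains.includes(d));
  if (unknown.length > 0) {
    throw new Error(`Unknown hypothesis domains: ${unknown.join(", ")}`);
  }
  return chosen;
}
```

Deriving `SpecType` from a constant array lets the same list drive both the type checker and any runtime validation of frontmatter values.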

During `CREATE_SPEC`, the spec creation flow presents the available hypothesis domains from `settings.json` along with its recommendation for the spec being created. The user selects which domains to include. This is a single question asked in context: it gives the user full control over the domains and leaves no type-to-domain mapping to maintain.

Remove: `workflows/` directory, `workflows.ts`, `workflow.ts` schema, `getHypothesisDomains()` function, `formatHypothesisDomains()` function, all workflow config loading logic.

### 4. Consolidated "New Initiative" TUI Action

Replace the current `[2] Ideation` action with `[2] New Initiative` that opens a selection modal listing all spec creation workflows:

| Type | Description | Spec Creation Flow |
|------|-------------|-------------------|
| Milestone | Feature development with deep ideation | Existing `IDEATION_SESSION.md` (deep 6-dimension interview) |
| Investigation | Debug / diagnose issues | Lightweight scoping flow (problem, evidence, success criteria, constraints) |
| Optimization | Performance / efficiency work | Lightweight scoping flow (targets, baselines, measurement approach) |
| Refactor | Cleanup / tech debt | Lightweight scoping flow (scope boundaries, invariants to preserve) |
| Documentation | Coverage gaps | Lightweight scoping flow (areas to cover, documentation standards) |
| Triage | External signal analysis | Scoping flow that reads from external sources (analytics, error tracking) |

Each selection runs the appropriate spec creation flow. All flows produce a spec following the same schema, with `type` set and `hypothesis_domains` selected. All flows end with `CREATE_SPEC.md` to persist. The milestone flow is the existing `IDEATION_SESSION.md` — unchanged. The exploratory flows are new, lighter flows (~40 lines each) that ask the 3-5 questions appropriate to their domain.
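
The table can be held as data so the modal and the flow dispatch share one source. In this sketch only `IDEATION_SESSION.md` and `CREATE_SPEC.md` are file names taken from this spec; the exploratory entries carry their focus areas from the table because their flow files are not named here, and the `InitiativeOption` and `CreationFlow` shapes are hypothetical.

```typescript
// Hypothetical sketch of the New Initiative options as data.
type SpecType =
  | "milestone"
  | "investigation"
  | "optimization"
  | "refactor"
  | "documentation"
  | "triage";

type CreationFlow =
  | { kind: "ideation"; file: "IDEATION_SESSION.md" }
  | { kind: "scoping"; focus: string }; // scoping flow file names are not specified

interface InitiativeOption {
  type: SpecType;
  label: string;
  description: string;
  flow: CreationFlow;
}

const NEW_INITIATIVE_OPTIONS: InitiativeOption[] = [
  { type: "milestone", label: "Milestone", description: "Feature development with deep ideation",
    flow: { kind: "ideation", file: "IDEATION_SESSION.md" } },
  { type: "investigation", label: "Investigation", description: "Debug / diagnose issues",
    flow: { kind: "scoping", focus: "problem, evidence, success criteria, constraints" } },
  { type: "optimization", label: "Optimization", description: "Performance / efficiency work",
    flow: { kind: "scoping", focus: "targets, baselines, measurement approach" } },
  { type: "refactor", label: "Refactor", description: "Cleanup / tech debt",
    flow: { kind: "scoping", focus: "scope boundaries, invariants to preserve" } },
  { type: "documentation", label: "Documentation", description: "Coverage gaps",
    flow: { kind: "scoping", focus: "areas to cover, documentation standards" } },
  { type: "triage", label: "Triage", description: "External signal analysis",
    flow: { kind: "scoping", focus: "external sources (analytics, error tracking)" } },
];

// Every option runs its creation flow first, then persists through CREATE_SPEC.md.
function creationSteps(option: InitiativeOption): string[] {
  const { flow } = option;
  const first =
    flow.kind === "ideation" ? flow.file : `${option.label} scoping: ${flow.focus}`;
  return [first, "CREATE_SPEC.md"];
}
```

Modeling the flow as a discriminated union makes the milestone path (a named file) and the lightweight paths (a focus description) impossible to confuse at the call site.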

### 5. Unified Planner Behavior by Spec Type

The planner agent (`SPEC_PLANNING.md`) reads the spec's `type` field and calibrates its depth:

**For milestone specs** (existing behavior, mostly unchanged):
- Deep codebase + external research (1-4 subtasks)
- Engineer interview with decision points — each open question becomes an `AskUserQuestion` with 2-4 options including a recommended approach
- Unanswered/skipped questions → disposable variant prompts behind feature flags for quality engineering comparison
- Concerns from spec → specific prompts to de-risk via implementation
- Jury review of prompt plan
- Output: 5-15 coordinated prompts + detailed alignment doc with overview, hard user requirements, and engineer decisions

**For all exploratory spec types** (new behavior):
- Focused research (1-2 subtasks grounded in the problem area)
- Engineer presented with open questions and concerns from spec — can answer to narrow scope, or skip to leave open for hypothesis-driven discovery
- Skipped/unanswered questions → documented in alignment doc as "Unresolved Questions" visible to the hypothesis planner for experiment design
- Concerns/limitations → documented in alignment doc as context for hypothesis formation
- No jury review (lightweight)
- Output: 0-3 seed prompts (testable hypotheses) + alignment doc with problem statement, evidence, unresolved questions, and success criteria
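
The two planning depths reduce to a small profile lookup keyed on `type`. This is a hypothetical sketch: the `PlanningProfile` field names are illustrative, while the numeric ranges (research subtasks, initial prompt counts) and the jury and skipped-question behavior are copied from the bullets above.

```typescript
// Hypothetical sketch: planning depth per spec type as a profile lookup.
type SpecType =
  | "milestone"
  | "investigation"
  | "optimization"
  | "refactor"
  | "documentation"
  | "triage";

interface PlanningProfile {
  researchSubtasks: [min: number, max: number];
  skippedQuestions: "variant-prompts" | "unresolved-questions";
  juryReview: boolean;
  initialPrompts: [min: number, max: number];
}

const MILESTONE_PROFILE: PlanningProfile = {
  researchSubtasks: [1, 4],
  skippedQuestions: "variant-prompts", // disposable variants behind feature flags
  juryReview: true,
  initialPrompts: [5, 15],
};

const EXPLORATORY_PROFILE: PlanningProfile = {
  researchSubtasks: [1, 2],
  skippedQuestions: "unresolved-questions", // left open for the hypothesis planner
  juryReview: false,
  initialPrompts: [0, 3],
};

function planningProfile(type: SpecType): PlanningProfile {
  return type === "milestone" ? MILESTONE_PROFILE : EXPLORATORY_PROFILE;
}

// True when a planner run produced a prompt count inside the range for its type.
function withinPromptRange(type: SpecType, count: number): boolean {
  const [min, max] = planningProfile(type).initialPrompts;
  return count >= min && count <= max;
}
```

All exploratory types share one profile, matching the section: they differ in how their specs were scoped, not in how the planner treats them.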

The planner is always the agent that bridges "spec" to "executable loop." It creates the `.planning/{branch}/` directory contents (prompts + alignment doc) and transitions status.yaml to ready. The engineer runs the planner, optionally answers questions to narrow scope, and enables the loop. Same pipeline for all spec types, different depth.

### 6. Always-Available TUI Actions

Remove all conditional visibility (`hidden`, `disabled` based on state) from the TUI actions pane. Every action is always visible. Agents that find nothing to do exit early with a message. This eliminates conditional state tracking in `buildActionItems()` and makes the TUI behavior predictable regardless of workflow type.

The resulting actions pane:

```
[1] Coordinator
[2] New Initiative
[3] Planner
[4] Review Jury
[5] E2E Test Plan
[6] PR Action
[7] Address PR Review
[8] Compound
[9] Complete
[0] Switch Workspace
[-] Custom Flow
━━ Toggles ━━
[O] Loop
[P] Parallel
━━ Controls ━━
[V] View Logs
[C] Clear Logs
[R] Refresh
[Q] Quit
```

Two toggles (Loop, Parallel). No emergent toggle. No workflow-dependent action sets.
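
With conditions removed, the action list can be a constant. The sketch below shows one way `buildActionItems()` could reduce to returning that constant; the `ActionItem` shape, the `section` field, and the case-insensitive key lookup are assumptions for illustration, not the TUI's actual code.

```typescript
// Hypothetical sketch: with no hidden/disabled flags, the action list is a constant.
interface ActionItem {
  key: string;
  label: string;
  section: "actions" | "toggles" | "controls";
}

const ACTION_ITEMS: readonly ActionItem[] = [
  { key: "1", label: "Coordinator", section: "actions" },
  { key: "2", label: "New Initiative", section: "actions" },
  { key: "3", label: "Planner", section: "actions" },
  { key: "4", label: "Review Jury", section: "actions" },
  { key: "5", label: "E2E Test Plan", section: "actions" },
  { key: "6", label: "PR Action", section: "actions" },
  { key: "7", label: "Address PR Review", section: "actions" },
  { key: "8", label: "Compound", section: "actions" },
  { key: "9", label: "Complete", section: "actions" },
  { key: "0", label: "Switch Workspace", section: "actions" },
  { key: "-", label: "Custom Flow", section: "actions" },
  { key: "O", label: "Loop", section: "toggles" },
  { key: "P", label: "Parallel", section: "toggles" },
  { key: "V", label: "View Logs", section: "controls" },
  { key: "C", label: "Clear Logs", section: "controls" },
  { key: "R", label: "Refresh", section: "controls" },
  { key: "Q", label: "Quit", section: "controls" },
];

// Nothing to compute: the same items are returned regardless of spec type or state.
function buildActionItems(): readonly ActionItem[] {
  return ACTION_ITEMS;
}

// Assumption: key lookup is case-insensitive, so "o" and "O" both toggle the loop.
function actionForKey(key: string): ActionItem | undefined {
  return ACTION_ITEMS.find((item) => item.key.toLowerCase() === key.toLowerCase());
}
```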

### 7. Hypothesis Planner Agent + Flow

Create `hypothesis-planner.yaml` agent profile and a corresponding flow file. The flow is concise (~30-40 lines):

- Read alignment doc: goals, prior prompt summaries, unresolved questions, learnings
- Identify gaps between current state and desired state (per spec goals + success criteria)
- Select hypothesis domains from spec's `hypothesis_domains` field, diversifying from prior work
- Discover validation tooling for hypotheses
- Create 1-N prompt files following `PROMPT_TASKS_CURATION.md` (each targeting a non-overlapping hypothesis)
- If no valuable hypotheses remain (goals met, questions resolved, no meaningful gaps): create 0 prompts, stop
- Stop — executors pick up prompts via the loop
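
One possible reading of "diversifying from prior work" is to rank the spec's allowed domains by how often earlier prompts already used them and give each new prompt a different, least-used domain. The sketch below is a hypothetical heuristic, not the flow's prescribed method; the `PriorPrompt` shape and the one-domain-per-prompt rule are assumptions. A batch size of 0 corresponds to the "no valuable hypotheses remain" case.

```typescript
// Hypothetical heuristic for "diversifying from prior work"; not prescribed by the flow.
interface PriorPrompt {
  id: string;
  domain: string; // domain the earlier hypothesis prompt targeted (assumed field)
}

function pickHypothesisDomains(
  allowed: string[],
  prior: PriorPrompt[],
  batchSize: number,
): string[] {
  // Count how often each allowed domain has already been explored.
  const usage = new Map<string, number>(allowed.map((d): [string, number] => [d, 0]));
  for (const p of prior) {
    const seen = usage.get(p.domain);
    if (seen !== undefined) usage.set(p.domain, seen + 1); // ignore domains outside the spec
  }
  // Least-explored first; Array.prototype.sort is stable, so ties keep spec order.
  const ranked = [...allowed].sort((a, b) => (usage.get(a) ?? 0) - (usage.get(b) ?? 0));
  // One domain per prompt keeps the batch non-overlapping (assumption);
  // a batch size of 0 means nothing valuable remains.
  return ranked.slice(0, Math.max(0, batchSize));
}
```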

The hypothesis planner replaces both `emergent.yaml` and `EMERGENT_REFINEMENT_EXECUTION.md`. The key difference from the current emergent agent: it creates prompts but does NOT execute them. Execution is always done by executor agents.

### 8. Open Questions Flow Through the System

For milestone specs:
- Spec open questions → planner interviews engineer → definitive decisions or disposable variants → documented in alignment doc as "Engineer Decisions"
- Concerns → specific de-risk prompts → documented in alignment doc

For exploratory specs:
- Spec open questions → planner presents to engineer → answered questions narrow scope; skipped questions stay open and are documented in alignment doc as "Unresolved Questions"
- Concerns/limitations → documented in alignment doc as hypothesis formation context
- The hypothesis planner reads "Unresolved Questions" and creates experiments to test different answers
- Each experiment's summary feeds back into the alignment doc, narrowing the solution space for the next hypothesis planner invocation

This creates a natural convergence: exploratory specs start with many unknowns and converge toward solutions through iterative hypothesis testing. Milestone specs start with resolved decisions and diverge into comprehensive implementation.
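
The routing in this section can be sketched as a pure function over the spec's open questions. Only the alignment-doc section names "Engineer Decisions" and "Unresolved Questions" come from this spec; the types, the `scopeNarrowing` bucket for answered exploratory questions, and the string formats are hypothetical.

```typescript
// Hypothetical sketch; only the two alignment-doc section names come from this spec.
type SpecKind = "milestone" | "exploratory";

interface OpenQuestion {
  text: string;
  answer?: string; // undefined when the engineer skipped the question
}

interface QuestionRouting {
  engineerDecisions: string[]; // milestone: "Engineer Decisions"
  variantPrompts: string[]; // milestone: skipped -> disposable variants behind flags
  scopeNarrowing: string[]; // exploratory: answered questions narrow scope
  unresolvedQuestions: string[]; // exploratory: "Unresolved Questions" for the hypothesis planner
}

function routeOpenQuestions(kind: SpecKind, questions: OpenQuestion[]): QuestionRouting {
  const routing: QuestionRouting = {
    engineerDecisions: [],
    variantPrompts: [],
    scopeNarrowing: [],
    unresolvedQuestions: [],
  };
  for (const q of questions) {
    const answered = q.answer !== undefined;
    if (kind === "milestone") {
      // Both definitive answers and variant fallbacks are recorded as decisions.
      routing.engineerDecisions.push(`${q.text}: ${answered ? q.answer : "disposable variants"}`);
      if (!answered) routing.variantPrompts.push(q.text);
    } else if (answered) {
      routing.scopeNarrowing.push(`${q.text}: ${q.answer}`);
    } else {
      routing.unresolvedQuestions.push(q.text);
    }
  }
  return routing;
}
```

The same input produces opposite shapes for the two kinds: milestone specs close every question before execution, while exploratory specs carry skipped questions forward as experiment seeds.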
160
+
161
+ ### 9. Lightweight Scoping Flows for Exploratory Spec Types
162
+
163
+ Create scoping flows for each non-milestone spec type. Each is ~40 lines, lighter than `IDEATION_SESSION.md`, and all follow a consistent pattern:
164
+
165
+ 1. Ask type-specific questions (3-5 questions max)
166
+ 2. Spawn 1-2 targeted codebase grounding tasks
167
+ 3. Write spec with `type` set and `hypothesis_domains` selected
168
+ 4. Persist via `CREATE_SPEC.md`
169
+
170
+ Each scoping flow elicits different information:
171
+
172
+ | Type | Key Questions |
173
+ |------|--------------|
174
+ | Investigation | What's broken? What evidence? What does fixed look like? Constraints? |
175
+ | Optimization | What's slow/expensive? What are the targets? How to measure? Constraints? |
176
+ | Refactor | What's the scope? What invariants must be preserved? What's the target architecture? |
177
+ | Documentation | What areas need coverage? What audience? Any existing docs to extend? |
178
+ | Triage | Which external sources? What time range? What severity threshold? |
179
+
180
+ The triage scoping flow is distinct in that it reads from external sources (PostHog, Sentry, etc.) rather than interviewing the user. It compiles findings into a structured report, the user selects which issues to address, and those become the spec content.
181
+
182
+ ## Non-Goals
183
+
184
+ - **Changing the executor agent or PROMPT_TASK_EXECUTION flow** — Executors are unchanged. They pick up prompts and execute them regardless of how those prompts were created (planner, hypothesis planner, or coordinator patch).
185
+ - **Changing the spec schema body sections** — Motivation, Goals, Non-Goals, Open Questions, Technical Considerations work for all spec types. The body content varies in depth, not structure.
186
+ - **Automated spec type detection** — The user selects the type via the New Initiative modal. No inference from branch name or content.
187
+ - **Removing the coordinator agent** — The coordinator remains for mid-workflow intervention (quick patches, prompt edits, kill/restart). Its flow may need minor updates to work without the emergent concept.
188
+ - **Changing the planner's research or jury capabilities** — The planner's subtask spawning for codebase/external research is unchanged. The jury is skipped for exploratory specs but the capability remains.
189
+ - **Parallel hypothesis planner invocations** — Only one hypothesis planner runs at a time. Parallelism is at the executor level (multiple executors running different prompts concurrently).
190
+ - **CI/CD integration for new workflow types** — This milestone establishes the orchestration model. CI/CD pipeline changes for non-milestone branches are downstream.
191
+
192
+ ## Open Questions
193
+
194
+ - **Hypothesis planner termination** — When the hypothesis planner determines nothing valuable remains, it creates 0 prompts and the loop idles. Should there be an explicit signal to the TUI (e.g., "Hypothesis planner found no more work")? Or is it sufficient for the loop to simply idle with no activity?
195
+ - **Status.yaml `stage` field** — Currently `stage: 'planning' | 'executing' | 'reviewing' | 'pr' | 'compound'`. Does this need updating for exploratory workflows where the stages blend (planning and executing overlap via hypothesis-driven discovery)?
196
+ - **Branch prefix conventions** — Milestone specs use `feature/{name}`. Should exploratory types have their own prefixes (`fix/`, `optimize/`, `refactor/`, `docs/`, `triage/`) or all use a generic prefix?
197
+ - **Seed prompt type field** — Should prompts created by the hypothesis planner use `type: hypothesis` in frontmatter (vs `type: planned` from the planner, `type: user-patch` from coordinator)? Or is the distinction unnecessary since executors treat all prompts identically?
198
+ - **Scoping flow reuse** — The investigation, optimization, refactor, and documentation scoping flows share a pattern (ask questions, ground in codebase, write spec). Should there be one parameterized scoping flow or separate flows per type? Separate flows are more maintainable per **Context is Precious** (each is small and self-contained) but create more files.
199
+ - **Triage external source integration** — The triage scoping flow needs to read from PostHog, Sentry, or similar tools. Should this be via MCP servers, direct API calls via `ah` commands, or agent-driven web fetching? The integration approach affects how easily new external sources can be added.
200
+ - **Compounding flow updates** — The current `COMPOUNDING.md` flow is milestone-oriented (spec finalization, memory extraction). Does it need adjustments for exploratory specs where the "completion" criteria are different (problem solved vs. spec acceptance criteria met)?
201
+ - **Alignment doc schema for exploratory types** — The current alignment doc schema has Overview, Hard User Requirements, Engineer Decisions, Prompt Summaries. Exploratory workflows need: Problem Statement, Evidence, Unresolved Questions, Hypothesis Results, Prompt Summaries. Should the alignment schema be updated with a type-dependent structure, or should the planner simply write different content into the same sections?
202
+ - **TUI "New Initiative" modal data source** — Should the list of available spec types be hardcoded in the TUI, driven by the spec schema enum, or configurable in settings.json? Schema-driven is self-documenting but requires schema parsing. Settings-driven is flexible but another config surface.
203
+ - **Custom Flow action scope** — With the New Initiative modal covering spec creation and the planner handling prompt generation, does the Custom Flow action (`[-]`) need to change? Currently it spawns any agent with a custom message. This remains useful for ad-hoc agent invocations outside the standard pipeline.
204
+
205
+ ## Technical Considerations
206
+
207
+ - **Event loop change is minimal** — The core change in `checkPromptLoop()` is replacing the two-path logic (executor spawn vs. emergent spawn) with one path that either spawns an executor (prompts exist) or spawns the hypothesis planner (no prompts, nothing in progress). The `emergentEnabled` state, emergent window prefix checks, and emergent spawn callbacks are removed. Net code reduction.
208
+ - **Agent profile for hypothesis planner** — The `hypothesis-planner.yaml` profile needs: `prompt_scoped: false` (one instance), `non_coding: true` (only creates prompt files, doesn't implement), template vars for alignment path, prompts folder, and hypothesis domains.
209
+ - **Hypothesis planner overlap prevention** — Since the hypothesis planner creates all prompts for a "round" at once, they are inherently non-overlapping. No need for distributed locking or early prompt file claims. The planner coordinates internally.
210
+ - **Backwards compatibility with existing specs** — Existing specs lack `type` and `hypothesis_domains` fields. The system should treat missing `type` as `milestone` and missing `hypothesis_domains` as falling back to `settings.json` defaults. No migration needed for existing specs.
211
+ - **`pickNextPrompt()` unchanged** — The prompt picker algorithm already works generically: find pending prompts with satisfied dependencies, exclude active ones, return lowest number. It doesn't know or care about prompt origin (planned, hypothesis, user-patch). No changes needed.
212
+ - **TUI action simplification** — Removing all `hidden`/`disabled` conditions from `buildActionItems()` simplifies the function significantly. The `ToggleState` interface loses `emergentEnabled`, `hasSpec`, `hasCompletedPrompts`, `compoundRun`, `prReviewUnlocked`. Only `loopEnabled`, `parallelEnabled`, and `prActionState` remain.
213
+ - **Planner flow branching** — The spec type branch in `SPEC_PLANNING.md` should be concise per **Frontier Models are Capable**. ~10 lines of flow guidance explaining "read spec type, if milestone do deep planning, otherwise do light planning." The model deduces the appropriate depth.
214
+ - **Documentation flow consolidation** — The current `DOCUMENTATION.md` flow uses a two-layer delegation pattern (discovery agents → writer agents). With the hypothesis planner model, documentation becomes: hypothesis planner identifies uncovered areas → creates prompts for each area → executors write documentation. This replaces the custom documentation orchestration with the standard loop, eliminating the `documentor.yaml` agent and `DOCUMENTATION.md` flow.
215
+ - **Settings.json `emergent` section** — The `emergent.hypothesisDomains` field in settings.json remains as the global default menu of available domains. The `emergent` key could be renamed to `hypothesis` for clarity, but this is a cosmetic change.
216
+ - **Spec `type` determines branch prefix** — The `CREATE_SPEC.md` flow (or `ah specs persist`) should derive branch prefix from spec type: milestone → `feature/`, investigation → `fix/`, optimization → `optimize/`, refactor → `refactor/`, documentation → `docs/`, triage → `triage/`. This is a convention, not a hard constraint — the `branch` field on the spec is always the source of truth.
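The single-path loop and the prompt picker described in the considerations above fit together roughly as sketched here. This is an illustration under assumptions, not the harness's code: `PromptFile`, its `status` values, `decideLoopAction`, and the `active` set of prompts with running executors are hypothetical names. Only the picker rule (pending, dependencies satisfied, not active, lowest number) and the executor-or-planner decision come from the text.

```typescript
// Hypothetical prompt shape; the real frontmatter fields may differ.
type PromptStatus = "pending" | "in_progress" | "done";

interface PromptFile {
  number: number;
  status: PromptStatus;
  dependencies: number[];
}

/** Lowest-numbered pending prompt whose dependencies are done and that is not already active. */
function pickNextPrompt(prompts: PromptFile[], active: Set<number>): PromptFile | null {
  const done = new Set(prompts.filter((p) => p.status === "done").map((p) => p.number));
  const ready = prompts
    .filter((p) => p.status === "pending")
    .filter((p) => !active.has(p.number))
    .filter((p) => p.dependencies.every((dep) => done.has(dep)))
    .sort((a, b) => a.number - b.number);
  return ready[0] ?? null;
}

type LoopDecision =
  | { kind: "spawn-executor"; prompt: number }
  | { kind: "spawn-hypothesis-planner" }
  | { kind: "wait" };

/** One code path: run an executor when work is ready, the planner only when fully idle. */
function decideLoopAction(prompts: PromptFile[], active: Set<number>): LoopDecision {
  const next = pickNextPrompt(prompts, active);
  if (next) return { kind: "spawn-executor", prompt: next.number };

  const anyInProgress = active.size > 0 || prompts.some((p) => p.status === "in_progress");
  const anyPending = prompts.some((p) => p.status === "pending");

  // "No prompts, nothing in progress": only then is new work planned.
  if (!anyInProgress && !anyPending) return { kind: "spawn-hypothesis-planner" };

  // Pending prompts blocked on running work: let the executors finish.
  return { kind: "wait" };
}
```

Because the planner branch is reached only when `pickNextPrompt()` returns nothing and no work is running or pending, there is no separate emergent toggle to check.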
217
+
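The backwards-compatibility defaults and the branch prefix convention can be sketched as one resolution step. This assumes spec frontmatter has already been parsed into an object; `RawSpecFrontmatter`, `ResolvedSpec`, and `resolveSpec` are hypothetical names, while the six type values, the `milestone` default, the settings fallback for `hypothesis_domains`, and the prefix table come from the considerations above.

```typescript
type SpecType = "milestone" | "investigation" | "optimization" | "refactor" | "documentation" | "triage";

// Prefix convention from the spec; the explicit `branch` field always wins.
const BRANCH_PREFIX: Record<SpecType, string> = {
  milestone: "feature/",
  investigation: "fix/",
  optimization: "optimize/",
  refactor: "refactor/",
  documentation: "docs/",
  triage: "triage/",
};

// Hypothetical parsed frontmatter; older specs omit `type` and `hypothesis_domains`.
interface RawSpecFrontmatter {
  name: string;
  type?: SpecType;
  hypothesis_domains?: string[];
  branch?: string;
}

interface ResolvedSpec {
  name: string;
  type: SpecType;
  hypothesisDomains: string[];
  branch: string;
}

/** Apply the no-migration defaults: missing type means milestone, missing domains fall back to settings. */
function resolveSpec(raw: RawSpecFrontmatter, settingsDefaultDomains: string[]): ResolvedSpec {
  const type: SpecType = raw.type ?? "milestone";
  return {
    name: raw.name,
    type,
    hypothesisDomains: raw.hypothesis_domains ?? settingsDefaultDomains,
    // Convention only: derive a branch when the spec does not name one.
    branch: raw.branch ?? `${BRANCH_PREFIX[type]}${raw.name}`,
  };
}
```

Typing the table as `Record<SpecType, string>` makes the compiler flag a missing prefix if a seventh spec type is ever added.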
218
+ ## Implementation Reality
219
+
220
+ ### What Was Implemented vs Planned
221
+
222
+ All 9 spec goals were implemented across 17 prompts (8 planned, 4 emergent, 2 user-patch, 3 review-fix):
223
+
224
+ 1. **Two-agent model** — Implemented as designed: hypothesis planner (plan only) + executor (unchanged). Per engineer decision, the agent file keeps the canonical `emergent` naming (`emergent.yaml`) rather than `hypothesis-planner.yaml`.
225
+ 2. **Unified event loop** — Single code path in `checkPromptLoop()`. All emergent toggle/state machinery removed.
226
+ 3. **Spec type field** — 6-value enum added. `hypothesis_domains` deferred as spec field (jury-approved scope reduction) — settings.json global defaults only.
227
+ 4. **New Initiative TUI action** — Implemented via `flowOverride` on existing `ideation` agent profile rather than separate profiles per type. `SCOPING_FLOW_MAP` exported as `Record<SpecType, string | null>`.
228
+ 5. **Planner type-aware behavior** — `SPEC_PLANNING.md` branches milestone (deep) vs exploratory (lightweight) via table format.
229
+ 6. **Always-available TUI actions** — All `hidden`/`disabled` conditions removed. Two toggles: Loop, Parallel.
230
+ 7. **Hypothesis planner agent + flow** — Created with `prompt_scoped: false`, `non_coding: true`. Always produces at least 1 prompt (engineer decision).
231
+ 8. **Scoping flows** — 5 separate flows created (~25 lines each). Triage is a stub with manual fallback (deferred).
232
+ 9. **CREATE_SPEC + Compounding + Pillars** — Branch prefix convention, type-conditional compounding, pillars 1/8/9 updated.
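Routing the New Initiative action through `flowOverride` on the single `ideation` profile (goal 4 above) might look like the following sketch. The map's type matches the documented `Record<SpecType, string | null>`, but the flow paths, the meaning of `null` (assumed here to be "no override, use the profile's default flow"), `SpawnRequest`, and `buildInitiativeSpawn` are all hypothetical; the real `spawnAgentFromProfile()` signature is not shown in the spec.

```typescript
type SpecType = "milestone" | "investigation" | "optimization" | "refactor" | "documentation" | "triage";

// Hypothetical flow paths. Assumption: null = keep the ideation profile's default flow.
const SCOPING_FLOW_MAP: Record<SpecType, string | null> = {
  milestone: null,
  investigation: "flows/scoping/INVESTIGATION.md",
  optimization: "flows/scoping/OPTIMIZATION.md",
  refactor: "flows/scoping/REFACTOR.md",
  documentation: "flows/scoping/DOCUMENTATION.md",
  triage: "flows/scoping/TRIAGE.md",
};

// Hypothetical spawn request; stands in for the arguments to spawnAgentFromProfile().
interface SpawnRequest {
  profile: string;
  flowOverride?: string;
}

/** One agent profile for every spec type; only the flow changes. */
function buildInitiativeSpawn(type: SpecType): SpawnRequest {
  const flow = SCOPING_FLOW_MAP[type];
  return flow === null ? { profile: "ideation" } : { profile: "ideation", flowOverride: flow };
}
```

Keeping the routing in a data table rather than in per-type agent profiles means adding a spec type is one map entry plus one flow file.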
233
+
234
+ ### How Engineer Desires Evolved
235
+
236
+ - **Hypothesis planner termination**: Spec proposed 0-prompt termination → engineer decided "always produce work" with progressive tangentiality. Engineer controls termination via loop toggle.
237
+ - **Emergent naming**: Spec proposed renaming to `hypothesis-planner.yaml` → engineer declined, "emergent" is canonical agent identity. `hp-` prefixed variables renamed to `emergent-` via PR review fix.
238
+ - **documentor.yaml deletion**: Prompt 07 deleted it as part of documentation consolidation → engineer restored it via user-patch 17 because the compound TUI action requires both profiles.
239
+ - **Uncommitted changes guards**: Not in original spec → engineer added via user-patch 11 after TUI overhaul exposed PR actions without safety guards.
240
+ - **Exponential backoff**: Not in original spec → emergent prompt 10 added spawn resilience. Engineer kept it.
241
+ - **Test coverage**: Not explicitly planned → 4 emergent prompts (09, 10, 15, 16) added 62 tests covering event loop decisions, backoff, spec type parsing, and initiative routing.
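The spawn resilience added by emergent prompt 10 is described only as exponential backoff. A generic sketch of that technique follows; the base delay, cap, attempt limit, and the names `backoffDelayMs` and `spawnWithBackoff` are assumptions for illustration, not the values or functions in the harness.

```typescript
/** Delay before retry `attempt` (0-based): base * 2^attempt, capped at maxMs. */
function backoffDelayMs(attempt: number, baseMs = 1000, maxMs = 30_000): number {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

/**
 * Call `spawn` until it succeeds or `maxAttempts` is reached, sleeping with
 * exponential backoff between failures. Rethrows the last error on give-up.
 * `sleep` is injectable so tests can skip real waiting.
 */
async function spawnWithBackoff<T>(
  spawn: () => Promise<T>,
  maxAttempts = 5,
  sleep: (ms: number) => Promise<void> = (ms) => new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await spawn();
    } catch (err) {
      lastError = err;
      // No sleep after the final failed attempt.
      if (attempt < maxAttempts - 1) await sleep(backoffDelayMs(attempt));
    }
  }
  throw lastError;
}
```

The cap keeps a persistently failing spawn from pushing retries out indefinitely, while the doubling avoids hammering a tmux session or API that is briefly unavailable.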
242
+
243
+ ### Key Technical Decisions
244
+
245
+ - `flowOverride` parameter on `spawnAgentFromProfile()` enables routing without per-type agent profiles
246
+ - `SPEC_TYPE` template variable separate from `WORKFLOW_TYPE` (different concepts)
247
+ - `type: emergent` preserved on hypothesis planner prompts (describes work type, not agent type)
248
+ - `EmergentSettings` interface added to `ProjectSettings` for typed settings.json access
249
+ - `WORKFLOW_TYPE` template variable removed entirely (no references remained)
250
+ - `confirmProceedWithUncommittedChanges()` extracted as shared helper across 3 TUI handlers
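A shared guard like `confirmProceedWithUncommittedChanges()` could be built on the output of `git status --porcelain`, where a clean tree prints nothing and each change is one `XY path` line. The sketch below is hypothetical: the real helper's signature is not given, so this version takes the porcelain text and a `confirm` callback as parameters, and `changedPaths` is an illustrative name.

```typescript
/** Paths from `git status --porcelain` (v1): each line is two status chars, a space, then the path. */
function changedPaths(porcelain: string): string[] {
  return porcelain
    .split("\n")
    .filter((line) => line.trim().length > 0)
    .map((line) => line.slice(3));
}

/**
 * Hypothetical shape of the shared TUI guard: proceed silently on a clean
 * tree, otherwise ask the user via the injected `confirm` prompt.
 */
async function confirmProceedWithUncommittedChanges(
  porcelain: string,
  confirm: (message: string) => Promise<boolean>,
): Promise<boolean> {
  const paths = changedPaths(porcelain);
  if (paths.length === 0) return true; // clean working tree: nothing to guard
  return confirm(`Found ${paths.length} uncommitted change(s). Proceed anyway?`);
}
```

Injecting `confirm` keeps the three TUI handlers free to use their own dialog while sharing the detection and messaging logic.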
@@ -0,0 +1,98 @@
1
+ ---
2
+ name: validation-tooling-practice
3
+ domain_name: validation
4
+ status: completed
5
+ dependencies: []
6
+ branch: feature/validation-tooling-practice
7
+ ---
8
+
9
+ # Validation Tooling Practice
10
+
11
+ ## Motivation
12
+
13
+ The harness's Pillar 10 (**Agentic Validation Tooling**) exists to decouple engineers from the implementation loop — engineers ideate and set expectations, agents implement and validate, engineers return for quality control. This only works if validation tooling is robust, well-categorized, and well-understood by agents.
14
+
15
+ Today, the harness has almost no validation tooling infrastructure:
16
+ - One validation suite (`typescript-typecheck.md`) that doesn't meet the refined suite existence threshold — it has no stochastic dimension and is already enforced by hooks
17
+ - A validation-suite schema that models validation as a flat list of commands with no distinction between exploratory agent-driven validation and deterministic CI/CD-integrated validation
18
+ - Flows that reference validation suites but don't distinguish how agents should use them (stochastically during implementation vs. deterministically for acceptance criteria)
19
+ - No concrete suite demonstrating the model end-to-end
20
+
21
+ Engineer desires a two-dimensional validation model where every suite covers the same domain across stochastic validation (agent-driven exploratory testing using model intuition) and deterministic integration (binary pass/fail gating for CI/CD). These dimensions follow a crystallization lifecycle: stochastic exploration discovers patterns, patterns crystallize into deterministic checks, deterministic checks get entrenched in CI/CD, and stochastic exploration shifts to the frontier.
22
+
23
+ ## Goals
24
+
25
+ 1. **Establish the validation tooling practice model** — Define the stochastic/deterministic two-dimension taxonomy, the crystallization lifecycle, and the suite existence threshold as the foundational mental model for all validation in the harness
26
+
27
+ 2. **Update the validation-suite schema** — Replace the current body sections (`Purpose`, `When to Use`, `Validation Commands`, `Interpreting Results`, `CICD Integration`) with the refined structure: `Purpose`, `Tooling`, `Stochastic Validation`, `Deterministic Integration`, conditional `ENV Configuration`. Add `tools` frontmatter field (string array, required)
28
+
29
+ 3. **Update harness flows** — Align the following flows with the two-dimensional model:
30
+ - `UTILIZE_VALIDATION_TOOLING.md` — Reference stochastic/deterministic dimensions when matching suites to acceptance criteria
31
+ - `CREATE_VALIDATION_TOOLING_SPEC.md` — New suites must articulate their stochastic dimension to justify existence per the suite threshold
32
+ - `PROMPT_TASK_EXECUTION.md` — Distinguish stochastic exploration (during implementation) from deterministic validation (for acceptance criteria)
33
+ - `E2E_TEST_PLAN_BUILDING.md` — Align categorization with suite taxonomy
34
+ - `COMPOUNDING.md` — Track the crystallization lifecycle: which stochastic patterns should be engrained deterministically
35
+
36
+ 4. **Update Principle #6 in `principles.md`** — Enrich "Agentic Validation Tooling" with the stochastic/deterministic dimension distinction, the crystallization lifecycle concept, and the suite existence threshold
37
+
38
+ 5. **Delete `typescript-typecheck.md`** — Does not meet the suite existence threshold (no stochastic dimension). Type checking is already enforced by the validation hooks (tsc diagnostics on every write) and is intuitive to frontier models
39
+
40
+ 6. **Create `supabase-database.md` as the first real validation suite** — Following the new schema, covering database migration validation with Supabase branching. Includes stochastic playbook (migration + connected services exploration, rollback behavior, production-grade data stress, concurrent access), deterministic integration (migration scripts, schema diff assertions, CI/CD pipeline guidance), and ENV configuration for preview database connection swapping
41
+
42
+ 7. **Define where deterministic-only tools go** — Engineer desires these are simply not suites. They are test commands referenced directly in acceptance criteria and CI/CD pipelines. No new schema type or directory needed — the suite abstraction is reserved for domains with meaningful stochastic dimensions
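The refined body structure from goal 2 can be made concrete with a small illustrative check that reports which required sections a suite file is missing. It assumes the sections are written as `## ` headings and treats `ENV Configuration` as optional because it is conditional; `REQUIRED_SECTIONS` and `missingSections` are hypothetical, and this is not the harness hook, which (per Implementation Reality) checks frontmatter only.

```typescript
// Assumption: each body section is a level-2 Markdown heading with exactly this text.
// "ENV Configuration" is conditional, so it is deliberately not required here.
const REQUIRED_SECTIONS = ["Purpose", "Tooling", "Stochastic Validation", "Deterministic Integration"] as const;

/** Return the required section names that have no matching `## ` heading in the body. */
function missingSections(body: string): string[] {
  const headings = new Set(
    body
      .split("\n")
      .map((line) => line.match(/^##\s+(.+?)\s*$/))
      .filter((m): m is RegExpMatchArray => m !== null)
      .map((m) => m[1]),
  );
  return REQUIRED_SECTIONS.filter((section) => !headings.has(section));
}
```

The regex requires whitespace right after `##`, so deeper headings such as `### Tooling` are not mistaken for section headings.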
43
+
44
+ ## Non-Goals
45
+
46
+ - **CI/CD pipeline implementation** — This milestone defines the practice model and creates the first suite. Actually building GitHub Actions workflows for preview database provisioning, migration-on-push, and deterministic gating is downstream work
47
+ - **Playwright suite creation** — The existing `validation-playwright.spec.md` will be superseded by this milestone's practices. A future milestone will create a Playwright suite following the new schema
48
+ - **Automated suite matching** — No `ah validation-tools match <file>` command. Suite matching remains agent-driven via glob patterns and semantic inference
49
+ - **Cross-suite ENV configuration** — ENV configuration is suite-specific by design. No shared ENV doc or cross-suite ENV management
50
+
51
+ ## Open Questions
52
+
53
+ - **Supabase CLI setup prerequisites**: What Supabase project configuration is needed before the suite's stochastic playbook can be executed? Architect should research the minimum viable Supabase branching setup and document in the suite's Tooling section
54
+ - **Preview database data seeding**: Engineer desires production-grade data for migration testing. Architect should determine the recommended approach for seeding preview databases — Supabase snapshot restore, pg_dump/pg_restore, or synthetic seed scripts — and document tradeoffs in the stochastic playbook
55
+ - **Hook validation updates**: The validation hooks in `validation.ts` enforce the current `validation-suite` schema via pattern matching. Architect should determine whether schema enforcement needs code changes to support the new required sections, or whether the YAML schema update is sufficient since enforcement uses the schema file as source of truth
56
+
57
+ ## Technical Considerations
58
+
59
+ - **Schema is source of truth**: The `validation-suite.yaml` schema file is loaded by `loadSchema()` in `validation.ts` and enforced on every write to `.allhands/validation/*.md`. Changing the schema file should propagate enforcement automatically, but the section validation logic in `validateFrontmatter()` may need review
60
+ - **`ah validation-tools list` command**: The `listValidationSuites()` function in `validation-tools.ts` reads frontmatter fields `name`, `description`, `globs`. Adding the `tools` field means this command should also surface `tools` in its output for richer discovery
61
+ - **Existing flow references**: Multiple flows reference validation suites. The updated suite schema changes body section names, which means any flow that instructs agents to "read the Validation Commands section" or "check the CICD Integration section" needs updating to reference the new section names
62
+ - **`validation-playwright.spec.md` supersession**: This spec exists at `specs/roadmap/validation-playwright.spec.md`. It was written under the old schema model. Future work on Playwright validation should follow the practices established by this milestone. The existing spec should be marked as superseded or updated to reference this milestone's practices
63
+ - **Pillar 10 in `pillars.md`**: The newly created `pillars.md` describes Agentic Validation Tooling as the 10th pillar. The practices established in this milestone are the concrete realization of that pillar. No changes to `pillars.md` are needed — it already captures the two-dimensional model at the pillar level
64
+ - **Stochastic terminology**: Engineer deliberately chose "stochastic" over "heuristic" — deterministic/stochastic is an established CS pair. All flows and documentation should use this terminology consistently
65
+ - **Suite existence threshold enforcement**: The CREATE_VALIDATION_TOOLING_SPEC flow update is assumed sufficient to enforce the threshold during suite creation. No programmatic enforcement is needed — per **Frontier Models are Capable**, the flow guidance is sufficient
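Surfacing the new `tools` field in `ah validation-tools list` amounts to including it alongside `name`, `description`, and `globs`. The sketch assumes frontmatter is already parsed; `SuiteFrontmatter`, `formatSuiteList`, the text layout, and the example glob are hypothetical, since the command's actual output format is not described.

```typescript
// Hypothetical parsed frontmatter for one suite file.
interface SuiteFrontmatter {
  name: string;
  description: string;
  globs: string[];
  tools: string[];
}

/** Render one entry per suite, adding a `tools:` line next to the existing fields. */
function formatSuiteList(suites: SuiteFrontmatter[]): string {
  return suites
    .map((s) =>
      [
        `${s.name}: ${s.description}`,
        `  globs: ${s.globs.join(", ")}`,
        `  tools: ${s.tools.join(", ")}`,
      ].join("\n"),
    )
    .join("\n");
}
```

Listing tools next to globs lets an agent match a suite both by the files it touches and by the tooling it already knows how to drive.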
66
+
67
+ ## Implementation Reality
68
+
69
+ ### What was actually implemented vs planned
70
+
71
+ All 7 Goals were achieved, with one significant pivot and substantial emergent work:
72
+
73
+ **Goal 6 pivoted**: Spec planned `supabase-database.md` but implementation created `browser-automation.md` instead. The engineer chose browser automation as the first suite — it demonstrated the stochastic/deterministic model more clearly with agent-browser (stochastic) and Playwright (deterministic) as distinct tools for each dimension. Open Questions about Supabase CLI setup and data seeding became moot.
74
+
75
+ **Goal 7 resolved by design**: Deterministic-only tools were defined as "not suites" through principle/flow updates. No separate schema or directory was needed.
76
+
77
+ **Open Question on hook validation**: Resolved — `validateFrontmatter()` only checks frontmatter fields, not body sections. Body section definitions in the schema YAML are documentation-only, not hook-enforced. Accepted per **Frontier Models are Capable**.
78
+
79
+ ### How engineer desires evolved
80
+
81
+ 1. **Documentation philosophy reversal (Prompt 05→09)**: Engineer initially chose detailed CLI commands for the browser-automation suite, then reversed after hands-on agent-browser testing. Discovery: commands are discoverable via `--help`, so the suite's value lies in teaching agents HOW TO THINK about using a tool. This spawned Prompt 08 (suite creation flow refinement with documentation principles) before Prompt 09 applied it.
82
+
83
+ 2. **Pillar terminology override**: Engineer overrode spec instruction to "leave pillars.md as-is" and updated "heuristic" → "stochastic" for cross-document consistency.
84
+
85
+ 3. **validation-playwright.spec.md**: Engineer deleted it directly rather than marking it superseded through the review process.
86
+
87
+ ### Emergent work (all kept)
88
+
89
+ - **147 new tests** across 3 emergent testing prompts (06, 07, 10) for previously untested schema validation infrastructure
90
+ - **Suite creation flow refinement** (Prompt 08) — Tool Validation phase, documentation principles, evidence capture guidance, 6-subsection stochastic structure
91
+ - **Dual validation path consolidation** — Emergent testing (Prompt 10) documented 4 divergences between `hooks/validation.ts` and `lib/schema.ts`, enabling Jury Review to consolidate both paths into single-source-of-truth delegation
92
+
93
+ ### Key technical decisions
94
+
95
+ - **Schema enforcement is frontmatter-only**: Body section validation is documentation-only in schema YAML, not hook-enforced. Per **Frontier Models are Capable**, flow guidance suffices.
96
+ - **Triple extractFrontmatter consolidation**: `hooks/validation.ts`, `commands/validation-tools.ts`, and `lib/schema.ts` all had independent implementations. Consolidated to lib as single source of truth.
97
+ - **Array item-type validation**: Added to both validation paths — `tools: [123]` now rejected when `items: string` specified in schema.
98
+ - **blockTool format mismatch**: Discovered `blockTool()` outputs `{ decision: 'block' }` while hook-runner expects `{ continue: false }`. Documented as known harness inconsistency.
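The array item-type rule above (`tools: [123]` rejected when the schema says `items: string`) can be sketched as a per-field check. `FieldSchema` and `validateField` are hypothetical stand-ins; the real schema YAML shape and error messages in `lib/schema.ts` are not shown here.

```typescript
// Hypothetical field schema, loosely mirroring `type` / `items` / `required` in a schema YAML.
type FieldSchema =
  | { type: "string"; required?: boolean }
  | { type: "array"; items: "string" | "number"; required?: boolean };

/** Validate one frontmatter field; returns a list of error messages (empty means valid). */
function validateField(name: string, value: unknown, schema: FieldSchema): string[] {
  if (value === undefined) {
    return schema.required ? [`${name}: required field is missing`] : [];
  }
  if (schema.type === "string") {
    return typeof value === "string" ? [] : [`${name}: expected string`];
  }
  // From here `schema` is narrowed to the array variant.
  if (!Array.isArray(value)) return [`${name}: expected array`];
  const errors: string[] = [];
  value.forEach((item, i) => {
    // The item-type check: every element must match the declared `items` type.
    if (typeof item !== schema.items) {
      errors.push(`${name}[${i}]: expected ${schema.items}, got ${typeof item}`);
    }
  });
  return errors;
}
```

Reporting the index of each bad element makes a mixed array such as `["playwright", 123]` point straight at the offending entry.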