autonomous-coding-toolkit 1.0.0

Files changed (324)
  1. package/.claude-plugin/marketplace.json +22 -0
  2. package/.claude-plugin/plugin.json +13 -0
  3. package/LICENSE +21 -0
  4. package/Makefile +21 -0
  5. package/README.md +140 -0
  6. package/SECURITY.md +28 -0
  7. package/agents/bash-expert.md +113 -0
  8. package/agents/dependency-auditor.md +138 -0
  9. package/agents/integration-tester.md +120 -0
  10. package/agents/lesson-scanner.md +149 -0
  11. package/agents/python-expert.md +179 -0
  12. package/agents/service-monitor.md +141 -0
  13. package/agents/shell-expert.md +147 -0
  14. package/benchmarks/runner.sh +147 -0
  15. package/benchmarks/tasks/01-rest-endpoint/rubric.sh +29 -0
  16. package/benchmarks/tasks/01-rest-endpoint/task.md +17 -0
  17. package/benchmarks/tasks/02-refactor-module/task.md +8 -0
  18. package/benchmarks/tasks/03-fix-integration-bug/task.md +8 -0
  19. package/benchmarks/tasks/04-add-test-coverage/task.md +8 -0
  20. package/benchmarks/tasks/05-multi-file-feature/task.md +8 -0
  21. package/bin/act.js +238 -0
  22. package/commands/autocode.md +6 -0
  23. package/commands/cancel-ralph.md +18 -0
  24. package/commands/code-factory.md +53 -0
  25. package/commands/create-prd.md +55 -0
  26. package/commands/ralph-loop.md +18 -0
  27. package/commands/run-plan.md +117 -0
  28. package/commands/submit-lesson.md +122 -0
  29. package/docs/ARCHITECTURE.md +630 -0
  30. package/docs/CONTRIBUTING.md +125 -0
  31. package/docs/lessons/0001-bare-exception-swallowing.md +34 -0
  32. package/docs/lessons/0002-async-def-without-await.md +28 -0
  33. package/docs/lessons/0003-create-task-without-callback.md +28 -0
  34. package/docs/lessons/0004-hardcoded-test-counts.md +28 -0
  35. package/docs/lessons/0005-sqlite-without-closing.md +33 -0
  36. package/docs/lessons/0006-venv-pip-path.md +27 -0
  37. package/docs/lessons/0007-runner-state-self-rejection.md +35 -0
  38. package/docs/lessons/0008-quality-gate-blind-spot.md +33 -0
  39. package/docs/lessons/0009-parser-overcount-empty-batches.md +36 -0
  40. package/docs/lessons/0010-local-outside-function-bash.md +33 -0
  41. package/docs/lessons/0011-batch-tests-for-unimplemented-code.md +36 -0
  42. package/docs/lessons/0012-api-markdown-unescaped-chars.md +33 -0
  43. package/docs/lessons/0013-export-prefix-env-parsing.md +33 -0
  44. package/docs/lessons/0014-decorator-registry-import-side-effect.md +43 -0
  45. package/docs/lessons/0015-frontend-backend-schema-drift.md +43 -0
  46. package/docs/lessons/0016-event-driven-cold-start-seeding.md +44 -0
  47. package/docs/lessons/0017-copy-paste-logic-diverges.md +43 -0
  48. package/docs/lessons/0018-layer-passes-pipeline-broken.md +45 -0
  49. package/docs/lessons/0019-systemd-envfile-ignores-export.md +41 -0
  50. package/docs/lessons/0020-persist-state-incrementally.md +44 -0
  51. package/docs/lessons/0021-dual-axis-testing.md +48 -0
  52. package/docs/lessons/0022-jsx-factory-shadowing.md +43 -0
  53. package/docs/lessons/0023-static-analysis-spiral.md +51 -0
  54. package/docs/lessons/0024-shared-pipeline-implementation.md +55 -0
  55. package/docs/lessons/0025-defense-in-depth-all-entry-points.md +65 -0
  56. package/docs/lessons/0026-linter-no-rules-false-enforcement.md +54 -0
  57. package/docs/lessons/0027-jsx-silent-prop-drop.md +64 -0
  58. package/docs/lessons/0028-no-infrastructure-in-client-code.md +49 -0
  59. package/docs/lessons/0029-never-write-secrets-to-files.md +61 -0
  60. package/docs/lessons/0030-cache-merge-not-replace.md +62 -0
  61. package/docs/lessons/0031-verify-units-at-boundaries.md +66 -0
  62. package/docs/lessons/0032-module-lifecycle-subscribe-unsubscribe.md +89 -0
  63. package/docs/lessons/0033-async-iteration-mutable-snapshot.md +72 -0
  64. package/docs/lessons/0034-caller-missing-await-silent-discard.md +65 -0
  65. package/docs/lessons/0035-duplicate-registration-silent-overwrite.md +85 -0
  66. package/docs/lessons/0036-websocket-dirty-disconnect.md +33 -0
  67. package/docs/lessons/0037-parallel-agents-worktree-corruption.md +31 -0
  68. package/docs/lessons/0038-subscribe-no-stored-ref.md +36 -0
  69. package/docs/lessons/0039-fallback-or-default-hides-bugs.md +34 -0
  70. package/docs/lessons/0040-event-firehose-filter-first.md +36 -0
  71. package/docs/lessons/0041-ambiguous-base-dir-path-nesting.md +32 -0
  72. package/docs/lessons/0042-spec-compliance-insufficient.md +36 -0
  73. package/docs/lessons/0043-exact-count-extensible-collections.md +32 -0
  74. package/docs/lessons/0044-relative-file-deps-worktree.md +39 -0
  75. package/docs/lessons/0045-iterative-design-improvement.md +33 -0
  76. package/docs/lessons/0046-plan-assertion-math-bugs.md +38 -0
  77. package/docs/lessons/0047-pytest-single-threaded-default.md +37 -0
  78. package/docs/lessons/0048-integration-wiring-batch.md +40 -0
  79. package/docs/lessons/0049-ab-verification.md +41 -0
  80. package/docs/lessons/0050-editing-sourced-files-during-execution.md +33 -0
  81. package/docs/lessons/0051-infrastructure-fixes-cant-self-heal.md +30 -0
  82. package/docs/lessons/0052-uncommitted-changes-poison-quality-gates.md +31 -0
  83. package/docs/lessons/0053-jq-compact-flag-inconsistency.md +31 -0
  84. package/docs/lessons/0054-parser-matches-inside-code-blocks.md +30 -0
  85. package/docs/lessons/0055-agents-compensate-for-garbled-prompts.md +31 -0
  86. package/docs/lessons/0056-grep-count-exit-code-on-zero.md +42 -0
  87. package/docs/lessons/0057-new-artifacts-break-git-clean-gates.md +42 -0
  88. package/docs/lessons/0058-dead-config-keys-never-consumed.md +49 -0
  89. package/docs/lessons/0059-contract-test-shared-structures.md +53 -0
  90. package/docs/lessons/0060-set-e-silent-death-in-runners.md +53 -0
  91. package/docs/lessons/0061-context-injection-dirty-state.md +50 -0
  92. package/docs/lessons/0062-sibling-bug-neighborhood-scan.md +29 -0
  93. package/docs/lessons/0063-one-flag-two-lifetimes.md +31 -0
  94. package/docs/lessons/0064-test-passes-wrong-reason.md +31 -0
  95. package/docs/lessons/0065-pipefail-grep-count-double-output.md +39 -0
  96. package/docs/lessons/0066-local-keyword-outside-function.md +37 -0
  97. package/docs/lessons/0067-stdin-hang-non-interactive-shell.md +36 -0
  98. package/docs/lessons/0068-agent-builds-wrong-thing-correctly.md +31 -0
  99. package/docs/lessons/0069-plan-quality-dominates-execution.md +30 -0
  100. package/docs/lessons/0070-spec-echo-back-prevents-drift.md +31 -0
  101. package/docs/lessons/0071-positive-instructions-outperform-negative.md +30 -0
  102. package/docs/lessons/0072-lost-in-the-middle-context-placement.md +30 -0
  103. package/docs/lessons/0073-unscoped-lessons-cause-false-positives.md +30 -0
  104. package/docs/lessons/0074-stale-context-injection-wrong-batch.md +32 -0
  105. package/docs/lessons/0075-research-artifacts-must-persist.md +32 -0
  106. package/docs/lessons/0076-wrong-decomposition-contaminates-downstream.md +30 -0
  107. package/docs/lessons/0077-cherry-pick-merges-need-manual-resolution.md +30 -0
  108. package/docs/lessons/0078-static-review-without-live-test.md +30 -0
  109. package/docs/lessons/0079-integration-wiring-batch-required.md +32 -0
  110. package/docs/lessons/FRAMEWORK.md +161 -0
  111. package/docs/lessons/SUMMARY.md +201 -0
  112. package/docs/lessons/TEMPLATE.md +85 -0
  113. package/docs/plans/2026-02-21-code-factory-v2-design.md +204 -0
  114. package/docs/plans/2026-02-21-code-factory-v2-implementation-plan.md +2189 -0
  115. package/docs/plans/2026-02-21-code-factory-v2-phase4-design.md +537 -0
  116. package/docs/plans/2026-02-21-code-factory-v2-phase4-implementation-plan.md +2012 -0
  117. package/docs/plans/2026-02-21-hardening-pass-design.md +108 -0
  118. package/docs/plans/2026-02-21-hardening-pass-plan.md +1378 -0
  119. package/docs/plans/2026-02-21-mab-research-report.md +406 -0
  120. package/docs/plans/2026-02-21-marketplace-restructure-design.md +240 -0
  121. package/docs/plans/2026-02-21-marketplace-restructure-plan.md +832 -0
  122. package/docs/plans/2026-02-21-phase4-completion-plan.md +697 -0
  123. package/docs/plans/2026-02-21-validator-suite-design.md +148 -0
  124. package/docs/plans/2026-02-21-validator-suite-plan.md +540 -0
  125. package/docs/plans/2026-02-22-mab-research-round2.md +556 -0
  126. package/docs/plans/2026-02-22-mab-run-design.md +462 -0
  127. package/docs/plans/2026-02-22-mab-run-plan.md +2046 -0
  128. package/docs/plans/2026-02-22-operations-design-methodology-research.md +681 -0
  129. package/docs/plans/2026-02-22-research-agent-failure-taxonomy.md +532 -0
  130. package/docs/plans/2026-02-22-research-code-guideline-policies.md +886 -0
  131. package/docs/plans/2026-02-22-research-codebase-audit-refactoring.md +908 -0
  132. package/docs/plans/2026-02-22-research-coding-standards-documentation.md +541 -0
  133. package/docs/plans/2026-02-22-research-competitive-landscape.md +687 -0
  134. package/docs/plans/2026-02-22-research-comprehensive-testing.md +1076 -0
  135. package/docs/plans/2026-02-22-research-context-utilization.md +459 -0
  136. package/docs/plans/2026-02-22-research-cost-quality-tradeoff.md +548 -0
  137. package/docs/plans/2026-02-22-research-lesson-transferability.md +508 -0
  138. package/docs/plans/2026-02-22-research-multi-agent-coordination.md +312 -0
  139. package/docs/plans/2026-02-22-research-phase-integration.md +602 -0
  140. package/docs/plans/2026-02-22-research-plan-quality.md +428 -0
  141. package/docs/plans/2026-02-22-research-prompt-engineering.md +558 -0
  142. package/docs/plans/2026-02-22-research-unconventional-perspectives.md +528 -0
  143. package/docs/plans/2026-02-22-research-user-adoption.md +638 -0
  144. package/docs/plans/2026-02-22-research-verification-effectiveness.md +433 -0
  145. package/docs/plans/2026-02-23-agent-suite-design.md +299 -0
  146. package/docs/plans/2026-02-23-agent-suite-plan.md +578 -0
  147. package/docs/plans/2026-02-23-phase3-cost-infrastructure-design.md +148 -0
  148. package/docs/plans/2026-02-23-phase3-cost-infrastructure-plan.md +1062 -0
  149. package/docs/plans/2026-02-23-research-bash-expert-agent.md +543 -0
  150. package/docs/plans/2026-02-23-research-dependency-auditor-agent.md +564 -0
  151. package/docs/plans/2026-02-23-research-improving-existing-agents.md +503 -0
  152. package/docs/plans/2026-02-23-research-integration-tester-agent.md +454 -0
  153. package/docs/plans/2026-02-23-research-python-expert-agent.md +429 -0
  154. package/docs/plans/2026-02-23-research-service-monitor-agent.md +425 -0
  155. package/docs/plans/2026-02-23-research-shell-expert-agent.md +533 -0
  156. package/docs/plans/2026-02-23-roadmap-to-completion.md +530 -0
  157. package/docs/plans/2026-02-24-headless-module-split-design.md +98 -0
  158. package/docs/plans/2026-02-24-headless-module-split.md +443 -0
  159. package/docs/plans/2026-02-24-lesson-scope-metadata-design.md +228 -0
  160. package/docs/plans/2026-02-24-lesson-scope-metadata-plan.md +968 -0
  161. package/docs/plans/2026-02-24-npm-packaging-design.md +841 -0
  162. package/docs/plans/2026-02-24-npm-packaging-plan.md +1965 -0
  163. package/docs/plans/audit-findings.md +186 -0
  164. package/docs/telegram-notification-format.md +98 -0
  165. package/examples/example-plan.md +51 -0
  166. package/examples/example-prd.json +72 -0
  167. package/examples/example-roadmap.md +33 -0
  168. package/examples/quickstart-plan.md +63 -0
  169. package/hooks/hooks.json +26 -0
  170. package/hooks/setup-symlinks.sh +48 -0
  171. package/hooks/stop-hook.sh +135 -0
  172. package/package.json +47 -0
  173. package/policies/bash.md +71 -0
  174. package/policies/python.md +71 -0
  175. package/policies/testing.md +61 -0
  176. package/policies/universal.md +60 -0
  177. package/scripts/analyze-report.sh +97 -0
  178. package/scripts/architecture-map.sh +145 -0
  179. package/scripts/auto-compound.sh +273 -0
  180. package/scripts/batch-audit.sh +42 -0
  181. package/scripts/batch-test.sh +101 -0
  182. package/scripts/entropy-audit.sh +221 -0
  183. package/scripts/failure-digest.sh +51 -0
  184. package/scripts/generate-ast-rules.sh +96 -0
  185. package/scripts/init.sh +112 -0
  186. package/scripts/lesson-check.sh +428 -0
  187. package/scripts/lib/common.sh +61 -0
  188. package/scripts/lib/cost-tracking.sh +153 -0
  189. package/scripts/lib/ollama.sh +60 -0
  190. package/scripts/lib/progress-writer.sh +128 -0
  191. package/scripts/lib/run-plan-context.sh +215 -0
  192. package/scripts/lib/run-plan-echo-back.sh +231 -0
  193. package/scripts/lib/run-plan-headless.sh +396 -0
  194. package/scripts/lib/run-plan-notify.sh +57 -0
  195. package/scripts/lib/run-plan-parser.sh +81 -0
  196. package/scripts/lib/run-plan-prompt.sh +215 -0
  197. package/scripts/lib/run-plan-quality-gate.sh +132 -0
  198. package/scripts/lib/run-plan-routing.sh +315 -0
  199. package/scripts/lib/run-plan-sampling.sh +170 -0
  200. package/scripts/lib/run-plan-scoring.sh +146 -0
  201. package/scripts/lib/run-plan-state.sh +142 -0
  202. package/scripts/lib/run-plan-team.sh +199 -0
  203. package/scripts/lib/telegram.sh +54 -0
  204. package/scripts/lib/thompson-sampling.sh +176 -0
  205. package/scripts/license-check.sh +74 -0
  206. package/scripts/mab-run.sh +575 -0
  207. package/scripts/module-size-check.sh +146 -0
  208. package/scripts/patterns/async-no-await.yml +5 -0
  209. package/scripts/patterns/bare-except.yml +6 -0
  210. package/scripts/patterns/empty-catch.yml +6 -0
  211. package/scripts/patterns/hardcoded-localhost.yml +9 -0
  212. package/scripts/patterns/retry-loop-no-backoff.yml +12 -0
  213. package/scripts/pipeline-status.sh +197 -0
  214. package/scripts/policy-check.sh +226 -0
  215. package/scripts/prior-art-search.sh +133 -0
  216. package/scripts/promote-mab-lessons.sh +126 -0
  217. package/scripts/prompts/agent-a-superpowers.md +29 -0
  218. package/scripts/prompts/agent-b-ralph.md +29 -0
  219. package/scripts/prompts/judge-agent.md +61 -0
  220. package/scripts/prompts/planner-agent.md +44 -0
  221. package/scripts/pull-community-lessons.sh +90 -0
  222. package/scripts/quality-gate.sh +266 -0
  223. package/scripts/research-gate.sh +90 -0
  224. package/scripts/run-plan.sh +329 -0
  225. package/scripts/scope-infer.sh +159 -0
  226. package/scripts/setup-ralph-loop.sh +155 -0
  227. package/scripts/telemetry.sh +230 -0
  228. package/scripts/tests/run-all-tests.sh +52 -0
  229. package/scripts/tests/test-act-cli.sh +46 -0
  230. package/scripts/tests/test-agents-md.sh +87 -0
  231. package/scripts/tests/test-analyze-report.sh +114 -0
  232. package/scripts/tests/test-architecture-map.sh +89 -0
  233. package/scripts/tests/test-auto-compound.sh +169 -0
  234. package/scripts/tests/test-batch-test.sh +65 -0
  235. package/scripts/tests/test-benchmark-runner.sh +25 -0
  236. package/scripts/tests/test-common.sh +168 -0
  237. package/scripts/tests/test-cost-tracking.sh +158 -0
  238. package/scripts/tests/test-echo-back.sh +180 -0
  239. package/scripts/tests/test-entropy-audit.sh +146 -0
  240. package/scripts/tests/test-failure-digest.sh +66 -0
  241. package/scripts/tests/test-generate-ast-rules.sh +145 -0
  242. package/scripts/tests/test-helpers.sh +82 -0
  243. package/scripts/tests/test-init.sh +47 -0
  244. package/scripts/tests/test-lesson-check.sh +278 -0
  245. package/scripts/tests/test-lesson-local.sh +55 -0
  246. package/scripts/tests/test-license-check.sh +109 -0
  247. package/scripts/tests/test-mab-run.sh +182 -0
  248. package/scripts/tests/test-ollama-lib.sh +49 -0
  249. package/scripts/tests/test-ollama.sh +60 -0
  250. package/scripts/tests/test-pipeline-status.sh +198 -0
  251. package/scripts/tests/test-policy-check.sh +124 -0
  252. package/scripts/tests/test-prior-art-search.sh +96 -0
  253. package/scripts/tests/test-progress-writer.sh +140 -0
  254. package/scripts/tests/test-promote-mab-lessons.sh +110 -0
  255. package/scripts/tests/test-pull-community-lessons.sh +149 -0
  256. package/scripts/tests/test-quality-gate.sh +241 -0
  257. package/scripts/tests/test-research-gate.sh +132 -0
  258. package/scripts/tests/test-run-plan-cli.sh +86 -0
  259. package/scripts/tests/test-run-plan-context.sh +305 -0
  260. package/scripts/tests/test-run-plan-e2e.sh +153 -0
  261. package/scripts/tests/test-run-plan-headless.sh +424 -0
  262. package/scripts/tests/test-run-plan-notify.sh +124 -0
  263. package/scripts/tests/test-run-plan-parser.sh +217 -0
  264. package/scripts/tests/test-run-plan-prompt.sh +254 -0
  265. package/scripts/tests/test-run-plan-quality-gate.sh +222 -0
  266. package/scripts/tests/test-run-plan-routing.sh +178 -0
  267. package/scripts/tests/test-run-plan-scoring.sh +148 -0
  268. package/scripts/tests/test-run-plan-state.sh +261 -0
  269. package/scripts/tests/test-run-plan-team.sh +157 -0
  270. package/scripts/tests/test-scope-infer.sh +150 -0
  271. package/scripts/tests/test-setup-ralph-loop.sh +63 -0
  272. package/scripts/tests/test-telegram-env.sh +38 -0
  273. package/scripts/tests/test-telegram.sh +121 -0
  274. package/scripts/tests/test-telemetry.sh +46 -0
  275. package/scripts/tests/test-thompson-sampling.sh +139 -0
  276. package/scripts/tests/test-validate-all.sh +60 -0
  277. package/scripts/tests/test-validate-commands.sh +89 -0
  278. package/scripts/tests/test-validate-hooks.sh +98 -0
  279. package/scripts/tests/test-validate-lessons.sh +150 -0
  280. package/scripts/tests/test-validate-plan-quality.sh +235 -0
  281. package/scripts/tests/test-validate-plans.sh +187 -0
  282. package/scripts/tests/test-validate-plugin.sh +106 -0
  283. package/scripts/tests/test-validate-prd.sh +184 -0
  284. package/scripts/tests/test-validate-skills.sh +134 -0
  285. package/scripts/validate-all.sh +57 -0
  286. package/scripts/validate-commands.sh +67 -0
  287. package/scripts/validate-hooks.sh +89 -0
  288. package/scripts/validate-lessons.sh +98 -0
  289. package/scripts/validate-plan-quality.sh +369 -0
  290. package/scripts/validate-plans.sh +120 -0
  291. package/scripts/validate-plugin.sh +86 -0
  292. package/scripts/validate-policies.sh +42 -0
  293. package/scripts/validate-prd.sh +118 -0
  294. package/scripts/validate-skills.sh +96 -0
  295. package/skills/autocode/SKILL.md +285 -0
  296. package/skills/autocode/ab-verification.md +51 -0
  297. package/skills/autocode/code-quality-standards.md +37 -0
  298. package/skills/autocode/competitive-mode.md +364 -0
  299. package/skills/brainstorming/SKILL.md +97 -0
  300. package/skills/capture-lesson/SKILL.md +187 -0
  301. package/skills/check-lessons/SKILL.md +116 -0
  302. package/skills/dispatching-parallel-agents/SKILL.md +110 -0
  303. package/skills/executing-plans/SKILL.md +85 -0
  304. package/skills/finishing-a-development-branch/SKILL.md +201 -0
  305. package/skills/receiving-code-review/SKILL.md +72 -0
  306. package/skills/requesting-code-review/SKILL.md +59 -0
  307. package/skills/requesting-code-review/code-reviewer.md +82 -0
  308. package/skills/research/SKILL.md +145 -0
  309. package/skills/roadmap/SKILL.md +115 -0
  310. package/skills/subagent-driven-development/SKILL.md +98 -0
  311. package/skills/subagent-driven-development/code-quality-reviewer-prompt.md +18 -0
  312. package/skills/subagent-driven-development/implementer-prompt.md +73 -0
  313. package/skills/subagent-driven-development/spec-reviewer-prompt.md +57 -0
  314. package/skills/systematic-debugging/SKILL.md +134 -0
  315. package/skills/systematic-debugging/condition-based-waiting.md +64 -0
  316. package/skills/systematic-debugging/defense-in-depth.md +32 -0
  317. package/skills/systematic-debugging/root-cause-tracing.md +55 -0
  318. package/skills/test-driven-development/SKILL.md +167 -0
  319. package/skills/using-git-worktrees/SKILL.md +219 -0
  320. package/skills/using-superpowers/SKILL.md +54 -0
  321. package/skills/verification-before-completion/SKILL.md +140 -0
  322. package/skills/verify/SKILL.md +82 -0
  323. package/skills/writing-plans/SKILL.md +128 -0
  324. package/skills/writing-skills/SKILL.md +93 -0
@@ -0,0 +1,537 @@
+ # Code Factory v2 Phase 4: Design Document
+
+ **Date:** 2026-02-21
+ **Status:** Approved
+ **Approach:** Fixes-First, Then Features (Batch 1 → 2 → 3 → 4 → 5, sequential)
+ **Prior work:** `docs/plans/2026-02-21-code-factory-v2-design.md` (Phases 1-3 complete, Phase 4 partial)
+
+ ## Problem Statement
+
+ Phase 4 of Code Factory v2 has 4 remaining design tasks (4.2, 4.4, 4.5, 4.6), 2 quick fixes, and 43 new lesson files to add: 36 ported from other projects and 7 discovered during v2 execution. The 6 lesson files already in the toolkit are a fraction of the 53 lessons accumulated across projects. This plan completes Phase 4 and brings all generalizable lessons into the public toolkit.
+
+ ## What's Already Done (Phases 1-3 + partial Phase 4)
+
+ - Shared libraries: common.sh, ollama.sh, telegram.sh, run-plan-headless.sh
+ - Quality gates: lesson-check + lint (ruff/eslint) + tests + license-check + memory
+ - Prior-art search (text-based), pipeline-status, failure-digest, context_refs
+ - 19 test files, 224 assertions, all scripts under 300 lines
+
+ ## Batch 1: Quick Fixes + All Lessons
+
+ ### Fix 1: Empty Batch Detection
+
+ In `run-plan-headless.sh` line 37, the batch loop iterates `START_BATCH` to `END_BATCH` without checking if the batch has content. The parser found 9 batches for a 7-batch plan, burning 2 API calls on empty batches (~50s wasted).
+
+ **Fix:** After `get_batch_title`, call `get_batch_text` and skip if empty:
+
+ ```bash
+ local batch_text
+ batch_text=$(get_batch_text "$PLAN_FILE" "$batch")
+ if [[ -z "$batch_text" ]]; then
+     echo " (empty batch -- skipping)"
+     continue
+ fi
+ ```
+
+ ### Fix 2: Bash Test Suite Detection
+
+ `quality-gate.sh` detects pytest/npm/make but not bash test suites. For this repo, quality gates between batches reported "No test suite detected -- skipped" even though 224 assertions existed.
+
+ **Fix:** Add a `bash` case to `detect_project_type()` in `common.sh` that matches when `scripts/tests/run-all-tests.sh` exists or a `test-*.sh` glob matches. Add a corresponding `bash)` case to the test suite section of `quality-gate.sh`.
+
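A minimal sketch of the detection described in Fix 2, showing only the bash branch. The function name `detect_bash_suite`, the `unknown` fallback value, and the assumption that the `test-*.sh` glob is checked under `scripts/tests/` are illustrative; the real `detect_project_type()` may be structured differently.

```bash
#!/usr/bin/env bash
# Hypothetical helper: prints "bash" when a bash test suite is present
# under the given project root, otherwise "unknown".
detect_bash_suite() {
    local project_root="$1"

    # Entry point named in the fix description.
    if [[ -f "$project_root/scripts/tests/run-all-tests.sh" ]]; then
        echo "bash"
        return 0
    fi

    # Fallback (assumed location): any test-*.sh file in scripts/tests.
    # With no match the glob stays literal and the -f test fails.
    local f
    for f in "$project_root"/scripts/tests/test-*.sh; do
        if [[ -f "$f" ]]; then
            echo "bash"
            return 0
        fi
    done

    echo "unknown"
}
```

A caller could branch on the printed value, for example `[[ "$(detect_bash_suite "$PWD")" == "bash" ]]`, before deciding which test command to run.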
+ ### Lessons: 43 New Files (0007-0049)
+
+ Port all generalizable lessons from the Documents workspace (53 total - 6 already in toolkit - 11 too project-specific = 36 to port) plus 7 new lessons from v2 execution.
+
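The counts above can be checked with shell arithmetic. The variable names are illustrative only; the numbers come from the paragraph above and the section heading.

```bash
#!/usr/bin/env bash
# Numbers taken from the design text; variable names are hypothetical.
total_lessons=53
already_in_toolkit=6
too_project_specific=11
new_from_v2=7

to_port=$(( total_lessons - already_in_toolkit - too_project_specific ))
new_files=$(( to_port + new_from_v2 ))

# New files start right after the 6 existing lessons (0001-0006).
first_id=$(( already_in_toolkit + 1 ))
last_id=$(( first_id + new_files - 1 ))

printf 'to port: %d, new files: %d, IDs %04d-%04d\n' \
    "$to_port" "$new_files" "$first_id" "$last_id"
# prints: to port: 36, new files: 43, IDs 0007-0049
```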
+ **Generalization rules:**
+ - No project names (no ARIA, HA, Telegram, etc.)
+ - No specific IPs, hostnames, or usernames
+ - No internal API references -- use generic equivalents
+ - Focus on the universal anti-pattern, not the specific bug
+
+ **Lesson mapping (new ID → source → generalized title):**
+
+ | New ID | Source | Title | Type | Severity | Category |
+ |--------|--------|-------|------|----------|----------|
+ | 0007 | v2 | Runner state file rejected by own git-clean check | syntactic | should-fix | integration-boundaries |
+ | 0008 | v2 | Quality gate blind spot for non-standard test suites | semantic | should-fix | silent-failures |
+ | 0009 | v2 | Plan parser over-count burns empty API calls | semantic | should-fix | silent-failures |
+ | 0010 | v2 | `local` outside function silently misbehaves in bash | syntactic | blocker | silent-failures |
+ | 0011 | v2 | Batch execution writes tests for unimplemented code | semantic | should-fix | integration-boundaries |
+ | 0012 | v2 | API rejects markdown with unescaped special chars | semantic | nice-to-have | integration-boundaries |
+ | 0013 | v2 | `export` prefix in env files breaks naive parsing | syntactic | should-fix | silent-failures |
+ | 0014 | #2 | Decorator registries are import-time side effects | semantic | should-fix | silent-failures |
+ | 0015 | #4 | Frontend-backend schema drift invisible until e2e trace | semantic | should-fix | integration-boundaries |
+ | 0016 | #5 | Event-driven systems must seed current state on startup | semantic | should-fix | integration-boundaries |
+ | 0017 | #6 | Copy-pasted logic between modules diverges silently | semantic | should-fix | integration-boundaries |
+ | 0018 | #8 | Every layer passes its test while full pipeline is broken | semantic | should-fix | integration-boundaries |
+ | 0019 | #9 | systemd EnvironmentFile ignores `export` keyword | syntactic | should-fix | silent-failures |
+ | 0020 | #10 | Persist state incrementally before expensive work | semantic | should-fix | silent-failures |
+ | 0021 | #11 | Dual-axis testing: horizontal sweep + vertical trace | semantic | lesson-learned | integration-boundaries |
+ | 0022 | #13 | Build tool JSX factory shadowed by arrow params | syntactic | blocker | silent-failures |
+ | 0023 | #14 | Static analysis spiral -- chasing lint fixes creates more bugs | semantic | should-fix | test-anti-patterns |
+ | 0024 | #15 | Shared pipeline features must share implementation | semantic | should-fix | integration-boundaries |
+ | 0025 | #16 | Defense-in-depth: validate at all entry points | semantic | lesson-learned | integration-boundaries |
+ | 0026 | #17 | Linter with no rules enabled = false enforcement | semantic | should-fix | silent-failures |
+ | 0027 | #18 | JSX silently drops wrong prop names | syntactic | should-fix | silent-failures |
+ | 0028 | #20 | Never embed infrastructure details in client-side code | syntactic | blocker | silent-failures |
+ | 0029 | #21 | Never write secret values into committed files | syntactic | blocker | silent-failures |
+ | 0030 | #22 | Cache/registry updates must merge, never replace | semantic | should-fix | integration-boundaries |
+ | 0031 | #26 | Verify units at every boundary (0-1 vs 0-100) | semantic | should-fix | integration-boundaries |
+ | 0032 | #28 | Module lifecycle: subscribe after init gate, unsubscribe on shutdown | semantic | should-fix | resource-lifecycle |
+ | 0033 | #29 | Async iteration over mutable collections needs snapshot | syntactic | blocker | async-traps |
+ | 0034 | #30 | Caller-side missing await silently discards work | semantic | blocker | async-traps |
+ | 0035 | #31 | Duplicate registration IDs cause silent overwrite | semantic | should-fix | silent-failures |
+ | 0036 | #34 | WebSocket dirty disconnects raise RuntimeError, not close | semantic | should-fix | resource-lifecycle |
+ | 0037 | #36 | Parallel agents sharing worktree corrupt staging area | semantic | blocker | integration-boundaries |
+ | 0038 | #37 | Subscribe without stored ref = cannot unsubscribe | syntactic | should-fix | resource-lifecycle |
+ | 0039 | #38 | Fallback `or default()` hides initialization bugs | semantic | should-fix | silent-failures |
+ | 0040 | #39 | Process all events when 5% are relevant -- filter first | semantic | should-fix | performance |
+ | 0041 | #40 | Ambiguous base dir variable causes path double-nesting | semantic | should-fix | integration-boundaries |
+ | 0042 | #42 | Spec compliance without quality review misses defensive gaps | semantic | should-fix | integration-boundaries |
+ | 0043 | #44 | Exact count assertions on extensible collections break on addition | syntactic | should-fix | test-anti-patterns |
+ | 0044 | #46 | Relative `file:` deps break in git worktrees | semantic | should-fix | integration-boundaries |
+ | 0045 | #49 | Iterative "how would you improve" catches 35% more design gaps | semantic | lesson-learned | integration-boundaries |
+ | 0046 | #50 | Plan-specified test assertions can have math bugs | semantic | should-fix | test-anti-patterns |
+ | 0047 | #52 | pytest runs single-threaded by default -- add xdist | semantic | should-fix | performance |
+ | 0048 | #53 | Multi-batch plans need explicit integration wiring batch | semantic | lesson-learned | integration-boundaries |
+ | 0049 | #56 | A/B verification finds zero-overlap bug classes | semantic | lesson-learned | integration-boundaries |
+
+ **SUMMARY.md:** Generalized version of the Documents workspace summary with:
+ - Quick reference table (all 49 lessons)
+ - Three root cause clusters (Silent Failures, Integration Boundaries, Cold-Start)
+ - Six rules to build by
+ - Diagnostic shortcuts table
+ - No project-specific references
+
+ All lesson files follow the toolkit's YAML frontmatter schema (see `docs/lessons/TEMPLATE.md`).
+
+ ## Batch 2: Per-Batch Context Assembler
+
+ **Goal:** Minimize the context gap between a fresh batch agent and an experienced one. Each agent gets exactly the context it needs within a token budget -- directives, not just facts.
+
+ ### Architecture
+
+ A `generate_batch_context()` function in `scripts/lib/run-plan-context.sh` that:
+
+ 1. **Reads all context sources:** state file, progress.txt, git log, context_refs, failure-patterns.json
+ 2. **Scores by relevance:** recency (recent batches score higher) + direct dependency (context_refs from this batch score highest) + failure history (if this batch type failed before, that scores high)
+ 3. **Assembles within token budget:** ~1500 tokens target. Priority order: directives > failure history > context_refs contents > git log > progress.txt
+ 4. **Outputs directives:** "Don't repeat X", "Read Y before modifying", "Quality gate expects N+ tests"
+ 5. **Writes to CLAUDE.md:** Appends `## Run-Plan: Batch N` section (overwritten per batch, not accumulated)
+
+ ### Context Sources (priority order)
+
+ 1. **Failure patterns** (highest) -- from `logs/failure-patterns.json`, cross-run learning
+ 2. **Context_refs file contents** -- first 100 lines of files declared in batch header
+ 3. **Prior batch quality gate results** -- test count, pass/fail, duration
+ 4. **Git log** -- last 5 commits from prior batches
+ 5. **Progress.txt** -- last 20 lines of discoveries/decisions
+ 6. **Directives** -- synthesized from the sources above and always included regardless of this ordering: "tests must stay above 224", "these files were modified by batch 2"
+
+ ### Cross-Run Failure Patterns
+
+ `logs/failure-patterns.json` persists across runs:
+
+ ```json
+ [
+   {
+     "batch_title_pattern": "integration wiring",
+     "failure_type": "missing import",
+     "frequency": 3,
+     "last_seen": "2026-02-21",
+     "winning_fix": "check all imports before running tests"
+   }
+ ]
+ ```
+
+ When a batch title fuzzy-matches a pattern, the relevant warning is injected into context.
+
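The matching step could look like the sketch below. The design does not define "fuzzy-match", so this assumes a case-insensitive substring test; the function names and the warning wording are hypothetical, and reading the fields out of `logs/failure-patterns.json` is left to the caller.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeeds when the pattern occurs anywhere in the
# batch title, ignoring case. Assumed definition of "fuzzy-match".
title_matches_pattern() {
    local title pattern
    title=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    pattern=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')
    [[ -n "$pattern" && "$title" == *"$pattern"* ]]
}

# Hypothetical helper: prints a warning line for the context section
# when the batch title matches, and prints nothing otherwise.
failure_warning() {
    local title="$1" pattern="$2" failure_type="$3" fix="$4"
    if title_matches_pattern "$title" "$pattern"; then
        echo "WARNING: earlier '$pattern' batches failed with $failure_type; $fix"
    fi
}

failure_warning "Batch 5: Integration Wiring" "integration wiring" \
    "missing import" "check all imports before running tests"
```

Keeping the match in its own function makes it easy to swap in a looser rule later without touching the code that formats the warning.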
+ ### Token Budget
+
+ - Budget: ~1500 tokens (~6000 chars)
+ - If assembled context exceeds budget, trim lowest-priority items first
+ - Always include: directives (mandatory), failure patterns (if matched), quality gate expectations
+ - Trim first: progress.txt, git log, context_refs file contents (truncate to first 50 lines)
+
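One way to read the trimming rule is sketched below, assuming the caller passes context sections highest priority first and that the budget is expressed in characters (about 6000 for 1500 tokens). The function name `assemble_within_budget` is hypothetical, and the sketch drops whole sections rather than truncating files to 50 lines.

```bash
#!/usr/bin/env bash
# Hypothetical helper: joins sections with newlines and, while the
# result is over budget, drops the last (lowest-priority) section.
# Usage: assemble_within_budget <budget_chars> <section>...
assemble_within_budget() {
    local budget_chars="$1"
    shift
    local sections=("$@")
    local joined

    while (( ${#sections[@]} > 0 )); do
        joined=$(printf '%s\n' "${sections[@]}")
        if (( ${#joined} <= budget_chars )); then
            printf '%s\n' "$joined"
            return 0
        fi
        # Over budget: remove the final element and try again.
        sections=("${sections[@]:0:${#sections[@]}-1}")
    done

    # Even the highest-priority section alone does not fit.
    return 1
}

assemble_within_budget 6000 \
    "Directive: tests must stay above 224" \
    "Failure pattern: check all imports before running tests" \
    "Git log: (last 5 commits would go here)"
```

Because the design marks directives as mandatory, a real implementation would probably truncate an oversized directive block instead of returning failure as this sketch does.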
+ ## Batch 3: ast-grep Integration
+
+ **Goal:** Help agents write code that fits the existing codebase and catch semantic anti-patterns that grep cannot detect. Two modes: discovery (before PRD) and enforcement (in quality gate).
+
+ ### Discovery Mode (prior-art-search.sh)
+
+ Run `ast-grep` once at plan start to extract the dominant codebase patterns:
+
+ - Error handling style (try/except with logging vs bare except)
+ - Test patterns (assert helpers, fixture usage, naming conventions)
+ - Function size distribution
+ - Import patterns
+
+ Results feed into the context assembler (Batch 2) as "Codebase style: [patterns]", so every batch agent can write code that fits the codebase without per-batch style instructions.
171
+
172
+ ### Enforcement Mode (quality-gate.sh)
173
+
174
+ Optional quality gate step that runs ast-grep rules derived from lesson files:
175
+
176
+ - Read lesson YAML where `pattern.type: semantic` and language has ast-grep support
177
+ - Auto-generate ast-grep rule files from lesson descriptions
178
+ - Run against changed files in the batch
179
+ - Warn (not fail) by default — `--strict-ast` to make it a hard gate
180
+
181
+ ### Auto-Generated Rules from Lessons
182
+
183
+ Lessons with `pattern.type: semantic` that describe structural patterns (e.g., "async def body has no await") can be converted to ast-grep YAML rules. A `scripts/generate-ast-rules.sh` script reads lesson files and produces `scripts/patterns/*.yml`.
184
+
185
+ Not all semantic lessons can be converted — some require true AI understanding. The script attempts conversion and logs which lessons it could/couldn't handle.
186
+
187
+ ### Built-in Pattern Files
188
+
189
+ 5-10 patterns in `scripts/patterns/` for common structural anti-patterns:
190
+
191
+ ```
192
+ scripts/patterns/
193
+ retry-loop.yml — retry without backoff
194
+ bare-except.yml — except without specific exception
195
+ async-no-await.yml — async def with no await in body
196
+ empty-catch.yml — catch block with no logging
197
+ unused-import.yml — imported but never referenced
198
+ ```
199
+
200
+ ### Graceful Degradation
201
+
202
+ If `ast-grep` is not installed:
203
+ - Discovery mode: skip with note ("install ast-grep for structural analysis")
204
+ - Enforcement mode: skip silently (grep-based lesson-check.sh still runs)
205
+ - No hard dependency — ast-grep enhances but is not required
206
+
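The degradation rule can be sketched with a `command -v` check. The function name `ast_step_enabled` is hypothetical; the tool name is a parameter only so the sketch can be exercised on machines without `ast-grep`.

```bash
# Usage: ast_step_enabled MODE [TOOL]
# Succeeds when the tool is on PATH. In discovery mode a note is
# printed to stderr when it is missing; enforcement mode stays silent.
ast_step_enabled() {
  local mode="$1" tool="${2:-ast-grep}"
  if command -v "$tool" >/dev/null 2>&1; then
    return 0
  fi
  if [ "$mode" = "discovery" ]; then
    echo "note: install $tool for structural analysis" >&2
  fi
  return 1
}

if ast_step_enabled discovery; then
  echo "running structural discovery"
fi
```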
207
+ ## Batch 4: Team Mode with Decision Gate
208
+
209
+ **Goal:** Reduce total wall-clock time for plan execution while maintaining quality. Automatically select the optimal execution mode based on plan analysis.
210
+
211
+ ### Decision Gate
212
+
213
+ Before any execution starts, `run-plan.sh` analyzes the plan and selects a mode:
214
+
215
+ ```
216
+ run-plan.sh <plan>
217
+ |
218
+ v
219
+ analyze_plan_for_mode()
220
+ |-- Parse all batches: Files, context_refs, depends_on
221
+ |-- Build file-level dependency graph
222
+ |-- Compute parallelism score (0-100)
223
+ |-- Check: AGENT_TEAMS flag available?
224
+ |-- Check: available memory vs worker count
225
+ |
226
+ v
227
+ Decision:
228
+ score < 20 --> HEADLESS (sequential is optimal)
229
+ score 20-60 --> HEADLESS with advisory ("team mode would save ~Xmin")
230
+ score > 60 + teams available + memory OK --> TEAM (parallel)
231
+ score > 60 + teams unavailable --> HEADLESS with note
232
+ any + --mode override --> use override
233
+ ```
234
+
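The decision table above could be expressed roughly as follows. This is a sketch, not the actual `analyze_plan_for_mode()`: the function name `select_mode`, its argument order, and the reason labels it prints are all assumptions.

```bash
# Usage: select_mode SCORE TEAMS_AVAILABLE MEMORY_OK [OVERRIDE]
# TEAMS_AVAILABLE and MEMORY_OK are 1 or 0. Prints "<mode> <reason>".
select_mode() {
  local score="$1" teams="$2" mem_ok="$3" override="${4:-}"
  if [ -n "$override" ]; then
    echo "$override override"
  elif [ "$score" -lt 20 ]; then
    echo "headless sequential-optimal"
  elif [ "$score" -le 60 ]; then
    echo "headless advisory"
  elif [ "$teams" = 1 ] && [ "$mem_ok" = 1 ]; then
    echo "team parallel"
  else
    echo "headless team-unavailable"
  fi
}

select_mode 72 1 1   # prints "team parallel"
```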
235
+ ### Parallelism Score Factors
236
+
237
+ - % of batches with zero file overlap with neighbors (+)
238
+ - Number of batches in first parallel group (+)
239
+ - Total file overlap across all batch pairs (-)
240
+ - Shared runtime hints: "starts server", "modifies DB" (-)
241
+ - Explicit `parallel_safe: true` in plan header (+20 bonus)
242
+
243
+ ### Routing Plan (always shown)
244
+
245
+ ```
246
+ === Execution Mode Analysis ===
247
+
248
+ Plan: implementation-plan.md
249
+ Batches: 7 | Files touched: 31 | Avg overlap: 12%
250
+
251
+ Dependency graph:
252
+ B1 --> B2 --> B3
253
+ B1 --> B4 --> B5 --> B7
254
+ B6 -------> B7
255
+
256
+ Parallelism score: 72/100
257
+ + 3 independent groups detected
258
+ + Max parallel width: 3 (B3, B5, B6)
259
+ + File overlap < 20% in parallel groups
260
+ - B2->B3 share 2 files (conservative: sequential)
261
+
262
+ Recommendation: TEAM MODE
263
+ Workers: 2 (21G available, 8G/worker threshold)
264
+ Est. wall time: 14min (vs 28min sequential)
265
+ Est. cost: $2.40 (vs $3.10 sequential)
266
+
267
+ Model routing:
268
+ B1: sonnet (implementation -- creates 4 files)
269
+ B2: sonnet (implementation -- modifies 3 files, adds tests)
270
+ B3: haiku (verification -- 0 creates, 5 run commands) [auto-escalate]
271
+ B4: sonnet (implementation -- creates 2 files)
272
+ B5: sonnet (implementation -- modifies + tests)
273
+ B6: haiku (wiring -- 0 new logic) [auto-escalate]
274
+ B7: haiku (verification -- pipeline trace only) [auto-escalate]
275
+
276
+ Speculative execution:
277
+ B2 starts while B1 gate runs (overlap: 0%)
278
+ B5 waits for B4 gate (overlap: 73%)
279
+ ```
280
+
281
+ ### Auto-Detect Parallelism
282
+
283
+ Build dependency graph from plan content — no `depends_on:` annotations required:
284
+
285
+ - `context_refs` in batch headers declare which files a batch reads from prior batches
286
+ - `Files:` sections declare which files a batch creates/modifies
287
+ - If batch B's `context_refs` include none of batch A's output files and the two batches' `Files:` sections do not overlap, the batches are independent
288
+ - Fall back to sequential when analysis is ambiguous
289
+
290
+ Existing plans work in team mode with zero changes.
291
+
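The independence test can be sketched as a set-disjointness check over file lists. The helper names and the newline-separated list format are hypothetical, and treating overlapping `Files:` sections as a dependency is an assumption drawn from the file-overlap factor in the parallelism score.

```bash
# Succeeds when no line of LIST_B appears in LIST_A.
# Both arguments are newline-separated file paths.
lists_disjoint() {
  local list_a="$1" list_b="$2" item
  while IFS= read -r item; do
    [ -n "$item" ] || continue
    if printf '%s\n' "$list_a" | grep -Fxq -- "$item"; then
      return 1
    fi
  done <<EOF
$list_b
EOF
  return 0
}

# Usage: batches_independent A_FILES B_CONTEXT_REFS B_FILES
# B is independent of A when it neither reads nor writes A's files.
batches_independent() {
  lists_disjoint "$1" "$2" && lists_disjoint "$1" "$3"
}
```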
292
+ ### Team Execution Architecture
293
+
294
+ - **Team lead:** owns task list, quality gates, merge queue
295
+ - **N workers:** each gets isolated git worktree, claims batches, executes
296
+ - **Progressive merge queue:** each batch merges to main immediately after gate pass (keeps divergence small)
297
+ - **Speculative execution:** start next batch while gate runs when file overlap < threshold. Abort speculation if gate fails.
298
+ - **Model routing with auto-escalation:** haiku batches that fail retry on sonnet, sonnet failures escalate to opus
299
+
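The escalation ladder from the last bullet is small enough to sketch directly. `next_model` is a hypothetical name; the haiku, sonnet, opus order comes from the text above.

```bash
# Print the model to retry on after a failure, or fail when there is
# nothing left to escalate to (opus or an unknown model).
next_model() {
  case "$1" in
    haiku)  echo sonnet ;;
    sonnet) echo opus ;;
    *)      return 1 ;;
  esac
}

next_model haiku   # prints "sonnet"
```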
300
+ ### Routing Configuration (`scripts/lib/run-plan-routing.sh`)
301
+
302
+ ```bash
303
+ # Parallelism thresholds
304
+ PARALLEL_SCORE_THRESHOLD=60 # min score for team mode recommendation
305
+ SPECULATE_MAX_OVERLAP=20 # max file overlap % for speculative execution
306
+
307
+ # Model routing (batch classification --> model)
308
+ MODEL_IMPLEMENTATION="sonnet" # creates/modifies code files
309
+ MODEL_VERIFICATION="haiku" # only run/verify commands
310
+ MODEL_ARCHITECTURE="opus" # "design" or "architecture" in title
311
+ MODEL_ESCALATE_ON_FAIL=true # haiku-->sonnet-->opus on retry
312
+
313
+ # Resource limits
314
+ WORKER_MEM_THRESHOLD_GB=8 # min GB available per worker
315
+ MAX_WORKERS=3 # hard cap regardless of memory
316
+ ```
317
+
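As an illustration of how the resource limits might combine, the sketch below derives a worker count from available memory. `worker_count` is a hypothetical function; the floor-divide-then-cap rule is an assumption, though it reproduces the "Workers: 2 (21G available, 8G/worker threshold)" line from the routing plan example.

```bash
WORKER_MEM_THRESHOLD_GB=8
MAX_WORKERS=3

# Usage: worker_count AVAILABLE_GB
# Floor of available memory over the per-worker threshold, capped.
worker_count() {
  local avail_gb="$1" n
  n=$(( avail_gb / WORKER_MEM_THRESHOLD_GB ))
  if [ "$n" -gt "$MAX_WORKERS" ]; then
    n="$MAX_WORKERS"
  fi
  echo "$n"
}

worker_count 21   # prints 2
```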
318
+ ### Override Escape Hatches
319
+
320
+ - `--mode headless` — force sequential regardless of score
321
+ - `--mode team` — force team regardless of score
322
+ - `--workers N` — override worker count
323
+ - `--model-override B3=opus` — force specific model for a batch
324
+ - `--no-speculate` — disable speculative execution
325
+ - `--sequential-after B4` — parallel until B4, then sequential
326
+
327
+ ### Decision Log (`logs/routing-decisions.log`)
328
+
329
+ Every decision logged with timestamp and reasoning:
330
+
331
+ ```
332
+ [12:03:14] MODE: team (score=72, threshold=60)
333
+ [12:03:14] PARALLEL: B2,B4 -- overlap=0 files, both depend only on B1
334
+ [12:03:14] MODEL: B3-->haiku -- 0 create/modify, 5 run commands, confidence=85%
335
+ [12:05:22] SPECULATE: B3 starting while B2 gate runs -- overlap 0%
336
+ [12:05:45] GATE_PASS: B2 (224-->231 tests), merging worktree
337
+ [12:05:48] MERGE: B2 --> main, 3 files, 0 conflicts
338
+ [12:06:01] SPECULATE_OK: B3 confirmed
339
+ [12:08:30] ESCALATE: B6 failed on haiku, retrying on sonnet
340
+ ```
341
+
342
+ ### Where Team Mode Falls Back to Headless
343
+
344
+ - Parallelism score < 60 (tightly coupled batches)
345
+ - Shared runtime state detected (service ports, DB migrations)
346
+ - Plan is concern-batched (all impl then all tests)
347
+ - Available memory < 2 x worker threshold
348
+ - `CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS` flag not set
349
+
350
+ ### Integration with pipeline-status.sh
351
+
352
+ After execution, pipeline-status.sh shows routing decisions alongside results:
353
+
354
+ ```
355
+ Batch 3: haiku --> PASSED (22s, 8 tests added)
356
+ Batch 4: sonnet --> PASSED (180s, 15 tests added)
357
+ Batch 6: haiku-->sonnet (escalated) --> PASSED (45s, 3 tests added)
358
+ Total: 14min wall, $2.38 cost, 2 workers
359
+ ```
360
+
361
+ ### Writing-Plans Integration
362
+
363
+ The writing-plans skill should assess parallelism when creating plans:
364
+ - Add `parallel_safe: true/false` to plan header
365
+ - Add `depends_on: [batch-N]` hints to batch headers when dependencies exist
366
+ - Design batches for independence when possible (different files per batch)
367
+
368
+ ## Batch 5: Parallel Patch Sampling
369
+
370
+ **Goal:** Maximize the probability that a batch succeeds, especially for hard batches. Improve success probability over time through outcome learning.
371
+
372
+ ### When Sampling Triggers
373
+
374
+ Sampling does not run on every batch. It triggers only when at least one of the following holds:
375
+ - Batch marked `critical: true` in plan header
376
+ - Batch failed its first attempt (sampling replaces naive retry)
377
+ - User passes `--sample N` flag explicitly
378
+
379
+ ### Tournament Architecture
380
+
381
+ ```
382
+ Batch fails first attempt (or marked critical)
383
+ |
384
+ v
385
+ Round 1: N candidates in parallel (default: 3)
386
+ |-- Candidate 1: vanilla prompt
387
+ |-- Candidate 2: prompt + failure digest + "try a different approach"
388
+ |-- Candidate 3: prompt + failure digest + "minimal change only"
389
+ |
390
+ Each in isolated worktree
391
+ |
392
+ v
393
+ Score each candidate:
394
+ |-- Quality gate pass/fail (mandatory -- eliminates failures)
395
+ |-- Test count (more = better)
396
+ |-- Diff size (smaller = better among passers)
397
+ |-- Lint warnings (fewer = better)
398
+ |-- Lesson-check violations (penalty: -200 each)
399
+ |-- ast-grep violations (penalty: -100 each)
400
+ |
401
+ v
402
+ Decision:
403
+ Clear winner (1 passes, others don't) --> use it
404
+ Multiple passers --> highest score wins
405
+ No winner OR close scores --> Round 2: Synthesis
406
+ |
407
+ v
408
+ Round 2: Synthesis agent
409
+ Reads: all N attempts + their gate results + their diffs
410
+ Task: "Candidate 1 had best architecture but failed test X.
411
+ Candidate 3 passed but duplicated 40 lines.
412
+ Synthesize: use C1's approach, fix using C3's insight."
413
+ |
414
+ v
415
+ Score synthesis --> if passes, use it. If not, best Round 1 winner.
416
+ ```
417
+
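The Round 1 decision could be sketched as picking the highest positive score. `pick_winner` and its `name=score` argument format are hypothetical, and the sketch leaves out the "close scores" check because no closeness threshold is given.

```bash
# Usage: pick_winner NAME=SCORE...
# Prints the candidate with the highest score above 0. Fails when no
# candidate scored above 0, which would send the batch to Round 2.
pick_winner() {
  local best_name="" best_score=0 pair name score
  for pair in "$@"; do
    name=${pair%%=*}
    score=${pair#*=}
    if [ "$score" -gt "$best_score" ]; then
      best_name=$name
      best_score=$score
    fi
  done
  [ -n "$best_name" ] || return 1
  echo "$best_name"
}

pick_winner "c1=0" "c2=2450" "c3=1980"   # prints "c2"
```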
418
+ ### Scoring Function
419
+
420
+ ```bash
421
+ score_candidate() {
422
+     local gate_passed="$1" # 0 or 1
423
+     local test_count="$2" # integer
424
+     local diff_lines="$3" # integer
425
+     local lint_warnings="$4" # integer
426
+     local lesson_violations="$5" # integer
427
+     local ast_violations="$6" # integer
428
+
429
+     # Gate pass is mandatory
430
+     if [[ "$gate_passed" -ne 1 ]]; then
431
+         echo 0; return
432
+     fi
433
+
434
+     # Weighted score: tests most important, quality penalties heavy (integer math; penalties can push it below 0)
435
+     local score=$(( (test_count * 10) + (10000 / (diff_lines + 1)) + (1000 / (lint_warnings + 1)) - (lesson_violations * 200) - (ast_violations * 100) ))
436
+     echo "$score"
437
+ }
438
+ ```
439
+
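A worked example of the formula, term by term, may make the weights easier to see. The input values are hypothetical; the arithmetic is the same expression used in `score_candidate`.

```bash
# Hypothetical inputs for one candidate that passed the gate.
test_count=231
diff_lines=99
lint_warnings=1
lesson_violations=1
ast_violations=0

tests_term=$(( test_count * 10 ))             # 2310
diff_term=$(( 10000 / (diff_lines + 1) ))     # 100 (integer division)
lint_term=$(( 1000 / (lint_warnings + 1) ))   # 500
penalty=$(( lesson_violations * 200 + ast_violations * 100 ))   # 200

echo $(( tests_term + diff_term + lint_term - penalty ))   # prints 2710
```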
440
+ ### Prompt Diversity: Batch-Type-Aware + Learned
441
+
442
+ **Batch type classification** from plan content:
443
+
444
+ | Batch type | Likely failure | Best prompt variants |
445
+ |------------|---------------|---------------------|
446
+ | New file creation | Missing imports, incomplete API | vanilla, "check all imports", "write tests first" |
447
+ | Refactoring | Breaking existing tests | vanilla, "minimal change", "run tests after each edit" |
448
+ | Integration wiring | Missing connections | vanilla, "trace end-to-end", "check every import/export" |
449
+ | Test-only | Flaky assertions, wrong mocks | vanilla, "use real objects not mocks", "edge cases only" |
450
+
451
+ **Learned from outcomes** (`logs/sampling-outcomes.json`):
452
+
453
+ ```json
454
+ [
455
+   {
456
+     "batch_type": "refactoring",
457
+     "prompt_variant": "minimal-change",
458
+     "won": true,
459
+     "score": 2450,
460
+     "timestamp": "2026-02-21T12:05:00Z"
461
+   }
462
+ ]
463
+ ```
464
+
465
+ Over 10+ runs, patterns emerge. Candidate slot allocation:
466
+ - 1 slot always vanilla (baseline)
467
+ - Remaining slots, after the vanilla and experimental slots are reserved, go to historically winning variants for this batch type
468
+ - 1 slot always experimental (random variant for exploration)
469
+
470
+ This is a simple multi-armed bandit: exploit what works, explore 1 slot.
471
+
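The "exploit" half of the allocation might look like the sketch below, which picks the variant with the most wins for a batch type. It assumes the outcomes have already been flattened to `batch_type prompt_variant won` lines on stdin (the conversion from `sampling-outcomes.json` is not shown), and `best_variant` is a hypothetical name.

```bash
# Usage: best_variant BATCH_TYPE < flattened-outcomes
# Input lines: "<batch_type> <prompt_variant> <won>", won = true|false.
# Prints the variant with the most wins, or nothing without history.
best_variant() {
  local batch_type="$1"
  awk -v t="$batch_type" '
    $1 == t && $3 == "true" { wins[$2]++ }
    END {
      for (v in wins) if (wins[v] > max) { max = wins[v]; best = v }
      if (best != "") print best
    }'
}

printf '%s\n' \
  "refactoring minimal-change true" \
  "refactoring vanilla false" \
  "refactoring minimal-change true" \
  "refactoring run-tests true" | best_variant refactoring
```

The exploration slot would be filled separately, for example by a random pick among the remaining variants.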
472
+ ### Integration with Team Mode
473
+
474
+ - In headless mode: candidates run sequentially (N claude -p calls)
475
+ - In team mode: candidates run as parallel workers on same batch (natural fit)
476
+ - Decision gate factors this in: worker count = sample count for sampled batches
477
+
478
+ ### Resource Guards
479
+
480
+ - Memory: don't sample if available memory < N x 4G
481
+ - Cost: log estimated cost in routing plan ("Sampling B4: ~$1.20 for 3 candidates vs $0.40 single")
482
+ - Time: sampling adds ~50% wall time per batch when candidates run in parallel, or multiplies it by N when they run sequentially
483
+
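The memory guard reduces to a single comparison. In this sketch `can_sample` is a hypothetical function name, and memory is assumed to be passed in as whole gigabytes.

```bash
SAMPLE_MIN_MEMORY_PER_GB=4   # GB required per candidate

# Usage: can_sample AVAILABLE_GB CANDIDATES
# Succeeds when available memory covers every candidate.
can_sample() {
  local avail_gb="$1" candidates="$2"
  [ "$avail_gb" -ge $(( candidates * SAMPLE_MIN_MEMORY_PER_GB )) ]
}

if can_sample 21 3; then
  echo "sampling allowed"
fi
```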
484
+ ### Configuration
485
+
486
+ ```bash
487
+ # In run-plan-routing.sh
488
+ SAMPLE_ON_RETRY=true # auto-sample when batch fails first attempt
489
+ SAMPLE_ON_CRITICAL=true # auto-sample for critical: true batches
490
+ SAMPLE_COUNT=3 # default candidate count
491
+ SAMPLE_MAX_COUNT=5 # hard cap
492
+ SAMPLE_MIN_MEMORY_PER_GB=4 # min GB of available memory required per candidate
493
+ ```
494
+
495
+ ### Override Flags
496
+
497
+ - `--sample N` — force sampling for all batches with N candidates
498
+ - `--sample-batch B4=5` — sample only batch 4 with 5 candidates
499
+ - `--no-sample` — disable all sampling
500
+
501
+ ## Dependencies
502
+
503
+ - **Batch 1** has no dependencies (fixes + lesson files)
504
+ - **Batch 2** depends on Batch 1 (failure patterns reference lesson IDs)
505
+ - **Batch 3** depends on Batch 2 (ast-grep feeds into context assembler) + optional install: `ast-grep`
506
+ - **Batch 4** depends on Batch 2 (context assembler) + Batch 3 (ast-grep scoring) + requires `CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1`
507
+ - **Batch 5** depends on Batch 4 (team mode for parallel candidates) + Batch 3 (ast-grep in scoring)
508
+
509
+ ## Success Metrics
510
+
511
+ 1. All 49 lessons in toolkit with YAML frontmatter, no project-specific references
512
+ 2. Empty batches detected and skipped (0 wasted API calls)
513
+ 3. Bash test suites detected by quality gate
514
+ 4. Context assembler reduces agent "discovery" time (measurable via batch duration comparison)
515
+ 5. ast-grep catches at least 3 anti-patterns that grep cannot
516
+ 6. Team mode parallelism score correctly predicts speedup within 20%
517
+ 7. Patch sampling improves retry success rate vs naive retry (track in sampling-outcomes.json)
518
+
519
+ ## Risk Mitigations
520
+
521
+ - **Lesson volume:** 43 new files is mechanical work — each follows the template. Use subagents for parallel writing.
522
+ - **ast-grep availability:** All ast-grep features fail-open. The toolkit works without it installed.
523
+ - **Agent teams instability:** Team mode falls back to headless. Decision gate prevents team mode when conditions aren't right.
524
+ - **Sampling cost:** Resource guards prevent sampling when memory is low. Cost shown in routing plan before execution.
525
+ - **Prompt diversity convergence:** Multi-armed bandit prevents getting stuck on one variant. Always explores 1 slot.
526
+
527
+ ## New Files (estimated)
528
+
529
+ | Category | Count | Location |
530
+ |----------|-------|----------|
531
+ | Lesson files | 43 | `docs/lessons/0007-*.md` through `0049-*.md` |
532
+ | Lesson summary | 1 | `docs/lessons/SUMMARY.md` (rewrite) |
533
+ | Lib scripts | 5 | `scripts/lib/run-plan-context.sh`, `run-plan-routing.sh`, `run-plan-team.sh`, `run-plan-scoring.sh`, `generate-ast-rules.sh` |
534
+ | Pattern files | 5-10 | `scripts/patterns/*.yml` |
535
+ | Config | 1 | Routing defaults in `run-plan-routing.sh` |
536
+ | Test files | 8-10 | `scripts/tests/test-*.sh` for each new lib |
537
+ | Logs | 3 | `logs/failure-patterns.json`, `logs/routing-decisions.log`, `logs/sampling-outcomes.json` |