autonomous-coding-toolkit 1.0.0

Files changed (324)
  1. package/.claude-plugin/marketplace.json +22 -0
  2. package/.claude-plugin/plugin.json +13 -0
  3. package/LICENSE +21 -0
  4. package/Makefile +21 -0
  5. package/README.md +140 -0
  6. package/SECURITY.md +28 -0
  7. package/agents/bash-expert.md +113 -0
  8. package/agents/dependency-auditor.md +138 -0
  9. package/agents/integration-tester.md +120 -0
  10. package/agents/lesson-scanner.md +149 -0
  11. package/agents/python-expert.md +179 -0
  12. package/agents/service-monitor.md +141 -0
  13. package/agents/shell-expert.md +147 -0
  14. package/benchmarks/runner.sh +147 -0
  15. package/benchmarks/tasks/01-rest-endpoint/rubric.sh +29 -0
  16. package/benchmarks/tasks/01-rest-endpoint/task.md +17 -0
  17. package/benchmarks/tasks/02-refactor-module/task.md +8 -0
  18. package/benchmarks/tasks/03-fix-integration-bug/task.md +8 -0
  19. package/benchmarks/tasks/04-add-test-coverage/task.md +8 -0
  20. package/benchmarks/tasks/05-multi-file-feature/task.md +8 -0
  21. package/bin/act.js +238 -0
  22. package/commands/autocode.md +6 -0
  23. package/commands/cancel-ralph.md +18 -0
  24. package/commands/code-factory.md +53 -0
  25. package/commands/create-prd.md +55 -0
  26. package/commands/ralph-loop.md +18 -0
  27. package/commands/run-plan.md +117 -0
  28. package/commands/submit-lesson.md +122 -0
  29. package/docs/ARCHITECTURE.md +630 -0
  30. package/docs/CONTRIBUTING.md +125 -0
  31. package/docs/lessons/0001-bare-exception-swallowing.md +34 -0
  32. package/docs/lessons/0002-async-def-without-await.md +28 -0
  33. package/docs/lessons/0003-create-task-without-callback.md +28 -0
  34. package/docs/lessons/0004-hardcoded-test-counts.md +28 -0
  35. package/docs/lessons/0005-sqlite-without-closing.md +33 -0
  36. package/docs/lessons/0006-venv-pip-path.md +27 -0
  37. package/docs/lessons/0007-runner-state-self-rejection.md +35 -0
  38. package/docs/lessons/0008-quality-gate-blind-spot.md +33 -0
  39. package/docs/lessons/0009-parser-overcount-empty-batches.md +36 -0
  40. package/docs/lessons/0010-local-outside-function-bash.md +33 -0
  41. package/docs/lessons/0011-batch-tests-for-unimplemented-code.md +36 -0
  42. package/docs/lessons/0012-api-markdown-unescaped-chars.md +33 -0
  43. package/docs/lessons/0013-export-prefix-env-parsing.md +33 -0
  44. package/docs/lessons/0014-decorator-registry-import-side-effect.md +43 -0
  45. package/docs/lessons/0015-frontend-backend-schema-drift.md +43 -0
  46. package/docs/lessons/0016-event-driven-cold-start-seeding.md +44 -0
  47. package/docs/lessons/0017-copy-paste-logic-diverges.md +43 -0
  48. package/docs/lessons/0018-layer-passes-pipeline-broken.md +45 -0
  49. package/docs/lessons/0019-systemd-envfile-ignores-export.md +41 -0
  50. package/docs/lessons/0020-persist-state-incrementally.md +44 -0
  51. package/docs/lessons/0021-dual-axis-testing.md +48 -0
  52. package/docs/lessons/0022-jsx-factory-shadowing.md +43 -0
  53. package/docs/lessons/0023-static-analysis-spiral.md +51 -0
  54. package/docs/lessons/0024-shared-pipeline-implementation.md +55 -0
  55. package/docs/lessons/0025-defense-in-depth-all-entry-points.md +65 -0
  56. package/docs/lessons/0026-linter-no-rules-false-enforcement.md +54 -0
  57. package/docs/lessons/0027-jsx-silent-prop-drop.md +64 -0
  58. package/docs/lessons/0028-no-infrastructure-in-client-code.md +49 -0
  59. package/docs/lessons/0029-never-write-secrets-to-files.md +61 -0
  60. package/docs/lessons/0030-cache-merge-not-replace.md +62 -0
  61. package/docs/lessons/0031-verify-units-at-boundaries.md +66 -0
  62. package/docs/lessons/0032-module-lifecycle-subscribe-unsubscribe.md +89 -0
  63. package/docs/lessons/0033-async-iteration-mutable-snapshot.md +72 -0
  64. package/docs/lessons/0034-caller-missing-await-silent-discard.md +65 -0
  65. package/docs/lessons/0035-duplicate-registration-silent-overwrite.md +85 -0
  66. package/docs/lessons/0036-websocket-dirty-disconnect.md +33 -0
  67. package/docs/lessons/0037-parallel-agents-worktree-corruption.md +31 -0
  68. package/docs/lessons/0038-subscribe-no-stored-ref.md +36 -0
  69. package/docs/lessons/0039-fallback-or-default-hides-bugs.md +34 -0
  70. package/docs/lessons/0040-event-firehose-filter-first.md +36 -0
  71. package/docs/lessons/0041-ambiguous-base-dir-path-nesting.md +32 -0
  72. package/docs/lessons/0042-spec-compliance-insufficient.md +36 -0
  73. package/docs/lessons/0043-exact-count-extensible-collections.md +32 -0
  74. package/docs/lessons/0044-relative-file-deps-worktree.md +39 -0
  75. package/docs/lessons/0045-iterative-design-improvement.md +33 -0
  76. package/docs/lessons/0046-plan-assertion-math-bugs.md +38 -0
  77. package/docs/lessons/0047-pytest-single-threaded-default.md +37 -0
  78. package/docs/lessons/0048-integration-wiring-batch.md +40 -0
  79. package/docs/lessons/0049-ab-verification.md +41 -0
  80. package/docs/lessons/0050-editing-sourced-files-during-execution.md +33 -0
  81. package/docs/lessons/0051-infrastructure-fixes-cant-self-heal.md +30 -0
  82. package/docs/lessons/0052-uncommitted-changes-poison-quality-gates.md +31 -0
  83. package/docs/lessons/0053-jq-compact-flag-inconsistency.md +31 -0
  84. package/docs/lessons/0054-parser-matches-inside-code-blocks.md +30 -0
  85. package/docs/lessons/0055-agents-compensate-for-garbled-prompts.md +31 -0
  86. package/docs/lessons/0056-grep-count-exit-code-on-zero.md +42 -0
  87. package/docs/lessons/0057-new-artifacts-break-git-clean-gates.md +42 -0
  88. package/docs/lessons/0058-dead-config-keys-never-consumed.md +49 -0
  89. package/docs/lessons/0059-contract-test-shared-structures.md +53 -0
  90. package/docs/lessons/0060-set-e-silent-death-in-runners.md +53 -0
  91. package/docs/lessons/0061-context-injection-dirty-state.md +50 -0
  92. package/docs/lessons/0062-sibling-bug-neighborhood-scan.md +29 -0
  93. package/docs/lessons/0063-one-flag-two-lifetimes.md +31 -0
  94. package/docs/lessons/0064-test-passes-wrong-reason.md +31 -0
  95. package/docs/lessons/0065-pipefail-grep-count-double-output.md +39 -0
  96. package/docs/lessons/0066-local-keyword-outside-function.md +37 -0
  97. package/docs/lessons/0067-stdin-hang-non-interactive-shell.md +36 -0
  98. package/docs/lessons/0068-agent-builds-wrong-thing-correctly.md +31 -0
  99. package/docs/lessons/0069-plan-quality-dominates-execution.md +30 -0
  100. package/docs/lessons/0070-spec-echo-back-prevents-drift.md +31 -0
  101. package/docs/lessons/0071-positive-instructions-outperform-negative.md +30 -0
  102. package/docs/lessons/0072-lost-in-the-middle-context-placement.md +30 -0
  103. package/docs/lessons/0073-unscoped-lessons-cause-false-positives.md +30 -0
  104. package/docs/lessons/0074-stale-context-injection-wrong-batch.md +32 -0
  105. package/docs/lessons/0075-research-artifacts-must-persist.md +32 -0
  106. package/docs/lessons/0076-wrong-decomposition-contaminates-downstream.md +30 -0
  107. package/docs/lessons/0077-cherry-pick-merges-need-manual-resolution.md +30 -0
  108. package/docs/lessons/0078-static-review-without-live-test.md +30 -0
  109. package/docs/lessons/0079-integration-wiring-batch-required.md +32 -0
  110. package/docs/lessons/FRAMEWORK.md +161 -0
  111. package/docs/lessons/SUMMARY.md +201 -0
  112. package/docs/lessons/TEMPLATE.md +85 -0
  113. package/docs/plans/2026-02-21-code-factory-v2-design.md +204 -0
  114. package/docs/plans/2026-02-21-code-factory-v2-implementation-plan.md +2189 -0
  115. package/docs/plans/2026-02-21-code-factory-v2-phase4-design.md +537 -0
  116. package/docs/plans/2026-02-21-code-factory-v2-phase4-implementation-plan.md +2012 -0
  117. package/docs/plans/2026-02-21-hardening-pass-design.md +108 -0
  118. package/docs/plans/2026-02-21-hardening-pass-plan.md +1378 -0
  119. package/docs/plans/2026-02-21-mab-research-report.md +406 -0
  120. package/docs/plans/2026-02-21-marketplace-restructure-design.md +240 -0
  121. package/docs/plans/2026-02-21-marketplace-restructure-plan.md +832 -0
  122. package/docs/plans/2026-02-21-phase4-completion-plan.md +697 -0
  123. package/docs/plans/2026-02-21-validator-suite-design.md +148 -0
  124. package/docs/plans/2026-02-21-validator-suite-plan.md +540 -0
  125. package/docs/plans/2026-02-22-mab-research-round2.md +556 -0
  126. package/docs/plans/2026-02-22-mab-run-design.md +462 -0
  127. package/docs/plans/2026-02-22-mab-run-plan.md +2046 -0
  128. package/docs/plans/2026-02-22-operations-design-methodology-research.md +681 -0
  129. package/docs/plans/2026-02-22-research-agent-failure-taxonomy.md +532 -0
  130. package/docs/plans/2026-02-22-research-code-guideline-policies.md +886 -0
  131. package/docs/plans/2026-02-22-research-codebase-audit-refactoring.md +908 -0
  132. package/docs/plans/2026-02-22-research-coding-standards-documentation.md +541 -0
  133. package/docs/plans/2026-02-22-research-competitive-landscape.md +687 -0
  134. package/docs/plans/2026-02-22-research-comprehensive-testing.md +1076 -0
  135. package/docs/plans/2026-02-22-research-context-utilization.md +459 -0
  136. package/docs/plans/2026-02-22-research-cost-quality-tradeoff.md +548 -0
  137. package/docs/plans/2026-02-22-research-lesson-transferability.md +508 -0
  138. package/docs/plans/2026-02-22-research-multi-agent-coordination.md +312 -0
  139. package/docs/plans/2026-02-22-research-phase-integration.md +602 -0
  140. package/docs/plans/2026-02-22-research-plan-quality.md +428 -0
  141. package/docs/plans/2026-02-22-research-prompt-engineering.md +558 -0
  142. package/docs/plans/2026-02-22-research-unconventional-perspectives.md +528 -0
  143. package/docs/plans/2026-02-22-research-user-adoption.md +638 -0
  144. package/docs/plans/2026-02-22-research-verification-effectiveness.md +433 -0
  145. package/docs/plans/2026-02-23-agent-suite-design.md +299 -0
  146. package/docs/plans/2026-02-23-agent-suite-plan.md +578 -0
  147. package/docs/plans/2026-02-23-phase3-cost-infrastructure-design.md +148 -0
  148. package/docs/plans/2026-02-23-phase3-cost-infrastructure-plan.md +1062 -0
  149. package/docs/plans/2026-02-23-research-bash-expert-agent.md +543 -0
  150. package/docs/plans/2026-02-23-research-dependency-auditor-agent.md +564 -0
  151. package/docs/plans/2026-02-23-research-improving-existing-agents.md +503 -0
  152. package/docs/plans/2026-02-23-research-integration-tester-agent.md +454 -0
  153. package/docs/plans/2026-02-23-research-python-expert-agent.md +429 -0
  154. package/docs/plans/2026-02-23-research-service-monitor-agent.md +425 -0
  155. package/docs/plans/2026-02-23-research-shell-expert-agent.md +533 -0
  156. package/docs/plans/2026-02-23-roadmap-to-completion.md +530 -0
  157. package/docs/plans/2026-02-24-headless-module-split-design.md +98 -0
  158. package/docs/plans/2026-02-24-headless-module-split.md +443 -0
  159. package/docs/plans/2026-02-24-lesson-scope-metadata-design.md +228 -0
  160. package/docs/plans/2026-02-24-lesson-scope-metadata-plan.md +968 -0
  161. package/docs/plans/2026-02-24-npm-packaging-design.md +841 -0
  162. package/docs/plans/2026-02-24-npm-packaging-plan.md +1965 -0
  163. package/docs/plans/audit-findings.md +186 -0
  164. package/docs/telegram-notification-format.md +98 -0
  165. package/examples/example-plan.md +51 -0
  166. package/examples/example-prd.json +72 -0
  167. package/examples/example-roadmap.md +33 -0
  168. package/examples/quickstart-plan.md +63 -0
  169. package/hooks/hooks.json +26 -0
  170. package/hooks/setup-symlinks.sh +48 -0
  171. package/hooks/stop-hook.sh +135 -0
  172. package/package.json +47 -0
  173. package/policies/bash.md +71 -0
  174. package/policies/python.md +71 -0
  175. package/policies/testing.md +61 -0
  176. package/policies/universal.md +60 -0
  177. package/scripts/analyze-report.sh +97 -0
  178. package/scripts/architecture-map.sh +145 -0
  179. package/scripts/auto-compound.sh +273 -0
  180. package/scripts/batch-audit.sh +42 -0
  181. package/scripts/batch-test.sh +101 -0
  182. package/scripts/entropy-audit.sh +221 -0
  183. package/scripts/failure-digest.sh +51 -0
  184. package/scripts/generate-ast-rules.sh +96 -0
  185. package/scripts/init.sh +112 -0
  186. package/scripts/lesson-check.sh +428 -0
  187. package/scripts/lib/common.sh +61 -0
  188. package/scripts/lib/cost-tracking.sh +153 -0
  189. package/scripts/lib/ollama.sh +60 -0
  190. package/scripts/lib/progress-writer.sh +128 -0
  191. package/scripts/lib/run-plan-context.sh +215 -0
  192. package/scripts/lib/run-plan-echo-back.sh +231 -0
  193. package/scripts/lib/run-plan-headless.sh +396 -0
  194. package/scripts/lib/run-plan-notify.sh +57 -0
  195. package/scripts/lib/run-plan-parser.sh +81 -0
  196. package/scripts/lib/run-plan-prompt.sh +215 -0
  197. package/scripts/lib/run-plan-quality-gate.sh +132 -0
  198. package/scripts/lib/run-plan-routing.sh +315 -0
  199. package/scripts/lib/run-plan-sampling.sh +170 -0
  200. package/scripts/lib/run-plan-scoring.sh +146 -0
  201. package/scripts/lib/run-plan-state.sh +142 -0
  202. package/scripts/lib/run-plan-team.sh +199 -0
  203. package/scripts/lib/telegram.sh +54 -0
  204. package/scripts/lib/thompson-sampling.sh +176 -0
  205. package/scripts/license-check.sh +74 -0
  206. package/scripts/mab-run.sh +575 -0
  207. package/scripts/module-size-check.sh +146 -0
  208. package/scripts/patterns/async-no-await.yml +5 -0
  209. package/scripts/patterns/bare-except.yml +6 -0
  210. package/scripts/patterns/empty-catch.yml +6 -0
  211. package/scripts/patterns/hardcoded-localhost.yml +9 -0
  212. package/scripts/patterns/retry-loop-no-backoff.yml +12 -0
  213. package/scripts/pipeline-status.sh +197 -0
  214. package/scripts/policy-check.sh +226 -0
  215. package/scripts/prior-art-search.sh +133 -0
  216. package/scripts/promote-mab-lessons.sh +126 -0
  217. package/scripts/prompts/agent-a-superpowers.md +29 -0
  218. package/scripts/prompts/agent-b-ralph.md +29 -0
  219. package/scripts/prompts/judge-agent.md +61 -0
  220. package/scripts/prompts/planner-agent.md +44 -0
  221. package/scripts/pull-community-lessons.sh +90 -0
  222. package/scripts/quality-gate.sh +266 -0
  223. package/scripts/research-gate.sh +90 -0
  224. package/scripts/run-plan.sh +329 -0
  225. package/scripts/scope-infer.sh +159 -0
  226. package/scripts/setup-ralph-loop.sh +155 -0
  227. package/scripts/telemetry.sh +230 -0
  228. package/scripts/tests/run-all-tests.sh +52 -0
  229. package/scripts/tests/test-act-cli.sh +46 -0
  230. package/scripts/tests/test-agents-md.sh +87 -0
  231. package/scripts/tests/test-analyze-report.sh +114 -0
  232. package/scripts/tests/test-architecture-map.sh +89 -0
  233. package/scripts/tests/test-auto-compound.sh +169 -0
  234. package/scripts/tests/test-batch-test.sh +65 -0
  235. package/scripts/tests/test-benchmark-runner.sh +25 -0
  236. package/scripts/tests/test-common.sh +168 -0
  237. package/scripts/tests/test-cost-tracking.sh +158 -0
  238. package/scripts/tests/test-echo-back.sh +180 -0
  239. package/scripts/tests/test-entropy-audit.sh +146 -0
  240. package/scripts/tests/test-failure-digest.sh +66 -0
  241. package/scripts/tests/test-generate-ast-rules.sh +145 -0
  242. package/scripts/tests/test-helpers.sh +82 -0
  243. package/scripts/tests/test-init.sh +47 -0
  244. package/scripts/tests/test-lesson-check.sh +278 -0
  245. package/scripts/tests/test-lesson-local.sh +55 -0
  246. package/scripts/tests/test-license-check.sh +109 -0
  247. package/scripts/tests/test-mab-run.sh +182 -0
  248. package/scripts/tests/test-ollama-lib.sh +49 -0
  249. package/scripts/tests/test-ollama.sh +60 -0
  250. package/scripts/tests/test-pipeline-status.sh +198 -0
  251. package/scripts/tests/test-policy-check.sh +124 -0
  252. package/scripts/tests/test-prior-art-search.sh +96 -0
  253. package/scripts/tests/test-progress-writer.sh +140 -0
  254. package/scripts/tests/test-promote-mab-lessons.sh +110 -0
  255. package/scripts/tests/test-pull-community-lessons.sh +149 -0
  256. package/scripts/tests/test-quality-gate.sh +241 -0
  257. package/scripts/tests/test-research-gate.sh +132 -0
  258. package/scripts/tests/test-run-plan-cli.sh +86 -0
  259. package/scripts/tests/test-run-plan-context.sh +305 -0
  260. package/scripts/tests/test-run-plan-e2e.sh +153 -0
  261. package/scripts/tests/test-run-plan-headless.sh +424 -0
  262. package/scripts/tests/test-run-plan-notify.sh +124 -0
  263. package/scripts/tests/test-run-plan-parser.sh +217 -0
  264. package/scripts/tests/test-run-plan-prompt.sh +254 -0
  265. package/scripts/tests/test-run-plan-quality-gate.sh +222 -0
  266. package/scripts/tests/test-run-plan-routing.sh +178 -0
  267. package/scripts/tests/test-run-plan-scoring.sh +148 -0
  268. package/scripts/tests/test-run-plan-state.sh +261 -0
  269. package/scripts/tests/test-run-plan-team.sh +157 -0
  270. package/scripts/tests/test-scope-infer.sh +150 -0
  271. package/scripts/tests/test-setup-ralph-loop.sh +63 -0
  272. package/scripts/tests/test-telegram-env.sh +38 -0
  273. package/scripts/tests/test-telegram.sh +121 -0
  274. package/scripts/tests/test-telemetry.sh +46 -0
  275. package/scripts/tests/test-thompson-sampling.sh +139 -0
  276. package/scripts/tests/test-validate-all.sh +60 -0
  277. package/scripts/tests/test-validate-commands.sh +89 -0
  278. package/scripts/tests/test-validate-hooks.sh +98 -0
  279. package/scripts/tests/test-validate-lessons.sh +150 -0
  280. package/scripts/tests/test-validate-plan-quality.sh +235 -0
  281. package/scripts/tests/test-validate-plans.sh +187 -0
  282. package/scripts/tests/test-validate-plugin.sh +106 -0
  283. package/scripts/tests/test-validate-prd.sh +184 -0
  284. package/scripts/tests/test-validate-skills.sh +134 -0
  285. package/scripts/validate-all.sh +57 -0
  286. package/scripts/validate-commands.sh +67 -0
  287. package/scripts/validate-hooks.sh +89 -0
  288. package/scripts/validate-lessons.sh +98 -0
  289. package/scripts/validate-plan-quality.sh +369 -0
  290. package/scripts/validate-plans.sh +120 -0
  291. package/scripts/validate-plugin.sh +86 -0
  292. package/scripts/validate-policies.sh +42 -0
  293. package/scripts/validate-prd.sh +118 -0
  294. package/scripts/validate-skills.sh +96 -0
  295. package/skills/autocode/SKILL.md +285 -0
  296. package/skills/autocode/ab-verification.md +51 -0
  297. package/skills/autocode/code-quality-standards.md +37 -0
  298. package/skills/autocode/competitive-mode.md +364 -0
  299. package/skills/brainstorming/SKILL.md +97 -0
  300. package/skills/capture-lesson/SKILL.md +187 -0
  301. package/skills/check-lessons/SKILL.md +116 -0
  302. package/skills/dispatching-parallel-agents/SKILL.md +110 -0
  303. package/skills/executing-plans/SKILL.md +85 -0
  304. package/skills/finishing-a-development-branch/SKILL.md +201 -0
  305. package/skills/receiving-code-review/SKILL.md +72 -0
  306. package/skills/requesting-code-review/SKILL.md +59 -0
  307. package/skills/requesting-code-review/code-reviewer.md +82 -0
  308. package/skills/research/SKILL.md +145 -0
  309. package/skills/roadmap/SKILL.md +115 -0
  310. package/skills/subagent-driven-development/SKILL.md +98 -0
  311. package/skills/subagent-driven-development/code-quality-reviewer-prompt.md +18 -0
  312. package/skills/subagent-driven-development/implementer-prompt.md +73 -0
  313. package/skills/subagent-driven-development/spec-reviewer-prompt.md +57 -0
  314. package/skills/systematic-debugging/SKILL.md +134 -0
  315. package/skills/systematic-debugging/condition-based-waiting.md +64 -0
  316. package/skills/systematic-debugging/defense-in-depth.md +32 -0
  317. package/skills/systematic-debugging/root-cause-tracing.md +55 -0
  318. package/skills/test-driven-development/SKILL.md +167 -0
  319. package/skills/using-git-worktrees/SKILL.md +219 -0
  320. package/skills/using-superpowers/SKILL.md +54 -0
  321. package/skills/verification-before-completion/SKILL.md +140 -0
  322. package/skills/verify/SKILL.md +82 -0
  323. package/skills/writing-plans/SKILL.md +128 -0
  324. package/skills/writing-skills/SKILL.md +93 -0
@@ -0,0 +1,630 @@
1
+ # Architecture: How Autonomous Coding Works
2
+
3
+ > From idea to shipped branch with no manual copy-paste, no context degradation, and machine-verifiable completion.
4
+
5
+ ## The Core Problem
6
+
7
+ Claude Code has a context window. Long implementation tasks degrade quality as context fills. Manual workflows (copy-paste prompts, eyeball test results, hand-run quality checks) don't scale past 3-4 tasks. This system solves both problems:
8
+
9
+ 1. **Fresh context per unit of work** — each batch/task/iteration starts clean
10
+ 2. **Machine-verifiable gates** — no batch proceeds until tests pass and anti-patterns are absent
11
+ 3. **Resumability** — every state transition is persisted; any interruption is recoverable
12
+
13
+ ## System Overview
14
+
15
+ ```
16
+ IDEA
17
+
18
+
19
+ ┌─────────────────┐
20
+ │ ROADMAP │ Decompose multi-feature epics (conditional)
21
+ │ (Stage 0.5) │ Output: tasks/roadmap.md (dependency-ordered features)
22
+ └────────┬────────┘
23
+ │ (loops per feature)
24
+
25
+ ┌─────────────────┐
26
+ │ BRAINSTORMING │ Explore intent, ask questions, propose approaches
27
+ │ (Stage 1) │ Output: design doc (docs/plans/YYYY-MM-DD-*-design.md)
28
+ └────────┬────────┘
29
+
30
+
31
+ ┌─────────────────┐
32
+ │ RESEARCH │ Investigate unknowns, resolve blockers (conditional)
33
+ │ (Stage 1.5) │ Output: tasks/research-<slug>.md + .json
34
+ │ │ Gate: research-gate.sh (blocks if unresolved issues)
35
+ └────────┬────────┘
36
+
37
+
38
+ ┌─────────────────┐
39
+ │ GIT WORKTREE │ Isolated branch, baseline tests, clean workspace
40
+ │ (isolation) │ Output: worktree at .worktrees/<branch>
41
+ └────────┬────────┘
42
+
43
+ ┌────┴────┐
44
+ │ PRD │ Machine-verifiable acceptance criteria (tasks/prd.json)
45
+ │(optional)│ Every criterion is a shell command: exit 0 = pass
46
+ └────┬────┘
47
+
48
+
49
+ ┌─────────────────┐
50
+ │ WRITING PLANS │ TDD-structured tasks at 2-5 minute granularity
51
+ │ │ Output: plan file (docs/plans/YYYY-MM-DD-*.md)
52
+ └────────┬────────┘
53
+
54
+
55
+ ┌─────────────────────────────────────────────────┐
56
+ │ EXECUTION (choose one) │
57
+ │ │
58
+ │ A. Subagent-Driven Fresh agent per task, │
59
+ │ (same session) two-stage review each │
60
+ │ │
61
+ │ B. Executing-Plans Batch + human checkpoint │
62
+ │ (separate session) every 3 tasks │
63
+ │ │
64
+ │ C. run-plan.sh Headless bash loop, │
65
+ │ (unattended) claude -p per batch │
66
+ │ │
67
+ │ D. Ralph Loop Stop-hook loop, │
68
+ │ (autonomous) iterates until done │
69
+ │ │
70
+ │ ┌──────────────────────────────────────────┐ │
71
+ │ │ QUALITY GATE (between every batch) │ │
72
+ │ │ lesson-check → test suite → memory → │ │
73
+ │ │ test count regression → git clean │ │
74
+ │ └──────────────────────────────────────────┘ │
75
+ └────────────────────┬────────────────────────────┘
76
+
77
+
78
+ ┌─────────────────┐
79
+ │ VERIFICATION │ Evidence-based gate: run commands, read output,
80
+ │ (mandatory) │ confirm claim BEFORE making it
81
+ └────────┬────────┘
82
+
83
+
84
+ ┌─────────────────┐
85
+ │ FINISH BRANCH │ Merge | PR | Keep | Discard
86
+ │ │ Worktree cleanup
87
+ └─────────────────┘
88
+ ```
89
+
90
+ ## The Skill Chain
91
+
92
+ Each stage is a Claude Code skill — a prompt template that teaches Claude how to execute that stage. Skills are **rigid**: follow exactly, don't adapt away discipline.
93
+
94
+ ### Stage 1: Brainstorming
95
+
96
+ **Trigger:** Mandatory before any new feature, component, or behavior change. No exceptions.
97
+
98
+ **What happens:**
99
+ 1. Explore project context (files, docs, recent commits)
100
+ 2. Ask clarifying questions — one at a time, multiple choice preferred
101
+ 3. Propose 2-3 approaches with trade-offs and a recommendation
102
+ 4. Present design section by section, get user approval after each
103
+ 5. Write design doc to `docs/plans/YYYY-MM-DD-<topic>-design.md`
104
+
105
+ **Hard gate:** No code, no scaffold, no implementation skill until design is approved and written.
106
+
107
+ **Code Factory enhancement:** After design approval, generate `tasks/prd.json` — 8-15 granular tasks where every acceptance criterion is a shell command that exits 0 on pass. This creates the machine-verifiable contract that the quality gates enforce.
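The "every criterion is a shell command" contract can be pictured with a short sketch. It assumes the criteria have already been pulled out of `tasks/prd.json` and are passed in as plain strings (the real JSON schema is not shown in this document), and `run_criteria` is a hypothetical helper name, not a script from the toolkit.

```shell
# Hypothetical helper: run each acceptance criterion as a shell command.
# A criterion passes only if its command exits 0.
run_criteria() {
  local failed=0 criterion
  for criterion in "$@"; do
    if bash -c "$criterion" >/dev/null 2>&1; then
      echo "PASS: $criterion"
    else
      echo "FAIL: $criterion"
      failed=$((failed + 1))
    fi
  done
  # Succeed only when every criterion passed.
  [ "$failed" -eq 0 ]
}
```

A call such as `run_criteria "test -f README.md" "grep -q TODO notes.txt"` would print one PASS/FAIL line per criterion and return non-zero if any of them failed, which is the property a gate needs.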
108
+
109
+ ### Stage 2: Git Worktree Isolation
110
+
111
+ **Trigger:** Before executing any plan.
112
+
113
+ **What happens:**
114
+ 1. Create worktree: `git worktree add .worktrees/<branch> -b <branch-name>`
115
+ 2. Auto-detect and run project setup (npm install / pip install / etc.)
116
+ 3. Run baseline test suite — if tests fail, stop and report before proceeding
117
+
118
+ **Why:** Isolation means the main branch stays clean. Failed experiments are discardable. Multiple agents can work in separate worktrees without staging area conflicts (lesson #37).
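A minimal sketch of the three worktree steps, under stated assumptions: `detect_setup_cmd` and `create_worktree` are hypothetical names, the marker files checked are only two common cases, and the baseline test command is supplied by the caller rather than auto-detected.

```shell
# Hypothetical helper: pick a setup command from marker files in a directory.
detect_setup_cmd() {
  local dir="$1"
  if [ -f "$dir/package.json" ]; then
    echo "npm install"
  elif [ -f "$dir/requirements.txt" ]; then
    echo "pip install -r requirements.txt"
  else
    echo ""   # nothing detected; the caller decides what to do
  fi
}

# Hypothetical wrapper: create the worktree, run setup, run the baseline tests.
create_worktree() {
  local branch="$1" test_cmd="$2"
  local path=".worktrees/$branch"
  local setup
  git worktree add "$path" -b "$branch" || return 1
  setup="$(detect_setup_cmd "$path")"
  if [ -n "$setup" ]; then
    (cd "$path" && bash -c "$setup") || return 1
  fi
  # The baseline must be green before any plan work starts.
  (cd "$path" && bash -c "$test_cmd") || {
    echo "Baseline tests failed in $path; stopping." >&2
    return 1
  }
}
```

Failing fast on a red baseline keeps later test failures attributable to the plan's own changes rather than to pre-existing breakage.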
119
+
120
+ ### Stage 3: Writing Plans
121
+
122
+ **Trigger:** After approved design, before touching code.
123
+
124
+ **What happens:**
125
+ 1. Read the design doc
126
+ 2. Produce a plan file with TDD-structured tasks at 2-5 minute granularity
127
+ 3. Each task specifies: exact file paths, complete code, exact commands with expected output
128
+ 4. Every task follows: write failing test → confirm fail → implement → confirm pass → commit
129
+
130
+ **Plan format:**
131
+ ```markdown
132
+ ## Batch 1: Title
133
+ ### Task 1: Name
134
+ [full task description with exact files and commands]
135
+
136
+ ### Task 2: Name
137
+ ...
138
+
139
+ ## Batch 2: Title
140
+ ...
141
+ ```
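Because batches are delimited by `## Batch N:` headings, pulling one batch out of a plan is a small text-processing job. The sketch below is one possible way to do it with `awk`; `extract_batch` is a hypothetical name and is not claimed to match the toolkit's own parser. It ignores headings that appear inside fenced code blocks, and it builds the fence marker with `printf` so the example itself contains no literal fence.

```shell
# Hypothetical helper: print the heading and body of batch N from a plan file.
extract_batch() {
  local plan_file="$1" batch="$2"
  local fence
  fence="$(printf '\140\140\140')"   # three backticks, spelled in octal
  awk -v n="$batch" -v fence="$fence" '
    # Toggle fence state on any line that starts with the fence marker.
    index($0, fence) == 1 { in_fence = !in_fence }
    # Any level-2 heading outside a fence ends the current batch.
    !in_fence && /^## / {
      printing = 0
      if ($2 == "Batch") {
        num = $3
        sub(/:$/, "", num)      # "2:" becomes "2"
        printing = (num == n)
      }
    }
    printing { print }
  ' "$plan_file"
}
```

An empty result for a batch number signals that the plan has no such batch, which a caller can treat as the end of the plan.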
142
+
143
+ **Code Factory enhancement:** Plan must include a `## Quality Gates` section listing checks to run between batches, cross-references to `tasks/prd.json` task IDs, and `progress.txt` initialization as the first step.
144
+
145
+ ### Stage 4: Execution
146
+
147
+ Four execution modes, each solving a different problem, plus a Multi-Armed Bandit variant (Mode E) that is enabled on top of `run-plan.sh`:
148
+
149
+ #### Mode A: Subagent-Driven Development (same session)
150
+
151
+ **Best for:** Plans with 5-15 independent tasks where you want to watch progress.
152
+
153
+ **How it works:**
154
+ ```
155
+ For each task:
156
+ 1. Spawn implementer agent (Task tool, general-purpose)
157
+ - Receives full task text (never reads plan file)
158
+ - Implements using TDD, commits
159
+ - Self-reviews before reporting
160
+
161
+ 2. Spawn spec compliance reviewer
162
+ - Reads actual code (does NOT trust implementer's report)
163
+ - Checks: nothing missing, nothing extra vs. spec
164
+ - If gaps → implementer fixes → re-review
165
+
166
+ 3. Spawn code quality reviewer
167
+ - Only runs AFTER spec compliance passes
168
+ - Checks: naming, patterns, clean code
169
+ - If issues → implementer fixes → re-review
170
+
171
+ After all tasks:
172
+ 4. Spawn final code reviewer for entire implementation
173
+ ```
174
+
175
+ **Key constraint:** Never dispatch multiple implementer agents in parallel on the same worktree. Parallel commits corrupt the staging area (lesson #37).
176
+
177
+ #### Mode B: Executing Plans (separate session, batch + checkpoint)
178
+
179
+ **Best for:** Plans you want to execute in a fresh session with human review between batches.
180
+
181
+ **How it works:**
182
+ ```
183
+ 1. Load plan, create task list
184
+ 2. Execute first 3 tasks as a batch
185
+ 3. Report: what was implemented + verification output
186
+ 4. Say "Ready for feedback" — wait for user
187
+ 5. Apply feedback, execute next batch
188
+ 6. Repeat until done
189
+ ```
190
+
191
+ **Key constraint:** Stops immediately if blocked, plan has gaps, or verification fails. Asks rather than guesses.
192
+
193
+ #### Mode C: Headless Bash (`run-plan.sh`)
194
+
195
+ **Best for:** Long plans (10+ batches) where you want to walk away.
196
+
197
+ **How it works:**
198
+ ```bash
199
+ for ((batch = start; batch <= end; batch++)); do
200
+     prompt="$(parse_plan "$plan_file" "$batch")"   # build the prompt for this batch
201
+     claude -p "$prompt" --allowedTools Bash,Read,Write,Edit,Grep,Glob
202
+     run_quality_gate || handle_failure
203
+     update_state_file
204
+     # optional: telegram_notify
205
+ done
206
+ ```
207
+
208
+ Each `claude -p` is a fresh process with a fresh context window. No degradation over 13 batches because there's no accumulated context.
209
+
210
+ **Retry escalation:** Each retry adds more context about the previous failure. Attempt 1 gets the task. Attempt 2 gets the task + "previous attempt failed." Attempt 3 gets the task + the last 50 lines of attempt 2's log.
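The escalation can be sketched as a prompt builder keyed on the attempt number. `build_retry_prompt` and the exact wording of the injected notes are assumptions made for illustration; only the three-step shape comes from the description above.

```shell
# Hypothetical helper: build the prompt for a given attempt.
# task: the batch text; attempt: 1, 2, or 3; prev_log: path to the previous attempt's log.
build_retry_prompt() {
  local task="$1" attempt="$2" prev_log="${3:-}"
  case "$attempt" in
    1) printf '%s\n' "$task" ;;
    2) printf '%s\n\n%s\n' "$task" "Note: the previous attempt failed." ;;
    *) printf '%s\n\nLast 50 lines of the previous attempt log:\n' "$task"
       if [ -n "$prev_log" ] && [ -f "$prev_log" ]; then
         tail -n 50 "$prev_log"
       fi ;;
  esac
}
```

Keeping the first retry cheap and attaching log output only on the later one keeps the prompt small in the common case where a plain retry is enough.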
211
+
212
+ **State management:** `.run-plan-state.json` tracks completed batches, test counts, and quality gate results. `--resume` picks up where it left off.
213
+
214
+ **Sub-modes within headless:**
215
+
216
+ | Mode | Flag | Architecture |
217
+ |------|------|-------------|
218
+ | Headless | `--mode headless` (default) | Bash loop, `claude -p` per batch |
219
+ | Team | `--mode team` | Leader session spawns implementer + reviewer agents per batch |
220
+ | Competitive | `--mode competitive` | Two agents implement same batch in separate worktrees, judge picks winner |
221
+ | MAB | `--mab` | Thompson Sampling routes to best strategy; uncertain batches trigger competitive dual-track |
222
+
223
+ **Competitive dual-track** (for critical batches):
224
+ ```
225
+ Leader
226
+ ├── git worktree: competitor-a
227
+ ├── git worktree: competitor-b
228
+ ├── Agent A implements (subagent-dev style)
229
+ ├── Agent B implements (ralph style)
230
+ ├── Both finish in parallel (separate worktrees = safe)
231
+ ├── Judge agent compares:
232
+ │ Tests pass (binary gate)
233
+ │ Spec compliance (0.4)
234
+ │ Code quality (0.3)
235
+ │ Test coverage (0.3)
236
+ ├── Cherry-pick winner into main worktree
237
+ └── Cleanup both competitor worktrees
238
+ ```
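The judge's weighting can be read as a gated weighted sum. The sketch below assumes each dimension is scored in [0, 1], that a failed test gate zeroes the score, and that ties go to competitor A; none of those details are stated in the diagram, and `judge_score` and `pick_winner` are hypothetical names.

```shell
# Hypothetical scoring helper. tests_pass is the binary gate: "1" means the suite passed.
judge_score() {
  local tests_pass="$1" spec="$2" quality="$3" coverage="$4"
  if [ "$tests_pass" != "1" ]; then
    echo "0.00"   # assumption: failing tests disqualify the candidate outright
    return 0
  fi
  awk -v s="$spec" -v q="$quality" -v c="$coverage" \
    'BEGIN { printf "%.2f\n", 0.4 * s + 0.3 * q + 0.3 * c }'
}

# Arguments: gate and three scores for competitor A, then the same four for competitor B.
pick_winner() {
  local a b
  a="$(judge_score "$1" "$2" "$3" "$4")"
  b="$(judge_score "$5" "$6" "$7" "$8")"
  # awk exits 0 when A's score is at least B's (ties go to A, an arbitrary choice).
  if awk -v a="$a" -v b="$b" 'BEGIN { exit !(a >= b) }'; then
    echo "competitor-a"
  else
    echo "competitor-b"
  fi
}
```

For example, spec 0.9, quality 0.6, coverage 0.8 gives 0.4 × 0.9 + 0.3 × 0.6 + 0.3 × 0.8 = 0.78.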
239
+
240
+ #### Mode E: Multi-Armed Bandit (`--mab`)
241
+
242
+ **Best for:** Plans where you want the system to learn which execution strategy works best per batch type.
243
+
244
+ **How it works:** Thompson Sampling routes each batch to either "superpowers" (TDD-style subagent) or "ralph" (iterative loop) strategy. Uncertain batches trigger competitive dual-track execution where both strategies run in parallel worktrees and an LLM judge picks the winner.
245
+
246
+ ```
247
+ Batch arrives
248
+
249
+ ├── strategy-perf.json has < 5 data points per strategy
250
+ │ → "mab" — compete (both strategies, parallel worktrees)
251
+
252
+ ├── integration batch type
253
+ │ → "mab" — always compete (most variable outcome)
254
+
255
+ ├── Clear winner (≥70% win rate, 10+ data points)
256
+ │ → route directly to winning strategy
257
+
258
+ └── Otherwise
259
+ → Thompson sample from Beta(wins+1, losses+1) for each strategy
260
+ → route to highest sample
261
+ ```
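The decision tree above translates fairly directly into a routing function. This is a sketch, not the logic in `thompson-sampling.sh`: it assumes "data points" means wins plus losses for a strategy, that the 10-point threshold is counted per strategy, and that counters are passed as arguments rather than read from `strategy-perf.json`. `route_batch` is a hypothetical name.

```shell
# Prints "mab" (compete), "superpowers", "ralph", or "sample" (fall through to Thompson sampling).
route_batch() {
  local batch_type="$1"
  local sp_wins="$2" sp_losses="$3" ralph_wins="$4" ralph_losses="$5"
  local sp_total=$((sp_wins + sp_losses))
  local ralph_total=$((ralph_wins + ralph_losses))

  # Too little data on either strategy: run both and let the judge decide.
  if [ "$sp_total" -lt 5 ] || [ "$ralph_total" -lt 5 ]; then
    echo "mab"; return 0
  fi
  # Integration batches always compete.
  if [ "$batch_type" = "integration" ]; then
    echo "mab"; return 0
  fi
  # Clear winner: at least 70% win rate over 10 or more data points.
  # Integer form of wins/total >= 0.7 is wins*10 >= total*7.
  if [ "$sp_total" -ge 10 ] && [ $((sp_wins * 10)) -ge $((sp_total * 7)) ]; then
    echo "superpowers"; return 0
  fi
  if [ "$ralph_total" -ge 10 ] && [ $((ralph_wins * 10)) -ge $((ralph_total * 7)) ]; then
    echo "ralph"; return 0
  fi
  echo "sample"
}
```

Checking the cheap deterministic branches first means random sampling only happens when the counters do not already settle the question.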
262
+
263
+ **Key components:**
264
+ - **`scripts/lib/thompson-sampling.sh`** — Beta approximation using Box-Muller, routing logic with calibration thresholds
265
+ - **`logs/strategy-perf.json`** — Win/loss counters per strategy per batch type (new-file, refactoring, integration, test-only)
266
+ - **`logs/mab-lessons.json`** — Patterns the LLM judge observes during competitive runs (auto-promoted at 3+ occurrences)
267
+ - **Human calibration** — First 10 decisions default to competitive mode to build a baseline before the sampling model takes over
268
+ - **Quality gate override** — If the judge's pick fails the quality gate but the loser passes, the loser wins regardless of judge score
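For the sampling step itself, here is a small sketch of drawing from Beta(wins+1, losses+1) for both strategies and routing to the larger draw. Note the swap: the toolkit is described as using a Box-Muller based approximation, whereas this example uses a different technique that is exact for integer parameters (a Gamma variate with integer shape k is the sum of k exponential draws, and X/(X+Y) of two such variates is Beta distributed). `thompson_pick` and its optional seed argument are hypothetical.

```shell
# Prints "<strategy> <superpowers sample> <ralph sample>".
thompson_pick() {
  local sp_wins="$1" sp_losses="$2" ralph_wins="$3" ralph_losses="$4" seed="${5:-}"
  awk -v sw="$sp_wins" -v sl="$sp_losses" -v rw="$ralph_wins" -v rl="$ralph_losses" \
      -v seed="$seed" '
    # Gamma(k, 1) for integer k: sum of k Exponential(1) draws.
    function gamma_int(k,    i, total) {
      total = 0
      for (i = 0; i < k; i++) total += -log(1 - rand())
      return total
    }
    # Beta(a, b) for integer a and b, built from two Gamma draws.
    function beta_int(a, b,    x, y) {
      x = gamma_int(a); y = gamma_int(b)
      return x / (x + y)
    }
    BEGIN {
      if (seed != "") srand(seed); else srand()
      sp = beta_int(sw + 1, sl + 1)
      ralph = beta_int(rw + 1, rl + 1)
      printf "%s %.4f %.4f\n", (sp >= ralph ? "superpowers" : "ralph"), sp, ralph
    }'
}
```

Both draws happen inside one `awk` process on purpose: `srand()` without an argument seeds from the time of day, so two separate invocations within the same second could start from the same seed and produce correlated samples.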
269
+
270
+ Enable with `--mab` flag on `run-plan.sh`.
271
+
272
+ #### Mode D: Ralph Loop (autonomous iteration)
273
+
274
+ **Best for:** Tasks with clear boolean success criteria. "Make all tests pass." "Implement everything in prd.json."
275
+
276
+ **How it works:** Uses a **Stop hook** — a shell script that intercepts Claude's attempt to exit the session and re-injects the original prompt. Claude sees its own previous work in files and git history, iterates, and improves.
277
+
278
+ ```
279
+ 1. User: /ralph-loop "Build X. Output <promise>COMPLETE</promise> when done."
280
+ 2. Claude works on the task
281
+ 3. Claude tries to exit
282
+ 4. Stop hook intercepts → re-injects prompt
283
+ 5. Claude sees previous work, continues
284
+ 6. Loop exits ONLY when completion promise string appears
285
+ ```
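The loop above can be simulated in a few lines of Python. This is a sketch of the control flow only, not the Stop hook itself: `run_iteration` is a hypothetical stand-in for one Claude session, and `max_iterations` is an added safety cap that the steps above do not mention.

```python
import re

PROMISE_RE = re.compile(r"<promise>(.*?)</promise>", re.DOTALL)


def promise_fulfilled(output, expected="COMPLETE"):
    """True if the output contains <promise>EXPECTED</promise>."""
    return any(m.strip() == expected for m in PROMISE_RE.findall(output))


def ralph_loop(run_iteration, prompt, max_iterations=10):
    """Re-inject the same prompt until the completion promise appears.

    Returns the iteration number that completed, or None if the
    (hypothetical) cap was reached first.
    """
    for iteration in range(1, max_iterations + 1):
        output = run_iteration(prompt, iteration)
        if promise_fulfilled(output):
            return iteration
    return None
```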
286
+
287
+ **Quality gates in Ralph:** The `--quality-checks` flag runs shell commands between iterations. Combined with the `--prd` flag, it checks the acceptance criteria in `tasks/prd.json`.
288
+
289
+ **`progress.txt`:** Auto-created by Ralph setup. Read at the start of each iteration, which gives Claude memory across context resets, and appended to at the end of each iteration.
290
+
291
+ ### Stage 5: Verification
292
+
293
+ **Trigger:** Before claiming ANY work is complete. Applies to explicit claims and to statements that merely imply success.
294
+
295
+ **The Iron Law:** No completion claim without fresh verification evidence. If you haven't run the command in this turn, you cannot claim it passes.
296
+
297
+ **Five mandatory steps:**
298
+ 1. **IDENTIFY** — what command proves this claim?
299
+ 2. **RUN** — execute the full command fresh (not from cache or memory)
300
+ 3. **READ** — full output, check exit code, count failures
301
+ 4. **VERIFY** — does output actually confirm the claim?
302
+ 5. **ONLY THEN** — make the claim
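The five steps can be made concrete with a small helper that refuses to report success without a fresh run. A minimal sketch, assuming a hypothetical `verify_claim` function; the evidence dictionary and the default timeout are illustrative choices, not part of the toolkit.

```python
import subprocess
import sys


def verify_claim(command, timeout=600):
    """Run the proving command fresh and return the evidence.

    command: argument list, e.g. ["pytest", "-x"].
    """
    # RUN: always execute; never reuse a cached or remembered result.
    result = subprocess.run(command, capture_output=True, text=True, timeout=timeout)
    # READ: keep the full output and the exit code together.
    return {
        "command": command,
        "exit_code": result.returncode,
        "output": result.stdout + result.stderr,
        # VERIFY: here a zero exit code is the only accepted confirmation.
        "verified": result.returncode == 0,
    }


if __name__ == "__main__":
    evidence = verify_claim([sys.executable, "-c", "print('ok')"])
    print(evidence["exit_code"], evidence["verified"])
```

An exit-code check is the weakest form of the VERIFY step; commands whose output must be inspected (for example, counting failures) need an additional parser.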
303
+
304
+ **Code Factory extension:** Run ALL `tasks/prd.json` acceptance criteria. Every task must have `"passes": true`. Include quality gate results as evidence.
305
+
306
+ **Local extension (`/verify` skill):**
307
+ - Integration wiring check
308
+ - Lesson-scanner agent against changed files
309
+ - Horizontal sweep: every endpoint/CLI command
310
+ - Vertical trace: one real input through entire stack
311
+ - Checklist of specific lessons to verify (#11, #16, #34, #43, etc.)
312
+
313
+ ### Stage 6: Finish Branch
314
+
315
+ **Trigger:** All tasks complete, all tests verified passing.
316
+
317
+ **What happens:**
318
+ 1. Run test suite — if failing, STOP (do not present options)
319
+ 2. Present exactly 4 options:
320
+ - **Merge** locally (cleanup worktree)
321
+ - **Push + PR** (keep worktree)
322
+ - **Keep** branch as-is (keep worktree)
323
+ - **Discard** (requires typed confirmation, cleanup worktree)
324
+ 3. Execute chosen option
325
+ 4. Clean up worktree for merge and discard only
326
+
327
+ ## Quality Gate Pipeline
328
+
329
+ Quality gates run between every batch in every execution mode. They are the enforcement mechanism that prevents degradation.
330
+
331
+ ```
332
+ ┌─────────────────────────────┐
333
+ │ lesson-check.sh │ Syntactic anti-pattern scan
334
+ │ (<2 seconds, grep-based) │ 6 checks from real bugs:
335
+ │ │ - bare except without logging
336
+ │ │ - async def without await
337
+ │ │ - create_task without done_callback
338
+ │ │ - hub.cache direct access
339
+ │ │ - HA automation singular keys
340
+ │ │ - .venv/bin/pip wrong path
341
+ └──────────┬──────────────────┘
342
+ │ if clean
343
+
344
+ ┌─────────────────────────────┐
345
+ │ ast-grep patterns │ 5 structural code patterns:
346
+ │ (scripts/patterns/*.yml) │ - bare-except, empty-catch
347
+ │ │ - async-no-await
348
+ │ │ - retry-loop-no-backoff
349
+ │ │ - hardcoded-localhost
350
+ └──────────┬──────────────────┘
351
+ │ if clean
352
+
353
+ ┌─────────────────────────────┐
354
+ │ Test suite │ Auto-detected:
355
+ │ (pytest / npm test / make) │ pytest / npm test / make test
356
+ └──────────┬──────────────────┘
357
+ │ if pass
358
+
359
+ ┌─────────────────────────────┐
360
+ │ Memory check │ Advisory: warn if < 4GB
361
+ │ (never fails) │ available (OOM prevention)
362
+ └──────────┬──────────────────┘
363
+
364
+
365
+ ┌─────────────────────────────┐
366
+ │ Test count regression │ new_count >= previous_count
367
+ │ (monotonic enforcement) │ Catches: deleted tests,
368
+ │ │ broken test discovery
369
+ └──────────┬──────────────────┘
370
+ │ if no regression
371
+
372
+ ┌─────────────────────────────┐
373
+ │ Git clean check │ All changes committed
374
+ │ │ No leftover unstaged work
375
+ └──────────┬──────────────────┘
376
+ │ if clean
377
+
378
+ ┌─────────────────────────────┐
379
+ │ MAB lessons injection │ Inject judge observations
380
+ │ (--mab mode only) │ from logs/mab-lessons.json
381
+ │ │ into next batch context
382
+ └──────────┬──────────────────┘
383
+ │ if clean
384
+
385
+ ✅ PASS → next batch
386
+ ```
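The pipeline above is a fail-fast sequence in which only the memory check is advisory. The following Python sketch shows that shape under stated assumptions: `run_gates`, `count_is_monotonic`, and the `(name, check, blocking)` tuple layout are hypothetical and do not mirror the toolkit's shell scripts.

```python
def count_is_monotonic(new_count, previous_count):
    """Test count regression gate: the count may never decrease."""
    return new_count >= previous_count


def run_gates(gates):
    """Run gates in order and stop at the first blocking failure.

    gates: ordered list of (name, check, blocking), where check() -> bool.
    Advisory gates (blocking=False) only add a warning.
    """
    warnings = []
    for name, check, blocking in gates:
        if check():
            continue
        if blocking:
            return {"passed": False, "failed_gate": name, "warnings": warnings}
        warnings.append(name)
    return {"passed": True, "failed_gate": None, "warnings": warnings}
```

Later gates are never evaluated after a blocking failure, which keeps a fast grep-based check from being followed by an expensive test run on code that is already known to be bad.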
387
+
388
+ ## State & Persistence
389
+
390
+ Three core persistence mechanisms prevent data loss across context resets; five log files under `logs/` add cross-run learning and traceability:
391
+
392
+ ### `.run-plan-state.json` (execution state)
393
+ ```json
394
+ {
395
+ "plan_file": "docs/plans/feature.md",
396
+ "current_batch": 5,
397
+ "completed_batches": [1, 2, 3, 4],
398
+ "test_counts": {"1": 10, "2": 25, "3": 42, "4": 58},
399
+ "last_quality_gate": {"batch": 4, "passed": true, "test_count": 58}
400
+ }
401
+ ```
402
+ Written after every batch. Enables `--resume`.
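A sketch of how a state file like this can support resuming, written in Python for illustration. The helper names `save_state` and `next_batch` are hypothetical, and the write-to-temp-then-rename step is an assumption about how to keep the file intact if the process is killed mid-write; it is not confirmed to be what `run-plan.sh` does.

```python
import json
import os
import tempfile


def save_state(path, state):
    """Write the state file atomically (temp file, then rename)."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        json.dump(state, handle, indent=2)
    os.replace(tmp_path, path)  # atomic within one filesystem


def next_batch(path, total_batches):
    """Return the first batch not yet completed, or None if all are done."""
    if not os.path.exists(path):
        return 1
    with open(path) as handle:
        state = json.load(handle)
    done = set(state.get("completed_batches", []))
    for batch in range(1, total_batches + 1):
        if batch not in done:
            return batch
    return None
```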
403
+
404
+ ### `progress.txt` (discovery log)
405
+ Append-only file written by the executing agent. Contains:
406
+ - Batch summaries (what was done, what was discovered)
407
+ - Decisions made during implementation
408
+ - Issues encountered and how they were resolved
409
+
410
+ Read at the start of each batch/iteration to give the agent memory across context resets. This is how a headless `claude -p` process (which has no memory of previous batches) knows what happened before.
411
+
412
+ ### `tasks/prd.json` (acceptance criteria tracker)
413
+ ```json
414
+ [
415
+ {
416
+ "id": 1,
417
+ "title": "Implement parser",
418
+ "acceptance_criteria": ["pytest tests/test_parser.py -x"],
419
+ "passes": false
420
+ }
421
+ ]
422
+ ```
423
+ Updated after each batch. `"passes": true` is set when all acceptance criteria exit 0. The verification stage (Stage 5) requires every task to pass.
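The update rule can be sketched as follows. The helper names `run_shell` and `update_prd` are hypothetical, and treating a task with no acceptance criteria as not passing is an assumption made here to avoid vacuous success.

```python
import subprocess


def run_shell(command):
    """Run one acceptance criterion and return its exit code."""
    return subprocess.run(command, shell=True, capture_output=True).returncode


def update_prd(tasks, run=run_shell):
    """Set "passes" on each task; return True only if every task passes.

    tasks: list of dicts shaped like the prd.json entries above.
    """
    for task in tasks:
        criteria = task.get("acceptance_criteria", [])
        # Assumption: an empty criteria list does not count as passing.
        task["passes"] = bool(criteria) and all(run(c) == 0 for c in criteria)
    return all(task["passes"] for task in tasks)
```

The `run` parameter exists so the rule can be exercised without executing real commands.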
424
+
425
+ ### `logs/failure-patterns.json` (cross-run failure learning)
426
+ Tracks failure types, frequencies, and winning fixes indexed by batch title pattern. Fed into the next run's context injection so agents don't repeat the same mistakes.
427
+
428
+ ### `logs/routing-decisions.log` (execution traceability)
429
+ Append-only log of mode selection, model routing, and parallelism scores for each batch. Enables post-run analysis of why specific strategies were chosen.
430
+
431
+ ### `logs/sampling-outcomes.json` (prompt variant learning)
432
+ Records which sampling strategy (prompt variant) won per batch type. Used by `--sample N` to weight future variant selection.
433
+
434
+ ### `logs/strategy-perf.json` (MAB Thompson Sampling data)
435
+ Win/loss counters per strategy (superpowers, ralph) per batch type (new-file, refactoring, integration, test-only). The Thompson Sampling routing in `--mab` mode reads this to decide whether to compete or route directly.
436
+
437
+ ### `logs/mab-lessons.json` (MAB judge observations)
438
+ Patterns observed by the LLM judge during competitive runs. When a pattern reaches 3+ occurrences, it is auto-promoted into the context injection for future batches.
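The promotion rule is a threshold on occurrence counts. A minimal sketch, assuming observations arrive as a flat list of pattern strings; the real layout of `logs/mab-lessons.json` is not shown here, and `promoted_patterns` is a hypothetical name.

```python
from collections import Counter

PROMOTION_THRESHOLD = 3  # "3+ occurrences" from the description above


def promoted_patterns(observations):
    """Return the patterns seen at least PROMOTION_THRESHOLD times, sorted."""
    counts = Counter(observations)
    return sorted(p for p, n in counts.items() if n >= PROMOTION_THRESHOLD)
```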
439
+
440
+ ## Feedback Loops
441
+
442
+ ### Lessons → Checks → Gates → Enforcement
443
+
444
+ ```
445
+ Bug happens
446
+
447
+
448
+ Lesson captured (docs/lessons/YYYY-MM-DD-*.md)
449
+ │ Using Army OIL taxonomy: Observation → Insight → Lesson → Lesson Learned
450
+
451
+
452
+ Pattern identified
453
+
454
+ ├─ Syntactic pattern (grep-detectable, near-zero false positives)
455
+ │ → Add to lesson-check.sh
456
+ │ → Enforced by quality gate on every batch
457
+
458
+ ├─ Semantic pattern (needs context, AI-detectable)
459
+ │ → Add to lesson-scanner agent
460
+ │ → Run during verification stage
461
+
462
+ └─ Behavioral pattern (process/workflow)
463
+ → Add hookify rule
464
+ → Enforced at tool-call time (pre-write, pre-commit)
465
+ ```
466
+
467
+ ### Community Lesson Loop
468
+
469
+ Every user's production failures improve every other user's agent:
470
+
471
+ ```
472
+ User encounters bug
473
+
474
+
475
+ /submit-lesson command
476
+ │ Captures anti-pattern, generates structured YAML lesson file
477
+
478
+
479
+ PR opened against toolkit repo
480
+ │ Maintainer reviews regex accuracy, severity, category
481
+
482
+
483
+ Lesson file merged to docs/lessons/
484
+
485
+ ├─ pattern.type: syntactic
486
+ │ → lesson-check.sh reads regex from YAML, runs grep
487
+ │ → Enforced by quality gate on every batch (<2s)
488
+
489
+ └─ pattern.type: semantic
490
+ → lesson-scanner agent reads description + example
491
+ → Run during verification stage (AI-assisted analysis)
492
+
493
+
494
+ Every user's next scan catches that anti-pattern
495
+ ```
496
+
497
+ Adding a lesson file is all it takes — no code changes to the scanner or check script.
498
+
499
+ ### Scope Metadata (Project-Level Filtering)
500
+
501
+ Not every lesson applies to every project. The `scope:` field on each lesson enables project-level filtering so lessons only fire where they're relevant.
502
+
503
+ **How it works:**
504
+
505
+ 1. Each lesson has a `scope:` YAML field with tags like `[universal]`, `[language:python]`, `[project:ha-aria]`
506
+ 2. Each project's `CLAUDE.md` declares `## Scope Tags` (e.g., `language:python, framework:pytest, project:ha-aria`)
507
+ 3. `detect_project_scope()` reads `CLAUDE.md` from the working directory and extracts these tags
508
+ 4. `scope_matches()` computes the intersection — a lesson applies if any of its scope tags match the project's tags (or if the lesson is `[universal]`)
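Steps 3 and 4 can be illustrated with Python stand-ins named after the functions above. The toolkit's own implementations are not shown in this document, so the parsing details here (comma- or newline-separated tags under a `## Scope Tags` heading) are assumptions.

```python
import re

SCOPE_SECTION_RE = re.compile(
    r"^## Scope Tags\s*\n(.*?)(?=^#|\Z)", re.MULTILINE | re.DOTALL
)


def detect_project_scope(claude_md_text):
    """Extract the tag set declared under '## Scope Tags' in CLAUDE.md text."""
    match = SCOPE_SECTION_RE.search(claude_md_text)
    if not match:
        return set()
    return {t.strip() for t in re.split(r"[,\n]", match.group(1)) if t.strip()}


def scope_matches(lesson_scope, project_scope):
    """A lesson applies if it is universal or shares a tag with the project."""
    lesson = set(lesson_scope)
    return "universal" in lesson or bool(lesson & set(project_scope))
```

For example, a lesson scoped `[language:python]` applies to a project tagged `language:python, framework:pytest`, while one scoped `[language:go]` does not.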
509
+
510
+ **CLI flags on `lesson-check.sh`:**
511
+ - `--all-scopes` — Ignore scope filtering, scan everything (useful for cross-project audits)
512
+ - `--show-scope` — Display the scope tags for each matched lesson
513
+ - `--scope <tags>` — Override project scope detection with explicit tags
514
+
515
+ **Design rationale:** Without scope metadata, false positives compound — at ~100 lessons, research shows 67% of flagged violations are irrelevant to the current project. Scope filtering keeps the signal-to-noise ratio high as the lesson library grows.
516
+
517
+ ### Hookify (Real-Time Enforcement)
518
+
519
+ Hookify rules run on every file write and commit. They are the last line of defense:
520
+ - **bare-except:** Block writes containing `except:` without logging
521
+ - **test-counts:** Warn on hardcoded test count assertions
522
+ - **venv-pip:** Warn on `.venv/bin/pip` (use `.venv/bin/python -m pip`)
523
+ - **secrets:** Block writes containing values from `~/.env`
524
+ - **force-push:** Block `git push --force` and `-f`
525
+
526
+ Design rule: Syntactic patterns (near-zero false positives) → lesson files with `pattern.type: syntactic` → `lesson-check.sh`. Semantic patterns (needs context) → lesson files with `pattern.type: semantic` → `lesson-scanner` agent. Reserve hookify for behavioral/workflow enforcement (process violations, security boundaries).
527
+
528
+ ## Agent Suite
529
+
530
+ The toolkit ships with 7 agents in the `agents/` directory, dispatched via Claude Code's Task tool. Each serves a distinct role in the quality pipeline.
531
+
532
+ | Agent | Model | Purpose | When to Use |
533
+ |-------|-------|---------|-------------|
534
+ | `lesson-scanner` | sonnet | Dynamic anti-pattern scan from lesson files | Verification stage, post-commit audit |
535
+ | `bash-expert` | sonnet | Review, write, debug bash scripts | .sh files, CI steps, Makefile targets |
536
+ | `shell-expert` | sonnet | Diagnose systemd, PATH, permissions | Service failures, environment issues |
537
+ | `python-expert` | sonnet | Async discipline, resource lifecycle, type safety | Python code review, HA/Telegram ecosystem |
538
+ | `integration-tester` | opus | Verify data flows across service seams | After deployments, timer failures, pipeline validation |
539
+ | `dependency-auditor` | haiku | CVE scan, outdated packages, license compliance | Periodic audits, pre-release checks |
540
+ | `service-monitor` | sonnet | Deep systemd service + timer investigation | When infra-auditor flags issues |
541
+
542
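+ **Agent chains** (manual, not yet automated; `security-reviewer`, `doc-updater`, and `infra-auditor` are referenced here but are not among the 7 agents in the table above):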
543
+ 1. **Post-commit:** security-reviewer → lesson-scanner → doc-updater
544
+ 2. **Service triage:** infra-auditor (detect) → shell-expert (investigate) → service-monitor (verify)
545
+ 3. **Pre-release:** dependency-auditor → integration-tester → lesson-scanner
546
+
547
+ ## Research Phase (Stage 1.5)
548
+
549
+ After design approval and before PRD generation, the optional research phase investigates technical unknowns. This prevents the most expensive failure mode: building the wrong thing correctly.
550
+
551
+ **Artifacts produced:**
552
+ - `tasks/research-<slug>.md` — human-readable report (questions, findings, recommendations)
553
+ - `tasks/research-<slug>.json` — machine-readable output with `blocking_issues`, `warnings`, `dependencies`, `confidence_ratings`
554
+
555
+ **Gate:** `scripts/research-gate.sh` reads the JSON and blocks PRD generation if any `blocking_issues` have `resolved: false`. Use `--force` to override. The gate integrates with both the interactive pipeline (`skills/autocode/SKILL.md`) and the headless pipeline (`scripts/auto-compound.sh`).
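The gate's decision can be sketched in Python. The field names `blocking_issues` and `resolved` come from the description above; the `research_gate` function, the `issue` field in the usage example, and the return shape are hypothetical.

```python
def research_gate(report, force=False):
    """Decide whether PRD generation may proceed.

    report: parsed contents of a tasks/research-<slug>.json file.
    Returns (allowed, unresolved_issues).
    """
    unresolved = [
        issue
        for issue in report.get("blocking_issues", [])
        if not issue.get("resolved", False)
    ]
    allowed = force or not unresolved
    return allowed, unresolved


if __name__ == "__main__":
    report = {"blocking_issues": [{"issue": "API quota unknown", "resolved": False}]}
    print(research_gate(report))              # blocked
    print(research_gate(report, force=True))  # overridden
```

Returning the unresolved list even when `force` is set keeps the override visible in logs rather than silent.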
556
+
557
+ **Context injection:** `scripts/lib/run-plan-context.sh` reads research warnings from all `tasks/research-*.json` files and injects them into batch context within the token budget. This ensures agents see relevant warnings even when research was done in a prior session.
558
+
559
+ ## Roadmap Stage (Stage 0.5)
560
+
561
+ For multi-feature epics (3+ features or "roadmap" keyword), the roadmap stage decomposes the work before brainstorming begins. Each feature then runs the full Stage 1-6 pipeline independently.
562
+
563
+ **Artifact:** `tasks/roadmap.md` with dependency-ordered features, phase groupings, complexity estimates, and risk ratings.
564
+
565
+ **When it activates:** Automatically when autocode detects multi-feature input. Skipped for single-feature work.
566
+
567
+ ## Positive Policy System
568
+
569
+ Policies are the complement to lessons — instead of "don't do X" (negative, lesson-based), policies say "always do Y" (positive, pattern-based). Research (#62) shows positive instructions outperform negative ones for LLMs.
570
+
571
+ **Policy files** in `policies/`:
572
+ | File | Scope | Patterns |
573
+ |------|-------|----------|
574
+ | `universal.md` | All projects | Error visibility, test before ship, fresh context, durable artifacts |
575
+ | `python.md` | Python projects | Async discipline, closing(), create_task callbacks, pip via module |
576
+ | `bash.md` | Shell scripts | Strict mode, quoting, subshell cd, temp cleanup, atomic writes |
577
+ | `testing.md` | All test files | No hardcoded counts, boundary testing, test the test, live > static |
578
+
579
+ **Checker:** `scripts/policy-check.sh` — advisory by default (always exits 0). Use `--strict` to exit non-zero on violations. Auto-detects project language and runs applicable checks.
580
+
581
+ ## Entropy Management
582
+
583
+ Over time, codebases drift. The entropy audit catches it:
584
+
585
+ ```bash
586
+ scripts/entropy-audit.sh --projects-dir ~/projects --all
587
+ ```
588
+
589
+ **Checks:**
590
+ 1. Dead references in CLAUDE.md (files that no longer exist)
591
+ 2. File size violations (>300 lines)
592
+ 3. Naming convention drift (camelCase in Python)
593
+ 4. Unused imports
594
+ 5. Uncommitted work
595
+
596
+ Designed to run as a systemd timer (weekly) for continuous entropy management.
597
+
598
+ ## Cross-Project Operations
599
+
600
+ ### Batch Audit
601
+ ```bash
602
+ scripts/batch-audit.sh ~/projects lessons
603
+ ```
604
+ Runs headless `claude -p` against every project repo in a directory. Each repo gets its own process with read-only tools.
605
+
606
+ ### Batch Test
607
+ ```bash
608
+ scripts/batch-test.sh ~/projects
609
+ ```
610
+ Memory-aware test runner. Auto-detects test framework per project. Skips full suite if available memory < 4GB.
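The memory threshold can be checked on Linux by reading `MemAvailable` from `/proc/meminfo`. This sketch assumes that source and uses hypothetical helper names; how `batch-test.sh` actually measures memory is not stated here. Treating an unreadable value as "do not skip" is also an assumption.

```python
def available_gb(meminfo_text):
    """Parse MemAvailable (reported in kB) from /proc/meminfo text into GB."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            return int(line.split()[1]) / (1024 * 1024)
    return None


def should_skip_full_suite(meminfo_text, threshold_gb=4.0):
    """True when available memory is known and below the threshold."""
    gb = available_gb(meminfo_text)
    return gb is not None and gb < threshold_gb
```

On a Linux host the input would come from `open("/proc/meminfo").read()`.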
611
+
612
+ ### Auto-Compound (Full Pipeline)
613
+ ```bash
614
+ scripts/auto-compound.sh ~/projects/my-app --report reports/daily.md
615
+ ```
616
+ End-to-end: analyze report → pick #1 priority → generate PRD → create branch → Ralph loop with quality gates → push → open PR.
617
+
618
+ ## Design Principles
619
+
620
+ 1. **Fresh context per unit of work.** Context degradation is the #1 quality killer. Every execution mode solves this differently: `claude -p` per batch (Mode C), fresh subagent per task (Mode A), stop-hook re-injection (Mode D).
621
+
622
+ 2. **Machine-verifiable gates.** No human judgment in the loop for "did this work?" Every gate is a command that exits 0 or non-zero. Humans decide *what to build*; machines verify *that it was built correctly*.
623
+
624
+ 3. **Test count monotonicity.** Tests only go up. If the count decreases between batches, something broke — the gate catches it before the next batch compounds the damage.
625
+
626
+ 4. **State survives interruption.** Every state transition is persisted to disk (JSON state file, progress.txt, prd.json). Kill the process, reboot the machine, come back a week later — `--resume` picks up where it left off.
627
+
628
+ 5. **Orthogonal verification.** Bottom-up (syntactic anti-patterns, file-level checks) and top-down (integration boundaries, data flow traces) catch non-overlapping bug classes. In A/B verification, the two approaches surfaced 6 critical bugs between them with zero overlap.
629
+
630
+ 6. **Lessons compound.** Every bug that costs real debugging time becomes a lesson. Lessons with syntactic signatures become automated checks. Checks run on every batch. The system gets harder to break over time.