autonomous-coding-toolkit 1.0.0

Files changed (324)
  1. package/.claude-plugin/marketplace.json +22 -0
  2. package/.claude-plugin/plugin.json +13 -0
  3. package/LICENSE +21 -0
  4. package/Makefile +21 -0
  5. package/README.md +140 -0
  6. package/SECURITY.md +28 -0
  7. package/agents/bash-expert.md +113 -0
  8. package/agents/dependency-auditor.md +138 -0
  9. package/agents/integration-tester.md +120 -0
  10. package/agents/lesson-scanner.md +149 -0
  11. package/agents/python-expert.md +179 -0
  12. package/agents/service-monitor.md +141 -0
  13. package/agents/shell-expert.md +147 -0
  14. package/benchmarks/runner.sh +147 -0
  15. package/benchmarks/tasks/01-rest-endpoint/rubric.sh +29 -0
  16. package/benchmarks/tasks/01-rest-endpoint/task.md +17 -0
  17. package/benchmarks/tasks/02-refactor-module/task.md +8 -0
  18. package/benchmarks/tasks/03-fix-integration-bug/task.md +8 -0
  19. package/benchmarks/tasks/04-add-test-coverage/task.md +8 -0
  20. package/benchmarks/tasks/05-multi-file-feature/task.md +8 -0
  21. package/bin/act.js +238 -0
  22. package/commands/autocode.md +6 -0
  23. package/commands/cancel-ralph.md +18 -0
  24. package/commands/code-factory.md +53 -0
  25. package/commands/create-prd.md +55 -0
  26. package/commands/ralph-loop.md +18 -0
  27. package/commands/run-plan.md +117 -0
  28. package/commands/submit-lesson.md +122 -0
  29. package/docs/ARCHITECTURE.md +630 -0
  30. package/docs/CONTRIBUTING.md +125 -0
  31. package/docs/lessons/0001-bare-exception-swallowing.md +34 -0
  32. package/docs/lessons/0002-async-def-without-await.md +28 -0
  33. package/docs/lessons/0003-create-task-without-callback.md +28 -0
  34. package/docs/lessons/0004-hardcoded-test-counts.md +28 -0
  35. package/docs/lessons/0005-sqlite-without-closing.md +33 -0
  36. package/docs/lessons/0006-venv-pip-path.md +27 -0
  37. package/docs/lessons/0007-runner-state-self-rejection.md +35 -0
  38. package/docs/lessons/0008-quality-gate-blind-spot.md +33 -0
  39. package/docs/lessons/0009-parser-overcount-empty-batches.md +36 -0
  40. package/docs/lessons/0010-local-outside-function-bash.md +33 -0
  41. package/docs/lessons/0011-batch-tests-for-unimplemented-code.md +36 -0
  42. package/docs/lessons/0012-api-markdown-unescaped-chars.md +33 -0
  43. package/docs/lessons/0013-export-prefix-env-parsing.md +33 -0
  44. package/docs/lessons/0014-decorator-registry-import-side-effect.md +43 -0
  45. package/docs/lessons/0015-frontend-backend-schema-drift.md +43 -0
  46. package/docs/lessons/0016-event-driven-cold-start-seeding.md +44 -0
  47. package/docs/lessons/0017-copy-paste-logic-diverges.md +43 -0
  48. package/docs/lessons/0018-layer-passes-pipeline-broken.md +45 -0
  49. package/docs/lessons/0019-systemd-envfile-ignores-export.md +41 -0
  50. package/docs/lessons/0020-persist-state-incrementally.md +44 -0
  51. package/docs/lessons/0021-dual-axis-testing.md +48 -0
  52. package/docs/lessons/0022-jsx-factory-shadowing.md +43 -0
  53. package/docs/lessons/0023-static-analysis-spiral.md +51 -0
  54. package/docs/lessons/0024-shared-pipeline-implementation.md +55 -0
  55. package/docs/lessons/0025-defense-in-depth-all-entry-points.md +65 -0
  56. package/docs/lessons/0026-linter-no-rules-false-enforcement.md +54 -0
  57. package/docs/lessons/0027-jsx-silent-prop-drop.md +64 -0
  58. package/docs/lessons/0028-no-infrastructure-in-client-code.md +49 -0
  59. package/docs/lessons/0029-never-write-secrets-to-files.md +61 -0
  60. package/docs/lessons/0030-cache-merge-not-replace.md +62 -0
  61. package/docs/lessons/0031-verify-units-at-boundaries.md +66 -0
  62. package/docs/lessons/0032-module-lifecycle-subscribe-unsubscribe.md +89 -0
  63. package/docs/lessons/0033-async-iteration-mutable-snapshot.md +72 -0
  64. package/docs/lessons/0034-caller-missing-await-silent-discard.md +65 -0
  65. package/docs/lessons/0035-duplicate-registration-silent-overwrite.md +85 -0
  66. package/docs/lessons/0036-websocket-dirty-disconnect.md +33 -0
  67. package/docs/lessons/0037-parallel-agents-worktree-corruption.md +31 -0
  68. package/docs/lessons/0038-subscribe-no-stored-ref.md +36 -0
  69. package/docs/lessons/0039-fallback-or-default-hides-bugs.md +34 -0
  70. package/docs/lessons/0040-event-firehose-filter-first.md +36 -0
  71. package/docs/lessons/0041-ambiguous-base-dir-path-nesting.md +32 -0
  72. package/docs/lessons/0042-spec-compliance-insufficient.md +36 -0
  73. package/docs/lessons/0043-exact-count-extensible-collections.md +32 -0
  74. package/docs/lessons/0044-relative-file-deps-worktree.md +39 -0
  75. package/docs/lessons/0045-iterative-design-improvement.md +33 -0
  76. package/docs/lessons/0046-plan-assertion-math-bugs.md +38 -0
  77. package/docs/lessons/0047-pytest-single-threaded-default.md +37 -0
  78. package/docs/lessons/0048-integration-wiring-batch.md +40 -0
  79. package/docs/lessons/0049-ab-verification.md +41 -0
  80. package/docs/lessons/0050-editing-sourced-files-during-execution.md +33 -0
  81. package/docs/lessons/0051-infrastructure-fixes-cant-self-heal.md +30 -0
  82. package/docs/lessons/0052-uncommitted-changes-poison-quality-gates.md +31 -0
  83. package/docs/lessons/0053-jq-compact-flag-inconsistency.md +31 -0
  84. package/docs/lessons/0054-parser-matches-inside-code-blocks.md +30 -0
  85. package/docs/lessons/0055-agents-compensate-for-garbled-prompts.md +31 -0
  86. package/docs/lessons/0056-grep-count-exit-code-on-zero.md +42 -0
  87. package/docs/lessons/0057-new-artifacts-break-git-clean-gates.md +42 -0
  88. package/docs/lessons/0058-dead-config-keys-never-consumed.md +49 -0
  89. package/docs/lessons/0059-contract-test-shared-structures.md +53 -0
  90. package/docs/lessons/0060-set-e-silent-death-in-runners.md +53 -0
  91. package/docs/lessons/0061-context-injection-dirty-state.md +50 -0
  92. package/docs/lessons/0062-sibling-bug-neighborhood-scan.md +29 -0
  93. package/docs/lessons/0063-one-flag-two-lifetimes.md +31 -0
  94. package/docs/lessons/0064-test-passes-wrong-reason.md +31 -0
  95. package/docs/lessons/0065-pipefail-grep-count-double-output.md +39 -0
  96. package/docs/lessons/0066-local-keyword-outside-function.md +37 -0
  97. package/docs/lessons/0067-stdin-hang-non-interactive-shell.md +36 -0
  98. package/docs/lessons/0068-agent-builds-wrong-thing-correctly.md +31 -0
  99. package/docs/lessons/0069-plan-quality-dominates-execution.md +30 -0
  100. package/docs/lessons/0070-spec-echo-back-prevents-drift.md +31 -0
  101. package/docs/lessons/0071-positive-instructions-outperform-negative.md +30 -0
  102. package/docs/lessons/0072-lost-in-the-middle-context-placement.md +30 -0
  103. package/docs/lessons/0073-unscoped-lessons-cause-false-positives.md +30 -0
  104. package/docs/lessons/0074-stale-context-injection-wrong-batch.md +32 -0
  105. package/docs/lessons/0075-research-artifacts-must-persist.md +32 -0
  106. package/docs/lessons/0076-wrong-decomposition-contaminates-downstream.md +30 -0
  107. package/docs/lessons/0077-cherry-pick-merges-need-manual-resolution.md +30 -0
  108. package/docs/lessons/0078-static-review-without-live-test.md +30 -0
  109. package/docs/lessons/0079-integration-wiring-batch-required.md +32 -0
  110. package/docs/lessons/FRAMEWORK.md +161 -0
  111. package/docs/lessons/SUMMARY.md +201 -0
  112. package/docs/lessons/TEMPLATE.md +85 -0
  113. package/docs/plans/2026-02-21-code-factory-v2-design.md +204 -0
  114. package/docs/plans/2026-02-21-code-factory-v2-implementation-plan.md +2189 -0
  115. package/docs/plans/2026-02-21-code-factory-v2-phase4-design.md +537 -0
  116. package/docs/plans/2026-02-21-code-factory-v2-phase4-implementation-plan.md +2012 -0
  117. package/docs/plans/2026-02-21-hardening-pass-design.md +108 -0
  118. package/docs/plans/2026-02-21-hardening-pass-plan.md +1378 -0
  119. package/docs/plans/2026-02-21-mab-research-report.md +406 -0
  120. package/docs/plans/2026-02-21-marketplace-restructure-design.md +240 -0
  121. package/docs/plans/2026-02-21-marketplace-restructure-plan.md +832 -0
  122. package/docs/plans/2026-02-21-phase4-completion-plan.md +697 -0
  123. package/docs/plans/2026-02-21-validator-suite-design.md +148 -0
  124. package/docs/plans/2026-02-21-validator-suite-plan.md +540 -0
  125. package/docs/plans/2026-02-22-mab-research-round2.md +556 -0
  126. package/docs/plans/2026-02-22-mab-run-design.md +462 -0
  127. package/docs/plans/2026-02-22-mab-run-plan.md +2046 -0
  128. package/docs/plans/2026-02-22-operations-design-methodology-research.md +681 -0
  129. package/docs/plans/2026-02-22-research-agent-failure-taxonomy.md +532 -0
  130. package/docs/plans/2026-02-22-research-code-guideline-policies.md +886 -0
  131. package/docs/plans/2026-02-22-research-codebase-audit-refactoring.md +908 -0
  132. package/docs/plans/2026-02-22-research-coding-standards-documentation.md +541 -0
  133. package/docs/plans/2026-02-22-research-competitive-landscape.md +687 -0
  134. package/docs/plans/2026-02-22-research-comprehensive-testing.md +1076 -0
  135. package/docs/plans/2026-02-22-research-context-utilization.md +459 -0
  136. package/docs/plans/2026-02-22-research-cost-quality-tradeoff.md +548 -0
  137. package/docs/plans/2026-02-22-research-lesson-transferability.md +508 -0
  138. package/docs/plans/2026-02-22-research-multi-agent-coordination.md +312 -0
  139. package/docs/plans/2026-02-22-research-phase-integration.md +602 -0
  140. package/docs/plans/2026-02-22-research-plan-quality.md +428 -0
  141. package/docs/plans/2026-02-22-research-prompt-engineering.md +558 -0
  142. package/docs/plans/2026-02-22-research-unconventional-perspectives.md +528 -0
  143. package/docs/plans/2026-02-22-research-user-adoption.md +638 -0
  144. package/docs/plans/2026-02-22-research-verification-effectiveness.md +433 -0
  145. package/docs/plans/2026-02-23-agent-suite-design.md +299 -0
  146. package/docs/plans/2026-02-23-agent-suite-plan.md +578 -0
  147. package/docs/plans/2026-02-23-phase3-cost-infrastructure-design.md +148 -0
  148. package/docs/plans/2026-02-23-phase3-cost-infrastructure-plan.md +1062 -0
  149. package/docs/plans/2026-02-23-research-bash-expert-agent.md +543 -0
  150. package/docs/plans/2026-02-23-research-dependency-auditor-agent.md +564 -0
  151. package/docs/plans/2026-02-23-research-improving-existing-agents.md +503 -0
  152. package/docs/plans/2026-02-23-research-integration-tester-agent.md +454 -0
  153. package/docs/plans/2026-02-23-research-python-expert-agent.md +429 -0
  154. package/docs/plans/2026-02-23-research-service-monitor-agent.md +425 -0
  155. package/docs/plans/2026-02-23-research-shell-expert-agent.md +533 -0
  156. package/docs/plans/2026-02-23-roadmap-to-completion.md +530 -0
  157. package/docs/plans/2026-02-24-headless-module-split-design.md +98 -0
  158. package/docs/plans/2026-02-24-headless-module-split.md +443 -0
  159. package/docs/plans/2026-02-24-lesson-scope-metadata-design.md +228 -0
  160. package/docs/plans/2026-02-24-lesson-scope-metadata-plan.md +968 -0
  161. package/docs/plans/2026-02-24-npm-packaging-design.md +841 -0
  162. package/docs/plans/2026-02-24-npm-packaging-plan.md +1965 -0
  163. package/docs/plans/audit-findings.md +186 -0
  164. package/docs/telegram-notification-format.md +98 -0
  165. package/examples/example-plan.md +51 -0
  166. package/examples/example-prd.json +72 -0
  167. package/examples/example-roadmap.md +33 -0
  168. package/examples/quickstart-plan.md +63 -0
  169. package/hooks/hooks.json +26 -0
  170. package/hooks/setup-symlinks.sh +48 -0
  171. package/hooks/stop-hook.sh +135 -0
  172. package/package.json +47 -0
  173. package/policies/bash.md +71 -0
  174. package/policies/python.md +71 -0
  175. package/policies/testing.md +61 -0
  176. package/policies/universal.md +60 -0
  177. package/scripts/analyze-report.sh +97 -0
  178. package/scripts/architecture-map.sh +145 -0
  179. package/scripts/auto-compound.sh +273 -0
  180. package/scripts/batch-audit.sh +42 -0
  181. package/scripts/batch-test.sh +101 -0
  182. package/scripts/entropy-audit.sh +221 -0
  183. package/scripts/failure-digest.sh +51 -0
  184. package/scripts/generate-ast-rules.sh +96 -0
  185. package/scripts/init.sh +112 -0
  186. package/scripts/lesson-check.sh +428 -0
  187. package/scripts/lib/common.sh +61 -0
  188. package/scripts/lib/cost-tracking.sh +153 -0
  189. package/scripts/lib/ollama.sh +60 -0
  190. package/scripts/lib/progress-writer.sh +128 -0
  191. package/scripts/lib/run-plan-context.sh +215 -0
  192. package/scripts/lib/run-plan-echo-back.sh +231 -0
  193. package/scripts/lib/run-plan-headless.sh +396 -0
  194. package/scripts/lib/run-plan-notify.sh +57 -0
  195. package/scripts/lib/run-plan-parser.sh +81 -0
  196. package/scripts/lib/run-plan-prompt.sh +215 -0
  197. package/scripts/lib/run-plan-quality-gate.sh +132 -0
  198. package/scripts/lib/run-plan-routing.sh +315 -0
  199. package/scripts/lib/run-plan-sampling.sh +170 -0
  200. package/scripts/lib/run-plan-scoring.sh +146 -0
  201. package/scripts/lib/run-plan-state.sh +142 -0
  202. package/scripts/lib/run-plan-team.sh +199 -0
  203. package/scripts/lib/telegram.sh +54 -0
  204. package/scripts/lib/thompson-sampling.sh +176 -0
  205. package/scripts/license-check.sh +74 -0
  206. package/scripts/mab-run.sh +575 -0
  207. package/scripts/module-size-check.sh +146 -0
  208. package/scripts/patterns/async-no-await.yml +5 -0
  209. package/scripts/patterns/bare-except.yml +6 -0
  210. package/scripts/patterns/empty-catch.yml +6 -0
  211. package/scripts/patterns/hardcoded-localhost.yml +9 -0
  212. package/scripts/patterns/retry-loop-no-backoff.yml +12 -0
  213. package/scripts/pipeline-status.sh +197 -0
  214. package/scripts/policy-check.sh +226 -0
  215. package/scripts/prior-art-search.sh +133 -0
  216. package/scripts/promote-mab-lessons.sh +126 -0
  217. package/scripts/prompts/agent-a-superpowers.md +29 -0
  218. package/scripts/prompts/agent-b-ralph.md +29 -0
  219. package/scripts/prompts/judge-agent.md +61 -0
  220. package/scripts/prompts/planner-agent.md +44 -0
  221. package/scripts/pull-community-lessons.sh +90 -0
  222. package/scripts/quality-gate.sh +266 -0
  223. package/scripts/research-gate.sh +90 -0
  224. package/scripts/run-plan.sh +329 -0
  225. package/scripts/scope-infer.sh +159 -0
  226. package/scripts/setup-ralph-loop.sh +155 -0
  227. package/scripts/telemetry.sh +230 -0
  228. package/scripts/tests/run-all-tests.sh +52 -0
  229. package/scripts/tests/test-act-cli.sh +46 -0
  230. package/scripts/tests/test-agents-md.sh +87 -0
  231. package/scripts/tests/test-analyze-report.sh +114 -0
  232. package/scripts/tests/test-architecture-map.sh +89 -0
  233. package/scripts/tests/test-auto-compound.sh +169 -0
  234. package/scripts/tests/test-batch-test.sh +65 -0
  235. package/scripts/tests/test-benchmark-runner.sh +25 -0
  236. package/scripts/tests/test-common.sh +168 -0
  237. package/scripts/tests/test-cost-tracking.sh +158 -0
  238. package/scripts/tests/test-echo-back.sh +180 -0
  239. package/scripts/tests/test-entropy-audit.sh +146 -0
  240. package/scripts/tests/test-failure-digest.sh +66 -0
  241. package/scripts/tests/test-generate-ast-rules.sh +145 -0
  242. package/scripts/tests/test-helpers.sh +82 -0
  243. package/scripts/tests/test-init.sh +47 -0
  244. package/scripts/tests/test-lesson-check.sh +278 -0
  245. package/scripts/tests/test-lesson-local.sh +55 -0
  246. package/scripts/tests/test-license-check.sh +109 -0
  247. package/scripts/tests/test-mab-run.sh +182 -0
  248. package/scripts/tests/test-ollama-lib.sh +49 -0
  249. package/scripts/tests/test-ollama.sh +60 -0
  250. package/scripts/tests/test-pipeline-status.sh +198 -0
  251. package/scripts/tests/test-policy-check.sh +124 -0
  252. package/scripts/tests/test-prior-art-search.sh +96 -0
  253. package/scripts/tests/test-progress-writer.sh +140 -0
  254. package/scripts/tests/test-promote-mab-lessons.sh +110 -0
  255. package/scripts/tests/test-pull-community-lessons.sh +149 -0
  256. package/scripts/tests/test-quality-gate.sh +241 -0
  257. package/scripts/tests/test-research-gate.sh +132 -0
  258. package/scripts/tests/test-run-plan-cli.sh +86 -0
  259. package/scripts/tests/test-run-plan-context.sh +305 -0
  260. package/scripts/tests/test-run-plan-e2e.sh +153 -0
  261. package/scripts/tests/test-run-plan-headless.sh +424 -0
  262. package/scripts/tests/test-run-plan-notify.sh +124 -0
  263. package/scripts/tests/test-run-plan-parser.sh +217 -0
  264. package/scripts/tests/test-run-plan-prompt.sh +254 -0
  265. package/scripts/tests/test-run-plan-quality-gate.sh +222 -0
  266. package/scripts/tests/test-run-plan-routing.sh +178 -0
  267. package/scripts/tests/test-run-plan-scoring.sh +148 -0
  268. package/scripts/tests/test-run-plan-state.sh +261 -0
  269. package/scripts/tests/test-run-plan-team.sh +157 -0
  270. package/scripts/tests/test-scope-infer.sh +150 -0
  271. package/scripts/tests/test-setup-ralph-loop.sh +63 -0
  272. package/scripts/tests/test-telegram-env.sh +38 -0
  273. package/scripts/tests/test-telegram.sh +121 -0
  274. package/scripts/tests/test-telemetry.sh +46 -0
  275. package/scripts/tests/test-thompson-sampling.sh +139 -0
  276. package/scripts/tests/test-validate-all.sh +60 -0
  277. package/scripts/tests/test-validate-commands.sh +89 -0
  278. package/scripts/tests/test-validate-hooks.sh +98 -0
  279. package/scripts/tests/test-validate-lessons.sh +150 -0
  280. package/scripts/tests/test-validate-plan-quality.sh +235 -0
  281. package/scripts/tests/test-validate-plans.sh +187 -0
  282. package/scripts/tests/test-validate-plugin.sh +106 -0
  283. package/scripts/tests/test-validate-prd.sh +184 -0
  284. package/scripts/tests/test-validate-skills.sh +134 -0
  285. package/scripts/validate-all.sh +57 -0
  286. package/scripts/validate-commands.sh +67 -0
  287. package/scripts/validate-hooks.sh +89 -0
  288. package/scripts/validate-lessons.sh +98 -0
  289. package/scripts/validate-plan-quality.sh +369 -0
  290. package/scripts/validate-plans.sh +120 -0
  291. package/scripts/validate-plugin.sh +86 -0
  292. package/scripts/validate-policies.sh +42 -0
  293. package/scripts/validate-prd.sh +118 -0
  294. package/scripts/validate-skills.sh +96 -0
  295. package/skills/autocode/SKILL.md +285 -0
  296. package/skills/autocode/ab-verification.md +51 -0
  297. package/skills/autocode/code-quality-standards.md +37 -0
  298. package/skills/autocode/competitive-mode.md +364 -0
  299. package/skills/brainstorming/SKILL.md +97 -0
  300. package/skills/capture-lesson/SKILL.md +187 -0
  301. package/skills/check-lessons/SKILL.md +116 -0
  302. package/skills/dispatching-parallel-agents/SKILL.md +110 -0
  303. package/skills/executing-plans/SKILL.md +85 -0
  304. package/skills/finishing-a-development-branch/SKILL.md +201 -0
  305. package/skills/receiving-code-review/SKILL.md +72 -0
  306. package/skills/requesting-code-review/SKILL.md +59 -0
  307. package/skills/requesting-code-review/code-reviewer.md +82 -0
  308. package/skills/research/SKILL.md +145 -0
  309. package/skills/roadmap/SKILL.md +115 -0
  310. package/skills/subagent-driven-development/SKILL.md +98 -0
  311. package/skills/subagent-driven-development/code-quality-reviewer-prompt.md +18 -0
  312. package/skills/subagent-driven-development/implementer-prompt.md +73 -0
  313. package/skills/subagent-driven-development/spec-reviewer-prompt.md +57 -0
  314. package/skills/systematic-debugging/SKILL.md +134 -0
  315. package/skills/systematic-debugging/condition-based-waiting.md +64 -0
  316. package/skills/systematic-debugging/defense-in-depth.md +32 -0
  317. package/skills/systematic-debugging/root-cause-tracing.md +55 -0
  318. package/skills/test-driven-development/SKILL.md +167 -0
  319. package/skills/using-git-worktrees/SKILL.md +219 -0
  320. package/skills/using-superpowers/SKILL.md +54 -0
  321. package/skills/verification-before-completion/SKILL.md +140 -0
  322. package/skills/verify/SKILL.md +82 -0
  323. package/skills/writing-plans/SKILL.md +128 -0
  324. package/skills/writing-skills/SKILL.md +93 -0
package/docs/plans/2026-02-21-code-factory-v2-design.md
@@ -0,0 +1,204 @@
1
+ # Code Factory v2 — Design Document
2
+
3
+ **Date:** 2026-02-21
4
+ **Status:** Approved
5
+ **Approach:** Foundation-First (Phase 1 → 2 → 3 → 4, sequential)
6
+
7
+ ## Problem Statement
8
+
9
+ The Code Factory pipeline (`auto-compound.sh` → `quality-gate.sh` → `run-plan.sh`) works but has accumulated technical debt: code duplication across scripts, hardcoded paths, missing quality gate steps, no cross-batch context for agents, and no prior-art search. Research across Notion, GitHub (10 repos), web best practices, and the codebase identified 24 concrete improvements.
10
+
11
+ ## Research Findings
12
+
13
+ ### Competitive Landscape (10 repos analyzed)
14
+ - **Unique strengths to preserve:** PRD with shell exit codes (no other repo does this), hookify pre-write guardrails, lesson-indexed anti-patterns
15
+ - **Gaps to close:** No prior-art search (only RepoMaster/NeurIPS 2025 does this), no lint step, no cross-batch context, no cost tracking
16
+ - **Patterns to adopt:** Aider's Architect/Editor split, structured `context_refs` from multi-agent-coding-system, 4-checkpoint quality gate pipeline
17
+
18
+ ### Key Principles
19
+ - **Harness Engineering** (OpenAI): Design environments/feedback loops that govern agent behavior
20
+ - **Compound Product** (Ryan Carson): Self-improving agent loop where each iteration improves operating instructions
21
+ - **Agent specialization formula:** Model + Runtime + MCP + Skills = Specialized Agent (one agent + composable skills > many specialized agents)
22
+
23
+ ### Module Health
24
+ | Script | Lines | Status |
25
+ |--------|-------|--------|
26
+ | `run-plan.sh` | 412 | VIOLATION (>300) — extract headless loop |
27
+ | `auto-compound.sh` | 230 | OK |
28
+ | `entropy-audit.sh` | 213 | OK |
29
+ | `lesson-check.sh` | 195 | OK |
30
+ | `analyze-report.sh` | 114 | OK |
31
+ | `quality-gate.sh` | 111 | OK |
32
+
33
+ ### Code Duplications Found
34
+ 1. Project type detection (`auto-compound.sh` lines 145-163, `quality-gate.sh` lines 30-50)
35
+ 2. Arg parsing boilerplate (5 scripts repeat the same pattern)
36
+ 3. Ollama API calls (`analyze-report.sh`, `entropy-audit.sh`)
37
+ 4. Telegram credential loading (`run-plan-notify.sh`, `lessons-review.sh`)
38
+ 5. JSON fence stripping (`analyze-report.sh` lines 104-110)
39
+
40
+ ## Design
41
+
42
+ ### Phase 1: Foundation (Shared Library + Module Compliance)
43
+
44
+ Extract duplicated code into a shared library and bring all scripts under the 300-line limit.
45
+
46
+ **Task 1.1: Create `scripts/lib/common.sh`**
47
+ Extract into shared functions:
48
+ - `detect_project_type()` — unified Python/Node/general detection
49
+ - `parse_common_args()` — `--help`, `--project-root`, `--verbose` boilerplate
50
+ - `strip_json_fences()` — remove ```json wrappers from LLM output
51
+ - `check_memory_available()` — memory guard (threshold parameterized)
52
+ - `require_command()` — check binary exists, print install hint
53
+
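A minimal sketch of three of the helpers listed in Task 1.1, to make the intended shape of the shared library concrete. The function names come from the task list; the bodies, the marker files used for detection, and the error message format are assumptions rather than the shipped implementation.

```bash
# Sketch of helpers for scripts/lib/common.sh (bodies are illustrative).

# Print "python", "node", or "general" for the given project root.
# The marker files checked here are an assumption.
detect_project_type() {
    local root="${1:-.}"
    if [ -f "$root/pyproject.toml" ] || [ -f "$root/setup.py" ] || [ -f "$root/requirements.txt" ]; then
        echo "python"
    elif [ -f "$root/package.json" ]; then
        echo "node"
    else
        echo "general"
    fi
}

# Drop every line that opens or closes a triple-backtick fence (stdin to stdout).
strip_json_fences() {
    sed '/^[[:space:]]*`\{3\}/d'
}

# Return 1 with an install hint on stderr when a binary is missing.
require_command() {
    local cmd="$1" hint="${2:-}"
    if ! command -v "$cmd" >/dev/null 2>&1; then
        echo "ERROR: '$cmd' not found. $hint" >&2
        return 1
    fi
}
```

A caller would source the file once and then write, for example, `require_command jq "Install with your package manager"` before any JSON handling.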
54
+ **Task 1.2: Create `scripts/lib/ollama.sh`**
55
+ Extract Ollama interaction:
56
+ - `ollama_query()` — submit prompt to ollama-queue or direct API
57
+ - `ollama_parse_json()` — query + strip fences + validate JSON
58
+
59
+ **Task 1.3: Refactor `auto-compound.sh` to use `common.sh`**
60
+ - Replace inline project detection with `detect_project_type()`
61
+ - Replace JSON stripping with `strip_json_fences()`
62
+ - Fix line 127: PRD output discarded to `/dev/null` with `|| true` (lesson-7 violation)
63
+
64
+ **Task 1.4: Refactor `quality-gate.sh` to use `common.sh`**
65
+ - Replace inline project detection with `detect_project_type()`
66
+ - Replace inline memory check with `check_memory_available()`
67
+
68
+ **Task 1.5: Refactor `entropy-audit.sh`**
69
+ - Replace hardcoded `PROJECTS_DIR="$HOME/Documents/projects"` (line 17) with `--project-root` arg or env var
70
+ - Use `ollama.sh` for LLM calls
71
+
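One way to replace the hardcoded path in Task 1.5 is to resolve it from the command line first and the environment second. The sketch below assumes the environment variable keeps the name `PROJECTS_DIR` and that the current directory is an acceptable last resort; the helper name `resolve_projects_dir` is hypothetical.

```bash
# Resolve the projects directory.
# Precedence: --project-root argument, then $PROJECTS_DIR, then $PWD.
resolve_projects_dir() {
    local dir="${PROJECTS_DIR:-$PWD}"
    while [ "$#" -gt 0 ]; do
        case "$1" in
            --project-root)
                if [ "$#" -lt 2 ]; then
                    echo "ERROR: --project-root needs a value" >&2
                    return 2
                fi
                dir="$2"
                shift 2
                ;;
            --project-root=*)
                dir="${1#--project-root=}"
                shift
                ;;
            *)
                # Ignore arguments this helper does not own.
                shift
                ;;
        esac
    done
    echo "$dir"
}
```

The script would then call `PROJECTS_DIR="$(resolve_projects_dir "$@")"` near the top instead of assigning a fixed path.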
72
+ **Task 1.6: Extract `scripts/lib/run-plan-headless.sh`**
73
+ - Move `run_mode_headless()` (lines 229-376, 148 lines) from `run-plan.sh` into dedicated lib module
74
+ - Target: `run-plan.sh` drops to ~260 lines
75
+
76
+ **Task 1.7: Refactor `analyze-report.sh` to use shared libs**
77
+ - Use `ollama.sh` for LLM calls
78
+ - Use `strip_json_fences()` from `common.sh`
79
+
80
+ ### Phase 2: Accuracy (Fix Broken Pipeline Steps)
81
+
82
+ Fix the pipeline steps that silently fail or produce incomplete results.
83
+
84
+ **Task 2.1: Fix PRD invocation in `auto-compound.sh`**
85
+ - Line 127 discards `/create-prd` output — capture and validate
86
+ - Verify headless `claude --print` loads project-scoped commands from `~/Documents/.claude/commands/`
87
+ - If not, inline the PRD prompt or add `--commands-dir` flag
88
+
89
+ **Task 2.2: Fix test count parsing for non-pytest projects**
90
+ - `run-plan-quality-gate.sh` line 23: `grep -oP '\b(\d+) passed\b'` is pytest-only
91
+ - Add parsers for: `jest` (`Tests: N passed`), `go test` (`ok`/`FAIL`), `npm test` (TAP format)
92
+ - Return `-1` (skip regression check) when format is unrecognized, not `0` (which defeats detection)
93
+
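The parsing rules in Task 2.2 could look roughly like the sketch below. The function name `extract_passed_count` and the exact patterns are assumptions. For `go test` the count is the number of `ok` package lines, not individual tests, and TAP output is left out.

```bash
# Print the number of passed tests found in runner output on stdin,
# or -1 when no known format matches (the caller then skips the regression check).
extract_passed_count() {
    local output count
    output="$(cat)"

    # jest: "Tests:       1 failed, 3 passed, 4 total"
    # Checked first so the "Test Suites:" line is not mistaken for the test count.
    count="$(printf '%s\n' "$output" | sed -nE 's/^Tests:.*[^0-9]([0-9]+) passed.*/\1/p' | tail -n 1)"

    # pytest: "===== 12 passed, 1 skipped in 0.42s ====="
    if [ -z "$count" ]; then
        count="$(printf '%s\n' "$output" | grep -oE '[0-9]+ passed' | tail -n 1 | sed 's/ passed//')"
    fi

    # go test: one "ok" or "FAIL" line per package; count the "ok" lines.
    if [ -z "$count" ] && printf '%s\n' "$output" | grep -qE '^(ok|FAIL)[[:space:]]'; then
        count="$(printf '%s\n' "$output" | grep -cE '^ok[[:space:]]' || true)"
    fi

    echo "${count:--1}"
}
```

A pytest run where every test fails prints no `passed` token and so also yields `-1` here; a fuller parser would need to treat that case separately.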
94
+ **Task 2.3: Add cross-batch context to `run-plan-prompt.sh`**
95
+ - Include `git log --oneline -5` (recent commits from prior batches)
96
+ - Include last 20 lines of `progress.txt` (discoveries, decisions)
97
+ - Include previous quality gate result (pass/fail, test count)
98
+ - Keep prompt under 2000 tokens to leave room for batch instructions
99
+
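As an illustration of Task 2.3, the context block can be assembled from the two sources named above and then cut to a budget. The function name `build_batch_context` is hypothetical, and the 8000-byte default stands in for 2000 tokens at roughly four bytes per token, which is an assumption and not a measured ratio. The previous quality gate result is omitted because its storage format is not described here.

```bash
# Print a cross-batch context block for the next prompt, capped at max_bytes.
build_batch_context() {
    local worktree="${1:-.}" max_bytes="${2:-8000}"
    {
        # Recent commits from prior batches (skipped outside a git repository).
        if git -C "$worktree" rev-parse --git-dir >/dev/null 2>&1; then
            echo "Recent commits:"
            git -C "$worktree" log --oneline -5 2>/dev/null || true
            echo
        fi
        # Last 20 lines of progress notes, if the file exists.
        if [ -f "$worktree/progress.txt" ]; then
            echo "Progress notes:"
            tail -n 20 "$worktree/progress.txt"
        fi
    } | head -c "$max_bytes"
}
```

Truncating from the end keeps the commit list intact and drops the oldest-looking tail of the notes last; a different priority order may suit the real prompt better.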
100
+ **Task 2.4: Add cost/duration tracking to state**
101
+ - Track per-batch wall time (already computed but not saved)
102
+ - Track cumulative duration across batches
103
+ - Add `duration_seconds` field to batch entries in `.run-plan-state.json`
104
+
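A sketch of how the duration fields in Task 2.4 might be written with `jq`. Only the `duration_seconds` field name and the `.run-plan-state.json` file come from the task; the `batches` object keyed by batch number, the `total_duration_seconds` field, and the function name are assumptions about the state schema.

```bash
# Record wall time for one batch and refresh the cumulative total.
# Usage: record_batch_duration STATE_FILE BATCH START_EPOCH END_EPOCH
record_batch_duration() {
    local state_file="$1" batch="$2" start="$3" end="$4" tmp
    tmp="$(mktemp)"
    # Write to a temp file first so a jq failure never truncates the state file.
    jq --arg b "$batch" --argjson d "$((end - start))" \
        '.batches[$b].duration_seconds = $d
         | .total_duration_seconds = ([.batches[] | .duration_seconds // 0] | add)' \
        "$state_file" >"$tmp" && mv "$tmp" "$state_file"
}
```

The runner would capture `date +%s` before and after each batch and pass both values in.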
105
+ **Task 2.5: Wire Telegram credential loading through shared lib**
106
+ - Create `scripts/lib/telegram.sh` — single source for `_load_telegram_env()`
107
+ - Replace duplicate in `run-plan-notify.sh` and `lessons-review.sh`
108
+
109
+ ### Phase 3: Quality Gates (Lint + Search + Status)
110
+
111
+ Add missing quality gate steps and a new prior-art search capability.
112
+
113
+ **Task 3.1: Add `ruff` lint step to `quality-gate.sh`**
114
+ - Run `ruff check --select E,W,F` for Python projects
115
+ - Run `eslint` for Node projects (if `.eslintrc*` or `eslint.config.*` exists)
116
+ - Gate: lint errors = fail, warnings = warn-only
117
+
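The gate rule in Task 3.1 can be sketched as two `ruff` passes: a warn-only pass over the `W` rules using `--exit-zero`, and a gating pass over `E` and `F`. The function name `run_lint_step` is hypothetical, and treating a missing `ruff` binary as a warning instead of a failure is an assumption.

```bash
# Run the lint step for a project type. Returns 1 only on lint errors.
run_lint_step() {
    local project_type="$1" root="${2:-.}" cfg has_eslint_config=0
    case "$project_type" in
        python)
            if ! command -v ruff >/dev/null 2>&1; then
                echo "WARN: ruff not installed, skipping lint" >&2
                return 0
            fi
            # Warnings (W) are reported but never fail the gate.
            ruff check --select W --exit-zero "$root" >&2
            # Errors (E, F) fail the gate.
            if ! ruff check --select E,F "$root"; then
                echo "FAIL: ruff reported errors" >&2
                return 1
            fi
            ;;
        node)
            for cfg in "$root"/.eslintrc*; do
                if [ -e "$cfg" ]; then
                    has_eslint_config=1
                fi
            done
            if [ "$has_eslint_config" -eq 1 ]; then
                # eslint exits non-zero on errors and zero when only warnings remain.
                if ! (cd "$root" && npx eslint .); then
                    echo "FAIL: eslint reported errors" >&2
                    return 1
                fi
            fi
            ;;
    esac
    return 0
}
```

Called as `run_lint_step "$(detect_project_type "$root")" "$root"`, where `detect_project_type` is the helper planned for `common.sh`.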
118
+ **Task 3.2: Create `scripts/prior-art-search.sh`**
119
+ - Input: feature description or plan file
120
+ - Search GitHub via `gh search repos` and `gh search code`
121
+ - Search local codebase via `grep -r` for similar patterns
122
+ - Output: ranked list of relevant repos/files with relevance scores
123
+ - Integrate with `ast-grep` for structural code search (deferred to Phase 4, Task 4.4)
124
+
125
+ **Task 3.3: Create `scripts/license-check.sh`**
126
+ - Check dependencies for license compatibility
127
+ - Python: parse `pip-licenses` output
128
+ - Node: parse `license-checker` output
129
+ - Flag GPL/AGPL in MIT-licensed projects
130
+
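The flagging rule in Task 3.3 reduces to a filter over package and license pairs. The sketch below assumes the tool output has already been normalized to unquoted `name,license` lines, which is a hypothetical intermediate format, and it leaves LGPL unflagged, which is also an assumption.

```bash
# Read "name,license" lines on stdin and print the GPL and AGPL entries.
# Returns 1 when at least one package is flagged, 0 otherwise.
flag_copyleft_licenses() {
    local name license found=0
    while IFS=, read -r name license; do
        case "$license" in
            *LGPL*)
                # Weak copyleft: not flagged in this sketch.
                ;;
            *GPL*)
                # Matches both GPL and AGPL identifiers.
                echo "FLAG: $name ($license)"
                found=1
                ;;
        esac
    done
    return "$found"
}
```

A dual-licensed entry that mentions LGPL alongside GPL is skipped by the first branch, so a real check would need to parse license expressions instead of matching substrings.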
131
+ **Task 3.4: Create `scripts/pipeline-status.sh`**
132
+ - Single-command view of all pipeline components
133
+ - Show: last run time, pass/fail, test count, batch progress
134
+ - Read from `.run-plan-state.json` and quality gate logs
135
+
136
+ **Task 3.5: Wire new gates into `quality-gate.sh`**
137
+ - Add lint step (Task 3.1) between lesson-check and tests
138
+ - Add license check (Task 3.3) as optional `--with-license` flag
139
+ - Preserve fast-path: skip slow checks when `--quick` flag is passed
140
+
141
+ **Task 3.6: Wire prior-art search into `auto-compound.sh`**
142
+ - Run before PRD generation
143
+ - Pass results as context to PRD prompt
144
+ - Log findings to `progress.txt`
145
+
146
+ ### Phase 4: New Capabilities
147
+
148
+ Add advanced features based on research findings.
149
+
150
+ **Task 4.1: Create `scripts/failure-digest.sh`**
151
+ - Parse failed batch logs
152
+ - Extract: error messages, stack traces, failed test names
153
+ - Generate structured digest for retry prompts
154
+ - Replace the naive `tail -50` in `run-plan.sh` line 291
155
+
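To show how a digest differs from `tail -50`, the sketch below pulls only failure markers out of a batch log. The function name `failure_digest`, the section headings, and the patterns (pytest `FAILED`, Go `FAIL`, TAP `not ok`, Python-style error lines) are assumptions about what the logs contain.

```bash
# Print a compact digest of a failed batch log for use in a retry prompt.
failure_digest() {
    local log="$1" max_lines="${2:-40}"
    if [ ! -f "$log" ]; then
        echo "No log at $log" >&2
        return 1
    fi
    {
        echo "## Failed tests"
        # Deduplicate: runners often repeat failures in a final summary.
        grep -E '^(FAILED|FAIL|not ok)[[:space:]:]' "$log" | sort -u
        echo "## Errors"
        # Keep line numbers so the agent can locate each error in the full log.
        grep -nE '(Error|Exception)[(:]|Traceback \(most recent call last\)' "$log" || true
    } | head -n "$max_lines"
}
```

Unlike a fixed-size tail, this keeps an early failure visible even when hundreds of passing lines follow it.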
156
+ **Task 4.2: Add persistent `AGENTS.md` to worktrees**
157
+ - Auto-generated file listing agent capabilities used in the plan
158
+ - Include: tools allowed, model, permission mode, batch assignments
159
+ - Agents read this at start of each batch for team awareness
160
+
161
+ **Task 4.3: Add structured `context_refs` to plan format**
162
+ - Each batch can declare dependencies on prior batch outputs
163
+ - Format: `context_refs: [batch-2:src/auth.py, batch-3:tests/]`
164
+ - Parser extracts refs and includes referenced file contents in prompt
165
+
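The `context_refs` format in Task 4.3 can be split into batch and path pairs with standard text tools. This sketch assumes the single-line form shown in the example; the function name `parse_context_refs` and the tab-separated output are hypothetical.

```bash
# Read plan text on stdin and print one "batch<TAB>path" line per context ref.
parse_context_refs() {
    # 1. Keep only the text between the brackets of a context_refs line.
    # 2. Put each comma-separated ref on its own line and trim whitespace.
    # 3. Split each ref at the first colon, so paths may contain colons.
    sed -n 's/.*context_refs:[[:space:]]*\[\([^]]*\)].*/\1/p' |
        tr ',' '\n' |
        sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' |
        awk -F: '/^batch-[0-9]+:./ {
            path = $0
            sub(/^[^:]*:/, "", path)
            printf "%s\t%s\n", $1, path
        }'
}
```

The prompt builder could loop over the output with `while IFS="$(printf '\t')" read -r batch path` and append each referenced file.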
166
+ **Task 4.4: Add `ast-grep` integration to prior-art search**
167
+ - Structural code search (find patterns by AST shape, not text)
168
+ - Install: `cargo install ast-grep` or `npm i -g @ast-grep/cli`
169
+ - Use for: finding similar function signatures, API patterns, test structures
170
+
171
+ **Task 4.5: Implement team mode in `run-plan.sh`**
172
+ - Replace stub at lines 379-384
173
+ - Use Claude Code agent teams (`CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1`)
174
+ - Assign batches to parallel agents with shared state file
175
+ - Quality gate runs after each batch completion (any agent)
176
+
177
+ **Task 4.6: Add parallel patch sampling**
178
+ - For critical batches: generate N candidate implementations
179
+ - Run quality gate on each
180
+ - Keep the one with highest test count / cleanest lint
181
+ - Inspired by the Agentless approach (Xia et al., 2024)
182
+
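The selection step in Task 4.6 amounts to a two-key sort once each candidate has a gate result. The sketch assumes one line per candidate in the form `name passed_tests lint_errors` and ranks by most passing tests, then fewest lint errors; both the input format and the tie-break order are assumptions.

```bash
# Read "name passed_tests lint_errors" lines on stdin and print the best name.
pick_best_candidate() {
    # Sort by test count descending, then lint errors ascending, and take the top row.
    sort -k2,2nr -k3,3n | head -n 1 | awk '{ print $1 }'
}
```

For example, given `cand-2 42 5` and `cand-3 42 1`, the two tie on tests and `cand-3` wins on lint.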
183
+ ## Dependencies
184
+
185
+ - **Phase 1** has no external dependencies (pure refactoring)
186
+ - **Phase 2** depends on Phase 1 (shared libs)
187
+ - **Phase 3** depends on Phase 2 (accurate pipeline) + installs: `ruff`
188
+ - **Phase 4** depends on Phase 3 (quality gates) + installs: `ast-grep` + requires agent teams feature
189
+
190
+ ## Success Metrics
191
+
192
+ 1. All scripts under 300 lines
193
+ 2. Zero code duplication across scripts (shared lib extraction complete)
194
+ 3. Quality gate catches lint errors, license issues, and test regressions
195
+ 4. Prior-art search runs before every PRD generation
196
+ 5. Cross-batch context reduces retry rate by providing agents with prior batch results
197
+ 6. Pipeline status visible in single command
198
+
199
+ ## Risk Mitigations
200
+
201
+ - **Breaking existing workflows:** Each phase is independently shippable. Phase 1 is pure refactoring with no behavior change.
202
+ - **Headless command loading:** Task 2.1 explicitly tests whether project-scoped commands work in headless mode. Fallback: inline the prompt.
203
+ - **Tool installation:** Install tools as needed per phase (ruff in Phase 3, ast-grep in Phase 4). No upfront bulk install.
204
+ - **Agent teams instability:** Phase 4 team mode depends on experimental feature flag. Headless mode remains the stable default.