autonomous-coding-toolkit 1.0.0

Files changed (324)
  1. package/.claude-plugin/marketplace.json +22 -0
  2. package/.claude-plugin/plugin.json +13 -0
  3. package/LICENSE +21 -0
  4. package/Makefile +21 -0
  5. package/README.md +140 -0
  6. package/SECURITY.md +28 -0
  7. package/agents/bash-expert.md +113 -0
  8. package/agents/dependency-auditor.md +138 -0
  9. package/agents/integration-tester.md +120 -0
  10. package/agents/lesson-scanner.md +149 -0
  11. package/agents/python-expert.md +179 -0
  12. package/agents/service-monitor.md +141 -0
  13. package/agents/shell-expert.md +147 -0
  14. package/benchmarks/runner.sh +147 -0
  15. package/benchmarks/tasks/01-rest-endpoint/rubric.sh +29 -0
  16. package/benchmarks/tasks/01-rest-endpoint/task.md +17 -0
  17. package/benchmarks/tasks/02-refactor-module/task.md +8 -0
  18. package/benchmarks/tasks/03-fix-integration-bug/task.md +8 -0
  19. package/benchmarks/tasks/04-add-test-coverage/task.md +8 -0
  20. package/benchmarks/tasks/05-multi-file-feature/task.md +8 -0
  21. package/bin/act.js +238 -0
  22. package/commands/autocode.md +6 -0
  23. package/commands/cancel-ralph.md +18 -0
  24. package/commands/code-factory.md +53 -0
  25. package/commands/create-prd.md +55 -0
  26. package/commands/ralph-loop.md +18 -0
  27. package/commands/run-plan.md +117 -0
  28. package/commands/submit-lesson.md +122 -0
  29. package/docs/ARCHITECTURE.md +630 -0
  30. package/docs/CONTRIBUTING.md +125 -0
  31. package/docs/lessons/0001-bare-exception-swallowing.md +34 -0
  32. package/docs/lessons/0002-async-def-without-await.md +28 -0
  33. package/docs/lessons/0003-create-task-without-callback.md +28 -0
  34. package/docs/lessons/0004-hardcoded-test-counts.md +28 -0
  35. package/docs/lessons/0005-sqlite-without-closing.md +33 -0
  36. package/docs/lessons/0006-venv-pip-path.md +27 -0
  37. package/docs/lessons/0007-runner-state-self-rejection.md +35 -0
  38. package/docs/lessons/0008-quality-gate-blind-spot.md +33 -0
  39. package/docs/lessons/0009-parser-overcount-empty-batches.md +36 -0
  40. package/docs/lessons/0010-local-outside-function-bash.md +33 -0
  41. package/docs/lessons/0011-batch-tests-for-unimplemented-code.md +36 -0
  42. package/docs/lessons/0012-api-markdown-unescaped-chars.md +33 -0
  43. package/docs/lessons/0013-export-prefix-env-parsing.md +33 -0
  44. package/docs/lessons/0014-decorator-registry-import-side-effect.md +43 -0
  45. package/docs/lessons/0015-frontend-backend-schema-drift.md +43 -0
  46. package/docs/lessons/0016-event-driven-cold-start-seeding.md +44 -0
  47. package/docs/lessons/0017-copy-paste-logic-diverges.md +43 -0
  48. package/docs/lessons/0018-layer-passes-pipeline-broken.md +45 -0
  49. package/docs/lessons/0019-systemd-envfile-ignores-export.md +41 -0
  50. package/docs/lessons/0020-persist-state-incrementally.md +44 -0
  51. package/docs/lessons/0021-dual-axis-testing.md +48 -0
  52. package/docs/lessons/0022-jsx-factory-shadowing.md +43 -0
  53. package/docs/lessons/0023-static-analysis-spiral.md +51 -0
  54. package/docs/lessons/0024-shared-pipeline-implementation.md +55 -0
  55. package/docs/lessons/0025-defense-in-depth-all-entry-points.md +65 -0
  56. package/docs/lessons/0026-linter-no-rules-false-enforcement.md +54 -0
  57. package/docs/lessons/0027-jsx-silent-prop-drop.md +64 -0
  58. package/docs/lessons/0028-no-infrastructure-in-client-code.md +49 -0
  59. package/docs/lessons/0029-never-write-secrets-to-files.md +61 -0
  60. package/docs/lessons/0030-cache-merge-not-replace.md +62 -0
  61. package/docs/lessons/0031-verify-units-at-boundaries.md +66 -0
  62. package/docs/lessons/0032-module-lifecycle-subscribe-unsubscribe.md +89 -0
  63. package/docs/lessons/0033-async-iteration-mutable-snapshot.md +72 -0
  64. package/docs/lessons/0034-caller-missing-await-silent-discard.md +65 -0
  65. package/docs/lessons/0035-duplicate-registration-silent-overwrite.md +85 -0
  66. package/docs/lessons/0036-websocket-dirty-disconnect.md +33 -0
  67. package/docs/lessons/0037-parallel-agents-worktree-corruption.md +31 -0
  68. package/docs/lessons/0038-subscribe-no-stored-ref.md +36 -0
  69. package/docs/lessons/0039-fallback-or-default-hides-bugs.md +34 -0
  70. package/docs/lessons/0040-event-firehose-filter-first.md +36 -0
  71. package/docs/lessons/0041-ambiguous-base-dir-path-nesting.md +32 -0
  72. package/docs/lessons/0042-spec-compliance-insufficient.md +36 -0
  73. package/docs/lessons/0043-exact-count-extensible-collections.md +32 -0
  74. package/docs/lessons/0044-relative-file-deps-worktree.md +39 -0
  75. package/docs/lessons/0045-iterative-design-improvement.md +33 -0
  76. package/docs/lessons/0046-plan-assertion-math-bugs.md +38 -0
  77. package/docs/lessons/0047-pytest-single-threaded-default.md +37 -0
  78. package/docs/lessons/0048-integration-wiring-batch.md +40 -0
  79. package/docs/lessons/0049-ab-verification.md +41 -0
  80. package/docs/lessons/0050-editing-sourced-files-during-execution.md +33 -0
  81. package/docs/lessons/0051-infrastructure-fixes-cant-self-heal.md +30 -0
  82. package/docs/lessons/0052-uncommitted-changes-poison-quality-gates.md +31 -0
  83. package/docs/lessons/0053-jq-compact-flag-inconsistency.md +31 -0
  84. package/docs/lessons/0054-parser-matches-inside-code-blocks.md +30 -0
  85. package/docs/lessons/0055-agents-compensate-for-garbled-prompts.md +31 -0
  86. package/docs/lessons/0056-grep-count-exit-code-on-zero.md +42 -0
  87. package/docs/lessons/0057-new-artifacts-break-git-clean-gates.md +42 -0
  88. package/docs/lessons/0058-dead-config-keys-never-consumed.md +49 -0
  89. package/docs/lessons/0059-contract-test-shared-structures.md +53 -0
  90. package/docs/lessons/0060-set-e-silent-death-in-runners.md +53 -0
  91. package/docs/lessons/0061-context-injection-dirty-state.md +50 -0
  92. package/docs/lessons/0062-sibling-bug-neighborhood-scan.md +29 -0
  93. package/docs/lessons/0063-one-flag-two-lifetimes.md +31 -0
  94. package/docs/lessons/0064-test-passes-wrong-reason.md +31 -0
  95. package/docs/lessons/0065-pipefail-grep-count-double-output.md +39 -0
  96. package/docs/lessons/0066-local-keyword-outside-function.md +37 -0
  97. package/docs/lessons/0067-stdin-hang-non-interactive-shell.md +36 -0
  98. package/docs/lessons/0068-agent-builds-wrong-thing-correctly.md +31 -0
  99. package/docs/lessons/0069-plan-quality-dominates-execution.md +30 -0
  100. package/docs/lessons/0070-spec-echo-back-prevents-drift.md +31 -0
  101. package/docs/lessons/0071-positive-instructions-outperform-negative.md +30 -0
  102. package/docs/lessons/0072-lost-in-the-middle-context-placement.md +30 -0
  103. package/docs/lessons/0073-unscoped-lessons-cause-false-positives.md +30 -0
  104. package/docs/lessons/0074-stale-context-injection-wrong-batch.md +32 -0
  105. package/docs/lessons/0075-research-artifacts-must-persist.md +32 -0
  106. package/docs/lessons/0076-wrong-decomposition-contaminates-downstream.md +30 -0
  107. package/docs/lessons/0077-cherry-pick-merges-need-manual-resolution.md +30 -0
  108. package/docs/lessons/0078-static-review-without-live-test.md +30 -0
  109. package/docs/lessons/0079-integration-wiring-batch-required.md +32 -0
  110. package/docs/lessons/FRAMEWORK.md +161 -0
  111. package/docs/lessons/SUMMARY.md +201 -0
  112. package/docs/lessons/TEMPLATE.md +85 -0
  113. package/docs/plans/2026-02-21-code-factory-v2-design.md +204 -0
  114. package/docs/plans/2026-02-21-code-factory-v2-implementation-plan.md +2189 -0
  115. package/docs/plans/2026-02-21-code-factory-v2-phase4-design.md +537 -0
  116. package/docs/plans/2026-02-21-code-factory-v2-phase4-implementation-plan.md +2012 -0
  117. package/docs/plans/2026-02-21-hardening-pass-design.md +108 -0
  118. package/docs/plans/2026-02-21-hardening-pass-plan.md +1378 -0
  119. package/docs/plans/2026-02-21-mab-research-report.md +406 -0
  120. package/docs/plans/2026-02-21-marketplace-restructure-design.md +240 -0
  121. package/docs/plans/2026-02-21-marketplace-restructure-plan.md +832 -0
  122. package/docs/plans/2026-02-21-phase4-completion-plan.md +697 -0
  123. package/docs/plans/2026-02-21-validator-suite-design.md +148 -0
  124. package/docs/plans/2026-02-21-validator-suite-plan.md +540 -0
  125. package/docs/plans/2026-02-22-mab-research-round2.md +556 -0
  126. package/docs/plans/2026-02-22-mab-run-design.md +462 -0
  127. package/docs/plans/2026-02-22-mab-run-plan.md +2046 -0
  128. package/docs/plans/2026-02-22-operations-design-methodology-research.md +681 -0
  129. package/docs/plans/2026-02-22-research-agent-failure-taxonomy.md +532 -0
  130. package/docs/plans/2026-02-22-research-code-guideline-policies.md +886 -0
  131. package/docs/plans/2026-02-22-research-codebase-audit-refactoring.md +908 -0
  132. package/docs/plans/2026-02-22-research-coding-standards-documentation.md +541 -0
  133. package/docs/plans/2026-02-22-research-competitive-landscape.md +687 -0
  134. package/docs/plans/2026-02-22-research-comprehensive-testing.md +1076 -0
  135. package/docs/plans/2026-02-22-research-context-utilization.md +459 -0
  136. package/docs/plans/2026-02-22-research-cost-quality-tradeoff.md +548 -0
  137. package/docs/plans/2026-02-22-research-lesson-transferability.md +508 -0
  138. package/docs/plans/2026-02-22-research-multi-agent-coordination.md +312 -0
  139. package/docs/plans/2026-02-22-research-phase-integration.md +602 -0
  140. package/docs/plans/2026-02-22-research-plan-quality.md +428 -0
  141. package/docs/plans/2026-02-22-research-prompt-engineering.md +558 -0
  142. package/docs/plans/2026-02-22-research-unconventional-perspectives.md +528 -0
  143. package/docs/plans/2026-02-22-research-user-adoption.md +638 -0
  144. package/docs/plans/2026-02-22-research-verification-effectiveness.md +433 -0
  145. package/docs/plans/2026-02-23-agent-suite-design.md +299 -0
  146. package/docs/plans/2026-02-23-agent-suite-plan.md +578 -0
  147. package/docs/plans/2026-02-23-phase3-cost-infrastructure-design.md +148 -0
  148. package/docs/plans/2026-02-23-phase3-cost-infrastructure-plan.md +1062 -0
  149. package/docs/plans/2026-02-23-research-bash-expert-agent.md +543 -0
  150. package/docs/plans/2026-02-23-research-dependency-auditor-agent.md +564 -0
  151. package/docs/plans/2026-02-23-research-improving-existing-agents.md +503 -0
  152. package/docs/plans/2026-02-23-research-integration-tester-agent.md +454 -0
  153. package/docs/plans/2026-02-23-research-python-expert-agent.md +429 -0
  154. package/docs/plans/2026-02-23-research-service-monitor-agent.md +425 -0
  155. package/docs/plans/2026-02-23-research-shell-expert-agent.md +533 -0
  156. package/docs/plans/2026-02-23-roadmap-to-completion.md +530 -0
  157. package/docs/plans/2026-02-24-headless-module-split-design.md +98 -0
  158. package/docs/plans/2026-02-24-headless-module-split.md +443 -0
  159. package/docs/plans/2026-02-24-lesson-scope-metadata-design.md +228 -0
  160. package/docs/plans/2026-02-24-lesson-scope-metadata-plan.md +968 -0
  161. package/docs/plans/2026-02-24-npm-packaging-design.md +841 -0
  162. package/docs/plans/2026-02-24-npm-packaging-plan.md +1965 -0
  163. package/docs/plans/audit-findings.md +186 -0
  164. package/docs/telegram-notification-format.md +98 -0
  165. package/examples/example-plan.md +51 -0
  166. package/examples/example-prd.json +72 -0
  167. package/examples/example-roadmap.md +33 -0
  168. package/examples/quickstart-plan.md +63 -0
  169. package/hooks/hooks.json +26 -0
  170. package/hooks/setup-symlinks.sh +48 -0
  171. package/hooks/stop-hook.sh +135 -0
  172. package/package.json +47 -0
  173. package/policies/bash.md +71 -0
  174. package/policies/python.md +71 -0
  175. package/policies/testing.md +61 -0
  176. package/policies/universal.md +60 -0
  177. package/scripts/analyze-report.sh +97 -0
  178. package/scripts/architecture-map.sh +145 -0
  179. package/scripts/auto-compound.sh +273 -0
  180. package/scripts/batch-audit.sh +42 -0
  181. package/scripts/batch-test.sh +101 -0
  182. package/scripts/entropy-audit.sh +221 -0
  183. package/scripts/failure-digest.sh +51 -0
  184. package/scripts/generate-ast-rules.sh +96 -0
  185. package/scripts/init.sh +112 -0
  186. package/scripts/lesson-check.sh +428 -0
  187. package/scripts/lib/common.sh +61 -0
  188. package/scripts/lib/cost-tracking.sh +153 -0
  189. package/scripts/lib/ollama.sh +60 -0
  190. package/scripts/lib/progress-writer.sh +128 -0
  191. package/scripts/lib/run-plan-context.sh +215 -0
  192. package/scripts/lib/run-plan-echo-back.sh +231 -0
  193. package/scripts/lib/run-plan-headless.sh +396 -0
  194. package/scripts/lib/run-plan-notify.sh +57 -0
  195. package/scripts/lib/run-plan-parser.sh +81 -0
  196. package/scripts/lib/run-plan-prompt.sh +215 -0
  197. package/scripts/lib/run-plan-quality-gate.sh +132 -0
  198. package/scripts/lib/run-plan-routing.sh +315 -0
  199. package/scripts/lib/run-plan-sampling.sh +170 -0
  200. package/scripts/lib/run-plan-scoring.sh +146 -0
  201. package/scripts/lib/run-plan-state.sh +142 -0
  202. package/scripts/lib/run-plan-team.sh +199 -0
  203. package/scripts/lib/telegram.sh +54 -0
  204. package/scripts/lib/thompson-sampling.sh +176 -0
  205. package/scripts/license-check.sh +74 -0
  206. package/scripts/mab-run.sh +575 -0
  207. package/scripts/module-size-check.sh +146 -0
  208. package/scripts/patterns/async-no-await.yml +5 -0
  209. package/scripts/patterns/bare-except.yml +6 -0
  210. package/scripts/patterns/empty-catch.yml +6 -0
  211. package/scripts/patterns/hardcoded-localhost.yml +9 -0
  212. package/scripts/patterns/retry-loop-no-backoff.yml +12 -0
  213. package/scripts/pipeline-status.sh +197 -0
  214. package/scripts/policy-check.sh +226 -0
  215. package/scripts/prior-art-search.sh +133 -0
  216. package/scripts/promote-mab-lessons.sh +126 -0
  217. package/scripts/prompts/agent-a-superpowers.md +29 -0
  218. package/scripts/prompts/agent-b-ralph.md +29 -0
  219. package/scripts/prompts/judge-agent.md +61 -0
  220. package/scripts/prompts/planner-agent.md +44 -0
  221. package/scripts/pull-community-lessons.sh +90 -0
  222. package/scripts/quality-gate.sh +266 -0
  223. package/scripts/research-gate.sh +90 -0
  224. package/scripts/run-plan.sh +329 -0
  225. package/scripts/scope-infer.sh +159 -0
  226. package/scripts/setup-ralph-loop.sh +155 -0
  227. package/scripts/telemetry.sh +230 -0
  228. package/scripts/tests/run-all-tests.sh +52 -0
  229. package/scripts/tests/test-act-cli.sh +46 -0
  230. package/scripts/tests/test-agents-md.sh +87 -0
  231. package/scripts/tests/test-analyze-report.sh +114 -0
  232. package/scripts/tests/test-architecture-map.sh +89 -0
  233. package/scripts/tests/test-auto-compound.sh +169 -0
  234. package/scripts/tests/test-batch-test.sh +65 -0
  235. package/scripts/tests/test-benchmark-runner.sh +25 -0
  236. package/scripts/tests/test-common.sh +168 -0
  237. package/scripts/tests/test-cost-tracking.sh +158 -0
  238. package/scripts/tests/test-echo-back.sh +180 -0
  239. package/scripts/tests/test-entropy-audit.sh +146 -0
  240. package/scripts/tests/test-failure-digest.sh +66 -0
  241. package/scripts/tests/test-generate-ast-rules.sh +145 -0
  242. package/scripts/tests/test-helpers.sh +82 -0
  243. package/scripts/tests/test-init.sh +47 -0
  244. package/scripts/tests/test-lesson-check.sh +278 -0
  245. package/scripts/tests/test-lesson-local.sh +55 -0
  246. package/scripts/tests/test-license-check.sh +109 -0
  247. package/scripts/tests/test-mab-run.sh +182 -0
  248. package/scripts/tests/test-ollama-lib.sh +49 -0
  249. package/scripts/tests/test-ollama.sh +60 -0
  250. package/scripts/tests/test-pipeline-status.sh +198 -0
  251. package/scripts/tests/test-policy-check.sh +124 -0
  252. package/scripts/tests/test-prior-art-search.sh +96 -0
  253. package/scripts/tests/test-progress-writer.sh +140 -0
  254. package/scripts/tests/test-promote-mab-lessons.sh +110 -0
  255. package/scripts/tests/test-pull-community-lessons.sh +149 -0
  256. package/scripts/tests/test-quality-gate.sh +241 -0
  257. package/scripts/tests/test-research-gate.sh +132 -0
  258. package/scripts/tests/test-run-plan-cli.sh +86 -0
  259. package/scripts/tests/test-run-plan-context.sh +305 -0
  260. package/scripts/tests/test-run-plan-e2e.sh +153 -0
  261. package/scripts/tests/test-run-plan-headless.sh +424 -0
  262. package/scripts/tests/test-run-plan-notify.sh +124 -0
  263. package/scripts/tests/test-run-plan-parser.sh +217 -0
  264. package/scripts/tests/test-run-plan-prompt.sh +254 -0
  265. package/scripts/tests/test-run-plan-quality-gate.sh +222 -0
  266. package/scripts/tests/test-run-plan-routing.sh +178 -0
  267. package/scripts/tests/test-run-plan-scoring.sh +148 -0
  268. package/scripts/tests/test-run-plan-state.sh +261 -0
  269. package/scripts/tests/test-run-plan-team.sh +157 -0
  270. package/scripts/tests/test-scope-infer.sh +150 -0
  271. package/scripts/tests/test-setup-ralph-loop.sh +63 -0
  272. package/scripts/tests/test-telegram-env.sh +38 -0
  273. package/scripts/tests/test-telegram.sh +121 -0
  274. package/scripts/tests/test-telemetry.sh +46 -0
  275. package/scripts/tests/test-thompson-sampling.sh +139 -0
  276. package/scripts/tests/test-validate-all.sh +60 -0
  277. package/scripts/tests/test-validate-commands.sh +89 -0
  278. package/scripts/tests/test-validate-hooks.sh +98 -0
  279. package/scripts/tests/test-validate-lessons.sh +150 -0
  280. package/scripts/tests/test-validate-plan-quality.sh +235 -0
  281. package/scripts/tests/test-validate-plans.sh +187 -0
  282. package/scripts/tests/test-validate-plugin.sh +106 -0
  283. package/scripts/tests/test-validate-prd.sh +184 -0
  284. package/scripts/tests/test-validate-skills.sh +134 -0
  285. package/scripts/validate-all.sh +57 -0
  286. package/scripts/validate-commands.sh +67 -0
  287. package/scripts/validate-hooks.sh +89 -0
  288. package/scripts/validate-lessons.sh +98 -0
  289. package/scripts/validate-plan-quality.sh +369 -0
  290. package/scripts/validate-plans.sh +120 -0
  291. package/scripts/validate-plugin.sh +86 -0
  292. package/scripts/validate-policies.sh +42 -0
  293. package/scripts/validate-prd.sh +118 -0
  294. package/scripts/validate-skills.sh +96 -0
  295. package/skills/autocode/SKILL.md +285 -0
  296. package/skills/autocode/ab-verification.md +51 -0
  297. package/skills/autocode/code-quality-standards.md +37 -0
  298. package/skills/autocode/competitive-mode.md +364 -0
  299. package/skills/brainstorming/SKILL.md +97 -0
  300. package/skills/capture-lesson/SKILL.md +187 -0
  301. package/skills/check-lessons/SKILL.md +116 -0
  302. package/skills/dispatching-parallel-agents/SKILL.md +110 -0
  303. package/skills/executing-plans/SKILL.md +85 -0
  304. package/skills/finishing-a-development-branch/SKILL.md +201 -0
  305. package/skills/receiving-code-review/SKILL.md +72 -0
  306. package/skills/requesting-code-review/SKILL.md +59 -0
  307. package/skills/requesting-code-review/code-reviewer.md +82 -0
  308. package/skills/research/SKILL.md +145 -0
  309. package/skills/roadmap/SKILL.md +115 -0
  310. package/skills/subagent-driven-development/SKILL.md +98 -0
  311. package/skills/subagent-driven-development/code-quality-reviewer-prompt.md +18 -0
  312. package/skills/subagent-driven-development/implementer-prompt.md +73 -0
  313. package/skills/subagent-driven-development/spec-reviewer-prompt.md +57 -0
  314. package/skills/systematic-debugging/SKILL.md +134 -0
  315. package/skills/systematic-debugging/condition-based-waiting.md +64 -0
  316. package/skills/systematic-debugging/defense-in-depth.md +32 -0
  317. package/skills/systematic-debugging/root-cause-tracing.md +55 -0
  318. package/skills/test-driven-development/SKILL.md +167 -0
  319. package/skills/using-git-worktrees/SKILL.md +219 -0
  320. package/skills/using-superpowers/SKILL.md +54 -0
  321. package/skills/verification-before-completion/SKILL.md +140 -0
  322. package/skills/verify/SKILL.md +82 -0
  323. package/skills/writing-plans/SKILL.md +128 -0
  324. package/skills/writing-skills/SKILL.md +93 -0
@@ -0,0 +1,602 @@
1
+ # Research Phase Integration: Formalizing Research in the Autonomous Coding Pipeline
2
+
3
+ **Date:** 2026-02-22
4
+ **Status:** Research complete
5
+ **Scope:** How to integrate a structured research phase into the toolkit's workflow pipeline, plus code factory consolidation and roadmap stage
6
+ **Method:** 3 parallel research agents (external frameworks, codebase analysis, cross-domain analogies) + manual codebase exploration
7
+
8
+ ---
9
+
10
+ ## Executive Summary
11
+
12
+ The autonomous coding toolkit's pipeline (brainstorm → PRD → plan → execute → verify → finish) has no formalized research phase. Research happens informally during brainstorming (codebase reconnaissance) and is partially automated via `prior-art-search.sh` in the headless pipeline — but neither produces a structured, reusable artifact.
13
+
14
+ Evidence from six domains (medicine, military intelligence, design thinking, competitive intelligence, deep research agents, SWE-bench) converges on the same pattern: **structured research before action, producing a durable artifact that downstream phases consume.**
15
+
16
+ The MAB research we conducted in this session is the proof case: Round 1 alone halved the batch count, identified 80% code reuse, surfaced 3 academic techniques the design missed, and found 8 latent bugs, all before a single line of implementation code was written.
17
+
18
+ **Three additions proposed:**
19
+ 1. **Research phase** — new Stage 1.5 between brainstorming and PRD, producing structured `tasks/research-<slug>.md` + `.json`
20
+ 2. **Roadmap stage** — new Stage 0.5 before brainstorming, for multi-feature sequencing
21
+ 3. **Code factory consolidation** — bring all Code Factory scripts and skills into the toolkit as first-class pipeline components
22
+
23
+ ---
24
+
25
+ ## 1. The Case for Formalized Research
26
+
27
+ ### 1.1 What Top-Performing Agents Do
28
+
29
+ Evidence from SWE-bench, Cognition (Devin), and academic literature:
30
+
31
+ | Finding | Source | Implication |
32
+ |---------|--------|-------------|
33
+ | Agents spend >60% of first-turn time retrieving context | Cognition SWE-bench report | Context retrieval is the bottleneck, not code generation |
34
+ | SWE-grep (specialized retrieval sub-agent) reduced context retrieval from 20+ turns to 4 turns | Cognition SWE-grep blog | Separate the retrieval agent from the coding agent |
35
+ | Performance degrades >30% when relevant info is in the middle of context vs beginning/end | Stanford "Lost in the Middle" (arXiv 2307.03172) | Compress and select before injecting — don't dump everything |
36
+ | RAG from diverse, high-quality sources produces significant gains even on top of GPT-4 | CodeRAG-Bench (arXiv 2406.14497) | Multi-source research (codebase + docs + web + papers) compounds |
37
+ | 72% of SWE-bench successes take >10 minutes | SWE-bench Pro (Scale AI) | Exploration time is not waste — it's the work |
38
+ | "Most agent failures are not model failures — they are context failures" | Anthropic context engineering guide | The research phase IS context engineering |
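
One way to act on the "Lost in the Middle" row is to select a few snippets and place the strongest ones at the edges of the prompt. The sketch below is a minimal illustration, assuming each snippet already carries a relevance score; the name `order_for_edges` and the scoring scheme are hypothetical and not part of the toolkit.

```python
from typing import List, Tuple


def order_for_edges(snippets: List[Tuple[str, float]], keep: int) -> List[str]:
    """Keep the `keep` highest-scoring snippets and place the strongest
    ones at the beginning and end, leaving the weakest in the middle."""
    # Select: drop everything below the top `keep` by score.
    ranked = sorted(snippets, key=lambda s: s[1], reverse=True)[:keep]
    front: List[str] = []
    back: List[str] = []
    # Alternate placement: rank 1 goes to the front, rank 2 to the back, ...
    for i, (text, _score) in enumerate(ranked):
        if i % 2 == 0:
            front.append(text)
        else:
            back.append(text)
    # Reverse `back` so the second-best snippet ends up last.
    return front + back[::-1]


scored = [("a", 0.9), ("b", 0.8), ("c", 0.5), ("d", 0.4), ("e", 0.1)]
print(order_for_edges(scored, keep=4))  # ['a', 'c', 'd', 'b']
```

The lowest-scoring snippet is dropped entirely, and the two best land first and last, which is the "compress and select" implication in miniature.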
39
+
40
+ ### 1.2 What the Current Pipeline Does (and Doesn't)
41
+
42
+ | Research-like Activity | Where | Artifact Produced | Consumed By | Gap |
43
+ |----------------------|-------|-------------------|-------------|-----|
44
+ | Codebase reconnaissance | brainstorming Step 1 | None — ephemeral | Clarifying questions only | No artifact, no record |
45
+ | Prior-art search | `auto-compound.sh` Step 2.5 | `prior-art-results.txt` (unstructured) | PRD prompt injection | Not in interactive path; unstructured; no schema |
46
+ | Report analysis | `analyze-report.sh` | `analysis.json` | `auto-compound.sh` only | Triage, not research |
47
+ | PRD investigation tasks | `create-prd.md` Step 4 | None — findings disappear | `progress.txt` at best | No template, no format, no enforcement |
48
+ | Competitive pre-flight | `competitive-mode.md` | Context brief (ephemeral) | Competitor prompts | Only in competitive mode; not a durable artifact |
49
+ | Manual research reports | MAB session (this session) | `docs/plans/*.md` (structured) | Design doc, plan | No automated analog |
50
+
51
+ **The structural gap:** Research findings have no path back into the pipeline. There is no stage that reads research output and uses it to modify the design, scope the PRD, or annotate the plan.
52
+
53
+ ### 1.3 What the MAB Research Session Proved
54
+
55
+ | Activity | Impact | Pipeline Could Have Done This? |
56
+ |----------|--------|-------------------------------|
57
+ | Codebase gap analysis | Identified 80% infrastructure reuse — halved batch count | No — brainstorming doesn't audit existing code against plan assumptions |
58
+ | Academic literature review | Added Thompson Sampling, position bias mitigation, prompt evolution | No — no external search mechanism |
59
+ | Cross-domain analogies | 7 analogies produced 3 universal patterns (locked criteria, diversity as signal, discriminating conditions) | No — nothing searches outside the domain |
60
+ | Cost modeling | $1.88 vs $10.58/batch with cache priming — changed architecture | No — no cost analysis mechanism |
61
+ | Latent bug identification | 8 bugs found before implementation (including state schema mismatch affecting all headless runs) | Partial — lesson-check is post-hoc, not pre-implementation |
62
+ | Research → plan reshape | Round 1 halved batches; Round 2 added cache-prime step | No — no feedback path from research to plan |
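
The cost-modeling row reduces to simple arithmetic. The check below uses only the two per-batch figures quoted in the table. It assumes the lower figure is the cache-primed one, and the batch count in the example call is made up for illustration.

```python
COST_PRIMED = 1.88     # USD per batch, assumed to be the cache-primed figure
COST_UNPRIMED = 10.58  # USD per batch, assumed to be the figure without priming


def savings(batches: int) -> float:
    """Total USD saved over `batches` batches at the quoted per-batch costs."""
    return round((COST_UNPRIMED - COST_PRIMED) * batches, 2)


ratio = COST_UNPRIMED / COST_PRIMED          # roughly 5.6x cheaper per batch
reduction = 1 - COST_PRIMED / COST_UNPRIMED  # roughly 82% lower cost
print(f"{ratio:.1f}x cheaper, {reduction:.0%} reduction, "
      f"${savings(6):.2f} saved over 6 hypothetical batches")
```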
63
+
64
+ ---
65
+
66
+ ## 2. Cross-Domain Research Frameworks
67
+
68
+ ### 2.1 Evidence-Based Medicine (PICO + Cochrane)
69
+
70
+ The strongest anti-bias framework surveyed here. A systematic review has five mandatory phases:
71
+
72
+ 1. **Protocol registration** — pre-specify question, inclusion/exclusion criteria, synthesis method *before seeing data*
73
+ 2. **Question decomposition (PICO):** Population, Intervention, Comparison, Outcome
74
+ 3. **Search strategy** — explicit queries across explicit sources, documented for reproducibility
75
+ 4. **Screening** — two-stage: title/abstract first, then full-text, with pre-defined inclusion rules
76
+ 5. **Data extraction → synthesis** — structured form per source, then aggregation with confidence grades (GRADE: high/moderate/low/very low)
77
+
78
+ **Key artifacts:** `review_protocol.md` (frozen before search), `search_log.json`, `screening_matrix.csv`, `evidence_table.md`
79
+
80
+ **Transferable insight:** The protocol is frozen before data collection. You cannot adjust inclusion criteria after seeing results. Applied to coding: define what "relevant prior art" means *before* searching.
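
A frozen protocol can be approximated in code with an immutable record that is created before any search runs. This is a minimal sketch under assumptions: the `ReviewProtocol` class, its fields, and the keyword-based relevance rule are all hypothetical, chosen only to show criteria that cannot be loosened after results arrive.

```python
from dataclasses import dataclass, FrozenInstanceError
from typing import Tuple


@dataclass(frozen=True)
class ReviewProtocol:
    """Search protocol fixed before any results are seen (hypothetical schema)."""
    question: str
    include_keywords: Tuple[str, ...]  # a hit must mention at least one of these
    exclude_keywords: Tuple[str, ...]  # a hit mentioning any of these is dropped

    def is_relevant(self, text: str) -> bool:
        lowered = text.lower()
        if any(k in lowered for k in self.exclude_keywords):
            return False
        return any(k in lowered for k in self.include_keywords)


protocol = ReviewProtocol(
    question="What retry strategies exist for flaky checks?",
    include_keywords=("retry", "backoff"),
    exclude_keywords=("deprecated",),
)

hits = ["Exponential backoff helper", "Deprecated retry wrapper", "Notification format"]
screened = [h for h in hits if protocol.is_relevant(h)]
print(screened)  # ['Exponential backoff helper']

try:
    protocol.include_keywords = ("anything",)  # loosening criteria after the fact
except FrozenInstanceError:
    print("protocol is frozen; criteria cannot change once the search has started")
```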
81
+
82
+ **Automated analog:** otto-SR reproduced 12 Cochrane reviews in 2 days using a multi-agent LLM pipeline (abstract screen → full-text screen → extraction → synthesis). Sensitivity: 96.7%, specificity: 97.9%.
83
+
84
+ ### 2.2 Military Intelligence (IPB + ACH + OODA)
85
+
86
+ **Intelligence Preparation of the Battlefield (IPB)** — four mandatory steps:
87
+
88
+ 1. Define operational environment (scope)
89
+ 2. Describe environmental effects on operations (constraints)
90
+ 3. Evaluate the threat (adversary capabilities)
91
+ 4. Determine threat courses of action (all plausible, not just most likely)
92
+
93
+ **What's distinctive:** IPB explicitly maps *what you don't know* alongside what you do. The artifact isn't just findings — it's a **structured-ignorance document** that defines what information would change the assessment.
94
+
95
+ **Analysis of Competing Hypotheses (ACH):** Build an evidence matrix where rows are evidence items and columns are competing hypotheses. Score each cell. The hypothesis with the least disconfirming evidence wins — not the one with the most confirming evidence.
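
The scoring rule can be sketched in a few lines. The example below is illustrative only: the three-value cell scale, the function name `rank_by_disconfirmation`, and the sample evidence are assumptions, not something the toolkit defines.

```python
from typing import Dict, List

# Cell scores: -1 = evidence contradicts the hypothesis,
#               0 = neutral or not applicable, 1 = consistent with it.
Matrix = Dict[str, Dict[str, int]]


def rank_by_disconfirmation(matrix: Matrix) -> List[str]:
    """Order hypotheses by how little evidence contradicts them.
    Confirming cells are deliberately ignored."""
    hypotheses = {h for row in matrix.values() for h in row}
    contradictions = {
        h: sum(1 for row in matrix.values() if row.get(h, 0) < 0)
        for h in hypotheses
    }
    # Fewest contradictions first; ties broken alphabetically for determinism.
    return sorted(hypotheses, key=lambda h: (contradictions[h], h))


# Hypothetical evidence about whether to reuse or rewrite a module.
matrix: Matrix = {
    "module already covers the use case": {"reuse": 1, "rewrite": -1},
    "module has no tests":                {"reuse": -1, "rewrite": 1},
    "only one batch of budget remains":   {"reuse": 0, "rewrite": -1},
    "new design is cleaner on paper":     {"reuse": 0, "rewrite": 1},
}
print(rank_by_disconfirmation(matrix))  # ['reuse', 'rewrite']
```

Here "rewrite" has more confirming evidence (two items against one) but also more contradicting evidence, so it ranks second. That is the difference between disconfirmation and confirmation scoring.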
96
+
97
+ **Transferable insights:**
98
+ - Map unknowns explicitly, not just knowns
99
+ - Evaluate competing approaches by disconfirmation, not confirmation
100
+ - The ASCOPE matrix (Areas, Structures, Capabilities, Organizations, People, Events) translates to: Files, Modules, APIs, Dependencies, Users, Workflows
101
+
102
+ ### 2.3 Design Thinking (Double Diamond)
103
+
104
+ Two explicit diverge/converge cycles with a hard gate between them:
105
+
106
+ ```
107
+ Diamond 1: Problem Space            Diamond 2: Solution Space
108
+ [Discover] → [Define]      GATE     [Develop] → [Deliver]
109
+ (diverge)    (converge)             (diverge)   (converge)
110
+ ```
111
+
112
+ **Gate rule:** You *cannot* enter solution space without a frozen problem definition.
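
The gate can be modeled as a small state check. This is a sketch, assuming a hypothetical `DoubleDiamond` class; it only demonstrates that solution work is rejected until a problem definition exists, and that the definition cannot be changed afterwards.

```python
from dataclasses import dataclass, field
from typing import List, Optional


class GateError(RuntimeError):
    """Raised when a phase is entered out of order."""


@dataclass
class DoubleDiamond:
    findings: List[str] = field(default_factory=list)
    problem_definition: Optional[str] = None  # None until Define converges

    def discover(self, finding: str) -> None:
        if self.problem_definition is not None:
            raise GateError("problem definition is frozen; discovery is closed")
        self.findings.append(finding)

    def define(self, statement: str) -> None:
        if self.problem_definition is not None:
            raise GateError("problem definition is already frozen")
        if not self.findings:
            raise GateError("nothing discovered yet; cannot define the problem")
        self.problem_definition = statement

    def develop(self, idea: str) -> str:
        # The gate: no solution work without a frozen problem definition.
        if self.problem_definition is None:
            raise GateError("define the problem before developing solutions")
        return f"{idea} (addresses: {self.problem_definition})"
```

Calling `develop()` on a fresh instance raises `GateError`; it succeeds only after at least one `discover()` call followed by `define()`.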
113
+
114
+ **Discovery phase artifacts:** Empathy maps, competitive landscape matrix, "How Might We" question bank, insight statements
115
+
116
+ **Transferable insight:** Discovery is explicitly divergent — collect more than you need, then cull. The cull produces a Point of View (POV) statement that's frozen before solution work begins.
117
+
118
+ ### 2.4 Competitive Intelligence
119
+
120
+ The intelligence cycle: **Requirements → Collection → Analysis → Dissemination**
121
+
122
+ **What's distinctive:** Dissemination is tailored by consumer role. The same research produces different artifacts for different downstream consumers (executive summary for decision-makers, detailed analysis for implementers, raw data for further analysis).
123
+
124
+ **Applied to the pipeline:** A single research phase produces:
125
+ - `research-<slug>.md` — human-readable report for design review
126
+ - `research-<slug>.json` — machine-readable for PRD scoping and context injection
127
+ - GitHub issues — for deferred items discovered during research
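
Producing both files from one in-memory structure keeps the human-readable and machine-readable views from drifting apart. The sketch below assumes a hypothetical finding shape (`claim`, `source`, `confidence`); it does not reflect the toolkit's actual research schema.

```python
import json
from typing import Dict, List

# Hypothetical shape: {"claim": ..., "source": ..., "confidence": ...}
Finding = Dict[str, str]


def to_json(slug: str, findings: List[Finding]) -> str:
    """Machine-readable form for downstream tooling."""
    return json.dumps({"slug": slug, "findings": findings}, indent=2)


def to_markdown(slug: str, findings: List[Finding]) -> str:
    """Human-readable form for design review."""
    lines = [f"# Research: {slug}", ""]
    for f in findings:
        lines.append(f"- {f['claim']} ({f['source']}, confidence: {f['confidence']})")
    return "\n".join(lines) + "\n"


findings = [{"claim": "Existing parser can be reused",
             "source": "codebase",
             "confidence": "high"}]
print(to_markdown("example-feature", findings))
print(to_json("example-feature", findings))
```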
128
+
129
+ ### 2.5 Deep Research Agent Architecture
130
+
131
+ The canonical pipeline (from GPT Researcher, OpenAI Deep Research, DeepResearchAgent):
132
+
133
+ ```
134
+ Phase 1: PLAN — decompose query into sub-questions (strategic LLM)
135
+ Phase 2: EXECUTE — parallel retrieval per sub-question (crawler agents)
136
+ Phase 3: CURATE — embedding similarity filter + credibility ranking
137
+ Phase 4: SYNTHESIZE — aggregate into structured output (smart LLM)
138
+ Phase 5: PUBLISH — format with citations
139
+ ```
140
+
141
+ **Key insight:** These pipelines treat research output as a *durable artifact* (a report), not ephemeral context. Coding agents typically treat retrieved context as ephemeral — this is the architectural gap.
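
The five phases can be sketched with every model call replaced by a plain function argument. This is not how GPT Researcher or any of the named systems is implemented: the decomposition is supplied by the caller, retrieval is sequential, and a simple credibility predicate stands in for embedding-similarity filtering and ranking. All names are hypothetical.

```python
from typing import Callable, Dict, List

Retriever = Callable[[str], List[str]]  # sub-question -> raw passages


def run_research(query: str, sub_questions: List[str], retrieve: Retriever,
                 is_credible: Callable[[str], bool]) -> Dict[str, object]:
    """Plan -> execute -> curate -> synthesize -> publish, in miniature."""
    # PLAN: the decomposition is supplied by the caller in this sketch.
    plan = list(sub_questions)
    # EXECUTE: one retrieval per sub-question (sequential here, not parallel).
    raw = {q: retrieve(q) for q in plan}
    # CURATE: drop passages that fail the credibility check and de-duplicate.
    curated = {q: sorted({p for p in ps if is_credible(p)}) for q, ps in raw.items()}
    # SYNTHESIZE: keep only sub-questions that still have supporting passages.
    answered = {q: ps for q, ps in curated.items() if ps}
    # PUBLISH: return a durable structure instead of transient prompt context.
    return {
        "query": query,
        "answered": answered,
        "unanswered": [q for q in plan if q not in answered],
    }
```

The returned dictionary is the point of the exercise: it can be written to disk and read by a later stage, which is what distinguishes a durable artifact from ephemeral context.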
142
+
143
+ ### 2.6 Agile Technical Spikes
144
+
145
+ **Definition:** A time-boxed investigation task with a single question and a concrete deliverable (decision, estimate, or prototype).
146
+
147
+ **Best practices:**
148
+ - Single clear question — not "understand the codebase" but "what dependency injection pattern does the auth module use?"
149
+ - Time-boxed to 1-3 days (for agents: token/turn budgets)
150
+ - Deliverable is a decision, not code
151
+ - Two types: Technical (how to build) vs Functional (what to build)
152
+
153
+ **The anti-pattern AI agents fall into:** They conflate spike and implementation into a single trajectory, starting to write code before the search is complete.
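
A turn budget is one way to time-box a spike for an agent and keep it separate from implementation. The sketch below is an assumption-laden illustration: `run_spike`, `SpikeResult`, and the `investigate` callback are hypothetical names, and the only output is a decision (or the absence of one), never code.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class SpikeResult:
    question: str
    decision: Optional[str]  # None means the budget ran out undecided
    turns_used: int


def run_spike(question: str, investigate: Callable[[int], Optional[str]],
              max_turns: int) -> SpikeResult:
    """Call `investigate` once per turn until it returns a decision or the
    turn budget is spent. No implementation work happens in here."""
    for turn in range(1, max_turns + 1):
        decision = investigate(turn)
        if decision is not None:
            return SpikeResult(question, decision, turn)
    return SpikeResult(question, None, max_turns)


# Hypothetical investigation that reaches a decision on its third turn.
result = run_spike(
    "What dependency injection pattern does the auth module use?",
    lambda turn: "constructor injection" if turn == 3 else None,
    max_turns=5,
)
print(result.decision, result.turns_used)  # constructor injection 3
```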
154
+
155
+ ---
156
+
157
+ ## 3. Proposed Pipeline Changes
158
+
159
+ ### 3.1 Current Pipeline
160
+
161
+ ```
162
+ Stage 0: Initialize — detect project, load context
163
+ Stage 1: Brainstorm — design doc + user approval
164
+ Stage 2: PRD — tasks/prd.json with shell-verifiable criteria
165
+ Stage 3: Plan — TDD implementation plan
166
+ Stage 3.5: Isolate — git worktree
167
+ Stage 4: Execute — one of 4 modes
168
+ Stage 5: Verify — all PRD criteria pass
169
+ Stage 6: Finish — merge/PR/keep/discard
170
+ ```
171
+
172
+ ### 3.2 Proposed Pipeline (3 additions)
173
+
174
+ ```
175
+ Stage 0: Initialize — detect project, load context
176
+ Stage 0.5: ROADMAP [NEW] — multi-feature sequencing, priority ordering
177
+ Stage 1: Brainstorm — design doc + user approval
178
+ Stage 1.5: RESEARCH [NEW] — structured investigation, produces durable artifact
179
+ Stage 2: PRD — tasks/prd.json (scoped by research findings)
180
+ Stage 3: Plan — TDD implementation plan (informed by research)
181
+ Stage 3.5: Isolate — git worktree
182
+ Stage 4: Execute — one of 4+ modes (including MAB)
183
+ Stage 5: Verify — all PRD criteria pass
184
+ Stage 6: Finish — merge/PR/keep/discard
185
+ ```
186
+
187
+ ### 3.3 Stage 0.5: Roadmap (New)
188
+
189
+ **Purpose:** Before brainstorming a single feature, assess whether the work fits into a larger picture. A roadmap answers: *What order should features be built in? What blocks what? What's the minimum viable sequence?*
190
+
191
+ **When to invoke:**
192
+ - When the user describes multiple features or a large system
193
+ - When `auto-compound.sh` processes a report with multiple priorities
194
+ - When multiple GitHub issues exist and need sequencing
195
+ - Skip for single, isolated features
196
+
197
+ **Artifact:** `docs/roadmap-<project-or-theme>.md`
198
+
199
+ ```markdown
200
+ # Roadmap: <theme>
201
+ **Date:** YYYY-MM-DD
202
+ **Scope:** <what this roadmap covers>
203
+
204
+ ## Features (priority order)
205
+ | # | Feature | Depends On | Effort | Value Signal |
206
+ |---|---------|-----------|--------|-------------|
207
+ | 1 | <name> | — | S/M/L | <why this first> |
208
+ | 2 | <name> | #1 | S/M/L | <why this order> |
209
+
210
+ ## Dependency Graph
211
+ <text-based or mermaid graph>
212
+
213
+ ## Decision Log
214
+ - <decision>: <rationale>
215
+
216
+ ## Out of Scope
217
+ - <item>: <why deferred>
218
+ ```
219
+
220
+ **Gate:** User approves roadmap before brainstorming the first feature. Each feature in the roadmap gets its own brainstorm → research → PRD → plan → execute cycle.
221
+
222
+ **Integration with pipeline:**
223
+ - `autocode` skill checks for existing roadmap; if none exists and scope seems multi-feature, prompts user
224
+ - `auto-compound.sh` can generate roadmap from multi-priority `analysis.json`
225
+ - Roadmap is a living document — updated after each feature completes
226
+
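The "what blocks what" question can be answered mechanically once the `Depends On` column is filled in. The following is a minimal sketch, not part of the toolkit: `roadmap_build_order` is a hypothetical helper name, and feeding it dependency pairs extracted from the roadmap table is an assumption about how the table would be consumed.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the toolkit): derive a build order
# from roadmap dependencies.
# stdin:  one "<dependency> <feature>" pair per line; a feature with no
#         dependency is listed as "<feature> <feature>".
# stdout: features ordered so that every dependency precedes its dependents.
roadmap_build_order() {
  # tsort performs a topological sort of the pairs it reads on stdin.
  tsort
}
```

For example, `printf '1 2\n2 3\n' | roadmap_build_order` prints `1`, `2`, `3` on separate lines. With GNU `tsort`, a circular dependency is reported on stderr and the command exits non-zero, which makes a cycle in the roadmap easy to catch before the user is asked to approve it.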
227
+ ### 3.4 Stage 1.5: Research (New)
228
+
229
+ **Purpose:** After the design is approved, before PRD generation, conduct structured investigation to validate assumptions, find reusable components, surface latent issues, and mine external knowledge.
230
+
231
+ **Activities (parallel where possible):**
232
+
233
+ | Activity | Agent Type | Sources | Output |
234
+ |----------|-----------|---------|--------|
235
+ | Codebase gap analysis | Explore | Local files, AST, imports | Reuse table |
236
+ | Prior-art search | general-purpose | GitHub, web, Context7 | Library recommendations, patterns |
237
+ | Academic/external lit | general-purpose | Web search, papers | Techniques, measured impact |
238
+ | Cross-domain analogies | general-purpose | Web search (lateral) | Transferable patterns |
239
+ | Cost/feasibility | general-purpose | API pricing, benchmarks | Cost model |
240
+ | Latent issue scan | Explore + Bash | Existing code, tests, lint | Bug list with file:line |
241
+
242
+ **Research protocol (adapted from Cochrane):**
243
+ 1. **Scope** — what questions does this research answer? (derived from design doc)
244
+ 2. **Search** — explicit queries, documented in the artifact
245
+ 3. **Screen** — relevance filter on results
246
+ 4. **Extract** — structured findings per source
247
+ 5. **Synthesize** — implications for design, PRD scope, and plan
248
+
249
+ **Artifacts produced:**
250
+
251
+ **`tasks/research-<feature-slug>.md`** — human-readable report:
252
+ ```markdown
253
+ # Research: <feature>
254
+ **Date:** YYYY-MM-DD
255
+ **Design doc:** docs/plans/YYYY-MM-DD-<topic>-design.md
256
+
257
+ ## Research Questions
258
+ 1. <question derived from design>
259
+ 2. <question>
260
+
261
+ ## Codebase Gap Analysis
262
+ | Requirement | Existing File | Reusable? | Gap |
263
+ |-------------|--------------|-----------|-----|
264
+
265
+ ## External Findings
266
+ ### <Source Title>
267
+ - **Source:** <URL or citation>
268
+ - **Key finding:** <1-2 sentences>
269
+ - **Implication:** <how this affects our design>
270
+
271
+ ## Latent Issues
272
+ | File:Line | Description | Severity | Blocking? |
273
+ |-----------|-------------|----------|-----------|
274
+
275
+ ## Cross-Domain Insights
276
+ | Domain | Pattern | Application |
277
+ |--------|---------|-------------|
278
+
279
+ ## Design Changes Recommended
280
+ 1. [BLOCKING] <change> — <rationale>
281
+ 2. <change> — <rationale>
282
+
283
+ ## Cost Model
284
+ <if applicable>
285
+
286
+ ## Deferred Items
287
+ - <item> → GitHub issue created: #<number>
288
+ ```
289
+
290
+ **`tasks/research-<feature-slug>.json`** — machine-readable:
291
+ ```json
292
+ {
293
+   "feature": "string",
294
+   "date": "YYYY-MM-DD",
295
+   "design_doc": "path",
296
+   "reuse_components": [
297
+     {"requirement": "string", "file": "string", "lines": "string", "gap": "none|partial|full"}
298
+   ],
299
+   "latent_issues": [
300
+     {"file": "string", "line": 0, "description": "string", "severity": "critical|high|medium|low", "blocking": true}
301
+   ],
302
+   "design_changes": [
303
+     {"change": "string", "rationale": "string", "blocking": true}
304
+   ],
305
+   "prd_scope_delta": {
306
+     "tasks_removable": ["string"],
307
+     "tasks_added": ["string"],
308
+     "estimated_task_reduction": 0
309
+   },
310
+   "external_findings_count": 0,
311
+   "search_queries": ["string"]
312
+ }
313
+ ```
314
+
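Because downstream scripts read this file with `jq`, it may be worth confirming its shape before anything consumes it. This is a minimal sketch under stated assumptions: `research_json_has_required_keys` is a hypothetical function, and treating every top-level key of the schema as required is an assumption, not something the proposal states.

```shell
#!/usr/bin/env bash
# Hypothetical check (not part of the toolkit): does a research JSON file
# carry every top-level key shown in the schema?
# Returns 0 if all keys are present, non-zero otherwise (including when the
# file is missing or is not valid JSON).
research_json_has_required_keys() {
  local file="$1"
  jq -e '
    . as $doc
    | ["feature", "date", "design_doc", "reuse_components", "latent_issues",
       "design_changes", "prd_scope_delta", "external_findings_count",
       "search_queries"]
    | all(.[]; . as $key | $doc | has($key))
  ' "$file" >/dev/null 2>&1
}
```

The check looks only at key presence, not at value types, so it is a cheap sanity test rather than full schema validation.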
315
+ **Consumption by downstream stages:**
316
+
317
+ | Stage | How It Uses Research |
318
+ |-------|---------------------|
319
+ | PRD generation | Reads `prd_scope_delta` — removes tasks covered by reuse, adds tasks for latent issues |
320
+ | Writing plans | References research report under `## Research Findings`; adds fix tasks for latent issues |
321
+ | run-plan-context.sh | Injects critical/high latent issues as `### Research Warnings` in per-batch context |
322
+ | auto-compound.sh | Replaces Step 2.5 (prior-art-results.txt) with structured research JSON |
323
+ | Quality gate | `research-gate.sh` blocks PRD generation if blocking design changes unresolved |
324
+
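As an illustration of the first row, PRD generation could read the scope delta with a single `jq` call. This is a sketch only: `print_prd_scope_delta` is a hypothetical helper, and the `-`/`+` output format is an assumption chosen for readability, not the toolkit's actual interface.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the toolkit): print the PRD scope delta
# recorded in a research JSON file.
# Removable tasks are prefixed with "- ", added tasks with "+ ".
print_prd_scope_delta() {
  local file="$1"
  # "// []" makes a missing list behave like an empty one.
  jq -r '
    (.prd_scope_delta.tasks_removable // [] | .[] | "- " + .),
    (.prd_scope_delta.tasks_added // [] | .[] | "+ " + .)
  ' "$file"
}
```

Given a file whose `prd_scope_delta` lists `build-cache` as removable and `fix-race` as added, the function prints `- build-cache` followed by `+ fix-race`.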
325
+ ### 3.5 Code Factory Consolidation
326
+
327
+ **Current state:** Code Factory scripts and concepts are split between the toolkit repo and the Documents workspace:
328
+
329
+ | Component | Location | Should Be In Toolkit? |
330
+ |-----------|----------|----------------------|
331
+ | `auto-compound.sh` | toolkit `scripts/` | Yes (already there) |
332
+ | `quality-gate.sh` | toolkit `scripts/` | Yes (already there) |
333
+ | `run-plan.sh` + libs | toolkit `scripts/` | Yes (already there) |
334
+ | `analyze-report.sh` | toolkit `scripts/` | Yes (already there) |
335
+ | `prior-art-search.sh` | toolkit `scripts/` | Yes (already there) |
336
+ | `/create-prd` command | toolkit `commands/` | Yes (already there) |
337
+ | `/code-factory` command | toolkit `commands/` | Yes (already there) |
338
+ | `autocode` skill | toolkit `skills/` | Yes (already there) |
339
+ | `competitive-mode.md` | toolkit `skills/autocode/` | Yes (already there) |
340
+ | Code Factory design doc | workspace `docs/plans/` | Move to toolkit `docs/` |
341
+ | Code Factory V2 design | workspace `docs/plans/` | Move to toolkit `docs/` |
342
+ | `claude-md-validate.sh` | workspace `scripts/` | Keep in workspace (workspace-specific) |
343
+ | `lessons-review.sh` | workspace `scripts/` | Keep in workspace (workspace-specific) |
344
+ | PRD template/examples | toolkit `examples/` | Yes (already there) |
345
+
346
+ **The consolidation is mostly done.** The remaining gap is documentation — the Code Factory design docs and V2 design are in the workspace, not the toolkit. The pipeline integration points documented in `~/Documents/CLAUDE.md` under "Code Factory (Agent-Driven Development)" should be extracted into a toolkit-native `docs/CODE-FACTORY.md`.
347
+
348
+ **What "Code Factory in the toolkit" means concretely:**
349
+ 1. Move Code Factory V2 design insights into `docs/ARCHITECTURE.md` (the authoritative architecture doc)
350
+ 2. Ensure `autocode` skill references all pipeline scripts by their toolkit paths
351
+ 3. `competitive-mode.md` becomes the template for MAB's dual-agent execution
352
+ 4. Prior-art search evolves into the research phase (this proposal)
353
+
354
+ ---
355
+
356
+ ## 4. Implementation Architecture
357
+
358
+ ### 4.1 Research Skill
359
+
360
+ New file: `skills/research/SKILL.md`
361
+
362
+ ```markdown
363
+ # Research Phase
364
+
365
+ ## Overview
366
+ Conduct structured investigation after design approval and before PRD generation.
367
+ Produces a durable artifact that scopes the PRD and informs the plan.
368
+
369
+ ## Checklist
370
+ 1. Define research questions (from approved design doc)
371
+ 2. Codebase gap analysis (Explore agent)
372
+ 3. Prior-art search (call existing prior-art-search.sh + web search)
373
+ 4. External literature (web search agents, parallel)
374
+ 5. Cross-domain analogies (optional, for complex designs)
375
+ 6. Latent issue scan (grep + lint on files the plan will touch)
376
+ 7. Cost/feasibility model (optional, for compute-intensive features)
377
+ 8. Synthesize into tasks/research-<slug>.md + .json
378
+ 9. Present findings, get user approval
379
+ 10. Apply blocking design changes before proceeding
380
+ ```
381
+
382
+ ### 4.2 Roadmap Skill
383
+
384
+ New file: `skills/roadmap/SKILL.md`
385
+
386
+ Invoked when scope is multi-feature. Produces `docs/roadmap-<theme>.md`. Gates brainstorming — each feature in the roadmap gets its own brainstorm cycle.
387
+
388
+ ### 4.3 Pipeline Updates
389
+
390
+ **`skills/autocode/SKILL.md`** — add Stage 0.5 (roadmap, conditional) and Stage 1.5 (research, always):
391
+
392
+ ```
393
+ Stage 0: Initialize
394
+ Stage 0.5: Roadmap (if multi-feature scope)
395
+ Stage 1: Brainstorm → design doc
396
+ Stage 1.5: Research → tasks/research-<slug>.md + .json
397
+ Stage 2: PRD (scoped by research)
398
+ Stage 3: Plan (informed by research)
399
+ ...
400
+ ```
401
+
402
+ **`commands/code-factory.md`** — add research stage between brainstorming and PRD
403
+
404
+ **`scripts/auto-compound.sh`** — replace Step 2.5 (prior-art search) with full research phase:
405
+ ```bash
406
+ # Step 2.5: Research phase (replaces prior-art search)
407
+ log_step "Running research phase..."
408
+ # Call claude -p with research skill prompt
409
+ # Produces tasks/research-<slug>.json
410
+ # Proceed only when jq confirms there are no blocking design changes; a missing or invalid file also blocks
411
+ if ! jq -e 'any(.design_changes[]?; .blocking == true) | not' "tasks/research-${slug}.json" >/dev/null 2>&1; then
412
+     log_error "Research artifact missing or invalid, or blocking design changes found; review before proceeding"
413
+     exit 1
414
+ fi
415
+ ```
416
+
417
+ **`scripts/lib/run-plan-context.sh`** — add research warnings to per-batch context:
418
+ ```bash
419
+ # After failure patterns, before context_refs:
420
+ local research_file
421
+ research_file=$(find "$worktree/tasks/" -name "research-*.json" -print -quit 2>/dev/null)
422
+ if [[ -f "$research_file" ]]; then
423
+     local warnings
424
+     warnings=$(jq -r '.latent_issues[] | select(.severity == "critical" or .severity == "high") | "⚠ \(.file):\(.line) — \(.description)"' "$research_file" 2>/dev/null || true)
425
+     if [[ -n "$warnings" ]]; then
426
+         context+="### Research Warnings (fix before touching these files)"$'\n'
427
+         context+="$warnings"$'\n\n'
428
+     fi
429
+ fi
430
+ ```
431
+
432
+ ### 4.4 Research Gate
433
+
434
+ New file: `scripts/research-gate.sh`
435
+
436
+ Runs before PRD generation. Requires `tasks/research-<slug>.json` to exist (see Section 5) and checks it for blocking items:
437
+ - Blocking design changes → exit 1 (blocks PRD generation)
438
+ - Critical latent issues → exit 1 (must be acknowledged)
439
+ - Non-blocking items → exit 0 (warnings only)
440
+
441
+ Same enforcement pattern as quality gates — machine-verifiable, exit-code-driven.
442
+
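The three outcomes listed above map directly onto exit codes. The following is a minimal sketch of what `scripts/research-gate.sh` might check, not the actual script: the function name `research_gate` is hypothetical, and because the proposal does not say how a critical issue gets acknowledged, this sketch simply blocks on any critical issue. Treating a missing or unparsable file as blocking is also an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the checks a research gate might perform.
# Returns 0 when PRD generation may proceed, 1 when it is blocked.
research_gate() {
  local file="$1" blocking critical

  # Assumption: a missing or unparsable file blocks, since the gate cannot
  # vouch for research it cannot read.
  if ! jq -e 'type == "object"' "$file" >/dev/null 2>&1; then
    echo "research-gate: cannot read $file" >&2
    return 1
  fi

  # Count blocking design changes and critical latent issues.
  # "[]?" tolerates a key that is absent or null.
  blocking=$(jq '[.design_changes[]? | select(.blocking == true)] | length' "$file")
  critical=$(jq '[.latent_issues[]? | select(.severity == "critical")] | length' "$file")

  if [ "$blocking" -gt 0 ]; then
    echo "research-gate: $blocking blocking design change(s) unresolved" >&2
    return 1
  fi
  if [ "$critical" -gt 0 ]; then
    echo "research-gate: $critical critical latent issue(s) need acknowledgement" >&2
    return 1
  fi
  return 0
}
```

A caller would run `research_gate "tasks/research-${slug}.json" || exit 1` before invoking PRD generation. High, medium, and low issues pass the gate here and surface later as warnings in the per-batch context.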
443
+ ---
444
+
445
+ ## 5. The "Always Make a File" Principle
446
+
447
+ **Rule:** Every research activity produces a file. No ephemeral research.
448
+
449
+ This principle applies across the pipeline:
450
+
451
+ | Activity | File Produced | Status |
452
+ |----------|--------------|--------|
453
+ | Brainstorming exploration | `docs/plans/YYYY-MM-DD-<topic>-design.md` | Already exists |
454
+ | Research phase | `tasks/research-<slug>.md` + `.json` | New |
455
+ | PRD generation | `tasks/prd.json` + `tasks/prd-<feature>.md` | Already exists |
456
+ | Plan writing | `docs/plans/YYYY-MM-DD-<feature>.md` | Already exists |
457
+ | Per-batch execution | `.run-plan-state.json` + `progress.txt` | Already exists |
458
+ | MAB judge verdicts | `logs/mab-run-<ts>.json` | Already exists |
459
+ | Verification | Inline (PRD criteria results) | Could produce `tasks/verification-<slug>.md` |
460
+
461
+ **Why files, not memory:** Files survive context resets. A research finding discovered in one session and written to a file is available to every future session. A finding that lives only in conversation context dies when the session ends.
462
+
463
+ **Implementation:** The research skill's checklist Step 8 ("Synthesize into tasks/research-<slug>.md + .json") makes file creation mandatory, not optional. The research gate (Section 4.4) makes the file's existence a prerequisite for PRD generation.
464
+
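Making file existence a prerequisite can be as simple as a size test on both artifacts. This is a minimal sketch, assuming a hypothetical helper named `require_research_artifacts` and the `research-<slug>.md` / `research-<slug>.json` naming used in this proposal; it is not taken from the toolkit.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the toolkit): confirm that both research
# artifacts exist and are non-empty.
# Usage: require_research_artifacts <tasks-dir> <slug>
require_research_artifacts() {
  local tasks_dir="$1" slug="$2" ext missing=0
  for ext in md json; do
    # -s is true only for a file that exists and has a size greater than zero.
    if [ ! -s "$tasks_dir/research-$slug.$ext" ]; then
      echo "missing research artifact: $tasks_dir/research-$slug.$ext" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Checking for a non-empty file rather than mere existence guards against a run that created the artifact but was interrupted before writing any findings.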
465
+ ---
466
+
467
+ ## 6. Revised Full Pipeline
468
+
469
+ ```
470
+ USER INPUT (feature description, report, or issue)
471
+
472
+
473
+ Stage 0: INITIALIZE
474
+ │ Detect project, load CLAUDE.md, check Telegram, init progress.txt
475
+ │ If input is report: analyze-report.sh → analysis.json
476
+
477
+ ├── Multi-feature scope detected?
478
+ │ │
479
+ │ ▼ Yes
480
+ │ Stage 0.5: ROADMAP
481
+ │ Invoke skills/roadmap
482
+ │ Produce: docs/roadmap-<theme>.md
483
+ │ Gate: user approves roadmap
484
+ │ Loop: for each feature in roadmap order ─────┐
485
+ │ │
486
+ ▼ │
487
+ Stage 1: BRAINSTORM │
488
+ │ Invoke brainstorming skill │
489
+ │ Produce: docs/plans/YYYY-MM-DD-<topic>-design.md │
490
+ │ Gate: user approves design │
491
+ │ │
492
+ ▼ │
493
+ Stage 1.5: RESEARCH [NEW] │
494
+ │ Invoke research skill (parallel agents) │
495
+ │ Produce: tasks/research-<slug>.md + .json │
496
+ │ Gate: research-gate.sh (no blocking items) │
497
+ │ Feedback: blocking changes → revise design │
498
+ │ │
499
+ ▼ │
500
+ Stage 2: PRD │
501
+ │ /create-prd (reads research JSON for scoping) │
502
+ │ Produce: tasks/prd.json + tasks/prd-<feature>.md │
503
+ │ Gate: user approves │
504
+ │ │
505
+ ▼ │
506
+ Stage 3: PLAN │
507
+ │ writing-plans (references research report) │
508
+ │ Produce: docs/plans/YYYY-MM-DD-<feature>.md │
509
+ │ Gate: user chooses execution mode │
510
+ │ │
511
+ ▼ │
512
+ Stage 3.5: ISOLATE │
513
+ │ using-git-worktrees │
514
+ │ Produce: .worktrees/<branch>/ │
515
+ │ Gate: baseline tests pass │
516
+ │ │
517
+ ▼ │
518
+ Stage 4: EXECUTE │
519
+ │ One of: subagent / executing-plans / headless / │
520
+ │ ralph-loop / MAB │
521
+ │ Per-batch: quality gate + research warnings │
522
+ │ Produce: committed code, progress.txt updates │
523
+ │ │
524
+ ▼ │
525
+ Stage 5: VERIFY │
526
+ │ verification-before-completion │
527
+ │ ALL PRD criteria pass (shell commands) │
528
+ │ Lesson scanner on changed files │
529
+ │ │
530
+ ▼ │
531
+ Stage 6: FINISH │
532
+ │ finishing-a-development-branch │
533
+ │ Merge / PR / Keep / Discard │
534
+ │ ───────────────────────── Loop back for next ─────┘
535
+ │ feature in roadmap
536
+
537
+ DONE
538
+ ```
539
+
540
+ ---
541
+
542
+ ## 7. Effort Estimate
543
+
544
+ | Component | Files | New/Modify | Effort |
545
+ |-----------|-------|-----------|--------|
546
+ | Research skill | `skills/research/SKILL.md` | New | 1 task |
547
+ | Roadmap skill | `skills/roadmap/SKILL.md` | New | 1 task |
548
+ | Research gate | `scripts/research-gate.sh` | New | 1 task |
549
+ | Autocode skill update | `skills/autocode/SKILL.md` | Modify | 1 task |
550
+ | Code factory command update | `commands/code-factory.md` | Modify | 1 task |
551
+ | create-prd command update | `commands/create-prd.md` | Modify | 1 task |
552
+ | Context injection update | `scripts/lib/run-plan-context.sh` | Modify | 1 task |
553
+ | auto-compound.sh update | `scripts/auto-compound.sh` | Modify | 1 task |
554
+ | Code Factory docs | `docs/CODE-FACTORY.md` | New | 1 task |
555
+ | ARCHITECTURE.md update | `docs/ARCHITECTURE.md` | Modify | 1 task |
556
+ | Tests | `scripts/tests/test_research_gate.sh` | New | 1 task |
557
+ | **Total** | **11 files** | **5 new, 6 modify** | **~2 batches** |
558
+
559
+ ---
560
+
561
+ ## 8. Sources
562
+
563
+ ### AI Agent Architecture
564
+ - [SWE-bench Technical Report — Cognition](https://cognition.ai/blog/swe-bench-technical-report)
565
+ - [SWE-grep: RL for Fast Context Retrieval — Cognition](https://cognition.ai/blog/swe-grep)
566
+ - [Devin 2.0 Planning Mode — Cognition](https://cognition.ai/blog/devin-2)
567
+ - [Lost in the Middle — Stanford, arXiv 2307.03172](https://arxiv.org/abs/2307.03172)
568
+ - [Effective Context Engineering — Anthropic Engineering](https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents)
569
+ - [Context Engineering for Agents — LangChain Blog](https://blog.langchain.com/context-engineering-for-agents/)
570
+ - [RAG Review 2025 — RAGFlow](https://ragflow.io/blog/rag-review-2025-from-rag-to-context)
571
+ - [CodeRAG-Bench — arXiv 2406.14497](https://arxiv.org/html/2406.14497v1)
572
+ - [RACG Survey — arXiv 2510.04905](https://arxiv.org/abs/2510.04905)
573
+ - [A-RAG Hierarchical Retrieval — arXiv 2602.03442](https://arxiv.org/html/2602.03442v1)
574
+ - [Building Effective AI Agents — Anthropic](https://www.anthropic.com/research/building-effective-agents)
575
+ - [Code Generation with LLM Agents Survey — arXiv 2508.00083](https://arxiv.org/html/2508.00083v1)
576
+
577
+ ### Deep Research Agent Pipelines
578
+ - [GPT Researcher — GitHub](https://github.com/assafelovic/gpt-researcher)
579
+ - [GPT Researcher Architecture — DeepWiki](https://deepwiki.com/assafelovic/gpt-researcher)
580
+ - [DeepResearchAgent — SkyworkAI](https://github.com/SkyworkAI/DeepResearchAgent)
581
+ - [Deep Research API — OpenAI Cookbook](https://cookbook.openai.com/examples/deep_research_api/introduction_to_deep_research_api_agents)
582
+ - [Deep Research Agents Examination — arXiv 2506.18096](https://arxiv.org/html/2506.18096v2)
583
+
584
+ ### Cross-Domain Frameworks
585
+ - [Cochrane PICO](https://www.cochranelibrary.com/about-pico)
586
+ - [otto-SR: Automated Systematic Reviews](https://ottosr.com/manuscript.pdf)
587
+ - [ASReview — Nature Machine Intelligence](https://www.nature.com/articles/s42256-020-00287-7)
588
+ - [Double Diamond — British Design Council / Maze](https://maze.co/blog/double-diamond-design-process/)
589
+ - [Intelligence Preparation of the Battlefield — Army ADP 2-01.3](https://armypubs.army.mil/epubs/DR_pubs/DR_a/ARN36709-ATP_2-01.3-001-WEB-2.pdf)
590
+ - [Analysis of Competing Hypotheses — CIA](https://www.cia.gov/static/955180a45afe3f5013772c313b16face/Tradecraft-Primer-apr09.pdf)
591
+ - [Technical Spikes in Agile — Talent500](https://talent500.com/blog/spike-in-agile-purpose-process-best-practices/)
592
+
593
+ ### Codebase (Internal)
594
+ - `skills/autocode/competitive-mode.md` — pre-flight exploration pattern (codebase + external agents)
595
+ - `scripts/prior-art-search.sh` — existing prior-art search (GitHub + local + ast-grep)
596
+ - `scripts/auto-compound.sh` — automated pipeline with Step 2.5 prior-art search
597
+ - `docs/plans/2026-02-21-code-factory-v2-design.md` — V2 design with prior-art search as Task 3.2
598
+ - `docs/plans/2026-02-21-code-factory-v2-phase4-design.md` — ast-grep discovery mode
599
+ - `docs/plans/2026-02-13-ha-intelligence-research-findings.md` — example of structured research (4 parallel agents, 100+ papers)
600
+ - `docs/plans/2026-02-21-infrastructure-deep-research.md` — example of structured research (5 parallel agents)
601
+ - `docs/plans/2026-02-21-mab-research-report.md` — MAB Round 1 research (this session)
602
+ - `docs/plans/2026-02-22-mab-research-round2.md` — MAB Round 2 research (this session)