autonomous-coding-toolkit 1.0.0
- package/.claude-plugin/marketplace.json +22 -0
- package/.claude-plugin/plugin.json +13 -0
- package/LICENSE +21 -0
- package/Makefile +21 -0
- package/README.md +140 -0
- package/SECURITY.md +28 -0
- package/agents/bash-expert.md +113 -0
- package/agents/dependency-auditor.md +138 -0
- package/agents/integration-tester.md +120 -0
- package/agents/lesson-scanner.md +149 -0
- package/agents/python-expert.md +179 -0
- package/agents/service-monitor.md +141 -0
- package/agents/shell-expert.md +147 -0
- package/benchmarks/runner.sh +147 -0
- package/benchmarks/tasks/01-rest-endpoint/rubric.sh +29 -0
- package/benchmarks/tasks/01-rest-endpoint/task.md +17 -0
- package/benchmarks/tasks/02-refactor-module/task.md +8 -0
- package/benchmarks/tasks/03-fix-integration-bug/task.md +8 -0
- package/benchmarks/tasks/04-add-test-coverage/task.md +8 -0
- package/benchmarks/tasks/05-multi-file-feature/task.md +8 -0
- package/bin/act.js +238 -0
- package/commands/autocode.md +6 -0
- package/commands/cancel-ralph.md +18 -0
- package/commands/code-factory.md +53 -0
- package/commands/create-prd.md +55 -0
- package/commands/ralph-loop.md +18 -0
- package/commands/run-plan.md +117 -0
- package/commands/submit-lesson.md +122 -0
- package/docs/ARCHITECTURE.md +630 -0
- package/docs/CONTRIBUTING.md +125 -0
- package/docs/lessons/0001-bare-exception-swallowing.md +34 -0
- package/docs/lessons/0002-async-def-without-await.md +28 -0
- package/docs/lessons/0003-create-task-without-callback.md +28 -0
- package/docs/lessons/0004-hardcoded-test-counts.md +28 -0
- package/docs/lessons/0005-sqlite-without-closing.md +33 -0
- package/docs/lessons/0006-venv-pip-path.md +27 -0
- package/docs/lessons/0007-runner-state-self-rejection.md +35 -0
- package/docs/lessons/0008-quality-gate-blind-spot.md +33 -0
- package/docs/lessons/0009-parser-overcount-empty-batches.md +36 -0
- package/docs/lessons/0010-local-outside-function-bash.md +33 -0
- package/docs/lessons/0011-batch-tests-for-unimplemented-code.md +36 -0
- package/docs/lessons/0012-api-markdown-unescaped-chars.md +33 -0
- package/docs/lessons/0013-export-prefix-env-parsing.md +33 -0
- package/docs/lessons/0014-decorator-registry-import-side-effect.md +43 -0
- package/docs/lessons/0015-frontend-backend-schema-drift.md +43 -0
- package/docs/lessons/0016-event-driven-cold-start-seeding.md +44 -0
- package/docs/lessons/0017-copy-paste-logic-diverges.md +43 -0
- package/docs/lessons/0018-layer-passes-pipeline-broken.md +45 -0
- package/docs/lessons/0019-systemd-envfile-ignores-export.md +41 -0
- package/docs/lessons/0020-persist-state-incrementally.md +44 -0
- package/docs/lessons/0021-dual-axis-testing.md +48 -0
- package/docs/lessons/0022-jsx-factory-shadowing.md +43 -0
- package/docs/lessons/0023-static-analysis-spiral.md +51 -0
- package/docs/lessons/0024-shared-pipeline-implementation.md +55 -0
- package/docs/lessons/0025-defense-in-depth-all-entry-points.md +65 -0
- package/docs/lessons/0026-linter-no-rules-false-enforcement.md +54 -0
- package/docs/lessons/0027-jsx-silent-prop-drop.md +64 -0
- package/docs/lessons/0028-no-infrastructure-in-client-code.md +49 -0
- package/docs/lessons/0029-never-write-secrets-to-files.md +61 -0
- package/docs/lessons/0030-cache-merge-not-replace.md +62 -0
- package/docs/lessons/0031-verify-units-at-boundaries.md +66 -0
- package/docs/lessons/0032-module-lifecycle-subscribe-unsubscribe.md +89 -0
- package/docs/lessons/0033-async-iteration-mutable-snapshot.md +72 -0
- package/docs/lessons/0034-caller-missing-await-silent-discard.md +65 -0
- package/docs/lessons/0035-duplicate-registration-silent-overwrite.md +85 -0
- package/docs/lessons/0036-websocket-dirty-disconnect.md +33 -0
- package/docs/lessons/0037-parallel-agents-worktree-corruption.md +31 -0
- package/docs/lessons/0038-subscribe-no-stored-ref.md +36 -0
- package/docs/lessons/0039-fallback-or-default-hides-bugs.md +34 -0
- package/docs/lessons/0040-event-firehose-filter-first.md +36 -0
- package/docs/lessons/0041-ambiguous-base-dir-path-nesting.md +32 -0
- package/docs/lessons/0042-spec-compliance-insufficient.md +36 -0
- package/docs/lessons/0043-exact-count-extensible-collections.md +32 -0
- package/docs/lessons/0044-relative-file-deps-worktree.md +39 -0
- package/docs/lessons/0045-iterative-design-improvement.md +33 -0
- package/docs/lessons/0046-plan-assertion-math-bugs.md +38 -0
- package/docs/lessons/0047-pytest-single-threaded-default.md +37 -0
- package/docs/lessons/0048-integration-wiring-batch.md +40 -0
- package/docs/lessons/0049-ab-verification.md +41 -0
- package/docs/lessons/0050-editing-sourced-files-during-execution.md +33 -0
- package/docs/lessons/0051-infrastructure-fixes-cant-self-heal.md +30 -0
- package/docs/lessons/0052-uncommitted-changes-poison-quality-gates.md +31 -0
- package/docs/lessons/0053-jq-compact-flag-inconsistency.md +31 -0
- package/docs/lessons/0054-parser-matches-inside-code-blocks.md +30 -0
- package/docs/lessons/0055-agents-compensate-for-garbled-prompts.md +31 -0
- package/docs/lessons/0056-grep-count-exit-code-on-zero.md +42 -0
- package/docs/lessons/0057-new-artifacts-break-git-clean-gates.md +42 -0
- package/docs/lessons/0058-dead-config-keys-never-consumed.md +49 -0
- package/docs/lessons/0059-contract-test-shared-structures.md +53 -0
- package/docs/lessons/0060-set-e-silent-death-in-runners.md +53 -0
- package/docs/lessons/0061-context-injection-dirty-state.md +50 -0
- package/docs/lessons/0062-sibling-bug-neighborhood-scan.md +29 -0
- package/docs/lessons/0063-one-flag-two-lifetimes.md +31 -0
- package/docs/lessons/0064-test-passes-wrong-reason.md +31 -0
- package/docs/lessons/0065-pipefail-grep-count-double-output.md +39 -0
- package/docs/lessons/0066-local-keyword-outside-function.md +37 -0
- package/docs/lessons/0067-stdin-hang-non-interactive-shell.md +36 -0
- package/docs/lessons/0068-agent-builds-wrong-thing-correctly.md +31 -0
- package/docs/lessons/0069-plan-quality-dominates-execution.md +30 -0
- package/docs/lessons/0070-spec-echo-back-prevents-drift.md +31 -0
- package/docs/lessons/0071-positive-instructions-outperform-negative.md +30 -0
- package/docs/lessons/0072-lost-in-the-middle-context-placement.md +30 -0
- package/docs/lessons/0073-unscoped-lessons-cause-false-positives.md +30 -0
- package/docs/lessons/0074-stale-context-injection-wrong-batch.md +32 -0
- package/docs/lessons/0075-research-artifacts-must-persist.md +32 -0
- package/docs/lessons/0076-wrong-decomposition-contaminates-downstream.md +30 -0
- package/docs/lessons/0077-cherry-pick-merges-need-manual-resolution.md +30 -0
- package/docs/lessons/0078-static-review-without-live-test.md +30 -0
- package/docs/lessons/0079-integration-wiring-batch-required.md +32 -0
- package/docs/lessons/FRAMEWORK.md +161 -0
- package/docs/lessons/SUMMARY.md +201 -0
- package/docs/lessons/TEMPLATE.md +85 -0
- package/docs/plans/2026-02-21-code-factory-v2-design.md +204 -0
- package/docs/plans/2026-02-21-code-factory-v2-implementation-plan.md +2189 -0
- package/docs/plans/2026-02-21-code-factory-v2-phase4-design.md +537 -0
- package/docs/plans/2026-02-21-code-factory-v2-phase4-implementation-plan.md +2012 -0
- package/docs/plans/2026-02-21-hardening-pass-design.md +108 -0
- package/docs/plans/2026-02-21-hardening-pass-plan.md +1378 -0
- package/docs/plans/2026-02-21-mab-research-report.md +406 -0
- package/docs/plans/2026-02-21-marketplace-restructure-design.md +240 -0
- package/docs/plans/2026-02-21-marketplace-restructure-plan.md +832 -0
- package/docs/plans/2026-02-21-phase4-completion-plan.md +697 -0
- package/docs/plans/2026-02-21-validator-suite-design.md +148 -0
- package/docs/plans/2026-02-21-validator-suite-plan.md +540 -0
- package/docs/plans/2026-02-22-mab-research-round2.md +556 -0
- package/docs/plans/2026-02-22-mab-run-design.md +462 -0
- package/docs/plans/2026-02-22-mab-run-plan.md +2046 -0
- package/docs/plans/2026-02-22-operations-design-methodology-research.md +681 -0
- package/docs/plans/2026-02-22-research-agent-failure-taxonomy.md +532 -0
- package/docs/plans/2026-02-22-research-code-guideline-policies.md +886 -0
- package/docs/plans/2026-02-22-research-codebase-audit-refactoring.md +908 -0
- package/docs/plans/2026-02-22-research-coding-standards-documentation.md +541 -0
- package/docs/plans/2026-02-22-research-competitive-landscape.md +687 -0
- package/docs/plans/2026-02-22-research-comprehensive-testing.md +1076 -0
- package/docs/plans/2026-02-22-research-context-utilization.md +459 -0
- package/docs/plans/2026-02-22-research-cost-quality-tradeoff.md +548 -0
- package/docs/plans/2026-02-22-research-lesson-transferability.md +508 -0
- package/docs/plans/2026-02-22-research-multi-agent-coordination.md +312 -0
- package/docs/plans/2026-02-22-research-phase-integration.md +602 -0
- package/docs/plans/2026-02-22-research-plan-quality.md +428 -0
- package/docs/plans/2026-02-22-research-prompt-engineering.md +558 -0
- package/docs/plans/2026-02-22-research-unconventional-perspectives.md +528 -0
- package/docs/plans/2026-02-22-research-user-adoption.md +638 -0
- package/docs/plans/2026-02-22-research-verification-effectiveness.md +433 -0
- package/docs/plans/2026-02-23-agent-suite-design.md +299 -0
- package/docs/plans/2026-02-23-agent-suite-plan.md +578 -0
- package/docs/plans/2026-02-23-phase3-cost-infrastructure-design.md +148 -0
- package/docs/plans/2026-02-23-phase3-cost-infrastructure-plan.md +1062 -0
- package/docs/plans/2026-02-23-research-bash-expert-agent.md +543 -0
- package/docs/plans/2026-02-23-research-dependency-auditor-agent.md +564 -0
- package/docs/plans/2026-02-23-research-improving-existing-agents.md +503 -0
- package/docs/plans/2026-02-23-research-integration-tester-agent.md +454 -0
- package/docs/plans/2026-02-23-research-python-expert-agent.md +429 -0
- package/docs/plans/2026-02-23-research-service-monitor-agent.md +425 -0
- package/docs/plans/2026-02-23-research-shell-expert-agent.md +533 -0
- package/docs/plans/2026-02-23-roadmap-to-completion.md +530 -0
- package/docs/plans/2026-02-24-headless-module-split-design.md +98 -0
- package/docs/plans/2026-02-24-headless-module-split.md +443 -0
- package/docs/plans/2026-02-24-lesson-scope-metadata-design.md +228 -0
- package/docs/plans/2026-02-24-lesson-scope-metadata-plan.md +968 -0
- package/docs/plans/2026-02-24-npm-packaging-design.md +841 -0
- package/docs/plans/2026-02-24-npm-packaging-plan.md +1965 -0
- package/docs/plans/audit-findings.md +186 -0
- package/docs/telegram-notification-format.md +98 -0
- package/examples/example-plan.md +51 -0
- package/examples/example-prd.json +72 -0
- package/examples/example-roadmap.md +33 -0
- package/examples/quickstart-plan.md +63 -0
- package/hooks/hooks.json +26 -0
- package/hooks/setup-symlinks.sh +48 -0
- package/hooks/stop-hook.sh +135 -0
- package/package.json +47 -0
- package/policies/bash.md +71 -0
- package/policies/python.md +71 -0
- package/policies/testing.md +61 -0
- package/policies/universal.md +60 -0
- package/scripts/analyze-report.sh +97 -0
- package/scripts/architecture-map.sh +145 -0
- package/scripts/auto-compound.sh +273 -0
- package/scripts/batch-audit.sh +42 -0
- package/scripts/batch-test.sh +101 -0
- package/scripts/entropy-audit.sh +221 -0
- package/scripts/failure-digest.sh +51 -0
- package/scripts/generate-ast-rules.sh +96 -0
- package/scripts/init.sh +112 -0
- package/scripts/lesson-check.sh +428 -0
- package/scripts/lib/common.sh +61 -0
- package/scripts/lib/cost-tracking.sh +153 -0
- package/scripts/lib/ollama.sh +60 -0
- package/scripts/lib/progress-writer.sh +128 -0
- package/scripts/lib/run-plan-context.sh +215 -0
- package/scripts/lib/run-plan-echo-back.sh +231 -0
- package/scripts/lib/run-plan-headless.sh +396 -0
- package/scripts/lib/run-plan-notify.sh +57 -0
- package/scripts/lib/run-plan-parser.sh +81 -0
- package/scripts/lib/run-plan-prompt.sh +215 -0
- package/scripts/lib/run-plan-quality-gate.sh +132 -0
- package/scripts/lib/run-plan-routing.sh +315 -0
- package/scripts/lib/run-plan-sampling.sh +170 -0
- package/scripts/lib/run-plan-scoring.sh +146 -0
- package/scripts/lib/run-plan-state.sh +142 -0
- package/scripts/lib/run-plan-team.sh +199 -0
- package/scripts/lib/telegram.sh +54 -0
- package/scripts/lib/thompson-sampling.sh +176 -0
- package/scripts/license-check.sh +74 -0
- package/scripts/mab-run.sh +575 -0
- package/scripts/module-size-check.sh +146 -0
- package/scripts/patterns/async-no-await.yml +5 -0
- package/scripts/patterns/bare-except.yml +6 -0
- package/scripts/patterns/empty-catch.yml +6 -0
- package/scripts/patterns/hardcoded-localhost.yml +9 -0
- package/scripts/patterns/retry-loop-no-backoff.yml +12 -0
- package/scripts/pipeline-status.sh +197 -0
- package/scripts/policy-check.sh +226 -0
- package/scripts/prior-art-search.sh +133 -0
- package/scripts/promote-mab-lessons.sh +126 -0
- package/scripts/prompts/agent-a-superpowers.md +29 -0
- package/scripts/prompts/agent-b-ralph.md +29 -0
- package/scripts/prompts/judge-agent.md +61 -0
- package/scripts/prompts/planner-agent.md +44 -0
- package/scripts/pull-community-lessons.sh +90 -0
- package/scripts/quality-gate.sh +266 -0
- package/scripts/research-gate.sh +90 -0
- package/scripts/run-plan.sh +329 -0
- package/scripts/scope-infer.sh +159 -0
- package/scripts/setup-ralph-loop.sh +155 -0
- package/scripts/telemetry.sh +230 -0
- package/scripts/tests/run-all-tests.sh +52 -0
- package/scripts/tests/test-act-cli.sh +46 -0
- package/scripts/tests/test-agents-md.sh +87 -0
- package/scripts/tests/test-analyze-report.sh +114 -0
- package/scripts/tests/test-architecture-map.sh +89 -0
- package/scripts/tests/test-auto-compound.sh +169 -0
- package/scripts/tests/test-batch-test.sh +65 -0
- package/scripts/tests/test-benchmark-runner.sh +25 -0
- package/scripts/tests/test-common.sh +168 -0
- package/scripts/tests/test-cost-tracking.sh +158 -0
- package/scripts/tests/test-echo-back.sh +180 -0
- package/scripts/tests/test-entropy-audit.sh +146 -0
- package/scripts/tests/test-failure-digest.sh +66 -0
- package/scripts/tests/test-generate-ast-rules.sh +145 -0
- package/scripts/tests/test-helpers.sh +82 -0
- package/scripts/tests/test-init.sh +47 -0
- package/scripts/tests/test-lesson-check.sh +278 -0
- package/scripts/tests/test-lesson-local.sh +55 -0
- package/scripts/tests/test-license-check.sh +109 -0
- package/scripts/tests/test-mab-run.sh +182 -0
- package/scripts/tests/test-ollama-lib.sh +49 -0
- package/scripts/tests/test-ollama.sh +60 -0
- package/scripts/tests/test-pipeline-status.sh +198 -0
- package/scripts/tests/test-policy-check.sh +124 -0
- package/scripts/tests/test-prior-art-search.sh +96 -0
- package/scripts/tests/test-progress-writer.sh +140 -0
- package/scripts/tests/test-promote-mab-lessons.sh +110 -0
- package/scripts/tests/test-pull-community-lessons.sh +149 -0
- package/scripts/tests/test-quality-gate.sh +241 -0
- package/scripts/tests/test-research-gate.sh +132 -0
- package/scripts/tests/test-run-plan-cli.sh +86 -0
- package/scripts/tests/test-run-plan-context.sh +305 -0
- package/scripts/tests/test-run-plan-e2e.sh +153 -0
- package/scripts/tests/test-run-plan-headless.sh +424 -0
- package/scripts/tests/test-run-plan-notify.sh +124 -0
- package/scripts/tests/test-run-plan-parser.sh +217 -0
- package/scripts/tests/test-run-plan-prompt.sh +254 -0
- package/scripts/tests/test-run-plan-quality-gate.sh +222 -0
- package/scripts/tests/test-run-plan-routing.sh +178 -0
- package/scripts/tests/test-run-plan-scoring.sh +148 -0
- package/scripts/tests/test-run-plan-state.sh +261 -0
- package/scripts/tests/test-run-plan-team.sh +157 -0
- package/scripts/tests/test-scope-infer.sh +150 -0
- package/scripts/tests/test-setup-ralph-loop.sh +63 -0
- package/scripts/tests/test-telegram-env.sh +38 -0
- package/scripts/tests/test-telegram.sh +121 -0
- package/scripts/tests/test-telemetry.sh +46 -0
- package/scripts/tests/test-thompson-sampling.sh +139 -0
- package/scripts/tests/test-validate-all.sh +60 -0
- package/scripts/tests/test-validate-commands.sh +89 -0
- package/scripts/tests/test-validate-hooks.sh +98 -0
- package/scripts/tests/test-validate-lessons.sh +150 -0
- package/scripts/tests/test-validate-plan-quality.sh +235 -0
- package/scripts/tests/test-validate-plans.sh +187 -0
- package/scripts/tests/test-validate-plugin.sh +106 -0
- package/scripts/tests/test-validate-prd.sh +184 -0
- package/scripts/tests/test-validate-skills.sh +134 -0
- package/scripts/validate-all.sh +57 -0
- package/scripts/validate-commands.sh +67 -0
- package/scripts/validate-hooks.sh +89 -0
- package/scripts/validate-lessons.sh +98 -0
- package/scripts/validate-plan-quality.sh +369 -0
- package/scripts/validate-plans.sh +120 -0
- package/scripts/validate-plugin.sh +86 -0
- package/scripts/validate-policies.sh +42 -0
- package/scripts/validate-prd.sh +118 -0
- package/scripts/validate-skills.sh +96 -0
- package/skills/autocode/SKILL.md +285 -0
- package/skills/autocode/ab-verification.md +51 -0
- package/skills/autocode/code-quality-standards.md +37 -0
- package/skills/autocode/competitive-mode.md +364 -0
- package/skills/brainstorming/SKILL.md +97 -0
- package/skills/capture-lesson/SKILL.md +187 -0
- package/skills/check-lessons/SKILL.md +116 -0
- package/skills/dispatching-parallel-agents/SKILL.md +110 -0
- package/skills/executing-plans/SKILL.md +85 -0
- package/skills/finishing-a-development-branch/SKILL.md +201 -0
- package/skills/receiving-code-review/SKILL.md +72 -0
- package/skills/requesting-code-review/SKILL.md +59 -0
- package/skills/requesting-code-review/code-reviewer.md +82 -0
- package/skills/research/SKILL.md +145 -0
- package/skills/roadmap/SKILL.md +115 -0
- package/skills/subagent-driven-development/SKILL.md +98 -0
- package/skills/subagent-driven-development/code-quality-reviewer-prompt.md +18 -0
- package/skills/subagent-driven-development/implementer-prompt.md +73 -0
- package/skills/subagent-driven-development/spec-reviewer-prompt.md +57 -0
- package/skills/systematic-debugging/SKILL.md +134 -0
- package/skills/systematic-debugging/condition-based-waiting.md +64 -0
- package/skills/systematic-debugging/defense-in-depth.md +32 -0
- package/skills/systematic-debugging/root-cause-tracing.md +55 -0
- package/skills/test-driven-development/SKILL.md +167 -0
- package/skills/using-git-worktrees/SKILL.md +219 -0
- package/skills/using-superpowers/SKILL.md +54 -0
- package/skills/verification-before-completion/SKILL.md +140 -0
- package/skills/verify/SKILL.md +82 -0
- package/skills/writing-plans/SKILL.md +128 -0
- package/skills/writing-skills/SKILL.md +93 -0
@@ -0,0 +1,602 @@
# Research Phase Integration: Formalizing Research in the Autonomous Coding Pipeline

**Date:** 2026-02-22
**Status:** Research complete
**Scope:** How to integrate a structured research phase into the toolkit's workflow pipeline, plus code factory consolidation and roadmap stage
**Method:** 3 parallel research agents (external frameworks, codebase analysis, cross-domain analogies) + manual codebase exploration

---

## Executive Summary

The autonomous coding toolkit's pipeline (brainstorm → PRD → plan → execute → verify → finish) has no formalized research phase. Research happens informally during brainstorming (codebase reconnaissance) and is partially automated via `prior-art-search.sh` in the headless pipeline — but neither produces a structured, reusable artifact.

Evidence from six domains (medicine, military intelligence, design thinking, competitive intelligence, deep research agents, SWE-bench) converges on the same pattern: **structured research before action, producing a durable artifact that downstream phases consume.**

The MAB research we conducted in this session is the proof case — Round 1 alone halved the batch count, identified 80% code reuse, surfaced 3 academic techniques the design missed, and found 8 latent bugs. All before a single line of implementation code was written.

**Three additions proposed:**
1. **Research phase** — new Stage 1.5 between brainstorming and PRD, producing structured `tasks/research-<slug>.md` + `.json`
2. **Roadmap stage** — new Stage 0.5 before brainstorming, for multi-feature sequencing
3. **Code factory consolidation** — bring all Code Factory scripts and skills into the toolkit as first-class pipeline components

---

## 1. The Case for Formalized Research

### 1.1 What Top-Performing Agents Do

Evidence from SWE-bench, Cognition (Devin), and academic literature:

| Finding | Source | Implication |
|---------|--------|-------------|
| Agents spend >60% of first-turn time retrieving context | Cognition SWE-bench report | Context retrieval is the bottleneck, not code generation |
| SWE-grep (specialized retrieval sub-agent) reduced context retrieval from 20+ turns to 4 turns | Cognition SWE-grep blog | Separate the retrieval agent from the coding agent |
| Performance degrades >30% when relevant info is in the middle of context vs beginning/end | Stanford "Lost in the Middle" (arXiv 2307.03172) | Compress and select before injecting — don't dump everything |
| RAG from diverse, high-quality sources produces significant gains even on top of GPT-4 | CodeRAG-Bench (arXiv 2406.14497) | Multi-source research (codebase + docs + web + papers) compounds |
| 72% of SWE-bench successes take >10 minutes | SWE-bench Pro (Scale AI) | Exploration time is not waste — it's the work |
| "Most agent failures are not model failures — they are context failures" | Anthropic context engineering guide | The research phase IS context engineering |
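
The "Lost in the Middle" row suggests a concrete ordering rule: once snippets are ranked, keep only the best few and put the strongest at the two ends of the prompt. The following is a minimal Python sketch under that assumption. `place_at_edges` and `build_context` are hypothetical names, not toolkit functions, and the relevance scores are assumed to come from some upstream retriever.

```python
from typing import List, Tuple


def place_at_edges(ranked_snippets: List[str]) -> List[str]:
    """Reorder snippets (most relevant first) so the strongest sit at the
    beginning and end of the context and the weakest fall in the middle."""
    front: List[str] = []
    back: List[str] = []
    for i, snippet in enumerate(ranked_snippets):
        # Alternate sides: rank 0 -> front, rank 1 -> back, rank 2 -> front, ...
        (front if i % 2 == 0 else back).append(snippet)
    # Reverse the back half so the second-best snippet ends up last.
    return front + back[::-1]


def build_context(scored: List[Tuple[float, str]], keep: int) -> str:
    """Select the top `keep` snippets by score, then place them at the edges."""
    by_score = sorted(scored, key=lambda pair: pair[0], reverse=True)
    top = [text for _, text in by_score[:keep]]
    return "\n\n".join(place_at_edges(top))
```

With five ranked snippets `a` through `e`, `place_at_edges` returns `a, c, e, d, b`: the two best sit first and last, and the weakest lands in the middle.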

### 1.2 What the Current Pipeline Does (and Doesn't)

| Research-like Activity | Where | Artifact Produced | Consumed By | Gap |
|----------------------|-------|-------------------|-------------|-----|
| Codebase reconnaissance | brainstorming Step 1 | None — ephemeral | Clarifying questions only | No artifact, no record |
| Prior-art search | `auto-compound.sh` Step 2.5 | `prior-art-results.txt` (unstructured) | PRD prompt injection | Not in interactive path; unstructured; no schema |
| Report analysis | `analyze-report.sh` | `analysis.json` | `auto-compound.sh` only | Triage, not research |
| PRD investigation tasks | `create-prd.md` Step 4 | None — findings disappear | `progress.txt` at best | No template, no format, no enforcement |
| Competitive pre-flight | `competitive-mode.md` | Context brief (ephemeral) | Competitor prompts | Only in competitive mode; not a durable artifact |
| Manual research reports | MAB session (this session) | `docs/plans/*.md` (structured) | Design doc, plan | No automated analog |

**The structural gap:** Research findings have no path back into the pipeline. There is no stage that reads research output and uses it to modify the design, scope the PRD, or annotate the plan.

### 1.3 What the MAB Research Session Proved

| Activity | Impact | Pipeline Could Have Done This? |
|----------|--------|-------------------------------|
| Codebase gap analysis | Identified 80% infrastructure reuse — halved batch count | No — brainstorming doesn't audit existing code against plan assumptions |
| Academic literature review | Added Thompson Sampling, position bias mitigation, prompt evolution | No — no external search mechanism |
| Cross-domain analogies | 7 analogies produced 3 universal patterns (locked criteria, diversity as signal, discriminating conditions) | No — nothing searches outside the domain |
| Cost modeling | $1.88 vs $10.58/batch with cache priming — changed architecture | No — no cost analysis mechanism |
| Latent bug identification | 8 bugs found before implementation (including state schema mismatch affecting all headless runs) | Partial — lesson-check is post-hoc, not pre-implementation |
| Research → plan reshape | Round 1 halved batches; Round 2 added cache-prime step | No — no feedback path from research to plan |

---

## 2. Cross-Domain Research Frameworks

### 2.1 Evidence-Based Medicine (PICO + Cochrane)

Of the frameworks surveyed here, this one has the strongest anti-bias safeguards. It mandates five phases:

1. **Protocol registration** — pre-specify question, inclusion/exclusion criteria, synthesis method *before seeing data*
2. **Question decomposition (PICO):** Population, Intervention, Comparison, Outcome
3. **Search strategy** — explicit queries across explicit sources, documented for reproducibility
4. **Screening** — two-stage: title/abstract first, then full-text, with pre-defined inclusion rules
5. **Data extraction → synthesis** — structured form per source, then aggregation with confidence grades (GRADE: high/moderate/low/very low)

**Key artifacts:** `review_protocol.md` (frozen before search), `search_log.json`, `screening_matrix.csv`, `evidence_table.md`

**Transferable insight:** The protocol is frozen before data collection. You cannot adjust inclusion criteria after seeing results. Applied to coding: define what "relevant prior art" means *before* searching.

**Automated analog:** otto-SR reproduced 12 Cochrane reviews in 2 days using a multi-agent LLM pipeline (abstract screen → full-text screen → extraction → synthesis). Sensitivity: 96.7%, specificity: 97.9%.
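
One way to picture the "frozen protocol" idea in code is an immutable record that is created before any search runs and then used to screen findings. This is only a sketch: `ResearchProtocol`, its field names, and the keyword-based screening rule are assumptions made for illustration, and the mapping of PICO onto code questions in the comments is one possible reading, not something the frameworks above prescribe.

```python
from dataclasses import dataclass, FrozenInstanceError
from typing import Tuple


@dataclass(frozen=True)
class ResearchProtocol:
    """Pre-registered research protocol; fields cannot be reassigned once created."""
    question: str
    population: str      # P: the code or system under study (assumed mapping)
    intervention: str    # I: the change being considered
    comparison: str      # C: the alternative, often the status quo
    outcome: str         # O: what would count as success
    sources: Tuple[str, ...]             # where to search, fixed up front
    inclusion_keywords: Tuple[str, ...]  # what counts as relevant prior art

    def is_relevant(self, text: str) -> bool:
        """Screen a candidate finding against the pre-registered keywords."""
        lowered = text.lower()
        return any(keyword.lower() in lowered for keyword in self.inclusion_keywords)


def try_to_widen(protocol: ResearchProtocol) -> bool:
    """Return True if the protocol rejected an attempt to change its criteria."""
    try:
        protocol.inclusion_keywords = ("anything",)  # type: ignore[misc]
    except FrozenInstanceError:
        return True
    return False
```

Because the dataclass is frozen, loosening the inclusion criteria after seeing results requires constructing a new protocol explicitly, which leaves a visible trace in the code or its history.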

### 2.2 Military Intelligence (IPB + ACH + OODA)

**Intelligence Preparation of the Battlefield (IPB)** — four mandatory steps:

1. Define operational environment (scope)
2. Describe environmental effects on operations (constraints)
3. Evaluate the threat (adversary capabilities)
4. Determine threat courses of action (all plausible, not just most likely)

**What's distinctive:** IPB explicitly maps *what you don't know* alongside what you do. The artifact isn't just findings — it's a **structured-ignorance document** that defines what information would change the assessment.

**Analysis of Competing Hypotheses (ACH):** Build an evidence matrix where rows are evidence items and columns are competing hypotheses. Score each cell. The hypothesis with the least disconfirming evidence wins — not the one with the most confirming evidence.

**Transferable insights:**
- Map unknowns explicitly, not just knowns
- Evaluate competing approaches by disconfirmation, not confirmation
- The ASCOPE matrix (Areas, Structures, Capabilities, Organizations, People, Events) translates to: Files, Modules, APIs, Dependencies, Users, Workflows
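
The ACH rule described above (rank by least disconfirming evidence, not most confirming) is small enough to sketch directly. The scoring convention here is an assumption for illustration: `-1` means the evidence contradicts the hypothesis, `0` is neutral, and `+1` is consistent. `rank_hypotheses` is a hypothetical helper, not part of the toolkit.

```python
from typing import Dict, List

# matrix[evidence_item][hypothesis] -> score in {-1, 0, +1} (assumed convention)
EvidenceMatrix = Dict[str, Dict[str, int]]


def rank_hypotheses(matrix: EvidenceMatrix) -> List[str]:
    """Order hypotheses by how few evidence items contradict them.

    Confirming scores are ignored on purpose: under ACH, a hypothesis is
    eliminated by contradiction, not promoted by agreement.
    """
    disconfirming: Dict[str, int] = {}
    for scores in matrix.values():
        for hypothesis, score in scores.items():
            disconfirming.setdefault(hypothesis, 0)
            if score < 0:
                disconfirming[hypothesis] += 1
    # sorted() is stable, so ties keep their first-seen order.
    return sorted(disconfirming, key=lambda h: disconfirming[h])


example: EvidenceMatrix = {
    "e1": {"rewrite": +1, "reuse": 0},
    "e2": {"rewrite": +1, "reuse": 0},
    "e3": {"rewrite": +1, "reuse": +1},
    "e4": {"rewrite": -1, "reuse": 0},
    "e5": {"rewrite": -1, "reuse": 0},
}
```

In `example`, "rewrite" has three confirming items but two contradictions, while "reuse" has only one confirming item and no contradictions, so `rank_hypotheses(example)` puts "reuse" first.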

### 2.3 Design Thinking (Double Diamond)

Two explicit diverge/converge cycles with a hard gate between them:

```
Diamond 1: Problem Space            Diamond 2: Solution Space
[Discover] → [Define]      GATE     [Develop] → [Deliver]
(diverge)    (converge)             (diverge)   (converge)
```

**Gate rule:** You *cannot* enter solution space without a frozen problem definition.

**Discovery phase artifacts:** Empathy maps, competitive landscape matrix, "How Might We" question bank, insight statements

**Transferable insight:** Discovery is explicitly divergent — collect more than you need, then cull. The cull produces a Point of View (POV) statement that's frozen before solution work begins.

### 2.4 Competitive Intelligence

The intelligence cycle: **Requirements → Collection → Analysis → Dissemination**

**What's distinctive:** Dissemination is tailored by consumer role. The same research produces different artifacts for different downstream consumers (executive summary for decision-makers, detailed analysis for implementers, raw data for further analysis).

**Applied to the pipeline:** A single research phase produces:
- `research-<slug>.md` — human-readable report for design review
- `research-<slug>.json` — machine-readable for PRD scoping and context injection
- GitHub issues — for deferred items discovered during research
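
To make the "one research run, several consumer-specific artifacts" idea concrete, here is a sketch that renders the same findings as Markdown and as JSON. The finding keys (`claim`, `source`, `confidence`) and all function names are hypothetical, since the document does not define a schema, and the GitHub-issues output is left out.

```python
import json
from pathlib import Path
from typing import Dict, List, Tuple

Finding = Dict[str, str]  # hypothetical keys: "claim", "source", "confidence"


def render_json(slug: str, findings: List[Finding]) -> str:
    """Machine-readable form, e.g. for PRD scoping and context injection."""
    return json.dumps({"slug": slug, "findings": findings}, indent=2, sort_keys=True)


def render_markdown(slug: str, findings: List[Finding]) -> str:
    """Human-readable form, e.g. for design review."""
    lines = [
        f"# Research: {slug}",
        "",
        "| Claim | Source | Confidence |",
        "|---|---|---|",
    ]
    for finding in findings:
        lines.append(
            f"| {finding['claim']} | {finding['source']} | {finding['confidence']} |"
        )
    return "\n".join(lines) + "\n"


def write_research_artifacts(
    slug: str, findings: List[Finding], out_dir: Path
) -> Tuple[Path, Path]:
    """Write research-<slug>.md and research-<slug>.json into out_dir."""
    out_dir.mkdir(parents=True, exist_ok=True)
    md_path = out_dir / f"research-{slug}.md"
    json_path = out_dir / f"research-{slug}.json"
    md_path.write_text(render_markdown(slug, findings), encoding="utf-8")
    json_path.write_text(render_json(slug, findings) + "\n", encoding="utf-8")
    return md_path, json_path
```

Keeping the renderers separate from the file writing means both outputs come from a single list of findings, so the human and machine views cannot drift apart.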

### 2.5 Deep Research Agent Architecture

The canonical pipeline (from GPT Researcher, OpenAI Deep Research, DeepResearchAgent):

```
Phase 1: PLAN       — decompose query into sub-questions (strategic LLM)
Phase 2: EXECUTE    — parallel retrieval per sub-question (crawler agents)
Phase 3: CURATE     — embedding similarity filter + credibility ranking
Phase 4: SYNTHESIZE — aggregate into structured output (smart LLM)
Phase 5: PUBLISH    — format with citations
```

**Key insight:** These pipelines treat research output as a *durable artifact* (a report), not ephemeral context. Coding agents typically treat retrieved context as ephemeral — this is the architectural gap.
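
The five phases can be sketched as one function that takes the planner and retriever as injected callables and returns a report string. This is a simplified illustration, not how any of the named tools are implemented: retrieval runs sequentially rather than in parallel, a plain word-overlap count stands in for the embedding similarity filter, and credibility ranking is omitted. `run_research` and the `Doc` shape are hypothetical.

```python
from typing import Callable, Dict, List

Doc = Dict[str, str]  # hypothetical shape: {"text": ..., "source": ...}


def _overlap(a: str, b: str) -> int:
    """Count distinct lowercase words shared by two strings."""
    return len(set(a.lower().split()) & set(b.lower().split()))


def run_research(
    query: str,
    plan: Callable[[str], List[str]],
    retrieve: Callable[[str], List[Doc]],
    min_overlap: int = 1,
) -> str:
    # Phase 1: PLAN - decompose the query into sub-questions.
    sub_questions = plan(query)

    # Phase 2: EXECUTE - retrieve per sub-question (sequential in this sketch).
    hits = [(sq, doc) for sq in sub_questions for doc in retrieve(sq)]

    # Phase 3: CURATE - keep documents sharing enough words with their sub-question.
    curated = [(sq, doc) for sq, doc in hits if _overlap(sq, doc["text"]) >= min_overlap]

    # Phase 4: SYNTHESIZE - group surviving evidence under each sub-question.
    grouped: Dict[str, List[Doc]] = {sq: [] for sq in sub_questions}
    for sq, doc in curated:
        grouped[sq].append(doc)

    # Phase 5: PUBLISH - render a report with a citation on every finding.
    lines = [f"# Research: {query}"]
    for sq, docs in grouped.items():
        lines.append(f"## {sq}")
        if not docs:
            lines.append("- No supporting evidence found (open question)")
        for doc in docs:
            lines.append(f"- {doc['text']} [{doc['source']}]")
    return "\n".join(lines)
```

The return value is the durable artifact: it can be written to disk and handed to later stages instead of living only in the agent's context window.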

### 2.6 Agile Technical Spikes

**Definition:** A time-boxed investigation task with a single question and a concrete deliverable (decision, estimate, or prototype).

**Best practices:**
- Single clear question — not "understand the codebase" but "what dependency injection pattern does the auth module use?"
- Time-boxed to 1-3 days (for agents: token/turn budgets)
- Deliverable is a decision, not code
- Two types: Technical (how to build) vs Functional (what to build)

**The anti-pattern AI agents fall into:** They conflate spike and implementation into a single trajectory. The agent starts writing code while it is still searching, before the search is complete.
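
For agents, the time box mentioned above becomes a turn and token budget attached to a single question. The following is a minimal sketch of such a budget; `SpikeBudget` and its fields are hypothetical, and the caller is assumed to supply the token count for each turn.

```python
from dataclasses import dataclass


@dataclass
class SpikeBudget:
    """Turn and token caps for one investigation question (illustrative only)."""
    question: str
    max_turns: int
    max_tokens: int
    turns_used: int = 0
    tokens_used: int = 0

    def charge(self, tokens: int) -> bool:
        """Record one turn; return True while the spike may continue."""
        self.turns_used += 1
        self.tokens_used += tokens
        return not self.exhausted

    @property
    def exhausted(self) -> bool:
        """True once either the turn cap or the token cap has been reached."""
        return self.turns_used >= self.max_turns or self.tokens_used >= self.max_tokens
```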

---

## 3. Proposed Pipeline Changes

### 3.1 Current Pipeline

```
Stage 0:   Initialize — detect project, load context
Stage 1:   Brainstorm — design doc + user approval
Stage 2:   PRD        — tasks/prd.json with shell-verifiable criteria
Stage 3:   Plan       — TDD implementation plan
Stage 3.5: Isolate    — git worktree
Stage 4:   Execute    — one of 4 modes
Stage 5:   Verify     — all PRD criteria pass
Stage 6:   Finish     — merge/PR/keep/discard
```

### 3.2 Proposed Pipeline (3 additions)

```
Stage 0:   Initialize     — detect project, load context
Stage 0.5: ROADMAP [NEW]  — multi-feature sequencing, priority ordering
Stage 1:   Brainstorm     — design doc + user approval
Stage 1.5: RESEARCH [NEW] — structured investigation, produces durable artifact
Stage 2:   PRD            — tasks/prd.json (scoped by research findings)
Stage 3:   Plan           — TDD implementation plan (informed by research)
Stage 3.5: Isolate        — git worktree
Stage 4:   Execute        — one of 4+ modes (including MAB)
Stage 5:   Verify         — all PRD criteria pass
Stage 6:   Finish         — merge/PR/keep/discard
```
|
|
186
|
+
### 3.3 Stage 0.5: Roadmap (New)

**Purpose:** Before brainstorming a single feature, assess whether the work fits into a larger picture. A roadmap answers: *What order should features be built in? What blocks what? What's the minimum viable sequence?*

**When to invoke:**
- When the user describes multiple features or a large system
- When `auto-compound.sh` processes a report with multiple priorities
- When multiple GitHub issues exist and need sequencing
- Skip for single, isolated features

**Artifact:** `docs/roadmap-<project-or-theme>.md`

```markdown
# Roadmap: <theme>
**Date:** YYYY-MM-DD
**Scope:** <what this roadmap covers>

## Features (priority order)
| # | Feature | Depends On | Effort | Value Signal |
|---|---------|-----------|--------|-------------|
| 1 | <name> | — | S/M/L | <why this first> |
| 2 | <name> | #1 | S/M/L | <why this order> |

## Dependency Graph
<text-based or mermaid graph>

## Decision Log
- <decision>: <rationale>

## Out of Scope
- <item>: <why deferred>
```

**Gate:** User approves roadmap before brainstorming the first feature. Each feature in the roadmap gets its own brainstorm → research → PRD → plan → execute cycle.

**Integration with pipeline:**
- `autocode` skill checks for existing roadmap; if none exists and scope seems multi-feature, prompts user
- `auto-compound.sh` can generate roadmap from multi-priority `analysis.json`
- Roadmap is a living document — updated after each feature completes

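The "Depends On" column of the roadmap can be turned into a build order with a topological sort. A minimal sketch, assuming the dependencies are exported as `<prerequisite> <feature>` pairs (a format invented here for illustration) and relying on the standard `tsort` utility. `roadmap_order` and the feature names are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: derive a build order from "Depends On" pairs.
# Input: one "<prerequisite> <feature>" pair per line on stdin.
# A feature without dependencies is listed as "<feature> <feature>".
# GNU tsort reports a loop on stderr and exits non-zero if the pairs
# contain a cycle; other implementations may behave differently.
roadmap_order() {
  tsort
}

# Example: billing depends on auth, invoices depends on billing.
# Prints auth, billing, invoices (one per line).
roadmap_order <<'EOF'
auth auth
auth billing
billing invoices
EOF
```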
### 3.4 Stage 1.5: Research (New)

**Purpose:** After the design is approved, before PRD generation, conduct structured investigation to validate assumptions, find reusable components, surface latent issues, and mine external knowledge.

**Activities (parallel where possible):**

| Activity | Agent Type | Sources | Output |
|----------|-----------|---------|--------|
| Codebase gap analysis | Explore | Local files, AST, imports | Reuse table |
| Prior-art search | general-purpose | GitHub, web, Context7 | Library recommendations, patterns |
| Academic/external lit | general-purpose | Web search, papers | Techniques, measured impact |
| Cross-domain analogies | general-purpose | Web search (lateral) | Transferable patterns |
| Cost/feasibility | general-purpose | API pricing, benchmarks | Cost model |
| Latent issue scan | Explore + Bash | Existing code, tests, lint | Bug list with file:line |

**Research protocol (adapted from Cochrane):**
1. **Scope** — what questions does this research answer? (derived from design doc)
2. **Search** — explicit queries, documented in the artifact
3. **Screen** — relevance filter on results
4. **Extract** — structured findings per source
5. **Synthesize** — implications for design, PRD scope, and plan

**Artifacts produced:**

**`tasks/research-<feature-slug>.md`** — human-readable report:
```markdown
# Research: <feature>
**Date:** YYYY-MM-DD
**Design doc:** docs/plans/YYYY-MM-DD-<topic>-design.md

## Research Questions
1. <question derived from design>
2. <question>

## Codebase Gap Analysis
| Requirement | Existing File | Reusable? | Gap |
|-------------|--------------|-----------|-----|

## External Findings
### <Source Title>
- **Source:** <URL or citation>
- **Key finding:** <1-2 sentences>
- **Implication:** <how this affects our design>

## Latent Issues
| File:Line | Description | Severity | Blocking? |
|-----------|-------------|----------|-----------|

## Cross-Domain Insights
| Domain | Pattern | Application |
|--------|---------|-------------|

## Design Changes Recommended
1. [BLOCKING] <change> — <rationale>
2. <change> — <rationale>

## Cost Model
<if applicable>

## Deferred Items
- <item> → GitHub issue created: #<number>
```

**`tasks/research-<feature-slug>.json`** — machine-readable:
```json
{
  "feature": "string",
  "date": "YYYY-MM-DD",
  "design_doc": "path",
  "reuse_components": [
    {"requirement": "string", "file": "string", "lines": "string", "gap": "none|partial|full"}
  ],
  "latent_issues": [
    {"file": "string", "line": 0, "description": "string", "severity": "critical|high|medium|low", "blocking": true}
  ],
  "design_changes": [
    {"change": "string", "rationale": "string", "blocking": true}
  ],
  "prd_scope_delta": {
    "tasks_removable": ["string"],
    "tasks_added": ["string"],
    "estimated_task_reduction": 0
  },
  "external_findings_count": 0,
  "search_queries": ["string"]
}
```

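Before downstream stages read the machine-readable file, its top-level shape could be checked with `jq`. This is a sketch under the assumption that the keys tested below are all required. `validate_research_json` is a hypothetical name, not an existing toolkit function.

```bash
#!/usr/bin/env bash
# Hypothetical check: confirm the main top-level keys of the research JSON
# are present and have the expected container types.
# Returns 0 when the shape matches, non-zero otherwise (including invalid JSON).
validate_research_json() {
  jq -e '
    has("feature") and has("date") and has("design_doc")
    and (.reuse_components | type == "array")
    and (.latent_issues | type == "array")
    and (.design_changes | type == "array")
    and (.prd_scope_delta | type == "object")
  ' "$1" >/dev/null
}
```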
**Consumption by downstream stages:**

| Stage | How It Uses Research |
|-------|---------------------|
| PRD generation | Reads `prd_scope_delta` — removes tasks covered by reuse, adds tasks for latent issues |
| Writing plans | References research report under `## Research Findings`; adds fix tasks for latent issues |
| run-plan-context.sh | Injects critical/high latent issues as `### Research Warnings` in per-batch context |
| auto-compound.sh | Replaces Step 2.5 (prior-art-results.txt) with structured research JSON |
| Quality gate | `research-gate.sh` blocks PRD generation if blocking design changes are unresolved |

### 3.5 Code Factory Consolidation

**Current state:** Code Factory scripts and concepts are split between the toolkit repo and the Documents workspace:

| Component | Location | Should Be In Toolkit? |
|-----------|----------|----------------------|
| `auto-compound.sh` | toolkit `scripts/` | Yes (already there) |
| `quality-gate.sh` | toolkit `scripts/` | Yes (already there) |
| `run-plan.sh` + libs | toolkit `scripts/` | Yes (already there) |
| `analyze-report.sh` | toolkit `scripts/` | Yes (already there) |
| `prior-art-search.sh` | toolkit `scripts/` | Yes (already there) |
| `/create-prd` command | toolkit `commands/` | Yes (already there) |
| `/code-factory` command | toolkit `commands/` | Yes (already there) |
| `autocode` skill | toolkit `skills/` | Yes (already there) |
| `competitive-mode.md` | toolkit `skills/autocode/` | Yes (already there) |
| Code Factory design doc | workspace `docs/plans/` | Move to toolkit `docs/` |
| Code Factory V2 design | workspace `docs/plans/` | Move to toolkit `docs/` |
| `claude-md-validate.sh` | workspace `scripts/` | Keep in workspace (workspace-specific) |
| `lessons-review.sh` | workspace `scripts/` | Keep in workspace (workspace-specific) |
| PRD template/examples | toolkit `examples/` | Yes (already there) |

**The consolidation is mostly done.** The remaining gap is documentation — the Code Factory design docs and V2 design are in the workspace, not the toolkit. The pipeline integration points documented in `~/Documents/CLAUDE.md` under "Code Factory (Agent-Driven Development)" should be extracted into a toolkit-native `docs/CODE-FACTORY.md`.

**What "Code Factory in the toolkit" means concretely:**
1. Move Code Factory V2 design insights into `docs/ARCHITECTURE.md` (the authoritative architecture doc)
2. Ensure `autocode` skill references all pipeline scripts by their toolkit paths
3. The `competitive-mode.md` becomes the template for MAB's dual-agent execution
4. Prior-art search evolves into the research phase (this proposal)

---

## 4. Implementation Architecture

### 4.1 Research Skill

New file: `skills/research/SKILL.md`

```markdown
# Research Phase

## Overview
Conduct structured investigation after design approval and before PRD generation.
Produces a durable artifact that scopes the PRD and informs the plan.

## Checklist
1. Define research questions (from approved design doc)
2. Codebase gap analysis (Explore agent)
3. Prior-art search (call existing prior-art-search.sh + web search)
4. External literature (web search agents, parallel)
5. Cross-domain analogies (optional, for complex designs)
6. Latent issue scan (grep + lint on files the plan will touch)
7. Cost/feasibility model (optional, for compute-intensive features)
8. Synthesize into tasks/research-<slug>.md + .json
9. Present findings, get user approval
10. Apply blocking design changes before proceeding
```

### 4.2 Roadmap Skill

New file: `skills/roadmap/SKILL.md`

Invoked when scope is multi-feature. Produces `docs/roadmap-<theme>.md`. Gates brainstorming — each feature in the roadmap gets its own brainstorm cycle.

### 4.3 Pipeline Updates

**`skills/autocode/SKILL.md`** — add Stage 0.5 (roadmap, conditional) and Stage 1.5 (research, always):

```
Stage 0: Initialize
Stage 0.5: Roadmap (if multi-feature scope)
Stage 1: Brainstorm → design doc
Stage 1.5: Research → tasks/research-<slug>.md + .json
Stage 2: PRD (scoped by research)
Stage 3: Plan (informed by research)
...
```

**`commands/code-factory.md`** — add research stage between brainstorming and PRD

**`scripts/auto-compound.sh`** — replace Step 2.5 (prior-art search) with full research phase:
```bash
# Step 2.5: Research phase (replaces prior-art search)
log_step "Running research phase..."
# Call claude -p with the research skill prompt.
# This is expected to produce tasks/research-<slug>.json.
# Assumption: $slug was set earlier in the script.
research_json="tasks/research-${slug}.json"
# Fail closed if the artifact is missing; otherwise the jq check
# below would find nothing and pass silently.
if [[ ! -s "$research_json" ]]; then
  log_error "Research artifact missing: $research_json"
  exit 1
fi
# Check for blocking design changes
if jq -e '.design_changes[] | select(.blocking == true)' "$research_json" >/dev/null 2>&1; then
  log_error "Blocking design changes found — review before proceeding"
  exit 1
fi
```

**`scripts/lib/run-plan-context.sh`** — add research warnings to per-batch context:
```bash
# After failure patterns, before context_refs:
local research_file
research_file=$(find "$worktree/tasks/" -name "research-*.json" -print -quit 2>/dev/null)
if [[ -f "$research_file" ]]; then
  local warnings
  warnings=$(jq -r '.latent_issues[] | select(.severity == "critical" or .severity == "high") | "⚠ \(.file):\(.line) — \(.description)"' "$research_file" 2>/dev/null || true)
  if [[ -n "$warnings" ]]; then
    context+="### Research Warnings (fix before touching these files)"$'\n'
    context+="$warnings"$'\n\n'
  fi
fi
```

### 4.4 Research Gate

New file: `scripts/research-gate.sh`

Runs before PRD generation. Checks `tasks/research-<slug>.json` for blocking items:
- Blocking design changes → exit 1 (blocks PRD generation)
- Critical latent issues → exit 1 (must be acknowledged)
- Non-blocking items → exit 0 (warnings only)

Same enforcement pattern as quality gates — machine-verifiable, exit-code-driven.

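A minimal sketch of what such a gate might look like, assuming the JSON schema from Section 3.4 and assuming that any critical latent issue blocks until it is resolved (the acknowledgement mechanism is not specified here). The function name `research_gate` is hypothetical, and the real `scripts/research-gate.sh` may differ.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the research gate logic.
# research_gate FILE -> returns 0 if nothing blocks PRD generation, 1 otherwise.
research_gate() {
  local file="$1"
  if [ ! -s "$file" ]; then
    echo "research gate: missing or empty $file" >&2
    return 1
  fi
  # Count blocking design changes and critical latent issues.
  # `// []` treats an absent array as empty; invalid JSON fails the gate.
  local blocking critical
  blocking=$(jq '[(.design_changes // [])[] | select(.blocking == true)] | length' "$file") || return 1
  critical=$(jq '[(.latent_issues // [])[] | select(.severity == "critical")] | length' "$file") || return 1
  if [ "$blocking" -gt 0 ] || [ "$critical" -gt 0 ]; then
    echo "research gate: $blocking blocking design change(s), $critical critical issue(s)" >&2
    return 1
  fi
  return 0
}
```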
---

## 5. The "Always Make a File" Principle

**Rule:** Every research activity produces a file. No ephemeral research.

This principle applies across the pipeline:

| Activity | File Produced | Format |
|----------|--------------|--------|
| Brainstorming exploration | `docs/plans/YYYY-MM-DD-<topic>-design.md` | Already exists |
| Research phase | `tasks/research-<slug>.md` + `.json` | New |
| PRD generation | `tasks/prd.json` + `tasks/prd-<feature>.md` | Already exists |
| Plan writing | `docs/plans/YYYY-MM-DD-<feature>.md` | Already exists |
| Per-batch execution | `.run-plan-state.json` + `progress.txt` | Already exists |
| MAB judge verdicts | `logs/mab-run-<ts>.json` | Already exists |
| Verification | Inline (PRD criteria results) | Could produce `tasks/verification-<slug>.md` |

**Why files, not memory:** Files survive context resets. A research finding discovered in one session and written to a file is available to every future session. A finding that lives only in conversation context dies when the session ends.

**Implementation:** The research skill's checklist Step 8 ("Synthesize into tasks/research-<slug>.md + .json") makes file creation mandatory, not optional. The research gate (Section 4.4) makes the file's existence a prerequisite for PRD generation.

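The existence prerequisite could be as small as the following sketch. `require_research_artifacts` is a hypothetical helper, and taking the tasks directory as an argument is an assumption made so the function can be pointed at a worktree.

```bash
#!/usr/bin/env bash
# Hypothetical helper: both research artifacts must exist and be non-empty
# before PRD generation starts.
# require_research_artifacts TASKS_DIR SLUG -> 0 if both files are present, 1 otherwise.
require_research_artifacts() {
  local dir="$1" slug="$2" ext missing=0
  for ext in md json; do
    if [ ! -s "$dir/research-$slug.$ext" ]; then
      echo "missing research artifact: $dir/research-$slug.$ext" >&2
      missing=1
    fi
  done
  return "$missing"
}
```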
---

## 6. Revised Full Pipeline

```
USER INPUT (feature description, report, or issue)
    │
    ▼
Stage 0: INITIALIZE
    │  Detect project, load CLAUDE.md, check Telegram, init progress.txt
    │  If input is report: analyze-report.sh → analysis.json
    │
    ├── Multi-feature scope detected?
    │   │
    │   ▼ Yes
    │   Stage 0.5: ROADMAP
    │     Invoke skills/roadmap
    │     Produce: docs/roadmap-<theme>.md
    │     Gate: user approves roadmap
    │     Loop: for each feature in roadmap order ────────┐
    │                                                     │
    ▼                                                     │
Stage 1: BRAINSTORM                                       │
    │  Invoke brainstorming skill                         │
    │  Produce: docs/plans/YYYY-MM-DD-<topic>-design.md   │
    │  Gate: user approves design                         │
    │                                                     │
    ▼                                                     │
Stage 1.5: RESEARCH [NEW]                                 │
    │  Invoke research skill (parallel agents)            │
    │  Produce: tasks/research-<slug>.md + .json          │
    │  Gate: research-gate.sh (no blocking items)         │
    │  Feedback: blocking changes → revise design         │
    │                                                     │
    ▼                                                     │
Stage 2: PRD                                              │
    │  /create-prd (reads research JSON for scoping)      │
    │  Produce: tasks/prd.json + tasks/prd-<feature>.md   │
    │  Gate: user approves                                │
    │                                                     │
    ▼                                                     │
Stage 3: PLAN                                             │
    │  writing-plans (references research report)         │
    │  Produce: docs/plans/YYYY-MM-DD-<feature>.md        │
    │  Gate: user chooses execution mode                  │
    │                                                     │
    ▼                                                     │
Stage 3.5: ISOLATE                                        │
    │  using-git-worktrees                                │
    │  Produce: .worktrees/<branch>/                      │
    │  Gate: baseline tests pass                          │
    │                                                     │
    ▼                                                     │
Stage 4: EXECUTE                                          │
    │  One of: subagent / executing-plans / headless /    │
    │          ralph-loop / MAB                           │
    │  Per-batch: quality gate + research warnings        │
    │  Produce: committed code, progress.txt updates      │
    │                                                     │
    ▼                                                     │
Stage 5: VERIFY                                           │
    │  verification-before-completion                     │
    │  ALL PRD criteria pass (shell commands)             │
    │  Lesson scanner on changed files                    │
    │                                                     │
    ▼                                                     │
Stage 6: FINISH                                           │
    │  finishing-a-development-branch                     │
    │  Merge / PR / Keep / Discard                        │
    │  ───────────────────────── Loop back for next ──────┘
    │                            feature in roadmap
    ▼
DONE
```

---

## 7. Effort Estimate

| Component | Files | New/Modify | Effort |
|-----------|-------|-----------|--------|
| Research skill | `skills/research/SKILL.md` | New | 1 task |
| Roadmap skill | `skills/roadmap/SKILL.md` | New | 1 task |
| Research gate | `scripts/research-gate.sh` | New | 1 task |
| Autocode skill update | `skills/autocode/SKILL.md` | Modify | 1 task |
| Code factory command update | `commands/code-factory.md` | Modify | 1 task |
| create-prd command update | `commands/create-prd.md` | Modify | 1 task |
| Context injection update | `scripts/lib/run-plan-context.sh` | Modify | 1 task |
| auto-compound.sh update | `scripts/auto-compound.sh` | Modify | 1 task |
| Code Factory docs | `docs/CODE-FACTORY.md` | New | 1 task |
| ARCHITECTURE.md update | `docs/ARCHITECTURE.md` | Modify | 1 task |
| Tests | `scripts/tests/test_research_gate.sh` | New | 1 task |
| **Total** | **11 files** | **5 new, 6 modify** | **~2 batches** |

---

## 8. Sources

### AI Agent Architecture
- [SWE-bench Technical Report — Cognition](https://cognition.ai/blog/swe-bench-technical-report)
- [SWE-grep: RL for Fast Context Retrieval — Cognition](https://cognition.ai/blog/swe-grep)
- [Devin 2.0 Planning Mode — Cognition](https://cognition.ai/blog/devin-2)
- [Lost in the Middle — Stanford, arXiv 2307.03172](https://arxiv.org/abs/2307.03172)
- [Effective Context Engineering — Anthropic Engineering](https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents)
- [Context Engineering for Agents — LangChain Blog](https://blog.langchain.com/context-engineering-for-agents/)
- [RAG Review 2025 — RAGFlow](https://ragflow.io/blog/rag-review-2025-from-rag-to-context)
- [CodeRAG-Bench — arXiv 2406.14497](https://arxiv.org/html/2406.14497v1)
- [RACG Survey — arXiv 2510.04905](https://arxiv.org/abs/2510.04905)
- [A-RAG Hierarchical Retrieval — arXiv 2602.03442](https://arxiv.org/html/2602.03442v1)
- [Building Effective AI Agents — Anthropic](https://www.anthropic.com/research/building-effective-agents)
- [Code Generation with LLM Agents Survey — arXiv 2508.00083](https://arxiv.org/html/2508.00083v1)

### Deep Research Agent Pipelines
- [GPT Researcher — GitHub](https://github.com/assafelovic/gpt-researcher)
- [GPT Researcher Architecture — DeepWiki](https://deepwiki.com/assafelovic/gpt-researcher)
- [DeepResearchAgent — SkyworkAI](https://github.com/SkyworkAI/DeepResearchAgent)
- [Deep Research API — OpenAI Cookbook](https://cookbook.openai.com/examples/deep_research_api/introduction_to_deep_research_api_agents)
- [Deep Research Agents Examination — arXiv 2506.18096](https://arxiv.org/html/2506.18096v2)

### Cross-Domain Frameworks
- [Cochrane PICO](https://www.cochranelibrary.com/about-pico)
- [otto-SR: Automated Systematic Reviews](https://ottosr.com/manuscript.pdf)
- [ASReview — Nature Machine Intelligence](https://www.nature.com/articles/s42256-020-00287-7)
- [Double Diamond — British Design Council / Maze](https://maze.co/blog/double-diamond-design-process/)
- [Intelligence Preparation of the Battlefield — Army ATP 2-01.3](https://armypubs.army.mil/epubs/DR_pubs/DR_a/ARN36709-ATP_2-01.3-001-WEB-2.pdf)
- [Analysis of Competing Hypotheses — CIA](https://www.cia.gov/static/955180a45afe3f5013772c313b16face/Tradecraft-Primer-apr09.pdf)
- [Technical Spikes in Agile — Talent500](https://talent500.com/blog/spike-in-agile-purpose-process-best-practices/)

### Codebase (Internal)
- `skills/autocode/competitive-mode.md` — pre-flight exploration pattern (codebase + external agents)
- `scripts/prior-art-search.sh` — existing prior-art search (GitHub + local + ast-grep)
- `scripts/auto-compound.sh` — automated pipeline with Step 2.5 prior-art search
- `docs/plans/2026-02-21-code-factory-v2-design.md` — V2 design with prior-art search as Task 3.2
- `docs/plans/2026-02-21-code-factory-v2-phase4-design.md` — ast-grep discovery mode
- `docs/plans/2026-02-13-ha-intelligence-research-findings.md` — example of structured research (4 parallel agents, 100+ papers)
- `docs/plans/2026-02-21-infrastructure-deep-research.md` — example of structured research (5 parallel agents)
- `docs/plans/2026-02-21-mab-research-report.md` — MAB Round 1 research (this session)
- `docs/plans/2026-02-22-mab-research-round2.md` — MAB Round 2 research (this session)