autonomous-coding-toolkit 1.0.0
- package/.claude-plugin/marketplace.json +22 -0
- package/.claude-plugin/plugin.json +13 -0
- package/LICENSE +21 -0
- package/Makefile +21 -0
- package/README.md +140 -0
- package/SECURITY.md +28 -0
- package/agents/bash-expert.md +113 -0
- package/agents/dependency-auditor.md +138 -0
- package/agents/integration-tester.md +120 -0
- package/agents/lesson-scanner.md +149 -0
- package/agents/python-expert.md +179 -0
- package/agents/service-monitor.md +141 -0
- package/agents/shell-expert.md +147 -0
- package/benchmarks/runner.sh +147 -0
- package/benchmarks/tasks/01-rest-endpoint/rubric.sh +29 -0
- package/benchmarks/tasks/01-rest-endpoint/task.md +17 -0
- package/benchmarks/tasks/02-refactor-module/task.md +8 -0
- package/benchmarks/tasks/03-fix-integration-bug/task.md +8 -0
- package/benchmarks/tasks/04-add-test-coverage/task.md +8 -0
- package/benchmarks/tasks/05-multi-file-feature/task.md +8 -0
- package/bin/act.js +238 -0
- package/commands/autocode.md +6 -0
- package/commands/cancel-ralph.md +18 -0
- package/commands/code-factory.md +53 -0
- package/commands/create-prd.md +55 -0
- package/commands/ralph-loop.md +18 -0
- package/commands/run-plan.md +117 -0
- package/commands/submit-lesson.md +122 -0
- package/docs/ARCHITECTURE.md +630 -0
- package/docs/CONTRIBUTING.md +125 -0
- package/docs/lessons/0001-bare-exception-swallowing.md +34 -0
- package/docs/lessons/0002-async-def-without-await.md +28 -0
- package/docs/lessons/0003-create-task-without-callback.md +28 -0
- package/docs/lessons/0004-hardcoded-test-counts.md +28 -0
- package/docs/lessons/0005-sqlite-without-closing.md +33 -0
- package/docs/lessons/0006-venv-pip-path.md +27 -0
- package/docs/lessons/0007-runner-state-self-rejection.md +35 -0
- package/docs/lessons/0008-quality-gate-blind-spot.md +33 -0
- package/docs/lessons/0009-parser-overcount-empty-batches.md +36 -0
- package/docs/lessons/0010-local-outside-function-bash.md +33 -0
- package/docs/lessons/0011-batch-tests-for-unimplemented-code.md +36 -0
- package/docs/lessons/0012-api-markdown-unescaped-chars.md +33 -0
- package/docs/lessons/0013-export-prefix-env-parsing.md +33 -0
- package/docs/lessons/0014-decorator-registry-import-side-effect.md +43 -0
- package/docs/lessons/0015-frontend-backend-schema-drift.md +43 -0
- package/docs/lessons/0016-event-driven-cold-start-seeding.md +44 -0
- package/docs/lessons/0017-copy-paste-logic-diverges.md +43 -0
- package/docs/lessons/0018-layer-passes-pipeline-broken.md +45 -0
- package/docs/lessons/0019-systemd-envfile-ignores-export.md +41 -0
- package/docs/lessons/0020-persist-state-incrementally.md +44 -0
- package/docs/lessons/0021-dual-axis-testing.md +48 -0
- package/docs/lessons/0022-jsx-factory-shadowing.md +43 -0
- package/docs/lessons/0023-static-analysis-spiral.md +51 -0
- package/docs/lessons/0024-shared-pipeline-implementation.md +55 -0
- package/docs/lessons/0025-defense-in-depth-all-entry-points.md +65 -0
- package/docs/lessons/0026-linter-no-rules-false-enforcement.md +54 -0
- package/docs/lessons/0027-jsx-silent-prop-drop.md +64 -0
- package/docs/lessons/0028-no-infrastructure-in-client-code.md +49 -0
- package/docs/lessons/0029-never-write-secrets-to-files.md +61 -0
- package/docs/lessons/0030-cache-merge-not-replace.md +62 -0
- package/docs/lessons/0031-verify-units-at-boundaries.md +66 -0
- package/docs/lessons/0032-module-lifecycle-subscribe-unsubscribe.md +89 -0
- package/docs/lessons/0033-async-iteration-mutable-snapshot.md +72 -0
- package/docs/lessons/0034-caller-missing-await-silent-discard.md +65 -0
- package/docs/lessons/0035-duplicate-registration-silent-overwrite.md +85 -0
- package/docs/lessons/0036-websocket-dirty-disconnect.md +33 -0
- package/docs/lessons/0037-parallel-agents-worktree-corruption.md +31 -0
- package/docs/lessons/0038-subscribe-no-stored-ref.md +36 -0
- package/docs/lessons/0039-fallback-or-default-hides-bugs.md +34 -0
- package/docs/lessons/0040-event-firehose-filter-first.md +36 -0
- package/docs/lessons/0041-ambiguous-base-dir-path-nesting.md +32 -0
- package/docs/lessons/0042-spec-compliance-insufficient.md +36 -0
- package/docs/lessons/0043-exact-count-extensible-collections.md +32 -0
- package/docs/lessons/0044-relative-file-deps-worktree.md +39 -0
- package/docs/lessons/0045-iterative-design-improvement.md +33 -0
- package/docs/lessons/0046-plan-assertion-math-bugs.md +38 -0
- package/docs/lessons/0047-pytest-single-threaded-default.md +37 -0
- package/docs/lessons/0048-integration-wiring-batch.md +40 -0
- package/docs/lessons/0049-ab-verification.md +41 -0
- package/docs/lessons/0050-editing-sourced-files-during-execution.md +33 -0
- package/docs/lessons/0051-infrastructure-fixes-cant-self-heal.md +30 -0
- package/docs/lessons/0052-uncommitted-changes-poison-quality-gates.md +31 -0
- package/docs/lessons/0053-jq-compact-flag-inconsistency.md +31 -0
- package/docs/lessons/0054-parser-matches-inside-code-blocks.md +30 -0
- package/docs/lessons/0055-agents-compensate-for-garbled-prompts.md +31 -0
- package/docs/lessons/0056-grep-count-exit-code-on-zero.md +42 -0
- package/docs/lessons/0057-new-artifacts-break-git-clean-gates.md +42 -0
- package/docs/lessons/0058-dead-config-keys-never-consumed.md +49 -0
- package/docs/lessons/0059-contract-test-shared-structures.md +53 -0
- package/docs/lessons/0060-set-e-silent-death-in-runners.md +53 -0
- package/docs/lessons/0061-context-injection-dirty-state.md +50 -0
- package/docs/lessons/0062-sibling-bug-neighborhood-scan.md +29 -0
- package/docs/lessons/0063-one-flag-two-lifetimes.md +31 -0
- package/docs/lessons/0064-test-passes-wrong-reason.md +31 -0
- package/docs/lessons/0065-pipefail-grep-count-double-output.md +39 -0
- package/docs/lessons/0066-local-keyword-outside-function.md +37 -0
- package/docs/lessons/0067-stdin-hang-non-interactive-shell.md +36 -0
- package/docs/lessons/0068-agent-builds-wrong-thing-correctly.md +31 -0
- package/docs/lessons/0069-plan-quality-dominates-execution.md +30 -0
- package/docs/lessons/0070-spec-echo-back-prevents-drift.md +31 -0
- package/docs/lessons/0071-positive-instructions-outperform-negative.md +30 -0
- package/docs/lessons/0072-lost-in-the-middle-context-placement.md +30 -0
- package/docs/lessons/0073-unscoped-lessons-cause-false-positives.md +30 -0
- package/docs/lessons/0074-stale-context-injection-wrong-batch.md +32 -0
- package/docs/lessons/0075-research-artifacts-must-persist.md +32 -0
- package/docs/lessons/0076-wrong-decomposition-contaminates-downstream.md +30 -0
- package/docs/lessons/0077-cherry-pick-merges-need-manual-resolution.md +30 -0
- package/docs/lessons/0078-static-review-without-live-test.md +30 -0
- package/docs/lessons/0079-integration-wiring-batch-required.md +32 -0
- package/docs/lessons/FRAMEWORK.md +161 -0
- package/docs/lessons/SUMMARY.md +201 -0
- package/docs/lessons/TEMPLATE.md +85 -0
- package/docs/plans/2026-02-21-code-factory-v2-design.md +204 -0
- package/docs/plans/2026-02-21-code-factory-v2-implementation-plan.md +2189 -0
- package/docs/plans/2026-02-21-code-factory-v2-phase4-design.md +537 -0
- package/docs/plans/2026-02-21-code-factory-v2-phase4-implementation-plan.md +2012 -0
- package/docs/plans/2026-02-21-hardening-pass-design.md +108 -0
- package/docs/plans/2026-02-21-hardening-pass-plan.md +1378 -0
- package/docs/plans/2026-02-21-mab-research-report.md +406 -0
- package/docs/plans/2026-02-21-marketplace-restructure-design.md +240 -0
- package/docs/plans/2026-02-21-marketplace-restructure-plan.md +832 -0
- package/docs/plans/2026-02-21-phase4-completion-plan.md +697 -0
- package/docs/plans/2026-02-21-validator-suite-design.md +148 -0
- package/docs/plans/2026-02-21-validator-suite-plan.md +540 -0
- package/docs/plans/2026-02-22-mab-research-round2.md +556 -0
- package/docs/plans/2026-02-22-mab-run-design.md +462 -0
- package/docs/plans/2026-02-22-mab-run-plan.md +2046 -0
- package/docs/plans/2026-02-22-operations-design-methodology-research.md +681 -0
- package/docs/plans/2026-02-22-research-agent-failure-taxonomy.md +532 -0
- package/docs/plans/2026-02-22-research-code-guideline-policies.md +886 -0
- package/docs/plans/2026-02-22-research-codebase-audit-refactoring.md +908 -0
- package/docs/plans/2026-02-22-research-coding-standards-documentation.md +541 -0
- package/docs/plans/2026-02-22-research-competitive-landscape.md +687 -0
- package/docs/plans/2026-02-22-research-comprehensive-testing.md +1076 -0
- package/docs/plans/2026-02-22-research-context-utilization.md +459 -0
- package/docs/plans/2026-02-22-research-cost-quality-tradeoff.md +548 -0
- package/docs/plans/2026-02-22-research-lesson-transferability.md +508 -0
- package/docs/plans/2026-02-22-research-multi-agent-coordination.md +312 -0
- package/docs/plans/2026-02-22-research-phase-integration.md +602 -0
- package/docs/plans/2026-02-22-research-plan-quality.md +428 -0
- package/docs/plans/2026-02-22-research-prompt-engineering.md +558 -0
- package/docs/plans/2026-02-22-research-unconventional-perspectives.md +528 -0
- package/docs/plans/2026-02-22-research-user-adoption.md +638 -0
- package/docs/plans/2026-02-22-research-verification-effectiveness.md +433 -0
- package/docs/plans/2026-02-23-agent-suite-design.md +299 -0
- package/docs/plans/2026-02-23-agent-suite-plan.md +578 -0
- package/docs/plans/2026-02-23-phase3-cost-infrastructure-design.md +148 -0
- package/docs/plans/2026-02-23-phase3-cost-infrastructure-plan.md +1062 -0
- package/docs/plans/2026-02-23-research-bash-expert-agent.md +543 -0
- package/docs/plans/2026-02-23-research-dependency-auditor-agent.md +564 -0
- package/docs/plans/2026-02-23-research-improving-existing-agents.md +503 -0
- package/docs/plans/2026-02-23-research-integration-tester-agent.md +454 -0
- package/docs/plans/2026-02-23-research-python-expert-agent.md +429 -0
- package/docs/plans/2026-02-23-research-service-monitor-agent.md +425 -0
- package/docs/plans/2026-02-23-research-shell-expert-agent.md +533 -0
- package/docs/plans/2026-02-23-roadmap-to-completion.md +530 -0
- package/docs/plans/2026-02-24-headless-module-split-design.md +98 -0
- package/docs/plans/2026-02-24-headless-module-split.md +443 -0
- package/docs/plans/2026-02-24-lesson-scope-metadata-design.md +228 -0
- package/docs/plans/2026-02-24-lesson-scope-metadata-plan.md +968 -0
- package/docs/plans/2026-02-24-npm-packaging-design.md +841 -0
- package/docs/plans/2026-02-24-npm-packaging-plan.md +1965 -0
- package/docs/plans/audit-findings.md +186 -0
- package/docs/telegram-notification-format.md +98 -0
- package/examples/example-plan.md +51 -0
- package/examples/example-prd.json +72 -0
- package/examples/example-roadmap.md +33 -0
- package/examples/quickstart-plan.md +63 -0
- package/hooks/hooks.json +26 -0
- package/hooks/setup-symlinks.sh +48 -0
- package/hooks/stop-hook.sh +135 -0
- package/package.json +47 -0
- package/policies/bash.md +71 -0
- package/policies/python.md +71 -0
- package/policies/testing.md +61 -0
- package/policies/universal.md +60 -0
- package/scripts/analyze-report.sh +97 -0
- package/scripts/architecture-map.sh +145 -0
- package/scripts/auto-compound.sh +273 -0
- package/scripts/batch-audit.sh +42 -0
- package/scripts/batch-test.sh +101 -0
- package/scripts/entropy-audit.sh +221 -0
- package/scripts/failure-digest.sh +51 -0
- package/scripts/generate-ast-rules.sh +96 -0
- package/scripts/init.sh +112 -0
- package/scripts/lesson-check.sh +428 -0
- package/scripts/lib/common.sh +61 -0
- package/scripts/lib/cost-tracking.sh +153 -0
- package/scripts/lib/ollama.sh +60 -0
- package/scripts/lib/progress-writer.sh +128 -0
- package/scripts/lib/run-plan-context.sh +215 -0
- package/scripts/lib/run-plan-echo-back.sh +231 -0
- package/scripts/lib/run-plan-headless.sh +396 -0
- package/scripts/lib/run-plan-notify.sh +57 -0
- package/scripts/lib/run-plan-parser.sh +81 -0
- package/scripts/lib/run-plan-prompt.sh +215 -0
- package/scripts/lib/run-plan-quality-gate.sh +132 -0
- package/scripts/lib/run-plan-routing.sh +315 -0
- package/scripts/lib/run-plan-sampling.sh +170 -0
- package/scripts/lib/run-plan-scoring.sh +146 -0
- package/scripts/lib/run-plan-state.sh +142 -0
- package/scripts/lib/run-plan-team.sh +199 -0
- package/scripts/lib/telegram.sh +54 -0
- package/scripts/lib/thompson-sampling.sh +176 -0
- package/scripts/license-check.sh +74 -0
- package/scripts/mab-run.sh +575 -0
- package/scripts/module-size-check.sh +146 -0
- package/scripts/patterns/async-no-await.yml +5 -0
- package/scripts/patterns/bare-except.yml +6 -0
- package/scripts/patterns/empty-catch.yml +6 -0
- package/scripts/patterns/hardcoded-localhost.yml +9 -0
- package/scripts/patterns/retry-loop-no-backoff.yml +12 -0
- package/scripts/pipeline-status.sh +197 -0
- package/scripts/policy-check.sh +226 -0
- package/scripts/prior-art-search.sh +133 -0
- package/scripts/promote-mab-lessons.sh +126 -0
- package/scripts/prompts/agent-a-superpowers.md +29 -0
- package/scripts/prompts/agent-b-ralph.md +29 -0
- package/scripts/prompts/judge-agent.md +61 -0
- package/scripts/prompts/planner-agent.md +44 -0
- package/scripts/pull-community-lessons.sh +90 -0
- package/scripts/quality-gate.sh +266 -0
- package/scripts/research-gate.sh +90 -0
- package/scripts/run-plan.sh +329 -0
- package/scripts/scope-infer.sh +159 -0
- package/scripts/setup-ralph-loop.sh +155 -0
- package/scripts/telemetry.sh +230 -0
- package/scripts/tests/run-all-tests.sh +52 -0
- package/scripts/tests/test-act-cli.sh +46 -0
- package/scripts/tests/test-agents-md.sh +87 -0
- package/scripts/tests/test-analyze-report.sh +114 -0
- package/scripts/tests/test-architecture-map.sh +89 -0
- package/scripts/tests/test-auto-compound.sh +169 -0
- package/scripts/tests/test-batch-test.sh +65 -0
- package/scripts/tests/test-benchmark-runner.sh +25 -0
- package/scripts/tests/test-common.sh +168 -0
- package/scripts/tests/test-cost-tracking.sh +158 -0
- package/scripts/tests/test-echo-back.sh +180 -0
- package/scripts/tests/test-entropy-audit.sh +146 -0
- package/scripts/tests/test-failure-digest.sh +66 -0
- package/scripts/tests/test-generate-ast-rules.sh +145 -0
- package/scripts/tests/test-helpers.sh +82 -0
- package/scripts/tests/test-init.sh +47 -0
- package/scripts/tests/test-lesson-check.sh +278 -0
- package/scripts/tests/test-lesson-local.sh +55 -0
- package/scripts/tests/test-license-check.sh +109 -0
- package/scripts/tests/test-mab-run.sh +182 -0
- package/scripts/tests/test-ollama-lib.sh +49 -0
- package/scripts/tests/test-ollama.sh +60 -0
- package/scripts/tests/test-pipeline-status.sh +198 -0
- package/scripts/tests/test-policy-check.sh +124 -0
- package/scripts/tests/test-prior-art-search.sh +96 -0
- package/scripts/tests/test-progress-writer.sh +140 -0
- package/scripts/tests/test-promote-mab-lessons.sh +110 -0
- package/scripts/tests/test-pull-community-lessons.sh +149 -0
- package/scripts/tests/test-quality-gate.sh +241 -0
- package/scripts/tests/test-research-gate.sh +132 -0
- package/scripts/tests/test-run-plan-cli.sh +86 -0
- package/scripts/tests/test-run-plan-context.sh +305 -0
- package/scripts/tests/test-run-plan-e2e.sh +153 -0
- package/scripts/tests/test-run-plan-headless.sh +424 -0
- package/scripts/tests/test-run-plan-notify.sh +124 -0
- package/scripts/tests/test-run-plan-parser.sh +217 -0
- package/scripts/tests/test-run-plan-prompt.sh +254 -0
- package/scripts/tests/test-run-plan-quality-gate.sh +222 -0
- package/scripts/tests/test-run-plan-routing.sh +178 -0
- package/scripts/tests/test-run-plan-scoring.sh +148 -0
- package/scripts/tests/test-run-plan-state.sh +261 -0
- package/scripts/tests/test-run-plan-team.sh +157 -0
- package/scripts/tests/test-scope-infer.sh +150 -0
- package/scripts/tests/test-setup-ralph-loop.sh +63 -0
- package/scripts/tests/test-telegram-env.sh +38 -0
- package/scripts/tests/test-telegram.sh +121 -0
- package/scripts/tests/test-telemetry.sh +46 -0
- package/scripts/tests/test-thompson-sampling.sh +139 -0
- package/scripts/tests/test-validate-all.sh +60 -0
- package/scripts/tests/test-validate-commands.sh +89 -0
- package/scripts/tests/test-validate-hooks.sh +98 -0
- package/scripts/tests/test-validate-lessons.sh +150 -0
- package/scripts/tests/test-validate-plan-quality.sh +235 -0
- package/scripts/tests/test-validate-plans.sh +187 -0
- package/scripts/tests/test-validate-plugin.sh +106 -0
- package/scripts/tests/test-validate-prd.sh +184 -0
- package/scripts/tests/test-validate-skills.sh +134 -0
- package/scripts/validate-all.sh +57 -0
- package/scripts/validate-commands.sh +67 -0
- package/scripts/validate-hooks.sh +89 -0
- package/scripts/validate-lessons.sh +98 -0
- package/scripts/validate-plan-quality.sh +369 -0
- package/scripts/validate-plans.sh +120 -0
- package/scripts/validate-plugin.sh +86 -0
- package/scripts/validate-policies.sh +42 -0
- package/scripts/validate-prd.sh +118 -0
- package/scripts/validate-skills.sh +96 -0
- package/skills/autocode/SKILL.md +285 -0
- package/skills/autocode/ab-verification.md +51 -0
- package/skills/autocode/code-quality-standards.md +37 -0
- package/skills/autocode/competitive-mode.md +364 -0
- package/skills/brainstorming/SKILL.md +97 -0
- package/skills/capture-lesson/SKILL.md +187 -0
- package/skills/check-lessons/SKILL.md +116 -0
- package/skills/dispatching-parallel-agents/SKILL.md +110 -0
- package/skills/executing-plans/SKILL.md +85 -0
- package/skills/finishing-a-development-branch/SKILL.md +201 -0
- package/skills/receiving-code-review/SKILL.md +72 -0
- package/skills/requesting-code-review/SKILL.md +59 -0
- package/skills/requesting-code-review/code-reviewer.md +82 -0
- package/skills/research/SKILL.md +145 -0
- package/skills/roadmap/SKILL.md +115 -0
- package/skills/subagent-driven-development/SKILL.md +98 -0
- package/skills/subagent-driven-development/code-quality-reviewer-prompt.md +18 -0
- package/skills/subagent-driven-development/implementer-prompt.md +73 -0
- package/skills/subagent-driven-development/spec-reviewer-prompt.md +57 -0
- package/skills/systematic-debugging/SKILL.md +134 -0
- package/skills/systematic-debugging/condition-based-waiting.md +64 -0
- package/skills/systematic-debugging/defense-in-depth.md +32 -0
- package/skills/systematic-debugging/root-cause-tracing.md +55 -0
- package/skills/test-driven-development/SKILL.md +167 -0
- package/skills/using-git-worktrees/SKILL.md +219 -0
- package/skills/using-superpowers/SKILL.md +54 -0
- package/skills/verification-before-completion/SKILL.md +140 -0
- package/skills/verify/SKILL.md +82 -0
- package/skills/writing-plans/SKILL.md +128 -0
- package/skills/writing-skills/SKILL.md +93 -0
@@ -0,0 +1,462 @@
# Multi-Armed Bandit System Design

**Date:** 2026-02-22 (updated 2026-02-23)
**Status:** Approved, updated with research findings
**Goal:** Competing autonomous agents (superpowers vs ralph-wiggum) execute the same brief using different methodologies, judged by an LLM that extracts lessons and updates strategy performance data. The toolkit gets smarter with every run, and community contributions compound learning for everyone.

> ## Research-Driven Updates (2026-02-23)
>
> Based on cross-cutting synthesis of 25 research papers, the following changes were made:
>
> 1. **Thompson Sampling replaces LLM planner.** The planner agent (Section "Planner Agent") is now a bash function using Beta distribution sampling, not a separate `claude -p` call. Cheaper, faster, better calibrated. (Source: MAB Research R1, cross-cutting synthesis §F)
>
> 2. **Human calibration for first 10 decisions.** The judge's verdict is presented to the user for approval/override for the first 10 MAB runs. Only after 10 human-validated decisions does automated routing take over. (Source: cross-cutting synthesis §F, "validate against human judgment")
>
> 3. **Selective MAB (~30% of batches).** MAB is not the default mode. It triggers on: integration batches, first-time batch types (insufficient data), and historically flaky batches (>50% retry rate). Single-strategy routing is the default when win rates are clear. (Source: Cost/Quality paper; break-even only if it prevents 1 debugging batch per 2 features)
>
> 4. **Prerequisites added.** Phase 1 (bug fixes, especially #10 state schema) and Phase 3 (cost tracking, prompt caching) must complete before MAB implementation. Without cost data, MAB economics can't be validated. (Source: cross-cutting synthesis §8)
>
> 5. **Plan slimmed from 6 to 4 batches.** Prompts are just files (no code), the planner is now a function (not an agent), and community sync is a simple script. The original plan was over-scoped. (Source: 80% infrastructure reuse finding from MAB R1)
>
> 6. **`{AB_LESSONS}` placeholder bug fixed.** The original plan used `{AB_LESSONS}` in `assemble_prompt()` but the data file is `mab-lessons.json`. Changed to `{MAB_LESSONS}`.
>
> See updated plan: `docs/plans/2026-02-23-roadmap-to-completion.md` Phase 4.
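
As a rough illustration of update 1, Thompson Sampling over the two strategies fits in a few lines. The design calls for a bash function; the Python below is only a sketch of the idea. The name `thompson_pick`, the uniform Beta(1, 1) prior, and the `wins`/`losses` input shape (borrowed from the `logs/strategy-perf.json` example) are assumptions, not the toolkit's actual code.

```python
import random


def thompson_pick(arms, rng=None):
    """Return the strategy whose Beta posterior sample is largest.

    `arms` maps strategy name -> {"wins": int, "losses": int}.
    Assumption: a uniform Beta(1, 1) prior, so each arm is sampled
    from Beta(wins + 1, losses + 1). Returns None when there are no arms.
    """
    if not arms:
        return None
    rng = rng or random.Random()
    samples = {
        name: rng.betavariate(stats["wins"] + 1, stats["losses"] + 1)
        for name, stats in arms.items()
    }
    return max(samples, key=samples.get)


# Example counts in the style of logs/strategy-perf.json ("integration").
integration = {
    "superpowers": {"wins": 9, "losses": 2},
    "ralph": {"wins": 2, "losses": 9},
}

demo_rng = random.Random(0)  # fixed seed only so the demo is repeatable
demo_picks = [thompson_pick(integration, demo_rng) for _ in range(1000)]
print("superpowers share:", demo_picks.count("superpowers") / 1000)
```

Because the weaker arm still wins an occasional draw, it keeps being selected now and then, which is the exploration behavior that a fixed win-rate threshold does not provide.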

## Problem

The toolkit has two execution strategies, structured (superpowers skill chain) and autonomous (ralph-wiggum iteration loop), but no empirical data on which works better for which types of work. Users pick one and hope. The toolkit learns nothing from execution outcomes.

## Design Principles

1. **Thin infrastructure, rich data, LLM intelligence.** Bash scripts create worktrees, run quality gates, merge branches. LLM agents make all decisions (routing, judging, lesson extraction). Data files are the interface between runs.

2. **Both agents are full toolkit citizens.** They inherit all skills, lessons, hooks, quality gates, and CLAUDE.md conventions. The competition is about orchestration strategy, not available tools.

3. **Human input ends at PRD approval.** Brainstorm → design → PRD is human-in-the-loop. Everything after is machine-driven.

4. **Every run produces learning.** MAB lessons, strategy performance data, and failure mode classifications feed back into future runs. Community contributions propagate via git.

## Architecture

```
PHASE 1: HUMAN + SINGLE AGENT (shared)
  1. Brainstorm → approved design doc
  2. PRD → machine-verifiable acceptance criteria
  3. Architecture map generated

PHASE 2: PLANNER AGENT (LLM)
  Reads: design doc, PRD, architecture map, strategy-perf.json
  Decides per work unit: MAB or single? Which strategy? Unit size?

PHASE 3: MAB EXECUTION (parallel worktrees)
  Agent A (superpowers): writes own plan, TDD, batch-by-batch
  Agent B (ralph): iterates until PRD criteria pass

PHASE 4: JUDGE AGENT (LLM)
  Reads: both diffs, design doc, PRD, architecture map, lesson history
  Outputs: winner, bidirectional lessons, strategy update, failure mode

PHASE 5: MERGE + LEARN
  Merge winner, log lessons, update strategy data, promote patterns
```

## Shared Inputs (Human-Approved)

All created before MAB execution begins. Human approves design and PRD.

| Artifact | Source | Purpose |
|----------|--------|---------|
| Design doc | Brainstorming skill | What to build and why |
| `tasks/prd.json` | `/create-prd` | Machine-verifiable acceptance criteria |
| `docs/ARCHITECTURE-MAP.json` | `architecture-map.sh` (auto-generated) | Module graph, imports/exports, dependencies |

## Agent Configuration

Both agents receive identical toolkit context. The only difference is the lead instruction.

### What both agents always get

```
1. CLAUDE.md chain
   ~/.claude/CLAUDE.md (global)
   ~/Documents/CLAUDE.md (workspace)
   <project>/CLAUDE.md (project)

2. Toolkit enforcement
   hooks/hooks.json (hookify rules)
   quality-gate.sh (lesson-check + tests + memory)
   docs/lessons/*.md (all lessons)

3. Shared brief
   Design doc
   tasks/prd.json
   docs/ARCHITECTURE-MAP.json
   logs/mab-lessons.json (previous MAB learnings)

4. All skills available via Skill tool
```

### Agent A (superpowers-led)

```
Lead instruction:
  "Execute this using the superpowers skill chain.
   Write your own implementation plan first.
   Follow TDD: write test → verify fail → implement → verify pass → commit.
   Run quality gates between logical batches."

Natural behavior:
  → writing-plans (creates its own plan from the design doc)
  → executing-plans (batch-by-batch)
  → test-driven-development (per task)
  → verification-before-completion (at end)
```

### Agent B (ralph-led)

```
Lead instruction:
  "Execute this using the ralph-loop approach.
   All PRD acceptance criteria in tasks/prd.json must pass (exit 0).
   Iterate until done. Use any toolkit skills as needed."

Natural behavior:
  → Reads PRD criteria
  → Starts coding toward acceptance criteria
  → Uses TDD, debugging, etc. as needed (not mandated order)
  → Stop-hook checks criteria each cycle
  → Done when all criteria pass
```

## Worktree Isolation

Each MAB run creates two git worktrees branched from HEAD.

```bash
# Create worktrees (N is the batch number)
git worktree add .claude/worktrees/mab-a-batch-N -b mab-a-batch-N HEAD
git worktree add .claude/worktrees/mab-b-batch-N -b mab-b-batch-N HEAD

# After judge picks winner (say A):
git merge mab-a-batch-N

# Cleanup
git worktree remove .claude/worktrees/mab-a-batch-N
git worktree remove .claude/worktrees/mab-b-batch-N
git branch -d mab-a-batch-N   # merged winner: safe delete
git branch -D mab-b-batch-N   # unmerged loser: -d would refuse, so force-delete
```

Both agents run in parallel. Neither can see the other's work.

## Planner Agent

An LLM agent that decides routing before execution begins. Not a bash script: it reads data files and produces a JSON routing plan.

### Inputs

- Design doc (scope and complexity)
- PRD task graph (dependencies, count)
- `docs/ARCHITECTURE-MAP.json` (cross-module touches)
- `logs/strategy-perf.json` (historical win rates per strategy x batch type)

### Decision Logic

```
For each work unit:
  1. Classify type: new-file, refactoring, integration, test-only
  2. Check strategy-perf.json for this type
  3. If clear winner (>70% win rate, 10+ data points): route to winner
  4. If uncertain or insufficient data: MAB run
  5. If error-prone type (historically high retry rate): MAB run
```
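
The decision steps above can be sketched as a small routing function. This is an illustration under stated assumptions, not the planner's actual logic: `route_unit` and the constant names are hypothetical, the 50% retry cutoff is taken from the research-updates note, and letting the error-prone check run before the clear-winner check is an assumed ordering (the list does not say which rule wins when both apply).

```python
MIN_WIN_RATE = 0.70      # "clear winner" threshold from step 3
MIN_DATA_POINTS = 10     # "10+ data points" from step 3
FLAKY_RETRY_RATE = 0.50  # assumed cutoff for "historically high retry rate"


def route_unit(batch_type, perf, retry_rate=0.0):
    """Return a routing decision dict for one work unit.

    `perf` follows the logs/strategy-perf.json shape:
    {batch_type: {strategy: {"wins": int, "losses": int, "total": int}}}
    """
    # Step 5 (assumed to take precedence): error-prone types get a MAB run.
    if retry_rate > FLAKY_RETRY_RATE:
        return {"decision": "mab_run",
                "reasoning": f"{batch_type}: retry rate {retry_rate:.0%}"}

    arms = perf.get(batch_type, {})
    rates = {name: stats["wins"] / stats["total"]
             for name, stats in arms.items() if stats["total"] > 0}

    # Step 4: no history at all for this batch type.
    if not rates:
        return {"decision": "mab_run", "reasoning": f"{batch_type}: no data"}

    best = max(rates, key=rates.get)
    total = arms[best]["total"]
    summary = f"{batch_type}: {best} wins {rates[best]:.0%}, {total} data points"

    # Step 3: route to a clear winner with enough history behind it.
    if rates[best] > MIN_WIN_RATE and total >= MIN_DATA_POINTS:
        return {"decision": "single", "strategy": best, "reasoning": summary}

    # Step 4: uncertain, so keep collecting head-to-head data.
    return {"decision": "mab_run", "reasoning": summary}


example_perf = {
    "new-file": {"superpowers": {"wins": 12, "losses": 8, "total": 20},
                 "ralph": {"wins": 8, "losses": 12, "total": 20}},
    "integration": {"superpowers": {"wins": 9, "losses": 2, "total": 11},
                    "ralph": {"wins": 2, "losses": 9, "total": 11}},
}
print(route_unit("integration", example_perf))
print(route_unit("new-file", example_perf))
```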

### Output

```json
{
  "routing": [
    {
      "unit": 1,
      "description": "Create test helpers and validators",
      "type": "new-file",
      "decision": "single",
      "strategy": "ralph",
      "reasoning": "new-file: ralph wins 70%, 15 data points"
    },
    {
      "unit": 2,
      "description": "Integration wiring and CI",
      "type": "integration",
      "decision": "mab_run",
      "reasoning": "integration: superpowers 55%, only 8 data points, need more data"
    }
  ]
}
```

### Work Unit Sizing

| Project size | Strategy |
|-------------|----------|
| Small (< 5 PRD tasks) | MAB the whole project |
| Medium (5-15 PRD tasks) | Chunk by PRD dependency groups, route per chunk |
| Large (15+ PRD tasks) | Phase 1: MAB (explore), Phase 2+: route to winners (exploit) |
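
The sizing table reduces to a threshold lookup. The sketch below is illustrative only: `sizing_mode` and its return labels are hypothetical names, and because the table lists 15 tasks under both Medium and Large, treating exactly 15 as Medium is an assumption.

```python
def sizing_mode(prd_task_count):
    """Map a PRD task count to a sizing approach from the table above.

    Assumption: exactly 15 tasks counts as Medium, since the table's
    "5-15" and "15+" ranges overlap at that value.
    """
    if prd_task_count < 5:
        return "mab_whole_project"        # Small: MAB the whole project
    if prd_task_count <= 15:
        return "route_per_chunk"          # Medium: chunk by dependency group
    return "explore_then_exploit"         # Large: MAB first, then route to winners


for count in (3, 9, 22):
    print(count, "->", sizing_mode(count))
```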

## Judge Agent

An LLM agent that evaluates both candidates after execution.

### Inputs

```
1. Full plan context: design doc, PRD, architecture map
2. Both diffs: git diff main...mab-a-batch-N, git diff main...mab-b-batch-N
3. Quality gate results for both
4. All previous MAB lessons: logs/mab-lessons.json
5. Score from automated scoring (test count, diff size, gate pass)
```

### Evaluation Criteria

```
1. WINNER SELECTION
   Which implementation better serves the overall architecture?

2. BIDIRECTIONAL LESSONS
   What did the winner do well that the loser should learn from?
   What did the loser do well that the winner should learn from?

3. FAILURE MODE CLASSIFICATION
   How did the weaker submission fall short?
   Categories: over-engineering, under-testing, code-duplication,
   integration-gap, convention-violation, wrong-abstraction-level

4. TOOLKIT COMPLIANCE
   Did both agents follow CLAUDE.md conventions?
   Did both use TDD (regardless of strategy)?
   Did either trigger hookify blocks?
   Did either skip verification?

5. STRATEGY RECOMMENDATION
   For this work unit type, which strategy should be preferred?
   Confidence level (low/medium/high)?

6. LESSON EXTRACTION
   {
     "pattern": "description of what was learned",
     "context": "when this applies (batch type, project type)",
     "recommendation": "what to do differently",
     "source_strategy": "which agent's behavior this came from",
     "lesson_type": "syntactic|semantic"
   }
```
|
|
253
|
+
|
|
254
|
+
### Output
|
|
255
|
+
|
|
256
|
+
```json
|
|
257
|
+
{
|
|
258
|
+
"winner": "agent_a",
|
|
259
|
+
"confidence": "high",
|
|
260
|
+
"reasoning": "Agent A's implementation separated validation logic into composable functions. Agent B duplicated validation across 3 files.",
|
|
261
|
+
"failure_mode": "code-duplication-under-time-pressure",
|
|
262
|
+
"toolkit_compliance": {
|
|
263
|
+
"agent_a": {"tdd": true, "conventions": true, "hookify_blocks": 0},
|
|
264
|
+
"agent_b": {"tdd": false, "conventions": true, "hookify_blocks": 0}
|
|
265
|
+
},
|
|
266
|
+
"lessons": [
|
|
267
|
+
{
|
|
268
|
+
"pattern": "Extract shared validation patterns before writing per-type validators",
|
|
269
|
+
"context": "new-file batches with 3+ similar validators",
|
|
270
|
+
"recommendation": "Create a shared contract function first, then implement per-type",
|
|
271
|
+
"source_strategy": "agent_a",
|
|
272
|
+
"lesson_type": "semantic"
|
|
273
|
+
}
|
|
274
|
+
],
|
|
275
|
+
"strategy_update": {
|
|
276
|
+
"batch_type": "new-file",
|
|
277
|
+
"winner": "superpowers",
|
|
278
|
+
"confidence": "medium"
|
|
279
|
+
}
|
|
280
|
+
}
|
|
281
|
+
```
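
The judge's verdict is machine-consumed, so it may be worth checking its shape before any data file is updated. The following is a minimal sketch, not the toolkit's implementation: `validate_judge_output` is a hypothetical helper, and the required keys are simply those that appear in the example above.

```python
import json

# Keys taken from the example verdict above; the real schema may differ.
REQUIRED_KEYS = {"winner", "confidence", "reasoning", "failure_mode",
                 "toolkit_compliance", "lessons", "strategy_update"}
CONFIDENCE_LEVELS = {"low", "medium", "high"}
LESSON_TYPES = {"syntactic", "semantic"}


def validate_judge_output(raw: str) -> list[str]:
    """Return a list of problems; an empty list means the verdict looks well-formed."""
    try:
        verdict = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    if not isinstance(verdict, dict):
        return ["top-level value must be an object"]

    problems = [f"missing key: {key}" for key in sorted(REQUIRED_KEYS - verdict.keys())]
    if verdict.get("confidence") not in CONFIDENCE_LEVELS:
        problems.append("confidence must be low, medium, or high")
    for index, lesson in enumerate(verdict.get("lessons", [])):
        if lesson.get("lesson_type") not in LESSON_TYPES:
            problems.append(f"lessons[{index}].lesson_type must be syntactic or semantic")
    return problems
```

A caller could refuse to merge or record anything when the returned list is non-empty.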
|
|
282
|
+
|
|
283
|
+
## Data Files
|
|
284
|
+
|
|
285
|
+
### `logs/mab-lessons.json` — Accumulated MAB Lessons
|
|
286
|
+
|
|
287
|
+
```json
|
|
288
|
+
[
|
|
289
|
+
{
|
|
290
|
+
"timestamp": "2026-02-22T15:30:00Z",
|
|
291
|
+
"project": "autonomous-coding-toolkit",
|
|
292
|
+
"work_unit": "validator-suite",
|
|
293
|
+
"batch_type": "new-file",
|
|
294
|
+
"winner": "agent_a",
|
|
295
|
+
"pattern": "Extract shared validation patterns before per-type validators",
|
|
296
|
+
"context": "new-file batches with 3+ similar validators",
|
|
297
|
+
"recommendation": "Create shared contract function first",
|
|
298
|
+
"failure_mode": "code-duplication-under-time-pressure",
|
|
299
|
+
"occurrences": 1
|
|
300
|
+
}
|
|
301
|
+
]
|
|
302
|
+
```
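
The `occurrences` field suggests that a repeated lesson updates an existing entry rather than appending a duplicate. A minimal sketch of that bookkeeping follows. `record_lesson` is a hypothetical helper, and matching on the exact `pattern` string is an assumption; the design does not say how "same pattern" is detected.

```python
def record_lesson(lessons: list[dict], new_lesson: dict) -> dict:
    """Append new_lesson, or bump `occurrences` if the same pattern is already stored.

    Assumption: two lessons are "the same" when their `pattern` strings are equal.
    """
    for entry in lessons:
        if entry["pattern"] == new_lesson["pattern"]:
            entry["occurrences"] = entry.get("occurrences", 1) + 1
            return entry
    stored = {**new_lesson, "occurrences": 1}
    lessons.append(stored)
    return stored
```

The list passed in would be the parsed contents of `logs/mab-lessons.json`; writing it back to disk is left to the caller.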
|
|
303
|
+
|
|
304
|
+
### `logs/strategy-perf.json` — Strategy Win Rates
|
|
305
|
+
|
|
306
|
+
```json
|
|
307
|
+
{
|
|
308
|
+
"new-file": {
|
|
309
|
+
"superpowers": {"wins": 12, "losses": 8, "total": 20},
|
|
310
|
+
"ralph": {"wins": 8, "losses": 12, "total": 20}
|
|
311
|
+
},
|
|
312
|
+
"refactoring": {
|
|
313
|
+
"superpowers": {"wins": 3, "losses": 11, "total": 14},
|
|
314
|
+
"ralph": {"wins": 11, "losses": 3, "total": 14}
|
|
315
|
+
},
|
|
316
|
+
"integration": {
|
|
317
|
+
"superpowers": {"wins": 9, "losses": 2, "total": 11},
|
|
318
|
+
"ralph": {"wins": 2, "losses": 9, "total": 11}
|
|
319
|
+
},
|
|
320
|
+
"test-only": {
|
|
321
|
+
"superpowers": {"wins": 5, "losses": 7, "total": 12},
|
|
322
|
+
"ralph": {"wins": 7, "losses": 5, "total": 12}
|
|
323
|
+
}
|
|
324
|
+
}
|
|
325
|
+
```
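
To make the numbers above concrete, here is a small sketch that derives win rates from this structure and records a new head-to-head result. Both function names are hypothetical, and the code only assumes the `wins` / `losses` / `total` layout shown in the example.

```python
def win_rates(perf: dict) -> dict:
    """Map each batch type to {strategy: wins / total}, skipping strategies with no runs."""
    return {
        batch_type: {
            name: stats["wins"] / stats["total"]
            for name, stats in strategies.items()
            if stats["total"] > 0
        }
        for batch_type, strategies in perf.items()
    }


def record_result(perf: dict, batch_type: str, winner: str, loser: str) -> None:
    """Update counters in place after one run; unseen batch types and strategies start at zero."""
    strategies = perf.setdefault(batch_type, {})
    for name, key in ((winner, "wins"), (loser, "losses")):
        stats = strategies.setdefault(name, {"wins": 0, "losses": 0, "total": 0})
        stats[key] += 1
        stats["total"] += 1
```

With the sample data, `superpowers` has a win rate of 12/20 on `new-file` batches and 3/14 on `refactoring` batches.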
|
|
326
|
+
|
|
327
|
+
### `docs/ARCHITECTURE-MAP.json` — Auto-Generated Module Graph
|
|
328
|
+
|
|
329
|
+
```json
|
|
330
|
+
{
|
|
331
|
+
"generated_at": "2026-02-22T15:00:00Z",
|
|
332
|
+
"modules": [
|
|
333
|
+
{
|
|
334
|
+
"name": "run-plan",
|
|
335
|
+
"files": ["scripts/run-plan.sh", "scripts/lib/run-plan-*.sh"],
|
|
336
|
+
"exports": ["run_mode_headless", "run_mode_team"],
|
|
337
|
+
"depends_on": ["quality-gate", "lesson-check", "telegram"]
|
|
338
|
+
}
|
|
339
|
+
]
|
|
340
|
+
}
|
|
341
|
+
```
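
One way a consumer might use this map is to ask which modules are affected by a change. The sketch below assumes only the `modules`, `name`, and `depends_on` fields shown above; `dependents_of` is a hypothetical helper.

```python
def dependents_of(arch_map: dict, module_name: str) -> list[str]:
    """Return the names of modules that list module_name in `depends_on`, sorted."""
    return sorted(
        module["name"]
        for module in arch_map["modules"]
        if module_name in module.get("depends_on", [])
    )
```

For the example map, looking up `quality-gate` would return `run-plan`.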
|
|
342
|
+
|
|
343
|
+
## Lesson Lifecycle
|
|
344
|
+
|
|
345
|
+
```
|
|
346
|
+
MAB judge extracts lesson
|
|
347
|
+
→ logs/mab-lessons.json (immediate, local)
|
|
348
|
+
|
|
349
|
+
Pattern recurs 3+ times (same pattern across runs)
|
|
350
|
+
→ Auto-promoted to docs/lessons/NNNN-*.md
|
|
351
|
+
→ lesson-check.sh enforces syntactic lessons
|
|
352
|
+
→ lesson-scanner agent enforces semantic lessons
|
|
353
|
+
|
|
354
|
+
Promoted lesson causes quality gate failure
|
|
355
|
+
→ Tagged "disputed" in mab-lessons.json
|
|
356
|
+
→ Excluded from injection until human review
|
|
357
|
+
|
|
358
|
+
User runs /submit-lesson
|
|
359
|
+
→ PR to upstream autonomous-coding-toolkit repo
|
|
360
|
+
→ Maintainer reviews and merges
|
|
361
|
+
→ Community users pull via scripts/pull-community-lessons.sh
|
|
362
|
+
```
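
The promotion and dispute rules in this lifecycle can be sketched as a small state function. This is an illustration under assumptions: the boolean `disputed` field and both function names are hypothetical, since the design only says a lesson is tagged "disputed" without fixing the field's shape.

```python
PROMOTION_THRESHOLD = 3  # "Pattern recurs 3+ times"


def lesson_state(entry: dict) -> str:
    """Classify a mab-lessons.json entry as 'disputed', 'promoted', or 'local'."""
    if entry.get("disputed"):  # hypothetical flag set after a quality gate failure
        return "disputed"
    if entry.get("occurrences", 0) >= PROMOTION_THRESHOLD:
        return "promoted"
    return "local"


def injectable(lessons: list[dict]) -> list[dict]:
    """Lessons eligible for injection: everything except disputed entries."""
    return [entry for entry in lessons if lesson_state(entry) != "disputed"]
```

Checking the dispute flag first means a promoted lesson that later breaks the quality gate is held back until a human reviews it, which matches the ordering in the diagram.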
|
|
363
|
+
|
|
364
|
+
## Community Propagation
|
|
365
|
+
|
|
366
|
+
### Contributing Lessons
|
|
367
|
+
|
|
368
|
+
```bash
|
|
369
|
+
# Existing command — already in the toolkit
|
|
370
|
+
/submit-lesson
|
|
371
|
+
|
|
372
|
+
# Creates PR with:
|
|
373
|
+
# docs/lessons/NNNN-<slug>.md (the lesson)
|
|
374
|
+
# Commit message references the MAB run that produced it
|
|
375
|
+
```
|
|
376
|
+
|
|
377
|
+
### Consuming Community Lessons
|
|
378
|
+
|
|
379
|
+
```bash
|
|
380
|
+
# New script
|
|
381
|
+
scripts/pull-community-lessons.sh
|
|
382
|
+
|
|
383
|
+
# Behavior:
|
|
384
|
+
# git fetch upstream
|
|
385
|
+
# Copy new docs/lessons/*.md files
|
|
386
|
+
# Copy updated logs/strategy-perf.json (community aggregate)
|
|
387
|
+
# lesson-check.sh picks up new lessons automatically
|
|
388
|
+
```
|
|
389
|
+
|
|
390
|
+
### Community Strategy Data
|
|
391
|
+
|
|
392
|
+
The upstream repo carries a `strategy-perf.json` aggregated from all contributors. Once merged upstream, it includes anonymous win/loss data across all users' projects, so new users start from a community baseline instead of zero data.
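
Aggregation could be as simple as summing counters per batch type and strategy. The sketch below is one possible approach, not a documented one: `merge_perf` is a hypothetical name, and it assumes every contributor file uses the `wins` / `losses` / `total` layout of `logs/strategy-perf.json`.

```python
def merge_perf(*sources: dict) -> dict:
    """Sum win/loss counters across several strategy-perf dictionaries."""
    merged: dict = {}
    for perf in sources:
        for batch_type, strategies in perf.items():
            for name, stats in strategies.items():
                target = merged.setdefault(batch_type, {}).setdefault(
                    name, {"wins": 0, "losses": 0, "total": 0}
                )
                for key in ("wins", "losses", "total"):
                    target[key] += stats.get(key, 0)
    return merged
```

Because only counters are summed, no project names or paths need to leave a contributor's machine.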
|
|
393
|
+
|
|
394
|
+
### Semantic Search (Pinecone)
|
|
395
|
+
|
|
396
|
+
For a large lesson corpus (100+ lessons):
|
|
397
|
+
|
|
398
|
+
```
|
|
399
|
+
Before judge extracts a lesson:
|
|
400
|
+
Query Pinecone: "has this pattern been learned before?"
|
|
401
|
+
If match: refine existing lesson instead of creating duplicate
|
|
402
|
+
If no match: create new lesson
|
|
403
|
+
```
|
|
404
|
+
|
|
405
|
+
Uses the existing Pinecone MCP integration.
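
The match-or-create decision can be illustrated without a vector database. The sketch below deliberately swaps in `difflib.SequenceMatcher` from the Python standard library as a stand-in for the Pinecone query; it compares characters, not meaning, so it only demonstrates the control flow. The function name and the 0.8 threshold are assumptions.

```python
from difflib import SequenceMatcher


def find_existing(known_patterns: list[str], candidate: str, threshold: float = 0.8):
    """Return the most similar known pattern if it clears the threshold, else None.

    Stand-in for a semantic query: string similarity replaces embedding similarity.
    """
    best, best_score = None, 0.0
    for pattern in known_patterns:
        score = SequenceMatcher(None, pattern.lower(), candidate.lower()).ratio()
        if score > best_score:
            best, best_score = pattern, score
    return best if best_score >= threshold else None
```

A non-`None` result corresponds to the "refine existing lesson" branch above; `None` corresponds to "create new lesson".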
|
|
406
|
+
|
|
407
|
+
## Infrastructure Scripts
|
|
408
|
+
|
|
409
|
+
### `scripts/mab-run.sh` — Orchestrator
|
|
410
|
+
|
|
411
|
+
Thin bash script that:
|
|
412
|
+
1. Creates worktrees
|
|
413
|
+
2. Launches both agents in parallel (`claude -p` per worktree)
|
|
414
|
+
3. Runs quality gate on both
|
|
415
|
+
4. Launches judge agent
|
|
416
|
+
5. Merges winner
|
|
417
|
+
6. Cleans up worktrees
|
|
418
|
+
7. Updates data files
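
Part of this sequence can be sketched as a dry-run plan that lists commands without executing them. Everything here is illustrative: the `.worktrees/` layout, the `mab/<work-unit>/<agent>` branch names, and both function names are assumptions, and the prompt argument is a placeholder string. Steps 3, 4, and 7 are left out because their commands are not specified in this section.

```python
from pathlib import Path

AGENTS = {"agent_a": "superpowers", "agent_b": "ralph"}


def _worktree(repo: Path, work_unit: str, agent: str) -> Path:
    return repo / ".worktrees" / f"{work_unit}-{agent}"  # hypothetical layout


def _branch(work_unit: str, agent: str) -> str:
    return f"mab/{work_unit}/{agent}"  # hypothetical naming scheme


def launch_plan(repo: Path, work_unit: str) -> list[tuple[str, list[str]]]:
    """Steps 1-2 as (working_directory, argv) pairs: create worktrees, then one agent each."""
    plan = []
    for agent in AGENTS:
        path = _worktree(repo, work_unit, agent)
        plan.append((str(repo), ["git", "worktree", "add", "-b", _branch(work_unit, agent), str(path)]))
    for agent, strategy in AGENTS.items():
        prompt = f"<contents of scripts/prompts/{agent.replace('_', '-')}-{strategy}.md>"
        # The real script launches these in parallel; a plan only records them.
        plan.append((str(_worktree(repo, work_unit, agent)), ["claude", "-p", prompt]))
    return plan


def finish_plan(repo: Path, work_unit: str, winner: str) -> list[tuple[str, list[str]]]:
    """Steps 5-6: merge the winning branch, then remove both worktrees."""
    plan = [(str(repo), ["git", "merge", _branch(work_unit, winner)])]
    for agent in AGENTS:
        plan.append((str(repo), ["git", "worktree", "remove", str(_worktree(repo, work_unit, agent))]))
    return plan
```

Keeping planning separate from execution makes the ordering easy to test without a repository or network access.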
|
|
419
|
+
|
|
420
|
+
### `scripts/architecture-map.sh` — Module Graph Generator
|
|
421
|
+
|
|
422
|
+
Scans project source files:
|
|
423
|
+
- Python: `import` / `from X import` statements
|
|
424
|
+
- JavaScript/TypeScript: `import` / `require` statements
|
|
425
|
+
- Shell: `source` statements
|
|
426
|
+
- Produces `docs/ARCHITECTURE-MAP.json`
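
The scan described above could be approximated with one regular expression per file type. This is a rough sketch in Python for illustration only (the actual generator is a shell script), and the patterns are deliberately simple: they miss multi-name imports such as `import os, sys` and dynamic imports. `find_dependencies` is a hypothetical name.

```python
import re

PATTERNS = {
    # `import X` or `from X import ...`
    ".py": re.compile(r"^\s*(?:from\s+([\w.]+)\s+import|import\s+([\w.]+))", re.MULTILINE),
    # `import ... from 'x'`, `import 'x'`, or `require('x')`
    ".js": re.compile(r"""(?:from\s+|import\s+|require\(\s*)['"]([^'"]+)['"]"""),
    # `source path`
    ".sh": re.compile(r"^\s*source\s+(\S+)", re.MULTILINE),
}
PATTERNS[".ts"] = PATTERNS[".js"]


def find_dependencies(suffix: str, text: str) -> list[str]:
    """Return dependency names found in text, in first-seen order without duplicates."""
    pattern = PATTERNS.get(suffix)
    if pattern is None:
        return []
    found: list[str] = []
    for match in pattern.finditer(text):
        name = next(group for group in match.groups() if group).strip("\"'")
        if name not in found:
            found.append(name)
    return found
```

Results per file could then be grouped by module to fill the `depends_on` lists in the map.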
|
|
427
|
+
|
|
428
|
+
### `scripts/pull-community-lessons.sh` — Community Sync
|
|
429
|
+
|
|
430
|
+
Fetches the latest lessons and strategy data from the upstream repo.
|
|
431
|
+
|
|
432
|
+
### Agent Prompts
|
|
433
|
+
|
|
434
|
+
- `scripts/prompts/planner-agent.md` — routing decision prompt
|
|
435
|
+
- `scripts/prompts/judge-agent.md` — evaluation prompt
|
|
436
|
+
- `scripts/prompts/agent-a-superpowers.md` — superpowers lead instruction
|
|
437
|
+
- `scripts/prompts/agent-b-ralph.md` — ralph lead instruction
|
|
438
|
+
|
|
439
|
+
## File Summary
|
|
440
|
+
|
|
441
|
+
New files:
|
|
442
|
+
- `scripts/mab-run.sh` — MAB execution orchestrator
|
|
443
|
+
- `scripts/architecture-map.sh` — module graph generator
|
|
444
|
+
- `scripts/pull-community-lessons.sh` — community lesson sync
|
|
445
|
+
- `scripts/prompts/planner-agent.md` — planner prompt
|
|
446
|
+
- `scripts/prompts/judge-agent.md` — judge prompt
|
|
447
|
+
- `scripts/prompts/agent-a-superpowers.md` — Agent A instructions
|
|
448
|
+
- `scripts/prompts/agent-b-ralph.md` — Agent B instructions
|
|
449
|
+
- `scripts/tests/test-mab-run.sh` — MAB orchestrator tests
|
|
450
|
+
- `scripts/tests/test-architecture-map.sh` — map generator tests
|
|
451
|
+
- `docs/plans/2026-02-22-mab-run-design.md` — this document
|
|
452
|
+
|
|
453
|
+
Modified files:
|
|
454
|
+
- `scripts/run-plan.sh` — add `--mab` flag that routes through `mab-run.sh`
|
|
455
|
+
- `scripts/lib/run-plan-context.sh` — inject MAB lessons into batch context
|
|
456
|
+
- `docs/ARCHITECTURE.md` — document MAB system
|
|
457
|
+
|
|
458
|
+
Data files (created at runtime):
|
|
459
|
+
- `logs/mab-lessons.json`
|
|
460
|
+
- `logs/strategy-perf.json`
|
|
461
|
+
- `logs/mab-run-<timestamp>.json`
|
|
462
|
+
- `docs/ARCHITECTURE-MAP.json`
|