autonomous-coding-toolkit 1.0.0
- package/.claude-plugin/marketplace.json +22 -0
- package/.claude-plugin/plugin.json +13 -0
- package/LICENSE +21 -0
- package/Makefile +21 -0
- package/README.md +140 -0
- package/SECURITY.md +28 -0
- package/agents/bash-expert.md +113 -0
- package/agents/dependency-auditor.md +138 -0
- package/agents/integration-tester.md +120 -0
- package/agents/lesson-scanner.md +149 -0
- package/agents/python-expert.md +179 -0
- package/agents/service-monitor.md +141 -0
- package/agents/shell-expert.md +147 -0
- package/benchmarks/runner.sh +147 -0
- package/benchmarks/tasks/01-rest-endpoint/rubric.sh +29 -0
- package/benchmarks/tasks/01-rest-endpoint/task.md +17 -0
- package/benchmarks/tasks/02-refactor-module/task.md +8 -0
- package/benchmarks/tasks/03-fix-integration-bug/task.md +8 -0
- package/benchmarks/tasks/04-add-test-coverage/task.md +8 -0
- package/benchmarks/tasks/05-multi-file-feature/task.md +8 -0
- package/bin/act.js +238 -0
- package/commands/autocode.md +6 -0
- package/commands/cancel-ralph.md +18 -0
- package/commands/code-factory.md +53 -0
- package/commands/create-prd.md +55 -0
- package/commands/ralph-loop.md +18 -0
- package/commands/run-plan.md +117 -0
- package/commands/submit-lesson.md +122 -0
- package/docs/ARCHITECTURE.md +630 -0
- package/docs/CONTRIBUTING.md +125 -0
- package/docs/lessons/0001-bare-exception-swallowing.md +34 -0
- package/docs/lessons/0002-async-def-without-await.md +28 -0
- package/docs/lessons/0003-create-task-without-callback.md +28 -0
- package/docs/lessons/0004-hardcoded-test-counts.md +28 -0
- package/docs/lessons/0005-sqlite-without-closing.md +33 -0
- package/docs/lessons/0006-venv-pip-path.md +27 -0
- package/docs/lessons/0007-runner-state-self-rejection.md +35 -0
- package/docs/lessons/0008-quality-gate-blind-spot.md +33 -0
- package/docs/lessons/0009-parser-overcount-empty-batches.md +36 -0
- package/docs/lessons/0010-local-outside-function-bash.md +33 -0
- package/docs/lessons/0011-batch-tests-for-unimplemented-code.md +36 -0
- package/docs/lessons/0012-api-markdown-unescaped-chars.md +33 -0
- package/docs/lessons/0013-export-prefix-env-parsing.md +33 -0
- package/docs/lessons/0014-decorator-registry-import-side-effect.md +43 -0
- package/docs/lessons/0015-frontend-backend-schema-drift.md +43 -0
- package/docs/lessons/0016-event-driven-cold-start-seeding.md +44 -0
- package/docs/lessons/0017-copy-paste-logic-diverges.md +43 -0
- package/docs/lessons/0018-layer-passes-pipeline-broken.md +45 -0
- package/docs/lessons/0019-systemd-envfile-ignores-export.md +41 -0
- package/docs/lessons/0020-persist-state-incrementally.md +44 -0
- package/docs/lessons/0021-dual-axis-testing.md +48 -0
- package/docs/lessons/0022-jsx-factory-shadowing.md +43 -0
- package/docs/lessons/0023-static-analysis-spiral.md +51 -0
- package/docs/lessons/0024-shared-pipeline-implementation.md +55 -0
- package/docs/lessons/0025-defense-in-depth-all-entry-points.md +65 -0
- package/docs/lessons/0026-linter-no-rules-false-enforcement.md +54 -0
- package/docs/lessons/0027-jsx-silent-prop-drop.md +64 -0
- package/docs/lessons/0028-no-infrastructure-in-client-code.md +49 -0
- package/docs/lessons/0029-never-write-secrets-to-files.md +61 -0
- package/docs/lessons/0030-cache-merge-not-replace.md +62 -0
- package/docs/lessons/0031-verify-units-at-boundaries.md +66 -0
- package/docs/lessons/0032-module-lifecycle-subscribe-unsubscribe.md +89 -0
- package/docs/lessons/0033-async-iteration-mutable-snapshot.md +72 -0
- package/docs/lessons/0034-caller-missing-await-silent-discard.md +65 -0
- package/docs/lessons/0035-duplicate-registration-silent-overwrite.md +85 -0
- package/docs/lessons/0036-websocket-dirty-disconnect.md +33 -0
- package/docs/lessons/0037-parallel-agents-worktree-corruption.md +31 -0
- package/docs/lessons/0038-subscribe-no-stored-ref.md +36 -0
- package/docs/lessons/0039-fallback-or-default-hides-bugs.md +34 -0
- package/docs/lessons/0040-event-firehose-filter-first.md +36 -0
- package/docs/lessons/0041-ambiguous-base-dir-path-nesting.md +32 -0
- package/docs/lessons/0042-spec-compliance-insufficient.md +36 -0
- package/docs/lessons/0043-exact-count-extensible-collections.md +32 -0
- package/docs/lessons/0044-relative-file-deps-worktree.md +39 -0
- package/docs/lessons/0045-iterative-design-improvement.md +33 -0
- package/docs/lessons/0046-plan-assertion-math-bugs.md +38 -0
- package/docs/lessons/0047-pytest-single-threaded-default.md +37 -0
- package/docs/lessons/0048-integration-wiring-batch.md +40 -0
- package/docs/lessons/0049-ab-verification.md +41 -0
- package/docs/lessons/0050-editing-sourced-files-during-execution.md +33 -0
- package/docs/lessons/0051-infrastructure-fixes-cant-self-heal.md +30 -0
- package/docs/lessons/0052-uncommitted-changes-poison-quality-gates.md +31 -0
- package/docs/lessons/0053-jq-compact-flag-inconsistency.md +31 -0
- package/docs/lessons/0054-parser-matches-inside-code-blocks.md +30 -0
- package/docs/lessons/0055-agents-compensate-for-garbled-prompts.md +31 -0
- package/docs/lessons/0056-grep-count-exit-code-on-zero.md +42 -0
- package/docs/lessons/0057-new-artifacts-break-git-clean-gates.md +42 -0
- package/docs/lessons/0058-dead-config-keys-never-consumed.md +49 -0
- package/docs/lessons/0059-contract-test-shared-structures.md +53 -0
- package/docs/lessons/0060-set-e-silent-death-in-runners.md +53 -0
- package/docs/lessons/0061-context-injection-dirty-state.md +50 -0
- package/docs/lessons/0062-sibling-bug-neighborhood-scan.md +29 -0
- package/docs/lessons/0063-one-flag-two-lifetimes.md +31 -0
- package/docs/lessons/0064-test-passes-wrong-reason.md +31 -0
- package/docs/lessons/0065-pipefail-grep-count-double-output.md +39 -0
- package/docs/lessons/0066-local-keyword-outside-function.md +37 -0
- package/docs/lessons/0067-stdin-hang-non-interactive-shell.md +36 -0
- package/docs/lessons/0068-agent-builds-wrong-thing-correctly.md +31 -0
- package/docs/lessons/0069-plan-quality-dominates-execution.md +30 -0
- package/docs/lessons/0070-spec-echo-back-prevents-drift.md +31 -0
- package/docs/lessons/0071-positive-instructions-outperform-negative.md +30 -0
- package/docs/lessons/0072-lost-in-the-middle-context-placement.md +30 -0
- package/docs/lessons/0073-unscoped-lessons-cause-false-positives.md +30 -0
- package/docs/lessons/0074-stale-context-injection-wrong-batch.md +32 -0
- package/docs/lessons/0075-research-artifacts-must-persist.md +32 -0
- package/docs/lessons/0076-wrong-decomposition-contaminates-downstream.md +30 -0
- package/docs/lessons/0077-cherry-pick-merges-need-manual-resolution.md +30 -0
- package/docs/lessons/0078-static-review-without-live-test.md +30 -0
- package/docs/lessons/0079-integration-wiring-batch-required.md +32 -0
- package/docs/lessons/FRAMEWORK.md +161 -0
- package/docs/lessons/SUMMARY.md +201 -0
- package/docs/lessons/TEMPLATE.md +85 -0
- package/docs/plans/2026-02-21-code-factory-v2-design.md +204 -0
- package/docs/plans/2026-02-21-code-factory-v2-implementation-plan.md +2189 -0
- package/docs/plans/2026-02-21-code-factory-v2-phase4-design.md +537 -0
- package/docs/plans/2026-02-21-code-factory-v2-phase4-implementation-plan.md +2012 -0
- package/docs/plans/2026-02-21-hardening-pass-design.md +108 -0
- package/docs/plans/2026-02-21-hardening-pass-plan.md +1378 -0
- package/docs/plans/2026-02-21-mab-research-report.md +406 -0
- package/docs/plans/2026-02-21-marketplace-restructure-design.md +240 -0
- package/docs/plans/2026-02-21-marketplace-restructure-plan.md +832 -0
- package/docs/plans/2026-02-21-phase4-completion-plan.md +697 -0
- package/docs/plans/2026-02-21-validator-suite-design.md +148 -0
- package/docs/plans/2026-02-21-validator-suite-plan.md +540 -0
- package/docs/plans/2026-02-22-mab-research-round2.md +556 -0
- package/docs/plans/2026-02-22-mab-run-design.md +462 -0
- package/docs/plans/2026-02-22-mab-run-plan.md +2046 -0
- package/docs/plans/2026-02-22-operations-design-methodology-research.md +681 -0
- package/docs/plans/2026-02-22-research-agent-failure-taxonomy.md +532 -0
- package/docs/plans/2026-02-22-research-code-guideline-policies.md +886 -0
- package/docs/plans/2026-02-22-research-codebase-audit-refactoring.md +908 -0
- package/docs/plans/2026-02-22-research-coding-standards-documentation.md +541 -0
- package/docs/plans/2026-02-22-research-competitive-landscape.md +687 -0
- package/docs/plans/2026-02-22-research-comprehensive-testing.md +1076 -0
- package/docs/plans/2026-02-22-research-context-utilization.md +459 -0
- package/docs/plans/2026-02-22-research-cost-quality-tradeoff.md +548 -0
- package/docs/plans/2026-02-22-research-lesson-transferability.md +508 -0
- package/docs/plans/2026-02-22-research-multi-agent-coordination.md +312 -0
- package/docs/plans/2026-02-22-research-phase-integration.md +602 -0
- package/docs/plans/2026-02-22-research-plan-quality.md +428 -0
- package/docs/plans/2026-02-22-research-prompt-engineering.md +558 -0
- package/docs/plans/2026-02-22-research-unconventional-perspectives.md +528 -0
- package/docs/plans/2026-02-22-research-user-adoption.md +638 -0
- package/docs/plans/2026-02-22-research-verification-effectiveness.md +433 -0
- package/docs/plans/2026-02-23-agent-suite-design.md +299 -0
- package/docs/plans/2026-02-23-agent-suite-plan.md +578 -0
- package/docs/plans/2026-02-23-phase3-cost-infrastructure-design.md +148 -0
- package/docs/plans/2026-02-23-phase3-cost-infrastructure-plan.md +1062 -0
- package/docs/plans/2026-02-23-research-bash-expert-agent.md +543 -0
- package/docs/plans/2026-02-23-research-dependency-auditor-agent.md +564 -0
- package/docs/plans/2026-02-23-research-improving-existing-agents.md +503 -0
- package/docs/plans/2026-02-23-research-integration-tester-agent.md +454 -0
- package/docs/plans/2026-02-23-research-python-expert-agent.md +429 -0
- package/docs/plans/2026-02-23-research-service-monitor-agent.md +425 -0
- package/docs/plans/2026-02-23-research-shell-expert-agent.md +533 -0
- package/docs/plans/2026-02-23-roadmap-to-completion.md +530 -0
- package/docs/plans/2026-02-24-headless-module-split-design.md +98 -0
- package/docs/plans/2026-02-24-headless-module-split.md +443 -0
- package/docs/plans/2026-02-24-lesson-scope-metadata-design.md +228 -0
- package/docs/plans/2026-02-24-lesson-scope-metadata-plan.md +968 -0
- package/docs/plans/2026-02-24-npm-packaging-design.md +841 -0
- package/docs/plans/2026-02-24-npm-packaging-plan.md +1965 -0
- package/docs/plans/audit-findings.md +186 -0
- package/docs/telegram-notification-format.md +98 -0
- package/examples/example-plan.md +51 -0
- package/examples/example-prd.json +72 -0
- package/examples/example-roadmap.md +33 -0
- package/examples/quickstart-plan.md +63 -0
- package/hooks/hooks.json +26 -0
- package/hooks/setup-symlinks.sh +48 -0
- package/hooks/stop-hook.sh +135 -0
- package/package.json +47 -0
- package/policies/bash.md +71 -0
- package/policies/python.md +71 -0
- package/policies/testing.md +61 -0
- package/policies/universal.md +60 -0
- package/scripts/analyze-report.sh +97 -0
- package/scripts/architecture-map.sh +145 -0
- package/scripts/auto-compound.sh +273 -0
- package/scripts/batch-audit.sh +42 -0
- package/scripts/batch-test.sh +101 -0
- package/scripts/entropy-audit.sh +221 -0
- package/scripts/failure-digest.sh +51 -0
- package/scripts/generate-ast-rules.sh +96 -0
- package/scripts/init.sh +112 -0
- package/scripts/lesson-check.sh +428 -0
- package/scripts/lib/common.sh +61 -0
- package/scripts/lib/cost-tracking.sh +153 -0
- package/scripts/lib/ollama.sh +60 -0
- package/scripts/lib/progress-writer.sh +128 -0
- package/scripts/lib/run-plan-context.sh +215 -0
- package/scripts/lib/run-plan-echo-back.sh +231 -0
- package/scripts/lib/run-plan-headless.sh +396 -0
- package/scripts/lib/run-plan-notify.sh +57 -0
- package/scripts/lib/run-plan-parser.sh +81 -0
- package/scripts/lib/run-plan-prompt.sh +215 -0
- package/scripts/lib/run-plan-quality-gate.sh +132 -0
- package/scripts/lib/run-plan-routing.sh +315 -0
- package/scripts/lib/run-plan-sampling.sh +170 -0
- package/scripts/lib/run-plan-scoring.sh +146 -0
- package/scripts/lib/run-plan-state.sh +142 -0
- package/scripts/lib/run-plan-team.sh +199 -0
- package/scripts/lib/telegram.sh +54 -0
- package/scripts/lib/thompson-sampling.sh +176 -0
- package/scripts/license-check.sh +74 -0
- package/scripts/mab-run.sh +575 -0
- package/scripts/module-size-check.sh +146 -0
- package/scripts/patterns/async-no-await.yml +5 -0
- package/scripts/patterns/bare-except.yml +6 -0
- package/scripts/patterns/empty-catch.yml +6 -0
- package/scripts/patterns/hardcoded-localhost.yml +9 -0
- package/scripts/patterns/retry-loop-no-backoff.yml +12 -0
- package/scripts/pipeline-status.sh +197 -0
- package/scripts/policy-check.sh +226 -0
- package/scripts/prior-art-search.sh +133 -0
- package/scripts/promote-mab-lessons.sh +126 -0
- package/scripts/prompts/agent-a-superpowers.md +29 -0
- package/scripts/prompts/agent-b-ralph.md +29 -0
- package/scripts/prompts/judge-agent.md +61 -0
- package/scripts/prompts/planner-agent.md +44 -0
- package/scripts/pull-community-lessons.sh +90 -0
- package/scripts/quality-gate.sh +266 -0
- package/scripts/research-gate.sh +90 -0
- package/scripts/run-plan.sh +329 -0
- package/scripts/scope-infer.sh +159 -0
- package/scripts/setup-ralph-loop.sh +155 -0
- package/scripts/telemetry.sh +230 -0
- package/scripts/tests/run-all-tests.sh +52 -0
- package/scripts/tests/test-act-cli.sh +46 -0
- package/scripts/tests/test-agents-md.sh +87 -0
- package/scripts/tests/test-analyze-report.sh +114 -0
- package/scripts/tests/test-architecture-map.sh +89 -0
- package/scripts/tests/test-auto-compound.sh +169 -0
- package/scripts/tests/test-batch-test.sh +65 -0
- package/scripts/tests/test-benchmark-runner.sh +25 -0
- package/scripts/tests/test-common.sh +168 -0
- package/scripts/tests/test-cost-tracking.sh +158 -0
- package/scripts/tests/test-echo-back.sh +180 -0
- package/scripts/tests/test-entropy-audit.sh +146 -0
- package/scripts/tests/test-failure-digest.sh +66 -0
- package/scripts/tests/test-generate-ast-rules.sh +145 -0
- package/scripts/tests/test-helpers.sh +82 -0
- package/scripts/tests/test-init.sh +47 -0
- package/scripts/tests/test-lesson-check.sh +278 -0
- package/scripts/tests/test-lesson-local.sh +55 -0
- package/scripts/tests/test-license-check.sh +109 -0
- package/scripts/tests/test-mab-run.sh +182 -0
- package/scripts/tests/test-ollama-lib.sh +49 -0
- package/scripts/tests/test-ollama.sh +60 -0
- package/scripts/tests/test-pipeline-status.sh +198 -0
- package/scripts/tests/test-policy-check.sh +124 -0
- package/scripts/tests/test-prior-art-search.sh +96 -0
- package/scripts/tests/test-progress-writer.sh +140 -0
- package/scripts/tests/test-promote-mab-lessons.sh +110 -0
- package/scripts/tests/test-pull-community-lessons.sh +149 -0
- package/scripts/tests/test-quality-gate.sh +241 -0
- package/scripts/tests/test-research-gate.sh +132 -0
- package/scripts/tests/test-run-plan-cli.sh +86 -0
- package/scripts/tests/test-run-plan-context.sh +305 -0
- package/scripts/tests/test-run-plan-e2e.sh +153 -0
- package/scripts/tests/test-run-plan-headless.sh +424 -0
- package/scripts/tests/test-run-plan-notify.sh +124 -0
- package/scripts/tests/test-run-plan-parser.sh +217 -0
- package/scripts/tests/test-run-plan-prompt.sh +254 -0
- package/scripts/tests/test-run-plan-quality-gate.sh +222 -0
- package/scripts/tests/test-run-plan-routing.sh +178 -0
- package/scripts/tests/test-run-plan-scoring.sh +148 -0
- package/scripts/tests/test-run-plan-state.sh +261 -0
- package/scripts/tests/test-run-plan-team.sh +157 -0
- package/scripts/tests/test-scope-infer.sh +150 -0
- package/scripts/tests/test-setup-ralph-loop.sh +63 -0
- package/scripts/tests/test-telegram-env.sh +38 -0
- package/scripts/tests/test-telegram.sh +121 -0
- package/scripts/tests/test-telemetry.sh +46 -0
- package/scripts/tests/test-thompson-sampling.sh +139 -0
- package/scripts/tests/test-validate-all.sh +60 -0
- package/scripts/tests/test-validate-commands.sh +89 -0
- package/scripts/tests/test-validate-hooks.sh +98 -0
- package/scripts/tests/test-validate-lessons.sh +150 -0
- package/scripts/tests/test-validate-plan-quality.sh +235 -0
- package/scripts/tests/test-validate-plans.sh +187 -0
- package/scripts/tests/test-validate-plugin.sh +106 -0
- package/scripts/tests/test-validate-prd.sh +184 -0
- package/scripts/tests/test-validate-skills.sh +134 -0
- package/scripts/validate-all.sh +57 -0
- package/scripts/validate-commands.sh +67 -0
- package/scripts/validate-hooks.sh +89 -0
- package/scripts/validate-lessons.sh +98 -0
- package/scripts/validate-plan-quality.sh +369 -0
- package/scripts/validate-plans.sh +120 -0
- package/scripts/validate-plugin.sh +86 -0
- package/scripts/validate-policies.sh +42 -0
- package/scripts/validate-prd.sh +118 -0
- package/scripts/validate-skills.sh +96 -0
- package/skills/autocode/SKILL.md +285 -0
- package/skills/autocode/ab-verification.md +51 -0
- package/skills/autocode/code-quality-standards.md +37 -0
- package/skills/autocode/competitive-mode.md +364 -0
- package/skills/brainstorming/SKILL.md +97 -0
- package/skills/capture-lesson/SKILL.md +187 -0
- package/skills/check-lessons/SKILL.md +116 -0
- package/skills/dispatching-parallel-agents/SKILL.md +110 -0
- package/skills/executing-plans/SKILL.md +85 -0
- package/skills/finishing-a-development-branch/SKILL.md +201 -0
- package/skills/receiving-code-review/SKILL.md +72 -0
- package/skills/requesting-code-review/SKILL.md +59 -0
- package/skills/requesting-code-review/code-reviewer.md +82 -0
- package/skills/research/SKILL.md +145 -0
- package/skills/roadmap/SKILL.md +115 -0
- package/skills/subagent-driven-development/SKILL.md +98 -0
- package/skills/subagent-driven-development/code-quality-reviewer-prompt.md +18 -0
- package/skills/subagent-driven-development/implementer-prompt.md +73 -0
- package/skills/subagent-driven-development/spec-reviewer-prompt.md +57 -0
- package/skills/systematic-debugging/SKILL.md +134 -0
- package/skills/systematic-debugging/condition-based-waiting.md +64 -0
- package/skills/systematic-debugging/defense-in-depth.md +32 -0
- package/skills/systematic-debugging/root-cause-tracing.md +55 -0
- package/skills/test-driven-development/SKILL.md +167 -0
- package/skills/using-git-worktrees/SKILL.md +219 -0
- package/skills/using-superpowers/SKILL.md +54 -0
- package/skills/verification-before-completion/SKILL.md +140 -0
- package/skills/verify/SKILL.md +82 -0
- package/skills/writing-plans/SKILL.md +128 -0
- package/skills/writing-skills/SKILL.md +93 -0
@@ -0,0 +1,537 @@
# Code Factory v2 Phase 4 — Design Document

**Date:** 2026-02-21
**Status:** Approved
**Approach:** Fixes-First, Then Features (Batch 1 → 2 → 3 → 4 → 5, sequential)
**Prior work:** `docs/plans/2026-02-21-code-factory-v2-design.md` (Phases 1-3 complete, Phase 4 partial)

## Problem Statement

Phase 4 of Code Factory v2 has 4 remaining design tasks (4.2, 4.4, 4.5, 4.6), 2 quick fixes, and 43 new lesson files to add: 36 ported from the 53 lessons accumulated across projects and 7 discovered during v2 execution. The 6 lesson files currently in the toolkit are a fraction of those 53. This plan completes Phase 4 and brings all generalizable lessons into the public toolkit.

## What's Already Done (Phases 1-3 + partial Phase 4)

- Shared libraries: common.sh, ollama.sh, telegram.sh, run-plan-headless.sh
- Quality gates: lesson-check + lint (ruff/eslint) + tests + license-check + memory
- Prior-art search (text-based), pipeline-status, failure-digest, context_refs
- 19 test files, 224 assertions, all scripts under 300 lines

## Batch 1: Quick Fixes + All Lessons

### Fix 1: Empty Batch Detection

In `run-plan-headless.sh` line 37, the batch loop iterates from `START_BATCH` to `END_BATCH` without checking whether each batch has content. The parser found 9 batches in a 7-batch plan, so 2 API calls (~50s) were wasted on empty batches.

**Fix:** After `get_batch_title`, call `get_batch_text` and skip the batch if the result is empty:

```bash
# Inside the batch loop: fetch the batch body and skip batches with no content.
# `local` assumes the loop runs inside a function.
local batch_text
batch_text=$(get_batch_text "$PLAN_FILE" "$batch")
if [[ -z "$batch_text" ]]; then
    echo " (empty batch -- skipping)"
    continue
fi
```

### Fix 2: Bash Test Suite Detection

`quality-gate.sh` detects pytest/npm/make but not bash test suites. For this repo, the quality gates between batches reported "No test suite detected -- skipped" even though 224 assertions existed.

**Fix:** Add a `bash` case to `detect_project_type()` in `common.sh` that applies when `scripts/tests/run-all-tests.sh` exists or a `test-*.sh` glob matches. Add a corresponding `bash)` case to the test suite section of `quality-gate.sh`.

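A minimal sketch of the detection rule for Fix 2, not the toolkit's actual code. `has_bash_test_suite` is a hypothetical helper name, and looking for `test-*.sh` under `scripts/tests/` is an assumption, since the text does not say where the glob is evaluated.

```bash
# Hypothetical helper: succeeds when a project root looks like it has a bash test suite.
# Arg 1: project root (defaults to the current directory).
has_bash_test_suite() {
    local root="${1:-.}"
    local f
    # Signal 1: the conventional runner script exists.
    if [ -f "$root/scripts/tests/run-all-tests.sh" ]; then
        return 0
    fi
    # Signal 2: at least one test-*.sh file exists (assumed location: scripts/tests/).
    # An unmatched glob stays literal, so -e filters that case out.
    for f in "$root"/scripts/tests/test-*.sh; do
        if [ -e "$f" ]; then
            return 0
        fi
    done
    return 1
}
```

A `detect_project_type()` branch could then print `bash` when this helper succeeds and the pytest/npm/make checks have not already matched.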
### Lessons: 43 New Files (0007-0049)

Port all generalizable lessons from the Documents workspace (53 total - 6 already in toolkit - 11 too project-specific = 36 to port) plus 7 new lessons from v2 execution, for 43 new files in total.

**Generalization rules:**
- No project names (no ARIA, HA, Telegram, etc.)
- No specific IPs, hostnames, or usernames
- No internal API references; use generic equivalents
- Focus on the universal anti-pattern, not the specific bug

**Lesson mapping (new ID → source → generalized title):**

| New ID | Source | Title | Type | Severity | Category |
|--------|--------|-------|------|----------|----------|
| 0007 | v2 | Runner state file rejected by own git-clean check | syntactic | should-fix | integration-boundaries |
| 0008 | v2 | Quality gate blind spot for non-standard test suites | semantic | should-fix | silent-failures |
| 0009 | v2 | Plan parser over-count burns empty API calls | semantic | should-fix | silent-failures |
| 0010 | v2 | `local` outside function silently misbehaves in bash | syntactic | blocker | silent-failures |
| 0011 | v2 | Batch execution writes tests for unimplemented code | semantic | should-fix | integration-boundaries |
| 0012 | v2 | API rejects markdown with unescaped special chars | semantic | nice-to-have | integration-boundaries |
| 0013 | v2 | `export` prefix in env files breaks naive parsing | syntactic | should-fix | silent-failures |
| 0014 | #2 | Decorator registries are import-time side effects | semantic | should-fix | silent-failures |
| 0015 | #4 | Frontend-backend schema drift invisible until e2e trace | semantic | should-fix | integration-boundaries |
| 0016 | #5 | Event-driven systems must seed current state on startup | semantic | should-fix | integration-boundaries |
| 0017 | #6 | Copy-pasted logic between modules diverges silently | semantic | should-fix | integration-boundaries |
| 0018 | #8 | Every layer passes its test while full pipeline is broken | semantic | should-fix | integration-boundaries |
| 0019 | #9 | systemd EnvironmentFile ignores `export` keyword | syntactic | should-fix | silent-failures |
| 0020 | #10 | Persist state incrementally before expensive work | semantic | should-fix | silent-failures |
| 0021 | #11 | Dual-axis testing: horizontal sweep + vertical trace | semantic | lesson-learned | integration-boundaries |
| 0022 | #13 | Build tool JSX factory shadowed by arrow params | syntactic | blocker | silent-failures |
| 0023 | #14 | Static analysis spiral -- chasing lint fixes creates more bugs | semantic | should-fix | test-anti-patterns |
| 0024 | #15 | Shared pipeline features must share implementation | semantic | should-fix | integration-boundaries |
| 0025 | #16 | Defense-in-depth: validate at all entry points | semantic | lesson-learned | integration-boundaries |
| 0026 | #17 | Linter with no rules enabled = false enforcement | semantic | should-fix | silent-failures |
| 0027 | #18 | JSX silently drops wrong prop names | syntactic | should-fix | silent-failures |
| 0028 | #20 | Never embed infrastructure details in client-side code | syntactic | blocker | silent-failures |
| 0029 | #21 | Never write secret values into committed files | syntactic | blocker | silent-failures |
| 0030 | #22 | Cache/registry updates must merge, never replace | semantic | should-fix | integration-boundaries |
| 0031 | #26 | Verify units at every boundary (0-1 vs 0-100) | semantic | should-fix | integration-boundaries |
| 0032 | #28 | Module lifecycle: subscribe after init gate, unsubscribe on shutdown | semantic | should-fix | resource-lifecycle |
| 0033 | #29 | Async iteration over mutable collections needs snapshot | syntactic | blocker | async-traps |
| 0034 | #30 | Caller-side missing await silently discards work | semantic | blocker | async-traps |
| 0035 | #31 | Duplicate registration IDs cause silent overwrite | semantic | should-fix | silent-failures |
| 0036 | #34 | WebSocket dirty disconnects raise RuntimeError, not close | semantic | should-fix | resource-lifecycle |
| 0037 | #36 | Parallel agents sharing worktree corrupt staging area | semantic | blocker | integration-boundaries |
| 0038 | #37 | Subscribe without stored ref = cannot unsubscribe | syntactic | should-fix | resource-lifecycle |
| 0039 | #38 | Fallback `or default()` hides initialization bugs | semantic | should-fix | silent-failures |
| 0040 | #39 | Process all events when 5% are relevant -- filter first | semantic | should-fix | performance |
| 0041 | #40 | Ambiguous base dir variable causes path double-nesting | semantic | should-fix | integration-boundaries |
| 0042 | #42 | Spec compliance without quality review misses defensive gaps | semantic | should-fix | integration-boundaries |
| 0043 | #44 | Exact count assertions on extensible collections break on addition | syntactic | should-fix | test-anti-patterns |
| 0044 | #46 | Relative `file:` deps break in git worktrees | semantic | should-fix | integration-boundaries |
| 0045 | #49 | Iterative "how would you improve" catches 35% more design gaps | semantic | lesson-learned | integration-boundaries |
| 0046 | #50 | Plan-specified test assertions can have math bugs | semantic | should-fix | test-anti-patterns |
| 0047 | #52 | pytest runs single-threaded by default -- add xdist | semantic | should-fix | performance |
| 0048 | #53 | Multi-batch plans need explicit integration wiring batch | semantic | lesson-learned | integration-boundaries |
| 0049 | #56 | A/B verification finds zero-overlap bug classes | semantic | lesson-learned | integration-boundaries |

**SUMMARY.md:** Generalized version of the Documents workspace summary with:
- Quick reference table (all 49 lessons)
- Three root cause clusters (Silent Failures, Integration Boundaries, Cold-Start)
- Six rules to build by
- Diagnostic shortcuts table
- No project-specific references

All lesson files follow the toolkit's YAML frontmatter schema (see `docs/lessons/TEMPLATE.md`).

## Batch 2: Per-Batch Context Assembler

**Goal:** Minimize the context gap between a fresh batch agent and an experienced one. Each agent gets exactly the context it needs within a token budget -- directives, not just facts.

### Architecture

A `generate_batch_context()` function in `scripts/lib/run-plan-context.sh` that:

1. **Reads all context sources:** state file, progress.txt, git log, context_refs, failure-patterns.json
2. **Scores by relevance:** recency (recent batches score higher) + direct dependency (context_refs from this batch score highest) + failure history (if this batch type failed before, that scores high)
3. **Assembles within token budget:** ~1500 tokens target. Priority order: directives > failure history > context_refs contents > git log > progress.txt
4. **Outputs directives:** "Don't repeat X", "Read Y before modifying", "Quality gate expects N+ tests"
5. **Writes to CLAUDE.md:** Writes a `## Run-Plan: Batch N` section that is replaced for each batch rather than accumulated

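The relevance scoring in step 2 could look roughly like the sketch below. `score_context_item` and the weights (10, 100, 50) are illustrative assumptions; the design only fixes the ordering, with this batch's context_refs highest, failure history high, and recency as the base signal.

```bash
# Hypothetical scoring helper; the weights are assumptions chosen only to
# preserve the ordering described in the design.
# Args: current batch number, batch that produced the item,
#       1 if the item is one of this batch's context_refs (else 0),
#       1 if this batch type has failed before (else 0).
score_context_item() {
    local current="$1" source_batch="$2" is_ref="$3" failed_before="$4"
    local distance=$((current - source_batch))
    local score=0
    # Recency: items from nearer batches score higher, floored at zero.
    if [ "$distance" -lt 10 ]; then
        score=$((10 - distance))
    fi
    # Direct dependency: context_refs declared by this batch score highest.
    if [ "$is_ref" -eq 1 ]; then
        score=$((score + 100))
    fi
    # Failure history: prior failures of this batch type score high.
    if [ "$failed_before" -eq 1 ]; then
        score=$((score + 50))
    fi
    echo "$score"
}
```

Sorting candidate items by this score in descending order gives the assembler a single ordering to walk while it fills the token budget.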
### Context Sources (priority order)

1. **Failure patterns** (highest-priority raw source) -- from `logs/failure-patterns.json`, cross-run learning
2. **Context_refs file contents** -- first 100 lines of files declared in batch header
3. **Prior batch quality gate results** -- test count, pass/fail, duration
4. **Git log** -- last 5 commits from prior batches
5. **Progress.txt** -- last 20 lines of discoveries/decisions
6. **Directives** -- synthesized from the sources above and always included regardless of budget: "tests must stay above 224", "these files were modified by batch 2"

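Several of these sources can be collected with standard tools. The sketch below is one possible shape, not the toolkit's implementation: `gather_raw_context` is a hypothetical name, and it assumes `progress.txt` sits at the worktree root and that context_refs paths are relative to it.

```bash
# Hypothetical helper: print the raw (pre-directive) context for one batch.
# Args: worktree directory, then zero or more context_refs paths relative to it.
gather_raw_context() {
    local worktree="$1"
    shift
    local ref
    echo "### Recent commits"
    # Last 5 commits; tolerate a directory that is not a git repository.
    git -C "$worktree" log --oneline -5 2>/dev/null || true
    echo "### Progress notes"
    # Last 20 lines of discoveries/decisions, if the file exists.
    if [ -f "$worktree/progress.txt" ]; then
        tail -n 20 "$worktree/progress.txt"
    fi
    for ref in "$@"; do
        if [ -f "$worktree/$ref" ]; then
            echo "### $ref"
            # First 100 lines of each declared context_refs file.
            head -n 100 "$worktree/$ref"
        fi
    done
}
```

Missing files are skipped rather than treated as errors, so a batch that declares a stale context_refs path still gets the rest of its context.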
### Cross-Run Failure Patterns

`logs/failure-patterns.json` persists across runs:

```json
[
  {
    "batch_title_pattern": "integration wiring",
    "failure_type": "missing import",
    "frequency": 3,
    "last_seen": "2026-02-21",
    "winning_fix": "check all imports before running tests"
  }
]
```

When a batch title fuzzy-matches a pattern, the relevant warning is injected into context.

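The design does not define "fuzzy-matches". As a hedged sketch, the helpers below treat it as a case-insensitive substring match, which is an assumption. Both function names are hypothetical, and the pattern fields are assumed to have already been extracted from the JSON file.

```bash
# Hypothetical matcher: "fuzzy" is taken to mean a case-insensitive substring
# match (an assumption; the design does not specify the algorithm).
# Args: batch title, batch_title_pattern.
title_matches_pattern() {
    local title pattern
    title=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    pattern=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')
    case "$title" in
        *"$pattern"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Print a warning line when the title matches one pattern record.
# Args: batch title, batch_title_pattern, failure_type, winning_fix.
failure_warning_for() {
    if title_matches_pattern "$1" "$2"; then
        echo "WARNING: past '$3' failures on similar batches; $4"
    fi
}
```

With the example record above, a batch titled "Batch 5: Integration Wiring" would receive the warning, while "Batch 2: Parser" would not.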
### Token Budget

- Budget: ~1500 tokens (~6000 chars)
- If assembled context exceeds budget, trim lowest-priority items first
- Always include: directives (mandatory), failure patterns (if matched), quality gate expectations
- Trim first: progress.txt, then git log, then context_refs file contents (truncate from 100 to the first 50 lines)

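One way to apply the budget is to walk the sections from highest to lowest priority and stop at the first one that no longer fits. This is a sketch under that assumption: `assemble_within_budget` is a hypothetical name, and the budget is measured in characters (~6000) as a stand-in for tokens.

```bash
# Hypothetical assembler: emit sections in priority order within a character budget.
# Args: budget in characters, the mandatory directives text,
#       then the remaining sections, highest priority first.
assemble_within_budget() {
    local budget="$1" directives="$2"
    shift 2
    local used section
    # Directives are mandatory, so they are emitted even if they exceed the budget.
    printf '%s\n' "$directives"
    used=$(( ${#directives} + 1 ))
    for section in "$@"; do
        # Stop at the first section that does not fit, so the lowest-priority
        # items are the ones dropped.
        if [ $((used + ${#section} + 1)) -gt "$budget" ]; then
            break
        fi
        printf '%s\n' "$section"
        used=$((used + ${#section} + 1))
    done
}
```

Stopping rather than skipping keeps strict priority order: a small low-priority item never displaces a larger, higher-priority one that was cut.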
## Batch 3: ast-grep Integration

**Goal:** Help agents write code that fits the existing codebase and catch semantic anti-patterns that grep cannot detect. Two modes: discovery (before PRD) and enforcement (in quality gate).

### Discovery Mode (prior-art-search.sh)

Run `ast-grep` once at plan start to extract the dominant codebase patterns:

- Error handling style (try/except with logging vs bare except)
- Test patterns (assert helpers, fixture usage, naming conventions)
- Function size distribution
- Import patterns

Results feed into the context assembler (Batch 2) as "Codebase style: [patterns]", so every batch agent writes code that fits the codebase without being told to.

### Enforcement Mode (quality-gate.sh)

Optional quality gate step that runs ast-grep rules derived from lesson files:

- Read lesson YAML where `pattern.type: semantic` and language has ast-grep support
- Auto-generate ast-grep rule files from lesson descriptions
- Run against changed files in the batch
- Warn (not fail) by default; pass `--strict-ast` to make it a hard gate

|
|
181
|
+
### Auto-Generated Rules from Lessons
|
|
182
|
+
|
|
183
|
+
Lessons with `pattern.type: semantic` that describe structural patterns (e.g., "async def body has no await") can be converted to ast-grep YAML rules. A `scripts/generate-ast-rules.sh` script reads lesson files and produces `scripts/patterns/*.yml`.
|
|
184
|
+
|
|
185
|
+
Not all semantic lessons can be converted — some require true AI understanding. The script attempts conversion and logs which lessons it could/couldn't handle.
|
|
186
|
+
|
|
187
|
+
### Built-in Pattern Files
|
|
188
|
+
|
|
189
|
+
5-10 patterns in `scripts/patterns/` for common structural anti-patterns:
|
|
190
|
+
|
|
191
|
+
```
|
|
192
|
+
scripts/patterns/
|
|
193
|
+
retry-loop.yml — retry without backoff
|
|
194
|
+
bare-except.yml — except without specific exception
|
|
195
|
+
async-no-await.yml — async def with no await in body
|
|
196
|
+
empty-catch.yml — catch block with no logging
|
|
197
|
+
unused-import.yml — imported but never referenced
|
|
198
|
+
```
|
|
199
|
+
|
|
200
|
+
### Graceful Degradation
|
|
201
|
+
|
|
202
|
+
If `ast-grep` is not installed:
|
|
203
|
+
- Discovery mode: skip with note ("install ast-grep for structural analysis")
|
|
204
|
+
- Enforcement mode: skip silently (grep-based lesson-check.sh still runs)
|
|
205
|
+
- No hard dependency — ast-grep enhances but is not required
|
|
206
|
+
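The fail-open behaviour could be sketched as follows. The function names and the `AST_GREP_BIN` override are hypothetical (the override exists only so the check can be exercised on a machine without ast-grep), and the `echo` lines stand in for the real ast-grep invocations.

```bash
#!/usr/bin/env bash
# Hypothetical override; defaults to the real binary name.
AST_GREP_BIN="${AST_GREP_BIN:-ast-grep}"

have_ast_grep() {
  command -v "$AST_GREP_BIN" >/dev/null 2>&1
}

# Discovery mode: skip, but tell the user what they are missing.
discovery_mode() {
  if ! have_ast_grep; then
    echo "note: install ast-grep for structural analysis"
    return 0
  fi
  echo "running structural discovery"   # placeholder for real ast-grep calls
}

# Enforcement mode: skip silently; grep-based lesson-check.sh still runs.
enforcement_mode() {
  have_ast_grep || return 0
  echo "running ast-grep rules"         # placeholder for real ast-grep calls
}
```

Both functions return 0 when the tool is missing, so neither step can fail a plan on its own.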
|
|
207
|
+
## Batch 4: Team Mode with Decision Gate
|
|
208
|
+
|
|
209
|
+
**Goal:** Reduce total wall-clock time for plan execution while maintaining quality. Automatically select the optimal execution mode based on plan analysis.
|
|
210
|
+
|
|
211
|
+
### Decision Gate
|
|
212
|
+
|
|
213
|
+
Before any execution starts, `run-plan.sh` analyzes the plan and selects a mode:
|
|
214
|
+
|
|
215
|
+
```
|
|
216
|
+
run-plan.sh <plan>
|
|
217
|
+
|
|
|
218
|
+
v
|
|
219
|
+
analyze_plan_for_mode()
|
|
220
|
+
|-- Parse all batches: Files, context_refs, depends_on
|
|
221
|
+
|-- Build file-level dependency graph
|
|
222
|
+
|-- Compute parallelism score (0-100)
|
|
223
|
+
|-- Check: AGENT_TEAMS flag available?
|
|
224
|
+
|-- Check: available memory vs worker count
|
|
225
|
+
|
|
|
226
|
+
v
|
|
227
|
+
Decision:
|
|
228
|
+
score < 20 --> HEADLESS (sequential is optimal)
|
|
229
|
+
score 20-59 --> HEADLESS with advisory ("team mode would save ~Xmin")
|
|
230
|
+
score >= 60 + teams available + memory OK --> TEAM (parallel)
|
|
231
|
+
score >= 60 + teams unavailable --> HEADLESS with note
|
|
232
|
+
any + --mode override --> use override
|
|
233
|
+
```
|
|
234
|
+
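A minimal sketch of the decision table as a shell function. `select_mode`, its argument order, and `ADVISORY_MIN_SCORE` are hypothetical names. The threshold is treated as inclusive, following the "min score" wording in the routing configuration; that boundary choice is an assumption.

```bash
#!/usr/bin/env bash
PARALLEL_SCORE_THRESHOLD=60   # from the routing configuration
ADVISORY_MIN_SCORE=20         # hypothetical name for the diagram's lower bound

# select_mode SCORE TEAMS_AVAILABLE(0|1) MEMORY_OK(0|1) [OVERRIDE]
# Prints the chosen mode on stdout; notes and advisories go to stderr.
select_mode() {
  local score="$1" teams_available="$2" memory_ok="$3" override="${4:-}"
  if [[ -n "$override" ]]; then
    echo "$override"            # an explicit --mode always wins
    return 0
  fi
  if (( score >= PARALLEL_SCORE_THRESHOLD )); then
    if [[ "$teams_available" == 1 && "$memory_ok" == 1 ]]; then
      echo "team"
      return 0
    fi
    echo "note: team mode unavailable, running headless" >&2
  elif (( score >= ADVISORY_MIN_SCORE )); then
    echo "advisory: team mode might save time on this plan" >&2
  fi
  echo "headless"
}
```

For example, `select_mode 72 1 1` prints `team`, while `select_mode 72 0 1` prints `headless` and writes the note to stderr.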
|
|
235
|
+
### Parallelism Score Factors
|
|
236
|
+
|
|
237
|
+
- % of batches with zero file overlap with neighbors (+)
|
|
238
|
+
- Number of batches in first parallel group (+)
|
|
239
|
+
- Total file overlap across all batch pairs (-)
|
|
240
|
+
- Shared runtime hints: "starts server", "modifies DB" (-)
|
|
241
|
+
- Explicit `parallel_safe: true` in plan header (+20 bonus)
|
|
242
|
+
|
|
243
|
+
### Routing Plan (always shown)
|
|
244
|
+
|
|
245
|
+
```
|
|
246
|
+
=== Execution Mode Analysis ===
|
|
247
|
+
|
|
248
|
+
Plan: implementation-plan.md
|
|
249
|
+
Batches: 7 | Files touched: 31 | Avg overlap: 12%
|
|
250
|
+
|
|
251
|
+
Dependency graph:
|
|
252
|
+
B1 --> B2 --> B3
|
|
253
|
+
B1 --> B4 --> B5 --> B7
|
|
254
|
+
B6 -------> B7
|
|
255
|
+
|
|
256
|
+
Parallelism score: 72/100
|
|
257
|
+
+ 3 independent groups detected
|
|
258
|
+
+ Max parallel width: 3 (B3, B5, B6)
|
|
259
|
+
+ File overlap < 20% in parallel groups
|
|
260
|
+
- B2->B3 share 2 files (conservative: sequential)
|
|
261
|
+
|
|
262
|
+
Recommendation: TEAM MODE
|
|
263
|
+
Workers: 2 (21G available, 8G/worker threshold)
|
|
264
|
+
Est. wall time: 14min (vs 28min sequential)
|
|
265
|
+
Est. cost: $2.40 (vs $3.10 sequential)
|
|
266
|
+
|
|
267
|
+
Model routing:
|
|
268
|
+
B1: sonnet (implementation -- creates 4 files)
|
|
269
|
+
B2: sonnet (implementation -- modifies 3 files, adds tests)
|
|
270
|
+
B3: haiku (verification -- 0 creates, 5 run commands) [auto-escalate]
|
|
271
|
+
B4: sonnet (implementation -- creates 2 files)
|
|
272
|
+
B5: sonnet (implementation -- modifies + tests)
|
|
273
|
+
B6: haiku (wiring -- 0 new logic) [auto-escalate]
|
|
274
|
+
B7: haiku (verification -- pipeline trace only) [auto-escalate]
|
|
275
|
+
|
|
276
|
+
Speculative execution:
|
|
277
|
+
B2 starts while B1 gate runs (overlap: 0%)
|
|
278
|
+
B5 waits for B4 gate (overlap: 73%)
|
|
279
|
+
```
|
|
280
|
+
|
|
281
|
+
### Auto-Detect Parallelism
|
|
282
|
+
|
|
283
|
+
Build dependency graph from plan content — no `depends_on:` annotations required:
|
|
284
|
+
|
|
285
|
+
- `context_refs` in batch headers declare which files a batch reads from prior batches
|
|
286
|
+
- `Files:` sections declare which files a batch creates/modifies
|
|
287
|
+
- If batch B's context_refs don't include any of batch A's output files, they're independent
|
|
288
|
+
- Fall back to sequential when analysis is ambiguous
|
|
289
|
+
|
|
290
|
+
Existing plans work in team mode with zero changes.
|
|
291
|
+
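The independence test can be sketched in a few lines, assuming each batch's `Files:` entries and `context_refs` have already been parsed into space-separated strings. The helper name `batches_independent` is hypothetical, and paths containing spaces or glob characters are not handled.

```bash
#!/usr/bin/env bash
# batches_independent "A_OUTPUT_FILES" "B_CONTEXT_REFS"
# Returns 0 when none of batch A's output files appear in batch B's
# context_refs, 1 as soon as a shared path is found.
batches_independent() {
  local a_files="$1" b_refs="$2" f r
  for f in $a_files; do        # unquoted on purpose: split on whitespace
    for r in $b_refs; do
      if [[ "$f" == "$r" ]]; then
        return 1
      fi
    done
  done
  return 0
}

if batches_independent "src/api.py src/models.py" "src/cli.py"; then
  echo "no shared files: batches may run in parallel"
fi
```

A real implementation would also need the ambiguous case from the list above, falling back to sequential whenever a batch's file lists cannot be parsed.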
|
|
292
|
+
### Team Execution Architecture
|
|
293
|
+
|
|
294
|
+
- **Team lead:** owns task list, quality gates, merge queue
|
|
295
|
+
- **N workers:** each gets isolated git worktree, claims batches, executes
|
|
296
|
+
- **Progressive merge queue:** each batch merges to main immediately after gate pass (keeps divergence small)
|
|
297
|
+
- **Speculative execution:** start the next batch while the previous batch's gate runs, provided their file overlap is below `SPECULATE_MAX_OVERLAP`. Abort the speculative batch if the gate fails.
|
|
298
|
+
- **Model routing with auto-escalation:** haiku batches that fail are retried on sonnet; sonnet failures escalate to opus
|
|
299
|
+
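The escalation ladder is small enough to express as a lookup. This is a sketch with a hypothetical helper name, `next_model`; it returns non-zero when there is nothing left to escalate to.

```bash
#!/usr/bin/env bash
# next_model CURRENT_MODEL
# Prints the model to retry on after a failure (haiku -> sonnet -> opus).
next_model() {
  case "$1" in
    haiku)  echo "sonnet" ;;
    sonnet) echo "opus" ;;
    *)      return 1 ;;   # opus or an unknown model: no further escalation
  esac
}

model="haiku"
if retry_model="$(next_model "$model")"; then
  echo "ESCALATE: failed on $model, retrying on $retry_model"
fi
```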
|
|
300
|
+
### Routing Configuration (`scripts/lib/run-plan-routing.sh`)
|
|
301
|
+
|
|
302
|
+
```bash
|
|
303
|
+
# Parallelism thresholds
|
|
304
|
+
PARALLEL_SCORE_THRESHOLD=60 # min score for team mode recommendation
|
|
305
|
+
SPECULATE_MAX_OVERLAP=20 # max file overlap % for speculative execution
|
|
306
|
+
|
|
307
|
+
# Model routing (batch classification --> model)
|
|
308
|
+
MODEL_IMPLEMENTATION="sonnet" # creates/modifies code files
|
|
309
|
+
MODEL_VERIFICATION="haiku" # only run/verify commands
|
|
310
|
+
MODEL_ARCHITECTURE="opus" # "design" or "architecture" in title
|
|
311
|
+
MODEL_ESCALATE_ON_FAIL=true # haiku-->sonnet-->opus on retry
|
|
312
|
+
|
|
313
|
+
# Resource limits
|
|
314
|
+
WORKER_MEM_THRESHOLD_GB=8 # min GB available per worker
|
|
315
|
+
MAX_WORKERS=3 # hard cap regardless of memory
|
|
316
|
+
```
|
|
317
|
+
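One way the resource limits might combine into a worker count is shown below. `compute_workers` is a hypothetical helper that takes available memory in whole gigabytes and uses integer division.

```bash
#!/usr/bin/env bash
WORKER_MEM_THRESHOLD_GB=8   # min GB available per worker
MAX_WORKERS=3               # hard cap regardless of memory

# compute_workers AVAILABLE_GB
# Prints how many workers fit in memory, capped at MAX_WORKERS.
compute_workers() {
  local avail_gb="$1"
  local n=$(( avail_gb / WORKER_MEM_THRESHOLD_GB ))
  if (( n > MAX_WORKERS )); then
    n=$MAX_WORKERS
  fi
  echo "$n"
}

compute_workers 21   # prints 2
```

With 21G available and an 8G threshold this yields 2 workers, which agrees with the sample routing plan. A result below 2 means parallel execution is not possible.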
|
|
318
|
+
### Override Escape Hatches
|
|
319
|
+
|
|
320
|
+
- `--mode headless` — force sequential regardless of score
|
|
321
|
+
- `--mode team` — force team regardless of score
|
|
322
|
+
- `--workers N` — override worker count
|
|
323
|
+
- `--model-override B3=opus` — force specific model for a batch
|
|
324
|
+
- `--no-speculate` — disable speculative execution
|
|
325
|
+
- `--sequential-after B4` — parallel until B4, then sequential
|
|
326
|
+
|
|
327
|
+
### Decision Log (`logs/routing-decisions.log`)
|
|
328
|
+
|
|
329
|
+
Every decision is logged with a timestamp and the reasoning behind it:
|
|
330
|
+
|
|
331
|
+
```
|
|
332
|
+
[12:03:14] MODE: team (score=72, threshold=60)
|
|
333
|
+
[12:03:14] PARALLEL: B2,B4 -- overlap=0 files, both depend only on B1
|
|
334
|
+
[12:03:14] MODEL: B3-->haiku -- 0 create/modify, 5 run commands, confidence=85%
|
|
335
|
+
[12:05:22] SPECULATE: B3 starting while B2 gate runs -- overlap 0%
|
|
336
|
+
[12:05:45] GATE_PASS: B2 (224-->231 tests), merging worktree
|
|
337
|
+
[12:05:48] MERGE: B2 --> main, 3 files, 0 conflicts
|
|
338
|
+
[12:06:01] SPECULATE_OK: B3 confirmed
|
|
339
|
+
[12:08:30] ESCALATE: B6 failed on haiku, retrying on sonnet
|
|
340
|
+
```
|
|
341
|
+
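A logging helper that produces lines in this shape might look like the sketch below. The name `log_decision` and the `ROUTING_LOG` variable are assumptions; only the log path and line format come from the design.

```bash
#!/usr/bin/env bash
# Hypothetical variable; the default matches the documented log location.
ROUTING_LOG="${ROUTING_LOG:-logs/routing-decisions.log}"

# log_decision KIND MESSAGE...
# Appends "[HH:MM:SS] KIND: MESSAGE" to the routing log.
log_decision() {
  local kind="$1"
  shift
  mkdir -p "$(dirname "$ROUTING_LOG")"
  printf '[%s] %s: %s\n' "$(date +%H:%M:%S)" "$kind" "$*" >> "$ROUTING_LOG"
}
```

A call such as `log_decision MODE "team (score=72, threshold=60)"` appends one line in the format of the first sample entry.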
|
|
342
|
+
### Where Team Mode Falls Back to Headless
|
|
343
|
+
|
|
344
|
+
- Parallelism score < 60 (tightly coupled batches)
|
|
345
|
+
- Shared runtime state detected (service ports, DB migrations)
|
|
346
|
+
- Plan is concern-batched (all impl then all tests)
|
|
347
|
+
- Available memory < 2 x `WORKER_MEM_THRESHOLD_GB` (too little for two workers)
|
|
348
|
+
- `CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS` flag not set
|
|
349
|
+
|
|
350
|
+
### Integration with pipeline-status.sh
|
|
351
|
+
|
|
352
|
+
After execution, pipeline-status.sh shows routing decisions alongside results:
|
|
353
|
+
|
|
354
|
+
```
|
|
355
|
+
Batch 3: haiku --> PASSED (22s, 8 tests added)
|
|
356
|
+
Batch 4: sonnet --> PASSED (180s, 15 tests added)
|
|
357
|
+
Batch 6: haiku-->sonnet (escalated) --> PASSED (45s, 3 tests added)
|
|
358
|
+
Total: 14min wall, $2.38 cost, 2 workers
|
|
359
|
+
```
|
|
360
|
+
|
|
361
|
+
### Writing-Plans Integration
|
|
362
|
+
|
|
363
|
+
The writing-plans skill should assess parallelism when creating plans:
|
|
364
|
+
- Add `parallel_safe: true/false` to plan header
|
|
365
|
+
- Add `depends_on: [batch-N]` hints to batch headers when dependencies exist
|
|
366
|
+
- Design batches for independence when possible (different files per batch)
|
|
367
|
+
|
|
368
|
+
## Batch 5: Parallel Patch Sampling
|
|
369
|
+
|
|
370
|
+
**Goal:** Maximize the probability that a batch succeeds, especially for hard batches. Improve success probability over time through outcome learning.
|
|
371
|
+
|
|
372
|
+
### When Sampling Triggers
|
|
373
|
+
|
|
374
|
+
Sampling does not run on every batch. It triggers only when:
|
|
375
|
+
- Batch marked `critical: true` in plan header
|
|
376
|
+
- Batch failed its first attempt (sampling replaces naive retry)
|
|
377
|
+
- User passes `--sample N` flag explicitly
|
|
378
|
+
|
|
379
|
+
### Tournament Architecture
|
|
380
|
+
|
|
381
|
+
```
|
|
382
|
+
Batch fails first attempt (or marked critical)
|
|
383
|
+
|
|
|
384
|
+
v
|
|
385
|
+
Round 1: N candidates in parallel (default: 3)
|
|
386
|
+
|-- Candidate 1: vanilla prompt
|
|
387
|
+
|-- Candidate 2: prompt + failure digest + "try a different approach"
|
|
388
|
+
|-- Candidate 3: prompt + failure digest + "minimal change only"
|
|
389
|
+
|
|
|
390
|
+
Each in isolated worktree
|
|
391
|
+
|
|
|
392
|
+
v
|
|
393
|
+
Score each candidate:
|
|
394
|
+
|-- Quality gate pass/fail (mandatory -- eliminates failures)
|
|
395
|
+
|-- Test count (more = better)
|
|
396
|
+
|-- Diff size (smaller = better among passers)
|
|
397
|
+
|-- Lint warnings (fewer = better)
|
|
398
|
+
|-- Lesson-check violations (penalty: -200 each)
|
|
399
|
+
|-- ast-grep violations (penalty: -100 each)
|
|
400
|
+
|
|
|
401
|
+
v
|
|
402
|
+
Decision:
|
|
403
|
+
Clear winner (1 passes, others don't) --> use it
|
|
404
|
+
Multiple passers --> highest score wins
|
|
405
|
+
No winner OR close scores --> Round 2: Synthesis
|
|
406
|
+
|
|
|
407
|
+
v
|
|
408
|
+
Round 2: Synthesis agent
|
|
409
|
+
Reads: all N attempts + their gate results + their diffs
|
|
410
|
+
Task: "Candidate 1 had best architecture but failed test X.
|
|
411
|
+
Candidate 3 passed but duplicated 40 lines.
|
|
412
|
+
Synthesize: use C1's approach, fix using C3's insight."
|
|
413
|
+
|
|
|
414
|
+
v
|
|
415
|
+
Score synthesis --> if it passes, use it. If not, fall back to the best Round 1 candidate.
|
|
416
|
+
```
|
|
417
|
+
|
|
418
|
+
### Scoring Function
|
|
419
|
+
|
|
420
|
+
```bash
|
|
421
|
+
score_candidate() {
|
|
422
|
+
local gate_passed="$1" # 0 or 1
|
|
423
|
+
local test_count="$2" # integer
|
|
424
|
+
local diff_lines="$3" # integer
|
|
425
|
+
local lint_warnings="$4" # integer
|
|
426
|
+
local lesson_violations="$5" # integer
|
|
427
|
+
local ast_violations="$6" # integer
|
|
428
|
+
|
|
429
|
+
# Gate pass is mandatory
|
|
430
|
+
if [[ "$gate_passed" -ne 1 ]]; then
|
|
431
|
+
echo 0; return
|
|
432
|
+
fi
|
|
433
|
+
|
|
434
|
+
# Weighted score: tests most important, quality penalties heavy
|
|
435
|
+
local score=$(( (test_count * 10) + (10000 / (diff_lines + 1)) + (1000 / (lint_warnings + 1)) - (lesson_violations * 200) - (ast_violations * 100) ))
|
|
436
|
+
echo "$score"
|
|
437
|
+
}
|
|
438
|
+
```
|
|
439
|
+
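To make the weighting concrete, here is the same function run on a few made-up inputs (the numbers are illustrative, not measured). Bash arithmetic is integer-only, so each division truncates.

```bash
#!/usr/bin/env bash
# Same scoring function as in the design, repeated so the example runs standalone.
score_candidate() {
  local gate_passed="$1" test_count="$2" diff_lines="$3"
  local lint_warnings="$4" lesson_violations="$5" ast_violations="$6"
  if [[ "$gate_passed" -ne 1 ]]; then
    echo 0; return
  fi
  echo $(( (test_count * 10) + (10000 / (diff_lines + 1)) \
         + (1000 / (lint_warnings + 1)) \
         - (lesson_violations * 200) - (ast_violations * 100) ))
}

# Passing, 12 tests, 149 diff lines, clean: 120 + 66 + 1000
score_candidate 1 12 149 0 0 0    # prints 1186
# Same candidate with one lesson-check violation: 1186 - 200
score_candidate 1 12 149 0 1 0    # prints 986
# Gate failure: everything else is ignored
score_candidate 0 50 10 0 0 0     # prints 0
# Passing but messy: 0 + 1 + 100 - 1200
score_candidate 1 0 9999 9 6 0    # prints -1099
```

The last case shows that a passing candidate can score below the 0 assigned to a gate failure, so a caller would presumably filter on gate status first and only then compare scores.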
|
|
440
|
+
### Prompt Diversity: Batch-Type-Aware + Learned
|
|
441
|
+
|
|
442
|
+
**Batch type classification** from plan content:
|
|
443
|
+
|
|
444
|
+
| Batch type | Likely failure | Best prompt variants |
|
|
445
|
+
|------------|---------------|---------------------|
|
|
446
|
+
| New file creation | Missing imports, incomplete API | vanilla, "check all imports", "write tests first" |
|
|
447
|
+
| Refactoring | Breaking existing tests | vanilla, "minimal change", "run tests after each edit" |
|
|
448
|
+
| Integration wiring | Missing connections | vanilla, "trace end-to-end", "check every import/export" |
|
|
449
|
+
| Test-only | Flaky assertions, wrong mocks | vanilla, "use real objects not mocks", "edge cases only" |
|
|
450
|
+
|
|
451
|
+
**Learned from outcomes** (`logs/sampling-outcomes.json`):
|
|
452
|
+
|
|
453
|
+
```json
|
|
454
|
+
[
|
|
455
|
+
{
|
|
456
|
+
"batch_type": "refactoring",
|
|
457
|
+
"prompt_variant": "minimal-change",
|
|
458
|
+
"won": true,
|
|
459
|
+
"score": 2450,
|
|
460
|
+
"timestamp": "2026-02-21T12:05:00Z"
|
|
461
|
+
}
|
|
462
|
+
]
|
|
463
|
+
```
|
|
464
|
+
|
|
465
|
+
Over 10+ runs, patterns emerge. Candidate slot allocation:
|
|
466
|
+
- 1 slot always vanilla (baseline)
|
|
467
|
+
- Remaining slots allocated to historically winning variants for this batch type
|
|
468
|
+
- 1 slot always experimental (random variant for exploration)
|
|
469
|
+
|
|
470
|
+
This is a simple multi-armed bandit: exploit what works, explore 1 slot.
|
|
471
|
+
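The slot allocation could be sketched as below. `allocate_slots` is a hypothetical helper; it assumes the caller has already ranked the historically winning variants for the batch type from `logs/sampling-outcomes.json` and has picked the experimental variant (for instance at random).

```bash
#!/usr/bin/env bash
# allocate_slots N EXPERIMENTAL_VARIANT [WINNING_VARIANT...]
# Prints one prompt variant per line: vanilla first, then as many ranked
# winners as fit, then the experimental variant in the last slot.
allocate_slots() {
  local n="$1" experimental="$2"
  shift 2
  local slots=("vanilla")
  local exploit=$(( n - 2 )) variant
  for variant in "$@"; do
    if (( exploit <= 0 )); then
      break
    fi
    slots+=("$variant")
    exploit=$(( exploit - 1 ))
  done
  if (( n >= 2 )); then
    slots+=("$experimental")
  fi
  printf '%s\n' "${slots[@]}"
}

allocate_slots 3 "write-tests-first" "minimal-change" "run-tests-after-each-edit"
```

With the default of 3 candidates there is exactly one exploit slot, so only the top-ranked winner is used. Raising the count toward `SAMPLE_MAX_COUNT` widens the exploit portion while exploration stays at one slot.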
|
|
472
|
+
### Integration with Team Mode
|
|
473
|
+
|
|
474
|
+
- In headless mode: candidates run sequentially (N `claude -p` calls)
|
|
475
|
+
- In team mode: candidates run as parallel workers on same batch (natural fit)
|
|
476
|
+
- Decision gate factors this in: worker count = sample count for sampled batches
|
|
477
|
+
|
|
478
|
+
### Resource Guards
|
|
479
|
+
|
|
480
|
+
- Memory: don't sample if available memory < N x 4G
|
|
481
|
+
- Cost: log estimated cost in routing plan ("Sampling B4: ~$1.20 for 3 candidates vs $0.40 single")
|
|
482
|
+
- Time: sampling adds ~50% wall time per batch when candidates run in parallel, or multiplies it by N when they run sequentially
|
|
483
|
+
|
|
484
|
+
### Configuration
|
|
485
|
+
|
|
486
|
+
```bash
|
|
487
|
+
# In run-plan-routing.sh
|
|
488
|
+
SAMPLE_ON_RETRY=true # auto-sample when batch fails first attempt
|
|
489
|
+
SAMPLE_ON_CRITICAL=true # auto-sample for critical: true batches
|
|
490
|
+
SAMPLE_COUNT=3 # default candidate count
|
|
491
|
+
SAMPLE_MAX_COUNT=5 # hard cap
|
|
492
|
+
SAMPLE_MIN_MEMORY_PER_GB=4 # min available memory per candidate, in GB
|
|
493
|
+
```
|
|
494
|
+
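Combined with the memory rule from Resource Guards, these settings might be applied as in the sketch below. `can_sample` is a hypothetical helper, and clamping the requested count to `SAMPLE_MAX_COUNT` before the memory check is an assumption.

```bash
#!/usr/bin/env bash
SAMPLE_COUNT=3                # default candidate count
SAMPLE_MAX_COUNT=5            # hard cap
SAMPLE_MIN_MEMORY_PER_GB=4    # GB of available memory required per candidate

# can_sample AVAILABLE_GB [CANDIDATES]
# Succeeds when there is enough memory for the (capped) candidate count.
can_sample() {
  local avail_gb="$1" n="${2:-$SAMPLE_COUNT}"
  if (( n > SAMPLE_MAX_COUNT )); then
    n=$SAMPLE_MAX_COUNT
  fi
  (( avail_gb >= n * SAMPLE_MIN_MEMORY_PER_GB ))
}

if can_sample 12; then
  echo "sampling with $SAMPLE_COUNT candidates"
else
  echo "not enough memory: falling back to a single retry"
fi
```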
|
|
495
|
+
### Override Flags
|
|
496
|
+
|
|
497
|
+
- `--sample N` — force sampling for all batches with N candidates
|
|
498
|
+
- `--sample-batch B4=5` — sample only batch 4 with 5 candidates
|
|
499
|
+
- `--no-sample` — disable all sampling
|
|
500
|
+
|
|
501
|
+
## Dependencies
|
|
502
|
+
|
|
503
|
+
- **Batch 1** has no dependencies (fixes + lesson files)
|
|
504
|
+
- **Batch 2** depends on Batch 1 (failure patterns reference lesson IDs)
|
|
505
|
+
- **Batch 3** depends on Batch 2 (ast-grep feeds into context assembler) + optional install: `ast-grep`
|
|
506
|
+
- **Batch 4** depends on Batch 2 (context assembler) + Batch 3 (ast-grep scoring) + requires `CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1`
|
|
507
|
+
- **Batch 5** depends on Batch 4 (team mode for parallel candidates) + Batch 3 (ast-grep in scoring)
|
|
508
|
+
|
|
509
|
+
## Success Metrics
|
|
510
|
+
|
|
511
|
+
1. All 49 lessons in toolkit with YAML frontmatter, no project-specific references
|
|
512
|
+
2. Empty batches detected and skipped (0 wasted API calls)
|
|
513
|
+
3. Bash test suites detected by quality gate
|
|
514
|
+
4. Context assembler reduces agent "discovery" time (measurable via batch duration comparison)
|
|
515
|
+
5. ast-grep catches at least 3 anti-patterns that grep cannot
|
|
516
|
+
6. Team mode parallelism score correctly predicts speedup within 20%
|
|
517
|
+
7. Patch sampling improves retry success rate vs naive retry (track in sampling-outcomes.json)
|
|
518
|
+
|
|
519
|
+
## Risk Mitigations
|
|
520
|
+
|
|
521
|
+
- **Lesson volume:** 43 new files is mechanical work — each follows the template. Use subagents for parallel writing.
|
|
522
|
+
- **ast-grep availability:** All ast-grep features fail-open. The toolkit works without it installed.
|
|
523
|
+
- **Agent teams instability:** Team mode falls back to headless. Decision gate prevents team mode when conditions aren't right.
|
|
524
|
+
- **Sampling cost:** Resource guards prevent sampling when memory is low. Cost shown in routing plan before execution.
|
|
525
|
+
- **Prompt diversity convergence:** Multi-armed bandit prevents getting stuck on one variant. Always explores 1 slot.
|
|
526
|
+
|
|
527
|
+
## New Files (estimated)
|
|
528
|
+
|
|
529
|
+
| Category | Count | Location |
|
|
530
|
+
|----------|-------|----------|
|
|
531
|
+
| Lesson files | 43 | `docs/lessons/0007-*.md` through `0049-*.md` |
|
|
532
|
+
| Lesson summary | 1 | `docs/lessons/SUMMARY.md` (rewrite) |
|
|
533
|
+
| Lib scripts | 5 | `scripts/lib/run-plan-context.sh`, `run-plan-routing.sh`, `run-plan-team.sh`, `run-plan-scoring.sh`, `generate-ast-rules.sh` |
|
|
534
|
+
| Pattern files | 5-10 | `scripts/patterns/*.yml` |
|
|
535
|
+
| Config | 1 | Routing defaults in `run-plan-routing.sh` |
|
|
536
|
+
| Test files | 8-10 | `scripts/tests/test-*.sh` for each new lib |
|
|
537
|
+
| Logs | 3 | `logs/failure-patterns.json`, `logs/routing-decisions.log`, `logs/sampling-outcomes.json` |
|