autonomous-coding-toolkit 1.0.0
- package/.claude-plugin/marketplace.json +22 -0
- package/.claude-plugin/plugin.json +13 -0
- package/LICENSE +21 -0
- package/Makefile +21 -0
- package/README.md +140 -0
- package/SECURITY.md +28 -0
- package/agents/bash-expert.md +113 -0
- package/agents/dependency-auditor.md +138 -0
- package/agents/integration-tester.md +120 -0
- package/agents/lesson-scanner.md +149 -0
- package/agents/python-expert.md +179 -0
- package/agents/service-monitor.md +141 -0
- package/agents/shell-expert.md +147 -0
- package/benchmarks/runner.sh +147 -0
- package/benchmarks/tasks/01-rest-endpoint/rubric.sh +29 -0
- package/benchmarks/tasks/01-rest-endpoint/task.md +17 -0
- package/benchmarks/tasks/02-refactor-module/task.md +8 -0
- package/benchmarks/tasks/03-fix-integration-bug/task.md +8 -0
- package/benchmarks/tasks/04-add-test-coverage/task.md +8 -0
- package/benchmarks/tasks/05-multi-file-feature/task.md +8 -0
- package/bin/act.js +238 -0
- package/commands/autocode.md +6 -0
- package/commands/cancel-ralph.md +18 -0
- package/commands/code-factory.md +53 -0
- package/commands/create-prd.md +55 -0
- package/commands/ralph-loop.md +18 -0
- package/commands/run-plan.md +117 -0
- package/commands/submit-lesson.md +122 -0
- package/docs/ARCHITECTURE.md +630 -0
- package/docs/CONTRIBUTING.md +125 -0
- package/docs/lessons/0001-bare-exception-swallowing.md +34 -0
- package/docs/lessons/0002-async-def-without-await.md +28 -0
- package/docs/lessons/0003-create-task-without-callback.md +28 -0
- package/docs/lessons/0004-hardcoded-test-counts.md +28 -0
- package/docs/lessons/0005-sqlite-without-closing.md +33 -0
- package/docs/lessons/0006-venv-pip-path.md +27 -0
- package/docs/lessons/0007-runner-state-self-rejection.md +35 -0
- package/docs/lessons/0008-quality-gate-blind-spot.md +33 -0
- package/docs/lessons/0009-parser-overcount-empty-batches.md +36 -0
- package/docs/lessons/0010-local-outside-function-bash.md +33 -0
- package/docs/lessons/0011-batch-tests-for-unimplemented-code.md +36 -0
- package/docs/lessons/0012-api-markdown-unescaped-chars.md +33 -0
- package/docs/lessons/0013-export-prefix-env-parsing.md +33 -0
- package/docs/lessons/0014-decorator-registry-import-side-effect.md +43 -0
- package/docs/lessons/0015-frontend-backend-schema-drift.md +43 -0
- package/docs/lessons/0016-event-driven-cold-start-seeding.md +44 -0
- package/docs/lessons/0017-copy-paste-logic-diverges.md +43 -0
- package/docs/lessons/0018-layer-passes-pipeline-broken.md +45 -0
- package/docs/lessons/0019-systemd-envfile-ignores-export.md +41 -0
- package/docs/lessons/0020-persist-state-incrementally.md +44 -0
- package/docs/lessons/0021-dual-axis-testing.md +48 -0
- package/docs/lessons/0022-jsx-factory-shadowing.md +43 -0
- package/docs/lessons/0023-static-analysis-spiral.md +51 -0
- package/docs/lessons/0024-shared-pipeline-implementation.md +55 -0
- package/docs/lessons/0025-defense-in-depth-all-entry-points.md +65 -0
- package/docs/lessons/0026-linter-no-rules-false-enforcement.md +54 -0
- package/docs/lessons/0027-jsx-silent-prop-drop.md +64 -0
- package/docs/lessons/0028-no-infrastructure-in-client-code.md +49 -0
- package/docs/lessons/0029-never-write-secrets-to-files.md +61 -0
- package/docs/lessons/0030-cache-merge-not-replace.md +62 -0
- package/docs/lessons/0031-verify-units-at-boundaries.md +66 -0
- package/docs/lessons/0032-module-lifecycle-subscribe-unsubscribe.md +89 -0
- package/docs/lessons/0033-async-iteration-mutable-snapshot.md +72 -0
- package/docs/lessons/0034-caller-missing-await-silent-discard.md +65 -0
- package/docs/lessons/0035-duplicate-registration-silent-overwrite.md +85 -0
- package/docs/lessons/0036-websocket-dirty-disconnect.md +33 -0
- package/docs/lessons/0037-parallel-agents-worktree-corruption.md +31 -0
- package/docs/lessons/0038-subscribe-no-stored-ref.md +36 -0
- package/docs/lessons/0039-fallback-or-default-hides-bugs.md +34 -0
- package/docs/lessons/0040-event-firehose-filter-first.md +36 -0
- package/docs/lessons/0041-ambiguous-base-dir-path-nesting.md +32 -0
- package/docs/lessons/0042-spec-compliance-insufficient.md +36 -0
- package/docs/lessons/0043-exact-count-extensible-collections.md +32 -0
- package/docs/lessons/0044-relative-file-deps-worktree.md +39 -0
- package/docs/lessons/0045-iterative-design-improvement.md +33 -0
- package/docs/lessons/0046-plan-assertion-math-bugs.md +38 -0
- package/docs/lessons/0047-pytest-single-threaded-default.md +37 -0
- package/docs/lessons/0048-integration-wiring-batch.md +40 -0
- package/docs/lessons/0049-ab-verification.md +41 -0
- package/docs/lessons/0050-editing-sourced-files-during-execution.md +33 -0
- package/docs/lessons/0051-infrastructure-fixes-cant-self-heal.md +30 -0
- package/docs/lessons/0052-uncommitted-changes-poison-quality-gates.md +31 -0
- package/docs/lessons/0053-jq-compact-flag-inconsistency.md +31 -0
- package/docs/lessons/0054-parser-matches-inside-code-blocks.md +30 -0
- package/docs/lessons/0055-agents-compensate-for-garbled-prompts.md +31 -0
- package/docs/lessons/0056-grep-count-exit-code-on-zero.md +42 -0
- package/docs/lessons/0057-new-artifacts-break-git-clean-gates.md +42 -0
- package/docs/lessons/0058-dead-config-keys-never-consumed.md +49 -0
- package/docs/lessons/0059-contract-test-shared-structures.md +53 -0
- package/docs/lessons/0060-set-e-silent-death-in-runners.md +53 -0
- package/docs/lessons/0061-context-injection-dirty-state.md +50 -0
- package/docs/lessons/0062-sibling-bug-neighborhood-scan.md +29 -0
- package/docs/lessons/0063-one-flag-two-lifetimes.md +31 -0
- package/docs/lessons/0064-test-passes-wrong-reason.md +31 -0
- package/docs/lessons/0065-pipefail-grep-count-double-output.md +39 -0
- package/docs/lessons/0066-local-keyword-outside-function.md +37 -0
- package/docs/lessons/0067-stdin-hang-non-interactive-shell.md +36 -0
- package/docs/lessons/0068-agent-builds-wrong-thing-correctly.md +31 -0
- package/docs/lessons/0069-plan-quality-dominates-execution.md +30 -0
- package/docs/lessons/0070-spec-echo-back-prevents-drift.md +31 -0
- package/docs/lessons/0071-positive-instructions-outperform-negative.md +30 -0
- package/docs/lessons/0072-lost-in-the-middle-context-placement.md +30 -0
- package/docs/lessons/0073-unscoped-lessons-cause-false-positives.md +30 -0
- package/docs/lessons/0074-stale-context-injection-wrong-batch.md +32 -0
- package/docs/lessons/0075-research-artifacts-must-persist.md +32 -0
- package/docs/lessons/0076-wrong-decomposition-contaminates-downstream.md +30 -0
- package/docs/lessons/0077-cherry-pick-merges-need-manual-resolution.md +30 -0
- package/docs/lessons/0078-static-review-without-live-test.md +30 -0
- package/docs/lessons/0079-integration-wiring-batch-required.md +32 -0
- package/docs/lessons/FRAMEWORK.md +161 -0
- package/docs/lessons/SUMMARY.md +201 -0
- package/docs/lessons/TEMPLATE.md +85 -0
- package/docs/plans/2026-02-21-code-factory-v2-design.md +204 -0
- package/docs/plans/2026-02-21-code-factory-v2-implementation-plan.md +2189 -0
- package/docs/plans/2026-02-21-code-factory-v2-phase4-design.md +537 -0
- package/docs/plans/2026-02-21-code-factory-v2-phase4-implementation-plan.md +2012 -0
- package/docs/plans/2026-02-21-hardening-pass-design.md +108 -0
- package/docs/plans/2026-02-21-hardening-pass-plan.md +1378 -0
- package/docs/plans/2026-02-21-mab-research-report.md +406 -0
- package/docs/plans/2026-02-21-marketplace-restructure-design.md +240 -0
- package/docs/plans/2026-02-21-marketplace-restructure-plan.md +832 -0
- package/docs/plans/2026-02-21-phase4-completion-plan.md +697 -0
- package/docs/plans/2026-02-21-validator-suite-design.md +148 -0
- package/docs/plans/2026-02-21-validator-suite-plan.md +540 -0
- package/docs/plans/2026-02-22-mab-research-round2.md +556 -0
- package/docs/plans/2026-02-22-mab-run-design.md +462 -0
- package/docs/plans/2026-02-22-mab-run-plan.md +2046 -0
- package/docs/plans/2026-02-22-operations-design-methodology-research.md +681 -0
- package/docs/plans/2026-02-22-research-agent-failure-taxonomy.md +532 -0
- package/docs/plans/2026-02-22-research-code-guideline-policies.md +886 -0
- package/docs/plans/2026-02-22-research-codebase-audit-refactoring.md +908 -0
- package/docs/plans/2026-02-22-research-coding-standards-documentation.md +541 -0
- package/docs/plans/2026-02-22-research-competitive-landscape.md +687 -0
- package/docs/plans/2026-02-22-research-comprehensive-testing.md +1076 -0
- package/docs/plans/2026-02-22-research-context-utilization.md +459 -0
- package/docs/plans/2026-02-22-research-cost-quality-tradeoff.md +548 -0
- package/docs/plans/2026-02-22-research-lesson-transferability.md +508 -0
- package/docs/plans/2026-02-22-research-multi-agent-coordination.md +312 -0
- package/docs/plans/2026-02-22-research-phase-integration.md +602 -0
- package/docs/plans/2026-02-22-research-plan-quality.md +428 -0
- package/docs/plans/2026-02-22-research-prompt-engineering.md +558 -0
- package/docs/plans/2026-02-22-research-unconventional-perspectives.md +528 -0
- package/docs/plans/2026-02-22-research-user-adoption.md +638 -0
- package/docs/plans/2026-02-22-research-verification-effectiveness.md +433 -0
- package/docs/plans/2026-02-23-agent-suite-design.md +299 -0
- package/docs/plans/2026-02-23-agent-suite-plan.md +578 -0
- package/docs/plans/2026-02-23-phase3-cost-infrastructure-design.md +148 -0
- package/docs/plans/2026-02-23-phase3-cost-infrastructure-plan.md +1062 -0
- package/docs/plans/2026-02-23-research-bash-expert-agent.md +543 -0
- package/docs/plans/2026-02-23-research-dependency-auditor-agent.md +564 -0
- package/docs/plans/2026-02-23-research-improving-existing-agents.md +503 -0
- package/docs/plans/2026-02-23-research-integration-tester-agent.md +454 -0
- package/docs/plans/2026-02-23-research-python-expert-agent.md +429 -0
- package/docs/plans/2026-02-23-research-service-monitor-agent.md +425 -0
- package/docs/plans/2026-02-23-research-shell-expert-agent.md +533 -0
- package/docs/plans/2026-02-23-roadmap-to-completion.md +530 -0
- package/docs/plans/2026-02-24-headless-module-split-design.md +98 -0
- package/docs/plans/2026-02-24-headless-module-split.md +443 -0
- package/docs/plans/2026-02-24-lesson-scope-metadata-design.md +228 -0
- package/docs/plans/2026-02-24-lesson-scope-metadata-plan.md +968 -0
- package/docs/plans/2026-02-24-npm-packaging-design.md +841 -0
- package/docs/plans/2026-02-24-npm-packaging-plan.md +1965 -0
- package/docs/plans/audit-findings.md +186 -0
- package/docs/telegram-notification-format.md +98 -0
- package/examples/example-plan.md +51 -0
- package/examples/example-prd.json +72 -0
- package/examples/example-roadmap.md +33 -0
- package/examples/quickstart-plan.md +63 -0
- package/hooks/hooks.json +26 -0
- package/hooks/setup-symlinks.sh +48 -0
- package/hooks/stop-hook.sh +135 -0
- package/package.json +47 -0
- package/policies/bash.md +71 -0
- package/policies/python.md +71 -0
- package/policies/testing.md +61 -0
- package/policies/universal.md +60 -0
- package/scripts/analyze-report.sh +97 -0
- package/scripts/architecture-map.sh +145 -0
- package/scripts/auto-compound.sh +273 -0
- package/scripts/batch-audit.sh +42 -0
- package/scripts/batch-test.sh +101 -0
- package/scripts/entropy-audit.sh +221 -0
- package/scripts/failure-digest.sh +51 -0
- package/scripts/generate-ast-rules.sh +96 -0
- package/scripts/init.sh +112 -0
- package/scripts/lesson-check.sh +428 -0
- package/scripts/lib/common.sh +61 -0
- package/scripts/lib/cost-tracking.sh +153 -0
- package/scripts/lib/ollama.sh +60 -0
- package/scripts/lib/progress-writer.sh +128 -0
- package/scripts/lib/run-plan-context.sh +215 -0
- package/scripts/lib/run-plan-echo-back.sh +231 -0
- package/scripts/lib/run-plan-headless.sh +396 -0
- package/scripts/lib/run-plan-notify.sh +57 -0
- package/scripts/lib/run-plan-parser.sh +81 -0
- package/scripts/lib/run-plan-prompt.sh +215 -0
- package/scripts/lib/run-plan-quality-gate.sh +132 -0
- package/scripts/lib/run-plan-routing.sh +315 -0
- package/scripts/lib/run-plan-sampling.sh +170 -0
- package/scripts/lib/run-plan-scoring.sh +146 -0
- package/scripts/lib/run-plan-state.sh +142 -0
- package/scripts/lib/run-plan-team.sh +199 -0
- package/scripts/lib/telegram.sh +54 -0
- package/scripts/lib/thompson-sampling.sh +176 -0
- package/scripts/license-check.sh +74 -0
- package/scripts/mab-run.sh +575 -0
- package/scripts/module-size-check.sh +146 -0
- package/scripts/patterns/async-no-await.yml +5 -0
- package/scripts/patterns/bare-except.yml +6 -0
- package/scripts/patterns/empty-catch.yml +6 -0
- package/scripts/patterns/hardcoded-localhost.yml +9 -0
- package/scripts/patterns/retry-loop-no-backoff.yml +12 -0
- package/scripts/pipeline-status.sh +197 -0
- package/scripts/policy-check.sh +226 -0
- package/scripts/prior-art-search.sh +133 -0
- package/scripts/promote-mab-lessons.sh +126 -0
- package/scripts/prompts/agent-a-superpowers.md +29 -0
- package/scripts/prompts/agent-b-ralph.md +29 -0
- package/scripts/prompts/judge-agent.md +61 -0
- package/scripts/prompts/planner-agent.md +44 -0
- package/scripts/pull-community-lessons.sh +90 -0
- package/scripts/quality-gate.sh +266 -0
- package/scripts/research-gate.sh +90 -0
- package/scripts/run-plan.sh +329 -0
- package/scripts/scope-infer.sh +159 -0
- package/scripts/setup-ralph-loop.sh +155 -0
- package/scripts/telemetry.sh +230 -0
- package/scripts/tests/run-all-tests.sh +52 -0
- package/scripts/tests/test-act-cli.sh +46 -0
- package/scripts/tests/test-agents-md.sh +87 -0
- package/scripts/tests/test-analyze-report.sh +114 -0
- package/scripts/tests/test-architecture-map.sh +89 -0
- package/scripts/tests/test-auto-compound.sh +169 -0
- package/scripts/tests/test-batch-test.sh +65 -0
- package/scripts/tests/test-benchmark-runner.sh +25 -0
- package/scripts/tests/test-common.sh +168 -0
- package/scripts/tests/test-cost-tracking.sh +158 -0
- package/scripts/tests/test-echo-back.sh +180 -0
- package/scripts/tests/test-entropy-audit.sh +146 -0
- package/scripts/tests/test-failure-digest.sh +66 -0
- package/scripts/tests/test-generate-ast-rules.sh +145 -0
- package/scripts/tests/test-helpers.sh +82 -0
- package/scripts/tests/test-init.sh +47 -0
- package/scripts/tests/test-lesson-check.sh +278 -0
- package/scripts/tests/test-lesson-local.sh +55 -0
- package/scripts/tests/test-license-check.sh +109 -0
- package/scripts/tests/test-mab-run.sh +182 -0
- package/scripts/tests/test-ollama-lib.sh +49 -0
- package/scripts/tests/test-ollama.sh +60 -0
- package/scripts/tests/test-pipeline-status.sh +198 -0
- package/scripts/tests/test-policy-check.sh +124 -0
- package/scripts/tests/test-prior-art-search.sh +96 -0
- package/scripts/tests/test-progress-writer.sh +140 -0
- package/scripts/tests/test-promote-mab-lessons.sh +110 -0
- package/scripts/tests/test-pull-community-lessons.sh +149 -0
- package/scripts/tests/test-quality-gate.sh +241 -0
- package/scripts/tests/test-research-gate.sh +132 -0
- package/scripts/tests/test-run-plan-cli.sh +86 -0
- package/scripts/tests/test-run-plan-context.sh +305 -0
- package/scripts/tests/test-run-plan-e2e.sh +153 -0
- package/scripts/tests/test-run-plan-headless.sh +424 -0
- package/scripts/tests/test-run-plan-notify.sh +124 -0
- package/scripts/tests/test-run-plan-parser.sh +217 -0
- package/scripts/tests/test-run-plan-prompt.sh +254 -0
- package/scripts/tests/test-run-plan-quality-gate.sh +222 -0
- package/scripts/tests/test-run-plan-routing.sh +178 -0
- package/scripts/tests/test-run-plan-scoring.sh +148 -0
- package/scripts/tests/test-run-plan-state.sh +261 -0
- package/scripts/tests/test-run-plan-team.sh +157 -0
- package/scripts/tests/test-scope-infer.sh +150 -0
- package/scripts/tests/test-setup-ralph-loop.sh +63 -0
- package/scripts/tests/test-telegram-env.sh +38 -0
- package/scripts/tests/test-telegram.sh +121 -0
- package/scripts/tests/test-telemetry.sh +46 -0
- package/scripts/tests/test-thompson-sampling.sh +139 -0
- package/scripts/tests/test-validate-all.sh +60 -0
- package/scripts/tests/test-validate-commands.sh +89 -0
- package/scripts/tests/test-validate-hooks.sh +98 -0
- package/scripts/tests/test-validate-lessons.sh +150 -0
- package/scripts/tests/test-validate-plan-quality.sh +235 -0
- package/scripts/tests/test-validate-plans.sh +187 -0
- package/scripts/tests/test-validate-plugin.sh +106 -0
- package/scripts/tests/test-validate-prd.sh +184 -0
- package/scripts/tests/test-validate-skills.sh +134 -0
- package/scripts/validate-all.sh +57 -0
- package/scripts/validate-commands.sh +67 -0
- package/scripts/validate-hooks.sh +89 -0
- package/scripts/validate-lessons.sh +98 -0
- package/scripts/validate-plan-quality.sh +369 -0
- package/scripts/validate-plans.sh +120 -0
- package/scripts/validate-plugin.sh +86 -0
- package/scripts/validate-policies.sh +42 -0
- package/scripts/validate-prd.sh +118 -0
- package/scripts/validate-skills.sh +96 -0
- package/skills/autocode/SKILL.md +285 -0
- package/skills/autocode/ab-verification.md +51 -0
- package/skills/autocode/code-quality-standards.md +37 -0
- package/skills/autocode/competitive-mode.md +364 -0
- package/skills/brainstorming/SKILL.md +97 -0
- package/skills/capture-lesson/SKILL.md +187 -0
- package/skills/check-lessons/SKILL.md +116 -0
- package/skills/dispatching-parallel-agents/SKILL.md +110 -0
- package/skills/executing-plans/SKILL.md +85 -0
- package/skills/finishing-a-development-branch/SKILL.md +201 -0
- package/skills/receiving-code-review/SKILL.md +72 -0
- package/skills/requesting-code-review/SKILL.md +59 -0
- package/skills/requesting-code-review/code-reviewer.md +82 -0
- package/skills/research/SKILL.md +145 -0
- package/skills/roadmap/SKILL.md +115 -0
- package/skills/subagent-driven-development/SKILL.md +98 -0
- package/skills/subagent-driven-development/code-quality-reviewer-prompt.md +18 -0
- package/skills/subagent-driven-development/implementer-prompt.md +73 -0
- package/skills/subagent-driven-development/spec-reviewer-prompt.md +57 -0
- package/skills/systematic-debugging/SKILL.md +134 -0
- package/skills/systematic-debugging/condition-based-waiting.md +64 -0
- package/skills/systematic-debugging/defense-in-depth.md +32 -0
- package/skills/systematic-debugging/root-cause-tracing.md +55 -0
- package/skills/test-driven-development/SKILL.md +167 -0
- package/skills/using-git-worktrees/SKILL.md +219 -0
- package/skills/using-superpowers/SKILL.md +54 -0
- package/skills/verification-before-completion/SKILL.md +140 -0
- package/skills/verify/SKILL.md +82 -0
- package/skills/writing-plans/SKILL.md +128 -0
- package/skills/writing-skills/SKILL.md +93 -0
@@ -0,0 +1,204 @@
# Code Factory v2 — Design Document

**Date:** 2026-02-21
**Status:** Approved
**Approach:** Foundation-First (Phase 1 → 2 → 3 → 4, sequential)

## Problem Statement

The Code Factory pipeline (`auto-compound.sh` → `quality-gate.sh` → `run-plan.sh`) works but has accumulated technical debt: code duplication across scripts, hardcoded paths, missing quality gate steps, no cross-batch context for agents, and no prior-art search. Research across Notion, GitHub (10 repos), web best practices, and the codebase identified 24 concrete improvements.

## Research Findings

### Competitive Landscape (10 repos analyzed)
- **Unique strengths to preserve:** PRD with shell exit codes (no other repo does this), hookify pre-write guardrails, lesson-indexed anti-patterns
- **Gaps to close:** No prior-art search (only RepoMaster/NeurIPS 2025 does this), no lint step, no cross-batch context, no cost tracking
- **Patterns to adopt:** Aider's Architect/Editor split, structured `context_refs` from multi-agent-coding-system, 4-checkpoint quality gate pipeline

### Key Principles
- **Harness Engineering** (OpenAI): Design environments/feedback loops that govern agent behavior
- **Compound Product** (Ryan Carson): Self-improving agent loop where each iteration improves operating instructions
- **Agent specialization formula:** Model + Runtime + MCP + Skills = Specialized Agent (one agent + composable skills > many specialized agents)

### Module Health
| Script | Lines | Status |
|--------|-------|--------|
| `run-plan.sh` | 412 | VIOLATION (>300) — extract headless loop |
| `auto-compound.sh` | 230 | OK |
| `entropy-audit.sh` | 213 | OK |
| `lesson-check.sh` | 195 | OK |
| `analyze-report.sh` | 114 | OK |
| `quality-gate.sh` | 111 | OK |

### Code Duplications Found
1. Project type detection (auto-compound.sh lines 145-163, quality-gate.sh lines 30-50)
2. Arg parsing boilerplate (5 scripts repeat the same pattern)
3. Ollama API calls (analyze-report.sh, entropy-audit.sh)
4. Telegram credential loading (run-plan-notify.sh, lessons-review.sh)
5. JSON fence stripping (analyze-report.sh lines 104-110)

## Design

### Phase 1: Foundation (Shared Library + Module Compliance)

Extract duplicated code into a shared library and bring all scripts under the 300-line limit.

**Task 1.1: Create `scripts/lib/common.sh`**
Extract into shared functions:
- `detect_project_type()` — unified Python/Node/general detection
- `parse_common_args()` — `--help`, `--project-root`, `--verbose` boilerplate
- `strip_json_fences()` — remove ```json wrappers from LLM output
- `check_memory_available()` — memory guard (threshold parameterized)
- `require_command()` — check binary exists, print install hint
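
A minimal sketch of three of these helpers. Only the function names come from the task list above; the marker files, messages, and function bodies are assumptions for illustration, not the toolkit's actual implementation.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of scripts/lib/common.sh helpers.

# Print "python", "node", or "general" for the given directory.
# The marker files checked here are assumed, not taken from the design doc.
detect_project_type() {
  local root="${1:-.}"
  if [[ -f "$root/pyproject.toml" || -f "$root/setup.py" || -f "$root/requirements.txt" ]]; then
    echo "python"
  elif [[ -f "$root/package.json" ]]; then
    echo "node"
  else
    echo "general"
  fi
}

# Drop Markdown fence lines (three backticks, optionally followed by a
# language tag) from stdin and pass every other line through unchanged.
strip_json_fences() {
  sed -E '/^[[:space:]]*`{3}[[:alnum:]]*[[:space:]]*$/d'
}

# Return 1 and print an install hint on stderr when a binary is missing.
require_command() {
  local cmd="$1" hint="${2:-}"
  if ! command -v "$cmd" >/dev/null 2>&1; then
    echo "ERROR: '$cmd' not found. ${hint}" >&2
    return 1
  fi
}
```

A caller could then write something like `require_command jq "Install with your package manager" || exit 1` before piping model output through `strip_json_fences`.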

**Task 1.2: Create `scripts/lib/ollama.sh`**
Extract Ollama interaction:
- `ollama_query()` — submit prompt to ollama-queue or direct API
- `ollama_parse_json()` — query + strip fences + validate JSON

**Task 1.3: Refactor `auto-compound.sh` to use `common.sh`**
- Replace inline project detection with `detect_project_type()`
- Replace JSON stripping with `strip_json_fences()`
- Fix line 127: PRD output discarded to `/dev/null` with `|| true` (lesson-7 violation)

**Task 1.4: Refactor `quality-gate.sh` to use `common.sh`**
- Replace inline project detection with `detect_project_type()`
- Replace inline memory check with `check_memory_available()`

**Task 1.5: Refactor `entropy-audit.sh`**
- Replace hardcoded `PROJECTS_DIR="$HOME/Documents/projects"` (line 17) with `--project-root` arg or env var
- Use `ollama.sh` for LLM calls

**Task 1.6: Extract `scripts/lib/run-plan-headless.sh`**
- Move `run_mode_headless()` (lines 229-376, 148 lines) from `run-plan.sh` into dedicated lib module
- Target: `run-plan.sh` drops to ~260 lines

**Task 1.7: Refactor `analyze-report.sh` to use shared libs**
- Use `ollama.sh` for LLM calls
- Use `strip_json_fences()` from `common.sh`

### Phase 2: Accuracy (Fix Broken Pipeline Steps)

Fix the pipeline steps that silently fail or produce incomplete results.

**Task 2.1: Fix PRD invocation in `auto-compound.sh`**
- Line 127 discards `/create-prd` output — capture and validate
- Verify headless `claude --print` loads project-scoped commands from `~/Documents/.claude/commands/`
- If not, inline the PRD prompt or add `--commands-dir` flag

**Task 2.2: Fix test count parsing for non-pytest projects**
- `run-plan-quality-gate.sh` line 23: `grep -oP '\b(\d+) passed\b'` is pytest-only
- Add parsers for: `jest` (`Tests: N passed`), `go test` (`ok`/`FAIL`), `npm test` (TAP format)
- Return `-1` (skip regression check) when format is unrecognized, not `0` (which defeats detection)
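
The fallback behaviour in Task 2.2 can be sketched as follows. The function name `extract_test_count` is hypothetical, and counting passing packages for `go test` is an assumption about how an `ok`/`FAIL` report would be turned into a number.

```bash
# Hypothetical parser: reads test-runner output on stdin and prints the
# number of passed tests, or -1 when no known format matches.
extract_test_count() {
  local output count
  output=$(cat)

  # pytest and jest both report "<N> passed" in their summary line.
  count=$(printf '%s\n' "$output" \
    | sed -nE 's/(.*[^0-9]|^)([0-9]+) passed.*/\2/p' | tail -n 1)
  if [[ -n "$count" ]]; then echo "$count"; return 0; fi

  # TAP: one "ok <number>" line per passing test.
  # grep -c exits 1 when the count is zero, hence "|| true".
  count=$(printf '%s\n' "$output" | grep -cE '^ok [0-9]+' || true)
  if (( count > 0 )); then echo "$count"; return 0; fi

  # go test: one "ok <package>" line per passing package (assumed metric).
  count=$(printf '%s\n' "$output" \
    | grep -cE '^ok[[:space:]]+[^0-9[:space:]]' || true)
  if (( count > 0 )); then echo "$count"; return 0; fi

  # Unrecognized format: tell the caller to skip the regression check.
  echo "-1"
}
```

Returning `-1` instead of `0` lets the caller distinguish "could not parse" from "zero tests passed", so an unparsed run is never mistaken for a regression.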

**Task 2.3: Add cross-batch context to `run-plan-prompt.sh`**
- Include `git log --oneline -5` (recent commits from prior batches)
- Include last 20 lines of `progress.txt` (discoveries, decisions)
- Include previous quality gate result (pass/fail, test count)
- Keep prompt under 2000 tokens to leave room for batch instructions
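
One way the context block might be assembled, as a sketch only. The function name `build_batch_context`, the section headings, and the budget of roughly 4 characters per token are all assumptions; the design only fixes the three inputs and the 2000-token ceiling.

```bash
# Hypothetical builder for the cross-batch context section of a prompt.
# Args: project root, previous quality gate summary, character budget.
build_batch_context() {
  local root="${1:-.}" prev_gate="${2:-unknown}" max_chars="${3:-8000}"
  local context
  context=$(
    echo "## Recent commits"
    git -C "$root" log --oneline -5 2>/dev/null || echo "(no git history)"
    echo "## Progress notes"
    if [[ -f "$root/progress.txt" ]]; then
      tail -n 20 "$root/progress.txt"
    else
      echo "(no progress.txt)"
    fi
    echo "## Previous quality gate"
    echo "$prev_gate"
  )
  # Crude budget: assume about 4 characters per token, so 8000 characters
  # stands in for the 2000-token limit. Truncation is by character count.
  printf '%s\n' "${context:0:$max_chars}"
}
```

Truncating by characters is a blunt instrument; a real implementation would probably trim whole lines or drop the oldest entries first.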

**Task 2.4: Add cost/duration tracking to state**
- Track per-batch wall time (already computed but not saved)
- Track cumulative duration across batches
- Add `duration_seconds` field to batch entries in `.run-plan-state.json`
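
A rough sketch of how per-batch wall time could be measured and persisted. Both function names are hypothetical, and the `.batches` object keyed by batch number is an assumed layout for `.run-plan-state.json`, which the design does not specify. The second function requires `jq`.

```bash
# Run a command and record its wall time in whole seconds, using bash's
# built-in SECONDS counter. The result lands in LAST_DURATION_SECONDS.
run_timed() {
  local start=$SECONDS rc=0
  "$@" || rc=$?
  LAST_DURATION_SECONDS=$(( SECONDS - start ))
  return "$rc"
}

# Write duration_seconds for one batch into the state file (needs jq).
# Assumed layout: {"batches": {"<batch>": {"duration_seconds": N}}}
record_duration() {
  local state_file="$1" batch="$2" duration="$3" tmp
  tmp=$(mktemp)
  jq --arg b "$batch" --argjson d "$duration" \
    '.batches[$b].duration_seconds = $d' "$state_file" > "$tmp" \
    && mv "$tmp" "$state_file"
}
```

Cumulative duration could then be derived on read by summing the per-batch values rather than stored separately.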

**Task 2.5: Wire Telegram credential loading through shared lib**
- Create `scripts/lib/telegram.sh` — single source for `_load_telegram_env()`
- Replace duplicate in `run-plan-notify.sh` and `lessons-review.sh`

### Phase 3: Quality Gates (Lint + Search + Status)

Add missing quality gate steps and a new prior-art search capability.

**Task 3.1: Add `ruff` lint step to `quality-gate.sh`**
- Run `ruff check --select E,W,F` for Python projects
- Run `eslint` for Node projects (if `.eslintrc*` exists)
- Gate: lint errors = fail, warnings = warn-only
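
The fail-versus-warn rule could be sketched for the Python case like this. Treating the pycodestyle `E` and Pyflakes `F` rules as blocking and the pycodestyle `W` rules as non-blocking is one interpretation of "errors" and "warnings", not something the design states, and `lint_python` is a hypothetical name.

```bash
# Hypothetical Python lint gate: E and F findings fail the gate,
# W findings are reported but do not change the exit status.
lint_python() {
  local root="${1:-.}"
  # ruff exits non-zero when it reports violations.
  if ! ruff check --select E,F "$root"; then
    echo "lint: FAIL (errors found)" >&2
    return 1
  fi
  ruff check --select W "$root" \
    || echo "lint: warnings found (non-blocking)" >&2
  return 0
}
```

Running the linter twice keeps the logic simple at the cost of a second pass; parsing a single machine-readable report would be the alternative.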

**Task 3.2: Create `scripts/prior-art-search.sh`**
- Input: feature description or plan file
- Search GitHub via `gh search repos` and `gh search code`
- Search local codebase via `grep -r` for similar patterns
- Output: ranked list of relevant repos/files with relevance scores
- Integrate with `ast-grep` for structural code search (Phase 4)

**Task 3.3: Create `scripts/license-check.sh`**
- Check dependencies for license compatibility
- Python: parse `pip-licenses` output
- Node: parse `license-checker` output
- Flag GPL/AGPL in MIT-licensed projects
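
The flagging step on its own might look like the sketch below. It assumes the tool output has already been normalized to tab-separated `package<TAB>license` lines, which is not the native format of either tool, and `flag_copyleft` is a hypothetical name.

```bash
# Hypothetical filter: read "package<TAB>license" lines on stdin, print the
# packages whose license string contains GPL or AGPL, and return 1 if any
# were found. LGPL is deliberately not matched; the task names only GPL/AGPL.
flag_copyleft() {
  local found=0 pkg license
  local re='(^|[^L])A?GPL'
  while IFS=$'\t' read -r pkg license; do
    if [[ "$license" =~ $re ]]; then
      echo "FLAG: $pkg ($license)"
      found=1
    fi
  done
  return "$found"
}
```

Matching on the abbreviation misses licenses spelled out in full without it, so a production check would more likely compare against SPDX identifiers.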

**Task 3.4: Create `scripts/pipeline-status.sh`**
- Single-command view of all pipeline components
- Show: last run time, pass/fail, test count, batch progress
- Read from `.run-plan-state.json` and quality gate logs

**Task 3.5: Wire new gates into `quality-gate.sh`**
- Add lint step (Task 3.1) between lesson-check and tests
- Add license check (Task 3.3) as optional `--with-license` flag
- Preserve fast-path: skip slow checks when `--quick` flag is passed

**Task 3.6: Wire prior-art search into `auto-compound.sh`**
- Run before PRD generation
- Pass results as context to PRD prompt
- Log findings to `progress.txt`

### Phase 4: New Capabilities

Add advanced features based on research findings.

**Task 4.1: Create `scripts/failure-digest.sh`**
- Parse failed batch logs
- Extract: error messages, stack traces, failed test names
- Generate structured digest for retry prompts
- Replace the naive `tail -50` in `run-plan.sh` line 291
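
A minimal sketch of what a digest could extract in place of a raw log tail. The function name, the section headings, and the line patterns (pytest `FAILED`, Go `--- FAIL:`, TAP `not ok`) are assumptions; stack-trace extraction is left out to keep the example short.

```bash
# Hypothetical digest: read a failed batch log on stdin and print the
# failed test names and de-duplicated error lines under fixed headings.
failure_digest() {
  local log failed errors
  log=$(cat)
  failed=$(printf '%s\n' "$log" \
    | grep -E '^(FAILED|--- FAIL:|not ok) ' || true)
  errors=$(printf '%s\n' "$log" \
    | grep -E '(Error|Exception):' | sort -u | head -n 20 || true)
  echo "## Failed tests"
  echo "${failed:-(none found)}"
  echo "## Errors"
  echo "${errors:-(none found)}"
}
```

Unlike a fixed-length tail, this keeps the relevant lines even when they appear early in a long log, and the cap of 20 error lines bounds the size of the retry prompt.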

**Task 4.2: Add persistent `AGENTS.md` to worktrees**
- Auto-generated file listing agent capabilities used in the plan
- Include: tools allowed, model, permission mode, batch assignments
- Agents read this at start of each batch for team awareness

**Task 4.3: Add structured `context_refs` to plan format**
- Each batch can declare dependencies on prior batch outputs
- Format: `context_refs: [batch-2:src/auth.py, batch-3:tests/]`
- Parser extracts refs and includes referenced file contents in prompt
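
The extraction step for the format shown above can be sketched as follows. `parse_context_refs` is a hypothetical name, and the sketch assumes paths contain no spaces or commas.

```bash
# Hypothetical parser: given one plan line, print "batch<TAB>path" for each
# entry in its context_refs list. Returns 1 if the line has no such list.
parse_context_refs() {
  local line="$1" body ref
  local -a refs
  local re='context_refs:[[:space:]]*\[(.*)\]'
  [[ "$line" =~ $re ]] || return 1
  body="${BASH_REMATCH[1]}"
  IFS=',' read -ra refs <<< "$body"
  for ref in "${refs[@]}"; do
    ref="${ref//[[:space:]]/}"        # drop padding around each entry
    [[ -n "$ref" ]] || continue
    # Split on the first colon: batch label before, path after.
    printf '%s\t%s\n' "${ref%%:*}" "${ref#*:}"
  done
}
```

For the example line from the task, this prints `batch-2` with `src/auth.py` and `batch-3` with `tests/`, one pair per line, which a prompt builder could then resolve to file contents.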

**Task 4.4: Add `ast-grep` integration to prior-art search**
- Structural code search (find patterns by AST shape, not text)
- Install: `cargo install ast-grep` or `npm i @ast-grep/cli`
- Use for: finding similar function signatures, API patterns, test structures

**Task 4.5: Implement team mode in `run-plan.sh`**
- Replace stub at lines 379-384
- Use Claude Code agent teams (`CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1`)
- Assign batches to parallel agents with shared state file
- Quality gate runs after each batch completion (any agent)

**Task 4.6: Add parallel patch sampling**
- For critical batches: generate N candidate implementations
- Run quality gate on each
- Keep the one with highest test count / cleanest lint
- Inspired by Agentless (NeurIPS 2024) approach
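
The selection rule might reduce to a two-key sort, sketched here. The input format (one candidate per line as name, tests passed, lint errors) and the function name `pick_best_candidate` are assumptions; the design only says to prefer the highest test count and the cleanest lint.

```bash
# Hypothetical selector: stdin holds one candidate per line as
# "<name> <tests_passed> <lint_errors>". Prints the winning name:
# most tests passed first, fewest lint errors as the tie-breaker.
pick_best_candidate() {
  sort -k2,2nr -k3,3n | head -n 1 | cut -d' ' -f1
}
```

For example, `printf 'a 10 3\nb 12 5\nc 12 1\n' | pick_best_candidate` selects `c`, since `b` and `c` tie on tests and `c` has fewer lint errors.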

## Dependencies

- **Phase 1** has no external dependencies (pure refactoring)
- **Phase 2** depends on Phase 1 (shared libs)
- **Phase 3** depends on Phase 2 (accurate pipeline) + installs: `ruff`
- **Phase 4** depends on Phase 3 (quality gates) + installs: `ast-grep` + requires agent teams feature

## Success Metrics

1. All scripts under 300 lines
2. Zero code duplication across scripts (shared lib extraction complete)
3. Quality gate catches lint errors, license issues, and test regressions
4. Prior-art search runs before every PRD generation
5. Cross-batch context reduces retry rate by providing agents with prior batch results
6. Pipeline status visible in single command

## Risk Mitigations

- **Breaking existing workflows:** Each phase is independently shippable. Phase 1 is pure refactoring with no behavior change.
- **Headless command loading:** Task 2.1 explicitly tests whether project-scoped commands work in headless mode. Fallback: inline the prompt.
- **Tool installation:** Install tools as needed per phase (ruff in Phase 3, ast-grep in Phase 4). No upfront bulk install.
- **Agent teams instability:** Phase 4 team mode depends on experimental feature flag. Headless mode remains the stable default.