autonomous-coding-toolkit 1.0.0
- package/.claude-plugin/marketplace.json +22 -0
- package/.claude-plugin/plugin.json +13 -0
- package/LICENSE +21 -0
- package/Makefile +21 -0
- package/README.md +140 -0
- package/SECURITY.md +28 -0
- package/agents/bash-expert.md +113 -0
- package/agents/dependency-auditor.md +138 -0
- package/agents/integration-tester.md +120 -0
- package/agents/lesson-scanner.md +149 -0
- package/agents/python-expert.md +179 -0
- package/agents/service-monitor.md +141 -0
- package/agents/shell-expert.md +147 -0
- package/benchmarks/runner.sh +147 -0
- package/benchmarks/tasks/01-rest-endpoint/rubric.sh +29 -0
- package/benchmarks/tasks/01-rest-endpoint/task.md +17 -0
- package/benchmarks/tasks/02-refactor-module/task.md +8 -0
- package/benchmarks/tasks/03-fix-integration-bug/task.md +8 -0
- package/benchmarks/tasks/04-add-test-coverage/task.md +8 -0
- package/benchmarks/tasks/05-multi-file-feature/task.md +8 -0
- package/bin/act.js +238 -0
- package/commands/autocode.md +6 -0
- package/commands/cancel-ralph.md +18 -0
- package/commands/code-factory.md +53 -0
- package/commands/create-prd.md +55 -0
- package/commands/ralph-loop.md +18 -0
- package/commands/run-plan.md +117 -0
- package/commands/submit-lesson.md +122 -0
- package/docs/ARCHITECTURE.md +630 -0
- package/docs/CONTRIBUTING.md +125 -0
- package/docs/lessons/0001-bare-exception-swallowing.md +34 -0
- package/docs/lessons/0002-async-def-without-await.md +28 -0
- package/docs/lessons/0003-create-task-without-callback.md +28 -0
- package/docs/lessons/0004-hardcoded-test-counts.md +28 -0
- package/docs/lessons/0005-sqlite-without-closing.md +33 -0
- package/docs/lessons/0006-venv-pip-path.md +27 -0
- package/docs/lessons/0007-runner-state-self-rejection.md +35 -0
- package/docs/lessons/0008-quality-gate-blind-spot.md +33 -0
- package/docs/lessons/0009-parser-overcount-empty-batches.md +36 -0
- package/docs/lessons/0010-local-outside-function-bash.md +33 -0
- package/docs/lessons/0011-batch-tests-for-unimplemented-code.md +36 -0
- package/docs/lessons/0012-api-markdown-unescaped-chars.md +33 -0
- package/docs/lessons/0013-export-prefix-env-parsing.md +33 -0
- package/docs/lessons/0014-decorator-registry-import-side-effect.md +43 -0
- package/docs/lessons/0015-frontend-backend-schema-drift.md +43 -0
- package/docs/lessons/0016-event-driven-cold-start-seeding.md +44 -0
- package/docs/lessons/0017-copy-paste-logic-diverges.md +43 -0
- package/docs/lessons/0018-layer-passes-pipeline-broken.md +45 -0
- package/docs/lessons/0019-systemd-envfile-ignores-export.md +41 -0
- package/docs/lessons/0020-persist-state-incrementally.md +44 -0
- package/docs/lessons/0021-dual-axis-testing.md +48 -0
- package/docs/lessons/0022-jsx-factory-shadowing.md +43 -0
- package/docs/lessons/0023-static-analysis-spiral.md +51 -0
- package/docs/lessons/0024-shared-pipeline-implementation.md +55 -0
- package/docs/lessons/0025-defense-in-depth-all-entry-points.md +65 -0
- package/docs/lessons/0026-linter-no-rules-false-enforcement.md +54 -0
- package/docs/lessons/0027-jsx-silent-prop-drop.md +64 -0
- package/docs/lessons/0028-no-infrastructure-in-client-code.md +49 -0
- package/docs/lessons/0029-never-write-secrets-to-files.md +61 -0
- package/docs/lessons/0030-cache-merge-not-replace.md +62 -0
- package/docs/lessons/0031-verify-units-at-boundaries.md +66 -0
- package/docs/lessons/0032-module-lifecycle-subscribe-unsubscribe.md +89 -0
- package/docs/lessons/0033-async-iteration-mutable-snapshot.md +72 -0
- package/docs/lessons/0034-caller-missing-await-silent-discard.md +65 -0
- package/docs/lessons/0035-duplicate-registration-silent-overwrite.md +85 -0
- package/docs/lessons/0036-websocket-dirty-disconnect.md +33 -0
- package/docs/lessons/0037-parallel-agents-worktree-corruption.md +31 -0
- package/docs/lessons/0038-subscribe-no-stored-ref.md +36 -0
- package/docs/lessons/0039-fallback-or-default-hides-bugs.md +34 -0
- package/docs/lessons/0040-event-firehose-filter-first.md +36 -0
- package/docs/lessons/0041-ambiguous-base-dir-path-nesting.md +32 -0
- package/docs/lessons/0042-spec-compliance-insufficient.md +36 -0
- package/docs/lessons/0043-exact-count-extensible-collections.md +32 -0
- package/docs/lessons/0044-relative-file-deps-worktree.md +39 -0
- package/docs/lessons/0045-iterative-design-improvement.md +33 -0
- package/docs/lessons/0046-plan-assertion-math-bugs.md +38 -0
- package/docs/lessons/0047-pytest-single-threaded-default.md +37 -0
- package/docs/lessons/0048-integration-wiring-batch.md +40 -0
- package/docs/lessons/0049-ab-verification.md +41 -0
- package/docs/lessons/0050-editing-sourced-files-during-execution.md +33 -0
- package/docs/lessons/0051-infrastructure-fixes-cant-self-heal.md +30 -0
- package/docs/lessons/0052-uncommitted-changes-poison-quality-gates.md +31 -0
- package/docs/lessons/0053-jq-compact-flag-inconsistency.md +31 -0
- package/docs/lessons/0054-parser-matches-inside-code-blocks.md +30 -0
- package/docs/lessons/0055-agents-compensate-for-garbled-prompts.md +31 -0
- package/docs/lessons/0056-grep-count-exit-code-on-zero.md +42 -0
- package/docs/lessons/0057-new-artifacts-break-git-clean-gates.md +42 -0
- package/docs/lessons/0058-dead-config-keys-never-consumed.md +49 -0
- package/docs/lessons/0059-contract-test-shared-structures.md +53 -0
- package/docs/lessons/0060-set-e-silent-death-in-runners.md +53 -0
- package/docs/lessons/0061-context-injection-dirty-state.md +50 -0
- package/docs/lessons/0062-sibling-bug-neighborhood-scan.md +29 -0
- package/docs/lessons/0063-one-flag-two-lifetimes.md +31 -0
- package/docs/lessons/0064-test-passes-wrong-reason.md +31 -0
- package/docs/lessons/0065-pipefail-grep-count-double-output.md +39 -0
- package/docs/lessons/0066-local-keyword-outside-function.md +37 -0
- package/docs/lessons/0067-stdin-hang-non-interactive-shell.md +36 -0
- package/docs/lessons/0068-agent-builds-wrong-thing-correctly.md +31 -0
- package/docs/lessons/0069-plan-quality-dominates-execution.md +30 -0
- package/docs/lessons/0070-spec-echo-back-prevents-drift.md +31 -0
- package/docs/lessons/0071-positive-instructions-outperform-negative.md +30 -0
- package/docs/lessons/0072-lost-in-the-middle-context-placement.md +30 -0
- package/docs/lessons/0073-unscoped-lessons-cause-false-positives.md +30 -0
- package/docs/lessons/0074-stale-context-injection-wrong-batch.md +32 -0
- package/docs/lessons/0075-research-artifacts-must-persist.md +32 -0
- package/docs/lessons/0076-wrong-decomposition-contaminates-downstream.md +30 -0
- package/docs/lessons/0077-cherry-pick-merges-need-manual-resolution.md +30 -0
- package/docs/lessons/0078-static-review-without-live-test.md +30 -0
- package/docs/lessons/0079-integration-wiring-batch-required.md +32 -0
- package/docs/lessons/FRAMEWORK.md +161 -0
- package/docs/lessons/SUMMARY.md +201 -0
- package/docs/lessons/TEMPLATE.md +85 -0
- package/docs/plans/2026-02-21-code-factory-v2-design.md +204 -0
- package/docs/plans/2026-02-21-code-factory-v2-implementation-plan.md +2189 -0
- package/docs/plans/2026-02-21-code-factory-v2-phase4-design.md +537 -0
- package/docs/plans/2026-02-21-code-factory-v2-phase4-implementation-plan.md +2012 -0
- package/docs/plans/2026-02-21-hardening-pass-design.md +108 -0
- package/docs/plans/2026-02-21-hardening-pass-plan.md +1378 -0
- package/docs/plans/2026-02-21-mab-research-report.md +406 -0
- package/docs/plans/2026-02-21-marketplace-restructure-design.md +240 -0
- package/docs/plans/2026-02-21-marketplace-restructure-plan.md +832 -0
- package/docs/plans/2026-02-21-phase4-completion-plan.md +697 -0
- package/docs/plans/2026-02-21-validator-suite-design.md +148 -0
- package/docs/plans/2026-02-21-validator-suite-plan.md +540 -0
- package/docs/plans/2026-02-22-mab-research-round2.md +556 -0
- package/docs/plans/2026-02-22-mab-run-design.md +462 -0
- package/docs/plans/2026-02-22-mab-run-plan.md +2046 -0
- package/docs/plans/2026-02-22-operations-design-methodology-research.md +681 -0
- package/docs/plans/2026-02-22-research-agent-failure-taxonomy.md +532 -0
- package/docs/plans/2026-02-22-research-code-guideline-policies.md +886 -0
- package/docs/plans/2026-02-22-research-codebase-audit-refactoring.md +908 -0
- package/docs/plans/2026-02-22-research-coding-standards-documentation.md +541 -0
- package/docs/plans/2026-02-22-research-competitive-landscape.md +687 -0
- package/docs/plans/2026-02-22-research-comprehensive-testing.md +1076 -0
- package/docs/plans/2026-02-22-research-context-utilization.md +459 -0
- package/docs/plans/2026-02-22-research-cost-quality-tradeoff.md +548 -0
- package/docs/plans/2026-02-22-research-lesson-transferability.md +508 -0
- package/docs/plans/2026-02-22-research-multi-agent-coordination.md +312 -0
- package/docs/plans/2026-02-22-research-phase-integration.md +602 -0
- package/docs/plans/2026-02-22-research-plan-quality.md +428 -0
- package/docs/plans/2026-02-22-research-prompt-engineering.md +558 -0
- package/docs/plans/2026-02-22-research-unconventional-perspectives.md +528 -0
- package/docs/plans/2026-02-22-research-user-adoption.md +638 -0
- package/docs/plans/2026-02-22-research-verification-effectiveness.md +433 -0
- package/docs/plans/2026-02-23-agent-suite-design.md +299 -0
- package/docs/plans/2026-02-23-agent-suite-plan.md +578 -0
- package/docs/plans/2026-02-23-phase3-cost-infrastructure-design.md +148 -0
- package/docs/plans/2026-02-23-phase3-cost-infrastructure-plan.md +1062 -0
- package/docs/plans/2026-02-23-research-bash-expert-agent.md +543 -0
- package/docs/plans/2026-02-23-research-dependency-auditor-agent.md +564 -0
- package/docs/plans/2026-02-23-research-improving-existing-agents.md +503 -0
- package/docs/plans/2026-02-23-research-integration-tester-agent.md +454 -0
- package/docs/plans/2026-02-23-research-python-expert-agent.md +429 -0
- package/docs/plans/2026-02-23-research-service-monitor-agent.md +425 -0
- package/docs/plans/2026-02-23-research-shell-expert-agent.md +533 -0
- package/docs/plans/2026-02-23-roadmap-to-completion.md +530 -0
- package/docs/plans/2026-02-24-headless-module-split-design.md +98 -0
- package/docs/plans/2026-02-24-headless-module-split.md +443 -0
- package/docs/plans/2026-02-24-lesson-scope-metadata-design.md +228 -0
- package/docs/plans/2026-02-24-lesson-scope-metadata-plan.md +968 -0
- package/docs/plans/2026-02-24-npm-packaging-design.md +841 -0
- package/docs/plans/2026-02-24-npm-packaging-plan.md +1965 -0
- package/docs/plans/audit-findings.md +186 -0
- package/docs/telegram-notification-format.md +98 -0
- package/examples/example-plan.md +51 -0
- package/examples/example-prd.json +72 -0
- package/examples/example-roadmap.md +33 -0
- package/examples/quickstart-plan.md +63 -0
- package/hooks/hooks.json +26 -0
- package/hooks/setup-symlinks.sh +48 -0
- package/hooks/stop-hook.sh +135 -0
- package/package.json +47 -0
- package/policies/bash.md +71 -0
- package/policies/python.md +71 -0
- package/policies/testing.md +61 -0
- package/policies/universal.md +60 -0
- package/scripts/analyze-report.sh +97 -0
- package/scripts/architecture-map.sh +145 -0
- package/scripts/auto-compound.sh +273 -0
- package/scripts/batch-audit.sh +42 -0
- package/scripts/batch-test.sh +101 -0
- package/scripts/entropy-audit.sh +221 -0
- package/scripts/failure-digest.sh +51 -0
- package/scripts/generate-ast-rules.sh +96 -0
- package/scripts/init.sh +112 -0
- package/scripts/lesson-check.sh +428 -0
- package/scripts/lib/common.sh +61 -0
- package/scripts/lib/cost-tracking.sh +153 -0
- package/scripts/lib/ollama.sh +60 -0
- package/scripts/lib/progress-writer.sh +128 -0
- package/scripts/lib/run-plan-context.sh +215 -0
- package/scripts/lib/run-plan-echo-back.sh +231 -0
- package/scripts/lib/run-plan-headless.sh +396 -0
- package/scripts/lib/run-plan-notify.sh +57 -0
- package/scripts/lib/run-plan-parser.sh +81 -0
- package/scripts/lib/run-plan-prompt.sh +215 -0
- package/scripts/lib/run-plan-quality-gate.sh +132 -0
- package/scripts/lib/run-plan-routing.sh +315 -0
- package/scripts/lib/run-plan-sampling.sh +170 -0
- package/scripts/lib/run-plan-scoring.sh +146 -0
- package/scripts/lib/run-plan-state.sh +142 -0
- package/scripts/lib/run-plan-team.sh +199 -0
- package/scripts/lib/telegram.sh +54 -0
- package/scripts/lib/thompson-sampling.sh +176 -0
- package/scripts/license-check.sh +74 -0
- package/scripts/mab-run.sh +575 -0
- package/scripts/module-size-check.sh +146 -0
- package/scripts/patterns/async-no-await.yml +5 -0
- package/scripts/patterns/bare-except.yml +6 -0
- package/scripts/patterns/empty-catch.yml +6 -0
- package/scripts/patterns/hardcoded-localhost.yml +9 -0
- package/scripts/patterns/retry-loop-no-backoff.yml +12 -0
- package/scripts/pipeline-status.sh +197 -0
- package/scripts/policy-check.sh +226 -0
- package/scripts/prior-art-search.sh +133 -0
- package/scripts/promote-mab-lessons.sh +126 -0
- package/scripts/prompts/agent-a-superpowers.md +29 -0
- package/scripts/prompts/agent-b-ralph.md +29 -0
- package/scripts/prompts/judge-agent.md +61 -0
- package/scripts/prompts/planner-agent.md +44 -0
- package/scripts/pull-community-lessons.sh +90 -0
- package/scripts/quality-gate.sh +266 -0
- package/scripts/research-gate.sh +90 -0
- package/scripts/run-plan.sh +329 -0
- package/scripts/scope-infer.sh +159 -0
- package/scripts/setup-ralph-loop.sh +155 -0
- package/scripts/telemetry.sh +230 -0
- package/scripts/tests/run-all-tests.sh +52 -0
- package/scripts/tests/test-act-cli.sh +46 -0
- package/scripts/tests/test-agents-md.sh +87 -0
- package/scripts/tests/test-analyze-report.sh +114 -0
- package/scripts/tests/test-architecture-map.sh +89 -0
- package/scripts/tests/test-auto-compound.sh +169 -0
- package/scripts/tests/test-batch-test.sh +65 -0
- package/scripts/tests/test-benchmark-runner.sh +25 -0
- package/scripts/tests/test-common.sh +168 -0
- package/scripts/tests/test-cost-tracking.sh +158 -0
- package/scripts/tests/test-echo-back.sh +180 -0
- package/scripts/tests/test-entropy-audit.sh +146 -0
- package/scripts/tests/test-failure-digest.sh +66 -0
- package/scripts/tests/test-generate-ast-rules.sh +145 -0
- package/scripts/tests/test-helpers.sh +82 -0
- package/scripts/tests/test-init.sh +47 -0
- package/scripts/tests/test-lesson-check.sh +278 -0
- package/scripts/tests/test-lesson-local.sh +55 -0
- package/scripts/tests/test-license-check.sh +109 -0
- package/scripts/tests/test-mab-run.sh +182 -0
- package/scripts/tests/test-ollama-lib.sh +49 -0
- package/scripts/tests/test-ollama.sh +60 -0
- package/scripts/tests/test-pipeline-status.sh +198 -0
- package/scripts/tests/test-policy-check.sh +124 -0
- package/scripts/tests/test-prior-art-search.sh +96 -0
- package/scripts/tests/test-progress-writer.sh +140 -0
- package/scripts/tests/test-promote-mab-lessons.sh +110 -0
- package/scripts/tests/test-pull-community-lessons.sh +149 -0
- package/scripts/tests/test-quality-gate.sh +241 -0
- package/scripts/tests/test-research-gate.sh +132 -0
- package/scripts/tests/test-run-plan-cli.sh +86 -0
- package/scripts/tests/test-run-plan-context.sh +305 -0
- package/scripts/tests/test-run-plan-e2e.sh +153 -0
- package/scripts/tests/test-run-plan-headless.sh +424 -0
- package/scripts/tests/test-run-plan-notify.sh +124 -0
- package/scripts/tests/test-run-plan-parser.sh +217 -0
- package/scripts/tests/test-run-plan-prompt.sh +254 -0
- package/scripts/tests/test-run-plan-quality-gate.sh +222 -0
- package/scripts/tests/test-run-plan-routing.sh +178 -0
- package/scripts/tests/test-run-plan-scoring.sh +148 -0
- package/scripts/tests/test-run-plan-state.sh +261 -0
- package/scripts/tests/test-run-plan-team.sh +157 -0
- package/scripts/tests/test-scope-infer.sh +150 -0
- package/scripts/tests/test-setup-ralph-loop.sh +63 -0
- package/scripts/tests/test-telegram-env.sh +38 -0
- package/scripts/tests/test-telegram.sh +121 -0
- package/scripts/tests/test-telemetry.sh +46 -0
- package/scripts/tests/test-thompson-sampling.sh +139 -0
- package/scripts/tests/test-validate-all.sh +60 -0
- package/scripts/tests/test-validate-commands.sh +89 -0
- package/scripts/tests/test-validate-hooks.sh +98 -0
- package/scripts/tests/test-validate-lessons.sh +150 -0
- package/scripts/tests/test-validate-plan-quality.sh +235 -0
- package/scripts/tests/test-validate-plans.sh +187 -0
- package/scripts/tests/test-validate-plugin.sh +106 -0
- package/scripts/tests/test-validate-prd.sh +184 -0
- package/scripts/tests/test-validate-skills.sh +134 -0
- package/scripts/validate-all.sh +57 -0
- package/scripts/validate-commands.sh +67 -0
- package/scripts/validate-hooks.sh +89 -0
- package/scripts/validate-lessons.sh +98 -0
- package/scripts/validate-plan-quality.sh +369 -0
- package/scripts/validate-plans.sh +120 -0
- package/scripts/validate-plugin.sh +86 -0
- package/scripts/validate-policies.sh +42 -0
- package/scripts/validate-prd.sh +118 -0
- package/scripts/validate-skills.sh +96 -0
- package/skills/autocode/SKILL.md +285 -0
- package/skills/autocode/ab-verification.md +51 -0
- package/skills/autocode/code-quality-standards.md +37 -0
- package/skills/autocode/competitive-mode.md +364 -0
- package/skills/brainstorming/SKILL.md +97 -0
- package/skills/capture-lesson/SKILL.md +187 -0
- package/skills/check-lessons/SKILL.md +116 -0
- package/skills/dispatching-parallel-agents/SKILL.md +110 -0
- package/skills/executing-plans/SKILL.md +85 -0
- package/skills/finishing-a-development-branch/SKILL.md +201 -0
- package/skills/receiving-code-review/SKILL.md +72 -0
- package/skills/requesting-code-review/SKILL.md +59 -0
- package/skills/requesting-code-review/code-reviewer.md +82 -0
- package/skills/research/SKILL.md +145 -0
- package/skills/roadmap/SKILL.md +115 -0
- package/skills/subagent-driven-development/SKILL.md +98 -0
- package/skills/subagent-driven-development/code-quality-reviewer-prompt.md +18 -0
- package/skills/subagent-driven-development/implementer-prompt.md +73 -0
- package/skills/subagent-driven-development/spec-reviewer-prompt.md +57 -0
- package/skills/systematic-debugging/SKILL.md +134 -0
- package/skills/systematic-debugging/condition-based-waiting.md +64 -0
- package/skills/systematic-debugging/defense-in-depth.md +32 -0
- package/skills/systematic-debugging/root-cause-tracing.md +55 -0
- package/skills/test-driven-development/SKILL.md +167 -0
- package/skills/using-git-worktrees/SKILL.md +219 -0
- package/skills/using-superpowers/SKILL.md +54 -0
- package/skills/verification-before-completion/SKILL.md +140 -0
- package/skills/verify/SKILL.md +82 -0
- package/skills/writing-plans/SKILL.md +128 -0
- package/skills/writing-skills/SKILL.md +93 -0
@@ -0,0 +1,1965 @@

# npm Packaging Implementation Plan

> **For Claude:** REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.

**Goal:** Package the autonomous-coding-toolkit as an installable npm package (`act` CLI) with telemetry, benchmarks, and learning system infrastructure.

**Architecture:** Add `package.json` + `bin/act.js` Node.js router on top of existing bash scripts. No scripts move or change structure. Three new scripts (`init.sh`, `telemetry.sh`, `benchmarks/runner.sh`) follow existing patterns. All state remains project-local.

**Tech Stack:** Node.js 18+ (CLI router only), bash 4+ (all scripts), jq (state/telemetry)
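
The version floors in the tech stack can be checked mechanically. The following is a minimal sketch, not part of the plan: `meetsMinimum` is a hypothetical helper, and it assumes plain dotted numeric versions (a suffix such as bash's `(1)-release` would need stripping first). Only `process.versions.node` is a real Node.js API here.

```javascript
'use strict';

// Hypothetical helper (not in the plan): compare dotted version strings
// numerically, segment by segment. Missing segments count as 0.
function meetsMinimum(actual, minimum) {
  const a = actual.split('.').map(Number);
  const m = minimum.split('.').map(Number);
  const length = Math.max(a.length, m.length);
  for (let i = 0; i < length; i++) {
    const left = a[i] || 0;
    const right = m[i] || 0;
    if (left !== right) return left > right;
  }
  return true; // identical versions satisfy the minimum
}

// process.versions.node is the running Node.js version without the "v" prefix.
const nodeOk = meetsMinimum(process.versions.node, '18.0.0');
console.log(`Node.js ${process.versions.node} meets >=18.0.0: ${nodeOk}`);
```

The same helper could be pointed at a parsed `bash --version` string to enforce the bash 4+ floor. The router as written in this plan only checks that bash exists.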

**Design doc:** `docs/plans/2026-02-24-npm-packaging-design.md`

---

## Priority Tiers

- **P0 (Batches 1-4):** Required for `npm publish`: package.json, CLI router, init, portability fixes, README
- **P1 (Batches 5-7):** Learning system: telemetry capture/dashboard, quality gate integration, benchmark suite
- **P2 (Batches 8-9):** Enhancements: trust score, graduated autonomy, semantic echo-back Tier 2

---

## Batch 1: package.json + CLI Router

### Task 1: Create package.json

**Files:**
- Create: `package.json`

**Step 1: Create package.json**

```json
{
  "name": "autonomous-coding-toolkit",
  "version": "1.0.0",
  "description": "Autonomous AI coding pipeline: quality gates, fresh-context execution, community lessons, and compounding learning",
  "license": "MIT",
  "author": "Justin McFarland <parthalon025@gmail.com>",
  "homepage": "https://github.com/parthalon025/autonomous-coding-toolkit",
  "repository": "https://github.com/parthalon025/autonomous-coding-toolkit",
  "bin": {
    "act": "./bin/act.js"
  },
  "files": [
    "bin/",
    "scripts/",
    "skills/",
    "commands/",
    "agents/",
    "hooks/",
    "policies/",
    "examples/",
    "benchmarks/",
    "docs/",
    ".claude-plugin/",
    "Makefile",
    "SECURITY.md"
  ],
  "engines": {
    "node": ">=18.0.0"
  },
  "os": [
    "linux",
    "darwin",
    "win32"
  ],
  "keywords": [
    "autonomous-coding",
    "ai-agents",
    "quality-gates",
    "claude-code",
    "tdd",
    "lessons-learned",
    "headless",
    "multi-armed-bandit",
    "code-review",
    "pipeline"
  ]
}
```

**Step 2: Verify package.json is valid**

Run: `cd ~/Documents/projects/autonomous-coding-toolkit && node -e "require('./package.json'); console.log('valid')"`
Expected: `valid`
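
The `require` check above only proves that the file parses as JSON. A slightly stricter check is sketched below, under the assumption that `bin` is a map (as in this plan) and that `files` must ship `bin/`. `findManifestProblems` is a hypothetical name, not something the toolkit provides.

```javascript
'use strict';

// Hypothetical check (not in the plan): list structural problems in a parsed
// package.json object for a CLI package. Assumes `bin` is a map, not a string.
function findManifestProblems(pkg) {
  const problems = [];
  for (const field of ['name', 'version']) {
    if (typeof pkg[field] !== 'string' || pkg[field] === '') {
      problems.push(`missing ${field}`);
    }
  }
  const bin = pkg.bin;
  if (!bin || typeof bin !== 'object' || Object.keys(bin).length === 0) {
    problems.push('missing bin map');
  }
  if (!Array.isArray(pkg.files) || !pkg.files.includes('bin/')) {
    problems.push('files does not include bin/');
  }
  return problems;
}

// Trimmed version of the manifest from Step 1.
const sample = {
  name: 'autonomous-coding-toolkit',
  version: '1.0.0',
  bin: { act: './bin/act.js' },
  files: ['bin/', 'scripts/'],
};
console.log(findManifestProblems(sample));
```

An empty array means the manifest passed these particular checks. It does not replace `npm pack --dry-run` for confirming what actually ships.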

**Step 3: Verify npm pack lists expected files**

Run: `cd ~/Documents/projects/autonomous-coding-toolkit && npm pack --dry-run 2>&1 | head -20`
Expected: Output lists `bin/act.js` and files under `scripts/`, `skills/`, `docs/`, etc. It must NOT list `logs/`, `.run-plan-state.json`, or `.worktrees/`. Note that `head -20` shows only the start of the listing; drop it to check the full file set.
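
The exclusion check can also be scripted rather than read by eye. This is a sketch under stated assumptions: `findUnexpectedFiles` and `FORBIDDEN` are hypothetical names, and the input is a plain array of package-relative paths. Feeding it from `npm pack --dry-run --json` is an assumption to verify against your npm version's output shape.

```javascript
'use strict';

// Paths the plan says must not ship. A trailing slash means "directory prefix".
const FORBIDDEN = ['logs/', '.run-plan-state.json', '.worktrees/'];

// Hypothetical helper: return every packed path that matches a forbidden entry.
// Matching is anchored at the package root, so "scripts/logs/x" is not flagged.
function findUnexpectedFiles(paths, forbidden = FORBIDDEN) {
  return paths.filter(p =>
    forbidden.some(f => (f.endsWith('/') ? p.startsWith(f) : p === f))
  );
}

// Example listing; a real one would come from the packer.
const packed = ['package.json', 'bin/act.js', 'scripts/run-plan.sh', 'logs/run.log'];
console.log(findUnexpectedFiles(packed));
```

An empty result means none of the three excluded locations made it into the tarball listing.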

**Step 4: Commit**

```bash
git add package.json
git commit -m "feat: add package.json for npm distribution"
```

### Task 2: Create bin/act.js CLI Router

**Files:**
- Create: `bin/act.js`

**Step 1: Create directory**

```bash
mkdir -p bin
```

**Step 2: Write bin/act.js**

```javascript
#!/usr/bin/env node
'use strict';

const { execFileSync } = require('child_process');
const path = require('path');
const fs = require('fs');

const TOOLKIT_ROOT = path.resolve(__dirname, '..');
const SCRIPTS = path.join(TOOLKIT_ROOT, 'scripts');
const VERSION = require(path.join(TOOLKIT_ROOT, 'package.json')).version;

// --- Platform check ---
function checkBash() {
  try {
    execFileSync('bash', ['--version'], { stdio: 'pipe' });
  } catch {
    console.error('Error: bash is required but not found.');
    if (process.platform === 'win32') {
      console.error('');
      console.error('On Windows, install WSL (Windows Subsystem for Linux):');
      console.error('  wsl --install');
      console.error('Then run this command inside WSL.');
    }
    process.exit(1);
  }
}

// --- Dependency check ---
function checkDeps() {
  const required = ['git', 'jq'];
  const missing = required.filter(cmd => {
    try {
      execFileSync('which', [cmd], { stdio: 'pipe' });
      return false;
    } catch {
      return true;
    }
  });
  if (missing.length > 0) {
    console.error(`Error: Required commands not found: ${missing.join(', ')}`);
    console.error('Install them and try again.');
    process.exit(1);
  }
}

// --- Command routing ---
const COMMANDS = {
  // Execution
  'plan': { script: 'run-plan.sh' },
  'compound': { script: 'auto-compound.sh' },
  'mab': { script: 'mab-run.sh' },

  // Quality
  'gate': { script: 'quality-gate.sh' },
  'check': { script: 'lesson-check.sh' },
  'policy': { script: 'policy-check.sh' },
  'research-gate': { script: 'research-gate.sh' },
  'validate': { script: 'validate-all.sh' },
  'validate-plan': { script: 'validate-plan-quality.sh' },
  'validate-prd': { script: 'validate-prd.sh' },

  // Lessons
  'lessons': { dispatch: true },

  // Analysis
  'audit': { script: 'entropy-audit.sh' },
  'batch-audit': { script: 'batch-audit.sh' },
  'batch-test': { script: 'batch-test.sh' },
  'analyze': { script: 'analyze-report.sh' },
  'digest': { script: 'failure-digest.sh' },
  'status': { script: 'pipeline-status.sh' },
  'architecture': { script: 'architecture-map.sh' },

  // Setup
  'init': { script: 'init.sh' },
  'license-check': { script: 'license-check.sh' },
  'module-size': { script: 'module-size-check.sh' },

  // Telemetry
  'telemetry': { script: 'telemetry.sh' },

  // Benchmarks
  'benchmark': { script: path.join('..', 'benchmarks', 'runner.sh'), relative: true },
};

// Lessons sub-dispatch
const LESSONS_COMMANDS = {
  'pull': { script: 'pull-community-lessons.sh' },
  'check': { script: 'lesson-check.sh', args: ['--list'] },
  'promote': { script: 'promote-mab-lessons.sh' },
  'infer': { script: 'scope-infer.sh' },
};

function runScript(scriptPath, args) {
  const fullPath = path.join(SCRIPTS, scriptPath);
  if (!fs.existsSync(fullPath)) {
    console.error(`Error: Script not found: ${fullPath}`);
    console.error('This command may not be available yet.');
    process.exit(1);
  }
  try {
    execFileSync('bash', [fullPath, ...args], { stdio: 'inherit' });
  } catch (err) {
    process.exit(err.status || 1);
  }
}

function showHelp() {
  console.log(`Autonomous Coding Toolkit v${VERSION}`);
  console.log('');
  console.log('Usage: act <command> [options]');
  console.log('');
  console.log('Execution:');
  console.log('  plan <file> [flags]       Headless/team/MAB batch execution');
  console.log('  plan --resume             Resume interrupted execution');
  console.log('  compound [dir]            Full pipeline: report→PRD→execute→PR');
  console.log('  mab <flags>               Multi-Armed Bandit competing agents');
  console.log('');
  console.log('Quality:');
  console.log('  gate [flags]              Composite quality gate');
  console.log('  check [files...]          Syntactic anti-pattern scan');
  console.log('  policy [flags]            Advisory positive-pattern check');
  console.log('  validate                  Toolkit self-validation');
  console.log('  validate-plan <file>      Score plan quality (8 dimensions)');
  console.log('  validate-prd [file]       Validate PRD JSON structure');
  console.log('');
  console.log('Lessons:');
  console.log('  lessons pull [--remote]   Sync community lessons');
  console.log('  lessons check             List active lesson checks');
  console.log('  lessons promote           Auto-promote MAB patterns');
  console.log('  lessons infer [--apply]   Infer scope tags');
  console.log('');
  console.log('Analysis:');
  console.log('  audit [flags]             Doc drift & naming violations');
  console.log('  batch-audit <dir>         Cross-project audit');
  console.log('  batch-test <dir>          Memory-aware cross-project tests');
  console.log('  analyze <report>          Extract priority from report');
  console.log('  digest <log>              Summarize failure patterns');
  console.log('  status [dir]              Pipeline health check');
  console.log('  architecture [dir]        Generate architecture diagram');
  console.log('');
  console.log('Telemetry:');
  console.log('  telemetry show            Dashboard: success rate, cost, lesson hits');
  console.log('  telemetry export          Export anonymized run data');
  console.log('  telemetry import <f>      Import community aggregate data');
  console.log('  telemetry reset           Clear local telemetry');
  console.log('');
  console.log('Benchmarks:');
  console.log('  benchmark run [name]      Execute benchmark tasks');
  console.log('  benchmark compare a b     Compare two benchmark results');
  console.log('');
  console.log('Setup:');
  console.log('  init                      Bootstrap project for toolkit use');
  console.log('  init --quickstart         Fast lane: working example in <3 min');
  console.log('  license-check             GPL/AGPL dependency audit');
  console.log('  module-size               Detect oversized modules');
  console.log('');
  console.log('Meta:');
  console.log('  version                   Print version');
  console.log('  help                      Show this help');
}

// --- Main ---
function main() {
  const args = process.argv.slice(2);
  const command = args[0];
  const rest = args.slice(1);

  if (!command || command === 'help' || command === '--help' || command === '-h') {
    showHelp();
    process.exit(0);
  }

  if (command === 'version' || command === '--version' || command === '-v') {
    console.log(`act v${VERSION}`);
    process.exit(0);
  }

  checkBash();
  checkDeps();

  // Lessons sub-dispatch
  if (command === 'lessons') {
    const sub = rest[0];
    // Own-property check so inherited keys such as "constructor" are rejected
    if (!sub || !Object.hasOwn(LESSONS_COMMANDS, sub)) {
      console.error('Usage: act lessons <pull|check|promote|infer> [options]');
      process.exit(1);
    }
    const cmd = LESSONS_COMMANDS[sub];
    const subArgs = cmd.args ? [...cmd.args, ...rest.slice(1)] : rest.slice(1);
    runScript(cmd.script, subArgs);
    return;
  }

  // Own-property lookup: a plain COMMANDS[command] would also match inherited keys
  const cmd = Object.hasOwn(COMMANDS, command) ? COMMANDS[command] : undefined;
  if (!cmd) {
    console.error(`Unknown command: ${command}`);
    console.error('Run "act help" for available commands.');
    process.exit(1);
  }

  if (cmd.relative) {
    // Script path relative to toolkit root, not scripts/
    const fullPath = path.join(TOOLKIT_ROOT, 'benchmarks', 'runner.sh');
    if (!fs.existsSync(fullPath)) {
      console.error(`Error: Script not found: ${fullPath}`);
      process.exit(1);
    }
    try {
      execFileSync('bash', [fullPath, ...rest], { stdio: 'inherit' });
    } catch (err) {
      process.exit(err.status || 1);
    }
    return;
  }

  runScript(cmd.script, rest);
}

main();
```
|
|
334
|
+
|
|
335
|
+
**Step 3: Make executable**
|
|
336
|
+
|
|
337
|
+
```bash
|
|
338
|
+
chmod +x bin/act.js
|
|
339
|
+
```
|
|
340
|
+
|
|
341
|
+
**Step 4: Verify the router starts**
|
|
342
|
+
|
|
343
|
+
Run: `cd ~/Documents/projects/autonomous-coding-toolkit && node bin/act.js version`
|
|
344
|
+
Expected: `act v1.0.0`
|
|
345
|
+
|
|
346
|
+
Run: `node bin/act.js help | head -5`
|
|
347
|
+
Expected: Shows "Autonomous Coding Toolkit v1.0.0" and "Usage: act <command> [options]"
|
|
348
|
+
|
|
349
|
+
**Step 5: Verify subcommand routing works**
|
|
350
|
+
|
|
351
|
+
Run: `node bin/act.js validate --help`
|
|
352
|
+
Expected: Shows validate-all.sh usage (or runs successfully)
|
|
353
|
+
|
|
354
|
+
Run: `node bin/act.js gate --help`
|
|
355
|
+
Expected: Shows quality-gate.sh usage
|
|
356
|
+
|
|
357
|
+
**Step 6: Commit**
|
|
358
|
+
|
|
359
|
+
```bash
|
|
360
|
+
git add bin/act.js
|
|
361
|
+
git commit -m "feat: add bin/act.js CLI router for npm distribution"
|
|
362
|
+
```
|
|
363
|
+
|
|
364
|
+
### Task 3: Write test for CLI router
|
|
365
|
+
|
|
366
|
+
**Files:**
|
|
367
|
+
- Create: `scripts/tests/test-act-cli.sh`
|
|
368
|
+
|
|
369
|
+
**Step 1: Write the test**
|
|
370
|
+
|
|
371
|
+
```bash
|
|
372
|
+
#!/usr/bin/env bash
|
|
373
|
+
# Test bin/act.js — CLI router
|
|
374
|
+
set -euo pipefail
|
|
375
|
+
|
|
376
|
+
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
|
|
377
|
+
REPO_ROOT="$(cd "$SCRIPT_DIR/../.." && pwd)"
|
|
378
|
+
ACT="$REPO_ROOT/bin/act.js"
|
|
379
|
+
|
|
380
|
+
source "$SCRIPT_DIR/test-helpers.sh"
|
|
381
|
+
|
|
382
|
+
# --- Test 1: version ---
|
|
383
|
+
output=$(node "$ACT" version 2>&1)
|
|
384
|
+
assert_contains "act version prints version" "act v" "$output"
|
|
385
|
+
|
|
386
|
+
# --- Test 2: help ---
|
|
387
|
+
output=$(node "$ACT" help 2>&1)
|
|
388
|
+
assert_contains "act help shows usage" "Usage: act <command>" "$output"
|
|
389
|
+
assert_contains "act help lists plan command" "plan" "$output"
|
|
390
|
+
assert_contains "act help lists gate command" "gate" "$output"
|
|
391
|
+
|
|
392
|
+
# --- Test 3: unknown command exits non-zero ---
|
|
393
|
+
exit_code=0
|
|
394
|
+
node "$ACT" nonexistent-command >/dev/null 2>&1 || exit_code=$?
|
|
395
|
+
assert_eq "unknown command exits non-zero" "1" "$exit_code"
|
|
396
|
+
|
|
397
|
+
# --- Test 4: validate routes correctly ---
|
|
398
|
+
output=$(node "$ACT" validate --help 2>&1 || true)
|
|
399
|
+
assert_contains "validate routes to validate-all.sh" "validate" "$output"
|
|
400
|
+
|
|
401
|
+
# --- Test 5: lessons subcommand without sub shows usage ---
|
|
402
|
+
exit_code=0
|
|
403
|
+
output=$(node "$ACT" lessons 2>&1) || exit_code=$?
|
|
404
|
+
assert_eq "lessons without sub exits non-zero" "1" "$exit_code"
|
|
405
|
+
assert_contains "lessons shows usage hint" "Usage: act lessons" "$output"
|
|
406
|
+
|
|
407
|
+
report_results
|
|
408
|
+
```
|
|
409
|
+
|
|
410
|
+
**Step 2: Make executable and run**
|
|
411
|
+
|
|
412
|
+
```bash
|
|
413
|
+
chmod +x scripts/tests/test-act-cli.sh
|
|
414
|
+
```
|
|
415
|
+
|
|
416
|
+
Run: `cd ~/Documents/projects/autonomous-coding-toolkit && bash scripts/tests/test-act-cli.sh`
|
|
417
|
+
Expected: All tests PASS
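
The tests in this plan source `test-helpers.sh`, which is not reproduced here. As a rough sketch only, assuming the argument order visible in the calls above (description first, then expected value or needle, then actual value or haystack), the helpers could look something like this. The counter names `TESTS` and `FAILURES` are hypothetical, and the real file may differ.

```bash
#!/usr/bin/env bash
# Hypothetical stand-in for scripts/tests/test-helpers.sh (the real file is not shown).
TESTS=0
FAILURES=0

pass() {
  TESTS=$((TESTS + 1))
  echo "PASS: $1"
}

fail() {
  TESTS=$((TESTS + 1))
  FAILURES=$((FAILURES + 1))
  echo "FAIL: $1"
}

# assert_eq <description> <expected> <actual>
assert_eq() {
  if [[ "$2" == "$3" ]]; then
    pass "$1"
  else
    fail "$1 (expected '$2', got '$3')"
  fi
}

# assert_contains <description> <needle> <haystack>
assert_contains() {
  if [[ "$3" == *"$2"* ]]; then
    pass "$1"
  else
    fail "$1 (missing '$2')"
  fi
}

# Print a summary; return non-zero if any assertion failed.
report_results() {
  echo "Results: $((TESTS - FAILURES))/$TESTS passed"
  [[ "$FAILURES" -eq 0 ]]
}
```

If the real helpers also keep their counters in shell variables, assertions executed inside a `( ... )` subshell update a copy of those counters, so the parent shell's summary would not include them.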
|
|
418
|
+
|
|
419
|
+
**Step 3: Verify run-all-tests discovers it**
|
|
420
|
+
|
|
421
|
+
Run: `bash scripts/tests/run-all-tests.sh 2>&1 | tail -5`
|
|
422
|
+
Expected: test-act-cli.sh appears in the test list, all pass
|
|
423
|
+
|
|
424
|
+
**Step 4: Commit**
|
|
425
|
+
|
|
426
|
+
```bash
|
|
427
|
+
git add scripts/tests/test-act-cli.sh
|
|
428
|
+
git commit -m "test: add CLI router tests for bin/act.js"
|
|
429
|
+
```
|
|
430
|
+
|
|
431
|
+
---
|
|
432
|
+
|
|
433
|
+
## Batch 2: Project Bootstrapper (act init)
|
|
434
|
+
|
|
435
|
+
### Task 4: Write test for init.sh
|
|
436
|
+
|
|
437
|
+
**Files:**
|
|
438
|
+
- Create: `scripts/tests/test-init.sh`
|
|
439
|
+
|
|
440
|
+
**Step 1: Write the failing test**
|
|
441
|
+
|
|
442
|
+
```bash
|
|
443
|
+
#!/usr/bin/env bash
|
|
444
|
+
# Test scripts/init.sh — project bootstrapper
|
|
445
|
+
set -euo pipefail
|
|
446
|
+
|
|
447
|
+
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
|
|
448
|
+
REPO_ROOT="$(cd "$SCRIPT_DIR/../.." && pwd)"
|
|
449
|
+
INIT_SCRIPT="$REPO_ROOT/scripts/init.sh"
|
|
450
|
+
|
|
451
|
+
source "$SCRIPT_DIR/test-helpers.sh"
|
|
452
|
+
|
|
453
|
+
# --- Setup temp project ---
|
|
454
|
+
WORK=$(mktemp -d)
|
|
455
|
+
trap 'rm -rf "$WORK"' EXIT
|
|
456
|
+
cd "$WORK"
|
|
457
|
+
git init -q
|
|
458
|
+
|
|
459
|
+
# --- Test 1: init creates tasks/ directory ---
|
|
460
|
+
bash "$INIT_SCRIPT" --project-root "$WORK" 2>&1 || true
|
|
461
|
+
assert_eq "init creates tasks/ directory" "true" "$([ -d "$WORK/tasks" ] && echo true || echo false)"
|
|
462
|
+
|
|
463
|
+
# --- Test 2: init creates progress.txt ---
|
|
464
|
+
assert_eq "init creates progress.txt" "true" "$([ -f "$WORK/progress.txt" ] && echo true || echo false)"
|
|
465
|
+
|
|
466
|
+
# --- Test 3: init creates logs/ directory ---
|
|
467
|
+
assert_eq "init creates logs/ directory" "true" "$([ -d "$WORK/logs" ] && echo true || echo false)"
|
|
468
|
+
|
|
469
|
+
# --- Test 4: init detects project type ---
|
|
470
|
+
output=$(bash "$INIT_SCRIPT" --project-root "$WORK" 2>&1 || true)
|
|
471
|
+
assert_contains "init detects project type" "Detected:" "$output"
|
|
472
|
+
|
|
473
|
+
# --- Test 5: init with --quickstart copies quickstart plan ---
|
|
474
|
+
mkdir -p "$WORK/docs/plans"
|
|
475
|
+
bash "$INIT_SCRIPT" --project-root "$WORK" --quickstart 2>&1 || true
|
|
476
|
+
assert_eq "quickstart creates plan file" "true" "$([ -f "$WORK/docs/plans/quickstart.md" ] && echo true || echo false)"
|
|
477
|
+
|
|
478
|
+
# --- Test 6: init is idempotent ---
|
|
479
|
+
bash "$INIT_SCRIPT" --project-root "$WORK" 2>&1 || true
|
|
480
|
+
exit_code=0
|
|
481
|
+
bash "$INIT_SCRIPT" --project-root "$WORK" 2>&1 || exit_code=$?
|
|
482
|
+
assert_eq "init is idempotent (exit 0 on re-run)" "0" "$exit_code"
|
|
483
|
+
|
|
484
|
+
report_results
|
|
485
|
+
```
|
|
486
|
+
|
|
487
|
+
**Step 2: Make executable and verify it fails**
|
|
488
|
+
|
|
489
|
+
```bash
|
|
490
|
+
chmod +x scripts/tests/test-init.sh
|
|
491
|
+
```
|
|
492
|
+
|
|
493
|
+
Run: `cd ~/Documents/projects/autonomous-coding-toolkit && bash scripts/tests/test-init.sh 2>&1 | tail -3`
|
|
494
|
+
Expected: FAIL (init.sh doesn't exist yet)
|
|
495
|
+
|
|
496
|
+
### Task 5: Implement init.sh
|
|
497
|
+
|
|
498
|
+
**Files:**
|
|
499
|
+
- Create: `scripts/init.sh`
|
|
500
|
+
|
|
501
|
+
**Step 1: Write the implementation**
|
|
502
|
+
|
|
503
|
+
```bash
|
|
504
|
+
#!/usr/bin/env bash
|
|
505
|
+
# init.sh — Bootstrap a project for use with the Autonomous Coding Toolkit
|
|
506
|
+
#
|
|
507
|
+
# Usage: init.sh --project-root <dir> [--quickstart]
|
|
508
|
+
set -euo pipefail
|
|
509
|
+
|
|
510
|
+
SCRIPT_DIR="$(cd "$(dirname "$(readlink -f "${BASH_SOURCE[0]}")")" && pwd)"
|
|
511
|
+
TOOLKIT_ROOT="$(cd "$SCRIPT_DIR/.." && pwd)"
|
|
512
|
+
source "$SCRIPT_DIR/lib/common.sh"
|
|
513
|
+
|
|
514
|
+
PROJECT_ROOT=""
|
|
515
|
+
QUICKSTART=false
|
|
516
|
+
|
|
517
|
+
usage() {
|
|
518
|
+
cat <<'USAGE'
|
|
519
|
+
Usage: init.sh --project-root <dir> [--quickstart]
|
|
520
|
+
|
|
521
|
+
Bootstrap a project for the Autonomous Coding Toolkit.
|
|
522
|
+
|
|
523
|
+
Creates:
|
|
524
|
+
tasks/ — PRD and acceptance criteria
|
|
525
|
+
logs/ — Telemetry, routing decisions, failure patterns
|
|
526
|
+
progress.txt — Append-only discovery log
|
|
527
|
+
|
|
528
|
+
Options:
|
|
529
|
+
--project-root <dir> Project directory to initialize (required)
|
|
530
|
+
--quickstart Copy quickstart plan + run quality gate
|
|
531
|
+
--help, -h Show this help
|
|
532
|
+
|
|
533
|
+
USAGE
|
|
534
|
+
exit 0
|
|
535
|
+
}
|
|
536
|
+
|
|
537
|
+
while [[ $# -gt 0 ]]; do
|
|
538
|
+
case "$1" in
|
|
539
|
+
--project-root) PROJECT_ROOT="${2:-}"; shift 2 ;;
|
|
540
|
+
--quickstart) QUICKSTART=true; shift ;;
|
|
541
|
+
--help|-h) usage ;;
|
|
542
|
+
*) echo "init: unknown option: $1" >&2; exit 1 ;;
|
|
543
|
+
esac
|
|
544
|
+
done
|
|
545
|
+
|
|
546
|
+
if [[ -z "$PROJECT_ROOT" ]]; then
|
|
547
|
+
echo "init: --project-root is required" >&2
|
|
548
|
+
exit 1
|
|
549
|
+
fi
|
|
550
|
+
|
|
551
|
+
PROJECT_ROOT="$(cd "$PROJECT_ROOT" && pwd)"
|
|
552
|
+
|
|
553
|
+
echo "Autonomous Coding Toolkit — Project Init"
|
|
554
|
+
echo "========================================="
|
|
555
|
+
echo ""
|
|
556
|
+
|
|
557
|
+
# Detect project type
|
|
558
|
+
project_type=$(detect_project_type "$PROJECT_ROOT")
|
|
559
|
+
echo "Detected: $project_type project"
|
|
560
|
+
|
|
561
|
+
# Create directories
|
|
562
|
+
mkdir -p "$PROJECT_ROOT/tasks"
|
|
563
|
+
mkdir -p "$PROJECT_ROOT/logs"
|
|
564
|
+
mkdir -p "$PROJECT_ROOT/docs/plans"
|
|
565
|
+
echo "Created: tasks/, logs/, docs/plans/"
|
|
566
|
+
|
|
567
|
+
# Create progress.txt if missing
|
|
568
|
+
if [[ ! -f "$PROJECT_ROOT/progress.txt" ]]; then
|
|
569
|
+
echo "# Progress — $(basename "$PROJECT_ROOT")" > "$PROJECT_ROOT/progress.txt"
|
|
570
|
+
echo "# Append-only discovery log. Read at start of each batch." >> "$PROJECT_ROOT/progress.txt"
|
|
571
|
+
echo "" >> "$PROJECT_ROOT/progress.txt"
|
|
572
|
+
echo "Created: progress.txt"
|
|
573
|
+
else
|
|
574
|
+
echo "Exists: progress.txt (skipped)"
|
|
575
|
+
fi
|
|
576
|
+
|
|
577
|
+
# Detect language for scope tags
|
|
578
|
+
scope_lang=""
|
|
579
|
+
case "$project_type" in
|
|
580
|
+
python) scope_lang="language:python" ;;
|
|
581
|
+
node) scope_lang="language:javascript" ;;
|
|
582
|
+
bash) scope_lang="language:bash" ;;
|
|
583
|
+
*) scope_lang="" ;;
|
|
584
|
+
esac
|
|
585
|
+
|
|
586
|
+
# Print next steps
|
|
587
|
+
echo ""
|
|
588
|
+
echo "--- Next Steps ---"
|
|
589
|
+
echo ""
|
|
590
|
+
echo "1. Quality gate: act gate --project-root $PROJECT_ROOT"
|
|
591
|
+
echo "2. Run a plan: act plan docs/plans/your-plan.md"
|
|
592
|
+
|
|
593
|
+
if [[ -n "$scope_lang" ]]; then
|
|
594
|
+
echo ""
|
|
595
|
+
echo "Recommended: Add to your CLAUDE.md:"
|
|
596
|
+
echo " ## Scope Tags"
|
|
597
|
+
echo " $scope_lang"
|
|
598
|
+
fi
|
|
599
|
+
|
|
600
|
+
# Quickstart mode
|
|
601
|
+
if [[ "$QUICKSTART" == true ]]; then
|
|
602
|
+
echo ""
|
|
603
|
+
echo "--- Quickstart ---"
|
|
604
|
+
if [[ -f "$TOOLKIT_ROOT/examples/quickstart-plan.md" ]]; then
|
|
605
|
+
cp "$TOOLKIT_ROOT/examples/quickstart-plan.md" "$PROJECT_ROOT/docs/plans/quickstart.md"
|
|
606
|
+
echo "Copied: docs/plans/quickstart.md"
|
|
607
|
+
echo ""
|
|
608
|
+
echo "Run your first quality-gated execution:"
|
|
609
|
+
echo " act plan docs/plans/quickstart.md"
|
|
610
|
+
else
|
|
611
|
+
echo "WARNING: quickstart-plan.md not found in toolkit" >&2
|
|
612
|
+
fi
|
|
613
|
+
fi
|
|
614
|
+
|
|
615
|
+
echo ""
|
|
616
|
+
echo "Init complete."
|
|
617
|
+
```
|
|
618
|
+
|
|
619
|
+
**Step 2: Make executable**
|
|
620
|
+
|
|
621
|
+
```bash
|
|
622
|
+
chmod +x scripts/init.sh
|
|
623
|
+
```
|
|
624
|
+
|
|
625
|
+
**Step 3: Run the tests**
|
|
626
|
+
|
|
627
|
+
Run: `cd ~/Documents/projects/autonomous-coding-toolkit && bash scripts/tests/test-init.sh`
|
|
628
|
+
Expected: All tests PASS
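
`init.sh` depends on `detect_project_type` from `scripts/lib/common.sh`, which this plan does not show. Purely as an illustration of what marker-file detection might look like, here is a hypothetical stand-in. The function name `detect_project_type_sketch`, the marker files, and the `unknown` fallback are all assumptions; only the `python`, `node`, and `bash` values come from the `case` statement in `init.sh`.

```bash
#!/usr/bin/env bash
# Hypothetical sketch; the real detect_project_type lives in scripts/lib/common.sh.
detect_project_type_sketch() {
  local dir="$1"
  if [[ -f "$dir/pyproject.toml" || -f "$dir/setup.py" || -f "$dir/requirements.txt" ]]; then
    echo "python"
  elif [[ -f "$dir/package.json" ]]; then
    echo "node"
  elif compgen -G "$dir/*.sh" >/dev/null; then
    # At least one shell script at the project root
    echo "bash"
  else
    echo "unknown"
  fi
}
```

With a fallback like this, an empty repository such as the one the init test creates would still produce a `Detected:` line, which is all that Test 4 checks for.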
|
|
629
|
+
|
|
630
|
+
**Step 4: Commit**
|
|
631
|
+
|
|
632
|
+
```bash
|
|
633
|
+
git add scripts/init.sh scripts/tests/test-init.sh
|
|
634
|
+
git commit -m "feat: add init.sh project bootstrapper with quickstart mode"
|
|
635
|
+
```
|
|
636
|
+
|
|
637
|
+
---
|
|
638
|
+
|
|
639
|
+
## Batch 3: Portability Fixes
|
|
640
|
+
|
|
641
|
+
### Task 6: Fix hardcoded ~/.env in telegram.sh
|
|
642
|
+
|
|
643
|
+
**Files:**
|
|
644
|
+
- Modify: `scripts/lib/telegram.sh:9`
|
|
645
|
+
- Create: `scripts/tests/test-telegram-env.sh`
|
|
646
|
+
|
|
647
|
+
**Step 1: Write the failing test**
|
|
648
|
+
|
|
649
|
+
```bash
|
|
650
|
+
#!/usr/bin/env bash
|
|
651
|
+
# Test telegram.sh — ACT_ENV_FILE support
|
|
652
|
+
set -euo pipefail
|
|
653
|
+
|
|
654
|
+
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
|
|
655
|
+
REPO_ROOT="$(cd "$SCRIPT_DIR/../.." && pwd)"
|
|
656
|
+
|
|
657
|
+
source "$SCRIPT_DIR/test-helpers.sh"
|
|
658
|
+
|
|
659
|
+
# --- Setup ---
|
|
660
|
+
WORK=$(mktemp -d)
|
|
661
|
+
trap 'rm -rf "$WORK"' EXIT
|
|
662
|
+
|
|
663
|
+
# Create a fake .env
|
|
664
|
+
cat > "$WORK/test.env" <<'ENV'
|
|
665
|
+
TELEGRAM_BOT_TOKEN=test-token-123
|
|
666
|
+
TELEGRAM_CHAT_ID=test-chat-456
|
|
667
|
+
ENV
|
|
668
|
+
|
|
669
|
+
# --- Test 1: ACT_ENV_FILE overrides default ---
|
|
670
|
+
(
|
|
671
|
+
export ACT_ENV_FILE="$WORK/test.env"
|
|
672
|
+
source "$REPO_ROOT/scripts/lib/telegram.sh"
|
|
673
|
+
_load_telegram_env
|
|
674
|
+
assert_eq "ACT_ENV_FILE loads token" "test-token-123" "$TELEGRAM_BOT_TOKEN"
|
|
675
|
+
assert_eq "ACT_ENV_FILE loads chat id" "test-chat-456" "$TELEGRAM_CHAT_ID"
|
|
676
|
+
)
|
|
677
|
+
|
|
678
|
+
# --- Test 2: Explicit argument still works ---
|
|
679
|
+
(
|
|
680
|
+
source "$REPO_ROOT/scripts/lib/telegram.sh"
|
|
681
|
+
_load_telegram_env "$WORK/test.env"
|
|
682
|
+
assert_eq "Explicit arg loads token" "test-token-123" "$TELEGRAM_BOT_TOKEN"
|
|
683
|
+
)
|
|
684
|
+
|
|
685
|
+
# --- Test 3: Missing file returns error ---
|
|
686
|
+
(
|
|
687
|
+
source "$REPO_ROOT/scripts/lib/telegram.sh"
|
|
688
|
+
exit_code=0
|
|
689
|
+
_load_telegram_env "$WORK/nonexistent.env" 2>/dev/null || exit_code=$?
|
|
690
|
+
assert_eq "Missing env file returns 1" "1" "$exit_code"
|
|
691
|
+
)
|
|
692
|
+
|
|
693
|
+
report_results
|
|
694
|
+
```
|
|
695
|
+
|
|
696
|
+
**Step 2: Make executable and verify it fails**
|
|
697
|
+
|
|
698
|
+
```bash
|
|
699
|
+
chmod +x scripts/tests/test-telegram-env.sh
|
|
700
|
+
```
|
|
701
|
+
|
|
702
|
+
Run: `cd ~/Documents/projects/autonomous-coding-toolkit && bash scripts/tests/test-telegram-env.sh 2>&1 | tail -3`
|
|
703
|
+
Expected: Test 1 FAILS (ACT_ENV_FILE not recognized yet)
|
|
704
|
+
|
|
705
|
+
**Step 3: Fix telegram.sh**
|
|
706
|
+
|
|
707
|
+
In `scripts/lib/telegram.sh`, change line 9 from:
|
|
708
|
+
|
|
709
|
+
```bash
|
|
710
|
+
local env_file="${1:-$HOME/.env}"
|
|
711
|
+
```
|
|
712
|
+
|
|
713
|
+
to:
|
|
714
|
+
|
|
715
|
+
```bash
|
|
716
|
+
local env_file="${1:-${ACT_ENV_FILE:-$HOME/.env}}"
|
|
717
|
+
```
|
|
718
|
+
|
|
719
|
+
This adds `ACT_ENV_FILE` as an intermediate default — if set, it overrides `$HOME/.env`; if not, behavior is unchanged.
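
The precedence produced by the nested expansion can be seen in isolation with a small sketch. `resolve_env_file` is a hypothetical helper used only for illustration; it is not part of `telegram.sh`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: prints the env file that would be loaded.
# Precedence: explicit argument, then ACT_ENV_FILE, then $HOME/.env.
resolve_env_file() {
  local env_file="${1:-${ACT_ENV_FILE:-$HOME/.env}}"
  printf '%s\n' "$env_file"
}

# An explicit argument wins even when ACT_ENV_FILE is set:
#   ACT_ENV_FILE=/tmp/ci.env resolve_env_file /tmp/explicit.env
# With no argument, ACT_ENV_FILE is used when it is set and non-empty:
#   ACT_ENV_FILE=/tmp/ci.env resolve_env_file
```

Because `:-` treats an empty value the same as an unset one, an empty `ACT_ENV_FILE` also falls through to `$HOME/.env`.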
|
|
720
|
+
|
|
721
|
+
**Step 4: Run the tests**
|
|
722
|
+
|
|
723
|
+
Run: `bash scripts/tests/test-telegram-env.sh`
|
|
724
|
+
Expected: All tests PASS
|
|
725
|
+
|
|
726
|
+
**Step 5: Commit**
|
|
727
|
+
|
|
728
|
+
```bash
|
|
729
|
+
git add scripts/lib/telegram.sh scripts/tests/test-telegram-env.sh
|
|
730
|
+
git commit -m "fix: support ACT_ENV_FILE in telegram.sh for portable installs"
|
|
731
|
+
```
|
|
732
|
+
|
|
733
|
+
### Task 7: Add ACT_ENV_FILE support to ollama.sh
|
|
734
|
+
|
|
735
|
+
**Files:**
|
|
736
|
+
- Review: `scripts/lib/ollama.sh` (no modification expected; see Step 3)
|
|
737
|
+
|
|
738
|
+
**Step 1: Verify current behavior**
|
|
739
|
+
|
|
740
|
+
The ollama.sh module already uses env vars (`OLLAMA_DIRECT_URL`, `OLLAMA_QUEUE_URL`) with defaults. No hardcoded path to fix — the credentials (if any) come from the calling script's environment.
|
|
741
|
+
|
|
742
|
+
If `ACT_ENV_FILE` is set, the calling script (e.g., `auto-compound.sh`) should source it. This is not an ollama.sh change — it's a convention.
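
A calling script could follow that convention roughly as sketched below. `load_act_env` is a hypothetical name, and nothing here is confirmed to exist in `auto-compound.sh`. Sourcing executes the file as shell code, so the sketch assumes `ACT_ENV_FILE` points at a trusted file.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the ACT_ENV_FILE convention for calling scripts.
load_act_env() {
  local env_file="${ACT_ENV_FILE:-}"
  # Nothing to do when the variable is unset or empty.
  [[ -n "$env_file" ]] || return 0
  if [[ ! -f "$env_file" ]]; then
    echo "load_act_env: env file not found: $env_file" >&2
    return 1
  fi
  set -a   # export every variable assigned while sourcing
  # shellcheck disable=SC1090
  source "$env_file"
  set +a
}
```

Exporting with `set -a` means child processes started by the caller also see the loaded values, so library modules such as `ollama.sh` can keep reading plain environment variables.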
|
|
743
|
+
|
|
744
|
+
**Step 2: Verify no change needed**
|
|
745
|
+
|
|
746
|
+
Run: `grep -n 'HOME\|\.env' ~/Documents/projects/autonomous-coding-toolkit/scripts/lib/ollama.sh`
|
|
747
|
+
Expected: No matches (ollama.sh has no hardcoded paths)
|
|
748
|
+
|
|
749
|
+
**Step 3: Skip — no change needed**
|
|
750
|
+
|
|
751
|
+
ollama.sh is already portable. Document the `ACT_ENV_FILE` convention in init.sh output instead.
|
|
752
|
+
|
|
753
|
+
### Task 8: Add project-local lessons fallback to lesson-check.sh
|
|
754
|
+
|
|
755
|
+
**Files:**
|
|
756
|
+
- Modify: `scripts/lesson-check.sh:8`
|
|
757
|
+
- Create: `scripts/tests/test-lesson-local.sh`
|
|
758
|
+
|
|
759
|
+
**Step 1: Write the failing test**
|
|
760
|
+
|
|
761
|
+
```bash
|
|
762
|
+
#!/usr/bin/env bash
|
|
763
|
+
# Test lesson-check.sh — project-local lesson loading (Tier 3)
|
|
764
|
+
set -euo pipefail
|
|
765
|
+
|
|
766
|
+
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
|
|
767
|
+
REPO_ROOT="$(cd "$SCRIPT_DIR/../.." && pwd)"
|
|
768
|
+
LESSON_CHECK="$REPO_ROOT/scripts/lesson-check.sh"
|
|
769
|
+
|
|
770
|
+
source "$SCRIPT_DIR/test-helpers.sh"
|
|
771
|
+
|
|
772
|
+
# --- Setup: project with local lessons ---
|
|
773
|
+
WORK=$(mktemp -d)
|
|
774
|
+
trap 'rm -rf "$WORK"' EXIT
|
|
775
|
+
|
|
776
|
+
# Create a project-local lesson
|
|
777
|
+
mkdir -p "$WORK/docs/lessons"
|
|
778
|
+
cat > "$WORK/docs/lessons/0001-local-test.md" <<'LESSON'
|
|
779
|
+
---
|
|
780
|
+
id: "0001"
|
|
781
|
+
title: "Test local lesson"
|
|
782
|
+
severity: error
|
|
783
|
+
languages: [python]
|
|
784
|
+
scope: [universal]
|
|
785
|
+
category: testing
|
|
786
|
+
pattern:
|
|
787
|
+
type: syntactic
|
|
788
|
+
regex: "LOCALTEST_BAD_PATTERN"
|
|
789
|
+
fix: "Use LOCALTEST_GOOD_PATTERN instead"
|
|
790
|
+
positive_alternative: "LOCALTEST_GOOD_PATTERN"
|
|
791
|
+
---
|
|
792
|
+
LESSON
|
|
793
|
+
|
|
794
|
+
# Create a file that triggers the local lesson
|
|
795
|
+
cat > "$WORK/bad.py" <<'PY'
|
|
796
|
+
x = LOCALTEST_BAD_PATTERN
|
|
797
|
+
PY
|
|
798
|
+
|
|
799
|
+
# --- Test: project-local lesson is loaded ---
|
|
800
|
+
output=$(PROJECT_ROOT="$WORK" PROJECT_CLAUDE_MD="/dev/null" bash "$LESSON_CHECK" "$WORK/bad.py" 2>&1 || true)
|
|
801
|
+
if echo "$output" | grep -q 'lesson-1'; then
|
|
802
|
+
pass "Project-local lesson detected violation"
|
|
803
|
+
else
|
|
804
|
+
fail "Project-local lesson not loaded, got: $output"
|
|
805
|
+
fi
|
|
806
|
+
|
|
807
|
+
# --- Test: clean file passes with local lessons ---
|
|
808
|
+
cat > "$WORK/good.py" <<'PY'
|
|
809
|
+
x = LOCALTEST_GOOD_PATTERN
|
|
810
|
+
PY
|
|
811
|
+
|
|
812
|
+
exit_code=0
|
|
813
|
+
PROJECT_ROOT="$WORK" PROJECT_CLAUDE_MD="/dev/null" bash "$LESSON_CHECK" "$WORK/good.py" 2>/dev/null || exit_code=$?
|
|
814
|
+
assert_eq "Clean file passes with local lessons" "0" "$exit_code"
|
|
815
|
+
|
|
816
|
+
report_results
|
|
817
|
+
```
|
|
818
|
+
|
|
819
|
+
**Step 2: Make executable and verify it fails**
|
|
820
|
+
|
|
821
|
+
```bash
|
|
822
|
+
chmod +x scripts/tests/test-lesson-local.sh
|
|
823
|
+
```
|
|
824
|
+
|
|
825
|
+
Run: `cd ~/Documents/projects/autonomous-coding-toolkit && bash scripts/tests/test-lesson-local.sh 2>&1 | tail -3`
|
|
826
|
+
Expected: FAIL (project-local lessons not loaded yet)
|
|
827
|
+
|
|
828
|
+
**Step 3: Add project-local lesson loading**
|
|
829
|
+
|
|
830
|
+
In `scripts/lesson-check.sh`, after line 8 (`LESSONS_DIR=...`), add:
|
|
831
|
+
|
|
832
|
+
```bash
|
|
833
|
+
# Project-local lessons (Tier 3) — loaded alongside bundled lessons.
|
|
834
|
+
# Set PROJECT_ROOT to the project being checked for project-specific anti-patterns.
|
|
835
|
+
PROJECT_LESSONS_DIR=""
|
|
836
|
+
if [[ -n "${PROJECT_ROOT:-}" && -d "${PROJECT_ROOT}/docs/lessons" ]]; then
|
|
837
|
+
PROJECT_LESSONS_DIR="${PROJECT_ROOT}/docs/lessons"
|
|
838
|
+
fi
|
|
839
|
+
```
|
|
840
|
+
|
|
841
|
+
Then find the glob loop that loads lesson files (the line that iterates over `"$LESSONS_DIR"/[0-9]*.md`). After that loop completes, add a second loop for project-local lessons:
|
|
842
|
+
|
|
843
|
+
```bash
|
|
844
|
+
# Load project-local lessons (Tier 3)
|
|
845
|
+
if [[ -n "$PROJECT_LESSONS_DIR" ]]; then
|
|
846
|
+
for lesson_file in "$PROJECT_LESSONS_DIR"/[0-9]*.md; do
|
|
847
|
+
[[ -f "$lesson_file" ]] || continue
|
|
848
|
+
# Same parse_lesson + check logic as bundled lessons
|
|
849
|
+
# (reuse the same function — it's already defined)
|
|
850
|
+
done
|
|
851
|
+
fi
|
|
852
|
+
```
|
|
853
|
+
|
|
854
|
+
The exact insertion point depends on the lesson-check.sh structure. The implementer should read the full file to find where lessons are iterated and add the project-local loop after.
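
One way to avoid duplicating the per-lesson body is to gather files from both directories first and then run a single loop over the result. The sketch below assumes only the `[0-9]*.md` naming shown above. `collect_lesson_files` is a hypothetical helper, not something `lesson-check.sh` is known to define.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print lesson files from each given directory, in order.
collect_lesson_files() {
  local dir file
  for dir in "$@"; do
    # Skip empty arguments (e.g. an unset PROJECT_LESSONS_DIR) and missing directories.
    [[ -n "$dir" && -d "$dir" ]] || continue
    for file in "$dir"/[0-9]*.md; do
      # An unmatched glob stays literal, so confirm the file exists.
      [[ -f "$file" ]] || continue
      printf '%s\n' "$file"
    done
  done
}

# Usage sketch: bundled lessons first, then project-local ones.
# while IFS= read -r lesson_file; do
#   : # existing per-lesson parse and check logic goes here
# done < <(collect_lesson_files "$LESSONS_DIR" "$PROJECT_LESSONS_DIR")
```

Whether this fits depends on how the existing loop is written; if it relies on variables set inside the loop body, reading from process substitution as shown keeps the loop in the current shell.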
|
|
855
|
+
|
|
856
|
+
**Step 4: Run the tests**
|
|
857
|
+
|
|
858
|
+
Run: `bash scripts/tests/test-lesson-local.sh`
|
|
859
|
+
Expected: All tests PASS
|
|
860
|
+
|
|
861
|
+
Run: `bash scripts/tests/test-lesson-check.sh`
|
|
862
|
+
Expected: All existing tests still PASS (no regression)
|
|
863
|
+
|
|
864
|
+
**Step 5: Commit**
|
|
865
|
+
|
|
866
|
+
```bash
|
|
867
|
+
git add scripts/lesson-check.sh scripts/tests/test-lesson-local.sh
|
|
868
|
+
git commit -m "feat: support project-local lessons (Tier 3) in lesson-check.sh"
|
|
869
|
+
```
|
|
870
|
+
|
|
871
|
+
---
|
|
872
|
+
|
|
873
|
+
## Batch 4: README + npm Prep
|
|
874
|
+
|
|
875
|
+
### Task 9: Update README.md with npm install instructions
|
|
876
|
+
|
|
877
|
+
**Files:**
|
|
878
|
+
- Modify: `README.md`
|
|
879
|
+
|
|
880
|
+
**Step 1: Update installation section**
|
|
881
|
+
|
|
882
|
+
Replace the current Install section with:
|
|
883
|
+
|
|
884
|
+
````markdown
|
|
885
|
+
## Install
|
|
886
|
+
|
|
887
|
+
### npm (recommended)
|
|
888
|
+
|
|
889
|
+
```bash
|
|
890
|
+
npm install -g autonomous-coding-toolkit
|
|
891
|
+
```
|
|
892
|
+
|
|
893
|
+
This puts `act` on your PATH. Requires Node.js 18+ and bash 4+.
|
|
894
|
+
|
|
895
|
+
### Claude Code Plugin
|
|
896
|
+
|
|
897
|
+
```bash
|
|
898
|
+
# Add the marketplace source
|
|
899
|
+
/plugin marketplace add parthalon025/autonomous-coding-toolkit
|
|
900
|
+
|
|
901
|
+
# Install the plugin
|
|
902
|
+
/plugin install autonomous-coding-toolkit@autonomous-coding-toolkit
|
|
903
|
+
```
|
|
904
|
+
|
|
905
|
+
### From Source
|
|
906
|
+
|
|
907
|
+
```bash
|
|
908
|
+
git clone https://github.com/parthalon025/autonomous-coding-toolkit.git
|
|
909
|
+
cd autonomous-coding-toolkit
|
|
910
|
+
npm link # puts 'act' on PATH
|
|
911
|
+
```
|
|
912
|
+
|
|
913
|
+
> **Windows:** Requires [WSL](https://learn.microsoft.com/en-us/windows/wsl/install). Run `wsl --install`, then use the toolkit inside WSL.
|
|
914
|
+
````
|
|
915
|
+
|
|
916
|
+
**Step 2: Add Quick Start section for CLI**
|
|
917
|
+
|
|
918
|
+
Update the Quick Start section to include CLI commands alongside plugin commands:
|
|
919
|
+
|
|
920
|
+
````markdown
|
|
921
|
+
## Quick Start
|
|
922
|
+
|
|
923
|
+
```bash
|
|
924
|
+
# Bootstrap your project
|
|
925
|
+
act init --quickstart
|
|
926
|
+
|
|
927
|
+
# Full pipeline: brainstorm → plan → execute → verify → finish
|
|
928
|
+
/autocode "Add user authentication with JWT"
|
|
929
|
+
|
|
930
|
+
# Run a plan headless (fully autonomous, fresh context per batch)
|
|
931
|
+
act plan docs/plans/my-feature.md --on-failure retry --notify
|
|
932
|
+
|
|
933
|
+
# Quality check
|
|
934
|
+
act gate --project-root .
|
|
935
|
+
|
|
936
|
+
# See all commands
|
|
937
|
+
act help
|
|
938
|
+
|
|
939
|
+
```
|
|
940
|
+
````
|
|
940
|
+
|
|
941
|
+
**Step 3: Verify README renders correctly**
|
|
942
|
+
|
|
943
|
+
Run: `head -60 ~/Documents/projects/autonomous-coding-toolkit/README.md`
|
|
944
|
+
Expected: Updated installation and quick start sections visible
|
|
945
|
+
|
|
946
|
+
**Step 4: Commit**
|
|
947
|
+
|
|
948
|
+
```bash
|
|
949
|
+
git add README.md
|
|
950
|
+
git commit -m "docs: update README with npm install and CLI usage"
|
|
951
|
+
```
|
|
952
|
+
|
|
953
|
+
### Task 10: Add .npmignore
|
|
954
|
+
|
|
955
|
+
**Files:**
|
|
956
|
+
- Create: `.npmignore`
|
|
957
|
+
|
|
958
|
+
**Step 1: Create .npmignore**
|
|
959
|
+
|
|
960
|
+
```
|
|
961
|
+
# Development files
|
|
962
|
+
.worktrees/
|
|
963
|
+
.run-plan-state.json
|
|
964
|
+
progress.txt
|
|
965
|
+
logs/
|
|
966
|
+
tasks/
|
|
967
|
+
.claude/
|
|
968
|
+
.github/
|
|
969
|
+
research/
|
|
970
|
+
|
|
971
|
+
# Test fixtures (tests themselves ship for validation)
|
|
972
|
+
scripts/tests/fixtures/
|
|
973
|
+
|
|
974
|
+
# Git
|
|
975
|
+
.git/
|
|
976
|
+
.gitignore
|
|
977
|
+
```
|
|
978
|
+
|
|
979
|
+
**Step 2: Verify npm pack excludes dev files**
|
|
980
|
+
|
|
981
|
+
Run: `cd ~/Documents/projects/autonomous-coding-toolkit && npm pack --dry-run 2>&1 | grep -c 'run-plan-state\|\.worktrees\|research/'`
|
|
982
|
+
Expected: `0` (none of those files included)
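
Note that `grep -c` prints `0` but exits with status 1 when nothing matches, which matters if this check is ever scripted under `set -e` or `pipefail`. A small sketch of a wrapper that tolerates the zero-match case follows. `count_matches` is a hypothetical name, and it uses `-E` so the alternation does not depend on the GNU `\|` extension.

```bash
#!/usr/bin/env bash
# Hypothetical helper: count stdin lines matching an extended regex.
count_matches() {
  local pattern="$1" count
  # grep -c exits 1 on zero matches; keep the printed count and ignore that status.
  count=$(grep -cE -- "$pattern" || true)
  printf '%s\n' "$count"
}

# Example: count packaged paths that should have been excluded.
# npm pack --dry-run 2>&1 | count_matches 'run-plan-state|\.worktrees|research/'
```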
|
|
983
|
+
|
|
984
|
+
**Step 3: Commit**
|
|
985
|
+
|
|
986
|
+
```bash
|
|
987
|
+
git add .npmignore
|
|
988
|
+
git commit -m "chore: add .npmignore for clean npm packaging"
|
|
989
|
+
```
|
|
990
|
+
|
|
991
|
+
### Task 11: Verify full test suite passes
|
|
992
|
+
|
|
993
|
+
**Step 1: Run all tests**
|
|
994
|
+
|
|
995
|
+
Run: `cd ~/Documents/projects/autonomous-coding-toolkit && bash scripts/tests/run-all-tests.sh`
|
|
996
|
+
Expected: All tests PASS, including the 4 new test files (`test-act-cli.sh`, `test-init.sh`, `test-telegram-env.sh`, `test-lesson-local.sh`)
|
|
997
|
+
|
|
998
|
+
**Step 2: Run quality gate on self**
|
|
999
|
+
|
|
1000
|
+
Run: `bash scripts/quality-gate.sh --project-root ~/Documents/projects/autonomous-coding-toolkit`
|
|
1001
|
+
Expected: ALL PASSED
|
|
1002
|
+
|
|
1003
|
+
---
|
|
1004
|
+
|
|
1005
|
+
## Batch 5: Telemetry Script (P1)
|
|
1006
|
+
|
|
1007
|
+
### Task 12: Write tests for telemetry.sh
|
|
1008
|
+
|
|
1009
|
+
**Files:**
|
|
1010
|
+
- Create: `scripts/tests/test-telemetry.sh`
|
|
1011
|
+
|
|
1012
|
+
**Step 1: Write the tests**
|
|
1013
|
+
|
|
1014
|
+
```bash
|
|
1015
|
+
#!/usr/bin/env bash
|
|
1016
|
+
# Test scripts/telemetry.sh — telemetry capture, show, export, reset
|
|
1017
|
+
set -euo pipefail
|
|
1018
|
+
|
|
1019
|
+
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
|
|
1020
|
+
REPO_ROOT="$(cd "$SCRIPT_DIR/../.." && pwd)"
|
|
1021
|
+
TELEMETRY="$REPO_ROOT/scripts/telemetry.sh"
|
|
1022
|
+
|
|
1023
|
+
source "$SCRIPT_DIR/test-helpers.sh"
|
|
1024
|
+
|
|
1025
|
+
# --- Setup ---
|
|
1026
|
+
WORK=$(mktemp -d)
|
|
1027
|
+
trap 'rm -rf "$WORK"' EXIT
|
|
1028
|
+
mkdir -p "$WORK/logs"
|
|
1029
|
+
|
|
1030
|
+
# --- Test 1: record writes to telemetry.jsonl ---
|
|
1031
|
+
bash "$TELEMETRY" record --project-root "$WORK" \
|
|
1032
|
+
--batch-number 1 --passed true --strategy superpowers \
|
|
1033
|
+
--duration 120 --cost 0.42 --test-delta 5 2>&1 || true
|
|
1034
|
+
assert_eq "record creates telemetry.jsonl" "true" \
|
|
1035
|
+
"$([ -f "$WORK/logs/telemetry.jsonl" ] && echo true || echo false)"
|
|
1036
|
+
|
|
1037
|
+
# --- Test 2: record appends valid JSON ---
|
|
1038
|
+
line=$(head -1 "$WORK/logs/telemetry.jsonl")
|
|
1039
|
+
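json_status=0; echo "$line" | jq . >/dev/null 2>&1 || json_status=$?  # capture status so set -e does not abort before the assertion
|
|
1040
|
+
assert_eq "record writes valid JSON" "0" "$json_status"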
|
|
1041
|
+
|
|
1042
|
+
# --- Test 3: show produces dashboard output ---
|
|
1043
|
+
output=$(bash "$TELEMETRY" show --project-root "$WORK" 2>&1 || true)
|
|
1044
|
+
assert_contains "show displays header" "Telemetry Dashboard" "$output"
|
|
1045
|
+
|
|
1046
|
+
# --- Test 4: export produces anonymized output ---
|
|
1047
|
+
bash "$TELEMETRY" export --project-root "$WORK" > "$WORK/export.json" 2>&1 || true
|
|
1048
|
+
assert_eq "export creates output" "true" "$([ -s "$WORK/export.json" ] && echo true || echo false)"
|
|
1049
|
+
|
|
1050
|
+
# --- Test 5: reset clears telemetry ---
|
|
1051
|
+
bash "$TELEMETRY" reset --project-root "$WORK" --yes 2>&1 || true
|
|
1052
|
+
if [[ -f "$WORK/logs/telemetry.jsonl" ]]; then
|
|
1053
|
+
line_count=$(wc -l < "$WORK/logs/telemetry.jsonl")
|
|
1054
|
+
assert_eq "reset clears telemetry" "0" "$line_count"
|
|
1055
|
+
else
|
|
1056
|
+
pass "reset removes telemetry file"
|
|
1057
|
+
fi
|
|
1058
|
+
|
|
1059
|
+
report_results
|
|
1060
|
+
```
|
|
1061
|
+
|
|
1062
|
+
**Step 2: Make executable and verify it fails**
|
|
1063
|
+
|
|
1064
|
+
```bash
|
|
1065
|
+
chmod +x scripts/tests/test-telemetry.sh
|
|
1066
|
+
```
|
|
1067
|
+
|
|
1068
|
+
Run: `cd ~/Documents/projects/autonomous-coding-toolkit && bash scripts/tests/test-telemetry.sh 2>&1 | tail -3`
|
|
1069
|
+
Expected: FAIL (telemetry.sh doesn't exist yet)
|
|
1070
|
+
|
|
1071
|
+
### Task 13: Implement telemetry.sh
|
|
1072
|
+
|
|
1073
|
+
**Files:**
|
|
1074
|
+
- Create: `scripts/telemetry.sh`
|
|
1075
|
+
|
|
1076
|
+
**Step 1: Write the implementation**
|
|
1077
|
+
|
|
1078
|
+
```bash
|
|
1079
|
+
#!/usr/bin/env bash
|
|
1080
|
+
# telemetry.sh — Local telemetry capture, dashboard, export, and import
|
|
1081
|
+
#
|
|
1082
|
+
# Usage:
|
|
1083
|
+
# telemetry.sh record --project-root <dir> [--batch-number N] [--passed true|false] ...
|
|
1084
|
+
# telemetry.sh show --project-root <dir>
|
|
1085
|
+
# telemetry.sh export --project-root <dir>
|
|
1086
|
+
# telemetry.sh import --project-root <dir> <file>
|
|
1087
|
+
# telemetry.sh reset --project-root <dir> --yes
|
|
1088
|
+
set -euo pipefail
|
|
1089
|
+
|
|
1090
|
+
SCRIPT_DIR="$(cd "$(dirname "$(readlink -f "${BASH_SOURCE[0]}")")" && pwd)"
|
|
1091
|
+
source "$SCRIPT_DIR/lib/common.sh"
|
|
1092
|
+
|
|
1093
|
+
PROJECT_ROOT=""
|
|
1094
|
+
SUBCOMMAND=""
|
|
1095
|
+
|
|
1096
|
+
# --- Parse top-level ---
|
|
1097
|
+
SUBCOMMAND="${1:-}"
|
|
1098
|
+
shift || true
|
|
1099
|
+
|
|
1100
|
+
# Parse remaining args
|
|
1101
|
+
BATCH_NUMBER=""
|
|
1102
|
+
PASSED=""
|
|
1103
|
+
STRATEGY=""
|
|
1104
|
+
DURATION=""
|
|
1105
|
+
COST=""
|
|
1106
|
+
TEST_DELTA=""
|
|
1107
|
+
LESSONS_TRIGGERED=""
|
|
1108
|
+
PLAN_QUALITY=""
|
|
1109
|
+
BATCH_TYPE=""
|
|
1110
|
+
CONFIRM_YES=false
|
|
1111
|
+
|
|
1112
|
+
while [[ $# -gt 0 ]]; do
|
|
1113
|
+
case "$1" in
|
|
1114
|
+
--project-root) PROJECT_ROOT="${2:-}"; shift 2 ;;
|
|
1115
|
+
--batch-number) BATCH_NUMBER="${2:-}"; shift 2 ;;
|
|
1116
|
+
--passed) PASSED="${2:-}"; shift 2 ;;
|
|
1117
|
+
--strategy) STRATEGY="${2:-}"; shift 2 ;;
|
|
1118
|
+
--duration) DURATION="${2:-}"; shift 2 ;;
|
|
1119
|
+
--cost) COST="${2:-}"; shift 2 ;;
|
|
1120
|
+
--test-delta) TEST_DELTA="${2:-}"; shift 2 ;;
|
|
1121
|
+
--lessons-triggered) LESSONS_TRIGGERED="${2:-}"; shift 2 ;;
|
|
1122
|
+
--plan-quality) PLAN_QUALITY="${2:-}"; shift 2 ;;
|
|
1123
|
+
--batch-type) BATCH_TYPE="${2:-}"; shift 2 ;;
|
|
1124
|
+
--yes) CONFIRM_YES=true; shift ;;
|
|
1125
|
+
--help|-h) echo "Usage: telemetry.sh <record|show|export|import|reset> --project-root <dir> [options]"; exit 0 ;;
|
|
1126
|
+
*)
|
|
1127
|
+
# Positional arg (for import file)
|
|
1128
|
+
if [[ -z "${IMPORT_FILE:-}" ]]; then
|
|
1129
|
+
IMPORT_FILE="$1"
|
|
1130
|
+
fi
|
|
1131
|
+
shift ;;
|
|
1132
|
+
esac
|
|
1133
|
+
done
|
|
1134
|
+
|
|
1135
|
+
if [[ -z "$PROJECT_ROOT" ]]; then
|
|
1136
|
+
echo "telemetry: --project-root is required" >&2
|
|
1137
|
+
exit 1
|
|
1138
|
+
fi
|
|
1139
|
+
|
|
1140
|
+
TELEMETRY_FILE="$PROJECT_ROOT/logs/telemetry.jsonl"
|
|
1141
|
+
|
|
1142
|
+
case "$SUBCOMMAND" in
|
|
1143
|
+
record)
|
|
1144
|
+
mkdir -p "$PROJECT_ROOT/logs"
|
|
1145
|
+
jq -n -c \
|
|
1146
|
+
--arg ts "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
|
|
1147
|
+
--arg bn "${BATCH_NUMBER:-0}" \
|
|
1148
|
+
--arg passed "${PASSED:-false}" \
|
|
1149
|
+
--arg strategy "${STRATEGY:-unknown}" \
|
|
1150
|
+
--arg duration "${DURATION:-0}" \
|
|
1151
|
+
--arg cost "${COST:-0}" \
|
|
1152
|
+
--arg td "${TEST_DELTA:-0}" \
|
|
1153
|
+
--arg lt "${LESSONS_TRIGGERED:-}" \
|
|
1154
|
+
--arg pq "${PLAN_QUALITY:-}" \
|
|
1155
|
+
--arg bt "${BATCH_TYPE:-unknown}" \
|
|
1156
|
+
--arg pt "$(detect_project_type "$PROJECT_ROOT")" \
|
|
1157
|
+
'{
|
|
1158
|
+
timestamp: $ts,
|
|
1159
|
+
project_type: $pt,
|
|
1160
|
+
batch_type: $bt,
|
|
1161
|
+
batch_number: ($bn | tonumber),
|
|
1162
|
+
passed_gate: ($passed == "true"),
|
|
1163
|
+
strategy: $strategy,
|
|
1164
|
+
duration_seconds: ($duration | tonumber),
|
|
1165
|
+
cost_usd: ($cost | tonumber),
|
|
1166
|
+
test_count_delta: ($td | tonumber),
|
|
1167
|
+
lessons_triggered: (if $lt == "" then [] else ($lt | split(",")) end),
|
|
1168
|
+
plan_quality_score: (if $pq == "" then null else ($pq | tonumber) end)
|
|
1169
|
+
}' >> "$TELEMETRY_FILE"
|
|
1170
|
+
echo "telemetry: recorded batch $BATCH_NUMBER"
|
|
1171
|
+
;;
|
|
1172
|
+
|
|
1173
|
+
show)
|
|
1174
|
+
echo "Autonomous Coding Toolkit — Telemetry Dashboard"
|
|
1175
|
+
echo "════════════════════════════════════════════════"
|
|
1176
|
+
echo ""
|
|
1177
|
+
|
|
1178
|
+
if [[ ! -f "$TELEMETRY_FILE" ]] || [[ ! -s "$TELEMETRY_FILE" ]]; then
|
|
1179
|
+
echo "No telemetry data yet. Run some batches first."
|
|
1180
|
+
exit 0
|
|
1181
|
+
fi
|
|
1182
|
+
|
|
1183
|
+
# Summary stats
|
|
1184
|
+
total=$(wc -l < "$TELEMETRY_FILE")
|
|
1185
|
+
passed=$(jq -s '[.[] | select(.passed_gate == true)] | length' "$TELEMETRY_FILE")
|
|
1186
|
+
total_cost=$(jq -s '[.[].cost_usd] | add // 0' "$TELEMETRY_FILE")
|
|
1187
|
+
total_duration=$(jq -s '[.[].duration_seconds] | add // 0' "$TELEMETRY_FILE")
|
|
1188
|
+
avg_cost=$(jq -s 'if length > 0 then ([.[].cost_usd] | add) / length else 0 end' "$TELEMETRY_FILE")
|
|
1189
|
+
|
|
1190
|
+
echo "Runs: $total batches"
|
|
1191
|
+
if [[ "$total" -gt 0 ]]; then
|
|
1192
|
+
pct=$((passed * 100 / total))
|
|
1193
|
+
echo "Success rate: ${pct}% ($passed/$total passed gate on first attempt)"
|
|
1194
|
+
fi
|
|
1195
|
+
printf "Total cost: \$%.2f (\$%.2f/batch average)\n" "$total_cost" "$avg_cost"
|
|
1196
|
+
hours=$(awk "BEGIN {printf \"%.1f\", $total_duration / 3600}")
|
|
1197
|
+
echo "Total time: ${hours} hours"
|
|
1198
|
+
|
|
1199
|
+
# Strategy performance
|
|
1200
|
+
echo ""
|
|
1201
|
+
echo "Strategy Performance:"
|
|
1202
|
+
jq -rs '
|
|
1203
|
+
group_by(.strategy) | .[] |
|
|
1204
|
+
{
|
|
1205
|
+
strategy: .[0].strategy,
|
|
1206
|
+
wins: [.[] | select(.passed_gate == true)] | length,
|
|
1207
|
+
total: length
|
|
1208
|
+
} |
|
|
1209
|
+
" \(.strategy): \(.wins)/\(.total) (\(if .total > 0 then (.wins * 100 / .total | floor) else 0 end)% win rate)"
|
|
1210
|
+
' "$TELEMETRY_FILE" 2>/dev/null || echo " (no strategy data)"
|
|
1211
|
+
|
|
1212
|
+
# Top lesson hits
|
|
1213
|
+
echo ""
|
|
1214
|
+
echo "Top Lesson Hits:"
|
|
1215
|
+
jq -rs '
|
|
1216
|
+
[.[].lessons_triggered | arrays | .[]] |
|
|
1217
|
+
group_by(.) | map({lesson: .[0], count: length}) |
|
|
1218
|
+
sort_by(-.count) | .[:5] |
|
|
1219
|
+
.[] | " \(.lesson): \(.count) hits"
|
|
1220
|
+
' "$TELEMETRY_FILE" 2>/dev/null || echo " (no lesson data)"
|
|
1221
|
+
;;
|
|
1222
|
+
|
|
1223
|
+
export)
|
|
1224
|
+
if [[ ! -f "$TELEMETRY_FILE" ]]; then
|
|
1225
|
+
echo "No telemetry data to export." >&2
|
|
1226
|
+
exit 1
|
|
1227
|
+
fi
|
|
1228
|
+
# Anonymize: drop timestamps and batch numbers; records contain no file paths
|
|
1229
|
+
jq -s '
|
|
1230
|
+
[.[] | {
|
|
1231
|
+
project_type,
|
|
1232
|
+
batch_type,
|
|
1233
|
+
passed_gate,
|
|
1234
|
+
strategy,
|
|
1235
|
+
duration_seconds,
|
|
1236
|
+
cost_usd,
|
|
1237
|
+
test_count_delta,
|
|
1238
|
+
lessons_triggered,
|
|
1239
|
+
plan_quality_score
|
|
1240
|
+
}]
|
|
1241
|
+
' "$TELEMETRY_FILE"
|
|
1242
|
+
;;
|
|
1243
|
+
|
|
1244
|
+
import)
|
|
1245
|
+
if [[ -z "${IMPORT_FILE:-}" || ! -f "${IMPORT_FILE:-}" ]]; then
|
|
1246
|
+
echo "telemetry: import requires a file argument" >&2
|
|
1247
|
+
exit 1
|
|
1248
|
+
fi
|
|
1249
|
+
echo "telemetry: import not yet implemented (planned for community sync)"
|
|
1250
|
+
;;
|
|
1251
|
+
|
|
1252
|
+
reset)
|
|
1253
|
+
if [[ "$CONFIRM_YES" != true ]]; then
|
|
1254
|
+
echo "telemetry: use --yes to confirm reset" >&2
|
|
1255
|
+
exit 1
|
|
1256
|
+
fi
|
|
1257
|
+
if [[ -f "$TELEMETRY_FILE" ]]; then
|
|
1258
|
+
> "$TELEMETRY_FILE"
|
|
1259
|
+
echo "telemetry: cleared $TELEMETRY_FILE"
|
|
1260
|
+
else
|
|
1261
|
+
echo "telemetry: no telemetry file to reset"
|
|
1262
|
+
fi
|
|
1263
|
+
;;
|
|
1264
|
+
|
|
1265
|
+
*)
|
|
1266
|
+
echo "Usage: telemetry.sh <record|show|export|import|reset> --project-root <dir>" >&2
|
|
1267
|
+
exit 1
|
|
1268
|
+
;;
|
|
1269
|
+
esac
|
|
1270
|
+
```
|
|
1271
|
+
|
|
1272
|
+
**Step 2: Make executable**
|
|
1273
|
+
|
|
1274
|
+
```bash
|
|
1275
|
+
chmod +x scripts/telemetry.sh
|
|
1276
|
+
```
|
|
1277
|
+
|
|
1278
|
+
**Step 3: Run the tests**
|
|
1279
|
+
|
|
1280
|
+
Run: `cd ~/Documents/projects/autonomous-coding-toolkit && bash scripts/tests/test-telemetry.sh`
|
|
1281
|
+
Expected: All tests PASS
|
|
1282
|
+
|
|
1283
|
+
**Step 4: Commit**
|
|
1284
|
+
|
|
1285
|
+
```bash
|
|
1286
|
+
git add scripts/telemetry.sh scripts/tests/test-telemetry.sh
|
|
1287
|
+
git commit -m "feat: add telemetry.sh — capture, dashboard, export, reset"
|
|
1288
|
+
```
|
|
1289
|
+
|
|
1290
|
+
---
|
|
1291
|
+
|
|
1292
|
+
## Batch 6: Telemetry Integration in Quality Gate
|
|
1293
|
+
|
|
1294
|
+
### Task 14: Add telemetry capture to quality-gate.sh
|
|
1295
|
+
|
|
1296
|
+
**Files:**
|
|
1297
|
+
- Modify: `scripts/quality-gate.sh:248` (after "ALL PASSED")
|
|
1298
|
+
|
|
1299
|
+
**Step 1: Add telemetry capture after the final echo**
|
|
1300
|
+
|
|
1301
|
+
Before the `exit 0` at the end of quality-gate.sh, add telemetry recording:
|
|
1302
|
+
|
|
1303
|
+
```bash
|
|
1304
|
+
# === Telemetry capture (append batch result) ===
|
|
1305
|
+
# Only record if TELEMETRY_BATCH_NUMBER is set (called from run-plan context)
|
|
1306
|
+
if [[ -n "${TELEMETRY_BATCH_NUMBER:-}" ]]; then
|
|
1307
|
+
"$SCRIPT_DIR/telemetry.sh" record \
|
|
1308
|
+
--project-root "$PROJECT_ROOT" \
|
|
1309
|
+
--batch-number "${TELEMETRY_BATCH_NUMBER}" \
|
|
1310
|
+
--passed true \
|
|
1311
|
+
--strategy "${TELEMETRY_STRATEGY:-unknown}" \
|
|
1312
|
+
--duration "${TELEMETRY_DURATION:-0}" \
|
|
1313
|
+
--cost "${TELEMETRY_COST:-0}" \
|
|
1314
|
+
--test-delta "${TELEMETRY_TEST_DELTA:-0}" \
|
|
1315
|
+
--batch-type "${TELEMETRY_BATCH_TYPE:-unknown}" \
|
|
1316
|
+
2>/dev/null || true # Never fail the gate for telemetry errors
|
|
1317
|
+
fi
|
|
1318
|
+
```
|
|
1319
|
+
|
|
1320
|
+
Recording is conditional: it runs only when `TELEMETRY_BATCH_NUMBER` is set by the calling script (run-plan.sh). When quality-gate.sh is called standalone, it behaves exactly as before.
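For context, the caller side might look like the following. This is a minimal sketch, assuming run-plan.sh tracks a start and end epoch per batch; the `telemetry_env` helper and its arguments are hypothetical and not part of the toolkit.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not in the toolkit): print the TELEMETRY_* assignments
# that a caller such as run-plan.sh could export before invoking quality-gate.sh.
telemetry_env() {
  local batch_number="$1" start_epoch="$2" end_epoch="$3"
  local strategy="${4:-unknown}"
  printf 'TELEMETRY_BATCH_NUMBER=%s\n' "$batch_number"
  printf 'TELEMETRY_STRATEGY=%s\n' "$strategy"
  # Duration is the wall-clock difference in whole seconds.
  printf 'TELEMETRY_DURATION=%s\n' "$((end_epoch - start_epoch))"
}

# Example: batch 3 ran from epoch 100 to 160 under a strategy labelled "team".
telemetry_env 3 100 160 team
```

A caller could export each printed assignment before running the gate. Leaving `TELEMETRY_BATCH_NUMBER` unset keeps the standalone behaviour.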
|
|
1321
|
+
|
|
1322
|
+
**Step 2: Verify quality gate still passes standalone**
|
|
1323
|
+
|
|
1324
|
+
Run: `bash scripts/quality-gate.sh --project-root ~/Documents/projects/autonomous-coding-toolkit --quick`
|
|
1325
|
+
Expected: ALL PASSED (no telemetry vars set, so telemetry capture is silently skipped)
|
|
1326
|
+
|
|
1327
|
+
**Step 3: Verify telemetry records when vars are set**
|
|
1328
|
+
|
|
1329
|
+
```bash
|
|
1330
|
+
WORK=$(mktemp -d)
|
|
1331
|
+
mkdir -p "$WORK/logs"
|
|
1332
|
+
git -C "$WORK" init -q
|
|
1333
|
+
TELEMETRY_BATCH_NUMBER=1 TELEMETRY_STRATEGY=test \
|
|
1334
|
+
bash scripts/quality-gate.sh --project-root "$WORK" --quick 2>&1 | tail -3
|
|
1335
|
+
cat "$WORK/logs/telemetry.jsonl" 2>/dev/null || echo "(no telemetry)"
|
|
1336
|
+
rm -rf "$WORK"
|
|
1337
|
+
```
|
|
1338
|
+
|
|
1339
|
+
Expected: quality gate passes and telemetry.jsonl has one line
|
|
1340
|
+
|
|
1341
|
+
**Step 4: Commit**
|
|
1342
|
+
|
|
1343
|
+
```bash
|
|
1344
|
+
git add scripts/quality-gate.sh
|
|
1345
|
+
git commit -m "feat: integrate telemetry capture into quality gate pipeline"
|
|
1346
|
+
```
|
|
1347
|
+
|
|
1348
|
+
### Task 15: Run full test suite
|
|
1349
|
+
|
|
1350
|
+
**Step 1: Run all tests**
|
|
1351
|
+
|
|
1352
|
+
Run: `cd ~/Documents/projects/autonomous-coding-toolkit && bash scripts/tests/run-all-tests.sh`
|
|
1353
|
+
Expected: All tests PASS
|
|
1354
|
+
|
|
1355
|
+
**Step 2: Run quality gate**
|
|
1356
|
+
|
|
1357
|
+
Run: `bash scripts/quality-gate.sh --project-root ~/Documents/projects/autonomous-coding-toolkit`
|
|
1358
|
+
Expected: ALL PASSED
|
|
1359
|
+
|
|
1360
|
+
---
|
|
1361
|
+
|
|
1362
|
+
## Batch 7: Benchmark Suite (P1)
|
|
1363
|
+
|
|
1364
|
+
### Task 16: Create benchmark directory structure
|
|
1365
|
+
|
|
1366
|
+
**Files:**
|
|
1367
|
+
- Create: `benchmarks/runner.sh`
|
|
1368
|
+
- Create: `benchmarks/tasks/01-rest-endpoint/task.md`
|
|
1369
|
+
- Create: `benchmarks/tasks/01-rest-endpoint/rubric.sh`
|
|
1370
|
+
|
|
1371
|
+
**Step 1: Create directories**
|
|
1372
|
+
|
|
1373
|
+
```bash
|
|
1374
|
+
mkdir -p benchmarks/tasks/01-rest-endpoint
|
|
1375
|
+
mkdir -p benchmarks/tasks/02-refactor-module
|
|
1376
|
+
mkdir -p benchmarks/tasks/03-fix-integration-bug
|
|
1377
|
+
mkdir -p benchmarks/tasks/04-add-test-coverage
|
|
1378
|
+
mkdir -p benchmarks/tasks/05-multi-file-feature
|
|
1379
|
+
mkdir -p benchmarks/rubrics
|
|
1380
|
+
```
|
|
1381
|
+
|
|
1382
|
+
**Step 2: Write benchmark runner**
|
|
1383
|
+
|
|
1384
|
+
```bash
|
|
1385
|
+
#!/usr/bin/env bash
|
|
1386
|
+
# runner.sh — Benchmark orchestrator for the Autonomous Coding Toolkit
|
|
1387
|
+
#
|
|
1388
|
+
# Usage:
|
|
1389
|
+
# runner.sh run [task-name] Run all or one benchmark
|
|
1390
|
+
# runner.sh compare <a> <b> Compare two result files
|
|
1391
|
+
# runner.sh list List available benchmarks
|
|
1392
|
+
set -euo pipefail
|
|
1393
|
+
|
|
1394
|
+
SCRIPT_DIR="$(cd "$(dirname "$(readlink -f "${BASH_SOURCE[0]}")")" && pwd)"
|
|
1395
|
+
TASKS_DIR="$SCRIPT_DIR/tasks"
|
|
1396
|
+
RESULTS_DIR="${BENCHMARK_RESULTS_DIR:-$SCRIPT_DIR/results}"
|
|
1397
|
+
|
|
1398
|
+
usage() {
|
|
1399
|
+
cat <<'USAGE'
|
|
1400
|
+
Usage: runner.sh <run|compare|list> [options]
|
|
1401
|
+
|
|
1402
|
+
Commands:
|
|
1403
|
+
run [name] Run all benchmarks, or a specific one by directory name
|
|
1404
|
+
compare <a> <b> Compare two result JSON files
|
|
1405
|
+
list List available benchmark tasks
|
|
1406
|
+
|
|
1407
|
+
Options:
|
|
1408
|
+
--help, -h Show this help
|
|
1409
|
+
|
|
1410
|
+
Results are saved to benchmarks/results/ (gitignored).
|
|
1411
|
+
USAGE
|
|
1412
|
+
exit 0
|
|
1413
|
+
}
|
|
1414
|
+
|
|
1415
|
+
SUBCOMMAND="${1:-}"
|
|
1416
|
+
shift || true
|
|
1417
|
+
|
|
1418
|
+
case "$SUBCOMMAND" in
|
|
1419
|
+
list)
|
|
1420
|
+
echo "Available benchmarks:"
|
|
1421
|
+
for task_dir in "$TASKS_DIR"/*/; do
|
|
1422
|
+
[[ -d "$task_dir" ]] || continue
|
|
1423
|
+
name=$(basename "$task_dir")
|
|
1424
|
+
desc=""
|
|
1425
|
+
if [[ -f "$task_dir/task.md" ]]; then
|
|
1426
|
+
desc=$(head -1 "$task_dir/task.md" | sed 's/^# //')
|
|
1427
|
+
fi
|
|
1428
|
+
echo " $name — $desc"
|
|
1429
|
+
done
|
|
1430
|
+
;;
|
|
1431
|
+
|
|
1432
|
+
run)
|
|
1433
|
+
TARGET="${1:-all}"
|
|
1434
|
+
mkdir -p "$RESULTS_DIR"
|
|
1435
|
+
timestamp=$(date -u +%Y%m%dT%H%M%SZ)
|
|
1436
|
+
|
|
1437
|
+
run_benchmark() {
|
|
1438
|
+
local task_dir="$1"
|
|
1439
|
+
local name
name=$(basename "$task_dir")
|
|
1440
|
+
echo "=== Benchmark: $name ==="
|
|
1441
|
+
|
|
1442
|
+
if [[ ! -f "$task_dir/rubric.sh" ]]; then
|
|
1443
|
+
echo " SKIP: no rubric.sh found"
|
|
1444
|
+
return
|
|
1445
|
+
fi
|
|
1446
|
+
|
|
1447
|
+
local score=0
|
|
1448
|
+
local total=0
|
|
1449
|
+
local pass=0
|
|
1450
|
+
|
|
1451
|
+
# Run rubric — each line of output is "PASS: desc" or "FAIL: desc"
|
|
1452
|
+
while IFS= read -r line; do
|
|
1453
|
+
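# Count only rubric verdict lines; other output (e.g. test-runner noise) is ignored
[[ "$line" == PASS:* || "$line" == FAIL:* ]] && total=$((total + 1))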
|
|
1454
|
+
if [[ "$line" == PASS:* ]]; then
|
|
1455
|
+
pass=$((pass + 1))
|
|
1456
|
+
fi
|
|
1457
|
+
echo " $line"
|
|
1458
|
+
done < <(bash "$task_dir/rubric.sh" 2>&1 || true)
|
|
1459
|
+
|
|
1460
|
+
if [[ $total -gt 0 ]]; then
|
|
1461
|
+
score=$((pass * 100 / total))
|
|
1462
|
+
fi
|
|
1463
|
+
echo " Score: ${score}% ($pass/$total)"
|
|
1464
|
+
echo ""
|
|
1465
|
+
|
|
1466
|
+
# Write result
|
|
1467
|
+
jq -n --arg name "$name" --argjson score "$score" \
|
|
1468
|
+
--argjson pass "$pass" --argjson total "$total" \
|
|
1469
|
+
--arg ts "$timestamp" \
|
|
1470
|
+
'{name: $name, score: $score, passed: $pass, total: $total, timestamp: $ts}' \
|
|
1471
|
+
>> "$RESULTS_DIR/$timestamp.jsonl"
|
|
1472
|
+
}
|
|
1473
|
+
|
|
1474
|
+
if [[ "$TARGET" == "all" ]]; then
|
|
1475
|
+
for task_dir in "$TASKS_DIR"/*/; do
|
|
1476
|
+
[[ -d "$task_dir" ]] || continue
|
|
1477
|
+
run_benchmark "$task_dir"
|
|
1478
|
+
done
|
|
1479
|
+
else
|
|
1480
|
+
if [[ -d "$TASKS_DIR/$TARGET" ]]; then
|
|
1481
|
+
run_benchmark "$TASKS_DIR/$TARGET"
|
|
1482
|
+
else
|
|
1483
|
+
echo "Benchmark not found: $TARGET" >&2
|
|
1484
|
+
echo "Run 'runner.sh list' to see available benchmarks." >&2
|
|
1485
|
+
exit 1
|
|
1486
|
+
fi
|
|
1487
|
+
fi
|
|
1488
|
+
|
|
1489
|
+
echo "Results saved to: $RESULTS_DIR/$timestamp.jsonl"
|
|
1490
|
+
;;
|
|
1491
|
+
|
|
1492
|
+
compare)
|
|
1493
|
+
FILE_A="${1:-}"
|
|
1494
|
+
FILE_B="${2:-}"
|
|
1495
|
+
if [[ -z "$FILE_A" || -z "$FILE_B" ]]; then
|
|
1496
|
+
echo "Usage: runner.sh compare <result-a.jsonl> <result-b.jsonl>" >&2
|
|
1497
|
+
exit 1
|
|
1498
|
+
fi
|
|
1499
|
+
if [[ ! -f "$FILE_A" || ! -f "$FILE_B" ]]; then
|
|
1500
|
+
echo "One or both files not found." >&2
|
|
1501
|
+
exit 1
|
|
1502
|
+
fi
|
|
1503
|
+
|
|
1504
|
+
echo "Benchmark Comparison"
|
|
1505
|
+
echo "═════════════════════════════════════"
|
|
1506
|
+
printf "%-25s %8s %8s %8s\n" "Task" "Before" "After" "Delta"
|
|
1507
|
+
echo "─────────────────────────────────────────────"
|
|
1508
|
+
|
|
1509
|
+
# Pair results by position (assumes both files list the same tasks in the same order)
|
|
1510
|
+
jq -rs '
|
|
1511
|
+
[.[0], .[1]] | transpose | .[] |
|
|
1512
|
+
select(.[0] != null and .[1] != null) |
|
|
1513
|
+
"\(.[0].name)|\(.[0].score)|\(.[1].score)|\(.[1].score - .[0].score)"
|
|
1514
|
+
' <(jq -s '.' "$FILE_A") <(jq -s '.' "$FILE_B") 2>/dev/null | \
|
|
1515
|
+
while IFS='|' read -r name before after delta; do
|
|
1516
|
+
sign=""
|
|
1517
|
+
[[ "$delta" -gt 0 ]] && sign="+"
|
|
1518
|
+
printf "%-25s %7s%% %7s%% %7s%%\n" "$name" "$before" "$after" "${sign}${delta}"
|
|
1519
|
+
done
|
|
1520
|
+
|
|
1521
|
+
echo "═════════════════════════════════════"
|
|
1522
|
+
;;
|
|
1523
|
+
|
|
1524
|
+
help|--help|-h|"")
|
|
1525
|
+
usage
|
|
1526
|
+
;;
|
|
1527
|
+
|
|
1528
|
+
*)
|
|
1529
|
+
echo "Unknown command: $SUBCOMMAND" >&2
|
|
1530
|
+
usage
|
|
1531
|
+
;;
|
|
1532
|
+
esac
|
|
1533
|
+
```
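The scoring rule inside `run_benchmark` can be isolated as below. This is a minimal sketch rather than the runner's actual code: the function name `score_rubric_output` is an assumption, and it deliberately ignores any line that is not a `PASS:` or `FAIL:` verdict, so stray test-runner output does not lower the score.

```bash
#!/usr/bin/env bash
# Sketch: score rubric output read from stdin. Only lines starting with
# "PASS:" or "FAIL:" count as criteria; any other output is ignored.
score_rubric_output() {
  local pass=0 total=0 score=0 line
  while IFS= read -r line; do
    case "$line" in
      PASS:*) pass=$((pass + 1)); total=$((total + 1)) ;;
      FAIL:*) total=$((total + 1)) ;;
    esac
  done
  # Integer percentage, matching the runner's $((pass * 100 / total)).
  if [ "$total" -gt 0 ]; then
    score=$((pass * 100 / total))
  fi
  printf '%s %s %s\n' "$score" "$pass" "$total"
}

# Two of three criteria pass; the noise line is not counted.
printf 'PASS: a\nnoise from a test runner\nFAIL: b\nPASS: c\n' | score_rubric_output
```

The output is three space-separated fields: score, passed count, total count.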
|
|
1534
|
+
|
|
1535
|
+
**Step 3: Make executable**
|
|
1536
|
+
|
|
1537
|
+
```bash
|
|
1538
|
+
chmod +x benchmarks/runner.sh
|
|
1539
|
+
```
|
|
1540
|
+
|
|
1541
|
+
**Step 4: Write first benchmark task definition**
|
|
1542
|
+
|
|
1543
|
+
Create `benchmarks/tasks/01-rest-endpoint/task.md`:
|
|
1544
|
+
|
|
1545
|
+
```markdown
|
|
1546
|
+
# Add a REST Endpoint with Tests
|
|
1547
|
+
|
|
1548
|
+
**Complexity:** Simple (1 batch)
|
|
1549
|
+
**Measures:** Basic execution, TDD compliance
|
|
1550
|
+
|
|
1551
|
+
## Task
|
|
1552
|
+
|
|
1553
|
+
Add a `/health` endpoint to the project that:
|
|
1554
|
+
1. Returns HTTP 200 with JSON body `{"status": "ok", "timestamp": "<ISO8601>"}`
|
|
1555
|
+
2. Has a test that verifies the response status and body structure
|
|
1556
|
+
3. All tests pass
|
|
1557
|
+
|
|
1558
|
+
## Constraints
|
|
1559
|
+
|
|
1560
|
+
- Use the project's existing web framework (or add minimal one if none exists)
|
|
1561
|
+
- Follow existing code style and patterns
|
|
1562
|
+
- Test must be automated (no manual verification)
|
|
1563
|
+
```
|
|
1564
|
+
|
|
1565
|
+
Create `benchmarks/tasks/01-rest-endpoint/rubric.sh`:
|
|
1566
|
+
|
|
1567
|
+
```bash
|
|
1568
|
+
#!/usr/bin/env bash
|
|
1569
|
+
# Rubric for 01-rest-endpoint benchmark
|
|
1570
|
+
# Checks for task completion criteria
|
|
1571
|
+
set -euo pipefail
|
|
1572
|
+
|
|
1573
|
+
PROJECT_ROOT="${BENCHMARK_PROJECT_ROOT:-.}"
|
|
1574
|
+
|
|
1575
|
+
# Criterion 1: Health endpoint file exists
|
|
1576
|
+
if compgen -G "$PROJECT_ROOT/src/*health*" >/dev/null 2>&1 || \
|
|
1577
|
+
compgen -G "$PROJECT_ROOT/app/*health*" >/dev/null 2>&1 || \
|
|
1578
|
+
grep -rqs "health" "$PROJECT_ROOT/src/" "$PROJECT_ROOT/app/"; then
|
|
1579
|
+
echo "PASS: Health endpoint file exists"
|
|
1580
|
+
else
|
|
1581
|
+
echo "FAIL: Health endpoint file not found"
|
|
1582
|
+
fi
|
|
1583
|
+
|
|
1584
|
+
# Criterion 2: Test file exists
|
|
1585
|
+
if compgen -G "$PROJECT_ROOT/tests/*health*" >/dev/null 2>&1 || \
|
|
1586
|
+
compgen -G "$PROJECT_ROOT/test/*health*" >/dev/null 2>&1; then
|
|
1587
|
+
echo "PASS: Health endpoint test file exists"
|
|
1588
|
+
else
|
|
1589
|
+
echo "FAIL: Health endpoint test file not found"
|
|
1590
|
+
fi
|
|
1591
|
+
|
|
1592
|
+
# Criterion 3: Test passes
|
|
1593
|
+
# Silence runner output so only PASS:/FAIL: lines reach runner.sh
if (cd "$PROJECT_ROOT" && { npm test || pytest || make test; }) >/dev/null 2>&1; then
|
|
1594
|
+
echo "PASS: Tests pass"
|
|
1595
|
+
else
|
|
1596
|
+
echo "FAIL: Tests do not pass"
|
|
1597
|
+
fi
|
|
1598
|
+
```
|
|
1599
|
+
|
|
1600
|
+
```bash
|
|
1601
|
+
chmod +x benchmarks/tasks/01-rest-endpoint/rubric.sh
|
|
1602
|
+
```
|
|
1603
|
+
|
|
1604
|
+
**Step 5: Write remaining task stubs**
|
|
1605
|
+
|
|
1606
|
+
For benchmarks 02-05, create minimal `task.md` files (rubrics can be expanded later):
|
|
1607
|
+
|
|
1608
|
+
Create `benchmarks/tasks/02-refactor-module/task.md`:
|
|
1609
|
+
```markdown
|
|
1610
|
+
# Refactor a Module into Two
|
|
1611
|
+
|
|
1612
|
+
**Complexity:** Medium (2 batches)
|
|
1613
|
+
**Measures:** Refactoring quality, test preservation
|
|
1614
|
+
|
|
1615
|
+
## Task
|
|
1616
|
+
|
|
1617
|
+
Split `src/utils.sh` into `src/string-utils.sh` and `src/file-utils.sh`, preserving all existing tests.
|
|
1618
|
+
```
|
|
1619
|
+
|
|
1620
|
+
Create `benchmarks/tasks/03-fix-integration-bug/task.md`:
|
|
1621
|
+
```markdown
|
|
1622
|
+
# Fix an Integration Bug
|
|
1623
|
+
|
|
1624
|
+
**Complexity:** Medium (2 batches)
|
|
1625
|
+
**Measures:** Debugging, root cause analysis
|
|
1626
|
+
|
|
1627
|
+
## Task
|
|
1628
|
+
|
|
1629
|
+
The `/api/users` endpoint returns 500 when the database connection pool is exhausted. Find and fix the root cause.
|
|
1630
|
+
```
|
|
1631
|
+
|
|
1632
|
+
Create `benchmarks/tasks/04-add-test-coverage/task.md`:
|
|
1633
|
+
```markdown
|
|
1634
|
+
# Add Test Coverage to Untested Module
|
|
1635
|
+
|
|
1636
|
+
**Complexity:** Medium (2 batches)
|
|
1637
|
+
**Measures:** Test quality, edge case discovery
|
|
1638
|
+
|
|
1639
|
+
## Task
|
|
1640
|
+
|
|
1641
|
+
Add comprehensive tests to `src/parser.sh` which currently has 0% coverage. Cover happy path, edge cases, and error conditions.
|
|
1642
|
+
```
|
|
1643
|
+
|
|
1644
|
+
Create `benchmarks/tasks/05-multi-file-feature/task.md`:
|
|
1645
|
+
```markdown
|
|
1646
|
+
# Multi-File Feature with API + DB + Tests
|
|
1647
|
+
|
|
1648
|
+
**Complexity:** Complex (4 batches)
|
|
1649
|
+
**Measures:** Full pipeline, cross-file coordination
|
|
1650
|
+
|
|
1651
|
+
## Task
|
|
1652
|
+
|
|
1653
|
+
Add a "bookmarks" feature: API endpoints (CRUD), database migration, and integration tests.
|
|
1654
|
+
```
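Tasks 02-05 ship without a `rubric.sh`, so `runner.sh run` skips them. As a hedged illustration of what a later rubric could look like for `02-refactor-module`, the sketch below checks only that the two target files from the task text exist. The function name `check_refactor_split` is hypothetical, and a real rubric would presumably also run the existing tests.

```bash
#!/usr/bin/env bash
# Hypothetical rubric sketch for 02-refactor-module (not part of the plan).
# Prints one "PASS: ..." or "FAIL: ..." line per criterion, as runner.sh expects.
check_refactor_split() {
  local root="${1:-.}" f
  for f in src/string-utils.sh src/file-utils.sh; do
    if [ -f "$root/$f" ]; then
      echo "PASS: $f exists"
    else
      echo "FAIL: $f not found"
    fi
  done
}

# Same project-root convention as the 01-rest-endpoint rubric.
check_refactor_split "${BENCHMARK_PROJECT_ROOT:-.}"
```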
|
|
1655
|
+
|
|
1656
|
+
**Step 6: Verify runner works**
|
|
1657
|
+
|
|
1658
|
+
Run: `cd ~/Documents/projects/autonomous-coding-toolkit && bash benchmarks/runner.sh list`
|
|
1659
|
+
Expected: Lists all 5 benchmark tasks
|
|
1660
|
+
|
|
1661
|
+
**Step 7: Add results/ to .gitignore**
|
|
1662
|
+
|
|
1663
|
+
```bash
|
|
1664
|
+
echo "benchmarks/results/" >> .gitignore
|
|
1665
|
+
```
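Re-running the `echo ... >> .gitignore` step appends a duplicate line each time. If the step may run more than once, an idempotent variant could look like this minimal sketch; the helper name `ensure_ignore_entry` is an assumption, not something the toolkit provides.

```bash
#!/usr/bin/env bash
# Sketch: append an entry to an ignore file only if that exact line is absent.
ensure_ignore_entry() {
  local file="$1" entry="$2"
  touch "$file"
  # -x matches whole lines, -F treats the entry as a literal string.
  if ! grep -qxF -- "$entry" "$file"; then
    printf '%s\n' "$entry" >> "$file"
  fi
}
```

It would be called as `ensure_ignore_entry .gitignore "benchmarks/results/"` from the repository root.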
|
|
1666
|
+
|
|
1667
|
+
**Step 8: Commit**
|
|
1668
|
+
|
|
1669
|
+
```bash
|
|
1670
|
+
git add benchmarks/ .gitignore
|
|
1671
|
+
git commit -m "feat: add benchmark suite with 5 tasks and runner.sh"
|
|
1672
|
+
```
|
|
1673
|
+
|
|
1674
|
+
### Task 17: Write benchmark runner test
|
|
1675
|
+
|
|
1676
|
+
**Files:**
|
|
1677
|
+
- Create: `scripts/tests/test-benchmark-runner.sh`
|
|
1678
|
+
|
|
1679
|
+
**Step 1: Write the test**
|
|
1680
|
+
|
|
1681
|
+
```bash
|
|
1682
|
+
#!/usr/bin/env bash
|
|
1683
|
+
# Test benchmarks/runner.sh
|
|
1684
|
+
set -euo pipefail
|
|
1685
|
+
|
|
1686
|
+
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
|
|
1687
|
+
REPO_ROOT="$(cd "$SCRIPT_DIR/../.." && pwd)"
|
|
1688
|
+
RUNNER="$REPO_ROOT/benchmarks/runner.sh"
|
|
1689
|
+
|
|
1690
|
+
source "$SCRIPT_DIR/test-helpers.sh"
|
|
1691
|
+
|
|
1692
|
+
# --- Test 1: list shows benchmarks ---
|
|
1693
|
+
output=$(bash "$RUNNER" list 2>&1)
|
|
1694
|
+
assert_contains "list shows benchmarks" "01-rest-endpoint" "$output"
|
|
1695
|
+
assert_contains "list shows all 5" "05-multi-file-feature" "$output"
|
|
1696
|
+
|
|
1697
|
+
# --- Test 2: help works ---
|
|
1698
|
+
output=$(bash "$RUNNER" help 2>&1)
|
|
1699
|
+
assert_contains "help shows usage" "Usage:" "$output"
|
|
1700
|
+
|
|
1701
|
+
# --- Test 3: unknown benchmark fails gracefully ---
|
|
1702
|
+
exit_code=0
|
|
1703
|
+
bash "$RUNNER" run nonexistent-benchmark >/dev/null 2>&1 || exit_code=$?
|
|
1704
|
+
assert_eq "unknown benchmark exits non-zero" "1" "$exit_code"
|
|
1705
|
+
|
|
1706
|
+
report_results
|
|
1707
|
+
```
|
|
1708
|
+
|
|
1709
|
+
**Step 2: Make executable and run**
|
|
1710
|
+
|
|
1711
|
+
```bash
|
|
1712
|
+
chmod +x scripts/tests/test-benchmark-runner.sh
|
|
1713
|
+
```
|
|
1714
|
+
|
|
1715
|
+
Run: `bash scripts/tests/test-benchmark-runner.sh`
|
|
1716
|
+
Expected: All tests PASS
|
|
1717
|
+
|
|
1718
|
+
**Step 3: Commit**
|
|
1719
|
+
|
|
1720
|
+
```bash
|
|
1721
|
+
git add scripts/tests/test-benchmark-runner.sh
|
|
1722
|
+
git commit -m "test: add benchmark runner tests"
|
|
1723
|
+
```
|
|
1724
|
+
|
|
1725
|
+
---
|
|
1726
|
+
|
|
1727
|
+
## Batch 8: Trust Score + Graduated Autonomy (P2)
|
|
1728
|
+
|
|
1729
|
+
### Task 18: Add trust score computation to telemetry.sh
|
|
1730
|
+
|
|
1731
|
+
**Files:**
|
|
1732
|
+
- Modify: `scripts/telemetry.sh` (add `trust` subcommand)
|
|
1733
|
+
|
|
1734
|
+
**Step 1: Add trust score subcommand**
|
|
1735
|
+
|
|
1736
|
+
Add a new case to the `case "$SUBCOMMAND"` block in telemetry.sh:
|
|
1737
|
+
|
|
1738
|
+
```bash
|
|
1739
|
+
trust)
|
|
1740
|
+
if [[ ! -f "$TELEMETRY_FILE" ]] || [[ ! -s "$TELEMETRY_FILE" ]]; then
|
|
1741
|
+
echo '{"score":0,"level":"new","runs":0,"message":"No telemetry data yet"}'
|
|
1742
|
+
exit 0
|
|
1743
|
+
fi
|
|
1744
|
+
|
|
1745
|
+
jq -s '
|
|
1746
|
+
def trust_level(score; runs):
|
|
1747
|
+
if runs < 10 then "new"
|
|
1748
|
+
elif score < 30 then "new"
|
|
1749
|
+
elif score < 70 then "growing"
|
|
1750
|
+
elif score < 90 then "trusted"
|
|
1751
|
+
else "autonomous"
|
|
1752
|
+
end;
|
|
1753
|
+
|
|
1754
|
+
length as $total |
|
|
1755
|
+
([.[] | select(.passed_gate == true)] | length) as $passed |
|
|
1756
|
+
(if $total > 0 then ($passed * 100 / $total | floor) else 0 end) as $gate_rate |
|
|
1757
|
+
# Trust score = gate pass rate (simplified; full formula adds echo-back, regression, revert)
|
|
1758
|
+
$gate_rate as $score |
|
|
1759
|
+
trust_level($score; $total) as $level |
|
|
1760
|
+
{
|
|
1761
|
+
score: $score,
|
|
1762
|
+
level: $level,
|
|
1763
|
+
runs: $total,
|
|
1764
|
+
gate_pass_rate: $gate_rate,
|
|
1765
|
+
default_mode: (
|
|
1766
|
+
if $level == "new" then "human checkpoint every batch"
|
|
1767
|
+
elif $level == "growing" then "headless with checkpoint every 3rd batch"
|
|
1768
|
+
elif $level == "trusted" then "headless with notification on failures only"
|
|
1769
|
+
else "full headless, post-run summary only"
|
|
1770
|
+
end
|
|
1771
|
+
)
|
|
1772
|
+
}
|
|
1773
|
+
' "$TELEMETRY_FILE"
|
|
1774
|
+
;;
|
|
1775
|
+
```
|
|
1776
|
+
|
|
1777
|
+
**Step 2: Verify trust score works**
|
|
1778
|
+
|
|
1779
|
+
Create some test data and check:
|
|
1780
|
+
|
|
1781
|
+
```bash
|
|
1782
|
+
WORK=$(mktemp -d)
|
|
1783
|
+
mkdir -p "$WORK/logs"
|
|
1784
|
+
for i in $(seq 1 15); do
|
|
1785
|
+
bash scripts/telemetry.sh record --project-root "$WORK" --batch-number "$i" --passed true --strategy test --duration 60 --cost 0.30
|
|
1786
|
+
done
|
|
1787
|
+
bash scripts/telemetry.sh trust --project-root "$WORK"
|
|
1788
|
+
rm -rf "$WORK"
|
|
1789
|
+
```
|
|
1790
|
+
|
|
1791
|
+
Expected: JSON with `"score": 100`, `"level": "autonomous"`, `"runs": 15`
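The level thresholds are easier to check in isolation. The following is a minimal pure-shell sketch that mirrors the `trust_level` definition in the jq filter; it is an illustration for reasoning about boundary cases, not a replacement for the jq code, and it assumes an integer score.

```bash
#!/usr/bin/env bash
# Sketch: shell mirror of the jq trust_level(score; runs) thresholds.
# Takes an integer score (0-100) and a run count.
trust_level() {
  local score="$1" runs="$2"
  if   [ "$runs" -lt 10 ];  then echo "new"
  elif [ "$score" -lt 30 ]; then echo "new"
  elif [ "$score" -lt 70 ]; then echo "growing"
  elif [ "$score" -lt 90 ]; then echo "trusted"
  else                           echo "autonomous"
  fi
}

trust_level 100 15   # the verification case above: 15 runs, all passed
trust_level 100 5    # fewer than 10 runs always stays at "new"
trust_level 75 20    # enough runs, mid-range score
```

Note that the run count is checked first, so a perfect score with fewer than 10 recorded batches still reports `new`.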
|
|
1792
|
+
|
|
1793
|
+
**Step 3: Commit**
|
|
1794
|
+
|
|
1795
|
+
```bash
|
|
1796
|
+
git add scripts/telemetry.sh
|
|
1797
|
+
git commit -m "feat: add trust score computation to telemetry"
|
|
1798
|
+
```
|
|
1799
|
+
|
|
1800
|
+
### Task 19: Add trust score to pipeline-status.sh
|
|
1801
|
+
|
|
1802
|
+
**Files:**
|
|
1803
|
+
- Modify: `scripts/pipeline-status.sh` (add trust score display section)
|
|
1804
|
+
|
|
1805
|
+
**Step 1: Add trust score section**
|
|
1806
|
+
|
|
1807
|
+
After the "Git" section (before the final `echo` at the bottom), add:
|
|
1808
|
+
|
|
1809
|
+
```bash
|
|
1810
|
+
# Trust score (from telemetry)
|
|
1811
|
+
if [[ -x "$SCRIPT_DIR/telemetry.sh" ]]; then
|
|
1812
|
+
trust_json=$("$SCRIPT_DIR/telemetry.sh" trust --project-root "$PROJECT_ROOT" 2>/dev/null || echo '{}')
|
|
1813
|
+
trust_score=$(echo "$trust_json" | jq -r '.score // "n/a"' 2>/dev/null || echo "n/a")
|
|
1814
|
+
trust_level=$(echo "$trust_json" | jq -r '.level // "unknown"' 2>/dev/null || echo "unknown")
|
|
1815
|
+
trust_runs=$(echo "$trust_json" | jq -r '.runs // 0' 2>/dev/null || echo "0")
|
|
1816
|
+
trust_mode=$(echo "$trust_json" | jq -r '.default_mode // "unknown"' 2>/dev/null || echo "unknown")
|
|
1817
|
+
|
|
1818
|
+
if [[ "$trust_score" != "n/a" && "$trust_runs" != "0" ]]; then
|
|
1819
|
+
echo ""
|
|
1820
|
+
echo "--- Trust Score ---"
|
|
1821
|
+
echo " Score: ${trust_score}/100 ($trust_runs runs)"
|
|
1822
|
+
echo " Level: $trust_level"
|
|
1823
|
+
echo " Default mode: $trust_mode"
|
|
1824
|
+
fi
|
|
1825
|
+
fi
|
|
1826
|
+
```
|
|
1827
|
+
|
|
1828
|
+
**Step 2: Verify it works (with no telemetry data, silently skips)**
|
|
1829
|
+
|
|
1830
|
+
Run: `bash scripts/pipeline-status.sh ~/Documents/projects/autonomous-coding-toolkit 2>&1 | tail -10`
|
|
1831
|
+
Expected: Shows the Git section. The trust section is absent, because the toolkit itself has no telemetry data and the block prints nothing when `runs` is 0.
|
|
1832
|
+
|
|
1833
|
+
**Step 3: Commit**
|
|
1834
|
+
|
|
1835
|
+
```bash
|
|
1836
|
+
git add scripts/pipeline-status.sh
|
|
1837
|
+
git commit -m "feat: display trust score in pipeline status"
|
|
1838
|
+
```
|
|
1839
|
+
|
|
1840
|
+
---
|
|
1841
|
+
|
|
1842
|
+
## Batch 9: Semantic Echo-Back Tier 2 (P2)
|
|
1843
|
+
|
|
1844
|
+
### Task 20: Add Tier 2 echo-back support
|
|
1845
|
+
|
|
1846
|
+
**Files:**
|
|
1847
|
+
- Modify: `scripts/lib/run-plan-echo-back.sh` (add LLM verification tier)
|
|
1848
|
+
|
|
1849
|
+
**Step 1: Read the current echo-back implementation**
|
|
1850
|
+
|
|
1851
|
+
The implementer should read `scripts/lib/run-plan-echo-back.sh` fully to understand the current keyword-matching logic before adding Tier 2.
|
|
1852
|
+
|
|
1853
|
+
**Step 2: Add Tier 2 function**
|
|
1854
|
+
|
|
1855
|
+
Add after the existing `run_echo_back()` function:
|
|
1856
|
+
|
|
1857
|
+
```bash
|
|
1858
|
+
# --- Tier 2: LLM semantic verification ---
|
|
1859
|
+
# Activates on batch 1, integration batches, or --strict-echo-back
|
|
1860
|
+
# Requires: claude CLI available
|
|
1861
|
+
run_echo_back_tier2() {
|
|
1862
|
+
local batch_text="$1"
|
|
1863
|
+
local agent_summary="$2"
|
|
1864
|
+
|
|
1865
|
+
if ! command -v claude >/dev/null 2>&1; then
|
|
1866
|
+
echo "echo-back-tier2: claude CLI not available — skipping" >&2
|
|
1867
|
+
return 0
|
|
1868
|
+
fi
|
|
1869
|
+
|
|
1870
|
+
local prompt
|
|
1871
|
+
prompt=$(cat <<PROMPT
|
|
1872
|
+
You are a specification compliance reviewer. Compare:
|
|
1873
|
+
|
|
1874
|
+
SPECIFICATION:
|
|
1875
|
+
$batch_text
|
|
1876
|
+
|
|
1877
|
+
AGENT'S UNDERSTANDING:
|
|
1878
|
+
$agent_summary
|
|
1879
|
+
|
|
1880
|
+
Does the agent's understanding match the specification? Flag any:
|
|
1881
|
+
- Missing requirements
|
|
1882
|
+
- Added requirements not in spec
|
|
1883
|
+
- Misinterpreted requirements
|
|
1884
|
+
- Ambiguous interpretations
|
|
1885
|
+
|
|
1886
|
+
Output exactly one line: PASS or FAIL followed by a colon and explanation.
|
|
1887
|
+
PROMPT
|
|
1888
|
+
)
|
|
1889
|
+
|
|
1890
|
+
local result
|
|
1891
|
+
result=$(echo "$prompt" | claude -p --max-tokens 200 2>/dev/null || echo "PASS: echo-back tier2 unavailable")
|
|
1892
|
+
|
|
1893
|
+
if echo "$result" | grep -qi "^FAIL"; then
|
|
1894
|
+
echo "echo-back-tier2: FAILED — $result"
|
|
1895
|
+
return 1
|
|
1896
|
+
else
|
|
1897
|
+
echo "echo-back-tier2: PASSED"
|
|
1898
|
+
return 0
|
|
1899
|
+
fi
|
|
1900
|
+
}
|
|
1901
|
+
|
|
1902
|
+
# Determine if tier 2 should activate
|
|
1903
|
+
should_run_tier2() {
|
|
1904
|
+
local batch_number="${1:-0}"
|
|
1905
|
+
local batch_type="${2:-unknown}"
|
|
1906
|
+
local strict="${3:-false}"
|
|
1907
|
+
|
|
1908
|
+
# Always on batch 1 (disproportionate risk)
|
|
1909
|
+
[[ "$batch_number" == "1" ]] && return 0
|
|
1910
|
+
|
|
1911
|
+
# Always on integration batches
|
|
1912
|
+
[[ "$batch_type" == "integration" ]] && return 0
|
|
1913
|
+
|
|
1914
|
+
# When strict mode is set
|
|
1915
|
+
[[ "$strict" == "true" ]] && return 0
|
|
1916
|
+
|
|
1917
|
+
return 1
|
|
1918
|
+
}
|
|
1919
|
+
```
|
|
1920
|
+
|
|
1921
|
+
**Step 3: Integration point**
|
|
1922
|
+
|
|
1923
|
+
The Tier 2 functions are now available but not yet wired in. The implementer should add the call to the headless loop in `run-plan-headless.sh`, after the agent generates its output and before the quality gate runs, so that Tier 2 fires when `STRICT_ECHO_BACK=true` or when `should_run_tier2` matches on batch number or batch type.
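One way the call site could be shaped is sketched below. The wrapper `maybe_run_tier2` and its argument order are assumptions, `should_run_tier2` is adapted from the plan so the block is self-contained, and `run_echo_back_tier2` is replaced by a stub so the sketch runs without the claude CLI.

```bash
#!/usr/bin/env bash
# Adapted from the plan: decide whether Tier 2 should activate.
should_run_tier2() {
  local batch_number="${1:-0}" batch_type="${2:-unknown}" strict="${3:-false}"
  [ "$batch_number" = "1" ] && return 0
  [ "$batch_type" = "integration" ] && return 0
  [ "$strict" = "true" ] && return 0
  return 1
}

# Stub standing in for the real run_echo_back_tier2 (which calls the claude CLI).
run_echo_back_tier2() {
  echo "echo-back-tier2: PASSED (stub)"
}

# Hypothetical wrapper: run Tier 2 only when required, and let its exit
# status propagate so the caller can stop before the quality gate.
maybe_run_tier2() {
  local batch_number="$1" batch_type="$2" batch_text="$3" agent_summary="$4"
  if should_run_tier2 "$batch_number" "$batch_type" "${STRICT_ECHO_BACK:-false}"; then
    run_echo_back_tier2 "$batch_text" "$agent_summary"
  else
    echo "echo-back-tier2: not required for batch $batch_number"
  fi
}

maybe_run_tier2 1 unit "spec text" "agent summary"
```

In the headless loop, a non-zero return from the wrapper would be the signal to retry or halt the batch; how that is handled is left to the implementer.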
|
|
1924
|
+
|
|
1925
|
+
**Step 4: Commit**
|
|
1926
|
+
|
|
1927
|
+
```bash
|
|
1928
|
+
git add scripts/lib/run-plan-echo-back.sh
|
|
1929
|
+
git commit -m "feat: add Tier 2 semantic echo-back via LLM verification"
|
|
1930
|
+
```
|
|
1931
|
+
|
|
1932
|
+
### Task 21: Final test suite + quality gate
|
|
1933
|
+
|
|
1934
|
+
**Step 1: Run full test suite**
|
|
1935
|
+
|
|
1936
|
+
Run: `cd ~/Documents/projects/autonomous-coding-toolkit && bash scripts/tests/run-all-tests.sh`
|
|
1937
|
+
Expected: All tests PASS (including all new tests from this plan)
|
|
1938
|
+
|
|
1939
|
+
**Step 2: Run quality gate**
|
|
1940
|
+
|
|
1941
|
+
Run: `bash scripts/quality-gate.sh --project-root ~/Documents/projects/autonomous-coding-toolkit`
|
|
1942
|
+
Expected: ALL PASSED
|
|
1943
|
+
|
|
1944
|
+
**Step 3: Run validate-all**
|
|
1945
|
+
|
|
1946
|
+
Run: `bash scripts/validate-all.sh`
|
|
1947
|
+
Expected: All validators pass
|
|
1948
|
+
|
|
1949
|
+
---
|
|
1950
|
+
|
|
1951
|
+
## Summary
|
|
1952
|
+
|
|
1953
|
+
| Batch | Priority | Tasks | New Files | Modified Files |
|-------|----------|-------|-----------|---------------|
| 1 | P0 | 1-3 | `package.json`, `bin/act.js`, `test-act-cli.sh` | — |
| 2 | P0 | 4-5 | `scripts/init.sh`, `test-init.sh` | — |
| 3 | P0 | 6-8 | `test-telegram-env.sh`, `test-lesson-local.sh` | `telegram.sh`, `lesson-check.sh` |
| 4 | P0 | 9-11 | `.npmignore` | `README.md` |
| 5 | P1 | 12-13 | `scripts/telemetry.sh`, `test-telemetry.sh` | — |
| 6 | P1 | 14-15 | — | `quality-gate.sh` |
| 7 | P1 | 16-17 | `benchmarks/runner.sh`, 5 task dirs, `test-benchmark-runner.sh` | `.gitignore` |
| 8 | P2 | 18-19 | — | `telemetry.sh`, `pipeline-status.sh` |
| 9 | P2 | 20-21 | — | `run-plan-echo-back.sh` |
|
|
1964
|
+
|
|
1965
|
+
**Total: 21 tasks across 9 batches. ~1,150 new lines. 6 new files, 6 modified files.**
|