autonomous-coding-toolkit 1.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (324)
  1. package/.claude-plugin/marketplace.json +22 -0
  2. package/.claude-plugin/plugin.json +13 -0
  3. package/LICENSE +21 -0
  4. package/Makefile +21 -0
  5. package/README.md +140 -0
  6. package/SECURITY.md +28 -0
  7. package/agents/bash-expert.md +113 -0
  8. package/agents/dependency-auditor.md +138 -0
  9. package/agents/integration-tester.md +120 -0
  10. package/agents/lesson-scanner.md +149 -0
  11. package/agents/python-expert.md +179 -0
  12. package/agents/service-monitor.md +141 -0
  13. package/agents/shell-expert.md +147 -0
  14. package/benchmarks/runner.sh +147 -0
  15. package/benchmarks/tasks/01-rest-endpoint/rubric.sh +29 -0
  16. package/benchmarks/tasks/01-rest-endpoint/task.md +17 -0
  17. package/benchmarks/tasks/02-refactor-module/task.md +8 -0
  18. package/benchmarks/tasks/03-fix-integration-bug/task.md +8 -0
  19. package/benchmarks/tasks/04-add-test-coverage/task.md +8 -0
  20. package/benchmarks/tasks/05-multi-file-feature/task.md +8 -0
  21. package/bin/act.js +238 -0
  22. package/commands/autocode.md +6 -0
  23. package/commands/cancel-ralph.md +18 -0
  24. package/commands/code-factory.md +53 -0
  25. package/commands/create-prd.md +55 -0
  26. package/commands/ralph-loop.md +18 -0
  27. package/commands/run-plan.md +117 -0
  28. package/commands/submit-lesson.md +122 -0
  29. package/docs/ARCHITECTURE.md +630 -0
  30. package/docs/CONTRIBUTING.md +125 -0
  31. package/docs/lessons/0001-bare-exception-swallowing.md +34 -0
  32. package/docs/lessons/0002-async-def-without-await.md +28 -0
  33. package/docs/lessons/0003-create-task-without-callback.md +28 -0
  34. package/docs/lessons/0004-hardcoded-test-counts.md +28 -0
  35. package/docs/lessons/0005-sqlite-without-closing.md +33 -0
  36. package/docs/lessons/0006-venv-pip-path.md +27 -0
  37. package/docs/lessons/0007-runner-state-self-rejection.md +35 -0
  38. package/docs/lessons/0008-quality-gate-blind-spot.md +33 -0
  39. package/docs/lessons/0009-parser-overcount-empty-batches.md +36 -0
  40. package/docs/lessons/0010-local-outside-function-bash.md +33 -0
  41. package/docs/lessons/0011-batch-tests-for-unimplemented-code.md +36 -0
  42. package/docs/lessons/0012-api-markdown-unescaped-chars.md +33 -0
  43. package/docs/lessons/0013-export-prefix-env-parsing.md +33 -0
  44. package/docs/lessons/0014-decorator-registry-import-side-effect.md +43 -0
  45. package/docs/lessons/0015-frontend-backend-schema-drift.md +43 -0
  46. package/docs/lessons/0016-event-driven-cold-start-seeding.md +44 -0
  47. package/docs/lessons/0017-copy-paste-logic-diverges.md +43 -0
  48. package/docs/lessons/0018-layer-passes-pipeline-broken.md +45 -0
  49. package/docs/lessons/0019-systemd-envfile-ignores-export.md +41 -0
  50. package/docs/lessons/0020-persist-state-incrementally.md +44 -0
  51. package/docs/lessons/0021-dual-axis-testing.md +48 -0
  52. package/docs/lessons/0022-jsx-factory-shadowing.md +43 -0
  53. package/docs/lessons/0023-static-analysis-spiral.md +51 -0
  54. package/docs/lessons/0024-shared-pipeline-implementation.md +55 -0
  55. package/docs/lessons/0025-defense-in-depth-all-entry-points.md +65 -0
  56. package/docs/lessons/0026-linter-no-rules-false-enforcement.md +54 -0
  57. package/docs/lessons/0027-jsx-silent-prop-drop.md +64 -0
  58. package/docs/lessons/0028-no-infrastructure-in-client-code.md +49 -0
  59. package/docs/lessons/0029-never-write-secrets-to-files.md +61 -0
  60. package/docs/lessons/0030-cache-merge-not-replace.md +62 -0
  61. package/docs/lessons/0031-verify-units-at-boundaries.md +66 -0
  62. package/docs/lessons/0032-module-lifecycle-subscribe-unsubscribe.md +89 -0
  63. package/docs/lessons/0033-async-iteration-mutable-snapshot.md +72 -0
  64. package/docs/lessons/0034-caller-missing-await-silent-discard.md +65 -0
  65. package/docs/lessons/0035-duplicate-registration-silent-overwrite.md +85 -0
  66. package/docs/lessons/0036-websocket-dirty-disconnect.md +33 -0
  67. package/docs/lessons/0037-parallel-agents-worktree-corruption.md +31 -0
  68. package/docs/lessons/0038-subscribe-no-stored-ref.md +36 -0
  69. package/docs/lessons/0039-fallback-or-default-hides-bugs.md +34 -0
  70. package/docs/lessons/0040-event-firehose-filter-first.md +36 -0
  71. package/docs/lessons/0041-ambiguous-base-dir-path-nesting.md +32 -0
  72. package/docs/lessons/0042-spec-compliance-insufficient.md +36 -0
  73. package/docs/lessons/0043-exact-count-extensible-collections.md +32 -0
  74. package/docs/lessons/0044-relative-file-deps-worktree.md +39 -0
  75. package/docs/lessons/0045-iterative-design-improvement.md +33 -0
  76. package/docs/lessons/0046-plan-assertion-math-bugs.md +38 -0
  77. package/docs/lessons/0047-pytest-single-threaded-default.md +37 -0
  78. package/docs/lessons/0048-integration-wiring-batch.md +40 -0
  79. package/docs/lessons/0049-ab-verification.md +41 -0
  80. package/docs/lessons/0050-editing-sourced-files-during-execution.md +33 -0
  81. package/docs/lessons/0051-infrastructure-fixes-cant-self-heal.md +30 -0
  82. package/docs/lessons/0052-uncommitted-changes-poison-quality-gates.md +31 -0
  83. package/docs/lessons/0053-jq-compact-flag-inconsistency.md +31 -0
  84. package/docs/lessons/0054-parser-matches-inside-code-blocks.md +30 -0
  85. package/docs/lessons/0055-agents-compensate-for-garbled-prompts.md +31 -0
  86. package/docs/lessons/0056-grep-count-exit-code-on-zero.md +42 -0
  87. package/docs/lessons/0057-new-artifacts-break-git-clean-gates.md +42 -0
  88. package/docs/lessons/0058-dead-config-keys-never-consumed.md +49 -0
  89. package/docs/lessons/0059-contract-test-shared-structures.md +53 -0
  90. package/docs/lessons/0060-set-e-silent-death-in-runners.md +53 -0
  91. package/docs/lessons/0061-context-injection-dirty-state.md +50 -0
  92. package/docs/lessons/0062-sibling-bug-neighborhood-scan.md +29 -0
  93. package/docs/lessons/0063-one-flag-two-lifetimes.md +31 -0
  94. package/docs/lessons/0064-test-passes-wrong-reason.md +31 -0
  95. package/docs/lessons/0065-pipefail-grep-count-double-output.md +39 -0
  96. package/docs/lessons/0066-local-keyword-outside-function.md +37 -0
  97. package/docs/lessons/0067-stdin-hang-non-interactive-shell.md +36 -0
  98. package/docs/lessons/0068-agent-builds-wrong-thing-correctly.md +31 -0
  99. package/docs/lessons/0069-plan-quality-dominates-execution.md +30 -0
  100. package/docs/lessons/0070-spec-echo-back-prevents-drift.md +31 -0
  101. package/docs/lessons/0071-positive-instructions-outperform-negative.md +30 -0
  102. package/docs/lessons/0072-lost-in-the-middle-context-placement.md +30 -0
  103. package/docs/lessons/0073-unscoped-lessons-cause-false-positives.md +30 -0
  104. package/docs/lessons/0074-stale-context-injection-wrong-batch.md +32 -0
  105. package/docs/lessons/0075-research-artifacts-must-persist.md +32 -0
  106. package/docs/lessons/0076-wrong-decomposition-contaminates-downstream.md +30 -0
  107. package/docs/lessons/0077-cherry-pick-merges-need-manual-resolution.md +30 -0
  108. package/docs/lessons/0078-static-review-without-live-test.md +30 -0
  109. package/docs/lessons/0079-integration-wiring-batch-required.md +32 -0
  110. package/docs/lessons/FRAMEWORK.md +161 -0
  111. package/docs/lessons/SUMMARY.md +201 -0
  112. package/docs/lessons/TEMPLATE.md +85 -0
  113. package/docs/plans/2026-02-21-code-factory-v2-design.md +204 -0
  114. package/docs/plans/2026-02-21-code-factory-v2-implementation-plan.md +2189 -0
  115. package/docs/plans/2026-02-21-code-factory-v2-phase4-design.md +537 -0
  116. package/docs/plans/2026-02-21-code-factory-v2-phase4-implementation-plan.md +2012 -0
  117. package/docs/plans/2026-02-21-hardening-pass-design.md +108 -0
  118. package/docs/plans/2026-02-21-hardening-pass-plan.md +1378 -0
  119. package/docs/plans/2026-02-21-mab-research-report.md +406 -0
  120. package/docs/plans/2026-02-21-marketplace-restructure-design.md +240 -0
  121. package/docs/plans/2026-02-21-marketplace-restructure-plan.md +832 -0
  122. package/docs/plans/2026-02-21-phase4-completion-plan.md +697 -0
  123. package/docs/plans/2026-02-21-validator-suite-design.md +148 -0
  124. package/docs/plans/2026-02-21-validator-suite-plan.md +540 -0
  125. package/docs/plans/2026-02-22-mab-research-round2.md +556 -0
  126. package/docs/plans/2026-02-22-mab-run-design.md +462 -0
  127. package/docs/plans/2026-02-22-mab-run-plan.md +2046 -0
  128. package/docs/plans/2026-02-22-operations-design-methodology-research.md +681 -0
  129. package/docs/plans/2026-02-22-research-agent-failure-taxonomy.md +532 -0
  130. package/docs/plans/2026-02-22-research-code-guideline-policies.md +886 -0
  131. package/docs/plans/2026-02-22-research-codebase-audit-refactoring.md +908 -0
  132. package/docs/plans/2026-02-22-research-coding-standards-documentation.md +541 -0
  133. package/docs/plans/2026-02-22-research-competitive-landscape.md +687 -0
  134. package/docs/plans/2026-02-22-research-comprehensive-testing.md +1076 -0
  135. package/docs/plans/2026-02-22-research-context-utilization.md +459 -0
  136. package/docs/plans/2026-02-22-research-cost-quality-tradeoff.md +548 -0
  137. package/docs/plans/2026-02-22-research-lesson-transferability.md +508 -0
  138. package/docs/plans/2026-02-22-research-multi-agent-coordination.md +312 -0
  139. package/docs/plans/2026-02-22-research-phase-integration.md +602 -0
  140. package/docs/plans/2026-02-22-research-plan-quality.md +428 -0
  141. package/docs/plans/2026-02-22-research-prompt-engineering.md +558 -0
  142. package/docs/plans/2026-02-22-research-unconventional-perspectives.md +528 -0
  143. package/docs/plans/2026-02-22-research-user-adoption.md +638 -0
  144. package/docs/plans/2026-02-22-research-verification-effectiveness.md +433 -0
  145. package/docs/plans/2026-02-23-agent-suite-design.md +299 -0
  146. package/docs/plans/2026-02-23-agent-suite-plan.md +578 -0
  147. package/docs/plans/2026-02-23-phase3-cost-infrastructure-design.md +148 -0
  148. package/docs/plans/2026-02-23-phase3-cost-infrastructure-plan.md +1062 -0
  149. package/docs/plans/2026-02-23-research-bash-expert-agent.md +543 -0
  150. package/docs/plans/2026-02-23-research-dependency-auditor-agent.md +564 -0
  151. package/docs/plans/2026-02-23-research-improving-existing-agents.md +503 -0
  152. package/docs/plans/2026-02-23-research-integration-tester-agent.md +454 -0
  153. package/docs/plans/2026-02-23-research-python-expert-agent.md +429 -0
  154. package/docs/plans/2026-02-23-research-service-monitor-agent.md +425 -0
  155. package/docs/plans/2026-02-23-research-shell-expert-agent.md +533 -0
  156. package/docs/plans/2026-02-23-roadmap-to-completion.md +530 -0
  157. package/docs/plans/2026-02-24-headless-module-split-design.md +98 -0
  158. package/docs/plans/2026-02-24-headless-module-split.md +443 -0
  159. package/docs/plans/2026-02-24-lesson-scope-metadata-design.md +228 -0
  160. package/docs/plans/2026-02-24-lesson-scope-metadata-plan.md +968 -0
  161. package/docs/plans/2026-02-24-npm-packaging-design.md +841 -0
  162. package/docs/plans/2026-02-24-npm-packaging-plan.md +1965 -0
  163. package/docs/plans/audit-findings.md +186 -0
  164. package/docs/telegram-notification-format.md +98 -0
  165. package/examples/example-plan.md +51 -0
  166. package/examples/example-prd.json +72 -0
  167. package/examples/example-roadmap.md +33 -0
  168. package/examples/quickstart-plan.md +63 -0
  169. package/hooks/hooks.json +26 -0
  170. package/hooks/setup-symlinks.sh +48 -0
  171. package/hooks/stop-hook.sh +135 -0
  172. package/package.json +47 -0
  173. package/policies/bash.md +71 -0
  174. package/policies/python.md +71 -0
  175. package/policies/testing.md +61 -0
  176. package/policies/universal.md +60 -0
  177. package/scripts/analyze-report.sh +97 -0
  178. package/scripts/architecture-map.sh +145 -0
  179. package/scripts/auto-compound.sh +273 -0
  180. package/scripts/batch-audit.sh +42 -0
  181. package/scripts/batch-test.sh +101 -0
  182. package/scripts/entropy-audit.sh +221 -0
  183. package/scripts/failure-digest.sh +51 -0
  184. package/scripts/generate-ast-rules.sh +96 -0
  185. package/scripts/init.sh +112 -0
  186. package/scripts/lesson-check.sh +428 -0
  187. package/scripts/lib/common.sh +61 -0
  188. package/scripts/lib/cost-tracking.sh +153 -0
  189. package/scripts/lib/ollama.sh +60 -0
  190. package/scripts/lib/progress-writer.sh +128 -0
  191. package/scripts/lib/run-plan-context.sh +215 -0
  192. package/scripts/lib/run-plan-echo-back.sh +231 -0
  193. package/scripts/lib/run-plan-headless.sh +396 -0
  194. package/scripts/lib/run-plan-notify.sh +57 -0
  195. package/scripts/lib/run-plan-parser.sh +81 -0
  196. package/scripts/lib/run-plan-prompt.sh +215 -0
  197. package/scripts/lib/run-plan-quality-gate.sh +132 -0
  198. package/scripts/lib/run-plan-routing.sh +315 -0
  199. package/scripts/lib/run-plan-sampling.sh +170 -0
  200. package/scripts/lib/run-plan-scoring.sh +146 -0
  201. package/scripts/lib/run-plan-state.sh +142 -0
  202. package/scripts/lib/run-plan-team.sh +199 -0
  203. package/scripts/lib/telegram.sh +54 -0
  204. package/scripts/lib/thompson-sampling.sh +176 -0
  205. package/scripts/license-check.sh +74 -0
  206. package/scripts/mab-run.sh +575 -0
  207. package/scripts/module-size-check.sh +146 -0
  208. package/scripts/patterns/async-no-await.yml +5 -0
  209. package/scripts/patterns/bare-except.yml +6 -0
  210. package/scripts/patterns/empty-catch.yml +6 -0
  211. package/scripts/patterns/hardcoded-localhost.yml +9 -0
  212. package/scripts/patterns/retry-loop-no-backoff.yml +12 -0
  213. package/scripts/pipeline-status.sh +197 -0
  214. package/scripts/policy-check.sh +226 -0
  215. package/scripts/prior-art-search.sh +133 -0
  216. package/scripts/promote-mab-lessons.sh +126 -0
  217. package/scripts/prompts/agent-a-superpowers.md +29 -0
  218. package/scripts/prompts/agent-b-ralph.md +29 -0
  219. package/scripts/prompts/judge-agent.md +61 -0
  220. package/scripts/prompts/planner-agent.md +44 -0
  221. package/scripts/pull-community-lessons.sh +90 -0
  222. package/scripts/quality-gate.sh +266 -0
  223. package/scripts/research-gate.sh +90 -0
  224. package/scripts/run-plan.sh +329 -0
  225. package/scripts/scope-infer.sh +159 -0
  226. package/scripts/setup-ralph-loop.sh +155 -0
  227. package/scripts/telemetry.sh +230 -0
  228. package/scripts/tests/run-all-tests.sh +52 -0
  229. package/scripts/tests/test-act-cli.sh +46 -0
  230. package/scripts/tests/test-agents-md.sh +87 -0
  231. package/scripts/tests/test-analyze-report.sh +114 -0
  232. package/scripts/tests/test-architecture-map.sh +89 -0
  233. package/scripts/tests/test-auto-compound.sh +169 -0
  234. package/scripts/tests/test-batch-test.sh +65 -0
  235. package/scripts/tests/test-benchmark-runner.sh +25 -0
  236. package/scripts/tests/test-common.sh +168 -0
  237. package/scripts/tests/test-cost-tracking.sh +158 -0
  238. package/scripts/tests/test-echo-back.sh +180 -0
  239. package/scripts/tests/test-entropy-audit.sh +146 -0
  240. package/scripts/tests/test-failure-digest.sh +66 -0
  241. package/scripts/tests/test-generate-ast-rules.sh +145 -0
  242. package/scripts/tests/test-helpers.sh +82 -0
  243. package/scripts/tests/test-init.sh +47 -0
  244. package/scripts/tests/test-lesson-check.sh +278 -0
  245. package/scripts/tests/test-lesson-local.sh +55 -0
  246. package/scripts/tests/test-license-check.sh +109 -0
  247. package/scripts/tests/test-mab-run.sh +182 -0
  248. package/scripts/tests/test-ollama-lib.sh +49 -0
  249. package/scripts/tests/test-ollama.sh +60 -0
  250. package/scripts/tests/test-pipeline-status.sh +198 -0
  251. package/scripts/tests/test-policy-check.sh +124 -0
  252. package/scripts/tests/test-prior-art-search.sh +96 -0
  253. package/scripts/tests/test-progress-writer.sh +140 -0
  254. package/scripts/tests/test-promote-mab-lessons.sh +110 -0
  255. package/scripts/tests/test-pull-community-lessons.sh +149 -0
  256. package/scripts/tests/test-quality-gate.sh +241 -0
  257. package/scripts/tests/test-research-gate.sh +132 -0
  258. package/scripts/tests/test-run-plan-cli.sh +86 -0
  259. package/scripts/tests/test-run-plan-context.sh +305 -0
  260. package/scripts/tests/test-run-plan-e2e.sh +153 -0
  261. package/scripts/tests/test-run-plan-headless.sh +424 -0
  262. package/scripts/tests/test-run-plan-notify.sh +124 -0
  263. package/scripts/tests/test-run-plan-parser.sh +217 -0
  264. package/scripts/tests/test-run-plan-prompt.sh +254 -0
  265. package/scripts/tests/test-run-plan-quality-gate.sh +222 -0
  266. package/scripts/tests/test-run-plan-routing.sh +178 -0
  267. package/scripts/tests/test-run-plan-scoring.sh +148 -0
  268. package/scripts/tests/test-run-plan-state.sh +261 -0
  269. package/scripts/tests/test-run-plan-team.sh +157 -0
  270. package/scripts/tests/test-scope-infer.sh +150 -0
  271. package/scripts/tests/test-setup-ralph-loop.sh +63 -0
  272. package/scripts/tests/test-telegram-env.sh +38 -0
  273. package/scripts/tests/test-telegram.sh +121 -0
  274. package/scripts/tests/test-telemetry.sh +46 -0
  275. package/scripts/tests/test-thompson-sampling.sh +139 -0
  276. package/scripts/tests/test-validate-all.sh +60 -0
  277. package/scripts/tests/test-validate-commands.sh +89 -0
  278. package/scripts/tests/test-validate-hooks.sh +98 -0
  279. package/scripts/tests/test-validate-lessons.sh +150 -0
  280. package/scripts/tests/test-validate-plan-quality.sh +235 -0
  281. package/scripts/tests/test-validate-plans.sh +187 -0
  282. package/scripts/tests/test-validate-plugin.sh +106 -0
  283. package/scripts/tests/test-validate-prd.sh +184 -0
  284. package/scripts/tests/test-validate-skills.sh +134 -0
  285. package/scripts/validate-all.sh +57 -0
  286. package/scripts/validate-commands.sh +67 -0
  287. package/scripts/validate-hooks.sh +89 -0
  288. package/scripts/validate-lessons.sh +98 -0
  289. package/scripts/validate-plan-quality.sh +369 -0
  290. package/scripts/validate-plans.sh +120 -0
  291. package/scripts/validate-plugin.sh +86 -0
  292. package/scripts/validate-policies.sh +42 -0
  293. package/scripts/validate-prd.sh +118 -0
  294. package/scripts/validate-skills.sh +96 -0
  295. package/skills/autocode/SKILL.md +285 -0
  296. package/skills/autocode/ab-verification.md +51 -0
  297. package/skills/autocode/code-quality-standards.md +37 -0
  298. package/skills/autocode/competitive-mode.md +364 -0
  299. package/skills/brainstorming/SKILL.md +97 -0
  300. package/skills/capture-lesson/SKILL.md +187 -0
  301. package/skills/check-lessons/SKILL.md +116 -0
  302. package/skills/dispatching-parallel-agents/SKILL.md +110 -0
  303. package/skills/executing-plans/SKILL.md +85 -0
  304. package/skills/finishing-a-development-branch/SKILL.md +201 -0
  305. package/skills/receiving-code-review/SKILL.md +72 -0
  306. package/skills/requesting-code-review/SKILL.md +59 -0
  307. package/skills/requesting-code-review/code-reviewer.md +82 -0
  308. package/skills/research/SKILL.md +145 -0
  309. package/skills/roadmap/SKILL.md +115 -0
  310. package/skills/subagent-driven-development/SKILL.md +98 -0
  311. package/skills/subagent-driven-development/code-quality-reviewer-prompt.md +18 -0
  312. package/skills/subagent-driven-development/implementer-prompt.md +73 -0
  313. package/skills/subagent-driven-development/spec-reviewer-prompt.md +57 -0
  314. package/skills/systematic-debugging/SKILL.md +134 -0
  315. package/skills/systematic-debugging/condition-based-waiting.md +64 -0
  316. package/skills/systematic-debugging/defense-in-depth.md +32 -0
  317. package/skills/systematic-debugging/root-cause-tracing.md +55 -0
  318. package/skills/test-driven-development/SKILL.md +167 -0
  319. package/skills/using-git-worktrees/SKILL.md +219 -0
  320. package/skills/using-superpowers/SKILL.md +54 -0
  321. package/skills/verification-before-completion/SKILL.md +140 -0
  322. package/skills/verify/SKILL.md +82 -0
  323. package/skills/writing-plans/SKILL.md +128 -0
  324. package/skills/writing-skills/SKILL.md +93 -0
@@ -0,0 +1,1965 @@
+ # npm Packaging Implementation Plan
+
+ > **For Claude:** REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.
+
+ **Goal:** Package the autonomous-coding-toolkit as an installable npm package (`act` CLI) with telemetry, benchmarks, and learning system infrastructure.
+
+ **Architecture:** Add `package.json` + `bin/act.js` Node.js router on top of existing bash scripts. No scripts move or change structure. Three new scripts (`init.sh`, `telemetry.sh`, `benchmarks/runner.sh`) follow existing patterns. All state remains project-local.
+
+ **Tech Stack:** Node.js 18+ (CLI router only), bash 4+ (all scripts), jq (state/telemetry)
+
+ **Design doc:** `docs/plans/2026-02-24-npm-packaging-design.md`
+
+ ---
+
+ ## Priority Tiers
+
+ - **P0 (Batches 1-4):** Required for `npm publish`: package.json, CLI router, init, portability fixes, README
+ - **P1 (Batches 5-7):** Learning system: telemetry capture/dashboard, quality gate integration, benchmark suite
+ - **P2 (Batches 8-9):** Enhancements: trust score, graduated autonomy, semantic echo-back Tier 2
+
+ ---
+
+ ## Batch 1: package.json + CLI Router
+
+ ### Task 1: Create package.json
+
+ **Files:**
+ - Create: `package.json`
+
+ **Step 1: Create package.json**
+
+ ```json
+ {
+   "name": "autonomous-coding-toolkit",
+   "version": "1.0.0",
+   "description": "Autonomous AI coding pipeline: quality gates, fresh-context execution, community lessons, and compounding learning",
+   "license": "MIT",
+   "author": "Justin McFarland <parthalon025@gmail.com>",
+   "homepage": "https://github.com/parthalon025/autonomous-coding-toolkit",
+   "repository": "https://github.com/parthalon025/autonomous-coding-toolkit",
+   "bin": {
+     "act": "./bin/act.js"
+   },
+   "files": [
+     "bin/",
+     "scripts/",
+     "skills/",
+     "commands/",
+     "agents/",
+     "hooks/",
+     "policies/",
+     "examples/",
+     "benchmarks/",
+     "docs/",
+     ".claude-plugin/",
+     "Makefile",
+     "SECURITY.md"
+   ],
+   "engines": {
+     "node": ">=18.0.0"
+   },
+   "os": [
+     "linux",
+     "darwin",
+     "win32"
+   ],
+   "keywords": [
+     "autonomous-coding",
+     "ai-agents",
+     "quality-gates",
+     "claude-code",
+     "tdd",
+     "lessons-learned",
+     "headless",
+     "multi-armed-bandit",
+     "code-review",
+     "pipeline"
+   ]
+ }
+ ```
+
+ **Step 2: Verify package.json is valid**
+
+ Run: `cd ~/Documents/projects/autonomous-coding-toolkit && node -e "require('./package.json'); console.log('valid')"`
+ Expected: `valid`
+
+ **Step 3: Verify npm pack lists expected files**
+
+ Run: `cd ~/Documents/projects/autonomous-coding-toolkit && npm pack --dry-run 2>&1 | head -20`
+ Expected: Output should list `bin/act.js`, `scripts/`, `skills/`, `docs/`, etc. Should NOT list `logs/`, `.run-plan-state.json`, `.worktrees/`.
+
+ **Step 4: Commit**
+
+ ```bash
+ git add package.json
+ git commit -m "feat: add package.json for npm distribution"
+ ```
+
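Steps 2 and 3 of Task 1 check the manifest by hand. The following is a minimal sketch of the same idea as a script, assuming a hypothetical helper named `checkManifest` that is not part of the toolkit. It only confirms that every `bin` target and every `files` entry resolves to something on disk; it does not replace `npm pack --dry-run`.

```javascript
'use strict';

const fs = require('fs');
const path = require('path');

// Hypothetical helper (not in the toolkit): returns a list of human-readable
// problems found in a parsed package.json. `exists` is injectable so the
// check can be exercised without touching the real filesystem.
function checkManifest(manifest, rootDir, exists = fs.existsSync) {
  const problems = [];

  // Every `bin` target must point at a file that is present.
  for (const [name, target] of Object.entries(manifest.bin || {})) {
    if (!exists(path.join(rootDir, target))) {
      problems.push(`bin "${name}" points at missing file ${target}`);
    }
  }

  // Every `files` entry should exist, so a typo in the list is caught early.
  for (const entry of manifest.files || []) {
    if (!exists(path.join(rootDir, entry))) {
      problems.push(`files entry ${entry} does not exist`);
    }
  }

  return problems;
}
```

From the repository root, something like `checkManifest(require('./package.json'), process.cwd())` would be expected to return an empty array once `bin/act.js` has been created in Task 2.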
+ ### Task 2: Create bin/act.js CLI Router
+
+ **Files:**
+ - Create: `bin/act.js`
+
+ **Step 1: Create directory**
+
+ ```bash
+ mkdir -p bin
+ ```
+
+ **Step 2: Write bin/act.js**
+
+ ```javascript
+ #!/usr/bin/env node
+ 'use strict';
+
+ const { execFileSync, execSync } = require('child_process');
+ const path = require('path');
+ const fs = require('fs');
+
+ const TOOLKIT_ROOT = path.resolve(__dirname, '..');
+ const SCRIPTS = path.join(TOOLKIT_ROOT, 'scripts');
+ const VERSION = require(path.join(TOOLKIT_ROOT, 'package.json')).version;
+
+ // --- Platform check ---
+ function checkBash() {
+   try {
+     execFileSync('bash', ['--version'], { stdio: 'pipe' });
+   } catch {
+     console.error('Error: bash is required but not found.');
+     if (process.platform === 'win32') {
+       console.error('');
+       console.error('On Windows, install WSL (Windows Subsystem for Linux):');
+       console.error(' wsl --install');
+       console.error('Then run this command inside WSL.');
+     }
+     process.exit(1);
+   }
+ }
+
+ // --- Dependency check ---
+ function checkDeps() {
+   const required = ['git', 'jq'];
+   const missing = required.filter(cmd => {
+     try {
+       execFileSync('which', [cmd], { stdio: 'pipe' });
+       return false;
+     } catch {
+       return true;
+     }
+   });
+   if (missing.length > 0) {
+     console.error(`Error: Required commands not found: ${missing.join(', ')}`);
+     console.error('Install them and try again.');
+     process.exit(1);
+   }
+ }
+
+ // --- Command routing ---
+ const COMMANDS = {
+   // Execution
+   'plan': { script: 'run-plan.sh' },
+   'compound': { script: 'auto-compound.sh' },
+   'mab': { script: 'mab-run.sh' },
+
+   // Quality
+   'gate': { script: 'quality-gate.sh' },
+   'check': { script: 'lesson-check.sh' },
+   'policy': { script: 'policy-check.sh' },
+   'research-gate': { script: 'research-gate.sh' },
+   'validate': { script: 'validate-all.sh' },
+   'validate-plan': { script: 'validate-plan-quality.sh' },
+   'validate-prd': { script: 'validate-prd.sh' },
+
+   // Lessons
+   'lessons': { dispatch: true },
+
+   // Analysis
+   'audit': { script: 'entropy-audit.sh' },
+   'batch-audit': { script: 'batch-audit.sh' },
+   'batch-test': { script: 'batch-test.sh' },
+   'analyze': { script: 'analyze-report.sh' },
+   'digest': { script: 'failure-digest.sh' },
+   'status': { script: 'pipeline-status.sh' },
+   'architecture': { script: 'architecture-map.sh' },
+
+   // Setup
+   'init': { script: 'init.sh' },
+   'license-check': { script: 'license-check.sh' },
+   'module-size': { script: 'module-size-check.sh' },
+
+   // Telemetry
+   'telemetry': { script: 'telemetry.sh' },
+
+   // Benchmarks
+   'benchmark': { script: path.join('..', 'benchmarks', 'runner.sh'), relative: true },
+ };
+
+ // Lessons sub-dispatch
+ const LESSONS_COMMANDS = {
+   'pull': { script: 'pull-community-lessons.sh' },
+   'check': { script: 'lesson-check.sh', args: ['--list'] },
+   'promote': { script: 'promote-mab-lessons.sh' },
+   'infer': { script: 'scope-infer.sh' },
+ };
+
+ function runScript(scriptPath, args) {
+   const fullPath = path.join(SCRIPTS, scriptPath);
+   if (!fs.existsSync(fullPath)) {
+     console.error(`Error: Script not found: ${fullPath}`);
+     console.error('This command may not be available yet.');
+     process.exit(1);
+   }
+   try {
+     execFileSync('bash', [fullPath, ...args], { stdio: 'inherit' });
+   } catch (err) {
+     process.exit(err.status || 1);
+   }
+ }
+
+ function showHelp() {
+   console.log(`Autonomous Coding Toolkit v${VERSION}`);
+   console.log('');
+   console.log('Usage: act <command> [options]');
+   console.log('');
+   console.log('Execution:');
+   console.log(' plan <file> [flags] Headless/team/MAB batch execution');
+   console.log(' plan --resume Resume interrupted execution');
+   console.log(' compound [dir] Full pipeline: report→PRD→execute→PR');
+   console.log(' mab <flags> Multi-Armed Bandit competing agents');
+   console.log('');
+   console.log('Quality:');
+   console.log(' gate [flags] Composite quality gate');
+   console.log(' check [files...] Syntactic anti-pattern scan');
+   console.log(' policy [flags] Advisory positive-pattern check');
+   console.log(' validate Toolkit self-validation');
+   console.log(' validate-plan <file> Score plan quality (8 dimensions)');
+   console.log(' validate-prd [file] Validate PRD JSON structure');
+   console.log('');
+   console.log('Lessons:');
+   console.log(' lessons pull [--remote] Sync community lessons');
+   console.log(' lessons check List active lesson checks');
+   console.log(' lessons promote Auto-promote MAB patterns');
+   console.log(' lessons infer [--apply] Infer scope tags');
+   console.log('');
+   console.log('Analysis:');
+   console.log(' audit [flags] Doc drift & naming violations');
+   console.log(' batch-audit <dir> Cross-project audit');
+   console.log(' batch-test <dir> Memory-aware cross-project tests');
+   console.log(' analyze <report> Extract priority from report');
+   console.log(' digest <log> Summarize failure patterns');
+   console.log(' status [dir] Pipeline health check');
+   console.log(' architecture [dir] Generate architecture diagram');
+   console.log('');
+   console.log('Telemetry:');
+   console.log(' telemetry show Dashboard: success rate, cost, lesson hits');
+   console.log(' telemetry export Export anonymized run data');
+   console.log(' telemetry import <f> Import community aggregate data');
+   console.log(' telemetry reset Clear local telemetry');
+   console.log('');
+   console.log('Benchmarks:');
+   console.log(' benchmark run [name] Execute benchmark tasks');
+   console.log(' benchmark compare a b Compare two benchmark results');
+   console.log('');
+   console.log('Setup:');
+   console.log(' init Bootstrap project for toolkit use');
+   console.log(' init --quickstart Fast lane: working example in <3 min');
+   console.log(' license-check GPL/AGPL dependency audit');
+   console.log(' module-size Detect oversized modules');
+   console.log('');
+   console.log('Meta:');
+   console.log(' version Print version');
+   console.log(' help Show this help');
+ }
+
+ // --- Main ---
+ function main() {
+   const args = process.argv.slice(2);
+   const command = args[0];
+   const rest = args.slice(1);
+
+   if (!command || command === 'help' || command === '--help' || command === '-h') {
+     showHelp();
+     process.exit(0);
+   }
+
+   if (command === 'version' || command === '--version' || command === '-v') {
+     console.log(`act v${VERSION}`);
+     process.exit(0);
+   }
+
+   checkBash();
+   checkDeps();
+
+   // Lessons sub-dispatch
+   if (command === 'lessons') {
+     const sub = rest[0];
+     if (!sub || !LESSONS_COMMANDS[sub]) {
+       console.error('Usage: act lessons <pull|check|promote|infer> [options]');
+       process.exit(1);
+     }
+     const cmd = LESSONS_COMMANDS[sub];
+     const subArgs = cmd.args ? [...cmd.args, ...rest.slice(1)] : rest.slice(1);
+     runScript(cmd.script, subArgs);
+     return;
+   }
+
+   const cmd = COMMANDS[command];
+   if (!cmd) {
+     console.error(`Unknown command: ${command}`);
+     console.error('Run "act help" for available commands.');
+     process.exit(1);
+   }
+
+   if (cmd.relative) {
+     // Script path relative to toolkit root, not scripts/
+     const fullPath = path.join(TOOLKIT_ROOT, 'benchmarks', 'runner.sh');
+     if (!fs.existsSync(fullPath)) {
+       console.error(`Error: Script not found: ${fullPath}`);
+       process.exit(1);
+     }
+     try {
+       execFileSync('bash', [fullPath, ...rest], { stdio: 'inherit' });
+     } catch (err) {
+       process.exit(err.status || 1);
+     }
+     return;
+   }
+
+   runScript(cmd.script, rest);
+ }
+
+ main();
+ ```
+
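The routing in `bin/act.js` comes down to a table lookup plus one level of sub-dispatch for `lessons`. The sketch below isolates that lookup as a pure function so it can be tested without spawning bash. The name `resolveCommand` and the trimmed tables are assumptions made for illustration; they are not part of the published router.

```javascript
'use strict';

const path = require('path');

// Trimmed-down copies of the tables in bin/act.js (only a few entries shown).
const COMMANDS = {
  plan: { script: 'run-plan.sh' },
  gate: { script: 'quality-gate.sh' },
  benchmark: { script: path.join('..', 'benchmarks', 'runner.sh') },
};
const LESSONS_COMMANDS = {
  pull: { script: 'pull-community-lessons.sh' },
  check: { script: 'lesson-check.sh', args: ['--list'] },
};

// True only for keys defined directly on the table, never inherited ones.
function hasOwn(table, key) {
  return typeof key === 'string' &&
    Object.prototype.hasOwnProperty.call(table, key);
}

// Hypothetical pure resolver: maps argv to { script, args }, or returns null
// for an unknown command. It performs no I/O.
function resolveCommand(argv, scriptsDir) {
  const [command, ...rest] = argv;

  if (command === 'lessons') {
    const [sub, ...subRest] = rest;
    if (!hasOwn(LESSONS_COMMANDS, sub)) return null;
    const entry = LESSONS_COMMANDS[sub];
    // Preset args (such as --list) come before user-supplied ones.
    return {
      script: path.join(scriptsDir, entry.script),
      args: [...(entry.args || []), ...subRest],
    };
  }

  if (!hasOwn(COMMANDS, command)) return null;
  return { script: path.join(scriptsDir, COMMANDS[command].script), args: rest };
}
```

Two design points in this sketch: `path.join` normalizes the `..` segment, so the `benchmark` entry resolves to `benchmarks/runner.sh` beside `scripts/` without a special case, and the own-property check keeps inherited names such as `constructor` from being treated as commands.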
+ **Step 3: Make executable**
+
+ ```bash
+ chmod +x bin/act.js
+ ```
+
+ **Step 4: Verify the router starts**
+
+ Run: `cd ~/Documents/projects/autonomous-coding-toolkit && node bin/act.js version`
+ Expected: `act v1.0.0`
+
+ Run: `node bin/act.js help | head -5`
+ Expected: Shows "Autonomous Coding Toolkit v1.0.0" and "Usage: act <command> [options]"
+
+ **Step 5: Verify subcommand routing works**
+
+ Run: `node bin/act.js validate --help`
+ Expected: Shows validate-all.sh usage (or runs successfully)
+
+ Run: `node bin/act.js gate --help`
+ Expected: Shows quality-gate.sh usage
+
+ **Step 6: Commit**
+
+ ```bash
+ git add bin/act.js
+ git commit -m "feat: add bin/act.js CLI router for npm distribution"
+ ```
+
364
+ ### Task 3: Write test for CLI router
365
+
366
+ **Files:**
367
+ - Create: `scripts/tests/test-act-cli.sh`
368
+
369
+ **Step 1: Write the test**
370
+
371
+ ```bash
372
+ #!/usr/bin/env bash
373
+ # Test bin/act.js — CLI router
374
+ set -euo pipefail
375
+
376
+ SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
377
+ REPO_ROOT="$(cd "$SCRIPT_DIR/../.." && pwd)"
378
+ ACT="$REPO_ROOT/bin/act.js"
379
+
380
+ source "$SCRIPT_DIR/test-helpers.sh"
381
+
382
+ # --- Test 1: version ---
383
+ output=$(node "$ACT" version 2>&1)
384
+ assert_contains "act version prints version" "act v" "$output"
385
+
386
+ # --- Test 2: help ---
387
+ output=$(node "$ACT" help 2>&1)
388
+ assert_contains "act help shows usage" "Usage: act <command>" "$output"
389
+ assert_contains "act help lists plan command" "plan" "$output"
390
+ assert_contains "act help lists gate command" "gate" "$output"
391
+
392
+ # --- Test 3: unknown command exits non-zero ---
393
+ exit_code=0
394
+ node "$ACT" nonexistent-command >/dev/null 2>&1 || exit_code=$?
395
+ assert_eq "unknown command exits non-zero" "1" "$exit_code"
396
+
397
+ # --- Test 4: validate routes correctly ---
398
+ output=$(node "$ACT" validate --help 2>&1 || true)
399
+ assert_contains "validate routes to validate-all.sh" "validate" "$output"
400
+
401
+ # --- Test 5: lessons subcommand without sub shows usage ---
402
+ exit_code=0
403
+ output=$(node "$ACT" lessons 2>&1) || exit_code=$?
404
+ assert_eq "lessons without sub exits non-zero" "1" "$exit_code"
405
+ assert_contains "lessons shows usage hint" "Usage: act lessons" "$output"
406
+
407
+ report_results
408
+ ```
409
+
410
+ **Step 2: Make executable and run**
411
+
412
+ ```bash
413
+ chmod +x scripts/tests/test-act-cli.sh
414
+ ```
415
+
416
+ Run: `cd ~/Documents/projects/autonomous-coding-toolkit && bash scripts/tests/test-act-cli.sh`
417
+ Expected: All tests PASS
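The test scripts in this plan source `test-helpers.sh`, which is not shown here. Judging only from the call sites (`assert_eq LABEL EXPECTED ACTUAL`, `assert_contains LABEL NEEDLE HAYSTACK`, `pass`, `fail`, `report_results`), a compatible set of helpers might look like the sketch below. The counter names and message format are assumptions, not the toolkit's actual implementation.

```shell
#!/usr/bin/env bash
# Sketch of helpers compatible with the call sites in this plan.
# Counter names and output format are assumptions, not the real test-helpers.sh.
TESTS_RUN=0
TESTS_FAILED=0

# pass LABEL: count one passing test.
pass() {
    TESTS_RUN=$((TESTS_RUN + 1))
    echo "PASS: $1"
}

# fail LABEL: count one failing test.
fail() {
    TESTS_RUN=$((TESTS_RUN + 1))
    TESTS_FAILED=$((TESTS_FAILED + 1))
    echo "FAIL: $1"
}

# assert_eq LABEL EXPECTED ACTUAL: exact string comparison.
assert_eq() {
    if [[ "$2" == "$3" ]]; then
        pass "$1"
    else
        fail "$1 (expected '$2', got '$3')"
    fi
}

# assert_contains LABEL NEEDLE HAYSTACK: literal substring match.
assert_contains() {
    if [[ "$3" == *"$2"* ]]; then
        pass "$1"
    else
        fail "$1 (expected output to contain '$2')"
    fi
}

# report_results: print a summary and return non-zero if any test failed.
report_results() {
    echo "Ran $TESTS_RUN tests, $TESTS_FAILED failed"
    [[ "$TESTS_FAILED" -eq 0 ]]
}
```

With helpers shaped like this, assertions made inside a `( ... )` subshell would update the counters only in that subshell, so an implementation that supports the subshell-style tests later in this plan would likely need to track results some other way (for example in a temp file).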
418
+
419
+ **Step 3: Verify run-all-tests discovers it**
420
+
421
+ Run: `bash scripts/tests/run-all-tests.sh 2>&1 | tail -5`
422
+ Expected: test-act-cli.sh appears in the test list, all pass
423
+
424
+ **Step 4: Commit**
425
+
426
+ ```bash
427
+ git add scripts/tests/test-act-cli.sh
428
+ git commit -m "test: add CLI router tests for bin/act.js"
429
+ ```
430
+
431
+ ---
432
+
433
+ ## Batch 2: Project Bootstrapper (act init)
434
+
435
+ ### Task 4: Write test for init.sh
436
+
437
+ **Files:**
438
+ - Create: `scripts/tests/test-init.sh`
439
+
440
+ **Step 1: Write the failing test**
441
+
442
+ ```bash
443
+ #!/usr/bin/env bash
444
+ # Test scripts/init.sh — project bootstrapper
445
+ set -euo pipefail
446
+
447
+ SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
448
+ REPO_ROOT="$(cd "$SCRIPT_DIR/../.." && pwd)"
449
+ INIT_SCRIPT="$REPO_ROOT/scripts/init.sh"
450
+
451
+ source "$SCRIPT_DIR/test-helpers.sh"
452
+
453
+ # --- Setup temp project ---
454
+ WORK=$(mktemp -d)
455
+ trap 'rm -rf "$WORK"' EXIT
456
+ cd "$WORK"
457
+ git init -q
458
+
459
+ # --- Test 1: init creates tasks/ directory ---
460
+ bash "$INIT_SCRIPT" --project-root "$WORK" 2>&1 || true
461
+ assert_eq "init creates tasks/ directory" "true" "$([ -d "$WORK/tasks" ] && echo true || echo false)"
462
+
463
+ # --- Test 2: init creates progress.txt ---
464
+ assert_eq "init creates progress.txt" "true" "$([ -f "$WORK/progress.txt" ] && echo true || echo false)"
465
+
466
+ # --- Test 3: init creates logs/ directory ---
467
+ assert_eq "init creates logs/ directory" "true" "$([ -d "$WORK/logs" ] && echo true || echo false)"
468
+
469
+ # --- Test 4: init detects project type ---
470
+ output=$(bash "$INIT_SCRIPT" --project-root "$WORK" 2>&1 || true)
471
+ assert_contains "init detects project type" "Detected:" "$output"
472
+
473
+ # --- Test 5: init with --quickstart copies quickstart plan ---
474
+ mkdir -p "$WORK/docs/plans"
475
+ bash "$INIT_SCRIPT" --project-root "$WORK" --quickstart 2>&1 || true
476
+ assert_eq "quickstart creates plan file" "true" "$([ -f "$WORK/docs/plans/quickstart.md" ] && echo true || echo false)"
477
+
478
+ # --- Test 6: init is idempotent ---
479
+ bash "$INIT_SCRIPT" --project-root "$WORK" 2>&1 || true
480
+ exit_code=0
481
+ bash "$INIT_SCRIPT" --project-root "$WORK" 2>&1 || exit_code=$?
482
+ assert_eq "init is idempotent (exit 0 on re-run)" "0" "$exit_code"
483
+
484
+ report_results
485
+ ```
486
+
487
+ **Step 2: Make executable and verify it fails**
488
+
489
+ ```bash
490
+ chmod +x scripts/tests/test-init.sh
491
+ ```
492
+
493
+ Run: `cd ~/Documents/projects/autonomous-coding-toolkit && bash scripts/tests/test-init.sh 2>&1 | tail -3`
494
+ Expected: FAIL (init.sh doesn't exist yet)
495
+
496
+ ### Task 5: Implement init.sh
497
+
498
+ **Files:**
499
+ - Create: `scripts/init.sh`
500
+
501
+ **Step 1: Write the implementation**
502
+
503
+ ```bash
504
+ #!/usr/bin/env bash
505
+ # init.sh — Bootstrap a project for use with the Autonomous Coding Toolkit
506
+ #
507
+ # Usage: init.sh --project-root <dir> [--quickstart]
508
+ set -euo pipefail
509
+
510
+ SCRIPT_DIR="$(cd "$(dirname "$(readlink -f "${BASH_SOURCE[0]}")")" && pwd)"
511
+ TOOLKIT_ROOT="$(cd "$SCRIPT_DIR/.." && pwd)"
512
+ source "$SCRIPT_DIR/lib/common.sh"
513
+
514
+ PROJECT_ROOT=""
515
+ QUICKSTART=false
516
+
517
+ usage() {
518
+ cat <<'USAGE'
519
+ Usage: init.sh --project-root <dir> [--quickstart]
520
+
521
+ Bootstrap a project for the Autonomous Coding Toolkit.
522
+
523
+ Creates:
524
+ tasks/ — PRD and acceptance criteria
525
+ logs/ — Telemetry, routing decisions, failure patterns
526
+ progress.txt — Append-only discovery log
527
+
528
+ Options:
529
+ --project-root <dir> Project directory to initialize (required)
530
+ --quickstart Copy quickstart plan + run quality gate
531
+ --help, -h Show this help
532
+
533
+ USAGE
534
+ exit 0
535
+ }
536
+
537
+ while [[ $# -gt 0 ]]; do
538
+ case "$1" in
539
+ --project-root) PROJECT_ROOT="${2:-}"; shift 2 ;;
540
+ --quickstart) QUICKSTART=true; shift ;;
541
+ --help|-h) usage ;;
542
+ *) echo "init: unknown option: $1" >&2; exit 1 ;;
543
+ esac
544
+ done
545
+
546
+ if [[ -z "$PROJECT_ROOT" ]]; then
547
+ echo "init: --project-root is required" >&2
548
+ exit 1
549
+ fi
550
+
551
+ PROJECT_ROOT="$(cd "$PROJECT_ROOT" && pwd)"
552
+
553
+ echo "Autonomous Coding Toolkit — Project Init"
554
+ echo "========================================="
555
+ echo ""
556
+
557
+ # Detect project type
558
+ project_type=$(detect_project_type "$PROJECT_ROOT")
559
+ echo "Detected: $project_type project"
560
+
561
+ # Create directories
562
+ mkdir -p "$PROJECT_ROOT/tasks"
563
+ mkdir -p "$PROJECT_ROOT/logs"
564
+ mkdir -p "$PROJECT_ROOT/docs/plans"
565
+ echo "Created: tasks/, logs/, docs/plans/"
566
+
567
+ # Create progress.txt if missing
568
+ if [[ ! -f "$PROJECT_ROOT/progress.txt" ]]; then
569
+ echo "# Progress — $(basename "$PROJECT_ROOT")" > "$PROJECT_ROOT/progress.txt"
570
+ echo "# Append-only discovery log. Read at start of each batch." >> "$PROJECT_ROOT/progress.txt"
571
+ echo "" >> "$PROJECT_ROOT/progress.txt"
572
+ echo "Created: progress.txt"
573
+ else
574
+ echo "Exists: progress.txt (skipped)"
575
+ fi
576
+
577
+ # Detect language for scope tags
578
+ scope_lang=""
579
+ case "$project_type" in
580
+ python) scope_lang="language:python" ;;
581
+ node) scope_lang="language:javascript" ;;
582
+ bash) scope_lang="language:bash" ;;
583
+ *) scope_lang="" ;;
584
+ esac
585
+
586
+ # Print next steps
587
+ echo ""
588
+ echo "--- Next Steps ---"
589
+ echo ""
590
+ echo "1. Quality gate: act gate --project-root $PROJECT_ROOT"
591
+ echo "2. Run a plan: act plan docs/plans/your-plan.md"
592
+
593
+ if [[ -n "$scope_lang" ]]; then
594
+ echo ""
595
+ echo "Recommended: Add to your CLAUDE.md:"
596
+ echo " ## Scope Tags"
597
+ echo " $scope_lang"
598
+ fi
599
+
600
+ # Quickstart mode
601
+ if [[ "$QUICKSTART" == true ]]; then
602
+ echo ""
603
+ echo "--- Quickstart ---"
604
+ if [[ -f "$TOOLKIT_ROOT/examples/quickstart-plan.md" ]]; then
605
+ cp "$TOOLKIT_ROOT/examples/quickstart-plan.md" "$PROJECT_ROOT/docs/plans/quickstart.md"
606
+ echo "Copied: docs/plans/quickstart.md"
607
+ echo ""
608
+ echo "Run your first quality-gated execution:"
609
+ echo " act plan docs/plans/quickstart.md"
610
+ else
611
+ echo "WARNING: quickstart-plan.md not found in toolkit" >&2
612
+ fi
613
+ fi
614
+
615
+ echo ""
616
+ echo "Init complete."
617
+ ```
618
+
619
+ **Step 2: Make executable**
620
+
621
+ ```bash
622
+ chmod +x scripts/init.sh
623
+ ```
624
+
625
+ **Step 3: Run the tests**
626
+
627
+ Run: `cd ~/Documents/projects/autonomous-coding-toolkit && bash scripts/tests/test-init.sh`
628
+ Expected: All tests PASS
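`init.sh` depends on `detect_project_type` from `scripts/lib/common.sh`, which this plan does not show. The `case` statement in `init.sh` only tells us that it can print `python`, `node`, or `bash`. The stand-in below is a hypothetical sketch for reasoning about the tests; the marker files it checks and the `unknown` fallback are assumptions, and the function is deliberately given a different name.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for detect_project_type (the real one lives in lib/common.sh).
# The marker files and the "unknown" fallback are assumptions for illustration.
sketch_detect_project_type() {
    local root="$1"
    if [[ -f "$root/pyproject.toml" || -f "$root/setup.py" || -f "$root/requirements.txt" ]]; then
        echo "python"
    elif [[ -f "$root/package.json" ]]; then
        echo "node"
    elif compgen -G "$root/*.sh" >/dev/null; then
        # At least one top-level shell script exists.
        echo "bash"
    else
        echo "unknown"
    fi
}
```

Whatever the real detection logic is, Test 4 above only requires that `init.sh` prints a line starting with `Detected:`, so an empty temp project must still produce some type string rather than an error.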
629
+
630
+ **Step 4: Commit**
631
+
632
+ ```bash
633
+ git add scripts/init.sh scripts/tests/test-init.sh
634
+ git commit -m "feat: add init.sh project bootstrapper with quickstart mode"
635
+ ```
636
+
637
+ ---
638
+
639
+ ## Batch 3: Portability Fixes
640
+
641
+ ### Task 6: Fix hardcoded ~/.env in telegram.sh
642
+
643
+ **Files:**
644
+ - Modify: `scripts/lib/telegram.sh:9`
645
+ - Create: `scripts/tests/test-telegram-env.sh`
646
+
647
+ **Step 1: Write the failing test**
648
+
649
+ ```bash
650
+ #!/usr/bin/env bash
651
+ # Test telegram.sh — ACT_ENV_FILE support
652
+ set -euo pipefail
653
+
654
+ SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
655
+ REPO_ROOT="$(cd "$SCRIPT_DIR/../.." && pwd)"
656
+
657
+ source "$SCRIPT_DIR/test-helpers.sh"
658
+
659
+ # --- Setup ---
660
+ WORK=$(mktemp -d)
661
+ trap 'rm -rf "$WORK"' EXIT
662
+
663
+ # Create a fake .env
664
+ cat > "$WORK/test.env" <<'ENV'
665
+ TELEGRAM_BOT_TOKEN=test-token-123
666
+ TELEGRAM_CHAT_ID=test-chat-456
667
+ ENV
668
+
669
+ # --- Test 1: ACT_ENV_FILE overrides default ---
670
+ (
671
+ export ACT_ENV_FILE="$WORK/test.env"
672
+ source "$REPO_ROOT/scripts/lib/telegram.sh"
673
+ _load_telegram_env
674
+ assert_eq "ACT_ENV_FILE loads token" "test-token-123" "$TELEGRAM_BOT_TOKEN"
675
+ assert_eq "ACT_ENV_FILE loads chat id" "test-chat-456" "$TELEGRAM_CHAT_ID"
676
+ )
677
+
678
+ # --- Test 2: Explicit argument still works ---
679
+ (
680
+ source "$REPO_ROOT/scripts/lib/telegram.sh"
681
+ _load_telegram_env "$WORK/test.env"
682
+ assert_eq "Explicit arg loads token" "test-token-123" "$TELEGRAM_BOT_TOKEN"
683
+ )
684
+
685
+ # --- Test 3: Missing file returns error ---
686
+ (
687
+ source "$REPO_ROOT/scripts/lib/telegram.sh"
688
+ exit_code=0
689
+ _load_telegram_env "$WORK/nonexistent.env" 2>/dev/null || exit_code=$?
690
+ assert_eq "Missing env file returns 1" "1" "$exit_code"
691
+ )
692
+
693
+ report_results
694
+ ```
695
+
696
+ **Step 2: Make executable and verify it fails**
697
+
698
+ ```bash
699
+ chmod +x scripts/tests/test-telegram-env.sh
700
+ ```
701
+
702
+ Run: `cd ~/Documents/projects/autonomous-coding-toolkit && bash scripts/tests/test-telegram-env.sh 2>&1 | tail -3`
703
+ Expected: Test 1 FAILS (ACT_ENV_FILE not recognized yet)
704
+
705
+ **Step 3: Fix telegram.sh**
706
+
707
+ In `scripts/lib/telegram.sh`, change line 9 from:
708
+
709
+ ```bash
710
+ local env_file="${1:-$HOME/.env}"
711
+ ```
712
+
713
+ to:
714
+
715
+ ```bash
716
+ local env_file="${1:-${ACT_ENV_FILE:-$HOME/.env}}"
717
+ ```
718
+
719
+ This adds `ACT_ENV_FILE` as an intermediate default — if set, it overrides `$HOME/.env`; if not, behavior is unchanged.
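The precedence of the nested expansion can be checked in isolation. The sketch below wraps the same expression in a hypothetical helper, `resolve_env_file` (not part of `telegram.sh`), so the three cases are easy to try: explicit argument first, then `ACT_ENV_FILE`, then `$HOME/.env`.

```shell
#!/usr/bin/env bash
# resolve_env_file is a hypothetical helper used only to demonstrate the
# fallback chain; telegram.sh itself uses the expansion inline.
resolve_env_file() {
    # 1. explicit argument, 2. ACT_ENV_FILE, 3. $HOME/.env
    local env_file="${1:-${ACT_ENV_FILE:-$HOME/.env}}"
    printf '%s\n' "$env_file"
}
```

Because `:-` treats an empty value the same as an unset one, exporting `ACT_ENV_FILE=""` falls through to `$HOME/.env` rather than selecting an empty path.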
720
+
721
+ **Step 4: Run the tests**
722
+
723
+ Run: `bash scripts/tests/test-telegram-env.sh`
724
+ Expected: All tests PASS
725
+
726
+ **Step 5: Commit**
727
+
728
+ ```bash
729
+ git add scripts/lib/telegram.sh scripts/tests/test-telegram-env.sh
730
+ git commit -m "fix: support ACT_ENV_FILE in telegram.sh for portable installs"
731
+ ```
732
+
733
+ ### Task 7: Add ACT_ENV_FILE support to ollama.sh
734
+
735
+ **Files:**
736
+ - Modify: `scripts/lib/ollama.sh` (add env file sourcing)
737
+
738
+ **Step 1: Verify current behavior**
739
+
740
+ The ollama.sh module already uses env vars (`OLLAMA_DIRECT_URL`, `OLLAMA_QUEUE_URL`) with defaults. No hardcoded path to fix — the credentials (if any) come from the calling script's environment.
741
+
742
+ If `ACT_ENV_FILE` is set, the calling script (e.g., `auto-compound.sh`) should source it. This is not an ollama.sh change — it's a convention.
743
+
744
+ **Step 2: Verify no change needed**
745
+
746
+ Run: `grep -n 'HOME\|\.env' ~/Documents/projects/autonomous-coding-toolkit/scripts/lib/ollama.sh`
747
+ Expected: No matches (ollama.sh has no hardcoded paths)
748
+
749
+ **Step 3: Skip — no change needed**
750
+
751
+ ollama.sh is already portable. Document the `ACT_ENV_FILE` convention in init.sh output instead.
752
+
753
+ ### Task 8: Add project-local lessons fallback to lesson-check.sh
754
+
755
+ **Files:**
756
+ - Modify: `scripts/lesson-check.sh:8`
757
+ - Create: `scripts/tests/test-lesson-local.sh`
758
+
759
+ **Step 1: Write the failing test**
760
+
761
+ ```bash
762
+ #!/usr/bin/env bash
763
+ # Test lesson-check.sh — project-local lesson loading (Tier 3)
764
+ set -euo pipefail
765
+
766
+ SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
767
+ REPO_ROOT="$(cd "$SCRIPT_DIR/../.." && pwd)"
768
+ LESSON_CHECK="$REPO_ROOT/scripts/lesson-check.sh"
769
+
770
+ source "$SCRIPT_DIR/test-helpers.sh"
771
+
772
+ # --- Setup: project with local lessons ---
773
+ WORK=$(mktemp -d)
774
+ trap 'rm -rf "$WORK"' EXIT
775
+
776
+ # Create a project-local lesson
777
+ mkdir -p "$WORK/docs/lessons"
778
+ cat > "$WORK/docs/lessons/0001-local-test.md" <<'LESSON'
779
+ ---
780
+ id: "0001"
781
+ title: "Test local lesson"
782
+ severity: error
783
+ languages: [python]
784
+ scope: [universal]
785
+ category: testing
786
+ pattern:
787
+ type: syntactic
788
+ regex: "LOCALTEST_BAD_PATTERN"
789
+ fix: "Use LOCALTEST_GOOD_PATTERN instead"
790
+ positive_alternative: "LOCALTEST_GOOD_PATTERN"
791
+ ---
792
+ LESSON
793
+
794
+ # Create a file that triggers the local lesson
795
+ cat > "$WORK/bad.py" <<'PY'
796
+ x = LOCALTEST_BAD_PATTERN
797
+ PY
798
+
799
+ # --- Test: project-local lesson is loaded ---
800
+ output=$(PROJECT_ROOT="$WORK" PROJECT_CLAUDE_MD="/dev/null" bash "$LESSON_CHECK" "$WORK/bad.py" 2>&1 || true)
801
+ if echo "$output" | grep -q 'lesson-1'; then
802
+ pass "Project-local lesson detected violation"
803
+ else
804
+ fail "Project-local lesson not loaded, got: $output"
805
+ fi
806
+
807
+ # --- Test: clean file passes with local lessons ---
808
+ cat > "$WORK/good.py" <<'PY'
809
+ x = LOCALTEST_GOOD_PATTERN
810
+ PY
811
+
812
+ exit_code=0
813
+ PROJECT_ROOT="$WORK" PROJECT_CLAUDE_MD="/dev/null" bash "$LESSON_CHECK" "$WORK/good.py" 2>/dev/null || exit_code=$?
814
+ assert_eq "Clean file passes with local lessons" "0" "$exit_code"
815
+
816
+ report_results
817
+ ```
818
+
819
+ **Step 2: Make executable and verify it fails**
820
+
821
+ ```bash
822
+ chmod +x scripts/tests/test-lesson-local.sh
823
+ ```
824
+
825
+ Run: `cd ~/Documents/projects/autonomous-coding-toolkit && bash scripts/tests/test-lesson-local.sh 2>&1 | tail -3`
826
+ Expected: FAIL (project-local lessons not loaded yet)
827
+
828
+ **Step 3: Add project-local lesson loading**
829
+
830
+ In `scripts/lesson-check.sh`, after line 8 (`LESSONS_DIR=...`), add:
831
+
832
+ ```bash
833
+ # Project-local lessons (Tier 3) — loaded alongside bundled lessons.
834
+ # Set PROJECT_ROOT to the project being checked for project-specific anti-patterns.
835
+ PROJECT_LESSONS_DIR=""
836
+ if [[ -n "${PROJECT_ROOT:-}" && -d "${PROJECT_ROOT}/docs/lessons" ]]; then
837
+ PROJECT_LESSONS_DIR="${PROJECT_ROOT}/docs/lessons"
838
+ fi
839
+ ```
840
+
841
+ Then find the glob loop that loads lesson files (the line that iterates over `"$LESSONS_DIR"/[0-9]*.md`). After that loop completes, add a second loop for project-local lessons:
842
+
843
+ ```bash
844
+ # Load project-local lessons (Tier 3)
845
+ if [[ -n "$PROJECT_LESSONS_DIR" ]]; then
846
+ for lesson_file in "$PROJECT_LESSONS_DIR"/[0-9]*.md; do
847
+ [[ -f "$lesson_file" ]] || continue
848
+ # Same parse_lesson + check logic as bundled lessons
849
+ # (reuse the same function — it's already defined)
850
+ done
851
+ fi
852
+ ```
853
+
854
+ The exact insertion point depends on the lesson-check.sh structure. The implementer should read the full file to find where lessons are iterated and add the project-local loop after.
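One way to avoid duplicating the loop body is to gather lesson files from both directories first and run the existing parse-and-check logic over a single list. The sketch below assumes nothing about `lesson-check.sh` beyond the `[0-9]*.md` glob quoted above; `collect_lesson_files` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# collect_lesson_files is a hypothetical helper: it prints every lesson file
# (NNNN-*.md) found in the given directories, skipping empty or missing ones.
collect_lesson_files() {
    local dir file
    for dir in "$@"; do
        [[ -n "$dir" && -d "$dir" ]] || continue
        for file in "$dir"/[0-9]*.md; do
            # An unmatched glob stays literal, so confirm the file exists.
            [[ -f "$file" ]] || continue
            printf '%s\n' "$file"
        done
    done
}

# Possible use inside lesson-check.sh (bundled lessons first, then Tier 3):
#   while IFS= read -r lesson_file; do
#       ...existing parse + check logic...
#   done < <(collect_lesson_files "$LESSONS_DIR" "$PROJECT_LESSONS_DIR")
```

Passing an empty `PROJECT_LESSONS_DIR` is harmless here, which keeps the standalone behavior unchanged when `PROJECT_ROOT` is not set.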
855
+
856
+ **Step 4: Run the tests**
857
+
858
+ Run: `bash scripts/tests/test-lesson-local.sh`
859
+ Expected: All tests PASS
860
+
861
+ Run: `bash scripts/tests/test-lesson-check.sh`
862
+ Expected: All existing tests still PASS (no regression)
863
+
864
+ **Step 5: Commit**
865
+
866
+ ```bash
867
+ git add scripts/lesson-check.sh scripts/tests/test-lesson-local.sh
868
+ git commit -m "feat: support project-local lessons (Tier 3) in lesson-check.sh"
869
+ ```
870
+
871
+ ---
872
+
873
+ ## Batch 4: README + npm Prep
874
+
875
+ ### Task 9: Update README.md with npm install instructions
876
+
877
+ **Files:**
878
+ - Modify: `README.md`
879
+
880
+ **Step 1: Update installation section**
881
+
882
+ Replace the current Install section with:
883
+
884
+ ````markdown
885
+ ## Install
886
+
887
+ ### npm (recommended)
888
+
889
+ ```bash
890
+ npm install -g autonomous-coding-toolkit
891
+ ```
892
+
893
+ This puts `act` on your PATH. Requires Node.js 18+ and bash 4+.
894
+
895
+ ### Claude Code Plugin
896
+
897
+ ```bash
898
+ # Add the marketplace source
899
+ /plugin marketplace add parthalon025/autonomous-coding-toolkit
900
+
901
+ # Install the plugin
902
+ /plugin install autonomous-coding-toolkit@autonomous-coding-toolkit
903
+ ```
904
+
905
+ ### From Source
906
+
907
+ ```bash
908
+ git clone https://github.com/parthalon025/autonomous-coding-toolkit.git
909
+ cd autonomous-coding-toolkit
910
+ npm link # puts 'act' on PATH
911
+ ```
912
+
913
+ > **Windows:** Requires [WSL](https://learn.microsoft.com/en-us/windows/wsl/install). Run `wsl --install`, then use the toolkit inside WSL.
914
+ ````
915
+
916
+ **Step 2: Add Quick Start section for CLI**
917
+
918
+ Update the Quick Start section to include CLI commands alongside plugin commands:
919
+
920
+ ````markdown
921
+ ## Quick Start
922
+
923
+ ```bash
924
+ # Bootstrap your project
925
+ act init --quickstart
926
+
927
+ # Full pipeline: brainstorm → plan → execute → verify → finish
928
+ /autocode "Add user authentication with JWT"
929
+
930
+ # Run a plan headless (fully autonomous, fresh context per batch)
931
+ act plan docs/plans/my-feature.md --on-failure retry --notify
932
+
933
+ # Quality check
934
+ act gate --project-root .
935
+
936
+ # See all commands
937
+ act help
938
+ ```
939
+ ````
940
+
941
+ **Step 3: Verify README renders correctly**
942
+
943
+ Run: `head -60 ~/Documents/projects/autonomous-coding-toolkit/README.md`
944
+ Expected: Updated installation and quick start sections visible
945
+
946
+ **Step 4: Commit**
947
+
948
+ ```bash
949
+ git add README.md
950
+ git commit -m "docs: update README with npm install and CLI usage"
951
+ ```
952
+
953
+ ### Task 10: Add .npmignore
954
+
955
+ **Files:**
956
+ - Create: `.npmignore`
957
+
958
+ **Step 1: Create .npmignore**
959
+
960
+ ```
961
+ # Development files
962
+ .worktrees/
963
+ .run-plan-state.json
964
+ progress.txt
965
+ logs/
966
+ tasks/
967
+ .claude/
968
+ .github/
969
+ research/
970
+
971
+ # Test fixtures (tests themselves ship for validation)
972
+ scripts/tests/fixtures/
973
+
974
+ # Git
975
+ .git/
976
+ .gitignore
977
+ ```
978
+
979
+ **Step 2: Verify npm pack excludes dev files**
980
+
981
+ Run: `cd ~/Documents/projects/autonomous-coding-toolkit && npm pack --dry-run 2>&1 | grep -c 'run-plan-state\|\.worktrees\|research/'`
982
+ Expected: `0` (none of those files included)
983
+
984
+ **Step 3: Commit**
985
+
986
+ ```bash
987
+ git add .npmignore
988
+ git commit -m "chore: add .npmignore for clean npm packaging"
989
+ ```
990
+
991
+ ### Task 11: Verify full test suite passes
992
+
993
+ **Step 1: Run all tests**
994
+
995
+ Run: `cd ~/Documents/projects/autonomous-coding-toolkit && bash scripts/tests/run-all-tests.sh`
996
+ Expected: All tests PASS, including the new test files (`test-act-cli.sh`, `test-init.sh`, `test-telegram-env.sh`, `test-lesson-local.sh`)
997
+
998
+ **Step 2: Run quality gate on self**
999
+
1000
+ Run: `bash scripts/quality-gate.sh --project-root ~/Documents/projects/autonomous-coding-toolkit`
1001
+ Expected: ALL PASSED
1002
+
1003
+ ---
1004
+
1005
+ ## Batch 5: Telemetry Script (P1)
1006
+
1007
+ ### Task 12: Write tests for telemetry.sh
1008
+
1009
+ **Files:**
1010
+ - Create: `scripts/tests/test-telemetry.sh`
1011
+
1012
+ **Step 1: Write the tests**
1013
+
1014
+ ```bash
1015
+ #!/usr/bin/env bash
1016
+ # Test scripts/telemetry.sh — telemetry capture, show, export, reset
1017
+ set -euo pipefail
1018
+
1019
+ SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
1020
+ REPO_ROOT="$(cd "$SCRIPT_DIR/../.." && pwd)"
1021
+ TELEMETRY="$REPO_ROOT/scripts/telemetry.sh"
1022
+
1023
+ source "$SCRIPT_DIR/test-helpers.sh"
1024
+
1025
+ # --- Setup ---
1026
+ WORK=$(mktemp -d)
1027
+ trap 'rm -rf "$WORK"' EXIT
1028
+ mkdir -p "$WORK/logs"
1029
+
1030
+ # --- Test 1: record writes to telemetry.jsonl ---
1031
+ bash "$TELEMETRY" record --project-root "$WORK" \
1032
+ --batch-number 1 --passed true --strategy superpowers \
1033
+ --duration 120 --cost 0.42 --test-delta 5 2>&1 || true
1034
+ assert_eq "record creates telemetry.jsonl" "true" \
1035
+ "$([ -f "$WORK/logs/telemetry.jsonl" ] && echo true || echo false)"
1036
+
1037
+ # --- Test 2: record appends valid JSON ---
1038
+ line=$(head -1 "$WORK/logs/telemetry.jsonl")
1039
+ json_status=0; echo "$line" | jq . >/dev/null 2>&1 || json_status=$?
1039
+ assert_eq "record writes valid JSON" "0" "$json_status"
1041
+
1042
+ # --- Test 3: show produces dashboard output ---
1043
+ output=$(bash "$TELEMETRY" show --project-root "$WORK" 2>&1 || true)
1044
+ assert_contains "show displays header" "Telemetry Dashboard" "$output"
1045
+
1046
+ # --- Test 4: export produces anonymized output ---
1047
+ bash "$TELEMETRY" export --project-root "$WORK" > "$WORK/export.json" 2>&1 || true
1048
+ assert_eq "export creates output" "true" "$([ -s "$WORK/export.json" ] && echo true || echo false)"
1049
+
1050
+ # --- Test 5: reset clears telemetry ---
1051
+ bash "$TELEMETRY" reset --project-root "$WORK" --yes 2>&1 || true
1052
+ if [[ -f "$WORK/logs/telemetry.jsonl" ]]; then
1053
+ line_count=$(wc -l < "$WORK/logs/telemetry.jsonl")
1054
+ assert_eq "reset clears telemetry" "0" "$line_count"
1055
+ else
1056
+ pass "reset removes telemetry file"
1057
+ fi
1058
+
1059
+ report_results
1060
+ ```
1061
+
1062
+ **Step 2: Make executable and verify it fails**
1063
+
1064
+ ```bash
1065
+ chmod +x scripts/tests/test-telemetry.sh
1066
+ ```
1067
+
1068
+ Run: `cd ~/Documents/projects/autonomous-coding-toolkit && bash scripts/tests/test-telemetry.sh 2>&1 | tail -3`
1069
+ Expected: FAIL (telemetry.sh doesn't exist yet)
1070
+
1071
+ ### Task 13: Implement telemetry.sh
1072
+
1073
+ **Files:**
1074
+ - Create: `scripts/telemetry.sh`
1075
+
1076
+ **Step 1: Write the implementation**
1077
+
1078
+ ```bash
1079
+ #!/usr/bin/env bash
1080
+ # telemetry.sh — Local telemetry capture, dashboard, export, and import
1081
+ #
1082
+ # Usage:
1083
+ # telemetry.sh record --project-root <dir> [--batch-number N] [--passed true|false] ...
1084
+ # telemetry.sh show --project-root <dir>
1085
+ # telemetry.sh export --project-root <dir>
1086
+ # telemetry.sh import --project-root <dir> <file>
1087
+ # telemetry.sh reset --project-root <dir> --yes
1088
+ set -euo pipefail
1089
+
1090
+ SCRIPT_DIR="$(cd "$(dirname "$(readlink -f "${BASH_SOURCE[0]}")")" && pwd)"
1091
+ source "$SCRIPT_DIR/lib/common.sh"
1092
+
1093
+ PROJECT_ROOT=""
1094
+ SUBCOMMAND=""
1095
+
1096
+ # --- Parse top-level ---
1097
+ SUBCOMMAND="${1:-}"
1098
+ shift || true
1099
+
1100
+ # Parse remaining args
1101
+ BATCH_NUMBER=""
1102
+ PASSED=""
1103
+ STRATEGY=""
1104
+ DURATION=""
1105
+ COST=""
1106
+ TEST_DELTA=""
1107
+ LESSONS_TRIGGERED=""
1108
+ PLAN_QUALITY=""
1109
+ BATCH_TYPE=""
1110
+ CONFIRM_YES=false
1111
+
1112
+ while [[ $# -gt 0 ]]; do
1113
+ case "$1" in
1114
+ --project-root) PROJECT_ROOT="${2:-}"; shift 2 ;;
1115
+ --batch-number) BATCH_NUMBER="${2:-}"; shift 2 ;;
1116
+ --passed) PASSED="${2:-}"; shift 2 ;;
1117
+ --strategy) STRATEGY="${2:-}"; shift 2 ;;
1118
+ --duration) DURATION="${2:-}"; shift 2 ;;
1119
+ --cost) COST="${2:-}"; shift 2 ;;
1120
+ --test-delta) TEST_DELTA="${2:-}"; shift 2 ;;
1121
+ --lessons-triggered) LESSONS_TRIGGERED="${2:-}"; shift 2 ;;
1122
+ --plan-quality) PLAN_QUALITY="${2:-}"; shift 2 ;;
1123
+ --batch-type) BATCH_TYPE="${2:-}"; shift 2 ;;
1124
+ --yes) CONFIRM_YES=true; shift ;;
1125
+ --help|-h) echo "Usage: telemetry.sh <record|show|export|import|reset> --project-root <dir> [options]"; exit 0 ;;
1126
+ *)
1127
+ # Positional arg (for import file)
1128
+ if [[ -z "${IMPORT_FILE:-}" ]]; then
1129
+ IMPORT_FILE="$1"
1130
+ fi
1131
+ shift ;;
1132
+ esac
1133
+ done
1134
+
1135
+ if [[ -z "$PROJECT_ROOT" ]]; then
1136
+ echo "telemetry: --project-root is required" >&2
1137
+ exit 1
1138
+ fi
1139
+
1140
+ TELEMETRY_FILE="$PROJECT_ROOT/logs/telemetry.jsonl"
1141
+
1142
+ case "$SUBCOMMAND" in
1143
+ record)
1144
+ mkdir -p "$PROJECT_ROOT/logs"
1145
+ jq -cn \
1146
+ --arg ts "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
1147
+ --arg bn "${BATCH_NUMBER:-0}" \
1148
+ --arg passed "${PASSED:-false}" \
1149
+ --arg strategy "${STRATEGY:-unknown}" \
1150
+ --arg duration "${DURATION:-0}" \
1151
+ --arg cost "${COST:-0}" \
1152
+ --arg td "${TEST_DELTA:-0}" \
1153
+ --arg lt "${LESSONS_TRIGGERED:-}" \
1154
+ --arg pq "${PLAN_QUALITY:-}" \
1155
+ --arg bt "${BATCH_TYPE:-unknown}" \
1156
+ --arg pt "$(detect_project_type "$PROJECT_ROOT")" \
1157
+ '{
1158
+ timestamp: $ts,
1159
+ project_type: $pt,
1160
+ batch_type: $bt,
1161
+ batch_number: ($bn | tonumber),
1162
+ passed_gate: ($passed == "true"),
1163
+ strategy: $strategy,
1164
+ duration_seconds: ($duration | tonumber),
1165
+ cost_usd: ($cost | tonumber),
1166
+ test_count_delta: ($td | tonumber),
1167
+ lessons_triggered: (if $lt == "" then [] else ($lt | split(",")) end),
1168
+ plan_quality_score: (if $pq == "" then null else ($pq | tonumber) end)
1169
+ }' >> "$TELEMETRY_FILE"
1170
+ echo "telemetry: recorded batch ${BATCH_NUMBER:-0}"
1171
+ ;;
1172
+
1173
+ show)
1174
+ echo "Autonomous Coding Toolkit — Telemetry Dashboard"
1175
+ echo "════════════════════════════════════════════════"
1176
+ echo ""
1177
+
1178
+ if [[ ! -f "$TELEMETRY_FILE" ]] || [[ ! -s "$TELEMETRY_FILE" ]]; then
1179
+ echo "No telemetry data yet. Run some batches first."
1180
+ exit 0
1181
+ fi
1182
+
1183
+ # Summary stats
1184
+ total=$(wc -l < "$TELEMETRY_FILE")
1185
+ passed=$(jq -s '[.[] | select(.passed_gate == true)] | length' "$TELEMETRY_FILE")
1186
+ total_cost=$(jq -s '[.[].cost_usd] | add // 0' "$TELEMETRY_FILE")
1187
+ total_duration=$(jq -s '[.[].duration_seconds] | add // 0' "$TELEMETRY_FILE")
1188
+ avg_cost=$(jq -s 'if length > 0 then ([.[].cost_usd] | add) / length else 0 end' "$TELEMETRY_FILE")
1189
+
1190
+ echo "Runs: $total batches"
1191
+ if [[ "$total" -gt 0 ]]; then
1192
+ pct=$((passed * 100 / total))
1193
+ echo "Success rate: ${pct}% ($passed/$total passed gate on first attempt)"
1194
+ fi
1195
+ printf "Total cost: \$%.2f (\$%.2f/batch average)\n" "$total_cost" "$avg_cost"
1196
+ hours=$(awk "BEGIN {printf \"%.1f\", $total_duration / 3600}")
1197
+ echo "Total time: ${hours} hours"
1198
+
1199
+ # Strategy performance
1200
+ echo ""
1201
+ echo "Strategy Performance:"
1202
+ jq -rs '
1203
+ group_by(.strategy) | .[] |
1204
+ {
1205
+ strategy: .[0].strategy,
1206
+ wins: [.[] | select(.passed_gate == true)] | length,
1207
+ total: length
1208
+ } |
1209
+ " \(.strategy): \(.wins)/\(.total) (\(if .total > 0 then (.wins * 100 / .total | floor) else 0 end)% win rate)"
1210
+ ' "$TELEMETRY_FILE" 2>/dev/null || echo " (no strategy data)"
1211
+
1212
+ # Top lesson hits
1213
+ echo ""
1214
+ echo "Top Lesson Hits:"
1215
+ jq -rs '
1216
+ [.[].lessons_triggered | arrays | .[]] |
1217
+ group_by(.) | map({lesson: .[0], count: length}) |
1218
+ sort_by(-.count) | .[:5] |
1219
+ .[] | " \(.lesson): \(.count) hits"
1220
+ ' "$TELEMETRY_FILE" 2>/dev/null || echo " (no lesson data)"
1221
+ ;;
1222
+
1223
+ export)
1224
+ if [[ ! -f "$TELEMETRY_FILE" ]]; then
1225
+ echo "No telemetry data to export." >&2
1226
+ exit 1
1227
+ fi
1228
+ # Anonymize: drop the timestamp and batch_number fields; records contain no file paths
1229
+ jq -s '
1230
+ [.[] | {
1231
+ project_type,
1232
+ batch_type,
1233
+ passed_gate,
1234
+ strategy,
1235
+ duration_seconds,
1236
+ cost_usd,
1237
+ test_count_delta,
1238
+ lessons_triggered,
1239
+ plan_quality_score
1240
+ }]
1241
+ ' "$TELEMETRY_FILE"
1242
+ ;;
1243
+
1244
+ import)
1245
+ if [[ -z "${IMPORT_FILE:-}" || ! -f "${IMPORT_FILE:-}" ]]; then
1246
+ echo "telemetry: import requires a file argument" >&2
1247
+ exit 1
1248
+ fi
1249
+ echo "telemetry: import not yet implemented (planned for community sync)"
1250
+ ;;
1251
+
1252
+ reset)
1253
+ if [[ "$CONFIRM_YES" != true ]]; then
1254
+ echo "telemetry: use --yes to confirm reset" >&2
1255
+ exit 1
1256
+ fi
1257
+ if [[ -f "$TELEMETRY_FILE" ]]; then
1258
+ > "$TELEMETRY_FILE"
1259
+ echo "telemetry: cleared $TELEMETRY_FILE"
1260
+ else
1261
+ echo "telemetry: no telemetry file to reset"
1262
+ fi
1263
+ ;;
1264
+
1265
+ *)
1266
+ echo "Usage: telemetry.sh <record|show|export|import|reset> --project-root <dir>" >&2
1267
+ exit 1
1268
+ ;;
1269
+ esac
1270
+ ```
1271
+
1272
+ **Step 2: Make executable**
1273
+
1274
+ ```bash
1275
+ chmod +x scripts/telemetry.sh
1276
+ ```
1277
+
1278
+ **Step 3: Run the tests**
1279
+
1280
+ Run: `cd ~/Documents/projects/autonomous-coding-toolkit && bash scripts/tests/test-telemetry.sh`
1281
+ Expected: All tests PASS
1282
+
1283
+ **Step 4: Commit**
1284
+
1285
+ ```bash
1286
+ git add scripts/telemetry.sh scripts/tests/test-telemetry.sh
1287
+ git commit -m "feat: add telemetry.sh — capture, dashboard, export, reset"
1288
+ ```
1289
+
1290
+ ---
1291
+
1292
+ ## Batch 6: Telemetry Integration in Quality Gate
1293
+
1294
+ ### Task 14: Add telemetry capture to quality-gate.sh
1295
+
1296
+ **Files:**
1297
+ - Modify: `scripts/quality-gate.sh:248` (after "ALL PASSED")
1298
+
1299
+ **Step 1: Add telemetry capture after the final echo**
1300
+
1301
+ Before the `exit 0` at the end of quality-gate.sh, add telemetry recording:
1302
+
1303
+ ```bash
1304
+ # === Telemetry capture (append batch result) ===
1305
+ # Only record if TELEMETRY_BATCH_NUMBER is set (called from run-plan context)
1306
+ if [[ -n "${TELEMETRY_BATCH_NUMBER:-}" ]]; then
1307
+ "$SCRIPT_DIR/telemetry.sh" record \
1308
+ --project-root "$PROJECT_ROOT" \
1309
+ --batch-number "${TELEMETRY_BATCH_NUMBER}" \
1310
+ --passed true \
1311
+ --strategy "${TELEMETRY_STRATEGY:-unknown}" \
1312
+ --duration "${TELEMETRY_DURATION:-0}" \
1313
+ --cost "${TELEMETRY_COST:-0}" \
1314
+ --test-delta "${TELEMETRY_TEST_DELTA:-0}" \
1315
+ --batch-type "${TELEMETRY_BATCH_TYPE:-unknown}" \
1316
+ 2>/dev/null || true # Never fail the gate for telemetry errors
1317
+ fi
1318
+ ```
1319
+
1320
+ This is conditional — telemetry only records when the env vars are set by the calling script (run-plan.sh). Quality gate still works exactly as before when called standalone.
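On the calling side, the `TELEMETRY_*` variables only need to be present in the environment of the quality-gate process. The wrapper below is a hypothetical sketch of how a caller could pass them for a single command; `with_telemetry_env` and its argument order are assumptions, and `run-plan.sh` may wire this up differently.

```shell
#!/usr/bin/env bash
# with_telemetry_env is a hypothetical wrapper, not part of run-plan.sh.
# Usage: with_telemetry_env BATCH_NUMBER STRATEGY STARTED_AT_EPOCH command [args...]
with_telemetry_env() {
    local batch_number="$1" strategy="$2" started_at="$3"
    shift 3
    local now
    now=$(date +%s)
    # Prefix assignments apply only to the command being run,
    # so nothing leaks into the caller's environment.
    TELEMETRY_BATCH_NUMBER="$batch_number" \
    TELEMETRY_STRATEGY="$strategy" \
    TELEMETRY_DURATION="$(( now - started_at ))" \
        "$@"
}
```

A caller would record `date +%s` when the batch starts and later run something like `with_telemetry_env 1 superpowers "$started_at" bash scripts/quality-gate.sh --project-root "$PROJECT_ROOT"`.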
1321
+
1322
+ **Step 2: Verify quality gate still passes standalone**
1323
+
1324
+ Run: `bash scripts/quality-gate.sh --project-root ~/Documents/projects/autonomous-coding-toolkit --quick`
1325
+ Expected: ALL PASSED (no telemetry vars set, so telemetry capture is silently skipped)
1326
+
1327
+ **Step 3: Verify telemetry records when vars are set**
1328
+
1329
+ ```bash
1330
+ WORK=$(mktemp -d)
1331
+ mkdir -p "$WORK/logs"
1332
+ git -C "$WORK" init -q
1333
+ TELEMETRY_BATCH_NUMBER=1 TELEMETRY_STRATEGY=test \
1334
+ bash scripts/quality-gate.sh --project-root "$WORK" --quick 2>&1 | tail -3
1335
+ cat "$WORK/logs/telemetry.jsonl" 2>/dev/null || echo "(no telemetry)"
1336
+ rm -rf "$WORK"
1337
+ ```
1338
+
1339
+ Expected: quality gate passes and telemetry.jsonl has one line
1340
+
1341
+ **Step 4: Commit**
1342
+
1343
+ ```bash
1344
+ git add scripts/quality-gate.sh
1345
+ git commit -m "feat: integrate telemetry capture into quality gate pipeline"
1346
+ ```
1347
+
1348
+ ### Task 15: Run full test suite
1349
+
1350
+ **Step 1: Run all tests**
1351
+
1352
+ Run: `cd ~/Documents/projects/autonomous-coding-toolkit && bash scripts/tests/run-all-tests.sh`
1353
+ Expected: All tests PASS
1354
+
1355
+ **Step 2: Run quality gate**
1356
+
1357
+ Run: `bash scripts/quality-gate.sh --project-root ~/Documents/projects/autonomous-coding-toolkit`
1358
+ Expected: ALL PASSED
1359
+
1360
+ ---
1361
+
1362
+ ## Batch 7: Benchmark Suite (P1)
1363
+
1364
+ ### Task 16: Create benchmark directory structure
1365
+
1366
+ **Files:**
1367
+ - Create: `benchmarks/runner.sh`
1368
+ - Create: `benchmarks/tasks/01-rest-endpoint/task.md`
1369
+ - Create: `benchmarks/tasks/01-rest-endpoint/rubric.sh`
1370
+
1371
+ **Step 1: Create directories**
1372
+
1373
+ ```bash
1374
+ mkdir -p benchmarks/tasks/01-rest-endpoint
1375
+ mkdir -p benchmarks/tasks/02-refactor-module
1376
+ mkdir -p benchmarks/tasks/03-fix-integration-bug
1377
+ mkdir -p benchmarks/tasks/04-add-test-coverage
1378
+ mkdir -p benchmarks/tasks/05-multi-file-feature
1379
+ mkdir -p benchmarks/rubrics
1380
+ ```
1381
+
1382
+ **Step 2: Write benchmark runner**
1383
+
1384
+ ```bash
1385
+ #!/usr/bin/env bash
1386
+ # runner.sh — Benchmark orchestrator for the Autonomous Coding Toolkit
1387
+ #
1388
+ # Usage:
1389
+ # runner.sh run [task-name] Run all or one benchmark
1390
+ # runner.sh compare <a> <b> Compare two result files
1391
+ # runner.sh list List available benchmarks
1392
+ set -euo pipefail
1393
+
1394
+ SCRIPT_DIR="$(cd "$(dirname "$(readlink -f "${BASH_SOURCE[0]}")")" && pwd)"
1395
+ TASKS_DIR="$SCRIPT_DIR/tasks"
1396
+ RESULTS_DIR="${BENCHMARK_RESULTS_DIR:-$SCRIPT_DIR/results}"
1397
+
1398
+ usage() {
1399
+ cat <<'USAGE'
1400
+ Usage: runner.sh <run|compare|list> [options]
1401
+
1402
+ Commands:
1403
+ run [name] Run all benchmarks, or a specific one by directory name
1404
+ compare <a> <b> Compare two result JSON files
1405
+ list List available benchmark tasks
1406
+
1407
+ Options:
1408
+ --help, -h Show this help
1409
+
1410
+ Results are saved to benchmarks/results/ (gitignored).
1411
+ USAGE
1412
+ exit 0
1413
+ }
1414
+
1415
+ SUBCOMMAND="${1:-}"
1416
+ shift || true
1417
+
1418
+ case "$SUBCOMMAND" in
1419
+ list)
1420
+ echo "Available benchmarks:"
1421
+ for task_dir in "$TASKS_DIR"/*/; do
1422
+ [[ -d "$task_dir" ]] || continue
1423
+ name=$(basename "$task_dir")
1424
+ desc=""
1425
+ if [[ -f "$task_dir/task.md" ]]; then
1426
+ desc=$(head -1 "$task_dir/task.md" | sed 's/^# //')
1427
+ fi
1428
+ echo " $name — $desc"
1429
+ done
1430
+ ;;
1431
+
1432
+ run)
1433
+ TARGET="${1:-all}"
1434
+ mkdir -p "$RESULTS_DIR"
1435
+ timestamp=$(date -u +%Y%m%dT%H%M%SZ)
1436
+
1437
+ run_benchmark() {
1438
+ local task_dir="$1"
1439
+ local name; name=$(basename "$task_dir")
1440
+ echo "=== Benchmark: $name ==="
1441
+
1442
+ if [[ ! -f "$task_dir/rubric.sh" ]]; then
1443
+ echo " SKIP: no rubric.sh found"
1444
+ return
1445
+ fi
1446
+
1447
+ local score=0
1448
+ local total=0
1449
+ local pass=0
1450
+
1451
+ # Run rubric: every output line (stderr included) counts toward the total, so rubrics should print only "PASS: desc" or "FAIL: desc" lines
1452
+ while IFS= read -r line; do
1453
+ total=$((total + 1))
1454
+ if [[ "$line" == PASS:* ]]; then
1455
+ pass=$((pass + 1))
1456
+ fi
1457
+ echo " $line"
1458
+ done < <(bash "$task_dir/rubric.sh" 2>&1 || true)
1459
+
1460
+ if [[ $total -gt 0 ]]; then
1461
+ score=$((pass * 100 / total))
1462
+ fi
1463
+ echo " Score: ${score}% ($pass/$total)"
1464
+ echo ""
1465
+
1466
+ # Write result
1467
+ jq -n --arg name "$name" --argjson score "$score" \
1468
+ --argjson pass "$pass" --argjson total "$total" \
1469
+ --arg ts "$timestamp" \
1470
+ '{name: $name, score: $score, passed: $pass, total: $total, timestamp: $ts}' \
1471
+ >> "$RESULTS_DIR/$timestamp.jsonl"
1472
+ }
1473
+
1474
+ if [[ "$TARGET" == "all" ]]; then
1475
+ for task_dir in "$TASKS_DIR"/*/; do
1476
+ [[ -d "$task_dir" ]] || continue
1477
+ run_benchmark "$task_dir"
1478
+ done
1479
+ else
1480
+ if [[ -d "$TASKS_DIR/$TARGET" ]]; then
1481
+ run_benchmark "$TASKS_DIR/$TARGET"
1482
+ else
1483
+ echo "Benchmark not found: $TARGET" >&2
1484
+ echo "Run 'runner.sh list' to see available benchmarks." >&2
1485
+ exit 1
1486
+ fi
1487
+ fi
1488
+
1489
+ echo "Results saved to: $RESULTS_DIR/$timestamp.jsonl"
1490
+ ;;
1491
+
1492
+ compare)
1493
+ FILE_A="${1:-}"
1494
+ FILE_B="${2:-}"
1495
+ if [[ -z "$FILE_A" || -z "$FILE_B" ]]; then
1496
+ echo "Usage: runner.sh compare <result-a.jsonl> <result-b.jsonl>" >&2
1497
+ exit 1
1498
+ fi
1499
+ if [[ ! -f "$FILE_A" || ! -f "$FILE_B" ]]; then
1500
+ echo "One or both files not found." >&2
1501
+ exit 1
1502
+ fi
1503
+
1504
+ echo "Benchmark Comparison"
1505
+ echo "═════════════════════════════════════"
1506
+ printf "%-25s %8s %8s %8s\n" "Task" "Before" "After" "Delta"
1507
+ echo "─────────────────────────────────────────────"
1508
+
1509
+ # Pair results by position (both files must list the same tasks in the same order) and compare
1510
+ jq -r -s '
1511
+ [.[0], .[1]] | transpose | .[] |
1512
+ select(.[0] != null and .[1] != null) |
1513
+ "\(.[0].name)|\(.[0].score)|\(.[1].score)|\(.[1].score - .[0].score)"
1514
+ ' <(jq -s '.' "$FILE_A") <(jq -s '.' "$FILE_B") 2>/dev/null | \
1515
+ while IFS='|' read -r name before after delta; do
1516
+ sign=""
1517
+ [[ "$delta" -gt 0 ]] && sign="+"
1518
+ printf "%-25s %7s%% %7s%% %7s%%\n" "$name" "$before" "$after" "${sign}${delta}"
1519
+ done
1520
+
1521
+ echo "═════════════════════════════════════"
1522
+ ;;
1523
+
1524
+ help|--help|-h|"")
1525
+ usage
1526
+ ;;
1527
+
1528
+ *)
1529
+ echo "Unknown command: $SUBCOMMAND" >&2
1530
+ usage
1531
+ ;;
1532
+ esac
1533
+ ```
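The scoring contract between `runner.sh` and a rubric can be sketched on its own. This is a minimal illustration, not the runner's code: `score_rubric_output` is a hypothetical helper, and unlike the runner it ignores lines that are neither `PASS:` nor `FAIL:` instead of counting them toward the total.

```bash
#!/usr/bin/env bash
set -eu

# score_rubric_output: read rubric lines on stdin, print "<score> <pass> <total>".
# Assumption: lines without a PASS:/FAIL: prefix are noise and are skipped.
score_rubric_output() {
    local pass=0 total=0 score=0 line
    while IFS= read -r line; do
        case "$line" in
            PASS:*) pass=$((pass + 1)); total=$((total + 1)) ;;
            FAIL:*) total=$((total + 1)) ;;
        esac
    done
    if [ "$total" -gt 0 ]; then
        score=$((pass * 100 / total))   # integer division, so 2 of 3 becomes 66
    fi
    echo "$score $pass $total"
}

printf 'PASS: a\nFAIL: b\nnpm WARN noise\nPASS: c\n' | score_rubric_output
```

With the sample input, two of three criteria pass, so the helper prints `66 2 3`.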
1534
+
1535
+ **Step 3: Make executable**
1536
+
1537
+ ```bash
1538
+ chmod +x benchmarks/runner.sh
1539
+ ```
1540
+
1541
+ **Step 4: Write first benchmark task definition**
1542
+
1543
+ Create `benchmarks/tasks/01-rest-endpoint/task.md`:
1544
+
1545
+ ```markdown
1546
+ # Add a REST Endpoint with Tests
1547
+
1548
+ **Complexity:** Simple (1 batch)
1549
+ **Measures:** Basic execution, TDD compliance
1550
+
1551
+ ## Task
1552
+
1553
+ Add a `/health` endpoint to the project that:
1554
+ 1. Returns HTTP 200 with JSON body `{"status": "ok", "timestamp": "<ISO8601>"}`
1555
+ 2. Has a test that verifies the response status and body structure
1556
+ 3. All tests pass
1557
+
1558
+ ## Constraints
1559
+
1560
+ - Use the project's existing web framework (or add minimal one if none exists)
1561
+ - Follow existing code style and patterns
1562
+ - Test must be automated (no manual verification)
1563
+ ```
1564
+
1565
+ Create `benchmarks/tasks/01-rest-endpoint/rubric.sh`:
1566
+
1567
+ ```bash
1568
+ #!/usr/bin/env bash
1569
+ # Rubric for 01-rest-endpoint benchmark
1570
+ # Checks for task completion criteria
1571
+ set -euo pipefail
1572
+
1573
+ PROJECT_ROOT="${BENCHMARK_PROJECT_ROOT:-.}"
1574
+
1575
+ # Criterion 1: Health endpoint file exists
1576
+ if compgen -G "$PROJECT_ROOT/src/*health*" >/dev/null 2>&1 || \
1577
+ compgen -G "$PROJECT_ROOT/app/*health*" >/dev/null 2>&1 || \
1578
+ grep -rl "health" "$PROJECT_ROOT/src/" "$PROJECT_ROOT/app/" 2>/dev/null | head -1 >/dev/null 2>&1; then
1579
+ echo "PASS: Health endpoint file exists"
1580
+ else
1581
+ echo "FAIL: Health endpoint file not found"
1582
+ fi
1583
+
1584
+ # Criterion 2: Test file exists
1585
+ if compgen -G "$PROJECT_ROOT/tests/*health*" >/dev/null 2>&1 || \
1586
+ compgen -G "$PROJECT_ROOT/test/*health*" >/dev/null 2>&1; then
1587
+ echo "PASS: Health endpoint test file exists"
1588
+ else
1589
+ echo "FAIL: Health endpoint test file not found"
1590
+ fi
1591
+
1592
+ # Criterion 3: Test passes
1593
+ if cd "$PROJECT_ROOT" && (npm test 2>/dev/null || pytest 2>/dev/null || make test 2>/dev/null); then
1594
+ echo "PASS: Tests pass"
1595
+ else
1596
+ echo "FAIL: Tests do not pass"
1597
+ fi
1598
+ ```
1599
+
1600
+ ```bash
1601
+ chmod +x benchmarks/tasks/01-rest-endpoint/rubric.sh
1602
+ ```
1603
+
1604
+ **Step 5: Write remaining task stubs**
1605
+
1606
+ For benchmarks 02-05, create minimal `task.md` files. Rubrics can be added later; until then `runner.sh run` skips these tasks because they have no `rubric.sh`:
1607
+
1608
+ Create `benchmarks/tasks/02-refactor-module/task.md`:
1609
+ ```markdown
1610
+ # Refactor a Module into Two
1611
+
1612
+ **Complexity:** Medium (2 batches)
1613
+ **Measures:** Refactoring quality, test preservation
1614
+
1615
+ ## Task
1616
+
1617
+ Split `src/utils.sh` into `src/string-utils.sh` and `src/file-utils.sh`, preserving all existing tests.
1618
+ ```
1619
+
1620
+ Create `benchmarks/tasks/03-fix-integration-bug/task.md`:
1621
+ ```markdown
1622
+ # Fix an Integration Bug
1623
+
1624
+ **Complexity:** Medium (2 batches)
1625
+ **Measures:** Debugging, root cause analysis
1626
+
1627
+ ## Task
1628
+
1629
+ The `/api/users` endpoint returns 500 when the database connection pool is exhausted. Find and fix the root cause.
1630
+ ```
1631
+
1632
+ Create `benchmarks/tasks/04-add-test-coverage/task.md`:
1633
+ ```markdown
1634
+ # Add Test Coverage to Untested Module
1635
+
1636
+ **Complexity:** Medium (2 batches)
1637
+ **Measures:** Test quality, edge case discovery
1638
+
1639
+ ## Task
1640
+
1641
+ Add comprehensive tests to `src/parser.sh` which currently has 0% coverage. Cover happy path, edge cases, and error conditions.
1642
+ ```
1643
+
1644
+ Create `benchmarks/tasks/05-multi-file-feature/task.md`:
1645
+ ```markdown
1646
+ # Multi-File Feature with API + DB + Tests
1647
+
1648
+ **Complexity:** Complex (4 batches)
1649
+ **Measures:** Full pipeline, cross-file coordination
1650
+
1651
+ ## Task
1652
+
1653
+ Add a "bookmarks" feature: API endpoints (CRUD), database migration, and integration tests.
1654
+ ```
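When rubrics are added for the stub tasks, they could follow the same `PASS:`/`FAIL:` convention as `01-rest-endpoint`. The sketch below is a hypothetical rubric for `02-refactor-module`, not part of the plan; `check_file` and `run_rubric` are illustrative names, and it only checks that the two target files from the task description exist.

```bash
#!/usr/bin/env bash
# Hypothetical rubric sketch for 02-refactor-module; not part of the plan.
set -eu

# Print one PASS:/FAIL: line depending on whether the file exists.
check_file() {
    local description="$1" path="$2"
    if [ -f "$path" ]; then
        echo "PASS: $description"
    else
        echo "FAIL: $description"
    fi
}

# Run every criterion against the given project root.
run_rubric() {
    local root="$1"
    check_file "string-utils.sh exists" "$root/src/string-utils.sh"
    check_file "file-utils.sh exists" "$root/src/file-utils.sh"
}

run_rubric "${BENCHMARK_PROJECT_ROOT:-.}"
```

A fuller rubric would presumably also run the project's tests, as Criterion 3 of the first rubric does.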
1655
+
1656
+ **Step 6: Verify runner works**
1657
+
1658
+ Run: `cd ~/Documents/projects/autonomous-coding-toolkit && bash benchmarks/runner.sh list`
1659
+ Expected: Lists all 5 benchmark tasks
1660
+
1661
+ **Step 7: Add results/ to .gitignore**
1662
+
1663
+ ```bash
1664
+ echo "benchmarks/results/" >> .gitignore
1665
+ ```
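A plain `echo >>` appends a duplicate entry each time the step is re-run. One possible idempotent variant is sketched below; `ensure_ignored` is a hypothetical helper, not something the toolkit provides.

```bash
#!/usr/bin/env bash
set -eu

# ensure_ignored ENTRY [FILE]: append ENTRY to FILE only if no identical line exists.
ensure_ignored() {
    local entry="$1" file="${2:-.gitignore}"
    touch "$file"
    # If the file is non-empty and lacks a trailing newline, add one first.
    if [ -s "$file" ] && [ -n "$(tail -c 1 "$file")" ]; then
        echo >> "$file"
    fi
    # -x matches whole lines, -F treats the entry as a fixed string.
    grep -qxF -- "$entry" "$file" || printf '%s\n' "$entry" >> "$file"
}
```

Usage from the repository root would be `ensure_ignored "benchmarks/results/"`; calling it twice leaves a single entry.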
1666
+
1667
+ **Step 8: Commit**
1668
+
1669
+ ```bash
1670
+ git add benchmarks/ .gitignore
1671
+ git commit -m "feat: add benchmark suite with 5 tasks and runner.sh"
1672
+ ```
1673
+
1674
+ ### Task 17: Write benchmark runner test
1675
+
1676
+ **Files:**
1677
+ - Create: `scripts/tests/test-benchmark-runner.sh`
1678
+
1679
+ **Step 1: Write the test**
1680
+
1681
+ ```bash
1682
+ #!/usr/bin/env bash
1683
+ # Test benchmarks/runner.sh
1684
+ set -euo pipefail
1685
+
1686
+ SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
1687
+ REPO_ROOT="$(cd "$SCRIPT_DIR/../.." && pwd)"
1688
+ RUNNER="$REPO_ROOT/benchmarks/runner.sh"
1689
+
1690
+ source "$SCRIPT_DIR/test-helpers.sh"
1691
+
1692
+ # --- Test 1: list shows benchmarks ---
1693
+ output=$(bash "$RUNNER" list 2>&1)
1694
+ assert_contains "list shows benchmarks" "01-rest-endpoint" "$output"
1695
+ assert_contains "list shows all 5" "05-multi-file-feature" "$output"
1696
+
1697
+ # --- Test 2: help works ---
1698
+ output=$(bash "$RUNNER" help 2>&1)
1699
+ assert_contains "help shows usage" "Usage:" "$output"
1700
+
1701
+ # --- Test 3: unknown benchmark fails gracefully ---
1702
+ exit_code=0
1703
+ bash "$RUNNER" run nonexistent-benchmark >/dev/null 2>&1 || exit_code=$?
1704
+ assert_eq "unknown benchmark exits non-zero" "1" "$exit_code"
1705
+
1706
+ report_results
1707
+ ```
1708
+
1709
+ **Step 2: Make executable and run**
1710
+
1711
+ ```bash
1712
+ chmod +x scripts/tests/test-benchmark-runner.sh
1713
+ ```
1714
+
1715
+ Run: `bash scripts/tests/test-benchmark-runner.sh`
1716
+ Expected: All tests PASS
1717
+
1718
+ **Step 3: Commit**
1719
+
1720
+ ```bash
1721
+ git add scripts/tests/test-benchmark-runner.sh
1722
+ git commit -m "test: add benchmark runner tests"
1723
+ ```
1724
+
1725
+ ---
1726
+
1727
+ ## Batch 8: Trust Score + Graduated Autonomy (P2)
1728
+
1729
+ ### Task 18: Add trust score computation to telemetry.sh
1730
+
1731
+ **Files:**
1732
+ - Modify: `scripts/telemetry.sh` (add `trust` subcommand)
1733
+
1734
+ **Step 1: Add trust score subcommand**
1735
+
1736
+ Add a new case to the `case "$SUBCOMMAND"` block in telemetry.sh:
1737
+
1738
+ ```bash
1739
+ trust)
1740
+ if [[ ! -f "$TELEMETRY_FILE" ]] || [[ ! -s "$TELEMETRY_FILE" ]]; then
1741
+ echo '{"score":0,"level":"new","runs":0,"message":"No telemetry data yet"}'
1742
+ exit 0
1743
+ fi
1744
+
1745
+ jq -s '
1746
+ def trust_level(score; runs):
1747
+ if runs < 10 then "new"
1748
+ elif score < 30 then "new"
1749
+ elif score < 70 then "growing"
1750
+ elif score < 90 then "trusted"
1751
+ else "autonomous"
1752
+ end;
1753
+
1754
+ length as $total |
1755
+ ([.[] | select(.passed_gate == true)] | length) as $passed |
1756
+ (if $total > 0 then ($passed * 100 / $total) else 0 end) as $gate_rate |
1757
+ # Trust score = gate pass rate (simplified; full formula adds echo-back, regression, revert)
1758
+ $gate_rate as $score |
1759
+ trust_level($score; $total) as $level |
1760
+ {
1761
+ score: $score,
1762
+ level: $level,
1763
+ runs: $total,
1764
+ gate_pass_rate: $gate_rate,
1765
+ default_mode: (
1766
+ if $level == "new" then "human checkpoint every batch"
1767
+ elif $level == "growing" then "headless with checkpoint every 3rd batch"
1768
+ elif $level == "trusted" then "headless with notification on failures only"
1769
+ else "full headless, post-run summary only"
1770
+ end
1771
+ )
1772
+ }
1773
+ ' "$TELEMETRY_FILE"
1774
+ ;;
1775
+ ```
1776
+
1777
+ **Step 2: Verify trust score works**
1778
+
1779
+ Create some test data and check:
1780
+
1781
+ ```bash
1782
+ WORK=$(mktemp -d)
1783
+ mkdir -p "$WORK/logs"
1784
+ for i in $(seq 1 15); do
1785
+ bash scripts/telemetry.sh record --project-root "$WORK" --batch-number "$i" --passed true --strategy test --duration 60 --cost 0.30
1786
+ done
1787
+ bash scripts/telemetry.sh trust --project-root "$WORK"
1788
+ rm -rf "$WORK"
1789
+ ```
1790
+
1791
+ Expected: JSON with `"score": 100`, `"level": "autonomous"`, `"runs": 15`
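To see how the thresholds map scores to levels without building telemetry data, the `trust_level` definition from the jq program can be mirrored in shell. This is a sketch for exploration only; it assumes whole-number inputs, whereas the jq version may produce fractional scores.

```bash
#!/usr/bin/env bash
set -eu

# trust_level SCORE RUNS: shell mirror of the jq trust_level definition.
# Assumption: both arguments are whole numbers.
trust_level() {
    local score="$1" runs="$2"
    if [ "$runs" -lt 10 ] || [ "$score" -lt 30 ]; then
        echo "new"
    elif [ "$score" -lt 70 ]; then
        echo "growing"
    elif [ "$score" -lt 90 ]; then
        echo "trusted"
    else
        echo "autonomous"
    fi
}

trust_level 100 15   # the 15-run, all-pass case from Step 2
```

Note that fewer than 10 runs always yields `new`, regardless of the pass rate.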
1792
+
1793
+ **Step 3: Commit**
1794
+
1795
+ ```bash
1796
+ git add scripts/telemetry.sh
1797
+ git commit -m "feat: add trust score computation to telemetry"
1798
+ ```
1799
+
1800
+ ### Task 19: Add trust score to pipeline-status.sh
1801
+
1802
+ **Files:**
1803
+ - Modify: `scripts/pipeline-status.sh` (add trust score display section)
1804
+
1805
+ **Step 1: Add trust score section**
1806
+
1807
+ After the "Git" section (before the final `echo` at the bottom), add:
1808
+
1809
+ ```bash
1810
+ # Trust score (from telemetry)
1811
+ if [[ -x "$SCRIPT_DIR/telemetry.sh" ]]; then
1812
+ trust_json=$("$SCRIPT_DIR/telemetry.sh" trust --project-root "$PROJECT_ROOT" 2>/dev/null || echo '{}')
1813
+ trust_score=$(echo "$trust_json" | jq -r '.score // "n/a"' 2>/dev/null || echo "n/a")
1814
+ trust_level=$(echo "$trust_json" | jq -r '.level // "unknown"' 2>/dev/null || echo "unknown")
1815
+ trust_runs=$(echo "$trust_json" | jq -r '.runs // 0' 2>/dev/null || echo "0")
1816
+ trust_mode=$(echo "$trust_json" | jq -r '.default_mode // "unknown"' 2>/dev/null || echo "unknown")
1817
+
1818
+ if [[ "$trust_score" != "n/a" && "$trust_runs" != "0" ]]; then
1819
+ echo ""
1820
+ echo "--- Trust Score ---"
1821
+ echo " Score: ${trust_score}/100 ($trust_runs runs)"
1822
+ echo " Level: $trust_level"
1823
+ echo " Default mode: $trust_mode"
1824
+ fi
1825
+ fi
1826
+ ```
1827
+
1828
+ **Step 2: Verify it works (with no telemetry data, silently skips)**
1829
+
1830
+ Run: `bash scripts/pipeline-status.sh ~/Documents/projects/autonomous-coding-toolkit 2>&1 | tail -10`
1831
+ Expected: Shows the git section; the trust section is absent, because the toolkit itself has no telemetry data and the block prints nothing when there are 0 runs
1832
+
1833
+ **Step 3: Commit**
1834
+
1835
+ ```bash
1836
+ git add scripts/pipeline-status.sh
1837
+ git commit -m "feat: display trust score in pipeline status"
1838
+ ```
1839
+
1840
+ ---
1841
+
1842
+ ## Batch 9: Semantic Echo-Back Tier 2 (P2)
1843
+
1844
+ ### Task 20: Add Tier 2 echo-back support
1845
+
1846
+ **Files:**
1847
+ - Modify: `scripts/lib/run-plan-echo-back.sh` (add LLM verification tier)
1848
+
1849
+ **Step 1: Read the current echo-back implementation**
1850
+
1851
+ The implementer should read `scripts/lib/run-plan-echo-back.sh` fully to understand the current keyword-matching logic before adding Tier 2.
1852
+
1853
+ **Step 2: Add Tier 2 function**
1854
+
1855
+ Add after the existing `run_echo_back()` function:
1856
+
1857
+ ```bash
1858
+ # --- Tier 2: LLM semantic verification ---
1859
+ # Activates on batch 1, integration batches, or --strict-echo-back
1860
+ # Requires: claude CLI available
1861
+ run_echo_back_tier2() {
1862
+ local batch_text="$1"
1863
+ local agent_summary="$2"
1864
+
1865
+ if ! command -v claude >/dev/null 2>&1; then
1866
+ echo "echo-back-tier2: claude CLI not available — skipping" >&2
1867
+ return 0
1868
+ fi
1869
+
1870
+ local prompt
1871
+ prompt=$(cat <<PROMPT
1872
+ You are a specification compliance reviewer. Compare:
1873
+
1874
+ SPECIFICATION:
1875
+ $batch_text
1876
+
1877
+ AGENT'S UNDERSTANDING:
1878
+ $agent_summary
1879
+
1880
+ Does the agent's understanding match the specification? Flag any:
1881
+ - Missing requirements
1882
+ - Added requirements not in spec
1883
+ - Misinterpreted requirements
1884
+ - Ambiguous interpretations
1885
+
1886
+ Output exactly one line: PASS or FAIL followed by a colon and explanation.
1887
+ PROMPT
1888
+ )
1889
+
1890
+ local result
1891
+ result=$(echo "$prompt" | claude -p 2>/dev/null || echo "PASS: echo-back tier2 unavailable")
1892
+
1893
+ if echo "$result" | grep -qi "^FAIL"; then
1894
+ echo "echo-back-tier2: FAILED — $result"
1895
+ return 1
1896
+ else
1897
+ echo "echo-back-tier2: PASSED"
1898
+ return 0
1899
+ fi
1900
+ }
1901
+
1902
+ # Determine if tier 2 should activate
1903
+ should_run_tier2() {
1904
+ local batch_number="${1:-0}"
1905
+ local batch_type="${2:-unknown}"
1906
+ local strict="${3:-false}"
1907
+
1908
+ # Always on batch 1 (disproportionate risk)
1909
+ [[ "$batch_number" == "1" ]] && return 0
1910
+
1911
+ # Always on integration batches
1912
+ [[ "$batch_type" == "integration" ]] && return 0
1913
+
1914
+ # When strict mode is set
1915
+ [[ "$strict" == "true" ]] && return 0
1916
+
1917
+ return 1
1918
+ }
1919
+ ```
1920
+
1921
+ **Step 3: Integration point**
1922
+
1923
+ The Tier 2 functions are now available but not yet wired in. To integrate them, the implementer should call `should_run_tier2` from the headless loop in `run-plan-headless.sh` (after the agent generates output, before the quality gate), passing `STRICT_ECHO_BACK` as the strict flag, and invoke `run_echo_back_tier2` when it returns 0. This wiring is optional for this batch.
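One way the call site might look is sketched below. `maybe_run_tier2` is a hypothetical wrapper, and the two stand-in functions exist only so the sketch runs on its own; they are skipped when the real functions from `run-plan-echo-back.sh` are already defined.

```bash
#!/usr/bin/env bash
set -eu

# Stand-ins so the sketch runs alone; skipped if the real functions exist.
command -v should_run_tier2 >/dev/null 2>&1 || should_run_tier2() {
    [ "${1:-0}" = "1" ] || [ "${2:-unknown}" = "integration" ] || [ "${3:-false}" = "true" ]
}
command -v run_echo_back_tier2 >/dev/null 2>&1 || run_echo_back_tier2() {
    echo "echo-back-tier2: PASSED (stub)"
}

# maybe_run_tier2 BATCH_NUMBER BATCH_TYPE BATCH_TEXT AGENT_SUMMARY (hypothetical helper)
maybe_run_tier2() {
    local batch_number="$1" batch_type="$2" batch_text="$3" agent_summary="$4"
    if should_run_tier2 "$batch_number" "$batch_type" "${STRICT_ECHO_BACK:-false}"; then
        # Propagates the verifier's exit status: 1 on FAIL, 0 otherwise.
        run_echo_back_tier2 "$batch_text" "$agent_summary"
    else
        echo "echo-back-tier2: not required for batch $batch_number"
    fi
}
```

Because `run_echo_back_tier2` returns 1 on a mismatch, a caller running under `set -e` would need to test the wrapper in an `if` or with `||` to decide whether to retry the batch or stop.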
1924
+
1925
+ **Step 4: Commit**
1926
+
1927
+ ```bash
1928
+ git add scripts/lib/run-plan-echo-back.sh
1929
+ git commit -m "feat: add Tier 2 semantic echo-back via LLM verification"
1930
+ ```
1931
+
1932
+ ### Task 21: Final test suite + quality gate
1933
+
1934
+ **Step 1: Run full test suite**
1935
+
1936
+ Run: `cd ~/Documents/projects/autonomous-coding-toolkit && bash scripts/tests/run-all-tests.sh`
1937
+ Expected: All tests PASS (including all new tests from this plan)
1938
+
1939
+ **Step 2: Run quality gate**
1940
+
1941
+ Run: `bash scripts/quality-gate.sh --project-root ~/Documents/projects/autonomous-coding-toolkit`
1942
+ Expected: ALL PASSED
1943
+
1944
+ **Step 3: Run validate-all**
1945
+
1946
+ Run: `bash scripts/validate-all.sh`
1947
+ Expected: All validators pass
1948
+
1949
+ ---
1950
+
1951
+ ## Summary
1952
+
1953
+ | Batch | Priority | Tasks | New Files | Modified Files |
1954
+ |-------|----------|-------|-----------|---------------|
1955
+ | 1 | P0 | 1-3 | `package.json`, `bin/act.js`, `test-act-cli.sh` | — |
1956
+ | 2 | P0 | 4-5 | `scripts/init.sh`, `test-init.sh` | — |
1957
+ | 3 | P0 | 6-8 | `test-telegram-env.sh`, `test-lesson-local.sh` | `telegram.sh`, `lesson-check.sh` |
1958
+ | 4 | P0 | 9-11 | `.npmignore` | `README.md` |
1959
+ | 5 | P1 | 12-13 | `scripts/telemetry.sh`, `test-telemetry.sh` | — |
1960
+ | 6 | P1 | 14-15 | — | `quality-gate.sh` |
1961
+ | 7 | P1 | 16-17 | `benchmarks/runner.sh`, 5 task dirs, `test-benchmark-runner.sh` | `.gitignore` |
1962
+ | 8 | P2 | 18-19 | — | `telemetry.sh`, `pipeline-status.sh` |
1963
+ | 9 | P2 | 20-21 | — | `run-plan-echo-back.sh` |
1964
+
1965
+ **Total: 21 tasks across 9 batches. ~1,150 new lines. 12 new files plus 5 benchmark task directories, 7 existing files modified.**