autonomous-coding-toolkit 1.0.0

Files changed (324)
  1. package/.claude-plugin/marketplace.json +22 -0
  2. package/.claude-plugin/plugin.json +13 -0
  3. package/LICENSE +21 -0
  4. package/Makefile +21 -0
  5. package/README.md +140 -0
  6. package/SECURITY.md +28 -0
  7. package/agents/bash-expert.md +113 -0
  8. package/agents/dependency-auditor.md +138 -0
  9. package/agents/integration-tester.md +120 -0
  10. package/agents/lesson-scanner.md +149 -0
  11. package/agents/python-expert.md +179 -0
  12. package/agents/service-monitor.md +141 -0
  13. package/agents/shell-expert.md +147 -0
  14. package/benchmarks/runner.sh +147 -0
  15. package/benchmarks/tasks/01-rest-endpoint/rubric.sh +29 -0
  16. package/benchmarks/tasks/01-rest-endpoint/task.md +17 -0
  17. package/benchmarks/tasks/02-refactor-module/task.md +8 -0
  18. package/benchmarks/tasks/03-fix-integration-bug/task.md +8 -0
  19. package/benchmarks/tasks/04-add-test-coverage/task.md +8 -0
  20. package/benchmarks/tasks/05-multi-file-feature/task.md +8 -0
  21. package/bin/act.js +238 -0
  22. package/commands/autocode.md +6 -0
  23. package/commands/cancel-ralph.md +18 -0
  24. package/commands/code-factory.md +53 -0
  25. package/commands/create-prd.md +55 -0
  26. package/commands/ralph-loop.md +18 -0
  27. package/commands/run-plan.md +117 -0
  28. package/commands/submit-lesson.md +122 -0
  29. package/docs/ARCHITECTURE.md +630 -0
  30. package/docs/CONTRIBUTING.md +125 -0
  31. package/docs/lessons/0001-bare-exception-swallowing.md +34 -0
  32. package/docs/lessons/0002-async-def-without-await.md +28 -0
  33. package/docs/lessons/0003-create-task-without-callback.md +28 -0
  34. package/docs/lessons/0004-hardcoded-test-counts.md +28 -0
  35. package/docs/lessons/0005-sqlite-without-closing.md +33 -0
  36. package/docs/lessons/0006-venv-pip-path.md +27 -0
  37. package/docs/lessons/0007-runner-state-self-rejection.md +35 -0
  38. package/docs/lessons/0008-quality-gate-blind-spot.md +33 -0
  39. package/docs/lessons/0009-parser-overcount-empty-batches.md +36 -0
  40. package/docs/lessons/0010-local-outside-function-bash.md +33 -0
  41. package/docs/lessons/0011-batch-tests-for-unimplemented-code.md +36 -0
  42. package/docs/lessons/0012-api-markdown-unescaped-chars.md +33 -0
  43. package/docs/lessons/0013-export-prefix-env-parsing.md +33 -0
  44. package/docs/lessons/0014-decorator-registry-import-side-effect.md +43 -0
  45. package/docs/lessons/0015-frontend-backend-schema-drift.md +43 -0
  46. package/docs/lessons/0016-event-driven-cold-start-seeding.md +44 -0
  47. package/docs/lessons/0017-copy-paste-logic-diverges.md +43 -0
  48. package/docs/lessons/0018-layer-passes-pipeline-broken.md +45 -0
  49. package/docs/lessons/0019-systemd-envfile-ignores-export.md +41 -0
  50. package/docs/lessons/0020-persist-state-incrementally.md +44 -0
  51. package/docs/lessons/0021-dual-axis-testing.md +48 -0
  52. package/docs/lessons/0022-jsx-factory-shadowing.md +43 -0
  53. package/docs/lessons/0023-static-analysis-spiral.md +51 -0
  54. package/docs/lessons/0024-shared-pipeline-implementation.md +55 -0
  55. package/docs/lessons/0025-defense-in-depth-all-entry-points.md +65 -0
  56. package/docs/lessons/0026-linter-no-rules-false-enforcement.md +54 -0
  57. package/docs/lessons/0027-jsx-silent-prop-drop.md +64 -0
  58. package/docs/lessons/0028-no-infrastructure-in-client-code.md +49 -0
  59. package/docs/lessons/0029-never-write-secrets-to-files.md +61 -0
  60. package/docs/lessons/0030-cache-merge-not-replace.md +62 -0
  61. package/docs/lessons/0031-verify-units-at-boundaries.md +66 -0
  62. package/docs/lessons/0032-module-lifecycle-subscribe-unsubscribe.md +89 -0
  63. package/docs/lessons/0033-async-iteration-mutable-snapshot.md +72 -0
  64. package/docs/lessons/0034-caller-missing-await-silent-discard.md +65 -0
  65. package/docs/lessons/0035-duplicate-registration-silent-overwrite.md +85 -0
  66. package/docs/lessons/0036-websocket-dirty-disconnect.md +33 -0
  67. package/docs/lessons/0037-parallel-agents-worktree-corruption.md +31 -0
  68. package/docs/lessons/0038-subscribe-no-stored-ref.md +36 -0
  69. package/docs/lessons/0039-fallback-or-default-hides-bugs.md +34 -0
  70. package/docs/lessons/0040-event-firehose-filter-first.md +36 -0
  71. package/docs/lessons/0041-ambiguous-base-dir-path-nesting.md +32 -0
  72. package/docs/lessons/0042-spec-compliance-insufficient.md +36 -0
  73. package/docs/lessons/0043-exact-count-extensible-collections.md +32 -0
  74. package/docs/lessons/0044-relative-file-deps-worktree.md +39 -0
  75. package/docs/lessons/0045-iterative-design-improvement.md +33 -0
  76. package/docs/lessons/0046-plan-assertion-math-bugs.md +38 -0
  77. package/docs/lessons/0047-pytest-single-threaded-default.md +37 -0
  78. package/docs/lessons/0048-integration-wiring-batch.md +40 -0
  79. package/docs/lessons/0049-ab-verification.md +41 -0
  80. package/docs/lessons/0050-editing-sourced-files-during-execution.md +33 -0
  81. package/docs/lessons/0051-infrastructure-fixes-cant-self-heal.md +30 -0
  82. package/docs/lessons/0052-uncommitted-changes-poison-quality-gates.md +31 -0
  83. package/docs/lessons/0053-jq-compact-flag-inconsistency.md +31 -0
  84. package/docs/lessons/0054-parser-matches-inside-code-blocks.md +30 -0
  85. package/docs/lessons/0055-agents-compensate-for-garbled-prompts.md +31 -0
  86. package/docs/lessons/0056-grep-count-exit-code-on-zero.md +42 -0
  87. package/docs/lessons/0057-new-artifacts-break-git-clean-gates.md +42 -0
  88. package/docs/lessons/0058-dead-config-keys-never-consumed.md +49 -0
  89. package/docs/lessons/0059-contract-test-shared-structures.md +53 -0
  90. package/docs/lessons/0060-set-e-silent-death-in-runners.md +53 -0
  91. package/docs/lessons/0061-context-injection-dirty-state.md +50 -0
  92. package/docs/lessons/0062-sibling-bug-neighborhood-scan.md +29 -0
  93. package/docs/lessons/0063-one-flag-two-lifetimes.md +31 -0
  94. package/docs/lessons/0064-test-passes-wrong-reason.md +31 -0
  95. package/docs/lessons/0065-pipefail-grep-count-double-output.md +39 -0
  96. package/docs/lessons/0066-local-keyword-outside-function.md +37 -0
  97. package/docs/lessons/0067-stdin-hang-non-interactive-shell.md +36 -0
  98. package/docs/lessons/0068-agent-builds-wrong-thing-correctly.md +31 -0
  99. package/docs/lessons/0069-plan-quality-dominates-execution.md +30 -0
  100. package/docs/lessons/0070-spec-echo-back-prevents-drift.md +31 -0
  101. package/docs/lessons/0071-positive-instructions-outperform-negative.md +30 -0
  102. package/docs/lessons/0072-lost-in-the-middle-context-placement.md +30 -0
  103. package/docs/lessons/0073-unscoped-lessons-cause-false-positives.md +30 -0
  104. package/docs/lessons/0074-stale-context-injection-wrong-batch.md +32 -0
  105. package/docs/lessons/0075-research-artifacts-must-persist.md +32 -0
  106. package/docs/lessons/0076-wrong-decomposition-contaminates-downstream.md +30 -0
  107. package/docs/lessons/0077-cherry-pick-merges-need-manual-resolution.md +30 -0
  108. package/docs/lessons/0078-static-review-without-live-test.md +30 -0
  109. package/docs/lessons/0079-integration-wiring-batch-required.md +32 -0
  110. package/docs/lessons/FRAMEWORK.md +161 -0
  111. package/docs/lessons/SUMMARY.md +201 -0
  112. package/docs/lessons/TEMPLATE.md +85 -0
  113. package/docs/plans/2026-02-21-code-factory-v2-design.md +204 -0
  114. package/docs/plans/2026-02-21-code-factory-v2-implementation-plan.md +2189 -0
  115. package/docs/plans/2026-02-21-code-factory-v2-phase4-design.md +537 -0
  116. package/docs/plans/2026-02-21-code-factory-v2-phase4-implementation-plan.md +2012 -0
  117. package/docs/plans/2026-02-21-hardening-pass-design.md +108 -0
  118. package/docs/plans/2026-02-21-hardening-pass-plan.md +1378 -0
  119. package/docs/plans/2026-02-21-mab-research-report.md +406 -0
  120. package/docs/plans/2026-02-21-marketplace-restructure-design.md +240 -0
  121. package/docs/plans/2026-02-21-marketplace-restructure-plan.md +832 -0
  122. package/docs/plans/2026-02-21-phase4-completion-plan.md +697 -0
  123. package/docs/plans/2026-02-21-validator-suite-design.md +148 -0
  124. package/docs/plans/2026-02-21-validator-suite-plan.md +540 -0
  125. package/docs/plans/2026-02-22-mab-research-round2.md +556 -0
  126. package/docs/plans/2026-02-22-mab-run-design.md +462 -0
  127. package/docs/plans/2026-02-22-mab-run-plan.md +2046 -0
  128. package/docs/plans/2026-02-22-operations-design-methodology-research.md +681 -0
  129. package/docs/plans/2026-02-22-research-agent-failure-taxonomy.md +532 -0
  130. package/docs/plans/2026-02-22-research-code-guideline-policies.md +886 -0
  131. package/docs/plans/2026-02-22-research-codebase-audit-refactoring.md +908 -0
  132. package/docs/plans/2026-02-22-research-coding-standards-documentation.md +541 -0
  133. package/docs/plans/2026-02-22-research-competitive-landscape.md +687 -0
  134. package/docs/plans/2026-02-22-research-comprehensive-testing.md +1076 -0
  135. package/docs/plans/2026-02-22-research-context-utilization.md +459 -0
  136. package/docs/plans/2026-02-22-research-cost-quality-tradeoff.md +548 -0
  137. package/docs/plans/2026-02-22-research-lesson-transferability.md +508 -0
  138. package/docs/plans/2026-02-22-research-multi-agent-coordination.md +312 -0
  139. package/docs/plans/2026-02-22-research-phase-integration.md +602 -0
  140. package/docs/plans/2026-02-22-research-plan-quality.md +428 -0
  141. package/docs/plans/2026-02-22-research-prompt-engineering.md +558 -0
  142. package/docs/plans/2026-02-22-research-unconventional-perspectives.md +528 -0
  143. package/docs/plans/2026-02-22-research-user-adoption.md +638 -0
  144. package/docs/plans/2026-02-22-research-verification-effectiveness.md +433 -0
  145. package/docs/plans/2026-02-23-agent-suite-design.md +299 -0
  146. package/docs/plans/2026-02-23-agent-suite-plan.md +578 -0
  147. package/docs/plans/2026-02-23-phase3-cost-infrastructure-design.md +148 -0
  148. package/docs/plans/2026-02-23-phase3-cost-infrastructure-plan.md +1062 -0
  149. package/docs/plans/2026-02-23-research-bash-expert-agent.md +543 -0
  150. package/docs/plans/2026-02-23-research-dependency-auditor-agent.md +564 -0
  151. package/docs/plans/2026-02-23-research-improving-existing-agents.md +503 -0
  152. package/docs/plans/2026-02-23-research-integration-tester-agent.md +454 -0
  153. package/docs/plans/2026-02-23-research-python-expert-agent.md +429 -0
  154. package/docs/plans/2026-02-23-research-service-monitor-agent.md +425 -0
  155. package/docs/plans/2026-02-23-research-shell-expert-agent.md +533 -0
  156. package/docs/plans/2026-02-23-roadmap-to-completion.md +530 -0
  157. package/docs/plans/2026-02-24-headless-module-split-design.md +98 -0
  158. package/docs/plans/2026-02-24-headless-module-split.md +443 -0
  159. package/docs/plans/2026-02-24-lesson-scope-metadata-design.md +228 -0
  160. package/docs/plans/2026-02-24-lesson-scope-metadata-plan.md +968 -0
  161. package/docs/plans/2026-02-24-npm-packaging-design.md +841 -0
  162. package/docs/plans/2026-02-24-npm-packaging-plan.md +1965 -0
  163. package/docs/plans/audit-findings.md +186 -0
  164. package/docs/telegram-notification-format.md +98 -0
  165. package/examples/example-plan.md +51 -0
  166. package/examples/example-prd.json +72 -0
  167. package/examples/example-roadmap.md +33 -0
  168. package/examples/quickstart-plan.md +63 -0
  169. package/hooks/hooks.json +26 -0
  170. package/hooks/setup-symlinks.sh +48 -0
  171. package/hooks/stop-hook.sh +135 -0
  172. package/package.json +47 -0
  173. package/policies/bash.md +71 -0
  174. package/policies/python.md +71 -0
  175. package/policies/testing.md +61 -0
  176. package/policies/universal.md +60 -0
  177. package/scripts/analyze-report.sh +97 -0
  178. package/scripts/architecture-map.sh +145 -0
  179. package/scripts/auto-compound.sh +273 -0
  180. package/scripts/batch-audit.sh +42 -0
  181. package/scripts/batch-test.sh +101 -0
  182. package/scripts/entropy-audit.sh +221 -0
  183. package/scripts/failure-digest.sh +51 -0
  184. package/scripts/generate-ast-rules.sh +96 -0
  185. package/scripts/init.sh +112 -0
  186. package/scripts/lesson-check.sh +428 -0
  187. package/scripts/lib/common.sh +61 -0
  188. package/scripts/lib/cost-tracking.sh +153 -0
  189. package/scripts/lib/ollama.sh +60 -0
  190. package/scripts/lib/progress-writer.sh +128 -0
  191. package/scripts/lib/run-plan-context.sh +215 -0
  192. package/scripts/lib/run-plan-echo-back.sh +231 -0
  193. package/scripts/lib/run-plan-headless.sh +396 -0
  194. package/scripts/lib/run-plan-notify.sh +57 -0
  195. package/scripts/lib/run-plan-parser.sh +81 -0
  196. package/scripts/lib/run-plan-prompt.sh +215 -0
  197. package/scripts/lib/run-plan-quality-gate.sh +132 -0
  198. package/scripts/lib/run-plan-routing.sh +315 -0
  199. package/scripts/lib/run-plan-sampling.sh +170 -0
  200. package/scripts/lib/run-plan-scoring.sh +146 -0
  201. package/scripts/lib/run-plan-state.sh +142 -0
  202. package/scripts/lib/run-plan-team.sh +199 -0
  203. package/scripts/lib/telegram.sh +54 -0
  204. package/scripts/lib/thompson-sampling.sh +176 -0
  205. package/scripts/license-check.sh +74 -0
  206. package/scripts/mab-run.sh +575 -0
  207. package/scripts/module-size-check.sh +146 -0
  208. package/scripts/patterns/async-no-await.yml +5 -0
  209. package/scripts/patterns/bare-except.yml +6 -0
  210. package/scripts/patterns/empty-catch.yml +6 -0
  211. package/scripts/patterns/hardcoded-localhost.yml +9 -0
  212. package/scripts/patterns/retry-loop-no-backoff.yml +12 -0
  213. package/scripts/pipeline-status.sh +197 -0
  214. package/scripts/policy-check.sh +226 -0
  215. package/scripts/prior-art-search.sh +133 -0
  216. package/scripts/promote-mab-lessons.sh +126 -0
  217. package/scripts/prompts/agent-a-superpowers.md +29 -0
  218. package/scripts/prompts/agent-b-ralph.md +29 -0
  219. package/scripts/prompts/judge-agent.md +61 -0
  220. package/scripts/prompts/planner-agent.md +44 -0
  221. package/scripts/pull-community-lessons.sh +90 -0
  222. package/scripts/quality-gate.sh +266 -0
  223. package/scripts/research-gate.sh +90 -0
  224. package/scripts/run-plan.sh +329 -0
  225. package/scripts/scope-infer.sh +159 -0
  226. package/scripts/setup-ralph-loop.sh +155 -0
  227. package/scripts/telemetry.sh +230 -0
  228. package/scripts/tests/run-all-tests.sh +52 -0
  229. package/scripts/tests/test-act-cli.sh +46 -0
  230. package/scripts/tests/test-agents-md.sh +87 -0
  231. package/scripts/tests/test-analyze-report.sh +114 -0
  232. package/scripts/tests/test-architecture-map.sh +89 -0
  233. package/scripts/tests/test-auto-compound.sh +169 -0
  234. package/scripts/tests/test-batch-test.sh +65 -0
  235. package/scripts/tests/test-benchmark-runner.sh +25 -0
  236. package/scripts/tests/test-common.sh +168 -0
  237. package/scripts/tests/test-cost-tracking.sh +158 -0
  238. package/scripts/tests/test-echo-back.sh +180 -0
  239. package/scripts/tests/test-entropy-audit.sh +146 -0
  240. package/scripts/tests/test-failure-digest.sh +66 -0
  241. package/scripts/tests/test-generate-ast-rules.sh +145 -0
  242. package/scripts/tests/test-helpers.sh +82 -0
  243. package/scripts/tests/test-init.sh +47 -0
  244. package/scripts/tests/test-lesson-check.sh +278 -0
  245. package/scripts/tests/test-lesson-local.sh +55 -0
  246. package/scripts/tests/test-license-check.sh +109 -0
  247. package/scripts/tests/test-mab-run.sh +182 -0
  248. package/scripts/tests/test-ollama-lib.sh +49 -0
  249. package/scripts/tests/test-ollama.sh +60 -0
  250. package/scripts/tests/test-pipeline-status.sh +198 -0
  251. package/scripts/tests/test-policy-check.sh +124 -0
  252. package/scripts/tests/test-prior-art-search.sh +96 -0
  253. package/scripts/tests/test-progress-writer.sh +140 -0
  254. package/scripts/tests/test-promote-mab-lessons.sh +110 -0
  255. package/scripts/tests/test-pull-community-lessons.sh +149 -0
  256. package/scripts/tests/test-quality-gate.sh +241 -0
  257. package/scripts/tests/test-research-gate.sh +132 -0
  258. package/scripts/tests/test-run-plan-cli.sh +86 -0
  259. package/scripts/tests/test-run-plan-context.sh +305 -0
  260. package/scripts/tests/test-run-plan-e2e.sh +153 -0
  261. package/scripts/tests/test-run-plan-headless.sh +424 -0
  262. package/scripts/tests/test-run-plan-notify.sh +124 -0
  263. package/scripts/tests/test-run-plan-parser.sh +217 -0
  264. package/scripts/tests/test-run-plan-prompt.sh +254 -0
  265. package/scripts/tests/test-run-plan-quality-gate.sh +222 -0
  266. package/scripts/tests/test-run-plan-routing.sh +178 -0
  267. package/scripts/tests/test-run-plan-scoring.sh +148 -0
  268. package/scripts/tests/test-run-plan-state.sh +261 -0
  269. package/scripts/tests/test-run-plan-team.sh +157 -0
  270. package/scripts/tests/test-scope-infer.sh +150 -0
  271. package/scripts/tests/test-setup-ralph-loop.sh +63 -0
  272. package/scripts/tests/test-telegram-env.sh +38 -0
  273. package/scripts/tests/test-telegram.sh +121 -0
  274. package/scripts/tests/test-telemetry.sh +46 -0
  275. package/scripts/tests/test-thompson-sampling.sh +139 -0
  276. package/scripts/tests/test-validate-all.sh +60 -0
  277. package/scripts/tests/test-validate-commands.sh +89 -0
  278. package/scripts/tests/test-validate-hooks.sh +98 -0
  279. package/scripts/tests/test-validate-lessons.sh +150 -0
  280. package/scripts/tests/test-validate-plan-quality.sh +235 -0
  281. package/scripts/tests/test-validate-plans.sh +187 -0
  282. package/scripts/tests/test-validate-plugin.sh +106 -0
  283. package/scripts/tests/test-validate-prd.sh +184 -0
  284. package/scripts/tests/test-validate-skills.sh +134 -0
  285. package/scripts/validate-all.sh +57 -0
  286. package/scripts/validate-commands.sh +67 -0
  287. package/scripts/validate-hooks.sh +89 -0
  288. package/scripts/validate-lessons.sh +98 -0
  289. package/scripts/validate-plan-quality.sh +369 -0
  290. package/scripts/validate-plans.sh +120 -0
  291. package/scripts/validate-plugin.sh +86 -0
  292. package/scripts/validate-policies.sh +42 -0
  293. package/scripts/validate-prd.sh +118 -0
  294. package/scripts/validate-skills.sh +96 -0
  295. package/skills/autocode/SKILL.md +285 -0
  296. package/skills/autocode/ab-verification.md +51 -0
  297. package/skills/autocode/code-quality-standards.md +37 -0
  298. package/skills/autocode/competitive-mode.md +364 -0
  299. package/skills/brainstorming/SKILL.md +97 -0
  300. package/skills/capture-lesson/SKILL.md +187 -0
  301. package/skills/check-lessons/SKILL.md +116 -0
  302. package/skills/dispatching-parallel-agents/SKILL.md +110 -0
  303. package/skills/executing-plans/SKILL.md +85 -0
  304. package/skills/finishing-a-development-branch/SKILL.md +201 -0
  305. package/skills/receiving-code-review/SKILL.md +72 -0
  306. package/skills/requesting-code-review/SKILL.md +59 -0
  307. package/skills/requesting-code-review/code-reviewer.md +82 -0
  308. package/skills/research/SKILL.md +145 -0
  309. package/skills/roadmap/SKILL.md +115 -0
  310. package/skills/subagent-driven-development/SKILL.md +98 -0
  311. package/skills/subagent-driven-development/code-quality-reviewer-prompt.md +18 -0
  312. package/skills/subagent-driven-development/implementer-prompt.md +73 -0
  313. package/skills/subagent-driven-development/spec-reviewer-prompt.md +57 -0
  314. package/skills/systematic-debugging/SKILL.md +134 -0
  315. package/skills/systematic-debugging/condition-based-waiting.md +64 -0
  316. package/skills/systematic-debugging/defense-in-depth.md +32 -0
  317. package/skills/systematic-debugging/root-cause-tracing.md +55 -0
  318. package/skills/test-driven-development/SKILL.md +167 -0
  319. package/skills/using-git-worktrees/SKILL.md +219 -0
  320. package/skills/using-superpowers/SKILL.md +54 -0
  321. package/skills/verification-before-completion/SKILL.md +140 -0
  322. package/skills/verify/SKILL.md +82 -0
  323. package/skills/writing-plans/SKILL.md +128 -0
  324. package/skills/writing-skills/SKILL.md +93 -0
@@ -0,0 +1,49 @@
+ ---
+ id: 58
+ title: "Config keys registered but never consumed are dead knobs"
+ severity: should-fix
+ languages: [python]
+ scope: [project:autonomous-coding-toolkit]
+ category: silent-failures
+ pattern:
+   type: semantic
+   description: "Config keys registered in a defaults/schema file but never read via get_config or equivalent"
+ fix: "Wire every registered config key to a get_config call, or remove the dead registration"
+ example:
+   bad: |
+     # config_defaults.py
+     register_config("automation.min_confidence", default=0.7)
+     register_config("automation.max_suggestions", default=5)
+
+     # automation.py — uses hardcoded constants, never reads config
+     MIN_CONFIDENCE = 0.7
+     MAX_SUGGESTIONS = 5
+   good: |
+     # config_defaults.py
+     register_config("automation.min_confidence", default=0.7)
+
+     # automation.py — reads from config system
+     min_confidence = get_config_value("automation.min_confidence")
+ ---
+
+ ## Observation
+
+ Config keys were registered in a defaults file and exposed in a Settings UI,
+ but the consuming module used hardcoded module-level constants instead of
+ reading from config. Users could adjust settings that had zero runtime effect.
+
+ ## Insight
+
+ This happens when registration and consumption are built in different work
+ batches. Batch N registers the config keys with defaults. Batch N+1
+ implements the module with hardcoded constants matching those defaults.
+ Neither batch verifies the integration. Dead config is worse than missing
+ config — it lies to operators by showing controls that do nothing.
+
+ ## Lesson
+
+ Every config key registration must have a corresponding read call in the
+ consuming module. Add a CI check or quality gate step to detect orphaned
+ config keys: extract registered keys, extract consumed keys, diff them.
+ Config registration and consumption should happen in the same PR, or a
+ contract test must verify that every registered key has at least one consumer.
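
The "extract registered keys, extract consumed keys, diff them" step could be sketched as a quality-gate helper like the one below. This is a minimal sketch, not the toolkit's actual check. It assumes keys appear as double-quoted string literals in the first argument, reuses the `register_config` and `get_config_value` names from the lesson's example, and the helper names `extract_keys` and `find_orphaned_keys` are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: list config keys that are registered but never read.
# Assumption: keys are written as "literal.strings" in the first argument.
set -euo pipefail

# Print the sorted, unique first string argument of every call to
# function $1 found in the files given as the remaining arguments.
extract_keys() {
    local func="$1"; shift
    # grep exits 1 when nothing matches; "|| true" keeps set -e from aborting.
    { grep -ohE "${func}\(\"[^\"]+\"" "$@" || true; } \
        | sed -E 's/^[^"]*"([^"]+)"$/\1/' \
        | sort -u
}

# Print keys registered in $1 (the defaults file) that are never
# consumed in any of the remaining files. At least one consumer
# file must be given, otherwise grep would read from stdin.
find_orphaned_keys() {
    local defaults_file="$1"; shift
    comm -23 \
        <(extract_keys register_config "$defaults_file") \
        <(extract_keys get_config_value "$@")
}
```

A gate step could fail the build whenever `find_orphaned_keys config_defaults.py automation.py` prints anything. Keys that are built dynamically (constants, string formatting) are invisible to a text match, so the output is a hint rather than proof.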
@@ -0,0 +1,53 @@
+ ---
+ id: 59
+ title: "Independently-built shared structures diverge without contract tests"
+ severity: should-fix
+ languages: [all]
+ scope: [universal]
+ category: integration-boundaries
+ pattern:
+   type: semantic
+   description: "Two modules independently construct the same ordered structure (feature list, column names, schema) without a shared source or contract test"
+ fix: "Add a contract test asserting both structures match, or extract a shared source of truth"
+ example:
+   bad: |
+     # module_a.py — builds feature list from config iteration
+     features = [f.name for section in config for f in section.fields]
+
+     # module_b.py — builds feature list from manual append
+     features = []
+     features.extend(presence_features)
+     features.extend(pattern_features)
+     # Missing: event_features added to module_a but not here
+   good: |
+     # shared.py — single source of truth
+     def get_feature_names(config):
+         return [f.name for section in config for f in section.fields]
+
+     # OR: contract test
+     def test_feature_names_match():
+         assert module_a.get_features() == module_b.get_features()
+ ---
+
+ ## Observation
+
+ Two modules independently built the same ordered list (feature names for ML
+ column alignment). When a new section was added to one, the other was missed.
+ The lists had the same names but different ordering — causing a model trained
+ with column 3 = "lights_on" to use column 3 = "people_count" at inference.
+ Silent data corruption, no error.
+
+ ## Insight
+
+ When two code paths independently construct a shared structure, a developer
+ adding to one path must manually remember to update the other — a human-memory
+ contract with no compile-time enforcement. This applies to feature vectors,
+ schema definitions, API response formats, config key lists, enum values, and
+ any ordered structure where position matters.
+
+ ## Lesson
+
+ When two modules independently build a structure that must match (same
+ elements, same order), either: (1) extract a shared source of truth that both
+ import, or (2) add a contract test asserting equality. Add the contract test
+ BEFORE adding new elements — not after discovering the divergence.
@@ -0,0 +1,53 @@
+ ---
+ id: 60
+ title: "set -e kills long-running bash scripts silently when inter-step commands fail"
+ severity: blocker
+ languages: [shell]
+ scope: [project:autonomous-coding-toolkit]
+ category: silent-failures
+ pattern:
+   type: semantic
+   description: "Bash script uses set -euo pipefail without EXIT trap or guards around non-critical inter-step operations (notifications, logging, context injection). Any unguarded command failure silently terminates the entire script."
+ fix: "Add trap '_log_exit $?' EXIT for diagnostics, trap '' HUP PIPE for background survival, and wrap non-critical commands in { ... } || warn blocks"
+ example:
+   bad: |
+     set -euo pipefail
+     for batch in ...; do
+       context=$(generate_context) # guarded
+       sed '...' "$file" > "$tmp" # NOT guarded — kills script on failure
+       run_batch
+       notify_success "$batch" # NOT guarded — kills script on failure
+     done
+   good: |
+     set -euo pipefail
+     trap '_log_exit $?' EXIT
+     trap '' HUP PIPE
+     for batch in ...; do
+       context=$(generate_context || true)
+       { sed '...' "$file" > "$tmp"; } || echo "WARNING: context injection failed" >&2
+       run_batch
+       { notify_success "$batch"; } || echo "WARNING: notification failed" >&2
+     done
+ ---
+
+ ## Observation
+
+ `run-plan.sh` repeatedly died silently between batches during headless execution. The process simply vanished — no error output, no log entry, no state update. The script completed one batch successfully, then disappeared before starting the next.
+
+ Log files showed the last batch succeeded (quality gate passed, state updated), but the process was gone. Restarting with `--start-batch N` always worked for the next batch, then died again.
+
+ ## Insight
+
+ Three compounding factors:
+
+ 1. **`set -euo pipefail` with no EXIT trap.** Any command returning non-zero anywhere in the inter-batch code (CLAUDE.md sed manipulation, notification calls, failure pattern recording) kills the script instantly. Since there's no EXIT trap, the death is completely silent — no stack trace, no error message, no breadcrumb.
+
+ 2. **No signal handling — specifically SIGPIPE (confirmed).** The script pipes `claude -p` output through `tee` to write to both a log file and stdout. When stdout is a pipe to a task manager (Claude Code background task), the pipe can close between batches. `tee` then receives SIGPIPE (signal 13, exit code 141), which kills the process. Background processes need `trap '' HUP PIPE` to survive both terminal disconnects and broken pipes.
+
+ 3. **Non-critical operations not guarded.** The loop contained ~15 unguarded commands between the critical path (run batch → quality gate). Notifications, context injection, sed transformations, git log summaries — all could fail for transient reasons, and each failure was fatal under `set -e`.
+
+ The pattern is: `set -e` is for correctness on the *critical path*. But when a long-running script has both critical operations (batch execution, quality gates) and non-critical operations (notifications, logging, context assembly), `set -e` can't distinguish between them. Non-critical failures become critical kills.
+
+ ## Lesson
+
+ Long-running bash scripts with `set -e` must: (1) add `trap '_log_exit $?' EXIT` so unexpected terminations leave diagnostic breadcrumbs, (2) add `trap '' HUP PIPE` if they run in the background, and (3) wrap every non-critical operation in `{ commands; } || warn` blocks so transient failures don't kill the entire pipeline. The rule: if losing this operation wouldn't invalidate the batch, it must not be able to kill the script.
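
The lesson names `_log_exit` and a `warn` guard without defining them. One possible shape is sketched below, under stated assumptions: the body of `_log_exit` and its message format are invented for illustration, and `try_noncritical` is a hypothetical wrapper standing in for the `{ commands; } || warn` idiom.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical EXIT handler: leave a breadcrumb on any non-zero exit.
_log_exit() {
    local code="$1"
    if [[ "$code" -ne 0 ]]; then
        echo "run aborted with exit code $code at $(date '+%H:%M:%S')" >&2
    fi
}

# Hypothetical guard: run a non-critical command; on failure, warn on
# stderr and return success so set -e does not end the script.
try_noncritical() {
    "$@" || echo "WARNING: non-critical step failed ($?): $*" >&2
}

trap '_log_exit $?' EXIT   # diagnostics for unexpected terminations
trap '' HUP PIPE           # ignore hangups and broken pipes in the background
```

Inside the batch loop, a call such as `try_noncritical notify_success "$batch"` would then log a warning instead of ending the run. Because the command sits on the left side of `||`, bash does not apply `set -e` to it, which is what makes the guard non-fatal.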
@@ -0,0 +1,50 @@
+ ---
+ id: 61
+ title: "Context injection into tracked files creates dirty git state when subprocess commits"
+ severity: should-fix
+ languages: [shell]
+ scope: [project:autonomous-coding-toolkit]
+ category: integration-boundaries
+ pattern:
+   type: semantic
+   description: "Script injects temporary content into a tracked file (e.g., CLAUDE.md), runs a subprocess that may commit that file, then tries to restore from backup — creating a diff against the committed version."
+ fix: "Use git checkout -- <file> to restore to HEAD state instead of backup-based restoration. Fall back to backup only if file was never tracked."
+ example:
+   bad: |
+     backup=$(cat "$file")
+     echo "$context" >> "$file"
+     run_subprocess # subprocess commits $file with injected content
+     echo "$backup" > "$file" # now differs from HEAD — dirty state
+   good: |
+     echo "$context" >> "$file"
+     run_subprocess
+     git checkout -- "$file" 2>/dev/null || {
+       # fallback: file was never tracked
+       if [[ "$existed_before" == false ]]; then
+         rm -f "$file"
+       fi
+     }
+ ---
+
+ ## Observation
+
+ `run-plan.sh` injects per-batch context into `CLAUDE.md` before each batch (a `## Run-Plan: Batch N` section with failure patterns, prior batch summaries, and referenced files). After the batch completes, it restores CLAUDE.md from a backup taken before injection.
+
+ Batch 5 failed the quality gate with "uncommitted changes to CLAUDE.md" even though the batch itself passed all tests. The issue: the Claude subprocess committed CLAUDE.md with the injected context as part of its work. The restoration code then wrote the pre-injection backup, creating a diff against the now-committed HEAD that included the injected content.
+
+ ## Insight
+
+ This is an integration boundary bug between two phases that both touch the same file:
+
+ 1. **Orchestrator phase** — injects context into CLAUDE.md, expects to restore it after
+ 2. **Subprocess phase** — sees CLAUDE.md as a project file, may commit it with its changes
+
+ The backup-based restoration assumes CLAUDE.md's HEAD hasn't changed during the subprocess run. But if the subprocess commits the file (which is correct behavior — it should commit its changes), the backup is now out of date. Writing the backup creates a diff between HEAD (with injected content) and the working tree (without it).
+
+ The fix is to use `git checkout -- CLAUDE.md` which always restores to whatever HEAD currently is — regardless of whether the subprocess committed the injected version.
+
+ Edge case: if CLAUDE.md was never tracked (created fresh by injection), `git checkout` fails. Fall back to `rm -f` in that case.
+
+ ## Lesson
+
+ When injecting temporary content into tracked files before running a subprocess that may commit, never restore from an in-memory backup. The subprocess may commit the modified version, making the backup stale. Use `git checkout -- <file>` to restore to HEAD state, which is always correct regardless of whether the subprocess committed. Guard the edge case where the file wasn't previously tracked.
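
The restore step and its edge cases could be wrapped in one helper, sketched below as an illustration rather than the code in `run-plan.sh`. The function name `restore_injected_file` and its arguments are hypothetical. It assumes the caller recorded whether the file existed before injection, which the lesson's example reads from `$existed_before` without showing where it is set.

```shell
#!/usr/bin/env bash
# Sketch: undo a temporary injection into FILE after a subprocess has run.
# Usage: restore_injected_file FILE EXISTED_BEFORE [BACKUP]
#   EXISTED_BEFORE  "true" or "false", recorded by the caller before injecting
#   BACKUP          optional path to a pre-injection copy (untracked files only)
restore_injected_file() {
    local file="$1" existed_before="$2" backup="${3:-}"

    if git ls-files --error-unmatch -- "$file" >/dev/null 2>&1; then
        # Tracked: discard working-tree edits, whatever the subprocess committed.
        git checkout -- "$file"
    elif [[ "$existed_before" == false ]]; then
        # Created by the injection itself and never committed: remove it.
        rm -f -- "$file"
    elif [[ -n "$backup" ]]; then
        # Pre-existing but never tracked: a backup is the only source left.
        cp -- "$backup" "$file"
    fi
}
```

One detail worth knowing: `git checkout -- <file>` restores from the index, which matches HEAD only when nothing is staged for that file. If the subprocess can leave staged but uncommitted changes, `git checkout HEAD -- <file>` is the stricter form.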
@@ -0,0 +1,29 @@
1
+ ---
2
+ id: 62
3
+ title: "Sibling bugs hide next to the fix"
4
+ severity: should-fix
5
+ languages: [all]
6
+ scope: [project:autonomous-coding-toolkit]
7
+ category: integration-boundaries
8
+ pattern:
9
+ type: semantic
10
+ description: "When fixing a bug in a function, scan adjacent functions in the same file for the same root cause pattern"
11
+ fix: "After fixing a function, grep the same file for the same anti-pattern in sibling functions"
12
+ example:
13
+ bad: |
14
+ # Fix complete_batch's --argjson crash, ship it
15
+ # (set_quality_gate has the same crash 30 lines below)
16
+ good: |
17
+ # Fix complete_batch's --argjson crash
18
+ # Scan file: grep -n 'argjson' run-plan-state.sh
19
+ # Found same pattern in set_quality_gate — fix both
20
+ ---
21
+
22
+ ## Observation
23
+ In Phase 1 bug fixes, 2 of 8 tasks had code quality reviewers find the exact same bug in a sibling function within the same file. `set_quality_gate` had the same `--argjson` crash as `complete_batch`. The API curl lacked `--connect-timeout` just like the health check curl 6 lines above it.
24
+
25
+ ## Insight
26
+ Implementers fix what the ticket says. The same root cause often exists in nearby code written at the same time with the same assumptions. Fresh-context subagents don't carry knowledge of what was just fixed, so they can't pattern-match on "I just fixed this — is there another one?"
27
+
28
+ ## Lesson
29
+ After fixing a bug, grep the entire file for the same anti-pattern before committing. If the root cause is a bad API usage (like `--argjson` with strings), search for all call sites of that API in the file. Code review should always check: "does this same bug exist anywhere else in this file?"
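The "scan the file after the fix" step can be a one-line grep. The sketch below wraps it in a hypothetical helper, `find_sibling_sites`, which is not part of the toolkit; it only lists candidate call sites and leaves the judgment to the reviewer.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list every line in a file that uses the same
# construct as the bug that was just fixed.
find_sibling_sites() {
  local pattern=$1 file=$2
  # -n prints line numbers, -F treats the pattern as a fixed string.
  # `--` ends option parsing so a pattern like "--argjson" is not
  # mistaken for a grep option.
  grep -nF -- "$pattern" "$file" || true   # "no match" is not an error here
}
```

For the case described above, the call would be `find_sibling_sites '--argjson' run-plan-state.sh`; each output line is `line-number:matching-line`.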
@@ -0,0 +1,31 @@
1
+ ---
2
+ id: 63
3
+ title: "One boolean flag serving two lifetimes is a conflation bug"
4
+ severity: should-fix
5
+ languages: [shell, python, javascript]
6
+ scope: [universal]
7
+ category: silent-failures
8
+ pattern:
9
+ type: semantic
10
+ description: "A boolean flag that is set in one lifecycle (e.g., per-iteration) but read in another (e.g., post-loop) — the flag's meaning changes depending on when you read it"
11
+ fix: "Split into separate variables with explicit lifecycle names (e.g., _baseline_stash_created vs _winner_stash_created)"
12
+ example:
13
+ bad: |
14
+ _stash_created=false
15
+ # Set during per-candidate loop (baseline purpose)
16
+ # Read after loop ends (winner purpose)
17
+ # Same flag, different meanings at different times
18
+ good: |
19
+ _baseline_stash_created=false
20
+ _winner_stash_created=false
21
+ # Each flag has one meaning throughout its entire lifetime
22
+ ---
23
+
24
+ ## Observation
25
+ In the sampling stash fix (#27), `_stash_created` tracked both "was the baseline stashed?" (per-candidate lifecycle) and "was the winner stashed?" (post-loop lifecycle). When candidate 0 passed and its winner state was stashed, the next candidate's restore code popped the winner stash thinking it was the baseline.
26
+
27
+ ## Insight
28
+ A boolean with two meanings at different points in time is a state machine with implicit transitions. The transitions are invisible because the variable name doesn't change — only the programmer's mental model of what it represents changes. This is especially dangerous in loops where the flag is set in one iteration and read in a different context.
29
+
30
+ ## Lesson
31
+ When a flag variable is set in one code block and read in a different block with a different purpose, split it into named variables that encode their purpose. The variable name should make its lifecycle explicit. If you can't describe when the flag is "active" in one sentence, it needs to be split.
@@ -0,0 +1,31 @@
1
+ ---
2
+ id: 64
3
+ title: "Tests that pass for the wrong reason provide false confidence"
4
+ severity: should-fix
5
+ languages: [all]
6
+ scope: [universal]
7
+ category: test-anti-patterns
8
+ pattern:
9
+ type: syntactic
10
+ regex: "\\bPATH=\"[^\":]*\""
11
+ description: "PATH assignment without colon (no prepend/append) — replaces entire PATH, removing all other commands from the environment"
12
+ fix: "Verify the test fails when the fix is reverted. Ensure test setup affects only the variable under test, not its dependencies."
13
+ example:
14
+ bad: |
15
+ # Test: free is missing → exit 2
16
+ PATH="/fake/bin" # removes awk too!
17
+ check_memory_available 4 # exits 2 because awk is missing, not free
18
+ good: |
19
+ # Test: free is missing → exit 2
20
+ PATH="/fake/bin:$PATH" # fake free, real awk
21
+ check_memory_available 4 # exits 2 because free outputs nothing
22
+ ---
23
+
24
+ ## Observation
25
+ A test to verify `check_memory_available` returns exit 2 when `free` is unavailable set `PATH="/fake/bin"` (replacing entire PATH). This also removed `awk`, so the function returned exit 2 because awk failed — not because free was missing. The test passed, but it wasn't testing what it claimed.
26
+
27
+ ## Insight
28
+ Tests that replace environment state (PATH, env vars, config files) can have blast radius beyond the intended target. The test author thinks they're isolating one variable, but they're changing a system-wide setting that affects multiple tools in the pipeline.
29
+
30
+ ## Lesson
31
+ When mocking system commands, prepend to PATH (`PATH="$fake:$PATH"`) rather than replacing it. After writing a test, revert the fix and verify the test fails — if it still passes, it's testing the wrong thing. Name tests to describe the code path they exercise, not just the expected outcome.
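A sketch of the prepend approach, under stated assumptions: `make_fake_bin` is a hypothetical helper, and the stub it creates simply prints nothing and exits 0, which stands in for "`free` outputs nothing". The subshell keeps the PATH change from leaking into the rest of the test run.

```shell
#!/usr/bin/env bash
# Hypothetical helper: create a directory containing a stub executable
# named "$1" that prints nothing and exits 0. Prints the directory path.
make_fake_bin() {
  local name=$1 dir
  dir=$(mktemp -d)
  printf '#!/bin/sh\nexit 0\n' > "$dir/$name"
  chmod +x "$dir/$name"
  printf '%s\n' "$dir"
}

fake_dir=$(make_fake_bin free)

# Subshell: the PATH change ends with the closing parenthesis.
(
  PATH="$fake_dir:$PATH"   # prepend, so every other tool stays reachable
  if [[ "$(command -v free)" == "$fake_dir/free" ]]; then
    echo "free resolves to the stub"
  fi
  if command -v awk >/dev/null 2>&1; then
    echo "awk is still reachable"
  fi
)

rm -rf -- "$fake_dir"
```

Asserting that the dependency (`awk` here) is still reachable inside the test makes the "passes for the wrong reason" failure mode visible instead of silent.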
@@ -0,0 +1,39 @@
1
+ ---
2
+ id: 65
3
+ title: "pipefail + grep -c + fallback produces double output"
4
+ severity: should-fix
5
+ languages: [shell]
6
+ scope: [language:bash]
7
+ category: silent-failures
8
+ pattern:
9
+ type: syntactic
10
+ regex: "grep\\s+-c.*\\|\\|\\s*echo\\s+0"
11
+ description: "grep -c piped with || echo 0 under set -o pipefail produces '0\\n0' — grep writes 0, then fallback also writes 0"
12
+ fix: "Wrap grep -c in a helper function that captures the exit code internally, or use || true inside a subshell"
13
+ positive_alternative: "Use a _count_matches helper: result=$(grep -c ... || true); echo \"${result:-0}\""
14
+ example:
15
+ bad: |
16
+ set -euo pipefail
17
+ count=$(echo "$text" | grep -c "pattern" || echo 0)
18
+ # Produces "0\n0" when no match — grep outputs 0, then fallback also outputs 0
19
+ good: |
20
+ set -euo pipefail
21
+ _count_matches() {
22
+ local result exit_code=0
23
+ result=$(grep -ciE "$1" 2>&1) || exit_code=$?
24
+ [[ $exit_code -le 1 ]] && echo "${result:-0}" || echo "0"
25
+ }
26
+ count=$(echo "$text" | _count_matches "pattern")
27
+ ---
28
+
29
+ ## Observation
30
+
31
+ In `validate-plan-quality.sh`, scoring functions used `grep -ciE "pattern" || echo 0` to count matches safely. Under `set -euo pipefail`, when grep found zero matches (exit 1), both grep's output ("0") AND the fallback ("0") were written to stdout, producing "0\n0" instead of "0".
32
+
33
+ ## Insight
34
+
35
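+ `grep -c` prints its count and then exits 1 when that count is zero, so the `|| echo 0` fallback runs even though grep has already written "0" to stdout, and the fallback appends a second "0". Because grep is the last command in the pipeline, this happens with or without `set -o pipefail`; pipefail only adds further cases in which an earlier pipeline stage fails. The doubled value is easy to miss in tests because a numeric check such as `[[ "$count" -gt 0 ]]` typically evaluates as false for both "0" and the malformed "0\n0" (the latter with an error on stderr), but it corrupts any downstream parsing.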
36
+
37
+ ## Lesson
38
+
39
+ Never use `command || echo default` for commands that write output before failing. Instead, capture the exit code in a wrapper function and handle it explicitly. The `_count_matches` pattern works: run grep inside the function, capture exit code, distinguish "no matches" (exit 1, normal) from "grep error" (exit 2+, unexpected).
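The difference between the fallback form and a wrapper can be reproduced in a few lines. The sketch below uses a hypothetical `count_matches` function, a variant of the `_count_matches` pattern named above: it reports grep errors with a non-zero return instead of printing 0, which is an assumption about the desired behavior rather than the toolkit's own choice.

```shell
#!/usr/bin/env bash
# Reads text on stdin and prints exactly one integer: the number of
# lines matching the extended regex "$1" (case-insensitive).
count_matches() {
  local pattern=$1 result status=0
  result=$(grep -ciE -- "$pattern") || status=$?
  if (( status > 1 )); then
    # Exit 2+ is a real grep error (for example an invalid regex).
    echo "count_matches: grep failed with status $status" >&2
    return "$status"
  fi
  # Exit 0 (matches) and exit 1 (no matches) both leave a count in result.
  printf '%s\n' "${result:-0}"
}

text=$'alpha\nbeta\ngamma'
naive=$(printf '%s\n' "$text" | grep -c "delta" || echo 0)
safe=$(printf '%s\n' "$text" | count_matches "delta")

printf 'naive has %d line(s)\n' "$(printf '%s\n' "$naive" | wc -l)"
printf 'safe has %d line(s)\n' "$(printf '%s\n' "$safe" | wc -l)"
```

With no match, `naive` holds two lines and `safe` holds one, which is the corruption the lesson describes.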
@@ -0,0 +1,37 @@
1
+ ---
2
+ id: 66
3
+ title: "local keyword used outside function scope"
4
+ severity: should-fix
5
+ languages: [shell]
6
+ scope: [language:bash]
7
+ category: silent-failures
8
+ pattern:
9
+ type: semantic
10
+ description: "bash `local` keyword outside a function body — undefined behavior, works in bash but fails in dash/sh and is technically a bug"
11
+ fix: "Only use `local` inside function bodies. At script top-level, just assign the variable directly."
12
+ positive_alternative: "Remove `local` from top-level variable assignments; use plain assignment instead"
13
+ example:
14
+ bad: |
15
+ # At script top-level (not inside a function)
16
+ if [[ "$JSON_OUTPUT" == true ]]; then
17
+ local escaped_plan
18
+ escaped_plan=$(printf '%s' "$PLAN_FILE" | jq -Rs '.')
19
+ fi
20
+ good: |
21
+ # At script top-level — no local keyword
22
+ if [[ "$JSON_OUTPUT" == true ]]; then
23
+ escaped_plan=$(printf '%s' "$PLAN_FILE" | jq -Rs '.')
24
+ fi
25
+ ---
26
+
27
+ ## Observation
28
+
29
+ In `validate-plan-quality.sh`, the JSON output block at the script's top level used `local escaped_plan` to declare a variable. The script appeared to work in bash, but `local` is only valid inside functions, so the declaration was a latent bug.
30
+
31
+ ## Insight
32
+
33
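+ Bash does not accept `local` outside a function: at top level it prints `local: can only be used in a function` and returns a non-zero status, which also aborts the script under `set -e`. When such a script appears to work, it is because the plain assignment on the following line creates the (global) variable anyway. `dash` and other `sh` implementations reject it as well, so this is a portability landmine. It also misleads readers into thinking the code is inside a function when it isn't.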
34
+
35
+ ## Lesson
36
+
37
+ Reserve `local` for function bodies exclusively. At script top-level, use plain variable assignment. This is especially important in scripts that use `source` chains, where the boundary between "inside a function" and "top-level" blurs across files.
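When a scoped temporary variable is actually wanted, an alternative to dropping `local` is to move the top-level block into a function. The sketch below assumes a hypothetical `emit_plan_summary` function and uses naive string quoting purely for illustration; the original script encodes the value with `jq`.

```shell
#!/usr/bin/env bash
# Hypothetical restructuring: the top-level block becomes a function, so
# `local` is valid and the temporary variable does not leak.
emit_plan_summary() {
  local plan_file=$1 json_output=${2:-false}
  local quoted_plan
  if [[ "$json_output" == true ]]; then
    # Naive quoting for illustration only; real code would use a JSON
    # encoder (the original script pipes through jq).
    quoted_plan="\"$plan_file\""
    printf '{"plan": %s}\n' "$quoted_plan"
  else
    printf 'plan: %s\n' "$plan_file"
  fi
}

emit_plan_summary "docs/plan.md" true
```

After the call returns, `quoted_plan` is unset in the caller's scope, which is the property the top-level `local` was presumably meant to provide.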
@@ -0,0 +1,36 @@
1
+ ---
2
+ id: 67
3
+ title: "Scripts hang when stdin is a socket or pipe in non-interactive shells"
4
+ severity: should-fix
5
+ languages: [shell]
6
+ scope: [project:autonomous-coding-toolkit]
7
+ category: silent-failures
8
+ pattern:
9
+ type: semantic
10
+ description: "Script reads from stdin without redirection — hangs in CI, Claude Code, cron, or any environment where stdin is not a terminal"
11
+ fix: "Add </dev/null to commands that may read stdin, or redirect stdin at the test harness level"
12
+ positive_alternative: "Run subprocesses with explicit stdin: bash script.sh </dev/null"
13
+ example:
14
+ bad: |
15
+ # Test harness — stdin inherited from parent (may be socket/pipe)
16
+ for t in scripts/tests/test-*.sh; do
17
+ bash "$t" >/dev/null 2>&1
18
+ done
19
+ good: |
20
+ # Test harness — stdin explicitly from /dev/null
21
+ for t in scripts/tests/test-*.sh; do
22
+ bash "$t" </dev/null >/dev/null 2>&1
23
+ done
24
+ ---
25
+
26
+ ## Observation
27
+
28
+ Running the test suite from Claude Code's shell caused `test-lesson-check.sh` to hang indefinitely. The process was blocked on `unix_stream_read_generic` — reading from a Unix socket that served as stdin in the Claude environment. Multiple stale processes accumulated across retries.
29
+
30
+ ## Insight
31
+
32
+ Claude Code (and similar environments like CI runners, cron jobs, tmux send-keys) connects stdin to non-terminal file descriptors. Any script that reads stdin, even indirectly through a command like `read`, `cat` without args, or a tool that checks for piped input, can block forever waiting for data that never arrives. This is invisible in interactive testing because stdin is a terminal there: tools that check for piped input see a TTY and skip the read, and a read that does block can be ended with Ctrl+D.
33
+
34
+ ## Lesson
35
+
36
+ Always redirect stdin from `/dev/null` when invoking scripts in non-interactive contexts. The safest place is the test harness loop itself (`bash "$t" </dev/null`), which protects all tests regardless of what they do internally. For individual scripts, audit for stdin-reading commands and add explicit `/dev/null` redirection.
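The redirection can be wrapped once and reused. In the sketch below, `run_without_stdin` is a hypothetical wrapper name; the `[ -t 0 ]` check is the standard test for "is stdin a terminal" and is shown as one way a script could detect the situation itself.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run a command with stdin attached to /dev/null so
# any read hits EOF immediately instead of blocking on an inherited
# socket or pipe.
run_without_stdin() {
  "$@" </dev/null
}

# `cat` with no arguments reads stdin; here it returns at once.
run_without_stdin sh -c 'cat; echo "finished without blocking"'

# A script can also detect the situation itself.
if [ -t 0 ]; then
  echo "stdin is a terminal"
else
  echo "stdin is not a terminal"
fi
```

In a harness loop this becomes `run_without_stdin bash "$t" >/dev/null 2>&1`, equivalent to the explicit `</dev/null` form shown in the example.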
@@ -0,0 +1,31 @@
1
+ ---
2
+ id: 68
3
+ title: "Agent builds the wrong thing correctly"
4
+ severity: blocker
5
+ languages: [all]
6
+ scope: [universal]
7
+ category: specification-drift
8
+ pattern:
9
+ type: semantic
10
+ description: "Agent misinterprets requirements — code passes tests but doesn't match the actual spec. Tests were written against the agent's interpretation, not the user's intent."
11
+ fix: "Before implementation, echo back the spec in your own words and get explicit user confirmation. Write acceptance criteria from the spec, not from your interpretation."
12
+ example:
13
+ bad: |
14
+ # User asks for "retry with backoff"
15
+ # Agent implements retry with fixed 1s delay
16
+ # Test checks retry happens — passes
17
+ # But spec meant exponential backoff
18
+ good: |
19
+ # Echo back: "I'll implement retry with exponential backoff: 1s, 2s, 4s, 8s, max 30s"
20
+ # User confirms or corrects
21
+ # Write test that verifies exponential timing
22
+ ---
23
+
24
+ ## Observation
25
+ An agent received a feature request, implemented it with full test coverage, and all tests passed. But the implementation didn't match what the user actually wanted — the agent's interpretation of the requirements diverged from the user's intent. The bug was only discovered during manual review.
26
+
27
+ ## Insight
28
+ When an agent writes both the implementation AND the tests, the tests validate the agent's understanding, not the user's requirements. This creates a closed loop where wrong code passes wrong tests. The spec is the only external anchor — but agents often skip the echo-back step that would catch misinterpretation.
29
+
30
+ ## Lesson
31
+ Always echo back requirements before implementing. The echo-back gate catches the 60%+ of failures that come from spec misunderstanding (not from coding errors). Write acceptance criteria from the original spec text, not from your paraphrase of it.
@@ -0,0 +1,30 @@
1
+ ---
2
+ id: 69
3
+ title: "Plan quality dominates execution quality 3:1"
4
+ severity: should-fix
5
+ languages: [all]
6
+ scope: [universal]
7
+ category: specification-drift
8
+ pattern:
9
+ type: semantic
10
+ description: "Investing heavily in execution optimization (retries, sampling, model routing) while the plan itself has gaps, ambiguities, or wrong decomposition. A bad plan executed perfectly still produces wrong output."
11
+ fix: "Invest in plan quality first: scorecard the plan for completeness, correctness of decomposition, and dependency ordering before starting execution."
12
+ example:
13
+ bad: |
14
+ # Plan says "add authentication" with no detail
15
+ # Execution uses MAB + competitive mode + 3 retries
16
+ # Result: perfectly executed wrong authentication scheme
17
+ good: |
18
+ # Plan specifies: JWT with refresh tokens, 15min access TTL
19
+ # Plan scorecard: all tasks have acceptance criteria
20
+ # Simple headless execution gets it right first try
21
+ ---
22
+
23
+ ## Observation
24
+ Across multiple autonomous coding runs, the correlation between plan quality and final output quality was 3x stronger than the correlation between execution quality (retries, model choice, sampling) and output quality. The best execution infrastructure cannot compensate for a plan that decomposes the work incorrectly or omits critical requirements.
25
+
26
+ ## Insight
27
+ Plan quality and execution quality are not interchangeable investments. A well-specified plan with simple execution beats a vague plan with sophisticated execution infrastructure. The plan is the specification — if it's wrong, every downstream batch inherits the error.
28
+
29
+ ## Lesson
30
+ Score your plan before executing it. Check: Does every task have clear acceptance criteria? Are dependencies correctly ordered? Are there any ambiguous requirements? A 30-minute plan review saves hours of execution rework.
@@ -0,0 +1,31 @@
1
+ ---
2
+ id: 70
3
+ title: "Spec echo-back prevents 60% of agent failures"
4
+ severity: should-fix
5
+ languages: [all]
6
+ scope: [universal]
7
+ category: specification-drift
8
+ pattern:
9
+ type: semantic
10
+ description: "Agent proceeds directly from requirements to implementation without restating the requirements in its own words and confirming understanding with the user."
11
+ fix: "Add an echo-back gate: agent restates requirements, user confirms or corrects, only then proceed to implementation."
12
+ example:
13
+ bad: |
14
+ User: "Add rate limiting to the API"
15
+ Agent: *immediately starts coding*
16
+ good: |
17
+ User: "Add rate limiting to the API"
18
+ Agent: "I'll add token bucket rate limiting at 100 req/min per IP,
19
+ with 429 responses and Retry-After header. Correct?"
20
+ User: "Yes, but 60 req/min"
21
+ Agent: *now implements with correct limit*
22
+ ---
23
+
24
+ ## Observation
25
+ Analysis of autonomous coding failures showed that 60%+ of failures stemmed from spec misunderstanding, not from coding errors. The agent understood the words but not the intent — implementing a technically correct solution to the wrong problem.
26
+
27
+ ## Insight
28
+ Spec misunderstanding is invisible until late in the process because the agent's implementation is internally consistent. Tests pass because they test the agent's interpretation. The echo-back step forces the misunderstanding to surface before any code is written.
29
+
30
+ ## Lesson
31
+ Before implementing any feature, restate the requirements in your own words and confirm with the user. This single step prevents more failures than any amount of testing or code review.
@@ -0,0 +1,30 @@
1
+ ---
2
+ id: 71
3
+ title: "Positive instructions outperform negative ones for LLMs"
4
+ severity: should-fix
5
+ languages: [all]
6
+ scope: [universal]
7
+ category: specification-drift
8
+ pattern:
9
+ type: semantic
10
+ description: "Instructions phrased as 'don't do X' instead of 'do Y'. Negative instructions trigger the Pink Elephant Problem — the model encodes the forbidden pattern and may reproduce it."
11
+ fix: "Rephrase negative instructions as positive alternatives: instead of 'don't use var', write 'use const or let'."
12
+ example:
13
+ bad: |
14
+ # Don't use bare except clauses
15
+ # Don't hardcode test counts
16
+ # Don't use .venv/bin/pip
17
+ good: |
18
+ # Always catch specific exception classes and log
19
+ # Use threshold assertions (>=) for extensible collections
20
+ # Use .venv/bin/python -m pip for correct site-packages
21
+ ---
22
+
23
+ ## Observation
24
+ When lesson files and instructions used negative phrasing ("don't do X"), agents occasionally reproduced the exact anti-pattern described — the Pink Elephant Problem. Positive phrasing ("do Y instead") consistently produced better compliance.
25
+
26
+ ## Insight
27
+ LLMs process instructions by encoding all tokens, including the forbidden pattern. "Don't use bare except" encodes "bare except" as a salient concept. "Always catch specific exception classes" encodes the correct pattern directly. The model follows what it encodes most strongly.
28
+
29
+ ## Lesson
30
+ Write instructions as positive alternatives: "do Y" outperforms "don't do X" for LLM compliance. When writing lessons, always include a `positive_alternative` that the agent can follow directly.
@@ -0,0 +1,30 @@
1
+ ---
2
+ id: 72
3
+ title: "Lost in the Middle — context placement affects accuracy 20pp"
4
+ severity: should-fix
5
+ languages: [all]
6
+ scope: [universal]
7
+ category: context-retrieval
8
+ pattern:
9
+ type: semantic
10
+ description: "Critical instructions or requirements placed in the middle of a long context window, where LLM attention is weakest. Task description buried after long preambles or between large code blocks."
11
+ fix: "Place the task at the top of the context and requirements at the bottom. Keep the middle for reference material that's useful but not critical."
12
+ example:
13
+ bad: |
14
+ [500 lines of project context]
15
+ [task description buried here]
16
+ [300 lines of code examples]
17
+ good: |
18
+ [task description — FIRST]
19
+ [reference material in middle]
20
+ [requirements and constraints — LAST]
21
+ ---
22
+
23
+ ## Observation
24
+ Research on LLM context windows shows a U-shaped attention curve: models attend most strongly to the beginning and end of context, with accuracy dropping up to 20 percentage points for information placed in the middle. When critical instructions were placed mid-context, agents missed them reliably.
25
+
26
+ ## Insight
27
+ The "Lost in the Middle" effect means context order matters as much as context content. A perfectly written requirement placed in the wrong position has the same effect as a missing requirement. This is especially relevant for context injection in autonomous pipelines.
28
+
29
+ ## Lesson
30
+ Structure all context injection with task at the top and requirements at the bottom. Use the middle for supplementary reference material. For `run-plan-context.sh`, this means: batch description first, prior art and warnings in the middle, acceptance criteria last.
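The ordering can be enforced by the function that assembles the context. The following is a minimal sketch, not the contents of `run-plan-context.sh`: `build_batch_context` and the three section headings are hypothetical names chosen for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical assembler: task first, reference material in the middle,
# acceptance criteria last.
build_batch_context() {
  local task=$1 reference=$2 criteria=$3
  printf '## Task\n%s\n\n' "$task"
  printf '## Reference\n%s\n\n' "$reference"
  printf '## Acceptance criteria\n%s\n' "$criteria"
}

build_batch_context \
  "Batch 4: add retry to the fetch helper" \
  "Prior batches: 1-3 passed. Warning: keep the public signature." \
  "All tests pass; retry delay doubles on each attempt."
```

Fixing the order in one place means callers cannot accidentally bury the task between large reference blocks.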
@@ -0,0 +1,30 @@
1
+ ---
2
+ id: 73
3
+ title: "Unscoped lessons cause 67% false positive rate at scale"
4
+ severity: should-fix
5
+ languages: [all]
6
+ scope: [project:autonomous-coding-toolkit]
7
+ category: context-retrieval
8
+ pattern:
9
+ type: semantic
10
+ description: "Lesson files without scope metadata applied universally to all projects, causing irrelevant violations to fire on projects where the anti-pattern cannot occur."
11
+ fix: "Add scope: tags to every lesson. Use detect_project_scope() to filter lessons by project context. Default to [universal] only for genuinely cross-cutting patterns."
12
+ example:
13
+ bad: |
14
+ # Lesson about HA automation keys fires on a React project
15
+ # Lesson about JSX factory fires on a Python-only project
16
+ # 67% of violations are irrelevant noise
17
+ good: |
18
+ scope: [domain:ha-aria] # Only fires on HA projects
19
+ scope: [language:javascript, framework:preact] # Only fires on JSX projects
20
+ scope: [universal] # Genuinely applies everywhere
21
+ ---
22
+
23
+ ## Observation
24
+ As the lesson library grew past ~50 lessons, the false positive rate on any given project reached 67%. Lessons about Home Assistant automation keys fired on React projects. Lessons about JSX factory issues fired on Python-only projects. Developers started ignoring lesson-check output entirely.
25
+
26
+ ## Insight
27
+ Without scope metadata, every lesson fires everywhere. This is correct for universal patterns (bare except, missing await) but wrong for domain-specific patterns. The noise from irrelevant violations drowns the signal from real issues, causing the entire system to be ignored.
28
+
29
+ ## Lesson
30
+ Every lesson needs scope metadata. Use `scope: [universal]` only for patterns that genuinely apply to all projects. For everything else, scope to language, framework, domain, or specific project. The scope system keeps signal-to-noise high as the library scales.
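Scope filtering can be sketched as a tag comparison. This is an illustration under stated assumptions, not the toolkit's `detect_project_scope()` logic: `lesson_applies` is a hypothetical function, the project's tags are passed in by hand, and a lesson is assumed to apply when any one of its tags matches (the source does not say whether multiple tags mean "any" or "all").

```shell
#!/usr/bin/env bash
# Hypothetical matcher. Arguments: a lesson's scope tags as one
# space-separated string, then the project's detected scope tags.
lesson_applies() {
  local lesson_scopes=$1; shift
  local tag project_tag
  # Unquoted on purpose: split the tag string on whitespace.
  for tag in $lesson_scopes; do
    if [[ $tag == universal ]]; then
      return 0
    fi
    for project_tag in "$@"; do
      if [[ $tag == "$project_tag" ]]; then
        return 0
      fi
    done
  done
  return 1
}

project=(language:python project:autonomous-coding-toolkit)

for scopes in "universal" "domain:ha-aria" \
              "language:javascript framework:preact" "language:python"; do
  if lesson_applies "$scopes" "${project[@]}"; then
    echo "run:  [$scopes]"
  else
    echo "skip: [$scopes]"
  fi
done
```

For the Python project above, the loop runs the `universal` and `language:python` lessons and skips the Home Assistant and JSX ones.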
@@ -0,0 +1,32 @@
1
+ ---
2
+ id: 74
3
+ title: "Stale context injection sends wrong batch's state to next agent"
4
+ severity: should-fix
5
+ languages: [shell]
6
+ scope: [project:autonomous-coding-toolkit]
7
+ category: context-retrieval
8
+ pattern:
9
+ type: semantic
10
+ description: "Context injection (CLAUDE.md modifications, AGENTS.md generation) from a previous batch persists into the next batch because the injection writes to tracked files and the git-clean check fails, or the injection is not cleaned up between batches."
11
+ fix: "Context injection must be idempotent and batch-scoped. Clean up injected context after each batch. Use temporary files or environment variables instead of modifying tracked files."
12
+ example:
13
+ bad: |
14
+ # Batch 3 context injected into CLAUDE.md
15
+ # Batch 3 fails, retries
16
+ # Batch 4 starts — still sees Batch 3's context in CLAUDE.md
17
+ # Agent makes decisions based on stale context
18
+ good: |
19
+ # Context injected into /tmp/batch-context.md
20
+ # Passed via --context flag or environment variable
21
+ # Automatically cleaned up between batches
22
+ # Each batch starts with fresh, correct context
23
+ ---
24
+
25
+ ## Observation
26
+ Context injection that modified tracked files (like appending to CLAUDE.md) created dirty git state between batches. The next batch's agent inherited the previous batch's context injection, making decisions based on stale information. When batch 3 failed and batch 4 started, batch 4 still saw batch 3's failure context.
27
+
28
+ ## Insight
29
+ Context injection into version-controlled files conflates two lifetimes: the file's permanent content and the batch's temporary context. The git-clean quality gate catches this as "uncommitted changes" but the root cause is architectural — using the wrong persistence mechanism for ephemeral data.
30
+
31
+ ## Lesson
32
+ Never inject batch-scoped context into tracked files. Use temporary files, environment variables, or the context budget in `run-plan-context.sh` which is designed for ephemeral, per-batch context injection.
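One way to keep batch context out of tracked files is a temp file whose lifetime is tied to the command that consumes it. The sketch below is an assumption-laden illustration: `with_batch_context` and the `BATCH_CONTEXT_FILE` variable are hypothetical names, not flags or variables the toolkit is known to define.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: write batch context to a temp file, expose its
# path via BATCH_CONTEXT_FILE (an assumed variable name), run the given
# command, then delete the file whether or not the command succeeded.
with_batch_context() {
  local context=$1; shift
  local file status=0
  file=$(mktemp) || return 1
  printf '%s\n' "$context" > "$file"
  BATCH_CONTEXT_FILE=$file "$@" || status=$?
  rm -f -- "$file"
  return "$status"
}

with_batch_context "Batch 4: no failures carried over" \
  sh -c 'echo "agent sees: $(cat "$BATCH_CONTEXT_FILE")"'
```

Because the file is created and removed inside one call, a failed or retried batch cannot leave its context behind for the next one, and the git working tree is never touched.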