autonomous-coding-toolkit 1.0.0

Files changed (324)
  1. package/.claude-plugin/marketplace.json +22 -0
  2. package/.claude-plugin/plugin.json +13 -0
  3. package/LICENSE +21 -0
  4. package/Makefile +21 -0
  5. package/README.md +140 -0
  6. package/SECURITY.md +28 -0
  7. package/agents/bash-expert.md +113 -0
  8. package/agents/dependency-auditor.md +138 -0
  9. package/agents/integration-tester.md +120 -0
  10. package/agents/lesson-scanner.md +149 -0
  11. package/agents/python-expert.md +179 -0
  12. package/agents/service-monitor.md +141 -0
  13. package/agents/shell-expert.md +147 -0
  14. package/benchmarks/runner.sh +147 -0
  15. package/benchmarks/tasks/01-rest-endpoint/rubric.sh +29 -0
  16. package/benchmarks/tasks/01-rest-endpoint/task.md +17 -0
  17. package/benchmarks/tasks/02-refactor-module/task.md +8 -0
  18. package/benchmarks/tasks/03-fix-integration-bug/task.md +8 -0
  19. package/benchmarks/tasks/04-add-test-coverage/task.md +8 -0
  20. package/benchmarks/tasks/05-multi-file-feature/task.md +8 -0
  21. package/bin/act.js +238 -0
  22. package/commands/autocode.md +6 -0
  23. package/commands/cancel-ralph.md +18 -0
  24. package/commands/code-factory.md +53 -0
  25. package/commands/create-prd.md +55 -0
  26. package/commands/ralph-loop.md +18 -0
  27. package/commands/run-plan.md +117 -0
  28. package/commands/submit-lesson.md +122 -0
  29. package/docs/ARCHITECTURE.md +630 -0
  30. package/docs/CONTRIBUTING.md +125 -0
  31. package/docs/lessons/0001-bare-exception-swallowing.md +34 -0
  32. package/docs/lessons/0002-async-def-without-await.md +28 -0
  33. package/docs/lessons/0003-create-task-without-callback.md +28 -0
  34. package/docs/lessons/0004-hardcoded-test-counts.md +28 -0
  35. package/docs/lessons/0005-sqlite-without-closing.md +33 -0
  36. package/docs/lessons/0006-venv-pip-path.md +27 -0
  37. package/docs/lessons/0007-runner-state-self-rejection.md +35 -0
  38. package/docs/lessons/0008-quality-gate-blind-spot.md +33 -0
  39. package/docs/lessons/0009-parser-overcount-empty-batches.md +36 -0
  40. package/docs/lessons/0010-local-outside-function-bash.md +33 -0
  41. package/docs/lessons/0011-batch-tests-for-unimplemented-code.md +36 -0
  42. package/docs/lessons/0012-api-markdown-unescaped-chars.md +33 -0
  43. package/docs/lessons/0013-export-prefix-env-parsing.md +33 -0
  44. package/docs/lessons/0014-decorator-registry-import-side-effect.md +43 -0
  45. package/docs/lessons/0015-frontend-backend-schema-drift.md +43 -0
  46. package/docs/lessons/0016-event-driven-cold-start-seeding.md +44 -0
  47. package/docs/lessons/0017-copy-paste-logic-diverges.md +43 -0
  48. package/docs/lessons/0018-layer-passes-pipeline-broken.md +45 -0
  49. package/docs/lessons/0019-systemd-envfile-ignores-export.md +41 -0
  50. package/docs/lessons/0020-persist-state-incrementally.md +44 -0
  51. package/docs/lessons/0021-dual-axis-testing.md +48 -0
  52. package/docs/lessons/0022-jsx-factory-shadowing.md +43 -0
  53. package/docs/lessons/0023-static-analysis-spiral.md +51 -0
  54. package/docs/lessons/0024-shared-pipeline-implementation.md +55 -0
  55. package/docs/lessons/0025-defense-in-depth-all-entry-points.md +65 -0
  56. package/docs/lessons/0026-linter-no-rules-false-enforcement.md +54 -0
  57. package/docs/lessons/0027-jsx-silent-prop-drop.md +64 -0
  58. package/docs/lessons/0028-no-infrastructure-in-client-code.md +49 -0
  59. package/docs/lessons/0029-never-write-secrets-to-files.md +61 -0
  60. package/docs/lessons/0030-cache-merge-not-replace.md +62 -0
  61. package/docs/lessons/0031-verify-units-at-boundaries.md +66 -0
  62. package/docs/lessons/0032-module-lifecycle-subscribe-unsubscribe.md +89 -0
  63. package/docs/lessons/0033-async-iteration-mutable-snapshot.md +72 -0
  64. package/docs/lessons/0034-caller-missing-await-silent-discard.md +65 -0
  65. package/docs/lessons/0035-duplicate-registration-silent-overwrite.md +85 -0
  66. package/docs/lessons/0036-websocket-dirty-disconnect.md +33 -0
  67. package/docs/lessons/0037-parallel-agents-worktree-corruption.md +31 -0
  68. package/docs/lessons/0038-subscribe-no-stored-ref.md +36 -0
  69. package/docs/lessons/0039-fallback-or-default-hides-bugs.md +34 -0
  70. package/docs/lessons/0040-event-firehose-filter-first.md +36 -0
  71. package/docs/lessons/0041-ambiguous-base-dir-path-nesting.md +32 -0
  72. package/docs/lessons/0042-spec-compliance-insufficient.md +36 -0
  73. package/docs/lessons/0043-exact-count-extensible-collections.md +32 -0
  74. package/docs/lessons/0044-relative-file-deps-worktree.md +39 -0
  75. package/docs/lessons/0045-iterative-design-improvement.md +33 -0
  76. package/docs/lessons/0046-plan-assertion-math-bugs.md +38 -0
  77. package/docs/lessons/0047-pytest-single-threaded-default.md +37 -0
  78. package/docs/lessons/0048-integration-wiring-batch.md +40 -0
  79. package/docs/lessons/0049-ab-verification.md +41 -0
  80. package/docs/lessons/0050-editing-sourced-files-during-execution.md +33 -0
  81. package/docs/lessons/0051-infrastructure-fixes-cant-self-heal.md +30 -0
  82. package/docs/lessons/0052-uncommitted-changes-poison-quality-gates.md +31 -0
  83. package/docs/lessons/0053-jq-compact-flag-inconsistency.md +31 -0
  84. package/docs/lessons/0054-parser-matches-inside-code-blocks.md +30 -0
  85. package/docs/lessons/0055-agents-compensate-for-garbled-prompts.md +31 -0
  86. package/docs/lessons/0056-grep-count-exit-code-on-zero.md +42 -0
  87. package/docs/lessons/0057-new-artifacts-break-git-clean-gates.md +42 -0
  88. package/docs/lessons/0058-dead-config-keys-never-consumed.md +49 -0
  89. package/docs/lessons/0059-contract-test-shared-structures.md +53 -0
  90. package/docs/lessons/0060-set-e-silent-death-in-runners.md +53 -0
  91. package/docs/lessons/0061-context-injection-dirty-state.md +50 -0
  92. package/docs/lessons/0062-sibling-bug-neighborhood-scan.md +29 -0
  93. package/docs/lessons/0063-one-flag-two-lifetimes.md +31 -0
  94. package/docs/lessons/0064-test-passes-wrong-reason.md +31 -0
  95. package/docs/lessons/0065-pipefail-grep-count-double-output.md +39 -0
  96. package/docs/lessons/0066-local-keyword-outside-function.md +37 -0
  97. package/docs/lessons/0067-stdin-hang-non-interactive-shell.md +36 -0
  98. package/docs/lessons/0068-agent-builds-wrong-thing-correctly.md +31 -0
  99. package/docs/lessons/0069-plan-quality-dominates-execution.md +30 -0
  100. package/docs/lessons/0070-spec-echo-back-prevents-drift.md +31 -0
  101. package/docs/lessons/0071-positive-instructions-outperform-negative.md +30 -0
  102. package/docs/lessons/0072-lost-in-the-middle-context-placement.md +30 -0
  103. package/docs/lessons/0073-unscoped-lessons-cause-false-positives.md +30 -0
  104. package/docs/lessons/0074-stale-context-injection-wrong-batch.md +32 -0
  105. package/docs/lessons/0075-research-artifacts-must-persist.md +32 -0
  106. package/docs/lessons/0076-wrong-decomposition-contaminates-downstream.md +30 -0
  107. package/docs/lessons/0077-cherry-pick-merges-need-manual-resolution.md +30 -0
  108. package/docs/lessons/0078-static-review-without-live-test.md +30 -0
  109. package/docs/lessons/0079-integration-wiring-batch-required.md +32 -0
  110. package/docs/lessons/FRAMEWORK.md +161 -0
  111. package/docs/lessons/SUMMARY.md +201 -0
  112. package/docs/lessons/TEMPLATE.md +85 -0
  113. package/docs/plans/2026-02-21-code-factory-v2-design.md +204 -0
  114. package/docs/plans/2026-02-21-code-factory-v2-implementation-plan.md +2189 -0
  115. package/docs/plans/2026-02-21-code-factory-v2-phase4-design.md +537 -0
  116. package/docs/plans/2026-02-21-code-factory-v2-phase4-implementation-plan.md +2012 -0
  117. package/docs/plans/2026-02-21-hardening-pass-design.md +108 -0
  118. package/docs/plans/2026-02-21-hardening-pass-plan.md +1378 -0
  119. package/docs/plans/2026-02-21-mab-research-report.md +406 -0
  120. package/docs/plans/2026-02-21-marketplace-restructure-design.md +240 -0
  121. package/docs/plans/2026-02-21-marketplace-restructure-plan.md +832 -0
  122. package/docs/plans/2026-02-21-phase4-completion-plan.md +697 -0
  123. package/docs/plans/2026-02-21-validator-suite-design.md +148 -0
  124. package/docs/plans/2026-02-21-validator-suite-plan.md +540 -0
  125. package/docs/plans/2026-02-22-mab-research-round2.md +556 -0
  126. package/docs/plans/2026-02-22-mab-run-design.md +462 -0
  127. package/docs/plans/2026-02-22-mab-run-plan.md +2046 -0
  128. package/docs/plans/2026-02-22-operations-design-methodology-research.md +681 -0
  129. package/docs/plans/2026-02-22-research-agent-failure-taxonomy.md +532 -0
  130. package/docs/plans/2026-02-22-research-code-guideline-policies.md +886 -0
  131. package/docs/plans/2026-02-22-research-codebase-audit-refactoring.md +908 -0
  132. package/docs/plans/2026-02-22-research-coding-standards-documentation.md +541 -0
  133. package/docs/plans/2026-02-22-research-competitive-landscape.md +687 -0
  134. package/docs/plans/2026-02-22-research-comprehensive-testing.md +1076 -0
  135. package/docs/plans/2026-02-22-research-context-utilization.md +459 -0
  136. package/docs/plans/2026-02-22-research-cost-quality-tradeoff.md +548 -0
  137. package/docs/plans/2026-02-22-research-lesson-transferability.md +508 -0
  138. package/docs/plans/2026-02-22-research-multi-agent-coordination.md +312 -0
  139. package/docs/plans/2026-02-22-research-phase-integration.md +602 -0
  140. package/docs/plans/2026-02-22-research-plan-quality.md +428 -0
  141. package/docs/plans/2026-02-22-research-prompt-engineering.md +558 -0
  142. package/docs/plans/2026-02-22-research-unconventional-perspectives.md +528 -0
  143. package/docs/plans/2026-02-22-research-user-adoption.md +638 -0
  144. package/docs/plans/2026-02-22-research-verification-effectiveness.md +433 -0
  145. package/docs/plans/2026-02-23-agent-suite-design.md +299 -0
  146. package/docs/plans/2026-02-23-agent-suite-plan.md +578 -0
  147. package/docs/plans/2026-02-23-phase3-cost-infrastructure-design.md +148 -0
  148. package/docs/plans/2026-02-23-phase3-cost-infrastructure-plan.md +1062 -0
  149. package/docs/plans/2026-02-23-research-bash-expert-agent.md +543 -0
  150. package/docs/plans/2026-02-23-research-dependency-auditor-agent.md +564 -0
  151. package/docs/plans/2026-02-23-research-improving-existing-agents.md +503 -0
  152. package/docs/plans/2026-02-23-research-integration-tester-agent.md +454 -0
  153. package/docs/plans/2026-02-23-research-python-expert-agent.md +429 -0
  154. package/docs/plans/2026-02-23-research-service-monitor-agent.md +425 -0
  155. package/docs/plans/2026-02-23-research-shell-expert-agent.md +533 -0
  156. package/docs/plans/2026-02-23-roadmap-to-completion.md +530 -0
  157. package/docs/plans/2026-02-24-headless-module-split-design.md +98 -0
  158. package/docs/plans/2026-02-24-headless-module-split.md +443 -0
  159. package/docs/plans/2026-02-24-lesson-scope-metadata-design.md +228 -0
  160. package/docs/plans/2026-02-24-lesson-scope-metadata-plan.md +968 -0
  161. package/docs/plans/2026-02-24-npm-packaging-design.md +841 -0
  162. package/docs/plans/2026-02-24-npm-packaging-plan.md +1965 -0
  163. package/docs/plans/audit-findings.md +186 -0
  164. package/docs/telegram-notification-format.md +98 -0
  165. package/examples/example-plan.md +51 -0
  166. package/examples/example-prd.json +72 -0
  167. package/examples/example-roadmap.md +33 -0
  168. package/examples/quickstart-plan.md +63 -0
  169. package/hooks/hooks.json +26 -0
  170. package/hooks/setup-symlinks.sh +48 -0
  171. package/hooks/stop-hook.sh +135 -0
  172. package/package.json +47 -0
  173. package/policies/bash.md +71 -0
  174. package/policies/python.md +71 -0
  175. package/policies/testing.md +61 -0
  176. package/policies/universal.md +60 -0
  177. package/scripts/analyze-report.sh +97 -0
  178. package/scripts/architecture-map.sh +145 -0
  179. package/scripts/auto-compound.sh +273 -0
  180. package/scripts/batch-audit.sh +42 -0
  181. package/scripts/batch-test.sh +101 -0
  182. package/scripts/entropy-audit.sh +221 -0
  183. package/scripts/failure-digest.sh +51 -0
  184. package/scripts/generate-ast-rules.sh +96 -0
  185. package/scripts/init.sh +112 -0
  186. package/scripts/lesson-check.sh +428 -0
  187. package/scripts/lib/common.sh +61 -0
  188. package/scripts/lib/cost-tracking.sh +153 -0
  189. package/scripts/lib/ollama.sh +60 -0
  190. package/scripts/lib/progress-writer.sh +128 -0
  191. package/scripts/lib/run-plan-context.sh +215 -0
  192. package/scripts/lib/run-plan-echo-back.sh +231 -0
  193. package/scripts/lib/run-plan-headless.sh +396 -0
  194. package/scripts/lib/run-plan-notify.sh +57 -0
  195. package/scripts/lib/run-plan-parser.sh +81 -0
  196. package/scripts/lib/run-plan-prompt.sh +215 -0
  197. package/scripts/lib/run-plan-quality-gate.sh +132 -0
  198. package/scripts/lib/run-plan-routing.sh +315 -0
  199. package/scripts/lib/run-plan-sampling.sh +170 -0
  200. package/scripts/lib/run-plan-scoring.sh +146 -0
  201. package/scripts/lib/run-plan-state.sh +142 -0
  202. package/scripts/lib/run-plan-team.sh +199 -0
  203. package/scripts/lib/telegram.sh +54 -0
  204. package/scripts/lib/thompson-sampling.sh +176 -0
  205. package/scripts/license-check.sh +74 -0
  206. package/scripts/mab-run.sh +575 -0
  207. package/scripts/module-size-check.sh +146 -0
  208. package/scripts/patterns/async-no-await.yml +5 -0
  209. package/scripts/patterns/bare-except.yml +6 -0
  210. package/scripts/patterns/empty-catch.yml +6 -0
  211. package/scripts/patterns/hardcoded-localhost.yml +9 -0
  212. package/scripts/patterns/retry-loop-no-backoff.yml +12 -0
  213. package/scripts/pipeline-status.sh +197 -0
  214. package/scripts/policy-check.sh +226 -0
  215. package/scripts/prior-art-search.sh +133 -0
  216. package/scripts/promote-mab-lessons.sh +126 -0
  217. package/scripts/prompts/agent-a-superpowers.md +29 -0
  218. package/scripts/prompts/agent-b-ralph.md +29 -0
  219. package/scripts/prompts/judge-agent.md +61 -0
  220. package/scripts/prompts/planner-agent.md +44 -0
  221. package/scripts/pull-community-lessons.sh +90 -0
  222. package/scripts/quality-gate.sh +266 -0
  223. package/scripts/research-gate.sh +90 -0
  224. package/scripts/run-plan.sh +329 -0
  225. package/scripts/scope-infer.sh +159 -0
  226. package/scripts/setup-ralph-loop.sh +155 -0
  227. package/scripts/telemetry.sh +230 -0
  228. package/scripts/tests/run-all-tests.sh +52 -0
  229. package/scripts/tests/test-act-cli.sh +46 -0
  230. package/scripts/tests/test-agents-md.sh +87 -0
  231. package/scripts/tests/test-analyze-report.sh +114 -0
  232. package/scripts/tests/test-architecture-map.sh +89 -0
  233. package/scripts/tests/test-auto-compound.sh +169 -0
  234. package/scripts/tests/test-batch-test.sh +65 -0
  235. package/scripts/tests/test-benchmark-runner.sh +25 -0
  236. package/scripts/tests/test-common.sh +168 -0
  237. package/scripts/tests/test-cost-tracking.sh +158 -0
  238. package/scripts/tests/test-echo-back.sh +180 -0
  239. package/scripts/tests/test-entropy-audit.sh +146 -0
  240. package/scripts/tests/test-failure-digest.sh +66 -0
  241. package/scripts/tests/test-generate-ast-rules.sh +145 -0
  242. package/scripts/tests/test-helpers.sh +82 -0
  243. package/scripts/tests/test-init.sh +47 -0
  244. package/scripts/tests/test-lesson-check.sh +278 -0
  245. package/scripts/tests/test-lesson-local.sh +55 -0
  246. package/scripts/tests/test-license-check.sh +109 -0
  247. package/scripts/tests/test-mab-run.sh +182 -0
  248. package/scripts/tests/test-ollama-lib.sh +49 -0
  249. package/scripts/tests/test-ollama.sh +60 -0
  250. package/scripts/tests/test-pipeline-status.sh +198 -0
  251. package/scripts/tests/test-policy-check.sh +124 -0
  252. package/scripts/tests/test-prior-art-search.sh +96 -0
  253. package/scripts/tests/test-progress-writer.sh +140 -0
  254. package/scripts/tests/test-promote-mab-lessons.sh +110 -0
  255. package/scripts/tests/test-pull-community-lessons.sh +149 -0
  256. package/scripts/tests/test-quality-gate.sh +241 -0
  257. package/scripts/tests/test-research-gate.sh +132 -0
  258. package/scripts/tests/test-run-plan-cli.sh +86 -0
  259. package/scripts/tests/test-run-plan-context.sh +305 -0
  260. package/scripts/tests/test-run-plan-e2e.sh +153 -0
  261. package/scripts/tests/test-run-plan-headless.sh +424 -0
  262. package/scripts/tests/test-run-plan-notify.sh +124 -0
  263. package/scripts/tests/test-run-plan-parser.sh +217 -0
  264. package/scripts/tests/test-run-plan-prompt.sh +254 -0
  265. package/scripts/tests/test-run-plan-quality-gate.sh +222 -0
  266. package/scripts/tests/test-run-plan-routing.sh +178 -0
  267. package/scripts/tests/test-run-plan-scoring.sh +148 -0
  268. package/scripts/tests/test-run-plan-state.sh +261 -0
  269. package/scripts/tests/test-run-plan-team.sh +157 -0
  270. package/scripts/tests/test-scope-infer.sh +150 -0
  271. package/scripts/tests/test-setup-ralph-loop.sh +63 -0
  272. package/scripts/tests/test-telegram-env.sh +38 -0
  273. package/scripts/tests/test-telegram.sh +121 -0
  274. package/scripts/tests/test-telemetry.sh +46 -0
  275. package/scripts/tests/test-thompson-sampling.sh +139 -0
  276. package/scripts/tests/test-validate-all.sh +60 -0
  277. package/scripts/tests/test-validate-commands.sh +89 -0
  278. package/scripts/tests/test-validate-hooks.sh +98 -0
  279. package/scripts/tests/test-validate-lessons.sh +150 -0
  280. package/scripts/tests/test-validate-plan-quality.sh +235 -0
  281. package/scripts/tests/test-validate-plans.sh +187 -0
  282. package/scripts/tests/test-validate-plugin.sh +106 -0
  283. package/scripts/tests/test-validate-prd.sh +184 -0
  284. package/scripts/tests/test-validate-skills.sh +134 -0
  285. package/scripts/validate-all.sh +57 -0
  286. package/scripts/validate-commands.sh +67 -0
  287. package/scripts/validate-hooks.sh +89 -0
  288. package/scripts/validate-lessons.sh +98 -0
  289. package/scripts/validate-plan-quality.sh +369 -0
  290. package/scripts/validate-plans.sh +120 -0
  291. package/scripts/validate-plugin.sh +86 -0
  292. package/scripts/validate-policies.sh +42 -0
  293. package/scripts/validate-prd.sh +118 -0
  294. package/scripts/validate-skills.sh +96 -0
  295. package/skills/autocode/SKILL.md +285 -0
  296. package/skills/autocode/ab-verification.md +51 -0
  297. package/skills/autocode/code-quality-standards.md +37 -0
  298. package/skills/autocode/competitive-mode.md +364 -0
  299. package/skills/brainstorming/SKILL.md +97 -0
  300. package/skills/capture-lesson/SKILL.md +187 -0
  301. package/skills/check-lessons/SKILL.md +116 -0
  302. package/skills/dispatching-parallel-agents/SKILL.md +110 -0
  303. package/skills/executing-plans/SKILL.md +85 -0
  304. package/skills/finishing-a-development-branch/SKILL.md +201 -0
  305. package/skills/receiving-code-review/SKILL.md +72 -0
  306. package/skills/requesting-code-review/SKILL.md +59 -0
  307. package/skills/requesting-code-review/code-reviewer.md +82 -0
  308. package/skills/research/SKILL.md +145 -0
  309. package/skills/roadmap/SKILL.md +115 -0
  310. package/skills/subagent-driven-development/SKILL.md +98 -0
  311. package/skills/subagent-driven-development/code-quality-reviewer-prompt.md +18 -0
  312. package/skills/subagent-driven-development/implementer-prompt.md +73 -0
  313. package/skills/subagent-driven-development/spec-reviewer-prompt.md +57 -0
  314. package/skills/systematic-debugging/SKILL.md +134 -0
  315. package/skills/systematic-debugging/condition-based-waiting.md +64 -0
  316. package/skills/systematic-debugging/defense-in-depth.md +32 -0
  317. package/skills/systematic-debugging/root-cause-tracing.md +55 -0
  318. package/skills/test-driven-development/SKILL.md +167 -0
  319. package/skills/using-git-worktrees/SKILL.md +219 -0
  320. package/skills/using-superpowers/SKILL.md +54 -0
  321. package/skills/verification-before-completion/SKILL.md +140 -0
  322. package/skills/verify/SKILL.md +82 -0
  323. package/skills/writing-plans/SKILL.md +128 -0
  324. package/skills/writing-skills/SKILL.md +93 -0
@@ -0,0 +1,32 @@
+ ---
+ id: 75
+ title: "Research artifacts must persist — ephemeral research is wasted research"
+ severity: should-fix
+ languages: [all]
+ scope: [universal]
+ category: context-retrieval
+ pattern:
+   type: semantic
+   description: "Research findings discussed in conversation but never written to a file. When context resets (new session, /clear, context compression), all research is lost and must be redone."
+ fix: "Every research activity must produce a durable file artifact. Write findings to tasks/research-<slug>.md immediately. Never rely on conversation context for research persistence."
+ example:
+   bad: |
+     # Agent researches 3 libraries, compares trade-offs
+     # Findings exist only in conversation
+     # User does /clear
+     # Next session: "What libraries did we evaluate?" — gone
+   good: |
+     # Agent researches 3 libraries, writes tasks/research-auth-libs.md
+     # File includes: comparison table, recommendation, blocking issues
+     # User does /clear
+     # Next session reads the file — full context preserved
+ ---
+
+ ## Observation
+ Research conducted during brainstorming or planning was discussed in conversation but never written to a file. When the session ended or context compressed, all research findings were lost. The next session had to redo the same research, often reaching different conclusions.
+
+ ## Insight
+ Conversation context is ephemeral by design — context windows compress, sessions end, `/clear` resets everything. Research that exists only in conversation has the same durability as spoken words. File artifacts are the only mechanism that survives context boundaries.
+
+ ## Lesson
+ Always make a file. Every research activity, design decision, and investigation produces a durable artifact at `tasks/research-<slug>.md`. No ephemeral research — files survive context resets, conversation context doesn't.
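Editorial sketch for lesson 75: one way the "always make a file" rule could be automated. The helper names `slugify` and `persist_research`, and the `root` parameter, are assumptions made for illustration; only the `tasks/research-<slug>.md` path pattern comes from the lesson itself.

```python
import re
import tempfile
from pathlib import Path


def slugify(topic: str) -> str:
    """Lowercase the topic and collapse non-alphanumeric runs into hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", topic.lower()).strip("-")


def persist_research(topic: str, findings: str, root: Path) -> Path:
    """Write findings to <root>/tasks/research-<slug>.md and return the path.

    Hypothetical helper: the toolkit may persist research differently.
    """
    path = root / "tasks" / f"research-{slugify(topic)}.md"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(f"# Research: {topic}\n\n{findings}\n", encoding="utf-8")
    return path


# Usage: the file can be read back without any conversation context.
with tempfile.TemporaryDirectory() as tmp:
    saved = persist_research("Auth libs", "Recommendation: library A.", Path(tmp))
    restored = saved.read_text(encoding="utf-8")
```

A temporary directory stands in for the project root here; in a real session the root would be the repository so the artifact survives the session.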
@@ -0,0 +1,30 @@
+ ---
+ id: 76
+ title: "Wrong decomposition contaminates all downstream batches"
+ severity: blocker
+ languages: [all]
+ scope: [universal]
+ category: planning-control-flow
+ pattern:
+   type: semantic
+   description: "A plan decomposes work into batches where an early batch has the wrong boundary — wrong files, wrong order, or wrong grouping. All subsequent batches inherit the wrong foundation and compound the error."
+ fix: "Validate decomposition before execution: check that batch boundaries align with module boundaries, dependencies flow forward (never backward), and each batch is independently testable."
+ example:
+   bad: |
+     # Batch 1: Create API + frontend (too broad, untestable)
+     # Batch 2: Add tests (tests written after, not with)
+     # Batch 3: Integration (discovers Batch 1 was wrong)
+   good: |
+     # Batch 1: Create API with tests (independently verifiable)
+     # Batch 2: Create frontend with tests (independently verifiable)
+     # Batch 3: Integration wiring with e2e test
+ ---
+
+ ## Observation
+ A plan decomposed a feature into 5 batches. Batch 1 grouped files incorrectly — putting the data model and the API handler in the same batch when they had different dependencies. Every subsequent batch built on batch 1's incorrect structure. By batch 4, the agent was fighting the architecture instead of building on it.
+
+ ## Insight
+ Decomposition errors are the most expensive kind of plan bug because they compound. Each batch that builds on a wrong foundation adds more code that depends on the wrong structure. The cost to fix grows quadratically with the number of affected batches.
+
+ ## Lesson
+ Validate plan decomposition before executing. Check: Does each batch align with a natural module boundary? Do dependencies flow strictly forward (batch N never depends on batch N+1)? Is each batch independently testable? A 10-minute decomposition review prevents multi-hour rework.
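Editorial sketch for lesson 76: the "dependencies flow strictly forward" check can be expressed as a small validation pass. The plan representation (a list of dicts with `name` and `depends_on` keys) and the function name `backward_dependencies` are assumptions for illustration, not the toolkit's plan format.

```python
def backward_dependencies(batches: list[dict]) -> list[str]:
    """Return one message per dependency that does not point to an earlier batch.

    Assumes a hypothetical plan shape: [{"name": str, "depends_on": [str]}].
    """
    order = {batch["name"]: index for index, batch in enumerate(batches)}
    problems = []
    for index, batch in enumerate(batches):
        for dep in batch.get("depends_on", []):
            if dep not in order:
                problems.append(f"{batch['name']}: unknown dependency {dep}")
            elif order[dep] >= index:
                # Same or later position means the dependency flows backward.
                problems.append(f"{batch['name']}: {dep} does not come earlier")
    return problems


plan = [
    {"name": "api", "depends_on": ["integration"]},  # backward dependency
    {"name": "frontend", "depends_on": []},
    {"name": "integration", "depends_on": ["api", "frontend"]},
]
issues = backward_dependencies(plan)
```

This covers only the ordering question; module-boundary alignment and independent testability still need a human or agent review.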
@@ -0,0 +1,30 @@
+ ---
+ id: 77
+ title: "Cherry-pick merges from parallel worktrees need manual conflict resolution"
+ severity: should-fix
+ languages: [all]
+ scope: [project:autonomous-coding-toolkit]
+ category: planning-control-flow
+ pattern:
+   type: semantic
+   description: "Multiple agents work in parallel worktrees on the same files. Cherry-picking the winner's commits into main creates merge conflicts that automated tools cannot resolve correctly because they lack the semantic context of why each change was made."
+ fix: "When cherry-picking from parallel worktrees, always use interactive conflict resolution. Never auto-resolve with --theirs or --ours. Review each conflict with the judge agent's scoring context."
+ example:
+   bad: |
+     git cherry-pick abc123 --strategy-option theirs  # blindly takes one side
+     # Loses valuable changes from the other worktree
+   good: |
+     git cherry-pick abc123  # stops on conflict
+     # Review each conflict with context from judge's scoring
+     # Manually merge best-of-both changes
+     git add resolved-file.py && git cherry-pick --continue
+ ---
+
+ ## Observation
+ In competitive mode, two agents implemented the same batch in separate worktrees. The judge picked a winner, but cherry-picking the winner's commits into the main worktree produced merge conflicts in 3 files. Using `--theirs` to auto-resolve discarded valuable fixes from the losing agent that the judge had flagged for best-of-both synthesis.
+
+ ## Insight
+ Cherry-pick conflicts between parallel implementations are semantically rich — each side made deliberate, different choices. Automated resolution strategies (`--theirs`, `--ours`) discard information. Only a reviewer with the judge's scoring context can correctly merge the best of both.
+
+ ## Lesson
+ Never auto-resolve cherry-pick conflicts from parallel worktrees. Use interactive resolution with the judge agent's scoring context. The mandatory best-of-both synthesis in competitive mode means both sides have value — the conflict resolution is where that value is captured.
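Editorial sketch for lesson 77: manual resolution is easier to trust with a guard that refuses to stage a half-merged file. The function below only detects leftover Git conflict markers; it does not resolve anything, and the name `leftover_conflict_lines` is an assumption for illustration.

```python
# Standard Git conflict marker prefixes at the start of a line.
CONFLICT_PREFIXES = ("<<<<<<<", "=======", ">>>>>>>")


def leftover_conflict_lines(text: str) -> list[int]:
    """Return 1-based line numbers that still start with a conflict marker.

    Caveat: a line of '=' characters (e.g. a Markdown setext heading
    underline) also matches, so treat hits as prompts for review.
    """
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if line.startswith(CONFLICT_PREFIXES)
    ]


half_merged = (
    "def f():\n"
    "<<<<<<< HEAD\n"
    "    return 1\n"
    "=======\n"
    "    return 2\n"
    ">>>>>>> abc123\n"
)
resolved = "def f():\n    return 1\n"

unresolved_lines = leftover_conflict_lines(half_merged)
clean_lines = leftover_conflict_lines(resolved)
```

A check like this could run just before `git add`, so that `git cherry-pick --continue` is reached only after every conflict has actually been reviewed.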
@@ -0,0 +1,30 @@
+ ---
+ id: 78
+ title: "Static review without live test optimizes for the wrong risk class"
+ severity: should-fix
+ languages: [all]
+ scope: [universal]
+ category: planning-control-flow
+ pattern:
+   type: semantic
+   description: "Relying solely on static code review (reading code, checking patterns) without running the code in a live environment. Static review catches structural issues but misses behavioral bugs that only manifest at runtime."
+ fix: "Always combine static review with at least one live integration test. One live test catches more real bugs than six static reviewers."
+ example:
+   bad: |
+     # 6 review agents read the code
+     # All report "looks good"
+     # Deploy → runtime error on first request
+   good: |
+     # 2 review agents read the code
+     # 1 integration test runs the actual pipeline
+     # Runtime error caught before deploy
+ ---
+
+ ## Observation
+ A code change was reviewed by six static analysis agents. All reported the code was correct. On deployment, the first real request triggered a runtime error that none of the static reviewers could have caught — the bug was in the interaction between two components at runtime, not in either component's code.
+
+ ## Insight
+ Static review and live testing catch non-overlapping bug classes. Static review finds structural issues (wrong patterns, missing imports, type errors). Live testing finds behavioral issues (wrong data flow, timing bugs, environment-dependent failures). Investing only in static review creates a false sense of confidence.
+
+ ## Lesson
+ Always combine static review (at most 2 agents — diminishing returns after that) with at least one live integration test. The audit method should always be: static review for structural correctness + live test for behavioral correctness. One live test is worth six static reviewers.
@@ -0,0 +1,32 @@
+ ---
+ id: 79
+ title: "Multi-batch plans need an explicit integration wiring batch"
+ severity: should-fix
+ languages: [all]
+ scope: [universal]
+ category: planning-control-flow
+ pattern:
+   type: semantic
+   description: "A multi-batch plan where each batch creates separate components but no batch is dedicated to wiring them together. Each component passes its own tests but the pipeline is disconnected."
+ fix: "Add an explicit integration wiring batch at the end (or at natural integration points) that connects components and runs end-to-end tests. The wiring batch should have no new feature code — only imports, configuration, and integration tests."
+ example:
+   bad: |
+     # Batch 1: Build parser (tests pass)
+     # Batch 2: Build formatter (tests pass)
+     # Batch 3: Build CLI (tests pass)
+     # Result: CLI doesn't call parser, parser doesn't feed formatter
+   good: |
+     # Batch 1: Build parser (tests pass)
+     # Batch 2: Build formatter (tests pass)
+     # Batch 3: Wire parser → formatter, integration test
+     # Batch 4: Build CLI using wired pipeline, e2e test
+ ---
+
+ ## Observation
+ A 5-batch plan created 5 separate components. Each batch had thorough unit tests and all passed. But the components were never wired together — no batch was responsible for integration. The final "verify" step discovered that the components couldn't communicate because they used incompatible interfaces.
+
+ ## Insight
+ When each batch is scoped to a single component, integration is an implicit assumption — "someone will wire these together." But in autonomous execution, implicit assumptions don't get executed. Each batch follows its explicit instructions, and if no batch says "wire X to Y", it doesn't happen.
+
+ ## Lesson
+ Every multi-batch plan with 3+ components needs at least one explicit integration wiring batch. This batch should: import all components, configure their connections, and run at least one end-to-end test that traces data through the full pipeline. No new feature code — only wiring and verification.
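Editorial sketch for lesson 79: the "3+ components need a wiring batch" rule is mechanical enough to check before execution. The `kind` field with values `component` and `wiring`, and the function name `missing_wiring_batch`, are hypothetical; the toolkit's plan format may carry no such label.

```python
def missing_wiring_batch(batches: list[dict]) -> bool:
    """True when a plan builds 3+ components but has no wiring batch.

    Assumes each batch dict carries a hypothetical "kind" label:
    "component" for feature work, "wiring" for integration-only work.
    """
    components = [b for b in batches if b.get("kind") == "component"]
    wiring = [b for b in batches if b.get("kind") == "wiring"]
    return len(components) >= 3 and not wiring


bad_plan = [
    {"name": "parser", "kind": "component"},
    {"name": "formatter", "kind": "component"},
    {"name": "cli", "kind": "component"},
]
good_plan = [
    {"name": "parser", "kind": "component"},
    {"name": "formatter", "kind": "component"},
    {"name": "wire parser to formatter", "kind": "wiring"},
    {"name": "cli", "kind": "component"},
]

bad_flagged = missing_wiring_batch(bad_plan)
good_flagged = missing_wiring_batch(good_plan)
```

The check only confirms that a wiring batch exists; whether that batch really runs an end-to-end test through the full pipeline still has to be verified separately.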
@@ -0,0 +1,161 @@
+ # Lessons Learned Framework
+
+ Synthesized from three methodologies adapted for personal infrastructure and AI system development.
+
+ ## Source Frameworks
+
+ | Framework | Contribution | Reference |
+ |-----------|-------------|-----------|
+ | **Army CALL OIL** | Maturity taxonomy (Observation → Insight → Lesson → Lesson Learned) | CALL Handbook 11-33 |
+ | **PMI PMBOK** | Structured register (category, root cause, corrective action, keywords) | PMI Lessons Learned Register |
+ | **Lean Six Sigma** | Analysis tools (5 Whys, Fishbone 6M, A3 format, DMAIC phases) | LSS Green Belt (Notion: Tri-County LEAN Six Sigma Made EZ) |
+
+ ## OIL Maturity Taxonomy
+
+ Not every finding is a "lesson learned." The Army's OIL taxonomy provides a promotion path:
+
+ ### Tier 1: Observation
+ **"What happened."** Raw conditions, symptoms, and facts from an incident.
+ - Requires: Date, system, files involved, factual description
+ - Example: "Intraday snapshots showed entities.total: 0 despite 14,392 logbook events"
+ - Status: `observed`
+
+ ### Tier 2: Insight
+ **"What it means."** Analysis connecting observation to root cause. Uses 5 Whys or Fishbone to dig below symptoms.
+ - Requires: Root cause identified, impact assessed, "why" chain documented
+ - Example: "Decorator-based collector registry was never imported, so CollectorRegistry.all() returned empty dict"
+ - Status: `analyzed`
+
+ ### Tier 3: Lesson
+ **"What to do about it."** Proposed corrective action — specific, implementable, testable.
+ - Requires: Corrective action described, preventive measure proposed, owner identified
+ - Example: "Add integration test verifying collector count >= 15; add import comment pattern to all decorator registries"
+ - Status: `proposed`
+
+ ### Tier 4: Lesson Learned
+ **"Validated behavioral change."** The corrective action has been implemented, tested, AND confirmed to prevent recurrence. This is the highest tier — most entries won't reach it immediately.
+ - Requires: Implementation proof (commit, test, config change), validation evidence, sustain plan
+ - Example: "Test added (commit abc123), collector count assertion catches this class of bug. No recurrence in 30 days."
+ - Status: `validated`
+
+ **Key distinction:** A lesson is *proposed*. A lesson learned is *proven*. Most entries start as Tier 1-2 and promote over time.
+
+ ## Lesson Structure (A3-Inspired)
+
+ Each lesson follows a compressed A3 format — the same one-page problem-solution structure from the Green Belt coursework, adapted for technical lessons:
+
+ ```
+ # Lesson: [Title]
+
+ **Date:** YYYY-MM-DD
+ **System:** [Project name]
+ **Tier:** observation | insight | lesson | lesson_learned
+ **Category:** [See categories below]
+ **Keywords:** [comma-separated for retrieval]
+ **Files:** [affected files]
+
+ ## Observation (What Happened)
+ [Factual description of the incident/discovery. Include data contradictions.]
+
+ ## Analysis (Root Cause — 5 Whys)
+ Why #1: [surface cause]
+ Why #2: [deeper cause]
+ Why #3: [root cause — stop at the deepest controllable cause]
+
+ ## Corrective Actions
+ | # | Action | Status | Owner | Evidence |
+ |---|--------|--------|-------|----------|
+ | 1 | [specific action] | proposed/implemented/validated | [who] | [commit/test/config] |
69
+
70
+ ## Ripple Effects
71
+ [What other systems/services/pipelines does this touch?]
72
+
73
+ ## Sustain Plan
74
+ - [ ] 7-day check: [what to verify]
75
+ - [ ] 30-day check: [confirm no recurrence]
76
+ - [ ] Contingency: [if corrective action doesn't hold]
77
+
78
+ ## Key Takeaway
79
+ [One sentence. The thing you'd tell someone in 10 seconds.]
80
+ ```
81
+
82
+ ## Categories
83
+
84
+ Aligned to the system hierarchy — when searching for patterns, filter by category:
85
+
86
+ | Category | Scope | Examples |
87
+ |----------|-------|---------|
88
+ | `data-model` | Schema, inheritance, data flow assumptions | HA entity→device→area chain |
89
+ | `registration` | Module loading, decorator patterns, import side effects | Collector registry empty |
90
+ | `cold-start` | First-run behavior, missing baselines, graceful degradation | Predictions 0 for missing weekday |
91
+ | `integration` | Cross-service dependencies, shared state, API contracts | Engine↔Hub JSON schema coupling |
92
+ | `deployment` | Service config, systemd, env vars, restart behavior | ~/.env export syntax |
93
+ | `monitoring` | Alert logic, noise suppression, false positives, staleness | Stuck sensor alerts |
94
+ | `ui` | Frontend assumptions, data display, user-facing bugs | Area counts showing 0 |
95
+ | `testing` | Coverage gaps, mock masking, smoke tests | Mocked collectors hiding registration bug |
96
+ | `performance` | Resource contention, memory, Ollama scheduling | Timer deconfliction |
97
+ | `security` | Auth, secrets, permissions | Credential exposure |
98
+
99
+ ## Analysis Tools
100
+
101
+ ### 5 Whys (Primary)
102
+ Use for most lessons. Stop at the deepest **controllable** root cause.
103
+
104
+ ### Fishbone / 6M (Complex Issues)
105
+ When 5 Whys branches into multiple causes, use the 6M categories adapted for infrastructure:
106
+ - **Method:** Process/workflow gap (no smoke test after refactor)
107
+ - **Machine:** System/service failure (Ollama contention, OOM)
108
+ - **Material:** Data quality (stale cache, missing baselines)
109
+ - **Manpower:** Knowledge gap (didn't know HA inheritance model)
110
+ - **Management:** Process gap (no review step, no sustain plan)
111
+ - **Mother Nature:** External factor (upstream API change, network)
112
+
113
+ ### Pareto Principle
114
+ When reviewing lessons over time: **most frequent category ≠ most impactful category.** Track both. Optimize for impact, not frequency.
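One way to track both dimensions is to tally frequency and impact separately per category. This is a minimal sketch: the sample records and the numeric impact scores are hypothetical, since the framework does not define how impact is measured.

```python
from collections import Counter

# Hypothetical (category, impact_score) records, made up for illustration.
# The framework itself does not prescribe a numeric impact scale.
SAMPLE_LESSONS = [
    ("integration", 1),
    ("integration", 1),
    ("integration", 2),
    ("security", 5),
    ("security", 5),
]


def rank_categories(records):
    """Return (most_frequent, most_impactful) category names."""
    frequency = Counter()
    impact = Counter()
    for category, score in records:
        frequency[category] += 1   # how often the category appears
        impact[category] += score  # how much damage it caused in total
    return frequency.most_common(1)[0][0], impact.most_common(1)[0][0]


most_frequent, most_impactful = rank_categories(SAMPLE_LESSONS)
print(f"most frequent: {most_frequent}, most impactful: {most_impactful}")
```

With this sample data the two rankings disagree, which is the case the principle warns about.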
115
+
116
+ ## Lifecycle & Promotion
117
+
118
+ ```
119
+ observed  →  analyzed  →  proposed  →  validated
120
+    ↑            ↑            ↑            ↑
121
+ Incident      5 Whys       Action       Proof +
122
+  logged        done        defined      30-day
123
+                                        sustain
124
+ ```
125
+
126
+ **Promotion criteria:**
127
+ - `observed → analyzed`: Root cause identified via 5 Whys or Fishbone
128
+ - `analyzed → proposed`: Corrective action defined with owner and timeline
129
+ - `proposed → validated`: Action implemented + evidence of behavioral change (test passing, config applied, no recurrence for 30 days)
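The promotion criteria above can be read as a small state machine. The sketch below assumes a lesson is a plain dict; the field names (`root_cause`, `corrective_action`, `owner`, `timeline`, `evidence`) are hypothetical stand-ins for the evidence each step requires, not names the framework defines.

```python
# Status order taken from the lifecycle diagram.
STATUSES = ["observed", "analyzed", "proposed", "validated"]

# Hypothetical field names for the evidence each target status requires.
REQUIRED_FOR = {
    "analyzed": {"root_cause"},
    "proposed": {"corrective_action", "owner", "timeline"},
    "validated": {"evidence"},
}


def promote(lesson: dict) -> dict:
    """Return a copy of the lesson advanced one status, or raise ValueError."""
    index = STATUSES.index(lesson["status"])
    if index == len(STATUSES) - 1:
        raise ValueError("lesson is already validated")
    target = STATUSES[index + 1]
    # Refuse promotion while any required evidence is missing or empty.
    missing = sorted(f for f in REQUIRED_FOR[target] if not lesson.get(f))
    if missing:
        raise ValueError(f"cannot promote to {target}: missing {missing}")
    return {**lesson, "status": target}
```

Promotion moves one step at a time, so a lesson cannot jump from `observed` to `validated` without passing through analysis and a proposed action.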
130
+
131
+ **Review cadence:** Check `proposed` items monthly. If an item has seen no action in 60 days, either implement it or archive it with a reason.
132
+
133
+ ## Connecting to MEMORY.md
134
+
135
+ MEMORY.md carries **one-line summaries** pointing to full lesson files:
136
+
137
+ ```markdown
138
+ ## Lessons Learned
139
+ - `docs/lessons/2026-02-14-area-entity-resolution.md` — HA entity→device→area chain [lesson_learned]
140
+ - `docs/lessons/2026-02-14-collector-registration.md` — Decorator registries need explicit imports [lesson]
141
+ ```
142
+
143
+ The `[tier]` tag shows maturity at a glance. Update when tier changes.
144
+
145
+ ## Cross-Framework Mapping
146
+
147
+ For career transition context — how these frameworks translate:
148
+
149
+ | Military | LSS | PMI | This System |
150
+ |----------|-----|-----|-------------|
151
+ | AAR (After Action Review) | Kaizen session | Retrospective | Lesson file creation |
152
+ | MDMP (Military Decision Making) | DMAIC | Planning process | Plan docs in docs/plans/ |
153
+ | OPORD | A3 Report | Project charter | CLAUDE.md + plan doc |
154
+ | IPB | VSM + SIPOC | Stakeholder analysis | System audit (/ha-audit, /status) |
155
+ | F3EAD cycle | PDCA cycle | Monitor & Control | Counter system (/counter, /check, /reflect) |
156
+
157
+ ## File Naming
158
+
159
+ `docs/lessons/YYYY-MM-DD-short-description.md`
160
+
161
+ One lesson per file. If an incident produces multiple independent lessons, split them.
@@ -0,0 +1,201 @@
1
+ # Lessons Learned — Summary
2
+
3
+ 79 lessons captured from autonomous coding workflows. Each is a standalone markdown file with YAML frontmatter, grep-detectable patterns (syntactic) or AI-reviewable descriptions (semantic), and concrete fix guidance.
4
+
5
+ ## Quick Reference
6
+
7
+ | ID | Title | Category | Severity | Type |
8
+ |----|-------|----------|----------|------|
9
+ | 0001 | Bare exception swallowing hides failures | silent-failures | blocker | syntactic |
10
+ | 0002 | async def without await returns truthy coroutine | async-traps | blocker | semantic |
11
+ | 0003 | asyncio.create_task without done_callback swallows exceptions | silent-failures | should-fix | semantic |
12
+ | 0004 | Hardcoded count assertions break when datasets grow | test-anti-patterns | should-fix | syntactic |
13
+ | 0005 | sqlite3 connections leak without closing() context manager | silent-failures | should-fix | syntactic |
14
+ | 0006 | .venv/bin/pip installs to wrong site-packages | integration-boundaries | should-fix | syntactic |
15
+ | 0007 | Runner state file rejected by own git-clean check | integration-boundaries | should-fix | semantic |
16
+ | 0008 | Quality gate blind spot for non-standard test suites | silent-failures | should-fix | semantic |
17
+ | 0009 | Plan parser over-count burns empty API calls | silent-failures | should-fix | semantic |
18
+ | 0010 | `local` outside function silently misbehaves in bash | silent-failures | blocker | syntactic |
19
+ | 0011 | Batch execution writes tests for unimplemented code | integration-boundaries | should-fix | semantic |
20
+ | 0012 | API rejects markdown with unescaped special chars | integration-boundaries | nice-to-have | semantic |
21
+ | 0013 | `export` prefix in env files breaks naive parsing | silent-failures | should-fix | syntactic |
22
+ | 0014 | Decorator registries are import-time side effects | silent-failures | should-fix | semantic |
23
+ | 0015 | Frontend-backend schema drift invisible until e2e trace | integration-boundaries | should-fix | semantic |
24
+ | 0016 | Event-driven systems must seed current state on startup | integration-boundaries | should-fix | semantic |
25
+ | 0017 | Copy-pasted logic between modules diverges silently | integration-boundaries | should-fix | semantic |
26
+ | 0018 | Every layer passes its test while full pipeline is broken | integration-boundaries | should-fix | semantic |
27
+ | 0019 | systemd EnvironmentFile ignores `export` keyword | silent-failures | should-fix | syntactic |
28
+ | 0020 | Persist state incrementally before expensive work | silent-failures | should-fix | semantic |
29
+ | 0021 | Dual-axis testing: horizontal sweep + vertical trace | integration-boundaries | should-fix | semantic |
30
+ | 0022 | Build tool JSX factory shadowed by arrow params | silent-failures | blocker | syntactic |
31
+ | 0023 | Static analysis spiral — chasing lint fixes creates more bugs | test-anti-patterns | should-fix | semantic |
32
+ | 0024 | Shared pipeline features must share implementation | integration-boundaries | should-fix | semantic |
33
+ | 0025 | Defense-in-depth: validate at all entry points | integration-boundaries | should-fix | semantic |
34
+ | 0026 | Linter with no rules enabled = false enforcement | silent-failures | should-fix | semantic |
35
+ | 0027 | JSX silently drops wrong prop names | silent-failures | should-fix | semantic |
36
+ | 0028 | Never embed infrastructure details in client-side code | silent-failures | blocker | syntactic |
37
+ | 0029 | Never write secret values into committed files | silent-failures | blocker | syntactic |
38
+ | 0030 | Cache/registry updates must merge, never replace | integration-boundaries | should-fix | semantic |
39
+ | 0031 | Verify units at every boundary (0-1 vs 0-100) | integration-boundaries | should-fix | semantic |
40
+ | 0032 | Module lifecycle: subscribe after init, unsubscribe on shutdown | resource-lifecycle | should-fix | semantic |
41
+ | 0033 | Async iteration over mutable collections needs snapshot | async-traps | blocker | syntactic |
42
+ | 0034 | Caller-side missing await silently discards work | async-traps | blocker | semantic |
43
+ | 0035 | Duplicate registration IDs cause silent overwrite | silent-failures | should-fix | semantic |
44
+ | 0036 | WebSocket dirty disconnects raise RuntimeError, not close | resource-lifecycle | should-fix | semantic |
45
+ | 0037 | Parallel agents sharing worktree corrupt staging area | integration-boundaries | blocker | semantic |
46
+ | 0038 | Subscribe without stored ref = cannot unsubscribe | resource-lifecycle | should-fix | syntactic |
47
+ | 0039 | Fallback `or default()` hides initialization bugs | silent-failures | should-fix | semantic |
48
+ | 0040 | Process all events when 5% are relevant — filter first | performance | should-fix | semantic |
49
+ | 0041 | Ambiguous base dir variable causes path double-nesting | integration-boundaries | should-fix | semantic |
50
+ | 0042 | Spec compliance without quality review misses defensive gaps | integration-boundaries | should-fix | semantic |
51
+ | 0043 | Exact count assertions on extensible collections break on addition | test-anti-patterns | should-fix | syntactic |
52
+ | 0044 | Relative `file:` deps break in git worktrees | integration-boundaries | should-fix | semantic |
53
+ | 0045 | Iterative "how would you improve" catches 35% more design gaps | integration-boundaries | should-fix | semantic |
54
+ | 0046 | Plan-specified test assertions can have math bugs | test-anti-patterns | should-fix | semantic |
55
+ | 0047 | pytest runs single-threaded by default — add xdist | performance | should-fix | semantic |
56
+ | 0048 | Multi-batch plans need explicit integration wiring batch | integration-boundaries | should-fix | semantic |
57
+ | 0049 | A/B verification finds zero-overlap bug classes | integration-boundaries | should-fix | semantic |
58
+ | 0050 | Editing files sourced by a running process breaks function signatures | integration-boundaries | blocker | semantic |
59
+ | 0051 | Infrastructure fixes in a plan cannot benefit the run executing that plan | integration-boundaries | should-fix | semantic |
60
+ | 0052 | Uncommitted changes from parallel work fail the quality gate git-clean check | integration-boundaries | blocker | semantic |
61
+ | 0053 | Missing jq -c flag causes string comparison failures in tests | test-anti-patterns | should-fix | syntactic |
62
+ | 0054 | Markdown parser matches headers inside code blocks and test fixtures | silent-failures | should-fix | semantic |
63
+ | 0055 | LLM agents compensate for garbled batch prompts using cross-batch context | integration-boundaries | nice-to-have | semantic |
64
+ | 0056 | `grep -c` exits 1 on zero matches, breaking `\|\|` fallback arithmetic | silent-failures | should-fix | syntactic |
65
+ | 0057 | New generated artifacts break git-clean quality gates | integration-boundaries | should-fix | semantic |
66
+ | 0058 | Dead config keys never consumed by any module | silent-failures | should-fix | semantic |
67
+ | 0059 | Contract test shared structures across producer and consumer | test-anti-patterns | should-fix | semantic |
68
+ | 0060 | set -e kills long-running bash scripts silently on inter-step failures | silent-failures | blocker | semantic |
69
+ | 0061 | Context injection into tracked files creates dirty git state when subprocess commits | integration-boundaries | should-fix | semantic |
70
+ | 0062 | Sibling bugs hide next to the fix | integration-boundaries | should-fix | semantic |
71
+ | 0063 | One boolean flag serving two lifetimes is a conflation bug | silent-failures | should-fix | semantic |
72
+ | 0064 | Tests that pass for the wrong reason provide false confidence | test-anti-patterns | should-fix | syntactic |
73
+ | 0065 | pipefail grep count double output | silent-failures | should-fix | syntactic |
74
+ | 0066 | local keyword outside function | silent-failures | blocker | syntactic |
75
+ | 0067 | stdin hang non-interactive shell | silent-failures | should-fix | semantic |
76
+ | 0068 | Agent builds the wrong thing correctly | specification-drift | blocker | semantic |
77
+ | 0069 | Plan quality dominates execution quality 3:1 | specification-drift | should-fix | semantic |
78
+ | 0070 | Spec echo-back prevents 60% of agent failures | specification-drift | should-fix | semantic |
79
+ | 0071 | Positive instructions outperform negative ones for LLMs | specification-drift | should-fix | semantic |
80
+ | 0072 | Lost in the Middle — context placement affects accuracy 20pp | context-retrieval | should-fix | semantic |
81
+ | 0073 | Unscoped lessons cause 67% false positive rate at scale | context-retrieval | should-fix | semantic |
82
+ | 0074 | Stale context injection sends wrong batch's state | context-retrieval | should-fix | semantic |
83
+ | 0075 | Research artifacts must persist — ephemeral research is wasted | context-retrieval | should-fix | semantic |
84
+ | 0076 | Wrong decomposition contaminates all downstream batches | planning-control-flow | blocker | semantic |
85
+ | 0077 | Cherry-pick merges from parallel worktrees need manual resolution | planning-control-flow | should-fix | semantic |
86
+ | 0078 | Static review without live test optimizes for wrong risk class | planning-control-flow | should-fix | semantic |
87
+ | 0079 | Multi-batch plans need explicit integration wiring batch | planning-control-flow | should-fix | semantic |
88
+
89
+ ## Root Cause Clusters
90
+
91
+ ### Cluster A: Silent Failures
92
+
93
+ Something fails but produces no error, no log, no crash. The system continues with wrong data or missing functionality. You only discover the failure when a downstream consumer produces garbage — hours or days later.
94
+
95
+ **Lessons:** 0001, 0003, 0005, 0008, 0009, 0010, 0013, 0014, 0019, 0020, 0022, 0026, 0027, 0028, 0029, 0035, 0039, 0054, 0056, 0058, 0060, 0063, 0065, 0066, 0067
96
+
97
+ **Also silent (async/lifecycle):** 0002, 0033, 0034 (async bugs are silent failures with extra steps), 0032, 0036, 0038 (lifecycle bugs cause silent resource leaks)
98
+
99
+ **Pattern:** The failure mode is always the same — no exception, no log line, no crash. The operation appears to succeed. The root cause varies: swallowed exceptions (0001, 0003), wrong tool configuration (0008, 0026), implicit behavior (0010, 0014, 0019, 0022), or missing validation (0028, 0029).
100
+
101
+ **Defense:** Every `except` block logs before returning. Every tool configuration is tested against a known-bad input. Every implicit behavior is documented with an explicit test.
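The first part of this defense might look like the following. It is a minimal sketch, assuming a hypothetical `load_settings` helper that parses JSON; the function and message text are illustrative, not part of the toolkit.

```python
import json
import logging

logger = logging.getLogger(__name__)


def load_settings(raw: str, default: dict) -> dict:
    """Parse JSON settings, logging the failure before falling back."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        # Log with the traceback first; only then return the default.
        logger.exception("settings parse failed; falling back to default")
        return default
```

The fallback still happens, but the failure now leaves a log line and a traceback instead of disappearing.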
102
+
103
+ ### Cluster B: Integration Boundaries
104
+
105
+ Each component works alone. The bug hides at the seam between two components — where one produces output and another consumes it. Unit tests pass. Integration fails.
106
+
107
+ **Lessons:** 0006, 0007, 0011, 0012, 0015, 0016, 0017, 0018, 0021, 0024, 0025, 0030, 0031, 0037, 0041, 0042, 0044, 0045, 0048, 0049, 0050, 0051, 0052, 0055, 0057, 0059, 0061, 0062
108
+
109
+ **Pattern:** Producer and consumer agree on the interface but disagree on semantics — units (0031), schema shape (0015), path depth (0041), or lifecycle timing (0016). Each passes its own tests because each tests against its own assumptions, not the other's reality.
110
+
111
+ **Defense:** Dual-axis testing (0021) — horizontal sweep confirms every interface exists, vertical trace confirms data flows end-to-end. Contract tests between producer and consumer. Shared schema definitions instead of independent copies.
112
+
113
+ ### Cluster C: Cold-Start Assumptions
114
+
115
+ Works in steady state, fails on restart or first boot. The system depends on state that accumulates during runtime — event history, caches, registries — and produces wrong results when that state is empty.
116
+
117
+ **Lessons:** 0016, 0020, 0035, 0039
118
+
119
+ **Pattern:** The system is designed for the happy path (events flowing, caches warm, registries populated) and never tested from a cold start. First-boot behavior is an afterthought — or never thought of at all.
120
+
121
+ **Defense:** Test every component from empty state. Seed current state on startup via REST/query (0016). Checkpoint state incrementally (0020). Validate initialization rather than falling back silently (0039).
122
+
123
+ ### Cluster D: Specification Drift
124
+
125
+ The agent builds the wrong thing correctly. Code passes tests, but tests validate the agent's interpretation — not the user's intent. The spec was misunderstood, and no echo-back step caught it.
126
+
127
+ **Lessons:** 0068, 0069, 0070, 0071
128
+
129
+ **Pattern:** The agent reads requirements, forms an interpretation, writes code and tests against that interpretation, and everything passes. The divergence from user intent is invisible because the feedback loop is closed — the agent grades its own homework.
130
+
131
+ **Defense:** Echo back requirements before implementing. Score plan quality before execution. Use positive instructions ("do Y") instead of negative ("don't do X"). The echo-back gate catches 60%+ of failures.
132
+
133
+ ### Cluster E: Context & Retrieval
134
+
135
+ Information is available but buried, misscoped, or placed in the wrong position within the context window. The agent has access to the right data but doesn't use it effectively.
136
+
137
+ **Lessons:** 0072, 0073, 0074, 0075
138
+
139
+ **Pattern:** Critical requirements are lost in the middle of long context (0072), irrelevant lessons fire due to missing scope (0073), stale context from a previous batch pollutes the current one (0074), or research findings exist only in conversation and are lost on context reset (0075).
140
+
141
+ **Defense:** Place task at top, requirements at bottom (U-shaped attention). Scope lessons to projects. Use ephemeral context injection for batch-scoped data. Always write research to files.
142
+
143
+ ### Cluster F: Planning & Control Flow
144
+
145
+ The plan itself is wrong — wrong decomposition, wrong integration assumptions, or wrong verification strategy. Individual batches execute correctly but the overall result is broken.
146
+
147
+ **Lessons:** 0076, 0077, 0078, 0079
148
+
149
+ **Pattern:** Decomposition errors compound downstream (0076), parallel worktree merges need semantic conflict resolution (0077), static review alone misses runtime bugs (0078), and components built in separate batches are never wired together (0079).
150
+
151
+ **Defense:** Validate decomposition before execution. Add explicit integration wiring batches. Combine static review with live testing. Use interactive conflict resolution for cherry-picks.
152
+
153
+ ## Six Rules to Build By
154
+
155
+ 1. **Log before fallback.** Every `except`, every `catch`, every `|| true` — log the failure before returning a default. Silent fallbacks are the #1 source of invisible bugs. (0001, 0003, 0039)
156
+
157
+ 2. **Test from cold start.** If your system depends on accumulated state, test it from empty. Seed current state on boot, checkpoint incrementally, and verify initialization completed before proceeding. (0016, 0020, 0035)
158
+
159
+ 3. **One source of truth.** When two components need the same logic, schema, or configuration — one owns it, the other imports it. Independent copies diverge. Always. (0015, 0017, 0024, 0030)
160
+
161
+ 4. **Verify at boundaries.** Every time data crosses a boundary (module to module, service to service, human to machine), verify the contract: types, units, format, completeness. Don't trust, verify. (0025, 0031, 0042)
162
+
163
+ 5. **Trace end-to-end.** Unit tests are necessary but not sufficient. At least one test must trace a single input through every layer to the final output. If it takes too long to write, the architecture has too many layers. (0018, 0021, 0048)
164
+
165
+ 6. **Make failures visible.** Every gate, check, and quality tool must be tested against a known-bad input to prove it actually catches something. A tool that reports "0 issues" on any input is worse than no tool. (0008, 0026, 0043)
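Rule 4 can be sketched as a small boundary check for the 0-1 versus 0-100 mismatch described in lesson 0031. The function name `to_fraction` and the `unit` labels are assumptions made for this example.

```python
def to_fraction(confidence: float, unit: str) -> float:
    """Normalize a confidence value to the 0-1 range and reject bad input."""
    if unit == "percent":
        value = confidence / 100.0
    elif unit == "fraction":
        value = confidence
    else:
        raise ValueError(f"unknown unit: {unit!r}")
    # A value outside 0-1 after conversion means the caller used the wrong unit.
    if not 0.0 <= value <= 1.0:
        raise ValueError(f"confidence out of range after conversion: {value}")
    return value
```

Passing `85` labeled as a fraction raises immediately at the boundary, rather than flowing downstream as an 8500% confidence.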
166
+
167
+ ## Diagnostic Shortcuts
168
+
169
+ When you see this symptom, check these lessons first.
170
+
171
+ | Symptom | Check First |
172
+ |---------|-------------|
173
+ | Feature works but produces no output on restart | 0016, 0020 |
174
+ | Tests pass but feature doesn't work end-to-end | 0018, 0021, 0048 |
175
+ | Exception happens but no log entry appears | 0001, 0003, 0034 |
176
+ | Script works on one machine, fails on another | 0010, 0019 |
177
+ | Quality gate reports "no issues" on bad code | 0008, 0026 |
178
+ | Frontend shows stale or wrong data | 0015, 0031 |
179
+ | Registry/cache missing entries after update | 0030, 0035 |
180
+ | WebSocket connection drops without error | 0036, 0038 |
181
+ | Async function appears to do nothing | 0002, 0034 |
182
+ | Build works locally but fails in CI/worktree | 0037, 0044 |
183
+ | Component renders blank in JSX | 0022, 0027 |
184
+ | API rejects message that looks correct | 0012, 0013 |
185
+ | Secret value appears in git log | 0029 |
186
+ | Test suite takes 10x longer than expected | 0047 |
187
+ | Test breaks every time collection grows | 0004, 0043 |
188
+ | Lint fix creates new lint failures | 0023 |
189
+ | Plan looks complete but integration is broken | 0011, 0042, 0045, 0049 |
190
+ | Quality gate fails but batch agent didn't cause it | 0050, 0052 |
191
+ | Infrastructure fix committed but not taking effect | 0051 |
192
+ | Parser finds more batches/tasks than plan actually has | 0054 |
193
+ | jq assertion fails with multiline vs compact mismatch | 0053 |
194
+ | Agent implements correct work despite garbled prompt | 0055 |
195
+ | Bash arithmetic fails with "syntax error in expression" | 0056 |
196
+ | Quality gate fails with "uncommitted changes" after adding new feature | 0007, 0052, 0057, 0061 |
197
+ | Long-running bash script dies silently between steps | 0060 |
198
+ | Config key exists but has no effect | 0058 |
199
+ | Fixed a bug but same bug exists in sibling function | 0062 |
200
+ | Boolean flag means different things at different times | 0063 |
201
+ | Test passes but reversing the fix doesn't break it | 0064 |
@@ -0,0 +1,85 @@
1
+ # Lesson Template
2
+
3
+ Copy this file to `docs/lessons/NNNN-<slug>.md` where NNNN is the next sequential ID.
4
+
5
+ ```yaml
6
+ ---
7
+ id: <next sequential number>
8
+ title: "<Short descriptive title — what the anti-pattern IS>"
9
+ severity: <blocker|should-fix|nice-to-have>
10
+ languages: [<python|javascript|typescript|shell|all>]
11
+ scope: [<universal|language:X|framework:X|domain:X|project:X>] # optional, defaults to universal
12
+ category: <async-traps|resource-lifecycle|silent-failures|integration-boundaries|test-anti-patterns|performance|specification-drift|context-retrieval|planning-control-flow>
13
+ pattern:
14
+ type: <syntactic|semantic>
15
+ regex: "<grep -P pattern>" # Required for syntactic, omit for semantic
16
+ description: "<what to look for>"
17
+ fix: "<one-line description of the correct approach>"
18
+ positive_alternative: "<optional — what TO DO instead, phrased positively>"
19
+ example:
20
+ bad: |
21
+ <2-5 lines showing the anti-pattern>
22
+ good: |
23
+ <2-5 lines showing the correct code>
24
+ ---
25
+
26
+ ## Observation
27
+ <What happened — the bug, the symptom, the impact. Be factual and specific.>
28
+
29
+ ## Insight
30
+ <Why it happened — the root cause, the mechanism that makes this dangerous.>
31
+
32
+ ## Lesson
33
+ <The rule to follow. One paragraph, actionable, testable.>
34
+ ```
35
+
36
+ ## Field Guide
37
+
38
+ ### Severity
39
+ - **blocker** — Data loss, crashes, silent corruption. Must fix before merge.
40
+ - **should-fix** — Subtle bugs, degraded behavior, tech debt. Fix in this sprint.
41
+ - **nice-to-have** — Code smells, future risk. Fix when touching the file.
42
+
43
+ ### Pattern Type
44
+ - **syntactic** — Detectable by grep. Requires a `regex` field. Used by `lesson-check.sh` for instant enforcement (<2s). Aim for near-zero false positives.
45
+ - **semantic** — Needs context to detect. Requires a `description` field. Used by the `lesson-scanner` agent during verification. Can tolerate a higher false-positive rate because the AI reviews the surrounding context.
46
+
47
+ ### Categories
48
+ | Category | What it covers |
49
+ |----------|---------------|
50
+ | `async-traps` | Forgotten awaits, concurrent modification, coroutine misuse |
51
+ | `resource-lifecycle` | Leaked connections, missing cleanup, subscription without unsubscribe |
52
+ | `silent-failures` | Bare exceptions, swallowed errors, lost stack traces |
53
+ | `integration-boundaries` | Cross-module bugs, path issues, API contract mismatches |
54
+ | `test-anti-patterns` | Brittle assertions, mocking the wrong thing, false confidence |
55
+ | `performance` | Missing filters, unnecessary work, resource waste |
56
+ | `specification-drift` | Misinterpreted requirements, wrong implementation, plan quality failures |
57
+ | `context-retrieval` | Lost context, stale injection, misscoped lessons, ephemeral research |
58
+ | `planning-control-flow` | Wrong decomposition, missing integration wiring, cherry-pick conflicts |
59
+
60
+ ### Scope (Project-Level Filtering)
61
+ Scope controls which projects a lesson applies to. Language filtering (`languages:`) picks files; scope filtering picks projects. The two filters are orthogonal.
62
+
63
+ | Tag Format | Example | Matches |
64
+ |------------|---------|---------|
65
+ | `universal` | `[universal]` | All projects (default) |
66
+ | `language:<lang>` | `[language:python]` | Projects with that language |
67
+ | `framework:<name>` | `[framework:pytest]` | Projects using that framework |
68
+ | `domain:<name>` | `[domain:ha-aria]` | Domain-specific projects |
69
+ | `project:<name>` | `[project:autonomous-coding-toolkit]` | Exact project match |
70
+
71
+ Default when omitted: `[universal]` — backward compatible.
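Scope matching could be sketched as below. This assumes any-match semantics (a lesson applies if at least one of its scope tags appears in the project's tags), which the table suggests but does not state; `lesson_applies` and its arguments are hypothetical names.

```python
def lesson_applies(lesson_scope, project_tags):
    """Return True if a lesson's scope tags match a project's tags."""
    # An omitted or empty scope defaults to universal.
    scope = lesson_scope or ["universal"]
    if "universal" in scope:
        return True
    # Assumed semantics: one matching tag is enough.
    return any(tag in project_tags for tag in scope)


project = {"language:python", "framework:pytest"}
print(lesson_applies(None, project))                    # omitted scope
print(lesson_applies(["framework:pytest"], project))    # matching tag
print(lesson_applies(["domain:ha-aria"], project))      # no matching tag
```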
72
+
73
+ ### Writing Good Regex Patterns
74
+ - Test with `grep -P "<pattern>" <your_file>` before submitting
75
+ - Escape special characters: `\\.` for literal dot, `\\(` for literal paren
76
+ - Use `\\b` for word boundaries to reduce false positives
77
+ - Use `\\s` for any whitespace
78
+ - Prefer patterns that match the *structure* of the anti-pattern, not specific variable names
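A pattern can also be checked against one known-bad and one known-good snippet before submitting. The sketch below uses Python's `re` module rather than `grep -P`; the two regex dialects overlap heavily but are not identical, so treat this as a quick sanity check. The pattern itself is a hypothetical example, not copied from any lesson file.

```python
import re

# Hypothetical pattern for a bare `except:` clause.
PATTERN = re.compile(r"^\s*except\s*:", re.MULTILINE)

# Known-bad input: the pattern must match this.
BAD = "try:\n    run()\nexcept:\n    pass\n"

# Known-good input: the pattern must not match this.
GOOD = "try:\n    run()\nexcept ValueError as exc:\n    log(exc)\n"


def matches(pattern, text):
    """Return True if the pattern occurs anywhere in the text."""
    return pattern.search(text) is not None


print("bad matches:", matches(PATTERN, BAD))
print("good matches:", matches(PATTERN, GOOD))
```

Because the pattern keys on the structure (`except` followed directly by a colon), it does not depend on variable or exception names.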
79
+
80
+ ## Examples
81
+
82
+ See existing lessons in this directory for reference:
83
+ - `0001-bare-exception-swallowing.md` — syntactic, blocker
84
+ - `0002-async-def-without-await.md` — semantic, blocker
85
+ - `0003-create-task-without-callback.md` — semantic, should-fix