autonomous-coding-toolkit 1.0.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/.claude-plugin/marketplace.json +22 -0
- package/.claude-plugin/plugin.json +13 -0
- package/LICENSE +21 -0
- package/Makefile +21 -0
- package/README.md +140 -0
- package/SECURITY.md +28 -0
- package/agents/bash-expert.md +113 -0
- package/agents/dependency-auditor.md +138 -0
- package/agents/integration-tester.md +120 -0
- package/agents/lesson-scanner.md +149 -0
- package/agents/python-expert.md +179 -0
- package/agents/service-monitor.md +141 -0
- package/agents/shell-expert.md +147 -0
- package/benchmarks/runner.sh +147 -0
- package/benchmarks/tasks/01-rest-endpoint/rubric.sh +29 -0
- package/benchmarks/tasks/01-rest-endpoint/task.md +17 -0
- package/benchmarks/tasks/02-refactor-module/task.md +8 -0
- package/benchmarks/tasks/03-fix-integration-bug/task.md +8 -0
- package/benchmarks/tasks/04-add-test-coverage/task.md +8 -0
- package/benchmarks/tasks/05-multi-file-feature/task.md +8 -0
- package/bin/act.js +238 -0
- package/commands/autocode.md +6 -0
- package/commands/cancel-ralph.md +18 -0
- package/commands/code-factory.md +53 -0
- package/commands/create-prd.md +55 -0
- package/commands/ralph-loop.md +18 -0
- package/commands/run-plan.md +117 -0
- package/commands/submit-lesson.md +122 -0
- package/docs/ARCHITECTURE.md +630 -0
- package/docs/CONTRIBUTING.md +125 -0
- package/docs/lessons/0001-bare-exception-swallowing.md +34 -0
- package/docs/lessons/0002-async-def-without-await.md +28 -0
- package/docs/lessons/0003-create-task-without-callback.md +28 -0
- package/docs/lessons/0004-hardcoded-test-counts.md +28 -0
- package/docs/lessons/0005-sqlite-without-closing.md +33 -0
- package/docs/lessons/0006-venv-pip-path.md +27 -0
- package/docs/lessons/0007-runner-state-self-rejection.md +35 -0
- package/docs/lessons/0008-quality-gate-blind-spot.md +33 -0
- package/docs/lessons/0009-parser-overcount-empty-batches.md +36 -0
- package/docs/lessons/0010-local-outside-function-bash.md +33 -0
- package/docs/lessons/0011-batch-tests-for-unimplemented-code.md +36 -0
- package/docs/lessons/0012-api-markdown-unescaped-chars.md +33 -0
- package/docs/lessons/0013-export-prefix-env-parsing.md +33 -0
- package/docs/lessons/0014-decorator-registry-import-side-effect.md +43 -0
- package/docs/lessons/0015-frontend-backend-schema-drift.md +43 -0
- package/docs/lessons/0016-event-driven-cold-start-seeding.md +44 -0
- package/docs/lessons/0017-copy-paste-logic-diverges.md +43 -0
- package/docs/lessons/0018-layer-passes-pipeline-broken.md +45 -0
- package/docs/lessons/0019-systemd-envfile-ignores-export.md +41 -0
- package/docs/lessons/0020-persist-state-incrementally.md +44 -0
- package/docs/lessons/0021-dual-axis-testing.md +48 -0
- package/docs/lessons/0022-jsx-factory-shadowing.md +43 -0
- package/docs/lessons/0023-static-analysis-spiral.md +51 -0
- package/docs/lessons/0024-shared-pipeline-implementation.md +55 -0
- package/docs/lessons/0025-defense-in-depth-all-entry-points.md +65 -0
- package/docs/lessons/0026-linter-no-rules-false-enforcement.md +54 -0
- package/docs/lessons/0027-jsx-silent-prop-drop.md +64 -0
- package/docs/lessons/0028-no-infrastructure-in-client-code.md +49 -0
- package/docs/lessons/0029-never-write-secrets-to-files.md +61 -0
- package/docs/lessons/0030-cache-merge-not-replace.md +62 -0
- package/docs/lessons/0031-verify-units-at-boundaries.md +66 -0
- package/docs/lessons/0032-module-lifecycle-subscribe-unsubscribe.md +89 -0
- package/docs/lessons/0033-async-iteration-mutable-snapshot.md +72 -0
- package/docs/lessons/0034-caller-missing-await-silent-discard.md +65 -0
- package/docs/lessons/0035-duplicate-registration-silent-overwrite.md +85 -0
- package/docs/lessons/0036-websocket-dirty-disconnect.md +33 -0
- package/docs/lessons/0037-parallel-agents-worktree-corruption.md +31 -0
- package/docs/lessons/0038-subscribe-no-stored-ref.md +36 -0
- package/docs/lessons/0039-fallback-or-default-hides-bugs.md +34 -0
- package/docs/lessons/0040-event-firehose-filter-first.md +36 -0
- package/docs/lessons/0041-ambiguous-base-dir-path-nesting.md +32 -0
- package/docs/lessons/0042-spec-compliance-insufficient.md +36 -0
- package/docs/lessons/0043-exact-count-extensible-collections.md +32 -0
- package/docs/lessons/0044-relative-file-deps-worktree.md +39 -0
- package/docs/lessons/0045-iterative-design-improvement.md +33 -0
- package/docs/lessons/0046-plan-assertion-math-bugs.md +38 -0
- package/docs/lessons/0047-pytest-single-threaded-default.md +37 -0
- package/docs/lessons/0048-integration-wiring-batch.md +40 -0
- package/docs/lessons/0049-ab-verification.md +41 -0
- package/docs/lessons/0050-editing-sourced-files-during-execution.md +33 -0
- package/docs/lessons/0051-infrastructure-fixes-cant-self-heal.md +30 -0
- package/docs/lessons/0052-uncommitted-changes-poison-quality-gates.md +31 -0
- package/docs/lessons/0053-jq-compact-flag-inconsistency.md +31 -0
- package/docs/lessons/0054-parser-matches-inside-code-blocks.md +30 -0
- package/docs/lessons/0055-agents-compensate-for-garbled-prompts.md +31 -0
- package/docs/lessons/0056-grep-count-exit-code-on-zero.md +42 -0
- package/docs/lessons/0057-new-artifacts-break-git-clean-gates.md +42 -0
- package/docs/lessons/0058-dead-config-keys-never-consumed.md +49 -0
- package/docs/lessons/0059-contract-test-shared-structures.md +53 -0
- package/docs/lessons/0060-set-e-silent-death-in-runners.md +53 -0
- package/docs/lessons/0061-context-injection-dirty-state.md +50 -0
- package/docs/lessons/0062-sibling-bug-neighborhood-scan.md +29 -0
- package/docs/lessons/0063-one-flag-two-lifetimes.md +31 -0
- package/docs/lessons/0064-test-passes-wrong-reason.md +31 -0
- package/docs/lessons/0065-pipefail-grep-count-double-output.md +39 -0
- package/docs/lessons/0066-local-keyword-outside-function.md +37 -0
- package/docs/lessons/0067-stdin-hang-non-interactive-shell.md +36 -0
- package/docs/lessons/0068-agent-builds-wrong-thing-correctly.md +31 -0
- package/docs/lessons/0069-plan-quality-dominates-execution.md +30 -0
- package/docs/lessons/0070-spec-echo-back-prevents-drift.md +31 -0
- package/docs/lessons/0071-positive-instructions-outperform-negative.md +30 -0
- package/docs/lessons/0072-lost-in-the-middle-context-placement.md +30 -0
- package/docs/lessons/0073-unscoped-lessons-cause-false-positives.md +30 -0
- package/docs/lessons/0074-stale-context-injection-wrong-batch.md +32 -0
- package/docs/lessons/0075-research-artifacts-must-persist.md +32 -0
- package/docs/lessons/0076-wrong-decomposition-contaminates-downstream.md +30 -0
- package/docs/lessons/0077-cherry-pick-merges-need-manual-resolution.md +30 -0
- package/docs/lessons/0078-static-review-without-live-test.md +30 -0
- package/docs/lessons/0079-integration-wiring-batch-required.md +32 -0
- package/docs/lessons/FRAMEWORK.md +161 -0
- package/docs/lessons/SUMMARY.md +201 -0
- package/docs/lessons/TEMPLATE.md +85 -0
- package/docs/plans/2026-02-21-code-factory-v2-design.md +204 -0
- package/docs/plans/2026-02-21-code-factory-v2-implementation-plan.md +2189 -0
- package/docs/plans/2026-02-21-code-factory-v2-phase4-design.md +537 -0
- package/docs/plans/2026-02-21-code-factory-v2-phase4-implementation-plan.md +2012 -0
- package/docs/plans/2026-02-21-hardening-pass-design.md +108 -0
- package/docs/plans/2026-02-21-hardening-pass-plan.md +1378 -0
- package/docs/plans/2026-02-21-mab-research-report.md +406 -0
- package/docs/plans/2026-02-21-marketplace-restructure-design.md +240 -0
- package/docs/plans/2026-02-21-marketplace-restructure-plan.md +832 -0
- package/docs/plans/2026-02-21-phase4-completion-plan.md +697 -0
- package/docs/plans/2026-02-21-validator-suite-design.md +148 -0
- package/docs/plans/2026-02-21-validator-suite-plan.md +540 -0
- package/docs/plans/2026-02-22-mab-research-round2.md +556 -0
- package/docs/plans/2026-02-22-mab-run-design.md +462 -0
- package/docs/plans/2026-02-22-mab-run-plan.md +2046 -0
- package/docs/plans/2026-02-22-operations-design-methodology-research.md +681 -0
- package/docs/plans/2026-02-22-research-agent-failure-taxonomy.md +532 -0
- package/docs/plans/2026-02-22-research-code-guideline-policies.md +886 -0
- package/docs/plans/2026-02-22-research-codebase-audit-refactoring.md +908 -0
- package/docs/plans/2026-02-22-research-coding-standards-documentation.md +541 -0
- package/docs/plans/2026-02-22-research-competitive-landscape.md +687 -0
- package/docs/plans/2026-02-22-research-comprehensive-testing.md +1076 -0
- package/docs/plans/2026-02-22-research-context-utilization.md +459 -0
- package/docs/plans/2026-02-22-research-cost-quality-tradeoff.md +548 -0
- package/docs/plans/2026-02-22-research-lesson-transferability.md +508 -0
- package/docs/plans/2026-02-22-research-multi-agent-coordination.md +312 -0
- package/docs/plans/2026-02-22-research-phase-integration.md +602 -0
- package/docs/plans/2026-02-22-research-plan-quality.md +428 -0
- package/docs/plans/2026-02-22-research-prompt-engineering.md +558 -0
- package/docs/plans/2026-02-22-research-unconventional-perspectives.md +528 -0
- package/docs/plans/2026-02-22-research-user-adoption.md +638 -0
- package/docs/plans/2026-02-22-research-verification-effectiveness.md +433 -0
- package/docs/plans/2026-02-23-agent-suite-design.md +299 -0
- package/docs/plans/2026-02-23-agent-suite-plan.md +578 -0
- package/docs/plans/2026-02-23-phase3-cost-infrastructure-design.md +148 -0
- package/docs/plans/2026-02-23-phase3-cost-infrastructure-plan.md +1062 -0
- package/docs/plans/2026-02-23-research-bash-expert-agent.md +543 -0
- package/docs/plans/2026-02-23-research-dependency-auditor-agent.md +564 -0
- package/docs/plans/2026-02-23-research-improving-existing-agents.md +503 -0
- package/docs/plans/2026-02-23-research-integration-tester-agent.md +454 -0
- package/docs/plans/2026-02-23-research-python-expert-agent.md +429 -0
- package/docs/plans/2026-02-23-research-service-monitor-agent.md +425 -0
- package/docs/plans/2026-02-23-research-shell-expert-agent.md +533 -0
- package/docs/plans/2026-02-23-roadmap-to-completion.md +530 -0
- package/docs/plans/2026-02-24-headless-module-split-design.md +98 -0
- package/docs/plans/2026-02-24-headless-module-split.md +443 -0
- package/docs/plans/2026-02-24-lesson-scope-metadata-design.md +228 -0
- package/docs/plans/2026-02-24-lesson-scope-metadata-plan.md +968 -0
- package/docs/plans/2026-02-24-npm-packaging-design.md +841 -0
- package/docs/plans/2026-02-24-npm-packaging-plan.md +1965 -0
- package/docs/plans/audit-findings.md +186 -0
- package/docs/telegram-notification-format.md +98 -0
- package/examples/example-plan.md +51 -0
- package/examples/example-prd.json +72 -0
- package/examples/example-roadmap.md +33 -0
- package/examples/quickstart-plan.md +63 -0
- package/hooks/hooks.json +26 -0
- package/hooks/setup-symlinks.sh +48 -0
- package/hooks/stop-hook.sh +135 -0
- package/package.json +47 -0
- package/policies/bash.md +71 -0
- package/policies/python.md +71 -0
- package/policies/testing.md +61 -0
- package/policies/universal.md +60 -0
- package/scripts/analyze-report.sh +97 -0
- package/scripts/architecture-map.sh +145 -0
- package/scripts/auto-compound.sh +273 -0
- package/scripts/batch-audit.sh +42 -0
- package/scripts/batch-test.sh +101 -0
- package/scripts/entropy-audit.sh +221 -0
- package/scripts/failure-digest.sh +51 -0
- package/scripts/generate-ast-rules.sh +96 -0
- package/scripts/init.sh +112 -0
- package/scripts/lesson-check.sh +428 -0
- package/scripts/lib/common.sh +61 -0
- package/scripts/lib/cost-tracking.sh +153 -0
- package/scripts/lib/ollama.sh +60 -0
- package/scripts/lib/progress-writer.sh +128 -0
- package/scripts/lib/run-plan-context.sh +215 -0
- package/scripts/lib/run-plan-echo-back.sh +231 -0
- package/scripts/lib/run-plan-headless.sh +396 -0
- package/scripts/lib/run-plan-notify.sh +57 -0
- package/scripts/lib/run-plan-parser.sh +81 -0
- package/scripts/lib/run-plan-prompt.sh +215 -0
- package/scripts/lib/run-plan-quality-gate.sh +132 -0
- package/scripts/lib/run-plan-routing.sh +315 -0
- package/scripts/lib/run-plan-sampling.sh +170 -0
- package/scripts/lib/run-plan-scoring.sh +146 -0
- package/scripts/lib/run-plan-state.sh +142 -0
- package/scripts/lib/run-plan-team.sh +199 -0
- package/scripts/lib/telegram.sh +54 -0
- package/scripts/lib/thompson-sampling.sh +176 -0
- package/scripts/license-check.sh +74 -0
- package/scripts/mab-run.sh +575 -0
- package/scripts/module-size-check.sh +146 -0
- package/scripts/patterns/async-no-await.yml +5 -0
- package/scripts/patterns/bare-except.yml +6 -0
- package/scripts/patterns/empty-catch.yml +6 -0
- package/scripts/patterns/hardcoded-localhost.yml +9 -0
- package/scripts/patterns/retry-loop-no-backoff.yml +12 -0
- package/scripts/pipeline-status.sh +197 -0
- package/scripts/policy-check.sh +226 -0
- package/scripts/prior-art-search.sh +133 -0
- package/scripts/promote-mab-lessons.sh +126 -0
- package/scripts/prompts/agent-a-superpowers.md +29 -0
- package/scripts/prompts/agent-b-ralph.md +29 -0
- package/scripts/prompts/judge-agent.md +61 -0
- package/scripts/prompts/planner-agent.md +44 -0
- package/scripts/pull-community-lessons.sh +90 -0
- package/scripts/quality-gate.sh +266 -0
- package/scripts/research-gate.sh +90 -0
- package/scripts/run-plan.sh +329 -0
- package/scripts/scope-infer.sh +159 -0
- package/scripts/setup-ralph-loop.sh +155 -0
- package/scripts/telemetry.sh +230 -0
- package/scripts/tests/run-all-tests.sh +52 -0
- package/scripts/tests/test-act-cli.sh +46 -0
- package/scripts/tests/test-agents-md.sh +87 -0
- package/scripts/tests/test-analyze-report.sh +114 -0
- package/scripts/tests/test-architecture-map.sh +89 -0
- package/scripts/tests/test-auto-compound.sh +169 -0
- package/scripts/tests/test-batch-test.sh +65 -0
- package/scripts/tests/test-benchmark-runner.sh +25 -0
- package/scripts/tests/test-common.sh +168 -0
- package/scripts/tests/test-cost-tracking.sh +158 -0
- package/scripts/tests/test-echo-back.sh +180 -0
- package/scripts/tests/test-entropy-audit.sh +146 -0
- package/scripts/tests/test-failure-digest.sh +66 -0
- package/scripts/tests/test-generate-ast-rules.sh +145 -0
- package/scripts/tests/test-helpers.sh +82 -0
- package/scripts/tests/test-init.sh +47 -0
- package/scripts/tests/test-lesson-check.sh +278 -0
- package/scripts/tests/test-lesson-local.sh +55 -0
- package/scripts/tests/test-license-check.sh +109 -0
- package/scripts/tests/test-mab-run.sh +182 -0
- package/scripts/tests/test-ollama-lib.sh +49 -0
- package/scripts/tests/test-ollama.sh +60 -0
- package/scripts/tests/test-pipeline-status.sh +198 -0
- package/scripts/tests/test-policy-check.sh +124 -0
- package/scripts/tests/test-prior-art-search.sh +96 -0
- package/scripts/tests/test-progress-writer.sh +140 -0
- package/scripts/tests/test-promote-mab-lessons.sh +110 -0
- package/scripts/tests/test-pull-community-lessons.sh +149 -0
- package/scripts/tests/test-quality-gate.sh +241 -0
- package/scripts/tests/test-research-gate.sh +132 -0
- package/scripts/tests/test-run-plan-cli.sh +86 -0
- package/scripts/tests/test-run-plan-context.sh +305 -0
- package/scripts/tests/test-run-plan-e2e.sh +153 -0
- package/scripts/tests/test-run-plan-headless.sh +424 -0
- package/scripts/tests/test-run-plan-notify.sh +124 -0
- package/scripts/tests/test-run-plan-parser.sh +217 -0
- package/scripts/tests/test-run-plan-prompt.sh +254 -0
- package/scripts/tests/test-run-plan-quality-gate.sh +222 -0
- package/scripts/tests/test-run-plan-routing.sh +178 -0
- package/scripts/tests/test-run-plan-scoring.sh +148 -0
- package/scripts/tests/test-run-plan-state.sh +261 -0
- package/scripts/tests/test-run-plan-team.sh +157 -0
- package/scripts/tests/test-scope-infer.sh +150 -0
- package/scripts/tests/test-setup-ralph-loop.sh +63 -0
- package/scripts/tests/test-telegram-env.sh +38 -0
- package/scripts/tests/test-telegram.sh +121 -0
- package/scripts/tests/test-telemetry.sh +46 -0
- package/scripts/tests/test-thompson-sampling.sh +139 -0
- package/scripts/tests/test-validate-all.sh +60 -0
- package/scripts/tests/test-validate-commands.sh +89 -0
- package/scripts/tests/test-validate-hooks.sh +98 -0
- package/scripts/tests/test-validate-lessons.sh +150 -0
- package/scripts/tests/test-validate-plan-quality.sh +235 -0
- package/scripts/tests/test-validate-plans.sh +187 -0
- package/scripts/tests/test-validate-plugin.sh +106 -0
- package/scripts/tests/test-validate-prd.sh +184 -0
- package/scripts/tests/test-validate-skills.sh +134 -0
- package/scripts/validate-all.sh +57 -0
- package/scripts/validate-commands.sh +67 -0
- package/scripts/validate-hooks.sh +89 -0
- package/scripts/validate-lessons.sh +98 -0
- package/scripts/validate-plan-quality.sh +369 -0
- package/scripts/validate-plans.sh +120 -0
- package/scripts/validate-plugin.sh +86 -0
- package/scripts/validate-policies.sh +42 -0
- package/scripts/validate-prd.sh +118 -0
- package/scripts/validate-skills.sh +96 -0
- package/skills/autocode/SKILL.md +285 -0
- package/skills/autocode/ab-verification.md +51 -0
- package/skills/autocode/code-quality-standards.md +37 -0
- package/skills/autocode/competitive-mode.md +364 -0
- package/skills/brainstorming/SKILL.md +97 -0
- package/skills/capture-lesson/SKILL.md +187 -0
- package/skills/check-lessons/SKILL.md +116 -0
- package/skills/dispatching-parallel-agents/SKILL.md +110 -0
- package/skills/executing-plans/SKILL.md +85 -0
- package/skills/finishing-a-development-branch/SKILL.md +201 -0
- package/skills/receiving-code-review/SKILL.md +72 -0
- package/skills/requesting-code-review/SKILL.md +59 -0
- package/skills/requesting-code-review/code-reviewer.md +82 -0
- package/skills/research/SKILL.md +145 -0
- package/skills/roadmap/SKILL.md +115 -0
- package/skills/subagent-driven-development/SKILL.md +98 -0
- package/skills/subagent-driven-development/code-quality-reviewer-prompt.md +18 -0
- package/skills/subagent-driven-development/implementer-prompt.md +73 -0
- package/skills/subagent-driven-development/spec-reviewer-prompt.md +57 -0
- package/skills/systematic-debugging/SKILL.md +134 -0
- package/skills/systematic-debugging/condition-based-waiting.md +64 -0
- package/skills/systematic-debugging/defense-in-depth.md +32 -0
- package/skills/systematic-debugging/root-cause-tracing.md +55 -0
- package/skills/test-driven-development/SKILL.md +167 -0
- package/skills/using-git-worktrees/SKILL.md +219 -0
- package/skills/using-superpowers/SKILL.md +54 -0
- package/skills/verification-before-completion/SKILL.md +140 -0
- package/skills/verify/SKILL.md +82 -0
- package/skills/writing-plans/SKILL.md +128 -0
- package/skills/writing-skills/SKILL.md +93 -0
package/docs/lessons/0058-dead-config-keys-never-consumed.md

@@ -0,0 +1,49 @@
---
id: 58
title: "Config keys registered but never consumed are dead knobs"
severity: should-fix
languages: [python]
scope: [project:autonomous-coding-toolkit]
category: silent-failures
pattern:
  type: semantic
  description: "Config keys registered in a defaults/schema file but never read via get_config or equivalent"
fix: "Wire every registered config key to a get_config call, or remove the dead registration"
example:
  bad: |
    # config_defaults.py
    register_config("automation.min_confidence", default=0.7)
    register_config("automation.max_suggestions", default=5)

    # automation.py — uses hardcoded constants, never reads config
    MIN_CONFIDENCE = 0.7
    MAX_SUGGESTIONS = 5
  good: |
    # config_defaults.py
    register_config("automation.min_confidence", default=0.7)

    # automation.py — reads from config system
    min_confidence = get_config_value("automation.min_confidence")
---

## Observation

Config keys were registered in a defaults file and exposed in a Settings UI,
but the consuming module used hardcoded module-level constants instead of
reading from config. Users could adjust settings that had zero runtime effect.

## Insight

This happens when registration and consumption are built in different work
batches. Batch N registers the config keys with defaults. Batch N+1
implements the module with hardcoded constants matching those defaults.
Neither batch verifies the integration. Dead config is worse than missing
config — it lies to operators by showing controls that do nothing.

## Lesson

Every config key registration must have a corresponding read call in the
consuming module. Add a CI check or quality gate step to detect orphaned
config keys: extract registered keys, extract consumed keys, diff them.
Config registration and consumption should happen in the same PR, or a
contract test must verify that every registered key has at least one consumer.
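The orphaned-key gate described in the Lesson (extract registered keys, extract consumed keys, diff them) can be sketched in shell. This is a sketch, not the toolkit's actual gate: the `register_config`/`get_config_value` names come from the lesson's own example, and the fixture files are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# find_orphan_keys DIR — keys passed to register_config(...) but never
# passed to get_config_value(...) anywhere under DIR.
find_orphan_keys() {
  local dir=$1
  comm -23 \
    <(grep -rhoE 'register_config\("[^"]+"' "$dir" --include='*.py' \
        | sed 's/register_config("//; s/"//g' | sort -u) \
    <(grep -rhoE 'get_config_value\("[^"]+"' "$dir" --include='*.py' \
        | sed 's/get_config_value("//; s/"//g' | sort -u)
}

# Hypothetical fixture mirroring the lesson's bad example.
tmp=$(mktemp -d)
cat > "$tmp/config_defaults.py" <<'EOF'
register_config("automation.min_confidence", default=0.7)
register_config("automation.max_suggestions", default=5)
EOF
cat > "$tmp/automation.py" <<'EOF'
min_confidence = get_config_value("automation.min_confidence")
EOF

orphans=$(find_orphan_keys "$tmp")
echo "orphaned keys: $orphans"
rm -rf "$tmp"
```

A real CI gate would exit non-zero whenever the orphan list is non-empty.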
package/docs/lessons/0059-contract-test-shared-structures.md

@@ -0,0 +1,53 @@
---
id: 59
title: "Independently-built shared structures diverge without contract tests"
severity: should-fix
languages: [all]
scope: [universal]
category: integration-boundaries
pattern:
  type: semantic
  description: "Two modules independently construct the same ordered structure (feature list, column names, schema) without a shared source or contract test"
fix: "Add a contract test asserting both structures match, or extract a shared source of truth"
example:
  bad: |
    # module_a.py — builds feature list from config iteration
    features = [f.name for section in config for f in section.fields]

    # module_b.py — builds feature list from manual append
    features = []
    features.extend(presence_features)
    features.extend(pattern_features)
    # Missing: event_features added to module_a but not here
  good: |
    # shared.py — single source of truth
    def get_feature_names(config):
        return [f.name for section in config for f in section.fields]

    # OR: contract test
    def test_feature_names_match():
        assert module_a.get_features() == module_b.get_features()
---

## Observation

Two modules independently built the same ordered list (feature names for ML
column alignment). When a new section was added to one, the other was missed.
The lists had the same names but different ordering — causing a model trained
with column 3 = "lights_on" to use column 3 = "people_count" at inference.
Silent data corruption, no error.

## Insight

When two code paths independently construct a shared structure, a developer
adding to one path must manually remember to update the other — a human-memory
contract with no compile-time enforcement. This applies to feature vectors,
schema definitions, API response formats, config key lists, enum values, and
any ordered structure where position matters.

## Lesson

When two modules independently build a structure that must match (same
elements, same order), either: (1) extract a shared source of truth that both
import, or (2) add a contract test asserting equality. Add the contract test
BEFORE adding new elements — not after discovering the divergence.
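As a sketch of option (2) from the Lesson, a contract check can diff the two independently built lists and fail loudly on divergence. The feature names and function names here are hypothetical stand-ins, not the toolkit's real modules.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical stand-ins for two modules building "the same" ordered list.
module_a_features() { printf '%s\n' presence_motion pattern_daily event_door; }
module_b_features() { printf '%s\n' presence_motion pattern_daily; }  # missed event_door

# Contract check: same elements, same order, or report divergence.
if diff <(module_a_features) <(module_b_features) >/dev/null; then
  contract=ok
else
  contract=divergent
fi
echo "contract: $contract"
```

`diff` compares positionally, so it catches both missing elements and reordering — the failure mode that silently scrambled the ML columns in the Observation.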
package/docs/lessons/0060-set-e-silent-death-in-runners.md

@@ -0,0 +1,53 @@
---
id: 60
title: "set -e kills long-running bash scripts silently when inter-step commands fail"
severity: blocker
languages: [shell]
scope: [project:autonomous-coding-toolkit]
category: silent-failures
pattern:
  type: semantic
  description: "Bash script uses set -euo pipefail without EXIT trap or guards around non-critical inter-step operations (notifications, logging, context injection). Any unguarded command failure silently terminates the entire script."
fix: "Add trap '_log_exit $?' EXIT for diagnostics, trap '' HUP PIPE for background survival, and wrap non-critical commands in { ... } || warn blocks"
example:
  bad: |
    set -euo pipefail
    for batch in ...; do
      context=$(generate_context)  # guarded
      sed '...' "$file" > "$tmp"   # NOT guarded — kills script on failure
      run_batch
      notify_success "$batch"      # NOT guarded — kills script on failure
    done
  good: |
    set -euo pipefail
    trap '_log_exit $?' EXIT
    trap '' HUP PIPE
    for batch in ...; do
      context=$(generate_context || true)
      { sed '...' "$file" > "$tmp"; } || echo "WARNING: context injection failed" >&2
      run_batch
      { notify_success "$batch"; } || echo "WARNING: notification failed" >&2
    done
---

## Observation

`run-plan.sh` repeatedly died silently between batches during headless execution. The process simply vanished — no error output, no log entry, no state update. The script completed one batch successfully, then disappeared before starting the next.

Log files showed the last batch succeeded (quality gate passed, state updated), but the process was gone. Restarting with `--start-batch N` always worked for the next batch, then died again.

## Insight

Three compounding factors:

1. **`set -euo pipefail` with no EXIT trap.** Any command returning non-zero anywhere in the inter-batch code (CLAUDE.md sed manipulation, notification calls, failure pattern recording) kills the script instantly. Since there's no EXIT trap, the death is completely silent — no stack trace, no error message, no breadcrumb.

2. **No signal handling — specifically SIGPIPE (confirmed).** The script pipes `claude -p` output through `tee` to write to both a log file and stdout. When stdout is a pipe to a task manager (Claude Code background task), the pipe can close between batches. `tee` then receives SIGPIPE (signal 13, exit code 141), which kills the process. Background processes need `trap '' HUP PIPE` to survive both terminal disconnects and broken pipes.

3. **Non-critical operations not guarded.** The loop contained ~15 unguarded commands between the critical path (run batch → quality gate). Notifications, context injection, sed transformations, git log summaries — all could fail for transient reasons, and each failure was fatal under `set -e`.

The pattern is: `set -e` is for correctness on the *critical path*. But when a long-running script has both critical operations (batch execution, quality gates) and non-critical operations (notifications, logging, context assembly), `set -e` can't distinguish between them. Non-critical failures become critical kills.

## Lesson

Long-running bash scripts with `set -e` must: (1) add `trap '_log_exit $?' EXIT` so unexpected terminations leave diagnostic breadcrumbs, (2) add `trap '' HUP PIPE` if they run in the background, and (3) wrap every non-critical operation in `{ commands; } || warn` blocks so transient failures don't kill the entire pipeline. The rule: if losing this operation wouldn't invalidate the batch, it must not be able to kill the script.
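The three-part fix in the Lesson condenses into a runnable skeleton. This is a sketch: `_log_exit`, `warn`, and the deliberately failing notification are illustrative, not the real run-plan.sh code.

```shell
#!/usr/bin/env bash
set -euo pipefail

breadcrumbs=$(mktemp)

# (1) EXIT trap: any termination, expected or not, leaves a diagnostic line.
_log_exit() { echo "exited with status=$1" >> "$breadcrumbs"; }
trap '_log_exit $?' EXIT

# (2) Ignore HUP and PIPE so background runs survive terminal disconnects
# and closed pipes instead of dying silently.
trap '' HUP PIPE

warn() { echo "WARNING: $*" >&2; }

# (3) Non-critical step guarded: `false` stands in for a flaky notification.
# Without the { ...; } || guard, set -e would kill the script right here.
{ false; } || warn "notification failed (non-fatal)"

# The critical path below still gets set -e's protection.
survived=yes
echo "batch loop continues: survived=$survived"
```

Remove the `|| warn` guard and the script dies at the `false` line with no output — exactly the silent death described in the Observation.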
package/docs/lessons/0061-context-injection-dirty-state.md

@@ -0,0 +1,50 @@
---
id: 61
title: "Context injection into tracked files creates dirty git state when subprocess commits"
severity: should-fix
languages: [shell]
scope: [project:autonomous-coding-toolkit]
category: integration-boundaries
pattern:
  type: semantic
  description: "Script injects temporary content into a tracked file (e.g., CLAUDE.md), runs a subprocess that may commit that file, then tries to restore from backup — creating a diff against the committed version."
fix: "Use git checkout -- <file> to restore to HEAD state instead of backup-based restoration. Fall back to backup only if file was never tracked."
example:
  bad: |
    backup=$(cat "$file")
    echo "$context" >> "$file"
    run_subprocess             # subprocess commits $file with injected content
    echo "$backup" > "$file"   # now differs from HEAD — dirty state
  good: |
    echo "$context" >> "$file"
    run_subprocess
    git checkout -- "$file" 2>/dev/null || {
      # fallback: file was never tracked
      if [[ "$existed_before" == false ]]; then
        rm -f "$file"
      fi
    }
---

## Observation

`run-plan.sh` injects per-batch context into `CLAUDE.md` before each batch (a `## Run-Plan: Batch N` section with failure patterns, prior batch summaries, and referenced files). After the batch completes, it restores CLAUDE.md from a backup taken before injection.

Batch 5 failed the quality gate with "uncommitted changes to CLAUDE.md" even though the batch itself passed all tests. The issue: the Claude subprocess committed CLAUDE.md with the injected context as part of its work. The restoration code then wrote the pre-injection backup, creating a diff against the now-committed HEAD that included the injected content.

## Insight

This is an integration boundary bug between two phases that both touch the same file:

1. **Orchestrator phase** — injects context into CLAUDE.md, expects to restore it after
2. **Subprocess phase** — sees CLAUDE.md as a project file, may commit it with its changes

The backup-based restoration assumes CLAUDE.md's HEAD hasn't changed during the subprocess run. But if the subprocess commits the file (which is correct behavior — it should commit its changes), the backup is now out of date. Writing the backup creates a diff between HEAD (with injected content) and the working tree (without it).

The fix is to use `git checkout -- CLAUDE.md`, which always restores to whatever HEAD currently is — regardless of whether the subprocess committed the injected version.

Edge case: if CLAUDE.md was never tracked (created fresh by injection), `git checkout` fails. Fall back to `rm -f` in that case.

## Lesson

When injecting temporary content into tracked files before running a subprocess that may commit, never restore from an in-memory backup. The subprocess may commit the modified version, making the backup stale. Use `git checkout -- <file>` to restore to HEAD state, which is always correct regardless of whether the subprocess committed. Guard the edge case where the file wasn't previously tracked.
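The HEAD-based restore can be exercised end-to-end in a throwaway repository. This is a sketch under the lesson's assumptions: the commit that simulates the subprocess is hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email t@example.com
git config user.name t
echo "original" > CLAUDE.md
git add CLAUDE.md
git commit -qm "track CLAUDE.md"
existed_before=true

# Orchestrator injects per-batch context into the tracked file.
echo "## Run-Plan: Batch 1 context" >> CLAUDE.md

# Subprocess commits the injected version as part of its work.
git add CLAUDE.md
git commit -qm "batch work (includes injected context)"

# Restore to HEAD — correct whether or not the subprocess committed.
git checkout -- CLAUDE.md 2>/dev/null || {
  if [[ "$existed_before" == false ]]; then
    rm -f CLAUDE.md   # fallback: file was never tracked
  fi
}

dirty=$(git status --porcelain)
echo "dirty entries: '${dirty}'"
cd / && rm -rf "$repo"
```

Writing back an in-memory backup at the restore step would instead leave CLAUDE.md differing from the new HEAD — exactly the dirty state the quality gate flagged.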
@@ -0,0 +1,29 @@
|
|
|
1
|
+
---
|
|
2
|
+
id: 62
|
|
3
|
+
title: "Sibling bugs hide next to the fix"
|
|
4
|
+
severity: should-fix
|
|
5
|
+
languages: [all]
|
|
6
|
+
scope: [project:autonomous-coding-toolkit]
|
|
7
|
+
category: integration-boundaries
|
|
8
|
+
pattern:
|
|
9
|
+
type: semantic
|
|
10
|
+
description: "When fixing a bug in a function, scan adjacent functions in the same file for the same root cause pattern"
|
|
11
|
+
fix: "After fixing a function, grep the same file for the same anti-pattern in sibling functions"
|
|
12
|
+
example:
|
|
13
|
+
bad: |
|
|
14
|
+
# Fix complete_batch's --argjson crash, ship it
|
|
15
|
+
# (set_quality_gate has the same crash 30 lines below)
|
|
16
|
+
good: |
|
|
17
|
+
# Fix complete_batch's --argjson crash
|
|
18
|
+
# Scan file: grep -n 'argjson' run-plan-state.sh
|
|
19
|
+
# Found same pattern in set_quality_gate — fix both
|
|
20
|
+
---
|
|
21
|
+
|
|
22
|
+
## Observation
|
|
23
|
+
During the Phase 1 bug fixes, code-quality reviewers found the exact same bug in a sibling function within the same file in 2 of 8 tasks. `set_quality_gate` had the same `--argjson` crash as `complete_batch`; the API curl lacked `--connect-timeout` just like the health-check curl six lines above it.
|
|
24
|
+
|
|
25
|
+
## Insight
|
|
26
|
+
Implementers fix what the ticket says. The same root cause often exists in nearby code written at the same time with the same assumptions. Fresh-context subagents don't carry knowledge of what was just fixed, so they can't pattern-match on "I just fixed this — is there another one?"
|
|
27
|
+
|
|
28
|
+
## Lesson
|
|
29
|
+
After fixing a bug, grep the entire file for the same anti-pattern before committing. If the root cause is a bad API usage (like `--argjson` with strings), search for all call sites of that API in the file. Code review should always check: "does this same bug exist anywhere else in this file?"
|
|
@@ -0,0 +1,31 @@
|
|
|
1
|
+
---
|
|
2
|
+
id: 63
|
|
3
|
+
title: "One boolean flag serving two lifetimes is a conflation bug"
|
|
4
|
+
severity: should-fix
|
|
5
|
+
languages: [shell, python, javascript]
|
|
6
|
+
scope: [universal]
|
|
7
|
+
category: silent-failures
|
|
8
|
+
pattern:
|
|
9
|
+
type: semantic
|
|
10
|
+
description: "A boolean flag that is set in one lifecycle (e.g., per-iteration) but read in another (e.g., post-loop) — the flag's meaning changes depending on when you read it"
|
|
11
|
+
fix: "Split into separate variables with explicit lifecycle names (e.g., _baseline_stash_created vs _winner_stash_created)"
|
|
12
|
+
example:
|
|
13
|
+
bad: |
|
|
14
|
+
_stash_created=false
|
|
15
|
+
# Set during per-candidate loop (baseline purpose)
|
|
16
|
+
# Read after loop ends (winner purpose)
|
|
17
|
+
# Same flag, different meanings at different times
|
|
18
|
+
good: |
|
|
19
|
+
_baseline_stash_created=false
|
|
20
|
+
_winner_stash_created=false
|
|
21
|
+
# Each flag has one meaning throughout its entire lifetime
|
|
22
|
+
---
|
|
23
|
+
|
|
24
|
+
## Observation
|
|
25
|
+
In the sampling stash fix (#27), `_stash_created` tracked both "was the baseline stashed?" (per-candidate lifecycle) and "was the winner stashed?" (post-loop lifecycle). When candidate 0 passed and its winner state was stashed, the next candidate's restore code popped the winner stash thinking it was the baseline.
|
|
26
|
+
|
|
27
|
+
## Insight
|
|
28
|
+
A boolean with two meanings at different points in time is a state machine with implicit transitions. The transitions are invisible because the variable name doesn't change — only the programmer's mental model of what it represents changes. This is especially dangerous in loops where the flag is set in one iteration and read in a different context.
|
|
29
|
+
|
|
30
|
+
## Lesson
|
|
31
|
+
When a flag variable is set in one code block and read in a different block with a different purpose, split it into named variables that encode their purpose. The variable name should make its lifecycle explicit. If you can't describe when the flag is "active" in one sentence, it needs to be split.
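
A minimal runnable sketch of the split — the loop and function names are illustrative, not the actual sampling code from #27:

```shell
# Each flag keeps exactly one meaning for its entire lifetime,
# so the restore logic can never confuse baseline and winner state.
_baseline_stash_created=false
_winner_stash_created=false

stash_baseline() { _baseline_stash_created=true; }
stash_winner()   { _winner_stash_created=true; }
restore_baseline() {
  # Reads ONLY the baseline flag — it cannot pop a winner stash.
  if [ "$_baseline_stash_created" = true ]; then
    _baseline_stash_created=false
  fi
}

for candidate in 0 1 2; do
  stash_baseline                    # per-candidate lifecycle begins
  if [ "$candidate" -eq 0 ]; then   # pretend candidate 0 passes
    stash_winner                    # post-loop lifecycle begins
  fi
  restore_baseline                  # per-candidate lifecycle ends
done
# After the loop, only the winner flag is still set.
```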
|
|
@@ -0,0 +1,31 @@
|
|
|
1
|
+
---
|
|
2
|
+
id: 64
|
|
3
|
+
title: "Tests that pass for the wrong reason provide false confidence"
|
|
4
|
+
severity: should-fix
|
|
5
|
+
languages: [all]
|
|
6
|
+
scope: [universal]
|
|
7
|
+
category: test-anti-patterns
|
|
8
|
+
pattern:
|
|
9
|
+
type: syntactic
|
|
10
|
+
regex: "\\bPATH=\"[^\":]*\""
|
|
11
|
+
description: "PATH assignment without colon (no prepend/append) — replaces entire PATH, removing all other commands from the environment"
|
|
12
|
+
fix: "Verify the test fails when the fix is reverted. Ensure test setup affects only the variable under test, not its dependencies."
|
|
13
|
+
example:
|
|
14
|
+
bad: |
|
|
15
|
+
# Test: free is missing → exit 2
|
|
16
|
+
PATH="/fake/bin" # removes awk too!
|
|
17
|
+
check_memory_available 4 # exits 2 because awk is missing, not free
|
|
18
|
+
good: |
|
|
19
|
+
# Test: free is missing → exit 2
|
|
20
|
+
PATH="/fake/bin:$PATH" # fake free, real awk
|
|
21
|
+
check_memory_available 4 # exits 2 because free outputs nothing
|
|
22
|
+
---
|
|
23
|
+
|
|
24
|
+
## Observation
|
|
25
|
+
A test to verify `check_memory_available` returns exit 2 when `free` is unavailable set `PATH="/fake/bin"` (replacing entire PATH). This also removed `awk`, so the function returned exit 2 because awk failed — not because free was missing. The test passed, but it wasn't testing what it claimed.
|
|
26
|
+
|
|
27
|
+
## Insight
|
|
28
|
+
Tests that replace environment state (PATH, env vars, config files) can have blast radius beyond the intended target. The test author thinks they're isolating one variable, but they're changing a system-wide setting that affects multiple tools in the pipeline.
|
|
29
|
+
|
|
30
|
+
## Lesson
|
|
31
|
+
When mocking system commands, prepend to PATH (`PATH="$fake:$PATH"`) rather than replacing it. After writing a test, revert the fix and verify the test fails — if it still passes, it's testing the wrong thing. Name tests to describe the code path they exercise, not just the expected outcome.
|
|
@@ -0,0 +1,39 @@
|
|
|
1
|
+
---
|
|
2
|
+
id: 65
|
|
3
|
+
title: "pipefail + grep -c + fallback produces double output"
|
|
4
|
+
severity: should-fix
|
|
5
|
+
languages: [shell]
|
|
6
|
+
scope: [language:bash]
|
|
7
|
+
category: silent-failures
|
|
8
|
+
pattern:
|
|
9
|
+
type: syntactic
|
|
10
|
+
regex: "grep\\s+-c.*\\|\\|\\s*echo\\s+0"
|
|
11
|
+
description: "grep -c with an || echo 0 fallback produces '0\\n0' when there are no matches — grep writes 0 and exits 1, then the fallback writes another 0"
|
|
12
|
+
fix: "Wrap grep -c in a helper function that captures the exit code internally, or use || true inside a subshell"
|
|
13
|
+
positive_alternative: "Use a _count_matches helper: result=$(grep -c ... || true); echo \"${result:-0}\""
|
|
14
|
+
example:
|
|
15
|
+
bad: |
|
|
16
|
+
set -euo pipefail
|
|
17
|
+
count=$(echo "$text" | grep -c "pattern" || echo 0)
|
|
18
|
+
# Produces "0\n0" when no match — grep outputs 0, then fallback also outputs 0
|
|
19
|
+
good: |
|
|
20
|
+
set -euo pipefail
|
|
21
|
+
_count_matches() {
|
|
22
|
+
local result exit_code=0
|
|
23
|
+
result=$(grep -ciE "$1" 2>&1) || exit_code=$?
|
|
24
|
+
[[ $exit_code -le 1 ]] && echo "${result:-0}" || echo "0"
|
|
25
|
+
}
|
|
26
|
+
count=$(echo "$text" | _count_matches "pattern")
|
|
27
|
+
---
|
|
28
|
+
|
|
29
|
+
## Observation
|
|
30
|
+
|
|
31
|
+
In `validate-plan-quality.sh`, scoring functions used `grep -ciE "pattern" || echo 0` to count matches safely. Under `set -euo pipefail`, when grep found zero matches (exit 1), both grep's output ("0") AND the fallback ("0") were written to stdout, producing "0\n0" instead of "0".
|
|
32
|
+
|
|
33
|
+
## Insight
|
|
34
|
+
|
|
35
|
+
`grep -c` prints "0" and then exits 1 when there are no matches. Since grep is the last command in the pipeline, its non-zero status is the pipeline's status (pipefail isn't even required), so the `|| echo 0` fallback — added to survive `set -e` — executes. But grep already wrote "0" to stdout before exiting; the fallback appends another "0". The extra line slips through most tests yet corrupts any downstream parsing that expects a single number.
|
|
36
|
+
|
|
37
|
+
## Lesson
|
|
38
|
+
|
|
39
|
+
Never use `command || echo default` for commands that write output before failing. Instead, capture the exit code in a wrapper function and handle it explicitly. The `_count_matches` pattern works: run grep inside the function, capture exit code, distinguish "no matches" (exit 1, normal) from "grep error" (exit 2+, unexpected).
|
|
@@ -0,0 +1,37 @@
|
|
|
1
|
+
---
|
|
2
|
+
id: 66
|
|
3
|
+
title: "local keyword used outside function scope"
|
|
4
|
+
severity: should-fix
|
|
5
|
+
languages: [shell]
|
|
6
|
+
scope: [language:bash]
|
|
7
|
+
category: silent-failures
|
|
8
|
+
pattern:
|
|
9
|
+
type: semantic
|
|
10
|
+
description: "bash `local` keyword outside a function body — bash rejects it ('local: can only be used in a function'), the variable is never declared, and dash/sh reject it too"
|
|
11
|
+
fix: "Only use `local` inside function bodies. At script top-level, just assign the variable directly."
|
|
12
|
+
positive_alternative: "Remove `local` from top-level variable assignments; use plain assignment instead"
|
|
13
|
+
example:
|
|
14
|
+
bad: |
|
|
15
|
+
# At script top-level (not inside a function)
|
|
16
|
+
if [[ "$JSON_OUTPUT" == true ]]; then
|
|
17
|
+
local escaped_plan
|
|
18
|
+
escaped_plan=$(printf '%s' "$PLAN_FILE" | jq -Rs '.')
|
|
19
|
+
fi
|
|
20
|
+
good: |
|
|
21
|
+
# At script top-level — no local keyword
|
|
22
|
+
if [[ "$JSON_OUTPUT" == true ]]; then
|
|
23
|
+
escaped_plan=$(printf '%s' "$PLAN_FILE" | jq -Rs '.')
|
|
24
|
+
fi
|
|
25
|
+
---
|
|
26
|
+
|
|
27
|
+
## Observation
|
|
28
|
+
|
|
29
|
+
In `validate-plan-quality.sh`, the JSON output block at the script's top level used `local escaped_plan` to declare a variable. The script appeared to run, but `local` is only valid inside functions — bash reports `local: can only be used in a function`, and the declaration has no effect.
|
|
30
|
+
|
|
31
|
+
## Insight
|
|
32
|
+
|
|
33
|
+
Bash rejects `local` outside a function; the error is easy to miss when stderr is discarded, and the variable is never actually declared. It is also a portability landmine: if the script is run with `dash`/`sh` or sourced by another script, it fails there too. It also misleads readers into thinking the code is inside a function when it isn't.
|
|
34
|
+
|
|
35
|
+
## Lesson
|
|
36
|
+
|
|
37
|
+
Reserve `local` for function bodies exclusively. At script top-level, use plain variable assignment. This is especially important in scripts that use `source` chains, where the boundary between "inside a function" and "top-level" blurs across files.
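
The difference is easy to verify directly (a quick check, assuming a `bash` binary on PATH):

```shell
# `local` at top level: bash rejects it — non-zero exit, variable not set.
rc_local=$(bash -c 'local x=1 2>/dev/null; echo $?')
# Plain assignment is the portable top-level form: succeeds, variable set.
rc_plain=$(bash -c 'x=1; echo "$?:$x"')
```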
|
|
@@ -0,0 +1,36 @@
|
|
|
1
|
+
---
|
|
2
|
+
id: 67
|
|
3
|
+
title: "Scripts hang when stdin is a socket or pipe in non-interactive shells"
|
|
4
|
+
severity: should-fix
|
|
5
|
+
languages: [shell]
|
|
6
|
+
scope: [project:autonomous-coding-toolkit]
|
|
7
|
+
category: silent-failures
|
|
8
|
+
pattern:
|
|
9
|
+
type: semantic
|
|
10
|
+
description: "Script reads from stdin without redirection — hangs in CI, Claude Code, cron, or any environment where stdin is not a terminal"
|
|
11
|
+
fix: "Add </dev/null to commands that may read stdin, or redirect stdin at the test harness level"
|
|
12
|
+
positive_alternative: "Run subprocesses with explicit stdin: bash script.sh </dev/null"
|
|
13
|
+
example:
|
|
14
|
+
bad: |
|
|
15
|
+
# Test harness — stdin inherited from parent (may be socket/pipe)
|
|
16
|
+
for t in scripts/tests/test-*.sh; do
|
|
17
|
+
bash "$t" >/dev/null 2>&1
|
|
18
|
+
done
|
|
19
|
+
good: |
|
|
20
|
+
# Test harness — stdin explicitly from /dev/null
|
|
21
|
+
for t in scripts/tests/test-*.sh; do
|
|
22
|
+
bash "$t" </dev/null >/dev/null 2>&1
|
|
23
|
+
done
|
|
24
|
+
---
|
|
25
|
+
|
|
26
|
+
## Observation
|
|
27
|
+
|
|
28
|
+
Running the test suite from Claude Code's shell caused `test-lesson-check.sh` to hang indefinitely. The process was blocked on `unix_stream_read_generic` — reading from a Unix socket that served as stdin in the Claude environment. Multiple stale processes accumulated across retries.
|
|
29
|
+
|
|
30
|
+
## Insight
|
|
31
|
+
|
|
32
|
+
Claude Code (and similar environments like CI runners, cron jobs, tmux send-keys) connects stdin to non-terminal file descriptors. Any script that reads stdin — even indirectly through a command like `read`, `cat` without args, or a tool that checks for piped input — will block forever waiting for data that never arrives. This is invisible in interactive testing because the terminal provides EOF on Ctrl+D.
|
|
33
|
+
|
|
34
|
+
## Lesson
|
|
35
|
+
|
|
36
|
+
Always redirect stdin from `/dev/null` when invoking scripts in non-interactive contexts. The safest place is the test harness loop itself (`bash "$t" </dev/null`), which protects all tests regardless of what they do internally. For individual scripts, audit for stdin-reading commands and add explicit `/dev/null` redirection.
|
|
@@ -0,0 +1,31 @@
|
|
|
1
|
+
---
|
|
2
|
+
id: 68
|
|
3
|
+
title: "Agent builds the wrong thing correctly"
|
|
4
|
+
severity: blocker
|
|
5
|
+
languages: [all]
|
|
6
|
+
scope: [universal]
|
|
7
|
+
category: specification-drift
|
|
8
|
+
pattern:
|
|
9
|
+
type: semantic
|
|
10
|
+
description: "Agent misinterprets requirements — code passes tests but doesn't match the actual spec. Tests were written against the agent's interpretation, not the user's intent."
|
|
11
|
+
fix: "Before implementation, echo back the spec in your own words and get explicit user confirmation. Write acceptance criteria from the spec, not from your interpretation."
|
|
12
|
+
example:
|
|
13
|
+
bad: |
|
|
14
|
+
# User asks for "retry with backoff"
|
|
15
|
+
# Agent implements retry with fixed 1s delay
|
|
16
|
+
# Test checks retry happens — passes
|
|
17
|
+
# But spec meant exponential backoff
|
|
18
|
+
good: |
|
|
19
|
+
# Echo back: "I'll implement retry with exponential backoff: 1s, 2s, 4s, 8s, max 30s"
|
|
20
|
+
# User confirms or corrects
|
|
21
|
+
# Write test that verifies exponential timing
|
|
22
|
+
---
|
|
23
|
+
|
|
24
|
+
## Observation
|
|
25
|
+
An agent received a feature request, implemented it with full test coverage, and all tests passed. But the implementation didn't match what the user actually wanted — the agent's interpretation of the requirements diverged from the user's intent. The bug was only discovered during manual review.
|
|
26
|
+
|
|
27
|
+
## Insight
|
|
28
|
+
When an agent writes both the implementation AND the tests, the tests validate the agent's understanding, not the user's requirements. This creates a closed loop where wrong code passes wrong tests. The spec is the only external anchor — but agents often skip the echo-back step that would catch misinterpretation.
|
|
29
|
+
|
|
30
|
+
## Lesson
|
|
31
|
+
Always echo back requirements before implementing. The echo-back gate catches the 60%+ of failures that come from spec misunderstanding (not from coding errors). Write acceptance criteria from the original spec text, not from your paraphrase of it.
|
|
@@ -0,0 +1,30 @@
|
|
|
1
|
+
---
|
|
2
|
+
id: 69
|
|
3
|
+
title: "Plan quality dominates execution quality 3:1"
|
|
4
|
+
severity: should-fix
|
|
5
|
+
languages: [all]
|
|
6
|
+
scope: [universal]
|
|
7
|
+
category: specification-drift
|
|
8
|
+
pattern:
|
|
9
|
+
type: semantic
|
|
10
|
+
description: "Investing heavily in execution optimization (retries, sampling, model routing) while the plan itself has gaps, ambiguities, or wrong decomposition. A bad plan executed perfectly still produces wrong output."
|
|
11
|
+
fix: "Invest in plan quality first: scorecard the plan for completeness, correctness of decomposition, and dependency ordering before starting execution."
|
|
12
|
+
example:
|
|
13
|
+
bad: |
|
|
14
|
+
# Plan says "add authentication" with no detail
|
|
15
|
+
# Execution uses MAB + competitive mode + 3 retries
|
|
16
|
+
# Result: perfectly executed wrong authentication scheme
|
|
17
|
+
good: |
|
|
18
|
+
# Plan specifies: JWT with refresh tokens, 15min access TTL
|
|
19
|
+
# Plan scorecard: all tasks have acceptance criteria
|
|
20
|
+
# Simple headless execution gets it right first try
|
|
21
|
+
---
|
|
22
|
+
|
|
23
|
+
## Observation
|
|
24
|
+
Across multiple autonomous coding runs, the correlation between plan quality and final output quality was 3x stronger than the correlation between execution quality (retries, model choice, sampling) and output quality. The best execution infrastructure cannot compensate for a plan that decomposes the work incorrectly or omits critical requirements.
|
|
25
|
+
|
|
26
|
+
## Insight
|
|
27
|
+
Plan quality and execution quality are not interchangeable investments. A well-specified plan with simple execution beats a vague plan with sophisticated execution infrastructure. The plan is the specification — if it's wrong, every downstream batch inherits the error.
|
|
28
|
+
|
|
29
|
+
## Lesson
|
|
30
|
+
Score your plan before executing it. Check: Does every task have clear acceptance criteria? Are dependencies correctly ordered? Are there any ambiguous requirements? A 30-minute plan review saves hours of execution rework.
|
|
@@ -0,0 +1,31 @@
|
|
|
1
|
+
---
|
|
2
|
+
id: 70
|
|
3
|
+
title: "Spec echo-back prevents 60% of agent failures"
|
|
4
|
+
severity: should-fix
|
|
5
|
+
languages: [all]
|
|
6
|
+
scope: [universal]
|
|
7
|
+
category: specification-drift
|
|
8
|
+
pattern:
|
|
9
|
+
type: semantic
|
|
10
|
+
description: "Agent proceeds directly from requirements to implementation without restating the requirements in its own words and confirming understanding with the user."
|
|
11
|
+
fix: "Add an echo-back gate: agent restates requirements, user confirms or corrects, only then proceed to implementation."
|
|
12
|
+
example:
|
|
13
|
+
bad: |
|
|
14
|
+
User: "Add rate limiting to the API"
|
|
15
|
+
Agent: *immediately starts coding*
|
|
16
|
+
good: |
|
|
17
|
+
User: "Add rate limiting to the API"
|
|
18
|
+
Agent: "I'll add token bucket rate limiting at 100 req/min per IP,
|
|
19
|
+
with 429 responses and Retry-After header. Correct?"
|
|
20
|
+
User: "Yes, but 60 req/min"
|
|
21
|
+
Agent: *now implements with correct limit*
|
|
22
|
+
---
|
|
23
|
+
|
|
24
|
+
## Observation
|
|
25
|
+
Analysis of autonomous coding failures showed that 60%+ of failures stemmed from spec misunderstanding, not from coding errors. The agent understood the words but not the intent — implementing a technically correct solution to the wrong problem.
|
|
26
|
+
|
|
27
|
+
## Insight
|
|
28
|
+
Spec misunderstanding is invisible until late in the process because the agent's implementation is internally consistent. Tests pass because they test the agent's interpretation. The echo-back step forces the misunderstanding to surface before any code is written.
|
|
29
|
+
|
|
30
|
+
## Lesson
|
|
31
|
+
Before implementing any feature, restate the requirements in your own words and confirm with the user. This single step prevents more failures than any amount of testing or code review.
|
|
@@ -0,0 +1,30 @@
|
|
|
1
|
+
---
|
|
2
|
+
id: 71
|
|
3
|
+
title: "Positive instructions outperform negative ones for LLMs"
|
|
4
|
+
severity: should-fix
|
|
5
|
+
languages: [all]
|
|
6
|
+
scope: [universal]
|
|
7
|
+
category: specification-drift
|
|
8
|
+
pattern:
|
|
9
|
+
type: semantic
|
|
10
|
+
description: "Instructions phrased as 'don't do X' instead of 'do Y'. Negative instructions trigger the Pink Elephant Problem — the model encodes the forbidden pattern and may reproduce it."
|
|
11
|
+
fix: "Rephrase negative instructions as positive alternatives: instead of 'don't use var', write 'use const or let'."
|
|
12
|
+
example:
|
|
13
|
+
bad: |
|
|
14
|
+
# Don't use bare except clauses
|
|
15
|
+
# Don't hardcode test counts
|
|
16
|
+
# Don't use .venv/bin/pip
|
|
17
|
+
good: |
|
|
18
|
+
# Always catch specific exception classes and log
|
|
19
|
+
# Use threshold assertions (>=) for extensible collections
|
|
20
|
+
# Use .venv/bin/python -m pip for correct site-packages
|
|
21
|
+
---
|
|
22
|
+
|
|
23
|
+
## Observation
|
|
24
|
+
When lesson files and instructions used negative phrasing ("don't do X"), agents occasionally reproduced the exact anti-pattern described — the Pink Elephant Problem. Positive phrasing ("do Y instead") consistently produced better compliance.
|
|
25
|
+
|
|
26
|
+
## Insight
|
|
27
|
+
LLMs process instructions by encoding all tokens, including the forbidden pattern. "Don't use bare except" encodes "bare except" as a salient concept. "Always catch specific exception classes" encodes the correct pattern directly. The model follows what it encodes most strongly.
|
|
28
|
+
|
|
29
|
+
## Lesson
|
|
30
|
+
Write instructions as positive alternatives: "do Y" outperforms "don't do X" for LLM compliance. When writing lessons, always include a `positive_alternative` that the agent can follow directly.
|
|
@@ -0,0 +1,30 @@
|
|
|
1
|
+
---
|
|
2
|
+
id: 72
|
|
3
|
+
title: "Lost in the Middle — context placement affects accuracy 20pp"
|
|
4
|
+
severity: should-fix
|
|
5
|
+
languages: [all]
|
|
6
|
+
scope: [universal]
|
|
7
|
+
category: context-retrieval
|
|
8
|
+
pattern:
|
|
9
|
+
type: semantic
|
|
10
|
+
description: "Critical instructions or requirements placed in the middle of a long context window, where LLM attention is weakest. Task description buried after long preambles or between large code blocks."
|
|
11
|
+
fix: "Place the task at the top of the context and requirements at the bottom. Keep the middle for reference material that's useful but not critical."
|
|
12
|
+
example:
|
|
13
|
+
bad: |
|
|
14
|
+
[500 lines of project context]
|
|
15
|
+
[task description buried here]
|
|
16
|
+
[300 lines of code examples]
|
|
17
|
+
good: |
|
|
18
|
+
[task description — FIRST]
|
|
19
|
+
[reference material in middle]
|
|
20
|
+
[requirements and constraints — LAST]
|
|
21
|
+
---
|
|
22
|
+
|
|
23
|
+
## Observation
|
|
24
|
+
Research on LLM context windows shows a U-shaped attention curve: models attend most strongly to the beginning and end of context, with accuracy dropping up to 20 percentage points for information placed in the middle. When critical instructions were placed mid-context, agents missed them reliably.
|
|
25
|
+
|
|
26
|
+
## Insight
|
|
27
|
+
The "Lost in the Middle" effect means context order matters as much as context content. A perfectly written requirement placed in the wrong position has the same effect as a missing requirement. This is especially relevant for context injection in autonomous pipelines.
|
|
28
|
+
|
|
29
|
+
## Lesson
|
|
30
|
+
Structure all context injection with task at the top and requirements at the bottom. Use the middle for supplementary reference material. For `run-plan-context.sh`, this means: batch description first, prior art and warnings in the middle, acceptance criteria last.
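
That ordering can be sketched as a small assembly helper. The names are illustrative — this is not the actual `run-plan-context.sh` interface:

```shell
# Assemble injected context in the U-curve-friendly order:
# task first, reference material in the middle, requirements last.
build_context() {
  local task="$1" reference="$2" requirements="$3"
  printf '%s\n\n' \
    "## Task" "$task" \
    "## Reference (prior art, warnings)" "$reference" \
    "## Acceptance criteria" "$requirements"
}
```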
|
|
@@ -0,0 +1,30 @@
|
|
|
1
|
+
---
|
|
2
|
+
id: 73
|
|
3
|
+
title: "Unscoped lessons cause 67% false positive rate at scale"
|
|
4
|
+
severity: should-fix
|
|
5
|
+
languages: [all]
|
|
6
|
+
scope: [project:autonomous-coding-toolkit]
|
|
7
|
+
category: context-retrieval
|
|
8
|
+
pattern:
|
|
9
|
+
type: semantic
|
|
10
|
+
description: "Lesson files without scope metadata applied universally to all projects, causing irrelevant violations to fire on projects where the anti-pattern cannot occur."
|
|
11
|
+
fix: "Add scope: tags to every lesson. Use detect_project_scope() to filter lessons by project context. Default to [universal] only for genuinely cross-cutting patterns."
|
|
12
|
+
example:
|
|
13
|
+
bad: |
|
|
14
|
+
# Lesson about HA automation keys fires on a React project
|
|
15
|
+
# Lesson about JSX factory fires on a Python-only project
|
|
16
|
+
# 67% of violations are irrelevant noise
|
|
17
|
+
good: |
|
|
18
|
+
scope: [domain:ha-aria] # Only fires on HA projects
|
|
19
|
+
scope: [language:javascript, framework:preact] # Only fires on JSX projects
|
|
20
|
+
scope: [universal] # Genuinely applies everywhere
|
|
21
|
+
---
|
|
22
|
+
|
|
23
|
+
## Observation
|
|
24
|
+
As the lesson library grew past ~50 lessons, the false positive rate on any given project reached 67%. Lessons about Home Assistant automation keys fired on React projects. Lessons about JSX factory issues fired on Python-only projects. Developers started ignoring lesson-check output entirely.
|
|
25
|
+
|
|
26
|
+
## Insight
|
|
27
|
+
Without scope metadata, every lesson fires everywhere. This is correct for universal patterns (bare except, missing await) but wrong for domain-specific patterns. The noise from irrelevant violations drowns the signal from real issues, causing the entire system to be ignored.
|
|
28
|
+
|
|
29
|
+
## Lesson
|
|
30
|
+
Every lesson needs scope metadata. Use `scope: [universal]` only for patterns that genuinely apply to all projects. For everything else, scope to language, framework, domain, or specific project. The scope system keeps signal-to-noise high as the library scales.
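
A simplified stand-in for that filtering — the front-matter format matches these lessons, but the helper itself is illustrative, not the toolkit's `detect_project_scope()`:

```shell
# Return 0 if a lesson's scope line matches any of the project's tags.
# Universal lessons always apply; everything else must match a tag.
lesson_applies() {
  local lesson_file="$1"; shift   # remaining args: project scope tags
  local scope_line tag
  scope_line=$(grep -m1 '^scope:' "$lesson_file") || return 1
  case "$scope_line" in *universal*) return 0 ;; esac
  for tag in "$@"; do
    case "$scope_line" in *"$tag"*) return 0 ;; esac
  done
  return 1
}
```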
|
|
@@ -0,0 +1,32 @@
|
|
|
1
|
+
---
|
|
2
|
+
id: 74
|
|
3
|
+
title: "Stale context injection sends wrong batch's state to next agent"
|
|
4
|
+
severity: should-fix
|
|
5
|
+
languages: [shell]
|
|
6
|
+
scope: [project:autonomous-coding-toolkit]
|
|
7
|
+
category: context-retrieval
|
|
8
|
+
pattern:
|
|
9
|
+
type: semantic
|
|
10
|
+
description: "Context injection (CLAUDE.md modifications, AGENTS.md generation) from a previous batch persists into the next batch because the injection writes to tracked files and the git-clean check fails, or the injection is not cleaned up between batches."
|
|
11
|
+
fix: "Context injection must be idempotent and batch-scoped. Clean up injected context after each batch. Use temporary files or environment variables instead of modifying tracked files."
|
|
12
|
+
example:
|
|
13
|
+
bad: |
|
|
14
|
+
# Batch 3 context injected into CLAUDE.md
|
|
15
|
+
# Batch 3 fails, retries
|
|
16
|
+
# Batch 4 starts — still sees Batch 3's context in CLAUDE.md
|
|
17
|
+
# Agent makes decisions based on stale context
|
|
18
|
+
good: |
|
|
19
|
+
# Context injected into /tmp/batch-context.md
|
|
20
|
+
# Passed via --context flag or environment variable
|
|
21
|
+
# Automatically cleaned up between batches
|
|
22
|
+
# Each batch starts with fresh, correct context
|
|
23
|
+
---
|
|
24
|
+
|
|
25
|
+
## Observation
|
|
26
|
+
Context injection that modified tracked files (like appending to CLAUDE.md) created dirty git state between batches. The next batch's agent inherited the previous batch's context injection, making decisions based on stale information. When batch 3 failed and batch 4 started, batch 4 still saw batch 3's failure context.
|
|
27
|
+
|
|
28
|
+
## Insight
|
|
29
|
+
Context injection into version-controlled files conflates two lifetimes: the file's permanent content and the batch's temporary context. The git-clean quality gate catches this as "uncommitted changes" but the root cause is architectural — using the wrong persistence mechanism for ephemeral data.
|
|
30
|
+
|
|
31
|
+
## Lesson
|
|
32
|
+
Never inject batch-scoped context into tracked files. Use temporary files, environment variables, or the context budget in `run-plan-context.sh` which is designed for ephemeral, per-batch context injection.
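
One way to keep injection batch-scoped — a sketch, where the runner name and the `BATCH_CONTEXT_FILE` variable are assumptions, not the toolkit's real interface:

```shell
# Write batch context to a temp file; no tracked file is modified.
# The function body is a subshell, so its EXIT trap cleans up the
# temp file even if the batch fails — nothing leaks into the next batch.
run_batch_with_context() (
  ctx_file=$(mktemp)
  trap 'rm -f "$ctx_file"' EXIT
  printf '%s\n' "$1" > "$ctx_file"
  BATCH_CONTEXT_FILE="$ctx_file" run_batch_agent   # hypothetical runner
)
```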
|