npm - autonomous-coding-toolkit - Versions diffs - 1.0.0 - Mend

autonomous-coding-toolkit 1.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (324)

package/.claude-plugin/marketplace.json +22 -0
package/.claude-plugin/plugin.json +13 -0
package/LICENSE +21 -0
package/Makefile +21 -0
package/README.md +140 -0
package/SECURITY.md +28 -0
package/agents/bash-expert.md +113 -0
package/agents/dependency-auditor.md +138 -0
package/agents/integration-tester.md +120 -0
package/agents/lesson-scanner.md +149 -0
package/agents/python-expert.md +179 -0
package/agents/service-monitor.md +141 -0
package/agents/shell-expert.md +147 -0
package/benchmarks/runner.sh +147 -0
package/benchmarks/tasks/01-rest-endpoint/rubric.sh +29 -0
package/benchmarks/tasks/01-rest-endpoint/task.md +17 -0
package/benchmarks/tasks/02-refactor-module/task.md +8 -0
package/benchmarks/tasks/03-fix-integration-bug/task.md +8 -0
package/benchmarks/tasks/04-add-test-coverage/task.md +8 -0
package/benchmarks/tasks/05-multi-file-feature/task.md +8 -0
package/bin/act.js +238 -0
package/commands/autocode.md +6 -0
package/commands/cancel-ralph.md +18 -0
package/commands/code-factory.md +53 -0
package/commands/create-prd.md +55 -0
package/commands/ralph-loop.md +18 -0
package/commands/run-plan.md +117 -0
package/commands/submit-lesson.md +122 -0
package/docs/ARCHITECTURE.md +630 -0
package/docs/CONTRIBUTING.md +125 -0
package/docs/lessons/0001-bare-exception-swallowing.md +34 -0
package/docs/lessons/0002-async-def-without-await.md +28 -0
package/docs/lessons/0003-create-task-without-callback.md +28 -0
package/docs/lessons/0004-hardcoded-test-counts.md +28 -0
package/docs/lessons/0005-sqlite-without-closing.md +33 -0
package/docs/lessons/0006-venv-pip-path.md +27 -0
package/docs/lessons/0007-runner-state-self-rejection.md +35 -0
package/docs/lessons/0008-quality-gate-blind-spot.md +33 -0
package/docs/lessons/0009-parser-overcount-empty-batches.md +36 -0
package/docs/lessons/0010-local-outside-function-bash.md +33 -0
package/docs/lessons/0011-batch-tests-for-unimplemented-code.md +36 -0
package/docs/lessons/0012-api-markdown-unescaped-chars.md +33 -0
package/docs/lessons/0013-export-prefix-env-parsing.md +33 -0
package/docs/lessons/0014-decorator-registry-import-side-effect.md +43 -0
package/docs/lessons/0015-frontend-backend-schema-drift.md +43 -0
package/docs/lessons/0016-event-driven-cold-start-seeding.md +44 -0
package/docs/lessons/0017-copy-paste-logic-diverges.md +43 -0
package/docs/lessons/0018-layer-passes-pipeline-broken.md +45 -0
package/docs/lessons/0019-systemd-envfile-ignores-export.md +41 -0
package/docs/lessons/0020-persist-state-incrementally.md +44 -0
package/docs/lessons/0021-dual-axis-testing.md +48 -0
package/docs/lessons/0022-jsx-factory-shadowing.md +43 -0
package/docs/lessons/0023-static-analysis-spiral.md +51 -0
package/docs/lessons/0024-shared-pipeline-implementation.md +55 -0
package/docs/lessons/0025-defense-in-depth-all-entry-points.md +65 -0
package/docs/lessons/0026-linter-no-rules-false-enforcement.md +54 -0
package/docs/lessons/0027-jsx-silent-prop-drop.md +64 -0
package/docs/lessons/0028-no-infrastructure-in-client-code.md +49 -0
package/docs/lessons/0029-never-write-secrets-to-files.md +61 -0
package/docs/lessons/0030-cache-merge-not-replace.md +62 -0
package/docs/lessons/0031-verify-units-at-boundaries.md +66 -0
package/docs/lessons/0032-module-lifecycle-subscribe-unsubscribe.md +89 -0
package/docs/lessons/0033-async-iteration-mutable-snapshot.md +72 -0
package/docs/lessons/0034-caller-missing-await-silent-discard.md +65 -0
package/docs/lessons/0035-duplicate-registration-silent-overwrite.md +85 -0
package/docs/lessons/0036-websocket-dirty-disconnect.md +33 -0
package/docs/lessons/0037-parallel-agents-worktree-corruption.md +31 -0
package/docs/lessons/0038-subscribe-no-stored-ref.md +36 -0
package/docs/lessons/0039-fallback-or-default-hides-bugs.md +34 -0
package/docs/lessons/0040-event-firehose-filter-first.md +36 -0
package/docs/lessons/0041-ambiguous-base-dir-path-nesting.md +32 -0
package/docs/lessons/0042-spec-compliance-insufficient.md +36 -0
package/docs/lessons/0043-exact-count-extensible-collections.md +32 -0
package/docs/lessons/0044-relative-file-deps-worktree.md +39 -0
package/docs/lessons/0045-iterative-design-improvement.md +33 -0
package/docs/lessons/0046-plan-assertion-math-bugs.md +38 -0
package/docs/lessons/0047-pytest-single-threaded-default.md +37 -0
package/docs/lessons/0048-integration-wiring-batch.md +40 -0
package/docs/lessons/0049-ab-verification.md +41 -0
package/docs/lessons/0050-editing-sourced-files-during-execution.md +33 -0
package/docs/lessons/0051-infrastructure-fixes-cant-self-heal.md +30 -0
package/docs/lessons/0052-uncommitted-changes-poison-quality-gates.md +31 -0
package/docs/lessons/0053-jq-compact-flag-inconsistency.md +31 -0
package/docs/lessons/0054-parser-matches-inside-code-blocks.md +30 -0
package/docs/lessons/0055-agents-compensate-for-garbled-prompts.md +31 -0
package/docs/lessons/0056-grep-count-exit-code-on-zero.md +42 -0
package/docs/lessons/0057-new-artifacts-break-git-clean-gates.md +42 -0
package/docs/lessons/0058-dead-config-keys-never-consumed.md +49 -0
package/docs/lessons/0059-contract-test-shared-structures.md +53 -0
package/docs/lessons/0060-set-e-silent-death-in-runners.md +53 -0
package/docs/lessons/0061-context-injection-dirty-state.md +50 -0
package/docs/lessons/0062-sibling-bug-neighborhood-scan.md +29 -0
package/docs/lessons/0063-one-flag-two-lifetimes.md +31 -0
package/docs/lessons/0064-test-passes-wrong-reason.md +31 -0
package/docs/lessons/0065-pipefail-grep-count-double-output.md +39 -0
package/docs/lessons/0066-local-keyword-outside-function.md +37 -0
package/docs/lessons/0067-stdin-hang-non-interactive-shell.md +36 -0
package/docs/lessons/0068-agent-builds-wrong-thing-correctly.md +31 -0
package/docs/lessons/0069-plan-quality-dominates-execution.md +30 -0
package/docs/lessons/0070-spec-echo-back-prevents-drift.md +31 -0
package/docs/lessons/0071-positive-instructions-outperform-negative.md +30 -0
package/docs/lessons/0072-lost-in-the-middle-context-placement.md +30 -0
package/docs/lessons/0073-unscoped-lessons-cause-false-positives.md +30 -0
package/docs/lessons/0074-stale-context-injection-wrong-batch.md +32 -0
package/docs/lessons/0075-research-artifacts-must-persist.md +32 -0
package/docs/lessons/0076-wrong-decomposition-contaminates-downstream.md +30 -0
package/docs/lessons/0077-cherry-pick-merges-need-manual-resolution.md +30 -0
package/docs/lessons/0078-static-review-without-live-test.md +30 -0
package/docs/lessons/0079-integration-wiring-batch-required.md +32 -0
package/docs/lessons/FRAMEWORK.md +161 -0
package/docs/lessons/SUMMARY.md +201 -0
package/docs/lessons/TEMPLATE.md +85 -0
package/docs/plans/2026-02-21-code-factory-v2-design.md +204 -0
package/docs/plans/2026-02-21-code-factory-v2-implementation-plan.md +2189 -0
package/docs/plans/2026-02-21-code-factory-v2-phase4-design.md +537 -0
package/docs/plans/2026-02-21-code-factory-v2-phase4-implementation-plan.md +2012 -0
package/docs/plans/2026-02-21-hardening-pass-design.md +108 -0
package/docs/plans/2026-02-21-hardening-pass-plan.md +1378 -0
package/docs/plans/2026-02-21-mab-research-report.md +406 -0
package/docs/plans/2026-02-21-marketplace-restructure-design.md +240 -0
package/docs/plans/2026-02-21-marketplace-restructure-plan.md +832 -0
package/docs/plans/2026-02-21-phase4-completion-plan.md +697 -0
package/docs/plans/2026-02-21-validator-suite-design.md +148 -0
package/docs/plans/2026-02-21-validator-suite-plan.md +540 -0
package/docs/plans/2026-02-22-mab-research-round2.md +556 -0
package/docs/plans/2026-02-22-mab-run-design.md +462 -0
package/docs/plans/2026-02-22-mab-run-plan.md +2046 -0
package/docs/plans/2026-02-22-operations-design-methodology-research.md +681 -0
package/docs/plans/2026-02-22-research-agent-failure-taxonomy.md +532 -0
package/docs/plans/2026-02-22-research-code-guideline-policies.md +886 -0
package/docs/plans/2026-02-22-research-codebase-audit-refactoring.md +908 -0
package/docs/plans/2026-02-22-research-coding-standards-documentation.md +541 -0
package/docs/plans/2026-02-22-research-competitive-landscape.md +687 -0
package/docs/plans/2026-02-22-research-comprehensive-testing.md +1076 -0
package/docs/plans/2026-02-22-research-context-utilization.md +459 -0
package/docs/plans/2026-02-22-research-cost-quality-tradeoff.md +548 -0
package/docs/plans/2026-02-22-research-lesson-transferability.md +508 -0
package/docs/plans/2026-02-22-research-multi-agent-coordination.md +312 -0
package/docs/plans/2026-02-22-research-phase-integration.md +602 -0
package/docs/plans/2026-02-22-research-plan-quality.md +428 -0
package/docs/plans/2026-02-22-research-prompt-engineering.md +558 -0
package/docs/plans/2026-02-22-research-unconventional-perspectives.md +528 -0
package/docs/plans/2026-02-22-research-user-adoption.md +638 -0
package/docs/plans/2026-02-22-research-verification-effectiveness.md +433 -0
package/docs/plans/2026-02-23-agent-suite-design.md +299 -0
package/docs/plans/2026-02-23-agent-suite-plan.md +578 -0
package/docs/plans/2026-02-23-phase3-cost-infrastructure-design.md +148 -0
package/docs/plans/2026-02-23-phase3-cost-infrastructure-plan.md +1062 -0
package/docs/plans/2026-02-23-research-bash-expert-agent.md +543 -0
package/docs/plans/2026-02-23-research-dependency-auditor-agent.md +564 -0
package/docs/plans/2026-02-23-research-improving-existing-agents.md +503 -0
package/docs/plans/2026-02-23-research-integration-tester-agent.md +454 -0
package/docs/plans/2026-02-23-research-python-expert-agent.md +429 -0
package/docs/plans/2026-02-23-research-service-monitor-agent.md +425 -0
package/docs/plans/2026-02-23-research-shell-expert-agent.md +533 -0
package/docs/plans/2026-02-23-roadmap-to-completion.md +530 -0
package/docs/plans/2026-02-24-headless-module-split-design.md +98 -0
package/docs/plans/2026-02-24-headless-module-split.md +443 -0
package/docs/plans/2026-02-24-lesson-scope-metadata-design.md +228 -0
package/docs/plans/2026-02-24-lesson-scope-metadata-plan.md +968 -0
package/docs/plans/2026-02-24-npm-packaging-design.md +841 -0
package/docs/plans/2026-02-24-npm-packaging-plan.md +1965 -0
package/docs/plans/audit-findings.md +186 -0
package/docs/telegram-notification-format.md +98 -0
package/examples/example-plan.md +51 -0
package/examples/example-prd.json +72 -0
package/examples/example-roadmap.md +33 -0
package/examples/quickstart-plan.md +63 -0
package/hooks/hooks.json +26 -0
package/hooks/setup-symlinks.sh +48 -0
package/hooks/stop-hook.sh +135 -0
package/package.json +47 -0
package/policies/bash.md +71 -0
package/policies/python.md +71 -0
package/policies/testing.md +61 -0
package/policies/universal.md +60 -0
package/scripts/analyze-report.sh +97 -0
package/scripts/architecture-map.sh +145 -0
package/scripts/auto-compound.sh +273 -0
package/scripts/batch-audit.sh +42 -0
package/scripts/batch-test.sh +101 -0
package/scripts/entropy-audit.sh +221 -0
package/scripts/failure-digest.sh +51 -0
package/scripts/generate-ast-rules.sh +96 -0
package/scripts/init.sh +112 -0
package/scripts/lesson-check.sh +428 -0
package/scripts/lib/common.sh +61 -0
package/scripts/lib/cost-tracking.sh +153 -0
package/scripts/lib/ollama.sh +60 -0
package/scripts/lib/progress-writer.sh +128 -0
package/scripts/lib/run-plan-context.sh +215 -0
package/scripts/lib/run-plan-echo-back.sh +231 -0
package/scripts/lib/run-plan-headless.sh +396 -0
package/scripts/lib/run-plan-notify.sh +57 -0
package/scripts/lib/run-plan-parser.sh +81 -0
package/scripts/lib/run-plan-prompt.sh +215 -0
package/scripts/lib/run-plan-quality-gate.sh +132 -0
package/scripts/lib/run-plan-routing.sh +315 -0
package/scripts/lib/run-plan-sampling.sh +170 -0
package/scripts/lib/run-plan-scoring.sh +146 -0
package/scripts/lib/run-plan-state.sh +142 -0
package/scripts/lib/run-plan-team.sh +199 -0
package/scripts/lib/telegram.sh +54 -0
package/scripts/lib/thompson-sampling.sh +176 -0
package/scripts/license-check.sh +74 -0
package/scripts/mab-run.sh +575 -0
package/scripts/module-size-check.sh +146 -0
package/scripts/patterns/async-no-await.yml +5 -0
package/scripts/patterns/bare-except.yml +6 -0
package/scripts/patterns/empty-catch.yml +6 -0
package/scripts/patterns/hardcoded-localhost.yml +9 -0
package/scripts/patterns/retry-loop-no-backoff.yml +12 -0
package/scripts/pipeline-status.sh +197 -0
package/scripts/policy-check.sh +226 -0
package/scripts/prior-art-search.sh +133 -0
package/scripts/promote-mab-lessons.sh +126 -0
package/scripts/prompts/agent-a-superpowers.md +29 -0
package/scripts/prompts/agent-b-ralph.md +29 -0
package/scripts/prompts/judge-agent.md +61 -0
package/scripts/prompts/planner-agent.md +44 -0
package/scripts/pull-community-lessons.sh +90 -0
package/scripts/quality-gate.sh +266 -0
package/scripts/research-gate.sh +90 -0
package/scripts/run-plan.sh +329 -0
package/scripts/scope-infer.sh +159 -0
package/scripts/setup-ralph-loop.sh +155 -0
package/scripts/telemetry.sh +230 -0
package/scripts/tests/run-all-tests.sh +52 -0
package/scripts/tests/test-act-cli.sh +46 -0
package/scripts/tests/test-agents-md.sh +87 -0
package/scripts/tests/test-analyze-report.sh +114 -0
package/scripts/tests/test-architecture-map.sh +89 -0
package/scripts/tests/test-auto-compound.sh +169 -0
package/scripts/tests/test-batch-test.sh +65 -0
package/scripts/tests/test-benchmark-runner.sh +25 -0
package/scripts/tests/test-common.sh +168 -0
package/scripts/tests/test-cost-tracking.sh +158 -0
package/scripts/tests/test-echo-back.sh +180 -0
package/scripts/tests/test-entropy-audit.sh +146 -0
package/scripts/tests/test-failure-digest.sh +66 -0
package/scripts/tests/test-generate-ast-rules.sh +145 -0
package/scripts/tests/test-helpers.sh +82 -0
package/scripts/tests/test-init.sh +47 -0
package/scripts/tests/test-lesson-check.sh +278 -0
package/scripts/tests/test-lesson-local.sh +55 -0
package/scripts/tests/test-license-check.sh +109 -0
package/scripts/tests/test-mab-run.sh +182 -0
package/scripts/tests/test-ollama-lib.sh +49 -0
package/scripts/tests/test-ollama.sh +60 -0
package/scripts/tests/test-pipeline-status.sh +198 -0
package/scripts/tests/test-policy-check.sh +124 -0
package/scripts/tests/test-prior-art-search.sh +96 -0
package/scripts/tests/test-progress-writer.sh +140 -0
package/scripts/tests/test-promote-mab-lessons.sh +110 -0
package/scripts/tests/test-pull-community-lessons.sh +149 -0
package/scripts/tests/test-quality-gate.sh +241 -0
package/scripts/tests/test-research-gate.sh +132 -0
package/scripts/tests/test-run-plan-cli.sh +86 -0
package/scripts/tests/test-run-plan-context.sh +305 -0
package/scripts/tests/test-run-plan-e2e.sh +153 -0
package/scripts/tests/test-run-plan-headless.sh +424 -0
package/scripts/tests/test-run-plan-notify.sh +124 -0
package/scripts/tests/test-run-plan-parser.sh +217 -0
package/scripts/tests/test-run-plan-prompt.sh +254 -0
package/scripts/tests/test-run-plan-quality-gate.sh +222 -0
package/scripts/tests/test-run-plan-routing.sh +178 -0
package/scripts/tests/test-run-plan-scoring.sh +148 -0
package/scripts/tests/test-run-plan-state.sh +261 -0
package/scripts/tests/test-run-plan-team.sh +157 -0
package/scripts/tests/test-scope-infer.sh +150 -0
package/scripts/tests/test-setup-ralph-loop.sh +63 -0
package/scripts/tests/test-telegram-env.sh +38 -0
package/scripts/tests/test-telegram.sh +121 -0
package/scripts/tests/test-telemetry.sh +46 -0
package/scripts/tests/test-thompson-sampling.sh +139 -0
package/scripts/tests/test-validate-all.sh +60 -0
package/scripts/tests/test-validate-commands.sh +89 -0
package/scripts/tests/test-validate-hooks.sh +98 -0
package/scripts/tests/test-validate-lessons.sh +150 -0
package/scripts/tests/test-validate-plan-quality.sh +235 -0
package/scripts/tests/test-validate-plans.sh +187 -0
package/scripts/tests/test-validate-plugin.sh +106 -0
package/scripts/tests/test-validate-prd.sh +184 -0
package/scripts/tests/test-validate-skills.sh +134 -0
package/scripts/validate-all.sh +57 -0
package/scripts/validate-commands.sh +67 -0
package/scripts/validate-hooks.sh +89 -0
package/scripts/validate-lessons.sh +98 -0
package/scripts/validate-plan-quality.sh +369 -0
package/scripts/validate-plans.sh +120 -0
package/scripts/validate-plugin.sh +86 -0
package/scripts/validate-policies.sh +42 -0
package/scripts/validate-prd.sh +118 -0
package/scripts/validate-skills.sh +96 -0
package/skills/autocode/SKILL.md +285 -0
package/skills/autocode/ab-verification.md +51 -0
package/skills/autocode/code-quality-standards.md +37 -0
package/skills/autocode/competitive-mode.md +364 -0
package/skills/brainstorming/SKILL.md +97 -0
package/skills/capture-lesson/SKILL.md +187 -0
package/skills/check-lessons/SKILL.md +116 -0
package/skills/dispatching-parallel-agents/SKILL.md +110 -0
package/skills/executing-plans/SKILL.md +85 -0
package/skills/finishing-a-development-branch/SKILL.md +201 -0
package/skills/receiving-code-review/SKILL.md +72 -0
package/skills/requesting-code-review/SKILL.md +59 -0
package/skills/requesting-code-review/code-reviewer.md +82 -0
package/skills/research/SKILL.md +145 -0
package/skills/roadmap/SKILL.md +115 -0
package/skills/subagent-driven-development/SKILL.md +98 -0
package/skills/subagent-driven-development/code-quality-reviewer-prompt.md +18 -0
package/skills/subagent-driven-development/implementer-prompt.md +73 -0
package/skills/subagent-driven-development/spec-reviewer-prompt.md +57 -0
package/skills/systematic-debugging/SKILL.md +134 -0
package/skills/systematic-debugging/condition-based-waiting.md +64 -0
package/skills/systematic-debugging/defense-in-depth.md +32 -0
package/skills/systematic-debugging/root-cause-tracing.md +55 -0
package/skills/test-driven-development/SKILL.md +167 -0
package/skills/using-git-worktrees/SKILL.md +219 -0
package/skills/using-superpowers/SKILL.md +54 -0
package/skills/verification-before-completion/SKILL.md +140 -0
package/skills/verify/SKILL.md +82 -0
package/skills/writing-plans/SKILL.md +128 -0
package/skills/writing-skills/SKILL.md +93 -0

package/docs/plans/2026-02-23-research-improving-existing-agents.md ADDED

@@ -0,0 +1,503 @@
+# Research: Improving Existing Claude Code Agents
+**Date:** 2026-02-23
+**Status:** Complete
+**Scope:** ~/.claude/agents/ — 8 existing agents
+---
+## BLUF
+The 8 existing agents range from production-quality (lesson-scanner, counter) to underspecified (security-reviewer, doc-updater). Priority improvements fall into four categories: (1) add `model` fields to the 6 agents that inherit the parent model unnecessarily, (2) add `memory` fields to 3 agents that would benefit from cross-session learning, (3) tighten tool lists on 4 agents that are over-permissioned, (4) add explicit hallucination guards to the 2 audit agents.
+---
+## Sources
+- [wshobson/agents](https://github.com/wshobson/agents) — 112-agent production system, plugin architecture, progressive disclosure skills
+- [VoltAgent/awesome-claude-code-subagents](https://github.com/VoltAgent/awesome-claude-code-subagents) — 127+ agent community collection
+- [0xfurai/claude-code-subagents](https://github.com/0xfurai/claude-code-subagents) — 100+ production-ready subagents
+- [iannuttall/claude-agents](https://github.com/iannuttall/claude-agents) — custom agents collection
+- [hesreallyhim/awesome-claude-code](https://github.com/hesreallyhim/awesome-claude-code) — curated skills/hooks/agents list
+- [Claude Code Docs — Create custom subagents](https://code.claude.com/docs/en/sub-agents) — official frontmatter reference
+- [PubNub — Best Practices for Claude Code Sub-Agents](https://www.pubnub.com/blog/best-practices-for-claude-code-sub-agents/) — tool constraints, hooks, error handling
+- [PubNub — From Prompts to Pipelines](https://www.pubnub.com/blog/best-practices-claude-code-subagents-part-two-from-prompts-to-pipelines/) — agent chain patterns, artifact structure
+- [Claude Docs — Reduce Hallucinations](https://platform.claude.com/docs/en/test-and-evaluate/strengthen-guardrails/reduce-hallucinations) — hallucination prevention
+- [Adaline Labs — Ship Reliably with Claude Code](https://labs.adaline.ai/p/how-to-ship-reliably-with-claude-code) — governance patterns
+---
+## Key Findings from External Research
+### Frontmatter Capabilities Most Agents Are Not Using
+The official docs define the following frontmatter fields; no current agent uses all of them:
+| Field | What it does | Agents missing it |
+|-------|-------------|-------------------|
+| `model` | Route to right model tier | security-reviewer, infra-auditor, doc-updater, lesson-scanner, notion-researcher, notion-writer |
+| `memory` | Persistent cross-session learning | security-reviewer, lesson-scanner, infra-auditor |
+| `maxTurns` | Hard stop against runaway execution | all agents |
+| `isolation: worktree` | Isolated git context for write agents | doc-updater |
+| `hooks` | Pre/post tool validation | infra-auditor, notion-writer |
+| `permissionMode` | Controls how permission requests are handled; the default is overly permissive for read-only agents | security-reviewer, infra-auditor, counter, counter-daily |
+### Tool Constraint Anti-Pattern: Omission = Full Inheritance
+From PubNub research: "If you omit `tools`, you're implicitly granting access to all available tools." All 8 agents explicitly list tools, which is correct. However, several include write-capable tools (Edit, Write, Bash) when they only need read access.
+- `counter.md` and `counter-daily.md` have `tools: Read, Grep, Glob` — correct, no write needed
+- `security-reviewer.md` includes `Bash` — risky if used for active exploitation testing
+- `infra-auditor.md` includes `Bash` — necessary for system checks, but should add `permissionMode: dontAsk` for writes
+### Hallucination Prevention Patterns
+From Anthropic's official docs on reducing hallucinations:
+1. **Ground assertions in tool output** — agents should be required to cite specific grep/read results before any finding
+2. **Explicit "do not report what grep + read does not confirm"** instruction — lesson-scanner has this; security-reviewer and infra-auditor do not
+3. **Uncertainty declarations** — agents should say "I could not verify X" rather than inferring
+### Agent Chain Integration Patterns
+From PubNub Part 2:
+- Structured handoff artifacts: active-plan.md, implementation-summary.md, qa-summary.md
+- Each agent returns a clean summary to the orchestrator, not raw logs
+- Hook-based governance: PreToolUse for validation, PostToolUse for verification
+- Plan → Execute → Verify pipeline as the canonical sequence
+### Model Selection Best Practice
+From official docs and PubNub:
+- Haiku: mechanical tasks, read-only searches, daily lightweight checks
+- Sonnet: balanced analysis, multi-file operations
+- Opus: complex reasoning, adversarial review, architecture critique
+Current agents: counter correctly uses `model: opus`. counter-daily correctly uses `model: sonnet`. The other 6 all inherit from the parent conversation, so they run on whatever model the user happens to be using: wasteful for lightweight agents, underpowered for analysis agents.
+### Persistent Memory Pattern
+From official docs: `memory: user` gives agents a `~/.claude/agent-memory/<name>/` directory that persists across sessions. The agent's system prompt automatically includes the first 200 lines of MEMORY.md.
+Agents that would benefit most from memory:
+- **lesson-scanner**: could accumulate false-positive patterns per-project, avoid rescanning clean files
+- **security-reviewer**: could remember known-safe patterns and previously flagged issues
+- **infra-auditor**: could track baseline service states and flag deviations from that baseline instead of relying only on absolute thresholds
+---
+## Per-Agent Assessment and Improvements
+### 1. security-reviewer.md
+**Current state:** Minimal (35 lines). Covers 4 vulnerability categories. Output format exists but is implicit. No hallucination guard. Web-focused (SQL injection, XSS) — misses Python/bash attack surfaces.
+**Gap analysis:**
+- No explicit "only report what the tools confirm" guardrail — will hallucinate findings on code it hasn't read
+- Missing attack categories for Python/shell scripts: deserialization, subprocess injection, pickle loading, hardcoded secrets in environment variable fallbacks
+- No `model` field (should be `sonnet`)
+- No `memory` field — can't accumulate project-specific baseline
+- Bash tool included but no guard against running exploits — should be `permissionMode: plan` or `dontAsk`
+- Missing cryptography category: weak algorithms (MD5, SHA1), hardcoded salts, insecure random
+- Output format has no "CLEAN" affirmation — leaves ambiguity about unreviewed files
+**Recommended improvements:**
+```markdown
+---
+name: security-reviewer
+description: Reviews code for security vulnerabilities and sensitive data exposure. Use proactively after any code changes that touch authentication, data handling, file I/O, subprocess calls, or network requests.
+tools: Read, Grep, Glob
+model: sonnet
+memory: project
+permissionMode: plan
+---
+```
+Changes:
+1. Remove `Bash` — not needed for read-only review; eliminates risk of active exploitation
+2. Add `model: sonnet` — analysis task, not opus-level reasoning
+3. Add `memory: project` — accumulate known-safe patterns and previously reviewed baselines
+4. Add `permissionMode: plan` — read-only mode, no writes
+5. Expand vulnerability categories:
+   - Add Python-specific: `pickle.loads()`, `eval()`, `exec()`, `subprocess` with `shell=True`
+   - Add cryptography: `hashlib.md5`, `hashlib.sha1`, `random.random()` in security context, hardcoded salts
+   - Add dependency chain: check `requirements.txt`, `package.json`, `Pipfile.lock` for known CVEs, e.g. via `safety check` for the Python manifests (this needs `Bash`, so it applies only if `Bash` is re-added behind a hook guard, contrary to change 1)
+6. Add explicit hallucination guard: "Only report findings grounded in specific file:line evidence from Read/Grep output. If a grep returns no matches, record the category as CLEAN — do not infer."
+7. Add structured `CLEAN` section to output format
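The hallucination guard and the `CLEAN` section in items 6 and 7 can be made mechanical. The following is a minimal sketch, assuming a hypothetical helper named `scan_category` and illustrative patterns (neither comes from `security-reviewer.md`): each category either prints `file:line` evidence from grep or is recorded as CLEAN, so nothing is inferred.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of security-reviewer.md): scan one
# vulnerability category in Python files under a directory and print
# either grep evidence (file:line:text) or an explicit CLEAN marker.
scan_category() {
  local name="$1" pattern="$2" dir="$3"
  local hits
  # find + grep -H keeps the file name in every hit, even for one file.
  # "|| true" because grep exits non-zero when nothing matches.
  hits=$(find "$dir" -type f -name '*.py' \
    -exec grep -HnE -e "$pattern" {} + 2>/dev/null) || true
  if [ -n "$hits" ]; then
    printf '## %s\n%s\n' "$name" "$hits"
  else
    printf '## %s\nCLEAN\n' "$name"
  fi
}

# Example calls (patterns are illustrative assumptions):
#   scan_category "Deserialization" 'pickle\.loads\(' src/
#   scan_category "Shell injection" 'shell=True' src/
```

A report built this way always lists every category, which removes the ambiguity about whether a missing finding means "checked and clean" or "never checked".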
+### 2. infra-auditor.md
+**Current state:** Well-specified (77 lines). Clear check categories, concrete commands, good report format. Strong baseline.
+**Gap analysis:**
+- No `model` field (should be `haiku` — mechanical checks, not reasoning)
+- No `maxTurns` — could loop indefinitely if a service check hangs
+- Missing checks: memory slice caps (the systemd-oomd and user-1000 slice are defined in CLAUDE.md), ollama-queue service, open-webui health
+- `systemctl --user is-active` for 6 services — correct, but missing the timer units (21 timers)
+- Sync freshness check uses `stat -c '%Y'` (epoch) but compares to nothing — needs `$(date +%s)` math
+- No hook to validate bash commands before execution (adds risk if agent hallucinates a destructive command)
+- Missing: `journalctl --user -u <service> --since "1 hour ago" --no-pager` for recent errors on unhealthy services
+**Recommended improvements:**
+```yaml
+model: haiku
+maxTurns: 30
+hooks:
+  PreToolUse:
+    - matcher: "Bash"
+      hooks:
+        - type: command
+          command: "~/.claude/hooks/validate-readonly-bash.sh"
+```
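The contents of `validate-readonly-bash.sh` are not shown in this document. As a rough sketch of what such a guard might check, the hypothetical function below rejects commands containing state-changing verbs. The denylist is an assumption chosen for illustration, and the wiring in the trailing comment assumes the hook receives the tool call as JSON on stdin and that a blocking exit code rejects it; the hooks documentation is the authority on that contract.

```shell
#!/usr/bin/env bash
# Illustrative sketch only: the denylist is an assumption, not the real
# contents of validate-readonly-bash.sh.
# Returns 0 if the command looks read-only, 1 otherwise.
is_readonly_command() {
  local cmd="$1"
  # Destructive or state-changing programs, matched as whole words.
  local deny='(^|[^[:alnum:]_-])(rm|mv|dd|mkfs|shutdown|reboot|kill|pkill|sudo|tee)([^[:alnum:]_-]|$)'
  # systemctl subcommands that change service state.
  local deny_systemctl='systemctl[^|;&]*[[:space:]](start|stop|restart|enable|disable|mask)([[:space:]]|$)'
  if printf '%s\n' "$cmd" | grep -Eq -e "$deny" -e "$deny_systemctl"; then
    return 1
  fi
  return 0
}

# Hypothetical wiring inside the hook script (assumed contract, see lead-in):
#   cmd=$(jq -r '.tool_input.command')
#   is_readonly_command "$cmd" || { echo "blocked: $cmd" >&2; exit 2; }
```

A denylist is easy to read but easy to bypass. An allowlist of the exact audit commands would be stricter, at the cost of needing an update whenever a check is added.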
+Specific content additions:
+1. Add timer audit: `systemctl --user list-timers --no-pager` — check that all 21 timers are active
+2. Fix sync freshness math: `NOW=$(date +%s); SYNC=$(stat -c '%Y' file); echo $((NOW - SYNC))` prints the file's age in seconds
+3. Add ollama-queue service check: `curl -s http://127.0.0.1:7683/health`
+4. Add memory slice check: `systemctl show user-1000.slice --property=MemoryHigh`
+5. Add hallucination guard: "Only report the output of commands you actually executed. Do not infer service health without running the check."
+6. Add journal check for any unhealthy service before escalating to CRITICAL
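The sync freshness fix in item 2 can be wrapped in a reusable check. This is a minimal sketch with hypothetical helper names (`file_age_seconds`, `is_fresh`) and a caller-supplied threshold. It assumes GNU `stat`, as the original command does.

```shell
#!/usr/bin/env bash
# Hypothetical helper: age of a file in seconds (GNU stat, as in item 2).
file_age_seconds() {
  local now mtime
  now=$(date +%s)
  mtime=$(stat -c '%Y' "$1") || return 1   # fails if the file is missing
  echo $((now - mtime))
}

# Hypothetical check: succeed only if the file was modified within
# the last max_age seconds.
is_fresh() {
  local file="$1" max_age="$2" age
  age=$(file_age_seconds "$file") || return 1
  [ "$age" -le "$max_age" ]
}

# Example (path and threshold are placeholders):
#   is_fresh /path/to/sync-marker 3600 || echo "WARNING: sync is stale"
```

Treating a missing file as "not fresh" keeps the audit from reporting a healthy sync when the marker was never written.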
+### 3. doc-updater.md
+**Current state:** Well-structured (40 lines). Context hierarchy table is excellent. CLAUDE.md chain enforcement is the right mental model.
+**Gap analysis:**
+- No `model` field (should be `sonnet` — needs to reason about content placement)
+- No `isolation: worktree` — doc writes could corrupt staging area (Lesson #44 parallel agent concern)
+- `git diff HEAD~1` only looks at last commit — misses uncommitted changes; should use `git diff HEAD` and `git status --short` together
+- No check for MEMORY.md line count (stated in the rules but no scan instruction)
+- Missing: validate that CLAUDE.md files don't contain hardcoded secrets (should grep for IP addresses, tokens)
+- No output format — the agent makes changes but returns no structured summary of what was changed and why
+- Write tool is included — needs explicit guard against writing to CLAUDE.md files it hasn't read first (lesson #file-editing from CLAUDE.md)
+**Recommended improvements:**
+```yaml
+model: sonnet
+isolation: worktree
+```
+Content additions:
+1. Add explicit scan sequence:
+   - Step 0: `git status --short && git diff HEAD --name-only` (catches staged, unstaged, and untracked changes)
+   - Add MEMORY.md line count check: `wc -l ~/.claude/projects/.../memory/MEMORY.md`
+2. Add output format:
+   ```
+   ## Doc Update Summary
+   Files reviewed: [list]
+   Files modified: [list with reason]
+   Duplication removed: [what and where]
+   No-op: [what needed no change and why]
+   ```
+3. Add security check: before writing, grep new content for IP addresses, tokens, credentials
+4. Add explicit "read before write" rule — must Read the target file before Edit/Write
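Step 0 of the scan sequence can be reduced to a single list of paths. The sketch below uses a hypothetical helper named `changed_paths`. It relies on `git status --porcelain`, the script-stable form of `--short`, and assumes simple paths: renames appear as `old -> new`, and paths with unusual characters are quoted by git.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list every path with uncommitted work in a repo,
# covering staged, unstaged, and untracked files.
changed_paths() {
  # Porcelain v1 lines look like "XY path": two status characters,
  # one space, then the path, so the path starts at column 4.
  git -C "$1" status --porcelain | cut -c4-
}

# Example: review only Markdown files that have uncommitted changes.
#   changed_paths . | grep '\.md$'
```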
+### 4. lesson-scanner.md
+**Current state:** Excellent (294 lines). Most mature agent in the set. Structured scan groups, explicit patterns, hallucination guard already present, clean report format. This is the reference implementation.
+**Gap analysis:**
+- Description says "53 lessons" — now 66 lessons (stale count)
+- No `model` field (should be `sonnet` — pattern matching and analysis, not Opus-level)
+- No `memory: project` — could cache "clean file" hashes to skip unchanged files on repeat runs
+- Scan Group coverage gaps vs. current lesson set:
+  - Missing Lessons #60-66 (research-derived, added 2026-02-21): plan quality, spec compliance, positive instructions, lesson scope, context placement
+  - Missing Lesson #51: `.venv/bin/pip` vs `.venv/bin/python -m pip` (hookify warns but scanner should flag too)
+  - Missing Lesson #50: plan assertion math (if scanner runs on docs/plans/*.md)
+  - Missing Lesson #26: unit boundary verification
+- Scan Group 4a (duplicate function names) has a false-positive threshold of 3 files — should be configurable
+**Recommended improvements:**
+```yaml
+model: sonnet
+memory: project
+```
+Content additions:
+1. Update description count: "66 lessons" (from 53)
+2. Add Scan Group 7: Plan Quality (Lessons #60-66):
+   - Scan `docs/plans/*.md` for missing hypothesis statements, missing acceptance criteria, missing success metrics
+   - Pattern: check for "hypothesis:" or "we believe" keywords — absence is a flag
+   - Pattern: check for "acceptance criteria" section — absence is Should-Fix
+3. Add Scan 3f: `.venv/bin/pip` usage (Lesson #51):
+   ```
+   pattern: \.venv/bin/pip\b
+   glob: **/*.{sh,md,py}
+   ```
+   Flag as Should-Fix with fix: use `.venv/bin/python -m pip`
+4. Add memory instruction: "After each scan, write a one-line entry to MEMORY.md noting the project path, timestamp, and blocker count. On repeat scans, check memory first — if a file has not changed since last scan and had no blockers, skip it."
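Scan 3f can be tried outside the agent with plain `find` and `grep`. This is a sketch with a hypothetical function name, `scan_venv_pip`. The character class at the end of the pattern stands in for `\b`, which is not portable across grep implementations.

```shell
#!/usr/bin/env bash
# Hypothetical scanner for Scan 3f: print file:line:text for each direct
# .venv/bin/pip call in shell, Markdown, and Python files.
scan_venv_pip() {
  # "|| true" because grep exits non-zero when there are no matches.
  find "$1" -type f \( -name '*.sh' -o -name '*.md' -o -name '*.py' \) \
    -exec grep -HnE -e '\.venv/bin/pip([^[:alnum:]_]|$)' {} + || true
}

# Example:
#   scan_venv_pip . | sed 's/^/Should-Fix: /'
```

The recommended form `.venv/bin/python -m pip` is not flagged, because the literal text `.venv/bin/pip` never occurs in it.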
+### 5. counter.md
+**Current state:** Exceptional (466 lines). Most sophisticated agent in the set. Psychological grounding, four lenses, lean gate, wildcard, human contact gate, severity system, critical rules. This is a complete system.
+**Gap analysis:**
+- No `maxTurns` — a review could spiral into exhaustive analysis; 20 turns is sufficient for any review
+- The `Discovered Patterns` section at the bottom is the right pattern but has no reminder to check it — the agent could skip it on automatic runs
+- No reference to Lessons #60-66 in the Bias Detection section — "Lesson regression" check (Lens 2) should include the research-derived clusters E and F
+- `~/.claude/counter-humans.md` is referenced but if this file doesn't exist the human contact gate silently fails
+- Missing: the agent has no instruction to check whether it is being invoked recursively (counter reviewing counter output creates an echo chamber)
+**Recommended improvements:**
+1. Add `maxTurns: 20` to frontmatter
+2. Add Cluster E and F to the Lesson Regression check in Lens 2:
+   ```
+   "Lesson regression — mental grep against all 6 clusters:
+   A (silent failures), B (integration boundaries), C (cold-start),
+   D (specification drift), E (context & retrieval — info buried or misscoped),
+   F (planning & control flow — wrong decomposition contaminates downstream)"
+   ```
+3. Add check at top of Discovered Patterns section: "Before reviewing, scan Discovered Patterns for any pattern matching the input type."
+4. Add guard: "If the input being reviewed is itself a Counter output or review of a review, flag this to the user before proceeding — adversarial review of adversarial review creates false certainty."
+### 6. counter-daily.md
+**Current state:** Well-calibrated (66 lines). Tight scope, correct model, no padding.
+**Gap analysis:**
+- No `maxTurns` — should be 5 (three questions, acknowledgment, done)
+- Missing question pool entry for "Lesson regression" gap — the daily check could include "Did you repeat a known failure pattern today?" as an optional question
+- The defaults fire when no context is provided — but if the user provides partial context, question selection logic is vague ("pick the three most relevant")
+- No output structure at all — questions are unformatted, which is correct for this agent, but there's no instruction about follow-up behavior if Justin responds
+**Recommended improvements:**
+1. Add `maxTurns: 5`
+2. Add one question to each pool as options:
+   - Collaboration: "Did you make any decision today based on a lesson you've documented but ignored anyway?"
+   - Focus: "What would have changed if you'd checked Lessons SUMMARY.md before starting today's main task?"
+3. Add behavior rule: "If Justin responds to the questions, acknowledge once and stop. Do not analyze the response. Do not follow up with more questions. That's the full counter's job."
+### 7. notion-researcher.md
+**Current state:** Well-structured (77 lines). Search strategy hierarchy is excellent. Content domain shortcuts are high-value. Synthesis rules are correct.
+**Gap analysis:**
+- No `model` field (should be `sonnet` — cross-database synthesis, not mechanical lookup)
+- `tools: Read, Grep, Glob, Bash` — Bash is needed for `notion-vector-search` CLI, correct
+- No `maxTurns` — large Notion workspaces could cause runaway exploration; limit to 40 turns
+- Staleness check instruction is there but weak — "if freshness matters" is vague; should always check if data is >12 hours old
+- No citation format standardization — the output rule says "cite sources" but doesn't specify format; the main session can't parse inconsistent citations
+- Missing: the agent should check `~/Documents/notion/CLAUDE.md` exists before starting — if Notion sync has never run, the file may not exist
+- Missing: if `notion-vector-search` returns 0 results, the agent has no fallback instruction (will hallucinate or stop)
+**Recommended improvements:**
+```yaml
+model: sonnet
+maxTurns: 40
+```
+Content additions:
+1. Standardize citation format:
+   ```
+   Source: [Database/Page Name] | ID: {uuid} | Updated: {date}
+   ```
+2. Add vector search fallback: "If `notion-vector-search` returns 0 results, fall back to Grep with decomposed keyword terms before concluding the topic is not in Notion."
+3. Strengthen staleness check: always run `stat` on sync metadata at start and include age in output — do not wait for "freshness matters"
+4. Add guard: "Check that `~/Documents/notion/CLAUDE.md` exists before searching. If it doesn't exist, report: 'Notion local replica not found — run notion-sync first.'"
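Items 3 and 4 can be combined into one start-up check. The sketch below assumes GNU `stat` and uses the marker file's modification time as a stand-in for the sync metadata, which is not named here; `check_replica` and its messages are illustrative. The 12-hour default mirrors the threshold from the gap analysis.

```shell
# Hypothetical start-up check: confirm the replica exists and report its age.
# $1 = marker file (e.g. ~/Documents/notion/CLAUDE.md), $2 = max age in hours.
check_replica() {
  marker="$1"
  max_age_hours="${2:-12}"
  if [ ! -f "$marker" ]; then
    echo "Notion local replica not found: run notion-sync first."
    return 1
  fi
  now=$(date +%s)                      # current time, epoch seconds
  mtime=$(stat -c '%Y' "$marker")      # last modification, epoch seconds (GNU stat)
  age_hours=$(( (now - mtime) / 3600 ))
  if [ "$age_hours" -gt "$max_age_hours" ]; then
    echo "STALE: replica last updated ${age_hours}h ago"
  else
    echo "FRESH: replica last updated ${age_hours}h ago"
  fi
}

# Usage: check_replica "$HOME/Documents/notion/CLAUDE.md" 12
```

Printing the age on every run, fresh or stale, lets the caller decide whether it matters instead of leaving that judgment to the agent.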
+### 8. notion-writer.md
+**Current state:** Functional (115 lines). Complete API reference, good property formats, batch operation example, SQLite sync instruction.
+**Gap analysis:**
+- No `model` field (should be `haiku` — mechanical API calls, not reasoning)
+- No `maxTurns` — should be 20 to prevent runaway batch operations
+- Rate limit handling is documented, but the only instruction after hitting a rate limit is "wait and retry"; it should specify exponential backoff
+- No input validation instruction — if called with a missing database ID, will attempt API call and get a cryptic 404
+- The SQLite sync step is noted as "when creating pages from capture bot" — but the agent has no way to know which origin triggered it; it should always offer to sync
+- No rollback instruction — if a batch create fails midway, the agent has no guidance on how to identify which pages were created vs. not
+- Missing: the agent should verify `NOTION_API_KEY` is set before first API call, not discover it's missing on first 401
+**Recommended improvements:**
+```yaml
+model: haiku
+maxTurns: 20
+hooks:
+  PreToolUse:
+    - matcher: "Bash"
+      hooks:
+        - type: command
+          command: "~/.claude/hooks/validate-api-key.sh NOTION_API_KEY"
+```
+Content additions:
+1. Add pre-flight check: "Before any API call, verify `NOTION_API_KEY` is set: `bash -c 'source ~/.env && [ -n "$NOTION_API_KEY" ] && echo OK || echo MISSING'`"
+2. Add input validation: "Before calling any API with a database ID, check that the ID matches UUID format (8-4-4-4-12 hex). If not, stop and report the malformed ID."
+3. Add exponential backoff for HTTP 429 responses: wait `sleep $((retry_after + 1))` before the first retry, then double the delay on the second retry
+4. Add batch operation tracking: maintain a local list of successfully created page IDs during batch operations; if an error occurs, report "Created N of M pages: [list of IDs]"
+5. Add SQLite sync offer: always end with "Run `notion-sync --page PAGE_ID` to refresh local replica for each created page?"
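Items 2 and 3 are small enough to sketch directly. `is_uuid` and `backoff_delay` are hypothetical helper names; the doubling schedule is one reading of "double delay on second retry", extended to later attempts as an assumption.

```shell
# Hypothetical validator: succeed only for 8-4-4-4-12 hex IDs.
is_uuid() {
  printf '%s\n' "$1" |
    grep -Eq '^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$'
}

# Hypothetical backoff: (retry_after + 1) seconds, doubled for each later attempt.
# $1 = Retry-After value in seconds, $2 = attempt number starting at 1.
backoff_delay() {
  delay=$(( $1 + 1 ))
  attempt=1
  while [ "$attempt" -lt "$2" ]; do
    delay=$(( delay * 2 ))
    attempt=$(( attempt + 1 ))
  done
  echo "$delay"
}

# Usage:
#   is_uuid "$database_id" || { echo "Malformed database ID: $database_id"; exit 1; }
#   sleep "$(backoff_delay "$retry_after" "$attempt_number")"
```

Validating the ID before the request turns a cryptic 404 into a clear local error, and computing the delay in a separate function keeps the retry loop itself short.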
+---
+## Cross-Cutting Patterns
+### Pattern 1: Hallucination Guard Template
+Every audit/review agent (security-reviewer, infra-auditor, lesson-scanner) should include this as its final instruction:
+```
+## Anti-Hallucination Rules
+- Report ONLY what Grep/Read/Bash output directly confirms.
+- If a scan group returns no grep matches, record it as CLEAN — do not infer vulnerabilities.
+- If you are uncertain about a finding, read more context before flagging — do not flag based on pattern proximity alone.
+- If a command fails or returns no output, report "Could not verify: [check name]" rather than assuming pass or fail.
+```
+lesson-scanner already has a version of this. security-reviewer and infra-auditor need it added.
+### Pattern 2: Model Tier Alignment
+Current state vs. correct assignment:
+| Agent | Current | Should Be | Reason |
+|-------|---------|-----------|--------|
+| security-reviewer | inherit | sonnet | Multi-file analysis |
+| infra-auditor | inherit | haiku | Mechanical checks |
+| doc-updater | inherit | sonnet | Content reasoning |
+| lesson-scanner | inherit | sonnet | Pattern analysis |
+| counter | opus | opus | Correct — adversarial reasoning |
+| counter-daily | sonnet | sonnet | Correct — lightweight |
+| notion-researcher | inherit | sonnet | Cross-database synthesis |
+| notion-writer | inherit | haiku | Mechanical API calls |
+### Pattern 3: maxTurns as Safety Net
+None of the current agents set `maxTurns`. Per official docs, this is a hard stop on runaway execution. Recommended values:
+| Agent | maxTurns | Reason |
+|-------|----------|--------|
+| security-reviewer | 50 | May scan many files |
+| infra-auditor | 30 | ~20 discrete checks |
+| doc-updater | 20 | Few files to read+write |
+| lesson-scanner | 80 | 6 scan groups × many files |
+| counter | 20 | Review, not analysis marathon |
+| counter-daily | 5 | 3 questions only |
+| notion-researcher | 40 | May explore many pages |
+| notion-writer | 20 | Bounded by batch size |
+### Pattern 4: Memory for Audit Agents
+Three agents would benefit most from `memory: project`:
+- **lesson-scanner**: Cache scan results per file hash; skip unchanged clean files on repeat runs. This reduces repeat scans from O(project_size) to O(changed_files); the first run still scans every file.
+- **security-reviewer**: Store baseline of known-safe patterns (e.g., "this project uses parameterized queries throughout — SQL injection is mitigated at the ORM layer"). Avoid re-flagging architecturally sound patterns.
+- **infra-auditor**: Store service baseline state. Flag deviations from baseline rather than absolute thresholds. Reduces false positives on expected service restarts.
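The lesson-scanner caching idea can be sketched as a hash-keyed skip list. This assumes `sha256sum` from GNU coreutils and a plain-text cache file; how `memory: project` actually persists data is not specified here, so the cache path and the helper names are illustrative.

```shell
# Hypothetical helpers for skipping files that previously scanned clean.
file_hash() {
  sha256sum "$1" | cut -d' ' -f1
}

# Succeeds (exit 0) when the file must be scanned: no cache yet, or the
# file's current hash is not recorded as clean.
# $1 = file to check, $2 = cache file containing "hash path" lines.
needs_scan() {
  [ -f "$2" ] || return 0
  ! grep -qxF "$(file_hash "$1") $1" "$2"
}

# Record a file as clean at its current content hash.
mark_clean() {
  printf '%s %s\n' "$(file_hash "$1")" "$1" >> "$2"
}

# Usage:
#   if needs_scan src/app.py .scan-cache; then
#     ... run the scan groups, then: mark_clean src/app.py .scan-cache
#   fi
```

Keying on the content hash instead of the modification time means a touched but unchanged file is still skipped, while any edit forces a rescan.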
+### Pattern 5: Description Quality
+The `description` field is how Claude decides when to delegate. Current descriptions vary in specificity:
+**Weak** (won't trigger delegation reliably):
+- `security-reviewer`: "Reviews code for security vulnerabilities and sensitive data exposure" — no trigger phrase
+- `doc-updater`: "Reviews recent changes and updates documentation" — no trigger phrase
+**Strong** (explicit invocation triggers):
+- `lesson-scanner`: "Scans codebase for anti-patterns... Dispatched via /audit lessons against any Python/JS/TS project root" — explicit dispatch instruction
+- `notion-researcher`: "Use this agent when answering questions that require reading multiple Notion files..." — clear use-case examples
+All agents should include: "Use proactively when..." or "Dispatch when..." with specific trigger conditions.
+### Pattern 6: Tool Minimization
+Per PubNub: "Be intentional" about tools. Current over-permissions:
+- `security-reviewer` has `Bash` — remove it; read-only review needs only Read/Grep/Glob
+- `infra-auditor` has `Bash` — keep it (needed for system checks), but add PreToolUse hook to validate no destructive commands
+- `doc-updater` has `Edit, Write, Bash` — all justified, but add read-before-write rule
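A minimal sketch of the classification step that a PreToolUse hook for infra-auditor might run. The denylist and the name `is_destructive` are illustrative, not from the source; how the hook receives the command string and how it signals a block should be taken from the hooks documentation, not from this sketch.

```shell
# Hypothetical check: succeed (exit 0) when a command looks destructive.
# Substring matching is deliberately coarse; extend the denylist per host.
is_destructive() {
  case "$1" in
    *"rm -rf"*|*"rm -fr"*|*mkfs*|*"dd if="*) return 0 ;;
    *"systemctl stop"*|*"systemctl disable"*|*shutdown*|*reboot*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage inside a hook script, with $cmd holding the proposed Bash command:
#   if is_destructive "$cmd"; then
#     echo "Refusing destructive command: $cmd" >&2
#     # reject the command here, using the status the hook runner expects
#   fi
```

A denylist like this is a safety net, not a sandbox: it catches obvious mistakes in an audit run but will not stop a command that is obfuscated or simply missing from the list.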
+---
+## Priority-Ordered Action List
+### P0 — Correctness (prevents wrong output)
+1. **Add hallucination guards to security-reviewer and infra-auditor** — these agents report findings that drive action; false findings are costly
+2. **Fix infra-auditor sync freshness math**: `stat -c '%Y'` returns an absolute modification time in epoch seconds, so the current comparison is broken until that value is subtracted from `$(date +%s)` to get an age
+3. **Remove Bash from security-reviewer** — read-only review should not have shell execution; eliminates active-exploitation risk
+4. **Update lesson-scanner description count** — "53 lessons" is stale; now 66
+### P1 — Quality (prevents waste or confusion)
+5. **Add `model` fields to all 6 agents missing them**: without one, an agent inherits the session's model, so sonnet-scale tasks can run on haiku and haiku-scale tasks on opus by accident
+6. **Add `maxTurns` to all agents**: prevents runaway execution; use the values recommended in Pattern 3
+7. **Add explicit trigger phrases to security-reviewer and doc-updater descriptions** — delegation won't activate reliably without them
+8. **Fix doc-updater git diff command** — `HEAD~1` misses uncommitted changes; use `git status --short && git diff HEAD`
+### P2 — Capability (adds meaningful new features)
+9. **Add `memory: project` to lesson-scanner** — caching clean-file results transforms repeat scan performance
+10. **Add Scan Group 7 (Plan Quality, Lessons #60-66) to lesson-scanner** — research-derived lessons are not currently scanned
+11. **Add Scan 3f (`.venv/bin/pip`, Lesson #51) to lesson-scanner** — hookify warns but scanner should also flag
+12. **Add Clusters E and F to counter Bias Detection (Lens 2)** — lesson regression check is incomplete without them
+13. **Add notion-researcher vector search fallback** — zero-result behavior is undefined
+14. **Add notion-writer pre-flight API key check** — currently discovers missing key on first 401
+### P3 — Polish (reduces friction)
+15. **Add structured output format to doc-updater** — currently makes changes but returns no summary
+16. **Add counter-daily follow-up behavior rule** — "acknowledge once and stop" prevents it from morphing into a full counter session
+17. **Add notion-writer batch operation tracking** — partial failure currently leaves ambiguous state
+18. **Add `memory: project` to security-reviewer** — baseline known-safe patterns across sessions
+19. **Add `isolation: worktree` to doc-updater** — protects staging area during CLAUDE.md writes
+20. **Add counter `maxTurns: 20`** — prevents review sessions from becoming analysis marathons
+---
+## Agent Chain Integration Opportunities
+Three natural agent chains exist that are not currently wired:
+### Chain 1: Code Change Pipeline
+```
+[code change committed]
+  → security-reviewer (read-only scan, report findings)
+  → lesson-scanner (pattern audit, report violations)
+  → doc-updater (update CLAUDE.md + README if needed)
+```
+Currently these run independently. Wiring via a slash command or hook would create a single `/post-commit-audit` that runs all three.
+### Chain 2: Notion Research → Write
+```
+[user asks a Notion question]
+  → notion-researcher (explore, synthesize, return citations)
+  → notion-writer (create capture page with findings if requested)
+```
+Currently the user manually switches between agents. The researcher's output format should be designed to be directly consumable by the writer.
+### Chain 3: Counter → doc-updater
+```
+[counter reviews a plan and finds issues]
+  → counter returns critique with specific gaps
+  → doc-updater updates the plan doc with flagged items
+```
+This requires counter's output to include actionable file:line references that doc-updater can consume, which is a structural change to counter's output format.
+---
+## Appendix: Official Frontmatter Reference (as of 2026-02-23)
+From [Claude Code Docs](https://code.claude.com/docs/en/sub-agents):
+| Field | Required | Default | Notes |
+|-------|----------|---------|-------|
+| `name` | Yes | — | Lowercase + hyphens |
+| `description` | Yes | — | Delegation trigger text |
+| `tools` | No | All inherited | Omitting = full inheritance (dangerous) |
+| `disallowedTools` | No | — | Blocklist from inherited set |
+| `model` | No | inherit | sonnet/opus/haiku/inherit |
+| `permissionMode` | No | default | default/acceptEdits/dontAsk/bypassPermissions/plan |
+| `maxTurns` | No | unlimited | Hard stop on agentic turns |
+| `skills` | No | — | Inject skill content at startup |
+| `mcpServers` | No | — | MCP servers available to subagent |
+| `hooks` | No | — | Lifecycle hooks scoped to subagent |
+| `memory` | No | — | user/project/local |
+| `background` | No | false | Always run as background task |
+| `isolation` | No | — | worktree = isolated git context |