npm - autonomous-coding-toolkit - Versions diffs - 1.0.0 - Mend

autonomous-coding-toolkit 1.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (324)

package/.claude-plugin/marketplace.json +22 -0
package/.claude-plugin/plugin.json +13 -0
package/LICENSE +21 -0
package/Makefile +21 -0
package/README.md +140 -0
package/SECURITY.md +28 -0
package/agents/bash-expert.md +113 -0
package/agents/dependency-auditor.md +138 -0
package/agents/integration-tester.md +120 -0
package/agents/lesson-scanner.md +149 -0
package/agents/python-expert.md +179 -0
package/agents/service-monitor.md +141 -0
package/agents/shell-expert.md +147 -0
package/benchmarks/runner.sh +147 -0
package/benchmarks/tasks/01-rest-endpoint/rubric.sh +29 -0
package/benchmarks/tasks/01-rest-endpoint/task.md +17 -0
package/benchmarks/tasks/02-refactor-module/task.md +8 -0
package/benchmarks/tasks/03-fix-integration-bug/task.md +8 -0
package/benchmarks/tasks/04-add-test-coverage/task.md +8 -0
package/benchmarks/tasks/05-multi-file-feature/task.md +8 -0
package/bin/act.js +238 -0
package/commands/autocode.md +6 -0
package/commands/cancel-ralph.md +18 -0
package/commands/code-factory.md +53 -0
package/commands/create-prd.md +55 -0
package/commands/ralph-loop.md +18 -0
package/commands/run-plan.md +117 -0
package/commands/submit-lesson.md +122 -0
package/docs/ARCHITECTURE.md +630 -0
package/docs/CONTRIBUTING.md +125 -0
package/docs/lessons/0001-bare-exception-swallowing.md +34 -0
package/docs/lessons/0002-async-def-without-await.md +28 -0
package/docs/lessons/0003-create-task-without-callback.md +28 -0
package/docs/lessons/0004-hardcoded-test-counts.md +28 -0
package/docs/lessons/0005-sqlite-without-closing.md +33 -0
package/docs/lessons/0006-venv-pip-path.md +27 -0
package/docs/lessons/0007-runner-state-self-rejection.md +35 -0
package/docs/lessons/0008-quality-gate-blind-spot.md +33 -0
package/docs/lessons/0009-parser-overcount-empty-batches.md +36 -0
package/docs/lessons/0010-local-outside-function-bash.md +33 -0
package/docs/lessons/0011-batch-tests-for-unimplemented-code.md +36 -0
package/docs/lessons/0012-api-markdown-unescaped-chars.md +33 -0
package/docs/lessons/0013-export-prefix-env-parsing.md +33 -0
package/docs/lessons/0014-decorator-registry-import-side-effect.md +43 -0
package/docs/lessons/0015-frontend-backend-schema-drift.md +43 -0
package/docs/lessons/0016-event-driven-cold-start-seeding.md +44 -0
package/docs/lessons/0017-copy-paste-logic-diverges.md +43 -0
package/docs/lessons/0018-layer-passes-pipeline-broken.md +45 -0
package/docs/lessons/0019-systemd-envfile-ignores-export.md +41 -0
package/docs/lessons/0020-persist-state-incrementally.md +44 -0
package/docs/lessons/0021-dual-axis-testing.md +48 -0
package/docs/lessons/0022-jsx-factory-shadowing.md +43 -0
package/docs/lessons/0023-static-analysis-spiral.md +51 -0
package/docs/lessons/0024-shared-pipeline-implementation.md +55 -0
package/docs/lessons/0025-defense-in-depth-all-entry-points.md +65 -0
package/docs/lessons/0026-linter-no-rules-false-enforcement.md +54 -0
package/docs/lessons/0027-jsx-silent-prop-drop.md +64 -0
package/docs/lessons/0028-no-infrastructure-in-client-code.md +49 -0
package/docs/lessons/0029-never-write-secrets-to-files.md +61 -0
package/docs/lessons/0030-cache-merge-not-replace.md +62 -0
package/docs/lessons/0031-verify-units-at-boundaries.md +66 -0
package/docs/lessons/0032-module-lifecycle-subscribe-unsubscribe.md +89 -0
package/docs/lessons/0033-async-iteration-mutable-snapshot.md +72 -0
package/docs/lessons/0034-caller-missing-await-silent-discard.md +65 -0
package/docs/lessons/0035-duplicate-registration-silent-overwrite.md +85 -0
package/docs/lessons/0036-websocket-dirty-disconnect.md +33 -0
package/docs/lessons/0037-parallel-agents-worktree-corruption.md +31 -0
package/docs/lessons/0038-subscribe-no-stored-ref.md +36 -0
package/docs/lessons/0039-fallback-or-default-hides-bugs.md +34 -0
package/docs/lessons/0040-event-firehose-filter-first.md +36 -0
package/docs/lessons/0041-ambiguous-base-dir-path-nesting.md +32 -0
package/docs/lessons/0042-spec-compliance-insufficient.md +36 -0
package/docs/lessons/0043-exact-count-extensible-collections.md +32 -0
package/docs/lessons/0044-relative-file-deps-worktree.md +39 -0
package/docs/lessons/0045-iterative-design-improvement.md +33 -0
package/docs/lessons/0046-plan-assertion-math-bugs.md +38 -0
package/docs/lessons/0047-pytest-single-threaded-default.md +37 -0
package/docs/lessons/0048-integration-wiring-batch.md +40 -0
package/docs/lessons/0049-ab-verification.md +41 -0
package/docs/lessons/0050-editing-sourced-files-during-execution.md +33 -0
package/docs/lessons/0051-infrastructure-fixes-cant-self-heal.md +30 -0
package/docs/lessons/0052-uncommitted-changes-poison-quality-gates.md +31 -0
package/docs/lessons/0053-jq-compact-flag-inconsistency.md +31 -0
package/docs/lessons/0054-parser-matches-inside-code-blocks.md +30 -0
package/docs/lessons/0055-agents-compensate-for-garbled-prompts.md +31 -0
package/docs/lessons/0056-grep-count-exit-code-on-zero.md +42 -0
package/docs/lessons/0057-new-artifacts-break-git-clean-gates.md +42 -0
package/docs/lessons/0058-dead-config-keys-never-consumed.md +49 -0
package/docs/lessons/0059-contract-test-shared-structures.md +53 -0
package/docs/lessons/0060-set-e-silent-death-in-runners.md +53 -0
package/docs/lessons/0061-context-injection-dirty-state.md +50 -0
package/docs/lessons/0062-sibling-bug-neighborhood-scan.md +29 -0
package/docs/lessons/0063-one-flag-two-lifetimes.md +31 -0
package/docs/lessons/0064-test-passes-wrong-reason.md +31 -0
package/docs/lessons/0065-pipefail-grep-count-double-output.md +39 -0
package/docs/lessons/0066-local-keyword-outside-function.md +37 -0
package/docs/lessons/0067-stdin-hang-non-interactive-shell.md +36 -0
package/docs/lessons/0068-agent-builds-wrong-thing-correctly.md +31 -0
package/docs/lessons/0069-plan-quality-dominates-execution.md +30 -0
package/docs/lessons/0070-spec-echo-back-prevents-drift.md +31 -0
package/docs/lessons/0071-positive-instructions-outperform-negative.md +30 -0
package/docs/lessons/0072-lost-in-the-middle-context-placement.md +30 -0
package/docs/lessons/0073-unscoped-lessons-cause-false-positives.md +30 -0
package/docs/lessons/0074-stale-context-injection-wrong-batch.md +32 -0
package/docs/lessons/0075-research-artifacts-must-persist.md +32 -0
package/docs/lessons/0076-wrong-decomposition-contaminates-downstream.md +30 -0
package/docs/lessons/0077-cherry-pick-merges-need-manual-resolution.md +30 -0
package/docs/lessons/0078-static-review-without-live-test.md +30 -0
package/docs/lessons/0079-integration-wiring-batch-required.md +32 -0
package/docs/lessons/FRAMEWORK.md +161 -0
package/docs/lessons/SUMMARY.md +201 -0
package/docs/lessons/TEMPLATE.md +85 -0
package/docs/plans/2026-02-21-code-factory-v2-design.md +204 -0
package/docs/plans/2026-02-21-code-factory-v2-implementation-plan.md +2189 -0
package/docs/plans/2026-02-21-code-factory-v2-phase4-design.md +537 -0
package/docs/plans/2026-02-21-code-factory-v2-phase4-implementation-plan.md +2012 -0
package/docs/plans/2026-02-21-hardening-pass-design.md +108 -0
package/docs/plans/2026-02-21-hardening-pass-plan.md +1378 -0
package/docs/plans/2026-02-21-mab-research-report.md +406 -0
package/docs/plans/2026-02-21-marketplace-restructure-design.md +240 -0
package/docs/plans/2026-02-21-marketplace-restructure-plan.md +832 -0
package/docs/plans/2026-02-21-phase4-completion-plan.md +697 -0
package/docs/plans/2026-02-21-validator-suite-design.md +148 -0
package/docs/plans/2026-02-21-validator-suite-plan.md +540 -0
package/docs/plans/2026-02-22-mab-research-round2.md +556 -0
package/docs/plans/2026-02-22-mab-run-design.md +462 -0
package/docs/plans/2026-02-22-mab-run-plan.md +2046 -0
package/docs/plans/2026-02-22-operations-design-methodology-research.md +681 -0
package/docs/plans/2026-02-22-research-agent-failure-taxonomy.md +532 -0
package/docs/plans/2026-02-22-research-code-guideline-policies.md +886 -0
package/docs/plans/2026-02-22-research-codebase-audit-refactoring.md +908 -0
package/docs/plans/2026-02-22-research-coding-standards-documentation.md +541 -0
package/docs/plans/2026-02-22-research-competitive-landscape.md +687 -0
package/docs/plans/2026-02-22-research-comprehensive-testing.md +1076 -0
package/docs/plans/2026-02-22-research-context-utilization.md +459 -0
package/docs/plans/2026-02-22-research-cost-quality-tradeoff.md +548 -0
package/docs/plans/2026-02-22-research-lesson-transferability.md +508 -0
package/docs/plans/2026-02-22-research-multi-agent-coordination.md +312 -0
package/docs/plans/2026-02-22-research-phase-integration.md +602 -0
package/docs/plans/2026-02-22-research-plan-quality.md +428 -0
package/docs/plans/2026-02-22-research-prompt-engineering.md +558 -0
package/docs/plans/2026-02-22-research-unconventional-perspectives.md +528 -0
package/docs/plans/2026-02-22-research-user-adoption.md +638 -0
package/docs/plans/2026-02-22-research-verification-effectiveness.md +433 -0
package/docs/plans/2026-02-23-agent-suite-design.md +299 -0
package/docs/plans/2026-02-23-agent-suite-plan.md +578 -0
package/docs/plans/2026-02-23-phase3-cost-infrastructure-design.md +148 -0
package/docs/plans/2026-02-23-phase3-cost-infrastructure-plan.md +1062 -0
package/docs/plans/2026-02-23-research-bash-expert-agent.md +543 -0
package/docs/plans/2026-02-23-research-dependency-auditor-agent.md +564 -0
package/docs/plans/2026-02-23-research-improving-existing-agents.md +503 -0
package/docs/plans/2026-02-23-research-integration-tester-agent.md +454 -0
package/docs/plans/2026-02-23-research-python-expert-agent.md +429 -0
package/docs/plans/2026-02-23-research-service-monitor-agent.md +425 -0
package/docs/plans/2026-02-23-research-shell-expert-agent.md +533 -0
package/docs/plans/2026-02-23-roadmap-to-completion.md +530 -0
package/docs/plans/2026-02-24-headless-module-split-design.md +98 -0
package/docs/plans/2026-02-24-headless-module-split.md +443 -0
package/docs/plans/2026-02-24-lesson-scope-metadata-design.md +228 -0
package/docs/plans/2026-02-24-lesson-scope-metadata-plan.md +968 -0
package/docs/plans/2026-02-24-npm-packaging-design.md +841 -0
package/docs/plans/2026-02-24-npm-packaging-plan.md +1965 -0
package/docs/plans/audit-findings.md +186 -0
package/docs/telegram-notification-format.md +98 -0
package/examples/example-plan.md +51 -0
package/examples/example-prd.json +72 -0
package/examples/example-roadmap.md +33 -0
package/examples/quickstart-plan.md +63 -0
package/hooks/hooks.json +26 -0
package/hooks/setup-symlinks.sh +48 -0
package/hooks/stop-hook.sh +135 -0
package/package.json +47 -0
package/policies/bash.md +71 -0
package/policies/python.md +71 -0
package/policies/testing.md +61 -0
package/policies/universal.md +60 -0
package/scripts/analyze-report.sh +97 -0
package/scripts/architecture-map.sh +145 -0
package/scripts/auto-compound.sh +273 -0
package/scripts/batch-audit.sh +42 -0
package/scripts/batch-test.sh +101 -0
package/scripts/entropy-audit.sh +221 -0
package/scripts/failure-digest.sh +51 -0
package/scripts/generate-ast-rules.sh +96 -0
package/scripts/init.sh +112 -0
package/scripts/lesson-check.sh +428 -0
package/scripts/lib/common.sh +61 -0
package/scripts/lib/cost-tracking.sh +153 -0
package/scripts/lib/ollama.sh +60 -0
package/scripts/lib/progress-writer.sh +128 -0
package/scripts/lib/run-plan-context.sh +215 -0
package/scripts/lib/run-plan-echo-back.sh +231 -0
package/scripts/lib/run-plan-headless.sh +396 -0
package/scripts/lib/run-plan-notify.sh +57 -0
package/scripts/lib/run-plan-parser.sh +81 -0
package/scripts/lib/run-plan-prompt.sh +215 -0
package/scripts/lib/run-plan-quality-gate.sh +132 -0
package/scripts/lib/run-plan-routing.sh +315 -0
package/scripts/lib/run-plan-sampling.sh +170 -0
package/scripts/lib/run-plan-scoring.sh +146 -0
package/scripts/lib/run-plan-state.sh +142 -0
package/scripts/lib/run-plan-team.sh +199 -0
package/scripts/lib/telegram.sh +54 -0
package/scripts/lib/thompson-sampling.sh +176 -0
package/scripts/license-check.sh +74 -0
package/scripts/mab-run.sh +575 -0
package/scripts/module-size-check.sh +146 -0
package/scripts/patterns/async-no-await.yml +5 -0
package/scripts/patterns/bare-except.yml +6 -0
package/scripts/patterns/empty-catch.yml +6 -0
package/scripts/patterns/hardcoded-localhost.yml +9 -0
package/scripts/patterns/retry-loop-no-backoff.yml +12 -0
package/scripts/pipeline-status.sh +197 -0
package/scripts/policy-check.sh +226 -0
package/scripts/prior-art-search.sh +133 -0
package/scripts/promote-mab-lessons.sh +126 -0
package/scripts/prompts/agent-a-superpowers.md +29 -0
package/scripts/prompts/agent-b-ralph.md +29 -0
package/scripts/prompts/judge-agent.md +61 -0
package/scripts/prompts/planner-agent.md +44 -0
package/scripts/pull-community-lessons.sh +90 -0
package/scripts/quality-gate.sh +266 -0
package/scripts/research-gate.sh +90 -0
package/scripts/run-plan.sh +329 -0
package/scripts/scope-infer.sh +159 -0
package/scripts/setup-ralph-loop.sh +155 -0
package/scripts/telemetry.sh +230 -0
package/scripts/tests/run-all-tests.sh +52 -0
package/scripts/tests/test-act-cli.sh +46 -0
package/scripts/tests/test-agents-md.sh +87 -0
package/scripts/tests/test-analyze-report.sh +114 -0
package/scripts/tests/test-architecture-map.sh +89 -0
package/scripts/tests/test-auto-compound.sh +169 -0
package/scripts/tests/test-batch-test.sh +65 -0
package/scripts/tests/test-benchmark-runner.sh +25 -0
package/scripts/tests/test-common.sh +168 -0
package/scripts/tests/test-cost-tracking.sh +158 -0
package/scripts/tests/test-echo-back.sh +180 -0
package/scripts/tests/test-entropy-audit.sh +146 -0
package/scripts/tests/test-failure-digest.sh +66 -0
package/scripts/tests/test-generate-ast-rules.sh +145 -0
package/scripts/tests/test-helpers.sh +82 -0
package/scripts/tests/test-init.sh +47 -0
package/scripts/tests/test-lesson-check.sh +278 -0
package/scripts/tests/test-lesson-local.sh +55 -0
package/scripts/tests/test-license-check.sh +109 -0
package/scripts/tests/test-mab-run.sh +182 -0
package/scripts/tests/test-ollama-lib.sh +49 -0
package/scripts/tests/test-ollama.sh +60 -0
package/scripts/tests/test-pipeline-status.sh +198 -0
package/scripts/tests/test-policy-check.sh +124 -0
package/scripts/tests/test-prior-art-search.sh +96 -0
package/scripts/tests/test-progress-writer.sh +140 -0
package/scripts/tests/test-promote-mab-lessons.sh +110 -0
package/scripts/tests/test-pull-community-lessons.sh +149 -0
package/scripts/tests/test-quality-gate.sh +241 -0
package/scripts/tests/test-research-gate.sh +132 -0
package/scripts/tests/test-run-plan-cli.sh +86 -0
package/scripts/tests/test-run-plan-context.sh +305 -0
package/scripts/tests/test-run-plan-e2e.sh +153 -0
package/scripts/tests/test-run-plan-headless.sh +424 -0
package/scripts/tests/test-run-plan-notify.sh +124 -0
package/scripts/tests/test-run-plan-parser.sh +217 -0
package/scripts/tests/test-run-plan-prompt.sh +254 -0
package/scripts/tests/test-run-plan-quality-gate.sh +222 -0
package/scripts/tests/test-run-plan-routing.sh +178 -0
package/scripts/tests/test-run-plan-scoring.sh +148 -0
package/scripts/tests/test-run-plan-state.sh +261 -0
package/scripts/tests/test-run-plan-team.sh +157 -0
package/scripts/tests/test-scope-infer.sh +150 -0
package/scripts/tests/test-setup-ralph-loop.sh +63 -0
package/scripts/tests/test-telegram-env.sh +38 -0
package/scripts/tests/test-telegram.sh +121 -0
package/scripts/tests/test-telemetry.sh +46 -0
package/scripts/tests/test-thompson-sampling.sh +139 -0
package/scripts/tests/test-validate-all.sh +60 -0
package/scripts/tests/test-validate-commands.sh +89 -0
package/scripts/tests/test-validate-hooks.sh +98 -0
package/scripts/tests/test-validate-lessons.sh +150 -0
package/scripts/tests/test-validate-plan-quality.sh +235 -0
package/scripts/tests/test-validate-plans.sh +187 -0
package/scripts/tests/test-validate-plugin.sh +106 -0
package/scripts/tests/test-validate-prd.sh +184 -0
package/scripts/tests/test-validate-skills.sh +134 -0
package/scripts/validate-all.sh +57 -0
package/scripts/validate-commands.sh +67 -0
package/scripts/validate-hooks.sh +89 -0
package/scripts/validate-lessons.sh +98 -0
package/scripts/validate-plan-quality.sh +369 -0
package/scripts/validate-plans.sh +120 -0
package/scripts/validate-plugin.sh +86 -0
package/scripts/validate-policies.sh +42 -0
package/scripts/validate-prd.sh +118 -0
package/scripts/validate-skills.sh +96 -0
package/skills/autocode/SKILL.md +285 -0
package/skills/autocode/ab-verification.md +51 -0
package/skills/autocode/code-quality-standards.md +37 -0
package/skills/autocode/competitive-mode.md +364 -0
package/skills/brainstorming/SKILL.md +97 -0
package/skills/capture-lesson/SKILL.md +187 -0
package/skills/check-lessons/SKILL.md +116 -0
package/skills/dispatching-parallel-agents/SKILL.md +110 -0
package/skills/executing-plans/SKILL.md +85 -0
package/skills/finishing-a-development-branch/SKILL.md +201 -0
package/skills/receiving-code-review/SKILL.md +72 -0
package/skills/requesting-code-review/SKILL.md +59 -0
package/skills/requesting-code-review/code-reviewer.md +82 -0
package/skills/research/SKILL.md +145 -0
package/skills/roadmap/SKILL.md +115 -0
package/skills/subagent-driven-development/SKILL.md +98 -0
package/skills/subagent-driven-development/code-quality-reviewer-prompt.md +18 -0
package/skills/subagent-driven-development/implementer-prompt.md +73 -0
package/skills/subagent-driven-development/spec-reviewer-prompt.md +57 -0
package/skills/systematic-debugging/SKILL.md +134 -0
package/skills/systematic-debugging/condition-based-waiting.md +64 -0
package/skills/systematic-debugging/defense-in-depth.md +32 -0
package/skills/systematic-debugging/root-cause-tracing.md +55 -0
package/skills/test-driven-development/SKILL.md +167 -0
package/skills/using-git-worktrees/SKILL.md +219 -0
package/skills/using-superpowers/SKILL.md +54 -0
package/skills/verification-before-completion/SKILL.md +140 -0
package/skills/verify/SKILL.md +82 -0
package/skills/writing-plans/SKILL.md +128 -0
package/skills/writing-skills/SKILL.md +93 -0

package/docs/plans/2026-02-22-research-lesson-transferability.md ADDED

@@ -0,0 +1,508 @@
+# Research: Lesson Transferability — Do Anti-Pattern Lessons Generalize Across Projects?
+**Date:** 2026-02-22
+**Researcher:** Claude Opus 4.6 (research agent)
+**Domain:** Cynefin complicated — well-studied in adjacent fields (static analysis, safety science), but novel in community lesson systems for AI coding agents
+**Confidence:** Medium overall — strong evidence from analogous domains, limited direct evidence for AI-agent-specific lesson systems
+---
+## Executive Summary
+Anti-pattern lessons transfer reliably within a well-defined scope boundary, but applying all lessons universally to all projects produces unacceptable false positive rates. The evidence from static analysis research (SonarQube, ESLint, Semgrep), cross-project defect prediction (a study of 622 cross-project predictions), and safety science (aviation ASRS, medical NRLS, Toyota A3) converges on one conclusion: **transferability is a function of abstraction level, not project similarity**.
+Lessons that encode universal programming invariants ("log before fallback," "close resources") transfer with near-zero false positives. Lessons that encode language-specific semantics ("async def without await") transfer within their language. Lessons that encode framework or tool-specific behavior ("hub.cache access patterns") do not transfer at all. The current lesson system has no scope metadata, which means every lesson runs against every project — a guaranteed source of noise as the library grows.
+**Recommendation:** Add a `scope` field to lesson YAML frontmatter with values `universal | language | framework | project-specific`. Filter lessons at scan time based on project language/framework detection. Target library size of 100-150 active lessons with aggressive retirement. Confidence: high.
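The scan-time filter in the recommendation above could look roughly like the following Python sketch. Only the `scope` field and its four values come from the recommendation; the `Lesson` dataclass, the `language`, `framework`, and `project` fields, and the `applicable_lessons` function are hypothetical names chosen for illustration, and the toolkit's real scanner may work differently.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set

# The four scope values proposed in the recommendation.
SCOPES = {"universal", "language", "framework", "project-specific"}


@dataclass(frozen=True)
class Lesson:
    """Hypothetical in-memory view of a lesson's frontmatter."""
    lesson_id: str
    scope: str
    language: Optional[str] = None    # assumed field, used when scope == "language"
    framework: Optional[str] = None   # assumed field, used when scope == "framework"
    project: Optional[str] = None     # assumed field, used when scope == "project-specific"


def applicable_lessons(
    lessons: Iterable[Lesson],
    project: str,
    languages: Set[str],
    frameworks: Set[str],
) -> List[Lesson]:
    """Keep only the lessons whose scope matches the scanned project."""
    selected = []
    for lesson in lessons:
        if lesson.scope not in SCOPES:
            raise ValueError(f"unknown scope {lesson.scope!r} in lesson {lesson.lesson_id}")
        if lesson.scope == "universal":
            keep = True
        elif lesson.scope == "language":
            keep = lesson.language in languages
        elif lesson.scope == "framework":
            keep = lesson.framework in frameworks
        else:  # project-specific
            keep = lesson.project == project
        if keep:
            selected.append(lesson)
    return selected


# Illustrative entries only; scope assignments follow the examples in this report.
LESSONS = [
    Lesson("0001", "universal"),
    Lesson("0002", "language", language="python"),
    Lesson("0047", "framework", framework="pytest"),
    Lesson("0007", "project-specific", project="autonomous-coding-toolkit"),
]

js_scan = applicable_lessons(
    LESSONS, project="some-web-app", languages={"javascript"}, frameworks={"react"}
)
print([lesson.lesson_id for lesson in js_scan])  # ['0001']
```

Raising on an unknown scope value, rather than silently skipping the lesson, keeps a typo in frontmatter from turning into a lesson that never runs.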
+---
+## 1. Do Anti-Pattern Lessons From One Project Actually Prevent Bugs in Another?
+### Findings
+**Yes, but with significant caveats.** The cross-project defect prediction (CPDP) literature provides the strongest evidence. Zimmermann, Nagappan et al. (2009) ran 622 cross-project predictions across 12 real-world applications and found that cross-project prediction "is a serious challenge" — simply using models from projects in the same domain or with the same process does not guarantee accurate predictions. However, when the projects share structural and metric similarity, transfer works. More recent work (Tao 2024, using LSTM networks; TriStage-CPDP 2025, using CodeT5+) shows that deep learning can extract "project-invariant features" that improve transfer — but these are statistical features, not the kind of discrete rules in a lesson system.
+The more relevant evidence comes from the static analysis ecosystem. Tools like ESLint, Pylint, and SonarQube have been applying cross-project rules for decades. Their experience shows:
+- **Universal rules work universally.** SonarQube's "Sonar way" default profiles activate rules "that should be applicable to most projects" (SonarQube docs). These are the equivalent of our "bare except" lesson — language-level invariants.
+- **Framework-specific rules produce false positives outside their framework.** This is why Semgrep uses `<language>/<framework>/<category>` namespacing — a React rule applied to a Django project is pure noise.
+- **Unconfigured deployments average 67% false positive rates.** The 2024 GitLab Security Report measured SAST tools (including Semgrep, CodeQL, SonarQube CE) and found that out-of-the-box configurations "overwhelm developers with false positives."
+The lesson system is analogous to a static analysis tool with no scoping mechanism. Every lesson runs everywhere. This works at 10 lessons but will not work at 100.
+### Evidence Quality
+- Zimmermann et al. 2009: Large-scale empirical study, widely cited (2000+ citations). **High confidence.**
+- GitLab Security Report 2024: Industry report, methodology unclear. **Medium confidence.**
+- SonarQube/ESLint ecosystem behavior: Observable, well-documented. **High confidence.**
+### Implications for the Toolkit
+The current 61-lesson library is already at the point where scope matters. Of the 61 lessons, roughly 15 are Python-specific (async traps, venv/pip issues), 5 are JavaScript-specific (JSX factory, prop names), 8 are bash/shell-specific (local scope, set -e, grep -c), and the rest are universal or integration-level. Running Python-specific lessons against a JavaScript project is wasted work and false positive risk.
+---
+## 2. What's the Generalizability Boundary?
+### Findings
+Anti-patterns exist on a spectrum of abstraction, and transferability maps directly to that spectrum:
+| Scope Level | Transfers To | False Positive Risk | Examples from Our Lessons |
+|---|---|---|---|
+| **Universal** | All projects in all languages | Near zero | 0001 (bare except → bare catch), 0018 (each layer passes its test), 0020 (persist state before expensive work), 0029 (no secrets in code) |
+| **Language** | All projects in that language | Low | 0002 (async def without await — Python), 0010 (local outside function — bash), 0022 (JSX factory shadowed — JavaScript) |
+| **Framework/Tool** | Projects using that framework | Medium | 0006 (.venv/bin/pip — Python + pip), 0044 (relative file: deps — npm workspaces), 0047 (pytest xdist) |
+| **Domain** | Projects in the same domain | High if misapplied | 0016 (event-driven cold start — only relevant to event-driven systems), 0037 (parallel agents sharing worktree — only relevant to multi-agent systems) |
+| **Project-specific** | Only the originating project | Very high | 0007 (runner state file rejected by own git-clean check), 0009 (plan parser over-count), 0051 (infrastructure fixes can't benefit their own run) |
+The Wikipedia list of software anti-patterns confirms this hierarchy: anti-patterns are classified into software development (universal), architecture (domain), programming (language-specific), and methodological (process-specific) categories.
+### Evidence
+The Semgrep registry provides the clearest empirical model. Their rule namespace `<language>/<framework>/<category>/$MORE` explicitly encodes this hierarchy. When Semgrep scans a repository, it "identifies the languages used in your repositories and only runs rules applicable to those languages." This is exactly the filtering mechanism our lesson system lacks.
+SonarQube's quality profiles reinforce this: "Every project has a quality profile set for each supported language." Rules are never applied cross-language. The built-in "Sonar way" profile activates a curated subset — not all rules.
+### Implications
+The lesson system needs scope metadata. Without it, the lesson library cannot scale past ~80-100 lessons without producing noise. The Semgrep model (`language/framework/category`) is the right template.
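As a rough illustration of how a `language/framework/category` namespace doubles as a filter, the sketch below parses such paths and keeps only the rules whose language was detected in a repository. The rule paths, the extension-to-language table, and the function names are all made up for this example; this is not Semgrep's implementation or its actual rule identifiers.

```python
from pathlib import PurePosixPath
from typing import Iterable, List, NamedTuple, Set


class RuleScope(NamedTuple):
    language: str
    framework: str
    category: str


def parse_namespace(rule_path: str) -> RuleScope:
    """Split '<language>/<framework>/<category>/...' into its first three parts."""
    parts = PurePosixPath(rule_path).parts
    if len(parts) < 3:
        raise ValueError(f"expected language/framework/category, got {rule_path!r}")
    return RuleScope(*parts[:3])


# Assumed mapping for illustration; a real detector would cover far more cases.
EXTENSION_LANGUAGES = {
    ".py": "python",
    ".js": "javascript",
    ".jsx": "javascript",
    ".sh": "bash",
}


def detect_languages(filenames: Iterable[str]) -> Set[str]:
    """Infer the languages present in a repository from file extensions."""
    suffixes = (PurePosixPath(name).suffix for name in filenames)
    return {EXTENSION_LANGUAGES[s] for s in suffixes if s in EXTENSION_LANGUAGES}


def rules_for_repo(rule_paths: Iterable[str], filenames: Iterable[str]) -> List[str]:
    """Return only the rules whose language namespace matches the repository."""
    languages = detect_languages(filenames)
    return [p for p in rule_paths if parse_namespace(p).language in languages]


# Hypothetical rule paths and repository contents.
rules = [
    "python/django/security/sql-injection",
    "javascript/react/correctness/silent-prop-drop",
    "bash/generic/correctness/local-outside-function",
]
repo_files = ["src/App.jsx", "scripts/build.sh", "README.md"]
print(rules_for_repo(rules, repo_files))
```

For this repository the Python rule is dropped before any matching runs, which is the behavior the lesson system currently lacks.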
+---
+## 3. What Does the Static Analysis Literature Say About Rule Transferability?
+### Findings
+The static analysis ecosystem has converged on several principles over two decades:
+**1. Rules are scoped to language, always.** No tool applies Python rules to JavaScript. This is foundational — not even debated. SonarQube uses per-language quality profiles. ESLint only applies to JavaScript/TypeScript. Pylint only applies to Python. Semgrep runs only rules matching detected languages.
+**2. Default rule sets are intentionally conservative.** SonarQube's "Sonar way" activates a subset of available rules. ESLint's recommended config enables ~50 of 200+ available rules. The principle: start with high-confidence universal rules, let users opt into more specific ones. This is directly applicable to our lesson system — not every lesson should be active by default.
+**3. Shared configs are the primary transfer mechanism.** In the ESLint ecosystem, `eslint-config-airbnb` (4M+ weekly downloads) and `standard` (545K+ weekly downloads) represent community consensus on which rules apply broadly. These configs are curated — someone decided which rules transfer and which don't. Our lesson system has no curation layer.
+**4. False positive rates are the primary adoption barrier.** SonarQube empirical studies show 18% precision (69/384 sample) without configuration. DeepSource targets <5% false positive rate through a multi-stage relevance engine. The industry consensus is that unconfigured analysis averaging 67% false positive rates is worse than no analysis — it erodes trust and causes developer fatigue.
+**5. AI-powered triage is the emerging solution.** Datadog's Bits AI and Semgrep's AI noise filtering (announced 2025) use LLMs to classify findings as true/false positives. This is directly relevant — our lesson-scanner agent could apply the same approach, using the AI agent's understanding of project context to suppress irrelevant lessons.
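The precision figure quoted in principle 4 can be checked with a few lines of arithmetic. The sketch assumes that "69/384" means 69 confirmed true positives out of 384 sampled findings; the function name is hypothetical.

```python
def precision(true_positives: int, total_findings: int) -> float:
    """Fraction of reported findings that are real issues."""
    if total_findings <= 0:
        raise ValueError("total_findings must be positive")
    if not 0 <= true_positives <= total_findings:
        raise ValueError("true_positives must be between 0 and total_findings")
    return true_positives / total_findings


# Assumed reading of the sample: 69 true positives among 384 findings.
p = precision(69, 384)
print(f"precision: {p:.1%}")                         # precision: 18.0%
print(f"findings that are not real issues: {1 - p:.1%}")  # 82.0%
```

Under that reading, roughly four of every five findings in the sample were noise, which is the scale of problem that drives the trust erosion described above.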
+### Evidence Quality
+- ESLint/SonarQube/Semgrep documentation: Primary sources. **High confidence.**
+- GitLab Security Report 2024 false positive rates: Industry measurement. **Medium confidence.**
+- DeepSource <5% target: Self-reported, methodology published. **Medium confidence.**
+- Datadog/Semgrep AI triage: Early stage, limited public evaluation data. **Low-medium confidence.**
+---
+## 4. Are There Lesson Categories That Transfer Well vs. Poorly?
+### Findings
+Mapping our six categories against the transferability spectrum:
+| Category | Transfer Potential | Reasoning |
+|---|---|---|
+| **silent-failures** | **High** (mostly universal) | "Log before fallback" is language-agnostic. Bare except (Python), empty catch (JS/Java/Go), `\|\| true` (bash): same concept, different syntax. 18 of 21 silent-failure lessons transfer with syntax adaptation. |
+| **integration-boundaries** | **Medium** (domain-dependent) | "Verify at boundaries" is universal, but specific boundary patterns (worktree corruption, systemd env files, JSX prop names) are domain/tool-specific. 10 of 27 integration lessons are truly universal; 17 are context-dependent. |
+| **async-traps** | **Medium** (language-dependent) | Async anti-patterns transfer within the async programming model (Python asyncio, JavaScript Promises/async-await, C# async/await, Rust async). They do NOT transfer to synchronous-only projects. 3 of 3 async lessons are Python-specific in syntax but concept-transferable. |
+| **test-anti-patterns** | **High** (mostly universal) | "Don't hardcode counts in assertions" applies to any test framework in any language. 5 of 6 test lessons transfer universally. |
+| **resource-lifecycle** | **High** (universal concept, language-specific syntax) | "Close what you open" applies everywhere. Specific mechanisms (Python context managers, Java try-with-resources, Go defer) vary by language. 3 of 3 lessons transfer conceptually. |
+| **performance** | **Medium** (context-dependent) | "Filter before processing" is universal. "Use pytest-xdist" is tool-specific. 1 of 2 performance lessons transfers universally. |
+**Key finding:** The categories that transfer best encode **invariants of correctness** (silent failures, test anti-patterns, resource lifecycle). The categories that transfer worst encode **operational specifics** (integration boundaries, domain-specific async patterns).
+This maps to the Semgrep finding that their most widely-used community rules are generic security checks (SQL injection, XSS, hardcoded secrets) rather than framework-specific patterns.
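To make the table easier to compare at a glance, the per-category counts can be turned into transfer ratios. The numbers below are copied from the table; note that they mix syntactic and conceptual transfer (the async and resource-lifecycle lessons are counted as transferring "conceptually"), so the ranking is only a rough guide. The variable and function names are hypothetical.

```python
from fractions import Fraction
from typing import Dict, List, Tuple

# (lessons that transfer, lessons in category), as stated in the table above.
TRANSFER_COUNTS: Dict[str, Tuple[int, int]] = {
    "silent-failures": (18, 21),
    "integration-boundaries": (10, 27),
    "async-traps": (3, 3),
    "test-anti-patterns": (5, 6),
    "resource-lifecycle": (3, 3),
    "performance": (1, 2),
}


def transfer_ratios(counts: Dict[str, Tuple[int, int]]) -> List[Tuple[str, Fraction]]:
    """Return categories sorted from most to least transferable."""
    ratios = {name: Fraction(ok, total) for name, (ok, total) in counts.items()}
    return sorted(ratios.items(), key=lambda item: item[1], reverse=True)


ranked = transfer_ratios(TRANSFER_COUNTS)
for name, ratio in ranked:
    print(f"{name:24s} {float(ratio):.0%}")
```

On these counts, integration-boundaries sits well below every other category, consistent with the key finding that operational specifics transfer worst.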
+---
+## 5. What's the False Positive Cost of Applying Project-Specific Lessons to Unrelated Projects?
+### Findings
+The cost is higher than it appears because false positives compound in three ways:
+**1. Direct noise cost.** Each false positive requires a developer (or AI agent) to read, evaluate, and dismiss the finding. At 61 lessons, with perhaps 10 project-specific ones, the noise is manageable. At 200 lessons with 60 project-specific ones, every scan produces dozens of irrelevant findings.
+**2. Trust erosion.** The Parasoft blog on static analysis false positives states it clearly: "too much noise kills adoption." SonarQube community forums document cases of "hundreds of obvious false positives" leading teams to disable scanning entirely. The same dynamic applies to our lesson system — if users see irrelevant warnings repeatedly, they stop reading any warnings.
+**3. Alert fatigue leading to missed true positives.** This is the most dangerous cost. The medical safety literature documents this extensively: the NHS NRLS receives over 2 million reports per year, and the primary challenge is ensuring that signal isn't lost in noise. In our context: if 30% of lesson warnings are false positives, users develop a habit of dismissing warnings — including the one true positive that would have prevented a production bug.
+**Quantifying the cost:** If unconfigured SAST tools average 67% false positive rates, and our lesson system has no scoping, we can expect a similar trajectory as the library grows. At 100 lessons with no scope filtering, and assuming the current counts of ~15 Python-specific and ~8 bash-specific lessons still hold, a JavaScript project would receive warnings from 23 out-of-language lessons: roughly 23% noise before even considering domain-specific false positives.
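The noise estimates in this section are simple ratios of out-of-scope lessons to library size. The sketch below reproduces them from the counts given in the text; the function name and scenario labels are hypothetical, and the inputs are the report's own rough estimates rather than measurements.

```python
def noise_share(out_of_scope_lessons: int, library_size: int) -> float:
    """Fraction of the library that cannot apply to the scanned project."""
    if library_size <= 0:
        raise ValueError("library_size must be positive")
    if not 0 <= out_of_scope_lessons <= library_size:
        raise ValueError("out_of_scope_lessons must be between 0 and library_size")
    return out_of_scope_lessons / library_size


# Scenarios taken from the figures quoted in this section.
scenarios = {
    "61 lessons, 10 project-specific": noise_share(10, 61),
    "200 lessons, 60 project-specific": noise_share(60, 200),
    "100 lessons, 15 Python + 8 bash on a JS project": noise_share(15 + 8, 100),
}

for label, share in scenarios.items():
    print(f"{label}: {share:.1%}")
```

These ratios count lessons that are out of scope, not warnings actually emitted, so they are an upper-level proxy for noise rather than a measured false positive rate.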
+### Evidence
+- GitLab Security Report 2024: 67% average FP rate for unconfigured SAST. **Medium confidence.**
+- SonarQube community forums: Documented user complaints about "hundreds of obvious false positives." **High confidence** (primary source).
+- NHS NRLS: 2M+ reports/year, signal-in-noise is the central challenge. **High confidence.**
+- Quantitative estimate for our system (23% noise at 100 lessons): **Low confidence** (extrapolation, not measurement).
+---
+## 6. How Do Other Community-Driven Quality Systems Handle Scope?
+### Findings
+Eight systems, eight approaches:
+**ESLint Shared Configs** — Community-driven scope via npm packages. `eslint-config-airbnb` encodes Airbnb's opinion on which rules apply to their JavaScript projects. Users explicitly opt in by installing the package. Scope is implicit (JavaScript-only because ESLint is JavaScript-only) and explicit (curated rule sets). The flat config system introduced namespace challenges — "the ecosystem needs to decide how it solves the problem of plugin namespacing." Lesson: explicit scope metadata prevents namespace collisions as libraries grow.
+**Semgrep Registry** — `<language>/<framework>/<category>` namespacing. Technology metadata tags (e.g., `express`, `django`) link rules to frameworks. Language auto-detection at scan time filters irrelevant rules. Rulesets group rules by programming language, OWASP category, or framework. Lesson: the namespace hierarchy IS the scope mechanism.
+**SonarQube Quality Profiles** — Per-language profiles with inheritance. Built-in "Sonar way" as conservative default. Organizations extend profiles with project-specific rules. Lesson: a default conservative profile plus opt-in extensions is the right activation model.
+**CodeClimate** — File-path-based filtering via `Filters` tool. Excludes `config/`, `test/`, `vendor/` by default. Per-project filter definitions for monorepos. Lesson: path-based filtering catches project-structure-specific noise.
+**DeepSource** — Multi-stage relevance engine. AST analysis → processor pipeline → relevance engine → confidence scoring. Targets <5% false positive rate. Each issue gets a "dynamic weight." Lesson: post-detection relevance scoring can reduce noise without removing rules.
+**Aviation ASRS** — NASA's voluntary safety reporting. Reports are de-identified and published in CALLBACK monthly bulletin with "supporting commentary." The key insight: raw reports are not directly actionable — they require expert curation and contextualization before becoming "lessons." The ASRS model has been adopted by the UK (CHIRP), Canada, Australia, Japan, and cross-domain (NFIRS for fire, NRLS for healthcare). Lesson: curation transforms reports into transferable knowledge.
+**NHS NRLS** — 2M+ reports/year. Reports are classified by type, severity, and clinical area. National-level analysis produces "rapid response reports, patient safety alerts, and safer practice notices" — curated outputs from raw data. Lesson: the volume of raw incident data must be distilled into actionable alerts, not applied wholesale.
+**Toyota A3** — One-page problem-solution format. A3s are stored in a searchable database so "you never solve the same problem twice." The format forces root cause analysis (5 Whys), proposed countermeasures, and follow-up validation. Lesson: structured format (which our lesson system already has) enables searchability and reuse.
+### Synthesis
+Every successful system employs at least one of three mechanisms:
+1. **Scope metadata** (Semgrep, SonarQube) — rules tagged with their applicability
+2. **Curation** (ASRS, NRLS, ESLint configs) — expert review before broad distribution
+3. **Relevance filtering** (DeepSource, Datadog) — post-detection scoring to suppress noise
+Our lesson system does not yet fully implement any of these. Adding scope metadata (mechanism 1) is the highest-leverage change. The maintainer review process in CONTRIBUTING.md partially provides mechanism 2 (curation) but does not enforce scope classification. Mechanism 3 (relevance filtering) could be added to the lesson-scanner agent.
+---
+## 7. Should Lessons Have Scope Metadata?
+### Findings
+**Yes. Unequivocally.** Every analogous system that has scaled past ~50 rules uses scope metadata.
+Proposed schema addition to lesson YAML frontmatter:
+```yaml
+scope:
+  level: universal | language | framework | domain | project-specific
+  languages: [python]              # Required unless level is universal
+  frameworks: [asyncio, pytest]    # Optional, for framework-level lessons
+  domains: [event-driven, multi-agent]  # Optional, for domain-level lessons
+  project: ""                      # Required if level is project-specific
+```
+**Filtering logic at scan time:**
+1. `universal` — always active
+2. `language` — active if project contains files in the specified language(s)
+3. `framework` — active if project's dependency manifest includes the specified framework(s)
+4. `domain` — active if project's CLAUDE.md or config declares the domain
+5. `project-specific` — active only in the originating project (or explicitly opted-in)
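Step 2 depends on knowing which languages a project contains. The following is a minimal sketch of extension-based detection, not the toolkit's implementation: `EXTENSION_TO_LANGUAGE`, `SKIP_DIRS`, and `detect_languages` are hypothetical names, and the extension map is deliberately incomplete.

```python
from pathlib import Path

# Hypothetical extension map; extend it to cover the languages your lessons target.
EXTENSION_TO_LANGUAGE = {
    ".py": "python",
    ".js": "javascript",
    ".ts": "typescript",
    ".sh": "bash",
}

# Assumption: these directories hold third-party or generated code and are skipped.
SKIP_DIRS = {".git", "node_modules", ".venv", "vendor"}


def detect_languages(root):
    """Return the set of languages found under `root`, judged by file extension."""
    languages = set()
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        # Ignore anything that lives below a skipped directory.
        if SKIP_DIRS.intersection(path.relative_to(root).parts):
            continue
        language = EXTENSION_TO_LANGUAGE.get(path.suffix.lower())
        if language:
            languages.add(language)
    return languages
```

A language-level lesson would then be active when its `scope.languages` shares at least one entry with the returned set.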
+**Classification of current 61 lessons by proposed scope:**
+| Scope Level | Count | Examples |
+|---|---|---|
+| Universal | ~25 | 0001 (bare except — concept is universal even if regex is Python), 0018, 0020, 0029 |
+| Language (Python) | ~15 | 0002, 0003, 0005, 0033, 0034 |
+| Language (JavaScript) | ~5 | 0022, 0027, 0044 |
+| Language (Bash) | ~8 | 0010, 0013, 0019, 0053, 0056, 0060 |
+| Framework/Tool | ~4 | 0006 (pip), 0047 (pytest-xdist) |
+| Domain | ~2 | 0037 (multi-agent worktree), 0016 (event-driven cold start) |
+| Project-specific | ~2 | 0007 (runner state file), 0009 (plan parser over-count) |
+**Implementation cost:** Low. Adding a YAML field to existing lessons is a one-time effort. Filtering logic in `lesson-check.sh` requires reading the project's language from file extensions or a config file — perhaps 20 lines of bash. The lesson-scanner agent already has project context and can filter semantically.
+### Confidence: High
+Every analogous system does this. The only open question is which syntax to use, not whether to add scope metadata.
+---
+## 8. What's the Optimal Lesson Library Size?
+### Findings
+The static analysis literature and industry practice converge on a principle: **focused sets outperform exhaustive sets.**
+**Evidence for diminishing returns:**
+- Parasoft (static analysis vendor): "Checking a lot of rules is not the secret to achieving the best ROI with static analysis. In fact, in many cases, the reverse is true."
+- SonarQube's "Sonar way" activates a curated subset, not all available rules. The full rule set for Java alone exceeds 600 rules; "Sonar way" activates approximately 350.
+- ESLint's `recommended` config enables ~50 of 200+ rules. Airbnb's config enables ~250, but these are heavily curated for a specific use case.
+- DeepSource's approach: fewer rules, but <5% false positive rate per rule. Quality over quantity.
+**The noise accumulation curve:**
+```
+Rules  | True Positives | False Positives | Signal-to-Noise
+-------|----------------|-----------------|----------------
+  20   |   High/rule    |   Very low      |   Excellent
+  50   |   Medium/rule  |   Low           |   Good
+ 100   |   Low/rule     |   Medium        |   Acceptable
+ 200   |   Very low/rule|   High          |   Poor
+ 500   |   Negligible   |   Very high     |   Unusable
+```
+Each new lesson has diminishing marginal value (the most common anti-patterns are caught early) and increasing marginal cost (more rules = more false positives = more noise). The crossover point — where adding a rule produces more noise than signal — depends on scope filtering:
+- **Without scope filtering:** Crossover at ~80-100 lessons (our current trajectory)
+- **With language-level filtering:** Crossover at ~150-200 lessons per language
+- **With framework-level filtering:** Crossover at ~250-300 lessons per framework
+**Recommendation:** Target 100-150 active lessons with scope filtering. Institute a retirement policy: lessons with zero true positive matches across 100+ scans should be archived (not deleted — moved to an `archived/` subdirectory). Review annually.
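The retirement rule can be written as a small predicate. This is a sketch under the assumption that per-lesson scan and true-positive counters are already collected somewhere; `LessonStats`, its fields, and `should_archive` are hypothetical names, and the threshold of 100 scans is taken from the recommendation above.

```python
from dataclasses import dataclass

MIN_SCANS_BEFORE_RETIREMENT = 100  # the "100+ scans" from the recommendation


@dataclass
class LessonStats:
    lesson_id: str
    scans: int           # scans in which the lesson was active
    true_positives: int  # matches confirmed as real problems


def should_archive(stats, min_scans=MIN_SCANS_BEFORE_RETIREMENT):
    """Archive only lessons that have had enough scans and no confirmed true positive."""
    return stats.scans >= min_scans and stats.true_positives == 0
```

Requiring a minimum number of scans keeps newly added lessons from being archived before they have had a chance to match anything.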
+### Confidence: Medium
+The principle is well-established (high confidence), but the specific numbers are extrapolations from adjacent domains (medium confidence). Empirical measurement of false positive rates for our specific lesson system would increase confidence.
+---
+## 9. How Do Lesson Systems in Other Domains Work?
+### Findings
+Three domains with mature lesson systems:
+**Aviation (ASRS)**
+- **Structure:** Voluntary, confidential, de-identified. NASA operates (neutral third party). No enforcement authority.
+- **Volume:** 1.7M+ reports since 1976. Monthly CALLBACK bulletin with curated excerpts.
+- **Transfer mechanism:** Expert analysis → categorized alerts → industry-wide distribution. Raw reports are never applied directly — they are distilled.
+- **Success evidence:** "A proven and effective way to fill in the gaps left by accident investigations" (FAA Safety). Model adopted by 6+ countries and 3+ other industries.
+- **Key insight for us:** Raw incident reports (our lesson files) need a curation and distillation layer to transfer effectively. The ASRS doesn't say "here are 1.7M reports, read all of them." It says "here are this month's 6 most important patterns."
+**Medical Safety (NHS NRLS)**
+- **Structure:** Mandatory reporting for serious incidents, voluntary for near-misses.
+- **Volume:** 2M+ reports/year. World's largest patient safety database.
+- **Transfer mechanism:** National analysis → rapid response reports → patient safety alerts → local action plans. Classification by incident type, severity, clinical area, and contributing factors.
+- **Success evidence:** "Findings and learnings shared across the organization, leading to redesigning policies, improving processes."
+- **Key insight for us:** Classification metadata (type, severity, area) is essential for making large databases searchable and actionable. Our lesson system has severity and category but lacks scope.
+**Manufacturing (Toyota A3)**
+- **Structure:** One-page structured problem-solution format. Searchable database.
+- **Volume:** Organization-wide, accumulated over decades.
+- **Transfer mechanism:** "You never solve the same problem twice" — A3s are indexed and searchable. Managers use A3s to mentor root-cause thinking.
+- **Success evidence:** MIT Sloan Management Review describes A3 as "the key tactic in sharing a deeper method of thinking that lies at the heart of Toyota's sustained success."
+- **Key insight for us:** Our lesson format (YAML frontmatter + observation/insight/lesson) already follows the A3 structure. The missing piece is the searchable database — currently lessons are flat files discovered by grep.
+### Cross-Domain Synthesis
+All three systems share four properties:
+1. **Structured capture** — standardized format for raw incidents (we have this)
+2. **Expert curation** — human review before broad distribution (we have this via PR review)
+3. **Scope classification** — metadata for filtering relevance (we lack this)
+4. **Distilled outputs** — curated summaries for different audiences (we partially have this via SUMMARY.md)
+The aviation and medical systems also share a critical property we lack: **tiered distribution**. Not every report goes to every practitioner. Alerts are routed based on relevance (clinical area, aircraft type, role). Our lesson system applies every lesson to every project — the equivalent of sending every patient safety alert to every doctor regardless of specialty.
+---
+## 10. What Filtering/Relevance Mechanisms Could Reduce False Positives Without Losing Coverage?
+### Findings
+Five mechanisms, each rated by expected impact and implementation cost:
+**Mechanism 1: Scope metadata filtering (High impact, Low cost)**
+Add `scope.level` and `scope.languages` to lesson YAML. At scan time, detect project language(s) from file extensions and filter lessons accordingly. This eliminates the most obvious noise — Python lessons in JavaScript projects — with minimal implementation effort.
+Expected noise reduction: 30-40% of current false positives.
+**Mechanism 2: Framework detection (Medium impact, Medium cost)**
+Read `requirements.txt`, `package.json`, `Cargo.toml`, etc. to detect frameworks. Filter framework-scoped lessons based on actual dependencies. More implementation effort (must parse multiple manifest formats) but eliminates framework-specific noise.
+Expected noise reduction: 10-15% additional.
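A minimal sketch of manifest-based detection, covering only `requirements.txt` and `package.json`. The helper names are hypothetical and the `requirements.txt` parsing is intentionally simplified (it ignores option lines and does not handle URL or VCS requirements); `Cargo.toml` and other manifests are left out.

```python
import json
import re
from pathlib import Path


def _requirements_names(text):
    """Extract package names from requirements.txt content (simplified parsing)."""
    names = set()
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if not line or line.startswith("-"):  # skip blanks and options such as -r / -e
            continue
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.add(match.group(0).lower())
    return names


def detect_dependencies(root):
    """Return the dependency names declared in the project's manifests."""
    root = Path(root)
    deps = set()
    requirements = root / "requirements.txt"
    if requirements.is_file():
        deps |= _requirements_names(requirements.read_text(encoding="utf-8"))
    package_json = root / "package.json"
    if package_json.is_file():
        manifest = json.loads(package_json.read_text(encoding="utf-8"))
        for key in ("dependencies", "devDependencies"):
            deps |= {name.lower() for name in manifest.get(key, {})}
    return deps
```

A framework-scoped lesson would be activated when one of its `scope.frameworks` entries appears in the returned set.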
+**Mechanism 3: Confidence scoring on matches (Medium impact, Medium cost)**
+For syntactic lessons, score matches by context. A bare `except:` in a test helper is less critical than in production code. For semantic lessons, the lesson-scanner agent already has context — add a confidence field to its output. This follows the DeepSource model of post-detection relevance scoring.
+Expected noise reduction: 15-20% additional (primarily for syntactic lessons in test/example code).
+**Mechanism 4: AI-powered triage (High impact, High cost)**
+Following Datadog's Bits AI model, use the lesson-scanner agent to evaluate whether a syntactic match is actually problematic in context. The agent reads the surrounding code, understands the project's patterns, and suppresses findings that are technically matches but practically benign. This is the most powerful mechanism but requires significant agent compute.
+Expected noise reduction: 20-30% additional, but at high compute cost per scan.
+**Mechanism 5: Community feedback loop (Medium impact, Low ongoing cost)**
+Track true/false positive rates per lesson across the community. Lessons with >20% false positive rate get flagged for review. Lessons with >50% false positive rate get automatically demoted from `blocker` to `nice-to-have`. This follows the DeepSource model of "static issue filtering based on conventions and user feedback."
+Expected noise reduction: Compounds over time. 5% in year 1, 15% by year 2.
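The two thresholds in Mechanism 5 can be sketched as follows. This assumes that confirmed true and false positive counts per lesson are available from some telemetry source, which does not exist yet; `triage_lesson` and the constant names are hypothetical.

```python
REVIEW_THRESHOLD = 0.20  # above 20% false positives: flag for review
DEMOTE_THRESHOLD = 0.50  # above 50% false positives: demote blocker lessons


def triage_lesson(severity, true_positives, false_positives):
    """Return (new_severity, needs_review) from community-reported match counts."""
    total = true_positives + false_positives
    if total == 0:
        return severity, False  # no feedback yet, leave the lesson untouched
    fp_rate = false_positives / total
    needs_review = fp_rate > REVIEW_THRESHOLD
    if fp_rate > DEMOTE_THRESHOLD and severity == "blocker":
        severity = "nice-to-have"
    return severity, needs_review
```

For example, a `blocker` lesson with 2 true and 8 false positives would come back as `("nice-to-have", True)`.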
+### Recommended Implementation Order
+1. Scope metadata (immediate — one PR, low risk, high impact)
+2. Framework detection (next quarter — moderate effort, good ROI)
+3. Confidence scoring (same quarter — extends existing lesson-scanner)
+4. Community feedback loop (ongoing — requires usage telemetry infrastructure)
+5. AI-powered triage (future — high cost, diminishing marginal returns if 1-4 are done)
+---
+## Transferability Framework for the Lesson System
+Based on all findings, here is the proposed framework:
+### Lesson Scope Taxonomy
+```
+┌─────────────────────────────────────────────────┐
+│ UNIVERSAL                                       │
+│ "Log before fallback" — applies to all code     │
+│ Transfer: unconditional                         │
+│ False positive risk: near zero                  │
+├─────────────────────────────────────────────────┤
+│ LANGUAGE                                        │
+│ "async def without await" — Python only         │
+│ Transfer: within language boundary              │
+│ Filter: file extension / language detection     │
+├─────────────────────────────────────────────────┤
+│ FRAMEWORK                                       │
+│ ".venv/bin/pip installs wrong" — pip-specific   │
+│ Transfer: within framework users                │
+│ Filter: dependency manifest detection           │
+├─────────────────────────────────────────────────┤
+│ DOMAIN                                          │
+│ "Seed state on event-driven startup" — EDA only │
+│ Transfer: within architectural pattern          │
+│ Filter: project config / CLAUDE.md declaration  │
+├─────────────────────────────────────────────────┤
+│ PROJECT-SPECIFIC                                │
+│ "Runner state file rejected by git-clean"       │
+│ Transfer: originating project only              │
+│ Filter: project name match                      │
+│ Default: inactive in community library          │
+└─────────────────────────────────────────────────┘
+```
+### YAML Schema Extension
+```yaml
+# Proposed addition to lesson frontmatter
+scope:
+  level: universal          # universal | language | framework | domain | project-specific
+  languages: [python]       # Required unless level = universal
+  frameworks: []            # Optional, for framework-level lessons
+  domains: []               # Optional, for domain-level lessons
+  project: ""               # Required if level = project-specific
+```
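The "required" rules in the schema comments could be enforced during PR review or in CI. The following is a sketch that assumes the frontmatter has already been parsed into a plain dictionary; `validate_scope` and `VALID_LEVELS` are hypothetical names.

```python
VALID_LEVELS = {"universal", "language", "framework", "domain", "project-specific"}


def validate_scope(scope):
    """Return a list of problems with a lesson's `scope` mapping (empty if valid)."""
    level = scope.get("level")
    if level not in VALID_LEVELS:
        return [f"unknown scope level: {level!r}"]
    errors = []
    # Schema comment: languages is required unless level = universal.
    if level != "universal" and not scope.get("languages"):
        errors.append("languages is required unless level is universal")
    # Schema comment: project is required if level = project-specific.
    if level == "project-specific" and not scope.get("project"):
        errors.append("project is required when level is project-specific")
    return errors
```

Returning a list of messages rather than raising on the first problem lets a reviewer see every issue with a lesson in one pass.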
+### Filtering Algorithm
+```
+function should_run_lesson(lesson, project):
+  if lesson.scope.level == "universal":
+    return true
+  if lesson.scope.level == "language":
+    return project.languages ∩ lesson.scope.languages ≠ ∅
+  if lesson.scope.level == "framework":
+    return project.dependencies ∩ lesson.scope.frameworks ≠ ∅
+  if lesson.scope.level == "domain":
+    return project.domains ∩ lesson.scope.domains ≠ ∅
+  if lesson.scope.level == "project-specific":
+    return project.name == lesson.scope.project
+  return false  # unknown scope level: skip the lesson
+```
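The pseudocode above translates into runnable Python roughly as follows. This is a sketch, not the toolkit's implementation: it assumes the lesson scope and the project description are plain dictionaries, and `LEVEL_TO_FIELDS` is a hypothetical helper table.

```python
# For each set-based scope level: (key in the lesson scope, key in the project).
LEVEL_TO_FIELDS = {
    "language": ("languages", "languages"),
    "framework": ("frameworks", "dependencies"),
    "domain": ("domains", "domains"),
}


def should_run_lesson(scope, project):
    """Decide whether a lesson applies to a project, based on its scope level."""
    level = scope["level"]
    if level == "universal":
        return True
    if level == "project-specific":
        # An empty or missing project name never matches.
        return bool(scope.get("project")) and project.get("name") == scope.get("project")
    if level in LEVEL_TO_FIELDS:
        lesson_key, project_key = LEVEL_TO_FIELDS[level]
        # Active when the two sets share at least one entry.
        return bool(set(scope.get(lesson_key, [])) & set(project.get(project_key, [])))
    return False  # unknown level: skip the lesson rather than guess
```

Treating an unknown level as inactive is a design choice made here for the sketch; failing loudly instead would surface malformed lessons earlier.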
+### Activation Policy
+| Scope Level | Default State | Activation |
+|---|---|---|
+| Universal | Active for all | Cannot be disabled |
+| Language | Active if language detected | Auto-detected from file extensions |
+| Framework | Inactive by default | Activated by dependency detection or user opt-in |
+| Domain | Inactive by default | Activated by project config declaration |
+| Project-specific | Inactive | Only active in originating project |
+### Library Growth Policy
+- **Target:** 100-150 active lessons with scope filtering
+- **Retirement:** Archive lessons with zero true positive matches across 100+ scans
+- **Review cadence:** Quarterly review of false positive rates per lesson
+- **Quality bar:** New lessons must specify scope level; PR review verifies scope accuracy
+- **Universal lessons cap:** No more than 40 universal lessons (diminishing returns)
+---
+## Recommendations
+### Immediate (This Sprint)
+1. **Add `scope` field to TEMPLATE.md and CONTRIBUTING.md** — require scope for all new lessons
+2. **Backfill scope on existing 61 lessons** — classify each lesson by scope level (estimated effort: 1-2 hours)
+3. **Add language filtering to `lesson-check.sh`** — detect project language(s) from file extensions, skip lessons with non-matching `scope.languages` (estimated effort: 30 minutes)
+### Next Quarter
+4. **Add framework detection** — parse `requirements.txt`, `package.json`, `Cargo.toml` for framework-level filtering
+5. **Add confidence scoring to lesson-scanner output** — each finding gets a confidence level based on context
+6. **Document scope taxonomy** in ARCHITECTURE.md
+### Future
+7. **Community false positive tracking** — aggregate match data to identify noisy lessons
+8. **AI-powered triage** — use lesson-scanner agent to evaluate syntactic matches in context
+9. **Lesson retirement automation** — auto-archive lessons below signal threshold
+### Confidence Assessment
+| Recommendation | Confidence | Basis |
+|---|---|---|
+| Add scope metadata | **High** | Every analogous system does this. Zero counterevidence. |
+| Language-level filtering | **High** | SonarQube, Semgrep, ESLint all do this. Standard practice. |
+| Framework detection | **Medium** | Semgrep does this well; implementation complexity varies. |
+| Library size target (100-150) | **Medium** | Extrapolated from static analysis literature; needs empirical validation. |
+| AI-powered triage | **Low-Medium** | Datadog/Semgrep results promising but early-stage. |
+---
+## Sources
+### Academic Research
+- [Zimmermann, Nagappan et al. — Cross-project Defect Prediction (ESEC/FSE 2009)](https://dl.acm.org/doi/10.1145/1595696.1595713) — 622 cross-project predictions, "a serious challenge"
+- [Tao 2024 — Cross-project Defect Prediction Using Transfer Learning with LSTM](https://ietresearch.onlinelibrary.wiley.com/doi/10.1049/2024/5550801)
+- [TriStage-CPDP 2025 — Three-stage Cross-project Defect Prediction](https://link.springer.com/article/10.1007/s40747-025-02098-y) — CodeT5+ for project-invariant features
+- [Cross-project Defect Prediction Based on Transfer GCN (2025)](https://link.springer.com/article/10.1007/s10664-025-10783-2)
+- [Antipatterns in Software Classification Taxonomies (ScienceDirect 2022)](https://www.sciencedirect.com/science/article/pii/S0164121222000826)
+- [Predicting Bugs Using Antipatterns (ResearchGate)](https://www.researchgate.net/publication/261416699_Predicting_Bugs_Using_Antipatterns)
+- [Multi-Programming-Language Bug Prediction (2024)](https://arxiv.org/html/2407.10906v1)
+- [Are Static Analysis Violations Really Fixed? (IEEE 2019)](https://ieeexplore.ieee.org/document/8813272/) — SonarQube empirical study
+- [Toyota's Secret: The A3 Report (MIT Sloan Management Review)](https://sloanreview.mit.edu/article/toyotas-secret-the-a3-report/)
+- [Toyota A3 Report: Process Improvement in Healthcare (PubMed)](https://pubmed.ncbi.nlm.nih.gov/19380942/)
+### Industry Reports and Documentation
+- [SonarQube Quality Profiles Documentation](https://docs.sonarsource.com/sonarqube-server/latest/quality-standards-administration/managing-quality-profiles/)
+- [SonarQube Rules Overview](https://docs.sonarsource.com/sonarqube-server/quality-standards-administration/managing-rules/rules)
+- [Semgrep Rule Structure Syntax](https://semgrep.dev/docs/writing-rules/rule-syntax)
+- [Semgrep Registry Contribution Guide](https://semgrep.dev/docs/contributing/contributing-to-semgrep-rules-repository)
+- [Semgrep Policies and Rule Management](https://semgrep.dev/docs/semgrep-code/policies)
+- [ESLint Shareable Configs](https://eslint.org/docs/latest/extend/shareable-configs)
+- [ESLint Flat Config Introduction](https://eslint.org/blog/2022/08/new-config-system-part-2/)
+- [CodeClimate Maintainability Documentation](https://docs.codeclimate.com/docs/maintainability)
+- [CodeClimate Filters](https://docs.codeclimate.com/docs/filters)
+- [Sonar Blog — False Positives Are Our Enemies](https://www.sonarsource.com/blog/false-positives-our-enemies-but-maybe-your-friends/)
+- [SonarQube Community Forum: Hundreds of Obvious False Positives](https://community.sonarsource.com/t/hundreds-of-obvious-false-positives/57867)
+### False Positive Research and Filtering
+- [DeepSource — How We Ensure Less Than 5% False Positive Rate](https://deepsource.com/blog/how-deepsource-ensures-less-false-positives)
+- [Datadog — Using LLMs to Filter Out False Positives from SAST](https://www.datadoghq.com/blog/using-llms-to-filter-out-false-positives/)
+- [Semgrep — Announcing AI Noise Filtering and Triage Memories (2025)](https://semgrep.dev/blog/2025/announcing-ai-noise-filtering-and-triage-memories/)
+- [GitLab Security Report 2024 — 67% average FP rate](https://kb.secuarden.com/briefs/issue-2-drowning-in-alerts-the-false-positive-trap-in-open-source-sast/)
+- [Parasoft — False Positives in Static Code Analysis](https://www.parasoft.com/blog/false-positives-in-static-code-analysis/)
+- [Parasoft — 10 Tips for Static Analysis Clean-Up](https://www.parasoft.com/blog/10-tips-static-analysis/)
+### Safety Science
+- [NASA ASRS — Aviation Safety Reporting System](https://asrs.arc.nasa.gov/)
+- [ASRS Wikipedia](https://en.wikipedia.org/wiki/Aviation_Safety_Reporting_System)
+- [NASA ASRS Program Briefing (2024)](https://ntrs.nasa.gov/api/citations/20240014226/downloads/ICASS%202024%20ASRS.pdf)
+- [SKYbrary — ASRS Overview](https://skybrary.aero/articles/aviation-safety-reporting-system-asrs)
+- [FAA — The Case for Confidential Incident Reporting Systems](https://www.faasafety.gov/files/events/EA/EA23/2010/EA2334954/NASA_Reporting.pdf)
+- [NHS NRLS — Using Incident Reporting Systems to Improve Patient Safety (PMC)](https://pmc.ncbi.nlm.nih.gov/articles/PMC11554398/)
+- [NHS NRLS Background (NCBI Bookshelf)](https://www.ncbi.nlm.nih.gov/books/NBK385184/)
+- [Systems for Identifying and Reporting Medicines-Related Safety Incidents (NCBI)](https://www.ncbi.nlm.nih.gov/books/NBK355903/)
+### Developer Experience
+- [Agoda Engineering — How to Make Linting Rules Work](https://medium.com/agoda-engineering/how-to-make-linting-rules-work-from-enforcement-to-education-be7071d2fcf0)
+- [Qlty — Developer Experience Gaps of Linting on CI](https://qlty.sh/blog/developer-experience-gaps-of-linting-on-ci)
+- [eslint-config-airbnb on npm](https://www.npmjs.com/package/eslint-config-airbnb) — 4M+ weekly downloads
+- [standard on npm](https://www.npmjs.com/package/standard) — 545K+ weekly downloads