autonomous-coding-toolkit 1.0.0
- package/.claude-plugin/marketplace.json +22 -0
- package/.claude-plugin/plugin.json +13 -0
- package/LICENSE +21 -0
- package/Makefile +21 -0
- package/README.md +140 -0
- package/SECURITY.md +28 -0
- package/agents/bash-expert.md +113 -0
- package/agents/dependency-auditor.md +138 -0
- package/agents/integration-tester.md +120 -0
- package/agents/lesson-scanner.md +149 -0
- package/agents/python-expert.md +179 -0
- package/agents/service-monitor.md +141 -0
- package/agents/shell-expert.md +147 -0
- package/benchmarks/runner.sh +147 -0
- package/benchmarks/tasks/01-rest-endpoint/rubric.sh +29 -0
- package/benchmarks/tasks/01-rest-endpoint/task.md +17 -0
- package/benchmarks/tasks/02-refactor-module/task.md +8 -0
- package/benchmarks/tasks/03-fix-integration-bug/task.md +8 -0
- package/benchmarks/tasks/04-add-test-coverage/task.md +8 -0
- package/benchmarks/tasks/05-multi-file-feature/task.md +8 -0
- package/bin/act.js +238 -0
- package/commands/autocode.md +6 -0
- package/commands/cancel-ralph.md +18 -0
- package/commands/code-factory.md +53 -0
- package/commands/create-prd.md +55 -0
- package/commands/ralph-loop.md +18 -0
- package/commands/run-plan.md +117 -0
- package/commands/submit-lesson.md +122 -0
- package/docs/ARCHITECTURE.md +630 -0
- package/docs/CONTRIBUTING.md +125 -0
- package/docs/lessons/0001-bare-exception-swallowing.md +34 -0
- package/docs/lessons/0002-async-def-without-await.md +28 -0
- package/docs/lessons/0003-create-task-without-callback.md +28 -0
- package/docs/lessons/0004-hardcoded-test-counts.md +28 -0
- package/docs/lessons/0005-sqlite-without-closing.md +33 -0
- package/docs/lessons/0006-venv-pip-path.md +27 -0
- package/docs/lessons/0007-runner-state-self-rejection.md +35 -0
- package/docs/lessons/0008-quality-gate-blind-spot.md +33 -0
- package/docs/lessons/0009-parser-overcount-empty-batches.md +36 -0
- package/docs/lessons/0010-local-outside-function-bash.md +33 -0
- package/docs/lessons/0011-batch-tests-for-unimplemented-code.md +36 -0
- package/docs/lessons/0012-api-markdown-unescaped-chars.md +33 -0
- package/docs/lessons/0013-export-prefix-env-parsing.md +33 -0
- package/docs/lessons/0014-decorator-registry-import-side-effect.md +43 -0
- package/docs/lessons/0015-frontend-backend-schema-drift.md +43 -0
- package/docs/lessons/0016-event-driven-cold-start-seeding.md +44 -0
- package/docs/lessons/0017-copy-paste-logic-diverges.md +43 -0
- package/docs/lessons/0018-layer-passes-pipeline-broken.md +45 -0
- package/docs/lessons/0019-systemd-envfile-ignores-export.md +41 -0
- package/docs/lessons/0020-persist-state-incrementally.md +44 -0
- package/docs/lessons/0021-dual-axis-testing.md +48 -0
- package/docs/lessons/0022-jsx-factory-shadowing.md +43 -0
- package/docs/lessons/0023-static-analysis-spiral.md +51 -0
- package/docs/lessons/0024-shared-pipeline-implementation.md +55 -0
- package/docs/lessons/0025-defense-in-depth-all-entry-points.md +65 -0
- package/docs/lessons/0026-linter-no-rules-false-enforcement.md +54 -0
- package/docs/lessons/0027-jsx-silent-prop-drop.md +64 -0
- package/docs/lessons/0028-no-infrastructure-in-client-code.md +49 -0
- package/docs/lessons/0029-never-write-secrets-to-files.md +61 -0
- package/docs/lessons/0030-cache-merge-not-replace.md +62 -0
- package/docs/lessons/0031-verify-units-at-boundaries.md +66 -0
- package/docs/lessons/0032-module-lifecycle-subscribe-unsubscribe.md +89 -0
- package/docs/lessons/0033-async-iteration-mutable-snapshot.md +72 -0
- package/docs/lessons/0034-caller-missing-await-silent-discard.md +65 -0
- package/docs/lessons/0035-duplicate-registration-silent-overwrite.md +85 -0
- package/docs/lessons/0036-websocket-dirty-disconnect.md +33 -0
- package/docs/lessons/0037-parallel-agents-worktree-corruption.md +31 -0
- package/docs/lessons/0038-subscribe-no-stored-ref.md +36 -0
- package/docs/lessons/0039-fallback-or-default-hides-bugs.md +34 -0
- package/docs/lessons/0040-event-firehose-filter-first.md +36 -0
- package/docs/lessons/0041-ambiguous-base-dir-path-nesting.md +32 -0
- package/docs/lessons/0042-spec-compliance-insufficient.md +36 -0
- package/docs/lessons/0043-exact-count-extensible-collections.md +32 -0
- package/docs/lessons/0044-relative-file-deps-worktree.md +39 -0
- package/docs/lessons/0045-iterative-design-improvement.md +33 -0
- package/docs/lessons/0046-plan-assertion-math-bugs.md +38 -0
- package/docs/lessons/0047-pytest-single-threaded-default.md +37 -0
- package/docs/lessons/0048-integration-wiring-batch.md +40 -0
- package/docs/lessons/0049-ab-verification.md +41 -0
- package/docs/lessons/0050-editing-sourced-files-during-execution.md +33 -0
- package/docs/lessons/0051-infrastructure-fixes-cant-self-heal.md +30 -0
- package/docs/lessons/0052-uncommitted-changes-poison-quality-gates.md +31 -0
- package/docs/lessons/0053-jq-compact-flag-inconsistency.md +31 -0
- package/docs/lessons/0054-parser-matches-inside-code-blocks.md +30 -0
- package/docs/lessons/0055-agents-compensate-for-garbled-prompts.md +31 -0
- package/docs/lessons/0056-grep-count-exit-code-on-zero.md +42 -0
- package/docs/lessons/0057-new-artifacts-break-git-clean-gates.md +42 -0
- package/docs/lessons/0058-dead-config-keys-never-consumed.md +49 -0
- package/docs/lessons/0059-contract-test-shared-structures.md +53 -0
- package/docs/lessons/0060-set-e-silent-death-in-runners.md +53 -0
- package/docs/lessons/0061-context-injection-dirty-state.md +50 -0
- package/docs/lessons/0062-sibling-bug-neighborhood-scan.md +29 -0
- package/docs/lessons/0063-one-flag-two-lifetimes.md +31 -0
- package/docs/lessons/0064-test-passes-wrong-reason.md +31 -0
- package/docs/lessons/0065-pipefail-grep-count-double-output.md +39 -0
- package/docs/lessons/0066-local-keyword-outside-function.md +37 -0
- package/docs/lessons/0067-stdin-hang-non-interactive-shell.md +36 -0
- package/docs/lessons/0068-agent-builds-wrong-thing-correctly.md +31 -0
- package/docs/lessons/0069-plan-quality-dominates-execution.md +30 -0
- package/docs/lessons/0070-spec-echo-back-prevents-drift.md +31 -0
- package/docs/lessons/0071-positive-instructions-outperform-negative.md +30 -0
- package/docs/lessons/0072-lost-in-the-middle-context-placement.md +30 -0
- package/docs/lessons/0073-unscoped-lessons-cause-false-positives.md +30 -0
- package/docs/lessons/0074-stale-context-injection-wrong-batch.md +32 -0
- package/docs/lessons/0075-research-artifacts-must-persist.md +32 -0
- package/docs/lessons/0076-wrong-decomposition-contaminates-downstream.md +30 -0
- package/docs/lessons/0077-cherry-pick-merges-need-manual-resolution.md +30 -0
- package/docs/lessons/0078-static-review-without-live-test.md +30 -0
- package/docs/lessons/0079-integration-wiring-batch-required.md +32 -0
- package/docs/lessons/FRAMEWORK.md +161 -0
- package/docs/lessons/SUMMARY.md +201 -0
- package/docs/lessons/TEMPLATE.md +85 -0
- package/docs/plans/2026-02-21-code-factory-v2-design.md +204 -0
- package/docs/plans/2026-02-21-code-factory-v2-implementation-plan.md +2189 -0
- package/docs/plans/2026-02-21-code-factory-v2-phase4-design.md +537 -0
- package/docs/plans/2026-02-21-code-factory-v2-phase4-implementation-plan.md +2012 -0
- package/docs/plans/2026-02-21-hardening-pass-design.md +108 -0
- package/docs/plans/2026-02-21-hardening-pass-plan.md +1378 -0
- package/docs/plans/2026-02-21-mab-research-report.md +406 -0
- package/docs/plans/2026-02-21-marketplace-restructure-design.md +240 -0
- package/docs/plans/2026-02-21-marketplace-restructure-plan.md +832 -0
- package/docs/plans/2026-02-21-phase4-completion-plan.md +697 -0
- package/docs/plans/2026-02-21-validator-suite-design.md +148 -0
- package/docs/plans/2026-02-21-validator-suite-plan.md +540 -0
- package/docs/plans/2026-02-22-mab-research-round2.md +556 -0
- package/docs/plans/2026-02-22-mab-run-design.md +462 -0
- package/docs/plans/2026-02-22-mab-run-plan.md +2046 -0
- package/docs/plans/2026-02-22-operations-design-methodology-research.md +681 -0
- package/docs/plans/2026-02-22-research-agent-failure-taxonomy.md +532 -0
- package/docs/plans/2026-02-22-research-code-guideline-policies.md +886 -0
- package/docs/plans/2026-02-22-research-codebase-audit-refactoring.md +908 -0
- package/docs/plans/2026-02-22-research-coding-standards-documentation.md +541 -0
- package/docs/plans/2026-02-22-research-competitive-landscape.md +687 -0
- package/docs/plans/2026-02-22-research-comprehensive-testing.md +1076 -0
- package/docs/plans/2026-02-22-research-context-utilization.md +459 -0
- package/docs/plans/2026-02-22-research-cost-quality-tradeoff.md +548 -0
- package/docs/plans/2026-02-22-research-lesson-transferability.md +508 -0
- package/docs/plans/2026-02-22-research-multi-agent-coordination.md +312 -0
- package/docs/plans/2026-02-22-research-phase-integration.md +602 -0
- package/docs/plans/2026-02-22-research-plan-quality.md +428 -0
- package/docs/plans/2026-02-22-research-prompt-engineering.md +558 -0
- package/docs/plans/2026-02-22-research-unconventional-perspectives.md +528 -0
- package/docs/plans/2026-02-22-research-user-adoption.md +638 -0
- package/docs/plans/2026-02-22-research-verification-effectiveness.md +433 -0
- package/docs/plans/2026-02-23-agent-suite-design.md +299 -0
- package/docs/plans/2026-02-23-agent-suite-plan.md +578 -0
- package/docs/plans/2026-02-23-phase3-cost-infrastructure-design.md +148 -0
- package/docs/plans/2026-02-23-phase3-cost-infrastructure-plan.md +1062 -0
- package/docs/plans/2026-02-23-research-bash-expert-agent.md +543 -0
- package/docs/plans/2026-02-23-research-dependency-auditor-agent.md +564 -0
- package/docs/plans/2026-02-23-research-improving-existing-agents.md +503 -0
- package/docs/plans/2026-02-23-research-integration-tester-agent.md +454 -0
- package/docs/plans/2026-02-23-research-python-expert-agent.md +429 -0
- package/docs/plans/2026-02-23-research-service-monitor-agent.md +425 -0
- package/docs/plans/2026-02-23-research-shell-expert-agent.md +533 -0
- package/docs/plans/2026-02-23-roadmap-to-completion.md +530 -0
- package/docs/plans/2026-02-24-headless-module-split-design.md +98 -0
- package/docs/plans/2026-02-24-headless-module-split.md +443 -0
- package/docs/plans/2026-02-24-lesson-scope-metadata-design.md +228 -0
- package/docs/plans/2026-02-24-lesson-scope-metadata-plan.md +968 -0
- package/docs/plans/2026-02-24-npm-packaging-design.md +841 -0
- package/docs/plans/2026-02-24-npm-packaging-plan.md +1965 -0
- package/docs/plans/audit-findings.md +186 -0
- package/docs/telegram-notification-format.md +98 -0
- package/examples/example-plan.md +51 -0
- package/examples/example-prd.json +72 -0
- package/examples/example-roadmap.md +33 -0
- package/examples/quickstart-plan.md +63 -0
- package/hooks/hooks.json +26 -0
- package/hooks/setup-symlinks.sh +48 -0
- package/hooks/stop-hook.sh +135 -0
- package/package.json +47 -0
- package/policies/bash.md +71 -0
- package/policies/python.md +71 -0
- package/policies/testing.md +61 -0
- package/policies/universal.md +60 -0
- package/scripts/analyze-report.sh +97 -0
- package/scripts/architecture-map.sh +145 -0
- package/scripts/auto-compound.sh +273 -0
- package/scripts/batch-audit.sh +42 -0
- package/scripts/batch-test.sh +101 -0
- package/scripts/entropy-audit.sh +221 -0
- package/scripts/failure-digest.sh +51 -0
- package/scripts/generate-ast-rules.sh +96 -0
- package/scripts/init.sh +112 -0
- package/scripts/lesson-check.sh +428 -0
- package/scripts/lib/common.sh +61 -0
- package/scripts/lib/cost-tracking.sh +153 -0
- package/scripts/lib/ollama.sh +60 -0
- package/scripts/lib/progress-writer.sh +128 -0
- package/scripts/lib/run-plan-context.sh +215 -0
- package/scripts/lib/run-plan-echo-back.sh +231 -0
- package/scripts/lib/run-plan-headless.sh +396 -0
- package/scripts/lib/run-plan-notify.sh +57 -0
- package/scripts/lib/run-plan-parser.sh +81 -0
- package/scripts/lib/run-plan-prompt.sh +215 -0
- package/scripts/lib/run-plan-quality-gate.sh +132 -0
- package/scripts/lib/run-plan-routing.sh +315 -0
- package/scripts/lib/run-plan-sampling.sh +170 -0
- package/scripts/lib/run-plan-scoring.sh +146 -0
- package/scripts/lib/run-plan-state.sh +142 -0
- package/scripts/lib/run-plan-team.sh +199 -0
- package/scripts/lib/telegram.sh +54 -0
- package/scripts/lib/thompson-sampling.sh +176 -0
- package/scripts/license-check.sh +74 -0
- package/scripts/mab-run.sh +575 -0
- package/scripts/module-size-check.sh +146 -0
- package/scripts/patterns/async-no-await.yml +5 -0
- package/scripts/patterns/bare-except.yml +6 -0
- package/scripts/patterns/empty-catch.yml +6 -0
- package/scripts/patterns/hardcoded-localhost.yml +9 -0
- package/scripts/patterns/retry-loop-no-backoff.yml +12 -0
- package/scripts/pipeline-status.sh +197 -0
- package/scripts/policy-check.sh +226 -0
- package/scripts/prior-art-search.sh +133 -0
- package/scripts/promote-mab-lessons.sh +126 -0
- package/scripts/prompts/agent-a-superpowers.md +29 -0
- package/scripts/prompts/agent-b-ralph.md +29 -0
- package/scripts/prompts/judge-agent.md +61 -0
- package/scripts/prompts/planner-agent.md +44 -0
- package/scripts/pull-community-lessons.sh +90 -0
- package/scripts/quality-gate.sh +266 -0
- package/scripts/research-gate.sh +90 -0
- package/scripts/run-plan.sh +329 -0
- package/scripts/scope-infer.sh +159 -0
- package/scripts/setup-ralph-loop.sh +155 -0
- package/scripts/telemetry.sh +230 -0
- package/scripts/tests/run-all-tests.sh +52 -0
- package/scripts/tests/test-act-cli.sh +46 -0
- package/scripts/tests/test-agents-md.sh +87 -0
- package/scripts/tests/test-analyze-report.sh +114 -0
- package/scripts/tests/test-architecture-map.sh +89 -0
- package/scripts/tests/test-auto-compound.sh +169 -0
- package/scripts/tests/test-batch-test.sh +65 -0
- package/scripts/tests/test-benchmark-runner.sh +25 -0
- package/scripts/tests/test-common.sh +168 -0
- package/scripts/tests/test-cost-tracking.sh +158 -0
- package/scripts/tests/test-echo-back.sh +180 -0
- package/scripts/tests/test-entropy-audit.sh +146 -0
- package/scripts/tests/test-failure-digest.sh +66 -0
- package/scripts/tests/test-generate-ast-rules.sh +145 -0
- package/scripts/tests/test-helpers.sh +82 -0
- package/scripts/tests/test-init.sh +47 -0
- package/scripts/tests/test-lesson-check.sh +278 -0
- package/scripts/tests/test-lesson-local.sh +55 -0
- package/scripts/tests/test-license-check.sh +109 -0
- package/scripts/tests/test-mab-run.sh +182 -0
- package/scripts/tests/test-ollama-lib.sh +49 -0
- package/scripts/tests/test-ollama.sh +60 -0
- package/scripts/tests/test-pipeline-status.sh +198 -0
- package/scripts/tests/test-policy-check.sh +124 -0
- package/scripts/tests/test-prior-art-search.sh +96 -0
- package/scripts/tests/test-progress-writer.sh +140 -0
- package/scripts/tests/test-promote-mab-lessons.sh +110 -0
- package/scripts/tests/test-pull-community-lessons.sh +149 -0
- package/scripts/tests/test-quality-gate.sh +241 -0
- package/scripts/tests/test-research-gate.sh +132 -0
- package/scripts/tests/test-run-plan-cli.sh +86 -0
- package/scripts/tests/test-run-plan-context.sh +305 -0
- package/scripts/tests/test-run-plan-e2e.sh +153 -0
- package/scripts/tests/test-run-plan-headless.sh +424 -0
- package/scripts/tests/test-run-plan-notify.sh +124 -0
- package/scripts/tests/test-run-plan-parser.sh +217 -0
- package/scripts/tests/test-run-plan-prompt.sh +254 -0
- package/scripts/tests/test-run-plan-quality-gate.sh +222 -0
- package/scripts/tests/test-run-plan-routing.sh +178 -0
- package/scripts/tests/test-run-plan-scoring.sh +148 -0
- package/scripts/tests/test-run-plan-state.sh +261 -0
- package/scripts/tests/test-run-plan-team.sh +157 -0
- package/scripts/tests/test-scope-infer.sh +150 -0
- package/scripts/tests/test-setup-ralph-loop.sh +63 -0
- package/scripts/tests/test-telegram-env.sh +38 -0
- package/scripts/tests/test-telegram.sh +121 -0
- package/scripts/tests/test-telemetry.sh +46 -0
- package/scripts/tests/test-thompson-sampling.sh +139 -0
- package/scripts/tests/test-validate-all.sh +60 -0
- package/scripts/tests/test-validate-commands.sh +89 -0
- package/scripts/tests/test-validate-hooks.sh +98 -0
- package/scripts/tests/test-validate-lessons.sh +150 -0
- package/scripts/tests/test-validate-plan-quality.sh +235 -0
- package/scripts/tests/test-validate-plans.sh +187 -0
- package/scripts/tests/test-validate-plugin.sh +106 -0
- package/scripts/tests/test-validate-prd.sh +184 -0
- package/scripts/tests/test-validate-skills.sh +134 -0
- package/scripts/validate-all.sh +57 -0
- package/scripts/validate-commands.sh +67 -0
- package/scripts/validate-hooks.sh +89 -0
- package/scripts/validate-lessons.sh +98 -0
- package/scripts/validate-plan-quality.sh +369 -0
- package/scripts/validate-plans.sh +120 -0
- package/scripts/validate-plugin.sh +86 -0
- package/scripts/validate-policies.sh +42 -0
- package/scripts/validate-prd.sh +118 -0
- package/scripts/validate-skills.sh +96 -0
- package/skills/autocode/SKILL.md +285 -0
- package/skills/autocode/ab-verification.md +51 -0
- package/skills/autocode/code-quality-standards.md +37 -0
- package/skills/autocode/competitive-mode.md +364 -0
- package/skills/brainstorming/SKILL.md +97 -0
- package/skills/capture-lesson/SKILL.md +187 -0
- package/skills/check-lessons/SKILL.md +116 -0
- package/skills/dispatching-parallel-agents/SKILL.md +110 -0
- package/skills/executing-plans/SKILL.md +85 -0
- package/skills/finishing-a-development-branch/SKILL.md +201 -0
- package/skills/receiving-code-review/SKILL.md +72 -0
- package/skills/requesting-code-review/SKILL.md +59 -0
- package/skills/requesting-code-review/code-reviewer.md +82 -0
- package/skills/research/SKILL.md +145 -0
- package/skills/roadmap/SKILL.md +115 -0
- package/skills/subagent-driven-development/SKILL.md +98 -0
- package/skills/subagent-driven-development/code-quality-reviewer-prompt.md +18 -0
- package/skills/subagent-driven-development/implementer-prompt.md +73 -0
- package/skills/subagent-driven-development/spec-reviewer-prompt.md +57 -0
- package/skills/systematic-debugging/SKILL.md +134 -0
- package/skills/systematic-debugging/condition-based-waiting.md +64 -0
- package/skills/systematic-debugging/defense-in-depth.md +32 -0
- package/skills/systematic-debugging/root-cause-tracing.md +55 -0
- package/skills/test-driven-development/SKILL.md +167 -0
- package/skills/using-git-worktrees/SKILL.md +219 -0
- package/skills/using-superpowers/SKILL.md +54 -0
- package/skills/verification-before-completion/SKILL.md +140 -0
- package/skills/verify/SKILL.md +82 -0
- package/skills/writing-plans/SKILL.md +128 -0
- package/skills/writing-skills/SKILL.md +93 -0
package/.claude-plugin/marketplace.json
ADDED
@@ -0,0 +1,22 @@
+{
+  "$schema": "https://anthropic.com/claude-code/marketplace.schema.json",
+  "name": "autonomous-coding-toolkit",
+  "description": "Autonomous coding pipeline with quality gates, fresh-context execution, and community lessons",
+  "owner": {
+    "name": "Justin McFarland",
+    "email": "parthalon025@gmail.com"
+  },
+  "plugins": [
+    {
+      "name": "autonomous-coding-toolkit",
+      "description": "Complete autonomous coding pipeline with skills, agents, scripts, and a community lesson system that improves with every user",
+      "version": "1.0.0",
+      "source": "./",
+      "author": {
+        "name": "Justin McFarland",
+        "email": "parthalon025@gmail.com"
+      },
+      "category": "development"
+    }
+  ]
+}
package/.claude-plugin/plugin.json
ADDED
@@ -0,0 +1,13 @@
+{
+  "name": "autonomous-coding-toolkit",
+  "description": "Complete autonomous coding pipeline: skills for every stage from brainstorming through verification, quality gates between batches, headless execution, and a lessons-learned feedback loop that compounds with every user",
+  "version": "1.0.0",
+  "author": {
+    "name": "Justin McFarland",
+    "email": "parthalon025@gmail.com"
+  },
+  "homepage": "https://github.com/parthalon025/autonomous-coding-toolkit",
+  "repository": "https://github.com/parthalon025/autonomous-coding-toolkit",
+  "license": "MIT",
+  "keywords": ["autonomous", "tdd", "quality-gates", "headless", "skills", "pipeline", "lessons-learned"]
+}
package/LICENSE
ADDED
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2026 Justin McFarland
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
package/Makefile
ADDED
@@ -0,0 +1,21 @@
+.PHONY: test validate lint ci
+
+lint:
+	@echo "=== ShellCheck ==="
+	@shellcheck scripts/*.sh scripts/lib/*.sh 2>&1 || true
+	@echo "=== shfmt ==="
+	@shfmt -d -i 2 -ci scripts/*.sh scripts/lib/*.sh 2>&1 || true
+	@echo "=== Shellharden ==="
+	@shellharden --check scripts/*.sh scripts/lib/*.sh 2>&1 || true
+	@echo "=== Semgrep ==="
+	@semgrep --config "p/bash" --quiet scripts/ 2>&1 || true
+	@echo "=== Lint Complete ==="
+
+test:
+	@bash scripts/tests/run-all-tests.sh
+
+validate:
+	@bash scripts/validate-all.sh
+
+ci: lint validate test
+	@echo "CI: ALL PASSED"
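Each linter line in the `lint` target above ends in `|| true`, so the recipe line always exits 0 and `make ci` moves on to `validate` and `test` whatever the linters report. The sketch below shows that exit-status behaviour in isolation. `fake_linter` is a hypothetical stand-in, not one of the real tools.

```bash
#!/usr/bin/env bash
# Hypothetical stand-in for a linter that reports a finding and fails.
fake_linter() {
  echo "finding: unquoted variable"
  return 1
}

# Without the guard, the non-zero status is visible to the caller (and to make).
fake_linter >/dev/null
strict_status=$?

# With `|| true`, the status of the whole list is that of `true`, always 0.
fake_linter >/dev/null || true
advisory_status=$?

echo "strict=$strict_status advisory=$advisory_status"
```

Read this way, `CI: ALL PASSED` reflects the `validate` and `test` targets only. The lint output is advisory and still has to be read by hand.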
package/README.md
ADDED
@@ -0,0 +1,140 @@
+[](https://github.com/parthalon025/autonomous-coding-toolkit/actions)
+[](https://opensource.org/licenses/MIT)
+[](https://github.com/parthalon025/autonomous-coding-toolkit/releases/tag/v1.0.0)
+
+# Autonomous Coding Toolkit
+
+> **Goal:** Code better than a human on large projects — not by being smarter on any single batch, but by compounding learning across thousands of batches across hundreds of users.
+
+**A learning system for autonomous AI coding.** Fresh context per batch, quality gates between every step, 79 community lessons that prevent the same bug twice, and telemetry that makes the system smarter with every run.
+
+Built for [Claude Code](https://docs.anthropic.com/en/docs/claude-code) (v1.0.33+). Works as a Claude Code plugin (interactive) and npm CLI (headless/CI).
+
+## What It Does
+
+```
+You write a plan → the toolkit executes it batch-by-batch with:
+- Fresh 200k context window per batch (no accumulated degradation)
+- Quality gates between every batch (tests + anti-pattern scan + memory check)
+- Machine-verifiable completion (every criterion is a shell command)
+```
+
+## Install
+
+### npm (recommended)
+
+```bash
+npm install -g autonomous-coding-toolkit
+```
+
+This puts `act` on your PATH. Requires Node.js 18+ and bash 4+.
+
+### Claude Code Plugin
+
+```bash
+# Add the marketplace source
+/plugin marketplace add parthalon025/autonomous-coding-toolkit
+
+# Install the plugin
+/plugin install autonomous-coding-toolkit@autonomous-coding-toolkit
+```
+
+### From Source
+
+```bash
+git clone https://github.com/parthalon025/autonomous-coding-toolkit.git
+cd autonomous-coding-toolkit
+npm link # puts 'act' on PATH
+```
+
+> **Windows:** Requires [WSL](https://learn.microsoft.com/en-us/windows/wsl/install). Run `wsl --install`, then use the toolkit inside WSL.
+
+## Quick Start
+
+```bash
+# Bootstrap your project
+act init --quickstart
+
+# Full pipeline — brainstorm → plan → execute → verify → finish
+/autocode "Add user authentication with JWT"
+
+# Run a plan headless (fully autonomous, fresh context per batch)
+act plan docs/plans/my-feature.md --on-failure retry --notify
+
+# Quality check
+act gate --project-root .
+
+# See all commands
+act help
+```
+
+See [`examples/quickstart-plan.md`](examples/quickstart-plan.md) for a minimal plan you can run in 3 commands.
+
+## The Pipeline
+
+```
+Idea → [Roadmap] → Brainstorm → [Research] → PRD → Plan → Execute → Verify → Finish
+```
+
+Each stage exists because a specific failure mode demanded it:
+
+| Stage | Problem It Solves | Evidence |
+|-------|------------------|----------|
+| **Brainstorm** | Agents build the wrong thing correctly — spec misunderstanding is the dominant failure mode | SWE-bench Pro (1,865 problems): removing specs degraded success from 25.9% to 8.4% |
+| **Research** | Building on assumptions wastes hours | Cooper Stage-Gate: projects with stable definitions are 3x more likely to succeed |
+| **Plan** | Plan quality dominates execution quality ~3:1 | SWE-bench Pro: spec removal = 3x degradation |
+| **Execute** | Context degradation is the #1 quality killer | Chroma (Hong et al., 2025): 11/12 models < 50% at 32K tokens; Liu et al. (Stanford, TACL 2024): up to 20pp mid-context accuracy loss |
+| **Verify** | Static review misses behavioral bugs | OOPSLA 2025: property-based testing finds ~50x more mutations per test |
+
+Full evidence table with all 25 papers: [`docs/ARCHITECTURE.md`](docs/ARCHITECTURE.md)
+
+## How It Compares
+
+| Tool | Approach | This Toolkit's Difference |
+|------|----------|--------------------------|
+| Claude Code `/plan` | Built-in planning | No quality gates, no fresh context per batch, no lesson system |
+| Aider | Interactive pair programming | Aider is conversational; this is batch-autonomous with gates |
+| Cursor Agent | IDE-integrated agent | No headless mode, no batch isolation |
+| SWE-agent | Autonomous GitHub issue solver | Single-issue scope; this handles multi-batch plans with state |
+
+**Core differentiators:** (1) fresh context per batch, (2) machine-verifiable quality gates, (3) compounding lesson system, (4) headless unattended execution.
+
+## Quality Gates
+
+Mandatory between every batch:
+
+1. Lesson check (<2s, grep-based anti-pattern scan)
+2. ast-grep patterns (5 structural checks)
+3. Test suite (auto-detected: pytest / npm test / make test)
+4. Memory check (warns if < 4GB available)
+5. Test count regression (tests only go up)
+6. Git clean (all changes committed)
+
+## Community Lessons
+
+79 lessons across 6 failure clusters, learned from production bugs. Adding a lesson file to `docs/lessons/` automatically adds a check — no code changes needed.
+
+Submit new lessons via `/submit-lesson` or [open an issue](https://github.com/parthalon025/autonomous-coding-toolkit/issues/new?template=lesson_submission.md).
+
+## Requirements
+
+- **Claude Code** v1.0.33+ (`claude` CLI)
+- **bash** 4+, **jq**, **git**
+- Optional: **gh** (PR creation), **curl** (Telegram notifications)
+
+## Learn More
+
+| Topic | Doc |
+|-------|-----|
+| Architecture, evidence, internals | [`docs/ARCHITECTURE.md`](docs/ARCHITECTURE.md) |
+| Contributing lessons | [`docs/CONTRIBUTING.md`](docs/CONTRIBUTING.md) |
+| Plan file format | [`examples/example-plan.md`](examples/example-plan.md) |
+| Execution modes (5 options) | [`docs/ARCHITECTURE.md#system-overview`](docs/ARCHITECTURE.md#system-overview) |
+
+## Attribution
+
+Core skill chain forked from [superpowers](https://github.com/obra/superpowers) by Jesse Vincent / Anthropic. Extended with quality gate pipeline, headless execution, lesson system, MAB routing, and research/roadmap stages.
+
+## License
+
+MIT
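The README's "Test count regression (tests only go up)" gate can be pictured as a comparison between the count recorded after the previous batch and the count measured after the current one. The following is a minimal sketch under that assumption. `check_test_count` and its arguments are hypothetical and are not taken from `scripts/quality-gate.sh`, whose contents are not shown in this excerpt.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed when the test count did not drop.
#   $1 = test count recorded after the previous batch
#   $2 = test count measured after the current batch
check_test_count() {
  local previous=$1
  local current=$2
  if [ "$current" -lt "$previous" ]; then
    echo "FAIL: test count dropped from $previous to $current"
    return 1
  fi
  echo "PASS: test count $previous -> $current"
}

# Usage: a growing count passes, a shrinking count stops the run.
check_test_count 40 42
check_test_count 42 41 || echo "gate would stop the run here"
```

Returning a non-zero status rather than only printing a warning keeps the check in line with the README's claim that completion is machine-verifiable.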
package/SECURITY.md
ADDED
@@ -0,0 +1,28 @@
+# Security Policy
+
+## Supported Versions
+
+| Version | Supported |
+|---------|-----------|
+| 1.0.x | Yes |
+| < 1.0 | No |
+
+## Reporting a Vulnerability
+
+If you discover a security vulnerability, please report it responsibly:
+
+1. **Do not** open a public issue
+2. Email parthalon025@gmail.com with:
+   - Description of the vulnerability
+   - Steps to reproduce
+   - Potential impact
+3. You will receive a response within 48 hours
+
+## Scope
+
+This toolkit executes shell commands as part of its quality gate pipeline. Security considerations:
+
+- **`eval` usage:** PRD acceptance criteria use `eval` to run shell commands. Only run PRDs you trust.
+- **Headless execution:** `run-plan.sh` executes `claude -p` with plan content. Only run plans from trusted sources.
+- **Ollama integration:** `auto-compound.sh` sends report content to a local Ollama instance. No data leaves your machine.
+- **Telegram notifications:** Optional. Credentials read from `~/.env`. Never logged or committed.
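The `eval` note in SECURITY.md is easier to weigh with a concrete picture. Below is a minimal sketch, assuming each acceptance criterion is a single shell command string whose exit status decides pass or fail. `run_criterion` is a hypothetical name, and the toolkit's actual PRD runner does not appear in this excerpt.

```bash
#!/usr/bin/env bash
# Hypothetical runner: a criterion passes when its command exits 0.
run_criterion() {
  local criterion=$1
  # eval executes the string with the caller's full privileges,
  # which is why only trusted PRDs should be run.
  if eval "$criterion" >/dev/null 2>&1; then
    echo "PASS: $criterion"
  else
    echo "FAIL: $criterion"
    return 1
  fi
}

# Usage: one criterion that holds and one that does not.
run_criterion 'test -d /'
run_criterion 'test -f /nonexistent/file' || echo "criterion not met"
```

Any command placed in the string runs the same way, harmless or not, so criteria in a PRD should be reviewed like code before they are executed.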
@@ -0,0 +1,113 @@
---
name: bash-expert
description: "Use this agent when reviewing, writing, or debugging bash or shell
  scripts. Invoke for: .sh files, CI pipeline shell steps, hook scripts, systemd
  ExecStart shell commands, Makefile shell targets, and heredoc-heavy scripts. Do
  not invoke for Python, Ruby, or other scripted languages."
tools: Read, Grep, Glob, Bash
model: sonnet
maxTurns: 30
---

# Bash Expert

You are a bash expert specializing in defensive scripting for production automation and CI/CD. Your canonical references are:
- Google Shell Style Guide (structure, naming, scope gate)
- BashPitfalls wiki (61+ common mistakes)
- ShellCheck wiki (rule explanations and fixes)

## Scan Workflow (Audit Mode)

When reviewing existing scripts, follow this order:

### Step 1: Read target files
Read each file to understand structure and purpose.

### Step 2: Grep for Priority 1 blocking patterns

These cause silent failures, data corruption, or security vulnerabilities:

| Pattern | Grep target | Fix |
|---------|-------------|-----|
| Unquoted variable in command args | `\$[a-zA-Z_]` outside double quotes | Quote: `"$var"` |
| `eval` on variables | `\beval\b` | Replace with named variable or array |
| `\|\| true` masking errors | `\|\| true` | Use explicit error handling |
| `cd` without error check | `cd ` not followed by `&&` or `\|\|` | `cd /path \|\| exit 1` |
| Missing `set -euo pipefail` | `^#!/` without strict mode nearby | Add to script header |
| `for f in $(ls` | `for .* in \$\(ls` | `for f in ./*` |
| `local var=$(cmd)` masking exit | `local [a-z_]+=\$\(` | `local var; var=$(cmd)` |
| `2>&1 >>` wrong order | `2>&1 >>` | Reverse to `>>file 2>&1` |
| Same-file pipeline read/write | `> file` after `cat file \|` | Use temp file + mv |
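
The `local var=$(cmd)` row is easy to misread, so a small sketch may help. The function names `masked` and `preserved` are made up for this example; each prints the exit status it observes after capturing the output of a failing command.

```bash
#!/usr/bin/env bash
# Illustrative only: shows which exit status survives each declaration style.

masked() {
  # $? here is the status of `local` itself, which succeeds.
  local out=$(false)
  printf '%s\n' "$?"
}

preserved() {
  # Declaring first lets the assignment report the command's status.
  local out
  out=$(false)
  printf '%s\n' "$?"
}

masked      # prints 0
preserved   # prints 1
```

With `set -e` active, the first form would also keep running past the failure, which is why it belongs in the blocking list.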

### Step 3: Grep for Priority 2 quality patterns

| Pattern | Grep target | Fix |
|---------|-------------|-----|
| Wrong shebang | `#!/bin/bash` | `#!/usr/bin/env bash` |
| `grep -P` (non-portable) | `grep -P` | `grep -E` or `[[ =~ ]]` |
| `ls` for file existence | `if.*ls ` | `[[ -f file ]]` or `[[ -d dir ]]` |
| Backtick substitution | `` ` `` | `$()` |
| Missing `--help` | No `usage()` or `--help` handler | Add usage function |
| No EXIT trap for temp files | `mktemp` without `trap.*EXIT` | `trap 'rm -rf "$tmpdir"' EXIT` |
| `echo` for data output | `^echo \$` | `printf '%s\n' "$var"` |
| `[ ]` instead of `[[ ]]` | `\[ ` not `\[\[ ` | Use `[[ ]]` for bash conditionals |
| Hardcoded `/tmp/` | `/tmp/` literal path | `mktemp -d` |
| `$*` instead of `"$@"` | `\$\*` | `"$@"` |
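
As an illustration of the last row, the sketch below forwards the same arguments both ways and counts what arrives. All three function names are invented for the example, and it assumes the default `IFS`.

```bash
#!/usr/bin/env bash
# Illustrative only: compares argument forwarding with $* and "$@".

count_args() { printf '%s\n' "$#"; }

forward_star() { count_args $*; }     # unquoted: arguments are re-split
forward_at()   { count_args "$@"; }   # each argument is preserved as-is

forward_star "a b" c   # prints 3
forward_at "a b" c     # prints 2
```

The argument `"a b"` is split in two by the first form, which is the kind of silent change a filename containing a space would trigger.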

### Step 4: Check tooling config
- Look for `.shellcheckrc` in the project root
- Check if `shfmt` config exists (`.editorconfig` or flags)

### Step 5: Run ShellCheck
Run: `shellcheck --enable=all --external-sources <file>` on each target file.

### Step 6: Check scope
If the script exceeds 100 lines with complex control flow, non-trivial data manipulation, or object-like structures, flag it: "Consider Python rewrite (Google Shell Style Guide threshold)."

## Output Format

```
BLOCKING (must fix before merge):
- file.sh:12 — Unquoted variable $USER_INPUT — SC2086
- file.sh:34 — Missing error check on cd — BashPitfalls #19

QUALITY (should fix):
- file.sh:8 — Backtick substitution; use $() instead — SC2006
- file.sh:45 — No EXIT trap for temp files created here

STYLE (consider):
- Script exceeds 100 lines with subprocess orchestration; evaluate Python rewrite
- Missing --help flag

TOOLING:
- No .shellcheckrc found; recommend: enable=all, external-sources=true
```

## Generation Mode (Writing New Scripts)

When writing new bash scripts, always apply:

1. Header: `#!/usr/bin/env bash` followed by `set -Eeuo pipefail`
2. `IFS=$'\n\t'` after strict mode
3. Script directory detection:
   ```bash
   SCRIPT_DIR="$(cd -- "$(dirname -- "${BASH_SOURCE[0]}")" && pwd -P)"
   ```
4. Error logging function:
   ```bash
   err() { echo "[$(date -u +%Y-%m-%dT%H:%M:%SZ)] $*" >&2; }
   die() { err "$@"; exit 1; }
   ```
5. Cleanup trap before any `mktemp`:
   ```bash
   trap 'rm -rf "${tmpdir:-}"' EXIT
   ```
6. `main()` function called at end of script
7. `--help` flag via `usage()` heredoc
8. All function variables declared with `local`
9. Quote all variable expansions
10. Use arrays for file lists, never word-split strings
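
Put together, the ten rules above might look like the following skeleton. This is a sketch under stated assumptions, not a canonical template: the script name `count-lines.sh` and its line-counting behaviour are invented for the example, and the `:-$0` fallback on `BASH_SOURCE` is an addition so the snippet also runs when read from stdin.

```bash
#!/usr/bin/env bash
set -Eeuo pipefail
IFS=$'\n\t'

# Directory of this script, for locating sibling files.
# The :-$0 fallback is an addition for the case where BASH_SOURCE is unset.
SCRIPT_DIR="$(cd -- "$(dirname -- "${BASH_SOURCE[0]:-$0}")" && pwd -P)"

err() { echo "[$(date -u +%Y-%m-%dT%H:%M:%SZ)] $*" >&2; }
die() { err "$@"; exit 1; }

usage() {
  cat <<'EOF'
Usage: count-lines.sh [--help] FILE...
Print the line count of each FILE.
EOF
}

# Register cleanup before any mktemp so every exit path removes the directory.
tmpdir=""
trap 'rm -rf "${tmpdir:-}"' EXIT

main() {
  local -a files=()
  local arg f
  for arg in "$@"; do
    case "${arg}" in
      --help) usage; return 0 ;;
      *) files+=("${arg}") ;;
    esac
  done
  # With no files, show usage and stop.
  (( ${#files[@]} > 0 )) || { usage; return 0; }

  tmpdir="$(mktemp -d)"
  for f in "${files[@]}"; do
    [[ -f "${f}" ]] || die "not a file: ${f}"
    # Stage the count in the temp dir, then report it.
    wc -l <"${f}" | tr -d ' ' >"${tmpdir}/count"
    printf '%s %s\n' "$(<"${tmpdir}/count")" "${f}"
  done
}

main "$@"
```

The temp directory is only there to exercise the trap; a real script this small would not need one.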

## Hallucination Guard

Report only what Read/Grep/Bash output directly confirms. If a grep returns no matches for a category, record it as CLEAN. Do not infer violations from code structure alone — show evidence.

@@ -0,0 +1,138 @@
---
name: dependency-auditor
description: "Scans project repos for CVEs, outdated packages, and license compliance.
  Read-only — never installs, updates, or modifies any package. Use for periodic
  security audits or before releases."
tools: Read, Grep, Glob, Bash
model: haiku
maxTurns: 25
---

# Dependency Auditor

You scan project repositories for outdated packages, known CVEs, and license compliance issues. You are strictly read-only — you NEVER run `pip install`, `npm audit fix`, `npm install`, or modify any file.

## Step 0: Tool Availability Check

Before scanning, verify which tools are available:

```bash
which pip-audit osv-scanner trivy npm npx 2>/dev/null
```

Report which tools are available and which are missing. Proceed with available tools. If pip-audit is missing, fall back to manifest-only scanning.
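
One way to produce the available/missing report is sketched below. `check_tools` is a hypothetical helper, and it uses the `command -v` builtin instead of `which` because `which` prints nothing for tools it cannot find.

```bash
#!/usr/bin/env bash
# Hypothetical helper: prints one status line per tool name given.
check_tools() {
  local tool
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      printf 'available: %s\n' "$tool"
    else
      printf 'missing: %s\n' "$tool"
    fi
  done
}

check_tools pip-audit osv-scanner trivy npm npx
```

Each tool gets its own line either way, so a missing scanner is recorded explicitly instead of being inferred from absent output.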

## Step 1: Repo Detection

Scan `~/Documents/projects/` for project repos. For each directory, detect:
- **Python:** `requirements.txt`, `pyproject.toml`, `Pipfile`
- **Node:** `package.json`
- **Docker:** `Dockerfile`
- **Virtualenv:** `.venv/`, `venv/`, `env/`

Exclude: `_archived/`, `.claude/worktrees/`
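
The detection rules above can be sketched as two small functions. The names `detect_ecosystems` and `scan_projects` are invented for this example, and it assumes repos sit one level below the root being scanned.

```bash
#!/usr/bin/env bash
# Hypothetical helpers: classify each repo by the marker files it contains.

detect_ecosystems() {
  local repo="$1"
  local -a found=()
  if [[ -f "$repo/requirements.txt" || -f "$repo/pyproject.toml" || -f "$repo/Pipfile" ]]; then
    found+=("python")
  fi
  if [[ -f "$repo/package.json" ]]; then found+=("node"); fi
  if [[ -f "$repo/Dockerfile" ]]; then found+=("docker"); fi
  if [[ -d "$repo/.venv" || -d "$repo/venv" || -d "$repo/env" ]]; then
    found+=("venv")
  fi
  printf '%s\n' "${found[*]:-none}"
}

scan_projects() {
  local root="$1" dir name
  # The */ glob skips dot-directories such as .claude/ by default.
  for dir in "$root"/*/; do
    [[ -d "$dir" ]] || continue
    dir="${dir%/}"
    name="${dir##*/}"
    [[ "$name" == "_archived" ]] && continue
    printf '%s: %s\n' "$name" "$(detect_ecosystems "$dir")"
  done
}

# Example: scan_projects "$HOME/Documents/projects"
```

Only existence checks are used, so the sketch stays within the read-only constraint.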

## Step 2: CVE Scanning (per repo)

**Python repos (with venv):**
```bash
.venv/bin/python -m pip_audit -f json 2>/dev/null
```

**Python repos (manifest only, no venv):**
```bash
pip-audit -r requirements.txt -f json 2>/dev/null
```

**Python repos (pyproject.toml):**
```bash
pip-audit . -f json 2>/dev/null
```

**Node repos:**
```bash
npm audit --json 2>/dev/null
```

**Docker repos (additional pass):**
```bash
trivy fs --format json --severity HIGH,CRITICAL . 2>/dev/null
```

## Step 3: Cross-Language CVE Aggregation

If OSV-Scanner is available:
```bash
osv-scanner scan --recursive ~/Documents/projects/ --format json 2>/dev/null
```

Cross-reference with per-ecosystem results. OSV output provides normalized severity scores.

## Step 4: Outdated Package Detection (per repo)

**Python:**
```bash
.venv/bin/python -m pip list --outdated --format json 2>/dev/null
```

**Node:**
```bash
npx npm-check-updates --jsonUpgraded 2>/dev/null
```

## Step 5: License Compliance (per repo)

**Python (requires installed venv):**
```bash
.venv/bin/pip-licenses --format json --with-urls 2>/dev/null
```

**Node:**
```bash
npx license-checker --json 2>/dev/null
```

**Allowlist:** MIT, Apache-2.0, Apache Software License, BSD-2-Clause, BSD-3-Clause, BSD License, ISC, Python Software Foundation License, CC0-1.0, Public Domain, Unlicense.

Flag any dependency with a license outside this allowlist.
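
The allowlist comparison could be sketched as follows. The helper names are hypothetical, and the tab-separated `package<TAB>license` input is an assumed intermediate format; extracting those two fields from the tools' JSON output is left out.

```bash
#!/usr/bin/env bash
# Hypothetical helpers: flag packages whose license is not on the allowlist.

ALLOWED_LICENSES=(
  "MIT" "Apache-2.0" "Apache Software License" "BSD-2-Clause" "BSD-3-Clause"
  "BSD License" "ISC" "Python Software Foundation License" "CC0-1.0"
  "Public Domain" "Unlicense"
)

is_allowed_license() {
  local license="$1" allowed
  for allowed in "${ALLOWED_LICENSES[@]}"; do
    [[ "$license" == "$allowed" ]] && return 0
  done
  return 1
}

# Reads "package<TAB>license" lines on stdin; prints only the lines to flag.
flag_licenses() {
  local pkg license
  while IFS=$'\t' read -r pkg license; do
    is_allowed_license "$license" || printf '%s\t%s\n' "$pkg" "$license"
  done
}

# Example: printf 'requests\tApache-2.0\nfoo\tGPL-3.0\n' | flag_licenses
```

Matching is exact, so compound expressions such as `MIT OR Apache-2.0` would be flagged for manual review under this sketch.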

## Step 6: Report

```
DEPENDENCY AUDIT REPORT — <timestamp>
Repos scanned: N

### CRITICAL / HIGH CVEs — Fix immediately
| Repo | Package | Version | CVE | Severity | Fix Version |
|------|---------|---------|-----|----------|-------------|

### MEDIUM CVEs — Fix this sprint
| Repo | Package | Version | CVE | Fix Version |
|------|---------|---------|-----|-------------|

### Outdated Packages (no known CVE)
| Repo | Package | Current | Latest |
|------|---------|---------|--------|

### License Compliance Issues
| Repo | Package | License | Issue |
|------|---------|---------|-------|

### Workspace Rollup
- Total CVEs: N (X critical, Y high, Z medium)
- Total outdated: N
- License issues: N
- Cleanest repos: [list]
- Highest risk: [list]
```

## Key Rules

- **This agent is read-only.** NEVER run `pip install`, `npm audit fix`, `npm install`, or modify any file.
- **Outdated != vulnerable.** Separate outdated packages (version drift) from vulnerable packages (known CVE). Different urgency.
- **Use `.venv/bin/python -m pip`** not `.venv/bin/pip` — Homebrew PATH corruption (Lesson #51).
- **If a tool returns an error,** report the error and move to the next repo. Do not stop the full audit.

## Hallucination Guard

Only report CVEs that appear in tool JSON output. Do not infer vulnerabilities from package age or version number alone. If a tool produces no output for a repo, report "No findings" — do not fabricate results.

@@ -0,0 +1,120 @@
---
name: integration-tester
description: "Verifies data flows correctly across service seams. Catches Cluster B
  bugs where each service passes its own tests but handoffs fail. Use when deploying
  service changes, after timer failures, or to validate cross-service data pipelines."
tools: Read, Grep, Glob, Bash
model: opus
maxTurns: 40
---

# Integration Tester

You verify data flows correctly across service boundaries. Your job is NOT to test individual services — unit tests do that. Your job is to catch Cluster B bugs: the upstream passes its test, the downstream passes its test, but the data never arrives correctly at the seam.

## Operating Principles

1. **Black box only.** Never read service source code to infer behavior. Only check external observables: files, DB tables, HTTP endpoints, systemd status.
2. **Evidence-based assertions.** Every PASS and FAIL must include quoted command output as evidence. No inferred assertions.
3. **One probe per seam.** Do not bundle multiple seams into one check. Failures must be unambiguously attributable.
4. **Fail fast with cause.** If a pre-probe health check fails (service down, no recent artifact), report SKIP with cause. Do not run the full trace and produce a misleading FAIL.
5. **No side effects.** Do not write to live service data paths. Test artifacts go to `/tmp/integration-tester-results/`.

## Probe Strategies

### freshness_and_schema

For file-based seams where the producer writes on a timer:

1. Check producer service is active: `systemctl --user is-active <service>`
2. Find most recent artifact at the interface path
3. Check artifact mtime is within freshness TTL: `$(( $(date +%s) - $(stat -c '%Y' <file>) ))` seconds
4. Validate artifact structure (JSON parseable, expected keys present)
5. PASS if all checks pass; FAIL with specific evidence on any failure
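
The freshness part of this probe (step 3) can be sketched as below. Both function names are hypothetical, the service and schema checks are not covered, and `stat -c '%Y'` is the GNU form used in the step itself, so BSD or macOS `stat` would need a different flag.

```bash
#!/usr/bin/env bash
# Hypothetical helpers: compare an artifact's age against a TTL in seconds.

artifact_age_seconds() {
  local file="$1" now mtime
  mtime="$(stat -c '%Y' -- "$file" 2>/dev/null)" || return 1
  now="$(date +%s)"
  printf '%s\n' "$(( now - mtime ))"
}

# Prints PASS or FAIL with the measured age as evidence.
check_freshness() {
  local file="$1" ttl_seconds="$2" age
  if ! age="$(artifact_age_seconds "$file")"; then
    printf 'FAIL %s: cannot stat artifact\n' "$file"
    return 1
  fi
  if (( age <= ttl_seconds )); then
    printf 'PASS %s: age %ss <= TTL %ss\n' "$file" "$age" "$ttl_seconds"
  else
    printf 'FAIL %s: age %ss > TTL %ss\n' "$file" "$age" "$ttl_seconds"
    return 1
  fi
}

# Example with a 45 minute TTL: check_freshness "$artifact" $(( 45 * 60 ))
```

The measured age appears in every result line, so each verdict carries its own evidence.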

### sentinel_injection

For seams that accept test input:

1. Check producer service is active
2. Write a sentinel file with known content to producer's staging area
3. Wait up to timeout for sentinel to propagate to consumer's input path
4. Validate the propagated artifact
5. Clean up sentinel artifacts from `/tmp/`
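
The bounded wait in step 3 might be implemented with a simple polling loop. `wait_for_file` is a hypothetical helper, and the one-second poll interval is an arbitrary choice for the sketch.

```bash
#!/usr/bin/env bash
# Hypothetical helper: wait until a path exists or the timeout elapses.
# Returns 0 when the path appears, 1 on timeout.
wait_for_file() {
  local path="$1" timeout_seconds="$2" waited=0
  until [[ -e "$path" ]]; do
    if (( waited >= timeout_seconds )); then
      return 1
    fi
    sleep 1
    waited=$(( waited + 1 ))
  done
  return 0
}

# Example: wait_for_file "$consumer_input/sentinel.json" 30 || echo "SKIP: sentinel did not arrive"
```

The paths in the usage comment are placeholders; the real staging and input locations depend on the seam being probed.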

### db_row_trace

For SQLite-based seams:

1. Check producer service is active
2. Query producer DB for most recent row: `sqlite3 <db> "SELECT * FROM <table> ORDER BY rowid DESC LIMIT 1"`
3. Check row recency (timestamp within expected window)
4. If consumer has a separate DB, query for matching correlation ID
5. Assert schema of the row matches expected fields

### env_audit

For shared environment variables:

1. Source `~/.env` and check each critical variable is set and non-empty
2. For each variable, grep `~/.config/systemd/user/*.service` for consumers
3. Verify each consuming service is currently active
4. Report any mismatch: variable declared but no consumers, or consumer expects variable not in ~/.env
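
Step 1 of this probe could look like the sketch below. `audit_env_file` is a hypothetical helper and the variable names in the usage comment are placeholders. It sources the file in a subshell so nothing leaks into the caller, and it prints names only, never values.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report whether each named variable is set and
# non-empty after sourcing an env file.
audit_env_file() {
  local env_file="$1" name
  shift
  if [[ ! -r "$env_file" ]]; then
    printf 'FAIL cannot read %s\n' "$env_file"
    return 1
  fi
  (
    # Subshell keeps sourced variables out of the caller's environment.
    # shellcheck disable=SC1090
    source "$env_file"
    for name in "$@"; do
      if [[ -n "${!name:-}" ]]; then
        printf 'OK %s\n' "$name"
      else
        printf 'MISSING %s\n' "$name"
      fi
    done
  )
}

# Example: audit_env_file "$HOME/.env" SOME_TOKEN SOME_CHAT_ID
```

Steps 2 to 4 (finding consumers in the unit files and checking that they are active) are not covered here.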

## Seam Registry

| Seam | Producer | Consumer | Interface | Probe | Freshness TTL |
|------|----------|----------|-----------|-------|---------------|
| HA logbook | ha-log-sync (15min timer) | aria engine | `~/ha-logs/logbook/` | freshness_and_schema | 45 min |
| Intelligence | aria engine (daily timer) | aria hub | `~/ha-logs/intelligence/` | freshness_and_schema | 30 hours |
| Hub cache | aria hub | — | `~/ha-logs/intelligence/cache/hub.db` | db_row_trace | 30 hours |
| Notion replica | notion-tools (6h timer) | telegram-brief | `~/Documents/notion/` | freshness_and_schema | 12 hours |
| Capture DB | telegram-capture | capture-sync | `~/.local/share/telegram-capture/capture.db` | db_row_trace | 12 hours |
| Ollama queue | queue daemon | 10 timers | `~/.local/share/ollama-queue/queue.db` | db_row_trace | 2 hours |
| Shared env | `~/.env` | all services | Environment variables | env_audit | n/a |

## Execution Order

1. Run env_audit first (fastest, catches cross-cutting issues)
2. Run freshness_and_schema probes (read-only file checks)
3. Run db_row_trace probes (sqlite3 queries)
4. Aggregate results into summary report

## Output Format

```
INTEGRATION TEST REPORT — <timestamp>

SUMMARY:
| Seam | Status | Latency |
|------|--------|---------|
| HA logbook | PASS | 1.2s |
| Intelligence | FAIL | 0.8s |
| Notion replica | PASS | 0.5s |
| Shared env | PASS | 0.3s |

FAILURES:
## Intelligence (aria engine → aria hub)
- Check: artifact freshness
- Expected: mtime within 30 hours
- Actual: last modified 47 hours ago
- Evidence: `stat -c '%Y' ~/ha-logs/intelligence/current.json` → 1708900000
- Action: Check aria engine timer — may have failed silently

SKIPPED:
## Ollama queue
- Reason: ollama-queue.service is inactive
- Action: Start service before re-running probe

PASSED: 5/7 seams healthy
```

## Results Directory

Write all reports to `/tmp/integration-tester-results/`:
- `report-<timestamp>.md` — human-readable report
- `results-<timestamp>.json` — machine-readable results

## Hallucination Guard

Every PASS and FAIL must include quoted command output as evidence. Never infer seam health from service code or documentation. If a command produces no output or an error, report that as the evidence. Do not fabricate file contents, timestamps, or command results.