npm - autonomous-coding-toolkit - Versions diffs - 1.0.0 - Mend

autonomous-coding-toolkit 1.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (324)

package/.claude-plugin/marketplace.json +22 -0
package/.claude-plugin/plugin.json +13 -0
package/LICENSE +21 -0
package/Makefile +21 -0
package/README.md +140 -0
package/SECURITY.md +28 -0
package/agents/bash-expert.md +113 -0
package/agents/dependency-auditor.md +138 -0
package/agents/integration-tester.md +120 -0
package/agents/lesson-scanner.md +149 -0
package/agents/python-expert.md +179 -0
package/agents/service-monitor.md +141 -0
package/agents/shell-expert.md +147 -0
package/benchmarks/runner.sh +147 -0
package/benchmarks/tasks/01-rest-endpoint/rubric.sh +29 -0
package/benchmarks/tasks/01-rest-endpoint/task.md +17 -0
package/benchmarks/tasks/02-refactor-module/task.md +8 -0
package/benchmarks/tasks/03-fix-integration-bug/task.md +8 -0
package/benchmarks/tasks/04-add-test-coverage/task.md +8 -0
package/benchmarks/tasks/05-multi-file-feature/task.md +8 -0
package/bin/act.js +238 -0
package/commands/autocode.md +6 -0
package/commands/cancel-ralph.md +18 -0
package/commands/code-factory.md +53 -0
package/commands/create-prd.md +55 -0
package/commands/ralph-loop.md +18 -0
package/commands/run-plan.md +117 -0
package/commands/submit-lesson.md +122 -0
package/docs/ARCHITECTURE.md +630 -0
package/docs/CONTRIBUTING.md +125 -0
package/docs/lessons/0001-bare-exception-swallowing.md +34 -0
package/docs/lessons/0002-async-def-without-await.md +28 -0
package/docs/lessons/0003-create-task-without-callback.md +28 -0
package/docs/lessons/0004-hardcoded-test-counts.md +28 -0
package/docs/lessons/0005-sqlite-without-closing.md +33 -0
package/docs/lessons/0006-venv-pip-path.md +27 -0
package/docs/lessons/0007-runner-state-self-rejection.md +35 -0
package/docs/lessons/0008-quality-gate-blind-spot.md +33 -0
package/docs/lessons/0009-parser-overcount-empty-batches.md +36 -0
package/docs/lessons/0010-local-outside-function-bash.md +33 -0
package/docs/lessons/0011-batch-tests-for-unimplemented-code.md +36 -0
package/docs/lessons/0012-api-markdown-unescaped-chars.md +33 -0
package/docs/lessons/0013-export-prefix-env-parsing.md +33 -0
package/docs/lessons/0014-decorator-registry-import-side-effect.md +43 -0
package/docs/lessons/0015-frontend-backend-schema-drift.md +43 -0
package/docs/lessons/0016-event-driven-cold-start-seeding.md +44 -0
package/docs/lessons/0017-copy-paste-logic-diverges.md +43 -0
package/docs/lessons/0018-layer-passes-pipeline-broken.md +45 -0
package/docs/lessons/0019-systemd-envfile-ignores-export.md +41 -0
package/docs/lessons/0020-persist-state-incrementally.md +44 -0
package/docs/lessons/0021-dual-axis-testing.md +48 -0
package/docs/lessons/0022-jsx-factory-shadowing.md +43 -0
package/docs/lessons/0023-static-analysis-spiral.md +51 -0
package/docs/lessons/0024-shared-pipeline-implementation.md +55 -0
package/docs/lessons/0025-defense-in-depth-all-entry-points.md +65 -0
package/docs/lessons/0026-linter-no-rules-false-enforcement.md +54 -0
package/docs/lessons/0027-jsx-silent-prop-drop.md +64 -0
package/docs/lessons/0028-no-infrastructure-in-client-code.md +49 -0
package/docs/lessons/0029-never-write-secrets-to-files.md +61 -0
package/docs/lessons/0030-cache-merge-not-replace.md +62 -0
package/docs/lessons/0031-verify-units-at-boundaries.md +66 -0
package/docs/lessons/0032-module-lifecycle-subscribe-unsubscribe.md +89 -0
package/docs/lessons/0033-async-iteration-mutable-snapshot.md +72 -0
package/docs/lessons/0034-caller-missing-await-silent-discard.md +65 -0
package/docs/lessons/0035-duplicate-registration-silent-overwrite.md +85 -0
package/docs/lessons/0036-websocket-dirty-disconnect.md +33 -0
package/docs/lessons/0037-parallel-agents-worktree-corruption.md +31 -0
package/docs/lessons/0038-subscribe-no-stored-ref.md +36 -0
package/docs/lessons/0039-fallback-or-default-hides-bugs.md +34 -0
package/docs/lessons/0040-event-firehose-filter-first.md +36 -0
package/docs/lessons/0041-ambiguous-base-dir-path-nesting.md +32 -0
package/docs/lessons/0042-spec-compliance-insufficient.md +36 -0
package/docs/lessons/0043-exact-count-extensible-collections.md +32 -0
package/docs/lessons/0044-relative-file-deps-worktree.md +39 -0
package/docs/lessons/0045-iterative-design-improvement.md +33 -0
package/docs/lessons/0046-plan-assertion-math-bugs.md +38 -0
package/docs/lessons/0047-pytest-single-threaded-default.md +37 -0
package/docs/lessons/0048-integration-wiring-batch.md +40 -0
package/docs/lessons/0049-ab-verification.md +41 -0
package/docs/lessons/0050-editing-sourced-files-during-execution.md +33 -0
package/docs/lessons/0051-infrastructure-fixes-cant-self-heal.md +30 -0
package/docs/lessons/0052-uncommitted-changes-poison-quality-gates.md +31 -0
package/docs/lessons/0053-jq-compact-flag-inconsistency.md +31 -0
package/docs/lessons/0054-parser-matches-inside-code-blocks.md +30 -0
package/docs/lessons/0055-agents-compensate-for-garbled-prompts.md +31 -0
package/docs/lessons/0056-grep-count-exit-code-on-zero.md +42 -0
package/docs/lessons/0057-new-artifacts-break-git-clean-gates.md +42 -0
package/docs/lessons/0058-dead-config-keys-never-consumed.md +49 -0
package/docs/lessons/0059-contract-test-shared-structures.md +53 -0
package/docs/lessons/0060-set-e-silent-death-in-runners.md +53 -0
package/docs/lessons/0061-context-injection-dirty-state.md +50 -0
package/docs/lessons/0062-sibling-bug-neighborhood-scan.md +29 -0
package/docs/lessons/0063-one-flag-two-lifetimes.md +31 -0
package/docs/lessons/0064-test-passes-wrong-reason.md +31 -0
package/docs/lessons/0065-pipefail-grep-count-double-output.md +39 -0
package/docs/lessons/0066-local-keyword-outside-function.md +37 -0
package/docs/lessons/0067-stdin-hang-non-interactive-shell.md +36 -0
package/docs/lessons/0068-agent-builds-wrong-thing-correctly.md +31 -0
package/docs/lessons/0069-plan-quality-dominates-execution.md +30 -0
package/docs/lessons/0070-spec-echo-back-prevents-drift.md +31 -0
package/docs/lessons/0071-positive-instructions-outperform-negative.md +30 -0
package/docs/lessons/0072-lost-in-the-middle-context-placement.md +30 -0
package/docs/lessons/0073-unscoped-lessons-cause-false-positives.md +30 -0
package/docs/lessons/0074-stale-context-injection-wrong-batch.md +32 -0
package/docs/lessons/0075-research-artifacts-must-persist.md +32 -0
package/docs/lessons/0076-wrong-decomposition-contaminates-downstream.md +30 -0
package/docs/lessons/0077-cherry-pick-merges-need-manual-resolution.md +30 -0
package/docs/lessons/0078-static-review-without-live-test.md +30 -0
package/docs/lessons/0079-integration-wiring-batch-required.md +32 -0
package/docs/lessons/FRAMEWORK.md +161 -0
package/docs/lessons/SUMMARY.md +201 -0
package/docs/lessons/TEMPLATE.md +85 -0
package/docs/plans/2026-02-21-code-factory-v2-design.md +204 -0
package/docs/plans/2026-02-21-code-factory-v2-implementation-plan.md +2189 -0
package/docs/plans/2026-02-21-code-factory-v2-phase4-design.md +537 -0
package/docs/plans/2026-02-21-code-factory-v2-phase4-implementation-plan.md +2012 -0
package/docs/plans/2026-02-21-hardening-pass-design.md +108 -0
package/docs/plans/2026-02-21-hardening-pass-plan.md +1378 -0
package/docs/plans/2026-02-21-mab-research-report.md +406 -0
package/docs/plans/2026-02-21-marketplace-restructure-design.md +240 -0
package/docs/plans/2026-02-21-marketplace-restructure-plan.md +832 -0
package/docs/plans/2026-02-21-phase4-completion-plan.md +697 -0
package/docs/plans/2026-02-21-validator-suite-design.md +148 -0
package/docs/plans/2026-02-21-validator-suite-plan.md +540 -0
package/docs/plans/2026-02-22-mab-research-round2.md +556 -0
package/docs/plans/2026-02-22-mab-run-design.md +462 -0
package/docs/plans/2026-02-22-mab-run-plan.md +2046 -0
package/docs/plans/2026-02-22-operations-design-methodology-research.md +681 -0
package/docs/plans/2026-02-22-research-agent-failure-taxonomy.md +532 -0
package/docs/plans/2026-02-22-research-code-guideline-policies.md +886 -0
package/docs/plans/2026-02-22-research-codebase-audit-refactoring.md +908 -0
package/docs/plans/2026-02-22-research-coding-standards-documentation.md +541 -0
package/docs/plans/2026-02-22-research-competitive-landscape.md +687 -0
package/docs/plans/2026-02-22-research-comprehensive-testing.md +1076 -0
package/docs/plans/2026-02-22-research-context-utilization.md +459 -0
package/docs/plans/2026-02-22-research-cost-quality-tradeoff.md +548 -0
package/docs/plans/2026-02-22-research-lesson-transferability.md +508 -0
package/docs/plans/2026-02-22-research-multi-agent-coordination.md +312 -0
package/docs/plans/2026-02-22-research-phase-integration.md +602 -0
package/docs/plans/2026-02-22-research-plan-quality.md +428 -0
package/docs/plans/2026-02-22-research-prompt-engineering.md +558 -0
package/docs/plans/2026-02-22-research-unconventional-perspectives.md +528 -0
package/docs/plans/2026-02-22-research-user-adoption.md +638 -0
package/docs/plans/2026-02-22-research-verification-effectiveness.md +433 -0
package/docs/plans/2026-02-23-agent-suite-design.md +299 -0
package/docs/plans/2026-02-23-agent-suite-plan.md +578 -0
package/docs/plans/2026-02-23-phase3-cost-infrastructure-design.md +148 -0
package/docs/plans/2026-02-23-phase3-cost-infrastructure-plan.md +1062 -0
package/docs/plans/2026-02-23-research-bash-expert-agent.md +543 -0
package/docs/plans/2026-02-23-research-dependency-auditor-agent.md +564 -0
package/docs/plans/2026-02-23-research-improving-existing-agents.md +503 -0
package/docs/plans/2026-02-23-research-integration-tester-agent.md +454 -0
package/docs/plans/2026-02-23-research-python-expert-agent.md +429 -0
package/docs/plans/2026-02-23-research-service-monitor-agent.md +425 -0
package/docs/plans/2026-02-23-research-shell-expert-agent.md +533 -0
package/docs/plans/2026-02-23-roadmap-to-completion.md +530 -0
package/docs/plans/2026-02-24-headless-module-split-design.md +98 -0
package/docs/plans/2026-02-24-headless-module-split.md +443 -0
package/docs/plans/2026-02-24-lesson-scope-metadata-design.md +228 -0
package/docs/plans/2026-02-24-lesson-scope-metadata-plan.md +968 -0
package/docs/plans/2026-02-24-npm-packaging-design.md +841 -0
package/docs/plans/2026-02-24-npm-packaging-plan.md +1965 -0
package/docs/plans/audit-findings.md +186 -0
package/docs/telegram-notification-format.md +98 -0
package/examples/example-plan.md +51 -0
package/examples/example-prd.json +72 -0
package/examples/example-roadmap.md +33 -0
package/examples/quickstart-plan.md +63 -0
package/hooks/hooks.json +26 -0
package/hooks/setup-symlinks.sh +48 -0
package/hooks/stop-hook.sh +135 -0
package/package.json +47 -0
package/policies/bash.md +71 -0
package/policies/python.md +71 -0
package/policies/testing.md +61 -0
package/policies/universal.md +60 -0
package/scripts/analyze-report.sh +97 -0
package/scripts/architecture-map.sh +145 -0
package/scripts/auto-compound.sh +273 -0
package/scripts/batch-audit.sh +42 -0
package/scripts/batch-test.sh +101 -0
package/scripts/entropy-audit.sh +221 -0
package/scripts/failure-digest.sh +51 -0
package/scripts/generate-ast-rules.sh +96 -0
package/scripts/init.sh +112 -0
package/scripts/lesson-check.sh +428 -0
package/scripts/lib/common.sh +61 -0
package/scripts/lib/cost-tracking.sh +153 -0
package/scripts/lib/ollama.sh +60 -0
package/scripts/lib/progress-writer.sh +128 -0
package/scripts/lib/run-plan-context.sh +215 -0
package/scripts/lib/run-plan-echo-back.sh +231 -0
package/scripts/lib/run-plan-headless.sh +396 -0
package/scripts/lib/run-plan-notify.sh +57 -0
package/scripts/lib/run-plan-parser.sh +81 -0
package/scripts/lib/run-plan-prompt.sh +215 -0
package/scripts/lib/run-plan-quality-gate.sh +132 -0
package/scripts/lib/run-plan-routing.sh +315 -0
package/scripts/lib/run-plan-sampling.sh +170 -0
package/scripts/lib/run-plan-scoring.sh +146 -0
package/scripts/lib/run-plan-state.sh +142 -0
package/scripts/lib/run-plan-team.sh +199 -0
package/scripts/lib/telegram.sh +54 -0
package/scripts/lib/thompson-sampling.sh +176 -0
package/scripts/license-check.sh +74 -0
package/scripts/mab-run.sh +575 -0
package/scripts/module-size-check.sh +146 -0
package/scripts/patterns/async-no-await.yml +5 -0
package/scripts/patterns/bare-except.yml +6 -0
package/scripts/patterns/empty-catch.yml +6 -0
package/scripts/patterns/hardcoded-localhost.yml +9 -0
package/scripts/patterns/retry-loop-no-backoff.yml +12 -0
package/scripts/pipeline-status.sh +197 -0
package/scripts/policy-check.sh +226 -0
package/scripts/prior-art-search.sh +133 -0
package/scripts/promote-mab-lessons.sh +126 -0
package/scripts/prompts/agent-a-superpowers.md +29 -0
package/scripts/prompts/agent-b-ralph.md +29 -0
package/scripts/prompts/judge-agent.md +61 -0
package/scripts/prompts/planner-agent.md +44 -0
package/scripts/pull-community-lessons.sh +90 -0
package/scripts/quality-gate.sh +266 -0
package/scripts/research-gate.sh +90 -0
package/scripts/run-plan.sh +329 -0
package/scripts/scope-infer.sh +159 -0
package/scripts/setup-ralph-loop.sh +155 -0
package/scripts/telemetry.sh +230 -0
package/scripts/tests/run-all-tests.sh +52 -0
package/scripts/tests/test-act-cli.sh +46 -0
package/scripts/tests/test-agents-md.sh +87 -0
package/scripts/tests/test-analyze-report.sh +114 -0
package/scripts/tests/test-architecture-map.sh +89 -0
package/scripts/tests/test-auto-compound.sh +169 -0
package/scripts/tests/test-batch-test.sh +65 -0
package/scripts/tests/test-benchmark-runner.sh +25 -0
package/scripts/tests/test-common.sh +168 -0
package/scripts/tests/test-cost-tracking.sh +158 -0
package/scripts/tests/test-echo-back.sh +180 -0
package/scripts/tests/test-entropy-audit.sh +146 -0
package/scripts/tests/test-failure-digest.sh +66 -0
package/scripts/tests/test-generate-ast-rules.sh +145 -0
package/scripts/tests/test-helpers.sh +82 -0
package/scripts/tests/test-init.sh +47 -0
package/scripts/tests/test-lesson-check.sh +278 -0
package/scripts/tests/test-lesson-local.sh +55 -0
package/scripts/tests/test-license-check.sh +109 -0
package/scripts/tests/test-mab-run.sh +182 -0
package/scripts/tests/test-ollama-lib.sh +49 -0
package/scripts/tests/test-ollama.sh +60 -0
package/scripts/tests/test-pipeline-status.sh +198 -0
package/scripts/tests/test-policy-check.sh +124 -0
package/scripts/tests/test-prior-art-search.sh +96 -0
package/scripts/tests/test-progress-writer.sh +140 -0
package/scripts/tests/test-promote-mab-lessons.sh +110 -0
package/scripts/tests/test-pull-community-lessons.sh +149 -0
package/scripts/tests/test-quality-gate.sh +241 -0
package/scripts/tests/test-research-gate.sh +132 -0
package/scripts/tests/test-run-plan-cli.sh +86 -0
package/scripts/tests/test-run-plan-context.sh +305 -0
package/scripts/tests/test-run-plan-e2e.sh +153 -0
package/scripts/tests/test-run-plan-headless.sh +424 -0
package/scripts/tests/test-run-plan-notify.sh +124 -0
package/scripts/tests/test-run-plan-parser.sh +217 -0
package/scripts/tests/test-run-plan-prompt.sh +254 -0
package/scripts/tests/test-run-plan-quality-gate.sh +222 -0
package/scripts/tests/test-run-plan-routing.sh +178 -0
package/scripts/tests/test-run-plan-scoring.sh +148 -0
package/scripts/tests/test-run-plan-state.sh +261 -0
package/scripts/tests/test-run-plan-team.sh +157 -0
package/scripts/tests/test-scope-infer.sh +150 -0
package/scripts/tests/test-setup-ralph-loop.sh +63 -0
package/scripts/tests/test-telegram-env.sh +38 -0
package/scripts/tests/test-telegram.sh +121 -0
package/scripts/tests/test-telemetry.sh +46 -0
package/scripts/tests/test-thompson-sampling.sh +139 -0
package/scripts/tests/test-validate-all.sh +60 -0
package/scripts/tests/test-validate-commands.sh +89 -0
package/scripts/tests/test-validate-hooks.sh +98 -0
package/scripts/tests/test-validate-lessons.sh +150 -0
package/scripts/tests/test-validate-plan-quality.sh +235 -0
package/scripts/tests/test-validate-plans.sh +187 -0
package/scripts/tests/test-validate-plugin.sh +106 -0
package/scripts/tests/test-validate-prd.sh +184 -0
package/scripts/tests/test-validate-skills.sh +134 -0
package/scripts/validate-all.sh +57 -0
package/scripts/validate-commands.sh +67 -0
package/scripts/validate-hooks.sh +89 -0
package/scripts/validate-lessons.sh +98 -0
package/scripts/validate-plan-quality.sh +369 -0
package/scripts/validate-plans.sh +120 -0
package/scripts/validate-plugin.sh +86 -0
package/scripts/validate-policies.sh +42 -0
package/scripts/validate-prd.sh +118 -0
package/scripts/validate-skills.sh +96 -0
package/skills/autocode/SKILL.md +285 -0
package/skills/autocode/ab-verification.md +51 -0
package/skills/autocode/code-quality-standards.md +37 -0
package/skills/autocode/competitive-mode.md +364 -0
package/skills/brainstorming/SKILL.md +97 -0
package/skills/capture-lesson/SKILL.md +187 -0
package/skills/check-lessons/SKILL.md +116 -0
package/skills/dispatching-parallel-agents/SKILL.md +110 -0
package/skills/executing-plans/SKILL.md +85 -0
package/skills/finishing-a-development-branch/SKILL.md +201 -0
package/skills/receiving-code-review/SKILL.md +72 -0
package/skills/requesting-code-review/SKILL.md +59 -0
package/skills/requesting-code-review/code-reviewer.md +82 -0
package/skills/research/SKILL.md +145 -0
package/skills/roadmap/SKILL.md +115 -0
package/skills/subagent-driven-development/SKILL.md +98 -0
package/skills/subagent-driven-development/code-quality-reviewer-prompt.md +18 -0
package/skills/subagent-driven-development/implementer-prompt.md +73 -0
package/skills/subagent-driven-development/spec-reviewer-prompt.md +57 -0
package/skills/systematic-debugging/SKILL.md +134 -0
package/skills/systematic-debugging/condition-based-waiting.md +64 -0
package/skills/systematic-debugging/defense-in-depth.md +32 -0
package/skills/systematic-debugging/root-cause-tracing.md +55 -0
package/skills/test-driven-development/SKILL.md +167 -0
package/skills/using-git-worktrees/SKILL.md +219 -0
package/skills/using-superpowers/SKILL.md +54 -0
package/skills/verification-before-completion/SKILL.md +140 -0
package/skills/verify/SKILL.md +82 -0
package/skills/writing-plans/SKILL.md +128 -0
package/skills/writing-skills/SKILL.md +93 -0

package/benchmarks/runner.sh ADDED

@@ -0,0 +1,147 @@
+#!/usr/bin/env bash
+# runner.sh — Benchmark orchestrator for the Autonomous Coding Toolkit
+#
+# Usage:
+#   runner.sh run [task-name]      Run all or one benchmark
+#   runner.sh compare <a> <b>      Compare two result files
+#   runner.sh list                 List available benchmarks
+set -euo pipefail
+SCRIPT_DIR="$(cd "$(dirname "$(readlink -f "${BASH_SOURCE[0]}")")" && pwd)"
+TASKS_DIR="$SCRIPT_DIR/tasks"
+RESULTS_DIR="${BENCHMARK_RESULTS_DIR:-$SCRIPT_DIR/results}"
+usage() {
+    cat <<'USAGE'
+Usage: runner.sh <run|compare|list> [options]
+Commands:
+  run [name]        Run all benchmarks, or a specific one by directory name
+  compare <a> <b>   Compare two result JSON files
+  list              List available benchmark tasks
+Options:
+  --help, -h        Show this help
+Results are saved to benchmarks/results/ (gitignored).
+USAGE
+    exit 0
+}
+SUBCOMMAND="${1:-}"
+shift || true
+case "$SUBCOMMAND" in
+    list)
+        echo "Available benchmarks:"
+        for task_dir in "$TASKS_DIR"/*/; do
+            [[ -d "$task_dir" ]] || continue
+            name=$(basename "$task_dir")
+            desc=""
+            if [[ -f "$task_dir/task.md" ]]; then
+                desc=$(head -1 "$task_dir/task.md" | sed 's/^# //')
+            fi
+            echo "  $name — $desc"
+        done
+        ;;
+    run)
+        TARGET="${1:-all}"
+        mkdir -p "$RESULTS_DIR"
+        timestamp=$(date -u +%Y%m%dT%H%M%SZ)
+        run_benchmark() {
+            local task_dir="$1"
+            # Declare and assign separately so a failing basename is not masked by `local`
+            local name
+            name=$(basename "$task_dir")
+            echo "=== Benchmark: $name ==="
+            if [[ ! -f "$task_dir/rubric.sh" ]]; then
+                echo "  SKIP: no rubric.sh found"
+                return
+            fi
+            local score=0
+            local total=0
+            local pass=0
+            # Run rubric — each line of output is "PASS: desc" or "FAIL: desc"
+            while IFS= read -r line; do
+                total=$((total + 1))
+                if [[ "$line" == PASS:* ]]; then
+                    pass=$((pass + 1))
+                fi
+                echo "  $line"
+            done < <(bash "$task_dir/rubric.sh" 2>&1 || true)
+            if [[ $total -gt 0 ]]; then
+                score=$((pass * 100 / total))
+            fi
+            echo "  Score: ${score}% ($pass/$total)"
+            echo ""
+            # Write result
+            jq -n --arg name "$name" --argjson score "$score" \
+                --argjson pass "$pass" --argjson total "$total" \
+                --arg ts "$timestamp" \
+                '{name: $name, score: $score, passed: $pass, total: $total, timestamp: $ts}' \
+                >> "$RESULTS_DIR/$timestamp.jsonl"
+        }
+        if [[ "$TARGET" == "all" ]]; then
+            for task_dir in "$TASKS_DIR"/*/; do
+                [[ -d "$task_dir" ]] || continue
+                run_benchmark "$task_dir"
+            done
+        else
+            if [[ -d "$TASKS_DIR/$TARGET" ]]; then
+                run_benchmark "$TASKS_DIR/$TARGET"
+            else
+                echo "Benchmark not found: $TARGET" >&2
+                echo "Run 'runner.sh list' to see available benchmarks." >&2
+                exit 1
+            fi
+        fi
+        echo "Results saved to: $RESULTS_DIR/$timestamp.jsonl"
+        ;;
+    compare)
+        FILE_A="${1:-}"
+        FILE_B="${2:-}"
+        if [[ -z "$FILE_A" || -z "$FILE_B" ]]; then
+            echo "Usage: runner.sh compare <result-a.jsonl> <result-b.jsonl>" >&2
+            exit 1
+        fi
+        if [[ ! -f "$FILE_A" || ! -f "$FILE_B" ]]; then
+            echo "One or both files not found." >&2
+            exit 1
+        fi
+        echo "Benchmark Comparison"
+        echo "═════════════════════════════════════"
+        printf "%-25s %8s %8s %8s\n" "Task" "Before" "After" "Delta"
+        echo "─────────────────────────────────────────────"
+        # -r emits raw strings; without it each row is JSON-quoted and the numeric test on $delta fails
+        jq -rs '
+            [.[0], .[1]] | transpose | .[] |
+            select(.[0] != null and .[1] != null) |
+            "\(.[0].name)|\(.[0].score)|\(.[1].score)|\(.[1].score - .[0].score)"
+        ' <(jq -s '.' "$FILE_A") <(jq -s '.' "$FILE_B") 2>/dev/null | \
+        while IFS='|' read -r name before after delta; do
+            sign=""
+            [[ "$delta" -gt 0 ]] && sign="+"
+            printf "%-25s %7s%% %7s%% %7s%%\n" "$name" "$before" "$after" "${sign}${delta}"
+        done
+        echo "═════════════════════════════════════"
+        ;;
+    help|--help|-h|"")
+        usage
+        ;;
+    *)
+        echo "Unknown command: $SUBCOMMAND" >&2
+        # usage() exits 0 on its own, so run it in a subshell and fail explicitly
+        (usage) >&2
+        exit 1
+        ;;
+esac
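
The scoring rule used by `run_benchmark` above (the share of rubric lines that begin with `PASS:`, computed with integer division) can be sketched in isolation. This is an illustrative sketch, not part of the package: the function name `score_rubric_output` is hypothetical, and unlike the loop in `runner.sh` it counts only lines that start with `PASS:` or `FAIL:`.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in the package): read rubric output on stdin and
# print "<score> <passed> <total>", where score = passed * 100 / total
# using integer division, and 0 when no verdict lines were seen.
score_rubric_output() {
    local pass=0 total=0 score=0 line
    while IFS= read -r line; do
        case "$line" in
            PASS:*) pass=$((pass + 1)); total=$((total + 1)) ;;
            FAIL:*) total=$((total + 1)) ;;
        esac
    done
    if [ "$total" -gt 0 ]; then
        score=$((pass * 100 / total))
    fi
    echo "$score $pass $total"
}
```

For example, `printf 'PASS: a\nFAIL: b\nPASS: c\n' | score_rubric_output` prints `66 2 3`, since 2 * 100 / 3 truncates to 66.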

package/benchmarks/tasks/01-rest-endpoint/rubric.sh ADDED

@@ -0,0 +1,29 @@
+#!/usr/bin/env bash
+# Rubric for 01-rest-endpoint benchmark
+set -euo pipefail
+PROJECT_ROOT="${BENCHMARK_PROJECT_ROOT:-.}"
+# Criterion 1: Health endpoint file exists
+if compgen -G "$PROJECT_ROOT/src/*health*" >/dev/null 2>&1 || \
+   compgen -G "$PROJECT_ROOT/app/*health*" >/dev/null 2>&1 || \
+   grep -rq "health" "$PROJECT_ROOT/src/" "$PROJECT_ROOT/app/" 2>/dev/null; then
+    echo "PASS: Health endpoint file exists"
+else
+    echo "FAIL: Health endpoint file not found"
+fi
+# Criterion 2: Test file exists
+if compgen -G "$PROJECT_ROOT/tests/*health*" >/dev/null 2>&1 || \
+   compgen -G "$PROJECT_ROOT/test/*health*" >/dev/null 2>&1; then
+    echo "PASS: Health endpoint test file exists"
+else
+    echo "FAIL: Health endpoint test file not found"
+fi
+# Criterion 3: Test passes
+# Discard test output: runner.sh scores every line this rubric prints
+if cd "$PROJECT_ROOT" && (npm test >/dev/null 2>&1 || pytest >/dev/null 2>&1 || make test >/dev/null 2>&1); then
+    echo "PASS: Tests pass"
+else
+    echo "FAIL: Tests do not pass"
+fi
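
The file-existence criteria above rely on bash's `compgen -G` to ask whether a glob matches anything. A portable sketch of the same check, using a plain glob loop instead of `compgen`, might look like the following. The helper name `has_match` is hypothetical, and the sketch assumes the pattern contains no whitespace, because the expansion is deliberately left unquoted.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in the package): succeed if at least one existing
# path matches the glob pattern passed as "$1".
has_match() {
    local f
    # $1 is intentionally unquoted so the shell expands the glob.
    for f in $1; do
        # An unmatched glob stays literal, so confirm the path really exists.
        [ -e "$f" ] && return 0
    done
    return 1
}
```

A call such as `has_match "$PROJECT_ROOT/tests/*health*"` could then stand in for the `compgen -G ... >/dev/null 2>&1` form in shells that lack `compgen`.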

package/benchmarks/tasks/01-rest-endpoint/task.md ADDED

@@ -0,0 +1,17 @@
+# Add a REST Endpoint with Tests
+**Complexity:** Simple (1 batch)
+**Measures:** Basic execution, TDD compliance
+## Task
+Add a `/health` endpoint to the project that:
+1. Returns HTTP 200 with JSON body `{"status": "ok", "timestamp": "<ISO8601>"}`
+2. Has a test that verifies the response status and body structure
+3. All tests pass
+## Constraints
+- Use the project's existing web framework (or add minimal one if none exists)
+- Follow existing code style and patterns
+- Test must be automated (no manual verification)
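
The response shape required by item 1 of the task can be illustrated without any web framework. The sketch below is an assumption-laden example, not a reference solution: both function names are hypothetical, and the timestamp is taken to be a UTC ISO 8601 value with second precision and a trailing `Z`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a /health response body with the current UTC time.
health_body() {
    printf '{"status": "ok", "timestamp": "%s"}\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)"
}

# Hypothetical helper: succeed when the body on stdin has the expected
# structure (status "ok" plus a second-precision UTC ISO 8601 timestamp).
is_valid_health_body() {
    grep -Eq '^\{"status": "ok", "timestamp": "[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}Z"\}$'
}
```

An automated test in this style would pipe the endpoint's response body into `is_valid_health_body` and fail on a non-zero exit status. A real test should also check the HTTP status code, which this sketch does not cover.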

package/benchmarks/tasks/02-refactor-module/task.md ADDED

@@ -0,0 +1,8 @@
+# Refactor a Module into Two
+**Complexity:** Medium (2 batches)
+**Measures:** Refactoring quality, test preservation
+## Task
+Split `src/utils.sh` into `src/string-utils.sh` and `src/file-utils.sh`, preserving all existing tests.

package/benchmarks/tasks/03-fix-integration-bug/task.md ADDED

@@ -0,0 +1,8 @@
+# Fix an Integration Bug
+**Complexity:** Medium (2 batches)
+**Measures:** Debugging, root cause analysis
+## Task
+The `/api/users` endpoint returns 500 when the database connection pool is exhausted. Find and fix the root cause.

package/benchmarks/tasks/04-add-test-coverage/task.md ADDED

@@ -0,0 +1,8 @@
+# Add Test Coverage to Untested Module
+**Complexity:** Medium (2 batches)
+**Measures:** Test quality, edge case discovery
+## Task
+Add comprehensive tests to `src/parser.sh` which currently has 0% coverage. Cover happy path, edge cases, and error conditions.

package/benchmarks/tasks/05-multi-file-feature/task.md ADDED

@@ -0,0 +1,8 @@
+# Multi-File Feature with API + DB + Tests
+**Complexity:** Complex (4 batches)
+**Measures:** Full pipeline, cross-file coordination
+## Task
+Add a "bookmarks" feature: API endpoints (CRUD), database migration, and integration tests.

package/bin/act.js ADDED

@@ -0,0 +1,238 @@
+#!/usr/bin/env node
+'use strict';
+const { execFileSync } = require('child_process');
+const fs = require('fs');
+const path = require('path');
+// ---------------------------------------------------------------------------
+// Toolkit root: works for npm global, npx, and local clone
+// ---------------------------------------------------------------------------
+const TOOLKIT_ROOT = path.resolve(__dirname, '..');
+let VERSION;
+try {
+  const pkg = JSON.parse(fs.readFileSync(path.join(TOOLKIT_ROOT, 'package.json'), 'utf8'));
+  VERSION = pkg.version;
+} catch (err) {
+  console.error('Error: Could not read package.json');
+  console.error(`  ${err.message}`);
+  process.exit(1);
+}
+// ---------------------------------------------------------------------------
+// Platform check — bash required
+// ---------------------------------------------------------------------------
+function checkPlatform() {
+  if (process.platform === 'win32') {
+    let inWsl = false;
+    try {
+      inWsl = fs.existsSync('/proc/version') &&
+        fs.readFileSync('/proc/version', 'utf8').toLowerCase().includes('microsoft');
+    } catch (_) {
+      // If /proc/version is unreadable, assume not WSL
+    }
+    if (!inWsl) {
+      console.error(
+        'Error: act requires bash, which is not available on native Windows.\n' +
+        'Hint: Install WSL2 (https://aka.ms/wsl) and run act from a WSL terminal.'
+      );
+      process.exit(1);
+    }
+  }
+}
+// ---------------------------------------------------------------------------
+// Dependency check
+// ---------------------------------------------------------------------------
+function checkDep(cmd) {
+  try {
+    execFileSync('which', [cmd], { stdio: 'pipe' });
+  } catch (_) {
+    console.error(`Error: Required dependency "${cmd}" not found on PATH.`);
+    console.error(`Install it and try again.`);
+    process.exit(1);
+  }
+}
+function checkDependencies() {
+  checkDep('bash');
+  checkDep('git');
+  checkDep('jq');
+}
+// ---------------------------------------------------------------------------
+// Script runner
+// ---------------------------------------------------------------------------
+function scripts(name) {
+  return path.join(TOOLKIT_ROOT, 'scripts', name);
+}
+function runScript(scriptPath, args) {
+  if (!fs.existsSync(scriptPath)) {
+    console.error(`Error: Script not found: ${scriptPath}`);
+    console.error('This script may not be included in the current installation.');
+    console.error('Try reinstalling: npm install -g autonomous-coding-toolkit');
+    process.exit(1);
+  }
+  try {
+    execFileSync('bash', [scriptPath, ...args], { stdio: 'inherit' });
+  } catch (err) {
+    process.exit(err.status != null ? err.status : 1);
+  }
+}
+// ---------------------------------------------------------------------------
+// Help text
+// ---------------------------------------------------------------------------
+function printHelp() {
+  console.log(`Autonomous Coding Toolkit v${VERSION}
+Usage: act <command> [options]
+Execution:
+  plan <file> [flags]          Headless/team/MAB batch execution
+  plan --resume                Resume interrupted execution
+  compound [dir]               Full pipeline: report→PRD→execute→PR
+  mab <flags>                  Multi-Armed Bandit competing agents
+Quality:
+  gate [flags]                 Composite quality gate (lesson-check + tests + memory)
+  check [files...]             Syntactic anti-pattern scan from lesson files
+  policy [files...]            Advisory positive-pattern checker
+  research-gate [flags]        Block PRD if unresolved research issues
+  validate                     Run all validators
+  validate-plan <file>         Validate plan quality score
+  validate-prd [file]          Validate PRD shell-command criteria
+Lessons:
+  lessons pull                 Pull community lessons from upstream
+  lessons check                List active lesson checks
+  lessons promote              Promote MAB-discovered lessons
+  lessons infer                Infer scope metadata for lesson files
+Analysis:
+  audit [flags]                Entropy audit: doc drift, naming violations
+  batch-audit [flags]          Cross-project audit runner
+  batch-test [flags]           Memory-aware cross-project test runner
+  analyze [report]             Analyze audit/test report
+  digest [flags]               Failure digest from run logs
+  status [flags]               Pipeline status summary
+  architecture [flags]         Generate architecture map
+Telemetry:
+  telemetry [flags]            Telemetry reporting
+Benchmarks:
+  benchmark [flags]            Run benchmark suite
+Setup:
+  init [flags]                 Initialize toolkit in current project
+  license-check [flags]        Check dependency licenses
+  module-size [flags]          Check module sizes against budget
+Meta:
+  version                      Print version
+  help                         Show this help text
+`);
+}
+// ---------------------------------------------------------------------------
+// Command map
+// ---------------------------------------------------------------------------
+const COMMANDS = {
+  // Execution
+  plan:            { script: scripts('run-plan.sh') },
+  compound:        { script: scripts('auto-compound.sh') },
+  mab:             { script: scripts('mab-run.sh') },
+  // Quality
+  gate:            { script: scripts('quality-gate.sh') },
+  check:           { script: scripts('lesson-check.sh') },
+  policy:          { script: scripts('policy-check.sh') },
+  'research-gate': { script: scripts('research-gate.sh') },
+  validate:        { script: scripts('validate-all.sh') },
+  'validate-plan': { script: scripts('validate-plan-quality.sh') },
+  'validate-prd':  { script: scripts('validate-prd.sh') },
+  // Analysis
+  audit:           { script: scripts('entropy-audit.sh') },
+  'batch-audit':   { script: scripts('batch-audit.sh') },
+  'batch-test':    { script: scripts('batch-test.sh') },
+  analyze:         { script: scripts('analyze-report.sh') },
+  digest:          { script: scripts('failure-digest.sh') },
+  status:          { script: scripts('pipeline-status.sh') },
+  architecture:    { script: scripts('architecture-map.sh') },
+  // Setup
+  init:            { script: scripts('init.sh') },
+  'license-check': { script: scripts('license-check.sh') },
+  'module-size':   { script: scripts('module-size-check.sh') },
+  // Telemetry
+  telemetry:       { script: scripts('telemetry.sh') },
+  // Benchmarks (note: under benchmarks/, not scripts/)
+  benchmark:       { script: path.join(TOOLKIT_ROOT, 'benchmarks', 'runner.sh') },
+};
+// Lessons sub-dispatch
+const LESSONS_COMMANDS = {
+  pull:    { script: scripts('pull-community-lessons.sh'), args: [] },
+  check:   { script: scripts('lesson-check.sh'),          args: ['--list'] },
+  promote: { script: scripts('promote-mab-lessons.sh'),   args: [] },
+  infer:   { script: scripts('scope-infer.sh'),           args: [] },
+};
+// ---------------------------------------------------------------------------
+// Main
+// ---------------------------------------------------------------------------
+function main() {
+  const args = process.argv.slice(2);
+  const cmd = args[0];
+  const rest = args.slice(1);
+  // Built-in meta commands (no bash needed)
+  if (!cmd || cmd === 'help' || cmd === '--help' || cmd === '-h') {
+    printHelp();
+    process.exit(0);
+  }
+  if (cmd === 'version' || cmd === '--version' || cmd === '-v') {
+    console.log(`act v${VERSION}`);
+    process.exit(0);
+  }
+  // Platform + dependency checks for all other commands
+  checkPlatform();
+  checkDependencies();
+  // Lessons sub-dispatch
+  if (cmd === 'lessons') {
+    const sub = rest[0];
+    const subArgs = rest.slice(1);
+    if (!sub) {
+      console.error('Error: "lessons" requires a subcommand: pull, check, promote, infer');
+      process.exit(1);
+    }
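+    // Own-property lookup so inherited keys such as "constructor" are rejected
+    const lessonCmd = Object.prototype.hasOwnProperty.call(LESSONS_COMMANDS, sub) ? LESSONS_COMMANDS[sub] : undefined;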
+    if (!lessonCmd) {
+      console.error(`Error: Unknown lessons subcommand: ${sub}`);
+      console.error('Available: pull, check, promote, infer');
+      process.exit(1);
+    }
+    runScript(lessonCmd.script, [...lessonCmd.args, ...subArgs]);
+    return;
+  }
+  // Standard command routing
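+  // Own-property lookup so inherited keys such as "constructor" are rejected
+  const entry = Object.prototype.hasOwnProperty.call(COMMANDS, cmd) ? COMMANDS[cmd] : undefined;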
+  if (!entry) {
+    console.error(`Error: Unknown command: ${cmd}`);
+    console.error(`Run "act help" to see available commands.`);
+    process.exit(1);
+  }
+  runScript(entry.script, rest);
+}
+main();
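
The routing in `main()` can be sketched as a pure function, which makes the two-level dispatch (top-level commands, then `lessons` subcommands with preset arguments) easier to follow. This is a minimal sketch, not the package's code: `resolveCommand`, the trimmed tables, and the `/toolkit/scripts` directory are assumptions for illustration. Only the script names are taken from the diff.

```javascript
const path = require('path');

// Assumed script directory; the real value comes from the package's own helper.
const SCRIPTS_DIR = '/toolkit/scripts';
const scripts = (name) => path.posix.join(SCRIPTS_DIR, name);

// Trimmed copies of the tables in the diff (script names from the source).
const COMMANDS = {
  gate:  { script: scripts('quality-gate.sh') },
  check: { script: scripts('lesson-check.sh') },
};
const LESSONS_COMMANDS = {
  pull:  { script: scripts('pull-community-lessons.sh'), args: [] },
  check: { script: scripts('lesson-check.sh'),           args: ['--list'] },
};

// Look up only own properties so keys like "constructor" do not match.
function lookup(table, key) {
  return Object.prototype.hasOwnProperty.call(table, key) ? table[key] : undefined;
}

// Hypothetical helper: returns { script, args } or { error }.
function resolveCommand(argv) {
  const [cmd, ...rest] = argv;
  if (cmd === 'lessons') {
    const [sub, ...subArgs] = rest;
    if (sub === undefined) return { error: '"lessons" requires a subcommand' };
    const entry = lookup(LESSONS_COMMANDS, sub);
    if (!entry) return { error: `Unknown lessons subcommand: ${sub}` };
    // Preset args come first, then whatever the user passed.
    return { script: entry.script, args: [...entry.args, ...subArgs] };
  }
  const entry = cmd === undefined ? undefined : lookup(COMMANDS, cmd);
  if (!entry) return { error: `Unknown command: ${cmd}` };
  return { script: entry.script, args: rest };
}
```

Under these assumptions, `resolveCommand(['lessons', 'check'])` resolves to `lesson-check.sh` with the preset argument `--list`, which matches the `LESSONS_COMMANDS` entry above.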

package/commands/autocode.md ADDED

@@ -0,0 +1,6 @@
+---
+description: "Run the full autonomous coding pipeline — brainstorm → PRD → plan → execute → verify → finish"
+argument-hint: "<feature description, report path, or issue #>"
+---
+Invoke the `autonomous-coding-toolkit:autocode` skill to run the full pipeline for: $ARGUMENTS

package/commands/cancel-ralph.md ADDED

@@ -0,0 +1,18 @@
+---
+description: "Cancel active Ralph Loop"
+allowed-tools: ["Bash(test -f .claude/ralph-loop.local.md:*)", "Bash(rm .claude/ralph-loop.local.md)", "Read(.claude/ralph-loop.local.md)"]
+hide-from-slash-command-tool: "true"
+---
+# Cancel Ralph
+To cancel the Ralph loop:
+1. Check if `.claude/ralph-loop.local.md` exists using Bash: `test -f .claude/ralph-loop.local.md && echo "EXISTS" || echo "NOT_FOUND"`
+2. **If NOT_FOUND**: Say "No active Ralph loop found."
+3. **If EXISTS**:
+   - Read `.claude/ralph-loop.local.md` to get the current iteration number from the `iteration:` field
+   - Remove the file using Bash: `rm .claude/ralph-loop.local.md`
+   - Report: "Cancelled Ralph loop (was at iteration N)" where N is the iteration value
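
The cancel steps above are carried out by the agent with its Bash and Read tools. For readers who want to see the logic in one place, here is a rough Node equivalent. It is a sketch only: `readIteration` and `cancelRalph` are hypothetical names, and the state file is assumed to carry a line of the form `iteration: <number>`; the source names the field but does not show the file.

```javascript
const fs = require('fs');

// Extract the numeric value of the `iteration:` field, or null if absent.
// Assumes the field sits on its own line (file layout is not shown in the source).
function readIteration(text) {
  const match = text.match(/^iteration:\s*(\d+)\s*$/m);
  return match ? Number(match[1]) : null;
}

// Mirrors the three steps: check existence, read the iteration, remove the file.
function cancelRalph(stateFile) {
  if (!fs.existsSync(stateFile)) {
    return 'No active Ralph loop found.';
  }
  const iteration = readIteration(fs.readFileSync(stateFile, 'utf8'));
  fs.unlinkSync(stateFile);
  const shown = iteration === null ? 'unknown' : iteration;
  return `Cancelled Ralph loop (was at iteration ${shown})`;
}
```

Reading the iteration before deleting the file matters here, since the report needs a value that only exists in the file being removed.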

package/commands/code-factory.md ADDED

@@ -0,0 +1,53 @@
+---
+description: "Run the full Code Factory pipeline — brainstorm → PRD → plan → execute → verify"
+argument-hint: "<feature description or report path>"
+---
+# Code Factory
+Run the full agent-driven development pipeline for: $ARGUMENTS
+## Pipeline
+This command orchestrates the superpowers skill chain with Code Factory enhancements integrated at each stage. Follow each step in order — do not skip stages.
+### Stage 1: Brainstorming
+Invoke `superpowers:brainstorming` to explore the idea, ask questions, propose approaches, and produce an approved design doc at `docs/plans/YYYY-MM-DD-<topic>-design.md`.
+### Stage 2: PRD Generation
+After the design is approved, generate `tasks/prd.json` using the `/create-prd` format:
+- 8-15 granular tasks with machine-verifiable acceptance criteria (shell commands)
+- Separate investigation tasks from implementation tasks
+- Order by dependency
+- Save both `tasks/prd.json` and `tasks/prd-<feature-slug>.md`
+### Stage 3: Writing Plans
+Invoke `superpowers:writing-plans` to create the implementation plan. Enhance the plan with:
+- A `## Quality Gates` section listing project checks (auto-detect: pytest, npm test, npm run lint, make test)
+- Cross-references to `tasks/prd.json` task IDs
+- `progress.txt` initialization as the first step
+### Stage 4: Execution
+Invoke `superpowers:executing-plans` to execute in batches. Between each batch:
+- Run quality gate commands and report results
+- Update `tasks/prd.json` — mark passing tasks
+- Append batch summary to `progress.txt`
+- Fix any failures before proceeding
+### Stage 5: Verification
+Invoke `superpowers:verification-before-completion`:
+- Run ALL `tasks/prd.json` acceptance criteria
+- Confirm every task has `"passes": true`
+- Show quality gate evidence
+- Only claim completion with full evidence
+### Stage 6: Finish
+Invoke `superpowers:finishing-a-development-branch` to handle commit, PR, or merge.
+## Rules
+- Never skip a stage. The design must be approved before PRD generation.
+- Every acceptance criterion is a shell command. No vague criteria.
+- Quality gates run between EVERY batch, not just at the end.
+- `progress.txt` is append-only during execution — never truncate it.
+- If the input is a report file path, run `scripts/analyze-report.sh` first to identify the top priority, then use that as the feature description for brainstorming.
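
Stage 3 asks for the quality gate commands to be auto-detected but does not say how. One way this might look is shown below. The detection rules (which marker files imply which command) are assumptions made for this sketch, and `detectQualityGates` is a hypothetical name; only the four candidate commands come from the text above.

```javascript
// Hypothetical detection rules; the source lists the commands, not the heuristics.
// fileNames: names present in the project root.
// packageScripts: the "scripts" object from package.json, if any.
function detectQualityGates(fileNames, packageScripts = {}) {
  const files = new Set(fileNames);
  const hasScript = (name) =>
    Object.prototype.hasOwnProperty.call(packageScripts, name);
  const gates = [];

  // Assumed Python markers.
  if (files.has('pytest.ini') || files.has('conftest.py') || files.has('pyproject.toml')) {
    gates.push('pytest');
  }
  if (hasScript('test')) gates.push('npm test');
  if (hasScript('lint')) gates.push('npm run lint');
  // Presence of a Makefile is only a hint; this does not verify a `test` target exists.
  if (files.has('Makefile')) gates.push('make test');

  return gates;
}
```

A project with a `Makefile` and a `package.json` that defines only a `test` script would, under these rules, get `npm test` and `make test` listed in its `## Quality Gates` section.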

package/commands/create-prd.md ADDED

@@ -0,0 +1,55 @@
+---
+description: "Generate a PRD with machine-verifiable acceptance criteria from a feature description"
+argument-hint: "<feature description>"
+---
+# Create PRD
+Generate a Product Requirements Document for the given feature.
+## Input
+The user provides a feature description: $ARGUMENTS
+## Process
+1. **Understand the feature** — Ask clarifying questions if the description is ambiguous
+2. **Break into tasks** — Generate 8-15 small, granular tasks (not 3-5 large ones)
+3. **Machine-verifiable criteria** — Every acceptance criterion must be a command that returns pass/fail:
+   - Test commands: `pytest tests/test_feature.py -x`
+   - Syntax checks: `python3 -m py_compile file.py`
+   - Endpoint checks: `curl -sf -o /dev/null http://localhost:8000/endpoint` (`-f` makes curl exit non-zero on HTTP errors; without it curl exits 0 for any response it receives)
+   - File existence: `test -f path/to/file`
+   - Pattern checks: `grep -q 'expected_pattern' file`
+4. **Separate investigation from implementation** — "Research X" and "Implement X" are different tasks
+5. **Order by dependency** — Tasks should be ordered so each builds on the previous
+## Output Format
+Save to `tasks/prd.json` (create `tasks/` directory if needed):
+```json
+[
+  {
+    "id": 1,
+    "title": "Short imperative title",
+    "description": "What needs to be done and why",
+    "acceptance_criteria": [
+      "pytest tests/test_auth.py::test_login -x",
+      "test -f src/auth/handler.py"
+    ],
+    "passes": false,
+    "blocked_by": []
+  }
+]
+```
+Also save a human-readable version to `tasks/prd-<feature-slug>.md` with full descriptions.
+## Rules
+- Each task should take 1-3 iterations of a Ralph loop to complete
+- Acceptance criteria MUST be shell commands that exit 0 on success, non-zero on failure
+- No vague criteria like "code is clean" or "well-tested" — everything is boolean
+- Include setup tasks (create directories, install deps) as separate tasks
+- Final task should always be "Run full quality gate" with all checks combined
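
Because every criterion is a shell command whose exit status is the verdict, checking a PRD reduces to running each command and recording the result. The sketch below illustrates that idea. `criterionPasses` and `verifyTasks` are hypothetical names, and two points are assumptions: that `blocked_by` holds the IDs of tasks that must pass first, and that the task list is already in dependency order, as the Process section asks.

```javascript
const { spawnSync } = require('child_process');

// Run one acceptance criterion through the shell; exit status 0 means pass.
function criterionPasses(command, cwd = process.cwd()) {
  const result = spawnSync(command, { shell: true, cwd, stdio: 'ignore' });
  return result.status === 0;
}

// Return a copy of the task list with `passes` recomputed.
// Assumes tasks are ordered by dependency and that `blocked_by` lists task IDs.
function verifyTasks(tasks, cwd = process.cwd()) {
  const passedIds = new Set();
  return tasks.map((task) => {
    // A task whose blockers have not passed is not run at all.
    const unblocked = task.blocked_by.every((id) => passedIds.has(id));
    const passes =
      unblocked && task.acceptance_criteria.every((c) => criterionPasses(c, cwd));
    if (passes) passedIds.add(task.id);
    return { ...task, passes };
  });
}
```

A caller could load `tasks/prd.json` with `JSON.parse`, pass the array to `verifyTasks`, and write the result back. Note that `every` stops at the first failing criterion, so later commands for that task are not executed.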

package/commands/ralph-loop.md ADDED

@@ -0,0 +1,18 @@
+---
+description: "Start Ralph Loop in current session"
+argument-hint: "PROMPT [--max-iterations N] [--completion-promise TEXT]"
+allowed-tools: ["Bash(${CLAUDE_PLUGIN_ROOT}/scripts/setup-ralph-loop.sh:*)"]
+hide-from-slash-command-tool: "true"
+---
+# Ralph Loop Command
+Execute the setup script to initialize the Ralph loop:
+```!
+"${CLAUDE_PLUGIN_ROOT}/scripts/setup-ralph-loop.sh" $ARGUMENTS
+```
+Please work on the task. When you try to exit, the Ralph loop will feed the SAME PROMPT back to you for the next iteration. You'll see your previous work in files and git history, allowing you to iterate and improve.
+CRITICAL RULE: If a completion promise is set, you may ONLY output it when the statement is completely and unequivocally TRUE. Do not output false promises to escape the loop, even if you think you're stuck or should exit for other reasons. The loop is designed to continue until genuine completion.