npm - autonomous-coding-toolkit - Versions diffs - 1.0.0 - Mend

autonomous-coding-toolkit 1.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (324) hide show

package/.claude-plugin/marketplace.json +22 -0
package/.claude-plugin/plugin.json +13 -0
package/LICENSE +21 -0
package/Makefile +21 -0
package/README.md +140 -0
package/SECURITY.md +28 -0
package/agents/bash-expert.md +113 -0
package/agents/dependency-auditor.md +138 -0
package/agents/integration-tester.md +120 -0
package/agents/lesson-scanner.md +149 -0
package/agents/python-expert.md +179 -0
package/agents/service-monitor.md +141 -0
package/agents/shell-expert.md +147 -0
package/benchmarks/runner.sh +147 -0
package/benchmarks/tasks/01-rest-endpoint/rubric.sh +29 -0
package/benchmarks/tasks/01-rest-endpoint/task.md +17 -0
package/benchmarks/tasks/02-refactor-module/task.md +8 -0
package/benchmarks/tasks/03-fix-integration-bug/task.md +8 -0
package/benchmarks/tasks/04-add-test-coverage/task.md +8 -0
package/benchmarks/tasks/05-multi-file-feature/task.md +8 -0
package/bin/act.js +238 -0
package/commands/autocode.md +6 -0
package/commands/cancel-ralph.md +18 -0
package/commands/code-factory.md +53 -0
package/commands/create-prd.md +55 -0
package/commands/ralph-loop.md +18 -0
package/commands/run-plan.md +117 -0
package/commands/submit-lesson.md +122 -0
package/docs/ARCHITECTURE.md +630 -0
package/docs/CONTRIBUTING.md +125 -0
package/docs/lessons/0001-bare-exception-swallowing.md +34 -0
package/docs/lessons/0002-async-def-without-await.md +28 -0
package/docs/lessons/0003-create-task-without-callback.md +28 -0
package/docs/lessons/0004-hardcoded-test-counts.md +28 -0
package/docs/lessons/0005-sqlite-without-closing.md +33 -0
package/docs/lessons/0006-venv-pip-path.md +27 -0
package/docs/lessons/0007-runner-state-self-rejection.md +35 -0
package/docs/lessons/0008-quality-gate-blind-spot.md +33 -0
package/docs/lessons/0009-parser-overcount-empty-batches.md +36 -0
package/docs/lessons/0010-local-outside-function-bash.md +33 -0
package/docs/lessons/0011-batch-tests-for-unimplemented-code.md +36 -0
package/docs/lessons/0012-api-markdown-unescaped-chars.md +33 -0
package/docs/lessons/0013-export-prefix-env-parsing.md +33 -0
package/docs/lessons/0014-decorator-registry-import-side-effect.md +43 -0
package/docs/lessons/0015-frontend-backend-schema-drift.md +43 -0
package/docs/lessons/0016-event-driven-cold-start-seeding.md +44 -0
package/docs/lessons/0017-copy-paste-logic-diverges.md +43 -0
package/docs/lessons/0018-layer-passes-pipeline-broken.md +45 -0
package/docs/lessons/0019-systemd-envfile-ignores-export.md +41 -0
package/docs/lessons/0020-persist-state-incrementally.md +44 -0
package/docs/lessons/0021-dual-axis-testing.md +48 -0
package/docs/lessons/0022-jsx-factory-shadowing.md +43 -0
package/docs/lessons/0023-static-analysis-spiral.md +51 -0
package/docs/lessons/0024-shared-pipeline-implementation.md +55 -0
package/docs/lessons/0025-defense-in-depth-all-entry-points.md +65 -0
package/docs/lessons/0026-linter-no-rules-false-enforcement.md +54 -0
package/docs/lessons/0027-jsx-silent-prop-drop.md +64 -0
package/docs/lessons/0028-no-infrastructure-in-client-code.md +49 -0
package/docs/lessons/0029-never-write-secrets-to-files.md +61 -0
package/docs/lessons/0030-cache-merge-not-replace.md +62 -0
package/docs/lessons/0031-verify-units-at-boundaries.md +66 -0
package/docs/lessons/0032-module-lifecycle-subscribe-unsubscribe.md +89 -0
package/docs/lessons/0033-async-iteration-mutable-snapshot.md +72 -0
package/docs/lessons/0034-caller-missing-await-silent-discard.md +65 -0
package/docs/lessons/0035-duplicate-registration-silent-overwrite.md +85 -0
package/docs/lessons/0036-websocket-dirty-disconnect.md +33 -0
package/docs/lessons/0037-parallel-agents-worktree-corruption.md +31 -0
package/docs/lessons/0038-subscribe-no-stored-ref.md +36 -0
package/docs/lessons/0039-fallback-or-default-hides-bugs.md +34 -0
package/docs/lessons/0040-event-firehose-filter-first.md +36 -0
package/docs/lessons/0041-ambiguous-base-dir-path-nesting.md +32 -0
package/docs/lessons/0042-spec-compliance-insufficient.md +36 -0
package/docs/lessons/0043-exact-count-extensible-collections.md +32 -0
package/docs/lessons/0044-relative-file-deps-worktree.md +39 -0
package/docs/lessons/0045-iterative-design-improvement.md +33 -0
package/docs/lessons/0046-plan-assertion-math-bugs.md +38 -0
package/docs/lessons/0047-pytest-single-threaded-default.md +37 -0
package/docs/lessons/0048-integration-wiring-batch.md +40 -0
package/docs/lessons/0049-ab-verification.md +41 -0
package/docs/lessons/0050-editing-sourced-files-during-execution.md +33 -0
package/docs/lessons/0051-infrastructure-fixes-cant-self-heal.md +30 -0
package/docs/lessons/0052-uncommitted-changes-poison-quality-gates.md +31 -0
package/docs/lessons/0053-jq-compact-flag-inconsistency.md +31 -0
package/docs/lessons/0054-parser-matches-inside-code-blocks.md +30 -0
package/docs/lessons/0055-agents-compensate-for-garbled-prompts.md +31 -0
package/docs/lessons/0056-grep-count-exit-code-on-zero.md +42 -0
package/docs/lessons/0057-new-artifacts-break-git-clean-gates.md +42 -0
package/docs/lessons/0058-dead-config-keys-never-consumed.md +49 -0
package/docs/lessons/0059-contract-test-shared-structures.md +53 -0
package/docs/lessons/0060-set-e-silent-death-in-runners.md +53 -0
package/docs/lessons/0061-context-injection-dirty-state.md +50 -0
package/docs/lessons/0062-sibling-bug-neighborhood-scan.md +29 -0
package/docs/lessons/0063-one-flag-two-lifetimes.md +31 -0
package/docs/lessons/0064-test-passes-wrong-reason.md +31 -0
package/docs/lessons/0065-pipefail-grep-count-double-output.md +39 -0
package/docs/lessons/0066-local-keyword-outside-function.md +37 -0
package/docs/lessons/0067-stdin-hang-non-interactive-shell.md +36 -0
package/docs/lessons/0068-agent-builds-wrong-thing-correctly.md +31 -0
package/docs/lessons/0069-plan-quality-dominates-execution.md +30 -0
package/docs/lessons/0070-spec-echo-back-prevents-drift.md +31 -0
package/docs/lessons/0071-positive-instructions-outperform-negative.md +30 -0
package/docs/lessons/0072-lost-in-the-middle-context-placement.md +30 -0
package/docs/lessons/0073-unscoped-lessons-cause-false-positives.md +30 -0
package/docs/lessons/0074-stale-context-injection-wrong-batch.md +32 -0
package/docs/lessons/0075-research-artifacts-must-persist.md +32 -0
package/docs/lessons/0076-wrong-decomposition-contaminates-downstream.md +30 -0
package/docs/lessons/0077-cherry-pick-merges-need-manual-resolution.md +30 -0
package/docs/lessons/0078-static-review-without-live-test.md +30 -0
package/docs/lessons/0079-integration-wiring-batch-required.md +32 -0
package/docs/lessons/FRAMEWORK.md +161 -0
package/docs/lessons/SUMMARY.md +201 -0
package/docs/lessons/TEMPLATE.md +85 -0
package/docs/plans/2026-02-21-code-factory-v2-design.md +204 -0
package/docs/plans/2026-02-21-code-factory-v2-implementation-plan.md +2189 -0
package/docs/plans/2026-02-21-code-factory-v2-phase4-design.md +537 -0
package/docs/plans/2026-02-21-code-factory-v2-phase4-implementation-plan.md +2012 -0
package/docs/plans/2026-02-21-hardening-pass-design.md +108 -0
package/docs/plans/2026-02-21-hardening-pass-plan.md +1378 -0
package/docs/plans/2026-02-21-mab-research-report.md +406 -0
package/docs/plans/2026-02-21-marketplace-restructure-design.md +240 -0
package/docs/plans/2026-02-21-marketplace-restructure-plan.md +832 -0
package/docs/plans/2026-02-21-phase4-completion-plan.md +697 -0
package/docs/plans/2026-02-21-validator-suite-design.md +148 -0
package/docs/plans/2026-02-21-validator-suite-plan.md +540 -0
package/docs/plans/2026-02-22-mab-research-round2.md +556 -0
package/docs/plans/2026-02-22-mab-run-design.md +462 -0
package/docs/plans/2026-02-22-mab-run-plan.md +2046 -0
package/docs/plans/2026-02-22-operations-design-methodology-research.md +681 -0
package/docs/plans/2026-02-22-research-agent-failure-taxonomy.md +532 -0
package/docs/plans/2026-02-22-research-code-guideline-policies.md +886 -0
package/docs/plans/2026-02-22-research-codebase-audit-refactoring.md +908 -0
package/docs/plans/2026-02-22-research-coding-standards-documentation.md +541 -0
package/docs/plans/2026-02-22-research-competitive-landscape.md +687 -0
package/docs/plans/2026-02-22-research-comprehensive-testing.md +1076 -0
package/docs/plans/2026-02-22-research-context-utilization.md +459 -0
package/docs/plans/2026-02-22-research-cost-quality-tradeoff.md +548 -0
package/docs/plans/2026-02-22-research-lesson-transferability.md +508 -0
package/docs/plans/2026-02-22-research-multi-agent-coordination.md +312 -0
package/docs/plans/2026-02-22-research-phase-integration.md +602 -0
package/docs/plans/2026-02-22-research-plan-quality.md +428 -0
package/docs/plans/2026-02-22-research-prompt-engineering.md +558 -0
package/docs/plans/2026-02-22-research-unconventional-perspectives.md +528 -0
package/docs/plans/2026-02-22-research-user-adoption.md +638 -0
package/docs/plans/2026-02-22-research-verification-effectiveness.md +433 -0
package/docs/plans/2026-02-23-agent-suite-design.md +299 -0
package/docs/plans/2026-02-23-agent-suite-plan.md +578 -0
package/docs/plans/2026-02-23-phase3-cost-infrastructure-design.md +148 -0
package/docs/plans/2026-02-23-phase3-cost-infrastructure-plan.md +1062 -0
package/docs/plans/2026-02-23-research-bash-expert-agent.md +543 -0
package/docs/plans/2026-02-23-research-dependency-auditor-agent.md +564 -0
package/docs/plans/2026-02-23-research-improving-existing-agents.md +503 -0
package/docs/plans/2026-02-23-research-integration-tester-agent.md +454 -0
package/docs/plans/2026-02-23-research-python-expert-agent.md +429 -0
package/docs/plans/2026-02-23-research-service-monitor-agent.md +425 -0
package/docs/plans/2026-02-23-research-shell-expert-agent.md +533 -0
package/docs/plans/2026-02-23-roadmap-to-completion.md +530 -0
package/docs/plans/2026-02-24-headless-module-split-design.md +98 -0
package/docs/plans/2026-02-24-headless-module-split.md +443 -0
package/docs/plans/2026-02-24-lesson-scope-metadata-design.md +228 -0
package/docs/plans/2026-02-24-lesson-scope-metadata-plan.md +968 -0
package/docs/plans/2026-02-24-npm-packaging-design.md +841 -0
package/docs/plans/2026-02-24-npm-packaging-plan.md +1965 -0
package/docs/plans/audit-findings.md +186 -0
package/docs/telegram-notification-format.md +98 -0
package/examples/example-plan.md +51 -0
package/examples/example-prd.json +72 -0
package/examples/example-roadmap.md +33 -0
package/examples/quickstart-plan.md +63 -0
package/hooks/hooks.json +26 -0
package/hooks/setup-symlinks.sh +48 -0
package/hooks/stop-hook.sh +135 -0
package/package.json +47 -0
package/policies/bash.md +71 -0
package/policies/python.md +71 -0
package/policies/testing.md +61 -0
package/policies/universal.md +60 -0
package/scripts/analyze-report.sh +97 -0
package/scripts/architecture-map.sh +145 -0
package/scripts/auto-compound.sh +273 -0
package/scripts/batch-audit.sh +42 -0
package/scripts/batch-test.sh +101 -0
package/scripts/entropy-audit.sh +221 -0
package/scripts/failure-digest.sh +51 -0
package/scripts/generate-ast-rules.sh +96 -0
package/scripts/init.sh +112 -0
package/scripts/lesson-check.sh +428 -0
package/scripts/lib/common.sh +61 -0
package/scripts/lib/cost-tracking.sh +153 -0
package/scripts/lib/ollama.sh +60 -0
package/scripts/lib/progress-writer.sh +128 -0
package/scripts/lib/run-plan-context.sh +215 -0
package/scripts/lib/run-plan-echo-back.sh +231 -0
package/scripts/lib/run-plan-headless.sh +396 -0
package/scripts/lib/run-plan-notify.sh +57 -0
package/scripts/lib/run-plan-parser.sh +81 -0
package/scripts/lib/run-plan-prompt.sh +215 -0
package/scripts/lib/run-plan-quality-gate.sh +132 -0
package/scripts/lib/run-plan-routing.sh +315 -0
package/scripts/lib/run-plan-sampling.sh +170 -0
package/scripts/lib/run-plan-scoring.sh +146 -0
package/scripts/lib/run-plan-state.sh +142 -0
package/scripts/lib/run-plan-team.sh +199 -0
package/scripts/lib/telegram.sh +54 -0
package/scripts/lib/thompson-sampling.sh +176 -0
package/scripts/license-check.sh +74 -0
package/scripts/mab-run.sh +575 -0
package/scripts/module-size-check.sh +146 -0
package/scripts/patterns/async-no-await.yml +5 -0
package/scripts/patterns/bare-except.yml +6 -0
package/scripts/patterns/empty-catch.yml +6 -0
package/scripts/patterns/hardcoded-localhost.yml +9 -0
package/scripts/patterns/retry-loop-no-backoff.yml +12 -0
package/scripts/pipeline-status.sh +197 -0
package/scripts/policy-check.sh +226 -0
package/scripts/prior-art-search.sh +133 -0
package/scripts/promote-mab-lessons.sh +126 -0
package/scripts/prompts/agent-a-superpowers.md +29 -0
package/scripts/prompts/agent-b-ralph.md +29 -0
package/scripts/prompts/judge-agent.md +61 -0
package/scripts/prompts/planner-agent.md +44 -0
package/scripts/pull-community-lessons.sh +90 -0
package/scripts/quality-gate.sh +266 -0
package/scripts/research-gate.sh +90 -0
package/scripts/run-plan.sh +329 -0
package/scripts/scope-infer.sh +159 -0
package/scripts/setup-ralph-loop.sh +155 -0
package/scripts/telemetry.sh +230 -0
package/scripts/tests/run-all-tests.sh +52 -0
package/scripts/tests/test-act-cli.sh +46 -0
package/scripts/tests/test-agents-md.sh +87 -0
package/scripts/tests/test-analyze-report.sh +114 -0
package/scripts/tests/test-architecture-map.sh +89 -0
package/scripts/tests/test-auto-compound.sh +169 -0
package/scripts/tests/test-batch-test.sh +65 -0
package/scripts/tests/test-benchmark-runner.sh +25 -0
package/scripts/tests/test-common.sh +168 -0
package/scripts/tests/test-cost-tracking.sh +158 -0
package/scripts/tests/test-echo-back.sh +180 -0
package/scripts/tests/test-entropy-audit.sh +146 -0
package/scripts/tests/test-failure-digest.sh +66 -0
package/scripts/tests/test-generate-ast-rules.sh +145 -0
package/scripts/tests/test-helpers.sh +82 -0
package/scripts/tests/test-init.sh +47 -0
package/scripts/tests/test-lesson-check.sh +278 -0
package/scripts/tests/test-lesson-local.sh +55 -0
package/scripts/tests/test-license-check.sh +109 -0
package/scripts/tests/test-mab-run.sh +182 -0
package/scripts/tests/test-ollama-lib.sh +49 -0
package/scripts/tests/test-ollama.sh +60 -0
package/scripts/tests/test-pipeline-status.sh +198 -0
package/scripts/tests/test-policy-check.sh +124 -0
package/scripts/tests/test-prior-art-search.sh +96 -0
package/scripts/tests/test-progress-writer.sh +140 -0
package/scripts/tests/test-promote-mab-lessons.sh +110 -0
package/scripts/tests/test-pull-community-lessons.sh +149 -0
package/scripts/tests/test-quality-gate.sh +241 -0
package/scripts/tests/test-research-gate.sh +132 -0
package/scripts/tests/test-run-plan-cli.sh +86 -0
package/scripts/tests/test-run-plan-context.sh +305 -0
package/scripts/tests/test-run-plan-e2e.sh +153 -0
package/scripts/tests/test-run-plan-headless.sh +424 -0
package/scripts/tests/test-run-plan-notify.sh +124 -0
package/scripts/tests/test-run-plan-parser.sh +217 -0
package/scripts/tests/test-run-plan-prompt.sh +254 -0
package/scripts/tests/test-run-plan-quality-gate.sh +222 -0
package/scripts/tests/test-run-plan-routing.sh +178 -0
package/scripts/tests/test-run-plan-scoring.sh +148 -0
package/scripts/tests/test-run-plan-state.sh +261 -0
package/scripts/tests/test-run-plan-team.sh +157 -0
package/scripts/tests/test-scope-infer.sh +150 -0
package/scripts/tests/test-setup-ralph-loop.sh +63 -0
package/scripts/tests/test-telegram-env.sh +38 -0
package/scripts/tests/test-telegram.sh +121 -0
package/scripts/tests/test-telemetry.sh +46 -0
package/scripts/tests/test-thompson-sampling.sh +139 -0
package/scripts/tests/test-validate-all.sh +60 -0
package/scripts/tests/test-validate-commands.sh +89 -0
package/scripts/tests/test-validate-hooks.sh +98 -0
package/scripts/tests/test-validate-lessons.sh +150 -0
package/scripts/tests/test-validate-plan-quality.sh +235 -0
package/scripts/tests/test-validate-plans.sh +187 -0
package/scripts/tests/test-validate-plugin.sh +106 -0
package/scripts/tests/test-validate-prd.sh +184 -0
package/scripts/tests/test-validate-skills.sh +134 -0
package/scripts/validate-all.sh +57 -0
package/scripts/validate-commands.sh +67 -0
package/scripts/validate-hooks.sh +89 -0
package/scripts/validate-lessons.sh +98 -0
package/scripts/validate-plan-quality.sh +369 -0
package/scripts/validate-plans.sh +120 -0
package/scripts/validate-plugin.sh +86 -0
package/scripts/validate-policies.sh +42 -0
package/scripts/validate-prd.sh +118 -0
package/scripts/validate-skills.sh +96 -0
package/skills/autocode/SKILL.md +285 -0
package/skills/autocode/ab-verification.md +51 -0
package/skills/autocode/code-quality-standards.md +37 -0
package/skills/autocode/competitive-mode.md +364 -0
package/skills/brainstorming/SKILL.md +97 -0
package/skills/capture-lesson/SKILL.md +187 -0
package/skills/check-lessons/SKILL.md +116 -0
package/skills/dispatching-parallel-agents/SKILL.md +110 -0
package/skills/executing-plans/SKILL.md +85 -0
package/skills/finishing-a-development-branch/SKILL.md +201 -0
package/skills/receiving-code-review/SKILL.md +72 -0
package/skills/requesting-code-review/SKILL.md +59 -0
package/skills/requesting-code-review/code-reviewer.md +82 -0
package/skills/research/SKILL.md +145 -0
package/skills/roadmap/SKILL.md +115 -0
package/skills/subagent-driven-development/SKILL.md +98 -0
package/skills/subagent-driven-development/code-quality-reviewer-prompt.md +18 -0
package/skills/subagent-driven-development/implementer-prompt.md +73 -0
package/skills/subagent-driven-development/spec-reviewer-prompt.md +57 -0
package/skills/systematic-debugging/SKILL.md +134 -0
package/skills/systematic-debugging/condition-based-waiting.md +64 -0
package/skills/systematic-debugging/defense-in-depth.md +32 -0
package/skills/systematic-debugging/root-cause-tracing.md +55 -0
package/skills/test-driven-development/SKILL.md +167 -0
package/skills/using-git-worktrees/SKILL.md +219 -0
package/skills/using-superpowers/SKILL.md +54 -0
package/skills/verification-before-completion/SKILL.md +140 -0
package/skills/verify/SKILL.md +82 -0
package/skills/writing-plans/SKILL.md +128 -0
package/skills/writing-skills/SKILL.md +93 -0

package/docs/plans/2026-02-22-research-user-adoption.md ADDED Viewed

@@ -0,0 +1,638 @@
+# Research: User Adoption and Friction in Autonomous Coding Tools
+**Date:** 2026-02-22
+**Type:** Research Report
+**Domain:** Complex (Cynefin) -- developer adoption involves emergent behavior, cultural factors, and feedback loops that resist linear prediction
+**Confidence:** Medium overall. Individual findings range from High (large-N surveys, RCTs) to Low (anecdotal forum data). Noted per section.
+---
+## Executive Summary
+The autonomous-coding-toolkit implements a rigid six-stage skill chain (brainstorm -> PRD -> plan -> worktree -> execute -> verify -> finish) with hard gates at every stage. This research examines whether that rigidity helps or hurts adoption by synthesizing evidence from developer tool UX research, AI coding tool user feedback, TDD adoption studies, progressive disclosure literature, and trust research.
+**Bottom line:** The toolkit's quality discipline is its competitive advantage, but the mandatory linear chain is its adoption bottleneck. The evidence strongly supports:
+1. **34.7% of developers abandon tools with difficult setup** -- the toolkit's multi-stage chain looks difficult before delivering value.
+2. **Time to first value is the single strongest predictor of retention** -- the toolkit delays value behind mandatory brainstorming + PRD + plan stages.
+3. **Guardrails beat gates** -- continuous guidance without blocking (guardrails) outperforms binary pass/fail checkpoints (gates) for developer satisfaction and velocity.
+4. **Trust in AI tools is declining** (29% trust rate, down from 40%) -- the toolkit's quality gates actually address this, but only if users get far enough to experience them.
+5. **Progressive disclosure is the proven path** -- start simple (run-plan.sh on an existing plan), unlock complexity (full pipeline) as users build confidence.
+**Recommendation:** Implement a "fast lane" entry point that lets users experience value in under 5 minutes, then progressively disclose the full skill chain. Keep the discipline; restructure the discovery path.
+---
+## 1. Developer Tool Adoption: What Makes Devs Try, Stick, or Abandon
+### Findings
+**Confidence: High** (multiple large-N surveys: Stack Overflow 2025 N=65k+, Evil Martians/Catchy Agency N=202, JetBrains State of Developer Ecosystem 2025)
+#### Abandonment triggers (ranked by impact):
+| Trigger | % of Developers | Source |
+|---------|----------------|--------|
+| Difficult setup | 34.7% | Catchy Agency (N=202 OSS devs) |
+| Appears unmaintained | 26.2% | Catchy Agency |
+| Bad documentation | 17.3% | Catchy Agency |
+| Missing features | 12.4% | Catchy Agency |
+| Security/privacy concerns | Top 3 | Stack Overflow 2025 |
+| Prohibitive pricing | Top 3 | Stack Overflow 2025 |
+#### Retention drivers:
+- **Discoverability** -- how fast developers can turn intent into action. Poor discoverability snowballs into UX debt (Evil Martians, 2026).
+- **Performance/latency** -- devtool latency matters more than initial load speed because sessions are long (Evil Martians, 2026).
+- **Maintenance signals** -- developers check GitHub issue response time (48-hour benchmark) as a proxy for project health (Catchy Agency).
+- **Time to first value** -- pre-boarding reduces first-week friction by 80% (Remote Work Association, 2024). Reprise reduced developer ramp-up from 4 days to 30 minutes by focusing on TTFV.
+#### AI-specific adoption:
+- 84% of developers use or plan to use AI tools (Stack Overflow 2025), up from 76% in 2024.
+- 51% of professional developers use AI tools daily.
+- But positive sentiment dropped to 60%, down from >70% in 2023-2024.
+- Top deal-breakers: security concerns, pricing, availability of better alternatives. Lack of AI features is the *least* important factor.
+### Implications for the Toolkit
+The toolkit's biggest adoption risk is not missing features -- it's **perceived setup difficulty**. A six-stage mandatory pipeline *looks* hard to set up, even if individual stages are well-designed. The 34.7% abandonment rate for "difficult setup" is the single largest threat. The toolkit needs to demonstrate value before asking for process commitment.
+---
+## 2. Multi-Step Workflow Abandonment: The Funnel
+### Findings
+**Confidence: Medium** (UX research literature is strong on funnels generally; specific data on developer workflow funnels is sparse)
+Research identifies three categories of friction that cause funnel drop-off (FullStory, Userpilot):
+1. **Cognitive friction** -- too many decisions required before starting. Each mandatory stage in the toolkit represents a cognitive load decision point.
+2. **Interaction friction** -- the experience feels slow or effortful. Long forms, unnecessary steps, and slow responses increase abandonment.
+3. **Emotional friction** -- anxiety about making mistakes or losing work. Irreversible actions without clear rollback increase emotional friction.
+Developer-specific findings (Atlassian DX Report 2024, HashiCorp):
+- 97% of developers lose significant time to workflow inefficiencies.
+- Engineers who spend more time battling processes than solving problems eventually disengage -- this is a leading cause of burnout.
+- "Process friction" = barriers, delays, and inefficiencies in a developer's workflow. The resistance engineers experience getting work done due to tooling, handoffs, bureaucracy.
+**Where multi-step funnels break:**
+- Funnel analysis consistently shows the steepest drop-off at the **first non-trivial step** -- the point where the user must invest effort before seeing any return (CXL, Datadog).
+- In the toolkit's case, that's the brainstorming stage. The user has an idea and wants to code. The toolkit says "no, first you must brainstorm."
+### Implications for the Toolkit
+The toolkit's funnel likely looks like:
+```
+Install/discover toolkit    100%
+Read README/docs            ~60% (34.7% abandon on perceived difficulty)
+Try /autocode               ~40% (another ~20% don't get past documentation)
+Complete brainstorming      ~25% (forced process before value)
+Complete PRD                ~20% (another gate before code)
+Complete plan               ~18%
+Execute first batch         ~15%
+Experience quality gates    ~12%
+Complete full pipeline      ~8%
+Become repeat user          ~5%
+```
+These are estimates based on typical SaaS funnel shapes applied to the toolkit's structure, not measured data. The critical insight: **quality gates -- the toolkit's core value -- are experienced by ~12% of users who start**. Most users never get far enough to see why the discipline matters.
+---
+## 3. Mandatory Brainstorming: Help or Barrier?
+### Findings
+**Confidence: Medium** (strong evidence for design reviews in high-stakes contexts; weak evidence for mandatory brainstorming in developer tools specifically)
+#### Evidence FOR mandatory design before code:
+- **DoD acquisition programs** mandate Preliminary Design Review (PDR) and Critical Design Review (CDR) before coding begins (DoDI 5000.88). This is for systems where failure = lives lost.
+- **Code inspections/walk-throughs** are "highly effective and an excellent way to encourage proper designs" (JPL Review Process).
+- The toolkit's own experience: brainstorming catches scope creep, misaligned requirements, and architectural mistakes before they become expensive code changes.
+#### Evidence AGAINST mandatory brainstorming:
+- **No empirical research** specifically links mandatory brainstorming in developer tools to better outcomes vs. optional brainstorming.
+- The DoD analogy breaks down: the toolkit targets individual developers and small teams, not defense acquisition programs. The cost of a bad design in a personal project is an afternoon, not a missile.
+- **Developer autonomy research** (Microsoft Platform Engineering, Built In) shows developers resist mandatory process steps that feel like "artificial ceremonies" when they already know what they want to build.
+- The brainstorming skill asks clarifying questions "one at a time, multiple choice preferred" -- this is pedagogically sound but feels slow when a developer has a clear vision.
+#### The middle ground:
+- Brainstorming is most valuable when the problem is **complex or ambiguous** (Cynefin: complex domain). For **clear** problems ("add a new endpoint that mirrors this existing one"), it's overhead.
+- The most successful design review processes are **proportional to risk** -- lightweight for small changes, heavy for architecture changes.
+### Implications for the Toolkit
+Mandatory brainstorming is correct for the toolkit's target use case (autonomous execution of multi-batch plans where a bad design cascades through every batch). But it should be **skippable for experienced users who provide a design doc or clear spec upfront**. The current "no exceptions" stance is the right default for autonomous pipelines but wrong for the "first 5 minutes" experience.
+**Recommendation:** Make brainstorming the default but allow `--skip-brainstorm` with a design doc path. For `/autocode`, keep it mandatory. For `run-plan.sh` (which already takes a plan file), brainstorming is implicitly complete.
+---
+## 4. Rigid vs. Flexible Skill Chains
+### Findings
+**Confidence: High** (strong research from platform engineering, DevOps, and workflow automation fields)
+#### The research is clear: rigid pipelines create friction.
+- "Rigidity in workflows introduces friction, wastes time, and reduces productivity, resulting in artificial ceremonies, unnecessary artifacts, redundant approvals, and process overhead that impede velocity" (AWS AI-DLC research).
+- "Dynamic automation enables greater agility and requires minimal long-term maintenance" (AWS).
+- "Adaptive workflows, flexible depth, and embedded human oversight have been validated by all engineering teams engaged in this research" (AWS AI-DLC).
+#### But discipline prevents failure:
+- Devin AI's autonomous failures demonstrate what happens without gates: 3/20 tasks completed successfully in independent testing (Answer.AI). The agent spent "days on impossible solutions rather than recognizing fundamental blockers."
+- The SAFE-AI framework (arxiv:2508.11824) recommends: "differentiated validation intensity based on operation criticality" -- not uniform rigidity, but risk-proportional enforcement.
+- The METR RCT found AI tools make experienced developers 19% slower partly because developers spent time "cleaning up AI-generated code" -- quality gates prevent this cleanup debt.
+#### Guardrails vs. Gates (the critical distinction):
+| Characteristic | Gates | Guardrails |
+|---------------|-------|------------|
+| Mechanism | Binary pass/fail | Continuous guidance |
+| Effect | Blocks progress | Enables safe progress |
+| Developer experience | Frustrating | Empowering |
+| Analogy | Locked door | Highway lane markers |
+| Source | Traditional QA | Platform engineering |
+"Guardrails don't restrict freedom; they enable it. When developers know that automated policies will catch dangerous configurations, they gain the confidence to move fast and experiment" (Microsoft Platform Engineering).
+"Security gates are binary, blocking mechanisms that operate on a pass/fail basis and create artificial checkpoints that interrupt workflow. By contrast, guardrails provide continuous guidance without blocking progress" (Built In / Avyka).
+### Implications for the Toolkit
+The toolkit currently uses **gates** (hard blocks at every stage). The research strongly supports converting some gates to **guardrails** (warnings that allow override with acknowledgment). Specifically:
+- **Keep as gates:** Quality gate between batches (test suite, lesson check), verification before completion claims.
+- **Convert to guardrails:** Brainstorming requirement (warn but allow skip), PRD generation (recommend but don't block), plan format validation (suggest corrections, don't reject).
+This preserves the toolkit's quality discipline where it matters most (execution and verification) while reducing friction in the design phases where experienced users may have already done the thinking.
+---
+## 5. Progressive Disclosure in Developer Tools
+### Findings
+**Confidence: High** (well-established UX principle with extensive empirical backing from Nielsen Norman Group, Interaction Design Foundation, and SaaS research)
+Jakob Nielsen introduced progressive disclosure in 1995: "Initially show users only a few of the most important options. Offer a larger set of specialized options upon request. Disclose secondary features only if a user asks for them" (NN/g).
+Applied to developer tools:
+- **CLI tools** should "nudge developers toward the most likely commands" rather than showing complex manual pages (Lucas F. Costa, clig.dev).
+- **Complex software** benefits from two levels of subcommand -- noun/verb pairs (Heroku CLI pattern).
+- **Onboarding** should "limit to only show users the core and secondary features," deferring advanced features (UXPin, Userpilot).
+- **Error handling** should include not just what went wrong but suggestions for how to fix it and links to more information (Atlassian CLI principles).
+Successful examples:
+- Git: `git init`, `git add`, `git commit` covers 80% of use cases. Rebasing, cherry-picking, worktrees are discovered later.
+- Docker: `docker run` gets immediate value. Compose, Swarm, multi-stage builds come later.
+- Heroku: `heroku create` + `git push heroku main` deploys in 2 commands. Addons, dynos, pipelines are progressive.
+### Implications for the Toolkit
+The toolkit currently presents the full pipeline upfront in the README:
+```
+Idea -> Brainstorm -> Plan -> Worktree -> Execute -> Verify -> Finish
+```
+This is the *architecture* diagram, not the *user journey*. Progressive disclosure would restructure the experience:
+**Level 0 (first 5 minutes):** `scripts/run-plan.sh examples/example-plan.md` -- run an existing plan, see quality gates work, experience the value.
+**Level 1 (first hour):** Write your own plan file, run it with `run-plan.sh`. Learn plan format and quality gates.
+**Level 2 (first day):** Use `/autocode` in a Claude Code session. Experience the full pipeline with brainstorming.
+**Level 3 (first week):** Customize quality gates, add lessons, use competitive mode, configure Telegram notifications.
+**Level 4 (ongoing):** Submit community lessons, create custom skills, run cross-project audits.
+---
+## 6. The "First 5 Minutes" Experience
+### Findings
+**Confidence: High** (consistent across onboarding research, time-to-value studies, and developer tool best practices)
+- "Time to first commit" should happen within 3 days for small teams, 2 weeks for enterprises (Zavvy). For a CLI tool, the analog is "time to first successful run."
+- Pre-boarding reduces first-week friction by 80% (Remote Work Association, 2024).
+- "Rather than overwhelming users with complicated documentation, CLI tools should nudge developers toward the most likely commands" (Lucas F. Costa).
+- "Start with an example and show users what command they're most likely to use first" (clig.dev).
+- "At the end of each command, suggest the next best step" -- reduces need to reference documentation (Zapier CLI best practices).
+**The first 5 minutes decide everything.** If a developer doesn't get value in the first session, they rarely return. The Baremetrics/Appcues research on Time to First Value (TTFV) shows:
+- TTFV is the strongest predictor of long-term retention.
+- Most customers report 4+ days wasted on setup before getting value.
+- Tools that compress TTFV to minutes (not days) see dramatically higher retention.
+### Implications for the Toolkit
+The toolkit's current "first 5 minutes" experience:
+1. Read README (long -- 519 lines)
+2. Install (clone repo or install plugin -- reasonable)
+3. Try `/autocode "Add feature X"` -- immediately hits mandatory brainstorming
+4. Answer clarifying questions one at a time
+5. Review design proposal section by section
+6. Approve design
+7. Generate PRD (optional but recommended)
+8. Write plan
+9. Execute first batch
+10. See quality gates work
+**Time to first value: 30-60 minutes** (optimistically). Most of that is the brainstorming/design phase.
+**Recommended "first 5 minutes" redesign:**
+```bash
+# Clone
+git clone https://github.com/parthalon025/autonomous-coding-toolkit.git
+cd autonomous-coding-toolkit
+# Run the example plan (3 minutes to first quality gate pass)
+scripts/run-plan.sh examples/example-plan.md --dry-run
+# See what happens: plan parsed, batches identified, quality gates explained
+# User immediately understands the value proposition
+```
+This gives the user a **mental model** of the system before asking them to invest in the full pipeline.
+---
+## 7. Friction Points in Cursor, Devin, Aider, and Claude Code
+### Findings
+**Confidence: Medium-High** (mix of user surveys, reviews, and developer discussions; some selection bias in complaints)
+#### Cursor
+| Friction Point | Impact | Source |
+|---------------|--------|--------|
+| UI changes every week | Re-learning required | DevClass, 2025 |
+| Command+K hijacked for AI | Breaks muscle memory | DevClass |
+| Release-breaking updates | Chat history/worktree corruption | Cursor 2.1 release, Nov 2025 |
+| Mandatory telemetry | Enterprise security blocks adoption | CheckThat.ai |
+| Interface feels cluttered | Learning curve for new users | EngineLabsAI review |
+**What Cursor gets right:** Full codebase context, suggestions inline (not in separate panel), reducing context-switching.
+#### Devin
+| Friction Point | Impact | Source |
+|---------------|--------|--------|
+| 15-minute iteration cycles | Breaks developer flow | TechPoint Africa review |
+| 3/20 task success rate | Unpredictable failures | Answer.AI testing |
+| Remote-first (Slack-based) | No local environment control | Multiple reviews |
+| ACU cost unpredictability | Budget anxiety | Trickle.so review |
+| Spends days on impossible tasks | No failure recognition | Multiple reviews |
+**What Devin gets right:** Ambitious vision of full autonomy, good for async delegation of clearly-defined tasks.
+#### Aider
+| Friction Point | Impact | Source |
+|---------------|--------|--------|
+| Terminal-only (initially) | Intimidating for some | Blott review |
+| Model selection complexity | Which model for which task? | Community feedback |
+| Git commit per change | Verbose history | Community feedback |
+**What Aider gets right:** "Precision tool of LLM code gen. Minimal, thoughtful, capable of surgical changes while keeping the developer in control" (community). Voice commands for focused sessions. Browser UI added for accessibility.
+#### Claude Code
+| Friction Point | Impact | Source |
+|---------------|--------|--------|
+| Terminal-only interface | Barrier for GUI-preferring devs | General feedback |
+| Cost management opaque | Hard to predict API spend | Community reports |
+| Context window management | Long sessions degrade quality | Anthropic best practices |
+| Permission prompts | Interrupts flow | User feedback |
+**What Claude Code gets right:** CLAUDE.md for project context, natural language interface, deep codebase understanding, extensibility (skills, hooks, commands). Developers report being in "reviewer mode more often than coding mode" -- a healthy pattern.
+### Implications for the Toolkit
+**Common friction themes across all tools:**
+1. **Cost unpredictability** -- every tool struggles with this. The toolkit's headless mode (fresh `claude -p` per batch) at least makes cost predictable per-batch.
+2. **Unreliable autonomy** -- Devin's 15% success rate is the cautionary tale. The toolkit's quality gates directly address this.
+3. **Loss of control** -- developers want to feel in charge. The toolkit's worktree isolation + rollback addresses this.
+4. **Context degradation** -- universal problem. The toolkit's fresh-context-per-batch is a genuine innovation.
+**The toolkit's unique position:** It addresses the problems users complain about most in competing tools (reliability, quality, context degradation) but does so by adding process (brainstorming, PRD, plans) that creates a *different* kind of friction.
+---
+## 8. TDD Adoption Resistance
+### Findings
+**Confidence: High** (meta-analyses, large industry studies from Microsoft/IBM, 2024 State of TDD survey)
+#### Barriers to TDD adoption:
+- **Engineering culture** -- the biggest barrier is organizational, not technical (State of TDD 2024).
+- **Perception that TDD is slow** -- managers worry about shipping velocity. Microsoft found ~15% additional upfront time.
+- **Individual preference** -- "some developers find it extremely beneficial, while others argue it demands too much effort" (ScrumLaunch, 2024).
+- **Misconceptions about complexity** -- developers overestimate how hard TDD is before trying it.
+- **Time pressure** -- "pressure to get products to market quickly" overrides process discipline.
+#### Evidence that TDD works (despite resistance):
+| Metric | Finding | Source |
+|--------|---------|--------|
+| Release frequency | 32% more frequent with TDD | Thoughtworks 2024 |
+| Defect reduction | 40-90% fewer defects | IBM + Microsoft combined study |
+| Quality improvement | 2x quality vs non-TDD | Microsoft industrial case studies |
+| Time overhead | ~15% more upfront time | Microsoft industrial case studies |
+#### The adoption paradox:
+TDD has clear empirical evidence in its favor, yet adoption remains low. This is the same pattern the toolkit faces: **evidence that discipline works does not translate to adoption of discipline.** The barrier is not rational -- it's emotional and cultural.
+### Implications for the Toolkit
+The toolkit mandates TDD ("Every task follows: write failing test -> confirm fail -> implement -> confirm pass -> commit"). This is the correct technical choice but creates adoption friction because:
+1. Developers who don't already practice TDD must learn a new methodology *and* a new tool simultaneously.
+2. The toolkit enforces TDD mechanically (test count monotonicity) rather than teaching it progressively.
+3. No escape hatch exists for tasks where TDD is genuinely awkward (UI changes, data migrations, config changes).
+**Recommendation:** Make TDD the default execution mode but allow `--test-strategy after` for "test after" workflows. Maintain the quality gate (tests must exist and pass before next batch) but don't mandate the red-green-refactor sequence for every task type.
+---
+## 9. Developer Trust in Autonomous Code Modification
+### Findings
+**Confidence: High** (Stack Overflow 2025 N=65k+, Qodo State of AI Code Quality 2025, SAFE-AI framework)
+#### The trust crisis is real:
+- 46% of developers distrust AI tool accuracy (up from 31% last year) -- Stack Overflow 2025
+- Only 29% trust AI output (down from 40%) -- Stack Overflow 2025
+- Only 3% report "highly trusting" AI output -- Stack Overflow 2025
+- Experienced developers are *most* skeptical: lowest "highly trust" (2.6%), highest "highly distrust" (20%) -- Stack Overflow 2025
+- 90% of developers use tools they don't fully trust -- BayTech Consulting
+#### The verification gap:
+- 96% don't fully trust AI-generated code is functionally correct -- Qodo
+- But fewer than half review it before committing -- Qodo
+- 38% say reviewing AI code takes more effort than reviewing human code -- IT Pro
+- 66% spend extra time fixing "almost right" suggestions -- Stack Overflow 2025
+- 45% call debugging AI code their top frustration -- Stack Overflow 2025
+#### What builds trust:
+- **Transparency** -- showing reasoning, not just output (SAFE-AI framework: Explainability pillar)
+- **Rollback capability** -- "checkpoint systems enabling instant reversion to previous code states are critical" (RedMonk 2025)
+- **Audit trails** -- "clear audit trails documenting every autonomous agent action" (RedMonk 2025)
+- **Proportional autonomy** -- "differentiated validation intensity based on operation criticality" (SAFE-AI)
+- **Human-in-the-loop for critical operations** -- "essential for high-stakes domains" (SAFE-AI)
+#### The autonomy-risk gradient:
+Greater AI independence correlates with amplified safety risks (SAFE-AI). The Replit incident: an AI agent with excessive permissions deleted production databases and fabricated test results. This is the failure mode that quality gates prevent.
+### Implications for the Toolkit
+The toolkit is **well-positioned on trust** -- better than most competitors:
+| Trust factor | Toolkit's approach | Assessment |
+|-------------|-------------------|------------|
+| Transparency | progress.txt, routing logs, quality gate output | Strong |
+| Rollback | Git worktree isolation | Strong |
+| Audit trails | .run-plan-state.json, logs/ directory | Strong |
+| Proportional autonomy | Same gates for all batches | Weak -- should vary |
+| Human-in-the-loop | Mode B (checkpoints), but Mode C is fully autonomous | Mixed |
+| Verification | Iron Law: no claims without evidence | Very strong |
+The toolkit should **market its trust features more prominently**. In a market where 46% of developers distrust AI tools, the quality gate pipeline is the toolkit's strongest selling point -- but it's buried behind the process overhead.
+---
+## 10. Documentation and Onboarding Patterns
+### Findings
+**Confidence: High** (extensive practitioner literature, CLI design guidelines, developer onboarding research)
+#### What works for complex dev tools:
+**Structure:**
+- Two-page architecture overview + visual system diagrams (Cortex, 2025)
+- Team-specific glossaries and tech stack maps (FullScale)
+- Progressive disclosure: core features first, advanced features on request (NN/g)
+**CLI-specific patterns (clig.dev, Atlassian CLI principles):**
+1. Show the most likely command first, not the full manual
+2. Use noun/verb subcommand pairs for complex tools
+3. Suggest the next step at the end of each command output
+4. Include fix suggestions in error messages, not just error descriptions
+5. Follow existing conventions (don't invent new patterns)
+6. Provide `--help` at every level with examples, not just flag descriptions
+**Onboarding sequence (Port.io, Cortex):**
+1. Pre-boarding: environment setup automated, access provisioned
+2. Day 1: 30-min welcome + codebase walkthrough + first real task
+3. Week 1: assigned technical buddy + domain-specific learning
+4. Month 1: first independent contribution
+**What fails:**
+- Documentation that explains *what* without explaining *why* (Document360)
+- Requiring reading before doing (users learn by doing, not reading)
+- "Making onboarding everyone's responsibility means it becomes no one's responsibility" (Shake, 2024)
+### Implications for the Toolkit
+The toolkit's current documentation:
+- README.md: 519 lines. Comprehensive but overwhelming. Architecture before value.
+- CLAUDE.md: 224 lines. Internal reference, not onboarding material.
+- ARCHITECTURE.md: 483 lines. Excellent for understanding, wrong for first contact.
+- No dedicated "Getting Started" guide.
+- No interactive tutorial or guided first run.
+- No `--help` output optimization for progressive discovery.
+**Recommendation:** Create a focused onboarding path:
+1. **README.md** -- trim to <100 lines. Problem statement, 3-command quickstart, link to guide.
+2. **docs/GETTING-STARTED.md** -- guided tutorial: run example plan -> write simple plan -> use /autocode.
+3. **examples/** -- expand with a "hello world" plan that completes in 2 minutes.
+4. **Command output** -- add "next step" suggestions after successful runs.
+---
+## Friction Map: Autonomous Coding Toolkit
+This map identifies every friction point in the current toolkit experience, categorized by severity and fixability.
+### Critical Friction (blocks adoption)
+| # | Friction Point | Stage | Evidence | Fix Difficulty |
+|---|---------------|-------|----------|---------------|
+| F1 | No "first 5 minutes" value path | Discovery | 34.7% abandon on setup difficulty | Medium |
+| F2 | Mandatory brainstorming for ALL use cases | Brainstorm | Developers resist forced process for clear problems | Easy |
+| F3 | Full pipeline shown before any value delivered | Documentation | Progressive disclosure literature unanimously recommends simple-first | Medium |
+| F4 | 519-line README is architecture, not onboarding | Documentation | "Start with example, show most likely command first" (clig.dev) | Easy |
+| F5 | No dry-run / preview mode | Discovery | Users can't see what the tool does without committing to the full pipeline | Medium |
+### Significant Friction (slows adoption)
+| # | Friction Point | Stage | Evidence | Fix Difficulty |
+|---|---------------|-------|----------|---------------|
+| F6 | TDD mandated with no escape hatch | Execution | 15% time overhead, cultural resistance (State of TDD 2024) | Easy |
+| F7 | All quality gates are hard blocks (gates, not guardrails) | Execution | Guardrails > gates research (Microsoft, Built In) | Medium |
+| F8 | No proportional autonomy -- same gates for config changes and architecture | Execution | SAFE-AI: "differentiated validation based on criticality" | Hard |
+| F9 | Trust features (quality gates, rollback) buried in docs | Marketing | 46% of devs distrust AI tools; trust features are the selling point | Easy |
+| F10 | No guided error recovery | All stages | CLI best practice: "suggest how to fix" in every error (Atlassian) | Medium |
+### Minor Friction (polish items)
+| # | Friction Point | Stage | Evidence | Fix Difficulty |
+|---|---------------|-------|----------|---------------|
+| F11 | No "next step" suggestions in command output | CLI UX | CLI design guidelines (clig.dev, Zapier) | Easy |
+| F12 | No interactive tutorial | Onboarding | "Users learn by doing, not reading" (Document360) | Hard |
+| F13 | Example plan exists but isn't highlighted | Onboarding | Progressive disclosure: lead with examples | Easy |
+| F14 | Competitive mode explained before basic mode | Documentation | Progressive disclosure violation | Easy |
+| F15 | No telemetry / analytics on where users actually drop off | Product | Can't improve what you don't measure | Medium |
+---
+## Recommendations
+Ordered by impact-to-effort ratio. Each recommendation maps to specific friction points and research evidence.
+### 1. Create a "Fast Lane" Quick Start (addresses F1, F3, F4, F13)
+**What:** A 3-command getting started experience that delivers value in under 5 minutes.
+```bash
+git clone https://github.com/parthalon025/autonomous-coding-toolkit.git
+cd autonomous-coding-toolkit
+scripts/run-plan.sh examples/hello-world-plan.md --dry-run
+```
+**Why:** TTFV is the strongest predictor of retention. The current path to first value is 30-60 minutes. This compresses it to 3-5 minutes.
+**Evidence:** 34.7% abandon on difficult setup (Catchy Agency). Pre-boarding reduces friction by 80% (Remote Work Association). "Start with an example" (clig.dev).
+### 2. Make Brainstorming Default-On but Skippable (addresses F2)
+**What:** Add `--skip-brainstorm --design-doc path/to/design.md` to `/autocode`. Keep brainstorming mandatory for headless pipelines where no human is reviewing.
+**Why:** Mandatory brainstorming is correct for autonomous execution but wrong for experienced users who already have a design. Proportional process matches risk.
+**Evidence:** "Artificial ceremonies, unnecessary artifacts" reduce velocity (AWS AI-DLC). "Differentiated validation based on criticality" (SAFE-AI).
+### 3. Convert Design-Phase Gates to Guardrails (addresses F7)
+**What:** Change brainstorming and PRD from "block" to "warn + allow override." Keep execution-phase quality gates as hard blocks.
+**Why:** Guardrails maintain discipline without blocking progress. The evidence is unanimous that guardrails outperform gates for developer satisfaction.
+**Evidence:** "Guardrails don't restrict freedom; they enable it" (Microsoft Platform Engineering). "Security gates create artificial checkpoints that interrupt workflow" (Built In).
+### 4. Add a Test-After Mode for TDD (addresses F6)
+**What:** Allow `--test-strategy after` that lets the agent write implementation first, then tests. Still enforce test existence and passage before the next batch.
+**Why:** TDD has 15% overhead and strong cultural resistance. "Test after" still achieves quality goals while reducing adoption friction for TDD-skeptical developers.
+**Evidence:** TDD adoption blocked by culture more than technical barriers (State of TDD 2024). The quality gate (test count monotonicity) already enforces test existence regardless of when tests are written.
+### 5. Restructure README for Progressive Disclosure (addresses F3, F4, F14)
+**What:** README: problem (5 lines) -> quickstart (10 lines) -> "what is this" (20 lines) -> link to full docs. Move architecture, competitive mode, and advanced features to separate docs.
+**Why:** 519-line README violates every progressive disclosure guideline. Users need to understand the value proposition before the architecture.
+**Evidence:** "Show the most likely command first" (clig.dev). "Limit to core features during onboarding" (NN/g). Progressive disclosure reduces cognitive load (Nielsen, 1995).
+### 6. Add "Next Step" Suggestions to Command Output (addresses F10, F11)
+**What:** After every successful command, print what the user should do next.
+```
+Quality gate PASSED (batch 3/5)
+  Tests: 42 passed | Lesson check: clean | Memory: 8.2GB available
+  Next: Batch 4 will start automatically in 5 seconds.
+  To pause: Ctrl+C (resume with --resume)
+  To skip: --on-failure skip
+```
+**Why:** Reduces documentation dependency. Turns the CLI into a self-teaching tool.
+**Evidence:** "Identify common patterns of use, suggest the next best step" (Zapier CLI). "Include suggestions for how to fix" in errors (clig.dev, Atlassian).
+### 7. Lead Marketing with Trust Features (addresses F9)
+**What:** Reposition the toolkit's value proposition from "autonomous coding pipeline" to "the AI coding pipeline you can actually trust." Lead with quality gates, rollback, verification, and fresh context -- the features that address the trust crisis.
+**Why:** 46% of developers distrust AI tools. 90% use tools they don't fully trust. The toolkit solves the trust problem better than any competitor, but this isn't communicated.
+**Evidence:** Trust declining across all AI tools (Stack Overflow 2025). "Reliability" ranked #7 on developer wishlist for agentic IDEs but is the #1 frustration with current tools (RedMonk 2025).
+### 8. Implement Proportional Quality Gates (addresses F8)
+**What:** Classify batches by risk level (config-only, test-only, new-file, refactoring, integration) and adjust gate intensity. Config changes get lightweight gates; integration changes get full pipeline.
+**Why:** The SAFE-AI framework's strongest recommendation is proportional autonomy. One-size-fits-all gates are the workflow automation equivalent of one-size-fits-all clothing.
+**Evidence:** "Differentiated validation intensity based on operation criticality" (SAFE-AI). Batch-type classification already exists in the toolkit (`classify_batch_type()`) -- this just needs to be wired to gate intensity.
+---
+## Sources
+### Academic and Research Papers
+- [METR: Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity](https://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/) -- RCT with 16 developers, 246 tasks. AI made experienced devs 19% slower.
+- [GitHub Copilot Productivity Study (Peng et al., 2023)](https://arxiv.org/abs/2302.06590) -- RCT showing 55% faster task completion for simple tasks.
+- [SAFE-AI: Rethinking Autonomy in AI-Driven Software Engineering](https://arxiv.org/html/2508.11824v1) -- Framework for autonomy-risk gradient, guardrails, and human-in-the-loop.
+- [TiMi Studio: Impact of AI-Pair Programmers on Code Quality](https://dl.acm.org/doi/10.1145/3665348.3665383) -- Positive impact on quality and satisfaction, but trust and autonomy concerns.
+- [Testability-driven development: An improvement to TDD efficiency](https://www.sciencedirect.com/science/article/abs/pii/S0920548924000461) -- Alternative to strict TDD with similar quality outcomes.
+### Industry Surveys
+- [Stack Overflow 2025 Developer Survey](https://survey.stackoverflow.co/2025/) -- N=65k+. Trust at all-time low (29%), adoption at all-time high (84%).
+- [Stack Overflow: Developers remain willing but reluctant to use AI](https://stackoverflow.blog/2025/12/29/developers-remain-willing-but-reluctant-to-use-ai-the-2025-developer-survey-results-are-here/)
+- [JetBrains State of Developer Ecosystem 2025](https://blog.jetbrains.com/research/2025/10/state-of-developer-ecosystem-2025/)
+- [The State of TDD 2024](https://thestateoftdd.org/results/2024) -- Biggest barriers: culture, individual preference, time pressure.
+- [Qodo: State of AI Code Quality 2025](https://www.qodo.ai/reports/state-of-ai-code-quality/) -- 96% don't trust AI code is correct; fewer than half review it.
+- [BayTech: The AI Trust Paradox in Software Development 2025](https://www.baytechconsulting.com/blog/the-ai-trust-paradox-software-development-2025) -- 90% use tools they don't fully trust.
+### Developer Tool Analysis
+- [RedMonk: 10 Things Developers Want from Agentic IDEs in 2025](https://redmonk.com/kholterhoff/2025/12/22/10-things-developers-want-from-their-agentic-ides-in-2025/) -- Background agents, persistent memory, reliability, human-in-the-loop, skills.
+- [Evil Martians: 6 Things Developer Tools Must Have for Trust and Adoption](https://evilmartians.com/chronicles/six-things-developer-tools-must-have-to-earn-trust-and-adoption) -- Discoverability, performance, maintenance signals.
+- [Catchy Agency: What 202 Open Source Developers Taught Us About Tool Adoption](https://www.catchyagency.com/post/what-202-open-source-developers-taught-us-about-tool-adoption) -- 34.7% abandon on difficult setup.
+- [Atlassian Developer Experience Report 2024](https://www.atlassian.com/blog/developer/developer-experience-report-2024) -- 97% lose time to inefficiencies.
+### Tool-Specific Reviews
+- [Cursor Reviews and Friction Points](https://blog.enginelabs.ai/cursor-ai-an-in-depth-review) -- UI instability, security concerns, codebase context strength.
+- [Devin AI Review: Testing Results](https://trickle.so/blog/devin-ai-review) -- 3/20 success rate, 15-min iteration cycles.
+- [Cognition: Devin's 2025 Performance Review](https://cognition.ai/blog/devin-annual-performance-review-2025) -- Learnings from 18 months of agents.
+- [Aider: Terminal-Based Code Assistant Review](https://www.blott.com/blog/post/aider-review-a-developers-month-with-this-terminal-based-code-assistant) -- "Precision tool," developer control emphasis.
+- [Anthropic: Claude Code Best Practices](https://www.anthropic.com/engineering/claude-code-best-practices) -- Feedback loops, context management, reviewer mode.
+- [MIT Technology Review: AI Coding is Everywhere, Not Everyone Convinced](https://www.technologyreview.com/2025/12/15/1128352/rise-of-ai-coding-developers-2026/)
+### UX and Design Principles
+- [Nielsen Norman Group: Progressive Disclosure](https://www.nngroup.com/articles/progressive-disclosure/) -- Core principle since 1995.
+- [Interaction Design Foundation: Progressive Disclosure](https://www.interaction-design.org/literature/topics/progressive-disclosure)
+- [clig.dev: Command Line Interface Guidelines](https://clig.dev/) -- Comprehensive CLI UX patterns.
+- [Atlassian: 10 Design Principles for Delightful CLIs](https://www.atlassian.com/blog/it-teams/10-design-principles-for-delightful-clis)
+- [FullStory: What is User Friction?](https://www.fullstory.com/blog/user-friction/) -- Three types: cognitive, interaction, emotional.
+- [Userpilot: Progressive Disclosure Examples](https://userpilot.com/blog/progressive-disclosure-examples/)
+### Platform Engineering and Workflow Research
+- [AWS: Adaptive Workflows for AI-Driven Development Life Cycle](https://aws.amazon.com/blogs/devops/open-sourcing-adaptive-workflows-for-ai-driven-development-life-cycle-ai-dlc/) -- Rigid workflows reduce productivity.
+- [Microsoft: Self-Service with Guardrails](https://learn.microsoft.com/en-us/platform-engineering/about/self-service) -- Guardrails enable freedom.
+- [Built In: Developer-First Security Is About Guardrails, Not Gates](https://builtin.com/articles/developer-first-security-guardrails)
+- [Avyka: Why Guardrails > Gates in Modern Software Delivery](https://www.avyka.com/post/why-guardrails-gates-in-modern-software-delivery)
+- [HashiCorp: Reducing Cognitive Load and Process Friction](https://www.hashicorp.com/en/blog/3-ways-engineering-leaders-can-reduce-cognitive-load-and-process-friction)
+### Onboarding and Time-to-Value
+- [Cortex: Developer Onboarding Guide 2025](https://www.cortex.io/post/developer-onboarding-guide)
+- [Appcues: How to Shorten Time to Value](https://www.appcues.com/blog/time-to-value)
+- [Baremetrics: Time to Value (TTV)](https://baremetrics.com/academy/time-to-value-ttv)
+- [Lucas F. Costa: UX Patterns for CLI Tools](https://www.lucasfcosta.com/blog/ux-patterns-cli-tools)
+- [Zapier: Best Practices Building a CLI Tool](https://zapier.com/engineering/how-to-cli/)
+### Developer Discussions
+- [Hacker News: The Friction Between AI Coding Agents and Developer Flow](https://news.ycombinator.com/item?id=46950154)
+- [Hacker News: Professional Software Developers Don't Vibe, They Control](https://news.ycombinator.com/item?id=46437391)
+- [Hacker News: AI Agents -- Less Capability, More Reliability, Please](https://news.ycombinator.com/item?id=43535653)
+- [IT Pro: Nearly Half of Developers Don't Check AI-Generated Code](https://www.itpro.com/software/development/software-developers-not-checking-ai-generated-code-verification-debt)
+- [Stack Overflow Blog: Closing the AI Trust Gap for Developers](https://stackoverflow.blog/2026/02/18/closing-the-developer-ai-trust-gap)