npm - claude-code-pilot - Versions diffs - 3.0.0 → 3.1.1 - Mend

claude-code-pilot 3.0.0 → 3.1.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (124)

package/README.md +76 -97
package/bin/install.js +14 -14
package/manifest.json +1 -1
package/package.json +17 -5
package/src/agents/doc-updater.md +1 -1
package/src/agents/gan-evaluator.md +209 -0
package/src/agents/gan-generator.md +131 -0
package/src/agents/gan-planner.md +99 -0
package/src/agents/harness-optimizer.md +35 -0
package/src/agents/loop-operator.md +36 -0
package/src/agents/opensource-forker.md +198 -0
package/src/agents/opensource-packager.md +249 -0
package/src/agents/opensource-sanitizer.md +188 -0
package/src/agents/performance-optimizer.md +446 -0
package/src/available-rules/README.md +1 -1
package/src/commands/{aside.md → ccp/aside.md} +14 -13
package/src/commands/{build-fix.md → ccp/build-fix.md} +5 -0
package/src/commands/{checkpoint.md → ccp/checkpoint.md} +12 -7
package/src/commands/{code-review.md → ccp/code-review.md} +5 -0
package/src/commands/{context-budget.md → ccp/context-budget.md} +2 -1
package/src/commands/{cpp-build.md → ccp/cpp-build.md} +6 -5
package/src/commands/{cpp-review.md → ccp/cpp-review.md} +7 -6
package/src/commands/{cpp-test.md → ccp/cpp-test.md} +6 -5
package/src/commands/ccp/docs-update.md +48 -0
package/src/commands/{docs.md → ccp/docs.md} +4 -3
package/src/commands/{e2e.md → ccp/e2e.md} +7 -6
package/src/commands/{eval.md → ccp/eval.md} +10 -5
package/src/commands/{evolve.md → ccp/evolve.md} +3 -3
package/src/commands/{go-build.md → ccp/go-build.md} +6 -5
package/src/commands/{go-review.md → ccp/go-review.md} +7 -6
package/src/commands/{go-test.md → ccp/go-test.md} +6 -5
package/src/commands/{gradle-build.md → ccp/gradle-build.md} +1 -0
package/src/commands/{harness-audit.md → ccp/harness-audit.md} +6 -1
package/src/commands/{kotlin-build.md → ccp/kotlin-build.md} +6 -5
package/src/commands/{kotlin-review.md → ccp/kotlin-review.md} +7 -6
package/src/commands/{kotlin-test.md → ccp/kotlin-test.md} +6 -5
package/src/commands/{learn.md → ccp/learn.md} +7 -2
package/src/commands/{model-route.md → ccp/model-route.md} +6 -1
package/src/commands/{orchestrate.md → ccp/orchestrate.md} +4 -3
package/src/commands/{plan.md → ccp/plan.md} +6 -5
package/src/commands/ccp/profile-user.md +46 -0
package/src/commands/{prompt-optimize.md → ccp/prompt-optimize.md} +3 -2
package/src/commands/{prune.md → ccp/prune.md} +4 -4
package/src/commands/{python-review.md → ccp/python-review.md} +7 -6
package/src/commands/{quality-gate.md → ccp/quality-gate.md} +6 -1
package/src/commands/{refactor-clean.md → ccp/refactor-clean.md} +5 -0
package/src/commands/{resume-session.md → ccp/resume-session.md} +9 -8
package/src/commands/ccp/review.md +37 -0
package/src/commands/{rules-distill.md → ccp/rules-distill.md} +2 -1
package/src/commands/{rust-build.md → ccp/rust-build.md} +6 -5
package/src/commands/{rust-review.md → ccp/rust-review.md} +7 -6
package/src/commands/{rust-test.md → ccp/rust-test.md} +6 -5
package/src/commands/{save-session.md → ccp/save-session.md} +2 -1
package/src/commands/ccp/secure-phase.md +35 -0
package/src/commands/{sessions.md → ccp/sessions.md} +29 -24
package/src/commands/{setup-pm.md → ccp/setup-pm.md} +1 -0
package/src/commands/{setup-refresh.md → ccp/setup-refresh.md} +4 -3
package/src/commands/{setup.md → ccp/setup.md} +24 -23
package/src/commands/{skill-create.md → ccp/skill-create.md} +8 -8
package/src/commands/{skill-health.md → ccp/skill-health.md} +5 -5
package/src/commands/{tdd.md → ccp/tdd.md} +9 -8
package/src/commands/{test-coverage.md → ccp/test-coverage.md} +5 -0
package/src/commands/{tool-guide.md → ccp/tool-guide.md} +2 -1
package/src/commands/{update-codemaps.md → ccp/update-codemaps.md} +5 -0
package/src/commands/{update-docs.md → ccp/update-docs.md} +5 -0
package/src/commands/{verify.md → ccp/verify.md} +5 -0
package/src/commands/ccp/workstreams.md +68 -0
package/src/examples/CLAUDE.md +4 -4
package/src/examples/django-api-CLAUDE.md +5 -5
package/src/examples/go-microservice-CLAUDE.md +6 -6
package/src/examples/rust-api-CLAUDE.md +4 -4
package/src/examples/saas-nextjs-CLAUDE.md +8 -8
package/src/hooks/session-start.js +1 -1
package/src/pilot/references/mcp-servers.json +1 -1
package/src/pilot/workflows/docs-update.md +1165 -0
package/src/pilot/workflows/help.md +48 -56
package/src/pilot/workflows/profile-user.md +452 -0
package/src/pilot/workflows/review.md +244 -0
package/src/pilot/workflows/secure-phase.md +164 -0
package/src/rules/common/code-review.md +124 -0
package/src/rules/zh/README.md +108 -0
package/src/rules/zh/agents.md +50 -0
package/src/rules/zh/code-review.md +124 -0
package/src/rules/zh/coding-style.md +48 -0
package/src/rules/zh/development-workflow.md +44 -0
package/src/rules/zh/git-workflow.md +24 -0
package/src/rules/zh/hooks.md +30 -0
package/src/rules/zh/patterns.md +31 -0
package/src/rules/zh/performance.md +55 -0
package/src/rules/zh/security.md +29 -0
package/src/rules/zh/testing.md +29 -0
package/src/skills/autonomous-agent-harness/SKILL.md +267 -0
package/src/skills/autonomous-loops/SKILL.md +610 -0
package/src/skills/bun-runtime/SKILL.md +84 -0
package/src/skills/content-hash-cache-pattern/SKILL.md +161 -0
package/src/skills/context-budget/SKILL.md +3 -3
package/src/skills/continuous-learning-v2/SKILL.md +4 -4
package/src/skills/continuous-learning-v2/agents/observer.md +1 -1
package/src/skills/cost-aware-llm-pipeline/SKILL.md +183 -0
package/src/skills/design-system/SKILL.md +82 -0
package/src/skills/eval-harness/SKILL.md +270 -0
package/src/skills/flutter-dart-code-review/SKILL.md +435 -0
package/src/skills/gan-style-harness/SKILL.md +278 -0
package/src/skills/git-workflow/SKILL.md +715 -0
package/src/skills/hexagonal-architecture/SKILL.md +276 -0
package/src/skills/iterative-retrieval/SKILL.md +211 -0
package/src/skills/laravel-plugin-discovery/SKILL.md +229 -0
package/src/skills/nextjs-turbopack/SKILL.md +44 -0
package/src/skills/nuxt4-patterns/SKILL.md +100 -0
package/src/skills/opensource-pipeline/SKILL.md +255 -0
package/src/skills/perl-security/SKILL.md +503 -0
package/src/skills/project-flow-ops/SKILL.md +111 -0
package/src/skills/project-guidelines-example/SKILL.md +349 -0
package/src/skills/prompt-optimizer/SKILL.md +38 -38
package/src/skills/pytorch-patterns/SKILL.md +396 -0
package/src/skills/regex-vs-llm-structured-text/SKILL.md +220 -0
package/src/skills/repo-scan/SKILL.md +78 -0
package/src/skills/rules-distill/SKILL.md +264 -0
package/src/skills/rules-distill/scripts/scan-rules.sh +58 -0
package/src/skills/rules-distill/scripts/scan-skills.sh +129 -0
package/src/skills/swift-concurrency-6-2/SKILL.md +216 -0
package/src/skills/token-budget-advisor/SKILL.md +133 -0
package/src/skills/verification-loop/SKILL.md +1 -1
package/src/skills/workspace-surface-audit/SKILL.md +125 -0

package/README.md CHANGED

@@ -1,11 +1,6 @@
 # Claude Code Pilot
-Universal Claude Code configuration for **any project**. Bundles two complementary systems:
-- **GSD** (Get Shit Done) v1.22.4 — *Methodology*. Spec-driven development with fresh 200k contexts per task. Prevents context rot.
-- **ECC** (Everything Claude Code) v1.8.0 — *Toolbox*. Session persistence, continuous learning, verification, quality gates. Prevents knowledge loss.
-One command to install. `/setup` wizard auto-configures everything.
+Universal Claude Code configuration for any project. One-command install, auto-setup wizard, 101 commands, 51 agents, 87 skills.
 ## Install
@@ -13,117 +8,99 @@ One command to install. `/setup` wizard auto-configures everything.
 cd your-project
 npx claude-code-pilot
 claude
-> /setup
+> /ccp:setup
 ```
+The `/ccp:setup` wizard scans your codebase, generates a tailored `CLAUDE.md`, detects your language, and activates the matching rule set.
 ## The Workflow
 ```
-npx claude-code-pilot            ← install (once)
-claude → /setup                         ← wizard scans codebase, generates CLAUDE.md
+npx claude-code-pilot              <- install (once)
+claude -> /ccp:setup               <- wizard scans codebase, generates CLAUDE.md
 # Development
-/gsd:discuss-phase 1                    ← GSD: capture requirements
-/gsd:plan-phase 1                       ← GSD: atomic plans
-/checkpoint create "before-impl"        ← ECC: snapshot
-/gsd:execute-phase 1                    ← GSD: build with fresh contexts
-/verify                                 ← ECC: build → types → lint → tests → security
-/gsd:verify-work 1                      ← GSD: does it match the spec?
-/save-session                           ← ECC: persist for next session
-/learn                                  ← ECC: extract reusable patterns
+/ccp:discuss-phase 1               <- capture requirements
+/ccp:plan-phase 1                  <- atomic plans
+/ccp:checkpoint create "before"    <- snapshot
+/ccp:execute-phase 1               <- build with fresh contexts
+/ccp:verify                        <- build, types, lint, tests, security
+/ccp:verify-work 1                 <- does it match the spec?
+/ccp:save-session                  <- persist for next session
+/ccp:learn                         <- extract reusable patterns
 # Next session
-/resume-session                         ← ECC: load previous context
-/evolve                                 ← ECC: promote instincts → skills
-/kit:update                             ← update everything
+/ccp:resume-session                <- load previous context
+/ccp:evolve                        <- promote instincts to skills
+/ccp:update                        <- update everything
 ```
-## What's Included (254 files after install)
+## What's Included (~494 files after install)
+| Category | Count | Description |
+|----------|-------|-------------|
+| `/ccp:` commands | 101 | Development lifecycle, quality gates, session management, learning |
+| Agents | 51 | Planning, execution, review, testing, security, documentation |
+| Skill packs | 87 | Continuous learning, verification, strategic compaction, and more |
+| Hooks | 15 | 14 event + 1 statusLine: safety guards, context management, notifications, session persistence |
+| Common rules | 10 | Always active (coding style, security, testing, performance, etc.) |
+| Chinese language rules | 11 | Always active (zh/ translations of common rules) |
+| Language-specific rule sets | 11 | Activated by `/ccp:setup` (see Language Rules below) |
+| Context modes | 3 | dev, research, review |
+| Reference examples | 6 | CLAUDE.md examples for different project types |
+### Command Highlights
-### GSD — Methodology (32 commands, 12 agents)
-| Command | What |
-|---------|------|
-| `/gsd:new-project` | Initialize with roadmap and milestones |
-| `/gsd:discuss-phase N` | Capture requirements |
-| `/gsd:plan-phase N` | Atomic plans (max 3 tasks per subagent) |
-| `/gsd:execute-phase N` | Build in fresh 200k contexts |
-| `/gsd:verify-work N` | Goal-backward verification |
-| `/gsd:quick "task"` | Same quality, lighter process |
-| `/gsd:progress` | Milestone progress |
-| `/gsd:help` | All 32 commands |
-12 specialized subagents: executor, planner, verifier, debugger, codebase-mapper, roadmapper, etc.
-### ECC — Toolbox (9 commands, 6 agents, 3 skills)
 | Command | What |
 |---------|------|
-| `/verify` | 6-phase quality check (build, types, lint, tests, security, diff) |
-| `/checkpoint` | Git-backed snapshots with before/after comparison |
-| `/save-session` | Persist what worked, what didn't, exact next step |
-| `/resume-session` | Load previous session context |
-| `/learn` | Extract reusable patterns from session |
-| `/evolve` | Promote learned instincts into skills/commands/agents |
-| `/quality-gate` | On-demand quality pipeline |
-| `/model-route` | Recommend model tier (Haiku/Sonnet/Opus) |
-| `/sessions` | List, load, alias saved sessions |
-6 agents: architect, code-reviewer, security-reviewer, tdd-guide, e2e-runner, doc-updater.
-3 skills: continuous-learning-v2 (instinct system), strategic-compact, verification-loop.
-### Kit — Infrastructure
-| Feature | What |
-|---------|------|
-| `/setup` wizard | Scans codebase, generates CLAUDE.md, installs language rules |
-| `/setup:refresh` | Updates config after project changes |
-| `/kit:update` | Updates kit + GSD + ECC |
-| Safety hooks | Blocks rm -rf, DROP TABLE, push --force; protects .env/lock files |
-| Notifications | Desktop alerts (task complete, idle, permission needed) |
-| Context modes | dev.md, research.md, review.md |
-| Language rules | 9 common + 7 language-specific sets (activated by /setup) |
-## How GSD and ECC Work Together
-| | GSD | ECC |
-|---|---|---|
-| **Purpose** | Structure *how* to work | Ensure *quality* of work |
-| **Core** | Phases, milestones, subagents | Verification, learning, persistence |
-| **Verification** | "Does it match the spec?" | "Is it production-ready?" |
-| **Context** | Fresh 200k per task | Strategic compact, pre-compact save |
-| **Knowledge** | Plans in files | Instincts that evolve into skills |
-| **Persistence** | Atomic commits | Session files with full context |
-## Hooks (13 total)
-| Hook | Event | Source |
-|------|-------|--------|
-| Command blocker | PreToolUse:Bash | Kit |
-| File protection | PreToolUse:Write\|Edit | Kit |
-| Context monitor | PostToolUse | GSD |
-| GSD update check | SessionStart | GSD |
-| Kit update check | SessionStart | Kit |
-| Session loader | SessionStart | ECC |
-| Session saver | Stop | ECC |
-| Pre-compact save | PreCompact | ECC |
-| Strategic compact | PreToolUse:Edit\|Write | ECC |
-| Task completed | Stop | Kit |
-| Idle prompt | Notification | Kit |
-| Permission needed | Notification | Kit |
-| Statusline | Always | GSD |
+| `/ccp:setup` | Wizard: scan codebase, generate CLAUDE.md, activate language rules |
+| `/ccp:discuss-phase N` | Capture requirements for a development phase |
+| `/ccp:plan-phase N` | Create atomic plans (max 3 tasks per subagent) |
+| `/ccp:execute-phase N` | Build in fresh 200k contexts, preventing context rot |
+| `/ccp:verify` | 6-phase quality check: build, types, lint, tests, security, diff |
+| `/ccp:checkpoint` | Git-backed snapshots with before/after comparison |
+| `/ccp:save-session` | Persist what worked, what didn't, exact next step |
+| `/ccp:resume-session` | Load previous session context |
+| `/ccp:learn` | Extract reusable patterns from session |
+| `/ccp:evolve` | Promote learned instincts into skills, commands, or agents |
+Run `/ccp:help` for the full list of all 101 commands.
+## Hooks (15 total)
+14 event hooks and 1 statusLine, all registered automatically during install.
+| Hook | Event | Description |
+|------|-------|-------------|
+| Command blocker | PreToolUse:Bash | Blocks dangerous commands (rm -rf, DROP TABLE, push --force) |
+| File protection | PreToolUse:Write/Edit | Protects .env, lock files, and other sensitive files |
+| Context monitor | PostToolUse | Tracks context window usage |
+| Update check | SessionStart | Checks npm for new CCP versions |
+| Session loader | SessionStart | Loads previous session context |
+| Session saver | Stop | Persists session state on exit |
+| Pre-compact save | PreCompact | Saves context before compaction |
+| Strategic compact | PreToolUse:Edit/Write | Manages context strategically |
+| Task completed | Stop | Desktop notification on task completion |
+| Idle prompt | Notification | Desktop alert when idle |
+| Permission needed | Notification | Desktop alert when permission required |
+| Auto-format | PostToolUse:Write/Edit | Formats code after writes |
+| Agent shield | PreToolUse | Agent safety guardrails |
+| Quality gate | Stop | Final quality verification |
+| Statusline | Always | Shows project status in Claude Code status bar |
 ## Language Rules
-The installer copies 9 common rules (always active) and 7 language-specific rule sets into `available-rules/`. The `/setup` wizard detects your language and activates the matching set.
+10 common rules are always active. 11 language-specific rule sets are available in `available-rules/` and activated by the `/ccp:setup` wizard when it detects your project language.
-Available: TypeScript, Python, Go, Swift, Kotlin, PHP, Perl.
+Available language rule sets: TypeScript, Python, Go, Swift, Kotlin, PHP, Perl, Java, C++, C#, Rust.
 ## Self-Updating
-On every session start, the kit checks npm for new versions. When available, the agent sees: *"⬆️ Pilot update available. Run /kit:update."*
+On every session start, CCP checks npm for new versions. When available, the agent sees a notification prompting an update.
 ```bash
 # From Claude Code:
-/kit:update
+/ccp:update
 # From terminal:
 npx claude-code-pilot@latest --update
@@ -145,7 +122,9 @@ npx claude-code-pilot [options]
 ## Credits
-- **GSD** by [TÂCHES](https://github.com/glittercowboy/get-shit-done) (MIT)
-- **ECC** by [Affaan Mustafa](https://github.com/affaan-m/everything-claude-code) (MIT)
-- **find-skills** by [Vercel Labs](https://github.com/vercel-labs/skills)
-- **Context7** by [Upstash](https://github.com/upstash/context7)
+Built on the shoulders of:
+- **GSD** by [TACHES](https://github.com/glittercowboy/get-shit-done) (MIT) -- spec-driven development methodology
+- **ECC** by [Affaan Mustafa](https://github.com/affaan-m/everything-claude-code) (MIT) -- session persistence and learning toolbox
+- **find-skills** by [Vercel Labs](https://github.com/vercel-labs/skills) -- skill discovery
+- **Context7** by [Upstash](https://github.com/upstash/context7) -- documentation context
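
The "Command blocker" row in the new hooks table describes a `PreToolUse:Bash` check that rejects commands such as `rm -rf`, `DROP TABLE`, and `push --force`. A minimal sketch of that kind of check follows, assuming a plain regex deny-list. `DENY_PATTERNS` and `findBlockedPattern` are hypothetical names, and the hook shipped in the package may be implemented differently.

```javascript
// Hypothetical deny-list; the three entries mirror the examples named in the README table.
const DENY_PATTERNS = [
  { label: 'rm -rf', regex: /\brm\s+-(?:rf|fr)\b/ },
  { label: 'DROP TABLE', regex: /\bdrop\s+table\b/i },
  { label: 'push --force', regex: /\bgit\s+push\b.*--force\b/ },
];

// Returns the label of the first matching pattern, or null if the command looks safe.
function findBlockedPattern(command) {
  const hit = DENY_PATTERNS.find((pattern) => pattern.regex.test(command));
  return hit ? hit.label : null;
}

console.log(findBlockedPattern('git push origin main --force'));
console.log(findBlockedPattern('git push origin main'));
```

A deny-list like this only catches the spellings it lists (for example, it would miss `rm -r -f`), so it is best read as an illustration of the idea rather than a complete guard.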

package/bin/install.js CHANGED

@@ -42,24 +42,24 @@ const targetDir = isGlobal
   ? (process.env.CLAUDE_CONFIG_DIR || path.join(os.homedir(), '.claude'))
   : path.join(process.cwd(), '.claude');
 const locationLabel = isGlobal ? targetDir.replace(os.homedir(), '~') : '.claude/';
-const pathPrefix = isGlobal ? `${targetDir.replace(/\\/g, '/')}/` : './.claude/';
+const pathPrefix = `${targetDir.replace(/\\/g, '/')}/`;
 const projectRoot = isGlobal ? os.homedir() : process.cwd();
 // --- Templates ---
-const BOOTSTRAP_MARKER = "Run `/setup` to auto-configure";
+const BOOTSTRAP_MARKER = "Run `/ccp:setup` to auto-configure";
 const BOOTSTRAP_TEMPLATE = `# Project Agent Guide
-> **First time?** Run \`/setup\` to auto-configure this project.
+> **First time?** Run \`/ccp:setup\` to auto-configure this project.
 > **Update?** Run \`/ccp:update\` to get the latest version.
-This file will be generated by \`/setup\` with project-specific details.
+This file will be generated by \`/ccp:setup\` with project-specific details.
 ## Claude Code Pilot
 Spec-driven development with fresh 200k contexts per task:
-- \`/setup\` -- Auto-configure this project
+- \`/ccp:setup\` -- Auto-configure this project
 - \`/ccp:new-project\` -- Initialize with roadmap and milestones
 - \`/ccp:quick "task"\` -- Lightweight mode for small tasks
 - \`/ccp:discuss-phase N\` -- Capture requirements
@@ -69,10 +69,10 @@ Spec-driven development with fresh 200k contexts per task:
 ## Utilities
-- \`/verify\` -- Multi-phase verification (build, types, lint, tests, security)
-- \`/checkpoint create "name"\` -- Git-backed snapshots
-- \`/save-session\` -- Persist session state
-- \`/learn\` -- Extract reusable patterns
+- \`/ccp:verify\` -- Multi-phase verification (build, types, lint, tests, security)
+- \`/ccp:checkpoint create "name"\` -- Git-backed snapshots
+- \`/ccp:save-session\` -- Persist session state
+- \`/ccp:learn\` -- Extract reusable patterns
 ## Before ANY Work
@@ -409,7 +409,7 @@ function install() {
   ${fileCount} files, ${hookCount} hooks, ${totalAgents} agents
   Next:
-    ${cyan}/setup${reset}            -- auto-configure this project
+    ${cyan}/ccp:setup${reset}        -- auto-configure this project
   Build:
     ${cyan}/ccp:new-project${reset}  -- initialize with roadmap
@@ -418,10 +418,10 @@ function install() {
     ${cyan}/ccp:help${reset}         -- all commands
   Utilities:
-    ${cyan}/save-session${reset}     -- persist session state
-    ${cyan}/verify${reset}           -- multi-phase verification
-    ${cyan}/checkpoint create "v1"${reset} -- git-backed snapshot
-    ${cyan}/learn${reset}            -- extract reusable patterns
+    ${cyan}/ccp:save-session${reset} -- persist session state
+    ${cyan}/ccp:verify${reset}       -- multi-phase verification
+    ${cyan}/ccp:checkpoint create "v1"${reset} -- git-backed snapshot
+    ${cyan}/ccp:learn${reset}        -- extract reusable patterns
 `);
 }
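
The first hunk above drops the `./.claude/` special case, so `pathPrefix` is now always derived from `targetDir`. The sketch below isolates that expression to show its effect on Windows-style and POSIX-style paths. `toPathPrefix` is a hypothetical helper name, and the sample paths are made up for illustration.

```javascript
// Hypothetical helper wrapping the 3.1.1 expression from bin/install.js.
function toPathPrefix(targetDir) {
  // Convert backslashes to forward slashes, then append a trailing slash.
  return `${targetDir.replace(/\\/g, '/')}/`;
}

console.log(toPathPrefix('C:\\Users\\dev\\project\\.claude')); // C:/Users/dev/project/.claude/
console.log(toPathPrefix('/home/dev/project/.claude'));        // /home/dev/project/.claude/
```

Because `targetDir` for a local install is `path.join(process.cwd(), '.claude')`, local installs in 3.1.1 get an absolute prefix where 3.0.0 used the relative `./.claude/`.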

package/manifest.json CHANGED

@@ -1,5 +1,5 @@
 {
-  "version": "3.0.0",
+  "version": "3.1.1",
   "external": {
     "find-skills": {
       "install": "npx skills add https://github.com/vercel-labs/skills --skill find-skills -g -a claude-code -y",

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "claude-code-pilot",
-  "version": "3.0.0",
+  "version": "3.1.1",
   "description": "Claude Code Pilot -- universal AI coding companion with spec-driven development, session persistence, and auto-setup.",
   "bin": {
     "claude-code-pilot": "bin/install.js"
@@ -12,9 +12,19 @@
     "manifest.json"
   ],
   "keywords": [
-    "claude", "claude-code", "claude-code-pilot", "ai", "coding-agent",
-    "pilot", "spec-driven", "skills", "mcp",
-    "context-engineering", "continuous-learning", "tdd", "code-review"
+    "claude",
+    "claude-code",
+    "claude-code-pilot",
+    "ai",
+    "coding-agent",
+    "pilot",
+    "spec-driven",
+    "skills",
+    "mcp",
+    "context-engineering",
+    "continuous-learning",
+    "tdd",
+    "code-review"
   ],
   "author": "Othmane Hanyf <othmane.hanyf@ahven.ma>",
   "license": "MIT",
@@ -26,7 +36,9 @@
   "bugs": {
     "url": "https://github.com/OthmaneHanyf/claude-code-pilot/issues"
   },
-  "engines": { "node": ">=16.7.0" },
+  "engines": {
+    "node": ">=16.7.0"
+  },
   "scripts": {
     "test": "node test/run.js",
     "prepublishOnly": "node scripts/prepublish-checks.js"

package/src/agents/doc-updater.md CHANGED

@@ -1,6 +1,6 @@
 ---
 name: doc-updater
-description: Documentation and codemap specialist. Use PROACTIVELY for updating codemaps and documentation. Runs /update-codemaps and /update-docs, generates docs/CODEMAPS/*, updates READMEs and guides.
+description: Documentation and codemap specialist. Use PROACTIVELY for updating codemaps and documentation. Runs /ccp:update-codemaps and /ccp:update-docs, generates docs/CODEMAPS/*, updates READMEs and guides.
 tools: ["Read", "Write", "Edit", "Bash", "Grep", "Glob"]
 model: haiku
 ---

package/src/agents/gan-evaluator.md ADDED

@@ -0,0 +1,209 @@
+---
+name: gan-evaluator
+description: "GAN Harness — Evaluator agent. Tests the live running application via Playwright, scores against rubric, and provides actionable feedback to the Generator."
+tools: ["Read", "Write", "Bash", "Grep", "Glob"]
+model: opus
+color: red
+---
+You are the **Evaluator** in a GAN-style multi-agent harness (inspired by Anthropic's harness design paper, March 2026).
+## Your Role
+You are the QA Engineer and Design Critic. You test the **live running application** — not the code, not a screenshot, but the actual interactive product. You score it against a strict rubric and provide detailed, actionable feedback.
+## Core Principle: Be Ruthlessly Strict
+> You are NOT here to be encouraging. You are here to find every flaw, every shortcut, every sign of mediocrity. A passing score must mean the app is genuinely good — not "good for an AI."
+**Your natural tendency is to be generous.** Fight it. Specifically:
+- Do NOT say "overall good effort" or "solid foundation" — these are cope
+- Do NOT talk yourself out of issues you found ("it's minor, probably fine")
+- Do NOT give points for effort or "potential"
+- DO penalize heavily for AI-slop aesthetics (generic gradients, stock layouts)
+- DO test edge cases (empty inputs, very long text, special characters, rapid clicking)
+- DO compare against what a professional human developer would ship
+## Evaluation Workflow
+### Step 1: Read the Rubric
+```
+Read gan-harness/eval-rubric.md for project-specific criteria
+Read gan-harness/spec.md for feature requirements
+Read gan-harness/generator-state.md for what was built
+```
+### Step 2: Launch Browser Testing
+```bash
+# The Generator should have left a dev server running
+# Use Playwright MCP to interact with the live app
+# Navigate to the app
+playwright navigate http://localhost:${GAN_DEV_SERVER_PORT:-3000}
+# Take initial screenshot
+playwright screenshot --name "initial-load"
+```
+### Step 3: Systematic Testing
+#### A. First Impression (30 seconds)
+- Does the page load without errors?
+- What's the immediate visual impression?
+- Does it feel like a real product or a tutorial project?
+- Is there a clear visual hierarchy?
+#### B. Feature Walk-Through
+For each feature in the spec:
+```
+1. Navigate to the feature
+2. Test the happy path (normal usage)
+3. Test edge cases:
+   - Empty inputs
+   - Very long inputs (500+ characters)
+   - Special characters (<script>, emoji, unicode)
+   - Rapid repeated actions (double-click, spam submit)
+4. Test error states:
+   - Invalid data
+   - Network-like failures
+   - Missing required fields
+5. Screenshot each state
+```
+#### C. Design Audit
+```
+1. Check color consistency across all pages
+2. Verify typography hierarchy (headings, body, captions)
+3. Test responsive: resize to 375px, 768px, 1440px
+4. Check spacing consistency (padding, margins)
+5. Look for:
+   - AI-slop indicators (generic gradients, stock patterns)
+   - Alignment issues
+   - Orphaned elements
+   - Inconsistent border radiuses
+   - Missing hover/focus/active states
+```
+#### D. Interaction Quality
+```
+1. Test all clickable elements
+2. Check keyboard navigation (Tab, Enter, Escape)
+3. Verify loading states exist (not instant renders)
+4. Check transitions/animations (smooth? purposeful?)
+5. Test form validation (inline? on submit? real-time?)
+```
+### Step 4: Score
+Score each criterion on a 1-10 scale. Use the rubric in `gan-harness/eval-rubric.md`.
+**Scoring calibration:**
+- 1-3: Broken, embarrassing, would not show to anyone
+- 4-5: Functional but clearly AI-generated, tutorial-quality
+- 6: Decent but unremarkable, missing polish
+- 7: Good — a junior developer's solid work
+- 8: Very good — professional quality, some rough edges
+- 9: Excellent — senior developer quality, polished
+- 10: Exceptional — could ship as a real product
+**Weighted score formula:**
+```
+weighted = (design * 0.3) + (originality * 0.2) + (craft * 0.3) + (functionality * 0.2)
+```
+### Step 5: Write Feedback
+Write feedback to `gan-harness/feedback/feedback-NNN.md`:
+```markdown
+# Evaluation — Iteration NNN
+## Scores
+| Criterion | Score | Weight | Weighted |
+|-----------|-------|--------|----------|
+| Design Quality | X/10 | 0.3 | X.X |
+| Originality | X/10 | 0.2 | X.X |
+| Craft | X/10 | 0.3 | X.X |
+| Functionality | X/10 | 0.2 | X.X |
+| **TOTAL** | | | **X.X/10** |
+## Verdict: PASS / FAIL (threshold: 7.0)
+## Critical Issues (must fix)
+1. [Issue]: [What's wrong] → [How to fix]
+2. [Issue]: [What's wrong] → [How to fix]
+## Major Issues (should fix)
+1. [Issue]: [What's wrong] → [How to fix]
+## Minor Issues (nice to fix)
+1. [Issue]: [What's wrong] → [How to fix]
+## What Improved Since Last Iteration
+- [Improvement 1]
+- [Improvement 2]
+## What Regressed Since Last Iteration
+- [Regression 1] (if any)
+## Specific Suggestions for Next Iteration
+1. [Concrete, actionable suggestion]
+2. [Concrete, actionable suggestion]
+## Screenshots
+- [Description of what was captured and key observations]
+```
+## Feedback Quality Rules
+1. **Every issue must have a "how to fix"** — Don't just say "design is generic." Say "Replace the gradient background (#667eea→#764ba2) with a solid color from the spec palette. Add a subtle texture or pattern for depth."
+2. **Reference specific elements** — Not "the layout needs work" but "the sidebar cards at 375px overflow their container. Set `max-width: 100%` and add `overflow: hidden`."
+3. **Quantify when possible** — "The CLS score is 0.15 (should be <0.1)" or "3 out of 7 features have no error state handling."
+4. **Compare to spec** — "Spec requires drag-and-drop reordering (Feature #4). Currently not implemented."
+5. **Acknowledge genuine improvements** — When the Generator fixes something well, note it. This calibrates the feedback loop.
+## Browser Testing Commands
+Use Playwright MCP or direct browser automation:
+```bash
+# Navigate
+npx playwright test --headed --browser=chromium
+# Or via MCP tools if available:
+# mcp__playwright__navigate { url: "http://localhost:3000" }
+# mcp__playwright__click { selector: "button.submit" }
+# mcp__playwright__fill { selector: "input[name=email]", value: "test@example.com" }
+# mcp__playwright__screenshot { name: "after-submit" }
+```
+If Playwright MCP is not available, fall back to:
+1. `curl` for API testing
+2. Build output analysis
+3. Screenshot via headless browser
+4. Test runner output
+## Evaluation Mode Adaptation
+### `playwright` mode (default)
+Full browser interaction as described above.
+### `screenshot` mode
+Take screenshots only, analyze visually. Less thorough but works without MCP.
+### `code-only` mode
+For APIs/libraries: run tests, check build, analyze code quality. No browser.
+```bash
+# Code-only evaluation
+npm run build 2>&1 | tee /tmp/build-output.txt
+npm test 2>&1 | tee /tmp/test-output.txt
+npx eslint . 2>&1 | tee /tmp/lint-output.txt
+```
+Score based on: test pass rate, build success, lint issues, code coverage, API response correctness.
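
The weighted score formula in Step 4 of `gan-evaluator.md` and the 7.0 threshold in its feedback template can be worked through with a small sketch. The weights and threshold come from the file above. The function names, the sample scores, and the choice to treat a score equal to the threshold as a pass are assumptions made for illustration.

```javascript
// Weights copied from the formula in Step 4 of gan-evaluator.md.
const WEIGHTS = { design: 0.3, originality: 0.2, craft: 0.3, functionality: 0.2 };

// Threshold from the feedback template ("threshold: 7.0").
const PASS_THRESHOLD = 7.0;

// Sum each 1-10 criterion score multiplied by its weight.
function weightedScore(scores) {
  return Object.entries(WEIGHTS).reduce(
    (sum, [criterion, weight]) => sum + scores[criterion] * weight,
    0
  );
}

// Assumption: a total equal to the threshold counts as PASS.
function verdict(scores) {
  return weightedScore(scores) >= PASS_THRESHOLD ? 'PASS' : 'FAIL';
}

// Made-up sample scores: 2.4 + 1.2 + 2.1 + 1.6 = 7.3
const sample = { design: 8, originality: 6, craft: 7, functionality: 8 };
console.log(weightedScore(sample).toFixed(1), verdict(sample)); // 7.3 PASS
```

Since the four weights sum to 1.0, the total stays on the same 1-10 scale as the individual criteria.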