forgedev 1.1.3 → 1.2.0

Files changed (53)
  1. package/README.md +2 -1
  2. package/bin/devforge.js +2 -1
  3. package/docs/00-README.md +310 -0
  4. package/docs/01-universal-prompt-library.md +1049 -0
  5. package/docs/02-claude-code-mastery-playbook.md +283 -0
  6. package/docs/03-multi-agent-verification.md +565 -0
  7. package/docs/04-errata-and-verification-checklist.md +284 -0
  8. package/docs/05-universal-scaffolder-vision.md +452 -0
  9. package/docs/06-confidence-assessment-and-repo-prompt.md +407 -0
  10. package/docs/errata.md +58 -0
  11. package/docs/multi-agent-verification.md +66 -0
  12. package/docs/plans/.gitkeep +0 -0
  13. package/docs/playbook.md +95 -0
  14. package/docs/prompt-library.md +160 -0
  15. package/docs/uat/UAT_CHECKLIST.csv +9 -0
  16. package/docs/uat/UAT_TEMPLATE.md +163 -0
  17. package/package.json +10 -2
  18. package/src/claude-configurator.js +1 -0
  19. package/src/cli.js +5 -5
  20. package/src/index.js +3 -3
  21. package/src/utils.js +1 -1
  22. package/templates/base/docs/plans/.gitkeep +0 -0
  23. package/templates/base/docs/uat/UAT_CHECKLIST.csv.template +2 -0
  24. package/templates/base/docs/uat/UAT_TEMPLATE.md.template +22 -0
  25. package/templates/claude-code/agents/build-error-resolver.md +3 -2
  26. package/templates/claude-code/agents/code-quality-reviewer.md +1 -1
  27. package/templates/claude-code/agents/database-reviewer.md +1 -1
  28. package/templates/claude-code/agents/doc-updater.md +1 -1
  29. package/templates/claude-code/agents/harness-optimizer.md +26 -0
  30. package/templates/claude-code/agents/loop-operator.md +2 -1
  31. package/templates/claude-code/agents/product-strategist.md +124 -0
  32. package/templates/claude-code/agents/security-reviewer.md +1 -0
  33. package/templates/claude-code/agents/spec-validator.md +31 -1
  34. package/templates/claude-code/agents/uat-validator.md +4 -0
  35. package/templates/claude-code/claude-md/base.md +1 -0
  36. package/templates/claude-code/claude-md/nextjs.md +1 -1
  37. package/templates/claude-code/commands/code-review.md +7 -1
  38. package/templates/claude-code/commands/full-audit.md +3 -2
  39. package/templates/claude-code/commands/workflows.md +3 -0
  40. package/templates/claude-code/hooks/scripts/autofix-polyglot.mjs +20 -10
  41. package/templates/claude-code/hooks/scripts/autofix-python.mjs +3 -4
  42. package/templates/claude-code/hooks/scripts/autofix-typescript.mjs +3 -3
  43. package/templates/claude-code/hooks/scripts/guard-protected-files.mjs +2 -2
  44. package/templates/claude-code/skills/git-workflow/SKILL.md +2 -2
  45. package/templates/claude-code/skills/nextjs/SKILL.md +1 -1
  46. package/templates/claude-code/skills/playwright/SKILL.md +6 -5
  47. package/templates/claude-code/skills/security-web/SKILL.md +1 -0
  48. package/templates/infra/github-actions/.github/workflows/ci.yml.template +49 -0
  49. package/templates/testing/pytest/backend/tests/__init__.py +0 -0
  50. package/templates/testing/pytest/backend/tests/conftest.py.template +11 -0
  51. package/templates/testing/pytest/backend/tests/test_health.py.template +10 -0
  52. package/templates/testing/vitest/vitest.config.ts.template +18 -0
  53. package/CLAUDE.md +0 -38
package/docs/06-confidence-assessment-and-repo-prompt.md ADDED
@@ -0,0 +1,407 @@
1
+ # Confidence Assessment & The Repo Prompt
2
+
3
+ ---
4
+
5
+ ## Gap Analysis: Failover and UAT
6
+
7
+ ### UAT Coverage: WEAK (3/10)
8
+
9
+ **What's there:**
10
+ - Prompt library mentions "acceptance criteria" in spec writing (Flow 1)
11
+ - Multi-agent doc's spec-validator checks requirements as IMPLEMENTED/PARTIAL/MISSING
12
+ - AI quality auditor checks for graceful degradation
13
+
14
+ **What's MISSING:**
15
+ - No dedicated UAT flow or prompt
16
+ - No UAT scenario template
17
+ - No staging/pre-production verification step
18
+ - No manual testing checklist that pairs with automated tests
19
+ - No "run through the app as a real user" verification step
20
+ - No smoke test protocol after deployment
21
+ - No UAT sign-off gate before marking a feature complete
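One way the missing sign-off gate could work is sketched below. This is an illustration only, not code from the package: `uatGate` is a hypothetical helper, the column layout is assumed to match the `UAT_CHECKLIST.csv` header specified in the prompt later in this file, and the naive comma split assumes no field contains a quoted comma.

```javascript
// Hypothetical sign-off gate: fails while any P0 scenario is not PASS.
// Assumes the header UAT_ID,Feature,Priority,Status,Tester,Date,Defect_ID,Notes
// and fields without embedded commas (naive split).
function uatGate(csvText) {
  const [headerLine, ...rows] = csvText.trim().split(/\r?\n/);
  const header = headerLine.split(",");
  const col = (name) => header.indexOf(name);

  const blocking = rows
    .map((line) => line.split(","))
    .filter((cells) => cells[col("Priority")] === "P0")
    .filter((cells) => cells[col("Status")] !== "PASS")
    .map((cells) => cells[col("UAT_ID")]);

  return { ok: blocking.length === 0, blocking };
}

const sample = [
  "UAT_ID,Feature,Priority,Status,Tester,Date,Defect_ID,Notes",
  "UAT-001,Login,P0,PASS,,,,",
  "UAT-002,Export,P0,NOT RUN,,,,",
  "UAT-003,Theme,P2,FAIL,,,,",
].join("\n");

console.log(uatGate(sample)); // { ok: false, blocking: [ 'UAT-002' ] }
```

Only P0 rows block in this sketch; a lower-priority failure such as `UAT-003` is reported elsewhere but does not hold up sign-off.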
22
+
23
+ ### Failover Coverage: MODERATE (5/10)
24
+
25
+ **What's there:**
26
+ - AI quality auditor checks for graceful fallback (point 7 of the 7-point audit)
27
+ - Production readiness agent checks error handling and recovery
28
+ - Security agent checks for AI unavailability handling
29
+
30
+ **What's MISSING:**
31
+ - No circuit breaker patterns in the scaffolding
32
+ - No health check endpoint generation
33
+ - No retry/backoff strategy guidance
34
+ - No database connection failover
35
+ - No external service timeout configuration
36
+ - No graceful shutdown handling
37
+ - No queue/dead letter patterns for async operations
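To make the retry/backoff gap concrete, here is a minimal sketch of capped exponential backoff. The names `backoffDelay` and `withRetry` and the default timings are assumptions chosen for illustration; a production version would likely add random jitter and only retry errors known to be transient.

```javascript
// Sketch of retry with capped exponential backoff (illustrative names).
// Delay for attempt n (0-based) = min(maxMs, baseMs * 2^n).
function backoffDelay(attempt, baseMs = 200, maxMs = 5000) {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Calls fn until it resolves or `retries` additional attempts are exhausted.
async function withRetry(fn, { retries = 4, baseMs = 200, maxMs = 5000 } = {}) {
  let lastError;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await fn(attempt);
    } catch (err) {
      lastError = err;
      if (attempt < retries) await sleep(backoffDelay(attempt, baseMs, maxMs));
    }
  }
  throw lastError;
}

// Usage: a stand-in for a database connect call that fails twice, then succeeds.
let calls = 0;
withRetry(async () => {
  calls += 1;
  if (calls < 3) throw new Error("connection refused");
  return "connected";
}, { baseMs: 10 }).then((result) => console.log(result, "after", calls, "calls"));
```

The cap (`maxMs`) matters as much as the doubling: without it, a long outage pushes the wait between attempts into minutes.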
38
+
39
+ ---
40
+
41
+ ## Confidence Assessment: Will These Documents Reduce Recurring Issues?
42
+
43
+ | Issue You Described | Coverage | Confidence | Why |
44
+ |--------------------|----------|-----------|-----|
45
+ | "I always have to ask if it's sure many times" | Stop hooks + completion protocol | **8/10** | Hooks are deterministic: Claude cannot report "done" while types or tests are broken. The remaining 20% gap is that hooks only catch what scripts can check; logic errors still need human review. |
46
+ | "Missing things in instructions" | Lean CLAUDE.md (~150 lines) + skills | **7/10** | Cutting from 650+ to 150 lines directly addresses instruction-following degradation. Skills load domain knowledge on demand without bloating every session. The gap: you still need discipline to keep it lean over time. |
47
+ | "Playwright regression issues" | Testing skill + pre-commit hooks + data-testid enforcement | **6/10** | The patterns are sound (no waitForTimeout, semantic locators, independent tests). But Playwright flakiness often comes from app-specific timing issues that no generic pattern can predict. You'll still need to tune for your app. |
48
+ | "Things wouldn't work as intended" | Spec-validator agent + audit-wiring command | **7/10** | Dead feature detection (endpoint exists but nothing calls it) catches the most common "looks done but isn't" problem. The gap: business logic correctness still requires human judgment. |
49
+ | "Quality of AI output" | 7-point prompt audit + AI quality auditor agent | **8/10** | This is actually the strongest area. The concrete data embedding rule alone would have prevented most AI quality issues. The anti-hallucination boundary and confidence scoring are well-established patterns. |
50
+ | "Doesn't group frontend/backend well" | Directory-scoped CLAUDE.md + task separation prompts | **7/10** | Backend-first-then-frontend with /clear between works well. The gap: Claude sometimes still sneaks in cross-domain changes if the task description is ambiguous. |
51
+ | "Waste tokens troubleshooting" | Atomic tasks + /clear + plan mode | **7/10** | Smaller tasks = less context degradation = fewer retries. Plan mode catches issues before code. The gap: you need the discipline to actually /clear between tasks. |
52
+ | **UAT / user acceptance** | **WEAK** | **3/10** | **This is the biggest gap. No structured UAT flow exists in the current documents.** |
53
+ | **Failover / production resilience** | **MODERATE** | **5/10** | **Partially covered by production-readiness agent but no scaffolded patterns.** |
54
+
55
+ ### Overall Confidence: **6.5/10**
56
+
57
+ The documents would meaningfully reduce your issues — probably cut "are you sure?"
58
+ cycles by 60-70%. But they're strongest on code quality and weakest on UAT and
59
+ production resilience. Those gaps need to be filled.
60
+
61
+ ---
62
+
63
+ ## The Repo Concept
64
+
65
+ **Name suggestions (matching the mission):**
66
+
67
+ | Name | Why | npm availability |
68
+ |------|-----|-----------------|
69
+ | `forge-init` | You're forging production-ready projects from raw ideas | Check `npx forge-init` |
70
+ | `scaffold-ai` | Scaffolding + AI-first development | Check `npx scaffold-ai` |
71
+ | `launchpad-dev` | Launch pad for any project | Check `npx launchpad-dev` |
72
+ | `init-forge` | Forging from init | Check `npx init-forge` |
73
+ | `devforge` | Developer's forge | Check `npx devforge` |
74
+ | `buildkit-ai` | Build kit with AI infrastructure | Check `npx buildkit-ai` |
75
+ | `startship` | Start + ship (you start, you ship) | Check `npx startship` |
76
+
77
+ My recommendation: **`devforge`** — short, memorable, captures the idea of
78
+ forging production-ready projects from raw materials.
79
+
80
+ ---
81
+
82
+ ## The Prompt (Drop This Into Claude Code)
83
+
84
+ This is the single prompt that builds the entire repo from scratch. It's long
85
+ because it's a complete specification. Copy the whole thing.
86
+
87
+ ```
88
+ I'm building a new open-source CLI tool called devforge (or [your chosen name]).
89
+
90
+ It's a universal, AI-first project scaffolding tool that:
91
+ 1. Asks what you're building (web app, API, full-stack, mobile, CLI, AI service, etc.)
92
+ 2. Recommends the optimal tech stack based on the service type
93
+ 3. Scaffolds the project with the right structure, configs, and dependencies
94
+ 4. Ships with Claude Code infrastructure (CLAUDE.md, hooks, skills, agents, commands) tailored to the selected stack
95
+ 5. Includes UAT templates, failover patterns, and production readiness checks
96
+
97
+ ## Repository structure to create:
98
+
99
+ ```
100
+ devforge/
101
+ ├── package.json # CLI package, bin: "devforge"
102
+ ├── README.md # How to install and use
103
+ ├── LICENSE # MIT
104
+ ├── CLAUDE.md # For developing devforge itself
105
+ ├── .claude/
106
+ │ ├── settings.json # Hooks for devforge development
107
+ │ └── agents/
108
+ │ └── code-quality-reviewer.md
109
+
110
+ ├── bin/
111
+ │ └── devforge.js # CLI entry point (#!/usr/bin/env node)
112
+
113
+ ├── src/
114
+ │ ├── index.js # Main orchestrator
115
+ │ ├── prompts.js # Interactive CLI prompts (Inquirer.js)
116
+ │ ├── recommender.js # Service type → stack recommendation engine
117
+ │ ├── composer.js # Template composition engine
118
+ │ ├── claude-configurator.js # Generates .claude/ directory for the project
119
+ │ ├── uat-generator.js # Generates UAT templates and checklists
120
+ │ └── utils.js # File operations, logging, colors
121
+
122
+ ├── templates/
123
+ │ ├── base/ # Every project gets this
124
+ │ │ ├── .gitignore.template
125
+ │ │ ├── README.md.template
126
+ │ │ └── docs/
127
+ │ │ ├── plans/ # Empty plans directory
128
+ │ │ └── uat/ # UAT templates
129
+ │ │ ├── UAT_TEMPLATE.md
130
+ │ │ └── UAT_CHECKLIST.csv
131
+ │ │
132
+ │ ├── frontend/
133
+ │ │ ├── nextjs/ # Next.js App Router + TypeScript + Tailwind + Shadcn
134
+ │ │ └── react-vite/ # React + Vite + TypeScript + Tailwind
135
+ │ │
136
+ │ ├── backend/
137
+ │ │ ├── fastapi/ # FastAPI + SQLAlchemy 2.0 + Pydantic v2
138
+ │ │ ├── hono/ # Hono + TypeScript
139
+ │ │ └── express/ # Express + TypeScript
140
+ │ │
141
+ │ ├── database/
142
+ │ │ ├── prisma-postgres/ # Prisma + PostgreSQL
143
+ │ │ └── sqlalchemy-postgres/ # SQLAlchemy + PostgreSQL + Alembic
144
+ │ │
145
+ │ ├── auth/
146
+ │ │ ├── nextauth/ # NextAuth.js
147
+ │ │ └── jwt-custom/ # Custom JWT
148
+ │ │
149
+ │ ├── testing/
150
+ │ │ ├── vitest/ # Vitest config + example test
151
+ │ │ ├── playwright/ # Playwright config + example E2E + fixture patterns
152
+ │ │ └── pytest/ # Pytest config + example test + fixtures
153
+ │ │
154
+ │ ├── infra/
155
+ │ │ ├── docker-compose/ # docker-compose.yml template
156
+ │ │ └── github-actions/ # CI/CD workflow template
157
+ │ │
158
+ │ └── claude-code/ # Claude Code infrastructure modules
159
+ │ ├── hooks/
160
+ │ │ ├── typescript.json # PostToolUse: eslint, Stop: tsc + eslint
161
+ │ │ ├── python.json # PostToolUse: ruff, Stop: pyright + ruff
162
+ │ │ └── polyglot.json # Both TypeScript and Python hooks
163
+ │ │
164
+ │ ├── claude-md/
165
+ │ │ ├── nextjs.md # CLAUDE.md template for Next.js projects
166
+ │ │ ├── fastapi.md # CLAUDE.md template for FastAPI projects
167
+ │ │ ├── fullstack.md # CLAUDE.md template for full-stack projects
168
+ │ │ └── base.md # Base CLAUDE.md template (universal rules)
169
+ │ │
170
+ │ ├── skills/
171
+ │ │ ├── nextjs/SKILL.md
172
+ │ │ ├── fastapi/SKILL.md
173
+ │ │ ├── playwright/SKILL.md
174
+ │ │ ├── security-web/SKILL.md
175
+ │ │ ├── security-api/SKILL.md
176
+ │ │ └── ai-prompts/SKILL.md
177
+ │ │
178
+ │ ├── agents/
179
+ │ │ ├── code-quality-reviewer.md # Universal code quality agent
180
+ │ │ ├── security-reviewer.md # Universal security agent
181
+ │ │ ├── spec-validator.md # Universal spec compliance agent
182
+ │ │ ├── production-readiness.md # Universal production readiness agent
183
+ │ │ └── uat-validator.md # UAT verification agent (NEW)
184
+ │ │
185
+ │ └── commands/
186
+ │ ├── verify-all.md
187
+ │ ├── audit-spec.md
188
+ │ ├── audit-wiring.md
189
+ │ ├── audit-security.md
190
+ │ ├── pre-pr.md
191
+ │ └── run-uat.md # UAT execution command (NEW)
192
+
193
+ ├── docs/
194
+ │ ├── universal-prompt-library.md # The complete prompt library (all 6 flows)
195
+ │ ├── multi-agent-verification.md # Agent architecture documentation
196
+ │ ├── playbook.md # CLAUDE.md structuring guide
197
+ │ └── errata.md # Known issues and testing checklist
198
+
199
+ └── tests/
200
+ ├── recommender.test.js # Tests for stack recommendation logic
201
+ ├── composer.test.js # Tests for template composition
202
+ └── claude-configurator.test.js # Tests for .claude/ generation
203
+ ```
204
+
205
+ ## The CLI flow:
206
+
207
+ ```
208
+ $ npx devforge my-app
209
+
210
+ 🔨 Welcome to DevForge!
211
+
212
+ What are you building?
213
+ 1. Web app (SPA / SSR / static)
214
+ 2. API / backend service
215
+ 3. Full-stack app (frontend + backend)
216
+ 4. Mobile app (cross-platform)
217
+ 5. CLI tool / utility
218
+ 6. AI/ML powered service
219
+ 7. Desktop app
220
+ 8. Browser extension
221
+ 9. Microservice / serverless
222
+ 10. Describe it (AI recommends)
223
+ > _
224
+
225
+ [After selection, ask refinement questions:]
226
+ - Language preference? (TypeScript, Python, Go, Rust)
227
+ - Need authentication? (y/n)
228
+ - Need AI/LLM integration? (y/n)
229
+ - Need file uploads? (y/n)
230
+ - Need real-time features? (y/n)
231
+ - Deployment target? (Docker, Vercel, AWS, GCP)
232
+ - Include Claude Code infrastructure? (y/n) — default yes
233
+
234
+ [Show recommendation, let user confirm/adjust]
235
+
236
+ [Scaffold the project]
237
+ [Generate Claude Code infrastructure]
238
+ [Generate UAT templates]
239
+ [Show "next steps" with exact commands to run]
240
+ ```
241
+
242
+ ## Critical features to include:
243
+
244
+ ### 1. UAT Templates (REQUIRED)
245
+ Every generated project must include:
246
+ - `docs/uat/UAT_TEMPLATE.md` — a scenario pack template:
247
+ ```markdown
248
+ # UAT Scenario Pack: [Project Name]
249
+
250
+ ## Pre-Conditions
251
+ - [ ] Application is deployed to staging
252
+ - [ ] Test accounts are created
253
+ - [ ] Test data is seeded
254
+
255
+ ## Scenarios
256
+
257
+ ### UAT-001: [Feature Name] — Happy Path
258
+ **Priority:** P0
259
+ **Preconditions:** [what must be true before testing]
260
+ **Steps:**
261
+ 1. [action]
262
+ 2. [action]
263
+ 3. [action]
264
+ **Expected Result:** [what should happen]
265
+ **Actual Result:** ___
266
+ **Status:** PASS / FAIL / BLOCKED / NOT RUN
267
+ **Tester:** ___
268
+ **Date:** ___
269
+ **Notes:** ___
270
+
271
+ ### UAT-002: [Feature Name] — Error Handling
272
+ ...
273
+ ```
274
+
275
+ - `docs/uat/UAT_CHECKLIST.csv` — machine-readable tracking:
276
+ ```csv
277
+ UAT_ID,Feature,Priority,Status,Tester,Date,Defect_ID,Notes
278
+ UAT-001,[Feature],P0,NOT RUN,,,,
279
+ ```
280
+
281
+ - A `run-uat` Claude Code command:
282
+ ```markdown
283
+ <!-- .claude/commands/run-uat.md -->
284
+ Read docs/uat/UAT_TEMPLATE.md.
285
+ For each P0 scenario:
286
+ 1. Check if automated tests exist that cover this scenario
287
+ 2. If automated: run the test and report PASS/FAIL
288
+ 3. If not automated: flag as MANUAL REQUIRED
289
+ 4. Update UAT_CHECKLIST.csv with results
290
+
291
+ Output:
292
+ - Automated coverage: X/Y scenarios have automated tests
293
+ - Results: X passed, Y failed, Z need manual testing
294
+ - Blocking issues: list any P0 failures
295
+ ```
296
+
297
+ - A `uat-validator` agent:
298
+ ```markdown
299
+ <!-- .claude/agents/uat-validator.md -->
300
+ You are a QA engineer validating UAT scenarios.
301
+ Read-only. Never modify code.
302
+
303
+ For each UAT scenario:
304
+ 1. Verify the feature exists in the codebase
305
+ 2. Check if there's a corresponding automated test
306
+ 3. If automated test exists, verify it covers the scenario's steps
307
+ 4. Flag gaps: scenarios without tests, tests without scenarios
308
+
309
+ Output a traceability matrix:
310
+ | UAT ID | Feature | Has Automated Test? | Test File | Coverage |
311
+ ```
312
+
313
+ ### 2. Failover / Production Resilience Patterns (REQUIRED)
314
+ Every generated backend must include:
315
+ - Health check endpoint (`/health` or `/healthz`)
316
+ - Graceful shutdown handler
317
+ - Database connection retry with exponential backoff
318
+ - External service timeout configuration
319
+ - Structured error responses (never leak stack traces)
320
+
321
+ For AI-powered services, additionally:
322
+ - AI service fallback (rule-based when AI is unavailable)
323
+ - AI response validation (Pydantic/Zod, not raw strings)
324
+ - AI timeout + retry configuration
325
+ - Rate limit handling
326
+
327
+ ### 3. The Prompt Library
328
+ Include the complete prompt library from docs/universal-prompt-library.md
329
+ in every generated project at `docs/prompt-library.md`. This gives every
330
+ developer on the project access to the 6 flows and utility prompts.
331
+
332
+ ### 4. Pre-built Verification Chain
333
+ Every generated project ships with the full agent verification chain:
334
+ - code-quality-reviewer.md (tailored to the selected stack)
335
+ - security-reviewer.md (tailored to web/API/full-stack)
336
+ - spec-validator.md (universal)
337
+ - production-readiness.md (tailored to the deployment target)
338
+ - uat-validator.md (universal)
339
+
340
+ ## Implementation approach:
341
+ - Use Node.js for the CLI (so it's npx-installable)
342
+ - Use Inquirer.js for interactive prompts
343
+ - Use simple file copying + string replacement for templates (no complex template engine needed for v1)
344
+ - Test with Vitest
345
+ - Start with 3 stack combinations:
346
+ 1. Next.js full-stack (extends your existing next-init)
347
+ 2. FastAPI backend service
348
+ 3. Next.js frontend + FastAPI backend (polyglot full-stack)
349
+ - Make it easy to add more stacks later (each stack is just a folder in templates/)
350
+
351
+ ## What to do RIGHT NOW:
352
+ 1. Read this entire specification
353
+ 2. Create the repo structure
354
+ 3. Implement the CLI flow (prompts.js → recommender.js → composer.js → claude-configurator.js)
355
+ 4. Implement the first 3 stack templates
356
+ 5. Implement the Claude Code infrastructure generation
357
+ 6. Implement the UAT template generation
358
+ 7. Write tests for the recommendation engine
359
+ 8. Test by running `node bin/devforge.js test-app` locally
360
+
361
+ Build this phase by phase. Start with the CLI flow and base template.
362
+ Show me the plan before writing any code.
363
+ ```
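The health check and graceful shutdown requirements in the prompt above can be sketched for a Node backend as follows. This is a hedged illustration, not what the scaffolder emits: `handleHealth` and `registerGracefulShutdown` are hypothetical names, the 10-second drain timeout is an assumption, and the FastAPI and Next.js stacks would need their own equivalents.

```javascript
// Framework-agnostic sketch; works with Node's built-in http req/res objects.
// Returns true when the request was a health probe and has been answered.
function handleHealth(req, res, isReady = () => true) {
  if (req.method !== "GET" || (req.url !== "/health" && req.url !== "/healthz")) {
    return false;
  }
  const ready = isReady();
  res.writeHead(ready ? 200 : 503, { "Content-Type": "application/json" });
  res.end(JSON.stringify({ status: ready ? "ok" : "unavailable" }));
  return true;
}

// Stops accepting new connections on SIGTERM/SIGINT, then exits.
// `proc` is injectable so the handler can be exercised without real signals.
function registerGracefulShutdown(server, proc = process, timeoutMs = 10000) {
  const shutdown = () => {
    // Force-exit if open connections do not drain in time.
    const timer = setTimeout(() => proc.exit(1), timeoutMs);
    if (typeof timer.unref === "function") timer.unref();
    server.close(() => {
      clearTimeout(timer);
      proc.exit(0);
    });
  };
  proc.once("SIGTERM", shutdown);
  proc.once("SIGINT", shutdown);
  return shutdown;
}

// Wiring (assumes an ESM entry file):
//   import http from "node:http";
//   const server = http.createServer((req, res) => {
//     if (!handleHealth(req, res)) { res.writeHead(404); res.end(); }
//   });
//   registerGracefulShutdown(server);
//   server.listen(3000);
```

Passing an `isReady` callback lets the probe return 503 while a dependency such as the database is still connecting, which is what a load balancer needs to hold traffic back.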
364
+
365
+ ---
366
+
367
+ ## After The Repo Is Built: Your Workflow
368
+
369
+ ```
370
+ 1. You have a product idea
371
+
372
+ 2. Run: npx devforge my-new-saas
373
+
374
+ 3. Answer: "Full-stack app" → "TypeScript + Python" → "Yes auth" → "Yes AI" → "Docker"
375
+
376
+ 4. DevForge generates:
377
+ - Next.js frontend + FastAPI backend
378
+ - PostgreSQL + pgvector
379
+ - NextAuth + JWT
380
+ - Playwright + Pytest
381
+ - Docker Compose
382
+ - CLAUDE.md tailored to this stack
383
+ - Hooks (eslint, tsc, ruff, pyright)
384
+ - Skills (nextjs, fastapi, playwright, security, ai-prompts)
385
+ - Agents (code-quality, security, spec-validator, production-readiness, uat-validator)
386
+ - Commands (verify-all, audit-spec, audit-wiring, audit-security, pre-pr, run-uat)
387
+ - UAT templates
388
+ - Health check endpoints
389
+ - Graceful shutdown handlers
390
+ - docs/prompt-library.md (the complete 6-flow prompt library)
391
+
392
+ 5. cd my-new-saas && npm install && npm run dev
393
+
394
+ 6. Open Claude Code. Everything is already configured.
395
+ - Hooks enforce quality on every edit
396
+ - Skills provide domain knowledge on demand
397
+ - Agents verify your work before PRs
398
+ - UAT templates track acceptance testing
399
+ - Prompt library guides every task
400
+
401
+ 7. You just... build.
402
+ ```
403
+
404
+ No setup. No "are you sure?" cycles. No troubleshooting hooks.
405
+ No writing CLAUDE.md from scratch. No figuring out which agents to create.
406
+
407
+ Scaffold, run, build.
package/docs/errata.md ADDED
@@ -0,0 +1,58 @@
1
+ # DevForge Errata & Testing Checklist
2
+
3
+ Known issues, limitations, and manual testing procedures.
4
+
5
+ ---
6
+
7
+ ## Known Issues
8
+
9
+ ### No ESLint
10
+ The `post-edit.sh` hook is a no-op because ESLint is not yet configured, and `stop-quality-gate.sh` runs only Vitest. When ESLint is added, both hooks should be updated.
11
+
12
+ ### Windows Bash Compatibility
13
+ Hook scripts require `bash` and `jq`. On Windows:
14
+ - Git Bash provides `bash`
15
+ - `jq` must be installed separately (e.g., via `choco install jq` or `scoop install jq`)
16
+ - `chmod +x` may not persist on NTFS — hooks use `bash script.sh` invocation to avoid this
17
+
18
+ ### Template Variable Edge Cases
19
+ - Template substitution is simple regex: `{{(\w+)}}` → replacement
20
+ - Nested braces like `{{{VAR}}}` will produce `{value}` (one outer brace is left on each side)
21
+ - Variables not in the vars map are left as-is (no error thrown)
22
+ - Binary files with `{{` patterns will not be modified (only `.template` files are processed)
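The edge cases listed above follow directly from the regex. The sketch below reproduces them; `renderTemplate` is an illustrative name and may not match the function in `src/`, and the braces are escaped here for clarity even though the pattern is documented as `{{(\w+)}}`.

```javascript
// Sketch of the documented substitution: {{NAME}} -> vars.NAME.
// Variables missing from the map are left untouched and no error is thrown.
function renderTemplate(text, vars) {
  return text.replace(/\{\{(\w+)\}\}/g, (match, name) =>
    Object.prototype.hasOwnProperty.call(vars, name) ? String(vars[name]) : match
  );
}

const vars = { PROJECT_NAME: "my-app" };
console.log(renderTemplate("# {{PROJECT_NAME}}", vars)); // # my-app
console.log(renderTemplate("{{{PROJECT_NAME}}}", vars)); // {my-app}
console.log(renderTemplate("{{UNKNOWN}}", vars));        // {{UNKNOWN}}
```

Because unknown variables pass through silently, the "all placeholders are replaced" item in the manual checklist is the only thing that catches a misspelled variable name.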
23
+
24
+ ### V1 Stack Limitations
25
+ - Only 3 stack combinations supported
26
+ - No Go, Rust, React SPA, React Native, Tauri, or browser extension support
27
+ - "Describe it" option (AI recommendation) is not implemented in V1
28
+
29
+ ---
30
+
31
+ ## Manual Testing Checklist
32
+
33
+ Before any release, verify these manually:
34
+
35
+ ### CLI Flow
36
+ - [ ] `node bin/devforge.js` (no args) shows usage error
37
+ - [ ] `node bin/devforge.js test-output` starts the interactive flow
38
+ - [ ] Ctrl+C during prompts exits cleanly (no partial output)
39
+ - [ ] Selecting each of the 3 supported stacks completes successfully
40
+ - [ ] Selecting an unsupported stack shows a helpful message
41
+
42
+ ### Output Verification
43
+ - [ ] All `{{VAR}}` placeholders are replaced in output
44
+ - [ ] No `.template` extensions remain in output filenames
45
+ - [ ] `.gitignore` includes `.env` in every generated project
46
+ - [ ] `CLAUDE.md` is generated with stack-specific content
47
+ - [ ] Health check endpoint code exists in generated project
48
+ - [ ] Graceful shutdown handler exists in generated project
49
+
50
+ ### Claude Code Infrastructure
51
+ - [ ] `.claude/settings.json` generated with correct hooks for the stack
52
+ - [ ] All 5 agents present in `.claude/agents/`
53
+ - [ ] Relevant skills present in `.claude/skills/`
54
+ - [ ] All 6 commands present in `.claude/commands/`
55
+ - [ ] `docs/prompt-library.md` exists in generated project
56
+
57
+ ### Clean Up
58
+ - [ ] `rm -rf test-output/` after each test run
package/docs/multi-agent-verification.md ADDED
@@ -0,0 +1,66 @@
1
+ # Multi-Agent Verification Chain
2
+
3
+ How the 5 review agents work together to verify DevForge code quality.
4
+
5
+ ---
6
+
7
+ ## The 5 Agents
8
+
9
+ ### 1. Code Quality Reviewer
10
+ **Focus:** Code correctness and conventions
11
+ - ESM import patterns (`.js` extensions, no `require()`)
12
+ - Single-responsibility functions in `src/`
13
+ - Template `{{VARIABLE_NAME}}` conventions
14
+ - Chalk for user output, stderr for errors
15
+ - No dead code or unused imports
16
+
17
+ ### 2. Security Reviewer
18
+ **Focus:** CLI-specific security (not web security)
19
+ - Path traversal in template composition (can output escape the target directory?)
20
+ - Command injection via project names
21
+ - File system safety (no symlink following)
22
+ - No secrets in templates or generated output
23
+ - Dependency safety
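As an example of what the security reviewer looks for, a project name taken from the command line can be checked against an allow-list before it is used in any path or command. This is a sketch under assumptions: `isSafeProjectName` is a hypothetical helper, and the pattern is a deliberately conservative subset of npm's naming rules rather than the package's actual validation.

```javascript
// Hypothetical allow-list check for a project name taken from argv.
// Lowercase letters, digits, hyphens, underscores, and dots only;
// must start with a letter or digit; npm caps names at 214 characters.
function isSafeProjectName(name) {
  return (
    typeof name === "string" &&
    name.length > 0 &&
    name.length <= 214 &&
    /^[a-z0-9][a-z0-9._-]*$/.test(name)
  );
}

console.log(isSafeProjectName("my-app"));        // true
console.log(isSafeProjectName("../etc"));        // false
console.log(isSafeProjectName("app; rm -rf ~")); // false
```

Rejecting slashes and a leading dot covers the path traversal case, and rejecting spaces and shell metacharacters covers injection. Passing arguments to child processes as arrays instead of shell strings remains the stronger defense.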
24
+
25
+ ### 3. Spec Validator
26
+ **Focus:** Requirement traceability
27
+ - Every feature in README → has implementation in `src/`
28
+ - Every stack in recommender → has templates in `templates/`
29
+ - Every Claude Code feature → has templates in `templates/claude-code/`
30
+ - Reports IMPLEMENTED / PARTIAL / MISSING / DIVERGED
31
+
32
+ ### 4. Production Readiness Reviewer
33
+ **Focus:** npm publish readiness
34
+ - Package.json correctness (bin, engines, type, version)
35
+ - CLI behavior (--help, --version, exit codes, Ctrl+C)
36
+ - Cross-platform compatibility (path.join, no hardcoded separators)
37
+ - Error handling (existing dirs, invalid names, missing templates)
38
+
39
+ ### 5. UAT Validator
40
+ **Focus:** Test coverage completeness
41
+ - Maps UAT scenarios in `docs/uat/UAT_TEMPLATE.md` to tests in `tests/`
42
+ - Reports coverage matrix
43
+ - Flags P0 scenarios without automated tests
44
+ - Suggests test implementations for gaps
45
+
46
+ ---
47
+
48
+ ## Orchestration
49
+
50
+ ### Via `/project:verify-all`
51
+ Runs all 5 agents sequentially after `npx vitest run`. Summarizes findings by severity.
52
+
53
+ ### Via `/project:pre-pr`
54
+ Runs code-quality and security agents on the PR diff only (not full codebase).
55
+
56
+ ### Manual
57
+ Launch any agent individually from the Claude Code agents panel.
58
+
59
+ ---
60
+
61
+ ## Key Principles
62
+
63
+ 1. **All agents are read-only** — they have `disallowedTools: [Write, Edit, MultiEdit]`
64
+ 2. **Self-verification** — each agent re-checks findings before reporting to reduce false positives
65
+ 3. **Tailored to DevForge** — agents check CLI-specific patterns, not generic web app patterns
66
+ 4. **Severity-based output** — findings are grouped as critical/high/medium/low
File without changes
package/docs/playbook.md ADDED
@@ -0,0 +1,95 @@
1
+ # DevForge AI Infrastructure Playbook
2
+
3
+ How the Claude Code development infrastructure is structured for DevForge.
4
+
5
+ ---
6
+
7
+ ## Layer 0: CLAUDE.md (<50 lines)
8
+
9
+ The root `CLAUDE.md` contains only what Claude needs on every interaction:
10
+ - Project identity and purpose
11
+ - Directory map
12
+ - Build/test/lint commands
13
+ - Key rules (ESM only, template conventions, read-only docs)
14
+
15
+ Everything else goes in hooks, agents, commands, or skills.
16
+
17
+ ---
18
+
19
+ ## Layer 1: Hooks (3 scripts in .claude/hooks/)
20
+
21
+ ### protect-files.sh (PreToolUse)
22
+ Blocks edits to: `.env`, `.env.*`, `package-lock.json`, `.git/`, `docs/0*` reference docs.
23
+ Triggered on Write, Edit, MultiEdit. Exit 2 = blocked.
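The blocking rules above can be expressed as a small path check. The shipped hook is a bash script, so this JavaScript rendering is only a sketch: the patterns are one interpretation of the listed globs, the function names are hypothetical, and reading the path from `tool_input.file_path` is an assumption about the hook's stdin JSON.

```javascript
// Sketch of the path check a PreToolUse hook could apply.
const PROTECTED = [
  /(^|\/)\.env(\..+)?$/,       // .env and .env.*
  /(^|\/)package-lock\.json$/, // lockfile
  /(^|\/)\.git\//,             // anything inside .git/
  /(^|\/)docs\/0[^/]*$/,       // docs/0* reference docs
];

function isProtectedPath(filePath) {
  const normalized = filePath.replace(/\\/g, "/"); // tolerate Windows separators
  return PROTECTED.some((pattern) => pattern.test(normalized));
}

// Exit code 2 signals "blocked"; 0 lets the tool call proceed.
function protectExitCode(hookInput) {
  const filePath = hookInput?.tool_input?.file_path ?? "";
  return isProtectedPath(filePath) ? 2 : 0;
}

console.log(protectExitCode({ tool_input: { file_path: ".env.local" } }));   // 2
console.log(protectExitCode({ tool_input: { file_path: "src/index.js" } })); // 0
```

The `.git/` pattern requires the trailing slash so that `.gitignore` and `.github/` stay editable.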
24
+
25
+ ### post-edit.sh (PostToolUse)
26
+ Currently a no-op stub — DevForge has no ESLint yet. When ESLint is added, this will auto-lint `.js` files after every edit. Non-blocking (exit 0 always).
27
+
28
+ ### stop-quality-gate.sh (Stop)
29
+ Runs `npx vitest run` before Claude can mark a task as done, which prevents completing work with broken tests. Has an infinite-loop guard via a `stop_hook_active` check.
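The decision logic of the Stop hook, including the loop guard, fits in a few lines. This is a sketch rather than the shipped bash script: `stopGateExitCode` is a hypothetical name, and `runTests` is injected as a stand-in for spawning `npx vitest run` so the logic can be exercised on its own.

```javascript
// Sketch of the Stop-hook decision.
// hookInput is the JSON received on stdin; runTests returns true when tests pass.
function stopGateExitCode(hookInput, runTests) {
  // Guard: if this stop is already the result of a Stop hook, let it through.
  // Otherwise a permanently failing suite would keep Claude looping forever.
  if (hookInput && hookInput.stop_hook_active === true) return 0;
  return runTests() ? 0 : 2; // 2 = block completion, 0 = allow
}

console.log(stopGateExitCode({}, () => false));                         // 2
console.log(stopGateExitCode({ stop_hook_active: true }, () => false)); // 0
```

Checking the guard before running the tests also avoids paying for a second full test run on the retry pass.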
30
+
31
+ ---
32
+
33
+ ## Layer 2: Agents (5 read-only reviewers in .claude/agents/)
34
+
35
+ All agents have `disallowedTools: [Write, Edit, MultiEdit]` — they can only read and report.
36
+
37
+ | Agent | Focus |
38
+ |---|---|
39
+ | code-quality-reviewer | ESM patterns, single responsibility, template conventions |
40
+ | security-reviewer | Path traversal, command injection, file system safety |
41
+ | spec-validator | Requirement traceability against a spec file |
42
+ | production-readiness | CLI packaging, cross-platform, error handling |
43
+ | uat-validator | UAT scenario to test coverage mapping |
44
+
45
+ ---
46
+
47
+ ## Layer 3: Commands (10 slash commands in .claude/commands/)
48
+
49
+ ### Daily Workflow (4 commands)
50
+ | Command | When to Use |
51
+ |---|---|
52
+ | `/project:help` | Don't know what to do — guides you to the right workflow |
53
+ | `/project:status` | Quick dashboard of tests, git state, recent commits |
54
+ | `/project:next` | Suggests what to work on next based on context |
55
+ | `/project:done` | Verifies task completion before moving on |
56
+
57
+ ### Verification (6 commands)
58
+ | Command | What It Does |
59
+ |---|---|
60
+ | `/project:verify-all` | Runs all 5 agents + tests |
61
+ | `/project:audit-spec` | Checks implementation vs specification |
62
+ | `/project:audit-wiring` | Finds dead/unwired code and templates |
63
+ | `/project:audit-security` | Security review of src/ and bin/ |
64
+ | `/project:pre-pr` | Full pre-PR checklist |
65
+ | `/project:run-uat` | Execute UAT scenarios, report coverage |
66
+
67
+ ---
68
+
69
+ ## Adding New Infrastructure
70
+
71
+ ### Adding a hook
72
+ 1. Create `.claude/hooks/[name].sh` with `#!/bin/bash` and stdin JSON parsing
73
+ 2. Add to `.claude/settings.json` under the appropriate event (PreToolUse/PostToolUse/Stop)
74
+ 3. Make executable: `chmod +x .claude/hooks/[name].sh`
75
+
76
+ ### Adding an agent
77
+ 1. Create `.claude/agents/[name].md` with YAML frontmatter including `disallowedTools`
78
+ 2. Add a review checklist tailored to DevForge
79
+ 3. Include a self-verification protocol
80
+
81
+ ### Adding a command
82
+ 1. Create `.claude/commands/[name].md` with instructions
83
+ 2. Use DevForge's real commands (not placeholders)
84
+ 3. Reference it in the help command if it's a common workflow
85
+
86
+ ---
87
+
88
+ ## Why No ESLint Yet
89
+
90
+ DevForge is a small CLI tool (~900 lines of JS). The CLAUDE.md notes ESLint is planned ("when eslint is added"). When it's added:
91
+ 1. Install: `npm install -D eslint @eslint/js`
92
+ 2. Create `eslint.config.js` (flat config for ESM)
93
+ 3. Update `post-edit.sh` to run `npx eslint --fix`
94
+ 4. Update `stop-quality-gate.sh` to add `npx eslint .`
95
+ 5. Update CLAUDE.md lint command