forgedev 1.1.0 → 1.2.0
- package/README.md +80 -7
- package/bin/devforge.js +11 -1
- package/docs/00-README.md +310 -0
- package/docs/01-universal-prompt-library.md +1049 -0
- package/docs/02-claude-code-mastery-playbook.md +283 -0
- package/docs/03-multi-agent-verification.md +565 -0
- package/docs/04-errata-and-verification-checklist.md +284 -0
- package/docs/05-universal-scaffolder-vision.md +452 -0
- package/docs/06-confidence-assessment-and-repo-prompt.md +407 -0
- package/docs/errata.md +58 -0
- package/docs/multi-agent-verification.md +66 -0
- package/docs/plans/.gitkeep +0 -0
- package/docs/playbook.md +95 -0
- package/docs/prompt-library.md +160 -0
- package/docs/uat/UAT_CHECKLIST.csv +9 -0
- package/docs/uat/UAT_TEMPLATE.md +163 -0
- package/package.json +10 -2
- package/src/claude-configurator.js +2 -0
- package/src/cli.js +16 -5
- package/src/doctor-prompts.js +9 -2
- package/src/doctor.js +19 -0
- package/src/index.js +7 -0
- package/src/update-check.js +49 -0
- package/src/update.js +33 -0
- package/src/utils.js +1 -1
- package/templates/auth/jwt-custom/backend/app/core/security.py.template +4 -1
- package/templates/backend/fastapi/backend/app/core/config.py.template +2 -2
- package/templates/base/docs/plans/.gitkeep +0 -0
- package/templates/base/docs/uat/UAT_CHECKLIST.csv.template +2 -0
- package/templates/base/docs/uat/UAT_TEMPLATE.md.template +22 -0
- package/templates/claude-code/agents/build-error-resolver.md +4 -4
- package/templates/claude-code/agents/code-quality-reviewer.md +1 -1
- package/templates/claude-code/agents/database-reviewer.md +2 -2
- package/templates/claude-code/agents/doc-updater.md +1 -1
- package/templates/claude-code/agents/harness-optimizer.md +26 -0
- package/templates/claude-code/agents/loop-operator.md +1 -1
- package/templates/claude-code/agents/product-strategist.md +124 -0
- package/templates/claude-code/agents/security-reviewer.md +1 -0
- package/templates/claude-code/agents/spec-validator.md +31 -1
- package/templates/claude-code/agents/uat-validator.md +6 -1
- package/templates/claude-code/claude-md/base.md +3 -2
- package/templates/claude-code/claude-md/nextjs.md +1 -1
- package/templates/claude-code/commands/build-fix.md +1 -1
- package/templates/claude-code/commands/code-review.md +6 -1
- package/templates/claude-code/commands/full-audit.md +61 -0
- package/templates/claude-code/commands/workflows.md +4 -0
- package/templates/claude-code/hooks/scripts/autofix-polyglot.mjs +28 -10
- package/templates/claude-code/hooks/scripts/autofix-python.mjs +11 -4
- package/templates/claude-code/hooks/scripts/autofix-typescript.mjs +11 -3
- package/templates/claude-code/hooks/scripts/guard-protected-files.mjs +2 -2
- package/templates/claude-code/skills/ai-prompts/SKILL.md +1 -0
- package/templates/claude-code/skills/fastapi/SKILL.md +1 -1
- package/templates/claude-code/skills/git-workflow/SKILL.md +3 -3
- package/templates/claude-code/skills/nextjs/SKILL.md +1 -1
- package/templates/claude-code/skills/playwright/SKILL.md +8 -7
- package/templates/claude-code/skills/security-api/SKILL.md +2 -2
- package/templates/claude-code/skills/security-web/SKILL.md +1 -0
- package/templates/database/sqlalchemy-postgres/.env.example +1 -0
- package/templates/infra/github-actions/.github/workflows/ci.yml.template +49 -0
- package/templates/testing/pytest/backend/tests/__init__.py +0 -0
- package/templates/testing/pytest/backend/tests/conftest.py.template +11 -0
- package/templates/testing/pytest/backend/tests/test_health.py.template +10 -0
- package/templates/testing/vitest/vitest.config.ts.template +18 -0
- package/CLAUDE.md +0 -38
@@ -0,0 +1,565 @@
# Multi-Agent Verification Architecture
## Agents That Check Each Other's Work Before Anything Hits Production

---

## How the Layers Work (Not Everything Is an Agent)

```
Layer 0: CLAUDE.md (scoped notes) — NOT an agent. Just context.
Layer 1: Hooks (shell scripts) — NOT an agent. Deterministic pass/fail.
Layer 2: Subagents (specialized reviewers) — YES, separate Claude instances.
Layer 3: Agent Teams (parallel reviewers) — YES, coordinated multi-agent.
Layer 4: Claude Code Review (managed) — YES, Anthropic's managed multi-agent PR reviewer.
```

Layers 0-1 are cheap and instant. Layers 2-3 cost tokens but catch
what scripts can't. Layer 4 runs on PR creation in GitHub.

---

## Layer 1: Hooks (Deterministic Gates — Always On)

Hooks run automatically. No AI judgment. Pass or fail.

**What they catch:** syntax errors, type errors, lint violations, protected file edits.
**What they miss:** logic errors, missing features, security design flaws.

### Essential Hook 1: PostToolUse — Auto-Lint After Every Edit

```bash
#!/bin/bash
# .claude/hooks/post-edit.sh
INPUT=$(cat)
FILE_PATH=$(echo "$INPUT" | jq -r '.tool_input.file_path // empty')
if [ -z "$FILE_PATH" ]; then exit 0; fi

# [CUSTOMIZE] Replace with your project's lint commands
if [[ "$FILE_PATH" == *.ts || "$FILE_PATH" == *.tsx ]]; then
  npx eslint --fix "$FILE_PATH" 2>&1 || true
fi
if [[ "$FILE_PATH" == *.py ]]; then
  ruff check --fix "$FILE_PATH" 2>&1 || true
fi
exit 0
```
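
The extension-based dispatch in the hook above can be pulled into a small function, which makes it possible to check which linter a path would trigger without running Claude Code. This is a minimal sketch: `linter_for_file` is a hypothetical helper name, and only the eslint and ruff pairings come from the script above.

```bash
# Sketch only: linter_for_file is a hypothetical helper, not part of Claude Code.
# Prints the linter the post-edit hook would run for a path, or nothing at all.
linter_for_file() {
  case "$1" in
    *.ts|*.tsx) echo "eslint" ;;
    *.py)       echo "ruff" ;;
    *)          : ;;  # unknown extension: no linter, the hook just exits 0
  esac
}
```

Adding a language then means adding one `case` arm rather than another `if` block.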

### Essential Hook 2: Stop — Quality Gate Before Completion

```bash
#!/bin/bash
# .claude/hooks/stop-quality-gate.sh
INPUT=$(cat)
ACTIVE=$(echo "$INPUT" | jq -r '.stop_hook_active')
if [ "$ACTIVE" = "true" ]; then exit 0; fi

echo "Running quality gate..." >&2

# [CUSTOMIZE] Replace with your project's validation commands
# Example for TypeScript + Python:
npx tsc --noEmit 2>&1 || { echo "BLOCKED: Type errors found" >&2; exit 2; }
npx eslint . --ext .ts,.tsx 2>&1 || { echo "BLOCKED: Lint errors found" >&2; exit 2; }

# Example for other stacks:
# cargo check 2>&1 || { echo "BLOCKED: Rust compile errors" >&2; exit 2; }
# go vet ./... 2>&1 || { echo "BLOCKED: Go vet errors" >&2; exit 2; }
# bundle exec rubocop 2>&1 || { echo "BLOCKED: Ruby lint errors" >&2; exit 2; }

echo "All checks passed." >&2
exit 0
```
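
Each check in the gate above repeats the same pattern: run a command, and on failure print a `BLOCKED:` message to stderr and exit with code 2. A minimal sketch of that pattern as one reusable function follows. `run_gate` is a hypothetical name, and the sketch assumes, as the script above does, that exit code 2 is the blocking signal.

```bash
# Sketch only: run_gate is a hypothetical wrapper around one validation command.
# Returns 0 when the command passes; otherwise prints BLOCKED to stderr and returns 2.
run_gate() {
  label=$1
  shift
  if "$@"; then
    return 0
  fi
  echo "BLOCKED: $label failed" >&2
  return 2
}

# Usage inside the Stop hook (commands taken from the script above):
#   run_gate "type check" npx tsc --noEmit || exit 2
#   run_gate "lint" npx eslint . --ext .ts,.tsx || exit 2
```

Keeping the label separate from the command keeps the failure message readable when more checks are added.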

### Essential Hook 3: PreToolUse — Protected Files

```bash
#!/bin/bash
# .claude/hooks/protect-files.sh
INPUT=$(cat)
FILE_PATH=$(echo "$INPUT" | jq -r '.tool_input.file_path // empty')

# [CUSTOMIZE] Add your project's protected paths
PROTECTED=(".env" ".env.local" "package-lock.json" "pnpm-lock.yaml" ".git/" "yarn.lock")
for p in "${PROTECTED[@]}"; do
  if [[ "$FILE_PATH" == *"$p"* ]]; then
    echo "BLOCKED: $FILE_PATH is protected" >&2
    exit 2
  fi
done
exit 0
```
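
The substring test in the hook above (`*"$p"*`) also matches paths such as `.env.example`, because `.env` appears inside that name. If that is too broad for a project, one possible alternative is to compare the file's basename exactly and treat `.git` as a path component. This is a sketch under those assumptions; `is_protected` is a hypothetical name and the list simply mirrors the one above.

```bash
# Sketch only: is_protected is a hypothetical stricter matcher.
# Files match on exact basename; the .git directory matches as a path component.
is_protected() {
  path=$1
  base=${path##*/}   # strip everything up to the last slash
  case "$base" in
    .env|.env.local|package-lock.json|pnpm-lock.yaml|yarn.lock) return 0 ;;
  esac
  case "/$path" in
    */.git/*) return 0 ;;
  esac
  return 1
}

# Usage inside the hook:
#   if is_protected "$FILE_PATH"; then echo "BLOCKED: $FILE_PATH is protected" >&2; exit 2; fi
```

Whether `.env.example` should be editable is a project decision; the sketch only shows how to make the choice explicit.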

### settings.json Configuration

```json
{
  "hooks": {
    "PreToolUse": [{
      "matcher": "Write|Edit|MultiEdit",
      "hooks": [{
        "type": "command",
        "command": "$CLAUDE_PROJECT_DIR/.claude/hooks/protect-files.sh"
      }]
    }],
    "PostToolUse": [{
      "matcher": "Write|Edit|MultiEdit",
      "hooks": [{
        "type": "command",
        "command": "$CLAUDE_PROJECT_DIR/.claude/hooks/post-edit.sh"
      }]
    }],
    "Stop": [{
      "hooks": [{
        "type": "command",
        "command": "$CLAUDE_PROJECT_DIR/.claude/hooks/stop-quality-gate.sh"
      }]
    }]
  }
}
```
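
Before wiring the scripts into `settings.json`, it can help to exercise them by hand with a payload shaped like the one they read from stdin. The helper below is a sketch: `hook_payload` is a hypothetical name, the `tool_name` field is an assumption about the payload, and only `tool_input.file_path` is taken from the scripts above. A real payload is expected to carry more fields.

```bash
# Sketch only: hook_payload is a hypothetical helper for local testing.
# Builds a minimal JSON payload containing the field the hooks above read.
# Assumes the path contains no double quotes or backslashes.
hook_payload() {
  printf '{"tool_name":"Edit","tool_input":{"file_path":"%s"}}\n' "$1"
}

# Usage (requires jq and the hook script from above):
#   hook_payload ".env" | .claude/hooks/protect-files.sh; echo "exit code: $?"
```

An exit code of 2 from the protected-files hook indicates the edit would have been blocked.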

---

## Layer 2: Specialized Subagents (The Verification Chain)

Each subagent is a separate Claude instance with its own context and tool
restrictions. Validators have `disallowedTools` set so they CANNOT modify code.

### Agent 1: Code Quality Reviewer

```markdown
<!-- .claude/agents/code-quality-reviewer.md -->
---
name: code-quality-reviewer
description: Reviews code changes for quality, patterns, and maintainability. Read-only.
allowedTools: Read, Bash, Grep, Glob, LS
disallowedTools: Write, Edit, MultiEdit, Notebook
model: sonnet
---

You are a senior code reviewer. You NEVER modify code. You ONLY report findings.

## Review Checklist

### General Quality
- [ ] All functions have type annotations / type hints
- [ ] No bare except/catch blocks (must catch specific exceptions)
- [ ] No deprecated patterns (check against current library versions)
- [ ] No TODO/FIXME/HACK that should be resolved before merge
- [ ] Error handling exists for all external calls (API, DB, file I/O)

### [CUSTOMIZE] Language-Specific Checks
<!-- Add checks for your primary language/framework. Examples: -->
<!-- Python: Pydantic v2 patterns, SQLAlchemy 2.0, async/await -->
<!-- TypeScript: strict mode, no `any`, proper null checks -->
<!-- Go: error wrapping, context propagation, goroutine cleanup -->
<!-- Rust: proper error types, no unwrap() in production code -->

### Wiring (Full-Stack Projects)
- [ ] Every new backend endpoint has a corresponding frontend API call
- [ ] Every frontend API call has typed request and response
- [ ] Every API call is used by at least one component/page
- [ ] Props/interfaces declared are actually used (no dead declarations)

## Output Format
For each finding:
- File: [path] Line: [number]
- Severity: CRITICAL | HIGH | MEDIUM | LOW
- Issue: [description]
- Fix: [what should change]
```

### Agent 2: Security Reviewer

```markdown
<!-- .claude/agents/security-reviewer.md -->
---
name: security-reviewer
description: Reviews code for security vulnerabilities. Read-only.
allowedTools: Read, Bash, Grep, Glob, LS
disallowedTools: Write, Edit, MultiEdit, Notebook
model: opus
---

You are a senior application security engineer.
You NEVER modify code. You ONLY report vulnerabilities.

## Security Review Checklist

### Authentication & Authorization
- [ ] Every endpoint/route requires authentication
- [ ] Authorization checks enforce role/scope restrictions
- [ ] No privilege escalation paths (user A accessing user B's data)
- [ ] Bulk operations verify all records share same auth scope

### Input Validation
- [ ] User input is sanitized before storage and display
- [ ] File uploads validate: size limit, content-type, magic bytes
- [ ] Enum/status fields use typed constraints (not bare strings)
- [ ] List endpoints have pagination limits (no unbounded queries)
- [ ] No SQL/NoSQL injection vectors

### [CUSTOMIZE] Framework-Specific Security
<!-- Add checks for your stack. Examples: -->
<!-- Django: CSRF tokens, ORM usage, settings.DEBUG -->
<!-- Express: helmet middleware, rate limiting, CORS config -->
<!-- FastAPI: Depends() for auth, Pydantic validation -->
<!-- Rails: strong parameters, mass assignment protection -->

### Secrets & Configuration
- [ ] No hardcoded credentials, API keys, or secrets
- [ ] No hardcoded URLs without environment variable override
- [ ] Error responses don't leak stack traces or internal paths

### AI/LLM Integration (if applicable)
- [ ] PII/sensitive data redacted before sending to AI
- [ ] User content has injection boundaries (not mixed with system prompts)
- [ ] AI responses are validated/parsed (not used as raw trusted output)
- [ ] Graceful fallback when AI service is unavailable

## Self-Verification Protocol
After finding a potential vulnerability:
1. Attempt to DISPROVE it — check for middleware, decorators, base classes
   that might handle this elsewhere
2. Check if test coverage validates this behavior
3. Only report CONFIRMED findings you could not disprove

## Output Format
For each vulnerability:
- File: [path] Line: [number]
- Severity: CRITICAL | HIGH | MEDIUM | LOW
- CWE: [CWE-XXX if applicable]
- Vulnerability: [description]
- Attack Scenario: [how an attacker would exploit this]
- Remediation: [specific fix]
```

### Agent 3: Spec Compliance Validator

```markdown
<!-- .claude/agents/spec-validator.md -->
---
name: spec-validator
description: Validates implementations match specifications. Read-only.
allowedTools: Read, Bash, Grep, Glob, LS
disallowedTools: Write, Edit, MultiEdit, Notebook
model: sonnet
---

You are a QA lead validating code matches specifications.
You NEVER modify code. You ONLY report gaps.

## Validation Process
1. Read the spec file provided in the task prompt
2. Extract every discrete requirement
3. For each requirement, search the codebase for the implementation
4. Trace the full wire: data model → API → client → UI (if full-stack)
5. Rate each: IMPLEMENTED | PARTIAL | MISSING | DIVERGED

## Output Format
| # | Requirement | Status | Evidence (file:line) | Gap |
|---|------------|--------|---------------------|-----|

Summary: X/Y implemented, Z partial, W missing, V diverged.
```
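
The summary line in the spec-validator output can be derived mechanically from the status column, which is useful for checking that the reported totals match the table. The sketch below assumes the report is saved as a markdown table in exactly the column order shown above; `spec_summary` is a hypothetical helper name.

```bash
# Sketch only: spec_summary is a hypothetical helper. It reads the validator's
# markdown table on stdin and prints a summary line in the format shown above.
spec_summary() {
  awk -F'|' '
    {
      status = $4                             # third table column is Status
      gsub(/^[ \t]+|[ \t]+$/, "", status)     # trim cell padding
    }
    status == "IMPLEMENTED" { impl++; total++ }
    status == "PARTIAL"     { part++; total++ }
    status == "MISSING"     { miss++; total++ }
    status == "DIVERGED"    { dvg++;  total++ }
    END {
      printf "Summary: %d/%d implemented, %d partial, %d missing, %d diverged.\n",
             impl, total, part, miss, dvg
    }'
}

# Usage (the report path is illustrative):
#   spec_summary < spec-report.md
```

Header and separator rows are ignored because their status cell is not one of the four ratings.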

### Agent 4: Production Readiness Checker

```markdown
<!-- .claude/agents/production-readiness.md -->
---
name: production-readiness
description: Checks if code is ready for production deployment. Read-only.
allowedTools: Read, Bash, Grep, Glob, LS
disallowedTools: Write, Edit, MultiEdit, Notebook
model: sonnet
---

You are a DevOps/SRE engineer doing a pre-deployment review.
You NEVER modify code. You ONLY flag production risks.

## Checklist

### Environment & Configuration
- [ ] No hardcoded localhost URLs or ports
- [ ] No hardcoded secrets or credentials
- [ ] CORS/security headers configured (not wildcard in production)
- [ ] Database connections use pooling (if applicable)

### Error Handling
- [ ] No unhandled promise rejections / unhandled exceptions
- [ ] API endpoints return structured error responses (not stack traces)
- [ ] File/network operations have cleanup (try-finally / context managers)
- [ ] Database transactions have rollback on failure

### Performance
- [ ] List endpoints have pagination (no unbounded queries)
- [ ] N+1 query patterns identified (if ORM is used)
- [ ] Long operations run in background (not blocking request/response)

### Data Safety
- [ ] Database migrations are backward-compatible
- [ ] New columns have defaults (won't break existing rows)
- [ ] Cascading deletes won't orphan critical data

### Failover & Resilience
- [ ] Health check endpoint exists (/health or /healthz)
- [ ] Health check verifies: app + database + critical external services
- [ ] Database connection has retry with exponential backoff
- [ ] External API calls have explicit timeouts (not infinite)
- [ ] External API calls have retry logic for transient failures (5xx, timeout)
- [ ] AI/LLM calls have fallback path (rule-based when AI unavailable)
- [ ] AI/LLM responses validated through schemas (not used as raw strings)
- [ ] Graceful shutdown handler exists (SIGTERM → finish requests → close DB → exit)
- [ ] Background jobs have dead letter / retry mechanism

## Output Format
For each finding:
- Category: [Environment | Error Handling | Performance | Data Safety | Failover]
- Severity: BLOCKER | CRITICAL | HIGH | MEDIUM
- File: [path:line]
- Issue: [what's wrong]
- Production Impact: [what would happen in prod]
- Fix: [what to change]

BLOCKER = deployment must not proceed.
```

### Agent 6: UAT Validator

```markdown
<!-- .claude/agents/uat-validator.md -->
---
name: uat-validator
description: Validates UAT scenarios against implementation and automated test coverage. Read-only.
allowedTools: Read, Bash, Grep, Glob, LS
disallowedTools: Write, Edit, MultiEdit, Notebook
model: sonnet
---

You are a QA engineer validating user acceptance testing coverage.
You NEVER modify code. You ONLY report gaps.

## Process
1. Read docs/uat/UAT_SCENARIOS.md (or UAT_TEMPLATE.md)
2. For each UAT scenario:
   a. Verify the feature EXISTS in the codebase (trace source to UI)
   b. Search for a corresponding automated test
   c. If found: does the test cover ALL steps in the scenario?
   d. If not found: flag as UNTESTED

## Output Format

### Traceability Matrix
| UAT ID | Priority | Feature | Automated Test | Test File | Covers All Steps? | Gap |
|--------|----------|---------|---------------|-----------|-------------------|-----|

### Summary
- Total scenarios: X
- P0 with automation: X/Y
- P0 WITHOUT automation: Z (LIST THEM — these are blockers)
- P1 with automation: X/Y
- Recommended: [which scenarios to automate next]

### UAT Readiness Verdict
- READY: All P0 scenarios have passing automated tests
- NOT READY: List the P0 gaps that block acceptance
```
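
The readiness verdict above reduces to a single rule: any P0 row without an automated test makes the result NOT READY. A minimal sketch of that rule follows, assuming the traceability matrix is saved as a markdown table in the column order shown and that the "Automated Test" cell holds a literal `Yes` or `No`. Both the cell values and the `uat_verdict` name are assumptions, and the sketch checks only that a test exists, not that it passes.

```bash
# Sketch only: uat_verdict is a hypothetical helper. It reads the traceability
# matrix on stdin and prints READY or NOT READY based on P0 rows.
# Assumes the "Automated Test" column contains a literal Yes or No.
uat_verdict() {
  awk -F'|' '
    {
      prio = $3; auto = $5                 # Priority and Automated Test columns
      gsub(/[ \t]/, "", prio)
      gsub(/[ \t]/, "", auto)
    }
    prio == "P0" && auto != "Yes" { gaps++ }
    END { if (gaps > 0) print "NOT READY"; else print "READY" }'
}
```

P1 rows never affect the verdict in this sketch, which matches the rule as written above.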

### Agent 6: UAT Validator (Alternative Version)

```markdown
<!-- .claude/agents/uat-validator.md -->
---
name: uat-validator
description: Validates UAT scenarios against implementations. Checks automated test coverage of acceptance criteria. Read-only.
allowedTools: Read, Bash, Grep, Glob, LS
disallowedTools: Write, Edit, MultiEdit, Notebook
model: sonnet
---

You are a QA engineer validating user acceptance testing coverage.
You NEVER modify code. You ONLY audit and report.

## Validation Process

1. Read docs/uat/UAT_TEMPLATE.md (or the file specified in the task)
2. For each UAT scenario:
   - Search the codebase for the feature it tests
   - Search the test directory for automated tests covering this scenario
   - If automated test exists: verify it covers the scenario's steps
   - If no automated test: flag as MANUAL REQUIRED

## Output Format

### Traceability Matrix
| UAT ID | Scenario | Priority | Feature Exists? | Automated Test? | Test File | Coverage |
|--------|----------|----------|----------------|-----------------|-----------|----------|

### Summary
- P0 automated coverage: X/Y scenarios
- P0 gaps: [list scenarios with no automated test]
- P1 automated coverage: X/Y scenarios
- Recommendation: READY FOR UAT | NEEDS MORE TESTS | CRITICAL GAPS

### For Each Gap
- UAT ID: [id]
- What's missing: [description]
- Suggested test: [brief description of what test to write]
```

### Agent 5 (Optional): AI Output Quality Auditor

Only needed if your project includes AI/LLM features.

```markdown
<!-- .claude/agents/ai-quality-auditor.md -->
---
name: ai-quality-auditor
description: Audits AI prompts and LLM integration for output quality and safety. Read-only.
allowedTools: Read, Bash, Grep, Glob, LS
disallowedTools: Write, Edit, MultiEdit, Notebook
model: opus
---

You are an AI/ML engineer auditing LLM integrations for quality and safety.
You NEVER modify code. You ONLY audit and report.

## 7-Point Prompt Audit

For every prompt-building function, evaluate:

1. **Concrete Data** — Does it embed actual values (not just descriptions)?
2. **Output Schema** — Is the expected response format specified with types?
3. **Anti-Hallucination** — Does it include "use ONLY provided data" instructions?
4. **Audience & Purpose** — Does it specify who reads the output and why?
5. **Citations** — Does it require citing source data for claims?
6. **Confidence Scoring** — Does it require confidence levels per finding?
7. **Graceful Degradation** — Does the caller handle: bad JSON, empty response, AI down?

## Output Format
| File | Function | Score (X/7) | Missing |
|------|----------|-------------|---------|

Flag any scoring below 5/7 as NEEDS IMPROVEMENT.
```
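
The "below 5/7" rule in the auditor's output format can be applied mechanically to the score table. The sketch below assumes the audit is saved as a markdown table in the column order shown above, with scores written as `N/7`; `needs_improvement` is a hypothetical helper name.

```bash
# Sketch only: needs_improvement is a hypothetical helper. It reads the audit
# table on stdin and prints one line per function scoring below 5/7.
needs_improvement() {
  awk -F'|' '
    {
      score = $4                            # third table column is the score
      gsub(/[ \t]/, "", score)
    }
    score ~ /^[0-7]\/7$/ {
      split(score, parts, "/")
      if (parts[1] + 0 < 5) {
        fn = $3
        gsub(/^[ \t]+|[ \t]+$/, "", fn)     # trim cell padding
        print fn ": " score " NEEDS IMPROVEMENT"
      }
    }'
}
```

Rows whose score cell is not in `N/7` form, including the header and separator, are skipped.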

---

## Layer 3: The Verification Chain

### Option A: Sequential (Recommended)

```
You finish implementing a feature
↓
Agent 1: code-quality-reviewer → reports findings
↓
Agent 2: security-reviewer → reports vulnerabilities
↓
Agent 3: spec-validator → reports gaps
↓
Agent 4: production-readiness → reports blockers
↓
Agent 5: ai-quality-auditor (if AI features) → reports scores
↓
Agent 6: uat-validator → reports acceptance gaps
↓
You review all reports, fix issues
```

**Prompt to run the chain:**

```
Run the verification chain on the current git diff:

1. Task code-quality-reviewer: "Review all changed files for quality issues"
2. Task security-reviewer: "Review all changed files for security vulnerabilities"
3. Task spec-validator: "Validate against the spec at [YOUR SPEC PATH]"
4. Task production-readiness: "Check all modified files for production readiness"
5. Task uat-validator: "Check UAT coverage against docs/uat/UAT_TEMPLATE.md"

Compile findings into a single report grouped by severity.
BLOCKER and CRITICAL must be fixed before PR.
```
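
The rule that BLOCKER and CRITICAL findings must be fixed before a PR can also be enforced by a script, provided the compiled report keeps the `Severity:` lines from the agents' output formats. A minimal sketch, assuming the report is plain text with one `- Severity: LEVEL` line per finding; `has_blocking_findings` is a hypothetical name.

```bash
# Sketch only: has_blocking_findings is a hypothetical helper. It reads the
# compiled report on stdin and returns 0 if any finding is BLOCKER or CRITICAL.
has_blocking_findings() {
  grep -Eq '^[-* ]*Severity:[[:space:]]*(BLOCKER|CRITICAL)[[:space:]]*$'
}

# Usage (the report path is illustrative):
#   if has_blocking_findings < verification-report.md; then
#     echo "Fix BLOCKER/CRITICAL findings before opening a PR" >&2
#     exit 1
#   fi
```

The pattern is anchored to the end of the line so that an unfilled template line such as `Severity: CRITICAL | HIGH | MEDIUM | LOW` does not count as a finding.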

### Option B: Parallel (Faster, More Tokens)

All agents run simultaneously. See the detailed patterns in
Claude Code docs: `code.claude.com/docs/en/sub-agents`

### Option C: Builder-Validator Dependencies

Use the Task system with `addBlockedBy` so validators wait for builders:

```
Task #1: "Implement API endpoints" (builder)
Task #2: "Implement frontend" (builder, blocked by #1)
Task #3: "Quality review" (validator, blocked by #1 and #2)
Task #4: "Security review" (validator, blocked by #1 and #2)
```
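
The blocked-by relationships above form a dependency graph, and any valid execution order must respect it. The sketch below does not use the Task system; it feeds the same four tasks to the standard `tsort` utility to show one valid order. Each input line is a "blocker blocked" pair, and `task_order` is a hypothetical name.

```bash
# Sketch only: task_order is a hypothetical helper, independent of the Task system.
# Each line is "blocker blocked"; tsort prints an order that respects every pair.
task_order() {
  tsort <<'EOF'
1 2
1 3
2 3
1 4
2 4
EOF
}
```

Task 1 always comes first and task 2 second; tasks 3 and 4 have no dependency on each other, so they may appear in either order or run in parallel.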

---

## Layer 4: Claude Code Review (Managed, On Every PR)

Anthropic's managed multi-agent PR reviewer. Runs multiple specialized agents
in parallel, self-verifies findings to filter false positives, posts inline
comments ranked by severity.

**Availability:** Team and Enterprise plans.
**Cost:** ~$15-25 per review.
**Setup:** Admin installs Anthropic GitHub App → connects repos → enables review.

### Customization with REVIEW.md

Create `REVIEW.md` at your repo root:

```markdown
# Code Review Guidelines

## Always Check (High Priority)
<!-- [CUSTOMIZE] Add your project's critical patterns -->
- Every new endpoint/route has authentication
- No hardcoded secrets or credentials
- Error handling on all external calls
- New database columns have defaults

## [CUSTOMIZE] Project-Specific Patterns
<!-- Add patterns specific to your framework, architecture, etc. -->

## Deprioritize
- Formatting and import order (linter handles this)
- Naming-only comments without runtime risk
- Style preferences already covered by linting
```

---

## Complete Defense Layers

```
You write code
↓
Layer 1: Hooks (free, instant) — catch 60% of issues
↓
Layer 2: Subagent chain (on demand) — catch another 30%
↓
You fix findings, create PR
↓
Layer 4: Claude Code Review (per PR) — final safety net
↓
You review architecture + business logic (the 10% only humans judge)
↓
Merge
```

---

## Token Cost Reference

| Layer | Cost | When |
|-------|------|------|
| Hooks | Zero (shell scripts) | Every edit, every stop |
| Agent chain (6 sequential) | ~60-100K tokens | On demand |
| Claude Code Review | $15-25 per PR | Every PR |
| UAT execution | Agent + test run time | Before release |
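
For budgeting, the per-PR figure in the table scales linearly with PR volume. A small worked sketch, assuming the $15-25 range from the table holds for every review; `review_cost_range` is a hypothetical helper, and the 40 PRs in the usage note is an illustrative number, not a figure from this document.

```bash
# Sketch only: review_cost_range is a hypothetical helper.
# Prints the low-high dollar range for a given number of reviewed PRs,
# using the $15-25 per-review figure from the table above.
review_cost_range() {
  prs=$1
  echo "$((prs * 15))-$((prs * 25))"
}

# Usage:
#   review_cost_range 40    # 40 PRs in a month
```

At 40 PRs this works out to between 40 × 15 = 600 and 40 × 25 = 1000 dollars.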