vibe-coding-master 0.0.1

Files changed (52)
  1. package/LICENSE +201 -0
  2. package/README.md +226 -0
  3. package/dist/backend/adapters/claude-adapter.js +38 -0
  4. package/dist/backend/adapters/command-runner.js +33 -0
  5. package/dist/backend/adapters/filesystem.js +60 -0
  6. package/dist/backend/adapters/git-adapter.js +33 -0
  7. package/dist/backend/api/artifact-routes.js +109 -0
  8. package/dist/backend/api/message-routes.js +90 -0
  9. package/dist/backend/api/project-routes.js +17 -0
  10. package/dist/backend/api/session-routes.js +64 -0
  11. package/dist/backend/api/task-routes.js +30 -0
  12. package/dist/backend/errors.js +29 -0
  13. package/dist/backend/runtime/node-pty-runtime.js +162 -0
  14. package/dist/backend/runtime/session-registry.js +36 -0
  15. package/dist/backend/runtime/terminal-runtime.js +1 -0
  16. package/dist/backend/server.js +159 -0
  17. package/dist/backend/services/artifact-service.js +170 -0
  18. package/dist/backend/services/command-dispatcher.js +37 -0
  19. package/dist/backend/services/message-service.js +217 -0
  20. package/dist/backend/services/project-service.js +71 -0
  21. package/dist/backend/services/session-service.js +221 -0
  22. package/dist/backend/services/status-service.js +21 -0
  23. package/dist/backend/services/task-service.js +88 -0
  24. package/dist/backend/templates/handoff.js +76 -0
  25. package/dist/backend/templates/message-envelope.js +27 -0
  26. package/dist/backend/templates/role-command.js +21 -0
  27. package/dist/backend/templates/role-messaging-context.js +44 -0
  28. package/dist/backend/ws/terminal-ws.js +60 -0
  29. package/dist/cli/vcmctl.js +141 -0
  30. package/dist/main.js +63 -0
  31. package/dist/shared/constants.js +45 -0
  32. package/dist/shared/types/api.js +1 -0
  33. package/dist/shared/types/artifact.js +1 -0
  34. package/dist/shared/types/message.js +1 -0
  35. package/dist/shared/types/project.js +1 -0
  36. package/dist/shared/types/role.js +1 -0
  37. package/dist/shared/types/session.js +1 -0
  38. package/dist/shared/types/task.js +1 -0
  39. package/dist/shared/types/terminal.js +1 -0
  40. package/dist/shared/validation/artifact-check.js +64 -0
  41. package/dist/shared/validation/slug-check.js +22 -0
  42. package/dist-frontend/assets/index-Bah6k-Ix.css +32 -0
  43. package/dist-frontend/assets/index-EMaQuIB6.js +58 -0
  44. package/dist-frontend/index.html +13 -0
  45. package/docs/cc-best-practices.md +2142 -0
  46. package/docs/product-design.md +1597 -0
  47. package/docs/v1-architecture-design.md +1431 -0
  48. package/docs/v1-implementation-plan.md +1949 -0
  49. package/docs/v1-message-bus-orchestration-design.md +534 -0
  50. package/package.json +60 -0
  51. package/scripts/clean-build.mjs +12 -0
  52. package/scripts/fix-node-pty-spawn-helper.mjs +31 -0
@@ -0,0 +1,2142 @@
1
+ # Claude Code AI Coding Best Practices
2
+
3
+ Date: 2026-05-22
4
+
5
+
6
+ Core principle:
7
+
8
+ > AI coding reliability comes from two things: **public contract design**, which prevents architecture drift, and **public contract tests**, which prevent behavior drift.
9
+
10
+ Reliable loop:
11
+
12
+ ```text
13
+ task spec
14
+ -> file responsibilities / public function contracts
15
+ -> small-step implementation
16
+ -> layered testing
17
+ -> architecture and acceptance checks
18
+ -> documentation sync
19
+ -> Replan when needed
20
+ ```
21
+
22
+ ## 1. Basic Principles
23
+
24
+ Treat Claude Code as a smart engineer with limited context. It needs clear boundaries, executable feedback, and an acceptance checklist.
25
+
26
+ Good tasks for Claude:
27
+
28
+ - reproducible, testable bug fixes
29
+ - small to medium features with clear boundaries
30
+ - implementation that follows existing patterns
31
+ - test additions, PR review comment fixes, documentation drafts
32
+ - codebase exploration, explanation, onboarding
33
+
34
+ Do not hand these directly to Claude:
35
+
36
+ - “refactor the whole system”
37
+ - “figure it out” tasks for performance, auth, permissions, payments, schema, or data deletion
38
+ - complex business changes without a spec, tests, or acceptance criteria
39
+
40
+ High-risk tasks must reduce Claude’s autonomy:
41
+
42
+ - auth / permission
43
+ - payment / billing
44
+ - database schema / migration
45
+ - public API / SDK
46
+ - protocol / serialization
47
+ - data deletion / privacy
48
+ - concurrency / distributed consistency
49
+ - security-sensitive infrastructure
50
+
51
+ These tasks require a plan, public contracts, test contracts, validation commands, and human review.
52
+
53
+ Behavioral guardrails:
54
+
55
+ - State key assumptions before coding; call out unclear requirements, boundaries, or acceptance criteria.
56
+ - When multiple interpretations are reasonable, do not choose silently; explain the difference and tradeoff, and ask for confirmation when needed.
57
+ - Prefer the simplest solution that satisfies the task; do not add unrequested features, configuration, extension points, or abstractions.
58
+ - Touch only files required by the task; do not clean up, format, or refactor adjacent code opportunistically.
59
+ - Clean up only unused imports, variables, functions, or test leftovers created by the current change.
60
+ - Report-but-don't-act: when noticing an issue outside the current task scope (unrelated dead code, doc drift, adjacent bug, architecture concern, security smell), record it in `.ai/state/known-issues.md` and continue; do not act on it without an explicit task.
61
+ - Every diff line must trace to the task goal, public contract, test contract, or required documentation sync.
62
+ - For multi-step tasks, define the validation check for each step.
63
+
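The report-but-don't-act guardrail can be sketched as a small helper that appends a finding to the known-issues file instead of changing code. This is a minimal sketch: the function name `record_known_issue` and the entry layout are assumptions made for illustration; the source names only the file path.

```python
from datetime import date
from pathlib import Path


def record_known_issue(root: Path, summary: str, location: str, noticed_during: str) -> str:
    """Append one out-of-scope finding to .ai/state/known-issues.md and return the entry.

    The entry layout is hypothetical; the source only names the file.
    """
    entry = (
        f"- [{date.today().isoformat()}] {summary}\n"
        f"  - location: {location}\n"
        f"  - noticed during: {noticed_during}\n"
    )
    issues_file = root / ".ai" / "state" / "known-issues.md"
    # Create .ai/state/ on first use so the helper works in a fresh checkout.
    issues_file.parent.mkdir(parents=True, exist_ok=True)
    # Append only: earlier entries are never rewritten.
    with issues_file.open("a", encoding="utf-8") as handle:
        handle.write(entry)
    return entry
```

The helper only writes to the state file, so calling it never widens the diff of the task in progress.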
64
+ ## 2. Repo Harness Structure
65
+
66
+ Recommended structure:
67
+
68
+ ```text
69
+ repo/
70
+ CLAUDE.md
71
+
72
+ docs/
73
+ ARCHITECTURE.md
74
+ MODULE_MAP.md
75
+ TESTING.md
76
+ SECURITY.md
77
+ DEPENDENCY_RULES.md
78
+ exec-plans/
79
+ active/
80
+ completed/
81
+
82
+ .claude/
83
+ settings.json
84
+ skills/
85
+ agents/
86
+ project-manager.md
87
+ architect.md
88
+ coder.md
89
+ reviewer.md
90
+ optional/
91
+ security-specialist.md
92
+ migration-specialist.md
93
+ performance-specialist.md
94
+ frontend-qa.md
95
+ commands/
96
+
97
+ .ai/
98
+ task-specs/
99
+ handoffs/
100
+ <task-slug>/
101
+ architecture-plan.md
102
+ implementation-log.md
103
+ validation-log.md
104
+ review-report.md
105
+ state/
106
+ progress.md
107
+ decisions.md
108
+ validation-log.md
109
+ known-issues.md
110
+ scratch.md
111
+ generated/
112
+ module-index.json
113
+ test-map.json
114
+ public-surface.json
115
+
116
+ tools/
117
+ check-fast
118
+ check-changed
119
+ check-module
120
+ check-e2e-smoke
121
+ check-boundaries
122
+ check-public-surface
123
+ check-contract-tests
124
+ check-generated-artifacts
125
+ check-docs-freshness
126
+ check-agent-rules
127
+ ```
128
+
129
+ Required large-project baseline:
130
+
131
+ For a large project, the harness is not a maturity ladder. Treat the structure above as the baseline before letting Claude Code make non-trivial changes.
132
+
133
+ Missing pieces are not an accepted intermediate design. If a legacy project is missing part of the harness, record the gap in `.ai/state/known-issues.md` or an execution plan with owner, risk, and target date. High-risk work must wait until the relevant rules, role agents, docs, and validation commands exist.
134
+
135
+ Minimum baseline for non-trivial AI coding:
136
+
137
+ - root `CLAUDE.md`
138
+ - module-local `CLAUDE.md` for edited modules
139
+ - architecture, module map, testing, security, and dependency docs
140
+ - role agents for project management/user communication/translation, architecture/planning, coding, and independent review/testing
141
+ - task specs, handoff artifacts, progress state, decisions, validation logs, known issues, and generated context artifacts
142
+ - fast, changed-file, module, boundary, public-surface, contract-test, generated-artifact, docs-freshness, and agent-rule checks
143
+ - hooks or CI gates for protected files, validation, docs sync, public contracts, and test quality
144
+
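One way to make the baseline checkable is a script that lists which harness paths are missing from a repository. The sketch below assumes a hand-picked subset of the recommended structure; `BASELINE_PATHS` and `missing_baseline_paths` are illustrative names, not part of any existing tool.

```python
from pathlib import Path
from typing import List

# Assumed subset of the recommended structure; extend it to match the real repo.
BASELINE_PATHS = [
    "CLAUDE.md",
    "docs/ARCHITECTURE.md",
    "docs/MODULE_MAP.md",
    "docs/TESTING.md",
    "docs/SECURITY.md",
    "docs/DEPENDENCY_RULES.md",
    ".claude/agents/project-manager.md",
    ".claude/agents/architect.md",
    ".claude/agents/coder.md",
    ".claude/agents/reviewer.md",
    ".ai/state/known-issues.md",
    "tools/check-fast",
]


def missing_baseline_paths(root: Path, required: List[str] = BASELINE_PATHS) -> List[str]:
    """Return the required harness paths that do not exist under root, in list order."""
    return [rel for rel in required if not (root / rel).exists()]
```

A non-empty result could be copied into `.ai/state/known-issues.md` as the recorded gap, with owner, risk, and target date added by hand.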
145
+ ## 3. `CLAUDE.md`
146
+
147
+ `CLAUDE.md` is an entry map, not a project encyclopedia.
148
+
149
+ Include:
150
+
151
+ - one-sentence project description
152
+ - repository map
153
+ - common build / test / lint / typecheck commands
154
+ - documents Claude should read before starting
155
+ - module boundaries and forbidden actions
156
+ - high-risk areas
157
+ - files not to touch
158
+ - Definition of Done
159
+ - what to do when unsure
160
+
161
+ Do not include:
162
+
163
+ - long architecture essays
164
+ - full descriptions of every file
165
+ - complete business rule manuals
166
+ - all API docs
167
+ - long style guides
168
+ - vague rules like “write high-quality code”
169
+ - frequently changing task state
170
+
171
+ Root template:
172
+
173
+ ```md
174
+ # CLAUDE.md
175
+
176
+ ## Project Map
177
+
178
+ - `services/`: production services
179
+ - `packages/`: shared libraries
180
+ - `apps/`: user-facing applications
181
+ - `docs/`: architecture, testing, security, module docs
182
+ - `tools/`: validation and developer utilities
183
+
184
+ ## Start Here
185
+
186
+ - Read `docs/ARCHITECTURE.md` for system overview.
187
+ - Read `docs/MODULE_MAP.md` before choosing files.
188
+ - Read module-local `CLAUDE.md` before editing a subdirectory.
189
+ - Prefer existing APIs, services, helpers, and patterns before adding abstractions.
190
+
191
+ ## Commands
192
+
193
+ - Fast validation: `tools/check-fast`
194
+ - Changed files validation: `tools/check-changed`
195
+ - Module validation: `tools/check-module <module>`
196
+
197
+ ## Role Entry Points
198
+
199
+ Role-specific behavior lives in `.claude/agents/`.
200
+
201
+ - Use `claude --agent project-manager` for user communication, translation, task clarification, task specs, role routing, role commands, status summaries, and final acceptance.
202
+ - Use `claude --agent architect` for architecture plans, module boundaries, file responsibilities, public contracts, test contracts, and phase plans.
203
+ - Use `claude --agent coder` for implementation and direct tests within an approved plan.
204
+ - Use `claude --agent reviewer` for independent review, test adequacy, validation evidence, docs sync, and acceptance findings.
205
+ - Do not use an untagged session as an implicit project manager for non-trivial work.
206
+ - Do not simulate another role inside the wrong session.
207
+
208
+ ## Role Sessions
209
+
210
+ - For complex features, cross-module changes, refactors, public API changes, schema changes, auth, payment, permission, or security-sensitive work, start Claude Code with an explicit role: `claude --agent <role>`.
211
+ - Default core roles are `project-manager`, `architect`, `coder`, and `reviewer`.
212
+ - The `project-manager` role owns user communication, translation, task routing, role commands, handoff verification, and final status reporting.
213
+ - Do not let one coding session own architecture/plan decisions, implementation, final testing responsibility, and review.
214
+ - Role outputs are exchanged through `.ai/handoffs/<task-slug>/`, not through chat history.
215
+ - When the required role route includes `architect`, coding must not start until the architecture and plan artifact exists.
216
+ - If the current session was not started with the required role, stop and ask the user to restart with `claude --agent <role>`; do not pretend to be that role inside the wrong session.
217
+ - Critical global rules may be repeated in role agent files for defense in depth, but repeated rules must use stable rule IDs and be checked by `tools/check-agent-rules`. Do not maintain untracked manual copies.
218
+
219
+ ## Default Behavior
220
+
221
+ - State assumptions before coding; ask when requirements, boundaries, or acceptance criteria are unclear.
222
+ - When multiple interpretations are reasonable, do not choose silently; explain the difference and tradeoff, and ask for confirmation when needed.
223
+ - Prefer the simplest solution that satisfies the task; do not add speculative features, abstractions, configuration, or flexibility.
224
+ - Touch only files required by the task; do not clean up or refactor unrelated code.
225
+ - Clean up only unused code created by the current change.
226
+ - Report-but-don't-act: record out-of-scope issues in `.ai/state/known-issues.md`; do not act on them.
227
+ - Every changed line must trace to the task goal, public contract, test contract, or required documentation sync.
228
+ - For multi-step tasks, define the validation check for each step before implementing it.
229
+
230
+ ## Forbidden
231
+
232
+ - Do not edit generated, vendor, third-party, lock, or secret files unless explicitly requested.
233
+ - Do not introduce dependencies without approval.
234
+ - Do not bypass tests, lint, typecheck, auth, permissions, or security checks.
235
+ - Public API, schema, auth, payment, and permission changes require explicit plan and approval.
236
+ - Do not cross module boundaries through internal imports.
237
+
238
+ ## Definition of Done
239
+
240
+ - Diff is scoped to the task.
241
+ - Required validation passes.
242
+ - New or modified public functions have contract tests.
243
+ - Behavior changes have regression tests unless impractical.
244
+ - Plan, architecture, public contract, test strategy, and module responsibility changes are reflected in docs.
245
+ - Follow-ups are recorded in `.ai/state/known-issues.md` or the execution plan.
246
+ ```
247
+
248
+ Large projects must have module-local `CLAUDE.md` files:
249
+
250
+ ```text
251
+ services/billing/CLAUDE.md
252
+ services/auth/CLAUDE.md
253
+ apps/web/CLAUDE.md
254
+ packages/ui/CLAUDE.md
255
+ ```
256
+
257
+ Module files should define:
258
+
259
+ - module responsibility
260
+ - important files
261
+ - public entry points
262
+ - forbidden dependencies
263
+ - test commands
264
+ - historical pitfalls
265
+ - high-risk behavior
266
+
267
+ ## 4. Task Specs and Planning Granularity
268
+
269
+ Every task must define at least **file-level responsibilities**.
270
+
271
+ Ordinary PRs, features, and bug fixes must define **public function contracts**.
272
+
273
+ Public functions include:
274
+
275
+ - exported functions
276
+ - public methods
277
+ - module APIs
278
+ - service / controller / repository public entry points
279
+ - route handlers / command handlers
280
+ - hooks
281
+ - externally used component props
282
+
283
+ Planning granularity:
284
+
285
+ ```text
286
+ exploration / research module level + candidate files
287
+ large rewrite / greenfield module level + file responsibilities, refine by phase
288
+ ordinary feature file responsibilities + public function contracts
289
+ bug fix touched files + affected public behavior
290
+ public API / SDK / permissions contract level, interface-level design when needed
291
+ small internal change file responsibilities + existing function behavior constraints
292
+ ```
293
+
294
+ Principles:
295
+
296
+ ```text
297
+ module boundaries: must be explicit
298
+ file responsibilities: must be explicit
299
+ public function contracts: required for ordinary tasks
300
+ private helpers: depends on risk
301
+ function internals: usually not fixed in advance
302
+ ```
303
+
304
+ Large tasks can start with modules, directories, file responsibilities, data flow, and dependency direction. Before each implementation phase, define the public function contracts involved in that phase.
305
+
306
+ ### 4.1 Task Spec Template
307
+
308
+ ```md
309
+ # Task Spec
310
+
311
+ ## Goal
312
+
313
+ ## Background
314
+
315
+ ## Scope
316
+
317
+ ## Non-goals
318
+
319
+ ## Task Severity
320
+
321
+ ## Required Role Route
322
+
323
+ ## Handoff Directory
324
+
325
+ ## Relevant Files
326
+
327
+ ## File Responsibilities
328
+
329
+ For every file likely to be edited, define its responsibility.
330
+
331
+ ## Public Surface Contract
332
+
333
+ For ordinary PRs, feature additions, bug fixes, and high-risk changes, define:
334
+
335
+ - public/exported functions or methods
336
+ - module APIs
337
+ - inputs and outputs
338
+ - side effects
339
+ - error behavior
340
+ - dependency rules
341
+ - signatures that must remain unchanged
342
+
343
+ For large rewrites or greenfield work, each implementation phase must define public surface before coding.
344
+
345
+ ## Test Contract
346
+
347
+ For every new or modified public function, define required tests.
348
+
349
+ Minimum:
350
+ - happy path
351
+ - boundary or failure path
352
+
353
+ Business-critical functions also cover:
354
+ - invalid input
355
+ - permission or state constraints
356
+ - side effects
357
+ - idempotency
358
+ - historical regressions
359
+
360
+ ## Architecture Constraints
361
+
362
+ ## Stop Conditions
363
+
364
+ ## Expected Behavior
365
+
366
+ ## Validation Commands
367
+
368
+ ## Definition of Done
369
+
370
+ ## Risks
371
+
372
+ ## Questions
373
+ ```
374
+
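A task spec written from the template above can be checked mechanically for missing sections. The following is a minimal sketch that treats every `## ` heading in the template as required, which is an assumption; `TEMPLATE_SECTIONS` and `missing_sections` are hypothetical names.

```python
import re
from typing import List

# Headings copied from the task spec template; treating all of them as required is an assumption.
TEMPLATE_SECTIONS = [
    "Goal", "Background", "Scope", "Non-goals", "Task Severity",
    "Required Role Route", "Handoff Directory", "Relevant Files",
    "File Responsibilities", "Public Surface Contract", "Test Contract",
    "Architecture Constraints", "Stop Conditions", "Expected Behavior",
    "Validation Commands", "Definition of Done", "Risks", "Questions",
]

# Matches level-2 headings only: "## Goal" matches, "# Task Spec" and "### Detail" do not.
_HEADING = re.compile(r"^## +(.+?)[ \t]*$", re.MULTILINE)


def missing_sections(spec_text: str, required: List[str] = TEMPLATE_SECTIONS) -> List[str]:
    """Return the required section names that have no `## ` heading in the spec."""
    found = {match.group(1) for match in _HEADING.finditer(spec_text)}
    return [name for name in required if name not in found]
```

The check only confirms that a heading exists; whether the section under it is adequate still needs a human or reviewer pass.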
375
+ ### 4.2 Stop Conditions
376
+
377
+ Stop and update the plan before editing if:
378
+
379
+ - current session role does not match the required role route
380
+ - public API change seems necessary
381
+ - DB schema change seems necessary
382
+ - planned contract duplicates an existing API
383
+ - module boundaries make the plan inaccurate
384
+ - implementation needs to differ from the approved plan
385
+ - related architecture, module, testing, or docs would become stale
386
+
387
+ ## 5. Workflows
388
+
389
+ ### 5.1 Small Change
390
+
391
+ Use for single-file bugs, simple tests, copy, config, or known-pattern changes.
392
+
393
+ ```text
394
+ prompt
395
+ -> edit
396
+ -> focused validation
397
+ -> review diff
398
+ -> commit
399
+ ```
400
+
401
+ Prompt:
402
+
403
+ ```text
404
+ Fix the edge case in `src/foo.ts`.
405
+ Keep the diff minimal.
406
+ Run `pnpm test src/foo.test.ts`.
407
+ Report the validation result.
408
+ ```
409
+
410
+ ### 5.2 Complex Change
411
+
412
+ Use for multi-file changes, new features, business rules, or uncertain implementation paths.
413
+
414
+ ```text
415
+ Explore
416
+ -> Plan
417
+ -> approval
418
+ -> implement phase 1
419
+ -> validate
420
+ -> review
421
+ -> commit
422
+ -> implement next phase
423
+ ```
424
+
425
+ Exploration must not edit files:
426
+
427
+ ```text
428
+ Explore the codebase and create an implementation plan.
429
+ Do not edit files yet.
430
+
431
+ Include:
432
+ - relevant files
433
+ - proposed changes
434
+ - public surface contract
435
+ - tests to add/update
436
+ - validation commands
437
+ - risks
438
+ - questions
439
+ ```
440
+
441
+ ### 5.3 Debug
442
+
443
+ ```text
444
+ reproduction
445
+ -> hypotheses
446
+ -> instrumentation
447
+ -> reproduce
448
+ -> inspect logs
449
+ -> targeted fix
450
+ -> regression test
451
+ ```
452
+
453
+ Prompt:
454
+
455
+ ```text
456
+ Debug this issue. Do not guess a fix yet.
457
+
458
+ First:
459
+ 1. List plausible hypotheses.
460
+ 2. Identify where to inspect.
461
+ 3. Propose the smallest validation command.
462
+
463
+ Then make a targeted fix and add a regression test.
464
+ ```
465
+
466
+ ### 5.4 TDD
467
+
468
+ Use for bugs, parsers, serializers, validators, calculators, state machines, public API behavior, and functionality with clear input/output.
469
+
470
+ ```text
471
+ write failing contract test
472
+ -> confirm it fails
473
+ -> freeze test expectations
474
+ -> implement
475
+ -> do not weaken test
476
+ -> pass focused test
477
+ -> run module validation
478
+ ```
479
+
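The TDD flow above can be illustrated with a small validator. In this sketch `parse_percent` is a hypothetical public function invented for the example; the contract tests cover the happy path plus boundary and failure paths, and in the real flow they would be written and confirmed failing before the function body exists.

```python
def parse_percent(text: str) -> int:
    """Hypothetical public function: parse "0%" through "100%" into an int."""
    if not text.endswith("%"):
        raise ValueError(f"missing % sign: {text!r}")
    digits = text[:-1]
    if not digits.isdigit():
        raise ValueError(f"not a whole number: {text!r}")
    value = int(digits)
    if value > 100:
        raise ValueError(f"out of range: {text!r}")
    return value


# Contract tests: their expectations stay frozen while the implementation changes.
def test_happy_path():
    assert parse_percent("42%") == 42


def test_boundaries():
    assert parse_percent("0%") == 0
    assert parse_percent("100%") == 100


def test_failure_paths():
    for bad in ["", "42", "-1%", "101%", "4.2%"]:
        try:
            parse_percent(bad)
        except ValueError:
            continue
        raise AssertionError(f"expected ValueError for {bad!r}")
```

If an implementation change makes one of these tests fail, the flow says to fix the implementation, not to relax the assertion.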
480
+ ### 5.5 Review
481
+
482
+ Review should prioritize:
483
+
484
+ - correctness
485
+ - security / permission risk
486
+ - regressions
487
+ - missing tests
488
+ - architecture boundary violations
489
+ - public contract mismatch
490
+
491
+ Use a `reviewer` role session for complex or high-risk tasks. A fresh review session or reviewer subagent is acceptable for smaller scoped changes. Do not let the same session that implemented the change be the only reviewer.
492
+
493
+ ## 6. Context Management
494
+
495
+ One session should correspond to one coherent task.
496
+
497
+ Continue the same session when:
498
+
499
+ - still working on the same bug / feature / review comment
500
+ - prior exploration context is needed
501
+ - fixing a problem Claude just introduced
502
+
503
+ Start a new session when:
504
+
505
+ - switching tasks
506
+ - moving from implementation to independent review
507
+ - the current session has read too many unrelated files
508
+ - Claude repeats the same mistake
509
+ - a phase is completed and committed
510
+ - fresh eyes are needed
511
+
512
+ Rule of thumb:
513
+
514
+ > Continue when the next action depends on previous reasoning. Start fresh when the next action needs independent judgment or when Claude gets confused.
515
+
516
+ For large-codebase exploration, use read-only subagents. Keep only findings, file paths, and the plan in the `project-manager` session or the current owning role session.
517
+
518
+ Context should include:
519
+
520
+ - files to edit
521
+ - related tests
522
+ - module rules
523
+ - failure logs
524
+ - architecture boundaries
525
+ - concrete examples
526
+ - acceptance criteria
527
+
528
+ Do not include:
529
+
530
+ - many unrelated files
531
+ - full old chat histories
532
+ - long external docs
533
+ - stale design docs
534
+ - unrelated CI logs
535
+
536
+ ## 7. Role-Based Agent Sessions
537
+
538
+ For large projects, the default execution model should be explicit role-based sessions, not dynamic role routing inside one generic Claude conversation.
539
+
540
+ The user-facing task should start with a `project-manager` role session. The project manager owns user communication, translation, role command dispatch, severity classification, role routing, progress tracking, and process verification. It does not own architecture, coding, or independent review for the same non-trivial task.
541
+
542
+ Do not make one generic Claude session own architecture, planning, coding, final testing, and review for non-trivial work. That blurs responsibility and makes acceptance weak.
543
+
544
+ ### 7.1 Project Manager Session
545
+
546
+ Start the user-facing coordination session explicitly:
547
+
548
+ ```bash
549
+ claude --agent project-manager
550
+ ```
551
+
552
+ Project manager responsibilities:
553
+
554
+ ```text
555
+ communicate with user
556
+ -> clarify task
557
+ -> translate user intent into a task brief / task spec
558
+ -> classify severity
559
+ -> choose required role route
560
+ -> prepare the next role command
561
+ -> ensure handoff directory exists when needed
562
+ -> start or ask the user to start architect/coder/reviewer/specialist sessions when needed
563
+ -> track progress, blockers, validation, docs sync, and Replan
564
+ -> verify role outputs and handoff artifacts
565
+ -> summarize final status and risks to the user
566
+ ```
567
+
568
+ The project manager is a process owner, not an execution owner.
569
+
570
+ It is also the communication bridge between the user and the role agents. The user should not need to know how to write a perfect Claude Code prompt. The project manager owns the translation from user intent to precise agent instructions.
571
+
572
+ It may route T0/T1 work to a lightweight coder flow when the task is small, scoped, and low risk. For non-trivial work, it coordinates role sessions and verifies the process.
573
+
574
+ Do not let the project manager:
575
+
576
+ - implement complex changes directly
577
+ - skip required `architect`, `coder`, or `reviewer` sessions
578
+ - approve coder output without independent reviewer evidence
579
+ - bypass the required role route for high-risk work
580
+ - turn coordination into a do-everything session
581
+
582
+ ### 7.1.1 Role Command Contract
583
+
584
+ For non-trivial tasks, the project manager must not hand off a vague prompt to the next role. It must prepare a role command that is specific enough for that role to execute without recovering missing process context from chat history.
585
+
586
+ A role command must include:
587
+
588
+ ```text
589
+ role identity
590
+ task spec path
591
+ required input artifacts
592
+ allowed write scope
593
+ public surface contract
594
+ test contract
595
+ stop conditions
596
+ validation commands
597
+ expected output artifact path
598
+ escalation / Replan triggers
599
+ ```
600
+
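The required fields listed above lend themselves to a structured record that can be validated before dispatch. This is a sketch under assumptions: the `RoleCommand` class, its field names, and the `missing_fields` helper are illustrative and not an existing API.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class RoleCommand:
    """One field per required item in the role command contract (names are assumed)."""
    role: str
    task_spec_path: str
    input_artifacts: List[str]
    allowed_write_scope: List[str]
    public_surface_contract: str
    test_contract: str
    stop_conditions: List[str]
    validation_commands: List[str]
    output_artifact_path: str
    replan_triggers: List[str]

    def missing_fields(self) -> List[str]:
        """Return the names of fields left as an empty string or empty list."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]
```

A project manager session could refuse to hand off any command whose `missing_fields()` result is non-empty, which keeps vague commands out of the handoff directory.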
601
+ Role command examples:
602
+
603
+ ```text
604
+ architect command:
605
+ read the task spec, architecture docs, module map, and relevant module-local CLAUDE.md
606
+ produce .ai/handoffs/<task-slug>/architecture-plan.md
607
+ define file responsibilities, public contracts, test contracts, phases, validation, and Replan triggers
608
+ do not edit production code
609
+
610
+ coder command:
611
+ read the task spec and approved architecture-plan.md
612
+ implement only the approved phase and allowed files
613
+ add or update direct contract/regression tests
614
+ update implementation-log.md and validation-log.md
615
+ stop if scope, public contract, architecture, or test strategy must change
616
+
617
+ reviewer command:
618
+ read task spec, architecture-plan.md, implementation-log.md, validation-log.md, and git diff
619
+ verify scope, architecture, public contract, tests, validation, and docs sync
620
+ write review-report.md
621
+ only apply small, local, low-risk review-scoped fixes
622
+ ```
623
+
624
+ The project manager may use a prompt compiler or template system to build role commands, but the responsibility stays with the project manager. A role command is an auditable artifact: if a role agent fails because the command was vague, the harness should improve the command template rather than blaming the role agent alone.
625
+
626
+ ### 7.2 Session-Wide Role Agents
627
+
628
+ Rather than running one generic session, start each major phase with an explicit session-wide role:
629
+
630
+ ```bash
631
+ claude --agent project-manager
632
+ claude --agent architect
633
+ claude --agent coder
634
+ claude --agent reviewer
635
+ ```
636
+
637
+ For background work:
638
+
639
+ ```bash
640
+ claude --agent reviewer --bg "Review PR 123 for architecture drift, test gaps, and scope creep"
641
+ ```
642
+
643
+ The role is selected at session startup. The agent file defines that session's system prompt, tool restrictions, model, stop conditions, and output format. `CLAUDE.md` still provides project rules, but critical safety, architecture, permission, and output constraints must be repeated inside the role agent file.
644
+
645
+ If the current session was not started with the required role, stop and ask the user to restart with the correct `claude --agent <role>` command. Do not simulate a different role through a normal prompt.
646
+
647
+ ### 7.3 Task Severity Routing
648
+
649
+ This is not progressive adoption. The full harness exists by default; the role chain depends on task risk.
650
+
651
+ All user-facing routes begin with `project-manager`. The project manager may hand off T0/T1 work to `coder` quickly, but it still owns translation, status reporting, and acceptance communication.
652
+
653
+ | Task class | Examples | Required role route |
654
+ | --- | --- | --- |
655
+ | T0 trivial | copy, comments, docs typo, tiny config with no behavior change | `project-manager` -> `coder`; optional reviewer checklist |
656
+ | T1 small scoped change | single-file bug, focused test addition, known-pattern fix | `project-manager` -> `coder` -> fresh review context or `reviewer` |
657
+ | T2 ordinary feature | bounded behavior, normal multi-file feature, ordinary PR | `project-manager` -> `architect` -> `coder` -> `reviewer` |
658
+ | T3 cross-module / architectural | cross-module change, module boundary change, refactor, new public surface | `project-manager` -> `architect` -> `coder` -> `reviewer` |
659
+ | T4 high-risk | auth, permission, payment, billing, schema, data deletion, public API/SDK, security-sensitive infrastructure | `project-manager` -> `architect` -> relevant specialist if needed -> `coder` -> `reviewer` -> human approval |
660
+ | T5 large rewrite / greenfield | new subsystem, major rewrite, migration across many modules | `project-manager` -> `architect`; then repeat `coder` -> `reviewer` per phase, with architect review at phase boundaries |
661
+
662
+ If classification is unclear, use the stricter route.
663
+
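The routing table and the stricter-route rule can be sketched as a lookup. Several simplifications here are assumptions: T5 is omitted because its route repeats per phase, T1 always uses `reviewer`, the optional specialist in T4 is left out, `human-approval` is a label and not a role agent, and a higher T number is treated as stricter.

```python
from typing import Dict, List

# Simplified routes for T0 through T4, in increasing strictness (insertion order matters).
ROUTES: Dict[str, List[str]] = {
    "T0": ["project-manager", "coder"],
    "T1": ["project-manager", "coder", "reviewer"],
    "T2": ["project-manager", "architect", "coder", "reviewer"],
    "T3": ["project-manager", "architect", "coder", "reviewer"],
    "T4": ["project-manager", "architect", "coder", "reviewer", "human-approval"],
}


def route_for(candidate_classes: List[str]) -> List[str]:
    """Return the route of the strictest candidate class.

    Pass every class the task might plausibly belong to; an unknown class
    name raises ValueError instead of falling back to a weaker route.
    """
    if not candidate_classes:
        raise ValueError("at least one candidate class is required")
    order = list(ROUTES)
    strictest = max(candidate_classes, key=order.index)
    return ROUTES[strictest]
```

For example, a task that could be read as either T1 or T2 gets the T2 route, so the architect step is not skipped.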
664
+ ### 7.4 Required Roles
665
+
666
+ Large projects should define these project-level agents:
667
+
668
+ ```text
669
+ .claude/agents/
670
+ project-manager.md
671
+ architect.md
672
+ coder.md
673
+ reviewer.md
674
+ optional/
675
+ security-specialist.md
676
+ migration-specialist.md
677
+ performance-specialist.md
678
+ frontend-qa.md
679
+ ```
680
+
681
+ Role responsibilities:
682
+
683
+ ```text
684
+ project-manager
685
+ owns user communication, multilingual translation, task clarification, task specs, role routing, and role command dispatch
686
+ translates user input into an English engineering task when needed
687
+ translates role outputs back into the user's preferred language
688
+ creates and verifies handoff artifacts
689
+ tracks progress, blockers, validation, docs sync, and Replan
690
+ outputs task specs, role commands, status summaries, and final acceptance reports
691
+ must not own architecture, implementation, and independent review for the same non-trivial task
692
+
693
+ architect
694
+ owns architecture and plan
695
+ defines module boundaries, file responsibilities, public contracts, dependency direction, risk, and phases
696
+ outputs .ai/handoffs/<task-slug>/architecture-plan.md
697
+ must not implement production code
698
+
699
+ coder
700
+ owns code changes and baseline tests required to complete the approved task
701
+ follows approved architecture-plan.md and task spec
702
+ outputs touched files, implementation notes, validation results, and follow-ups
703
+ must write/update direct unit, contract, or regression tests needed for the changed behavior
704
+ must not change module responsibilities, public contracts, architecture direction, or test strategy without Replan
705
+
706
+ reviewer
707
+ owns independent acceptance and final test responsibility
708
+ checks scope, role compliance, architecture compliance, public contract compliance, docs sync, validation evidence, and risk
709
+ checks, designs, and adds missing tests when needed
710
+ may directly apply small, local, low-risk review fixes
711
+ owns complex tests, E2E coverage, regression matrix, and release-level validation recommendations
712
+ outputs .ai/handoffs/<task-slug>/review-report.md
713
+ must escalate larger implementation issues to coder
714
+ must escalate architecture, public contract, or design issues to architect
715
+ ```
716
+
717
+ ### 7.5 Role Permission Matrix
718
+
719
+ Prompt rules are not enough. Role separation must be backed by tool scope, permission mode, hooks, and review.
720
+
721
+ | Role | Suggested tools | Write scope | Must not |
722
+ | --- | --- | --- | --- |
723
+ | `project-manager` | `Read`, `Grep`, `Glob`, `Bash`, `Edit`, `Write` | task specs, role commands, handoff metadata, status/progress/known-issues, final reports | implement non-trivial production code, approve without reviewer evidence, replace architect/coder/reviewer roles |
724
+ | `architect` | `Read`, `Grep`, `Glob`, `Bash`, `Edit`, `Write` | architecture plan, task spec, architecture docs only with approval | edit production code, rewrite tests, expand task scope |
725
+ | `coder` | `Read`, `Grep`, `Glob`, `Bash`, `Edit`, `Write` | approved source files, baseline tests, validation log, implementation log | change scope, public contracts, module boundaries, or test strategy without Replan |
726
+ | `reviewer` | `Read`, `Grep`, `Glob`, `Bash`, `Edit`, `Write` | review report, missing tests/fixtures, validation log, small review-scoped fixes | take over implementation, change architecture/public contracts, approve own implementation, weaken tests |
727
+ | `security-specialist` | `Read`, `Grep`, `Glob`, `Bash`, `Edit`, `Write` | security review report and approved security tests | bypass approvals, edit production code without explicit scope |
728
+ | `migration-specialist` | `Read`, `Grep`, `Glob`, `Bash`, `Edit`, `Write` | migration plan, migration tests, validation notes | run destructive migrations, change schema without approval |
729
+ | `performance-specialist` | `Read`, `Grep`, `Glob`, `Bash`, `Edit`, `Write` | performance report, benchmarks, approved perf tests | change product behavior, hide regressions |
730
+
731
+ Recommended permission modes:
732
+
733
+ ```text
734
+ project-manager: default with write hooks limited to task specs, role commands, handoff metadata, state files, and final reports
735
+ architect: default with write hooks limited to architecture-plan.md, task specs, and approved docs
736
+ coder: default or acceptEdits, but only inside approved scope
737
+ reviewer: default with production-code writes blocked except explicitly review-scoped small fixes; test writes allowed
738
+ specialist: default with write hooks limited to specialist reports, tests, and approved files
739
+ ```
740
+
741
+ Tool lists alone cannot enforce path-level ownership. Add hooks or CI checks that reject writes outside each role's allowed scope. If path-scoped enforcement is unavailable, the final review must explicitly inspect role ownership violations.
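The path-level check described above could be sketched as follows. This is a minimal illustration, not the package's implementation: the `WRITE_SCOPE` table, the directory prefixes, and the function names are all hypothetical and would need to match the project's real layout.

```typescript
// Hypothetical roles and path prefixes; adapt to the project's real layout.
type ScopedRole = "project-manager" | "architect" | "coder" | "reviewer";

// A pattern is either an exact path or a directory prefix ending in "/".
const WRITE_SCOPE: Record<ScopedRole, string[]> = {
  "project-manager": [".ai/handoffs/", ".ai/state/"],
  architect: [".ai/handoffs/", "docs/"],
  coder: ["src/", "tests/", ".ai/handoffs/"],
  reviewer: ["tests/", ".ai/handoffs/"],
};

function isWriteAllowed(role: ScopedRole, path: string): boolean {
  // Strip a leading "./" and reject path traversal outright.
  const normalized = path.replace(/^\.\//, "");
  if (normalized.split("/").includes("..")) return false;
  return WRITE_SCOPE[role].some((pattern) =>
    pattern.endsWith("/") ? normalized.startsWith(pattern) : normalized === pattern,
  );
}

// Returns the touched paths that fall outside the role's allowed scope.
function findScopeViolations(role: ScopedRole, touched: string[]): string[] {
  return touched.filter((p) => !isWriteAllowed(role, p));
}
```

A hook or CI job could call `findScopeViolations` with the list of touched files and fail when the result is non-empty.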
742
+
743
+ ### 7.6 Handoff Contract
744
+
745
+ Role sessions communicate through files, not memory from previous chats.
746
+
747
+ Required handoff directory:
748
+
749
+ ```text
750
+ .ai/handoffs/<task-slug>/
751
+ role-commands/
752
+ architect-command.md
753
+ coder-command.md
754
+ reviewer-command.md
755
+ architecture-plan.md
756
+ implementation-log.md
757
+ validation-log.md
758
+ review-report.md
759
+ ```
760
+
761
+ Each role session must start by reading the artifacts it depends on:
762
+
763
+ ```text
764
+ project-manager
765
+ reads: user request, repo entry docs, task state, role outputs
766
+ writes: task spec, role commands, progress/status, known issues, final acceptance report
767
+
768
+ architect
769
+ reads: task request, task spec, ARCHITECTURE.md, MODULE_MAP.md, module-local CLAUDE.md, relevant source/tests
770
+ writes: architecture-plan.md
771
+
772
+ coder
773
+ reads: task spec, architecture-plan.md, relevant module docs
774
+ writes: code, baseline tests, implementation-log.md, validation-log.md
775
+
776
+ reviewer
777
+ reads: task spec, architecture-plan.md, implementation-log.md, validation-log.md, git diff
778
+ writes: review-report.md
779
+
780
+ optional specialist
781
+ reads: task spec, architecture-plan.md, relevant source/tests
782
+ writes: specialist report, approved tests, validation-log.md
783
+ ```
784
+
785
+ Test responsibility split between coder and reviewer:
786
+
787
+ ```text
788
+ coder:
789
+ writes direct tests required by the code change
790
+ runs focused validation
791
+
792
+ reviewer:
793
+ owns final test adequacy
794
+ identifies and adds missing unit/contract/integration tests when needed
795
+ owns complex test strategy, E2E smoke/release coverage, and regression matrix
796
+ may directly apply small, local, low-risk review fixes
797
+ must request coder fixes for larger implementation issues
798
+ must request architect review for architecture, public contract, dependency, schema, auth, permission, payment, or design issues
799
+ must not weaken tests to pass validation
800
+ ```
801
+
802
+ Reviewer direct fixes must be review-scoped:
803
+
804
+ ```text
805
+ allowed:
806
+ strengthen test assertions
807
+ add missing small boundary/regression tests
808
+ fix test names, fixtures, or validation documentation
809
+ fix obvious typo, import, lint, formatting, or local compile error
810
+ fix a small local bug discovered during review
811
+
812
+ required conditions:
813
+ small and local
814
+ low-risk
815
+ no public contract change
816
+ no architecture change
817
+ no new dependency
818
+ no schema/migration change
819
+ no auth, permission, payment, or data deletion behavior change
820
+ no broad production rewrite
821
+
822
+ escalate to coder:
823
+ business logic needs a medium or large change
824
+ multiple production files need coordinated edits
825
+ implementation structure needs rework
826
+ validation fails because core behavior is wrong
827
+ the fix would exceed a small review patch
828
+
829
+ escalate to architect:
830
+ module boundary is wrong
831
+ file responsibilities are wrong
832
+ public contract is wrong
833
+ dependency direction is wrong
834
+ schema, auth, permission, payment, public API, or security design is wrong
835
+ the implementation reveals that the architecture plan is invalid
836
+ ```
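The routing rules above can be read as a small decision function. The sketch below is one possible encoding under stated assumptions: the `ReviewFix` fields and the `routeReviewFix` name are hypothetical, and sending data-deletion changes to the architect is an interpretation, since the source lists them only under the conditions a reviewer fix must not violate.

```typescript
// Hypothetical description of a fix the reviewer is considering.
interface ReviewFix {
  isSmallAndLocal: boolean;
  productionFilesTouched: number;
  changesPublicContract: boolean;
  changesArchitecture: boolean;
  addsDependency: boolean;
  changesSchemaOrMigration: boolean;
  touchesSensitiveBehavior: boolean; // auth, permission, payment, data deletion
}

type FixRoute = "reviewer-fix" | "escalate-to-coder" | "escalate-to-architect";

function routeReviewFix(fix: ReviewFix): FixRoute {
  // Design-level problems are checked first and go to the architect.
  if (
    fix.changesPublicContract ||
    fix.changesArchitecture ||
    fix.addsDependency ||
    fix.changesSchemaOrMigration ||
    fix.touchesSensitiveBehavior
  ) {
    return "escalate-to-architect";
  }
  // Anything larger than a small local patch goes back to the coder.
  if (!fix.isSmallAndLocal || fix.productionFilesTouched > 1) {
    return "escalate-to-coder";
  }
  return "reviewer-fix";
}
```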
837
+
838
+ For a task with a handoff directory, `.ai/handoffs/<task-slug>/validation-log.md` is the authoritative validation record for that task. `.ai/state/validation-log.md` is only a rolling index of recent validation results across tasks.
839
+
840
+ For complex or high-risk work, the next role must not start until the required previous artifact exists and is coherent.
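A gate for this rule might look like the sketch below. It assumes the caller has already read the handoff directory into a map of file name to content; the `REQUIRED_BEFORE_START` table mirrors the handoff-directory artifacts each role reads, and the names are hypothetical. It checks only that artifacts exist and are non-empty; judging whether an artifact is coherent still needs a reader.

```typescript
// Handoff artifacts each role needs before it starts.
// File names are relative to .ai/handoffs/<task-slug>/.
const REQUIRED_BEFORE_START: Record<string, string[]> = {
  architect: [],
  coder: ["architecture-plan.md"],
  reviewer: ["architecture-plan.md", "implementation-log.md", "validation-log.md"],
};

// Returns the artifacts that are missing or blank; an empty result means the role may start.
function missingArtifacts(role: string, files: Map<string, string>): string[] {
  const required = REQUIRED_BEFORE_START[role] ?? [];
  return required.filter((name) => (files.get(name) ?? "").trim().length === 0);
}
```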
841
+
842
+ Handoff artifact schemas:
843
+
844
+ ```md
845
+ # architecture-plan.md
846
+
847
+ ## Architecture Summary
848
+ ## Task Classification
849
+ ## Required Role Route
850
+ ## Modules / Files
851
+ ## File Responsibilities
852
+ ## Public Surface Contract
853
+ ## Dependency Direction
854
+ ## Data Flow
855
+ ## Phases
856
+ ## Files Per Phase
857
+ ## Validation Per Phase
858
+ ## Rollback / Replan Triggers
859
+ ## Risks
860
+ ## Stop Conditions
861
+ ## Docs To Update
862
+ ## Approval
863
+
864
+ # implementation-log.md
865
+
866
+ ## Summary
867
+ ## Files Changed
868
+ ## Public Surface Changed
869
+ ## Tests Added / Updated
870
+ ## Validation Run
871
+ ## Deviations From Architecture Plan
872
+ ## Follow-ups
873
+
874
+ # validation-log.md
875
+
876
+ ## <timestamp> <command>
877
+
878
+ - role:
879
+ - commit / diff:
880
+ - scope:
881
+ - result:
882
+ - failures:
883
+ - fixes:
884
+ - rerun:
885
+
886
+ # review-report.md
887
+
888
+ ## Summary
889
+ ## Role / Handoff Compliance
890
+ ## Scope Review
891
+ ## Architecture Review
892
+ ## Public Contract Review
893
+ ## Test Review
894
+ ## Missing Tests Added
895
+ ## Review Fixes Applied
896
+ ## Escalations To Coder / Architect
897
+ ## E2E / Regression Recommendation
898
+ ## Validation Evidence
899
+ ## Docs Sync
900
+ ## Findings
901
+ ## Decision
902
+ ```
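Because the schemas above are lists of `##` headings, a structural check is cheap to automate. The following sketch assumes the artifact is plain Markdown and checks only that each required heading is present; `missingHeadings` and the constant name are hypothetical.

```typescript
// Required sections of implementation-log.md, copied from the schema above.
const IMPLEMENTATION_LOG_HEADINGS = [
  "Summary",
  "Files Changed",
  "Public Surface Changed",
  "Tests Added / Updated",
  "Validation Run",
  "Deviations From Architecture Plan",
  "Follow-ups",
];

// Returns the required headings that do not appear as "## <heading>" lines.
function missingHeadings(markdown: string, required: string[]): string[] {
  const present = new Set(
    markdown
      .split(/\r?\n/)
      .filter((line) => line.startsWith("## "))
      .map((line) => line.slice(3).trim()),
  );
  return required.filter((heading) => !present.has(heading));
}
```

The same function works for the other artifacts by passing a different heading list.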
903
+
904
+ ### 7.7 Role Session vs Subagent
905
+
906
+ Role sessions for the same task should normally share the same task worktree and branch.
907
+
908
+ Worktree isolation is by task, not by role:
909
+
910
+ ```text
911
+ one task
912
+ -> one branch
913
+ -> one worktree
914
+ -> one handoff directory
915
+ -> architect -> coder -> reviewer in sequence
916
+ ```
917
+
918
+ Do not create separate worktrees only because the task uses `architect`, `coder`, and `reviewer`. That fragments context, makes diffs harder to audit, and adds merge overhead without improving role separation.
919
+
920
+ Role separation comes from:
921
+
922
+ - session role
923
+ - write permissions
924
+ - handoff files
925
+ - phase boundaries
926
+ - validation and review
927
+
928
+ Worktree separation is for different tasks or truly parallel writable sub-tasks.
929
+
930
+ Use a role session when:
931
+
932
+ - the phase is the main work, not a side task
933
+ - the role needs sustained interaction with the user
934
+ - the role owns decisions or artifacts
935
+ - the role may run for a long time
936
+ - the role needs clear accountability
937
+
938
+ Use a subagent when:
939
+
940
+ - the work is a bounded side task
941
+ - the task produces verbose output that should not pollute the main context
942
+ - the task can return a concise summary
943
+ - the task is read-only exploration, review, triage, or log analysis
944
+ - the task can safely run in parallel
945
+
946
+ Do not use dynamic subagent routing as the primary workflow for architecture/plan -> coding -> independent review/testing. Use explicit role sessions and file handoffs for that.
947
+
948
+ ### 7.8 Agent File Contract
949
+
950
+ Every role agent file should define:
951
+
952
+ ```md
953
+ ---
954
+ name: architect
955
+ description: Use as a session-wide role for architecture design, task planning, module boundaries, file responsibilities, public contracts, dependency direction, and risk assessment.
956
+ tools: Read, Grep, Glob, Bash, Edit, Write
957
+ disallowedTools: Agent
958
+ permissionMode: default
959
+ model: sonnet
960
+ ---
961
+
962
+ # Role
963
+
964
+ You are the architecture and planning role for this project.
965
+
966
+ # Global Rules To Repeat
967
+
968
+ - Follow root `CLAUDE.md`, module-local `CLAUDE.md`, and the relevant handoff artifacts.
969
+ - Do not exceed this role's write scope.
970
+ - Stop when scope, architecture, public contract, test strategy, or risk changes.
971
+
972
+ # Responsibilities
973
+
974
+ - Define module boundaries.
975
+ - Define file-level responsibilities.
976
+ - Define public function and public API contracts.
977
+ - Identify dependency direction and forbidden imports.
978
+ - Split work into phases.
979
+ - Define validation per phase.
980
+ - Identify architecture risks and stop conditions.
981
+
982
+ # Required Inputs
983
+
984
+ - task spec or user request
985
+ - `docs/ARCHITECTURE.md`
986
+ - `docs/MODULE_MAP.md`
987
+ - relevant module-local `CLAUDE.md`
988
+
989
+ # Outputs
990
+
991
+ - `.ai/handoffs/<task-slug>/architecture-plan.md`
992
+
993
+ # Do Not
994
+
995
+ - Do not implement production code.
996
+ - Do not rewrite tests.
997
+ - Do not invent product requirements.
998
+ - Do not bypass module ownership rules.
999
+
1000
+ # Stop Conditions
1001
+
1002
+ - Requested behavior is ambiguous.
1003
+ - The design requires public API, schema, auth, payment, permission, or security boundary changes without approval.
1004
+ - The existing architecture cannot support the requested behavior without Replan.
1005
+ ```
1006
+
1007
+ Use frontmatter fields such as `tools`, `disallowedTools`, `permissionMode`, `hooks`, `mcpServers`, and `skills` when the role needs stricter tool, permission, or integration boundaries.
1008
+
1009
+ Minimum role templates:
1010
+
1011
+ ```text
1012
+ project-manager.md
1013
+ frontmatter:
1014
+ tools: Read, Grep, Glob, Bash, Edit, Write
1015
+ permissionMode: default
1016
+ required inputs:
1017
+ user request, repo entry docs, current task state, role outputs
1018
+ outputs:
1019
+ task spec, role commands, progress/status updates, final acceptance report
1020
+ do not:
1021
+ implement non-trivial production code, replace architect/coder/reviewer, approve without reviewer evidence
1022
+ stop when:
1023
+ user intent is ambiguous, role route is unclear, handoff artifacts are missing, or scope/risk changes
1024
+
1025
+ architect.md
1026
+ frontmatter:
1027
+ tools: Read, Grep, Glob, Bash, Edit, Write
1028
+ permissionMode: default
1029
+ required inputs:
1030
+ task spec, ARCHITECTURE.md, MODULE_MAP.md, module-local CLAUDE.md
1031
+ outputs:
1032
+ architecture-plan.md
1033
+ do not:
1034
+ implement code, rewrite tests, expand task scope
1035
+ stop when:
1036
+ public API, schema, auth, permission, payment, or security boundaries need approval
1037
+
1038
+ coder.md
1039
+ frontmatter:
1040
+ tools: Read, Grep, Glob, Bash, Edit, Write
1041
+ permissionMode: default
1042
+ required inputs:
1043
+ task spec, architecture-plan.md
1044
+ outputs:
1045
+ code, baseline tests, implementation-log.md, validation-log.md
1046
+ do not:
1047
+ change architecture, public contracts, scope, test strategy, or module responsibilities without Replan
1048
+ stop when:
1049
+ implementation requires design, contract, dependency, schema, permission, or test-strategy changes
1050
+
1051
+ reviewer.md
1052
+ frontmatter:
1053
+ tools: Read, Grep, Glob, Bash, Edit, Write
1054
+ permissionMode: default
1055
+ required inputs:
1056
+ task spec, architecture-plan.md, implementation-log.md, validation-log.md, git diff
1057
+ outputs:
1058
+ review-report.md, missing tests/fixtures when needed, review-scoped small fixes, validation-log.md
1059
+ do not:
1060
+ take over implementation, change architecture/public contracts, weaken tests, lower assertions, delete failing tests, approve own implementation
1061
+ stop when:
1062
+ handoffs are missing, validation evidence is missing, architecture/test/doc compliance cannot be verified, or the fix is no longer small/local/low-risk
1063
+ ```
1064
+
1065
+ ### 7.9 Default Workflow
1066
+
1067
+ For large features:
1068
+
1069
+ ```text
1070
+ project-manager session
1071
+ -> communicate with user + translate intent + classify task + route roles + track process
1072
+
1073
+ architect session
1074
+ -> architecture-plan.md
1075
+
1076
+ coder session
1077
+ -> code + baseline tests + implementation-log.md + validation-log.md
1078
+
1079
+ reviewer session
1080
+ -> review-report.md + missing tests/fixtures if needed + validation-log.md
1081
+
1082
+ human approval
1083
+ ```
1084
+
1085
+ For small bug fixes or ordinary PRs, one coder session is acceptable if the task spec is clear, file responsibilities are explicit, public contracts are defined when needed, and validation is cheap.
1086
+
1087
+ For complex features, cross-module changes, public API changes, schema changes, auth, payment, permissions, data deletion, or security-sensitive work, role sessions are required.
1088
+
1089
+ ## 8. Testing and Validation
1090
+
1091
+ Core principle:
1092
+
1093
+ > Test assets should be rich, and execution should be smart. Run fast, relevant tests during development; run broad, expensive suites before release.
1094
+
1095
+ ### 8.1 Layers
1096
+
1097
+ ```text
1098
+ L0 Fast Checks
1099
+ format, lint, typecheck, architecture boundary, dependency rules
1100
+
1101
+ L1 Focused Unit / Contract Tests
1102
+ changed-file related tests, public function contract tests, regression tests
1103
+
1104
+ L2 Module / Integration Tests
1105
+ module service tests, DB integration, API contract, service/controller integration
1106
+
1107
+ L3 Smoke E2E
1108
+ core user journeys, minimal browser/API smoke flows
1109
+
1110
+ L4 Full Regression / Release Suite
1111
+ complex business combinations, multi-browser, historical replay, visual/accessibility/perf
1112
+ ```
1113
+
1114
+ Time budgets:
1115
+
1116
+ ```text
1117
+ L0 check-fast: <= 60s
1118
+ L1 check-changed: <= 3min
1119
+ L2 check-module: <= 10min
1120
+ L3 smoke-e2e: <= 15min
1121
+ L4 full-regression: nightly / release only
1122
+ ```
1123
+
1124
+ ### 8.2 Commands
1125
+
1126
+ ```text
1127
+ tools/check-fast
1128
+ tools/check-changed
1129
+ tools/check-module <module>
1130
+ tools/check-e2e-smoke [scope]
1131
+ tools/check-e2e-release
1132
+ tools/check-full
1133
+ ```
1134
+
1135
+ What Claude should run:
1136
+
1137
+ ```text
1138
+ docs / comments / small config:
1139
+ L0
1140
+
1141
+ ordinary bug fix:
1142
+ L0 + L1 + regression test
1143
+
1144
+ new or modified public function:
1145
+ L0 + L1 public contract tests
1146
+
1147
+ ordinary feature:
1148
+ L0 + L1 + relevant L2
1149
+
1150
+ module behavior change:
1151
+ L0 + L1 + L2
1152
+
1153
+ user-visible critical path:
1154
+ L0 + L1 + L2 + relevant L3 smoke E2E
1155
+
1156
+ auth / payment / permission / schema / public API:
1157
+ L0 + L1 + L2 + relevant L3
1158
+ L4 before release
1159
+
1160
+ release / major version / high-risk migration:
1161
+ L0 + L1 + L2 + L3 + L4
1162
+ ```
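The table above can be encoded as data so that tooling and humans agree on what to run. This is a sketch under assumptions: the `ChangeKind` labels are hypothetical short names for the categories listed above, and the "L4 before release" requirement for sensitive changes is left to the release step rather than modelled here.

```typescript
type Tier = "L0" | "L1" | "L2" | "L3" | "L4";

// Hypothetical labels for the change categories listed above.
type ChangeKind =
  | "docs"
  | "bug-fix"
  | "public-function"
  | "feature"
  | "module-behavior"
  | "critical-path"
  | "sensitive" // auth / payment / permission / schema / public API
  | "release";

const TIER_ORDER: Tier[] = ["L0", "L1", "L2", "L3", "L4"];

const TIERS_BY_CHANGE: Record<ChangeKind, Tier[]> = {
  docs: ["L0"],
  "bug-fix": ["L0", "L1"],
  "public-function": ["L0", "L1"],
  feature: ["L0", "L1", "L2"],
  "module-behavior": ["L0", "L1", "L2"],
  "critical-path": ["L0", "L1", "L2", "L3"],
  sensitive: ["L0", "L1", "L2", "L3"],
  release: ["L0", "L1", "L2", "L3", "L4"],
};

// Union of the tiers required by every change kind; L0 always runs.
function tiersToRun(kinds: ChangeKind[]): Tier[] {
  return TIER_ORDER.filter(
    (tier) => tier === "L0" || kinds.some((kind) => TIERS_BY_CHANGE[kind].includes(tier)),
  );
}
```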
1163
+
1164
+ ### 8.3 Change-Aware Test Selection
1165
+
1166
+ Do not maintain a manual test map. Generate or verify a test map from source code, test naming conventions, coverage data, build metadata, and CI history.
1167
+
1168
+ ```json
1169
+ {
1170
+ "services/billing/invoice/calculator.ts": {
1171
+ "module": "billing",
1172
+ "unit": ["tests/billing/invoice-calculator.test.ts"],
1173
+ "integration": ["tests/billing/refund-service.test.ts"],
1174
+ "e2eSmoke": ["e2e/smoke/billing-checkout.spec.ts"]
1175
+ }
1176
+ }
1177
+ ```
1178
+
1179
+ The generated artifact should live at:
1180
+
1181
+ ```text
1182
+ .ai/generated/test-map.json
1183
+ ```
1184
+
1185
+ Rules:
1186
+
1187
+ - `.ai/generated/test-map.json` is a derived artifact, not a hand-edited source of truth.
1188
+ - Manual edits to generated test maps are forbidden.
1189
+ - `tools/check-generated-artifacts` fails in CI if the generated map is stale.
1190
+ - If the map cannot be generated reliably, `tools/check-changed` must fall back to code search, LSP, ownership metadata, and conservative module-level tests.
1191
+
1192
+ `tools/check-changed` should:
1193
+
1194
+ ```text
1195
+ git diff
1196
+ -> touched files
1197
+ -> map to modules
1198
+ -> find related unit/contract/regression tests
1199
+ -> run L0 + focused L1
1200
+ -> if public surface changed, suggest L2
1201
+ -> if critical user path changed, suggest L3
1202
+ ```
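The selection step of that flow could be sketched as below. It assumes the test map has the shape shown in the JSON example earlier in this section and that the caller supplies the touched files from `git diff`; `selectChangedTests` and the result field names are hypothetical. Files with no map entry are returned separately so the caller can apply the conservative fallback described in the rules.

```typescript
// Shape of one entry in .ai/generated/test-map.json, as in the example above.
interface TestMapEntry {
  module: string;
  unit: string[];
  integration: string[];
  e2eSmoke: string[];
}

interface ChangedSelection {
  modules: string[];
  unit: string[];
  suggestedIntegration: string[];
  fallbackFiles: string[]; // touched files the map does not know about
}

function selectChangedTests(
  touched: string[],
  testMap: Record<string, TestMapEntry | undefined>,
): ChangedSelection {
  const modules = new Set<string>();
  const unit = new Set<string>();
  const integration = new Set<string>();
  const fallbackFiles: string[] = [];
  for (const file of touched) {
    const entry = testMap[file];
    if (!entry) {
      fallbackFiles.push(file); // caller falls back to search or module-level tests
      continue;
    }
    modules.add(entry.module);
    entry.unit.forEach((t) => unit.add(t));
    entry.integration.forEach((t) => integration.add(t));
  }
  return {
    modules: Array.from(modules).sort(),
    unit: Array.from(unit).sort(),
    suggestedIntegration: Array.from(integration).sort(),
    fallbackFiles,
  };
}
```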
1203
+
1204
+ ### 8.4 E2E Tiers
1205
+
1206
+ ```text
1207
+ e2e/
1208
+ smoke/
1209
+ login.spec.ts
1210
+ checkout-happy-path.spec.ts
1211
+ core-dashboard-load.spec.ts
1212
+
1213
+ regression/
1214
+ coupon-partial-refund.spec.ts
1215
+ permission-edge-cases.spec.ts
1216
+ multi-user-collaboration.spec.ts
1217
+
1218
+ release/
1219
+ cross-browser.spec.ts
1220
+ mobile-responsive.spec.ts
1221
+ upgrade-migration.spec.ts
1222
+ ```
1223
+
1224
+ - Smoke E2E: small, stable, core paths, runnable on every PR or high-risk change.
1225
+ - Release E2E: complex combinations, historical incidents, cross-browser, slower but non-flaky, run before release or nightly.
1226
+
1227
+ Test tags:
1228
+
1229
+ ```text
1230
+ @smoke @regression @release @slow @flaky
1231
+ @billing @auth @risk-high @public-api @contract
1232
+ ```
1233
+
1234
+ ### 8.5 Public Function Test Contract
1235
+
1236
+ Every new or modified public function must have tests covering its contract.
1237
+
1238
+ Minimum:
1239
+
1240
+ ```text
1241
+ ordinary public function:
1242
+ happy path + boundary/failure path
1243
+
1244
+ business-critical public function:
1245
+ happy path + boundary + invalid input + state/permission + side effect + regression
1246
+
1247
+ high-risk public function:
1248
+ table-driven tests where practical
1249
+ contract/integration tests at module boundary
1250
+ replay/golden tests when behavior is complex
1251
+
1252
+ cross-module contract:
1253
+ if a public function is consumed by another module,
1254
+ add a contract test owned by the consumer module
1255
+ to lock in the behavior the consumer actually depends on
1256
+ ```
1257
+
1258
+ Tests should verify:
1259
+
1260
+ ```text
1261
+ input -> output -> side effects -> error behavior -> state changes
1262
+ ```
1263
+
1264
+ Do not only verify:
1265
+
1266
+ ```text
1267
+ mock call order
1268
+ internal helper call counts
1269
+ local implementation steps
1270
+ ```
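As an illustration of a table-driven contract test, the sketch below exercises a made-up public function. `applyDiscount` is hypothetical and exists only to show the shape: each row states an input and the observable outcome (a value or a thrown error), and nothing in the table depends on how the function is implemented internally.

```typescript
// Hypothetical public function, used only to illustrate the test shape.
function applyDiscount(amountCents: number, percent: number): number {
  if (!Number.isInteger(amountCents) || amountCents < 0) {
    throw new RangeError("amountCents must be a non-negative integer");
  }
  if (!Number.isFinite(percent) || percent < 0 || percent > 100) {
    throw new RangeError("percent must be between 0 and 100");
  }
  return amountCents - Math.round((amountCents * percent) / 100);
}

interface DiscountCase {
  name: string;
  amountCents: number;
  percent: number;
  expected: number | "throws";
}

// Happy path, boundaries, and invalid input in one table.
const discountCases: DiscountCase[] = [
  { name: "happy path", amountCents: 1000, percent: 10, expected: 900 },
  { name: "boundary: zero percent", amountCents: 1000, percent: 0, expected: 1000 },
  { name: "boundary: full discount", amountCents: 1000, percent: 100, expected: 0 },
  { name: "rounding", amountCents: 999, percent: 33, expected: 669 },
  { name: "invalid: negative amount", amountCents: -1, percent: 10, expected: "throws" },
  { name: "invalid: percent above 100", amountCents: 1000, percent: 101, expected: "throws" },
];

// Runs every case and returns a description of each failure.
function runDiscountCases(cases: DiscountCase[]): string[] {
  const failures: string[] = [];
  for (const c of cases) {
    let actual: number | "throws";
    try {
      actual = applyDiscount(c.amountCents, c.percent);
    } catch {
      actual = "throws";
    }
    if (actual !== c.expected) {
      failures.push(`${c.name}: expected ${c.expected}, got ${actual}`);
    }
  }
  return failures;
}
```

In a real project the loop would live inside the test runner the project already uses; the table is the part worth keeping.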
1271
+
1272
+ ### 8.6 Test Quality Red Lines
1273
+
1274
+ Forbidden:
1275
+
1276
+ - weakening tests to make implementation pass
1277
+ - deleting failing tests without explanation
1278
+ - testing only mock call order
1279
+ - copying implementation logic into tests
1280
+ - testing only the happy path
1281
+ - large snapshots without clear assertion intent
1282
+ - fragile private-helper tests while missing public contract coverage
1283
+ - marking work complete without running declared validation
1284
+
1285
+ Encouraged:
1286
+
1287
+ - table-driven tests
1288
+ - regression test names that include historical scenarios
1289
+ - comments explaining why complex cases matter
1290
+ - golden / replay tests
1291
+ - integration / contract tests
1292
+
1293
+ Maintenance:
1294
+
1295
+ - flaky tests must have an owner, issue, and isolation strategy
1296
+ - slow tests are tagged `@slow` or moved to the release suite
1297
+ - skipped tests require issue, owner, and expiration
1298
+ - fast and slow tests are maintained separately
1299
+
1300
+ ## 9. Hooks / Skills / Subagents / Commands
1301
+
1302
+ Do not rely on `CLAUDE.md` for constraints that can be automated.
1303
+
1304
+ Recommended hooks:
1305
+
1306
+ ```text
1307
+ PreToolUse:
1308
+ block protected files
1309
+ block destructive commands
1310
+ block unapproved deploy/migration/data deletion
1311
+ block production secrets
1312
+ block writes outside the current role's allowed scope
1313
+ block implementation edits that change architecture/public contracts without Replan
1314
+
1315
+ PostToolUse:
1316
+ format touched files
1317
+ collect touched files
1318
+ run cheap lint
1319
+
1320
+ Stop:
1321
+ check project manager did not bypass required role route
1322
+ check task severity and required role route
1323
+ check required handoff artifacts exist
1324
+ check required validation
1325
+ check task-level validation-log.md updated when handoffs exist
1326
+ check progress updated
1327
+ check docs synced after plan/contract/test changes
1328
+ check no TODO(agent), placeholder, mocked implementation
1329
+ check public functions have contract tests
1330
+ check tests were not weakened
1331
+
1332
+ SessionStart:
1333
+ show that task coordination should use `claude --agent project-manager`
1334
+ warn when a non-trivial task is running in an untagged session
1335
+ show current role and expected role for the task
1336
+ show required handoff artifacts for the task severity
1337
+ inject current task state
1338
+ show recent failing checks
1339
+ show module owner and validation commands
1340
+ ```
1341
+
1342
+ Protected files:
1343
+
1344
+ ```text
1345
+ .env
1346
+ .env.*
1347
+ secrets/
1348
+ vendor/
1349
+ third_party/
1350
+ generated/
1351
+ .ai/generated/
1352
+ package-lock.json
1353
+ pnpm-lock.yaml
1354
+ db/migrations/
1355
+ ```
1356
+
1357
+ Lockfiles and migrations are not permanently forbidden, but they require explicit approval.
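A protected-path check that a `PreToolUse` hook or CI job could call might look like this sketch. It assumes every pattern is anchored at the repository root, that a trailing `/` means "everything under this directory", and that `.env.*` means any file whose name starts with `.env.`; the function name is hypothetical. A match means "requires explicit approval", not "permanently forbidden".

```typescript
// Patterns copied from the protected files list above.
const PROTECTED_PATTERNS = [
  ".env",
  ".env.*",
  "secrets/",
  "vendor/",
  "third_party/",
  "generated/",
  ".ai/generated/",
  "package-lock.json",
  "pnpm-lock.yaml",
  "db/migrations/",
];

// True when the repo-relative path matches a protected pattern.
function isProtectedPath(path: string): boolean {
  const normalized = path.replace(/^\.\//, "");
  return PROTECTED_PATTERNS.some((pattern) => {
    if (pattern.endsWith("/")) return normalized.startsWith(pattern); // directory
    if (pattern.endsWith(".*")) return normalized.startsWith(pattern.slice(0, -1)); // ".env." prefix
    return normalized === pattern; // exact file
  });
}
```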
1358
+
1359
+ If you type the same long prompt for the third time, turn it into a skill or command.
1360
+
1361
+ Recommended subagents:
1362
+
1363
+ ```text
1364
+ codebase-explorer
1365
+ test-failure-triager
1366
+ security-specialist
1367
+ performance-specialist
1368
+ frontend-qa
1369
+ api-contract-reviewer
1370
+ migration-specialist
1371
+ ```
1372
+
1373
+ Review and explorer subagents should default to read-only.
1374
+
1375
+ Role agent sessions are different from subagents. Role sessions own a project phase and should be started with `claude --agent <role>`. Subagents are for bounded side tasks, context isolation, parallel exploration, triage, and independent review.
1376
+
1377
+ ## 10. Git / Worktrees / Review
1378
+
1379
+ Git is part of the harness. It is the audit trail for scope, architecture compliance, validation, and rollback.
1380
+
1381
+ Default rule:
1382
+
1383
+ ```text
1384
+ one task
1385
+ -> one branch
1386
+ -> one worktree
1387
+ -> one handoff directory
1388
+ -> one PR
1389
+ ```
1390
+
1391
+ Architect, coder, and reviewer should normally work in the same task worktree. They should hand off sequentially, not write concurrently.
1392
+
1393
+ Do not split worktrees by role:
1394
+
1395
+ ```text
1396
+ bad:
1397
+ task-login-architect/
1398
+ task-login-coder/
1399
+ task-login-reviewer/
1400
+
1401
+ good:
1402
+ task-login/
1403
+ architect -> coder -> reviewer
1404
+ ```
1405
+
1406
+ Role isolation is enforced by role files, permissions, hooks, handoff artifacts, and review. Worktree isolation is enforced at the task boundary.
1407
+
1408
+ Use separate worktrees only when:
1409
+
1410
+ - two different tasks are active at the same time
1411
+ - a large task has been explicitly split into independent writable sub-tasks
1412
+ - CI repair must proceed without disturbing an active implementation
1413
+ - a read-only investigation needs a clean checkout of a different branch or commit
1414
+
1415
+ Single-writer rule:
1416
+
1417
+ - only one write-capable role should edit a task worktree at a time
1418
+ - read-only review, exploration, or log analysis may run in parallel
1419
+ - reviewer may apply small review-scoped fixes after coder hands off
1420
+ - if two sessions need to edit the same files at the same time, split the work or stop and replan
1421
+
1422
+ Branch rules:
1423
+
1424
+ - never do AI implementation work directly on the main branch
1425
+ - one task branch should map to one task worktree
1426
+ - large work should use phase commits on the same task branch unless phases are independently releasable
1427
+ - if a task becomes too large, split it into child tasks with explicit branch and PR ownership
1428
+
1429
+ Small commits:
1430
+
1431
+ - one commit per phase
1432
+ - commit messages describe behavior changes
1433
+ - each commit should be understandable and revertible
1434
+ - each commit should pass the relevant validation tier when practical
1435
+ - use draft PRs for large changes
1436
+ - do not leave a 2,000-line diff for final review
1437
+
1438
+ Diff discipline:
1439
+
1440
+ - every changed file must trace to the task spec, architecture plan, implementation log, validation log, or reviewer fix
1441
+ - before handoff, coder must inspect `git diff` for unrelated changes, architecture drift, accidental formatting churn, generated artifacts, lockfiles, and migrations
1442
+ - before acceptance, reviewer must compare `git diff` against the architecture plan and public contracts
1443
+ - unrelated cleanup belongs in a separate task
1444
+
1445
+ PR discipline:
1446
+
1447
+ - PR description must link or summarize task spec, architecture plan, validation evidence, docs sync, and known risks
1448
+ - draft PRs are preferred for large or phased work
1449
+ - PR review must check scope, architecture compliance, public contracts, test adequacy, docs sync, and whether the diff is appropriately small
1450
+ - final merge requires human accountability for product semantics, security boundaries, and business risk
1451
+
1452
+ AI review is good at details. Humans remain responsible for:
1453
+
1454
+ - architecture direction
1455
+ - business semantics
1456
+ - security boundaries
1457
+ - product experience
1458
+ - whether the work is worth doing
1459
+ - whether the solution is over-engineered
1460
+
1461
+ ## 11. Large Codebase Rules
1462
+
1463
+ Do not rely only on grep. In large codebases, grep easily finds the wrong symbol, misses affected files, and causes partial completion or tool thrashing.
1464
+
1465
+ Provide:
1466
+
1467
+ ```text
1468
+ tools/ai-context <module>
1469
+ tools/find-owner <path>
1470
+ tools/find-callers <symbol>
1471
+ tools/find-tests <path>
1472
+ tools/check-boundaries
1473
+ ```
1474
+
1475
+ If LSP, Sourcegraph, code search, or MCP is available, Claude should prefer them over plain grep.
1476
+
1477
+ Do not maintain hand-written large-codebase indexes as authoritative context. Indexes drift, and stale indexes mislead agents.
1478
+
1479
+ Generate context artifacts from source-of-truth systems:
1480
+
1481
+ ```text
1482
+ source of truth:
1483
+ codebase
1484
+ package manifests
1485
+ CODEOWNERS / ownership metadata
1486
+ build graph
1487
+ import graph
1488
+ test config
1489
+ coverage / CI metadata
1490
+ LSP / code search
1491
+
1492
+ derived artifacts:
1493
+ .ai/generated/module-index.json
1494
+ .ai/generated/test-map.json
1495
+ .ai/generated/public-surface.json
1496
+ ```
1497
+
1498
+ Example generated module index:
1499
+
1500
+ ```json
1501
+ {
1502
+ "billing": {
1503
+ "owner": "billing-platform",
1504
+ "docs": ["docs/modules/billing.md"],
1505
+ "entrypoints": ["services/billing/invoice/calculator.ts"],
1506
+ "tests": ["tests/billing/invoice-calculator.test.ts"],
1507
+ "commands": ["tools/check-module billing"],
1508
+ "rules": [
1509
+ "Use Money object for all amounts",
1510
+ "Do not import from payment/adapters/internal"
1511
+ ]
1512
+ }
1513
+ }
1514
+ ```
1515
+
1516
+ Rules:
1517
+
1518
+ - generated artifacts are caches, not truth
1519
+ - manual edits to `.ai/generated/**` are forbidden
1520
+ - CI must run `tools/check-generated-artifacts`
1521
+ - if a generated artifact is stale, Claude must regenerate it or fall back to live code search
1522
+ - if generated context conflicts with live code, live code wins
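One way a staleness check such as `tools/check-generated-artifacts` could work is to regenerate the artifact and compare it with the committed copy. The sketch below shows only the comparison and assumes both values are already parsed JSON; `canonicalJson` and `isArtifactStale` are hypothetical names. Object keys are sorted before comparison so that key order alone does not count as drift, while array order still matters.

```typescript
// Serialize with sorted object keys so key order does not cause false "stale" results.
function canonicalJson(value: unknown): string {
  if (Array.isArray(value)) {
    return `[${value.map((item) => canonicalJson(item)).join(",")}]`;
  }
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const keys = Object.keys(obj).sort();
    return `{${keys.map((k) => `${JSON.stringify(k)}:${canonicalJson(obj[k])}`).join(",")}}`;
  }
  return JSON.stringify(value);
}

// True when the committed artifact differs from a fresh regeneration.
function isArtifactStale(committed: unknown, regenerated: unknown): boolean {
  return canonicalJson(committed) !== canonicalJson(regenerated);
}
```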
1523
+
1524
+ Architecture boundaries must be mechanically checked:
1525
+
1526
+ ```text
1527
+ tools/check-boundaries
1528
+ tools/check-generated-artifacts
1529
+ ```
1530
+
1531
+ and enforced in CI.
1532
+
1533
+ ## 12. Long Tasks, Documentation Sync, and Replan
1534
+
1535
+ Long tasks cannot rely on chat context.
1536
+
1537
+ State files:
1538
+
1539
+ ```text
1540
+ .ai/state/
1541
+ progress.md — snapshot of all active tasks' current state
1542
+ decisions.md — architectural / design decisions with rationale (append-only)
1543
+ validation-log.md — recent validation runs across tasks (rolling index, ~last 20)
1544
+ known-issues.md — deferred findings awaiting triage
1545
+ scratch.md — current session's working TODOs (cleared at task completion)
1546
+ ```
1547
+
1548
+ Validation log authority:
1549
+
1550
+ - `.ai/handoffs/<task-slug>/validation-log.md` is authoritative for one task.
1551
+ - `.ai/state/validation-log.md` is a rolling index across tasks and should point to the task-level log when one exists.
1552
+ - Final reports and review reports should cite the task-level validation log, not scattered chat output.
1553
+
1554
+ Information lifetime determines where it lives:
1555
+
1556
+ ```text
1557
+ within one session (phase breakdown, mid-implementation TODOs)
1558
+ -> scratch.md
1559
+
1560
+ across sessions of one task (progress, decisions)
1561
+ -> exec-plan (if task has one) + decisions.md
1562
+ -> otherwise progress.md + decisions.md
1563
+
1564
+ across tasks (deferred findings, out-of-scope discoveries)
1565
+ -> known-issues.md
1566
+ ```
1567
+
1568
+ `progress.md` rules:
1569
+
1570
+ - Snapshot, not log. Holds current status of every active task in one place.
1571
+ - One entry per active task; entry is rewritten in place, not appended.
1572
+ - When a task has an `exec-plan`, its detailed progress lives in the exec-plan's `current state`; `progress.md` keeps only a one-line pointer.
1573
+ - Completed tasks are removed from `progress.md`; their final state is preserved in the archived exec-plan or commit history.
1574
+
1575
+ `scratch.md` rules:
1576
+
1577
+ - Session-local working memory for multi-phase tasks: current phase, intermediate TODOs discovered mid-implementation, temporary notes.
1578
+ - Cleared when the task completes or when a fresh session starts.
1579
+ - Anything that must survive (decisions, deferred findings, progress) is promoted to `decisions.md`, `known-issues.md`, `progress.md`, or the exec-plan before clearing.
1580
+ - This file is the legitimate home for the working TODOs that the `Stop` hook forbids inside source code.
1581
+
1582
+ `known-issues.md` entry format:
1583
+
1584
+ ```md
1585
+ ## YYYY-MM-DD <one-line summary>
1586
+
1587
+ - discovered in: <task / session>
1588
+ - type: bug | doc-drift | dead-code | architecture | security | other
1589
+ - impact: low | medium | high
1590
+ - proposed action: ignore | create task | revisit at next replan
1591
+ ```
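Entries in this format are easy to generate so that every role writes them consistently. The sketch below is one possible formatter; the `KnownIssue` interface and `formatKnownIssue` name are hypothetical, while the allowed values for type, impact, and proposed action are taken from the entry format above.

```typescript
type IssueType = "bug" | "doc-drift" | "dead-code" | "architecture" | "security" | "other";
type IssueImpact = "low" | "medium" | "high";
type IssueAction = "ignore" | "create task" | "revisit at next replan";

interface KnownIssue {
  date: string; // YYYY-MM-DD
  summary: string;
  discoveredIn: string;
  type: IssueType;
  impact: IssueImpact;
  proposedAction: IssueAction;
}

// Renders one known-issues.md entry, ending with a blank line so entries can be appended.
function formatKnownIssue(issue: KnownIssue): string {
  if (!/^\d{4}-\d{2}-\d{2}$/.test(issue.date)) {
    throw new Error(`date must be YYYY-MM-DD, got "${issue.date}"`);
  }
  return [
    `## ${issue.date} ${issue.summary}`,
    "",
    `- discovered in: ${issue.discoveredIn}`,
    `- type: ${issue.type}`,
    `- impact: ${issue.impact}`,
    `- proposed action: ${issue.proposedAction}`,
    "",
  ].join("\n");
}
```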
1592
+
1593
+ Update after each session:
1594
+
1595
+ ```md
1596
+ ## Session Summary
1597
+
1598
+ Date:
1599
+ Task:
1600
+ Files changed:
1601
+ Validation run:
1602
+ Result:
1603
+ Decisions:
1604
+ Open issues:
1605
+ Next step:
1606
+ ```
1607
+
1608
+ For tasks longer than one day, create:
1609
+
1610
+ ```text
1611
+ docs/exec-plans/active/<task-name>.md
1612
+ ```
1613
+
1614
+ Execution plans include:
1615
+
1616
+ - background
1617
+ - goal
1618
+ - phased plan
1619
+ - validation per phase
1620
+ - risks
1621
+ - decision log
1622
+ - current state
1623
+
1624
+ When a task has an exec-plan, `current state` in the exec-plan is the authoritative progress record; `progress.md` only points to it.
1625
+
1626
+ ### 12.1 Documentation Sync Contract
1627
+
1628
+ Changes to plan, architecture, public function contracts, test strategy, or module responsibilities must update the related docs.
1629
+
1630
+ Rule:
1631
+
1632
+ > If implementation differs from the approved plan, the task cannot change code alone; it must also update the plan and explain why.
1633
+
1634
+ Check:
1635
+
1636
+ - task spec
1637
+ - execution plan
1638
+ - `docs/ARCHITECTURE.md`
1639
+ - `docs/MODULE_MAP.md`
1640
+ - module docs
1641
+ - module-local `CLAUDE.md`
1642
+ - public surface contract
1643
+ - test plan / validation section
1644
+ - `.ai/state/decisions.md`
1645
+ - `.ai/state/progress.md`
1646
+ - `.ai/state/known-issues.md`
1647
+
1648
+ Final report must list:
1649
+
1650
+ ```text
1651
+ Docs checked:
1652
+ Docs updated:
1653
+ Known stale docs:
1654
+ ```
1655
+
1656
+ Enforcement:
1657
+
1658
+ - PR template must include a docs sync checklist covering plan, public contract, architecture, module docs, and test plan.
1659
+ - The `Stop` hook checks that plan, public contract, or test strategy changes have matching doc updates before the session ends.
1660
+ - `tools/check-docs-freshness` runs in CI and fails the build when code touching tracked surfaces lands without corresponding doc updates.
1661
+
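The source does not show how `tools/check-docs-freshness` is implemented. One possible shape, sketched under the assumption that each tracked code prefix maps to a single required doc, is shown below; the paths in `TRACKED_SURFACES` and the function name are hypothetical.

```python
# Hypothetical mapping from tracked code prefixes to the doc that must change with them.
TRACKED_SURFACES = {
    "src/api/": "docs/ARCHITECTURE.md",
    "src/billing/": "docs/modules/billing.md",
}


def missing_doc_updates(changed_files, tracked=TRACKED_SURFACES):
    """Return docs that should have changed alongside the given files but did not."""
    changed = set(changed_files)
    missing = set()
    for prefix, doc in tracked.items():
        touches_surface = any(path.startswith(prefix) for path in changed)
        if touches_surface and doc not in changed:
            missing.add(doc)
    return sorted(missing)
```

In CI the list of changed files would typically come from the diff against the base branch, and a non-empty result would fail the build.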
1662
+ ### 12.2 Replan Protocol
1663
+
1664
+ Triggers:
1665
+
1666
+ - planned API, module, or data structure does not exist
1667
+ - existing architecture invalidates plan assumptions
1668
+ - public API, schema, auth, permission, or payment must change
1669
+ - scope must expand
1670
+ - tests show the plan cannot satisfy real behavior
1671
+ - repeated fixes do not converge
1672
+ - a better existing implementation or abstraction is discovered
1673
+ - continuing would violate architecture constraints
1674
+
1675
+ Process:
1676
+
1677
+ ```text
1678
+ Stop
1679
+ -> Explain blocker
1680
+ -> Compare approved plan with code reality
1681
+ -> List options
1682
+ -> Recommend new plan
1683
+ -> Ask approval if scope/risk changed
1684
+ -> Update docs
1685
+ -> Continue implementation
1686
+ ```
1687
+
1688
+ Must pause for approval:
1689
+
1690
+ - scope expands
1691
+ - public API changes
1692
+ - schema changes
1693
+ - auth / permission / payment behavior changes
1694
+ - architecture boundary changes
1695
+ - test contract changes
1696
+ - existing abstraction is deleted or replaced
1697
+ - new dependency is introduced
1698
+ - phased migration is needed
1699
+
1700
+ Low-risk deviations may continue with a note:
1701
+
1702
+ - file or test location differs
1703
+ - existing helper replaces planned helper
1704
+ - private implementation detail changes
1705
+ - scope, public surface, architecture boundary, and test contract stay unchanged
1706
+
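The two lists above amount to a small decision rule. The sketch below encodes them as string labels, which is an assumption made for illustration; treating any unlisted deviation as requiring approval is also an assumption, chosen as the conservative default.

```python
# Labels mirror the "Must pause for approval" list above.
APPROVAL_REQUIRED = frozenset({
    "scope expands",
    "public API changes",
    "schema changes",
    "auth / permission / payment behavior changes",
    "architecture boundary changes",
    "test contract changes",
    "existing abstraction is deleted or replaced",
    "new dependency is introduced",
    "phased migration is needed",
})

# Labels mirror the low-risk list above.
LOW_RISK = frozenset({
    "file or test location differs",
    "existing helper replaces planned helper",
    "private implementation detail changes",
})


def replan_decision(deviations) -> str:
    """Decide how to proceed given the deviations observed in a task."""
    deviations = set(deviations)
    if not deviations:
        return "continue"
    if deviations <= LOW_RISK:
        return "continue with a note"
    # Anything in APPROVAL_REQUIRED, or anything unrecognized, pauses the task.
    return "pause for approval"
```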
1707
+ ### 12.3 Design Change Control
1708
+
1709
+ When a large feature is split into subtasks and a design defect is found midstream, do not default to full rollback, and do not continue because of sunk cost.
1710
+
1711
+ Process:
1712
+
1713
+ ```text
1714
+ Freeze current implementation
1715
+ -> Run current validation
1716
+ -> Record completed subtasks
1717
+ -> Identify design defect
1718
+ -> Assess impact radius
1719
+ -> Classify severity
1720
+ -> Compare options
1721
+ -> Preserve reusable assets
1722
+ -> Discard wrong boundaries/contracts/abstractions
1723
+ -> Update plan and docs
1724
+ -> Continue with approved path
1725
+ ```
1726
+
1727
+ Severity:
1728
+
1729
+ ```text
1730
+ P0 architecture direction is wrong:
1731
+ module boundary, public API, data model, security/permission model
1732
+ => favor rebuild or major rollback
1733
+
1734
+ P1 public contract needs adjustment:
1735
+ public functions, file responsibilities, data flow direction
1736
+ => partial rollback + migration
1737
+
1738
+ P2 internal implementation issue:
1739
+ helper, private function split, test organization
1740
+ => local refactor
1741
+
1742
+ P3 plan detail mismatch:
1743
+ file name, call location, helper replacement
1744
+ => update plan and continue
1745
+ ```
1746
+
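As an illustration of the severity table, the sketch below maps each affected area to its level and returns the most severe level with its default response. The area labels are taken from the table; the function name and the choice to reject unknown areas are assumptions.

```python
SEVERITY_BY_AREA = {
    "module boundary": "P0",
    "public API": "P0",
    "data model": "P0",
    "security/permission model": "P0",
    "public functions": "P1",
    "file responsibilities": "P1",
    "data flow direction": "P1",
    "helper": "P2",
    "private function split": "P2",
    "test organization": "P2",
    "file name": "P3",
    "call location": "P3",
    "helper replacement": "P3",
}

DEFAULT_RESPONSE = {
    "P0": "favor rebuild or major rollback",
    "P1": "partial rollback + migration",
    "P2": "local refactor",
    "P3": "update plan and continue",
}


def classify_defect(affected_areas):
    """Return (severity, default response) for the most severe affected area."""
    areas = list(affected_areas)
    if not areas:
        raise ValueError("at least one affected area is required")
    unknown = [a for a in areas if a not in SEVERITY_BY_AREA]
    if unknown:
        raise ValueError(f"unknown areas: {unknown}")
    # "P0" sorts before "P1" and so on, so min() picks the most severe level.
    worst = min(SEVERITY_BY_AREA[a] for a in areas)
    return worst, DEFAULT_RESPONSE[worst]
```

The returned response is only a starting point; the three options listed next still need to be compared before a path is approved.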
1747
+ Compare three options:
1748
+
1749
+ ```text
1750
+ A. Patch forward
1751
+ B. Partial rollback + redesign
1752
+ C. Full rollback + rebuild
1753
+ ```
1754
+
1755
+ Prefer preserving tests, fixtures, docs, clarified requirements, types, UI components, validated pure functions, and low-level tools.
1756
+ Prefer discarding wrong public APIs, wrong module boundaries, wrong data models, wrong permission models, wrong abstractions, and glue code built around the wrong design.
1757
+
1758
+ Principle:
1759
+
1760
+ > Preserve tests, knowledge, and reusable assets; discard wrong boundaries, wrong contracts, and wrong abstractions. Decide based on future maintenance cost, not lines already written.
1761
+
1762
+ ## 13. AI Code Acceptance
1763
+
1764
+ AI code must satisfy:
1765
+
1766
+ ```text
1767
+ behavior is correct
1768
+ + architecture is compliant
1769
+ + public contract is accurate
1770
+ + tests are sufficient
1771
+ + docs are synced
1772
+ + plan deviations are traceable
1773
+ ```
1774
+
1775
+ ### 13.1 Acceptance Checklist
1776
+
1777
+ ```md
1778
+ # AI Code Acceptance Checklist
1779
+
1780
+ ## Scope
1781
+
1782
+ - [ ] Diff is scoped to the task.
1783
+ - [ ] No unrelated refactor, rename, formatting churn, or cleanup.
1784
+ - [ ] No forbidden files changed.
1785
+ - [ ] No unapproved dependency added.
1786
+ - [ ] No scope expansion without Replan.
1787
+
1788
+ ## Role / Handoff
1789
+
1790
+ - [ ] The task used an explicit `project-manager` role session for user communication, translation, routing, and status reporting.
1791
+ - [ ] Task severity was classified.
1792
+ - [ ] Required role route was followed or an exception was approved.
1793
+ - [ ] The project manager verified required handoff artifacts, validation evidence, docs sync, and remaining risks.
1794
+ - [ ] The project manager did not become the architect, coder, and reviewer for the same non-trivial task.
1795
+ - [ ] The coder session did not own architecture, planning, final testing responsibility, and review by itself.
1796
+ - [ ] Required handoff artifacts exist and match the handoff schemas.
1797
+ - [ ] The coder did not change task scope, module boundaries, public contracts, or test strategy without Replan.
1798
+ - [ ] The reviewer used fresh context, a reviewer role session, or a read-only reviewer subagent.
1799
+ - [ ] Any reviewer direct fixes were small, local, low-risk, and review-scoped.
1800
+ - [ ] Larger implementation issues were returned to coder.
1801
+ - [ ] Architecture, public contract, dependency, schema, auth, permission, payment, or design issues were returned to architect.
1802
+ - [ ] Task-level validation evidence is recorded in `.ai/handoffs/<task-slug>/validation-log.md` when a handoff directory exists.
1803
+
1804
+ ## Architecture
1805
+
1806
+ - [ ] Module boundaries are preserved.
1807
+ - [ ] No forbidden internal imports.
1808
+ - [ ] Business logic stays in the correct layer.
1809
+ - [ ] Existing service/domain/repository APIs are reused where appropriate.
1810
+ - [ ] No duplicate parallel abstraction was introduced.
1811
+ - [ ] `tools/check-boundaries` passes.
1812
+
1813
+ ## Public Contract
1814
+
1815
+ - [ ] Public surface matches the approved plan.
1816
+ - [ ] No unplanned public API was added.
1817
+ - [ ] No public signature changed unexpectedly.
1818
+ - [ ] Inputs, outputs, side effects, and error behavior match the contract.
1819
+ - [ ] Public contract changes are reflected in docs and tests.
1820
+
1821
+ ## Tests
1822
+
1823
+ - [ ] New or modified public functions have contract tests.
1824
+ - [ ] Behavior changes have regression tests.
1825
+ - [ ] Tests cover happy path and boundary/failure path.
1826
+ - [ ] High-risk functions have expanded behavior-matrix coverage.
1827
+ - [ ] Tests were not weakened, deleted, or skipped.
1828
+ - [ ] Tests assert behavior, not just mock call order.
1829
+
1830
+ ## Validation
1831
+
1832
+ - [ ] Required validation commands were run.
1833
+ - [ ] Validation passed.
1834
+ - [ ] Failures were fixed and rerun.
1835
+ - [ ] Relevant L0/L1/L2 checks were run.
1836
+ - [ ] Relevant L3 smoke E2E was run for user-facing critical paths.
1837
+
1838
+ ## Docs
1839
+
1840
+ - [ ] Task spec remains accurate.
1841
+ - [ ] Execution plan is updated if implementation changed.
1842
+ - [ ] Module docs are updated if responsibilities changed.
1843
+ - [ ] Public surface contract is updated if public functions changed.
1844
+ - [ ] Test plan / validation section is updated if testing strategy changed.
1845
+ - [ ] Decisions and known issues are recorded.
1846
+ - [ ] No known stale docs are left behind.
1847
+
1848
+ ## Replan / Design Change
1849
+
1850
+ - [ ] Any deviation from the plan is documented.
1851
+ - [ ] Scope, architecture, public contract, or test contract changes were approved.
1852
+ - [ ] Design defects were handled through Design Change Control.
1853
+ ```
1854
+
1855
+ ### 13.2 Acceptance Flow
1856
+
1857
+ ```text
1858
+ 1. Project manager classifies task severity and required role route
1859
+ 2. Verify required handoff artifacts exist
1860
+ 3. Inspect diff scope
1861
+ 4. Compare diff against task spec and architecture plan
1862
+ 5. Compare public surface against Public Surface Contract
1863
+ 6. Review tests for contract and regression coverage
1864
+ 7. Run or inspect validation evidence
1865
+ 8. Run architecture boundary checks
1866
+ 9. Check docs consistency
1867
+ 10. Run independent review for complex/high-risk changes
1868
+ 11. Project manager verifies process compliance, remaining risks, and next step
1869
+ 12. Approve, request changes, or trigger Replan
1870
+ ```
1871
+
1872
+ Claude’s final report must include:
1873
+
1874
+ ```text
1875
+ Task severity:
1876
+ Project manager decision:
1877
+ Role sessions used:
1878
+ Handoff artifacts:
1879
+ Files changed:
1880
+ Public surface changed:
1881
+ Tests added/updated:
1882
+ Validation run:
1883
+ Architecture checks:
1884
+ Docs updated:
1885
+ Plan deviations:
1886
+ Remaining risks:
1887
+ ```
1888
+
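A final report in this shape can be checked for completeness before acceptance. This is a minimal sketch, assuming each field sits on one `Name: value` line; multi-line values are not handled, and `missing_fields` is a hypothetical helper.

```python
# Field names are copied from the report template above.
REQUIRED_FIELDS = [
    "Task severity",
    "Project manager decision",
    "Role sessions used",
    "Handoff artifacts",
    "Files changed",
    "Public surface changed",
    "Tests added/updated",
    "Validation run",
    "Architecture checks",
    "Docs updated",
    "Plan deviations",
    "Remaining risks",
]


def missing_fields(report: str) -> list[str]:
    """Return required fields that are absent or left empty in the report text."""
    filled = set()
    for line in report.splitlines():
        name, sep, value = line.partition(":")
        if sep and value.strip():
            filled.add(name.strip())
    return [field for field in REQUIRED_FIELDS if field not in filled]
```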
1889
+ Acceptance result:
1890
+
1891
+ ```text
1892
+ Accepted:
1893
+ meets task, architecture, public contract, tests, validation, and docs requirements.
1894
+
1895
+ Needs Changes:
1896
+ clear issue, but no redesign needed.
1897
+
1898
+ Replan Required:
1899
+ scope, architecture, public contract, test contract, or design assumptions changed.
1900
+ ```
1901
+
1902
+ Automation tools:
1903
+
1904
+ ```text
1905
+ tools/check-boundaries
1906
+ tools/check-public-surface
1907
+ tools/check-contract-tests
1908
+ tools/check-docs-freshness
1909
+ tools/check-generated-artifacts
1910
+ tools/check-agent-rules
1911
+ tools/check-changed
1912
+ tools/check-e2e-smoke
1913
+ ```
1914
+
1915
+ ## 14. MCP and Permissions
1916
+
1917
+ MCP is useful for repo-external, frequently changing, tool-accessible context:
1918
+
1919
+ - issue / PR
1920
+ - docs/wiki
1921
+ - logs/metrics/traces
1922
+ - browser / Playwright
1923
+ - database inspection
1924
+ - feature flags
1925
+ - CI logs
1926
+ - code search
1927
+
1928
+ Do not connect every MCP server at the start. Prioritize:
1929
+
1930
+ ```text
1931
+ 1. GitHub / issue / PR
1932
+ 2. browser / Playwright
1933
+ 3. code search
1934
+ 4. CI logs
1935
+ 5. internal docs
1936
+ ```
1937
+
1938
+ Permission principles:
1939
+
1940
+ - prefer read-only
1941
+ - write actions require explicit approval
1942
+ - production data is not available by default
1943
+ - destructive actions require hard gates
1944
+ - third-party MCP servers require source review and version pinning, with an owner recorded for each enabled server
1945
+
1946
+ ## 15. Harness Drift and Evolution
1947
+
1948
+ The harness itself can drift. Rules, role agent files, commands, hooks, generated indexes, and validation scripts are also software and must be maintained with the same skepticism as production code.
1949
+
1950
+ ### 15.1 Generated Context Only
1951
+
1952
+ Manual indexes are not reliable sources of truth in a large codebase.
1953
+
1954
+ Forbidden:
1955
+
1956
+ - hand-maintained module indexes as authoritative context
1957
+ - hand-maintained test maps as authoritative context
1958
+ - stale public-surface maps
1959
+ - generated artifacts edited by hand
1960
+
1961
+ Allowed:
1962
+
1963
+ - generated artifacts created from source code, build graphs, ownership metadata, test configs, coverage, CI, LSP, and code search
1964
+ - checked-in generated artifacts only when CI verifies freshness
1965
+ - fallback to live code search when generated context is missing or stale
1966
+
1967
+ Required check:
1968
+
1969
+ ```text
1970
+ tools/check-generated-artifacts
1971
+ ```
1972
+
1973
+ This check should fail when:
1974
+
1975
+ - `.ai/generated/module-index.json` is stale
1976
+ - `.ai/generated/test-map.json` is stale
1977
+ - `.ai/generated/public-surface.json` is stale
1978
+ - a generated artifact was hand-edited
1979
+ - generated context disagrees with source-of-truth code metadata
1980
+
1981
+ If generated context conflicts with live code, live code wins.
1982
+
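The source does not show how `tools/check-generated-artifacts` works. One common approach, sketched here as an assumption, is to regenerate each artifact in a canonical form and compare it with the checked-in copy; a mismatch covers both staleness and hand edits. The function names are hypothetical.

```python
import json


def render(data) -> str:
    """Serialize generated context in a canonical form so comparisons are stable."""
    return json.dumps(data, sort_keys=True, indent=2) + "\n"


def stale_artifacts(checked_in: dict[str, str], regenerated: dict[str, str]) -> list[str]:
    """Return artifact paths whose checked-in text is missing or differs from a fresh run.

    Both arguments map an artifact path to its file contents.
    """
    return sorted(
        path
        for path, fresh in regenerated.items()
        if checked_in.get(path) != fresh
    )
```

A CI job would read the checked-in files, run the generators, and fail when the returned list is not empty.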
1983
+ ### 15.2 Repeated Rules Need Rule IDs
1984
+
1985
+ Repeating critical rules in root `CLAUDE.md`, module-local `CLAUDE.md`, and role agent files can be useful defense in depth, but untracked duplication causes drift.
1986
+
1987
+ Rules:
1988
+
1989
+ - critical repeated rules must have stable IDs, such as `RULE-SCOPE-001`, `RULE-ARCH-001`, `RULE-TEST-001`, `RULE-PERM-001`
1990
+ - root rule text is canonical unless a different canonical source is explicitly defined
1991
+ - role agent files should reference rule IDs or include generated rule snippets
1992
+ - `tools/check-agent-rules` must fail when repeated rule text, rule IDs, or required rule coverage drift
1993
+ - do not copy long rule blocks manually into many files without a freshness check
1994
+
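One part of what `tools/check-agent-rules` must do is detect rule IDs that role agent files reference but the canonical source does not define. The sketch below covers only that part, assuming IDs follow the `RULE-<AREA>-<NNN>` pattern shown above; it does not compare rule text or required coverage, and the function name is hypothetical.

```python
import re

# Matches IDs such as RULE-SCOPE-001 or RULE-ARCH-001.
RULE_ID = re.compile(r"\bRULE-[A-Z]+-\d{3}\b")


def unknown_rule_ids(canonical_text: str, agent_files: dict[str, str]) -> dict[str, list[str]]:
    """Map each agent file name to the rule IDs it references that the canonical text lacks."""
    known = set(RULE_ID.findall(canonical_text))
    problems = {}
    for name, text in agent_files.items():
        unknown = sorted(set(RULE_ID.findall(text)) - known)
        if unknown:
            problems[name] = unknown
    return problems
```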
1995
+ ### 15.3 Scaffolding Must Earn Its Keep
1996
+
1997
+ The right philosophy is not "more constraints means more reliability." The right philosophy is:
1998
+
1999
+ > Add constraints when they prevent observed failures. Remove constraints when they no longer pay for their maintenance cost.
2000
+
2001
+ Every rule, role, handoff artifact, hook, generated index, validation command, and required checklist item adds cost.
2002
+
2003
+ Monthly review must remove as well as add scaffolding:
2004
+
2005
+ - remove rules that no longer prevent real failures
2006
+ - merge roles that create handoff overhead without reducing risk
2007
+ - replace manual docs or indexes with generated artifacts
2008
+ - relax constraints that block safe cross-file edits by stronger models
2009
+ - promote useful repeated behavior into tests, CI, hooks, or generated checks
2010
+ - delete soft behavioral rules that are not measurable and do not prevent observed failures
2011
+ - delete stale workarounds created for older model limitations
2012
+
2013
+ Do not preserve a rule only because it helped an older model. A rule must justify itself against the current model, current tools, current codebase, and current failure data.
2014
+
2015
+ ### 15.4 Model Evolution Review
2016
+
2017
+ When the model, Claude Code, MCP tooling, code search, test selection, or CI improves, revisit the harness.
2018
+
2019
+ Review questions:
2020
+
2021
+ - Which constraints were added for a weaker model?
2022
+ - Which rules now block safe multi-file reasoning or coordinated edits?
2023
+ - Which role handoffs are producing useful artifacts, and which are ritual?
2024
+ - Which prompts can be replaced by stronger tests, generated checks, or tools?
2025
+ - Which checks are redundant because CI or type systems now cover them?
2026
+ - Which tasks can safely move to a lighter route because failure data improved?
2027
+
2028
+ The goal is a harness that stays strong by staying lean.
2029
+
2030
+ ## 16. Team Governance
2031
+
2032
+ The team needs a Claude Code Harness owner, usually DevEx, a platform team, a staff engineer, or an architecture group.
2033
+
2034
+ Responsibilities:
2035
+
2036
+ - maintain `CLAUDE.md`
2037
+ - maintain `.claude/agents/project-manager.md`
2038
+ - maintain role command templates used by the project manager to brief `architect`, `coder`, `reviewer`, and specialist sessions
2039
+ - maintain hooks
2040
+ - maintain role agents, skills, subagents, and commands
2041
+ - maintain validation commands
2042
+ - maintain generated context artifacts and freshness checks
2043
+ - maintain docs freshness and documentation sync rules
2044
+ - review MCP permissions
2045
+ - clean stale rules, stale docs, stale generated artifacts, and stale scaffolding
2046
+ - collect agent failure modes
2047
+
2048
+ Rule updates:
2049
+
2050
+ - frequent mistakes go into `CLAUDE.md`
2051
+ - high-risk boundaries go into `CLAUDE.md`
2052
+ - information used by every task goes into `CLAUDE.md`
2053
+ - everything else goes into a module-local `CLAUDE.md`, docs, a skill, a command, a hook, or a CI check
2054
+ - every repeated critical rule needs a stable rule ID and freshness check
2055
+ - every generated context artifact needs a source-of-truth generator and CI freshness check
2056
+
2057
+ Monthly review:
2058
+
2059
+ - What mistakes does Claude make most often?
2060
+ - Which task types succeed most often?
2061
+ - Which tasks should be forbidden for automation?
2062
+ - Which rules or docs are stale?
2063
+ - Which rules should be removed because the current model no longer needs them?
2064
+ - Which roles or handoff artifacts add overhead without reducing real failures?
2065
+ - Which manual indexes or docs should become generated artifacts?
2066
+ - Which prompts should become skills?
2067
+ - Which validation commands are too slow?
2068
+ - Which checks should move into hooks?
2069
+ - Which checks, hooks, or role requirements can be simplified?
2070
+
2071
+ `known-issues.md` triage (every monthly review):
2072
+
2073
+ - For each unhandled entry, decide: promote to task, fold into a planned change, or dismiss.
2074
+ - Entries older than 90 days with no action are dismissed with a reason recorded in `decisions.md`.
2075
+ - `known-issues.md` is only useful if it stays small; an ever-growing file means triage is not happening.
2076
+
2077
+ ## 17. Minimum Team Rules
2078
+
2079
+ If you can only enforce 16 rules, enforce these:
2080
+
2081
+ 1. User-facing tasks start with `claude --agent project-manager`; untagged sessions are not implicit project managers.
2082
+ 2. The `project-manager` agent owns user communication, translation, task clarification, and precise role command dispatch.
2083
+ 3. Complex tasks use explicit role sessions and handoff artifacts, and plan first; do not edit directly.
2084
+ 4. One task uses one branch, one worktree, one handoff directory, and one PR by default.
2085
+ 5. Role sessions for the same task work in the same task worktree sequentially; parallel write work uses separate task or sub-task worktrees.
2086
+ 6. One session handles one coherent role and task.
2087
+ 7. Tasks must define scope, non-goals, and validation.
2088
+ 8. Every task must define file-level responsibilities.
2089
+ 9. Ordinary tasks must define public function contracts.
2090
+ 10. New or modified public functions must have contract tests.
2091
+ 11. Code changes must run relevant validation.
2092
+ 12. Architecture, public contract, or test strategy changes must sync docs.
2093
+ 13. Manual indexes are not authoritative; generated context must be freshness-checked.
2094
+ 14. AI review uses fresh context or a reviewer role session.
2095
+ 15. High-risk actions require human approval.
2096
+ 16. Harness rules, roles, checks, and handoffs must be reviewed for removal as well as addition.
2097
+
2098
+ ## 18. Common Anti-Patterns
2099
+
2100
+ ```text
2101
+ Huge CLAUDE.md
2102
+ -> short entry file + docs + module-local rules
2103
+
2104
+ Natural-language-only constraints
2105
+ -> hooks / lint / tests / CI / permissions
2106
+
2107
+ Hand-maintained indexes
2108
+ -> generated artifacts + CI freshness checks
2109
+
2110
+ Copied critical rules in many files
2111
+ -> rule IDs + generated snippets + check-agent-rules
2112
+
2113
+ No validation command
2114
+ -> check-fast / check-changed / check-module
2115
+
2116
+ Too much at once
2117
+ -> phases / incremental commits / draft PR
2118
+
2119
+ One do-everything session
2120
+ -> explicit role sessions + file handoffs
2121
+
2122
+ Project manager becomes coder/reviewer
2123
+ -> project manager coordinates and verifies; role sessions execute
2124
+
2125
+ Project manager only tracks status but sends vague role prompts
2126
+ -> project manager translates user intent into precise role commands with scope, contracts, validation, outputs, and stop conditions
2127
+
2128
+ Role-based worktree fragmentation
2129
+ -> one task worktree; architect -> coder -> reviewer hand off sequentially inside it
2130
+
2131
+ Dynamic subagent routing for the main workflow
2132
+ -> start the session with claude --agent <role>
2133
+
2134
+ Permanent scaffolding for old model limits
2135
+ -> monthly model evolution review + remove stale constraints
2136
+
2137
+ Coder session self-reviews
2138
+ -> reviewer role session / fresh review session / reviewer subagent
2139
+
2140
+ Unbounded multi-agent parallelism
2141
+ -> separate task or sub-task worktrees / ownership / read-write separation
2142
+ ```