vibe-coding-master 0.4.42 → 0.5.1

@@ -1,2465 +0,0 @@
1
- # Claude Code AI Coding Best Practices
2
-
3
- > Archived: this document is kept for historical reference only. As of 2026-06-08, it is no longer maintained or updated. Current VCM-specific practice belongs in `docs/vcm-cc-best-practices.md`.
4
-
5
- Date: 2026-05-22
6
-
7
-
8
- Core principle:
9
-
10
- > AI coding reliability comes from two things: **public contract design** prevents architecture drift, and **public contract tests** prevent behavior drift.
11
-
12
- Reliable loop:
13
-
14
- ```text
15
- task brief
16
- -> file responsibilities / public function contracts
17
- -> small-step implementation
18
- -> layered testing
19
- -> architecture and acceptance checks
20
- -> documentation sync
21
- -> Replan when needed
22
- ```
23
-
24
- ## 1. Basic Principles
25
-
26
- Treat Claude Code as a smart engineer with limited context. It needs clear boundaries, executable feedback, and an acceptance checklist.
27
-
28
- Good tasks for Claude:
29
-
30
- - reproducible, testable bug fixes
31
- - small to medium features with clear boundaries
32
- - implementation that follows existing patterns
33
- - test additions, PR review comment fixes, documentation drafts
34
- - codebase exploration, explanation, onboarding
35
-
36
- Do not hand these directly to Claude:
37
-
38
- - “refactor the whole system”
39
- - “figure it out” tasks for performance, auth, permissions, payments, schema, or data deletion
40
- - complex business changes without a spec, tests, or acceptance criteria
41
-
42
- For high-risk tasks, reduce Claude’s autonomy:
43
-
44
- - auth / permission
45
- - payment / billing
46
- - database schema / migration
47
- - public API / SDK
48
- - protocol / serialization
49
- - data deletion / privacy
50
- - concurrency / distributed consistency
51
- - security-sensitive infrastructure
52
-
53
- These tasks require a plan, public contracts, test contracts, validation commands, and human review.
54
-
55
- Behavioral guardrails:
56
-
57
- - State key assumptions before coding; call out unclear requirements, boundaries, or acceptance criteria.
58
- - When multiple interpretations are reasonable, do not choose silently; explain the difference and tradeoff, and ask for confirmation when needed.
59
- - Prefer the simplest solution that satisfies the task; do not add unrequested features, configuration, extension points, or abstractions.
60
- - Touch only files required by the task; do not clean up, format, or refactor adjacent code opportunistically.
61
- - When the current task replaces a mechanism, remove obsolete code, stale paths, dead branches, legacy adapters, and unused compatibility shims. Do not preserve backward compatibility with stale code unless the user explicitly asks for it.
62
- - Clean up only unused imports, variables, functions, or test leftovers created by the current change.
63
- - Report-but-don't-act: when noticing an issue outside the current task scope, record it in the task-local handoff `known-issues.md`; promote it to `docs/known-issues.md` only if it remains confirmed and useful across tasks.
64
- - Every diff line must trace to the task goal, public contract, test contract, or required documentation sync.
65
- - For multi-step tasks, define the validation check for each step.
66
-
67
- ## 2. Repo Harness Structure
68
-
69
- Recommended structure:
70
-
71
- ```text
72
- repo/
73
- CLAUDE.md
74
- .gitignore
75
-
76
- docs/
77
- ARCHITECTURE.md
78
- MODULE_MAP.md
79
- TESTING.md
80
- SECURITY.md
81
- DEPENDENCY_RULES.md
82
- plans/
83
- active/
84
- completed/
85
-
86
- .claude/
87
- settings.json
88
- skills/
89
- route-message.md
90
- docs-sync.md
91
- long-running-validation.md
92
- agents/
93
- project-manager.md
94
- architect.md
95
- coder.md
96
- reviewer.md
97
- optional/
98
- security-specialist.md
99
- migration-specialist.md
100
- performance-specialist.md
101
- frontend-qa.md
102
- commands/
103
-
104
- .ai/
105
- jobs/ # ignored runtime state for long-running validation jobs
106
- handoffs/
107
- role-commands/
108
- architect.md
109
- coder.md
110
- reviewer.md
111
- architecture-plan.md
112
- implementation-log.md
113
- validation-log.md
114
- known-issues.md
115
- review-report.md
116
- docs-sync-report.md
117
- generated/
118
- module-index.json
119
- test-map.json
120
- public-surface.json
121
-
122
- .ai/tools/
123
- check-fast
124
- check-changed
125
- check-module
126
- check-e2e-smoke
127
- check-boundaries
128
- check-public-surface
129
- check-contract-tests
130
- check-generated-artifacts
131
- check-docs-freshness
132
- check-agent-rules
133
- run-long-check
134
- watch-job
135
- ```
136
-
137
- `.ai/tools/` is the recommended durable harness tool root. It keeps AI validation and discovery entry points near `.ai/generated/` while avoiding collisions with project-owned root `tools/` directories.
138
-
139
- Required large-project baseline:
140
-
141
- For a large project, the harness is not a maturity ladder. Treat the structure above as the baseline before letting Claude Code make non-trivial changes.
142
-
143
- Missing pieces are not an accepted intermediate design. If a legacy project is missing part of the harness, record the gap in `docs/known-issues.md` or an execution plan with owner, risk, and target date. High-risk work must wait until the relevant rules, role agents, docs, and validation commands exist.
144
-
145
- Minimum baseline for non-trivial AI coding:
146
-
147
- - root `CLAUDE.md`
148
- - module-local `CLAUDE.md` for edited modules
149
- - architecture, module map, testing, security, and dependency docs
150
- - role agents for project management/user communication, architecture/planning, coding, and independent review/testing
151
- - task briefs or long-running plans, handoff artifacts, task-local known issues, durable `docs/known-issues.md`, validation logs, and generated context artifacts
152
- - fast, changed-file, module, boundary, public-surface, contract-test, generated-artifact, docs-freshness, and agent-rule checks
153
- - project skills for repeated operations such as long-running validation
154
- - hooks or CI gates for protected files, validation, docs sync, public contracts, and test quality
155
-
156
- ## 3. `CLAUDE.md`
157
-
158
- `CLAUDE.md` is an entry map, not a project encyclopedia.
159
-
160
- Include:
161
-
162
- - one-sentence project description
163
- - repository map
164
- - common build / test / lint / typecheck commands
165
- - documents Claude should read before starting
166
- - module boundaries and forbidden actions
167
- - high-risk areas
168
- - files not to touch
169
- - Definition of Done
170
- - what to do when unsure
171
-
172
- Do not include:
173
-
174
- - long architecture essays
175
- - full descriptions of every file
176
- - complete business rule manuals
177
- - all API docs
178
- - long style guides
179
- - vague rules like “write high-quality code”
180
- - frequently changing task state
181
-
182
- Root template:
183
-
184
- ```md
185
- # CLAUDE.md
186
-
187
- ## Project Map
188
-
189
- - `services/`: production services
190
- - `packages/`: shared libraries
191
- - `apps/`: user-facing applications
192
- - `docs/`: architecture, testing, security, module docs
193
- - `.ai/tools/`: validation and developer utilities
194
-
195
- ## Start Here
196
-
197
- - Read `docs/ARCHITECTURE.md` for system overview.
198
- - Read `docs/MODULE_MAP.md` before choosing files.
199
- - Read module-local `CLAUDE.md` before editing a subdirectory.
200
- - Prefer existing APIs, services, helpers, and patterns before adding abstractions.
201
-
202
- ## Commands
203
-
204
- - Fast validation: `.ai/tools/check-fast`
205
- - Changed files validation: `.ai/tools/check-changed`
206
- - Module validation: `.ai/tools/check-module <module>`
207
-
208
- ## Role Entry Points
209
-
210
- Role-specific behavior lives in `.claude/agents/`.
211
-
212
- - Use `claude --agent project-manager` for user communication, task clarification, task briefs, role routing, role commands, status summaries, and final acceptance.
213
- - Use `claude --agent architect` for architecture plans, module boundaries, file responsibilities, public contracts, test contracts, phase plans, and post-review docs sync / architecture drift checks.
214
- - Use `claude --agent coder` for implementation and direct tests within an approved plan.
215
- - Use `claude --agent reviewer` for independent review, test adequacy, validation evidence, docs gap detection, and acceptance findings.
216
- - Do not use an untagged session as an implicit project manager for non-trivial work.
217
- - Do not simulate another role inside the wrong session.
218
-
219
- ## Role Sessions
220
-
221
- - For complex features, cross-module changes, refactors, public API changes, schema changes, auth, payment, permission, or security-sensitive work, start Claude Code with an explicit role: `claude --agent <role>`.
222
- - Default core roles are `project-manager`, `architect`, `coder`, and `reviewer`.
223
- - The `project-manager` role owns user communication, task routing, role commands, handoff verification, final status reporting, and PR preparation after required gates pass.
224
- - Do not let one coding session own architecture/plan decisions, implementation, final testing responsibility, and review.
225
- - Role outputs are exchanged through a task-local handoff directory, for example `.ai/handoffs/`, not through chat history.
226
- - Role messaging is turn-based. Keep at most one active message to the same target role.
227
- - If the project installs a route-message skill, use it whenever writing or updating role message files.
228
- - Send role messages by writing or updating a fixed route file, for example `.ai/handoffs/messages/<from-role>-<to-role>.md`.
229
- - For a given target role, update the same route file instead of creating multiple fragmented messages.
230
- - After writing or updating a role message file, end the current Claude Code turn. Treat the file write as the final coordination action of that turn.
231
- - Do not poll message files, start shell loops, or keep the turn open waiting for another role's answer. The harness manager should dispatch pending route files after the role turn ends.
232
- - Do not end the current turn only to wait for a long-running shell callback. For long-running builds/tests, use the `long-running-validation` skill and the project long-task wrapper.
233
- - Do not use Claude Code Task/Subagent as the primary delegation mechanism for the main role chain; explicit role sessions own the long-running workflow.
234
- - If new information appears while a role is still processing, update the handoff artifact or wait instead of sending fragmented follow-up messages.
235
- - When the required role route includes `architect`, coding must not start until the architecture and plan artifact exists.
236
- - If the current session was not started with the required role, stop and ask the user to restart with `claude --agent <role>`; do not pretend to be that role inside the wrong session.
237
- - Critical global rules may be repeated in role agent files for defense in depth, but repeated rules must use stable rule IDs and be checked by `.ai/tools/check-agent-rules`. Do not maintain untracked manual copies.
238
-
239
- ## Harness-Managed Blocks
240
-
241
- If a harness manager maintains project rules, those rules must live in repo-local files and be reviewable in Git. Do not inject long-lived collaboration rules into a Claude Code terminal as ordinary input.
242
-
243
- Allowed managed block format:
244
-
245
- ~~~md
246
- <!-- HARNESS:BEGIN version=1 -->
247
- Harness-managed rules.
248
- <!-- HARNESS:END -->
249
- ~~~
250
-
251
- Rules:
252
-
253
- - A harness manager may create missing `CLAUDE.md` and `.claude/agents/{project-manager,architect,coder,reviewer}.md` from recommended defaults.
254
- - If a file already exists, the harness manager may only insert or replace the harness-managed block.
255
- - The harness manager must not overwrite user-authored content outside managed blocks.
256
- - After applying harness changes, the harness manager must report changed files and recommend a user review/commit.
257
- - Session startup should pass environment variables and start the role agent; it should not paste a long harness context into the terminal.
258
-
259
- ## Default Behavior
260
-
261
- - State assumptions before coding; ask when requirements, boundaries, or acceptance criteria are unclear.
262
- - When multiple interpretations are reasonable, do not choose silently; explain the difference and tradeoff, and ask for confirmation when needed.
263
- - Prefer the simplest solution that satisfies the task; do not add speculative features, abstractions, configuration, or flexibility.
264
- - Touch only files required by the task; do not clean up or refactor unrelated code.
265
- - Clean up only unused code created by the current change.
266
- - Report-but-don't-act: record out-of-scope issues in the task-local handoff `known-issues.md`; do not act on them.
267
- - Every changed line must trace to the task goal, public contract, test contract, or required documentation sync.
268
- - For multi-step tasks, define the validation check for each step before implementing it.
269
-
270
- ## Forbidden
271
-
272
- - Do not edit generated, vendor, third-party, lock, or secret files unless explicitly requested.
273
- - Do not introduce dependencies without approval.
274
- - Do not bypass tests, lint, typecheck, auth, permissions, or security checks.
275
- - Public API, schema, auth, payment, and permission changes require explicit plan and approval.
276
- - Do not cross module boundaries through internal imports.
277
-
278
- ## Definition of Done
279
-
280
- - Changed files and meaningful hunks have traceable reasons.
281
- - Required validation passes.
282
- - New or modified public functions have contract tests.
283
- - Behavior changes have regression tests unless impractical.
284
- - Plan, architecture, public contract, test strategy, and module responsibility changes are reflected in docs after post-review architect docs sync.
285
- - Follow-ups are recorded in the task-local handoff `known-issues.md`, `docs/known-issues.md`, or the execution plan.
286
- ```
287
-
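The harness-managed block rules in the template above (insert or replace only the managed block, never overwrite user-authored content) could be implemented roughly as follows. This is a minimal sketch, not the harness manager's actual code: the function name `upsert_managed_block` is hypothetical, the marker strings are copied from the block format shown in the template, and it assumes at most one managed block per file.

```python
import re

BEGIN = "<!-- HARNESS:BEGIN version=1 -->"
END = "<!-- HARNESS:END -->"

# Matches an existing managed block with any version number.
# Assumption: a file contains at most one managed block.
_BLOCK_RE = re.compile(
    r"<!-- HARNESS:BEGIN version=\d+ -->.*?<!-- HARNESS:END -->",
    re.DOTALL,
)


def upsert_managed_block(existing: str, rules: str) -> str:
    """Insert or replace the managed block; leave all other text as it is."""
    block = f"{BEGIN}\n{rules.strip()}\n{END}"
    if _BLOCK_RE.search(existing):
        # Replace only the managed span. A callable replacement keeps
        # backslashes in `rules` from being read as regex escapes.
        return _BLOCK_RE.sub(lambda _m: block, existing, count=1)
    if not existing:
        return block + "\n"
    # No managed block yet: append one after a blank line. Only trailing
    # newlines of the user-authored text are normalized.
    return existing.rstrip("\n") + "\n\n" + block + "\n"
```

Calling the function twice with different rules leaves a single managed block holding the latest rules, while the user-authored text before it is unchanged.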
288
- Large projects must have module-local `CLAUDE.md` files:
289
-
290
- ```text
291
- services/billing/CLAUDE.md
292
- services/auth/CLAUDE.md
293
- apps/web/CLAUDE.md
294
- packages/ui/CLAUDE.md
295
- ```
296
-
297
- Module files should define:
298
-
299
- - module responsibility
300
- - important files
301
- - public entry points
302
- - forbidden dependencies
303
- - test commands
304
- - historical pitfalls
305
- - high-risk behavior
306
-
307
- ## 4. Task Briefs and Planning Granularity
308
-
309
- Every task must define at least **file-level responsibilities**.
310
-
311
- Ordinary PRs, features, and bug fixes must define **public function contracts**.
312
-
313
- Public functions include:
314
-
315
- - exported functions
316
- - public methods
317
- - module APIs
318
- - service / controller / repository public entry points
319
- - route handlers / command handlers
320
- - hooks
321
- - externally used component props
322
-
323
- Planning granularity:
324
-
325
- ```text
326
- exploration / research module level + candidate files
327
- large rewrite / greenfield module level + file responsibilities, refine by phase
328
- ordinary feature file responsibilities + public function contracts
329
- bug fix touched files + affected public behavior
330
- public API / SDK / permissions contract level, interface-level design when needed
331
- small internal change file responsibilities + existing function behavior constraints
332
- ```
333
-
334
- Principles:
335
-
336
- ```text
337
- module boundaries: must be explicit
338
- file responsibilities: must be explicit
339
- public function contracts: required for ordinary tasks
340
- private helpers: depends on risk
341
- function internals: usually not fixed in advance
342
- ```
343
-
344
- Large tasks can start with modules, directories, file responsibilities, data flow, and dependency direction. Before each implementation phase, define the public function contracts involved in that phase.
345
-
346
- Task briefs do not require a dedicated file for every task. For ordinary work, the brief may live in the issue, PR description, role command, or project-manager handoff. For large, multi-phase, or multi-day work, the durable plan lives under `docs/plans/active/<plan-name>.md`.
347
-
348
- ### 4.1 Task Brief / Plan Template
349
-
350
- ```md
351
- # Task Brief / Plan
352
-
353
- ## Goal
354
-
355
- ## Background
356
-
357
- ## Scope
358
-
359
- ## Non-goals
360
-
361
- ## Task Severity
362
-
363
- ## Required Role Route
364
-
365
- ## Handoff Directory
366
-
367
- ## Relevant Files
368
-
369
- ## File Responsibilities
370
-
371
- For every file likely to be edited, define its responsibility.
372
-
373
- ## Public Surface Contract
374
-
375
- For ordinary PRs, feature additions, bug fixes, and high-risk changes, define:
376
-
377
- - public/exported functions or methods
378
- - module APIs
379
- - inputs and outputs
380
- - side effects
381
- - error behavior
382
- - dependency rules
383
- - signatures that must remain unchanged
384
-
385
- For large rewrites or greenfield work, each implementation phase must define public surface before coding.
386
-
387
- ## Test Contract
388
-
389
- For every new or modified public function, define required tests.
390
-
391
- Minimum:
392
- - happy path
393
- - boundary or failure path
394
-
395
- Business-critical functions also cover:
396
- - invalid input
397
- - permission or state constraints
398
- - side effects
399
- - idempotency
400
- - historical regressions
401
-
402
- ## Architecture Constraints
403
-
404
- ## Stop Conditions
405
-
406
- ## Expected Behavior
407
-
408
- ## Validation Commands
409
-
410
- ## Definition of Done
411
-
412
- ## Risks
413
-
414
- ## Questions
415
- ```
416
-
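A brief written against the template above can be checked mechanically before a role session starts. The sketch below is illustrative only: `missing_sections` and `TEMPLATE_SECTIONS` are hypothetical names, and treating every `##` heading of the template as required is an assumption, since the source does not say which sections are optional.

```python
# Headings copied from the Task Brief / Plan template.
# Assumption: every heading is required.
TEMPLATE_SECTIONS = [
    "Goal", "Background", "Scope", "Non-goals", "Task Severity",
    "Required Role Route", "Handoff Directory", "Relevant Files",
    "File Responsibilities", "Public Surface Contract", "Test Contract",
    "Architecture Constraints", "Stop Conditions", "Expected Behavior",
    "Validation Commands", "Definition of Done", "Risks", "Questions",
]


def missing_sections(brief_markdown: str) -> list[str]:
    """Return template headings that do not appear as '## <name>' lines."""
    present = {
        line[3:].strip()
        for line in brief_markdown.splitlines()
        if line.startswith("## ")
    }
    return [name for name in TEMPLATE_SECTIONS if name not in present]
```

A non-empty result could be reported back to the project manager as an unclear-requirements stop rather than silently filled in.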
417
- ### 4.2 Stop Conditions
418
-
419
- Stop and update the plan before editing if:
420
-
421
- - current session role does not match the required role route
422
- - public API change seems necessary
423
- - DB schema change seems necessary
424
- - planned contract duplicates an existing API
425
- - module boundaries make the plan inaccurate
426
- - implementation needs to differ from the approved plan
427
- - related architecture, module, testing, or docs would become stale
428
-
429
- ## 5. Workflows
430
-
431
- ### 5.1 Small Change
432
-
433
- Use for single-file bugs, simple tests, copy, config, or known-pattern changes.
434
-
435
- ```text
436
- prompt
437
- -> edit
438
- -> focused validation
439
- -> review diff
440
- -> commit
441
- ```
442
-
443
- Prompt:
444
-
445
- ```text
446
- Fix the edge case in `src/foo.ts`.
447
- Keep the diff minimal.
448
- Run `pnpm test src/foo.test.ts`.
449
- Report the validation result.
450
- ```
451
-
452
- ### 5.2 Complex Change
453
-
454
- Use for multi-file changes, new features, business rules, or uncertain implementation paths.
455
-
456
- ```text
457
- Explore
458
- -> Plan
459
- -> approval
460
- -> implement phase 1
461
- -> validate
462
- -> review
463
- -> architect docs sync / architecture drift check
464
- -> commit
465
- -> implement next phase
466
- ```
467
-
468
- Exploration must not edit files:
469
-
470
- ```text
471
- Explore the codebase and create an implementation plan.
472
- Do not edit files yet.
473
-
474
- Include:
475
- - relevant files
476
- - proposed changes
477
- - public surface contract
478
- - tests to add/update
479
- - validation commands
480
- - risks
481
- - questions
482
- ```
483
-
484
- ### 5.3 Debug
485
-
486
- ```text
487
- reproduction
488
- -> hypotheses
489
- -> instrumentation
490
- -> reproduce
491
- -> inspect logs
492
- -> targeted fix
493
- -> regression test
494
- ```
495
-
496
- Prompt:
497
-
498
- ```text
499
- Debug this issue. Do not guess a fix yet.
500
-
501
- First:
502
- 1. List plausible hypotheses.
503
- 2. Identify where to inspect.
504
- 3. Propose the smallest validation command.
505
-
506
- Then make a targeted fix and add a regression test.
507
- ```
508
-
509
- ### 5.4 TDD
510
-
511
- Use for bugs, parsers, serializers, validators, calculators, state machines, public API behavior, and functionality with clear input/output.
512
-
513
- ```text
514
- write failing contract test
515
- -> confirm it fails
516
- -> freeze test expectations
517
- -> implement
518
- -> do not weaken test
519
- -> pass focused test
520
- -> run module validation
521
- ```
522
-
523
- ### 5.5 Review
524
-
525
- Review should prioritize:
526
-
527
- - correctness
528
- - security / permission risk
529
- - regressions
530
- - missing tests
531
- - architecture boundary violations
532
- - public contract mismatch
533
-
534
- Use a `reviewer` role session for complex or high-risk tasks. A fresh review session or reviewer subagent is acceptable for smaller scoped changes. Do not let the same session that implemented the change be the only reviewer.
535
-
536
- ## 6. Context Management
537
-
538
- One session should correspond to one coherent task.
539
-
540
- Continue the same session when:
541
-
542
- - still working on the same bug / feature / review comment
543
- - prior exploration context is needed
544
- - fixing a problem Claude just introduced
545
-
546
- Start a new session when:
547
-
548
- - switching tasks
549
- - moving from implementation to independent review
550
- - the current session has read too many unrelated files
551
- - Claude repeats the same mistake
552
- - a phase is completed and committed
553
- - fresh eyes are needed
554
-
555
- Rule of thumb:
556
-
557
- > Continue when the next action depends on previous reasoning. Start fresh when the next action needs independent judgment or when Claude gets confused.
558
-
559
- For large-codebase exploration, use read-only subagents. Keep only findings, file paths, and the plan in the `project-manager` session or the current owning role session.
560
-
561
- Context should include:
562
-
563
- - files to edit
564
- - related tests
565
- - module rules
566
- - failure logs
567
- - architecture boundaries
568
- - concrete examples
569
- - acceptance criteria
570
-
571
- Do not include:
572
-
573
- - many unrelated files
574
- - full old chat histories
575
- - long external docs
576
- - stale design docs
577
- - unrelated CI logs
578
-
579
- ## 7. Role-Based Agent Sessions
580
-
581
- For large projects, the default execution model should be explicit role-based sessions, not dynamic role routing inside one generic Claude conversation.
582
-
583
- The user-facing task should start with a `project-manager` role session. The project manager owns user communication, route-file message preparation, task risk classification, role routing, progress tracking, and process verification. It does not own architecture, coding, and independent review for the same non-trivial task.
584
-
585
- Do not make one generic Claude session own architecture, planning, coding, final testing, and review for non-trivial work. That blurs responsibility and makes acceptance weak.
586
-
587
- ### 7.1 Project Manager Session
588
-
589
- Start the user-facing coordination session explicitly:
590
-
591
- ```bash
592
- claude --agent project-manager
593
- ```
594
-
595
- Project manager responsibilities:
596
-
597
- ```text
598
- communicate with user
599
- -> clarify task
600
- -> turn user intent into a task brief / durable plan
601
- -> classify task risk
602
- -> choose required role route
603
- -> prepare the next role command
604
- -> ensure handoff directory exists when needed
605
- -> start or ask the user to start architect/coder/reviewer/specialist sessions when needed
606
- -> track progress, blockers, validation, docs sync, and Replan
607
- -> verify role outputs and handoff artifacts
608
- -> request post-review architect docs sync when required
609
- -> prepare commit and submit the PR only after review and docs sync gates pass
610
- -> summarize final status and risks to the user
611
- ```
612
-
613
- The project manager is a process owner, not an execution owner.
614
-
615
- It is also the communication bridge between the user and the role agents. The user should not need to know how to write a perfect Claude Code prompt. The project manager owns the conversion from user intent to precise agent instructions.
616
-
617
- A shorter route is a per-task exception, not a baseline default. If the user explicitly approves one, the project manager records the exception and still verifies ownership, validation, and acceptance gates.
618
-
619
- Do not let the project manager:
620
-
621
- - implement complex changes directly
622
- - skip required `architect`, `coder`, or `reviewer` sessions
623
- - approve coder output without independent reviewer evidence
624
- - bypass the required role route for high-risk work
625
- - turn coordination into a do-everything session
626
-
627
- #### 7.1.1 Role Command Contract
628
-
629
- For non-trivial tasks, the project manager must not hand off a vague prompt to the next role. It must prepare a role command that is specific enough for that role to execute without recovering missing process context from chat history.
630
-
631
- A role command must include:
632
-
633
- ```text
634
- role identity
635
- task brief path
636
- required input artifacts
637
- allowed write scope
638
- public surface contract
639
- test contract
640
- stop conditions
641
- validation commands
642
- expected output artifact path
643
- escalation / Replan triggers
644
- ```
645
-
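The required fields listed above lend themselves to a small record type that can report what a command is still missing. This is a sketch under stated assumptions: the class `RoleCommand`, its field names, and the `missing_fields` helper are hypothetical and only mirror the ten items in the list; the source does not prescribe a data format for role commands.

```python
from dataclasses import dataclass, field, fields


@dataclass
class RoleCommand:
    # One attribute per required item in the role command contract.
    role: str = ""
    task_brief_path: str = ""
    input_artifacts: list[str] = field(default_factory=list)
    allowed_write_scope: list[str] = field(default_factory=list)
    public_surface_contract: str = ""
    test_contract: str = ""
    stop_conditions: list[str] = field(default_factory=list)
    validation_commands: list[str] = field(default_factory=list)
    output_artifact_path: str = ""
    replan_triggers: list[str] = field(default_factory=list)

    def missing_fields(self) -> list[str]:
        """Names of fields that are still empty (empty string or list)."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]
```

A command with a non-empty `missing_fields()` result would count as vague under the contract and should not be handed to the next role.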
646
- Role command examples:
647
-
648
- ```text
649
- architect command:
650
- read the task brief, architecture docs, module map, and relevant module-local CLAUDE.md
651
- produce .ai/handoffs/architecture-plan.md
652
- define file responsibilities, public contracts, test contracts, phases, validation, and Replan triggers
653
- do not edit production code
654
-
655
- coder command:
656
- read the task brief and approved architecture-plan.md
657
- implement only the approved phase and allowed files
658
- add or update direct contract/regression tests
659
- update implementation-log.md and validation-log.md
660
- stop if scope, public contract, architecture, or test strategy must change
661
-
662
- reviewer command:
663
- read task brief, architecture-plan.md, implementation-log.md, validation-log.md, and git diff
664
- verify scope, architecture, public contract, tests, validation, and docs gaps
665
- write review-report.md
666
- only apply small, local, low-risk review-scoped fixes
667
-
668
- architect docs-sync command:
669
- read task brief, architecture-plan.md, implementation-log.md, validation-log.md, review-report.md, and git diff
670
- verify whether the final code still matches the approved architecture and public contracts
671
- update architecture/module/testing/security docs when the code change made them stale
672
- write docs-sync-report.md with docs changed, docs intentionally unchanged, and remaining doc risks
673
- stop and request Replan if implementation drift changes architecture, public contracts, dependency direction, schema, auth, permission, payment, or design assumptions
674
- ```
675
-
676
- The project manager may use a prompt compiler or template system to build role commands, but the responsibility stays with the project manager. A role command is an auditable artifact: if a role agent fails because the command was vague, the harness should improve the command template rather than blaming the role agent alone.
677
-
678
- ### 7.2 Session-Wide Role Agents
679
-
680
- Instead of switching roles inside one generic session, start each major phase with an explicit session-wide role:
681
-
682
- ```bash
683
- claude --agent project-manager
684
- claude --agent architect
685
- claude --agent coder
686
- claude --agent reviewer
687
- ```
688
-
689
- For background work:
690
-
691
- ```bash
692
- claude --agent reviewer --bg "Review PR 123 for architecture drift, test gaps, and scope creep"
693
- ```
694
-
695
- The role is selected at session startup. The agent file defines that session's system prompt, tool restrictions, model, stop conditions, and output format. `CLAUDE.md` still provides project rules, but critical safety, architecture, permission, and output constraints must be repeated inside the role agent file.
696
-
697
- If the current session was not started with the required role, stop and ask the user to restart with the correct `claude --agent <role>` command. Do not simulate a different role through a normal prompt.
698
-
699
- ### 7.3 Task Routing
700
-
701
- This is not progressive adoption. The full harness exists by default, and the baseline does not define shortcut routes.
702
-
703
- All user-facing routes begin with `project-manager`. A shorter route requires explicit user approval recorded in the task evidence.
704
-
705
- | Task class | Examples | Required role route |
706
- | --- | --- | --- |
707
- | Normal managed task | bug fix, doc change, config change, feature, ordinary PR | `project-manager` -> `architect` -> `coder` -> `reviewer` -> `architect` docs sync -> PM commit/PR |
708
- | High-risk task | auth, permission, payment, billing, schema, data deletion, public API/SDK, security-sensitive infrastructure | `project-manager` -> `architect` -> relevant specialist if needed -> `coder` -> `reviewer` -> `architect` docs sync -> human approval -> PM commit/PR |
709
- | Large / multi-phase task | new subsystem, major rewrite, migration across many modules | `project-manager` -> `architect`; then repeat `coder` -> `reviewer` -> `architect` docs sync per phase; PM commit/PR at phase or task boundary |
710
-
711
- If classification is unclear, use the stricter route.
712
-
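The routing table above can be read as a lookup with a strict fallback. The sketch below is only an illustration: `route_for`, the task-class strings, and the `phases` parameter are hypothetical, and it assumes that the high-risk route is the "stricter" one for unclear classifications and that a large task commits once at the task boundary.

```python
# Steps repeated for each implementation phase, per the routing table.
PHASE_STEPS = ["coder", "reviewer", "architect docs sync"]


def route_for(task_class: str, phases: int = 1) -> list[str]:
    """Return the ordered role route for a task class."""
    if task_class == "normal":
        return ["project-manager", "architect", *PHASE_STEPS, "PM commit/PR"]
    if task_class == "large":
        # Assumption: one PM commit/PR at the task boundary.
        return [
            "project-manager", "architect",
            *(PHASE_STEPS * phases),
            "PM commit/PR",
        ]
    # "high-risk" and anything unclassified fall through to the route
    # assumed here to be the stricter one.
    return [
        "project-manager", "architect", "specialist (if needed)",
        *PHASE_STEPS, "human approval", "PM commit/PR",
    ]
```

Every route starts with `project-manager`, which matches the rule that all user-facing routes begin there.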
713
- ### 7.4 Required Roles
714
-
715
- Large projects should define these project-level agents:
716
-
717
- ```text
718
- .claude/agents/
719
- project-manager.md
720
- architect.md
721
- coder.md
722
- reviewer.md
723
- optional/
724
- security-specialist.md
725
- migration-specialist.md
726
- performance-specialist.md
727
- frontend-qa.md
728
- ```
729
-
730
- Role responsibilities:
731
-
732
- ```text
733
- project-manager
734
- owns user communication, task clarification, task briefs, role routing, and route-file message preparation
735
- turns user input into an engineering task when needed
736
- summarizes role outputs back to the user
737
- creates and verifies handoff artifacts
738
- tracks progress, blockers, validation, docs sync, and Replan
739
- outputs task briefs, role commands, status summaries, and final acceptance reports
740
- must not own architecture, implementation, and independent review for the same non-trivial task
741
-
742
- architect
743
- owns architecture and plan
744
- defines module boundaries, file responsibilities, public contracts, dependency direction, risk, and phases
745
- owns post-review docs sync and architecture drift checks before PM final acceptance
746
- outputs .ai/handoffs/architecture-plan.md
747
- outputs .ai/handoffs/docs-sync-report.md when a post-review docs sync gate is required
748
- must not implement production code
749
-
750
- coder
751
- owns code changes and baseline tests required to complete the approved task
752
- follows approved architecture-plan.md and task brief
753
- outputs touched files, implementation notes, validation results, and follow-ups
754
- must write/update direct unit, contract, or regression tests needed for the changed behavior
755
- must not change module responsibilities, public contracts, architecture direction, or test strategy without Replan
756
-
757
- reviewer
758
- owns independent acceptance and final test responsibility
759
- checks scope, role compliance, architecture compliance, public contract compliance, docs gaps, validation evidence, and risk
760
- checks, designs, and adds missing tests when needed
761
- may directly apply small, local, low-risk review fixes
762
- owns complex tests, E2E coverage, regression matrix, and release-level validation recommendations
763
- outputs .ai/handoffs/review-report.md
764
- must escalate larger implementation issues to coder
765
- must escalate architecture, public contract, design, or documentation drift issues to architect
766
- ```
767
-
768
- ### 7.5 Role Permission Matrix
769
-
770
- Prompt rules are not enough. Role separation must be backed by tool scope, permission mode, hooks, and review.
771
-
772
- | Role | Suggested tools | Write scope | Must not |
773
- | --- | --- | --- | --- |
774
- | `project-manager` | `Read`, `Grep`, `Glob`, `Bash`, `Edit`, `Write` | task briefs, role commands, handoff metadata, status/progress/known-issues, final reports, PR description | implement non-trivial production code, write durable project docs, approve without reviewer/docs-sync evidence, replace architect/coder/reviewer roles |
775
- | `architect` | `Read`, `Grep`, `Glob`, `Bash`, `Edit`, `Write` | architecture plan, docs sync report, durable plan updates, approved architecture/module/testing/security/dependency docs | edit production code, rewrite tests, expand task scope |
776
- | `coder` | `Read`, `Grep`, `Glob`, `Bash`, `Edit`, `Write` | approved source files, baseline tests, validation log, implementation log | change scope, write durable project docs, change public contracts, module boundaries, or test strategy without Replan |
777
- | `reviewer` | `Read`, `Grep`, `Glob`, `Bash`, `Edit`, `Write` | review report, missing tests/fixtures, validation log, small review-scoped fixes | take over implementation, change architecture/public contracts, approve own implementation, weaken tests |
778
- | `security-specialist` | `Read`, `Grep`, `Glob`, `Bash`, `Edit`, `Write` | security review report and approved security tests | bypass approvals, edit production code without explicit scope |
779
- | `migration-specialist` | `Read`, `Grep`, `Glob`, `Bash`, `Edit`, `Write` | migration plan, migration tests, validation notes | run destructive migrations, change schema without approval |
780
- | `performance-specialist` | `Read`, `Grep`, `Glob`, `Bash`, `Edit`, `Write` | performance report, benchmarks, approved perf tests | change product behavior, hide regressions |
781
-
782
- Recommended permission modes:
783
-
784
- ```text
785
- project-manager: default with write hooks limited to task briefs, role commands, handoff metadata, state files, final reports, and PR description
786
- architect: default with write hooks limited to architecture-plan.md, docs-sync-report.md, durable plan updates, and approved durable docs
787
- coder: default or acceptEdits, but only inside approved scope
788
- reviewer: default with production-code writes blocked except explicitly review-scoped small fixes; test writes allowed
789
- specialist: default with write hooks limited to specialist reports, tests, and approved files
790
- ```
791
-
792
- Tool lists alone cannot enforce path-level ownership. Add hooks or CI checks that reject writes outside each role's allowed scope. If path-scoped enforcement is unavailable, the final review must explicitly inspect role ownership violations.
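A path-scoped check of this kind can be quite small. The following is a minimal sketch, not the harness's actual implementation: the `ROLE_WRITE_SCOPE` table, its glob patterns (`src/*`, `tests/*`, `docs/*`), and the function name are assumptions made for illustration, and the reviewer's small review-scoped production fixes would need an explicit allowance on top of it.

```python
from fnmatch import fnmatchcase

# Hypothetical role -> allowed write patterns. The handoff file names come from
# this document; the src/, tests/, and docs/ layout is an assumption.
ROLE_WRITE_SCOPE = {
    "architect": [
        ".ai/handoffs/architecture-plan.md",
        ".ai/handoffs/docs-sync-report.md",
        "docs/*",
    ],
    "coder": [
        "src/*",
        "tests/*",
        ".ai/handoffs/implementation-log.md",
        ".ai/handoffs/validation-log.md",
    ],
    "reviewer": [
        "tests/*",
        ".ai/handoffs/review-report.md",
        ".ai/handoffs/validation-log.md",
    ],
}


def out_of_scope_writes(role, touched_paths):
    """Return the touched paths that fall outside the role's allowed scope.

    An unknown role has an empty scope, so every path is reported.
    """
    patterns = ROLE_WRITE_SCOPE.get(role, [])
    return [
        path
        for path in touched_paths
        # fnmatchcase's "*" also matches "/", so "src/*" covers nested files.
        if not any(fnmatchcase(path, pattern) for pattern in patterns)
    ]
```

A hook or CI step could call `out_of_scope_writes` with the current role and the touched file list, and reject the change when the result is non-empty.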
793
-
794
- ### 7.6 Handoff Contract
795
-
796
- Role sessions communicate through files, not memory from previous chats.
797
-
798
- Required handoff directory:
799
-
800
- ```text
801
- .ai/handoffs/
802
- role-commands/
803
- architect.md
804
- coder.md
805
- reviewer.md
806
- architecture-plan.md
807
- implementation-log.md
808
- validation-log.md
809
- review-report.md
810
- docs-sync-report.md
811
- ```
812
-
813
- Each role session must start by reading the artifacts it depends on:
814
-
815
- ```text
816
- project-manager
817
- reads: user request, repo entry docs, task state, role outputs
818
- writes: task brief, role commands, progress/status, known issues, final acceptance report
819
-
820
- architect
821
- reads: task request, task brief, ARCHITECTURE.md, MODULE_MAP.md, module-local CLAUDE.md, relevant source/tests
822
- writes: architecture-plan.md
823
-
824
- architect docs sync
825
- reads: task brief, architecture-plan.md, implementation-log.md, validation-log.md, review-report.md, git diff, relevant docs
826
- writes: docs updates when needed, docs-sync-report.md
827
-
828
- coder
829
- reads: task brief, architecture-plan.md, relevant module docs
830
- writes: code, baseline tests, implementation-log.md, validation-log.md
831
-
832
- reviewer
833
- reads: task brief, architecture-plan.md, implementation-log.md, validation-log.md, git diff
834
- writes: review-report.md
835
-
836
- optional specialist
837
- reads: task brief, architecture-plan.md, relevant source/tests
838
- writes: specialist report, approved tests, validation-log.md
839
- ```
840
-
841
- Test responsibility split between coder and reviewer:
842
-
843
- ```text
844
- coder:
845
- writes direct tests required by the code change
846
- runs focused validation
847
-
848
- reviewer:
849
- owns final test adequacy
850
- identifies and adds missing unit/contract/integration tests when needed
851
- owns complex test strategy, E2E smoke/release coverage, and regression matrix
852
- may directly apply small, local, low-risk review fixes
853
- must request coder fixes for larger implementation issues
854
- must request architect review for architecture, public contract, dependency, schema, auth, permission, payment, design, or docs drift issues
855
- must not weaken tests to pass validation
856
- ```
857
-
858
- Reviewer direct fixes must be review-scoped:
859
-
860
- ```text
861
- allowed:
862
- strengthen test assertions
863
- add missing small boundary/regression tests
864
- fix test names, fixtures, or validation documentation
865
- fix obvious typo, import, lint, formatting, or local compile error
866
- fix a small local bug discovered during review
867
-
868
- required conditions:
869
- small and local
870
- low-risk
871
- no public contract change
872
- no architecture change
873
- no new dependency
874
- no schema/migration change
875
- no auth, permission, payment, or data deletion behavior change
876
- no broad production rewrite
877
-
878
- escalate to coder:
879
- business logic needs a medium or large change
880
- multiple production files need coordinated edits
881
- implementation structure needs rework
882
- validation fails because core behavior is wrong
883
- the fix would exceed a small review patch
884
-
885
- escalate to architect:
886
- module boundary is wrong
887
- file responsibilities are wrong
888
- public contract is wrong
889
- dependency direction is wrong
890
- schema, auth, permission, payment, public API, or security design is wrong
891
- architecture, module, testing, security, or dependency docs are stale after implementation
892
- the implementation reveals that the architecture plan is invalid
893
- ```
894
-
895
- For a task with a handoff directory, the task-level `validation-log.md` is the authoritative validation record for that task. Do not maintain a separate rolling validation log as durable truth; completed validation evidence should survive through tests, CI, commits, PR text, or explicitly preserved plans.
896
-
897
- For complex or high-risk work, the next role must not start until the required previous artifact exists and is coherent.
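The existence half of this gate can be checked mechanically. The sketch below is illustrative only: the `REQUIRED_BEFORE` table is derived from the "reads" lists above, while the role key `architect-docs-sync` and the function name are assumptions. It only checks that each artifact exists and is non-empty. Whether an artifact is coherent still needs a human or a role to judge.

```python
from pathlib import Path

# Artifacts each role reads before starting, taken from the "reads" lists above.
# The key "architect-docs-sync" is a hypothetical label for the docs-sync pass.
REQUIRED_BEFORE = {
    "coder": ["architecture-plan.md"],
    "reviewer": [
        "architecture-plan.md",
        "implementation-log.md",
        "validation-log.md",
    ],
    "architect-docs-sync": [
        "architecture-plan.md",
        "implementation-log.md",
        "validation-log.md",
        "review-report.md",
    ],
}


def missing_artifacts(role, handoff_dir):
    """Return required artifact names that are absent or empty for this role."""
    base = Path(handoff_dir)
    missing = []
    for name in REQUIRED_BEFORE.get(role, []):
        path = base / name
        # A zero-length or whitespace-only file counts as missing.
        if not path.is_file() or not path.read_text(encoding="utf-8").strip():
            missing.append(name)
    return missing
```

A session-start check could refuse to begin the role when `missing_artifacts` returns anything.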
898
-
899
- Handoff artifact schemas:
900
-
901
- ```md
902
- # architecture-plan.md
903
-
904
- ## Architecture Summary
905
- ## Task Classification
906
- ## Required Role Route
907
- ## Scaffold Manifest
908
- ## Modules / Files
909
- ## File Responsibilities
910
- ## Public Surface Contract
911
- ## Dependency Direction
912
- ## Data Flow
913
- ## Phases
914
- ## Files Per Phase
915
- ## Validation Per Phase
916
- ## Rollback / Replan Triggers
917
- ## Risks
918
- ## Stop Conditions
919
- ## Docs To Update
920
- ## Approval
921
-
922
- # implementation-log.md
923
-
924
- ## Summary
925
- ## Files Changed
926
- ## Public Surface Changed
927
- ## Tests Added / Updated
928
- ## Validation Run
929
- ## Deviations From Architecture Plan
930
- ## Follow-ups
931
-
932
- # validation-log.md
933
-
934
- ## <timestamp> <command>
935
-
936
- - role:
937
- - commit / diff:
938
- - scope:
939
- - result:
940
- - failures:
941
- - fixes:
942
- - rerun:
943
-
944
- # review-report.md
945
-
946
- ## Summary
947
- ## Role / Handoff Compliance
948
- ## Scope Review
949
- ## Architecture Review
950
- ## Public Contract Review
951
- ## Test Review
952
- ## Missing Tests Added
953
- ## Review Fixes Applied
954
- ## Escalations To Coder / Architect
955
- ## E2E / Regression Recommendation
956
- ## Validation Evidence
957
- ## Docs Gap Review
958
- ## Findings
959
- ## Decision
960
-
961
- # docs-sync-report.md
962
-
963
- ## Summary
964
- ## Architecture Drift Check
965
- ## Docs Updated
966
- ## Docs Reviewed And Left Unchanged
967
- ## Public Contract / Module Boundary Notes
968
- ## Remaining Documentation Risks
969
- ## Decision
970
- ```
971
-
972
- ### 7.7 Role Session vs Subagent
973
-
974
- Role sessions for the same task should normally share the same task worktree and branch.
975
-
976
- Worktree isolation is by task, not by role:
977
-
978
- ```text
979
- one task
980
- -> one branch: feature/<task-slug>
981
- -> one worktree: <task-worktree-root>/<task-slug>
982
- -> one handoff directory
983
- -> architect -> coder -> reviewer in sequence
984
- ```
985
-
986
- Do not create separate worktrees only because the task uses `architect`, `coder`, and `reviewer`. That fragments context, makes diffs harder to audit, and adds merge overhead without improving role separation.
987
-
988
- Role separation comes from:
989
-
990
- - session role
991
- - write permissions
992
- - handoff files
993
- - phase boundaries
994
- - validation and review
995
-
996
- Worktree separation is for different tasks or truly parallel writable sub-tasks.
997
-
998
- Use a role session when:
999
-
1000
- - the phase is the main work, not a side task
1001
- - the role needs sustained interaction with the user
1002
- - the role owns decisions or artifacts
1003
- - the role may run for a long time
1004
- - the role needs clear accountability
1005
-
1006
- Use a subagent when:
1007
-
1008
- - the work is a bounded side task
1009
- - the task produces verbose output that should not pollute the main context
1010
- - the task can return a concise summary
1011
- - the task is read-only exploration, review, triage, or log analysis
1012
- - the task can safely run in parallel
1013
-
1014
- Do not use dynamic subagent routing as the primary workflow for architecture/plan -> coding -> independent review/testing. Use explicit role sessions and file handoffs for that.
1015
-
1016
- ### 7.8 Agent File Contract
1017
-
1018
- Every role agent file should define:
1019
-
1020
- ```md
1021
- ---
1022
- name: architect
1023
- description: Use as a session-wide role for architecture design, task planning, module boundaries, file responsibilities, public contracts, dependency direction, and risk assessment.
1024
- tools: Read, Grep, Glob, Bash, Edit, Write
1025
- disallowedTools: Agent
1026
- permissionMode: default
1027
- model: sonnet
1028
- ---
1029
-
1030
- # Role
1031
-
1032
- You are the architecture and planning role for this project.
1033
-
1034
- # Global Rules To Repeat
1035
-
1036
- - Follow root `CLAUDE.md`, module-local `CLAUDE.md`, and the relevant handoff artifacts.
1037
- - Do not exceed this role's write scope.
1038
- - Stop when scope, architecture, public contract, test strategy, or risk changes.
1039
-
1040
- # Responsibilities
1041
-
1042
- - Define module boundaries.
1043
- - Define file-level responsibilities.
1044
- - Define public function and public API contracts.
1045
- - Identify dependency direction and forbidden imports.
1046
- - Split work into phases.
1047
- - Define validation per phase.
1048
- - Identify architecture risks and stop conditions.
1049
-
1050
- # Required Inputs
1051
-
1052
- - task brief or user request
1053
- - `docs/ARCHITECTURE.md`
1054
- - `docs/MODULE_MAP.md`
1055
- - relevant module-local `CLAUDE.md`
1056
-
1057
- # Outputs
1058
-
1059
- - `.ai/handoffs/architecture-plan.md`
1060
-
1061
- # Do Not
1062
-
1063
- - Do not implement production code.
1064
- - Do not rewrite tests.
1065
- - Do not invent product requirements.
1066
- - Do not bypass module ownership rules.
1067
-
1068
- # Stop Conditions
1069
-
1070
- - Requested behavior is ambiguous.
1071
- - The design requires public API, schema, auth, payment, permission, or security boundary changes without approval.
1072
- - The existing architecture cannot support the requested behavior without Replan.
1073
- ```
1074
-
1075
- Use frontmatter fields such as `tools`, `disallowedTools`, `permissionMode`, `hooks`, `mcpServers`, and `skills` when the role needs stricter tool, permission, or integration boundaries.
1076
-
1077
- Minimum role templates:
1078
-
1079
- ```text
1080
- project-manager.md
1081
- frontmatter:
1082
- tools: Read, Grep, Glob, Bash, Edit, Write
1083
- permissionMode: default
1084
- required inputs:
1085
- user request, repo entry docs, current task state, role outputs
1086
- outputs:
1087
- task brief, role commands, progress/status updates, final acceptance report
1088
- do not:
1089
- implement non-trivial production code, replace architect/coder/reviewer, approve without reviewer evidence
1090
- stop when:
1091
- user intent is ambiguous, role route is unclear, handoff artifacts are missing, or scope/risk changes
1092
-
1093
- architect.md
1094
- frontmatter:
1095
- tools: Read, Grep, Glob, Bash, Edit, Write
1096
- permissionMode: default
1097
- required inputs:
1098
- task brief, ARCHITECTURE.md, MODULE_MAP.md, module-local CLAUDE.md
1099
- outputs:
1100
- architecture-plan.md
1101
- do not:
1102
- implement code, rewrite tests, expand task scope
1103
- stop when:
1104
- public API, schema, auth, permission, payment, or security boundaries need approval
1105
-
1106
- coder.md
1107
- frontmatter:
1108
- tools: Read, Grep, Glob, Bash, Edit, Write
1109
- permissionMode: default
1110
- required inputs:
1111
- task brief, architecture-plan.md
1112
- outputs:
1113
- code, baseline tests, implementation-log.md, validation-log.md
1114
- do not:
1115
- change architecture, public contracts, scope, test strategy, or module responsibilities without Replan
1116
- stop when:
1117
- implementation requires design, contract, dependency, schema, permission, or test-strategy changes
1118
-
1119
- reviewer.md
1120
- frontmatter:
1121
- tools: Read, Grep, Glob, Bash, Edit, Write
1122
- permissionMode: default
1123
- required inputs:
1124
- task brief, architecture-plan.md, implementation-log.md, validation-log.md, git diff
1125
- outputs:
1126
- review-report.md, missing tests/fixtures when needed, review-scoped small fixes, validation-log.md
1127
- do not:
1128
- take over implementation, change architecture/public contracts, weaken tests, lower assertions, delete failing tests, approve own implementation
1129
- stop when:
1130
- handoffs are missing, validation evidence is missing, architecture/test/doc compliance cannot be verified, or the fix is no longer small/local/low-risk
1131
- ```
1132
-
1133
- ### 7.9 Default Workflow
1134
-
1135
- For large features:
1136
-
1137
- ```text
1138
- project-manager session
1139
- -> communicate with user + clarify intent + classify task + route roles + track progress
1140
-
1141
- architect session
1142
- -> architecture-plan.md
1143
-
1144
- coder session
1145
- -> code + baseline tests + implementation-log.md + validation-log.md
1146
-
1147
- reviewer session
1148
- -> review-report.md + missing tests/fixtures if needed + validation-log.md
1149
-
1150
- architect docs-sync session
1151
- -> docs updates if needed + docs-sync-report.md
1152
-
1153
- human approval when required
1154
-
1155
- project-manager session
1156
- -> final acceptance + commit + PR submission
1157
- ```
1158
-
1159
- For small bug fixes or ordinary PRs, one coder session is acceptable if the task brief is clear, file responsibilities are explicit, public contracts are defined when needed, and validation is cheap.
1160
-
1161
- For complex features, cross-module changes, public API changes, schema changes, auth, payment, permissions, data deletion, or security-sensitive work, role sessions are required.
1162
-
1163
- ## 8. Testing and Validation
1164
-
1165
- Core principle:
1166
-
1167
- > Test assets should be rich, and execution should be smart. Run fast, relevant tests during development; run broad, expensive suites before release.
1168
-
1169
- ### 8.1 Layers
1170
-
1171
- ```text
1172
- L0 Fast Checks
1173
- format, lint, typecheck, architecture boundary, dependency rules
1174
-
1175
- L1 Focused Unit / Contract Tests
1176
- changed-file related tests, public function contract tests, regression tests
1177
-
1178
- L2 Module / Integration Tests
1179
- module service tests, DB integration, API contract, service/controller integration
1180
-
1181
- L3 Smoke E2E
1182
- core user journeys, minimal browser/API smoke flows
1183
-
1184
- L4 Full Regression / Release Suite
1185
- complex business combinations, multi-browser, historical replay, visual/accessibility/perf
1186
- ```
1187
-
1188
- Time budgets:
1189
-
1190
- ```text
1191
- L0 check-fast: <= 60s
1192
- L1 check-changed: <= 3min
1193
- L2 check-module: <= 10min
1194
- L3 smoke-e2e: <= 15min
1195
- L4 full-regression: nightly / release only
1196
- ```
1197
-
1198
- ### 8.2 Commands
1199
-
1200
- ```text
1201
- .ai/tools/check-fast
1202
- .ai/tools/check-changed
1203
- .ai/tools/check-module <module>
1204
- .ai/tools/check-e2e-smoke [scope]
1205
- .ai/tools/check-e2e-release
1206
- .ai/tools/check-full
1207
- .ai/tools/run-long-check
1208
- .ai/tools/watch-job
1209
- ```
1210
-
1211
- What Claude should run:
1212
-
1213
- ```text
1214
- docs / comments / small config:
1215
- L0
1216
-
1217
- ordinary bug fix:
1218
- L0 + L1 + regression test
1219
-
1220
- new or modified public function:
1221
- L0 + L1 public contract tests
1222
-
1223
- ordinary feature:
1224
- L0 + L1 + relevant L2
1225
-
1226
- module behavior change:
1227
- L0 + L1 + L2
1228
-
1229
- user-visible critical path:
1230
- L0 + L1 + L2 + relevant L3 smoke E2E
1231
-
1232
- auth / payment / permission / schema / public API:
1233
- L0 + L1 + L2 + relevant L3
1234
- L4 before release
1235
-
1236
- release / major version / high-risk migration:
1237
- L0 + L1 + L2 + L3 + L4
1238
- ```
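The mapping above can be encoded as a lookup table so a tool, rather than a prompt, decides which layers to run. This is a sketch under assumptions: the change-kind labels and the function name are invented for illustration, and falling back to the full suite for an unrecognized kind is a conservative choice made here, not a rule stated in this document.

```python
# Validation layers per change kind, transcribed from the list above.
# The kind labels are hypothetical names chosen for this sketch.
LAYERS_BY_KIND = {
    "docs": {"L0"},
    "bug-fix": {"L0", "L1"},
    "public-function": {"L0", "L1"},
    "feature": {"L0", "L1", "L2"},
    "module-behavior": {"L0", "L1", "L2"},
    "critical-path": {"L0", "L1", "L2", "L3"},
    "high-risk": {"L0", "L1", "L2", "L3"},  # L4 is still required before release
    "release": {"L0", "L1", "L2", "L3", "L4"},
}
ALL_LAYERS = ["L0", "L1", "L2", "L3", "L4"]


def required_layers(kinds):
    """Return the union of layers for the given change kinds, in L0..L4 order."""
    needed = set()
    for kind in kinds:
        # Assumption: an unrecognized kind falls back to the full suite.
        needed |= LAYERS_BY_KIND.get(kind, set(ALL_LAYERS))
    return [layer for layer in ALL_LAYERS if layer in needed]
```

A change that is both a bug fix and touches a critical path would resolve to the stricter of the two rows.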
1239
-
1240
- ### 8.3 Long-Running Validation
1241
-
1242
- Do not end the current Claude Code turn only to wait for a long-running build or test callback. That callback-based waiting model is unreliable: the callback can be delayed, lost, or resumed with stale context.
1243
-
1244
- Use a project skill instead:
1245
-
1246
- ```text
1247
- .claude/skills/long-running-validation.md
1248
- ```
1249
-
1250
- The skill should instruct Claude to:
1251
-
1252
- ```text
1253
- 1. Start the long-running command through `.ai/tools/run-long-check`.
1254
- 2. Write status and logs under `.ai/jobs/<job-id>/` or the harness-managed runtime job directory.
1255
- 3. Run `.ai/tools/watch-job` in the same turn with a bounded timeout.
1256
- 4. Let the watcher exit on success, failure, or timeout.
1257
- 5. Read the final status and relevant log tail.
1258
- 6. Record the command, result, duration, and skipped/follow-up checks in validation-log.md.
1259
- ```
1260
-
1261
- Recommended job files:
1262
-
1263
- ```text
1264
- .ai/jobs/<job-id>/status.json
1265
- .ai/jobs/<job-id>/stdout.log
1266
- .ai/jobs/<job-id>/stderr.log
1267
- ```
1268
-
1269
- `status.json` should include:
1270
-
1271
- ```json
1272
- {
1273
- "jobId": "check-e2e-20260606-001",
1274
- "command": ".ai/tools/check-e2e-smoke --run",
1275
- "status": "running",
1276
- "startedAt": "2026-06-06T00:00:00Z",
1277
- "finishedAt": null,
1278
- "exitCode": null,
1279
- "stdoutPath": ".ai/jobs/check-e2e-20260606-001/stdout.log",
1280
- "stderrPath": ".ai/jobs/check-e2e-20260606-001/stderr.log"
1281
- }
1282
- ```
1283
-
1284
- Rules:
1285
-
1286
- - The watcher must be bounded and exit on success, failure, or timeout.
1287
- - Claude must not hand-write an infinite shell loop.
1288
- - Claude must not rely on ending the conversation to receive a shell-completion callback.
1289
- - Timeout is a result. Record it, summarize the log tail, and route the blocker through the normal handoff / Replan path.
1290
- - Job state under `.ai/jobs/**` or the harness-managed runtime job directory is runtime state. Delete it during task close after useful facts are promoted.
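A bounded watcher along these lines might look as follows. It is a minimal sketch, not the real `.ai/tools/watch-job`: the function name and parameters are assumptions, and it infers completion from `exitCode` (treating `0` as success) because the terminal values of `status` are not specified above.

```python
import json
import time
from pathlib import Path


def watch_job(status_path, timeout_s, poll_s=1.0,
              clock=time.monotonic, sleep=time.sleep):
    """Poll status.json until the job finishes or the timeout elapses.

    Returns "success", "failure", or "timeout". The loop is bounded by
    timeout_s, so it can never run forever.
    """
    deadline = clock() + timeout_s
    while True:
        try:
            status = json.loads(Path(status_path).read_text(encoding="utf-8"))
        except (FileNotFoundError, json.JSONDecodeError):
            # The job may not have written a complete status file yet.
            status = {}
        exit_code = status.get("exitCode")
        if exit_code is not None:
            # Assumption: exit code 0 means success, anything else is failure.
            return "success" if exit_code == 0 else "failure"
        remaining = deadline - clock()
        if remaining <= 0:
            return "timeout"
        # Never sleep past the deadline.
        sleep(min(poll_s, remaining))
```

The `clock` and `sleep` parameters exist only so the loop can be tested without real waiting. A timeout is returned as an ordinary result, which matches the rule that timeout should be recorded rather than retried indefinitely.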
1291
-
1292
- ### 8.4 Change-Aware Test Selection
1293
-
1294
- Do not maintain a manual test map. Generate or verify a test map from source code, test naming conventions, coverage data, build metadata, and CI history.
1295
-
1296
- ```json
1297
- {
1298
- "services/billing/invoice/calculator.ts": {
1299
- "module": "billing",
1300
- "unit": ["tests/billing/invoice-calculator.test.ts"],
1301
- "integration": ["tests/billing/refund-service.test.ts"],
1302
- "e2eSmoke": ["e2e/smoke/billing-checkout.spec.ts"]
1303
- }
1304
- }
1305
- ```
1306
-
1307
- The generated artifact should live at:
1308
-
1309
- ```text
1310
- .ai/generated/test-map.json
1311
- ```
1312
-
1313
- Rules:
1314
-
1315
- - `.ai/generated/test-map.json` is a derived artifact, not a hand-edited source of truth.
1316
- - Manual edits to generated test maps are forbidden.
1317
- - `.ai/tools/check-generated-artifacts` fails in CI if the generated map is stale.
1318
- - If the map cannot be generated reliably, `.ai/tools/check-changed` must fall back to code search, LSP, ownership metadata, and conservative module-level tests.
1319
-
1320
- `.ai/tools/check-changed` should:
1321
-
1322
- ```text
1323
- git diff
1324
- -> touched files
1325
- -> map to modules
1326
- -> find related unit/contract/regression tests
1327
- -> run L0 + focused L1
1328
- -> if public surface changed, suggest L2
1329
- -> if critical user path changed, suggest L3
1330
- ```
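The "touched files -> related tests" step can be sketched against the test-map shape shown earlier. The function name and return shape are assumptions for illustration. The real `.ai/tools/check-changed` would also need to run the selected tests and apply the fallback for unmapped files.

```python
def select_tests(touched_files, test_map):
    """Collect tests for the touched files from a generated test map.

    Returns (selected, unmapped): selected groups test paths by tier without
    duplicates, and unmapped lists touched files that have no map entry.
    """
    selected = {"unit": [], "integration": [], "e2eSmoke": []}
    unmapped = []
    for path in touched_files:
        entry = test_map.get(path)
        if entry is None:
            # No entry: the caller should fall back to conservative
            # module-level tests for this file.
            unmapped.append(path)
            continue
        for tier, tests in selected.items():
            for test in entry.get(tier, []):
                if test not in tests:
                    tests.append(test)
    return selected, unmapped
```

With the example map above, touching `services/billing/invoice/calculator.ts` selects the invoice calculator unit test for L1, while the integration and smoke entries remain available as L2 and L3 suggestions.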
1331
-
1332
- ### 8.5 E2E Tiers
1333
-
1334
- ```text
1335
- e2e/
1336
- smoke/
1337
- login.spec.ts
1338
- checkout-happy-path.spec.ts
1339
- core-dashboard-load.spec.ts
1340
-
1341
- regression/
1342
- coupon-partial-refund.spec.ts
1343
- permission-edge-cases.spec.ts
1344
- multi-user-collaboration.spec.ts
1345
-
1346
- release/
1347
- cross-browser.spec.ts
1348
- mobile-responsive.spec.ts
1349
- upgrade-migration.spec.ts
1350
- ```
1351
-
1352
- - Smoke E2E: small, stable, core paths, runnable on every PR or high-risk change.
1353
- - Release E2E: complex combinations, historical incidents, cross-browser, slower but non-flaky, run before release or nightly.
1354
-
1355
- Test tags:
1356
-
1357
- ```text
1358
- @smoke @regression @release @slow @flaky
1359
- @billing @auth @risk-high @public-api @contract
1360
- ```
1361
-
1362
- ### 8.6 Public Function Test Contract
1363
-
1364
- Every new or modified public function must have tests covering its contract.
1365
-
1366
- Minimum:
1367
-
1368
- ```text
1369
- ordinary public function:
1370
- happy path + boundary/failure path
1371
-
1372
- business-critical public function:
1373
- happy path + boundary + invalid input + state/permission + side effect + regression
1374
-
1375
- high-risk public function:
1376
- table-driven tests where practical
1377
- contract/integration tests at module boundary
1378
- replay/golden tests when behavior is complex
1379
-
1380
- cross-module contract:
1381
- if a public function is consumed by another module,
1382
- add a contract test owned by the consumer module
1383
- to lock in the behavior the consumer actually depends on
1384
- ```
1385
-
1386
- Tests should verify:
1387
-
1388
- ```text
1389
- input -> output -> side effects -> error behavior -> state changes
1390
- ```
1391
-
1392
- Do not only verify:
1393
-
1394
- ```text
1395
- mock call order
1396
- internal helper call counts
1397
- local implementation steps
1398
- ```
1399
-
1400
- ### 8.7 Test Quality Red Lines
1401
-
1402
- Forbidden:
1403
-
1404
- - weakening tests to make implementation pass
1405
- - deleting failing tests without explanation
1406
- - testing only mock call order
1407
- - copying implementation logic into tests
1408
- - testing only the happy path
1409
- - large snapshots without clear assertion intent
1410
- - fragile private-helper tests while missing public contract coverage
1411
- - marking work complete without running declared validation
1412
-
1413
- Encouraged:
1414
-
1415
- - table-driven tests
1416
- - regression test names that include historical scenarios
1417
- - comments explaining why complex cases matter
1418
- - golden / replay tests
1419
- - integration / contract tests
1420
-
1421
- Maintenance:
1422
-
1423
- - flaky tests must have an owner, issue, and isolation strategy
1424
- - slow tests are tagged `@slow` or moved to the release suite
1425
- - skipped tests require issue, owner, and expiration
1426
- - fast and slow tests are maintained separately
1427
-
1428
- ## 9. Hooks / Skills / Subagents / Commands
1429
-
1430
- Do not rely on `CLAUDE.md` for constraints that can be automated.
1431
-
1432
- Recommended hooks:
1433
-
1434
- ```text
1435
- Stop:
1436
- notify the harness manager that a role turn ended
1437
- trigger the harness manager to scan task-local route files for pending messages
1438
-
1439
- PreToolUse:
1440
- block protected files
1441
- block destructive commands
1442
- block unapproved deploy/migration/data deletion
1443
- block production secrets
1444
- block writes outside the current role's allowed scope
1445
- block implementation edits that change architecture/public contracts without Replan
1446
-
1447
- PostToolUse:
1448
- format touched files
1449
- collect touched files
1450
- run cheap lint
1451
-
1452
- Stop:
1453
- switch the role activity state to idle
1454
- check project manager did not bypass required role route
1455
- check task risk and required role route
1456
- check required handoff artifacts exist
1457
- check required validation
1458
- check task-level validation-log.md updated when handoffs exist
1459
- check progress updated
1460
- check docs synced after plan/contract/test changes
1461
- check no TODO(agent), placeholder, or mocked implementation remains
1462
- check public functions have contract tests
1463
- check tests were not weakened
1464
-
1465
- SessionStart:
1466
- show that task coordination should use `claude --agent project-manager`
1467
- warn when a non-trivial task is running in an untagged session
1468
- show current role and expected role for the task
1469
- show required handoff artifacts for the required route
1470
- inject current task state
1471
- show recent failing checks
1472
- show module owner and validation commands
1473
- ```
1474
-
1475
- Protected files:
1476
-
1477
- ```text
1478
- .env
1479
- .env.*
1480
- secrets/
1481
- vendor/
1482
- third_party/
1483
- generated/
1484
- .ai/generated/
1485
- package-lock.json
1486
- pnpm-lock.yaml
1487
- db/migrations/
1488
- ```
1489
-
1490
- Lockfiles and migrations are not permanently forbidden, but they require explicit approval.
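The protected-file rule, including the approval escape hatch, can be expressed as a small predicate that a `PreToolUse` hook could consult. This is a sketch under stated assumptions: entries ending in `/` are treated as directory prefixes from the repository root, other entries are matched against the file name at any depth, and the `approved` parameter is a hypothetical way to pass in explicitly approved paths.

```python
from fnmatch import fnmatchcase

# Entries copied from the protected-files list above.
PROTECTED = [
    ".env", ".env.*", "secrets/", "vendor/", "third_party/",
    "generated/", ".ai/generated/", "package-lock.json",
    "pnpm-lock.yaml", "db/migrations/",
]


def is_protected(path, approved=frozenset()):
    """Return True if a repo-relative path is protected and not approved."""
    rel = path[2:] if path.startswith("./") else path
    if rel in approved:
        # Explicit approval lifts the block for this exact path only.
        return False
    name = rel.rsplit("/", 1)[-1]
    for entry in PROTECTED:
        if entry.endswith("/"):
            # Assumption: directory entries are prefixes from the repo root.
            if rel.startswith(entry):
                return True
        elif fnmatchcase(name, entry):
            # Assumption: file entries match the file name at any depth.
            return True
    return False
```

Passing a lockfile or migration path in `approved` models the explicit-approval case, while everything else on the list stays blocked.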
1491
-
1492
- If you type the same long prompt for the third time, turn it into a skill or command.
1493
-
1494
- Skill placement:
1495
-
1496
- ```text
1497
- CLAUDE.md
1498
- -> short mandatory rules
1499
-
1500
- docs/AI_WORKFLOW.md
1501
- -> role route, gates, handoff protocol, acceptance policy
1502
-
1503
- .claude/agents/*.md
1504
- -> role ownership and stop conditions
1505
-
1506
- .claude/skills/*.md
1507
- -> reusable operating procedures
1508
-
1509
- .ai/tools/*
1510
- -> deterministic execution
1511
- ```
1512
-
1513
- Good skill candidates:
1514
-
1515
- - `route-message`
1516
- - `long-running-validation`
1517
- - `final-acceptance`
1518
- - `docs-sync`
1519
- - `replan`
1520
- - `harness-bootstrap`
1521
- - `harness-maintenance`
1522
- - `known-issues-triage`
1523
- - `task-cleanup`
1524
-
1525
- Hard constraints must not live only in skills. Role boundaries, default routes, high-risk approval rules, protected-file rules, route-file turn rules, and durable-doc ownership must remain in `CLAUDE.md`, `docs/AI_WORKFLOW.md`, role agent files, hooks, or CI checks.
1526
-
1527
- One-off or occasional procedures do not always need to be committed as repo-local skill files. Harness bootstrap and harness maintenance can be injected as temporary session procedures when their main purpose is to guide a single analysis/audit run. Keep deterministic installation, managed-block updates, hook merging, manifest migration, and uninstall logic in tools or backend code.
1528
-
1529
- ### `route-message` Skill
1530
-
1531
- Use this skill when a role needs to hand work, ask a question, report a result, report a blocker, or raise a finding to another role.
1532
-
1533
- Hard rule for `CLAUDE.md` / `docs/AI_WORKFLOW.md`:
1534
-
1535
- ```text
1536
- When sending a role message, use the route-message skill.
1537
- After writing the route file, end the current turn.
1538
- Do not poll, loop, or wait for another role's answer.
1539
- ```
1540
-
1541
- Skill contract:
1542
-
1543
- - write or update exactly one route file under the task-local handoff messages directory
1544
- - keep the filename as the authoritative route
1545
- - use only the allowed message types for the route
1546
- - include artifact references instead of copying long handoff documents
1547
- - update an existing pending route file instead of creating fragmented follow-ups
1548
- - leave backend delivery, history, target-idle checks, and route-file clearing to the harness manager
1549
-
1550
- The skill is an authoring procedure, not a transport. It must not paste directly into another role terminal or bypass the harness manager.
1551
-
1552
- ### `long-running-validation` Skill
1553
-
1554
- Use this skill for builds, test suites, browser/E2E runs, or validation commands that may exceed the normal interactive shell timeout.
1555
-
1556
- Hard rule for `CLAUDE.md` / `docs/AI_WORKFLOW.md`:
1557
-
1558
- ```text
1559
- Do not end the current turn only to wait for a long-running shell callback.
1560
- Use the long-running-validation skill and bounded job watcher.
1561
- ```
1562
-
1563
- Skill contract:
1564
-
1565
- - start the job through `.ai/tools/run-long-check`
1566
- - write job status and logs under `.ai/jobs/<job-id>/` or the harness-managed runtime job directory
1567
- - run `.ai/tools/watch-job` in the same turn
1568
- - watcher exits on success, failure, or timeout
1569
- - summarize the final status and log tail
1570
- - record the result in the task `validation-log.md`
1571
- - route failures or timeouts through the normal handoff / Replan path
1572
-
1573
- Do not implement this by asking Claude to keep an infinite loop open. The loop belongs inside the project tool, with a bounded timeout and clear output.
1574
-
1575
- ### `docs-sync` Skill
1576
-
1577
- Use this skill after implementation and review, before final acceptance or PR preparation, when code changes may affect long-term project documentation.
1578
-
1579
- Hard rule for `CLAUDE.md` / `docs/AI_WORKFLOW.md`:
1580
-
1581
- ```text
1582
- The architecture/documentation owner performs docs sync after review.
1583
- Final acceptance checks the docs-sync result; it does not replace it.
1584
- ```
1585
-
1586
- Skill contract:
1587
-
1588
- - read the task brief or durable plan, architecture plan, implementation evidence, validation evidence, review findings, current diff, and affected durable docs
1589
- - check architecture, module boundaries, public contracts, validation strategy, security assumptions, dependency direction, and durable plan state
1590
- - update long-term docs when implementation changed durable project truth
1591
- - explicitly list docs reviewed and left unchanged
1592
- - write a task-local docs-sync report with decision, evidence, docs updated, docs unchanged, remaining documentation risks, and final-acceptance notes
1593
- - route architecture or contract drift back through the normal Replan path
1594
-
1595
- The skill may edit durable documentation, but it must not edit production code, tests, or generated artifacts unless a project-specific generator/check owns that output.
1596
-
1597
- Recommended subagents:
1598
-
1599
- ```text
1600
- codebase-explorer
1601
- test-failure-triager
1602
- security-specialist
1603
- performance-specialist
1604
- frontend-qa
1605
- api-contract-reviewer
1606
- migration-specialist
1607
- ```
1608
-
1609
- Review and explorer subagents should default to read-only.
1610
-
1611
- Role agent sessions are different from subagents. Role sessions own a project phase and should be started with `claude --agent <role>`. Subagents are for bounded side tasks, context isolation, parallel exploration, triage, and independent review.
1612
-
1613
- ## 10. Git / Worktrees / Review
1614
-
1615
- Git is part of the harness. It is the audit trail for scope, architecture compliance, validation, and rollback.
1616
-
1617
- Default rule:
1618
-
1619
- ```text
1620
- one task
1621
- -> one branch: feature/<task-slug>
1622
- -> one worktree: <task-worktree-root>/<task-slug>
1623
- -> one handoff directory
1624
- -> one PR
1625
- ```
1626
-
1627
- Architect, coder, and reviewer should normally work in the same task worktree. They should hand off sequentially, not write concurrently.
1628
-
1629
- Do not split worktrees by role:
1630
-
1631
- ```text
1632
- bad:
1633
- task-login-architect/
1634
- task-login-coder/
1635
- task-login-reviewer/
1636
-
1637
- good:
1638
- task-login/
1639
- architect -> coder -> reviewer
1640
- ```
1641
-
1642
- Role isolation is enforced by role files, permissions, hooks, handoff artifacts, and review. Worktree isolation is enforced at the task boundary.
1643
-
1644
- Use separate worktrees only when:
1645
-
1646
- - two different tasks are active at the same time
1647
- - a large task has been explicitly split into independent writable sub-tasks
1648
- - CI repair must proceed without disturbing an active implementation
1649
- - a read-only investigation needs a clean checkout of a different branch or commit
1650
-
1651
- Single-writer rule:
1652
-
1653
- - only one write-capable role should edit a task worktree at a time
1654
- - read-only review, exploration, or log analysis may run in parallel
1655
- - reviewer may apply small review-scoped fixes after coder hands off
1656
- - if two sessions need to edit the same files at the same time, split the work or stop and replan
1657
-
1658
- Branch rules:
1659
-
1660
- - never do AI implementation work directly on the main branch
1661
- - one task branch should map to one task worktree
1662
- - harness-managed task branches should use a stable task naming convention, for example `feature/<task-slug>`
1663
- - harness-managed task worktrees should live under a repo-local ignored worktree root
1664
- - `.gitignore` should ignore harness runtime state and repo-local task worktrees
1665
- - a task should not switch to a different branch/worktree after creation; create a new task instead
1666
- - large work should use phase commits on the same task branch unless phases are independently releasable
1667
- - if a task becomes too large, split it into child tasks with explicit branch and PR ownership
1668
-
1669
- Task close rules:
1670
-
1671
- - after task completion, close the task only when the user is ready to delete task-local state
1672
- - task close deletes the task worktree, deletes the task branch, and removes task/session/message/orchestration/handoff metadata
1673
- - task close may stop harness-managed running role sessions, but it must not silently discard uncommitted changes; finish, commit, or preserve anything important before using it
1674
-
1675
- Small commits:
1676
-
1677
- - one commit per phase
1678
- - commit messages describe behavior changes
1679
- - each commit should be understandable and revertible
1680
- - each commit should pass the relevant validation tier when practical
1681
- - use draft PRs for large changes
1682
- - do not leave a 2,000-line diff for final review
1683
-
1684
- Diff discipline:
1685
-
1686
- - every changed file must trace to the task brief, architecture plan, implementation log, validation log, or reviewer fix
1687
- - before handoff, coder must inspect `git diff` for unrelated changes, architecture drift, accidental formatting churn, generated artifacts, lockfiles, and migrations
1688
- - before acceptance, reviewer must compare `git diff` against the architecture plan and public contracts
1689
- - unrelated cleanup belongs in a separate task
1690
-
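The traceability rule above can be partly mechanized. The sketch below assumes the architecture plan can be reduced to a list of path patterns, which is a simplification made for this example. The function name and the pattern syntax are hypothetical.

```python
from fnmatch import fnmatchcase


def unexplained_changes(changed_files, planned_patterns):
    """Return changed paths that match none of the planned path patterns.

    Patterns use fnmatch syntax, where "*" also matches "/" so that
    "services/billing/*" covers nested files.
    """
    return [
        path
        for path in changed_files
        if not any(fnmatchcase(path, pattern) for pattern in planned_patterns)
    ]
```

The list of changed files could come from `git diff --name-only` against the task's base branch. Anything the function returns still needs a human or reviewer decision: explain it, revert it, or move it to a separate task.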
1691
- PR discipline:
1692
-
1693
- - PR description must link or summarize task brief, architecture plan, validation evidence, docs sync, and known risks
1694
- - draft PRs are preferred for large or phased work
1695
- - PR review must check scope, architecture compliance, public contracts, test adequacy, docs sync, and whether the diff is appropriately small
1696
- - final merge requires human accountability for product semantics, security boundaries, and business risk
1697
-
1698
- AI review is good at details. Humans remain responsible for:
1699
-
1700
- - architecture direction
1701
- - business semantics
1702
- - security boundaries
1703
- - product experience
1704
- - whether the work is worth doing
1705
- - whether the solution is over-engineered
1706
-
1707
- ## 11. Large Codebase Rules
1708
-
1709
- Do not rely only on grep. In large codebases, grep easily finds the wrong symbol, misses affected files, and causes partial completion or tool thrashing.
1710
-
1711
- Provide:
1712
-
1713
- ```text
1714
- .ai/tools/find-owner <path>
1715
- .ai/tools/find-callers <symbol>
1716
- .ai/tools/find-tests <path>
1717
- .ai/tools/check-boundaries
1718
- ```
1719
-
1720
- If LSP, Sourcegraph, code search, or MCP is available, Claude should prefer them over plain grep.
1721
-
1722
- Do not maintain hand-written large-codebase indexes as authoritative context. Indexes drift, and stale indexes mislead agents.
1723
-
1724
- Generate context artifacts from source-of-truth systems:
1725
-
1726
- ```text
1727
- source of truth:
1728
- codebase
1729
- package manifests
1730
- CODEOWNERS / ownership metadata
1731
- build graph
1732
- import graph
1733
- test config
1734
- coverage / CI metadata
1735
- LSP / code search
1736
-
1737
- derived artifacts:
1738
- .ai/generated/module-index.json
1739
- .ai/generated/test-map.json
1740
- .ai/generated/public-surface.json
1741
- ```
1742
-
1743
- Example generated module index:
1744
-
1745
- ```json
1746
- {
1747
- "billing": {
1748
- "owner": "billing-platform",
1749
- "docs": ["docs/modules/billing.md"],
1750
- "entrypoints": ["services/billing/invoice/calculator.ts"],
1751
- "tests": ["tests/billing/invoice-calculator.test.ts"],
1752
- "commands": [".ai/tools/check-module billing"],
1753
- "rules": [
1754
- "Use Money object for all amounts",
1755
- "Do not import from payment/adapters/internal"
1756
- ]
1757
- }
1758
- }
1759
- ```
1760
-
1761
- Rules:
1762
-
1763
- - generated artifacts are caches, not truth
1764
- - manual edits to `.ai/generated/**` are forbidden
1765
- - CI must run `.ai/tools/check-generated-artifacts`
1766
- - if a generated artifact is stale, Claude must regenerate it or fall back to live code search
1767
- - if generated context conflicts with live code, live code wins
1768
-
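One way a freshness check such as `.ai/tools/check-generated-artifacts` could work is to regenerate each artifact from the source of truth and compare it with the checked-in copy. The following is only a sketch under that assumption. The function names are hypothetical, and the regeneration step itself is out of scope here.

```python
import hashlib
import json


def canonical_digest(data):
    """Hash JSON-serializable data so that object key order does not matter."""
    payload = json.dumps(data, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def stale_artifacts(checked_in, regenerated):
    """Return the sorted names of artifacts whose checked-in copy differs.

    Both arguments map an artifact name (for example "module-index.json")
    to its parsed JSON content. A missing checked-in copy counts as stale.
    """
    stale = []
    for name, fresh in regenerated.items():
        if name not in checked_in:
            stale.append(name)
        elif canonical_digest(checked_in[name]) != canonical_digest(fresh):
            stale.append(name)
    return sorted(stale)
```

Because the comparison is against freshly regenerated content, a hand edit to a file under `.ai/generated/**` shows up as staleness just like an outdated file does.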
1769
- Architecture boundaries must be mechanically checked:
1770
-
1771
- ```text
1772
- .ai/tools/check-boundaries
1773
- .ai/tools/check-generated-artifacts
1774
- ```
1775
-
1776
- and enforced in CI.
1777
-
1778
- ## 12. Long Tasks, Documentation Sync, and Replan
1779
-
1780
- Long tasks cannot rely on chat context.
1781
-
1782
- Task runtime and durable issue files:
1783
-
1784
- ```text
1785
- .ai/handoffs/
1786
- validation-log.md — task-local validation evidence
1787
- known-issues.md — task-local unresolved findings
1788
-
1789
- docs/known-issues.md — confirmed unresolved issues that must survive across tasks
1790
- ```
1791
-
1792
- Validation log authority:
1793
-
1794
- - The task-level handoff `validation-log.md` is authoritative for one task.
1795
- - Completed validation evidence should not be copied into a separate rolling state file as current truth.
1796
- - Final reports and review reports should cite the task-level validation log, not scattered chat output.
1797
-
1798
- Information lifetime determines where it lives:
1799
-
1800
- ```text
1801
- within one session (phase breakdown, mid-implementation TODOs)
1802
- -> task-local scratch notes or role command updates
1803
-
1804
- across sessions of one task (progress, pending decisions)
1805
- -> durable plan current-state section when the plan should survive
1806
- -> otherwise task-local handoff/runtime state that is deleted at close
1807
-
1808
- durable architecture, testing, security, dependency, or module facts
1809
- -> docs/ARCHITECTURE.md, docs/TESTING.md, docs/SECURITY.md, docs/DEPENDENCY_RULES.md, docs/MODULE_MAP.md, or module-local CLAUDE.md
1810
-
1811
- across tasks (deferred findings, out-of-scope discoveries)
1812
- -> docs/known-issues.md
1813
- ```
1814
-
1815
- Task-local state rules:
1816
-
1817
- - Progress, scratch notes, pending decisions, and validation evidence are temporary task runtime state.
1818
- - Completed task state is deleted by default; archive it only when it still has real future value.
1819
- - Durable architecture truth belongs in `docs/ARCHITECTURE.md`.
1820
- - Durable testing decisions belong in `docs/TESTING.md`.
1821
- - Durable security decisions belong in `docs/SECURITY.md`.
1822
- - Durable dependency or module-boundary decisions belong in `docs/DEPENDENCY_RULES.md`, `docs/MODULE_MAP.md`, or module-local `CLAUDE.md`.
1823
- - Rare historical rationale that should remain discoverable but does not fit the current-state docs may move to an optional ADR file, for example `docs/adr/<id>.md`.
1824
-
1825
- `docs/known-issues.md` entry format:
1826
-
1827
- ```md
1828
- ## YYYY-MM-DD <one-line summary>
1829
-
1830
- - discovered in: <task / session>
1831
- - type: bug | doc-drift | dead-code | architecture | security | other
1832
- - impact: low | medium | high
1833
- - proposed action: ignore | create task | revisit at next replan
1834
- ```
1835
-
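To keep entries in this format consistent, a small helper could validate the enumerated fields and render the Markdown. This is a sketch only. The `KnownIssue` class and its field names are hypothetical, while the allowed values are taken from the format above.

```python
from dataclasses import dataclass
from datetime import date

ISSUE_TYPES = ("bug", "doc-drift", "dead-code", "architecture", "security", "other")
IMPACTS = ("low", "medium", "high")
ACTIONS = ("ignore", "create task", "revisit at next replan")


@dataclass(frozen=True)
class KnownIssue:
    found_on: date
    summary: str
    discovered_in: str
    issue_type: str
    impact: str
    proposed_action: str

    def __post_init__(self):
        # Reject values outside the enumerations used by the entry format.
        for label, value, allowed in (
            ("type", self.issue_type, ISSUE_TYPES),
            ("impact", self.impact, IMPACTS),
            ("proposed action", self.proposed_action, ACTIONS),
        ):
            if value not in allowed:
                raise ValueError(f"{label} must be one of {allowed}, got {value!r}")

    def to_markdown(self):
        """Render the entry in the docs/known-issues.md format."""
        return "\n".join([
            f"## {self.found_on.isoformat()} {self.summary}",
            "",
            f"- discovered in: {self.discovered_in}",
            f"- type: {self.issue_type}",
            f"- impact: {self.impact}",
            f"- proposed action: {self.proposed_action}",
        ])
```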
1836
- Update after each session:
1837
-
1838
- ```md
1839
- ## Session Summary
1840
-
1841
- Date:
1842
- Task:
1843
- Files changed:
1844
- Validation run:
1845
- Result:
1846
- Decisions:
1847
- Open issues:
1848
- Next step:
1849
- ```
1850
-
1851
- For large, multi-phase, or multi-day tasks, create:
1852
-
1853
- ```text
1854
- docs/plans/active/<plan-name>.md
1855
- ```
1856
-
1857
- Durable plans include:
1858
-
1859
- - background
1860
- - goal
1861
- - phased plan
1862
- - validation per phase
1863
- - risks
1864
- - decision log
1865
- - current state
1866
-
1867
- When a task has a durable plan, `current state` in the plan is the authoritative progress record; task runtime state may point to it but should not duplicate it.
1868
-
1869
- ### 12.1 Documentation Sync Contract
1870
-
1871
- Changes to plan, architecture, public function contracts, test strategy, or module responsibilities must update the related docs.
1872
-
1873
- Rule:
1874
-
1875
- > If implementation differs from the approved plan, changing the code alone is not enough; the task must also update the plan and explain why.
1876
-
1877
- Check:
1878
-
1879
- - task brief
1880
- - durable plan
1881
- - `docs/ARCHITECTURE.md`
1882
- - `docs/MODULE_MAP.md`
1883
- - module docs
1884
- - module-local `CLAUDE.md`
1885
- - public surface contract
1886
- - test plan / validation section
1887
- - task-local pending decisions
1888
- - task-local progress state
1889
- - task-local known issues and `docs/known-issues.md`
1890
-
1891
- Final report must list:
1892
-
1893
- ```text
1894
- Docs checked:
1895
- Docs updated:
1896
- Known stale docs:
1897
- ```
1898
-
1899
- Enforcement:
1900
-
1901
- - PR template must include a docs sync checklist covering plan, public contract, architecture, module docs, and test plan.
1902
- - Stop hook checks that plan, public contract, or test strategy changes have matching doc updates before the session ends.
1903
- - `.ai/tools/check-docs-freshness` runs in CI and fails the build when code touching tracked surfaces lands without corresponding doc updates.
1904
-
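The CI check described above could, in its simplest form, map tracked code surfaces to the docs that describe them and flag any change set that touches a surface without touching its doc. The sketch below assumes such a mapping exists. The mapping format and the function name are hypothetical and do not describe the actual `.ai/tools/check-docs-freshness`.

```python
from fnmatch import fnmatchcase


def missing_doc_updates(changed_files, tracked_surfaces):
    """Return docs that should have changed alongside the code but did not.

    tracked_surfaces maps a path pattern (fnmatch syntax, "*" also matches
    "/") to the doc file that must be updated when matching code changes.
    """
    changed = set(changed_files)
    missing = set()
    for pattern, doc in tracked_surfaces.items():
        surface_touched = any(fnmatchcase(path, pattern) for path in changed)
        if surface_touched and doc not in changed:
            missing.add(doc)
    return sorted(missing)
```

A non-empty result would fail the build. A check like this only proves that the doc was edited, not that the edit is correct, so the PR checklist and review still matter.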
1905
- ### 12.2 Documentation Lifecycle and Cleanup
1906
-
1907
- The rule is:
1908
-
1909
- > Temporary task documents must be deleted or archived when they stop being useful; long-term project documents must be updated when the task changes durable project truth.
1910
-
1911
- AI coding creates many coordination artifacts. Those artifacts are useful while a task is moving, but harmful after completion if they become stale pseudo-documentation. A clean repository should preserve durable knowledge, not every intermediate note.
1912
-
1913
- Documentation lifetime categories:
1914
-
1915
- ```text
1916
- runtime / disposable
1917
- -> route messages, raw logs, translation cache, session records, orchestration records
1918
-
1919
- task-local / temporary
1920
- -> task briefs, role commands, architecture-plan.md, implementation-log.md,
1921
- validation-log.md, known-issues.md, review-report.md, docs-sync-report.md,
1922
- scratch notes and decision staging entries
1923
-
1924
- durable / source of truth
1925
- -> ARCHITECTURE.md, MODULE_MAP.md, TESTING.md, SECURITY.md,
1926
- DEPENDENCY_RULES.md, module-local CLAUDE.md, accepted public contracts,
1927
- code, tests, commit messages, PR text
1928
-
1929
- archival / exceptional
1930
- -> completed plans or optional ADRs for large features, migrations,
1931
- incidents, or high-value historical rationale
1932
- ```
1933
-
1934
- Cleanup rules:
1935
-
1936
- - Harness runtime artifacts are not long-term project documentation. They should be removed by task close after the user has preserved anything important in durable docs, source, commit messages, or PR text.
1937
- - Role route files under the task-local handoff messages directory are pending message queues. After dispatch, manual handling, or task close, they should be blank or deleted with the task state.
1938
- - Raw terminal logs are recovery/debug evidence. They should not be retained as project knowledge after task close unless an incident review explicitly needs them.
1939
- - Task briefs captured in role commands, handoffs, issues, or PR text are temporary unless the requirements remain useful as durable product or engineering knowledge. Large, multi-phase work should promote durable requirements into `docs/plans/active/<plan-name>.md`.
1940
- - `architecture-plan.md` is a task handoff artifact. Durable architecture changes must be promoted to `docs/ARCHITECTURE.md`, `docs/MODULE_MAP.md`, `docs/DEPENDENCY_RULES.md`, or module-local `CLAUDE.md`.
1941
- - `implementation-log.md`, `review-report.md`, and `docs-sync-report.md` are task evidence. They should not become long-term docs unless their findings are promoted to durable docs or PR text.
1942
- - `validation-log.md` is useful evidence during acceptance. After merge, the durable facts are the passing tests, CI history, and PR/commit record; stale local validation logs should not be treated as current truth.
1943
- - Pending decisions are task-local staging notes. Confirmed durable decisions should be moved into the appropriate long-term doc and then removed from task state.
1944
- - `docs/known-issues.md` must be actively triaged. Fixed, rejected, obsolete, or no-longer-actionable issues should be removed or marked resolved with a short reason.
1945
- - Completed plans should move to `docs/plans/completed/` only for work whose execution history remains useful. Routine tasks should not leave completed plans behind forever.
1946
-
1947
- Promotion rules:
1948
-
1949
- - Durable architecture facts go to `docs/ARCHITECTURE.md`.
1950
- - Durable module ownership, file responsibility, or public surface facts go to `docs/MODULE_MAP.md` or module-local `CLAUDE.md`.
1951
- - Durable testing policy, regression strategy, or E2E scope goes to `docs/TESTING.md`.
1952
- - Durable security, auth, permission, privacy, or data-deletion policy goes to `docs/SECURITY.md`.
1953
- - Durable dependency direction or forbidden-import rules go to `docs/DEPENDENCY_RULES.md`.
1954
- - Durable public behavior belongs in code, tests, public contract docs, and PR text.
1955
- - Deferred out-of-scope work goes to `docs/known-issues.md` only if it has owner-worthy future value; otherwise leave it out.
1956
-
1957
- Task close checklist:
1958
-
1959
- ```text
1960
- Before closing a task:
1961
- 1. Promote durable facts into long-term docs, tests, source, PR text, or commit messages.
1962
- 2. Remove or clear temporary route messages, scratch notes, and staging decisions.
1963
- 3. Decide whether task brief and plan should be deleted, archived, or kept active.
1964
- 4. Triage known-issues entries touched by the task.
1965
- 5. Confirm no stale task artifact is being used as current project truth.
1966
- ```
1967
-
1968
- Final acceptance must state:
1969
-
1970
- ```text
1971
- Temporary docs removed:
1972
- Temporary docs archived:
1973
- Durable docs updated:
1974
- Known stale docs:
1975
- Cleanup exceptions:
1976
- ```
1977
-
1978
- Do not keep a document only because it was useful during the task. A document earns long-term residence only if it helps future humans or future AI sessions understand current project truth, durable rationale, or a still-open obligation.
1979
-
1980
- ### 12.3 Replan Protocol
1981
-
1982
- Triggers:
1983
-
1984
- - planned API, module, or data structure does not exist
1985
- - existing architecture invalidates plan assumptions
1986
- - public API, schema, auth, permission, or payment must change
1987
- - scope must expand
1988
- - tests show the plan cannot satisfy real behavior
1989
- - repeated fixes do not converge
1990
- - a better existing implementation or abstraction is discovered
1991
- - continuing would violate architecture constraints
1992
-
1993
- Process:
1994
-
1995
- ```text
1996
- Stop
1997
- -> Explain blocker
1998
- -> Compare approved plan with code reality
1999
- -> List options
2000
- -> Recommend new plan
2001
- -> Ask approval if scope/risk changed
2002
- -> Update docs
2003
- -> Continue implementation
2004
- ```
2005
-
2006
- Must pause for approval:
2007
-
2008
- - scope expands
2009
- - public API changes
2010
- - schema changes
2011
- - auth / permission / payment behavior changes
2012
- - architecture boundary changes
2013
- - test contract changes
2014
- - existing abstraction is deleted or replaced
2015
- - new dependency is introduced
2016
- - phased migration is needed
2017
-
2018
- Low-risk deviations may continue with a note:
2019
-
2020
- - file or test location differs
2021
- - existing helper replaces planned helper
2022
- - private implementation detail changes
2023
- - scope, public surface, architecture boundary, and test contract stay unchanged
2024
-
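The two lists above amount to a small decision rule, which could be sketched as follows. The label strings are invented for this example; only the categories come from the lists. Rejecting unknown labels is a conservative choice made here, not something the protocol states.

```python
# Hypothetical labels for the "must pause for approval" list.
APPROVAL_TRIGGERS = frozenset({
    "scope_expands",
    "public_api_changes",
    "schema_changes",
    "auth_permission_payment_changes",
    "architecture_boundary_changes",
    "test_contract_changes",
    "abstraction_deleted_or_replaced",
    "new_dependency",
    "phased_migration",
})

# Hypothetical labels for the "low-risk deviations" list.
LOW_RISK = frozenset({
    "file_or_test_location_differs",
    "existing_helper_replaces_planned_helper",
    "private_implementation_detail_changes",
})


def classify_deviations(deviations):
    """Return (decision, reasons) for a collection of deviation labels."""
    found = set(deviations)
    unknown = found - APPROVAL_TRIGGERS - LOW_RISK
    if unknown:
        # A typo must not silently pass as low risk.
        raise ValueError(f"unknown deviation labels: {sorted(unknown)}")
    blocking = sorted(found & APPROVAL_TRIGGERS)
    if blocking:
        return ("pause_for_approval", blocking)
    if found:
        return ("continue_with_note", sorted(found))
    return ("no_deviation", [])
```

A single approval trigger outweighs any number of low-risk deviations, which mirrors the last low-risk condition: scope, public surface, architecture boundary, and test contract must all stay unchanged.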
2025
- ### 12.4 Design Change Control
2026
-
2027
- When a large feature is split into subtasks and a design defect is found midstream, do not default to full rollback, and do not continue because of sunk cost.
2028
-
2029
- Process:
2030
-
2031
- ```text
2032
- Freeze current implementation
2033
- -> Run current validation
2034
- -> Record completed subtasks
2035
- -> Identify design defect
2036
- -> Assess impact radius
2037
- -> Classify task risk
2038
- -> Compare options
2039
- -> Preserve reusable assets
2040
- -> Discard wrong boundaries/contracts/abstractions
2041
- -> Update plan and docs
2042
- -> Continue with approved path
2043
- ```
2044
-
2045
- Severity:
2046
-
2047
- ```text
2048
- P0 architecture direction is wrong:
2049
- module boundary, public API, data model, security/permission model
2050
- => favor rebuild or major rollback
2051
-
2052
- P1 public contract needs adjustment:
2053
- public functions, file responsibilities, data flow direction
2054
- => partial rollback + migration
2055
-
2056
- P2 internal implementation issue:
2057
- helper, private function split, test organization
2058
- => local refactor
2059
-
2060
- P3 plan detail mismatch:
2061
- file name, call location, helper replacement
2062
- => update plan and continue
2063
- ```
2064
-
2065
- Compare three options:
2066
-
2067
- ```text
2068
- A. Patch forward
2069
- B. Partial rollback + redesign
2070
- C. Full rollback + rebuild
2071
- ```
2072
-
2073
- Prefer preserving tests, fixtures, docs, clarified requirements, types, UI components, validated pure functions, and low-level tools.
2074
- Prefer discarding wrong public APIs, wrong module boundaries, wrong data models, wrong permission models, wrong abstractions, and glue code built around the wrong design.
2075
-
2076
- Principle:
2077
-
2078
- > Preserve tests, knowledge, and reusable assets; discard wrong boundaries, wrong contracts, and wrong abstractions. Decide based on future maintenance cost, not lines already written.
2079
-
2080
- ## 13. AI Code Acceptance
2081
-
2082
- AI code must satisfy:
2083
-
2084
- ```text
2085
- behavior is correct
2086
- + architecture is compliant
2087
- + public contract is accurate
2088
- + tests are sufficient
2089
- + docs are synced
2090
- + plan deviations are traceable
2091
- ```
2092
-
2093
- ### 13.1 Acceptance Checklist
2094
-
2095
- ```md
2096
- # AI Code Acceptance Checklist
2097
-
2098
- ## Scope
2099
-
2100
- - [ ] Changed files and meaningful hunks have traceable reasons.
2101
- - [ ] Any unplanned refactor, rename, formatting churn, or cleanup was explained, reverted, approved, or routed for follow-up.
2102
- - [ ] No forbidden files changed.
2103
- - [ ] No unapproved dependency added.
2104
- - [ ] No scope expansion without Replan.
2105
-
2106
- ## Role / Handoff
2107
-
2108
- - [ ] The task used an explicit `project-manager` role session for user communication, routing, and status reporting.
2109
- - [ ] Task risk and required route were classified.
2110
- - [ ] Required role route was followed or an exception was approved.
2111
- - [ ] The project manager verified required handoff artifacts, validation evidence, docs sync, and remaining risks.
2112
- - [ ] The project manager did not become the architect, coder, and reviewer for the same non-trivial task.
2113
- - [ ] The coder session did not own architecture, planning, final testing responsibility, and review by itself.
2114
- - [ ] Required handoff artifacts exist and match the handoff schemas.
2115
- - [ ] The coder did not change task scope, module boundaries, public contracts, or test strategy without Replan.
2116
- - [ ] The reviewer used fresh context, a reviewer role session, or a read-only reviewer subagent.
2117
- - [ ] Any reviewer direct fixes were small, local, low-risk, and review-scoped.
2118
- - [ ] Larger implementation issues were returned to coder.
2119
- - [ ] Architecture, public contract, dependency, schema, auth, permission, payment, or design issues were returned to architect.
2120
- - [ ] Architect performed post-review docs sync / architecture drift check before final PM acceptance.
2121
- - [ ] Docs updates or a docs-sync-report explain why affected architecture/module/testing/security/dependency docs are current.
2122
- - [ ] The project manager prepared final acceptance, commit, and PR only after reviewer and docs-sync gates passed or an exception was approved.
2123
- - [ ] Task-level validation evidence is recorded in the task handoff `validation-log.md` when a handoff directory exists.
2124
-
2125
- ## Architecture
2126
-
2127
- - [ ] Module boundaries are preserved.
2128
- - [ ] No forbidden internal imports.
2129
- - [ ] Business logic stays in the correct layer.
2130
- - [ ] Existing service/domain/repository APIs are reused where appropriate.
2131
- - [ ] No duplicate parallel abstraction was introduced.
2132
- - [ ] `.ai/tools/check-boundaries` passes.
2133
-
2134
- ## Public Contract
2135
-
2136
- - [ ] Public surface matches the approved plan.
2137
- - [ ] No unplanned public API was added.
2138
- - [ ] No public signature changed unexpectedly.
2139
- - [ ] Inputs, outputs, side effects, and error behavior match the contract.
2140
- - [ ] Public contract changes are reflected in docs and tests.
2141
-
2142
- ## Tests
2143
-
2144
- - [ ] New or modified public functions have contract tests.
2145
- - [ ] Behavior changes have regression tests.
2146
- - [ ] Tests cover happy path and boundary/failure path.
2147
- - [ ] High-risk functions have expanded behavior-matrix coverage.
2148
- - [ ] Tests were not weakened, deleted, or skipped.
2149
- - [ ] Tests assert behavior, not just mock call order.
2150
-
2151
- ## Validation
2152
-
2153
- - [ ] Required validation commands were run.
2154
- - [ ] Validation passed.
2155
- - [ ] Failures were fixed and rerun.
2156
- - [ ] Relevant L0/L1/L2 checks were run.
2157
- - [ ] Relevant L3 smoke E2E was run for user-facing critical paths.
2158
-
2159
- ## Docs
2160
-
2161
- - [ ] Task spec remains accurate.
2162
- - [ ] Execution plan is updated if implementation changed.
2163
- - [ ] Module docs are updated if responsibilities changed.
2164
- - [ ] Public surface contract is updated if public functions changed.
2165
- - [ ] Test plan / validation section is updated if testing strategy changed.
2166
- - [ ] Durable decisions are promoted into the appropriate long-term docs.
2167
- - [ ] Temporary task documents, staging decisions, route messages, and scratch notes are removed, cleared, or intentionally archived.
2168
- - [ ] Known issues are triaged: fixed/resolved entries are removed or marked resolved, and still-open entries remain actionable.
2169
- - [ ] No known stale docs are left behind.
2170
-
2171
- ## Replan / Design Change
2172
-
2173
- - [ ] Any deviation from the plan is documented.
2174
- - [ ] Scope, architecture, public contract, or test contract changes were approved.
2175
- - [ ] Design defects were handled through Design Change Control.
2176
- ```
2177
-
2178
- ### 13.2 Acceptance Flow
2179
-
2180
- ```text
2181
- 1. Project manager classifies task risk and required role route
2182
- 2. Verify required handoff artifacts exist
2183
- 3. Inspect diff scope
2184
- 4. Compare diff against task brief and architecture plan
2185
- 5. Compare public surface against Public Surface Contract
2186
- 6. Review tests for contract and regression coverage
2187
- 7. Run or inspect validation evidence
2188
- 8. Run architecture boundary checks
2189
- 9. Check docs consistency
2190
- 10. Run independent review for complex/high-risk changes
2191
- 11. Project manager verifies process compliance, remaining risks, and next step
2192
- 12. Approve, request changes, or trigger Replan
2193
- ```
2194
-
2195
- Claude’s final report must include:
2196
-
2197
- ```text
2198
- Task risk and required route:
2199
- Project manager decision:
2200
- Role sessions used:
2201
- Handoff artifacts:
2202
- Files changed:
2203
- Public surface changed:
2204
- Tests added/updated:
2205
- Validation run:
2206
- Architecture checks:
2207
- Docs updated:
2208
- Plan deviations:
2209
- Remaining risks:
2210
- ```
2211
-
2212
- Acceptance result:
2213
-
2214
- ```text
2215
- Accepted:
2216
- meets task, architecture, public contract, tests, validation, and docs requirements.
2217
-
2218
- Needs Changes:
2219
- clear issue, but no redesign needed.
2220
-
2221
- Replan Required:
2222
- scope, architecture, public contract, test contract, or design assumptions changed.
2223
- ```
2224
-
2225
- Automation tools:
2226
-
2227
- ```text
2228
- .ai/tools/check-boundaries
2229
- .ai/tools/check-public-surface
2230
- .ai/tools/check-contract-tests
2231
- .ai/tools/check-docs-freshness
2232
- .ai/tools/check-generated-artifacts
2233
- .ai/tools/check-agent-rules
2234
- .ai/tools/check-changed
2235
- .ai/tools/check-e2e-smoke
2236
- ```
2237
-
2238
- ## 14. MCP and Permissions
2239
-
2240
- MCP is useful for repo-external, frequently changing, tool-accessible context:
2241
-
2242
- - issue / PR
2243
- - docs/wiki
2244
- - logs/metrics/traces
2245
- - browser / Playwright
2246
- - database inspection
2247
- - feature flags
2248
- - CI logs
2249
- - code search
2250
-
2251
- Do not connect every MCP server at the start. Prioritize:
2252
-
2253
- ```text
2254
- 1. GitHub / issue / PR
2255
- 2. browser / Playwright
2256
- 3. code search
2257
- 4. CI logs
2258
- 5. internal docs
2259
- ```
2260
-
2261
- Permission principles:
2262
-
2263
- - prefer read-only
2264
- - write actions require explicit approval
2265
- - production data is not available by default
2266
- - destructive actions require hard gates
2267
- - third-party MCP servers require source review and version pinning, with an owner recorded for each enabled server
2268
-
2269
- ## 15. Harness Drift and Evolution
2270
-
2271
- The harness itself can drift. Rules, role agent files, commands, hooks, generated indexes, and validation scripts are also software and must be maintained with the same skepticism as production code.
2272
-
2273
- ### 15.1 Generated Context Only
2274
-
2275
- Manual indexes are not reliable sources of truth in a large codebase.
2276
-
2277
- Forbidden:
2278
-
2279
- - hand-maintained module indexes as authoritative context
2280
- - hand-maintained test maps as authoritative context
2281
- - stale public-surface maps
2282
- - generated artifacts edited by hand
2283
-
2284
- Allowed:
2285
-
2286
- - generated artifacts created from source code, build graphs, ownership metadata, test configs, coverage, CI, LSP, and code search
2287
- - checked-in generated artifacts only when CI verifies freshness
2288
- - fallback to live code search when generated context is missing or stale
2289
-
2290
- Required check:
2291
-
2292
- ```text
2293
- .ai/tools/check-generated-artifacts
2294
- ```
2295
-
2296
- This check should fail when:
2297
-
2298
- - `.ai/generated/module-index.json` is stale
2299
- - `.ai/generated/test-map.json` is stale
2300
- - `.ai/generated/public-surface.json` is stale
2301
- - a generated artifact was hand-edited
2302
- - generated context disagrees with source-of-truth code metadata
2303
-
2304
- If generated context conflicts with live code, live code wins.
2305
-
2306
- ### 15.2 Repeated Rules Need Rule IDs
2307
-
2308
- Repeating critical rules in root `CLAUDE.md`, module-local `CLAUDE.md`, and role agent files can be useful defense in depth, but untracked duplication causes drift.
2309
-
2310
- Rules:
2311
-
2312
- - critical repeated rules must have stable IDs, such as `RULE-SCOPE-001`, `RULE-ARCH-001`, `RULE-TEST-001`, `RULE-PERM-001`
2313
- - root rule text is canonical unless a different canonical source is explicitly defined
2314
- - role agent files should reference rule IDs or include generated rule snippets
2315
- - `.ai/tools/check-agent-rules` must fail when repeated rule text, rule IDs, or required rule coverage drift
2316
- - do not copy long rule blocks manually into many files without a freshness check
2317
-
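One piece of what `.ai/tools/check-agent-rules` might verify is that every rule ID referenced in a role agent file is defined in the canonical root rules. The sketch below covers only that piece and assumes IDs follow the `RULE-<AREA>-<NNN>` shape of the examples above. The function names are hypothetical.

```python
import re

# Matches IDs shaped like RULE-SCOPE-001 (an assumption based on the examples).
RULE_ID = re.compile(r"\bRULE-[A-Z]+-\d{3}\b")


def rule_ids(text):
    """Return the set of rule IDs mentioned in a piece of text."""
    return set(RULE_ID.findall(text))


def undefined_rule_refs(canonical_text, role_files):
    """Map each role file name to the rule IDs it cites that the root lacks.

    role_files maps a file name to its text. Files with no problems are
    omitted from the result.
    """
    defined = rule_ids(canonical_text)
    report = {}
    for name, text in role_files.items():
        missing = sorted(rule_ids(text) - defined)
        if missing:
            report[name] = missing
    return report
```

Checking for drift in the rule text itself, or for required rule coverage per role, would need more than this ID comparison.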
2318
- ### 15.3 Scaffolding Must Earn Its Keep
2319
-
2320
- The right philosophy is not "more constraints means more reliability." The right philosophy is:
2321
-
2322
- > Add constraints when they prevent observed failures. Remove constraints when they no longer pay for their maintenance cost.
2323
-
2324
- Every rule, role, handoff artifact, hook, generated index, validation command, and required checklist item adds cost.
2325
-
2326
- Monthly review must remove as well as add scaffolding:
2327
-
2328
- - remove rules that no longer prevent real failures
2329
- - merge roles that create handoff overhead without reducing risk
2330
- - replace manual docs or indexes with generated artifacts
2331
- - relax constraints that block safe cross-file edits by stronger models
2332
- - promote useful repeated behavior into tests, CI, hooks, or generated checks
2333
- - delete soft behavioral rules that are not measurable and do not prevent observed failures
2334
- - delete stale workarounds created for older model limitations
2335
-
2336
- Do not preserve a rule only because it helped an older model. A rule must justify itself against the current model, current tools, current codebase, and current failure data.
2337
-
2338
- ### 15.4 Model Evolution Review
2339
-
2340
- When the model, Claude Code, MCP tooling, code search, test selection, or CI improves, revisit the harness.
2341
-
2342
- Review questions:
2343
-
2344
- - Which constraints were added for a weaker model?
2345
- - Which rules now block safe multi-file reasoning or coordinated edits?
2346
- - Which role handoffs are producing useful artifacts, and which are ritual?
2347
- - Which prompts can be replaced by stronger tests, generated checks, or tools?
2348
- - Which checks are redundant because CI or type systems now cover them?
2349
- - Which tasks can safely move to a lighter route because failure data improved?
2350
-
2351
- The goal is a harness that stays strong by staying lean.
2352
-
2353
- ## 16. Team Governance
2354
-
2355
- The team needs a Claude Code Harness owner, usually DevEx, the platform team, a staff engineer, or an architecture group.
2356
-
2357
- Responsibilities:
2358
-
2359
- - maintain `CLAUDE.md`
2360
- - maintain `.claude/agents/project-manager.md`
2361
- - maintain role command templates used by the project manager to brief `architect`, `coder`, `reviewer`, and specialist sessions
2362
- - maintain hooks
2363
- - maintain role agents, skills, subagents, and commands
2364
- - maintain validation commands
2365
- - maintain generated context artifacts and freshness checks
2366
- - maintain docs freshness and documentation sync rules
2367
- - review MCP permissions
2368
- - clean stale rules, stale docs, stale generated artifacts, and stale scaffolding
2369
- - collect agent failure modes
2370
-
2371
- Rule updates:
2372
-
2373
- - frequent mistakes go into `CLAUDE.md`
2374
- - high-risk boundaries go into `CLAUDE.md`
2375
- - information used by every task goes into `CLAUDE.md`
2376
- - everything else goes into a module-local `CLAUDE.md`, docs, a skill, a command, a hook, or a CI check
2377
- - every repeated critical rule needs a stable rule ID and freshness check
2378
- - every generated context artifact needs a source-of-truth generator and CI freshness check
2379
-
2380
- Monthly review:
2381
-
2382
- - What mistakes does Claude make most often?
2383
- - Which task types succeed most often?
2384
- - Which tasks should be forbidden for automation?
2385
- - Which rules or docs are stale?
2386
- - Which rules should be removed because the current model no longer needs them?
2387
- - Which roles or handoff artifacts add overhead without reducing real failures?
2388
- - Which manual indexes or docs should become generated artifacts?
2389
- - Which prompts should become skills?
2390
- - Which validation commands are too slow?
2391
- - Which checks should move into hooks?
2392
- - Which checks, hooks, or role requirements can be simplified?
2393
-
2394
- `docs/known-issues.md` triage (every monthly review):
2395
-
2396
- - For each unhandled entry, decide: promote to task, fold into a planned change, or dismiss.
2397
- - Entries older than 90 days with no action are dismissed with a short reason recorded in `docs/known-issues.md` before removal, or in the relevant durable doc when the reason changes project policy.
2398
- - `docs/known-issues.md` is only useful if it stays small; an ever-growing file means triage is not happening.
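The 90-day cutoff can be applied mechanically before the review meeting. The sketch below assumes each entry carries an opened date and a title; the source does not define the entry format of `docs/known-issues.md`, so the `(date, title)` tuples and the function name are hypothetical.

```python
from datetime import date, timedelta


def overdue_entries(
    entries: list[tuple[date, str]],
    today: date,
    max_age_days: int = 90,
) -> list[str]:
    """Return titles of entries strictly older than max_age_days.

    entries: (opened_date, title) pairs for unhandled known issues.
    """
    cutoff = today - timedelta(days=max_age_days)
    # An entry opened exactly max_age_days ago is not yet overdue.
    return [title for opened, title in entries if opened < cutoff]
```

The output is only a list of candidates; each one still needs the recorded dismissal reason described above before it is removed.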
2399
-
2400
- ## 17. Minimum Team Rules
2401
-
2402
- If you can only enforce 16 rules, enforce these:
2403
-
2404
- 1. User-facing tasks start with `claude --agent project-manager`; untagged sessions are not implicit project managers.
2405
- 2. The `project-manager` agent owns user communication, task clarification, and precise route-file message preparation.
2406
- 3. Complex tasks use explicit role sessions and handoff artifacts, and start with a plan; do not edit directly.
2407
- 4. One task uses one branch, one worktree, one handoff directory, and one PR by default.
2408
- 5. Role sessions for the same task work in the same task worktree sequentially; parallel write work uses separate task or sub-task worktrees.
2409
- 6. One session handles one coherent role and task.
2410
- 7. Tasks must define scope, non-goals, and validation.
2411
- 8. Every task must define file-level responsibilities.
2412
- 9. Ordinary tasks must define public function contracts.
2413
- 10. New or modified public functions must have contract tests.
2414
- 11. Code changes must run relevant validation.
2415
- 12. Architecture, public contract, or test strategy changes must sync durable docs, and completed task artifacts must be cleaned up.
2416
- 13. Manual indexes are not authoritative; generated context must be freshness-checked.
2417
- 14. AI review uses fresh context or a reviewer role session.
2418
- 15. High-risk actions require human approval.
2419
- 16. Harness rules, roles, checks, and handoffs must be reviewed for removal as well as addition.
2420
-
2421
- ## 18. Common Anti-Patterns
2422
-
2423
- ```text
2424
- Huge CLAUDE.md
2425
- -> short entry file + docs + module-local rules
2426
-
2427
- Natural-language-only constraints
2428
- -> hooks / lint / tests / CI / permissions
2429
-
2430
- Hand-maintained indexes
2431
- -> generated artifacts + CI freshness checks
2432
-
2433
- Copied critical rules in many files
2434
- -> rule IDs + generated snippets + check-agent-rules
2435
-
2436
- No validation command
2437
- -> check-fast / check-changed / check-module
2438
-
2439
- Too much at once
2440
- -> phases / incremental commits / draft PR
2441
-
2442
- One do-everything session
2443
- -> explicit role sessions + file handoffs
2444
-
2445
- Project manager becomes coder/reviewer
2446
- -> project manager coordinates and verifies; role sessions execute
2447
-
2448
- Project manager only tracks status but sends vague role prompts
2449
- -> project manager turns user intent into precise role commands with scope, contracts, validation, outputs, and stop conditions
2450
-
2451
- Role-based worktree fragmentation
2452
- -> one task worktree; architect -> coder -> reviewer hand off sequentially inside it
2453
-
2454
- Dynamic subagent routing for the main workflow
2455
- -> start the session with claude --agent <role>
2456
-
2457
- Permanent scaffolding for old model limits
2458
- -> monthly model evolution review + remove stale constraints
2459
-
2460
- Coder session self-reviews
2461
- -> reviewer role session / fresh review session / reviewer subagent
2462
-
2463
- Unbounded multi-agent parallelism
2464
- -> separate task or sub-task worktrees / ownership / read-write separation
2465
- ```