@lumenflow/cli 3.18.0 → 3.19.0

Files changed (63)
  1. package/README.md +44 -43
  2. package/dist/config-set.js +10 -1
  3. package/dist/config-set.js.map +1 -1
  4. package/dist/docs-sync.js +123 -6
  5. package/dist/docs-sync.js.map +1 -1
  6. package/dist/gate-co-change.js +28 -6
  7. package/dist/gate-co-change.js.map +1 -1
  8. package/dist/gates-runners.js +108 -12
  9. package/dist/gates-runners.js.map +1 -1
  10. package/dist/initiative-edit.js +8 -3
  11. package/dist/initiative-edit.js.map +1 -1
  12. package/dist/lumenflow-upgrade.js +50 -0
  13. package/dist/lumenflow-upgrade.js.map +1 -1
  14. package/dist/public-manifest.js +1 -1
  15. package/dist/public-manifest.js.map +1 -1
  16. package/dist/sync-templates.js +13 -0
  17. package/dist/sync-templates.js.map +1 -1
  18. package/dist/wu-block.js +10 -0
  19. package/dist/wu-block.js.map +1 -1
  20. package/dist/wu-claim-validation.js +3 -1
  21. package/dist/wu-claim-validation.js.map +1 -1
  22. package/dist/wu-claim.js +3 -1
  23. package/dist/wu-claim.js.map +1 -1
  24. package/dist/wu-done-memory-telemetry.js +5 -1
  25. package/dist/wu-done-memory-telemetry.js.map +1 -1
  26. package/dist/wu-done-ownership.js +6 -0
  27. package/dist/wu-done-ownership.js.map +1 -1
  28. package/dist/wu-edit-operations.js +4 -4
  29. package/dist/wu-edit-operations.js.map +1 -1
  30. package/dist/wu-prep.js +88 -13
  31. package/dist/wu-prep.js.map +1 -1
  32. package/dist/wu-recover.js +15 -0
  33. package/dist/wu-recover.js.map +1 -1
  34. package/dist/wu-release.js +10 -1
  35. package/dist/wu-release.js.map +1 -1
  36. package/dist/wu-spawn-prompt-builders.js +27 -2
  37. package/dist/wu-spawn-prompt-builders.js.map +1 -1
  38. package/dist/wu-state-mutation-ownership.js +136 -0
  39. package/dist/wu-state-mutation-ownership.js.map +1 -0
  40. package/dist/wu-unblock.js +10 -0
  41. package/dist/wu-unblock.js.map +1 -1
  42. package/package.json +111 -110
  43. package/packs/agent-runtime/.turbo/turbo-build.log +1 -1
  44. package/packs/agent-runtime/package.json +1 -1
  45. package/packs/sidekick/.turbo/turbo-build.log +1 -1
  46. package/packs/sidekick/package.json +1 -1
  47. package/packs/software-delivery/.turbo/turbo-build.log +1 -1
  48. package/packs/software-delivery/package.json +1 -1
  49. package/templates/core/AGENTS.md.template +157 -32
  50. package/templates/core/LUMENFLOW.md.template +44 -29
  51. package/templates/core/_frameworks/lumenflow/wu-sizing-guide.md.template +644 -0
  52. package/templates/core/ai/onboarding/agent-invocation-guide.md.template +5 -5
  53. package/templates/core/ai/onboarding/agent-safety-card.md.template +1 -0
  54. package/templates/core/ai/onboarding/docs-generation.md.template +94 -4
  55. package/templates/core/ai/onboarding/first-15-mins.md.template +1 -1
  56. package/templates/core/ai/onboarding/first-wu-mistakes.md.template +2 -1
  57. package/templates/core/ai/onboarding/initiative-orchestration.md.template +21 -21
  58. package/templates/core/ai/onboarding/quick-ref-commands.md.template +102 -95
  59. package/templates/core/ai/onboarding/release-process.md.template +12 -12
  60. package/templates/core/ai/onboarding/starting-prompt.md.template +31 -31
  61. package/templates/vendors/claude/.claude/skills/initiative-management/SKILL.md.template +2 -2
  62. package/templates/vendors/claude/.claude/skills/multi-agent-coordination/SKILL.md.template +2 -2
  63. package/templates/vendors/claude/.claude/skills/orchestration/SKILL.md.template +3 -3
@@ -0,0 +1,644 @@
1
+ # Work Unit Sizing & Strategy Guide
2
+
3
+ **Purpose:** Decision framework for agents to determine when work should remain one WU, when it needs a handoff strategy, and when it truly needs decomposition.
4
+
5
+ **Effective Date:** {{DATE}} (Post-WU-1215 Analysis)
6
+
7
+ **Status:** Active — Thresholds are mandatory **strategy triggers**. They are not permission to over-split cohesive work.
8
+
9
+ ---
10
+
11
+ ## 1. Complexity Assessment Matrix
12
+
13
+ Before claiming a WU, estimate its "weight" using these heuristics.
14
+
15
+ | Complexity | Files | Tool Calls | Context Budget | Strategy |
16
+ | :------------ | :---- | :--------- | :------------- | :---------------------------------------------------------------------------------------------------------- |
17
+ | **Simple** | <20 | <50 | <30% | **Single Session** (Tier 2 Context) |
18
+ | **Medium** | 20-50 | 50-100 | 30-50% | **Checkpoint-Resume** (Standard Handoff) |
19
+ | **Complex** | 50+ | 100+ | >50% | **Orchestrator-Worker** or **Checkpoint-Resume**; split only if non-atomic |
20
+ | **Oversized** | 100+ | 200+ | — | **Re-check cohesion**; split only when the work cannot land as one coherent outcome or no exception applies |
21
+
22
+ **These thresholds are mandatory strategy triggers.** They exist to prevent context exhaustion and rule loss (WU-1215 failure: 80k tokens consumed on analysis alone, zero implementation). They do **not** mean every Medium or Complex WU should be broken into multiple smaller WUs. Agents operate in context windows and tool calls, not clock time.
23
+
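The matrix above can be read as a small classifier. The sketch below is illustrative only and is not the CLI's implementation: the names `Tier`, `STRATEGY`, and `classify` are hypothetical, and two behaviours are assumptions the table does not state. The overall tier is taken as the highest tier reached by any single measurement, and boundary values (for example exactly 50 files) go to the higher tier.

```python
from enum import IntEnum


class Tier(IntEnum):
    SIMPLE = 0
    MEDIUM = 1
    COMPLEX = 2
    OVERSIZED = 3


# Strategy trigger per tier, paraphrased from the matrix.
STRATEGY = {
    Tier.SIMPLE: "single-session",
    Tier.MEDIUM: "checkpoint-resume",
    Tier.COMPLEX: "orchestrator-worker or checkpoint-resume",
    Tier.OVERSIZED: "re-check cohesion",
}


def _tier(value, medium_at, complex_at, oversized_at=None):
    """Map one measurement onto a tier (assumption: boundaries round up)."""
    if oversized_at is not None and value >= oversized_at:
        return Tier.OVERSIZED
    if value >= complex_at:
        return Tier.COMPLEX
    if value >= medium_at:
        return Tier.MEDIUM
    return Tier.SIMPLE


def classify(files, tool_calls, context_pct):
    """Return the highest tier reached by any of the three measurements."""
    by_files = _tier(files, 20, 50, 100)
    by_calls = _tier(tool_calls, 50, 100, 200)
    # The matrix gives no Oversized figure for context, so it tops out at Complex.
    if context_pct > 50:
        by_context = Tier.COMPLEX
    elif context_pct >= 30:
        by_context = Tier.MEDIUM
    else:
        by_context = Tier.SIMPLE
    return max(by_files, by_calls, by_context)
```

The result is a strategy trigger, not a splitting decision: a `COMPLEX` or `OVERSIZED` result still has to pass the cohesion rule before any split is proposed.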
24
+ ### 1.0 Cohesion Rule
25
+
26
+ **Default bias: one coherent outcome = one WU.**
27
+
28
+ Keep work in one WU when all of the following are true:
29
+
30
+ - The acceptance criteria describe one coherent outcome
31
+ - The work should land together to be meaningful
32
+ - A single agent or handoff chain can finish it with `single-session`, `checkpoint-resume`, or `orchestrator-worker`
33
+ - The touched files all support the same change, even if there are several of them
34
+
35
+ Do **not** split a WU just because:
36
+
37
+ - It has multiple implementation steps
38
+ - Tests, docs, and code all need updates for the same change
39
+ - The work may take more than one session
40
+ - There are several files in the same lane supporting one atomic outcome
41
+
42
+ Split a WU only when one of these is true:
43
+
44
+ - Different parts can ship, review, or roll back independently
45
+ - Different lanes or owners should deliver different parts
46
+ - A risk-reduction pattern is needed, such as tracer bullet or feature flag
47
+ - The work no longer has a clean stopping point and keeps widening during execution
48
+
49
+ **Required question before splitting:** Can these parts ship, review, and roll back independently? If no, keep one WU and choose a better execution strategy.
50
+
51
+ **Examples that should usually stay one WU:**
52
+
53
+ - Add one API endpoint plus its tests and docs
54
+ - Implement one dashboard card plus the backend query it depends on in the same lane
55
+ - Mechanical import or config rewrites across many files when every change is uniform
56
+
57
+ **Anti-patterns (splits that should usually stay one WU):**
58
+
59
+ - "Endpoint first, tests second, docs third" for one API change
60
+ - "Backend WU" and "frontend WU" when neither is independently shippable
61
+ - "Refactor step 1", "refactor step 2", and "refactor step 3" for one atomic fix
62
+ - "Metrics tests WU", "Memory tests WU", "Runtime tests WU" for closing one test coverage gap -- one outcome, one WU
63
+ - "Fix bug WU" and "Clean up file where bug was found WU" -- if the cleanup is in the same file, one WU
64
+
65
+ ### 1.0.1 Consolidation Checklist (Mandatory Before Splitting)
66
+
67
+ Agents consistently under-apply the cohesion rule, defaulting to micro-splitting. Before proposing more than 2 WUs for any body of work, run this checklist:
68
+
69
+ 1. **Same lane?** If two proposed WUs are in the same lane and serve the same outcome, merge them.
70
+ 2. **Same file or module?** If two WUs touch the same file, they are almost certainly one WU.
71
+ 3. **Can either ship alone?** If WU-A is meaningless without WU-B, they are one WU.
72
+ 4. **Is the split by phase, not outcome?** "Step 1" and "Step 2" of the same fix = one WU.
73
+ 5. **Is the split by artifact type?** "Tests WU" and "Code WU" for the same feature = one WU.
74
+ 6. **Would a reviewer see these as one PR?** If yes, one WU.
75
+
76
+ **Apply this checklist iteratively.** After your first pass at splitting, re-run the checklist. If any merges are possible, merge and re-run again. Stop only when the checklist suggests no further merges. Real-world experience shows the first pass typically over-splits by 2-3x.
77
+
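The "merge and re-run" loop can be sketched as a fixed-point iteration. This is a minimal illustration under stated assumptions, not tooling that ships with the CLI: `ProposedWU`, `should_merge`, and `consolidate` are hypothetical names, and only checklist items 1 and 2 are automated here (same lane with the same outcome, or overlapping files). The remaining items need human or agent judgment.

```python
from dataclasses import dataclass, field


@dataclass
class ProposedWU:
    title: str
    lane: str
    outcome: str
    files: set = field(default_factory=set)


def should_merge(a, b):
    """Checklist items 1 and 2 only; treating any shared file as a merge is a simplification."""
    same_lane_and_outcome = a.lane == b.lane and a.outcome == b.outcome
    shares_a_file = bool(a.files & b.files)
    return same_lane_and_outcome or shares_a_file


def consolidate(proposed):
    """Merge pairs until a full pass finds nothing left to merge."""
    wus = list(proposed)
    merged = True
    while merged:
        merged = False
        for i in range(len(wus)):
            for j in range(i + 1, len(wus)):
                if should_merge(wus[i], wus[j]):
                    a, b = wus[i], wus[j]
                    wus[i] = ProposedWU(
                        f"{a.title} + {b.title}", a.lane, a.outcome, a.files | b.files
                    )
                    del wus[j]
                    merged = True
                    break
            if merged:
                break  # restart the pass after every merge
    return wus
```

Applied to the "Metrics tests WU", "Memory tests WU", "Runtime tests WU" anti-pattern above, three proposals in one lane with one outcome collapse into a single WU.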
78
+ ### 1.1 Documentation-Only Exception
79
+
80
+ Documentation WUs (`type: documentation`) have relaxed file count thresholds because:
81
+
82
+ - Doc files have lower cognitive complexity than code
83
+ - No test/type/lint dependencies to track
84
+ - Changes are typically additive, not structural
85
+
86
+ | Complexity | Files (docs) | Tool Calls | Context Budget | Strategy |
87
+ | :---------- | :----------- | :--------- | :------------- | :---------------- |
88
+ | **Simple** | <40 | <50 | <30% | Single Session |
89
+ | **Medium** | 40-80 | 50-100 | 30-50% | Checkpoint-Resume |
90
+ | **Complex** | 80+ | 100+ | >50% | Decomposition |
91
+
92
+ **Applies when ALL true:**
93
+
94
+ - WU `type: documentation`
95
+ - Only modifies: `docs/**`, `*.md`, `.lumenflow/stamps/**`
96
+ - Does NOT touch: `apps/**`, `packages/**`, `tools/**` (code paths)
97
+
98
+ **Example: Docs-only WU touching 35 markdown files**
99
+
100
+ ```yaml
101
+ # WU-XXX.yaml
102
+ id: WU-XXX
103
+ type: documentation
104
+ code_paths:
105
+ - docs/operations/_frameworks/lumenflow/*.md
106
+ - docs/01-product/*.md
107
+ ```
108
+
109
+ This is allowed under the docs exception: 35 files is below the 40-file Simple threshold.
110
+
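The "applies when ALL true" conditions can be checked mechanically against a WU's path list. The sketch below is an assumption-laden illustration rather than the CLI's own check: `qualifies_for_docs_exception` is a hypothetical helper, and it uses Python's `fnmatch`, where `*` also matches `/`, so the glob semantics may differ from whatever the real tooling uses.

```python
from fnmatch import fnmatchcase

# Patterns copied from the exception criteria above.
DOC_PATTERNS = ("docs/**", "*.md", ".lumenflow/stamps/**")
CODE_PATTERNS = ("apps/**", "packages/**", "tools/**")


def qualifies_for_docs_exception(wu_type, paths):
    """True only if the WU is documentation and every path is docs-only."""
    if wu_type != "documentation":
        return False
    for path in paths:
        # A code path disqualifies the WU even when the file is Markdown.
        if any(fnmatchcase(path, pattern) for pattern in CODE_PATTERNS):
            return False
        if not any(fnmatchcase(path, pattern) for pattern in DOC_PATTERNS):
            return False
    return True
```

Note the ordering: a Markdown file under `packages/**` matches `*.md` but still touches a code path, so the sketch rejects it.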
111
+ ---
112
+
113
+ ### 1.2 Shallow Multi-File Exception
114
+
115
+ Some WUs touch many files but make shallow, uniform changes (e.g., renaming, search-replace, config updates). These may exceed file count thresholds while remaining low-complexity.
116
+
117
+ **Single-session override allowed when ALL true:**
118
+
119
+ 1. **Uniform change pattern**: Same edit repeated across files (e.g., rename, import path update, config value change)
120
+ 2. **No structural changes**: No new functions, classes, or control flow
121
+ 3. **Low cognitive load**: Each file change is <=5 lines and mechanically identical
122
+ 4. **Justification documented**: WU notes explain why thresholds are exceeded
123
+
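As a rough illustration, the four override criteria reduce to a conjunction over the planned edits. `FileEdit` and `shallow_override_allowed` are hypothetical names, and the `pattern` label is an assumed stand-in for "same edit repeated across files"; deciding whether two edits really are mechanically identical remains a judgment call.

```python
from dataclasses import dataclass


@dataclass
class FileEdit:
    path: str
    lines_changed: int
    pattern: str      # label for the kind of edit, e.g. "import-rename"
    structural: bool  # True if it adds functions, classes, or control flow


def shallow_override_allowed(edits, justification):
    """All four criteria must hold; an empty justification fails criterion 4."""
    if not edits or not justification.strip():
        return False
    uniform = len({edit.pattern for edit in edits}) == 1
    no_structural_changes = not any(edit.structural for edit in edits)
    low_cognitive_load = all(edit.lines_changed <= 5 for edit in edits)
    return uniform and no_structural_changes and low_cognitive_load
```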
124
+ **Example: Renaming a module across 45 files**
125
+
126
+ ```yaml
127
+ # WU YAML notes field
128
+ notes: |
129
+   Single-session override: 45 files modified but all are mechanical
130
+   import path updates from '@old/path' to '@new/path'. Each change is
131
+   1 line, pattern is identical across all files. No structural changes.
132
+   Complexity: Low (uniform search-replace).
133
+ ```
134
+
135
+ **Counter-example (NOT eligible for override):**
136
+
137
+ A WU touching 30 files where each requires unique logic changes, test updates, or structural modifications. This is Complex, not shallow multi-file - standard thresholds apply.
138
+
139
+ ---
140
+
141
+ ### 1.3 Examples Summary
142
+
143
+ | WU Type | Files | Eligible for Override? | Reasoning |
144
+ | :-------------------------- | :---- | :--------------------- | :--------------------------------------------- |
145
+ | Docs: update 25 markdown | 25 | Yes (docs exception) | <40 files, docs-only, low complexity |
146
+ | Docs: reorg 60 doc files | 60 | Yes (docs exception) | <80 files = Medium, checkpoint-resume strategy |
147
+ | Code: rename import in 45 | 45 | Yes (shallow override) | Uniform 1-line changes, no structural edits |
148
+ | Code: refactor 30 files | 30 | No | Each file has unique logic changes |
149
+ | Code: add feature across 25 | 25 | No | Exceeds 20-file threshold, structural changes |
150
+ | Docs + Code: mixed 15 files | 15 | No | Not docs-only, standard code thresholds apply |
151
+
152
+ ---
153
+
154
+ ### 1.4 Review, Audit, and Exploration WUs
155
+
156
+ Some WUs are inherently wide and parallel: codebase reviews, security audits, migration assessments, dependency health checks. These don't follow the linear "investigate → implement → test" pattern. They follow a "fan out → gather → synthesise" pattern.
157
+
158
+ **The multiple-passes problem:** Agents consistently under-scope review WUs on the first attempt, producing a modest plan that requires the user to say "no, go wider" or "use all sub-agents." This wastes a round-trip and often an entire session restart. The guidance below eliminates that by telling agents to go wide from the start.
159
+
160
+ #### When This Section Applies
161
+
162
+ A WU qualifies as a review/audit/exploration WU when ALL of these are true:
163
+
164
+ - The outcome is a **report, assessment, or findings list** — not a code change
165
+ - The work is **read-heavy**: the agent reads far more than it writes
166
+ - Coverage matters more than depth on any single file
167
+ - The codebase area under review exceeds what one agent can explore in a single session
168
+
169
+ #### Default to Maximum Parallelism
170
+
171
+ **Do not start with one agent and escalate.** For review WUs, the correct first move is to decompose the review into independent sub-agent scopes and launch them all in parallel.
172
+
173
+ **Minimum sub-agent count for review WUs:**
174
+
175
+ | Codebase Size | Files Under Review | Minimum Parallel Agents |
176
+ | :------------ | :----------------- | :---------------------- |
177
+ | Small | <50 | 2-3 |
178
+ | Medium | 50-200 | 4-6 |
179
+ | Large | 200+ | 6-8 |
180
+
181
+ Each sub-agent should have a **distinct, non-overlapping scope** defined by one of:
182
+
183
+ - **Domain boundary**: security, performance, accessibility, architecture
184
+ - **Package/directory boundary**: `apps/web`, `packages/shared`, `lib/llm`
185
+ - **Concern boundary**: string literals, dead code, missing tests, broken routes
186
+
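A minimal sketch of the sizing table and the non-overlap requirement, under stated assumptions: `minimum_parallel_agents` and `overlapping_scopes` are hypothetical helpers, exactly 200 files is treated as Large, and a scope is modelled as a set of `(directory, concern)` pairs, which only detects exact duplicates rather than nested directories.

```python
from itertools import combinations


def minimum_parallel_agents(files_under_review):
    """Return the (low, high) agent range from the table above."""
    if files_under_review < 50:
        return (2, 3)
    if files_under_review < 200:
        return (4, 6)
    return (6, 8)


def overlapping_scopes(scopes):
    """List agent pairs that share a (directory, concern) pair.

    `scopes` maps an agent name to a set of (directory, concern) tuples.
    """
    clashes = []
    for first, second in combinations(sorted(scopes), 2):
        shared = scopes[first] & scopes[second]
        if shared:
            clashes.append((first, second, sorted(shared)))
    return clashes
```

An empty result from `overlapping_scopes` is the state to aim for before launching the sub-agents.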
187
+ #### Sub-Agent Scope Template
188
+
189
+ When spawning a review sub-agent, give it:
190
+
191
+ 1. **Exactly what to look for** — not "review code quality" but "find components using inline styles instead of Tailwind tokens, missing loading states, missing error boundaries, and components that should be server components but are marked 'use client'"
192
+ 2. **Where to look** — specific directories or file patterns
193
+ 3. **Output format** — severity-ranked table with file paths, line numbers, and concrete findings
194
+ 4. **What NOT to report** — issues already caught by existing tooling (linters, type checks)
195
+
196
+ **Bad prompt** (causes multiple passes):
197
+
198
+ > "Do a code review of the frontend"
199
+
200
+ **Good prompt** (complete in one pass):
201
+
202
+ > "Review all components in `apps/web/src/components/` for: (1) inline styles that should use Tailwind tokens, (2) missing loading/empty/error states, (3) 'use client' directives on components that could be server components, (4) components rendered in loops without React.memo, (5) lists over 50 items without virtualisation. Report as severity-ranked table with file:line references."
203
+
204
+ #### Synthesising Results
205
+
206
+ The orchestrating agent (or the user) synthesises sub-agent reports into a single findings document. The orchestrator should:
207
+
208
+ - De-duplicate findings that multiple agents flagged
209
+ - Resolve contradictions (e.g., one agent says "add error boundary" and another says "component is dead code")
210
+ - Rank by severity across all sub-reports
211
+ - Group actionable items into potential follow-up WUs
212
+
213
+ #### Sizing Metadata for Review WUs
214
+
215
+ ```yaml
216
+ # WU-XXX.yaml
217
+ id: WU-XXX
218
+ type: documentation # Review output is a report, not code
219
+ sizing_estimate:
220
+   estimated_files: 200 # Files READ, not modified
221
+   estimated_tool_calls: 150
222
+   strategy: orchestrator-worker
223
+   exception_type: review-audit
224
+   exception_reason: >
225
+     Read-only codebase review. 8 parallel sub-agents each
226
+     covering a distinct concern. No code modifications.
227
+ ```
228
+
229
+ #### Anti-Patterns for Review WUs
230
+
231
+ - **Starting with one agent** and only parallelising after it runs out of context — go wide from the start
232
+ - **Overlapping scopes** where two agents review the same files for the same concerns — waste of tokens
233
+ - **Vague scope** like "review everything" — each agent needs a concrete checklist
234
+ - **Missing output format** — agents produce narrative instead of actionable tables, requiring a second pass to extract findings
235
+ - **Treating findings as implementation** — a review WU produces a report; fixes are separate WUs
236
+
237
+ ---
238
+
239
+ ### 1.5 Sizing Contract (Tooling-Backed Enforcement) (WU-2141)
240
+
241
+ The sizing thresholds in sections 1.0-1.2 are now enforced by CLI tooling. The `sizing_estimate` metadata contract lets agents declare their sizing intent at WU creation time, and the tooling validates compliance.
242
+
243
+ #### The `sizing_estimate` Metadata Contract
244
+
245
+ WU YAML specs support an optional `sizing_estimate` field. When present, `wu:create` and `wu:brief` validate it against the thresholds above.
246
+
247
+ **Fields:**
248
+
249
+ | Field | Type | Required | Description |
250
+ | :--------------------- | :----- | :------- | :-------------------------------------------------------- |
251
+ | `estimated_files` | number | Yes | Estimated number of files to be modified |
252
+ | `estimated_tool_calls` | number | Yes | Estimated number of tool calls for the session |
253
+ | `strategy` | string | Yes | Execution strategy (see valid values below) |
254
+ | `exception_type` | string | No | Exception type when thresholds are intentionally exceeded |
255
+ | `exception_reason` | string | No | Justification for the exception (required with type) |
256
+
257
+ **Valid `strategy` values:**
258
+
259
+ - `single-session` -- Fits within Simple thresholds
260
+ - `checkpoint-resume` -- Medium complexity, requires checkpoint-resume
261
+ - `orchestrator-worker` -- Complex, requires orchestrator-worker pattern
262
+ - `decomposition` -- Must be split into multiple WUs
263
+
264
+ **Valid `exception_type` values:**
265
+
266
+ - `docs-only` -- Documentation-only exception (section 1.1)
267
+ - `shallow-multi-file` -- Shallow multi-file exception (section 1.2)
268
+ - `review-audit` -- Review/audit/exploration exception (section 1.4)
269
+
270
+ #### Example: Compliant WU with sizing metadata
271
+
272
+ ```yaml
273
+ # WU-XXX.yaml
274
+ id: WU-XXX
275
+ type: feature
276
+ sizing_estimate:
277
+   estimated_files: 8
278
+   estimated_tool_calls: 35
279
+   strategy: single-session
280
+ ```
281
+
282
+ #### Example: Oversize WU with exception metadata
283
+
284
+ ```yaml
285
+ # WU-XXX.yaml
286
+ id: WU-XXX
287
+ type: documentation
288
+ sizing_estimate:
289
+   estimated_files: 45
290
+   estimated_tool_calls: 40
291
+   strategy: single-session
292
+   exception_type: docs-only
293
+   exception_reason: >
294
+     All markdown documentation files. Low cognitive complexity,
295
+     no test/type/lint dependencies.
296
+ ```
297
+
298
+ #### Example: Shallow multi-file exception
299
+
300
+ ```yaml
301
+ # WU-XXX.yaml
302
+ id: WU-XXX
303
+ type: refactor
304
+ sizing_estimate:
305
+   estimated_files: 45
306
+   estimated_tool_calls: 40
307
+   strategy: single-session
308
+   exception_type: shallow-multi-file
309
+   exception_reason: >
310
+     Uniform import path rename from '@old/path' to '@new/path'.
311
+     Each file change is 1 line, mechanically identical.
312
+ ```
313
+
314
+ #### Advisory vs Strict Mode
315
+
316
+ The tooling operates in two modes:
317
+
318
+ **Advisory mode (default):** `wu:create` and `wu:brief` emit warnings when an estimate exceeds thresholds without valid exception metadata. The operation still proceeds. This mode preserves backward compatibility -- WUs without `sizing_estimate` produce no warnings.
319
+
320
+ ```
321
+ [wu:create] WARNING (WU-100): sizing: estimated_files (30) exceeds Simple
322
+ threshold (20). Consider adding exception_type/exception_reason or splitting
323
+ the WU. See docs/operations/_frameworks/lumenflow/wu-sizing-guide.md.
324
+ ```
325
+
326
+ **Strict mode (`--strict-sizing`):** `wu:brief` supports a `--strict-sizing` flag that blocks when:
327
+
328
+ - `sizing_estimate` metadata is missing from the WU YAML
329
+ - The estimate exceeds thresholds without a valid exception
330
+
331
+ ```bash
332
+ # Advisory mode (default) -- warns but proceeds
333
+ pnpm wu:brief --id WU-XXX --client claude-code
334
+
335
+ # Strict mode -- blocks non-compliant WUs
336
+ pnpm wu:brief --id WU-XXX --client claude-code --strict-sizing
337
+ ```
338
+
339
+ Strict mode is intended for teams that want to enforce sizing discipline before delegating work to agents. It is opt-in and does not affect existing workflows.
340
+
341
+ #### Backward Compatibility
342
+
343
+ - WUs created before WU-2141 (without `sizing_estimate`) are fully supported
344
+ - Missing `sizing_estimate` is treated as "no estimate provided" -- no warnings, no errors
345
+ - Advisory mode never blocks WU creation or brief generation
346
+ - Strict mode is opt-in via `--strict-sizing` flag only
347
+
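The difference between the two modes can be summarised in a few lines. This is a sketch of the behaviour described above, not the CLI's code: `check_sizing` is a hypothetical function, the estimate is assumed to arrive as a plain mapping (or `None` when the field is absent), and for simplicity it compares only against the Simple limits and ignores `strategy`.

```python
VALID_EXCEPTION_TYPES = {"docs-only", "shallow-multi-file", "review-audit"}
SIMPLE_FILE_LIMIT = 20       # Simple means fewer than 20 files
SIMPLE_TOOL_CALL_LIMIT = 50  # Simple means fewer than 50 tool calls


def check_sizing(estimate, strict=False):
    """Return (proceed, messages) for a sizing_estimate mapping or None."""
    if estimate is None:
        # Advisory mode stays silent; strict mode blocks on missing metadata.
        if strict:
            return False, ["sizing_estimate metadata is missing"]
        return True, []

    # An exception only counts when both the type and the reason are present.
    has_valid_exception = (
        estimate.get("exception_type") in VALID_EXCEPTION_TYPES
        and bool(estimate.get("exception_reason"))
    )
    messages = []
    if not has_valid_exception:
        if estimate["estimated_files"] >= SIMPLE_FILE_LIMIT:
            messages.append("estimated_files is outside the Simple range")
        if estimate["estimated_tool_calls"] >= SIMPLE_TOOL_CALL_LIMIT:
            messages.append("estimated_tool_calls is outside the Simple range")

    # Advisory mode warns but proceeds; strict mode blocks on any message.
    proceed = not (strict and messages)
    return proceed, messages
```

The same 30-file estimate therefore proceeds with a warning by default and is blocked only when strict mode is requested.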
348
+ #### When to Add Sizing Metadata
349
+
350
+ **Always recommended for:**
351
+
352
+ - Feature WUs expected to exceed Simple thresholds
353
+ - WUs being delegated to agents via `wu:brief`
354
+ - Initiative WUs where scope discipline is critical
355
+
356
+ **Optional for:**
357
+
358
+ - Simple bug fixes under 10 files
359
+ - Documentation WUs under 20 files
360
+ - Any WU comfortably within Simple thresholds
361
+
362
+ ---
363
+
364
+ ## 2. Strategy Decision Tree
365
+
366
+ Use this logic to select your approach. If `git status` ever shows 20 or more modified files, STOP and re-evaluate cohesion and strategy. Do not auto-split on file count alone. First check the cohesion rule, the required ship/review/rollback question, and the documented exceptions.
367
+
368
+ ```
369
+ ┌─────────────────────────┐
370
+ │    Start WU Analysis    │
371
+ └────────────┬────────────┘
372
+              │
373
+              ▼
374
+       ┌──────────────┐
375
+       │ Est. Tool    │
376
+       │ Calls > 50?  │
377
+       └──┬────────┬──┘
378
+          │        │
379
+         No       Yes
380
+          │        │
381
+          ▼        ▼
382
+     ┌─────────┐ ┌────────────────┐
383
+     │Standard │ │ Complexity     │
384
+     │Strategy │ │ Type?          │
385
+     │(Tier 2) │ └──┬──────────┬──┘
386
+     └─────────┘    │          │
387
+                    │          │
388
+          Single Domain    Multi-Domain
389
+          Clear Phases     High Coordination
390
+                    │          │
391
+                    ▼          ▼
392
+          ┌──────────────┐  ┌────────────────┐
393
+          │Checkpoint-   │  │ Must Land      │
394
+          │Resume        │  │ Atomically?    │
395
+          │• Investigate │  └──┬──────────┬──┘
396
+          │• Implement   │     │          │
397
+          │• Mid-WU      │    Yes        No
398
+          │  Handoff     │     │          │
399
+          └──────────────┘     ▼          ▼
400
+                       ┌──────────────┐ ┌──────────────┐
401
+                       │Orchestrator- │ │Decomposition │
402
+                       │Worker        │ │• Split WUs   │
403
+                       │• Main Agent  │ │• Feature     │
404
+                       │  = Coord.    │ │  Flags       │
405
+                       │• Spawns:     │ │• Dependencies│
406
+                       │  Tester,     │ └──────────────┘
407
+                       │  Guardian,   │
408
+                       │  Coder       │
409
+                       └──────────────┘
410
+ ```
411
+
412
+ ---
413
+
414
+ ## 3. Splitting Patterns (Decomposition)
415
+
416
+ Only use these patterns after you have confirmed the work is no longer one coherent, independently reviewable outcome. Complex does not mean "automatically decompose."
417
+
418
+ ### Pattern A: The Tracer Bullet (Risk Reduction)
419
+
420
+ **Best for:** New integrations, unproven libraries.
421
+
422
+ **Strategy:**
423
+
424
+ - **WU-1:** Define Ports, implement a hardcoded/mock Adapter, write the E2E test. Prove the "walking skeleton" works.
425
+ - **WU-2:** Implement the real infrastructure Adapter and logic.
426
+
427
+ **Example:** Integrating a new LLM provider
428
+
429
+ - WU-A: Create port interface + mock adapter returning fixed responses + E2E test proving UI can display results
430
+ - WU-B: Implement real API adapter with error handling, rate limiting, etc.
431
+
432
+ **Why:** De-risks unknowns early; proves integration works before investing in full implementation.
433
+
434
+ ---
435
+
436
+ ### Pattern B: The Layer Split (Architectural)
437
+
438
+ **Best for:** Large backend features following hexagonal architecture.
439
+
440
+ **Strategy:**
441
+
442
+ - **WU-1:** Core Domain (Ports + Application). Pure logic, fast unit tests.
443
+ - **WU-2:** Infrastructure Adapters + Integration Tests.
444
+
445
+ **Example:** New data export feature
446
+
447
+ - WU-A: Port definitions + application use case + unit tests (no external dependencies)
448
+ - WU-B: File system adapter + S3 adapter + integration tests
449
+
450
+ **Why:** Application logic can be reviewed/tested independently; adapters can be implemented in parallel lanes.
451
+
452
+ ---
453
+
454
+ ### Pattern C: The UI/Logic Split (Lane Separation)
455
+
456
+ **Best for:** Full-stack features requiring heavy frontend work.
457
+
458
+ **Strategy:**
459
+
460
+ - **WU-1 (Core Systems):** API endpoints, database schema, backend logic.
461
+ - **WU-2 (Experience):** Frontend components, UI state, integration with API.
462
+
463
+ **Example:** New analytics dashboard widget
464
+
465
+ - WU-A (Core Systems lane): `/api/dashboard/summary` endpoint + DB queries + API tests
466
+ - WU-B (Experience lane): `DashboardSummaryCard` component + state management + E2E tests
467
+
468
+ **Why:** Enables parallel work across lanes; backend can be tested independently of UI.
469
+
470
+ ---
471
+
472
+ ### Pattern D: The Feature Flag (Phased Rollout)
473
+
474
+ **Best for:** High-risk refactoring (like WU-1215), breaking changes, gradual migrations.
475
+
476
+ **Strategy:**
477
+
478
+ - **WU-1:** Implement new logic behind a `ENABLE_NEW_FLOW=true` flag. Tests run against the flag.
479
+ - **WU-2:** Remove the flag and delete old code path.
480
+
481
+ **Example:** Refactoring a large function (WU-1215 case)
482
+
483
+ - WU-A: Extract new modular functions, call them behind `USE_NEW_WU_DONE=true`, preserve old `main()` as default
484
+ - WU-B: After validation, remove flag and old `main()` implementation
485
+
486
+ **Why:** Allows incremental delivery with rollback safety; can test new code in production without risk.
487
+
488
+ ---
489
+
490
+ ## 4. Context Safety Triggers
491
+
492
+ **Default Triggers** (deviations require written justification in WU notes)
493
+
494
+ If you hit ANY of these triggers during a session, you MUST perform a Standard Session Handoff (see [session-handoff.md](./agent/onboarding/session-handoff.md)):
495
+
496
+ - **Token Limit:** Context usage hits **50% (Warning)** or **80% (Critical)**.
497
+ - **Tool Volume:** **50+ tool calls** in current session.
498
+ - **File Volume:** **20+ files** modified in `git status`.
499
+ - **Session Staleness:** Repeated redundant queries or forgotten context (performance degradation).
500
+
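The three numeric triggers above can be checked with a few comparisons; session staleness is qualitative and is left out. This is a hedged sketch: `count_modified_files` and `fired_triggers` are hypothetical helpers, the trigger labels are invented for illustration, and the file count assumes the text comes from `git status --porcelain`, which prints one line per changed or untracked path.

```python
def count_modified_files(porcelain_output):
    """Count entries in the text printed by `git status --porcelain`."""
    return sum(1 for line in porcelain_output.splitlines() if line.strip())


def fired_triggers(context_pct, tool_calls, modified_files):
    """Return labels for every numeric trigger that has fired."""
    fired = []
    if context_pct >= 80:
        fired.append("token-limit-critical")
    elif context_pct >= 50:
        fired.append("token-limit-warning")
    if tool_calls >= 50:
        fired.append("tool-volume")
    if modified_files >= 20:
        fired.append("file-volume")
    return fired
```

Any non-empty result means a handoff is due, unless a documented exception applies.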
501
+ **Why these triggers matter:** Ignoring them led to the WU-1215 failure. An agent consumed 40% of context (80k tokens) on analysis alone, violated worktree discipline by using absolute paths, and failed to deliver any implementation. Preserve your reasoning capability by clearing context before you crash.
502
+
503
+ **Performance degradation symptoms:**
504
+
505
+ - Redundant tool calls (re-fetching already retrieved information)
506
+ - Lost worktree discipline (edits landing in main instead of worktree)
507
+ - Forgotten decisions or contradicting earlier conclusions
508
+ - Increased latency on similar operations
509
+
510
+ **When triggers fire:**
511
+
512
+ 1. Update WU YAML `notes` field with progress, decisions, and next steps
513
+ 2. Run `pnpm mem:checkpoint --wu WU-XXX`
514
+ 3. Commit work to the lane branch (in the worktree)
515
+ 4. Push the lane branch to origin
516
+ 5. Generate a fresh handoff with `pnpm wu:brief --id WU-XXX --client <client>`
517
+ 6. Resume fresh from the documented checkpoint
518
+
519
+ **Deviation protocol:** If a trigger fires but you believe an exception applies, check section 1.1 (Documentation-Only Exception) or section 1.2 (Shallow Multi-File Exception). If your WU qualifies:
520
+
521
+ 1. Document the justification in WU notes (required)
522
+ 2. Specify which exception applies and why
523
+ 3. Monitor for performance degradation symptoms listed above
524
+ 4. If symptoms appear, checkpoint and spawn fresh regardless of file count
525
+
526
+ ---
527
+
528
+ ## 5. Spawn Fresh, Don't Continue (Mandatory Policy)
529
+
530
+ **When approaching context limits, spawn a fresh agent instead of continuing after compaction.**
531
+
532
+ Context compaction (summarization) causes agents to lose critical rules. The recommended approach from [Anthropic's engineering guidance](https://www.anthropic.com/engineering/effective-harnesses-for-long-running-agents) is:
533
+
534
+ > "An initializer agent that sets up the environment, and a coding agent tasked with **making incremental progress in every session**, while leaving clear artifacts for the next session."
535
+
536
+ ### When to Spawn Fresh
537
+
538
+ Spawn a fresh agent when ANY of these apply:
539
+
540
+ - Context usage exceeds 80%
541
+ - Tool calls exceed 50 in current session
542
+ - You notice performance degradation (redundant queries, forgotten context)
543
+ - You're about to run `/compact` or `/clear`
544
+
545
+ ### Spawn Fresh Protocol
546
+
547
+ ```bash
548
+ # 1. Checkpoint your progress
549
+ pnpm mem:checkpoint "Progress: completed X, next: Y" --wu WU-XXX
550
+
551
+ # 2. Commit and push work
552
+ git add -A && git commit -m "checkpoint: progress on X"
553
+ git push origin lane/<lane>/wu-xxx
554
+
555
+ # 3. Generate fresh agent prompt
556
+ pnpm wu:brief --id WU-XXX --client <client>
557
+
558
+ # 4. EXIT current session (do NOT continue after compaction)
559
+
560
+ # 5. Start fresh agent with the generated prompt
561
+ ```
562
+
563
+ ### Why Not Continue After Compaction?
564
+
565
+ - Compaction summarizes conversation → rules get lost in summary
566
+ - Agent forgets worktree discipline, WU context, constraints
567
+ - Recovery mechanisms are complex and vendor-specific
568
+ - Prevention (fresh agent) is simpler and more reliable than recovery
569
+
570
+ **This is not failure—it's disciplined execution.** Each agent session makes bounded progress and leaves clear artifacts for the next session.
571
+
572
+ ---
573
+
574
+ ## 6. Quick Reference
575
+
576
+ | Scenario | Strategy | Action |
577
+ | :----------------------------------------------- | :------------------ | :------------------------------------------------------------ |
578
+ | Bug fix, single file, <20 tool calls | Simple | Claim, fix, commit, `wu:done` |
579
+ | Feature spanning 50-100 tool calls, clear phases | Checkpoint-Resume | Phase 1 → checkpoint → Phase 2 → checkpoint → done |
580
+ | Multi-domain feature, must land atomically | Orchestrator-Worker | Main agent coordinates, spawns test-engineer, safety-reviewer |
581
+ | Large refactor 100+ tool calls | Feature Flag Split | WU-A: New behind flag → WU-B: Remove flag + old code |
582
+ | New integration, uncertain complexity | Tracer Bullet | WU-A: Prove skeleton works → WU-B: Real implementation |
583
+ | Docs-only, 30 markdown files | Simple (exception) | Single session, document in notes, monitor for degradation |
584
+ | Rename import across 45 files (uniform) | Simple (override) | Document justification, proceed if all 4 criteria met |
585
+ | Full codebase code review, 200+ files | Orchestrator-Worker | 6-8 parallel sub-agents, each with distinct concern scope |
586
+ | Security audit of auth flows | Orchestrator-Worker | 3-4 agents: RLS, auth middleware, secrets, API exposure |
587
+
588
+ ---
589
+
590
+ ## 7. Case Study: WU-1215 (Learning from Failure)
591
+
592
+ **WU:** Refactor wu-done.mjs 768-line `main()` function
593
+
594
+ **What went wrong:**
595
+
596
+ 1. **Token exhaustion:** 80k/200k tokens (40%) consumed on analysis alone, zero implementation
597
+ 2. **Worktree discipline violation:** Absolute paths (`/home/...`) in tool calls bypassed isolation, edits landed in main checkout
598
+ 3. **Scope underestimation:** Spec said 708 LOC, actual was 768 LOC + 100 control flow statements (8% larger, significantly more complex)
599
+ 4. **Single-session attempt for multi-phase work:** Extract → test → integrate phases require separate sessions
600
+
601
+ **What the agent did right (healthy recovery):**
602
+
603
+ - Self-detected violation via `git status` in main
604
+ - Immediately blocked WU with clear reason
605
+ - Documented root cause and next steps for handover
606
+ - Did not attempt to "power through" context exhaustion
607
+
608
+ **Lesson:** When scope exceeds session capacity, STOP and checkpoint. Document progress, commit, `/clear`, resume fresh. This is not failure—it's disciplined execution.
609
+
610
+ **Recommended strategy for WU-1215:** Feature Flag Split (Pattern D) with 3 WUs:
611
+
612
+ - WU-1: Extract validation modules + tests (~40 tool calls)
613
+ - WU-2: Extract orchestration logic + tests (~40 tool calls)
614
+ - WU-3: Final integration + cleanup (~30 tool calls)
615
+
616
+ This is a case study in genuinely non-atomic work. Do not generalize it into "multi-step work should be split." Most endpoint + tests + docs changes remain one WU.
617
+
618
+ ---
619
+
620
+ ## 8. Related Documentation
621
+
622
+ - [session-handoff.md](./agent/onboarding/session-handoff.md) — Mid-WU checkpoint protocol
623
+ - [agent-safety-card.md](./agent/onboarding/agent-safety-card.md) — Quick reference safety thresholds
624
+ - [parallel-session-optimization.md](./agent/onboarding/parallel-session-optimization.md) — Running 4-6 WUs concurrently
625
+ - [agent-invocation-guide.md](./agent/onboarding/agent-invocation-guide.md) — Orchestrator-worker patterns
626
+ - [LumenFlow Agent Capsule](./lumenflow-agent-capsule.md) — Full LumenFlow framework
627
+ - [Canonical Lifecycle Map](./lumenflow-agent-capsule.md) — Command-mode matrix and handoff points
628
+ - [Failure-Mode Runbook](./lumenflow-agent-capsule.md) — Remediation for common operational failures
629
+
630
+ ---
631
+
632
+ **Version:** 1.6 ({{DATE}})
633
+ **Last Updated:** {{DATE}}
634
+ **Contributors:** Claude (research), Codex (pragmatic framing), Gemini (trigger enforcement)
635
+
636
+ **Changelog:**
637
+
638
+ - v1.6 ({{DATE}}): Added section 1.4 (Review, Audit, and Exploration WUs) addressing multi-agent parallelism for read-heavy WUs. Fixes the "multiple passes" problem where agents under-scope review WUs. Deleted stale CLI docs shadow copy missed by WU-2398. Fixed broken relative links. Renumbered sizing contract to section 1.5.
639
+ - v1.5 ({{DATE}}): Consolidated from two files into single canonical doc. Added consolidation checklist (1.0.1) to counter systematic agent over-splitting. Strengthened anti-patterns with additional examples.
640
+ - v1.4 ({{DATE}}): Tightened anti-fragmentation guidance. Complex and oversized heuristics now force a cohesion re-check before decomposition, added explicit anti-patterns for backend/tests/docs micro-splitting, and replaced `/clear`-centric recovery advice with checkpoint + `wu:brief` handoff guidance.
641
+ - v1.3 ({{DATE}}): Added sizing contract section (1.4) documenting tooling-backed enforcement via `sizing_estimate` metadata, advisory warnings in `wu:create`, and `--strict-sizing` mode in `wu:brief` (WU-2141, WU-2143).
642
+ - v1.2 ({{DATE}}): Added documentation-only exception (section 1.1), shallow multi-file exception with single-session override criteria (section 1.2), and examples summary table (section 1.3). Updated deviation protocol to reference exceptions.
643
+ - v1.1 ({{DATE}}): Removed time-based estimates (hours); replaced with tool-call and context-budget heuristics. Agents operate in context windows, not clock time.
644
+ - v1.0 ({{DATE}}): Initial version based on WU-1215 post-mortem.
@@ -38,7 +38,7 @@ Use Tier 1 after `/clear` to stay lean, then load more only if needed.
38
38
  ## 2) Session Management (Start Fresh)
39
39
 
40
40
  When approaching context limits, **start a fresh agent instead of compaction**.
41
- The handoff prompt is the bridge between sessions. `wu:brief` always outputs full WU context AND records evidence -- there is no separate evidence-only mode.
41
+ The handoff prompt is the bridge between sessions. `wu:brief` always generates the full WU context prompt and records evidence, whether you are delegating or self-implementing.
42
42
 
43
43
  **Mandatory triggers:**
44
44
 
@@ -119,17 +119,17 @@ It does **not** by itself prove pickup or execution. Pickup/execution confirmati
119
119
 
120
120
  ---
121
121
 
122
- ## 2c) Self-Implementation Flow
122
+ ## 2c) Self-Implementation Flow (WU-2222)
123
123
 
124
- When you are **not** delegating and will implement the WU in the current session, run `wu:brief` to record evidence and then continue working:
124
+ When you are **not** delegating and will implement the WU in the current session, run `wu:brief` normally:
125
125
 
126
126
  ```bash
127
127
  pnpm wu:brief --id WU-XXX --client <client>
128
- # wu:brief always outputs full WU context AND records evidence.
128
+ # This outputs full WU context AND records evidence.
129
129
  # Then continue implementation directly in this session (no Task spawn).
130
130
  ```
131
131
 
132
- Use `wu:delegate` instead of `wu:brief` only when you need explicit delegation lineage recording for audit trails.
132
+ `wu:brief` always provides full context and records evidence in a single step. Use delegation flow (`wu:delegate`) only when you need auditable lineage tracking for initiative work.
133
133
 
134
134
  ---
135
135
 
@@ -100,6 +100,7 @@ cat {{DOCS_TASKS_PATH}}/status.md
100
100
 
101
101
  # Claim a WU
102
102
  pnpm wu:claim --id WU-XXX --lane <Lane>
103
+ pnpm wu:brief --id WU-XXX --client <client>
103
104
 
104
105
  # Work in worktree
105
106
  cd worktrees/<lane>-wu-xxx