forge-orkes 0.13.0 → 0.16.0

package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "forge-orkes",
3
- "version": "0.13.0",
3
+ "version": "0.16.0",
4
4
  "description": "Set up the Forge meta-prompting framework for Claude Code in your project",
5
5
  "bin": {
6
6
  "create-forge": "./bin/create-forge.js"
@@ -51,6 +51,9 @@ requirements:
51
51
  Mark unknowns `[NEEDS CLARIFICATION]` — never guess.
52
52
 
53
53
  ### 5. Decompose Tasks
54
+
55
+ **Cross-layer first:** if this phase introduces/changes an interface one layer produces and another consumes (litmus: would an isolated agent have to guess the other's shape?), pin the delta in `contract.md` and either tag tasks `layer:` (Tier 1, one plan) or split into producer plan-NNa (pins contract) + consumer plan-NNb (`depends_on` the *frozen contract*, builds in parallel) -- Tier 2. See planning skill Step 6.1.
56
+
54
57
  ```xml
55
58
  <task type="auto|manual">
56
59
  <name>{Verb} {thing} {detail}</name>
@@ -109,3 +112,4 @@ must_haves:
109
112
  - **Horizontal slicing**: models->routes->UI (prefer vertical)
110
113
  - **Gold-plating**: Beyond requirements
111
114
  - **Guessing**: Fill unknowns instead of `[NEEDS CLARIFICATION]`
115
+ - **Guessing a cross-layer shape**: splitting layers without pinning `contract.md` first -- the consumer ends up guessing the producer's shape. Pin the contract, then split.
@@ -0,0 +1,76 @@
1
+ # Forge Hooks
2
+
3
+ ## `forge-claim-check.sh` — PreToolUse claim-check
4
+
5
+ Cross-session file-claim collision detector. Pairs with the Forge MCP
6
+ orchestrator (`.forge/.mcp-server/`) to prevent two concurrent Claude Code
7
+ sessions from clobbering each other's edits on the same file.
8
+
9
+ ### Behavior
10
+
11
+ Reads the Claude Code `PreToolUse` JSON payload on stdin. Extracts target
12
+ file path(s) from `tool_input.file_path`, `tool_input.notebook_path`,
13
+ `tool_input.path`, or `tool_input.edits[].file_path` (MultiEdit). For each
14
+ path, queries `.forge/.mcp-server/claims.db` for an active claim.
15
+
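The path extraction described above can be sketched in Python. This is a minimal illustration, not the hook itself (which is a bash script using `jq`): the function name `extract_paths` is hypothetical, and collecting every recognized field rather than only the first match is an assumption.

```python
from typing import Any


def extract_paths(payload: dict[str, Any]) -> list[str]:
    """Collect target paths from a PreToolUse payload (field names from the README)."""
    tool_input = payload.get("tool_input") or {}
    paths: list[str] = []
    # Single-path fields named in the README.
    for key in ("file_path", "notebook_path", "path"):
        value = tool_input.get(key)
        if isinstance(value, str) and value:
            paths.append(value)
    # MultiEdit-style payloads carry one file_path per edit.
    for edit in tool_input.get("edits") or []:
        if isinstance(edit, dict):
            value = edit.get("file_path")
            if isinstance(value, str) and value:
                paths.append(value)
    # De-duplicate while preserving first-seen order.
    return list(dict.fromkeys(paths))
```

An empty result corresponds to the "unknown payload schema" row of the table, which the hook allows.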
16
+ | Situation | Exit | Effect |
17
+ |---|---|---|
18
+ | No claim, or DB missing (fresh repo) | `0` | allow |
19
+ | Claim held by current `CLAUDE_SESSION_ID` | `0` | allow |
20
+ | `CLAUDE_SESSION_ID` unset (single-agent / non-Claude invocation) | `0` | allow + stderr warning |
21
+ | Unknown payload schema (no recognized path field) | `0` | allow |
22
+ | Claim held by another session | `2` | deny, stderr names owner + expiry |
23
+ | Any unexpected error (corrupt DB, jq failure, sqlite timeout, etc.) | `2` | fail-closed deny |
24
+
25
+ **Never exits 1.** Claude Code treats non-zero exit codes other than `2` as non-blocking errors; we
26
+ need a hard block on collision, so deny is always `exit 2`.
27
+
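The decision table above reduces to a small pure function. The sketch below mirrors the rows of the table under stated assumptions: `decide_exit` and its parameters are hypothetical names, and the order in which the real script evaluates the conditions is not known from this README.

```python
from typing import Optional

ALLOW, DENY = 0, 2  # the hook never exits 1


def decide_exit(claim_owner: Optional[str],
                session_id: Optional[str],
                error: bool = False) -> int:
    """Map one path's claim state to an exit code, following the table."""
    if error:
        return DENY   # fail-closed on any unexpected error
    if claim_owner is None:
        return ALLOW  # no claim, or DB missing on a fresh repo
    if not session_id:
        return ALLOW  # CLAUDE_SESSION_ID unset (the real hook also warns on stderr)
    if claim_owner == session_id:
        return ALLOW  # own claim passes through
    return DENY       # claim held by another session
```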
28
+ ### Prerequisites
29
+
30
+ - `bash` (≥ 4 recommended — relies on `set -u` array safety patterns)
31
+ - `jq`
32
+ - `sqlite3`
33
+ - `timeout` (GNU coreutils) **or** `gtimeout` (macOS, `brew install coreutils`) — optional but recommended; without it the SQLite query is unbounded (DB-level `busy_timeout` still applies)
34
+
35
+ Run `bash .claude/hooks/forge-claim-check-doctor.sh` to verify prerequisites.
36
+
37
+ ### Environment
38
+
39
+ | Var | Source | Purpose |
40
+ |---|---|---|
41
+ | `CLAUDE_PROJECT_DIR` | Claude Code | Project root, used to resolve relative paths and locate DB |
42
+ | `CLAUDE_SESSION_ID` | Claude Code | Current session identifier — own claims pass through |
43
+ | `FORGE_CLAIMS_DB` | optional override | Path to `claims.db` (defaults to `$CLAUDE_PROJECT_DIR/.forge/.mcp-server/claims.db`) |
44
+
45
+ ### Registration
46
+
47
+ Not registered automatically. The install procedure (plan-06) adds the
48
+ `PreToolUse` entry to `.claude/settings.json`:
49
+
50
+ ```json
51
+ {
52
+ "hooks": {
53
+ "PreToolUse": [
54
+ {
55
+ "matcher": "Edit|Write|MultiEdit|NotebookEdit",
56
+ "hooks": [
57
+ { "type": "command", "command": "$CLAUDE_PROJECT_DIR/.claude/hooks/forge-claim-check.sh" }
58
+ ]
59
+ }
60
+ ]
61
+ }
62
+ }
63
+ ```
64
+
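Since the entry is not registered automatically, an installer has to merge it into any existing settings. The following is a hedged sketch of an idempotent merge on the parsed JSON; `register_hook` is a hypothetical helper, not the plan-06 install procedure, and treating an identical `command` string as "already registered" is an assumption.

```python
from typing import Any

MATCHER = "Edit|Write|MultiEdit|NotebookEdit"
COMMAND = "$CLAUDE_PROJECT_DIR/.claude/hooks/forge-claim-check.sh"


def register_hook(settings: dict[str, Any]) -> dict[str, Any]:
    """Add the claim-check PreToolUse entry unless the command is already present."""
    entries = settings.setdefault("hooks", {}).setdefault("PreToolUse", [])
    for entry in entries:
        for hook in entry.get("hooks", []):
            if hook.get("command") == COMMAND:
                return settings  # already registered; leave settings untouched
    entries.append({
        "matcher": MATCHER,
        "hooks": [{"type": "command", "command": COMMAND}],
    })
    return settings
```

Running it twice on the same settings leaves a single entry, and unrelated `PreToolUse` entries are preserved.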
65
+ ### Disabling
66
+
67
+ Rename or remove the hook entry in `.claude/settings.json`, or set the file
68
+ non-executable: `chmod -x .claude/hooks/forge-claim-check.sh`. The hook is
69
+ defense-in-depth — the MCP server's `forge_claim_files` tool remains the
70
+ primary coordination point.
71
+
72
+ ### Troubleshooting
73
+
74
+ - "internal error at line N" on every edit → corrupt DB or missing tool. Run doctor. Common: `jq` not on PATH.
75
+ - No collisions detected → confirm `CLAUDE_SESSION_ID` is set and `claims.db` exists; otherwise the hook fails open.
76
+ - macOS `timeout: command not found` → `brew install coreutils` for `gtimeout`, or skip (DB busy_timeout still applies).
@@ -7,6 +7,17 @@ description: "Make architectural decisions: choose frameworks, design data model
7
7
 
8
8
  Make architectural decisions. Document rationale. Consider alternatives.
9
9
 
10
+ ## Vertical-Slice Bias
11
+
12
+ Architectural decisions should preserve the planning skill's slice-first decomposition. Favor designs that let the team ship thin end-to-end user journeys early:
13
+
14
+ - **Prefer feature-folder layouts** (`src/features/signup/{ui,api,data}.ts`) over strict layer-folder layouts (`src/models/`, `src/api/`, `src/components/`) when the project allows. Layer-folder layouts are not banned -- but they invite horizontal decomposition.
15
+ - **Avoid framework choices that force big-bang integration.** If picking framework A means UI cannot be wired until the whole data layer ships, that's a red flag -- document the trade-off in the ADR's Consequences section.
16
+ - **Contracts before completeness.** Define the minimal API contract a single slice needs. Resist designing the full API surface upfront -- successive slices extend it.
17
+ - **Data models grow per slice.** Start with the fields slice 1 needs. Add columns/entities as later slices require. Reject "design the whole schema first" unless a `slice_exception: data_migration` phase is planned.
18
+
19
+ When an architectural decision conflicts with vertical slicing (e.g., a framework that requires full backend before any UI is testable), surface the conflict explicitly in the ADR's **Trade-Offs** section.
20
+
10
21
  ## When to Use
11
22
 
12
23
  - Choosing a framework, library, or major dependency
@@ -7,6 +7,20 @@ description: "Systematic debugging when tests fail, features break, errors are c
7
7
 
8
8
  Every hypothesis tested, every dead end recorded.
9
9
 
10
+ ## Entry Path: Merge Conflict [Experimental — M10]
11
+
12
+ Invoked by `orchestrating` skill when `forge_queue_commit` returns `status: conflict`.
13
+
14
+ **Payload:** `{ conflicted_files: [...], base_sha, messages: [...], branch }`
15
+
16
+ **Workflow:**
17
+ 1. Inside agent's worktree: `git fetch origin main && git rebase ${base_sha}` (use payload ref).
18
+ 2. For each `conflicted_files[]` entry: inspect both sides via `git status` + `git diff`. Resolve per task context (or prompt user when intent unclear).
19
+ 3. After all resolved: `git add <files>` then `git rebase --continue`.
20
+ 4. Re-invoke `Skill(orchestrating)` with `action: retry-teardown` → orchestrating re-calls `forge_queue_commit`.
21
+
22
+ **Abort path:** if user aborts or resolution stalls, leave worktree in conflicted state and append `{ kind: "merge_conflict", branch, files }` to `lifecycle.blockers[]` in `.forge/state/milestone-{id}.yml`. Single-agent debugging entry paths below are unaffected.
23
+
10
24
  ## Scientific Method
11
25
 
12
26
  1. **Observe**: Exact behavior — error, repro steps, when it started
@@ -35,6 +35,16 @@ Execution-phase operational guidance below supplements the rules — it does not
35
35
  ### Scope Boundary
36
36
  Only fix issues DIRECTLY caused by the current task. Pre-existing warnings, tech debt, unrelated bugs → log to `.forge/deferred-issues.md`.
37
37
 
38
+ ### Slice Integrity (Execution-Side)
39
+
40
+ The planning skill enforces vertical slicing at plan-creation time. Executor responsibility: do not silently re-introduce horizontal decomposition.
41
+
42
+ - **Do not split a slice plan into "backend now, UI later"** under Rule 1/2/3. If the UI half of a slice is broken, fix it -- do not defer it. Deferring the UI breaks the slice's user-observable truth.
43
+ - **Do not collapse the slice into a stub** to pass verification. `key_links` must be real (component actually hits handler; handler actually persists). Stubbed links fail the verifying gate.
44
+ - If a slice genuinely cannot ship end-to-end (e.g., external API blocker), invoke **Rule 4 -- STOP, ask user.** Options: redefine the slice, declare a `slice_exception:` and continue, or defer the phase. Do not autonomously ship half a slice.
45
+
46
+ This rule does NOT override the 3-strike limit or scope boundary -- it sits alongside them.
47
+
38
48
  ## Native Task Tracking
39
49
 
40
50
  Use `TaskCreate`/`TaskUpdate`/`TaskList` for in-session visibility. `.forge/state/milestone-{id}.yml` remains the cross-session source of truth.
@@ -95,6 +105,30 @@ feat(auth-01): implement JWT-based login
95
105
  - Include integration test for login flow
96
106
  ```
97
107
 
108
+ ## Multi-Agent Claim Convention [Experimental — M10]
109
+
110
+ **Trigger:** active milestone state has `lifecycle.worktree_mode: active` (set by `orchestrating` skill). If absent or any other value → skip this section entirely; single-agent behavior unchanged.
111
+
112
+ **Pre-edit:** before the first `Edit` / `Write` / `MultiEdit` / `NotebookEdit` in a task, call MCP tool:
113
+
114
+ ```
115
+ forge_claim_files {
116
+ session_id: lifecycle.session_id,
117
+ files: [<absolute paths from task <files> manifest>],
118
+ ttl_seconds: 1800,
119
+ reason: "executing m{M}-{N} task <name>"
120
+ }
121
+ ```
122
+
123
+ **Branches:**
124
+ - **Full claim granted** → proceed with edits.
125
+ - **Partial rejection** → surface `rejected[].file`, `rejected[].owner_session`, `rejected[].expires_at` to user. Three options: **abort** task, **skip** rejected files (continue with granted subset), **wait** then retry after `expires_at`.
126
+ - **`DB_UNAVAILABLE`** → log warning, proceed. PreToolUse claim-check hook is defense-in-depth — coordination degrades to best-effort, isolation (worktree) still holds.
127
+
128
+ **End of task:** call `forge_release_claims { session_id, files: [...] }` after final commit. Plan-complete bulk release handled by `orchestrating` teardown.
129
+
130
+ See ADR-003 and `.claude/skills/orchestrating/SKILL.md`.
131
+
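The three branches above can be sketched as a dispatcher over the tool's response. The response shape is an assumption: only `rejected[].file`, `rejected[].owner_session`, `rejected[].expires_at`, and the `DB_UNAVAILABLE` code come from the text, while the `error` key and the function names are hypothetical.

```python
from typing import Any


def next_step(response: dict[str, Any]) -> str:
    """Pick the branch for a forge_claim_files response (assumed shape)."""
    if response.get("error") == "DB_UNAVAILABLE":
        return "proceed-with-warning"  # coordination degrades to best-effort
    if response.get("rejected"):
        return "ask-user"              # user chooses abort, skip, or wait
    return "proceed"                   # full claim granted


def describe_rejections(response: dict[str, Any]) -> list[str]:
    """Format the rejected entries that should be surfaced to the user."""
    return [
        f"{r['file']} held by {r['owner_session']} until {r['expires_at']}"
        for r in response.get("rejected") or []
    ]
```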
98
132
  ## Verification Gate
99
133
 
100
134
  After each task commit, run configured verification commands. Mechanical — not optional.
@@ -221,7 +255,21 @@ Log to `.forge/state/index.yml → desire_paths` (global, not per-milestone):
221
255
  - **User corrections**: Repeated correction matching a prior one → `user_correction`, increment count
222
256
  - **Agent struggles**: Multiple attempts or user guidance needed → `agent_struggle`
223
257
 
258
+ ## Cross-Layer Seam Check
259
+
260
+ **Trigger:** the phase was split by planning Step 6.1 into a **Tier-2** producer plan-NNa + consumer plan-NNb (both carry a `contract:` frontmatter path pointing at the same `contract.md`). Single-plan / Tier-1 (`layer:` tag, no split, contract honored inline) → skip; no seam check needed.
261
+
262
+ After **both** layer plans are committed, the executing flow owns one final **seam-check task** — there is no standing agent for this:
263
+
264
+ 1. **Read** the phase `contract.md` — `delta`, `producer_layer`, `consumer_layer`, `seam_check`, `status` (should already be `ratified` from the planning Tier-2 gate).
265
+ 2. **Merge** the layer branches/worktrees. If `lifecycle.worktree_mode: active`, the `orchestrating` teardown merges them; otherwise merge the sibling plan branches into the phase working tree.
266
+ 3. **Verify the seam** — run the assertion named in `contract.md` `seam_check` (a test, a type-check, or a structural grep) to prove the shape the producer emits matches what the consumer built against per `delta`.
267
+ 4. **Match** → commit the merge: `feat({phase}): seam check {integration_point}`. Contract stays `status: ratified`.
268
+ **Mismatch** → the consumer guessed wrong against a frozen contract → **Rule 1** fix on the consumer side. If the *contract itself* is wrong (producer can't emit the agreed shape) → **Rule 4** STOP, re-ratify with the user before proceeding.
269
+ 5. Leave the contract at `status: ratified` — the `reviewing` skill folds `delta` into the governing ADR (`status: absorbed`) at milestone landing. Do **not** absorb here.
270
+
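Step 3 leaves the form of the assertion open (a test, a type-check, or a structural grep). As one possible structural check, the sketch below compares a field-to-type mapping from the contract `delta` against what the producer emits. Representing `delta` as a flat mapping is purely an assumption for illustration, as is the name `seam_mismatches`.

```python
def seam_mismatches(delta: dict[str, str], emitted: dict[str, str]) -> list[str]:
    """Return human-readable mismatches between the pinned delta and the emitted shape."""
    problems: list[str] = []
    for field, expected in delta.items():
        actual = emitted.get(field)
        if actual is None:
            problems.append(f"missing field: {field}")
        elif actual != expected:
            problems.append(f"{field}: expected {expected}, got {actual}")
    return problems
```

An empty list corresponds to the Match branch. A non-empty list still needs the judgment described in step 4 to decide whether the consumer or the contract is at fault.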
224
271
  ## Phase Handoff
225
272
  1. Confirm persistence — summary documented, commits made, state updated, desire paths logged
226
- 2. Set `current.status` to `verifying`
227
- 3. Recommend: *"Tasks committed, state updated. `/clear` then `/forge` to continue with verifying."*
273
+ 2. **Run the Cross-Layer Seam Check** (above) if this phase was a Tier-2 contract split
274
+ 3. Set `current.status` to `verifying`
275
+ 4. Recommend: *"Tasks committed, state updated. `/clear` then `/forge` to continue with verifying."*
@@ -188,6 +188,8 @@ Tier + state → invoke via `Skill` tool. All phases use `Skill()`.
188
188
 
189
189
  **CRITICAL: NEVER `EnterPlanMode`.** "Planning" = `Skill(planning)`. Native plan mode writes wrong format, bypasses gates + state.
190
190
 
191
+ **Experimental:** if user invokes `orchestrating` skill (M10) and repo has it installed (`.claude/skills/orchestrating/` present + MCP server + claim-check hook), route through it **before** `executing` to bootstrap multi-agent worktree. Skill is opt-in per ADR-001; absent install → fall through to standard routing.
192
+
191
193
  ### Auto-Routing (Always Deterministic)
192
194
 
193
195
  **No menus.** Applies on first run and resume. Deterministic. Brief → route. Choices only at `complete` or corrupted.
@@ -223,7 +225,7 @@ Where `{source}` = `skills.{name}` | `models.default` | `parent session`. Suppre
223
225
  | reviewing | sonnet | Audit judgment |
224
226
  | quick-tasking | haiku | Speed |
225
227
  | discussing | sonnet | Conversation |
226
- | testing | sonnet | Code gen (author) + audit judgment (analyst) — matches executing/reviewing |
228
+ | testing | sonnet | Code gen (author) + audit judgment (analyst) — matches executing/reviewing. M9: author-mode refuses e2e without `e2e:true` + `validated:true`. |
227
229
  | deferred | haiku | Read + format only |
228
230
 
229
231
  | `current.status` | Route To |
@@ -234,8 +236,8 @@ Where `{source}` = `skills.{name}` | `models.default` | `parent session`. Suppre
234
236
  | `architecting` | `Skill(architecting)` → planning |
235
237
  | `planning` | `Skill(planning)` → executing |
236
238
  | `executing` | `Skill(executing)` → verifying |
237
- | `verifying` | `Skill(verifying)` → reviewing |
238
- | `reviewing` | `Skill(reviewing)` → complete |
239
+ | `verifying` | `Skill(verifying)` → reviewing — runs M9 e2e validation gate when `e2e:true` stories present |
240
+ | `reviewing` | `Skill(reviewing)` → complete — adds M9 e2e suite audit (soft-cap, orphans, flake-rate) |
239
241
  | `complete` | Done. Ask what's next. |
240
242
  | `deferred` | Milestone frozen. *"Resume milestone {id}" to reactivate.* |
241
243
  | `quick-tasking` | `Skill(quick-tasking)` |
@@ -225,6 +225,19 @@ Glob: src/**/index.{ts,tsx,js} # barrel exports
225
225
  Grep: src/ for "import.*from.*@/" # path aliases
226
226
  ```
227
227
 
228
+ ### Step 3.5: Architectural Layers
229
+
230
+ Detect distinct layers that hand a typed interface across a boundary — feeds the cross-layer contract detection in planning Step 6.1. A layer = a directory whose code is *produced for* or *consumed by* another (engine↔ui, core↔plugins, api↔web, native↔bindings).
231
+
232
+ ```bash
233
+ Bash: ls -d */ src/*/ 2>/dev/null # top-level + src subdirs as layer candidates
234
+ Grep: cross-boundary imports (e.g. ui importing engine types, generated bindings, ABI/descriptor/schema files)
235
+ ```
236
+
237
+ A single cohesive codebase with no internal producer→consumer boundary → **not** layered; leave `layers: []` (Step 6.1 no-ops). Only flag layers when one directory's output is another's typed input.
238
+
239
+ Present detected layers for confirmation: *"Detected layers: [{name → path}]. These hand interfaces across a boundary — confirm or correct."* Confirmed 2+ → written to `project.yml` `layers:` and seeded into `.forge/contracts/index.yml` at Finalize.
240
+
228
241
  ### Step 4: Present
229
242
 
230
243
  *"Project: {name} — {description}
@@ -269,6 +282,12 @@ User describes project → `.forge/project.yml`: name, goal, stack, constraints,
269
282
 
270
283
  Validate each term against `.forge/templates/interface-detection.md` type vocabulary. On unrecognized term, prompt: *"Did you mean [closest match]? Valid: browser | cli | api | desktop | native-apple | none."* Write validated answer as `interface: [...]` in project.yml.
271
284
 
285
+ ### Step 1.6: Architectural Layers
286
+
287
+ *"Will this project have distinct layers that hand a typed interface across a boundary (e.g. engine ↔ ui, core ↔ plugins, api ↔ web)? List them as name → path, or 'no' for a single-layer project."*
288
+
289
+ 2+ layers → write `layers:` to project.yml + seed `.forge/contracts/index.yml` at Finalize. Otherwise `layers: []` (planning Step 6.1 no-ops).
290
+
272
291
  ### Step 2: Design System
273
292
 
274
293
  *"UI library?"*
@@ -314,10 +333,11 @@ User selects per stack.
314
333
 
315
334
  ## Finalize
316
335
 
317
- 1. Write `.forge/project.yml` (all info + `verification`)
336
+ 1. Write `.forge/project.yml` (all info + `verification` + `layers`)
318
337
  2. Write `.forge/constitution.md`
319
338
  3. Write `.forge/design-system.md` (if configured)
320
- 4. Init state:
339
+ 4. Write `.forge/contracts/index.yml` (only if `layers:` has 2+ entries) — copy `.forge/templates/contracts-index.yml`, fill `layers:` from the confirmed list, leave `integration_points:` empty (first cross-layer phase populates them via planning Step 6.1)
340
+ 5. Init state:
321
341
  - `.forge/state/index.yml`:
322
342
  ```yaml
323
343
  milestones:
@@ -339,7 +359,7 @@ User selects per stack.
339
359
  task: null
340
360
  status: not_started
341
361
  ```
342
- 5. Templates as needed
362
+ 6. Templates as needed
343
363
 
344
364
  *"Initialized. Ready?"*
345
365
 
@@ -7,6 +7,26 @@ description: "Break work into executable tasks with verification gates. Enforces
7
7
 
8
8
  > **Do NOT use `EnterPlanMode`.** Output -> `.forge/phases/`.
9
9
 
10
+ ## Core Principle: Vertical Slicing
11
+
12
+ **Every phase and every plan MUST deliver a thin vertical slice -- a user-observable behavior reachable end-to-end (UI -> API -> data, or CLI -> core -> output).** Never decompose by horizontal layer (all models, then all APIs, then all UI). Horizontal slicing defers user-testable behavior until the last phase and amplifies integration risk.
13
+
14
+ Why:
15
+ - Each slice is testable, demoable, shippable on its own
16
+ - Bugs surface at the seam (where layers meet) on day one, not week three
17
+ - User can change direction after slice 1 instead of after the whole stack lands
18
+
19
+ Apply at three levels:
20
+ - **Roadmap (Step 5)**: phases are slices, not layers
21
+ - **Decompose (Step 6)**: plans are slices, not layers
22
+ - **Verify (Step 8)**: Slice Integrity gate -- hard fail on layer-only plans
23
+
24
+ Exceptions (must be explicitly justified in plan frontmatter `slice_exception:`):
25
+ - Foundational infra phase that no slice can reach yet (build setup, framework bootstrap)
26
+ - Shared library / cross-cutting refactor with no user-facing surface
27
+
28
+ If you find yourself writing a plan that only touches `src/models/`, `src/db/`, `src/schemas/`, `src/api/` (without UI/CLI counterpart), or `src/components/` (without data path) -- STOP. Merge with the slice that reaches the user, or claim an exception.
29
+
10
30
  ## Step 1: Resolution Gate
11
31
 
12
32
  Read `.forge/context.md` **Needs Resolution**. If unchecked `- [ ]` items:
@@ -53,6 +73,17 @@ If missing, create from `.forge/templates/requirements.yml`:
53
73
  5. P1 (must) / P2 (should) / P3 (nice)
54
74
  6. Deferred: DEF-001... (also globally unique)
55
75
 
76
+ **E2E gate (M9):** For each functional requirement being added or refined:
77
+ 1. Decide `e2e: true|false` -- does this story need a post-validation e2e test?
78
+ - true = high-value user journey worth a real-browser walk + automated guard
79
+ - false = covered by integration/unit, or low-value to e2e
80
+ - Default to false. Only flag true for spine flows (auth, checkout-class flows, primary user task).
81
+ 2. When `e2e:true`, capture `observable_outcome:` -- one sentence describing what the user observes when the flow succeeds. Block planning until provided. No silent default.
82
+ 3. Re-planning: read existing `e2e` / `observable_outcome` decisions from `requirements/m{N}.yml`. Preserve them. Only prompt for new or unflagged FRs.
83
+ 4. Write `e2e`, `observable_outcome`, `validated: false`, `observable_outcome_hash: ""` to each FR. The hash + `validated` flip later in verifying.
84
+
85
+ Contract: locked decision in `.forge/context.md` (M9 section, "Approach D"). Do NOT enforce the e2e soft cap here -- that's reviewing's job.
86
+
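The four E2E-gate steps above amount to a small rule per functional requirement. A minimal sketch, assuming each FR is a plain mapping; `apply_e2e_decision` is a hypothetical helper, and raising an exception stands in for "block planning until provided".

```python
def apply_e2e_decision(fr: dict, e2e: bool = False, observable_outcome: str = "") -> dict:
    """Record the e2e decision on one FR, preserving decisions from earlier planning runs."""
    if "e2e" in fr:
        return fr  # re-planning: existing decision is preserved, no prompt
    if e2e and not observable_outcome.strip():
        # No silent default: the outcome sentence is mandatory when e2e is true.
        raise ValueError("observable_outcome is required when e2e is true")
    fr.update({
        "e2e": e2e,  # defaults to false
        "observable_outcome": observable_outcome,
        "validated": False,             # flipped later by verifying
        "observable_outcome_hash": "",  # filled later by verifying
    })
    return fr
```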
56
87
  **Blocks until all P1 `[NEEDS CLARIFICATION]` resolved.**
57
88
 
58
89
  Never write to top-level `.forge/requirements.yml` -- that path is deprecated.
@@ -66,11 +97,14 @@ Never write to top-level `.forge/requirements.yml` -- that path is deprecated.
66
97
  ### Case A: `roadmap.yml` missing (Full only)
67
98
 
68
99
  Create from `.forge/templates/roadmap.yml`:
69
- 1. Group by delivery boundaries
70
- 2. Inter-group dependencies
71
- 3. Phases (coherent, verifiable)
100
+ 1. **Group by vertical slice, NOT by layer.** Each phase = a thin end-to-end user journey. Wrong: `m1-models`, `m2-apis`, `m3-ui`. Right: `m1-user-can-sign-up`, `m2-user-can-post`, `m3-user-can-comment`.
101
+ 2. Inter-slice dependencies (slice B builds on artifact from slice A)
102
+ 3. Each phase has a one-sentence `goal:` written as user-observable outcome ("User can X"), never as "Build Y"
72
103
  4. Every FR -> one phase, no orphans
73
- 5. Waves: independent=1, dependent=2+
104
+ 5. Waves: independent slices=1, dependent slices=2+
105
+ 6. **Phase 1 must be demoable.** If phase 1 has no user-observable output, the roadmap is layered -- redesign.
106
+
107
+ Anti-pattern detection: scan proposed phase names. Reject if any phase name contains layer-only terms without a user verb: `models`, `schema`, `database`, `api-only`, `backend-only`, `ui-only`, `frontend-only`, `infrastructure` (unless tagged as exception phase).
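The phase-name scan can be sketched as follows. The layer terms are the ones listed above; the set of user verbs is an assumption (the text does not enumerate them for this check), and `is_layered_name` is a hypothetical name.

```python
LAYER_TERMS = ("models", "schema", "database", "api-only", "backend-only",
               "ui-only", "frontend-only", "infrastructure")
USER_VERBS = {"can", "sees", "receives", "submits"}  # assumed, not from the source


def is_layered_name(name: str, exception: bool = False) -> bool:
    """True if a phase name contains a layer-only term and no user verb."""
    if exception:
        return False  # tagged as an exception phase
    lowered = name.lower()
    has_layer_term = any(term in lowered for term in LAYER_TERMS)
    has_user_verb = bool(set(lowered.split("-")) & USER_VERBS)
    return has_layer_term and not has_user_verb
```

Note that a literal match on the listed terms would not flag the `m2-apis` or `m3-ui` examples from step 1, so an actual detector presumably needs a broader rule than this sketch.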
74
108
 
75
109
  ### Case B: `roadmap.yml` exists, current milestone already in it
76
110
 
@@ -94,16 +128,62 @@ If a sibling milestone (e.g. m50) has state + requirements but is missing from `
94
128
 
95
129
  ## Step 6: Decompose Tasks
96
130
 
97
- Per phase (or feature, Standard tier):
131
+ ### Step 6.1: Cross-Layer Contract Detection
132
+
133
+ Before decomposing, classify whether this phase crosses a layer boundary with a *new or changing* contract. This decides single-plan vs a contract-pinned layer split. Read the project's layers from `project.yml` `layers:` (or `.forge/contracts/index.yml`; fallback: top-level source dirs).
134
+
135
+ **Trigger -- all three hold:**
136
+ 1. Work touches >= 2 declared layers (e.g. engine / blocks / ui).
137
+ 2. A struct / signature / ABI field / descriptor is *produced* by one layer and *consumed* by another.
138
+ 3. That interface is *new or changing* in this phase.
139
+
140
+ **Litmus (decisive):** "Would an agent building one layer in isolation have to GUESS the shape the other layer owns?" No (already specified in a durable contract) -> not cross-layer here.
141
+
142
+ **Two contract tiers:**
143
+ - **Durable** = the standing layer API. Lives in ADRs (`.forge/decisions/`) + constitution, indexed in `.forge/contracts/index.yml` (integration-point -> governing ADR). Stable + unchanged -> agents read the ADR; no per-phase artifact.
144
+ - **Per-phase delta** = the specific new/changed shape THIS phase introduces. Pinned in `.forge/phases/m{M}-{N}-{name}/contract.md` (from `.forge/templates/contract.md`); references its governing ADR; folded back into that ADR on landing.
145
+
146
+ **Classify:**
147
+
148
+ | Tier | Condition | Response |
149
+ |------|-----------|----------|
150
+ | 0 | Trigger fails | Normal decomposition (6.2). Nothing added. |
151
+ | 1 | Cross-layer delta, small / tightly sequential | Write `contract.md`; tag tasks `layer:`; ONE plan. No interruption. |
152
+ | 2 | Cross-layer delta, cleanly separable, worth parallel sessions | Pin `contract.md`; split into plan-NNa (producer layer, pins contract) + plan-NNb (consumer layer, `depends_on` the contract). Ratify gate. |
153
+
154
+ **Liberal detect, conservative interrupt:** classify every phase. Unsure between 1 and 2 -> default **Tier 1** (write the doc, no interruption). Escalate to 2 only when confident the parallel split pays off.
155
+
156
+ **Tier-2 ratify gate** (the ONLY interruption; frame as contract-correctness, not "parallelize y/n"):
157
+ > *"This phase changes the {integration point} contract ({governing ADR}). Delta: [summary]. plan-NNa ({producer}) pins it; plan-NNb ({consumer}) builds against it in parallel. Is this contract shape correct?"*
158
+
159
+ Block the split until confirmed. Override ("keep it one plan") -> log to `state/index.yml` `desire_paths` (recurring overrides tune the threshold), fall back to Tier 1.
160
+
161
+ **Integration (Tier 2):** layer plans build isolated (per-layer worktrees). The phase's final task is a **seam check** owned by the executing flow (NOT a standing agent): merge the layer branches, verify the shape the producer emits matches what the consumer built against, per `contract.md`.
162
+
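The classification in Step 6.1 can be summarized as a function of the three trigger conditions plus the two Tier-2 judgments. This is a sketch only: the parameter names are hypothetical, and in practice the inputs are judgment calls rather than booleans read from a file.

```python
def classify_tier(layers_touched: int,
                  produced_and_consumed: bool,
                  new_or_changing: bool,
                  cleanly_separable: bool = False,
                  confident_split_pays_off: bool = False) -> int:
    """Return 0, 1, or 2 following the Classify table."""
    trigger = layers_touched >= 2 and produced_and_consumed and new_or_changing
    if not trigger:
        return 0  # normal decomposition, nothing added
    if cleanly_separable and confident_split_pays_off:
        return 2  # contract-pinned split, subject to the ratify gate
    return 1      # default when unsure: one plan, contract.md, layer: tags
```

A user override at the ratify gate would turn a 2 back into a 1, which this function does not model.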
163
+ ### Step 6.2: Task Decomposition
164
+
165
+ Per phase (or feature, Standard tier). **Each plan = one vertical slice** -- except a sanctioned Tier-2 contract split (6.1), which divides one slice across producer/consumer layer plans reconciled at the seam check.
166
+
167
+ #### Slice-First Decomposition
168
+
169
+ Before writing any plan:
170
+ 1. List the user-observable behaviors this phase must deliver (from `requirements/m{N}.yml`)
171
+ 2. For each behavior, identify the full path: UI/CLI surface -> handler -> business logic -> persistence (only the parts that behavior needs)
172
+ 3. **One plan = one behavior end-to-end.** Plan touches every layer that behavior needs, not all of one layer.
173
+ 4. If a plan can only ship part of the path (e.g., UI without backend wired), it is NOT a slice -- restructure.
174
+
175
+ Plan naming reflects the slice: `plan-01-user-signs-up.md`, not `plan-01-models.md`.
176
+
177
+ #### File Layout
98
178
 
99
179
  1. `.forge/templates/plan.md` -> `.forge/phases/m{M}-{N}-{name}/plan-{NN}.md`
100
180
  - `{M}`=milestone, `{N}`=phase#, `{name}`=kebab, `{NN}`=seq
101
181
  - Ex: `.forge/phases/m3-2-providers/plan-01.md`
102
- 2. Frontmatter: phase, plan#, wave, deps
182
+ 2. Frontmatter: phase, plan#, wave, deps, `slice_exception:` (optional, see Core Principle)
103
183
  3. must_haves:
104
- - **Truths:** User-observable outcomes (3-7)
105
- - **Artifacts:** Must exist, substantive not stubs
106
- - **Key Links:** Connections between artifacts
184
+ - **Truths:** User-observable outcomes (3-7). MUST be phrased as something the user can see, click, or receive -- not "model X exists" or "table Y created".
185
+ - **Artifacts:** Must exist, substantive not stubs. Slice plans typically span 2-4 layers (e.g., component + handler + repo).
186
+ - **Key Links:** Connections between artifacts -- these prove the slice is wired, not stubbed.
107
187
  4. XML tasks (2-3/plan, 15-60 min):
108
188
 
109
189
  ```xml
@@ -133,20 +213,45 @@ Per phase (or feature, Standard tier):
133
213
  | `checkpoint:decision` | Pause for user choice between options |
134
214
  | `checkpoint:human-action` | Pause for manual action (email verification, 2FA) |
135
215
 
136
- ### Vertical Slices (Preferred)
216
+ ### Vertical Slices (Required)
217
+
137
218
  ```
138
- Plan 01: User feature (model + API + UI) → Wave 1
139
- Plan 02: Product feature (model + API + UI) → Wave 1
219
+ Plan 01: User can sign up (UI form + /api/signup + users table write) → Wave 1
220
+ Plan 02: User can log in (UI form + /api/login + session issue) → Wave 2 (uses table from 01)
221
+ Plan 03: User can post note (UI editor + /api/notes + notes table) → Wave 2 (uses auth from 02)
140
222
  ```
141
- Independent plans run parallel.
142
223
 
143
- ### Avoid Horizontal Layers
224
+ Each plan is independently demoable. Bugs at layer seams surface in plan 01.
225
+
226
+ ### Horizontal Layers (Anti-Pattern -- BLOCKED)
227
+
144
228
  ```
145
- Plan 01: All models → Wave 1
146
- Plan 02: All APIs → Wave 2 (depends on 01)
147
- Plan 03: All UI → Wave 3 (depends on 02)
229
+ Plan 01: All models → Wave 1
230
+ Plan 02: All APIs → Wave 2 (depends on 01)
231
+ Plan 03: All UI → Wave 3 (depends on 02)
148
232
  ```
149
- Sequential. Only when architecturally required.
233
+
234
+ This decomposition is **rejected by default** at Step 8 (Slice Integrity gate). To proceed, declare `slice_exception:` in plan frontmatter with one of:
235
+ - `infra_bootstrap` -- foundational setup with no user-reachable surface
236
+ - `shared_library` -- cross-cutting utility used by future slices
237
+ - `data_migration` -- one-shot schema/data change with no behavior added
238
+
239
+ Anything else: restructure into slices.
240
+
241
+ ### Anti-Pattern Auto-Detector
242
+
243
+ A plan fails Slice Integrity if ALL of the following hold:
244
+ - `must_haves.truths` contain only artifacts/internals (e.g., "Schema migration applied", "Model class created") with no user-visible verb (see, click, receive, fail-with-error)
245
+ - `must_haves.artifacts` paths all live under a single layer prefix (only `src/models/`, only `src/api/`, only `src/components/`)
246
+ - No `slice_exception:` declared
247
+
248
+ Detection runs in Step 8.
249
+
250
+ ### Contract-Driven Layer Split (Tier 2 exception)
251
+
252
+ When Step 6.1 flags a **Tier-2** cross-layer contract, splitting by layer is correct -- NOT the horizontal anti-pattern above. The difference:
253
+ - *Horizontal anti-pattern:* split by layer with no contract; each layer waits on the previous. Serializes.
254
+ - *Contract-driven split:* plan-NNb `depends_on` the **frozen contract** (pinned by NNa up front), not NNa's implementation -> both layers build in parallel (separate sessions/worktrees), reconciled at the seam check.
150
255
 
151
256
  ## Step 7: Test Specs (Optional)
152
257
 
@@ -238,7 +343,7 @@ Decision captured once, pre-code. Does not block planning.
238
343
 
239
344
  ## Step 8: Verify Plans
240
345
 
241
- 8 dimensions:
346
+ 9 dimensions:
242
347
  1. **Requirement Coverage** -- every req has task(s)
243
348
  2. **Task Completeness** -- files + action + verify + done
244
349
  3. **Deps** -- valid DAG, no cycles
@@ -247,6 +352,20 @@ Decision captured once, pre-code. Does not block planning.
247
352
  6. **Verification** -- must_haves trace to goal
248
353
  7. **Context** -- honors locked, excludes deferred
249
354
  8. **Spec Validity** -- valid syntax, correct paths
355
+ 9. **Slice Integrity (HARD GATE)** -- every plan delivers a vertical slice OR declares `slice_exception:`
356
+
357
+ ### Slice Integrity Check
358
+
359
+ For each plan, FAIL if all of these hold and no `slice_exception:` is declared:
360
+ - `must_haves.truths` lack a user-observable verb (`see`, `click`, `submit`, `receive`, `view`, `download`, `error`, `redirected`, `login`, `signup`, etc.) -- internal-only truths like "Schema applied", "Model registered", "Index built" do not satisfy
361
+ - `must_haves.artifacts` paths cluster in a single layer (all under `models/`, all under `api/`, all under `components/`, etc.)
362
+ - No file path crosses a layer boundary (e.g., a component plus its handler, a CLI plus its core)
363
+
364
+ **Exempt:** a Tier-2 contract-split plan (Step 6.1) carries both `layer:` and `contract:` frontmatter. It is a sanctioned single-layer plan reconciled at the seam check -- treat as auto-`slice_exception` (the *phase*, not the plan, owns the vertical slice). It passes without declaring `slice_exception:`.
365
+
366
+ Roadmap-level check: FAIL if phase 1 has no user-observable goal.
367
+
368
+ On fail: restructure plan(s) into slices, or declare `slice_exception:` with one of `infra_bootstrap | shared_library | data_migration` and a one-line rationale. Re-verify.
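
The check above can be sketched in code. This is a minimal illustration under stated assumptions, not Forge's actual detector: the function names, the `layer_dirs` default, and the suffix-tolerant verb matcher are hypothetical, and the plan is assumed to be already parsed from frontmatter into a dict.

```python
import re
from typing import Optional

# Verbs copied from the check above. The "etc." there suggests a real
# detector would need a longer list; this tuple is only illustrative.
USER_VERBS = ("see", "click", "submit", "receive", "view", "download",
              "error", "redirected", "login", "signup")
# Assumption: tolerate simple suffixes ("submits", "clicked") on whole words.
VERB_RE = re.compile(
    r"\b(?:" + "|".join(USER_VERBS) + r")(?:s|es|ed)?\b", re.IGNORECASE
)


def layer_of(path: str, layer_dirs: tuple) -> Optional[str]:
    """Return the first known layer directory that appears in the path."""
    parts = path.split("/")
    for layer in layer_dirs:
        if layer in parts:
            return layer
    return None  # unknown paths form their own bucket


def fails_slice_integrity(plan: dict,
                          layer_dirs=("models", "api", "components")) -> bool:
    """True when the plan should FAIL the Slice Integrity gate."""
    if plan.get("slice_exception"):
        return False  # declared exception
    if plan.get("layer") and plan.get("contract"):
        return False  # Tier-2 contract-split plan: auto-exempt
    must = plan.get("must_haves", {})
    no_user_verb = not any(VERB_RE.search(t) for t in must.get("truths", []))
    layers = {layer_of(a["path"], layer_dirs) for a in must.get("artifacts", [])}
    single_layer = len(layers) <= 1
    # FAIL only when every condition holds, mirroring "all of these hold".
    return no_user_verb and single_layer
```

Keyword matching like this will produce false positives and negatives (for example, "submitted" is not caught by the suffix rule), so a reviewer would still need to read the truths.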
250
369
 
251
370
  Issues -> fix, re-verify. Max 3 cycles.
252
371
 
@@ -150,6 +150,52 @@ refactoring_scan:
150
150
  suggested_approach: "Extract shared validateEmail() helper to src/utils/validation.ts"
151
151
  ```
152
152
 
153
+ ### Part 4: E2E Suite Audit (M9)
154
+
155
+ Three sub-checks. All advisory. None block milestone close.
156
+
157
+ **1. Soft-cap warning**
158
+
159
+ - Read `verification.e2e_soft_cap` from `.forge/project.yml`. Default 10 if absent.
160
+ - Count `e2e: true` stories in the active milestone's `.forge/requirements/m{N}.yml`.
161
+ - If count > cap → warn: `"E2E soft cap exceeded: {count}/{cap} stories flagged. Trim e2e:true stories or raise verification.e2e_soft_cap in project.yml. Soft cap — does not block."`
162
+ - **Skip-clean:** zero `e2e:true` stories → sub-check omitted from report.
163
+
164
+ **2. Orphan-test detection**
165
+
166
+ - Glob for e2e test files. Stack-detect from `project.yml` `interface_tools` (fallback: Playwright `tests/e2e/**/*.spec.ts` + `e2e/**/*.spec.ts`; pytest `tests/e2e/test_*.py`; go `e2e/*_test.go`).
167
+ - For each file, grep for `story: FR-` (either in comment or test/function name).
168
+ - If no match → flag: `"Orphan e2e test: {path} — no FR-XXX reference found. Either tag the story or delete the test."`
169
+ - List orphans in a dedicated subsection.
170
+ - **Skip-clean:** zero e2e files discovered → sub-check omitted from report.
171
+
172
+ **3. Flake-rate signal**
173
+
174
+ - Best-effort. Attempt sources in order:
175
+ 1. `.forge/testing/suite-health.md` flake entries (tester analyst-mode output)
176
+ 2. GitHub Actions test summary artifacts (parse from `.github/workflows/` outputs if accessible)
177
+ 3. Local `playwright-report/` retry counts (if present)
178
+ - Aggregate per-test flake count. Surface top 5 flakiest with counts.
179
+ - If no source available AND e2e files exist → emit `"Flake-rate: no data (run testing skill analyst-mode for suite-health.md)"`.
180
+ - Never blocks.
181
+ - **Skip-clean:** zero e2e files discovered → sub-check omitted entirely (no "no data" line).
182
+
183
+ **Section-level skip-clean:** zero e2e test files AND zero `e2e:true` stories → omit the entire "E2E Suite Audit" section from the health report.
184
+
185
+ ```yaml
186
+ e2e_suite_audit:
187
+ soft_cap:
188
+ count: 3
189
+ cap: 10
190
+ status: ok # ok | exceeded
191
+ orphan_tests:
192
+ files_scanned: 4
193
+ orphans: [] # list of paths with no story: FR- reference
194
+ flake_rate:
195
+ source: "suite-health.md" # or "no data"
196
+ top_flaky: [] # [{path, count}, ...]
197
+ ```
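
The first two sub-checks could be sketched roughly as follows. This is an illustrative sketch, not the skill's implementation: `soft_cap_status` and `find_orphans` are hypothetical names, the stories are assumed to be already parsed from `m{N}.yml` into dicts, and the glob patterns are the Playwright fallback patterns listed above.

```python
import re
from pathlib import Path

# Literal marker the orphan check greps for, as described above.
STORY_TAG = re.compile(r"story: FR-")


def soft_cap_status(stories: list, cap: int = 10) -> dict:
    """Count e2e:true stories and compare against the advisory cap."""
    count = sum(1 for s in stories if s.get("e2e") is True)
    return {"count": count, "cap": cap,
            "status": "exceeded" if count > cap else "ok"}


def find_orphans(root: Path,
                 patterns=("tests/e2e/**/*.spec.ts", "e2e/**/*.spec.ts")) -> dict:
    """Scan e2e spec files and list those with no 'story: FR-' reference."""
    files = sorted({p for pat in patterns for p in root.glob(pat)})
    orphans = [
        p.relative_to(root).as_posix()
        for p in files
        if not STORY_TAG.search(p.read_text(encoding="utf-8"))
    ]
    return {"files_scanned": len(files), "orphans": orphans}
```

Both results are advisory data for the report; neither function raises or blocks, matching the "never blocks" rule above.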
198
+
153
199
  ## Step 4: Score
154
200
 
155
201
  **Per-category:**
@@ -325,9 +371,21 @@ If the milestone being completed has `milestone.origin: {R-id}` set (promoted fr
325
371
  3. Update item: `status: resolved`, set `completed: "<ISO 8601 date>"`. Keep `promoted_to: {milestone-id}` intact for audit trail.
326
372
  4. Log in summary: *"Backlog item {R-id} → resolved (promoted milestone {id} complete)."*
327
373
 
374
+ ## Contract Landing (cross-layer phases)
375
+
376
+ If the milestone's phases produced `contract.md` files (planning Step 6.1 Tier 1/2), close their lifecycle before completing the milestone. The durable contract is the ADR; the per-phase `contract.md` is a working delta that must be folded back in.
377
+
378
+ 1. Glob `.forge/phases/m{id}-*/contract.md`.
379
+ 2. For each contract not yet `absorbed` (Tier-2 lands at `ratified` after the executing seam check; Tier-1 lands at `proposed` — both fold the same way now that the phase is verified):
380
+ - Fold `delta` into its `governing_adr` — amend the ADR in `.forge/decisions/`, or supersede it (`Status: Superseded by ADR-{NNN}`) if the shape changed materially.
381
+ - If a **new** integration point was introduced, add it to `.forge/contracts/index.yml` `integration_points:` (id, produces, consumes, governing_adr, summary).
382
+ - Set the contract's `status: absorbed`. The ADR is now authoritative; `contract.md` becomes history.
383
+ 3. Any contract with no `governing_adr` set (nothing to fold into) → warn: *"Contract {integration_point} has no governing ADR — file one in `.forge/decisions/` before close, or the durable contract drifts from code."* Advisory — does not block completion.
384
+
328
385
  ## Phase Handoff
329
386
 
330
387
  1. Confirm report + backlog
331
388
  2. **Run promoted-milestone completion hook** (above) if `milestone.origin` set
332
- 3. Set `current.status: complete` and `current.completed_at: "<ISO 8601 timestamp>"`
333
- 4. *"Milestone [{name}] complete. Report: `.forge/audits/milestone-{id}-health-report.md`. {N} backlog items. `/forge` or backlog."*
389
+ 3. **Run Contract Landing** (above) for any cross-layer phases — fold ratified contracts into their ADRs
390
+ 4. Set `current.status: complete` and `current.completed_at: "<ISO 8601 timestamp>"`
391
+ 5. *"Milestone [{name}] complete. Report: `.forge/audits/milestone-{id}-health-report.md`. {N} backlog items. `/forge` or backlog."*
@@ -66,6 +66,35 @@ Read: .github/workflows/* → CI config (ci-check mode, analyst CI sub-check)
66
66
 
67
67
  ### Author Mode
68
68
 
69
+ #### E2E Preflight Gate (M9)
70
+
71
+ Runs ONLY for e2e authoring requests. Integration-test authoring + analyst mode skip this gate entirely.
72
+
73
+ **Preconditions per story** — for every FR the user requests an e2e for:
74
+
75
+ 1. Read `.forge/requirements/m{N}.yml`, locate the FR by ID.
76
+ 2. Check `e2e: true`. If false or missing → REFUSE with:
77
+ `"Story {FR-ID} not flagged for e2e — add `e2e: true` + `observable_outcome` in requirements/m{N}.yml first (planning skill captures this during story breakdown)."`
78
+ 3. Check `validated: true`. If false or missing → REFUSE with:
79
+ `"Story {FR-ID} not yet validated by human — run verifying skill and walk the flow first (the e2e validation gate writes validated:true on confirmation)."`
80
+ 4. Recompute `observable_outcome_hash` from current outcome text (SHA-256 utf-8, first 12 hex). Compare to stored hash. If mismatch → REFUSE with:
81
+ `"Story {FR-ID} observable_outcome changed since validation — re-run verifying skill to re-validate the updated flow."`
82
+ 5. Only when all three checks (steps 2-4) pass: proceed to author the e2e test.
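
The precondition order above can be sketched as a single function. This is a minimal sketch under assumptions: the story is an already-parsed dict from `requirements/m{N}.yml`, the function names are hypothetical, and it returns short reason codes rather than the refusal messages, whose exact wording is fixed by the list above.

```python
import hashlib
from typing import Optional


def outcome_hash(observable_outcome: str) -> str:
    """SHA-256 over the UTF-8 bytes, truncated to the first 12 hex chars."""
    return hashlib.sha256(observable_outcome.encode("utf-8")).hexdigest()[:12]


def preflight_refusal(story: dict) -> Optional[str]:
    """Return a refusal reason code, or None when e2e authoring may proceed.

    The reason codes are hypothetical labels for steps 2-4; map each one to
    the verbatim refusal message when reporting to the user.
    """
    if story.get("e2e") is not True:
        return "not_flagged"        # step 2: e2e false or missing
    if story.get("validated") is not True:
        return "not_validated"      # step 3: validated false or missing
    current = outcome_hash(story.get("observable_outcome", ""))
    if current != story.get("observable_outcome_hash"):
        return "outcome_changed"    # step 4: outcome edited since validation
    return None
```

Checking in this order means a story that is neither flagged nor validated reports only the first missing field, which matches the one-refusal-per-step structure of the list.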
83
+
84
+ **Story stamping (required on every authored e2e)** — every generated e2e file MUST include the story reference. Use the framework's natural mechanism:
85
+
86
+ - Playwright / Vitest / Jest TS: `// story: FR-XXX` at the top of the spec file AND the FR ID in the test name (e.g. `test('FR-053: user signs in with correct credentials', ...)`)
87
+ - pytest: `# story: FR-XXX` at the top of the test module AND in the test function name (`def test_FR_053_user_signs_in(...)`)
88
+ - go test: `// story: FR-XXX` above the test function AND in the test name (`func TestFR053UserSignsIn(t *testing.T)`)
89
+
90
+ No story ID = orphan. Reviewing skill (phase 17) flags orphans for deletion.
91
+
92
+ **Integration + analyst modes** — unchanged. No flag check, no validated check, no story-ID stamping enforcement. M9 lock is e2e-only.
93
+
94
+ Refusal message wording is contract (NFR-009 requires story ID + exact missing field). Do not paraphrase.
95
+
96
+ #### Standard author flow
97
+
69
98
  1. **Determine layer** — e2e vs integration. Ask if ambiguous.
70
99
  2. **Select runner:**
71
100
  - e2e + web/TS → **Playwright** (only option v1 — non-web e2e deferred)
@@ -91,6 +91,35 @@ Re-run verifying after tests are added.
91
91
 
92
92
  If detection is ambiguous (e.g. API tests hard to grep definitively) → lean toward PASS to avoid false blocks; note uncertainty in the verdict.
93
93
 
94
+ ## E2E Validation Gate (M9)
95
+
96
+ Runs AFTER code-level verification commands pass. Skipped if no `e2e:true` stories in the active milestone.
97
+
98
+ ### Steps
99
+
100
+ 1. Read `.forge/requirements/m{N}.yml` for the active milestone. Collect every functional requirement with `e2e: true`.
101
+ 2. If list is empty → skip gate silently. No prompt. No error.
102
+ 3. For each `e2e:true` FR, present to the human:
103
+ - FR ID + description
104
+ - `observable_outcome` text verbatim
105
+ - Prompt: *"Walk this flow manually. Did the observable outcome occur? [confirm | decline | skip]"*
106
+ 4. Per response:
107
+ - **confirm** → compute `observable_outcome_hash` = SHA-256(observable_outcome utf-8), truncate to first 12 hex chars. Write `validated: true` + the hash to the FR entry in `requirements/m{N}.yml`.
108
+ - **decline** → leave `validated: false`. Record decline + reason (free text) in the verification report.
109
+ - **skip** → leave `validated: false`. Record skip in the verification report. No reason required.
110
+ 5. **Hash drift check** (runs BEFORE the step 3 prompt, on every gate invocation): for each `e2e:true` FR with `validated: true`, recompute the hash from the current `observable_outcome`. If it differs from the stored `observable_outcome_hash` → set `validated: false`, clear the hash, and note the auto-reset in the verification report. Then prompt that FR as unvalidated.
111
+ 6. Write per-FR validation outcomes into the verification report under section "E2E Validation".
112
+
113
+ ### Gate behavior
114
+
115
+ - **Advisory, not blocking.** Verifying still passes even if no stories are validated; the hard gate is in the `testing` skill's author-mode (phase 16). This gate's job is to surface + record, not block.
116
+ - Per-story (not batch). Human walks one at a time.
117
+ - Hash: SHA-256, UTF-8 input, hex output truncated to first 12 chars. Deterministic across machines.
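
The hash rule and the drift check from the steps above could look roughly like this. It is a sketch under assumptions: stories are plain dicts parsed from `requirements/m{N}.yml`, the helper names are hypothetical, and writing the file back plus the verification-report entries are left out.

```python
import hashlib


def outcome_hash(observable_outcome: str) -> str:
    """SHA-256 of the UTF-8 text, hex digest truncated to 12 chars."""
    return hashlib.sha256(observable_outcome.encode("utf-8")).hexdigest()[:12]


def reset_on_drift(story: dict) -> bool:
    """Auto-reset a validated story whose outcome text changed.

    Returns True when a reset happened so the caller can note it in the
    verification report and prompt the story as unvalidated.
    """
    if story.get("e2e") is not True or story.get("validated") is not True:
        return False
    current = outcome_hash(story.get("observable_outcome", ""))
    if current == story.get("observable_outcome_hash"):
        return False
    story["validated"] = False
    story.pop("observable_outcome_hash", None)  # "clear hash"
    return True


def record_response(story: dict, response: str) -> None:
    """Apply a human response: confirm | decline | skip."""
    if response == "confirm":
        story["validated"] = True
        story["observable_outcome_hash"] = outcome_hash(story["observable_outcome"])
    elif response in ("decline", "skip"):
        pass  # flags stay untouched; the caller records it in the report
    else:
        raise ValueError(f"unknown response: {response!r}")
```

Because the input is encoded as UTF-8 before hashing, the same outcome text yields the same 12 characters on any machine. Even a whitespace edit changes the hash and triggers a reset.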
118
+
119
+ ### Skip-clean
120
+
121
+ Milestones with zero `e2e:true` stories never see this gate. Verifying logs nothing — appears as if the gate doesn't exist.
122
+
94
123
  ## 3-Level Goal-Backward Verification
95
124
 
96
125
  ### Level 1: Observable Truths
@@ -0,0 +1,27 @@
1
+ # Phase Contract: {integration point}
2
+
3
+ Copy to `.forge/phases/m{M}-{N}-{name}/contract.md` when planning **Step 6.1** detects a cross-layer delta (Tier 1 or 2). Pins the NEW or CHANGED interface shape the producing and consuming layers must agree on for this phase, so an agent building one layer in isolation does not have to guess the other's shape.
4
+
5
+ Lifecycle: pinned by the producer plan (NNa) BEFORE the consumer plan (NNb) builds against it -> ratified at the Tier-2 gate -> folded into the governing ADR when the phase lands (`status: absorbed`). The durable contract is the ADR; this file is the working delta on top of it.
6
+
7
+ ---
8
+
9
+ ```yaml
10
+ contract:
11
+ integration_point: "" # e.g. "engine -> ui (block descriptor)"
12
+ governing_adr: "" # durable contract this delta extends, e.g. "ADR-026" (see .forge/contracts/index.yml)
13
+ producer_layer: "" # owns + pins the shape (plan-NNa), e.g. "engine"
14
+ consumer_layer: "" # builds against it (plan-NNb), e.g. "ui"
15
+ status: proposed # proposed | ratified | absorbed
16
+ delta: |
17
+ # The exact new/changed shape the consumer must NOT have to guess:
18
+ # struct fields, ABI fields + sizeof/layout, function signatures,
19
+ # enum values, units, ownership. ASCII only.
20
+ seam_check: "" # how integration verifies producer output == consumer expectation,
21
+ # e.g. "ui block_shape_projection_test asserts port count from descriptor"
22
+ ```
23
+
24
+ ## Notes
25
+ - One contract block per cross-layer delta. A phase changing two integration points pins two blocks (or two files).
26
+ - The consumer plan's `depends_on` points at THIS contract reaching `status: ratified`, NOT the producer plan's completion -- that is what lets the layers build in parallel.
27
+ - On landing: amend or supersede `governing_adr` to absorb `delta`, set `status: absorbed`. The ADR is then authoritative; this file becomes history.
@@ -0,0 +1,35 @@
1
+ # Durable cross-layer contract index
2
+ #
3
+ # Maps each standing integration point between layers to the ADR(s) that
4
+ # govern it. Planning Step 6.1 reads this to (a) know an integration point
5
+ # already has a durable contract, (b) decide whether the current phase
6
+ # CHANGES it, (c) point producer/consumer agents at the authoritative shape.
7
+ #
8
+ # Durable contracts live in the ADRs themselves (.forge/decisions/). This
9
+ # index is only the lookup. A phase that changes a contract pins a per-phase
10
+ # delta in its contract.md, then folds it back into the governing ADR.
11
+ #
12
+ # Copy to .forge/contracts/index.yml and fill in for your project.
13
+
14
+ # The project's architectural layers -- used by the cross-layer trigger
15
+ # (Step 6.1 condition 1: "work touches >= 2 declared layers").
16
+ layers:
17
+ - name: "" # e.g. "engine"
18
+ path: "" # e.g. "engine/"
19
+ # - name: "blocks"
20
+ # path: "blocks/"
21
+ # - name: "ui"
22
+ # path: "ui/"
23
+
24
+ # Standing integration points and their governing ADR(s).
25
+ integration_points:
26
+ - id: "" # e.g. "engine<->block"
27
+ produces: "" # producer layer, e.g. "engine"
28
+ consumes: "" # consumer layer, e.g. "blocks"
29
+ governing_adr: [] # e.g. ["ADR-001"]
30
+ summary: "" # one line: what the contract covers
31
+ # - id: "engine->ui"
32
+ # produces: "engine"
33
+ # consumes: "ui"
34
+ # governing_adr: ["ADR-026"]
35
+ # summary: "Block descriptor: UI projects ports/params/layout from the engine descriptor"
@@ -16,20 +16,32 @@ type: execute # execute | tdd
16
16
  wave: 1 # Execution wave (1 = no dependencies)
17
17
  depends_on: [] # Plan IDs that must complete first
18
18
  autonomous: true # false if contains checkpoints
19
+ layer: "" # name from project.yml layers[], or "" -- set by planning Step 6.1 cross-layer split (Tier 1/2)
20
+ contract: "" # path to this phase's contract.md (cross-layer delta this plan pins/consumes), if any
21
+
22
+ # Vertical slice declaration. A plan delivers a thin end-to-end user behavior
23
+ # (UI -> API -> data, or CLI -> core -> output). Plans that touch only ONE layer
24
+ # are rejected by the planning Slice Integrity gate unless slice_exception is set.
25
+ slice_exception: null # null | infra_bootstrap | shared_library | data_migration
26
+ slice_exception_rationale: "" # Required if slice_exception != null. One line.
19
27
 
20
28
  must_haves:
21
- truths: # Observable from user perspective when plan is done
22
- - "" # e.g., "User can see their profile page"
23
- - "" # e.g., "API returns user data as JSON"
24
- artifacts: # Files that must exist and be substantive (not stubs)
25
- - path: "" # e.g., "src/components/Profile.tsx"
26
- provides: "" # e.g., "User profile display with avatar and bio"
29
+ truths: # USER-observable when plan is done. Must contain a user verb
30
+ # (see, click, submit, receive, view, download, error, redirect).
31
+ # Internal-only truths like "Schema applied" do NOT satisfy.
32
+ - "" # e.g., "User submits signup form and lands on dashboard"
33
+ - "" # e.g., "Invalid email shows inline 'must be a valid email' error"
34
+ artifacts: # Files that must exist and be substantive (not stubs).
35
+ # Slice plans typically span 2-4 layers — paths should cross
36
+ # boundaries (component + handler + repo), not cluster in one dir.
37
+ - path: "" # e.g., "src/components/SignupForm.tsx"
38
+ provides: "" # e.g., "Signup form with email + password fields"
27
39
  min_lines: 30 # Stub detection threshold
28
- key_links: # Critical connections between artifacts
29
- - from: "" # e.g., "src/components/Profile.tsx"
30
- to: "" # e.g., "/api/users/[id]"
31
- via: "" # e.g., "fetch in useEffect"
32
- pattern: "" # e.g., "fetch.*api/users"
40
+ key_links: # Connections between layers — these prove the slice is wired
41
+ - from: "" # e.g., "src/components/SignupForm.tsx"
42
+ to: "" # e.g., "/api/signup"
43
+ via: "" # e.g., "fetch on submit"
44
+ pattern: "" # e.g., "fetch.*api/signup"
33
45
  ```
34
46
 
35
47
  ## Tasks
@@ -14,6 +14,12 @@ tech_stack:
14
14
  testing: "" # e.g., Vitest, Jest, Pytest
15
15
  other: [] # Additional key dependencies
16
16
 
17
+ layers: [] # Architectural layers — enables cross-layer contract detection (planning Step 6.1).
18
+ # Populated during init (brownfield: producer/consumer source dirs; greenfield: asked).
19
+ # Each entry: {name, path}. Empty/absent = single-layer project, detection no-ops.
20
+ # e.g. [{name: engine, path: src/engine/}, {name: ui, path: src/ui/}]
21
+ # Durable integration points + governing ADRs live in .forge/contracts/index.yml.
22
+
17
23
  interface: [none] # Surfaces this project exposes: browser | cli | api | desktop | native-apple | none
18
24
  # Array — e.g. [browser, api] for full-stack projects
19
25
 
@@ -60,6 +66,7 @@ verification:
60
66
  # advisory: true # pre-existing type errors — warn, don't block
61
67
  auto_fix: true # On failure, agent fixes and retries
62
68
  max_retries: 2 # Max auto-fix attempts per command (0 = fail immediately)
69
+ e2e_soft_cap: 10 # M9: advisory cap on e2e:true stories per milestone. Reviewing warns when exceeded. Soft — never blocks.
63
70
  # Advisory mode: commands already failing before Forge started run but don't block — warn only.
64
71
 
65
72
  success_criteria: # How do we know we're done?
@@ -7,6 +7,11 @@
7
7
  milestone: 1 # Milestone this file belongs to (matches state/milestone-{id}.yml)
8
8
  version: "v1" # v1 = MVP, v2 = next iteration
9
9
 
10
+ # E2E fields (M9): mark `e2e: true` + `observable_outcome` during planning. Verifying skill
11
+ # prompts a human walk; on confirm it sets `validated: true` + `observable_outcome_hash`.
12
+ # Testing skill author-mode refuses e2e without validated:true. Reviewing skill warns on
13
+ # soft-cap exceeded + flags orphan tests. Fields are lazy — absent = e2e:false/validated:false.
14
+
10
15
  functional:
11
16
  # Each requirement: unique ID, description, acceptance criteria, phase assignment
12
17
  - id: FR-001
@@ -18,7 +23,12 @@ functional:
18
23
  phase: null # Assigned during roadmap creation
19
24
  priority: P1 # P1 = must-have, P2 = should-have, P3 = nice-to-have
20
25
  status: pending # pending | clarifying | planned | implemented | verified
21
- notes: "" # [NEEDS CLARIFICATION] if uncertain
26
+ notes: ""
27
+ # E2E gate (M9). Lazy — absent fields = e2e:false, validated:false.
28
+ # e2e: false # true = story gets one e2e test post-validation
29
+ # observable_outcome: "" # one-sentence user-observable outcome (required when e2e:true)
30
+ # observable_outcome_hash: "" # auto-computed SHA-256 of outcome (12 hex chars); editing outcome resets validated
31
+ # validated: false # set true by verifying skill after human walks the flow # [NEEDS CLARIFICATION] if uncertain
22
32
 
23
33
  - id: FR-002
24
34
  description: ""
@@ -1,6 +1,16 @@
1
1
  # Forge Roadmap Template
2
2
  # Copy to .forge/roadmap.yml and customize.
3
- # Phases are delivery boundaries — coherent, verifiable capabilities.
3
+ #
4
+ # Phases are VERTICAL SLICES — thin end-to-end user journeys shippable on their own.
5
+ # NOT horizontal layers (do NOT carve into "all models" / "all APIs" / "all UI" phases).
6
+ #
7
+ # Right: m1-user-can-sign-up, m2-user-can-post, m3-user-can-comment
8
+ # Wrong: m1-models, m2-apis, m3-ui
9
+ #
10
+ # Each phase's `goal:` must read as a user-observable outcome ("User can X"),
11
+ # never as "Build Y". Phase 1 must be demoable. The planning skill's
12
+ # Slice Integrity gate will reject layered roadmaps unless a `slice_exception:`
13
+ # is declared on the plan (infra_bootstrap | shared_library | data_migration).
4
14
 
5
15
  roadmap:
6
16
  # Milestones group phases into concurrent work streams.
@@ -16,13 +26,15 @@ roadmap:
16
26
  # Example: m1-1-foundation/, m1-2-auth/, m2-1-dashboard/
17
27
  phases:
18
28
  - id: 1
19
- name: "" # e.g., "Foundation & Architecture"
20
- goal: "" # Outcome, not task. "Users can X" not "Build Y"
21
- requirements: [] # List of FR-IDs this phase delivers
29
+ name: "" # Vertical slice name, e.g., "User can sign up"
30
+ goal: "" # User-observable outcome. "User can X". Never "Build Y".
31
+ slice_exception: null # null | infra_bootstrap | shared_library | data_migration
32
+ # Only set if this phase legitimately has no user-facing surface.
33
+ requirements: [] # List of FR-IDs this phase delivers (end-to-end)
22
34
  dependencies: [] # Phase IDs that must complete first
23
- success_criteria: # Observable truths when phase is done
24
- - "" # e.g., "User can see the landing page"
25
- - "" # e.g., "API returns valid JSON for /users"
35
+ success_criteria: # User-observable truths when phase is done
36
+ - "" # e.g., "User submits signup form and lands on dashboard"
37
+ - "" # e.g., "Invalid email shows inline error"
26
38
  estimated_hours: null
27
39
  status: pending # pending | researching | planning | executing | verifying | deferred | complete
28
40
 
@@ -51,15 +51,18 @@ Auto-detects complexity. Override: "Use Quick/Standard/Full tier."
51
51
  | Architectural decisions | `architecting` | Full |
52
52
  | Break work into tasks with gates | `planning` | Standard, Full |
53
53
  | Build with deviation rules + atomic commits | `executing` | All |
54
- | Prove work delivers on goals | `verifying` | Standard, Full |
55
- | Audit health + catalog refactoring | `reviewing` | Standard, Full |
54
+ | Prove work delivers on goals (+ M9 e2e validation gate when `e2e:true` stories present) | `verifying` | Standard, Full |
55
+ | Audit health + catalog refactoring (+ M9 e2e soft-cap, orphan-test, flake-rate audits) | `reviewing` | Standard, Full |
56
56
  | Small scoped fix | `quick-tasking` | Quick |
57
57
  | UI with design system | `designing` | When UI |
58
58
  | Security review | `securing` | When auth/data/API |
59
- | E2E/integration tests + suite audit | `testing` | When UI/flows or flaky suite |
59
+ | E2E/integration tests + suite audit (+ M9 author-mode gate refuses e2e without `e2e:true` + `validated:true`) | `testing` | When UI/flows or flaky suite |
60
60
  | Systematic debugging | `debugging` | When stuck |
61
61
  | Upgrade Forge files | `upgrading` | On-demand |
62
62
  | Cross-session memory | `beads-integration` | When Beads installed |
63
+ | Multi-agent orchestration (experimental) | `orchestrating` | Full (opt-in) |
64
+
65
+ > Experimental skills require opt-in install — see `packages/create-forge/experimental/m10/README.md`.
63
66
 
64
67
  ## Context Engineering
65
68
 
@@ -125,7 +128,7 @@ State lives in `.forge/`:
125
128
  - `project.yml` — Vision, stack, design system, verification, constraints (<5KB)
126
129
  - `constitution.md` — Active architectural gates
127
130
  - `design-system.md` — Component mapping table
128
- - `requirements/m{N}.yml` — Per-milestone structured requirements with `[NEEDS CLARIFICATION]` markers. **FR-IDs, DEF-IDs, and NFR-IDs are globally unique across all milestone files** — `FR-001` may exist in exactly one `m{N}.yml`. Before adding a new ID, scan `.forge/requirements/*.yml` for the highest in-use number and continue the sequence. On collision (e.g. during a migration), keep the older milestone's ID and renumber the newer. Concurrent milestones each own their file — no cross-stream contention on file writes, but ID space is shared.
131
+ - `requirements/m{N}.yml` — Per-milestone structured requirements with `[NEEDS CLARIFICATION]` markers. **FR-IDs, DEF-IDs, and NFR-IDs are globally unique across all milestone files** — `FR-001` may exist in exactly one `m{N}.yml`. Before adding a new ID, scan `.forge/requirements/*.yml` for the highest in-use number and continue the sequence. On collision (e.g. during a migration), keep the older milestone's ID and renumber the newer. Concurrent milestones each own their file — no cross-stream contention on file writes, but ID space is shared. Functional requirements may carry M9 e2e gate fields (`e2e`, `observable_outcome`, `observable_outcome_hash`, `validated`) — lazy migration, absent fields default to `e2e:false`/`validated:false`.
129
132
  - `roadmap.yml` — Phases, milestones, dependencies
130
133
  - `state/index.yml` — Global: active milestones, desire_paths, metrics
131
134
  - `state/milestone-{id}.yml` — Per-milestone cursor: position, progress, decisions, blockers
@@ -173,6 +176,7 @@ verification:
173
176
  - Auto-fix loop: read output → fix → amend → re-run (up to max_retries)
174
177
  - 3-strike: retries count toward task limit
175
178
  - Empty commands = no gate (opt-out)
179
+ - `verification.e2e_soft_cap` (default 10) — advisory cap on `e2e:true` stories per milestone surfaced by the `reviewing` skill. Soft — never blocks.
176
180
 
177
181
  ## Beads Integration (Optional)
178
182