npm - forge-orkes - Versions diffs - 0.13.0 → 0.16.0 - Mend

forge-orkes 0.13.0 → 0.16.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (19)

package/package.json +1 -1
package/template/.claude/agents/planner.md +4 -0
package/template/.claude/hooks/README.md +76 -0
package/template/.claude/skills/architecting/SKILL.md +11 -0
package/template/.claude/skills/debugging/SKILL.md +14 -0
package/template/.claude/skills/executing/SKILL.md +50 -2
package/template/.claude/skills/forge/SKILL.md +5 -3
package/template/.claude/skills/initializing/SKILL.md +23 -3
package/template/.claude/skills/planning/SKILL.md +138 -19
package/template/.claude/skills/reviewing/SKILL.md +60 -2
package/template/.claude/skills/testing/SKILL.md +29 -0
package/template/.claude/skills/verifying/SKILL.md +29 -0
package/template/.forge/templates/contract.md +27 -0
package/template/.forge/templates/contracts-index.yml +35 -0
package/template/.forge/templates/plan.md +23 -11
package/template/.forge/templates/project.yml +7 -0
package/template/.forge/templates/requirements.yml +11 -1
package/template/.forge/templates/roadmap.yml +19 -7
package/template/CLAUDE.md +8 -4

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "forge-orkes",
-  "version": "0.13.0",
+  "version": "0.16.0",
   "description": "Set up the Forge meta-prompting framework for Claude Code in your project",
   "bin": {
     "create-forge": "./bin/create-forge.js"

package/template/.claude/agents/planner.md CHANGED

@@ -51,6 +51,9 @@ requirements:
 Mark unknowns `[NEEDS CLARIFICATION]` — never guess.
 ### 5. Decompose Tasks
+**Cross-layer first:** if this phase introduces/changes an interface one layer produces and another consumes (litmus: would an isolated agent have to guess the other's shape?), pin the delta in `contract.md` and either tag tasks `layer:` (Tier 1, one plan) or split into producer plan-NNa (pins contract) + consumer plan-NNb (`depends_on` the *frozen contract*, builds in parallel) -- Tier 2. See planning skill Step 6.1.
 ```xml
 <task type="auto|manual">
   <name>{Verb} {thing} {detail}</name>
@@ -109,3 +112,4 @@ must_haves:
 - **Horizontal slicing**: models->routes->UI (prefer vertical)
 - **Gold-plating**: Beyond requirements
 - **Guessing**: Fill unknowns instead of `[NEEDS CLARIFICATION]`
+- **Guessing a cross-layer shape**: splitting layers without pinning `contract.md` first -- the consumer ends up guessing the producer's shape. Pin the contract, then split.

package/template/.claude/hooks/README.md ADDED

@@ -0,0 +1,76 @@
+# Forge Hooks
+## `forge-claim-check.sh` — PreToolUse claim-check
+Cross-session file-claim collision detector. Pairs with the Forge MCP
+orchestrator (`.forge/.mcp-server/`) to prevent two concurrent Claude Code
+sessions from clobbering each other's edits on the same file.
+### Behavior
+Reads the Claude Code `PreToolUse` JSON payload on stdin. Extracts target
+file path(s) from `tool_input.file_path`, `tool_input.notebook_path`,
+`tool_input.path`, or `tool_input.edits[].file_path` (MultiEdit). For each
+path, queries `.forge/.mcp-server/claims.db` for an active claim.
+| Situation | Exit | Effect |
+|---|---|---|
+| No claim, or DB missing (fresh repo) | `0` | allow |
+| Claim held by current `CLAUDE_SESSION_ID` | `0` | allow |
+| `CLAUDE_SESSION_ID` unset (single-agent / non-Claude invocation) | `0` | allow + stderr warning |
+| Unknown payload schema (no recognized path field) | `0` | allow |
+| Claim held by another session | `2` | deny, stderr names owner + expiry |
+| Any unexpected error (corrupt DB, jq failure, sqlite timeout, etc.) | `2` | fail-closed deny |
+**Never exits 1.** Claude Code treats non-zero as warning by default; we
+need a hard block on collision, so deny is always `exit 2`.
+### Prerequisites
+- `bash` (≥ 4 recommended — relies on `set -u` array safety patterns)
+- `jq`
+- `sqlite3`
+- `timeout` (GNU coreutils) **or** `gtimeout` (macOS, `brew install coreutils`) — optional but recommended; without it the SQLite query is unbounded (DB-level `busy_timeout` still applies)
+Run `bash .claude/hooks/forge-claim-check-doctor.sh` to verify prerequisites.
+### Environment
+| Var | Source | Purpose |
+|---|---|---|
+| `CLAUDE_PROJECT_DIR` | Claude Code | Project root, used to resolve relative paths and locate DB |
+| `CLAUDE_SESSION_ID` | Claude Code | Current session identifier — own claims pass through |
+| `FORGE_CLAIMS_DB` | optional override | Path to `claims.db` (defaults to `$CLAUDE_PROJECT_DIR/.forge/.mcp-server/claims.db`) |
+### Registration
+Not registered automatically. The install procedure (plan-06) adds the
+`PreToolUse` entry to `.claude/settings.json`:
+```json
+{
+  "hooks": {
+    "PreToolUse": [
+      {
+        "matcher": "Edit|Write|MultiEdit|NotebookEdit",
+        "hooks": [
+          { "type": "command", "command": "$CLAUDE_PROJECT_DIR/.claude/hooks/forge-claim-check.sh" }
+        ]
+      }
+    ]
+  }
+}
+```
+### Disabling
+Rename or remove the hook entry in `.claude/settings.json`, or set the file
+non-executable: `chmod -x .claude/hooks/forge-claim-check.sh`. The hook is
+defense-in-depth — the MCP server's `forge_claim_files` tool remains the
+primary coordination point.
+### Troubleshooting
+- "internal error at line N" on every edit → corrupt DB or missing tool. Run doctor. Common: `jq` not on PATH.
+- No collisions detected → confirm `CLAUDE_SESSION_ID` set and `claims.db` exists; otherwise hook fail-opens.
+- macOS `timeout: command not found` → `brew install coreutils` for `gtimeout`, or skip (DB busy_timeout still applies).
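
Reviewer note on the hook README above: the situation table can be read as a small decision function. The sketch below is not the shipped `forge-claim-check.sh`; it is a minimal Python model of the table. The in-memory `claims` dict stands in for `claims.db`, and the function names and the order in which the rows are checked are assumptions made for illustration.

```python
ALLOW, DENY = 0, 2  # per the README, the hook never exits 1


def extract_paths(payload):
    """Collect target paths from the tool_input fields the README lists."""
    tool_input = payload.get("tool_input", {})
    paths = [tool_input[k] for k in ("file_path", "notebook_path", "path")
             if tool_input.get(k)]
    # MultiEdit-style payloads carry one file_path per edit entry.
    paths += [e["file_path"] for e in tool_input.get("edits", [])
              if e.get("file_path")]
    return paths


def decide(paths, session_id, claims):
    """Map the README's situation table onto exit codes.

    `claims` maps path -> owning session id (hypothetical stand-in for claims.db).
    """
    if not paths:           # unknown payload schema: allow
        return ALLOW
    if session_id is None:  # CLAUDE_SESSION_ID unset: allow (README also warns on stderr)
        return ALLOW
    for path in paths:
        owner = claims.get(path)
        if owner is not None and owner != session_id:
            return DENY     # claim held by another session
    return ALLOW            # no claim, or claim held by this session


def run(payload, session_id, claims):
    """Fail closed: any unexpected error becomes a deny."""
    try:
        return decide(extract_paths(payload), session_id, claims)
    except Exception:
        return DENY
```

Under these assumptions a payload such as `{"tool_input": {"file_path": "/a"}}` is denied only when `claims` records `/a` under a different session id.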

package/template/.claude/skills/architecting/SKILL.md CHANGED

@@ -7,6 +7,17 @@ description: "Make architectural decisions: choose frameworks, design data model
 Make architectural decisions. Document rationale. Consider alternatives.
+## Vertical-Slice Bias
+Architectural decisions should preserve the planning skill's slice-first decomposition. Favor designs that let the team ship thin end-to-end user journeys early:
+- **Prefer feature-folder layouts** (`src/features/signup/{ui,api,data}.ts`) over strict layer-folder layouts (`src/models/`, `src/api/`, `src/components/`) when the project allows. Layer-folder layouts are not banned -- but they invite horizontal decomposition.
+- **Avoid framework choices that force big-bang integration.** If picking framework A means UI cannot be wired until the whole data layer ships, that's a red flag -- document the trade-off in the ADR's Consequences section.
+- **Contracts before completeness.** Define the minimal API contract a single slice needs. Resist designing the full API surface upfront -- successive slices extend it.
+- **Data models grow per slice.** Start with the fields slice 1 needs. Add columns/entities as later slices require. Reject "design the whole schema first" unless a `slice_exception: data_migration` phase is planned.
+When an architectural decision conflicts with vertical slicing (e.g., a framework that requires full backend before any UI is testable), surface the conflict explicitly in the ADR's **Trade-Offs** section.
 ## When to Use
 - Choosing a framework, library, or major dependency

package/template/.claude/skills/debugging/SKILL.md CHANGED

@@ -7,6 +7,20 @@ description: "Systematic debugging when tests fail, features break, errors are c
 Every hypothesis tested, every dead end recorded.
+## Entry Path: Merge Conflict [Experimental — M10]
+Invoked by `orchestrating` skill when `forge_queue_commit` returns `status: conflict`.
+**Payload:** `{ conflicted_files: [...], base_sha, messages: [...], branch }`
+**Workflow:**
+1. Inside agent's worktree: `git fetch origin main && git rebase ${base_sha}` (use payload ref).
+2. For each `conflicted_files[]` entry: inspect both sides via `git status` + `git diff`. Resolve per task context (or prompt user when intent unclear).
+3. After all resolved: `git add <files>` then `git rebase --continue`.
+4. Re-invoke `Skill(orchestrating)` with `action: retry-teardown` → orchestrating re-calls `forge_queue_commit`.
+**Abort path:** if user aborts or resolution stalls, leave worktree in conflicted state and append `{ kind: "merge_conflict", branch, files }` to `lifecycle.blockers[]` in `.forge/state/milestone-{id}.yml`. Single-agent debugging entry paths below are unaffected.
 ## Scientific Method
 1. **Observe**: Exact behavior — error, repro steps, when it started

package/template/.claude/skills/executing/SKILL.md CHANGED

@@ -35,6 +35,16 @@ Execution-phase operational guidance below supplements the rules — it does not
 ### Scope Boundary
 Only fix issues DIRECTLY caused by the current task. Pre-existing warnings, tech debt, unrelated bugs → log to `.forge/deferred-issues.md`.
+### Slice Integrity (Execution-Side)
+The planning skill enforces vertical slicing at plan-creation time. Executor responsibility: do not silently re-introduce horizontal decomposition.
+- **Do not split a slice plan into "backend now, UI later"** under Rule 1/2/3. If the UI half of a slice is broken, fix it -- do not defer it. Deferring the UI breaks the slice's user-observable truth.
+- **Do not collapse the slice into a stub** to pass verification. `key_links` must be real (component actually hits handler; handler actually persists). Stubbed links fail the verifying gate.
+- If a slice genuinely cannot ship end-to-end (e.g., external API blocker), invoke **Rule 4 -- STOP, ask user.** Options: redefine the slice, declare a `slice_exception:` and continue, or defer the phase. Do not autonomously ship half a slice.
+This rule does NOT override the 3-strike limit or scope boundary -- it sits alongside them.
 ## Native Task Tracking
 Use `TaskCreate`/`TaskUpdate`/`TaskList` for in-session visibility. `.forge/state/milestone-{id}.yml` remains the cross-session source of truth.
@@ -95,6 +105,30 @@ feat(auth-01): implement JWT-based login
 - Include integration test for login flow
 ```
+## Multi-Agent Claim Convention [Experimental — M10]
+**Trigger:** active milestone state has `lifecycle.worktree_mode: active` (set by `orchestrating` skill). If absent or any other value → skip this section entirely; single-agent behavior unchanged.
+**Pre-edit:** before the first `Edit` / `Write` / `MultiEdit` / `NotebookEdit` in a task, call MCP tool:
+```
+forge_claim_files {
+  session_id: lifecycle.session_id,
+  files: [<absolute paths from task <files> manifest>],
+  ttl_seconds: 1800,
+  reason: "executing m{M}-{N} task <name>"
+}
+```
+**Branches:**
+- **Full claim granted** → proceed with edits.
+- **Partial rejection** → surface `rejected[].file`, `rejected[].owner_session`, `rejected[].expires_at` to user. Three options: **abort** task, **skip** rejected files (continue with granted subset), **wait** then retry after `expires_at`.
+- **`DB_UNAVAILABLE`** → log warning, proceed. PreToolUse claim-check hook is defense-in-depth — coordination degrades to best-effort, isolation (worktree) still holds.
+**End of task:** call `forge_release_claims { session_id, files: [...] }` after final commit. Plan-complete bulk release handled by `orchestrating` teardown.
+See ADR-003 and `.claude/skills/orchestrating/SKILL.md`.
 ## Verification Gate
 After each task commit, run configured verification commands. Mechanical — not optional.
@@ -221,7 +255,21 @@ Log to `.forge/state/index.yml → desire_paths` (global, not per-milestone):
 - **User corrections**: Repeated correction matching a prior one → `user_correction`, increment count
 - **Agent struggles**: Multiple attempts or user guidance needed → `agent_struggle`
+## Cross-Layer Seam Check
+**Trigger:** the phase was split by planning Step 6.1 into a **Tier-2** producer plan-NNa + consumer plan-NNb (both carry a `contract:` frontmatter path pointing at the same `contract.md`). Single-plan / Tier-1 (`layer:` tag, no split, contract honored inline) → skip; no seam check needed.
+After **both** layer plans are committed, the executing flow owns one final **seam-check task** — there is no standing agent for this:
+1. **Read** the phase `contract.md` — `delta`, `producer_layer`, `consumer_layer`, `seam_check`, `status` (should already be `ratified` from the planning Tier-2 gate).
+2. **Merge** the layer branches/worktrees. If `lifecycle.worktree_mode: active`, the `orchestrating` teardown merges them; otherwise merge the sibling plan branches into the phase working tree.
+3. **Verify the seam** — run the assertion named in `contract.md` `seam_check` (a test, a type-check, or a structural grep) to prove the shape the producer emits matches what the consumer built against per `delta`.
+4. **Match** → commit the merge: `feat({phase}): seam check {integration_point}`. Contract stays `status: ratified`.
+   **Mismatch** → the consumer guessed wrong against a frozen contract → **Rule 1** fix on the consumer side. If the *contract itself* is wrong (producer can't emit the agreed shape) → **Rule 4** STOP, re-ratify with the user before proceeding.
+5. Leave the contract at `status: ratified` — the `reviewing` skill folds `delta` into the governing ADR (`status: absorbed`) at milestone landing. Do **not** absorb here.
 ## Phase Handoff
 1. Confirm persistence — summary documented, commits made, state updated, desire paths logged
-2. Set `current.status` to `verifying`
-3. Recommend: *"Tasks committed, state updated. `/clear` then `/forge` to continue with verifying."*
+2. **Run the Cross-Layer Seam Check** (above) if this phase was a Tier-2 contract split
+3. Set `current.status` to `verifying`
+4. Recommend: *"Tasks committed, state updated. `/clear` then `/forge` to continue with verifying."*
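
Reviewer note on the Cross-Layer Seam Check added above: the skill leaves the actual assertion to whatever `contract.md` names in `seam_check` (a test, a type-check, or a structural grep). One possible structural form is sketched below. It assumes the `delta` can be reduced to a flat field-to-type mapping, which the diff does not state; the function name and the example fields are hypothetical.

```python
def seam_mismatches(delta, produced):
    """Compare the pinned delta against the shape the producer emits.

    Both arguments are flat {field: type_name} dicts (an assumed simplification).
    Returns a list of human-readable mismatches; an empty list means the seam matches.
    """
    problems = []
    for field, expected_type in delta.items():
        if field not in produced:
            problems.append(f"missing field: {field}")
        elif produced[field] != expected_type:
            problems.append(
                f"{field}: expected {expected_type}, got {produced[field]}"
            )
    return problems


# Hypothetical contract delta and producer output.
delta = {"id": "string", "price_cents": "int"}
print(seam_mismatches(delta, {"id": "string", "price": "float"}))
```

An empty result corresponds to the "Match" branch in step 4. A non-empty result only reports that the seam is broken; deciding between a Rule 1 consumer fix and a Rule 4 stop still depends on whether the contract itself is wrong.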

package/template/.claude/skills/forge/SKILL.md CHANGED

@@ -188,6 +188,8 @@ Tier + state → invoke via `Skill` tool. All phases use `Skill()`.
 **CRITICAL: NEVER `EnterPlanMode`.** "Planning" = `Skill(planning)`. Native plan mode writes wrong format, bypasses gates + state.
+**Experimental:** if user invokes `orchestrating` skill (M10) and repo has it installed (`.claude/skills/orchestrating/` present + MCP server + claim-check hook), route through it **before** `executing` to bootstrap multi-agent worktree. Skill is opt-in per ADR-001; absent install → fall through to standard routing.
 ### Auto-Routing (Always Deterministic)
 **No menus.** Applies on first run and resume. Deterministic. Brief → route. Choices only at `complete` or corrupted.
@@ -223,7 +225,7 @@ Where `{source}` = `skills.{name}` | `models.default` | `parent session`. Suppre
 | reviewing | sonnet | Audit judgment |
 | quick-tasking | haiku | Speed |
 | discussing | sonnet | Conversation |
-| testing | sonnet | Code gen (author) + audit judgment (analyst) — matches executing/reviewing |
+| testing | sonnet | Code gen (author) + audit judgment (analyst) — matches executing/reviewing. M9: author-mode refuses e2e without `e2e:true` + `validated:true`. |
 | deferred | haiku | Read + format only |
 | `current.status` | Route To |
@@ -234,8 +236,8 @@ Where `{source}` = `skills.{name}` | `models.default` | `parent session`. Suppre
 | `architecting` | `Skill(architecting)` → planning |
 | `planning` | `Skill(planning)` → executing |
 | `executing` | `Skill(executing)` → verifying |
-| `verifying` | `Skill(verifying)` → reviewing |
-| `reviewing` | `Skill(reviewing)` → complete |
+| `verifying` | `Skill(verifying)` → reviewing — runs M9 e2e validation gate when `e2e:true` stories present |
+| `reviewing` | `Skill(reviewing)` → complete — adds M9 e2e suite audit (soft-cap, orphans, flake-rate) |
 | `complete` | Done. Ask what's next. |
 | `deferred` | Milestone frozen. *"Resume milestone {id}" to reactivate.* |
 | `quick-tasking` | `Skill(quick-tasking)` |

package/template/.claude/skills/initializing/SKILL.md CHANGED

@@ -225,6 +225,19 @@ Glob: src/**/index.{ts,tsx,js}  # barrel exports
 Grep: src/ for "import.*from.*@/"  # path aliases
 ```
+### Step 3.5: Architectural Layers
+Detect distinct layers that hand a typed interface across a boundary — feeds the cross-layer contract detection in planning Step 6.1. A layer = a directory whose code is *produced for* or *consumed by* another (engine↔ui, core↔plugins, api↔web, native↔bindings).
+```bash
+Bash: ls -d */ src/*/ 2>/dev/null              # top-level + src subdirs as layer candidates
+Grep: cross-boundary imports (e.g. ui importing engine types, generated bindings, ABI/descriptor/schema files)
+```
+A single cohesive codebase with no internal producer→consumer boundary → **not** layered; leave `layers: []` (Step 6.1 no-ops). Only flag layers when one directory's output is another's typed input.
+Present detected layers for confirmation: *"Detected layers: [{name → path}]. These hand interfaces across a boundary — confirm or correct."* Confirmed 2+ → written to `project.yml` `layers:` and seeded into `.forge/contracts/index.yml` at Finalize.
 ### Step 4: Present
 *"Project: {name} — {description}
@@ -269,6 +282,12 @@ User describes project → `.forge/project.yml`: name, goal, stack, constraints,
 Validate each term against `.forge/templates/interface-detection.md` type vocabulary. On unrecognized term, prompt: *"Did you mean [closest match]? Valid: browser | cli | api | desktop | native-apple | none."* Write validated answer as `interface: [...]` in project.yml.
+### Step 1.6: Architectural Layers
+*"Will this project have distinct layers that hand a typed interface across a boundary (e.g. engine ↔ ui, core ↔ plugins, api ↔ web)? List them as name → path, or 'no' for a single-layer project."*
+2+ layers → write `layers:` to project.yml + seed `.forge/contracts/index.yml` at Finalize. Otherwise `layers: []` (planning Step 6.1 no-ops).
 ### Step 2: Design System
 *"UI library?"*
@@ -314,10 +333,11 @@ User selects per stack.
 ## Finalize
-1. Write `.forge/project.yml` (all info + `verification`)
+1. Write `.forge/project.yml` (all info + `verification` + `layers`)
 2. Write `.forge/constitution.md`
 3. Write `.forge/design-system.md` (if configured)
-4. Init state:
+4. Write `.forge/contracts/index.yml` (only if `layers:` has 2+ entries) — copy `.forge/templates/contracts-index.yml`, fill `layers:` from the confirmed list, leave `integration_points:` empty (first cross-layer phase populates them via planning Step 6.1)
+5. Init state:
    - `.forge/state/index.yml`:
      ```yaml
      milestones:
@@ -339,7 +359,7 @@ User selects per stack.
        task: null
        status: not_started
      ```
-5. Templates as needed
+6. Templates as needed
 *"Initialized. Ready?"*

package/template/.claude/skills/planning/SKILL.md CHANGED

@@ -7,6 +7,26 @@ description: "Break work into executable tasks with verification gates. Enforces
 > **Do NOT use `EnterPlanMode`.** Output -> `.forge/phases/`.
+## Core Principle: Vertical Slicing
+**Every phase and every plan MUST deliver a thin vertical slice -- a user-observable behavior reachable end-to-end (UI -> API -> data, or CLI -> core -> output).** Never decompose by horizontal layer (all models, then all APIs, then all UI). Horizontal slicing defers user-testable behavior until the last phase and amplifies integration risk.
+Why:
+- Each slice is testable, demoable, shippable on its own
+- Bugs surface at the seam (where layers meet) on day one, not week three
+- User can redirect direction after slice 1 instead of after the whole stack lands
+Apply at three levels:
+- **Roadmap (Step 5)**: phases are slices, not layers
+- **Decompose (Step 6)**: plans are slices, not layers
+- **Verify (Step 8)**: Slice Integrity gate -- hard fail on layer-only plans
+Exceptions (must be explicitly justified in plan frontmatter `slice_exception:`):
+- Foundational infra phase that no slice can reach yet (build setup, framework bootstrap)
+- Shared library / cross-cutting refactor with no user-facing surface
+If you find yourself writing a plan that only touches `src/models/`, `src/db/`, `src/schemas/`, `src/api/` (without UI/CLI counterpart), or `src/components/` (without data path) -- STOP. Merge with the slice that reaches the user, or claim an exception.
 ## Step 1: Resolution Gate
 Read `.forge/context.md` **Needs Resolution**. If unchecked `- [ ]` items:
@@ -53,6 +73,17 @@ If missing, create from `.forge/templates/requirements.yml`:
 5. P1 (must) / P2 (should) / P3 (nice)
 6. Deferred: DEF-001... (also globally unique)
+**E2E gate (M9):** For each functional requirement being added or refined:
+1. Decide `e2e: true|false` -- does this story need a post-validation e2e test?
+   - true = high-value user journey worth a real-browser walk + automated guard
+   - false = covered by integration/unit, or low-value to e2e
+   - Default to false. Only flag true for spine flows (auth, checkout-class flows, primary user task).
+2. When `e2e:true`, capture `observable_outcome:` -- one sentence describing what the user observes when the flow succeeds. Block planning until provided. No silent default.
+3. Re-planning: read existing `e2e` / `observable_outcome` decisions from `requirements/m{N}.yml`. Preserve them. Only prompt for new or unflagged FRs.
+4. Write `e2e`, `observable_outcome`, `validated: false`, `observable_outcome_hash: ""` to each FR. The hash + `validated` flip later in verifying.
+Contract: locked decision in `.forge/context.md` (M9 section, "Approach D"). Do NOT enforce the e2e soft cap here -- that's reviewing's job.
 **Blocks until all P1 `[NEEDS CLARIFICATION]` resolved.**
 Never write to top-level `.forge/requirements.yml` -- that path is deprecated.
@@ -66,11 +97,14 @@ Never write to top-level `.forge/requirements.yml` -- that path is deprecated.
 ### Case A: `roadmap.yml` missing (Full only)
 Create from `.forge/templates/roadmap.yml`:
-1. Group by delivery boundaries
-2. Inter-group dependencies
-3. Phases (coherent, verifiable)
+1. **Group by vertical slice, NOT by layer.** Each phase = a thin end-to-end user journey. Wrong: `m1-models`, `m2-apis`, `m3-ui`. Right: `m1-user-can-sign-up`, `m2-user-can-post`, `m3-user-can-comment`.
+2. Inter-slice dependencies (slice B builds on artifact from slice A)
+3. Each phase has a one-sentence `goal:` written as user-observable outcome ("User can X"), never as "Build Y"
 4. Every FR -> one phase, no orphans
-5. Waves: independent=1, dependent=2+
+5. Waves: independent slices=1, dependent slices=2+
+6. **Phase 1 must be demoable.** If phase 1 has no user-observable output, the roadmap is layered -- redesign.
+Anti-pattern detection: scan proposed phase names. Reject if any phase name contains layer-only terms without a user verb: `models`, `schema`, `database`, `api-only`, `backend-only`, `ui-only`, `frontend-only`, `infrastructure` (unless tagged as exception phase).
 ### Case B: `roadmap.yml` exists, current milestone already in it
@@ -94,16 +128,62 @@ If a sibling milestone (e.g. m50) has state + requirements but is missing from `
 ## Step 6: Decompose Tasks
-Per phase (or feature, Standard tier):
+### Step 6.1: Cross-Layer Contract Detection
+Before decomposing, classify whether this phase crosses a layer boundary with a *new or changing* contract. This decides single-plan vs a contract-pinned layer split. Read the project's layers from `project.yml` `layers:` (or `.forge/contracts/index.yml`; fallback: top-level source dirs).
+**Trigger -- all three hold:**
+1. Work touches >= 2 declared layers (e.g. engine / blocks / ui).
+2. A struct / signature / ABI field / descriptor is *produced* by one layer and *consumed* by another.
+3. That interface is *new or changing* in this phase.
+**Litmus (decisive):** "Would an agent building one layer in isolation have to GUESS the shape the other layer owns?" No (already specified in a durable contract) -> not cross-layer here.
+**Two contract tiers:**
+- **Durable** = the standing layer API. Lives in ADRs (`.forge/decisions/`) + constitution, indexed in `.forge/contracts/index.yml` (integration-point -> governing ADR). Stable + unchanged -> agents read the ADR; no per-phase artifact.
+- **Per-phase delta** = the specific new/changed shape THIS phase introduces. Pinned in `.forge/phases/m{M}-{N}-{name}/contract.md` (from `.forge/templates/contract.md`); references its governing ADR; folded back into that ADR on landing.
+**Classify:**
+| Tier | Condition | Response |
+|------|-----------|----------|
+| 0 | Trigger fails | Normal decomposition (6.2). Nothing added. |
+| 1 | Cross-layer delta, small / tightly sequential | Write `contract.md`; tag tasks `layer:`; ONE plan. No interruption. |
+| 2 | Cross-layer delta, cleanly separable, worth parallel sessions | Pin `contract.md`; split into plan-NNa (producer layer, pins contract) + plan-NNb (consumer layer, `depends_on` the contract). Ratify gate. |
+**Liberal detect, conservative interrupt:** classify every phase. Unsure between 1 and 2 -> default **Tier 1** (write the doc, no interruption). Escalate to 2 only when confident the parallel split pays off.
+**Tier-2 ratify gate** (the ONLY interruption; frame as contract-correctness, not "parallelize y/n"):
+> *"This phase changes the {integration point} contract ({governing ADR}). Delta: [summary]. plan-NNa ({producer}) pins it; plan-NNb ({consumer}) builds against it in parallel. Is this contract shape correct?"*
+Block the split until confirmed. Override ("keep it one plan") -> log to `state/index.yml` `desire_paths` (recurring overrides tune the threshold), fall back to Tier 1.
+**Integration (Tier 2):** layer plans build isolated (per-layer worktrees). The phase's final task is a **seam check** owned by the executing flow (NOT a standing agent): merge the layer branches, verify the shape the producer emits matches what the consumer built against, per `contract.md`.
+### Step 6.2: Task Decomposition
+Per phase (or feature, Standard tier). **Each plan = one vertical slice** -- except a sanctioned Tier-2 contract split (6.1), which divides one slice across producer/consumer layer plans reconciled at the seam check.
+#### Slice-First Decomposition
+Before writing any plan:
+1. List the user-observable behaviors this phase must deliver (from `requirements/m{N}.yml`)
+2. For each behavior, identify the full path: UI/CLI surface -> handler -> business logic -> persistence (only the parts that behavior needs)
+3. **One plan = one behavior end-to-end.** Plan touches every layer that behavior needs, not all of one layer.
+4. If a plan can only ship part of the path (e.g., UI without backend wired), it is NOT a slice -- restructure.
+Plan naming reflects the slice: `plan-01-user-signs-up.md`, not `plan-01-models.md`.
+#### File Layout
 1. `.forge/templates/plan.md` -> `.forge/phases/m{M}-{N}-{name}/plan-{NN}.md`
    - `{M}`=milestone, `{N}`=phase#, `{name}`=kebab, `{NN}`=seq
    - Ex: `.forge/phases/m3-2-providers/plan-01.md`
-2. Frontmatter: phase, plan#, wave, deps
+2. Frontmatter: phase, plan#, wave, deps, `slice_exception:` (optional, see Core Principle)
 3. must_haves:
-   - **Truths:** User-observable outcomes (3-7)
-   - **Artifacts:** Must exist, substantive not stubs
-   - **Key Links:** Connections between artifacts
+   - **Truths:** User-observable outcomes (3-7). MUST be phrased as something the user can see, click, or receive -- not "model X exists" or "table Y created".
+   - **Artifacts:** Must exist, substantive not stubs. Slice plans typically span 2-4 layers (e.g., component + handler + repo).
+   - **Key Links:** Connections between artifacts -- these prove the slice is wired, not stubbed.
 4. XML tasks (2-3/plan, 15-60 min):
 ```xml
@@ -133,20 +213,45 @@ Per phase (or feature, Standard tier):
 | `checkpoint:decision` | Pause for user choice between options |
 | `checkpoint:human-action` | Pause for manual action (email verification, 2FA) |
-### Vertical Slices (Preferred)
+### Vertical Slices (Required)
 ```
-Plan 01: User feature (model + API + UI)   → Wave 1
-Plan 02: Product feature (model + API + UI) → Wave 1
+Plan 01: User can sign up    (UI form + /api/signup + users table write)  → Wave 1
+Plan 02: User can log in     (UI form + /api/login + session issue)        → Wave 2 (uses table from 01)
+Plan 03: User can post note  (UI editor + /api/notes + notes table)        → Wave 2 (uses auth from 02)
 ```
-Independent plans run parallel.
-### Avoid Horizontal Layers
+Each plan is independently demoable. Bugs at layer seams surface in plan 01.
+### Horizontal Layers (Anti-Pattern -- BLOCKED)
 ```
-Plan 01: All models    → Wave 1
-Plan 02: All APIs      → Wave 2 (depends on 01)
-Plan 03: All UI        → Wave 3 (depends on 02)
+Plan 01: All models       → Wave 1
+Plan 02: All APIs         → Wave 2 (depends on 01)
+Plan 03: All UI           → Wave 3 (depends on 02)
 ```
-Sequential. Only when architecturally required.
+This decomposition is **rejected by default** at Step 8 (Slice Integrity gate). To proceed, declare `slice_exception:` in plan frontmatter with one of:
+- `infra_bootstrap` -- foundational setup with no user-reachable surface
+- `shared_library` -- cross-cutting utility used by future slices
+- `data_migration` -- one-shot schema/data change with no behavior added
+Anything else: restructure into slices.
+### Anti-Pattern Auto-Detector
+A plan fails Slice Integrity if ALL of:
+- `must_haves.truths` contain only artifacts/internals (e.g., "Schema migration applied", "Model class created") with no user-visible verb (see, click, receive, fail-with-error)
+- `must_haves.artifacts` paths all live under a single layer prefix (only `src/models/`, only `src/api/`, only `src/components/`)
+- No `slice_exception:` declared
+Detection runs in Step 8.
+### Contract-Driven Layer Split (Tier 2 exception)
+When Step 6.1 flags a **Tier-2** cross-layer contract, splitting by layer is correct -- NOT the horizontal anti-pattern above. The difference:
+- *Horizontal anti-pattern:* split by layer with no contract; each layer waits on the previous. Serializes.
+- *Contract-driven split:* plan-NNb `depends_on` the **frozen contract** (pinned by NNa up front), not NNa's implementation -> both layers build in parallel (separate sessions/worktrees), reconciled at the seam check.
 ## Step 7: Test Specs (Optional)
@@ -238,7 +343,7 @@ Decision captured once, pre-code. Does not block planning.
 ## Step 8: Verify Plans
-8 dimensions:
+9 dimensions:
 1. **Requirement Coverage** -- every req has task(s)
 2. **Task Completeness** -- files + action + verify + done
 3. **Deps** -- valid DAG, no cycles
@@ -247,6 +352,20 @@ Decision captured once, pre-code. Does not block planning.
 6. **Verification** -- must_haves trace to goal
 7. **Context** -- honors locked, excludes deferred
 8. **Spec Validity** -- valid syntax, correct paths
+9. **Slice Integrity (HARD GATE)** -- every plan delivers a vertical slice OR declares `slice_exception:`
+### Slice Integrity Check
+For each plan, FAIL if all of these hold and no `slice_exception:` is declared:
+- `must_haves.truths` lack a user-observable verb (`see`, `click`, `submit`, `receive`, `view`, `download`, `error`, `redirected`, `login`, `signup`, etc.) -- internal-only truths like "Schema applied", "Model registered", "Index built" do not satisfy
+- `must_haves.artifacts` paths cluster in a single layer (all under `models/`, all under `api/`, all under `components/`, etc.)
+- No file path crosses a layer boundary (e.g., a component plus its handler, a CLI plus its core)
+**Exempt:** a Tier-2 contract-split plan (Step 6.1) carries both `layer:` and `contract:` frontmatter. It is a sanctioned single-layer plan reconciled at the seam check -- treat as auto-`slice_exception` (the *phase*, not the plan, owns the vertical slice). It passes without declaring `slice_exception:`.
+Roadmap-level check: FAIL if phase 1 has no user-observable goal.
+On fail: restructure plan(s) into slices, or declare `slice_exception:` with one of `infra_bootstrap | shared_library | data_migration` and a one-line rationale. Re-verify.
 Issues -> fix, re-verify. Max 3 cycles.
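
The Slice Integrity rule above can be sketched as a small predicate. This is a minimal illustration, not the skill's actual implementation: the verb set is copied from the skill text (which ends with "etc.", so the real set is open-ended), `layer_of` assumes a file's layer is its immediate parent directory, and the `files` key on the plan is a hypothetical stand-in for the task file paths.

```python
import re
from pathlib import PurePosixPath

# Verb list copied from the skill text; the real list is open-ended ("etc.").
USER_VERBS = {"see", "click", "submit", "receive", "view", "download",
              "error", "redirected", "login", "signup"}
ALLOWED_EXCEPTIONS = {"infra_bootstrap", "shared_library", "data_migration"}


def has_user_verb(truth: str) -> bool:
    words = re.findall(r"[a-z]+", truth.lower())
    # Crude stemming: accept "sees" / "submits" as well as the bare verb.
    return any(w in USER_VERBS or w.rstrip("s") in USER_VERBS for w in words)


def layer_of(path: str) -> str:
    # Assumption: a file's layer is the name of its immediate parent directory.
    return PurePosixPath(path).parent.name


def slice_integrity_passes(plan: dict) -> bool:
    if plan.get("slice_exception") in ALLOWED_EXCEPTIONS:
        return True
    # Tier-2 contract-split plans carry both keys and are exempt.
    if plan.get("layer") and plan.get("contract"):
        return True
    must = plan.get("must_haves", {})
    truths = [t for t in must.get("truths", []) if t]
    artifact_layers = {layer_of(a["path"])
                       for a in must.get("artifacts", []) if a.get("path")}
    # "files" is a hypothetical key holding the plan's task file paths.
    file_layers = {layer_of(f) for f in plan.get("files", [])}
    no_user_verb = not any(has_user_verb(t) for t in truths)
    single_layer_artifacts = len(artifact_layers) <= 1
    no_boundary_crossing = len(file_layers) <= 1
    # FAIL only when all three conditions hold together.
    return not (no_user_verb and single_layer_artifacts and no_boundary_crossing)
```

Because the gate fails only when all three conditions hold, a plan with a user-observable truth passes even if its artifacts sit in one directory.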

package/template/.claude/skills/reviewing/SKILL.md CHANGED

@@ -150,6 +150,52 @@ refactoring_scan:
       suggested_approach: "Extract shared validateEmail() helper to src/utils/validation.ts"
 ```
+### Part 4: E2E Suite Audit (M9)
+Three sub-checks. All advisory. None block milestone close.
+**1. Soft-cap warning**
+- Read `verification.e2e_soft_cap` from `.forge/project.yml`. Default 10 if absent.
+- Count `e2e: true` stories in the active milestone's `.forge/requirements/m{N}.yml`.
+- If count > cap → warn: `"E2E soft cap exceeded: {count}/{cap} stories flagged. Trim e2e:true stories or raise verification.e2e_soft_cap in project.yml. Soft cap — does not block."`
+- **Skip-clean:** zero `e2e:true` stories → sub-check omitted from report.
+**2. Orphan-test detection**
+- Glob for e2e test files. Stack-detect from `project.yml` `interface_tools` (fallback: Playwright `tests/e2e/**/*.spec.ts` + `e2e/**/*.spec.ts`; pytest `tests/e2e/test_*.py`; go `e2e/*_test.go`).
+- For each file, grep for `story: FR-` (either in comment or test/function name).
+- If no match → flag: `"Orphan e2e test: {path} — no FR-XXX reference found. Either tag the story or delete the test."`
+- List orphans in a dedicated subsection.
+- **Skip-clean:** zero e2e files discovered → sub-check omitted from report.
+**3. Flake-rate signal**
+- Best-effort. Attempt sources in order:
+  1. `.forge/testing/suite-health.md` flake entries (tester analyst-mode output)
+  2. GitHub Actions test summary artifacts (parse from `.github/workflows/` outputs if accessible)
+  3. Local `playwright-report/` retry counts (if present)
+- Aggregate per-test flake count. Surface top 5 flakiest with counts.
+- If no source available AND e2e files exist → emit `"Flake-rate: no data (run testing skill analyst-mode for suite-health.md)"`.
+- Never blocks.
+- **Skip-clean:** zero e2e files discovered → sub-check omitted entirely (no "no data" line).
+**Section-level skip-clean:** zero e2e test files AND zero `e2e:true` stories → omit the entire "E2E Suite Audit" section from the health report.
+```yaml
+e2e_suite_audit:
+  soft_cap:
+    count: 3
+    cap: 10
+    status: ok                # ok | exceeded
+  orphan_tests:
+    files_scanned: 4
+    orphans: []               # list of paths with no story: FR- reference
+  flake_rate:
+    source: "suite-health.md" # or "no data"
+    top_flaky: []             # [{path, count}, ...]
+```
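
The soft-cap and orphan-test sub-checks above reduce to a count and a substring search. The sketch below assumes stories are already parsed into dicts and that test files are passed in as a path-to-text mapping; both function names are hypothetical. It follows the literal `story: FR-` grep and does not try to recognise an FR ID that appears only in a test name.

```python
def soft_cap_status(stories: list, cap: int = 10) -> dict:
    """Count e2e:true stories and compare against the advisory cap."""
    count = sum(1 for s in stories if s.get("e2e") is True)
    return {"count": count, "cap": cap,
            "status": "exceeded" if count > cap else "ok"}


def find_orphans(files: dict) -> list:
    """Return paths whose text carries no 'story: FR-' reference."""
    return sorted(path for path, text in files.items()
                  if "story: FR-" not in text)
```

Both results are advisory: they feed the `soft_cap` and `orphan_tests` keys of the report and never block milestone close.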
 ## Step 4: Score
 **Per-category:**
@@ -325,9 +371,21 @@ If the milestone being completed has `milestone.origin: {R-id}` set (promoted fr
 3. Update item: `status: resolved`, set `completed: "<ISO 8601 date>"`. Keep `promoted_to: {milestone-id}` intact for audit trail.
 4. Log in summary: *"Backlog item {R-id} → resolved (promoted milestone {id} complete)."*
+## Contract Landing (cross-layer phases)
+If the milestone's phases produced `contract.md` files (planning Step 6.1 Tier 1/2), close their lifecycle before completing the milestone. The durable contract is the ADR; the per-phase `contract.md` is a working delta that must be folded back in.
+1. Glob `.forge/phases/m{id}-*/contract.md`.
+2. For each contract not yet `absorbed` (Tier-2 lands at `ratified` after the executing seam check; Tier-1 lands at `proposed` — both fold the same way now that the phase is verified):
+   - Fold `delta` into its `governing_adr` — amend the ADR in `.forge/decisions/`, or supersede it (`Status: Superseded by ADR-{NNN}`) if the shape changed materially.
+   - If a **new** integration point was introduced, add it to `.forge/contracts/index.yml` `integration_points:` (id, produces, consumes, governing_adr, summary).
+   - Set the contract's `status: absorbed`. The ADR is now authoritative; `contract.md` becomes history.
+3. Any contract with no `governing_adr` set (nothing to fold into) → warn: *"Contract {integration_point} has no governing ADR — file one in `.forge/decisions/` before close, or the durable contract drifts from code."* Advisory — does not block completion.
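
The status bookkeeping in Contract Landing can be sketched as follows. This is only an illustration under assumptions: `land_contracts` is a hypothetical helper, contracts are plain dicts with the template's keys, the warning text is abbreviated, and a contract without a governing ADR is left at its current status (the skill text does not say what its status becomes).

```python
def land_contracts(contracts: list) -> list:
    """Mark foldable contracts absorbed; return advisory warnings."""
    warnings = []
    for c in contracts:
        if c.get("status") == "absorbed":
            continue  # already folded into its ADR
        if not c.get("governing_adr"):
            # Advisory only: nothing to fold into, so the status is left alone.
            warnings.append(
                f"Contract {c.get('integration_point', '?')} has no governing ADR")
            continue
        # Folding `delta` into the ADR text is editorial work done by the agent;
        # this sketch only records that it happened.
        c["status"] = "absorbed"
    return warnings
```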
 ## Phase Handoff
 1. Confirm report + backlog
 2. **Run promoted-milestone completion hook** (above) if `milestone.origin` set
-3. Set `current.status: complete` and `current.completed_at: "<ISO 8601 timestamp>"`
-4. *"Milestone [{name}] complete. Report: `.forge/audits/milestone-{id}-health-report.md`. {N} backlog items. `/forge` or backlog."*
+3. **Run Contract Landing** (above) for any cross-layer phases: fold every contract not yet `absorbed` (Tier-2 at `ratified`, Tier-1 at `proposed`) into its governing ADR
+4. Set `current.status: complete` and `current.completed_at: "<ISO 8601 timestamp>"`
+5. *"Milestone [{name}] complete. Report: `.forge/audits/milestone-{id}-health-report.md`. {N} backlog items. `/forge` or backlog."*

package/template/.claude/skills/testing/SKILL.md CHANGED

@@ -66,6 +66,35 @@ Read: .github/workflows/* → CI config (ci-check mode, analyst CI sub-check)
 ### Author Mode
+#### E2E Preflight Gate (M9)
+Runs ONLY for e2e authoring requests. Integration-test authoring + analyst mode skip this gate entirely.
+**Preconditions per story** — for every FR the user requests an e2e for:
+1. Read `.forge/requirements/m{N}.yml`, locate the FR by ID.
+2. Check `e2e: true`. If false or missing → REFUSE with:
+   `"Story {FR-ID} not flagged for e2e — add `e2e: true` + `observable_outcome` in requirements/m{N}.yml first (planning skill captures this during story breakdown)."`
+3. Check `validated: true`. If false or missing → REFUSE with:
+   `"Story {FR-ID} not yet validated by human — run verifying skill and walk the flow first (the e2e validation gate writes validated:true on confirmation)."`
+4. Recompute `observable_outcome_hash` from current outcome text (SHA-256 utf-8, first 12 hex). Compare to stored hash. If mismatch → REFUSE with:
+   `"Story {FR-ID} observable_outcome changed since validation — re-run verifying skill to re-validate the updated flow."`
+5. Only when all three pass: proceed to author the e2e test.
+**Story stamping (required on every authored e2e)** — every generated e2e file MUST include the story reference. Use the framework's natural mechanism:
+- Playwright / Vitest / Jest TS: `// story: FR-XXX` at the top of the spec file AND the FR ID in the test name (e.g. `test('FR-053: user signs in with correct credentials', ...)`)
+- pytest: `# story: FR-XXX` at the top of the test module AND in the test function name (`def test_FR_053_user_signs_in(...)`)
+- go test: `// story: FR-XXX` above the test function AND in the test name (`func TestFR053UserSignsIn(t *testing.T)`)
+No story ID = orphan. Reviewing skill (phase 17) flags orphans for deletion.
+**Integration + analyst modes** — unchanged. No flag check, no validated check, no story-ID stamping enforcement. M9 lock is e2e-only.
+Refusal message wording is contract (NFR-009 requires story ID + exact missing field). Do not paraphrase.
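
The three preconditions above can be sketched as an ordered check. This is a minimal sketch, assuming the FR entry is already parsed into a dict; `preflight` and its return codes are hypothetical. It returns short reason codes rather than the refusal strings, because the skill treats the exact refusal wording as contract.

```python
import hashlib
from typing import Optional


def outcome_hash(outcome: str) -> str:
    # SHA-256 over the UTF-8 bytes, truncated to the first 12 hex chars.
    return hashlib.sha256(outcome.encode("utf-8")).hexdigest()[:12]


def preflight(fr: dict) -> Optional[str]:
    """Return a reason code for refusal, or None when authoring may proceed."""
    # Lazy fields: an absent key means e2e:false / validated:false.
    if fr.get("e2e") is not True:
        return "not_flagged"
    if fr.get("validated") is not True:
        return "not_validated"
    stored = fr.get("observable_outcome_hash")
    if outcome_hash(fr.get("observable_outcome", "")) != stored:
        return "outcome_changed"
    return None
```

The order matters: a story that is neither flagged nor validated reports the missing `e2e` flag first, matching the numbered steps.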
+#### Standard author flow
 1. **Determine layer** — e2e vs integration. Ask if ambiguous.
 2. **Select runner:**
    - e2e + web/TS → **Playwright** (only option v1 — non-web e2e deferred)

package/template/.claude/skills/verifying/SKILL.md CHANGED

@@ -91,6 +91,35 @@ Re-run verifying after tests are added.
 If detection is ambiguous (e.g. API tests hard to grep definitively) → lean toward PASS to avoid false blocks; note uncertainty in the verdict.
+## E2E Validation Gate (M9)
+Runs AFTER code-level verification commands pass. Skipped if no `e2e:true` stories in the active milestone.
+### Steps
+1. Read `.forge/requirements/m{N}.yml` for the active milestone. Collect every functional requirement with `e2e: true`.
+2. If list is empty → skip gate silently. No prompt. No error.
+3. For each `e2e:true` FR, present to the human:
+   - FR ID + description
+   - `observable_outcome` text verbatim
+   - Prompt: *"Walk this flow manually. Did the observable outcome occur? [confirm | decline | skip]"*
+4. Per response:
+   - **confirm** → compute `observable_outcome_hash` = SHA-256(observable_outcome utf-8), truncate to first 12 hex chars. Write `validated: true` + the hash to the FR entry in `requirements/m{N}.yml`.
+   - **decline** → leave `validated: false`. Record decline + reason (free text) in the verification report.
+   - **skip** → leave `validated: false`. Record skip in the verification report. No reason required.
+5. **Hash drift check** (run BEFORE prompting, every gate invocation): for each `e2e:true` FR with `validated: true`, recompute hash from current `observable_outcome`. If it differs from stored `observable_outcome_hash` → set `validated: false`, clear hash. Note auto-reset in verification report. Then prompt that FR as unvalidated.
+6. Write per-FR validation outcomes into the verification report under section "E2E Validation".
+### Gate behavior
+- **Advisory, not blocking.** Verifying still passes even if no stories validated — the hard gate is in `testing` skill author-mode (phase 16). This gate's job is to surface + record, not block.
+- Per-story (not batch). Human walks one at a time.
+- Hash: SHA-256, UTF-8 input, hex output truncated to first 12 chars. Deterministic across machines.
+### Skip-clean
+Milestones with zero `e2e:true` stories never see this gate. Verifying logs nothing — appears as if the gate doesn't exist.
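
The hash drift check and the per-response writes described in this gate can be sketched as two small functions. The helper names are hypothetical, the FR entry is assumed to be a parsed dict, and "clear hash" is modelled here by deleting the key, which is an assumption (an empty string would serve equally well).

```python
import hashlib


def outcome_hash(outcome: str) -> str:
    # SHA-256 over the UTF-8 bytes, hex output truncated to 12 chars.
    return hashlib.sha256(outcome.encode("utf-8")).hexdigest()[:12]


def reset_if_drifted(fr: dict) -> bool:
    """Un-validate an FR whose outcome text changed. Returns True on reset."""
    if fr.get("e2e") is True and fr.get("validated") is True:
        current = outcome_hash(fr.get("observable_outcome", ""))
        if current != fr.get("observable_outcome_hash"):
            fr["validated"] = False
            fr.pop("observable_outcome_hash", None)  # "clear hash"
            return True
    return False


def apply_response(fr: dict, response: str) -> None:
    """Write the human's answer back onto the FR entry."""
    if response == "confirm":
        fr["validated"] = True
        fr["observable_outcome_hash"] = outcome_hash(fr["observable_outcome"])
    elif response in ("decline", "skip"):
        fr["validated"] = False
    else:
        raise ValueError(f"unknown response: {response!r}")
```

Running `reset_if_drifted` before prompting means an edited outcome is always walked again before a hash is rewritten.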
 ## 3-Level Goal-Backward Verification
 ### Level 1: Observable Truths

package/template/.forge/templates/contract.md ADDED

@@ -0,0 +1,27 @@
+# Phase Contract: {integration point}
+Copy to `.forge/phases/m{M}-{N}-{name}/contract.md` when planning **Step 6.1** detects a cross-layer delta (Tier 1 or 2). Pins the NEW or CHANGED interface shape the producing and consuming layers must agree on for this phase, so an agent building one layer in isolation does not have to guess the other's shape.
+Lifecycle: pinned by the producer plan (NNa) BEFORE the consumer plan (NNb) builds against it -> ratified at the Tier-2 gate -> folded into the governing ADR when the phase lands (`status: absorbed`). The durable contract is the ADR; this file is the working delta on top of it.
+---
+```yaml
+contract:
+  integration_point: ""        # e.g. "engine -> ui (block descriptor)"
+  governing_adr: ""            # durable contract this delta extends, e.g. "ADR-026" (see .forge/contracts/index.yml)
+  producer_layer: ""           # owns + pins the shape (plan-NNa), e.g. "engine"
+  consumer_layer: ""           # builds against it (plan-NNb), e.g. "ui"
+  status: proposed             # proposed | ratified | absorbed
+  delta: |
+    # The exact new/changed shape the consumer must NOT have to guess:
+    # struct fields, ABI fields + sizeof/layout, function signatures,
+    # enum values, units, ownership. ASCII only.
+  seam_check: ""               # how integration verifies producer output == consumer expectation,
+                               # e.g. "ui block_shape_projection_test asserts port count from descriptor"
+```
+## Notes
+- One contract block per cross-layer delta. A phase changing two integration points pins two blocks (or two files).
+- The consumer plan's `depends_on` points at THIS contract reaching `status: ratified`, NOT the producer plan's completion -- that is what lets the layers build in parallel.
+- On landing: amend or supersede `governing_adr` to absorb `delta`, set `status: absorbed`. The ADR is then authoritative; this file becomes history.

package/template/.forge/templates/contracts-index.yml ADDED

@@ -0,0 +1,35 @@
+# Durable cross-layer contract index
+#
+# Maps each standing integration point between layers to the ADR(s) that
+# govern it. Planning Step 6.1 reads this to (a) know an integration point
+# already has a durable contract, (b) decide whether the current phase
+# CHANGES it, (c) point producer/consumer agents at the authoritative shape.
+#
+# Durable contracts live in the ADRs themselves (.forge/decisions/). This
+# index is only the lookup. A phase that changes a contract pins a per-phase
+# delta in its contract.md, then folds it back into the governing ADR.
+#
+# Copy to .forge/contracts/index.yml and fill in for your project.
+# The project's architectural layers -- used by the cross-layer trigger
+# (Step 6.1 condition 1: "work touches >= 2 declared layers").
+layers:
+  - name: ""                   # e.g. "engine"
+    path: ""                   # e.g. "engine/"
+  # - name: "blocks"
+  #   path: "blocks/"
+  # - name: "ui"
+  #   path: "ui/"
+# Standing integration points and their governing ADR(s).
+integration_points:
+  - id: ""                     # e.g. "engine<->block"
+    produces: ""               # producer layer, e.g. "engine"
+    consumes: ""               # consumer layer, e.g. "blocks"
+    governing_adr: []          # e.g. ["ADR-001"]
+    summary: ""                # one line: what the contract covers
+  # - id: "engine->ui"
+  #   produces: "engine"
+  #   consumes: "ui"
+  #   governing_adr: ["ADR-026"]
+  #   summary: "Block descriptor: UI projects ports/params/layout from the engine descriptor"
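
The cross-layer trigger quoted in the template ("work touches >= 2 declared layers") can be sketched by mapping changed file paths onto the declared `layers` entries. `touched_layers` is a hypothetical helper and the path matching by directory prefix is an assumption about how layer membership is decided.

```python
from pathlib import PurePosixPath


def touched_layers(files: list, layers: list) -> set:
    """Return the names of declared layers that contain any of the files."""
    hit = set()
    for f in files:
        p = PurePosixPath(f)
        for layer in layers:
            prefix = layer.get("path")
            # Skip blank template entries; compare by path component so that
            # "engine2/x" is not mistaken for part of "engine/".
            if prefix and p.is_relative_to(PurePosixPath(prefix)):
                hit.add(layer["name"])
    return hit


def is_cross_layer(files: list, layers: list) -> bool:
    return len(touched_layers(files, layers)) >= 2
```

With an empty `layers` list the function returns an empty set, which matches the "single-layer project, detection no-ops" behaviour described for `project.yml`.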

package/template/.forge/templates/plan.md CHANGED

@@ -16,20 +16,32 @@ type: execute                        # execute | tdd
 wave: 1                              # Execution wave (1 = no dependencies)
 depends_on: []                       # Plan IDs that must complete first
 autonomous: true                     # false if contains checkpoints
+layer: ""                            # name from project.yml layers[], or "" -- set by planning Step 6.1 cross-layer split (Tier 1/2)
+contract: ""                         # path to this phase's contract.md (cross-layer delta this plan pins/consumes), if any
+# Vertical slice declaration. A plan delivers a thin end-to-end user behavior
+# (UI -> API -> data, or CLI -> core -> output). Plans that touch only ONE layer
+# are rejected by the planning Slice Integrity gate unless slice_exception is set
+# or the plan is a Tier-2 contract split (both layer: and contract: set).
+slice_exception: null                # null | infra_bootstrap | shared_library | data_migration
+slice_exception_rationale: ""        # Required if slice_exception != null. One line.
 must_haves:
-  truths:                            # Observable from user perspective when plan is done
-    - ""                             # e.g., "User can see their profile page"
-    - ""                             # e.g., "API returns user data as JSON"
-  artifacts:                         # Files that must exist and be substantive (not stubs)
-    - path: ""                       # e.g., "src/components/Profile.tsx"
-      provides: ""                   # e.g., "User profile display with avatar and bio"
+  truths:                            # USER-observable when plan is done. Must contain a user verb
+                                     # (see, click, submit, receive, view, download, error, redirect).
+                                     # Internal-only truths like "Schema applied" do NOT satisfy.
+    - ""                             # e.g., "User submits signup form and lands on dashboard"
+    - ""                             # e.g., "Invalid email shows inline 'must be a valid email' error"
+  artifacts:                         # Files that must exist and be substantive (not stubs).
+                                     # Slice plans typically span 2-4 layers — paths should cross
+                                     # boundaries (component + handler + repo), not cluster in one dir.
+    - path: ""                       # e.g., "src/components/SignupForm.tsx"
+      provides: ""                   # e.g., "Signup form with email + password fields"
       min_lines: 30                  # Stub detection threshold
-  key_links:                         # Critical connections between artifacts
-    - from: ""                       # e.g., "src/components/Profile.tsx"
-      to: ""                         # e.g., "/api/users/[id]"
-      via: ""                        # e.g., "fetch in useEffect"
-      pattern: ""                    # e.g., "fetch.*api/users"
+  key_links:                         # Connections between layers — these prove the slice is wired
+    - from: ""                       # e.g., "src/components/SignupForm.tsx"
+      to: ""                         # e.g., "/api/signup"
+      via: ""                        # e.g., "fetch on submit"
+      pattern: ""                    # e.g., "fetch.*api/signup"
 ```
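
The frontmatter fields above lend themselves to mechanical checks. The sketch below is an assumption-laden illustration with hypothetical helper names: it treats `min_lines` as a count of non-blank lines, and it checks a `key_links` entry by running its `pattern` as a regular expression over the text of the `from` file, supplied as a path-to-text mapping.

```python
import re

ALLOWED_EXCEPTIONS = {"infra_bootstrap", "shared_library", "data_migration"}


def exception_ok(frontmatter: dict) -> bool:
    """A non-null slice_exception needs a known value and a rationale."""
    exc = frontmatter.get("slice_exception")
    if exc is None:
        return True
    rationale = frontmatter.get("slice_exception_rationale", "")
    return exc in ALLOWED_EXCEPTIONS and bool(rationale.strip())


def is_stub(text: str, min_lines: int = 30) -> bool:
    # Assumption: only non-blank lines count toward the threshold.
    return sum(1 for line in text.splitlines() if line.strip()) < min_lines


def link_is_wired(link: dict, sources: dict) -> bool:
    """True when the link's pattern occurs in the text of its `from` file."""
    return re.search(link["pattern"], sources.get(link["from"], "")) is not None
```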
 ## Tasks

package/template/.forge/templates/project.yml CHANGED

@@ -14,6 +14,12 @@ tech_stack:
   testing: ""                       # e.g., Vitest, Jest, Pytest
   other: []                         # Additional key dependencies
+layers: []                             # Architectural layers — enables cross-layer contract detection (planning Step 6.1).
+                                       # Populated during init (brownfield: producer/consumer source dirs; greenfield: asked).
+                                       # Each entry: {name, path}. Empty/absent = single-layer project, detection no-ops.
+                                       # e.g. [{name: engine, path: src/engine/}, {name: ui, path: src/ui/}]
+                                       # Durable integration points + governing ADRs live in .forge/contracts/index.yml.
 interface: [none]                      # Surfaces this project exposes: browser | cli | api | desktop | native-apple | none
                                        # Array — e.g. [browser, api] for full-stack projects
@@ -60,6 +66,7 @@ verification:
     #   advisory: true                # pre-existing type errors — warn, don't block
   auto_fix: true                      # On failure, agent fixes and retries
   max_retries: 2                      # Max auto-fix attempts per command (0 = fail immediately)
+  e2e_soft_cap: 10                    # M9: advisory cap on e2e:true stories per milestone. Reviewing warns when exceeded. Soft — never blocks.
   # Advisory mode: commands already failing before Forge started run but don't block — warn only.
 success_criteria:                   # How do we know we're done?

package/template/.forge/templates/requirements.yml CHANGED

@@ -7,6 +7,11 @@
 milestone: 1                         # Milestone this file belongs to (matches state/milestone-{id}.yml)
 version: "v1"                        # v1 = MVP, v2 = next iteration
+# E2E fields (M9): mark `e2e: true` + `observable_outcome` during planning. Verifying skill
+# prompts a human walk; on confirm it sets `validated: true` + `observable_outcome_hash`.
+# Testing skill author-mode refuses e2e without validated:true. Reviewing skill warns on
+# soft-cap exceeded + flags orphan tests. Fields are lazy — absent = e2e:false/validated:false.
 functional:
   # Each requirement: unique ID, description, acceptance criteria, phase assignment
   - id: FR-001
@@ -18,7 +23,12 @@ functional:
     phase: null                      # Assigned during roadmap creation
     priority: P1                     # P1 = must-have, P2 = should-have, P3 = nice-to-have
     status: pending                  # pending | clarifying | planned | implemented | verified
     notes: ""                        # [NEEDS CLARIFICATION] if uncertain
+    # E2E gate (M9). Lazy — absent fields = e2e:false, validated:false.
+    # e2e: false                     # true = story gets one e2e test post-validation
+    # observable_outcome: ""         # one-sentence user-observable outcome (required when e2e:true)
+    # observable_outcome_hash: ""    # auto-computed SHA-256 of outcome (12 hex chars); editing outcome resets validated
+    # validated: false               # set true by verifying skill after human walks the flow
   - id: FR-002
     description: ""

package/template/.forge/templates/roadmap.yml CHANGED

@@ -1,6 +1,16 @@
 # Forge Roadmap Template
 # Copy to .forge/roadmap.yml and customize.
-# Phases are delivery boundaries — coherent, verifiable capabilities.
+#
+# Phases are VERTICAL SLICES — thin end-to-end user journeys shippable on their own.
+# NOT horizontal layers (do NOT carve into "all models" / "all APIs" / "all UI" phases).
+#
+# Right:  m1-user-can-sign-up, m2-user-can-post, m3-user-can-comment
+# Wrong:  m1-models, m2-apis, m3-ui
+#
+# Each phase's `goal:` must read as a user-observable outcome ("User can X"),
+# never as "Build Y". Phase 1 must be demoable. The planning skill's
+# Slice Integrity gate will reject layered roadmaps unless a `slice_exception:`
+# is declared on the plan (infra_bootstrap | shared_library | data_migration).
 roadmap:
   # Milestones group phases into concurrent work streams.
@@ -16,13 +26,15 @@ roadmap:
   # Example: m1-1-foundation/, m1-2-auth/, m2-1-dashboard/
   phases:
     - id: 1
-      name: ""                       # e.g., "Foundation & Architecture"
-      goal: ""                       # Outcome, not task. "Users can X" not "Build Y"
-      requirements: []               # List of FR-IDs this phase delivers
+      name: ""                       # Vertical slice name, e.g., "User can sign up"
+      goal: ""                       # User-observable outcome. "User can X". Never "Build Y".
+      slice_exception: null          # null | infra_bootstrap | shared_library | data_migration
+                                     # Only set if this phase legitimately has no user-facing surface.
+      requirements: []               # List of FR-IDs this phase delivers (end-to-end)
       dependencies: []               # Phase IDs that must complete first
-      success_criteria:              # Observable truths when phase is done
-        - ""                         # e.g., "User can see the landing page"
-        - ""                         # e.g., "API returns valid JSON for /users"
+      success_criteria:              # User-observable truths when phase is done
+        - ""                         # e.g., "User submits signup form and lands on dashboard"
+        - ""                         # e.g., "Invalid email shows inline error"
       estimated_hours: null
       status: pending                # pending | researching | planning | executing | verifying | deferred | complete

package/template/CLAUDE.md CHANGED

@@ -51,15 +51,18 @@ Auto-detects complexity. Override: "Use Quick/Standard/Full tier."
 | Architectural decisions | `architecting` | Full |
 | Break work into tasks with gates | `planning` | Standard, Full |
 | Build with deviation rules + atomic commits | `executing` | All |
-| Prove work delivers on goals | `verifying` | Standard, Full |
-| Audit health + catalog refactoring | `reviewing` | Standard, Full |
+| Prove work delivers on goals (+ M9 e2e validation gate when `e2e:true` stories present) | `verifying` | Standard, Full |
+| Audit health + catalog refactoring (+ M9 e2e soft-cap, orphan-test, flake-rate audits) | `reviewing` | Standard, Full |
 | Small scoped fix | `quick-tasking` | Quick |
 | UI with design system | `designing` | When UI |
 | Security review | `securing` | When auth/data/API |
-| E2E/integration tests + suite audit | `testing` | When UI/flows or flaky suite |
+| E2E/integration tests + suite audit (+ M9 author-mode gate refuses e2e without `e2e:true` + `validated:true`) | `testing` | When UI/flows or flaky suite |
 | Systematic debugging | `debugging` | When stuck |
 | Upgrade Forge files | `upgrading` | On-demand |
 | Cross-session memory | `beads-integration` | When Beads installed |
+| Multi-agent orchestration (experimental) | `orchestrating` | Full (opt-in) |
+> Experimental skills require opt-in install — see `packages/create-forge/experimental/m10/README.md`.
 ## Context Engineering
@@ -125,7 +128,7 @@ State lives in `.forge/`:
 - `project.yml` — Vision, stack, design system, verification, constraints (<5KB)
 - `constitution.md` — Active architectural gates
 - `design-system.md` — Component mapping table
-- `requirements/m{N}.yml` — Per-milestone structured requirements with `[NEEDS CLARIFICATION]` markers. **FR-IDs, DEF-IDs, and NFR-IDs are globally unique across all milestone files** — `FR-001` may exist in exactly one `m{N}.yml`. Before adding a new ID, scan `.forge/requirements/*.yml` for the highest in-use number and continue the sequence. On collision (e.g. during a migration), keep the older milestone's ID and renumber the newer. Concurrent milestones each own their file — no cross-stream contention on file writes, but ID space is shared.
+- `requirements/m{N}.yml` — Per-milestone structured requirements with `[NEEDS CLARIFICATION]` markers. **FR-IDs, DEF-IDs, and NFR-IDs are globally unique across all milestone files** — `FR-001` may exist in exactly one `m{N}.yml`. Before adding a new ID, scan `.forge/requirements/*.yml` for the highest in-use number and continue the sequence. On collision (e.g. during a migration), keep the older milestone's ID and renumber the newer. Concurrent milestones each own their file — no cross-stream contention on file writes, but ID space is shared. Functional requirements may carry M9 e2e gate fields (`e2e`, `observable_outcome`, `observable_outcome_hash`, `validated`) — lazy migration, absent fields default to `e2e:false`/`validated:false`.
 - `roadmap.yml` — Phases, milestones, dependencies
 - `state/index.yml` — Global: active milestones, desire_paths, metrics
 - `state/milestone-{id}.yml` — Per-milestone cursor: position, progress, decisions, blockers
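
The ID allocation rule for `requirements/m{N}.yml` (scan every milestone file for the highest in-use number, then continue the sequence) can be sketched as below. `next_id` is a hypothetical helper, file contents are passed in as strings rather than read from `.forge/requirements/*.yml`, and the three-digit zero padding is inferred from examples such as `FR-001`.

```python
import re

ID_RE = re.compile(r"\b(FR|DEF|NFR)-(\d+)\b")


def next_id(prefix: str, texts: list) -> str:
    """Return the next free ID for a prefix across all milestone files."""
    highest = max(
        (int(num) for text in texts
         for found, num in ID_RE.findall(text) if found == prefix),
        default=0,
    )
    # Assumption: IDs are zero-padded to three digits, as in FR-001.
    return f"{prefix}-{highest + 1:03d}"
```

Scanning all files together is what keeps the ID space shared even though each concurrent milestone owns its own file.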
@@ -173,6 +176,7 @@ verification:
 - Auto-fix loop: read output → fix → amend → re-run (up to max_retries)
 - 3-strike: retries count toward task limit
 - Empty commands = no gate (opt-out)
+- `verification.e2e_soft_cap` (default 10) — advisory cap on `e2e:true` stories per milestone surfaced by the `reviewing` skill. Soft — never blocks.
 ## Beads Integration (Optional)