npm - forge-orkes - Versions diffs - 0.13.0 → 0.14.0 - Mend

forge-orkes 0.13.0 → 0.14.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (14)

package/package.json +1 -1
package/template/.claude/hooks/README.md +76 -0
package/template/.claude/hooks/forge-claim-check-doctor.sh +37 -0
package/template/.claude/hooks/forge-claim-check.sh +96 -0
package/template/.claude/skills/forge/SKILL.md +3 -3
package/template/.claude/skills/orchestrating/SKILL.md +135 -0
package/template/.claude/skills/orchestrating/bootstrap-checks.md +48 -0
package/template/.claude/skills/planning/SKILL.md +11 -0
package/template/.claude/skills/reviewing/SKILL.md +46 -0
package/template/.claude/skills/testing/SKILL.md +29 -0
package/template/.claude/skills/verifying/SKILL.md +29 -0
package/template/.forge/templates/project.yml +1 -0
package/template/.forge/templates/requirements.yml +11 -1
package/template/CLAUDE.md +5 -4

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "forge-orkes",
-  "version": "0.13.0",
+  "version": "0.14.0",
   "description": "Set up the Forge meta-prompting framework for Claude Code in your project",
   "bin": {
     "create-forge": "./bin/create-forge.js"

package/template/.claude/hooks/README.md ADDED

@@ -0,0 +1,76 @@
+# Forge Hooks
+## `forge-claim-check.sh` — PreToolUse claim-check
+Cross-session file-claim collision detector. Pairs with the Forge MCP
+orchestrator (`.forge/.mcp-server/`) to prevent two concurrent Claude Code
+sessions from clobbering each other's edits on the same file.
+### Behavior
+Reads the Claude Code `PreToolUse` JSON payload on stdin. Extracts target
+file path(s) from `tool_input.file_path` (Edit, Write, MultiEdit),
+`tool_input.notebook_path` (NotebookEdit), and, defensively,
+`tool_input.path` and `tool_input.edits[].file_path`. For each path, queries
+the claims DB (default `.forge/.mcp-server/claims.db`) for an active claim.
+| Situation | Exit | Effect |
+|---|---|---|
+| No claim, or DB missing (fresh repo) | `0` | allow |
+| Claim held by current `CLAUDE_SESSION_ID` | `0` | allow |
+| `CLAUDE_SESSION_ID` unset (single-agent / non-Claude invocation) | `0` | allow + stderr warning |
+| Unknown payload schema (no recognized path field) | `0` | allow |
+| Claim held by another session | `2` | deny, stderr names owner + expiry |
+| Any unexpected error (corrupt DB, jq failure, sqlite timeout, etc.) | `2` | fail-closed deny |
+**Never exits 1.** Claude Code blocks a tool call only on exit `2`; any other
+non-zero code is treated as a non-blocking error. A collision needs a hard
+block, so every deny path uses `exit 2`.
+### Prerequisites
+- `bash` (≥ 4 recommended — relies on `set -u` array safety patterns)
+- `jq`
+- `sqlite3`
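+- `timeout` (GNU coreutils) **or** `gtimeout` (macOS, `brew install coreutils`): optional but recommended. Without it the `sqlite3` call has no wall-clock limit. `busy_timeout` is a per-connection SQLite setting, so one configured by the MCP server does not apply to the hook's own `sqlite3` connection.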
+Run `bash .claude/hooks/forge-claim-check-doctor.sh` to verify prerequisites.
+### Environment
+| Var | Source | Purpose |
+|---|---|---|
+| `CLAUDE_PROJECT_DIR` | Claude Code | Project root, used to resolve relative paths and locate DB |
+| `CLAUDE_SESSION_ID` | Claude Code | Current session identifier — own claims pass through |
+| `FORGE_CLAIMS_DB` | optional override | Path to `claims.db` (defaults to `$CLAUDE_PROJECT_DIR/.forge/.mcp-server/claims.db`) |
+### Registration
+Not registered automatically. The install procedure (plan-06) adds the
+`PreToolUse` entry to `.claude/settings.json`:
+```json
+{
+  "hooks": {
+    "PreToolUse": [
+      {
+        "matcher": "Edit|Write|MultiEdit|NotebookEdit",
+        "hooks": [
+          { "type": "command", "command": "$CLAUDE_PROJECT_DIR/.claude/hooks/forge-claim-check.sh" }
+        ]
+      }
+    ]
+  }
+}
+```
+### Disabling
+Rename or remove the hook entry in `.claude/settings.json`, or set the file
+non-executable: `chmod -x .claude/hooks/forge-claim-check.sh`. The hook is
+defense-in-depth — the MCP server's `forge_claim_files` tool remains the
+primary coordination point.
+### Troubleshooting
+- "internal error at line N" on every edit → corrupt DB or missing tool. Run doctor. Common: `jq` not on PATH.
+- No collisions detected → confirm `CLAUDE_SESSION_ID` is set and `claims.db` exists. If either is missing, the hook fails open.
+- macOS `timeout: command not found` → `brew install coreutils` for `gtimeout`, or skip it (the query then runs without a wall-clock limit).

package/template/.claude/hooks/forge-claim-check-doctor.sh ADDED

@@ -0,0 +1,37 @@
+#!/usr/bin/env bash
+# Forge claim-check hook prerequisites probe. Informational only — always exit 0.
+# Run manually or via install procedure to confirm bash/jq/sqlite3/timeout availability.
+set -uo pipefail
+check() {
+  local name=$1 cmd=$2 version_flag=${3:---version}
+  if command -v "$cmd" >/dev/null 2>&1; then
+    local ver
+    ver=$("$cmd" "$version_flag" 2>&1 | head -1)
+    printf '  ✓ %-10s %s\n' "$name" "$ver"
+  else
+    printf '  ✗ %-10s MISSING\n' "$name"
+  fi
+}
+echo "Forge claim-check hook — prerequisites"
+echo
+check "bash"     "bash"     "--version"
+check "jq"       "jq"       "--version"
+check "sqlite3"  "sqlite3"  "--version"
+if command -v timeout >/dev/null 2>&1; then
+  printf '  ✓ %-10s %s\n' "timeout" "$(timeout --version 2>&1 | head -1)"
+elif command -v gtimeout >/dev/null 2>&1; then
+  printf '  ✓ %-10s (gtimeout) %s\n' "timeout" "$(gtimeout --version 2>&1 | head -1)"
+else
+  printf '  ✗ %-10s MISSING (optional — install coreutils for bounded queries)\n' "timeout"
+fi
+echo
+echo "DB lookup path: ${FORGE_CLAIMS_DB:-${CLAUDE_PROJECT_DIR:-$PWD}/.forge/.mcp-server/claims.db}"
+echo "Session id:     ${CLAUDE_SESSION_ID:-<unset — hook will fail-open>}"
+exit 0

package/template/.claude/hooks/forge-claim-check.sh ADDED

@@ -0,0 +1,96 @@
+#!/usr/bin/env bash
+# Forge PreToolUse hook — cross-session file-claim collision detector.
+#
+# Contract:
+#   stdin: Claude Code PreToolUse JSON payload
+#   exit 0 = allow (no claim, own claim, no DB, no session context, unknown schema)
+#   exit 2 = deny (cross-session claim active, or any internal error — fail-closed)
+#
+# Defense-in-depth: ERR trap converts ANY unexpected failure into an exit-2 deny.
+# Never exit 1 — Claude Code treats exit 1 as soft warning; we want hard block on error.
+set -euo pipefail
+deny() {
+  echo "[forge-hook] $*" >&2
+  exit 2
+}
+trap 'deny "internal error at line $LINENO — denying for safety (fail-closed)"' ERR
+PROJECT_DIR="${CLAUDE_PROJECT_DIR:-$PWD}"
+DB="${FORGE_CLAIMS_DB:-$PROJECT_DIR/.forge/.mcp-server/claims.db}"
+SESSION_ID="${CLAUDE_SESSION_ID:-}"
+# Detect timeout wrapper (macOS lacks GNU `timeout` unless coreutils installed → `gtimeout`).
+if command -v timeout >/dev/null 2>&1; then
+  TIMEOUT_CMD=(timeout 4)
+elif command -v gtimeout >/dev/null 2>&1; then
+  TIMEOUT_CMD=(gtimeout 4)
+else
+  TIMEOUT_CMD=()  # no wrapper: the sqlite3 call runs without a wall-clock limit
+fi
+PAYLOAD=$(cat)
+# Normalize across known path fields:
+#   Edit / Write           → tool_input.file_path
+#   NotebookEdit           → tool_input.notebook_path   (validated in spike — see milestone-10-validation.md §2)
+#   MultiEdit              → tool_input.file_path       (single file, multiple edits; per docs)
+#   Future / unknown tools → tool_input.path            (defensive)
+# jq emits each matching path on its own line. Empty output = no path field present.
+FILES=$(printf '%s' "$PAYLOAD" | jq -r '
+  [ .tool_input.file_path?
+  , .tool_input.notebook_path?
+  , .tool_input.path?
+  , ( .tool_input.edits? // [] | .[]?.file_path? )
+  ] | map(select(. != null and . != "")) | unique | .[]
+')
+if [ -z "$FILES" ]; then
+  # No recognized path field — unknown schema. Allow (do not deny on schema drift).
+  exit 0
+fi
+# No DB = MCP server has never run in this repo (fresh-repo case). Fail-open only here.
+if [ ! -f "$DB" ]; then
+  exit 0
+fi
+if [ -z "$SESSION_ID" ]; then
+  # No session context — single-agent or hook invoked outside Claude Code. Allow.
+  echo "[forge-hook] CLAUDE_SESSION_ID unset — allowing (single-agent mode)" >&2
+  exit 0
+fi
+# Resolve each path to absolute (matches what MCP server stores via path.resolve).
+abspath() {
+  case "$1" in
+    /*) printf '%s' "$1" ;;
+    *)  printf '%s/%s' "$PROJECT_DIR" "$1" ;;
+  esac
+}
+while IFS= read -r raw; do
+  [ -z "$raw" ] && continue
+  file=$(abspath "$raw")
+  # Escape single quotes ('' inside a '...' literal) so a path containing a quote
+  # cannot break out of the SQL string.
+  q="'"
+  file_sql=${file//$q/$q$q}
+  result=$("${TIMEOUT_CMD[@]+"${TIMEOUT_CMD[@]}"}" sqlite3 -batch "$DB" \
+    "SELECT session_id || '|' || expires_at FROM claims WHERE file_path = '$file_sql' AND expires_at > strftime('%s','now') LIMIT 1;")
+  [ -z "$result" ] && continue
+  owner="${result%%|*}"
+  expires_at="${result##*|}"
+  if [ "$owner" = "$SESSION_ID" ]; then
+    continue  # own claim
+  fi
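+  # BSD/macOS date formats an epoch with -r N; GNU date needs -d @N (GNU -r reads a file's mtime).
+  expires_human=$(date -r "$expires_at" "+%Y-%m-%d %H:%M:%S %Z" 2>/dev/null \
+    || date -d "@$expires_at" "+%Y-%m-%d %H:%M:%S %Z" 2>/dev/null \
+    || echo "epoch:$expires_at")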
+  deny "Edit denied: $file claimed by session $owner until $expires_human. Call forge_release_claims in that session, or wait."
+done <<< "$FILES"
+exit 0
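`abspath` above only prefixes relative paths with `$PROJECT_DIR`. It does not collapse `.` or `..` segments. Node's `path.resolve`, which the hook comment says the MCP server uses, does normalize. A payload path such as `src/../app.ts` could therefore be looked up under a different key than the one the server stored.

Below is a minimal lexical normalizer in pure bash. It is a sketch under that assumption, not the hook's code. `resolve_path` is a hypothetical name. It does not resolve symlinks, and neither does `path.resolve`.

```bash
#!/usr/bin/env bash
# HYPOTHETICAL helper: resolve_path BASE PATH
# Anchors PATH at BASE when relative, then collapses empty, "." and ".."
# segments lexically (no filesystem access, symlinks untouched).
resolve_path() {
  local p=$2 seg
  local -a parts out=()
  case "$p" in
    /*) ;;                   # already absolute
    *)  p="$1/$p" ;;         # relative: anchor at the project root
  esac
  IFS=/ read -r -a parts <<< "$p"
  for seg in "${parts[@]}"; do
    case "$seg" in
      ''|.) ;;                                                     # skip
      ..) [ ${#out[@]} -gt 0 ] && out=("${out[@]:0:${#out[@]}-1}") ;; # pop; no-op at root
      *)  out+=("$seg") ;;
    esac
  done
  # Same empty-array guard the hook uses for TIMEOUT_CMD; prints "/" when empty.
  printf '/%s' ${out[@]+"${out[@]}"}
}
```

Using it in place of `abspath` would be a one-line change (`file=$(resolve_path "$PROJECT_DIR" "$raw")`). Whether that change is needed depends on what the MCP server actually stores.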

package/template/.claude/skills/forge/SKILL.md CHANGED

@@ -223,7 +223,7 @@ Where `{source}` = `skills.{name}` | `models.default` | `parent session`. Suppre
 | reviewing | sonnet | Audit judgment |
 | quick-tasking | haiku | Speed |
 | discussing | sonnet | Conversation |
-| testing | sonnet | Code gen (author) + audit judgment (analyst) — matches executing/reviewing |
+| testing | sonnet | Code gen (author) + audit judgment (analyst) — matches executing/reviewing. M9: author-mode refuses e2e without `e2e:true` + `validated:true`. |
 | deferred | haiku | Read + format only |
 | `current.status` | Route To |
@@ -234,8 +234,8 @@ Where `{source}` = `skills.{name}` | `models.default` | `parent session`. Suppre
 | `architecting` | `Skill(architecting)` → planning |
 | `planning` | `Skill(planning)` → executing |
 | `executing` | `Skill(executing)` → verifying |
-| `verifying` | `Skill(verifying)` → reviewing |
-| `reviewing` | `Skill(reviewing)` → complete |
+| `verifying` | `Skill(verifying)` → reviewing — runs M9 e2e validation gate when `e2e:true` stories present |
+| `reviewing` | `Skill(reviewing)` → complete — adds M9 e2e suite audit (soft-cap, orphans, flake-rate) |
 | `complete` | Done. Ask what's next. |
 | `deferred` | Milestone frozen. *"Resume milestone {id}" to reactivate.* |
 | `quick-tasking` | `Skill(quick-tasking)` |

package/template/.claude/skills/orchestrating/SKILL.md ADDED

@@ -0,0 +1,135 @@
+---
+name: orchestrating
+description: "[Experimental — M10] Owns multi-agent session lifecycle. Bootstrap, worktree create, claim+merge coordination, teardown. Refuses worktree mode on incompatible repos and falls back to single-agent."
+---
+# Orchestrating
+Multi-agent session lifecycle. Worktree isolation + MCP-coordinated claims + merge queue. Experimental — opt-in per ADR-001. Refuses on incompatible repos, falls back to single-agent.
+## When to use
+- User explicitly invokes multi-agent mode (`/forge` argument selects multi-agent, or direct skill invocation).
+- `executing` skill at Full tier with ≥2 concurrent-eligible phases routes through this skill.
+Skip on Quick tier, single-phase work, or when bootstrap checks fail.
+## Step 1: Bootstrap
+Run all checks in `bootstrap-checks.md`:
+1. Git version ≥ 2.48
+2. LFS version ≥ 3.6 (skip if not installed)
+3. No submodules
+4. `core.hooksPath` empty or resolves inside worktree
+5. `git hook run pre-commit` smoke test in fresh worktree
+**Any check fails** → log reason, write `lifecycle.worktree_mode: refused` + `lifecycle.refused_reason` into active milestone state, emit fallback message (see `bootstrap-checks.md`), return to caller. Caller continues single-agent.
+## Step 2: Session ID + worktree
+```bash
+session_id=$(uuidgen | cut -c1-8)
+git worktree prune
+git worktree add -b forge/${session_id} --lock --reason "forge session" ../forge-worktrees/${session_id} main
+( cd ../forge-worktrees/${session_id} && git hook run pre-commit || true )
+```
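+Verify the worktree directory exists and the worktree is locked (its record in `git worktree list --porcelain` has a `locked` line). On failure → clean up partial state, refuse worktree mode.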
+## Step 3: State update
+Write into `.forge/state/milestone-{id}.yml`:
+```yaml
+lifecycle:
+  session_id: "{session_id}"
+  worktree_path: "../forge-worktrees/{session_id}"
+  worktree_branch: "forge/{session_id}"
+  worktree_mode: "active"
+  started_at: "{ISO8601}"
+```
+Update `state/index.yml` milestone `last_updated`.
+## Step 4: Hand back
+Return control to caller (typically `executing`). All subsequent work runs inside `../forge-worktrees/{session_id}`. Caller honors claim convention.
+### Claim convention (executing-skill contract)
+Before any `Edit`, `Write`, `MultiEdit`, or `NotebookEdit` on a file outside `.forge/state/milestone-{own_id}.yml`:
+1. Call `forge_claim_files` with `{ session_id, files: [...], ttl_seconds: 900 }`.
+2. On `granted` → proceed with edit.
+3. On `conflict: { holder_session, files: [...] }` → surface holder + files to user. Options:
+   - **wait** → poll `forge_claim_status` until released or TTL expiry.
+   - **skip** → drop the conflicted file from the task scope, continue with rest.
+   - **steal** → only if holder session is provably dead (PreToolUse hook validates). Otherwise refused.
+4. After edit batch → claim auto-extends on continued use; explicit `forge_release_claims` on plan-complete.
+PreToolUse hook (installed by plan-03) enforces this — uncaught violations block at hook level, not skill level.
+## Step 5: Teardown
+Triggered when caller signals work complete OR user requests teardown.
+```
+forge_queue_commit(branch=forge/{session_id}, base_sha={merge_base})
+```
+Branch on response status:
+- **`merged`** → `forge_release_claims(session_id)` → `git worktree remove --force --force ../forge-worktrees/{session_id}` (the worktree was added with `--lock`, and Git removes a locked worktree only when `--force` is given twice) → `git branch -d forge/{session_id}` → reset `lifecycle.*` (set `worktree_mode: complete`, retain `session_id` for audit).
+- **`conflict`** → invoke `Skill(debugging)` with payload `{ conflicted_files, base_sha, messages, branch }`. Teardown blocks until debugging signals resolution (re-invoke teardown after fix).
+- **`stale_base`** → caller rebases worktree branch onto `current_main_sha` from response, retries `forge_queue_commit`. Max 3 retries → escalate to conflict path.
+## Step 6: Crash recovery (next session start)
+If no clean teardown happened previously:
+1. `git worktree prune` — drops stale admin dirs.
+2. `git branch --list 'forge/*'` — for each branch with no live worktree, prompt user:
+   - **resume** → re-attach: `git worktree add ../forge-worktrees/{id} forge/{id}` and restore lifecycle state.
+   - **delete** → `git branch -D forge/{id}`.
+3. MCP server startup handles pidfile takeover + claim TTL expiry independently (see ADR-003).
+## Failure modes & operator notes
+- **MCP server absent** — bootstrap check 5 (hook smoke) will not detect this. Skill detects on first `forge_claim_files` call: error `MCP_SERVER_UNAVAILABLE` → write `lifecycle.worktree_mode: degraded`, warn user, continue without claim coordination. Hard isolation (worktree) still active; coordination is downgraded to best-effort.
+- **Disk full during worktree add** — `git worktree add` will error. Cleanup any partial `../forge-worktrees/{session_id}/` dir, refuse mode, fall back.
+- **Concurrent orchestrating invocations** — second invocation reads first's `lifecycle.session_id` in state. If present and `worktree_mode: active` → refuse (one orchestration per milestone). Use a separate milestone for parallel orchestrated work.
+- **Worktree path collision** — UUIDv4 short (8 chars) collision negligible at <100 concurrent sessions. If `../forge-worktrees/{id}/` already exists → regenerate session_id, retry up to 3 times.
+## Example: clean session
+```
+user: /forge multi-agent
+forge → orchestrating
+orchestrating: bootstrap OK → session_id=a1b2c3d4 → worktree created → state written
+orchestrating → executing (working dir: ../forge-worktrees/a1b2c3d4)
+executing: claim files → edit → commit (× N tasks)
+executing → verifying → reviewing (all inside worktree)
+reviewing → orchestrating (teardown)
+orchestrating: forge_queue_commit → merged → release claims → remove worktree → delete branch
+done.
+```
+## Example: conflict path
+```
+orchestrating: forge_queue_commit → conflict { files: [src/auth.ts] }
+orchestrating → debugging { conflicted_files, base_sha, branch }
+debugging: user resolves → signal resolved
+orchestrating: retry forge_queue_commit → merged → cleanup
+```
+## References
+- ADR-001 — experimental track / opt-in carve-out
+- ADR-002 — worktrees as isolation substrate
+- ADR-003 — MCP server + per-repo SQLite
+- ADR-004 — merge queue (forge_queue_commit status semantics)
+- ADR-005 — session lifecycle (this skill realizes it)
+- `.forge/research/milestone-10.md` — spike findings
+- `bootstrap-checks.md` — bootstrap check matrix + fallback
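The shell-level parts of this lifecycle can be sketched as small bash functions. This is a minimal sketch under stated assumptions, not Forge's implementation:

- `queue_commit` and `rebase_onto_main` are hypothetical stand-ins for the `forge_queue_commit` MCP call and the caller's rebase.
- Worktree paths are compared literally as absolute paths.
- Session IDs are assumed to come from random (v4) UUIDs.

The functions cover the Step 2 lock check, the Step 5 status dispatch with its three `stale_base` retries, Step 6 orphan-branch detection, and the birthday-bound estimate behind the collision note.

```bash
#!/usr/bin/env bash
# Sketch only. queue_commit / rebase_onto_main are HYPOTHETICAL stand-ins.

# Step 2: succeed only if the worktree at absolute path $1 is listed AND locked.
# stdin: output of `git worktree list --porcelain`
worktree_is_locked() {
  local target=$1 line current="" locked=1
  while IFS= read -r line; do
    case "$line" in
      "worktree "*) current=${line#worktree } ;;                  # new record
      locked|"locked "*) [ "$current" = "$target" ] && locked=0 ;;
    esac
  done
  return "$locked"
}

# Step 5: dispatch on merge-queue status; first try plus up to 3 stale_base retries.
# Prints the next action: "cleanup" (merged) or "debugging" (conflict / retries exhausted).
# Rebase failures are not handled in this sketch.
teardown() {
  local attempt status
  for attempt in 0 1 2 3; do
    status=$(queue_commit)
    case "$status" in
      merged)     echo cleanup; return 0 ;;
      conflict)   echo debugging; return 1 ;;
      stale_base) [ "$attempt" -lt 3 ] && rebase_onto_main ;;
      *)          echo "unexpected status: $status" >&2; return 2 ;;
    esac
  done
  echo debugging   # still stale after 3 retries: escalate to the conflict path
  return 1
}

# Step 6: print each branch argument that no worktree has checked out.
# stdin: output of `git worktree list --porcelain` (run after `git worktree prune`)
orphan_forge_branches() {
  local line live=" " b
  while IFS= read -r line; do
    case "$line" in
      "branch refs/heads/"*) live+="${line#branch refs/heads/} " ;;
    esac
  done
  for b in "$@"; do
    case "$live" in
      *" $b "*) ;;                 # checked out: live session
      *) printf '%s\n' "$b" ;;     # no worktree: prompt resume / delete
    esac
  done
}

# Failure modes: birthday bound for n sessions drawing 8 hex chars (N = 2^32).
collision_prob() {
  awk -v n="$1" 'BEGIN { N = 2^32; printf "%.3g\n", 1 - exp(-n * (n - 1) / (2 * N)) }'
}
```

For example, `git worktree list --porcelain | orphan_forge_branches $(git branch --format='%(refname:short)' --list 'forge/*')` lists candidates for the resume/delete prompt. `collision_prob 100` prints `1.15e-06`, which supports the "negligible" claim for 8-character IDs.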

package/template/.claude/skills/orchestrating/bootstrap-checks.md ADDED

@@ -0,0 +1,48 @@
+# Bootstrap Checks
+Run before worktree creation. Any failure → refuse worktree mode, fall back to single-agent.
+| Check | Command | Pass criterion | Fail action | Reason |
+|-------|---------|----------------|-------------|--------|
+| Git version | `git --version` | major.minor ≥ 2.48 | refuse worktree mode | Known worktree bugs < 2.48 (admin-dir leaks, prune races) |
+| LFS version | `git lfs version` (skip if not installed) | ≥ 3.6 OR not installed | refuse worktree mode | Known worktree-locking bugs < 3.6 |
+| Submodule scan | `git submodule status` | empty output | refuse worktree mode | Submodules officially "not recommended" in worktrees (git docs) |
+| `core.hooksPath` | `git config core.hooksPath` | empty OR path resolves inside worktree | refuse worktree mode | Husky/lefthook silent-skip risk when path points outside worktree |
+| Hook smoke test | `git hook run pre-commit` in fresh worktree | exit 0 OR matches main-repo exit code | refuse worktree mode | Confirms hook plumbing actually fires inside worktree |
+## Fallback message
+On any check failure, emit verbatim to user:
+```
+M10 worktree mode unavailable: <reason>. Continuing in single-agent mode. See ADR-005 for compatibility matrix.
+```
+Replace `<reason>` with the failing check name + observed value (e.g., "Git version 2.45.1 < required 2.48").
+## State write on refusal
+```yaml
+lifecycle:
+  worktree_mode: refused
+  refused_reason: "<check name>: <observed>"
+  refused_at: "{ISO8601}"
+```
+## Check execution order
+Run checks 1→5 in order. First failure halts the sequence — no point running hook smoke if Git itself is too old. Record the failing check name + observed value as `refused_reason` so the user knows exactly what to fix.
+## Remediation hints
+| Failing check | User remediation |
+|---------------|------------------|
+| Git version | Upgrade Git to ≥ 2.48 (Homebrew: `brew upgrade git`; apt: backports or PPA) |
+| LFS version | Upgrade Git LFS to ≥ 3.6 OR uninstall if not actually used |
+| Submodule scan | Convert submodules to subtrees or vendored copies; M10 cannot coexist with submodules per upstream git guidance |
+| `core.hooksPath` | Either unset (`git config --unset core.hooksPath`) or move hook dir inside repo tree so it resolves under each worktree |
+| Hook smoke test | Inspect failing hook output; common cause is hook script assuming `$PWD` is main repo root — fix to use `git rev-parse --show-toplevel` |
+## Re-running checks
+Bootstrap is idempotent and cheap (~200ms total). Skill re-runs on every session start; users do not invoke directly. If a check transiently fails (e.g., LFS not yet installed during initial setup), simply restart the orchestration entry.
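Checks 1 and 2 reduce to two steps: extract the first dotted number from the tool's banner, then compare major and minor numerically. A plain string compare would wrongly rank 3.10 below 3.6.

Below is a minimal bash sketch. The helper names are hypothetical. The banner shapes (`git version 2.48.1`, `git-lfs/3.6.1 (...)`) are assumptions about typical output.

```bash
#!/usr/bin/env bash
# Sketch of bootstrap checks 1 and 2. Helper names are HYPOTHETICAL.

# First dotted version in a banner, e.g. "git version 2.48.1" -> 2.48.1
extract_version() {
  grep -oE '[0-9]+\.[0-9]+(\.[0-9]+)?' <<< "$1" | head -n 1
}

# version_ge HAVE NEED: compare major, then minor, numerically (patch ignored),
# matching the "major.minor >= 2.48" pass criterion.
version_ge() {
  local have_major have_minor need_major need_minor
  IFS=. read -r have_major have_minor _ <<< "$1"
  IFS=. read -r need_major need_minor _ <<< "$2"
  (( have_major > need_major || (have_major == need_major && have_minor >= need_minor) ))
}

# require_version NAME BANNER FLOOR: silent success, or print a refusal reason and fail.
require_version() {
  local v
  v=$(extract_version "$2")
  if [ -n "$v" ] && version_ge "$v" "$3"; then
    return 0
  fi
  printf '%s version %s < required %s\n' "$1" "${v:-unknown}" "$3"
  return 1
}
```

A caller might run `require_version Git "$(git --version)" 2.48`. For LFS it would run `require_version 'Git LFS' "$(git lfs version)" 3.6` only when `command -v git-lfs` succeeds. On failure the printed line becomes the observed value in `refused_reason`.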

package/template/.claude/skills/planning/SKILL.md CHANGED

@@ -53,6 +53,17 @@ If missing, create from `.forge/templates/requirements.yml`:
 5. P1 (must) / P2 (should) / P3 (nice)
 6. Deferred: DEF-001... (also globally unique)
+**E2E gate (M9):** For each functional requirement being added or refined:
+1. Decide `e2e: true|false` -- does this story need a post-validation e2e test?
+   - true = high-value user journey worth a real-browser walk + automated guard
+   - false = covered by integration/unit, or low-value to e2e
+   - Default to false. Only flag true for spine flows (auth, checkout-class flows, primary user task).
+2. When `e2e:true`, capture `observable_outcome:` -- one sentence describing what the user observes when the flow succeeds. Block planning until provided. No silent default.
+3. Re-planning: read existing `e2e` / `observable_outcome` decisions from `requirements/m{N}.yml`. Preserve them. Only prompt for new or unflagged FRs.
+4. Write `e2e`, `observable_outcome`, `validated: false`, `observable_outcome_hash: ""` to each FR. The hash + `validated` flip later in verifying.
+Contract: locked decision in `.forge/context.md` (M9 section, "Approach D"). Do NOT enforce the e2e soft cap here -- that's reviewing's job.
 **Blocks until all P1 `[NEEDS CLARIFICATION]` resolved.**
 Never write to top-level `.forge/requirements.yml` -- that path is deprecated.

package/template/.claude/skills/reviewing/SKILL.md CHANGED

@@ -150,6 +150,52 @@ refactoring_scan:
       suggested_approach: "Extract shared validateEmail() helper to src/utils/validation.ts"
 ```
+### Part 4: E2E Suite Audit (M9)
+Three sub-checks. All advisory. None block milestone close.
+**1. Soft-cap warning**
+- Read `verification.e2e_soft_cap` from `.forge/project.yml`. Default 10 if absent.
+- Count `e2e: true` stories in the active milestone's `.forge/requirements/m{N}.yml`.
+- If count > cap → warn: `"E2E soft cap exceeded: {count}/{cap} stories flagged. Trim e2e:true stories or raise verification.e2e_soft_cap in project.yml. Soft cap — does not block."`
+- **Skip-clean:** zero `e2e:true` stories → sub-check omitted from report.
+**2. Orphan-test detection**
+- Glob for e2e test files. Stack-detect from `project.yml` `interface_tools` (fallback: Playwright `tests/e2e/**/*.spec.ts` + `e2e/**/*.spec.ts`; pytest `tests/e2e/test_*.py`; go `e2e/*_test.go`).
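+- For each file, grep for the `story: FR-` stamp that testing author-mode writes as a comment at the top of every e2e file. Test and function names carry the FR ID without the `story:` prefix, so they do not match this pattern on their own.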
+- If no match → flag: `"Orphan e2e test: {path} — no FR-XXX reference found. Either tag the story or delete the test."`
+- List orphans in a dedicated subsection.
+- **Skip-clean:** zero e2e files discovered → sub-check omitted from report.
+**3. Flake-rate signal**
+- Best-effort. Attempt sources in order:
+  1. `.forge/testing/suite-health.md` flake entries (tester analyst-mode output)
+  2. GitHub Actions test-summary artifacts, if accessible (use `.github/workflows/` to find which jobs publish them)
+  3. Local `playwright-report/` retry counts (if present)
+- Aggregate per-test flake count. Surface top 5 flakiest with counts.
+- If no source available AND e2e files exist → emit `"Flake-rate: no data (run testing skill analyst-mode for suite-health.md)"`.
+- Never blocks.
+- **Skip-clean:** zero e2e files discovered → sub-check omitted entirely (no "no data" line).
+**Section-level skip-clean:** zero e2e test files AND zero `e2e:true` stories → omit the entire "E2E Suite Audit" section from the health report.
+```yaml
+e2e_suite_audit:
+  soft_cap:
+    count: 3
+    cap: 10
+    status: ok                # ok | exceeded
+  orphan_tests:
+    files_scanned: 4
+    orphans: []               # list of paths with no story: FR- reference
+  flake_rate:
+    source: "suite-health.md" # or "no data"
+    top_flaky: []             # [{path, count}, ...]
+```
 ## Step 4: Score
 **Per-category:**
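Sub-checks 1 and 2 of the E2E suite audit can be approximated with standard tools. This is a minimal bash sketch under two assumptions: block-style YAML with one field per line (a real YAML parser such as `yq` would be sturdier), and only the fallback file patterns. The function names are hypothetical. `soft_cap_status` emits the `ok | exceeded` value used in the report schema.

```bash
#!/usr/bin/env bash
# Sketch of the soft-cap and orphan-test sub-checks. Names are HYPOTHETICAL.
# ASSUMPTION: block-style YAML, one field per line; commented lines are ignored.

# Count uncommented `e2e: true` entries in a requirements/m{N}.yml file.
count_e2e_stories() {
  grep -cE '^[[:space:]]*(-[[:space:]]+)?e2e:[[:space:]]*true([[:space:]]|#|$)' "$1" || true
}

# soft_cap_status COUNT CAP -> ok | exceeded (advisory only; never blocks)
soft_cap_status() {
  if [ "$1" -gt "$2" ]; then echo exceeded; else echo ok; fi
}

# List e2e test files under the given directories that carry no `story: FR-` stamp.
find_orphan_e2e() {
  find "$@" -type f \( -name '*.spec.ts' -o -name 'test_*.py' -o -name '*_test.go' \) \
    -exec grep -L 'story: FR-' {} + 2>/dev/null
}
```

`grep -c` prints `0` and exits non-zero when nothing matches. The `|| true` keeps that zero count usable as the sub-check's input.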

package/template/.claude/skills/testing/SKILL.md CHANGED

@@ -66,6 +66,35 @@ Read: .github/workflows/* → CI config (ci-check mode, analyst CI sub-check)
 ### Author Mode
+#### E2E Preflight Gate (M9)
+Runs ONLY for e2e authoring requests. Integration-test authoring + analyst mode skip this gate entirely.
+**Preconditions per story** — for every FR the user requests an e2e for:
+1. Read `.forge/requirements/m{N}.yml`, locate the FR by ID.
+2. Check `e2e: true`. If false or missing → REFUSE with:
+   ``"Story {FR-ID} not flagged for e2e — add `e2e: true` + `observable_outcome` in requirements/m{N}.yml first (planning skill captures this during story breakdown)."``
+3. Check `validated: true`. If false or missing → REFUSE with:
+   `"Story {FR-ID} not yet validated by human — run verifying skill and walk the flow first (the e2e validation gate writes validated:true on confirmation)."`
+4. Recompute `observable_outcome_hash` from current outcome text (SHA-256 utf-8, first 12 hex). Compare to stored hash. If mismatch → REFUSE with:
+   `"Story {FR-ID} observable_outcome changed since validation — re-run verifying skill to re-validate the updated flow."`
+5. Only when all three pass: proceed to author the e2e test.
+**Story stamping (required on every authored e2e)** — every generated e2e file MUST include the story reference. Use the framework's natural mechanism:
+- Playwright / Vitest / Jest TS: `// story: FR-XXX` at the top of the spec file AND the FR ID in the test name (e.g. `test('FR-053: user signs in with correct credentials', ...)`)
+- pytest: `# story: FR-XXX` at the top of the test module AND in the test function name (`def test_FR_053_user_signs_in(...)`)
+- go test: `// story: FR-XXX` above the test function AND in the test name (`func TestFR053UserSignsIn(t *testing.T)`)
+No story ID = orphan. Reviewing skill (phase 17) flags orphans for deletion.
+**Integration + analyst modes** — unchanged. No flag check, no validated check, no story-ID stamping enforcement. M9 lock is e2e-only.
+Refusal message wording is contract (NFR-009 requires story ID + exact missing field). Do not paraphrase.
+#### Standard author flow
 1. **Determine layer** — e2e vs integration. Ask if ambiguous.
 2. **Select runner:**
    - e2e + web/TS → **Playwright** (only option v1 — non-web e2e deferred)
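The E2E preflight gate's order of checks can be sketched in bash. The sketch makes two assumptions:

- The FR's `e2e`, `validated`, `observable_outcome` and `observable_outcome_hash` values have already been read from `requirements/m{N}.yml`. YAML parsing is out of scope here.
- The hash covers the exact outcome bytes with no trimming.

`e2e_preflight` and `outcome_hash` are hypothetical names. The refusal strings are copied verbatim because their wording is contract.

```bash
#!/usr/bin/env bash
# Sketch of the M9 e2e preflight gate. Function names are HYPOTHETICAL.

# SHA-256 over the exact UTF-8 bytes (printf, not echo: no trailing newline), first 12 hex.
outcome_hash() {
  if command -v sha256sum >/dev/null 2>&1; then
    printf '%s' "$1" | sha256sum | cut -c1-12
  else
    printf '%s' "$1" | shasum -a 256 | cut -c1-12   # macOS fallback
  fi
}

# e2e_preflight FR_ID N E2E VALIDATED OUTCOME STORED_HASH
# Runs the three checks in order; on the first failure prints the refusal and returns 1.
e2e_preflight() {
  local fr=$1 n=$2 e2e=$3 validated=$4 outcome=$5 stored=$6
  if [ "$e2e" != "true" ]; then
    printf 'Story %s not flagged for e2e — add `e2e: true` + `observable_outcome` in requirements/m%s.yml first (planning skill captures this during story breakdown).\n' "$fr" "$n"
    return 1
  fi
  if [ "$validated" != "true" ]; then
    printf 'Story %s not yet validated by human — run verifying skill and walk the flow first (the e2e validation gate writes validated:true on confirmation).\n' "$fr"
    return 1
  fi
  if [ "$(outcome_hash "$outcome")" != "$stored" ]; then
    printf 'Story %s observable_outcome changed since validation — re-run verifying skill to re-validate the updated flow.\n' "$fr"
    return 1
  fi
  return 0   # all three pass: proceed to author the e2e test
}
```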

package/template/.claude/skills/verifying/SKILL.md CHANGED

@@ -91,6 +91,35 @@ Re-run verifying after tests are added.
 If detection is ambiguous (e.g. API tests hard to grep definitively) → lean toward PASS to avoid false blocks; note uncertainty in the verdict.
+## E2E Validation Gate (M9)
+Runs AFTER code-level verification commands pass. Skipped if no `e2e:true` stories in the active milestone.
+### Steps
+1. Read `.forge/requirements/m{N}.yml` for the active milestone. Collect every functional requirement with `e2e: true`.
+2. If list is empty → skip gate silently. No prompt. No error.
+3. For each `e2e:true` FR, present to the human:
+   - FR ID + description
+   - `observable_outcome` text verbatim
+   - Prompt: *"Walk this flow manually. Did the observable outcome occur? [confirm | decline | skip]"*
+4. Per response:
+   - **confirm** → compute `observable_outcome_hash` = SHA-256(observable_outcome utf-8), truncate to first 12 hex chars. Write `validated: true` + the hash to the FR entry in `requirements/m{N}.yml`.
+   - **decline** → leave `validated: false`. Record decline + reason (free text) in the verification report.
+   - **skip** → leave `validated: false`. Record skip in the verification report. No reason required.
+5. **Hash drift check** (run BEFORE prompting, every gate invocation): for each `e2e:true` FR with `validated: true`, recompute hash from current `observable_outcome`. If it differs from stored `observable_outcome_hash` → set `validated: false`, clear hash. Note auto-reset in verification report. Then prompt that FR as unvalidated.
+6. Write per-FR validation outcomes into the verification report under section "E2E Validation".
+### Gate behavior
+- **Advisory, not blocking.** Verifying still passes even if no stories validated — the hard gate is in `testing` skill author-mode (phase 16). This gate's job is to surface + record, not block.
+- Per-story (not batch). Human walks one at a time.
+- Hash: SHA-256, UTF-8 input, hex output truncated to first 12 chars. Deterministic across machines.
+### Skip-clean
+Milestones with zero `e2e:true` stories never see this gate. Verifying logs nothing — appears as if the gate doesn't exist.
 ## 3-Level Goal-Backward Verification
 ### Level 1: Observable Truths
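The E2E Validation Gate's confirm-time hash (step 4) and drift check (step 5) reduce to two small functions. This is a minimal bash sketch with hypothetical function names. It assumes the hash covers the exact `observable_outcome` bytes with no whitespace trimming, so any edit, including whitespace, counts as drift.

Use `printf '%s'` rather than `echo`. `echo` appends a newline, which would silently change the hash.

```bash
#!/usr/bin/env bash
# Sketch of the hash written on confirm (step 4) and the drift check (step 5).
# Function names are HYPOTHETICAL.

# SHA-256 of the exact outcome bytes, hex, truncated to the first 12 chars.
outcome_hash() {
  if command -v sha256sum >/dev/null 2>&1; then
    printf '%s' "$1" | sha256sum | cut -c1-12
  else
    printf '%s' "$1" | shasum -a 256 | cut -c1-12   # macOS fallback
  fi
}

# outcome_unchanged OUTCOME STORED_HASH: true while the validation still holds.
# On false, the gate resets validated:false, clears the hash, and re-prompts the FR.
outcome_unchanged() {
  [ -n "$2" ] && [ "$(outcome_hash "$1")" = "$2" ]
}
```

For example, `outcome_hash abc` prints `ba7816bf8f01`. That is the first 12 hex characters of the standard SHA-256 test vector for "abc".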

package/template/.forge/templates/project.yml CHANGED

@@ -60,6 +60,7 @@ verification:
     #   advisory: true                # pre-existing type errors — warn, don't block
   auto_fix: true                      # On failure, agent fixes and retries
   max_retries: 2                      # Max auto-fix attempts per command (0 = fail immediately)
+  e2e_soft_cap: 10                    # M9: advisory cap on e2e:true stories per milestone. Reviewing warns when exceeded. Soft — never blocks.
   # Advisory mode: commands already failing before Forge started run but don't block — warn only.
 success_criteria:                   # How do we know we're done?

package/template/.forge/templates/requirements.yml CHANGED

@@ -7,6 +7,11 @@
 milestone: 1                         # Milestone this file belongs to (matches state/milestone-{id}.yml)
 version: "v1"                        # v1 = MVP, v2 = next iteration
+# E2E fields (M9): mark `e2e: true` + `observable_outcome` during planning. Verifying skill
+# prompts a human walk; on confirm it sets `validated: true` + `observable_outcome_hash`.
+# Testing skill author-mode refuses e2e without validated:true. Reviewing skill warns on
+# soft-cap exceeded + flags orphan tests. Fields are lazy — absent = e2e:false/validated:false.
 functional:
   # Each requirement: unique ID, description, acceptance criteria, phase assignment
   - id: FR-001
@@ -18,7 +23,12 @@ functional:
     phase: null                      # Assigned during roadmap creation
     priority: P1                     # P1 = must-have, P2 = should-have, P3 = nice-to-have
     status: pending                  # pending | clarifying | planned | implemented | verified
-    notes: ""                        # [NEEDS CLARIFICATION] if uncertain
+    notes: ""                        # [NEEDS CLARIFICATION] if uncertain
+    # E2E gate (M9). Lazy — absent fields = e2e:false, validated:false.
+    # e2e: false                     # true = story gets one e2e test post-validation
+    # observable_outcome: ""         # one-sentence user-observable outcome (required when e2e:true)
+    # observable_outcome_hash: ""    # auto-computed SHA-256 of outcome (12 hex chars); editing outcome resets validated
+    # validated: false               # set true by verifying skill after human walks the flow
   - id: FR-002
     description: ""

package/template/CLAUDE.md CHANGED

@@ -51,12 +51,12 @@ Auto-detects complexity. Override: "Use Quick/Standard/Full tier."
 | Architectural decisions | `architecting` | Full |
 | Break work into tasks with gates | `planning` | Standard, Full |
 | Build with deviation rules + atomic commits | `executing` | All |
-| Prove work delivers on goals | `verifying` | Standard, Full |
-| Audit health + catalog refactoring | `reviewing` | Standard, Full |
+| Prove work delivers on goals (+ M9 e2e validation gate when `e2e:true` stories present) | `verifying` | Standard, Full |
+| Audit health + catalog refactoring (+ M9 e2e soft-cap, orphan-test, flake-rate audits) | `reviewing` | Standard, Full |
 | Small scoped fix | `quick-tasking` | Quick |
 | UI with design system | `designing` | When UI |
 | Security review | `securing` | When auth/data/API |
-| E2E/integration tests + suite audit | `testing` | When UI/flows or flaky suite |
+| E2E/integration tests + suite audit (+ M9 author-mode gate refuses e2e without `e2e:true` + `validated:true`) | `testing` | When UI/flows or flaky suite |
 | Systematic debugging | `debugging` | When stuck |
 | Upgrade Forge files | `upgrading` | On-demand |
 | Cross-session memory | `beads-integration` | When Beads installed |
@@ -125,7 +125,7 @@ State lives in `.forge/`:
 - `project.yml` — Vision, stack, design system, verification, constraints (<5KB)
 - `constitution.md` — Active architectural gates
 - `design-system.md` — Component mapping table
-- `requirements/m{N}.yml` — Per-milestone structured requirements with `[NEEDS CLARIFICATION]` markers. **FR-IDs, DEF-IDs, and NFR-IDs are globally unique across all milestone files** — `FR-001` may exist in exactly one `m{N}.yml`. Before adding a new ID, scan `.forge/requirements/*.yml` for the highest in-use number and continue the sequence. On collision (e.g. during a migration), keep the older milestone's ID and renumber the newer. Concurrent milestones each own their file — no cross-stream contention on file writes, but ID space is shared.
+- `requirements/m{N}.yml` — Per-milestone structured requirements with `[NEEDS CLARIFICATION]` markers. **FR-IDs, DEF-IDs, and NFR-IDs are globally unique across all milestone files** — `FR-001` may exist in exactly one `m{N}.yml`. Before adding a new ID, scan `.forge/requirements/*.yml` for the highest in-use number and continue the sequence. On collision (e.g. during a migration), keep the older milestone's ID and renumber the newer. Concurrent milestones each own their file — no cross-stream contention on file writes, but ID space is shared. Functional requirements may carry M9 e2e gate fields (`e2e`, `observable_outcome`, `observable_outcome_hash`, `validated`) — lazy migration, absent fields default to `e2e:false`/`validated:false`.
 - `roadmap.yml` — Phases, milestones, dependencies
 - `state/index.yml` — Global: active milestones, desire_paths, metrics
 - `state/milestone-{id}.yml` — Per-milestone cursor: position, progress, decisions, blockers
@@ -173,6 +173,7 @@ verification:
 - Auto-fix loop: read output → fix → amend → re-run (up to max_retries)
 - 3-strike: retries count toward task limit
 - Empty commands = no gate (opt-out)
+- `verification.e2e_soft_cap` (default 10) — advisory cap on `e2e:true` stories per milestone surfaced by the `reviewing` skill. Soft — never blocks.
 ## Beads Integration (Optional)