forge-orkes 0.13.0 → 0.14.0

package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
    "name": "forge-orkes",
-   "version": "0.13.0",
+   "version": "0.14.0",
    "description": "Set up the Forge meta-prompting framework for Claude Code in your project",
    "bin": {
      "create-forge": "./bin/create-forge.js"
@@ -0,0 +1,76 @@
+ # Forge Hooks
+
+ ## `forge-claim-check.sh` — PreToolUse claim-check
+
+ Cross-session file-claim collision detector. Pairs with the Forge MCP
+ orchestrator (`.forge/.mcp-server/`) to prevent two concurrent Claude Code
+ sessions from clobbering each other's edits on the same file.
+
+ ### Behavior
+
+ Reads the Claude Code `PreToolUse` JSON payload on stdin. Extracts target
+ file path(s) from `tool_input.file_path`, `tool_input.notebook_path`,
+ `tool_input.path`, or `tool_input.edits[].file_path` (MultiEdit). For each
+ path, queries `.forge/.mcp-server/claims.db` for an active claim.
+
+ | Situation | Exit | Effect |
+ |---|---|---|
+ | No claim, or DB missing (fresh repo) | `0` | allow |
+ | Claim held by current `CLAUDE_SESSION_ID` | `0` | allow |
+ | `CLAUDE_SESSION_ID` unset (single-agent / non-Claude invocation) | `0` | allow + stderr warning |
+ | Unknown payload schema (no recognized path field) | `0` | allow |
+ | Claim held by another session | `2` | deny, stderr names owner + expiry |
+ | Any unexpected error (corrupt DB, jq failure, sqlite timeout, etc.) | `2` | fail-closed deny |
+
+ **Never exits 1.** Claude Code treats any non-zero exit other than `2` as a
+ non-blocking error; only `exit 2` blocks the tool call, so every deny path uses `exit 2`.
+
+ ### Prerequisites
+
+ - `bash` (≥ 4 recommended — relies on `set -u` array safety patterns)
+ - `jq`
+ - `sqlite3`
+ - `timeout` (GNU coreutils) **or** `gtimeout` (macOS, `brew install coreutils`) — optional but recommended; without it the SQLite query is unbounded (DB-level `busy_timeout` still applies)
+
+ Run `bash .claude/hooks/forge-claim-check-doctor.sh` to verify prerequisites.
+
+ ### Environment
+
+ | Var | Source | Purpose |
+ |---|---|---|
+ | `CLAUDE_PROJECT_DIR` | Claude Code | Project root, used to resolve relative paths and locate DB |
+ | `CLAUDE_SESSION_ID` | Claude Code | Current session identifier — own claims pass through |
+ | `FORGE_CLAIMS_DB` | optional override | Path to `claims.db` (defaults to `$CLAUDE_PROJECT_DIR/.forge/.mcp-server/claims.db`) |
+
+ ### Registration
+
+ Not registered automatically. The install procedure (plan-06) adds the
+ `PreToolUse` entry to `.claude/settings.json`:
+
+ ```json
+ {
+   "hooks": {
+     "PreToolUse": [
+       {
+         "matcher": "Edit|Write|MultiEdit|NotebookEdit",
+         "hooks": [
+           { "type": "command", "command": "$CLAUDE_PROJECT_DIR/.claude/hooks/forge-claim-check.sh" }
+         ]
+       }
+     ]
+   }
+ }
+ ```
+
+ ### Disabling
+
+ Rename or remove the hook entry in `.claude/settings.json`, or set the file
+ non-executable: `chmod -x .claude/hooks/forge-claim-check.sh`. The hook is
+ defense-in-depth — the MCP server's `forge_claim_files` tool remains the
+ primary coordination point.
+
+ ### Troubleshooting
+
+ - "internal error at line N" on every edit → corrupt DB or missing tool. Run doctor. Common: `jq` not on PATH.
+ - No collisions detected → confirm `CLAUDE_SESSION_ID` set and `claims.db` exists; otherwise hook fail-opens.
+ - macOS `timeout: command not found` → `brew install coreutils` for `gtimeout`, or skip (DB busy_timeout still applies).
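The decision table above can be read as a pure function of four inputs. The sketch below uses a hypothetical `claim_decision` helper that is not part of the shipped hook. It returns the exit code the hook would choose. It omits the internal-error row, because the real script handles that case with its `ERR` trap.

```bash
#!/usr/bin/env bash
# Hypothetical helper mirroring the hook's decision table (not the shipped hook).
# The real script derives these inputs from stdin, the filesystem, and sqlite3.
#   $1 paths_found : yes | no   (did jq find a recognized path field?)
#   $2 db_exists   : yes | no
#   $3 session_id  : current CLAUDE_SESSION_ID ("" when unset)
#   $4 owner       : session holding an active claim ("" when none)
# Prints the exit code the hook would use: 0 = allow, 2 = deny.
claim_decision() {
  local paths_found=$1 db_exists=$2 session_id=$3 owner=$4
  if [ "$paths_found" != yes ]; then echo 0; return; fi   # unknown schema
  if [ "$db_exists" != yes ]; then echo 0; return; fi     # fresh repo, no DB
  if [ -z "$session_id" ]; then echo 0; return; fi        # single-agent mode (hook also warns)
  if [ -z "$owner" ] || [ "$owner" = "$session_id" ]; then
    echo 0                                                # no claim, or own claim
  else
    echo 2                                                # held by another session
  fi
}
```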
@@ -0,0 +1,37 @@
+ #!/usr/bin/env bash
+ # Forge claim-check hook prerequisites probe. Informational only — always exit 0.
+ # Run manually or via install procedure to confirm bash/jq/sqlite3/timeout availability.
+
+ set -uo pipefail
+
+ check() {
+   local name=$1 cmd=$2 version_flag=${3:---version}
+   if command -v "$cmd" >/dev/null 2>&1; then
+     local ver
+     ver=$("$cmd" "$version_flag" 2>&1 | head -1)
+     printf ' ✓ %-10s %s\n' "$name" "$ver"
+   else
+     printf ' ✗ %-10s MISSING\n' "$name"
+   fi
+ }
+
+ echo "Forge claim-check hook — prerequisites"
+ echo
+
+ check "bash" "bash" "--version"
+ check "jq" "jq" "--version"
+ check "sqlite3" "sqlite3" "--version"
+
+ if command -v timeout >/dev/null 2>&1; then
+   printf ' ✓ %-10s %s\n' "timeout" "$(timeout --version 2>&1 | head -1)"
+ elif command -v gtimeout >/dev/null 2>&1; then
+   printf ' ✓ %-10s (gtimeout) %s\n' "timeout" "$(gtimeout --version 2>&1 | head -1)"
+ else
+   printf ' ✗ %-10s MISSING (optional — install coreutils for bounded queries)\n' "timeout"
+ fi
+
+ echo
+ echo "DB lookup path: ${FORGE_CLAIMS_DB:-${CLAUDE_PROJECT_DIR:-$PWD}/.forge/.mcp-server/claims.db}"
+ echo "Session id: ${CLAUDE_SESSION_ID:-<unset — hook will fail-open>}"
+
+ exit 0
@@ -0,0 +1,96 @@
+ #!/usr/bin/env bash
+ # Forge PreToolUse hook — cross-session file-claim collision detector.
+ #
+ # Contract:
+ #   stdin: Claude Code PreToolUse JSON payload
+ #   exit 0 = allow (no claim, own claim, no DB, no session context, unknown schema)
+ #   exit 2 = deny (cross-session claim active, or any internal error — fail-closed)
+ #
+ # Defense-in-depth: ERR trap converts ANY unexpected failure into an exit-2 deny.
+ # Never exit 1 — Claude Code treats exit 1 as soft warning; we want hard block on error.
+
+ set -euo pipefail
+
+ deny() {
+   echo "[forge-hook] $*" >&2
+   exit 2
+ }
+
+ trap 'deny "internal error at line $LINENO — denying for safety (fail-closed)"' ERR
+
+ PROJECT_DIR="${CLAUDE_PROJECT_DIR:-$PWD}"
+ DB="${FORGE_CLAIMS_DB:-$PROJECT_DIR/.forge/.mcp-server/claims.db}"
+ SESSION_ID="${CLAUDE_SESSION_ID:-}"
+
+ # Detect timeout wrapper (macOS lacks GNU `timeout` unless coreutils installed → `gtimeout`).
+ if command -v timeout >/dev/null 2>&1; then
+   TIMEOUT_CMD=(timeout 4)
+ elif command -v gtimeout >/dev/null 2>&1; then
+   TIMEOUT_CMD=(gtimeout 4)
+ else
+   TIMEOUT_CMD=() # no wrapper — sqlite call runs unbounded; busy_timeout in DB still applies
+ fi
+
+ PAYLOAD=$(cat)
+
+ # Normalize across known path fields:
+ #   Edit / Write → tool_input.file_path
+ #   NotebookEdit → tool_input.notebook_path (validated in spike — see milestone-10-validation.md §2)
+ #   MultiEdit → tool_input.file_path (single file, multiple edits; per docs)
+ #   Future / unknown tools → tool_input.path (defensive)
+ # jq emits each matching path on its own line. Empty output = no path field present.
+ FILES=$(printf '%s' "$PAYLOAD" | jq -r '
+   [ .tool_input.file_path?
+   , .tool_input.notebook_path?
+   , .tool_input.path?
+   , ( .tool_input.edits? // [] | .[]?.file_path? )
+   ] | map(select(. != null and . != "")) | unique | .[]
+ ')
+
+ if [ -z "$FILES" ]; then
+   # No recognized path field — unknown schema. Allow (do not deny on schema drift).
+   exit 0
+ fi
+
+ # No DB = MCP server has never run in this repo (fresh-repo case). Fail-open only here.
+ if [ ! -f "$DB" ]; then
+   exit 0
+ fi
+
+ if [ -z "$SESSION_ID" ]; then
+   # No session context — single-agent or hook invoked outside Claude Code. Allow.
+   echo "[forge-hook] CLAUDE_SESSION_ID unset — allowing (single-agent mode)" >&2
+   exit 0
+ fi
+
+ # Resolve each path to absolute, as the MCP server does via path.resolve (no ./.. normalization here).
+ abspath() {
+   case "$1" in
+     /*) printf '%s' "$1" ;;
+     *) printf '%s/%s' "$PROJECT_DIR" "$1" ;;
+   esac
+ }
+
+ while IFS= read -r raw; do
+   [ -z "$raw" ] && continue
+   file=$(abspath "$raw")
+
+   # Inline the path as an SQL string literal; doubling ' keeps quotes in paths from breaking out.
+   q="'"; fp_sql=${file//$q/$q$q}
+   result=$(${TIMEOUT_CMD[@]+"${TIMEOUT_CMD[@]}"} sqlite3 -batch "$DB" \
+     "SELECT session_id || '|' || expires_at FROM claims WHERE file_path = '$fp_sql' AND expires_at > CAST(strftime('%s','now') AS INTEGER) LIMIT 1;")
+
+   [ -z "$result" ] && continue
+
+   owner="${result%%|*}"
+   expires_at="${result##*|}"
+
+   if [ "$owner" = "$SESSION_ID" ]; then
+     continue # own claim
+   fi
+
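+   expires_human=$(date -r "$expires_at" "+%Y-%m-%d %H:%M:%S %Z" 2>/dev/null || date -d "@$expires_at" "+%Y-%m-%d %H:%M:%S %Z" 2>/dev/null || echo "epoch:$expires_at") # BSD date -r <epoch>, else GNU date -d @<epoch>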
+   deny "Edit denied: $file claimed by session $owner until $expires_human. Call forge_release_claims in that session, or wait."
+ done <<< "$FILES"
+
+ exit 0
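The hook's `abspath` only prefixes relative paths with `$PROJECT_DIR`. Its comment refers to the MCP server's `path.resolve`, which also collapses `.` and `..` segments, duplicate slashes, and trailing slashes. If both sides must agree on inputs like `src/../lib/x.ts`, a lexical normalizer along these lines could stand in for `abspath`.

This is a sketch, not the shipped code:

- It assumes the server stores `path.resolve` output.
- `normalize_path` is a hypothetical name.
- Like `path.resolve`, it does not resolve symlinks.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the shipped hook): lexically resolve $2
# against base directory $1, dropping empty and "." segments and letting ".."
# remove the previous segment. Symlinks are NOT resolved.
normalize_path() {
  local base=$1 input=$2 joined part last
  local -a parts=() out=()
  case "$input" in
    /*) joined=$input ;;
    *)  joined="$base/$input" ;;
  esac
  IFS=/ read -r -a parts <<< "$joined"
  for part in "${parts[@]}"; do
    case "$part" in
      '' | .) ;;                        # skip empty segments and "."
      ..)                               # drop the previous segment, if any
        if ((${#out[@]} > 0)); then
          last=$((${#out[@]} - 1))
          unset 'out[last]'
        fi
        ;;
      *) out+=("$part") ;;
    esac
  done
  local IFS=/
  printf '/%s\n' "${out[*]}"            # join segments with "/"
}
```

For example, `normalize_path /repo src/../lib/./x.ts` prints `/repo/lib/x.ts`, where the current `abspath` would produce `/repo/src/../lib/./x.ts`.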
@@ -223,7 +223,7 @@ Where `{source}` = `skills.{name}` | `models.default` | `parent session`. Suppre
  | reviewing | sonnet | Audit judgment |
  | quick-tasking | haiku | Speed |
  | discussing | sonnet | Conversation |
- | testing | sonnet | Code gen (author) + audit judgment (analyst) — matches executing/reviewing |
+ | testing | sonnet | Code gen (author) + audit judgment (analyst) — matches executing/reviewing. M9: author-mode refuses e2e without `e2e:true` + `validated:true`. |
  | deferred | haiku | Read + format only |
 
  | `current.status` | Route To |
@@ -234,8 +234,8 @@ Where `{source}` = `skills.{name}` | `models.default` | `parent session`. Suppre
  | `architecting` | `Skill(architecting)` → planning |
  | `planning` | `Skill(planning)` → executing |
  | `executing` | `Skill(executing)` → verifying |
- | `verifying` | `Skill(verifying)` → reviewing |
- | `reviewing` | `Skill(reviewing)` → complete |
+ | `verifying` | `Skill(verifying)` → reviewing — runs M9 e2e validation gate when `e2e:true` stories present |
+ | `reviewing` | `Skill(reviewing)` → complete — adds M9 e2e suite audit (soft-cap, orphans, flake-rate) |
  | `complete` | Done. Ask what's next. |
  | `deferred` | Milestone frozen. *"Resume milestone {id}" to reactivate.* |
  | `quick-tasking` | `Skill(quick-tasking)` |
@@ -0,0 +1,135 @@
+ ---
+ name: orchestrating
+ description: "[Experimental — M10] Owns multi-agent session lifecycle. Bootstrap, worktree create, claim+merge coordination, teardown. Refuses worktree mode on incompatible repos and falls back to single-agent."
+ ---
+
+ # Orchestrating
+
+ Multi-agent session lifecycle. Worktree isolation + MCP-coordinated claims + merge queue. Experimental — opt-in per ADR-001. Refuses on incompatible repos, falls back to single-agent.
+
+ ## When to use
+
+ - User explicitly invokes multi-agent mode (`/forge` argument selects multi-agent, or direct skill invocation).
+ - `executing` skill at Full tier with ≥2 concurrent-eligible phases routes through this skill.
+
+ Skip on Quick tier, single-phase work, or when bootstrap checks fail.
+
+ ## Step 1: Bootstrap
+
+ Run all checks in `bootstrap-checks.md`:
+
+ 1. Git version ≥ 2.48
+ 2. LFS version ≥ 3.6 (skip if not installed)
+ 3. No submodules
+ 4. `core.hooksPath` empty or resolves inside worktree
+ 5. `git hook run pre-commit` smoke test in fresh worktree
+
+ **Any check fails** → log reason, write `lifecycle.worktree_mode: refused` + `lifecycle.refused_reason` into active milestone state, emit fallback message (see `bootstrap-checks.md`), return to caller. Caller continues single-agent.
+
+ ## Step 2: Session ID + worktree
+
+ ```bash
+ session_id=$(uuidgen | cut -c1-8)
+ git worktree prune
+ git worktree add -b forge/${session_id} --lock --reason "forge session" ../forge-worktrees/${session_id} main
+ ( cd ../forge-worktrees/${session_id} && git hook run pre-commit || true )
+ ```
+
+ Verify the worktree dir exists and the worktree (not the branch) is locked: `git worktree list --porcelain` shows a `locked` line for it. On failure → cleanup partial state, refuse mode.
+
+ ## Step 3: State update
+
+ Write into `.forge/state/milestone-{id}.yml`:
+
+ ```yaml
+ lifecycle:
+   session_id: "{session_id}"
+   worktree_path: "../forge-worktrees/{session_id}"
+   worktree_branch: "forge/{session_id}"
+   worktree_mode: "active"
+   started_at: "{ISO8601}"
+ ```
+
+ Update `state/index.yml` milestone `last_updated`.
+
+ ## Step 4: Hand back
+
+ Return control to caller (typically `executing`). All subsequent work runs inside `../forge-worktrees/{session_id}`. Caller honors claim convention.
+
+ ### Claim convention (executing-skill contract)
+
+ Before any `Edit`, `Write`, `MultiEdit`, or `NotebookEdit` on a file outside `.forge/state/milestone-{own_id}.yml`:
+
+ 1. Call `forge_claim_files` with `{ session_id, files: [...], ttl_seconds: 900 }`.
+ 2. On `granted` → proceed with edit.
+ 3. On `conflict: { holder_session, files: [...] }` → surface holder + files to user. Options:
+    - **wait** → poll `forge_claim_status` until released or TTL expiry.
+    - **skip** → drop the conflicted file from the task scope, continue with rest.
+    - **steal** → only if holder session is provably dead (PreToolUse hook validates). Otherwise refused.
+ 4. After edit batch → claim auto-extends on continued use; explicit `forge_release_claims` on plan-complete.
+
+ PreToolUse hook (installed by plan-03) enforces this — uncaught violations block at hook level, not skill level.
+
+ ## Step 5: Teardown
+
+ Triggered when caller signals work complete OR user requests teardown.
+
+ ```
+ forge_queue_commit(branch=forge/{session_id}, base_sha={merge_base})
+ ```
+
+ Branch on response status:
+
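+ - **`merged`** → `forge_release_claims(session_id)` → `git worktree unlock ../forge-worktrees/{session_id}` (the worktree was added with `--lock`, and a single `--force` does not remove a locked worktree) → `git worktree remove --force ../forge-worktrees/{session_id}` → `git branch -d forge/{session_id}` → clear `lifecycle.*` (set `worktree_mode: complete`, retain `session_id` for audit).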
+ - **`conflict`** → invoke `Skill(debugging)` with payload `{ conflicted_files, base_sha, messages, branch }`. Teardown blocks until debugging signals resolution (re-invoke teardown after fix).
+ - **`stale_base`** → caller rebases worktree branch onto `current_main_sha` from response, retries `forge_queue_commit`. Max 3 retries → escalate to conflict path.
+
+ ## Step 6: Crash recovery (next session start)
+
+ If no clean teardown happened previously:
+
+ 1. `git worktree prune` — drops stale admin dirs.
+ 2. `git branch --list 'forge/*'` — for each branch with no live worktree, prompt user:
+    - **resume** → re-attach: `git worktree add ../forge-worktrees/{id} forge/{id}` and restore lifecycle state.
+    - **delete** → `git branch -D forge/{id}`.
+ 3. MCP server startup handles pidfile takeover + claim TTL expiry independently (see ADR-003).
+
+ ## Failure modes & operator notes
+
+ - **MCP server absent** — bootstrap check 5 (hook smoke) will not detect this. Skill detects on first `forge_claim_files` call: error `MCP_SERVER_UNAVAILABLE` → write `lifecycle.worktree_mode: degraded`, warn user, continue without claim coordination. Hard isolation (worktree) still active; coordination is downgraded to best-effort.
+ - **Disk full during worktree add** — `git worktree add` will error. Cleanup any partial `../forge-worktrees/{session_id}/` dir, refuse mode, fall back.
+ - **Concurrent orchestrating invocations** — second invocation reads first's `lifecycle.session_id` in state. If present and `worktree_mode: active` → refuse (one orchestration per milestone). Use a separate milestone for parallel orchestrated work.
+ - **Worktree path collision** — UUIDv4 short (8 chars) collision negligible at <100 concurrent sessions. If `../forge-worktrees/{id}/` already exists → regenerate session_id, retry up to 3 times.
+
+ ## Example: clean session
+
+ ```
+ user: /forge multi-agent
+ forge → orchestrating
+ orchestrating: bootstrap OK → session_id=a1b2c3d4 → worktree created → state written
+ orchestrating → executing (working dir: ../forge-worktrees/a1b2c3d4)
+ executing: claim files → edit → commit (× N tasks)
+ executing → verifying → reviewing (all inside worktree)
+ reviewing → orchestrating (teardown)
+ orchestrating: forge_queue_commit → merged → release claims → remove worktree → delete branch
+ done.
+ ```
+
+ ## Example: conflict path
+
+ ```
+ orchestrating: forge_queue_commit → conflict { files: [src/auth.ts] }
+ orchestrating → debugging { conflicted_files, base_sha, branch }
+ debugging: user resolves → signal resolved
+ orchestrating: retry forge_queue_commit → merged → cleanup
+ ```
+
+ ## References
+
+ - ADR-001 — experimental track / opt-in carve-out
+ - ADR-002 — worktrees as isolation substrate
+ - ADR-003 — MCP server + per-repo SQLite
+ - ADR-004 — merge queue (forge_queue_commit status semantics)
+ - ADR-005 — session lifecycle (this skill realizes it)
+ - `.forge/research/milestone-10.md` — spike findings
+ - `bootstrap-checks.md` — bootstrap check matrix + fallback
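The Step 5 branching, including the `stale_base` retry budget, can be sketched in shell:

- `queue_commit` and `rebase_onto_main` are hypothetical stubs standing in for the `forge_queue_commit` MCP call and the worktree rebase.
- Reading "Max 3 retries" as one initial call plus three retries is an assumption.

```bash
#!/usr/bin/env bash
# Sketch of Step 5's status branching with the stale_base retry budget.
# queue_commit and rebase_onto_main are hypothetical stand-ins (stubbed below)
# for the forge_queue_commit MCP call and the worktree rebase.

MAX_STALE_RETRIES=3

# Stub: hands out canned statuses from RESPONSES in order. It sets STATUS
# instead of printing, so CALLS keeps counting without a subshell.
RESPONSES=()
CALLS=0
queue_commit() {
  STATUS=${RESPONSES[CALLS]:-merged}
  CALLS=$((CALLS + 1))
}
rebase_onto_main() { :; }   # real code: rebase onto current_main_sha from the response

# Prints the path teardown takes next: merged | conflict.
teardown_merge() {
  local retries=0
  while :; do
    queue_commit
    case "$STATUS" in
      merged)   echo merged; return 0 ;;
      conflict) echo conflict; return 1 ;;
      stale_base)
        if ((retries >= MAX_STALE_RETRIES)); then
          echo conflict            # retry budget spent: escalate to the conflict path
          return 1
        fi
        retries=$((retries + 1))
        rebase_onto_main
        ;;
      *) echo "unexpected status: $STATUS" >&2; return 1 ;;
    esac
  done
}
```

With this reading, a branch whose base keeps going stale triggers at most four queue calls before teardown hands off to `Skill(debugging)`.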
@@ -0,0 +1,48 @@
+ # Bootstrap Checks
+
+ Run before worktree creation. Any failure → refuse worktree mode, fall back to single-agent.
+
+ | Check | Command | Pass criterion | Fail action | Reason |
+ |-------|---------|----------------|-------------|--------|
+ | Git version | `git --version` | major.minor ≥ 2.48 | refuse worktree mode | Known worktree bugs < 2.48 (admin-dir leaks, prune races) |
+ | LFS version | `git lfs version` (skip if not installed) | ≥ 3.6 OR not installed | refuse worktree mode | Known worktree-locking bugs < 3.6 |
+ | Submodule scan | `git submodule status` | empty output | refuse worktree mode | Submodules officially "not recommended" in worktrees (git docs) |
+ | `core.hooksPath` | `git config core.hooksPath` | empty OR path resolves inside worktree | refuse worktree mode | Husky/lefthook silent-skip risk when path points outside worktree |
+ | Hook smoke test | `git hook run pre-commit` in fresh worktree | exit 0 OR matches main-repo exit code | refuse worktree mode | Confirms hook plumbing actually fires inside worktree |
+
+ ## Fallback message
+
+ On any check failure, emit verbatim to user:
+
+ ```
+ M10 worktree mode unavailable: <reason>. Continuing in single-agent mode. See ADR-005 for compatibility matrix.
+ ```
+
+ Replace `<reason>` with the failing check name + observed value (e.g., "Git version 2.45.1 < required 2.48").
+
+ ## State write on refusal
+
+ ```yaml
+ lifecycle:
+   worktree_mode: refused
+   refused_reason: "<check name>: <observed>"
+   refused_at: "{ISO8601}"
+ ```
+
+ ## Check execution order
+
+ Run checks 1→5 in order. First failure halts the sequence — no point running hook smoke if Git itself is too old. Record the failing check name + observed value as `refused_reason` so the user knows exactly what to fix.
+
+ ## Remediation hints
+
+ | Failing check | User remediation |
+ |---------------|------------------|
+ | Git version | Upgrade Git to ≥ 2.48 (Homebrew: `brew upgrade git`; apt: backports or PPA) |
+ | LFS version | Upgrade Git LFS to ≥ 3.6 OR uninstall if not actually used |
+ | Submodule scan | Convert submodules to subtrees or vendored copies; M10 cannot coexist with submodules per upstream git guidance |
+ | `core.hooksPath` | Either unset (`git config --unset core.hooksPath`) or move hook dir inside repo tree so it resolves under each worktree |
+ | Hook smoke test | Inspect failing hook output; common cause is hook script assuming `$PWD` is main repo root — fix to use `git rev-parse --show-toplevel` |
+
+ ## Re-running checks
+
+ Bootstrap is idempotent and cheap (~200ms total). Skill re-runs on every session start; users do not invoke directly. If a check transiently fails (e.g., LFS not yet installed during initial setup), simply restart the orchestration entry.
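The Git and LFS version checks reduce to two steps: parse a version string, then compare dotted numbers. A minimal sketch follows.

- It assumes the usual output shapes `git version X.Y.Z ...` and `git-lfs/X.Y.Z (...)`.
- All helper names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch of the Git / LFS version gates (hypothetical helpers).

# "git version 2.39.3 (Apple Git-145)" -> "2.39.3"
parse_git_version() {
  local s=${1#git version }
  printf '%s\n' "${s%% *}"
}

# "git-lfs/3.4.0 (GitHub; darwin arm64; go 1.21.1)" -> "3.4.0"
parse_lfs_version() {
  local s=${1#git-lfs/}
  printf '%s\n' "${s%% *}"
}

# Succeeds when dotted version $1 >= $2, comparing numeric fields left to right
# for as many fields as $2 has. Non-numeric tails in a field are ignored
# (e.g. "2.47.0.windows.1" compares as 2.47 against "2.48").
version_ge() {
  local -a have want
  local i h w
  IFS=. read -r -a have <<< "$1"
  IFS=. read -r -a want <<< "$2"
  for ((i = 0; i < ${#want[@]}; i++)); do
    h=${have[i]:-0}; h=${h%%[!0-9]*}; h=${h:-0}
    w=${want[i]:-0}
    if ((10#$h > 10#$w)); then return 0; fi
    if ((10#$h < 10#$w)); then return 1; fi
  done
  return 0
}
```

For example, `v=$(parse_git_version "$(git --version)")` followed by `version_ge "$v" 2.48 || echo "Git version $v < required 2.48"` yields a reason string in the format the fallback message expects.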
@@ -53,6 +53,17 @@ If missing, create from `.forge/templates/requirements.yml`:
  5. P1 (must) / P2 (should) / P3 (nice)
  6. Deferred: DEF-001... (also globally unique)
 
+ **E2E gate (M9):** For each functional requirement being added or refined:
+ 1. Decide `e2e: true|false` -- does this story need a post-validation e2e test?
+    - true = high-value user journey worth a real-browser walk + automated guard
+    - false = covered by integration/unit, or low-value to e2e
+    - Default to false. Only flag true for spine flows (auth, checkout-class flows, primary user task).
+ 2. When `e2e:true`, capture `observable_outcome:` -- one sentence describing what the user observes when the flow succeeds. Block planning until provided. No silent default.
+ 3. Re-planning: read existing `e2e` / `observable_outcome` decisions from `requirements/m{N}.yml`. Preserve them. Only prompt for new or unflagged FRs.
+ 4. Write `e2e`, `observable_outcome`, `validated: false`, `observable_outcome_hash: ""` to each FR. The verifying skill sets `validated: true` and fills in the hash later, once a human confirms the flow.
+
+ Contract: locked decision in `.forge/context.md` (M9 section, "Approach D"). Do NOT enforce the e2e soft cap here -- that's reviewing's job.
+
  **Blocks until all P1 `[NEEDS CLARIFICATION]` resolved.**
 
  Never write to top-level `.forge/requirements.yml` -- that path is deprecated.
@@ -150,6 +150,52 @@ refactoring_scan:
  suggested_approach: "Extract shared validateEmail() helper to src/utils/validation.ts"
  ```
 
+ ### Part 4: E2E Suite Audit (M9)
+
+ Three sub-checks. All advisory. None block milestone close.
+
+ **1. Soft-cap warning**
+
+ - Read `verification.e2e_soft_cap` from `.forge/project.yml`. Default 10 if absent.
+ - Count `e2e: true` stories in the active milestone's `.forge/requirements/m{N}.yml`.
+ - If count > cap → warn: `"E2E soft cap exceeded: {count}/{cap} stories flagged. Trim e2e:true stories or raise verification.e2e_soft_cap in project.yml. Soft cap — does not block."`
+ - **Skip-clean:** zero `e2e:true` stories → sub-check omitted from report.
+
+ **2. Orphan-test detection**
+
+ - Glob for e2e test files. Stack-detect from `project.yml` `interface_tools` (fallback: Playwright `tests/e2e/**/*.spec.ts` + `e2e/**/*.spec.ts`; pytest `tests/e2e/test_*.py`; go `e2e/*_test.go`).
+ - For each file, grep for `story: FR-` (either in comment or test/function name).
+ - If no match → flag: `"Orphan e2e test: {path} — no FR-XXX reference found. Either tag the story or delete the test."`
+ - List orphans in a dedicated subsection.
+ - **Skip-clean:** zero e2e files discovered → sub-check omitted from report.
+
+ **3. Flake-rate signal**
+
+ - Best-effort. Attempt sources in order:
+   1. `.forge/testing/suite-health.md` flake entries (tester analyst-mode output)
+   2. GitHub Actions test summary artifacts (parse from `.github/workflows/` outputs if accessible)
+   3. Local `playwright-report/` retry counts (if present)
+ - Aggregate per-test flake count. Surface top 5 flakiest with counts.
+ - If no source available AND e2e files exist → emit `"Flake-rate: no data (run testing skill analyst-mode for suite-health.md)"`.
+ - Never blocks.
+ - **Skip-clean:** zero e2e files discovered → sub-check omitted entirely (no "no data" line).
+
+ **Section-level skip-clean:** zero e2e test files AND zero `e2e:true` stories → omit the entire "E2E Suite Audit" section from the health report.
+
+ ```yaml
+ e2e_suite_audit:
+   soft_cap:
+     count: 3
+     cap: 10
+     status: ok # ok | exceeded
+   orphan_tests:
+     files_scanned: 4
+     orphans: [] # list of paths with no story: FR- reference
+   flake_rate:
+     source: "suite-health.md" # or "no data"
+     top_flaky: [] # [{path, count}, ...]
+ ```
+
  ## Step 4: Score
 
  **Per-category:**
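The soft-cap and orphan sub-checks are simple enough to sketch in shell. The helpers below are hypothetical and line-oriented, so treat them as an illustration of the logic, not a drop-in check:

- They grep YAML instead of parsing it.
- `read_soft_cap` does not check that `e2e_soft_cap` sits under `verification:`.
- `find_orphans` only covers the Playwright fallback globs (`tests/e2e/**/*.spec.ts`, `e2e/**/*.spec.ts`).

```bash
#!/usr/bin/env bash
# Sketch of the soft-cap and orphan sub-checks (hypothetical, grep-based helpers).

# Reads e2e_soft_cap from project.yml; falls back to 10 when absent or unreadable.
read_soft_cap() {
  local v
  v=$(sed -n 's/^[[:space:]]*e2e_soft_cap:[[:space:]]*\([0-9][0-9]*\).*/\1/p' "$1" 2>/dev/null | head -n 1)
  printf '%s\n' "${v:-10}"
}

# Counts uncommented "e2e: true" lines in a requirements/m{N}.yml file.
count_e2e_true() {
  local n
  n=$(grep -cE '^[[:space:]]*e2e:[[:space:]]*true([[:space:]]|#|$)' "$1" 2>/dev/null) || true
  printf '%s\n' "${n:-0}"
}

# Prints ok | exceeded; only count > cap warns, so count == cap is still ok.
soft_cap_status() {
  if [ "$1" -gt "$2" ]; then echo exceeded; else echo ok; fi
}

# Lists *.spec.ts files under $1/tests/e2e and $1/e2e lacking a "story: FR-" stamp.
find_orphans() {
  local root=$1 dir
  for dir in "$root/tests/e2e" "$root/e2e"; do
    [ -d "$dir" ] || continue
    find "$dir" -type f -name '*.spec.ts' -exec grep -L 'story: FR-' {} + || true
  done
}
```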
@@ -66,6 +66,35 @@ Read: .github/workflows/* → CI config (ci-check mode, analyst CI sub-check)
 
  ### Author Mode
 
+ #### E2E Preflight Gate (M9)
+
+ Runs ONLY for e2e authoring requests. Integration-test authoring + analyst mode skip this gate entirely.
+
+ **Preconditions per story** — for every FR the user requests an e2e for:
+
+ 1. Read `.forge/requirements/m{N}.yml`, locate the FR by ID.
+ 2. Check `e2e: true`. If false or missing → REFUSE with:
+    `` "Story {FR-ID} not flagged for e2e — add `e2e: true` + `observable_outcome` in requirements/m{N}.yml first (planning skill captures this during story breakdown)." ``
+ 3. Check `validated: true`. If false or missing → REFUSE with:
+    `"Story {FR-ID} not yet validated by human — run verifying skill and walk the flow first (the e2e validation gate writes validated:true on confirmation)."`
+ 4. Recompute `observable_outcome_hash` from current outcome text (SHA-256 utf-8, first 12 hex). Compare to stored hash. If mismatch → REFUSE with:
+    `"Story {FR-ID} observable_outcome changed since validation — re-run verifying skill to re-validate the updated flow."`
+ 5. Only when checks 2-4 all pass: proceed to author the e2e test.
+
+ **Story stamping (required on every authored e2e)** — every generated e2e file MUST include the story reference. Use the framework's natural mechanism:
+
+ - Playwright / Vitest / Jest TS: `// story: FR-XXX` at the top of the spec file AND the FR ID in the test name (e.g. `test('FR-053: user signs in with correct credentials', ...)`)
+ - pytest: `# story: FR-XXX` at the top of the test module AND in the test function name (`def test_FR_053_user_signs_in(...)`)
+ - go test: `// story: FR-XXX` above the test function AND in the test name (`func TestFR053UserSignsIn(t *testing.T)`)
+
+ No story ID = orphan. Reviewing skill (phase 17) flags orphans for deletion.
+
+ **Integration + analyst modes** — unchanged. No flag check, no validated check, no story-ID stamping enforcement. M9 lock is e2e-only.
+
+ Refusal message wording is contract (NFR-009 requires story ID + exact missing field). Do not paraphrase.
+
+ #### Standard author flow
+
  1. **Determine layer** — e2e vs integration. Ask if ambiguous.
  2. **Select runner:**
  - e2e + web/TS → **Playwright** (only option v1 — non-web e2e deferred)
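The three author-mode preconditions amount to one check that stops at the first failure. This sketch makes several assumptions:

- `preflight` and `outcome_hash` are hypothetical helpers.
- Field values arrive as arguments, with `""` standing for an absent field.
- The function reports the FR ID and the failing field; emitting the exact contract wording from the list above is left to the caller.
- Hashing the exact outcome string with no trailing newline is an assumption about how the text is fed to SHA-256.

```bash
#!/usr/bin/env bash
# Sketch of author-mode preconditions 2-4 (hypothetical helpers).

# SHA-256 of the exact outcome string (no trailing newline), first 12 hex chars.
outcome_hash() {
  if command -v sha256sum >/dev/null 2>&1; then
    printf '%s' "$1" | sha256sum | cut -c1-12          # GNU coreutils
  else
    printf '%s' "$1" | shasum -a 256 | cut -c1-12      # macOS
  fi
}

# Args: fr_id e2e validated observable_outcome stored_hash ("" = field absent).
# On the first failed check, prints "<fr_id>: <field>" and returns 1.
# Prints nothing and returns 0 when authoring may proceed.
preflight() {
  local fr=$1 e2e=$2 validated=$3 outcome=$4 stored_hash=$5
  if [ "$e2e" != true ]; then echo "$fr: e2e"; return 1; fi
  if [ "$validated" != true ]; then echo "$fr: validated"; return 1; fi
  if [ "$(outcome_hash "$outcome")" != "$stored_hash" ]; then
    echo "$fr: observable_outcome_hash"
    return 1
  fi
  return 0
}
```

Absent fields fall through to a refusal, which matches the lazy-default rule (absent means `e2e:false` / `validated:false`).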
@@ -91,6 +91,35 @@ Re-run verifying after tests are added.
 
  If detection is ambiguous (e.g. API tests hard to grep definitively) → lean toward PASS to avoid false blocks; note uncertainty in the verdict.
 
+ ## E2E Validation Gate (M9)
+
+ Runs AFTER code-level verification commands pass. Skipped if no `e2e:true` stories in the active milestone.
+
+ ### Steps
+
+ 1. Read `.forge/requirements/m{N}.yml` for the active milestone. Collect every functional requirement with `e2e: true`.
+ 2. If list is empty → skip gate silently. No prompt. No error.
+ 3. For each `e2e:true` FR, present to the human:
+    - FR ID + description
+    - `observable_outcome` text verbatim
+    - Prompt: *"Walk this flow manually. Did the observable outcome occur? [confirm | decline | skip]"*
+ 4. Per response:
+    - **confirm** → compute `observable_outcome_hash` = SHA-256(observable_outcome utf-8), truncate to first 12 hex chars. Write `validated: true` + the hash to the FR entry in `requirements/m{N}.yml`.
+    - **decline** → leave `validated: false`. Record decline + reason (free text) in the verification report.
+    - **skip** → leave `validated: false`. Record skip in the verification report. No reason required.
+ 5. **Hash drift check** (run BEFORE prompting, every gate invocation): for each `e2e:true` FR with `validated: true`, recompute hash from current `observable_outcome`. If it differs from stored `observable_outcome_hash` → set `validated: false`, clear hash. Note auto-reset in verification report. Then prompt that FR as unvalidated.
+ 6. Write per-FR validation outcomes into the verification report under section "E2E Validation".
+
+ ### Gate behavior
+
+ - **Advisory, not blocking.** Verifying still passes even if no stories validated — the hard gate is in `testing` skill author-mode (phase 16). This gate's job is to surface + record, not block.
+ - Per-story (not batch). Human walks one at a time.
+ - Hash: SHA-256, UTF-8 input, hex output truncated to first 12 chars. Deterministic across machines.
+
+ ### Skip-clean
+
+ Milestones with zero `e2e:true` stories never see this gate. Verifying logs nothing — appears as if the gate doesn't exist.
+
  ## 3-Level Goal-Backward Verification
 
  ### Level 1: Observable Truths
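Steps 4 and 5 of the validation gate come down to hashing the outcome text and comparing it with the stored value. Here is a sketch with hypothetical helper names. It assumes the outcome string is hashed exactly as stored: its UTF-8 bytes with no trailing newline. Reading and writing the YAML is left out.

```bash
#!/usr/bin/env bash
# Sketch of the gate's per-FR bookkeeping for steps 4-5 (hypothetical helpers).

# SHA-256 of the exact string (no trailing newline), first 12 hex chars.
outcome_hash() {
  if command -v sha256sum >/dev/null 2>&1; then
    printf '%s' "$1" | sha256sum | cut -c1-12
  else
    printf '%s' "$1" | shasum -a 256 | cut -c1-12      # macOS
  fi
}

# Step 5: succeeds when the current outcome text no longer matches the stored hash.
has_drifted() {  # args: stored_hash observable_outcome
  [ "$(outcome_hash "$2")" != "$1" ]
}

# Step 4: fields to write back to the FR entry for a human response.
apply_response() {  # args: response observable_outcome
  case "$1" in
    confirm)      printf 'validated=true observable_outcome_hash=%s\n' "$(outcome_hash "$2")" ;;
    decline|skip) printf 'validated=false\n' ;;   # decline reason goes to the report only
    *)            echo "unknown response: $1" >&2; return 1 ;;
  esac
}
```

A drifted FR would be reset to `validated: false` with its hash cleared, then prompted like any other unvalidated story.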
@@ -60,6 +60,7 @@ verification:
  # advisory: true # pre-existing type errors — warn, don't block
  auto_fix: true # On failure, agent fixes and retries
  max_retries: 2 # Max auto-fix attempts per command (0 = fail immediately)
+ e2e_soft_cap: 10 # M9: advisory cap on e2e:true stories per milestone. Reviewing warns when exceeded. Soft — never blocks.
  # Advisory mode: commands already failing before Forge started run but don't block — warn only.
 
  success_criteria: # How do we know we're done?
@@ -7,6 +7,11 @@
  milestone: 1 # Milestone this file belongs to (matches state/milestone-{id}.yml)
  version: "v1" # v1 = MVP, v2 = next iteration
 
+ # E2E fields (M9): mark `e2e: true` + `observable_outcome` during planning. Verifying skill
+ # prompts a human walk; on confirm it sets `validated: true` + `observable_outcome_hash`.
+ # Testing skill author-mode refuses e2e without validated:true. Reviewing skill warns on
+ # soft-cap exceeded + flags orphan tests. Fields are lazy — absent = e2e:false/validated:false.
+
  functional:
  # Each requirement: unique ID, description, acceptance criteria, phase assignment
  - id: FR-001
@@ -18,7 +23,12 @@ functional:
  phase: null # Assigned during roadmap creation
  priority: P1 # P1 = must-have, P2 = should-have, P3 = nice-to-have
  status: pending # pending | clarifying | planned | implemented | verified
  notes: "" # [NEEDS CLARIFICATION] if uncertain
+ # E2E gate (M9). Lazy — absent fields = e2e:false, validated:false.
+ # e2e: false # true = story gets one e2e test post-validation
+ # observable_outcome: "" # one-sentence user-observable outcome (required when e2e:true)
+ # observable_outcome_hash: "" # auto-computed SHA-256 of outcome (12 hex chars); editing outcome resets validated
+ # validated: false # set true by verifying skill after human walks the flow
 
  - id: FR-002
  description: ""
@@ -51,12 +51,12 @@ Auto-detects complexity. Override: "Use Quick/Standard/Full tier."
  | Architectural decisions | `architecting` | Full |
  | Break work into tasks with gates | `planning` | Standard, Full |
  | Build with deviation rules + atomic commits | `executing` | All |
- | Prove work delivers on goals | `verifying` | Standard, Full |
- | Audit health + catalog refactoring | `reviewing` | Standard, Full |
+ | Prove work delivers on goals (+ M9 e2e validation gate when `e2e:true` stories present) | `verifying` | Standard, Full |
+ | Audit health + catalog refactoring (+ M9 e2e soft-cap, orphan-test, flake-rate audits) | `reviewing` | Standard, Full |
  | Small scoped fix | `quick-tasking` | Quick |
  | UI with design system | `designing` | When UI |
  | Security review | `securing` | When auth/data/API |
- | E2E/integration tests + suite audit | `testing` | When UI/flows or flaky suite |
+ | E2E/integration tests + suite audit (+ M9 author-mode gate refuses e2e without `e2e:true` + `validated:true`) | `testing` | When UI/flows or flaky suite |
  | Systematic debugging | `debugging` | When stuck |
  | Upgrade Forge files | `upgrading` | On-demand |
  | Cross-session memory | `beads-integration` | When Beads installed |
@@ -125,7 +125,7 @@ State lives in `.forge/`:
  - `project.yml` — Vision, stack, design system, verification, constraints (<5KB)
  - `constitution.md` — Active architectural gates
  - `design-system.md` — Component mapping table
- - `requirements/m{N}.yml` — Per-milestone structured requirements with `[NEEDS CLARIFICATION]` markers. **FR-IDs, DEF-IDs, and NFR-IDs are globally unique across all milestone files** — `FR-001` may exist in exactly one `m{N}.yml`. Before adding a new ID, scan `.forge/requirements/*.yml` for the highest in-use number and continue the sequence. On collision (e.g. during a migration), keep the older milestone's ID and renumber the newer. Concurrent milestones each own their file — no cross-stream contention on file writes, but ID space is shared.
+ - `requirements/m{N}.yml` — Per-milestone structured requirements with `[NEEDS CLARIFICATION]` markers. **FR-IDs, DEF-IDs, and NFR-IDs are globally unique across all milestone files** — `FR-001` may exist in exactly one `m{N}.yml`. Before adding a new ID, scan `.forge/requirements/*.yml` for the highest in-use number and continue the sequence. On collision (e.g. during a migration), keep the older milestone's ID and renumber the newer. Concurrent milestones each own their file — no cross-stream contention on file writes, but ID space is shared. Functional requirements may carry M9 e2e gate fields (`e2e`, `observable_outcome`, `observable_outcome_hash`, `validated`) — lazy migration, absent fields default to `e2e:false`/`validated:false`.
  - `roadmap.yml` — Phases, milestones, dependencies
  - `state/index.yml` — Global: active milestones, desire_paths, metrics
  - `state/milestone-{id}.yml` — Per-milestone cursor: position, progress, decisions, blockers
@@ -173,6 +173,7 @@ verification:
  - Auto-fix loop: read output → fix → amend → re-run (up to max_retries)
  - 3-strike: retries count toward task limit
  - Empty commands = no gate (opt-out)
+ - `verification.e2e_soft_cap` (default 10) — advisory cap on `e2e:true` stories per milestone surfaced by the `reviewing` skill. Soft — never blocks.
 
  ## Beads Integration (Optional)