npm - @therocketcode/gsd-core - Versions diffs - 1.7.5 → 1.8.0 - Mend

@therocketcode/gsd-core 1.7.5 → 1.8.0


Files changed (34)

package/.claude-plugin/plugin.json +1 -1
package/agents/gsd-plan-checker.md +2 -2
package/commands/gsd/cicd-strategy.md +67 -0
package/commands/gsd/discover-product.md +2 -2
package/commands/gsd/infrastructure-strategy.md +65 -0
package/gemini-extension.json +1 -1
package/gsd-core/references/ai-test-quality.md +85 -0
package/gsd-core/references/architecture-decision.md +10 -7
package/gsd-core/references/cicd-strategy.md +115 -0
package/gsd-core/references/contract-testing.md +9 -1
package/gsd-core/references/data-environments.md +89 -0
package/gsd-core/references/domain-modeling.md +14 -2
package/gsd-core/references/e2e-tiering.md +2 -2
package/gsd-core/references/infrastructure-strategy.md +91 -0
package/gsd-core/references/product-discovery.md +7 -7
package/gsd-core/references/test-doubles.md +88 -0
package/gsd-core/references/test-strategy.md +6 -5
package/gsd-core/templates/adr.md +21 -1
package/gsd-core/templates/cicd-strategy.md +72 -0
package/gsd-core/templates/domain-model.md +4 -2
package/gsd-core/templates/infra-strategy.md +77 -0
package/gsd-core/templates/product-brief.md +10 -8
package/gsd-core/templates/test-strategy.md +8 -0
package/gsd-core/workflows/add-tests.md +8 -3
package/gsd-core/workflows/cicd-strategy.md +152 -0
package/gsd-core/workflows/discover-product.md +13 -9
package/gsd-core/workflows/discuss-phase.md +1 -1
package/gsd-core/workflows/help/modes/full.md +2 -0
package/gsd-core/workflows/infrastructure-strategy.md +142 -0
package/gsd-core/workflows/model-domain.md +13 -13
package/gsd-core/workflows/plan-phase.md +2 -2
package/gsd-core/workflows/recommend-architecture.md +22 -8
package/gsd-core/workflows/testing-strategy.md +6 -4
package/package.json +1 -1

package/gsd-core/workflows/cicd-strategy.md ADDED

@@ -0,0 +1,152 @@
+<purpose>
+Recommend a CI/CD strategy matched to the test strategy, the target infrastructure, and the team: WHERE CI runs, HOW it authenticates, WHICH test tiers gate which stage, and HOW deploys promote. GitHub Actions is the default platform; OIDC with a pinned `sub` is the default auth; the deployment ladder rung follows team size + blast radius — never aspiration. Runs after testing-strategy, before planning. Produces `.planning/CICD-STRATEGY.md`, consumed by plan-phase.
+</purpose>
+<required_reading>
+@~/.claude/gsd-core/references/cicd-strategy.md
+@~/.claude/gsd-core/templates/cicd-strategy.md
+</required_reading>
+<process>
+## Step 1: Initialize
+```bash
+_GSD_SHIM_NAME="gsd-tools.cjs"; _GSD_RUNTIME_ROOT="${RUNTIME_DIR:-$(git rev-parse --show-toplevel 2>/dev/null || pwd)}"; GSD_TOOLS="${_GSD_RUNTIME_ROOT}/gsd-core/bin/${_GSD_SHIM_NAME}"; if [ -f "$GSD_TOOLS" ]; then gsd_run() { node "$GSD_TOOLS" "$@"; }; elif [ -f "${_GSD_RUNTIME_ROOT}/.claude/gsd-core/bin/${_GSD_SHIM_NAME}" ]; then GSD_TOOLS="${_GSD_RUNTIME_ROOT}/.claude/gsd-core/bin/${_GSD_SHIM_NAME}"; gsd_run() { node "$GSD_TOOLS" "$@"; }; elif command -v gsd-tools >/dev/null 2>&1; then GSD_TOOLS="$(command -v gsd-tools)"; gsd_run() { "$GSD_TOOLS" "$@"; }; elif [ -f "$HOME/.claude/gsd-core/bin/${_GSD_SHIM_NAME}" ]; then GSD_TOOLS="$HOME/.claude/gsd-core/bin/${_GSD_SHIM_NAME}"; gsd_run() { node "$GSD_TOOLS" "$@"; }; else echo "ERROR: gsd-tools.cjs not found at $GSD_TOOLS and gsd-tools is not on PATH. Run: npx -y @therocketcode/gsd-core@latest --claude --local" >&2; exit 1; fi
+COMMIT_DOCS=$(gsd_run query config-get commit_docs 2>/dev/null || echo "true")
+RESPONSE_LANG=$(gsd_run query config-get response_language 2>/dev/null || true)
+ls .planning/PROJECT.md >/dev/null 2>&1 && echo "PROJECT_FOUND" || echo "NO_PROJECT"
+ls .planning/TEST-STRATEGY.md >/dev/null 2>&1 && echo "HAS_TEST_STRATEGY" || echo "NO_TEST_STRATEGY"
+ls .planning/INFRA-STRATEGY.md >/dev/null 2>&1 && echo "HAS_INFRA_STRATEGY" || echo "NO_INFRA_STRATEGY"
+ls .planning/CICD-STRATEGY.md >/dev/null 2>&1 && echo "EXISTS" || echo "NEW"
+```
+**If `NO_PROJECT`:** Stop — "No project found. Run /gsd:new-project first." Exit.
+**If `RESPONSE_LANG` non-empty:** all user-facing text in that language; keep technical terms, code, and stage names (PR gate / merge / nightly, OIDC, `sub`) in English.
+**Text mode** (`--text` OR `workflow.text_mode: true`): replace every `AskUserQuestion` with a plain-text numbered list.
+**If `EXISTS` and not `--auto`:** ask Update / View / Skip (header "Strategy"). On Skip: exit ("Existing CICD-STRATEGY.md preserved."). On View: show then Update/Skip.
+## Step 2: Load context
+```bash
+cat .planning/TEST-STRATEGY.md 2>/dev/null || true
+cat .planning/INFRA-STRATEGY.md 2>/dev/null || true
+cat .planning/adr/*.md 2>/dev/null || true
+cat .planning/PROJECT.md 2>/dev/null || true
+```
+**Read `@~/.claude/gsd-core/references/cicd-strategy.md` now** — it defines the GHA-default platform decision, OIDC-with-pinned-`sub`, the secrets split, the tier→stage mapping with the ≤10-min PR budget, the flaky canon, the merge-queue trigger, the deployment ladder, the free-six supply-chain table stakes, and the anti-pattern table.
+**If `NO_TEST_STRATEGY`:** tell the user "No test strategy found — the pipeline mapping is much better with one. (Consider `/gsd:testing-strategy` first.)" If they decline, proceed with generic tiers (small/unit → PR; medium/integration → merge; large/e2e → nightly) and note the gap in the output. From TEST-STRATEGY.md, extract: the per-subdomain level emphasis, the persistent e2e smoke list (the 3–7 flows), and the mutation-testing targets. From INFRA-STRATEGY.md / ADR / PROJECT.md, extract: target cloud + deploy target, repo host, team size, and blast radius (payments/PII/data = high). Ask only what's missing (header "Context"): team size, blast radius, expected merge volume.
+## Step 3: Platform choice
+**Default: GitHub Actions** — 41% org adoption, the ecosystem, merge queue, OIDC into all three clouds endorsed by AWS's and Google's own blogs. Recommend it whenever the repo is on GitHub.
+**If the user wants cloud-native CI (Cloud Build / CodeBuild) — push back unless they have a real reason:** "Cloud-native CI is a deliberate exception, not a default — even AWS publishes first-class GitHub Actions → AWS paths. The two reasons that justify it: (1) VPC-isolated/regulated builds that must run inside a private network or compliance boundary, (2) cheap compute behind GHA (e.g., CodeBuild hosting GHA runner jobs). Do either apply?" If yes — **honor it**: that's exactly the carve-out (record which reason). If no, recommend GHA and record their final choice either way.
+**The reverse holds too:** if the user has a genuine VPC/regulatory constraint and you were about to recommend GHA hosted runners, the cloud-native exception (or self-hosted runners in their VPC behind GHA) is the right call — don't dogmatically default.
+Pricing context if cost comes up: GHA Linux $0.006/min (2,000–3,000 free min/mo), Cloud Build $0.006/min + 2,500 free min/mo, Azure $40/parallel-job unlimited minutes. Stay on hosted runners until the bill clears the free tier plus low-hundreds $/mo; then managed third-party runners before DIY self-hosted; never self-hosted on public repos.
+## Step 4: Auth + secrets
+**Recommendation: OIDC/WIF with a pinned `sub` condition (repo + branch/environment) — zero long-lived cloud keys in CI.** Always state the caveat: ~1,500 cloud roles have been found assumable by *any* GitHub repo due to missing/wildcard `sub` conditions — bare "OIDC" is not the recommendation; **pinned-`sub` OIDC** is.
+**If the user says "we'll just put the service-account JSON in GitHub secrets" — push back:** "That's the exact pattern the CircleCI 2023 breach turned into a rotate-everything-everywhere incident, and GitGuardian finds 70% of leaked secrets still valid 2+ years later. Google, AWS, GitHub, and Microsoft all recommend OIDC federation instead — short-lived tokens, valid for a single job, nothing to steal. It's ~30 minutes of setup. Is there a target here that genuinely can't do federation?" Only if the target truly can't federate (legacy/3rd-party SaaS): a short-lived, scoped, rotated secret in CI secrets is the documented fallback.
+**The secrets split (record it as a table):**
+- Cloud deploy creds → nowhere (OIDC mints them per job).
+- CI-scoped secrets → CI platform secrets ONLY when OIDC is unavailable for that target.
+- Application secrets → ALWAYS the cloud secret manager, injected at runtime — **never baked into images, never a committed `.env`**.
+## Step 5: Pipeline design (map TEST-STRATEGY tiers to stages)
+Map the tiers from TEST-STRATEGY.md onto the three stages (this is Google's stated presubmit/postsubmit policy, and size↔flakiness is measured across 4.2M tests):
+- **PR gate — ≤10 min wall clock (hard budget; CD book + DORA canon):** lint, types, small (unit), fast in-process medium, and the **persistent smoke e2e list from TEST-STRATEGY.md (3–7 flows, happy paths only)**. If the suite doesn't fit, cut the gate — don't stretch the budget.
+- **Merge to main:** full medium suite + e2e subset against a real (preview/ephemeral) environment.
+- **Nightly / pre-release:** full e2e portfolio, long suites, cross-browser/device, and the **mutation run** (Stryker targets from TEST-STRATEGY.md).
+**Flaky policy (record it):** PR-gate tests must hold <1% flake rate; flaky tests are **quarantined from the PR gate but keep running post-merge with a fix SLA** (Google/Dropbox pattern). Differentiated retries (same-process / time-shifted / different-host) for *diagnosis* only. If the user asks for automatic retry-until-green on the gate, push back: blanket retries destroy the signal (Fowler) — quarantine + diagnose instead.
+**Merge queue:** recommend only at ~tens of merges/day to one branch (Uber measured ~40% conflict-breakage odds at just 16 concurrent conflicting changes). Below that volume it's pure latency — record the trigger ("enable when stale-base failures become routine") instead of enabling now.
+## Step 6: Deployment ladder rung
+Pick the rung from **team size + blast radius** (from Step 2), using the reference's ladder. Build-once/promote-same-artifact and one-command rollback are invariants at every rung.
+- **Solo/small, low blast radius:** trunk-based + one automated deploy path + free platform PR previews (+ Neon-style DB branch per preview if Postgres) + one-command rollback. **No staging environment.**
+- **High blast radius (payments/data) at any small size:** add feature flags (internal-first exposure) + revertable expand-contract schema changes + a deliberate blue-keep-alive rollback window.
+- **~10 people:** previews for every PR incl. backend, real flag hygiene, DORA metrics, manual canary.
+- **~50 / high traffic:** automated canary analysis.
+**Scripted pushback — "we need a staging environment" (solo dev):** "Staging catches only known-unknowns, and mirroring it to prod is — per Charity Majors — a fool's errand; Uber is actively deprecating staging. For a team your size the evidence-backed spend is PR previews + one-command rollback + production observability. The one exception worth a thin pre-prod: rehearsing a risky migration. Does that apply here?" Honor a genuine migration-rehearsal or compliance need.
+**Scripted pushback — "let's add canary deployments" without SLIs:** "Canary *analysis* has prerequisites (SRE Workbook): ~a dozen trustworthy low-variance SLIs and enough real traffic that a 1–5% slice yields signal. Without those it's automation theater on noise. Until then: rolling deploy + health checks + one-command rollback, and feature flags give you progressive exposure more simply. Do you have the SLIs and traffic today?" Record canary as a deferred item with its promotion trigger.
+## Step 7: Supply-chain table stakes
+Recommend the **free six** — each ≤ hours of work, each counters a real 2023–25 attack: (1) SHA-pin all actions + Dependabot pin updates (tj-actions CVE-2025-30066: moved tags, 23k+ repos), (2) lockfile + `npm ci` (Shai-Hulud worm), (3) top-level read-only `permissions:` / read-only `GITHUB_TOKEN` default, (4) OIDC zero-long-lived-keys (CircleCI breach), (5) push protection + secret scanning + no `.env` in repo, (6) branch ruleset on main (PR + checks, no force-push). Plus: never `pull_request_target` with untrusted checkout; never self-hosted runners on public repos.
+**Defer the ceremony** (record as deferred, with triggers): SLSA L3, cosign-signing internal artifacts, SBOM programs beyond the free SPDX export, org-wide Scorecard dashboards, self-hosted runner fleets. If publishing packages, take the free provenance win (`npm publish --provenance`).
+## Step 8: Over/under-engineering meta-tell check
+Before writing, audit every choice against the meta-tell: **if you cannot point to a current, concrete requirement justifying a capability** — a real compliance boundary for cloud-native CI, real merge volume for a queue, real SLIs+traffic for canary, a real migration to rehearse for pre-prod — **it's over-engineering: defer it with a recorded trigger.** Conversely, if a concrete requirement exists and the strategy ignores it — high blast radius with no flags/revertable schema path, a target that can't federate with no rotation plan, >10 merges/day with no queue — **that's under-engineering: fix it now.**
+## Step 9: Write CICD-STRATEGY.md
+Render `@~/.claude/gsd-core/templates/cicd-strategy.md` (fill `[DATE]`, `[PROJECT_TITLE]`). Fill: platform + why, auth method (OIDC config incl. the `sub` condition), the secrets table, the pipeline map with time budgets, the flaky policy, the ladder rung + promotion triggers, the supply-chain checklist, anti-patterns acknowledged, deferred items, and handoff notes for plan-phase.
+Write to `.planning/CICD-STRATEGY.md`.
+## Step 10: Commit
+```bash
+if [ "$COMMIT_DOCS" = "true" ]; then
+  gsd_run query commit "docs: add CI/CD strategy (pipeline follows test strategy)" --files .planning/CICD-STRATEGY.md
+else
+  echo "CICD-STRATEGY.md written but not committed (commit_docs is false)."
+fi
+```
+## Step 11: Wrap up
+Display:
+```
+CICD-STRATEGY.md written — pipeline mapped to the test strategy.
+  Platform: [GitHub Actions] · Auth: [OIDC, sub pinned to repo+env]
+  PR gate (≤10 min): [unit + fast integration + N smoke e2e]
+  Ladder rung: [solo: trunk + previews + rollback — no staging]
+  Supply chain: [6/6 free table stakes] · Deferred: [SLSA L3, cosign, canary]
+Next: /gsd:plan-phase   (CI/deploy phases will plan against this strategy)
+```
+</process>
+<critical_rules>
+- **GitHub Actions by default; cloud-native CI only as a deliberate exception** (VPC/regulatory isolation, or cheap compute behind GHA) — and honor the exception when the reason is real.
+- **Never bare "OIDC" — always OIDC with a pinned `sub` condition** (repo + branch/environment). Long-lived cloud keys in CI only when federation is genuinely impossible, then short-lived/scoped/rotated.
+- **App secrets live in the cloud secret manager, runtime-injected — never in images, never a committed `.env`.** CI platform secrets hold CI-scoped values only.
+- **The PR gate is ≤10 minutes.** Cut the gate to fit the budget, never the reverse. Quarantine flakes from the gate but keep them running post-merge; never blanket retry-until-green.
+- **Ladder rung follows team size + blast radius, not aspiration.** No staging for a solo dev (except migration rehearsal); no canary analysis without ~a dozen trustworthy SLIs + real traffic. Record promotion triggers for everything deferred.
+- **Recommend, don't dictate.** Present trade-offs with rationale; the user has context you lack. Respect `commit_docs` / `response_language`.
+</critical_rules>
+<success_criteria>
+- TEST-STRATEGY.md (or generic tiers, gap noted) + INFRA-STRATEGY/ADR context loaded; team size + blast radius established
+- Platform chosen with rationale (GHA default; any cloud-native exception justified by VPC/regulatory or compute-behind-GHA)
+- Auth recorded as pinned-`sub` OIDC (or the documented fallback with rotation); secrets split table filled
+- Pipeline map: PR gate ≤10 min (unit + fast medium + 3–7 smoke e2e), merge, nightly+mutation; flaky quarantine policy + merge-queue trigger recorded
+- Deployment ladder rung matched to team size + blast radius; staging/canary pushbacks applied; promotion triggers recorded
+- The free-six supply-chain table stakes recommended; SLSA/cosign/SBOM ceremony deferred with triggers
+- Meta-tell check passed (no capability without a current concrete requirement; no ignored requirement)
+- CICD-STRATEGY.md written and committed (when commit_docs is true)
+- User directed to /gsd:plan-phase
+</success_criteria>
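
Reviewer sketch, not part of the package: Step 4 of the added cicd-strategy workflow insists on OIDC with a pinned `sub` condition rather than bare OIDC. The check below is one way to express that rule in Python. It assumes GitHub's subject formats `repo:OWNER/REPO:ref:refs/heads/BRANCH` and `repo:OWNER/REPO:environment:NAME`. The helper name `is_pinned_sub` and the regular expression are hypothetical and do not come from gsd-core.

```python
import re

# Hypothetical pattern: exactly one repository plus one branch or one environment.
_PINNED_SUB = re.compile(
    r"^repo:[A-Za-z0-9_.-]+/[A-Za-z0-9_.-]+:"
    r"(ref:refs/heads/[A-Za-z0-9_./-]+|environment:[A-Za-z0-9_.-]+)$"
)


def is_pinned_sub(condition: str) -> bool:
    """Return True only when the trust-policy condition names an exact repo
    and an exact branch or environment, with no wildcard anywhere."""
    return "*" not in condition and bool(_PINNED_SUB.match(condition))


if __name__ == "__main__":
    for cond in (
        "repo:acme/shop:ref:refs/heads/main",    # pinned to repo + branch
        "repo:acme/shop:environment:production",  # pinned to repo + environment
        "repo:acme/*",                            # wildcard: any ref in the repo
        "repo:*",                                 # wildcard: any repo at all
    ):
        print(f"{cond!r}: pinned={is_pinned_sub(cond)}")
```

A check like this could run against exported trust policies before a role is accepted into the strategy document. It only inspects the string shape, so it cannot confirm that the cloud provider actually enforces the condition.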

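Reviewer sketch, not part of the package: Step 5 of the cicd-strategy workflow sets a hard 10-minute PR-gate budget and says to cut the gate rather than stretch the budget. The Python below illustrates that rule under two assumptions that the workflow does not state: checks run serially, so their durations add up, and each check carries a hypothetical `priority` that decides what stays on the gate. All durations are made-up numbers.

```python
from dataclasses import dataclass

PR_BUDGET_MIN = 10.0  # hard PR-gate budget from the workflow (wall clock)


@dataclass(frozen=True)
class Check:
    name: str
    minutes: float  # estimated duration; illustrative only
    priority: int   # lower value = keep on the PR gate first (assumption)


def fit_pr_gate(checks, budget=PR_BUDGET_MIN):
    """Walk checks in priority order; keep each one that still fits the
    budget and move every check that does not fit to the merge stage."""
    kept, moved, used = [], [], 0.0
    for check in sorted(checks, key=lambda c: c.priority):
        if used + check.minutes <= budget:
            kept.append(check.name)
            used += check.minutes
        else:
            moved.append(check.name)
    return kept, moved, used


suite = [
    Check("lint", 1.0, 1),
    Check("types", 1.5, 2),
    Check("unit", 3.0, 3),
    Check("fast-medium", 2.5, 4),
    Check("smoke-e2e", 2.0, 5),
    Check("full-medium", 8.0, 6),
]

if __name__ == "__main__":
    kept, moved, used = fit_pr_gate(suite)
    print("PR gate:", kept, f"({used} min)")
    print("Moved to merge stage:", moved)
```

With these numbers the first five checks fill the budget exactly and `full-medium` moves to the merge stage, which matches the workflow's split between the PR gate and merge to main.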
package/gsd-core/workflows/discover-product.md CHANGED

@@ -42,22 +42,26 @@ Use `AskUserQuestion` (header "Discovery"):
   - "Requirements are clear & evidenced" (→ Step 2a)
   - "Clear, but help me prioritize" (→ Step 2a, prioritization only)
-**Step 2a (clear/evidenced path):** First **audit the evidence** — confirm it is *behavioral* (a paying client, a signed LOI, real usage data), not *interest* (waitlists, likes, "people say it's great"). If the cited evidence is only interest, say so and route to the full interview (Step 3) instead — never honor "evidenced" on the strength of waitlists/likes. If the evidence is genuinely behavioral, do NOT run the full interview. Either:
+**Step 2a (clear/evidenced path):** First **audit the evidence** — two tests, BOTH must pass:
+1. **Strength:** the evidence is behavioral with *money moved or real usage* (a paying client, a live pilot), not *interest* (waitlists, likes, "people say it's great"). Signed non-binding LOIs are **medium** — never skip-qualifying alone.
+2. **Coverage:** the evidence covers the *specific candidate list* to be prioritized. If it covers only a slice, run the full interview (Step 3) scoped to the unevidenced remainder.
+If either fails, say so and route to the full interview — never honor "evidenced" on waitlists/likes/LOIs alone. If both pass, do NOT run the full interview, but first ask three one-question checks: the **named specific user**, the **narrowest-slice statement**, and an **outcome (not output) metric** — each must get a real answer before the minimal brief. Then either:
 - Offer lightweight **RICE** prioritization on their known candidate list (Reach × Impact × Confidence ÷ Effort; note table-stakes/dependencies override the score), capture it, then
 - Point them onward: "Requirements are clear — run `/gsd:new-project` to capture them, then `/gsd:model-domain`."
-Write a minimal PRODUCT-BRIEF.md (outcome + the prioritized list + "discovery skipped: requirements pre-evidenced") and skip to Step 10. Exit early if they don't even want prioritization.
+Write a minimal PRODUCT-BRIEF.md (outcome + user + slice + the prioritized list + "discovery skipped: requirements pre-evidenced") and skip to Step 10. Exit early if they don't even want prioritization.
 ## Step 3–9: The forcing interview
-Run the ordered question set from the reference. **Posture: the first answer is polished — push 2–3 times for concrete specifics (the actual human, the actual consequence), reflect back, confirm. One thread at a time.** Ask about the PAST, never hypotheticals. Skip any block already evidenced.
+Run the ordered question set from the reference. **Posture: the first answer is polished — push 2–3 times for concrete specifics (the actual human, the actual consequence), reflect back, confirm. One thread at a time.** Ask about the PAST, never hypotheticals. Skip a block ONLY when its named outputs are already captured at **strong** evidence — Step 4: specific user + job story + measurable outcomes; Step 5: signals marked strong/medium/weak; Step 6: wedge + >1-solution check; Step 9: dated outcome metric — and reflect the skipped block's conclusion back for confirmation before moving on.
-- **Step 3 — Frame (outcome):** "What customer behavior or metric do we want to change — not a feature?" "If we skipped discovery, what assumption would we be betting the whole build on?"
+- **Step 3 — Frame (outcome):** "What customer behavior or metric do we want to change — not a feature?" (The betting-the-build assumption is covered in Step 7 — don't ask it here.)
 - **Step 4 — Job & user:** "Who *specifically* — and for whom is this most acute, frequent, expensive, unavoidable?" Capture the solution-free job and a job story ("When … I want to … so I can …"). Then capture **2–3 measurable desired outcomes** for the job as *direction + metric + object* ("reduce the time to find an open class slot") — these are what "better" is measured against later. If the job-population is heterogeneous, capture outcomes **per segment** (different segments want different things — don't average them away). If after 2–3 pushes the user still can't name a specific acute role (answers "everyone"/"all X"), do NOT record a generic user — record the target user as **UNRESOLVED** and make "identify the acute user" the first open question. A non-specific user is a discovery red flag, not a finding.
-- **Step 5 — Demand vs interest:** "Tell me about the *last time* you hit this." "What are you doing about it *today*, and what does it cost?" "What *real* evidence exists — pre-pay, LOI, pilot, converted signups?" Mark each signal strong (behavior/money) vs weak (interest). **Never** ask hypotheticals — neither "would you use X?" nor "would you pay $Y?"; redirect any "they'd pay $Y" answer to "tell me about the last time someone actually paid for a workaround."
+- **Step 5 — Demand vs interest:** "Tell me about the *last time* you hit this." "What are you doing about it *today*, and what does it cost?" "What do they use today instead — including spreadsheets or nothing — and why hasn't it won?" "What *real* evidence exists — pre-pay, pilot in use, converted signups, signed LOIs?" Mark each signal **strong** (money moved / real usage / panic-when-it-breaks), **medium** (signed LOIs/unpaid pilots — real but not yet demand; convert to strong or treat as open), or **weak** (interest — waitlists, likes, "great idea"). **Never** ask hypotheticals — neither "would you use X?" nor "would you pay $Y?"; redirect any "they'd pay $Y" answer to "tell me about the last time someone actually paid for a workaround."
 - **Step 6 — Wedge:** "Which single opportunity, solved, most moves the outcome? What's the narrowest version that fully solves it for one user this week?" Check: can we imagine >1 solution? (If no, we smuggled in a solution — re-frame.)
-- **Step 7 — Four risks** (only the unvalidated): value / usability / feasibility / viability. First **enumerate the leap-of-faith assumptions** behind the chosen wedge (what must be true for it to work), order them by risk, and run the *cheapest test on the riskiest* — not just one test on the least-validated risk. Do not rely on the user's self-rating — if a risk is dismissed without evidence ("it's fine," "AI can do it"), treat it as **open**. Independently name any obvious risk the user omitted (e.g., legal/consent, data privacy, platform dependency) and mark it open with a test. Record the surviving assumptions in the brief's "Assumptions to re-test" table — the brief is a hypothesis to keep testing, not a verdict.
+- **Step 7 — Four risks** (only the unvalidated): value / usability / feasibility / viability. First **enumerate the leap-of-faith assumptions** behind the chosen wedge (what must be true for it to work), order them by risk, and **specify** the cheapest test for the *riskiest* — record it in the brief with pass/fail threshold, kill criterion, owner, and by-when (tests run *after* this session, before building) — not just one test on the least-validated risk. Do not rely on the user's self-rating — if a risk is dismissed without evidence ("it's fine," "AI can do it"), treat it as **open**. **Value is never "validated" on founder testimony alone** — it requires customer-sourced evidence (a named customer's behavior, quote, or money). Independently name any obvious risk the user omitted (e.g., legal/consent, data privacy, platform dependency) and mark it open with a test. Record the surviving assumptions in the brief's "Assumptions to re-test" table — the brief is a hypothesis to keep testing, not a verdict.
 - **Step 8 — Scope & prioritization:** the end-to-end journey → the thin first slice; RICE the candidate list; record explicit "not in scope."
-- **Step 9 — Success:** the **outcome metric** (a change in customer behavior or business result) + by when; the PMF check (what would make ≥40% of core users "very disappointed"). **Reject vanity/output metrics — signups, waitlist size, downloads, page views, "launched" — and push to the behavior/result they proxy for (retained paying users, task completion, % of the target behavior achieved). A user-count is an output unless tied to retained value.**
+- **Step 9 — Success:** the **outcome metric** (a change in customer behavior or business result) + by when; the PMF check, **pre-registered**: define now the Sean Ellis criterion (≥40% "very disappointed") to survey once ≥N pilots have used the core (only users who used it) — a *planned measurement*, never a founder prediction. **Reject vanity/output metrics — signups, waitlist size, downloads, page views, "launched" — and push to the behavior/result they proxy for (retained paying users, task completion, % of the target behavior achieved). A user-count is an output unless tied to retained value.**
 ## Step 10: Write PRODUCT-BRIEF.md
@@ -83,7 +87,7 @@ PRODUCT-BRIEF.md written — product defined.
   Outcome: [one line]
   Wedge: [the narrowest paid slice]
-  Demand: [strong | weak — based on past-behavior evidence]
+  Demand: [strong | medium | weak — based on past-behavior evidence]
   Four risks: [N validated · M open]
 Next: /gsd:new-project (capture it) → /gsd:model-domain (the domain) → /gsd:recommend-architecture
@@ -105,7 +109,7 @@ Next: /gsd:new-project (capture it) → /gsd:model-domain (the domain) → /gsd:
 - Specific user + solution-free job + job story captured
 - Demand separated from interest via past-behavior evidence
 - Narrowest wedge identified; vision admits >1 solution
-- Four risks assessed (unvalidated ones flagged with a cheapest test)
+- Four risks assessed (unvalidated ones get a specified cheapest test — threshold, owner, by-when)
 - Scope prioritized (thin slice + RICE; explicit "not in scope")
 - PRODUCT-BRIEF.md written and committed (when commit_docs is true)
 - User directed to /gsd:new-project or /gsd:model-domain
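
Reviewer sketch, not part of the package: the revised Step 2a requires two tests to pass, strength and coverage, and routes any unevidenced remainder to the full interview. The Python below is one possible reading of that rule, in which each candidate is judged by the strongest signal recorded for it. The function name `audit_evidence` and the per-candidate mapping are assumptions made for illustration.

```python
STRONG = "strong"  # money moved or real usage
MEDIUM = "medium"  # signed non-binding LOIs, unpaid pilots
WEAK = "weak"      # waitlists, likes, praise


def audit_evidence(candidates, evidence):
    """Return (skip_full_interview, unevidenced_remainder).

    `evidence` maps a candidate name to its strongest recorded signal.
    Only STRONG counts as evidenced, so a missing entry, MEDIUM, or WEAK
    all leave the candidate in the remainder.
    """
    remainder = [c for c in candidates if evidence.get(c) != STRONG]
    return (len(remainder) == 0, remainder)


if __name__ == "__main__":
    candidates = ["booking", "payments", "reminders"]
    evidence = {"booking": STRONG, "payments": MEDIUM}
    skip, remainder = audit_evidence(candidates, evidence)
    print("skip full interview:", skip)
    print("interview scoped to:", remainder)
```

In this example the LOI-backed `payments` item and the unevidenced `reminders` item both stay in scope for the interview, which reflects the diff's statement that LOIs alone never qualify for skipping.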

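Reviewer sketch, not part of the package: the discover-product workflow prioritizes with RICE (Reach × Impact × Confidence ÷ Effort) and notes that table-stakes and dependencies override the score. The Python below works through that arithmetic. The `table_stakes` flag is a hypothetical way to model the override, and the backlog entries are invented numbers.

```python
def rice_score(reach, impact, confidence, effort):
    """RICE = reach * impact * confidence / effort."""
    if effort <= 0:
        raise ValueError("effort must be positive")
    return reach * impact * confidence / effort


def prioritize(candidates):
    """Order candidates: table-stakes items first, then by descending RICE."""
    def sort_key(c):
        score = rice_score(c["reach"], c["impact"], c["confidence"], c["effort"])
        return (not c.get("table_stakes", False), -score)
    return [c["name"] for c in sorted(candidates, key=sort_key)]


backlog = [
    {"name": "slot search", "reach": 400, "impact": 2, "confidence": 0.5, "effort": 4},
    {"name": "calendar sync", "reach": 200, "impact": 3, "confidence": 1.0, "effort": 3},
    {"name": "login", "reach": 50, "impact": 1, "confidence": 1.0, "effort": 1,
     "table_stakes": True},
]

if __name__ == "__main__":
    for item in backlog:
        score = rice_score(item["reach"], item["impact"], item["confidence"], item["effort"])
        print(f'{item["name"]}: {score}')
    print("order:", prioritize(backlog))
```

Here `login` scores lowest on RICE (50.0) but still sorts first because it is marked table-stakes, followed by `calendar sync` (200.0) and `slot search` (100.0).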
package/gsd-core/workflows/discuss-phase.md CHANGED

@@ -293,7 +293,7 @@ Analyze the phase to identify gray areas. Use both `prior_decisions` and `codeba
 1b. **Initialize canonical refs accumulator** — Start building `<canonical_refs>` for CONTEXT.md. Sources:
    - **Now:** Copy `Canonical refs:` from ROADMAP.md for this phase. Expand each to a full relative path. Check REQUIREMENTS.md and PROJECT.md for specs/ADRs referenced.
-   - **Project discovery artifacts (if present):** add `.planning/DOMAIN-MODEL.md` (ubiquitous language + core/supporting/generic subdomains), the most recent `.planning/adr/*.md` (the architecture decision — per-subdomain domain-logic rung + topology), and `.planning/TEST-STRATEGY.md` (per-subdomain test levels + the linked test-infra how-to references). These are project-wide and apply to EVERY phase: planning and implementation MUST follow the domain model, the architecture decision, and the test strategy. For each, note its role (e.g. "Architecture decision — MUST follow the chosen rung per subdomain"). When `TEST-STRATEGY.md` is present, the plan's test tasks must also pull in the specific `gsd-core/references/<test-infra>.md` it links for the level being written.
+   - **Project discovery artifacts (if present):** add `.planning/DOMAIN-MODEL.md` (ubiquitous language + core/supporting/generic subdomains), the most recent `.planning/adr/*.md` (the architecture decision — per-subdomain domain-logic rung + topology), and `.planning/TEST-STRATEGY.md` (per-subdomain test levels + the linked test-infra how-to references), plus `.planning/INFRA-STRATEGY.md` (compute/data/environments decisions) and `.planning/CICD-STRATEGY.md` (pipeline stages, deploy ladder, secrets policy) when present. These are project-wide and apply to EVERY phase: planning and implementation MUST follow the domain model, the architecture decision, and the test strategy. For each, note its role (e.g. "Architecture decision — MUST follow the chosen rung per subdomain"). When `TEST-STRATEGY.md` is present, the plan's test tasks must also pull in the specific `gsd-core/references/<test-infra>.md` it links for the level being written.
    - **`scout_codebase`:** If existing code references docs (e.g., comments citing ADRs), add those.
    - **`discuss_areas`:** When the user says "read X", "check Y", or references any doc/spec/ADR — add it immediately. These are often the MOST important refs.

package/gsd-core/workflows/help/modes/full.md CHANGED

@@ -584,6 +584,8 @@ The commands above cover the most common day-to-day flows. Every command listed
 ### Planning & Execution
 - **`/gsd:testing-strategy [--auto] [--text]`** — Recommend a test strategy matched to the architecture (shape follows architecture; levels; what to test); writes TEST-STRATEGY.md.
+- **`/gsd:infrastructure-strategy [--auto] [--text]`** — Recommend infrastructure matched to actual scale and team: compute rung, data layer per environment, observability + IaC floors; writes INFRA-STRATEGY.md.
+- **`/gsd:cicd-strategy [--auto] [--text]`** — Recommend a CI/CD strategy: platform, OIDC auth, secrets split, test-tier→stage mapping, deploy ladder; writes CICD-STRATEGY.md.
 - **`/gsd:mvp-phase <phase-number>`** — Plan a phase as a vertical MVP slice (user story + SPIDR splitting) before handing off to plan-phase. Same end-state as `/gsd:plan-phase --mvp`, with a guided MVP-shaping intro.
 - **`/gsd:ultraplan-phase [phase]`** — [BETA] Offload plan phase to Claude Code's ultraplan cloud; review in browser and import back.
 - **`/gsd:plan-review-convergence <phase> [--codex] [--gemini] [--claude] [--opencode] [--ollama] [--lm-studio] [--llama-cpp] [--all] [--text] [--ws <name>] [--max-cycles N]`** — Cross-AI plan convergence loop — replan with review feedback until no HIGH concerns remain. Supports both cloud reviewers (Codex/Gemini/Claude/OpenCode) and local model runtimes (Ollama, LM Studio, llama.cpp).

package/gsd-core/workflows/infrastructure-strategy.md ADDED

@@ -0,0 +1,142 @@
+<purpose>
+Recommend an infrastructure strategy matched to the project's actual traffic shape, team size, and spend: WHICH cloud, WHICH compute rung per component, WHAT data layer per environment, and the observability + IaC floors. The compute rung is an OUTPUT of evidence (utilization crossovers, team-size floor, capability triggers), never a platform picked for résumé or comfort. Runs after recommend-architecture (consumes the topology) and testing-strategy (consumes CI needs), before planning. Produces `.planning/INFRA-STRATEGY.md`, consumed by /gsd:cicd-strategy and plan-phase.
+</purpose>
+<required_reading>
+@~/.claude/gsd-core/references/infrastructure-strategy.md
+@~/.claude/gsd-core/references/data-environments.md
+@~/.claude/gsd-core/templates/infra-strategy.md
+</required_reading>
+<process>
+## Step 1: Initialize
+```bash
+_GSD_SHIM_NAME="gsd-tools.cjs"; _GSD_RUNTIME_ROOT="${RUNTIME_DIR:-$(git rev-parse --show-toplevel 2>/dev/null || pwd)}"; GSD_TOOLS="${_GSD_RUNTIME_ROOT}/gsd-core/bin/${_GSD_SHIM_NAME}"; if [ -f "$GSD_TOOLS" ]; then gsd_run() { node "$GSD_TOOLS" "$@"; }; elif [ -f "${_GSD_RUNTIME_ROOT}/.claude/gsd-core/bin/${_GSD_SHIM_NAME}" ]; then GSD_TOOLS="${_GSD_RUNTIME_ROOT}/.claude/gsd-core/bin/${_GSD_SHIM_NAME}"; gsd_run() { node "$GSD_TOOLS" "$@"; }; elif command -v gsd-tools >/dev/null 2>&1; then GSD_TOOLS="$(command -v gsd-tools)"; gsd_run() { "$GSD_TOOLS" "$@"; }; elif [ -f "$HOME/.claude/gsd-core/bin/${_GSD_SHIM_NAME}" ]; then GSD_TOOLS="$HOME/.claude/gsd-core/bin/${_GSD_SHIM_NAME}"; gsd_run() { node "$GSD_TOOLS" "$@"; }; else echo "ERROR: gsd-tools.cjs not found at $GSD_TOOLS and gsd-tools is not on PATH. Run: npx -y @therocketcode/gsd-core@latest --claude --local" >&2; exit 1; fi
+COMMIT_DOCS=$(gsd_run query config-get commit_docs 2>/dev/null || echo "true")
+RESPONSE_LANG=$(gsd_run query config-get response_language 2>/dev/null || true)
+ls .planning/PROJECT.md >/dev/null 2>&1 && echo "PROJECT_FOUND" || echo "NO_PROJECT"
+ls .planning/adr/*.md >/dev/null 2>&1 && echo "HAS_ADR" || echo "NO_ADR"
+ls .planning/INFRA-STRATEGY.md >/dev/null 2>&1 && echo "EXISTS" || echo "NEW"
+```
+**If `NO_PROJECT`:** Stop — "No project found. Run /gsd:new-project first." Exit.
+**If `RESPONSE_LANG` non-empty:** all user-facing text in that language; keep technical terms, service names (Cloud Run, Fargate, GKE), and rung names in English.
+**Text mode** (`--text` OR `workflow.text_mode: true`): replace every `AskUserQuestion` with a plain-text numbered list.
+**If `EXISTS` and not `--auto`:** ask Update / View / Skip (header "Infra"). On Skip: exit ("Existing INFRA-STRATEGY.md preserved."). On View: show then Update/Skip.
+## Step 2: Load context
+```bash
+cat .planning/PROJECT.md 2>/dev/null || true
+cat .planning/PRODUCT-BRIEF.md 2>/dev/null || true
+cat .planning/REQUIREMENTS.md 2>/dev/null || true
+cat .planning/adr/*.md 2>/dev/null || true
+cat .planning/TEST-STRATEGY.md 2>/dev/null || true
+```
+**Read `@~/.claude/gsd-core/references/infrastructure-strategy.md` now** — it defines the compute ladder with quantified move-up triggers, the crossover numbers (Fargate-vs-EC2, the CAST AI utilization data, the <4-engineers floor), the per-cloud asymmetries and equivalences table, the observability floor, the when-you-actually-need triggers, the IaC floor, the anti-patterns, and the meta-tell.
+From the artifacts, extract: **scale expectations + traffic shape** (PRODUCT-BRIEF), **deployment topology** — how many independently deployed components (the ADR; monolith → one service is the normal answer), and **CI environment needs** (TEST-STRATEGY: test containers, e2e environments). **If `NO_ADR`:** tell the user "No architecture decision found — I'll ask briefly. (Consider `/gsd:recommend-architecture` first.)" then ask: how many deployables, and is anything stateful self-managed?
+Then gather the three inputs every crossover keys off (AskUserQuestion, header "Shape", or a text list): **traffic shape** (idle most of the day? bursty? steady?), **team size** (engineers who'd touch infra), and **expected monthly compute spend** (or "no idea" — fine, the default rung is the safe prior).
+## Step 3: Cloud selection
+Ask (AskUserQuestion, header "Cloud"): "Is a cloud already decided or constrained (existing org account, credits, compliance, team experience)?" Options: GCP / AWS / Azure / "No constraint — recommend one".
+If constrained: take it — the ladder is cloud-portable; use the reference's equivalences table to translate every later recommendation into that cloud's column. If unconstrained: recommend by **team familiarity first** (the cloud the team knows beats marginal pricing differences), and surface the one asymmetry that matters at this stage: a scale-to-zero **$0-idle dev environment** is real on GCP (Cloud Run) and Azure (Container Apps) via their shared free grant; on AWS, Fargate has no free tier and does not scale to zero — dev costs idle money or gets redesigned around Lambda. For a small greenfield team with no constraint, that asymmetry usually tips GCP or Azure.
+## Step 4: Compute rung (walk the ladder per component)
+For each deployable component from the ADR topology, walk the reference's ladder. **Default = serverless containers** (Cloud Run / ECS+Fargate / Container Apps). Only place a component on another rung when a concrete trigger from the reference fires — and record the trigger next to the rung.
+- **Rung-down check (static/FaaS):** pre-renderable frontends → static/edge hosting; pure event-glue (webhooks, queue consumers, cron) → FaaS is fine *until* the FaaS→containers triggers: >15-min runs, WebSockets/streaming, connection pools / in-memory caches, or ~>15M invocations/month sub-second.
+- **Rung-up check (K8s):** only when sustained utilization >40–50% with commitments (roughly >$50–100k/yr compute) AND team ≥4 engineers or a platform owner, OR a capability trigger (GPUs, sidecars/mesh, multi-tenant isolation, stateful sets, operators). If the K8s API is genuinely needed early, recommend **GKE Autopilot** (pod-level billing, no node ops) before a standard cluster.
+**Scripted pushbacks — use these, don't improvise:**
+- *"We need Kubernetes to scale."* Real clusters average **8–13% CPU utilization** (CAST AI, 2,000+ orgs; ~70% of requested resources never used); AWS's own study shows **Fargate 87% cheaper at ~6% utilization**, and dedicated compute only wins above ~70–80% sustained on-demand (~40–50% with commitments) — 4–10× above what the median cluster achieves. Serverless containers scale further than this project's brief requires. And the team-size floor is hard: **<4 engineers → no self-managed K8s.** Ask for the current, concrete capability trigger; if none, the default rung stands. If the K8s API itself is the requirement, offer GKE Autopilot as the middle rung.
+- *"We'll just use a VM, it's simpler."* Check the exception list: BYOL licensing, special hardware/GPU control, kernel access, self-managed stateful systems, max-utilization 3-yr-RI fleets. None apply → the VM means hand-rolling deploys, health checks, rollouts, and autoscaling that the serverless-container platform does for free. (Honest exception: a tiny fixed-traffic product on one boring VM is legitimate — say so if it fits, and record the promotion trigger.)
+For every component, also record the **promotion trigger to the next rung** (the measurable signal — sustained utilization, invocation volume, a capability need — that would justify moving up later).
+## Step 5: Data layer (delegate detail)
+**Read `@~/.claude/gsd-core/references/data-environments.md` now** — it owns the database detail: the serverless-Postgres crossover (when Neon/Aurora-Serverless-class beats provisioned), why **connection pooling is mandatory** the moment serverless compute talks to Postgres (every scaled-out instance opens connections; poolers/pgbouncer or the provider's pooled endpoint), and the per-environment data story.
+Here, decide only the headline: managed Postgres in the chosen cloud (Cloud SQL / RDS / PG Flexible Server) vs serverless Postgres for dev/preview, sized one notch smaller than instinct (61% never rightsize; rightsizing recovers 20–40%), storage autoscaling on, multi-AZ only when users would notice an outage. Record per environment: dev / preview / staging / prod, and the **crossover-watch metric** from the reference (the number that, when crossed, flips the provisioned-vs-serverless answer).
+## Step 6: Environments, observability floor, IaC floor
+Recommend defaults, then confirm in one round (AskUserQuestion, header "Floors", or a text list):
+- **Environments:** dev (scale-to-zero, $0-idle where the cloud allows) → staging (prod-shaped, smaller) → prod. Preview-per-PR only if the platform makes it free-ish (Cloud Run/ACA revisions do).
+- **Observability floor (day one, ~$0–26/mo):** structured JSON logs to stdout; error tracking (Sentry free tier); an external uptime check; one golden-signals dashboard from platform metrics (rate, errors, p50/p95/p99, instances); **3–5 alerts max including a billing budget alert** — the most important alert a small team sets. Explicitly defer tracing/OTel and SLO machinery until **>3 services in a request path**.
+- **IaC floor:** one small Terraform/OpenTofu root module (~100–200 lines, remote state in a bucket) OR honest scripted CLI deploys checked into the repo — both acceptable; **Terraform earns its keep at the second environment or second service**. Secrets in the cloud secret manager, never in tfvars. No premature modules.
+- **Day-one non-negotiables:** max-instances cap (the cost ceiling), billing alert, structured logs, uptime check, the IaC floor.
+## Step 7: Over-/under-engineering check (the meta-tell, both directions)
+- **Downward:** every rung above serverless containers must name a **current, concrete requirement** (a real >15-min job, a real GPU/operator need, measured utilization above the crossover, a real BYOL contract). No concrete requirement → drop to the default. Same for LB/VPC/multi-region: no trigger from the reference's table → not yet.
+- **Upward:** scan for parked mismatches — a self-managed Kafka or a stateful workload on the default rung, a 16-hour batch on Lambda, private-data access with no VPC plan, a CI strategy (TEST-STRATEGY) that needs containers the platform can't run. Move **that one component** up, not the whole stack.
+State the surviving justifications; they go in the strategy doc.
+## Step 8: Write INFRA-STRATEGY.md
+**Recommend, don't dictate.** Present the full recommendation in one paragraph (cloud, rung per component, data layer, floors) with 1–2 alternatives and trade-offs (AskUserQuestion, header "Infra"): "Accept", "Adjust (I'll tell you what)", "Show alternatives in detail".
+Once approved, render `@~/.claude/gsd-core/templates/infra-strategy.md` (fill `[DATE]`, `[PROJECT_TITLE]`, `[ADR-NNNN]`). Fill: cloud + why; the per-component compute table with triggers and promotion triggers; data layer per environment with the crossover-watch metric; environments map; secrets; the observability checklist; IaC approach; cost guardrails (billing alert thresholds, max-instances caps); NOT-decided/deferred; handoff notes.
+Write to `.planning/INFRA-STRATEGY.md`.
+## Step 9: Commit
+```bash
+if [ "$COMMIT_DOCS" = "true" ]; then
+  gsd_run query commit "docs: add infrastructure strategy (serverless-container default)" --files .planning/INFRA-STRATEGY.md
+else
+  echo "INFRA-STRATEGY.md written but not committed (commit_docs is false)."
+fi
+```
+## Step 10: Wrap up
+Display:
+```
+INFRA-STRATEGY.md written — infrastructure matched to traffic shape and team size.
+  Cloud: [GCP|AWS|Azure] ([why])
+  Compute: [component → rung (trigger)] ...
+  Data: [managed PG / serverless PG per env]
+  Floors: observability [N alerts incl. billing] · IaC [Terraform root module | scripted CLI]
+  Cost guardrails: max-instances cap + billing alert at [$N]
+Next: /gsd:cicd-strategy   (pipelines + deploy targets will follow this strategy)
+```
+</process>
+<critical_rules>
+- **Serverless containers are the default rung.** Every rung above it needs a current, concrete trigger — recorded next to the rung. The CAST AI utilization data and the Fargate crossover are the evidence; cite them when pushing back.
+- **The team-size floor is hard.** <4 engineers → no self-managed Kubernetes; GKE Autopilot is the escape hatch when the K8s API is genuinely required.
+- **Per-cloud asymmetries change answers.** Fargate ≠ scale-to-zero and has no free tier; Cloud Run/Container Apps give a $0-idle dev story. Never recommend symmetrically across clouds.
+- **Day-one non-negotiables:** max-instances cap, billing alert, structured logs, uptime check, the IaC floor.
+- **Apply the meta-tell in both directions.** Drop unjustified rungs down; move genuinely-triggered components up — one component at a time, never the whole stack.
+- **Recommend, don't dictate.** Present trade-offs and alternatives; the user approves. Respect `commit_docs` / `response_language`.
+</critical_rules>
+<success_criteria>
+- Context loaded (PRODUCT-BRIEF / ADR / TEST-STRATEGY where present); traffic shape, team size, and spend gathered
+- Cloud chosen (constraint or familiarity) with the scale-to-zero asymmetry surfaced
+- Compute rung decided per component, each non-default rung tied to a concrete trigger; promotion triggers recorded
+- Data layer per environment decided, pooling noted, crossover-watch metric recorded (per data-environments.md)
+- Observability floor (incl. billing alert) and IaC floor confirmed; tracing/SLO explicitly deferred until >3 services in a request path
+- Meta-tell applied both directions
+- INFRA-STRATEGY.md written and committed (when commit_docs is true)
+- User directed to /gsd:cicd-strategy
+</success_criteria>
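
The rung-up check in Step 4 of the added workflow can be read as a small decision function. The sketch below is illustrative only and is not part of the package: the name `compute_rung`, its argument order, and the 40% threshold (the low end of the workflow's 40–50% "with commitments" range) are assumptions. It also leaves out the spend estimate and the "platform owner" alternative to the team-size floor, and it returns `gke-autopilot` as the managed middle rung whenever a capability trigger meets a team below the floor.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of gsd-core: encodes the Step 4 rung-up check.
# Usage: compute_rung <sustained_util_pct> <team_size> <capability_trigger: yes|no>
compute_rung() {
  local util_pct="$1" team_size="$2" capability="$3"
  local util_threshold=40   # assumption: low end of the 40-50% "with commitments" range
  local team_floor=4        # the workflow's hard floor for self-managed Kubernetes

  if [ "$capability" = "yes" ]; then
    # A capability trigger (GPUs, mesh, operators, ...) justifies the K8s API,
    # but a team under the floor gets the managed middle rung instead.
    if [ "$team_size" -ge "$team_floor" ]; then
      echo "kubernetes"
    else
      echo "gke-autopilot"
    fi
  elif [ "$util_pct" -gt "$util_threshold" ] && [ "$team_size" -ge "$team_floor" ]; then
    # Utilization crossover reached AND the team can carry the operational load.
    echo "kubernetes"
  else
    # No trigger fired: the default rung stands.
    echo "serverless-containers"
  fi
}

compute_rung 10 3 no    # typical idle service, small team
compute_rung 55 6 no    # sustained utilization above the crossover, larger team
compute_rung 10 2 yes   # capability trigger, but team below the floor
```

Under these assumptions the three calls print `serverless-containers`, `kubernetes`, and `gke-autopilot`. High utilization alone does not move a small team up a rung, which mirrors the workflow's rule that the team-size floor is hard.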

package/gsd-core/workflows/model-domain.md CHANGED

@@ -82,25 +82,24 @@ Reply with a number, or just tell me the corrections.
 For each candidate area, classify it and **capture the rationale**. Apply the misclassification check from the reference.
-For each area, use `AskUserQuestion` (header = the area name):
-- question: "Is *[area]* where you compete and win, something you need but isn't your edge, or a commodity every product has?"
-- options:
-  - "Core — we differentiate here" (→ build in-house, invest)
-  - "Supporting — needed, not our edge" (→ build simply / buy-and-extend)
-  - "Generic — commodity" (→ buy / off-the-shelf / library)
+**Confirm the area list first** (AskUserQuestion, header "Areas"): "These are the areas I see: [list]. What's missing, and is anything really two areas?"
+**Then propose all classifications at once** (draft-then-refine, like Step 3): one table — area · proposed type ("Core — we differentiate here" / "Supporting — needed, not our edge" / "Generic — commodity") · one-line rationale — asking (header "Subdomains"): "Which of these are wrong?" Run the checks and complexity signals below on every contested area and the claimed core — batching cuts question count, not rigor.
 **Apply these checks before finalizing each classification (state them to the user when they apply):**
 1. **Differentiation — not difficulty — decides Core.** If the user justifies Core by *difficulty, criticality, security, risk, or regulatory burden* rather than by competitive differentiation, test it: "Is this actually your competitive advantage, or a hard/critical-but-standard problem (e.g., tax, auth, encryption, compliance) you could buy?" If standard → **Generic (buy)**, not core. Critical ≠ differentiating; regulated ≠ differentiating.
-2. **CRUD that will grow.** Before accepting Generic/CRUD, ask: "Will this accumulate real business rules and invariants over time, or stay simple data-in/data-out?" If it will grow → mark it **emerging Supporting** (the default for growing areas), not generic. It is Core only if it is itself the competitive differentiator — and there is normally exactly one of those.
+2. **CRUD that will grow.** Whenever an area is *described* as CRUD/simple/"just forms and dates" — regardless of the type being claimed — ask: "Will this accumulate real business rules and invariants over time, or stay simple data-in/data-out?" If it will grow → mark it **emerging Supporting** (the default for growing areas), not generic. It is Core only if it is itself the competitive differentiator — and there is normally exactly one of those. Claiming Core *while* describing it as trivial is a contradiction — Core means differentiating AND complex; probe which half is wrong.
 3. **Generic ≠ low quality.** Note to the user that "generic" means *not differentiating*, not *low effort*.
-Record each subdomain's name, type, one-line description, rationale, and a rough complexity (low/medium/high). You should end with exactly **one** clearly-named core domain in most cases — if the user names many "core" areas, push back (anti-sprawl): "Which ONE is the real competitive core?"
+**Complexity is derived, never asked.** For each non-generic area, elicit 2–3 of the reference rubric's five signals (invariants; lifecycle depth; derivation/optimization; temporal logic; policy variance) — usually one question: "What rules can never be broken here, and what's the hardest decision this area makes?" — then rate per the rubric, recording fired signals in the rationale cell. **Tripwire: Core+low is a contradiction** — probe: "If it's your differentiator but has no complex rules, what makes it hard to copy?" — it's either not core, or not low. Generic+high is a buy-harder signal.
+Record each subdomain's name, type, one-line description, rationale, and the derived complexity (low/medium/high). You should end with exactly **one** clearly-named core domain in most cases — if the user names many "core" areas, push back (anti-sprawl): "Which ONE is the real competitive core?"
 For the single core domain only, capture in one line **what "winning" means** — the decision dimensions the core optimizes (e.g., "best match = price × reliability × lane-fit") — NOT the algorithm or any implementation. This sharpens the core for the architecture and planning phases.
 ## Step 5: Bounded contexts (optional — only if `--event-storming`)
-If `--event-storming` is NOT set: write in DOMAIN-MODEL.md "Bounded Contexts: deferred — single context assumed; planning will refine if boundaries emerge." Skip to Step 6.
+If `--event-storming` is NOT set: write in DOMAIN-MODEL.md "Bounded Contexts: deferred — single context assumed; planning will refine if boundaries emerge" (unless Step 6's candidate-boundary rule fires — then record the candidates instead). Skip to Step 6.
 If set, run a **Big-Picture** pass (timeline of events, not aggregates):
 1. Ask (AskUserQuestion or text list) for the major domain events: "What significant things *happen* in this system? (e.g., 'Order Placed', 'Payment Captured', 'Shipment Dispatched')". Collect 5–10.
@@ -117,8 +116,8 @@ If set, run a **Big-Picture** pass (timeline of events, not aggregates):
 Render `@~/.claude/gsd-core/templates/domain-model.md` (fill `[DATE]` with today's date and `[PROJECT_TITLE]` from PROJECT.md), filling:
 - **Ubiquitous Language** table (term, definition, used-by, aliases/confusions)
 - **Subdomains** table + the Core/Supporting/Generic groupings, each with rationale; note the misclassification check was applied
-- **Bounded Contexts** (filled or explicitly deferred)
-- **Notes for downstream phases** — one line for architecture (e.g., "Core 'X' is high-complexity → expect a richer domain model; rest is CRUD") plus deferred boundaries. **Any polyseme / language conflict flagged in Step 3 MUST appear here**, even when bounded contexts are deferred — it is a candidate context boundary.
+- **Bounded Contexts** (filled, candidate-recorded, or explicitly deferred). **Candidate-boundary rule (even without `--event-storming`):** a flagged polyseme OR third-party/legacy upstream vocabulary in the glossary is a *proven* boundary — don't merely defer it. Name the candidate contexts (one line each) and the seam relationship (default **ACL** against a third-party upstream), marked "candidate — refine in planning", noting `--event-storming` would formalize them. Recording only — never "single context assumed" next to a flagged boundary.
+- **Notes for downstream phases** — one line for architecture (e.g., "Core 'X' is high-complexity → expect a richer domain model; rest is CRUD") plus deferred boundaries. **Any polyseme / language conflict flagged in Step 3 MUST appear here**, even when bounded contexts are deferred.
 Write to `.planning/DOMAIN-MODEL.md`. **Do not include any architecture recommendation** — only the domain.
@@ -151,8 +150,9 @@ Next: /gsd:plan-phase   (planning will use the subdomain complexity to shape arc
 - **Strategic only.** Capture language + subdomains (+ optional contexts). Never prescribe architecture, never design aggregates — that is a later phase.
 - **Buy-vs-build is allowed; stacks are not.** Classifying a subdomain as buy / off-the-shelf / library is strategic and encouraged. Choosing architecture patterns, frameworks, or deployment topology is forbidden here — that is `recommend-architecture`.
 - Every subdomain classification MUST carry an explicit rationale, and the misclassification check (complex≠core, CRUD-that-grows, generic≠low-quality) MUST be applied.
+- Complexity is derived from elicited signals (reference rubric), never a free label; Core+low is a contradiction — challenge it.
 - Capture the team's language, not textbook definitions.
-- Anti-sprawl: aim for one clearly-named core domain; defer unclear context boundaries rather than inventing them.
+- Anti-sprawl: aim for one clearly-named core domain; defer unclear context boundaries rather than inventing them — but record glossary-proven boundaries as candidates.
 - Respect `commit_docs` and `response_language`.
 </critical_rules>
@@ -160,7 +160,7 @@ Next: /gsd:plan-phase   (planning will use the subdomain complexity to shape arc
 - Project context loaded (PROJECT.md/REQUIREMENTS.md) before questioning
 - Ubiquitous language captured (~8–15 terms) with definitions, usage, and conflicts
 - Every subdomain classified with rationale + misclassification check applied
-- Bounded contexts surfaced (with `--event-storming`) or explicitly deferred
+- Bounded contexts surfaced (with `--event-storming`), candidate-recorded, or explicitly deferred
 - No architecture prescribed
 - DOMAIN-MODEL.md written from the template and committed (when commit_docs is true)
 - User directed to /gsd:plan-phase
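
The complexity tripwires this change adds to Step 4 (Core+low is a contradiction, Generic+high is a buy-harder signal) amount to a lookup on the type/complexity pair. A minimal sketch, assuming a hypothetical helper named `classification_flag` that is not part of the package; the lowercase argument values and the output strings are illustrative only.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of gsd-core: flags the Step 4 type/complexity tripwires.
# Usage: classification_flag <core|supporting|generic> <low|medium|high>
classification_flag() {
  local type="$1" complexity="$2"
  case "${type}+${complexity}" in
    core+low)     echo "contradiction: either not core, or not low" ;;
    generic+high) echo "buy-harder signal" ;;
    *)            echo "ok" ;;
  esac
}

classification_flag core low
classification_flag generic high
classification_flag supporting medium
```

Every other combination passes through as `ok`. In the workflow itself, complexity is still derived from the elicited rubric signals before any such check would apply.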

package/gsd-core/workflows/plan-phase.md CHANGED

@@ -831,7 +831,7 @@ Pattern mapper prompt:
 <files_to_read>
 - {context_path} (USER DECISIONS from /gsd:discuss-phase)
-- **Every file listed under `## Canonical References` inside {context_path}** — these are MANDATORY ("Downstream agents MUST read these before planning"). When they include a DOMAIN-MODEL.md, an architecture ADR, or a TEST-STRATEGY.md, the plan MUST follow the domain model's subdomain classification, the architecture decision's per-subdomain rung, and the test strategy's per-subdomain test levels — and pull the test-infra references that TEST-STRATEGY links into the `@`-context of the relevant test tasks. Also read `.planning/DOMAIN-MODEL.md`, the latest `.planning/adr/*.md`, and `.planning/TEST-STRATEGY.md` directly if they exist — they are project-wide and always apply, even when not listed.
+- **Every file listed under `## Canonical References` inside {context_path}** — these are MANDATORY ("Downstream agents MUST read these before planning"). When they include a DOMAIN-MODEL.md, an architecture ADR, or a TEST-STRATEGY.md, the plan MUST follow the domain model's subdomain classification, the architecture decision's per-subdomain rung, and the test strategy's per-subdomain test levels — and pull the test-infra references that TEST-STRATEGY links into the `@`-context of the relevant test tasks. Also read `.planning/DOMAIN-MODEL.md`, the latest `.planning/adr/*.md`, `.planning/TEST-STRATEGY.md`, `.planning/INFRA-STRATEGY.md`, and `.planning/CICD-STRATEGY.md` directly if they exist — they are project-wide and always apply, even when not listed.
 - {research_path} (Technical Research)
 </files_to_read>
@@ -896,7 +896,7 @@ Planner prompt:
 - {roadmap_path} (Roadmap)
 - {requirements_path} (Requirements)
 - {context_path} (USER DECISIONS from /gsd:discuss-phase)
-- **Every file listed under `## Canonical References` inside {context_path}** — these are MANDATORY ("Downstream agents MUST read these before planning"). When they include a DOMAIN-MODEL.md, an architecture ADR, or a TEST-STRATEGY.md, the plan MUST follow the domain model's subdomain classification, the architecture decision's per-subdomain rung, and the test strategy's per-subdomain test levels — and pull the test-infra references that TEST-STRATEGY links into the `@`-context of the relevant test tasks. Also read `.planning/DOMAIN-MODEL.md`, the latest `.planning/adr/*.md`, and `.planning/TEST-STRATEGY.md` directly if they exist — they are project-wide and always apply, even when not listed.
+- **Every file listed under `## Canonical References` inside {context_path}** — these are MANDATORY ("Downstream agents MUST read these before planning"). When they include a DOMAIN-MODEL.md, an architecture ADR, or a TEST-STRATEGY.md, the plan MUST follow the domain model's subdomain classification, the architecture decision's per-subdomain rung, and the test strategy's per-subdomain test levels — and pull the test-infra references that TEST-STRATEGY links into the `@`-context of the relevant test tasks. Also read `.planning/DOMAIN-MODEL.md`, the latest `.planning/adr/*.md`, `.planning/TEST-STRATEGY.md`, `.planning/INFRA-STRATEGY.md`, and `.planning/CICD-STRATEGY.md` directly if they exist — they are project-wide and always apply, even when not listed.
 - {research_path} (Technical Research)
 - {PATTERNS_PATH} (Pattern Map — analog files and code excerpts, if exists)
 - {verification_path} (Verification Gaps - if --gaps)