npm - @therocketcode/gsd-core - Versions diffs - 1.7.5 → 1.8.0 - Mend

@therocketcode/gsd-core 1.7.5 → 1.8.0

Files changed (34)

package/.claude-plugin/plugin.json +1 -1
package/agents/gsd-plan-checker.md +2 -2
package/commands/gsd/cicd-strategy.md +67 -0
package/commands/gsd/discover-product.md +2 -2
package/commands/gsd/infrastructure-strategy.md +65 -0
package/gemini-extension.json +1 -1
package/gsd-core/references/ai-test-quality.md +85 -0
package/gsd-core/references/architecture-decision.md +10 -7
package/gsd-core/references/cicd-strategy.md +115 -0
package/gsd-core/references/contract-testing.md +9 -1
package/gsd-core/references/data-environments.md +89 -0
package/gsd-core/references/domain-modeling.md +14 -2
package/gsd-core/references/e2e-tiering.md +2 -2
package/gsd-core/references/infrastructure-strategy.md +91 -0
package/gsd-core/references/product-discovery.md +7 -7
package/gsd-core/references/test-doubles.md +88 -0
package/gsd-core/references/test-strategy.md +6 -5
package/gsd-core/templates/adr.md +21 -1
package/gsd-core/templates/cicd-strategy.md +72 -0
package/gsd-core/templates/domain-model.md +4 -2
package/gsd-core/templates/infra-strategy.md +77 -0
package/gsd-core/templates/product-brief.md +10 -8
package/gsd-core/templates/test-strategy.md +8 -0
package/gsd-core/workflows/add-tests.md +8 -3
package/gsd-core/workflows/cicd-strategy.md +152 -0
package/gsd-core/workflows/discover-product.md +13 -9
package/gsd-core/workflows/discuss-phase.md +1 -1
package/gsd-core/workflows/help/modes/full.md +2 -0
package/gsd-core/workflows/infrastructure-strategy.md +142 -0
package/gsd-core/workflows/model-domain.md +13 -13
package/gsd-core/workflows/plan-phase.md +2 -2
package/gsd-core/workflows/recommend-architecture.md +22 -8
package/gsd-core/workflows/testing-strategy.md +6 -4
package/package.json +1 -1

package/.claude-plugin/plugin.json CHANGED

@@ -1,7 +1,7 @@
 {
   "name": "gsd-core",
   "displayName": "GSD Core",
-  "version": "1.7.5",
+  "version": "1.8.0",
   "description": "GSD Core is a meta-prompting, context engineering, and spec-driven development system for AI coding agents.",
   "author": {
     "name": "TheRocketCodeMX",

package/agents/gsd-plan-checker.md CHANGED

@@ -71,13 +71,13 @@ This ensures verification checks that plans follow project-specific conventions.
 | `## Decisions` | LOCKED — plans MUST implement these exactly. Flag if contradicted. |
 | `## Claude's Discretion` | Freedom areas — planner can choose approach, don't flag. |
 | `## Deferred Ideas` | Out of scope — plans must NOT include these. Flag if present. |
-| `## Canonical References` | MUST-read docs (incl. any DOMAIN-MODEL.md / architecture ADR / TEST-STRATEGY.md). Read them; plans MUST follow them. |
+| `## Canonical References` | MUST-read docs (incl. any DOMAIN-MODEL.md / architecture ADR / TEST-STRATEGY.md / INFRA-STRATEGY.md / CICD-STRATEGY.md). Read them; plans MUST follow them. |
 If CONTEXT.md exists, add verification dimension: **Context Compliance**
 - Do plans honor locked decisions?
 - Are deferred ideas excluded?
 - Are discretion areas handled appropriately?
-- **Do plans honor the canonical discovery artifacts?** Flag a HIGH concern if a task contradicts the architecture ADR's per-subdomain rung (e.g. CRUD where a Domain Model is mandated), the DOMAIN-MODEL classification, or the TEST-STRATEGY's test levels (e.g. unit-mocking the DB where integration via Testcontainers is required, or float money where integer minor units are mandated).
+- **Do plans honor the canonical discovery artifacts?** Flag a HIGH concern if a task contradicts the architecture ADR's per-subdomain rung (e.g. CRUD where a Domain Model is mandated), the DOMAIN-MODEL classification, or the TEST-STRATEGY's test levels (e.g. unit-mocking the DB where integration via Testcontainers is required, or float money where integer minor units are mandated). Same for INFRA-STRATEGY/CICD-STRATEGY when present (e.g. committed .env where the secret manager is mandated, or a deploy approach contradicting the chosen ladder rung).
 </upstream_input>
 <core_principle>

package/commands/gsd/cicd-strategy.md ADDED

@@ -0,0 +1,67 @@
+---
+name: gsd:cicd-strategy
+description: Recommend a CI/CD strategy — CI platform, OIDC auth, test-tier pipeline stages, deploy ladder.
+argument-hint: "[--auto] [--text]"
+allowed-tools:
+  - Read
+  - Write
+  - Bash
+  - Glob
+  - Grep
+  - AskUserQuestion
+requires: [testing-strategy, plan-phase]
+---
+<objective>
+Decide WHERE CI runs, HOW it authenticates to the cloud, WHICH test tiers gate which pipeline stage, and HOW deploys promote — matched to the test strategy, the target infrastructure, and the team — and capture it so CI/deploy phases plan against a coherent pipeline.
+**Position in workflow:** `testing-strategy → cicd-strategy → plan-phase / execute-phase`
+**How it works:**
+1. Load TEST-STRATEGY.md (the tiers + smoke list) and INFRA-STRATEGY.md / ADR (the target cloud)
+2. Pick the CI platform — GitHub Actions by default; cloud-native CI only as a deliberate exception
+3. Set auth (OIDC with a pinned `sub` condition) and the secrets split (CI-scoped vs application)
+4. Map test tiers to stages: PR gate ≤10 min, merge-to-main, nightly + mutation; flaky quarantine policy
+5. Pick the deployment ladder rung (team size + blast radius) and the free supply-chain table stakes
+6. Write CICD-STRATEGY.md and commit
+**Output:** `.planning/CICD-STRATEGY.md` — platform + why, auth + secrets split, the stage map with time budgets, flaky policy, deployment ladder rung with promotion triggers, and the supply-chain checklist. Feeds plan-phase.
+</objective>
+<execution_context>
+@~/.claude/gsd-core/workflows/cicd-strategy.md
+@~/.claude/gsd-core/references/cicd-strategy.md
+@~/.claude/gsd-core/templates/cicd-strategy.md
+</execution_context>
+<runtime_note>
+**Copilot (VS Code):** Use `vscode_askquestions` wherever this workflow calls `AskUserQuestion`. They are equivalent.
+</runtime_note>
+<context>
+**Flags:**
+- `--auto` — Skip interactive questions; synthesize the strategy from TEST-STRATEGY / INFRA-STRATEGY using the consensus defaults (GHA, pinned-`sub` OIDC, ≤10-min PR gate, ladder rung from recorded team size).
+- `--text` — Use plain-text numbered lists instead of TUI menus (required for `/rc` remote sessions).
+**When to run:** after `/gsd:testing-strategy` (it consumes the test tiers and smoke list), before planning CI/deploy phases. Works without a TEST-STRATEGY too — it will suggest running it first, then proceed with generic tiers.
+Context files are resolved in-workflow during initialization.
+</context>
+<process>
+Execute end-to-end.
+**MANDATORY:** Read the workflow file BEFORE taking any action. It contains the full process: the GHA-default platform decision with the cloud-native exception (and its scripted pushbacks in both directions), pinned-`sub` OIDC and the secrets split, the tier→stage mapping with the hard ≤10-minute PR budget, the flaky quarantine canon, the merge-queue trigger, the deployment ladder with the staging/canary pushbacks, the free-six supply-chain table stakes vs the deferred ceremony, and the over/under-engineering meta-tell check. Do not improvise from the objective summary above. Never recommend bare "OIDC" without the pinned `sub` condition; never put application secrets anywhere but the cloud secret manager.
+</process>
+<success_criteria>
+- TEST-STRATEGY.md tiers + smoke list loaded (or generic tiers with the gap noted); team size + blast radius established
+- CI platform chosen with rationale; cloud-native CI only with a VPC/regulatory or compute-behind-GHA justification
+- Auth = OIDC with pinned `sub` (repo + branch/environment), or the documented short-lived fallback; secrets split recorded
+- Pipeline map set: PR gate ≤10 min (unit + fast integration + 3–7 smoke e2e), merge-to-main, nightly + mutation
+- Flaky policy (quarantine from gate, keep post-merge, no blanket retries) and merge-queue trigger recorded
+- Deployment ladder rung matched to team size + blast radius; promotion triggers recorded for deferred capabilities
+- Free-six supply-chain table stakes recommended; SLSA L3 / cosign / SBOM programs deferred
+- CICD-STRATEGY.md written and committed (when commit_docs is true)
+- User directed to /gsd:plan-phase
+</success_criteria>
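
A minimal sketch, not part of the published package, of how the PR-gate criterion above (≤10 min, 3–7 smoke e2e) could be checked mechanically. The `Stage` type, the tier names, and the per-tier minute estimates are hypothetical; only the two thresholds come from the success criteria.

```python
from dataclasses import dataclass, field

PR_GATE_BUDGET_MIN = 10      # hard PR-gate budget from the success criteria
SMOKE_E2E_RANGE = (3, 7)     # 3-7 smoke e2e tests allowed in the PR gate


@dataclass
class Stage:
    """Hypothetical description of one pipeline stage."""
    name: str
    tiers: dict[str, float] = field(default_factory=dict)  # tier -> estimated minutes
    smoke_e2e_count: int = 0

    def total_minutes(self) -> float:
        return sum(self.tiers.values())


def check_pr_gate(stage: Stage) -> list[str]:
    """Return the list of violations; an empty list means the gate fits."""
    problems = []
    if stage.total_minutes() > PR_GATE_BUDGET_MIN:
        problems.append(
            f"{stage.name}: {stage.total_minutes():g} min exceeds "
            f"the {PR_GATE_BUDGET_MIN}-min budget"
        )
    low, high = SMOKE_E2E_RANGE
    if not low <= stage.smoke_e2e_count <= high:
        problems.append(
            f"{stage.name}: {stage.smoke_e2e_count} smoke e2e tests, "
            f"expected {low}-{high}"
        )
    return problems


# Usage: one gate that fits the budget, one that does not.
ok_gate = Stage("pr-gate", {"unit": 3, "fast-integration": 4, "smoke-e2e": 2.5}, 5)
slow_gate = Stage("pr-gate", {"unit": 3, "full-e2e": 12}, 0)
```

Returning a list of violations rather than raising keeps the check usable as a report in review; a CI step could fail when the list is non-empty.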

package/commands/gsd/discover-product.md CHANGED

@@ -38,7 +38,7 @@ Define WHAT to build and WHY before building it — separating real demand from
 <context>
 **Flags:**
-- `--auto` — Skip the forcing interview; synthesize DISCOVERY.md from any existing PROJECT.md / REQUIREMENTS.md using recommended defaults.
+- `--auto` — Skip the forcing interview; synthesize PRODUCT-BRIEF.md from any existing PROJECT.md / REQUIREMENTS.md using recommended defaults.
 - `--text` — Use plain-text numbered lists instead of TUI menus (required for `/rc` remote sessions).
 **When to run:** when product value is uncertain (new market, no past-behavior evidence, demand asserted from a hypothetical, or a large/irreversible bet). **Skip** when a client/customer has explicit, evidenced requirements — then go straight to `/gsd:new-project` or lightweight prioritization. Runs standalone — does not require an existing project.
@@ -49,7 +49,7 @@ Context files (if any) are resolved in-workflow during initialization.
 <process>
 Execute end-to-end.
-**MANDATORY:** Read the workflow file BEFORE taking any action. The workflow contains the complete process including the optionality gate, the forcing interview (push past polished first answers), demand-vs-interest probing, the wedge, the four risks, and DISCOVERY.md generation. Do not improvise from the objective summary above. Ask about the PAST, never hypotheticals; frame the vision as an outcome/opportunity (it must admit more than one solution) so the domain and architecture stay open.
+**MANDATORY:** Read the workflow file BEFORE taking any action. The workflow contains the complete process including the optionality gate, the forcing interview (push past polished first answers), demand-vs-interest probing, the wedge, the four risks, and PRODUCT-BRIEF.md generation. Do not improvise from the objective summary above. Ask about the PAST, never hypotheticals; frame the vision as an outcome/opportunity (it must admit more than one solution) so the domain and architecture stay open.
 </process>
 <success_criteria>

package/commands/gsd/infrastructure-strategy.md ADDED

@@ -0,0 +1,65 @@
+---
+name: gsd:infrastructure-strategy
+description: Recommend an infrastructure strategy matched to the project — compute rung, data layer, floors.
+argument-hint: "[--auto] [--text]"
+allowed-tools:
+  - Read
+  - Write
+  - Bash
+  - Glob
+  - Grep
+  - AskUserQuestion
+requires: [recommend-architecture, testing-strategy, cicd-strategy, plan-phase]
+---
+<objective>
+Decide WHERE the system runs — which cloud, which compute rung per component, what data layer per environment, and the observability + IaC floors — matched to actual traffic shape, team size, and spend, and capture it so CI/CD and planning follow a coherent platform.
+**Position in workflow:** `recommend-architecture → testing-strategy → infrastructure-strategy → cicd-strategy / plan-phase`
+**How it works:**
+1. Load PRODUCT-BRIEF (scale expectations), the architecture ADR (topology), and TEST-STRATEGY (CI needs)
+2. Gather the three crossover inputs — traffic shape, team size, monthly compute spend — and pick the cloud
+3. Walk the compute ladder per component — serverless containers are the DEFAULT; every rung above needs a current, concrete trigger (the reference carries the quantified crossovers)
+4. Decide the data layer per environment, the observability floor (incl. a billing alert), and the IaC floor
+5. Apply the over-/under-engineering meta-tell both directions, write INFRA-STRATEGY.md, and commit
+**Output:** `.planning/INFRA-STRATEGY.md` — cloud + why, compute rung per component with promotion triggers, data layer per environment, environments map, observability checklist, IaC approach, and cost guardrails. Feeds cicd-strategy and plan-phase.
+</objective>
+<execution_context>
+@$HOME/.claude/gsd-core/workflows/infrastructure-strategy.md
+@~/.claude/gsd-core/references/infrastructure-strategy.md
+@~/.claude/gsd-core/templates/infra-strategy.md
+</execution_context>
+<runtime_note>
+**Copilot (VS Code):** Use `vscode_askquestions` wherever this workflow calls `AskUserQuestion`. They are equivalent.
+</runtime_note>
+<context>
+**Flags:**
+- `--auto` — Skip interactive questions; synthesize the strategy from PRODUCT-BRIEF / ADR / TEST-STRATEGY using the consensus defaults (serverless containers, managed Postgres, the day-one floors).
+- `--text` — Use plain-text numbered lists instead of TUI menus (required for `/rc` remote sessions).
+**When to run:** after `/gsd:recommend-architecture` and `/gsd:testing-strategy` (it consumes both), before `/gsd:cicd-strategy` and planning. Works without them too — it will ask briefly about topology and CI needs.
+Context files are resolved in-workflow during initialization.
+</context>
+<process>
+Execute end-to-end.
+**MANDATORY:** Read the workflow file BEFORE taking any action. It contains the full process: the compute decision ladder with quantified move-up triggers (the Fargate-vs-EC2 crossovers, the CAST AI utilization evidence, the <4-engineers Kubernetes floor, GKE Autopilot as the escape hatch), the per-cloud asymmetries (Fargate ≠ scale-to-zero; Cloud Run/Container Apps $0-idle dev), the scripted pushbacks for "we need Kubernetes" and "we'll just use a VM", the data-layer delegation to data-environments.md, the observability and IaC floors, and the meta-tell. Do not improvise from the objective summary above. The rung is an OUTPUT of evidence, never a platform you pick; serverless containers are the default and every rung above needs a current, concrete trigger.
+</process>
+<success_criteria>
+- PRODUCT-BRIEF / ADR / TEST-STRATEGY loaded where present; traffic shape, team size, and spend gathered
+- Cloud chosen by constraint or team familiarity, with the scale-to-zero asymmetry surfaced
+- Compute rung per component recorded with the concrete trigger that justifies anything above serverless containers, plus promotion triggers
+- Data layer decided per environment (pooling mandatory; crossover-watch metric recorded)
+- Observability floor (3–5 alerts incl. billing) and IaC floor confirmed; tracing/SLO deferred until >3 services in a request path
+- Over-/under-engineering meta-tell applied in both directions
+- INFRA-STRATEGY.md written and committed (when commit_docs is true)
+- User directed to /gsd:cicd-strategy
+</success_criteria>

package/gemini-extension.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "gsd-core",
-  "version": "1.7.5",
+  "version": "1.8.0",
   "description": "GSD Core — a meta-prompting, context engineering, and spec-driven development system for AI coding agents. Loads gsd's operating context into every Gemini CLI session.",
   "contextFileName": "GEMINI.md"
 }

package/gsd-core/references/ai-test-quality.md ADDED

@@ -0,0 +1,85 @@
+# AI-Written Tests — The Quality Contract
+How-to reference and enforcement contract for tests authored by a model (`add-tests`, TDD inside `execute-phase`). LLM test-writers have *known, named* failure modes: vacuous assertions, happy-path-only suites, mock-everything-then-assert-the-mock, change-detector tests (the implementation's current output copied back as the oracle), and weakening a failing assertion to go green. This contract converts test quality from judgment into mechanical checks — inventory-first, greppable forbidden patterns, a falsifiability gate, a mutation gate. Read before generating any test. Pairs with `test-strategy.md` and `test-doubles.md`.
+## A. Behavior inventory BEFORE any test is written
+For each public behavior in scope, enumerate — before writing a line of test code:
+- happy path(s);
+- boundary values: empty, zero, one, max, negative, off-by-one, rollover;
+- **every** error/rejection path the surface can produce;
+- illegal states / illegal transitions (state machines especially);
+- clock/concurrency cases wherever the strategy flags them.
+Tests map **1:1 to inventory rows** — every row gets a test; every test cites its row. A suite whose inventory is ≥80% happy-path rows is rejected at the plan-approval step, before generation. The inventory is the plan; "write some tests for this file" is not.
+## B. Forbidden patterns (greppable — check before running the suite)
+Reject any generated test matching these; the list is deliberately machine-checkable:
+- **Vacuous sole assertions:** `toBeDefined()`, `toBeTruthy()`, `not.toThrow()`, `expect(true).toBe(true)`, or `expect(result).toEqual(result)`-shaped self-comparison as a test's *only* assertion.
+- **Mocking the wrong seam:** `jest.mock` / `vi.mock` of an in-process collaborator. Doubles are legal only at the seams TEST-STRATEGY.md declares — its per-subdomain table is the **mockable-seam allow-list** (`test-doubles.md`).
+- **Testing the mock / asserting query interactions:** `toHaveBeenCalled*` on a stubbed *query*; interaction-verify only outbound commands (`test-doubles.md`). Configuring a double and asserting its canned value back is testing nothing.
+- **Mocking the SUT itself** (or its module) and asserting through the mock.
+- **Change-detector oracle:** an expected literal that was obtained by *running the SUT*, with no comment deriving it from the spec/requirement. If the only way to know the expected value is to execute the code, the test enshrines today's bugs.
+- **Snapshot-everything:** `toMatchSnapshot()` as the only assertion on logic output.
+- **Happy-path-only suites:** zero error-path or boundary tests for a surface that has error paths (caught structurally by the inventory, A).
+- **Copy-paste parametrization:** `it.each` / parametrized rows whose assertions don't actually differ per row.
+- **Evasions:** conditional assertions (`if (x) expect(…)`), `try/catch` that swallows the failure, `.skip` without a linked issue, `sleep(`/fixed waits, raw Faker output inside assertions (`realistic-test-data.md`).
+A quick pre-run sweep over the generated files:
+```bash
+grep -nE 'expect\(true\)|toEqual\(result\)|toMatchSnapshot\(\)|\.skip\(|sleep\(|toHaveBeenCalled' $TEST_FILES
+grep -nE '(jest|vi)\.mock\(' $TEST_FILES   # every hit must point at an allow-listed seam
+```
+Hits are not auto-failures — they are review-blocking until justified against this list.
+## C. Assertion-quality rules
+- Every test contains **≥1 specific-value assertion** on observable output or state.
+- Error-path tests assert the error **type** plus one discriminating property (code or message) — *and* that state did not change as a side effect of the rejection.
+- State-machine tests assert both the rejection **and** state preservation ("transition refused AND status still `active`").
+- One behavior per test; the test name states the behavior and expected outcome, not the method name.
+## D. Falsifiability gate — the RED-equivalent for test-after
+A generated test that passes on its first run is **unverified**, not done. The strategy's RED step exists so a test is *seen to fail*; for code that already exists, the equivalent is **prove it can fail**:
+1. Temporarily mutate the SUT — flip a branch condition, drop the write, return a constant.
+2. Re-run; observe the test go **red**.
+3. Revert the mutation; re-run; observe green.
+One extra run per test file; fully automatable. A test that cannot be made to fail by breaking the code under test is testing nothing — delete or rewrite it. **Never waive this step** because "the code already works"; that waiver is exactly where vacuous AI-written tests are born. (Where the mutation gate in E runs on the same files, a killed mutant covering the test's target behavior satisfies this gate.)
+## E. Mutation gate on the diff
+Run mutation testing (Stryker) **incrementally on changed files** as part of accepting a generated suite:
+- mutation score ≥ the strategy's floor (default **80**) on the gnarly-bit/critical modules touched;
+- every surviving mutant is either killed with a new/strengthened test or explicitly waived with a one-line reason.
+This is the only gate that catches vacuous suites *systemically*. Full-codebase mutation runs are too slow for this loop — schedule those nightly (see the strategy's CI execution map); changed-files incremental is the per-change gate.
+## F. Self-interrogation before declaring done
+In the test-plan output, the writer answers — naming the specific test that catches each:
+- Would this suite fail if the function returned a constant?
+- …if the DB write were silently dropped?
+- …if the boundary were off by one?
+- …if the error path threw the wrong error (or none)?
+Any unanswerable question is an inventory gap: go back to A and add the row. "Probably" is not an answer; a test name is.
+## Anti-patterns (summary)
+- Accepting a first-run-green suite without the falsifiability gate (D).
+- Writing tests file-by-file with no behavior inventory — yields happy-path mush.
+- Treating grep hits from B as style nits instead of review blockers.
+- Letting the model "fix" a red test by weakening the assertion instead of investigating which side is wrong.
+- Skipping the mutation gate because coverage looks high — coverage proves lines ran, not that assertions check anything.
+*Sources: Khorikov "Unit Testing Principles, Practices, and Patterns"; Google Testing Blog ("Change-Detector Tests Considered Harmful", "Test Behavior, Not Implementation"); Stryker Mutator docs (incremental mode); "Software Engineering at Google" ch. 12.*
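
A minimal sketch, not part of the published package, of the section B pre-run sweep expressed as a function that returns line-numbered hits instead of printing them. The two regular expressions are copied from the `grep` commands in the reference; the `sweep` function, the hit labels, and the sample test file are hypothetical.

```python
import re

# Same alternations as the two grep commands in section B.
FORBIDDEN = re.compile(
    r"expect\(true\)|toEqual\(result\)|toMatchSnapshot\(\)"
    r"|\.skip\(|sleep\(|toHaveBeenCalled"
)
MOCK = re.compile(r"(jest|vi)\.mock\(")


def sweep(source: str) -> list[tuple[int, str, str]]:
    """Return (line number, hit kind, stripped line) for every match."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for kind, pattern in (("forbidden-pattern", FORBIDDEN), ("mock-seam", MOCK)):
            if pattern.search(line):
                hits.append((lineno, kind, line.strip()))
    return hits


# Hypothetical generated test file used to exercise the sweep.
sample = """\
it('adds', () => {
  expect(true).toBe(true);
});
vi.mock('./repo');
it('rejects overdraft', () => {
  expect(() => withdraw(acct, 500)).toThrow(InsufficientFunds);
});
"""

for lineno, kind, text in sweep(sample):
    print(f"{lineno}: [{kind}] {text}")
```

As in the reference, a hit is a review blocker to be justified, not an automatic failure; the `mock-seam` hits still need a manual comparison against the allow-list in TEST-STRATEGY.md.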

package/gsd-core/references/architecture-decision.md CHANGED

@@ -19,22 +19,22 @@ Use the core subdomain's complexity from DOMAIN-MODEL. Apply per subdomain: the
 |------|--------------|-----------------------|
 | **Transaction Script / simple layered CRUD** (floor) | "validate → persist → return"; few rules; supporting/generic subdomains | — |
 | **Domain Model** | business rules multiply and tangle; the same invariant is duplicated across scripts; rich conditional behavior; long-lived core | rich aggregates wrapping what is really CRUD; anemic getter-bag "domain" objects |
-| **Hexagonal / Clean wrapper** (orthogonal — wraps either above) | real domain logic worth isolating; multiple/swappable adapters (DB, queue, 3rd-party APIs); high testability need; long lifespan | ports/interfaces with exactly one forever-implementation; DTO-mapping boilerplate around a CRUD endpoint |
+| **Hexagonal / Clean wrapper** (orthogonal — wraps either above) | a **current, concrete** second adapter or delivery mechanism (DB/queue/3rd-party swap, second front-end); or a genuinely pure core worth isolating for test speed | ports/interfaces with exactly one forever-implementation; DTO-mapping boilerplate around a CRUD endpoint; a wrapper claimed on lifespan or abstract "testability" alone |
 | **CQRS** | read and write models genuinely diverge; reads ≫ writes; write model under strain | separate read/write stacks where one model serves both fine |
 | **Event Sourcing** | audit/temporal history is a hard requirement (finance, compliance, "reconstruct past state") | ES on a simple entity with no audit/temporal need |
 ## Axis B — deployment topology
-- **Modular Monolith — the DEFAULT for greenfield.** One team; domain still being learned; few moving parts; fast to change. This is the recommended floor. Enforce internal module boundaries (separate schemas, dependency rules).
+- **Modular Monolith — the DEFAULT for greenfield.** One team; domain still being learned; few moving parts; fast to change. This is the recommended floor. Enforce internal module boundaries (separate schemas, dependency rules). Modules come from DOMAIN-MODEL: **modules = bounded contexts** when mapped, else subdomain groupings; flagged polysemes resolve to one owning module each; an **ACL applies now** to any third-party/legacy integration whose model differs from yours — not only at a future split.
 - **Microservices — only when ALL "you must be this tall" gates pass:**
   1. **Multiple independent teams** needing independent deploy cadence (Conway / Team Topologies).
   2. **CD / monitoring / DevOps maturity** already in place.
   3. **Bounded contexts well-understood** already (not still being discovered).
-  If **any** is "no" → recommend **modular monolith and stop**, regardless of complexity (deferred, not forbidden — record the promotion trigger; see *Evolving the topology* below). (The "microservice premium": below a complexity+org threshold the distributed tax is pure loss.)
-- **Component-level split (Hard Parts):** for a specific component, score the **6 disintegrators** (low cohesion · divergent volatility · divergent scalability · fault isolation · differential security · independent extensibility) against the **4 integrators** (ACID across the data · tightly-coupled workflow · heavy shared code · tight data relationships). Net disintegrators ≫ integrators → candidate extraction; otherwise keep it in the monolith.
+  If **any** is "no" → recommend **modular monolith and stop on the microservices question**, regardless of complexity — the per-component Hard-Parts scan below still runs when a single component shows divergent pressure (deferred, not forbidden — record the promotion trigger; see *Evolving the topology* below). (The "microservice premium": below a complexity+org threshold the distributed tax is pure loss.)
+- **Component-level split (Hard Parts):** for a specific component, score the **6 disintegrators** (low cohesion · divergent volatility · divergent scalability · fault isolation · differential security · independent extensibility) against the **4 integrators** (ACID across the data · tightly-coupled workflow · heavy shared code · tight data relationships). Net disintegrators ≫ integrators → extraction **candidate**: extract now only if the pressure is **current (not projected)** and the CD/ops gate passes — otherwise it becomes that component's promotion trigger. Integrators dominate → keep it in the monolith.
 - **Distributed monolith** (services that can't deploy independently) is the failure mode — you pay the premium and get none of the autonomy. Avoid.
-The "modular monolith and stop, regardless of complexity" rule is about *not splitting prematurely* — it is **not** "never split." It means the split is deferred until a gate flips, and the modular boundaries are built now so the split is cheap later (a **sacrificial / evolutionary** architecture). Record the **promotion trigger** — the concrete future signal (a second team forms, a component's scaling diverges, a bounded context stabilizes) that would justify revisiting Axis B.
+The "modular monolith — stop on the microservices question, regardless of complexity" rule is about *not splitting prematurely* — it is **not** "never split." It means the split is deferred until a gate flips, and the modular boundaries are built now so the split is cheap later (a **sacrificial / evolutionary** architecture). Record the **promotion trigger** — the concrete future signal (a second team forms, a component's scaling diverges, a bounded context stabilizes) that would justify revisiting Axis B.
 ## Evolving the topology — decomposition & migration (when a gate later flips)
@@ -59,8 +59,11 @@ The same tools run in reverse for a brownfield monolith you're decomposing — s
 | Availability / fault isolation | monolith | isolate failure-prone component |
 | Differential security | one trust boundary | separate the stricter-security component |
 | Integration count (adapters) | direct calls | Hexagonal ports & adapters |
-| Expected lifespan | short → keep simple (sacrificial) | long → Domain Model + Hexagonal + fitness functions |
+| Expected lifespan | short → keep simple (sacrificial) | long → Domain Model + fitness functions (Hexagonal only with a real second-adapter/delivery signal) |
 | Team count / ops maturity | 1 team / low → monolith | many independent teams / high → microservices viable |
+| Tenancy isolation (multi-tenant) | shared schema + tenant-scoped RLS (the default) | contractual/regulatory isolation mandate → schema-per-tenant → DB-per-tenant |
+| High-volume ingestion / pipeline | normal tables | decide the pipeline shape (buffer/queue, backpressure, retention) — the rung covers logic only |
+| Async work inside the monolith | direct in-process calls | in-process events / job queue (+ outbox once events must cross a process boundary) |
 ## Over- AND under-engineering
@@ -68,7 +71,7 @@ The same tools run in reverse for a brownfield monolith you're decomposing — s
 **Under-engineering tells:** the same invariant duplicated across many transaction scripts; a big-ball-of-mud monolith with no enforced module boundaries; a complex/regulated domain modeled as thin CRUD; no audit trail where compliance needs it; no ADRs / no fitness functions.
-**The meta-tell (use this to settle every rung):** if you cannot point to a **current, concrete** requirement — a real second adapter, a real divergent-scaling component, a real second team, a real audit mandate — that justifies a rung, you are **over-engineering**. If such a requirement exists and you ignored it, you are **under-engineering**.
+**The meta-tell (use this to settle every rung):** if you cannot point to a **current, concrete** requirement — a real second adapter or delivery mechanism, a real divergent-scaling component, a real second team, a real audit mandate, a real tenant-isolation mandate, a genuinely pure core isolated for test speed — that justifies a rung, you are **over-engineering**. If such a requirement exists and you ignored it, you are **under-engineering**.
 ## Default baseline (when in doubt)

package/gsd-core/references/cicd-strategy.md ADDED

@@ -0,0 +1,115 @@
+# CI/CD Strategy — Pipeline Follows the Test Strategy
+Reference for `/gsd:cicd-strategy`. Decides WHERE CI runs, HOW it authenticates to the cloud, WHICH test tiers gate which stage, and HOW deploys promote — matched to team size and blast radius. Consumes `TEST-STRATEGY.md` (the tiers) and `INFRA-STRATEGY.md` (the target cloud). Recommends; the user decides.
+## CI platform: GitHub Actions is the DEFAULT
+GitHub Actions has **41% organizational adoption** (62% personal — JetBrains State of Developer Ecosystem 2025, n=24,534), and the old "all-in on one cloud → use that cloud's CI" argument has collapsed: **AWS and Google both publish first-class GitHub Actions → their-cloud deployment paths**, including official OIDC federation docs (AWS Security Blog; Google's keyless Workload Identity Federation blog). AWS even quietly stopped onboarding CodeCommit customers in June 2024 (reversed Nov 2025) while recommending GitHub/GitLab. Even the cloud vendors don't assume cloud-native CI for cloud-native apps.
+**Cloud-native CI (Cloud Build / CodeBuild) is a deliberate EXCEPTION, justified only by:**
+- **VPC-isolated / regulated builds** — builds that must execute inside a private network or compliance boundary (Cloud Build private pools, CodeBuild in-VPC).
+- **Cheap compute behind GHA** — e.g., CodeBuild can host GitHub Actions runner jobs; GHA stays the orchestrator/ecosystem, the cloud supplies the metal.
+If neither applies, cloud-native CI buys a smaller ecosystem for no security gain — OIDC/WIF closed the in-project-credentials advantage.
+### Pricing anchors (official pages, verified 2026)
+| Platform | Free tier | Marginal cost |
+|---|---|---|
+| GitHub Actions | 2,000–3,000 min/mo private; public repos free | Linux x64 **$0.006/min** (arm64 $0.005, macOS $0.062); Jan 2026 cut "up to 39%" |
+| GCP Cloud Build | **2,500 min/mo free** | **$0.006/min**, per-second proration, queue time free |
+| AWS CodeBuild | 100 min/mo | general1.small $0.005/min |
+| Azure Pipelines | 1 hosted job (1,800 min/mo) | **$40/mo per parallel job**, unlimited minutes |
+| GitLab CI | 400 min/mo | $10/1,000 min; per-seat Premium $29 is the real cost driver |
+Runner rule of thumb: stay on hosted runners until you exceed the free tier plus low-hundreds of $/mo, or macOS/heavy-Docker dominates — then **managed third-party runners** (Depot/RunsOn class) before DIY self-hosted. Never self-hosted runners on public repos (GitHub: "almost never").
+## Auth: OIDC keyless is THE standard — with the pinned `sub` caveat
+Rare four-party unanimity: **GitHub** ("no cloud secrets… short-lived access token valid for a single job"), **Google** ("Workload Identity Federation is recommended over Service Account Keys"), **AWS** ("OIDC, recommended… temporary credentials"), and **Microsoft** (federation "eliminates the risk of leaking secrets") all say the same thing. Long-lived cloud keys in CI are empirically disqualified: the CircleCI Jan 2023 breach exfiltrated every stored CI secret via one infected laptop ("immediately rotate any and all secrets"); GitGuardian found 23.8M secrets leaked on public GitHub in 2024 with **70% still valid 2+ years later**; Unit 42 honeypots saw leaked AWS keys exploited in **~5 minutes**.
+**The MANDATORY caveat:** OIDC moves the risk from secret hygiene to **trust-policy hygiene**. Three independent security teams (Datadog Security Labs, Rezonate, Tinder Security Labs) found hundreds-to-**~1,500 cloud roles assumable by ANY GitHub repo** due to missing/wildcard `sub` conditions. The recommendation is always "**OIDC with a pinned `sub` condition (repo + branch/environment)**" — never bare "OIDC."
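A minimal sketch of the pinned-`sub` check described above, assuming an AWS-style IAM trust policy already loaded as a Python dict. The helper names `sub_is_pinned` and `_value_is_pinned` are hypothetical, and the sketch only inspects the plain `StringEquals` and `StringLike` operators; a real audit would also need to handle set-operator variants and other clouds' federation configs.

```python
# Condition key GitHub's OIDC provider uses for the subject claim in AWS trust policies.
SUB_KEY = "token.actions.githubusercontent.com:sub"


def _value_is_pinned(value: str, repo: str) -> bool:
    """A value is pinned if it names one repo plus a branch or environment, with no wildcards."""
    if "*" in value or "?" in value:
        return False
    prefix = f"repo:{repo}:"
    if not value.startswith(prefix):
        return False
    rest = value[len(prefix):]
    return rest.startswith(("ref:refs/heads/", "environment:"))


def sub_is_pinned(trust_policy: dict, repo: str) -> bool:
    """Return False if any Allow statement lacks a `sub` condition or uses a loose one."""
    statements = trust_policy.get("Statement", [])
    if isinstance(statements, dict):  # IAM allows a single statement object
        statements = [statements]
    for stmt in statements:
        if stmt.get("Effect") != "Allow":
            continue
        condition = stmt.get("Condition", {})
        values = []
        for operator in ("StringEquals", "StringLike"):
            raw = condition.get(operator, {}).get(SUB_KEY, [])
            values.extend([raw] if isinstance(raw, str) else raw)
        # No sub condition at all means any repository's token could match.
        if not values or not all(_value_is_pinned(v, repo) for v in values):
            return False
    return True
```

Run against exported role trust policies, a check like this flags both failure modes named above: a missing `sub` condition and a wildcard one such as `repo:acme/*`.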
+**Fallback:** static cloud keys in CI secrets are acceptable ONLY when the target genuinely cannot do federation (legacy/3rd-party SaaS); even then, keep them as short-lived as the target allows, scoped, and rotated.
+## The secrets split
+| Secret type | Lives where | Rule |
+|---|---|---|
+| Cloud deploy credentials | **Nowhere** — OIDC mints them per job | Pinned `sub`; zero long-lived keys |
+| CI-scoped secrets (e.g., a SaaS API token CI itself needs) | CI platform secrets | **ONLY when OIDC is unavailable** for that target; short-lived, scoped, rotated |
+| Application secrets | **ALWAYS the cloud secret manager** (Secret Manager / Secrets Manager) | Injected at **runtime** (native integration or API fetch); **never baked into images, never a committed `.env`** |
+Backing: 12factor config (repo open-sourceable without compromising credentials); OWASP ("never built-in [to the container], as this will leak the secret with the container definition"); GCP/AWS secret-manager best practices; GitHub's own docs position Actions secrets as small CI-scoped values (48 KB limit, imperfect log masking) and point to OIDC for cloud creds. Empirical: ~100k valid secrets in 15M public Docker images (GitGuardian); Unit 42's large-scale extortion campaign built on exposed `.env` files.
+## Test tiers → pipeline stages (consume TEST-STRATEGY.md)
+Mapping test size to pipeline stage is stated **policy** at Google (SWE at Google ch. 23: presubmit runs only fast, reliable small tests; large/slow tests deferred to postsubmit; release candidates get the full sweep), and the size↔flakiness link is **measured across 4.2M tests** ("larger tests are more flaky… test it in a different, smaller way").
+| Stage | What runs | Budget |
+|---|---|---|
+| **PR gate** | lint, types, **small (unit)** + fast **medium (in-process integration)** + the **3–7 persistent smoke e2e** from TEST-STRATEGY.md (happy paths only) | **≤10 min wall clock** — Continuous Delivery's commit stage ("ideally less than five minutes and no more than ten"); DORA: test feedback "in less than ten minutes" |
+| **Merge to main** | full medium suite + e2e subset against a real (preview/ephemeral) environment | minutes-to-tens-of-minutes |
+| **Nightly / pre-release** | **full e2e portfolio**, long-running suites, cross-browser/device, **mutation run** (Stryker on the critical modules) | unbounded |
+Tests in the PR gate must hold <1% flake rate or be quarantined out (Google: "as you approach 1% flakiness, the tests begin to lose value").
+### Flaky tests — the canon
+- **Quarantine from the PR gate but KEEP RUNNING post-merge, with a fix SLA** — Google ch. 23 + Dropbox Athena (the cleanest published implementation).
+- **Differentiated retries for diagnosis only** (same-process / time-shifted / different-host, to classify root cause) — GitHub Engineering cut flaky-failure impact 18x this way.
+- **NEVER blanket retry-until-green** — Fowler ("Eradicating Non-Determinism in Tests"): rerun-until-green destroys the signal.
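The quarantine rule can be sketched as a small classifier over per-test run history. This is an illustration under stated assumptions: `FlakeRecord`, `flake_rate`, and `split_pr_gate` are hypothetical names, and "flaky run" is assumed to mean a run that both failed and passed on identical code. Only the 1% threshold comes from the text above.

```python
from dataclasses import dataclass

FLAKE_THRESHOLD = 0.01  # the 1% line quoted above


@dataclass
class FlakeRecord:
    name: str
    runs: int        # total recorded runs
    flaky_runs: int  # runs that failed and then passed on identical code (assumed definition)


def flake_rate(record: FlakeRecord) -> float:
    """Fraction of runs that were flaky; a test with no history counts as 0.0."""
    return record.flaky_runs / record.runs if record.runs else 0.0


def split_pr_gate(records: list[FlakeRecord]) -> tuple[list[str], list[str]]:
    """Return (tests allowed in the PR gate, tests quarantined to post-merge)."""
    gate, quarantined = [], []
    for record in records:
        target = quarantined if flake_rate(record) >= FLAKE_THRESHOLD else gate
        target.append(record.name)
    return gate, quarantined
```

Quarantined tests are returned rather than dropped, so the post-merge stage can keep running them while the fix SLA is tracked.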
+### Merge queue trigger
+Enable a merge queue at roughly **tens of merges/day to one branch** — when "PR passed CI against a stale base" failures become routine. The math: Uber SubmitQueue (EuroSys 2019) showed **~40% chance of conflict-induced breakage at just 16 concurrent potentially-conflicting changes**. Commodity now: GitHub merge queue GA, GitLab merge trains. Below that volume it's pure latency.
+## The deployment ladder ("you must be this tall")
+The invariant at every rung (DORA + SRE Workbook + Charity Majors converge): **small frequent changes through one automated pipeline, fast trustworthy rollback, production observability — these beat pre-prod environment fidelity.** Build once; promote the same digest-pinned artifact with env-attached config (12factor build-release-run).
+| Rung | Capability justified |
+|---|---|
+| **Solo / small team, low blast radius** | Trunk-based + CI + one automated deploy path + **free platform PR previews** (Vercel/Netlify; Neon-style DB branch per preview if Postgres) + **one-command rollback**. **NO staging environment** — Majors: "trying to mirror your staging environment to production is a fool's errand"; staging catches only known-unknowns. |
+| **1–3 people, HIGH blast radius (payments/data)** | Add: **feature flags** for risky paths (internal-first exposure) + **revertable expand-contract schema changes** (Neon/PlanetScale-style reviewed deploy requests) + deliberate blue-keep-alive rollback window (AWS: 15–30 min). Still no canary analysis — insufficient traffic for signal. |
+| **~10 people** | Previews standard for every PR incl. backend; real flag system with hygiene (expiry dates — Knight Capital is the failure mode); DORA metrics; *manual* canary (one instance, watch dashboards). |
+| **~50 people / high traffic + risk** | Automated canary analysis (Argo/Flagger); shared staging is now actively failing you (Uber SLATE deprecates staging for tenancy-isolated test-in-prod). |
+**Canary ANALYSIS prerequisites (SRE Workbook ch. 16 — all required):** ~**a dozen trustworthy, low-variance SLI-derived metrics**, real traffic volume that yields signal on a 1–5% slice, and deploy frequency exceeding human attention — on top of repeatable builds and automated deploys. Below that: rolling deploy + health checks + one-command rollback. Use "the simplest model that meets your technical and business objectives." Plain blue-green is a "before/after canary" — risky because time is the largest source of metric variance.
+## Supply-chain table stakes (small team — all free, each ≤ hours, each counters a real 2023–25 attack)
+1. **SHA-pin all third-party actions + Dependabot updating the pins.** tj-actions/changed-files (Mar 2025, **CVE-2025-30066**): attacker retroactively moved version tags to a malicious secrets-dumping commit, 23,000+ repos hit — tag pinning gave zero protection. Dependabot updates SHA pins with version comments, so "pins go stale" is solved.
+2. **Committed lockfile + `npm ci`** (errors instead of mutating the lock) — counters the 2025 npm wave (chalk/debug compromise, Shai-Hulud worm: ~796 packages, ran TruffleHog on victims).
+3. **Top-level read-only `permissions:`** (`contents: read`) in every workflow + org read-only `GITHUB_TOKEN` default — in tj-actions and Shai-Hulud the blast radius was whatever the stolen token could do (OpenSSF Scorecard Token-Permissions).
+4. **OIDC federation, zero long-lived cloud keys in CI** (above) — counters the CircleCI 2023 breach class.
+5. **Push protection + secret scanning on; no `.env` in repo** — GitGuardian's 23.8M-leaked-secrets numbers.
+6. **Branch ruleset on main: require PR + status checks, block force-push** — GitHub rulesets + Scorecard Branch-Protection.
+Plus two free habits: dependency-review-action + a short cooldown on new dep versions; `npm publish --provenance` / artifact attestations if publishing.
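Item 1 of the list above can be checked mechanically. The sketch below scans workflow text for `uses:` references that are not pinned to a full 40-character commit SHA. It is a line-based approximation rather than a YAML parser, `unpinned_actions` is a hypothetical helper name, and the action names and SHA in any sample input are placeholders.

```python
import re

# Matches "uses: owner/repo@ref" with or without a leading list dash; stops before a trailing comment.
USES = re.compile(r"^\s*(?:-\s*)?uses:\s*([^\s#]+)")
FULL_SHA = re.compile(r"^[0-9a-f]{40}$")


def unpinned_actions(workflow_text: str) -> list[str]:
    """Return every `uses:` reference whose version is not a full commit SHA."""
    unpinned = []
    for line in workflow_text.splitlines():
        match = USES.match(line)
        if not match:
            continue
        ref = match.group(1).strip("'\"")
        if ref.startswith(("./", "docker://")):
            continue  # local actions and docker images are out of scope for this sketch
        _, _, version = ref.partition("@")
        if not FULL_SHA.match(version):
            unpinned.append(ref)
    return unpinned
```

A version comment after the SHA (for example `# v4`) is ignored by the pattern, which matches the comment style the text says Dependabot maintains.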
+**DEFER until bigger:** SLSA L3 (the spec itself: "usually requires significant changes to existing build platforms" — hosted runners already ≈ L1–L2), cosign-signing internal artifacts (ceremony without a verifier until artifacts cross trust boundaries), SBOM management programs (enable the free SPDX export, stop there), org-wide Scorecard dashboards, self-hosted runner fleets.
+## Anti-patterns
+| Anti-pattern | Why / best citation |
+|---|---|
+| Long-lived cloud keys in CI secrets | CircleCI 2023 breach: rotate-everything advisory, "use OIDC tokens wherever possible" |
+| Secrets/`.env` committed to repo | 28.65M secrets leaked on public GitHub in 2025; 70% still valid 2+ yrs later (GitGuardian) |
+| Secrets baked into images | OWASP: "never built-in… this will leak the secret with the container definition" |
+| Different artifact per environment | 12factor build-release-run; Humble & Farley "build once, deploy many" |
+| Manual prod deploys, no audit trail | DORA: manual steps increase time and error; deploy any version on demand |
+| Heavy e2e suite as PR gate | SWE at Google ch. 23 (presubmit = small fast tests only) + measured size↔flakiness (4.2M tests) |
+| Blanket retry-until-green on flakes | Fowler: rerun-until-green destroys signal; GitHub's differentiated-retry alternative |
+| Actions pinned to tags, not SHAs | tj-actions CVE-2025-30066: tags retroactively moved to malicious commit |
+| Default-write `GITHUB_TOKEN` | OpenSSF Scorecard Token-Permissions check |
+| `pull_request_target` + untrusted checkout | GitHub Security Lab "Preventing pwn requests" |
+| Self-hosted runners on public repos | GitHub: "should almost never be used" |
+| Force-push to main / no branch protection | GitHub rulesets docs + Scorecard Branch-Protection check |
+| OIDC with wildcard/missing `sub` condition | Datadog Security Labs: 275+ accounts with roles assumable by arbitrary repos |
+| High-fidelity staging as the safety strategy | Majors: "mirror staging to production is a fool's errand"; Uber deprecating staging (SLATE) |
+## Consumes / produces
+- **Consumes** `TEST-STRATEGY.md` (the tiers and the persistent smoke list → stage mapping) and `INFRA-STRATEGY.md` (target cloud → OIDC provider, secret manager, deploy target). If TEST-STRATEGY is absent, suggest `/gsd:testing-strategy` first; proceed with generic small/medium/large tiers if declined.
+- **Produces** `.planning/CICD-STRATEGY.md` — platform, auth, secrets split, pipeline map, flaky policy, ladder rung, supply-chain checklist. Feeds `plan-phase` (CI/deploy phases plan against it).

package/gsd-core/references/contract-testing.md CHANGED

@@ -50,7 +50,15 @@ Contract testing replaces the temptation to mock the 3rd party in an integration
 ## Schema/spec-based alternative
-When both sides share a spec (REST/gRPC/events) and you control them, schema-driven checks — OpenAPI/AsyncAPI validation, JSON Schema, protobuf backward-compat, Spring Cloud Contract — are a lighter alternative to full consumer-driven contracts.
+When both sides share a spec (REST/gRPC/events) — whether you control both, or the provider won't run your verification — schema-driven checks (OpenAPI/AsyncAPI validation, JSON Schema, protobuf backward-compat, Spring Cloud Contract) are a lighter alternative to full consumer-driven contracts.
+## When the provider won't verify (true 3rd parties — Stripe/Samsara-class)
+A paid vendor will never run your pacts — yet mocking them is still anti-pattern #1. Fall back to a pinned, observable contract:
+- **Pin the contract:** validate your client against the vendor's published OpenAPI/JSON Schema, or against **recorded real responses** replayed at your boundary (refreshed deliberately, never silently).
+- **One thin, contract-verified adapter:** all vendor knowledge behind a single adapter tested against the pinned contract; the rest of the codebase fakes the *port*, not the vendor (`test-doubles.md`).
+- **Scheduled live smoke** against a sandbox account — non-blocking, with drift alerting. Catches "the vendor changed" without flaking every PR.
 ## Anti-patterns

package/gsd-core/references/data-environments.md ADDED

@@ -0,0 +1,89 @@
+# Data Layer, Secrets & Environments — Decision Reference
+Reference for `/gsd:infrastructure-strategy` (data-layer step): Postgres hosting, database-per-environment, non-prod data, migrations, secrets. Recommends; the user decides. Pairs with `test-containers.md` and `db-test-isolation.md`.
+## Postgres hosting: serverless/branching vs dedicated
+**Serverless carries a 1.5–4× premium per capacity-hour. It wins only when idle most of the time.** 24/7 at 1 CU (1 vCPU/4 GB): Neon ~$77/mo (Launch) to ~$162/mo (Scale) vs ~$47–51/mo dedicated (RDS db.t4g.medium / Cloud SQL equiv). Storage widens it: ~$0.35/GB-mo vs ~$0.115–0.17 (**2–3× premium**). AWS's own blog concedes provisioned beats Aurora Serverless v2 at steady load.
+**The inverse holds:** a dev/preview DB active ~2 h/day is **~8× cheaper** serverless (~$6/mo vs ~$50/mo always-on). Branching serverless is the correct *default for non-prod* and for pre-traffic MVPs.
+| Stage | Recommend | Why |
+|---|---|---|
+| MVP / pre-traffic | Neon-class serverless (free/Launch tier) | Scale-to-zero; **branch-per-PR on day one** — free behavior you can't cheaply retrofit |
+| Growing, spiky | Stay serverless while duty cycle <~50% (or avg utilization <~30%) | Premium only bites at sustained load; watch the two-month trend |
+| Sustained steady load | Dedicated vanilla prod (RDS/Cloud SQL, RI/CUD pricing) + keep a serverless dev/preview twin | You keep branching without paying the 24/7 premium (the Dispatch pattern: Aurora prod, DMS-synced Neon branches per preview) |
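The arithmetic behind these thresholds can be reproduced from the figures quoted above. This is a worked sketch under simplifying assumptions: serverless compute is billed linearly with active time and costs nothing while idle, storage is ignored, and dedicated is rounded to $50/mo. The function names are hypothetical.

```python
def serverless_monthly_cost(full_time_price: float, active_hours_per_day: float) -> float:
    """Monthly compute cost when billed only while active, scaled from the 24/7 price."""
    return full_time_price * active_hours_per_day / 24


def break_even_duty_cycle(full_time_price: float, dedicated_price: float) -> float:
    """Fraction of the day above which an always-on dedicated instance is cheaper."""
    return dedicated_price / full_time_price


NEON_LAUNCH_24X7 = 77.0   # $/mo at 1 CU, from the text above
NEON_SCALE_24X7 = 162.0   # $/mo at 1 CU, from the text above
DEDICATED = 50.0          # $/mo, rounded from the ~$47-51 range above

print(round(serverless_monthly_cost(NEON_LAUNCH_24X7, 2), 2))       # 6.42
print(round(break_even_duty_cycle(NEON_LAUNCH_24X7, DEDICATED), 2))  # 0.65
print(round(break_even_duty_cycle(NEON_SCALE_24X7, DEDICATED), 2))   # 0.31
```

Under these assumptions the 2 h/day preview database lands near the ~$6/mo figure, and the break-even points fall at roughly 65% (Launch) and 31% (Scale) duty cycle, in line with the thresholds used in this section.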
+**Migrate OFF serverless when any two hold (then plan the move):**
+1. Compute >~50–65% duty cycle at stable size for **2+ consecutive months**
+2. Autoscaler curve is flat — min ≈ max ≈ steady load (you're paying premium for elasticity you don't use)
+3. Storage in the 100s of GB (the 2–3×/GB premium becomes material)
+4. Cold starts ever hit a **user-facing** path (Neon ~500 ms resume is preview-tolerable; Aurora SLv2's ~15 s is not)
+**Connection limits — pooled endpoint, ALWAYS, with serverless app compute.** Postgres = one OS process per connection; serverless/edge functions open one per invocation. A small serverless instance allows only ~104 direct connections at 0.25 CU — **connection storms are the classic serverless-Postgres outage**. App traffic goes through the pooler (PgBouncer/Supavisor/RDS Proxy); the direct endpoint is reserved for migrations, pg_dump, logical replication.
+## Database-per-environment
+| Environment | Setup |
+|---|---|
+| **Production** | Dedicated, **vanilla engine** (RDS/Cloud SQL — not a fork), least-privilege roles, private endpoint |
+| **Staging** | Same engine + major version, **sized down**; fewer replicas is fine; own isolated DB |
+| **Preview (per-PR)** | Copy-on-write branch (Neon-class) or per-PR small instance, auto-created/destroyed by CI; migrations run on every branch |
+| **Dev / test** | Branch per developer, or local Testcontainers pinned to prod's major version (see `test-containers.md`) |
+**The parity rule (12-factor, original + 2024 official revision — unchanged):** what must match across envs is engine **type + major version**, the extension list you actually use, the **migration history** (same ordered migrations everywhere), and the config mechanism shape. What may differ: instance size, replica count, HA topology, data volume/realism, **and vendor**.
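The parity rule lends itself to an automated comparison. The sketch below assumes each environment can be summarized by a small descriptor. `DbEnv` and `parity_violations` are hypothetical names, and "same ordered migrations" is interpreted here as one history being a prefix of the other, so that a preview branch may run ahead of prod. The config-mechanism check is omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DbEnv:
    name: str
    engine: str
    major_version: int
    extensions: frozenset  # extensions the application actually uses
    migrations: tuple      # ordered migration ids
    instance_size: str     # allowed to differ, so never compared


def parity_violations(prod: DbEnv, other: DbEnv) -> list[str]:
    """List the ways `other` breaks parity with `prod`; an empty list means parity holds."""
    problems = []
    if other.engine != prod.engine:
        problems.append(f"engine {other.engine} != {prod.engine}")
    if other.major_version != prod.major_version:
        problems.append(f"major version {other.major_version} != {prod.major_version}")
    missing = prod.extensions - other.extensions
    if missing:
        problems.append(f"missing extensions: {sorted(missing)}")
    # Histories must agree on their shared prefix; one side may simply be ahead.
    shared = min(len(prod.migrations), len(other.migrations))
    if prod.migrations[:shared] != other.migrations[:shared]:
        problems.append("migration history diverged")
    return problems
```

Instance size, replica count, and vendor are deliberately absent from the comparison, mirroring the "what may differ" list.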
+**Nuance worth teaching:** Neon runs the unmodified Postgres query engine (the fork is at the storage layer), so vanilla-prod + Neon-dev satisfies "same type and version" — it is *more* parity-faithful than Aurora-prod with anything in dev, because **Aurora/AlloyDB are themselves non-vanilla forks** that lag upstream majors. Checklist, not blocker: diff the serverless vendor's extension list and compatibility caveats (no superuser, unlogged tables) against your dependencies.
+## Non-prod data
+**A masked prod clone is still personal data under GDPR** (EDPB Guidelines 01/2025: pseudonymization does not exit scope; only irreversible anonymization does). Therefore:
+- **Default: synthetic seed** — schema-only branches + factory-generated data (see `@~/.claude/gsd-core/references/realistic-test-data.md`).
+- **When realism is needed: irreversible anonymization at branch/clone time** — `postgresql_anonymizer` (Neon's anonymized branches use it) or Greenmask.
+- **Raw production PII never lands in dev/preview/staging.** No exceptions for "it's just staging."
+## Migration discipline
+- **Expand–contract (parallel change):** a breaking schema change is never zero-downtime in one deploy — expand + backfill first, contract later.
+- **Destructive steps (drop column/table) go in a separate, later deploy** than the code that stops using them.
+- **Lint migrations** in CI: strong_migrations (Rails) or squawk (any stack, SQL-level) catch unsafe locks; always set `lock_timeout`/`statement_timeout` (adding one unguarded FK takes a write-blocking `SHARE ROW EXCLUSIVE` lock on both tables, and locks like that have caused real outages).
+- **Run the forward migration on every preview branch** — validates it against prod-shaped schema before it ever touches prod.
+- **Separate roles:** migrator (DDL) ≠ app (DML-only) ≠ admin. The app user never holds DROP/TRUNCATE — and **never superuser** (managed providers refuse true superuser anyway; code assuming it breaks in prod).
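Two of the rules above (always set a lock timeout; keep destructive steps out of the expand migration) can be approximated with a toy text check. This sketch is not a substitute for strong_migrations or squawk. It uses regular expressions rather than a SQL parser, `lint_migration` is a hypothetical name, and forms such as `ALTER TABLE t DROP x` without the `COLUMN` keyword are not detected.

```python
import re

DESTRUCTIVE = re.compile(r"\bDROP\s+(?:COLUMN|TABLE)\b", re.IGNORECASE)
ADDITIVE = re.compile(r"\b(?:ADD\s+COLUMN|CREATE\s+TABLE)\b", re.IGNORECASE)
LOCK_TIMEOUT = re.compile(r"\bSET\s+(?:LOCAL\s+)?lock_timeout\b", re.IGNORECASE)


def lint_migration(sql: str) -> list[str]:
    """Return warnings for a single migration file's SQL text."""
    warnings = []
    if not LOCK_TIMEOUT.search(sql):
        warnings.append("missing lock_timeout")
    # Expand and contract belong in separate deploys, so flag files that do both.
    if DESTRUCTIVE.search(sql) and ADDITIVE.search(sql):
        warnings.append("destructive and additive steps in one migration")
    return warnings
```

Wiring a check like this into the PR gate gives fast feedback, while the dedicated linters remain the authority on lock safety.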
+## Secrets — the floor for any team (~$0–15/mo)
+**Every major secrets breach indicts secrets *copied* (into files, repos, scripts, CI config) and *long-lived*. None was a firewall failure.** The record: **28.65M secrets leaked on public GitHub in 2025** (GitGuardian); a mass-extortion campaign harvested exposed **`.env` files across 110,000 domains** (Unit 42); Uber, CircleCI, and LastPass were all long-lived-credential compromises.
+The floor — non-negotiable at any team size:
+1. **Cloud secret manager as single source of truth** (SSM Parameter Store standard tier is $0; Secrets Manager/GCP SM ≈ $10–15/mo at app scale)
+2. **Runtime references, never copies** — Cloud Run `--update-secrets` refs, ECS `valueFrom` SSM/SM ARNs, K8s External Secrets Operator. Secrets never appear in images, task defs, repos, or CI config.
+3. **CI via OIDC** — short-lived job-scoped cloud tokens; zero long-lived deploy keys (kills the CircleCI failure mode).
+4. **Local dev via CLI injection** — `doppler run -- npm run dev`, `op run`, or a wrapper fetching from the secret manager with the dev's own IAM identity; `direnv` may *trigger* the fetch but `.envrc` never contains values. **Never a committed `.env`.** Dev gets dev-scoped creds only — prod creds never on laptops.
+| Option | Cost | Pick when |
+|---|---|---|
+| **Cloud-native SM** (SSM/Secrets Manager/GCP SM) | $0–15/mo | Single cloud — the default; full audit trail |
+| **Doppler / Infisical** | free dev tier; $12–18/user-or-identity/mo | Multi-cloud or DX-first teams (watch Infisical's per-machine-identity billing) |
+| **SOPS+age / sealed-secrets** | $0 | GitOps-heavy, few humans — eyes open: **no per-read audit trail**, leaked key decrypts all history |
+| **Vault** | self-hosted clusters or enterprise contract | Enterprise ceremony now (BSL, IBM-owned; cheap SaaS on-ramp sunsetted) — earns it only for dynamic creds at scale / strict compliance. OpenBao is the OSS fork |
+Everything beyond the floor (HSMs, dynamic DB creds, Vault clusters) is scale/compliance-driven, not security baseline.
+## Anti-patterns
+- **Shared mutable staging DB** — contention; one bad migration blocks every team. Fix: branch-per-PR.
+- **Prod creds on laptops** — Uber/CircleCI/LastPass; one endpoint compromise = full prod compromise.
+- **Secrets echoed into CI logs** — masking is best-effort exact-match; supply-chain payloads (CVE-2025-30066) dumped runner secrets into *public* logs past the redactor. Fix: OIDC + minimal job-scoped secrets.
+- **Committed `.env` / drifting `.env.example`** — the 110k-domain harvest target. No value-bearing env files in repos at all.
+- **Superuser/admin app connections** — app role is DML-only; assume no superuser exists.
+- **Serverless compute without a pooler** — connection storm at the first traffic spike.
+- **"Masked = anonymous" GDPR mistake** — pseudonymized clones remain in scope (EDPB 01/2025); anonymize irreversibly or seed synthetically.
+- **Paying for scale-to-zero you never get** — persistent connections/RDS Proxy defeat Aurora auto-pause.
+## Consumes / produces
+- **Read by** `/gsd:infrastructure-strategy` (data-layer step) — hosting, environments, and secrets decisions feed the infrastructure ADR.
+- **Read alongside** `@~/.claude/gsd-core/references/test-containers.md` and `@~/.claude/gsd-core/references/db-test-isolation.md` when designing test infrastructure — the parity rule (engine type + version) and the synthetic-seed default apply to test DBs too.
+*Sources: vendor pricing pages (Neon/AWS/GCP, 2026 — spot-check JS-rendered AWS/GCP rates), AWS Database Blog (against-interest), cloudonaut & Jeremy Daly break-evens, 12factor.net + official revision, EDPB 01/2025, GitGuardian Sprawl 2026, Unit 42, CircleCI/Uber/LastPass incident reports, Fowler ParallelChange, strong_migrations/squawk docs.*

package/gsd-core/references/domain-modeling.md CHANGED

@@ -37,7 +37,19 @@ Classify each area of the system. This is the single highest-leverage output —
 1. **Core = differentiating AND complex — not just complex (and not merely *critical* or *regulated*).** A complex-, critical-, or regulated-but-*generic* subdomain (tax, identity/auth, encryption, compliance) is a **buy**, not a build. Difficulty, security-criticality, and regulatory burden do not make something core — only competitive differentiation does. Don't invest core-grade effort there.
 2. **Generic ≠ low quality.** "Generic" means *not differentiating*, not *low effort*. A battle-tested auth library is high-quality and generic.
-3. **Watch for CRUD that will grow business rules.** The dangerous case is an app that "starts as a UI over the database, then evolves into real domain logic." Before classifying something Generic/CRUD, probe future features: *will this accumulate invariants and rules?* If yes, treat it as (emerging) core/supporting, not generic.
+3. **Watch for CRUD that will grow business rules.** The dangerous case is an app that "starts as a UI over the database, then evolves into real domain logic." Whenever an area is *described* as simple/CRUD — whatever type is being claimed — probe future features: *will this accumulate invariants and rules?* If yes, treat it as (emerging) core/supporting, not generic. "Core but just CRUD" is self-contradictory — challenge which half is wrong.
+### What "complex" means — the complexity-signals rubric
+Complexity is **derived from observable signals, never asked as a free label** — domain experts understate their own domain; "honestly just CRUD" is how founders describe rule-rich cores. Elicit 2–3 signals per non-generic area, then rate:
+1. **Business invariants/rules** — non-trivial rules that must never be violated.
+2. **Lifecycle/state depth** — the central thing moves through states with restricted transitions.
+3. **Derivation/optimization logic** — outputs *computed* from competing factors (scoring, tradeoffs, allocation), not looked up.
+4. **Temporal/scheduling logic** — deadlines, windows, grace periods, retroactivity.
+5. **Policy/variant proliferation** — rules differ by customer, class, or configuration.
+**Rating:** 0–1 signals → low · 2–3 → medium · ≥3 with signal 3 present → high. Record fired signals in the rationale — the evidence *is* the rating. **Consistency tripwire:** Core+low is a contradiction (Core *means* differentiating and complex) — probe: "if it's your differentiator but has no complex rules, what makes it hard to copy?" Generic+high is a buy-harder signal.
 ### Anti-sprawl rule
@@ -51,7 +63,7 @@ Only when the domain is non-trivial. A **Big-Picture event storming** pass surfa
 2. For each: *who triggers it? who reacts? what decision follows?*
 3. Group events by actor/responsibility → each cluster is a candidate **bounded context**.
-Boundaries often fall where the **language changes** (the same word means different things) or where the **rate of change** differs. If boundaries are unclear, **defer** them — say so explicitly and let planning refine them.
+Boundaries often fall where the **language changes** (the same word means different things) or where the **rate of change** differs. If boundaries are unclear, **defer** them — say so explicitly and let planning refine them. Deferring a boundary your own glossary proved (polyseme, third-party vocabulary) is discarding findings — record it as a candidate.
 A **Process-level** pass is the optional middle gear between Big-Picture and Design-level storming: take one important event flow (e.g., "Order → Payment → Fulfillment") and walk its commands, policies ("whenever X, then Y"), and read-models. It sharpens *one* boundary and its hand-offs without dropping to aggregates. Use it only when a single flow's boundary is genuinely contested; otherwise stay Big-Picture. Do **not** run design-level (aggregate-level) event storming here — that is tactical and belongs to a core subdomain you've already identified.