npm - @therocketcode/gsd-core - Versions diffs - 1.9.0 → 1.10.0 - Mend

@therocketcode/gsd-core 1.9.0 → 1.10.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (15) hide show

package/.claude-plugin/plugin.json +1 -1
package/agents/gsd-code-reviewer.md +1 -1
package/agents/gsd-plan-checker.md +1 -1
package/agents/gsd-verifier.md +3 -1
package/gemini-extension.json +1 -1
package/gsd-core/references/ai-test-quality.md +5 -1
package/gsd-core/references/architecture-decision.md +19 -3
package/gsd-core/references/engineering-standards.md +1 -1
package/gsd-core/references/test-strategy.md +7 -1
package/gsd-core/workflows/execute-phase.md +1 -0
package/gsd-core/workflows/infrastructure-strategy.md +2 -0
package/gsd-core/workflows/plan-phase.md +1 -0
package/gsd-core/workflows/recommend-architecture.md +3 -1
package/gsd-core/workflows/testing-strategy.md +2 -0
package/package.json +1 -1

package/.claude-plugin/plugin.json CHANGED Viewed

@@ -1,7 +1,7 @@
 {
   "name": "gsd-core",
   "displayName": "GSD Core",
-  "version": "1.9.0",
+  "version": "1.10.0",
   "description": "GSD Core is a meta-prompting, context engineering, and spec-driven development system for AI coding agents.",
   "author": {
     "name": "TheRocketCodeMX",

package/agents/gsd-code-reviewer.md CHANGED Viewed

@@ -60,7 +60,7 @@ This ensures project-specific patterns, conventions, and best practices are appl
 **3. Code Quality** — Dead code, unused imports/variables, poor naming conventions, missing error handling, inconsistent patterns, overly complex functions (high cyclomatic complexity), code duplication, magic numbers, commented-out code
 **4. Contract Conformance (enforce `engineering-standards.md` BOTH ways — you are the fresh-context adversary)** — Fit to the ADR's chosen rung for this subdomain is itself a defect class. Never reward "simpler is better"; the bar is "fits the architecture's chosen rung, executed fully." Hunt in both directions and classify as BLOCKER when behavior/architecture is wrong, WARNING otherwise:
-- **Reward-hacking / under-engineering:** hardcoded expected outputs; happy-path-only handling (missing edge cases/failure modes); a test weakened, skipped, deleted, or made trivially-passing (e.g. `assert True`, removed assertion, rewritten `expected`); logic the codebase already implements re-written instead of reused; CRUD / transaction-script or patching *around* a mandated abstraction where the ADR mandates a richer rung (Domain Model / ports / aggregates / CQRS) — under-built against the architecture.
+- **Reward-hacking / under-engineering:** hardcoded expected outputs; happy-path-only handling (missing edge cases/failure modes); a test weakened, skipped, deleted, or made trivially-passing (e.g. `assert True`, removed assertion, rewritten `expected`); logic the codebase already implements re-written instead of reused; CRUD / transaction-script or patching *around* a mandated abstraction where the ADR mandates a richer rung (Domain Model / ports / aggregates / CQRS) — under-built against the architecture; **or skipping the universal floor** — no seam at the true external boundaries (DB/3rd-party reached into from everywhere, untestable without the real services), a defect *even on a simple Transaction-Script subdomain* (the floor is the cheap testable baseline, not optional ceremony).
 - **Over-engineering:** ports / aggregates / CQRS / speculative layers, options, or config the ADR did NOT mandate for this subdomain; an abstraction invented on the first or second similarity (the *wrong* abstraction — prefer duplication until the shape is obvious). Note: a mandated port/aggregate is NOT speculative — its absence is the defect, not its presence.
 To judge the rung, read the subdomain's classification and ADR (`DOMAIN-MODEL.md`, the architecture ADR) when present; absent those, fall back to "matches existing patterns + no hacks + complete."

package/agents/gsd-plan-checker.md CHANGED Viewed

@@ -78,7 +78,7 @@ If CONTEXT.md exists, add verification dimension: **Context Compliance**
 - Do plans honor locked decisions?
 - Are deferred ideas excluded?
 - Are discretion areas handled appropriately?
-- **Do plans honor the canonical discovery artifacts?** Flag a HIGH concern if a task contradicts the architecture ADR's per-subdomain rung (e.g. CRUD where a Domain Model is mandated), the DOMAIN-MODEL classification, or the TEST-STRATEGY's test levels (e.g. unit-mocking the DB where integration via Testcontainers is required, or float money where integer minor units are mandated). Same for INFRA-STRATEGY/CICD-STRATEGY when present (e.g. committed .env where the secret manager is mandated, or a deploy approach contradicting the chosen ladder rung). Per `engineering-standards.md` this gate is **symmetric** — flag HIGH in BOTH directions, never bias toward "simpler": (a) a plan that bakes in a hack/shortcut to pass a gate (hardcoded expected output, weakened/skipped test, "make it pass"); (b) **under-engineering** — CRUD/transaction-script, or patching around a mandated abstraction, where the ADR mandates a richer rung (Domain Model / ports / aggregates / CQRS); (c) **over-engineering** — adding ports/aggregates/CQRS/speculative layers the ADR did NOT mandate for that subdomain.
+- **Do plans honor the canonical discovery artifacts?** Flag a HIGH concern if a task contradicts the architecture ADR's per-subdomain rung (e.g. CRUD where a Domain Model is mandated), the DOMAIN-MODEL classification, or the TEST-STRATEGY's test levels (e.g. unit-mocking the DB where integration via Testcontainers is required, or float money where integer minor units are mandated). Same for INFRA-STRATEGY/CICD-STRATEGY when present (e.g. committed .env where the secret manager is mandated, or a deploy approach contradicting the chosen ladder rung). Per `engineering-standards.md` this gate is **symmetric** — flag HIGH in BOTH directions, never bias toward "simpler": (a) a plan that bakes in a hack/shortcut to pass a gate (hardcoded expected output, weakened/skipped test, "make it pass"); (b) **under-engineering** — CRUD/transaction-script or patching around a mandated abstraction where the ADR mandates a richer rung (Domain Model / ports / aggregates / CQRS); **or skipping the universal floor** — no seam at the true external boundaries (DB/3rd-party reached into from everywhere, untestable without the real services), even on a simple Transaction-Script subdomain; (c) **over-engineering** — adding ports/aggregates/CQRS/speculative layers the ADR did NOT mandate for that subdomain.
 </upstream_input>
 <core_principle>

package/agents/gsd-verifier.md CHANGED Viewed

@@ -451,7 +451,9 @@ grep -n -B 2 -A 2 "console\.log" "$file" 2>/dev/null | grep -E "^\s*(const|funct
 git diff ${DIFF_BASE:-HEAD~1}..HEAD -- '*test*' '*spec*' 2>/dev/null | grep -nE '^\-.*(assert|expect|EXPECT)|\+.*(skip|xit|\.only|assert\s*\(?\s*[Tt]rue|return;?\s*//.*skip)'
 ```
-**Architecture-fit gate (per `engineering-standards.md`; only when `.planning/adr/*.md` or `DOMAIN-MODEL.md` exists):** "wired and tamper-free" is not enough — the implementation must match the ADR's chosen rung for the subdomain it touches. Read the latest ADR + DOMAIN-MODEL and check **both directions**: (a) **under-engineering** — thin CRUD / transaction-script / a patch-around where the rung mandates a Domain Model / hexagonal ports / CQRS / event-driven flow (a working-but-wrong-shape implementation is a 🛑 BLOCKER, `status: gaps_found` — it's the under-build the contract forbids, not a passing phase); (b) **over-engineering** — ports/aggregates/CQRS/speculative layers on a subdomain the ADR marked Transaction Script (⚠️ Warning). Judge the shape, not just that it runs. If no ADR/DOMAIN-MODEL exists, skip this gate.
+**Architecture-fit gate (per `engineering-standards.md`):** "wired and tamper-free" is not enough — judge the *shape*, not just that it runs.
+- **Floor check (applies ALWAYS, even with no ADR):** the universal floor is dependency inversion at the true external boundaries (DB/3rd-party/IO) + a Functional-Core/Imperative-Shell shape so the logic is testable without the real services. An implementation that reaches into the DB/3rd-party from everywhere, untestable without standing up real infrastructure, **skipped the floor** — a 🛑 BLOCKER (`status: gaps_found`) *even on a simple Transaction-Script subdomain*. The floor is the cheap baseline, not ceremony.
+- **Rung-fit check (when `.planning/adr/*.md` or `DOMAIN-MODEL.md` exists):** the implementation must match the ADR's chosen rung. Both directions: (a) **under-engineering** — thin CRUD / patch-around where the rung mandates a Domain Model / hexagonal ports / CQRS / event-driven flow (a working-but-wrong-shape impl is a 🛑 BLOCKER, not a passing phase); (b) **over-engineering** — ports/aggregates/CQRS/speculative layers on a subdomain the ADR marked Transaction Script (⚠️ Warning).
 Categorize: 🛑 Blocker (prevents goal, unresolved debt marker, reward-hacked check, or ADR-rung under-build) | ⚠️ Warning (incomplete, or over-built vs the rung) | ℹ️ Info (notable)

package/gemini-extension.json CHANGED Viewed

@@ -1,6 +1,6 @@
 {
   "name": "gsd-core",
-  "version": "1.9.0",
+  "version": "1.10.0",
   "description": "GSD Core — a meta-prompting, context engineering, and spec-driven development system for AI coding agents. Loads gsd's operating context into every Gemini CLI session.",
   "contextFileName": "GEMINI.md"
 }

package/gsd-core/references/ai-test-quality.md CHANGED Viewed

@@ -1,6 +1,8 @@
 # AI-Written Tests — The Quality Contract
-How-to reference and enforcement contract for tests authored by a model (`add-tests`, TDD inside `execute-phase`). LLM test-writers have *known, named* failure modes: vacuous assertions, happy-path-only suites, mock-everything-then-assert-the-mock, change-detector tests (the implementation's current output copied back as the oracle), and weakening a failing assertion to go green. This contract converts test quality from judgment into mechanical checks — inventory-first, greppable forbidden patterns, a falsifiability gate, a mutation gate. Read before generating any test. Pairs with `test-strategy.md` and `test-doubles.md`.
+How-to reference and enforcement contract for tests authored by a model (`add-tests`, TDD inside `execute-phase`). LLM test-writers have *known, named* failure modes: vacuous assertions, happy-path-only suites, mock-everything-then-assert-the-mock, change-detector tests (the implementation's current output copied back as the oracle), and weakening a failing assertion to go green. The deepest of these is **lost independence**: when the same agent writes both the code and its tests, the tests rubber-stamp the implementation — "documentation with assertions" that confirm bugs as expected behavior. This contract converts test quality from judgment into mechanical checks — inventory-first, greppable forbidden patterns, a falsifiability gate, an **independence gate**, a mutation gate. Read before generating any test. Pairs with `test-strategy.md` and `test-doubles.md`.
+**The independence requirement (load-bearing).** Assertions must be anchored to the **spec / expected behavior**, NOT derived from reading the implementation. The agent must never edit a test to make its own code pass — that inverts the check. The test of independence: *would this assertion exist, in this form, if you hadn't seen the implementation? Is it checking the SPEC, or the code?* Green ≠ correct — a passing suite the implementer authored against its own output proves nothing.
 ## A. Behavior inventory BEFORE any test is written
@@ -54,6 +56,8 @@ A generated test that passes on its first run is **unverified**, not done. The s
 One extra run per test file; fully automatable. A test that cannot be made to fail by breaking the code under test is testing nothing — delete or rewrite it. **Never waive this step** because "the code already works"; that waiver is exactly where vacuous AI-written tests are born. (Where the mutation gate in E runs on the same files, a killed mutant covering the test's target behavior satisfies this gate.)
+**The independence dimension — "can it fail" is necessary but not sufficient.** A test can fail-on-mutation and still be a rubber stamp if its *oracle* came from the implementation rather than the spec. So for each assertion also ask: **would this assertion exist, in this form, if I hadn't read the implementation? Is it checking the SPEC or the code?** The expected value must trace to a requirement/acceptance criterion (cite it), not to "what the SUT currently returns" (that is the change-detector pattern, B). And when a test goes red, **never edit the test to make the agent's own code pass** — investigate which side is wrong; weakening the assertion to go green destroys the independence the gate exists to protect.
 ## E. Mutation gate on the diff
 Run mutation testing (Stryker) **incrementally on changed files** as part of accepting a generated suite:

package/gsd-core/references/architecture-decision.md CHANGED Viewed

@@ -15,6 +15,12 @@ The common sweet spot is a **Domain Model inside a modular monolith**. High scal
 Use the core subdomain's complexity from DOMAIN-MODEL. Apply per subdomain: the complex core may warrant a Domain Model (± Hexagonal); supporting/generic subdomains stay Transaction Script.
+### The floor — the cheap baseline, even for simple projects
+Below the rungs sits a baseline that applies **even to simple/short-lived projects** and is explicitly **NOT full hexagonal**: **dependency inversion at *true external boundaries only* (DB, 3rd-party APIs, clock/IO) + a Functional Core / Imperative Shell shape (pure logic separated from side-effecting glue) + strong, independent tests.** This is the lighter discipline the senior voices on *both* sides of the "always hexagonal?" debate converge on — it delivers the pro-camp's real prize (day-one isolated testability) at a fraction of hexagonal's cost, with **no internal port ceremony**. Extract a real internal port only "when you feel the second adapter appearing."
+Transaction Script is the domain-logic *organization* at the floor — it still sits **behind** this seam discipline. **Do not read "Transaction Script floor" as "no seams":** a CRUD app that reaches straight into the DB/3rd-party from everywhere, untestable without the real services, is **under-engineered even though it is simple** — it skipped the floor. The floor is the cheap minimum; the rungs below are about how much *domain-logic structure* to add on top of it.
 | Rung | Move up when | Over-engineering tell |
 |------|--------------|-----------------------|
 | **Transaction Script / simple layered CRUD** (floor) | "validate → persist → return"; few rules; supporting/generic subdomains | — |
@@ -69,13 +75,23 @@ The same tools run in reverse for a brownfield monolith you're decomposing — s
 **Over-engineering tells:** one-implementation ports; rich aggregates over structural CRUD; CQRS/ES with no asymmetry or audit need; "microservice envy"; distributed monolith; elaborate config/abstraction "wanting all options all the time."
-**Under-engineering tells:** the same invariant duplicated across many transaction scripts; a big-ball-of-mud monolith with no enforced module boundaries; a complex/regulated domain modeled as thin CRUD; no audit trail where compliance needs it; no ADRs / no fitness functions.
+**Under-engineering tells:** the same invariant duplicated across many transaction scripts; a big-ball-of-mud monolith with no enforced module boundaries; a complex/regulated domain modeled as thin CRUD; **no seam at the true external boundaries (DB/3rd-party reached into from everywhere), so the code can't be tested without the real services — the Functional-Core/Imperative-Shell floor was skipped, even on a simple app**; no audit trail where compliance needs it; no ADRs / no fitness functions.
+## The AI-coding era moves the FLOOR up a notch (not the ceiling)
+The evidence (agent-reliability studies) shifts *where the floor sits*, not whether the higher rungs are universal:
+- **Seams pay from the first agent session.** Agent reliability collapses as codebase scale and files-touched-per-change rise, and the bottleneck is **architectural legibility, not context-window size** (bigger windows don't rescue large tangled repos). Clean boundaries help the *agent* navigate and verify — so the Functional-Core/Imperative-Shell floor + strong independent tests are worth more now, even on simple projects.
+- **Get the *seams* right early; defer *feature* speculation.** AI is **bad at adding architectural seams later** (agent success on compound/architectural refactors is low — well under half in the agent-reliability studies) but **good at adding features later** — so cheap-AI *weakens* YAGNI for core seams (build them now) and *strengthens* it for speculative features (defer them).
+- **Don't over-generate structure.** AI writes structure cheaply but **maintains it worse** (duplication rises, drift) — so this moves the floor *up a notch*, NOT toward always-hexagonal. Prefer **deep modules, not shallow many-file layering** (the "design for agents" consensus converges with classic good design on tests/interfaces/naming/deep-modules; it diverges only by penalizing indirection *depth*).
+- **Tests must be *independent*.** Because agents reward-hack tests, the lever is test *independence + quality* (human-owned/agent-non-editable assertions; verify beyond green), not test-*first ordering*. See `test-strategy.md` / `ai-test-quality.md`.
+Net: raise the floor (seams + independent tests earlier), keep the ceiling calibrated (Domain Model / full hexagonal / CQRS / ES still require their concrete signals — the originators themselves reserve them for the complex core).
-**The meta-tell (use this to settle every rung):** if you cannot point to a **current, concrete** requirement — a real second adapter or delivery mechanism, a real divergent-scaling component, a real second team, a real audit mandate, a real tenant-isolation mandate, a genuinely pure core isolated for test speed — that justifies a rung, you are **over-engineering**. If such a requirement exists and you ignored it, you are **under-engineering**.
+**The meta-tell (use this to settle every rung *above the floor*; the floor itself is always-on and exempt — it is the baseline, not a rung):** if you cannot point to a **current, concrete** requirement — a real second adapter or delivery mechanism, a real divergent-scaling component, a real second team, a real audit mandate, a real tenant-isolation mandate, a genuinely pure core isolated for test speed — that justifies a rung, you are **over-engineering**. If such a requirement exists and you ignored it, you are **under-engineering**.
 ## Default baseline (when in doubt)
-**Modular monolith + Domain Model only in the complex core bounded contexts + Transaction Script in the simple/CRUD ones + ADRs + boundary fitness functions.** This is the modern consensus starting point.
+**The floor (Functional-Core/Imperative-Shell + dependency inversion at true external boundaries + strong independent tests) everywhere — then modular monolith + Domain Model only in the complex core bounded contexts + Transaction Script in the simple/CRUD ones + ADRs + boundary fitness functions.** This is the modern consensus starting point: a cheap testable floor for all, rich structure only where complexity earns it.
 ## Always

package/gsd-core/references/engineering-standards.md CHANGED Viewed

@@ -16,7 +16,7 @@ Shared reference injected into the producing agents (`gsd-planner`, `gsd-phase-r
 ## The contract (behaviors — most load-bearing last)
 1. **Understand before you change.** Read the real code paths, the tests-as-spec, and — for this subdomain — the DOMAIN-MODEL classification and the ADR rung, before deciding or writing. Most defects are acting on an incomplete mental model, or building to the wrong structure.
-2. **Build to the architecture, fully.** Implement the rung the ADR chose for this subdomain — its ports, aggregates, boundaries — properly and completely. Don't skip mandated structure "to keep it simple" (under-engineering); don't add un-mandated structure "to be robust" (over-engineering).
+2. **Build to the architecture, fully — on top of the universal floor.** A cheap baseline applies to *every* project, even simple ones: dependency inversion at **true external boundaries only** (DB, 3rd-party, clock/IO) + a Functional-Core/Imperative-Shell shape (pure logic separable from side-effecting glue) + strong independent tests. That floor is part of the invariant bar — skipping it (reaching into the DB/3rd-party from everywhere, untestable) is under-engineering even on a CRUD app. *On top of that*, implement the rung the ADR chose for this subdomain — ports/aggregates/boundaries — properly. Don't skip mandated structure "to keep it simple" (under-engineering); don't add un-mandated internal ceremony "to be robust" (over-engineering). "Don't add un-mandated structure" (rule 4) means the *ceremony above the floor*, never the floor itself.
 3. **Match what exists; deep modules, clear names.** Make new code look like it belongs — reuse the established patterns, names, and conventions (`gsd-pattern-mapper` supplies the analogs); don't add a second way to do something that already has one. Prefer a simple interface over real substance to many shallow pass-throughs (Ousterhout — and "extract into tiny functions" is genuinely contested; don't cargo-cult classitis). Descriptive names are one of the few empirically-validated quality levers.
 4. **Don't invent un-mandated structure (but build everything the ADR mandates).** Prefer duplication to a *speculative* abstraction — duplication is far cheaper than the wrong abstraction (Metz); wait until the right shape is obvious, inline it back when it stops fitting (AHA); DRY is about *knowledge*, not coincidentally-similar code. Skip speculative layers/options/config for imagined futures (YAGNI; Fowler's cost-of-carry). None of this overrides the architecture's *deliberate, required* abstractions — a mandated port/aggregate is not speculative; declining it is under-engineering, not YAGNI.
 5. **Bounded boy-scout.** Improve what you touch when the change needs it and it's honestly better; keep structural changes in a *separate* commit from behavioral ones (Beck's two hats); flag larger refactors instead of silently sprawling — unbounded refactoring is scope creep.

package/gsd-core/references/test-strategy.md CHANGED Viewed

@@ -35,7 +35,13 @@ Pure, dependency-light, logic-dense code: **money/currency (integer minor units
 ## TDD — honestly
-The quality benefit is real, but the evidence favors **small, uniform increments** over *test-first specifically* (controlled studies found test-first vs test-after didn't differ; granularity did). So **mandate: behavior-level tests + small increments + a regression floor.** Keep the **RED step** (watch a test fail) so tests actually test something. Test-first vs test-after is a **knob** (`workflow.tdd_mode`), not dogma.
+The defensible mandate is **always strong, INDEPENDENT tests** — not "always TDD, even for simple." The evidence is clear that what drives the quality benefit is **granularity + test existence + independence**, not test-first *ordering* (controlled studies found test-first vs test-after didn't differ; granularity did; the design-improvement claim for ordering is contested, not established). So **mandate: behavior-level tests + small increments + a regression floor.** Keep the **RED step** (watch a test fail) so tests actually test something.
+**Default to test-first for AI-WRITTEN code — but on anti-gaming/independence grounds, not "test-first improves design."** When an agent authors the implementation, writing the spec/test first stops it from shaping tests to its own output: agents reward-hack tests, and a large share of "passing" agent patches diverge from ground truth (green ≠ correct). The lever is an *independent*, human-anchored check the agent did not write to its own convenience (see `ai-test-quality.md`) — not the ordering ritual. Test-first vs test-after stays a **knob** (`workflow.tdd_mode`); the independence requirement does not.
+**Skip TDD for** UI/visual components, prototypes/spikes, glue/wiring, and trivial CRUD — correctness there is visual, throwaway, or too thin to assert cheaply (both pro- and anti-TDD camps agree). Still cover them with the cheapest check that fits (visual/snapshot, integration smoke).
+**Green ≠ correct — verify beyond green.** A passing suite is a *lower bound* on correctness, not proof. For critical paths pair it with mutation/property/differential testing + human UAT.
 ## Persistent vs transient E2E

package/gsd-core/workflows/execute-phase.md CHANGED Viewed

@@ -649,6 +649,7 @@ increases monotonically across waves. `{status}` is `complete` (success),
        ` : ''}
        - ${PROJECT_ROOT}/CLAUDE.md (Project instructions, if exists — follow project-specific guidelines and coding conventions)
        - ${PROJECT_ROOT}/.claude/skills/ or ${PROJECT_ROOT}/.agents/skills/ (Project skills, if either exists — list skills, read SKILL.md for each, follow relevant rules during implementation)
+       - ${PROJECT_ROOT}/.planning/adr/*.md, ${PROJECT_ROOT}/.planning/DOMAIN-MODEL.md, ${PROJECT_ROOT}/.planning/TEST-STRATEGY.md (if they exist — the chosen architecture rung and test levels; any autonomous deviation must honor them: build to the rung fully, neither under- nor over-engineering it, per @~/.claude/gsd-core/references/engineering-standards.md)
        </files_to_read>
        ${AGENT_SKILLS}

package/gsd-core/workflows/infrastructure-strategy.md CHANGED Viewed

@@ -124,6 +124,8 @@ INFRA-STRATEGY.md written — infrastructure matched to traffic shape and team s
 Next: /gsd:cicd-strategy   (pipelines + deploy targets will follow this strategy)
 ```
+**Roadmap reconciliation:** ROADMAP.md predates this strategy. Scan it against the compute rungs and data layer — if a phase assumes compute, a data store, or an environment this strategy invalidates or reshapes (e.g. a phase planning a self-managed cluster now dropped to serverless containers, or one missing the IaC/observability floor this strategy mandates), SAY SO explicitly and offer `/gsd:phase --edit` (or a roadmap refresh — the roadmapper re-reads discovery artifacts). Never leave a known strategy↔roadmap contradiction unspoken.
 </process>
 <critical_rules>

package/gsd-core/workflows/plan-phase.md CHANGED Viewed

@@ -10,6 +10,7 @@ Read all files referenced by the invoking prompt's execution_context before star
 @~/.claude/gsd-core/references/gate-prompts.md
 @~/.claude/gsd-core/references/agent-contracts.md
 @~/.claude/gsd-core/references/gates.md
+@~/.claude/gsd-core/references/engineering-standards.md
 </required_reading>
 <available_agent_types>

package/gsd-core/workflows/recommend-architecture.md CHANGED Viewed

@@ -46,7 +46,9 @@ From DOMAIN-MODEL.md (or the answers): extract each subdomain's type + complexit
 ## Step 3: Axis A — domain-logic organization (per subdomain)
-Decide a rung **per subdomain** (not one rung for the whole app). Supporting/generic subdomains almost always stay at Transaction Script; the complex core may climb.
+**First, the floor applies to ALL subdomains, even simple ones (state it in the ADR):** dependency inversion at *true external boundaries only* (DB, 3rd-party, clock/IO) + a Functional-Core/Imperative-Shell shape + strong independent tests. This is NOT hexagonal — no internal port ceremony — it's the cheap testability baseline both camps' senior voices converge on, and the AI era makes it pay from the first agent session (see the reference's floor + AI-era sections). "Transaction Script" is the domain-logic organization *on top of* this floor, never an excuse to skip it (a CRUD app that reaches into the DB from everywhere is under-engineered).
+Then decide a rung **per subdomain** (not one rung for the whole app). Supporting/generic subdomains almost always stay at Transaction Script *behind the floor seams*; the complex core may climb.
 For the **core** subdomain, use `AskUserQuestion` (header "Core logic"):
 - question: "For the core (*[core name]*): are operations mostly validate→persist→return, or are there rich invariants / state machines / tangled business rules?"

package/gsd-core/workflows/testing-strategy.md CHANGED Viewed

@@ -105,6 +105,8 @@ TEST-STRATEGY.md written — test shape set to follow the architecture.
 Next: /gsd:plan-phase   (plans + /gsd:add-tests will follow this strategy)
 ```
+**Roadmap reconciliation:** ROADMAP.md predates this strategy. Scan it against the test shape — if a phase relies on test infrastructure or a test level this strategy invalidates or reshapes (e.g. a phase planning e2e for what's now demoted to integration, or one missing the test infra a gnarly-bit tier needs), SAY SO explicitly and offer `/gsd:phase --edit` (or a roadmap refresh — the roadmapper re-reads discovery artifacts). Never leave a known strategy↔roadmap contradiction unspoken.
 </process>
 <critical_rules>

package/package.json CHANGED Viewed

@@ -1,6 +1,6 @@
 {
   "name": "@therocketcode/gsd-core",
-  "version": "1.9.0",
+  "version": "1.10.0",
   "description": "GSD Core is a meta-prompting, context engineering, and spec-driven development system for AI coding agents.",
   "bin": {
     "gsd-core": "bin/install.js",