npm - groundwork-method - Versions diffs - 0.10.0 → 0.11.0 - Mend

groundwork-method 0.10.0 → 0.11.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (70)

package/src/engineer-skills/groundwork-nextjs-engineer/references/observability.md ADDED

@@ -0,0 +1,48 @@
+# Observability
+A Next.js app has two telemetry surfaces, and they obey different rules. The **server side** — route handlers, Server Actions, `instrumentation.ts` — is genuinely backend-like: it emits OpenTelemetry spans the same way a service does. The **browser client** emits user-experience signal: Core Web Vitals and errors from a device you do not control. Instrument each for the questions you will actually ask in an incident; the signal that proves a path correct in test is the signal you debug with in production (`docs/principles/quality/observability.md`).
+## Server Side — OpenTelemetry Spans
+`instrumentation.ts` is the registration point; from there, route handlers and Server Actions emit spans through the OTel SDK to a collector. This half follows the backend canon unchanged: vendor lock-in lives at the collector boundary, not in application code.
+- **Trace-driven.** Sketch the span a server path should produce — name, attributes, parent — before writing the handler. The instrumentation design shapes the code.
+- **Assert what you debug.** The in-memory span exporter that proves a route's trace in `references/testing.md` (Trace Assertions) reads the same span a dashboard or SLO does. A critical-path span a query depends on is part of the contract, not decoration — a missing span is a test failure.
+- **Structured logs carry the trace.** Server logs are JSON and inject `trace_id`/`span_id` from the active context, so a log line pivots to the trace that produced it. Sample debug/info; never sample errors.
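Here is a minimal sketch of that log-line shape and the level-aware sampling rule. It assumes the caller passes in the active span context. In an instrumented server that context would come from `trace.getActiveSpan()?.spanContext()` in `@opentelemetry/api`. `logLine`, `shouldEmit`, and `TraceContext` are hypothetical names for illustration, not part of any library.

```typescript
type Level = 'debug' | 'info' | 'warn' | 'error';

// Hypothetical stand-in for the active span context; real code would read it from
// trace.getActiveSpan()?.spanContext() in @opentelemetry/api instead of a parameter.
interface TraceContext {
  traceId: string;
  spanId: string;
}

// Debug/info are sampled at `rate`; warn and error are always kept.
// `random` is injectable so a test can pin the sampling decision.
function shouldEmit(level: Level, rate: number, random: () => number = Math.random): boolean {
  if (level === 'warn' || level === 'error') return true;
  return random() < rate;
}

// One JSON object per line. Caller fields are spread first so they cannot overwrite
// the core keys or the trace ids that let the line pivot to its trace.
function logLine(
  level: Level,
  msg: string,
  ctx?: TraceContext,
  fields: Record<string, unknown> = {},
): string {
  return JSON.stringify({
    ...fields,
    ts: new Date().toISOString(),
    level,
    msg,
    ...(ctx ? { trace_id: ctx.traceId, span_id: ctx.spanId } : {}),
  });
}
```

The sampling decision comes first, and the line is built only when it passes. For example, `shouldEmit('info', 0.1) && process.stdout.write(logLine('info', 'order created', ctx) + '\n')`. An error line skips the dice roll entirely.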
+## Client Side — Web Vitals and Error Reporting
+The browser cannot run a collector. It reports field signal to a sink.
+- **Core Web Vitals (RUM).** Report LCP, INP, CLS, and TTFB from real sessions via `useReportWebVitals` to a sink. These are the user's experience; a green Lighthouse lab score is not — it measures one synthetic load, not the field.
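+  ```tsx
+  // app/_components/web-vitals.tsx: mounted once in the root layout
+  'use client';
+  import { useReportWebVitals } from 'next/web-vitals';
+  // Module-scope callback keeps the reference stable across renders
+  const report: Parameters<typeof useReportWebVitals>[0] = ({ name, value, id }) => {
+    const body = JSON.stringify({ name, value, id });
+    // sendBeacon survives page unload; fall back to a keepalive fetch if the beacon is refused
+    if (!navigator.sendBeacon('/api/rum', body)) {
+      fetch('/api/rum', { method: 'POST', body, keepalive: true });
+    }
+  };
+  export function WebVitals() {
+    useReportWebVitals(report);
+    return null;
+  }
+  ```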
+- **Error reporting.** `error.tsx` and `global-error.tsx` catch render errors; a `window` `error`/`unhandledrejection` handler catches the rest. Both forward to the sink. An error boundary that renders a fallback but reports nothing is a silent failure — the user sees the broken state and you never do.
+- **Structured events, not `console.log`.** `console` output in a production bundle is not telemetry; emit structured events the sink can query.
+- **Connect the halves.** Instrument the browser `fetch` to send a W3C `traceparent` header (through browser OTel instrumentation or set by hand), so a client interaction links to the server trace it triggered: one causal thread across the boundary.
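Here is a minimal sketch of the hand-set variant. It assumes no browser OTel SDK is loaded and builds a W3C Trace Context `traceparent` value: `version-traceid-parentid-flags`, lowercase hex, 16-byte trace id, 8-byte parent id. `makeTraceparent` and `tracedFetch` are hypothetical helpers. The server only joins this trace if its propagator reads `traceparent`. A cross-origin call also needs CORS to allow the header.

```typescript
// Hex-encode `bytes` random bytes from the Web Crypto API (browsers, Node 19+).
function randomHex(bytes: number): string {
  const buf = new Uint8Array(bytes);
  crypto.getRandomValues(buf);
  return Array.from(buf, (b) => b.toString(16).padStart(2, '0')).join('');
}

// W3C Trace Context: version 00, trace-id, parent-id, flags (01 = sampled).
// An all-zero trace-id or parent-id is invalid, so regenerate in that (vanishingly rare) case.
function makeTraceparent(sampled = true): string {
  let traceId = randomHex(16);
  while (/^0+$/.test(traceId)) traceId = randomHex(16);
  let parentId = randomHex(8);
  while (/^0+$/.test(parentId)) parentId = randomHex(8);
  return `00-${traceId}-${parentId}-${sampled ? '01' : '00'}`;
}

// Hypothetical wrapper for calls to the app's own route handlers;
// an existing traceparent (e.g. from instrumentation) is left untouched.
function tracedFetch(input: string | URL | Request, init: RequestInit = {}): Promise<Response> {
  const headers = new Headers(init.headers);
  if (!headers.has('traceparent')) headers.set('traceparent', makeTraceparent());
  return fetch(input, { ...init, headers });
}
```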
+## What to Capture vs PII
+- **Capture** route, status, duration, the span attributes a dashboard queries, the web-vital name and value, error type and stack.
+- **Never** put tokens, full request bodies, or emails/PII in span attributes, breadcrumbs, or client events. The sink is third-party — redact at the edge.
+- **Cardinality is a design choice.** Allow high cardinality on server traces, where it stays queryable; keep it off client metric dimensions, which multiply by every session.
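Here is a minimal sketch of redacting at the edge before an event leaves for a third-party sink, plus a route normaliser that keeps a client metric dimension low-cardinality. The denied key names, the email pattern, and the id heuristic are illustrative assumptions, not a complete PII policy. `redact` and `routeTemplate` are hypothetical names.

```typescript
// Keys whose values never leave the process, whatever they contain (illustrative list).
const DENY_KEYS = /^(authorization|cookie|password|token|access_token|refresh_token|email|body)$/i;
// Crude matcher for emails embedded in free text; a real policy covers more PII shapes.
const EMAIL = /[^\s@]+@[^\s@]+\.[^\s@]+/g;
const UUID = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

type Attrs = Record<string, string | number | boolean>;

// Masks denied keys and scrubs emails out of string values; other values pass through.
function redact(attrs: Attrs): Attrs {
  const out: Attrs = {};
  for (const [key, value] of Object.entries(attrs)) {
    if (DENY_KEYS.test(key)) {
      out[key] = '[REDACTED]';
    } else if (typeof value === 'string') {
      out[key] = value.replace(EMAIL, '[EMAIL]');
    } else {
      out[key] = value;
    }
  }
  return out;
}

// Collapses per-record path segments (numeric ids, UUIDs) to ':id', so a client
// metric dimension has one value per route rather than one per record.
function routeTemplate(pathname: string): string {
  return pathname
    .split('/')
    .map((seg) => (/^\d+$/.test(seg) || UUID.test(seg) ? ':id' : seg))
    .join('/');
}
```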
+## Anti-Patterns
+- **`console.log` as telemetry.** It is noise in the field, not a signal you can query.
+- **Boundary without report.** A fallback UI that swallows the error instead of forwarding it.
+- **Lab metrics as field RUM.** Lighthouse is a synthetic check, not what users experienced.
+- **A collector in the bundle.** The client reports to a sink; it does not run backend OTel infrastructure.
+- **Over-instrumenting.** A span or metric nobody will query during an incident is cost and clutter — instrument the questions, not the surface area.

package/src/engineer-skills/groundwork-nextjs-engineer/references/security.md ADDED

@@ -0,0 +1,131 @@
+# Security
+## Table of Contents
+- [The Posture](#the-posture)
+- [XSS: Trust React's Escaping](#xss-trust-reacts-escaping)
+- [The Client Bundle Is Public](#the-client-bundle-is-public)
+- [Server Action Input Validation](#server-action-input-validation)
+- [Auth and Sessions: httpOnly Cookies](#auth-and-sessions-httponly-cookies)
+- [CSRF on Mutations](#csrf-on-mutations)
+- [SSRF on Server Fetches](#ssrf-on-server-fetches)
+- [Content-Security-Policy](#content-security-policy)
+- [Security Review Checklist](#security-review-checklist)
+---
+## The Posture
+A Next.js app runs in two places at once, and the boundary between them is the whole security model. Server Components, Server Actions, and route handlers run on a trusted server; everything else ships to a browser the user controls. Code, props, and environment values that cross into the client are public — assume an attacker reads the bundle and replays every request. This file is the Next.js idiom of the framework security canon (`docs/principles/quality/security.md`); when this file and the canon disagree, the canon wins and this file is the one to fix.
+The single discipline underneath every rule below: validate and authorize on the server, never trust the client. Client-side validation is UX; the server check is the security boundary (`references/mutations-and-forms.md` → Error Flow).
+## XSS: Trust React's Escaping
+React escapes every value interpolated into JSX by default — `{userValue}` cannot break out of its text node. XSS re-enters only when you opt out of that escaping.
+- `dangerouslySetInnerHTML` is the named opt-out. Render it only with HTML you produced or sanitised server-side (a vetted sanitiser such as DOMPurify); never with a value that originated from a user or an API.
+- A `javascript:` or `data:` URL in an `href`/`src` is script. Validate that user-supplied URLs are `https:` before rendering them.
+- Untrusted JSON parsed and injected as markup is the same hole by another route — keep untrusted data as text.
+```tsx
+// Hostile — renders attacker markup into a privileged origin
+<div dangerouslySetInnerHTML={{ __html: order.customerNote }} />
+// Safe — React escapes it; the note is text, not markup
+<div>{order.customerNote}</div>
+```
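For the `javascript:`/`data:` rule above, here is a minimal sketch of the URL check. It assumes user-supplied links should render only when they parse as absolute `https:` URLs. Relative links or `mailto:` would need an explicit policy decision. `safeHref` is a hypothetical helper built on the WHATWG `URL` parser. The parser lower-cases the scheme and strips leading whitespace, so `  JaVaScRiPt:` cannot slip past the way it can past a naive string-prefix check.

```typescript
// Returns the normalised URL for an absolute https: URL, otherwise undefined,
// so `href={safeHref(input)}` renders no navigable link for anything else.
function safeHref(input: string): string | undefined {
  let url: URL;
  try {
    url = new URL(input); // throws on relative or malformed input
  } catch {
    return undefined;
  }
  // `protocol` is lower-cased by the parser, so mixed-case schemes are caught too
  return url.protocol === 'https:' ? url.href : undefined;
}
```

In JSX this reads as `<a href={safeHref(profile.website)} rel="noopener noreferrer">`. Here `profile.website` is an illustrative field.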
+## The Client Bundle Is Public
+Every value reachable from client code is shipped to the browser. The `NEXT_PUBLIC_` prefix is the boundary: a variable with that prefix is inlined into the bundle, and a variable without it is unreadable from any `'use client'` module.
+- Secrets — API keys, signing secrets, database URLs — never carry the `NEXT_PUBLIC_` prefix and are read only in server code (Server Components, Server Actions, route handlers, `lib/api.ts` on the server).
+- A secret consumed by the client is a secret leaked. If a feature "needs" a key in the browser, the call belongs on the server: route it through a Server Action and keep the key server-side.
+- The downward dependency graph (`references/architecture.md`) keeps this honest — schemas and the API client never import client-only code, so server secrets have no path into a client component by construction.
+```ts
+const apiKey = process.env.UPSTREAM_API_KEY;        // server-only — correct
+const pub = process.env.NEXT_PUBLIC_ANALYTICS_ID;   // inlined into the bundle — public by design
+```
+## Server Action Input Validation
+A Server Action is a public POST endpoint — anyone can invoke it with any payload, regardless of what the form allows. Every Server Action re-validates its input with the same Zod schema the form uses, on the server, before any work (`references/type-system.md` → Zod as the Contract).
+```tsx
+'use server';
+export async function updateOrderAction(
+  id: string,
+  formData: FormData,
+): Promise<ActionResult<Order>> {
+  const principal = await requirePrincipal();           // 1. authenticate the caller
+  const parsed = updateOrderSchema.safeParse({          // 2. validate every field
+    quantity: Number(formData.get('quantity')),
+    note: formData.get('note'),
+  });
+  if (!parsed.success) {
+    return { data: null, error: parsed.error.issues[0].message };
+  }
+  if (!(await canEditOrder(principal, id))) {           // 3. authorize this action on this resource
+    return { data: null, error: 'Not found' };          //    deny as not-found — do not confirm existence
+  }
+  // ... mutate, revalidatePath, return ActionResult
+}
+```
+The order is non-negotiable: authenticate, validate, authorize, then act. A Server Action that trusts the form's own validation is unauthenticated and unvalidated.
+## Auth and Sessions: httpOnly Cookies
+The session token lives in an `httpOnly`, `Secure`, `SameSite` cookie set by the server — never in `localStorage` or a JS-readable cookie. A token JavaScript can read is a token an XSS payload can exfiltrate; `httpOnly` removes it from script's reach entirely.
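+```ts
+import { cookies } from 'next/headers';
+// In a Server Action or route handler; cookies() is async since Next.js 15
+const cookieStore = await cookies();
+cookieStore.set('session', token, {
+  httpOnly: true,
+  secure: true,
+  sameSite: 'lax',   // lax/strict is the baseline CSRF defence for cookie auth
+  path: '/',
+});
+```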
+- Authentication runs through a proven provider (OIDC); the app does not hand-roll JWT verification or session crypto. Auth is boring technology — `docs/principles/system-design/identity-and-access.md`.
+- The session is read and verified on the server (Server Component / Server Action / route handler), and the proxy (`proxy.ts`) gates protected segments. A client-side `isLoggedIn` flag is UX, never a gate.
+## CSRF on Mutations
+Cookie-authenticated mutations need CSRF protection, because the browser attaches the session cookie to cross-site requests automatically. The first layer is `SameSite=lax`/`strict` on the session cookie, which blocks the classic cross-site form post.
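+- Server Actions carry framework-level protection. Next.js accepts them only as `POST` and compares the request's `Origin` header with its `Host` (or `X-Forwarded-Host`), aborting on a mismatch, so a third-party page cannot invoke one cross-site. Keep that protection: do not expose the same mutation as an unprotected route handler that bypasses it, and widen `serverActions.allowedOrigins` only deliberately.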
+- A route handler that mutates state under cookie auth validates the `Origin` header against an allowlist (and uses a CSRF token if it cannot rely on `SameSite`). A `GET` never mutates.
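Here is a minimal sketch of that `Origin` check for a mutating route handler. It assumes a configured allowlist of the app's own origins. `ALLOWED_ORIGINS` and `isAllowedOrigin` are hypothetical names, and the file path in the comment is illustrative. A missing `Origin` is treated as a rejection, which is the strict choice for cookie-authenticated `POST`s. Relax it only with a CSRF token in place.

```typescript
// Hypothetical allowlist: the exact scheme://host[:port] origins the app is served from.
const ALLOWED_ORIGINS: ReadonlySet<string> = new Set(['https://app.example.com']);

// Browsers send Origin on cross-origin requests and on same-origin POSTs,
// so a mutating request without one is treated as untrusted.
function isAllowedOrigin(request: Request, allowed: ReadonlySet<string> = ALLOWED_ORIGINS): boolean {
  const origin = request.headers.get('origin');
  return origin !== null && allowed.has(origin);
}

// Usage in a route handler (e.g. app/api/orders/route.ts, illustrative):
async function POST(request: Request): Promise<Response> {
  if (!isAllowedOrigin(request)) {
    return new Response('Forbidden', { status: 403 });
  }
  // ... authenticate, validate, authorize, then mutate
  return new Response(null, { status: 204 });
}
```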
+## SSRF on Server Fetches
+Server-side `fetch` runs from inside your network, so a fetch aimed at an input-supplied URL is an SSRF vector — an attacker points it at cloud metadata endpoints or internal services.
+- The API client (`lib/api.ts`) targets a configured base URL, never a host taken from the request. Keep outbound targets constant or allowlisted.
+- A feature that must fetch a user-supplied URL (a webhook, an image proxy) validates the resolved host against an allowlist and rejects non-`https:` schemes and private address ranges before the call.
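Here is a minimal sketch of the pre-flight check for a user-supplied outbound URL. It assumes an allowlist of partner hosts. `OUTBOUND_ALLOWLIST`, `isPrivateIPv4`, and `assertOutboundUrl` are hypothetical names. The check rejects non-`https:` schemes, literal private or loopback IPv4 hosts, and anything off the allowlist. It does not resolve DNS, so it cannot stop an allowlisted name that resolves to a private address. A production check would also apply `isPrivateIPv4` to the resolved address at connect time.

```typescript
// Hypothetical allowlist of the hosts this feature may fetch from.
const OUTBOUND_ALLOWLIST: ReadonlySet<string> = new Set(['images.partner.example']);

// IPv4 ranges an outbound fetch must never reach: 0/8, RFC 1918 private ranges, loopback,
// link-local (cloud metadata sits at 169.254.169.254) and CGNAT (100.64/10).
function isPrivateIPv4(host: string): boolean {
  const m = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/.exec(host);
  if (!m) return false; // not a dotted-quad literal (a hostname or IPv6)
  const a = Number(m[1]);
  const b = Number(m[2]);
  return (
    a === 0 ||
    a === 10 ||
    a === 127 ||
    (a === 169 && b === 254) ||
    (a === 172 && b >= 16 && b <= 31) ||
    (a === 192 && b === 168) ||
    (a === 100 && b >= 64 && b <= 127)
  );
}

// Throws unless the URL is https:, not a private IPv4 literal, and on the allowlist.
// The WHATWG URL parser canonicalises IPv4 spellings such as 2130706433 or 0x7f.1
// to 127.0.0.1, so the range check sees the dotted-quad form.
function assertOutboundUrl(raw: string, allowlist: ReadonlySet<string> = OUTBOUND_ALLOWLIST): URL {
  const url = new URL(raw); // throws a TypeError on malformed input
  if (url.protocol !== 'https:') throw new Error(`scheme not allowed: ${url.protocol}`);
  if (isPrivateIPv4(url.hostname)) throw new Error(`private address not allowed: ${url.hostname}`);
  if (!allowlist.has(url.hostname)) throw new Error(`host not allowlisted: ${url.hostname}`);
  return url;
}
```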
+## Content-Security-Policy
+A strict CSP is the second line that contains an XSS that slips past escaping. Set it on responses via the proxy (`proxy.ts`) or `next.config.ts` headers.
+```
+default-src 'self'; script-src 'self'; object-src 'none'; base-uri 'self'; frame-ancestors 'none'
+```
+- `frame-ancestors 'none'` (or an allowlist) blocks clickjacking; pair it with `X-Content-Type-Options: nosniff`.
+- Avoid `'unsafe-inline'` in `script-src`; use a nonce for any inline script Next.js requires. A CSP containing `*` in `script-src` is not a CSP.
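Here is a minimal sketch of the nonce variant of the policy above. It assumes a fresh nonce is generated per request in `proxy.ts`. `makeNonce` and `buildCsp` are hypothetical helpers. The directives mirror the policy above, with the nonce and `'strict-dynamic'` added to `script-src` so nonce-approved scripts can load their own chunks.

```typescript
// 16 random bytes, base64-encoded: unguessable per request (Web Crypto; Node 19+ or an edge runtime).
function makeNonce(): string {
  const bytes = new Uint8Array(16);
  crypto.getRandomValues(bytes);
  return btoa(Array.from(bytes, (b) => String.fromCharCode(b)).join(''));
}

// Builds the Content-Security-Policy header value for one request's nonce.
function buildCsp(nonce: string): string {
  return [
    "default-src 'self'",
    `script-src 'self' 'nonce-${nonce}' 'strict-dynamic'`,
    "object-src 'none'",
    "base-uri 'self'",
    "frame-ancestors 'none'",
  ].join('; ');
}
```

As the Next.js CSP guide describes, the proxy would set the value as `Content-Security-Policy` on both the forwarded request headers, where Next.js reads the nonce for its own scripts, and the response. `X-Content-Type-Options: nosniff` would be set alongside it.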
+## Security Review Checklist
+For any PR touching Server Actions, route handlers, auth, `proxy.ts`, or `dangerouslySetInnerHTML`:
+- [ ] No `dangerouslySetInnerHTML` on unsanitised or user-origin HTML
+- [ ] No secret carries the `NEXT_PUBLIC_` prefix, every `NEXT_PUBLIC_` variable is deliberately public, and no secret is read in a `'use client'` module
+- [ ] Every Server Action authenticates, Zod-validates, and authorizes before acting
+- [ ] Session token in an `httpOnly` `Secure` cookie — never `localStorage`
+- [ ] Mutating route handlers check `Origin`; `GET` never mutates
+- [ ] Server fetches target a constant/allowlisted host — no input-supplied URL unchecked
+- [ ] CSP present and not weakened to `*`/`unsafe-inline` script
+- [ ] Authorization decided on the server; no client flag used as a gate

package/src/engineer-skills/groundwork-nextjs-engineer/references/testing.md CHANGED

@@ -15,6 +15,10 @@
 ## Testing Philosophy
+The frontend shape is the **testing trophy** (Kent C. Dodds): a thin static-analysis base, a few unit tests, a fat middle of integration tests that render real component trees against a mocked network, and a thin layer of end-to-end checks. It is the frontend idiom of the framework testing canon (`docs/principles/foundations/testing.md`): the backends run the honeycomb, the frontend runs the trophy, and both put the weight on integration rather than isolated units. When this file and the canon disagree, the canon wins and this file is the one to fix.
+Above the trophy sits the front-door proof: drive the real running app the way a user does, end to end against the real backend — Playwright against the running stack — because component and integration tests that each pass against an MSW-mocked network can still assemble into an app that does nothing on the real API. That is also the fake-needs-a-real-test rule: every MSW handler or fixture standing in for a real endpoint is a debt, and some real integration or e2e test against the actual network path must pay it. Seeded inputs are fine; what cannot stand is a mock with no real test behind it — that is a green light wired to nothing (`docs/principles/foundations/testing.md`).
 Tests in the Next.js application follow four rules:
 1. **Vitest + React Testing Library** for all component and hook tests
@@ -413,6 +417,59 @@ it('shows error message on server failure', async () => {
 ---
+## Trace Assertions
+The app ships OpenTelemetry through `instrumentation.ts`, so server-side work — route handlers and Server Actions — emits spans. Where a slice adds a server path whose trace a dashboard or SLO depends on, assert on it with an **in-memory span exporter** rather than trusting the instrumentation silently. This is server-side only; component and hook tests assert on rendered behaviour, not traces.
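+```ts
+import { InMemorySpanExporter, SimpleSpanProcessor } from '@opentelemetry/sdk-trace-base';
+import { NodeTracerProvider } from '@opentelemetry/sdk-trace-node';
+import { afterEach, expect, it } from 'vitest';
+const exporter = new InMemorySpanExporter();
+// NodeTracerProvider.register() sets the global provider, an async context manager and the
+// W3C propagator, so spans created across awaits keep their parent
+const provider = new NodeTracerProvider({ spanProcessors: [new SimpleSpanProcessor(exporter)] });
+provider.register();
+afterEach(() => exporter.reset()); // isolate spans per test
+it('emits a connected trace for POST /v1/meetings', async () => {
+  // invoke the route handler / server action, then:
+  const spans = exporter.getFinishedSpans();
+  expect(spans.map((s) => s.name)).toContain('POST /v1/meetings'); // the entry span exists
+  expect(new Set(spans.map((s) => s.spanContext().traceId)).size).toBe(1); // one trace id: connected
+});
+```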
+Assert the spans that must exist and the attributes a query depends on; let the rest float — pinning the whole span tree couples the test to implementation.
+## Mutation Testing — the assertion-quality read-out
+Coverage tells you a line ran; it does not tell you an assertion checked it — a 100% covered `lib/utils.ts` can still assert nothing. **StrykerJS** is the read-out that proves the assertions bite: it mutates the code and confirms a test fails. Treat it as a **signal, never a gate**, and run it incrementally on changed code (`stryker run --incremental`, which diffs against the cached `reports/stryker-incremental.json` — there is no `--since` flag). Point it at the dense pure logic first (`lib/utils.ts`, schema validators, formatters); a surviving mutant there is the missing assertion to add. This is also the antidote to AI-generated component tests, whose oracle is lifted from the current markup and so cement bugs as expected.
+## Generate the Inputs You Can't Enumerate
+Example-based tests check the cases you thought of; the bugs live in the cases you didn't (canon principle 7). Two generative surfaces apply to a Next.js app:
+- **Property-based tests with `fast-check`** for the dense pure logic — formatters, `lib/utils.ts`, Zod-adjacent transforms, anything with an invariant (a round-trip, a sort that must stay stable, a parse that must never throw). State the property; fast-check generates and shrinks counterexamples. This is the highest-leverage complement to example-based unit tests: one property covers an infinity of inputs.
+```ts
+import fc from 'fast-check';
+it('formatDuration never throws and always ends in m', () => {
+  fc.assert(
+    fc.property(fc.nat({ max: 100_000 }), (minutes) => {
+      const out = formatDuration(minutes);
+      expect(out).toMatch(/m$/);
+    }),
+  );
+});
+```
+- **Schemathesis at the API boundary.** Route handlers backed by an OpenAPI schema are the bridge between contract testing and property fuzzing: point Schemathesis at the spec and it derives a semantics-aware fuzzer that finds materially more defects than example-based API tests for the cost of pointing it at the schema. Run it against the app's route handlers (`/api/*`) in a dedicated lane, not on every component PR.
+Reach for these where invariants are real. Presentational components have no invariant to state — test them with example-based RTL renders.
+## Naming Tests by Behaviour
+A test name must let an on-call engineer form a hypothesis from the failure log alone, without opening the file. State the behaviour and the condition — `should [expected outcome] when [condition]` — not the implementation. `renders correctly` and `works` say nothing the dashboard doesn't already show; `shows the retry button when the meetings request fails` does. The format serves the goal; a name that states behaviour and condition in another shape is fine.
 ## Test Commands
 | Command | Purpose |
@@ -425,9 +482,10 @@ it('shows error message on server failure', async () => {
 ## Bet Slice Rollout — the permanent tests a slice owes
-When a bet slice's progress tests go green, the slice rolls out permanent coverage before it closes (bet workflow, Delivery step 5). The bet-progress tests prove the capability once and are archived; these stay.
+When a bet slice's progress tests go green, the slice-worker rolls out permanent coverage as part of the same slice, before the driver reviews it (bet workflow, Delivery). The bet-progress tests prove the capability once and are archived; these stay.
 - **Interface test (always).** One Playwright test per user-observable behaviour the slice delivered, using the page objects under `tests/system/pages/` — selectors live in the page object, assertions in the test.
 - **Component tests (when state earned them).** Components the slice introduced with conditional rendering, optimistic updates, or error states get component-level tests; purely presentational markup does not.
 - **Accessibility coverage (when the slice added a surface).** A new screen or interactive flow extends the a11y smoke — axe scan clean and keyboard path exercised — because regressions here are invisible to every other test type.
 - **Server action / route tests (when the slice added them).** Server actions and route handlers the slice introduced get request-level tests with Zod schema failures exercised, not just the happy path.
+- **Critical-path trace assertions (when the slice added an instrumented server path).** A route handler or Server Action whose trace a dashboard or SLO depends on pins it with an in-memory-exporter test: the entry span exists and the trace stays connected. A missing span is a test failure, not an instrumentation TODO.

package/src/engineer-skills/groundwork-nextjs-engineer/references/ux-principles.md CHANGED

@@ -150,55 +150,7 @@ Actions should reveal contextually — on hover, on selection, or on focus. Don'
 ## Accessibility
-Accessibility is not optional — it is a baseline requirement.
-### Contrast
-- Body text must meet **WCAG AA** (4.5:1 contrast ratio)
-- Large text (18px+ or 14px+ bold) must meet **3:1**
-- Interactive elements must meet **3:1** against their background
-- Test in both dark and light themes
-### Focus States
-- All interactive elements must have a visible focus indicator
-- Focus indicators must contrast with the background (not just browser default outline)
-- Tab order must follow visual layout order
-```css
-:focus-visible {
-  outline: 2px solid var(--color-accent);
-  outline-offset: 2px;
-  border-radius: var(--radius-sm);
-}
-```
-### Semantic HTML & ARIA
-- Use semantic elements: `<nav>`, `<main>`, `<article>`, `<section>`, `<aside>`, `<header>`, `<footer>`
-- Icon-only buttons must have `aria-label`:
-  ```tsx
-  <button onClick={handleClose} aria-label="Close dialog">
-    <X size={20} />
-  </button>
-  ```
-- Use `role="alert"` for error messages that must be announced by screen readers
-- Never use `div` or `span` for clickable elements — use `button` or `a`
-### Colour Independence
-- Never communicate information through colour alone
-- Pair colour indicators with icons or text labels:
-  ```tsx
-  // Bad — colour-only status
-  <span className="text-success">●</span>
-  // Good — colour + icon + text
-  <span className="text-success flex items-center gap-1">
-    <CheckCircle size={16} aria-hidden />
-    Completed
-  </span>
-  ```
+Accessibility is a merge gate, not optional polish — semantic HTML, keyboard reachability, WCAG AA contrast, and labelled forms are a baseline requirement on every surface. See `references/accessibility.md` for the full reference.
 ---

package/src/engineer-skills/groundwork-nextjs-engineer/sync-anchor.md CHANGED

@@ -1,9 +1,16 @@
 # Sync Anchor
-This file pins the principle files this skill embeds. When any listed file
-changes, this skill must be reviewed in the same commit. CI verifies the
-hashes match.
+This file pins the principle files this skill embeds — both the per-stack
+TypeScript/frontend idiom doc and the cross-cutting central canon this skill
+distils. When any listed file changes, this skill must be reviewed in the same
+commit. CI verifies the hashes match.
 | Principle file | SHA-256 | Last reviewed |
 |---|---|---|
 | src/generators/nextjs-app/docs/principles/stack/typescript/frontend.md | 98232d067ad03c08d6c1ca5f2caec30e7c3400da55c3afb7754482bc121d7554 | 2026-05-26 |
+| src/docs/principles/foundations/testing.md | 205ac40d4c643e7b61cf1e4295df8a7b8b46dcd7c81b857aa8c642ea353f62ef | 2026-06-27 |
+| src/docs/principles/quality/observability.md | 8aa60e213ba03e989c93263153e3a1ac10b2336f6d0360c394f473660d565a0b | 2026-06-26 |
+| src/docs/principles/quality/security.md | 61157d97677142737ec537954dc5aaad7a04012cc8a3dcc855e2d324287fdc64 | 2026-06-26 |
+| src/docs/principles/quality/performance.md | 18b6d3391c57d97342068f9f1da732b24de4221489d0459bb6ad8900fac0a02e | 2026-06-26 |
+| src/docs/principles/quality/accessibility.md | f921e7bf6256bc105b127b841d0a30af8a70ad1ddd7632d492589f052e6501b2 | 2026-06-26 |
+| src/docs/principles/foundations/documentation.md | 8b576072eaf4970f1251b560781e3e755c864a7920faa599b2834c921cbb8734 | 2026-06-26 |

package/src/engineer-skills/groundwork-python-engineer/SKILL.md CHANGED

@@ -21,10 +21,11 @@ Python backend execution router for service repositories. Durable engineering gu
 ## Operating Contract
 1. Load reference docs from `references/` for architectural and implementation guidance. Treat the current repository's code, specs, and generated contracts as the source of truth for naming, structure, and behavior.
-2. Inspect the current repository before naming packages, commands, import paths, schemas, or generated files.
+2. Orient with the repo map and Serena before reading widely (see Required First Checks) — find the hubs, then navigate by symbol. Inspect the current repository before naming packages, commands, import paths, schemas, or generated files.
 3. Load the smallest reference set that explains the task. Add more context only when the task crosses a boundary.
 4. Preserve the service's dependency direction and public contracts. Code implements OpenAPI, database migrations, event schemas, and documented architecture — it does not invent them.
-5. Coordinate with adjacent skills when another skill owns the primary decision surface.
+5. Treat observability as part of the contract, not an afterthought: a critical path emits an unbroken trace, and a missing span is a defect. Route durable engineering policy to the canonical docs (`docs/principles/stack/python/`, and the cross-cutting canon under `docs/principles/quality/` and `docs/principles/foundations/`) rather than restating it in code comments or this skill.
+6. Coordinate with adjacent skills when another skill owns the primary decision surface.
 ---
@@ -40,6 +41,7 @@ Before non-trivial Python implementation or review work:
 | Check | Why |
 |---|---|
+| **Orient with the repo map + Serena** — refresh `npx groundwork-method repo-map`, read its `centrality` ranking to find the hubs, then navigate them with Serena (`get_symbols_overview` / `find_symbol` / `find_referencing_symbols`) | A blind file crawl misses the structure the map already computed; symbol navigation and reference-aware edits beat grep-and-read. Fall back to ordinary reads only when these are unavailable |
 | Service package layout and nearby examples for the touched layer | Prevents inventing structure that already has a convention |
 | `pyproject.toml` for Python version and dependencies | Avoids version-specific advice that contradicts the project |
 | OpenAPI spec (if HTTP behavior changes) | HTTP contracts are generated — code must match the spec |
@@ -67,6 +69,7 @@ Load only the rows relevant to the current task. Reference files are in the skil
 | Resilience — timeouts, retries, circuit breakers, health probes | `resilience.md` |
 | Graceful shutdown, degradation, lifespan management | `resilience.md`, `async-patterns.md` |
 | Observability — tracing, structured logging, metrics | `observability.md` |
+| Security, auth, secrets, input validation, supply chain, SSRF | `security.md` |
 | Tests, quality gates, coverage strategy, fixture design | `testing.md` |
 | Code documentation, docstrings, Pydantic Field docs | `documentation-mcp.md` |
 | Error handling, exception hierarchy, domain errors | `implementation-patterns.md` |

package/src/engineer-skills/groundwork-python-engineer/references/security.md ADDED

@@ -0,0 +1,148 @@
+# Security
+This service is a trust boundary. Everything outside it — clients, webhooks, queue events, upstream APIs, model output — is hostile until validated. This file is the Python idiom of the framework security canon (`docs/principles/quality/security.md`); when this file and the canon disagree, the canon wins and this file is the one to fix.
+The controls below are enforced at the FastAPI entrypoint and the adapter edge, not scattered through the Domain. The boundary is validated once and explicitly; inside it, the core trusts its own types.
+## 1. Input is hostile; validate at the boundary
+Every inbound payload is a Pydantic model parsed at the route, not a `dict` read field-by-field. Pydantic v2 validation *is* the boundary check — a request that fails parsing never reaches a service.
+```python
+from pydantic import BaseModel, Field, EmailStr
+class CreateOrderRequest(BaseModel):
+    model_config = {"extra": "forbid"}  # reject unknown fields, never silently drop
+    customer_email: EmailStr
+    quantity: int = Field(gt=0, le=1000)
+    note: str = Field(default="", max_length=2000)
+@router.post("/orders")
+async def create_order(body: CreateOrderRequest) -> OrderResponse:
+    # body is validated; the service receives a typed domain request, not raw input
+    ...
+```
+- `extra="forbid"` turns mass-assignment and typo'd fields into a `422`, not a silent accept.
+- Constrain at the type (`gt`, `le`, `max_length`, `EmailStr`, `Literal`), so the constraint travels with the field and cannot be forgotten by a caller.
+- Do not re-validate between internal callers — the core trusts its own dataclasses (`references/implementation-patterns.md` → Strict Typing). One boundary, scrutinised; no defensive re-checks inside.
+## 2. Parameterised queries — never string-built SQL
+SQL injection is closed by construction: the query text is constant and every value is a bound parameter. SQLAlchemy and the driver do the binding; an f-string carrying user input into SQL is a defect.
+```python
+from sqlalchemy import select, text
+# ORM — values are bound, never interpolated
+stmt = select(OrderRow).where(OrderRow.customer_id == customer_id)
+# Raw SQL when unavoidable — named bind parameters, never an f-string
+await session.execute(
+    text("SELECT * FROM orders WHERE customer_id = :cid"),
+    {"cid": customer_id},
+)
+```
+The session lifecycle and the repository port live in `references/database.md`; security adds one rule on top — no user value reaches a query except as a bound parameter, and table/column names are never taken from input.
+## 3. Authorization at the dependency boundary
+Authentication establishes *who*; authorization decides *what they may do*. Both are FastAPI dependencies on the route, enforced through one path, not re-implemented per handler.
+```python
+from fastapi import Depends, HTTPException, status
+async def require_order_access(
+    order_id: str,
+    principal: Principal = Depends(get_principal),  # from the verified token
+) -> str:
+    if not await policy.can_access_order(principal, order_id):
+        raise HTTPException(status.HTTP_403_FORBIDDEN)
+    return order_id
+@router.get("/orders/{order_id}")
+async def get_order(order_id: str = Depends(require_order_access)) -> OrderResponse:
+    ...
+```
+- The token is verified by a proven provider (OIDC); the service does not hand-roll JWT or session logic. Auth is boring technology — see `docs/principles/system-design/identity-and-access.md`.
+- In a multi-tenant service the tenant is bound to the authenticated principal and enforced at the data boundary, never trusted from a path or query parameter.
+- Least privilege: the database role and any cloud identity the service runs as start minimal and widen only on evidence.
+## 4. Secrets are managed, never in code
+No secret lives in source, in a committed `.env`, or baked into an image layer. Configuration is validated once at boot with `pydantic-settings` (`references/implementation-patterns.md` → Configuration Validation); secret *values* are injected from the platform's secret manager at runtime.
+```python
+from pydantic import Field
+from pydantic_settings import BaseSettings
+class Secrets(BaseSettings):
+    # Sourced from the secret manager / injected env at runtime — never a default here
+    database_url: str = Field(..., min_length=1)
+    upstream_api_key: str = Field(..., min_length=16)
+# .env.example carries names with empty values; real values never enter the repo
+```
+The hierarchy is eliminate, then shorten, then rotate: prefer workload identity or OIDC federation (no static credential at all), then short-lived minted secrets, and reserve scheduled rotation for static credentials that genuinely cannot be made ephemeral.
+## 5. Supply chain is part of the attack surface
+Every third-party package is a potential exploit vector. `uv` pins the full dependency graph in `uv.lock`; CI installs from the lockfile (`uv sync --frozen`), never an unpinned resolve.
+- A new dependency is a reviewed decision, not an intuition — check maintenance, ownership, and transitive weight before adding it.
+- Generate an SBOM and run a vulnerability scan (`pip-audit` or equivalent) on every build; a known-vulnerable transitive dependency fails the build.
+- Emit build provenance for anything published, so the artifact's origin is verifiable, not just its contents.
+## 6. SSRF on outbound calls
+A service that fetches a URL derived from input is an SSRF vector — an attacker aims it at internal metadata endpoints or private hosts. Outbound targets are allowlisted, not reflected from the request.
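+```python
+from urllib.parse import urlparse
+ALLOWED_HOSTS = {"api.partner.example", "cdn.partner.example"}
+def assert_allowed(url: str) -> str:
+    parsed = urlparse(url)
+    # HTTPS only: file:, gopher:, plain http: and every other scheme are refused
+    if parsed.scheme != "https":
+        raise PermanentInferenceError(f"outbound scheme not allowed: {parsed.scheme}")
+    # .hostname is lower-cased and excludes any port or userinfo
+    if parsed.hostname not in ALLOWED_HOSTS:
+        raise PermanentInferenceError(f"outbound host not allowed: {parsed.hostname}")
+    return url
+```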
+- Validate the resolved host against an allowlist before the call; reject `file:`, `gopher:`, and non-HTTPS schemes.
+- Set explicit connect/read timeouts on every outbound client so a hostile or slow upstream cannot exhaust the service (`references/resilience.md`).
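
Blocking internal addresses means checking what a name actually resolves to, not only how it is spelled. A minimal stdlib sketch, assuming a hypothetical helper `host_resolves_to_public_only`: it resolves the host and rejects any loopback, private, link-local (which covers the `169.254.169.254` metadata endpoint) or otherwise non-global address.

```python
import ipaddress
import socket


def host_resolves_to_public_only(host: str, port: int = 443) -> bool:
    """True only if every address `host` resolves to is globally routable.

    Hypothetical helper. Loopback, RFC 1918 private, link-local and the
    other non-global ranges all fail `ipaddress`'s `is_global` test.
    """
    try:
        infos = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        return False  # unresolvable: refuse rather than guess
    for *_unused, sockaddr in infos:
        # sockaddr[0] is the address string for both IPv4 and IPv6 results
        if not ipaddress.ip_address(sockaddr[0]).is_global:
            return False
    return bool(infos)
```

The check and the eventual connection are two separate lookups, so a DNS record that changes in between (rebinding) can still slip through; connecting to the exact address that passed the check closes that gap and is left out here for brevity.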
+## 7. Error envelopes that do not leak internals
+A client receives a stable, structured error; stack traces, SQL fragments, and upstream provider messages stay in the logs. The Domain raises typed exceptions (`references/implementation-patterns.md` → Error Handling); one exception handler maps them to a safe envelope.
+```python
+from fastapi import Request
+from fastapi.responses import JSONResponse
+# app, AppError, logger and current_trace_id come from the service's own modules
+@app.exception_handler(AppError)
+async def handle_app_error(request: Request, exc: AppError) -> JSONResponse:
+    trace_id = current_trace_id()
+    # the full detail, traceback included, goes to the log only
+    logger.error("request failed", exc_info=exc, extra={"trace_id": trace_id})
+    # the client sees a stable code and a correlation id, never exc internals
+    return JSONResponse(
+        status_code=422,
+        content={"error": exc.code, "trace_id": trace_id},
+    )
+```
+The `422` unification and CORS rules live in `references/api-standards.md`; security's addition is that the body never carries an internal detail, and the correlation id lets support find the failure in the logs without exposing that detail to the client.
+## Anti-Patterns
+- **Reading the raw request body as a `dict`.** Bypasses validation; parse a Pydantic model with `extra="forbid"`.
+- **f-string SQL.** `f"WHERE id = {user_id}"` is injection. Bind every value.
+- **Per-handler permission checks that drift.** Authorize through one dependency; model the policy once.
+- **Secrets in `.env`, an image layer, or a default value.** Inject from the secret manager at runtime.
+- **Unpinned installs in CI.** `uv sync --frozen` against `uv.lock`; scan and SBOM every build.
+- **Fetching an input-supplied URL unchecked.** Allowlist the host; block internal addresses and non-HTTPS schemes.
+- **Returning the exception string to the client.** Log the detail, return a code and a trace id.
+- **"It is an internal service, skip auth."** Internal services are an attacker's favourite foothold — zero trust between services.
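
The f-string SQL anti-pattern is easiest to see side by side. A self-contained sketch using the stdlib `sqlite3` driver as a stand-in (Postgres drivers spell the placeholder differently, `$1` in asyncpg and `%s` in psycopg, but the rule is the same); the `users` table and both helper functions are illustrative only.

```python
import sqlite3


def setup() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id TEXT PRIMARY KEY, email TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("u1", "alice@example.com"), ("u2", "bob@example.com")],
    )
    return conn


def find_user_unsafe(conn: sqlite3.Connection, user_id: str) -> list[tuple]:
    # Anti-pattern: the input becomes part of the SQL text itself.
    return conn.execute(f"SELECT id, email FROM users WHERE id = '{user_id}'").fetchall()


def find_user_bound(conn: sqlite3.Connection, user_id: str) -> list[tuple]:
    # The value travels separately from the statement and is never parsed as SQL.
    return conn.execute("SELECT id, email FROM users WHERE id = ?", (user_id,)).fetchall()


hostile = "x' OR '1'='1"
conn = setup()
print(len(find_user_unsafe(conn, hostile)))  # → 2: every row leaks
print(len(find_user_bound(conn, hostile)))   # → 0: no id equals that literal string
```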

package/src/engineer-skills/groundwork-python-engineer/references/testing.md CHANGED

@@ -6,6 +6,10 @@ Testcontainers spins up real Postgres/Pub/Sub in seconds. The confidence it prov
 **Service tests are the default.** Unit tests are reserved for genuinely complex logic. System tests are minimal.
+This is the stack idiom of the framework testing canon (`docs/principles/foundations/testing.md`); when this file and the canon disagree, the canon wins and this file is the one to fix.
+Above the honeycomb sits the front-door proof: drive the real running service through the front door a consumer actually calls (its HTTP API), end to end on the real pipeline. Service tests that each pass behind a `dependency_overrides` mock or a stubbed adapter can still assemble into a product that does nothing on the real path, so one proof exercises that path with nothing in the middle faked. That sets a rule for every fake: a stub or fixture standing in for a real stage (an LLM response, a downstream service, a producer's output) is a debt that some test of the real producer must pay. Seeding the input is fine; faking the work in the middle is the violation, and an unpaid debt is a green light wired to nothing.
 ---
 ## Tier 1 — Service Tests (Default)
@@ -159,6 +163,40 @@ uv run pytest tests/integration -m live         # Live API tests — requires re
 uv run pytest tests/system                      # Bootstrap + golden path
 ```
+## Trace Assertions
+Observability is a test surface: a critical-path request must emit an unbroken trace, and a missing span is a test failure, not an instrumentation TODO. The mechanism is an **in-memory span exporter** from the OTel SDK — no external tooling, and the durable approach now that the dedicated trace-test tools have gone dormant.
+```python
+import pytest
+from opentelemetry import trace
+from opentelemetry.sdk.trace import TracerProvider
+from opentelemetry.sdk.trace.export import SimpleSpanProcessor
+from opentelemetry.sdk.trace.export.in_memory_span_exporter import InMemorySpanExporter
+@pytest.fixture(scope="session")
+def _span_exporter():
+    exporter = InMemorySpanExporter()
+    provider = TracerProvider()
+    provider.add_span_processor(SimpleSpanProcessor(exporter))
+    # the global provider can be set only once per process; later calls are ignored
+    trace.set_tracer_provider(provider)
+    return exporter
+@pytest.fixture
+def spans(_span_exporter):
+    _span_exporter.clear()  # each test sees only its own spans
+    return _span_exporter
+async def test_process_emits_trace(client, spans):
+    await client.post("/process", json={"input": "test"})
+    by_name = {s.name: s for s in spans.get_finished_spans()}
+    assert "POST /process" in by_name   # the entry span exists
+    assert "db.insert" in by_name        # the work span exists
+    # the work span shares the entry span's trace id: the trace is connected
+    assert by_name["db.insert"].context.trace_id == by_name["POST /process"].context.trace_id
+```
+Assert what the contract promises — the spans that must exist on the journey, that the trace stays connected across hops, and the attributes a dashboard or SLO query depends on. Pinning the exact span tree and every attribute couples the test to implementation and trains the team to delete it.
+## Mutation Testing — the assertion-quality read-out
+A fat service test drives many branches through one HTTP call and can *execute* them all while only asserting on the response body. Mutation testing is the one instrument that proves the suite checks what it runs: inject a fault, confirm a test fails, and a surviving mutant is a line you cover but do not check. Treat it as a **signal, never a gate** — run it on the high-risk modules the risk matrix flags and on changed code only.
+Use `mutmut` or `cosmic-ray`. A gotcha: `mutmut` 3 no longer mutates module-level code (only function bodies), so a module whose risk lives in top-level constants or table definitions needs `cosmic-ray` or `mutmut` 2. Scope a run to the dense package under change (`mutmut run --paths-to-mutate src/<pkg>/pricing`), read it as a read-out, and turn each surviving mutant on changed high-risk code into the missing assertion — the same read-out catches AI-generated tests whose oracle was lifted from the implementation.
 ## Anti-Patterns
 - **Mocking the interior.** Test what comes out, not which methods were called.
@@ -169,9 +207,10 @@ uv run pytest tests/system                      # Bootstrap + golden path
 ## Bet Slice Rollout — the permanent tests a slice owes
-When a bet slice's progress tests go green, the slice rolls out permanent coverage before it closes (bet workflow, Delivery step 5). The bet-progress tests prove the capability once and are archived; these stay.
+When a bet slice's progress tests go green, the slice-worker rolls out permanent coverage as part of the same slice, before the driver reviews it (bet workflow, Delivery). The bet-progress tests prove the capability once and are archived; these stay.
 - **Service perimeter test (always).** One test per capability the slice delivered, through `httpx.AsyncClient` against the real app with a real database — the coverage that survives refactors.
 - **Unit tests (when logic earned them).** Pure-function tests for branching business logic the slice introduced — validation rules, transformations, state machines. Plumbing does not earn unit tests; the perimeter test already covers it.
 - **Property-based tests (when invariants exist).** A slice that introduced an invariant — round-trip serialization, idempotent consumers, order-independent merges — pins it with Hypothesis, because example-based tests sample invariants instead of stating them.
+- **Critical-path trace assertions (when the slice added an observable path).** A slice that introduced an endpoint or worker whose trace a dashboard or SLO depends on pins it with an in-memory-exporter test: the entry and work spans exist and the trace stays connected. A missing span is a test failure, not an instrumentation TODO.
 - **Contract conformance (when the slice changed an API).** FastAPI's served `/openapi.json` must match the promoted spec in `docs/architecture/api/<service>/openapi.yaml`; the generated system suite checks this — the slice's job is to keep the spec promotion current.

package/src/engineer-skills/groundwork-python-engineer/sync-anchor.md CHANGED

@@ -1,13 +1,20 @@
 # Sync Anchor
-This file pins the principle files this skill embeds. When any listed file
-changes, this skill must be reviewed in the same commit. CI verifies the
-hashes match.
+This file pins the principle files this skill embeds — both the per-stack Python
+idiom docs and the cross-cutting central canon this skill distils. When any
+listed file changes, this skill must be reviewed in the same commit (and the
+matching per-stack idiom doc reconciled to the canon). CI verifies the hashes
+match.
 | Principle file | SHA-256 | Last reviewed |
 |---|---|---|
 | src/generators/python-microservice/docs/principles/stack/python/async.md | 6fdd399fb3052381020ff6e792a724d72bdabe674817794093853cbf24fa9f97 | 2026-05-26 |
 | src/generators/python-microservice/docs/principles/stack/python/resilience.md | d5a7b8f089acdb71d64c1bd4fc9ce80e6947504b01b0ace695ac5ee66554a1b1 | 2026-06-19 |
-| src/generators/python-microservice/docs/principles/stack/python/testing.md | b596a4281825349c627dca17e671052ef64a371ff66c50b66d11ceab5ee7b5f2 | 2026-06-19 |
+| src/generators/python-microservice/docs/principles/stack/python/testing.md | f15e62c83b659788f8b3e39f560779e892080d6bce73768be093d919f3e6946c | 2026-06-26 |
 | src/generators/python-microservice/docs/principles/stack/python/documentation.md | ac58228ba22435bf9bad2ea5bf924bdf9e3674e9967515d5c82aaf3b7825214d | 2026-05-26 |
 | src/generators/python-microservice/docs/principles/stack/python/mcp.md | 1e6deab0b45c7271e0038e9b3d51bc30cb2917488f608e847d566739ac6caeba | 2026-06-19 |
+| src/docs/principles/foundations/testing.md | 205ac40d4c643e7b61cf1e4295df8a7b8b46dcd7c81b857aa8c642ea353f62ef | 2026-06-27 |
+| src/docs/principles/quality/observability.md | 8aa60e213ba03e989c93263153e3a1ac10b2336f6d0360c394f473660d565a0b | 2026-06-26 |
+| src/docs/principles/quality/security.md | 61157d97677142737ec537954dc5aaad7a04012cc8a3dcc855e2d324287fdc64 | 2026-06-26 |
+| src/docs/principles/quality/reliability.md | 9c9788504e0963458667d2727c3fc2359776108be593a2efc6603f6470002252 | 2026-06-26 |
+| src/docs/principles/foundations/documentation.md | 8b576072eaf4970f1251b560781e3e755c864a7920faa599b2834c921cbb8734 | 2026-06-26 |
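
"CI verifies the hashes match" amounts to re-hashing each pinned file and comparing the digest with the table. A minimal sketch, assuming hypothetical `parse_anchor` and `check_sync_anchor` helpers, paths resolved from the repository root, and the three-column table layout above; the actual GroundWork CI check may be implemented differently.

```python
import hashlib
from pathlib import Path

HEX = set("0123456789abcdef")


def parse_anchor(text: str) -> dict[str, str]:
    """Map principle-file path to its pinned SHA-256, read from the table rows."""
    pins = {}
    for line in text.splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        # data rows have three cells with a 64-hex-digit digest in the middle;
        # the header and |---| separator rows fail this test and are skipped
        if len(cells) == 3 and len(cells[1]) == 64 and set(cells[1]) <= HEX:
            pins[cells[0]] = cells[1]
    return pins


def check_sync_anchor(anchor: Path, root: Path) -> list[str]:
    """Return the pinned files whose current hash no longer matches (or are missing)."""
    stale = []
    for rel_path, pinned in parse_anchor(anchor.read_text(encoding="utf-8")).items():
        target = root / rel_path
        actual = hashlib.sha256(target.read_bytes()).hexdigest() if target.exists() else None
        if actual != pinned:
            stale.append(rel_path)
    return stale
```

Run as a CI step, a non-empty result fails the build and names the principle files whose embedding skill needs review in the same commit.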

package/src/generators/electron-app/docs/principles/stack/electron/index.md CHANGED

@@ -14,6 +14,8 @@ Electron is GroundWork's standard desktop surface. This set owns what the deskto
 [Surface Stack Selection](../../surface-stack-selection.md) picks Electron on the agent-closable-loop axis: Electron renders on bundled Chromium, so Playwright's `_electron` driver launches the real packaged app, drives its windows as ordinary `Page`s, and evaluates code in the main process — one deterministic engine on every OS, headless under Xvfb in CI. No other desktop option closes generate → boot → test → observe without a human in the loop. The renderer reusing the web stack and brand-token projection wholesale is the second axis win. Tauri is the recorded alternative when binary size and RAM dominate — its per-OS system webviews and WebDriver-only testing surrender the loop, so it is never the default.
+That `_electron` smoke (`tests/smoke/app.spec.ts`) is the Electron side of the **native UI check contract** (`src/generators/system-test-runner/NATIVE-CHECK-CONTRACT.md`): the `system-test-runner` drives it as the surface's visual gate, so it carries the contract's dimensions on the real binary — it renders without a blank or crash frame, drives the **named async state** (a deterministically unreachable core renders its designed state, not a crash), and confirms the **design-system tokens landed** (the brand custom properties resolve and the heading paints the projected primary token, not an unstyled default). Navigation / no-dead-ends is exempt while the app is single-screen; a bet that adds screens drives between them and back here. Keep it thin — the milestone's front-door bet-progress proof is what drives the real pipeline end to end.
 ## What this set owns
 | File | Owns |