npm - groundwork-method - Versions diffs - 0.10.0 → 0.11.0 - Mend

groundwork-method 0.10.0 → 0.11.0


Files changed (70)

package/src/engineer-skills/groundwork-flutter-engineer/references/security.md ADDED

@@ -0,0 +1,96 @@
+# Security
+## Table of Contents
+- [The Posture](#the-posture)
+- [No Secrets in the Binary](#no-secrets-in-the-binary)
+- [Secure Storage for Tokens](#secure-storage-for-tokens)
+- [Transport and Certificate Pinning](#transport-and-certificate-pinning)
+- [Deep Link and Intent Validation](#deep-link-and-intent-validation)
+- [Biometric and Auth-Token Handling](#biometric-and-auth-token-handling)
+- [Obfuscation's Real Limits](#obfuscations-real-limits)
+- [Security Review Checklist](#security-review-checklist)
+---
+## The Posture
+A shipped mobile binary runs on a device the attacker owns. It can be pulled off the device, decompiled, patched, and run under a debugger or a proxy. So the client is **not** a trust boundary — it is hostile territory you are deploying into. Every security guarantee that matters is enforced server-side, behind the gateway (`references/data-and-contracts.md` → The Seam); the client's job is to handle credentials carefully and fail safe, not to keep secrets or enforce rules. This file is the Flutter idiom of the framework security canon (`docs/principles/quality/security.md`); when this file and the canon disagree, the canon wins and this file is the one to fix.
+The recurring mistake is treating the app as trusted because it is "your" code. It stops being yours the moment it ships.
+## No Secrets in the Binary
+There is no secure place in the app bundle for a secret. API keys, signing keys, and shared credentials compiled into Dart — or passed via `--dart-define`, or baked into a `.env` asset — are all recoverable from the binary. The config seam already states this: no `.env` files baked in, no secrets in `--dart-define` (`references/data-and-contracts.md` → Configuration).
+- A capability that needs a secret (a third-party API key, a server-to-server credential) lives **server-side**, behind a core endpoint the app calls with its user session. The surface never embeds a provider key — one owner per capability, keys stay server-side.
+- `--dart-define` is for non-secret configuration only: the gateway base URL, build flavour, feature toggles.
+- If the app appears to need a secret to call a third party directly, that call belongs on the server; route it through the core.
+## Secure Storage for Tokens
+Session and refresh tokens go in the platform keystore via `flutter_secure_storage`, which is backed by the iOS **Keychain** and the Android **Keystore**/EncryptedSharedPreferences. `SharedPreferences` and plain files are stored unencrypted in the app sandbox and can be read by anyone with root access on a rooted or jailbroken device, so they are never used for anything sensitive.
+```dart
+const _storage = FlutterSecureStorage(
+  aOptions: AndroidOptions(encryptedSharedPreferences: true),
+  iOptions: IOSOptions(accessibility: KeychainAccessibility.first_unlock_this_device),
+);
+Future<void> saveSession(String token) =>
+    _storage.write(key: 'session_token', value: token);
+```
+- Tokens read out of secure storage are injected as a header by a single dio interceptor (`references/data-and-contracts.md` → dio Conventions), not stored in widget state or passed through constructors.
+- `first_unlock_this_device` keeps the credential off device backups and off other devices; widen it only with a recorded reason.
+- Clear secure storage on sign-out and on detected token invalidation — a lingering token is a credential waiting to be lifted.
+## Transport and Certificate Pinning
+All traffic is HTTPS; the dio client rejects plaintext. For an app handling sensitive data, pin the gateway's certificate (or its public-key/SPKI hash) so a user-installed or malicious root CA cannot transparently proxy the connection.
+```dart
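+(dio.httpClientAdapter as IOHttpClientAdapter).createHttpClient = () {
+  // Trust no built-in roots so every server certificate reaches the callback;
+  // by default the callback only runs for certificates that fail validation.
+  final client = HttpClient(context: SecurityContext(withTrustedRoots: false));
+  client.badCertificateCallback = (cert, host, port) =>
+      _spkiSha256(cert) == _pinnedSpkiHash;  // pin to the key, not the leaf cert
+  return client;
+};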
+```
+- Pin to the SPKI hash, not a specific leaf certificate, so routine certificate renewal does not brick the app; ship at least one backup pin for rotation.
+- Pinning raises the bar against interception but is bypassable on a fully compromised device — it is defence in depth, not a substitute for server-side authorization.
+## Deep Link and Intent Validation
+Every `GoRoute` is deep-linkable, and a deep link is untrusted input from another app or a web page (`references/navigation.md` → Deep Links). Route parameters arriving cold are hostile until proven otherwise.
+- Validate and parse path/query parameters in the view model that receives them; never assume a deep-linked id exists, belongs to the user, or is well-formed. The server authorizes the fetch — the client renders the result or an error state.
+- Sensitive destinations are protected by the central `redirect` guard (`references/navigation.md` → Auth Guards), so a deep link cannot skip authentication. Confirm new protected routes are covered by the guard, not by a per-screen check.
+- Treat any token or credential arriving *in* a deep link (an OAuth callback) as single-use, validated server-side, and never logged.
+## Biometric and Auth-Token Handling
+Biometric unlock (`local_auth`) gates access to a credential already held in the keystore; it does not authenticate the user to the server. The server trusts the session token, not the fingerprint.
+- Use biometrics to unlock or re-confirm before a sensitive action, then present the keystore-held token to the gateway. The biometric check is a local gate, not a replacement for a verified session.
+- Token refresh runs through the dio interceptor against the auth provider; the app does not mint, sign, or verify tokens itself. Auth is boring technology — `docs/principles/system-design/identity-and-access.md`.
+- On biometric failure or lockout, fall back to full re-authentication, never to an unguarded path.
+## Obfuscation's Real Limits
+`flutter build <target> --obfuscate --split-debug-info=<dir>` renames symbols and raises the cost of reverse engineering. It is a speed bump, not a control: it does not encrypt logic, hide strings reliably, or protect anything the earlier sections cover. An obfuscated binary still hands a determined attacker every embedded secret and every client-side check.
+Ship it for the marginal friction, and depend on it for nothing. The actual protections are: no secrets in the binary, tokens in the keystore, TLS pinning, and authorization on the server.
+## Security Review Checklist
+For any PR touching auth, secure storage, the dio client, deep-link routes, or build config:
+- [ ] No secret, key, or credential embedded in Dart, assets, or `--dart-define`
+- [ ] Tokens and credentials in `flutter_secure_storage` — never `SharedPreferences` or plain files
+- [ ] Secure storage cleared on sign-out and token invalidation
+- [ ] Transport is HTTPS only; pinning present (with a backup pin) for sensitive apps
+- [ ] Deep-link parameters validated in the view model; protected routes covered by the central guard
+- [ ] No client-side check standing in for a server-side authorization decision
+- [ ] Biometrics gate a keystore credential — they do not authenticate to the server
+- [ ] Obfuscation enabled, and relied on for nothing

package/src/engineer-skills/groundwork-flutter-engineer/references/testing.md CHANGED

@@ -25,6 +25,10 @@
 Pick the **cheapest tier that can carry the assertion**. If a widget test can prove it, an integration test that proves it is waste.
+This taxonomy is the Flutter idiom of the framework testing canon (`docs/principles/foundations/testing.md`): widget tests are the fat middle that the canon's honeycomb puts the weight on, unit tests are the thin solitary layer, and `integration_test` is the few-end-to-end top. When this file and the canon disagree, the canon wins and this file is the one to fix.
+Above all of these sits the front-door proof the canon now demands: an `integration_test` harness driving the real shipping app the way a user does — end to end against the real backend, not a fake gateway — because widget tests that each pass against fakes can still assemble into an app that does nothing on the real data path. And the fake-needs-a-real-test rule follows from it: a mock repository or fixture standing in for a real stage is a debt that some integration test of the real producer must pay. Seeded inputs are fine; faking the work in the middle with nothing real behind it is a green light wired to nothing. See `docs/principles/foundations/testing.md`.
 ## The Prove-Once Rule
 Capability behaviour is proven once, headless, at the core's contract. The surface suite proves three things only:
@@ -124,6 +128,18 @@ Goldens guard **design-system-level components** (the token-projected theme made
 - **iOS is a local-only / device-farm lane (Firebase Test Lab, Codemagic), never a CI gate** — it needs macOS runners and hands, and putting it in the gate breaks the headless loop.
 - A runner without the Flutter SDK reports the tier **skipped-with-reason, never silently green**.
+## Assertion Quality — Mutation Testing and Its Absence
+The canon's assertion-quality read-out is mutation testing — inject a fault, confirm a test fails. Dart has **no production-grade mutation tool** (the existing packages are experimental and unmaintained), so that automated read-out is not available here. The discipline it enforces is carried by review instead: fakes over mocks (a stub-and-verify mock asserts the call, not the outcome), assert on semantics and visible text rather than widget types, and cover error and empty states — the failure modes mutation testing would otherwise catch. When a dense pure-Dart algorithm genuinely warrants it, a hand-run experimental tool is a spot check, never a gate.
+## Generate the Inputs You Can't Enumerate
+Example-based tests check the cases you thought of (canon principle 7). The generative surface on a Flutter client is narrow but real: a **dense, pure-Dart unit** with an invariant — a mapper round-trip (`fromJson ∘ toJson = id`), a date/currency formatter, a validator that must never throw — can state the property and let the framework generate counterexamples. `glados` is the Dart property-based option; it is niche, so reach for it only where a genuine invariant lives in pure logic, never for widget trees. The service-boundary generative tools the canon names — Schemathesis, coverage-guided fuzzing — do not apply to a client: the gateway's contract is fuzzed once at the capability core, and re-running it from the surface duplicates a proof that already exists (the Prove-Once Rule).
+## Naming Tests by Behaviour
+A test name must let an engineer form a hypothesis from the failure log alone. State the observable behaviour and the condition — `'placing an order shows the confirmation'`, not `'OrderView test'`. Names that describe what the user sees survive refactors and double as living documentation; names that describe the widget under test convey nothing the file tree doesn't.
 ## Test Commands
 | Command | Purpose |
@@ -133,3 +149,12 @@ Goldens guard **design-system-level components** (the token-projected theme made
 | `npx nx run <app>:test-integration` | integration_test against a device/emulator |
 | `flutter test test/home_view_test.dart` | single file |
 | `flutter test --name 'refresh'` | by test name |
+## Bet Slice Rollout — the permanent tests a slice owes
+When a bet slice's progress tests go green, the slice rolls out permanent coverage before it closes (bet workflow, Delivery). The bet-progress tests prove the capability once and are archived; these stay. The Prove-Once Rule governs the whole rollout — surface tests prove wiring, rendering, and interaction; they never re-prove a business rule the capability core already owns.
+- **Widget tests (always).** The bulk of the slice's coverage: pump each View the slice delivered inside a `ProviderScope` with fake repositories, assert through semantics and visible text, and cover its error and empty states — not just the happy path.
+- **Unit tests (when logic earned them).** Pure-Dart tests for branching logic the slice introduced in a view model, mapper, or repository — through `ProviderContainer`, testing the real Notifier. Plumbing does not earn one; the widget test already covers it.
+- **Golden tests (when the slice touched a design-system component).** A new or changed token-projected component extends the alchemist goldens; screen-level goldens do not — they churn on every copy change.
+- **Integration / Patrol (only when the journey or the OS boundary is new).** A new critical journey earns one happy-path `integration_test`; a flow that newly leaves Flutter for the OS earns Patrol. Trace assertions do not apply — a Flutter client emits no OpenTelemetry traces, so there is no span surface to assert on.

package/src/engineer-skills/groundwork-flutter-engineer/sync-anchor.md CHANGED

@@ -1,8 +1,10 @@
 # Sync Anchor
-This file pins the principle files this skill embeds. When any listed file
-changes, this skill must be reviewed in the same commit. CI verifies the
-hashes match.
+This file pins the principle files this skill embeds — both the per-stack
+Flutter idiom docs and the cross-cutting central canon this skill distils. When
+any listed file changes, this skill must be reviewed in the same commit (and the
+matching per-stack idiom doc reconciled to the canon). CI verifies the hashes
+match.
 | Principle file | SHA-256 | Last reviewed |
 |---|---|---|
@@ -10,6 +12,13 @@ hashes match.
 | src/generators/flutter-app/docs/principles/stack/flutter/architecture.md | ac10c2c87da358157973ebbfe07657491d68a248b946b00697fdc5e18f3af596 | 2026-06-12 |
 | src/generators/flutter-app/docs/principles/stack/flutter/state-management.md | a690a3476453cb8ed0af0d3f48566d8ba6a2d508ac1ef782fd27bbffd2268994 | 2026-06-12 |
 | src/generators/flutter-app/docs/principles/stack/flutter/widgets-and-composition.md | b45c55220f14a7886837c4d4159b33febceed772db32a7efb277e7dba00512e8 | 2026-06-12 |
-| src/generators/flutter-app/docs/principles/stack/flutter/testing.md | 5aec7d7c5300cf6f4c4f4b07809ff02010505cf71329f7dc5b766de39b97685e | 2026-06-12 |
+| src/generators/flutter-app/docs/principles/stack/flutter/testing.md | e337a0e1c4a6c5502f69745b0a4e0be35dd3303fbf11ed1f2f3688e93f16ed4f | 2026-06-27 |
 | src/generators/flutter-app/docs/principles/stack/flutter/platform-channels.md | 6b5a54dcb8b55433b7175cf715a6a3abf03c86c317f2037af6155a131691cfb2 | 2026-06-12 |
 | src/generators/flutter-app/docs/principles/stack/flutter/releases-and-distribution.md | 70ecdca2be6d8476359dbc2e72e3510157b49db0f5512cded76e0e3f19bed46f | 2026-06-12 |
+| src/docs/principles/foundations/testing.md | 205ac40d4c643e7b61cf1e4295df8a7b8b46dcd7c81b857aa8c642ea353f62ef | 2026-06-27 |
+| src/docs/principles/quality/observability.md | 8aa60e213ba03e989c93263153e3a1ac10b2336f6d0360c394f473660d565a0b | 2026-06-26 |
+| src/docs/principles/quality/security.md | 61157d97677142737ec537954dc5aaad7a04012cc8a3dcc855e2d324287fdc64 | 2026-06-26 |
+| src/docs/principles/quality/performance.md | 18b6d3391c57d97342068f9f1da732b24de4221489d0459bb6ad8900fac0a02e | 2026-06-26 |
+| src/docs/principles/quality/reliability.md | 9c9788504e0963458667d2727c3fc2359776108be593a2efc6603f6470002252 | 2026-06-26 |
+| src/docs/principles/quality/accessibility.md | f921e7bf6256bc105b127b841d0a30af8a70ad1ddd7632d492589f052e6501b2 | 2026-06-26 |
+| src/docs/principles/foundations/documentation.md | 8b576072eaf4970f1251b560781e3e755c864a7920faa599b2834c921cbb8734 | 2026-06-26 |

package/src/engineer-skills/groundwork-go-engineer/SKILL.md CHANGED

@@ -20,10 +20,11 @@ Go backend execution router for service repositories. Durable engineering guidan
 ## Operating Contract
 1. Load reference docs from `references/` for architectural and implementation guidance. Treat the current repository's code, specs, and generated contracts as the source of truth for naming, structure, and behavior.
-2. Inspect the current repository before naming packages, commands, import paths, schemas, or generated files.
+2. Orient with the repo map and Serena before reading widely (see Required First Checks) — find the hubs, then navigate by symbol. Inspect the current repository before naming packages, commands, import paths, schemas, or generated files.
 3. Load the smallest reference set that explains the task. Add more context only when the task crosses a boundary.
 4. Preserve the service's dependency direction and public contracts. Code implements OpenAPI, database migrations, event schemas, and documented architecture — it does not invent them.
-5. Coordinate with adjacent skills when another skill owns the primary decision surface.
+5. Treat observability as part of the contract, not an afterthought: a critical path emits an unbroken trace, and a missing span is a defect. Route durable engineering policy to the canonical docs (`docs/principles/stack/go/`, and the cross-cutting canon under `docs/principles/quality/` and `docs/principles/foundations/`) rather than restating it in code comments or this skill.
+6. Coordinate with adjacent skills when another skill owns the primary decision surface.
 ---
@@ -39,6 +40,7 @@ Before non-trivial Go implementation or review work:
 | Check | Why |
 |---|---|
+| **Orient with the repo map + Serena** — refresh `npx groundwork-method repo-map`, read its `centrality` ranking to find the hubs, then navigate them with Serena (`get_symbols_overview` / `find_symbol` / `find_referencing_symbols`) | A blind file crawl misses the structure the map already computed; symbol navigation and reference-aware edits beat grep-and-read. Fall back to ordinary reads only when these are unavailable |
 | Service package layout and nearby examples for the touched layer | Prevents inventing structure that already has a convention |
 | `go.mod` for Go and dependency versions | Avoids version-specific advice that contradicts the project |
 | OpenAPI spec (if HTTP behavior changes) | HTTP contracts are generated — code must match the spec |
@@ -67,6 +69,7 @@ Load only the rows relevant to the current task. Reference files are in the skil
 | Tests, quality gates, coverage strategy, flake triage | `testing.md` |
 | Code quality, naming, simplicity, deletion | `code-craft-security.md` |
 | Security, auth, secrets, input validation, supply chain | `code-craft-security.md` |
+| Doc comments, naming-as-documentation, godoc, comment-is-a-smell | `documentation.md` |
 ---

package/src/engineer-skills/groundwork-go-engineer/references/documentation.md ADDED

@@ -0,0 +1,130 @@
+# Documentation
+Go ships most of its documentation discipline in the toolchain: `go doc` and pkg.go.dev render doc comments and `gofmt` normalises them, while linters such as `staticcheck` and `revive` flag malformed ones. The language was designed so that well-named code needs little prose around it. Lean on that.
+## Hierarchy
+Structure documents more reliably than comments. A comment is a promise no compiler checks; when the code changes, the comment silently lies. Documentation priority — the foundations principle (`docs/principles/foundations/documentation.md`) written the Go way:
+1. **Types and signatures** — the compiler rejects incorrect types. Zero drift risk.
+2. **Naming** — self-documenting identifiers and small interfaces. Refactor before you comment.
+3. **Error context** — `fmt.Errorf("doing X: %w", err)` strings document the failure path and are exercised by every call site.
+4. **Test names** — `TestReserve_OutOfStock_ReturnsConflict` is executable documentation verified by CI.
+5. **Doc comments on exported API** — rendered by godoc; written only when the signature cannot carry the contract.
+6. **Inline "why" comments** — last resort for a genuinely non-obvious decision.
+Levels 1–4 are verified by tooling. Levels 5–6 are human promises that drift. Minimise them.
+## Doc Comments
+A doc comment is the sentence directly above a declaration, and it begins with the name of the thing it documents. `go doc` and pkg.go.dev parse that convention, and doc-comment linters such as `staticcheck` and `revive` check for it.
+```go
+// Reserve holds stock for an order until the reservation expires.
+// It returns ErrOutOfStock when the requested quantity is unavailable.
+func (s *Service) Reserve(ctx context.Context, orderID string, qty int) error
+```
+Document the **exported** surface — the package's contract with its callers. Unexported helpers are read with their implementation in view; a doc comment there is usually redundant with the code below it.
+State the contract the signature cannot: the error conditions a caller must branch on, side effects invisible in the return type, the units or invariants of a parameter. Do not restate the signature in prose.
+```go
+// BAD — restates the signature
+// GetOrder gets an order by id and returns it.
+func GetOrder(ctx context.Context, id string) (*Order, error)
+// GOOD — skip it; the name + types already say this
+func GetOrder(ctx context.Context, id string) (*Order, error)
+```
+## Package Documentation
+A package earns a doc comment when its name does not convey its purpose, its boundaries, or how its pieces fit. Put it in a dedicated `doc.go` so it survives file churn:
+```go
+// Package inventory tracks stock levels and reservations.
+//
+// A reservation is a hold with a TTL; a hold that expires returns its
+// quantity to available stock. Callers reserve, then either commit or
+// release — an abandoned hold is reclaimed by the sweeper, not the caller.
+package inventory
+```
+The comment orients a reader before they open a single file. A package whose name and exported identifiers already explain it (`httpclient`, `postgres`) needs no `doc.go`.
+## Names and Interfaces Are the Documentation
+The cheapest documentation is a name that makes the comment unnecessary. Go's conventions push this hard, and the service standards bake them in (`references/go-services.md`): small interfaces defined by their consumer, concrete types returned, no stuttering (`inventory.Service`, not `inventory.InventoryService`).
+A one-to-three-method interface documents a capability by its shape:
+```go
+// The name and the single method are the whole contract.
+type ReservationStore interface {
+    Save(ctx context.Context, r Reservation) error
+}
+```
+A wide interface needs a comment to explain what it is *for* — which is the signal it is doing too much. Narrow it, and the comment disappears.
+## Error Messages Are Documentation
+A Go error string is read far more often than any doc comment — it surfaces in logs, traces, and incident timelines. Treat it as documentation of the failure path. Wrap with context that names the operation, so the chain reads as a trace:
+```go
+if err := store.Save(ctx, r); err != nil {
+    return fmt.Errorf("reserving stock for order %s: %w", orderID, err)
+}
+```
+Lowercase, no trailing punctuation, no "failed to" prefix — the convention that lets wraps compose cleanly (`reserving stock for order 7: writing row: connection refused`). Sentinel and structured errors document the branches a caller is expected to take; define them where that caller lives (`references/go-services.md`).
+## Inline Comments
+Inline comments explain **why**, never **what**. The code already says what it does; the comment captures the reason the next reader cannot recover from the code alone.
+```go
+// Sweep every 30s: shorter churns the DB, longer lets expired holds
+// starve available stock past the SLA. Tuned against load test #214.
+ticker := time.NewTicker(30 * time.Second)
+```
+A comment that narrates the mechanics is noise — the reader can see the loop.
+```go
+// BAD — narrates the obvious
+// loop over the items and sum the quantities
+for _, item := range items {
+    total += item.Qty
+}
+```
+## A Comment Is Often a Smell
+When you reach for a comment to explain *what* a block does, the code is asking to be refactored. The comment is debt; the fix is in the code:
+- A comment explaining a variable → rename the variable.
+- A comment heading a block → extract a function whose name is that comment.
+- A comment decoding a boolean argument (`Process(data, true) // skip cache`) → introduce a named type or option.
+- A comment listing what a function does in three parts → the function does three things; split it.
+Delete the comment and fix the name. The refactor cannot drift; the comment can.
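As one possible shape for the boolean-argument refactor listed above, the sketch below swaps the bare `true` for a named type. `CachePolicy`, its constants, and the body of `Process` are hypothetical; the source only names the `Process(data, true) // skip cache` smell.

```go
package main

import "fmt"

// CachePolicy replaces a bare bool so the call site reads without a comment.
type CachePolicy int

const (
	UseCache CachePolicy = iota
	SkipCache
)

// Process is a hypothetical function that reports which path it took.
func Process(data string, policy CachePolicy) string {
	if policy == SkipCache {
		return "fetched " + data + " from source"
	}
	return "served " + data + " from cache"
}

func main() {
	// The argument now carries the meaning the trailing comment used to carry.
	fmt.Println(Process("order-7", SkipCache)) // fetched order-7 from source
}
```

The compiler now rejects a stray `true` at the call site, so the documentation moved from a comment that can drift into a type that cannot.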
+## In-Code Markers
+```go
+// TODO(bob): batch these writes once the store supports it. Issue #231.
+// FIXME(carol): retry storms under partition; needs a circuit breaker. Issue #245.
+// HACK(dave): upstream returns 200 with an error body; inspect payload until fixed.
+```
+Always include `(username)` and an issue reference. A marker without one will never be resolved.
+## What NOT to Document
+- Self-evident exported functions where the name and types tell the whole story.
+- Unexported helpers read in context with their callers.
+- `Args`/`Returns`-style prose that duplicates the signature — Go has no such convention; the types are the parameter docs.
+- Struct fields whose name and type are clear (`CreatedAt time.Time`); comment only a non-obvious unit or invariant.
+- Generated code (protobuf, mocks) — never hand-edit comments into it.

package/src/engineer-skills/groundwork-go-engineer/references/testing.md CHANGED

@@ -1,5 +1,11 @@
 # Testing
+## The Model: Honeycomb, Not Pyramid
+The default shape is the test honeycomb: a fat middle of sociable service-perimeter tests, a thin layer of solitary unit tests, a few end-to-end checks on top. Testcontainers starts real Postgres in seconds, so the old excuse for mocking the database is gone — and a mock-heavy suite passes while production breaks. This is the stack idiom of the framework testing canon (`docs/principles/foundations/testing.md`); when this file and the canon disagree, the canon wins and this file is the one to fix.
+Above the honeycomb sits one proof the tiers below cannot give you: drive the real running service through its real front door — the HTTP/gRPC API a consumer actually calls — end to end on the real pipeline. Service tests that each pass behind a stubbed or faked dependency can still assemble into a product that does nothing on the real path, because the seams between them were never wired. And every fake or fixture standing in for a real stage is a debt: some test must drive the real producer of that data, or the fixture is a green light wired to nothing (canon, `docs/principles/foundations/testing.md`). Seeding the inputs is fine; faking the work in the middle is the violation.
 ## Testing Tiers
 ### Tier 1 — Service Perimeter Tests (Default)
@@ -120,6 +126,61 @@ Tests are risk-weighted assertions about production behaviour — not boxes tick
 5. **Risk-based depth.** Score modules with Impact × Complexity × Change-frequency before deciding test depth.
 6. **Tests are part of the change.** A PR without tests is incomplete.
+## Trace Assertions
+Observability is a test surface (principle 3): a critical-path request must emit an unbroken trace, and a missing span is a test failure, not an instrumentation TODO. The mechanism is an **in-memory span exporter** from the OTel SDK — no external tooling, and the durable approach now that the dedicated trace-test tools have gone dormant.
+```go
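+import (
+    "context"
+    "testing"
+    "github.com/stretchr/testify/require" // assumed: the require calls below match testify
+    "go.opentelemetry.io/otel"
+    sdktrace "go.opentelemetry.io/otel/sdk/trace"
+    "go.opentelemetry.io/otel/sdk/trace/tracetest"
+)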
+func TestCreateEntity_EmitsTrace(t *testing.T) {
+    exporter := tracetest.NewInMemoryExporter()
+    tp := sdktrace.NewTracerProvider(sdktrace.WithSyncer(exporter))
+    otel.SetTracerProvider(tp)
+    t.Cleanup(func() { _ = tp.Shutdown(context.Background()) })
+    // exercise the real handler (Tier 1 setup) ...
+    spans := exporter.GetSpans().Snapshots()
+    names := spanNames(spans)
+    require.Contains(t, names, "POST /entities") // the entry span exists
+    require.Contains(t, names, "db.insert")       // the work span exists
+    // assert the DB span is a descendant of the entry span — the trace is connected
+}
+```
+Assert what the contract promises — the spans that must exist on the journey, that the trace stays connected across hops, and the attributes a dashboard or SLO query depends on. Pinning the exact span tree and every attribute couples the test to implementation and trains the team to delete it.
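The "trace is connected" assertion left as a comment in the test above can be sketched as a parent-chain walk. To stay dependency-free, this example uses a simplified `span` struct as a stand-in for exported spans; the struct, the `isDescendant` helper, and the `service.create` and `cache.get` span names are assumptions for illustration. With the OTel SDK the IDs would presumably come from each span's own span context and its parent span context.

```go
package main

import "fmt"

// span is a simplified stand-in for an exported span: just enough to walk parents.
type span struct {
	ID, ParentID, Name string
}

// isDescendant reports whether the first span named child has a span named
// ancestor somewhere on its parent chain.
func isDescendant(spans []span, child, ancestor string) bool {
	byID := make(map[string]span, len(spans))
	var cur span
	found := false
	for _, s := range spans {
		byID[s.ID] = s
		if s.Name == child && !found {
			cur, found = s, true
		}
	}
	if !found {
		return false
	}
	// Bound the walk by len(spans) so a malformed cycle cannot loop forever.
	for i := 0; i < len(spans); i++ {
		parent, ok := byID[cur.ParentID]
		if !ok {
			return false // chain broken: parent was never exported
		}
		if parent.Name == ancestor {
			return true
		}
		cur = parent
	}
	return false
}

func main() {
	spans := []span{
		{ID: "a", Name: "POST /entities"},
		{ID: "b", ParentID: "a", Name: "service.create"},
		{ID: "c", ParentID: "b", Name: "db.insert"},
		{ID: "d", ParentID: "zz", Name: "cache.get"}, // orphan: parent missing
	}
	fmt.Println(isDescendant(spans, "db.insert", "POST /entities")) // true
	fmt.Println(isDescendant(spans, "cache.get", "POST /entities")) // false
}
```

Asserting descent rather than direct parentage keeps the test stable when an intermediate span is added or removed, which fits the advice above about not pinning the exact span tree.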
+## Mutation Testing — the assertion-quality read-out
+A fat service-perimeter test drives many branches through one HTTP call and can *execute* them all while only asserting on the response body. Mutation testing is the one instrument that proves the suite checks what it runs: inject a fault, confirm a test fails, and a surviving mutant is a line you cover but do not check. Treat it as a **signal, never a gate** — run it on the high-risk modules the risk matrix flags and on changed code only.
+Go's mutation tooling is immature: `gremlins` is pre-1.0 and slow on large packages, and `go-mutesting` is effectively unmaintained. So here it stays a deliberate, hand-run spot check on a dense package under active change (`gremlins unleash ./internal/pricing`), not a CI expectation. Where it surfaces a surviving mutant on changed high-risk code, that is the missing assertion to add — the same read-out catches AI-generated tests whose oracle was lifted from the implementation.
+## Generate the Inputs You Can't Enumerate
+Example-based tests check the cases you thought of; the bugs live in the cases you didn't (canon principle 7). Two generative surfaces apply in Go, both highest-leverage on the **dense, boundary-poor** logic that earns Tier 2 unit tests:
+- **Property-based tests** for code with an invariant that holds across a large input space — a round-trip (`Decode(Encode(x)) == x`), a parser that must never panic, a pricing calculation with an algebraic law, a state machine that must preserve a constraint. State the property and let the framework generate and shrink counterexamples. Reach for `pgregory.net/rapid` (modern, shrinks well) over stdlib `testing/quick`; one property covers an infinity of examples, and most caught faults surface on a single generated input.
+```go
+func TestEncodeDecode_RoundTrips(t *testing.T) {
+    rapid.Check(t, func(t *rapid.T) {
+        order := genOrder().Draw(t, "order") // a rapid generator for the domain type
+        got, err := Decode(Encode(order))
+        require.NoError(t, err)
+        require.Equal(t, order, got)
+    })
+}
+```
+- **Native fuzzing** (`go test -fuzz`) at the byte boundary — parsers, decoders, anything that ingests untrusted input. Coverage-guided, first-class in the toolchain since Go 1.18, and a failing input is saved under `testdata/fuzz/` as a permanent regression seed. Run a fuzz target in a bounded CI lane (`-fuzztime=30s`) on changed parsers, not unbounded on every PR.
+The cost is authoring — a meaningful property needs a real invariant and a generator — so reach for these where invariants are real, not everywhere. Where the input space is small or there is no invariant, a table-driven unit test is the right tool.
 ## Anti-Patterns
 - **Mocking the database.** Test against a real schema.
@@ -131,9 +192,10 @@ Tests are risk-weighted assertions about production behaviour — not boxes tick
 ## Bet Slice Rollout — the permanent tests a slice owes
-When a bet slice's progress tests go green, the slice rolls out permanent coverage before it closes (bet workflow, Delivery step 5). The bet-progress tests prove the capability once and are archived; these stay.
+When a bet slice's progress tests go green, the slice-worker rolls out permanent coverage as part of the same slice, before the driver reviews it (bet workflow, Delivery). The bet-progress tests prove the capability once and are archived; these stay.
 - **Service perimeter test (always).** One Tier 1 test per capability the slice delivered, exercising the real handler against real Postgres — this is the honeycomb wall that survives refactors.
 - **Unit tests (when logic earned them).** Pure-function tests for branching business logic the slice introduced — state machines, pricing rules, parsers. CRUD plumbing does not earn unit tests; the perimeter test already covers it.
 - **Property-based tests (when invariants exist).** A slice that introduced an invariant — serialization round-trips, idempotent handlers, commutative merges — pins it with a property test (`testing/quick` or rapid), because example-based tests sample invariants instead of stating them.
+- **Critical-path trace assertions (when the slice added an observable path).** A slice that introduced a handler or background job whose trace a dashboard or SLO depends on pins it with an in-memory-exporter test: the entry and work spans exist and the trace stays connected. A missing span is a test failure, not an instrumentation TODO.
 - **Contract conformance (when the slice changed an API).** The served OpenAPI must match the promoted spec in `docs/architecture/api/<service>/openapi.yaml`; the generated system suite checks this — the slice's job is to keep the spec promotion current, not to hand-write the check.

package/src/engineer-skills/groundwork-go-engineer/sync-anchor.md CHANGED

@@ -1,11 +1,20 @@
 # Sync Anchor
-This file pins the principle files this skill embeds. When any listed file
-changes, this skill must be reviewed in the same commit. CI verifies the
-hashes match.
+This file pins the principle files this skill embeds — both the per-stack Go
+idiom docs and the cross-cutting central canon this skill distils. When any
+listed file changes, this skill must be reviewed in the same commit (and the
+matching per-stack idiom doc reconciled to the canon). CI verifies the hashes
+match.
 | Principle file | SHA-256 | Last reviewed |
 |---|---|---|
 | src/generators/go-microservice/docs/principles/stack/go/index.md | 5404a872df986823ca05107485ec6e22952c25b39882e44ed82316c0044f0973 | 2026-06-19 |
 | src/generators/go-microservice/docs/principles/stack/go/concurrency.md | e63a90b46f85ad63f0e1535f105106b9f30bff58e0270e9dc8d4b8f91951e0ca | 2026-05-26 |
-| src/generators/go-microservice/docs/principles/stack/go/testing.md | 5df1715b9fa10ff50cb3f18348a7a67dbef70cff80d7d77aefef2d6f1dc5da4c | 2026-05-26 |
+| src/generators/go-microservice/docs/principles/stack/go/testing.md | fd3561fbcb1c79bbd219ff0e76b8958898b0159daacef41bcf59092613773064 | 2026-06-26 |
+| src/docs/principles/foundations/testing.md | 205ac40d4c643e7b61cf1e4295df8a7b8b46dcd7c81b857aa8c642ea353f62ef | 2026-06-27 |
+| src/docs/principles/quality/observability.md | 8aa60e213ba03e989c93263153e3a1ac10b2336f6d0360c394f473660d565a0b | 2026-06-26 |
+| src/docs/principles/quality/security.md | 61157d97677142737ec537954dc5aaad7a04012cc8a3dcc855e2d324287fdc64 | 2026-06-26 |
+| src/docs/principles/foundations/code-craft.md | 55aa79dffada43c86e546ead89b07578dddb6a9ec8a7dba15034e3628b3e9d38 | 2026-06-26 |
+| src/docs/principles/quality/performance.md | 18b6d3391c57d97342068f9f1da732b24de4221489d0459bb6ad8900fac0a02e | 2026-06-26 |
+| src/docs/principles/quality/reliability.md | 9c9788504e0963458667d2727c3fc2359776108be593a2efc6603f6470002252 | 2026-06-26 |
+| src/docs/principles/foundations/documentation.md | 8b576072eaf4970f1251b560781e3e755c864a7920faa599b2834c921cbb8734 | 2026-06-26 |

package/src/engineer-skills/groundwork-nextjs-engineer/SKILL.md CHANGED

@@ -41,7 +41,7 @@ GroundWork gives you a deterministic **repo map** (`npx groundwork-method repo-m
 ## How to Use This Skill
-Match the user's task to the smallest relevant reference set. Most tasks touch one or two references.
+**Orient first.** On any non-trivial task, refresh the repo map (`npx groundwork-method repo-map`), read its `centrality` ranking to find the hubs, and navigate them with Serena before reading widely (see Code intelligence above) — this is the first step, not optional; fall back to ordinary reads only when those tools are unavailable. Then match the user's task to the smallest relevant reference set. Most tasks touch one or two references.
 | Topic | Reference | Load When |
 |-------|-----------|-----------|
@@ -55,6 +55,9 @@ Match the user's task to the smallest relevant reference set. Most tasks touch o
 | Tailwind & Styling | `references/tailwind-and-styling.md` | Tailwind v4 mechanics, consuming projected tokens, theming, dark mode, responsive design. |
 | Visual Language | `references/visual-language.md` | Consuming the design system: colour/type/spacing/elevation/surface technique and the projected token + surface utilities. |
 | UX Principles | `references/ux-principles.md` | Interaction patterns, progressive disclosure, feedback, empty states. |
+| Accessibility | `references/accessibility.md` | Semantic HTML, ARIA discipline, keyboard/focus, WCAG AA contrast, accessible forms, `jest-axe`. |
+| Security | `references/security.md` | XSS, CSRF, auth/session, the `NEXT_PUBLIC` secret boundary, Server Action validation, CSP, SSRF on server fetches. |
+| Observability | `references/observability.md` | Server spans via `instrumentation.ts`, client Web Vitals/RUM, error reporting, PII discipline. |
 | Testing | `references/testing.md` | Component tests, integration tests, accessibility testing, test utilities. |
 | Performance & Deployment | `references/performance-and-deployment.md` | Bundle analysis, lazy loading, image optimization, build configuration. |
 | Documentation | `references/documentation.md` | Component documentation, Storybook patterns, inline docs. |
@@ -68,6 +71,8 @@ Match the user's task to the smallest relevant reference set. Most tasks touch o
 - **Form work** → Load `references/mutations-and-forms.md`. Verify Zod schema patterns.
 - **Styling/theming** → Load `references/tailwind-and-styling.md` and `references/visual-language.md`. Check design guide.
 - **Performance issues** → Load `references/performance-and-deployment.md`. Profile before optimizing.
+- **Security / auth / session work** → Load `references/security.md`. Check the server/client boundary and the `NEXT_PUBLIC` secret line.
+- **Instrumentation / telemetry** → Load `references/observability.md`. Distinguish server spans from client RUM.
 ## Safety Gates

package/src/engineer-skills/groundwork-nextjs-engineer/references/accessibility.md ADDED

@@ -0,0 +1,111 @@
+# Accessibility
+## Table of Contents
+- [The Baseline](#the-baseline)
+- [Semantic HTML First, ARIA Last](#semantic-html-first-aria-last)
+- [Keyboard & Focus](#keyboard--focus)
+- [Contrast & Colour](#contrast--colour)
+- [Accessible Forms](#accessible-forms)
+- [Motion](#motion)
+- [Accessibility as the Test Seam](#accessibility-as-the-test-seam)
+- [Review Checklist](#review-checklist)
+---
+## The Baseline
+Accessibility is a merge gate, not a backlog item. The baseline for every surface: native semantic elements over `div` soup, every interactive element keyboard-reachable with a visible focus ring, WCAG AA contrast on text and UI boundaries, programmatic labels on every form field, and motion that honours `prefers-reduced-motion`. An accessibility failure blocks the slice the way a failing test does. The standard is WCAG 2.2 AA — see `docs/principles/quality/accessibility.md`.
+## Semantic HTML First, ARIA Last
+The first rule of ARIA is that no ARIA is better than bad ARIA — a misapplied `role` silently overrides the native semantics that already worked. Reach for the native element before the attribute:
+- Use `<button>`, `<a>`, `<nav>`, `<main>`, `<header>`, `<footer>`, `<article>`, `<section>`. Never a `<div onClick>` for a control — it is invisible to keyboard and assistive tech.
+- Headings form an outline: one `<h1>` per page, no skipped levels.
+- Reach for ARIA only when HTML cannot express the semantics, and then wire up every state and key handler by hand — an ARIA reimplementation is correct only if complete.
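The heading rule above (one `<h1>`, no skipped levels) is mechanical enough to check in code. The following is a minimal sketch, not part of the package: `headingOutlineProblems` is a hypothetical helper that takes the heading levels of a page in document order and reports violations.

```typescript
// Hypothetical helper: validates a page's heading levels (1-6) in document order.
// Returns a list of human-readable problems; an empty list means the outline is sound.
function headingOutlineProblems(levels: readonly number[]): string[] {
  const problems: string[] = [];

  // Rule 1: exactly one <h1> per page.
  const h1Count = levels.filter((level) => level === 1).length;
  if (h1Count !== 1) {
    problems.push(`expected exactly one h1, found ${h1Count}`);
  }

  // Rule 2: a heading may go at most one level deeper than the one before it.
  let previous: number | undefined;
  levels.forEach((level, index) => {
    if (previous === undefined) {
      if (level !== 1) {
        problems.push(`first heading is h${level}, expected h1`);
      }
    } else if (level > previous + 1) {
      problems.push(`heading ${index + 1} skips from h${previous} to h${level}`);
    }
    previous = level;
  });

  return problems;
}
```

Moving back up the outline (for example from `h3` to `h2`) is allowed; only a jump that skips a level on the way down is reported.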
+When naming, associate visible text first; fall back to `aria-label` only when there is no on-screen text to reference. Icon-only buttons need an explicit label:
+```tsx
+<button onClick={onClose} aria-label="Close dialog">
+  <X size={20} aria-hidden />
+</button>
+```
+## Keyboard & Focus
+Every journey completes without a pointer. Tab order follows reading order, there are no keyboard traps, and focus is always visible.
+- Never delete the focus indicator for aesthetics. Style `:focus-visible` so a clear ring shows for keyboard users without firing on mouse clicks:
+  ```css
+  :focus-visible {
+    outline: 2px solid var(--color-accent);
+    outline-offset: 2px;
+  }
+  ```
+- Composite widgets (menus, tabs, grids) are a single tab stop, then arrow-key navigation inside via roving `tabindex` — a 30-item menu is one tab stop, not thirty.
+- Modals and overlays manage focus: trap focus inside while open, return focus to the trigger on close, and mark them with `role="dialog"` and `aria-modal="true"`. Otherwise keyboard users are stranded behind the overlay.
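The roving `tabindex` pattern described in the list above reduces to two small pure functions. This is a sketch under stated assumptions: the names `nextActiveIndex` and `tabIndexFor` are hypothetical, the widget is vertical (Up/Down arrows), and focus wraps at the ends, which is a design choice rather than a requirement.

```typescript
// Keys a vertical composite widget (e.g. a menu) responds to in this sketch.
type NavKey = "ArrowDown" | "ArrowUp" | "Home" | "End";

// Hypothetical helper: index of the item that should receive focus after a key press.
// Wrapping from last to first (and back) is an assumption of this sketch.
function nextActiveIndex(current: number, count: number, key: NavKey): number {
  if (count <= 0) return -1; // nothing to focus in an empty widget
  if (key === "ArrowDown") return (current + 1) % count;
  if (key === "ArrowUp") return (current - 1 + count) % count;
  if (key === "Home") return 0;
  return count - 1; // "End"
}

// Hypothetical helper: only the active item sits in the tab order (0);
// every other item is focusable by script only (-1).
function tabIndexFor(index: number, active: number): 0 | -1 {
  return index === active ? 0 : -1;
}
```

In a component, the key handler would store the result of `nextActiveIndex` in state, render each item with `tabIndex={tabIndexFor(i, active)}`, and call `.focus()` on the newly active element.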
+## Contrast & Colour
+Contrast is measured against rendered colour, not eyeballed. Body text meets **4.5:1**, large text (18pt / 24px and up, or 14pt / roughly 18.66px and up when bold) meets **3:1**, and UI component boundaries and meaningful graphics meet **3:1**. Consume the design-system role tokens (see `references/visual-language.md`) so audited pairs stay paired; a hand-mixed colour or opacity hack silently breaks contrast and is a review finding. Verify in both themes.
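To make "measured, not eyeballed" concrete, here is a minimal sketch of the WCAG 2.x contrast-ratio calculation for two opaque `#rrggbb` colours. The function names are hypothetical and not part of the package. The sketch ignores alpha, so a semi-transparent colour must be flattened against its background first.

```typescript
type Rgb = [number, number, number];

// Parse "#rrggbb" (leading "#" optional) into 0-255 channels.
function parseHex(hex: string): Rgb {
  const match = /^#?([0-9a-f]{6})$/i.exec(hex);
  if (!match) throw new Error(`expected #rrggbb, got ${hex}`);
  const value = parseInt(match[1], 16);
  return [(value >> 16) & 0xff, (value >> 8) & 0xff, value & 0xff];
}

// Convert one sRGB channel to linear light, per the WCAG 2.x definition.
function linearise(channel: number): number {
  const c = channel / 255;
  return c <= 0.03928 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
}

// Relative luminance: 0 for black, 1 for white.
function relativeLuminance([r, g, b]: Rgb): number {
  return 0.2126 * linearise(r) + 0.7152 * linearise(g) + 0.0722 * linearise(b);
}

// Contrast ratio between two opaque colours, from 1 (identical) to 21 (black on white).
function contrastRatio(foreground: string, background: string): number {
  const a = relativeLuminance(parseHex(foreground));
  const b = relativeLuminance(parseHex(background));
  const lighter = Math.max(a, b);
  const darker = Math.min(a, b);
  return (lighter + 0.05) / (darker + 0.05);
}

// AA thresholds as stated in this section: 4.5:1 for body text, 3:1 for large text and UI.
function meetsAA(foreground: string, background: string, kind: "body" | "large" | "ui"): boolean {
  const required = kind === "body" ? 4.5 : 3;
  return contrastRatio(foreground, background) >= required;
}
```

A check like `meetsAA(token.fg, token.bg, "body")` could run over every foreground/background token pair in both themes. The `token` shape here is illustrative only.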
+Colour is never the only signal. Pair every colour-coded state with a second cue — icon, label, or position:
+```tsx
+<span className="text-success flex items-center gap-1">
+  <CheckCircle size={16} aria-hidden />
+  Completed
+</span>
+```
+## Accessible Forms
+Every field carries a programmatic label — a real `<label htmlFor>`, never a placeholder standing in for one, because the placeholder vanishes on input.
+- Errors are announced, not just coloured. Render the message in a `role="alert"` region and tie it to the field with `aria-describedby`; set `aria-invalid` on the failed input.
+- A failed submit keeps every entered value, marks each field inline, and moves focus to the first error.
+- Use the correct input `type` (`email`, `tel`, `url`) so the right keyboard and validation apply.
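The error-handling rules above can be sketched as two pure helpers: one derives the ARIA attributes for a field, the other picks the field to focus after a failed submit. Everything here is an assumption for illustration: the helper names, the `FieldErrors` shape, and the `<fieldId>-error` id convention for the message element.

```typescript
// Map of field id to error message; undefined or missing means the field is valid.
type FieldErrors = Record<string, string | undefined>;

interface FieldA11yProps {
  id: string;
  "aria-invalid"?: true;
  "aria-describedby"?: string;
}

// Hypothetical convention: the message element for field "email" has id "email-error".
function errorIdFor(fieldId: string): string {
  return `${fieldId}-error`;
}

// Attributes to spread onto the input: a failed field is marked invalid and
// pointed at its message; a valid field gets neither attribute.
function fieldA11yProps(fieldId: string, errors: FieldErrors): FieldA11yProps {
  if (!errors[fieldId]) return { id: fieldId };
  return {
    id: fieldId,
    "aria-invalid": true,
    "aria-describedby": errorIdFor(fieldId),
  };
}

// The first failing field in visual order: the one to focus after a failed submit.
function firstInvalidField(fieldOrder: readonly string[], errors: FieldErrors): string | undefined {
  return fieldOrder.find((id) => Boolean(errors[id]));
}
```

The message element itself would carry `id={errorIdFor(fieldId)}` and `role="alert"`, and the submit handler would focus the element whose id is returned by `firstInvalidField`.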
+## Motion
+Animations honour `prefers-reduced-motion`. For users with vestibular conditions, unrequested motion is an accessibility failure, not decoration. The reduced-motion path still communicates the state change — it drops the movement, not the meaning:
+```css
+@media (prefers-reduced-motion: reduce) {
+  *, *::before, *::after {
+    animation-duration: 0.01ms !important;
+    transition-duration: 0.01ms !important;
+  }
+}
+```
+## Accessibility as the Test Seam
+The accessible query is also the test query: Testing Library finds by `getByRole`, `getByLabelText`, and visible text, so **inaccessible UI is untestable UI**. A control with no role and no accessible name is one a screen reader cannot address and a test cannot reach — when a test falls back to `getByTestId` because nothing semantic exists, treat it as the accessibility defect it is and fix the markup.
+Automated checks cover the mechanical layer only. Run `jest-axe` on every component that renders interactive elements (see `references/testing.md`), and gate it in CI:
+```tsx
+it('has no accessibility violations', async () => {
+  const { container } = render(<Component />);
+  expect(await axe(container)).toHaveNoViolations();
+});
+```
+Axe catches roughly a third of WCAG criteria — it cannot judge whether alt text is meaningful or focus order makes sense. A new journey also earns a manual keyboard walk.
+## Review Checklist
+- [ ] Native semantic element used; no `div`/`span` as a control.
+- [ ] Icon-only buttons carry `aria-label`; ARIA used only where HTML can't express it.
+- [ ] Every journey completes by keyboard; focus order matches reading order.
+- [ ] `:focus-visible` ring present on all interactive elements.
+- [ ] Modals trap and restore focus.
+- [ ] Contrast meets AA via theme tokens — no hand-mixed colours.
+- [ ] No state communicated by colour alone.
+- [ ] Every form field has a programmatic label; errors announced via `role="alert"`.
+- [ ] Motion respects `prefers-reduced-motion`.
+- [ ] `jest-axe` scan clean; queries find by role/label, not test IDs.