npm - sanook-cli - Versions diffs - 0.4.0 → 0.5.0 - Mend

sanook-cli 0.4.0 → 0.5.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (235)

package/.env.example +19 -0
package/CHANGELOG.md +144 -0
package/README.md +153 -20
package/README.th.md +136 -0
package/dist/agentContext.js +4 -0
package/dist/approval.js +6 -0
package/dist/bin.js +394 -51
package/dist/brain.js +92 -59
package/dist/brand.js +47 -0
package/dist/checkpoint.js +37 -0
package/dist/commands.js +86 -6
package/dist/compaction.js +76 -5
package/dist/config.js +100 -12
package/dist/cost.js +60 -3
package/dist/doctor.js +92 -0
package/dist/gateway/auth.js +2 -2
package/dist/gateway/ledger.js +2 -2
package/dist/gateway/scheduler.js +1 -0
package/dist/gateway/serve.js +6 -4
package/dist/gateway/server.js +10 -2
package/dist/git.js +11 -2
package/dist/hooks.js +43 -17
package/dist/knowledge.js +48 -49
package/dist/loop.js +182 -66
package/dist/lsp/client.js +173 -0
package/dist/lsp/framing.js +56 -0
package/dist/lsp/index.js +138 -0
package/dist/lsp/servers.js +82 -0
package/dist/mcp-server.js +244 -0
package/dist/mcp.js +184 -29
package/dist/memory-store.js +559 -0
package/dist/memory.js +143 -29
package/dist/orchestrate.js +150 -0
package/dist/providers/codex.js +2 -2
package/dist/providers/keys.js +3 -2
package/dist/providers/registry.js +133 -1
package/dist/repomap.js +93 -0
package/dist/search/chunk.js +158 -0
package/dist/search/embed-store.js +187 -0
package/dist/search/engine.js +203 -0
package/dist/search/fuse.js +35 -0
package/dist/search/index-core.js +187 -0
package/dist/search/indexer.js +241 -0
package/dist/search/store.js +77 -0
package/dist/session.js +42 -8
package/dist/skill-install.js +10 -10
package/dist/skills.js +12 -9
package/dist/summarize.js +31 -0
package/dist/tools/bash.js +21 -2
package/dist/tools/diagnostics.js +41 -0
package/dist/tools/edit.js +29 -7
package/dist/tools/index.js +8 -1
package/dist/tools/list.js +7 -2
package/dist/tools/permission.js +90 -9
package/dist/tools/read.js +23 -4
package/dist/tools/remember.js +1 -1
package/dist/tools/sandbox.js +61 -0
package/dist/tools/search.js +105 -4
package/dist/tools/task.js +195 -29
package/dist/tools/timeout.js +35 -0
package/dist/tools/util.js +10 -0
package/dist/tools/write.js +6 -4
package/dist/trust.js +89 -0
package/dist/ui/app.js +218 -27
package/dist/ui/banner.js +4 -9
package/dist/ui/history.js +30 -0
package/dist/ui/mentions.js +44 -0
package/dist/ui/setup.js +6 -5
package/dist/ui/useEditor.js +83 -0
package/dist/update.js +114 -0
package/dist/worktree.js +173 -0
package/package.json +11 -5
package/scripts/postinstall.mjs +33 -0
package/second-brain/.agents/_Index.md +30 -0
package/second-brain/.agents/skills/_Index.md +30 -0
package/second-brain/.agents/workflows/_Index.md +30 -0
package/second-brain/AGENTS.md +4 -4
package/second-brain/Acceptance/_Index.md +30 -0
package/second-brain/Acceptance/golden-case-template.md +39 -0
package/second-brain/Areas/_Index.md +30 -0
package/second-brain/Bugs/System-OS/_Index.md +30 -0
package/second-brain/Bugs/_Index.md +30 -0
package/second-brain/CLAUDE.md +4 -1
package/second-brain/Checklists/_Index.md +30 -0
package/second-brain/Checklists/preflight-postflight-template.md +29 -0
package/second-brain/Distillations/_Index.md +30 -0
package/second-brain/Entities/_Index.md +30 -0
package/second-brain/Entities/entity-template.md +33 -0
package/second-brain/Evals/_Index.md +30 -0
package/second-brain/Evals/correction-pairs.md +24 -0
package/second-brain/Evals/failure-taxonomy.md +24 -0
package/second-brain/Evals/golden-set.md +25 -0
package/second-brain/Evals/quality-ledger.md +23 -0
package/second-brain/Evals/self-eval-rubric.md +23 -0
package/second-brain/GEMINI.md +4 -4
package/second-brain/Goals/_Index.md +30 -0
package/second-brain/Handoffs/_Index.md +30 -0
package/second-brain/Home.md +7 -0
package/second-brain/Intake/Raw Sources/_Index.md +30 -0
package/second-brain/Intake/_Index.md +30 -0
package/second-brain/Intake/_Quarantine/_Index.md +30 -0
package/second-brain/Learning/_Index.md +30 -0
package/second-brain/Playbooks/_Index.md +30 -0
package/second-brain/Playbooks/playbook-template.md +23 -0
package/second-brain/Projects/_Index.md +30 -0
package/second-brain/Prompts/_Index.md +30 -0
package/second-brain/README.md +2 -1
package/second-brain/Research/_Index.md +30 -0
package/second-brain/Retrospectives/_Index.md +30 -0
package/second-brain/Reviews/_Index.md +30 -0
package/second-brain/Runbooks/_Index.md +30 -0
package/second-brain/Runbooks/eval-loop.md +24 -0
package/second-brain/Sessions/_Index.md +30 -0
package/second-brain/Shared/AI-Context-Index.md +20 -0
package/second-brain/Shared/AI-Threads/_Index.md +30 -0
package/second-brain/Shared/Archive/_Index.md +30 -0
package/second-brain/Shared/Assets/_Index.md +30 -0
package/second-brain/Shared/Context-Packs/_Index.md +30 -0
package/second-brain/Shared/Context7-Docs/_Index.md +30 -0
package/second-brain/Shared/Coordination/NOW.md +28 -0
package/second-brain/Shared/Coordination/_Index.md +30 -0
package/second-brain/Shared/Coordination/agent-registry.md +24 -0
package/second-brain/Shared/Coordination/task-board/_Index.md +30 -0
package/second-brain/Shared/Coordination/task-board/task-template.md +43 -0
package/second-brain/Shared/Coordination/task-board.md +32 -0
package/second-brain/Shared/Core-Facts/_Index.md +30 -0
package/second-brain/Shared/Decision-Memory/_Index.md +30 -0
package/second-brain/Shared/Glossary/_Index.md +30 -0
package/second-brain/Shared/Memory-Inbox/_Index.md +30 -0
package/second-brain/Shared/Operating-State/_Index.md +30 -0
package/second-brain/Shared/Prompting/_Index.md +30 -0
package/second-brain/Shared/Provenance/_Index.md +30 -0
package/second-brain/Shared/Rules/_Index.md +30 -0
package/second-brain/Shared/Rules/contextual-note-rule.md +30 -0
package/second-brain/Shared/Rules/frontmatter-standard.md +10 -0
package/second-brain/Shared/Rules/memory-write-protocol.md +28 -0
package/second-brain/Shared/Rules/procedural-runbook-header.md +40 -0
package/second-brain/Shared/Rules/review-and-staleness-policy.md +22 -0
package/second-brain/Shared/Rules/rules-formatting.md +34 -0
package/second-brain/Shared/Scripts/_Index.md +30 -0
package/second-brain/Shared/Scripts-Archive/_Index.md +30 -0
package/second-brain/Shared/Tech-Standards/_Index.md +30 -0
package/second-brain/Shared/Tech-Standards/verification-standard.md +40 -0
package/second-brain/Shared/User-Memory/_Index.md +30 -0
package/second-brain/Shared/User-Persona/_Index.md +30 -0
package/second-brain/Shared/User-Persona/owner-profile.md +25 -0
package/second-brain/Shared/Working-Memory/_Index.md +30 -0
package/second-brain/Shared/_Index.md +30 -0
package/second-brain/Shared/mcp-servers/_Index.md +30 -0
package/second-brain/Skills/_Index.md +30 -0
package/second-brain/Templates/_Index.md +30 -0
package/second-brain/Templates/bug.md +2 -0
package/second-brain/Templates/handoff.md +2 -0
package/second-brain/Templates/session.md +2 -0
package/second-brain/Tools/_Index.md +30 -0
package/second-brain/Traces/_Index.md +30 -0
package/second-brain/Vault Structure Map.md +33 -1
package/second-brain/copilot/_Index.md +30 -0
package/skills/audit-license-compliance/SKILL.md +117 -0
package/skills/author-codemod/SKILL.md +110 -0
package/skills/build-audit-logging/SKILL.md +112 -0
package/skills/build-cdc-streaming-pipeline/SKILL.md +123 -0
package/skills/build-cli-tool/SKILL.md +108 -0
package/skills/build-data-table/SKILL.md +141 -0
package/skills/build-native-mobile-ui/SKILL.md +154 -0
package/skills/build-offline-first-sync/SKILL.md +118 -0
package/skills/build-realtime-channel/SKILL.md +122 -0
package/skills/build-vector-search/SKILL.md +131 -0
package/skills/compose-local-dev-stack/SKILL.md +149 -0
package/skills/configure-bundler-build/SKILL.md +166 -0
package/skills/configure-dns-tls/SKILL.md +142 -0
package/skills/configure-reverse-proxy-lb/SKILL.md +129 -0
package/skills/configure-security-headers-csp/SKILL.md +122 -0
package/skills/contract-testing/SKILL.md +140 -0
package/skills/datetime-timezone-correctness/SKILL.md +125 -0
package/skills/debug-ci-pipeline-failure/SKILL.md +134 -0
package/skills/debug-flaky-tests/SKILL.md +128 -0
package/skills/defend-llm-prompt-injection/SKILL.md +110 -0
package/skills/deliver-webhooks/SKILL.md +116 -0
package/skills/design-api-pagination/SKILL.md +144 -0
package/skills/design-authorization-model/SKILL.md +119 -0
package/skills/design-backup-dr-recovery/SKILL.md +113 -0
package/skills/design-event-sourcing-cqrs/SKILL.md +143 -0
package/skills/design-multi-tenancy/SKILL.md +100 -0
package/skills/design-protobuf-grpc-service/SKILL.md +146 -0
package/skills/design-relational-schema/SKILL.md +129 -0
package/skills/design-search-index-infra/SKILL.md +151 -0
package/skills/design-state-machine/SKILL.md +108 -0
package/skills/design-token-system/SKILL.md +109 -0
package/skills/distributed-locks-leases/SKILL.md +120 -0
package/skills/encrypt-sensitive-data/SKILL.md +148 -0
package/skills/feature-flags-rollout/SKILL.md +130 -0
package/skills/file-upload-object-storage/SKILL.md +107 -0
package/skills/fuzz-dynamic-security-test/SKILL.md +111 -0
package/skills/harden-llm-app-reliability/SKILL.md +126 -0
package/skills/i18n-localization-setup/SKILL.md +113 -0
package/skills/idempotency-keys/SKILL.md +107 -0
package/skills/implement-push-notifications/SKILL.md +142 -0
package/skills/ingest-webhook-secure/SKILL.md +120 -0
package/skills/integrate-oauth-oidc/SKILL.md +126 -0
package/skills/load-stress-test/SKILL.md +129 -0
package/skills/map-privacy-data-gdpr/SKILL.md +146 -0
package/skills/model-nosql-data/SKILL.md +118 -0
package/skills/money-decimal-arithmetic/SKILL.md +123 -0
package/skills/monitor-ml-drift/SKILL.md +109 -0
package/skills/numeric-precision-units/SKILL.md +144 -0
package/skills/optimize-llm-cost-latency/SKILL.md +103 -0
package/skills/optimize-react-rerenders/SKILL.md +124 -0
package/skills/orchestrate-agent-workflow/SKILL.md +100 -0
package/skills/payments-billing-integration/SKILL.md +114 -0
package/skills/pin-toolchain-versions/SKILL.md +116 -0
package/skills/plan-strangler-migration/SKILL.md +95 -0
package/skills/property-based-testing/SKILL.md +108 -0
package/skills/publish-package-registry/SKILL.md +130 -0
package/skills/recover-git-state/SKILL.md +119 -0
package/skills/remediate-web-vulnerabilities/SKILL.md +125 -0
package/skills/resilience-timeouts-retries/SKILL.md +104 -0
package/skills/resolve-merge-rebase-conflict/SKILL.md +97 -0
package/skills/rewrite-git-history/SKILL.md +109 -0
package/skills/scaffold-cross-platform-app/SKILL.md +137 -0
package/skills/schema-evolution-compatibility/SKILL.md +121 -0
package/skills/send-transactional-email/SKILL.md +126 -0
package/skills/serve-deploy-ml-model/SKILL.md +107 -0
package/skills/setup-cdn-edge-waf/SKILL.md +107 -0
package/skills/setup-devcontainer-env/SKILL.md +131 -0
package/skills/setup-lint-format-precommit/SKILL.md +140 -0
package/skills/setup-monorepo-tooling/SKILL.md +125 -0
package/skills/ship-mobile-app-store-release/SKILL.md +137 -0
package/skills/structured-output-llm/SKILL.md +86 -0
package/skills/supply-chain-sbom-provenance/SKILL.md +120 -0
package/skills/test-data-factories/SKILL.md +158 -0
package/skills/threat-model-stride/SKILL.md +123 -0
package/skills/train-evaluate-ml-model/SKILL.md +109 -0
package/skills/unicode-text-correctness/SKILL.md +109 -0
package/skills/visual-regression-testing/SKILL.md +120 -0

package/skills/design-protobuf-grpc-service/SKILL.md ADDED

@@ -0,0 +1,146 @@
+---
+name: design-protobuf-grpc-service
+description: Designs and evolves gRPC/protobuf service contracts — message and service definitions, unary vs streaming RPC selection, wire-compatible schema evolution (reserved tags, safe vs breaking changes), canonical status codes, deadlines/cancellation, interceptors, and buf-driven codegen plus breaking-change detection.
+when_to_use: User is writing or changing a .proto/gRPC service, picking unary vs streaming, worried about breaking wire compat on a rolling deploy, wiring multi-language codegen, or adding deadlines/auth/error semantics. This is the binary RPC contract; HTTP/JSON REST or GraphQL surfaces are rest-graphql-contract.
+---
+## When to Use
+Reach for this skill when the contract is a **.proto / gRPC wire format**, not an HTTP/JSON shape:
+- "Design the messages and RPCs for this new service" / "add a method to this `.proto`"
+- "Is renaming/renumbering this field safe to deploy?" — wire-compat review
+- "Should this be unary, server-streaming, or bidi?" / "stream vs websocket?"
+- "Wire codegen for Go + TS + Python off one schema" / "set up `buf` + breaking-change CI"
+- "Set deadlines / map our errors to gRPC status codes / add an auth interceptor"
+- "Expose this to a browser" → gRPC-Web / Connect
+NOT this skill:
+- REST resources, JSON envelopes, OpenAPI/SDL, HTTP versioning/pagination → rest-graphql-contract
+- Reviewing an existing HTTP/RPC API *diff* for naming/compat as an audit pass → api-design-review
+- Issuing/verifying JWTs, OAuth/OIDC flows, RBAC logic (the interceptor *calls* this) → auth-jwt-session
+- Adding tracing/metrics/logs to the service internals → observability-instrument
+- Correctness of the streaming/concurrency code itself (races, missing await) → async-concurrency-correctness
+## Steps
+1. **Model messages — field numbers are the contract, names are not.** The tag number is what goes on the wire; renaming a field is free, renumbering is catastrophic.
+   - Numbers `1–15` encode to a 1-byte tag; reserve them for the hot, always-present fields. Numbers `16–2047` encode to a 2-byte tag.
+   - **Removing a field:** delete it *and* `reserved` both its number and name, so nobody reuses them. This is non-negotiable.
+     ```proto
+     message User {
+       reserved 4, 7 to 9;            // retired tags — never reuse
+       reserved "email_verified";     // retired name — block re-add under old meaning
+       string id = 1;
+       string display_name = 2;
+       optional string email = 3;     // optional => field presence (knows set-vs-default)
+     }
+     ```
+   - Use `optional` (proto3) when you must distinguish "unset" from zero-value; bare scalars can't tell `0`/`""`/`false` from absent.
+   - **Every enum starts at `0 = *_UNSPECIFIED`.** 0 is the default on the wire; if 0 means a real state you can't detect "not set," and you can't safely add values before it.
+     ```proto
+     enum Status { STATUS_UNSPECIFIED = 0; STATUS_ACTIVE = 1; STATUS_BANNED = 2; }
+     ```
+   - Prefer `google.protobuf.Timestamp`/`Duration` over raw int64; `map<k,v>` over parallel lists; a `Money{currency_code, units, nanos}` message over a float. Never put currency in a `double`.
+2. **Pick the RPC shape from the data flow — default to unary.** Streaming is for unbounded or incremental data, not for "it's faster."
+   | Shape | Signature | Use when | Don't use for |
+   |---|---|---|---|
+   | **Unary** | `rpc Get(Req) returns (Resp)` | request/response, bounded payload — **the default** | huge/unbounded results |
+   | Server-streaming | `returns (stream Resp)` | feed/tail, large result set, server-push progress | a single object that fits in memory |
+   | Client-streaming | `(stream Req) returns (Resp)` | chunked upload, batch ingest, client-side aggregation | small fixed-size input |
+   | Bidi | `(stream Req) returns (stream Resp)` | live chat, long-lived sync, interactive session | anything a sequence of unary calls covers |
+   - **Stream vs websocket:** if both ends are gRPC and you need typed messages + backpressure + deadlines, use a gRPC stream. Reach for a websocket only when a *browser* needs raw duplex and you're not on Connect/gRPC-Web.
+   - Page large reads with `page_size`/`page_token` (AIP-158) **before** reaching for server-streaming — pagination is resumable and cacheable; a broken stream restarts from zero.
+3. **Run the wire-compat checklist before any schema change** — clients and servers deploy at different times, in multiple languages, and old binaries must keep parsing new messages.
+   | Change | Wire-safe? | Why |
+   |---|---|---|
+   | Add a new field (new tag) | ✅ | old readers skip unknown fields |
+   | Add a new RPC / new message | ✅ | additive |
+   | Rename a field (same tag/type) | ✅ wire / ⚠️ JSON | wire keys on number; **gRPC-JSON/Connect keys on name** — breaks JSON clients |
+   | Add an enum value | ✅ | but old clients see it as the unknown/default — handle that branch |
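+   | `int32`↔`int64`, `sint`↔`int`, `optional`↔`repeated` | ❌ | `sint*` is ZigZag-encoded, so `int*` readers misdecode it; `int32`↔`int64` share the varint encoding but 64-bit values truncate on 32-bit readers; scalar `optional`↔`repeated` breaks on packed encoding → silent corruption |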
+   | Reuse / renumber a tag | ❌ | old data deserializes into the wrong field |
+   | Remove a field without `reserved` | ❌ | tag can be reused later → corruption |
+   | Change a field's type/cardinality | ❌ | re-version the message or add a new field instead |
+   | Rename/move a service or package | ❌ | path is `/pkg.Service/Method`, so old stubs fail with `UNIMPLEMENTED` |
+   To evolve incompatibly: **add a new field/method, deprecate the old (`[deprecated = true]`), migrate, then `reserved` it** — never mutate in place. Enforce this with `buf breaking` (step 6).
+4. **Error & control plane — set a deadline on every call, return canonical codes.**
+   - **Deadlines are mandatory.** A call without one can hang forever and pin a server thread. Set an absolute deadline client-side (`context.WithTimeout`, ~the SLO); servers must check `ctx.Err()`/`isCancelled` and stop work when the client gives up. Propagate the deadline to downstream calls — don't reset it.
+   - Map failures to the [canonical status codes](https://grpc.io/docs/guides/status-codes/), not a generic `UNKNOWN`/`INTERNAL`:
+     | Code | Use for | Retry? |
+     |---|---|---|
+     | `INVALID_ARGUMENT` | malformed request, fails regardless of state | no |
+     | `FAILED_PRECONDITION` | valid request, wrong system state | no (fix state first) |
+     | `NOT_FOUND` / `ALREADY_EXISTS` | missing / duplicate resource | no |
+     | `PERMISSION_DENIED` / `UNAUTHENTICATED` | authz fail / missing-bad creds | no |
+     | `RESOURCE_EXHAUSTED` | quota / rate limit | yes, with backoff + honor `Retry-After`-style detail |
+     | `DEADLINE_EXCEEDED` | call ran past deadline | yes if idempotent |
+     | `UNAVAILABLE` | transient — server down/restarting | yes, backoff (the canonical retryable code) |
+     | `ABORTED` | concurrency conflict (CAS/txn) | yes, after re-read |
+   - Attach machine-readable detail with `google.rpc.Status` + typed details (`ErrorInfo` with a stable `reason` + `domain`, `BadRequest.field_violations`, `QuotaFailure`) — not a prose string clients must regex.
+   - **Retry only idempotent methods.** Configure a service-config retry policy (`maxAttempts`, `UNAVAILABLE`/`DEADLINE_EXCEEDED` only, exponential backoff). For non-idempotent creates, pass a client-generated idempotency key in metadata and dedupe server-side. Cancellation propagates automatically when the client closes the stream/context — release resources on it.
+5. **Cross-cutting concerns belong in interceptors + metadata, not in every method.**
+   - **Interceptors** (chained, ordered) for auth, logging, tracing, panic-recovery, rate-limit. Auth interceptor reads the token from metadata and *delegates verification* (that logic lives in auth-jwt-session) — return `UNAUTHENTICATED` (missing/invalid creds) vs `PERMISSION_DENIED` (valid identity, not allowed).
+   - **Metadata** = gRPC's headers/trailers. Lowercase ASCII keys; a key carrying raw bytes must end in `-bin` (e.g. `trace-id-bin`) so the runtime base64-handles it. Carry auth (`authorization: Bearer …`), request id, idempotency key, locale. Never put a deadline in metadata — it's a first-class call property. Reserved `grpc-*` keys are runtime-owned; don't set them yourself.
+   - **TLS always; mTLS for service-to-service.** Never run a non-loopback gRPC server on an insecure channel — h2c in the clear leaks every byte.
+   - **Browser/edge:** native gRPC relies on HTTP/2 trailers and framing that browser APIs don't expose, so expose **Connect** (speaks gRPC, gRPC-Web, *and* JSON over the same handler; easiest) or **gRPC-Web** behind an Envoy/proxy translation layer. Don't try to call raw gRPC from `fetch`.
+6. **Codegen + lint with `buf`, not raw `protoc` — and wire breaking-change detection into CI.** `protoc` plugin/path juggling is the classic footgun; `buf` makes the schema the source of truth.
+   ```yaml
+   # buf.yaml
+   version: v2
+   lint:   { use: [STANDARD] }
+   breaking: { use: [WIRE_JSON] }   # catch tag/type/name breakage
+   ```
+   ```yaml
+   # buf.gen.yaml — one schema, many languages
+   version: v2
+   plugins:
+     - { remote: buf.build/protocolbuffers/go,    out: gen/go,  opt: paths=source_relative }
+     - { remote: buf.build/connectrpc/go,         out: gen/go,  opt: paths=source_relative }
+     - { remote: buf.build/bufbuild/es,           out: gen/ts }
+   ```
+   ```bash
+   buf lint                                   # naming/style/UNSPECIFIED rules
+   buf breaking --against '.git#branch=main'  # FAIL CI on any wire/JSON break
+   buf generate                               # regenerate all stubs from .proto
+   ```
+   Check generated stubs into VCS *or* regenerate in CI; pick one and enforce it, because a stale committed stub that disagrees with the `.proto` is a silent contract drift. Back the contract with a contract test (step 9 in Verify) so the running server and the `.proto` can't diverge.
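The tag byte costs quoted in step 1 follow from how protobuf encodes a field key: `(field_number << 3) | wire_type`, written as a base-128 varint. The sketch below is a minimal, stdlib-only illustration of that arithmetic; the function names (`encode_varint`, `encode_tag`, `tag_size`) are hypothetical and not part of any protobuf library.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint (7 payload bits per byte)."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low7)         # high bit clear: last byte
            return bytes(out)


def encode_tag(field_number: int, wire_type: int) -> bytes:
    """Encode a field key: (field_number << 3) | wire_type, as a varint."""
    if not 1 <= field_number <= 2**29 - 1:
        raise ValueError("field numbers range from 1 to 2**29 - 1")
    if not 0 <= wire_type <= 5:
        raise ValueError("wire types range from 0 to 5")
    return encode_varint((field_number << 3) | wire_type)


def tag_size(field_number: int, wire_type: int = 0) -> int:
    """Number of bytes the field key occupies on the wire."""
    return len(encode_tag(field_number, wire_type))


if __name__ == "__main__":
    for number in (1, 15, 16, 2047, 2048):
        print(number, tag_size(number))
```

Only four bits of the first byte are left for the field number once the three wire-type bits and the continuation bit are taken, which is why the 1-byte range stops at 15.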
+## Common Errors
+- **Reusing or renumbering a field tag.** Old bytes deserialize into the wrong field — silent data corruption, no error. Always `reserved` removed tags *and* names; `buf breaking` catches it if you let it.
+- **Enum without `0 = *_UNSPECIFIED`.** 0 is the wire default, so you can't distinguish "unset" from your first real value, and you can't prepend values later. Always reserve 0 for UNSPECIFIED.
+- **No deadline on the call.** One slow/hung downstream pins server resources indefinitely and cascades into outage. Set an absolute deadline on every client call; propagate, don't reset, downstream.
+- **Returning `INTERNAL`/`UNKNOWN` for everything.** Clients can't tell retryable from fatal and either hammer a down service or give up on a transient blip. Map to the specific canonical code; reserve `INTERNAL` for genuine server bugs.
+- **Retrying non-idempotent RPCs.** A retried `Create`/`Charge` after a timeout double-executes. Restrict the retry policy to idempotent methods; for the rest use a server-deduped idempotency key.
+- **Changing a scalar type to a "compatible-looking" one** (`int32`→`int64`, `optional`→`repeated`). Old readers truncate the wider value or hit an encoding they don't expect (packed repeated scalars) → garbled values. Add a new field instead and migrate.
+- **Renaming a field assumed free, but a Connect/gRPC-JSON client keys on the name.** Wire-safe, JSON-breaking. If any client speaks JSON, treat a rename as breaking.
+- **Calling raw gRPC from a browser.** Native gRPC relies on HTTP/2 trailers that browser APIs don't expose. Use Connect or gRPC-Web through a proxy.
+- **`protoc` plugin/import-path hell producing stale or wrong stubs.** Use `buf` with a remote plugin set so paths and versions are pinned and reproducible.
+- **Insecure (h2c, no TLS) channel in prod.** Everything including bearer tokens is in cleartext. TLS always; mTLS between services.
+- **Breaking-change check missing from CI.** A bad merge ships an incompatible schema and breaks every deployed client. `buf breaking --against '.git#branch=main'` must gate merges.
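The retry rules above (retry only idempotent methods, only on `UNAVAILABLE`/`DEADLINE_EXCEEDED`, with exponential backoff) can be sketched as a small decision function. This is an illustration under stated assumptions, not a gRPC API: status codes are plain strings, the names `should_retry` and `backoff_ceiling` are hypothetical, and the default numbers are placeholders. A real client would normally express the same policy in its service config instead of hand-rolling it.

```python
# Codes the skill treats as retryable; everything else is surfaced to the caller.
RETRYABLE_CODES = frozenset({"UNAVAILABLE", "DEADLINE_EXCEEDED"})


def should_retry(code: str, idempotent: bool, attempts_made: int,
                 max_attempts: int = 4) -> bool:
    """Decide whether another attempt is allowed.

    attempts_made counts attempts already sent (1 after the first failure).
    """
    return (
        idempotent
        and code in RETRYABLE_CODES
        and attempts_made < max_attempts
    )


def backoff_ceiling(attempts_made: int, initial: float = 0.1,
                    multiplier: float = 2.0, max_backoff: float = 1.0) -> float:
    """Upper bound in seconds for the wait before the next attempt.

    Grows exponentially and is capped at max_backoff; a caller would
    typically sleep for a random duration below this ceiling (jitter).
    """
    if attempts_made < 1:
        raise ValueError("attempts_made starts at 1")
    return min(initial * multiplier ** (attempts_made - 1), max_backoff)


if __name__ == "__main__":
    print(should_retry("UNAVAILABLE", idempotent=True, attempts_made=1))
    print(should_retry("UNAVAILABLE", idempotent=False, attempts_made=1))
    print([backoff_ceiling(n) for n in (1, 2, 3, 4, 5)])
```

Non-idempotent methods never pass `should_retry`; for those the skill's answer is a client-generated idempotency key deduped server-side, which this sketch does not model.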
+## Verify
+1. **Lint clean:** `buf lint` passes — every enum has `*_UNSPECIFIED = 0`, fields snake_case, services/methods follow the standard naming rules.
+2. **Breaking-change gate:** `buf breaking --against '.git#branch=main'` is green; deliberately renumber a tag locally and confirm it goes **red** (proves the gate actually fires).
+3. **Codegen reproducible:** `buf generate` from a clean tree produces stubs byte-identical to what's committed (no uncommitted diff) for every target language.
+4. **Wire round-trip across versions:** serialize a message with the *new* schema, parse it with a binary built on the *old* schema (and vice-versa) — no error, no field loss for additive changes. This is the real proof of compatibility, not eyeballing the diff.
+5. **Deadline honored:** a call given a 100ms deadline against an artificially slow method returns `DEADLINE_EXCEEDED` near 100ms (not hanging), and the server logs show it cancelled work rather than running to completion.
+6. **Status mapping:** each error path returns its specific canonical code (asserted in tests), and retry policy retries `UNAVAILABLE`/`DEADLINE_EXCEEDED` only — a deliberate `INVALID_ARGUMENT` is not retried.
+7. **Streaming flow:** a server-stream consumer that cancels mid-stream causes the server's context to cancel and stop producing (no leaked goroutine/thread); a client-stream upload that drops mid-send leaves no half-written state.
+8. **Auth interceptor:** missing token → `UNAUTHENTICATED`; valid token without permission → `PERMISSION_DENIED`; both asserted, and the path runs over TLS (insecure channel rejected).
+9. **Contract test:** a cross-language client built from the generated stub calls the running server end-to-end and gets the expected typed response — proves `.proto`, server, and stubs agree.
+Done = `buf lint` and `buf breaking --against '.git#branch=main'` pass in CI (and the gate provably fails on a real break), `buf generate` leaves no diff, the old↔new wire round-trip and the cross-language contract test both pass, every call sets a deadline, and each error path returns its specific canonical status code over TLS.
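"Propagate the deadline, don't reset it" comes down to carrying one absolute deadline through the call chain and giving each downstream call only the time that is left. The sketch below shows that bookkeeping with the standard library alone; `deadline_after`, `remaining_budget`, and the local `DeadlineExceeded` exception are hypothetical names standing in for whatever the gRPC runtime in use provides.

```python
import time
from typing import Optional


class DeadlineExceeded(Exception):
    """Raised locally when no time is left; stands in for DEADLINE_EXCEEDED."""


def deadline_after(timeout_s: float, now: Optional[float] = None) -> float:
    """Turn a relative timeout into an absolute deadline on the monotonic clock."""
    now = time.monotonic() if now is None else now
    return now + timeout_s


def remaining_budget(deadline: float, now: Optional[float] = None) -> float:
    """Seconds left before the deadline; use this as the downstream timeout.

    Raises DeadlineExceeded instead of starting work that cannot finish in time.
    """
    now = time.monotonic() if now is None else now
    remaining = deadline - now
    if remaining <= 0:
        raise DeadlineExceeded("caller's deadline has already passed")
    return remaining


if __name__ == "__main__":
    deadline = deadline_after(0.5)       # set once, at the edge of the system
    time.sleep(0.1)                      # some local work
    print(remaining_budget(deadline))    # downstream gets what is left, not a fresh 0.5 s
```

The `now` parameter exists only so the arithmetic can be exercised without sleeping; production code would rely on the default monotonic clock.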

package/skills/design-relational-schema/SKILL.md ADDED

@@ -0,0 +1,129 @@
+---
+name: design-relational-schema
+description: Designs a normalized relational schema from requirements — entities, relationships, PK strategy (surrogate bigint vs natural vs UUIDv7/ULID), 1:1/1:N/M:N and inheritance modeling, 3NF/BCNF normalization, invariants encoded as UNIQUE/CHECK/FK/exclusion constraints, and deliberate read-path denormalization with stated consistency tradeoffs.
+when_to_use: When starting a new database or a new table cluster and you need the logical+physical model — turning requirements/an ERD into tables, choosing keys, modeling cardinalities and inheritance, normalizing, then deciding where to denormalize. Distinct from db-migration-safety (altering a live table safely) and optimize-sql-query (speeding up a query against an existing schema).
+---
+## When to Use
+Reach for this skill when you're designing the **shape of the data**, before any table exists:
+- "Model these requirements / this ERD as tables"
+- "Should this PK be a UUID or a bigint? natural or surrogate? composite?"
+- "How do I model users↔roles (M:N) / orders→items (1:N) / a polymorphic comment?"
+- "Normalize this — I've got repeating columns / update anomalies / duplicated data"
+- "Where should I denormalize for a read-heavy dashboard, and what breaks?"
+- Choosing column types: enum vs lookup table, soft vs hard delete, audit columns, money/time precision
+NOT this skill:
+- Changing a table that already has rows/traffic (locks, backfills, rollback) → **db-migration-safety**
+- A query against an existing schema is slow → **optimize-sql-query**
+- You need an append-only, replayable, audit-complete domain model → **design-event-sourcing-cqrs**
+- Computing prices/tax/rounding/FX (the math, not the column type) → **money-decimal-arithmetic**
+- Storing/converting/comparing timestamps & DST correctly → **datetime-timezone-correctness**
+- Shaping items/documents for a non-relational store (DynamoDB/Mongo/Cassandra) around access patterns → **model-nosql-data**
+## Steps
+1. **Extract entities, attributes, relationships from requirements — nouns→tables, verbs→relationships.** List each entity, its attributes, and for every pair the cardinality (1:1 / 1:N / M:N) and optionality (mandatory vs nullable side). Mark each attribute's identity role: is it a candidate key (naturally unique, immutable), or descriptive? Write functional dependencies (`A → B`: A determines B) — they drive normalization in step 3. One table = one entity type; if an attribute is itself a list ("tags", "phone numbers"), it's a separate table, not a CSV column or `jsonb` dumping ground.
+2. **Pick a PK strategy per table — default to surrogate, choose the integer/UUID flavor deliberately.**
+   | Strategy | Use when | Avoid when |
+   |---|---|---|
+   | **`bigint GENERATED ALWAYS AS IDENTITY`** | Single-DB, internal IDs, smallest/fastest index, FK-heavy — **default** | IDs leak to clients/URLs and count/sequence is sensitive; multi-master inserts |
+   | **`uuid` v7 / ULID** (time-ordered) | IDs generated client-side or across shards, exposed in URLs, need merge without collision | You can use bigint and don't expose the ID — 16B vs 8B and bigger indexes |
+   | **`uuid` v4** (random) | Only if unguessability matters *and* you accept index-locality cost | Hot insert paths — random UUIDs fragment B-tree pages and bloat WAL |
+   | **Natural key** (email, ISO code, slug) | Truly immutable, single-attribute, externally governed (`country.iso2`, `currency.code`) | It can ever change or isn't guaranteed unique — a changing PK cascades through every FK |
+   | **Composite key** | Junction tables (`(a_id, b_id)`); rows identified only by the combination | A tempting single surrogate would be simpler and the combo isn't queried as a unit |
+   Rules: use a **surrogate `bigint` IDENTITY by default**; reach for **UUIDv7/ULID (not v4)** the moment IDs cross a process boundary or are client-generated; never expose a sequential surrogate where the sequence is sensitive (use UUIDv7 instead); a natural key still deserves a `UNIQUE` constraint even when you also keep a surrogate PK. Never use `serial`/`SERIAL` (legacy, ownership/permission footguns) — use `GENERATED ALWAYS AS IDENTITY`.
+3. **Normalize 1NF → 2NF → 3NF/BCNF; stop at BCNF.** Eliminate the anomaly classes in order:
+   - **1NF** — atomic columns, no repeating groups, no arrays-as-CSV. Split `phone1, phone2, phone3` and `tags TEXT` into child rows.
+   - **2NF** — no non-key attribute depends on *part* of a composite key. In `order_item(order_id, product_id, product_name)`, `product_name` depends only on `product_id` → move it to `product`.
+   - **3NF** — no transitive dependency (non-key → non-key). `employee(id, dept_id, dept_name)`: `dept_name` depends on `dept_id`, not `id` → split out `department`.
+   - **BCNF** — every determinant is a candidate key. Fixes the rare overlapping-candidate-key case 3NF misses.
+   Target **3NF as the floor, BCNF where a determinant anomaly exists.** Each non-key fact lives in exactly one place; a fact changes via exactly one `UPDATE` to one row. Do **not** model attribute-value pairs generically (EAV: `entity/attribute/value` rows) — it destroys typing, FKs, and constraints; make real typed columns instead.
+4. **Model cardinalities explicitly — the FK lives on the "many" side.**
+   - **1:N** — FK column on the child (many) side pointing at the parent's PK. `order.customer_id → customer.id`. The direction is not a choice: the many side carries the FK.
+   - **M:N** — a junction (associative) table with a composite PK of both FKs: `enrollment(student_id, course_id, PRIMARY KEY(student_id, course_id))`. Relationship attributes (`enrolled_at`, `grade`) live on the junction.
+   - **1:1** — share a PK: the dependent table's PK *is* an FK to the parent (`user_profile.user_id PK REFERENCES user(id)`). Use only for optional/rarely-loaded columns; otherwise just add the columns to the parent.
+   - **Inheritance/polymorphism** — pick one, don't mix:
+     | Pattern | Shape | Use when |
+     |---|---|---|
+     | Single-table | one table, nullable subtype columns, `kind` discriminator | few subtypes, mostly shared columns — **default** |
+     | Class-table | base table + one child table per subtype, shared PK | subtypes have many distinct, NOT-NULL-able columns |
+     | Concrete-table | one full table per subtype, no base | subtypes never queried together |
+     For a polymorphic FK ("comment on a post *or* a photo"), **avoid the nullable-`(target_type, target_id)` pair** — it can't have a real FK. Prefer separate nullable FK columns each with its own real `REFERENCES` plus a `CHECK` that exactly one is set.
+5. **Encode every invariant as a constraint in the DDL, not in app code.** If the database can enforce it, the database enforces it — app checks race and drift.
+   - `NOT NULL` on every column that is logically required (default to NOT NULL; justify each nullable column).
+   - `UNIQUE` on each natural/candidate key and on business-unique combos.
+   - `FOREIGN KEY ... ON DELETE` — choose the action deliberately: `CASCADE` (children are parts of the parent), `RESTRICT`/`NO ACTION` (default; refuse to orphan), `SET NULL` (only if the FK is legitimately optional).
+   - `CHECK` for value rules (`amount_minor >= 0`, `status IN (...)`, `start_at < end_at`).
+   - **Partial unique index** for conditional uniqueness: `CREATE UNIQUE INDEX ON users(email) WHERE deleted_at IS NULL;` (one active email, history allowed).
+   - **Exclusion constraint** for "no overlap" (e.g. no double-booking a room): `EXCLUDE USING gist (room_id WITH =, during WITH &&)`.
+   ```sql
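+   -- btree_gist is required for the scalar equality (room_id WITH =) inside a GiST exclusion constraint
+   CREATE EXTENSION IF NOT EXISTS btree_gist;
+   CREATE TABLE booking (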
+     id         bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
+     room_id    bigint NOT NULL REFERENCES room(id) ON DELETE RESTRICT,
+     guest_id   bigint NOT NULL REFERENCES guest(id) ON DELETE RESTRICT,
+     during     tstzrange NOT NULL,
+     status     text NOT NULL DEFAULT 'held' CHECK (status IN ('held','confirmed','cancelled')),
+     created_at timestamptz NOT NULL DEFAULT now(),
+     updated_at timestamptz NOT NULL DEFAULT now(),
+     CONSTRAINT no_double_booking
+       EXCLUDE USING gist (room_id WITH =, during WITH &&) WHERE (status <> 'cancelled')
+   );
+   ```
+6. **Decide the cross-cutting column conventions once, apply everywhere.**
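+   - **Soft vs hard delete** — default **hard delete** with `ON DELETE` rules. Use soft delete (`deleted_at timestamptz NULL`) only when you must retain history or undo; then *every* uniqueness rule and FK must account for it or you reintroduce duplicates and dangling references: make unique indexes partial (`WHERE deleted_at IS NULL`), and since a foreign key cannot carry a `WHERE` filter, decide explicitly what happens to children of a soft-deleted parent (block the soft delete, or soft-delete them too).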
+   - **Audit columns** — `created_at timestamptz NOT NULL DEFAULT now()`, `updated_at timestamptz NOT NULL DEFAULT now()` (kept current by a trigger), and `created_by/updated_by` FKs where attribution matters. Full row history → separate `*_history` table or → **design-event-sourcing-cqrs**, not bolted onto the live row.
+   - **Enum vs lookup table** — small, fixed, code-coupled set (`status`) → `CHECK (x IN (...))` or a native enum. Editable-by-users or carrying extra attributes (label, sort order, active flag) → a lookup table with an FK. Don't ship a `roles` lookup table of three forever-fixed values; don't ship a CHECK list for something product managers edit weekly.
+   - **Types** — money as `NUMERIC(precision, scale)` or integer minor units, **never `float`/`real`/`double`** (see **money-decimal-arithmetic**); timestamps as `timestamptz` storing UTC instants, never naive `timestamp` (see **datetime-timezone-correctness**); text as `text` (not `varchar(n)` unless a real domain limit exists); booleans as `boolean`, not `0/1` ints or `'Y'/'N'`.
+7. **Denormalize only on a measured read hot path, and write down what you traded.** Start fully normalized; denormalize a specific column **only** when a real, frequent read can't be served cheaply by a join/index. Each denormalization is a stated consistency contract:
+   | Technique | Buys you | Costs you (must be stated) |
+   |---|---|---|
+   | Derived/rollup column (`post.comment_count`) | O(1) read, no aggregate join | Must update on every child write — trigger or app, or it drifts |
+   | Duplicated parent attribute (`order_item.product_name` at sale time) | Stable historical snapshot | Diverges from source by design — that's the point; document it |
+   | Materialized view | Precomputed report | Staleness window; explicit `REFRESH` (concurrently) needed |
+   | Pre-joined wide read table | Single-table dashboard read | Whole second write path to keep in sync |
+   Default: keep it normalized and add an index first. A rollup counter maintained by a trigger is acceptable; copying mutable data you then have to keep in sync in two places is a liability — only when the read win is proven. Record for each: *what's duplicated, who keeps it consistent, and the acceptable staleness*.
+8. **Output the DDL plus an access-pattern → table/index map.** Deliver: (a) `CREATE TABLE` statements with all constraints from steps 5–6, (b) the FK graph, (c) a table mapping each top query/access pattern to the table(s) and the index that serves it (so every hot read has a supporting index and no table has unused indexes). This map is the proof the schema serves the real queries, not just an abstract model.
+## Common Errors
+- **EAV ("flexible schema") tables.** `entity/attribute/value` rows throw away typing, FKs, and constraints and turn every read into a self-join pivot. Use real typed columns; if attributes are genuinely open-ended, a single typed `jsonb` column beats EAV.
+- **Float money.** `price float` loses cents to binary rounding — `0.1 + 0.2 ≠ 0.3`. Use `NUMERIC` or integer minor units; defer the math rules to money-decimal-arithmetic.
+- **Nullable-FK soup / polymorphic `(type, id)`.** A `parent_type text, parent_id bigint` pair can't have a foreign key, so the DB can't stop dangling references. Use separate real FK columns + a `CHECK` that exactly one is non-null.
+- **Natural key as PK that later changes.** Making `email` or a username the PK means a single edit cascades through every referencing FK. Keep a surrogate PK; put `UNIQUE` on the natural key.
+- **Random UUID (v4) PK on a hot insert path.** Random keys scatter B-tree inserts, bloating the index and WAL. Use UUID**v7**/ULID (time-ordered) when you need a UUID, or a `bigint` when the ID isn't exposed.
+- **Soft delete without filtered constraints.** `deleted_at` plus a plain `UNIQUE(email)` blocks a user from re-registering a freed email, and plain FKs still "see" deleted parents. Make uniqueness and lookups partial: `WHERE deleted_at IS NULL`.
+- **Over-normalizing tiny fixed sets.** A 3-value lookup table joined on every query adds a join for no benefit. A `CHECK (x IN (...))` enum is fine for small, code-coupled, rarely-changing sets.
+- **Storing lists in a column.** `tags TEXT` as CSV (or an unindexed array) can't be FK'd, constrained, or joined cleanly. Model it as a child/junction table.
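+- **`varchar(255)` cargo-culting and naive `timestamp`.** An arbitrary length cap rejects legitimate long values (or, in non-strict MySQL, silently truncates them); `timestamp` without time zone stores no offset, so the instant it denotes is ambiguous. Use `text` and `timestamptz`.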
+- **Missing `ON DELETE` action.** Defaulting blindly leaves you with either accidental orphans or surprise cascade deletes. Choose `CASCADE`/`RESTRICT`/`SET NULL` per FK on purpose.
+- **Denormalizing speculatively.** Duplicating data "for speed" before any query proves slow doubles your write paths and invites drift. Normalize first, index, measure, then denormalize the proven hot path.
+## Verify
+1. **3NF check:** For each table, every non-key column depends on the key, the whole key, and nothing but the key. Name any transitive (`non-key → non-key`) or partial dependency you allowed and justify it as a deliberate denormalization — otherwise split it.
+2. **Anomaly probe:** Pick one update, one insert, and one delete per core entity. Confirm each touches exactly one row in one place with no way to leave the data inconsistent (no second copy to forget).
+3. **Constraint coverage:** Every invariant you stated in step 1 maps to an actual `NOT NULL`/`UNIQUE`/`CHECK`/`FK`/exclusion/partial-index in the DDL — not to an app-layer comment. List any invariant *not* enforced by the DB and why.
+4. **Referential integrity:** Every FK names an explicit `ON DELETE` action; no polymorphic `(type, id)` pair lacks a real FK; every junction table has a composite PK of its two FKs.
+5. **Key sanity:** Every table has a PK; no natural key that can change is used as a PK; sequential surrogates aren't exposed where the sequence is sensitive; UUID columns are v7/ULID unless v4 is justified.
+6. **Type sanity:** No money in `float`; timestamps are `timestamptz` (UTC); no CSV/array masquerading as a relationship; enums vs lookup chosen per the step-6 rule.
+7. **Access-pattern map:** Every listed top query is served by an existing index/PK; every index supports at least one stated query (no orphan indexes); each denormalized column has a named owner-of-consistency and a stated staleness bound.
+Done = the schema is at 3NF (BCNF where a determinant anomaly existed) with every stated invariant enforced by a DB constraint, every PK/FK and `ON DELETE` chosen deliberately, no float money / naive timestamps / EAV / polymorphic-FK soup, and an access-pattern→table/index map in which every hot read has a supporting index and every denormalization carries a written consistency tradeoff.

package/skills/design-search-index-infra/SKILL.md ADDED

@@ -0,0 +1,151 @@
+---
+name: design-search-index-infra
+description: Designs full-text and vector search infrastructure — Elasticsearch/OpenSearch mappings and analyzers, vector index parameters (HNSW M/efConstruction, IVF nlist/PQ), BM25+vector hybrid via RRF, offline relevance tuning, capacity/shard topology, and alias-based zero-downtime reindex.
+when_to_use: Building or tuning a search backend — defining a text mapping (analyzers/tokenizers/multi-fields), sizing a vector index for recall-vs-latency-vs-memory, fusing lexical and vector into hybrid search, tuning relevance with offline eval, or planning a zero-downtime reindex. NOT for wiring an LLM retrieval/grounding flow (use rag-pipeline) or keeping the index synced from a DB log (use build-cdc-streaming-pipeline).
+---
+## When to Use
+Reach for this when the request is about **the search index itself** — how documents are mapped, scored, and stored — not the application logic that calls it:
+- "Set up a mapping/analyzer so partial-word and stemmed search works"
+- "Add autocomplete / typeahead / search-as-you-type"
+- "Pick HNSW vs IVF and size `M`/`efConstruction`/`nlist` for N million vectors"
+- "Combine keyword (BM25) and semantic (embedding) search into one ranked list"
+- "Search relevance is bad — boost titles, add synonyms, tune fuzziness"
+- "Reindex 200M docs to a new mapping with no downtime"
+- "How many shards/replicas, what refresh interval, how much heap for the HNSW graph?"
+NOT this skill:
+- Wiring an LLM to answer over the corpus (chunking → embed → retrieve → rerank → ground) → rag-pipeline (this skill builds the index that pipeline queries)
+- Keeping the index in sync with a source DB as rows change → build-cdc-streaming-pipeline
+- Tuning a relational `WHERE`/`JOIN`/`GIN` query plan in Postgres/MySQL → optimize-sql-query
+- Putting a read cache in front of the search cluster → caching-strategy
+- Measuring downstream *answer* quality of an LLM → llm-eval-harness
+## Steps
+1. **Classify the query workload first — it dictates index type. Do not vector-index everything.**
+   | Workload | Example query | Index | Scoring |
+   |---|---|---|---|
+   | Exact / filter | `status=active`, `sku=ABC`, range, faceting | `keyword`/numeric, `doc_values` | none (constant) — wrap in `filter` (cached, no scoring) |
+   | Full-text relevance | "wireless noise cancelling headphones" | `text` + analyzer | BM25 |
+   | Autocomplete / prefix | "wir" → "wireless…" | `search_as_you_type` or edge-ngram | prefix match |
+   | Semantic / fuzzy-intent | "thing to block out plane noise" | `dense_vector` (HNSW) | cosine/dot |
+   | Filtered hybrid | semantic + `brand IN (...)` + `price<200` | text + vector + keyword | RRF fusion + filter |
+   Most real search is the **last row**. Build all three field families in one index; choose per query, not per cluster.
+2. **Full-text mapping — be explicit, never rely on dynamic mapping in prod.** Disable `dynamic` or set `"dynamic": "strict"` so a stray field can't silently become the wrong type. Per field decide: `text` (analyzed, for relevance) vs `keyword` (exact, for filter/sort/agg) — you almost always want **both** via multi-fields:
+   ```json
+   {
+     "mappings": {
+       "dynamic": "strict",
+       "properties": {
+         "title": {
+           "type": "text",
+           "analyzer": "english",                 // stemming + lowercase + stop
+           "fields": {
+             "raw":  { "type": "keyword" },        // exact match / sort / agg
+             "ac":   { "type": "search_as_you_type" }, // typeahead
+             "ngram":{ "type": "text", "analyzer": "edge_ngram_idx",
+                       "search_analyzer": "standard" } // index edge-ngrams, query whole term
+           }
+         },
+         "body":      { "type": "text", "analyzer": "english" },
+         "brand":     { "type": "keyword" },
+         "price":     { "type": "scaled_float", "scaling_factor": 100 },
+         "embedding": { "type": "dense_vector", "dims": 768, "index": true,
+                        "similarity": "cosine",
+                        "index_options": { "type": "hnsw", "m": 16, "ef_construction": 128 } }
+       }
+     }
+   }
+   ```
+   Rules: set `search_analyzer` ≠ index `analyzer` for edge-ngram (index n-grams, search the **whole** query term — otherwise the query is shredded too and precision collapses). `edge_ngram_idx` is a custom analyzer, not a built-in: define it under `settings.analysis` in the same create-index request or the mapping is rejected. Use `english`/language analyzers for stemming; keep a `.raw` keyword for anything you sort, aggregate, or exact-match. Set `"index": false` on fields you only display (saves space). An existing field's type and analyzer are **immutable** — a wrong choice means reindex (step 7), so get it right now.
+3. **Vector index — pick the algorithm by corpus size and recall target. Default HNSW; reach for IVF/PQ only when RAM-bound.**
+   | Param | What it trades | Opinionated default |
+   |---|---|---|
+   | `m` (HNSW edges/node) | recall + memory ↑ vs build time | **16** (32 for high-recall >10M) |
+   | `ef_construction` | build-time recall ↑ vs index speed | **128** (200 if recall short) |
+   | `ef_search`/`num_candidates` | query recall ↑ vs latency | **100**, raise until recall@10 plateaus |
+   | IVF `nlist` (partitions) | speed ↑ vs recall | **≈√N** vectors |
+   | IVF `nprobe` (lists scanned) | recall ↑ vs latency | **nlist/20**, tune up for recall |
+   | PQ (product quant.) | **memory ÷4–16** vs recall ↓ | only when graph won't fit RAM |
+   - **HNSW** = best recall/latency, default for ≤ ~10M vectors per node. The graph lives in **RAM** — budget `~(dims*4 + m*8) bytes × N`; a 768-dim, 10M, m=16 index is `(3072 + 128) × 10M` ≈ **32 GB** (~30 GiB) resident. If it won't fit, go IVF-PQ (FAISS/Milvus) or scalar-quantize (`int8_hnsw` in ES 8.x → ~4× smaller, recall ~unchanged).
+   - **Distance must match how the model was trained.** Normalized embeddings (most sentence-transformers, OpenAI) → `cosine`, or `dot_product` if you pre-normalize vectors to unit length (skips the per-query magnitude divide → faster). Never `l2`/euclidean on cosine-trained vectors — silently wrong ranking, not an error.
+   - `dims` must equal the model's output exactly. Truncating/padding to a "round" number breaks similarity.
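To make the RAM budget in step 3 concrete, here is a minimal TypeScript sketch of the `(dims*4 + m*8) bytes × N` rule of thumb. `hnswResidentBytes` and its `bytesPerComponent` parameter are hypothetical names chosen for illustration, not part of any Elasticsearch or FAISS API, and the result is only the estimate quoted in the text, not a measured figure.

```typescript
// Rough resident-memory estimate for an HNSW index.
// Assumption: bytes per vector = dims * bytesPerComponent + m * 8,
// i.e. the rule of thumb from the text with float32 (4-byte) components.
function hnswResidentBytes(
  dims: number,
  m: number,
  vectorCount: number,
  bytesPerComponent: number = 4 // 4 = float32, 1 = int8 scalar quantization
): number {
  const vectorBytes = dims * bytesPerComponent; // raw vector storage
  const linkBytes = m * 8;                      // approximate graph-link overhead per node
  return (vectorBytes + linkBytes) * vectorCount;
}

const float32Bytes = hnswResidentBytes(768, 16, 10000000);
const int8Bytes = hnswResidentBytes(768, 16, 10000000, 1);

// Report in decimal GB and binary GiB, since the two differ by about 7%.
console.log("float32: " + (float32Bytes / 1e9).toFixed(1) + " GB, " +
  (float32Bytes / Math.pow(2, 30)).toFixed(1) + " GiB");
console.log("int8:    " + (int8Bytes / 1e9).toFixed(1) + " GB");
```

For the 768-dim, 10M-vector, m=16 case the float32 estimate is 3.2e10 bytes. Run the same function with your own `dims`, `m`, and N before indexing, and compare the result against the RAM left over after JVM heap.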
+4. **Hybrid — run BM25 and vector separately, then fuse with RRF. Do not just add raw scores.** BM25 (unbounded, ~0–30) and cosine (0–1) are different scales; summing lets one dominate. **Reciprocal Rank Fusion** uses rank position, not score:
+   ```
+   rrf_score(d) = Σ_q  1 / (k + rank_q(d))        // k=60 default, sum over each retriever
+   ```
+   Use ES `rank: { rrf: {...} }` / OpenSearch `hybrid` query, or compute RRF app-side over the two result sets. Weighted fusion (`α·norm(bm25) + (1-α)·cosine`) works **only if you min-max normalize each list first** and tune α offline (step 5); RRF needs no normalization and is the safer default.
+   **Filtering is where hybrid breaks.** A post-filter (retrieve top-k vectors, *then* drop ones failing `brand`/`price`) can return **0 results** when matches sit at rank 5000 — the recall cliff. Push filters **into** the ANN search (`knn.filter` in ES, `filter` clause in Milvus/Qdrant) so the graph traversal only visits passing nodes. For very selective filters (<1% pass), ANN degrades to near-exhaustive — detect it and fall back to a **brute-force exact** vector scan over the pre-filtered set; it's faster than fighting the graph.
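The RRF formula in step 4 can be computed app-side in a few lines. The sketch below is one possible TypeScript version, assuming each retriever returns a ranked list of unique document IDs; `rrfFuse` and the sample IDs are hypothetical, and `k = 60` is the default quoted in the text.

```typescript
// Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank(d)).
// Assumption: each list holds unique IDs, best match first.
function rrfFuse(
  rankedLists: string[][],
  k: number = 60
): Array<{ id: string; score: number }> {
  const scores: Record<string, number> = Object.create(null);
  rankedLists.forEach((list) => {
    list.forEach((id, index) => {
      const rank = index + 1; // ranks are 1-based
      scores[id] = (scores[id] || 0) + 1 / (k + rank);
    });
  });
  return Object.keys(scores)
    .map((id) => ({ id: id, score: scores[id] }))
    // Highest fused score first; break exact ties by ID for a stable order.
    .sort((a, b) => b.score - a.score || (a.id < b.id ? -1 : 1));
}

// Hypothetical top-3 lists from a BM25 query and a vector query.
const bm25Top = ["d1", "d2", "d3"];
const vectorTop = ["d3", "d1", "d4"];
const fused = rrfFuse([bm25Top, vectorTop]);
console.log(fused.map((r) => r.id).join(", "));
```

Only rank positions enter the sum, so the BM25 and cosine score scales never meet. A document that appears in both lists (here `d1` and `d3`) outranks one that appears in only one.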
+5. **Relevance tuning — change one lever, gate every change on an offline eval set. Never tune by eyeballing one query.**
+   - Levers, in order of leverage: **field boosts** (`title^3 body^1`), **synonyms** (`synonym_graph` filter, expand at *search* time so you can edit without reindex), **fuzziness** (`AUTO` = 0 edits <3 chars, 1 <6, 2 else — never blanket `fuzziness:2`, it wrecks precision), `minimum_should_match`, phrase/proximity boosts, recency/popularity `function_score`.
+   - Build a labeled judgment set (≥50 queries × graded docs) and gate with the **Ranking Evaluation API** (`_rank_eval`) or an offline harness. Metrics: **recall@k** (did we retrieve it at all), **P@k**, **NDCG@10** (rank quality with graded relevance). A change ships only if NDCG/recall **doesn't regress** on the set — local "looks better" is how you trade one query's win for ten silent losses.
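As a rough illustration of the offline gate in step 5, the following TypeScript sketch computes recall@k and NDCG@k for one query from graded judgments. The `Judgments` shape and function names are hypothetical, and the gain formula `(2^grade - 1) / log2(position + 1)` is one common NDCG formulation, assumed here rather than taken from the source.

```typescript
// docId -> graded relevance; 0 means judged irrelevant. Hypothetical format.
type Judgments = Record<string, number>;

// Discounted cumulative gain over the first k gains.
function dcgAtK(gains: number[], k: number): number {
  let sum = 0;
  for (let i = 0; i < Math.min(k, gains.length); i++) {
    const log2Position = Math.log(i + 2) / Math.LN2; // position is 1-based
    sum += (Math.pow(2, gains[i]) - 1) / log2Position;
  }
  return sum;
}

// NDCG@k: DCG of the actual ranking divided by DCG of the ideal ranking.
function ndcgAtK(ranked: string[], judgments: Judgments, k: number): number {
  const gains = ranked.map((id) => judgments[id] || 0); // unjudged = 0
  const ideal = Object.keys(judgments)
    .map((id) => judgments[id])
    .sort((a, b) => b - a);
  const idealDcg = dcgAtK(ideal, k);
  return idealDcg === 0 ? 0 : dcgAtK(gains, k) / idealDcg;
}

// recall@k: fraction of relevant docs (grade > 0) found in the top k.
function recallAtK(ranked: string[], judgments: Judgments, k: number): number {
  const relevant = Object.keys(judgments).filter((id) => judgments[id] > 0);
  if (relevant.length === 0) return 0;
  const topK = ranked.slice(0, k);
  const hits = relevant.filter((id) => topK.indexOf(id) !== -1).length;
  return hits / relevant.length;
}

// Gate: a candidate config ships only if neither metric regresses.
function passesGate(baseline: number, candidate: number): boolean {
  return candidate >= baseline;
}
```

In practice you would average each metric over the whole judgment set and apply `passesGate` to the averages, not to a single query.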
+6. **Capacity / topology — size shards to data, not to instinct.**
+   - **Shard size 20–50 GB** each; target ≤ ~20 shards per GB of JVM heap; heap ≤ 31 GB (compressed-oops). Over-sharding (hundreds of tiny shards) is the #1 cluster-health killer — `shards = ceil(total_primary_GB / 40)`, round to your data-node count.
+   - **Replicas ≥1** for HA and to scale **search** throughput (each replica serves queries); they don't help write throughput. Set `number_of_replicas: 0` during a bulk reindex, restore to ≥1 after — cuts reindex time ~2×.
+   - **`refresh_interval`**: default `1s` is wasteful for write-heavy/bulk loads. Set **`30s`** (or `-1` during pure bulk, then restore) — controls the latency between index and searchability; raise it whenever you don't need sub-second freshness.
+   - Vector graph memory is **separate from and on top of** BM25/heap budgeting (step 3) — size the box for resident HNSW graphs, not just heap.
+7. **Lifecycle — alias-based zero-downtime reindex. The app NEVER names a concrete index.**
+   Apps read/write the alias `products`, which points at `products-v1`. To change mapping/analyzer/`dims`:
+   ```bash
+   # 1. create v2 with the NEW mapping, replicas=0, refresh=-1 (fast bulk)
+   PUT products-v2  { "settings": {"number_of_replicas":0,"refresh_interval":"-1"}, "mappings": {...} }
+   # 2. backfill v1 -> v2 (async; requests_per_second throttles it so live traffic isn't starved; 5000 is a placeholder to tune)
+   POST _reindex?wait_for_completion=false&requests_per_second=5000
+   { "source":{"index":"products-v1","size":5000},
+     "dest":{"index":"products-v2"} }
+   # poll: GET _tasks/<taskId>   — bulk in 5–15k-doc batches; size by tuning until throughput plateaus, not by guessing
+   # 3. restore prod settings, then ATOMIC alias swap (single request = no gap, no double-read)
+   PUT products-v2/_settings { "number_of_replicas":1, "refresh_interval":"30s" }
+   POST _aliases { "actions":[ {"remove":{"index":"products-v1","alias":"products"}},
+                               {"add":   {"index":"products-v2","alias":"products"}} ]}
+   # 4. keep v1 until v2 verified in prod, then delete
+   ```
+   **Catch-up writes during reindex:** rows changed *after* the `_reindex` snapshot are missed. Either dual-write to both indices during the window, or replay the change log from the snapshot timestamp — the source-of-truth → index sync is owned by **build-cdc-streaming-pipeline**; this skill only guarantees the swap is atomic.
+## Common Errors
+- **Dynamic mapping in prod.** First doc with a stringly-typed number makes the field `text`; later range queries silently match nothing. Set `"dynamic": "strict"`.
+- **Wrong distance metric.** Indexing cosine-trained embeddings with `l2`/euclidean returns results — just in the wrong order, with no error. Match `similarity` to the model.
+- **Summing BM25 + cosine scores raw.** Different scales; one retriever dominates. Use RRF, or min-max normalize each list before weighting.
+- **Post-filtering vector results.** `knn` top-k then drop non-matching → empty or thin results when matches rank deep. Push the filter *into* the ANN search; brute-force exact for very selective filters.
+- **`fuzziness: 2` on everything.** Matches "cat"→"car"→"can" — precision tanks. Use `AUTO` (edit distance scaled by term length).
+- **Edge-ngram with the same analyzer at search time.** The query gets shredded into n-grams too, so "wire" matches "window" via the shared prefix gram `wi`. Set `search_analyzer: standard` — index grams, search the whole term.
+- **HNSW graph that doesn't fit RAM.** Once it spills to disk, query latency jumps 10–100×. Compute resident size *before* indexing; quantize (`int8_hnsw`) or go IVF-PQ if it won't fit.
+- **Over-sharding.** 500 shards for 10 GB of data — each shard is a Lucene index with fixed overhead; cluster state bloats, GC thrashes. Aim 20–50 GB/shard.
+- **Reindex with default `refresh_interval` and `replicas≥1`.** Every batch refreshes + replicates → reindex crawls. Set `refresh:-1, replicas:0` during, restore after.
+- **App pinned to a concrete index name.** Any reindex is now downtime + a deploy. Always read/write through an alias from day one.
+- **Tuning relevance on one query.** A title boost that fixes "iphone" can wreck "running shoes review." Gate every change on the offline eval set.
+## Verify
+- **Mapping is explicit & immutable-safe:** `GET <index>/_mapping` shows `dynamic: strict`, every searched field has the intended `type`/`analyzer`, and a `.raw` keyword exists for each sorted/aggregated field.
+- **Analyzer does what you think:** `POST <index>/_analyze {"field":"title","text":"running shoes"}` emits the expected stemmed/lowercased/ngram tokens (e.g. `run`, `shoe`).
+- **Vector metric & dims match the model:** `dims` equals the embedding model's output, `similarity` matches its training; a near-duplicate of an indexed doc returns itself as the #1 nearest neighbor.
+- **Recall measured, not assumed:** kNN results compared against an exact brute-force scan on a sample → **recall@10 ≥ 0.95** at the chosen `ef_search`/`nprobe`; raise the param if below.
+- **Hybrid beats either alone:** on the labeled judgment set, RRF NDCG@10 ≥ max(BM25-only, vector-only), and a query with a hard filter (`brand=X`) still returns relevant, filter-passing results (no recall cliff, no empty set).
+- **Relevance change gated:** `_rank_eval` (or offline harness) shows NDCG@10 and recall@k did **not** regress vs the previous config across all judgment queries.
+- **Topology sane:** shards are 20–50 GB each, heap ≤ 31 GB, `_cluster/health` is `green`, and HNSW graphs fit resident RAM (no disk spill in node stats).
+- **Reindex was truly zero-downtime:** alias flipped in a single `_aliases` call, doc counts reconcile (`v2.count == v1.count + writes-during-window`), and live search returned 200s with no error spike across the swap.
+Done = the index serves the target workload with explicit immutable-safe mapping, measured recall@10 ≥ 0.95 and a non-regressing NDCG@10 on the offline eval set, hybrid+filter returns no empty/cliffed results, and a mapping change can ship via an atomic alias swap with zero search downtime.

package/skills/design-state-machine/SKILL.md ADDED

@@ -0,0 +1,108 @@
+---
+name: design-state-machine
+description: Models a lifecycle (order status, connection, checkout/approval flow, device/job state) as an EXPLICIT finite state machine or statechart instead of boolean-flag soup — enumerate states + events as closed sets, define transitions as a total (state×event)→state function with guards and entry/exit actions, make the current state a single persisted column (not N booleans), reject every undefined (state,event) pair loudly, and reach for hierarchical/parallel/history statecharts (Harel/SCXML semantics, XState v5 setup/createMachine, or a hand-rolled transition table) once flat states explode combinatorially; persist with optimistic-lock guarded transitions, drive side effects from entry actions or an outbox, and test by asserting the full transition matrix including illegal-edge rejection.
+when_to_use: A thing moves through named stages where only some transitions are legal and code is sprouting isPaid/isShipped/isCancelled flags, scattered if-ladders, or "how did it get into THIS state?" bugs — order/payment/subscription status, WebSocket/TCP connection lifecycle, multi-step wizard or approval workflow, document review, or a long-running job. Distinct from design-event-sourcing-cqrs (the append-only event LOG is the source of truth and state is a fold/projection over it; this skill models the state graph itself and may persist only the current state) and async-concurrency-correctness (races/locks/ordering between concurrent tasks; this skill models one entity's legal transitions, then uses a guarded write so concurrent transitions don't corrupt it).
+---
+## When to Use
+Reach for this skill when an entity moves through named stages and only some moves are legal:
+- "Order goes pending → paid → shipped → delivered, can also cancel/refund — model it properly"
+- "We have `isPaid && !isShipped && !isCancelled` checks everywhere and they keep contradicting"
+- "How did this row end up paid AND cancelled?" / "a refund fired on an unpaid order"
+- "Connection lifecycle: connecting → open → reconnecting → closed with backoff"
+- "Multi-step checkout / approval workflow / document review with back-and-forth"
+- "Add a new status and half the if-ladders broke" / "illegal transition slipped through"
+- "Should I use XState, or a transition table, or just an enum?"
+NOT this skill:
+- The append-only **event log is the source of truth** and state is rebuilt by folding events, with separate read models → design-event-sourcing-cqrs (this skill models the legal-transition graph and may persist only the *current* state; you can combine them — an FSM that emits events into a log)
+- **Concurrency** between tasks — locks, ordering, races, async correctness → async-concurrency-correctness (this skill defines one entity's legal moves; it then *uses* a guarded/optimistic write so two concurrent transitions don't corrupt the row)
+- **Distributed mutual exclusion / leader leases** across nodes → distributed-locks-leases
+- **Workflow orchestration across multiple agents/services** (sagas, fan-out, retries) → orchestrate-agent-workflow (use this skill to model each participant's local state)
+- **Idempotent retries** so a replayed transition command is a no-op → idempotency-keys (this skill makes the transition function; that makes invoking it twice safe)
+- The **DB column type / safe migration** to add the status column or new enum value → db-migration-safety; **how the enum evolves** without breaking old readers → schema-evolution-compatibility
+- A **front-end multi-step form's** validation/field state → build-form-validation; client/server cache sync → manage-client-server-state
+## Steps
+1. **Enumerate states and events as two CLOSED sets first — on paper/in a table before any code.** A state is a *named condition the entity rests in* (`pending`, `paid`, `shipped`); an event is a *named trigger that may cause a move* (`Pay`, `Ship`, `Cancel`). Keep them disjoint and finite. The single best diagnostic that you need this skill: you have ≥3 booleans describing one entity and not all `2^n` combinations are valid. `isPaid + isShipped + isCancelled` admits "shipped but not paid" and "paid and cancelled" — nonsense states the type system permits. Replace them with one `status` enum whose values are *exactly* the legal conditions. **Make illegal states unrepresentable.**
+2. **Define the transition as a total function `(state, event) → state` with guards, entry/exit actions — a TABLE, not scattered `if`s.** This table is the entire spec; review it like one. Anything not in the table is illegal by default.
+   | From | Event | Guard (must be true) | To | Entry action (on arrival) |
+   |---|---|---|---|---|
+   | `pending` | `Pay` | amount == order.total | `paid` | capture funds, emit `OrderPaid` |
+   | `pending` | `Cancel` | — | `cancelled` | release inventory |
+   | `paid` | `Ship` | inventory.reserved | `shipped` | create shipment, notify |
+   | `paid` | `Refund` | — | `refunded` | reverse charge |
+   | `shipped` | `Deliver` | — | `delivered` | close order |
+   | `shipped` | `Refund` | within return window | `refunded` | reverse charge, RMA |
+   - **Guard** = boolean precondition; if false, the event is *rejected* (transition does not fire), not an error-state. `pending --Pay[amount≠total]-->` simply doesn't move.
+   - **Entry action** runs on *every* arrival into a state (idempotent, since you may re-enter); **exit action** runs on leaving. Prefer entry actions over per-transition actions so the side effect is tied to *being in* the state, not the path taken.
+   - `delivered`, `cancelled`, `refunded` are **terminal** — no outgoing transitions. Mark them; assert nothing leaves.
+3. **Reject undefined `(state, event)` pairs LOUDLY — the rejection is the feature.** The whole point over flag-soup is that an out-of-order event can't silently corrupt state. The transition function must, for any pair not in the table, return a typed rejection (don't throw for *expected* business rejections; throw/log for *impossible* ones). Distinguish:
+   - **Guard-failed** (legal event, precondition not met) → 409/422, "cannot Ship: inventory not reserved", state unchanged.
+   - **Illegal event for state** (`Ship` while `cancelled`) → 409 + log/metric `illegal_transition{from,event}`; this often signals a real bug (double-click, replayed message, race) and you *want* the alarm.
+   ```ts
+   function transition(s: State, e: Event, ctx): Result<State> {
+     const row = table[s]?.[e.type];
+     if (!row) return reject("illegal", `${e.type} not allowed in ${s}`); // not in table at all
+     if (row.guard && !row.guard(e, ctx)) return reject("guard", row.why);
+     return ok(row.to);
+   }
+   ```
+   Never write `if (status !== 'cancelled') { ... }` ad hoc — that's flag-soup creeping back. Route *every* change through the one function.
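The fragment in step 3 leaves `table`, `reject`, and the types undefined. A self-contained version might look like the sketch below, which encodes the step 2 table for the order example. The type names (`OrderState`, `OrderEvent`, `OrderCtx`, `TransitionResult`) and the context fields are assumptions made for illustration, not taken from the package.

```typescript
type OrderState = "pending" | "paid" | "shipped" | "delivered" | "cancelled" | "refunded";
type OrderEvent =
  | { type: "Pay"; amount: number }
  | { type: "Cancel" }
  | { type: "Ship" }
  | { type: "Refund" }
  | { type: "Deliver" };
// Hypothetical context the guards read from.
interface OrderCtx { total: number; inventoryReserved: boolean; withinReturnWindow: boolean }
type TransitionResult =
  | { ok: true; state: OrderState }
  | { ok: false; kind: "illegal" | "guard"; reason: string };
interface Row { to: OrderState; guard?: (e: OrderEvent, ctx: OrderCtx) => boolean; why?: string }

// The whole spec: any (state, event) pair missing here is illegal by default.
const table: { [S in OrderState]?: { [T in OrderEvent["type"]]?: Row } } = {
  pending: {
    Pay: { to: "paid", guard: (e, ctx) => e.type === "Pay" && e.amount === ctx.total,
           why: "amount must equal order total" },
    Cancel: { to: "cancelled" },
  },
  paid: {
    Ship: { to: "shipped", guard: (_e, ctx) => ctx.inventoryReserved, why: "inventory not reserved" },
    Refund: { to: "refunded" },
  },
  shipped: {
    Deliver: { to: "delivered" },
    Refund: { to: "refunded", guard: (_e, ctx) => ctx.withinReturnWindow, why: "outside return window" },
  },
  // delivered, cancelled, refunded are terminal: no rows, so every event is rejected.
};

function transition(s: OrderState, e: OrderEvent, ctx: OrderCtx): TransitionResult {
  const rows = table[s];
  const row = rows ? rows[e.type] : undefined;
  if (!row) return { ok: false, kind: "illegal", reason: e.type + " not allowed in " + s };
  if (row.guard && !row.guard(e, ctx)) {
    return { ok: false, kind: "guard", reason: row.why || "guard failed" };
  }
  return { ok: true, state: row.to };
}

const ctx: OrderCtx = { total: 50, inventoryReserved: false, withinReturnWindow: true };
console.log(transition("pending", { type: "Pay", amount: 50 }, ctx)); // legal move
console.log(transition("cancelled", { type: "Ship" }, ctx));          // illegal for state
console.log(transition("paid", { type: "Ship" }, ctx));               // guard failed
```

Returning a tagged result instead of throwing lets the caller map `guard` and `illegal` to different HTTP responses and metrics, as the two rejection bullets describe.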
+4. **Persist the current state as ONE column and make the write a guarded compare-and-set so concurrent transitions can't corrupt it.** Store `status` as a single enum/text column with a CHECK or DB enum constraint — not N booleans, not a separate row per flag. The transition write must be conditional on the *expected* from-state (optimistic concurrency), so two racing transitions don't both "succeed":
+   ```sql
+   UPDATE orders SET status = 'shipped', version = version + 1, updated_at = now()
+   WHERE id = $1 AND status = 'paid' AND version = $2;   -- 0 rows affected ⇒ someone else moved it; reject & re-read
+   ```
+   `WHERE status = <expected_from>` is the cheap optimistic lock; 0 rows updated means the precondition no longer holds → reload and re-decide, never blind-overwrite. Add a `status_history(order_id, from_status, to_status, event, actor, at)` audit row in the same transaction so "how did it get here?" is answerable (`from` and `to` are reserved words in SQL, hence the suffixed column names). (For locking semantics → async-concurrency-correctness; for making a re-sent command idempotent → idempotency-keys.)
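To show why the guarded `UPDATE` in step 4 stops two racing transitions from both succeeding, here is an in-memory TypeScript stand-in for the compare-and-set. It is a simulation under stated assumptions, not database code: `guardedUpdate`, `OrderRow`, and the `orderStatus` field are hypothetical, and the function returns an affected-row count the way a SQL driver typically would.

```typescript
interface OrderRow { id: number; orderStatus: string; version: number }

// Stand-in for:
//   UPDATE orders SET status = $to, version = version + 1
//   WHERE id = $id AND status = $from AND version = $expectedVersion
// Returns the number of rows affected (0 or 1).
function guardedUpdate(
  rows: OrderRow[],
  id: number,
  from: string,
  to: string,
  expectedVersion: number
): number {
  let affected = 0;
  rows.forEach((row) => {
    if (row.id === id && row.orderStatus === from && row.version === expectedVersion) {
      row.orderStatus = to;
      row.version += 1;
      affected += 1;
    }
  });
  return affected;
}

const orders: OrderRow[] = [{ id: 1, orderStatus: "paid", version: 7 }];

// Two writers both read (paid, version 7), then race with different transitions.
const shipAffected = guardedUpdate(orders, 1, "paid", "shipped", 7);
const refundAffected = guardedUpdate(orders, 1, "paid", "refunded", 7);

console.log(shipAffected, refundAffected, orders[0].orderStatus, orders[0].version);
```

The first write matches and bumps the version; the second finds neither the expected status nor the expected version, affects 0 rows, and must reload and re-decide instead of overwriting.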
+5. **Drive side effects from entry actions, and make external effects atomic-with-the-state-change via an outbox.** A side effect that must happen *because* you entered a state (charge, email, shipment) belongs in that state's entry action, so it fires on every path into the state and exactly once per entry. But "update status row" + "call Stripe/send email" as two separate operations can crash between them (state changed but effect lost, or effect fired but state unchanged). Write the status change AND an `outbox` row in **one DB transaction**; a relay publishes the outbox at-least-once and consumers deduplicate. This keeps the FSM's state and its observable effects consistent. (Outbox/dedup mechanics → idempotency-keys; emitting domain events into a log instead → design-event-sourcing-cqrs.)
+6. **When flat states explode combinatorially, go hierarchical/parallel/history (statecharts) — don't multiply states.** Harel statecharts (the basis of SCXML and XState) add three tools that kill state explosion:
+   - **Hierarchy (nested/compound states):** group `connecting`/`open`/`reconnecting` under a parent `online`; a single `Disconnect` transition on the parent applies to all children — write the common edge once instead of N times.
+   - **Parallel (orthogonal regions):** independent concerns that vary simultaneously become separate regions instead of the cross-product. A media player's `{playing|paused} × {muted|unmuted}` is 2 regions, not 4 states; add a third dimension and you avoid `2×2×2 = 8`.
+   - **History states:** re-entering a compound state resumes the last active child (`H` shallow / `H*` deep) — for "resume where the wizard left off" or reconnect-to-prior-substate.
+   Rule of thumb: a handful of states and a clear table → **hand-rolled transition table** (zero deps, fully testable, easiest to review). Nesting/parallelism/history, or you want a visualizer and typed `assign` context → **XState v5** (`setup({ types, actions, guards }).createMachine({...})`, `actor.send(event)`, Stately inspector). Cross-language/standards interop → **SCXML**. Don't reach for a library for 3 states; don't hand-roll 4 orthogonal regions.
+7. **Model time/retries as real states + events, not sleeps buried in code.** `reconnecting` with a `backoff` timer is a state; the timer firing is an event (`RetryTimeout`) that transitions back to `connecting` (or to `failed` after `attempts >= max`, a guard on context). Keep the retry count in machine context, not a module global. This makes the backoff policy reviewable in the table and testable without real clocks. (The retry *policy* — jitter, budget, circuit-breaker → resilience-timeouts-retries; this skill places those decisions as guarded transitions in the lifecycle.)
+8. **Visualize and review the graph; treat unreachable/trap states as bugs.** Generate a diagram from the table (XState → Stately inspector; hand-rolled → emit Mermaid `stateDiagram-v2`, see mermaid-diagram) and inspect it for three defects: a **trap** (a non-terminal state with no outgoing edge, so the entity gets stuck), an **unreachable** state (a state other than the initial one with no incoming edge, i.e. a dead enum value), and a **false terminal** (a "done" state that still has outgoing edges). Every non-terminal state should have at least one path to a terminal/expected state. A new status added without table edges shows up immediately as an orphan node.
+9. **Test the full transition MATRIX, including the illegal edges — that's the differentiator.** For every `(state, event)` pair: assert legal ones land in the right target and run the entry action exactly once; assert *illegal* ones leave state unchanged and return the typed rejection (and emit the `illegal_transition` metric). Property test: from any reachable state, applying any event either transitions per the table or rejects — it never produces a state outside the enum. Add a concurrency test: two parallel `Ship` on the same `paid` order → exactly one wins (guarded UPDATE), the other gets 0-rows-and-reject. (Structure the suite → write-tests; the matrix-as-cases is a natural fit for test-data-factories/property-based-testing.)
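Steps 3 and 9 can be sketched together as a hand-rolled table plus a walk over the full `(state, event)` matrix. This is a minimal illustration, not the package's implementation: the order lifecycle, the `inventoryReserved` context field, and the helper name `walkMatrix` are all assumptions made up for the example.

```typescript
// Hypothetical order lifecycle; every name below is illustrative.
const STATES = ["pending", "paid", "shipped", "cancelled"] as const;
const EVENTS = ["Pay", "Ship", "Cancel"] as const;
type State = (typeof STATES)[number];
type EventType = (typeof EVENTS)[number];

interface Ctx { inventoryReserved: boolean }
interface Row { to: State; guard?: (ctx: Ctx) => boolean; why?: string }

type Result =
  | { ok: true; state: State }
  | { ok: false; kind: "illegal" | "guard"; reason: string };

// The table is the spec: any pair not listed here is illegal.
const table: Partial<Record<State, Partial<Record<EventType, Row>>>> = {
  pending: { Pay: { to: "paid" }, Cancel: { to: "cancelled" } },
  paid: {
    Ship: { to: "shipped", guard: (c) => c.inventoryReserved, why: "inventory not reserved" },
    Cancel: { to: "cancelled" },
  },
};

// Total over State x EventType: always returns a typed result, never throws.
function transition(s: State, e: EventType, ctx: Ctx): Result {
  const row = table[s]?.[e];
  if (!row) return { ok: false, kind: "illegal", reason: `${e} not allowed in ${s}` };
  if (row.guard && !row.guard(ctx)) {
    return { ok: false, kind: "guard", reason: row.why ?? "guard failed" };
  }
  return { ok: true, state: row.to };
}

// Walk every (state, event) cell and classify it, so a test can assert on
// the legal edges AND the illegal ones.
function walkMatrix(ctx: Ctx): { legal: string[]; rejected: string[] } {
  const legal: string[] = [];
  const rejected: string[] = [];
  for (const s of STATES) {
    for (const e of EVENTS) {
      const r = transition(s, e, ctx);
      if (r.ok) legal.push(`${s}+${e}->${r.state}`);
      else rejected.push(`${s}+${e}:${r.kind}`);
    }
  }
  return { legal, rejected };
}
```

Because the table is plain data, the same walk doubles as the property check: every cell either lands on a value from `STATES` or comes back as a typed rejection.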
+## Common Errors
+- **Boolean-flag soup (`isPaid && !isShipped && !isCancelled`).** N booleans encode `2^n` combinations but only a few are legal; contradictory states ("shipped, not paid") become representable and *do* happen. Fix: one `status` enum = exactly the legal conditions; make illegal states unrepresentable.
+- **Transition logic scattered across `if`-ladders in controllers/services.** No single place owns "what's legal"; a new caller forgets a guard. Fix: one transition function + table; route 100% of changes through it.
+- **Silently ignoring out-of-order events.** `if (status === 'paid') ship()` with no `else` swallows a `Ship` on a `cancelled` order — masking double-clicks, replays, races. Fix: explicit reject + log/metric `illegal_transition`; the alarm is the value.
+- **Blind `UPDATE ... SET status = 'shipped' WHERE id = ?`.** No from-state guard → a stale/concurrent writer overwrites a state it never saw. Fix: `WHERE status = <expected_from> AND version = ?`; 0 rows ⇒ re-read and re-decide.
+- **Side effect outside the state transaction.** Charge fires, then the status write crashes (or vice versa) → state and effect diverge. Fix: status change + outbox row in one transaction; relay publishes; consumers dedup (idempotency-keys).
+- **Entry action that isn't idempotent.** Re-entering a state (retry, replay) double-sends the email/double-charges. Fix: idempotent entry actions, or gate the effect on the *transition* having actually committed.
+- **State explosion from flattening orthogonal concerns.** Modeling `{playing,paused}×{muted,unmuted}` as 4 flat states, then 8, then 16. Fix: parallel regions (one per independent concern); they compose instead of multiply.
+- **Reaching for a heavy library for 3 states** (or hand-rolling 4 orthogonal regions). Fix: match the tool to the shape — table for flat/small, XState for hierarchy/parallel/history, SCXML for cross-language.
+- **Trap / unreachable states.** A non-terminal state with no exit (stuck forever) or an enum value with no incoming edge (dead). Fix: visualize the graph; assert reachability and that every non-terminal has an outgoing edge.
+- **Timers/retries as `sleep()` buried in handlers.** Backoff logic invisible to the spec, untestable without real time. Fix: model `reconnecting`/`RetryTimeout` as state+event with the attempt count in context.
+- **Adding a status without updating the table.** New enum value, old if-ladders don't handle it → falls through to a default branch silently. Fix: the table is the spec; a new state with no edges is an orphan the visualizer/tests catch.
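The fix for buried `sleep()` calls can be sketched as a pure step function in which the retry timer is just another event. This is an assumed, simplified connection lifecycle: the `Connected` and `ConnectionLost` event names, the `Machine` shape, and the `run` helper are hypothetical, and the backoff delay itself is left to the caller, which schedules `RetryTimeout`.

```typescript
// Hypothetical connection lifecycle; retries are states + events, not sleeps.
type ConnState = "connecting" | "open" | "reconnecting" | "failed";
type ConnEvent = "Connected" | "ConnectionLost" | "RetryTimeout";

// The attempt count lives in machine context, not in a module global.
interface Machine { state: ConnState; attempts: number }

type StepResult =
  | { ok: true; next: Machine }
  | { ok: false; reason: string };

// Pure function: no clock, no I/O. The caller owns the timer and sends
// RetryTimeout when it fires, so tests never wait on real time.
function step(m: Machine, e: ConnEvent, maxAttempts: number): StepResult {
  switch (`${m.state}:${e}`) {
    case "connecting:Connected":
      return { ok: true, next: { state: "open", attempts: 0 } };
    case "connecting:ConnectionLost":
    case "open:ConnectionLost":
      return { ok: true, next: { state: "reconnecting", attempts: m.attempts } };
    case "reconnecting:RetryTimeout":
      // Guard on context: give up once the retry budget is spent.
      return m.attempts >= maxAttempts
        ? { ok: true, next: { state: "failed", attempts: m.attempts } }
        : { ok: true, next: { state: "connecting", attempts: m.attempts + 1 } };
    default:
      return { ok: false, reason: `${e} not allowed in ${m.state}` };
  }
}

// Fold a sequence of events, recording every rejection instead of dropping it.
function run(events: ConnEvent[], maxAttempts: number): { machine: Machine; rejections: string[] } {
  let machine: Machine = { state: "connecting", attempts: 0 };
  const rejections: string[] = [];
  for (const e of events) {
    const r = step(machine, e, maxAttempts);
    if (r.ok) machine = r.next;
    else rejections.push(r.reason);
  }
  return { machine, rejections };
}
```

With the policy expressed this way, "fails after N retries" becomes an ordinary assertion over a list of events rather than a test that has to advance a clock.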
+## Verify
+1. **No flag soup:** grep the diff for `is<X> && !is<Y>`-style combinations on one entity; the state is a single enum/column, and contradictory combinations are no longer representable.
+2. **One transition function:** every status mutation routes through the single `transition(state,event)`; no ad-hoc `SET status =` or `if (status !== ...)` outside it (grep for stray status writes).
+3. **Illegal edges rejected:** for the full `(state,event)` matrix, illegal pairs leave state unchanged and return the typed rejection + emit `illegal_transition`; guard-failures return 409/422 with a reason, not a crash.
+4. **Legal edges + entry actions:** each table transition lands in the correct target and runs its entry action exactly once (assert via spy/counter), even on a re-entry path.
+5. **Guarded persistence:** the UPDATE is conditional on the expected from-state (and version); a test with two concurrent transitions on the same row shows exactly one commits, the other gets 0-rows-and-reject.
+6. **Atomic effects:** status change and external-effect publish are in one transaction (outbox); kill the process between them → on restart the relay still publishes (effect recorded iff state changed), no orphan/lost effect.
+7. **Graph is sound:** generated diagram has no trap (non-terminal with no exit), no unreachable state (no path from the initial state), terminals truly terminal; every non-terminal reaches a terminal.
+8. **Statechart features (if used):** a parent transition applies to all nested children (written once); parallel regions vary independently; a history state resumes the prior child.
+9. **Property test holds:** from any reachable state, any event either transitions per the table or rejects — never yields a value outside the state enum.
+Done = the lifecycle is one persisted enum driven by a single total transition function with explicit guards and entry/exit actions, every illegal `(state,event)` pair is rejected loudly (not silently swallowed), persistence is a from-state-guarded compare-and-set, side effects are atomic with the state change via an outbox, hierarchy/parallel/history are used only where flat states would explode, and the full transition matrix — legal AND illegal edges plus the concurrent-write race — is proven by the tests in checks 3–9.
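The graph-soundness check (check 7) lends itself to automation. The sketch below assumes the transition table has already been flattened into an edge list; the `Graph` shape, the function names, and the sample lifecycle (including the deliberately broken `refunding` and `on_hold` states) are hypothetical, while the emitted text follows Mermaid's `stateDiagram-v2` syntax.

```typescript
interface Edge { from: string; event: string; to: string }
interface Graph { states: string[]; initial: string; terminals: string[]; edges: Edge[] }

// Trap: a non-terminal state with no outgoing edge.
function findTraps(g: Graph): string[] {
  return g.states.filter(
    (s) => !g.terminals.includes(s) && !g.edges.some((e) => e.from === s),
  );
}

// Unreachable: no path from the initial state (graph search over the edges).
function findUnreachable(g: Graph): string[] {
  const seen = new Set<string>([g.initial]);
  const pending: string[] = [g.initial];
  for (;;) {
    const s = pending.pop();
    if (s === undefined) break;
    for (const e of g.edges) {
      if (e.from === s && !seen.has(e.to)) {
        seen.add(e.to);
        pending.push(e.to);
      }
    }
  }
  return g.states.filter((s) => !seen.has(s));
}

// Terminals must be truly terminal: flag any that still have outgoing edges.
function findLeakyTerminals(g: Graph): string[] {
  return g.terminals.filter((t) => g.edges.some((e) => e.from === t));
}

// Emit a Mermaid stateDiagram-v2 document for visual review.
function toMermaid(g: Graph): string {
  const lines = ["stateDiagram-v2", `  [*] --> ${g.initial}`];
  for (const e of g.edges) lines.push(`  ${e.from} --> ${e.to}: ${e.event}`);
  for (const t of g.terminals) lines.push(`  ${t} --> [*]`);
  return lines.join("\n");
}

// Sample graph with two planted defects: `refunding` has no exit,
// and nothing ever leads to `on_hold`.
const example: Graph = {
  states: ["pending", "paid", "shipped", "cancelled", "refunding", "on_hold"],
  initial: "pending",
  terminals: ["shipped", "cancelled"],
  edges: [
    { from: "pending", event: "Pay", to: "paid" },
    { from: "pending", event: "Cancel", to: "cancelled" },
    { from: "paid", event: "Ship", to: "shipped" },
    { from: "paid", event: "Refund", to: "refunding" },
    { from: "on_hold", event: "Release", to: "paid" },
  ],
};
```

Running the three finders in CI and failing on any non-empty result turns the visual review into a regression check; the Mermaid output remains useful for the human pass.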