npm - sanook-cli - Versions diffs - 0.4.0 → 0.5.1 - Mend

sanook-cli 0.4.0 → 0.5.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (238)

package/.env.example +19 -0
package/CHANGELOG.md +173 -0
package/README.md +153 -20
package/README.th.md +136 -0
package/dist/agentContext.js +4 -0
package/dist/approval.js +6 -0
package/dist/bin.js +405 -57
package/dist/brain.js +92 -59
package/dist/brand.js +47 -0
package/dist/checkpoint.js +37 -0
package/dist/commands.js +86 -6
package/dist/compaction.js +76 -5
package/dist/config.js +100 -12
package/dist/cost.js +60 -3
package/dist/doctor.js +92 -0
package/dist/gateway/auth.js +2 -2
package/dist/gateway/ledger.js +2 -2
package/dist/gateway/scheduler.js +1 -0
package/dist/gateway/serve.js +6 -4
package/dist/gateway/server.js +10 -2
package/dist/git.js +11 -2
package/dist/hooks.js +43 -17
package/dist/knowledge.js +48 -49
package/dist/loop.js +182 -66
package/dist/lsp/client.js +173 -0
package/dist/lsp/framing.js +56 -0
package/dist/lsp/index.js +138 -0
package/dist/lsp/servers.js +82 -0
package/dist/mcp-server.js +244 -0
package/dist/mcp.js +184 -29
package/dist/memory-store.js +559 -0
package/dist/memory.js +143 -29
package/dist/orchestrate.js +150 -0
package/dist/providers/codex.js +21 -7
package/dist/providers/keys.js +3 -2
package/dist/providers/models.js +22 -6
package/dist/providers/registry.js +155 -1
package/dist/repomap.js +93 -0
package/dist/search/chunk.js +158 -0
package/dist/search/embed-store.js +187 -0
package/dist/search/engine.js +203 -0
package/dist/search/fuse.js +35 -0
package/dist/search/index-core.js +187 -0
package/dist/search/indexer.js +241 -0
package/dist/search/store.js +77 -0
package/dist/session.js +42 -8
package/dist/skill-install.js +10 -10
package/dist/skills.js +12 -9
package/dist/summarize.js +31 -0
package/dist/tools/bash.js +21 -2
package/dist/tools/diagnostics.js +41 -0
package/dist/tools/edit.js +29 -7
package/dist/tools/index.js +8 -1
package/dist/tools/list.js +7 -2
package/dist/tools/permission.js +90 -9
package/dist/tools/read.js +23 -4
package/dist/tools/remember.js +1 -1
package/dist/tools/sandbox.js +61 -0
package/dist/tools/search.js +105 -4
package/dist/tools/task.js +195 -29
package/dist/tools/timeout.js +35 -0
package/dist/tools/util.js +10 -0
package/dist/tools/write.js +6 -4
package/dist/trust.js +89 -0
package/dist/ui/app.js +228 -31
package/dist/ui/banner.js +4 -9
package/dist/ui/brain-wizard.js +2 -2
package/dist/ui/history.js +30 -0
package/dist/ui/mentions.js +44 -0
package/dist/ui/render.js +55 -15
package/dist/ui/setup.js +97 -12
package/dist/ui/useEditor.js +83 -0
package/dist/update.js +114 -0
package/dist/worktree.js +173 -0
package/package.json +11 -5
package/scripts/postinstall.mjs +33 -0
package/second-brain/.agents/_Index.md +30 -0
package/second-brain/.agents/skills/_Index.md +30 -0
package/second-brain/.agents/workflows/_Index.md +30 -0
package/second-brain/AGENTS.md +4 -4
package/second-brain/Acceptance/_Index.md +30 -0
package/second-brain/Acceptance/golden-case-template.md +39 -0
package/second-brain/Areas/_Index.md +30 -0
package/second-brain/Bugs/System-OS/_Index.md +30 -0
package/second-brain/Bugs/_Index.md +30 -0
package/second-brain/CLAUDE.md +4 -1
package/second-brain/Checklists/_Index.md +30 -0
package/second-brain/Checklists/preflight-postflight-template.md +29 -0
package/second-brain/Distillations/_Index.md +30 -0
package/second-brain/Entities/_Index.md +30 -0
package/second-brain/Entities/entity-template.md +33 -0
package/second-brain/Evals/_Index.md +30 -0
package/second-brain/Evals/correction-pairs.md +24 -0
package/second-brain/Evals/failure-taxonomy.md +24 -0
package/second-brain/Evals/golden-set.md +25 -0
package/second-brain/Evals/quality-ledger.md +23 -0
package/second-brain/Evals/self-eval-rubric.md +23 -0
package/second-brain/GEMINI.md +4 -4
package/second-brain/Goals/_Index.md +30 -0
package/second-brain/Handoffs/_Index.md +30 -0
package/second-brain/Home.md +7 -0
package/second-brain/Intake/Raw Sources/_Index.md +30 -0
package/second-brain/Intake/_Index.md +30 -0
package/second-brain/Intake/_Quarantine/_Index.md +30 -0
package/second-brain/Learning/_Index.md +30 -0
package/second-brain/Playbooks/_Index.md +30 -0
package/second-brain/Playbooks/playbook-template.md +23 -0
package/second-brain/Projects/_Index.md +30 -0
package/second-brain/Prompts/_Index.md +30 -0
package/second-brain/README.md +2 -1
package/second-brain/Research/_Index.md +30 -0
package/second-brain/Retrospectives/_Index.md +30 -0
package/second-brain/Reviews/_Index.md +30 -0
package/second-brain/Runbooks/_Index.md +30 -0
package/second-brain/Runbooks/eval-loop.md +24 -0
package/second-brain/Sessions/_Index.md +30 -0
package/second-brain/Shared/AI-Context-Index.md +20 -0
package/second-brain/Shared/AI-Threads/_Index.md +30 -0
package/second-brain/Shared/Archive/_Index.md +30 -0
package/second-brain/Shared/Assets/_Index.md +30 -0
package/second-brain/Shared/Context-Packs/_Index.md +30 -0
package/second-brain/Shared/Context7-Docs/_Index.md +30 -0
package/second-brain/Shared/Coordination/NOW.md +28 -0
package/second-brain/Shared/Coordination/_Index.md +30 -0
package/second-brain/Shared/Coordination/agent-registry.md +24 -0
package/second-brain/Shared/Coordination/task-board/_Index.md +30 -0
package/second-brain/Shared/Coordination/task-board/task-template.md +43 -0
package/second-brain/Shared/Coordination/task-board.md +32 -0
package/second-brain/Shared/Core-Facts/_Index.md +30 -0
package/second-brain/Shared/Decision-Memory/_Index.md +30 -0
package/second-brain/Shared/Glossary/_Index.md +30 -0
package/second-brain/Shared/Memory-Inbox/_Index.md +30 -0
package/second-brain/Shared/Operating-State/_Index.md +30 -0
package/second-brain/Shared/Prompting/_Index.md +30 -0
package/second-brain/Shared/Provenance/_Index.md +30 -0
package/second-brain/Shared/Rules/_Index.md +30 -0
package/second-brain/Shared/Rules/contextual-note-rule.md +30 -0
package/second-brain/Shared/Rules/frontmatter-standard.md +10 -0
package/second-brain/Shared/Rules/memory-write-protocol.md +28 -0
package/second-brain/Shared/Rules/procedural-runbook-header.md +40 -0
package/second-brain/Shared/Rules/review-and-staleness-policy.md +22 -0
package/second-brain/Shared/Rules/rules-formatting.md +34 -0
package/second-brain/Shared/Scripts/_Index.md +30 -0
package/second-brain/Shared/Scripts-Archive/_Index.md +30 -0
package/second-brain/Shared/Tech-Standards/_Index.md +30 -0
package/second-brain/Shared/Tech-Standards/verification-standard.md +40 -0
package/second-brain/Shared/User-Memory/_Index.md +30 -0
package/second-brain/Shared/User-Persona/_Index.md +30 -0
package/second-brain/Shared/User-Persona/owner-profile.md +25 -0
package/second-brain/Shared/Working-Memory/_Index.md +30 -0
package/second-brain/Shared/_Index.md +30 -0
package/second-brain/Shared/mcp-servers/_Index.md +30 -0
package/second-brain/Skills/_Index.md +30 -0
package/second-brain/Templates/_Index.md +30 -0
package/second-brain/Templates/bug.md +2 -0
package/second-brain/Templates/handoff.md +2 -0
package/second-brain/Templates/session.md +2 -0
package/second-brain/Tools/_Index.md +30 -0
package/second-brain/Traces/_Index.md +30 -0
package/second-brain/Vault Structure Map.md +33 -1
package/second-brain/copilot/_Index.md +30 -0
package/skills/audit-license-compliance/SKILL.md +117 -0
package/skills/author-codemod/SKILL.md +110 -0
package/skills/build-audit-logging/SKILL.md +112 -0
package/skills/build-cdc-streaming-pipeline/SKILL.md +123 -0
package/skills/build-cli-tool/SKILL.md +108 -0
package/skills/build-data-table/SKILL.md +141 -0
package/skills/build-native-mobile-ui/SKILL.md +154 -0
package/skills/build-offline-first-sync/SKILL.md +118 -0
package/skills/build-realtime-channel/SKILL.md +122 -0
package/skills/build-vector-search/SKILL.md +131 -0
package/skills/compose-local-dev-stack/SKILL.md +149 -0
package/skills/configure-bundler-build/SKILL.md +166 -0
package/skills/configure-dns-tls/SKILL.md +142 -0
package/skills/configure-reverse-proxy-lb/SKILL.md +129 -0
package/skills/configure-security-headers-csp/SKILL.md +122 -0
package/skills/contract-testing/SKILL.md +140 -0
package/skills/datetime-timezone-correctness/SKILL.md +125 -0
package/skills/debug-ci-pipeline-failure/SKILL.md +134 -0
package/skills/debug-flaky-tests/SKILL.md +128 -0
package/skills/defend-llm-prompt-injection/SKILL.md +110 -0
package/skills/deliver-webhooks/SKILL.md +116 -0
package/skills/design-api-pagination/SKILL.md +144 -0
package/skills/design-authorization-model/SKILL.md +119 -0
package/skills/design-backup-dr-recovery/SKILL.md +113 -0
package/skills/design-event-sourcing-cqrs/SKILL.md +143 -0
package/skills/design-multi-tenancy/SKILL.md +100 -0
package/skills/design-protobuf-grpc-service/SKILL.md +146 -0
package/skills/design-relational-schema/SKILL.md +129 -0
package/skills/design-search-index-infra/SKILL.md +151 -0
package/skills/design-state-machine/SKILL.md +108 -0
package/skills/design-token-system/SKILL.md +109 -0
package/skills/distributed-locks-leases/SKILL.md +120 -0
package/skills/encrypt-sensitive-data/SKILL.md +148 -0
package/skills/feature-flags-rollout/SKILL.md +130 -0
package/skills/file-upload-object-storage/SKILL.md +107 -0
package/skills/fuzz-dynamic-security-test/SKILL.md +111 -0
package/skills/harden-llm-app-reliability/SKILL.md +126 -0
package/skills/i18n-localization-setup/SKILL.md +113 -0
package/skills/idempotency-keys/SKILL.md +107 -0
package/skills/implement-push-notifications/SKILL.md +142 -0
package/skills/ingest-webhook-secure/SKILL.md +120 -0
package/skills/integrate-oauth-oidc/SKILL.md +126 -0
package/skills/load-stress-test/SKILL.md +129 -0
package/skills/map-privacy-data-gdpr/SKILL.md +146 -0
package/skills/model-nosql-data/SKILL.md +118 -0
package/skills/money-decimal-arithmetic/SKILL.md +123 -0
package/skills/monitor-ml-drift/SKILL.md +109 -0
package/skills/numeric-precision-units/SKILL.md +144 -0
package/skills/optimize-llm-cost-latency/SKILL.md +103 -0
package/skills/optimize-react-rerenders/SKILL.md +124 -0
package/skills/orchestrate-agent-workflow/SKILL.md +100 -0
package/skills/payments-billing-integration/SKILL.md +114 -0
package/skills/pin-toolchain-versions/SKILL.md +116 -0
package/skills/plan-strangler-migration/SKILL.md +95 -0
package/skills/property-based-testing/SKILL.md +108 -0
package/skills/publish-package-registry/SKILL.md +130 -0
package/skills/recover-git-state/SKILL.md +119 -0
package/skills/remediate-web-vulnerabilities/SKILL.md +125 -0
package/skills/resilience-timeouts-retries/SKILL.md +104 -0
package/skills/resolve-merge-rebase-conflict/SKILL.md +97 -0
package/skills/rewrite-git-history/SKILL.md +109 -0
package/skills/scaffold-cross-platform-app/SKILL.md +137 -0
package/skills/schema-evolution-compatibility/SKILL.md +121 -0
package/skills/send-transactional-email/SKILL.md +126 -0
package/skills/serve-deploy-ml-model/SKILL.md +107 -0
package/skills/setup-cdn-edge-waf/SKILL.md +107 -0
package/skills/setup-devcontainer-env/SKILL.md +131 -0
package/skills/setup-lint-format-precommit/SKILL.md +140 -0
package/skills/setup-monorepo-tooling/SKILL.md +125 -0
package/skills/ship-mobile-app-store-release/SKILL.md +137 -0
package/skills/structured-output-llm/SKILL.md +86 -0
package/skills/supply-chain-sbom-provenance/SKILL.md +120 -0
package/skills/test-data-factories/SKILL.md +158 -0
package/skills/threat-model-stride/SKILL.md +123 -0
package/skills/train-evaluate-ml-model/SKILL.md +109 -0
package/skills/unicode-text-correctness/SKILL.md +109 -0
package/skills/visual-regression-testing/SKILL.md +120 -0

package/skills/design-search-index-infra/SKILL.md ADDED

@@ -0,0 +1,151 @@
+---
+name: design-search-index-infra
+description: Designs full-text and vector search infrastructure — Elasticsearch/OpenSearch mappings and analyzers, vector index parameters (HNSW M/efConstruction, IVF nlist/PQ), BM25+vector hybrid via RRF, offline relevance tuning, capacity/shard topology, and alias-based zero-downtime reindex.
+when_to_use: Building or tuning a search backend — defining a text mapping (analyzers/tokenizers/multi-fields), sizing a vector index for recall-vs-latency-vs-memory, fusing lexical and vector into hybrid search, tuning relevance with offline eval, or planning a zero-downtime reindex. NOT for wiring an LLM retrieval/grounding flow (use rag-pipeline) or keeping the index synced from a DB log (use build-cdc-streaming-pipeline).
+---
+## When to Use
+Reach for this when the request is about **the search index itself** — how documents are mapped, scored, and stored — not the application logic that calls it:
+- "Set up a mapping/analyzer so partial-word and stemmed search works"
+- "Add autocomplete / typeahead / search-as-you-type"
+- "Pick HNSW vs IVF and size `M`/`efConstruction`/`nlist` for N million vectors"
+- "Combine keyword (BM25) and semantic (embedding) search into one ranked list"
+- "Search relevance is bad — boost titles, add synonyms, tune fuzziness"
+- "Reindex 200M docs to a new mapping with no downtime"
+- "How many shards/replicas, what refresh interval, how much heap for the HNSW graph?"
+NOT this skill:
+- Wiring an LLM to answer over the corpus (chunking → embed → retrieve → rerank → ground) → rag-pipeline (this skill builds the index that pipeline queries)
+- Keeping the index in sync with a source DB as rows change → build-cdc-streaming-pipeline
+- Tuning a relational `WHERE`/`JOIN`/`GIN` query plan in Postgres/MySQL → optimize-sql-query
+- Putting a read cache in front of the search cluster → caching-strategy
+- Measuring downstream *answer* quality of an LLM → llm-eval-harness
+## Steps
+1. **Classify the query workload first — it dictates index type. Do not vector-index everything.**
+   | Workload | Example query | Index | Scoring |
+   |---|---|---|---|
+   | Exact / filter | `status=active`, `sku=ABC`, range, faceting | `keyword`/numeric, `doc_values` | none (constant) — wrap in `filter` (cached, no scoring) |
+   | Full-text relevance | "wireless noise cancelling headphones" | `text` + analyzer | BM25 |
+   | Autocomplete / prefix | "wir" → "wireless…" | `search_as_you_type` or edge-ngram | prefix match |
+   | Semantic / fuzzy-intent | "thing to block out plane noise" | `dense_vector` (HNSW) | cosine/dot |
+   | Filtered hybrid | semantic + `brand IN (...)` + `price<200` | text + vector + keyword | RRF fusion + filter |
+   Most real search is the **last row**. Build all three field families in one index; choose per query, not per cluster.
+2. **Full-text mapping — be explicit, never rely on dynamic mapping in prod.** Disable `dynamic` or set `"dynamic": "strict"` so a stray field can't silently become the wrong type. Per field decide: `text` (analyzed, for relevance) vs `keyword` (exact, for filter/sort/agg) — you almost always want **both** via multi-fields:
+   ```json
+   {
+     "mappings": {
+       "dynamic": "strict",
+       "properties": {
+         "title": {
+           "type": "text",
+           "analyzer": "english",                 // stemming + lowercase + stop
+           "fields": {
+             "raw":  { "type": "keyword" },        // exact match / sort / agg
+             "ac":   { "type": "search_as_you_type" }, // typeahead
+             "ngram":{ "type": "text", "analyzer": "edge_ngram_idx",
+                       "search_analyzer": "standard" } // index edge-ngrams, query whole term
+           }
+         },
+         "body":      { "type": "text", "analyzer": "english" },
+         "brand":     { "type": "keyword" },
+         "price":     { "type": "scaled_float", "scaling_factor": 100 },
+         "embedding": { "type": "dense_vector", "dims": 768, "index": true,
+                        "similarity": "cosine",
+                        "index_options": { "type": "hnsw", "m": 16, "ef_construction": 128 } }
+       }
+     }
+   }
+   ```
+   Rules: set `search_analyzer` ≠ index `analyzer` for edge-ngram (index n-grams, search the **whole** query term — otherwise the query is shredded too and precision collapses). Use `english`/language analyzers for stemming; keep a `.raw` keyword for anything you sort, aggregate, or exact-match. Set `"index": false` on fields you only display (saves space). Mapping is **immutable** — wrong type means reindex (step 7), so get it right now.
+3. **Vector index — pick the algorithm by corpus size and recall target. Default HNSW; reach for IVF/PQ only when RAM-bound.**
+   | Param | What it trades | Opinionated default |
+   |---|---|---|
+   | `m` (HNSW edges/node) | recall + memory ↑ vs build time | **16** (32 for high-recall >10M) |
+   | `ef_construction` | build-time recall ↑ vs index speed | **128** (200 if recall short) |
+   | `ef_search`/`num_candidates` | query recall ↑ vs latency | **100**, raise until recall@10 plateaus |
+   | IVF `nlist` (partitions) | speed ↑ vs recall | **≈√N** vectors |
+   | IVF `nprobe` (lists scanned) | recall ↑ vs latency | **nlist/20**, tune up for recall |
+   | PQ (product quant.) | **memory ÷4–16** vs recall ↓ | only when graph won't fit RAM |
+   - **HNSW** = best recall/latency, default for ≤ ~10M vectors per node. The graph lives in **RAM**: budget `~(dims*4 + m*8) bytes × N`; a 768-dim, 10M, m=16 index is (3072 + 128) bytes × 10M ≈ **32 GB** resident. If it won't fit, go IVF-PQ (FAISS/Milvus) or scalar-quantize (`int8_hnsw` in ES 8.x → ~4× smaller, recall ~unchanged).
+   - **Distance must match how the model was trained.** Normalized embeddings (most sentence-transformers, OpenAI) → `cosine`, or `dot_product` if you pre-normalize vectors to unit length (skips the per-query magnitude divide → faster). Never `l2`/euclidean on cosine-trained vectors — silently wrong ranking, not an error.
+   - `dims` must equal the model's output exactly. Truncating/padding to a "round" number breaks similarity.
+4. **Hybrid — run BM25 and vector separately, then fuse with RRF. Do not just add raw scores.** BM25 (unbounded, ~0–30) and cosine (0–1) are different scales; summing lets one dominate. **Reciprocal Rank Fusion** uses rank position, not score:
+   ```
+   rrf_score(d) = Σ_q  1 / (k + rank_q(d))        // k=60 default, sum over each retriever
+   ```
+   Use ES `rank: { rrf: {...} }` / OpenSearch `hybrid` query, or compute RRF app-side over the two result sets. Weighted fusion (`α·norm(bm25) + (1-α)·cosine`) works **only if you min-max normalize each list first** and tune α offline (step 5); RRF needs no normalization and is the safer default.
+   **Filtering is where hybrid breaks.** A post-filter (retrieve top-k vectors, *then* drop ones failing `brand`/`price`) can return **0 results** when matches sit at rank 5000 — the recall cliff. Push filters **into** the ANN search (`knn.filter` in ES, `filter` clause in Milvus/Qdrant) so the graph traversal only visits passing nodes. For very selective filters (<1% pass), ANN degrades to near-exhaustive — detect it and fall back to a **brute-force exact** vector scan over the pre-filtered set; it's faster than fighting the graph.
+5. **Relevance tuning — change one lever, gate every change on an offline eval set. Never tune by eyeballing one query.**
+   - Levers, in order of leverage: **field boosts** (`title^3 body^1`), **synonyms** (`synonym_graph` filter, expand at *search* time so you can edit without reindex), **fuzziness** (`AUTO` = 0 edits <3 chars, 1 <6, 2 else — never blanket `fuzziness:2`, it wrecks precision), `minimum_should_match`, phrase/proximity boosts, recency/popularity `function_score`.
+   - Build a labeled judgment set (≥50 queries × graded docs) and gate with the **Ranking Evaluation API** (`_rank_eval`) or an offline harness. Metrics: **recall@k** (did we retrieve it at all), **P@k**, **NDCG@10** (rank quality with graded relevance). A change ships only if NDCG/recall **doesn't regress** on the set — local "looks better" is how you trade one query's win for ten silent losses.
+6. **Capacity / topology — size shards to data, not to instinct.**
+   - **Shard size 20–50 GB** each; target ≤ ~20 shards per GB of JVM heap; heap ≤ 31 GB (compressed-oops). Over-sharding (hundreds of tiny shards) is the #1 cluster-health killer — `shards = ceil(total_primary_GB / 40)`, round to your data-node count.
+   - **Replicas ≥1** for HA and to scale **search** throughput (each replica serves queries); they don't help write throughput. Set `number_of_replicas: 0` during a bulk reindex, restore to ≥1 after — cuts reindex time ~2×.
+   - **`refresh_interval`**: default `1s` is wasteful for write-heavy/bulk loads. Set **`30s`** (or `-1` during pure bulk, then restore) — controls the latency between index and searchability; raise it whenever you don't need sub-second freshness.
+   - Vector graph memory is **separate from and on top of** BM25/heap budgeting (step 3) — size the box for resident HNSW graphs, not just heap.
+7. **Lifecycle — alias-based zero-downtime reindex. The app NEVER names a concrete index.**
+   Apps read/write the alias `products`, which points at `products-v1`. To change mapping/analyzer/`dims`:
+   ```bash
+   # 1. create v2 with the NEW mapping, replicas=0, refresh=-1 (fast bulk)
+   PUT products-v2  { "settings": {"number_of_replicas":0,"refresh_interval":"-1"}, "mappings": {...} }
+   # 2. backfill v1 -> v2 (async, throttled so you don't starve live traffic)
+   POST _reindex?wait_for_completion=false  { "source":{"index":"products-v1","size":5000},
+                                              "dest":{"index":"products-v2"} }
+   # poll: GET _tasks/<taskId>   — bulk in 5–15k-doc batches; size by tuning until throughput plateaus, not by guessing
+   # 3. restore prod settings, then ATOMIC alias swap (single request = no gap, no double-read)
+   PUT products-v2/_settings { "number_of_replicas":1, "refresh_interval":"30s" }
+   POST _aliases { "actions":[ {"remove":{"index":"products-v1","alias":"products"}},
+                               {"add":   {"index":"products-v2","alias":"products"}} ]}
+   # 4. keep v1 until v2 verified in prod, then delete
+   ```
+   **Catch-up writes during reindex:** rows changed *after* the `_reindex` snapshot are missed. Either dual-write to both indices during the window, or replay the change log from the snapshot timestamp — the source-of-truth → index sync is owned by **build-cdc-streaming-pipeline**; this skill only guarantees the swap is atomic.
+## Common Errors
+- **Dynamic mapping in prod.** First doc with a stringly-typed number makes the field `text`; later range queries silently match nothing. Set `"dynamic": "strict"`.
+- **Wrong distance metric.** Indexing cosine-trained embeddings with `l2`/euclidean returns results — just in the wrong order, with no error. Match `similarity` to the model.
+- **Summing BM25 + cosine scores raw.** Different scales; one retriever dominates. Use RRF, or min-max normalize each list before weighting.
+- **Post-filtering vector results.** `knn` top-k then drop non-matching → empty or thin results when matches rank deep. Push the filter *into* the ANN search; brute-force exact for very selective filters.
+- **`fuzziness: 2` on everything.** Matches "cat"→"car"→"can" — precision tanks. Use `AUTO` (edit distance scaled by term length).
+- **Edge-ngram with the same analyzer at search time.** The query gets shredded into edge n-grams too, so "wire" matches "wild" via the shared prefix gram `wi`. Set `search_analyzer: standard`: index grams, search the whole term.
+- **HNSW graph that doesn't fit RAM.** Once it spills to disk, query latency jumps 10–100×. Compute resident size *before* indexing; quantize (`int8_hnsw`) or go IVF-PQ if it won't fit.
+- **Over-sharding.** 500 shards for 10 GB of data — each shard is a Lucene index with fixed overhead; cluster state bloats, GC thrashes. Aim 20–50 GB/shard.
+- **Reindex with default `refresh_interval` and `replicas≥1`.** Every batch refreshes + replicates → reindex crawls. Set `refresh:-1, replicas:0` during, restore after.
+- **App pinned to a concrete index name.** Any reindex is now downtime + a deploy. Always read/write through an alias from day one.
+- **Tuning relevance on one query.** A title boost that fixes "iphone" can wreck "running shoes review." Gate every change on the offline eval set.
+## Verify
+- **Mapping is explicit & immutable-safe:** `GET <index>/_mapping` shows `dynamic: strict`, every searched field has the intended `type`/`analyzer`, and a `.raw` keyword exists for each sorted/aggregated field.
+- **Analyzer does what you think:** `POST <index>/_analyze {"field":"title","text":"running shoes"}` emits the expected stemmed/lowercased/ngram tokens (e.g. `run`, `shoe`).
+- **Vector metric & dims match the model:** `dims` equals the embedding model's output, `similarity` matches its training; a near-duplicate of an indexed doc returns itself as the #1 nearest neighbor.
+- **Recall measured, not assumed:** kNN results compared against an exact brute-force scan on a sample → **recall@10 ≥ 0.95** at the chosen `ef_search`/`nprobe`; raise the param if below.
+- **Hybrid beats either alone:** on the labeled judgment set, RRF NDCG@10 ≥ max(BM25-only, vector-only), and a query with a hard filter (`brand=X`) still returns relevant, filter-passing results (no recall cliff, no empty set).
+- **Relevance change gated:** `_rank_eval` (or offline harness) shows NDCG@10 and recall@k did **not** regress vs the previous config across all judgment queries.
+- **Topology sane:** shards are 20–50 GB each, heap ≤ 31 GB, `_cluster/health` is `green`, and HNSW graphs fit resident RAM (no disk spill in node stats).
+- **Reindex was truly zero-downtime:** alias flipped in a single `_aliases` call, doc counts reconcile (`v2.count == v1.count + writes-during-window`), and live search returned 200s with no error spike across the swap.
+Done = the index serves the target workload with explicit immutable-safe mapping, measured recall@10 ≥ 0.95 and a non-regressing NDCG@10 on the offline eval set, hybrid+filter returns no empty/cliffed results, and a mapping change can ship via an atomic alias swap with zero search downtime.
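
Annotation on step 4 of the skill above: the RRF formula can be computed app-side over two result lists. The following is a minimal sketch, not the package's implementation; the function name `rrfFuse` and the sample document ids are hypothetical, and `k = 60` is the default quoted in the skill text.

```typescript
// Reciprocal Rank Fusion: score(d) = sum over retrievers of 1 / (k + rank(d)).
// `rankings` holds one list of document ids per retriever, best hit first.
function rrfFuse(
  rankings: string[][],
  k: number = 60
): Array<{ id: string; score: number }> {
  const scores: Record<string, number> = {};
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      const rank = index + 1; // ranks are 1-based
      scores[id] = (scores[id] ?? 0) + 1 / (k + rank);
    });
  }
  // Highest fused score first.
  return Object.keys(scores)
    .map((id) => ({ id, score: scores[id] ?? 0 }))
    .sort((a, b) => b.score - a.score);
}

// Hypothetical result lists from a BM25 query and a kNN query.
const bm25Hits = ["a", "b", "c"];
const vectorHits = ["c", "a", "d"];
const fused = rrfFuse([bm25Hits, vectorHits]);
// Fused order: a, c, b, d. Documents found by both retrievers rise to the top.
```

Only rank positions enter the score, so the unbounded BM25 scale and the bounded cosine scale never need to be normalized against each other.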

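Annotation on steps 3 and 6 of the skill above: the two sizing rules of thumb are plain arithmetic and can be checked before indexing. This is a rough sketch under the skill's own formulas; the function names and the 600 GB figure are hypothetical, and `dims*4` is read here as a float32 vector with `m*8` bytes of graph links per node.

```typescript
// Resident HNSW size from step 3: bytes ≈ (dims * 4 + m * 8) * N.
function hnswResidentBytes(dims: number, m: number, vectorCount: number): number {
  return (dims * 4 + m * 8) * vectorCount;
}

// Primary shard count from step 6: ceil(total_primary_GB / target shard size).
// Clamped to 1 because an index always has at least one primary shard.
function primaryShardCount(totalPrimaryGB: number, targetShardGB: number = 40): number {
  return Math.max(1, Math.ceil(totalPrimaryGB / targetShardGB));
}

// The skill's example: 768 dims, m = 16, 10M vectors → 3200 bytes per vector.
const residentBytes = hnswResidentBytes(768, 16, 10_000_000);
const residentGB = residentBytes / 1e9; // decimal gigabytes

// Hypothetical corpus with 600 GB of primary data.
const shardCount = primaryShardCount(600);
```

The shard count still needs rounding to the data-node count, as the skill notes; that step depends on the cluster and is left out here.
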
package/skills/design-state-machine/SKILL.md ADDED

@@ -0,0 +1,108 @@
+---
+name: design-state-machine
+description: Models a lifecycle (order status, connection, checkout/approval flow, device/job state) as an EXPLICIT finite state machine or statechart instead of boolean-flag soup — enumerate states + events as closed sets, define transitions as a total (state×event)→state function with guards and entry/exit actions, make the current state a single persisted column (not N booleans), reject every undefined (state,event) pair loudly, and reach for hierarchical/parallel/history statecharts (Harel/SCXML semantics, XState v5 setup/createMachine, or a hand-rolled transition table) once flat states explode combinatorially; persist with optimistic-lock guarded transitions, drive side effects from entry actions or an outbox, and test by asserting the full transition matrix including illegal-edge rejection.
+when_to_use: A thing moves through named stages where only some transitions are legal and code is sprouting isPaid/isShipped/isCancelled flags, scattered if-ladders, or "how did it get into THIS state?" bugs — order/payment/subscription status, WebSocket/TCP connection lifecycle, multi-step wizard or approval workflow, document review, or a long-running job. Distinct from design-event-sourcing-cqrs (the append-only event LOG is the source of truth and state is a fold/projection over it; this skill models the state graph itself and may persist only the current state) and async-concurrency-correctness (races/locks/ordering between concurrent tasks; this skill models one entity's legal transitions, then uses a guarded write so concurrent transitions don't corrupt it).
+---
+## When to Use
+Reach for this skill when an entity moves through named stages and only some moves are legal:
+- "Order goes pending → paid → shipped → delivered, can also cancel/refund — model it properly"
+- "We have `isPaid && !isShipped && !isCancelled` checks everywhere and they keep contradicting"
+- "How did this row end up paid AND cancelled?" / "a refund fired on an unpaid order"
+- "Connection lifecycle: connecting → open → reconnecting → closed with backoff"
+- "Multi-step checkout / approval workflow / document review with back-and-forth"
+- "Add a new status and half the if-ladders broke" / "illegal transition slipped through"
+- "Should I use XState, or a transition table, or just an enum?"
+NOT this skill:
+- The append-only **event log is the source of truth** and state is rebuilt by folding events, with separate read models → design-event-sourcing-cqrs (this skill models the legal-transition graph and may persist only the *current* state; you can combine them — an FSM that emits events into a log)
+- **Concurrency** between tasks — locks, ordering, races, async correctness → async-concurrency-correctness (this skill defines one entity's legal moves; it then *uses* a guarded/optimistic write so two concurrent transitions don't corrupt the row)
+- **Distributed mutual exclusion / leader leases** across nodes → distributed-locks-leases
+- **Workflow orchestration across multiple agents/services** (sagas, fan-out, retries) → orchestrate-agent-workflow (use this skill to model each participant's local state)
+- **Idempotent retries** so a replayed transition command is a no-op → idempotency-keys (this skill makes the transition function; that makes invoking it twice safe)
+- The **DB column type / safe migration** to add the status column or new enum value → db-migration-safety; **how the enum evolves** without breaking old readers → schema-evolution-compatibility
+- A **front-end multi-step form's** validation/field state → build-form-validation; client/server cache sync → manage-client-server-state
+## Steps
+1. **Enumerate states and events as two CLOSED sets first — on paper/in a table before any code.** A state is a *named condition the entity rests in* (`pending`, `paid`, `shipped`); an event is a *named trigger that may cause a move* (`Pay`, `Ship`, `Cancel`). Keep them disjoint and finite. The single best diagnostic that you need this skill: you have ≥3 booleans describing one entity and not all `2^n` combinations are valid. `isPaid + isShipped + isCancelled` admits "shipped but not paid" and "paid and cancelled" — nonsense states the type system permits. Replace them with one `status` enum whose values are *exactly* the legal conditions. **Make illegal states unrepresentable.**
+2. **Define the transition as a total function `(state, event) → state` with guards, entry/exit actions — a TABLE, not scattered `if`s.** This table is the entire spec; review it like one. Anything not in the table is illegal by default.
+   | From | Event | Guard (must be true) | To | Entry action (on arrival) |
+   |---|---|---|---|---|
+   | `pending` | `Pay` | amount == order.total | `paid` | capture funds, emit `OrderPaid` |
+   | `pending` | `Cancel` | — | `cancelled` | release inventory |
+   | `paid` | `Ship` | inventory.reserved | `shipped` | create shipment, notify |
+   | `paid` | `Refund` | — | `refunded` | reverse charge |
+   | `shipped` | `Deliver` | — | `delivered` | close order |
+   | `shipped` | `Refund` | within return window | `refunded` | reverse charge, RMA |
+   - **Guard** = boolean precondition; if false, the event is *rejected* (transition does not fire), not an error-state. `pending --Pay[amount≠total]-->` simply doesn't move.
+   - **Entry action** runs on *every* arrival into a state (idempotent, since you may re-enter); **exit action** runs on leaving. Prefer entry actions over per-transition actions so the side effect is tied to *being in* the state, not the path taken.
+   - `delivered`, `cancelled`, `refunded` are **terminal** — no outgoing transitions. Mark them; assert nothing leaves.
+3. **Reject undefined `(state, event)` pairs LOUDLY — the rejection is the feature.** The whole point over flag-soup is that an out-of-order event can't silently corrupt state. The transition function must, for any pair not in the table, return a typed rejection (don't throw for *expected* business rejections; throw/log for *impossible* ones). Distinguish:
+   - **Guard-failed** (legal event, precondition not met) → 409/422, "cannot Ship: inventory not reserved", state unchanged.
+   - **Illegal event for state** (`Ship` while `cancelled`) → 409 + log/metric `illegal_transition{from,event}`; this often signals a real bug (double-click, replayed message, race) and you *want* the alarm.
+   ```ts
+   function transition(s: State, e: Event, ctx: Ctx): Result<State> { // Ctx = whatever the guards read
+     const row = table[s]?.[e.type];
+     if (!row) return reject("illegal", `${e.type} not allowed in ${s}`); // not in table at all
+     if (row.guard && !row.guard(e, ctx)) return reject("guard", row.why);
+     return ok(row.to);
+   }
+   ```
+   Never write `if (status !== 'cancelled') { ... }` ad hoc — that's flag-soup creeping back. Route *every* change through the one function.
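The snippet above leaves `table`, `reject`, and `ok` undefined. A self-contained sketch of the same idea, using the order table from step 2, might look like this. All names here (`OrderState`, `OrderEvent`, `OrderCtx`, `orderTable`, `transitionOrder`) are illustrative, and the guard inputs are simplified to plain context fields.

```typescript
type OrderState = "pending" | "paid" | "shipped" | "delivered" | "cancelled" | "refunded";
type OrderEvent =
  | { type: "Pay"; amount: number }
  | { type: "Cancel" }
  | { type: "Ship" }
  | { type: "Refund" }
  | { type: "Deliver" };

// Assumed guard inputs; a real system would load these from the order.
interface OrderCtx { total: number; inventoryReserved: boolean; withinReturnWindow: boolean; }

interface TableRow {
  to: OrderState;
  guard?: (e: OrderEvent, ctx: OrderCtx) => boolean;
  why?: string; // reason reported when the guard fails
}

type TransitionResult =
  | { ok: true; state: OrderState }
  | { ok: false; kind: "illegal" | "guard"; reason: string };

// The table is the spec: any (state, event) pair missing here is illegal.
const orderTable: Record<OrderState, Partial<Record<OrderEvent["type"], TableRow>>> = {
  pending: {
    Pay: { to: "paid", guard: (e, ctx) => e.type === "Pay" && e.amount === ctx.total, why: "amount must equal order total" },
    Cancel: { to: "cancelled" },
  },
  paid: {
    Ship: { to: "shipped", guard: (_e, ctx) => ctx.inventoryReserved, why: "inventory not reserved" },
    Refund: { to: "refunded" },
  },
  shipped: {
    Deliver: { to: "delivered" },
    Refund: { to: "refunded", guard: (_e, ctx) => ctx.withinReturnWindow, why: "outside return window" },
  },
  delivered: {}, // terminal
  cancelled: {}, // terminal
  refunded: {},  // terminal
};

function transitionOrder(s: OrderState, e: OrderEvent, ctx: OrderCtx): TransitionResult {
  const row = orderTable[s][e.type];
  if (!row) return { ok: false, kind: "illegal", reason: `${e.type} not allowed in ${s}` };
  if (row.guard && !row.guard(e, ctx)) return { ok: false, kind: "guard", reason: row.why ?? "guard failed" };
  return { ok: true, state: row.to };
}
```

The result is a plain value rather than an exception, so a caller can map `kind: "guard"` to a 409/422 response and `kind: "illegal"` to a 409 plus the `illegal_transition` metric.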
+4. **Persist the current state as ONE column and make the write a guarded compare-and-set so concurrent transitions can't corrupt it.** Store `status` as a single enum/text column with a CHECK or DB enum constraint — not N booleans, not a separate row per flag. The transition write must be conditional on the *expected* from-state (optimistic concurrency), so two racing transitions don't both "succeed":
+   ```sql
+   UPDATE orders SET status = 'shipped', version = version + 1, updated_at = now()
+   WHERE id = $1 AND status = 'paid' AND version = $2;   -- 0 rows affected ⇒ someone else moved it; reject & re-read
+   ```
+   `WHERE status = <expected_from>` is the cheap optimistic lock; 0 rows updated means the precondition no longer holds → reload and re-decide, never blind-overwrite. Add a `status_history(order_id, from, to, event, actor, at)` audit row in the same transaction so "how did it get here?" is answerable. (For locking semantics → async-concurrency-correctness; for making a re-sent command idempotent → idempotency-keys.)
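The guarded UPDATE can be mimicked without a database to show why the second writer loses. The sketch below is an in-memory stand-in, not a real driver: `InMemoryOrders`, `compareAndSet`, and `shipOrder` are hypothetical names, and the return value plays the role of "rows affected".

```typescript
type ShipState = "pending" | "paid" | "shipped";
interface OrderRow { id: string; status: ShipState; version: number; }
interface HistoryRow { orderId: string; from: ShipState; to: ShipState; event: string; }

class InMemoryOrders {
  private rows = new Map<string, OrderRow>();
  readonly history: HistoryRow[] = [];

  insert(row: OrderRow): void { this.rows.set(row.id, { ...row }); }

  // Returns a copy, like a SELECT: callers hold a snapshot, not the live row.
  get(id: string): OrderRow | undefined {
    const row = this.rows.get(id);
    return row ? { ...row } : undefined;
  }

  // Mirrors: UPDATE ... SET status = to, version = version + 1
  //          WHERE id = ? AND status = from AND version = expectedVersion
  // Returns the number of rows affected (0 or 1) and writes the audit row on success.
  compareAndSet(id: string, from: ShipState, expectedVersion: number, to: ShipState, event: string): number {
    const row = this.rows.get(id);
    if (!row || row.status !== from || row.version !== expectedVersion) return 0;
    row.status = to;
    row.version += 1;
    this.history.push({ orderId: id, from, to, event });
    return 1;
  }
}

// 0 rows affected means someone else moved the order: report it, never overwrite.
function shipOrder(db: InMemoryOrders, seen: OrderRow): "shipped" | "stale" {
  return db.compareAndSet(seen.id, "paid", seen.version, "shipped", "Ship") === 1 ? "shipped" : "stale";
}
```

Two callers that read the same snapshot and both call `shipOrder` get one `"shipped"` and one `"stale"`; the stale caller should re-read and re-decide.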
+5. **Drive side effects from entry actions, and make external effects atomic-with-the-state-change via an outbox.** A side effect that must happen *because* you entered a state (charge, email, shipment) belongs in that state's entry action, so it fires on every path in and only once. But "update status row" + "call Stripe/send email" as two separate operations can crash between them (state changed, effect lost — or effect fired, state didn't). Write the status change AND an `outbox` row in **one DB transaction**; a relay publishes the outbox at-least-once and consumers dedup. This keeps the FSM's state and its observable effects consistent. (Outbox/dedup mechanics → idempotency-keys; emitting domain events into a log instead → design-event-sourcing-cqrs.)
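The outbox flow can be sketched with an in-memory stand-in for the database. Everything named here (`TinyDb`, `markPaid`, `relayOnce`, `makeDedupingConsumer`) is hypothetical; the copy-then-swap `transaction` only imitates all-or-nothing commit and is not how a real database works.

```typescript
interface OutboxMessage { id: string; topic: string; payload: string; published: boolean; }
interface DbSnapshot { status: string; outbox: OutboxMessage[]; }

class TinyDb {
  private data: DbSnapshot = { status: "pending", outbox: [] };

  // All-or-nothing stand-in: mutate a copy, swap it in only if `work` did not throw.
  transaction(work: (draft: DbSnapshot) => void): void {
    const draft: DbSnapshot = { status: this.data.status, outbox: this.data.outbox.map((m) => ({ ...m })) };
    work(draft);
    this.data = draft;
  }

  read(): DbSnapshot { return this.data; }
}

// Status change and outbox row commit together or not at all.
function markPaid(db: TinyDb, orderId: string): void {
  db.transaction((d) => {
    d.status = "paid";
    d.outbox.push({ id: `${orderId}:OrderPaid`, topic: "OrderPaid", payload: orderId, published: false });
  });
}

// Relay: publish first, mark afterwards. A crash between the two causes a re-publish
// on the next run, which is why delivery is at-least-once.
function relayOnce(db: TinyDb, publish: (m: OutboxMessage) => void): void {
  const pending = db.read().outbox.filter((m) => !m.published);
  for (const m of pending) {
    publish(m);
    db.transaction((d) => {
      const row = d.outbox.find((x) => x.id === m.id);
      if (row) row.published = true;
    });
  }
}

// Consumer-side dedup keyed on the message id.
function makeDedupingConsumer(handle: (m: OutboxMessage) => void): (m: OutboxMessage) => void {
  const seen = new Set<string>();
  return (m) => {
    if (seen.has(m.id)) return;
    seen.add(m.id);
    handle(m);
  };
}
```

A redelivered message reaches the handler once, and a transaction that throws leaves both the status and the outbox untouched.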
+6. **When flat states explode combinatorially, go hierarchical/parallel/history (statecharts) — don't multiply states.** Harel statecharts (the basis of SCXML and XState) add three tools that kill state explosion:
+   - **Hierarchy (nested/compound states):** group `connecting`/`open`/`reconnecting` under a parent `online`; a single `Disconnect` transition on the parent applies to all children — write the common edge once instead of N times.
+   - **Parallel (orthogonal regions):** independent concerns that vary simultaneously become separate regions instead of the cross-product. A media player's `{playing|paused} × {muted|unmuted}` is 2 regions, not 4 states; add a third dimension and you avoid `2×2×2 = 8`.
+   - **History states:** re-entering a compound state resumes the last active child (`H` shallow / `H*` deep) — for "resume where the wizard left off" or reconnect-to-prior-substate.
+   Rule of thumb: a handful of states + a clear table → **hand-rolled transition table** (zero deps, fully testable, easiest to review). Nesting/parallelism/history, or you want a visualizer and typed `assign` context → **XState v5** (`setup({ types, actions, guards }).createMachine({...})`, `actor.send(event)`, statelyai inspector). Cross-language/standards interop → **SCXML**. Don't reach for a library for 3 states; don't hand-roll 4 orthogonal regions.
+7. **Model time/retries as real states + events, not sleeps buried in code.** `reconnecting` with a `backoff` timer is a state; the timer firing is an event (`RetryTimeout`) that transitions back to `connecting` (or to `failed` after `attempts >= max`, a guard on context). Keep the retry count in machine context, not a module global. This makes the backoff policy reviewable in the table and testable without real clocks. (The retry *policy* — jitter, budget, circuit-breaker → resilience-timeouts-retries; this skill places those decisions as guarded transitions in the lifecycle.)
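A minimal sketch of retries as states and events, assuming a cap of three attempts and an exponential backoff with a 30-second ceiling (both values are placeholders, as are the names `ConnMachine`, `stepConn`, and `backoffMs`). The host schedules a timer for `backoffMs(attempts)` on entering `reconnecting` and sends `RetryTimeout` when it fires, so tests never need a real clock.

```typescript
type ConnState = "connecting" | "open" | "reconnecting" | "failed";
type ConnEvent = "Connected" | "ConnectionLost" | "RetryTimeout";

// The attempt count lives in machine context, not in a module global.
interface ConnMachine { state: ConnState; attempts: number; }

const MAX_ATTEMPTS = 3; // assumed policy value

// Delay the host should wait before sending RetryTimeout (assumed policy).
function backoffMs(attempts: number): number {
  return Math.min(1000 * 2 ** attempts, 30_000);
}

// Returns the next machine value, or null when the (state, event) pair is not in the table.
function stepConn(m: ConnMachine, e: ConnEvent): ConnMachine | null {
  switch (m.state) {
    case "connecting":
      if (e === "Connected") return { state: "open", attempts: 0 };
      if (e === "ConnectionLost") return { state: "reconnecting", attempts: m.attempts + 1 };
      return null;
    case "open":
      if (e === "ConnectionLost") return { state: "reconnecting", attempts: 1 };
      return null;
    case "reconnecting":
      if (e !== "RetryTimeout") return null;
      // Guard on context: give up once the attempt budget is spent.
      return m.attempts >= MAX_ATTEMPTS
        ? { state: "failed", attempts: m.attempts }
        : { state: "connecting", attempts: m.attempts };
    case "failed":
      return null; // terminal
  }
}
```

Feeding the machine three consecutive `ConnectionLost`/`RetryTimeout` pairs from `connecting` ends in `failed`, with no sleeping involved.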
+8. **Visualize and review the graph; treat unreachable/trap states as bugs.** Generate a diagram from the table (XState → Stately inspector; hand-rolled → emit Mermaid `stateDiagram-v2`, see mermaid-diagram) and eyeball it for: a **trap** (non-terminal state with no outgoing edge — entity gets stuck), an **unreachable** state (no incoming edge — dead enum value), and a **missing terminal** (a "done" that still has edges). Every non-terminal state should have at least one path to a terminal/expected state. A new status added without table edges shows up immediately as an orphan node.
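For a hand-rolled table, the graph checks and the Mermaid output can be generated from the same edge list. The sketch below uses hypothetical names (`GraphEdge`, `checkGraph`, `toMermaid`) and one deliberate assumption: "unreachable" is computed by walking from the initial state, which is slightly stricter than "has no incoming edge".

```typescript
interface GraphEdge { from: string; event: string; to: string; }
interface GraphReport { traps: string[]; unreachable: string[]; leakyTerminals: string[]; }

const orderStates = ["pending", "paid", "shipped", "delivered", "cancelled", "refunded"];
const orderTerminals = ["delivered", "cancelled", "refunded"];
const orderEdges: GraphEdge[] = [
  { from: "pending", event: "Pay", to: "paid" },
  { from: "pending", event: "Cancel", to: "cancelled" },
  { from: "paid", event: "Ship", to: "shipped" },
  { from: "paid", event: "Refund", to: "refunded" },
  { from: "shipped", event: "Deliver", to: "delivered" },
  { from: "shipped", event: "Refund", to: "refunded" },
];

function checkGraph(states: string[], edges: GraphEdge[], initial: string, terminals: string[]): GraphReport {
  const outgoing = new Map<string, string[]>();
  for (const s of states) outgoing.set(s, []);
  for (const e of edges) outgoing.get(e.from)?.push(e.to);

  // Breadth-first walk from the initial state.
  const reached = new Set<string>([initial]);
  const queue: string[] = [initial];
  while (queue.length > 0) {
    const current = queue.shift();
    if (current === undefined) break;
    for (const next of outgoing.get(current) ?? []) {
      if (!reached.has(next)) { reached.add(next); queue.push(next); }
    }
  }

  const outDegree = (s: string): number => (outgoing.get(s) ?? []).length;
  return {
    traps: states.filter((s) => !terminals.includes(s) && outDegree(s) === 0),
    unreachable: states.filter((s) => !reached.has(s)),
    leakyTerminals: states.filter((s) => terminals.includes(s) && outDegree(s) > 0),
  };
}

// Emit a Mermaid stateDiagram-v2 description for review.
function toMermaid(edges: GraphEdge[], initial: string, terminals: string[]): string {
  const lines = ["stateDiagram-v2", `  [*] --> ${initial}`];
  for (const e of edges) lines.push(`  ${e.from} --> ${e.to}: ${e.event}`);
  for (const t of terminals) lines.push(`  ${t} --> [*]`);
  return lines.join("\n");
}
```

Running `checkGraph` in CI turns "a new status added without table edges" into a failing assertion rather than something a reviewer has to spot in a picture.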
+9. **Test the full transition MATRIX, including the illegal edges — that's the differentiator.** For every `(state, event)` pair: assert legal ones land in the right target and run the entry action exactly once; assert *illegal* ones leave state unchanged and return the typed rejection (and emit the `illegal_transition` metric). Property test: from any reachable state, applying any event either transitions per the table or rejects — it never produces a state outside the enum. Add a concurrency test: two parallel `Ship` on the same `paid` order → exactly one wins (guarded UPDATE), the other gets 0-rows-and-reject. (Structure the suite → write-tests; the matrix-as-cases is a natural fit for test-data-factories/property-based-testing.)
+## Common Errors
+- **Boolean-flag soup (`isPaid && !isShipped && !isCancelled`).** N booleans encode `2^n` combinations but only a few are legal; contradictory states ("shipped, not paid") become representable and *do* happen. Fix: one `status` enum = exactly the legal conditions; make illegal states unrepresentable.
+- **Transition logic scattered across `if`-ladders in controllers/services.** No single place owns "what's legal"; a new caller forgets a guard. Fix: one transition function + table; route 100% of changes through it.
+- **Silently ignoring out-of-order events.** `if (status === 'paid') ship()` with no `else` swallows a `Ship` on a `cancelled` order — masking double-clicks, replays, races. Fix: explicit reject + log/metric `illegal_transition`; the alarm is the value.
+- **Blind `UPDATE ... SET status = 'shipped' WHERE id = ?`.** No from-state guard → a stale/concurrent writer overwrites a state it never saw. Fix: `WHERE status = <expected_from> AND version = ?`; 0 rows ⇒ re-read and re-decide.
+- **Side effect outside the state transaction.** Charge fires, then the status write crashes (or vice versa) → state and effect diverge. Fix: status change + outbox row in one transaction; relay publishes; consumers dedup (idempotency-keys).
+- **Entry action that isn't idempotent.** Re-entering a state (retry, replay) double-sends the email/double-charges. Fix: idempotent entry actions, or gate the effect on the *transition* having actually committed.
+- **State explosion from flattening orthogonal concerns.** Modeling `{playing,paused}×{muted,unmuted}` as 4 flat states, then 8, then 16. Fix: parallel regions (one per independent concern); they compose instead of multiply.
+- **Reaching for a heavy library for 3 states** (or hand-rolling 4 orthogonal regions). Fix: match the tool to the shape — table for flat/small, XState for hierarchy/parallel/history, SCXML for cross-language.
+- **Trap / unreachable states.** A non-terminal state with no exit (stuck forever) or an enum value with no incoming edge (dead). Fix: visualize the graph; assert reachability and that every non-terminal has an outgoing edge.
+- **Timers/retries as `sleep()` buried in handlers.** Backoff logic invisible to the spec, untestable without real time. Fix: model `reconnecting`/`RetryTimeout` as state+event with the attempt count in context.
+- **Adding a status without updating the table.** New enum value, old if-ladders don't handle it → falls through to a default branch silently. Fix: the table is the spec; a new state with no edges is an orphan the visualizer/tests catch.
+## Verify
+1. **No flag soup:** grep the diff for `is<X> && !is<Y>`-style combinations on one entity; the state is a single enum/column, and contradictory combinations are no longer representable.
+2. **One transition function:** every status mutation routes through the single `transition(state,event)`; no ad-hoc `SET status =` or `if (status !== ...)` outside it (grep for stray status writes).
+3. **Illegal edges rejected:** for the full `(state,event)` matrix, illegal pairs leave state unchanged and return the typed rejection + emit `illegal_transition`; guard-failures return 409/422 with a reason, not a crash.
+4. **Legal edges + entry actions:** each table transition lands in the correct target and runs its entry action exactly once (assert via spy/counter), even on a re-entry path.
+5. **Guarded persistence:** the UPDATE is conditional on the expected from-state (and version); a test with two concurrent transitions on the same row shows exactly one commits, the other gets 0-rows-and-reject.
+6. **Atomic effects:** status change and external-effect publish are in one transaction (outbox); kill the process between them → on restart the relay still publishes (effect recorded iff state changed), no orphan/lost effect.
+7. **Graph is sound:** generated diagram has no trap (non-terminal with no exit), no unreachable state (no incoming edge), terminals truly terminal; every non-terminal reaches a terminal.
+8. **Statechart features (if used):** a parent transition applies to all nested children (written once); parallel regions vary independently; a history state resumes the prior child.
+9. **Property test holds:** from any reachable state, any event either transitions per the table or rejects — never yields a value outside the state enum.
+Done = the lifecycle is one persisted enum driven by a single total transition function with explicit guards and entry/exit actions, every illegal `(state,event)` pair is rejected loudly (not silently swallowed), persistence is a from-state-guarded compare-and-set, side effects are atomic with the state change via an outbox, hierarchy/parallel/history are used only where flat states would explode, and the full transition matrix — legal AND illegal edges plus the concurrent-write race — is proven by the tests in checks 3–9.

package/skills/design-token-system/SKILL.md ADDED

@@ -0,0 +1,109 @@
+---
+name: design-token-system
+description: Architects a framework-agnostic design-token system with primitive/semantic/component tiers, theming and multi-brand/dark-mode alias contracts, and multi-platform export (CSS vars, Tailwind, JS/TS, iOS/Android) from one W3C-DTCG source via Style Dictionary.
+when_to_use: Setting up or refactoring a token architecture, building a theme/multi-brand/dark-mode system, exporting one token source to web + native, or adopting Style Dictionary / the W3C Design Tokens format. Distinct from style-responsive-tailwind (consuming tokens in markup) and brainstorm-design (choosing the palette/visual direction).
+---
+## When to Use
+Reach for this skill when the problem is the **token architecture and export pipeline**, not a single component's styling:
+- "Set up design tokens / a theme system from scratch"
+- "Add dark mode without forking every color"
+- "Support multiple brands / white-label from one codebase"
+- "Export the same tokens to CSS, Tailwind, and our iOS + Android apps"
+- "Adopt Style Dictionary / the W3C Design Tokens (DTCG) format"
+- "We have 300 hardcoded hex/px values — give us a governed token layer"
+NOT this skill:
+- Writing the markup/utility classes that *consume* tokens → style-responsive-tailwind
+- Picking the actual palette, type pairing, or visual mood → brainstorm-design
+- Translating one Figma frame into a component → implement-from-design
+- Building the React component that renders from tokens → build-react-component
+- Wiring a cross-platform app shell/build → scaffold-cross-platform-app
+- Certifying contrast ratios meet WCAG → audit-accessibility-wcag (this skill *structures* color; it does not verify contrast)
+## Steps
+1. **Build exactly three tiers — never let a component read a primitive.** This is the whole architecture; get it wrong and theming is impossible.
+   | Tier | Names mean | References | Example | Rule |
+   |---|---|---|---|---|
+   | **Primitive** (global/core) | nothing — raw scale | literal values only | `blue.500 = #2563EB`, `space.4 = 16px` | No semantics. Never themed. Never imported by components. |
+   | **Semantic** (alias) | role/intent | → primitives | `color.bg.surface → gray.50`, `color.intent.danger → red.600` | The *only* layer that swaps per theme/brand. |
+   | **Component** (scoped) | one part | → semantics | `button.primary.bg → color.intent.brand` | Optional; add only when a component overrides a semantic. |
+   Default to **2 tiers (primitive + semantic)**; add component tokens only where a component genuinely diverges. Components and Tailwind/CSS consume **semantic tokens only**.
+2. **One source of truth in W3C DTCG JSON.** Use the spec's `$value` / `$type` and `{dot.path}` references so any compliant tool (Style Dictionary v4+, Tokens Studio) can read it. No per-platform hand-edited files.
+   ```jsonc
+   // tokens/primitive/color.json
+   { "color": { "blue": { "500": { "$type": "color", "$value": "#2563EB" } } } }
+   // tokens/semantic/color.json  — alias, NOT a literal
+   { "color": { "intent": { "brand": { "$type": "color", "$value": "{color.blue.500}" } },
+                "bg":     { "surface": { "$type": "color", "$value": "{color.gray.50}" } } } }
+   ```
+   A semantic token whose `$value` is a literal hex is a bug — it must be a `{reference}`.
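To show what a `{dot.path}` reference means mechanically, here is a small resolver sketch. It is not Style Dictionary's implementation; `TokenGroup`, `lookupToken`, and `resolveToken` are hypothetical names, and the `gray.50` hex value is a placeholder because the source does not give one.

```typescript
interface TokenLeaf { $type: string; $value: string; }
interface TokenGroup { [key: string]: TokenGroup | TokenLeaf; }

function isLeaf(node: TokenGroup | TokenLeaf): node is TokenLeaf {
  return typeof (node as TokenLeaf).$value === "string";
}

// Walk a dot path such as "color.blue.500" down the tree.
function lookupToken(tree: TokenGroup, path: string): TokenLeaf | undefined {
  let node: TokenGroup | TokenLeaf | undefined = tree;
  for (const key of path.split(".")) {
    if (node === undefined || isLeaf(node)) return undefined;
    node = node[key];
  }
  return node !== undefined && isLeaf(node) ? node : undefined;
}

const WHOLE_REFERENCE = /^\{([^{}]+)\}$/;

// Follow alias chains until a literal value is reached.
function resolveToken(tree: TokenGroup, path: string, visited: string[] = []): string {
  if (visited.includes(path)) throw new Error(`circular reference: ${[...visited, path].join(" -> ")}`);
  const leaf = lookupToken(tree, path);
  if (!leaf) throw new Error(`unresolved reference: ${path}`);
  const target = WHOLE_REFERENCE.exec(leaf.$value)?.[1];
  return target !== undefined ? resolveToken(tree, target, [...visited, path]) : leaf.$value;
}

// Primitive and semantic tokens merged into one tree for the example.
const exampleTokens: TokenGroup = {
  color: {
    blue: { "500": { $type: "color", $value: "#2563EB" } },
    gray: { "50": { $type: "color", $value: "#F9FAFB" } }, // placeholder value
    intent: { brand: { $type: "color", $value: "{color.blue.500}" } },
    bg: { surface: { $type: "color", $value: "{color.gray.50}" } },
  },
};
```

`resolveToken(exampleTokens, "color.intent.brand")` walks the alias to `color.blue.500` and returns its literal; a misspelled reference throws instead of silently emitting an empty value.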
+3. **Theming = swap the semantic layer, never fork the palette.** Light, dark, and each brand are *alternate semantic files* pointing at the *same* primitives. One `primitive/` set; `semantic/light.json`, `semantic/dark.json`, `semantic/brand-acme.json`. Dark mode flips `bg.surface → gray.900` instead of `gray.50` — the primitives don't move. Never create `blue.500.dark`.
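The alias swap can be pictured as two small maps over one shared palette. This is a hand-written illustration, not generated Style Dictionary output; the names (`sharedPrimitives`, `lightTheme`, `darkTheme`, `themeCss`) and the gray hex values are assumptions.

```typescript
// One palette, shared by every theme. Hex values for the grays are placeholders.
const sharedPrimitives: Record<string, string> = {
  "gray.50": "#F9FAFB",
  "gray.900": "#111827",
  "blue.500": "#2563EB",
};

// A theme maps semantic names to primitive names; it never holds a literal color.
type SemanticTheme = Record<string, string>;

const lightTheme: SemanticTheme = { "color.bg.surface": "gray.50", "color.intent.brand": "blue.500" };
const darkTheme: SemanticTheme = { "color.bg.surface": "gray.900", "color.intent.brand": "blue.500" };

// Render one theme as a CSS rule of custom properties under the given selector.
function themeCss(selector: string, theme: SemanticTheme): string {
  const declarations = Object.keys(theme).sort().map((semanticName) => {
    const primitiveName = theme[semanticName];
    const value = primitiveName !== undefined ? sharedPrimitives[primitiveName] : undefined;
    if (value === undefined) throw new Error(`${semanticName} points at unknown primitive ${primitiveName}`);
    return `  --${semanticName.replace(/\./g, "-")}: ${value};`;
  });
  return `${selector} {\n${declarations.join("\n")}\n}`;
}
```

`themeCss(":root", lightTheme)` and `themeCss('[data-theme="dark"]', darkTheme)` differ only in the value of `--color-bg-surface`; the palette object is the same in both calls.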
+4. **Author color in OKLCH so themes shift predictably.** Build scales in OKLCH (fall back to HSL only if tooling can't): equal lightness steps stay perceptually even and a brand hue rotation keeps contrast. Hardcoded hex per shade drifts. Emit hex/rgb as a *build output* for legacy targets, not as the source.
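+5. **Cover every token type — color is the easy half.** Define and `$type` all of: `color`, `dimension` (spacing, sizing, font size, letter spacing, border radius), `fontFamily`/`fontWeight` plus `number` for line height (or the composite `typography` type for a whole text style), `shadow` (elevation), `duration`/`cubicBezier` (motion), and `number` for z-index. DTCG has no dedicated `fontSize`, `lineHeight`, `letterSpacing`, or `borderRadius` types; those names are tool-specific (Tokens Studio, for example), so map them onto `dimension`/`number` in the source. Derive primitives from a **base scale** (4px grid for spacing, a modular ratio for type); semantics name the use (`space.inline.sm`, `text.heading.lg`).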
+6. **Export everything from one Style Dictionary config.** One source → many platforms, each with the right transform group and output format:
+   ```js
+   // style-dictionary.config.js  (v4 — ESM)
+   export default {
+     source: ['tokens/primitive/**/*.json', 'tokens/semantic/light.json'],
+     platforms: {
+       css:      { transformGroup: 'css', buildPath: 'build/css/',
+                   files: [{ destination: 'vars.css', format: 'css/variables',
+                             options: { outputReferences: true } }] }, // keeps var(--x) chains
+       tailwind: { transformGroup: 'js', buildPath: 'build/tw/',
+                   files: [{ destination: 'tokens.cjs', format: 'javascript/module-flat' }] },
+       ts:       { transformGroup: 'js', buildPath: 'build/ts/',
+                   files: [{ destination: 'tokens.ts', format: 'javascript/es6' }] },
+       ios:      { transformGroup: 'ios-swift', buildPath: 'build/ios/',
+                   files: [{ destination: 'Tokens.swift', format: 'ios-swift/class.swift' }] },
+       android:  { transformGroup: 'android', buildPath: 'build/android/',
+                   files: [{ destination: 'tokens.xml', format: 'android/resources' }] }
+     }
+   };
+   ```
+   Run `style-dictionary build`. For each extra theme, run the same config with `semantic/dark.json` swapped into `source` and scope output under `[data-theme="dark"]` (CSS `options.selector`).
+7. **Wire Tailwind to the generated tokens — do not retype them.** `tailwind.config` imports `build/tw/tokens.cjs` into `theme.colors/spacing/...`. CSS vars drive runtime theme switching: Tailwind utilities resolve `var(--color-bg-surface)`, and the `[data-theme]` attribute swaps which value that var resolves to. One toggle, zero recompiled CSS.
+8. **Forbid raw values in app code with a linter.** Add `stylelint-declaration-strict-value` (web CSS) or an ESLint/lint rule that bans hex, `rgb(`, and bare `px` outside `tokens/` and `build/`. Raw values must fail CI, not slip through code review.
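What such a lint gate looks for can be sketched in a few lines. This is only an illustration of the rule, not a substitute for a stylelint or ESLint plugin; `RAW_VALUE_PATTERNS` and `findRawValues` are hypothetical, and the regular expressions are deliberately simple.

```typescript
interface RawValueHit { line: number; label: string; match: string; }

// Deliberately simple patterns for the three banned forms named above.
const RAW_VALUE_PATTERNS: { label: string; pattern: RegExp }[] = [
  { label: "hex color", pattern: /#(?:[0-9a-fA-F]{3,4}|[0-9a-fA-F]{6}|[0-9a-fA-F]{8})\b/g },
  { label: "rgb()", pattern: /\brgba?\(/g },
  { label: "bare px", pattern: /\b\d*\.?\d+px\b/g },
];

// Scan source text line by line and report every raw value with its 1-based line number.
function findRawValues(source: string): RawValueHit[] {
  const hits: RawValueHit[] = [];
  source.split("\n").forEach((text, index) => {
    for (const { label, pattern } of RAW_VALUE_PATTERNS) {
      for (const match of text.match(pattern) ?? []) {
        hits.push({ line: index + 1, label, match });
      }
    }
  });
  return hits;
}
```

A CI step would run a check like this over app sources only (skipping `tokens/` and `build/`) and fail when the returned list is non-empty.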
+9. **Govern it as a published API.** Fix a naming grammar `category.role.variant.state` (e.g. `color.bg.surface.hover`); semver the published token package (removed/renamed semantic token = **major**, added = minor, primitive value tweak = patch); keep a CHANGELOG; treat the `semantic` layer as the public API and primitives as private/internal.
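The semver rule above is mechanical enough to check in a release script. The sketch below assumes tokens have been flattened to name/value maps; `classifyBump` and `FlatTokens` are hypothetical names. Retargeting an existing semantic alias is not covered by the rule as stated, so this sketch leaves that case unclassified.

```typescript
type Bump = "major" | "minor" | "patch" | "none";
type FlatTokens = Record<string, string>; // token name -> value or reference

function classifyBump(
  prevSemantic: FlatTokens,
  nextSemantic: FlatTokens,
  prevPrimitive: FlatTokens,
  nextPrimitive: FlatTokens,
): Bump {
  // A removed semantic token breaks consumers. A rename appears as remove + add.
  const removed = Object.keys(prevSemantic).some((k) => !(k in nextSemantic));
  if (removed) return "major";

  const added = Object.keys(nextSemantic).some((k) => !(k in prevSemantic));
  if (added) return "minor";

  // Primitive value tweaks are internal and ship as a patch.
  const tweaked = Object.keys(nextPrimitive).some((k) => prevPrimitive[k] !== nextPrimitive[k]);
  return tweaked ? "patch" : "none"; // alias retargets are intentionally not classified here
}
```

Comparing the published package's flattened tokens with the working tree's gives the minimum version bump the release must carry.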
+## Common Errors
+- **Components reading primitives** (`button { color: blue.500 }`). Dark mode and rebrand degrade to find-and-replace. Components must reference semantics only.
+- **Forking the palette per theme** (`blue.500.dark`). Palette count explodes and brands drift. Themes swap the *semantic* alias target; primitives are shared and immutable.
+- **Semantic tokens holding literal values** instead of `{references}`. The indirection is the entire point — a literal hex in a semantic token can't be retargeted by a theme.
+- **`outputReferences: false` (the default) flattening CSS vars.** The build bakes `#2563EB` into every rule, killing runtime theme switching. Set `options: { outputReferences: true }` so `var(--color-intent-brand)` chains survive.
+- **Duplicating tokens into `tailwind.config` by hand.** They desync within the first week. Import the Style Dictionary build output; never maintain two sources.
+- **No grid/scale — arbitrary `13px`, `17px` primitives.** Defeats consistency. Primitives come from a 4px (or 8px) grid and a modular type ratio.
+- **Treating contrast as solved because colors are tokenized.** Tokens organize color; they don't guarantee `bg.surface`/`text.primary` meet 4.5:1. Run audit-accessibility-wcag on each theme.
+- **Component tokens for everything**, including parts that never override a semantic. Pure bloat. Add a component token only where it diverges from the semantic.
+- **Per-platform manual edits to `build/` outputs.** They're regenerated; your edit vanishes on the next build. Fix the source and rebuild.
+- **No versioning/changelog on the token package.** A renamed semantic token silently breaks every consumer. Semver it; a rename is a breaking (major) change.
+## Verify
+1. **Tier discipline:** `grep` app/component source — zero references to primitive names (`blue.500`, `space.4`) and zero raw hex/`rgb(`/bare `px`. Every match is a violation.
+2. **Aliases resolve:** every semantic `$value` is a `{reference}`, not a literal; `style-dictionary build` reports **0 unresolved references** and exits `0`.
+3. **One source, many outputs:** a single `style-dictionary build` produces CSS, Tailwind, TS, iOS, and Android artifacts from the same `tokens/` tree (no hand-edited platform file).
+4. **Theme swap is alias-only:** diff `semantic/light.json` vs `semantic/dark.json` — they differ only in reference *targets*; `primitive/` is byte-identical across themes. Adding a brand touches no primitive.
+5. **Runtime switch works:** toggling `[data-theme="dark"]` on the built CSS recolors the page with **no CSS recompile** (proves `outputReferences` chains survived).
+6. **Lint gate is live:** committing a raw `#fff` or `12px` in app code fails CI, not review.
+7. **Native parity:** the same semantic token (e.g. `color.intent.brand`) yields the same color in `build/css/vars.css`, `build/ios/Tokens.swift`, and `build/android/tokens.xml`.
+8. **Governance:** naming matches `category.role.variant.state`, the package carries a semver + CHANGELOG, and a token rename ships as a major bump.
+Done = one W3C-DTCG source builds all platforms with zero unresolved references, components reference semantics only (lint-enforced in CI), themes/brands swap via alias targets over shared immutable primitives, and runtime theme switching recolors with no recompile.

package/skills/distributed-locks-leases/SKILL.md ADDED

@@ -0,0 +1,120 @@
+---
+name: distributed-locks-leases
+description: Implements distributed mutual exclusion and leader election correctly across processes/nodes — Redis `SET key token NX PX <ttl>` with a unique random token + Lua compare-and-delete unlock (never bare DEL), etcd/ZooKeeper/Consul leases (lease grant + TTL + keepAlive renewal, ephemeral znode + watch on predecessor for leader election), and Postgres advisory locks (`pg_advisory_lock`/`pg_try_advisory_xact_lock`) for single-DB serialization — while treating every lock as a LEASE that can expire mid-work, so safety rides on monotonic fencing tokens that the protected resource checks-and-rejects-stale (per Kleppmann's Redlock critique), never on the lock alone. Covers TTL sizing vs work duration, renewal/keepalive, the GC-pause/clock-skew expiry hazard, split-brain, and choosing idempotency or partitioning INSTEAD of a lock.
+when_to_use: You need only-one-runner-at-a-time across machines — a leader/singleton (cron that must not double-fire, one active scheduler/consumer), a critical section over a shared external resource (a row, a file, an API quota) spanning multiple nodes, leader election, or you're reaching for Redlock/`SETNX`/etcd leases/ZooKeeper. Distinct from async-concurrency-correctness (in-process mutexes/atomics/channels within ONE process — no network, no lease expiry) and idempotency-keys (the real safety net when the lock fails or expires — make the protected operation safe to repeat instead of/in addition to locking).
+---
+## When to Use
+Reach for this skill when you need **at most one actor running at a time across separate processes or machines**, coordinated through a shared store — and a second concurrent runner would corrupt state:
+- "Only one instance should run this cron / scheduler / migration / cleanup at a time"
+- "Elect a leader / single active consumer across N replicas" (active-passive failover)
+- "Two pods both processed the same job / both wrote the same file"
+- "Serialize edits to one row/aggregate/external resource across the cluster"
+- "I'm using Redis `SETNX` / Redlock / etcd lease / ZooKeeper ephemeral node for a lock"
+- "Hold a lock while I do work, renew it, and release it safely"
+- "The lock expired while my job was still running and another node started"
+NOT this skill:
+- A mutex/semaphore/atomic/channel **inside a single process** (Go `sync.Mutex`, Java `synchronized`/`ReentrantLock`, Python `Lock`, `asyncio` races) — no network, no TTL, no lease expiry → async-concurrency-correctness
+- Making the protected operation **safe to run twice** so a lock failure/expiry is harmless (dedup table, upsert, set-don't-increment) → idempotency-keys (this is the safety net BELOW the lock; prefer it over a lock when you can)
+- Throttling request *rate* (token bucket / sliding window), not exclusivity → rate-limiting
+- Worker pool, job dispatch, DLQ, poison-message handling, exactly-once consumer semantics → message-queue-jobs
+- Optimistic concurrency on a single DB row (`WHERE version = N` / `If-Match`/ETag, no separate lock service) → idempotency-keys (by-design) / db-migration-safety for schema
+- Timeouts, retries, backoff, circuit breakers around the locked call → resilience-timeouts-retries
+- Saga/state-machine coordination of a long multi-step workflow → design-state-machine / orchestrate-agent-workflow
+## Steps
+1. **First ask: do you actually need a distributed lock? Usually you don't.** A lock is a liveness/correctness liability (a held-but-dead lock stalls everyone; an expired one breaks mutual exclusion). Prefer, in order:
+   | Instead of a lock | Technique | Why it's better |
+   |---|---|---|
+   | **Idempotency** | make the op safe to repeat (upsert, set-don't-increment, dedup key) → idempotency-keys | concurrent runs are *harmless*, not *prevented* — no expiry hazard at all |
+   | **Partitioning** | shard work by key (Kafka partition, consistent-hash, `id % N`) so each key has exactly one owner | structural single-ownership, no shared lock at all |
+   | **Single-DB serialization** | `SELECT ... FOR UPDATE` / unique constraint / `INSERT ... ON CONFLICT` / advisory lock (step 6) | the DB transaction *is* the lock, with real ACID guarantees |
+   | **A queue / leader-elected scheduler** | one consumer per partition; framework-provided leader election (k8s `Lease`, Raft) | offloads the hard part to a tested system |
+   Use a distributed lock only for **efficiency** (avoid duplicate work, where a rare double-run is *tolerable*) — NOT as your sole correctness guarantee. For correctness you also need step 4 (fencing) or idempotency.
+2. **Treat every lock as a LEASE: it auto-expires after a TTL, and it can expire WHILE you still think you hold it.** This is the central hazard. A lock without a TTL deadlocks the whole system if the holder crashes; a lock with a TTL can expire mid-work (GC pause, CPU starvation, slow I/O, network partition, VM freeze) — then the store hands the lock to node B while node A, paused, *believes* it still holds it and resumes writing. Two writers, one lock. Conclusions that follow:
+   - Always set a TTL (no infinite locks).
+   - TTL alone is never sufficient for correctness — you must also fence (step 4) or be idempotent (step 1).
+   - Pick TTL ≥ p99 work duration + safety margin; renew (step 5) for long work rather than setting a huge TTL.
+3. **Redis single-node lock — acquire with a unique token, release with compare-and-delete (Lua), never bare `DEL`.** Use one atomic command and a per-acquisition random token so only the owner can unlock:
+   ```
+   # acquire — NX = only if absent, PX = TTL in ms, token = unique per acquisition (uuid/16 random bytes)
+   SET resource_lock <token> NX PX 30000
+   ```
+   ```lua
+   -- release — DELETE ONLY IF the value is still OUR token (compare-and-delete, atomic)
+   if redis.call("GET", KEYS[1]) == ARGV[1] then
+     return redis.call("DEL", KEYS[1])
+   else return 0 end
+   ```
+   - **Never** `SETNX` + separate `EXPIRE` (non-atomic: crash between them = a lock that never expires). Use `SET ... NX PX` in one call.
+   - **Never** a bare `DEL resource_lock` to release: if your lease already expired and B re-acquired, your `DEL` deletes *B's* lock. The token check prevents that.
+   - **Redlock (multi-node) is contested — default to single-node + fencing.** Kleppmann's critique ("How to do distributed locking", 2016): Redlock's safety rests on timing assumptions (bounded clock drift, bounded process pauses, bounded network delay) that real systems can't guarantee, so it is needlessly heavyweight for efficiency-only locks and still not safe enough when correctness depends on the lock. Antirez disputes the framing, but the practical takeaway holds: **do not rely on any timing-based lock (Redlock included) for correctness — fence the resource (step 4).** Use single-node Redis for the cheap mutual-exclusion-for-efficiency case; reach for a consensus store (step 7) when you need real leader election.
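The difference between a bare `DEL` and the token-checked release is easiest to see in a simulation. `FakeRedis` below is a toy in-memory stand-in with a manual clock, not a Redis client; its method names are hypothetical, and the fixed token strings stand in for the random per-acquisition tokens real code should use.

```typescript
class FakeRedis {
  private entries = new Map<string, { value: string; expiresAt: number }>();
  nowMs = 0; // manual clock so the example is deterministic

  // Return the entry only if its TTL has not elapsed.
  private live(key: string): { value: string; expiresAt: number } | undefined {
    const entry = this.entries.get(key);
    if (entry && entry.expiresAt <= this.nowMs) {
      this.entries.delete(key);
      return undefined;
    }
    return entry;
  }

  // Stand-in for: SET key value NX PX ttlMs
  setNxPx(key: string, value: string, ttlMs: number): boolean {
    if (this.live(key)) return false;
    this.entries.set(key, { value, expiresAt: this.nowMs + ttlMs });
    return true;
  }

  // Stand-in for the Lua compare-and-delete script.
  compareAndDelete(key: string, token: string): number {
    const entry = this.live(key);
    if (!entry || entry.value !== token) return 0;
    this.entries.delete(key);
    return 1;
  }

  // Stand-in for a bare DEL: removes the key no matter who owns it.
  del(key: string): number {
    return this.entries.delete(key) ? 1 : 0;
  }
}

// Returns true if B's lock survives A's late release.
function lockSurvivesLateRelease(useBareDel: boolean): boolean {
  const redis = new FakeRedis();
  redis.setNxPx("resource_lock", "token-A", 30000); // A acquires
  redis.nowMs = 30001;                              // A's lease expires mid-work
  redis.setNxPx("resource_lock", "token-B", 30000); // B acquires
  if (useBareDel) redis.del("resource_lock");       // A releases late, blindly
  else redis.compareAndDelete("resource_lock", "token-A"); // A releases late, token-checked
  // If a third client can acquire now, B's lock was destroyed.
  return !redis.setNxPx("resource_lock", "token-C", 30000);
}
```

With the token check, A's late release is a no-op and B keeps the lock; with the bare delete, a third client gets in while B is still working.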
+4. **Fencing tokens — the only thing that makes a lease-based lock SAFE. The protected resource must reject stale writers.** On every acquisition, get a **monotonically increasing** token (the "fence"). Pass it with every write to the protected resource. The resource stores the highest token it has seen and **rejects any write carrying a token lower than that.** (Strictly lower, not lower-or-equal: the current holder must be able to write more than once under the same token.) Now a paused node A (token 33) that wakes after B acquired (token 34) gets its write rejected — mutual exclusion is enforced *at the resource*, independent of who "thinks" they hold the lock.
+   ```
+   client A acquires → fence=33 → write(x, fence=33)   accepted, resource now at 33
+   A pauses; lease expires; B acquires → fence=34 → write(y, fence=34)   accepted, resource at 34
+   A resumes, still "holds" lock → write(z, fence=33)   REJECTED (33 < 34)
+   ```
+   - Source of monotonic tokens: ZooKeeper `zxid`/znode version, etcd key `mod_revision` / a `CreateRevision`-based counter, Redis `INCR fence_counter` (single-node only — multi-node Redis can't guarantee monotonicity), or a DB sequence.
+   - The resource MUST participate — if your storage/API can't check-and-reject a token (e.g. a dumb blob store), fencing is impossible and you fall back to idempotency (step 1). Many real systems can't fence; that's exactly why idempotency is the more robust default.
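On the resource side, the check is a single comparison. A minimal sketch follows, with the hypothetical name `FencedRegister`; it rejects tokens strictly lower than the highest one seen, so the current holder can write repeatedly under its own token.

```typescript
// A protected resource that enforces fencing itself.
class FencedRegister {
  private highestFence = 0;
  private value = "";

  // Accept the write only if the caller's fence is not older than the newest one seen.
  write(value: string, fence: number): boolean {
    if (fence < this.highestFence) return false; // stale holder: reject
    this.highestFence = fence;
    this.value = value;
    return true;
  }

  read(): string {
    return this.value;
  }
}
```

Replaying the timeline from this step: writes with fences 33 and then 34 are accepted, a later write with fence 33 is rejected, and the stored value stays at what the holder of 34 wrote. In a real store the compare and the write must happen atomically (for example in one conditional UPDATE), which this single-threaded sketch gets for free.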
+5. **Long work: renew (keepalive) instead of guessing a huge TTL — and abort if renewal fails.** For work that may exceed the TTL, run a watchdog that re-extends the lease at ~TTL/3:
+   - Redis: a Lua `PEXPIRE` guarded by the same token check (extend only if still ours).
+   - etcd: `LeaseKeepAlive` stream; ZooKeeper: session heartbeats keep the ephemeral node alive; Consul: session renew before TTL.
+   - **Critical:** if a renewal FAILS or is late, you may have already lost the lease — **stop doing work immediately** (cancel the in-flight operation), don't blindly continue. The renewer and the worker must share a cancellation signal (context/CancellationToken). A renew thread that keeps extending after the worker is wedged is also a bug (it masks a stuck holder).
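The shared cancellation signal can be sketched without timers. In the sketch below, `renewOrCancel` is what a watchdog timer would call roughly every TTL/3, and `runSteps` is the worker checking the signal before each unit of work; both names and the `Cancellation` shape are hypothetical, and `renew` stands for whatever token-checked extend call the lock store offers.

```typescript
// Signal shared by the renewer and the worker.
interface Cancellation { cancelled: boolean; reason?: string; }

// One watchdog tick. A failed or throwing renewal means the lease may already be lost.
function renewOrCancel(renew: () => boolean, cancel: Cancellation): void {
  if (cancel.cancelled) return; // worker finished or already aborted: stop extending
  let renewed = false;
  try {
    renewed = renew();
  } catch {
    renewed = false;
  }
  if (!renewed) {
    cancel.cancelled = true;
    cancel.reason = "lease renewal failed";
  }
}

// The worker checks the signal before every unit of work and stops at the first sign of loss.
function runSteps(steps: Array<() => void>, cancel: Cancellation): number {
  let completed = 0;
  for (const step of steps) {
    if (cancel.cancelled) break;
    step();
    completed++;
  }
  return completed;
}
```

When the worker finishes normally it should set `cancelled` itself, so the watchdog stops renewing instead of keeping a lease alive for a holder that has nothing left to do.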
+6. **Postgres advisory locks — the right tool when one Postgres is your coordination point.** No extra infra; the lock lives in the DB you already trust:
+   | Function | Scope | Released by | Use for |
+   |---|---|---|---|
+   | `pg_advisory_lock(key)` | **session** | explicit `pg_advisory_unlock` or session end | held across transactions; must release manually (leaks if connection pooled + forgotten) |
+   | `pg_advisory_xact_lock(key)` | **transaction** | automatically at COMMIT/ROLLBACK | **preferred** — no manual release, no leak; held only for the txn |
+   | `pg_try_advisory_lock(key)` | session, **non-blocking** | as above | returns `true/false` instantly — "skip if someone else has it" (e.g. cron singleton) |
+   - Key is a `bigint` (or two `int4`s) — hash your logical name: `pg_try_advisory_xact_lock(hashtext('nightly-report'))`. Beware `hashtext` collisions; use a deliberate keyspace for unrelated locks.
+   - **Advisory locks are NOT enforced by the data** — they're cooperative; only code that *also* takes the lock is excluded. They don't lock rows. For row-level exclusion use `SELECT ... FOR UPDATE` instead.
+   - **Pooling gotcha:** with a transaction pooler (PgBouncer `transaction` mode), session-level advisory locks break (different backend per statement). Use `*_xact_lock` or a `session` pool.
+7. **etcd / ZooKeeper / Consul — when you need real leader election and consensus.** These are CP (consistent under partition) consensus stores; use them when a *rare* double-leader is unacceptable:
+   - **etcd:** `Lease` (grant TTL) + a key written with that lease; election via the `concurrency.Election` API (campaign → leader holds key until lease lapses or it resigns). `mod_revision` gives you a fencing token for free.
+   - **ZooKeeper:** create an **ephemeral sequential** znode; the lowest sequence number is leader; each node **watches only its immediate predecessor** (not all nodes — avoids the herd effect). On predecessor delete, re-check if you're now lowest. Ephemeral = auto-removed on session loss → automatic failover. The Curator `LeaderLatch`/`InterProcessMutex` recipes implement this correctly; prefer them over hand-rolling.
+   - **Consul:** session + KV `acquire` flag; session TTL + health check ties lock liveness to the holder's health.
+   - **Even here, fence.** Consensus guarantees agreement on *who holds the lease*, but a GC-paused leader still doesn't know its lease lapsed — the resource must still reject its stale-token writes (step 4). Consensus narrows the window; it doesn't remove the mid-work-expiry hazard.
+8. **Defend against split-brain and clock skew.** Two nodes both believing they're leader = split-brain. Mitigations: a single consensus source of truth (don't run two independent lock services); fencing tokens so even a split-brain second writer is rejected at the resource; **never trust wall-clock time for lease math across nodes**. Use the lock service's own expiry, and within a node use a *monotonic* clock (`CLOCK_MONOTONIC`, Python `time.monotonic()`, Rust `std::time::Instant`, Java `System.nanoTime()`; not Java's wall-clock `java.time.Instant`) for "have I exceeded my budget?", since NTP steps and VM time-warps corrupt wall-clock deltas. Assume your process can pause arbitrarily long between any two lines (GC, OS scheduler, live-migration).
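Step 6 recommends a deliberate keyspace instead of bare `hashtext`. One way to get there is to derive the 64-bit key in the application, as in the minimal sketch below. The `advisory_key` helper and the `namespace` argument are hypothetical names introduced for illustration; the only assumption about Postgres is that the key is passed as a `bigint` bind parameter.

```python
import hashlib
import struct


def advisory_key(namespace: str, name: str) -> int:
    """Map (namespace, name) to a signed 64-bit integer usable as a bigint lock key.

    Hypothetical helper: the namespace keeps unrelated lock families apart,
    and 64 bits of SHA-256 make accidental collisions far less likely than
    a 32-bit hash would.
    """
    digest = hashlib.sha256(f"{namespace}\x00{name}".encode("utf-8")).digest()
    # ">q" is a big-endian signed 64-bit integer, the same range as Postgres bigint.
    (key,) = struct.unpack(">q", digest[:8])
    return key


key = advisory_key("cron", "nightly-report")
# The key would then be sent as a bind parameter, for example:
#   SELECT pg_try_advisory_xact_lock(%s)
```

The derivation is deterministic, so every node computes the same key for the same logical name without coordinating.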
+## Common Errors
+- **No TTL → permanent deadlock on crash.** A holder dies, the lock is held forever, the system stalls. Fix: always set a TTL; renew for long work (step 5).
+- **TTL but no fencing → silent double-write on mid-work expiry.** The lock expires during a GC pause, B acquires, A resumes and writes. Fix: monotonic fencing token rejected at the resource (step 4), or make the op idempotent (step 1).
+- **`SETNX` then separate `EXPIRE`.** Crash between the two leaves a lock with no expiry = deadlock. Fix: single atomic `SET key token NX PX <ttl>`.
+- **Releasing with bare `DEL` / no owner check.** If your lease already expired and someone re-acquired, you delete *their* lock. Fix: Lua compare-and-delete on your unique token.
+- **Reusing a constant lock value.** Without a per-acquisition random token you can't tell your lock from a successor's — unlock and renew both become unsafe. Fix: fresh uuid/random token each acquire.
+- **Trusting Redlock (or any timing lock) for correctness.** Bounded-clock/bounded-pause assumptions don't hold. Fix: single-node for efficiency-only; fencing/consensus for correctness (steps 3, 4, 7).
+- **Renewal failure ignored.** The watchdog can't renew but the worker keeps writing without the lease. Fix: failed/late renew → cancel the work immediately via a shared cancellation signal.
+- **Session-level `pg_advisory_lock` behind a transaction pooler.** Each transaction may run on a different backend, so the lock is acquired on one connection and the unlock is sent to another: the lock is never released, or is not held where you think it is. Fix: `pg_advisory_xact_lock`, or a session-mode pool.
+- **Forgetting to release a session advisory lock.** Leaks until the connection dies; with pooling that connection is reused holding the lock. Fix: prefer `*_xact_lock` (auto-release at txn end).
+- **Using a distributed lock where idempotency/partitioning was the right tool.** You inherit the whole expiry/split-brain failure surface for no reason. Fix: revisit step 1 — can the op be idempotent or key-partitioned instead?
+- **Wall-clock lease math across nodes.** NTP steps / VM time-warps make "is my lease still valid?" wrong. Fix: trust the lock service's expiry; use a monotonic clock for local budget checks.
+- **Watching all nodes in ZooKeeper leader election (herd effect).** Every change wakes every node. Fix: ephemeral-sequential + watch only your immediate predecessor (or use Curator recipes).
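The "renewal failure ignored" and "wall-clock lease math" errors above can be avoided together with a watchdog that shares a cancellation signal with the worker and measures its budget on a monotonic clock. The following is a minimal sketch under assumptions: `run_with_lease`, `renew`, and `work` are hypothetical names, `renew` stands in for whatever owner-checked extend call your lock service provides, and the worker is expected to poll the signal between steps.

```python
import threading
import time
from typing import Callable


def run_with_lease(
    renew: Callable[[], bool],
    work: Callable[[threading.Event], None],
    ttl_s: float,
) -> bool:
    """Run work while a watchdog renews the lease.

    Returns True only if the lease was never lost while work ran.
    """
    lost = threading.Event()  # shared cancellation signal for the worker
    done = threading.Event()  # tells the watchdog that the worker has finished

    def renewer() -> None:
        deadline = time.monotonic() + ttl_s  # lease assumed fresh at start
        while not done.wait(ttl_s / 3):  # wake at roughly TTL/3
            sent_at = time.monotonic()
            try:
                # A late wake-up (past the deadline) counts as a lost lease.
                ok = sent_at < deadline and renew()
            except Exception:
                ok = False
            # Conservative: the new lease is counted from when we sent the request.
            if not ok or time.monotonic() >= sent_at + ttl_s:
                lost.set()
                return
            deadline = sent_at + ttl_s

    watchdog = threading.Thread(target=renewer, daemon=True)
    watchdog.start()
    try:
        work(lost)  # work must check lost.is_set() and stop promptly
    finally:
        done.set()  # stop renewing as soon as the worker returns
        watchdog.join()
    return not lost.is_set()
```

This sketch does not detect a wedged worker; a real implementation would also cap total runtime so the watchdog cannot extend the lease forever.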
+## Verify
+1. **Mutual exclusion under contention:** spawn N nodes/goroutines racing for the same lock against the *real* shared store; assert exactly one holds it at any instant (e.g. each increments a shared counter inside the section and the section must never overlap — verified with a sentinel that fails if two enter).
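+2. **Crash releases the lock:** kill the holder mid-section; for a TTL-based lock, another node acquires once the lease expires, within roughly one TTL. Never acquiring means the TTL is missing (permanent deadlock). Acquiring well before the TTL elapses suggests the lock was never really held, although session-scoped Postgres advisory locks legitimately release as soon as the connection closes.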
+3. **Fencing rejects the stale writer:** simulate the Kleppmann scenario — A acquires (fence 33), pause A, let the lease expire, B acquires (fence 34) and writes, then resume A's write with fence 33 → the resource **rejects** it. Without fencing, this is the test that exposes the double-write.
+4. **Atomic acquire:** the acquire path is a single `SET NX PX` (or equivalent) — grep shows no `SETNX`+`EXPIRE` two-step and no infinite/missing TTL.
+5. **Safe release:** the unlock only deletes when the stored token matches (Lua/compare-and-delete); a test where the lease expired and was re-acquired confirms the old holder's release does NOT remove the new holder's lock.
+6. **Renewal + abort:** for long work, the lease is extended at ~TTL/3 while the token still matches; inject a renewal failure and assert the worker *cancels* rather than continuing without the lease.
+7. **Advisory-lock leak/pooling check:** advisory locks are `*_xact_lock` (or explicitly unlocked) and behave correctly under the actual connection-pool mode; `pg_locks` shows no orphaned advisory locks after the txn ends.
+8. **Leader election failover:** kill the leader; a new leader is elected within the session/lease TTL; assert there is never *zero* leader for long nor *two* leaders simultaneously (split-brain) — and that a deposed leader's writes are fenced out.
+9. **Default-choice justification:** confirm a distributed lock is genuinely needed — document why idempotency (idempotency-keys) or partitioning couldn't replace it; if the lock is correctness-critical, fencing or idempotency is present, not the lock alone.
+Done = at most one actor runs at a time under real contention, every lock has a TTL and crash-frees within it, mid-work expiry cannot cause a double effect because the resource rejects stale fencing tokens (or the op is idempotent), acquire/release/renew are atomic and owner-checked, advisory locks are pool-safe and leak-free, leader election survives failover without split-brain, and the choice of a lock over idempotency/partitioning is deliberate — all proven by the contention, crash, and fencing tests in checks 1–8.
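Check 3 can be exercised without any real lock service by putting the fencing rule into a small in-memory stand-in for the resource. The sketch below is illustrative only: `FencedResource` and `StaleTokenError` are hypothetical names, and a real store would need to perform the compare-and-write atomically (for example as a conditional update).

```python
class StaleTokenError(Exception):
    """Raised when a write carries a fencing token older than one already seen."""


class FencedResource:
    """In-memory stand-in for a store that enforces fencing tokens on write."""

    def __init__(self) -> None:
        self.highest_token = -1
        self.value = None

    def write(self, token: int, value: str) -> None:
        # Reject anything below the highest token seen so far.
        # An equal token is the current holder writing again.
        if token < self.highest_token:
            raise StaleTokenError(f"token {token} < {self.highest_token}")
        self.highest_token = token
        self.value = value


def test_stale_writer_is_rejected() -> None:
    resource = FencedResource()
    # A acquired with fence 33 and then paused; its lease expired.
    resource.write(34, "from B")  # B acquired with fence 34 and writes first
    try:
        resource.write(33, "from A")  # A resumes with its old token
    except StaleTokenError:
        pass
    else:
        raise AssertionError("stale write was accepted")
    assert resource.value == "from B"


test_stale_writer_is_rejected()
```

Removing the token comparison from `write` makes the test fail, which is the double-write the check is meant to expose.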