sanook-cli 0.4.0 → 0.5.1
- package/.env.example +19 -0
- package/CHANGELOG.md +173 -0
- package/README.md +153 -20
- package/README.th.md +136 -0
- package/dist/agentContext.js +4 -0
- package/dist/approval.js +6 -0
- package/dist/bin.js +405 -57
- package/dist/brain.js +92 -59
- package/dist/brand.js +47 -0
- package/dist/checkpoint.js +37 -0
- package/dist/commands.js +86 -6
- package/dist/compaction.js +76 -5
- package/dist/config.js +100 -12
- package/dist/cost.js +60 -3
- package/dist/doctor.js +92 -0
- package/dist/gateway/auth.js +2 -2
- package/dist/gateway/ledger.js +2 -2
- package/dist/gateway/scheduler.js +1 -0
- package/dist/gateway/serve.js +6 -4
- package/dist/gateway/server.js +10 -2
- package/dist/git.js +11 -2
- package/dist/hooks.js +43 -17
- package/dist/knowledge.js +48 -49
- package/dist/loop.js +182 -66
- package/dist/lsp/client.js +173 -0
- package/dist/lsp/framing.js +56 -0
- package/dist/lsp/index.js +138 -0
- package/dist/lsp/servers.js +82 -0
- package/dist/mcp-server.js +244 -0
- package/dist/mcp.js +184 -29
- package/dist/memory-store.js +559 -0
- package/dist/memory.js +143 -29
- package/dist/orchestrate.js +150 -0
- package/dist/providers/codex.js +21 -7
- package/dist/providers/keys.js +3 -2
- package/dist/providers/models.js +22 -6
- package/dist/providers/registry.js +155 -1
- package/dist/repomap.js +93 -0
- package/dist/search/chunk.js +158 -0
- package/dist/search/embed-store.js +187 -0
- package/dist/search/engine.js +203 -0
- package/dist/search/fuse.js +35 -0
- package/dist/search/index-core.js +187 -0
- package/dist/search/indexer.js +241 -0
- package/dist/search/store.js +77 -0
- package/dist/session.js +42 -8
- package/dist/skill-install.js +10 -10
- package/dist/skills.js +12 -9
- package/dist/summarize.js +31 -0
- package/dist/tools/bash.js +21 -2
- package/dist/tools/diagnostics.js +41 -0
- package/dist/tools/edit.js +29 -7
- package/dist/tools/index.js +8 -1
- package/dist/tools/list.js +7 -2
- package/dist/tools/permission.js +90 -9
- package/dist/tools/read.js +23 -4
- package/dist/tools/remember.js +1 -1
- package/dist/tools/sandbox.js +61 -0
- package/dist/tools/search.js +105 -4
- package/dist/tools/task.js +195 -29
- package/dist/tools/timeout.js +35 -0
- package/dist/tools/util.js +10 -0
- package/dist/tools/write.js +6 -4
- package/dist/trust.js +89 -0
- package/dist/ui/app.js +228 -31
- package/dist/ui/banner.js +4 -9
- package/dist/ui/brain-wizard.js +2 -2
- package/dist/ui/history.js +30 -0
- package/dist/ui/mentions.js +44 -0
- package/dist/ui/render.js +55 -15
- package/dist/ui/setup.js +97 -12
- package/dist/ui/useEditor.js +83 -0
- package/dist/update.js +114 -0
- package/dist/worktree.js +173 -0
- package/package.json +11 -5
- package/scripts/postinstall.mjs +33 -0
- package/second-brain/.agents/_Index.md +30 -0
- package/second-brain/.agents/skills/_Index.md +30 -0
- package/second-brain/.agents/workflows/_Index.md +30 -0
- package/second-brain/AGENTS.md +4 -4
- package/second-brain/Acceptance/_Index.md +30 -0
- package/second-brain/Acceptance/golden-case-template.md +39 -0
- package/second-brain/Areas/_Index.md +30 -0
- package/second-brain/Bugs/System-OS/_Index.md +30 -0
- package/second-brain/Bugs/_Index.md +30 -0
- package/second-brain/CLAUDE.md +4 -1
- package/second-brain/Checklists/_Index.md +30 -0
- package/second-brain/Checklists/preflight-postflight-template.md +29 -0
- package/second-brain/Distillations/_Index.md +30 -0
- package/second-brain/Entities/_Index.md +30 -0
- package/second-brain/Entities/entity-template.md +33 -0
- package/second-brain/Evals/_Index.md +30 -0
- package/second-brain/Evals/correction-pairs.md +24 -0
- package/second-brain/Evals/failure-taxonomy.md +24 -0
- package/second-brain/Evals/golden-set.md +25 -0
- package/second-brain/Evals/quality-ledger.md +23 -0
- package/second-brain/Evals/self-eval-rubric.md +23 -0
- package/second-brain/GEMINI.md +4 -4
- package/second-brain/Goals/_Index.md +30 -0
- package/second-brain/Handoffs/_Index.md +30 -0
- package/second-brain/Home.md +7 -0
- package/second-brain/Intake/Raw Sources/_Index.md +30 -0
- package/second-brain/Intake/_Index.md +30 -0
- package/second-brain/Intake/_Quarantine/_Index.md +30 -0
- package/second-brain/Learning/_Index.md +30 -0
- package/second-brain/Playbooks/_Index.md +30 -0
- package/second-brain/Playbooks/playbook-template.md +23 -0
- package/second-brain/Projects/_Index.md +30 -0
- package/second-brain/Prompts/_Index.md +30 -0
- package/second-brain/README.md +2 -1
- package/second-brain/Research/_Index.md +30 -0
- package/second-brain/Retrospectives/_Index.md +30 -0
- package/second-brain/Reviews/_Index.md +30 -0
- package/second-brain/Runbooks/_Index.md +30 -0
- package/second-brain/Runbooks/eval-loop.md +24 -0
- package/second-brain/Sessions/_Index.md +30 -0
- package/second-brain/Shared/AI-Context-Index.md +20 -0
- package/second-brain/Shared/AI-Threads/_Index.md +30 -0
- package/second-brain/Shared/Archive/_Index.md +30 -0
- package/second-brain/Shared/Assets/_Index.md +30 -0
- package/second-brain/Shared/Context-Packs/_Index.md +30 -0
- package/second-brain/Shared/Context7-Docs/_Index.md +30 -0
- package/second-brain/Shared/Coordination/NOW.md +28 -0
- package/second-brain/Shared/Coordination/_Index.md +30 -0
- package/second-brain/Shared/Coordination/agent-registry.md +24 -0
- package/second-brain/Shared/Coordination/task-board/_Index.md +30 -0
- package/second-brain/Shared/Coordination/task-board/task-template.md +43 -0
- package/second-brain/Shared/Coordination/task-board.md +32 -0
- package/second-brain/Shared/Core-Facts/_Index.md +30 -0
- package/second-brain/Shared/Decision-Memory/_Index.md +30 -0
- package/second-brain/Shared/Glossary/_Index.md +30 -0
- package/second-brain/Shared/Memory-Inbox/_Index.md +30 -0
- package/second-brain/Shared/Operating-State/_Index.md +30 -0
- package/second-brain/Shared/Prompting/_Index.md +30 -0
- package/second-brain/Shared/Provenance/_Index.md +30 -0
- package/second-brain/Shared/Rules/_Index.md +30 -0
- package/second-brain/Shared/Rules/contextual-note-rule.md +30 -0
- package/second-brain/Shared/Rules/frontmatter-standard.md +10 -0
- package/second-brain/Shared/Rules/memory-write-protocol.md +28 -0
- package/second-brain/Shared/Rules/procedural-runbook-header.md +40 -0
- package/second-brain/Shared/Rules/review-and-staleness-policy.md +22 -0
- package/second-brain/Shared/Rules/rules-formatting.md +34 -0
- package/second-brain/Shared/Scripts/_Index.md +30 -0
- package/second-brain/Shared/Scripts-Archive/_Index.md +30 -0
- package/second-brain/Shared/Tech-Standards/_Index.md +30 -0
- package/second-brain/Shared/Tech-Standards/verification-standard.md +40 -0
- package/second-brain/Shared/User-Memory/_Index.md +30 -0
- package/second-brain/Shared/User-Persona/_Index.md +30 -0
- package/second-brain/Shared/User-Persona/owner-profile.md +25 -0
- package/second-brain/Shared/Working-Memory/_Index.md +30 -0
- package/second-brain/Shared/_Index.md +30 -0
- package/second-brain/Shared/mcp-servers/_Index.md +30 -0
- package/second-brain/Skills/_Index.md +30 -0
- package/second-brain/Templates/_Index.md +30 -0
- package/second-brain/Templates/bug.md +2 -0
- package/second-brain/Templates/handoff.md +2 -0
- package/second-brain/Templates/session.md +2 -0
- package/second-brain/Tools/_Index.md +30 -0
- package/second-brain/Traces/_Index.md +30 -0
- package/second-brain/Vault Structure Map.md +33 -1
- package/second-brain/copilot/_Index.md +30 -0
- package/skills/audit-license-compliance/SKILL.md +117 -0
- package/skills/author-codemod/SKILL.md +110 -0
- package/skills/build-audit-logging/SKILL.md +112 -0
- package/skills/build-cdc-streaming-pipeline/SKILL.md +123 -0
- package/skills/build-cli-tool/SKILL.md +108 -0
- package/skills/build-data-table/SKILL.md +141 -0
- package/skills/build-native-mobile-ui/SKILL.md +154 -0
- package/skills/build-offline-first-sync/SKILL.md +118 -0
- package/skills/build-realtime-channel/SKILL.md +122 -0
- package/skills/build-vector-search/SKILL.md +131 -0
- package/skills/compose-local-dev-stack/SKILL.md +149 -0
- package/skills/configure-bundler-build/SKILL.md +166 -0
- package/skills/configure-dns-tls/SKILL.md +142 -0
- package/skills/configure-reverse-proxy-lb/SKILL.md +129 -0
- package/skills/configure-security-headers-csp/SKILL.md +122 -0
- package/skills/contract-testing/SKILL.md +140 -0
- package/skills/datetime-timezone-correctness/SKILL.md +125 -0
- package/skills/debug-ci-pipeline-failure/SKILL.md +134 -0
- package/skills/debug-flaky-tests/SKILL.md +128 -0
- package/skills/defend-llm-prompt-injection/SKILL.md +110 -0
- package/skills/deliver-webhooks/SKILL.md +116 -0
- package/skills/design-api-pagination/SKILL.md +144 -0
- package/skills/design-authorization-model/SKILL.md +119 -0
- package/skills/design-backup-dr-recovery/SKILL.md +113 -0
- package/skills/design-event-sourcing-cqrs/SKILL.md +143 -0
- package/skills/design-multi-tenancy/SKILL.md +100 -0
- package/skills/design-protobuf-grpc-service/SKILL.md +146 -0
- package/skills/design-relational-schema/SKILL.md +129 -0
- package/skills/design-search-index-infra/SKILL.md +151 -0
- package/skills/design-state-machine/SKILL.md +108 -0
- package/skills/design-token-system/SKILL.md +109 -0
- package/skills/distributed-locks-leases/SKILL.md +120 -0
- package/skills/encrypt-sensitive-data/SKILL.md +148 -0
- package/skills/feature-flags-rollout/SKILL.md +130 -0
- package/skills/file-upload-object-storage/SKILL.md +107 -0
- package/skills/fuzz-dynamic-security-test/SKILL.md +111 -0
- package/skills/harden-llm-app-reliability/SKILL.md +126 -0
- package/skills/i18n-localization-setup/SKILL.md +113 -0
- package/skills/idempotency-keys/SKILL.md +107 -0
- package/skills/implement-push-notifications/SKILL.md +142 -0
- package/skills/ingest-webhook-secure/SKILL.md +120 -0
- package/skills/integrate-oauth-oidc/SKILL.md +126 -0
- package/skills/load-stress-test/SKILL.md +129 -0
- package/skills/map-privacy-data-gdpr/SKILL.md +146 -0
- package/skills/model-nosql-data/SKILL.md +118 -0
- package/skills/money-decimal-arithmetic/SKILL.md +123 -0
- package/skills/monitor-ml-drift/SKILL.md +109 -0
- package/skills/numeric-precision-units/SKILL.md +144 -0
- package/skills/optimize-llm-cost-latency/SKILL.md +103 -0
- package/skills/optimize-react-rerenders/SKILL.md +124 -0
- package/skills/orchestrate-agent-workflow/SKILL.md +100 -0
- package/skills/payments-billing-integration/SKILL.md +114 -0
- package/skills/pin-toolchain-versions/SKILL.md +116 -0
- package/skills/plan-strangler-migration/SKILL.md +95 -0
- package/skills/property-based-testing/SKILL.md +108 -0
- package/skills/publish-package-registry/SKILL.md +130 -0
- package/skills/recover-git-state/SKILL.md +119 -0
- package/skills/remediate-web-vulnerabilities/SKILL.md +125 -0
- package/skills/resilience-timeouts-retries/SKILL.md +104 -0
- package/skills/resolve-merge-rebase-conflict/SKILL.md +97 -0
- package/skills/rewrite-git-history/SKILL.md +109 -0
- package/skills/scaffold-cross-platform-app/SKILL.md +137 -0
- package/skills/schema-evolution-compatibility/SKILL.md +121 -0
- package/skills/send-transactional-email/SKILL.md +126 -0
- package/skills/serve-deploy-ml-model/SKILL.md +107 -0
- package/skills/setup-cdn-edge-waf/SKILL.md +107 -0
- package/skills/setup-devcontainer-env/SKILL.md +131 -0
- package/skills/setup-lint-format-precommit/SKILL.md +140 -0
- package/skills/setup-monorepo-tooling/SKILL.md +125 -0
- package/skills/ship-mobile-app-store-release/SKILL.md +137 -0
- package/skills/structured-output-llm/SKILL.md +86 -0
- package/skills/supply-chain-sbom-provenance/SKILL.md +120 -0
- package/skills/test-data-factories/SKILL.md +158 -0
- package/skills/threat-model-stride/SKILL.md +123 -0
- package/skills/train-evaluate-ml-model/SKILL.md +109 -0
- package/skills/unicode-text-correctness/SKILL.md +109 -0
- package/skills/visual-regression-testing/SKILL.md +120 -0
@@ -0,0 +1,118 @@
---
name: model-nosql-data
description: Models data for document, key-value, and wide-column stores access-pattern-first; enumerates queries, then picks partition/sort keys for even distribution, chooses embed-vs-reference per relationship, lays out single-table/aggregate items, denormalizes deliberately with a fan-out path, and avoids hot partitions.
when_to_use: When the datastore is non-relational (DynamoDB, MongoDB, Cassandra/ScyllaDB, Firestore, Bigtable) and you must shape items/documents/rows around queries (picking partition keys, embed vs reference, a single-table model, or a wide-column primary key) before writing the data layer. Distinct from design-relational-schema (normalized tables + joins) and optimize-sql-query (tunes an existing relational query); caching-strategy is a read cache in front of any store, not the store's own model.
---

## When to Use

Reach for this skill when you must **shape the store around its queries**, before any table/collection exists:

- "Design the DynamoDB table(s) for this service"
- "Should this be embedded or a separate collection in MongoDB?"
- "Pick the partition key and clustering columns for this Cassandra table"
- "We're getting hot partitions / throttling on one key; fix the key design"
- "Model a many-to-many (users↔teams, products↔orders) in a store with no joins"
- "Firestore/Bigtable layout for a feed/timeline read"

NOT this skill:
- A normalized schema with joins in a **relational** DB (entities, 3NF, FK/CHECK) → design-relational-schema
- A slow query against an existing **relational** schema → optimize-sql-query
- A read cache (TTL, invalidation, stampede) **in front of** the store → caching-strategy
- Schema *change* safety / locks / rollback on a live table → db-migration-safety
- Append-only event streams + projections as the system of record → design-event-sourcing-cqrs
- Background work / queue semantics → message-queue-jobs

## Steps

1. **Enumerate every access pattern first; this drives 100% of the design. No keys until this table is full.** One row per *operation*, reads and writes. A pattern you forget becomes a full scan in prod.

| Pattern | R/W | Args (known at call time) | Result shape | Freq | Latency target | Selectivity |
|---|---|---|---|---|---|---|
| Get user by id | R | userId | 1 item | very high | <10ms | 1 |
| List orders for user, newest first | R | userId, limit | N items, sorted | high | <20ms | bounded ~100s |
| Get order + its line items | R | orderId | 1+M items | high | <20ms | bounded |
| Create order (+ items, + user counter) | W | order, items | n/a | med | <30ms | multi-item |

Rule: **you can only query by what you have in hand.** Every read's *Args* column must become a key or index prefix in step 3. If an Arg isn't a key, that read is a scan, so reject the model.

2. **Confirm store-family fit before modeling.** Don't model a graph in a KV store.

| Family | Pick when | Avoid when | Examples |
|---|---|---|---|
| **Document** | nested aggregate read/written as a unit; flexible fields; secondary indexes needed | heavy cross-doc joins; huge fan-out updates of shared data | MongoDB, Firestore |
| **Wide-column** | massive write volume; time-series/feeds; query = known partition + range scan | ad-hoc queries on non-key columns; multi-key transactions | Cassandra, ScyllaDB, Bigtable |
| **Key-value / single-table** | every access is by a designed key; you want one round trip per pattern | analytics / unpredictable query shapes | DynamoDB single-table, Redis-as-primary |

Default for an app backend with a fixed, known pattern set: **document store** unless write volume or strict single-digit-ms-at-scale forces wide-column/DynamoDB.

3. **Design keys for distribution (partition) and range (sort); distribution is non-negotiable.**
- **Partition/hash key** = which physical shard. Choose **high-cardinality, evenly-requested** values: `userId`, `tenantId#userId`, `deviceId`. **Never** use a low-cardinality or monotonic value (`status`, `country`, `true/false`, a date, an auto-increment) as the sole partition key: that is the #1 hot-partition cause.
- **Sort/clustering key** = order *within* a partition; it also enables range/`begins_with` queries: `createdAt`, `ORDER#<ts>`, `<type>#<id>`. Compose it to serve range reads: items sorted by time, prefix-filterable.
- DynamoDB single-table generic keys: `PK` / `SK` plus overloaded `GSI1PK`/`GSI1SK`. Encode the entity type in the key, not a separate column:

```
PK          SK                    GSI1PK      GSI1SK
USER#u1     PROFILE               -           -              # user profile
USER#u1     ORDER#2026-06-15#o9   ORDER#o9    STATUS#PAID    # order under user, + lookup by order
ORDER#o9    ITEM#li1              -           -              # line item under order
```
`Query(PK=USER#u1, SK begins_with ORDER#, ScanIndexForward=false)` → that user's orders, newest-first, one round trip, no scan. (Results come back in ascending sort-key order by default, so the descending read is what makes them newest-first.)
- **Bound every partition.** If items-per-partition grows without limit (all events under `PK=GLOBAL`, a celebrity's followers under one key), shard it: suffix `#<bucket 0..N-1>` (write-sharding) and scatter-gather on read, or split the partition by time (`PK=feed#u1#2026-06`).
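
As a rough illustration of the key rules in step 3, the sketch below composes the `PK`/`SK` pair for an order stored under its user and derives a write-sharded partition key. The function names, the SHA-256 bucketing, and the bucket count are assumptions made for this example; none of them come from a DynamoDB API.

```python
import hashlib


def user_order_keys(user_id: str, order_id: str, created_iso: str) -> dict[str, str]:
    """Build PK/SK for an order stored under its user (layout from the table above)."""
    return {"PK": f"USER#{user_id}", "SK": f"ORDER#{created_iso}#{order_id}"}


def sharded_pk(base: str, item_id: str, buckets: int) -> str:
    """Write-sharding: spread one logical partition over `buckets` physical keys.

    Hashing the item id (an assumption of this sketch) keeps the bucket stable
    for a given item while spreading different items evenly.
    """
    digest = hashlib.sha256(item_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % buckets
    return f"{base}#{bucket}"


def all_shards(base: str, buckets: int) -> list[str]:
    """Every physical key to query when scatter-gathering the logical partition."""
    return [f"{base}#{b}" for b in range(buckets)]
```

A read of the whole logical partition would issue one query per key from `all_shards` and merge the results, which is the scatter-gather cost that write-sharding trades for even distribution.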

4. **Embed vs reference: decide per relationship, default to embed for read-together.**

| Embed (nest inside parent doc/item) | Reference (separate doc/item + id) |
|---|---|
| Read together almost always | Accessed independently / by other parents |
| Child count **bounded & small** (≤ dozens) | Unbounded or large child set |
| Child owned by exactly one parent | Shared across many parents |
| Updated together / rarely | High-churn child, low-churn parent (write amplification) |
| Total well under the item-size limit | Would blow the size limit |

**Hard ceilings (model to them, not near them):** DynamoDB item **400 KB**; MongoDB document **16 MB**; Cassandra partition keep **< 100 MB / < 100k rows**. Embedding an unbounded array (comments, events, followers) eventually hits the ceiling and turns every append into a full-doc rewrite, so **reference those.** Embed `address`, `lineItems` of one order; reference `comments`, `auditEvents`, `members`.

5. **Model M:N and secondary lookups as items/indexes; there are no joins.**
- **Composite sort key** for one-to-many under a parent: `SK = ORDER#<ts>#<id>`, query by `begins_with(ORDER#)`.
- **Adjacency list** for M:N in single-table: both directions are items. `PK=USER#u1, SK=TEAM#t1` *and* `PK=TEAM#t1, SK=USER#u1`. Query a user's teams by `PK=USER#u1, begins_with(TEAM#)`; a team's users by the mirror.
- **GSI / inverted index** to query by a non-key attribute: project `email` into `GSI1PK=EMAIL#<email>` to "get user by email." Each new *read* pattern that doesn't fit the base key = one GSI (DynamoDB) or one secondary index (Mongo/Cassandra), not a scan.
- Keep GSIs few and purposeful: each is a full extra copy of projected attributes (storage + write cost on every base write).
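
The adjacency-list pattern from step 5 can be sketched with plain dictionaries. Here `query` is a hypothetical in-memory stand-in for `Query(PK=…, SK begins_with …)`, written only to show that both directions of the M:N resolve without a scan of unrelated partitions; it is not a client for any real store.

```python
def membership_items(user_id: str, team_id: str) -> list[dict[str, str]]:
    """One M:N edge is written as two items, one per direction."""
    return [
        {"PK": f"USER#{user_id}", "SK": f"TEAM#{team_id}"},
        {"PK": f"TEAM#{team_id}", "SK": f"USER#{user_id}"},
    ]


def query(items: list[dict[str, str]], pk: str, sk_prefix: str) -> list[str]:
    """Hypothetical stand-in for a key query: exact PK, SK prefix, sorted by SK."""
    return sorted(
        item["SK"]
        for item in items
        if item["PK"] == pk and item["SK"].startswith(sk_prefix)
    )


# Build a tiny table: u1 is in t1 and t2, u2 is in t1.
table: list[dict[str, str]] = []
for user, team in [("u1", "t1"), ("u1", "t2"), ("u2", "t1")]:
    table.extend(membership_items(user, team))

teams_of_u1 = query(table, "USER#u1", "TEAM#")
users_of_t1 = query(table, "TEAM#t1", "USER#")
```

In a real store the two items of an edge would be written together (for example in one transaction) so that the mirror never goes missing.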

6. **Denormalize deliberately, and write the path that keeps the copies consistent.** Duplicating data (order carries `userName`, post carries `authorAvatar`) is correct NoSQL, *if* you own the fan-out:
- **Single write touches multiple items** → use a **transaction** (DynamoDB `TransactWriteItems` ≤ 100 items; Mongo multi-doc txn) so the copies commit atomically.
- **One source → many copies** (author renames → update 10k posts) → **async fan-out** via a stream (DynamoDB Streams / CDC / outbox), never a synchronous loop on the write path.
- **Tolerate brief drift** → store a source-of-truth pointer and run **async repair**/reconciliation; never let two copies both claim authority.
Pick one strategy per duplicated field and write it down; silent divergence is the failure mode.

7. **Time-ordering, TTL, and blob offload.**
- Newest-first reads: make the sort key descending-friendly (`ORDER#<reverse-ts>` or query with `ScanIndexForward=false`); never client-side sort a scan.
- Expiring data (sessions, OTPs, ephemeral feeds): set native **TTL** (DynamoDB TTL attribute, Mongo TTL index, Cassandra per-row TTL) rather than running a delete cron.
- **Large/binary payloads** (images, PDFs, >100 KB blobs): store in object storage (S3/GCS) and keep only the **key/URL + metadata** in the item. Inlining blobs burns item-size budget and read throughput.
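
One way to read the `ORDER#<reverse-ts>` idea in step 7: subtract the timestamp from a fixed maximum and zero-pad it, so that an ascending lexicographic read returns the newest item first. The constant `MAX_MS`, the 13-digit width, and both helper names are assumptions of this sketch. The TTL helper simply computes an expiry as epoch seconds, which is the form DynamoDB's TTL attribute expects.

```python
MAX_MS = 10**13 - 1  # assumption: millisecond timestamps that fit in 13 digits


def reverse_ts_key(epoch_ms: int) -> str:
    """Sort key whose ascending order is newest-first."""
    if not 0 <= epoch_ms <= MAX_MS:
        raise ValueError("timestamp outside the supported range")
    return f"ORDER#{MAX_MS - epoch_ms:013d}"


def ttl_epoch_seconds(now_s: int, lifetime_s: int) -> int:
    """Expiry time as epoch seconds, to be stored in the item's TTL attribute."""
    return now_s + lifetime_s


older = reverse_ts_key(1_700_000_000_000)
newer = reverse_ts_key(1_700_000_060_000)
keys_ascending = sorted([older, newer])  # newest item sorts first
```

The zero padding matters: without a fixed width, string comparison would not match numeric order.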

8. **Prove no full scans: map each access pattern to exactly one key/index path.** Re-walk the step-1 table; for every row write the resolved access: `Query(PK=…, SK …)` / `GetItem` / `Query(GSI1, …)`. If any row resolves to `Scan` or "filter after fetch on a non-key attribute," the model is incomplete: add a key, GSI, or duplicated item and repeat. A pattern with no index path is a bug, not a tradeoff.
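
The re-walk in step 8 can be mechanized in a simplified form. The sketch below treats a pattern as resolved when some index path's key attributes are all among the arguments known at call time; `AccessPattern`, `IndexPath`, and the attribute names are hypothetical, and the check deliberately ignores sort-key conditions and result ordering.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessPattern:
    name: str
    args: frozenset[str]  # what the caller has in hand


@dataclass(frozen=True)
class IndexPath:
    name: str
    key_attrs: frozenset[str]  # attributes needed to address this key/index


def resolve(pattern: AccessPattern, paths: list[IndexPath]) -> str:
    """Name of the first usable key/index path, or 'Scan' if none fits."""
    for path in paths:
        if path.key_attrs <= pattern.args:
            return path.name
    return "Scan"


def unresolved(patterns: list[AccessPattern], paths: list[IndexPath]) -> list[str]:
    """Patterns that would fall back to a scan under the current model."""
    return [p.name for p in patterns if resolve(p, paths) == "Scan"]


patterns = [
    AccessPattern("Get user by id", frozenset({"userId"})),
    AccessPattern("List orders for user", frozenset({"userId", "limit"})),
    AccessPattern("Get user by email", frozenset({"email"})),
]
paths = [IndexPath("base table (PK=USER#id)", frozenset({"userId"}))]

missing_before = unresolved(patterns, paths)
paths.append(IndexPath("GSI1 (GSI1PK=EMAIL#email)", frozenset({"email"})))
missing_after = unresolved(patterns, paths)
```

Running a check like this whenever the pattern table changes keeps a forgotten query from surfacing later as a scan.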

## Common Errors

- **Designing keys before listing access patterns.** Guarantees a missing query path discovered in prod as a scan. Fill the step-1 table first, always.
- **Low-cardinality or monotonic partition key** (`status`, `date`, `true`, auto-increment id). Concentrates traffic on one shard → hot partition + throttling. Use a high-cardinality, evenly-hit key; write-shard or time-bucket if forced.
- **Embedding an unbounded array** (comments/events/followers inside the parent). Hits the 400 KB / 16 MB ceiling and makes every append rewrite the whole doc. Reference it as child items.
- **Modeling relational then "adding NoSQL on top."** Normalized tables + app-side joins = N+1 round trips and scans. Model the aggregate the query needs, even if it duplicates data.
- **A `Filter`/`$match` on a non-key field mistaken for a query.** DynamoDB `FilterExpression` and Mongo filters on un-indexed fields are applied *after* the items have been read, so you pay for (and wait on) everything that was read, not just what is returned. Make the filter field a key/index prefix.
- **Denormalized copy with no fan-out path.** `userName` cached in 10k orders, never updated on rename → permanent stale data. Define transaction / stream fan-out / async repair per duplicated field.
- **One GSI per attribute "just in case."** Each index is a full write-amplifying copy. Add a GSI only for an actual read pattern from step 1.
- **Synchronous fan-out on the write path** (loop updating thousands of copies in the request). Latency spikes and partial failures. Offload to a stream/queue.
- **Unbounded partition growth** (all rows under one `PK`, a whale tenant). Wide-column partition > ~100 MB degrades; DynamoDB throttles the key. Bucket by time or write-shard with a suffix.
- **Blobs inline in the item.** Caps how many items fit per read and wastes throughput. Offload to object storage, keep a pointer.

## Verify

1. **Coverage:** every step-1 access pattern maps to exactly one `Get`/`Query`/index path; **zero** resolve to `Scan` or post-fetch filter on a non-key field.
2. **Distribution:** the partition key is high-cardinality and request-even; no sole partition key is a status/boolean/date/sequence. Estimate items & bytes per hottest partition and confirm they stay under the family ceiling (DynamoDB ~10 GB/partition soft, Cassandra < 100 MB, Mongo doc < 16 MB, DynamoDB item < 400 KB).
3. **Range reads** return already-sorted (sort/clustering key does the ordering); no client-side sort over a fetched set.
4. **Embed/reference** justified per relationship against the step-4 table; no unbounded array embedded; largest realistic item stays well under the limit.
5. **M:N & secondary lookups** each have an explicit path (adjacency item pair, composite SK, or GSI); confirm both directions of every M:N.
6. **Each denormalized field** names its consistency mechanism (transaction / stream fan-out / async repair); none has two authoritative copies.
7. **Throughput sim:** project read+write units (or ops/s) per partition under peak from step-1 frequencies; confirm no single key exceeds the per-partition limit; write-shard/time-bucket where it does.
8. **TTL** set on every ephemeral entity; **blobs > ~100 KB** offloaded to object storage with only a pointer stored.

Done = every access pattern resolves to a single non-scan key/index path, no partition key is hot or unbounded under projected peak load, every embedded relationship is bounded under the item-size limit, and every denormalized copy has a named write-path keeping it consistent.
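
The throughput check in Verify item 7 reduces to simple arithmetic once peak rates per key are estimated. This sketch assumes the per-partition limit is supplied by the caller (the real figure depends on the store and its configuration) and that load spreads evenly across shards; both helper names are illustrative.

```python
import math


def hot_keys(peak_ops_per_s: dict[str, float], per_partition_limit: float) -> list[str]:
    """Keys whose projected peak exceeds what one partition can serve."""
    return sorted(k for k, rate in peak_ops_per_s.items() if rate > per_partition_limit)


def shards_needed(peak_ops_per_s: float, per_partition_limit: float) -> int:
    """Smallest write-shard count that brings each shard under the limit,
    assuming traffic is spread evenly across shards."""
    return max(1, math.ceil(peak_ops_per_s / per_partition_limit))


projected = {"USER#u1": 40.0, "FEED#GLOBAL": 4500.0}
over_limit = hot_keys(projected, per_partition_limit=1000.0)
buckets = shards_needed(projected["FEED#GLOBAL"], per_partition_limit=1000.0)
```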
@@ -0,0 +1,123 @@
---
name: money-decimal-arithmetic
description: Implements correct monetary and decimal arithmetic using integer minor units or arbitrary-precision decimals, covering per-currency exponents (ISO 4217), explicit rounding modes (banker's vs half-up), largest-remainder allocation that sums exactly, FX triangulation, NUMERIC storage, and locale-aware formatting, to eliminate float drift and off-by-a-penny totals.
when_to_use: Code does financial math (prices, totals, tax/VAT, discounts, interest, invoicing, splitting a charge across line items, multi-currency conversion, or rounding to cents); money is stored as float or summed with ad-hoc Math.round; totals are off by a cent; or you're choosing a money/decimal type (BigDecimal, Python decimal, dinero.js, rust_decimal). Distinct from numeric-precision-units (general float/units correctness, not currency-exponent/allocation/FX rules) and payments-billing-integration (drives PSP charge/subscription state, then calls this skill for the totals).
---

## When to Use

Reach for this skill when the bug or task is about **numeric correctness of money**, not throughput, schema, or float/units in general:

- "Total is off by a cent" / "tax doesn't add up to the sum of line items"
- "Split this $100 charge across 3 items / refund proportionally"
- "Store and compute prices, discounts, VAT, interest, invoice rounding"
- "Convert USD→JPY→EUR, what rate precision and rounding?"
- "We store amounts as `float`/`DOUBLE`. Is that safe?" (no)
- Choosing a money/decimal type: `BigDecimal`, Python `decimal.Decimal`, `dinero.js`, `rust_decimal`, Joda-Money-style money libs

NOT this skill:
- General float pitfalls (epsilon/ULP compare, Kahan summation, NaN/Inf guards) or non-money unit conversion (metric/imperial, data sizes, angles) → numeric-precision-units (this skill is the money-specific specialization: ISO 4217 exponents, allocation, FX)
- Integrating a PSP, idempotent charges, subscription/proration, payment webhooks → payments-billing-integration (it owns billing *state*; it calls this skill for the rounding/allocation/FX math)
- Making a slow aggregate query fast → optimize-sql-query
- Detecting nulls/outliers/dupes in a dataset → validate-data-quality
- Picking column types / running a safe ALTER on a money column → db-migration-safety
- Reshaping/cleaning numeric columns in a dataframe → wrangle-tabular-data
- Serialization contract for an API field's type → rest-graphql-contract
- Writing the property tests themselves as a test suite → write-tests (this skill specifies *which* invariants; that one structures the suite)

## Steps

1. **Never use binary float for money. Pick the representation by language, not by habit.** Floats can't represent `0.10` exactly, so `0.1 + 0.2 === 0.30000000000000004` and `0.1 * 3 !== 0.3`. Two safe representations:

| Representation | What it is | Use when | Watch out |
|---|---|---|---|
| **Integer minor units** | store cents as `int`/`bigint` (`$12.34` → `1234`) | default for fixed exponent, transactional ledgers, money over the wire | must track currency to know the exponent; intermediate math (interest, %) still needs a decimal/round step |
| **Arbitrary-precision decimal** | base-10 type: Python `Decimal`, Java `BigDecimal`, .NET `decimal`, `rust_decimal`, JS `decimal.js`/`big.js` | rates, %, interest, tax with sub-cent intermediates, accounting needing >2 dp | set a context/precision; still must round to currency exponent at the boundary; .NET `decimal` and `rust_decimal` are fixed-width (roughly 28 significant digits), not truly arbitrary-precision |

Per language: **JS/TS** → `dinero.js` v2 or `big.js` (never `Number`); **Python** → `decimal.Decimal` (never `float`); **Java/Kotlin** → `BigDecimal` (never `double`); **.NET** → `decimal`; **Rust** → `rust_decimal::Decimal`; **Go** → `int64` minor units or `shopspring/decimal`; **Postgres** → `NUMERIC` (never `FLOAT`/`REAL`/`DOUBLE PRECISION`).
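
To make the drift in step 1 concrete, this small Python comparison sums the same ten-cent amount ten times in each representation. It is only a demonstration; the variable names are illustrative.

```python
from decimal import Decimal

# Binary floats: 0.1 has no exact base-2 representation, so error accumulates.
float_sum = sum([0.1] * 10)

# Decimals built from strings stay exact in base 10.
decimal_sum = sum([Decimal("0.10")] * 10, Decimal("0"))

# Integer minor units (cents) are exact by construction.
cents_sum = sum([10] * 10)

float_is_exact = float_sum == 1.0
decimal_is_exact = decimal_sum == Decimal("1.00")
cents_is_exact = cents_sum == 100
```

Note that the `Decimal` values are constructed from strings; `Decimal(0.1)` would faithfully copy the float's binary error into the decimal.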

2. **Model amount + currency as one value; respect the per-currency exponent.** A bare number is not money; `100` is meaningless without a currency, and the exponent varies per ISO 4217:

| Currency | Exponent (minor digits) | `1.00` unit = |
|---|---|---|
| USD, EUR, GBP | 2 | 100 cents (pence for GBP) |
| JPY, KRW, CLP | **0** | 1 (no minor unit) |
| BHD, KWD, TND | **3** | 1000 fils (BHD, KWD) or millimes (TND) |

```ts
type Money = { amount: bigint; currency: string }; // amount in MINOR units
// $12.34    → { amount: 1234n, currency: "USD" }   exponent 2
// ¥1234     → { amount: 1234n, currency: "JPY" }   exponent 0
// 1.234 BD  → { amount: 1234n, currency: "BHD" }   exponent 3
```
Reject any binary op on two `Money` of different currencies (throw, don't coerce). Drive the exponent from an ISO 4217 table, never hardcode `2`.
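
A Python counterpart to the `Money` shape above might look like the following. The `EXPONENTS` dictionary is a four-entry excerpt for illustration only (a real system would load the full ISO 4217 table), and `from_major` is a hypothetical helper name.

```python
from dataclasses import dataclass
from decimal import Decimal

# Illustrative excerpt; assumption: the full ISO 4217 table is loaded elsewhere.
EXPONENTS = {"USD": 2, "EUR": 2, "JPY": 0, "BHD": 3}


@dataclass(frozen=True)
class Money:
    amount: int    # minor units
    currency: str  # ISO 4217 code

    def __add__(self, other: "Money") -> "Money":
        # Throw instead of coercing when currencies differ.
        if self.currency != other.currency:
            raise ValueError(f"cannot add {self.currency} and {other.currency}")
        return Money(self.amount + other.amount, self.currency)


def from_major(text: str, currency: str) -> Money:
    """Parse a major-unit string such as "12.34" into minor units."""
    exponent = EXPONENTS[currency]
    scaled = Decimal(text).scaleb(exponent)
    if scaled != scaled.to_integral_value():
        raise ValueError("more decimal places than the currency allows")
    return Money(int(scaled), currency)


usd = from_major("12.34", "USD")
jpy = from_major("1234", "JPY")
bhd = from_major("1.234", "BHD")
```

All three values carry the same integer `1234`, which is exactly why the currency has to travel with the amount.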

3. **Decide and document ONE rounding mode; round only at boundaries.** Defaults differ across languages and libraries, so state the mode explicitly:
- **Banker's rounding (half-to-even, `ROUND_HALF_EVEN`)**: default for statistical/aggregate fairness; removes the upward bias of always rounding `.5` up. Use for interest, large batches, GAAP/IFRS contexts. `2.5→2`, `3.5→4`.
- **Half-up (`ROUND_HALF_UP`, "arithmetic")**: what invoices and most tax authorities expect for a single bill line. `2.5→3`. Many VAT rules mandate this per line.

Pick **half-even as the engine default**, override to **half-up where a tax/billing rule requires it**, and write the chosen mode next to the code. **Carry full precision through the calculation; round exactly once, at the point you produce a displayable/storable currency amount.** Never round intermediates, or errors compound.
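
The two modes from step 3 can be compared directly with Python's `decimal` module. The helper name `round_to_exponent` is an assumption of this sketch; `quantize`, `ROUND_HALF_EVEN`, and `ROUND_HALF_UP` are standard-library names.

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP


def round_to_exponent(value: Decimal, exponent: int, mode: str = ROUND_HALF_EVEN) -> Decimal:
    """Round to a currency's minor-unit precision using an explicit mode."""
    quantum = Decimal(1).scaleb(-exponent)  # exponent 2 gives Decimal("0.01")
    return value.quantize(quantum, rounding=mode)


# Exact ties show the difference between the modes.
even_low = round_to_exponent(Decimal("2.5"), 0)                  # tie goes to the even neighbor
even_high = round_to_exponent(Decimal("3.5"), 0)
half_up = round_to_exponent(Decimal("2.5"), 0, ROUND_HALF_UP)    # tie goes away from zero
cents_even = round_to_exponent(Decimal("1.005"), 2)
cents_up = round_to_exponent(Decimal("1.005"), 2, ROUND_HALF_UP)
```

Passing the mode as an argument at every call site keeps the choice visible next to the code, as the step recommends.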
|
|
60
|
+
|
|
61
|
+
4. **Allocate with largest-remainder so the parts sum EXACTLY to the whole.** Splitting `$100 / 3` as `33.33 × 3 = 99.99` leaks a penny. Distribute the remainder deterministically:
|
|
62
|
+
|
|
63
|
+
```python
def allocate(total_minor: int, ratios: list[int]) -> list[int]:
    s = sum(ratios)
    if s <= 0 or any(r < 0 for r in ratios):
        raise ValueError("ratios must be non-negative with a positive sum")
    shares = [total_minor * r // s for r in ratios]   # floor each share
    remainder = total_minor - sum(shares)             # minor units left over
    # hand out the leftover units, one each, by largest fractional part
    order = sorted(range(len(ratios)),
                   key=lambda i: (total_minor * ratios[i]) % s, reverse=True)
    for i in order[:remainder]:
        shares[i] += 1
    return shares
# allocate(10000, [1, 1, 1]) -> [3334, 3333, 3333]   sums to 10000 exactly
```
|
|
76
|
+
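Invariant: `sum(allocate(total, ratios)) == total` for any integer total and any non-negative ratios with a positive sum (an all-zero ratio list has no defined split and should be rejected). Use this for splitting charges, proportional refunds, tax-inclusive line breakdowns.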
|
|
77
|
+
|
|
78
|
+
5. **Fix tax/discount ordering and the rounding points — they change the total.** Decide and document:
|
|
79
|
+
- **Discount before tax** (typical retail): `taxable = price − discount`, then `tax = round(taxable × rate)`.
|
|
80
|
+
- **Round per line vs round on total**: per-line rounding (round each line's tax, then sum) and total rounding (sum exact line taxes, round once) can differ by cents. Most invoice/VAT regimes require **round per line**; pick one, document it, keep it consistent across the whole invoice.
|
|
81
|
+
- **Tax-inclusive (gross) prices**: extract tax with `tax = round(gross × rate / (1 + rate))`; the net is `gross − tax` so the parts reconcile exactly.
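The three decisions above can be sketched in Python with `decimal`. This assumes half-up rounding to whole minor units; the function names are hypothetical, and the rates are made-up illustration values rather than any jurisdiction's rule.

```python
from decimal import Decimal, ROUND_HALF_UP


def round_minor(x: Decimal) -> int:
    """Round a Decimal count of minor units to an integer, half-up."""
    return int(x.quantize(Decimal("1"), rounding=ROUND_HALF_UP))


def tax_per_line(lines_minor: list[int], rate: Decimal) -> int:
    # Round each line's tax, then sum the rounded values.
    return sum(round_minor(Decimal(n) * rate) for n in lines_minor)


def tax_on_total(lines_minor: list[int], rate: Decimal) -> int:
    # Sum the exact line taxes, then round once.
    return round_minor(sum(Decimal(n) * rate for n in lines_minor))


def split_gross(gross_minor: int, rate: Decimal) -> tuple[int, int]:
    """Extract tax from a tax-inclusive price; net is derived so the parts reconcile."""
    tax = round_minor(Decimal(gross_minor) * rate / (1 + rate))
    return gross_minor - tax, tax
```

With three lines of 105 minor units at a 10% rate, per-line rounding gives 33 and total rounding gives 32, which is why the chosen rule has to be documented and applied consistently.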
|
|
82
|
+
|
|
83
|
+
6. **FX conversion — fix precision, direction, triangulation, and one round.** A rate is a high-precision decimal, not money. Rules:
|
|
84
|
+
- Store rates with **at least 6 decimal places** (`decimal`, not float) and record the direction: a quoted `EUR→USD` rate is generally not exactly the reciprocal of the quoted `USD→EUR` rate, because each quote is rounded independently.
|
|
85
|
+
- Compute in full decimal precision: `target_minor = source_major × rate × 10^target_exponent`, then **round once** (half-even) to an integer number of target minor units.
|
|
86
|
+
- **Triangulate** through a base when no direct pair exists (`THB→base→JPY`); apply both legs in full precision and round only the final result, never the intermediate base amount.
|
|
87
|
+
- Never round the source before converting; never reuse a stale/averaged rate when an exact contractual rate is required.
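A sketch of these FX rules in Python, assuming an illustrative exponent table and made-up rates (real rates come from your rate source). `convert` is a hypothetical helper that takes one rate per leg, so a triangulated path is just two rates.

```python
from decimal import Decimal, ROUND_HALF_EVEN

# Illustrative subset of ISO 4217 exponents.
EXPONENT = {"USD": 2, "JPY": 0, "THB": 2, "BHD": 3}


def convert(amount_minor: int, src: str, dst: str, *rates: Decimal) -> int:
    """Convert src minor units to dst minor units.

    Each rate is one leg (src->dst, or src->base then base->dst). All legs are
    applied in full Decimal precision; rounding happens once, at the end.
    """
    major = Decimal(amount_minor).scaleb(-EXPONENT[src])  # minor -> major units
    for rate in rates:
        major *= rate                                     # no intermediate rounding
    target = major.scaleb(EXPONENT[dst])                  # major -> target minor units
    return int(target.quantize(Decimal("1"), rounding=ROUND_HALF_EVEN))
```

For example, `convert(1234, "USD", "JPY", Decimal("150.25"))` works on `12.34 × 150.25 = 1854.085` and rounds only that final figure to whole yen.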
|
|
88
|
+
|
|
89
|
+
7. **Compare and test equality on the exact integer/decimal — never a float epsilon.** With minor units / decimals, `a == b` is exact; `abs(a−b) < 1e-9` is a code smell signaling float crept in. Equality must include currency: `{1234,"USD"} != {1234,"JPY"}`. Sort/compare amounts only within the same currency.
|
|
90
|
+
|
|
91
|
+
8. **Store as exact types; serialize as string, not float JSON.** Use Postgres `NUMERIC(precision, scale)` or `BIGINT` minor units; **never `FLOAT`/`DOUBLE`/`REAL`** (lossy) and never `MONEY` (locale-fragile, fixed scale). Over JSON, emit amounts as a **string** (`"12.34"`) or as integer minor units + currency code: most parsers read a JSON number into an IEEE-754 double, which cannot hold integers beyond 2^53 (about 16 digits) exactly and stores most 2-dp values only approximately. Set the column scale to the largest exponent you support (3 covers BHD/KWD; a few ISO 4217 codes such as CLF use 4).
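A small sketch of lossless JSON transport using only the Python standard library. The field names `amount` and `currency` and the helper names are assumptions for illustration; `parse_float=Decimal` is a standard `json.loads` option for payloads whose wire format you cannot change.

```python
import json
from decimal import Decimal


def dump_amount(amount: Decimal, currency: str) -> str:
    # Emit the amount as a JSON string so no parser ever sees a JSON number.
    return json.dumps({"amount": str(amount), "currency": currency})


def load_amount(payload: str) -> tuple[Decimal, str]:
    obj = json.loads(payload)
    if not isinstance(obj["amount"], str):
        raise TypeError("amount must be a JSON string, not a number")
    return Decimal(obj["amount"]), obj["currency"]


def load_legacy(payload: str) -> dict:
    # Fallback for numeric literals already on the wire: keep them exact
    # by parsing floats straight into Decimal instead of binary double.
    return json.loads(payload, parse_float=Decimal)
```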
|
|
92
|
+
|
|
93
|
+
9. **Display and parse via the locale layer, separate from the math.** Format only at the edge with `Intl.NumberFormat(locale, {style:'currency', currency})` (JS) / `babel.numbers.format_currency` (Python) / `NumberFormat.getCurrencyInstance` (Java) — these place the symbol, group separators, and minor digits per locale (`-1.234,56 €` vs `($1,234.56)`). When parsing user input, strip locale separators back to a canonical decimal/minor-unit value before any arithmetic; never `parseFloat` a formatted string.
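For the parsing half, a stdlib-only sketch is shown here. It assumes the caller already knows the locale's group and decimal separators (a library such as Babel can supply them); `parse_minor` is a hypothetical helper name.

```python
from decimal import Decimal


def parse_minor(text: str, group_sep: str, decimal_sep: str, exponent: int) -> int:
    """Parse a locale-formatted amount into integer minor units, never via float."""
    cleaned = text.strip().replace(group_sep, "").replace(decimal_sep, ".")
    value = Decimal(cleaned)          # exact base-10 parse
    scaled = value.scaleb(exponent)   # major units -> minor units
    if scaled != scaled.to_integral_value():
        raise ValueError("more fractional digits than the currency allows")
    return int(scaled)
```

`parse_minor("1.234,56", ".", ",", 2)` yields `123456`, where `parseFloat` on the same string would have stopped at `1.234`.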
|
|
94
|
+
|
|
95
|
+
10. **Lock the invariants with property tests** (delegate suite structure to write-tests; assert these properties): allocation sums to the whole; round-trip format→parse is identity in canonical units; conversion+inverse stays within one minor unit; commutativity/associativity of addition in the same currency; no operation produces a fractional minor unit.
|
|
96
|
+
|
|
97
|
+
## Common Errors
|
|
98
|
+
|
|
99
|
+
- **`float`/`double` anywhere in the money path.** `0.1 + 0.2 != 0.3`; sums drift over many rows. Fix: integer minor units or a base-10 decimal type end to end.
|
|
100
|
+
- **Hardcoding exponent `2`.** Breaks JPY (0) and BHD/KWD (3) — `¥1234` becomes `¥12.34`. Fix: read the exponent from an ISO 4217 table.
|
|
101
|
+
- **Rounding intermediates.** Rounding each step before the final means errors accumulate. Fix: full precision through the calc, round exactly once at the output boundary.
|
|
102
|
+
- **Naïve split (`total/n`, round each).** `100/3 → 33.33×3 = 99.99`, a penny vanishes. Fix: largest-remainder allocation (step 4); assert `sum == total`.
|
|
103
|
+
- **Mixing currencies in one operation.** Adding USD to JPY silently yields garbage. Fix: type `Money` with currency; throw on mismatch.
|
|
104
|
+
- **Unspecified/mixed rounding mode.** Half-even in one place, half-up in another → reconciliation gaps. Fix: one documented mode, override only where a tax rule mandates.
|
|
105
|
+
- **Float JSON for amounts.** `12.34` parsed as a JSON number becomes the nearest binary double, not exactly 12.34, and the error surfaces once arithmetic touches it. Fix: serialize as string or integer minor units + currency.
|
|
106
|
+
- **`FLOAT`/`MONEY` SQL columns.** Lossy or locale-fragile storage. Fix: `NUMERIC(p,s)` or `BIGINT` minor units.
|
|
107
|
+
- **`parseFloat` on a formatted string.** `"1.234,56"` (de-DE) parses to `1.234`. Fix: locale-aware parse to canonical units before math.
|
|
108
|
+
- **Float epsilon comparison.** `abs(a-b) < 1e-9` for money means float leaked in. Fix: exact integer/decimal compare, including currency.
|
|
109
|
+
- **Reciprocal FX assumption.** Treating `EUR→USD` as exactly `1/(USD→EUR)` introduces drift. Fix: store/quote each direction; round only the final converted amount.
|
|
110
|
+
|
|
111
|
+
## Verify
|
|
112
|
+
|
|
113
|
+
1. **No float in the money path:** grep the diff — no `float`/`double`/`Number(`/`parseFloat`/`FLOAT`/`DOUBLE` on monetary values; types are minor-unit integers or a base-10 decimal. Schema columns are `NUMERIC`/`BIGINT`, not `FLOAT`/`MONEY`.
|
|
114
|
+
2. **Exponent correctness:** format `1234` minor units in USD→`$12.34`, JPY→`¥1234`, BHD→`1.234` — the exponent comes from the currency, not a constant.
|
|
115
|
+
3. **Allocation sums exactly:** property test `sum(allocate(total, ratios)) == total` for thousands of random totals and ratio vectors, including `total/3`, `/7`, zero ratios, and a single element. Zero penny leaks.
|
|
116
|
+
4. **Single rounding boundary:** a chained calc (price × qty × (1−discount) × (1+tax)) rounds once and equals a hand-computed full-precision-then-round figure; intermediates carry sub-minor precision.
|
|
117
|
+
5. **Tax reconciles:** sum of per-line taxes equals the documented invoice total under the chosen per-line/total rule; tax-inclusive extraction satisfies `net + tax == gross` exactly.
|
|
118
|
+
6. **FX round-trip bounded:** convert `A→B→A` for many amounts; result is within 1 minor unit of the original (rounding only, no drift), and a triangulated path rounds only the final leg.
|
|
119
|
+
7. **Equality is exact:** money equality/compare uses no epsilon and treats different currencies as unequal; tests assert `{1234,"USD"} != {1234,"JPY"}`.
|
|
120
|
+
8. **Serialization is lossless:** amounts cross JSON/DB boundaries as string or minor-unit integer + currency; a `12.34`-as-float anywhere fails the check.
|
|
121
|
+
9. **Format/parse identity:** for a set of locales, `parse(format(x)) == x` in canonical units.
|
|
122
|
+
|
|
123
|
+
Done = no binary float touches money anywhere, every currency uses its ISO 4217 exponent, allocation/tax/FX sum to the whole with zero penny leak, rounding mode is documented and applied exactly once at each boundary, and amounts are stored and serialized as exact (NUMERIC/minor-unit/string) values — all proven by the property tests in checks 3–9.
|
|
@@ -0,0 +1,109 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: monitor-ml-drift
|
|
3
|
+
description: Monitors a production ML model for input data drift, prediction drift, and performance decay against delayed labels — using PSI/KS/Chi-square drift tests, train/serve skew checks, alert thresholds, and scheduled-or-drift-triggered retraining with a champion/challenger loop — so a silently degrading model is caught before it costs.
|
|
4
|
+
when_to_use: A deployed model needs ongoing statistical health monitoring or has quietly degraded. Distinct from serve-deploy-ml-model (rollout/canary/autoscale), train-evaluate-ml-model (initial build + offline metrics), observability-instrument (service latency/error RED metrics), and validate-data-quality (rule assertions, not distribution shift).
|
|
5
|
+
---
|
|
6
|
+
|
|
7
|
+
## When to Use
|
|
8
|
+
|
|
9
|
+
Reach for this skill when the concern is **the model's statistical health in production**, not whether the service is up:
|
|
10
|
+
|
|
11
|
+
- "Accuracy looked fine at launch but the model feels worse now — is it drifting?"
|
|
12
|
+
- "Our feature distributions shifted (new user segment, seasonality, upstream schema change) — did the model degrade?"
|
|
13
|
+
- "Set up drift + performance monitoring and an alert when a retrain is warranted"
|
|
14
|
+
- "Labels arrive 2 weeks late — how do I track real accuracy/AUC over time?"
|
|
15
|
+
- "Detect train/serve skew — the model scores differently offline vs online on the same row"
|
|
16
|
+
- "Wire a champion/challenger so a candidate retrain only ships if it beats prod"
|
|
17
|
+
|
|
18
|
+
NOT this skill:
|
|
19
|
+
- Shipping/rolling out the model artifact, canary, autoscaling → serve-deploy-ml-model
|
|
20
|
+
- The original training run, offline eval, hyperparameter search, test-set metrics → train-evaluate-ml-model
|
|
21
|
+
- Service-level latency/error-rate/RED metrics, traces, dashboards, p99 alerts → observability-instrument
|
|
22
|
+
- Rule assertions on the data pipeline (not-null, unique, freshness, range) → validate-data-quality (drift is *distributional*; a column can pass every range rule and still have shifted its whole distribution)
|
|
23
|
+
|
|
24
|
+
## Steps
|
|
25
|
+
|
|
26
|
+
1. **Log every prediction as an immutable event: no logging = no monitoring.** Per request, write one row: `prediction_id`, `ts`, `model_version`, the **feature vector actually scored** (post-transform, exactly what the model saw), the output (`pred_proba` + `pred_label`), and a `label_join_key`. Land it in a columnar store (Parquet on S3, BigQuery, Delta). Labels arrive later out-of-band → write them to a separate table keyed by `label_join_key` and **left-join on arrival**; never block scoring on a label. Snapshot the **training reference** (a held-out slice of the training data + its predictions) once and pin it; every drift test compares live vs this fixed reference.
|
|
27
|
+
|
|
28
|
+
2. **Pick the drift test per feature type — do not PSI everything.**
|
|
29
|
+
|
|
30
|
+
| Signal | Test | Fires when | Default threshold |
|---|---|---|---|
| Numeric / continuous feature | **PSI** (population stability index) | Binned distribution shifted vs reference | PSI > 0.2 = significant; 0.1–0.2 = watch |
| Numeric, distribution shape | **KS** (Kolmogorov–Smirnov) 2-sample | Max CDF gap large | p < 0.05 |
| Categorical feature | **Chi-square** / PSI on category freqs | Category mix shifted, new/unseen level | p < 0.05 / PSI > 0.2 |
| Prediction output (proba) | **PSI / KS** on `pred_proba` | Output distribution drifts | PSI > 0.2 |
| Multivariate / overall | **Domain classifier** (ref vs live, AUC) | Classifier separates ref from live | AUC > 0.7 |
|
|
37
|
+
|
|
38
|
+
Compute over a **rolling window** (default: last 7 days or 10k preds, whichever is larger) vs the pinned reference. Use a fixed reference for stable populations; switch to a **trailing-window reference** only if the population legitimately evolves (and document that you've given up detecting slow drift). Apply **Bonferroni/BH correction** across features: with 200 features at p<0.05 you expect ~10 false alarms per run by chance alone.
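To make the PSI row concrete, here is a stdlib-only sketch of the statistic for one numeric feature. It is for understanding what the threshold measures, not a replacement for a library. It assumes equal-frequency bins cut at the reference quantiles and a small `eps` floor for empty bins; both are common conventions, not the only ones.

```python
import math
from bisect import bisect_right


def psi(reference, current, bins=10, eps=1e-6):
    """Population stability index of `current` vs `reference` for one numeric feature."""
    if not reference or not current:
        raise ValueError("both samples must be non-empty")
    ref_sorted = sorted(reference)
    # Interior cut points at reference quantiles (equal-frequency bins).
    cuts = sorted({ref_sorted[int(len(ref_sorted) * k / bins)] for k in range(1, bins)})

    def shares(values):
        counts = [0] * (len(cuts) + 1)
        for v in values:
            counts[bisect_right(cuts, v)] += 1
        # Floor empty bins at eps so the log term stays finite.
        return [max(c / len(values), eps) for c in counts]

    ref_p, cur_p = shares(reference), shares(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_p, cur_p))
```

A sample compared with itself scores exactly 0, and shifting every value by half the reference range pushes the score far past the 0.2 threshold.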
|
|
39
|
+
|
|
40
|
+
3. **Separate the three drift types — they mean different things and trigger different actions.**
|
|
41
|
+
- **Data (input) drift** — features moved. Model may still be fine; this is an *early warning*, not proof of decay. Page only if widespread.
|
|
42
|
+
- **Prediction drift** — output distribution moved without a known input cause → upstream feature pipeline broke, or real population shift. Higher signal than single-feature input drift.
|
|
43
|
+
- **Concept drift / performance decay** — the input→output relationship changed. **Only measurable once labels land.** This is the one that actually justifies a retrain. Track the real metric (AUC/F1/MAE — whatever you optimized) per cohort window vs a **baseline window** (e.g. first 2 weeks post-deploy, or last known-good).
|
|
44
|
+
|
|
45
|
+
4. **Run it with a library — don't hand-roll the stats.** Evidently for reports + tests, whylogs for lightweight profile logging at scale, NannyML for *estimating* performance **before** labels arrive (CBPE/DLE). Pin `evidently==0.4.*` and use its `Report` / `metric_preset` API:
|
|
46
|
+
|
|
47
|
+
```python
from evidently.report import Report
from evidently.metric_preset import DataDriftPreset, TargetDriftPreset
from evidently import ColumnMapping

# ref_df / live_df are pandas DataFrames; fire_alert is your own alerting hook (not part of Evidently)
cm = ColumnMapping(prediction="pred_proba")
report = Report(metrics=[
    DataDriftPreset(stattest="psi", stattest_threshold=0.2),  # per-feature input drift
    TargetDriftPreset(),                                      # prediction-column drift
])
report.run(reference_data=ref_df, current_data=live_df, column_mapping=cm)
res = report.as_dict()

drift = res["metrics"][0]["result"]           # DataDriftPreset summary
if drift["share_of_drifted_columns"] > 0.3:   # >30% of features drifted → alert
    fire_alert("data_drift", detail=drift)
```
|
|
64
|
+
|
|
65
|
+
For pre-label performance estimation when labels lag:
|
|
66
|
+
```python
import nannyml as nml
est = nml.CBPE(problem_type="classification_binary", y_pred="pred_label",
               y_pred_proba="pred_proba", y_true="label",
               metrics=["roc_auc"], chunk_size=5000)
est.fit(reference_df)               # reference must include matured labels
estimated = est.estimate(live_df)   # estimated AUC + confidence band, no live labels needed
```
|
|
74
|
+
|
|
75
|
+
5. **Detect train/serve skew explicitly — it's a silent killer.** Re-score a sample of logged production feature vectors through the **offline** model and assert `abs(online_proba − offline_proba) < 1e-4`. Mismatch = a transform diverged between training and serving (different encoder fit, a default-fill applied online only, version skew in a preprocessing lib). Also compare **training-time** feature distributions vs **serving-time** for the same feature: skew shows up as a step change at deploy, not a gradual drift. Run this nightly on a sample.
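The re-scoring assertion can be sketched as a small comparison helper. It assumes you have already produced two aligned lists of probabilities for the same logged rows (online as logged, offline as re-scored); `skew_report` is a hypothetical name.

```python
def skew_report(online, offline, tol=1e-4):
    """Compare logged online scores with offline re-scores of the same rows."""
    if len(online) != len(offline):
        raise ValueError("online and offline must cover the same rows")
    diffs = [abs(a - b) for a, b in zip(online, offline)]
    offenders = [i for i, d in enumerate(diffs) if d >= tol]
    return {
        "max_abs_diff": max(diffs, default=0.0),
        "offenders": offenders,   # row indexes whose scores diverged
        "ok": not offenders,
    }
```

Returning the offending row indexes, not just a pass/fail flag, makes it easier to trace which transform diverged.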
|
|
76
|
+
|
|
77
|
+
6. **Set thresholds and a retraining trigger — opinionated defaults, then tune to your false-alarm budget.**
|
|
78
|
+
- **Trigger retrain** when *any* holds: estimated/actual primary metric drops > **5% relative** below baseline for ≥2 consecutive windows; OR prediction-drift PSI > 0.2 sustained; OR > 30% of top-importance features drifted. One noisy window ≠ retrain — require **persistence** (2+ windows) to kill flapping.
|
|
79
|
+
- **Schedule** a baseline retrain regardless (weekly/monthly) so you never rely solely on drift detection catching it.
|
|
80
|
+
- On trigger, retrain a **challenger** and gate promotion through a champion/challenger comparison (step 7) — never auto-promote on a drift signal alone; drift can be benign.
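The persistence requirement on the performance trigger can be sketched as follows. It assumes a higher-is-better metric (AUC, F1) and a list of per-window values ordered oldest to newest; `should_retrain` is a hypothetical name, and the defaults mirror the thresholds above.

```python
def should_retrain(metric_by_window, baseline, rel_drop=0.05, persistence=2):
    """True only when the last `persistence` windows all sit more than
    `rel_drop` (relative) below the baseline. Assumes higher is better."""
    if len(metric_by_window) < persistence:
        return False
    floor = baseline * (1 - rel_drop)
    return all(m < floor for m in metric_by_window[-persistence:])
```

A single bad window returns `False`; the trigger fires only once the second consecutive window is also below the floor.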
|
|
81
|
+
|
|
82
|
+
7. **Champion/challenger before promotion.** Train challenger on fresh data, evaluate **both** on the same recent labeled window (and ideally a shadow/online split). Promote only if challenger beats champion on the primary metric by a margin **beyond noise** (bootstrap CI on the metric, or a paired test) — not a single point estimate. Log the decision + metrics to a model registry. Hand the actual rollout (canary, traffic shift, rollback) to **serve-deploy-ml-model**; this skill decides *whether*, that skill does *how*.
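One way to make "beyond noise" operational is a paired bootstrap on the same labeled rows. The sketch below assumes per-row correctness flags (1 = correct, 0 = wrong) for each model and promotes only if the lower confidence bound of the accuracy difference is above zero. The name `challenger_wins` and the defaults are illustrative choices; a metric such as AUC would need the metric recomputed per resample instead.

```python
import random


def challenger_wins(champ_correct, chall_correct, n_boot=2000, alpha=0.05, seed=0):
    """Paired bootstrap on per-row correctness for the same labeled rows.

    Returns (promote, lower_bound), where lower_bound is the alpha/2 quantile
    of the bootstrapped accuracy difference (challenger minus champion).
    """
    if len(champ_correct) != len(chall_correct) or not champ_correct:
        raise ValueError("need the same non-empty set of rows for both models")
    n = len(champ_correct)
    diffs = [b - a for a, b in zip(champ_correct, chall_correct)]
    rng = random.Random(seed)  # fixed seed keeps the decision reproducible
    boot = sorted(sum(rng.choices(diffs, k=n)) / n for _ in range(n_boot))
    lower = boot[int(n_boot * alpha / 2)]
    return lower > 0, lower
```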
|
|
83
|
+
|
|
84
|
+
8. **Alert routing, not just detection.** Page on **performance decay** and **prediction drift** (high signal). Send **input drift** to a dashboard/digest, not a pager — single-feature input drift is frequent and usually benign; paging on it trains everyone to ignore the channel. Every alert carries: which signal, which features/metric, the value vs threshold, the window, and a link to the drift report.
|
|
85
|
+
|
|
86
|
+
## Common Errors
|
|
87
|
+
|
|
88
|
+
- **Logging transformed-then-re-derived features instead of what the model scored.** You then compare a reconstruction, not reality, and miss real skew. Log the exact post-transform vector at inference time.
|
|
89
|
+
- **Reference set = the whole training data including the part the model trained on.** Leaks optimism. Use a **held-out** slice as reference.
|
|
90
|
+
- **PSI/KS run with no multiple-comparison correction.** 200 features × p<0.05 ≈ 10 false "drifts" every run → alert fatigue. Apply Bonferroni/BH and a `share_of_drifted_columns` gate, don't alert per feature.
|
|
91
|
+
- **Treating any data drift as "model is broken."** Features can shift while accuracy holds. Only **performance decay** (or prediction drift with a cause) justifies a retrain; input drift is a watch signal.
|
|
92
|
+
- **Computing "live accuracy" the moment predictions are made.** Labels are delayed — that number is empty until labels land. Use NannyML CBPE/DLE to *estimate* performance pre-label, and report actual metric only over windows whose labels have matured.
|
|
93
|
+
- **Joining labels to predictions on timestamp.** Late/duplicate/reordered labels corrupt the join. Join on a stable `label_join_key`, and bucket by **prediction** time, not label-arrival time.
|
|
94
|
+
- **Comparing windows of wildly different size.** PSI/KS are sensitive to n; a 200-row window vs a 50k reference flags noise as drift. Fix a minimum window size and equal-ish bins.
|
|
95
|
+
- **Fixed reference forever on a legitimately evolving population.** Everything reads as drift and the signal dies. Either accept slow drift goes undetected with a trailing reference, or re-baseline deliberately on each retrain — and write down which.
|
|
96
|
+
- **Auto-retrain + auto-promote on a single drift spike.** Promotes a worse model on a benign blip or a data outage. Require persistence (2+ windows) and a champion/challenger win beyond noise.
|
|
97
|
+
- **No train/serve skew check.** The most common production regression — an encoder/imputer that differs online — is invisible to distribution drift. Re-score logged rows offline and assert equality.
|
|
98
|
+
|
|
99
|
+
## Verify
|
|
100
|
+
|
|
101
|
+
1. **Inject a known input shift:** take a held-out reference, build a `current` where one numeric feature is multiplied (e.g. ×1.5) or a category's frequency is swapped → the per-feature drift test (PSI/KS) for *that* feature fires and the others stay green. Proves sensitivity *and* specificity.
|
|
102
|
+
2. **Inject prediction drift:** shift `pred_proba` for the current window → prediction-drift alert fires while input features are unchanged. Proves the output monitor is independent.
|
|
103
|
+
3. **Replay a known-degraded period:** feed a window whose labels you know are bad (mislabel a slice or use a historically-bad date range) → the performance tracker shows the metric dropping > 5% below baseline and the **retrain trigger** fires after the 2nd consecutive bad window (not the 1st).
|
|
104
|
+
4. **Negative control:** feed `current = reference` (resampled) → **no** alert fires. If a same-distribution sample trips an alert, your thresholds/correction are too tight.
|
|
105
|
+
5. **Skew check:** re-score a sample of logged prod vectors offline → `max|online − offline| < 1e-4`. Then deliberately break one transform and confirm the skew check catches it.
|
|
106
|
+
6. **Delayed-label join:** insert labels out of order / late → actual-metric windows recompute correctly keyed by prediction time, and pre-label estimated metric (CBPE) tracks the eventual actual within its confidence band.
|
|
107
|
+
7. **Champion/challenger gate:** feed a challenger that's worse on the recent window → promotion is **rejected**; feed one that's better beyond the CI → promotion is approved and logged to the registry.
|
|
108
|
+
|
|
109
|
+
Done = an injected input shift fires only the right feature's drift alert (negative control stays silent), prediction drift is detected independently, the performance tracker reflects the known-degraded period and trips the retrain trigger after sustained (not single-window) decay, train/serve skew is caught, and champion/challenger blocks a worse model from promoting.
|
|
@@ -0,0 +1,144 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: numeric-precision-units
|
|
3
|
+
description: Prevents numeric-precision and units defects by enforcing epsilon/ULP/relative float comparison, Kahan/Welford stable accumulation, NaN/Inf and div-by-zero guards, checked/saturating integer arithmetic, lossless int64/decimal transport across JSON/JS/DB boundaries, and explicit unit-typed conversions with consistent rounding.
|
|
4
|
+
when_to_use: Code does scientific/statistical math, accumulates many floats, compares floats with ==, converts units (metric/imperial, time, data sizes, angles), or moves large integers/decimals across JSON/JS/DB/language boundaries; or bugs involve flaky float equality, NaN/Inf, silent overflow, or lost int64→double precision. Distinct from money-decimal-arithmetic (monetary rounding/allocation correctness) and validate-data-quality (schema/null/range checks).
|
|
5
|
+
---
|
|
6
|
+
|
|
7
|
+
## When to Use
|
|
8
|
+
|
|
9
|
+
Reach for this skill when the defect is about **the number itself** — its representation, precision, or unit — not its monetary rounding, schema, or business validity:
|
|
10
|
+
|
|
11
|
+
- "These two floats are equal but `a == b` returns false" / flaky test on a computed total
|
|
12
|
+
- "Summing a million values gives a different answer depending on order"
|
|
13
|
+
- "Mean/variance is wrong / NaN on large or near-equal data"
|
|
14
|
+
- "My int64 ID comes back rounded after a round-trip through JSON/JS"
|
|
15
|
+
- "A big integer turned into `1.0000000000000002e18` in the browser / spreadsheet"
|
|
16
|
+
- "Counter wrapped to a negative number" / `i32` overflow / cast truncated a value
|
|
17
|
+
- "We mixed meters and feet / ms and seconds / radians and degrees / KB(1000) and KiB(1024)"
|
|
18
|
+
- Division by a value that can be zero; `0.1 + 0.2 != 0.3`; signed-zero or `-0.0` surprises
|
|
19
|
+
|
|
20
|
+
NOT this skill:
|
|
21
|
+
- Monetary correctness — cents/`Decimal`, banker's rounding, splitting a charge so it sums exactly, FX → **money-decimal-arithmetic** (this skill keeps money *out* of binary float and intact across boundaries; that one does the rounding/allocation math)
|
|
22
|
+
- Duration/clock precision, monotonic vs wall-clock, leap seconds, DST math → **datetime-timezone-correctness** (this skill stores time as an integer; that one interprets it)
|
|
23
|
+
- Checking a field is present / in range / right type at ingest → **validate-data-quality**
|
|
24
|
+
- Configuring the compiler/linter to forbid implicit numeric coercion (tsconfig, mypy, clippy) → **type-safety-strict**
|
|
25
|
+
- Choosing `NUMERIC` vs `BIGINT` column types for a *schema migration* → **db-migration-safety**
|
|
26
|
+
- Declaring the wire shape of a number in an API (string vs int64 in the contract) → **rest-graphql-contract**
|
|
27
|
+
- Writing the test harness/property-test scaffolding itself → **write-tests**
|
|
28
|
+
- Making a `SUM` aggregate query fast → **optimize-sql-query**
|
|
29
|
+
|
|
30
|
+
## Steps
|
|
31
|
+
|
|
32
|
+
1. **Decide the representation first — float is a default, not a law.**
|
|
33
|
+
|
|
34
|
+
| Domain | Use | Never |
|---|---|---|
| IDs, counts, timestamps (ns/ms) | integer (`int64`) | `double` (loses precision > 2^53) |
| Physical/scientific measurement | `float64` (`double`) | `float32` unless memory-bound and tolerance allows |
| Exact fractions / ratios | rational type or scaled integer | float |
| Probabilities, weights, signals | `float64` | `float32` |
| Money / currency | → defer to **money-decimal-arithmetic** | binary `float`/`double` |
|
|
41
|
+
|
|
42
|
+
Rule: if two values must compare *exactly equal*, they must not be binary floats.
|
|
43
|
+
|
|
44
|
+
2. **Never `==` floats. Pick the tolerance by scale.** Absolute epsilon fails for large magnitudes; relative fails near zero. Use a combined check:
|
|
45
|
+
|
|
46
|
+
```python
import math

def close(a, b, rel=1e-9, abs_tol=1e-12):
    if a == b:                       # exact match, including same-sign infinities
        return True
    if not (math.isfinite(a) and math.isfinite(b)):
        return False                 # NaN is never close; inf is close only to an equal inf
    return abs(a - b) <= max(rel * max(abs(a), abs(b)), abs_tol)
```
|
|
54
|
+
- Library defaults: Python `math.isclose(a, b, rel_tol=1e-9)`, NumPy `np.isclose`/`allclose`, Rust `approx::relative_eq!`, JS — write the above (no stdlib equivalent).
|
|
55
|
+
- **Near zero, relative tolerance collapses** — that's why `abs_tol` exists; set it to the smallest difference you consider zero in your domain.
|
|
56
|
+
- ULP comparison (`a` and `b` within N representable steps) only for low-level kernels where you control rounding mode — overkill for app code; use relative+abs.
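For completeness, ULP distance between two doubles can be computed by reinterpreting their bits as ordered integers. This is a sketch for the low-level case only; the helper names are hypothetical and NaN is rejected rather than ordered.

```python
import math
import struct


def _ordered_bits(x: float) -> int:
    """Map a double to an integer whose ordering matches the float ordering."""
    (bits,) = struct.unpack("<q", struct.pack("<d", x))
    # Negative floats have the sign bit set; mirror them below zero.
    return bits if bits >= 0 else -(bits & 0x7FFFFFFFFFFFFFFF)


def ulp_distance(a: float, b: float) -> int:
    """Number of representable doubles between a and b (0 means identical value)."""
    if math.isnan(a) or math.isnan(b):
        raise ValueError("ULP distance is undefined for NaN")
    return abs(_ordered_bits(a) - _ordered_bits(b))
```

`0.1 + 0.2` and `0.3` are exactly one step apart, which shows the classic inequality is a last-bit rounding difference.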
|
|
57
|
+
|
|
58
|
+
3. **Accumulate stably — order and algorithm change the answer.** Naive left-to-right `sum` accumulates rounding error ∝ n·ε and suffers **catastrophic cancellation** when subtracting near-equal large numbers.
|
|
59
|
+
- Summation: use **pairwise** summation (NumPy's `np.sum` uses it internally in most cases, though not on every axis or memory layout) or **Kahan/Neumaier compensated** summation for long running totals:
|
|
60
|
+
```python
def kahan_sum(xs):
    s = 0.0
    c = 0.0                # c = running compensation
    for x in xs:
        y = x - c
        t = s + y
        c = (t - s) - y
        s = t
    return s
```
|
|
70
|
+
- Mean/variance: **never** `sum(x²)/n - mean²` (cancellation → negative variance / NaN). Use **Welford's online** algorithm:
|
|
71
|
+
```python
def welford(data):
    n = 0
    mean = 0.0
    M2 = 0.0                       # running sum of squared deviations from the mean
    for x in data:
        n += 1
        d = x - mean
        mean += d / n
        M2 += d * (x - mean)
    if n == 0:
        raise ValueError("variance of an empty sequence is undefined")
    return mean, M2 / n            # population variance; M2 / (n - 1) for sample
```
|
|
79
|
+
- Sort ascending by magnitude before summing wildly different scales if you can't use compensated summation.
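The Neumaier variant mentioned above also survives the case where an incoming term is larger than the running sum, which plain Kahan does not. This is a sketch with a hypothetical function name; Python's built-in `math.fsum` is the usual choice when an exactly rounded sum is available to you.

```python
def neumaier_sum(xs):
    """Compensated summation that also handles terms larger than the running sum."""
    total = 0.0
    comp = 0.0  # accumulated low-order bits lost so far
    for x in xs:
        t = total + x
        if abs(total) >= abs(x):
            comp += (total - t) + x   # low-order bits of x were lost
        else:
            comp += (x - t) + total   # low-order bits of total were lost
        total = t
    return total + comp
```

On `[1.0, 1e100, 1.0, -1e100]` a naive left-to-right sum returns `0.0`, while the compensated version returns the true `2.0`.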
|
|
80
|
+
|
|
81
|
+
4. **Guard special values at the source, not three layers downstream.**
|
|
82
|
+
- Before any `/`: reject or branch on a zero/near-zero divisor (`if abs(d) < abs_tol: raise/return sentinel`). Float `x/0.0` yields `±inf`/`nan` *silently*; integer `/0` traps/UB.
|
|
83
|
+
- Treat `NaN`/`Inf` as poison: one `NaN` propagates through every subsequent op, and **every ordered or equality comparison with it is false** (including `nan == nan`; only `!=` returns true). Validate inputs with `math.isfinite(x)` at boundaries; assert finiteness on outputs.
|
|
84
|
+
- `-0.0 == 0.0` is true but `1/-0.0 == -inf` while `1/0.0 == +inf`; normalize with `x + 0.0` or `x == 0.0 ? 0.0 : x` when sign of zero leaks into results.
|
|
85
|
+
- `log`/`sqrt`/`acos` domain: clamp inputs (`acos(min(1, max(-1, x)))`) — rounding can push a value to `1.0000000002` and yield `NaN`.
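These guards can be sketched as two small wrappers. The tolerances are placeholder defaults to tune per domain, and `safe_div` / `safe_acos` are hypothetical names.

```python
import math


def safe_div(num: float, den: float, abs_tol: float = 1e-12) -> float:
    """Divide only when both operands are finite and the divisor is not ~0."""
    if not (math.isfinite(num) and math.isfinite(den)):
        raise ValueError("non-finite operand")
    if abs(den) < abs_tol:
        raise ZeroDivisionError("divisor is zero or below tolerance")
    return num / den


def safe_acos(x: float, slack: float = 1e-9) -> float:
    """acos that absorbs rounding overshoot but still rejects genuinely bad input."""
    if math.isnan(x) or abs(x) > 1 + slack:
        raise ValueError("acos argument outside [-1, 1] beyond rounding slack")
    return math.acos(min(1.0, max(-1.0, x)))
```

The clamp only forgives overshoot within `slack`; a value like `1.5` still raises, so a real bug is not silently mapped to `0`.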
|
|
86
|
+
|
|
87
|
+
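5. **Integer arithmetic: assume it overflows, prove it doesn't.** Fixed-width ints wrap (Go, Java, Rust release builds, C/C++ unsigned), are undefined behavior (C/C++ signed), or panic (Rust debug builds); Python ints grow without bound, and JS `Number` silently loses precision past 2^53 instead of wrapping. Choose explicit semantics: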
|
|
88
|
+
|
|
89
|
+
| Need | Rust | Go/C | Java | Generic |
|---|---|---|---|---|
| Detect overflow | `checked_add` → `Option` | Go: compare after op / `bits.Add64`; C: check before the op or `__builtin_add_overflow` (GCC/Clang) | `Math.addExact` (throws) | check before/after |
| Clamp at bound | `saturating_add` | manual `min`/`max` | manual | manual |
| Intentional wrap | `wrapping_add` | default `+` (unsigned only in C) | default `+` | mask |
|
|
94
|
+
- **Narrowing casts lose data silently**: `(int32)bigLong`, `i64 as i32`, `Number → Int32`. Range-check before narrowing; never cast a length/ID down.
|
|
95
|
+
- Multiplication overflows long before addition — `a*b` for two `int32` near 2^16 already wraps; widen to `int64` *before* multiplying.
|
|
96
|
+
- `INT_MIN / -1` and `-INT_MIN` overflow; `abs(INT_MIN)` is still negative.
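Python ints never overflow, so the three semantics in the table can be emulated explicitly, which is handy when Python code must match an `int64` column or a fixed-width peer. The helper names below are hypothetical.

```python
I64_MIN, I64_MAX = -(2**63), 2**63 - 1
I32_MIN, I32_MAX = -(2**31), 2**31 - 1


def checked_add(a: int, b: int) -> int:
    """Add, raising if the result would not fit in a signed 64-bit integer."""
    r = a + b
    if not I64_MIN <= r <= I64_MAX:
        raise OverflowError("int64 overflow")
    return r


def saturating_add(a: int, b: int) -> int:
    """Add, clamping the result to the int64 range."""
    return max(I64_MIN, min(I64_MAX, a + b))


def wrapping_add(a: int, b: int) -> int:
    """Add with two's-complement wraparound, as a 64-bit register would."""
    return (a + b - I64_MIN) % 2**64 + I64_MIN


def to_i32(x: int) -> int:
    """Range-checked narrowing: refuse instead of truncating."""
    if not I32_MIN <= x <= I32_MAX:
        raise OverflowError("value does not fit in int32")
    return x
```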
|
|
97
|
+
|
|
98
|
+
6. **Cross-boundary precision: the 2^53 trap is the #1 silent corruption.** JSON has one number type; JS `Number` is `float64` with exact integers only up to `2^53-1` (9007199254740991). Any `int64` above that **loses its low bits** the instant a JS/JSON parser touches it — no error.
|
|
99
|
+
- **Send large ints and exact decimals as JSON strings.** Contract: `{"id": "9223372036854775807", "ratio": "12345.6789"}`. Parse to `BigInt`/`int64`/`Decimal` explicitly on each side.
|
|
100
|
+
- Guard in JS: `Number.isSafeInteger(x)` before trusting any integer; use `BigInt` + a string-aware parser (`json-bigint`) when you can't change the wire format.
|
|
101
|
+
- DB: `DOUBLE`/`FLOAT` columns are binary, so IDs and exact decimals go in `NUMERIC(p,s)`/`DECIMAL`/`BIGINT`. Read them through a driver path that returns `Decimal`/string, not float (some drivers and ORMs return float for `NUMERIC` by default; check yours and configure it off).
|
|
102
|
+
- Language interop (protobuf/Thrift/FFI): `int64`→language `long`/`BigInt`, never `double`; protobuf JSON mapping *already* encodes `int64` as string — keep it.
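The 2^53 boundary is easy to demonstrate from Python, since a Python `float` is the same IEEE-754 double a JS `Number` uses. The helper names and the `id` field are illustrative assumptions.

```python
import json

MAX_SAFE_INTEGER = 2**53 - 1  # 9007199254740991, same bound as JS Number.MAX_SAFE_INTEGER


def survives_double(value: int) -> bool:
    """True if the integer round-trips through a float64 unchanged."""
    return int(float(value)) == value


def encode_id(value: int) -> str:
    # Ship the ID as a JSON string so no double-based parser can round it.
    return json.dumps({"id": str(value)})


def decode_id(payload: str) -> int:
    return int(json.loads(payload)["id"])
```

`survives_double(9007199254740993)` is `False` because the nearest double is `...992`, while the string-encoded path returns the ID unchanged even at `2**63 - 1`.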
|
|
103
|
+
|
|
104
|
+
7. **Units: make the unit part of the type or the name — no bare numbers cross a function boundary.**
|
|
105
|
+
- Suffix every quantity: `timeout_ms`, `dist_m`, `angle_rad`, `size_bytes`, `temp_c`. A parameter named `timeout` is a bug waiting to happen.
|
|
106
|
+
- Convert through one explicit factor table; round **once, at the conversion**, to the target's precision — don't let conversions compound. Conventions that bite:
|
|
107
|
+
|
|
108
|
+
| Domain | Trap | Rule |
|---|---|---|
| Data size | KB=1000 vs KiB=1024 | use IEC (`KiB/MiB`) for binary; label which |
| Angle | trig functions take **radians** | convert `deg * π/180` at the edge |
| Temperature | C/F is **affine** (offset), not a ratio | `F = C*9/5 + 32`, never `*9/5` alone |
| Time | ms vs s vs ns | store epoch as int ns/ms; never float seconds |
| Imperial | 1 mi = 1609.344 m (exact) | keep full-precision factors, round at end |
|
|
115
|
+
- Dimensional consistency: only add/subtract same-unit quantities; multiply/divide changes the dimension (m/s · s = m). If a units library exists (`pint`, `uom`, `js-quantities`), use it; otherwise centralize factors in one module and unit-test round-trips.
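When no units library is available, the "one factor table" advice can look like the sketch below. The table covers only a few length units for illustration, and the function names are hypothetical. The factors shown are the exact defined values.

```python
import math

# Single source of truth: every length unit expressed in meters.
TO_METERS = {"m": 1.0, "km": 1000.0, "ft": 0.3048, "mi": 1609.344}


def convert_length(value: float, src: str, dst: str) -> float:
    """Convert via the base unit; unknown units fail loudly with KeyError."""
    return value * TO_METERS[src] / TO_METERS[dst]


def celsius_to_fahrenheit(temp_c: float) -> float:
    # Affine conversion: scale AND offset.
    return temp_c * 9 / 5 + 32


def sin_deg(angle_deg: float) -> float:
    # Convert to radians at the edge; math.sin itself only understands radians.
    return math.sin(math.radians(angle_deg))
```

Unit-suffixed parameter names (`temp_c`, `angle_deg`) keep the unit visible at every call site, and routing all length conversions through meters avoids maintaining a pairwise factor matrix.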
|
|
116
|
+
|
|
117
|
+
## Common Errors
|
|
118
|
+
|
|
119
|
+
- **`if x == 0.1 + 0.2`** — false; `0.1+0.2 == 0.30000000000000004`. Use a tolerance compare (step 2).
|
|
120
|
+
- **`abs(a-b) < 1e-9` as a universal epsilon** — passes for tiny numbers, fails for `1e12`. Scale tolerance relatively (step 2).
|
|
121
|
+
- **`sum(sq)/n - mean**2` for variance** — catastrophic cancellation gives negative/NaN variance. Use Welford (step 3).
|
|
122
|
+
- **`JSON.parse` of `{"id": 9007199254740993}` in JS**: silently becomes `...992`. Send IDs as strings; guard with `Number.isSafeInteger`.
- **Storing an int64 ID in `FLOAT`/`double`** — loses the low bits above 2^53 on round-trip. Use `BIGINT`/`NUMERIC` and an integer/string driver path (step 6).
- **`(int64) (a * b)` with two int32s** — the multiply overflows in 32 bits before the widening cast even runs. Widen an operand to int64 before multiplying, e.g. `(int64) a * b` (step 5).
- **`while (n != target)` on a float loop counter** — may never hit `target` exactly; loop forever. Iterate with an integer index, compute the float.
- **`nan == nan` to detect NaN** — always false. Use `isnan`/`isFinite`; sort/min/max results with NaN present depend on element order or are left unspecified in many languages, so filter NaN out first.
- **`Math.sqrt(neg)` / `acos(1.0000001)`** — returns `NaN` from rounding overshoot. Clamp domain before the call (step 4).
- **Passing `deg` to `Math.sin`** — silently wrong, no error. Sin takes radians; convert at the boundary.
- **`°C → °F` as a pure scale (`*9/5`)** — drops the `+32` offset; temperature conversions are affine, not linear.
- **Mixing 1000- and 1024-based sizes** — "5 GB" disk vs "5 GiB" RAM differ by ~7%. Label and use IEC binary units.
- **Casting a `length`/`count`/`id` to a narrower int** — truncates above the bound with no error. Range-check or keep it wide.
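Three of the fixes named in the list above (tolerance compare, Welford variance, domain clamping) can be sketched as follows. This is an illustrative sketch only: the helper names `nearly_equal`, `welford_variance`, and `safe_acos` and the default tolerances are assumptions, not taken from the steps this list refers to.

```python
import math
from typing import Iterable


def nearly_equal(a: float, b: float,
                 rel_tol: float = 1e-9, abs_tol: float = 1e-12) -> bool:
    """Relative tolerance for large values, absolute floor for values near zero."""
    return math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol)


def welford_variance(values: Iterable[float]) -> float:
    """One-pass population variance (Welford's algorithm).

    Avoids subtracting two large, nearly equal numbers, so the result
    cannot go negative the way sum(sq)/n - mean**2 can.
    """
    count = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in values:
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)
    if count == 0:
        raise ValueError("variance of an empty sequence is undefined")
    return m2 / count


def safe_acos(x: float) -> float:
    """Clamp rounding overshoot into [-1, 1] before calling acos."""
    return math.acos(max(-1.0, min(1.0, x)))
```

The clamp in `safe_acos` is only appropriate when the input is known to be a cosine that drifted by rounding; a value such as `1.5` indicates a real bug and should be rejected upstream rather than clamped.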

## Verify

1. **Float equality:** every float comparison in the diff uses a tolerance helper (or compares a quantity that is provably integer/decimal). `grep -nE '==|!='` over float paths returns no bare float `==`.
2. **Accumulation:** sum a 1e6-element array forwards vs reversed vs Kahan — Kahan matches a higher-precision (`Decimal`/`float128`) reference within tolerance; naive may not. Variance of near-equal large values is ≥ 0 and finite (Welford), not NaN.
3. **Special values:** feed `0`, `-0.0`, `NaN`, `+Inf`, `-Inf`, and a near-zero divisor through each public function — none crash silently; divide-by-zero is rejected or returns a documented sentinel; outputs pass an `isfinite` assertion.
4. **Integer bounds:** test at `MAX`, `MAX-1`, `MIN`, `0`, `MIN/-1` for every fixed-width arithmetic op — overflow is detected/saturated/intentionally-wrapped per the chosen semantics, never an undocumented wrap. Narrowing casts reject or clamp out-of-range input.
5. **Boundary round-trip:** serialize the value `9223372036854775807` (`int64` max) and a `12345.6789` decimal to JSON, parse on the other side (especially JS) → byte-identical value restored. `Number.isSafeInteger` is checked on any JS integer path.
6. **DB round-trip:** write `NUMERIC(38,9)` max-precision values and an int64-max ID, read back → equal as `Decimal`/integer/string (not float-coerced). No ID or exact-decimal column is `FLOAT`/`DOUBLE`.
7. **Units:** every round-trip conversion (`m→ft→m`, `C→F→C`, `KiB→bytes→KiB`) returns the original within rounding tolerance; trig inputs are radians; affine conversions keep their offset; mismatched-unit add/subtract is impossible (typed) or covered by a failing-on-mix test.
8. **Property tests:** generators include extremes (`±MAX`, `±0.0`, `NaN`, `Inf`, subnormals, near-epsilon pairs, overflow boundaries) — not just typical mid-range values.

Done = no bare float `==`, no ID/exact-decimal in binary float, every cross-boundary int64/decimal survives a JSON/JS/DB round-trip bit-for-bit, all fixed-width integer ops have defined overflow behavior, divide-by-zero and NaN/Inf are guarded at the boundary, and every unit-bearing quantity is named/typed with its unit and round-trips through conversion within tolerance.
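Checks 2 (accumulation) and 5 (boundary round-trip) from the Verify list can be sketched as small Python helpers. Everything here is a hedged illustration: `naive_sum`, `kahan_sum`, `encode_exact`, `decode_exact`, and `is_safe_integer` are hypothetical names, `math.fsum` stands in for the higher-precision reference, and `JS_MAX_SAFE_INTEGER` mirrors the bound that `Number.isSafeInteger` enforces in JS.

```python
import json
import math
from decimal import Decimal
from typing import Iterable, Tuple

INT64_MAX = 2**63 - 1
JS_MAX_SAFE_INTEGER = 2**53 - 1  # same bound as Number.MAX_SAFE_INTEGER


def naive_sum(values: Iterable[float]) -> float:
    """Plain left-to-right accumulation; rounding error grows with length."""
    total = 0.0
    for x in values:
        total += x
    return total


def kahan_sum(values: Iterable[float]) -> float:
    """Compensated (Kahan) summation: carry the lost low-order bits forward."""
    total = 0.0
    compensation = 0.0
    for x in values:
        y = x - compensation
        t = total + y
        compensation = (t - total) - y  # what was rounded away in total + y
        total = t
    return total


def encode_exact(record_id: int, amount: Decimal) -> str:
    """Send int64 IDs and decimals as JSON strings so no parser coerces them to double."""
    return json.dumps({"id": str(record_id), "amount": str(amount)})


def decode_exact(payload: str) -> Tuple[int, Decimal]:
    """Parse the string fields back into exact integer and decimal types."""
    obj = json.loads(payload)
    return int(obj["id"]), Decimal(obj["amount"])


def is_safe_integer(n: int) -> bool:
    """True if n would survive a trip through an IEEE-754 double unchanged."""
    return abs(n) <= JS_MAX_SAFE_INTEGER
```

A test can then compare `kahan_sum(data)` against `math.fsum(data)` within a tight tolerance, and assert that `decode_exact(encode_exact(...))` returns exactly what went in. The hand-written `naive_sum` is used for the comparison because a language's built-in sum may already apply compensation.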
|