sanook-cli 0.4.0 → 0.5.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (235)

package/.env.example +19 -0
package/CHANGELOG.md +144 -0
package/README.md +153 -20
package/README.th.md +136 -0
package/dist/agentContext.js +4 -0
package/dist/approval.js +6 -0
package/dist/bin.js +394 -51
package/dist/brain.js +92 -59
package/dist/brand.js +47 -0
package/dist/checkpoint.js +37 -0
package/dist/commands.js +86 -6
package/dist/compaction.js +76 -5
package/dist/config.js +100 -12
package/dist/cost.js +60 -3
package/dist/doctor.js +92 -0
package/dist/gateway/auth.js +2 -2
package/dist/gateway/ledger.js +2 -2
package/dist/gateway/scheduler.js +1 -0
package/dist/gateway/serve.js +6 -4
package/dist/gateway/server.js +10 -2
package/dist/git.js +11 -2
package/dist/hooks.js +43 -17
package/dist/knowledge.js +48 -49
package/dist/loop.js +182 -66
package/dist/lsp/client.js +173 -0
package/dist/lsp/framing.js +56 -0
package/dist/lsp/index.js +138 -0
package/dist/lsp/servers.js +82 -0
package/dist/mcp-server.js +244 -0
package/dist/mcp.js +184 -29
package/dist/memory-store.js +559 -0
package/dist/memory.js +143 -29
package/dist/orchestrate.js +150 -0
package/dist/providers/codex.js +2 -2
package/dist/providers/keys.js +3 -2
package/dist/providers/registry.js +133 -1
package/dist/repomap.js +93 -0
package/dist/search/chunk.js +158 -0
package/dist/search/embed-store.js +187 -0
package/dist/search/engine.js +203 -0
package/dist/search/fuse.js +35 -0
package/dist/search/index-core.js +187 -0
package/dist/search/indexer.js +241 -0
package/dist/search/store.js +77 -0
package/dist/session.js +42 -8
package/dist/skill-install.js +10 -10
package/dist/skills.js +12 -9
package/dist/summarize.js +31 -0
package/dist/tools/bash.js +21 -2
package/dist/tools/diagnostics.js +41 -0
package/dist/tools/edit.js +29 -7
package/dist/tools/index.js +8 -1
package/dist/tools/list.js +7 -2
package/dist/tools/permission.js +90 -9
package/dist/tools/read.js +23 -4
package/dist/tools/remember.js +1 -1
package/dist/tools/sandbox.js +61 -0
package/dist/tools/search.js +105 -4
package/dist/tools/task.js +195 -29
package/dist/tools/timeout.js +35 -0
package/dist/tools/util.js +10 -0
package/dist/tools/write.js +6 -4
package/dist/trust.js +89 -0
package/dist/ui/app.js +218 -27
package/dist/ui/banner.js +4 -9
package/dist/ui/history.js +30 -0
package/dist/ui/mentions.js +44 -0
package/dist/ui/setup.js +6 -5
package/dist/ui/useEditor.js +83 -0
package/dist/update.js +114 -0
package/dist/worktree.js +173 -0
package/package.json +11 -5
package/scripts/postinstall.mjs +33 -0
package/second-brain/.agents/_Index.md +30 -0
package/second-brain/.agents/skills/_Index.md +30 -0
package/second-brain/.agents/workflows/_Index.md +30 -0
package/second-brain/AGENTS.md +4 -4
package/second-brain/Acceptance/_Index.md +30 -0
package/second-brain/Acceptance/golden-case-template.md +39 -0
package/second-brain/Areas/_Index.md +30 -0
package/second-brain/Bugs/System-OS/_Index.md +30 -0
package/second-brain/Bugs/_Index.md +30 -0
package/second-brain/CLAUDE.md +4 -1
package/second-brain/Checklists/_Index.md +30 -0
package/second-brain/Checklists/preflight-postflight-template.md +29 -0
package/second-brain/Distillations/_Index.md +30 -0
package/second-brain/Entities/_Index.md +30 -0
package/second-brain/Entities/entity-template.md +33 -0
package/second-brain/Evals/_Index.md +30 -0
package/second-brain/Evals/correction-pairs.md +24 -0
package/second-brain/Evals/failure-taxonomy.md +24 -0
package/second-brain/Evals/golden-set.md +25 -0
package/second-brain/Evals/quality-ledger.md +23 -0
package/second-brain/Evals/self-eval-rubric.md +23 -0
package/second-brain/GEMINI.md +4 -4
package/second-brain/Goals/_Index.md +30 -0
package/second-brain/Handoffs/_Index.md +30 -0
package/second-brain/Home.md +7 -0
package/second-brain/Intake/Raw Sources/_Index.md +30 -0
package/second-brain/Intake/_Index.md +30 -0
package/second-brain/Intake/_Quarantine/_Index.md +30 -0
package/second-brain/Learning/_Index.md +30 -0
package/second-brain/Playbooks/_Index.md +30 -0
package/second-brain/Playbooks/playbook-template.md +23 -0
package/second-brain/Projects/_Index.md +30 -0
package/second-brain/Prompts/_Index.md +30 -0
package/second-brain/README.md +2 -1
package/second-brain/Research/_Index.md +30 -0
package/second-brain/Retrospectives/_Index.md +30 -0
package/second-brain/Reviews/_Index.md +30 -0
package/second-brain/Runbooks/_Index.md +30 -0
package/second-brain/Runbooks/eval-loop.md +24 -0
package/second-brain/Sessions/_Index.md +30 -0
package/second-brain/Shared/AI-Context-Index.md +20 -0
package/second-brain/Shared/AI-Threads/_Index.md +30 -0
package/second-brain/Shared/Archive/_Index.md +30 -0
package/second-brain/Shared/Assets/_Index.md +30 -0
package/second-brain/Shared/Context-Packs/_Index.md +30 -0
package/second-brain/Shared/Context7-Docs/_Index.md +30 -0
package/second-brain/Shared/Coordination/NOW.md +28 -0
package/second-brain/Shared/Coordination/_Index.md +30 -0
package/second-brain/Shared/Coordination/agent-registry.md +24 -0
package/second-brain/Shared/Coordination/task-board/_Index.md +30 -0
package/second-brain/Shared/Coordination/task-board/task-template.md +43 -0
package/second-brain/Shared/Coordination/task-board.md +32 -0
package/second-brain/Shared/Core-Facts/_Index.md +30 -0
package/second-brain/Shared/Decision-Memory/_Index.md +30 -0
package/second-brain/Shared/Glossary/_Index.md +30 -0
package/second-brain/Shared/Memory-Inbox/_Index.md +30 -0
package/second-brain/Shared/Operating-State/_Index.md +30 -0
package/second-brain/Shared/Prompting/_Index.md +30 -0
package/second-brain/Shared/Provenance/_Index.md +30 -0
package/second-brain/Shared/Rules/_Index.md +30 -0
package/second-brain/Shared/Rules/contextual-note-rule.md +30 -0
package/second-brain/Shared/Rules/frontmatter-standard.md +10 -0
package/second-brain/Shared/Rules/memory-write-protocol.md +28 -0
package/second-brain/Shared/Rules/procedural-runbook-header.md +40 -0
package/second-brain/Shared/Rules/review-and-staleness-policy.md +22 -0
package/second-brain/Shared/Rules/rules-formatting.md +34 -0
package/second-brain/Shared/Scripts/_Index.md +30 -0
package/second-brain/Shared/Scripts-Archive/_Index.md +30 -0
package/second-brain/Shared/Tech-Standards/_Index.md +30 -0
package/second-brain/Shared/Tech-Standards/verification-standard.md +40 -0
package/second-brain/Shared/User-Memory/_Index.md +30 -0
package/second-brain/Shared/User-Persona/_Index.md +30 -0
package/second-brain/Shared/User-Persona/owner-profile.md +25 -0
package/second-brain/Shared/Working-Memory/_Index.md +30 -0
package/second-brain/Shared/_Index.md +30 -0
package/second-brain/Shared/mcp-servers/_Index.md +30 -0
package/second-brain/Skills/_Index.md +30 -0
package/second-brain/Templates/_Index.md +30 -0
package/second-brain/Templates/bug.md +2 -0
package/second-brain/Templates/handoff.md +2 -0
package/second-brain/Templates/session.md +2 -0
package/second-brain/Tools/_Index.md +30 -0
package/second-brain/Traces/_Index.md +30 -0
package/second-brain/Vault Structure Map.md +33 -1
package/second-brain/copilot/_Index.md +30 -0
package/skills/audit-license-compliance/SKILL.md +117 -0
package/skills/author-codemod/SKILL.md +110 -0
package/skills/build-audit-logging/SKILL.md +112 -0
package/skills/build-cdc-streaming-pipeline/SKILL.md +123 -0
package/skills/build-cli-tool/SKILL.md +108 -0
package/skills/build-data-table/SKILL.md +141 -0
package/skills/build-native-mobile-ui/SKILL.md +154 -0
package/skills/build-offline-first-sync/SKILL.md +118 -0
package/skills/build-realtime-channel/SKILL.md +122 -0
package/skills/build-vector-search/SKILL.md +131 -0
package/skills/compose-local-dev-stack/SKILL.md +149 -0
package/skills/configure-bundler-build/SKILL.md +166 -0
package/skills/configure-dns-tls/SKILL.md +142 -0
package/skills/configure-reverse-proxy-lb/SKILL.md +129 -0
package/skills/configure-security-headers-csp/SKILL.md +122 -0
package/skills/contract-testing/SKILL.md +140 -0
package/skills/datetime-timezone-correctness/SKILL.md +125 -0
package/skills/debug-ci-pipeline-failure/SKILL.md +134 -0
package/skills/debug-flaky-tests/SKILL.md +128 -0
package/skills/defend-llm-prompt-injection/SKILL.md +110 -0
package/skills/deliver-webhooks/SKILL.md +116 -0
package/skills/design-api-pagination/SKILL.md +144 -0
package/skills/design-authorization-model/SKILL.md +119 -0
package/skills/design-backup-dr-recovery/SKILL.md +113 -0
package/skills/design-event-sourcing-cqrs/SKILL.md +143 -0
package/skills/design-multi-tenancy/SKILL.md +100 -0
package/skills/design-protobuf-grpc-service/SKILL.md +146 -0
package/skills/design-relational-schema/SKILL.md +129 -0
package/skills/design-search-index-infra/SKILL.md +151 -0
package/skills/design-state-machine/SKILL.md +108 -0
package/skills/design-token-system/SKILL.md +109 -0
package/skills/distributed-locks-leases/SKILL.md +120 -0
package/skills/encrypt-sensitive-data/SKILL.md +148 -0
package/skills/feature-flags-rollout/SKILL.md +130 -0
package/skills/file-upload-object-storage/SKILL.md +107 -0
package/skills/fuzz-dynamic-security-test/SKILL.md +111 -0
package/skills/harden-llm-app-reliability/SKILL.md +126 -0
package/skills/i18n-localization-setup/SKILL.md +113 -0
package/skills/idempotency-keys/SKILL.md +107 -0
package/skills/implement-push-notifications/SKILL.md +142 -0
package/skills/ingest-webhook-secure/SKILL.md +120 -0
package/skills/integrate-oauth-oidc/SKILL.md +126 -0
package/skills/load-stress-test/SKILL.md +129 -0
package/skills/map-privacy-data-gdpr/SKILL.md +146 -0
package/skills/model-nosql-data/SKILL.md +118 -0
package/skills/money-decimal-arithmetic/SKILL.md +123 -0
package/skills/monitor-ml-drift/SKILL.md +109 -0
package/skills/numeric-precision-units/SKILL.md +144 -0
package/skills/optimize-llm-cost-latency/SKILL.md +103 -0
package/skills/optimize-react-rerenders/SKILL.md +124 -0
package/skills/orchestrate-agent-workflow/SKILL.md +100 -0
package/skills/payments-billing-integration/SKILL.md +114 -0
package/skills/pin-toolchain-versions/SKILL.md +116 -0
package/skills/plan-strangler-migration/SKILL.md +95 -0
package/skills/property-based-testing/SKILL.md +108 -0
package/skills/publish-package-registry/SKILL.md +130 -0
package/skills/recover-git-state/SKILL.md +119 -0
package/skills/remediate-web-vulnerabilities/SKILL.md +125 -0
package/skills/resilience-timeouts-retries/SKILL.md +104 -0
package/skills/resolve-merge-rebase-conflict/SKILL.md +97 -0
package/skills/rewrite-git-history/SKILL.md +109 -0
package/skills/scaffold-cross-platform-app/SKILL.md +137 -0
package/skills/schema-evolution-compatibility/SKILL.md +121 -0
package/skills/send-transactional-email/SKILL.md +126 -0
package/skills/serve-deploy-ml-model/SKILL.md +107 -0
package/skills/setup-cdn-edge-waf/SKILL.md +107 -0
package/skills/setup-devcontainer-env/SKILL.md +131 -0
package/skills/setup-lint-format-precommit/SKILL.md +140 -0
package/skills/setup-monorepo-tooling/SKILL.md +125 -0
package/skills/ship-mobile-app-store-release/SKILL.md +137 -0
package/skills/structured-output-llm/SKILL.md +86 -0
package/skills/supply-chain-sbom-provenance/SKILL.md +120 -0
package/skills/test-data-factories/SKILL.md +158 -0
package/skills/threat-model-stride/SKILL.md +123 -0
package/skills/train-evaluate-ml-model/SKILL.md +109 -0
package/skills/unicode-text-correctness/SKILL.md +109 -0
package/skills/visual-regression-testing/SKILL.md +120 -0

package/skills/map-privacy-data-gdpr/SKILL.md ADDED

@@ -0,0 +1,146 @@
+---
+name: map-privacy-data-gdpr
+description: Implements privacy/data-protection engineering — personal-data inventory/mapping (RoPA), lawful-basis and versioned consent capture, DSAR machine-readable export and right-to-erasure cascades across derived data/logs/backups, TTL/scheduled retention purge, and PII minimization/pseudonymization — for GDPR/CCPA-style compliance.
+when_to_use: A product stores personal data and needs consent capture, data export, deletion/erasure, or retention controls. Distinct from auth-jwt-session and design-authorization-model (who may access), build-audit-logging (tamper-evident action trail), and security-review (vulnerability audit).
+---
+## When to Use
+Reach for this skill when the requirement is **what happens to a person's data**, not who may touch it:
+- "A user requested their data / build a DSAR export endpoint"
+- "Implement account deletion / right to be forgotten / erasure that actually removes them everywhere"
+- "Record consent for marketing/analytics — granular, withdrawable, with proof"
+- "We keep PII forever — add retention limits / a purge job"
+- "Map where personal data lives for our DPIA / Article 30 record of processing"
+- "Stop collecting/storing PII we don't need; pseudonymize the rest"
+NOT this skill:
+- Logging in users, sessions, OAuth, refresh rotation → **auth-jwt-session**
+- Deciding which role/tenant may read a record (RBAC/ABAC/row scoping) → **design-authorization-model**
+- The tamper-evident trail of *who did what* (including who ran an erasure) → **build-audit-logging**
+- Hunting injection/SSRF/access-control bugs in changed code → **security-review**
+- Design-level threat enumeration over a system handling PII → **threat-model-stride**
+- Backup/PITR/retention *of the datastore itself* (RPO/RTO, WAL archiving) → **design-backup-dr-recovery**
+## Steps
+1. **Build the data inventory first — you cannot delete or export what you haven't mapped.** Produce a machine-readable record of processing (RoPA, GDPR Art. 30). One row per (data element × store). Drive everything downstream — export, erasure, retention — off this file, not off tribal knowledge.
+   ```yaml
+   # privacy/inventory.yaml — source of truth for DSAR + purge + DPIA
+   - element: email
+     category: contact          # contact | identifier | special-category | behavioral | financial
+     store: postgres.users.email
+     purpose: account login, transactional mail
+     lawful_basis: contract     # see step 2 table
+     retention: account_lifetime_plus_30d
+     subject_key: users.id
+     export: true
+     erase: anonymize           # delete | anonymize | retain-with-basis
+   - element: ip_address
+     category: identifier
+     store: postgres.request_logs.ip
+     purpose: fraud/abuse detection
+     lawful_basis: legitimate_interest
+     retention: 90d
+     subject_key: request_logs.user_id
+     export: true
+     erase: delete
+   - element: clickstream
+     category: behavioral
+     store: bigquery.analytics.events
+     purpose: product analytics
+     lawful_basis: consent
+     retention: 14m
+     subject_key: events.user_pseudo_id    # NOT the raw user id — see step 6
+     export: true
+     erase: delete
+   ```
+   Enumerate **every** store: primary DB, replicas, search index (Elasticsearch), caches (Redis), object storage (S3 uploads), data warehouse, application logs, third-party processors (Stripe, Segment, Intercom, Sentry), and **backups**. A store missing from this file is a store your erasure silently skips — that is the #1 compliance gap.
+2. **Pin a lawful basis to every element before you collect it.** No basis = you may not process it. Pick the *narrowest* basis that fits and record it in the inventory; consent is the weakest because it's revocable and must be audited.
+   | Lawful basis (GDPR Art. 6) | Use for | On erasure request |
+   |---|---|---|
+   | **Consent** | marketing, non-essential analytics, optional cookies | must delete; consent withdrawable any time |
+   | **Contract** | login email, order/shipping data, billing | retain while contract active, then purge |
+   | **Legal obligation** | tax invoices, AML/KYC records | **keep** for statutory period; refuse erasure with reason |
+   | **Legitimate interest** | fraud/abuse, basic security logs | keep if LIA balancing holds; honor objection |
+   | **Vital / public interest** | rare; safety-of-life | document case-by-case |
+   Default new fields to **no collection** until a basis is assigned. CCPA framing differs (opt-out of "sale/share", not opt-in consent) — model both as flags on the same consent record.
+3. **Capture consent as granular, versioned, withdrawable, time-stamped records — never a single boolean.** A `marketing_opt_in = true` column proves nothing and can't show *which* policy version they agreed to. Append-only consent ledger:
+   ```sql
+   CREATE TABLE consent_records (
+     id             bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
+     subject_id     uuid NOT NULL,
+     purpose        text NOT NULL,      -- 'marketing_email','analytics','third_party_share'
+     granted        boolean NOT NULL,   -- false row = explicit withdrawal
+     policy_version text NOT NULL,      -- 'privacy-policy@2026-03-01'
+     source         text NOT NULL,      -- 'signup_form','preference_center','cookie_banner'
+     evidence       jsonb NOT NULL,     -- {ip, user_agent, banner_choice_ids}
+     created_at     timestamptz NOT NULL DEFAULT now()
+   );
+   CREATE INDEX ON consent_records (subject_id, purpose, created_at DESC);
+   -- current state = latest row per (subject_id, purpose); NEVER UPDATE/DELETE rows
+   ```
+   Withdrawal is a new `granted=false` row, not a mutation — you must be able to prove the full history. Check consent at point of use (`SELECT … WHERE subject_id=$1 AND purpose=$2 ORDER BY created_at DESC LIMIT 1`), and gate consent-based processing on it. Cookie banner: no non-essential cookies/tags fire before a `granted=true` row exists.
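The append-only rule in step 3 can be illustrated with a minimal sketch. It assumes an in-memory SQLite table as a stand-in for the PostgreSQL `consent_records` table (fewer columns, SQLite types), and the helper names `record_consent` and `has_consent` are hypothetical. It orders by `id` rather than `created_at` only to avoid timestamp ties in a toy example.

```python
import sqlite3

# Hypothetical in-memory stand-in for the consent_records ledger.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE consent_records (
           id INTEGER PRIMARY KEY AUTOINCREMENT,
           subject_id TEXT NOT NULL,
           purpose TEXT NOT NULL,
           granted INTEGER NOT NULL,
           policy_version TEXT NOT NULL,
           source TEXT NOT NULL
       )"""
)

def record_consent(subject_id, purpose, granted, policy_version, source):
    """Append a row; grants and withdrawals are both inserts, never updates."""
    conn.execute(
        "INSERT INTO consent_records"
        " (subject_id, purpose, granted, policy_version, source)"
        " VALUES (?, ?, ?, ?, ?)",
        (subject_id, purpose, int(granted), policy_version, source),
    )

def has_consent(subject_id, purpose):
    """Current state is the latest row per (subject, purpose); no row means no consent."""
    row = conn.execute(
        "SELECT granted FROM consent_records"
        " WHERE subject_id = ? AND purpose = ?"
        " ORDER BY id DESC LIMIT 1",
        (subject_id, purpose),
    ).fetchone()
    return bool(row[0]) if row else False

# Grant, then withdraw: two rows remain, and the latest one wins.
record_consent("u1", "marketing_email", True, "privacy-policy@2026-03-01", "signup_form")
record_consent("u1", "marketing_email", False, "privacy-policy@2026-03-01", "preference_center")
history_rows = conn.execute("SELECT COUNT(*) FROM consent_records").fetchone()[0]
```

Because the withdrawal is a second row, the earlier grant stays available as evidence while `has_consent` already returns `False` for the purpose.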
+4. **DSAR export: assemble a complete, machine-readable package keyed off the inventory.** Iterate every row with `export: true`, pull the subject's data by `subject_key`, and emit structured JSON (GDPR Art. 20 portability requires a structured, commonly used, machine-readable format). Include data from processors via their APIs. Authenticate the requester *hard* (re-auth + verification), because handing one user's data to another is itself a breach. Respond within the statutory window (GDPR: one month, extendable by two further months for complex or numerous requests; CCPA: 45 days, extendable once by another 45). Don't include other people's PII caught in the subject's rows (e.g. recipient emails); redact third parties.
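A rough sketch of an inventory-driven export follows. The `INVENTORY` rows, the `STORES` dictionaries, and the `build_dsar_export` helper are all hypothetical stand-ins; real code would load the inventory file and query each store by its `subject_key`.

```python
import json

# Hypothetical inventory rows and in-memory stores, for illustration only.
INVENTORY = [
    {"element": "email", "store": "users", "export": True},
    {"element": "ip_address", "store": "request_logs", "export": True},
    {"element": "internal_risk_score", "store": "risk", "export": False},
]
STORES = {
    "users": {"u1": ["ada@example.com"], "u2": ["bob@example.com"]},
    "request_logs": {"u1": ["203.0.113.7"]},
    "risk": {"u1": [0.12]},
}

def build_dsar_export(subject_id):
    """Collect only this subject's values for every exportable inventory row."""
    package = {"subject_id": subject_id, "data": {}}
    for row in INVENTORY:
        if not row["export"]:
            continue
        # Look up by subject id only, so other subjects' rows never enter the package.
        values = STORES.get(row["store"], {}).get(subject_id, [])
        package["data"][row["element"]] = values
    return json.dumps(package, indent=2, sort_keys=True)

export_json = build_dsar_export("u1")
```

Driving the loop from the inventory means a newly inventoried store is exported without touching the export code.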
+5. **Right-to-erasure must cascade to every store, including derived data, caches, logs, and backups — or document the backup-expiry path.** A `DELETE FROM users WHERE id=…` that leaves the subject in the search index, Redis cache, analytics warehouse, and last night's snapshot is **not** erasure. Drive a cascade from the inventory's `erase:` column:
+   ```python
+   def erase_subject(subject_id):
+       for row in inventory:                      # one source of truth
+           store = connect(row.store)
+           match row.erase:
+               case "delete":    store.delete(row, subject_id)
+               case "anonymize": store.anonymize(row, subject_id)   # see step 6
+               case "retain-with-basis":
+                   log_retained(subject_id, row, reason=row.lawful_basis)  # legal hold
+       invalidate_cache(subject_id)               # Redis keys, CDN signed URLs
+       delete_from_search_index(subject_id)       # Elasticsearch/OpenSearch
+       enqueue_processor_deletions(subject_id)    # Stripe, Segment, Intercom, Sentry APIs
+       add_to_suppression_list(subject_id)        # tombstone — see below
+       record_erasure(subject_id)                 # write to audit log (build-audit-logging)
+   ```
+   **Backups can't be selectively edited** — the defensible approach is: (a) restrict restored-backup access, (b) re-apply the erasure to any data restored from backup, and (c) let backups age out under a bounded retention (e.g. 35 days), documented in your privacy policy. Maintain a **suppression/tombstone list** so a restore or a late-arriving event for an erased subject is re-deleted, not resurrected. Erasure under a legal-hold basis (tax/AML) is refused with a recorded reason, not silently ignored.
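One way to picture the suppression list is the sketch below. It is an assumption-laden toy: the stores are in-memory, the names `erase_and_suppress` and `ingest` are hypothetical, and storing tombstones as keyed hashes (so the list does not itself hold raw identifiers) is a design choice of this sketch rather than something the skill prescribes.

```python
import hashlib
import hmac

# Hypothetical secret and in-memory stores, for illustration only.
SUPPRESSION_KEY = b"replace-with-a-managed-secret"
suppressed = set()   # tombstones: keyed hashes of erased subject ids
events = []          # stand-in for a store that receives late or restored rows

def tombstone(subject_id):
    """Keyed hash of the subject id, used as the suppression-list entry."""
    return hmac.new(SUPPRESSION_KEY, subject_id.encode("utf-8"), hashlib.sha256).hexdigest()

def erase_and_suppress(subject_id):
    """Remove the subject's rows and remember the erasure as a tombstone."""
    suppressed.add(tombstone(subject_id))
    events[:] = [e for e in events if e["subject_id"] != subject_id]

def ingest(event):
    """Drop late-arriving or restored rows for erased subjects instead of resurrecting them."""
    if tombstone(event["subject_id"]) in suppressed:
        return False
    events.append(event)
    return True

first = ingest({"subject_id": "u1", "type": "click"})     # accepted
erase_and_suppress("u1")
late = ingest({"subject_id": "u1", "type": "click"})      # rejected: subject was erased
other = ingest({"subject_id": "u2", "type": "click"})     # unaffected subject
```

The same `ingest` check would run over rows replayed from a restored backup.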
+6. **Minimize and pseudonymize: the cheapest data to protect is data you don't hold.** Per element ask: do we *need* it? Don't collect optional PII "just in case". Replace direct identifiers with a per-subject pseudonym key in analytics/derived stores so erasing the key map breaks linkage (`erase: anonymize` then becomes deleting the mapping, not rewriting a warehouse). True anonymization (irreversible, no re-identification via combination) takes data **out of GDPR scope**; prefer it for analytics/ML training sets. Tokenize or keyed-hash (HMAC with a secret key, not a mere salt) and delete the mapping or key; **never** use a plain unsalted hash you call "anonymized". Drop or coarsen quasi-identifiers (full IP → /24, exact DOB → birth year) where the purpose allows.
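The pseudonymization and coarsening moves in step 6 might look roughly like this, using only the Python standard library. The function names are hypothetical, and key management (where the HMAC key lives and how it is destroyed) is deliberately left out of the sketch.

```python
import hashlib
import hmac
import ipaddress
from datetime import date

def pseudonymize(user_id: str, key: bytes) -> str:
    """Keyed hash: stable for one key, unlinkable once the key is destroyed."""
    return hmac.new(key, user_id.encode("utf-8"), hashlib.sha256).hexdigest()

def truncate_ip(ip: str) -> str:
    """Generalize an IPv4 address to its /24 network."""
    return str(ipaddress.ip_network(f"{ip}/24", strict=False))

def birth_year(dob_iso: str) -> int:
    """Keep only the year of an ISO-8601 date of birth."""
    return date.fromisoformat(dob_iso).year

pseudo_a = pseudonymize("u1", b"key-one")
pseudo_b = pseudonymize("u1", b"key-two")
network = truncate_ip("203.0.113.77")
year = birth_year("1990-04-12")
```

Unlike a plain SHA-256, the keyed hash cannot be reversed by hashing a dictionary of known emails or ids unless the attacker also holds the key.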
+7. **Enforce retention with TTL or scheduled purge jobs — retention written in a policy but not enforced in code is a fiction.** Translate each `retention:` value into a real mechanism:
+   - Native TTL where the store has one: MongoDB TTL index, DynamoDB TTL attribute, Redis `EXPIRE`, BigQuery partition expiration, S3 lifecycle rules, Elasticsearch ILM.
+   - A scheduled job (cron / Airflow / pg_cron) that `DELETE`s rows past `created_at + retention` for stores without TTL — run daily, log counts purged, alert on zero-purged-when-expected.
+   Logs and analytics are the usual offenders (PII-laden, retained forever). Cap them explicitly.
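For stores without native TTL, a scheduled purge could be sketched as below. It assumes an in-memory SQLite table named `request_logs` with ISO-8601 UTC timestamps in `created_at`; the `purge_expired` helper is hypothetical, and the table name is expected to come from the trusted inventory, never from user input.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

def purge_expired(conn, table, retention_days, now=None):
    """Delete rows older than the retention window and return how many were purged."""
    now = now or datetime.now(timezone.utc)
    cutoff = (now - timedelta(days=retention_days)).isoformat()
    # ISO-8601 strings in one timezone compare correctly as text.
    cur = conn.execute(f"DELETE FROM {table} WHERE created_at < ?", (cutoff,))
    conn.commit()
    return cur.rowcount

# Hypothetical seed data: one row past a 90-day window, one inside it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE request_logs (ip TEXT, created_at TEXT)")
conn.executemany(
    "INSERT INTO request_logs VALUES (?, ?)",
    [
        ("203.0.113.7", datetime(2026, 1, 1, tzinfo=timezone.utc).isoformat()),
        ("203.0.113.8", datetime(2026, 6, 1, tzinfo=timezone.utc).isoformat()),
    ],
)
fixed_now = datetime(2026, 6, 15, tzinfo=timezone.utc)
purged = purge_expired(conn, "request_logs", retention_days=90, now=fixed_now)
remaining = conn.execute("SELECT COUNT(*) FROM request_logs").fetchone()[0]
```

Returning the count is what makes the "log counts purged, alert on zero-purged-when-expected" check possible in the scheduler that calls this.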
+8. **Document cross-border transfers and processor agreements.** Any element flowing to a processor outside the data's region needs a transfer mechanism (SCCs / adequacy decision) and a signed DPA. List sub-processors. Record this alongside the inventory — auditors ask for it, and a new third-party integration that isn't in the list is an unmapped data egress.
+## Common Errors
+- **Erasure that hits the primary DB only.** The subject survives in the search index, cache, warehouse, logs, and backups. Drive deletion from the inventory across *every* store; assert absence afterward.
+- **Consent as one boolean column.** Can't prove which policy version, when, or how; an `UPDATE` erases the prior state. Use an append-only versioned ledger; withdrawal is a new row.
+- **No suppression list.** A backup restore or a delayed event re-creates an erased subject ("data resurrection"). Keep a tombstone list and re-apply erasure on restore/ingest.
+- **Reversible "anonymization".** A plain SHA-256 of an email is re-identifiable by dictionary attack — still personal data, still in scope. Keyed-hash/tokenize and delete the mapping, or aggregate so individuals can't be singled out.
+- **Treating backups as out of scope entirely.** Ignoring them fails erasure; trying to surgically edit them corrupts them. Use bounded retention + restore-time re-erasure, and document it.
+- **Weak DSAR identity check.** Emailing an export to whoever asks lets an attacker harvest a victim's data. Re-authenticate and verify before export or erasure.
+- **Retention policy with no enforcement job.** "We keep logs 90 days" while the table grows unbounded. Wire a TTL or a scheduled purge and verify it actually runs.
+- **Logging full PII.** Request/error logs capturing emails, tokens, full bodies become an uncontrolled PII store with infinite retention. Redact at the logger; set log retention.
+- **Forgetting processors.** Deleting locally but leaving the subject in Stripe/Segment/Intercom/Sentry. Call each processor's deletion API as part of the cascade.
+- **Hardcoding the store list in deletion code instead of the inventory.** A new table added without touching the deletion code is silently skipped forever. Single source of truth, generated cascade.
+## Verify
+- **Erasure completeness:** run `erase_subject(id)`, then query **every** store in the inventory for that subject's `subject_key` → zero rows (or only anonymized/legal-hold-retained rows with a recorded reason). This is the test that catches the forgotten store; automate it per store.
+- **Resurrection resistance:** restore a backup taken before an erasure (or replay a late event) → the subject is re-suppressed, not present. Suppression list is consulted on restore/ingest.
+- **Export completeness:** for a seeded subject with data in N stores, the DSAR package contains all N, is valid JSON/machine-readable, and contains **no other** subject's PII.
+- **Consent gate:** withdrawing consent (new `granted=false` row) stops the gated processing on the next check; granted/withdrawn history is fully reconstructable; no non-essential tag fires before consent.
+- **Retention enforcement:** advance a record past its retention window (or wait/seed) → the TTL/purge job removes it on the next scheduled run; purge job logs counts and alerts on anomalies.
+- **Lawful basis coverage:** every element in the inventory has a basis; legal-obligation elements correctly *survive* an erasure request with a recorded reason.
+- **Minimization:** no element is collected/stored without a row in the inventory; derived/analytics stores key on a pseudonym, not the raw identifier.
+- **Transfers:** every cross-border element maps to a transfer mechanism + DPA; processor list matches actual integrations.
+Done = an erasure request provably removes or anonymizes the subject across every inventoried store (with backups handled by bounded retention + restore-time re-erasure and a suppression list), the DSAR export is complete and machine-readable with no third-party PII leakage, consent is versioned/withdrawable with reconstructable history, and every retention window is enforced by a TTL or scheduled purge that demonstrably runs.

package/skills/model-nosql-data/SKILL.md ADDED

@@ -0,0 +1,118 @@
+---
+name: model-nosql-data
+description: Models data for document, key-value, and wide-column stores access-pattern-first — enumerates queries, then picks partition/sort keys for even distribution, chooses embed-vs-reference per relationship, lays out single-table/aggregate items, denormalizes deliberately with a fan-out path, and avoids hot partitions.
+when_to_use: When the datastore is non-relational (DynamoDB, MongoDB, Cassandra/ScyllaDB, Firestore, Bigtable) and you must shape items/documents/rows around queries — picking partition keys, embed vs reference, a single-table model, or a wide-column primary key — before writing the data layer. Distinct from design-relational-schema (normalized tables + joins) and optimize-sql-query (tunes an existing relational query); caching-strategy is a read cache in front of any store, not the store's own model.
+---
+## When to Use
+Reach for this skill when you must **shape the store around its queries**, before any table/collection exists:
+- "Design the DynamoDB table(s) for this service"
+- "Should this be embedded or a separate collection in MongoDB?"
+- "Pick the partition key and clustering columns for this Cassandra table"
+- "We're getting hot partitions / throttling on one key — fix the key design"
+- "Model a many-to-many (users↔teams, products↔orders) in a store with no joins"
+- "Firestore/Bigtable layout for a feed/timeline read"
+NOT this skill:
+- A normalized schema with joins in a **relational** DB (entities, 3NF, FK/CHECK) → design-relational-schema
+- A slow query against an existing **relational** schema → optimize-sql-query
+- A read cache (TTL, invalidation, stampede) **in front of** the store → caching-strategy
+- Schema *change* safety / locks / rollback on a live table → db-migration-safety
+- Append-only event streams + projections as the system of record → design-event-sourcing-cqrs
+- Background work / queue semantics → message-queue-jobs
+## Steps
+1. **Enumerate every access pattern first — this drives 100% of the design. No keys until this table is full.** One row per *operation*, reads and writes. A pattern you forget becomes a full scan in prod.
+   | Pattern | R/W | Args (known at call time) | Result shape | Freq | Latency target | Selectivity |
+   |---|---|---|---|---|---|---|
+   | Get user by id | R | userId | 1 item | very high | <10ms | 1 |
+   | List orders for user, newest first | R | userId, limit | N items, sorted | high | <20ms | bounded ~100s |
+   | Get order + its line items | R | orderId | 1+M items | high | <20ms | bounded |
+   | Create order (+ items, + user counter) | W | order, items | — | med | <30ms | multi-item |
+   Rule: **you can only query by what you have in hand.** Every read's *Args* column must become a key or index prefix in step 3. If an Arg isn't a key, that read is a scan — reject the model.
+2. **Confirm store-family fit before modeling.** Don't model a graph in a KV store.
+   | Family | Pick when | Avoid when | Examples |
+   |---|---|---|---|
+   | **Document** | nested aggregate read/written as a unit; flexible fields; secondary indexes needed | heavy cross-doc joins; huge fan-out updates of shared data | MongoDB, Firestore |
+   | **Wide-column** | massive write volume; time-series/feeds; query = known partition + range scan | ad-hoc queries on non-key columns; multi-key transactions | Cassandra, ScyllaDB, Bigtable |
+   | **Key-value / single-table** | every access is by a designed key; you want one round trip per pattern | analytics / unpredictable query shapes | DynamoDB single-table, Redis-as-primary |
+   Default for an app backend with a fixed, known pattern set: **document store** unless write volume or strict single-digit-ms-at-scale forces wide-column/DynamoDB.
+3. **Design keys for distribution (partition) and range (sort) — distribution is non-negotiable.**
+   - **Partition/hash key** = which physical shard. Choose **high-cardinality, evenly-requested** values: `userId`, `tenantId#userId`, `deviceId`. **Never** a low-cardinality or monotonic value (`status`, `country`, `true/false`, a date, an auto-increment) as the sole partition key — that is the #1 hot-partition cause.
+   - **Sort/clustering key** = order *within* a partition and enables range/`begins_with` queries: `createdAt`, `ORDER#<ts>`, `<type>#<id>`. Compose it to serve range reads: items sorted newest-first, prefix-filterable.
+   - DynamoDB single-table generic keys: `PK` / `SK` plus overloaded `GSI1PK`/`GSI1SK`. Encode the entity type in the key, not a separate column:
+     ```
+     PK              SK                   GSI1PK        GSI1SK
+     USER#u1         PROFILE              —             —              # user profile
+     USER#u1         ORDER#2026-06-15#o9  ORDER#o9      STATUS#PAID    # order under user, + lookup by order
+     ORDER#o9        ITEM#li1             —             —              # line item under order
+     ```
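+     `Query(PK=USER#u1, SK begins_with ORDER#)` → that user's orders in one round trip, no scan. The date-prefixed sort key returns them oldest-first by default; pass `ScanIndexForward=false` for newest-first.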
+   - **Bound every partition.** If items-per-partition grows without limit (all events under `PK=GLOBAL`, a celebrity's followers under one key), shard it: suffix `#<bucket 0..N-1>` (write-sharding) and scatter-gather on read, or split the partition by time (`PK=feed#u1#2026-06`).
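Write-sharding can be sketched as follows. The bucket count, the hashing choice, and the helper names `sharded_pk` and `all_shard_pks` are assumptions for illustration; the point is only that writes spread deterministically over N suffixed partition keys and reads fan out over all of them.

```python
import hashlib

NUM_BUCKETS = 8  # hypothetical; size it to the write rate of the hot key

def sharded_pk(base_pk: str, item_id: str, num_buckets: int = NUM_BUCKETS) -> str:
    """Pick a stable bucket from the item id so writes spread across partitions."""
    digest = hashlib.sha256(item_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % num_buckets
    return f"{base_pk}#{bucket}"

def all_shard_pks(base_pk: str, num_buckets: int = NUM_BUCKETS) -> list:
    """Partition keys a reader must query and merge (scatter-gather)."""
    return [f"{base_pk}#{b}" for b in range(num_buckets)]

write_key = sharded_pk("GLOBAL", "evt-1")
read_keys = all_shard_pks("GLOBAL")
```

A stable hash (rather than a random suffix) keeps the bucket for a given item reproducible, so a single item can still be fetched without scattering.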
+4. **Embed vs reference — decide per relationship, default to embed for read-together.**
+   | Embed (nest inside parent doc/item) | Reference (separate doc/item + id) |
+   |---|---|
+   | Read together almost always | Accessed independently / by other parents |
+   | Child count **bounded & small** (≤ dozens) | Unbounded or large child set |
+   | Child owned by exactly one parent | Shared across many parents |
+   | Updated together / rarely | High-churn child, low-churn parent (write amplification) |
+   | Total well under the item-size limit | Would blow the size limit |
+   **Hard ceilings — model to them, not near them:** DynamoDB item **400 KB**; MongoDB document **16 MB**; Cassandra partition keep **< 100 MB / < 100k rows**. Embedding an unbounded array (comments, events, followers) eventually hits the ceiling and turns every append into a full-doc rewrite — **reference those.** Embed `address`, `lineItems` of one order; reference `comments`, `auditEvents`, `members`.
+5. **Model M:N and secondary lookups as items/indexes — there are no joins.**
+   - **Composite sort key** for one-to-many under a parent: `SK = ORDER#<ts>#<id>`, query by `begins_with(ORDER#)`.
+   - **Adjacency list** for M:N in single-table: both directions are items. `PK=USER#u1, SK=TEAM#t1` *and* `PK=TEAM#t1, SK=USER#u1`. Query a user's teams by `PK=USER#u1, begins_with(TEAM#)`; a team's users by the mirror.
+   - **GSI / inverted index** to query by a non-key attribute: project `email` into `GSI1PK=EMAIL#<email>` to "get user by email." Each new *read* pattern that doesn't fit the base key = one GSI (DynamoDB) or one secondary index (Mongo/Cassandra), not a scan.
+   - Keep GSIs few and purposeful — each is a full extra copy of projected attributes (storage + write cost on every base write).
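The adjacency-list pattern from step 5 can be sketched with an in-memory stand-in for the table. This is an illustration of the key design only, not a DynamoDB client: `table`, `put`, `add_membership`, and `query_begins_with` are hypothetical names, and a real implementation would write the two mirror items in one transaction.

```python
from bisect import bisect_left

# Sorted list of (PK, SK) pairs, mimicking items ordered by sort key within a partition.
table: list[tuple[str, str]] = []


def put(pk: str, sk: str) -> None:
    """Insert an item, keeping the list sorted and free of duplicates."""
    item = (pk, sk)
    i = bisect_left(table, item)
    if i == len(table) or table[i] != item:
        table.insert(i, item)


def add_membership(user_id: str, team_id: str) -> None:
    """Write both directions of the M:N edge as separate items."""
    put(f"USER#{user_id}", f"TEAM#{team_id}")
    put(f"TEAM#{team_id}", f"USER#{user_id}")


def query_begins_with(pk: str, sk_prefix: str) -> list[str]:
    """Return sort keys under pk that start with sk_prefix, in sort-key order."""
    i = bisect_left(table, (pk, sk_prefix))
    out = []
    while i < len(table) and table[i][0] == pk and table[i][1].startswith(sk_prefix):
        out.append(table[i][1])
        i += 1
    return out
```

Both lookups are a single key-range read; neither direction needs a scan or a join.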
+6. **Denormalize deliberately, and write the path that keeps the copies consistent.** Duplicating data (order carries `userName`, post carries `authorAvatar`) is correct NoSQL — *if* you own the fan-out:
+   - **Single write touches multiple items** → use a **transaction** (DynamoDB `TransactWriteItems` ≤ 100 items; Mongo multi-doc txn) so the copies commit atomically.
+   - **One source → many copies** (author renames → update 10k posts) → **async fan-out** via a stream (DynamoDB Streams / CDC / outbox), never a synchronous loop on the write path.
+   - **Tolerate brief drift** → store a source-of-truth pointer and run **async repair**/reconciliation; never let two copies both claim authority.
+   Pick one strategy per duplicated field and write it down — silent divergence is the failure mode.
+7. **Time-ordering, TTL, and blob offload.**
+   - Newest-first reads: make the sort key descending-friendly (`ORDER#<reverse-ts>` or query with `ScanIndexForward=false`); never client-side sort a scan.
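+   - Expiring data (sessions, OTPs, ephemeral feeds): set native **TTL** (DynamoDB TTL attribute, Mongo TTL index, Cassandra per-write or table-default TTL) instead of running a delete cron. DynamoDB and Mongo delete expired items in the background, so reads can still see them briefly; filter on the expiry attribute when that matters.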
+   - **Large/binary payloads** (images, PDFs, >100 KB blobs): store in object storage (S3/GCS) and keep only the **key/URL + metadata** in the item. Inlining blobs burns item-size budget and read throughput.
+8. **Prove no full scans: map each access pattern to exactly one key/index path.** Re-walk the step-1 table; for every row write the resolved access: `Query(PK=…, SK …)` / `GetItem` / `Query(GSI1, …)`. If any row resolves to `Scan` or "filter after fetch on a non-key attribute," the model is incomplete — add a key, GSI, or duplicated item and repeat. A pattern with no index path is a bug, not a tradeoff.
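One way to make step 8 mechanical is a small check over the step-1 table. This is a sketch under an assumption: access paths are recorded as plain strings that start with the operation name, and `unresolved_patterns` is a hypothetical helper.

```python
ALLOWED_PREFIXES = ("GetItem", "Query")  # anything else (Scan, post-fetch filter) fails


def unresolved_patterns(access_paths: dict[str, str]) -> list[str]:
    """Return the access patterns whose resolved path is not a key/index read."""
    return [
        name
        for name, path in access_paths.items()
        if not path.startswith(ALLOWED_PREFIXES)
    ]


paths = {
    "get user by id": "GetItem(PK=USER#<id>, SK=PROFILE)",
    "orders for user": "Query(PK=USER#<id>, SK begins_with ORDER#)",
    "orders by status": "Scan + FilterExpression(status)",
}
```

Here `unresolved_patterns(paths)` flags "orders by status", which signals that a GSI or duplicated item is still missing.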
+## Common Errors
+- **Designing keys before listing access patterns.** Guarantees a missing query path discovered in prod as a scan. Fill the step-1 table first, always.
+- **Low-cardinality or monotonic partition key** (`status`, `date`, `true`, auto-increment id). Concentrates traffic on one shard → hot partition + throttling. Use a high-cardinality, evenly-hit key; write-shard or time-bucket if forced.
+- **Embedding an unbounded array** (comments/events/followers inside the parent). Hits the 400 KB / 16 MB ceiling and makes every append rewrite the whole doc. Reference it as child items.
+- **Modeling relational then "adding NoSQL on top."** Normalized tables + app-side joins = N+1 round trips and scans. Model the aggregate the query needs, even if it duplicates data.
+- **A `Filter`/`$match` on a non-key field mistaken for a query.** DynamoDB applies `FilterExpression` *after* the `Query`/`Scan` has read the items, so every item read is billed even if it is filtered out; a Mongo filter on an un-indexed field forces a collection scan. Make the filter field a key/index prefix.
+- **Denormalized copy with no fan-out path.** `userName` cached in 10k orders, never updated on rename → permanent stale data. Define transaction / stream fan-out / async repair per duplicated field.
+- **One GSI per attribute "just in case."** Each index is a full write-amplifying copy. Add a GSI only for an actual read pattern from step 1.
+- **Synchronous fan-out on the write path** (loop updating thousands of copies in the request). Latency spikes and partial failures. Offload to a stream/queue.
+- **Unbounded partition growth** (all rows under one `PK`, a whale tenant). Wide-column partition > ~100 MB degrades; DynamoDB throttles the key. Bucket by time or write-shard with a suffix.
+- **Blobs inline in the item.** Caps how many items fit per read and wastes throughput. Offload to object storage, keep a pointer.
+## Verify
+1. **Coverage:** every step-1 access pattern maps to exactly one `Get`/`Query`/index path; **zero** resolve to `Scan` or post-fetch filter on a non-key field.
+2. **Distribution:** the partition key is high-cardinality and request-even; no sole partition key is a status/boolean/date/sequence. Estimate items & bytes per hottest partition — under the family ceiling (DynamoDB ~10 GB/partition soft, Cassandra < 100 MB, Mongo doc < 16 MB, DynamoDB item < 400 KB).
+3. **Range reads** return already-sorted (sort/clustering key does the ordering); no client-side sort over a fetched set.
+4. **Embed/reference** justified per relationship against the step-4 table; no unbounded array embedded; largest realistic item stays well under the limit.
+5. **M:N & secondary lookups** each have an explicit path (adjacency item pair, composite SK, or GSI) — confirm both directions of every M:N.
+6. **Each denormalized field** names its consistency mechanism (transaction / stream fan-out / async repair); none has two authoritative copies.
+7. **Throughput sim:** project read+write units (or ops/s) per partition under peak from step-1 frequencies; confirm no single key exceeds the per-partition limit (DynamoDB: 3,000 RCU and 1,000 WCU per second per partition); write-shard/time-bucket where it does.
+8. **TTL** set on every ephemeral entity; **blobs > ~100 KB** offloaded to object storage with only a pointer stored.
+Done = every access pattern resolves to a single non-scan key/index path, no partition key is hot or unbounded under projected peak load, every embedded relationship is bounded under the item-size limit, and every denormalized copy has a named write-path keeping it consistent.

package/skills/money-decimal-arithmetic/SKILL.md ADDED

@@ -0,0 +1,123 @@
+---
+name: money-decimal-arithmetic
+description: Implements correct monetary and decimal arithmetic using integer minor units or arbitrary-precision decimals — per-currency exponents (ISO 4217), explicit rounding modes (banker's vs half-up), largest-remainder allocation that sums exactly, FX triangulation, NUMERIC storage, and locale-aware formatting — to eliminate float drift and off-by-a-penny totals.
+when_to_use: Code does financial math — prices, totals, tax/VAT, discounts, interest, invoicing, splitting a charge across line items, multi-currency conversion, or rounding to cents; money is stored as float or summed with ad-hoc Math.round; totals are off by a cent; or you're choosing a money/decimal type (BigDecimal, Python decimal, dinero.js, rust_decimal). Distinct from numeric-precision-units (general float/units correctness, not currency-exponent/allocation/FX rules) and payments-billing-integration (drives PSP charge/subscription state, then calls this skill for the totals).
+---
+## When to Use
+Reach for this skill when the bug or task is about **numeric correctness of money**, not throughput, schema, or float/units in general:
+- "Total is off by a cent" / "tax doesn't add up to the sum of line items"
+- "Split this $100 charge across 3 items / refund proportionally"
+- "Store and compute prices, discounts, VAT, interest, invoice rounding"
+- "Convert USD→JPY→EUR, what rate precision and rounding?"
+- "We store amounts as `float`/`DOUBLE` — is that safe?" (no)
+- Choosing a money/decimal type: `BigDecimal`, Python `decimal.Decimal`, `dinero.js`, `rust_decimal`, Joda-Money-style money libs
+NOT this skill:
+- General float pitfalls (epsilon/ULP compare, Kahan summation, NaN/Inf guards) or non-money unit conversion (metric/imperial, data sizes, angles) → numeric-precision-units (this skill is the money-specific specialization: ISO 4217 exponents, allocation, FX)
+- Integrating a PSP, idempotent charges, subscription/proration, payment webhooks → payments-billing-integration (it owns billing *state*; it calls this skill for the rounding/allocation/FX math)
+- Making a slow aggregate query fast → optimize-sql-query
+- Detecting nulls/outliers/dupes in a dataset → validate-data-quality
+- Picking column types / running a safe ALTER on a money column → db-migration-safety
+- Reshaping/cleaning numeric columns in a dataframe → wrangle-tabular-data
+- Serialization contract for an API field's type → rest-graphql-contract
+- Writing the property tests themselves as a test suite → write-tests (this skill specifies *which* invariants; that one structures the suite)
+## Steps
+1. **Never use binary float for money. Pick the representation by language, not by habit.** Floats can't represent `0.10` exactly, so `0.1 + 0.2 === 0.30000000000000004` and `0.1 * 3 !== 0.3`. Two safe representations:
+   | Representation | What it is | Use when | Watch out |
+   |---|---|---|---|
+   | **Integer minor units** | store cents as `int`/`bigint` (`$12.34` → `1234`) | default for fixed exponent, transactional ledgers, money over the wire | must track currency to know the exponent; intermediate math (interest, %) still needs a decimal/round step |
+   | **Arbitrary-precision decimal** | base-10 type: Python `Decimal`, Java `BigDecimal`, .NET `decimal`, `rust_decimal`, JS `decimal.js`/`big.js` | rates, %, interest, tax with sub-cent intermediates, accounting needing >2 dp | set a context/precision; still must round to currency exponent at the boundary |
+   Per language: **JS/TS** → `dinero.js` v2 or `big.js` (never `Number`); **Python** → `decimal.Decimal` (never `float`); **Java/Kotlin** → `BigDecimal` (never `double`); **.NET** → `decimal`; **Rust** → `rust_decimal::Decimal`; **Go** → `int64` minor units or `shopspring/decimal`; **Postgres** → `NUMERIC` (never `FLOAT`/`REAL`/`DOUBLE`).
+2. **Model amount + currency as one value; respect the per-currency exponent.** A bare number is not money — `100` is meaningless without a currency, and the exponent varies by ISO 4217:
+   | Currency | Exponent (minor digits) | `1.00` unit = |
+   |---|---|---|
+   | USD, EUR, GBP | 2 | 100 cents / pence |
+   | JPY, KRW, CLP | **0** | 1 (no minor unit) |
+   | BHD, KWD, TND | **3** | 1000 fils / millimes |
+   ```ts
+   type Money = { amount: bigint; currency: string }; // amount in MINOR units
+   // $12.34 → { amount: 1234n, currency: "USD" }  exponent 2
+   // ¥1234  → { amount: 1234n, currency: "JPY" }  exponent 0
+   // 1.234 BD → { amount: 1234n, currency: "BHD" } exponent 3
+   ```
+   Reject any binary op on two `Money` of different currencies — throw, don't coerce. Drive the exponent from an ISO 4217 table, never hardcode `2`.
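A minimal Python sketch of the same value type, assuming a small hand-written excerpt of the ISO 4217 exponent table (`ISO_4217_EXPONENT` is illustrative; load the full table in real code). It shows the two rules above: cross-currency arithmetic throws, and the exponent is looked up rather than hardcoded.

```python
from dataclasses import dataclass

# Illustrative excerpt only, not the full ISO 4217 table.
ISO_4217_EXPONENT = {"USD": 2, "EUR": 2, "GBP": 2, "JPY": 0, "KRW": 0, "BHD": 3, "KWD": 3}


@dataclass(frozen=True)
class Money:
    amount: int    # always in MINOR units
    currency: str  # ISO 4217 code

    def __post_init__(self) -> None:
        if self.currency not in ISO_4217_EXPONENT:
            raise ValueError(f"unknown currency: {self.currency}")

    def __add__(self, other: "Money") -> "Money":
        if not isinstance(other, Money):
            return NotImplemented
        if other.currency != self.currency:
            raise ValueError(f"currency mismatch: {self.currency} vs {other.currency}")
        return Money(self.amount + other.amount, self.currency)

    def to_major_string(self) -> str:
        """Canonical decimal string in major units, using the currency's exponent."""
        exp = ISO_4217_EXPONENT[self.currency]
        sign = "-" if self.amount < 0 else ""
        units = abs(self.amount)
        if exp == 0:
            return f"{sign}{units}"
        whole, frac = divmod(units, 10 ** exp)
        return f"{sign}{whole}.{frac:0{exp}d}"
```

The dataclass-generated equality compares both fields, so `Money(1234, "USD")` and `Money(1234, "JPY")` are unequal by construction.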
+3. **Decide and document ONE rounding mode; round only at boundaries.** The default sources disagree, so state it explicitly:
+   - **Banker's rounding (half-to-even, `ROUND_HALF_EVEN`)** — default for statistical/aggregate fairness; removes the upward bias of always rounding `.5` up. Use for interest, large batches, GAAP/IFRS contexts. `2.5→2`, `3.5→4`.
+   - **Half-up (`ROUND_HALF_UP`, "arithmetic")** — what invoices and most tax authorities expect for a single bill line. `2.5→3`. Many VAT rules mandate this per line.
+   Pick **half-even as the engine default**, override to **half-up where a tax/billing rule requires it**, and write the chosen mode next to the code. **Carry full precision through the calculation; round exactly once, at the point you produce a displayable/storable currency amount** — never round intermediates, or errors compound.
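The two modes can be compared directly with Python's `decimal` module. A small sketch, where `round_to_exponent` is a hypothetical helper name and the exponent is the currency's minor-digit count from step 2:

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP


def round_to_exponent(value: Decimal, exponent: int, mode: str = ROUND_HALF_EVEN) -> Decimal:
    """Round value to `exponent` decimal places using an explicit rounding mode."""
    quantum = Decimal(1).scaleb(-exponent)  # exponent 2 -> Decimal("0.01")
    return value.quantize(quantum, rounding=mode)


# Half-even sends ties to the even neighbour; half-up always sends ties away from zero.
half_even = round_to_exponent(Decimal("2.5"), 0)               # Decimal("2")
half_up = round_to_exponent(Decimal("2.5"), 0, ROUND_HALF_UP)  # Decimal("3")
```

Passing the mode as an argument keeps the choice visible at each call site, which is one way to "write the chosen mode next to the code".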
+4. **Allocate with largest-remainder so the parts sum EXACTLY to the whole.** Splitting `$100 / 3` as `33.33 × 3 = 99.99` leaks a penny. Distribute the remainder deterministically:
+   ```python
+   def allocate(total_minor: int, ratios: list[int]) -> list[int]:
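+       s = sum(ratios)
+       if s <= 0 or any(r < 0 for r in ratios):
+           raise ValueError("ratios must be non-negative with a positive sum")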
+       shares = [total_minor * r // s for r in ratios]   # floor each
+       remainder = total_minor - sum(shares)              # pennies left over
+       # hand out the leftover pennies, one each, by largest fractional part
+       order = sorted(range(len(ratios)),
+                      key=lambda i: (total_minor * ratios[i]) % s, reverse=True)
+       for i in order[:remainder]:
+           shares[i] += 1
+       return shares
+   # allocate(10000, [1,1,1]) -> [3334, 3333, 3333]  sums to 10000 exactly
+   ```
+   Invariant: `sum(allocate(total, ratios)) == total`, always, for any integer total and non-negative integer ratios with a positive sum. Use this for splitting charges, proportional refunds, tax-inclusive line breakdowns.
+5. **Fix tax/discount ordering and the rounding points — they change the total.** Decide and document:
+   - **Discount before tax** (typical retail): `taxable = price − discount`, then `tax = round(taxable × rate)`.
+   - **Round per line vs round on total**: per-line rounding (round each line's tax, then sum) and total rounding (sum exact line taxes, round once) can differ by cents. Most invoice/VAT regimes require **round per line**; pick one, document it, keep it consistent across the whole invoice.
+   - **Tax-inclusive (gross) prices**: extract tax with `tax = round(gross × rate / (1 + rate))`; the net is `gross − tax` so the parts reconcile exactly.
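A short sketch of why the rounding point matters, assuming a 2-digit currency and half-up rounding (the function names are hypothetical). Three lines of 0.33 at 7% give different tax depending on whether each line or the total is rounded.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")  # assumes a currency with exponent 2


def round_money(x: Decimal) -> Decimal:
    return x.quantize(CENT, rounding=ROUND_HALF_UP)


def tax_per_line(taxables: list[Decimal], rate: Decimal) -> Decimal:
    """Round each line's tax, then sum the rounded values."""
    return sum((round_money(t * rate) for t in taxables), Decimal("0"))


def tax_on_total(taxables: list[Decimal], rate: Decimal) -> Decimal:
    """Sum exact line amounts, then round the tax once."""
    return round_money(sum(taxables, Decimal("0")) * rate)


def split_gross(gross: Decimal, rate: Decimal) -> tuple[Decimal, Decimal]:
    """Extract tax from a tax-inclusive price; net is derived so net + tax == gross."""
    tax = round_money(gross * rate / (1 + rate))
    return gross - tax, tax


lines = [Decimal("0.33")] * 3
per_line = tax_per_line(lines, Decimal("0.07"))  # 0.02 + 0.02 + 0.02
on_total = tax_on_total(lines, Decimal("0.07"))  # round(0.0693)
```

Here `per_line` is 0.06 and `on_total` is 0.07, so the invoice rule has to be chosen once and applied everywhere.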
+6. **FX conversion — fix precision, direction, triangulation, and one round.** A rate is a high-precision decimal, not money. Rules:
+   - Store rates with **at least 6 decimal places** (`decimal`, not float); know the direction (`USD→EUR` rate vs its reciprocal; a quoted reverse rate is not exactly `1/x` once rounded to display precision).
+   - Compute in full decimal precision: `target_minor = source_major × rate × 10^target_exponent` (the **target** currency's exponent), then **round once** (half-even) to a whole number of target minor units.
+   - **Triangulate** through a base when no direct pair exists (`THB→base→JPY`); apply both legs in full precision and round only the final result, never the intermediate base amount.
+   - Never round the source before converting; never reuse a stale/averaged rate when an exact contractual rate is required.
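The FX rules above can be sketched as follows. The exponent excerpt, the function names, and the rates in the comments are made up for illustration; rates are expressed as units of target per one major unit of source.

```python
from decimal import Decimal, ROUND_HALF_EVEN

EXPONENT = {"USD": 2, "EUR": 2, "JPY": 0, "THB": 2}  # illustrative excerpt of ISO 4217


def convert_minor(amount_minor: int, source: str, target: str, rate: Decimal) -> int:
    """Convert minor units of `source` to minor units of `target`, rounding once."""
    source_major = Decimal(amount_minor).scaleb(-EXPONENT[source])
    target_minor = (source_major * rate).scaleb(EXPONENT[target])
    return int(target_minor.quantize(Decimal(1), rounding=ROUND_HALF_EVEN))


def triangulate_minor(amount_minor: int, source: str, target: str,
                      rate_source_to_base: Decimal, rate_base_to_target: Decimal) -> int:
    """Apply both legs in full precision; the intermediate base amount is never rounded."""
    combined = rate_source_to_base * rate_base_to_target
    return convert_minor(amount_minor, source, target, combined)


# 10.00 USD at a hypothetical 150.25 JPY per USD is 1502.5 JPY; half-even gives 1502.
yen = convert_minor(1000, "USD", "JPY", Decimal("150.25"))
```

Multiplying the two leg rates before touching the amount is what keeps the base-currency value out of the rounding path.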
+7. **Compare and test equality on the exact integer/decimal — never a float epsilon.** With minor units / decimals, `a == b` is exact; `abs(a−b) < 1e-9` is a code smell signaling float crept in. Equality must include currency: `{1234,"USD"} != {1234,"JPY"}`. Sort/compare amounts only within the same currency.
+8. **Store as exact types; serialize as string, not float JSON.** Use Postgres `NUMERIC(precision, scale)` or `BIGINT` minor units; **never `FLOAT`/`DOUBLE`/`REAL`** (lossy) and never `MONEY` (locale-fragile, fixed scale). Over JSON, emit amounts as a **string** (`"12.34"`) or as integer minor units + currency code: most JSON parsers decode numbers into IEEE-754 doubles, which lose integer precision above 2^53 (about 16 digits) and cannot represent most 2-dp decimals exactly. Set the column scale to the currency's max exponent (3 to be safe across BHD/KWD).
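A minimal sketch of the string-on-the-wire rule using only the standard library. The payload shape (`amount`, `currency`) and the function names are assumptions for illustration, not a fixed contract.

```python
import json
from decimal import Decimal


def dump_money(amount: Decimal, currency: str) -> str:
    """Serialize the amount as a string so no JSON parser turns it into a float."""
    return json.dumps({"amount": str(amount), "currency": currency})


def load_money(payload: str) -> tuple[Decimal, str]:
    """Parse a money payload, rejecting amounts that arrived as JSON numbers."""
    obj = json.loads(payload)
    if not isinstance(obj["amount"], str):
        raise TypeError("amount must be a JSON string, not a number")
    return Decimal(obj["amount"]), obj["currency"]
```

Rejecting numeric amounts at the boundary turns a silent precision loss into an explicit error.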
+9. **Display and parse via the locale layer, separate from the math.** Format only at the edge with `Intl.NumberFormat(locale, {style:'currency', currency})` (JS) / `babel.numbers.format_currency` (Python) / `NumberFormat.getCurrencyInstance` (Java) — these place the symbol, group separators, and minor digits per locale (`-1.234,56 €` vs `($1,234.56)`). When parsing user input, strip locale separators back to a canonical decimal/minor-unit value before any arithmetic; never `parseFloat` a formatted string.
+10. **Lock the invariants with property tests** (delegate suite structure to write-tests; assert these properties): allocation sums to the whole; round-trip format→parse is identity in canonical units; conversion+inverse stays within one minor unit; commutativity/associativity of addition in the same currency; no operation produces a fractional minor unit.
+## Common Errors
+- **`float`/`double` anywhere in the money path.** `0.1 + 0.2 != 0.3`; sums drift over many rows. Fix: integer minor units or a base-10 decimal type end to end.
+- **Hardcoding exponent `2`.** Breaks JPY (0) and BHD/KWD (3) — `¥1234` becomes `¥12.34`. Fix: read the exponent from an ISO 4217 table.
+- **Rounding intermediates.** Rounding each step before the final means errors accumulate. Fix: full precision through the calc, round exactly once at the output boundary.
+- **Naïve split (`total/n`, round each).** `100/3 → 33.33×3 = 99.99`, a penny vanishes. Fix: largest-remainder allocation (step 4); assert `sum == total`.
+- **Mixing currencies in one operation.** Adding USD to JPY silently yields garbage. Fix: type `Money` with currency; throw on mismatch.
+- **Unspecified/mixed rounding mode.** Half-even in one place, half-up in another → reconciliation gaps. Fix: one documented mode, override only where a tax rule mandates.
+- **Float JSON for amounts.** A JSON number `12.34` is decoded to the nearest binary double, which is not exactly 12.34, so arithmetic on it drifts. Fix: serialize as string or integer minor units + currency.
+- **`FLOAT`/`MONEY` SQL columns.** Lossy or locale-fragile storage. Fix: `NUMERIC(p,s)` or `BIGINT` minor units.
+- **`parseFloat` on a formatted string.** `"1.234,56"` (de-DE) parses to `1.234`. Fix: locale-aware parse to canonical units before math.
+- **Float epsilon comparison.** `abs(a-b) < 1e-9` for money means float leaked in. Fix: exact integer/decimal compare, including currency.
+- **Reciprocal FX assumption.** Treating `EUR→USD` as exactly `1/(USD→EUR)` introduces drift. Fix: store/quote each direction; round only the final converted amount.
+## Verify
+1. **No float in the money path:** grep the diff — no `float`/`double`/`Number(`/`parseFloat`/`FLOAT`/`DOUBLE` on monetary values; types are minor-unit integers or a base-10 decimal. Schema columns are `NUMERIC`/`BIGINT`, not `FLOAT`/`MONEY`.
+2. **Exponent correctness:** format `1234` minor units in USD→`$12.34`, JPY→`¥1234`, BHD→`1.234` — the exponent comes from the currency, not a constant.
+3. **Allocation sums exactly:** property test `sum(allocate(total, ratios)) == total` for thousands of random totals and ratio vectors, including `total/3`, `/7`, zero ratios, and a single element. Zero penny leaks.
+4. **Single rounding boundary:** a chained calc (price × qty × (1−discount) × (1+tax)) rounds once and equals a hand-computed full-precision-then-round figure; intermediates carry sub-minor precision.
+5. **Tax reconciles:** sum of per-line taxes equals the documented invoice total under the chosen per-line/total rule; tax-inclusive extraction satisfies `net + tax == gross` exactly.
+6. **FX round-trip bounded:** convert `A→B→A` for many amounts; result is within 1 minor unit of the original (rounding only, no drift), and a triangulated path rounds only the final leg.
+7. **Equality is exact:** money equality/compare uses no epsilon and treats different currencies as unequal; tests assert `{1234,"USD"} != {1234,"JPY"}`.
+8. **Serialization is lossless:** amounts cross JSON/DB boundaries as string or minor-unit integer + currency; a `12.34`-as-float anywhere fails the check.
+9. **Format/parse identity:** for a set of locales, `parse(format(x)) == x` in canonical units.
+Done = no binary float touches money anywhere, every currency uses its ISO 4217 exponent, allocation/tax/FX sum to the whole with zero penny leak, rounding mode is documented and applied exactly once at each boundary, and amounts are stored and serialized as exact (NUMERIC/minor-unit/string) values — all proven by the property tests in checks 3–9.

package/skills/monitor-ml-drift/SKILL.md ADDED

@@ -0,0 +1,109 @@
+---
+name: monitor-ml-drift
+description: Monitors a production ML model for input data drift, prediction drift, and performance decay against delayed labels — using PSI/KS/Chi-square drift tests, train/serve skew checks, alert thresholds, and scheduled-or-drift-triggered retraining with a champion/challenger loop — so a silently degrading model is caught before it costs.
+when_to_use: A deployed model needs ongoing statistical health monitoring or has quietly degraded. Distinct from serve-deploy-ml-model (rollout/canary/autoscale), train-evaluate-ml-model (initial build + offline metrics), observability-instrument (service latency/error RED metrics), and validate-data-quality (rule assertions, not distribution shift).
+---
+## When to Use
+Reach for this skill when the concern is **the model's statistical health in production**, not whether the service is up:
+- "Accuracy looked fine at launch but the model feels worse now — is it drifting?"
+- "Our feature distributions shifted (new user segment, seasonality, upstream schema change) — did the model degrade?"
+- "Set up drift + performance monitoring and an alert when a retrain is warranted"
+- "Labels arrive 2 weeks late — how do I track real accuracy/AUC over time?"
+- "Detect train/serve skew — the model scores differently offline vs online on the same row"
+- "Wire a champion/challenger so a candidate retrain only ships if it beats prod"
+NOT this skill:
+- Shipping/rolling out the model artifact, canary, autoscaling → serve-deploy-ml-model
+- The original training run, offline eval, hyperparameter search, test-set metrics → train-evaluate-ml-model
+- Service-level latency/error-rate/RED metrics, traces, dashboards, p99 alerts → observability-instrument
+- Rule assertions on the data pipeline (not-null, unique, freshness, range) → validate-data-quality (drift is *distributional*; a column can pass every range rule and still have shifted its whole distribution)
+## Steps
+1. **Log every prediction as an immutable event: no logging = no monitoring.** Per request, write one row: `prediction_id`, `ts`, `model_version`, the **exact feature vector actually scored** (post-transform, exactly what the model saw), the output (`pred_proba` + `pred_label`), and a `label_join_key`. Land it in a columnar store (Parquet on S3, BigQuery, Delta). Labels arrive later out-of-band, so write them to a separate table keyed by `label_join_key` and **left-join on arrival**; never block scoring on a label. Snapshot the **training reference** (a held-out slice of the training data + its predictions) once and pin it; every drift test compares live vs this fixed reference.
+2. **Pick the drift test per feature type — do not PSI everything.**
+   | Signal | Test | Fires when | Default threshold |
+   |---|---|---|---|
+   | Numeric / continuous feature | **PSI** (population stability index) | Binned distribution shifted vs reference | PSI > 0.2 = significant; 0.1–0.2 = watch |
+   | Numeric, distribution shape | **KS** (Kolmogorov–Smirnov) 2-sample | Max CDF gap large | p < 0.05 |
+   | Categorical feature | **Chi-square** / PSI on category freqs | Category mix shifted, new/unseen level | p < 0.05 / PSI > 0.2 |
+   | Prediction output (proba) | **PSI / KS** on `pred_proba` | Output distribution drifts | PSI > 0.2 |
+   | Multivariate / overall | **Domain classifier** (ref vs live, AUC) | Classifier separates ref from live | AUC > 0.7 |
+   Compute over a **rolling window** (default: last 7 days or 10k preds, whichever larger) vs the pinned reference. Use a fixed reference for stable populations; switch to a **trailing-window reference** only if the population legitimately evolves (and document that you've given up detecting slow drift). Apply **Bonferroni/BH correction** across features — with 200 features at p<0.05 you get ~10 false alarms per run by chance.
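For intuition only (step 4 still applies: use a library in production), here is a dependency-free sketch of PSI with reference-quantile bins, plus a Benjamini-Hochberg filter for the per-feature p-values that KS or Chi-square produce. Function names, the bin count, and the `eps` floor are assumptions; both inputs to `psi` must be non-empty.

```python
import math


def psi(reference: list[float], current: list[float], bins: int = 10, eps: float = 1e-6) -> float:
    """Population stability index using bin edges taken at reference quantiles."""
    ref_sorted = sorted(reference)
    edges = [ref_sorted[int(len(ref_sorted) * k / bins)] for k in range(1, bins)]

    def shares(values: list[float]) -> list[float]:
        counts = [0] * bins
        for v in values:
            counts[sum(1 for e in edges if v >= e)] += 1  # bin index 0..bins-1
        return [max(c / len(values), eps) for c in counts]  # floor avoids log(0)

    p, q = shares(reference), shares(current)
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))


def benjamini_hochberg(p_values: list[float], alpha: float = 0.05) -> list[int]:
    """Return indices of tests that remain significant under BH false-discovery control."""
    m = len(p_values)
    ranked = sorted(range(m), key=lambda i: p_values[i])
    cutoff = 0
    for rank, i in enumerate(ranked, start=1):
        if p_values[i] <= alpha * rank / m:
            cutoff = rank  # largest rank that passes its threshold
    return sorted(ranked[:cutoff])
```

With p-values `[0.001, 0.04, 0.03, 0.8]`, three would pass a naive `p < 0.05` check but only the first survives the correction.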
+3. **Separate the three drift types — they mean different things and trigger different actions.**
+   - **Data (input) drift** — features moved. Model may still be fine; this is an *early warning*, not proof of decay. Page only if widespread.
+   - **Prediction drift** — output distribution moved without a known input cause → upstream feature pipeline broke, or real population shift. Higher signal than single-feature input drift.
+   - **Concept drift / performance decay** — the input→output relationship changed. **Only measurable once labels land.** This is the one that actually justifies a retrain. Track the real metric (AUC/F1/MAE — whatever you optimized) per cohort window vs a **baseline window** (e.g. first 2 weeks post-deploy, or last known-good).
+4. **Run it with a library — don't hand-roll the stats.** Evidently for reports + tests, whylogs for lightweight profile logging at scale, NannyML for *estimating* performance **before** labels arrive (CBPE/DLE). Pin `evidently==0.4.*` and use its `Report` / `metric_preset` API:
+   ```python
+   from evidently.report import Report
+   from evidently.metric_preset import DataDriftPreset, TargetDriftPreset
+   from evidently import ColumnMapping
+   cm = ColumnMapping(prediction="pred_proba")
+   report = Report(metrics=[
+       DataDriftPreset(stattest="psi", stattest_threshold=0.2),  # per-feature input drift
+       TargetDriftPreset(),                                       # prediction-column drift
+   ])
+   report.run(reference_data=ref_df, current_data=live_df, column_mapping=cm)
+   res = report.as_dict()
+   drift = res["metrics"][0]["result"]                # DataDriftPreset summary
+   if drift["share_of_drifted_columns"] > 0.3:        # >30% of features drifted → alert
+       fire_alert("data_drift", detail=drift)          # fire_alert = your own alerting hook, not an Evidently API
+   ```
+   For pre-label performance estimation when labels lag:
+   ```python
+   import nannyml as nml
+   est = nml.CBPE(problem_type="classification_binary", y_pred="pred_label",
+                  y_pred_proba="pred_proba", y_true="label",
+                  metrics=["roc_auc"], chunk_size=5000)
+   est.fit(reference_df)                 # reference must include matured labels
+   estimated = est.estimate(live_df)     # estimated AUC + confidence band, no live labels needed
+   ```
+5. **Detect train/serve skew explicitly — it's a silent killer.** Re-score a sample of logged production feature vectors through the **offline** model and assert `abs(online_proba − offline_proba) < 1e-4`. Mismatch = a transform diverged between training and serving (different encoder fit, a default-fill applied online only, version skew in a preprocessing lib). Also compare **training-time** feature distributions vs **serving-time** for the same feature: skew shows up as a step change at deploy, not a gradual drift. Run this nightly on a sample.
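The re-scoring assertion can be sketched as a comparison of two score maps keyed by `prediction_id`. `skew_violations` is a hypothetical helper; producing the offline scores (loading the model and re-running the logged vectors) is assumed to happen elsewhere.

```python
def skew_violations(online: dict[str, float], offline: dict[str, float],
                    tol: float = 1e-4) -> list[str]:
    """Return prediction_ids that are missing offline or differ by tol or more."""
    bad = []
    for pid, online_proba in online.items():
        offline_proba = offline.get(pid)
        if offline_proba is None or abs(online_proba - offline_proba) >= tol:
            bad.append(pid)
    return bad
```

An empty result means the sampled rows score identically in both paths; any id returned is a row to diff transform by transform.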
+6. **Set thresholds and a retraining trigger — opinionated defaults, then tune to your false-alarm budget.**
+   - **Trigger retrain** when *any* holds: estimated/actual primary metric drops > **5% relative** below baseline for ≥2 consecutive windows; OR prediction-drift PSI > 0.2 sustained; OR > 30% of top-importance features drifted. One noisy window ≠ retrain — require **persistence** (2+ windows) to kill flapping.
+   - **Schedule** a baseline retrain regardless (weekly/monthly) so you never rely solely on drift detection catching it.
+   - On trigger, retrain a **challenger** and gate promotion through a champion/challenger comparison (step 7) — never auto-promote on a drift signal alone; drift can be benign.
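The persistence rule for the metric-based trigger can be sketched as below, assuming a higher-is-better metric (AUC, F1) recorded once per window, oldest first. `should_retrain` is a hypothetical name and only covers the first condition in the list.

```python
def should_retrain(metric_by_window: list[float], baseline: float,
                   rel_drop: float = 0.05, persistence: int = 2) -> bool:
    """True only if the last `persistence` windows all sit below baseline * (1 - rel_drop)."""
    threshold = baseline * (1 - rel_drop)
    recent = metric_by_window[-persistence:]
    return len(recent) == persistence and all(m < threshold for m in recent)
```

A single bad window followed by recovery returns `False`, which is the anti-flapping behaviour the rule asks for.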
+7. **Champion/challenger before promotion.** Train challenger on fresh data, evaluate **both** on the same recent labeled window (and ideally a shadow/online split). Promote only if challenger beats champion on the primary metric by a margin **beyond noise** (bootstrap CI on the metric, or a paired test) — not a single point estimate. Log the decision + metrics to a model registry. Hand the actual rollout (canary, traffic shift, rollback) to **serve-deploy-ml-model**; this skill decides *whether*, that skill does *how*.
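A sketch of the "beyond noise" gate using a paired bootstrap. It assumes accuracy as the metric for simplicity (swap in the metric you actually optimized), and `challenger_wins` with its defaults is hypothetical: the challenger is promoted only if the lower bound of the bootstrap interval on the accuracy difference is above zero.

```python
import random


def challenger_wins(y_true: list[int], champion_pred: list[int], challenger_pred: list[int],
                    n_boot: int = 1000, alpha: float = 0.05, seed: int = 0) -> bool:
    """Paired bootstrap on per-row correctness; True if the CI lower bound exceeds 0."""
    rng = random.Random(seed)  # fixed seed keeps the decision reproducible
    n = len(y_true)
    # +1 where only the challenger is right, -1 where only the champion is right, else 0
    diffs = [int(c2 == y) - int(c1 == y)
             for y, c1, c2 in zip(y_true, champion_pred, challenger_pred)]
    boot_means = []
    for _ in range(n_boot):
        resample = [diffs[rng.randrange(n)] for _ in range(n)]
        boot_means.append(sum(resample) / n)
    boot_means.sort()
    lower = boot_means[int((alpha / 2) * n_boot)]
    return lower > 0
```

Resampling rows jointly for both models is what makes the comparison paired: both are always scored on the same labeled window.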
+8. **Alert routing, not just detection.** Page on **performance decay** and **prediction drift** (high signal). Send **input drift** to a dashboard/digest, not a pager — single-feature input drift is frequent and usually benign; paging on it trains everyone to ignore the channel. Every alert carries: which signal, which features/metric, the value vs threshold, the window, and a link to the drift report.
+## Common Errors
+- **Logging transformed-then-re-derived features instead of what the model scored.** You then compare a reconstruction, not reality, and miss real skew. Log the exact post-transform vector at inference time.
+- **Reference set = the whole training data including the part the model trained on.** Leaks optimism. Use a **held-out** slice as reference.
+- **PSI/KS run with no multiple-comparison correction.** 200 features × p<0.05 ≈ 10 false "drifts" every run → alert fatigue. Apply Bonferroni/BH and a `share_of_drifted_columns` gate, don't alert per feature.
+- **Treating any data drift as "model is broken."** Features can shift while accuracy holds. Only **performance decay** (or prediction drift with a cause) justifies a retrain; input drift is a watch signal.
+- **Computing "live accuracy" the moment predictions are made.** Labels are delayed — that number is empty until labels land. Use NannyML CBPE/DLE to *estimate* performance pre-label, and report actual metric only over windows whose labels have matured.
+- **Joining labels to predictions on timestamp.** Late/duplicate/reordered labels corrupt the join. Join on a stable `label_join_key`, and bucket by **prediction** time, not label-arrival time.
+- **Comparing windows of wildly different size.** PSI/KS are sensitive to n; a 200-row window vs a 50k reference flags noise as drift. Fix a minimum window size and equal-ish bins.
+- **Fixed reference forever on a legitimately evolving population.** Everything reads as drift and the signal dies. Either accept slow drift goes undetected with a trailing reference, or re-baseline deliberately on each retrain — and write down which.
+- **Auto-retrain + auto-promote on a single drift spike.** Promotes a worse model on a benign blip or a data outage. Require persistence (2+ windows) and a champion/challenger win beyond noise.
+- **No train/serve skew check.** The most common production regression — an encoder/imputer that differs online — is invisible to distribution drift. Re-score logged rows offline and assert equality.
+## Verify
+1. **Inject a known input shift:** take a held-out reference, build a `current` where one numeric feature is multiplied (e.g. ×1.5) or a category's frequency is swapped → the per-feature drift test (PSI/KS) for *that* feature fires and the others stay green. Proves sensitivity *and* specificity.
+2. **Inject prediction drift:** shift `pred_proba` for the current window → prediction-drift alert fires while input features are unchanged. Proves the output monitor is independent.
+3. **Replay a known-degraded period:** feed a window whose labels you know are bad (mislabel a slice or use a historically-bad date range) → the performance tracker shows the metric dropping > 5% below baseline and the **retrain trigger** fires after the 2nd consecutive bad window (not the 1st).
+4. **Negative control:** feed `current = reference` (resampled) → **no** alert fires. If a same-distribution sample trips an alert, your thresholds/correction are too tight.
+5. **Skew check:** re-score a sample of logged prod vectors offline → `max|online − offline| < 1e-4`. Then deliberately break one transform and confirm the skew check catches it.
+6. **Delayed-label join:** insert labels out of order / late → actual-metric windows recompute correctly keyed by prediction time, and pre-label estimated metric (CBPE) tracks the eventual actual within its confidence band.
+7. **Champion/challenger gate:** feed a challenger that's worse on the recent window → promotion is **rejected**; feed one that's better beyond the CI → promotion is approved and logged to the registry.
+Done = an injected input shift fires only the right feature's drift alert (negative control stays silent), prediction drift is detected independently, the performance tracker reflects the known-degraded period and trips the retrain trigger after sustained (not single-window) decay, train/serve skew is caught, and champion/challenger blocks a worse model from promoting.