npm - sanook-cli - Versions diffs - 0.4.0 → 0.5.1 - Mend

sanook-cli 0.4.0 → 0.5.1


Files changed (238)

package/.env.example +19 -0
package/CHANGELOG.md +173 -0
package/README.md +153 -20
package/README.th.md +136 -0
package/dist/agentContext.js +4 -0
package/dist/approval.js +6 -0
package/dist/bin.js +405 -57
package/dist/brain.js +92 -59
package/dist/brand.js +47 -0
package/dist/checkpoint.js +37 -0
package/dist/commands.js +86 -6
package/dist/compaction.js +76 -5
package/dist/config.js +100 -12
package/dist/cost.js +60 -3
package/dist/doctor.js +92 -0
package/dist/gateway/auth.js +2 -2
package/dist/gateway/ledger.js +2 -2
package/dist/gateway/scheduler.js +1 -0
package/dist/gateway/serve.js +6 -4
package/dist/gateway/server.js +10 -2
package/dist/git.js +11 -2
package/dist/hooks.js +43 -17
package/dist/knowledge.js +48 -49
package/dist/loop.js +182 -66
package/dist/lsp/client.js +173 -0
package/dist/lsp/framing.js +56 -0
package/dist/lsp/index.js +138 -0
package/dist/lsp/servers.js +82 -0
package/dist/mcp-server.js +244 -0
package/dist/mcp.js +184 -29
package/dist/memory-store.js +559 -0
package/dist/memory.js +143 -29
package/dist/orchestrate.js +150 -0
package/dist/providers/codex.js +21 -7
package/dist/providers/keys.js +3 -2
package/dist/providers/models.js +22 -6
package/dist/providers/registry.js +155 -1
package/dist/repomap.js +93 -0
package/dist/search/chunk.js +158 -0
package/dist/search/embed-store.js +187 -0
package/dist/search/engine.js +203 -0
package/dist/search/fuse.js +35 -0
package/dist/search/index-core.js +187 -0
package/dist/search/indexer.js +241 -0
package/dist/search/store.js +77 -0
package/dist/session.js +42 -8
package/dist/skill-install.js +10 -10
package/dist/skills.js +12 -9
package/dist/summarize.js +31 -0
package/dist/tools/bash.js +21 -2
package/dist/tools/diagnostics.js +41 -0
package/dist/tools/edit.js +29 -7
package/dist/tools/index.js +8 -1
package/dist/tools/list.js +7 -2
package/dist/tools/permission.js +90 -9
package/dist/tools/read.js +23 -4
package/dist/tools/remember.js +1 -1
package/dist/tools/sandbox.js +61 -0
package/dist/tools/search.js +105 -4
package/dist/tools/task.js +195 -29
package/dist/tools/timeout.js +35 -0
package/dist/tools/util.js +10 -0
package/dist/tools/write.js +6 -4
package/dist/trust.js +89 -0
package/dist/ui/app.js +228 -31
package/dist/ui/banner.js +4 -9
package/dist/ui/brain-wizard.js +2 -2
package/dist/ui/history.js +30 -0
package/dist/ui/mentions.js +44 -0
package/dist/ui/render.js +55 -15
package/dist/ui/setup.js +97 -12
package/dist/ui/useEditor.js +83 -0
package/dist/update.js +114 -0
package/dist/worktree.js +173 -0
package/package.json +11 -5
package/scripts/postinstall.mjs +33 -0
package/second-brain/.agents/_Index.md +30 -0
package/second-brain/.agents/skills/_Index.md +30 -0
package/second-brain/.agents/workflows/_Index.md +30 -0
package/second-brain/AGENTS.md +4 -4
package/second-brain/Acceptance/_Index.md +30 -0
package/second-brain/Acceptance/golden-case-template.md +39 -0
package/second-brain/Areas/_Index.md +30 -0
package/second-brain/Bugs/System-OS/_Index.md +30 -0
package/second-brain/Bugs/_Index.md +30 -0
package/second-brain/CLAUDE.md +4 -1
package/second-brain/Checklists/_Index.md +30 -0
package/second-brain/Checklists/preflight-postflight-template.md +29 -0
package/second-brain/Distillations/_Index.md +30 -0
package/second-brain/Entities/_Index.md +30 -0
package/second-brain/Entities/entity-template.md +33 -0
package/second-brain/Evals/_Index.md +30 -0
package/second-brain/Evals/correction-pairs.md +24 -0
package/second-brain/Evals/failure-taxonomy.md +24 -0
package/second-brain/Evals/golden-set.md +25 -0
package/second-brain/Evals/quality-ledger.md +23 -0
package/second-brain/Evals/self-eval-rubric.md +23 -0
package/second-brain/GEMINI.md +4 -4
package/second-brain/Goals/_Index.md +30 -0
package/second-brain/Handoffs/_Index.md +30 -0
package/second-brain/Home.md +7 -0
package/second-brain/Intake/Raw Sources/_Index.md +30 -0
package/second-brain/Intake/_Index.md +30 -0
package/second-brain/Intake/_Quarantine/_Index.md +30 -0
package/second-brain/Learning/_Index.md +30 -0
package/second-brain/Playbooks/_Index.md +30 -0
package/second-brain/Playbooks/playbook-template.md +23 -0
package/second-brain/Projects/_Index.md +30 -0
package/second-brain/Prompts/_Index.md +30 -0
package/second-brain/README.md +2 -1
package/second-brain/Research/_Index.md +30 -0
package/second-brain/Retrospectives/_Index.md +30 -0
package/second-brain/Reviews/_Index.md +30 -0
package/second-brain/Runbooks/_Index.md +30 -0
package/second-brain/Runbooks/eval-loop.md +24 -0
package/second-brain/Sessions/_Index.md +30 -0
package/second-brain/Shared/AI-Context-Index.md +20 -0
package/second-brain/Shared/AI-Threads/_Index.md +30 -0
package/second-brain/Shared/Archive/_Index.md +30 -0
package/second-brain/Shared/Assets/_Index.md +30 -0
package/second-brain/Shared/Context-Packs/_Index.md +30 -0
package/second-brain/Shared/Context7-Docs/_Index.md +30 -0
package/second-brain/Shared/Coordination/NOW.md +28 -0
package/second-brain/Shared/Coordination/_Index.md +30 -0
package/second-brain/Shared/Coordination/agent-registry.md +24 -0
package/second-brain/Shared/Coordination/task-board/_Index.md +30 -0
package/second-brain/Shared/Coordination/task-board/task-template.md +43 -0
package/second-brain/Shared/Coordination/task-board.md +32 -0
package/second-brain/Shared/Core-Facts/_Index.md +30 -0
package/second-brain/Shared/Decision-Memory/_Index.md +30 -0
package/second-brain/Shared/Glossary/_Index.md +30 -0
package/second-brain/Shared/Memory-Inbox/_Index.md +30 -0
package/second-brain/Shared/Operating-State/_Index.md +30 -0
package/second-brain/Shared/Prompting/_Index.md +30 -0
package/second-brain/Shared/Provenance/_Index.md +30 -0
package/second-brain/Shared/Rules/_Index.md +30 -0
package/second-brain/Shared/Rules/contextual-note-rule.md +30 -0
package/second-brain/Shared/Rules/frontmatter-standard.md +10 -0
package/second-brain/Shared/Rules/memory-write-protocol.md +28 -0
package/second-brain/Shared/Rules/procedural-runbook-header.md +40 -0
package/second-brain/Shared/Rules/review-and-staleness-policy.md +22 -0
package/second-brain/Shared/Rules/rules-formatting.md +34 -0
package/second-brain/Shared/Scripts/_Index.md +30 -0
package/second-brain/Shared/Scripts-Archive/_Index.md +30 -0
package/second-brain/Shared/Tech-Standards/_Index.md +30 -0
package/second-brain/Shared/Tech-Standards/verification-standard.md +40 -0
package/second-brain/Shared/User-Memory/_Index.md +30 -0
package/second-brain/Shared/User-Persona/_Index.md +30 -0
package/second-brain/Shared/User-Persona/owner-profile.md +25 -0
package/second-brain/Shared/Working-Memory/_Index.md +30 -0
package/second-brain/Shared/_Index.md +30 -0
package/second-brain/Shared/mcp-servers/_Index.md +30 -0
package/second-brain/Skills/_Index.md +30 -0
package/second-brain/Templates/_Index.md +30 -0
package/second-brain/Templates/bug.md +2 -0
package/second-brain/Templates/handoff.md +2 -0
package/second-brain/Templates/session.md +2 -0
package/second-brain/Tools/_Index.md +30 -0
package/second-brain/Traces/_Index.md +30 -0
package/second-brain/Vault Structure Map.md +33 -1
package/second-brain/copilot/_Index.md +30 -0
package/skills/audit-license-compliance/SKILL.md +117 -0
package/skills/author-codemod/SKILL.md +110 -0
package/skills/build-audit-logging/SKILL.md +112 -0
package/skills/build-cdc-streaming-pipeline/SKILL.md +123 -0
package/skills/build-cli-tool/SKILL.md +108 -0
package/skills/build-data-table/SKILL.md +141 -0
package/skills/build-native-mobile-ui/SKILL.md +154 -0
package/skills/build-offline-first-sync/SKILL.md +118 -0
package/skills/build-realtime-channel/SKILL.md +122 -0
package/skills/build-vector-search/SKILL.md +131 -0
package/skills/compose-local-dev-stack/SKILL.md +149 -0
package/skills/configure-bundler-build/SKILL.md +166 -0
package/skills/configure-dns-tls/SKILL.md +142 -0
package/skills/configure-reverse-proxy-lb/SKILL.md +129 -0
package/skills/configure-security-headers-csp/SKILL.md +122 -0
package/skills/contract-testing/SKILL.md +140 -0
package/skills/datetime-timezone-correctness/SKILL.md +125 -0
package/skills/debug-ci-pipeline-failure/SKILL.md +134 -0
package/skills/debug-flaky-tests/SKILL.md +128 -0
package/skills/defend-llm-prompt-injection/SKILL.md +110 -0
package/skills/deliver-webhooks/SKILL.md +116 -0
package/skills/design-api-pagination/SKILL.md +144 -0
package/skills/design-authorization-model/SKILL.md +119 -0
package/skills/design-backup-dr-recovery/SKILL.md +113 -0
package/skills/design-event-sourcing-cqrs/SKILL.md +143 -0
package/skills/design-multi-tenancy/SKILL.md +100 -0
package/skills/design-protobuf-grpc-service/SKILL.md +146 -0
package/skills/design-relational-schema/SKILL.md +129 -0
package/skills/design-search-index-infra/SKILL.md +151 -0
package/skills/design-state-machine/SKILL.md +108 -0
package/skills/design-token-system/SKILL.md +109 -0
package/skills/distributed-locks-leases/SKILL.md +120 -0
package/skills/encrypt-sensitive-data/SKILL.md +148 -0
package/skills/feature-flags-rollout/SKILL.md +130 -0
package/skills/file-upload-object-storage/SKILL.md +107 -0
package/skills/fuzz-dynamic-security-test/SKILL.md +111 -0
package/skills/harden-llm-app-reliability/SKILL.md +126 -0
package/skills/i18n-localization-setup/SKILL.md +113 -0
package/skills/idempotency-keys/SKILL.md +107 -0
package/skills/implement-push-notifications/SKILL.md +142 -0
package/skills/ingest-webhook-secure/SKILL.md +120 -0
package/skills/integrate-oauth-oidc/SKILL.md +126 -0
package/skills/load-stress-test/SKILL.md +129 -0
package/skills/map-privacy-data-gdpr/SKILL.md +146 -0
package/skills/model-nosql-data/SKILL.md +118 -0
package/skills/money-decimal-arithmetic/SKILL.md +123 -0
package/skills/monitor-ml-drift/SKILL.md +109 -0
package/skills/numeric-precision-units/SKILL.md +144 -0
package/skills/optimize-llm-cost-latency/SKILL.md +103 -0
package/skills/optimize-react-rerenders/SKILL.md +124 -0
package/skills/orchestrate-agent-workflow/SKILL.md +100 -0
package/skills/payments-billing-integration/SKILL.md +114 -0
package/skills/pin-toolchain-versions/SKILL.md +116 -0
package/skills/plan-strangler-migration/SKILL.md +95 -0
package/skills/property-based-testing/SKILL.md +108 -0
package/skills/publish-package-registry/SKILL.md +130 -0
package/skills/recover-git-state/SKILL.md +119 -0
package/skills/remediate-web-vulnerabilities/SKILL.md +125 -0
package/skills/resilience-timeouts-retries/SKILL.md +104 -0
package/skills/resolve-merge-rebase-conflict/SKILL.md +97 -0
package/skills/rewrite-git-history/SKILL.md +109 -0
package/skills/scaffold-cross-platform-app/SKILL.md +137 -0
package/skills/schema-evolution-compatibility/SKILL.md +121 -0
package/skills/send-transactional-email/SKILL.md +126 -0
package/skills/serve-deploy-ml-model/SKILL.md +107 -0
package/skills/setup-cdn-edge-waf/SKILL.md +107 -0
package/skills/setup-devcontainer-env/SKILL.md +131 -0
package/skills/setup-lint-format-precommit/SKILL.md +140 -0
package/skills/setup-monorepo-tooling/SKILL.md +125 -0
package/skills/ship-mobile-app-store-release/SKILL.md +137 -0
package/skills/structured-output-llm/SKILL.md +86 -0
package/skills/supply-chain-sbom-provenance/SKILL.md +120 -0
package/skills/test-data-factories/SKILL.md +158 -0
package/skills/threat-model-stride/SKILL.md +123 -0
package/skills/train-evaluate-ml-model/SKILL.md +109 -0
package/skills/unicode-text-correctness/SKILL.md +109 -0
package/skills/visual-regression-testing/SKILL.md +120 -0

package/second-brain/copilot/_Index.md ADDED

@@ -0,0 +1,30 @@
+---
+tags: [index, moc, copilot]
+note_type: moc
+created: {{DATE}}
+updated: {{DATE}}
+parent: "[[Home]]"
+---
+# copilot
+> vendor export (conversation/custom-prompt/memory snapshot): review/promote, not the source of truth
+## Put here
+Exports from Copilot that are stored in the vault
+## Do not put here
+Durable notes (promote them into the durable layer)
+## AI Routing Contract
+- Before writing: check that the content matches "Put here" and does not fall under "Do not put here"; if it is borderline, read [[Vault Structure Map]] first
+- Before creating a new file: search for existing notes in this folder and nearby folders first, so you merge/update instead of appending duplicates
+- When creating a note in this folder: set `parent: "[[copilot/_Index]]"` and end the file with `up:: [[copilot/_Index]]`
+- After writing: link to the related source/project/session/decision, and update the hub/index if this note should be discoverable in the future
+> Details for every folder + decision rules → [[Vault Structure Map]]
+_(Still empty; notes in this folder will be linked here)_
+up:: [[Home]]

package/skills/audit-license-compliance/SKILL.md ADDED

@@ -0,0 +1,117 @@
+---
+name: audit-license-compliance
+description: Audits open-source license compliance — resolves SPDX identifiers across the full transitive dependency tree (license-checker/scancode), classifies copyleft (GPL/AGPL/LGPL) exposure against the distribution model, enforces an allow/deny CI policy, and generates NOTICE/THIRD-PARTY attribution files.
+when_to_use: Shipping a product/library or prepping a legal/procurement review and needing to clear OSS license obligations. Distinct from supply-chain-sbom-provenance (build integrity, SBOM signing, provenance), dependency-upgrade (version bumps), and publish-package-registry (the publish step).
+---
+## When to Use
+Reach for this skill when the question is about **license obligations**, not which versions to ship or whether the build is tamper-proof:
+- "Can we ship — does anything here have a GPL/AGPL problem?"
+- "Generate the NOTICE / THIRD-PARTY-LICENSES file for the release."
+- "Add a CI gate that fails the build on a forbidden license."
+- "Legal/procurement needs the full list of dependencies and their licenses."
+- "This transitive dep has no license / is dual-licensed — what do we do?"
+- "We're going from internal SaaS to a downloadable binary — what changes?"
+NOT this skill:
+- Generating/signing an SBOM, provenance, or attestations → supply-chain-sbom-provenance (it lists *components*; this skill judges their *licenses*)
+- Bumping a version or swapping a GPL dep for an alternative → dependency-upgrade
+- The actual `npm publish` / `twine upload` step and its OIDC gate → publish-package-registry
+- A design-level risk enumeration of the system → threat-model-stride
+## Steps
+1. **Scan the FULL transitive tree, not just direct deps — and pin the result to a manifest.** Direct deps are a tiny minority; copyleft almost always rides in transitively. Pick the resolver for the ecosystem and emit machine-readable output:
+   ```bash
+   # Node: license-checker-rseidelsohn (maintained fork) over the *production* tree only
+   npx license-checker-rseidelsohn --production --json --out licenses.json
+   # Python
+   pip-licenses --format=json --with-license-file --with-urls > licenses.json
+   # Rust
+   cargo install cargo-deny && cargo deny list -f json > licenses.json
+   # Go
+   go install github.com/google/go-licenses@latest && go-licenses report ./... > licenses.csv
+   # Ground truth when metadata lies — scans actual file headers/text
+   pipx run --spec scancode-toolkit scancode --license --json-pp scancode.json <vendored_dir>
+   ```
+   Scope to what you **distribute**: prod/runtime deps only. devDependencies, test, and build-only tooling are generally not distributed — exclude them (`--production`, `--omit=dev`) or you'll drown in false GPL hits from linters. When package metadata is missing or wrong, `scancode` reading the real LICENSE/headers is the tiebreaker, not the `package.json` `license` field.
+2. **Classify each license by risk against YOUR distribution model — this is the whole audit.** The same license is fine or fatal depending on how you ship. Decide the model first, then read the table left-to-right:
+   | License class | Examples | SaaS (network only) | Distributed binary / app | Library you publish |
+   |---|---|---|---|---|
+   | Permissive | MIT, BSD-2/3, ISC, Apache-2.0, Unlicense, 0BSD | ✅ allow | ✅ allow (must keep NOTICE) | ✅ allow |
+   | Weak copyleft (file) | MPL-2.0, EPL-2.0, CDDL | ✅ allow | ⚠️ allow if unmodified & file-isolated | ⚠️ review |
+   | Weak copyleft (lib) | LGPL-2.1/3.0 | ✅ allow | ⚠️ **dynamic** link only; static link triggers relink obligation | ⚠️ review |
+   | Strong copyleft | GPL-2.0, GPL-3.0 | ✅ allow (no conveying) | ❌ **deny** — forces whole-program source disclosure | ❌ deny |
+   | Network copyleft | **AGPL-3.0** | ❌ **deny** — network use = conveying, source must be offered to users | ❌ deny | ❌ deny |
+   | Notice-heavy / patent | Apache-2.0, BSD-4-Clause | ✅ (track NOTICE/patent grant) | ⚠️ BSD-4 advertising clause incompatible w/ GPL | ⚠️ review |
+   | Non-OSS / source-available | SSPL, BUSL-1.1, Elastic-2.0, CC-BY-NC, "Commons Clause" | ❌ deny (not OSI; usage-restricted) | ❌ deny | ❌ deny |
+   | Public domain / unclear | WTFPL, "UNLICENSED", no license | ❌ deny pending manual review | ❌ deny pending review | ❌ deny |
+   The trap most teams miss: **AGPL bites SaaS** (where GPL does not, because you never "convey" a binary), and **LGPL/GPL bite distributed binaries** (where they're harmless on a server). Set the model once; don't hand-wave it as "it depends."
+3. **Encode the policy as allow / deny / review with SPDX IDs and gate CI on it.** A human-readable table isn't enforcement. Use one tool to both classify and fail the build. `cargo-deny`-style config (mirror the shape in `license-checker --failOn` or an `oss-review-toolkit`/`fossa`/`trivy` policy):
+   ```toml
+   # deny.toml — explicit allowlist; everything unlisted is a hard failure
+   [licenses]
+   allow = ["MIT", "Apache-2.0", "BSD-2-Clause", "BSD-3-Clause", "ISC", "MPL-2.0"]
+   confidence-threshold = 0.9          # below this, treat as unknown → fail
+   # exceptions: this one crate is allowed despite the policy, with a reason on record
+   [[licenses.exceptions]]
+   crate = "ring"
+   allow = ["OpenSSL"]                 # rationale: leaf TLS dep, OpenSSL terms cleared by legal 2026-05
+   ```
+   ```bash
+   # CI gate — non-zero exit blocks the merge. Run on every PR.
+   cargo deny check licenses
+   # Node equivalent (allowlist must match deny.toml above):
+   npx license-checker-rseidelsohn --production --onlyAllow \
+     "MIT;Apache-2.0;BSD-2-Clause;BSD-3-Clause;ISC;MPL-2.0" --excludePrivatePackages
+   ```
+   Default posture is **allowlist, deny-by-default**: an unknown/unparseable license fails closed, so a newly added denied or no-license dep cannot merge silently. Route genuinely-ambiguous cases to a `review` bucket that fails CI with a "needs legal sign-off" message rather than auto-allowing.
+4. **Resolve dual-licensed and missing-license deps explicitly — never let the tool pick for you.** SPDX `OR` means *you choose* (e.g. `(MIT OR Apache-2.0)` → pick the one in your allowlist and record the choice). SPDX `AND` means *both apply* (you must satisfy every obligation). For a dep with **no license**, default-deny: open an issue, contact the maintainer, or remove it — "no license" means all-rights-reserved, not free. Pin every resolution and exception in `deny.toml`/policy with a one-line rationale so the next audit doesn't relitigate it.
+5. **Generate NOTICE / THIRD-PARTY-LICENSES attribution from the same scan.** Permissive licenses (MIT/BSD/Apache) require you to reproduce their copyright + license text in distributed artifacts; Apache-2.0 also requires propagating any upstream `NOTICE`. Auto-generate, don't hand-maintain:
+   ```bash
+   npx oss-attribution-generator ./   # → oss-attribution/attribution.txt
+   # or: license-checker-rseidelsohn --production --customPath fields.json --files THIRD-PARTY-LICENSES/
+   pip-licenses --format=plain-vertical --with-license-file --no-license-path \
+     --output-file THIRD-PARTY-LICENSES.txt
+   go-licenses save ./... --save_path=THIRD-PARTY-LICENSES/
+   ```
+   Ship `THIRD-PARTY-LICENSES.txt` (or `NOTICE`) inside the artifact — in the package tarball, the container image, the app's "Licenses" screen, or `/licenses`. Regenerate it in CI from the locked tree and **diff against the committed copy** so a new dep can't slip in without its attribution.
+6. **Wire both gates into one CI job.** Policy check (step 3) + attribution drift check (step 5) run on every PR and on the release tag. Fail the build if a denied/unknown license appears OR the generated NOTICE differs from the committed one. This is what makes the audit durable instead of a one-time spreadsheet.
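The classification and fail-closed policy described in steps 2 to 4 can be sketched in plain JavaScript. This is a minimal illustration, not the skill's own implementation: the names `LICENSE_CLASS`, `VERDICT`, `verdictForExpression`, and `gate` are hypothetical, the table covers only a subset of SPDX IDs, the `review` verdicts for weak copyleft are a simplification of the table's conditional cells, and the input shape (`{ "name@version": { licenses: "MIT" } }`) is an assumption loosely modeled on license-checker-style JSON. Only flat `OR`/`AND` expressions are handled; anything else (nested parentheses, `WITH` exceptions, arrays) fails closed.

```javascript
// Hypothetical policy table: license class per SPDX ID (subset only).
const LICENSE_CLASS = {
  'MIT': 'permissive', 'ISC': 'permissive', 'BSD-2-Clause': 'permissive',
  'BSD-3-Clause': 'permissive', 'Apache-2.0': 'permissive', '0BSD': 'permissive',
  'MPL-2.0': 'weak-file', 'EPL-2.0': 'weak-file',
  'LGPL-2.1': 'weak-lib', 'LGPL-3.0': 'weak-lib',
  'GPL-2.0': 'strong', 'GPL-3.0': 'strong',
  'AGPL-3.0': 'network',
  'SSPL-1.0': 'non-oss', 'BUSL-1.1': 'non-oss', 'Elastic-2.0': 'non-oss',
};

// Verdict per class for each distribution model: 'saas' | 'binary' | 'library'.
const VERDICT = {
  'permissive': { saas: 'allow', binary: 'allow', library: 'allow' },
  'weak-file': { saas: 'allow', binary: 'review', library: 'review' },
  'weak-lib': { saas: 'allow', binary: 'review', library: 'review' },
  'strong': { saas: 'allow', binary: 'deny', library: 'deny' },
  'network': { saas: 'deny', binary: 'deny', library: 'deny' },
  'non-oss': { saas: 'deny', binary: 'deny', library: 'deny' },
};

const RANK = { allow: 0, review: 1, deny: 2 };

function verdictForId(id, model) {
  // Treat "GPL-3.0-only" / "GPL-3.0-or-later" like the base ID for this sketch.
  const base = id.trim().replace(/-(only|or-later)$/, '');
  const cls = LICENSE_CLASS[base];
  return cls ? VERDICT[cls][model] : 'deny'; // unknown ID fails closed
}

function verdictForExpression(expr, model) {
  if (typeof expr !== 'string' || expr.trim() === '') return 'deny';
  const disjuncts = expr.replace(/[()]/g, ' ').split(/\s+OR\s+/);
  const verdicts = disjuncts.map((d) => {
    // AND: every obligation applies, so the worst verdict wins.
    const parts = d.trim().split(/\s+AND\s+/).map((id) => verdictForId(id, model));
    return parts.reduce((worst, v) => (RANK[v] > RANK[worst] ? v : worst), 'allow');
  });
  // OR: you elect one side, so the best verdict wins.
  return verdicts.reduce((best, v) => (RANK[v] < RANK[best] ? v : best), 'deny');
}

function gate(licenses, model) {
  const failures = [];
  for (const [pkg, info] of Object.entries(licenses)) {
    const verdict = verdictForExpression(info.licenses, model);
    if (verdict !== 'allow') failures.push({ pkg, license: info.licenses ?? null, verdict });
  }
  return failures; // a CI wrapper would exit non-zero when this is non-empty
}
```

Under these assumptions, `verdictForExpression('(MIT OR GPL-3.0)', 'binary')` yields `allow` because the MIT side can be elected, while `verdictForExpression('AGPL-3.0', 'saas')` yields `deny`. The election itself still has to be recorded in the policy file, as step 4 requires.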
+## Common Errors
+- **Scanning direct deps only.** The GPL/AGPL usually arrives several levels deep. Always resolve the full transitive tree; `--production` only drops dev dependencies, it still walks everything the runtime deps pull in.
+- **Including devDependencies in the distributed verdict.** A GPL linter or test runner isn't distributed and isn't a violation — it just floods the report and gets the gate disabled. Scope to runtime/prod.
+- **Trusting the package `license` field over the actual files.** Metadata is frequently wrong, stale, or `SEE LICENSE IN ...`. When it matters, let `scancode` read the real LICENSE/headers; that's ground truth.
+- **Treating AGPL like GPL for a SaaS.** AGPL's network clause means serving it over HTTP *is* conveying — source must be offered to every user. Deny AGPL even when GPL would be fine for your server-only model.
+- **Static-linking an LGPL library into a shipped binary.** That triggers the relink obligation (users must be able to swap the lib). Dynamic-link it, or treat it as deny for static builds.
+- **Auto-picking a side of a dual license silently.** `(GPL-2.0 OR MIT)` is only safe because *you* elect MIT — record the election. If the tool defaulted to GPL, you may have manufactured an obligation that didn't exist.
+- **"No license" read as permissive.** Absence of a license = all rights reserved = you have no grant to use it. Default-deny and resolve, don't ship.
+- **Allowlist that fails *open* on unknowns.** A typo SPDX id or low-confidence match must fail the build, not pass. Set `confidence-threshold` and deny-by-default.
+- **Hand-maintaining NOTICE.** It drifts the moment a transitive dep changes. Generate it from the lockfile in CI and diff against the committed copy.
+- **Confusing source-available with open-source.** SSPL/BUSL/Elastic-2.0/"Commons Clause" are usage-restricted and not OSI-approved — deny them unless legal explicitly cleared the specific use.
+## Verify
+1. **Coverage:** The scan output lists *transitive* deps, not just the handful in `package.json`/`Cargo.toml`. Spot-check a known deep dep appears with a license.
+2. **Policy gate fires (positive control):** Add a dep with a denied license (e.g. a GPL-3.0 package) on a throwaway branch → CI **fails** with a message naming the dep and its license. Revert.
+3. **Unknown fails closed:** Point the scanner at a dep with a stripped/garbled license → it is reported as unknown and the gate **fails**, not passes.
+4. **Distribution-model correctness:** Re-run classification under the other model (flip SaaS↔distributed) and confirm AGPL/LGPL/GPL verdicts change as the table predicts — proves the model is actually driving the verdict, not hardcoded.
+5. **Attribution completeness:** Every distributed (prod) dependency in the lockfile appears in `THIRD-PARTY-LICENSES`/`NOTICE` with its license text. Count of attributed deps == count of distributed deps; no entry is empty.
+6. **Attribution drift gate:** Add a prod dep without regenerating → CI's NOTICE-diff check **fails**. Regenerate → it passes and the new dep+license is present.
+7. **Dual/missing resolved:** No dep is left with an unresolved `OR` expression or empty license; each exception in the policy has a written rationale.
+Done = the scan covers the full transitive prod tree, CI blocks a newly added denied-license **and** an unknown-license dep (verified by positive controls), every distributed dependency is listed with its license in the committed NOTICE, and the NOTICE-drift gate fails when that list goes stale.
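The attribution completeness check (Verify item 5) can be sketched as a small comparison between the locked production dependency list and the generated attribution entries. This is a hedged illustration: `checkAttribution` and both input shapes (an array of `name@version` strings and an object mapping `name@version` to license text) are hypothetical, not the output format of any particular tool.

```javascript
// prodDeps: array of "name@version" strings from the locked production tree.
// attribution: object mapping "name@version" to license text (assumed shape).
function checkAttribution(prodDeps, attribution) {
  const has = (dep) => Object.prototype.hasOwnProperty.call(attribution, dep);
  // Distributed deps with no attribution entry at all.
  const missing = prodDeps.filter((dep) => !has(dep));
  // Entries that exist but carry no license text.
  const empty = prodDeps.filter((dep) => has(dep) && String(attribution[dep]).trim() === '');
  // Attributed packages that are no longer in the production tree (stale NOTICE).
  const extra = Object.keys(attribution).filter((dep) => !prodDeps.includes(dep));
  const ok = missing.length === 0 && empty.length === 0 && extra.length === 0;
  return { ok, missing, empty, extra };
}
```

When `ok` is true, the count of attributed deps equals the count of distributed deps and no entry is empty, which is the condition the Verify list asks for. A CI job could fail on `!ok` and print the three lists.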

package/skills/author-codemod/SKILL.md ADDED

@@ -0,0 +1,110 @@
+---
+name: author-codemod
+description: Writes, fixture-tests, and runs codebase-wide automated transforms (codemods) that parse source to an AST and rewrite nodes via grammar-aware tools (jscodeshift/ts-morph, ast-grep, Comby, libcst/Bowler, OpenRewrite), dry-running before applying one mechanical change across many files. Use when a structural edit must hit many call sites reliably and find-replace would mangle strings, comments, or shadowed names.
+when_to_use: One mechanical change must land across many files — rename/move an exported API across all call sites, swap a deprecated call, rewrite an import path or signature, migrate an idiom (callbacks→async, class→hooks). Distinct from refactor-cleanup (judgment-driven edits in a few files), dependency-upgrade (bumping versions), and regex-build (a single pattern, not a tree transform).
+---
+## When to Use
+Reach for this skill when the **same structural edit must hit many files** and correctness depends on understanding the code's grammar, not its text:
+- "Rename `getUser` → `fetchUser` everywhere, including imports/exports/JSX, but not the string `"getUser"` in logs"
+- "Replace every `moment(x)` with `dayjs(x)` across the repo"
+- "Change the import path `@old/pkg` → `@new/pkg` and rewrite the now-renamed named exports"
+- "Migrate all `React.Component` classes to function components / hooks"
+- "Add `await` to every call site of a function that became async"
+- "Swap deprecated `assert.equal` → `assert.strictEqual` repo-wide"
+NOT this skill:
+- A judgment-heavy cleanup in 1–3 files where each edit is a decision, not a rule → refactor-cleanup
+- Bumping a package version and fixing the fallout it causes → dependency-upgrade (a codemod may be a *step* inside it)
+- One search/replace pattern over text where a tree isn't needed → regex-build
+- Reviewing the resulting diff for bugs → code-review
+- Schema/data changes in a running DB → db-migration-safety
+## Steps
+1. **Pick the tool by language and edit shape — don't reach for `sed`.** A grammar-aware tool refuses to touch strings, comments, and shadowed bindings; text tools can't tell them apart.
+   | Language / scope | Tool | Reach for it when |
+   |---|---|---|
+   | JS/TS, complex semantic rewrite | **jscodeshift** (Babel/recast AST) | rename across imports/exports/JSX, signature changes; preserves formatting via recast |
+   | TS, type-aware, follow references | **ts-morph** | needs the type checker — find *all* references to a symbol, not name matches |
+   | JS/TS/Go/Python/Rust, pattern→pattern | **ast-grep** (`sg`) | declarative `pattern:`/`fix:` rules in YAML (or `-p`/`-r` on the CLI), polyglot, fast, no JS to write |
+   | Any language, structural find/replace | **Comby** | lightweight `:[hole]` matchers when you don't want a full AST pass |
+   | Python | **libcst** (or **Bowler** for simple cases) | preserves comments + formatting; libcst for scope- and metadata-aware rewrites, Bowler for fluent one-liners |
+   | Java/Kotlin | **OpenRewrite** (recipes) | type-aware recipes, dependency + API migrations at scale |
+   Default: **ast-grep** for a clean pattern→pattern swap in any language; **jscodeshift** when the JS/TS rewrite needs real logic; **ts-morph** when you must follow type references, not text.
+2. **Characterize on 2–3 representative files first, then enumerate edge cases.** Open real call sites and write down what must and must NOT change. Hostile cases that break naive transforms:
+   - **Shadowing** — a local `const getUser = ...` that is *not* the imported symbol.
+   - **Re-exports / aliases** — `export { getUser as gu }`, `import { getUser as g }`.
+   - **Dynamic access** — `obj["getUser"]`, computed members, reflection — usually out of AST reach; list them as manual tail.
+   - **Strings & comments** — must be left alone unless explicitly targeted.
+   - **Formatting** — preserve it (recast/libcst/ts-morph do; a naive print does not).
+   - **Partial matches** — `getUserById` must not be caught by a `getUser` rule.
+3. **Write the transform as code and test it on fixtures before touching the tree.** The codemod is software — assert before/after so a bad rule can't run silently. jscodeshift skeleton:
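+   ```js
+   // rename-getUser.js: jscodeshift transform
+   module.exports = function (file, api) {
+     const j = api.jscodeshift;
+     const root = j(file.source);
+     // only rename the IMPORTED binding; an `as` alias keeps its local name
+     root.find(j.ImportSpecifier, { imported: { name: 'getUser' } })
+       .forEach((p) => {
+         const aliased = p.node.local && p.node.local.name !== 'getUser';
+         if (!aliased) {
+           root.find(j.Identifier, { name: 'getUser' })
+             .filter((id) => {
+               const parent = id.parent.node;
+               if (parent.type === 'ImportSpecifier') return false;   // renamed below
+               // `obj.getUser` is a property name, not a reference to the binding
+               if (parent.type === 'MemberExpression' && parent.property === id.node && !parent.computed) return false;
+               // shadowed locals resolve to an inner scope, so skip them
+               const scope = id.scope && id.scope.lookup('getUser');
+               return Boolean(scope && scope.isGlobal);
+             })
+             .forEach((id) => { id.node.name = 'fetchUser'; });
+           if (p.node.local) p.node.local.name = 'fetchUser';
+         }
+         p.node.imported.name = 'fetchUser';   // `getUser as g` becomes `fetchUser as g`
+       });
+     return root.toSource({ quote: 'single' });
+   };
+   ```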
+   ```js
+   // rename-getUser.test.js — the codemod itself is under test
+   const { applyTransform } = require('jscodeshift/dist/testUtils');
+   const t = require('./rename-getUser');
+   test('renames imported symbol, not the shadow or the string', () => {
+     expect(applyTransform(t, {}, { source:
+       `import { getUser } from './api';\nconst x = getUser(1);\nlog("getUser");` }))
+     .toBe(
+       `import { fetchUser } from './api';\nconst x = fetchUser(1);\nlog("getUser");`);
+   });
+   ```
+   ast-grep equivalent needs no test harness but still gets a fixture diff: `sg run -p 'getUser($$$ARGS)' -r 'fetchUser($$$ARGS)' --lang ts fixtures/` (without `--update-all`/`-U`, ast-grep only prints the proposed diff).
+4. **Dry-run across the whole tree; review stats and sampled hunks; tune to zero false hits.** Never let the first run write. jscodeshift `--dry --print` (or `-d -p`), ast-grep without `--update-all`/`-U`, Comby without `-i`, libcst's codemod runner with `--unified-diff`. Inspect the **count** of changed files and read a random sample of hunks. If any touch a string, comment, or shadow, fix the rule and re-dry-run. Iterate until the only diff is the intended one.
+5. **Apply, then immediately run the toolchain — the codemod's output is unverified until the build agrees.** Run formatter → linter → typecheck → tests in that order: `prettier --write` / `gofmt`, then `eslint --fix`, then `tsc --noEmit`, then the suite. A green typecheck is the strongest signal a JS/TS rename hit every real reference; a red one points straight at a missed or over-eager site.
+6. **Handle the long tail manually and document it.** Dynamic access, generated code, vendored files, and cross-package boundaries the AST can't follow stay broken — `grep` the old name to find survivors, fix them by hand, and note in the commit what the codemod could not reach and why.
+7. **Commit the transform script alongside its diff.** Check in the codemod file + its fixture test in the same commit as the generated changes, with the exact command in the message. The change becomes reproducible, reviewable as a rule (not N edits), and re-runnable when stragglers appear.
+## Common Errors
+- **`sed`/find-replace across a grammar.** Rewrites the symbol inside strings, comments, and shadowed locals. Use an AST tool that distinguishes node kinds.
+- **Matching the name, not the binding.** A bare `Identifier` rename also hits an unrelated local `getUser`. Anchor to the import/declaration and follow its references (ts-morph `findReferences`, or trace the local name as in step 3).
+- **Ignoring `as` aliases and re-exports.** `import { getUser as g }` renames nothing if you only look for `getUser` identifiers. Read `local` vs `imported`; rewrite both the export and its consumers.
+- **Substring false positives.** A `getUser` text rule mangles `getUserById`/`getUsers`. AST identifier matching is exact; text rules need word boundaries and review.
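+- **Regenerating the file instead of patching the AST.** Printing the whole tree through a plain code generator reflows every file and buries the real change in a huge noise diff. Use a format-preserving printer: recast (what jscodeshift's `toSource()` uses by default), libcst, or ts-morph, so untouched code keeps its formatting.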
+- **No fixture test on the codemod.** A subtle rule bug silently corrupts hundreds of files. Assert before/after on representative inputs *before* the tree run.
+- **Skipping the dry-run.** First run writes; you discover the over-match after it's everywhere. Always dry-run, read sampled hunks, then apply.
+- **Trusting a clean codemod run without a typecheck.** "0 errors" from the codemod only means the transform did not crash. Run `tsc`/tests afterwards, and because even a green typecheck can miss dynamic call sites, `grep` the old name for survivors.
+- **Letting the formatter mask a bad transform.** Running `prettier` first can hide a structural mistake behind reformatting. Diff the raw codemod output, *then* format.
+- **One mega-transform doing five things.** Unreviewable and unrevertable. One codemod = one rule; chain separate scripts.
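The first and fourth errors above (text replacement hitting strings and comments, and substring false positives) can be shown with a small token-aware rename. This is a sketch in Python operating on Python source, using the standard-library `tokenize` module as a stand-in for a real AST tool; `rename_name_tokens` is a hypothetical helper. It is token-level only, so it does not solve the binding problem: a shadowed local with the same name would still be renamed.

```python
import io
import tokenize


def rename_name_tokens(source, old, new):
    """Rename identifier tokens equal to `old`, leaving strings and comments alone.

    Token-level only: it does NOT resolve bindings, so a shadowed local
    with the same name would still be renamed.
    """
    hits = [
        tok.start  # (row, col); row is 1-based, col is 0-based
        for tok in tokenize.generate_tokens(io.StringIO(source).readline)
        if tok.type == tokenize.NAME and tok.string == old
    ]
    lines = source.splitlines(keepends=True)
    # Patch right-to-left so earlier column offsets on a line stay valid.
    for row, col in sorted(hits, reverse=True):
        line = lines[row - 1]
        lines[row - 1] = line[:col] + new + line[col + len(old):]
    return "".join(lines), len(hits)


before = (
    "user = getUser(1)\n"
    'msg = "call getUser here"  # getUser is documented elsewhere\n'
    "other = getUserById(2)\n"
)
after, count = rename_name_tokens(before, "getUser", "fetchUser")
```

Because the edit is spliced into the original text at the token's position, every untouched character keeps its formatting, which is the same property the format-preserving printers give a real codemod.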
+## Verify
+1. **Fixture tests pass:** the codemod's own before/after assertions are green, including a shadow case, an alias/re-export case, and a string-that-must-not-change case.
+2. **Dry-run is clean:** the dry-run diff contains only intended hunks — zero edits inside strings, comments, or shadowed locals (confirm by reading a random sample, not just the count).
+3. **No stragglers:** `grep -rn '\bgetUser\b'` (old name, word-bounded) over the tree returns only the documented manual tail — nothing the codemod should have caught.
+4. **Toolchain agrees:** formatter clean, linter clean, `tsc --noEmit` (or language equivalent) exits 0, full test suite passes on the transformed tree.
+5. **Diff is minimal:** changed lines are the transform's effect only — no incidental reformatting of untouched code.
+6. **Reproducible:** the codemod script + fixture test are committed with the diff and the exact run command; re-running the codemod on the result is a no-op.
+Done = fixture tests + dry-run are clean, the post-apply toolchain (format/lint/typecheck/tests) is fully green, no undocumented stragglers remain, and the transform script ships in the same commit so the change re-runs deterministically.
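The straggler check in Verify item 3 can also be scripted so it runs in CI. The sketch below is one possible shape, assuming the documented manual tail is kept as a set of relative paths; `find_stragglers` and the file names are hypothetical.

```python
import re
import tempfile
from pathlib import Path


def find_stragglers(root, old_name, documented=frozenset()):
    """Return (relative_path, line_no, line) for word-bounded hits of `old_name`.

    `documented` holds relative paths already listed as the manual tail.
    """
    pattern = re.compile(rf"\b{re.escape(old_name)}\b")
    root = Path(root)
    hits = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        rel = path.relative_to(root).as_posix()
        if rel in documented:
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # skip binaries and unreadable files
        for line_no, line in enumerate(text.splitlines(), start=1):
            if pattern.search(line):
                hits.append((rel, line_no, line.strip()))
    return hits


with tempfile.TemporaryDirectory() as tmp:
    tree = Path(tmp)
    (tree / "app.js").write_text("const u = fetchUser(1);\n", encoding="utf-8")
    (tree / "legacy.js").write_text("api['getUser'](1);\n", encoding="utf-8")
    (tree / "byid.js").write_text("getUserById(2);\n", encoding="utf-8")
    all_hits = find_stragglers(tree, "getUser")
    undocumented = find_stragglers(tree, "getUser", documented={"legacy.js"})
```

The dynamic access in `legacy.js` is exactly the kind of site an AST rename cannot follow; the check passes only once that file is recorded as part of the manual tail.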

package/skills/build-audit-logging/SKILL.md ADDED

@@ -0,0 +1,112 @@
+---
+name: build-audit-logging
+description: Builds tamper-evident audit logging — structured actor/action/target/result records for security-relevant events, append-only hash-chained or WORM/object-lock storage, PII-safe payloads that log references not raw data, and regulation-driven retention — to satisfy SOC2/HIPAA-style controls and support incident forensics.
+when_to_use: A system needs a defensible, queryable record of sensitive actions (access, permission/config changes, admin ops) for compliance or forensics. Distinct from observability-instrument (operational logs/metrics/traces for debugging) and map-privacy-data-gdpr (data-subject rights and lawful-basis mapping).
+---
+## When to Use
+Reach for this skill when the requirement is a **defensible record of who did what to whom**, not operational telemetry:
+- "We need an audit trail for SOC2 / HIPAA / PCI — access, admin actions, config changes"
+- "Auditors want to know who changed this permission / exported this report / read this patient record"
+- "After the breach, prove what the attacker touched and that nobody edited the logs"
+- "Log every admin override / impersonation / data export, immutably"
+- "Make sensitive-action history queryable for investigations and legal hold"
+NOT this skill:
+- Debugging latency/errors with logs, metrics, traces, dashboards → observability-instrument (operational, sampled, short-retention — the opposite of an audit log)
+- Data-subject access/erasure requests, consent, lawful basis, retention *policy* for personal data → map-privacy-data-gdpr
+- *Deciding* whether an action is allowed (the policy engine itself) → design-authorization-model (audit logging records the decision; it does not make it)
+- An immutable append-only store as the system of record for business state (rebuildable projections) → design-event-sourcing-cqrs
+- Storing/rotating the secrets and signing keys this log references → secrets-management
+- Running the actual breach investigation/postmortem → incident-response-sre (this skill makes that investigation *possible*)
+## Steps
+1. **Enumerate auditable events first — code to a closed list, not "log everything."** An audit log with too much noise is as useless as one with gaps. Audit exactly the security-relevant control points:
+   | Category | Examples | SOC2 (TSC) | HIPAA |
+   |---|---|---|---|
+   | Authentication | login success/fail, MFA, logout, password/key change, session revoke | CC6.1 | §164.312(b) |
+   | Authorization decisions | access denied, privilege grant/revoke, role change, impersonation start/stop | CC6.3 | §164.308(a)(4) |
+   | Sensitive data access | read/export/print of PII/PHI/financial records, bulk query, report download | CC6.1 / CC7.2 | §164.312(b) audit controls |
+   | Config / security changes | feature flag, retention policy, encryption setting, integration/webhook, IAM policy | CC8.1 | §164.308(a)(1) |
+   | Admin / break-glass ops | user delete, data purge, override, prod DB access, support impersonation | CC6.1 | §164.308(a)(3) |
+   Define this list with security/compliance, not ad hoc per feature. Each event gets a stable `action` constant (e.g. `user.role.granted`, `record.exported`) — never a free-text string you can't query or version.
+2. **Fix one structured schema and emit it everywhere.** Required fields, machine-parseable (JSON), one shape across services:
+   ```jsonc
+   {
+     "id": "01J8...ULID",                  // unique, sortable, dedup key
+     "ts": "2026-06-15T09:41:02.117Z",     // UTC, ISO-8601, ms precision, server clock
+     "action": "record.exported",          // from the closed list, dotted, versioned
+     "actor": { "type": "user", "id": "u_8821", "auth": "session", "on_behalf_of": "support_agent_31" },
+     "target": { "type": "patient_record", "id": "p_5567" },   // id/reference ONLY
+     "result": "allow",                     // allow | deny | error
+     "reason": "policy:export.phi.granted", // why, esp. for deny
+     "source_ip": "203.0.113.7",            // normalized from trusted proxy header
+     "user_agent": "...",
+     "request_id": "req_9f3c",              // correlation id → ties to app/trace logs
+     "tenant": "org_204",
+     "meta": { "row_count": 1420, "format": "csv" }  // counts/refs, NEVER raw payload
+   }
+   ```
+   Use **ULID/UUIDv7** for `id` (sortable + a natural dedup key for at-least-once emitters). `on_behalf_of` is mandatory whenever an admin/support acts as another user — impersonation without it is an audit gap auditors will flag.
+3. **Keep the audit log physically separate from application logs.** Different store, different write credentials, different retention. App logs are mutable, sampled, debug-grade; audit logs are not. Mixing them means a developer with log-write access can forge or delete audit history. Ship audit events to a **dedicated append-only sink** (dedicated Postgres table with revoked UPDATE/DELETE, a WORM object store, or a managed audit service) — never the same index/bucket as `console.log` output.
+4. **Make tamper-evidence structural, not a promise. Pick by threat model:**
+   | Mechanism | Detects | Use when | Cost |
+   |---|---|---|---|
+   | **Hash chain** (each row stores `hash(prev_hash + entry)`) | any edit/delete/reorder of past rows | default — works in any DB, cheap, verifiable offline | 1 hash/write + periodic verify job |
+   | **WORM / object-lock** (S3 Object Lock COMPLIANCE, GCS retention lock) | deletion/overwrite before retention expiry, even by root | regulated retention, untrusted operators | storage + immutable retention window |
+   | **Digital signature** (HSM/KMS signs each entry or batch) | forgery + proves origin/non-repudiation | strict non-repudiation, third-party verifier | KMS calls, key mgmt |
+   | **External anchoring** (periodic chain-head to a notary/transparency log) | insider with full DB+app access rewriting the whole chain | high-value targets, hostile-insider model | scheduled external write |
+   **Default: hash chain + WORM storage.** The chain proves *no row was altered*; object-lock proves *no row was deleted*. Hash chain alone doesn't stop a truncate-and-rebuild by someone with full write access — pair it with object-lock or external anchoring for that threat. Restrict write access to an **append-only path** (DB role with `INSERT` only; bucket policy allowing `PutObject` but not `DeleteObject`/overwrite); **nobody — including the app service account — gets row-level update/delete.**
+5. **Never log secrets or raw PII/PHI — log references and minimized metadata.** The audit log is high-value, long-retention, and widely readable by auditors; a raw payload in it is a second copy of your most sensitive data with the worst blast radius. Log the *id* of the record touched, not its contents. For changes, log a **field-name diff** (`changed: ["role","status"]`) or hashed before/after, never the literal old/new PII values. Run a serializer allowlist + a secret/PII scrubber on the `meta` object before write; drop anything not on the allowlist. Tokens, passwords, full card/SSN, message bodies, query result rows → never.
+6. **Set retention per regulation, then enforce it in the store — don't rely on a cron `DELETE`.** Map each event category to its longest applicable requirement and configure the immutable window so deletion *can't* happen early and *does* happen on schedule:
+   | Regime | Typical minimum | Enforce via |
+   |---|---|---|
+   | HIPAA | 6 years | object-lock retention 6y + lifecycle expiry |
+   | SOC2 | no fixed mandate; 1 year is typical (often 7 for evidence) | partition + lifecycle policy |
+   | PCI-DSS | 1 year (3 months hot) | hot tier + cold archive |
+   Use **time-partitioned tables or object lifecycle rules** so expiry is declarative and audited, not a script someone can disable. Don't over-retain past the requirement (that's its own liability under privacy law — see map-privacy-data-gdpr).
+7. **Make it queryable for investigations.** A trail you can't search is forensically useless. Index `actor.id`, `target.id`, `action`, `ts`, `tenant`, `request_id`. The two queries every investigation needs: *"everything actor X did in window T"* and *"everyone who touched target Y."* Tie `request_id` back to operational traces (observability-instrument owns those) so an investigator can pivot from an audit entry to the full request. Provide a read-only investigator role separate from the write path.
+8. **Emit exactly once, synchronously to the decision, fail-closed on sensitive actions.** Write the audit record **in the same transaction/critical path as the action it records** (or via a transactional outbox) so an action can never succeed without its record. For sensitive control points (data export, permission grant, break-glass), if the audit write fails, **deny the action** — an unlogged privileged action is worse than a blocked one. Dedup downstream consumers on `id`. Never fire-and-forget an audit write for a security-critical event.
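Step 8's "same transaction, fail-closed" rule can be sketched with SQLite from the Python standard library. This is an illustration under assumptions, not a production design: the table names, `grant_role`, and the `audit_ok` fault-injection flag are hypothetical, and the audit table here shares a database with the action only to demonstrate atomicity (a separate audit store would need the transactional outbox the step mentions).

```python
import sqlite3


class AuditWriteError(Exception):
    """Raised when the audit record cannot be written."""


def grant_role(db, actor_id, user_id, role, audit_ok=True):
    """Grant a role and record it in ONE transaction; deny if the audit write fails."""
    try:
        with db:  # commits on success, rolls back on any exception
            db.execute(
                "INSERT INTO user_roles (user_id, role) VALUES (?, ?)",
                (user_id, role),
            )
            if not audit_ok:  # hypothetical fault injection for the demo
                raise AuditWriteError("audit sink unavailable")
            db.execute(
                "INSERT INTO audit_log (action, actor_id, target_id, result)"
                " VALUES ('user.role.granted', ?, ?, 'allow')",
                (actor_id, user_id),
            )
        return True
    except AuditWriteError:
        return False  # fail-closed: the grant was rolled back with the audit row


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE user_roles (user_id TEXT, role TEXT)")
db.execute(
    "CREATE TABLE audit_log (action TEXT, actor_id TEXT, target_id TEXT, result TEXT)"
)

ok = grant_role(db, "u_1", "u_2", "admin")
denied = grant_role(db, "u_1", "u_3", "admin", audit_ok=False)
roles = db.execute("SELECT user_id FROM user_roles").fetchall()
audits = db.execute("SELECT target_id FROM audit_log").fetchall()
```

The second call leaves no trace in either table: the privileged action and its record succeed or fail together, so there is never a grant without an audit row.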
+## Common Errors
+- **Audit log shares the store/credentials with app logs.** Anyone who can write debug logs can then forge or wipe audit history. Separate store, separate INSERT-only credential, separate retention.
+- **Logging raw PII/PHI or secrets in the payload.** Creates a long-retention, broadly-read second copy of your crown jewels. Log ids and field-name diffs; scrub `meta` against an allowlist before write.
+- **"Append-only" that the app account can still UPDATE/DELETE.** That's not append-only. Revoke update/delete at the DB-role / bucket-policy level; verify with an attempted delete that must fail.
+- **Hash chain with no verification job.** An undetected break = no tamper evidence at all. Run a scheduled verifier that recomputes the chain and alerts on the first mismatch; anchor the chain head externally if insiders are in scope.
+- **Async fire-and-forget emit.** The action commits, the audit write is dropped on a queue overflow or crash, and you have a silent gap. Write in-transaction or via outbox; fail-closed for sensitive actions.
+- **Free-text `action` strings.** `"User exported the data"` can't be queried, aggregated, or mapped to a control. Use a versioned closed enum.
+- **Trusting client-supplied `X-Forwarded-For` / actor id.** Both are spoofable. Take `source_ip` only from the header your trusted proxy sets; take `actor.id` from the authenticated session, never from the request body.
+- **Missing impersonation provenance.** Support acts "as" a user and the log shows only the end user — auditors flag this as a control gap. Always populate `on_behalf_of`.
+- **Cron-job retention instead of store-enforced.** A disabled or buggy cron either leaks data forever or deletes evidence early. Use object-lock / partition lifecycle so the store enforces it.
+- **No timezone discipline.** Mixed local timestamps make a forensic timeline unreconstructable. UTC + ISO-8601 + ms, server clock, everywhere.
+- **Recording allows but dropping denies.** Auditors and investigators care most about blocked attempts. Record `result: "deny"` with `reason`, not just successful actions.
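The allowlist-plus-scrubber guard on `meta` (step 5, and the raw-PII error above) might look like the following sketch. The allowlist contents and the two regular expressions are illustrative assumptions only; a real deployment needs a reviewed allowlist and a much broader pattern set, and this version scans only top-level string values.

```python
import re

# Hypothetical allowlist: only these meta keys may reach the audit store.
META_ALLOWLIST = {"row_count", "format", "changed"}

# Illustrative patterns only; not a complete secret/PII detector.
SECRET_PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),      # US SSN shape
    re.compile(r"(?i)bearer\s+[a-z0-9._-]+"),  # bearer token
]


def scrub_meta(meta):
    """Drop keys not on the allowlist, then drop string values that look sensitive."""
    clean = {}
    for key, value in meta.items():
        if key not in META_ALLOWLIST:
            continue  # default-deny: unknown keys never reach the log
        if isinstance(value, str) and any(p.search(value) for p in SECRET_PATTERNS):
            continue  # allowlisted key, but the value looks like a secret
        clean[key] = value
    return clean


cleaned = scrub_meta(
    {
        "row_count": 1420,
        "format": "csv",
        "changed": ["role", "status"],
        "email": "a@example.com",        # not allowlisted, dropped
        "query_rows": [["Jane", "Doe"]],  # not allowlisted, dropped
    }
)
leaky = scrub_meta({"format": "Bearer abc.def-123"})
```

The allowlist does most of the work; the pattern pass is a second line of defense for the case where a sensitive value is stuffed into a permitted field.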
+## Verify
+1. **Exactly-once coverage:** For each event in the closed list, perform the action and confirm **one** audit record is written with all required fields populated; perform a sensitive action whose audit write is forced to fail and confirm the action is **denied** (fail-closed), not silently completed.
+2. **Tamper detection:** Directly mutate one stored row (or delete one), run the chain verifier → it flags the exact broken entry. Re-run on the untouched log → clean. This is the test that proves the tamper-evidence is real, not decorative.
+3. **Immutability of the write path:** As the **application service account**, attempt `UPDATE`/`DELETE` on an audit row (and overwrite/delete on the object store) → both must be rejected by the role/bucket policy. Only `INSERT`/`PutObject` succeeds.
+4. **No leakage:** Trigger actions involving secrets and PII/PHI (export a record, change a password, edit a profile), then grep the stored audit entries for the raw secret, the password, and the literal PII values → **zero hits**; only ids, field names, and counts appear.
+5. **Retention enforced by the store:** Confirm the object-lock/partition policy is configured for the regulated window and that no role (including admin/root) can delete before expiry; confirm entries past the window expire automatically without a manual job.
+6. **Investigation queries:** Run *"all actions by actor X in window T"* and *"all actors who touched target Y"* → both return correct, complete results in interactive time on indexed fields, and a `request_id` pivots to the matching operational trace.
+7. **Provenance:** An impersonated action shows both the acting agent and `on_behalf_of`; a denied action shows `result: "deny"` + `reason`; `source_ip` matches the trusted-proxy value, not a spoofed body field.
+Done = every event in the closed list emits exactly one complete record on a physically separate, INSERT-only, retention-locked store; the chain verifier detects any edit/delete; no secret or raw PII/PHI appears in any entry; and the two core investigation queries return complete, correct results mapped to their SOC2/HIPAA controls.
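The hash chain from step 4 and the tamper-detection check in Verify item 2 can be sketched in a few lines. This is a minimal in-memory illustration, assuming SHA-256 over canonical JSON and an all-zero genesis value; `entry_hash`, `append`, `verify`, and `build` are hypothetical names, and a real store would persist `prev_hash` and `hash` as columns next to each entry.

```python
import hashlib
import json

GENESIS = "0" * 64  # prev_hash used for the first entry


def entry_hash(prev_hash, entry):
    # Canonical JSON (sorted keys, fixed separators) keeps the hash reproducible.
    payload = json.dumps(entry, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append(chain, entry):
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    chain.append(
        {"entry": entry, "prev_hash": prev_hash, "hash": entry_hash(prev_hash, entry)}
    )


def verify(chain):
    """Return the index of the first broken row, or None if the chain is intact."""
    prev_hash = GENESIS
    for i, row in enumerate(chain):
        expected = entry_hash(prev_hash, row["entry"])
        if row["prev_hash"] != prev_hash or row["hash"] != expected:
            return i
        prev_hash = row["hash"]
    return None


def build():
    chain = []
    for action in ("user.login", "record.exported", "user.role.granted"):
        append(chain, {"action": action, "result": "allow"})
    return chain


intact = build()
edited = build()
edited[1]["entry"]["result"] = "deny"  # in-place edit of a past row
deleted = build()
del deleted[1]                         # removal of a past row
results = (verify(intact), verify(edited), verify(deleted))
```

Note what this sketch cannot see: dropping rows from the tail, or rebuilding the whole chain from scratch, still verifies cleanly. That gap is why the default pairs the chain with object-lock storage or an externally anchored chain head.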

package/skills/build-cdc-streaming-pipeline/SKILL.md ADDED

@@ -0,0 +1,123 @@
+---
+name: build-cdc-streaming-pipeline
+description: Designs change-data-capture and streaming pipelines — log-based CDC off a DB transaction log (Debezium/WAL/binlog), topic-per-table fan-out onto Kafka/Kinesis, consumer-group/offset/rebalance correctness, windowed/stateful stream processing with watermarks, exactly-once vs at-least-once-plus-idempotent delivery, and Avro/Protobuf schema-registry evolution.
+when_to_use: When row changes (incl. deletes) must propagate continuously and low-latency rather than on a schedule — capturing off a transaction log, fanning onto a partitioned stream bus, consuming with correct offset/rebalance/ordering, windowed joins/aggregations, and sinking to a search index/warehouse/cache kept in sync. Distinct from build-etl-pipeline (scheduled batch/incremental loads) and message-queue-jobs (durable server-to-server task queues, not a replayable change log).
+---
+## When to Use
+Reach for this skill when data must **flow as a continuous change stream**, not land in scheduled batches:
+- "Stream every row change out of Postgres/MySQL into Kafka and keep Elasticsearch in sync"
+- "Mirror a table into the warehouse in near-real-time, including deletes"
+- "My consumer is reprocessing / skipping events after a deploy or rebalance"
+- "Consumer-group lag is climbing; ordering is wrong; one partition is hot"
+- "Join an orders stream against an enrichment stream with a 5-minute window"
+- "Late/out-of-order events are dropped or double-counted"
+- "Producer schema changed and consumers broke" / "map Debezium op codes to upserts and deletes"
+NOT this skill:
+- Scheduled/incremental batch loads to a warehouse (Airflow/dbt, nightly, `updated_at` cursor) → build-etl-pipeline
+- Durable server-to-server work/task queue (enqueue a job, one worker runs it once) → message-queue-jobs
+- Client-facing live push over WebSocket/SSE (chat, dashboards) → build-realtime-channel
+- Offline client store + delta pull + conflict resolution → build-offline-first-sync
+- The replication slot / logical-decoding DDL impact on the **source** DB itself → db-migration-safety
+- Embedding/indexing documents for retrieval as the sink semantics → rag-pipeline
+## Steps
+1. **Confirm it's actually streaming, then capture log-based — not query polling.** If freshness tolerance is minutes/hours and deletes don't need to propagate, stop and use build-etl-pipeline. If changes (incl. deletes) must land in seconds, do CDC. Pick the capture method:
+   | Method | Captures deletes | Source load | Ordering | Use when |
+   |---|---|---|---|---|
+   | Query polling (`WHERE updated_at > :cursor`) | ❌ no (row is gone) | full table scan / index pressure | by `updated_at` only | no log access; deletes don't matter; small tables |
+   | **Log-based CDC (Debezium on WAL/binlog/redo)** | ✅ yes | low — reads the log the DB already writes | exact commit order per table | **default** — full fidelity, deletes, low impact |
+   | Trigger-based | ✅ yes | write amplification on every DML | by trigger | log unavailable but deletes needed |
+   Default: **Debezium connectors** — Postgres (`pgoutput` logical decoding + replication slot), MySQL (`binlog`, `binlog_format=ROW`, `binlog_row_image=FULL`), Mongo (change streams). Set Postgres `wal_level=logical`, `REPLICA IDENTITY FULL` on tables whose before-image (for deletes/diffs) you need.
+2. **Get the snapshot→stream handoff right, or you lose or double rows at startup.** A new connector must read existing rows (snapshot) then switch to live log without a gap. Use `snapshot.mode=initial` (snapshot once, then stream) — the connector records the log position at snapshot start and streams from there. For huge tables use **incremental snapshot** (`signal`-driven, chunked) so streaming isn't blocked and the connector is resumable. **Never** drop the replication slot while paused — Postgres then discards WAL the connector hasn't read and you get a permanent gap (full re-snapshot required). Monitor `pg_replication_slots.confirmed_flush_lsn`; an abandoned slot also pins WAL and fills the disk.
+3. **Shape the bus: topic-per-table, partition key = entity id, choose retention vs compaction.** One topic per source table/aggregate (`server.table` → `dbserver1.public.orders`). Partition **by primary key** so all events for one entity land on one partition → per-entity ordering is preserved; Kafka guarantees order **only within a partition**, never across. Do not key by a low-cardinality column (creates hot partitions) or leave keys null (round-robin → ordering lost).
+   | Topic config | Use for | Effect |
+   |---|---|---|
+   | `cleanup.policy=delete` + `retention.ms` | event/audit streams, replay window | drops old segments by time/size |
+   | `cleanup.policy=compact` | **CDC table mirrors** (latest state per key) | keeps newest value per key forever; tombstone (`value=null`) deletes the key |
+   | `compact,delete` | mirror + bounded history | compacted, plus old tombstones expire after `delete.retention.ms` |
+   Default for a table mirror: **log compaction**, keyed by PK. Kinesis equivalent: shard by partition key = PK; remember Kinesis ordering is per-shard and resharding rehashes keys.
+4. **Map CDC op codes to sink operations explicitly.** Debezium envelope `op`: `c`(create)/`r`(read/snapshot)/`u`(update) → **upsert** by PK; `d`(delete) → emit a **tombstone** (`key=PK, value=null`) so compaction and downstream deletes work. Configure `ExtractNewRecordState` SMT to unwrap the envelope and `delete.handling.mode=rewrite` (or `drop`) per sink needs. A sink that treats `d` as an upsert of nulls instead of a delete silently resurrects deleted rows.
+5. **Consume correctly — offset-commit timing is the core bug.** The consumer group assigns partitions to members; each commits the offset of records it has processed. **Commit after the side effect is durable, never before.**
+   - **Enable-auto-commit is at-least-once at best and silently lossy at worst:** it commits on a timer (`auto.commit.interval.ms`) regardless of whether your handler finished. A crash after commit-but-before-processing → message lost. **Set `enable.auto.commit=false`** and commit manually after the sink write succeeds.
+   ```java
+   // at-least-once done right: process → flush sink → THEN commit
+   props.put("enable.auto.commit", "false");
+   props.put("isolation.level", "read_committed"); // skip aborted txn records
+   props.put("max.poll.records", "500");
+   while (running) {
+     var records = consumer.poll(Duration.ofMillis(500));
+     for (var r : records) sink.upsert(key(r), value(r));  // idempotent
+     sink.flush();                                          // durable side effect first
+     consumer.commitSync();                                 // commit only after flush
+   }
+   ```
+   Order is load-bearing: process → flush → commit. Commit-before-process loses on crash; commit-per-record kills throughput.
+   - **Cooperative rebalance, not eager (stop-the-world):** set `partition.assignment.strategy=org.apache.kafka.clients.consumer.CooperativeStickyAssignor` (the config takes the fully qualified class name) so a join/leave revokes only the moved partitions instead of pausing the whole group. Commit in the `onPartitionsRevoked` callback so the new owner resumes from the right place.
+   - **Avoid spurious rebalances:** if processing a poll batch can exceed `max.poll.interval.ms` (default 5 min), the broker evicts the member and rebalances mid-work. Either lower `max.poll.records` or raise the interval. Keep `session.timeout.ms`/`heartbeat.interval.ms` at ~3:1.
+   - **Lag, not just throughput:** alert on consumer-group lag (`kafka-consumer-groups --describe`, or Burrow/CMAK). Scale by adding consumers **up to the partition count** — extra consumers past `#partitions` sit idle. More throughput needs more partitions (and partition count can only go *up*; increasing it rehashes keys and breaks ordering for in-flight keys).
+   - **Poison record → DLQ, don't block the partition.** A record that always fails (bad schema, sink rejects) will halt the partition forever if you retry in place. After N attempts, route it to a dead-letter topic with headers (original topic/partition/offset/exception), commit past it, continue.
+6. **Process: stateless map vs windowed/stateful — pick the window and a watermark.** Stateless filter/transform/route → a plain consumer or single-operator stream. Joins/aggregations need **state + a window + a watermark** (event-time progress marker that says "no events older than T will arrive"):
+   | Window | Use for | Note |
+   |---|---|---|
+   | Tumbling (fixed, non-overlapping) | per-minute counts, billing buckets | each event in exactly one window |
+   | Hopping/sliding (overlapping) | moving averages, "last 5 min every 1 min" | event in multiple windows |
+   | Session (gap-based) | user sessions, bursts | window closes after inactivity gap |
+   Use **event time** (the row's commit/`ts_ms`), never processing time, or replay and out-of-order delivery corrupt results. Set `allowed_lateness`/grace so late events update an already-emitted window instead of being dropped; send events past the grace period to a side-output, don't silently discard. Keep operator state in a durable, checkpointed store (Kafka Streams `RocksDB` + changelog topic, or Flink checkpoints) so a restart restores aggregates instead of recomputing from zero.
+7. **Choose delivery semantics deliberately — exactly-once is opt-in and not free.** Default and simplest: **at-least-once + idempotent sink.** Make the sink absorb duplicates (upsert by PK, `INSERT ... ON CONFLICT DO UPDATE`, dedup table on event id) so reprocessing after a rebalance/replay is harmless. Reach for true **exactly-once** only when the sink can't be made idempotent (e.g. incrementing counters, append-only ledgers):
+   - Kafka→Kafka: enable EOS — `processing.guarantee=exactly_once_v2` (Kafka Streams) or transactional producer (`enable.idempotence=true`, `transactional.id`) + consumer `isolation.level=read_committed`. This is a transactional read-process-write **within Kafka only**; it does not extend to an external DB.
+   - Kafka→external store: use **idempotent upserts**, or a two-phase/transactional sink connector that stores the consumed offset in the *same* transaction as the data.
+   - **Replay** is a first-class operation: reset the group to an offset/timestamp (`kafka-consumer-groups --reset-offsets --to-datetime`) and reprocess. This only produces correct results **because** the sink is idempotent or transactional — design for replay from day one.
+8. **Schema registry + compatibility, or producers will break consumers.** Serialize with **Avro or Protobuf via a schema registry** (not raw JSON) so every record carries a schema id and the registry enforces compatibility on register. Default compatibility: **BACKWARD** (new schema can read old data) — consumers upgrade first. Rules that keep it safe: add fields **with defaults**, never rename/retype a field in place (add new + dual-write + retire), never remove a required field. Pin Debezium key/value converters to the registry. For Kafka Connect sinks, the registry + compatibility check is what stops a bad producer from poisoning every downstream consumer at 3am.
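Steps 4 and 7 combine into one small idea: map each op code to an idempotent sink operation, and replay becomes harmless. The sketch below uses SQLite as a stand-in sink and plain dicts shaped like an unwrapped Debezium envelope (`op`, `before`, `after`); `apply_change`, the `orders` table, and the sample events are hypothetical. It assumes a SQLite build with upsert support (3.24 or later).

```python
import sqlite3


def apply_change(db, event):
    """Apply one Debezium-style change event to the sink table."""
    op = event["op"]
    if op in ("c", "r", "u"):  # create / snapshot read / update -> upsert by PK
        row = event["after"]
        db.execute(
            "INSERT INTO orders (id, status) VALUES (?, ?)"
            " ON CONFLICT(id) DO UPDATE SET status = excluded.status",
            (row["id"], row["status"]),
        )
    elif op == "d":  # delete -> remove the row, never upsert nulls
        db.execute("DELETE FROM orders WHERE id = ?", (event["before"]["id"],))
    else:
        raise ValueError(f"unknown op code: {op!r}")


def snapshot(db):
    return db.execute("SELECT id, status FROM orders ORDER BY id").fetchall()


events = [
    {"op": "c", "before": None, "after": {"id": 1, "status": "new"}},
    {"op": "u", "before": {"id": 1}, "after": {"id": 1, "status": "paid"}},
    {"op": "c", "before": None, "after": {"id": 2, "status": "new"}},
    {"op": "d", "before": {"id": 2}, "after": None},
]

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")

for event in events:
    apply_change(db, event)
first_pass = snapshot(db)

for event in events:  # full replay from the earliest offset
    apply_change(db, event)
after_replay = snapshot(db)
```

The replay passes through intermediate states (order 2 briefly reappears) but converges to the same final state, which is the property Verify item 6 checks.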
+## Common Errors
+- **`enable.auto.commit=true` treated as exactly-once.** It's a timer that commits independent of your handler — a crash loses or reprocesses. Set it `false` and commit after the sink flush.
+- **Committing the offset before the side effect is durable.** Crash in the gap = silent data loss. Strict order: process → flush sink → commit.
+- **Dropping/recreating the Postgres replication slot to "reset".** WAL the connector hasn't consumed is discarded → permanent gap, forces a full re-snapshot. Pause the connector, keep the slot; never delete a slot with unconsumed WAL.
+- **Abandoned/lagging slot fills the source disk.** A stopped consumer pins WAL forever. Alert on `confirmed_flush_lsn` lag and slot age; clean up dead connectors.
+- **Null or low-cardinality partition key.** Null key → round-robin → cross-partition reordering of one entity's events. Low-cardinality key → hot partition. Key by primary key.
+- **Increasing partition count on a live keyed topic.** Rehashes keys → an entity's new events go to a different partition than its in-flight ones → ordering broken. Plan partition count up front; treat increases as a migration.
+- **Treating a Debezium `d` (delete) as an upsert.** Resurrects deleted rows in the sink. Emit a tombstone (`value=null`) and let the sink delete; use the `ExtractNewRecordState` SMT.
+- **Reading uncommitted transactional records.** Without `isolation.level=read_committed`, consumers see aborted-transaction records and double-count. Set it whenever producers use transactions.
+- **Poison record retried in place.** One un-processable record halts its partition forever and lag explodes. Bounded retries → DLQ topic → commit past it.
+- **Processing on the poll thread longer than `max.poll.interval.ms`.** Broker thinks the consumer died, rebalances mid-batch, you reprocess. Shrink `max.poll.records` or raise the interval; offload slow work.
+- **Eager (stop-the-world) rebalance assignor by default.** Every scale event pauses the whole group. Use `CooperativeStickyAssignor`.
+- **Windowing on processing time.** Replay and out-of-order delivery silently corrupt aggregates. Window on event time with a watermark; route past-grace events to a side-output.
+- **Raw JSON with no registry.** A producer field rename breaks every consumer with no guardrail. Use Avro/Protobuf + registry with BACKWARD compatibility.
+- **Scaling consumers past partition count.** Extra members sit idle. Add partitions (carefully — see above) or split the workload differently.
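The partition-key errors above follow from simple arithmetic: a keyed producer picks `hash(key) mod N`, so the mapping is stable only while `N` is fixed. The sketch below is a model, not Kafka's partitioner: `zlib.crc32` stands in for the murmur2 hash Kafka's default partitioner uses, null keys are modelled as round-robin as the text describes, and `partition_for` is a hypothetical helper.

```python
import zlib
from itertools import count

_round_robin = count()


def partition_for(key, num_partitions):
    """Pick a partition the way a keyed producer does: hash(key) mod N.

    zlib.crc32 is a stand-in hash; a null key has no stable home and is
    modelled here as round-robin.
    """
    if key is None:
        return next(_round_robin) % num_partitions
    return zlib.crc32(key.encode("utf-8")) % num_partitions


keys = [f"order-{i}" for i in range(100)]

# Same key, same partition count: always the same partition (ordering holds).
stable = all(partition_for(k, 6) == partition_for(k, 6) for k in keys)

# Same key, partition count raised from 6 to 8: many keys land elsewhere.
moved = [k for k in keys if partition_for(k, 6) != partition_for(k, 8)]

# Null key: consecutive events for one entity scatter across partitions.
null_key_partitions = [partition_for(None, 3) for _ in range(3)]
```

Any key in `moved` has its older events on one partition and its newer events on another, so two consumers can process them out of order. That is why a partition-count increase on a keyed topic needs to be treated as a migration.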
+## Verify
+1. **Capture fidelity incl. deletes:** `INSERT`, `UPDATE`, then `DELETE` a row on the source → consumer observes a create, an update (with correct before/after), and a tombstone, in commit order. A delete that produces no tombstone is a fail.
+2. **Snapshot→stream no-gap:** seed N rows, start the connector, then write M more **during** the snapshot → exactly N+M distinct rows arrive downstream, none missing, none duplicated past idempotency.
+3. **Per-entity ordering:** rapidly emit 3 updates to one PK → consumer receives them in source order on a single partition (events for that key never interleave out of order).
+4. **Offset correctness across restart:** kill the consumer mid-batch, restart → no committed-but-unprocessed record is lost and no already-sunk record corrupts the sink (idempotency holds). Lag returns to ~0.
+5. **Rebalance correctness:** add then remove a consumer under load with `CooperativeStickyAssignor` → no record is processed by two members and none is skipped; only moved partitions are revoked (check logs).
+6. **Replay = same result:** `--reset-offsets --to-earliest` and reprocess → final sink state is byte-identical to before the replay (proves the sink is idempotent/transactional).
+7. **Poison handling:** inject a record the sink rejects → it lands in the DLQ with origin headers, the partition keeps flowing, lag does not climb.
+8. **Late event:** emit an event with an event-time inside a closed-but-within-grace window → the window result updates; past grace → it appears in the side-output, not silently dropped.
+9. **Schema evolution:** register a new schema adding a field with a default under BACKWARD compatibility → old consumers keep running; attempt an incompatible change → registry rejects the register (does not reach consumers).
+10. **Lag SLO:** under sustained source write load, consumer-group lag stays bounded (returns toward 0), not monotonically rising.
+Done = deletes propagate as tombstones, snapshot→stream is gap-free, per-entity ordering holds on one partition, a kill-restart and a full replay both leave the sink state correct (idempotent or transactional), poison records go to a DLQ without blocking the partition, late events hit grace/side-output (never silently dropped), and an incompatible schema is rejected at the registry before it reaches any consumer.
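The late-event behavior in step 6 and Verify item 8 can be modelled with a tiny event-time tumbling counter. This is a simplified sketch, not Kafka Streams or Flink: the watermark is taken as the highest event time seen so far, and `TumblingCounter`, `size_ms`, and `grace_ms` are hypothetical names. A window accepts events until the watermark passes its end plus the grace period; anything later goes to a side-output.

```python
from collections import defaultdict


class TumblingCounter:
    """Event-time tumbling-window counts with a watermark and a grace period."""

    def __init__(self, size_ms, grace_ms):
        self.size_ms = size_ms
        self.grace_ms = grace_ms
        self.watermark = 0              # highest event time seen so far
        self.counts = defaultdict(int)  # window start -> count
        self.side_output = []           # events that arrived past grace

    def add(self, event_ts_ms):
        self.watermark = max(self.watermark, event_ts_ms)
        window_start = event_ts_ms - event_ts_ms % self.size_ms
        window_end = window_start + self.size_ms
        if self.watermark >= window_end + self.grace_ms:
            # Too late: keep it visible instead of silently dropping it.
            self.side_output.append(event_ts_ms)
            return
        self.counts[window_start] += 1


counter = TumblingCounter(size_ms=60_000, grace_ms=30_000)
for ts in (10_000, 70_000, 50_000, 95_000, 20_000):
    counter.add(ts)

window_counts = dict(counter.counts)
late_events = counter.side_output
```

The event at 50,000 arrives after the watermark has already reached 70,000, but its window (0 to 60,000) is still within grace, so the count is updated. The event at 20,000 arrives once the watermark is 95,000, past the 90,000 cutoff, so it is routed to the side-output.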