sanook-cli 0.4.0 → 0.5.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (235)
  1. package/.env.example +19 -0
  2. package/CHANGELOG.md +144 -0
  3. package/README.md +153 -20
  4. package/README.th.md +136 -0
  5. package/dist/agentContext.js +4 -0
  6. package/dist/approval.js +6 -0
  7. package/dist/bin.js +394 -51
  8. package/dist/brain.js +92 -59
  9. package/dist/brand.js +47 -0
  10. package/dist/checkpoint.js +37 -0
  11. package/dist/commands.js +86 -6
  12. package/dist/compaction.js +76 -5
  13. package/dist/config.js +100 -12
  14. package/dist/cost.js +60 -3
  15. package/dist/doctor.js +92 -0
  16. package/dist/gateway/auth.js +2 -2
  17. package/dist/gateway/ledger.js +2 -2
  18. package/dist/gateway/scheduler.js +1 -0
  19. package/dist/gateway/serve.js +6 -4
  20. package/dist/gateway/server.js +10 -2
  21. package/dist/git.js +11 -2
  22. package/dist/hooks.js +43 -17
  23. package/dist/knowledge.js +48 -49
  24. package/dist/loop.js +182 -66
  25. package/dist/lsp/client.js +173 -0
  26. package/dist/lsp/framing.js +56 -0
  27. package/dist/lsp/index.js +138 -0
  28. package/dist/lsp/servers.js +82 -0
  29. package/dist/mcp-server.js +244 -0
  30. package/dist/mcp.js +184 -29
  31. package/dist/memory-store.js +559 -0
  32. package/dist/memory.js +143 -29
  33. package/dist/orchestrate.js +150 -0
  34. package/dist/providers/codex.js +2 -2
  35. package/dist/providers/keys.js +3 -2
  36. package/dist/providers/registry.js +133 -1
  37. package/dist/repomap.js +93 -0
  38. package/dist/search/chunk.js +158 -0
  39. package/dist/search/embed-store.js +187 -0
  40. package/dist/search/engine.js +203 -0
  41. package/dist/search/fuse.js +35 -0
  42. package/dist/search/index-core.js +187 -0
  43. package/dist/search/indexer.js +241 -0
  44. package/dist/search/store.js +77 -0
  45. package/dist/session.js +42 -8
  46. package/dist/skill-install.js +10 -10
  47. package/dist/skills.js +12 -9
  48. package/dist/summarize.js +31 -0
  49. package/dist/tools/bash.js +21 -2
  50. package/dist/tools/diagnostics.js +41 -0
  51. package/dist/tools/edit.js +29 -7
  52. package/dist/tools/index.js +8 -1
  53. package/dist/tools/list.js +7 -2
  54. package/dist/tools/permission.js +90 -9
  55. package/dist/tools/read.js +23 -4
  56. package/dist/tools/remember.js +1 -1
  57. package/dist/tools/sandbox.js +61 -0
  58. package/dist/tools/search.js +105 -4
  59. package/dist/tools/task.js +195 -29
  60. package/dist/tools/timeout.js +35 -0
  61. package/dist/tools/util.js +10 -0
  62. package/dist/tools/write.js +6 -4
  63. package/dist/trust.js +89 -0
  64. package/dist/ui/app.js +218 -27
  65. package/dist/ui/banner.js +4 -9
  66. package/dist/ui/history.js +30 -0
  67. package/dist/ui/mentions.js +44 -0
  68. package/dist/ui/setup.js +6 -5
  69. package/dist/ui/useEditor.js +83 -0
  70. package/dist/update.js +114 -0
  71. package/dist/worktree.js +173 -0
  72. package/package.json +11 -5
  73. package/scripts/postinstall.mjs +33 -0
  74. package/second-brain/.agents/_Index.md +30 -0
  75. package/second-brain/.agents/skills/_Index.md +30 -0
  76. package/second-brain/.agents/workflows/_Index.md +30 -0
  77. package/second-brain/AGENTS.md +4 -4
  78. package/second-brain/Acceptance/_Index.md +30 -0
  79. package/second-brain/Acceptance/golden-case-template.md +39 -0
  80. package/second-brain/Areas/_Index.md +30 -0
  81. package/second-brain/Bugs/System-OS/_Index.md +30 -0
  82. package/second-brain/Bugs/_Index.md +30 -0
  83. package/second-brain/CLAUDE.md +4 -1
  84. package/second-brain/Checklists/_Index.md +30 -0
  85. package/second-brain/Checklists/preflight-postflight-template.md +29 -0
  86. package/second-brain/Distillations/_Index.md +30 -0
  87. package/second-brain/Entities/_Index.md +30 -0
  88. package/second-brain/Entities/entity-template.md +33 -0
  89. package/second-brain/Evals/_Index.md +30 -0
  90. package/second-brain/Evals/correction-pairs.md +24 -0
  91. package/second-brain/Evals/failure-taxonomy.md +24 -0
  92. package/second-brain/Evals/golden-set.md +25 -0
  93. package/second-brain/Evals/quality-ledger.md +23 -0
  94. package/second-brain/Evals/self-eval-rubric.md +23 -0
  95. package/second-brain/GEMINI.md +4 -4
  96. package/second-brain/Goals/_Index.md +30 -0
  97. package/second-brain/Handoffs/_Index.md +30 -0
  98. package/second-brain/Home.md +7 -0
  99. package/second-brain/Intake/Raw Sources/_Index.md +30 -0
  100. package/second-brain/Intake/_Index.md +30 -0
  101. package/second-brain/Intake/_Quarantine/_Index.md +30 -0
  102. package/second-brain/Learning/_Index.md +30 -0
  103. package/second-brain/Playbooks/_Index.md +30 -0
  104. package/second-brain/Playbooks/playbook-template.md +23 -0
  105. package/second-brain/Projects/_Index.md +30 -0
  106. package/second-brain/Prompts/_Index.md +30 -0
  107. package/second-brain/README.md +2 -1
  108. package/second-brain/Research/_Index.md +30 -0
  109. package/second-brain/Retrospectives/_Index.md +30 -0
  110. package/second-brain/Reviews/_Index.md +30 -0
  111. package/second-brain/Runbooks/_Index.md +30 -0
  112. package/second-brain/Runbooks/eval-loop.md +24 -0
  113. package/second-brain/Sessions/_Index.md +30 -0
  114. package/second-brain/Shared/AI-Context-Index.md +20 -0
  115. package/second-brain/Shared/AI-Threads/_Index.md +30 -0
  116. package/second-brain/Shared/Archive/_Index.md +30 -0
  117. package/second-brain/Shared/Assets/_Index.md +30 -0
  118. package/second-brain/Shared/Context-Packs/_Index.md +30 -0
  119. package/second-brain/Shared/Context7-Docs/_Index.md +30 -0
  120. package/second-brain/Shared/Coordination/NOW.md +28 -0
  121. package/second-brain/Shared/Coordination/_Index.md +30 -0
  122. package/second-brain/Shared/Coordination/agent-registry.md +24 -0
  123. package/second-brain/Shared/Coordination/task-board/_Index.md +30 -0
  124. package/second-brain/Shared/Coordination/task-board/task-template.md +43 -0
  125. package/second-brain/Shared/Coordination/task-board.md +32 -0
  126. package/second-brain/Shared/Core-Facts/_Index.md +30 -0
  127. package/second-brain/Shared/Decision-Memory/_Index.md +30 -0
  128. package/second-brain/Shared/Glossary/_Index.md +30 -0
  129. package/second-brain/Shared/Memory-Inbox/_Index.md +30 -0
  130. package/second-brain/Shared/Operating-State/_Index.md +30 -0
  131. package/second-brain/Shared/Prompting/_Index.md +30 -0
  132. package/second-brain/Shared/Provenance/_Index.md +30 -0
  133. package/second-brain/Shared/Rules/_Index.md +30 -0
  134. package/second-brain/Shared/Rules/contextual-note-rule.md +30 -0
  135. package/second-brain/Shared/Rules/frontmatter-standard.md +10 -0
  136. package/second-brain/Shared/Rules/memory-write-protocol.md +28 -0
  137. package/second-brain/Shared/Rules/procedural-runbook-header.md +40 -0
  138. package/second-brain/Shared/Rules/review-and-staleness-policy.md +22 -0
  139. package/second-brain/Shared/Rules/rules-formatting.md +34 -0
  140. package/second-brain/Shared/Scripts/_Index.md +30 -0
  141. package/second-brain/Shared/Scripts-Archive/_Index.md +30 -0
  142. package/second-brain/Shared/Tech-Standards/_Index.md +30 -0
  143. package/second-brain/Shared/Tech-Standards/verification-standard.md +40 -0
  144. package/second-brain/Shared/User-Memory/_Index.md +30 -0
  145. package/second-brain/Shared/User-Persona/_Index.md +30 -0
  146. package/second-brain/Shared/User-Persona/owner-profile.md +25 -0
  147. package/second-brain/Shared/Working-Memory/_Index.md +30 -0
  148. package/second-brain/Shared/_Index.md +30 -0
  149. package/second-brain/Shared/mcp-servers/_Index.md +30 -0
  150. package/second-brain/Skills/_Index.md +30 -0
  151. package/second-brain/Templates/_Index.md +30 -0
  152. package/second-brain/Templates/bug.md +2 -0
  153. package/second-brain/Templates/handoff.md +2 -0
  154. package/second-brain/Templates/session.md +2 -0
  155. package/second-brain/Tools/_Index.md +30 -0
  156. package/second-brain/Traces/_Index.md +30 -0
  157. package/second-brain/Vault Structure Map.md +33 -1
  158. package/second-brain/copilot/_Index.md +30 -0
  159. package/skills/audit-license-compliance/SKILL.md +117 -0
  160. package/skills/author-codemod/SKILL.md +110 -0
  161. package/skills/build-audit-logging/SKILL.md +112 -0
  162. package/skills/build-cdc-streaming-pipeline/SKILL.md +123 -0
  163. package/skills/build-cli-tool/SKILL.md +108 -0
  164. package/skills/build-data-table/SKILL.md +141 -0
  165. package/skills/build-native-mobile-ui/SKILL.md +154 -0
  166. package/skills/build-offline-first-sync/SKILL.md +118 -0
  167. package/skills/build-realtime-channel/SKILL.md +122 -0
  168. package/skills/build-vector-search/SKILL.md +131 -0
  169. package/skills/compose-local-dev-stack/SKILL.md +149 -0
  170. package/skills/configure-bundler-build/SKILL.md +166 -0
  171. package/skills/configure-dns-tls/SKILL.md +142 -0
  172. package/skills/configure-reverse-proxy-lb/SKILL.md +129 -0
  173. package/skills/configure-security-headers-csp/SKILL.md +122 -0
  174. package/skills/contract-testing/SKILL.md +140 -0
  175. package/skills/datetime-timezone-correctness/SKILL.md +125 -0
  176. package/skills/debug-ci-pipeline-failure/SKILL.md +134 -0
  177. package/skills/debug-flaky-tests/SKILL.md +128 -0
  178. package/skills/defend-llm-prompt-injection/SKILL.md +110 -0
  179. package/skills/deliver-webhooks/SKILL.md +116 -0
  180. package/skills/design-api-pagination/SKILL.md +144 -0
  181. package/skills/design-authorization-model/SKILL.md +119 -0
  182. package/skills/design-backup-dr-recovery/SKILL.md +113 -0
  183. package/skills/design-event-sourcing-cqrs/SKILL.md +143 -0
  184. package/skills/design-multi-tenancy/SKILL.md +100 -0
  185. package/skills/design-protobuf-grpc-service/SKILL.md +146 -0
  186. package/skills/design-relational-schema/SKILL.md +129 -0
  187. package/skills/design-search-index-infra/SKILL.md +151 -0
  188. package/skills/design-state-machine/SKILL.md +108 -0
  189. package/skills/design-token-system/SKILL.md +109 -0
  190. package/skills/distributed-locks-leases/SKILL.md +120 -0
  191. package/skills/encrypt-sensitive-data/SKILL.md +148 -0
  192. package/skills/feature-flags-rollout/SKILL.md +130 -0
  193. package/skills/file-upload-object-storage/SKILL.md +107 -0
  194. package/skills/fuzz-dynamic-security-test/SKILL.md +111 -0
  195. package/skills/harden-llm-app-reliability/SKILL.md +126 -0
  196. package/skills/i18n-localization-setup/SKILL.md +113 -0
  197. package/skills/idempotency-keys/SKILL.md +107 -0
  198. package/skills/implement-push-notifications/SKILL.md +142 -0
  199. package/skills/ingest-webhook-secure/SKILL.md +120 -0
  200. package/skills/integrate-oauth-oidc/SKILL.md +126 -0
  201. package/skills/load-stress-test/SKILL.md +129 -0
  202. package/skills/map-privacy-data-gdpr/SKILL.md +146 -0
  203. package/skills/model-nosql-data/SKILL.md +118 -0
  204. package/skills/money-decimal-arithmetic/SKILL.md +123 -0
  205. package/skills/monitor-ml-drift/SKILL.md +109 -0
  206. package/skills/numeric-precision-units/SKILL.md +144 -0
  207. package/skills/optimize-llm-cost-latency/SKILL.md +103 -0
  208. package/skills/optimize-react-rerenders/SKILL.md +124 -0
  209. package/skills/orchestrate-agent-workflow/SKILL.md +100 -0
  210. package/skills/payments-billing-integration/SKILL.md +114 -0
  211. package/skills/pin-toolchain-versions/SKILL.md +116 -0
  212. package/skills/plan-strangler-migration/SKILL.md +95 -0
  213. package/skills/property-based-testing/SKILL.md +108 -0
  214. package/skills/publish-package-registry/SKILL.md +130 -0
  215. package/skills/recover-git-state/SKILL.md +119 -0
  216. package/skills/remediate-web-vulnerabilities/SKILL.md +125 -0
  217. package/skills/resilience-timeouts-retries/SKILL.md +104 -0
  218. package/skills/resolve-merge-rebase-conflict/SKILL.md +97 -0
  219. package/skills/rewrite-git-history/SKILL.md +109 -0
  220. package/skills/scaffold-cross-platform-app/SKILL.md +137 -0
  221. package/skills/schema-evolution-compatibility/SKILL.md +121 -0
  222. package/skills/send-transactional-email/SKILL.md +126 -0
  223. package/skills/serve-deploy-ml-model/SKILL.md +107 -0
  224. package/skills/setup-cdn-edge-waf/SKILL.md +107 -0
  225. package/skills/setup-devcontainer-env/SKILL.md +131 -0
  226. package/skills/setup-lint-format-precommit/SKILL.md +140 -0
  227. package/skills/setup-monorepo-tooling/SKILL.md +125 -0
  228. package/skills/ship-mobile-app-store-release/SKILL.md +137 -0
  229. package/skills/structured-output-llm/SKILL.md +86 -0
  230. package/skills/supply-chain-sbom-provenance/SKILL.md +120 -0
  231. package/skills/test-data-factories/SKILL.md +158 -0
  232. package/skills/threat-model-stride/SKILL.md +123 -0
  233. package/skills/train-evaluate-ml-model/SKILL.md +109 -0
  234. package/skills/unicode-text-correctness/SKILL.md +109 -0
  235. package/skills/visual-regression-testing/SKILL.md +120 -0
@@ -0,0 +1,154 @@
1
+ ---
2
+ name: build-native-mobile-ui
3
+ description: Builds native mobile UI in SwiftUI (iOS) and Jetpack Compose (Android) — declarative layout (List/LazyVStack vs Scaffold/LazyColumn), unidirectional state with hoisting (@Observable vs ViewModel/StateFlow), typed navigation stacks with deep links, adaptive sizing (size classes/WindowSizeClass), light/dark theming via semantic tokens, lifecycle-correct side effects, recomposition control, and VoiceOver/TalkBack accessibility.
4
+ when_to_use: Implementing or reviewing a native iOS (SwiftUI) or Android (Jetpack Compose) screen/component — lists, forms, custom layouts, state hoisting, typed navigation, dark mode/Dynamic Type, adaptive phone/tablet/foldable, recomposition jank. Distinct from scaffold-cross-platform-app (React Native/Flutter, not native Swift/Kotlin), build-react-component (web/React), and audit-accessibility-wcag (web WCAG audit).
5
+ ---
6
+
7
+ ## When to Use
8
+
9
+ Reach for this skill when building or reviewing a **native** iOS/Android screen in a declarative UI framework (SwiftUI or Jetpack Compose, i.e. Swift/Kotlin — not React Native or Flutter):
10
+
11
+ - "Build a SwiftUI/Compose list-detail screen with pull-to-refresh"
12
+ - "Hoist this view state — the toggle should be controlled by the parent"
13
+ - "Add a NavigationStack / Compose Navigation route with a deep link"
14
+ - "Make this adapt to iPad / foldable / landscape (two-pane on wide)"
15
+ - "Support dark mode + Dynamic Type without truncation"
16
+ - "VoiceOver reads this button wrong / TalkBack skips the row"
17
+ - "This list janks / recomposes the whole screen on every keystroke"
18
+
19
+ NOT this skill:
20
+ - Cross-platform UI in React Native or Flutter (JSX/Dart, Expo Router, Riverpod/Bloc) → scaffold-cross-platform-app (this skill is native Swift/Kotlin only)
21
+ - Web/React components (JSX, hooks, DOM) → build-react-component
22
+ - CSS/Tailwind breakpoints and responsive web layout → style-responsive-tailwind
23
+ - Auditing a web page against WCAG success criteria → audit-accessibility-wcag
24
+ - Server cache, fetching, optimistic mutation, query invalidation → manage-client-server-state (this skill owns *UI* state, not network state)
25
+ - Architecting the token tiers/themes/multi-platform export pipeline → design-token-system (this skill *consumes* tokens in a screen, doesn't design the system)
26
+ - Converting a Figma/design spec into pixel-faithful code → implement-from-design (use this skill for the framework idioms once you have the spec)
27
+ - The server send path, payload schema, or token registration for push → implement-push-notifications (this skill owns the in-app deep-link router push taps land in)
28
+ - Code signing, build lanes, store upload, phased rollout → ship-mobile-app-store-release
29
+ - Profiling/fixing web load metrics (LCP/CLS) → optimize-core-web-vitals
30
+
31
+ ## Steps
32
+
33
+ 1. **Pick the container primitive by data shape — never default to a plain stack for collections.** Lazy containers virtualize; eager ones build every child up front and jank past ~50 rows.
34
+
35
+ | Need | SwiftUI | Compose |
36
+ |---|---|---|
37
+ | Long/unbounded scrolling list | `List` (free separators, swipe, refresh) or `LazyVStack` in `ScrollView` | `LazyColumn` (with `key = { it.id }`) |
38
+ | Small fixed group (≤ ~20, all visible) | `VStack`/`Form`/`Section` | `Column` |
39
+ | Screen chrome (top bar, FAB, snackbar, insets) | `NavigationStack` + `.toolbar` | `Scaffold(topBar, floatingActionButton, snackbarHost)` |
40
+ | Grid | `LazyVGrid(columns:)` | `LazyVerticalGrid(columns = GridCells.Adaptive(160.dp))` |
41
+ | Overlap / z-stack | `ZStack` | `Box` |
42
+
43
+ **Always set stable item identity** (`List(items, id: \.id)` / `items(list, key = { it.id })`) — without it, scroll position and animations break on reorder.
44
+
45
+ 2. **One source of truth, hoisted up; flow data down, events up.** A child that owns the state it renders is unreusable and untestable. Make leaf views *stateless* (value + callback); keep state at the lowest common owner.
46
+
47
+ SwiftUI — child takes `Binding`, owns nothing:
48
+ ```swift
49
+ struct ToggleRow: View { // stateless leaf
50
+     let title: String
51
+     @Binding var isOn: Bool
52
+     var body: some View { Toggle(title, isOn: $isOn) }
53
+ }
54
+ // parent owns it:
55
+ @State private var pushEnabled = false
56
+ ToggleRow(title: "Push", isOn: $pushEnabled)
57
+ ```
58
+ Compose — hoist with `value` + `onValueChange`, never an internal `remember` for controlled state:
59
+ ```kotlin
60
+ @Composable fun ToggleRow(title: String, checked: Boolean, onChecked: (Boolean) -> Unit) {
61
+     Row { Text(title); Switch(checked = checked, onCheckedChange = onChecked) } // stateless
62
+ }
63
+ ```
64
+
65
+ | Concern | SwiftUI | Compose |
66
+ |---|---|---|
67
+ | Local ephemeral UI state | `@State` (private) | `var x by remember { mutableStateOf(...) }` |
68
+ | Owned by parent | `@Binding` | `value` + `onValueChange` lambda |
69
+ | Screen/business state, survives config change | `@Observable` class (`@State` at owner) | `ViewModel` + `StateFlow` → `collectAsStateWithLifecycle()` |
70
+ | Survive process death | `@SceneStorage` / `@AppStorage` | `SavedStateHandle` / `rememberSaveable` |
71
+ | DI'd cross-cutting | `@Environment` | `CompositionLocal` / hilt-injected VM |
72
+
73
+ Default: screen state lives in `@Observable` (iOS 17+) / `ViewModel`; the view is a pure function of it. Expose **immutable** state out (`val uiState: StateFlow<UiState>`), accept intents in (`fun onIntent(...)`).
74
+
75
+ 3. **Type your navigation — no stringly-typed routes for in-app pushes.** Drive the stack from a state-bound path so back/deep-link/restore are deterministic.
76
+
77
+ SwiftUI:
78
+ ```swift
79
+ @State private var path = NavigationPath()
80
+ NavigationStack(path: $path) {
81
+     List(items) { NavigationLink("\($0.name)", value: $0) } // value, not destination view
82
+         .navigationDestination(for: Item.self) { ItemDetail(item: $0) }
83
+ }
84
+ // deep link: path.append(item) — or .onOpenURL { url in route(url, &path) }
85
+ ```
86
+ Compose (type-safe routes, nav 2.8+ with `@Serializable` objects):
87
+ ```kotlin
88
+ @Serializable data class ItemDetail(val id: String)
89
+ NavHost(nav, startDestination = ItemList) {
90
+     composable<ItemList> { ItemListScreen(onOpen = { nav.navigate(ItemDetail(it.id)) }) }
91
+     composable<ItemDetail>(deepLinks = listOf(navDeepLink<ItemDetail>(basePath = "app://item"))) {
92
+         ItemDetailScreen(it.toRoute<ItemDetail>().id)
93
+     }
94
+ }
95
+ ```
96
+ Rules: each tab gets its **own** back stack; restore a saved stack on tab reselect (don't reset to root); a deep link must rebuild the parent stack so Back has somewhere to go.
97
+
98
+ 4. **Layout for variable size from the start — respect insets, scale with the user's type setting, branch on width class.** Hardcoded heights and a single phone layout break on Dynamic Type / iPad / foldable.
99
+ - **Safe area / insets:** never hardcode status-bar or notch padding. SwiftUI honors safe area by default — only push to edges with `.ignoresSafeArea()` deliberately and pad content back. Compose: `Scaffold` gives `innerPadding` — apply it; for keyboard use `Modifier.imePadding()` / `windowInsetsPadding(...)`.
100
+ - **Dynamic Type / font scale:** use semantic styles (`.font(.body)` / `MaterialTheme.typography.bodyLarge`), not fixed `pt`/`sp`. Let text wrap; cap with `.lineLimit` + `.minimumScaleFactor(0.8)` only when a hard ceiling exists. Verify at the largest accessibility size.
101
+ - **Adaptive width:** branch on the class, not a raw `375`-px guess. iPad/landscape and wide foldables → two-pane.
102
+
103
+ | | iOS | Android |
104
+ |---|---|---|
105
+ | Read width class | `@Environment(\.horizontalSizeClass)` (`.compact`/`.regular`) | `calculateWindowSizeClass(activity).widthSizeClass` (`Compact`/`Medium`/`Expanded`) |
106
+ | List+detail that adapts | `NavigationSplitView` | two-pane when `Expanded`, single `NavHost` when `Compact` |
107
+ | Breakpoints | compact = phone portrait; regular = iPad/landscape | Compact <600dp · Medium 600–840dp · Expanded ≥840dp |
108
+
109
+ 5. **Theme through tokens, not literals — and support dark by deriving, not duplicating.** Reference semantic roles so dark mode is automatic.
110
+ - iOS: use system semantic colors (`Color(.systemBackground)`, `.primary`, `.secondary`, `Color("Brand")` from an Asset Catalog with a Dark variant) — they flip with `@Environment(\.colorScheme)`. Icons: SF Symbols (`Image(systemName: "trash")`) so they match weight/scale.
111
+ - Android: define a `ColorScheme` via Material 3 `lightColorScheme()`/`darkColorScheme()` (or `dynamicLightColorScheme(context)`/`dynamicDarkColorScheme(context)` for Material You on API 31+), select by `isSystemInDarkTheme()`, expose through `MaterialTheme`. Never hardcode `Color(0xFF...)` literals inside a composable.
112
+ - Never gate logic on the literal color; gate on the token/role. One token table, two schemes derived from it.
113
+
114
+ 6. **Accessibility is a build requirement, not a pass.** Every interactive element needs a label + role; targets ≥ 44pt (iOS HIG) / ≥ 48dp (Material). Decorative images get *no* label.
115
+ - iOS: `.accessibilityLabel("Delete")`, `.accessibilityAddTraits(.isButton)`, `.accessibilityHidden(true)` for decoration, group a row with `.accessibilityElement(children: .combine)` so VoiceOver reads it as one unit.
116
+ - Compose: `Modifier.semantics { contentDescription = "Delete" }` (or the param on `Icon`), `contentDescription = null` for decorative `Image`, `Modifier.semantics(mergeDescendants = true) {}` to merge a row into one unit (`Modifier.clearAndSetSemantics { … }` instead replaces the children's semantics with your own), `Role.Button`/`Role.Checkbox` via `Modifier.semantics { role = ... }`.
117
+ - Don't override the framework focus order unless reading order is genuinely wrong; the tappable area must be at least as large as the visible element and never smaller than the touch-target minimum.
118
+
119
+ 7. **Lifecycle & side effects: run effects in the right hook, keyed correctly, and stop fighting recomposition.** A composable body / `body` runs *many* times — never do I/O, start timers, or mutate state there.
120
+ - iOS: `.task { await load() }` (auto-cancels on disappear) for async load; `.onAppear`/`.onDisappear` for non-async; `.onChange(of: query) { … }` for reactions. Don't kick network off in `body`.
121
+ - Compose: `LaunchedEffect(key)` for suspend work on enter / when `key` changes; `rememberCoroutineScope()` for event-triggered launches; `DisposableEffect` to register+`onDispose` cleanup; `derivedStateOf` to avoid recomposing on every upstream tick; `produceState` to bridge a callback into state. The **key** must include every input the effect depends on, or it goes stale.
122
+ - Stop needless recomposition: read VM state with `collectAsStateWithLifecycle()`; pass stable/`@Immutable` types and lambdas; hoist heavy reads out of `items{}`; defer rapidly-changing reads (scroll offset) with a lambda (`Modifier.offset { … }`) so only the layout phase reruns. SwiftUI equivalent: split big views so a small `@State` change invalidates a small subtree, give `ForEach` stable ids, mark expensive subviews `Equatable`.
123
+
124
+ 8. **Verify on a real simulator/emulator with previews + the accessibility inspectors** (see Verify) before declaring done — previews catch layout, the device catches lifecycle and gesture bugs previews can't.
125
+
126
+ ## Common Errors
127
+
128
+ - **`VStack`/`Column` for a long list.** Builds every child eagerly → jank and memory blowup. Use `LazyVStack`/`List` / `LazyColumn`.
129
+ - **No stable item key.** `LazyColumn` without `key=` (or `List` keyed by index) reorders/animates wrong and loses scroll on insert. Key by a stable id.
130
+ - **State owned in the leaf you want to reuse.** Child `@State`/internal `remember` for what the parent should control → can't lift, can't test, drifts out of sync. Hoist: `Binding` / `value`+`onValueChange`.
131
+ - **`remember { mutableStateOf(...) }` for screen state.** Lost on rotation/process death; doesn't survive nav. Put it in a `ViewModel` (or `rememberSaveable` for trivial UI bits).
132
+ - **Collecting flow with `.collectAsState()`** instead of `collectAsStateWithLifecycle()` — keeps collecting in the background, wasting work and risking stale UI. Use the lifecycle-aware one.
133
+ - **Side effect in `body`/composable body.** Network or `mutableStateOf` write during composition → infinite recomposition or duplicate loads. Move to `.task`/`LaunchedEffect`.
134
+ - **Wrong/empty `LaunchedEffect` key.** `LaunchedEffect(Unit)` that reads `id` never reloads when `id` changes; over-keyed restarts constantly. Key on exactly the inputs the effect uses.
135
+ - **Stringly-typed nav routes** (`navigate("detail/$id")` with manual parsing) — typos compile, args lose types, deep links break silently. Use type-safe routes / `value:` + `navigationDestination(for:)`.
136
+ - **Single shared back stack across tabs.** Switching tabs nukes the other tab's history. Give each tab its own `NavHost`/stack and save/restore it.
137
+ - **Hardcoded padding for the notch/status bar / ignoring `innerPadding`.** Content slides under the bar or the keyboard. Honor safe area / apply `Scaffold` `innerPadding` + `imePadding()`.
138
+ - **Fixed font sizes / `.lineLimit(1)` everywhere.** Truncates at large Dynamic Type, fails accessibility. Semantic text styles; allow wrap; scale-factor only as a last resort.
139
+ - **Hardcoded hex colors in views.** Dark mode shows white-on-white. Use semantic colors / `MaterialTheme.colorScheme` tokens with light+dark schemes.
140
+ - **Touch target no bigger than the icon.** A 24pt icon with no padding is a 24pt target. Pad to ≥44pt/48dp.
141
+ - **`contentDescription`/label missing on icon buttons, or set on decorative images.** Screen reader says "button" with no name, or narrates clutter. Label actionable elements; `null`/`.accessibilityHidden(true)` decoration.
142
+ - **Reading rapidly-changing state (scroll offset, animation) at composition scope.** Recomposes the whole subtree every frame. Read it in a lambda (`Modifier.offset { … }`) / use `derivedStateOf`.
143
+
144
+ ## Verify
145
+
146
+ 1. **Builds & previews render:** `xcodebuild -scheme <S> -destination 'platform=iOS Simulator,name=iPhone 15' build` / `./gradlew assembleDebug`. SwiftUI `#Preview` and Compose `@Preview` show light **and** dark variants without crashing.
147
+ 2. **List performance:** scroll a 500+ item list on device — no dropped frames; inserting/removing keeps scroll position. (Compose: Layout Inspector → recomposition counts stay flat per row while scrolling; a row recomposing on unrelated state changes is a fail.)
148
+ 3. **State hoisting holds:** toggle the child's control, confirm the parent's single source of truth updates and no duplicate/stale copy exists; rotate the device (or trigger config change) — state survives (VM/`rememberSaveable`), is not reset.
149
+ 4. **Navigation & deep link:** push → Back returns correctly; cold-launch the deep link (`xcrun simctl openurl booted app://item/42` / `adb shell am start -a android.intent.action.VIEW -d "app://item/42"`) lands on the right screen with a sane back stack; switch tabs and return — the other tab's stack is preserved.
150
+ 5. **Adaptivity:** run iPhone portrait, iPhone landscape, and iPad / a foldable (or resizable emulator dragged across 600dp and 840dp) — layout switches single↔two-pane at the size-class boundary, nothing clips or overlaps.
151
+ 6. **Dynamic Type / dark:** set the largest accessibility text size and dark mode (iOS Settings → Accessibility → Larger Text; emulator font scale 1.3+ / Dark theme) — no truncation, no white-on-white, all controls reachable.
152
+ 7. **Screen reader:** enable VoiceOver (Accessibility Inspector → audit) / TalkBack — swipe through: every actionable element announces a name + role, decorative content is skipped, focus order is logical, and no target is below 44pt/48dp (Xcode Accessibility Inspector audit / Compose `testTagsAsResourceId` + Accessibility Scanner report zero issues).
153
+
154
+ Done = the screen builds, previews render light+dark, a 500+ row list scrolls without dropped frames and without per-row recomposition on unrelated changes, state is hoisted and survives a config change, typed navigation + cold deep link land correctly with per-tab back stacks preserved, layout adapts across the size-class boundaries, and the accessibility inspector/scanner reports zero issues at the largest Dynamic Type / font scale.
@@ -0,0 +1,118 @@
1
+ ---
2
+ name: build-offline-first-sync
3
+ description: Designs offline-first client data layers — a local store (SQLite/Room/Core Data/WatermelonDB), a durable outbound mutation queue with idempotency keys, optimistic local writes, cursor-based delta pull, conflict resolution (last-writer-wins/vector clocks/CRDT), tombstone deletes, and reconnect reconciliation.
4
+ when_to_use: When an app must read/write while offline and reconcile with a server — choosing the local store, queuing offline mutations, pulling deltas since a cursor, resolving write conflicts. Distinct from manage-client-server-state (online cache/TanStack Query) and message-queue-jobs (server-side worker queues).
5
+ ---
6
+
7
+ ## When to Use
8
+
9
+ Reach for this when the client is the **source of truth while offline** and must converge with a server later, not just cache responses:
10
+
11
+ - "App has to work in airplane mode and sync when it reconnects"
12
+ - "Pick a local store — SQLite vs Room vs Core Data vs WatermelonDB"
13
+ - "Queue writes made offline and replay them in order without dupes"
14
+ - "Two devices edited the same row offline — who wins?"
15
+ - "Pull only what changed since last sync instead of refetching everything"
16
+ - "Deletes keep coming back after sync" (missing tombstones)
17
+ - "Optimistic edit, then roll back if the server rejects it"
18
+
19
+ NOT this skill:
20
+ - Online data fetching / cache invalidation with a live connection (TanStack/React Query, hydration, refetch) → manage-client-server-state
21
+ - The **server-side** worker that processes the sync queue (consumers, DLQ, exactly-once on the backend) → message-queue-jobs
22
+ - The shape of the sync API itself (REST vs GraphQL, pagination params, error envelopes) → rest-graphql-contract
23
+ - Changing the **server** schema the deltas come from (DDL locks, rollback) → db-migration-safety
24
+ - Identifying who the syncing user is / token refresh on reconnect → auth-jwt-session
25
+
26
+ ## Steps
27
+
28
+ 1. **Pick the local store by platform + reactivity need — don't reach for raw SQLite by reflex.**
29
+
30
+ | Store | Best when | Reactive queries | Migrations |
31
+ |---|---|---|---|
32
+ | **SQLite** (SQLDelight/Drift/expo-sqlite) | Cross-platform, you want real SQL + full control | Manual (triggers/`PRAGMA data_version`) or lib-provided | Hand-written `user_version` steps |
33
+ | Room (Android) | Native Android, Kotlin/Flow | `Flow`/`LiveData` built-in | `Migration` objects; avoid `fallbackToDestructiveMigration()`, which wipes local data |
34
+ | Core Data / SwiftData (Apple) | Native iOS, object graph + iCloud | `@FetchRequest`/`NSFetchedResultsController` | Lightweight (auto) vs mapping model |
35
+ | **WatermelonDB** (RN) | React Native, large datasets, lazy reads | Observables out of the box | `schemaMigrations` versioned |
36
+ | Realm/MongoDB Atlas Device Sync | You want sync *built in* and accept the lock-in | Live objects | Schema-versioned |
37
+
38
+ Default: **SQLite via a typed wrapper** (SQLDelight/Drift) for cross-platform; **WatermelonDB** for React Native with thousands of rows; native (Room/Core Data) only if single-platform. Avoid building your own sync on Realm Device Sync unless you adopt their whole model.
39
+
40
+ 2. **Add sync bookkeeping columns to every syncable table.** The on-device schema is the server schema **plus** local metadata:
41
+
42
+ ```sql
43
+ CREATE TABLE task (
44
+ id TEXT PRIMARY KEY, -- client-generated UUIDv7 (sortable), NOT server autoincrement
45
+ title TEXT NOT NULL,
46
+ updated_at INTEGER NOT NULL, -- server-assigned ms epoch on last sync (the LWW clock)
47
+ version INTEGER NOT NULL DEFAULT 0, -- server row version for optimistic concurrency
48
+ deleted_at INTEGER, -- tombstone; NULL = live
49
+ sync_status TEXT NOT NULL DEFAULT 'synced' -- synced | pending | conflict
50
+ );
51
+ ```
52
+ Generate IDs **on the client** (UUIDv7/ULID) so offline-created rows have a stable PK and FKs link before they ever reach the server — never depend on a server autoincrement id you don't have yet.
53
+
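A minimal sketch of the client-side ID generation described in step 2, following the UUIDv7 bit layout (48-bit millisecond timestamp, version nibble, variant bits, random tail). The function name `uuidv7` and its injectable parameters are illustrative assumptions, and the default random source assumes a runtime that exposes the Web Crypto global `crypto` (browsers, recent Node.js). A maintained UUID library is likely the safer choice in production.

```js
// Hypothetical helper: builds a UUIDv7 string from a timestamp and 10 random bytes.
// Both parameters are injectable so the layout can be checked deterministically.
function uuidv7(
  nowMs = Date.now(),
  rand = crypto.getRandomValues(new Uint8Array(10)) // assumes Web Crypto global
) {
  const bytes = new Uint8Array(16);

  // Bytes 0..5: 48-bit Unix timestamp in milliseconds, big-endian (makes IDs sortable).
  let ts = BigInt(nowMs);
  for (let i = 5; i >= 0; i--) {
    bytes[i] = Number(ts & 0xffn);
    ts >>= 8n;
  }

  // Bytes 6..15: random, then overwrite the version and variant bits.
  bytes.set(rand, 6);
  bytes[6] = 0x70 | (bytes[6] & 0x0f); // version 7 in the high nibble
  bytes[8] = 0x80 | (bytes[8] & 0x3f); // variant bits 10xxxxxx

  const hex = Array.from(bytes, (b) => b.toString(16).padStart(2, "0")).join("");
  return [
    hex.slice(0, 8),
    hex.slice(8, 12),
    hex.slice(12, 16),
    hex.slice(16, 20),
    hex.slice(20),
  ].join("-");
}

// Usage (not executed here): const id = uuidv7(); // store as task.id before any sync
```

Because the timestamp occupies the leading bytes, IDs created later compare greater as plain strings, which keeps primary-key inserts roughly append-only.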
54
+ 3. **Reads are local-first and reactive — the UI never awaits the network.** Every screen queries the local store and observes it (`Flow`, WatermelonDB observables, `NSFetchedResultsController`, or a SQLite change-notify). Filter out tombstones (`WHERE deleted_at IS NULL`) in the read layer, not the UI. Network sync mutates the local store; the reactive query repaints. Surface freshness from a `last_synced_at` you store per-collection, not per-render guesses.
55
+
56
+ 4. **Writes go optimistic + into a durable outbox in one transaction.** Apply the change to the domain table **and** append an op to `outbox` atomically, so a crash can't lose one without the other:
57
+
58
+ ```sql
59
+ CREATE TABLE outbox (
60
+ op_id TEXT PRIMARY KEY, -- idempotency key, client UUID, sent as Idempotency-Key header
61
+ entity TEXT NOT NULL,
62
+ entity_id TEXT NOT NULL,
63
+ op TEXT NOT NULL, -- insert | update | delete
64
+ payload TEXT NOT NULL, -- JSON of changed fields (delta, not whole row)
65
+ base_version INTEGER NOT NULL, -- server version the edit was based on (0 for an offline insert; for conflict detection)
66
+ created_at INTEGER NOT NULL,
67
+ attempts INTEGER NOT NULL DEFAULT 0
68
+ );
69
+ ```
70
+ Set the row's `sync_status='pending'`. **Coalesce** repeated edits to the same `entity_id` before send (collapse 5 title edits into the latest) so the queue doesn't replay every keystroke. A delete is an op with `op='delete'` that sets `deleted_at` locally — never `DELETE FROM` until the server confirms the tombstone.
71
+
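The coalescing rule in step 4 can be sketched as a pure function over the pending ops. This is an illustration under stated assumptions, not a definitive implementation: `coalesce` is a hypothetical name, `payload` is assumed to be already parsed from its JSON text, ops arrive oldest-first, only ops that were never sent (`attempts = 0`) are merged so an idempotency key is never reused with a different body, and no op follows a delete for the same row.

```js
// Hypothetical helper: collapse pending outbox ops to at most one op per row.
function coalesce(ops) {
  const byRow = new Map(); // Map keeps first-seen order, so replay order is preserved

  for (const op of ops) {
    const key = `${op.entity}:${op.entity_id}`;
    const prev = byRow.get(key);

    if (!prev) {
      byRow.set(key, { ...op, payload: { ...op.payload } });
      continue;
    }

    if (op.op === "delete") {
      if (prev.op === "insert") {
        // Created and deleted offline: the server never needs to hear about it.
        byRow.delete(key);
      } else {
        // Keep the earliest op_id and base_version, but send a delete.
        byRow.set(key, { ...prev, op: "delete", payload: {} });
      }
    } else {
      // insert+update stays an insert, update+update stays an update;
      // later field values overwrite earlier ones.
      prev.payload = { ...prev.payload, ...op.payload };
    }
  }

  return [...byRow.values()];
}
```

Keeping the earliest `base_version` matters: the merged edit is still based on the server state the user saw before the first change, which is what conflict detection compares against.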
72
+ 5. **Sync engine = push-then-pull, both idempotent, with bounded backoff.** Run on connectivity-gain and on an interval:
73
+ 1. **Push:** drain `outbox` oldest-first, `Idempotency-Key: {op_id}`. On `2xx`, apply the server's returned `{version, updated_at}` to the row, set `synced`, delete the op. On `409 Conflict`, go to step 6. On `5xx`/timeout, leave the op, bump `attempts`, retry with exponential backoff + jitter (`min(2^attempts * base, 60s)`).
74
+ 2. **Pull:** `GET /sync?since={cursor}&limit=500`, where `cursor` is the **server-issued** opaque cursor (or `updated_at` high-watermark) from the last successful pull. Apply each changed/deleted row, then persist the new `cursor` **only after** the whole page is committed. Page until `has_more=false`. Pull **after** push so the server already reflects your writes and you don't fight your own optimistic state.
75
+
76
+ Never pull before push, never advance the cursor mid-page, and order by `(updated_at, id)` server-side so pagination is stable under concurrent writes.
77
+
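The pull half of step 5 might look like the loop below. It is a sketch under assumptions: `fetchPage` and the `store` methods (`getCursor`, `setCursor`, `upsert`, `transaction`) are hypothetical interfaces, and `next_cursor` is an assumed response field name alongside the `has_more` flag mentioned above. Writing the cursor inside the same local transaction as the rows is one way to guarantee it never moves ahead of committed data.

```js
// Hypothetical pull loop: page through server deltas, committing each page atomically.
async function pullAll(fetchPage, store) {
  let cursor = store.getCursor(); // null on the very first sync
  let hasMore = true;

  while (hasMore) {
    // Assumed response shape: { rows, next_cursor, has_more }
    const page = await fetchPage(cursor);

    // Rows and cursor commit together; a crash before commit leaves the old
    // cursor in place, so the same page is fetched again on the next run.
    store.transaction(() => {
      for (const row of page.rows) store.upsert(row);
      store.setCursor(page.next_cursor);
    });

    cursor = page.next_cursor;
    hasMore = page.has_more;
  }

  return cursor;
}
```

Re-fetching a page after a crash is only safe when `upsert` is idempotent, which is why applying rows by primary key (rather than appending) matters here.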
78
+ 6. **Resolve conflicts deterministically — pick a strategy per entity, don't mix silently.**
79
+
80
+ | Strategy | Use when | Mechanism | Cost |
81
+ |---|---|---|---|
82
+ | **Last-Writer-Wins** | Independent scalar fields, low contention (default) | Compare `updated_at`; higher wins | trivial, can lose a field |
83
+ | Version / optimistic concurrency | Need to *detect* and merge, not silently drop | Server rejects with `409` if `base_version` ≠ current; client re-reads + replays | a round-trip per conflict |
84
+ | Vector clocks | Multi-device causal ordering matters | Per-replica counters; detect concurrent vs causal | bookkeeping per row |
85
+ | **CRDT** (Yjs/Automerge) | Collaborative text/lists, must merge without loss | Mergeable types converge automatically | larger payloads, library |
86
+
87
+ Default **field-level LWW** for most records; escalate to **CRDT only** for collaborative documents/lists. On `409`, the server returns the current row + version: re-base the pending op onto it (re-apply the user's delta to the latest server state), bump `base_version`, re-queue. If a field truly diverges, mark `sync_status='conflict'` and surface it — never drop a user's write without a trace.
88
+
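One possible shape for the `409` re-base in step 6 is a three-way comparison per field. This sketch assumes the client keeps a snapshot of the row as it was at `base_version` (`baseRow`), which the schema above does not include, and the function name `rebaseOn409` is hypothetical. A field counts as diverged when the server changed it since the base and the server's value differs from the user's.

```js
// Hypothetical re-base: reapply the user's delta onto the server's current row,
// or report the fields that both sides changed to different values.
function rebaseOn409(op, baseRow, serverRow) {
  const diverged = Object.keys(op.payload).filter(
    (field) =>
      serverRow[field] !== baseRow[field] && // server changed it since the base
      serverRow[field] !== op.payload[field] // and not to the same value
  );

  if (diverged.length > 0) {
    // Surface to the user instead of silently picking a winner.
    return { status: "conflict", fields: diverged };
  }

  return {
    status: "pending",
    // Server state first, then the user's delta on top.
    row: { ...serverRow, ...op.payload, sync_status: "pending" },
    // Re-queue against the version the server just reported.
    op: { ...op, base_version: serverRow.version },
  };
}
```

Fields the user did not touch always take the server value, which is what makes this field-level rather than whole-row.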
89
+ 7. **Reconnect & reconciliation.** Detect connectivity transitions (`NWPathMonitor` / `ConnectivityManager` / `@react-native-community/netinfo`) — treat reachability as "maybe", confirm with the first real request, don't trust the radio flag alone. On reconnect: push outbox → pull deltas → clear stale `pending` that the server now confirms. Cap `attempts`; an op that exceeds the cap (e.g. permanent `400`/`422`) moves to a **client-side dead-letter / `conflict`** state and is surfaced to the user — it must not block the rest of the queue (head-of-line blocking).
90
+
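The retry and dead-letter rules from steps 5 and 7 can be condensed into one decision function. This is a sketch, not a prescribed policy: `classifyPush`, its option names, and the default cap of 8 attempts are assumptions, status `0` is assumed to stand for a timeout or network error, and treating `408`/`429` as retryable is a choice made here rather than something the steps above state.

```js
// Hypothetical decision function for one push response.
function classifyPush(
  status,
  attempts,
  { maxAttempts = 8, baseMs = 1000, capMs = 60000, rand = Math.random } = {}
) {
  if (status >= 200 && status < 300) return { action: "ack" };
  if (status === 409) return { action: "conflict" };

  // Permanent client errors will never succeed on retry.
  const permanent =
    status >= 400 && status < 500 && status !== 408 && status !== 429;
  if (permanent || attempts + 1 >= maxAttempts) {
    return { action: "dead-letter" };
  }

  // min(2^attempts * base, cap), then full jitter so clients do not retry in lockstep.
  const ceiling = Math.min(2 ** attempts * baseMs, capMs);
  return {
    action: "retry",
    attempts: attempts + 1,
    delayMs: Math.floor(rand() * ceiling),
  };
}
```

A dead-lettered op is removed from the head of the queue, so the ops behind it keep draining.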
91
+ 8. **Integrity: make partial sync recoverable.** Persist the cursor only after a page fully commits, so a crash mid-pull re-fetches that page (idempotent apply makes re-fetch safe). Dedupe on apply by `(id)` + `version` — ignore an incoming row whose `version` ≤ local. Negotiate schema with a `schema_version` in the sync request; on mismatch the server returns `426 Upgrade Required` and the client forces an app update rather than corrupting data. Run local migrations (`user_version` / `schemaMigrations`) **before** the first sync after an app upgrade.
92
+
93
+ ## Common Errors
94
+
95
+ - **Server-autoincrement PKs for offline-created rows.** You can't link FKs or reference the row until the server replies. Generate UUIDv7/ULID on the client; keep that id forever.
96
+ - **Hard-deleting locally instead of tombstoning.** The next pull from another device re-creates the row (it never saw the delete). Set `deleted_at`, sync the tombstone, GC tombstones only after all clients have pulled past them.
97
+ - **Advancing the sync cursor before the page is committed.** A crash mid-apply skips rows permanently — silent data loss. Commit the page, *then* persist the cursor.
98
+ - **No idempotency key on push.** A retried op after a timeout (where the server actually succeeded) double-applies — duplicate rows / double charges. `Idempotency-Key: {op_id}`, server dedupes.
99
+ - **Replaying every keystroke from the outbox.** 200 ops to sync one note. Coalesce ops per `entity_id` before push.
100
+ - **Pull before push.** The server hasn't seen your local edits yet, so the delta overwrites your optimistic state and the UI flickers back. Always push first.
101
+ - **Trusting the OS "connected" flag.** Captive portals and dead Wi-Fi report "connected". Confirm with an actual lightweight request before draining the queue.
102
+ - **Unbounded retries on a permanent `4xx`.** A `422` op retries forever and head-of-line-blocks every later op. Cap attempts; dead-letter the poison op; keep draining the rest.
103
+ - **Last-Writer-Wins on a whole row.** One device edits `title`, another edits `due_date`; whole-row LWW silently drops one field. Do **field-level** LWW or merge.
104
+ - **Ignoring clock skew in LWW.** Client clocks lie. Use the **server-assigned** `updated_at` as the LWW clock, not the device clock.
105
+ - **Migrating local schema after the first sync.** Incoming rows don't fit the old schema → crash or silent drop. Migrate on app start, before sync runs.
106
+
107
+ ## Verify
108
+
109
+ 1. **Airplane-mode write survives restart:** Go offline, create + edit + delete records, force-quit and relaunch → all local changes still present, `outbox` intact, `sync_status='pending'`.
110
+ 2. **Reconnect drains correctly:** Re-enable network → outbox empties, every op acked, rows flip to `synced`, server reflects every offline change exactly once (no dupes — proves idempotency keys work).
111
+ 3. **Delta pull is incremental:** Trigger a remote change, sync → only the changed rows transfer (inspect request: `since={cursor}` with a non-empty cursor, response page << full table). A second sync with no remote changes transfers zero rows.
112
+ 4. **Conflict resolves deterministically:** Two clients edit the same row offline, both reconnect → result matches the documented strategy (field-level LWW: each field keeps the value with the latest server `updated_at`; CRDT: both edits merged), and no write vanishes silently; any divergence shows as `conflict`.
113
+ 5. **Tombstone delete stays deleted:** Delete on device A, sync; device B syncs → row disappears on B and does **not** resurrect on A's next pull.
114
+ 6. **Flaky network / mid-sync kill:** Throttle to 2G + 30% packet loss (Network Link Conditioner / Charles), kill the app mid-pull → relaunch re-fetches the uncommitted page, converges, no duplicate or missing rows; cursor never advanced past uncommitted data.
115
+ 7. **Poison op doesn't block the queue:** Inject an op the server rejects with `422` → it dead-letters after the attempt cap and surfaces to the user; every other queued op still syncs.
116
+ 8. **Schema-version mismatch is safe:** Point an old client at a newer server → `426`/upgrade path, not a corrupt write or crash.
117
+
118
+ Done = a record created **offline** survives an app restart, syncs **exactly once** on reconnect, incremental pull transfers only deltas since the cursor, concurrent edits resolve per the documented strategy with **no silent data loss**, deletes stay deleted, and a mid-sync kill under a flaky network converges with no duplicate or missing rows.
@@ -0,0 +1,122 @@
1
+ ---
2
+ name: build-realtime-channel
3
+ description: Builds realtime push channels over WebSocket/SSE — auth-on-connect, heartbeat/zombie eviction, topic subscribe/publish with per-topic authz and presence, sequence-numbered resume for missed-message recovery, client reconnect with backoff+jitter, and a Redis/NATS pub/sub backplane with send-buffer limits for horizontal scale.
4
+ when_to_use: Adding live updates (chat, notifications, live dashboards, collaborative cursors, feeds), choosing WebSocket vs SSE vs long-poll, or fixing a channel that drops messages, leaks connections, thunders on reconnect, or can't scale past one server. Distinct from message-queue-jobs (durable server-to-server work queues) and manage-client-server-state (client cache/refetch, not the transport).
5
+ ---
6
+
7
+ ## When to Use
8
+
9
+ Reach for this skill when the request is about **pushing live data to clients over a long-lived connection**:
10
+
11
+ - "Push notifications / chat messages / order updates to the browser in realtime"
12
+ - "Build a live dashboard / activity feed / collaborative cursors that updates without polling"
13
+ - "Should this be WebSocket, SSE, or long-poll?"
14
+ - "Our socket drops messages after a reconnect" / "clients miss updates while disconnected"
15
+ - "Connections leak — server FD count climbs and never drops" / "zombie sockets pile up"
16
+ - "On deploy every client reconnects at once and melts the box" (thundering herd)
17
+ - "Realtime works on one node but breaks behind a load balancer / can't scale out"
18
+
19
+ NOT this skill:
20
+ - Durable server-to-server work queues, retries, dead-letter, exactly-once job processing → message-queue-jobs (this skill is at-most/at-least-once *push to clients*, not a job system)
21
+ - Native mobile push via APNs/FCM, device-token registration, woken-from-killed delivery → implement-push-notifications (OS push to a closed app; this skill is an open in-app socket/stream)
22
+ - Client-side cache, refetch, optimistic UI, query invalidation → manage-client-server-state (that's what the client *does* with pushed data; this is the wire)
23
+ - Issuing/validating the token itself, refresh, session rotation → auth-jwt-session (this skill *consumes* a token on connect, it doesn't mint it)
24
+ - Capping how many connects/messages a client may send → rate-limiting
25
+ - Races inside your fan-out/handler code (shared mutable state, missing await) → async-concurrency-correctness
26
+ - Metrics/tracing/log wiring for the channel → observability-instrument
27
+
28
+ ## Steps
29
+
30
+ 1. **Pick the transport by directionality — do not default to WebSocket.** Most "realtime" needs are server→client only.
31
+
32
+ | Transport | Use when | Cost / caveat |
33
+ |---|---|---|
34
+ | **SSE** (`text/event-stream`) | Server→client only (feeds, notifications, dashboards, token streaming). **Default for one-way.** | One HTTP/1.1 conn per stream (use HTTP/2 to avoid 6-conn cap); auto-reconnect + `Last-Event-ID` built in; no binary |
35
+ | **WebSocket** | True bidirectional, low-latency, high message rate (chat, presence, games, collaborative editing) | Manual heartbeat/reconnect/resume; no auto-resume; proxies/LBs need explicit `Upgrade` support |
36
+ | **Long-poll** | Fallback only when SSE/WS are blocked (ancient proxy, locked-down corp net) | High overhead, ~1 msg per round trip; keep as graceful degradation, not primary |
37
+
38
+ Rule: one-way → **SSE**; bidirectional or >~10 msg/s/client → **WebSocket**; long-poll only as fallback. Don't hand-roll if a maintained lib fits — `Socket.IO` (built-in reconnect+rooms+fallback), `Phoenix Channels` (presence+backpressure+cluster PubSub out of the box), `Centrifugo` (standalone server, history/recovery built in), `SignalR` (.NET). Raw `ws`/SSE only when you need full control and will build lifecycle yourself.
39
+
40
+ 2. **Authenticate on connect — never in the query string.** A token in `?token=...` lands in access logs, proxy logs, and `Referer`. Validate *before* upgrading.
41
+ - **WebSocket:** pass the token via the `Sec-WebSocket-Protocol` subprotocol header, or require an authenticated **cookie** (sent automatically on the upgrade), or accept an unauthenticated socket and require an `auth` frame as the **first message** within a short deadline (≤5s) or close.
42
+ - **SSE:** EventSource can't set headers — use a same-site auth **cookie**, or a short-lived single-use ticket fetched over a normal authed request then passed once.
43
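+ - On bad/expired token: non-browser clients can be refused with HTTP `401` at the handshake, but browser code sees a refused handshake only as a generic failure (close `1006`). Where the client must learn the reason, accept the socket and immediately close it with WS code **`4401`** (app range; reserve `4403` for authz failure). For SSE, respond **`401`** before the stream opens. Re-check token expiry on long-lived sockets; close when it lapses.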
44
+
45
+ ```js
46
+ // WS upgrade with subprotocol-carried token (client)
47
+ new WebSocket("wss://api.example.com/ws", ["bearer", token]);
48
+ // server: read token from Sec-WebSocket-Protocol, verify, then accept, echoing only "bearer" (never the token)
49
+ ```
50
+
51
+ 3. **Run the full connection lifecycle — this is where leaks live.**
52
+ - **Heartbeat:** WS — server sends `ping` every **30s**, expects `pong`; if 2 missed (60s), terminate the socket (a half-open TCP conn looks alive to the OS but is dead). SSE — emit a comment line `:keep-alive\n\n` every 15–30s so proxies don't idle-close. `ws` clients that never `pong` are zombies; an `isAlive` flag flipped false on each ping and reset on pong evicts them.
53
+ - **Zombie eviction:** sweep on an interval; `socket.terminate()` (not `.close()`) anything that failed the heartbeat. Track open connections in a registry so you can count and reap them.
54
+ - **Graceful drain on deploy:** on `SIGTERM`, stop accepting new connections, send a `going_away` app message (so clients reconnect *staggered*, not instantly), then close with code **`1001`** after a grace window. Never `kill -9` a live channel node — every client stampedes back at once.
55
+
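The heartbeat and eviction rules in step 3 can be sketched as a sweep over a connection registry. The socket objects are assumed to expose `ping()` and `terminate()` as the `ws` library does; `trackSocket`, `onPong`, `heartbeatSweep`, and the `missedPongs` field are hypothetical names. A counter is used instead of a boolean `isAlive` flag so that eviction happens after two unanswered pings, matching the 60s figure above.

```js
const MAX_MISSED = 2; // two unanswered pings at a 30s interval is about 60s

// Register a new connection so it can be counted and reaped.
function trackSocket(registry, socket) {
  socket.missedPongs = 0;
  registry.add(socket);
}

// Call from the socket's pong handler.
function onPong(socket) {
  socket.missedPongs = 0;
}

// Run on an interval (for example every 30s).
function heartbeatSweep(registry) {
  for (const socket of registry) {
    if (socket.missedPongs >= MAX_MISSED) {
      // terminate(), not close(): a dead peer never completes the close handshake.
      socket.terminate();
      registry.delete(socket); // deleting during Set iteration is safe in JS
      continue;
    }
    socket.missedPongs += 1; // assume missed until the pong arrives
    socket.ping();
  }
}
```

With a real server, the sweep would typically be scheduled with `setInterval` and cleared on shutdown, and `registry.size` doubles as the open-connection gauge.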
56
+ 4. **Model subscriptions as topics with per-topic authz, and add presence.** A connection is not a subscription. Let a client subscribe to named topics/channels (`chat:room:42`, `user:7:notifications`) over one socket.
57
+ - **Authorize every subscribe** against the *current* user — a connection authed as user 7 must not subscribe to `user:9:*`. Check on subscribe, not just on connect; deny with an error frame, don't silently drop.
58
+ - **Namespace topics** so wildcards can't leak (`org:{id}:...`). Reject subscribe to topics the user can't read.
59
+ - **Presence:** maintain a per-topic set of members in the backplane (Redis `SET`/hash keyed by topic, member = `{userId, connId}`); broadcast `join`/`leave` on change. Tie membership to the connection so a dropped socket auto-removes the member (TTL-backed, refreshed by heartbeat — otherwise a crashed client lingers as "online" forever).
60
+
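A per-subscribe authorization check for step 4 might be as small as the function below. It is a deny-by-default sketch: `canSubscribe` and the `user.orgIds` field are assumptions, only the `user:` and `org:` namespaces are handled, and a topic such as `chat:room:42` would need a room-membership lookup that is not shown.

```js
// Hypothetical authz check, run on every subscribe frame (not only on connect).
function canSubscribe(user, topic) {
  // Never let a client-supplied wildcard widen the subscription.
  if (topic.includes("*")) return false;

  const [scope, id] = topic.split(":");

  if (scope === "user") return id === String(user.id);
  if (scope === "org") return user.orgIds.includes(id);

  // Unknown namespaces are denied until a rule is written for them.
  return false;
}
```

On a `false` result the server would reply with an error frame for that subscribe request rather than dropping it silently.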
61
+ 5. **Make missed-message recovery explicit with sequence numbers + a resume cursor — decide the delivery guarantee up front.** Default to **at-least-once + client dedup**, not "best effort."
62
+ - Stamp every message per-topic with a monotonic **`seq`** (and an event `id`). Keep a bounded **history buffer** per topic (e.g. last N=1000 or last 5 min) in Redis (`XADD` to a stream, or a capped list).
63
+ - On (re)subscribe the client sends its **last seen `seq`** (`resume_from`); server replays buffered events `> resume_from` then switches to live. SSE gets this for free: the browser auto-sends **`Last-Event-ID`** on reconnect — honor it and replay.
64
+ - If the gap exceeds the buffer, send a **`reset`/snapshot-required** signal so the client refetches full state instead of silently missing data.
65
+ - Guarantee table — pick one and document it:
66
+
67
+ | Guarantee | Mechanism | Cost |
68
+ |---|---|---|
69
+ | Best-effort (at-most-once) | Fire-and-forget, no buffer | Drops on any disconnect — only for ephemeral (live cursor pos) |
70
+ | **At-least-once + dedup** | seq + history buffer + resume cursor; client drops `seq ≤ lastSeen` | **Default.** Bounded buffer mem; client must dedup |
71
+ | Exactly-once *delivery* | Don't. Use at-least-once + idempotent client apply | True E2E exactly-once is a distributed-systems tax you don't need |
72
+
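The at-least-once row of the table can be sketched with an in-memory stand-in for the per-topic history buffer. In a multi-node deployment this state would live in the backplane (for example a Redis stream); the class `TopicHistory`, the `reset` flag, and `acceptEvent` are hypothetical names used only for illustration.

```js
// Hypothetical bounded history for one topic.
class TopicHistory {
  constructor(capacity = 1000) {
    this.capacity = capacity;
    this.seq = 0; // last sequence number issued
    this.events = []; // oldest first
  }

  publish(data) {
    const event = { seq: ++this.seq, data };
    this.events.push(event);
    if (this.events.length > this.capacity) this.events.shift();
    return event;
  }

  // Replay everything after resumeFrom, or ask the client to refetch state.
  resume(resumeFrom) {
    const oldest = this.events.length ? this.events[0].seq : this.seq + 1;
    const gapTooLarge = resumeFrom < oldest - 1; // needed events already evicted
    const unknownCursor = resumeFrom > this.seq; // cursor this buffer never issued
    if (gapTooLarge || unknownCursor) return { reset: true, events: [] };
    return { reset: false, events: this.events.filter((e) => e.seq > resumeFrom) };
  }
}

// Client side: drop anything at or below the last seen seq for the topic.
function acceptEvent(lastSeq, topic, event) {
  if (event.seq <= (lastSeq.get(topic) ?? 0)) return false;
  lastSeq.set(topic, event.seq);
  return true;
}
```

The client-side check is what turns at-least-once delivery into effectively-once application: a replay that overlaps with live traffic is harmless because duplicates are discarded by `seq`.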
73
+ 6. **Reconnect on the client with backoff + jitter, then resubscribe and dedup.** A fixed-delay or zero-delay reconnect loop is how one deploy becomes a self-DDoS.
74
+
75
+ ```js
76
+ // exponential backoff, full jitter, cap 30s — applies to WS and SSE-with-manual-reconnect
77
+ let attempt = 0;
78
+ function reconnect() {
79
+ const base = Math.min(30000, 1000 * 2 ** attempt++);
80
+ const delay = Math.random() * base; // full jitter — spreads the herd
81
+ setTimeout(connect, delay);
82
+ }
83
+ // on open: attempt = 0; resubscribe all topics with resume_from=lastSeq[topic];
84
+ // drop any replayed event whose seq <= lastSeq[topic] (dedup)
85
+ ```
86
+ Reset the attempt counter on a *successful* open, resubscribe every topic with its own `resume_from`, and dedup replayed events by `seq`. Stop reconnecting on a fatal close code (`4401`/`4403`) — don't hammer a server that rejected your auth.
87
+
88
+ 7. **Scale horizontally with a pub/sub backplane + per-connection backpressure.** A second node means a publish on node A must reach a subscriber on node B.
89
+ - **Stateless + backplane (preferred):** each node holds its own sockets; publishes go to **Redis Pub/Sub**, **Redis Streams**, or **NATS**; every node subscribes and fans out to its local sockets for that topic. No sticky sessions needed for WS (the socket stays pinned to one node by TCP anyway); for SSE/long-poll across nodes you still need either a backplane or sticky routing.
90
+ - **Sticky sessions:** only needed for handshake-split transports (Socket.IO long-poll→WS upgrade must hit the same node) — set LB affinity or force `transports: ['websocket']`. Prefer stateless+backplane over relying on stickiness.
91
+ - **Backpressure / slow-consumer:** a client that reads slower than you write balloons the per-socket send buffer and OOMs the node. Cap it: watch `ws.bufferedAmount` (or your lib's queue depth); if it exceeds a threshold (e.g. 1–4 MB), **drop the slow consumer** (close `1013`/`4408`) rather than buffer unboundedly. For SSE, the same applies to the response stream's write backpressure. One slow consumer must never degrade the rest.
92
+
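Local fan-out with a slow-consumer cap, as described in step 7, could be sketched as follows. The sockets are assumed to expose `bufferedAmount`, `send()`, and `close(code, reason)` as `ws` sockets do; `fanOut`, `localSubs` (a `Map` from topic to a `Set` of sockets), and the 1 MiB default threshold are assumptions for illustration.

```js
// Hypothetical fan-out: deliver a backplane message to this node's subscribers,
// dropping any socket whose send buffer has grown past the cap.
function fanOut(localSubs, topic, message, maxBuffered = 1024 * 1024) {
  const subscribers = localSubs.get(topic);
  const dropped = [];
  if (!subscribers) return dropped;

  for (const socket of subscribers) {
    if (socket.bufferedAmount > maxBuffered) {
      // 1013 = "Try Again Later"; the client can reconnect and resume from its cursor.
      socket.close(1013, "slow consumer");
      subscribers.delete(socket);
      dropped.push(socket);
      continue;
    }
    socket.send(message);
  }

  return dropped;
}
```

Returning the dropped sockets makes it easy to increment a dropped-slow-consumers metric at the call site.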
93
+ 8. **Load + soak test, then observe.** Single-connection tests prove nothing about a channel (see Verify). Wire metrics (open conns, msgs/s, send-buffer high-water, reconnect rate, dropped-slow-consumers) via observability-instrument.
94
+
95
+ ## Common Errors
96
+
97
+ - **Token in the query string.** `?token=...` leaks into access/proxy logs and `Referer`. Use subprotocol header, auth cookie, or a first-frame `auth` message.
98
+ - **No heartbeat → silent half-open sockets.** TCP keepalive defaults to ~2h; a dead peer looks connected for hours. App-level ping/pong (30s) + terminate on miss is mandatory.
99
+ - **`.close()` instead of `.terminate()` on a zombie.** `close()` waits for a close handshake the dead peer will never send, so the FD lingers. Terminate failed-heartbeat sockets.
100
+ - **Unbounded per-socket send buffer.** One slow/paused client grows `bufferedAmount` until the node OOMs and takes down *every* connection. Cap buffer; drop the slow consumer.
101
+ - **No jitter on reconnect.** All clients back off on the same schedule and reconnect in lockstep after an outage/deploy: a synchronized thundering herd. Add full jitter.
102
+ - **Instant/zero-delay reconnect loop.** A server that closes on every connect gets hammered thousands of times/sec. Always back off; stop on fatal auth close codes.
103
+ - **Treating a connection as a subscription / authz only on connect.** A long-lived socket can request any topic later; authorize every `subscribe` against the current user, namespace topics, deny cross-tenant.
104
+ - **No sequence numbers → "messages disappear" after reconnect.** Without `seq` + history + resume there's no way to recover the gap; clients silently miss data. Stamp seq, buffer, replay from cursor (or `Last-Event-ID`).
105
+ - **In-memory subscriptions/presence with >1 node.** A publish on node A never reaches node B's subscribers; presence shows half the users. Use a Redis/NATS backplane; back presence with a TTL'd shared store.
106
+ - **Presence that never clears.** Membership tied to a clean disconnect only — a crashed client stays "online" forever. TTL the entry, refresh on heartbeat.
107
+ - **No graceful drain on deploy.** Killing a node drops every socket simultaneously and they all reconnect at once. SIGTERM → stop accepts → `going_away` (staggered reconnect) → close `1001`.
108
+ - **Relying on sticky sessions to "fix" scale.** Stickiness papers over a missing backplane and breaks the moment a node dies (all its clients fail over to a node that doesn't know their topics). Make nodes stateless + backplane; use stickiness only for handshake-split transports.
109
+ - **SSE without keep-alive comments.** Idle proxies/LBs close a quiet `text/event-stream` after their timeout. Emit `:keep-alive` every 15–30s.
110
+
111
+ ## Verify
112
+
113
+ 1. **Auth on connect:** connect with no/expired/forged token → rejected before the stream opens (WS close `4401`, SSE `401`); token never appears in `nginx`/access logs. Cross-tenant `subscribe` → denied with an error frame, not silently dropped.
114
+ 2. **Heartbeat + zombie eviction:** `tc`/`iptables`-drop a client's traffic (simulate half-open) → server detects via missed pong within ~60s and `terminate()`s it; server open-connection gauge returns to baseline (no FD leak). Re-run 100× in a loop — count must not climb.
115
+ 3. **Missed-message recovery:** subscribe, record `seq`, kill the client, publish 50 events, reconnect with `resume_from`/`Last-Event-ID` → client receives exactly those 50 in order, zero gaps, zero dupes after dedup. Exceed the buffer → client gets a `reset`/snapshot signal (not a silent gap).
116
+ 4. **Reconnect storm (thundering herd):** connect 5–10k clients, restart/kill the node → reconnects spread over the backoff window (full-jitter histogram, not a spike); server stays up; all topics resubscribed. With zero jitter this test must visibly fail (then pass after adding jitter).
117
+ 5. **Horizontal fan-out:** run ≥2 nodes behind an LB; a subscriber on node B receives a message published to node A → proves the backplane works, not just one box. Kill node A mid-stream → its clients fail over to node B and resume from cursor with no lost messages.
118
+ 6. **Slow-consumer isolation:** one client stops reading (pause the socket) while others stay live → the slow one is dropped (`bufferedAmount` cap hit, close `1013`/`4408`); all other clients keep flowing with no latency spike; node memory stays flat.
119
+ 7. **Graceful drain:** send the node `SIGTERM` under load → clients get `going_away` then close `1001`, reconnect staggered to another node, miss zero messages (resume covers the gap).
120
+ 8. **Load/soak:** drive target concurrent connections at peak msg/s with a WS/SSE load tool (`k6` `ws`/SSE, `artillery`, `vegeta` for SSE) for ≥30 min → p99 delivery latency within budget; open-conn count, memory, and send-buffer high-water are flat (no leak/creep).
121
+
122
+ Done = auth-on-connect (no query-string token), heartbeat-driven zombie eviction with flat FD/conn count under churn, sequence-based resume recovers every missed message with no dupes, reconnect uses backoff+jitter, fan-out works across ≥2 nodes via the backplane, slow consumers are dropped without affecting others, and a ≥30-min soak shows no leak.
@@ -0,0 +1,131 @@
1
+ ---
2
+ name: build-vector-search
3
+ description: Builds semantic/vector search — pick an embedding model + dimensionality (and whether to truncate Matryoshka dims) and the matching distance metric (cosine/dot/L2, normalize to unit length so cosine == dot and IP is correct), an ANN index with the recall/latency/memory tradeoff understood (HNSW M/efConstruction/efSearch for low-latency RAM-resident; IVF-PQ nlist/nprobe/PQ for billion-scale compressed; flat/exact for <100k) in pgvector/Qdrant/Milvus/FAISS/Pinecone, chunking + overlap + per-chunk metadata for filtering, HYBRID retrieval fusing BM25 + dense by Reciprocal Rank Fusion (RRF, k≈60) not score addition, a cross-encoder/Cohere reranker over the top-50→k, correct pre-filter-vs-ANN interaction (filterable HNSW, not post-filter that starves k), and offline eval with recall@k / nDCG@10 / MRR against a labeled qrels set. Quantize (scalar/PQ) only after measuring recall loss; tune efSearch/nprobe to a recall target, not a guess.
4
+ when_to_use: Building or tuning the embedding + vector-index + retrieval-quality core — choosing an embedding model/dim/metric, sizing/tuning an HNSW or IVF-PQ index for a recall@k target, adding hybrid (BM25+vector via RRF) or a reranker, fixing pre-filtering that tanks recall, or running a recall@k/nDCG eval. Distinct from rag-pipeline (the full retrieve-augment-generate app — prompt assembly, grounding, citations, hallucination control; this skill is the retrieval engine it embeds) and design-search-index-infra (the lexical/inverted-index + cluster topology + zero-downtime reindex infra; this skill owns the embedding model, distance metric, ANN params, and relevance eval rather than shard/analyzer/capacity design).
5
+ ---
6
+
7
+ ## When to Use
8
+
9
+ Reach for this skill when the task is the **quality and mechanics of vector retrieval itself** — embeddings, the ANN index, hybrid/rerank, and measuring relevance:
10
+
11
+ - "Pick an embedding model + dimensionality + distance metric for semantic search"
12
+ - "Our ANN search misses obvious matches" / "tune HNSW/IVF for recall@10 without blowing latency"
13
+ - "pgvector / Qdrant / Milvus / FAISS / Pinecone — which index and what parameters?"
14
+ - "Add hybrid search (BM25 + vector) and a reranker" / "results are semantically close but wrong-ranked"
15
+ - "Filtering by metadata returns too few results / wrong ones" (pre-filter vs ANN)
16
+ - "How do I know retrieval got better?" → recall@k / nDCG / MRR eval
17
+ - "Quantize to fit in RAM" / "embeddings cost/latency too high"
18
+
19
+ NOT this skill:
20
+ - The end-to-end **retrieve→augment→generate** app — prompt assembly, context packing, grounding, citations, hallucination control → rag-pipeline (this skill is the retrieval core it calls; tune retrieval here, wire the LLM there)
21
+ - **Lexical/inverted-index** search infra — Elasticsearch/OpenSearch analyzers & mappings, shard/replica topology, capacity sizing, alias-based zero-downtime reindex → design-search-index-infra (it owns BM25 analyzer config + cluster ops; this skill owns the embedding model, metric, ANN params, and relevance eval)
22
+ - Measuring **LLM answer** quality (faithfulness, answer correctness, LLM-as-judge) → llm-eval-harness (this skill evals *retrieval* — recall@k/nDCG — not generation)
23
+ - Cutting **embedding/inference** cost & latency at the model/serving layer (batching, caching, model size) → optimize-llm-cost-latency
24
+ - The BM25/keyword half as a standalone full-text feature with no vectors → design-search-index-infra
25
+ - Picking a document/KV store schema unrelated to vectors → model-nosql-data; relational schema for the metadata table → design-relational-schema
26
+ - Profiling the corpus before indexing (length distribution, dupes, language mix) → profile-dataset
27
+
28
+ ## Steps
29
+
30
+ 1. **Pick the embedding model, dimensionality, and distance metric together — they're coupled.** Don't default to `text-embedding-ada-002` (legacy). 2025-2026 strong choices:
31
+
32
+ | Model | Dim | Notes |
33
+ |---|---|---|
34
+ | OpenAI `text-embedding-3-large` | 3072 (truncatable to 256/1024) | Matryoshka — truncate then **re-normalize**; strong general |
35
+ | OpenAI `text-embedding-3-small` | 1536 (truncatable) | cheap, good baseline |
36
+ | Cohere `embed-v3` / `embed-v4` | 1024 (v3); v4 is configurable, 1536 by default | has `input_type` (query vs document); use it |
37
+ | `BAAI/bge-large-en-v1.5`, `intfloat/e5-large-v2` | 1024 | open, self-host; E5 **requires a prefix** (`query: ` / `passage: `) and omitting it craters recall; BGE instead takes an instruction string on the query side only |
38
+ | `BAAI/bge-m3` | 1024 | multilingual + multi-vector |
39
+ | Voyage `voyage-3` | 1024 | strong retrieval, code/domain variants |
40
+
41
+ Rules: **embed the query and the document with the SAME model** (and the right `input_type`/prefix). Higher dim ≈ better recall but more RAM/latency — Matryoshka models let you truncate (e.g. 3072→1024) and trade recall for cost; **re-normalize after truncating**. Metric choice:
42
+
43
+ | Metric | Use when | pgvector op | Note |
44
+ |---|---|---|---|
45
+ | **Cosine** | text embeddings (default) | `<=>` (`vector_cosine_ops`) | direction only |
46
+ | **Dot / inner product** | already unit-normalized vectors | `<#>` (negative IP) | == cosine when normalized; faster |
47
+ | **L2 / Euclidean** | rarely for text; some image models | `<->` (`vector_l2_ops`) | magnitude matters |
48
+
49
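+ **Normalize embeddings to unit length once at write time**, then cosine == dot and you can use the faster IP path. Pick the index opclass to match the metric: cosine ignores magnitude, so the real failure mode is an inner-product (or L2) index over un-normalized vectors, which silently ranks by vector length as well as direction.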
50
+
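A small numeric sketch of the normalization rule above: after scaling to unit length, the dot product equals the cosine similarity, and a Matryoshka-style truncation needs the same step applied again. The helper names (`dot`, `l2Normalize`, `truncateAndNormalize`, `cosine`) are illustrative, and plain arrays stand in for real embeddings.

```js
function dot(a, b) {
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += a[i] * b[i];
  return sum;
}

// Scale a vector to unit length; a zero vector is returned unchanged.
function l2Normalize(v) {
  const norm = Math.sqrt(dot(v, v));
  return norm === 0 ? v.slice() : v.map((x) => x / norm);
}

// Keep the first `dims` components, then re-normalize: the truncated
// prefix of a unit vector is generally no longer unit length.
function truncateAndNormalize(v, dims) {
  return l2Normalize(v.slice(0, dims));
}

function cosine(a, b) {
  return dot(a, b) / (Math.sqrt(dot(a, a)) * Math.sqrt(dot(b, b)));
}

// With unit vectors, the cheaper dot product gives the same ranking as cosine.
const a = [3, 4];
const b = [4, 3];
const viaCosine = cosine(a, b);
const viaDot = dot(l2Normalize(a), l2Normalize(b));
```

Both `viaCosine` and `viaDot` come out to 24/25 here (up to floating-point rounding), which is why normalizing once at write time lets queries use the inner-product path.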
51
+ 2. **Chunk with structure, sized to the model, with overlap and metadata — bad chunks cap recall before any tuning.** Defaults: ~**256–512 tokens** per chunk, **10–15% overlap** (~50–80 tokens) so a fact split across a boundary survives. Split on **semantic boundaries** (headings, paragraphs, code blocks, `RecursiveCharacterTextSplitter` by separator hierarchy) — never a blind fixed char window mid-sentence. Stamp every chunk with metadata for filtering and citation: `{doc_id, chunk_id, source, title, section, page, created_at, tenant_id, lang}`. Consider **late chunking** (embed the long context, then pool per-chunk) or a parent-document retriever (embed small, return the larger parent) when chunks lose context. One vector per chunk; keep the raw text + metadata in a payload column/store.
52
+
53
+ 3. **Choose the ANN index by corpus size and the recall/latency/memory tradeoff — there is no free lunch.**
54
+
55
+ | Index | Recall | Latency | Memory | Build | Use when |
56
+ |---|---|---|---|---|---|
57
+ | **Flat / exact (brute force)** | 100% | O(N) | full | none | < ~50–100k vectors, or as the recall ground-truth |
58
+ | **HNSW** | high | very low | **high (graph in RAM)** | slow | low-latency, RAM-resident, ≤ tens of millions |
59
+ | **IVF / IVF-Flat** | tunable | low | medium | fast | large, want simple recall/latency knob (`nprobe`) |
60
+ | **IVF-PQ / PQ** | lower (lossy) | low | **very low (compressed)** | medium | 100M–1B+, must fit RAM/budget; accept recall hit |
61
+ | **DiskANN / Vamana** | high | low | on-disk | slow | billion-scale, can't fit graph in RAM |
62
+
63
+ **HNSW knobs:** `M` (neighbors/node, 16–64; higher = better recall + more RAM), `efConstruction` (build quality, 100–400), `efSearch`/`ef` (the **query-time** recall↔latency dial; raise it until the recall target is met). **IVF knobs:** `nlist` (clusters ≈ `√N` to `4√N`), `nprobe` (clusters scanned per query; the recall↔latency dial). **PQ knobs:** `m` sub-quantizers (the vector dimension must be divisible by `m`), `nbits` (usually 8). Default to **HNSW** unless memory or scale forces IVF-PQ.
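A back-of-envelope sizing helper can make these knobs concrete. The sketch below uses the `√N` to `4√N` range quoted above for `nlist`; the HNSW memory formula (raw float32 vectors plus about `2*M` four-byte links per vector) is a commonly used rule of thumb and an assumption here, since it ignores upper graph layers and engine overhead. Both function names are hypothetical.

```python
import math

def ivf_nlist_range(n_vectors):
    """Return the (sqrt(N), 4*sqrt(N)) starting range for IVF `nlist`."""
    root = math.isqrt(n_vectors)
    return root, 4 * root

def hnsw_memory_bytes(n_vectors, dim, m, bytes_per_float=4, bytes_per_link=4):
    """Rough HNSW footprint: raw vectors plus about 2*M graph links per vector.

    This is an estimate only; real engines add per-node and upper-layer overhead.
    """
    per_vector = dim * bytes_per_float + 2 * m * bytes_per_link
    return n_vectors * per_vector

low, high = ivf_nlist_range(1_000_000)
print(f"nlist between {low} and {high}")

estimate = hnsw_memory_bytes(1_000_000, dim=1024, m=16)
print(f"HNSW estimate: {estimate / 1e9:.2f} GB")
```

If the estimate exceeds the RAM budget, that is the signal to consider a lower `M`, quantization, or a compressed or on-disk index.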
64
+
65
+ 4. **Per-store specifics — same concepts, different syntax.**
66
+ - **pgvector** (Postgres): `CREATE INDEX ON items USING hnsw (embedding vector_cosine_ops) WITH (m=16, ef_construction=64);` then per-session `SET hnsw.ef_search = 100;`. IVFFlat: `WITH (lists = N)` + `SET ivfflat.probes = 10;` — **build IVFFlat AFTER loading data** (it clusters existing rows); HNSW can be built on empty. `pgvector` ≥0.7 supports `halfvec` (16-bit) to halve size. Pre-filter with a plain `WHERE` + a btree index — Postgres can combine.
67
+ - **Qdrant:** HNSW by default; set `hnsw_config` (`m`, `ef_construct`) per collection, `ef` per search via `params.hnsw_ef`. **Payload indexes** on filtered fields enable *filterable HNSW* (filter applied during graph traversal, not after). Use scalar/product quantization via `quantization_config`.
68
+ - **Milvus:** explicit `index_type` (`HNSW`, `IVF_FLAT`, `IVF_PQ`, `DISKANN`, `SCANN`) + `metric_type` (`COSINE`/`IP`/`L2`); search `params` = `{ef}` or `{nprobe}`. Must `load()` collection into memory before search.
69
+ - **FAISS** (library, no server): `IndexHNSWFlat`, `IndexIVFFlat`, `IndexIVFPQ`; **train IVF/PQ on a representative sample** before `add`; `index.nprobe = N`. Wrap with `IndexIDMap` to keep external ids. You manage persistence + metadata yourself.
70
+ - **Pinecone:** managed; pick `metric` at index creation (immutable), use **namespaces** for tenant isolation, `filter` in query for metadata. Serverless handles the index internals — you tune `top_k` and filters, not `M`/`nprobe`.
71
+
72
+ 5. **Pre-filter correctly — naïve post-filtering starves your k and silently drops good hits.** Three interaction modes:
73
+
74
+ | Mode | What happens | Risk |
75
+ |---|---|---|
76
+ | **Post-filter** (ANN then drop non-matches) | fetch top-k, remove rows failing the filter | a selective filter can leave **0–few** results; raising `k` won't reliably fix it |
77
+ | **Pre-filter** (filter then exact search) | filter to a subset, brute-force within it | exact but slow on large subsets |
78
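+ | **Filterable ANN** (filter *during* graph/list traversal) | engine prunes by metadata inside HNSW/IVF | best: Qdrant payload index, Milvus filtered search; in pgvector, a `WHERE` clause with a partial index or iterative index scans (≥0.8), since a plain ANN index scan is filtered afterwards |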
79
+
80
+ For a highly selective filter (e.g. `tenant_id = X` with few rows), **pre-filter or partition** (separate collection/namespace/partition per tenant) instead of filtering a global index. **Always index the metadata fields you filter on**; an unindexed filter forces a slow scan or weak post-filter. Test recall *with the filter applied* — unfiltered recall lies.
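The starvation effect is easy to reproduce on toy data. In the sketch below an exact top-k search stands in for the ANN index (an assumption made to keep the example self-contained), and the item layout and function names are hypothetical.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def top_k(items, query, k):
    """Exact top-k by inner product; a stand-in for an ANN index here."""
    return sorted(items, key=lambda it: dot(it["vec"], query), reverse=True)[:k]

def post_filter(items, query, k, tenant):
    """Search the global index first, then drop rows that fail the filter."""
    return [it for it in top_k(items, query, k) if it["tenant"] == tenant]

def pre_filter(items, query, k, tenant):
    """Restrict to the tenant's rows first, then search within that subset."""
    subset = [it for it in items if it["tenant"] == tenant]
    return top_k(subset, query, k)

# One large tenant whose vectors sit closest to the query, one small tenant.
items = [{"id": i, "tenant": "big", "vec": [1.0, 0.01 * i]} for i in range(20)]
items += [{"id": 100 + i, "tenant": "small", "vec": [0.5, 0.1 * i]} for i in range(3)]
query = [1.0, 0.0]

starved = post_filter(items, query, k=5, tenant="small")
kept = pre_filter(items, query, k=5, tenant="small")
print(len(starved), "results after post-filter;", len(kept), "after pre-filter")
```

The global top-5 is filled entirely by the large tenant, so the post-filter leaves nothing for the small one, while filtering first returns all of its rows.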
81
+
82
+ 6. **Add hybrid (BM25 + dense) and fuse with RRF — not score addition.** Dense embeddings miss exact terms (IDs, codes, rare names, acronyms); BM25/keyword catches them. Run both retrievers, take each result's **rank**, and fuse with **Reciprocal Rank Fusion**:
83
+
84
+ ```
85
+ RRF_score(d) = Σ_retrievers 1 / (k + rank_r(d)) # k ≈ 60
86
+ ```
87
+
88
+ RRF is rank-based, so you **don't have to normalize** the wildly different BM25 vs cosine score scales (raw weighted score-sum is the classic bug — one scale dominates). Native support: Qdrant `Query` API with `Fusion.RRF`, Elasticsearch/OpenSearch `rrf` retriever, Milvus `RRFRanker`, Weaviate hybrid `fusionType`. In pgvector, run a BM25/`tsvector` (or ParadeDB `pg_search`) query and a vector query, then fuse in SQL. Hybrid typically beats either alone on heterogeneous corpora.
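For stores without native fusion, the RRF formula above can be applied directly to the two ranked id lists. A minimal sketch, assuming each retriever returns document ids in best-first order; the name `rrf_fuse` and the sample ids are illustrative.

```python
from collections import defaultdict

def rrf_fuse(rankings, k=60):
    """Fuse ranked id lists with Reciprocal Rank Fusion.

    Each list contributes 1 / (k + rank) for every document it contains,
    with ranks starting at 1. Raw retriever scores are never used.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))

bm25_ranking = ["d3", "d1", "d7"]   # keyword retriever, best first
dense_ranking = ["d1", "d5", "d3"]  # vector retriever, best first

print(rrf_fuse([bm25_ranking, dense_ranking]))
```

Documents that appear in both lists (`d1`, `d3`) rise above those found by only one retriever, regardless of how the underlying score scales compare.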
89
+
90
+ 7. **Rerank the shortlist with a cross-encoder — it fixes the ordering bi-encoders get wrong.** Retrieve a wide net (top **50–100** by RRF), then rerank to the final **k (5–10)** with a cross-encoder that scores (query, doc) jointly: **Cohere `rerank-v3.5`**, `BAAI/bge-reranker-v2-m3`, or a `cross-encoder/ms-marco-MiniLM` model. Cross-encoders are far more accurate but **O(candidates)** per query — only ever run them over the shortlist, never the whole index. Reranking usually buys more nDCG than squeezing the ANN, and it's where "semantically close but mis-ranked" gets fixed. Budget the extra ~50–300ms.
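The retrieve-then-rerank shape can be sketched with a pluggable pair scorer. Here `toy_pair_score` is a deliberately crude token-overlap stand-in and not a cross-encoder; in practice a real model's scoring call would be passed as `score_pair`. All names are hypothetical.

```python
def toy_pair_score(query, text):
    """Stand-in for a cross-encoder: Jaccard overlap of lowercase tokens."""
    q, t = set(query.lower().split()), set(text.lower().split())
    union = q | t
    return len(q & t) / len(union) if union else 0.0

def rerank(query, shortlist, score_pair, final_k=5):
    """Score only the shortlisted (query, doc) pairs and keep the best final_k."""
    scored = [(score_pair(query, doc["text"]), doc) for doc in shortlist]
    # Sort on the score alone so documents themselves are never compared.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:final_k]]

shortlist = [
    {"id": "a", "text": "reset your password from the login page"},
    {"id": "b", "text": "password policy for administrators"},
    {"id": "c", "text": "how to reset a forgotten password"},
]

best = rerank("how to reset password", shortlist, toy_pair_score, final_k=2)
print([doc["id"] for doc in best])
```

The cost is one scorer call per shortlisted candidate, which is why the shortlist size (not the corpus size) sets the reranking latency.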
91
+
92
+ 8. **Eval with a labeled set — tune to a recall TARGET, never by eyeballing.** Build qrels: a set of queries each with known-relevant `doc_id`s (mine from clicks/logs, or hand-label 50–200). Metrics:
93
+
94
+ | Metric | Measures | When |
95
+ |---|---|---|
96
+ | **recall@k** | did the relevant doc make the top-k at all | the **ANN/retrieval** gate — most important for RAG (can't rerank what you didn't retrieve) |
97
+ | **MRR** | rank of the *first* relevant hit | single-answer / "find the doc" |
98
+ | **nDCG@10** | graded relevance + position | multi-relevant, ranking quality (post-rerank) |
99
+ | **precision@k** | fraction of top-k relevant | when noise in context hurts |
100
+
101
+ Compute **exact (flat) search as the recall=100% ground truth**, then measure your ANN's recall@k against it — that's how you set `efSearch`/`nprobe`: raise it until recall@k hits target (e.g. 0.95), then stop (latency grows past it). Re-run the suite on every model/chunk/param change. Tools: `ranx`, BEIR, `pytrec_eval`, or a small custom harness.
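A small custom harness for these metrics might look like the sketch below. It assumes qrels are available as a set of relevant ids (or a dict of graded gains for nDCG) and uses the linear-gain form of DCG; libraries such as `ranx` or `pytrec_eval` may use different defaults. Function names are illustrative.

```python
import math

def recall_at_k(retrieved, relevant, k):
    """Fraction of the relevant ids that appear in the top-k results."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)

def reciprocal_rank(retrieved, relevant):
    """1 / rank of the first relevant hit (0.0 if none); average over queries for MRR."""
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0

def ndcg_at_k(retrieved, gains, k):
    """nDCG@k with linear gains; `gains` maps doc id to a graded relevance."""
    dcg = sum(gains.get(d, 0) / math.log2(i + 1)
              for i, d in enumerate(retrieved[:k], start=1))
    ideal = sorted(gains.values(), reverse=True)[:k]
    idcg = sum(g / math.log2(i + 1) for i, g in enumerate(ideal, start=1))
    return dcg / idcg if idcg else 0.0

retrieved = ["d2", "d1", "d9"]
print(recall_at_k(retrieved, {"d1", "d4"}, k=3))
print(reciprocal_rank(retrieved, {"d1", "d4"}))
print(ndcg_at_k(retrieved, {"d1": 2, "d4": 1}, k=3))

# ANN recall against exact search: treat the exact top-k as the relevant set.
exact_ids = ["d1", "d4", "d2"]
ann_ids = ["d1", "d2", "d7"]
print(recall_at_k(ann_ids, exact_ids, k=3))
```

Running the last comparison across a sweep of `efSearch` or `nprobe` values gives the recall/latency curve used to pick the setting.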
102
+
103
+ 9. **Quantize only after measuring the recall loss — it's a memory/latency win that costs accuracy.** Options: **scalar quantization** (float32→int8, ~4× smaller, small recall loss — good default), **binary quantization** (1-bit, ~32× smaller, big loss — only with a rescoring/oversampling pass), **PQ** (product quantization, tunable, needs training). Pattern: quantize for the fast first pass, then **rescore the top candidates with full-precision vectors** (Qdrant `rescore`, Milvus refine) to recover recall. Measure recall@k before/after on the eval set — never ship a silent quality drop. Also: store the original embedding model + dim in metadata so a model upgrade triggers a full re-embed (you can't mix embedding spaces).
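The quantize-then-rescore pattern can be illustrated without any vector store. The sketch below assumes symmetric int8 scalar quantization with a single shared `scale`, and it quantizes the stored vectors inside the search function only for brevity (a real system would store the codes at write time). The names are hypothetical and unrelated to Qdrant's or Milvus's APIs.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def quantize_int8(vec, scale):
    """Map floats in [-scale, scale] to integers in [-127, 127], clamping outliers."""
    return [max(-127, min(127, round(x / scale * 127))) for x in vec]

def search_with_rescore(query, vectors, scale, k, oversample=3):
    """Coarse pass on int8 codes, then rescore the candidates at full precision."""
    q8 = quantize_int8(query, scale)
    codes = {doc_id: quantize_int8(v, scale) for doc_id, v in vectors.items()}
    # First pass: cheap integer scores, keeping k * oversample candidates.
    coarse = sorted(codes, key=lambda d: dot(q8, codes[d]), reverse=True)[: k * oversample]
    # Second pass: exact float scores over the small candidate set only.
    return sorted(coarse, key=lambda d: dot(query, vectors[d]), reverse=True)[:k]

vectors = {
    "a": [0.90, 0.10],
    "b": [0.89, 0.12],
    "c": [0.10, 0.90],
    "d": [-0.50, 0.50],
}
print(search_with_rescore([1.0, 0.0], vectors, scale=1.0, k=1, oversample=2))
```

Comparing this function's output with an exact full-precision top-k over the eval queries gives the before/after recall@k that the step calls for.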
104
+
105
+ ## Common Errors
106
+
107
+ - **Embedding query and documents with different models (or wrong `input_type`/prefix).** Vectors live in different spaces → garbage similarity. Fix: same model both sides; set Cohere `input_type`, E5 `query:`/`passage:` prefixes, and BGE's query instruction.
108
+ - **Metric/opclass mismatch or un-normalized vectors with IP.** Cosine ignores magnitude, but IP on un-normalized vectors ≠ cosine and mis-ranks by vector length; an index whose opclass doesn't match the query operator gives the wrong metric or goes unused. Fix: normalize to unit length at write time, pick the matching opclass (`vector_cosine_ops` etc.).
109
+ - **Tuning by feel instead of to a recall target.** Picking `efSearch`/`nprobe` "that seems fine" hides recall cliffs. Fix: exact search as ground truth, raise the knob until recall@k ≥ target, then stop.
110
+ - **Post-filtering a selective metadata filter.** ANN returns k, the filter drops most → too few/empty results. Fix: filterable ANN (payload/`WHERE` index) or pre-filter/partition per tenant.
111
+ - **Weighted score-sum hybrid instead of RRF.** BM25 and cosine scales differ wildly; one dominates. Fix: fuse by rank with RRF (k≈60) — no score normalization needed.
112
+ - **Building an IVFFlat index before loading data.** It clusters on existing rows; empty → degenerate. Fix: load data, then build IVFFlat (HNSW is fine on empty).
113
+ - **No overlap / mid-sentence chunking.** Facts split across boundaries become unretrievable. Fix: 10–15% overlap, split on semantic boundaries.
114
+ - **Reranking the whole index.** Cross-encoders are O(N) per query → unusable latency. Fix: rerank only the top-50–100 shortlist.
115
+ - **Quantizing without measuring.** Silent recall drop in prod. Fix: measure recall@k before/after; add a full-precision rescore pass.
116
+ - **Mixing embedding spaces after a model upgrade.** New and old vectors are incomparable. Fix: store model+dim in metadata; re-embed the whole corpus on upgrade.
117
+ - **HNSW out-of-memory at scale.** The graph is RAM-resident; tens of millions × high `M` × float32 blows the budget. Fix: lower `M`, scalar-quantize, or switch to IVF-PQ / DiskANN.
118
+
119
+ ## Verify
120
+
121
+ 1. **Metric/normalization correct:** vectors are unit-normalized; the index opclass matches the metric; a known query returns its known-relevant doc as a top hit.
122
+ 2. **Same-model invariant:** grep the pipeline — query and document embeddings use the identical model + correct `input_type`/prefix.
123
+ 3. **Recall measured against exact search:** flat/brute-force gives the ground truth; ANN recall@k is computed and meets target (e.g. ≥0.95) at the chosen `efSearch`/`nprobe`, with latency recorded.
124
+ 4. **Filter recall holds:** run the eval **with the production metadata filter applied**; recall doesn't collapse (no post-filter starvation), and selective filters use pre-filter/partition.
125
+ 5. **Hybrid fuses by RRF:** BM25 and dense both contribute; fusion is rank-based (RRF k≈60), and hybrid recall@k ≥ either retriever alone on the eval set.
126
+ 6. **Rerank improves nDCG without killing latency:** cross-encoder runs over the top-50–100 only; nDCG@10 improves vs pre-rerank; added latency is within budget.
127
+ 7. **Chunking sound:** chunks are 256–512 tokens with 10–15% overlap on semantic boundaries, each carrying filter/citation metadata; a boundary-straddling fact is retrievable.
128
+ 8. **Quantization is net-positive:** recall@k before/after quantization is measured; any drop is recovered by a full-precision rescore pass and is within tolerance.
129
+ 9. **Index choice fits scale/memory:** the index type (flat/HNSW/IVF-PQ/DiskANN) matches corpus size and the RAM budget; HNSW graph fits in memory or a compressed index was chosen.
130
+
131
+ Done = query and documents share one normalized embedding model with a matching distance metric/opclass, the ANN index is chosen for the corpus's scale/latency/memory budget and tuned to a measured recall@k target against exact search, hybrid retrieval fuses BM25 + dense by RRF, a cross-encoder reranks the shortlist, metadata filtering uses filterable/pre-filter (not post-filter starvation), and every change is validated by the recall@k / nDCG / MRR eval in checks 3–8.