@arach/lattices 0.2.0 → 0.6.1

Files changed (143)
  1. package/LICENSE +21 -0
  2. package/README.md +172 -86
  3. package/apps/mac/Info.plist +43 -0
  4. package/apps/mac/Lattices.app/Contents/Info.plist +43 -0
  5. package/apps/mac/Lattices.app/Contents/MacOS/Lattices +0 -0
  6. package/apps/mac/Lattices.app/Contents/Resources/AppIcon.icns +0 -0
  7. package/apps/mac/Lattices.app/Contents/Resources/docs/assistant-knowledge.md +130 -0
  8. package/apps/mac/Lattices.app/Contents/Resources/tap.wav +0 -0
  9. package/apps/mac/Lattices.app/Contents/_CodeSignature/CodeResources +150 -0
  10. package/apps/mac/Lattices.entitlements +21 -0
  11. package/apps/mac/Resources/Pets/assistant-spark/pet.json +62 -0
  12. package/apps/mac/Resources/Pets/assistant-spark/spritesheet.webp +0 -0
  13. package/apps/mac/Resources/Pets/scout-ranger/pet.json +6 -0
  14. package/apps/mac/Resources/Pets/scout-ranger/spritesheet.webp +0 -0
  15. package/apps/mac/Resources/tap.wav +0 -0
  16. package/assets/AppIcon.icns +0 -0
  17. package/bin/assistant-intelligence.ts +912 -0
  18. package/bin/cli/capture.ts +252 -0
  19. package/bin/cli/daemon.ts +22 -0
  20. package/bin/cli/helpers.ts +105 -0
  21. package/bin/cli/layer.ts +178 -0
  22. package/bin/cli/runs.ts +43 -0
  23. package/bin/cli/search.ts +141 -0
  24. package/bin/cli/session.ts +32 -0
  25. package/bin/client.ts +17 -0
  26. package/bin/cua.ts +26 -0
  27. package/bin/{daemon-client.js → daemon-client.ts} +49 -30
  28. package/bin/handsoff-infer.ts +96 -0
  29. package/bin/handsoff-worker.ts +531 -0
  30. package/bin/infer.ts +424 -0
  31. package/bin/keychain.ts +75 -0
  32. package/bin/lattices-app.ts +655 -0
  33. package/bin/lattices-build +125 -0
  34. package/bin/lattices-build-env.ts +77 -0
  35. package/bin/lattices-dev +362 -0
  36. package/bin/lattices.ts +3260 -0
  37. package/bin/project-twin.ts +645 -0
  38. package/docs/agent-execution-plan.md +562 -0
  39. package/docs/agent-layer-guide.md +207 -0
  40. package/docs/agents.md +233 -0
  41. package/docs/ai-chat-ux-review.md +416 -0
  42. package/docs/api.md +1041 -47
  43. package/docs/app.md +96 -13
  44. package/docs/assistant-knowledge.md +130 -0
  45. package/docs/companion-deck.md +209 -0
  46. package/docs/component-extraction-roadmap.md +392 -0
  47. package/docs/concepts.md +13 -12
  48. package/docs/config.md +83 -10
  49. package/docs/gesture-customization-proposal.md +520 -0
  50. package/docs/handsoff-test-scenarios.md +84 -0
  51. package/docs/hyperspace-grid-snappiness.md +210 -0
  52. package/docs/layers.md +176 -28
  53. package/docs/mouse-gestures.md +244 -0
  54. package/docs/ocr.md +21 -9
  55. package/docs/overview.md +42 -23
  56. package/docs/presentation-execution-review.md +491 -0
  57. package/docs/prompts/hands-off-system.md +382 -0
  58. package/docs/prompts/hands-off-turn.md +30 -0
  59. package/docs/prompts/voice-advisor.md +31 -0
  60. package/docs/prompts/voice-fallback.md +23 -0
  61. package/docs/proposals/LAT-001-gesture-visual-customization.md +522 -0
  62. package/docs/proposals/LAT-002-shared-overlay-canvas.md +353 -0
  63. package/docs/proposals/LAT-003-menu-bar-controller-architecture.md +291 -0
  64. package/docs/proposals/LAT-004-interactive-overlay-actors.md +534 -0
  65. package/docs/proposals/LAT-005-action-runtime-product-spine.md +914 -0
  66. package/docs/proposals/LAT-006-followup-gaps.md +103 -0
  67. package/docs/proposals/LAT-006-runs-and-capture-in-lattices.md +566 -0
  68. package/docs/proposals/LAT-007-unified-app-shell.md +128 -0
  69. package/docs/quickstart.md +8 -12
  70. package/docs/reference/dewey.config.ts +74 -0
  71. package/docs/reference/install-agent.md +79 -0
  72. package/docs/release.md +172 -0
  73. package/docs/repo-structure.md +100 -0
  74. package/docs/terminal-kit.md +87 -0
  75. package/docs/tiling-reference.md +224 -0
  76. package/docs/twins.md +138 -0
  77. package/docs/voice-command-protocol.md +278 -0
  78. package/docs/voice-error-model.md +73 -0
  79. package/docs/voice.md +221 -0
  80. package/package.json +69 -16
  81. package/packages/npm/sdk/cua.d.mts +1 -0
  82. package/packages/npm/sdk/cua.d.ts +188 -0
  83. package/packages/npm/sdk/cua.mjs +376 -0
  84. package/app/Lattices.app/Contents/Info.plist +0 -24
  85. package/app/Package.swift +0 -13
  86. package/app/Sources/ActionRow.swift +0 -61
  87. package/app/Sources/App.swift +0 -10
  88. package/app/Sources/AppDelegate.swift +0 -234
  89. package/app/Sources/AppShellView.swift +0 -62
  90. package/app/Sources/AppTypeClassifier.swift +0 -70
  91. package/app/Sources/AppWindowShell.swift +0 -63
  92. package/app/Sources/CheatSheetHUD.swift +0 -332
  93. package/app/Sources/CommandModeState.swift +0 -1362
  94. package/app/Sources/CommandModeView.swift +0 -1405
  95. package/app/Sources/CommandModeWindow.swift +0 -192
  96. package/app/Sources/CommandPaletteView.swift +0 -307
  97. package/app/Sources/CommandPaletteWindow.swift +0 -134
  98. package/app/Sources/DaemonProtocol.swift +0 -101
  99. package/app/Sources/DaemonServer.swift +0 -414
  100. package/app/Sources/DesktopModel.swift +0 -121
  101. package/app/Sources/DesktopModelTypes.swift +0 -71
  102. package/app/Sources/DiagnosticLog.swift +0 -271
  103. package/app/Sources/EventBus.swift +0 -30
  104. package/app/Sources/HotkeyManager.swift +0 -250
  105. package/app/Sources/HotkeyStore.swift +0 -338
  106. package/app/Sources/InventoryManager.swift +0 -35
  107. package/app/Sources/InventoryPath.swift +0 -43
  108. package/app/Sources/KeyRecorderView.swift +0 -210
  109. package/app/Sources/LatticesApi.swift +0 -1125
  110. package/app/Sources/MainView.swift +0 -467
  111. package/app/Sources/MainWindow.swift +0 -83
  112. package/app/Sources/OcrModel.swift +0 -309
  113. package/app/Sources/OcrStore.swift +0 -295
  114. package/app/Sources/OmniSearchState.swift +0 -283
  115. package/app/Sources/OmniSearchView.swift +0 -288
  116. package/app/Sources/OmniSearchWindow.swift +0 -105
  117. package/app/Sources/OrphanRow.swift +0 -129
  118. package/app/Sources/PaletteCommand.swift +0 -419
  119. package/app/Sources/PermissionChecker.swift +0 -125
  120. package/app/Sources/Preferences.swift +0 -92
  121. package/app/Sources/ProcessModel.swift +0 -199
  122. package/app/Sources/ProcessQuery.swift +0 -151
  123. package/app/Sources/Project.swift +0 -28
  124. package/app/Sources/ProjectRow.swift +0 -368
  125. package/app/Sources/ProjectScanner.swift +0 -121
  126. package/app/Sources/ScreenMapState.swift +0 -2387
  127. package/app/Sources/ScreenMapView.swift +0 -2820
  128. package/app/Sources/ScreenMapWindowController.swift +0 -89
  129. package/app/Sources/SessionManager.swift +0 -72
  130. package/app/Sources/SettingsView.swift +0 -1053
  131. package/app/Sources/SettingsWindow.swift +0 -20
  132. package/app/Sources/TabGroupRow.swift +0 -178
  133. package/app/Sources/Terminal.swift +0 -259
  134. package/app/Sources/TerminalQuery.swift +0 -156
  135. package/app/Sources/TerminalSynthesizer.swift +0 -200
  136. package/app/Sources/Theme.swift +0 -163
  137. package/app/Sources/TilePickerView.swift +0 -209
  138. package/app/Sources/TmuxModel.swift +0 -53
  139. package/app/Sources/TmuxQuery.swift +0 -81
  140. package/app/Sources/WindowTiler.swift +0 -1755
  141. package/app/Sources/WorkspaceManager.swift +0 -434
  142. package/bin/lattices-app.js +0 -221
  143. package/bin/lattices.js +0 -1418
package/docs/tiling-reference.md ADDED
@@ -0,0 +1,224 @@
1
+ # Tiling Reference
2
+
3
+ Complete reference for Lattices window tiling — positions, grids, execution paths, and voice interpretation.
4
+
5
+ ## Position System
6
+
7
+ Every tile position is a cell in a **cols × rows** grid, expressed as fractional `(x, y, w, h)` of the screen's visible area (excluding menu bar and dock).
8
+
9
+ ### Named Positions
10
+
11
+ All valid position strings that `TilePosition` accepts:
12
+
13
+ | Position string | Grid | Cell (col, row) | Description |
14
+ |---|---|---|---|
15
+ | `maximize` | 1×1 | full | Full screen (100% × 100%) |
16
+ | `center` | — | — | Centered floating (70% × 80%, offset 15%/10%) |
17
+ | **Halves (2×1, full height)** | | | |
18
+ | `left` | 2×1 | 0,0 | Left 50% |
19
+ | `right` | 2×1 | 1,0 | Right 50% |
20
+ | **Halves (1×2, full width)** | | | |
21
+ | `top` | 1×2 | 0,0 | Top 50% |
22
+ | `bottom` | 1×2 | 0,1 | Bottom 50% |
23
+ | **Quarters (2×2)** | | | |
24
+ | `top-left` | 2×2 | 0,0 | Top-left 25% |
25
+ | `top-right` | 2×2 | 1,0 | Top-right 25% |
26
+ | `bottom-left` | 2×2 | 0,1 | Bottom-left 25% |
27
+ | `bottom-right` | 2×2 | 1,1 | Bottom-right 25% |
28
+ | **Thirds (3×1, full height)** | | | |
29
+ | `left-third` | 3×1 | 0,0 | Left 33% column |
30
+ | `center-third` | 3×1 | 1,0 | Center 33% column |
31
+ | `right-third` | 3×1 | 2,0 | Right 33% column |
32
+ | **Sixths (3×2)** | | | |
33
+ | `top-left-third` | 3×2 | 0,0 | Top-left sixth |
34
+ | `top-center-third` | 3×2 | 1,0 | Top-center sixth |
35
+ | `top-right-third` | 3×2 | 2,0 | Top-right sixth |
36
+ | `bottom-left-third` | 3×2 | 0,1 | Bottom-left sixth |
37
+ | `bottom-center-third` | 3×2 | 1,1 | Bottom-center sixth |
38
+ | `bottom-right-third` | 3×2 | 2,1 | Bottom-right sixth |
39
+ | **Fourths (4×1, full height)** | | | |
40
+ | `first-fourth` | 4×1 | 0,0 | Leftmost 25% column |
41
+ | `second-fourth` | 4×1 | 1,0 | Second 25% column |
42
+ | `third-fourth` | 4×1 | 2,0 | Third 25% column |
43
+ | `last-fourth` | 4×1 | 3,0 | Rightmost 25% column |
44
+ | **Eighths (4×2)** | | | |
45
+ | `top-first-fourth` | 4×2 | 0,0 | Top row, 1st column |
46
+ | `top-second-fourth` | 4×2 | 1,0 | Top row, 2nd column |
47
+ | `top-third-fourth` | 4×2 | 2,0 | Top row, 3rd column |
48
+ | `top-last-fourth` | 4×2 | 3,0 | Top row, 4th column |
49
+ | `bottom-first-fourth` | 4×2 | 0,1 | Bottom row, 1st column |
50
+ | `bottom-second-fourth` | 4×2 | 1,1 | Bottom row, 2nd column |
51
+ | `bottom-third-fourth` | 4×2 | 2,1 | Bottom row, 3rd column |
52
+ | `bottom-last-fourth` | 4×2 | 3,1 | Bottom row, 4th column |
53
+ | **Horizontal thirds (1×3)** | | | |
54
+ | `top-third` | 1×3 | 0,0 | Top 33% row |
55
+ | `middle-third` | 1×3 | 0,1 | Middle 33% row |
56
+ | `bottom-third` | 1×3 | 0,2 | Bottom 33% row |
57
+ | **Edge quarters** | | | |
58
+ | `left-quarter` | 4×1 | 0,0 | Leftmost 25% column |
59
+ | `right-quarter` | 4×1 | 3,0 | Rightmost 25% column |
60
+ | `top-quarter` | 1×4 | 0,0 | Top 25% row |
61
+ | `bottom-quarter` | 1×4 | 0,3 | Bottom 25% row |
62
+
63
+ ### Custom Grid Syntax
64
+
65
+ For arbitrary grids: compact `CxR:C,R` or canonical `grid:CxR:C,R`
66
+
67
+ - `C` = total columns, `R` = total rows
68
+ - Compact `CxR:C,R` starts at 1 from the top-left
69
+ - Canonical `grid:CxR:C,R` starts at 0 for API/wire compatibility
70
+ - Example: `5x3:3,2` or `grid:5x3:2,1` = center cell of a 5×3 grid
71
+ - Spans can use two inclusive corners, such as `4x4:1,1-2,2`
72
+
73
+ Parsed by `PlacementSpec` / `parseGridString()` into fractional `(x, y, w, h)`.
74
+
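The grid syntax above can be sketched in TypeScript. This is a minimal illustration, not the real `parseGridString()`: the helper name `parseGridPlacement` and the `Fractions` type are hypothetical, and it assumes the canonical `grid:` form accepts the same span syntax as the compact form.

```typescript
// Fractional rectangle of the screen's visible area (0..1 on each axis).
interface Fractions { x: number; y: number; w: number; h: number }

// Hypothetical helper: parses "CxR:C,R", "grid:CxR:C,R", and the span
// form "CxR:C,R-C,R" into fractions. Returns null for anything else.
function parseGridPlacement(spec: string): Fractions | null {
  const canonical = spec.startsWith("grid:");
  const body = canonical ? spec.slice("grid:".length) : spec;
  const m = /^(\d+)x(\d+):(\d+),(\d+)(?:-(\d+),(\d+))?$/.exec(body);
  if (!m) return null;

  const cols = Number(m[1]);
  const rows = Number(m[2]);
  // Compact form counts cells from 1, canonical form counts from 0.
  const base = canonical ? 0 : 1;
  const c0 = Number(m[3]) - base;
  const r0 = Number(m[4]) - base;
  // Without a second corner the span is a single cell.
  const c1 = m[5] !== undefined ? Number(m[5]) - base : c0;
  const r1 = m[6] !== undefined ? Number(m[6]) - base : r0;

  if (cols < 1 || rows < 1) return null;
  if (c0 < 0 || r0 < 0 || c1 < c0 || r1 < r0 || c1 >= cols || r1 >= rows) return null;

  return {
    x: c0 / cols,
    y: r0 / rows,
    w: (c1 - c0 + 1) / cols,
    h: (r1 - r0 + 1) / rows,
  };
}
```

Under these assumptions, `5x3:3,2` and `grid:5x3:2,1` resolve to the same rectangle, and `4x4:1,1-2,2` resolves to the top-left quarter of the screen.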
75
+ ### Placement Contract
76
+
77
+ Placement strings are convenient at the boundary, but the daemon uses a
78
+ typed placement model internally:
79
+
80
+ - named tile positions
81
+ - arbitrary grid cells
82
+ - raw fractional rectangles
83
+
84
+ That is what keeps CLI, daemon, voice, and hands-off execution aligned.
85
+
86
+ ## Drag Snap Zones
87
+
88
+ The menu bar app can also use placement specs as drag-to-snap targets.
89
+ When you drag a window, Lattices shows faint landing zones plus a live
90
+ preview of the resulting frame. Releasing over a zone tiles the dragged
91
+ window to that placement. Hold `Command` while dragging to reveal snap
92
+ mode, and release `Command` to drop back to a normal free drag without
93
+ ending the gesture.
94
+
95
+ The recommended agent-owned config lives in `~/.lattices/snap-zones.json`:
96
+
97
+ ```json
98
+ {
99
+ "enabled": true,
100
+ "modifier": "command",
101
+ "zoneOpacity": 0.08,
102
+ "highlightOpacity": 0.18,
103
+ "previewOpacity": 0.14,
104
+ "rules": [
105
+ {
106
+ "id": "left-edge",
107
+ "label": "Left",
108
+ "placement": "left",
109
+ "trigger": { "x": 0.0, "y": 0.18, "w": 0.12, "h": 0.64 },
110
+ "priority": 10
111
+ },
112
+ {
113
+ "id": "notes-rail",
114
+ "label": "Notes",
115
+ "placement": { "x": 0.68, "y": 0.0, "w": 0.32, "h": 1.0 },
116
+ "trigger": { "x": 0.88, "y": 0.18, "w": 0.12, "h": 0.64 },
117
+ "priority": 30
118
+ }
119
+ ]
120
+ }
121
+ ```
122
+
123
+ Notes:
124
+
125
+ - `rules` is the preferred list key. `zones` is still accepted for backward compatibility.
126
+ - `modifier` accepts `command`, `option`, `control`, or `shift`.
127
+ - `placement` can be a named placement/preset string or raw fractions.
128
+ - `trigger` uses normalized `(x, y, w, h)` fractions of the screen's
129
+ visible area, with `y = 0` at the top.
130
+ - `priority` breaks ties when trigger regions overlap.
131
+ - `trigger` can also be a named placement or preset string if you want
132
+ the trigger region itself to reuse an existing tile definition.
133
+ - The older `~/.lattices/grid.json` `snapZones` section still works, but
134
+ `~/.lattices/snap-zones.json` is the cleaner file for agents to edit.
135
+
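One way to picture how trigger regions and `priority` interact is a small hit test. This is a sketch only: the `SnapRule` and `pickRule` names are hypothetical, triggers are assumed to be already resolved to raw fractions, and it assumes the higher `priority` wins on overlap, which the notes above do not state.

```typescript
interface Rect { x: number; y: number; w: number; h: number }
interface SnapRule { id: string; trigger: Rect; priority: number }

// True when the normalized point (px, py) lies inside the rectangle.
function contains(r: Rect, px: number, py: number): boolean {
  return px >= r.x && px <= r.x + r.w && py >= r.y && py <= r.y + r.h;
}

// Returns the matching rule for a cursor position, or null if none match.
// Assumption: when several triggers overlap, the higher priority wins.
function pickRule(rules: SnapRule[], px: number, py: number): SnapRule | null {
  let best: SnapRule | null = null;
  for (const rule of rules) {
    if (!contains(rule.trigger, px, py)) continue;
    if (best === null || rule.priority > best.priority) best = rule;
  }
  return best;
}
```

With the two rules from the sample config, a cursor at `(0.05, 0.5)` would select `left-edge`, and a cursor in the middle of the screen would select nothing.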
136
+ ## Execution Paths
137
+
138
+ The old split-brain tiling logic has been collapsed toward a shared path.
139
+ The canonical mutation is now:
140
+
141
+ ```json
142
+ { "method": "window.place", "params": { "placement": "left" } }
143
+ ```
144
+
145
+ All higher-level surfaces should compile into the same placement model:
146
+
147
+ - **Daemon / CLI**: `window.place` is the canonical mutation
148
+ - **Compatibility**: `window.tile` maps to `window.place`
149
+ - **Voice / hands-off**: parse natural language, then emit a placement spec
150
+ - **HUD**: still exposes a smaller shortcut set, but should target the same placement executor
151
+
152
+ The important change is that placement resolution now happens through
153
+ `PlacementSpec`, not through separate ad hoc parsers per surface.
154
+
155
+ ## Frame Calculation
156
+
157
+ All paths eventually call one of:
158
+
159
+ 1. **`WindowTiler.tileFrame(for:on:)`** — takes a `TilePosition` + `NSScreen`, returns a `CGRect` in AX coordinates (origin = top-left of primary display)
160
+ 2. **`WindowTiler.tileFrame(fractions:inDisplay:)`** — takes raw `(x, y, w, h)` fractions + display rect
161
+
162
+ The math:
163
+ ```
164
+ visible = screen.visibleFrame (excludes menu bar + dock)
165
+ primaryH = primary screen height
166
+ axTop = primaryH - visible.maxY (flip from AppKit bottom-left to AX top-left)
167
+
168
+ frame.x = visible.x + visible.width × fx
169
+ frame.y = axTop + visible.height × fy
170
+ frame.w = visible.width × fw
171
+ frame.h = visible.height × fh
172
+ ```
173
+
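The same math, transcribed into TypeScript for illustration. The real implementation lives in Swift (`WindowTiler`); the `Rect` and `Fractions` types and the free function below are hypothetical stand-ins, with `visible` given in AppKit coordinates (origin at the bottom-left of the primary display).

```typescript
interface Rect { x: number; y: number; width: number; height: number }
interface Fractions { x: number; y: number; w: number; h: number }

// Converts fractional placement into an AX-coordinate frame
// (origin at the top-left of the primary display).
function tileFrame(f: Fractions, visible: Rect, primaryHeight: number): Rect {
  // Top edge of the visible area in AppKit coordinates.
  const visibleMaxY = visible.y + visible.height;
  // Flip from AppKit bottom-left origin to AX top-left origin.
  const axTop = primaryHeight - visibleMaxY;
  return {
    x: visible.x + visible.width * f.x,
    y: axTop + visible.height * f.y,
    width: visible.width * f.w,
    height: visible.height * f.h,
  };
}
```

For example, on a 1440×900 primary display with a 25 pt menu bar and a 70 pt dock, `visible` is `{ x: 0, y: 70, width: 1440, height: 805 }`, so `axTop` is 25 and the `left` placement starts at `y = 25`.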
174
+ ## Window Targeting
175
+
176
+ The `tile_window` intent resolves the target window in this priority order:
177
+
178
+ 1. **`session`** slot → `LatticesApi.window.place` / `window.tile` compatibility wrapper
179
+ 2. **`wid`** slot → `DesktopModel.shared.windows[wid]` (direct window ID lookup)
180
+ 3. **`app`** slot → first matching window by `localizedCaseInsensitiveContains`, excluding recently-tiled windows (prevents double-matching in batch commands like "Chrome left, Chrome right")
181
+ 4. **No target** → tiles the frontmost window
182
+
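The priority order above can be sketched as a pure function. All names here (`TileSlots`, `Target`, `resolveTarget`, `findAppWindow`) are hypothetical, and a lowercase substring check stands in for Swift's `localizedCaseInsensitiveContains`.

```typescript
interface TileSlots { session?: string; wid?: number; app?: string }
interface WindowInfo { wid: number; app: string }

type Target =
  | { kind: "session"; session: string }
  | { kind: "wid"; wid: number }
  | { kind: "app"; app: string }
  | { kind: "frontmost" };

// Picks the targeting strategy in the documented order:
// session, then wid, then app, then the frontmost window.
function resolveTarget(slots: TileSlots): Target {
  if (slots.session !== undefined) return { kind: "session", session: slots.session };
  if (slots.wid !== undefined) return { kind: "wid", wid: slots.wid };
  if (slots.app !== undefined) return { kind: "app", app: slots.app };
  return { kind: "frontmost" };
}

// First window whose app name contains the query, skipping windows that
// were tiled moments ago so batch commands do not match the same one twice.
function findAppWindow(
  windows: WindowInfo[],
  app: string,
  recentlyTiled: Set<number>,
): WindowInfo | undefined {
  const needle = app.toLowerCase();
  return windows.find(
    (w) => w.app.toLowerCase().includes(needle) && !recentlyTiled.has(w.wid),
  );
}
```

In a batch such as "Chrome left, Chrome right", the caller would add each matched `wid` to `recentlyTiled` before resolving the next action, so the second lookup lands on a different Chrome window.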
183
+ ### HandsOff-specific targeting
184
+
185
+ The system prompt instructs the LLM to always use `wid` from the desktop snapshot, never `app`. This avoids ambiguity when multiple windows of the same app exist. In speech, the LLM says the app name; in the JSON action, it uses the wid.
186
+
187
+ ## Common Layouts (multi-action)
188
+
189
+ These are composed from multiple `tile_window` actions:
190
+
191
+ | Layout | Actions |
192
+ |---|---|
193
+ | Split screen | left + right |
194
+ | Stack | top + bottom |
195
+ | Thirds | left-third + center-third + right-third |
196
+ | Quadrants | top-left + top-right + bottom-left + bottom-right |
197
+ | Six-up (3×2) | All six `*-*-third` positions |
198
+ | Eight-up (4×2) | All eight `*-*-fourth` positions |
199
+ | Distribute | Single `distribute` intent (auto-grid) |
200
+
201
+ CLI shortcuts compile into the same distributor:
202
+
203
+ - `lattices tile family` → smart-grid the frontmost app's visible windows
204
+ - `lattices distribute iTerm2 right` → smart-grid visible iTerm windows inside the right half
205
+
206
+ ## HandsOff Smart Distribution
207
+
208
+ When the LLM sends multiple `tile_window` actions targeting the **same position**, `HandsOffSession.distributeTileActions()` subdivides:
209
+
210
+ - 2+ windows → "left" becomes top-left, left, bottom-left
211
+ - 2+ windows → "right" becomes top-right, right, bottom-right
212
+ - 2+ windows → "maximize" fans out to quadrants then halves
213
+
214
+ ## Guardrails
215
+
216
+ - **Typed placement validation**: invalid placement strings or objects are rejected at the daemon boundary.
217
+ - **Recently-tiled dedup**: `IntentEngine.recentlyTiledWids` prevents the same window from being matched twice within 2 seconds during batch operations.
218
+ - **Compatibility wrappers**: `window.tile` still works, but routes through the same placement machinery.
219
+
220
+ ## Current Gaps
221
+
222
+ 1. **Voice extraction still needs to catch up**: the canonical executor understands horizontal thirds and edge quarters, but the local voice resolver still needs broader phrase coverage.
223
+ 2. **HUD coverage is narrower than the executor**: keyboard tiling exposes a small subset of the full placement vocabulary.
224
+ 3. **Optimization and layer actions are still wrapper-level**: `space.optimize` and `layer.activate` are now stable action IDs, but they currently wrap existing distributor and layer-switching behavior rather than a full planner.
package/docs/twins.md ADDED
@@ -0,0 +1,138 @@
1
+ ---
2
+ title: Project Twins
3
+ description: Pi-backed project twins for mediated, persistent agent execution
4
+ order: 3
5
+ ---
6
+
7
+ A project twin is a persistent software counterpart to a codebase.
8
+
9
+ It is not the primary agent. It is the project-native runtime that sits
10
+ between a general-purpose caller and the project's execution protocol.
11
+
12
+ ## Why a twin exists
13
+
14
+ General-purpose agents are interchangeable. Project protocols are not.
15
+
16
+ If every primary agent has to learn the project's tool surface, memory
17
+ policy, protocol semantics, and context conventions from scratch, the
18
+ integration becomes brittle. A twin fixes that by becoming the stable
19
+ project-facing runtime:
20
+
21
+ - The **primary agent** asks for work
22
+ - The **twin** resumes with the right context and memory
23
+ - The **protocol** stays behind the twin boundary
24
+
25
+ ```text
26
+ primary agent -> project twin -> project protocol / harness
27
+ ```
28
+
29
+ The twin is the client of record for the project.
30
+
31
+ ## Responsibilities
32
+
33
+ A project twin owns:
34
+
35
+ - Project-scoped identity
36
+ - Persistent session continuity
37
+ - Memory compaction and continuation
38
+ - Tool policy and allowed capabilities
39
+ - Protocol knowledge
40
+ - Project context assembly
41
+ - Caller-facing summaries and handoffs
42
+
43
+ A primary agent should not speak the project protocol directly. It should
44
+ invoke the twin.
45
+
46
+ ## Pi-backed runtime
47
+
48
+ Pi is a good fit for the twin runtime because it already provides:
49
+
50
+ - Persistent sessions
51
+ - RPC mode for long-running subprocess integration
52
+ - Tool calling with an explicit harness
53
+ - Compaction and summarization hooks
54
+ - Context files, prompt templates, and extension loading
55
+
56
+ That makes the split:
57
+
58
+ - **Twin**: product concept and policy boundary
59
+ - **Pi**: reasoning and session runtime
60
+ - **Host system**: orchestration, durable memory, and protocol adapters
61
+
62
+ Pi powers the twin. It does not define the twin.
63
+
64
+ ## Invocation model
65
+
66
+ The primary agent makes a single mediated call into the twin:
67
+
68
+ 1. Resume the twin session
69
+ 2. Inject caller context, project memory, and protocol state
70
+ 3. Let the twin do project-local work inside the harness
71
+ 4. Return a concise result to the caller
72
+
73
+ The caller should see a stable capability surface such as:
74
+
75
+ - `status`
76
+ - `inspect`
77
+ - `plan`
78
+ - `execute`
79
+ - `summarize`
80
+ - `handoff`
81
+
82
+ It should not see raw protocol-shaped operations unless that protocol is
83
+ itself the public product surface.
84
+
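As an illustration of what a stable capability surface could look like on the caller side, here is a small TypeScript sketch. The `TwinCapability` type and `isTwinCapability` guard are hypothetical and are not taken from `bin/project-twin.ts`; only the six capability names come from the list above.

```typescript
// The six caller-facing capabilities listed above.
const TWIN_CAPABILITIES = [
  "status",
  "inspect",
  "plan",
  "execute",
  "summarize",
  "handoff",
] as const;

type TwinCapability = (typeof TWIN_CAPABILITIES)[number];

// Rejects anything outside the stable surface, such as a raw
// protocol-shaped operation name.
function isTwinCapability(name: string): name is TwinCapability {
  return (TWIN_CAPABILITIES as readonly string[]).indexOf(name) !== -1;
}
```

A host could use a guard like this to refuse protocol-shaped requests before they reach the twin.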
85
+ ## Implementation in this repo
86
+
87
+ This repo now includes a Pi-backed runtime in
88
+ [`bin/project-twin.ts`](../bin/project-twin.ts).
89
+
90
+ The runtime:
91
+
92
+ - Spawns `pi --mode rpc` as a persistent subprocess
93
+ - Stores project-local session state under `.openscout/twins/<name>/sessions`
94
+ - Exposes a stable `invoke()` API for callers
95
+ - Optionally injects OpenScout relay context if `.openscout/relay*` exists
96
+
97
+ The default harness is intentionally narrow:
98
+
99
+ - Built-in Pi tools are explicitly pinned to `read,bash,edit,write`
100
+ - Extension, skill, and prompt-template discovery are disabled by default
101
+ - Project instructions still come from `AGENTS.md` and related context files
102
+
103
+ This keeps the twin deterministic unless the host explicitly widens the
104
+ surface.
105
+
106
+ ## Example
107
+
108
+ ```ts
109
+ import { ProjectTwin } from "@lattices/cli"
110
+
111
+ const twin = new ProjectTwin({
112
+ cwd: "/Users/you/dev/my-project",
113
+ name: "my-project",
114
+ model: "anthropic/claude-sonnet-4-5",
115
+ })
116
+
117
+ await twin.start()
118
+
119
+ const result = await twin.invoke({
120
+ caller: "primary-agent",
121
+ protocol: "openscout-relay",
122
+ memory: "The caller is debugging relay enrollment and wants the next safe action.",
123
+ task: "Inspect the available project context and summarize what the caller should do next.",
124
+ })
125
+
126
+ console.log(result.text)
127
+
128
+ await twin.stop()
129
+ ```
130
+
131
+ ## Design rule
132
+
133
+ All project-specific protocol semantics should live behind the twin
134
+ boundary.
135
+
136
+ The primary agent should invoke the twin as a skill-like capability.
137
+ The twin should own context assembly, protocol interaction, and the final
138
+ handoff back to the caller.
package/docs/voice-command-protocol.md ADDED
@@ -0,0 +1,278 @@
1
+ # Voice Command Protocol — Lattices ↔ Vox
2
+
3
+ ## Overview
4
+
5
+ Lattices delegates all audio capture and transcription to Vox via WebSocket JSON-RPC. Lattices never accesses the microphone directly — it borrows Vox's mic and transcription pipeline, receives English text back, and routes it through its own intent engine.
6
+
7
+ These dictations are **ephemeral** — Vox does not persist them as memos, sync them, or add them to Vox's history. Lattices is just using Vox as a transcription pipe.
8
+
9
+ ## Vox Process Model
10
+
11
+ Vox consists of three independent processes:
12
+
13
+ | Process | Role | Relevance to Lattices |
14
+ |---|---|---|
15
+ | **Vox.app** | Main UI — menu bar, notch visualization, memo history | None |
16
+ | **Vox** | Background service — mic access, recording, hotkeys, orchestrates transcription, state notifications | **This is what Lattices connects to** |
17
+ | **VoxEngine** | Transcription engine — runs Whisper models, called by Vox internally | Indirect — Vox delegates to it |
18
+
19
+ Vox is the right target because:
20
+ - It owns the mic and recording lifecycle
21
+ - It's the long-running background process (always up when Vox is installed)
22
+ - It already orchestrates the record → transcribe → result pipeline
23
+ - It's easy to discover via its existing DistributedNotification
24
+
25
+ ## Service Discovery
26
+
27
+ Lattices never hardcodes ports. Discovery uses two mechanisms, followed by a health check:
28
+
29
+ ### 1. Well-known file (at rest)
30
+
31
+ Vox writes its service configuration on startup:
32
+
33
+ ```
34
+ ~/.vox/services.json
35
+ ```
36
+
37
+ ```json
38
+ {
39
+ "agent": {"port": 19823, "pid": 48209},
40
+ "engine": {"port": 19821, "pid": 48210},
41
+ "sync": {"port": 19820, "pid": 48208},
42
+ "inference": {"port": 19822, "pid": 48212}
43
+ }
44
+ ```
45
+
46
+ Lattices reads `agent.port` from this file. If the file doesn't exist, Vox isn't installed.
47
+
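Extracting the port from the file contents can be sketched as below. The helper name `parseAgentPort` is hypothetical, and the sketch works on the JSON text only; reading `~/.vox/services.json` from disk and treating a missing file as "not installed" is left to the caller.

```typescript
// Returns agent.port from the services.json text, or null when the text
// is not valid JSON or does not contain a usable port.
function parseAgentPort(jsonText: string): number | null {
  let parsed: unknown;
  try {
    parsed = JSON.parse(jsonText);
  } catch {
    return null;
  }
  if (typeof parsed !== "object" || parsed === null) return null;

  const agent = (parsed as Record<string, unknown>)["agent"];
  if (typeof agent !== "object" || agent === null) return null;

  const port = (agent as Record<string, unknown>)["port"];
  const valid =
    typeof port === "number" && Number.isInteger(port) && port > 0 && port < 65536;
  return valid ? (port as number) : null;
}
```

Returning `null` for malformed or stale content lets the caller fall back to waiting for the live notification instead of connecting to a bad port.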
48
+ ### 2. DistributedNotification (live discovery)
49
+
50
+ Vox posts when it comes online:
51
+
52
+ ```
53
+ Notification: com.jdi.vox.agent.live.ready
54
+ UserInfo: {"agentPort": 19823, "pid": 48209}
55
+ ```
56
+
57
+ Lattices subscribes to this on startup. This covers:
58
+ - **Vox launches after Lattices** — Lattices picks up the port dynamically
59
+ - **Vox restarts** — Lattices reconnects with the new port
60
+ - **Port changes** — no stale config
61
+
62
+ ### 3. Health check
63
+
64
+ After discovering a port, Lattices confirms Vox is alive:
65
+
66
+ ```json
67
+ → {"id": "hc", "method": "ping"}
68
+ ← {"id": "hc", "result": {"pong": true}}
69
+ ```
70
+
71
+ If ping fails, Lattices marks voice as unavailable and retries on the next `live.ready` or after ~30 seconds.
72
+
73
+ ### When Vox is not running
74
+
75
+ Three possible states:
76
+
77
+ | State | How detected | Lattices behavior |
78
+ |---|---|---|
79
+ | **Not installed** | `/Applications/Vox.app` doesn't exist and no `~/.vox/` dir | Footer: `[Space] Voice (unavailable)` — no recovery action |
80
+ | **Installed but not running** | App bundle exists, but `services.json` missing/stale or ping fails | Footer: `[Space] Voice (start Vox)` — pressing Space runs `open /Applications/Vox.app`, which brings up Vox as a side effect |
81
+ | **Running** | Ping succeeds | Normal operation |
82
+
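The three states in the table can be expressed as a small classifier. This is a sketch with hypothetical names (`VoxProbe`, `classifyVox`, `footerFor`). It assumes that a `~/.vox/` directory without an app bundle still counts as installed, a case the table leaves open.

```typescript
type VoxAvailability = "not-installed" | "installed-not-running" | "running";

// Results of the three checks described in the table.
interface VoxProbe {
  appBundleExists: boolean; // /Applications/Vox.app is present
  voxDirExists: boolean;    // ~/.vox/ is present
  pingOk: boolean;          // the health check succeeded
}

function classifyVox(p: VoxProbe): VoxAvailability {
  if (p.pingOk) return "running";
  if (!p.appBundleExists && !p.voxDirExists) return "not-installed";
  return "installed-not-running";
}

// Footer hint for the Space key, using the strings from the table.
function footerFor(state: VoxAvailability): string {
  switch (state) {
    case "running":
      return "[Space] Voice";
    case "installed-not-running":
      return "[Space] Voice (start Vox)";
    case "not-installed":
      return "[Space] Voice (unavailable)";
  }
}
```

Checking `pingOk` first means a running Vox is always usable, regardless of where it was installed.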
83
+ Launch-on-demand flow:
84
+ 1. User presses Space while Vox is down but Vox is installed
85
+ 2. Lattices runs `NSWorkspace.shared.open(URL(fileURLWithPath: "/Applications/Vox.app"))`
86
+ 3. Feedback strip shows "Starting Vox..."
87
+ 4. Lattices waits for `live.ready` notification (timeout: 10s)
88
+ 5. On `live.ready`, connects and proceeds with `startDictation`
89
+ 6. On timeout, shows "Couldn't reach Vox — try opening it manually"
90
+
91
+ Passive behavior (no user action):
92
+ - No log spam — just a quiet unavailable state
93
+ - Lattices keeps listening for `live.ready` and re-checks `services.json` periodically (~30s)
94
+ - The moment Vox comes online, voice becomes available — no restart needed
95
+
96
+ ## Protocol
97
+
98
+ ### Wire Format
99
+
100
+ Uses Vox's JSON-RPC format over WebSocket:
101
+
102
+ ```
103
+ Request: {"id": "...", "method": "...", "params": {...}}
104
+ Response: {"id": "...", "result": {...}} or {"id": "...", "error": "..."}
105
+ Event: {"event": "...", "data": {...}} (server push, no id)
106
+ ```
107
+
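A client needs to tell the three incoming shapes apart. The sketch below shows one way to do that in TypeScript; the `Incoming` type and `classifyMessage` function are hypothetical names, and extra fields such as `owner` on an error are ignored here.

```typescript
type Incoming =
  | { kind: "result"; id: string; result: unknown }
  | { kind: "error"; id: string; error: string }
  | { kind: "event"; event: string; data: unknown }
  | { kind: "unknown" };

// Classifies one WebSocket text frame according to the wire format above.
function classifyMessage(raw: string): Incoming {
  let msg: unknown;
  try {
    msg = JSON.parse(raw);
  } catch {
    return { kind: "unknown" };
  }
  if (typeof msg !== "object" || msg === null) return { kind: "unknown" };

  const m = msg as Record<string, unknown>;
  const event = m["event"];
  const id = m["id"];
  const error = m["error"];

  // Server pushes carry "event" and no id.
  if (typeof event === "string") return { kind: "event", event, data: m["data"] };
  // Responses are matched to requests by id.
  if (typeof id !== "string") return { kind: "unknown" };
  if (typeof error === "string") return { kind: "error", id, error };
  if ("result" in m) return { kind: "result", id, result: m["result"] };
  return { kind: "unknown" };
}
```

Keeping the classification separate from the socket code makes it easy to test against the sample frames in this document.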
108
+ ### Methods (Lattices → Vox)
109
+
110
+ **`startDictation`** — Start recording from the mic.
111
+
112
+ ```json
113
+ {"id": "1", "method": "startDictation", "params": {
114
+ "source": "lattices",
115
+ "persist": false
116
+ }}
117
+ ```
118
+
119
+ - `source` — identifies the caller (for Vox's logging/UI)
120
+ - `persist: false` — do not save as a memo, do not sync, do not show in Vox history
121
+
122
+ Response (immediate ack):
123
+ ```json
124
+ {"id": "1", "result": {"ok": true}}
125
+ ```
126
+
127
+ Error responses:
128
+ ```json
129
+ {"id": "1", "error": "Microphone access denied"}
130
+ {"id": "1", "error": "No model loaded"}
131
+ {"id": "1", "error": "mic_busy", "owner": "vox"}
132
+ ```
133
+
134
+ The `mic_busy` error means another consumer (Vox's own memo recording, or another client) already has an active dictation. The `owner` field identifies who holds the mic. Lattices shows: "Mic in use by Vox — finish your memo first".
135
+
136
+ The reverse case (user hits Vox hotkey while Lattices has the mic) is handled on Vox's side — it should reject its own recording with an equivalent busy state. Vox is the single owner of mic arbitration.
137
+
138
+ **`stopDictation`** — Stop recording and return the transcript.
139
+
140
+ ```json
141
+ {"id": "2", "method": "stopDictation"}
142
+ ```
143
+
144
+ Response (after transcription completes):
145
+ ```json
146
+ {"id": "2", "result": {
147
+ "transcript": "tile this left",
148
+ "confidence": 0.94,
149
+ "durationMs": 1820
150
+ }}
151
+ ```
152
+
153
+ **`cancelDictation`** — Abort without transcribing.
154
+
155
+ ```json
156
+ {"id": "3", "method": "cancelDictation"}
157
+ ```
158
+
159
+ ```json
160
+ {"id": "3", "result": {"ok": true}}
161
+ ```
162
+
163
+ ### Events (Vox → Lattices)
164
+
165
+ Pushed over the WebSocket connection during an active dictation.
166
+
167
+ | Event | When | Data |
168
+ |---|---|---|
169
+ | `dictation.started` | Mic is hot, recording has begun | `{"source": "lattices"}` |
170
+ | `dictation.transcribing` | Recording stopped, model is running | `{}` |
171
+ | `dictation.result` | Transcription complete | `{"transcript": "...", "confidence": 0.94, "durationMs": 1820}` |
172
+ | `dictation.error` | Something failed during recording or transcription | `{"message": "..."}` |
173
+
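On the Lattices side, these events can drive the cheat sheet state with a small reducer. The sketch below is an assumption about how that mapping might look; `VoiceUiState`, `VoxEvent`, and `reduceVoiceState` are hypothetical names, and the fallback messages are illustrative.

```typescript
type VoiceUiState =
  | { kind: "idle" }
  | { kind: "listening" }
  | { kind: "transcribing" }
  | { kind: "result"; transcript: string }
  | { kind: "error"; message: string };

interface VoxEvent { event: string; data: Record<string, unknown> }

// Maps one pushed event to the next UI state. Unknown events leave the
// state unchanged.
function reduceVoiceState(state: VoiceUiState, ev: VoxEvent): VoiceUiState {
  switch (ev.event) {
    case "dictation.started":
      return { kind: "listening" };
    case "dictation.transcribing":
      return { kind: "transcribing" };
    case "dictation.result": {
      const transcript = ev.data["transcript"];
      return typeof transcript === "string"
        ? { kind: "result", transcript }
        : { kind: "error", message: "Malformed result" };
    }
    case "dictation.error": {
      const message = ev.data["message"];
      return { kind: "error", message: typeof message === "string" ? message : "Unknown error" };
    }
    default:
      return state;
  }
}
```

Because the reducer is pure, the full lifecycle can be replayed in a test without a running Vox instance.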
174
+ ## Disconnect Contract
175
+
176
+ If the WebSocket connection drops mid-dictation (Lattices crashes, user quits, network hiccup), Vox **must** auto-cancel the in-flight dictation:
177
+
178
+ 1. Stop recording immediately
179
+ 2. Discard any captured audio — do not transcribe
180
+ 3. Release the mic so Vox's own UI or a reconnecting client can use it
181
+ 4. Log the orphaned dictation for diagnostics: `[dictation] orphaned session from lattices — connection dropped, auto-cancelled`
182
+
183
+ Vox treats a closed WebSocket as an implicit `cancelDictation`. No grace period, no buffering — if the consumer is gone, the recording is worthless.
184
+
185
+ On the Lattices side, if the connection drops while in `listening` or `transcribing` state:
186
+ - Feedback strip: "Connection lost" (red)
187
+ - Attempt reconnect via normal discovery (ping → `services.json` → wait for `live.ready`)
188
+ - Do not auto-retry the dictation — the user needs to press Space again
189
+
190
+ ## End-to-End Lifecycle
191
+
192
+ ```mermaid
193
+ sequenceDiagram
194
+ participant U as User
195
+ participant L as Lattices UI
196
+ participant TA as Vox
197
+ participant IE as Intent Engine
198
+
199
+ U->>L: Press Space (in cheat sheet)
200
+ L->>TA: startDictation (persist: false)
201
+
202
+ alt Error
203
+ TA-->>L: error (mic denied / no model)
204
+ L->>U: Red text in feedback strip
205
+ else OK
206
+ TA-->>L: {ok: true}
207
+ TA-->>L: dictation.started
208
+ L->>U: Green dot (pulsing) + "Listening..."
209
+
210
+ Note over U,TA: User speaks...
211
+
212
+ U->>L: Press Space again
213
+ L->>TA: stopDictation
214
+ TA-->>L: dictation.transcribing
215
+ L->>U: "Transcribing..."
216
+
217
+ TA-->>L: {transcript: "tile this left", confidence: 0.94}
218
+ L->>U: Show transcript
219
+ end
220
+
221
+ L->>IE: Classify via NLEmbedding
222
+ IE-->>L: intent: tile_window, slots: {position: left}, confidence: 0.95
223
+ L->>U: Show intent + slots
224
+
225
+ L->>IE: Execute
226
+ IE-->>L: result
227
+ L->>U: "Done" or error
228
+
229
+ Note over L: Log entry written
230
+ ```
231
+
232
+ ## UI States
233
+
234
+ | State | Feedback strip | Footer |
235
+ |---|---|---|
236
+ | **Idle** | Hidden | `[Space] Voice [ESC] Dismiss` |
237
+ | **Not installed** | Hidden | `[Space] Voice (unavailable) [ESC] Dismiss` |
238
+ | **Installed, not running** | Hidden | `[Space] Voice (start Vox) [ESC] Dismiss` |
239
+ | **Starting** | "Starting Vox..." | `[ESC] Cancel` |
240
+ | **Error** | Red: "Mic access denied" or "Mic in use by Vox" | `[ESC] Dismiss` |
241
+ | **Disconnected** | Red: "Connection lost" | `[ESC] Dismiss` |
242
+ | **Listening** | Green dot + "Listening..." | `[Space] Stop [ESC] Cancel` |
243
+ | **Transcribing** | "Transcribing..." | `[ESC] Cancel` |
244
+ | **Result** | `"tile this left"` → `tile window · position: left` → `Done` | `[Space] New [ESC] Dismiss` |
245
+
246
+ ## Logging
247
+
248
+ Every voice command produces a diagnostic log entry:
249
+
250
+ ```
251
+ [voice] "tile this left" → tile_window(position: left) → ok (conf=0.95, 1820ms)
252
+ [voice] "organize my stuff" → distribute() → ok (conf=0.79, 2100ms)
253
+ [voice] "do something weird" → (no match, conf=0.41, 900ms)
254
+ [voice] error: Vox not running
255
+ [voice] error: mic_busy (owner: vox)
256
+ [voice] error: connection dropped mid-dictation
257
+ [voice] launched Vox, connected in 2.1s
258
+ ```
259
+
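The command lines in that format could be produced by a formatter along these lines. This is a sketch that reproduces the sample lines above; the `VoiceLogEntry` shape and `formatVoiceLog` name are hypothetical, and the plain `error:` and `launched` lines are not covered.

```typescript
interface VoiceLogEntry {
  transcript: string;
  intent: string | null;          // null when no intent matched
  slots: Record<string, string>;  // e.g. { position: "left" }
  outcome: string;                // e.g. "ok"
  confidence: number;
  durationMs: number;
}

function formatVoiceLog(e: VoiceLogEntry): string {
  const stats = `conf=${e.confidence.toFixed(2)}, ${e.durationMs}ms`;
  if (e.intent === null) {
    return `[voice] "${e.transcript}" → (no match, ${stats})`;
  }
  // Render slots as "name: value" pairs inside the parentheses.
  const args = Object.keys(e.slots)
    .map((k) => `${k}: ${e.slots[k]}`)
    .join(", ");
  return `[voice] "${e.transcript}" → ${e.intent}(${args}) → ${e.outcome} (${stats})`;
}
```

An intent with no slots renders with empty parentheses, matching the `distribute()` sample.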
260
+ ## Implementation Scope
261
+
262
+ ### Lattices side
263
+ - Use `@vox/client` SDK (`VoxClient` with `service: "agent"`, `clientId: "lattices"`, `capabilities: ["dictation"]`) — see `vox/sdk/SDK.md` for full reference
264
+ - Replace `AVAudioRecorder` in `VoxAudioProvider` with `createDictationSession().start({ persist: false })`
265
+ - Remove mic entitlement and `NSMicrophoneUsageDescription` (Lattices never touches the mic)
266
+ - Service discovery, auto-reconnect, and auth are handled by the SDK
267
+ - Map `DictationSession` events (`stateChange`, `partialTranscript`, `finalTranscript`, `error`) to cheat sheet UI states
268
+ - Handle `MicBusyError` — show `"Mic in use by ${error.owner}"`
269
+
270
+ ### Vox side (separate repo)
271
+ - Expose a WebSocket bridge (or add methods to existing bridge)
272
+ - Add `startDictation`, `stopDictation`, `cancelDictation` handlers
273
+ - Emit `dictation.started`, `dictation.transcribing`, `dictation.result`, `dictation.error` events
274
+ - Honor `persist: false` — skip memo creation and sync
275
+ - Write `~/.vox/services.json` on startup (all service ports)
276
+ - Include `agentPort` in `live.ready` notification userInfo
277
+ - Return `mic_busy` error with `owner` field when another consumer holds the mic
278
+ - Auto-cancel dictation on WebSocket disconnect (closed socket = implicit cancel)