shmakk 1.2.3 → 1.2.5

This diff shows the changes between two publicly released versions of the package, as they appear in the public registry. It is provided for informational purposes only.
Files changed (52)
  1. package/.env.example +11 -0
  2. package/README.md +75 -1
  3. package/docs/index.html +154 -16
  4. package/docs/mcp.md +78 -0
  5. package/docs/ssh.md +82 -0
  6. package/docs/vibedit-analysis.md +375 -0
  7. package/docs/vim.md +110 -0
  8. package/docs/voice.md +4 -0
  9. package/package.json +9 -5
  10. package/scripts/test-vibedit.js +45 -0
  11. package/scripts/vibedit-demo.sh +52 -0
  12. package/skills/shmakk-skill-creator.md +269 -0
  13. package/src/_check.js +7 -0
  14. package/src/_check_schema.js +5 -0
  15. package/src/_cleanup.js +18 -0
  16. package/src/_fix.js +9 -0
  17. package/src/_test_import.js +15 -0
  18. package/src/agent.js +11 -4
  19. package/src/browser-daemon.js +209 -0
  20. package/src/browser.js +10 -0
  21. package/src/cli/browserDaemon.js +60 -0
  22. package/src/cli/connectBrowser.js +137 -0
  23. package/src/cli.js +235 -8
  24. package/src/completions.js +8 -0
  25. package/src/control.js +273 -1
  26. package/src/core/browserConnector.js +523 -0
  27. package/src/correction.js +6 -0
  28. package/src/electron.js +305 -0
  29. package/src/endpoints.js +74 -9
  30. package/src/index.js +24 -1
  31. package/src/llm.js +501 -61
  32. package/src/mobile.js +307 -0
  33. package/src/notify.js +51 -3
  34. package/src/orchestrator.js +35 -1
  35. package/src/pty.js +11 -6
  36. package/src/review.js +45 -11
  37. package/src/self-commands.js +153 -0
  38. package/src/session-convert.js +508 -0
  39. package/src/session-search.js +31 -0
  40. package/src/session.js +392 -46
  41. package/src/skills/browserActions.ts +984 -0
  42. package/src/skills.js +451 -24
  43. package/src/system-prompt.js +31 -25
  44. package/src/tools.js +81 -0
  45. package/src/vibedit/control.js +534 -0
  46. package/src/vibedit/electron.js +108 -0
  47. package/src/vibedit/files.js +171 -0
  48. package/src/vibedit/index.js +298 -0
  49. package/src/vibedit/overlay.js +1482 -0
  50. package/src/vibedit/prompts.js +245 -0
  51. package/src/vibedit/state.js +32 -0
  52. package/src/vim.js +410 -0
package/docs/vibedit-analysis.md ADDED
@@ -0,0 +1,375 @@
1
+ # vibedit Architecture Analysis
2
+
3
+ ## Overview
4
+
5
+ vibedit is a visual in-browser editor that lets you click on elements of a live webpage, edit them visually, and map those edits back to source code changes on disk. It runs the target app in a Playwright Chromium instance, injects a shadow-DOM overlay panel, and communicates with an LLM (default: LM Studio running qwen3.5-9b) over an OpenAI-compatible API. Screenshots are captured server-side via Playwright's `page.screenshot()` and sent as base64 JPEGs in multimodal chat requests.
6
+
7
+ ---
8
+
9
+ ## System Architecture (Data Flow)
10
+
11
+ ```
12
+ [User browser page] <--WebSocket--> [Node control server] <--HTTP POST--> [LM Studio /chat/completions]
13
+ | | |
14
+ overlay.js injected control.js llm.js (OpenAI-compatible)
15
+ (shadow DOM panel) (WS message routing, (fetch wrapper with
16
+ - DOM editing screenshot capture, multimodal retry)
17
+ - WS client LLM orchestration)
18
+ - flow recording
19
+ |
20
+ files.js (source file matching + edit block application)
21
+ prompts.js (system/user prompt templates)
22
+ ```
23
+
24
+ ---
25
+
26
+ ## 1. Screenshot Capture
27
+
28
+ **Where:** `src/control.js:30-34` — `screenshotB64()`
29
+
30
+ ```js
31
+ async function screenshotB64() {
32
+ const buf = await page.screenshot({ type: "jpeg", quality: 60, fullPage: false });
33
+ return buf.toString("base64");
34
+ }
35
+ ```
36
+
37
+ Uses Playwright's `page.screenshot()` (viewport only, `fullPage: false`). Quality is 60 JPEG. Returns a base64-encoded string. The `page` reference is captured from `index.js:43` when launching the browser.
38
+
39
+ The screenshots are attached to LLM requests as `images` array entries inside the user message (in OpenAI vision format): base64 data URIs with `data:image/jpeg;base64,${b64}` prefix. See `src/llm.js:17-20`.
40
+
41
+ **Control flow:** Every `chat`, `save`, and `flowApply` message type triggers a `screenshotB64()` call BEFORE calling the LLM. The screenshot is the current viewport state at the moment the user clicked Send, Save, or Apply. If `ctx.vision` is falsy (i.e., `--vision` flag not passed), screenshot capture is skipped and the request is text-only.
42
+
43
+ ---
44
+
45
+ ## 2. LLM API Endpoint Format
46
+
47
+ **Where:** `src/llm.js` — the `chat()` function
48
+
49
+ ### Endpoint
50
+
51
+ ```
52
+ POST {ctx.lmUrl}/chat/completions
53
+ ```
54
+
55
+ Default `ctx.lmUrl` = `http://127.0.0.1:1234/v1` (LM Studio default).
56
+
57
+ Configurable via:
58
+ - `--lm <url>` CLI flag
59
+ - `LMSTUDIO_URL` environment variable
60
+ - Falls back to the hardcoded default above
61
+
62
+ ### Request Body Shape
63
+
64
+ ```json
65
+ {
66
+ "model": "qwen/qwen3.5-9b",
67
+ "temperature": 0.2,
68
+ "max_tokens": 2048,
69
+ "messages": [
70
+ {
71
+ "role": "system",
72
+ "content": "You are a frontend editing assistant..."
73
+ },
74
+ {
75
+ "role": "user",
76
+ "content": [
77
+ { "type": "text", "text": "Page URL: http://..." },
78
+ {
79
+ "type": "image_url",
80
+ "image_url": { "url": "data:image/jpeg;base64,<base64>" }
81
+ }
82
+ ]
83
+ }
84
+ ]
85
+ }
86
+ ```
87
+
88
+ ### Key Parameters
89
+
90
+ | Param | Default | Override |
91
+ |-------|---------|----------|
92
+ | `model` | `qwen/qwen3.5-9b` | `--model <id>` or `VIBEDIT_MODEL` env |
93
+ | `temperature` | `0.2` | Hardcoded in `chat()` call site |
94
+ | `max_tokens` | `2048` (chat), `4096` (save/flowApply) | Passed via `opts.maxTokens` |
95
+ | `vision` | `false` | `--vision` flag or `VIBEDIT_VISION=1` env |
96
+
97
+ ### Multimodal Handling
98
+
99
+ Messages carry an `images` array of base64 JPEG strings. If the model rejects multimodal input (non-2xx response), `llm.js` retries once with text-only (`src/llm.js:40-46`).
100
+
101
+ ```js
102
+ try { return await call(hasImages); }
103
+ catch (err) {
104
+ if (hasImages) {
105
+ console.warn("[vibedit] multimodal request failed, retrying text-only:", err.message);
106
+ return await call(false);
107
+ }
108
+ throw err;
109
+ }
110
+ ```
111
+
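A minimal sketch, not taken from the package, of how the `images` array might be expanded into the `image_url` content parts shown in the request body above. The helper name `toUserContent` and its `withImages` parameter are assumptions made for illustration; only the data-URI prefix and the part shapes come from the analysis.

```javascript
// Hypothetical helper: build the `content` field of the user message.
// When images are disabled or absent, fall back to a plain string so the
// same function serves the text-only retry path.
function toUserContent(text, images, withImages) {
  if (!withImages || !Array.isArray(images) || images.length === 0) {
    return text;
  }
  return [
    { type: "text", text },
    // Each base64 JPEG becomes one OpenAI-style image part.
    ...images.map((b64) => ({
      type: "image_url",
      image_url: { url: `data:image/jpeg;base64,${b64}` },
    })),
  ];
}
```

Returning a plain string for the text-only case keeps the retry trivial: the caller rebuilds the message with `withImages` set to `false` and sends it again.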
112
+ ### Response Parsing
113
+
114
+ The raw response is extracted as: `data?.choices?.[0]?.message?.content ?? ""`
115
+
116
+ This string is then parsed differently depending on the message type.
117
+
118
+ ---
119
+
120
+ ## 3. Prompt Design (Three Distinct Prompt Sets)
121
+
122
+ ### 3.1 Chat Prompt (live DOM editing)
123
+
124
+ **System prompt** (`src/prompts.js:3-21` — `chatSystem()`):
125
+
126
+ The LLM is told it is a "frontend editing assistant embedded in a live web page." It receives a pruned DOM and a user request, and must respond with ONLY a JSON object in this shape:
127
+
128
+ ```json
129
+ {
130
+ "reply": "one or two short sentences for the user",
131
+ "ops": [
132
+ { "selector": "css selector", "action": "setText", "value": "new text" },
133
+ { "selector": "css selector", "action": "setStyle", "style": { "color": "#ff0000" } },
134
+ { "selector": "css selector", "action": "setHTML", "value": "<b>html</b>" },
135
+ { "selector": "css selector", "action": "setAttr", "name": "src", "value": "..." },
136
+ { "selector": "css selector", "action": "remove" }
137
+ ]
138
+ }
139
+ ```
140
+
141
+ Rules encoded in the system prompt:
142
+ - Prefer IDs, then stable class names
143
+ - Return `"ops": []` if no visual change requested
144
+ - Never invent selectors; say so in `reply` if unsure
145
+
146
+ **User prompt** (`src/prompts.js:23-28` — `chatUser()`):
147
+
148
+ ```
149
+ Page URL: {msg.url}
150
+ Page title: {msg.title}
151
+ Currently selected element: {msg.selected} (if any)
152
+ Pruned DOM: {msg.dom} (truncated to 9000 chars)
153
+
154
+ User request: {msg.text}
155
+ ```
156
+
157
+ The DOM is the pruned outerHTML of `<body>` with `<script>`, `<style>`, `<noscript>`, `<svg>`, metadata stripped, and data- attributes removed or truncated (see `overlay.js:260-273` — `prunedDOM()`).
158
+
159
+ **Response parsing** (`src/control.js:48-56` — inside `handleChat()`):
160
+
161
+ ```js
162
+ let parsed;
163
+ try {
164
+ parsed = JSON.parse(stripFences(raw));
165
+ } catch {
166
+ parsed = { reply: raw, ops: [] };
167
+ }
168
+ send(ws, { type: "chatResult", reply: parsed.reply || "", ops: Array.isArray(parsed.ops) ? parsed.ops : [] });
169
+ ```
170
+
171
+ `stripFences()` removes leading/trailing markdown code fences (```json ... ```). Graceful fallback: if JSON parse fails, the raw text becomes `reply` and `ops` is empty.
172
+
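A minimal sketch of the fence stripping and fallback described above. The body of `stripFences()` is not shown in the analysis, so the regular expressions here are an assumption; `parseChatReply` is a hypothetical wrapper around the parsing excerpt, with one extra guard for non-object JSON that the excerpt does not have.

```javascript
// The fence marker is built at runtime so this sample nests cleanly in Markdown.
const FENCE = "`".repeat(3);
const OPEN_RE = new RegExp("^" + FENCE + "[a-zA-Z0-9]*[ \\t]*\\n?");
const CLOSE_RE = new RegExp("\\n?" + FENCE + "\\s*$");

// Assumed implementation: drop one leading and one trailing code fence.
function stripFences(raw) {
  return raw.trim().replace(OPEN_RE, "").replace(CLOSE_RE, "").trim();
}

// Hypothetical wrapper: always returns { reply, ops } even for malformed output.
function parseChatReply(raw) {
  let parsed;
  try {
    const value = JSON.parse(stripFences(raw));
    // Extra guard (not in the excerpt): reject null, numbers, strings.
    if (value === null || typeof value !== "object") throw new Error("not an object");
    parsed = value;
  } catch {
    parsed = { reply: raw, ops: [] };
  }
  return {
    reply: parsed.reply || "",
    ops: Array.isArray(parsed.ops) ? parsed.ops : [],
  };
}
```

With this shape, a model that answers in plain prose still produces a usable `chatResult`: the prose becomes `reply` and no DOM operations are applied.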
173
+ ### 3.2 Save Prompt (source code mapping)
174
+
175
+ **System prompt** (`src/prompts.js:30-53` — `saveSystem()`):
176
+
177
+ The LLM is told to map live DOM edits back to source code. It outputs edit blocks in SEARCH/REPLACE format:
178
+
179
+ ```
180
+ FILE: relative/path/from/project/root
181
+ <<<<<<< SEARCH
182
+ exact lines copied verbatim from the provided file content
183
+ =======
184
+ the replacement lines
185
+ >>>>>>> REPLACE
186
+ ```
187
+
188
+ Rules:
189
+ - SEARCH must be character-for-character from the provided file content
190
+ - One block per distinct change; multiple blocks per file are fine
191
+ - JSX/Vue/Svelte: edit component source, not rendered HTML
192
+ - Inline styles should become CSS rule changes
193
+ - If a change cannot be located, skip it (do not guess paths)
194
+
195
+ **User prompt** (`src/prompts.js:55-79` — `saveUser()`):
196
+
197
+ For each tracked change, shows:
198
+ - DOM changes: `CHANGE N (selector: ...)` with BEFORE/AFTER outerHTML
199
+ - CSS changes: `CHANGE N (CSS rule for selector ...)` with existing rules and new declarations
200
+
201
+ Plus candidate source files (up to 5, shortlisted by `shortlistFiles()`).
202
+
203
+ ### 3.3 Flow Prompt (user interaction recording)
204
+
205
+ **System prompt:** Same `saveSystem()` as Save.
206
+
207
+ **User prompt** (`src/prompts.js:81-105` — `flowUser()`):
208
+
209
+ Shows a timestamped event log:
210
+ ```
211
+ [1.2s] click .header "Welcome"
212
+ [3.5s] scroll to y=450
213
+ [5.1s] typed in #email
214
+ ```
215
+
216
+ Plus the pruned DOM at end of recording, the user's instruction, and candidate source files. Three screenshots (first, middle, last) are included as images when vision is enabled.
217
+
218
+ ---
219
+
220
+ ## 4. Response Parsing and Code Modification
221
+
222
+ ### Chat Result (client-side)
223
+
224
+ The overlay receives `{ type: "chatResult", reply, ops }` via WebSocket. The `ops` array is processed by `applyOps()` in `overlay.js:331-349`:
225
+
226
+ ```js
227
+ function applyOps(ops) {
228
+ for (const op of ops) {
229
+ let el = document.querySelector(op.selector);
230
+ if (!el || isOurs(el)) continue;
231
+ trackBefore(el); // record original state
232
+ if (op.action === "setText") el.textContent = op.value ?? "";
233
+ else if (op.action === "setHTML") el.innerHTML = op.value ?? "";
234
+ else if (op.action === "setStyle" && op.style)
235
+ for (const [k, v] of Object.entries(op.style))
236
+ el.style.setProperty(toKebab(k), v);
237
+ else if (op.action === "setAttr") el.setAttribute(op.name, op.value ?? "");
238
+ else if (op.action === "remove") { el.remove(); }
239
+ }
240
+ }
241
+ ```
242
+
243
+ Supported DOM actions: `setText`, `setHTML`, `setStyle`, `setAttr`, `remove`.
244
+
245
+ Each modification is tracked in the `changes` Map (keyed by CSS path of the element) so it can be reverted or saved to source.
246
+
247
+ ### Save Result (server-side)
248
+
249
+ **Edit block parsing** (`src/files.js:100-128` — `applyEditBlocks()`):
250
+
251
+ The LLM's raw text output is parsed with regex:
252
+
253
+ ```js
254
+ const BLOCK_RE = /FILE:\s*(.+?)\s*\n<{5,}\s*SEARCH\s*\n([\s\S]*?)\n={5,}\s*\n([\s\S]*?)\n>{5,}\s*REPLACE/g;
255
+ ```
256
+
257
+ Two matching strategies:
258
+
259
+ 1. **Exact match** (`exactReplace`): simple `String.indexOf()` check. Fast path.
260
+ 2. **Fuzzy match** (`fuzzyReplace`): line-trimmed match that tolerates indentation drift. Splits both search and content into lines, trims whitespace, tries to find a contiguous match of the trimmed lines.
261
+
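The two strategies above can be sketched as follows. Only `BLOCK_RE` is copied from the analysis; the function bodies, the in-memory `files` map, and the `{ applied, failed }` return shape are assumptions for illustration, not the package's actual `applyEditBlocks()`.

```javascript
// Pattern copied from the analysis; everything below it is an illustrative sketch.
const BLOCK_RE = /FILE:\s*(.+?)\s*\n<{5,}\s*SEARCH\s*\n([\s\S]*?)\n={5,}\s*\n([\s\S]*?)\n>{5,}\s*REPLACE/g;

// Fast path: replace the first verbatim occurrence of `search`, or return null.
function exactReplace(content, search, replacement) {
  const at = content.indexOf(search);
  if (at === -1) return null;
  return content.slice(0, at) + replacement + content.slice(at + search.length);
}

// Fallback: compare lines after trimming, so indentation drift still matches.
function fuzzyReplace(content, search, replacement) {
  const lines = content.split("\n");
  const wanted = search.split("\n").map((line) => line.trim());
  for (let i = 0; i + wanted.length <= lines.length; i++) {
    const hit = wanted.every((w, j) => lines[i + j].trim() === w);
    if (hit) {
      return [
        ...lines.slice(0, i),
        ...replacement.split("\n"),
        ...lines.slice(i + wanted.length),
      ].join("\n");
    }
  }
  return null;
}

// Parse every block in the model output and apply it to a path -> content map.
function applyEditBlocks(modelOutput, files) {
  const applied = [];
  const failed = [];
  for (const [, file, search, replacement] of modelOutput.matchAll(BLOCK_RE)) {
    const before = files[file];
    const after =
      before == null
        ? null
        : exactReplace(before, search, replacement) ??
          fuzzyReplace(before, search, replacement);
    if (after == null) {
      failed.push(file);
      continue;
    }
    files[file] = after;
    applied.push(file);
  }
  return { applied, failed };
}
```

The exact path splices with `slice()` rather than `String.prototype.replace()`, so a `$` in the replacement text is never interpreted as a substitution pattern.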
262
+ Vibedit project-local artifacts are stored under `.shmakk/state/`; generated
263
+ specs use `.shmakk/state/vibedit-specs/` and recorded flow media uses
264
+ `.shmakk/state/vibedit-sessions/`.
265
+
266
+ ### File Shortlisting
267
+
268
+ **Where:** `src/files.js:56-97` — `shortlistFiles()`
269
+
270
+ Before asking the LLM to generate edit blocks, vibedit determines which source files are relevant to the user's changes:
271
+
272
+ 1. Walk the project directory (excluding `node_modules`, `.git`, `dist`, `build`, etc.)
273
+ 2. Collect all files with source extensions (`.html`, `.js`, `.jsx`, `.ts`, `.tsx`, `.vue`, `.svelte`, `.astro`, `.css`, `.scss`, `.less`, `.mjs`, `.cjs`)
274
+ 3. Extract "needles" from the change data: text fragments (6-80 chars), class names, IDs, CSS property names, selector parts
275
+ 4. Score each file by needle occurrence count (weighted by needle length, capped at 5000)
276
+ 5. Return top 5 files with content trimmed to fit within 16,000 chars total budget; for large files, show windows around hit lines
277
+
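Steps 4 and 5 above can be sketched as follows. The analysis does not spell out the scoring formula, so this sketch assumes "occurrences times needle length, capped per file"; the names `countOccurrences`, `scoreFile`, and `shortlist` are hypothetical, and the content-trimming budget is left out.

```javascript
// Count non-overlapping occurrences of `needle` in `text`.
function countOccurrences(text, needle) {
  if (!needle) return 0;
  let count = 0;
  let at = text.indexOf(needle);
  while (at !== -1) {
    count++;
    at = text.indexOf(needle, at + needle.length);
  }
  return count;
}

// Assumed formula: longer needles weigh more; the total is capped per file.
function scoreFile(content, needles, cap = 5000) {
  let score = 0;
  for (const needle of needles) {
    score += countOccurrences(content, needle) * needle.length;
  }
  return Math.min(score, cap);
}

// Rank a path -> content map and keep the best `limit` paths with any hit.
function shortlist(files, needles, limit = 5) {
  return Object.entries(files)
    .map(([path, content]) => ({ path, score: scoreFile(content, needles) }))
    .filter((entry) => entry.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, limit)
    .map((entry) => entry.path);
}
```

Weighting by length favors distinctive needles such as a full sentence of page text over short, common ones such as a two-letter class name.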
278
+ ---
279
+
280
+ ## 5. WebSocket Message Protocol
281
+
282
+ ### Client-to-Server Messages
283
+
284
+ | Type | Fields | Purpose |
285
+ |------|--------|---------|
286
+ | `chat` | `text`, `url`, `title`, `dom`, `selected` | Ask LLM about current page |
287
+ | `save` | `changes[]`, `url`, `dom` | Map live edits to source files |
288
+ | `flowStart` | (none) | Begin recording interaction flow |
289
+ | `flowStop` | (none) | End recording |
290
+ | `flowEvent` | `ev: { kind, selector, text, x, y, url }` | Log click/scroll/input/nav event |
291
+ | `flowApply` | `id`, `instruction`, `dom`, `url` | Apply LLM changes from recorded flow |
292
+ | `flowDiscard` | `id` | Delete the recorded session |
293
+
294
+ ### Server-to-Client Messages
295
+
296
+ | Type | Fields | Purpose |
297
+ |------|--------|---------|
298
+ | `hello` | `model`, `vision` | Connection established |
299
+ | `status` | `text` | Progress indicator |
300
+ | `chatResult` | `reply`, `ops[]` | LLM response for chat |
301
+ | `saveResult` | `ok`, `summary`, `applied[]`, `failed[]`, `modelOutput` | Result of source edits |
302
+ | `flowStarted` | `id` | Recording started |
303
+ | `flowStopped` | `id`, `shots`, `events[]`, `base` | Recording completed |
304
+ | `error` | `text` | Error message |
305
+
306
+ ---
307
+
308
+ ## 6. Flow Recording (Interaction Capture)
309
+
310
+ vibedit has a "userflow" feature that records user interactions (clicks, scrolls, input, navigation) as timed events while capturing screenshots every 1.5 seconds.
311
+
312
+ **Server-side** (`src/control.js:60-122`):
313
+
314
+ - `startFlow()`: Creates session dir, takes first screenshot, sets 1.5s interval timer
315
+ - `flowEvent`: Appended to `events[]` array in memory
316
+ - `stopFlow()`: Writes `events.json`, stops timer
317
+ - `handleFlowApply()`: Sends 3 screenshots (first/middle/last) as vision input, plus event timeline, to the LLM for source mapping
318
+
319
+ **Client-side** (`overlay.js`):
320
+
321
+ - Click events recorded with CSS path, text content (80 chars), coordinates
322
+ - Scroll events debounced at 250ms
323
+ - Input events on form fields
324
+ - Playback UI shows frames with scrubber and event annotations
325
+
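A minimal sketch of the first/middle/last selection that `handleFlowApply()` is described as performing. The function name `pickKeyFrames` and the handling of recordings with fewer than three screenshots are assumptions; the analysis only says that three frames are sent.

```javascript
// Hypothetical helper: choose the first, middle, and last screenshot.
// Duplicate indices are collapsed so short recordings do not repeat a frame.
function pickKeyFrames(shots) {
  if (shots.length === 0) return [];
  const last = shots.length - 1;
  const indices = [0, Math.floor(last / 2), last];
  return [...new Set(indices)].map((i) => shots[i]);
}
```

Sending a fixed number of frames keeps the request size bounded no matter how long the recording ran at the 1.5-second sampling interval.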
326
+ ---
327
+
328
+ ## 7. Bootstrap & Runtime Flow
329
+
330
+ 1. **`bin/vibedit.js`** parses CLI args, resolves the target (package.json or HTML file)
331
+ 2. **`src/index.js` — `start()`**:
332
+ a. Starts or detects a dev server (npm/yarn/pnpm/bun `dev` script, or static file server on port 8362)
333
+ b. Launches Chromium via Playwright (headless: false, viewport: null for native size)
334
+ c. Injects `overlay.js` via `context.addInitScript()` so it runs on every page load
335
+ d. Passes the control server port to the overlay via `window.__VIBEDIT__.port`
336
+ e. Navigates to the app URL with retry logic
337
+ f. Starts the control server (WebSocket + HTTP) on port 8417
338
+ 3. **`src/control.js` — `startControlServer()`**: Handles all WebSocket messages, orchestrates LLM calls, serves session screenshots over HTTP for the playback UI
339
+ 4. **`src/overlay.js`**: Shadow-DOM panel connects to control server, provides chat UI, element inspector, CSS rule editor, flow recording, and applies AI-generated DOM ops
340
+
341
+ ---
342
+
343
+ ## 8. Key Files Summary
344
+
345
+ | File | Lines | Role |
346
+ |------|-------|------|
347
+ | `src/overlay.js` | ~710 | Client-side: shadow-DOM panel, element inspector, chat, flow recording, DOM ops application |
348
+ | `src/control.js` | ~185 | Server-side: WebSocket routing, screenshot capture, LLM orchestration, flow session management |
349
+ | `src/llm.js` | ~50 | OpenAI-compatible chat client with multimodal support and text-only fallback |
350
+ | `src/prompts.js` | ~105 | Prompt templates for chat, save, and flow modes |
351
+ | `src/files.js` | ~170 | Source file discovery, needle-based shortlisting, SEARCH/REPLACE edit block parsing and application |
352
+ | `src/index.js` | ~90 | Entry point: browser launch, overlay injection, dev server startup, shutdown handling |
353
+ | `src/devserver.js` | ~85 | Dev server detection (npm/yarn/pnpm/bun) and static file server |
354
+ | `bin/vibedit.js` | ~60 | CLI argument parsing |
355
+ | `package.json` | ~20 | Dependencies: `playwright`, `ws` |
356
+
357
+ ---
358
+
359
+ ## 9. Design Observations for shmakk Integration
360
+
361
+ 1. **Prompt rigidity is intentional**: Prompts are kept short and structured because the default model is 9B parameters. Moving to larger models or agent-based workflows (like shmakk's multi-agent system) would benefit from more descriptive prompts and structured output formats.
362
+
363
+ 2. **Screenshot + DOM dual input**: The vision LLM receives BOTH a base64 screenshot AND a text-based pruned DOM. The DOM is the primary source for operations (the LLM cannot "see" class names or selectors from the image alone), while the screenshot gives visual layout context.
364
+
365
+ 3. **DOM ops are simple but powerful**: Five operations cover most UI edits: text, HTML, styles, attributes, remove. No support for create/insert/move operations.
366
+
367
+ 4. **Source mapping is reactive**: Changes tracked in-memory during the editing session are sent to the LLM only when the user clicks "Save." The LLM then maps BEFORE/AFTER DOM blobs back to source files.
368
+
369
+ 5. **LLM has no filesystem access**: The LLM never sees the full project. It sees only up to 5 shortlisted files (determined by text-matching needle extraction). This keeps token usage low but can miss edits in unlisted files.
370
+
371
+ 6. **Model is swappable**: The LM Studio endpoint can point to any OpenAI-compatible API. The model ID defaults to qwen3.5-9b but can be any vision-capable model.
372
+
373
+ 7. **Single-page focus**: The tool is designed for web apps in a single browser page. No multi-tab, no Electron/mobile app support. Extending to Electron or mobile would require a different screenshot capture mechanism (e.g., native screenshot APIs) and potentially a different overlay injection strategy.
374
+
375
+ 8. **Flow recording is time-sampled**: Screenshots at 1.5s intervals + event log. The LLM sees 3 screenshots (first/middle/last) to understand the interaction timeline. This is a clever low-token approach for understanding user flows.
package/docs/vim.md ADDED
@@ -0,0 +1,110 @@
1
+ # shmakk Vim / vi
2
+
3
+ shmakk can wrap your normal `vi` or `vim` command inside a shmakk session and add AI editor commands without replacing your Vim setup.
4
+
5
+ ## Launch modes
6
+
7
+ ```bash
8
+ shmakk --vim vi # default: intercept vi
9
+ shmakk --vim vim # intercept vim
10
+ shmakk --vim disable # no Vim interception
11
+ ```
12
+
13
+ When enabled, shmakk creates a temporary executable shim and prepends it to `PATH` inside the shmakk shell. The shim launches your real editor, lets it load your normal vimrc/plugins/colors, then sources a generated shmakk Vim plugin.
14
+
15
+ ## Commands
16
+
17
+ | Command | Purpose |
18
+ |---------|---------|
19
+ | `:G <prompt>` | Generate code at the cursor |
20
+ | `:Tw <prompt>` | Write prose or documentation at the cursor |
21
+ | `:Cmd <command>` | Run a shell command in a scratch buffer |
22
+ | `:ShmakkSuggest` | Request a full-block code suggestion |
23
+ | `:ShmakkAccept` | Preview and accept a pending auto-suggestion |
24
+ | `:ShmakkPreview` | Preview a pending auto-suggestion |
25
+ | `:ShmakkDeny` | Clear a pending auto-suggestion |
26
+
27
+ Mappings:
28
+
29
+ | Mapping | Purpose |
30
+ |---------|---------|
31
+ | `<C-Space>` | Manual full-block suggestion with preview + Accept/Deny |
32
+ | `<leader>sa` | Accept pending auto-suggestion |
33
+ | `<leader>sp` | Preview pending auto-suggestion |
34
+ | `<leader>sd` | Deny pending auto-suggestion |
35
+
36
+ Lowercase `:g` is not overridden because it is Vim's native `:global` command. Use uppercase `:G` for shmakk generation. Normal Vim commands such as `:%s/foo/bar/g` remain native Vim behavior.
37
+
38
+ ## Suggestions
39
+
40
+ Manual suggestions are available with `<C-Space>` or `:ShmakkSuggest`. shmakk opens a scratch preview buffer and asks whether to accept or deny before inserting.
41
+
42
+ Automatic suggestions are opt-in:
43
+
44
+ ```vim
45
+ let g:shmakk_auto_suggest = 1
46
+ let g:shmakk_auto_suggest_delay_ms = 2000
47
+ let g:shmakk_auto_suggest_min_chars = 20
48
+ ```
49
+
50
+ Auto-suggest uses Vim `job_start()` when available, so the model call runs in the background. When a suggestion is ready, shmakk stores it as a pending suggestion and prints:
51
+
52
+ ```text
53
+ [shmakk] suggestion ready: :ShmakkAccept, :ShmakkPreview, or :ShmakkDeny
54
+ ```
55
+
56
+ `ShmakkAccept` always previews before inserting.
57
+
58
+ ## Fast model routing
59
+
60
+ Vim suggestions prefer a fast endpoint:
61
+
62
+ 1. `SHMAKK_VIM_SUGGEST_ENDPOINT`
63
+ 2. `SHMAKK_FAST_ENDPOINT`
64
+ 3. the endpoint registry's `"fast"` model
65
+ 4. the current/main model
66
+
67
+ Example `~/.config/shmakk/endpoints.json`:
68
+
69
+ ```json
70
+ {
71
+ "main": "pro",
72
+ "fast": "flash",
73
+ "models": {
74
+ "pro": {
75
+ "provider": "google",
76
+ "model": "gemini-pro",
77
+ "api_key": "..."
78
+ },
79
+ "flash": {
80
+ "provider": "google",
81
+ "model": "gemini-flash",
82
+ "api_key": "..."
83
+ }
84
+ }
85
+ }
86
+ ```
87
+
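The fallback order listed above can be sketched as a single lookup chain. The function name `resolveSuggestEndpoint` and its parameters are hypothetical; the environment variable names and the registry's `fast` key come from the document.

```javascript
// Hypothetical helper: pick the endpoint name used for Vim suggestions.
// `env` is an object like process.env, `registry` is the parsed
// endpoints.json, and `currentModel` is the session's main model name.
function resolveSuggestEndpoint(env, registry, currentModel) {
  return (
    env.SHMAKK_VIM_SUGGEST_ENDPOINT ||
    env.SHMAKK_FAST_ENDPOINT ||
    (registry && registry.fast) ||
    currentModel
  );
}
```

With the example registry above and neither variable set, this returns `"flash"`; setting `SHMAKK_VIM_SUGGEST_ENDPOINT` overrides both the registry and `SHMAKK_FAST_ENDPOINT`.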
88
+ ## Speed tuning
89
+
90
+ Suggestions send a trimmed context window around the cursor. Tune it with environment variables:
91
+
92
+ | Variable | Default | Purpose |
93
+ |----------|---------|---------|
94
+ | `SHMAKK_VIM_SUGGEST_BEFORE_LINES` | `80` | Lines before cursor |
95
+ | `SHMAKK_VIM_SUGGEST_AFTER_LINES` | `40` | Lines after cursor |
96
+ | `SHMAKK_VIM_SUGGEST_MAX_CHARS` | `12000` | Maximum suggestion context chars |
97
+
98
+ For lower latency, use a fast model and reduce context, for example:
99
+
100
+ ```bash
101
+ export SHMAKK_VIM_SUGGEST_ENDPOINT=flash
102
+ export SHMAKK_VIM_SUGGEST_MAX_CHARS=4000
103
+ export SHMAKK_VIM_SUGGEST_BEFORE_LINES=40
104
+ export SHMAKK_VIM_SUGGEST_AFTER_LINES=20
105
+ ```
106
+
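A minimal sketch of how the three limits might shape the context window. The variable names and defaults come from the table above; the function names, the choice to count the cursor line as part of the "before" text, and the way the character cap is split between the two halves are all assumptions.

```javascript
// Read the tuning variables, falling back to the documented defaults.
function readSuggestLimits(env) {
  const num = (name, fallback) => {
    const n = Number.parseInt(env[name] ?? "", 10);
    return Number.isFinite(n) && n > 0 ? n : fallback;
  };
  return {
    beforeLines: num("SHMAKK_VIM_SUGGEST_BEFORE_LINES", 80),
    afterLines: num("SHMAKK_VIM_SUGGEST_AFTER_LINES", 40),
    maxChars: num("SHMAKK_VIM_SUGGEST_MAX_CHARS", 12000),
  };
}

// Hypothetical helper: cut a window around a 0-based cursor line.
function suggestContext(lines, cursorLine, env = {}) {
  const { beforeLines, afterLines, maxChars } = readSuggestLimits(env);
  const start = Math.max(0, cursorLine - beforeLines);
  const end = Math.min(lines.length, cursorLine + 1 + afterLines);
  let before = lines.slice(start, cursorLine + 1).join("\n");
  let after = lines.slice(cursorLine + 1, end).join("\n");
  if (before.length + after.length > maxChars) {
    // Assumed policy: keep the text nearest the cursor on both sides.
    const half = Math.floor(maxChars / 2);
    before = before.slice(Math.max(0, before.length - half));
    after = after.slice(0, maxChars - before.length);
  }
  return { before, after };
}
```

Smaller limits shorten the prompt the model has to read, which is why the lower-latency settings above cut all three values.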
107
+ ## Command execution
108
+
109
+ `:Cmd` runs shell commands in the current Vim working directory and shows output in a scratch buffer. It removes shmakk session environment variables and blocks running `shmakk` recursively from inside `:Cmd`.
110
+
package/docs/voice.md CHANGED
@@ -52,6 +52,10 @@ shmakk --stt # mic input only, text responses
52
52
  shmakk --tts # text input, spoken responses
53
53
  ```
54
54
 
55
+ The three modes are exclusive. If multiple flags are passed, the last one wins.
56
+ Inside a running shmakk session, `enable stt`, `enable tts`, and `enable sts`
57
+ also disable the other two modes.
58
+
55
59
  Just speak. shmakk will:
56
60
  1. Detect your voice via VAD
57
61
  2. Transcribe it (shown in cyan on stderr)
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "shmakk",
3
- "version": "1.2.3",
3
+ "version": "1.2.5",
4
4
  "description": "AI-supervised terminal wrapper — command correction, tool-driven tasks, safety controls",
5
5
  "license": "MIT",
6
6
  "keywords": [
@@ -33,6 +33,7 @@
33
33
  "start": "node bin/shmakk.js",
34
34
  "dev": "node bin/shmakk.js --debug",
35
35
  "test": "node test/units.js",
36
+ "test-vision": "node test/vision-e2e.js",
36
37
  "check": "node -e \"require('./src/index'); require('./src/agent'); require('./src/orchestrator'); console.log('check-ok')\"",
37
38
  "mock-llm": "node test/mock-llm.js",
38
39
  "global:setup": "node src/global-setup.js",
@@ -49,13 +50,16 @@
49
50
  },
50
51
  "dependencies": {
51
52
  "@lydell/node-pty": "^1.2.0-beta.12",
52
- "openai": "^4.77.0",
53
- "wavefile": "^11.0.0"
53
+ "openai": "^4.104.0",
54
+ "wavefile": "^11.0.0",
55
+ "ws": "^8.21.0"
54
56
  },
55
57
  "optionalDependencies": {
56
58
  "@huggingface/transformers": "^4.2.0",
57
59
  "better-sqlite3": "^11.0.0",
58
- "kokoro-js": "^1.2.1",
59
- "playwright": "^1.40.0"
60
+ "kokoro-js": "^1.2.1"
61
+ },
62
+ "devDependencies": {
63
+ "playwright": "^1.60.0"
60
64
  }
61
65
  }
package/scripts/test-vibedit.js ADDED
@@ -0,0 +1,45 @@
1
+ #!/usr/bin/env node
2
+ // Standalone test: start vibedit overlay on any running URL or HTML file.
3
+ // Usage: node scripts/test-vibedit.js <url-or-file> [projectDir]
4
+ // Examples:
5
+ // node scripts/test-vibedit.js http://localhost:5173
6
+ // node scripts/test-vibedit.js ~/my-project/index.html
7
+ // node scripts/test-vibedit.js ./demo.html
8
+
9
+ const { startVibedit } = require('../src/vibedit');
10
+
11
+ const args = process.argv.slice(2);
12
+ const target = args[0];
13
+ const projectDir = args[1] || process.cwd();
14
+
15
+ if (!target) {
16
+ console.error('Usage: node scripts/test-vibedit.js <url-or-file> [projectDir]');
17
+ console.error(' URL: http://localhost:5173');
18
+ console.error(' File: ~/my-project/index.html');
19
+ console.error(' Relpath: ./demo.html');
20
+ process.exit(1);
21
+ }
22
+
23
+ console.log(`Starting vibedit on ${target} (project: ${projectDir})`);
24
+ console.log('A Chromium window will open with the overlay puck in the bottom-right.');
25
+ console.log('Click the puck to chat, make changes live, then click Save.');
26
+ console.log('Ctrl-C to stop.\n');
27
+
28
+ startVibedit({
29
+ projectDir,
30
+ appUrl: target,
31
+ onSpec: (spec, specPath) => {
32
+ console.log(`\n[test] Spec saved! ${spec.summary || '(no summary)'}`);
33
+ console.log(`[test] Spec file: ${specPath}`);
34
+ console.log('[test] In a real session, this would be injected into the next agent run.\n');
35
+ },
36
+ }).then(({ shutdown }) => {
37
+ process.on('SIGINT', async () => {
38
+ console.log('\nShutting down...');
39
+ await shutdown();
40
+ process.exit(0);
41
+ });
42
+ }).catch(err => {
43
+ console.error('Failed to start vibedit:', err.message);
44
+ process.exit(1);
45
+ });
package/scripts/vibedit-demo.sh ADDED
@@ -0,0 +1,52 @@
1
+ #!/bin/bash
2
+ # Start vibedit demo with a simple built-in HTML page (no external server needed)
3
+ # Usage: bash scripts/vibedit-demo.sh
4
+
5
+ set -e
6
+
7
+ DEMO_DIR="/tmp/shmakk-vibedit-demo"
8
+ mkdir -p "$DEMO_DIR"
9
+
10
+ cat > "$DEMO_DIR/index.html" << 'HTML'
11
+ <!DOCTYPE html>
12
+ <html lang="en">
13
+ <head>
14
+ <meta charset="UTF-8">
15
+ <title>Vibedit Demo</title>
16
+ <style>
17
+ * { margin: 0; padding: 0; box-sizing: border-box; }
18
+ body { font-family: system-ui, sans-serif; background: #f5f5f5; padding: 2rem; }
19
+ h1 { color: #333; margin-bottom: 1rem; }
20
+ p { color: #666; max-width: 600px; line-height: 1.6; }
21
+ .card { background: white; border-radius: 8px; padding: 1.5rem; margin-top: 1rem; box-shadow: 0 1px 3px rgba(0,0,0,0.1); }
22
+ button { background: #2563eb; color: white; border: none; padding: 0.5rem 1rem; border-radius: 4px; cursor: pointer; margin-top: 0.5rem; }
23
+ .counter { font-size: 2rem; font-weight: bold; color: #2563eb; margin: 0.5rem 0; }
24
+ </style>
25
+ </head>
26
+ <body>
27
+ <h1>Vibedit Demo Page</h1>
28
+ <p>This is a test page for vibedit. Click the puck (bottom-right corner) to open the chat overlay.</p>
29
+ <div class="card">
30
+ <h2>Counter Example</h2>
31
+ <div class="counter" id="count">0</div>
32
+ <button onclick="document.getElementById('count').textContent = parseInt(document.getElementById('count').textContent) + 1">Click me</button>
33
+ </div>
34
+ <div class="card">
35
+ <h2>Try this in the chat:</h2>
36
+ <p>"Make the counter red and bigger"</p>
37
+ <p>"Change the heading to say something else"</p>
38
+ <p>"Make the background dark"</p>
39
+ </div>
40
+ </body>
41
+ </html>
42
+ HTML
43
+
44
+ echo "Demo page: $DEMO_DIR/index.html"
45
+ echo "Starting vibedit (static server + browser)..."
46
+ echo "Ctrl-C to stop"
47
+ echo ""
48
+
49
+ node "$(dirname "$0")/test-vibedit.js" "$DEMO_DIR/index.html" "$DEMO_DIR"
50
+
51
+ rm -rf "$DEMO_DIR"
52
+ echo "Cleaned up."