shmakk 1.2.4 → 1.2.5
- package/.env.example +11 -0
- package/README.md +75 -1
- package/docs/index.html +154 -16
- package/docs/mcp.md +78 -0
- package/docs/ssh.md +82 -0
- package/docs/vibedit-analysis.md +375 -0
- package/docs/vim.md +110 -0
- package/docs/voice.md +4 -0
- package/package.json +9 -5
- package/scripts/test-vibedit.js +45 -0
- package/scripts/vibedit-demo.sh +52 -0
- package/skills/shmakk-skill-creator.md +269 -0
- package/src/_check.js +7 -0
- package/src/_check_schema.js +5 -0
- package/src/_cleanup.js +18 -0
- package/src/_fix.js +9 -0
- package/src/_test_import.js +15 -0
- package/src/agent.js +11 -4
- package/src/browser-daemon.js +209 -0
- package/src/browser.js +10 -0
- package/src/cli/browserDaemon.js +60 -0
- package/src/cli/connectBrowser.js +137 -0
- package/src/cli.js +235 -8
- package/src/completions.js +8 -0
- package/src/control.js +273 -1
- package/src/core/browserConnector.js +523 -0
- package/src/electron.js +305 -0
- package/src/endpoints.js +74 -9
- package/src/index.js +24 -1
- package/src/llm.js +501 -61
- package/src/mobile.js +307 -0
- package/src/notify.js +51 -3
- package/src/orchestrator.js +35 -1
- package/src/pty.js +11 -6
- package/src/review.js +45 -11
- package/src/self-commands.js +153 -0
- package/src/session-convert.js +508 -0
- package/src/session-search.js +31 -0
- package/src/session.js +384 -46
- package/src/skills/browserActions.ts +984 -0
- package/src/skills.js +451 -24
- package/src/system-prompt.js +31 -25
- package/src/tools.js +81 -0
- package/src/vibedit/control.js +534 -0
- package/src/vibedit/electron.js +108 -0
- package/src/vibedit/files.js +171 -0
- package/src/vibedit/index.js +298 -0
- package/src/vibedit/overlay.js +1482 -0
- package/src/vibedit/prompts.js +245 -0
- package/src/vibedit/state.js +32 -0
- package/src/vim.js +410 -0
package/docs/vibedit-analysis.md
ADDED
@@ -0,0 +1,375 @@
+# vibedit Architecture Analysis
+
+## Overview
+
+vibedit is a visual in-browser editor that lets you click on elements of a live webpage, edit them visually, and map those edits back to source code changes on disk. It runs the target app in a Playwright Chromium instance, injects a shadow-DOM overlay panel, and communicates with an LLM (default: LM Studio running qwen3.5-9b) over an OpenAI-compatible API. Screenshots are captured server-side via Playwright's `page.screenshot()` and sent as base64 JPEGs in multimodal chat requests.
+
+---
+
+## System Architecture (Data Flow)
+
+```
+[User browser page] <--WebSocket--> [Node control server] <--HTTP POST--> [LM Studio /chat/completions]
+         |                                    |                                         |
+overlay.js injected                 control.js                            llm.js (OpenAI-compatible)
+(shadow DOM panel)                  (WS message routing,                  (fetch wrapper with
+ - DOM editing                       screenshot capture,                   multimodal retry)
+ - WS client                         LLM orchestration)
+ - flow recording
+                                              |
+                                    files.js (source file matching + edit block application)
+                                    prompts.js (system/user prompt templates)
+```
+
+---
+
+## 1. Screenshot Capture
+
+**Where:** `src/control.js:30-34` — `screenshotB64()`
+
+```js
+async function screenshotB64() {
+  const buf = await page.screenshot({ type: "jpeg", quality: 60, fullPage: false });
+  return buf.toString("base64");
+}
+```
+
+Uses Playwright's `page.screenshot()` (viewport only, `fullPage: false`). Quality is 60 JPEG. Returns a base64-encoded string. The `page` reference is captured from `index.js:43` when launching the browser.
+
+The screenshots are attached to LLM requests as `images` array entries inside the user message (in OpenAI vision format): base64 data URIs with `data:image/jpeg;base64,${b64}` prefix. See `src/llm.js:17-20`.
+
+**Control flow:** Every `chat`, `save`, and `flowApply` message triggers a `screenshotB64()` call before the LLM is called, so the screenshot shows the viewport at the moment the user clicked Send, Save, or Apply. If `ctx.vision` is falsy (neither the `--vision` flag nor `VIBEDIT_VISION=1` is set), screenshot capture is skipped and the request is text-only.
+
+---
+
+## 2. LLM API Endpoint Format
+
+**Where:** `src/llm.js` — the `chat()` function
+
+### Endpoint
+
+```
+POST {ctx.lmUrl}/chat/completions
+```
+
+Default `ctx.lmUrl` = `http://127.0.0.1:1234/v1` (LM Studio default).
+
+Configurable via:
+- `--lm <url>` CLI flag
+- `LMSTUDIO_URL` environment variable
+- Falls back to the hardcoded default above
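
The configuration order above can be sketched as a small resolver. This is an illustration rather than the code in `src/llm.js`: the names `resolveLmUrl` and `chatCompletionsUrl` are hypothetical, and the precedence (flag, then environment variable, then default) is an assumption read off the order of the list.

```javascript
// Hypothetical helpers, not taken from the package source.
// Assumed precedence: --lm flag, then LMSTUDIO_URL, then the documented default.
const DEFAULT_LM_URL = "http://127.0.0.1:1234/v1";

function resolveLmUrl(argv, env) {
  const i = argv.indexOf("--lm");
  const fromFlag = i !== -1 ? argv[i + 1] : undefined;
  const base = fromFlag || env.LMSTUDIO_URL || DEFAULT_LM_URL;
  // Drop trailing slashes so appending the path never produces "//".
  return base.replace(/\/+$/, "");
}

function chatCompletionsUrl(argv, env) {
  return `${resolveLmUrl(argv, env)}/chat/completions`;
}
```

Called as `chatCompletionsUrl(process.argv.slice(2), process.env)`, this would yield the URL that the `POST` above targets.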
+
+### Request Body Shape
+
+```json
+{
+  "model": "qwen/qwen3.5-9b",
+  "temperature": 0.2,
+  "max_tokens": 2048,
+  "messages": [
+    {
+      "role": "system",
+      "content": "You are a frontend editing assistant..."
+    },
+    {
+      "role": "user",
+      "content": [
+        { "type": "text", "text": "Page URL: http://..." },
+        {
+          "type": "image_url",
+          "image_url": { "url": "data:image/jpeg;base64,<base64>" }
+        }
+      ]
+    }
+  ]
+}
+```
+
+### Key Parameters
+
+| Param | Default | Override |
+|-------|---------|----------|
+| `model` | `qwen/qwen3.5-9b` | `--model <id>` or `VIBEDIT_MODEL` env |
+| `temperature` | `0.2` | Hardcoded in `chat()` call site |
+| `max_tokens` | `2048` (chat), `4096` (save/flowApply) | Passed via `opts.maxTokens` |
+| `vision` | `false` | `--vision` flag or `VIBEDIT_VISION=1` env |
+
+### Multimodal Handling
+
+Messages carry an `images` array of base64 JPEG strings. If the model rejects multimodal input (non-2xx response), `llm.js` retries once with text-only (`src/llm.js:40-46`).
+
+```js
+try { return await call(hasImages); }
+catch (err) {
+  if (hasImages) {
+    console.warn("[vibedit] multimodal request failed, retrying text-only:", err.message);
+    return await call(false);
+  }
+  throw err;
+}
+```
+
+### Response Parsing
+
+The raw response is extracted as: `data?.choices?.[0]?.message?.content ?? ""`
+
+This string is then parsed differently depending on the message type.
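
A minimal sketch of both ends of the exchange, assuming the request-body shape shown earlier. `buildUserMessage` and `extractContent` are hypothetical names, and keeping `content` as a plain string when there are no images is an assumption about the text-only path, not something the analysis states.

```javascript
// Hypothetical helper: combine prompt text and base64 JPEGs into one
// OpenAI-style user message. Without images the content stays a string.
function buildUserMessage(text, imagesB64 = []) {
  if (imagesB64.length === 0) return { role: "user", content: text };
  return {
    role: "user",
    content: [
      { type: "text", text },
      ...imagesB64.map((b64) => ({
        type: "image_url",
        image_url: { url: `data:image/jpeg;base64,${b64}` },
      })),
    ],
  };
}

// Same expression as quoted above; returns "" for a malformed body.
function extractContent(data) {
  return data?.choices?.[0]?.message?.content ?? "";
}
```

The optional chaining matters on the fallback path: an error body without `choices` yields an empty string instead of throwing.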
+
+---
+
+## 3. Prompt Design (Three Distinct Prompt Sets)
+
+### 3.1 Chat Prompt (live DOM editing)
+
+**System prompt** (`src/prompts.js:3-21` — `chatSystem()`):
+
+The LLM is told it is a "frontend editing assistant embedded in a live web page." It receives a pruned DOM and a user request, and must respond with ONLY a JSON object in this shape:
+
+```json
+{
+  "reply": "one or two short sentences for the user",
+  "ops": [
+    { "selector": "css selector", "action": "setText", "value": "new text" },
+    { "selector": "css selector", "action": "setStyle", "style": { "color": "#ff0000" } },
+    { "selector": "css selector", "action": "setHTML", "value": "<b>html</b>" },
+    { "selector": "css selector", "action": "setAttr", "name": "src", "value": "..." },
+    { "selector": "css selector", "action": "remove" }
+  ]
+}
+```
+
+Rules encoded in the system prompt:
+- Prefer IDs, then stable class names
+- Return `"ops": []` if no visual change requested
+- Never invent selectors; say so in `reply` if unsure
+
+**User prompt** (`src/prompts.js:23-28` — `chatUser()`):
+
+```
+Page URL: {msg.url}
+Page title: {msg.title}
+Currently selected element: {msg.selected} (if any)
+Pruned DOM: {msg.dom} (truncated to 9000 chars)
+
+User request: {msg.text}
+```
+
+The DOM is the pruned outerHTML of `<body>` with `<script>`, `<style>`, `<noscript>`, `<svg>`, metadata stripped, and data- attributes removed or truncated (see `overlay.js:260-273` — `prunedDOM()`).
+
+**Response parsing** (`src/control.js:48-56` — inside `handleChat()`):
+
+```js
+let parsed;
+try {
+  parsed = JSON.parse(stripFences(raw));
+} catch {
+  parsed = { reply: raw, ops: [] };
+}
+send(ws, { type: "chatResult", reply: parsed.reply || "", ops: Array.isArray(parsed.ops) ? parsed.ops : [] });
+```
+
+`stripFences()` removes leading/trailing markdown code fences (```json ... ```). Graceful fallback: if JSON parse fails, the raw text becomes `reply` and `ops` is empty.
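
The body of `stripFences()` is not shown in the analysis, so the following is a hypothetical reimplementation of the described behavior, not the package's code. It also adds one guard that the quoted snippet lacks: a reply that parses to a non-object (for example `null`) is treated like a parse failure.

```javascript
// Three backticks, built indirectly so this sample contains no literal fence.
const FENCE = "`".repeat(3);
const OPEN_RE = new RegExp("^" + FENCE + "[a-zA-Z0-9]*[ \\t]*\\r?\\n");
const CLOSE_RE = new RegExp("\\r?\\n" + FENCE + "[ \\t]*$");

// Hypothetical: drop one leading fence line and one trailing fence line.
function stripFences(raw) {
  return String(raw).trim().replace(OPEN_RE, "").replace(CLOSE_RE, "").trim();
}

function parseChatReply(raw) {
  let parsed;
  try {
    parsed = JSON.parse(stripFences(raw));
  } catch {
    parsed = null;
  }
  // Added guard (assumption): only plain objects count as a valid reply.
  if (parsed === null || typeof parsed !== "object" || Array.isArray(parsed)) {
    parsed = { reply: String(raw), ops: [] };
  }
  return {
    reply: parsed.reply || "",
    ops: Array.isArray(parsed.ops) ? parsed.ops : [],
  };
}
```

With this shape the overlay always receives a string `reply` and an array `ops`, whatever the model returned.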
+
+### 3.2 Save Prompt (source code mapping)
+
+**System prompt** (`src/prompts.js:30-53` — `saveSystem()`):
+
+The LLM is told to map live DOM edits back to source code. It outputs edit blocks in SEARCH/REPLACE format:
+
+```
+FILE: relative/path/from/project/root
+<<<<<<< SEARCH
+exact lines copied verbatim from the provided file content
+=======
+the replacement lines
+>>>>>>> REPLACE
+```
+
+Rules:
+- SEARCH must be character-for-character from the provided file content
+- One block per distinct change; multiple blocks per file are fine
+- JSX/Vue/Svelte: edit component source, not rendered HTML
+- Inline styles should become CSS rule changes
+- If a change cannot be located, skip it (do not guess paths)
+
+**User prompt** (`src/prompts.js:55-79` — `saveUser()`):
+
+For each tracked change, shows:
+- DOM changes: `CHANGE N (selector: ...)` with BEFORE/AFTER outerHTML
+- CSS changes: `CHANGE N (CSS rule for selector ...)` with existing rules and new declarations
+
+Plus candidate source files (up to 5, shortlisted by `shortlistFiles()`).
+
+### 3.3 Flow Prompt (user interaction recording)
+
+**System prompt:** Same `saveSystem()` as Save.
+
+**User prompt** (`src/prompts.js:81-105` — `flowUser()`):
+
+Shows a timestamped event log:
+```
+[1.2s] click .header "Welcome"
+[3.5s] scroll to y=450
+[5.1s] typed in #email
+```
+
+Plus the pruned DOM at end of recording, the user's instruction, and candidate source files. Three screenshots (first, middle, last) are included as images when vision is enabled.
+
+---
+
+## 4. Response Parsing and Code Modification
+
+### Chat Result (client-side)
+
+The overlay receives `{ type: "chatResult", reply, ops }` via WebSocket. The `ops` array is processed by `applyOps()` in `overlay.js:331-349`:
+
+```js
+function applyOps(ops) {
+  for (const op of ops) {
+    let el = document.querySelector(op.selector);
+    if (!el || isOurs(el)) continue;
+    trackBefore(el); // record original state
+    if (op.action === "setText") el.textContent = op.value ?? "";
+    else if (op.action === "setHTML") el.innerHTML = op.value ?? "";
+    else if (op.action === "setStyle" && op.style)
+      for (const [k, v] of Object.entries(op.style))
+        el.style.setProperty(toKebab(k), v);
+    else if (op.action === "setAttr") el.setAttribute(op.name, op.value ?? "");
+    else if (op.action === "remove") { el.remove(); }
+  }
+}
+```
+
+Supported DOM actions: `setText`, `setHTML`, `setStyle`, `setAttr`, `remove`.
+
+Each modification is tracked in the `changes` Map (keyed by CSS path of the element) so it can be reverted or saved to source.
+
+### Save Result (server-side)
+
+**Edit block parsing** (`src/files.js:100-128` — `applyEditBlocks()`):
+
+The LLM's raw text output is parsed with regex:
+
+```js
+const BLOCK_RE = /FILE:\s*(.+?)\s*\n<{5,}\s*SEARCH\s*\n([\s\S]*?)\n={5,}\s*\n([\s\S]*?)\n>{5,}\s*REPLACE/g;
+```
+
+Two matching strategies:
+
+1. **Exact match** (`exactReplace`): simple `String.indexOf()` check. Fast path.
+2. **Fuzzy match** (`fuzzyReplace`): line-trimmed match that tolerates indentation drift. Splits both search and content into lines, trims whitespace, tries to find a contiguous match of the trimmed lines.
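
The two strategies can be sketched as follows. Only `BLOCK_RE` is copied from the analysis. The function bodies are hypothetical reconstructions from the one-line descriptions, and `parseEditBlocks` and `applyBlock` are illustrative names that do not appear in the source.

```javascript
// Copied from the analysis above.
const BLOCK_RE = /FILE:\s*(.+?)\s*\n<{5,}\s*SEARCH\s*\n([\s\S]*?)\n={5,}\s*\n([\s\S]*?)\n>{5,}\s*REPLACE/g;

// Hypothetical: turn raw model output into { file, search, replace } records.
function parseEditBlocks(text) {
  return [...text.matchAll(BLOCK_RE)].map(([, file, search, replace]) => ({ file, search, replace }));
}

// Fast path: literal substring match, first occurrence only.
function exactReplace(content, search, replace) {
  const at = content.indexOf(search);
  if (at === -1) return null;
  return content.slice(0, at) + replace + content.slice(at + search.length);
}

// Fallback: compare trimmed lines so indentation drift does not block a match.
function fuzzyReplace(content, search, replace) {
  const lines = content.split("\n");
  const want = search.split("\n").map((line) => line.trim());
  for (let i = 0; i + want.length <= lines.length; i++) {
    if (want.every((w, j) => lines[i + j].trim() === w)) {
      return [...lines.slice(0, i), replace, ...lines.slice(i + want.length)].join("\n");
    }
  }
  return null;
}

// Try the exact path first; null means the block could not be located.
function applyBlock(content, block) {
  return exactReplace(content, block.search, block.replace)
    ?? fuzzyReplace(content, block.search, block.replace);
}
```

In this sketch the fuzzy path inserts the replacement text verbatim, so it takes the model's indentation rather than the file's. Whether the real `fuzzyReplace` re-indents is not stated.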
+
+Vibedit project-local artifacts are stored under `.shmakk/state/`; generated
+specs use `.shmakk/state/vibedit-specs/` and recorded flow media uses
+`.shmakk/state/vibedit-sessions/`.
+
+### File Shortlisting
+
+**Where:** `src/files.js:56-97` — `shortlistFiles()`
+
+Before asking the LLM to generate edit blocks, vibedit determines which source files are relevant to the user's changes:
+
+1. Walk the project directory (excluding `node_modules`, `.git`, `dist`, `build`, etc.)
+2. Collect all files with source extensions (`.html`, `.js`, `.jsx`, `.ts`, `.tsx`, `.vue`, `.svelte`, `.astro`, `.css`, `.scss`, `.less`, `.mjs`, `.cjs`)
+3. Extract "needles" from the change data: text fragments (6-80 chars), class names, IDs, CSS property names, selector parts
+4. Score each file by needle occurrence count (weighted by needle length, capped at 5000)
+5. Return top 5 files with content trimmed to fit within 16,000 chars total budget; for large files, show windows around hit lines
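
Steps 4 and 5 can be illustrated with a rough scoring sketch. The exact formula is not given in the analysis, so this is an assumption throughout: occurrences are multiplied by needle length, and the 5000 cap is applied to each file's total score, which is one plausible reading of "capped at 5000". All names are hypothetical, and the 16,000-character content budget is left out.

```javascript
// Count non-overlapping occurrences of needle in haystack.
function countOccurrences(haystack, needle) {
  if (!needle) return 0;
  let count = 0;
  let at = haystack.indexOf(needle);
  while (at !== -1) {
    count++;
    at = haystack.indexOf(needle, at + needle.length);
  }
  return count;
}

const SCORE_CAP = 5000; // assumption: the cap applies to a file's total score

function scoreFile(content, needles) {
  let score = 0;
  for (const needle of needles) {
    score += countOccurrences(content, needle) * needle.length;
  }
  return Math.min(score, SCORE_CAP);
}

// files: [{ path, content }]. Returns the best-scoring paths, dropping zero scores.
function shortlist(files, needles, limit = 5) {
  return files
    .map((file) => ({ path: file.path, score: scoreFile(file.content, needles) }))
    .filter((entry) => entry.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, limit)
    .map((entry) => entry.path);
}
```

Weighting by length favors long, specific needles such as a sentence fragment over short ones such as a common class name.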
+
+---
+
+## 5. WebSocket Message Protocol
+
+### Client-to-Server Messages
+
+| Type | Fields | Purpose |
+|------|--------|---------|
+| `chat` | `text`, `url`, `title`, `dom`, `selected` | Ask LLM about current page |
+| `save` | `changes[]`, `url`, `dom` | Map live edits to source files |
+| `flowStart` | (none) | Begin recording interaction flow |
+| `flowStop` | (none) | End recording |
+| `flowEvent` | `ev: { kind, selector, text, x, y, url }` | Log click/scroll/input/nav event |
+| `flowApply` | `id`, `instruction`, `dom`, `url` | Apply LLM changes from recorded flow |
+| `flowDiscard` | `id` | Delete the recorded session |
+
+### Server-to-Client Messages
+
+| Type | Fields | Purpose |
+|------|--------|---------|
+| `hello` | `model`, `vision` | Connection established |
+| `status` | `text` | Progress indicator |
+| `chatResult` | `reply`, `ops[]` | LLM response for chat |
+| `saveResult` | `ok`, `summary`, `applied[]`, `failed[]`, `modelOutput` | Result of source edits |
+| `flowStarted` | `id` | Recording started |
+| `flowStopped` | `id`, `shots`, `events[]`, `base` | Recording completed |
+| `error` | `text` | Error message |
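
The tables above can be read as a small validation step in front of the handlers. The sketch below is hypothetical and not taken from `src/control.js`. The message types come from the client-to-server table, but which fields count as required is an assumption, since the table does not separate mandatory from optional fields.

```javascript
// Types from the client-to-server table. The required-field lists are assumed.
const REQUIRED_FIELDS = {
  chat: ["text"],
  save: ["changes"],
  flowStart: [],
  flowStop: [],
  flowEvent: ["ev"],
  flowApply: ["id", "instruction"],
  flowDiscard: ["id"],
};

// Hypothetical: parse one WebSocket text frame and validate its shape.
// On failure the `reply` is a server-to-client `error` message.
function parseClientMessage(rawText) {
  const fail = (text) => ({ ok: false, reply: { type: "error", text } });
  let msg;
  try {
    msg = JSON.parse(rawText);
  } catch {
    return fail("invalid JSON");
  }
  if (msg === null || typeof msg !== "object") return fail("message must be an object");
  if (!Object.prototype.hasOwnProperty.call(REQUIRED_FIELDS, msg.type)) {
    return fail(`unknown message type: ${String(msg.type)}`);
  }
  const missing = REQUIRED_FIELDS[msg.type].filter(
    (key) => !Object.prototype.hasOwnProperty.call(msg, key)
  );
  if (missing.length > 0) return fail(`${msg.type} is missing: ${missing.join(", ")}`);
  return { ok: true, msg };
}
```

Returning an `error` message instead of throwing keeps a malformed frame from taking down the control server.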
+
+---
+
+## 6. Flow Recording (Interaction Capture)
+
+vibedit has a "userflow" feature that records user interactions (clicks, scrolls, input, navigation) as timed events while capturing screenshots every 1.5 seconds.
+
+**Server-side** (`src/control.js:60-122`):
+
+- `startFlow()`: Creates session dir, takes first screenshot, sets 1.5s interval timer
+- `flowEvent`: Appended to `events[]` array in memory
+- `stopFlow()`: Writes `events.json`, stops timer
+- `handleFlowApply()`: Sends 3 screenshots (first/middle/last) as vision input, plus event timeline, to the LLM for source mapping
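
The first/middle/last selection in `handleFlowApply()` could look roughly like this. `pickKeyFrames` is a hypothetical name, and the handling of short recordings (three frames or fewer are sent as they are) is an assumption.

```javascript
// Hypothetical helper: choose up to three representative frames from a
// recording's screenshot list (first, middle, last).
function pickKeyFrames(shots) {
  if (shots.length <= 3) return [...shots];
  const mid = Math.floor((shots.length - 1) / 2);
  return [shots[0], shots[mid], shots[shots.length - 1]];
}
```

At one screenshot every 1.5 seconds, a 30-second recording has about 20 frames, so sending three keeps the image payload constant however long the recording runs.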
+
+**Client-side** (`overlay.js`):
+
+- Click events recorded with CSS path, text content (80 chars), coordinates
+- Scroll events debounced at 250ms
+- Input events on form fields
+- Playback UI shows frames with scrubber and event annotations
+
+---
+
+## 7. Bootstrap & Runtime Flow
+
+1. **`bin/vibedit.js`** parses CLI args, resolves the target (package.json or HTML file)
+2. **`src/index.js` — `start()`**:
+   a. Starts or detects a dev server (npm/yarn/pnpm/bun `dev` script, or static file server on port 8362)
+   b. Launches Chromium via Playwright (headless: false, viewport: null for native size)
+   c. Injects `overlay.js` via `context.addInitScript()` so it runs on every page load
+   d. Passes the control server port to the overlay via `window.__VIBEDIT__.port`
+   e. Navigates to the app URL with retry logic
+   f. Starts the control server (WebSocket + HTTP) on port 8417
+3. **`src/control.js` — `startControlServer()`**: Handles all WebSocket messages, orchestrates LLM calls, serves session screenshots over HTTP for the playback UI
+4. **`src/overlay.js`**: Shadow-DOM panel connects to control server, provides chat UI, element inspector, CSS rule editor, flow recording, and applies AI-generated DOM ops
+
+---
+
+## 8. Key Files Summary
+
+| File | Lines | Role |
+|------|-------|------|
+| `src/overlay.js` | ~710 | Client-side: shadow-DOM panel, element inspector, chat, flow recording, DOM ops application |
+| `src/control.js` | ~185 | Server-side: WebSocket routing, screenshot capture, LLM orchestration, flow session management |
+| `src/llm.js` | ~50 | OpenAI-compatible chat client with multimodal support and text-only fallback |
+| `src/prompts.js` | ~105 | Prompt templates for chat, save, and flow modes |
+| `src/files.js` | ~170 | Source file discovery, needle-based shortlisting, SEARCH/REPLACE edit block parsing and application |
+| `src/index.js` | ~90 | Entry point: browser launch, overlay injection, dev server startup, shutdown handling |
+| `src/devserver.js` | ~85 | Dev server detection (npm/yarn/pnpm/bun) and static file server |
+| `bin/vibedit.js` | ~60 | CLI argument parsing |
+| `package.json` | ~20 | Dependencies: `playwright`, `ws` |
+
+---
+
+## 9. Design Observations for shmakk Integration
+
+1. **Prompt rigidity is intentional**: Prompts are kept short and structured because the default model is 9B parameters. Moving to larger models or agent-based workflows (like shmakk's multi-agent system) would benefit from more descriptive prompts and structured output formats.
+
+2. **Screenshot + DOM dual input**: The vision LLM receives BOTH a base64 screenshot AND a text-based pruned DOM. The DOM is the primary source for operations (the LLM cannot "see" class names or selectors from the image alone), while the screenshot gives visual layout context.
+
+3. **DOM ops are simple but powerful**: Five operations cover most UI edits: text, HTML, styles, attributes, remove. No support for create/insert/move operations.
+
+4. **Source mapping is reactive**: Changes tracked in-memory during the editing session are sent to the LLM only when the user clicks "Save." The LLM then maps BEFORE/AFTER DOM blobs back to source files.
+
+5. **LLM has no filesystem access**: The LLM never sees the full project. It sees only up to 5 shortlisted files (determined by text-matching needle extraction). This keeps token usage low but can miss edits in unlisted files.
+
+6. **Model is swappable**: The LM Studio endpoint can point to any OpenAI-compatible API. The model ID defaults to qwen3.5-9b but can be any vision-capable model.
+
+7. **Single-page focus**: The tool is designed for web apps in a single browser page. No multi-tab, no Electron/mobile app support. Extending to Electron or mobile would require a different screenshot capture mechanism (e.g., native screenshot APIs) and potentially a different overlay injection strategy.
+
+8. **Flow recording is time-sampled**: Screenshots at 1.5s intervals + event log. The LLM sees 3 screenshots (first/middle/last) to understand the interaction timeline. This is a clever low-token approach for understanding user flows.
package/docs/vim.md
ADDED
@@ -0,0 +1,110 @@
+# shmakk Vim / vi
+
+shmakk can wrap your normal `vi` or `vim` command inside a shmakk session and add AI editor commands without replacing your Vim setup.
+
+## Launch modes
+
+```bash
+shmakk --vim vi # default: intercept vi
+shmakk --vim vim # intercept vim
+shmakk --vim disable # no Vim interception
+```
+
+When enabled, shmakk creates a temporary executable shim and prepends it to `PATH` inside the shmakk shell. The shim launches your real editor, lets it load your normal vimrc/plugins/colors, then sources a generated shmakk Vim plugin.
+
+## Commands
+
+| Command | Purpose |
+|---------|---------|
+| `:G <prompt>` | Generate code at the cursor |
+| `:Tw <prompt>` | Write prose or documentation at the cursor |
+| `:Cmd <command>` | Run a shell command in a scratch buffer |
+| `:ShmakkSuggest` | Request a full-block code suggestion |
+| `:ShmakkAccept` | Preview and accept a pending auto-suggestion |
+| `:ShmakkPreview` | Preview a pending auto-suggestion |
+| `:ShmakkDeny` | Clear a pending auto-suggestion |
+
+Mappings:
+
+| Mapping | Purpose |
+|---------|---------|
+| `<C-Space>` | Manual full-block suggestion with preview + Accept/Deny |
+| `<leader>sa` | Accept pending auto-suggestion |
+| `<leader>sp` | Preview pending auto-suggestion |
+| `<leader>sd` | Deny pending auto-suggestion |
+
+Lowercase `:g` is not overridden because it is Vim's native `:global` command. Use uppercase `:G` for shmakk generation. Normal Vim commands such as `:%s/foo/bar/g` remain native Vim behavior.
+
+## Suggestions
+
+Manual suggestions are available with `<C-Space>` or `:ShmakkSuggest`. shmakk opens a scratch preview buffer and asks whether to accept or deny before inserting.
+
+Automatic suggestions are opt-in:
+
+```vim
+let g:shmakk_auto_suggest = 1
+let g:shmakk_auto_suggest_delay_ms = 2000
+let g:shmakk_auto_suggest_min_chars = 20
+```
+
+Auto-suggest uses Vim `job_start()` when available, so the model call runs in the background. When a suggestion is ready, shmakk stores it as a pending suggestion and prints:
+
+```text
+[shmakk] suggestion ready: :ShmakkAccept, :ShmakkPreview, or :ShmakkDeny
+```
+
+`ShmakkAccept` always previews before inserting.
+
+## Fast model routing
+
+Vim suggestions prefer a fast endpoint:
+
+1. `SHMAKK_VIM_SUGGEST_ENDPOINT`
+2. `SHMAKK_FAST_ENDPOINT`
+3. the endpoint registry's `"fast"` model
+4. the current/main model
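
The preference order above amounts to a first-match lookup. The sketch below is hypothetical: `resolveSuggestEndpoint` is not a name from the package, and it assumes the registry is an object with top-level `fast` and `main` keys, matching the `endpoints.json` example in this file.

```javascript
// Hypothetical resolver for the four-step preference order.
// env is an environment map; registry mirrors endpoints.json ({ main, fast, models }).
function resolveSuggestEndpoint(env, registry) {
  return (
    env.SHMAKK_VIM_SUGGEST_ENDPOINT ||
    env.SHMAKK_FAST_ENDPOINT ||
    (registry && registry.fast) ||
    (registry && registry.main) ||
    null
  );
}
```

Empty environment variables fall through to the next candidate because `||` treats `""` as unset. Whether shmakk does the same is not documented.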
+
+Example `~/.config/shmakk/endpoints.json`:
+
+```json
+{
+  "main": "pro",
+  "fast": "flash",
+  "models": {
+    "pro": {
+      "provider": "google",
+      "model": "gemini-pro",
+      "api_key": "..."
+    },
+    "flash": {
+      "provider": "google",
+      "model": "gemini-flash",
+      "api_key": "..."
+    }
+  }
+}
+```
+
+## Speed tuning
+
+Suggestions send a trimmed context window around the cursor. Tune it with environment variables:
+
+| Variable | Default | Purpose |
+|----------|---------|---------|
+| `SHMAKK_VIM_SUGGEST_BEFORE_LINES` | `80` | Lines before cursor |
+| `SHMAKK_VIM_SUGGEST_AFTER_LINES` | `40` | Lines after cursor |
+| `SHMAKK_VIM_SUGGEST_MAX_CHARS` | `12000` | Maximum suggestion context chars |
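
A rough sketch of how the three limits might combine into one window. The variable names and defaults come from the table. Everything else is an assumption, including the names `readLimit` and `suggestContext` and the trimming strategy: the documentation does not say how the character cap is enforced, so this version drops lines from whichever edge is further from the cursor.

```javascript
// Read a non-negative integer setting, falling back to the documented default.
function readLimit(env, name, fallback) {
  const n = Number.parseInt(env[name] ?? "", 10);
  return Number.isFinite(n) && n >= 0 ? n : fallback;
}

// Hypothetical: lines is the buffer as an array, cursorIndex is zero-based.
function suggestContext(lines, cursorIndex, env = {}) {
  const before = readLimit(env, "SHMAKK_VIM_SUGGEST_BEFORE_LINES", 80);
  const after = readLimit(env, "SHMAKK_VIM_SUGGEST_AFTER_LINES", 40);
  const maxChars = readLimit(env, "SHMAKK_VIM_SUGGEST_MAX_CHARS", 12000);

  let start = Math.max(0, cursorIndex - before);
  let end = Math.min(lines.length, cursorIndex + after + 1); // exclusive
  const size = () => lines.slice(start, end).join("\n").length;

  // Assumed strategy: shrink from the side further from the cursor until it fits.
  while (size() > maxChars && end - start > 1) {
    if (cursorIndex - start >= end - 1 - cursorIndex) start++;
    else end--;
  }
  return { start, end, text: lines.slice(start, end).join("\n") };
}
```

With the defaults the window covers at most 121 lines (80 before, the cursor line, 40 after), so the character cap only matters for files with long lines.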
+
+For lower latency, use a fast model and reduce context, for example:
+
+```bash
+export SHMAKK_VIM_SUGGEST_ENDPOINT=flash
+export SHMAKK_VIM_SUGGEST_MAX_CHARS=4000
+export SHMAKK_VIM_SUGGEST_BEFORE_LINES=40
+export SHMAKK_VIM_SUGGEST_AFTER_LINES=20
+```
+
+## Command execution
+
+`:Cmd` runs shell commands in the current Vim working directory and shows output in a scratch buffer. It removes shmakk session environment variables and blocks running `shmakk` recursively from inside `:Cmd`.
+
package/docs/voice.md
CHANGED
@@ -52,6 +52,10 @@ shmakk --stt # mic input only, text responses
 shmakk --tts # text input, spoken responses
 ```
 
+The three modes are exclusive. If multiple flags are passed, the last one wins.
+Inside a running shmakk session, `enable stt`, `enable tts`, and `enable sts`
+also disable the other two modes.
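
The "last one wins" rule can be sketched in a few lines. This is an illustration, not shmakk's argument parser: `resolveVoiceMode` is a hypothetical name, and the `--sts` flag is assumed by analogy with `--stt`, `--tts`, and the in-session `enable sts` command.

```javascript
// Hypothetical flag table. --sts is an assumption, see the note above.
const VOICE_FLAGS = new Map([
  ["--sts", "sts"],
  ["--stt", "stt"],
  ["--tts", "tts"],
]);

// Later flags overwrite earlier ones, so at most one mode is active.
function resolveVoiceMode(argv) {
  let mode = null;
  for (const arg of argv) {
    if (VOICE_FLAGS.has(arg)) mode = VOICE_FLAGS.get(arg);
  }
  return mode;
}
```

Storing a single `mode` value rather than three booleans makes the exclusivity structural: enabling one mode cannot leave another switched on.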
+
 Just speak. shmakk will:
 1. Detect your voice via VAD
 2. Transcribe it (shown in cyan on stderr)
package/package.json
CHANGED
@@ -1,6 +1,6 @@
 {
   "name": "shmakk",
-  "version": "1.2.
+  "version": "1.2.5",
   "description": "AI-supervised terminal wrapper — command correction, tool-driven tasks, safety controls",
   "license": "MIT",
   "keywords": [
@@ -33,6 +33,7 @@
     "start": "node bin/shmakk.js",
     "dev": "node bin/shmakk.js --debug",
     "test": "node test/units.js",
+    "test-vision": "node test/vision-e2e.js",
     "check": "node -e \"require('./src/index'); require('./src/agent'); require('./src/orchestrator'); console.log('check-ok')\"",
     "mock-llm": "node test/mock-llm.js",
     "global:setup": "node src/global-setup.js",
@@ -49,13 +50,16 @@
   },
   "dependencies": {
     "@lydell/node-pty": "^1.2.0-beta.12",
-    "openai": "^4.
-    "wavefile": "^11.0.0"
+    "openai": "^4.104.0",
+    "wavefile": "^11.0.0",
+    "ws": "^8.21.0"
   },
   "optionalDependencies": {
     "@huggingface/transformers": "^4.2.0",
     "better-sqlite3": "^11.0.0",
-    "kokoro-js": "^1.2.1"
-
+    "kokoro-js": "^1.2.1"
+  },
+  "devDependencies": {
+    "playwright": "^1.60.0"
   }
 }
package/scripts/test-vibedit.js
ADDED
@@ -0,0 +1,45 @@
+#!/usr/bin/env node
+// Standalone test: start vibedit overlay on any running URL or HTML file.
+// Usage: node scripts/test-vibedit.js <url-or-file> [projectDir]
+// Examples:
+// node scripts/test-vibedit.js http://localhost:5173
+// node scripts/test-vibedit.js ~/my-project/index.html
+// node scripts/test-vibedit.js ./demo.html
+
+const { startVibedit } = require('../src/vibedit');
+
+const args = process.argv.slice(2);
+const target = args[0];
+const projectDir = args[1] || process.cwd();
+
+if (!target) {
+  console.error('Usage: node scripts/test-vibedit.js <url-or-file> [projectDir]');
+  console.error(' URL: http://localhost:5173');
+  console.error(' File: ~/my-project/index.html');
+  console.error(' Relpath: ./demo.html');
+  process.exit(1);
+}
+
+console.log(`Starting vibedit on ${target} (project: ${projectDir})`);
+console.log('A Chromium window will open with the overlay puck in the bottom-right.');
+console.log('Click the puck to chat, make changes live, then click Save.');
+console.log('Ctrl-C to stop.\n');
+
+startVibedit({
+  projectDir,
+  appUrl: target,
+  onSpec: (spec, specPath) => {
+    console.log(`\n[test] Spec saved! ${spec.summary || '(no summary)'}`);
+    console.log(`[test] Spec file: ${specPath}`);
+    console.log('[test] In a real session, this would be injected into the next agent run.\n');
+  },
+}).then(({ shutdown }) => {
+  process.on('SIGINT', async () => {
+    console.log('\nShutting down...');
+    await shutdown();
+    process.exit(0);
+  });
+}).catch(err => {
+  console.error('Failed to start vibedit:', err.message);
+  process.exit(1);
+});
package/scripts/vibedit-demo.sh
ADDED
@@ -0,0 +1,52 @@
+#!/bin/bash
+# Start vibedit demo with a simple built-in HTML page (no external server needed)
+# Usage: bash scripts/vibedit-demo.sh
+
+set -e
+
+DEMO_DIR="/tmp/shmakk-vibedit-demo"
+mkdir -p "$DEMO_DIR"
+
+cat > "$DEMO_DIR/index.html" << 'HTML'
+<!DOCTYPE html>
+<html lang="en">
+<head>
+  <meta charset="UTF-8">
+  <title>Vibedit Demo</title>
+  <style>
+    * { margin: 0; padding: 0; box-sizing: border-box; }
+    body { font-family: system-ui, sans-serif; background: #f5f5f5; padding: 2rem; }
+    h1 { color: #333; margin-bottom: 1rem; }
+    p { color: #666; max-width: 600px; line-height: 1.6; }
+    .card { background: white; border-radius: 8px; padding: 1.5rem; margin-top: 1rem; box-shadow: 0 1px 3px rgba(0,0,0,0.1); }
+    button { background: #2563eb; color: white; border: none; padding: 0.5rem 1rem; border-radius: 4px; cursor: pointer; margin-top: 0.5rem; }
+    .counter { font-size: 2rem; font-weight: bold; color: #2563eb; margin: 0.5rem 0; }
+  </style>
+</head>
+<body>
+  <h1>Vibedit Demo Page</h1>
+  <p>This is a test page for vibedit. Click the puck (bottom-right corner) to open the chat overlay.</p>
+  <div class="card">
+    <h2>Counter Example</h2>
+    <div class="counter" id="count">0</div>
+    <button onclick="document.getElementById('count').textContent = parseInt(document.getElementById('count').textContent) + 1">Click me</button>
+  </div>
+  <div class="card">
+    <h2>Try this in the chat:</h2>
+    <p>"Make the counter red and bigger"</p>
+    <p>"Change the heading to say something else"</p>
+    <p>"Make the background dark"</p>
+  </div>
+</body>
+</html>
+HTML
+
+echo "Demo page: $DEMO_DIR/index.html"
+echo "Starting vibedit (static server + browser)..."
+echo "Ctrl-C to stop"
+echo ""
+
+node "$(dirname "$0")/test-vibedit.js" "$DEMO_DIR/index.html" "$DEMO_DIR"
+
+rm -rf "$DEMO_DIR"
+echo "Cleaned up."