@akshayram1/omnibrowser-agent 0.2.8 → 0.2.26
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +11 -0
- package/dist/background.js +24 -5
- package/dist/background.js.map +2 -2
- package/dist/content.js +120 -4
- package/dist/content.js.map +3 -3
- package/dist/lib.js +264 -58
- package/dist/lib.js.map +3 -3
- package/dist/popup.html +7 -1
- package/dist/popup.js +19 -1
- package/dist/popup.js.map +2 -2
- package/dist/types/core/prompt.d.ts +3 -0
- package/dist/types/core/webllm-bridge.d.ts +33 -0
- package/dist/types/lib/index.d.ts +2 -0
- package/dist/types/shared/parse-action.d.ts +2 -1
- package/docs/EMBEDDING.md +3 -14
- package/docs/ROADMAP.md +8 -13
- package/index.html +962 -70
- package/package.json +1 -1
- package/plan.md +114 -0
- package/styles.css +653 -297
- package/vercel.json +7 -2
package/package.json
CHANGED
package/plan.md
ADDED
@@ -0,0 +1,114 @@
---
name: Local LLM Website Control
overview: Improve the WebLLM integration so an on-device LLM can actually control a website reliably — fixing the thin prompt, adding a ready-made bridge helper, making JSON output robust, and fixing the extension memory bug.
todos:
  - id: prompt-module
    content: Create src/core/prompt.ts with buildSystemPrompt() and buildUserMessage() that serialize the full PageSnapshot, history, memory, lastError, and action schema for the LLM
    status: pending
  - id: webllm-bridge-helper
    content: Create src/core/webllm-bridge.ts with createWebLLMBridge(engine) factory and export it from src/lib/index.ts
    status: pending
  - id: planner-rich-prompt
    content: Update planner.ts webllm path to use the rich prompt and add JSON parse retry logic using parsePlannerResult
    status: pending
  - id: fix-extension-memory
    content: Fix memory and lastError persistence in content/index.ts and background/index.ts so reflection state survives across extension ticks
    status: pending
  - id: popup-modelid
    content: Add model name input to popup UI (shown when webllm is selected), forwarded in planner.modelId
    status: pending
isProject: false
---

# Local LLM Website Control — WebLLM Improvements

## Current State

The library has two planners: `heuristic` (regex, no LLM) and `webllm` (delegates to `window.__browserAgentWebLLM`, manually wired by the user). The example bridge in [docs/EMBEDDING.md](docs/EMBEDDING.md) only passes `goal` and `history` to the LLM — the rich page snapshot (interactive elements with selectors, page text, URL) is completely ignored, and there is no error recovery or memory. As a result, WebLLM cannot reliably pick selectors or carry out multi-step tasks on a site.

## Architecture of improved WebLLM flow

```mermaid
flowchart TD
    UserGoal["User Goal"] --> BrowserAgent
    BrowserAgent --> collectSnapshot["collectSnapshot()\nobserver.ts"]
    collectSnapshot --> Snapshot["PageSnapshot\n(url, title, textPreview, candidates[])"]
    Snapshot --> planNextAction["planNextAction()\nplanner.ts"]
    planNextAction -->|kind=heuristic| HeuristicPlan["Regex planner (existing)"]
    planNextAction -->|kind=webllm| BridgeLookup["window.__browserAgentWebLLM"]
    BridgeLookup -->|manual wiring| UserBridge["User-written bridge (existing)"]
    BridgeLookup -->|createWebLLMBridge NEW| BuiltInBridge["createWebLLMBridge(engine)\nbuilt-in helper"]
    BuiltInBridge --> RichPrompt["prompt.ts NEW\nbuildSystemPrompt()\nbuildUserMessage(input)\n- All 8 action types + schema\n- Candidates list with selectors\n- history + memory + lastError"]
    RichPrompt --> WebLLMEngine["@mlc-ai/web-llm engine\n(runs fully on-device)"]
    WebLLMEngine --> JSONOutput["Raw LLM output"]
    JSONOutput --> ParseRetry["parsePlannerResult()\n+ retry on bad JSON"]
    ParseRetry --> executeAction["executeAction()\nexecutor.ts"]
```

## Improvements

### 1. Rich prompt module

**New file:** [src/core/prompt.ts](src/core/prompt.ts)

The LLM currently has no idea what elements are on the page. This module creates two functions:

- `buildSystemPrompt(customPrompt?)` — explains all 8 action types (`click`, `type`, `navigate`, `extract`, `scroll`, `focus`, `wait`, `done`) with their exact JSON fields, explains the `PlannerResult` wrapper (`evaluation`, `memory`, `nextGoal`), and instructs the model to output only valid JSON
- `buildUserMessage(input: PlannerInput)` — serializes the full context:
  - Page URL, title, text preview (first 800 chars)
  - Numbered list of candidates: `[1] button | selector: "button.submit" | text: "Submit" | label: "Submit form"`
  - Step history
  - Working memory (if any)
  - Last error (if any), asking the LLM to recover

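The serialization above can be sketched as follows — a minimal sketch only, where the `PlannerInput`/`PageSnapshot` field names are simplified assumptions, not the library's actual types:

```ts
// Sketch: assumed, simplified shapes — not the library's real types.
interface Candidate { role: string; selector: string; text?: string; label?: string; }
interface PageSnapshot { url: string; title: string; textPreview: string; candidates: Candidate[]; }
interface PlannerInput { goal: string; snapshot: PageSnapshot; history: string[]; memory?: string; lastError?: string; }

function buildUserMessage(input: PlannerInput): string {
  const { snapshot } = input;
  // One numbered line per interactive element, so the model can answer with a selector.
  const candidates = snapshot.candidates
    .map((c, i) => `[${i + 1}] ${c.role} | selector: "${c.selector}"` +
      (c.text ? ` | text: "${c.text}"` : "") +
      (c.label ? ` | label: "${c.label}"` : ""))
    .join("\n");
  const parts = [
    `GOAL: ${input.goal}`,
    `PAGE: ${snapshot.title} (${snapshot.url})`,
    `TEXT PREVIEW:\n${snapshot.textPreview.slice(0, 800)}`,
    `INTERACTIVE ELEMENTS:\n${candidates || "(none found)"}`,
  ];
  // Optional context blocks are omitted entirely when empty to save tokens.
  if (input.history.length) parts.push(`HISTORY:\n${input.history.join("\n")}`);
  if (input.memory) parts.push(`MEMORY:\n${input.memory}`);
  if (input.lastError) parts.push(`LAST ERROR (recover from this):\n${input.lastError}`);
  return parts.join("\n\n");
}
```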
### 2. `createWebLLMBridge(engine)` helper

**New file:** [src/core/webllm-bridge.ts](src/core/webllm-bridge.ts), exported from [src/lib/index.ts](src/lib/index.ts)

Replaces the fragile manual bridge from EMBEDDING.md. Users just do:

```ts
import * as webllm from "@mlc-ai/web-llm";
import { createWebLLMBridge } from "@akshayram1/omnibrowser-agent";

const engine = await webllm.CreateMLCEngine("Llama-3.2-3B-Instruct-q4f16_1-MLC");
window.__browserAgentWebLLM = createWebLLMBridge(engine);
```

Internally it calls `buildSystemPrompt` + `buildUserMessage`, sends them to `engine.chat.completions.create`, and runs `parsePlannerResult` on the output. If JSON parsing fails, it sends one correction follow-up before giving up.

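The bridge internals could look roughly like this — a sketch under assumptions: the bridge's input shape and the `tryParse` stand-in are simplified here (the real `parsePlannerResult` also strips ```json fences), and only the `engine.chat.completions.create` call mirrors the actual web-llm API:

```ts
// Sketch only: simplified bridge input shape and parse stand-in.
type ChatMessage = { role: "system" | "user" | "assistant"; content: string };
interface Engine {
  chat: { completions: { create(req: { messages: ChatMessage[] }): Promise<{ choices: { message: { content: string } }[] }> } };
}

// Minimal stand-in for the library's parsePlannerResult.
function tryParse(raw: string): unknown | null {
  try { return JSON.parse(raw.trim()); } catch { return null; }
}

export function createWebLLMBridge(engine: Engine) {
  return async (input: { system: string; user: string }) => {
    const messages: ChatMessage[] = [
      { role: "system", content: input.system },
      { role: "user", content: input.user },
    ];
    let reply = (await engine.chat.completions.create({ messages })).choices[0].message.content;
    let parsed = tryParse(reply);
    if (parsed === null) {
      // One correction round: feed the bad reply back with an explicit fix-up request.
      messages.push({ role: "assistant", content: reply });
      messages.push({ role: "user", content: "Invalid JSON. Reply with only a valid JSON object." });
      reply = (await engine.chat.completions.create({ messages })).choices[0].message.content;
      parsed = tryParse(reply);
    }
    // Safe terminal action if the model never produced valid JSON.
    return parsed ?? { action: { type: "done", reason: "Planner produced invalid JSON twice." } };
  };
}
```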
### 3. JSON parse retry in planner

**File:** [src/core/planner.ts](src/core/planner.ts)

The existing `webllm` path passes the raw result directly to `toPlannerResult` with no error handling. Update it to:

- Use `parsePlannerResult` from [src/shared/parse-action.ts](src/shared/parse-action.ts) (already handles fenced JSON)
- On failure, send a correction message `"Invalid JSON. Reply with only a valid JSON object."` and retry once via the bridge's chat history
- Return `{ action: { type: "done", reason: "..." } }` if retry also fails

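The tolerant parsing that `parsePlannerResult` needs to do can be illustrated with a sketch like the one below — `extractJsonObject` is a hypothetical helper, not the library's actual function; it cuts the model's reply down to the first balanced `{...}` span, which also handles fenced output and leading prose:

```ts
// Hypothetical helper illustrating tolerant JSON extraction.
// Note: the brace counter ignores braces inside string values, which is
// acceptable for a sketch but not fully robust.
function extractJsonObject(raw: string): unknown | null {
  const start = raw.indexOf("{");
  if (start === -1) return null;
  let depth = 0;
  for (let i = start; i < raw.length; i++) {
    if (raw[i] === "{") depth++;
    else if (raw[i] === "}" && --depth === 0) {
      // First balanced object span found; parse just that slice.
      try { return JSON.parse(raw.slice(start, i + 1)); } catch { return null; }
    }
  }
  return null;
}
```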
### 4. Fix extension memory persistence bug

**Files:** [src/content/index.ts](src/content/index.ts), [src/background/index.ts](src/background/index.ts)

The content script returns `ContentResult` (which includes `reflection.memory`), but the background never writes it back into the session before the next `AGENT_TICK`, so WebLLM's cross-step memory is silently lost in the extension. Fix: after receiving a tick result, the background updates `session.memory = result.reflection?.memory ?? session.memory` and `session.lastError = result.status === "error" ? result.message : undefined`.

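The background-side update amounts to a small pure function — a sketch only, with `Session` and `ContentResult` simplified to the fields involved here:

```ts
// Sketch: assumed, simplified shapes for the fields involved in the fix.
interface Session { memory?: string; lastError?: string; }
interface ContentResult { status: "ok" | "error"; message?: string; reflection?: { memory?: string }; }

function applyTickResult(session: Session, result: ContentResult): Session {
  return {
    ...session,
    // Carry reflection memory forward; keep the old memory if this tick produced none.
    memory: result.reflection?.memory ?? session.memory,
    // Surface a failure to the next planning round, or clear a stale error.
    lastError: result.status === "error" ? result.message : undefined,
  };
}
```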
### 5. Model name input in extension popup

**Files:** [src/popup/index.html](src/popup/index.html), [src/popup/index.ts](src/popup/index.ts)

When `webllm` is selected, show a text input for the model ID (e.g. `Llama-3.2-1B-Instruct-q4f16_1-MLC`) and forward it as `planner.modelId` in the session. Currently the popup sends no model ID at all.

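The forwarding logic can be sketched as a pure function — `buildPlannerConfig` and the `PlannerConfig` shape are illustrative assumptions, not the popup's actual code:

```ts
// Sketch: hypothetical helper; shape of the planner config is assumed.
interface PlannerConfig { kind: "heuristic" | "webllm"; modelId?: string; }

function buildPlannerConfig(kind: string, modelIdInput: string): PlannerConfig {
  const modelId = modelIdInput.trim();
  // Only the webllm planner takes a model ID; omit the field when the input is blank.
  return kind === "webllm"
    ? { kind: "webllm", ...(modelId ? { modelId } : {}) }
    : { kind: "heuristic" };
}
```

In the popup itself, the input would be toggled on the planner `<select>`'s `change` event and read when the session starts.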

## File change summary

- [src/core/prompt.ts](src/core/prompt.ts) — New: `buildSystemPrompt()` + `buildUserMessage()`
- [src/core/webllm-bridge.ts](src/core/webllm-bridge.ts) — New: `createWebLLMBridge(engine)` factory
- [src/lib/index.ts](src/lib/index.ts) — Export `createWebLLMBridge`
- [src/core/planner.ts](src/core/planner.ts) — Use rich prompt in webllm path + JSON retry
- [src/content/index.ts](src/content/index.ts) — Return memory/lastError in tick response
- [src/background/index.ts](src/background/index.ts) — Persist memory/lastError into session between ticks
- [src/popup/index.html](src/popup/index.html) + [src/popup/index.ts](src/popup/index.ts) — Add model ID input
- [docs/EMBEDDING.md](docs/EMBEDDING.md) — Update example to use `createWebLLMBridge`