@akshayram1/omnibrowser-agent 0.2.29 → 0.3.0

package/plan.md DELETED
@@ -1,114 +0,0 @@
- ---
- name: Local LLM Website Control
- overview: Improve the WebLLM integration so an on-device LLM can actually control a website reliably — fixing the thin prompt, adding a ready-made bridge helper, making JSON output robust, and fixing the extension memory bug.
- todos:
-   - id: prompt-module
-     content: Create src/core/prompt.ts with buildSystemPrompt() and buildUserMessage() that serialize the full PageSnapshot, history, memory, lastError, and action schema for the LLM
-     status: pending
-   - id: webllm-bridge-helper
-     content: Create src/core/webllm-bridge.ts with createWebLLMBridge(engine) factory and export it from src/lib/index.ts
-     status: pending
-   - id: planner-rich-prompt
-     content: Update planner.ts webllm path to use the rich prompt and add JSON parse retry logic using parsePlannerResult
-     status: pending
-   - id: fix-extension-memory
-     content: Fix memory and lastError persistence in content/index.ts and background/index.ts so reflection state survives across extension ticks
-     status: pending
-   - id: popup-modelid
-     content: Add model name input to popup UI (shown when webllm is selected), forwarded in planner.modelId
-     status: pending
- isProject: false
- ---
-
- # Local LLM Website Control — WebLLM Improvements
-
- ## Current State
-
- The library has two planners: `heuristic` (regex, no LLM) and `webllm` (delegates to `window.__browserAgentWebLLM`, manually wired by the user). The example bridge in `[docs/EMBEDDING.md](docs/EMBEDDING.md)` only passes `goal` and `history` to the LLM; the rich page snapshot (interactive elements with selectors, page text, URL) is ignored entirely, and there is no error recovery or memory. This means WebLLM cannot reliably pick selectors or carry out multi-step flows on a site.
-
- ## Architecture of improved WebLLM flow
-
- ```mermaid
- flowchart TD
-   UserGoal["User Goal"] --> BrowserAgent
-   BrowserAgent --> collectSnapshot["collectSnapshot()\nobserver.ts"]
-   collectSnapshot --> Snapshot["PageSnapshot\n(url, title, textPreview, candidates[])"]
-   Snapshot --> planNextAction["planNextAction()\nplanner.ts"]
-   planNextAction -->|kind=heuristic| HeuristicPlan["Regex planner (existing)"]
-   planNextAction -->|kind=webllm| BridgeLookup["window.__browserAgentWebLLM"]
-   BridgeLookup -->|manual wiring| UserBridge["User-written bridge (existing)"]
-   BridgeLookup -->|createWebLLMBridge NEW| BuiltInBridge["createWebLLMBridge(engine)\nbuilt-in helper"]
-   BuiltInBridge --> RichPrompt["prompt.ts NEW\nbuildSystemPrompt()\nbuildUserMessage(input)\n- All 8 action types + schema\n- Candidates list with selectors\n- history + memory + lastError"]
-   RichPrompt --> WebLLMEngine["@mlc-ai/web-llm engine\n(runs fully on-device)"]
-   WebLLMEngine --> JSONOutput["Raw LLM output"]
-   JSONOutput --> ParseRetry["parsePlannerResult()\n+ retry on bad JSON"]
-   ParseRetry --> executeAction["executeAction()\nexecutor.ts"]
- ```
-
- ## Improvements
-
- ### 1. Rich prompt module
-
- **New file:** `[src/core/prompt.ts](src/core/prompt.ts)`
-
- The LLM currently has no idea which elements are on the page. This module adds two functions (see the sketch after the list below):
-
- - `buildSystemPrompt(customPrompt?)` — explains all 8 action types (`click`, `type`, `navigate`, `extract`, `scroll`, `focus`, `wait`, `done`) with their exact JSON fields, explains the `PlannerResult` wrapper (`evaluation`, `memory`, `nextGoal`), and instructs the model to output only valid JSON
- - `buildUserMessage(input: PlannerInput)` — serializes the full context:
-   - Page URL, title, text preview (first 800 chars)
-   - Numbered list of candidates: `[1] button | selector: "button.submit" | text: "Submit" | label: "Submit form"`
-   - Step history
-   - Working memory (if any)
-   - Last error (if any), asking the LLM to recover
-
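- A minimal sketch of what `buildUserMessage` could produce. The `Candidate` and `PlannerInput` shapes below are assumptions inferred from the snapshot description above, not the library's actual types:
-
- ```ts
- // Hypothetical shapes inferred from the PageSnapshot description above;
- // the library's real types may differ.
- interface Candidate { role: string; selector: string; text?: string; label?: string; }
- interface PageSnapshot { url: string; title: string; textPreview: string; candidates: Candidate[]; }
- export interface PlannerInput {
-   goal: string;
-   snapshot: PageSnapshot;
-   history: string[];
-   memory?: string;
-   lastError?: string;
- }
-
- export function buildUserMessage(input: PlannerInput): string {
-   const { goal, snapshot, history, memory, lastError } = input;
-   // Number the candidates so the model can answer with concrete selectors.
-   const candidates = snapshot.candidates
-     .map((c, i) =>
-       `[${i + 1}] ${c.role} | selector: "${c.selector}"` +
-       (c.text ? ` | text: "${c.text}"` : "") +
-       (c.label ? ` | label: "${c.label}"` : ""))
-     .join("\n");
-   return [
-     `GOAL: ${goal}`,
-     `PAGE: ${snapshot.url} | ${snapshot.title}`,
-     `TEXT PREVIEW:\n${snapshot.textPreview.slice(0, 800)}`,
-     `INTERACTIVE ELEMENTS:\n${candidates}`,
-     history.length ? `HISTORY:\n${history.join("\n")}` : "",
-     memory ? `MEMORY:\n${memory}` : "",
-     lastError ? `LAST ERROR (try to recover):\n${lastError}` : "",
-   ].filter(Boolean).join("\n\n");
- }
- ```
-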
- ### 2. `createWebLLMBridge(engine)` helper
-
- **New file:** `[src/core/webllm-bridge.ts](src/core/webllm-bridge.ts)`, exported from `[src/lib/index.ts](src/lib/index.ts)`
-
- Replaces the fragile manual bridge from EMBEDDING.md. Users just do:
-
- ```ts
- import * as webllm from "@mlc-ai/web-llm";
- import { createWebLLMBridge } from "@akshayram1/omnibrowser-agent";
-
- const engine = await webllm.CreateMLCEngine("Llama-3.2-3B-Instruct-q4f16_1-MLC");
- window.__browserAgentWebLLM = createWebLLMBridge(engine);
- ```
-
- Internally it calls `buildSystemPrompt` + `buildUserMessage`, sends them to `engine.chat.completions.create`, and runs `parsePlannerResult` on the output. If JSON parsing fails, it sends one correction follow-up before giving up.
-
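- A minimal sketch of the helper, assuming web-llm's OpenAI-style chat API and that `parsePlannerResult` throws on unparseable output (the real helper in `src/shared/parse-action.ts` may signal failure differently):
-
- ```ts
- // Sketch of webllm-bridge.ts under the assumptions stated above.
- import * as webllm from "@mlc-ai/web-llm";
- import { buildSystemPrompt, buildUserMessage, type PlannerInput } from "./prompt";
- import { parsePlannerResult } from "../shared/parse-action";
-
- export function createWebLLMBridge(engine: webllm.MLCEngine) {
-   return async (input: PlannerInput) => {
-     const messages: webllm.ChatCompletionMessageParam[] = [
-       { role: "system", content: buildSystemPrompt() },
-       { role: "user", content: buildUserMessage(input) },
-     ];
-     const first = await engine.chat.completions.create({ messages });
-     const raw = first.choices[0]?.message?.content ?? "";
-     try {
-       return parsePlannerResult(raw);
-     } catch {
-       // One correction round-trip over the same chat history, then give up.
-       messages.push(
-         { role: "assistant", content: raw },
-         { role: "user", content: "Invalid JSON. Reply with only a valid JSON object." },
-       );
-       const second = await engine.chat.completions.create({ messages });
-       return parsePlannerResult(second.choices[0]?.message?.content ?? "");
-     }
-   };
- }
- ```
-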
- ### 3. JSON parse retry in planner
-
- **File:** `[src/core/planner.ts](src/core/planner.ts)`
-
- The existing `webllm` path passes the raw result directly to `toPlannerResult` with no error handling. Update it to (see the sketch after this list):
-
- - Use `parsePlannerResult` from `[src/shared/parse-action.ts](src/shared/parse-action.ts)` (already handles fenced JSON)
- - On failure, send a correction message `"Invalid JSON. Reply with only a valid JSON object."` and retry once via the bridge's chat history
- - Return `{ action: { type: "done", reason: "..." } }` if retry also fails
-
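- A sketch of the hardened path. `parsePlannerResult` is assumed to throw on bad input, and the bridge's optional second `correction` argument is hypothetical — how the retry reaches the bridge's chat history is an open design question:
-
- ```ts
- // planner.ts webllm path (sketch). The `correction` parameter on the
- // bridge is hypothetical, not the current signature.
- import { parsePlannerResult } from "../shared/parse-action";
- import type { PlannerInput } from "./prompt";
-
- type Bridge = (input: PlannerInput, correction?: string) => Promise<unknown>;
-
- async function planWithWebLLM(input: PlannerInput) {
-   const bridge = (window as any).__browserAgentWebLLM as Bridge | undefined;
-   if (!bridge) return { action: { type: "done", reason: "No WebLLM bridge installed" } };
-   const asText = (r: unknown) => (typeof r === "string" ? r : JSON.stringify(r));
-   try {
-     return parsePlannerResult(asText(await bridge(input)));
-   } catch {
-     try {
-       // One retry with an explicit correction message, then bail out cleanly.
-       const retried = await bridge(input, "Invalid JSON. Reply with only a valid JSON object.");
-       return parsePlannerResult(asText(retried));
-     } catch {
-       return { action: { type: "done", reason: "Planner produced invalid JSON twice" } };
-     }
-   }
- }
- ```
-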
- ### 4. Fix extension memory persistence bug
-
- **Files:** `[src/content/index.ts](src/content/index.ts)`, `[src/background/index.ts](src/background/index.ts)`
-
- The content script returns `ContentResult` (which includes `reflection.memory`), but the background never writes it back into the session before the next `AGENT_TICK`, so WebLLM's cross-step memory is silently lost in the extension. Fix: after receiving a tick result, the background updates `session.memory = result.reflection?.memory ?? session.memory` and `session.lastError = result.status === "error" ? result.message : undefined`.
-
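- In background/index.ts this could look like the following sketch; the `Session` and `ContentResult` shapes are assumptions based on the description above:
-
- ```ts
- // Sketch: persist reflection state between ticks. Field names follow the
- // description above; the real Session/ContentResult types may differ.
- interface ContentResult {
-   status: "ok" | "error" | "done";
-   message?: string;
-   reflection?: { memory?: string };
- }
- interface Session {
-   memory?: string;
-   lastError?: string;
- }
-
- function applyTickResult(session: Session, result: ContentResult): void {
-   // Carry the planner's working memory forward to the next AGENT_TICK.
-   session.memory = result.reflection?.memory ?? session.memory;
-   // Expose the last failure to the next planning round; clear it on success.
-   session.lastError = result.status === "error" ? result.message : undefined;
- }
- ```
-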
- ### 5. Model name input in extension popup
-
- **Files:** `[src/popup/index.html](src/popup/index.html)`, `[src/popup/index.ts](src/popup/index.ts)`
-
- When `webllm` is selected, show a text input for the model ID (e.g. `Llama-3.2-1B-Instruct-q4f16_1-MLC`) and forward it as `planner.modelId` in the session. Currently the popup sends no model ID at all.
-
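- A minimal sketch of the popup wiring; the element IDs and the config shape are hypothetical:
-
- ```ts
- // popup/index.ts (sketch). "#planner-kind" and "#model-id" are hypothetical
- // element IDs; the real popup markup may differ.
- const plannerSelect = document.querySelector<HTMLSelectElement>("#planner-kind")!;
- const modelInput = document.querySelector<HTMLInputElement>("#model-id")!;
-
- // Only show the model field when the webllm planner is selected.
- plannerSelect.addEventListener("change", () => {
-   modelInput.hidden = plannerSelect.value !== "webllm";
- });
-
- // Forward the model ID in the session's planner config.
- function buildPlannerConfig() {
-   const modelId = modelInput.value.trim();
-   return {
-     kind: plannerSelect.value,
-     modelId: plannerSelect.value === "webllm" && modelId ? modelId : undefined,
-   };
- }
- ```
-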
- ## File change summary
-
- - `[src/core/prompt.ts](src/core/prompt.ts)` — New: `buildSystemPrompt()` + `buildUserMessage()`
- - `[src/core/webllm-bridge.ts](src/core/webllm-bridge.ts)` — New: `createWebLLMBridge(engine)` factory
- - `[src/lib/index.ts](src/lib/index.ts)` — Export `createWebLLMBridge`
- - `[src/core/planner.ts](src/core/planner.ts)` — Use rich prompt in webllm path + JSON retry
- - `[src/content/index.ts](src/content/index.ts)` — Return memory/lastError in tick response
- - `[src/background/index.ts](src/background/index.ts)` — Persist memory/lastError into session between ticks
- - `[src/popup/index.html](src/popup/index.html)` + `[src/popup/index.ts](src/popup/index.ts)` — Add model ID input
- - `[docs/EMBEDDING.md](docs/EMBEDDING.md)` — Update example to use `createWebLLMBridge`
-