@akshayram1/omnibrowser-agent 0.2.6 → 0.2.8

package/index.html CHANGED
@@ -3,11 +3,8 @@
3
3
  <head>
4
4
  <meta charset="UTF-8" />
5
5
  <meta name="viewport" content="width=device-width, initial-scale=1.0" />
6
- <title>OmniBrowser Agent</title>
7
- <meta
8
- name="description"
9
- content="OmniBrowser Agent - local-first browser AI operator library."
10
- />
6
+ <title>OmniBrowser Agent — Local-first Browser AI</title>
7
+ <meta name="description" content="OmniBrowser Agent — local-first browser AI operator. No API keys. No cloud. Runs entirely in the browser via WebLLM + WebGPU." />
11
8
  <link rel="stylesheet" href="./styles.css" />
12
9
  </head>
13
10
  <body>
@@ -16,6 +13,7 @@
16
13
  <a class="brand" href="#home">OmniBrowser Agent</a>
17
14
  <nav class="nav">
18
15
  <a href="#home">Home</a>
16
+ <a href="#whats-new">What's New</a>
19
17
  <a href="#docs">Docs</a>
20
18
  <a href="#architecture">Architecture</a>
21
19
  <a href="#embedding">Embedding</a>
@@ -29,37 +27,29 @@
29
27
  <!-- HOME -->
30
28
  <section id="home" class="section hero">
31
29
  <div class="wrap">
32
- <p class="eyebrow">Open-source browser automation SDK</p>
30
+ <p class="eyebrow">Open-source browser automation SDK · v0.2.6</p>
33
31
  <h1>Local-first browser AI automation library</h1>
34
32
  <p>
35
- OmniBrowser Agent helps you run page observation, planning, and execution flows directly in the browser.
33
+ OmniBrowser Agent plans and executes DOM actions entirely in the browser — no API keys, no cloud costs, no data leaving your machine.
34
+ Wire in a WebLLM model and it reasons, remembers, and acts on any webpage.
36
35
  </p>
37
36
  <div class="chips">
38
37
  <span>Privacy-first</span>
39
- <span>WebLLM-ready</span>
38
+ <span>WebLLM + WebGPU</span>
39
+ <span>Reflection loop</span>
40
40
  <span>Human-approved mode</span>
41
+ <span>Custom system prompt</span>
41
42
  <span>Embeddable API</span>
42
43
  </div>
43
44
  <div class="actions">
44
45
  <a class="btn primary" href="./examples/chatbot/">Live Demo</a>
45
- <a
46
- class="btn"
47
- href="https://www.npmjs.com/package/@akshayram1/omnibrowser-agent"
48
- target="_blank"
49
- rel="noreferrer"
50
- >NPM Package</a
51
- >
52
- <a
53
- class="btn"
54
- href="https://github.com/akshayram1/omnibrowser-agent"
55
- target="_blank"
56
- rel="noreferrer"
57
- >GitHub Repo</a
58
- >
46
+ <a class="btn" href="https://www.npmjs.com/package/@akshayram1/omnibrowser-agent" target="_blank" rel="noreferrer">NPM Package</a>
47
+ <a class="btn" href="https://github.com/akshayram1/omnibrowser-agent" target="_blank" rel="noreferrer">GitHub</a>
59
48
  </div>
60
49
  <div class="stats" aria-label="project stats">
61
- <div class="stat"><strong>2</strong><span>Execution Modes</span></div>
62
- <div class="stat"><strong>3</strong><span>Planner Options</span></div>
50
+ <div class="stat"><strong>2</strong><span>Agent Modes</span></div>
51
+ <div class="stat"><strong>2</strong><span>Planner Modes</span></div>
52
+ <div class="stat"><strong>8</strong><span>Action Types</span></div>
63
53
  <div class="stat"><strong>MIT</strong><span>License</span></div>
64
54
  </div>
65
55
  <div class="home-grid">
@@ -67,16 +57,18 @@
67
57
  <h3>Use Cases</h3>
68
58
  <ul>
69
59
  <li>CRM profile lookup automation</li>
70
- <li>Guided web task execution</li>
60
+ <li>Guided form-filling workflows</li>
71
61
  <li>Assisted data extraction flows</li>
62
+ <li>Multi-step task automation</li>
72
63
  </ul>
73
64
  </article>
74
65
  <article class="card">
75
- <h3>Core Modules</h3>
66
+ <h3>Core Engine</h3>
76
67
  <ul>
77
- <li><strong>Observer:</strong> page signals and candidates</li>
78
- <li><strong>Planner:</strong> next best action selection</li>
79
- <li><strong>Executor:</strong> safe browser action runtime</li>
68
+ <li><strong>Observer:</strong> DOM snapshot + candidate elements</li>
69
+ <li><strong>Planner:</strong> reflection → next action</li>
70
+ <li><strong>Safety:</strong> safe / review / blocked gating</li>
71
+ <li><strong>Executor:</strong> DOM actions with framework compat</li>
80
72
  </ul>
81
73
  </article>
82
74
  <article class="card">
@@ -84,19 +76,89 @@
84
76
  <ul>
85
77
  <li><a href="https://www.npmjs.com/package/@akshayram1/omnibrowser-agent" target="_blank" rel="noreferrer">NPM package</a></li>
86
78
  <li><a href="https://github.com/akshayram1/omnibrowser-agent" target="_blank" rel="noreferrer">GitHub repository</a></li>
87
- <li><a href="./README.md" target="_blank" rel="noreferrer">README</a></li>
79
+ <li><a href="./examples/chatbot/" target="_blank">Live Demo</a></li>
88
80
  </ul>
89
81
  </article>
90
82
  </div>
91
83
  </div>
92
84
  </section>
93
85
 
86
+ <!-- WHAT'S NEW -->
87
+ <section id="whats-new" class="section">
88
+ <div class="wrap">
89
+ <div class="surface">
90
+ <h2>What's New in v0.2.6</h2>
91
+ <p>This release implements the <strong>reflection-before-action pattern</strong> — the same loop used by leading browser agents — plus a new <code>systemPrompt</code> option so you can shape agent behaviour without rewriting the bridge.</p>
92
+
93
+ <h3>Reflection Loop <span class="badge new">New</span></h3>
94
+ <p>Before every action the agent now goes through a 4-step inner loop:</p>
95
+ <div class="docs-grid">
96
+ <article class="doc-card">
97
+ <h4>1 · Evaluate</h4>
98
+ <p>What happened in the previous step? Did it succeed? What changed on the page?</p>
99
+ </article>
100
+ <article class="doc-card">
101
+ <h4>2 · Remember</h4>
102
+ <p>What key facts should be carried into the next step? Selector mappings, field values, task state.</p>
103
+ </article>
104
+ <article class="doc-card">
105
+ <h4>3 · Plan</h4>
106
+ <p>State the next goal in plain English before choosing an action.</p>
107
+ </article>
108
+ <article class="doc-card">
109
+ <h4>4 · Act</h4>
110
+ <p>Output the specific DOM action: click, type, navigate, scroll, etc.</p>
111
+ </article>
112
+ </div>
113
+
114
+ <p>The WebLLM bridge now returns the full reflection object:</p>
115
+ <pre><code>{
116
+ "evaluation": "The name field was filled successfully.",
117
+ "memory": "Name=#name done. Next: fill email at #email.",
118
+ "next_goal": "Type the email address into #email",
119
+ "action": { "type": "type", "selector": "#email", "text": "jane@example.com", "clearFirst": true }
120
+ }</code></pre>
121
+
122
+ <p>The <code>nextGoal</code> field is surfaced in the live demo as a <strong>💭 thought bubble</strong> before each action, so you can follow the agent's reasoning in real time.</p>
123
+
124
+ <h3>Working Memory Across Steps <span class="badge new">New</span></h3>
125
+ <p>The agent's <code>memory</code> string is automatically carried forward from one tick to the next inside <code>AgentSession</code>. The planner receives it as <code>input.memory</code> and can update it each step — giving the agent a scratchpad across the whole task.</p>
126
+
127
+ <h3>Custom System Prompt <span class="badge new">New</span></h3>
128
+ <p>Pass your own system prompt directly in the planner config — no need to rewrite the bridge:</p>
129
+ <pre><code>const agent = createBrowserAgent({
130
+ goal: "Fill the checkout form",
131
+ planner: {
132
+ kind: "webllm",
133
+ systemPrompt: "You are a careful checkout assistant. Never submit before all required fields are filled."
134
+ }
135
+ });</code></pre>
136
+
137
+ <h3>New Exports <span class="badge new">New</span></h3>
138
+ <ul>
139
+ <li><code>parsePlannerResult(raw)</code> — parse the full reflection+action JSON from raw LLM output, with fallback to bare AgentAction for backward compatibility.</li>
140
+ <li><code>PlannerResult</code> type — <code>{ action, evaluation?, memory?, nextGoal? }</code></li>
141
+ </ul>
142
+ <pre><code>import { parsePlannerResult } from "@akshayram1/omnibrowser-agent";
143
+
144
+ const result = parsePlannerResult(llmRawOutput);
145
+ // result.action → AgentAction
146
+ // result.evaluation → string | undefined
147
+ // result.memory → string | undefined
148
+ // result.nextGoal → string | undefined</code></pre>
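A minimal sketch of what a parser like this might do, assuming the raw model output is a JSON object that may be wrapped in extra prose. The function name `parsePlannerResultSketch` and its extraction strategy are hypothetical and are not the library's implementation; only the field names (`evaluation`, `memory`, `next_goal`, `nextGoal`, `action`) come from the text above.

```javascript
// Hypothetical sketch, not the library's parsePlannerResult.
function parsePlannerResultSketch(raw) {
  // Assumption: the model may surround the JSON with prose or code fences,
  // so take the outermost {...} span before parsing.
  const start = raw.indexOf("{");
  const end = raw.lastIndexOf("}");
  if (start === -1 || end <= start) {
    throw new Error("No JSON object found in planner output");
  }
  const parsed = JSON.parse(raw.slice(start, end + 1));

  // New format: a reflection object with a nested action.
  if (parsed.action && typeof parsed.action === "object") {
    return {
      action: parsed.action,
      evaluation: parsed.evaluation,
      memory: parsed.memory,
      // The bridge JSON uses next_goal; the PlannerResult field is nextGoal.
      nextGoal: parsed.nextGoal ?? parsed.next_goal,
    };
  }

  // Old format: a bare AgentAction such as { type: "done", reason: "..." }.
  if (typeof parsed.type === "string") {
    return { action: parsed };
  }
  throw new Error("Planner output has neither an action nor a type field");
}

const reflected = parsePlannerResultSketch(
  'Sure: {"evaluation":"ok","memory":"Name=#name","next_goal":"Fill email",' +
    '"action":{"type":"type","selector":"#email","text":"jane@example.com"}}'
);
const bare = parsePlannerResultSketch('{"type":"done","reason":"finished"}');
```

Both calls return an object with an `action` field, so downstream code can treat old and new bridges the same way; the reflection fields are simply `undefined` for a bare action.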
149
+
150
+ <h3>Backward Compatible</h3>
151
+ <p>Existing bridges that return a bare <code>AgentAction</code> object still work without any changes. The library normalises both formats automatically.</p>
152
+ </div>
153
+ </div>
154
+ </section>
155
+
94
156
  <!-- DOCS / QUICK START -->
95
157
  <section id="docs" class="section">
96
158
  <div class="wrap">
97
159
  <div class="surface">
98
160
  <h2>Docs</h2>
99
- <p>Everything you need to install, initialize, and run your first browser agent.</p>
161
+ <p>Everything you need to install, initialise, and run your first browser agent.</p>
100
162
 
101
163
  <h3>Installation</h3>
102
164
  <pre><code>npm install @akshayram1/omnibrowser-agent</code></pre>
@@ -107,83 +169,94 @@
107
169
  const agent = createBrowserAgent(
108
170
  {
109
171
  goal: "Open CRM and find customer John Smith",
110
- mode: "human-approved",
111
- planner: { kind: "heuristic" }
172
+ mode: "human-approved", // or "autonomous"
173
+ planner: { kind: "heuristic" } // or "webllm"
112
174
  },
113
175
  {
114
- onStep: (result) => console.log(result.message),
115
- onApprovalRequired: (action) => console.log("Needs approval:", action),
116
- onDone: (result) => console.log("Done:", result.message),
117
- onMaxStepsReached: (session) => console.log("Max steps hit", session.history)
176
+ onStep: (result, session) => console.log(result.message),
177
+ onApprovalRequired: (action, session) => console.log("Needs approval:", action),
178
+ onDone: (result, session) => console.log("Done:", result.message),
179
+ onError: (err, session) => console.error(err),
180
+ onMaxStepsReached: (session) => console.log("Max steps hit"),
118
181
  }
119
182
  );
120
183
 
121
184
  await agent.start();
122
185
 
123
- // Resume after approval:
186
+ // Resume after an approval prompt:
124
187
  await agent.resume();
125
188
 
126
- // Inspect state:
189
+ // Inspect state at any time:
127
190
  console.log(agent.isRunning, agent.hasPendingAction);
128
191
 
129
192
  // Stop:
130
193
  agent.stop();</code></pre>
131
194
 
132
- <h3>AbortSignal support</h3>
195
+ <h3>AbortSignal Support</h3>
133
196
  <pre><code>const controller = new AbortController();
134
197
  const agent = createBrowserAgent({ goal: "...", signal: controller.signal });
135
198
  agent.start();
136
199
 
137
- // Cancel from outside:
138
- controller.abort();</code></pre>
200
+ controller.abort(); // cancel from outside</code></pre>
201
+
202
+ <h3>Reading Reflection Fields</h3>
203
+ <p>Each <code>onStep</code> result can now carry optional reflection data from the planner:</p>
204
+ <pre><code>onStep(result, session) {
205
+ if (result.reflection?.nextGoal) {
206
+ console.log("Agent thinking:", result.reflection.nextGoal);
207
+ }
208
+ if (result.reflection?.memory) {
209
+ console.log("Agent memory:", result.reflection.memory);
210
+ }
211
+ console.log("Action:", result.message);
212
+ }</code></pre>
139
213
 
140
- <h3>Execution Modes</h3>
214
+ <h3>Agent Modes</h3>
141
215
  <div class="docs-grid">
142
216
  <article class="doc-card">
143
217
  <h4>human-approved</h4>
144
- <p>Requires explicit approval for sensitive actions. Best for production-like workflows.</p>
218
+ <p>Pauses on review-rated actions and fires <code>onApprovalRequired</code>. Call <code>agent.resume()</code> to continue. Recommended for CRM, finance, and admin flows.</p>
145
219
  </article>
146
220
  <article class="doc-card">
147
221
  <h4>autonomous</h4>
148
- <p>Runs actions continuously with fewer pauses. Best for rapid iteration and demos.</p>
222
+ <p>Executes all safe and review actions without pausing. Best for rapid prototyping and demos.</p>
149
223
  </article>
150
224
  </div>
151
225
 
152
- <h3>Planner Options</h3>
226
+ <h3>Planner Modes</h3>
153
227
  <div class="docs-grid">
154
228
  <article class="doc-card">
155
229
  <h4>heuristic</h4>
156
- <p>Zero-dependency regex-based planner. Works offline. Best for simple, predictable goals.</p>
230
+ <p>Zero-dependency regex planner. Works fully offline. Best for simple, predictable goals: navigate, fill a field, click a button.</p>
157
231
  </article>
158
232
  <article class="doc-card">
159
233
  <h4>webllm</h4>
160
- <p>Delegates to a local WebLLM bridge (<code>window.__browserAgentWebLLM</code>). Fully private, no API calls.</p>
234
+ <p>On-device LLM via WebGPU through <code>window.__browserAgentWebLLM</code>. Fully private. Supports the reflection loop and custom system prompts.</p>
161
235
  </article>
162
-
163
236
  </div>
164
237
 
165
238
  <h3>Supported Actions</h3>
166
239
  <table>
167
240
  <thead>
168
- <tr><th>Action</th><th>Description</th></tr>
241
+ <tr><th>Action</th><th>Description</th><th>Risk level</th></tr>
169
242
  </thead>
170
243
  <tbody>
171
- <tr><td><code>click</code></td><td>Click an element by CSS selector</td></tr>
172
- <tr><td><code>type</code></td><td>Type text into an input or textarea</td></tr>
173
- <tr><td><code>navigate</code></td><td>Navigate to a URL</td></tr>
174
- <tr><td><code>extract</code></td><td>Extract text from an element</td></tr>
175
- <tr><td><code>scroll</code></td><td>Scroll a container or the page</td></tr>
176
- <tr><td><code>focus</code></td><td>Focus an element (useful for dropdowns)</td></tr>
177
- <tr><td><code>wait</code></td><td>Pause for a given number of milliseconds</td></tr>
178
- <tr><td><code>done</code></td><td>Signal task completion</td></tr>
244
+ <tr><td><code>navigate</code></td><td>Navigate to a URL (http/https only)</td><td>safe</td></tr>
245
+ <tr><td><code>click</code></td><td>Click an element by CSS selector</td><td>safe / review</td></tr>
246
+ <tr><td><code>type</code></td><td>Type text into an input or textarea</td><td>safe / review</td></tr>
247
+ <tr><td><code>scroll</code></td><td>Scroll a container or the page</td><td>safe</td></tr>
248
+ <tr><td><code>focus</code></td><td>Focus an element (useful for dropdowns)</td><td>safe</td></tr>
249
+ <tr><td><code>wait</code></td><td>Pause for N milliseconds</td><td>safe</td></tr>
250
+ <tr><td><code>extract</code></td><td>Extract text from an element</td><td>review</td></tr>
251
+ <tr><td><code>done</code></td><td>Signal task completion</td><td>safe</td></tr>
179
252
  </tbody>
180
253
  </table>
181
254
 
182
- <h3>Safety Notes</h3>
255
+ <h3>Safety Model</h3>
183
256
  <ul>
184
- <li>Prefer scoped selectors for deterministic action targeting.</li>
185
- <li>Use <code>human-approved</code> mode for workflows that mutate critical data.</li>
186
- <li>Log <code>onStep</code> output for auditability and debugging.</li>
257
+ <li><strong>safe</strong>: executes immediately in all modes.</li>
258
+ <li><strong>review</strong>: pauses in <code>human-approved</code> mode; executes in <code>autonomous</code>. Triggered by actions on labels matching delete / submit / pay / confirm / transfer.</li>
259
+ <li><strong>blocked</strong>: never executes. Triggered by <code>javascript:</code>, <code>file:</code>, or malformed URLs.</li>
187
260
  </ul>
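The three risk levels can be illustrated with a small classifier. This is a sketch under stated assumptions, not the library's safety module: the function name `assessRiskSketch`, the `action.url` field, and passing the element label as a second argument are all hypothetical. The keyword list, the http/https rule, and the `extract` rating are taken from the table and list above.

```javascript
// Hypothetical sketch of keyword-based risk gating; names and exact rules are assumptions.
const RISKY_LABEL = /\b(delete|submit|pay|confirm|transfer)\b/i;

function assessRiskSketch(action, label = "") {
  if (action.type === "navigate") {
    let url;
    try {
      url = new URL(action.url); // throws on malformed input
    } catch {
      return "blocked";
    }
    // Only http/https navigation is allowed; javascript:, file:, etc. are blocked.
    return url.protocol === "http:" || url.protocol === "https:" ? "safe" : "blocked";
  }
  // The actions table rates extract as review.
  if (action.type === "extract") return "review";
  // click and type are safe unless the target's label looks destructive.
  if ((action.type === "click" || action.type === "type") && RISKY_LABEL.test(label)) {
    return "review";
  }
  return "safe";
}
```

For instance, a click on an element labelled "Delete account" comes back as `review`, while the same click on "Next" is `safe`. As the limitations list notes, this kind of matching is keyword-based rather than semantic, so a destructive button with an unusual label would pass as `safe`.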
188
261
  </div>
189
262
  </div>
@@ -194,77 +267,85 @@ controller.abort();</code></pre>
194
267
  <div class="wrap">
195
268
  <div class="surface">
196
269
  <h2>Architecture</h2>
197
- <p>How OmniBrowser Agent is structured internally and how its components interact.</p>
270
+ <p>OmniBrowser Agent is split into two delivery modes that share the same underlying engine. See the full breakdown in <a href="https://github.com/akshayram1/omnibrowser-agent/blob/main/docs/arch.md" target="_blank" rel="noreferrer">docs/arch.md</a>.</p>
198
271
 
199
- <h3>Goals</h3>
200
- <ul>
201
- <li>Local-first runtime in browser</li>
202
- <li>Privacy-first defaults</li>
203
- <li>Open-source composable planner/executor contracts</li>
204
- <li>Human-approved mode for risky actions</li>
205
- </ul>
272
+ <h3>Delivery Layer</h3>
273
+ <div class="docs-grid">
274
+ <article class="doc-card">
275
+ <h4>🧩 Chrome Extension</h4>
276
+ <p>Popup UI + background service worker. Manages sessions per tab and drives the tick loop via <code>chrome.tabs.sendMessage</code>.</p>
277
+ </article>
278
+ <article class="doc-card">
279
+ <h4>📦 npm Library</h4>
280
+ <p><code>createBrowserAgent()</code> — runs the same tick loop in-process inside your web app. No extension required.</p>
281
+ </article>
282
+ </div>
206
283
 
207
- <h3>Runtime Components</h3>
284
+ <h3>Core Modules <code>src/core/</code></h3>
208
285
  <div class="docs-grid">
209
286
  <article class="doc-card">
210
- <h4>Popup UI</h4>
211
- <p>Starts/stops sessions. Picks execution mode (<code>autonomous</code>, <code>human-approved</code>) and planner (<code>heuristic</code>, <code>webllm</code>).</p>
287
+ <h4>observer.ts</h4>
288
+ <p>Queries all interactive elements, filters out invisible ones, resolves accessible labels (<code>aria-label</code>, <code>for/id</code>, wrapping <code>&lt;label&gt;</code>), and caps the result at 60 candidates. Returns a <code>PageSnapshot</code>.</p>
212
289
  </article>
213
290
  <article class="doc-card">
214
- <h4>Background Service Worker</h4>
215
- <p>Session state machine per tab. Tick loop orchestration and approval handling.</p>
291
+ <h4>planner.ts</h4>
292
+ <p>Calls the heuristic regex planner or the <code>window.__browserAgentWebLLM</code> bridge. Returns a <code>PlannerResult</code>: an action plus optional <code>evaluation</code>, <code>memory</code>, and <code>nextGoal</code>.</p>
216
293
  </article>
217
294
  <article class="doc-card">
218
- <h4>Content Agent</h4>
219
- <p><strong>pageObserver</strong>: page snapshot extraction.<br>
220
- <strong>planner</strong>: next-action decision.<br>
221
- <strong>safety</strong>: risk gating.<br>
222
- <strong>executor</strong>: DOM action execution.</p>
295
+ <h4>executor.ts</h4>
296
+ <p>Performs DOM actions. Uses <code>InputEvent</code> with <code>bubbles: true</code> for React/Vue compatibility. Verifies that the element exists, is not disabled, and that its value was updated. Throws on failure so the retry loop can feed <code>lastError</code> back to the planner.</p>
223
297
  </article>
224
298
  </div>
225
299
 
226
- <h3>Action Contracts</h3>
227
- <p>All components share a typed action protocol defined in <code>src/shared/contracts.ts</code>:</p>
228
- <ul>
229
- <li><code>click</code>: click element by CSS selector</li>
229
- <li><code>type</code>: type text into input/textarea</li>
230
- <li><code>navigate</code>: navigate to URL</li>
231
- <li><code>extract</code>: extract text from element</li>
232
- <li><code>scroll</code>: scroll container or page</li>
233
- <li><code>focus</code>: focus an element</li>
234
- <li><code>wait</code>: pause for N milliseconds</li>
235
- <li><code>done</code>: signal task completion</li>
237
- </ul>
238
-
239
- <h3>Safety Model</h3>
240
- <ul>
241
- <li>Block invalid URL protocols</li>
242
- <li>Review risky actions (submit/delete/pay-like selectors)</li>
243
- <li>In <code>human-approved</code> mode, review-level actions require manual approval before execution</li>
244
- </ul>
245
-
246
- <h3>Planner Bridges</h3>
247
- <p>
248
- All planner bridges follow the same pattern — an object attached to <code>window</code>
249
- that implements a <code>plan()</code> method returning an <code>AgentAction</code>.
250
- The core library has <strong>zero runtime dependencies</strong>; bridge implementations are provided by the consumer.
251
- </p>
252
-
253
- <h4>WebLLM bridge</h4>
300
+ <h3>Data Flow — One Tick</h3>
301
+ <pre><code>goal + history + memory
302
+
303
+
304
+ observer.collectSnapshot() → PageSnapshot (url, title, candidates[])
305
+
306
+
307
+ planner.planNextAction() → PlannerResult
308
+ { action, evaluation?, memory?, nextGoal? }
309
+
310
+
311
+ safety.assessRisk(action) → safe | review | blocked
312
+
313
+ ┌────┴──────────────────────────┐
314
+ blocked review (human-approved mode)
315
+ │ │
316
+ stop pause user approves → resume()
317
+
318
+ safe / approved
319
+
320
+
321
+ executor.executeAction(action) → result string
322
+
323
+
324
+ session.history.push(result)
325
+ session.memory = plannerResult.memory
326
+ → next tick</code></pre>
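The gating and bookkeeping steps of one tick can be sketched as two small pure functions. The names `gateAction` and `commitTick` are hypothetical, and keeping the previous memory when the planner omits the field is an assumption; the diagram above only states that `session.memory` is set from the planner result.

```javascript
// Hypothetical helpers; the library's internal names and details may differ.

// Decide what happens to an action given its risk level and the agent mode.
function gateAction(risk, mode) {
  if (risk === "blocked") return "stop";
  if (risk === "review" && mode === "human-approved") return "pause";
  return "execute"; // safe actions, and review actions in autonomous mode
}

// Record the outcome of an executed action and carry memory into the next tick.
function commitTick(session, plannerResult, resultMessage) {
  return {
    ...session,
    history: [...session.history, resultMessage],
    // Assumption: fall back to the previous memory if the planner returned none.
    memory: plannerResult.memory ?? session.memory,
  };
}

const before = { history: ["Typed name"], memory: "Name=#name done." };
const after = commitTick(
  before,
  { action: { type: "click", selector: "#next" }, memory: "Name and email done." },
  "Clicked #next"
);
```

Returning a new session object instead of mutating the old one is a choice made for this sketch so that each tick is easy to inspect; the library itself may update the session in place.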
327
+
328
+ <h3>WebLLM Bridge Contract</h3>
329
+ <p>Attach an object to <code>window.__browserAgentWebLLM</code> before starting the agent. The bridge can return either the new <code>PlannerResult</code> format or a bare <code>AgentAction</code> (backward compatible).</p>
254
330
  <pre><code>window.__browserAgentWebLLM = {
255
331
  async plan(input, modelId) {
256
- // call your local WebLLM engine and return one AgentAction
257
- return { type: "done", reason: "result from model" };
332
+ // input.goal, input.snapshot, input.history,
333
+ // input.lastError, input.memory, input.systemPrompt
334
+ return {
335
+ evaluation: "Previous step succeeded.",
336
+ memory: "Name field is #name.",
337
+ next_goal: "Fill the email field.",
338
+ action: { "type": "type", "selector": "#email", "text": "jane@example.com", "clearFirst": true }
339
+ };
258
340
  }
259
341
  };</code></pre>
260
342
 
261
-
262
343
  <h3>Current Limitations</h3>
263
344
  <ul>
264
- <li>No persistent long-term memory yet</li>
265
- <li>No task DSL or skills registry yet</li>
266
- <li>Risk scoring is a simple keyword heuristic</li>
267
- <li>No robust selector healing yet</li>
345
+ <li>No persistent long-term memory (IndexedDB) yet</li>
346
+ <li>No goal decomposition / multi-step task graphs yet</li>
347
+ <li>Risk scoring is keyword-based, not semantic</li>
348
+ <li>No selector healing or fallback strategy yet</li>
268
349
  </ul>
269
350
  </div>
270
351
  </div>
@@ -275,12 +356,12 @@ controller.abort();</code></pre>
275
356
  <div class="wrap">
276
357
  <div class="surface">
277
358
  <h2>Embedding Guide</h2>
278
- <p>How to embed OmniBrowser Agent as a library inside your own web application.</p>
359
+ <p>Embed OmniBrowser Agent as a library in any web application. Full reference in <a href="https://github.com/akshayram1/omnibrowser-agent/blob/main/docs/EMBEDDING.md" target="_blank" rel="noreferrer">docs/EMBEDDING.md</a>.</p>
279
360
 
280
361
  <h3>Install</h3>
281
362
  <pre><code>npm install @akshayram1/omnibrowser-agent</code></pre>
282
363
 
283
- <h3>Basic Usage</h3>
364
+ <h3>Heuristic Planner (zero setup)</h3>
284
365
  <pre><code>import { createBrowserAgent } from "@akshayram1/omnibrowser-agent";
285
366
 
286
367
  const agent = createBrowserAgent(
@@ -292,41 +373,79 @@ const agent = createBrowserAgent(
292
373
  stepDelayMs: 400
293
374
  },
294
375
  {
295
- onStep: (result) => console.log("step", result),
296
- onApprovalRequired: (action) => {
297
- console.log("approval required", action);
298
- // Show your own modal/button then call approvePendingAction()
299
- },
300
- onDone: (result) => console.log("done", result),
301
- onError: (error) => console.error(error)
376
+ onStep: (result) => console.log("step", result),
377
+ onApprovalRequired: (action) => showApprovalModal(action),
378
+ onDone: (result) => console.log("done", result),
379
+ onError: (error) => console.error(error)
302
380
  }
303
381
  );
304
382
 
305
- await agent.start();</code></pre>
383
+ await agent.start();
384
+
385
+ // Approve a paused action:
386
+ await agent.approvePendingAction();
306
387
 
307
- <h3>Approve a Pending Action</h3>
308
- <pre><code>await agent.approvePendingAction();</code></pre>
388
+ // Stop at any time:
389
+ agent.stop();</code></pre>
309
390
 
310
- <h3>Stop Running Session</h3>
311
- <pre><code>agent.stop();</code></pre>
391
+ <h3>WebLLM Planner with Reflection</h3>
392
+ <p>Load a WebLLM engine, wire the bridge, then start the agent. The bridge receives the full reflection input and should return the reflection+action object:</p>
393
+ <pre><code>import * as webllm from "@mlc-ai/web-llm";
394
+ import { createBrowserAgent, parsePlannerResult } from "@akshayram1/omnibrowser-agent";
312
395
 
313
- <h3>WebLLM Mode</h3>
314
- <p>To use planner mode <code>webllm</code>, provide a local bridge in your app:</p>
315
- <pre><code>window.__browserAgentWebLLM = {
396
+ const engine = await webllm.CreateMLCEngine("Llama-3.2-3B-Instruct-q4f16_1-MLC");
397
+
398
+ window.__browserAgentWebLLM = {
316
399
  async plan(input, modelId) {
317
- // call your local WebLLM engine and return one AgentAction JSON
318
- return { type: "done", reason: `Implement bridge with model ${modelId ?? "default"}` };
400
+ const { goal, history, lastError, memory, systemPrompt } = input;
401
+
402
+ const defaultSystem = `You are a browser automation agent.
403
+ Output ONLY a JSON object in this format:
404
+ {"evaluation":"...","memory":"...","next_goal":"...","action":{...}}`;
405
+
406
+ const resp = await engine.chat.completions.create({
407
+ messages: [
408
+ { role: "system", content: systemPrompt || defaultSystem },
409
+ { role: "user", content: `Goal: "${goal}"\nHistory: ${history.slice(-4).join(" → ")}${memory ? "\nMemory: " + memory : ""}${lastError ? "\nLast error: " + lastError : ""}` }
410
+ ],
411
+ temperature: 0,
412
+ max_tokens: 200
413
+ });
414
+
415
+ return parsePlannerResult(resp.choices[0].message.content);
319
416
  }
320
417
  };
321
418
 
322
- // Then configure:
323
- planner: { kind: "webllm", modelId: "Llama-3.2-1B-Instruct-q4f16_1-MLC" }</code></pre>
419
+ const agent = createBrowserAgent({
420
+ goal: "Fill the checkout form with my details",
421
+ planner: { kind: "webllm" }
422
+ }, {
423
+ onStep(result) {
424
+ if (result.reflection?.nextGoal) console.log("💭", result.reflection.nextGoal);
425
+ console.log("✅", result.message);
426
+ }
427
+ });
428
+
429
+ await agent.start();</code></pre>
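The bridge above sends the goal, recent history, memory, and last error to the model. If the model should also see the page elements it can act on, one option is to build the user message in a helper that appends a short candidate list from `input.snapshot`. This is a sketch, not part of the library: `buildUserPrompt` is a hypothetical name, and the candidate fields `selector` and `label` are assumed, since the snapshot's element shape is not shown here.

```javascript
// Hypothetical helper; the candidate fields `selector` and `label` are assumptions.
function buildUserPrompt(input, maxCandidates = 20) {
  const { goal, history = [], memory, lastError, snapshot } = input;
  const lines = [`Goal: "${goal}"`];

  // Same windowing as the bridge example: only the last four history entries.
  if (history.length > 0) lines.push(`History: ${history.slice(-4).join(" → ")}`);
  if (memory) lines.push(`Memory: ${memory}`);
  if (lastError) lines.push(`Last error: ${lastError}`);

  // Keep the list short so it fits a small on-device model's context window.
  const candidates = snapshot?.candidates ?? [];
  if (candidates.length > 0) {
    lines.push("Candidates:");
    for (const c of candidates.slice(0, maxCandidates)) {
      lines.push(`- ${c.selector} (${c.label})`);
    }
  }
  return lines.join("\n");
}

const prompt = buildUserPrompt({
  goal: "Fill the checkout form",
  history: ["Typed name"],
  memory: "Name=#name done.",
  snapshot: { candidates: [{ selector: "#email", label: "Email" }] },
});
```

Inside `plan()`, the result would replace the inline template string passed as the `user` message. A longer candidate list usually also calls for a larger `max_tokens` budget and a model with enough context to hold it.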
430
+
431
+ <h3>Custom System Prompt</h3>
432
+ <p>Shape the agent's personality or constraints without touching the bridge:</p>
433
+ <pre><code>const agent = createBrowserAgent({
434
+ goal: "Book a meeting room for tomorrow",
435
+ planner: {
436
+ kind: "webllm",
437
+ systemPrompt: `You are a careful meeting room booking assistant.
438
+ Always confirm the room is available before clicking Book.
439
+ Never navigate away from the booking portal.`
440
+ }
441
+ });</code></pre>
324
442
 
325
443
  <h3>Notes</h3>
326
444
  <ul>
327
- <li>For production, mount this inside an authenticated app shell and add your own permission checks.</li>
328
- <li><code>human-approved</code> mode is recommended for CRM, finance, and admin actions.</li>
329
445
  <li>The WebLLM bridge is not bundled — bring your own engine and attach it to <code>window.__browserAgentWebLLM</code>.</li>
446
+ <li>Use <code>human-approved</code> mode for CRM, finance, and admin actions.</li>
447
+ <li>Bridges returning a bare <code>AgentAction</code> still work — backward compatible.</li>
448
+ <li>For production apps, mount inside an authenticated shell and add your own permission checks.</li>
330
449
  </ul>
331
450
  </div>
332
451
  </div>
@@ -337,6 +456,7 @@ planner: { kind: "webllm", modelId: "Llama-3.2-1B-Instruct-q4f16_1-MLC" }</code>
337
456
  <div class="wrap">
338
457
  <div class="surface">
339
458
  <h2>Roadmap</h2>
459
+ <p>Full roadmap in <a href="https://github.com/akshayram1/omnibrowser-agent/blob/main/docs/ROADMAP.md" target="_blank" rel="noreferrer">docs/ROADMAP.md</a>.</p>
340
460
 
341
461
  <h3>v0.1</h3>
342
462
  <ul>
@@ -346,20 +466,30 @@ planner: { kind: "webllm", modelId: "Llama-3.2-1B-Instruct-q4f16_1-MLC" }</code>
346
466
  <li>Human-approved mode</li>
347
467
  </ul>
348
468
 
349
- <h3>v0.2 <span class="badge">current</span></h3>
469
+ <h3>v0.2 <span class="badge">stable</span></h3>
350
470
  <ul>
351
471
  <li>New actions: <code>scroll</code>, <code>focus</code></li>
352
472
  <li>Improved heuristic planner with regex goal patterns</li>
353
- <li>Better page observation (visibility filtering, placeholder capture, up to 60 candidates)</li>
473
+ <li>Better page observation (visibility filtering, up to 60 candidates)</li>
354
474
  <li>Library API: <code>resume()</code>, <code>isRunning</code>, <code>hasPendingAction</code>, <code>AbortSignal</code>, <code>onMaxStepsReached</code></li>
475
+ <li>CI pipeline with auto version bump on push to main</li>
476
+ </ul>
477
+
478
+ <h3>v0.2.6 <span class="badge new">current</span></h3>
479
+ <ul>
480
+ <li>Reflection-before-action pattern (<code>evaluation → memory → next_goal → act</code>)</li>
481
+ <li>Working memory carried across ticks via <code>AgentSession.memory</code></li>
482
+ <li><code>parsePlannerResult()</code> exported from library</li>
483
+ <li><code>systemPrompt</code> option in <code>PlannerConfig</code></li>
484
+ <li>Thought bubble (💭) messages in live demo</li>
485
+ <li>Chatbot UI redesign: tabs, typing indicator, right-aligned messages</li>
355
486
  </ul>
356
487
 
357
488
  <h3>v0.3</h3>
358
489
  <ul>
359
490
  <li>Site profile and policy engine (allowlist, blocked domains)</li>
360
491
  <li>Selector healing and fallback strategy</li>
361
- <li>Session memory and action replay log</li>
362
- <li>Drupal CRM starter skills</li>
492
+ <li>Session replay log</li>
363
493
  </ul>
364
494
 
365
495
  <h3>v1.0</h3>
@@ -382,27 +512,11 @@ planner: { kind: "webllm", modelId: "Llama-3.2-1B-Instruct-q4f16_1-MLC" }</code>
382
512
  <h2>Contact</h2>
383
513
  <p>Maintainer: Akshay Chame</p>
384
514
  <ul>
385
- <li>
386
- Email:
387
- <a href="mailto:akshaychame2@gmail.com">akshaychame2@gmail.com</a>
388
- </li>
389
- <li>
390
- GitHub:
391
- <a href="https://github.com/akshayram1" target="_blank" rel="noreferrer">@akshayram1</a>
392
- </li>
393
- <li>
394
- Package:
395
- <a
396
- href="https://www.npmjs.com/package/@akshayram1/omnibrowser-agent"
397
- target="_blank"
398
- rel="noreferrer"
399
- >@akshayram1/omnibrowser-agent</a
400
- >
401
- </li>
515
+ <li>Email: <a href="mailto:akshaychame2@gmail.com">akshaychame2@gmail.com</a></li>
516
+ <li>GitHub: <a href="https://github.com/akshayram1" target="_blank" rel="noreferrer">@akshayram1</a></li>
517
+ <li>Package: <a href="https://www.npmjs.com/package/@akshayram1/omnibrowser-agent" target="_blank" rel="noreferrer">@akshayram1/omnibrowser-agent</a></li>
402
518
  </ul>
403
- <p class="contact-note">
404
- For feature requests or bugs, please open an issue on GitHub with reproduction steps.
405
- </p>
519
+ <p class="contact-note">For feature requests or bugs, please open an issue on GitHub with reproduction steps.</p>
406
520
  </div>
407
521
  </div>
408
522
  </section>
@@ -410,7 +524,7 @@ planner: { kind: "webllm", modelId: "Llama-3.2-1B-Instruct-q4f16_1-MLC" }</code>
410
524
 
411
525
  <footer class="footer">
412
526
  <div class="wrap">
413
- <p>© 2026 OmniBrowser Agent · MIT License</p>
527
+ <p>© 2026 OmniBrowser Agent · MIT License · <a href="https://github.com/akshayram1/omnibrowser-agent" target="_blank" rel="noreferrer">GitHub</a></p>
414
528
  </div>
415
529
  </footer>
416
530
  </body>