@akshayram1/omnibrowser-agent 0.2.29 → 0.2.32

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.npm.md ADDED
@@ -0,0 +1,220 @@
1
+ # omnibrowser-agent
2
+
3
+ [![npm](https://img.shields.io/npm/v/@akshayram1/omnibrowser-agent)](https://www.npmjs.com/package/@akshayram1/omnibrowser-agent)
4
+ [![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](LICENSE)
5
+
6
+ Local-first browser AI operator. Plans and executes DOM actions entirely in the browser — no API keys, no cloud costs, no data leaving your machine.
7
+
8
+ [Live Demo](https://omnibrowser-agent.vercel.app/examples/chatbot/) · [GitHub](https://github.com/akshayram1/omnibrowser-agent) · [Embedding Guide](https://github.com/akshayram1/omnibrowser-agent/blob/main/docs/EMBEDDING.md) · [Roadmap](https://github.com/akshayram1/omnibrowser-agent/blob/main/docs/ROADMAP.md)
9
+
10
+ ---
11
+
12
+ ## Architecture
13
+
14
+ ```
15
+ Chrome Extension npm Library
16
+ (popup + bg worker) createBrowserAgent()
17
+ | |
18
+ +----------+-------------+
19
+ |
20
+ Orchestration
21
+ (session & tick loop)
22
+ |
23
+ +----------+----------+
24
+ | | |
25
+ observer planner executor
26
+ (DOM snap) (heuristic (click/type/
27
+ /webllm) navigate...)
28
+ | | |
29
+ +----------+----------+
30
+ |
31
+ safety
32
+ (safe/review/blocked)
33
+ ```
34
+
35
+ ### One tick
36
+
37
+ ```
38
+ goal + history + memory
39
+ |
40
+ v
41
+ observer.collectSnapshot() --> PageSnapshot (url, title, candidates[])
42
+ |
43
+ v
44
+ planner.planNextAction() --> PlannerResult { action, evaluation?, memory?, nextGoal? }
45
+ |
46
+ v
47
+ safety.assessRisk(action) --> safe | review | blocked
48
+ |
49
+ blocked --> stop
50
+ review --> pause (human-approved) --> user calls resume()
51
+ safe --> executor.executeAction()
52
+ |
53
+ v
54
+ session.history.push(result) --> next tick
55
+ ```
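The tick above can be read as a plain loop. The TypeScript below is a minimal sketch under stated assumptions and does not reproduce the package's orchestration code. The `Action`, `Risk`, and `Mode` types and the `observe`/`plan`/`assess`/`execute` callbacks are hypothetical stand-ins for the observer, planner, safety, and executor modules. The loop also returns an outcome instead of firing `onStep`/`onApprovalRequired` callbacks.

```typescript
// Hypothetical types standing in for the package's own contracts.
type Risk = "safe" | "review" | "blocked";
type Mode = "autonomous" | "human-approved";

interface Action {
  type: string;
  selector?: string;
  text?: string;
  url?: string;
}

// Hypothetical stand-ins for observer, planner, safety, and executor.
interface TickDeps {
  observe: () => string;
  plan: (snapshot: string, history: string[]) => Action;
  assess: (action: Action) => Risk;
  execute: (action: Action) => string;
}

type TickOutcome =
  | { status: "done"; history: string[] }
  | { status: "blocked" | "awaiting-approval"; action: Action; history: string[] }
  | { status: "max-steps"; history: string[] };

function runTicks(deps: TickDeps, mode: Mode, maxSteps: number): TickOutcome {
  const history: string[] = [];
  for (let step = 0; step < maxSteps; step++) {
    const snapshot = deps.observe();
    const action = deps.plan(snapshot, history);
    if (action.type === "done") return { status: "done", history };
    const risk = deps.assess(action);
    if (risk === "blocked") return { status: "blocked", action, history };
    // In human-approved mode a review-rated action pauses until the caller resumes.
    if (risk === "review" && mode === "human-approved") {
      return { status: "awaiting-approval", action, history };
    }
    history.push(deps.execute(action));
  }
  return { status: "max-steps", history };
}

// Scripted stubs: anything whose selector mentions "submit" is rated review.
function scriptedDeps(script: Action[]): TickDeps {
  let i = 0;
  return {
    observe: () => "snapshot",
    plan: () => script[Math.min(i++, script.length - 1)],
    assess: (a) => (a.selector?.includes("submit") ? "review" : "safe"),
    execute: (a) => `${a.type} ${a.selector ?? ""}`.trim(),
  };
}

const script: Action[] = [
  { type: "type", selector: "#search", text: "Jane Doe" },
  { type: "click", selector: "button[type=submit]" },
  { type: "done" },
];

console.log(runTicks(scriptedDeps(script), "human-approved", 10).status); // awaiting-approval
console.log(runTicks(scriptedDeps(script), "autonomous", 10).status); // done
```

To keep the sketch short, it ends on a `done` action before the safety check. One way to support `resume()` on top of it is to keep the returned pending `action` and execute it when the caller resumes.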
+
+ ---
+
+ ## Install
+
+ ```bash
+ npm install @akshayram1/omnibrowser-agent
+ ```
+
+ ---
+
+ ## Quick start
+
+ ```ts
+ import { createBrowserAgent } from "@akshayram1/omnibrowser-agent";
+
+ const agent = createBrowserAgent({
+   goal: "Search for contact Jane Doe and open her profile",
+   mode: "human-approved", // or "autonomous"
+   planner: { kind: "heuristic" } // or "webllm"
+ }, {
+   onStep: (result, session) => console.log(result.message),
+   onApprovalRequired: (action, session) => console.log("Review:", action),
+   onDone: (result, session) => console.log("Done:", result.message),
+   onError: (err, session) => console.error(err),
+   onMaxStepsReached: (session) => console.log("Max steps hit"),
+ });
+
+ await agent.start();
+
+ // After onApprovalRequired fires:
+ await agent.resume();
+
+ // Cancel at any time:
+ agent.stop();
+ ```
+
+ ---
+
+ ## Planner modes
+
+ | Mode | Description | When to use |
+ |-------------|-----------------------------------------------------|-----------------------------------------------|
+ | `heuristic` | Zero-dependency regex planner. Works fully offline. | Simple, predictable goals — navigate, fill, click |
+ | `webllm` | On-device LLM via WebGPU. Fully private, no API calls. | Open-ended, multi-step, language-heavy goals |
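A small regex planner makes the `heuristic` row concrete. This is a hypothetical sketch and does not reproduce the package's planner. The `HeuristicAction` type, the three goal patterns, the selectors they produce, and `planFromGoal` are all illustrative assumptions. The planner maps a goal phrase to one action and gives up with `done` when no pattern matches.

```typescript
// Hypothetical action shape; the package defines its own contracts.
type HeuristicAction =
  | { type: "navigate"; url: string }
  | { type: "type"; selector: string; text: string }
  | { type: "click"; selector: string }
  | { type: "done"; reason: string };

// Each rule pairs a goal pattern with a builder that turns the match into an action.
const rules: Array<[RegExp, (m: RegExpMatchArray) => HeuristicAction]> = [
  [/^(?:go to|open|navigate to)\s+(https?:\/\/\S+)/i, (m) => ({ type: "navigate", url: m[1] })],
  [/^search(?: for)?\s+(.+)$/i, (m) => ({ type: "type", selector: "input[type=search]", text: m[1].trim() })],
  [/^click\s+(?:the\s+)?"?([^"]+)"?$/i, (m) => ({ type: "click", selector: `[aria-label="${m[1].trim()}"]` })],
];

function planFromGoal(goal: string): HeuristicAction {
  for (const [pattern, build] of rules) {
    const match = goal.trim().match(pattern);
    if (match) return build(match);
  }
  // No rule matched: a regex planner cannot improvise, so it stops instead of guessing.
  return { type: "done", reason: `No heuristic rule matches: ${goal}` };
}

console.log(planFromGoal("Go to https://example.com"));
console.log(planFromGoal("search for Jane Doe"));
console.log(planFromGoal("Book a flight to Paris"));
```

The fallback branch shows the trade-off in the table. A regex planner is predictable and works offline, but any goal outside its patterns needs `webllm`.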
+
+ ### WebLLM with a custom system prompt
+
+ ```ts
+ const agent = createBrowserAgent({
+   goal: "Fill the checkout form",
+   planner: {
+     kind: "webllm",
+     systemPrompt: "You are a careful checkout assistant. Never submit before all required fields are filled."
+   }
+ });
+ ```
+
+ ### Recommended WebLLM models
+
+ | Model ID | Size | Notes |
+ |----------|------|-------|
+ | `Llama-3.2-1B-Instruct-q4f16_1-MLC` | ~600 MB | fastest |
+ | `Llama-3.2-3B-Instruct-q4f16_1-MLC` | ~1.5 GB | fast |
+ | `Phi-3.5-mini-instruct-q4f16_1-MLC` | ~2 GB | quality |
+ | `Mistral-7B-Instruct-v0.3-q4f16_1-MLC` | ~4.1 GB | balanced |
+ | `Qwen2.5-7B-Instruct-q4f16_1-MLC` | ~4.3 GB | strong |
+ | `Llama-3.1-8B-Instruct-q4f16_1-MLC` | ~4.8 GB | strong |
+ | `Qwen3-8B-q4f16_1-MLC` | ~5 GB | latest Qwen |
+ | `gemma-2-9b-it-q4f16_1-MLC` | ~5.5 GB | Google Gemma |
+ | `DeepSeek-R1-Distill-Llama-8B-q4f16_1-MLC` | ~5 GB | reasoning |
+ | `Llama-3.1-70B-Instruct-q3f16_1-MLC` | ~35 GB | most capable (needs 24+ GB VRAM) |
+
+ ---
+
+ ## Agent modes
+
+ | Mode | Behaviour |
+ |------------------|---------------------------------------------------------------------------|
+ | `autonomous` | `safe` and `review` actions execute without pausing |
+ | `human-approved` | `review`-rated actions pause and emit `onApprovalRequired`; call `resume()` to continue |
+
+ In both modes, a `blocked` action stops the session.
+
+ ---
+
+ ## Supported actions
+
+ | Action | Description | Risk |
+ |------------|------------------------------------|----------------|
+ | `navigate` | Navigate to a URL (http/https only) | safe |
+ | `click` | Click an element by CSS selector | safe / review |
+ | `type` | Type text into an input | safe / review |
+ | `scroll` | Scroll a container or the page | safe |
+ | `focus` | Focus an element | safe |
+ | `wait` | Pause for N milliseconds | safe |
+ | `extract` | Extract text from an element | review |
+ | `done` | Signal task completion | safe |
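A keyword heuristic can approximate the Risk column. The sketch below is hypothetical. It follows the rules stated in the v0.2 architecture notes (block invalid URL protocols, send submit/delete/pay-like targets to review) and the table's `review` rating for `extract`. The keyword list, the `RiskAction` shape, and this `assessRisk` signature are assumptions and do not reproduce the shipped `safety` module.

```typescript
type Risk = "safe" | "review" | "blocked";
interface RiskAction { type: string; selector?: string; text?: string; url?: string }

// Hypothetical keywords for actions that commit, pay for, or destroy something.
// No /g flag, so RegExp.test() is stateless across calls.
const RISKY = /\b(submit|delete|remove|pay|purchase|checkout|confirm)\b/i;

function assessRisk(action: RiskAction): Risk {
  if (action.type === "navigate") {
    try {
      const { protocol } = new URL(action.url ?? "");
      return protocol === "http:" || protocol === "https:" ? "safe" : "blocked";
    } catch {
      return "blocked"; // missing, relative, or unparsable URL
    }
  }
  if (action.type === "extract") return "review";
  if (
    (action.type === "click" || action.type === "type") &&
    RISKY.test(`${action.selector ?? ""} ${action.text ?? ""}`)
  ) {
    return "review";
  }
  return "safe";
}

console.log(assessRisk({ type: "navigate", url: "javascript:alert(1)" })); // blocked
console.log(assessRisk({ type: "click", selector: "button[type=submit]" })); // review
```

A substring heuristic like this produces false positives and false negatives. The `human-approved` mode exists to cover exactly those cases with a person in the loop.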
+
+ ---
+
+ ## AbortSignal support
+
+ ```ts
+ const controller = new AbortController();
+ const agent = createBrowserAgent({ goal: "...", signal: controller.signal });
+ agent.start();
+
+ controller.abort(); // cancel from outside
+ ```
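A loop can honour the `signal` option by checking `signal.aborted` before each step and by making its inter-step delay abortable. The sketch below assumes that mechanism and is not taken from the package. `abortableDelay`, `runUntilAborted`, and their parameters are hypothetical names. `AbortController`, `AbortSignal.aborted`, and the `abort` event are standard Web APIs that are also global in current Node.js.

```typescript
// Resolves after `ms` milliseconds, or rejects early if the signal aborts.
function abortableDelay(ms: number, signal: AbortSignal): Promise<void> {
  return new Promise<void>((resolve, reject) => {
    if (signal.aborted) {
      reject(new Error("aborted"));
      return;
    }
    const timer = setTimeout(() => {
      signal.removeEventListener("abort", onAbort);
      resolve();
    }, ms);
    function onAbort() {
      clearTimeout(timer);
      reject(new Error("aborted"));
    }
    signal.addEventListener("abort", onAbort, { once: true });
  });
}

// Runs `tick` until it reports completion, maxSteps is reached, or the signal aborts.
async function runUntilAborted(
  tick: (step: number) => boolean,
  signal: AbortSignal,
  maxSteps = 20,
  stepDelayMs = 0,
): Promise<"done" | "aborted" | "max-steps"> {
  for (let step = 0; step < maxSteps; step++) {
    if (signal.aborted) return "aborted";
    if (tick(step)) return "done";
    try {
      await abortableDelay(stepDelayMs, signal);
    } catch {
      return "aborted";
    }
  }
  return "max-steps";
}

const controller = new AbortController();
runUntilAborted((step) => step === 99, controller.signal, 10, 50).then((r) => console.log(r));
controller.abort(); // the pending delay rejects immediately and the loop reports "aborted"
```

The delay listens for the `abort` event, so `controller.abort()` takes effect during the pause between steps instead of after the timer fires.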
+
+ ---
+
+ ## WebLLM bridge wiring
+
+ ```ts
+ import * as webllm from "@mlc-ai/web-llm";
+ import { createBrowserAgent, parsePlannerResult } from "@akshayram1/omnibrowser-agent";
+
+ // Creating the engine downloads and initializes the model; the first load can take a while.
+ const engine = await webllm.CreateMLCEngine("Phi-3.5-mini-instruct-q4f16_1-MLC");
+
+ // With planner.kind "webllm", the agent hands each planning step to this bridge object.
+ window.__browserAgentWebLLM = {
+   async plan(input) {
+     const { goal, history, lastError, memory, systemPrompt } = input;
+     const resp = await engine.chat.completions.create({
+       messages: [
+         { role: "system", content: systemPrompt || "You are a browser automation agent. Output only JSON." },
+         // Only the last four history entries are sent, to keep the prompt short.
+         { role: "user", content: `Goal: "${goal}"\nHistory: ${history.slice(-4).join(" -> ")}${memory ? "\nMemory: " + memory : ""}${lastError ? "\nLast error: " + lastError : ""}` }
+       ],
+       temperature: 0,
+       max_tokens: 200
+     });
+     // message.content is typed as string | null, so pass an empty string instead of null.
+     return parsePlannerResult(resp.choices[0].message.content ?? "");
+   }
+ };
+
+ const agent = createBrowserAgent({ goal: "Fill the checkout form", planner: { kind: "webllm" } });
+ await agent.start();
+ ```
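Small local models often wrap their JSON in prose or a Markdown fence, so the code that reads the reply has to tolerate extra text. The helper below illustrates that idea and is not `parsePlannerResult` itself, whose exact behaviour is not documented here. `extractJsonObject` is a hypothetical name. It scans for the first balanced `{...}` span that parses as JSON, ignores braces inside string literals, and returns `null` when nothing parses.

```typescript
// Returns the first balanced {...} span in `text` that parses as JSON, or null.
function extractJsonObject(text: string): unknown {
  for (let start = text.indexOf("{"); start !== -1; start = text.indexOf("{", start + 1)) {
    let depth = 0;
    let inString = false;
    let escaped = false;
    for (let i = start; i < text.length; i++) {
      const ch = text[i];
      if (inString) {
        // Inside a JSON string: track escapes so \" does not end the string.
        if (escaped) escaped = false;
        else if (ch === "\\") escaped = true;
        else if (ch === '"') inString = false;
        continue;
      }
      if (ch === '"') inString = true;
      else if (ch === "{") depth++;
      else if (ch === "}" && --depth === 0) {
        try {
          return JSON.parse(text.slice(start, i + 1));
        } catch {
          break; // balanced but not valid JSON; try the next "{"
        }
      }
    }
  }
  return null;
}

const reply = 'Sure! Here is the action: {"action": {"type": "click", "selector": "#go"}} Hope this helps.';
console.log(extractJsonObject(reply));
```

Returning `null` instead of throwing leaves the fallback to the caller, for example a `done` action or another planning attempt.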
+
+ ---
+
+ ## Chrome Extension
+
+ 1. Build the extension bundle with `npm run build`.
+ 2. Open `chrome://extensions`, enable **Developer Mode**, click **Load unpacked**, and select the `dist/` folder.
+ 3. Open any tab, enter a goal in the popup, pick a mode, and click **Start**.
+
+ ---
+
+ ## Project structure
+
+ ```
+ src/
+ ├── background/   Extension service worker — session management
+ ├── content/      Extension content script — runs in page context
+ ├── core/         Shared engine (planner, observer, executor)
+ ├── lib/          npm library entry — createBrowserAgent()
+ ├── popup/        Extension popup UI
+ └── shared/       Types, safety, and parse utilities
+ ```
+
+ ---
+
+ ## License
+
+ MIT © Akshay Chame
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
    "name": "@akshayram1/omnibrowser-agent",
-   "version": "0.2.29",
+   "version": "0.2.32",
    "private": false,
    "type": "module",
    "main": "./dist/lib.js",
@@ -20,7 +20,9 @@
      "build:types": "tsc -p tsconfig.lib.json",
      "watch": "node scripts/build.mjs --watch",
      "typecheck": "tsc --noEmit && tsc -p tsconfig.test.json --noEmit",
-     "test": "node --experimental-strip-types --test src/shared/safety.test.ts src/core/planner.test.ts"
+     "test": "node --experimental-strip-types --test src/shared/safety.test.ts src/core/planner.test.ts",
+     "prepack": "cp README.md README.github.md && cp README.npm.md README.md",
+     "postpack": "mv README.github.md README.md"
    },
    "devDependencies": {
      "@types/chrome": "^0.0.322",
@@ -1,41 +0,0 @@
- name: CI
-
- on:
-   push:
-     branches: [main]
-
- permissions:
-   contents: write
-
- jobs:
-   ci:
-     runs-on: ubuntu-latest
-     steps:
-       - uses: actions/checkout@v4
-         with:
-           token: ${{ secrets.GITHUB_TOKEN }}
-
-       - uses: actions/setup-node@v4
-         with:
-           node-version: '22'
-           registry-url: 'https://registry.npmjs.org'
-
-       - name: Install dependencies
-         run: npm install
-
-       - name: Run tests
-         run: npm test
-
-       - name: Bump patch version
-         id: bump
-         run: |
-           NEW_VERSION=$(npm version patch --no-git-tag-version)
-           echo "version=$NEW_VERSION" >> "$GITHUB_OUTPUT"
-
-       - name: Commit version bump
-         run: |
-           git config user.name "github-actions[bot]"
-           git config user.email "github-actions[bot]@users.noreply.github.com"
-           git add package.json
-           git commit -m "chore: bump version to ${{ steps.bump.outputs.version }} [skip ci]"
-           git push
@@ -1,64 +0,0 @@
- # OmniBrowser Agent Architecture (v0.2)
-
- ## Goals
-
- - Local-first runtime in browser
- - Privacy-first defaults
- - Open-source composable planner/executor contracts
- - Human-approved mode for risky actions
-
- ## Runtime Components
-
- 1. Popup UI (`src/popup`)
-    - Starts/stops sessions
-    - Picks mode (`autonomous`, `human-approved`)
-    - Picks planner (`heuristic`, `webllm`)
-
- 2. Background Service Worker (`src/background`)
-    - Session state machine per tab
-    - Tick loop orchestration
-    - Approval handling
-
- 3. Content Agent (`src/content`)
-    - `observer`: page snapshot extraction
-    - `planner`: next-action decision (heuristic / WebLLM)
-    - `safety`: risk gating (`safe`, `review`, `blocked`)
-    - `executor`: DOM action execution
-
- ## Contracts
-
- - Shared in `src/shared/contracts.ts`
- - Action protocol:
-   - click
-   - type
-   - navigate
-   - extract
-   - scroll
-   - focus
-   - wait
-   - done
-
- ## Safety Model
-
- - Block invalid URL protocols
- - Review risky actions (submit/delete/pay-like selectors)
- - In `human-approved` mode, review-level actions require manual approval
-
- ## Planner Bridges
-
- All planner bridges follow the same pattern: an object attached to `window` that implements a `plan()` method returning an `AgentAction`. The core library has zero runtime dependencies — bridge implementations are provided by the consumer.
-
- ### WebLLM bridge
-
- ```ts
- window.__browserAgentWebLLM = {
-   async plan(input, modelId) { /* call local WebLLM engine, return AgentAction */ }
- };
- ```
-
- ## Limitations (v0.2)
-
- - No persistent long-term memory yet
- - No task DSL/skills registry yet
- - Risk scoring is simple keyword heuristic
- - Selector healing is basic (attribute fallback + single-element shortcut)
@@ -1,67 +0,0 @@
- # Deployment Guide
-
- ## npm Package
-
- ### Publish a new version manually
-
- ```bash
- npm run build
- npm publish --access public
- ```
-
- The CI pipeline auto-bumps the patch version on every push to `main`, so manual version bumps are only needed for minor/major releases:
-
- ```bash
- npm version minor # 0.2.x → 0.3.0
- npm version major # 0.x.y → 1.0.0
- npm run build
- npm publish --access public
- ```
-
- ### Required secret
-
- Add `NPM_TOKEN` to your GitHub repository secrets if you want the pipeline to publish automatically (not enabled by default).
-
- ---
-
- ## Vercel (Static Site / Chatbot Demo)
-
- The homepage and chatbot demo are static files served from the repo root.
-
- 1. Import the repository at [vercel.com/new](https://vercel.com/new).
- 2. Vercel picks up `vercel.json` automatically — no extra configuration needed.
- 3. Every push to `main` triggers a new deployment.
-
- `vercel.json` key settings:
- ```json
- {
-   "buildCommand": null,
-   "outputDirectory": "."
- }
- ```
-
- ---
-
- ## Chrome Extension (local / sideload)
-
- 1. Build the extension bundle:
-    ```bash
-    npm run build
-    ```
- 2. Open `chrome://extensions` and enable **Developer mode**.
- 3. Click **Load unpacked** and select the `public/` folder.
-
- To update after code changes, rebuild and click the refresh icon on the extension card.
-
- ---
-
- ## CI Pipeline
-
- The GitHub Actions workflow (`.github/workflows/ci.yml`) runs on every push to `main`:
-
- 1. Installs dependencies.
- 2. Runs `npm test` (Node built-in test runner, no extra deps).
- 3. Bumps the patch version in `package.json` via `npm version patch`.
- 4. Commits and pushes the version bump with `[skip ci]` to avoid a second run.
-
- The commit is made by `github-actions[bot]` using the built-in `GITHUB_TOKEN` — no extra secrets required.
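Step 3 relies on `npm version patch`. For a plain `major.minor.patch` version, that command increments the last component and prints the result with a `v` prefix, so the captured `NEW_VERSION` and the commit message built from it read like `v0.2.30`. The sketch below covers the arithmetic, including the minor/major variants used in the manual release commands. `bump` is a hypothetical helper that accepts only plain `X.Y.Z` strings. It rejects prerelease or build suffixes; npm's own semver logic handles those, and this sketch does not.

```typescript
type Level = "major" | "minor" | "patch";

// Hypothetical helper mirroring `npm version <level>` for plain X.Y.Z versions only.
function bump(version: string, level: Level): string {
  const match = /^(\d+)\.(\d+)\.(\d+)$/.exec(version);
  if (!match) throw new Error(`Unsupported version: ${version}`);
  let [major, minor, patch] = match.slice(1).map(Number);
  if (level === "major") {
    major += 1;
    minor = 0;
    patch = 0;
  } else if (level === "minor") {
    minor += 1;
    patch = 0;
  } else {
    patch += 1;
  }
  return `${major}.${minor}.${patch}`;
}

console.log(bump("0.2.29", "patch")); // 0.2.30
console.log(bump("0.2.32", "minor")); // 0.3.0
console.log(bump("0.2.32", "major")); // 1.0.0
```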
package/docs/EMBEDDING.md DELETED
@@ -1,74 +0,0 @@
- # Embedding OmniBrowser Agent in Your Website
-
- You can keep the extension flow and also embed OmniBrowser Agent as a library in your own web app.
-
- ## Install
-
- ```bash
- npm install @akshayram1/omnibrowser-agent
- ```
-
- ## Basic usage
-
- ```ts
- import { createBrowserAgent } from "@akshayram1/omnibrowser-agent";
-
- const agent = createBrowserAgent(
-   {
-     goal: "Search contact Jane Doe and open profile",
-     mode: "human-approved",
-     planner: { kind: "heuristic" },
-     maxSteps: 15,
-     stepDelayMs: 400
-   },
-   {
-     onStep: (result) => console.log("step", result),
-     onApprovalRequired: (action) => {
-       console.log("approval required", action);
-       // Show your own modal/button then call approvePendingAction()
-     },
-     onDone: (result) => console.log("done", result),
-     onError: (error) => console.error(error)
-   }
- );
-
- await agent.start();
- ```
-
- ## Approve a pending action
-
- ```ts
- await agent.approvePendingAction();
- ```
-
- ## Stop running session
-
- ```ts
- agent.stop();
- ```
-
- ## WebLLM mode in embedded app
-
- To use planner mode `webllm`, load the WebLLM engine and wire the bridge before starting the agent:
-
- ```ts
- import * as webllm from "@mlc-ai/web-llm";
- import { createBrowserAgent, createWebLLMBridge } from "@akshayram1/omnibrowser-agent";
-
- const engine = await webllm.CreateMLCEngine("Llama-3.2-1B-Instruct-q4f16_1-MLC");
-
- window.__browserAgentWebLLM = createWebLLMBridge(engine);
-
- const agent = createBrowserAgent({
-   goal: "Fill the contact form",
-   planner: { kind: "webllm", modelId: "Llama-3.2-1B-Instruct-q4f16_1-MLC" }
- });
-
- await agent.start();
- ```
-
- ## Notes
-
- - For production, mount this inside an authenticated app shell and add your own permission checks.
- - `human-approved` mode is recommended for CRM/finance/admin actions.
- - Bring your own WebLLM engine instance, then wire `createWebLLMBridge(engine)` to `window.__browserAgentWebLLM`.
package/docs/ROADMAP.md DELETED
@@ -1,29 +0,0 @@
- # Roadmap
-
- ## v0.1
-
- - Extension runtime loop
- - Shared action contracts
- - Heuristic + WebLLM planner switch
- - Human-approved mode
-
- ## v0.2 (current)
-
- - New actions: `scroll`, `focus`
- - Improved heuristic planner with regex goal patterns
- - Better page observation (visibility filtering, placeholder capture)
- - Library API: `resume()`, `isRunning`, `hasPendingAction`, `AbortSignal`, `onMaxStepsReached`
-
- ## v0.3
-
- - Expanded WebLLM model catalog (new 7B/8B options + compatibility matrix)
- - Improved model loading UX (recommended presets by speed/quality and device memory)
- - Enhanced default system prompts for safer, clearer multi-step planning
- - Prompt presets for common workflows (docs navigation, CRM form fill, task automation)
-
- ## v1.0
-
- - Advanced prompt orchestration (goal-aware system prompt routing and contextual guardrails)
- - Functionality expansion: richer action toolkit and stronger extraction/navigation reliability
- - Adaptive planner behaviour (model-aware retries, fallback strategies, and recovery flows)
- - Evaluation suite for prompt and model quality across benchmark browser tasks