agentic-browser 0.1.0

package/AGENTS.md ADDED
# Agent Instructions

## Quick Reference

```bash
npm run build      # tsdown (~20ms)
npm run typecheck  # tsc --noEmit
npm run lint       # oxlint
npm run lint:fix   # oxlint --fix
npm run format     # oxfmt --write
npm test           # vitest run
npm run test:watch # vitest
npm run docs:dev   # vocs dev server
```

## Architecture

AI-driven browser automation via the Chrome DevTools Protocol (CDP). Three interfaces: CLI, MCP server, programmatic API.

### Module Map

```
src/
  index.ts — Public API exports (AgenticBrowserCore + types)
  cli/
    index.ts — CLI entry (Commander.js, colon-namespaced commands)
    runtime.ts — AgenticBrowserCore class + factory functions
    app.ts — AppContext DI container (config, logger, eventStore, tokenService, memoryService)
    commands/agent.ts — Stateful agent commands (auto-restart, retry, session persistence)
    commands/*.ts — Low-level CLI command handlers
  mcp/
    index.ts — MCP server (stdio transport, 7 tools wrapping AgenticBrowserCore)
  session/
    browser-controller.ts — BrowserController interface + ChromeCdpBrowserController (CDP WebSocket)
    session-manager.ts — Orchestrates sessions, commands, memory recording
    session-state.ts — In-memory state tracking
    chrome-launcher.ts — Chrome executable discovery & launch
  transport/
    control-api.ts — ControlApi facade (delegates to SessionManager)
    ws-server.ts — Authenticated WebSocket server
  memory/
    memory-service.ts — Task memory coordination
    memory-index.ts — Search/ranking (fuzzy match + freshness + domain)
    task-insight-store.ts — JSON file persistence
    staleness-detector.ts — Freshness state machine (fresh → suspect → stale)
    memory-schemas.ts — Zod v4 schemas for memory domain
  auth/ — Token-based session auth
  lib/
    config.ts — loadConfig() from env vars
    domain-schemas.ts — Zod v4 schemas (Session, Command, ConnectionState, etc.)
  observability/ — Logger + EventStore
```

### Key Flow

```
AgenticBrowserCore → ControlApi → SessionManager → BrowserController (CDP)
                                                 → MemoryService (record evidence)
```

1. `createAgenticBrowserCore()` builds AppContext + ChromeCdpBrowserController
2. Commands execute via CDP `Runtime.evaluate` on the browser page
3. Results are recorded as evidence, indexed per domain for memory search
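The small TypeScript sketch below shows this layering. It is not the package's source. The class names follow the Key Flow diagram, but the `runCommand`, `evaluate` and `record` signatures are assumptions. A fake controller replaces the CDP one so that the code runs without Chrome.

```typescript
// Hypothetical sketch of the Key Flow layering: class names follow the
// diagram, but every method name and signature here is an assumption.
interface BrowserController {
  // Stand-in for a CDP Runtime.evaluate call on the active page.
  evaluate(expression: string): Promise<unknown>;
}

interface Evidence {
  commandId: string;
  domain: string;
  result: unknown;
}

class MemoryService {
  readonly evidence: Evidence[] = [];
  record(entry: Evidence): void {
    this.evidence.push(entry);
  }
}

class SessionManager {
  private readonly browser: BrowserController;
  private readonly memory: MemoryService;

  constructor(browser: BrowserController, memory: MemoryService) {
    this.browser = browser;
    this.memory = memory;
  }

  async runCommand(commandId: string, domain: string, expression: string): Promise<unknown> {
    const result = await this.browser.evaluate(expression);
    // Record the outcome as evidence, tagged with its domain for memory search.
    this.memory.record({ commandId, domain, result });
    return result;
  }
}

// ControlApi and AgenticBrowserCore are thin pass-through facades.
class ControlApi {
  private readonly sessions: SessionManager;
  constructor(sessions: SessionManager) {
    this.sessions = sessions;
  }
  runCommand(commandId: string, domain: string, expression: string): Promise<unknown> {
    return this.sessions.runCommand(commandId, domain, expression);
  }
}

class AgenticBrowserCore {
  private readonly api: ControlApi;
  constructor(api: ControlApi) {
    this.api = api;
  }
  runCommand(commandId: string, domain: string, expression: string): Promise<unknown> {
    return this.api.runCommand(commandId, domain, expression);
  }
}

// A fake controller so the sketch runs without Chrome.
const fakeBrowser: BrowserController = {
  async evaluate(expression: string) {
    return `evaluated: ${expression}`;
  },
};

const memory = new MemoryService();
const core = new AgenticBrowserCore(new ControlApi(new SessionManager(fakeBrowser, memory)));
```

Each outer layer only delegates inward. A test can therefore swap the innermost `BrowserController` (as `MockBrowserController` does) without touching the layers above it.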

## Code Conventions

- **ESM-only**: `"type": "module"`; use `.js` extensions in all TypeScript imports
- **Zod v4**: `import { z } from "zod"` — `z.record()` requires both a key schema and a value schema, e.g. `z.record(z.string(), z.unknown())`
- **Commander.js v14**: colon-namespaced commands (`session:start`, `memory:search`)
- **CLI output**: exactly one JSON line to `stdout`, errors to `stderr`
- **Types**: interfaces for public contracts, type aliases for unions and inferred types
- **No console.log in MCP server**: the stdio transport reserves `stdout` for protocol messages, so use `process.stderr.write()` for debug output
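A few helpers can capture the two output rules: CLI results go to `stdout` as a single JSON line, and diagnostics go to `stderr`. This is a sketch only. `toJsonLine`, `writeResult` and `debug` are hypothetical names, not functions exported by the package.

```typescript
// Hypothetical helpers illustrating the output conventions; not part of the package API.

// Serialize a JSON-serializable result as a single line. JSON.stringify
// without an indent argument escapes embedded newlines, so the output never
// spans lines. undefined is mapped to null so the line is always valid JSON.
function toJsonLine(result: unknown): string {
  return JSON.stringify(result ?? null) + "\n";
}

// CLI: exactly one JSON line on stdout per command.
function writeResult(result: unknown): void {
  process.stdout.write(toJsonLine(result));
}

// CLI errors and MCP debug output go to stderr, keeping stdout clean.
function debug(message: string): void {
  process.stderr.write(message + "\n");
}
```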

## How to Add a New CLI Command

1. Create a handler in `src/cli/commands/<name>.ts`:
   ```ts
   // `Runtime`, `runtime.api.doSomething`, and `{ ... }` are placeholders for the
   // runtime type, the API method the command calls, and the command's input fields.
   export async function myCommand(runtime: Runtime, input: { ... }) {
     return runtime.api.doSomething(input);
   }
   ```
2. Register it in `src/cli/index.ts` with `program.command("<name>").action(...)`
3. Optionally add an agent wrapper in `src/cli/commands/agent.ts`
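The self-contained sketch below shows how steps 1 and 2 fit together without Commander or Chrome. `Runtime`, `api.getTitle`, `titleCommand` and `runAction` are hypothetical stand-ins. In the real CLI, the Commander `.action(...)` callback would call a wrapper like `runAction`.

```typescript
// Hypothetical shapes: the real Runtime type and API methods may differ.
interface Runtime {
  api: {
    getTitle(input: { sessionId: string }): Promise<{ title: string }>;
  };
}

// Step 1: the handler only forwards validated input to the API.
async function titleCommand(runtime: Runtime, input: { sessionId: string }) {
  return runtime.api.getTitle(input);
}

// Step 2 (sketch): the action wrapper prints exactly one JSON line on success.
// On failure this sketch writes only to stderr and sets a non-zero exit code.
async function runAction(work: () => Promise<unknown>): Promise<void> {
  try {
    const result = await work();
    process.stdout.write(JSON.stringify(result) + "\n");
  } catch (error) {
    const message = error instanceof Error ? error.message : String(error);
    process.stderr.write(message + "\n");
    process.exitCode = 1;
  }
}

// Fake runtime so the sketch runs standalone.
const fakeRuntime: Runtime = {
  api: {
    async getTitle({ sessionId }) {
      return { title: `Title for ${sessionId}` };
    },
  },
};
```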

## How to Add a New BrowserController Method

1. Add to `BrowserController` interface in `src/session/browser-controller.ts`
2. Implement in `ChromeCdpBrowserController` (CDP `Runtime.evaluate` pattern)
3. Add stub in `MockBrowserController`
4. Propagate: `SessionManager` → `ControlApi` → `AgenticBrowserCore`
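Steps 1 to 3 are sketched below for a hypothetical `getTitle()` method. `Runtime.evaluate` with `expression` and `returnByValue: true` is the standard CDP call. The `CdpSend` transport type and both class names are assumptions that stand in for the controller's WebSocket plumbing.

```typescript
// Subset of the CDP Runtime.evaluate response used here.
interface EvaluateResponse {
  result: { type: string; value?: unknown };
  exceptionDetails?: { text: string };
}

// Hypothetical transport: sends a CDP method over the page's WebSocket.
type CdpSend = (method: string, params: Record<string, unknown>) => Promise<EvaluateResponse>;

// Step 1: extend the interface.
interface BrowserController {
  getTitle(sessionId: string): Promise<string>;
}

// Step 2: CDP implementation using the Runtime.evaluate pattern.
class CdpControllerSketch implements BrowserController {
  private readonly send: CdpSend;
  constructor(send: CdpSend) {
    this.send = send;
  }
  async getTitle(_sessionId: string): Promise<string> {
    const response = await this.send("Runtime.evaluate", {
      expression: "document.title",
      returnByValue: true, // return the JSON value, not a remote object handle
    });
    if (response.exceptionDetails) {
      throw new Error(`evaluate failed: ${response.exceptionDetails.text}`);
    }
    return String(response.result.value);
  }
}

// Step 3: a stub for tests that never touches Chrome.
class MockControllerSketch implements BrowserController {
  async getTitle(_sessionId: string): Promise<string> {
    return "Mock Title";
  }
}
```

Step 4 would then add one-line pass-through methods on `SessionManager`, `ControlApi` and `AgenticBrowserCore`.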

## How to Add a New MCP Tool

1. Add `server.tool()` call in `src/mcp/index.ts`
2. Use Zod v4 schemas for tool parameters
3. Call `AgenticBrowserCore` methods directly
4. Return `{ content: [{ type: "text", text: JSON.stringify(result) }] }`
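The return shape in step 4 is the MCP tool-result format: a `content` array of typed blocks. A small helper keeps every tool consistent. `toToolResult` is a hypothetical name. The types are declared locally instead of imported from the MCP SDK, so the sketch runs on its own.

```typescript
// Minimal local types for an MCP tool result (assumption: only text blocks are used).
interface TextContent {
  type: "text";
  text: string;
}

interface ToolResult {
  content: TextContent[];
}

// Wrap any JSON-serializable AgenticBrowserCore result as a single text block.
function toToolResult(result: unknown): ToolResult {
  return { content: [{ type: "text", text: JSON.stringify(result) }] };
}
```

Inside a `server.tool()` callback, the handler would pass the result of an `AgenticBrowserCore` call through this helper before returning it.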

## Testing

- **Unit**: `tests/unit/*.unit.test.ts` — pure logic with mocks
- **Contract**: `tests/contract/*.contract.test.ts` — API contract validation
- **Integration**: `tests/integration/*.integration.test.ts` — full lifecycle with MockBrowserController
- **Factory**: `createMockAgenticBrowserCore(env)` — never launches real Chrome
- **Framework**: Vitest, no special setup needed

## Environment Variables

- `AGENTIC_BROWSER_LOG_DIR` — base dir for sessions/memory/events (default: `.agentic-browser`)
- `AGENTIC_BROWSER_CHROME_EXECUTABLE_PATH` — explicit Chrome path (auto-discovered if not set)
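A loader that honors these two variables and their documented defaults might look like the sketch below. It is a hypothetical illustration, not the package's `loadConfig()`. The `Config` field names are assumptions. Passing `env` as a parameter keeps the loader testable without mutating `process.env`.

```typescript
// Hypothetical config shape; the real loadConfig() may expose different fields.
interface Config {
  logDir: string;
  chromeExecutablePath?: string; // undefined means "auto-discover Chrome"
}

function loadConfigSketch(env: Record<string, string | undefined> = process.env): Config {
  const chromePath = env.AGENTIC_BROWSER_CHROME_EXECUTABLE_PATH?.trim();
  return {
    // Documented default base directory for sessions, memory, and events.
    logDir: env.AGENTIC_BROWSER_LOG_DIR || ".agentic-browser",
    // Treat an empty or whitespace-only value the same as "not set".
    chromeExecutablePath: chromePath ? chromePath : undefined,
  };
}
```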

## MCP Server

Subcommand: `agentic-browser mcp` (stdio transport). Setup: `agentic-browser setup`. Tools:

| Tool                    | Purpose                        |
| ----------------------- | ------------------------------ |
| `browser_start_session` | Start Chrome, return sessionId |
| `browser_navigate`      | Navigate to URL                |
| `browser_interact`      | click / type / press / waitFor |
| `browser_get_content`   | Get page title / text / html   |
| `browser_get_elements`  | Discover interactive elements  |
| `browser_search_memory` | Search task memory             |
| `browser_stop_session`  | Stop Chrome session            |

## For Browser Automation Tasks

See the [MCP Server](/mcp-server) docs for tool details and the README for CLI usage.
package/README.md ADDED
# agentic-browser

CLI and MCP server to control a local Chrome session for AI agents.

## Purpose

- Starts a managed Chrome session.
- Accepts commands (for example `navigate`).
- Returns structured JSON output that an LLM can parse directly.
- Reuses CDP connections to keep command latency low.

## Requirements

- Node.js 20+
- A local Chrome installation

## Install

```bash
npm install agentic-browser
```

## Build (Development)

```bash
npm install
npm run build
```

## Quality Checks

```bash
npm run format
npm run lint
npm test
```

## Agent Commands (Recommended for LLMs)

The `agent` subcommand manages session state, auto-restarts on disconnect, generates command IDs, and retries failed commands automatically:

```bash
agentic-browser agent start
agentic-browser agent status
agentic-browser agent run navigate '{"url":"https://example.com"}'
agentic-browser agent run interact '{"action":"click","selector":"#login"}'
agentic-browser agent content --mode text
agentic-browser agent content --mode html --selector main
agentic-browser agent elements
agentic-browser agent elements --roles button,link --limit 20
agentic-browser agent memory-search "navigate:example.com" --domain example.com
agentic-browser agent stop
agentic-browser agent cleanup --dry-run --max-age-days 7
```
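Auto-generated command IDs and automatic retries follow a common pattern, sketched below. `newCommandId`, `withRetry`, the attempt count and the linear backoff are illustrative assumptions, not the package's actual retry policy. `randomUUID` is the standard `node:crypto` function.

```typescript
import { randomUUID } from "node:crypto";

// Hypothetical: generate a fresh command ID, as `agent run` does for you.
function newCommandId(): string {
  return `cmd-${randomUUID()}`;
}

// Hypothetical retry wrapper: retries a failing async operation with linear backoff.
async function withRetry<T>(
  operation: (attempt: number) => Promise<T>,
  maxAttempts = 3,
  delayMs = 200,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await operation(attempt);
    } catch (error) {
      lastError = error;
      if (attempt < maxAttempts) {
        // Wait a little longer after each failure before trying again.
        await new Promise((resolve) => setTimeout(resolve, delayMs * attempt));
      }
    }
  }
  throw lastError;
}
```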

### Discover Interactive Elements

List all clickable/interactive elements on the current page:

```bash
agentic-browser agent elements
agentic-browser agent elements --roles button,link,input --visible-only --limit 30
agentic-browser agent elements --selector "#main-content"
```

Returns a JSON object whose `elements` array lists each element with a CSS selector usable in `agent run interact`:

```json
{
  "ok": true,
  "action": "elements",
  "elements": [
    {
      "selector": "#login-btn",
      "role": "button",
      "tagName": "button",
      "text": "Login",
      "actions": ["click"],
      "visible": true,
      "enabled": true
    }
  ],
  "totalFound": 42,
  "truncated": true
}
```
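An agent that consumes this output usually filters it and turns the chosen selector into an `interact` payload. The types below are inferred from the sample response and may not cover every field. `pickElement` and `clickPayload` are hypothetical helpers.

```typescript
// Shape inferred from the sample `agent elements` output (assumption).
interface InteractiveElement {
  selector: string;
  role: string;
  tagName: string;
  text: string;
  actions: string[];
  visible: boolean;
  enabled: boolean;
}

interface ElementsResult {
  ok: boolean;
  action: "elements";
  elements: InteractiveElement[];
  totalFound: number;
  truncated: boolean;
}

// Find the first visible, enabled element with the given role whose text matches.
function pickElement(result: ElementsResult, role: string, text: RegExp): InteractiveElement | undefined {
  return result.elements.find(
    (el) => el.visible && el.enabled && el.role === role && text.test(el.text),
  );
}

// Build the payload string for `agentic-browser agent run interact '<payload>'`.
function clickPayload(el: InteractiveElement): string {
  return JSON.stringify({ action: "click", selector: el.selector });
}
```

When `truncated` is `true`, a reasonable next step before picking is to narrow the query with `--roles`, `--selector` or `--visible-only`.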

## MCP Server

### Quick Setup

```bash
npx agentic-browser setup
```

Detects your AI tools (Claude Code, Cursor) and writes the MCP config automatically.

### Manual Configuration

Add to your MCP config (`.mcp.json`, `.cursor/mcp.json`, etc.):

```json
{
  "mcpServers": {
    "agentic-browser": {
      "command": "npx",
      "args": ["agentic-browser", "mcp"]
    }
  }
}
```

## Low-Level CLI Commands

For direct control without session state management:

### 1. Start a Session

```bash
agentic-browser session:start
```

### 2. Read Session Status

```bash
agentic-browser session:status <sessionId>
```

### 3. Run a Command (`navigate` / `interact`)

```bash
agentic-browser command:run <sessionId> <commandId> navigate '{"url":"https://example.com"}'
agentic-browser command:run <sessionId> cmd-2 interact '{"action":"click","selector":"a"}'
```

More `interact` actions:

- `{"action":"type","selector":"input[name=q]","text":"innoq"}`
- `{"action":"press","key":"Enter"}`
- `{"action":"waitFor","selector":"main","timeoutMs":4000}`
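The `interact` payloads shown above can be modeled as a discriminated union, so that a wrapper only ever emits well-formed `payloadJson`. The type is inferred from the README examples rather than taken from the package's Zod schemas, so which fields are required is an assumption.

```typescript
// Inferred from the README examples; the package's real schema may differ.
type InteractPayload =
  | { action: "click"; selector: string }
  | { action: "type"; selector: string; text: string }
  | { action: "press"; key: string }
  | { action: "waitFor"; selector: string; timeoutMs?: number };

// Serialize for `command:run <sessionId> <commandId> interact '<payloadJson>'`.
function toPayloadJson(payload: InteractPayload): string {
  return JSON.stringify(payload);
}

// Quote for a POSIX shell: wrap in single quotes and escape embedded ones as '\''.
function shellQuote(value: string): string {
  return `'${value.replace(/'/g, `'\\''`)}'`;
}
```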

### 4. Read Page Content

```bash
agentic-browser page:content <sessionId> --mode title
agentic-browser page:content <sessionId> --mode text
agentic-browser page:content <sessionId> --mode html --selector main
```

### 5. Rotate Session Token

```bash
agentic-browser session:auth <sessionId>
```

### 6. Restart / Stop / Cleanup

```bash
agentic-browser session:restart <sessionId>
agentic-browser session:stop <sessionId>
agentic-browser session:cleanup --max-age-days 7
```

### 7. Task Memory

```bash
agentic-browser memory:search "navigate:example.com" --domain example.com --limit 5
agentic-browser memory:inspect <insightId>
agentic-browser memory:verify <insightId>
agentic-browser memory:stats
```

## Recommended Agent Flow

1. `agent start` — launch Chrome and persist the session.
2. `agent elements` — discover what's on the page.
3. `agent run navigate/interact` — execute actions using discovered selectors.
4. `agent content` — read page content after actions.
5. `agent memory-search` — reuse known selectors for repeated tasks.
6. `agent stop` — terminate when done.

## Important Notes for LLMs

- Exactly **one** managed session is supported at a time.
- Session state is persisted in `.agentic-browser/` by default (override with `AGENTIC_BROWSER_LOG_DIR`).
- All commands print exactly one JSON line to `stdout`.
- The JSON payload argument (`payloadJson`, e.g. `'{"url":"https://example.com"}'`) must be valid JSON.
- Parse only `stdout` as the result object, and use the exit code to determine success or failure.
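In a Node.js harness, these rules mean checking the exit code and parsing only `stdout`. In the sketch below, `parseCliResult` and `runCli` are hypothetical helpers. `spawnSync` is the standard `node:child_process` API, and `runCli` assumes that the `agentic-browser` binary is on `PATH`.

```typescript
import { spawnSync } from "node:child_process";

interface CliOutcome {
  ok: boolean;
  exitCode: number | null;
  result: unknown; // parsed stdout JSON, or null if stdout was empty
  stderr: string;
}

// The exit code decides success; stdout (one JSON line) is the only parsed channel.
function parseCliResult(stdout: string, stderr: string, exitCode: number | null): CliOutcome {
  const line = stdout.trim();
  return {
    ok: exitCode === 0,
    exitCode,
    result: line ? JSON.parse(line) : null,
    stderr,
  };
}

// Run one CLI command synchronously and interpret its output.
function runCli(args: string[]): CliOutcome {
  const proc = spawnSync("agentic-browser", args, { encoding: "utf8" });
  if (proc.error) throw proc.error; // e.g. the binary was not found
  return parseCliResult(proc.stdout, proc.stderr, proc.status);
}

// Example (requires the CLI to be installed):
// const status = runCli(["agent", "status"]);
```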

## Programmatic API

```ts
import { createAgenticBrowserCore } from "agentic-browser";

const core = createAgenticBrowserCore();
const session = await core.startSession();

try {
  await core.runCommand({
    sessionId: session.sessionId,
    commandId: "cmd-1",
    type: "navigate",
    payload: { url: "https://example.com" },
  });

  // Interactive elements with selectors usable in `interact` commands.
  const elements = await core.getInteractiveElements({
    sessionId: session.sessionId,
    roles: ["button", "link"],
    visibleOnly: true,
    limit: 30,
  });

  // Insights recorded on earlier runs for this task and domain.
  const memory = core.searchMemory({
    taskIntent: "navigate:example.com",
    siteDomain: "example.com",
    limit: 3,
  });
} finally {
  // Always stop Chrome, even if a command above throws.
  await core.stopSession(session.sessionId);
}
```

## Documentation

```bash
npm run docs:dev   # Dev server at localhost:5173
npm run docs:build # Static build
```