sentienceapi 0.90.1

Files changed (107)
  1. package/LICENSE.md +43 -0
  2. package/README.md +946 -0
  3. package/dist/actions.d.ts +54 -0
  4. package/dist/actions.d.ts.map +1 -0
  5. package/dist/actions.js +349 -0
  6. package/dist/actions.js.map +1 -0
  7. package/dist/agent.d.ts +157 -0
  8. package/dist/agent.d.ts.map +1 -0
  9. package/dist/agent.js +437 -0
  10. package/dist/agent.js.map +1 -0
  11. package/dist/browser.d.ts +46 -0
  12. package/dist/browser.d.ts.map +1 -0
  13. package/dist/browser.js +622 -0
  14. package/dist/browser.js.map +1 -0
  15. package/dist/cli.d.ts +5 -0
  16. package/dist/cli.d.ts.map +1 -0
  17. package/dist/cli.js +174 -0
  18. package/dist/cli.js.map +1 -0
  19. package/dist/conversational-agent.d.ts +123 -0
  20. package/dist/conversational-agent.d.ts.map +1 -0
  21. package/dist/conversational-agent.js +327 -0
  22. package/dist/conversational-agent.js.map +1 -0
  23. package/dist/expect.d.ts +16 -0
  24. package/dist/expect.d.ts.map +1 -0
  25. package/dist/expect.js +66 -0
  26. package/dist/expect.js.map +1 -0
  27. package/dist/generator.d.ts +16 -0
  28. package/dist/generator.d.ts.map +1 -0
  29. package/dist/generator.js +205 -0
  30. package/dist/generator.js.map +1 -0
  31. package/dist/index.d.ts +21 -0
  32. package/dist/index.d.ts.map +1 -0
  33. package/dist/index.js +70 -0
  34. package/dist/index.js.map +1 -0
  35. package/dist/inspector.d.ts +13 -0
  36. package/dist/inspector.d.ts.map +1 -0
  37. package/dist/inspector.js +147 -0
  38. package/dist/inspector.js.map +1 -0
  39. package/dist/llm-provider.d.ts +60 -0
  40. package/dist/llm-provider.d.ts.map +1 -0
  41. package/dist/llm-provider.js +106 -0
  42. package/dist/llm-provider.js.map +1 -0
  43. package/dist/query.d.ts +8 -0
  44. package/dist/query.d.ts.map +1 -0
  45. package/dist/query.js +337 -0
  46. package/dist/query.js.map +1 -0
  47. package/dist/read.d.ts +40 -0
  48. package/dist/read.d.ts.map +1 -0
  49. package/dist/read.js +86 -0
  50. package/dist/read.js.map +1 -0
  51. package/dist/recorder.d.ts +44 -0
  52. package/dist/recorder.d.ts.map +1 -0
  53. package/dist/recorder.js +256 -0
  54. package/dist/recorder.js.map +1 -0
  55. package/dist/screenshot.d.ts +17 -0
  56. package/dist/screenshot.d.ts.map +1 -0
  57. package/dist/screenshot.js +37 -0
  58. package/dist/screenshot.js.map +1 -0
  59. package/dist/snapshot.d.ts +23 -0
  60. package/dist/snapshot.d.ts.map +1 -0
  61. package/dist/snapshot.js +187 -0
  62. package/dist/snapshot.js.map +1 -0
  63. package/dist/tracing/cloud-sink.d.ts +74 -0
  64. package/dist/tracing/cloud-sink.d.ts.map +1 -0
  65. package/dist/tracing/cloud-sink.js +262 -0
  66. package/dist/tracing/cloud-sink.js.map +1 -0
  67. package/dist/tracing/index.d.ts +12 -0
  68. package/dist/tracing/index.d.ts.map +1 -0
  69. package/dist/tracing/index.js +28 -0
  70. package/dist/tracing/index.js.map +1 -0
  71. package/dist/tracing/jsonl-sink.d.ts +41 -0
  72. package/dist/tracing/jsonl-sink.d.ts.map +1 -0
  73. package/dist/tracing/jsonl-sink.js +168 -0
  74. package/dist/tracing/jsonl-sink.js.map +1 -0
  75. package/dist/tracing/sink.d.ts +24 -0
  76. package/dist/tracing/sink.d.ts.map +1 -0
  77. package/dist/tracing/sink.js +15 -0
  78. package/dist/tracing/sink.js.map +1 -0
  79. package/dist/tracing/tracer-factory.d.ts +57 -0
  80. package/dist/tracing/tracer-factory.d.ts.map +1 -0
  81. package/dist/tracing/tracer-factory.js +274 -0
  82. package/dist/tracing/tracer-factory.js.map +1 -0
  83. package/dist/tracing/tracer.d.ts +74 -0
  84. package/dist/tracing/tracer.d.ts.map +1 -0
  85. package/dist/tracing/tracer.js +131 -0
  86. package/dist/tracing/tracer.js.map +1 -0
  87. package/dist/tracing/types.d.ts +63 -0
  88. package/dist/tracing/types.d.ts.map +1 -0
  89. package/dist/tracing/types.js +8 -0
  90. package/dist/tracing/types.js.map +1 -0
  91. package/dist/types.d.ts +110 -0
  92. package/dist/types.d.ts.map +1 -0
  93. package/dist/types.js +6 -0
  94. package/dist/types.js.map +1 -0
  95. package/dist/utils.d.ts +29 -0
  96. package/dist/utils.d.ts.map +1 -0
  97. package/dist/utils.js +74 -0
  98. package/dist/utils.js.map +1 -0
  99. package/dist/wait.d.ts +20 -0
  100. package/dist/wait.d.ts.map +1 -0
  101. package/dist/wait.js +63 -0
  102. package/dist/wait.js.map +1 -0
  103. package/package.json +72 -0
  104. package/spec/README.md +72 -0
  105. package/spec/SNAPSHOT_V1.md +208 -0
  106. package/spec/sdk-types.md +259 -0
  107. package/spec/snapshot.schema.json +148 -0
package/README.md ADDED
@@ -0,0 +1,946 @@
# Sentience TypeScript SDK

The SDK is published under the Elastic License 2.0 (ELv2); the core semantic geometry and reliability logic runs in Sentience-hosted services.

## Installation

```bash
# Install from npm
npm install sentienceapi

# Install Playwright browsers (required)
npx playwright install chromium
```

**For local development:**
```bash
npm install
npm run build
```

## Quick Start: Choose Your Abstraction Level

The Sentience SDK offers **four levels of abstraction**. Choose the one that fits your needs:

### 💬 Level 4: Conversational Agent (Highest Abstraction) - **NEW in v0.3.0**

Complete automation through natural conversation. Describe what you want, and the agent plans and executes the steps:

```typescript
import { SentienceBrowser, ConversationalAgent, OpenAIProvider } from 'sentienceapi';

const browser = await SentienceBrowser.create({ apiKey: process.env.SENTIENCE_API_KEY });
const llm = new OpenAIProvider(process.env.OPENAI_API_KEY!, 'gpt-4o');
const agent = new ConversationalAgent({ llmProvider: llm, browser });

// Navigate to starting page
await browser.getPage().goto('https://amazon.com');

// One command covers both planning and execution
const response = await agent.execute(
  "Search for 'wireless mouse' and tell me the price of the top result"
);
console.log(response); // e.g. "I found the top result for wireless mouse on Amazon. It's priced at $24.99..."

// Follow-up questions maintain context
const followUp = await agent.chat("Add it to cart");
console.log(followUp);

await browser.close();
```

- **When to use:** Complex multi-step tasks, conversational interfaces, maximum convenience
- **Code reduction:** 99% less code; you describe goals in natural language
- **Requirements:** OpenAI or Anthropic API key

### 🤖 Level 3: Agent (Natural Language Commands) - **Recommended for Most Users**

Little coding knowledge needed. Describe each step in plain English:

```typescript
import { SentienceBrowser, SentienceAgent, OpenAIProvider } from 'sentienceapi';

const browser = await SentienceBrowser.create({ apiKey: process.env.SENTIENCE_API_KEY });
const llm = new OpenAIProvider(process.env.OPENAI_API_KEY!, 'gpt-4o-mini');
const agent = new SentienceAgent(browser, llm);

await browser.getPage().goto('https://www.amazon.com');

// Plain natural-language commands; the agent chooses and executes each action
await agent.act('Click the search box');
await agent.act("Type 'wireless mouse' into the search field");
await agent.act('Press Enter key');
await agent.act('Click the first product result');

// Automatic token tracking
console.log(`Tokens used: ${agent.getTokenStats().totalTokens}`);
await browser.close();
```

- **When to use:** Quick automation, non-technical users, rapid prototyping
- **Code reduction:** 95-98% less code than the manual approach
- **Requirements:** OpenAI API key (or Anthropic for Claude)

### 🔧 Level 2: Direct SDK (Technical Control)

Full control with semantic selectors, for technical users who want precision:

```typescript
import { SentienceBrowser, snapshot, find, click, typeText, press } from 'sentienceapi';

const browser = await SentienceBrowser.create({ apiKey: process.env.SENTIENCE_API_KEY });
await browser.getPage().goto('https://www.amazon.com');

// Get semantic snapshot
const snap = await snapshot(browser);

// Find elements using query DSL; find() returns null when nothing matches
const searchBox = find(snap, 'role=textbox text~"search"');
if (!searchBox) {
  throw new Error('Search box not found in snapshot');
}
await click(browser, searchBox.id);

// Type and submit
await typeText(browser, searchBox.id, 'wireless mouse');
await press(browser, 'Enter');

await browser.close();
```

- **When to use:** Need precise control, debugging, custom workflows
- **Code reduction:** Still 80% less code than raw Playwright
- **Requirements:** Only a Sentience API key

### ⚙️ Level 1: Raw Playwright (Maximum Control)

For the rare cases where you need complete low-level control:

```typescript
import { chromium } from 'playwright';

const browser = await chromium.launch();
const page = await browser.newPage();
await page.goto('https://www.amazon.com');
await page.fill('#twotabsearchtextbox', 'wireless mouse');
await page.press('#twotabsearchtextbox', 'Enter');
await browser.close();
```

- **When to use:** Very specific edge cases, custom browser configs
- **Tradeoffs:** No semantic intelligence, brittle selectors, more code

---

## Agent Execution Tracing (NEW in v0.3.1)

Record complete agent execution traces for debugging, analysis, and replay. Traces capture every step, snapshot, LLM decision, and action in a structured JSONL format.

### Quick Start: Agent with Tracing

```typescript
import {
  SentienceBrowser,
  SentienceAgent,
  OpenAIProvider,
  Tracer,
  JsonlTraceSink
} from 'sentienceapi';
import { randomUUID } from 'crypto';

const browser = await SentienceBrowser.create({ apiKey: process.env.SENTIENCE_API_KEY });
const llm = new OpenAIProvider(process.env.OPENAI_API_KEY!, 'gpt-4o');

// Create a tracer
const runId = randomUUID();
const sink = new JsonlTraceSink(`traces/${runId}.jsonl`);
const tracer = new Tracer(runId, sink);

// Create agent with tracer
const agent = new SentienceAgent(browser, llm, 50, true, tracer);

// Emit run_start
tracer.emitRunStart('SentienceAgent', 'gpt-4o');

try {
  await browser.getPage().goto('https://google.com');

  // Every action is traced automatically
  await agent.act('Click the search box');
  await agent.act("Type 'sentience ai' into the search field");
  await agent.act('Press Enter');

  tracer.emitRunEnd(3);
} finally {
  // Flush trace to disk
  await agent.closeTracer();
  await browser.close();
}

console.log(`✅ Trace saved to: traces/${runId}.jsonl`);
```

### What Gets Traced

Each agent action generates several events:

1. **step_start** - Before action execution (goal, URL, attempt)
2. **snapshot** - Page state with all interactive elements
3. **llm_response** - LLM decision (model, tokens, response)
4. **action** - Executed action (type, element ID, success)
5. **error** - Any failures (error message, retry attempt)

The whole run is bracketed by **run_start** and **run_end** events.

**Example trace output** (the snapshot `elements` array is abbreviated as `[...]`):
```jsonl
{"v":1,"type":"run_start","ts":"2025-12-26T10:00:00.000Z","run_id":"abc-123","seq":1,"data":{"agent":"SentienceAgent","llm_model":"gpt-4o"}}
{"v":1,"type":"step_start","ts":"2025-12-26T10:00:01.000Z","run_id":"abc-123","seq":2,"step_id":"step-1","data":{"step_index":1,"goal":"Click the search box","attempt":0,"url":"https://google.com"}}
{"v":1,"type":"snapshot","ts":"2025-12-26T10:00:01.500Z","run_id":"abc-123","seq":3,"step_id":"step-1","data":{"url":"https://google.com","elements":[...]}}
{"v":1,"type":"llm_response","ts":"2025-12-26T10:00:02.000Z","run_id":"abc-123","seq":4,"step_id":"step-1","data":{"model":"gpt-4o","prompt_tokens":250,"completion_tokens":10,"response_text":"CLICK(42)"}}
{"v":1,"type":"action","ts":"2025-12-26T10:00:02.500Z","run_id":"abc-123","seq":5,"step_id":"step-1","data":{"action_type":"click","element_id":42,"success":true}}
{"v":1,"type":"run_end","ts":"2025-12-26T10:00:03.000Z","run_id":"abc-123","seq":6,"data":{"steps":1}}
```
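Every event in the example shares one envelope (`v`, `type`, `ts`, `run_id`, `seq`, an optional `step_id`, and `data`). The sketch below types that envelope and sanity-checks a trace, assuming only the fields visible above. `TraceEventSketch` and `parseTrace` are hypothetical helpers, not exports of `sentienceapi`. The SDK ships its own definitions in `dist/tracing/types.d.ts`, and those may differ.

```typescript
// Hypothetical shape inferred from the example trace above; not the SDK's exported type.
interface TraceEventSketch {
  v: number;
  type: string;
  ts: string;
  run_id: string;
  seq: number;
  step_id?: string;
  data: Record<string, unknown>;
}

// Parse JSONL text, then check that all events belong to one run
// and that `seq` strictly increases from line to line.
function parseTrace(jsonl: string): TraceEventSketch[] {
  const events = jsonl
    .split('\n')
    .filter((line) => line.trim() !== '')
    .map((line) => JSON.parse(line) as TraceEventSketch);

  for (let i = 1; i < events.length; i++) {
    if (events[i].run_id !== events[0].run_id) {
      throw new Error(`Event ${events[i].seq} belongs to a different run`);
    }
    if (events[i].seq <= events[i - 1].seq) {
      throw new Error(`seq is not increasing at line ${i + 1}`);
    }
  }
  return events;
}

// Small made-up sample in the same envelope format
const sample = [
  '{"v":1,"type":"run_start","ts":"2025-12-26T10:00:00.000Z","run_id":"abc-123","seq":1,"data":{"agent":"SentienceAgent"}}',
  '{"v":1,"type":"run_end","ts":"2025-12-26T10:00:03.000Z","run_id":"abc-123","seq":2,"data":{"steps":0}}',
].join('\n');

const parsed = parseTrace(sample);
console.log(parsed.map((e) => e.type).join(',')); // run_start,run_end
```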

### Reading and Analyzing Traces

```typescript
import * as fs from 'fs';

// `runId` is the ID the trace was recorded under (see the tracing quick start above)
const content = fs.readFileSync(`traces/${runId}.jsonl`, 'utf-8');
const events = content
  .split('\n')
  .filter((line) => line.trim() !== '') // skip blank lines
  .map((line) => JSON.parse(line)); // wrapped so map's index isn't passed to JSON.parse as a reviver

console.log(`Total events: ${events.length}`);

// Analyze events
events.forEach((event) => {
  console.log(`[${event.seq}] ${event.type} - ${event.ts}`);
});

// Filter by type
const actions = events.filter((e) => e.type === 'action');
console.log(`Actions taken: ${actions.length}`);

// Get token usage
const llmEvents = events.filter((e) => e.type === 'llm_response');
const totalTokens = llmEvents.reduce(
  (sum, e) => sum + (e.data.prompt_tokens || 0) + (e.data.completion_tokens || 0),
  0
);
console.log(`Total tokens: ${totalTokens}`);
```
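The `step_start`, `snapshot`, `llm_response`, and `action` events of one step all share a `step_id`, so you can also break usage down per step. The following is a minimal sketch that assumes the `step_id`, `prompt_tokens`, `completion_tokens`, and `success` field names from the example trace. `summarizeSteps` is a hypothetical helper, not part of the SDK.

```typescript
// Minimal event shape used here (assumed from the example trace).
interface StepEvent {
  type: string;
  step_id?: string;
  data: { prompt_tokens?: number; completion_tokens?: number; success?: boolean };
}

interface StepSummary {
  events: number;
  tokens: number;
  actionSucceeded: boolean | undefined;
}

// Group events by step_id; events without one (run_start, run_end) are skipped.
function summarizeSteps(events: StepEvent[]): Map<string, StepSummary> {
  const steps = new Map<string, StepSummary>();
  for (const e of events) {
    if (!e.step_id) continue;
    const s = steps.get(e.step_id) ?? { events: 0, tokens: 0, actionSucceeded: undefined };
    s.events += 1;
    if (e.type === 'llm_response') {
      s.tokens += (e.data.prompt_tokens ?? 0) + (e.data.completion_tokens ?? 0);
    }
    if (e.type === 'action') {
      s.actionSucceeded = e.data.success;
    }
    steps.set(e.step_id, s);
  }
  return steps;
}

const summary = summarizeSteps([
  { type: 'run_start', data: {} },
  { type: 'step_start', step_id: 'step-1', data: {} },
  { type: 'llm_response', step_id: 'step-1', data: { prompt_tokens: 250, completion_tokens: 10 } },
  { type: 'action', step_id: 'step-1', data: { success: true } },
]);
console.log(summary.get('step-1')); // { events: 3, tokens: 260, actionSucceeded: true }
```

Per-step totals make it easy to spot which goal consumed most tokens or needed retries.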

### Tracing Without Agent (Manual)

You can also use the tracer directly for custom workflows:

```typescript
import { Tracer, JsonlTraceSink } from 'sentienceapi';
import { randomUUID } from 'crypto';

const runId = randomUUID();
const sink = new JsonlTraceSink(`traces/${runId}.jsonl`);
const tracer = new Tracer(runId, sink);

// Open the run first so the trace begins with run_start
tracer.emitRunStart('MyAgent', 'gpt-4o');

// Emit custom events
tracer.emit('custom_event', {
  message: 'Something happened',
  details: { foo: 'bar' }
});

// Use convenience methods
tracer.emitStepStart('step-1', 1, 'Do something');
tracer.emitError('step-1', 'Something went wrong');
tracer.emitRunEnd(1);

// Flush to disk
await tracer.close();
```
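The convenience methods presumably wrap `emit` and fill in the shared envelope (`v`, `ts`, `run_id`, `seq`). The following is a hypothetical in-memory stand-in, not the SDK's `Tracer`. It shows one way to assemble that envelope, which can be handy in unit tests that should not write files. The class name and method signatures are assumptions.

```typescript
// Envelope fields mirror the example trace; this is not the SDK's type.
type TraceRecord = {
  v: number;
  type: string;
  ts: string;
  run_id: string;
  seq: number;
  step_id?: string;
  data: Record<string, unknown>;
};

// Hypothetical in-memory tracer for tests.
class MemoryTracerSketch {
  private readonly runId: string;
  private seq = 0;
  readonly events: TraceRecord[] = [];

  constructor(runId: string) {
    this.runId = runId;
  }

  // Each event gets the next sequence number and an ISO-8601 timestamp.
  emit(type: string, data: Record<string, unknown>, stepId?: string): void {
    this.seq += 1;
    const record: TraceRecord = {
      v: 1,
      type,
      ts: new Date().toISOString(),
      run_id: this.runId,
      seq: this.seq,
      data,
    };
    if (stepId !== undefined) record.step_id = stepId;
    this.events.push(record);
  }

  // Serialize as JSONL: one JSON object per line.
  toJsonl(): string {
    return this.events.map((e) => JSON.stringify(e)).join('\n');
  }
}

const memTracer = new MemoryTracerSketch('run-1');
memTracer.emit('run_start', { agent: 'MyAgent', llm_model: 'gpt-4o' });
memTracer.emit('step_start', { step_index: 1, goal: 'Do something' }, 'step-1');
memTracer.emit('run_end', { steps: 1 });
console.log(memTracer.events.map((e) => `${e.seq}:${e.type}`).join(' ')); // 1:run_start 2:step_start 3:run_end
```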

### Schema Compatibility

Traces are **100% compatible** with Python SDK traces, so the same tools can analyze traces from both TypeScript and Python agents.

**See full example:** [examples/agent-with-tracing.ts](examples/agent-with-tracing.ts)

---

## Agent Layer Examples

### Google Search (6 lines of code)

```typescript
import { SentienceBrowser, SentienceAgent, OpenAIProvider } from 'sentienceapi';

const browser = await SentienceBrowser.create({ apiKey: process.env.SENTIENCE_API_KEY });
const llm = new OpenAIProvider(process.env.OPENAI_API_KEY!, 'gpt-4o-mini');
const agent = new SentienceAgent(browser, llm);

await browser.getPage().goto('https://www.google.com');
await agent.act('Click the search box');
await agent.act("Type 'mechanical keyboards' into the search field");
await agent.act('Press Enter key');
await agent.act('Click the first non-ad search result');

await browser.close();
```

**See full example:** [examples/agent-google-search.ts](examples/agent-google-search.ts)

### Using Anthropic Claude Instead of GPT

```typescript
import { SentienceAgent, AnthropicProvider } from 'sentienceapi';

// Swap OpenAI for Anthropic; the agent API stays the same
const llm = new AnthropicProvider(
  process.env.ANTHROPIC_API_KEY!,
  'claude-3-5-sonnet-20241022'
);

// `browser` is a SentienceBrowser created as in the examples above
const agent = new SentienceAgent(browser, llm);
await agent.act('Click the search button'); // Works exactly the same
```

**BYOB (Bring Your Own Brain):** Use OpenAI or Anthropic, or implement `LLMProvider` for any other model.

**See full example:** [examples/agent-with-anthropic.ts](examples/agent-with-anthropic.ts)

### Amazon Shopping (98% code reduction)

- **Before (manual approach):** 350 lines
- **After (agent layer):** 6 lines

```typescript
await agent.act('Click the search box');
await agent.act("Type 'wireless mouse' into the search field");
await agent.act('Press Enter key');
await agent.act('Click the first visible product in the search results');
await agent.act("Click the 'Add to Cart' button");
```

**See full example:** [examples/agent-amazon-shopping.ts](examples/agent-amazon-shopping.ts)

---

## Installation for Agent Layer

```bash
# Install core SDK
npm install sentienceapi

# Install LLM provider (choose one or both)
npm install openai             # For GPT-4, GPT-4o, GPT-4o-mini
npm install @anthropic-ai/sdk  # For Claude 3.5 Sonnet

# Set API keys
export SENTIENCE_API_KEY="your-sentience-key"
export OPENAI_API_KEY="your-openai-key"  # OR
export ANTHROPIC_API_KEY="your-anthropic-key"
```

---

## Direct SDK Quick Start

```typescript
import { SentienceBrowser, snapshot, find, click } from 'sentienceapi';

async function main() {
  const browser = new SentienceBrowser();

  try {
    await browser.start();

    await browser.goto('https://example.com');
    await browser.getPage().waitForLoadState('networkidle');

    // Take snapshot - captures all interactive elements
    const snap = await snapshot(browser);
    console.log(`Found ${snap.elements.length} elements`);

    // Find and click a link using semantic selectors
    const link = find(snap, 'role=link text~"More information"');
    if (link) {
      const result = await click(browser, link.id);
      console.log(`Click success: ${result.success}`);
    }
  } finally {
    await browser.close();
  }
}

// Surface failures instead of leaving an unhandled promise rejection
main().catch((err) => {
  console.error(err);
  process.exit(1);
});
```

## Real-World Example: Amazon Shopping Bot

This example navigates Amazon, finds a product, and adds it to the cart:

```typescript
import { SentienceBrowser, snapshot, find, click } from 'sentienceapi';

async function main() {
  // Headed mode (third argument: headless = false)
  const browser = new SentienceBrowser(undefined, undefined, false);

  try {
    await browser.start();

    // Navigate to Amazon Best Sellers
    await browser.goto('https://www.amazon.com/gp/bestsellers/');
    await browser.getPage().waitForLoadState('networkidle');
    await new Promise((resolve) => setTimeout(resolve, 2000));

    // Take snapshot and find products
    const snap = await snapshot(browser);
    console.log(`Found ${snap.elements.length} elements`);

    // Find candidate products in the viewport using spatial filtering
    const products = snap.elements.filter(
      (el) =>
        el.role === 'link' &&
        el.visual_cues.is_clickable &&
        el.in_viewport &&
        !el.is_occluded &&
        el.bbox.y < 600 // First row
    );

    if (products.length > 0) {
      // Sort by position (top to bottom, then left to right)
      products.sort((a, b) => a.bbox.y - b.bbox.y || a.bbox.x - b.bbox.x);
      const firstProduct = products[0];

      console.log(`Clicking: ${firstProduct.text}`);
      const result = await click(browser, firstProduct.id);
      console.log(`Product click success: ${result.success}`);

      // Wait for product page
      await browser.getPage().waitForLoadState('networkidle');
      await new Promise((resolve) => setTimeout(resolve, 2000));

      // Find and click "Add to Cart" button
      const productSnap = await snapshot(browser);
      const addToCart = find(productSnap, 'role=button text~"add to cart"');

      if (addToCart) {
        const cartResult = await click(browser, addToCart.id);
        console.log(`Added to cart: ${cartResult.success}`);
      }
    }
  } finally {
    await browser.close();
  }
}

main().catch((err) => {
  console.error(err);
  process.exit(1);
});
```

**See the complete tutorial**: [Amazon Shopping Guide](../docs/AMAZON_SHOPPING_GUIDE.md)

## Running Examples

**⚠️ Important**: The examples are TypeScript files, so run them with one of these methods rather than plain `node`:

### Option 1: Using npm scripts (recommended)
```bash
npm run example:hello
npm run example:basic
npm run example:query
npm run example:wait
```

### Option 2: Using ts-node directly
```bash
npx ts-node examples/hello.ts
# or, if ts-node is installed globally:
ts-node examples/hello.ts
```

### Option 3: Compile then run
```bash
npm run build
# Then use the compiled JavaScript from dist/
```

## Core Features

### Browser Control
- **`SentienceBrowser`** - Playwright browser with the Sentience extension pre-loaded
- **`browser.goto(url)`** - Navigate with automatic extension readiness checks
- Automatic bot evasion and stealth mode
- Configurable headless/headed mode

### Snapshot - Intelligent Page Analysis
- **`snapshot(browser, options?)`** - Capture page state with AI-ranked elements
- Returns semantic elements with roles, text, importance scores, and bounding boxes
- Optional screenshot capture (PNG/JPEG)
- TypeScript types for type safety

**Example:**
```typescript
const snap = await snapshot(browser, { screenshot: true });

// Access structured data
console.log(`URL: ${snap.url}`);
console.log(`Viewport: ${snap.viewport.width}x${snap.viewport.height}`);
console.log(`Elements: ${snap.elements.length}`);

// Iterate over elements
for (const element of snap.elements) {
  console.log(`${element.role}: ${element.text} (importance: ${element.importance})`);
}
```

### Query Engine - Semantic Element Selection
- **`query(snapshot, selector)`** - Find all matching elements
- **`find(snapshot, selector)`** - Find the single best match (ranked by importance)
- Query DSL with multiple operators

**Query Examples:**
```typescript
// Find by role and text
const button = find(snap, 'role=button text="Sign in"');

// Substring match (case-insensitive)
const link = find(snap, 'role=link text~"more info"');

// Spatial filtering
const topLeft = find(snap, 'bbox.x<=100 bbox.y<=200');

// Multiple conditions (AND logic)
const primaryBtn = find(snap, 'role=button clickable=true visible=true importance>800');

// Prefix/suffix matching
const startsWith = find(snap, 'text^="Add"');
const endsWith = find(snap, 'text$="Cart"');

// Numeric comparisons
const important = query(snap, 'importance>=700');
const firstRow = query(snap, 'bbox.y<600');
```

**📖 [Complete Query DSL Guide](docs/QUERY_DSL.md)** - All operators, fields, and advanced patterns

### Actions - Interact with Elements
- **`click(browser, elementId)`** - Click element by ID
- **`clickRect(browser, rect)`** - Click at the center of a rectangle (coordinate-based)
- **`typeText(browser, elementId, text)`** - Type into input fields
- **`press(browser, key)`** - Press keyboard keys (Enter, Escape, Tab, etc.)

All actions return an `ActionResult` with success status, timing, and outcome:

```typescript
const result = await click(browser, element.id);

console.log(`Success: ${result.success}`);
console.log(`Outcome: ${result.outcome}`); // "navigated", "dom_updated", "error"
console.log(`Duration: ${result.duration_ms}ms`);
console.log(`URL changed: ${result.url_changed}`);
```

**Coordinate-based clicking:**
```typescript
import { clickRect, snapshot, find } from 'sentienceapi';

// Click at the center of a rectangle (x, y, width, height)
await clickRect(browser, { x: 100, y: 200, w: 50, h: 30 });

// With visual highlight (default: red border for 2 seconds)
await clickRect(browser, { x: 100, y: 200, w: 50, h: 30 }, true, 2.0);

// Using an element's bounding box (note: bbox uses width/height, rect uses w/h)
const snap = await snapshot(browser);
const element = find(snap, 'role=button');
if (element) {
  await clickRect(browser, {
    x: element.bbox.x,
    y: element.bbox.y,
    w: element.bbox.width,
    h: element.bbox.height
  });
}
```
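Because snapshot boxes use `width`/`height` while `clickRect` takes `w`/`h`, a tiny conversion helper keeps that mapping in one place. The center calculation restates "click at the center of a rectangle". `bboxToRect` and `rectCenter` are hypothetical helpers, not SDK exports, and the shapes are taken from the example above.

```typescript
// Shapes as used in the example above (assumed, not imported from the SDK).
interface BBox { x: number; y: number; width: number; height: number }
interface Rect { x: number; y: number; w: number; h: number }

// Convert a snapshot bounding box into the { x, y, w, h } shape clickRect takes.
function bboxToRect(bbox: BBox): Rect {
  return { x: bbox.x, y: bbox.y, w: bbox.width, h: bbox.height };
}

// Center point of a rectangle: where a center click lands.
function rectCenter(rect: Rect): { x: number; y: number } {
  return { x: rect.x + rect.w / 2, y: rect.y + rect.h / 2 };
}

const rect = bboxToRect({ x: 100, y: 200, width: 50, height: 30 });
console.log(rectCenter(rect)); // { x: 125, y: 215 }
```

With such a helper, the last call in the example above could be shortened to `clickRect(browser, bboxToRect(element.bbox))`.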

### Wait & Assertions
- **`waitFor(browser, selector, timeout?, interval?, useApi?)`** - Wait for an element to appear
- **`expect(browser, selector)`** - Assertion helper with a fluent API

**Examples:**
```typescript
// Wait for element (auto-detects optimal interval based on API usage)
const result = await waitFor(browser, 'role=button text="Submit"', 10000);
if (result.found) {
  console.log(`Found after ${result.duration_ms}ms`);
}

// Use local extension with fast polling (250ms interval)
const localResult = await waitFor(browser, 'role=button', 5000, undefined, false);

// Use remote API with network-friendly polling (1500ms interval)
const apiResult = await waitFor(browser, 'role=button', 5000, undefined, true);

// Custom interval override
const customResult = await waitFor(browser, 'role=button', 5000, 500, false);

// Semantic wait conditions
await waitFor(browser, 'clickable=true', 5000); // Wait for clickable element
await waitFor(browser, 'importance>100', 5000); // Wait for important element
await waitFor(browser, 'role=link visible=true', 5000); // Wait for visible link

// Assertions
await expect(browser, 'role=button text="Submit"').toExist(5000);
await expect(browser, 'role=heading').toBeVisible();
await expect(browser, 'role=button').toHaveText('Submit');
await expect(browser, 'role=link').toHaveCount(10);
```
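Conceptually, `waitFor` is a polling loop. It re-checks the condition every `interval` ms until the condition matches or `timeout` elapses, with a 250 ms default locally and 1500 ms via the remote API (per the comments above). The following is a minimal sketch of such a loop over an arbitrary async predicate. `pollUntil` and `defaultInterval` are hypothetical names, and the SDK's actual implementation may differ, for example in how it measures `duration_ms`.

```typescript
interface PollResult {
  found: boolean;
  duration_ms: number;
}

// Default intervals as described in the example comments above.
function defaultInterval(useApi: boolean): number {
  return useApi ? 1500 : 250;
}

function sleep(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Re-run `check` until it returns true or `timeoutMs` has elapsed.
// A final check happens at the deadline, so a late match is still reported.
async function pollUntil(
  check: () => Promise<boolean>,
  timeoutMs: number,
  intervalMs: number
): Promise<PollResult> {
  const start = Date.now();
  while (true) {
    if (await check()) {
      return { found: true, duration_ms: Date.now() - start };
    }
    const elapsed = Date.now() - start;
    if (elapsed >= timeoutMs) {
      return { found: false, duration_ms: elapsed };
    }
    // Never sleep past the deadline.
    await sleep(Math.min(intervalMs, timeoutMs - elapsed));
  }
}

// Usage: a condition that becomes true on the third check.
let attempts = 0;
pollUntil(async () => ++attempts >= 3, 5000, defaultInterval(false)).then((r) => {
  console.log(`found=${r.found} after ${attempts} checks`); // found=true after 3 checks
});
```

A longer interval trades latency for fewer round trips, which is why a network-backed check polls less often than a local one.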

### Content Reading
- **`read(browser, options?)`** - Extract page content
  - `format: "text"` - Plain text extraction
  - `format: "markdown"` - High-quality markdown conversion (uses Turndown)
  - `format: "raw"` - Cleaned HTML (default)

**Example:**
```typescript
import { read } from 'sentienceapi';

// Get markdown content
const markdownResult = await read(browser, { format: 'markdown' });
console.log(markdownResult.content); // Markdown text

// Get plain text
const textResult = await read(browser, { format: 'text' });
console.log(textResult.content); // Plain text
```

### Screenshots
- **`screenshot(browser, options?)`** - Standalone screenshot capture
- Returns a base64-encoded data URL
- PNG or JPEG format
- Quality control for JPEG (1-100)

**Example:**
```typescript
import { screenshot } from 'sentienceapi';
import { writeFileSync } from 'fs';

// Capture PNG screenshot
const pngDataUrl = await screenshot(browser, { format: 'png' });

// Save to file: drop the "data:image/png;base64," prefix, then decode
const base64Data = pngDataUrl.split(',')[1];
const imageData = Buffer.from(base64Data, 'base64');
writeFileSync('screenshot.png', imageData);

// JPEG with quality control (smaller file size)
const jpegDataUrl = await screenshot(browser, { format: 'jpeg', quality: 85 });
```
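The `split(',')[1]` shortcut above assumes a well-formed data URL. A slightly more defensive sketch parses the `data:<mime>;base64,<payload>` form and also returns the MIME type, so you can choose the file extension from it. `decodeDataUrl` and `extensionFor` are hypothetical helpers, not SDK exports.

```typescript
// Parse a base64 data URL such as "data:image/png;base64,iVBORw0KGgo=".
function decodeDataUrl(dataUrl: string): { mimeType: string; bytes: Buffer } {
  const match = /^data:([^;,]+);base64,(.*)$/.exec(dataUrl);
  if (!match) {
    throw new Error('Expected a base64-encoded data URL');
  }
  return { mimeType: match[1], bytes: Buffer.from(match[2], 'base64') };
}

// Map the MIME type to a file extension for saving.
function extensionFor(mimeType: string): string {
  return mimeType === 'image/jpeg' ? 'jpg' : mimeType.split('/')[1];
}

// "iVBORw0KGgo=" is the 8-byte PNG file signature.
const decoded = decodeDataUrl('data:image/png;base64,iVBORw0KGgo=');
console.log(decoded.mimeType, extensionFor(decoded.mimeType), decoded.bytes.length); // image/png png 8
```

Usage with the example above might look like `writeFileSync(\`screenshot.${extensionFor(d.mimeType)}\`, d.bytes)`, where `d` is the decoded screenshot.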

## Element Properties

Elements returned by `snapshot()` have the following properties:

```typescript
element.id          // Unique identifier for interactions
element.role        // ARIA role (button, link, textbox, heading, etc.)
element.text        // Visible text content
element.importance  // AI importance score (0-1000)
element.bbox        // Bounding box (x, y, width, height)
element.visual_cues // Visual analysis (is_primary, is_clickable, background_color)
element.in_viewport // Is the element visible in the current viewport?
element.is_occluded // Is the element covered by other elements?
element.z_index     // CSS stacking order
```
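For code that post-processes snapshots, it can help to write these properties out as a type. The following interface is a sketch inferred from the list above and from the Amazon example. The field types (e.g. `id` as a number, `background_color` as an optional string) are assumptions. The SDK's own declarations live in `dist/types.d.ts`. `isActionable` repeats the visibility checks the Amazon example applies, and `topActionable` is a hypothetical helper.

```typescript
// Sketch of a snapshot element; types are assumptions, not the SDK's declarations.
interface ElementSketch {
  id: number;
  role: string;
  text: string;
  importance: number; // 0-1000
  bbox: { x: number; y: number; width: number; height: number };
  visual_cues: { is_primary: boolean; is_clickable: boolean; background_color?: string | null };
  in_viewport: boolean;
  is_occluded: boolean;
  z_index: number;
}

// Clickable, inside the viewport, and not covered by another element.
function isActionable(el: ElementSketch): boolean {
  return el.visual_cues.is_clickable && el.in_viewport && !el.is_occluded;
}

// Most important actionable elements first.
function topActionable(elements: ElementSketch[], limit: number): ElementSketch[] {
  return elements
    .filter(isActionable)
    .sort((a, b) => b.importance - a.importance)
    .slice(0, limit);
}

// Made-up sample elements for illustration
const base = {
  bbox: { x: 0, y: 0, width: 10, height: 10 },
  in_viewport: true,
  is_occluded: false,
  z_index: 0,
};
const sampleElements: ElementSketch[] = [
  { ...base, id: 1, role: 'link', text: 'Home', importance: 300, visual_cues: { is_primary: false, is_clickable: true } },
  { ...base, id: 2, role: 'button', text: 'Buy', importance: 900, visual_cues: { is_primary: true, is_clickable: true } },
  { ...base, id: 3, role: 'button', text: 'Hidden', importance: 950, is_occluded: true, visual_cues: { is_primary: false, is_clickable: true } },
];
console.log(topActionable(sampleElements, 2).map((el) => el.id)); // [ 2, 1 ]
```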

## Query DSL Reference

### Basic Operators

| Operator | Description | Example |
|----------|-------------|---------|
| `=` | Exact match | `role=button` |
| `!=` | Not equal (exclusion) | `role!=link` |
| `~` | Substring (case-insensitive) | `text~"sign in"` |
| `^=` | Prefix match | `text^="Add"` |
| `$=` | Suffix match | `text$="Cart"` |
| `>`, `>=` | Greater than, greater than or equal | `importance>500` |
| `<`, `<=` | Less than, less than or equal | `bbox.y<600` |

### Supported Fields

- **Role**: `role=button|link|textbox|heading|...`
- **Text**: `text`, `text~`, `text^=`, `text$=`
- **Visibility**: `clickable=true|false`, `visible=true|false`
- **Importance**: `importance`, `importance>=N`, `importance<N`
- **Position**: `bbox.x`, `bbox.y`, `bbox.width`, `bbox.height`
- **Layering**: `z_index`
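To make the operator semantics concrete, here is a minimal matcher for this selector syntax over plain objects. It is a hypothetical sketch, not the SDK's query engine (which ships as `dist/query.js`). It assumes the following:

- Terms have the form `field<op>value`, are separated by spaces, and are combined with AND.
- Quoted values may contain spaces.
- `~` is case-insensitive, and the sketch treats `^=`/`$=` as case-sensitive.
- Fields use dotted paths such as `bbox.y`.

Aliases such as `clickable`/`visible` are left out.

```typescript
// Hypothetical matcher for the selector syntax above; not the SDK's query engine.
type Op = '=' | '!=' | '~' | '^=' | '$=' | '>' | '>=' | '<' | '<=';
interface Term {
  field: string;
  op: Op;
  value: string;
}
type Item = Record<string, unknown>;

// Split a selector into terms; two-character operators come first in the
// alternation so ">=" is not read as ">" followed by "=".
function parseSelector(selector: string): Term[] {
  const re = /([\w.]+)(!=|\^=|\$=|>=|<=|=|~|>|<)(?:"([^"]*)"|(\S+))/g;
  const terms: Term[] = [];
  let m: RegExpExecArray | null;
  while ((m = re.exec(selector)) !== null) {
    terms.push({ field: m[1], op: m[2] as Op, value: m[3] ?? m[4] });
  }
  return terms;
}

// Resolve dotted paths such as "bbox.y".
function getField(item: Item, path: string): unknown {
  let current: unknown = item;
  for (const key of path.split('.')) {
    if (current === null || typeof current !== 'object') return undefined;
    current = (current as Item)[key];
  }
  return current;
}

function matchesTerm(item: Item, t: Term): boolean {
  const actual = getField(item, t.field);
  if (actual === undefined || actual === null) return false;
  const a = String(actual);
  switch (t.op) {
    case '=': return a === t.value;
    case '!=': return a !== t.value;
    case '~': return a.toLowerCase().includes(t.value.toLowerCase());
    case '^=': return a.startsWith(t.value);
    case '$=': return a.endsWith(t.value);
    default: {
      // Numeric comparisons
      const n = Number(actual);
      const v = Number(t.value);
      if (Number.isNaN(n) || Number.isNaN(v)) return false;
      if (t.op === '>') return n > v;
      if (t.op === '>=') return n >= v;
      if (t.op === '<') return n < v;
      return n <= v;
    }
  }
}

// All terms must match (AND logic).
function queryAll<T extends Item>(items: T[], selector: string): T[] {
  const terms = parseSelector(selector);
  return items.filter((item) => terms.every((t) => matchesTerm(item, t)));
}

// Single best match, taken here as the match with the highest importance.
function findBest<T extends Item>(items: T[], selector: string): T | null {
  const matches = queryAll(items, selector);
  if (matches.length === 0) return null;
  return matches.reduce((best, item) =>
    Number(item.importance ?? 0) > Number(best.importance ?? 0) ? item : best
  );
}

const items = [
  { role: 'button', text: 'Sign in', importance: 850, bbox: { x: 40, y: 120 } },
  { role: 'link', text: 'More information...', importance: 400, bbox: { x: 40, y: 700 } },
];
console.log(queryAll(items, 'role=button text="Sign in" importance>800').length); // 1
console.log(queryAll(items, 'text~"more info" bbox.y<600').length); // 0
```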

## Examples

See the `examples/` directory for complete working examples:

### Agent Layer (Level 3 - Natural Language)
- **`agent-google-search.ts`** - Google search automation with natural language commands
- **`agent-amazon-shopping.ts`** - Amazon shopping bot (6 lines vs 350 lines of manual code)
- **`agent-with-anthropic.ts`** - Using Anthropic Claude instead of OpenAI GPT

### Direct SDK (Level 2 - Technical Control)
- **`hello.ts`** - Extension bridge verification
- **`basic-agent.ts`** - Basic snapshot and element inspection
- **`query-demo.ts`** - Query engine demonstrations
- **`wait-and-click.ts`** - Waiting for elements and performing actions
- **`read-markdown.ts`** - Content extraction and markdown conversion

## Testing

```bash
# Run all tests
npm test

# Run with coverage
npm run test:coverage

# Run specific test file
npm test -- snapshot.test.ts
```

## Configuration

### Viewport Size

The default viewport is **1280x800** pixels. You can customize it with Playwright's API:

```typescript
const browser = new SentienceBrowser();
await browser.start();

// Set custom viewport before navigating
await browser.getPage().setViewportSize({ width: 1920, height: 1080 });

await browser.goto('https://example.com');
```

### Headless Mode

```typescript
// Headed mode (shows browser window)
const headedBrowser = new SentienceBrowser(undefined, undefined, false);

// Headless mode
const headlessBrowser = new SentienceBrowser(undefined, undefined, true);

// Auto-detect based on environment (default)
const autoBrowser = new SentienceBrowser(); // headless=true if CI=true, else false
```
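The default rule in the last line boils down to a small decision function: headless when `CI=true`, headed otherwise, and an explicit argument always wins. The following is a sketch of that documented behavior, not the SDK's code. `resolveHeadless` is a hypothetical name, and treating only the exact string `"true"` as enabling CI is an assumption.

```typescript
// Explicit setting wins; otherwise run headless only when CI=true.
function resolveHeadless(
  explicit: boolean | undefined,
  env: Record<string, string | undefined>
): boolean {
  if (explicit !== undefined) return explicit;
  return env.CI === 'true';
}

console.log(resolveHeadless(undefined, { CI: 'true' })); // true
console.log(resolveHeadless(undefined, {})); // false
console.log(resolveHeadless(false, { CI: 'true' })); // false
```

In a real script you would pass `process.env` as the second argument.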

### Residential Proxy Support

If you run from a datacenter (AWS, DigitalOcean, etc.), you can configure a residential proxy to reduce the risk of IP-based detection by Cloudflare, Akamai, and other anti-bot services.

**Supported Formats:**
- HTTP: `http://username:password@host:port`
- HTTPS: `https://username:password@host:port`
- SOCKS5: `socks5://username:password@host:port`

**Usage:**

```typescript
import { SentienceBrowser, SentienceAgent, OpenAIProvider } from 'sentienceapi';

// Option 1: pass the proxy as the fourth constructor argument
const browser = new SentienceBrowser(
  undefined, // apiKey
  undefined, // apiUrl
  false,     // headless
  'http://username:password@residential-proxy.com:8000' // proxy
);
await browser.start();

// Option 2: set SENTIENCE_PROXY before constructing the browser
process.env.SENTIENCE_PROXY = 'http://username:password@proxy.com:8000';
const envBrowser = new SentienceBrowser();
await envBrowser.start();

// Option 3: use a proxied browser with an agent
const agentBrowser = new SentienceBrowser(
  'your-api-key', // apiKey
  undefined,      // apiUrl
  false,          // headless
  'http://user:pass@proxy.com:8000' // proxy
);
await agentBrowser.start();

const agent = new SentienceAgent(agentBrowser, new OpenAIProvider('openai-key'));
await agent.act('Navigate to example.com');
```
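
The two configuration paths and the supported formats could be handled roughly as below. This is a minimal sketch: `resolveProxy` is hypothetical, the precedence of the constructor argument over `SENTIENCE_PROXY` is an assumption, and only the `{ server, username, password }` output shape comes from Playwright's documented `proxy` launch option.

```typescript
// Hypothetical sketch: pick a proxy string and split it into the
// { server, username, password } shape accepted by Playwright's `proxy` launch option.
type Env = Record<string, string | undefined>;

interface ProxySettings {
  server: string;
  username?: string;
  password?: string;
}

// The three schemes listed under "Supported Formats".
const SUPPORTED_SCHEMES = ['http:', 'https:', 'socks5:'];

function resolveProxy(explicit: string | undefined, env: Env): ProxySettings | undefined {
  // Assumption: an explicit constructor argument takes precedence over SENTIENCE_PROXY.
  const raw = explicit ?? env.SENTIENCE_PROXY;
  if (!raw) return undefined;

  const url = new URL(raw); // throws a TypeError on malformed input
  if (!SUPPORTED_SCHEMES.includes(url.protocol)) {
    throw new Error(`Unsupported proxy scheme: ${url.protocol}`);
  }
  if (!url.hostname) {
    throw new Error('Proxy URL must include a host');
  }

  return {
    // url.host keeps any non-default port; credentials are passed separately.
    server: `${url.protocol}//${url.host}`,
    username: url.username ? decodeURIComponent(url.username) : undefined,
    password: url.password ? decodeURIComponent(url.password) : undefined,
  };
}
```

Keeping credentials out of `server` matches how Playwright's `proxy` option takes `username` and `password` as separate fields.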

**WebRTC Protection:**
When a proxy is configured, the SDK automatically adds WebRTC leak-protection flags so that WebRTC cannot expose your real datacenter IP.

**HTTPS Certificate Handling:**
When a proxy is configured, the SDK also automatically ignores HTTPS certificate errors, because residential proxies often use self-signed certificates for SSL interception. This lets navigation to HTTPS sites succeed through the proxy, but it also means certificate errors are not reported for proxied sessions.
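
As a rough illustration of these two behaviors, the sketch below produces extra launch settings only when a proxy is set. Assumptions: `proxyLaunchSettings` is a hypothetical helper, and the Chromium switch `--force-webrtc-ip-handling-policy=disable_non_proxied_udp` is one way to block non-proxied WebRTC traffic, not necessarily the exact flag the SDK passes; `ignoreHTTPSErrors` is Playwright's context option.

```typescript
// Hypothetical sketch: derive extra browser settings from an optional proxy string.
interface LaunchSettings {
  args: string[];             // extra Chromium command-line switches
  ignoreHTTPSErrors: boolean; // Playwright browser-context option
}

function proxyLaunchSettings(proxy?: string): LaunchSettings {
  if (!proxy) {
    // No proxy: keep Chromium defaults and normal certificate checking.
    return { args: [], ignoreHTTPSErrors: false };
  }
  return {
    // Assumed flag: only allow WebRTC UDP traffic that goes through the proxy.
    args: ['--force-webrtc-ip-handling-policy=disable_non_proxied_udp'],
    // Proxies that intercept TLS present their own (often self-signed) certificates.
    ignoreHTTPSErrors: true,
  };
}
```

In plain Playwright these values would feed `chromium.launch({ args })` and `browser.newContext({ ignoreHTTPSErrors })`.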

**Example:**
```bash
# Run with the proxy set via environment variable
SENTIENCE_PROXY=http://user:pass@proxy.com:8000 npm run example:proxy

# Or pass it as a command-line argument
npx ts-node examples/proxy-example.ts --proxy=http://user:pass@proxy.com:8000
```

**Note:** The proxy is configured at the browser level, so all traffic, including traffic from the Chrome extension, goes through the proxy. No changes to the extension are required.
786
+
787
+ ### Authentication Session Injection
788
+
789
+ Inject pre-recorded authentication sessions (cookies + localStorage) to start your agent already logged in, bypassing login screens, 2FA, and CAPTCHAs. This saves tokens and reduces costs by eliminating login steps.
790
+
791
+ ```typescript
792
+ // Workflow 1: Inject pre-recorded session from file
793
+ import { SentienceBrowser, saveStorageState } from 'sentienceapi';
794
+
795
+ // Save session after manual login
796
+ const browser = new SentienceBrowser();
797
+ await browser.start();
798
+ await browser.getPage().goto('https://example.com');
799
+ // ... log in manually ...
800
+ await saveStorageState(browser.getContext(), 'auth.json');
801
+
802
+ // Use saved session in future runs
803
+ const browser2 = new SentienceBrowser(
804
+ undefined, // apiKey
805
+ undefined, // apiUrl
806
+ false, // headless
807
+ undefined, // proxy
808
+ undefined, // userDataDir
809
+ 'auth.json' // storageState - inject saved session
810
+ );
811
+ await browser2.start();
812
+ // Agent starts already logged in!
813
+
814
+ // Workflow 2: Persistent sessions (cookies persist across runs)
815
+ const browser3 = new SentienceBrowser(
816
+ undefined, // apiKey
817
+ undefined, // apiUrl
818
+ false, // headless
819
+ undefined, // proxy
820
+ './chrome_profile', // userDataDir - persist cookies
821
+ undefined // storageState
822
+ );
823
+ await browser3.start();
824
+ // First run: Log in
825
+ // Second run: Already logged in (cookies persist automatically)
826
+ ```
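
Before injecting `auth.json`, it can help to check whether the recorded cookies have already expired, since a stale session puts the agent back on the login screen. This sketch assumes the file uses Playwright's storage-state JSON layout (`cookies` with `expires` in Unix seconds, `-1` for session cookies, plus per-origin `localStorage`); `hasUsableSession` and its domain-matching rule are hypothetical.

```typescript
// Subset of Playwright's storage-state JSON layout (assumed format of auth.json).
interface StoredCookie {
  name: string;
  value: string;
  domain: string;  // e.g. "example.com" or ".example.com"
  path: string;
  expires: number; // Unix seconds; -1 means a session cookie
}

interface StorageState {
  cookies: StoredCookie[];
  origins: { origin: string; localStorage: { name: string; value: string }[] }[];
}

// Hypothetical helper: true if at least one cookie for `host` has not expired yet.
function hasUsableSession(state: StorageState, host: string, nowSeconds: number): boolean {
  return state.cookies.some((cookie) => {
    const domain = cookie.domain.replace(/^\./, ''); // ".example.com" -> "example.com"
    const matches = host === domain || host.endsWith('.' + domain);
    const alive = cookie.expires === -1 || cookie.expires > nowSeconds;
    return matches && alive;
  });
}
```

For example, `hasUsableSession(JSON.parse(readFileSync('auth.json', 'utf8')), 'example.com', Date.now() / 1000)` tells you whether to re-record the session before starting the agent.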
827
+
828
+ **Benefits:**
829
+ - Bypass login screens and CAPTCHAs with valid sessions
830
+ - Save 5-10 agent steps and hundreds of tokens per run
831
+ - Maintain stateful sessions for accessing authenticated pages
832
+ - Act as authenticated users (e.g., "Go to my Orders page")
833
+
834
+ See `examples/auth-injection-agent.ts` for complete examples.
835
+
836
+ ## Best Practices
837
+
838
+ ### 1. Wait for Dynamic Content
839
+ ```typescript
840
+ await browser.goto('https://example.com');
841
+ await browser.getPage().waitForLoadState('networkidle');
842
+ await new Promise(resolve => setTimeout(resolve, 1000)); // Extra buffer
843
+ ```
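
A fixed one-second buffer can be too long on fast pages and too short on slow ones. One alternative is to poll for the condition you actually need, up to a deadline. The sketch below is generic TypeScript with no SDK calls; `pollUntil` is a hypothetical helper, and the check you pass in is up to you.

```typescript
// Hypothetical helper: call `check` until it returns a non-null value or the timeout elapses.
async function pollUntil<T>(
  check: () => Promise<T | null>,
  timeoutMs: number,
  intervalMs = 250,
): Promise<T | null> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const result = await check();
    if (result !== null) return result; // condition met
    const remaining = deadline - Date.now();
    if (remaining <= 0) return null; // timed out
    // Sleep until the next attempt, but never past the deadline.
    await new Promise((resolve) => setTimeout(resolve, Math.min(intervalMs, remaining)));
  }
}
```

With the SDK, `check` could take a fresh `snapshot` and return the result of `find`, or `null` if the element is not there yet.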

### 2. Use Multiple Strategies for Finding Elements
```typescript
// Try exact match first
let btn = find(snap, 'role=button text="Add to Cart"');

// Fallback to fuzzy match
if (!btn) {
  btn = find(snap, 'role=button text~"cart"');
}
```
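
The exact-then-fuzzy strategy can be illustrated without a browser. In this sketch the `SnapshotElement` shape and the matching rules (exact text equality, then case-insensitive substring for the fuzzy pass) are assumptions for illustration; the real `find` parses the query DSL, and its `~` operator may behave differently.

```typescript
// Assumed minimal element shape for illustration only.
interface SnapshotElement {
  id: number;
  role: string;
  text: string;
}

// Hypothetical helper: exact text match first, then case-insensitive substring match.
function findWithFallback(
  elements: SnapshotElement[],
  role: string,
  exactText: string,
  fuzzyText: string,
): SnapshotElement | undefined {
  const sameRole = elements.filter((el) => el.role === role);
  const exact = sameRole.find((el) => el.text === exactText);
  if (exact) return exact;
  const needle = fuzzyText.toLowerCase();
  return sameRole.find((el) => el.text.toLowerCase().includes(needle));
}
```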

### 3. Check Element Visibility Before Clicking
```typescript
if (element.in_viewport && !element.is_occluded) {
  await click(browser, element.id);
}
```
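
When several candidates match, the same two flags can pick one that is safe to click and record why the others were skipped. In this sketch the `SnapshotElement` shape covers only the `id`, `in_viewport`, and `is_occluded` fields used above; `pickClickable` is a hypothetical helper.

```typescript
// Assumed subset of a snapshot element: only the fields used in the check above.
interface SnapshotElement {
  id: number;
  in_viewport: boolean;
  is_occluded: boolean;
}

type SkipReason = 'off-screen' | 'occluded';

// Hypothetical helper: first visible, unobstructed candidate plus the reasons others were skipped.
function pickClickable(candidates: SnapshotElement[]): {
  target?: SnapshotElement;
  skipped: { id: number; reason: SkipReason }[];
} {
  const skipped: { id: number; reason: SkipReason }[] = [];
  for (const el of candidates) {
    if (!el.in_viewport) {
      skipped.push({ id: el.id, reason: 'off-screen' }); // may need scrolling first
    } else if (el.is_occluded) {
      skipped.push({ id: el.id, reason: 'occluded' }); // e.g. covered by a modal or banner
    } else {
      return { target: el, skipped };
    }
  }
  return { skipped };
}
```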

### 4. Handle Navigation
```typescript
const result = await click(browser, linkId);
if (result.url_changed) {
  await browser.getPage().waitForLoadState('networkidle');
}
```

### 5. Use Screenshots Sparingly
```typescript
// Fast: no screenshot (element data only)
const snap = await snapshot(browser);

// Slower: includes a screenshot (useful for debugging/verification)
const snapWithScreenshot = await snapshot(browser, { screenshot: true });
```

### 6. Always Close Browser
```typescript
const browser = new SentienceBrowser();

try {
  await browser.start();
  // ... your automation code
} finally {
  await browser.close(); // Always clean up
}
```
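
If you open browsers in several places, the try/finally pattern can be wrapped once. This is a generic sketch: `withCleanup` is a hypothetical helper, and `Closable` is an assumed interface that `SentienceBrowser` would satisfy if its `start()` and `close()` return promises, as the example above suggests.

```typescript
// Anything with async start/close, e.g. a browser wrapper (assumed interface).
interface Closable {
  start(): Promise<void>;
  close(): Promise<void>;
}

// Hypothetical helper: start the resource, run `fn`, and always close it afterwards.
async function withCleanup<R extends Closable, T>(
  resource: R,
  fn: (resource: R) => Promise<T>,
): Promise<T> {
  try {
    await resource.start();
    return await fn(resource);
  } finally {
    await resource.close(); // runs even if start() or fn() throws
  }
}
```

Usage would look like `await withCleanup(new SentienceBrowser(), async (browser) => { /* automation */ })`, assuming `SentienceBrowser` matches the `Closable` shape.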

## Troubleshooting

### "Extension failed to load"
**Solution:** Build the extension first:
```bash
cd sentience-chrome
./build.sh
```

### "Cannot use import statement outside a module"
**Solution:** Don't run the TypeScript files with `node` directly. Use `ts-node` or the npm scripts:
```bash
npx ts-node examples/hello.ts
# or
npm run example:hello
```

### "Element not found"
**Solutions:**
- Make sure the page has loaded: `await browser.getPage().waitForLoadState('networkidle')`
- Use `waitFor()`: `await waitFor(browser, 'role=button', 10000)`
- Inspect the elements: `console.log(snap.elements.map(el => el.text))`

### Button not clickable
**Solutions:**
- Check visibility: `element.in_viewport && !element.is_occluded`
- Scroll to the element: `` await browser.getPage().evaluate(`window.sentience_registry[${element.id}].scrollIntoView()`) ``

## Documentation

- **📖 [Amazon Shopping Guide](../docs/AMAZON_SHOPPING_GUIDE.md)** - Complete tutorial with a real-world example
- **📖 [Query DSL Guide](docs/QUERY_DSL.md)** - Advanced query patterns and operators
- **📄 [API Contract](../spec/SNAPSHOT_V1.md)** - Snapshot API specification
- **📄 [Type Definitions](../spec/sdk-types.md)** - TypeScript/Python type definitions

## License

This SDK is licensed under the **Elastic License 2.0 (ELv2)**.

The Elastic License 2.0 allows you to use, modify, and distribute this SDK for internal, research, and non-competitive purposes. It **does not permit offering this SDK or a derivative as a hosted or managed service**, nor using it to build a competing product or service.

### Important Notes

- This SDK is a **client-side library** that communicates with proprietary Sentience services and browser components.

- The Sentience backend services (including semantic geometry grounding, ranking, visual cues, and trace processing) are **not open source** and are governed by Sentience’s Terms of Service.

- Use of this SDK does **not** grant rights to operate, replicate, or reimplement Sentience’s hosted services.

For commercial usage, hosted offerings, or enterprise deployments, please contact Sentience to obtain a commercial license.

See the full license text in [`LICENSE`](./LICENSE.md).