libretto 0.4.4 → 0.5.0
- package/dist/cli/cli.js +20 -19
- package/dist/cli/commands/ai.js +1 -1
- package/dist/cli/commands/browser.js +3 -3
- package/dist/cli/commands/execution.js +3 -3
- package/dist/cli/commands/logs.js +1 -1
- package/dist/cli/core/browser.js +11 -6
- package/dist/cli/core/context.js +4 -18
- package/dist/cli/core/session.js +2 -2
- package/dist/cli/core/snapshot-analyzer.js +2 -2
- package/dist/cli/router.js +1 -1
- package/dist/cli/workers/run-integration-runtime.js +2 -2
- package/dist/shared/paths/paths.js +2 -1
- package/dist/shared/paths/repo-root.d.ts +3 -0
- package/dist/shared/paths/repo-root.js +24 -0
- package/package.json +6 -7
- package/scripts/postinstall.mjs +12 -3
- package/skills/libretto/SKILL.md +93 -404
- package/skills/libretto/references/auth-profiles.md +30 -0
- package/skills/libretto/references/pages-and-page-targeting.md +29 -0
- package/skills/libretto/references/reverse-engineering-network-requests.md +39 -0
- package/skills/libretto/references/user-action-log.md +31 -0
- package/src/cli/cli.ts +173 -0
- package/src/cli/commands/ai.ts +35 -0
- package/src/cli/commands/browser.ts +165 -0
- package/src/cli/commands/execution.ts +691 -0
- package/src/cli/commands/init.ts +327 -0
- package/src/cli/commands/logs.ts +128 -0
- package/src/cli/commands/shared.ts +70 -0
- package/src/cli/commands/snapshot.ts +327 -0
- package/src/cli/core/ai-config.ts +255 -0
- package/src/cli/core/api-snapshot-analyzer.ts +97 -0
- package/src/cli/core/browser.ts +839 -0
- package/src/cli/core/context.ts +122 -0
- package/src/cli/core/pause-signals.ts +35 -0
- package/src/cli/core/session-telemetry.ts +553 -0
- package/src/cli/core/session.ts +209 -0
- package/src/cli/core/snapshot-analyzer.ts +875 -0
- package/src/cli/core/snapshot-api-config.ts +236 -0
- package/src/cli/core/telemetry.ts +446 -0
- package/src/cli/framework/simple-cli.ts +1273 -0
- package/src/cli/index.ts +13 -0
- package/src/cli/router.ts +28 -0
- package/src/cli/workers/run-integration-runtime.ts +311 -0
- package/src/cli/workers/run-integration-worker-protocol.ts +14 -0
- package/src/cli/workers/run-integration-worker.ts +75 -0
- package/src/index.ts +120 -0
- package/src/runtime/download/download.ts +100 -0
- package/src/runtime/download/index.ts +7 -0
- package/src/runtime/extract/extract.ts +92 -0
- package/src/runtime/extract/index.ts +1 -0
- package/src/runtime/network/index.ts +5 -0
- package/src/runtime/network/network.ts +113 -0
- package/src/runtime/recovery/agent.ts +256 -0
- package/src/runtime/recovery/errors.ts +152 -0
- package/src/runtime/recovery/index.ts +7 -0
- package/src/runtime/recovery/recovery.ts +50 -0
- package/{dist/shared/condense-dom/condense-dom.cjs → src/shared/condense-dom/condense-dom.ts} +243 -115
- package/src/shared/config/config.ts +22 -0
- package/src/shared/config/index.ts +5 -0
- package/src/shared/debug/index.ts +1 -0
- package/src/shared/debug/pause.ts +85 -0
- package/src/shared/instrumentation/errors.ts +82 -0
- package/src/shared/instrumentation/index.ts +9 -0
- package/src/shared/instrumentation/instrument.ts +276 -0
- package/src/shared/llm/ai-sdk-adapter.ts +78 -0
- package/src/shared/llm/client.ts +217 -0
- package/src/shared/llm/index.ts +3 -0
- package/src/shared/llm/types.ts +63 -0
- package/src/shared/logger/index.ts +6 -0
- package/src/shared/logger/logger.ts +352 -0
- package/src/shared/logger/sinks.ts +144 -0
- package/src/shared/paths/paths.ts +109 -0
- package/src/shared/paths/repo-root.ts +27 -0
- package/src/shared/run/api.ts +2 -0
- package/src/shared/run/browser.ts +98 -0
- package/src/shared/state/index.ts +11 -0
- package/src/shared/state/session-state.ts +74 -0
- package/src/shared/visualization/ghost-cursor.ts +200 -0
- package/src/shared/visualization/highlight.ts +146 -0
- package/src/shared/visualization/index.ts +18 -0
- package/src/shared/workflow/workflow.ts +42 -0
- package/dist/index.cjs +0 -144
- package/dist/index.d.cts +0 -21
- package/dist/runtime/download/download.cjs +0 -70
- package/dist/runtime/download/download.d.cts +0 -35
- package/dist/runtime/download/index.cjs +0 -30
- package/dist/runtime/download/index.d.cts +0 -3
- package/dist/runtime/extract/extract.cjs +0 -88
- package/dist/runtime/extract/extract.d.cts +0 -23
- package/dist/runtime/extract/index.cjs +0 -28
- package/dist/runtime/extract/index.d.cts +0 -5
- package/dist/runtime/network/index.cjs +0 -28
- package/dist/runtime/network/index.d.cts +0 -4
- package/dist/runtime/network/network.cjs +0 -91
- package/dist/runtime/network/network.d.cts +0 -28
- package/dist/runtime/recovery/agent.cjs +0 -223
- package/dist/runtime/recovery/agent.d.cts +0 -13
- package/dist/runtime/recovery/errors.cjs +0 -124
- package/dist/runtime/recovery/errors.d.cts +0 -31
- package/dist/runtime/recovery/index.cjs +0 -34
- package/dist/runtime/recovery/index.d.cts +0 -7
- package/dist/runtime/recovery/recovery.cjs +0 -55
- package/dist/runtime/recovery/recovery.d.cts +0 -12
- package/dist/shared/condense-dom/condense-dom.d.cts +0 -34
- package/dist/shared/config/config.cjs +0 -44
- package/dist/shared/config/config.d.cts +0 -10
- package/dist/shared/config/index.cjs +0 -32
- package/dist/shared/config/index.d.cts +0 -1
- package/dist/shared/debug/index.cjs +0 -28
- package/dist/shared/debug/index.d.cts +0 -1
- package/dist/shared/debug/pause.cjs +0 -86
- package/dist/shared/debug/pause.d.cts +0 -12
- package/dist/shared/instrumentation/errors.cjs +0 -81
- package/dist/shared/instrumentation/errors.d.cts +0 -12
- package/dist/shared/instrumentation/index.cjs +0 -35
- package/dist/shared/instrumentation/index.d.cts +0 -6
- package/dist/shared/instrumentation/instrument.cjs +0 -206
- package/dist/shared/instrumentation/instrument.d.cts +0 -32
- package/dist/shared/llm/ai-sdk-adapter.cjs +0 -71
- package/dist/shared/llm/ai-sdk-adapter.d.cts +0 -22
- package/dist/shared/llm/client.cjs +0 -218
- package/dist/shared/llm/client.d.cts +0 -13
- package/dist/shared/llm/index.cjs +0 -31
- package/dist/shared/llm/index.d.cts +0 -5
- package/dist/shared/llm/types.cjs +0 -16
- package/dist/shared/llm/types.d.cts +0 -67
- package/dist/shared/logger/index.cjs +0 -37
- package/dist/shared/logger/index.d.cts +0 -2
- package/dist/shared/logger/logger.cjs +0 -232
- package/dist/shared/logger/logger.d.cts +0 -86
- package/dist/shared/logger/sinks.cjs +0 -160
- package/dist/shared/logger/sinks.d.cts +0 -9
- package/dist/shared/paths/paths.cjs +0 -104
- package/dist/shared/paths/paths.d.cts +0 -10
- package/dist/shared/run/api.cjs +0 -28
- package/dist/shared/run/api.d.cts +0 -2
- package/dist/shared/run/browser.cjs +0 -98
- package/dist/shared/run/browser.d.cts +0 -22
- package/dist/shared/state/index.cjs +0 -38
- package/dist/shared/state/index.d.cts +0 -2
- package/dist/shared/state/session-state.cjs +0 -92
- package/dist/shared/state/session-state.d.cts +0 -40
- package/dist/shared/visualization/ghost-cursor.cjs +0 -174
- package/dist/shared/visualization/ghost-cursor.d.cts +0 -37
- package/dist/shared/visualization/highlight.cjs +0 -134
- package/dist/shared/visualization/highlight.d.cts +0 -22
- package/dist/shared/visualization/index.cjs +0 -45
- package/dist/shared/visualization/index.d.cts +0 -3
- package/dist/shared/workflow/workflow.cjs +0 -47
- package/dist/shared/workflow/workflow.d.cts +0 -21
- package/skills/libretto/code-generation-rules.md +0 -223
- package/skills/libretto/integration-approach-selection.md +0 -174
@@ -1,45 +0,0 @@
-"use strict";
-var __defProp = Object.defineProperty;
-var __getOwnPropDesc = Object.getOwnPropertyDescriptor;
-var __getOwnPropNames = Object.getOwnPropertyNames;
-var __hasOwnProp = Object.prototype.hasOwnProperty;
-var __export = (target, all) => {
-  for (var name in all)
-    __defProp(target, name, { get: all[name], enumerable: true });
-};
-var __copyProps = (to, from, except, desc) => {
-  if (from && typeof from === "object" || typeof from === "function") {
-    for (let key of __getOwnPropNames(from))
-      if (!__hasOwnProp.call(to, key) && key !== except)
-        __defProp(to, key, { get: () => from[key], enumerable: !(desc = __getOwnPropDesc(from, key)) || desc.enumerable });
-  }
-  return to;
-};
-var __toCommonJS = (mod) => __copyProps(__defProp({}, "__esModule", { value: true }), mod);
-var visualization_exports = {};
-__export(visualization_exports, {
-  clearHighlights: () => import_highlight.clearHighlights,
-  ensureGhostCursor: () => import_ghost_cursor.ensureGhostCursor,
-  ensureHighlightLayer: () => import_highlight.ensureHighlightLayer,
-  getGhostCursorPosition: () => import_ghost_cursor.getGhostCursorPosition,
-  ghostClick: () => import_ghost_cursor.ghostClick,
-  hideGhostCursor: () => import_ghost_cursor.hideGhostCursor,
-  moveGhostCursor: () => import_ghost_cursor.moveGhostCursor,
-  moveGhostCursorWithDistance: () => import_ghost_cursor.moveGhostCursorWithDistance,
-  showHighlight: () => import_highlight.showHighlight
-});
-module.exports = __toCommonJS(visualization_exports);
-var import_ghost_cursor = require("./ghost-cursor.js");
-var import_highlight = require("./highlight.js");
-// Annotate the CommonJS export names for ESM import in node:
-0 && (module.exports = {
-  clearHighlights,
-  ensureGhostCursor,
-  ensureHighlightLayer,
-  getGhostCursorPosition,
-  ghostClick,
-  hideGhostCursor,
-  moveGhostCursor,
-  moveGhostCursorWithDistance,
-  showHighlight
-});
@@ -1,3 +0,0 @@
-export { GhostCursorOptions, GhostCursorStyle, ensureGhostCursor, getGhostCursorPosition, ghostClick, hideGhostCursor, moveGhostCursor, moveGhostCursorWithDistance } from './ghost-cursor.cjs';
-export { HighlightOptions, ShowHighlightParams, clearHighlights, ensureHighlightLayer, showHighlight } from './highlight.cjs';
-import 'playwright';
@@ -1,47 +0,0 @@
-"use strict";
-var __defProp = Object.defineProperty;
-var __getOwnPropDesc = Object.getOwnPropertyDescriptor;
-var __getOwnPropNames = Object.getOwnPropertyNames;
-var __hasOwnProp = Object.prototype.hasOwnProperty;
-var __export = (target, all) => {
-  for (var name in all)
-    __defProp(target, name, { get: all[name], enumerable: true });
-};
-var __copyProps = (to, from, except, desc) => {
-  if (from && typeof from === "object" || typeof from === "function") {
-    for (let key of __getOwnPropNames(from))
-      if (!__hasOwnProp.call(to, key) && key !== except)
-        __defProp(to, key, { get: () => from[key], enumerable: !(desc = __getOwnPropDesc(from, key)) || desc.enumerable });
-  }
-  return to;
-};
-var __toCommonJS = (mod) => __copyProps(__defProp({}, "__esModule", { value: true }), mod);
-var workflow_exports = {};
-__export(workflow_exports, {
-  LIBRETTO_WORKFLOW_BRAND: () => LIBRETTO_WORKFLOW_BRAND,
-  LibrettoWorkflow: () => LibrettoWorkflow,
-  workflow: () => workflow
-});
-module.exports = __toCommonJS(workflow_exports);
-const LIBRETTO_WORKFLOW_BRAND = /* @__PURE__ */ Symbol.for("libretto.workflow");
-class LibrettoWorkflow {
-  [LIBRETTO_WORKFLOW_BRAND] = true;
-  metadata;
-  handler;
-  constructor(metadata, handler) {
-    this.metadata = metadata;
-    this.handler = handler;
-  }
-  async run(ctx, input) {
-    return this.handler(ctx, input);
-  }
-}
-function workflow(metadata, handler) {
-  return new LibrettoWorkflow(metadata, handler);
-}
-// Annotate the CommonJS export names for ESM import in node:
-0 && (module.exports = {
-  LIBRETTO_WORKFLOW_BRAND,
-  LibrettoWorkflow,
-  workflow
-});
@@ -1,21 +0,0 @@
-import { Page } from 'playwright';
-import { MinimalLogger } from '../logger/logger.cjs';
-
-declare const LIBRETTO_WORKFLOW_BRAND: unique symbol;
-type LibrettoWorkflowMetadata = {};
-type LibrettoWorkflowContext<S = {}> = {
-    page: Page;
-    logger: MinimalLogger;
-    services: S;
-};
-type LibrettoWorkflowHandler<Input = unknown, Output = unknown, S = {}> = (ctx: LibrettoWorkflowContext<S>, input: Input) => Promise<Output>;
-declare class LibrettoWorkflow<Input = unknown, Output = unknown, S = {}> {
-    readonly [LIBRETTO_WORKFLOW_BRAND] = true;
-    readonly metadata: LibrettoWorkflowMetadata;
-    private readonly handler;
-    constructor(metadata: LibrettoWorkflowMetadata, handler: LibrettoWorkflowHandler<Input, Output, S>);
-    run(ctx: LibrettoWorkflowContext<S>, input: Input): Promise<Output>;
-}
-declare function workflow<Input = unknown, Output = unknown, S = {}>(metadata: LibrettoWorkflowMetadata, handler: LibrettoWorkflowHandler<Input, Output, S>): LibrettoWorkflow<Input, Output, S>;
-
-export { LIBRETTO_WORKFLOW_BRAND, LibrettoWorkflow, type LibrettoWorkflowContext, type LibrettoWorkflowHandler, type LibrettoWorkflowMetadata, workflow };
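Annotation on the removed workflow module: the brand is created with `Symbol.for`, which reads from the global symbol registry, so every loaded copy of the module resolves the same symbol. The sketch below shows how such a brand could be checked at runtime. It is an illustration only: `isLibrettoWorkflow` is a hypothetical helper that does not appear in this diff, and the class is a trimmed re-declaration of the removed one with the context type reduced to `unknown`.

```typescript
// Same registry key as the removed module, so the symbol is shared process-wide.
const LIBRETTO_WORKFLOW_BRAND = Symbol.for("libretto.workflow");

type Metadata = Record<string, unknown>;
type Handler<Input, Output> = (ctx: unknown, input: Input) => Promise<Output>;

// Trimmed re-declaration of the removed class (context type simplified).
class LibrettoWorkflow<Input = unknown, Output = unknown> {
  readonly [LIBRETTO_WORKFLOW_BRAND] = true;
  readonly metadata: Metadata;
  private readonly handler: Handler<Input, Output>;

  constructor(metadata: Metadata, handler: Handler<Input, Output>) {
    this.metadata = metadata;
    this.handler = handler;
  }

  run(ctx: unknown, input: Input): Promise<Output> {
    return this.handler(ctx, input);
  }
}

// Hypothetical guard: checks the brand instead of using `instanceof`,
// so it also accepts instances created by another copy of the module.
function isLibrettoWorkflow(value: unknown): value is LibrettoWorkflow {
  return (
    typeof value === "object" &&
    value !== null &&
    Reflect.get(value, LIBRETTO_WORKFLOW_BRAND) === true
  );
}
```

A brand check like this is one plausible reason to prefer a registered symbol over `instanceof`; whether the CLI actually validates exports this way is not visible in these hunks.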
@@ -1,223 +0,0 @@
-# Code Generation Rules
-
-These rules apply when generating production TypeScript files from interactive browser sessions. Read this file before writing any production code.
-
-## Workflow File Structure
-
-Generated files must export a `workflow()` instance so they can be run via `npx libretto run <file> <exportName>`. Import `workflow` and its types from `"libretto"`:
-
-```typescript
-import { workflow, pause, type LibrettoWorkflowContext } from "libretto";
-
-type Input = {
-  // Define the expected input shape — passed via --params JSON
-  query: string;
-  maxResults?: number;
-};
-
-type Output = {
-  // Define what the workflow returns
-  results: Array<{ name: string; value: string }>;
-};
-
-export const myWorkflow = workflow<Input, Output>(
-  {},
-  async (ctx, input): Promise<Output> => {
-    const { page } = ctx;
-
-    // workflow logic here — use ctx.page, ctx.logger, ctx.services
-    await page.goto("https://example.com");
-    // ...
-
-    return { results: [] };
-  },
-);
-```
-
-**Key points:**
-
-- The named export (e.g., `myWorkflow`) is what you pass as the second arg to `npx libretto run ./file.ts myWorkflow`
-- `ctx` provides `page`, `logger`, and `services` (generic, default `{}`)
-- `input` comes from `--params '{"query":"foo"}'` or `--params-file params.json` on the CLI
-- If the site requires a saved login session, pass `--auth-profile <domain>` to the CLI (created via `npx libretto save <domain>`)
-- Use `await pause()` (imported from `"libretto"`) to pause the workflow for debugging. It is a no-op in production.
-- The browser is launched and closed automatically by the CLI — do not launch or close it in the handler
-
-## Passing Application Dependencies via Services
-
-Use the third generic on `workflow<Input, Output, Services>` to inject
-dependencies that exist in your application but not in libretto's runtime
-(DB transactions, API clients, caches, etc.):
-
-```typescript
-import { type Transaction } from "./db";
-
-type MyServices = { tx?: Transaction };
-
-export const myWorkflow = workflow<Input, Output, MyServices>(
-  {},
-  async (ctx, input) => {
-    if (ctx.services.tx) {
-      await ctx.services.tx.insert(/* ... */);
-    } else {
-      ctx.logger.info("No DB transaction — skipping write");
-    }
-    // ... browser automation ...
-  },
-);
-```
-
-In production, the caller passes services when invoking `.run()`:
-
-```typescript
-await myWorkflow.run(
-  { page, logger, services: { tx } },
-  input,
-);
-```
-
-When running standalone via `npx libretto run`, services defaults to `{}`,
-so mark fields optional for anything unavailable in that context.
-
-## Playwright Locators for DOM Interaction
-
-Generated code must use Playwright locator APIs for all DOM interactions. Do not use `page.evaluate()` with `document.querySelector`, `querySelectorAll`, `textContent`, `click()`, or other DOM APIs when a Playwright locator can do the same thing.
-
-During the interactive `exec` phase, `page.evaluate` is fine for quick prototyping. In generated production code, translate those patterns into Playwright locators.
-
-### Translation Table
-
-| Operation | Interactive (`exec`) | Production file |
-| ---------------- | ----------------------------------------------------------- | ---------------------------------------------------------------------- |
-| Click | `page.evaluate(() => document.getElementById('x').click())` | `page.locator('#x').click()` |
-| Check state | `page.evaluate(() => el.checked)` | `page.locator('#x').isChecked()` |
-| Read text | `page.evaluate(() => el.textContent)` | `page.locator('#x').textContent()` |
-| Read all text | `querySelectorAll(...).map(e => e.textContent)` | `page.locator('.items').allTextContents()` |
-| Element position | `el.getBoundingClientRect()` | `page.locator('#x').boundingBox()` |
-| Inline styles | `el.style.top` | `page.locator('#x').getAttribute('style')` |
-| Count elements | `querySelectorAll(...).length` | `page.locator('.items').count()` |
-| Select dropdown | `selectEl.value = '...'` | `page.locator('select').selectOption('...')` |
-| Iterate elements | `querySelectorAll(...).forEach(...)` | `const items = await locator.all(); for (const item of items) { ... }` |
-| Scoped query | `parent.querySelector('.child')` | `parentLocator.locator('.child').textContent()` |
-| Batch extraction | `querySelectorAll('.item').forEach(e => { ... })` | `for (const item of await locator.all()) { const text = await item.locator('.text').textContent(); ... }` |
-
-### Anti-Patterns
-
-These patterns come up frequently during interactive sessions and should not carry over into production code:
-
-```typescript
-// DON'T — batch-read via evaluate string
-const data = await page.evaluate(`(() => {
-  const posts = document.querySelectorAll('.post');
-  return Array.from(posts).map(p => ({
-    name: p.querySelector('.name')?.textContent,
-    content: p.querySelector('.content')?.textContent,
-  }));
-})()`);
-
-// DO — Playwright locators with a loop
-const posts = await page.locator('.post').all();
-for (const post of posts) {
-  const name = await post.locator('.name').textContent();
-  const content = await post.locator('.content').textContent();
-}
-```
-
-```typescript
-// DON'T — evaluate to count elements
-const count = await el.evaluate(`(el) => el.querySelectorAll('.item').length`);
-
-// DO
-const count = await el.locator('.item').count();
-```
-
-```typescript
-// DON'T — evaluate to read scoped text
-const text = await post.evaluate(
-  `(el) => el.querySelector('[data-view-name="foo"]')?.textContent`
-);
-
-// DO
-const text = await post.locator('[data-view-name="foo"]').textContent();
-```
-
-### When `page.evaluate()` Is Acceptable
-
-Use `page.evaluate()` only for operations that have no Playwright locator equivalent:
-
-1. **Browser-native APIs** — `getComputedStyle()`, `window.*` globals, `document.cookie`, scroll position
-2. **In-browser `fetch()` calls** — making HTTP requests from the browser context
-3. **Parsing operations** — using `DOMParser` to parse HTML/XML strings inside the browser
-
-A quick test: if the evaluate body contains `querySelector`, `querySelectorAll`, `textContent`, `click()`, `getAttribute()`, or iterates DOM elements, it should be rewritten with Playwright locators.
-
-When `page.evaluate()` is used for the acceptable cases above, keep the logic self-contained and return JSON-serializable values:
-
-```typescript
-const data = (await page.evaluate(`(() => {
-  const style = getComputedStyle(document.documentElement);
-  return style.getPropertyValue('--brand-color');
-})()`)) as string;
-```
-
-Do not rely on broad DOM querying inside `page.evaluate()` for production flows when Playwright locators can express the same interaction.
-
-## Network Request Methods
-
-When codifying network-based data extraction or form submissions, wrap `page.evaluate(() => fetch(...))` calls in typed methods on a shared API client class:
-
-```typescript
-class ApiClient {
-  constructor(private page: Page) {}
-
-  private async apiFetch(
-    url: string,
-    options?: { method?: string; body?: string },
-  ): Promise<string> {
-    return await this.page.evaluate(
-      async ({ url, method, body }) => {
-        const init: RequestInit = { method: method ?? "GET" };
-        if (body) {
-          init.headers = {
-            "Content-Type": "application/x-www-form-urlencoded",
-          };
-          init.body = body;
-        }
-        const response = await fetch(url, init);
-        if (!response.ok) throw new Error(`${response.status} for ${url}`);
-        return await response.text();
-      },
-      { url, method: options?.method, body: options?.body },
-    );
-  }
-
-  async fetchReferralList(status: string): Promise<Referral[]> {
-    const raw = await this.apiFetch(`/api/referrals?status=${status}`);
-    // parse and return typed data
-  }
-}
-```
-
-One method per endpoint. No try-catch in API methods — let errors propagate to the orchestrator. Parse XML/HTML inside `page.evaluate()` with `DOMParser`. Use string expressions for `page.evaluate()` to avoid DOM type errors.
-
-## Comments
-
-Add comments throughout generated code to explain what each logical block is doing. Comments should describe **intent**, not restate the code. Group related actions under a single comment rather than commenting every line.
-
-```typescript
-// Log in with credentials
-await page.locator('#username').fill(user);
-await page.locator('#password').fill(pass);
-await page.locator('#login').click();
-
-// Extract author and content from each feed post
-const posts = await page.locator('.post').all();
-for (const post of posts) {
-  const name = await post.locator('.name').textContent();
-  const content = await post.locator('.content').textContent();
-}
-```
-
-## Type Checking
-
-The generated file must pass `npx tsc --noEmit` before it's considered done. If there are type errors around DOM access, prefer locator APIs first, then use focused `page.evaluate()` only for browser-native APIs.
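Annotation on the removed code-generation rules: the "quick test" for `page.evaluate()` bodies is mechanical enough to sketch as a plain string check. The names `DOM_ACCESS_TOKENS` and `shouldUseLocator` are hypothetical, and the token list is taken from the rule text. This is a rough heuristic under those assumptions; it does not cover the "iterates DOM elements" case, which would need real parsing.

```typescript
// Substrings named by the quick test; any hit suggests a locator could do the job.
// "querySelector" also matches "querySelectorAll".
const DOM_ACCESS_TOKENS: readonly string[] = [
  "querySelector",
  "textContent",
  ".click(",
  ".getAttribute(",
];

// Hypothetical helper: returns true when an evaluate body looks like DOM querying
// that the rules say should be rewritten with Playwright locators.
function shouldUseLocator(evaluateBody: string): boolean {
  return DOM_ACCESS_TOKENS.some((token) => evaluateBody.includes(token));
}
```

Applied to the examples in the rules, the counting body `(el) => el.querySelectorAll('.item').length` is flagged, while the `getComputedStyle` body from the acceptable-use section passes.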
@@ -1,174 +0,0 @@
-# Integration Approach Selection Guide
-
-**Purpose:** You are connected to a live Chrome session on a target website. Your job is to probe the site for bot detection measures, assess its security posture, and determine the best integration strategy for data extraction. All strategies use Playwright for browser control — the question is what to **prioritize** for data capture: in-browser fetch calls, passive network interception, or DOM extraction.
-
-After completing the probes below, produce a **Site Assessment Summary** (see the output format at the end of this document).
-
----
-
-## Probing the Site
-
-Run these probes to build a picture of the site's detection posture. The examples below are starting points — use your judgment to investigate further based on what you find. Sites may use detection methods not listed here.
-
-### Probe 1: Bot Protection Services & Security Signals
-
-Look for signs that the site uses bot protection — either a third-party service or custom detection. There is no complete list of indicators; these are common examples.
-
-**Cookies to look for (examples, not exhaustive):**
-
-| Cookie Pattern | Associated Service |
-|---|---|
-| `_abck` | Akamai Bot Manager |
-| `_px*` | PerimeterX (HUMAN) |
-| `datadome` | DataDome |
-| `cf_clearance` | Cloudflare |
-| `_imp_apg_r_*` | Shape Security (F5) |
-| `x-kpsdk-*` | Kasada |
-
-But don't just check this list. Examine **all** cookies on the page — look for any cookies with obfuscated names, telemetry-related prefixes, or values that look like fingerprint hashes or encrypted tokens. Unknown security cookies are still security cookies.
-
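Annotation on the cookie table above: the patterns can be matched against cookie names with a small lookup. This is a sketch under two assumptions: a trailing `*` in the table means "prefix match", and the helper names (`matchesPattern`, `detectKnownServices`) are hypothetical. As the guide itself stresses, an empty result does not mean the site is unprotected.

```typescript
// Patterns and services copied from the table in the removed guide.
const KNOWN_COOKIE_PATTERNS: ReadonlyArray<readonly [string, string]> = [
  ["_abck", "Akamai Bot Manager"],
  ["_px*", "PerimeterX (HUMAN)"],
  ["datadome", "DataDome"],
  ["cf_clearance", "Cloudflare"],
  ["_imp_apg_r_*", "Shape Security (F5)"],
  ["x-kpsdk-*", "Kasada"],
];

// Assumption: a trailing "*" is a prefix wildcard; anything else is an exact name.
function matchesPattern(cookieName: string, pattern: string): boolean {
  return pattern.endsWith("*")
    ? cookieName.startsWith(pattern.slice(0, -1))
    : cookieName === pattern;
}

// Returns the distinct services whose patterns match any of the cookie names,
// in the order they are first encountered.
function detectKnownServices(cookieNames: string[]): string[] {
  const found = new Set<string>();
  for (const name of cookieNames) {
    for (const [pattern, service] of KNOWN_COOKIE_PATTERNS) {
      if (matchesPattern(name, pattern)) found.add(service);
    }
  }
  return Array.from(found);
}
```

With Playwright, the input could come from `(await page.context().cookies()).map((c) => c.name)`; cookies that match nothing here still deserve a manual look.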
-**Global variables to check (examples):**
-
-```js
-// Known telemetry globals — but probe broadly, not just these
-window._pxAppId // PerimeterX
-window.bmak // Akamai
-window.ddjskey // DataDome
-```
-
-Also examine the page's scripts: look at the first `<script>` tags in the document source, check what external domains scripts load from (e.g., `*.akamaized.net`, `*.perimeterx.net`, `*.datadome.co`, `*.kasada.io`). Bot protection scripts are typically injected before any application code.
-
-**Challenge pages:**
-
-Check if the page is showing a challenge or interstitial instead of real content — "Checking your browser...", CAPTCHA iframes, blank pages with only a spinner. These indicate active bot protection that has already been triggered.
-
-**General guidance:** The goal is to determine whether the site has bot protection and roughly how aggressive it is. Don't limit yourself to known signatures — look at the overall page behavior, unusual scripts, and anything that seems like security telemetry.
-
-### Probe 2: Fetch / XHR Interception
-
-Check whether the site has monkey-patched `window.fetch` or `XMLHttpRequest`. If it has, making your own fetch calls from `page.evaluate()` is risky because the site can inspect call stacks and detect calls that don't originate from its own code.
-
-```js
-// Check if fetch has been wrapped
-window.fetch.toString()
-// Native: "function fetch() { [native code] }"
-// Patched: shows actual JavaScript source
-
-// Check XMLHttpRequest
-XMLHttpRequest.prototype.open.toString()
-
-// Check property descriptors for tampering
-Object.getOwnPropertyDescriptor(window, 'fetch')
-// Normal: { value: ƒ, writable: true, enumerable: true, configurable: true }
-
-// Proxy-based wrapping is harder to detect — native fetch has no prototype
-window.fetch.hasOwnProperty('prototype') // true may indicate a Proxy wrapper
-```
-
-**Important:** Some sites use `Proxy` to wrap fetch, which makes `toString()` still return `"[native code]"`. The prototype check is a heuristic, not definitive. If you see any sign of fetch interception, treat it as patched.
-
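Annotation on Probe 2: the checks above can be bundled into one probe and one decision function. This is a minimal sketch; `FetchProbe`, `FETCH_PROBE_EXPRESSION`, and `looksPatched` are hypothetical names, and the decision rule simply encodes the guide's advice that any sign of interception counts as patched.

```typescript
// Shape of the data gathered inside the browser.
type FetchProbe = {
  fetchSource: string; // window.fetch.toString()
  hasOwnPrototype: boolean; // whether fetch has its own "prototype" property
};

// String expression intended for page.evaluate(); as a string it needs no DOM typings.
const FETCH_PROBE_EXPRESSION = `(() => ({
  fetchSource: window.fetch.toString(),
  hasOwnPrototype: Object.prototype.hasOwnProperty.call(window.fetch, "prototype"),
}))()`;

// Conservative heuristic: non-native source text OR an own prototype means "patched".
function looksPatched(probe: FetchProbe): boolean {
  const reportsNativeSource = probe.fetchSource.includes("[native code]");
  return !reportsNativeSource || probe.hasOwnPrototype;
}
```

In a session this might be used as `looksPatched((await page.evaluate(FETCH_PROBE_EXPRESSION)) as FetchProbe)`. A `false` result is not proof that fetch is untouched, since the guide notes these checks are heuristics.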
70
|
-
### Probe 3: Behavioral Monitoring
|
|
71
|
-
|
|
72
|
-
Look for signs that the site collects behavioral telemetry (mouse movements, keystrokes, scroll patterns). Heavy monitoring means you should use natural, human-like interaction patterns when driving the UI.
|
|
73
|
-
|
|
74
|
-
Things to check:
|
|
75
|
-
- Unusually large numbers of event listeners on `document` or `body` for `mousemove`, `keydown`, `scroll`, `touchstart`, `click`
|
|
76
|
-
- Known telemetry collection scripts
|
|
77
|
-
- `MutationObserver` instances watching the DOM for injected elements
|
|
78
|
-
- `requestAnimationFrame` loops that aren't tied to visible animations
|
|
79
|
-
|
|
80
|
-
If you're in a DevTools context, `getEventListeners(document)` is the quickest way to assess this. Otherwise, use heuristics — heavy behavioral monitoring usually correlates with enterprise bot protection from Probe 1.
|
|
81
|
-
|
|
82
|
-
---
|
|
83
|
-
|
|
84
|
-
## Choosing a Data Capture Strategy
|
|
85
|
-
|
|
86
|
-
Every integration uses Playwright to control the browser. The question is what to **prioritize** for getting data out. In practice, most integrations use a mix — you'll always need some Playwright interaction for navigation, login flows, cookie consent, etc. The strategies below describe what to lean on for the core data extraction.
|
|
87
|
-
|
|
88
|
-
### Strategy A: Prioritize `page.evaluate(fetch(...))`
|
|
89
|
-
|
|
90
|
-
Make fetch calls directly from within the browser's JavaScript context. The requests share the browser's TLS fingerprint, cookies, and origin — they look identical to requests the site's own JS would make.
|
|
91
|
-
|
|
92
|
-
**When to prioritize this:**
|
|
93
|
-
- No enterprise bot protection detected
|
|
94
|
-
- `fetch` is NOT monkey-patched
|
|
95
|
-
- You've identified API endpoints that return the data you need
|
|
96
|
-
- You need data that requires many API calls (deep pagination, bulk queries) where driving the UI would be slow
|
|
97
|
-
|
|
98
|
-
**Why:** Maximum control and efficiency. You call exactly the endpoints you want with the parameters you want, skip UI rendering, and get structured JSON back. On sites without aggressive detection, this is the fastest and cleanest approach.
|
|
99
|
-
|
|
100
|
-
**Risk:** If the site monitors fetch call stacks (Layer 4 detection), your calls will be flagged because they don't originate from the site's bundled code. This is uncommon but exists on high-security sites.
|
|
101
|
-
|
|
102
|
-
**You'll still use Playwright for:** Initial navigation, login/auth flows, cookie consent, and any UI interactions needed to establish session state before making fetch calls.
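
A minimal sketch of Strategy A, assuming the endpoint paginates with a `page` query parameter. The `EvaluatingPage` interface is a structural stand-in for the part of Playwright's `Page` used here, and `fetchPagesInBrowser`, the endpoint, and the parameter name are hypothetical.

```typescript
// Minimal structural stand-in for the part of Playwright's Page used here,
// so the sketch compiles without importing playwright.
interface EvaluatingPage {
  evaluate<R, A>(pageFunction: (arg: A) => R | Promise<R>, arg: A): Promise<R>;
}

// Hypothetical helper: fetch `pageCount` pages of JSON from inside the browser
// context. `endpoint` and the `page` query parameter are placeholders.
async function fetchPagesInBrowser(
  page: EvaluatingPage,
  endpoint: string,
  pageCount: number,
): Promise<unknown[]> {
  const results: unknown[] = [];
  for (let n = 1; n <= pageCount; n++) {
    const url = `${endpoint}?page=${n}`;
    // The callback runs in the page, so the request carries the browser's
    // cookies, origin, and TLS fingerprint.
    const body = await page.evaluate(async (u: string) => {
      const res = await fetch(u, { credentials: "include" });
      if (!res.ok) throw new Error(`HTTP ${res.status} for ${u}`);
      return res.json() as Promise<unknown>;
    }, url);
    results.push(body);
  }
  return results;
}
```

Requests are issued sequentially here to keep the request pattern modest; whether parallel requests are acceptable depends on the site.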
|
|
103
|
-
|
|
104
|
-
### Strategy B: Prioritize `page.on('response', ...)` (Passive Interception)
|
|
105
|
-
|
|
106
|
-
Listen to network responses that the browser naturally makes as you navigate. You don't make any extra requests — you capture data flowing through the site's own API calls.
|
|
107
|
-
|
|
108
|
-
**When to prioritize this:**
|
|
109
|
-
- Enterprise bot protection is detected
|
|
110
|
-
- `fetch` IS monkey-patched
|
|
111
|
-
- The site's normal UI flow triggers API calls that return the data you need
|
|
112
|
-
- You want to minimize detection risk as much as possible
|
|
113
|
-
|
|
114
|
-
**Why:** Zero additional network risk. The requests that happen are the ones the site's own code triggers. You're just listening. No anomalous call stacks, no unexpected request patterns, no extra fetch calls for monitoring to flag.
|
|
115
|
-
|
|
116
|
-
**Trade-off:** You only get data the page naturally loads. If you need page 50 of results, you have to click "next" 49 times via Playwright. You must set up listeners before the navigation that triggers the requests.
|
|
117
|
-
|
|
118
|
-
**You'll still use Playwright for:** All navigation and interaction to trigger the data-bearing API calls, plus any data that isn't available via intercepted responses (DOM-only content).
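
A minimal sketch of Strategy B. The `ResponseLike` and `ObservablePage` interfaces are structural stand-ins for the parts of Playwright's `Response` and `Page` used here, and `captureJsonResponses` and the URL-fragment filter are hypothetical.

```typescript
// Minimal structural stand-ins for the parts of Playwright's Page and
// Response used here, so the sketch compiles without importing playwright.
interface ResponseLike {
  url(): string;
  ok(): boolean;
  json(): Promise<unknown>;
}
interface ObservablePage {
  on(event: "response", listener: (response: ResponseLike) => void): unknown;
}

// Hypothetical helper: record JSON bodies of successful responses whose URL
// contains `urlFragment`. Call it BEFORE the navigation that triggers them.
function captureJsonResponses(page: ObservablePage, urlFragment: string): unknown[] {
  const captured: unknown[] = [];
  page.on("response", (response) => {
    if (!response.url().includes(urlFragment) || !response.ok()) return;
    response
      .json()
      .then((body) => {
        captured.push(body);
      })
      .catch(() => {
        // Body was not JSON or was unavailable; skip it.
      });
  });
  return captured;
}
```

The returned array fills in asynchronously as the site's own requests complete, so read it only after the navigation or click that triggers them has settled.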
|
|
119
|
-
|
|
120
|
-
### Strategy C: Prioritize Playwright DOM Extraction
|
|
121
|
-
|
|
122
|
-
Extract data directly from the rendered page using selectors and `page.evaluate()` to read DOM content.
|
|
123
|
-
|
|
124
|
-
**When to prioritize this:**
|
|
125
|
-
- Data is server-rendered (no JSON API calls observed)
|
|
126
|
-
- The site doesn't expose the data you need via any API
|
|
127
|
-
- You need visual/layout information that only exists in the DOM
|
|
128
|
-
- As a fallback when Strategies A and B can't get specific pieces of data
|
|
129
|
-
|
|
130
|
-
**Why:** Works regardless of the site's API architecture. If the data is visible on the page, you can extract it.
|
|
131
|
-
|
|
132
|
-
**Trade-off:** Slower, fragile against DOM changes, and you only get data the UI renders (which may be less than what API responses contain).
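
A minimal sketch of Strategy C. The `LocatorLike` and `QueryablePage` interfaces are structural stand-ins for the parts of Playwright's `Locator` and `Page` used here, and `extractRowTexts` and any selector passed to it are hypothetical.

```typescript
// Minimal structural stand-ins for the parts of Playwright's Page and
// Locator used here, so the sketch compiles without importing playwright.
interface LocatorLike {
  allTextContents(): Promise<string[]>;
}
interface QueryablePage {
  locator(selector: string): LocatorLike;
}

// Hypothetical helper: read the text of every element matching `rowSelector`,
// collapsing whitespace and dropping empty entries.
async function extractRowTexts(page: QueryablePage, rowSelector: string): Promise<string[]> {
  const raw = await page.locator(rowSelector).allTextContents();
  return raw
    .map((text) => text.replace(/\s+/g, " ").trim())
    .filter((text) => text.length > 0);
}
```

Keeping the selector as a parameter confines the fragility to one place: when the site's markup changes, only the selector needs updating.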
|
|
133
|
-
|
|
134
|
-
---
|
|
135
|
-
|
|
136
|
-
## Decision Summary
|
|
137
|
-
|
|
138
|
-
| Site Profile | Primary Strategy | Supplement With |
|
|
139
|
-
|---|---|---|
|
|
140
|
-
| No bot protection, fetch not patched | **A** (`page.evaluate(fetch)`) | Playwright for navigation/auth |
|
|
141
|
-
| No bot protection, fetch IS patched | **B** (`page.on('response', ...)`) | Playwright for navigation; DOM extraction as fallback |
|
|
142
|
-
| Bot protection detected, fetch not patched | **B** (`page.on('response', ...)`) | Playwright for all navigation; cautious use of `page.evaluate(fetch)` only if needed |
|
|
143
|
-
| Bot protection detected, fetch IS patched | **B** (`page.on('response', ...)`) | Playwright for all navigation; DOM extraction as fallback |
|
|
144
|
-
| Server-rendered content (no API calls) | **C** (DOM extraction) | Playwright for all interaction |
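
The table can be read as a small decision function. This is a sketch only: `SiteProfile` and `primaryStrategy` are hypothetical names, and it assumes the server-rendered row takes precedence over the other rows. It returns the primary strategy and leaves the "Supplement With" column to the reader.

```typescript
type Strategy = "A" | "B" | "C";

interface SiteProfile {
  botProtectionDetected: boolean;
  fetchPatched: boolean;
  // True when no JSON API calls were observed (content is server-rendered).
  serverRendered: boolean;
}

// Hypothetical helper mirroring the table above.
function primaryStrategy(profile: SiteProfile): Strategy {
  // Assumption: with no API calls to use, DOM extraction is the only option.
  if (profile.serverRendered) return "C";
  // Either signal alone is enough to prefer passive interception.
  if (profile.botProtectionDetected || profile.fetchPatched) return "B";
  return "A";
}
```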
|
|
145
|
-
|
|
146
|
-
---
|
|
147
|
-
|
|
148
|
-
## Output: Site Assessment Summary
|
|
149
|
-
|
|
150
|
-
After running the probes, produce a summary in this format. **Do NOT include a final strategy recommendation.** The security assessment determines what's *safe to use*, not what will *work*. Present this to the user for input, then use the safe approaches as you build the integration — adapting if specific endpoints don't work as expected (see "Handling Approach Mismatches" in SKILL.md).
|
|
151
|
-
|
|
152
|
-
```
|
|
153
|
-
## Site Assessment: [site URL]
|
|
154
|
-
|
|
155
|
-
### Bot Detection Profile
|
|
156
|
-
- **Enterprise bot protection:** [None detected / Detected — describe what you found (service name if identifiable, cookies, scripts, telemetry globals)]
|
|
157
|
-
- **Fetch/XHR interception:** [Native (not patched) / Patched — describe what you found]
|
|
158
|
-
- **Behavioral monitoring:** [None detected / Light / Heavy — describe indicators]
|
|
159
|
-
- **Challenge pages:** [None / Present — describe type (CAPTCHA, interstitial, etc.)]
|
|
160
|
-
- **Overall security posture:** [None / Low / Moderate / High / Very High]
|
|
161
|
-
|
|
162
|
-
### API Surface
|
|
163
|
-
- **API calls observed:** [List key endpoints discovered, or "None — content appears server-rendered"]
|
|
164
|
-
- **Data format:** [JSON / GraphQL / HTML fragments / Other — note if any responses use proprietary/binary formats]
|
|
165
|
-
- **Pagination:** [Describe how pagination works if applicable]
|
|
166
|
-
|
|
167
|
-
### Safe Approaches
|
|
168
|
-
- **`page.evaluate(fetch(...))`:** [Safe / Unsafe — brief rationale based on fetch patching, bot detection, etc.]
|
|
169
|
-
- **`page.on('response', ...)`:** [Viable / Not viable — note if response formats are parseable or proprietary]
|
|
170
|
-
- **DOM extraction:** [Always available as fallback]
|
|
171
|
-
- **Interaction notes:** [Any behavioral precautions, e.g. natural mouse movements, typing delays]
|
|
172
|
-
```
|
|
173
|
-
|
|
174
|
-
**Important:** This assessment tells you which tools are in your toolbox. Present it to the user, get their input, then start building the integration using the safe approaches.
|