imprint-mcp 0.4.6 → 0.4.7

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (30) hide show
  1. package/README.md +4 -4
  2. package/examples/google-flights/README.md +2 -0
  3. package/examples/google-flights/_shared/flights_request.ts +10 -4
  4. package/examples/google-flights/get_flight_booking_details/index.ts +5 -2
  5. package/examples/google-flights/get_flight_booking_details/parser.ts +8 -0
  6. package/examples/google-flights/get_flight_booking_details/workflow.json +5 -2
  7. package/examples/google-flights/get_flight_calendar_prices/index.ts +5 -2
  8. package/examples/google-flights/get_flight_calendar_prices/parser.ts +8 -4
  9. package/examples/google-flights/get_flight_calendar_prices/workflow.json +5 -2
  10. package/examples/google-flights/lookup_airport/index.ts +3 -0
  11. package/examples/google-flights/lookup_airport/parser.ts +8 -1
  12. package/examples/google-flights/lookup_airport/workflow.json +3 -0
  13. package/examples/google-flights/search_flights/index.ts +63 -8
  14. package/examples/google-flights/search_flights/parser.ts +10 -0
  15. package/examples/google-flights/search_flights/request-transform.ts +45 -0
  16. package/examples/google-flights/search_flights/workflow.json +63 -8
  17. package/package.json +1 -1
  18. package/prompts/build-planning.md +1 -1
  19. package/prompts/compile-agent.md +5 -3
  20. package/prompts/prereq-builder.md +2 -1
  21. package/src/imprint/backend-ladder.ts +436 -43
  22. package/src/imprint/cdp-browser-fetch.ts +176 -6
  23. package/src/imprint/cdp-jar-cache.ts +105 -10
  24. package/src/imprint/compile-tools.ts +2 -2
  25. package/src/imprint/mcp-server.ts +152 -65
  26. package/src/imprint/probe-backends.ts +41 -10
  27. package/src/imprint/runtime.ts +24 -12
  28. package/src/imprint/stealth-fetch.ts +71 -0
  29. package/src/imprint/stealth-token-cache.ts +38 -1
  30. package/src/imprint/types.ts +45 -0
@@ -52,7 +52,7 @@ Follow these steps to compile the session:
52
52
  - *Session-scoped state* (minted once per page load, reused across requests): add a bootstrap capture with `browser_bootstrap` capability. Pick the `source` based on where the value actually lives in the recording — these are not interchangeable:
53
53
  - **Response header** (`source: 'response_header'`, `header: '<exact name>'`): the bootstrap GET's HTTP response carries the token as a header. Enterprise CSRF tokens, anti-replay tokens, and many app-minted page nonces are returned this way. **First check** — search the bootstrap response headers for the recorded token before reaching for any HTML/DOM source. If the token appears in `requests[0].response.headers`, this is the only correct source. Do NOT synthesize an `_shared/page-tokens.ts` HTML-regex helper for it; the body will not contain the value and the regex will silently miss.
54
54
 
55
- **Capture-source cross-check (verifier-enforced).** Before you declare any `required` capture, locate the matching recorded request in the session and confirm the declared source actually carries the recorded value: `response_header` → the header must exist in `response.headers`; `cookie` → `response.headers['set-cookie']` must define that cookie name; `html_regex` / `text_regex` → the pattern must match the recorded response body. The verifier rejects `done()` if the declared source does not produce a value in the recording, and it explicitly classifies a runtime `STATE_MISSING` from a declared capture as a workflow-correctness error (not infra) so the tool cannot ship waived. Picking the wrong source is the most common cause of "API rungs all silently fall to playbook" — measure twice.
55
+ **Capture-source cross-check (verifier-enforced).** Before you declare any `required` capture, locate the matching recorded request in the session and confirm the declared source actually carries the recorded value: `response_header` → the header must exist in `response.headers`; `cookie` → `response.headers['set-cookie']` must define that cookie name; `html_regex` / `text_regex` → the pattern must match the recorded response body; `request_header` / `request_url_regex` / `request_body_regex` → a browser-observed request matching `method`/`urlPattern` must carry the value in its headers, URL, or body. The verifier rejects `done()` if the declared source does not produce a value in the recording, and it explicitly classifies a runtime `STATE_MISSING` from a declared capture as a workflow-correctness error (not infra) so the tool cannot ship waived. Picking the wrong source is the most common cause of "API rungs all silently fall to playbook" — measure twice.
56
56
 
57
57
  **Referenced-capture cross-check — applies even to `required: false` captures (verifier-enforced).** If ANY request hard-references a capture via `${state.X}` in a header/body/url, that capture is effectively required regardless of its `required` flag, and the verifier checks its `html_regex`/`text_regex` pattern against EVERY recorded HTML page for the site (not just the bootstrap URL's own response — the bootstrap page may not even be in the recording). If the pattern matches no recorded page, `done()` is rejected (the runtime would `STATE_MISSING` the whole request). **Write the regex against the token as it ACTUALLY appears in the recorded HTML — read the recorded page first.** Common pitfall: a token embedded as `mUtil.createSecureCookie("Csrf-token", "<hex>")` is NOT matched by a pattern like `[Cc]srf[^"']{0,24}['"]([0-9a-f]{48,})['"]` because the `", "` separator between the cookie name and value falls between the two quotes — anchor on the real structure instead, e.g. `createSecureCookie\("Csrf-token",\s*"([0-9a-f]+)"`. When the live call would burn an anti-bot `.act`, the verifier SKIPS the live test entirely if a referenced capture can't resolve — so a wrong regex here costs you a whole verification cycle with no live signal. Get it right against the recording first.
58
58
 
@@ -60,6 +60,7 @@ Follow these steps to compile the session:
60
60
  - **HTML body** (`source: 'html_regex'`): the token is embedded in a `<script>` block, meta tag, or inline JSON inside the HTML. Use this only after confirming the value actually appears in the response body.
61
61
  - **DOM** (`source: 'dom_attribute'` / `source: 'dom_text'`): the token is rendered into a specific element by the page's JS — use a stable selector.
62
62
  - **Cookie / storage** (`source: 'cookie'` / `'local_storage'` / `'session_storage'`): the token is persisted client-side after bootstrap.
63
+ - **Browser-observed request** (`source: 'request_header'` / `'request_url_regex'` / `'request_body_regex'`): the bootstrap page's own XHR/fetch request carries the token, request id, or complete POST envelope. Use `method`, `urlPattern`, and `mode` to select the observed request, then extract from its header, URL, or body. This is the right source when the value is minted by page JavaScript but is absent from HTML, cookies, storage, and response headers.
63
64
  - *Per-request state* (unique per API call — nonces, request IDs, timestamps): write a `requestTransformModule` that generates fresh values.
64
65
  - *Bot-defense state* (sensor headers, fingerprints): use `stealth_bootstrap` capability.
65
66
  - **`constant`**: Identical across every pass the classifier compared — usually safe to hardcode. BUT: scrutinize high-entropy “constants” (UUIDs, JWTs, long hex/base64 strings). They may be slow-rotating tokens that happened to match across two runs taken minutes apart. If a constant looks like a token, treat it with suspicion and consider adding a bootstrap capture as a safety measure. **Exception — cross-recording corroboration.** The classifier diffs the recording against the automated replay AND against every other recording of this site (often captured hours or days apart), then keeps a value `constant` only if it never varied in any pass. A high-entropy value classified `constant` on this basis is *static infrastructure the server checks on every call*, NOT a rotating token: a GraphQL safelisting / persisted-query signature (`graphql-operation-signature`, `x-apollo-operation-id`, `x-apollo-operation-signature`), an API build/asset hash, a public app key. **Keep it verbatim** — dropping it gets the request 403'd or silently degraded to sentinel data. A genuinely rotating token could not be byte-identical across time-separated recordings; the classifier would have marked it `browser_minted`/`server_derived`. (The replay alone is unreliable here: anti-bot edges block the automated replay, so a protected header may be `constant` *purely* on cross-recording evidence — that evidence is sufficient; do not second-guess it as "high-entropy so probably rotating".)
@@ -97,7 +98,7 @@ Follow these steps to compile the session:
97
98
  - Keep headers minimal — drop bot-detection headers (Akamai fingerprints, DataDome, PerimeterX), drop browser-internal headers, keep `Content-Type`, `Origin`, `Referer` when needed
98
99
  - **CRITICAL — preserve FUNCTIONAL request headers (same principle as query params).** Beyond the standard set, the recorded request often carries headers the server *checks* on every call: anti-CSRF / anti-replay tokens (`X-Csrf-Token`, `X-XSRF-Token`, `RequestVerificationToken`, …), API keys, session/nonce headers, `X-*` app headers. These are part of the functional contract — dropping one usually makes a state-changing POST silently fail or get tarpitted, exactly like dropping a query param. For each non-bot, non-browser-internal header on the recorded request: keep it. If its value is a per-session/per-call token (high-entropy, rotates across the recording), do NOT hardcode it — capture it (`${state.NAME}` from a bootstrap/request capture) and template it. The litmus test mirrors query params: if the recorded request sent it and it isn't a bot fingerprint, the workflow request must send it too (literal if static, `${state.X}`/`${param.X}` if dynamic). A recorded state-changing POST (`*.act`, `/checkout`, `/book`, anything that mutates) that carried a CSRF/session header MUST template that header from captured state — never silently omit it.
99
100
  - **CRITICAL: Preserve ALL query parameters from the recorded URL.** Unlike HTTP headers — where you drop bot-detection fingerprints — query params are part of the API's functional contract. Even if a param value looks obfuscated or high-entropy (base64, hex, random-looking), it likely carries meaning the server checks (anti-bot tokens, session binding, A/B bucketing, obfuscated checksums). Preserve every param key: substitute the value with `${response[N].name}` or `${state.name}` if it came from an earlier response, `${param.NAME}` if user-variable, or keep the literal value if it's a static constant (like `search=false`). Missing a single query param can silently cause the API to return sentinel/degraded data rather than an error — the server may fall back to generic defaults instead of returning the actual results.
100
- - **Per-call query params (URL signing).** If a query param has a different high-entropy value on every request to the same URL path in the session, it is likely a URL signing token computed by client-side JavaScript. Do NOT hardcode the recorded value — it is per-call and will expire. Instead: use `search_response_body` to search the session's JavaScript responses (look for `.js` URLs) for the param name. The signing function is usually simple (HMAC, MD5, XOR + base64 with a static key). Once you find it, write a `requestTransformModule` (sibling to `parser.ts`) that exports `transform(method: string, url: string): string` — it takes the unsigned URL and returns the URL with the signing param appended. Set `"requestTransformModule": "./request-transform.ts"` in workflow.json. The runtime calls this function before each request.
101
+ - **Per-call query params (URL signing).** If a query param has a different high-entropy value on every request to the same URL path in the session, it is likely a URL signing token computed by client-side JavaScript. Do NOT hardcode the recorded value — it is per-call and will expire. Instead: use `search_response_body` to search the session's JavaScript responses (look for `.js` URLs) for the param name. The signing function is usually simple (HMAC, MD5, XOR + base64 with a static key). Once you find it, write a `requestTransformModule` (sibling to `parser.ts`) that exports `transform(method: string, url: string, responses: unknown[], params?: Record<string, string | number | boolean>, state?: Record<string, unknown>)` — it takes the unsigned URL and returns the URL with the signing param appended. Set `"requestTransformModule": "./request-transform.ts"` in workflow.json. The runtime calls this function before each request.
101
102
  - **Complex body construction via requestTransformModule.** When the API uses a body format where simple `${param.X}` placeholder substitution cannot correctly encode values — e.g., JSPB arrays in form-encoded fields, nested JSON strings with position-dependent escaping — write a `requestTransformModule` that constructs the body programmatically. The transform receives `params` as a 4th argument and can return an object instead of a string:
102
103
  ```typescript
103
104
  export function transform(
@@ -105,12 +106,13 @@ Follow these steps to compile the session:
105
106
  url: string,
106
107
  responses: unknown[],
107
108
  params?: Record<string, string | number | boolean>,
109
+ state?: Record<string, unknown>,
108
110
  ): { url: string; body?: string } {
109
111
  const body = buildRequestBody(params ?? {});
110
112
  return { url, body };
111
113
  }
112
114
  ```
113
- Returning a plain `string` (just the URL) still works for simple URL-signing. Use the object return when you need to build or modify the request body or headers. Do NOT invent URL query parameters as a workaround for body-encoding complexity — the server ignores unknown query params and the parameters will have no effect.
115
+ Returning a plain `string` (just the URL) still works for simple URL-signing. Use the object return when you need to build or modify the request body or headers. If the bootstrap URL itself needs a derived value, export `prepareParams(params)` from the same module; it runs before bootstrap URL substitution and can return additional primitive params. Do NOT invent URL query parameters as a workaround for body-encoding complexity — the server ignores unknown query params and the parameters will have no effect.
114
116
  - **`x-api-key` is normally NOT a credential.** It's an app-level identifier baked into the site's JavaScript — same for every visitor, not user-specific. Keep it as a literal string in the workflow. Only treat it as a credential if you can clearly see it varies per account (e.g., it appears in a `Set-Cookie` after login, or differs across sessions). The same applies to `x-channel-id`, `x-app-id`, `x-app-version`, and similar metadata headers — hardcode them.
115
117
  - **NEVER use `${env.NAME}` placeholders.** The `${env.X}` syntax exists in the runtime but is reserved for operator-level configuration, not for values you can see in the recording. If a value appears in the captured request, hardcode it. If multiple candidates in the same session use different API keys for different endpoints, hardcode each one — they are endpoint-specific app constants, not secrets. The only valid placeholder types for your workflow are `${param.NAME}`, `${credential.NAME}`, `${state.NAME}`, and `${response[N].NAME}`.
116
118
  - If the workflow chains multiple requests (request N+1 uses a value from request N's response), add an `extract` field to request N and reference it in request N+1 via `${response[N].name}`
@@ -22,7 +22,8 @@ You receive `{ site, url, module, availableDependencies, sources, implementation
22
22
  ## Output requirements by `kind`
23
23
 
24
24
  ### `request-transform`
25
- - Export a `transform` function: `transform(method: string, url: string, responses: unknown[], params?: Record<string, string | number | boolean>): string | { url: string; body?: string }`.
25
+ - Export a `transform` function: `transform(method: string, url: string, responses: unknown[], params?: Record<string, string | number | boolean>, state?: Record<string, unknown>): string | { url: string; body?: string; headers?: Record<string, string> }`.
26
+ - Optionally export `prepareParams(params)` when bootstrap URLs need derived primitive params before `${param.*}` substitution.
26
27
  - It reproduces the site's per-request signing/body logic (e.g. HMAC/MD5/CRC32 + encoding) so the regenerated value matches what the recording sent. Derive the algorithm from `sources` (and any `.js` body included there). Return the URL with the signing param appended (or `{ url, body }` when you must build the body).
27
28
  - **The verifier re-signs a recorded URL and checks your output reproduces the recorded signing param.** A no-op that returns the URL unchanged will fail.
28
29