imprint-mcp 0.4.7 → 0.4.8

Files changed (29)
  1. package/README.md +4 -4
  2. package/examples/google-flights/README.md +0 -2
  3. package/examples/google-flights/_shared/flights_request.ts +4 -10
  4. package/examples/google-flights/get_flight_booking_details/index.ts +2 -5
  5. package/examples/google-flights/get_flight_booking_details/parser.ts +0 -8
  6. package/examples/google-flights/get_flight_booking_details/workflow.json +2 -5
  7. package/examples/google-flights/get_flight_calendar_prices/index.ts +2 -5
  8. package/examples/google-flights/get_flight_calendar_prices/parser.ts +4 -8
  9. package/examples/google-flights/get_flight_calendar_prices/workflow.json +2 -5
  10. package/examples/google-flights/lookup_airport/index.ts +0 -3
  11. package/examples/google-flights/lookup_airport/parser.ts +1 -8
  12. package/examples/google-flights/lookup_airport/workflow.json +0 -3
  13. package/examples/google-flights/search_flights/index.ts +7 -62
  14. package/examples/google-flights/search_flights/request-transform.ts +0 -45
  15. package/examples/google-flights/search_flights/workflow.json +7 -62
  16. package/package.json +1 -1
  17. package/prompts/build-planning.md +1 -1
  18. package/prompts/compile-agent.md +3 -5
  19. package/prompts/prereq-builder.md +1 -2
  20. package/src/imprint/backend-ladder.ts +47 -436
  21. package/src/imprint/cdp-browser-fetch.ts +6 -176
  22. package/src/imprint/cdp-jar-cache.ts +10 -105
  23. package/src/imprint/compile-tools.ts +2 -2
  24. package/src/imprint/mcp-server.ts +65 -152
  25. package/src/imprint/probe-backends.ts +10 -41
  26. package/src/imprint/runtime.ts +12 -24
  27. package/src/imprint/stealth-fetch.ts +0 -71
  28. package/src/imprint/stealth-token-cache.ts +1 -38
  29. package/src/imprint/types.ts +0 -45
package/README.md CHANGED
@@ -169,8 +169,8 @@ When an API call gets blocked, Imprint doesn't jump to DOM replay. It escalates
169
169
  │ + API
170
170
 
171
171
  cdp-replay ~2-35s API calls run inside a live, trusted Chrome —
172
- reuses browser-observed request state and refreshes
173
- anti-bot tokens between protected POSTs
172
+ a protected POST refreshes its anti-bot token
173
+ between calls (multi-step state-changing flows)
174
174
 
175
175
  stealth-fetch ~1-12s Defeats Akamai, Cloudflare, DataDome
176
176
 
@@ -178,9 +178,9 @@ When an API call gets blocked, Imprint doesn't jump to DOM replay. It escalates
178
178
  playbook ~9s Full DOM replay — universal fallback
179
179
  ```
180
180
 
181
- The full order is `fetch → fetch-bootstrap → cdp-replay → stealth-fetch → playbook`; `auto` mode walks it and stops at the first backend that works. Workflows that declare browser-observed request captures can start at `cdp-replay`, so MCP sessions reuse the same Chrome instead of paying a cold bootstrap on each route/date.
181
+ The full order is `fetch → fetch-bootstrap → cdp-replay → stealth-fetch → playbook`; `auto` mode walks it and stops at the first backend that works.
182
182
 
183
- For bot-protected sites, `imprint probe-backends <site> --tool <toolName>` writes a `backends.json` preference cache so cron and MCP start from the known-good backend instead of rediscovering blocked rungs. Use `imprint probe-backends <site> --all` to refresh every tool in a multi-tool site; `imprint mcp status` reports stale or invalid backend caches before they quietly fall back to the default ladder. Execution-only workflow changes keep the cached backend order when the backend capability hash still matches. CDP replay records both cold and warm timings when it succeeds: a timeout-safe cold start may rank by its fast warm runtime, but a cold start above the preferred threshold stays behind cold-safe backends in durable cache order.
183
+ For bot-protected sites, `imprint probe-backends <site> --tool <toolName>` writes a `backends.json` preference cache so cron and MCP start from the known-good backend instead of rediscovering blocked rungs. Use `imprint probe-backends <site> --all` to refresh every tool in a multi-tool site; `imprint mcp status` reports stale or invalid backend caches before they quietly fall back to the default ladder. CDP replay records both cold and warm timings when it succeeds: a timeout-safe cold start may rank by its fast warm runtime, but a cold start above the preferred threshold stays behind cold-safe backends in durable cache order.
184
184
 
185
185
  Every recording compiles to *both* `workflow.json` and `playbook.yaml`, so the ladder always has a DOM fallback.
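The escalation described in this README hunk can be pictured with a short sketch. Only the rung names and their order come from the README; the types, the `walkLadder` function, and the idea of passing a cached order as an argument are hypothetical and are not the package's actual API.

```typescript
// Rung names and order are taken from the README text; everything else
// (types, function names) is hypothetical and for illustration only.
type Backend = "fetch" | "fetch-bootstrap" | "cdp-replay" | "stealth-fetch" | "playbook";

const LADDER: readonly Backend[] = [
  "fetch",
  "fetch-bootstrap",
  "cdp-replay",
  "stealth-fetch",
  "playbook",
];

type Attempt<T> = { ok: true; value: T } | { ok: false; reason: string };

interface LadderResult<T> {
  backend: Backend;
  value: T;
  skipped: Array<{ backend: Backend; reason: string }>;
}

// Walk the rungs in order and stop at the first one that succeeds.
// A real runner would be async; this sketch is synchronous for brevity.
function walkLadder<T>(
  attempt: (backend: Backend) => Attempt<T>,
  order: readonly Backend[] = LADDER,
): LadderResult<T> {
  const skipped: Array<{ backend: Backend; reason: string }> = [];
  for (const backend of order) {
    const result = attempt(backend);
    if (result.ok) return { backend, value: result.value, skipped };
    skipped.push({ backend, reason: result.reason });
  }
  throw new Error(`all backends failed: ${skipped.map((s) => s.backend).join(", ")}`);
}

// Example: the two plain-fetch rungs are blocked, cdp-replay works.
const demo = walkLadder<string>((backend) =>
  backend === "cdp-replay"
    ? { ok: true, value: "flight results" }
    : { ok: false, reason: "blocked" },
);
```

A preference cache such as `backends.json` could plausibly be modeled by passing a reordered list as `order`, so the walk starts from the known-good rung; how the real cache is applied is not shown in this diff.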
186
186
 
package/examples/google-flights/README.md CHANGED
@@ -17,8 +17,6 @@ A 4-tool MCP server for Google Flights, compiled from a recording of a normal fl
17
17
 
18
18
  - **Protocol**: Google's `/_/FlightsFrontendUi` **`batchexecute`** endpoint returns a nested-array (protobuf-ish) payload. The compiler reverse-engineered the encoding into `_shared/batchexecute.ts` (shared decoder) + per-tool `parser.ts`, and the `f.req` request shape into `_shared/flights_request.ts` + per-tool `request-transform.ts`.
19
19
  - **Anti-bot**: the per-page `f.sid` / `bl` tokens are bootstrapped at runtime (`${state.f_sid}` placeholders), and calls run on the **cdp-replay** rung (requests issued inside a live, trusted Chrome) with a **stealth-fetch** fallback.
20
- - **MCP pacing**: `search_flights` declares `execution.minCallSpacingMs: 2000` because Google Flights can return fast empty result sets when warm CDP searches are fired back-to-back with no breathing room.
21
- - **Bounded fallback**: these tools declare `execution.skipPlaybookFallback` so MCP calls fail fast after the API/browser-backed rungs are exhausted instead of spending the rest of the agent timeout in an unstructured DOM replay.
22
20
  - **Artifacts per tool**: `workflow.json` (API replay), `playbook.yaml` (DOM fallback), `index.ts` (MCP tool), `parser.ts` + `request-transform.ts` (codecs).
23
21
 
24
22
  ## Install
package/examples/google-flights/_shared/flights_request.ts CHANGED
@@ -8,11 +8,9 @@
8
8
  // with the date range living in the outer wrapper; Shopping/Booking use the full
9
9
  // 15-slot leg with DATE at [6]. Verified by decoding seq 97 vs seq 111.
10
10
 
11
- // Fresh searches emit wrapper `...,0,0,0,1]` and use leg[14]=3 for normal
12
- // shopping legs. Return leg [14]=1 appears in booking / selected-leg flows, not
13
- // in the initial search request used by this tool.
14
- // In-page-refined searches use `...,0,1,0,1]` — a UI freshness flag, not a user
15
- // param; we always emit the fresh form for shopping.
11
+ // Fresh searches emit wrapper `...,0,0,0,1]` and leg[14]=3 (proven seq 111/140).
12
+ // In-page-refined searches use `...,0,1,0,1]` with return-leg[14]=1 (seq 194/425)
13
+ // a UI freshness flag, not a user param; we always emit the fresh form for shopping.
16
14
  // Booking outbound legs use [14]=3, return legs [14]=1 (seq 764/811).
17
15
 
18
16
  function buildLeg(leg: any): any[] {
@@ -76,11 +74,7 @@ export function transform(
76
74
  let payload: any;
77
75
 
78
76
  if (rpc === 'GetShoppingResults') {
79
- const searchContext =
80
- typeof p.searchContextToken === 'string' && p.searchContextToken
81
- ? [null, null, null, p.searchContextToken]
82
- : [];
83
- payload = [searchContext, sp, 0, 0, 0, 1];
77
+ payload = [[], sp, 0, 0, 0, 1];
84
78
  } else if (rpc === 'GetCalendarPicker') {
85
79
  const legs = sp[13];
86
80
  if (Array.isArray(legs)) sp[13] = legs.map((l: any) => (Array.isArray(l) ? l.slice(0, 4) : l));
package/examples/google-flights/get_flight_booking_details/index.ts CHANGED
@@ -94,7 +94,7 @@ const WORKFLOW: Workflow = {
94
94
  "captures": [
95
95
  {
96
96
  "name": "f_sid",
97
- "required": true,
97
+ "required": false,
98
98
  "capability": "browser_bootstrap",
99
99
  "source": "html_regex",
100
100
  "pattern": "\"FdrFJe\":\"([^\"]+)\"",
@@ -102,7 +102,7 @@ const WORKFLOW: Workflow = {
102
102
  },
103
103
  {
104
104
  "name": "bl",
105
- "required": true,
105
+ "required": false,
106
106
  "capability": "browser_bootstrap",
107
107
  "source": "html_regex",
108
108
  "pattern": "\"cfb2h\":\"([^\"]+)\"",
@@ -112,9 +112,6 @@ const WORKFLOW: Workflow = {
112
112
  },
113
113
  "parserModule": "./parser.ts",
114
114
  "requestTransformModule": "./request-transform.ts",
115
- "execution": {
116
- "skipPlaybookFallback": true
117
- },
118
115
  "liveVerified": true
119
116
  };
120
117
 
package/examples/google-flights/get_flight_booking_details/parser.ts CHANGED
@@ -142,9 +142,6 @@ export function extract(
142
142
  let frames: Array<{ rpcid: string | null; payload: any }> = [];
143
143
  if (typeof rawResponse === 'string') {
144
144
  frames = decodeBatchExecute(rawResponse);
145
- if (frames.length === 0) {
146
- throw new Error('Google Flights GetBookingResults response did not contain a batchexecute payload');
147
- }
148
145
  } else if (rawResponse != null) {
149
146
  frames = [{ rpcid: null, payload: rawResponse }];
150
147
  }
@@ -173,11 +170,6 @@ export function extract(
173
170
 
174
171
  const segments = [...segMap.values()];
175
172
  const fareOptions = [...fareMap.values()];
176
- if (segments.length === 0 && fareOptions.length === 0) {
177
- throw new Error(
178
- 'Google Flights GetBookingResults payload did not contain recognizable booking details',
179
- );
180
- }
181
173
  const prices = fareOptions.map((f) => f.priceUSD);
182
174
 
183
175
  return {
package/examples/google-flights/get_flight_booking_details/workflow.json CHANGED
@@ -65,7 +65,7 @@
65
65
  "name": "f_sid",
66
66
  "pattern": "\"FdrFJe\":\"([^\"]+)\"",
67
67
  "group": 1,
68
- "required": true,
68
+ "required": false,
69
69
  "capability": "browser_bootstrap"
70
70
  },
71
71
  {
@@ -73,7 +73,7 @@
73
73
  "name": "bl",
74
74
  "pattern": "\"cfb2h\":\"([^\"]+)\"",
75
75
  "group": 1,
76
- "required": true,
76
+ "required": false,
77
77
  "capability": "browser_bootstrap"
78
78
  }
79
79
  ]
@@ -94,8 +94,5 @@
94
94
  ],
95
95
  "requestTransformModule": "./request-transform.ts",
96
96
  "parserModule": "./parser.ts",
97
- "execution": {
98
- "skipPlaybookFallback": true
99
- },
100
97
  "liveVerified": true
101
98
  }
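The `html_regex` captures in this workflow pair a `pattern` with a `group` index. A minimal sketch of how such a capture might be evaluated against the bootstrap page follows. The two patterns are copied from the workflow; the `HtmlRegexCapture` shape, the `runHtmlCaptures` helper, the sample HTML, and the token values are all hypothetical, and the handling of `required` is an assumption rather than the runtime's confirmed behavior.

```typescript
// Hypothetical capture shape, modeled on the fields visible in workflow.json.
interface HtmlRegexCapture {
  name: string;
  source: "html_regex";
  pattern: string;
  group: number;
  required: boolean;
}

// Apply each capture's regex to the bootstrap HTML and collect the matches.
// Assumption: a missing optional capture is left unset, while a missing
// required capture raises an error (named after the STATE_MISSING condition).
function runHtmlCaptures(html: string, captures: HtmlRegexCapture[]): Record<string, string> {
  const state: Record<string, string> = {};
  for (const capture of captures) {
    const match = new RegExp(capture.pattern).exec(html);
    const value = match ? match[capture.group] : undefined;
    if (value !== undefined) {
      state[capture.name] = value;
    } else if (capture.required) {
      throw new Error(`STATE_MISSING: ${capture.name}`);
    }
  }
  return state;
}

// Patterns copied from the workflow; the HTML and token values are made up.
const demoCaptures: HtmlRegexCapture[] = [
  { name: "f_sid", source: "html_regex", pattern: "\"FdrFJe\":\"([^\"]+)\"", group: 1, required: false },
  { name: "bl", source: "html_regex", pattern: "\"cfb2h\":\"([^\"]+)\"", group: 1, required: false },
];
const sampleHtml =
  '<script>var data = {"FdrFJe":"-1234567890","cfb2h":"example_build"};</script>';
const demoState = runHtmlCaptures(sampleHtml, demoCaptures);
```

The captured values would then fill the `${state.f_sid}` and `${state.bl}` placeholders in the request URL.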
package/examples/google-flights/get_flight_calendar_prices/index.ts CHANGED
@@ -74,7 +74,7 @@ const WORKFLOW: Workflow = {
74
74
  "captures": [
75
75
  {
76
76
  "name": "f_sid",
77
- "required": true,
77
+ "required": false,
78
78
  "capability": "browser_bootstrap",
79
79
  "source": "html_regex",
80
80
  "pattern": "\"FdrFJe\":\"([^\"]+)\"",
@@ -82,7 +82,7 @@ const WORKFLOW: Workflow = {
82
82
  },
83
83
  {
84
84
  "name": "bl",
85
- "required": true,
85
+ "required": false,
86
86
  "capability": "browser_bootstrap",
87
87
  "source": "html_regex",
88
88
  "pattern": "\"cfb2h\":\"([^\"]+)\"",
@@ -92,9 +92,6 @@ const WORKFLOW: Workflow = {
92
92
  },
93
93
  "parserModule": "./parser.ts",
94
94
  "requestTransformModule": "./request-transform.ts",
95
- "execution": {
96
- "skipPlaybookFallback": true
97
- },
98
95
  "liveVerified": true
99
96
  };
100
97
 
package/examples/google-flights/get_flight_calendar_prices/parser.ts CHANGED
@@ -54,9 +54,6 @@ export function extract(
54
54
  ): unknown {
55
55
  const raw = typeof rawResponse === 'string' ? rawResponse : JSON.stringify(rawResponse ?? '');
56
56
  const frames = decodeBatchExecute(raw);
57
- if (frames.length === 0) {
58
- throw new Error('Google Flights GetCalendarPicker response did not contain a batchexecute payload');
59
- }
60
57
 
61
58
  let payload: unknown = null;
62
59
  for (const f of frames) {
@@ -66,14 +63,13 @@ export function extract(
66
63
  break;
67
64
  }
68
65
  }
66
+ // If no frame produced entries, still attempt the first frame's payload so an
67
+ // empty (zero-result) response yields an empty calendar rather than throwing.
68
+ if (payload == null && frames.length > 0) payload = frames[0]?.payload ?? null;
69
+
69
70
  const entries = collectEntries(payload).sort((a, b) =>
70
71
  a.departureDate < b.departureDate ? -1 : a.departureDate > b.departureDate ? 1 : 0,
71
72
  );
72
- if (entries.length === 0) {
73
- throw new Error(
74
- 'Google Flights GetCalendarPicker payload did not contain recognizable calendar prices',
75
- );
76
- }
77
73
 
78
74
  const prices: Record<string, number> = {};
79
75
  for (const e of entries) prices[e.departureDate] = e.lowestPriceUSD;
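The parser change above replaces hard failures with a tolerant path: a response with no recognizable entries now yields an empty calendar. The sketch below illustrates that behavior under stated assumptions. The `Frame` shape follows the type used in the booking parser, but `collectEntriesSketch` is a stand-in that assumes each entry is a `[date, price]` pair, which is not the real payload layout, and the frame-selection loop is a guess at the unchanged code the diff does not show.

```typescript
// Hypothetical shapes; the real decoder and collectEntries live in the package.
interface Frame {
  rpcid: string | null;
  payload: unknown;
}
interface CalendarEntry {
  departureDate: string;
  lowestPriceUSD: number;
}

// Stand-in for collectEntries: assumes each entry is a [date, price] pair.
function collectEntriesSketch(payload: unknown): CalendarEntry[] {
  if (!Array.isArray(payload)) return [];
  const out: CalendarEntry[] = [];
  for (const item of payload) {
    if (Array.isArray(item) && typeof item[0] === "string" && typeof item[1] === "number") {
      out.push({ departureDate: item[0], lowestPriceUSD: item[1] });
    }
  }
  return out;
}

function extractCalendarSketch(frames: Frame[]): { prices: Record<string, number>; dates: string[] } {
  let payload: unknown = null;
  for (const frame of frames) {
    if (collectEntriesSketch(frame.payload).length > 0) {
      payload = frame.payload;
      break;
    }
  }
  // Fall back to the first frame so a zero-result response is parsed, not rejected.
  if (payload == null && frames.length > 0) payload = frames[0]?.payload ?? null;

  const entries = collectEntriesSketch(payload).sort((a, b) =>
    a.departureDate < b.departureDate ? -1 : a.departureDate > b.departureDate ? 1 : 0,
  );
  const prices: Record<string, number> = {};
  for (const e of entries) prices[e.departureDate] = e.lowestPriceUSD;
  return { prices, dates: entries.map((e) => e.departureDate) };
}
```

One consequence of this design is that a caller can no longer tell a genuinely empty calendar from an undecodable response by catching an error, so any such distinction would have to be made elsewhere.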
package/examples/google-flights/get_flight_calendar_prices/workflow.json CHANGED
@@ -44,7 +44,7 @@
44
44
  "name": "f_sid",
45
45
  "pattern": "\"FdrFJe\":\"([^\"]+)\"",
46
46
  "group": 1,
47
- "required": true,
47
+ "required": false,
48
48
  "capability": "browser_bootstrap"
49
49
  },
50
50
  {
@@ -52,7 +52,7 @@
52
52
  "name": "bl",
53
53
  "pattern": "\"cfb2h\":\"([^\"]+)\"",
54
54
  "group": 1,
55
- "required": true,
55
+ "required": false,
56
56
  "capability": "browser_bootstrap"
57
57
  }
58
58
  ]
@@ -74,8 +74,5 @@
74
74
  ],
75
75
  "requestTransformModule": "./request-transform.ts",
76
76
  "parserModule": "./parser.ts",
77
- "execution": {
78
- "skipPlaybookFallback": true
79
- },
80
77
  "liveVerified": true
81
78
  }
package/examples/google-flights/lookup_airport/index.ts CHANGED
@@ -71,9 +71,6 @@ const WORKFLOW: Workflow = {
71
71
  },
72
72
  "parserModule": "./parser.ts",
73
73
  "requestTransformModule": "./request-transform.ts",
74
- "execution": {
75
- "skipPlaybookFallback": true
76
- },
77
74
  "liveVerified": true
78
75
  };
79
76
 
package/examples/google-flights/lookup_airport/parser.ts CHANGED
@@ -45,15 +45,8 @@ export function extract(
45
45
  ): unknown {
46
46
  const raw = typeof rawResponse === 'string' ? rawResponse : JSON.stringify(rawResponse);
47
47
  const payload = extractRpcPayload(raw, 'tDoGIe');
48
- if (payload == null) {
49
- throw new Error('Google Flights tDoGIe response did not contain a batchexecute payload');
50
- }
51
48
 
52
- if (!Array.isArray(payload) || !Array.isArray(payload[1])) {
53
- throw new Error('Google Flights tDoGIe payload did not contain a recognizable match list');
54
- }
55
-
56
- const matchesRaw = payload[1];
49
+ const matchesRaw = Array.isArray(payload) && Array.isArray(payload[1]) ? payload[1] : [];
57
50
  const matches = matchesRaw
58
51
  .map(parseItem)
59
52
  .filter((m): m is AirportMatch => m !== null);
package/examples/google-flights/lookup_airport/workflow.json CHANGED
@@ -53,8 +53,5 @@
53
53
  ],
54
54
  "parserModule": "./parser.ts",
55
55
  "requestTransformModule": "./request-transform.ts",
56
- "execution": {
57
- "skipPlaybookFallback": true
58
- },
59
56
  "liveVerified": true
60
57
  }
package/examples/google-flights/search_flights/index.ts CHANGED
@@ -120,13 +120,11 @@ const WORKFLOW: Workflow = {
120
120
  "requests": [
121
121
  {
122
122
  "method": "POST",
123
- "url": "https://www.google.com/_/FlightsFrontendUi/data/travel.frontend.flights.FlightsFrontendService/GetShoppingResults?f.sid=${state.f_sid}&bl=${state.bl}&hl=en-US&soc-app=162&soc-platform=1&soc-device=1&_reqid=${state.reqid}&rt=c",
123
+ "url": "https://www.google.com/_/FlightsFrontendUi/data/travel.frontend.flights.FlightsFrontendService/GetShoppingResults?f.sid=${state.f_sid}&bl=${state.bl}&hl=en-US&soc-app=162&soc-platform=1&soc-device=1&_reqid=1708023&rt=c",
124
124
  "headers": {
125
125
  "Content-Type": "application/x-www-form-urlencoded;charset=UTF-8",
126
126
  "X-Same-Domain": "1",
127
- "Referer": "https://www.google.com/travel/flights?q=${param.bootstrap_query}&curr=USD",
128
- "Accept-Language": "en-US,en;q=0.9",
129
- "X-Goog-BatchExecute-Bgr": "${state.bgr}",
127
+ "Referer": "https://www.google.com/travel/flights",
130
128
  "x-goog-ext-259736195-jspb": "[\"en-US\",\"US\",\"USD\",2,null,[420],null,null,7,[]]"
131
129
  },
132
130
  "body": "f.req=${param.origin}|${param.destination}|${param.departure_date}|${param.return_date}|${param.trip_type}|${param.max_stops}|${param.airlines}|${param.max_price}|${param.outbound_times}|${param.return_times}|${param.max_duration}|${param.carry_on_bags}&",
@@ -135,7 +133,7 @@ const WORKFLOW: Workflow = {
135
133
  ],
136
134
  "site": "google-flights",
137
135
  "bootstrap": {
138
- "url": "https://www.google.com/travel/flights?q=${param.bootstrap_query}&curr=USD",
136
+ "url": "https://www.google.com/travel/flights",
139
137
  "waitUntil": "domcontentloaded",
140
138
  "timeoutMs": 30000,
141
139
  "captures": [
@@ -143,75 +141,22 @@ const WORKFLOW: Workflow = {
143
141
  "name": "f_sid",
144
142
  "required": true,
145
143
  "capability": "browser_bootstrap",
146
- "source": "request_url_regex",
147
- "pattern": "[?&]f\\.sid=([^&]+)",
148
- "method": "POST",
149
- "urlPattern": "FlightsFrontendService/GetShoppingResults",
150
- "mode": "last",
144
+ "source": "html_regex",
145
+ "pattern": "\"FdrFJe\":\"([^\"]+)\"",
151
146
  "group": 1
152
147
  },
153
148
  {
154
149
  "name": "bl",
155
150
  "required": true,
156
151
  "capability": "browser_bootstrap",
157
- "source": "request_url_regex",
158
- "pattern": "[?&]bl=([^&]+)",
159
- "method": "POST",
160
- "urlPattern": "FlightsFrontendService/GetShoppingResults",
161
- "mode": "last",
162
- "group": 1
163
- },
164
- {
165
- "name": "bgr",
166
- "required": true,
167
- "capability": "browser_bootstrap",
168
- "source": "request_header",
169
- "header": "X-Goog-BatchExecute-Bgr",
170
- "method": "POST",
171
- "urlPattern": "FlightsFrontendService/GetShoppingResults",
172
- "mode": "last"
173
- },
174
- {
175
- "name": "reqid",
176
- "required": true,
177
- "capability": "browser_bootstrap",
178
- "source": "request_url_regex",
179
- "pattern": "[?&]_reqid=([^&]+)",
180
- "method": "POST",
181
- "urlPattern": "FlightsFrontendService/GetShoppingResults",
182
- "mode": "last",
183
- "group": 1
184
- },
185
- {
186
- "name": "search_context_token",
187
- "required": false,
188
- "capability": "browser_bootstrap",
189
- "source": "request_body_regex",
190
- "pattern": "%5B%5Bnull%2Cnull%2Cnull%2C%5C%22(.+?)%5C%22%5D",
191
- "method": "POST",
192
- "urlPattern": "FlightsFrontendService/GetShoppingResults",
193
- "mode": "last",
194
- "group": 1
195
- },
196
- {
197
- "name": "observed_search_body",
198
- "required": false,
199
- "capability": "browser_bootstrap",
200
- "source": "request_body_regex",
201
- "pattern": "^(f\\.req=.*)$",
202
- "method": "POST",
203
- "urlPattern": "FlightsFrontendService/GetShoppingResults",
204
- "mode": "last",
152
+ "source": "html_regex",
153
+ "pattern": "\"cfb2h\":\"([^\"]+)\"",
205
154
  "group": 1
206
155
  }
207
156
  ]
208
157
  },
209
158
  "parserModule": "./parser.ts",
210
159
  "requestTransformModule": "./request-transform.ts",
211
- "execution": {
212
- "minCallSpacingMs": 2000,
213
- "skipPlaybookFallback": true
214
- },
215
160
  "liveVerified": true
216
161
  };
217
162
 
package/examples/google-flights/search_flights/request-transform.ts CHANGED
@@ -62,56 +62,13 @@ function num(v: unknown): number | undefined {
62
62
  return Number.isFinite(n) && n > 0 ? n : undefined;
63
63
  }
64
64
 
65
- function buildBootstrapQuery(params: Params): string {
66
- const origin = params.origin != null ? String(params.origin) : '';
67
- const destination = params.destination != null ? String(params.destination) : '';
68
- const departureDate = params.departure_date != null ? String(params.departure_date) : '';
69
- const tripType = String(params.trip_type ?? 'round_trip').toLowerCase();
70
- if (tripType === 'one_way' || tripType === 'oneway' || tripType === '2') {
71
- return `One way flights from ${origin} to ${destination} on ${departureDate}`;
72
- }
73
- if (params.return_date) {
74
- return `Round trip flights from ${origin} to ${destination} departing ${departureDate} returning ${params.return_date}`;
75
- }
76
- return `Flights from ${origin} to ${destination} on ${departureDate}`;
77
- }
78
-
79
- export function prepareParams(params?: Params): Params {
80
- const p: Params = params ?? {};
81
- return {
82
- ...p,
83
- bootstrap_query: buildBootstrapQuery(p),
84
- };
85
- }
86
-
87
- function hasNonDefaultFilters(params: Params): boolean {
88
- if (params.max_stops != null && params.max_stops !== '' && Number(params.max_stops) !== 3) {
89
- return true;
90
- }
91
- return Boolean(
92
- params.airlines ||
93
- num(params.max_price) ||
94
- params.outbound_times ||
95
- params.return_times ||
96
- num(params.max_duration) ||
97
- num(params.carry_on_bags),
98
- );
99
- }
100
-
101
65
  export function transform(
102
66
  method: string,
103
67
  url: string,
104
68
  responses: Record<string, any>,
105
69
  params?: Params,
106
- state?: Record<string, unknown>,
107
70
  ): { url: string; body: string } {
108
71
  const p: Params = params ?? {};
109
- const observedSearchBody =
110
- typeof state?.observed_search_body === 'string' ? state.observed_search_body : undefined;
111
- if (observedSearchBody && !hasNonDefaultFilters(p)) {
112
- return { url, body: observedSearchBody };
113
- }
114
-
115
72
  const tripType = mapTripType(p.trip_type);
116
73
  const stops = p.max_stops != null && p.max_stops !== '' ? mapStops(p.max_stops) : 0;
117
74
  const { alliances, carriers } = parseAirlines(p.airlines);
@@ -156,8 +113,6 @@ export function transform(
156
113
  // CONFIG[10] wire form is [1, <carry-on count>]; shared builder emits
157
114
  // [carryOn, checked], so map count -> checked slot, constant 1 -> first.
158
115
  bags: carryOn != null ? { carryOn: 1, checked: carryOn } : undefined,
159
- searchContextToken:
160
- typeof state?.search_context_token === 'string' ? state.search_context_token : undefined,
161
116
  };
162
117
 
163
118
  return sharedTransform(method, url, responses, mapped);
package/examples/google-flights/search_flights/workflow.json CHANGED
@@ -101,87 +101,36 @@
101
101
  }
102
102
  ],
103
103
  "bootstrap": {
104
- "url": "https://www.google.com/travel/flights?q=${param.bootstrap_query}&curr=USD",
104
+ "url": "https://www.google.com/travel/flights",
105
105
  "waitUntil": "domcontentloaded",
106
106
  "timeoutMs": 30000,
107
107
  "captures": [
108
108
  {
109
- "source": "request_url_regex",
109
+ "source": "html_regex",
110
110
  "name": "f_sid",
111
- "pattern": "[?&]f\\.sid=([^&]+)",
111
+ "pattern": "\"FdrFJe\":\"([^\"]+)\"",
112
112
  "group": 1,
113
- "method": "POST",
114
- "urlPattern": "FlightsFrontendService/GetShoppingResults",
115
- "mode": "last",
116
113
  "required": true,
117
114
  "capability": "browser_bootstrap"
118
115
  },
119
116
  {
120
- "source": "request_url_regex",
117
+ "source": "html_regex",
121
118
  "name": "bl",
122
- "pattern": "[?&]bl=([^&]+)",
119
+ "pattern": "\"cfb2h\":\"([^\"]+)\"",
123
120
  "group": 1,
124
- "method": "POST",
125
- "urlPattern": "FlightsFrontendService/GetShoppingResults",
126
- "mode": "last",
127
121
  "required": true,
128
122
  "capability": "browser_bootstrap"
129
- },
130
- {
131
- "source": "request_header",
132
- "name": "bgr",
133
- "header": "X-Goog-BatchExecute-Bgr",
134
- "method": "POST",
135
- "urlPattern": "FlightsFrontendService/GetShoppingResults",
136
- "mode": "last",
137
- "required": true,
138
- "capability": "browser_bootstrap"
139
- },
140
- {
141
- "source": "request_url_regex",
142
- "name": "reqid",
143
- "pattern": "[?&]_reqid=([^&]+)",
144
- "group": 1,
145
- "method": "POST",
146
- "urlPattern": "FlightsFrontendService/GetShoppingResults",
147
- "mode": "last",
148
- "required": true,
149
- "capability": "browser_bootstrap"
150
- },
151
- {
152
- "source": "request_body_regex",
153
- "name": "search_context_token",
154
- "pattern": "%5B%5Bnull%2Cnull%2Cnull%2C%5C%22(.+?)%5C%22%5D",
155
- "group": 1,
156
- "method": "POST",
157
- "urlPattern": "FlightsFrontendService/GetShoppingResults",
158
- "mode": "last",
159
- "required": false,
160
- "capability": "browser_bootstrap"
161
- },
162
- {
163
- "source": "request_body_regex",
164
- "name": "observed_search_body",
165
- "pattern": "^(f\\.req=.*)$",
166
- "group": 1,
167
- "method": "POST",
168
- "urlPattern": "FlightsFrontendService/GetShoppingResults",
169
- "mode": "last",
170
- "required": false,
171
- "capability": "browser_bootstrap"
172
123
  }
173
124
  ]
174
125
  },
175
126
  "requests": [
176
127
  {
177
128
  "method": "POST",
178
- "url": "https://www.google.com/_/FlightsFrontendUi/data/travel.frontend.flights.FlightsFrontendService/GetShoppingResults?f.sid=${state.f_sid}&bl=${state.bl}&hl=en-US&soc-app=162&soc-platform=1&soc-device=1&_reqid=${state.reqid}&rt=c",
129
+ "url": "https://www.google.com/_/FlightsFrontendUi/data/travel.frontend.flights.FlightsFrontendService/GetShoppingResults?f.sid=${state.f_sid}&bl=${state.bl}&hl=en-US&soc-app=162&soc-platform=1&soc-device=1&_reqid=1708023&rt=c",
179
130
  "headers": {
180
131
  "Content-Type": "application/x-www-form-urlencoded;charset=UTF-8",
181
132
  "X-Same-Domain": "1",
182
- "Referer": "https://www.google.com/travel/flights?q=${param.bootstrap_query}&curr=USD",
183
- "Accept-Language": "en-US,en;q=0.9",
184
- "X-Goog-BatchExecute-Bgr": "${state.bgr}",
133
+ "Referer": "https://www.google.com/travel/flights",
185
134
  "x-goog-ext-259736195-jspb": "[\"en-US\",\"US\",\"USD\",2,null,[420],null,null,7,[]]"
186
135
  },
187
136
  "body": "f.req=${param.origin}|${param.destination}|${param.departure_date}|${param.return_date}|${param.trip_type}|${param.max_stops}|${param.airlines}|${param.max_price}|${param.outbound_times}|${param.return_times}|${param.max_duration}|${param.carry_on_bags}&",
@@ -190,9 +139,5 @@
190
139
  ],
191
140
  "requestTransformModule": "./request-transform.ts",
192
141
  "parserModule": "./parser.ts",
193
- "execution": {
194
- "minCallSpacingMs": 2000,
195
- "skipPlaybookFallback": true
196
- },
197
142
  "liveVerified": true
198
143
  }
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "imprint-mcp",
3
- "version": "0.4.7",
3
+ "version": "0.4.8",
4
4
  "description": "Teach an AI agent how to use any website. Once. Records a real browser session + narration; generates a deterministic MCP tool plus a DOM-replay playbook fallback.",
5
5
  "type": "module",
6
6
  "exports": {
package/prompts/build-planning.md CHANGED
@@ -59,7 +59,7 @@ You receive:
59
59
 
60
60
  1. **Emit exactly one `perTool` entry per `selectedTools` entry**, using the same `toolName`. Do not invent or drop tools.
61
61
  2. **Only hoist a shared module when ≥2 selected tools genuinely share it.** Single-use logic stays inside that tool's own parser.ts / request-transform.ts — do NOT create a `_shared/` module for it.
62
- 3. **`request-transform`** — URL signing, body construction, or bootstrap-param preparation shared across tools. Wire-up: the consuming tool sets `requestTransformModule: "../_shared/<name>.ts"`. Ground it in `ephemeralValues` (browser_minted, high-entropy query param) and `sourceSeqs`. The exported `transform(method, url, responses, params?, state?)` returns the signed URL (or `{ url, body?, headers? }`); an optional `prepareParams(params)` can return primitive params used before bootstrap URL substitution.
62
+ 3. **`request-transform`** — URL signing or body construction shared across tools. Wire-up: the consuming tool sets `requestTransformModule: "../_shared/<name>.ts"`. Ground it in `ephemeralValues` (browser_minted, high-entropy query param) and `sourceSeqs`. The exported `transform(method, url, responses, params?)` returns the signed URL (or `{ url, body? }`).
63
63
  4. **`parser-helper`** — a decoder/normalizer ≥2 tools' parsers call (e.g. a shared JSPB walker, a shared field mapper). The consuming tool's parser.ts does `import { ... } from '../_shared/<name>.ts'`. Ground it in a captured response body (`sourceSeqs`).
64
64
  5. **`types`** — shared TypeScript interfaces used by ≥2 parsers. Type-only; no runtime behavior.
65
65
  6. **Auth is NEVER a shared module.** Login is request data, and the runtime cannot run a shared sub-workflow. Put the exact recipe in each tool's `authRecipe` (login seqs, credential names, captures with `${state.X}` wiring) and set `required: false` with empty arrays when a tool needs no login. Every authed tool replicates the same recipe inline.
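Rule 3 in the 0.4.8 wording describes a `transform(method, url, responses, params?)` function that returns either a URL string or `{ url, body? }`. The sketch below shows one possible shape for such a module. The signature and return forms follow the rule text; the `Params` and `TransformResult` type names, the `origin` and `destination` fields, and the body encoding are hypothetical and do not reproduce any tool's real wire format.

```typescript
// Hypothetical types; only the function signature and the two return forms
// (a URL string, or { url, body? }) come from the rule text.
type Params = Record<string, unknown>;
type TransformResult = string | { url: string; body?: string };

// A real module would export this function; it is left unexported here so the
// sketch runs as a plain script.
function transform(
  method: string,
  url: string,
  responses: Record<string, unknown>,
  params?: Params,
): TransformResult {
  const p: Params = params ?? {};
  // Non-POST requests need no body in this sketch, so the URL is returned as-is.
  if (method.toUpperCase() !== "POST") return url;
  // Made-up body layout: a JSON array of two params, form-encoded under f.req.
  const payload = JSON.stringify([p.origin ?? null, p.destination ?? null]);
  return { url, body: `f.req=${encodeURIComponent(payload)}&` };
}
```

Per the rule, a consuming tool would point `requestTransformModule` at the shared file, for example `"../_shared/<name>.ts"`.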
package/prompts/compile-agent.md CHANGED
@@ -52,7 +52,7 @@ Follow these steps to compile the session:
52
52
  - *Session-scoped state* (minted once per page load, reused across requests): add a bootstrap capture with `browser_bootstrap` capability. Pick the `source` based on where the value actually lives in the recording — these are not interchangeable:
53
53
  - **Response header** (`source: 'response_header'`, `header: '<exact name>'`): the bootstrap GET's HTTP response carries the token as a header. Enterprise CSRF tokens, anti-replay tokens, and many app-minted page nonces are returned this way. **First check** — search the bootstrap response headers for the recorded token before reaching for any HTML/DOM source. If the token appears in `requests[0].response.headers`, this is the only correct source. Do NOT synthesize an `_shared/page-tokens.ts` HTML-regex helper for it; the body will not contain the value and the regex will silently miss.
54
54
 
55
- **Capture-source cross-check (verifier-enforced).** Before you declare any `required` capture, locate the matching recorded request in the session and confirm the declared source actually carries the recorded value: `response_header` → the header must exist in `response.headers`; `cookie` → `response.headers['set-cookie']` must define that cookie name; `html_regex` / `text_regex` → the pattern must match the recorded response body; `request_header` / `request_url_regex` / `request_body_regex` → a browser-observed request matching `method`/`urlPattern` must carry the value in its headers, URL, or body. The verifier rejects `done()` if the declared source does not produce a value in the recording, and it explicitly classifies a runtime `STATE_MISSING` from a declared capture as a workflow-correctness error (not infra) so the tool cannot ship waived. Picking the wrong source is the most common cause of "API rungs all silently fall to playbook" — measure twice.
55
+ **Capture-source cross-check (verifier-enforced).** Before you declare any `required` capture, locate the matching recorded request in the session and confirm the declared source actually carries the recorded value: `response_header` → the header must exist in `response.headers`; `cookie` → `response.headers['set-cookie']` must define that cookie name; `html_regex` / `text_regex` → the pattern must match the recorded response body. The verifier rejects `done()` if the declared source does not produce a value in the recording, and it explicitly classifies a runtime `STATE_MISSING` from a declared capture as a workflow-correctness error (not infra) so the tool cannot ship waived. Picking the wrong source is the most common cause of "API rungs all silently fall to playbook" — measure twice.
56
56
 
57
57
  **Referenced-capture cross-check — applies even to `required: false` captures (verifier-enforced).** If ANY request hard-references a capture via `${state.X}` in a header/body/url, that capture is effectively required regardless of its `required` flag, and the verifier checks its `html_regex`/`text_regex` pattern against EVERY recorded HTML page for the site (not just the bootstrap URL's own response — the bootstrap page may not even be in the recording). If the pattern matches no recorded page, `done()` is rejected (the runtime would `STATE_MISSING` the whole request). **Write the regex against the token as it ACTUALLY appears in the recorded HTML — read the recorded page first.** Common pitfall: a token embedded as `mUtil.createSecureCookie("Csrf-token", "<hex>")` is NOT matched by a pattern like `[Cc]srf[^"']{0,24}['"]([0-9a-f]{48,})['"]` because the `", "` separator between the cookie name and value falls between the two quotes — anchor on the real structure instead, e.g. `createSecureCookie\("Csrf-token",\s*"([0-9a-f]+)"`. When the live call would burn an anti-bot `.act`, the verifier SKIPS the live test entirely if a referenced capture can't resolve — so a wrong regex here costs you a whole verification cycle with no live signal. Get it right against the recording first.
58
58
 
@@ -60,7 +60,6 @@ Follow these steps to compile the session:
60
60
  - **HTML body** (`source: 'html_regex'`): the token is embedded in a `<script>` block, meta tag, or inline JSON inside the HTML. Use this only after confirming the value actually appears in the response body.
61
61
  - **DOM** (`source: 'dom_attribute'` / `source: 'dom_text'`): the token is rendered into a specific element by the page's JS — use a stable selector.
62
62
  - **Cookie / storage** (`source: 'cookie'` / `'local_storage'` / `'session_storage'`): the token is persisted client-side after bootstrap.
63
- - **Browser-observed request** (`source: 'request_header'` / `'request_url_regex'` / `'request_body_regex'`): the bootstrap page's own XHR/fetch request carries the token, request id, or complete POST envelope. Use `method`, `urlPattern`, and `mode` to select the observed request, then extract from its header, URL, or body. This is the right source when the value is minted by page JavaScript but is absent from HTML, cookies, storage, and response headers.
64
63
  - *Per-request state* (unique per API call — nonces, request IDs, timestamps): write a `requestTransformModule` that generates fresh values.
65
64
  - *Bot-defense state* (sensor headers, fingerprints): use `stealth_bootstrap` capability.
66
65
  - **`constant`**: Identical across every pass the classifier compared — usually safe to hardcode. BUT: scrutinize high-entropy “constants” (UUIDs, JWTs, long hex/base64 strings). They may be slow-rotating tokens that happened to match across two runs taken minutes apart. If a constant looks like a token, treat it with suspicion and consider adding a bootstrap capture as a safety measure. **Exception — cross-recording corroboration.** The classifier diffs the recording against the automated replay AND against every other recording of this site (often captured hours or days apart), then keeps a value `constant` only if it never varied in any pass. A high-entropy value classified `constant` on this basis is *static infrastructure the server checks on every call*, NOT a rotating token: a GraphQL safelisting / persisted-query signature (`graphql-operation-signature`, `x-apollo-operation-id`, `x-apollo-operation-signature`), an API build/asset hash, a public app key. **Keep it verbatim** — dropping it gets the request 403'd or silently degraded to sentinel data. A genuinely rotating token could not be byte-identical across time-separated recordings; the classifier would have marked it `browser_minted`/`server_derived`. (The replay alone is unreliable here: anti-bot edges block the automated replay, so a protected header may be `constant` *purely* on cross-recording evidence — that evidence is sufficient; do not second-guess it as "high-entropy so probably rotating".)
@@ -98,7 +97,7 @@ Follow these steps to compile the session:
98
97
  - Keep headers minimal — drop bot-detection headers (Akamai fingerprints, DataDome, PerimeterX), drop browser-internal headers, keep `Content-Type`, `Origin`, `Referer` when needed
99
98
  - **CRITICAL — preserve FUNCTIONAL request headers (same principle as query params).** Beyond the standard set, the recorded request often carries headers the server *checks* on every call: anti-CSRF / anti-replay tokens (`X-Csrf-Token`, `X-XSRF-Token`, `RequestVerificationToken`, …), API keys, session/nonce headers, `X-*` app headers. These are part of the functional contract — dropping one usually makes a state-changing POST silently fail or get tarpitted, exactly like dropping a query param. For each non-bot, non-browser-internal header on the recorded request: keep it. If its value is a per-session/per-call token (high-entropy, rotates across the recording), do NOT hardcode it — capture it (`${state.NAME}` from a bootstrap/request capture) and template it. The litmus test mirrors query params: if the recorded request sent it and it isn't a bot fingerprint, the workflow request must send it too (literal if static, `${state.X}`/`${param.X}` if dynamic). A recorded state-changing POST (`*.act`, `/checkout`, `/book`, anything that mutates) that carried a CSRF/session header MUST template that header from captured state — never silently omit it.
100
99
  - **CRITICAL: Preserve ALL query parameters from the recorded URL.** Unlike HTTP headers — where you drop bot-detection fingerprints — query params are part of the API's functional contract. Even if a param value looks obfuscated or high-entropy (base64, hex, random-looking), it likely carries meaning the server checks (anti-bot tokens, session binding, A/B bucketing, obfuscated checksums). Preserve every param key: substitute the value with `${response[N].name}` or `${state.name}` if it came from an earlier response, `${param.NAME}` if user-variable, or keep the literal value if it's a static constant (like `search=false`). Missing a single query param can silently cause the API to return sentinel/degraded data rather than an error — the server may fall back to generic defaults instead of returning the actual results.
101
- - **Per-call query params (URL signing).** If a query param has a different high-entropy value on every request to the same URL path in the session, it is likely a URL signing token computed by client-side JavaScript. Do NOT hardcode the recorded value — it is per-call and will expire. Instead: use `search_response_body` to search the session's JavaScript responses (look for `.js` URLs) for the param name. The signing function is usually simple (HMAC, MD5, XOR + base64 with a static key). Once you find it, write a `requestTransformModule` (sibling to `parser.ts`) that exports `transform(method: string, url: string, responses: unknown[], params?: Record<string, string | number | boolean>, state?: Record<string, unknown>)` — it takes the unsigned URL and returns the URL with the signing param appended. Set `"requestTransformModule": "./request-transform.ts"` in workflow.json. The runtime calls this function before each request.
100
+ - **Per-call query params (URL signing).** If a query param has a different high-entropy value on every request to the same URL path in the session, it is likely a URL signing token computed by client-side JavaScript. Do NOT hardcode the recorded value — it is per-call and will expire. Instead: use `search_response_body` to search the session's JavaScript responses (look for `.js` URLs) for the param name. The signing function is usually simple (HMAC, MD5, XOR + base64 with a static key). Once you find it, write a `requestTransformModule` (sibling to `parser.ts`) that exports `transform(method: string, url: string): string` — it takes the unsigned URL and returns the URL with the signing param appended. Set `"requestTransformModule": "./request-transform.ts"` in workflow.json. The runtime calls this function before each request.
102
101
  - **Complex body construction via requestTransformModule.** When the API uses a body format where simple `${param.X}` placeholder substitution cannot correctly encode values — e.g., JSPB arrays in form-encoded fields, nested JSON strings with position-dependent escaping — write a `requestTransformModule` that constructs the body programmatically. The transform receives `params` as a 4th argument and can return an object instead of a string:
103
102
  ```typescript
104
103
  export function transform(
@@ -106,13 +105,12 @@ Follow these steps to compile the session:
106
105
  url: string,
107
106
  responses: unknown[],
108
107
  params?: Record<string, string | number | boolean>,
109
- state?: Record<string, unknown>,
110
108
  ): { url: string; body?: string } {
111
109
  const body = buildRequestBody(params ?? {});
112
110
  return { url, body };
113
111
  }
114
112
  ```
115
- Returning a plain `string` (just the URL) still works for simple URL-signing. Use the object return when you need to build or modify the request body or headers. If the bootstrap URL itself needs a derived value, export `prepareParams(params)` from the same module; it runs before bootstrap URL substitution and can return additional primitive params. Do NOT invent URL query parameters as a workaround for body-encoding complexity — the server ignores unknown query params and the parameters will have no effect.
113
+ Returning a plain `string` (just the URL) still works for simple URL-signing. Use the object return when you need to build or modify the request body or headers. Do NOT invent URL query parameters as a workaround for body-encoding complexity — the server ignores unknown query params and the parameters will have no effect.
116
114
  - **`x-api-key` is normally NOT a credential.** It's an app-level identifier baked into the site's JavaScript — same for every visitor, not user-specific. Keep it as a literal string in the workflow. Only treat it as a credential if you can clearly see it varies per account (e.g., it appears in a `Set-Cookie` after login, or differs across sessions). The same applies to `x-channel-id`, `x-app-id`, `x-app-version`, and similar metadata headers — hardcode them.
117
115
  - **NEVER use `${env.NAME}` placeholders.** The `${env.X}` syntax exists in the runtime but is reserved for operator-level configuration, not for values you can see in the recording. If a value appears in the captured request, hardcode it. If multiple candidates in the same session use different API keys for different endpoints, hardcode each one — they are endpoint-specific app constants, not secrets. The only valid placeholder types for your workflow are `${param.NAME}`, `${credential.NAME}`, `${state.NAME}`, and `${response[N].NAME}`.
118
116
  - If the workflow chains multiple requests (request N+1 uses a value from request N's response), add an `extract` field to request N and reference it in request N+1 via `${response[N].name}`
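As a rough illustration of the body-construction case above, here is one way `buildRequestBody` might look for a form-encoded field that carries a nested JSON string. The field name `f.req`, the array layout, and the `origin`/`destination` params are assumptions made for this sketch, not taken from any recorded site.

```typescript
type Params = Record<string, string | number | boolean>;

// Hypothetical builder: serialises the inner array first, then embeds it as a
// JSON string inside the outer array, so the escaping is produced by
// JSON.stringify rather than by placeholder substitution.
function buildRequestBody(params: Params): string {
  const inner = JSON.stringify([params.origin ?? null, params.destination ?? null]);
  const outer = JSON.stringify([null, inner]);
  // URLSearchParams handles the form encoding of the whole value.
  return new URLSearchParams({ "f.req": outer }).toString();
}

export function transform(
  method: string,
  url: string,
  responses: unknown[],
  params?: Params,
): { url: string; body?: string } {
  // The URL is left untouched; only the body is built programmatically.
  return { url, body: buildRequestBody(params ?? {}) };
}
```

Building the value in two `JSON.stringify` passes keeps the position-dependent escaping correct for any param value, which is the part that plain `${param.X}` substitution cannot guarantee.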
@@ -22,8 +22,7 @@ You receive `{ site, url, module, availableDependencies, sources, implementation
22
22
  ## Output requirements by `kind`
23
23
 
24
24
  ### `request-transform`
25
- - Export a `transform` function: `transform(method: string, url: string, responses: unknown[], params?: Record<string, string | number | boolean>, state?: Record<string, unknown>): string | { url: string; body?: string; headers?: Record<string, string> }`.
26
- - Optionally export `prepareParams(params)` when bootstrap URLs need derived primitive params before `${param.*}` substitution.
25
+ - Export a `transform` function: `transform(method: string, url: string, responses: unknown[], params?: Record<string, string | number | boolean>): string | { url: string; body?: string }`.
27
26
  - It reproduces the site's per-request signing/body logic (e.g. HMAC/MD5/CRC32 + encoding) so the regenerated value matches what the recording sent. Derive the algorithm from `sources` (and any `.js` body included there). Return the URL with the signing param appended (or `{ url, body }` when you must build the body).
28
27
  - **The verifier re-signs a recorded URL and checks your output reproduces the recorded signing param.** A no-op that returns the URL unchanged will fail.
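A minimal sketch of a URL-signing `transform` that would survive a re-sign check of this kind. The HMAC-SHA256 scheme, the key value, the `sig` param name, and the signed payload layout are all assumptions for illustration; the real algorithm has to be derived from the site's JavaScript.

```typescript
import { createHmac } from "node:crypto";

// Assumption: a static key recovered from the site's JS. Placeholder value.
const SIGNING_KEY = "example-static-key";
// Assumption: the name of the per-call signing query param.
const SIGN_PARAM = "sig";

export function transform(method: string, url: string): string {
  const parsed = new URL(url);
  // Drop any existing signature so a recorded (already signed) URL is
  // re-signed from the same unsigned form.
  parsed.searchParams.delete(SIGN_PARAM);
  // Assumed payload layout: METHOD, newline, path plus remaining query.
  const payload = `${method.toUpperCase()}\n${parsed.pathname}${parsed.search}`;
  const signature = createHmac("sha256", SIGNING_KEY).update(payload).digest("hex");
  parsed.searchParams.append(SIGN_PARAM, signature);
  return parsed.toString();
}
```

Because the function strips the old signature before signing, feeding it a recorded signed URL yields the same signing param again, which is the property the verifier check relies on.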
29
28