@isdk/web-fetcher 0.2.7 → 0.2.9

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (64) hide show
  1. package/README.action.cn.md +9 -2
  2. package/README.action.md +9 -2
  3. package/README.cn.md +5 -1
  4. package/README.engine.cn.md +6 -2
  5. package/README.engine.md +6 -2
  6. package/README.md +5 -1
  7. package/dist/index.d.mts +5 -1
  8. package/dist/index.d.ts +5 -1
  9. package/dist/index.js +1 -1
  10. package/dist/index.mjs +1 -1
  11. package/docs/README.md +5 -1
  12. package/docs/_media/README.action.md +9 -2
  13. package/docs/_media/README.cn.md +5 -1
  14. package/docs/_media/README.engine.md +6 -2
  15. package/docs/classes/CheerioFetchEngine.md +71 -59
  16. package/docs/classes/ClickAction.md +23 -23
  17. package/docs/classes/ExtractAction.md +23 -23
  18. package/docs/classes/FetchAction.md +23 -23
  19. package/docs/classes/FetchEngine.md +67 -59
  20. package/docs/classes/FetchSession.md +10 -10
  21. package/docs/classes/FillAction.md +23 -23
  22. package/docs/classes/GetContentAction.md +23 -23
  23. package/docs/classes/GotoAction.md +23 -23
  24. package/docs/classes/PauseAction.md +23 -23
  25. package/docs/classes/PlaywrightFetchEngine.md +71 -59
  26. package/docs/classes/SubmitAction.md +23 -23
  27. package/docs/classes/WaitForAction.md +23 -23
  28. package/docs/classes/WebFetcher.md +5 -5
  29. package/docs/enumerations/FetchActionResultStatus.md +4 -4
  30. package/docs/functions/fetchWeb.md +2 -2
  31. package/docs/interfaces/BaseFetchActionProperties.md +11 -11
  32. package/docs/interfaces/BaseFetchCollectorActionProperties.md +15 -15
  33. package/docs/interfaces/BaseFetcherProperties.md +32 -24
  34. package/docs/interfaces/DispatchedEngineAction.md +4 -4
  35. package/docs/interfaces/ExtractActionProperties.md +11 -11
  36. package/docs/interfaces/FetchActionInContext.md +15 -15
  37. package/docs/interfaces/FetchActionProperties.md +12 -12
  38. package/docs/interfaces/FetchActionResult.md +6 -6
  39. package/docs/interfaces/FetchContext.md +46 -34
  40. package/docs/interfaces/FetchEngineContext.md +41 -29
  41. package/docs/interfaces/FetchMetadata.md +5 -5
  42. package/docs/interfaces/FetchResponse.md +21 -13
  43. package/docs/interfaces/FetchReturnTypeRegistry.md +7 -7
  44. package/docs/interfaces/FetchSite.md +39 -27
  45. package/docs/interfaces/FetcherOptions.md +38 -26
  46. package/docs/interfaces/GotoActionOptions.md +6 -6
  47. package/docs/interfaces/PendingEngineRequest.md +3 -3
  48. package/docs/interfaces/SubmitActionOptions.md +2 -2
  49. package/docs/interfaces/WaitForActionOptions.md +5 -5
  50. package/docs/type-aliases/BaseFetchActionOptions.md +1 -1
  51. package/docs/type-aliases/BaseFetchCollectorOptions.md +1 -1
  52. package/docs/type-aliases/BrowserEngine.md +1 -1
  53. package/docs/type-aliases/FetchActionCapabilities.md +1 -1
  54. package/docs/type-aliases/FetchActionCapabilityMode.md +1 -1
  55. package/docs/type-aliases/FetchActionOptions.md +1 -1
  56. package/docs/type-aliases/FetchEngineAction.md +1 -1
  57. package/docs/type-aliases/FetchEngineType.md +1 -1
  58. package/docs/type-aliases/FetchReturnType.md +1 -1
  59. package/docs/type-aliases/FetchReturnTypeFor.md +1 -1
  60. package/docs/type-aliases/OnFetchPauseCallback.md +1 -1
  61. package/docs/type-aliases/ResourceType.md +1 -1
  62. package/docs/variables/DefaultFetcherProperties.md +1 -1
  63. package/docs/variables/FetcherOptionKeys.md +1 -1
  64. package/package.json +3 -2
@@ -127,15 +127,22 @@ export class FillAction extends FetchAction {
127
127
 
128
128
  暂停执行,以等待一个或多个条件的满足。
129
129
 
130
- 在 `browser` 模式下,如果提供了多个条件,它们将按顺序依次等待。例如,它会先等待选择器出现,然后等待网络空闲,最后再等待指定的毫秒数。
130
+ 在 `browser` 模式下,如果提供了多个条件,它们将按顺序依次等待。例如,它会先等待选择器出现,然后等待网络空闲,**最后**再等待指定的毫秒数 (`ms`)。
131
131
 
132
132
  * **`id`**: `waitFor`
133
133
  * **`params`**: 一个指定等待条件的对象,可包含以下一个或多个键:
134
- * **`ms`** (number): 等待指定的毫秒数。两个引擎都支持。
135
134
  * **`selector`** (string): 等待匹配的选择器出现在页面中。仅 `browser` 模式支持。
136
135
  * **`networkIdle`** (boolean): 等待直到网络空闲(即,在一段时间内没有新的网络请求)。仅 `browser` 模式支持。
136
+ * **`ms`** (number): 在满足其他条件**之后**,继续等待指定的毫秒数。两个引擎都支持。
137
137
  * **`returns`**: `none`
138
138
 
139
+ > **⚠️ 关键区别: `ms` (延迟) vs 超时**
140
+ >
141
+ > 请勿将 **`ms`** 参数与“超时时间”混淆。
142
+ >
143
+ > * **`ms` (持续时长)**: 这是一个强制的“睡眠”或“延迟”时间。如果与 `selector` 一起使用,它表示在找到元素**之后**,还要额外等待的时间。
144
+ > * **超时 (Deadline)**: 等待条件(如 `selector` 或 `networkIdle`)满足的最长等待时间是由会话级别的 **`timeoutMs`** 选项控制的(默认通常是 30秒),而不是这个 `ms` 参数。
145
+
139
146
  #### `pause`
140
147
 
141
148
  暂停 Action 脚本的执行,以允许用户手动介入(例如,解决验证码)。此 Action **必须**在 **`fetchWeb`** 的选项中提供一个 **`onPause`** 回调处理器。当此 Action 被触发时,它会调用 **`onPause`** 处理器并等待其执行完成。
package/README.action.md CHANGED
@@ -127,15 +127,22 @@ Submits a form.
127
127
 
128
128
  Pauses execution to wait for one or more conditions to be met.
129
129
 
130
- In `browser` mode, if multiple conditions are provided, they are awaited sequentially. For example, it will first wait for the selector to appear, then wait for the network to be idle, and finally wait for the specified duration.
130
+ In `browser` mode, if multiple conditions are provided, they are awaited sequentially. For example, it will first wait for the selector to appear, then wait for the network to be idle, and **finally** wait for the specified duration (`ms`).
131
131
 
132
132
  * **`id`**: `waitFor`
133
133
  * **`params`**: An object specifying the wait condition, which can contain one or more of the following keys:
134
- * **`ms`** (number): Waits for the specified number of milliseconds. Supported by both engines.
135
134
  * **`selector`** (string): Waits for an element matching the selector to appear on the page. Supported only in `browser` mode.
136
135
  * **`networkIdle`** (boolean): Waits until the network is idle (i.e., no new network requests for a period of time). Supported only in `browser` mode.
136
+ * **`ms`** (number): Waits for the specified number of milliseconds **after** other conditions are met. Supported by both engines.
137
137
  * **`returns`**: `none`
138
138
 
139
+ > **⚠️ Critical Distinction: `ms` vs Timeout**
140
+ >
141
+ > Do not confuse the **`ms`** parameter with a timeout setting.
142
+ >
143
+ > * **`ms` (Duration)**: This forces the script to sleep/pause for a fixed amount of time. If used with `selector`, it adds an **extra delay** after the element is found.
144
+ > * **Timeout (Deadline)**: The maximum time the script will wait for a condition (like `selector` or `networkIdle`) to be met before failing is controlled by the session-level **`timeoutMs`** option (default is usually 30s), not this `ms` parameter.
145
+
139
146
  #### `pause`
140
147
 
141
148
  Pauses the execution of the Action Script to allow for manual user intervention (e.g., solving a CAPTCHA).
package/README.cn.md CHANGED
@@ -133,7 +133,10 @@ searchGoogle('gemini');
133
133
  * `engine` ('http' | 'browser' | 'auto'): 要使用的引擎。默认为 `auto`。
134
134
  * `actions` (FetchActionOptions[]): 要执行的动作对象数组。(支持 `action`/`name` 作为 `id` 的别名,`args` 作为 `params` 的别名)
135
135
  * `headers` (Record<string, string>): 用于所有请求的头信息。
136
- * ...以及许多其他用于代理、Cookie、重试等的选项。
136
+ * `cookies` (Cookie[]): 要使用的 Cookie 数组。
137
+ * `sessionState` (any): 要恢复的 Crawlee 会话状态。
138
+ * `sessionPoolOptions` (SessionPoolOptions): 底层 Crawlee SessionPool 的高级配置。
139
+ * ...以及许多其他用于代理、重试等的选项。
137
140
 
138
141
  ### 内置动作
139
142
 
@@ -157,6 +160,7 @@ searchGoogle('gemini');
157
160
  * `statusCode`: HTTP 状态码。
158
161
  * `headers`: HTTP 头信息。
159
162
  * `cookies`: Cookie 数组。
163
+ * `sessionState`: Crawlee 会话状态。
160
164
  * `text`, `html`: 页面内容。
161
165
  * `outputs` (Record<string, any>): 通过 `storeAs` 提取并存储的数据。
162
166
 
@@ -24,7 +24,7 @@
24
24
  * **关键抽象**:
25
25
  * **生命周期**:`initialize()` 和 `cleanup()` 方法。
26
26
  * **核心操作**:`goto()`、`getContent()`、`click()`、`fill()`、`submit()`、`waitFor()`、`extract()`。
27
- * **配置与状态**:`headers()`、`cookies()`、`blockResources()`、`getState()`。
27
+ * **配置与状态**:`headers()`、`cookies()`、`blockResources()`、`getState()`、`sessionPoolOptions`。
28
28
  * **静态注册表**:它维护所有可用引擎实现的静态注册表(`FetchEngine.register`),允许通过 `id` 或 `mode` 动态选择引擎。
29
29
 
30
30
  ### `FetchEngine.create(context, options)`
@@ -48,10 +48,14 @@
48
48
 
49
49
  引擎支持在多次执行之间持久化和恢复会话状态(主要是 Cookie)。
50
50
 
51
- * **`sessionState`**: 一个完整的状态对象(源自 Crawlee 的 SessionPool),可用于完全恢复之前的会话。这在引擎初始化期间设置。
51
+ * **`sessionState`**: 一个完整的状态对象(源自 Crawlee 的 SessionPool),可用于完全恢复之前的会话。该状态会**自动包含在每个 `FetchResponse` 中**,方便进行持久化,并在以后初始化引擎时通过选项传回。
52
+ * **`sessionPoolOptions`**: 允许对底层的 Crawlee `SessionPool` 进行高级配置(例如 `maxUsageCount`, `maxPoolSize`)。
53
+ > **注意**: 为了确保会话状态管理的正常运行,`persistenceOptions.enable` 会被强制设置为 `true`。
52
54
  * **`overrideSessionState`**: 如果设置为 `true`,则强制引擎使用提供的 `sessionState` 覆盖存储中的任何现有持久化状态。当你希望确保会话以确切提供的状态启动,忽略持久化层中的任何陈旧数据时,这非常有用。
53
55
  * **`cookies`**: 用于会话的显式 Cookie 数组。
54
56
 
57
+ > **一致性说明**: 引擎使用 Crawlee 的 `Session` 作为 Cookie 的唯一真理源。在 `browser` 模式下,Cookie 会在交互(如点击)期间实时从浏览器同步到 Session,确保 `getContent()` 或 `extract()` 始终反映最新状态。
58
+
55
59
  **优先级规则:**
56
60
  如果同时提供了 `sessionState` 和 `cookies`,引擎将采用**“合并并覆盖”**策略:
57
61
  1. 首先从 `sessionState` 恢复会话。
package/README.engine.md CHANGED
@@ -24,7 +24,7 @@ This is the abstract base class that defines the contract for all fetch engines.
24
24
  * **Key Abstractions**:
25
25
  * **Lifecycle**: `initialize()` and `cleanup()` methods.
26
26
  * **Core Actions**: `goto()`, `getContent()`, `click()`, `fill()`, `submit()`, `waitFor()`, `extract()`.
27
- * **Configuration & State**: `headers()`, `cookies()`, `blockResources()`, `getState()`.
27
+ * **Configuration & State**: `headers()`, `cookies()`, `blockResources()`, `getState()`, `sessionPoolOptions`.
28
28
  * **Static Registry**: It maintains a static registry of all available engine implementations (`FetchEngine.register`), allowing for dynamic selection by `id` or `mode`.
29
29
 
30
30
  ### `FetchEngine.create(context, options)`
@@ -48,10 +48,14 @@ When the library determines which engine to use (via internal `maybeCreateEngine
48
48
 
49
49
  The engine supports persisting and restoring session state (primarily cookies) between executions.
50
50
 
51
- * **`sessionState`**: A comprehensive state object (derived from Crawlee's SessionPool) that can be used to fully restore a previous session. This is set during engine initialization.
51
+ * **`sessionState`**: A comprehensive state object (derived from Crawlee's SessionPool) that can be used to fully restore a previous session. This state is **automatically included in every `FetchResponse`**, making it easy to persist and later provide back to the engine during initialization.
52
+ * **`sessionPoolOptions`**: Allows advanced configuration of the underlying Crawlee `SessionPool` (e.g., `maxUsageCount`, `maxPoolSize`).
53
+ > **Note**: `persistenceOptions.enable` is forced to `true` to ensure proper session state management.
52
54
  * **`overrideSessionState`**: If set to `true`, it forces the engine to overwrite any existing persistent state in the storage with the provided `sessionState`. This is useful when you want to ensure the session starts with the exact state provided, ignoring any stale data in the persistence layer.
53
55
  * **`cookies`**: An array of explicit cookies to use for the session.
54
56
 
57
+ > **Consistency Note**: The engine uses Crawlee's `Session` as the single source of truth for cookies. In `browser` mode, cookies are synchronized in real-time from the browser to the session during interactions (like clicks), ensuring that `getContent()` or `extract()` always reflect the latest state.
58
+
55
59
  **Precedence Rule:**
56
60
  If both `sessionState` and `cookies` are provided, the engine adopts a **"Merge and Override"** strategy:
57
61
  1. The session is first restored from the `sessionState`.
package/README.md CHANGED
@@ -133,7 +133,10 @@ This is the main entry point for the library.
133
133
  * `engine` ('http' | 'browser' | 'auto'): The engine to use. Defaults to `auto`.
134
134
  * `actions` (FetchActionOptions[]): An array of action objects to execute. (Supports `action`/`name` as alias for `id`, and `args` as alias for `params`)
135
135
  * `headers` (Record<string, string>): Headers to use for all requests.
136
- * ...and many other options for proxy, cookies, retries, etc.
136
+ * `cookies` (Cookie[]): Array of cookies to use.
137
+ * `sessionState` (any): Crawlee session state to restore.
138
+ * `sessionPoolOptions` (SessionPoolOptions): Advanced configuration for the underlying Crawlee SessionPool.
139
+ * ...and many other options for proxy, retries, etc.
137
140
 
138
141
  ### Built-in Actions
139
142
 
@@ -157,6 +160,7 @@ The `fetchWeb` function returns an object containing:
157
160
  * `statusCode`: HTTP status code.
158
161
  * `headers`: HTTP headers.
159
162
  * `cookies`: Array of cookies.
163
+ * `sessionState`: Crawlee session state.
160
164
  * `text`, `html`: Page content.
161
165
  * `outputs` (Record<string, any>): Data extracted and stored via `storeAs`.
162
166
 
package/dist/index.d.mts CHANGED
@@ -1,4 +1,4 @@
1
- import { CrawlingContext, BasicCrawler, BasicCrawlerOptions, RequestQueue, Cookie, Session, CheerioCrawlingContext, CheerioCrawler, CheerioCrawlerOptions, PlaywrightCrawlingContext, PlaywrightCrawler, PlaywrightCrawlerOptions } from 'crawlee';
1
+ import { CrawlingContext, BasicCrawler, BasicCrawlerOptions, RequestQueue, Cookie, Session, CheerioCrawlingContext, CheerioCrawler, CheerioCrawlerOptions, PlaywrightCrawlingContext, PlaywrightCrawler, PlaywrightCrawlerOptions, SessionPoolOptions } from 'crawlee';
2
2
  export { Cookie } from 'crawlee';
3
3
  import { EventEmitter } from 'events-ex';
4
4
 
@@ -947,6 +947,7 @@ declare abstract class FetchEngine<TContext extends CrawlingContext = any, TCraw
947
947
  protected requestQueue?: RequestQueue;
948
948
  protected hdrs: Record<string, string>;
949
949
  protected _initialCookies?: Cookie[];
950
+ protected _initializedSessions: Set<string>;
950
951
  protected currentSession?: Session;
951
952
  protected pendingRequests: Map<string, PendingEngineRequest>;
952
953
  protected requestCounter: number;
@@ -1258,6 +1259,7 @@ type CheerioNode = ReturnType<CheerioSelection['first']>;
1258
1259
  declare class CheerioFetchEngine extends FetchEngine<CheerioCrawlingContext, CheerioCrawler, CheerioCrawlerOptions> {
1259
1260
  static readonly id = "cheerio";
1260
1261
  static readonly mode = "http";
1262
+ private _ensureCheerioContext;
1261
1263
  protected _buildResponse(context: CheerioCrawlingContext): Promise<FetchResponse>;
1262
1264
  protected _querySelectorAll(context: {
1263
1265
  $: CheerioAPI;
@@ -1413,6 +1415,7 @@ interface BaseFetcherProperties {
1413
1415
  headers?: Record<string, string>;
1414
1416
  cookies?: Cookie[];
1415
1417
  sessionState?: any;
1418
+ sessionPoolOptions?: SessionPoolOptions;
1416
1419
  overrideSessionState?: boolean;
1417
1420
  reuseCookies?: boolean;
1418
1421
  throwHttpErrors?: boolean;
@@ -1486,6 +1489,7 @@ interface FetchResponse {
1486
1489
  text?: string;
1487
1490
  json?: any;
1488
1491
  cookies?: Cookie[];
1492
+ sessionState?: any;
1489
1493
  metadata?: FetchMetadata;
1490
1494
  }
1491
1495
  declare const DefaultFetcherProperties: BaseFetcherProperties;
package/dist/index.d.ts CHANGED
@@ -1,4 +1,4 @@
1
- import { CrawlingContext, BasicCrawler, BasicCrawlerOptions, RequestQueue, Cookie, Session, CheerioCrawlingContext, CheerioCrawler, CheerioCrawlerOptions, PlaywrightCrawlingContext, PlaywrightCrawler, PlaywrightCrawlerOptions } from 'crawlee';
1
+ import { CrawlingContext, BasicCrawler, BasicCrawlerOptions, RequestQueue, Cookie, Session, CheerioCrawlingContext, CheerioCrawler, CheerioCrawlerOptions, PlaywrightCrawlingContext, PlaywrightCrawler, PlaywrightCrawlerOptions, SessionPoolOptions } from 'crawlee';
2
2
  export { Cookie } from 'crawlee';
3
3
  import { EventEmitter } from 'events-ex';
4
4
 
@@ -947,6 +947,7 @@ declare abstract class FetchEngine<TContext extends CrawlingContext = any, TCraw
947
947
  protected requestQueue?: RequestQueue;
948
948
  protected hdrs: Record<string, string>;
949
949
  protected _initialCookies?: Cookie[];
950
+ protected _initializedSessions: Set<string>;
950
951
  protected currentSession?: Session;
951
952
  protected pendingRequests: Map<string, PendingEngineRequest>;
952
953
  protected requestCounter: number;
@@ -1258,6 +1259,7 @@ type CheerioNode = ReturnType<CheerioSelection['first']>;
1258
1259
  declare class CheerioFetchEngine extends FetchEngine<CheerioCrawlingContext, CheerioCrawler, CheerioCrawlerOptions> {
1259
1260
  static readonly id = "cheerio";
1260
1261
  static readonly mode = "http";
1262
+ private _ensureCheerioContext;
1261
1263
  protected _buildResponse(context: CheerioCrawlingContext): Promise<FetchResponse>;
1262
1264
  protected _querySelectorAll(context: {
1263
1265
  $: CheerioAPI;
@@ -1413,6 +1415,7 @@ interface BaseFetcherProperties {
1413
1415
  headers?: Record<string, string>;
1414
1416
  cookies?: Cookie[];
1415
1417
  sessionState?: any;
1418
+ sessionPoolOptions?: SessionPoolOptions;
1416
1419
  overrideSessionState?: boolean;
1417
1420
  reuseCookies?: boolean;
1418
1421
  throwHttpErrors?: boolean;
@@ -1486,6 +1489,7 @@ interface FetchResponse {
1486
1489
  text?: string;
1487
1490
  json?: any;
1488
1491
  cookies?: Cookie[];
1492
+ sessionState?: any;
1489
1493
  metadata?: FetchMetadata;
1490
1494
  }
1491
1495
  declare const DefaultFetcherProperties: BaseFetcherProperties;
package/dist/index.js CHANGED
@@ -1 +1 @@
1
- "use strict";var t,e=Object.create,i=Object.defineProperty,s=Object.getOwnPropertyDescriptor,n=Object.getOwnPropertyNames,r=Object.getPrototypeOf,o=Object.prototype.hasOwnProperty,a=(t,e,r,a)=>{if(e&&"object"==typeof e||"function"==typeof e)for(let c of n(e))o.call(t,c)||c===r||i(t,c,{get:()=>e[c],enumerable:!(a=s(e,c))||a.enumerable});return t},c={};((t,e)=>{for(var s in e)i(t,s,{get:e[s],enumerable:!0})})(c,{CheerioFetchEngine:()=>j,ClickAction:()=>N,DefaultFetcherProperties:()=>l,ExtractAction:()=>z,FetchAction:()=>f,FetchActionResultStatus:()=>h,FetchEngine:()=>S,FetchSession:()=>$,FetcherOptionKeys:()=>u,FillAction:()=>H,GetContentAction:()=>M,GotoAction:()=>L,PauseAction:()=>B,PlaywrightFetchEngine:()=>U,SubmitAction:()=>G,WaitForAction:()=>W,WebFetcher:()=>R,fetchWeb:()=>D}),module.exports=(t=c,a(i({},"__esModule",{value:!0}),t));var l={engine:"auto",enableSmart:!0,useSiteRegistry:!0,antibot:!1,headers:{},cookies:[],reuseCookies:!0,throwHttpErrors:void 0,proxy:[],blockResources:[],ignoreSslErrors:!0,browser:{engine:"playwright",headless:!0,waitUntil:"domcontentloaded"},http:{method:"GET"},timeoutMs:6e4,requestHandlerTimeoutSecs:void 0,maxConcurrency:1,maxRequestsPerMinute:1e3,delayBetweenRequestsMs:0,retries:0,sites:[]},u=Object.keys(l).concat(["actions","onPause"]),h=(t=>(t[t.Failed=0]="Failed",t[t.Success=1]="Success",t[t.Skipped=2]="Skipped",t))(h||{}),w=class t{static register(t){const e=t.id;if(!e)throw new Error("FetchAction.register: actionClass.id is required");this.registry.set(e,t)}static get(t){return this.registry.get(t)}static create(e){const i="string"==typeof e?e:e.id||e.name||e.action;if(!i)throw new Error("Action must have id, name or action");const s=i instanceof t?i.constructor:this.registry.get(i);return s?new s:void 0}static has(t){return this.registry.has(t)}static list(){return Array.from(this.registry.keys())}static getCapability(t){return this.capabilities[t]??"noop"}getCapability(t){return this.constructor.getCapability(t)}get id(){return this.constructor.id}get returnType(){return this.constructor.returnType}get capabilities(){return this.constructor.capabilities}async delegateToEngine(t,e,...i){const s=t.internal.engine;if(!s)throw new Error("No engine available");if("function"!=typeof s[e])throw new Error(`Engine does not have a method named '${String(e)}'`);return await s[e](...i)}installCollectors(e,i){const s=i?.collectors;if(!s?.length)return;const n=[],r=new Set;for(const i of s){const s=d(i.activateOn),o=d(i.collectOn),a=d(i.deactivateOn),c=!(i.background??!0),l=t.create(i);if(!l)continue;let u=!1,h=!1,w=0;const f=async t=>{if(!u&&!h){u=!0;try{await(l.onBeforeExec?.(e,i))}catch(t){e.eventBus.emit("collector:error",{action:this.id,collector:l.id,phase:"before",error:t})}}},m=async(t,s)=>{if(!h){u||await f(s);try{const n=Promise.resolve(l.onExecute?.(e,i,s)).then(s=>{var n,r;if(i.storeAs){((n=e.outputs)[r=i.storeAs]||(n[r]=[])).push(s)}return e.eventBus.emit("collector:result",{action:this.id,collector:i.id||i.name,event:t,result:s}),s}).catch(s=>{e.eventBus.emit("collector:error",{action:this.id,collector:i.id||i.name,event:t,phase:"exec",error:s})}).finally(()=>{w++});c&&(r.add(n),n.finally(()=>r.delete(n)))}catch(i){e.eventBus.emit("collector:error",{action:this.id,collector:l.id,event:t,phase:"exec",error:i})}}},g=async()=>{if(!h){0===w&&m("collector:after"),h=!0;try{await(l.onAfterExec?.(e,i))}catch(t){e.eventBus.emit("collector:error",{action:this.id,collector:i.id||i.name,phase:"after",error:t})}finally{e.eventBus.emit("collector:end",{action:this.id,collector:i.id||i.name}),b.forEach(t=>t())}}},v=p(e,s,f),b=y(e,o,m),x=p(e,a,g);if(n.push(...v,...b,...x),!s.length&&!o.length&&!a.length){const t=()=>{g()};e.eventBus.once(`action:${this.id}.end`,t),n.push(()=>e.eventBus.off("fetcher:action:end",t))}}return n.length||r.size>0?{cleanup:()=>n.forEach(t=>t()),awaitExecPendings:async()=>{r.size>0&&await Promise.allSettled(Array.from(r))}}:void 0}async beforeExec(t,e){t.internal.actionStack||(t.internal.actionStack=[]);const i=t.internal.actionStack,s=i.length,n=i.length>0?i[i.length-1].id:void 0,r={...e,id:this.id,depth:s,parent:n};i.push(r),t.currentAction=r;const o={action:this,context:t,options:e,depth:s,stack:[...i]};t.eventBus.emit(`action:${this.id}.start`,o),t.eventBus.emit("action:start",o),await(this.onBeforeExec?.(t,e));return{entry:o,collectors:this.installCollectors(t,e)}}async afterExec(t,e,i,s){const n=t.internal.actionStack,r=n.length-1,o=s?.collectors;try{await(o?.awaitExecPendings()),t.lastResult=i,"response"!==i?.returnType||i.error||(t.lastResponse=i.result),e?.storeAs&&(t.outputs[e.storeAs]=i?.result),i?.error&&(t.currentAction.error=i.error),await(this.onAfterExec?.(t,e));const s={action:this,context:t,options:e,result:i,depth:r,stack:[...n]};i?.error&&(s.error=i.error);try{t.eventBus.emit(`action:${this.id}.end`,s)}catch(t){}try{t.eventBus.emit("action:end",s)}catch(t){}}finally{try{o?.cleanup()}finally{n.pop();const e=n.length;t.currentAction=e>0?n[e-1]:void 0}}}async execute(t,e){e?.args&&!e.params&&(e.params=e.args);const i=await this.beforeExec(t,e),s=e?.failOnError??!0;let n;try{return t.throwHttpErrors=s,n=await this.onExecute(t,e),n&&n.returnType||(n={status:1,returnType:this.returnType??"any",result:n}),n}catch(e){if(n={status:0,error:e,meta:{id:this.id,engineType:t.engine,capability:this.getCapability(t.engine)}},s)throw e;return n}finally{await this.afterExec(t,e,n,i)}}};w.registry=new Map,w.returnType="any",w.capabilities={http:"noop",browser:"noop"};var f=w;function d(t){return t?Array.isArray(t)?t:[t]:[]}function p(t,e,i){const s=[];for(const n of e)if("string"==typeof n||n instanceof RegExp){const e=(...t)=>{i(t[0])};t.eventBus.once(n,e),s.push(()=>t.eventBus.off(n,e))}return s}function y(t,e,i){const s=[];for(const n of e)if("string"==typeof n||n instanceof RegExp){const e=t=>i(n,t);t.eventBus.on(n,e),s.push(()=>t.eventBus.off(n,e))}return s}var m=require("events-ex");var g=require("lodash-es"),v=(0,require("nanoid").customAlphabet)("0123456789abcdefghijklmnopqrstuvwxyz",12);var b=require("lodash-es"),x=require("events-ex"),q=require("@isdk/common-error"),E=require("crawlee");function k(){let t=()=>{};const e=new Promise(e=>{t=e});return e.release=t,e}E.Configuration.getGlobalConfig().set("persistStorage",!1);var S=class{constructor(){this.hdrs={},this.pendingRequests=new Map,this.requestCounter=0,this.actionEmitter=new x.EventEmitter,this.isPageActive=!1,this.navigationLock=function(){const t=k();return t.release(),t}(),this.blockedTypes=new Set}static register(t){const e=t.id;if(!e)throw new Error("Engine must define static id");if(this.registry.has(e))throw new Error(`Engine id duplicated: ${e}`);this.registry.set(e,t)}static get(t){return this.registry.get(t)}static getByMode(t){for(const[e,i]of this.registry.entries())if(i.mode===t)return i}static async create(t,e){const i=(0,b.defaultsDeep)(e,t,l),s=i.engine??t.engine,n=s?this.get(s)??this.getByMode(s):null;if(n){const e=new n;return await e.initialize(t,i),e}}async _extract(t,e){const i=t.type;if(!e)return"array"===i?[]:null;if("object"===i){const{selector:i,properties:s}=t;let n=e;if(i){const t=await this._querySelectorAll(e,i);n=t.length>0?t[0]:null}if(!n)return null;const r={};for(const t in s)r[t]=await this._extract(s[t],n);return r}if("array"===i){const{selector:i,items:s}=t,n=i?await this._querySelectorAll(e,i):[e],r=[];for(const t of n)r.push(await this._extract(s,t));return r}const{selector:s}=t;let n=e;if(s){const t=await this._querySelectorAll(e,s);n=t.length>0?t[0]:null}return n?this._extractValue(t,n):null}async buildResponse(t){const e=await this._buildResponse(t),i=e.headers["content-type"]||"";return e.contentType=i.split(";")[0].trim(),e}waitFor(t){return this.dispatchAction({type:"waitFor",options:t})}click(t){return this.dispatchAction({type:"click",selector:t})}fill(t,e){return this.dispatchAction({type:"fill",selector:t,value:e})}submit(t,e){return this.dispatchAction({type:"submit",selector:t,options:e})}pause(t){return this.dispatchAction({type:"pause",message:t})}extract(t){const e=this._normalizeSchema(t);return this.dispatchAction({type:"extract",schema:e})}_normalizeSchema(t){const e=JSON.parse(JSON.stringify(t));if(e.properties)for(const t in e.properties)e.properties[t]=this._normalizeSchema(e.properties[t]);if(e.items&&(e.items=this._normalizeSchema(e.items)),"array"===e.type&&(e.attribute&&!e.items&&(e.items={attribute:e.attribute},delete e.attribute),e.items||(e.items={type:"string"})),e.selector&&(e.has||e.exclude)){const{selector:t,has:i,exclude:s}=e,n=t.split(",").map(t=>{let e=t.trim();return i&&(e=`${e}:has(${i})`),s&&(e=`${e}:not(${s})`),e}).join(", ");e.selector=n,delete e.has,delete e.exclude}return e}get id(){return this.constructor.id}async getState(){return{cookies:await this.cookies(),sessionState:await(this.crawler?.sessionPool?.getState())}}get mode(){return this.constructor.mode}get context(){return this.ctx}async initialize(t,e){if(this.ctx)return;(0,b.merge)(t,e),this.ctx=t,this.opts=t,this.hdrs=function(t){const e={};if(t&&"object"==typeof t)for(const[i,s]of Object.entries(t))e[i.toLowerCase()]=s;return e}(t.headers),this._initialCookies=[...t.cookies??[]],t.internal||(t.internal={}),t.internal.engine=this,t.engine=this.mode,this.actionEmitter.setMaxListeners(100),this.requestQueue=await E.RequestQueue.open();const i=await this._getSpecificCrawlerOptions(t);t.sessionState&&t.cookies&&t.cookies.length>0&&console.warn('[FetchEngine] Warning: Both "sessionState" and "cookies" are provided. Explicit "cookies" will override any conflicting cookies restored from "sessionState".');const s={...(0,b.defaultsDeep)(i,{requestQueue:this.requestQueue,maxConcurrency:1,minConcurrency:1,useSessionPool:!0,persistCookiesPerSession:!0,sessionPoolOptions:{maxPoolSize:1,persistenceOptions:{enable:!0},sessionOptions:{maxUsageCount:1e3,maxErrorScore:3}}}),requestHandler:this._requestHandler.bind(this),errorHandler:this._failedRequestHandler.bind(this),failedRequestHandler:this._failedRequestHandler.bind(this)};s.preNavigationHooks||(s.preNavigationHooks=[]),s.preNavigationHooks.unshift(({crawler:t,session:e,request:i},s)=>{if(this.currentSession=e,this._initialCookies&&this._initialCookies.length>0&&e){const t=this._initialCookies.map(t=>{const e={...t};return"no_restriction"===e.sameSite&&(e.sameSite="None"),e});e.setCookies(t,i.url),this._initialCookies=void 0}});const n=this.crawler=this._createCrawler(s),r=await E.KeyValueStore.open(null,{config:n.config}),o=await r.getValue(E.PERSIST_STATE_KEY);!t.sessionState||o&&!t.overrideSessionState||await r.setValue(E.PERSIST_STATE_KEY,t.sessionState),this.crawler.run().then(()=>{this.isCrawlerReady=!0}).catch(t=>{this.isCrawlerReady=!1,console.error("Crawler background error:",t)})}async cleanup(){await(this._cleanup?.()),await this._commonCleanup();const t=this.ctx;t&&t.internal?.engine===this&&(t.internal.engine=void 0),this.ctx=void 0,this.opts=void 0}async _executePendingActions(t){await new Promise(e=>{const i=async({action:e,resolve:i,reject:s})=>{try{if("dispose"===e.type)return this.actionEmitter.emit("dispose"),void i();i(await this.executeAction(t,e))}catch(t){s(t)}};this.actionEmitter.on("dispatch",i),this.actionEmitter.once("dispose",()=>{this.actionEmitter.removeListener("dispatch",i),e()})})}async _sharedRequestHandler(t){try{this.currentSession=t.session;const{request:e}=t;this.isPageActive=!0;const i=this.pendingRequests.get(e.userData.requestId);if(i){const s=await this._buildResponse(t),n=!s.statusCode||s.statusCode>=400;if(this.ctx?.throwHttpErrors&&n){const t=new q.CommonError(`Request for ${s.finalUrl} failed with status ${s.statusCode||"N/A"}`,"request",s.statusCode);i.reject(t)}else this.lastResponse=s,i.resolve(s);this.pendingRequests.delete(e.userData.requestId)}await this._executePendingActions(t)}finally{this.isPageActive=!1,this.navigationLock.release()}}async _sharedFailedRequestHandler(t,e){const{request:i}=t,s=this.pendingRequests.get(i.userData.requestId);if(s&&e&&this.ctx?.throwHttpErrors){this.pendingRequests.delete(i.userData.requestId);const t=e.response,n=t?.statusCode||500,r=t?.url?t.url:i.url,o=new q.CommonError(`Request${r?" for "+r:""} failed: ${e.message}`,"request",n);s.reject(o)}return this._sharedRequestHandler(t)}async dispatchAction(t){if(!this.isPageActive)throw new Error("No active page. Call goto() before performing actions.");return new Promise((e,i)=>{this.actionEmitter.emit("dispatch",{action:t,resolve:e,reject:i})})}async _requestHandler(t){await this._sharedRequestHandler(t)}async _failedRequestHandler(t,e){await this._sharedFailedRequestHandler(t,e)}async _commonCleanup(){if(this.isPageActive&&await this.dispatchAction({type:"dispose"}).catch(()=>{}),this.pendingRequests.size>0)for(const[,t]of this.pendingRequests)t.reject(new Error("Cleanup:Request cancelled"));if(this.actionEmitter.removeAllListeners(),this.crawler){try{await(this.crawler.teardown?.())}catch(t){console.error("ccrawler teardown error:",t)}this.crawler=void 0}this.isCrawlerReady=void 0,this.requestQueue&&(await this.requestQueue.drop(),this.requestQueue=void 0),this.pendingRequests.clear()}async blockResources(t,e){return e&&this.blockedTypes.clear(),t.forEach(t=>this.blockedTypes.add(t)),t.length}getContent(){return this.lastResponse?Promise.resolve(this.lastResponse):Promise.reject(new Error("No content fetched yet. Call goto() first."))}async headers(t,e){if(void 0===t)return{...this.hdrs};if("string"==typeof t&&void 0===e)return this.hdrs[t.toLowerCase()]||"";if(null!==t&&"object"==typeof t){const i={};for(const[e,s]of Object.entries(t))i[e.toLowerCase()]=String(s);return this.hdrs=!0===e?i:{...this.hdrs,...i},!0}return"string"==typeof t&&("string"==typeof e?this.hdrs[t.toLowerCase()]=e:null===e&&delete this.hdrs[t.toLowerCase()],!0)}async cookies(t){const e=this.lastResponse?.url||"";if(Array.isArray(t))return this.currentSession?this.currentSession.setCookies(t,e):this._initialCookies=[...t],!0;if(null===t)return this.currentSession,this._initialCookies=[],!0;if(this.currentSession){return this.currentSession.getCookies(e)}return[...this._initialCookies||[]]}async dispose(){await this.cleanup()}};async function C(t,e){let i;if(e?.engine){if(i=await S.create(t,{engine:e.engine}),!i)throw new Error(`No engine available for ${e.engine}`);return i}const s=function(t,e){if(!t||!e?.length)return null;const i=new URL(t);let s=e.find(t=>t.domain===i.hostname);s||(s=e.find(t=>i.hostname.endsWith(t.domain)));if(!s)return null;if(s.pathScope?.length){if(!s.pathScope.some(t=>i.pathname.startsWith(t)))return null}return s}(e?.url||t.url,t.sites),n=t.engine||s?.engine||"auto";return i=await S.create(t,{engine:n}),i||(i=await S.create(t,{engine:"http"})),i}S.registry=new Map;var $=class{constructor(t={},e){this.options=t,this.engine=e,this.closed=!1,this.id=v(),this.context=this.createContext(t)}async execute(t){await this.ensureEngine(t);const e=f.create(t);if(!e)throw new Error(`Unknown action: ${t.id||t.name}`);let i,s;this.context.internal.actionIndex=(this.context.internal.actionIndex||0)+1,this.context.currentAction={...t,index:this.context.internal.actionIndex,startedAt:Date.now()};try{return i=await e.execute(this.context,t),i}catch(t){throw s=t,s}finally{this.context.currentAction=void 0}}async executeAll(t){let e=0;try{for(;e<t.length;){const i=t[e];await this.execute(i),e++}const i=await this.execute({id:"getContent"});return{result:i?.result,outputs:this.getOutputs()}}catch(t){throw t.actionIndex=e,t}}getOutputs(){return this.context.outputs}async getState(){return this.context.internal.engine?.getState()}async dispose(){if(this.closed)return;const t=this.context.eventBus;t.emit("session:closing",{sessionId:this.id});try{await(this.context.internal.engine?.dispose())}finally{this.closed=!0}t.emit("session:closed",{sessionId:this.id})}async ensureEngine(t){if(this.closed)throw new Error("Session is closed");if(!this.context.internal.engine){const e=t?.params?.url??this.context.url;if(!await C(this.context,{url:e,engine:this.engine}))throw new Error("No engine found")}}createContext(t=this.options){const e=new m.EventEmitter;return(0,g.defaultsDeep)({...t,id:this.id,eventBus:e,outputs:{},internal:{},execute:async t=>this.execute(t),action:async function(t,e,i){return this.execute({name:t,params:e,...i})}},l)}},R=class{constructor(t={}){this.defaults=t}async createSession(t){const e={...this.defaults,...t||{}};return new $(e)}async fetch(t,e){"string"!=typeof t&&(t=(e=t).url);const i=await this.createSession(e);try{const s=e?.actions||[];t&&0!==s.findIndex(e=>"goto"===e.id&&e.params?.url===t)&&s.unshift({id:"goto",params:{url:t}});return await i.executeAll(s)}finally{await i.dispose()}}},A=require("crawlee"),P=((t,s,n)=>(n=null!=t?e(r(t)):{},a(!s&&t&&t.__esModule?n:i(n,"default",{value:t,enumerable:!0}),t)))(require("cheerio")),F=require("@isdk/common-error"),j=class extends S{async _buildResponse(t){const{request:e,response:i,body:s,$:n}=t,r=n?.html();let o="string"==typeof s?s:Buffer.isBuffer(s)?s.toString("utf-8"):String(s??"");r&&r!==o&&(o=r);let a=i?.headers;if(!a&&i?.rawHeaders){a={};const t=i.rawHeaders;for(let e=0;e<t.length;e+=2)a[t[e].toLowerCase()]=t[e+1]}return{url:e.url,finalUrl:e.loadedUrl||e.url,statusCode:i?.statusCode??200,statusText:i?.statusMessage,headers:a||{},cookies:t.session?.getCookies(e.url),body:s,html:o,text:o}}async _querySelectorAll(t,e){const{$:i,el:s}=t;return s.find(e).toArray().map(t=>({$:i,el:i(t)}))}async _extractValue(t,e){const{el:i}=e,{attribute:s,type:n="string"}=t;if(0===i.length)return null;let r="";if(r=s?i.attr(s)??null:"html"===n?i.html():i.text().trim(),null===r)return null;switch(n){case"number":return parseFloat(r.replace(/[^0-9.-]+/g,""))||null;case"boolean":const t=r.toLowerCase();return"true"===t||"1"===t;default:return r}}async executeAction(t,e){const{$:i}=t;switch(e.type){case"dispose":return;case"extract":if(!i)throw new F.CommonError(`Cheerio context not available for action: ${e.type}`,"extract");return this._extract(e.schema,{$:i,el:i.root()});case"click":{if(!i)throw new F.CommonError(`Cheerio context not available for action: ${e.type}`,"click");const s=e.selector,n=i(s).first();let r;if(0===n.length)try{r=new URL(s,t.request.loadedUrl||t.request.url).href}catch{throw new F.CommonError(`click: selector not found or invalid URL: ${s}`,"click")}else{if(!n.is("a")||!n.attr("href")){if(n.is('input[type="submit"], button[type="submit"], button, input')){const e=n.closest("form");if(e.length)return this.executeAction(t,{type:"submit",selector:e});throw new F.CommonError("click: submit-like element without form","click")}throw new F.CommonError(`click: unsupported element for http simulate. Selector: ${s}`,"click")}{const e=n.attr("href");r=new URL(e,t.request.loadedUrl||t.request.url).href}}const o=await t.sendRequest({url:r});return void this._updateStateAfterNavigation(t,o)}case"fill":{if(!i)throw new F.CommonError(`Cheerio context not available for action: ${e.type}`),"fill";const s=i(e.selector).first();if(0===s.length)throw new F.CommonError(`fill: selector not found: ${e.selector}`);if(!s.is("input, textarea, select"))throw new F.CommonError(`fill: not a form field: ${e.selector}`);return s.val(e.value),void(this.lastResponse=await this.buildResponse(t))}case"waitFor":return void(e.options?.ms&&await new Promise(t=>setTimeout(t,e.options.ms)));case"pause":const s=this.ctx?.onPause;return void(s?(console.info(e.message||"Execution paused for manual intervention."),await s({message:e.message}),console.info("Resuming execution...")):console.warn("[PauseAction] was called, but no `onPause` handler was provided in fetchWeb options. Skipped."));case"submit":{if(!i)throw new F.CommonError(`Cheerio context not available for action: ${e.type}`,"submit");const s="string"==typeof e.selector?i(e.selector).first():null!=e.selector?e.selector:i("form").first();if(0===s.length)throw new F.NotFoundError(e.selector,"submit");const n=s.attr("action")||t.request.loadedUrl||t.request.url,r=(s.attr("method")||"GET").toUpperCase(),o=new URL(n,t.request.loadedUrl||t.request.url).href,a={};let c;if(s.find("input, select, textarea").each((t,e)=>{const s=i(e),n=s.attr("name");if(!n)return;const r=s.val();null!=r&&(a[n]=String(r))}),"GET"===r){const e=new URL(o);Object.entries(a).forEach(([t,i])=>e.searchParams.set(t,i)),c=await t.sendRequest({url:e.href,method:"GET"})}else{let i;const n={};"application/json"===(e.options?.enctype||s.attr("enctype")||"application/x-www-form-urlencoded")?(i=JSON.stringify(a),n["Content-Type"]="application/json"):(i=new URLSearchParams(a).toString(),n["Content-Type"]="application/x-www-form-urlencoded"),c=await t.sendRequest({url:o,method:"POST",body:i,headers:n})}return void this._updateStateAfterNavigation(t,c)}case"getContent":return this.buildResponse(t);default:throw new F.CommonError(`Unknown action type: ${e.type}`,"CheerioFetchEngine.executeAction",F.ErrorCode.NotSupported)}}_updateStateAfterNavigation(t,e){const i=e;let s=i.headers;if(!s&&i.rawHeaders){s={};for(let t=0;t<i.rawHeaders.length;t+=2)s[i.rawHeaders[t].toLowerCase()]=i.rawHeaders[t+1]}s=s||{};const n=i.body,r=P.load(n??"");t.$=r,t.response=i,t.body=n;const o=r.html(),a=r.text(),c=(s["content-type"]||"").split(";")[0].trim();this.lastResponse={url:t.request.url,finalUrl:i.url,statusCode:i.statusCode,statusText:i.statusMessage,headers:s,contentType:c,body:n,html:o,text:a}}_createCrawler(t){return new A.CheerioCrawler(t)}_getSpecificCrawlerOptions(t){const e=this.opts?.proxy?"string"==typeof this.opts.proxy?[this.opts.proxy]:this.opts.proxy:void 0,i=e?.length?new A.ProxyConfiguration({proxyUrls:e}):void 0;return{additionalMimeTypes:["text/plain"],maxRequestRetries:1,requestHandlerTimeoutSecs:t.requestHandlerTimeoutSecs,proxyConfiguration:i,preNavigationHooks:[({session:e,request:i},s)=>{s.throwHttpErrors=t.throwHttpErrors,this.opts?.timeoutMs&&(s.timeout={request:this.opts.timeoutMs})}]}}async goto(t,e){this.isPageActive&&this.dispatchAction({type:"dispose"}).catch(()=>{});const i="req-"+ ++this.requestCounter,s=new Promise((t,s)=>{const n=e?.timeoutMs||this.opts?.timeoutMs||3e4,r=setTimeout(()=>{this.pendingRequests.delete(i),this.navigationLock.release(),s(new F.CommonError(`goto timed out after ${n}ms.`,"gotoTimeout",F.ErrorCode.RequestTimeout))},n);this.pendingRequests.set(i,{resolve:e=>{clearTimeout(r),t(e)},reject:t=>{clearTimeout(r),s(t)}})});return this.requestQueue.addRequest({...e,url:t,headers:{...this.hdrs,...e?.headers},userData:{requestId:i},uniqueKey:`${t}-${i}`}).catch(t=>{const e=this.pendingRequests.get(i);e&&(this.pendingRequests.delete(i),this.navigationLock.release(),e.reject(t))}),await this.navigationLock,this.navigationLock=k(),s}};j.id="cheerio",j.mode="http",S.register(j);var O=require("crawlee"),T=require("playwright"),_=require("@isdk/common-error"),U=class extends S{async _buildResponse(t){const{page:e,response:i,request:s,session:n}=t;if(!e||e.isClosed())return{url:s.url,finalUrl:s.loadedUrl||s.url,statusCode:i?.status(),statusText:i?.statusText(),headers:await(i?.allHeaders())||{},cookies:[],body:"",html:"",text:""};const r=await e.content(),o=await e.textContent("body"),a=await e.context().cookies();return console.log("[PlaywrightEngine] Cookies from page context:",a),n&&n.setCookies(a,s.url),{url:e.url(),finalUrl:e.url(),statusCode:i?.status(),statusText:i?.statusText(),headers:await(i?.allHeaders())||{},cookies:a,body:r,html:r,text:o||""}}async _querySelectorAll(t,e){return t.locator(e).all()}async _extractValue(t,e){const{attribute:i,type:s="string"}=t;if(0===await e.count())return null;let n="";if(n=i?await e.getAttribute(i):"html"===s?await e.innerHTML():await e.textContent(),null===n)return null;switch(n=n.trim(),s){case"number":return parseFloat(n.replace(/[^0-9.-]+/g,""))||null;case"boolean":const t=n.toLowerCase();return"true"===t||"1"===t;default:return n}}async executeAction(t,e){const{page:i}=t,s=this.opts?.timeoutMs||3e4;switch(e.type){case"navigate":{const s=await i.goto(e.url,{waitUntil:e.opts?.waitUntil||"domcontentloaded",timeout:this.opts?.timeoutMs||3e4});s&&(t={...t,response:s});const n=await this.buildResponse(t);return this.lastResponse=n,n}case"extract":{const s=await this._extract(e.schema,i.locator("body"));return this.lastResponse=await this.buildResponse(t),s}case"click":{await i.click(e.selector,{timeout:s}),await i.waitForLoadState("networkidle",{timeout:s});const n=await this.buildResponse(t);return void(this.lastResponse=n)}case"fill":await i.fill(e.selector,e.value,{timeout:s});const n=await this.buildResponse(t);return void(this.lastResponse=n);case"waitFor":try{e.options?.selector&&await i.waitForSelector(e.options.selector,{timeout:s}),e.options?.networkIdle&&await i.waitForLoadState("networkidle",{timeout:s})}catch(t){if(!1!==e.options?.failOnTimeout)throw t}return void(e.options?.ms&&await i.waitForTimeout(e.options.ms));case"submit":{const n=e.selector||"form",r=i.locator(n).first();if(0===await r.count())throw new _.NotFoundError(n,"submit");if("application/json"===(e.options?.enctype||"application/x-www-form-urlencoded")){const t=await r.elementHandle();if(!t)throw new _.CommonError(`submit: could not get form handle for ${n}`,"submit");const e=await t.evaluate(async t=>{const e=new FormData(t),i={};e.forEach((t,e)=>{i[e]=t.toString()});const s=await fetch(t.action,{method:t.method,headers:{"Content-Type":"application/json"},body:JSON.stringify(i)}),n=await s.text();return{status:s.status,statusText:s.statusText,headers:Object.fromEntries(s.headers.entries()),body:n,html:n,text:n,url:t.action,finalUrl:s.url}});return await t.dispose(),await i.setContent(e.html),void(this.lastResponse=e)}return await r.evaluate(t=>t.submit()),await i.waitForLoadState("networkidle",{timeout:s}),void(this.lastResponse=await this.buildResponse(t))}case"pause":{const t=this.ctx?.onPause;return void(t?(console.info(e.message||"Execution paused for manual intervention."),await t({message:e.message}),console.info("Resuming execution...")):console.warn("[PauseAction] was called, but no `onPause` handler was provided in fetchWeb options. Skipped."))}case"getContent":return this.buildResponse(t);default:throw new _.CommonError(`Unknown action type: ${e.type}`,"PlaywrightFetchEngine.executeAction",_.ErrorCode.NotSupported)}}_createCrawler(t){return new O.PlaywrightCrawler(t)}async _getSpecificCrawlerOptions(t){const e=t.browser?.headless??!0,i={maxRequestRetries:t.retries||3,headless:e,requestHandlerTimeoutSecs:t.requestHandlerTimeoutSecs,preNavigationHooks:[async({page:e,request:i},s)=>{s.throwHttpErrors=t.throwHttpErrors;const n=this.blockedTypes;n.size>0&&await e.route("**/*",t=>{n.has(t.request().resourceType())?t.abort():t.continue()})}]};if(this.opts?.antibot){i.browserPoolOptions={useFingerprints:!1};const{launchOptions:t}=await import("camoufox-js"),s=await t({headless:e});i.launchContext={launcher:T.firefox,launchOptions:s},i.postNavigationHooks=[async({page:t,handleCloudflareChallenge:e})=>{await e()}]}return i}async goto(t,e){if(this.isPageActive)return this.dispatchAction({type:"navigate",url:t,opts:e});if(!this.requestQueue)throw new _.CommonError("RequestQueue not initialized","goto");const i="req-"+ ++this.requestCounter,s=new Promise((t,e)=>{this.pendingRequests.set(i,{resolve:t,reject:e})});return await this.requestQueue.addRequest({url:t,headers:this.hdrs,userData:{requestId:i,waitUntil:e?.waitUntil||"domcontentloaded"},uniqueKey:`${t}-${i}`}),s}};U.id="playwright",U.mode="browser",S.register(U);var N=class extends f{async onExecute(t,e){const{selector:i,...s}=e?.params||{};if(!i)throw new Error("Selector is required for click action");await this.delegateToEngine(t,"click",i,s)}};N.id="click",N.returnType="none",N.capabilities={http:"simulate",browser:"native"},f.register(N);var H=class extends f{async onExecute(t,e){const{selector:i,value:s,...n}=e?.params||{};if(!i)throw new Error("Selector is required for fill action");if(void 0===s)throw new Error("Value is required for fill action");await this.delegateToEngine(t,"fill",i,s,n)}};H.id="fill",H.returnType="none",H.capabilities={http:"simulate",browser:"native"},f.register(H);var M=class extends f{async onExecute(t,e){return await this.delegateToEngine(t,"getContent",e?.params)}};M.id="getContent",M.returnType="response",M.capabilities={http:"native",browser:"native"},f.register(M);var L=class extends f{async onExecute(t,e,i){const s=e?.params,n=s?.url||t.url;if(!n)throw new Error("URL is required for goto action");const r=t.internal.engine;if(!r)throw new Error("No engine available");t.url=n;return await r.goto(n,s)}};L.id="goto",L.returnType="response",L.capabilities={http:"native",browser:"native"},f.register(L);var G=class extends f{async onExecute(t,e){const{selector:i,...s}=e?.params||{};await this.delegateToEngine(t,"submit",i,s)}};G.id="submit",G.returnType="none",G.capabilities={http:"simulate",browser:"native"},f.register(G);var W=class extends f{async onExecute(t,e){const i=t.internal.engine;if(!i)throw new Error("No engine available");await i.waitFor(e?.params)}};W.id="waitFor",W.returnType="none",W.capabilities={http:"native",browser:"native"},f.register(W);var z=class extends f{async onExecute(t,e){const i=e?.params;if(!i)throw new Error("Schema is required for extract action");return this.delegateToEngine(t,"extract",i)}};z.id="extract",z.returnType="any",z.capabilities={http:"native",browser:"native"},f.register(z);var B=class extends f{async onExecute(t,e){const{selector:i,message:s,attribute:n}=e?.params||{},r=t.internal.engine;if("browser"===r?.mode){if(i){if(!await(r?.extract({selector:i,attribute:n})))return}r&&"pause"in r?await r.pause(s):console.warn("[PauseAction] was called, but the current engine does not support `pause`. Skipped.")}else console.warn("[PauseAction] can only run in browser engine. Skipped.")}};async function D(t,e){return(new R).fetch(t,e)}B.id="pause",B.capabilities={http:"native",browser:"native"},B.returnType="none",f.register(B);
1
+ "use strict";var t,e=Object.create,i=Object.defineProperty,s=Object.getOwnPropertyDescriptor,n=Object.getOwnPropertyNames,r=Object.getPrototypeOf,o=Object.prototype.hasOwnProperty,a=(t,e,r,a)=>{if(e&&"object"==typeof e||"function"==typeof e)for(let c of n(e))o.call(t,c)||c===r||i(t,c,{get:()=>e[c],enumerable:!(a=s(e,c))||a.enumerable});return t},c={};((t,e)=>{for(var s in e)i(t,s,{get:e[s],enumerable:!0})})(c,{CheerioFetchEngine:()=>O,ClickAction:()=>N,DefaultFetcherProperties:()=>l,ExtractAction:()=>W,FetchAction:()=>f,FetchActionResultStatus:()=>h,FetchEngine:()=>k,FetchSession:()=>$,FetcherOptionKeys:()=>u,FillAction:()=>H,GetContentAction:()=>M,GotoAction:()=>L,PauseAction:()=>z,PlaywrightFetchEngine:()=>U,SubmitAction:()=>B,WaitForAction:()=>G,WebFetcher:()=>R,fetchWeb:()=>D}),module.exports=(t=c,a(i({},"__esModule",{value:!0}),t));var l={engine:"auto",enableSmart:!0,useSiteRegistry:!0,antibot:!1,headers:{},cookies:[],reuseCookies:!0,throwHttpErrors:void 0,proxy:[],blockResources:[],ignoreSslErrors:!0,browser:{engine:"playwright",headless:!0,waitUntil:"domcontentloaded"},http:{method:"GET"},timeoutMs:6e4,requestHandlerTimeoutSecs:void 0,maxConcurrency:1,maxRequestsPerMinute:1e3,delayBetweenRequestsMs:0,retries:0,sites:[]},u=Object.keys(l).concat(["actions","onPause"]),h=(t=>(t[t.Failed=0]="Failed",t[t.Success=1]="Success",t[t.Skipped=2]="Skipped",t))(h||{}),w=class t{static register(t){const e=t.id;if(!e)throw new Error("FetchAction.register: actionClass.id is required");this.registry.set(e,t)}static get(t){return this.registry.get(t)}static create(e){const i="string"==typeof e?e:e.id||e.name||e.action;if(!i)throw new Error("Action must have id, name or action");const s=i instanceof t?i.constructor:this.registry.get(i);return s?new s:void 0}static has(t){return this.registry.has(t)}static list(){return Array.from(this.registry.keys())}static getCapability(t){return this.capabilities[t]??"noop"}getCapability(t){return this.constructor.getCapability(t)}get id(){return this.constructor.id}get returnType(){return this.constructor.returnType}get capabilities(){return this.constructor.capabilities}async delegateToEngine(t,e,...i){const s=t.internal.engine;if(!s)throw new Error("No engine available");if("function"!=typeof s[e])throw new Error(`Engine does not have a method named '${String(e)}'`);return await s[e](...i)}installCollectors(e,i){const s=i?.collectors;if(!s?.length)return;const n=[],r=new Set;for(const i of s){const s=d(i.activateOn),o=d(i.collectOn),a=d(i.deactivateOn),c=!(i.background??!0),l=t.create(i);if(!l)continue;let u=!1,h=!1,w=0;const f=async t=>{if(!u&&!h){u=!0;try{await(l.onBeforeExec?.(e,i))}catch(t){e.eventBus.emit("collector:error",{action:this.id,collector:l.id,phase:"before",error:t})}}},m=async(t,s)=>{if(!h){u||await f(s);try{const n=Promise.resolve(l.onExecute?.(e,i,s)).then(s=>{var n,r;if(i.storeAs){((n=e.outputs)[r=i.storeAs]||(n[r]=[])).push(s)}return e.eventBus.emit("collector:result",{action:this.id,collector:i.id||i.name,event:t,result:s}),s}).catch(s=>{e.eventBus.emit("collector:error",{action:this.id,collector:i.id||i.name,event:t,phase:"exec",error:s})}).finally(()=>{w++});c&&(r.add(n),n.finally(()=>r.delete(n)))}catch(i){e.eventBus.emit("collector:error",{action:this.id,collector:l.id,event:t,phase:"exec",error:i})}}},g=async()=>{if(!h){0===w&&m("collector:after"),h=!0;try{await(l.onAfterExec?.(e,i))}catch(t){e.eventBus.emit("collector:error",{action:this.id,collector:i.id||i.name,phase:"after",error:t})}finally{e.eventBus.emit("collector:end",{action:this.id,collector:i.id||i.name}),v.forEach(t=>t())}}},b=p(e,s,f),v=y(e,o,m),x=p(e,a,g);if(n.push(...b,...v,...x),!s.length&&!o.length&&!a.length){const t=()=>{g()};e.eventBus.once(`action:${this.id}.end`,t),n.push(()=>e.eventBus.off("fetcher:action:end",t))}}return n.length||r.size>0?{cleanup:()=>n.forEach(t=>t()),awaitExecPendings:async()=>{r.size>0&&await Promise.allSettled(Array.from(r))}}:void 0}async beforeExec(t,e){t.internal.actionStack||(t.internal.actionStack=[]);const i=t.internal.actionStack,s=i.length,n=i.length>0?i[i.length-1].id:void 0,r={...e,id:this.id,depth:s,parent:n};i.push(r),t.currentAction=r;const o={action:this,context:t,options:e,depth:s,stack:[...i]};t.eventBus.emit(`action:${this.id}.start`,o),t.eventBus.emit("action:start",o),await(this.onBeforeExec?.(t,e));return{entry:o,collectors:this.installCollectors(t,e)}}async afterExec(t,e,i,s){const n=t.internal.actionStack,r=n.length-1,o=s?.collectors;try{await(o?.awaitExecPendings()),t.lastResult=i,"response"!==i?.returnType||i.error||(t.lastResponse=i.result),e?.storeAs&&(t.outputs[e.storeAs]=i?.result),i?.error&&(t.currentAction.error=i.error),await(this.onAfterExec?.(t,e));const s={action:this,context:t,options:e,result:i,depth:r,stack:[...n]};i?.error&&(s.error=i.error);try{t.eventBus.emit(`action:${this.id}.end`,s)}catch(t){}try{t.eventBus.emit("action:end",s)}catch(t){}}finally{try{o?.cleanup()}finally{n.pop();const e=n.length;t.currentAction=e>0?n[e-1]:void 0}}}async execute(t,e){e?.args&&!e.params&&(e.params=e.args);const i=await this.beforeExec(t,e),s=e?.failOnError??!0;let n;try{return t.throwHttpErrors=s,n=await this.onExecute(t,e),n&&n.returnType||(n={status:1,returnType:this.returnType??"any",result:n}),n}catch(e){if(n={status:0,error:e,meta:{id:this.id,engineType:t.engine,capability:this.getCapability(t.engine)}},s)throw e;return n}finally{await this.afterExec(t,e,n,i)}}};w.registry=new Map,w.returnType="any",w.capabilities={http:"noop",browser:"noop"};var f=w;function d(t){return t?Array.isArray(t)?t:[t]:[]}function p(t,e,i){const s=[];for(const n of e)if("string"==typeof n||n instanceof RegExp){const e=(...t)=>{i(t[0])};t.eventBus.once(n,e),s.push(()=>t.eventBus.off(n,e))}return s}function y(t,e,i){const s=[];for(const n of e)if("string"==typeof n||n instanceof RegExp){const e=t=>i(n,t);t.eventBus.on(n,e),s.push(()=>t.eventBus.off(n,e))}return s}var m=require("events-ex");var g=require("lodash-es"),b=(0,require("nanoid").customAlphabet)("0123456789abcdefghijklmnopqrstuvwxyz",12);var v=require("lodash-es"),x=require("events-ex"),q=require("@isdk/common-error"),E=require("crawlee");function S(){let t=()=>{};const e=new Promise(e=>{t=e});return e.release=t,e}E.Configuration.getGlobalConfig().set("persistStorage",!1);var k=class{constructor(){this.hdrs={},this._initializedSessions=new Set,this.pendingRequests=new Map,this.requestCounter=0,this.actionEmitter=new x.EventEmitter,this.isPageActive=!1,this.navigationLock=function(){const t=S();return t.release(),t}(),this.blockedTypes=new Set}static register(t){const e=t.id;if(!e)throw new Error("Engine must define static id");if(this.registry.has(e))throw new Error(`Engine id duplicated: ${e}`);this.registry.set(e,t)}static get(t){return this.registry.get(t)}static getByMode(t){for(const[e,i]of this.registry.entries())if(i.mode===t)return i}static async create(t,e){const i=(0,v.defaultsDeep)(e,t,l),s=i.engine??t.engine,n=s?this.get(s)??this.getByMode(s):null;if(n){const e=new n;return await e.initialize(t,i),e}}async _extract(t,e){const i=t.type;if(!e)return"array"===i?[]:null;if("object"===i){const{selector:i,properties:s}=t;let n=e;if(i){const t=await this._querySelectorAll(e,i);n=t.length>0?t[0]:null}if(!n)return null;const r={};for(const t in s)r[t]=await this._extract(s[t],n);return r}if("array"===i){const{selector:i,items:s}=t,n=i?await this._querySelectorAll(e,i):[e],r=[];for(const t of n)r.push(await this._extract(s,t));return r}const{selector:s}=t;let n=e;if(s){const t=await this._querySelectorAll(e,s);n=t.length>0?t[0]:null}return n?this._extractValue(t,n):null}async buildResponse(t){const e=await this._buildResponse(t),i=e.headers["content-type"]||"";return e.contentType=i.split(";")[0].trim(),!e.cookies&&t.session&&(e.cookies=t.session.getCookies(t.request.url)),this.crawler?.sessionPool&&(e.sessionState=await this.crawler.sessionPool.getState()),e}waitFor(t){return this.dispatchAction({type:"waitFor",options:t})}click(t){return this.dispatchAction({type:"click",selector:t})}fill(t,e){return this.dispatchAction({type:"fill",selector:t,value:e})}submit(t,e){return this.dispatchAction({type:"submit",selector:t,options:e})}pause(t){return this.dispatchAction({type:"pause",message:t})}extract(t){const e=this._normalizeSchema(t);return this.dispatchAction({type:"extract",schema:e})}_normalizeSchema(t){const e=JSON.parse(JSON.stringify(t));if(e.properties)for(const t in e.properties)e.properties[t]=this._normalizeSchema(e.properties[t]);if(e.items&&(e.items=this._normalizeSchema(e.items)),"array"===e.type&&(e.attribute&&!e.items&&(e.items={attribute:e.attribute},delete e.attribute),e.items||(e.items={type:"string"})),e.selector&&(e.has||e.exclude)){const{selector:t,has:i,exclude:s}=e,n=t.split(",").map(t=>{let e=t.trim();return i&&(e=`${e}:has(${i})`),s&&(e=`${e}:not(${s})`),e}).join(", ");e.selector=n,delete e.has,delete e.exclude}return e}get id(){return this.constructor.id}async getState(){return{cookies:await this.cookies(),sessionState:await(this.crawler?.sessionPool?.getState())}}get mode(){return this.constructor.mode}get context(){return this.ctx}async initialize(t,e){if(this.ctx)return;(0,v.merge)(t,e),this.ctx=t,this.opts=t,this.hdrs=function(t){const e={};if(t&&"object"==typeof t)for(const[i,s]of Object.entries(t))e[i.toLowerCase()]=s;return e}(t.headers),this._initialCookies=[...t.cookies??[]],t.internal||(t.internal={}),t.internal.engine=this,t.engine=this.mode,this.actionEmitter.setMaxListeners(100),this.requestQueue=await E.RequestQueue.open();const i=await this._getSpecificCrawlerOptions(t),s=(0,v.defaultsDeep)({persistenceOptions:{enable:!0}},t.sessionPoolOptions,{maxPoolSize:1,sessionOptions:{maxUsageCount:1e3,maxErrorScore:3}});t.sessionState&&t.cookies&&t.cookies.length>0&&console.warn('[FetchEngine] Warning: Both "sessionState" and "cookies" are provided. Explicit "cookies" will override any conflicting cookies restored from "sessionState".');const n={...(0,v.defaultsDeep)(i,{requestQueue:this.requestQueue,maxConcurrency:1,minConcurrency:1,useSessionPool:!0,persistCookiesPerSession:!0,sessionPoolOptions:s}),requestHandler:this._requestHandler.bind(this),errorHandler:this._failedRequestHandler.bind(this),failedRequestHandler:this._failedRequestHandler.bind(this)};n.preNavigationHooks||(n.preNavigationHooks=[]),n.preNavigationHooks.unshift(({crawler:t,session:e,request:i},s)=>{if(this.currentSession=e,e&&!this._initializedSessions.has(e.id)){if(this._initialCookies&&this._initialCookies.length>0){const t=this._initialCookies.map(t=>{const e={...t};return"no_restriction"===e.sameSite&&(e.sameSite="None"),e});e.setCookies(t,i.url)}this._initializedSessions.add(e.id)}});const r=this.crawler=this._createCrawler(n),o=await E.KeyValueStore.open(null,{config:r.config}),a=await o.getValue(E.PERSIST_STATE_KEY);!t.sessionState||a&&!t.overrideSessionState||await o.setValue(E.PERSIST_STATE_KEY,t.sessionState),this.crawler.run().then(()=>{this.isCrawlerReady=!0}).catch(t=>{this.isCrawlerReady=!1,console.error("Crawler background error:",t)})}async cleanup(){await(this._cleanup?.()),await this._commonCleanup();const t=this.ctx;t&&t.internal?.engine===this&&(t.internal.engine=void 0),this.ctx=void 0,this.opts=void 0}async _executePendingActions(t){await new Promise(e=>{const i=async({action:e,resolve:i,reject:s})=>{try{if("dispose"===e.type)return this.actionEmitter.emit("dispose"),void i();i(await this.executeAction(t,e))}catch(t){s(t)}};this.actionEmitter.on("dispatch",i),this.actionEmitter.once("dispose",()=>{this.actionEmitter.removeListener("dispatch",i),e()})})}async _sharedRequestHandler(t){const{request:e}=t;try{this.currentSession=t.session,this.isPageActive=!0;const i=this.pendingRequests.get(e.userData.requestId);if(i){const s=await this.buildResponse(t),n=!s.statusCode||s.statusCode>=400;if(this.ctx?.throwHttpErrors&&n){const t=new q.CommonError(`Request for ${s.finalUrl} failed with status ${s.statusCode||"N/A"}`,"request",s.statusCode);i.reject(t)}else this.lastResponse=s,i.resolve(s);this.pendingRequests.delete(e.userData.requestId)}await this._executePendingActions(t)}finally{if(this.currentSession){const t=this.currentSession.getCookies(e.url);t&&(this._initialCookies=t)}this.isPageActive=!1,this.navigationLock.release()}}async _sharedFailedRequestHandler(t,e){const{request:i}=t,s=this.pendingRequests.get(i.userData.requestId);if(s&&e&&this.ctx?.throwHttpErrors){this.pendingRequests.delete(i.userData.requestId);const t=e.response,n=t?.statusCode||500,r=t?.url?t.url:i.url,o=new q.CommonError(`Request${r?" for "+r:""} failed: ${e.message}`,"request",n);s.reject(o)}return this._sharedRequestHandler(t)}async dispatchAction(t){if(!this.isPageActive)throw new Error("No active page. Call goto() before performing actions.");return new Promise((e,i)=>{this.actionEmitter.emit("dispatch",{action:t,resolve:e,reject:i})})}async _requestHandler(t){await this._sharedRequestHandler(t)}async _failedRequestHandler(t,e){await this._sharedFailedRequestHandler(t,e)}async _commonCleanup(){if(this._initializedSessions.clear(),this.isPageActive&&await this.dispatchAction({type:"dispose"}).catch(()=>{}),this.pendingRequests.size>0)for(const[,t]of this.pendingRequests)t.reject(new Error("Cleanup:Request cancelled"));if(this.actionEmitter.removeAllListeners(),this.crawler){try{await(this.crawler.teardown?.())}catch(t){console.error("ccrawler teardown error:",t)}this.crawler=void 0}this.isCrawlerReady=void 0,this.requestQueue&&(await this.requestQueue.drop(),this.requestQueue=void 0),this.pendingRequests.clear()}async blockResources(t,e){return e&&this.blockedTypes.clear(),t.forEach(t=>this.blockedTypes.add(t)),t.length}getContent(){return this.lastResponse?Promise.resolve(this.lastResponse):Promise.reject(new Error("No content fetched yet. Call goto() first."))}async headers(t,e){if(void 0===t)return{...this.hdrs};if("string"==typeof t&&void 0===e)return this.hdrs[t.toLowerCase()]||"";if(null!==t&&"object"==typeof t){const i={};for(const[e,s]of Object.entries(t))i[e.toLowerCase()]=String(s);return this.hdrs=!0===e?i:{...this.hdrs,...i},!0}return"string"==typeof t&&("string"==typeof e?this.hdrs[t.toLowerCase()]=e:null===e&&delete this.hdrs[t.toLowerCase()],!0)}async cookies(t){const e=this.lastResponse?.url||"";if(Array.isArray(t))return this.currentSession?this.currentSession.setCookies(t,e):this._initialCookies=[...t],!0;if(null===t)return this.currentSession,this._initialCookies=[],!0;if(this.currentSession){return this.currentSession.getCookies(e)}return[...this._initialCookies||[]]}async dispose(){await this.cleanup()}};async function C(t,e){let i;if(e?.engine){if(i=await k.create(t,{engine:e.engine}),!i)throw new Error(`No engine available for ${e.engine}`);return i}const s=function(t,e){if(!t||!e?.length)return null;const i=new URL(t);let s=e.find(t=>t.domain===i.hostname);s||(s=e.find(t=>i.hostname.endsWith(t.domain)));if(!s)return null;if(s.pathScope?.length){if(!s.pathScope.some(t=>i.pathname.startsWith(t)))return null}return s}(e?.url||t.url,t.sites),n=t.engine||s?.engine||"auto";return i=await k.create(t,{engine:n}),i||(i=await k.create(t,{engine:"http"})),i}k.registry=new Map;var $=class{constructor(t={},e){this.options=t,this.engine=e,this.closed=!1,this.id=b(),this.context=this.createContext(t)}async execute(t){await this.ensureEngine(t);const e=f.create(t);if(!e)throw new Error(`Unknown action: ${t.id||t.name}`);let i,s;this.context.internal.actionIndex=(this.context.internal.actionIndex||0)+1,this.context.currentAction={...t,index:this.context.internal.actionIndex,startedAt:Date.now()};try{return i=await e.execute(this.context,t),i}catch(t){throw s=t,s}finally{this.context.currentAction=void 0}}async executeAll(t){let e=0;try{for(;e<t.length;){const i=t[e];await this.execute(i),e++}const i=await this.execute({id:"getContent"});return{result:i?.result,outputs:this.getOutputs()}}catch(t){throw t.actionIndex=e,t}}getOutputs(){return this.context.outputs}async getState(){return this.context.internal.engine?.getState()}async dispose(){if(this.closed)return;const t=this.context.eventBus;t.emit("session:closing",{sessionId:this.id});try{await(this.context.internal.engine?.dispose())}finally{this.closed=!0}t.emit("session:closed",{sessionId:this.id})}async ensureEngine(t){if(this.closed)throw new Error("Session is closed");if(!this.context.internal.engine){const e=t?.params?.url??this.context.url;if(!await C(this.context,{url:e,engine:this.engine}))throw new Error("No engine found")}}createContext(t=this.options){const e=new m.EventEmitter;return(0,g.defaultsDeep)({...t,id:this.id,eventBus:e,outputs:{},internal:{},execute:async t=>this.execute(t),action:async function(t,e,i){return this.execute({name:t,params:e,...i})}},l)}},R=class{constructor(t={}){this.defaults=t}async createSession(t){const e={...this.defaults,...t||{}};return new $(e)}async fetch(t,e){"string"!=typeof t&&(t=(e=t).url);const i=await this.createSession(e);try{const s=e?.actions||[];t&&0!==s.findIndex(e=>"goto"===e.id&&e.params?.url===t)&&s.unshift({id:"goto",params:{url:t}});return await i.executeAll(s)}finally{await i.dispose()}}},A=require("crawlee"),P=((t,s,n)=>(n=null!=t?e(r(t)):{},a(!s&&t&&t.__esModule?n:i(n,"default",{value:t,enumerable:!0}),t)))(require("cheerio")),F=require("@isdk/common-error"),O=class extends k{_ensureCheerioContext(t){if(!t.$&&t.body){let e="string"==typeof t.body?t.body:Buffer.isBuffer(t.body)?t.body.toString("utf-8"):JSON.stringify(t.body);e.trim().startsWith("<")||(e=`<html><body><pre>${e}</pre></body></html>`),t.$=P.load(e)}}async _buildResponse(t){this._ensureCheerioContext(t);const{request:e,response:i,body:s,$:n}=t,r=n?.html();let o="string"==typeof s?s:Buffer.isBuffer(s)?s.toString("utf-8"):String(s??"");r&&r!==o&&(o=r);let a=i?.headers;if(!a&&i?.rawHeaders){a={};const t=i.rawHeaders;for(let e=0;e<t.length;e+=2)a[t[e].toLowerCase()]=t[e+1]}return{url:e.url,finalUrl:e.loadedUrl||e.url,statusCode:i?.statusCode??200,statusText:i?.statusMessage,headers:a||{},body:s,html:o,text:o}}async _querySelectorAll(t,e){const{$:i,el:s}=t;return s.find(e).toArray().map(t=>({$:i,el:i(t)}))}async _extractValue(t,e){const{el:i}=e,{attribute:s,type:n="string"}=t;if(0===i.length)return null;let r="";if(r=s?i.attr(s)??null:"html"===n?i.html():i.text().trim(),null===r)return null;switch(n){case"number":return parseFloat(r.replace(/[^0-9.-]+/g,""))||null;case"boolean":const t=r.toLowerCase();return"true"===t||"1"===t;default:return r}}async executeAction(t,e){const{$:i}=t;switch(e.type){case"dispose":return;case"extract":if(!i)throw new F.CommonError(`Cheerio context not available for action: ${e.type}`,"extract");return this._extract(e.schema,{$:i,el:i.root()});case"click":{if(!i)throw new F.CommonError(`Cheerio context not available for action: ${e.type}`,"click");const s=e.selector,n=i(s).first();let r;if(0===n.length)try{r=new URL(s,t.request.loadedUrl||t.request.url).href}catch{throw new F.CommonError(`click: selector not found or invalid URL: ${s}`,"click")}else{if(!n.is("a")||!n.attr("href")){if(n.is('input[type="submit"], button[type="submit"], button, input')){const e=n.closest("form");if(e.length)return this.executeAction(t,{type:"submit",selector:e});throw new F.CommonError("click: submit-like element without form","click")}throw new F.CommonError(`click: unsupported element for http simulate. Selector: ${s}`,"click")}{const e=n.attr("href");r=new URL(e,t.request.loadedUrl||t.request.url).href}}const o=await t.sendRequest({url:r});return void await this._updateStateAfterNavigation(t,o)}case"fill":{if(!i)throw new F.CommonError(`Cheerio context not available for action: ${e.type}`),"fill";const s=i(e.selector).first();if(0===s.length)throw new F.CommonError(`fill: selector not found: ${e.selector}`);if(!s.is("input, textarea, select"))throw new F.CommonError(`fill: not a form field: ${e.selector}`);return s.val(e.value),void(this.lastResponse=await this.buildResponse(t))}case"waitFor":return void(e.options?.ms&&await new Promise(t=>setTimeout(t,e.options.ms)));case"pause":const s=this.ctx?.onPause;return void(s?(console.info(e.message||"Execution paused for manual intervention."),await s({message:e.message}),console.info("Resuming execution...")):console.warn("[PauseAction] was called, but no `onPause` handler was provided in fetchWeb options. Skipped."));case"submit":{if(!i)throw new F.CommonError(`Cheerio context not available for action: ${e.type}`,"submit");const s="string"==typeof e.selector?i(e.selector).first():null!=e.selector?e.selector:i("form").first();if(0===s.length)throw new F.NotFoundError(e.selector,"submit");const n=s.attr("action")||t.request.loadedUrl||t.request.url,r=(s.attr("method")||"GET").toUpperCase(),o=new URL(n,t.request.loadedUrl||t.request.url).href,a={};let c;if(s.find("input, select, textarea").each((t,e)=>{const s=i(e),n=s.attr("name");if(!n)return;const r=s.val();null!=r&&(a[n]=String(r))}),"GET"===r){const e=new URL(o);Object.entries(a).forEach(([t,i])=>e.searchParams.set(t,i)),c=await t.sendRequest({url:e.href,method:"GET"})}else{let i;const n={};"application/json"===(e.options?.enctype||s.attr("enctype")||"application/x-www-form-urlencoded")?(i=JSON.stringify(a),n["Content-Type"]="application/json"):(i=new URLSearchParams(a).toString(),n["Content-Type"]="application/x-www-form-urlencoded"),c=await t.sendRequest({url:o,method:"POST",body:i,headers:n})}return void await this._updateStateAfterNavigation(t,c)}case"getContent":return this.buildResponse(t);default:throw new F.CommonError(`Unknown action type: ${e.type}`,"CheerioFetchEngine.executeAction",F.ErrorCode.NotSupported)}}async _updateStateAfterNavigation(t,e){const i=e;t.response=i,t.body=i.body,t.$=void 0,i.url&&(t.request.loadedUrl=i.url),this.lastResponse=await this.buildResponse(t)}_createCrawler(t){return new A.CheerioCrawler(t)}_getSpecificCrawlerOptions(t){const e=this.opts?.proxy?"string"==typeof this.opts.proxy?[this.opts.proxy]:this.opts.proxy:void 0,i=e?.length?new A.ProxyConfiguration({proxyUrls:e}):void 0;return{additionalMimeTypes:["text/plain"],maxRequestRetries:1,requestHandlerTimeoutSecs:t.requestHandlerTimeoutSecs,proxyConfiguration:i,preNavigationHooks:[({session:e,request:i},s)=>{s.throwHttpErrors=t.throwHttpErrors,this.opts?.timeoutMs&&(s.timeout={request:this.opts.timeoutMs})}]}}async goto(t,e){this.isPageActive&&this.dispatchAction({type:"dispose"}).catch(()=>{});const i="req-"+ ++this.requestCounter,s=new Promise((t,s)=>{const n=e?.timeoutMs||this.opts?.timeoutMs||3e4,r=setTimeout(()=>{this.pendingRequests.delete(i),this.navigationLock.release(),s(new F.CommonError(`goto timed out after ${n}ms.`,"gotoTimeout",F.ErrorCode.RequestTimeout))},n);this.pendingRequests.set(i,{resolve:e=>{clearTimeout(r),t(e)},reject:t=>{clearTimeout(r),s(t)}})});return this.requestQueue.addRequest({...e,url:t,headers:{...this.hdrs,...e?.headers},userData:{requestId:i},uniqueKey:`${t}-${i}`}).catch(t=>{const e=this.pendingRequests.get(i);e&&(this.pendingRequests.delete(i),this.navigationLock.release(),e.reject(t))}),await this.navigationLock,this.navigationLock=S(),s}};O.id="cheerio",O.mode="http",k.register(O);var j=require("crawlee"),T=require("playwright"),_=require("@isdk/common-error"),U=class extends k{async _buildResponse(t){const{page:e,response:i,request:s,session:n}=t;if(!e||e.isClosed())return{url:s.url,finalUrl:s.loadedUrl||s.url,statusCode:i?.status(),statusText:i?.statusText(),headers:await(i?.allHeaders())||{},body:"",html:"",text:""};const r=await e.content(),o=await e.textContent("body"),a=await e.context().cookies();return n&&n.setCookies(a,s.url),{url:e.url(),finalUrl:e.url(),statusCode:i?.status(),statusText:i?.statusText(),headers:await(i?.allHeaders())||{},cookies:a,body:r,html:r,text:o||""}}async _querySelectorAll(t,e){return t.locator(e).all()}async _extractValue(t,e){const{attribute:i,type:s="string"}=t;if(0===await e.count())return null;let n="";if(n=i?await e.getAttribute(i):"html"===s?await e.innerHTML():await e.textContent(),null===n)return null;switch(n=n.trim(),s){case"number":return parseFloat(n.replace(/[^0-9.-]+/g,""))||null;case"boolean":const t=n.toLowerCase();return"true"===t||"1"===t;default:return n}}async executeAction(t,e){const{page:i}=t,s=this.opts?.timeoutMs||3e4;switch(e.type){case"navigate":{const s=await i.goto(e.url,{waitUntil:e.opts?.waitUntil||"domcontentloaded",timeout:this.opts?.timeoutMs||3e4});s&&(t={...t,response:s});const n=await this.buildResponse(t);return this.lastResponse=n,n}case"extract":{const s=await this._extract(e.schema,i.locator("body"));return this.lastResponse=await this.buildResponse(t),s}case"click":{await i.click(e.selector,{timeout:s}),await i.waitForLoadState("networkidle",{timeout:s});const n=await this.buildResponse(t);return void(this.lastResponse=n)}case"fill":await i.fill(e.selector,e.value,{timeout:s});const n=await this.buildResponse(t);return void(this.lastResponse=n);case"waitFor":try{e.options?.selector&&await i.waitForSelector(e.options.selector,{timeout:s}),e.options?.networkIdle&&await i.waitForLoadState("networkidle",{timeout:s})}catch(t){if(!1!==e.options?.failOnTimeout)throw t}return void(e.options?.ms&&await i.waitForTimeout(e.options.ms));case"submit":{const n=e.selector||"form",r=i.locator(n).first();if(0===await r.count())throw new _.NotFoundError(n,"submit");if("application/json"===(e.options?.enctype||"application/x-www-form-urlencoded")){const t=await r.elementHandle();if(!t)throw new _.CommonError(`submit: could not get form handle for ${n}`,"submit");const e=await t.evaluate(async t=>{const e=new FormData(t),i={};e.forEach((t,e)=>{i[e]=t.toString()});const s=await fetch(t.action,{method:t.method,headers:{"Content-Type":"application/json"},body:JSON.stringify(i)}),n=await s.text();return{status:s.status,statusText:s.statusText,headers:Object.fromEntries(s.headers.entries()),body:n,html:n,text:n,url:t.action,finalUrl:s.url}});return await t.dispose(),await i.setContent(e.html),void(this.lastResponse=e)}return await r.evaluate(t=>t.submit()),await i.waitForLoadState("networkidle",{timeout:s}),void(this.lastResponse=await this.buildResponse(t))}case"pause":{const t=this.ctx?.onPause;return void(t?(console.info(e.message||"Execution paused for manual intervention."),await t({message:e.message}),console.info("Resuming execution...")):console.warn("[PauseAction] was called, but no `onPause` handler was provided in fetchWeb options. Skipped."))}case"getContent":return this.buildResponse(t);default:throw new _.CommonError(`Unknown action type: ${e.type}`,"PlaywrightFetchEngine.executeAction",_.ErrorCode.NotSupported)}}_createCrawler(t){return new j.PlaywrightCrawler(t)}async _getSpecificCrawlerOptions(t){const e=t.browser?.headless??!0,i={maxRequestRetries:t.retries||3,headless:e,requestHandlerTimeoutSecs:t.requestHandlerTimeoutSecs,preNavigationHooks:[async({page:e,request:i},s)=>{s.throwHttpErrors=t.throwHttpErrors;const n=this.blockedTypes;n.size>0&&await e.route("**/*",t=>{n.has(t.request().resourceType())?t.abort():t.continue()})}]};if(this.opts?.antibot){i.browserPoolOptions={useFingerprints:!1};const{launchOptions:t}=await import("camoufox-js"),s=await t({headless:e});i.launchContext={launcher:T.firefox,launchOptions:s},i.postNavigationHooks=[async({page:t,handleCloudflareChallenge:e})=>{await e()}]}return i}async goto(t,e){if(this.isPageActive)return this.dispatchAction({type:"navigate",url:t,opts:e});if(!this.requestQueue)throw new _.CommonError("RequestQueue not initialized","goto");const i="req-"+ ++this.requestCounter,s=new Promise((t,e)=>{this.pendingRequests.set(i,{resolve:t,reject:e})});return await this.requestQueue.addRequest({url:t,headers:this.hdrs,userData:{requestId:i,waitUntil:e?.waitUntil||"domcontentloaded"},uniqueKey:`${t}-${i}`}),s}};U.id="playwright",U.mode="browser",k.register(U);var N=class extends f{async onExecute(t,e){const{selector:i,...s}=e?.params||{};if(!i)throw new Error("Selector is required for click action");await this.delegateToEngine(t,"click",i,s)}};N.id="click",N.returnType="none",N.capabilities={http:"simulate",browser:"native"},f.register(N);var H=class extends f{async onExecute(t,e){const{selector:i,value:s,...n}=e?.params||{};if(!i)throw new Error("Selector is required for fill action");if(void 0===s)throw new Error("Value is required for fill action");await this.delegateToEngine(t,"fill",i,s,n)}};H.id="fill",H.returnType="none",H.capabilities={http:"simulate",browser:"native"},f.register(H);var M=class extends f{async onExecute(t,e){return await this.delegateToEngine(t,"getContent",e?.params)}};M.id="getContent",M.returnType="response",M.capabilities={http:"native",browser:"native"},f.register(M);var L=class extends f{async onExecute(t,e,i){const s=e?.params,n=s?.url||t.url;if(!n)throw new Error("URL is required for goto action");const r=t.internal.engine;if(!r)throw new Error("No engine available");t.url=n;return await r.goto(n,s)}};L.id="goto",L.returnType="response",L.capabilities={http:"native",browser:"native"},f.register(L);var B=class extends f{async onExecute(t,e){const{selector:i,...s}=e?.params||{};await this.delegateToEngine(t,"submit",i,s)}};B.id="submit",B.returnType="none",B.capabilities={http:"simulate",browser:"native"},f.register(B);var G=class extends f{async onExecute(t,e){const i=t.internal.engine;if(!i)throw new Error("No engine available");await i.waitFor(e?.params)}};G.id="waitFor",G.returnType="none",G.capabilities={http:"native",browser:"native"},f.register(G);var W=class extends f{async onExecute(t,e){const i=e?.params;if(!i)throw new Error("Schema is required for extract action");return this.delegateToEngine(t,"extract",i)}};W.id="extract",W.returnType="any",W.capabilities={http:"native",browser:"native"},f.register(W);var z=class extends f{async onExecute(t,e){const{selector:i,message:s,attribute:n}=e?.params||{},r=t.internal.engine;if("browser"===r?.mode){if(i){if(!await(r?.extract({selector:i,attribute:n})))return}r&&"pause"in r?await r.pause(s):console.warn("[PauseAction] was called, but the current engine does not support `pause`. Skipped.")}else console.warn("[PauseAction] can only run in browser engine. Skipped.")}};async function D(t,e){return(new R).fetch(t,e)}z.id="pause",z.capabilities={http:"native",browser:"native"},z.returnType="none",f.register(z);
package/dist/index.mjs CHANGED
@@ -1 +1 @@
1
- var t={engine:"auto",enableSmart:!0,useSiteRegistry:!0,antibot:!1,headers:{},cookies:[],reuseCookies:!0,throwHttpErrors:void 0,proxy:[],blockResources:[],ignoreSslErrors:!0,browser:{engine:"playwright",headless:!0,waitUntil:"domcontentloaded"},http:{method:"GET"},timeoutMs:6e4,requestHandlerTimeoutSecs:void 0,maxConcurrency:1,maxRequestsPerMinute:1e3,delayBetweenRequestsMs:0,retries:0,sites:[]},e=Object.keys(t).concat(["actions","onPause"]),i=(t=>(t[t.Failed=0]="Failed",t[t.Success=1]="Success",t[t.Skipped=2]="Skipped",t))(i||{}),s=class t{static register(t){const e=t.id;if(!e)throw new Error("FetchAction.register: actionClass.id is required");this.registry.set(e,t)}static get(t){return this.registry.get(t)}static create(e){const i="string"==typeof e?e:e.id||e.name||e.action;if(!i)throw new Error("Action must have id, name or action");const s=i instanceof t?i.constructor:this.registry.get(i);return s?new s:void 0}static has(t){return this.registry.has(t)}static list(){return Array.from(this.registry.keys())}static getCapability(t){return this.capabilities[t]??"noop"}getCapability(t){return this.constructor.getCapability(t)}get id(){return this.constructor.id}get returnType(){return this.constructor.returnType}get capabilities(){return this.constructor.capabilities}async delegateToEngine(t,e,...i){const s=t.internal.engine;if(!s)throw new Error("No engine available");if("function"!=typeof s[e])throw new Error(`Engine does not have a method named '${String(e)}'`);return await s[e](...i)}installCollectors(e,i){const s=i?.collectors;if(!s?.length)return;const r=[],c=new Set;for(const i of s){const s=n(i.activateOn),l=n(i.collectOn),h=n(i.deactivateOn),u=!(i.background??!0),w=t.create(i);if(!w)continue;let f=!1,d=!1,p=0;const y=async t=>{if(!f&&!d){f=!0;try{await(w.onBeforeExec?.(e,i))}catch(t){e.eventBus.emit("collector:error",{action:this.id,collector:w.id,phase:"before",error:t})}}},m=async(t,s)=>{if(!d){f||await y(s);try{const r=Promise.resolve(w.onExecute?.(e,i,s)).then(s=>{var r,n;if(i.storeAs){((r=e.outputs)[n=i.storeAs]||(r[n]=[])).push(s)}return e.eventBus.emit("collector:result",{action:this.id,collector:i.id||i.name,event:t,result:s}),s}).catch(s=>{e.eventBus.emit("collector:error",{action:this.id,collector:i.id||i.name,event:t,phase:"exec",error:s})}).finally(()=>{p++});u&&(c.add(r),r.finally(()=>c.delete(r)))}catch(i){e.eventBus.emit("collector:error",{action:this.id,collector:w.id,event:t,phase:"exec",error:i})}}},g=async()=>{if(!d){0===p&&m("collector:after"),d=!0;try{await(w.onAfterExec?.(e,i))}catch(t){e.eventBus.emit("collector:error",{action:this.id,collector:i.id||i.name,phase:"after",error:t})}finally{e.eventBus.emit("collector:end",{action:this.id,collector:i.id||i.name}),x.forEach(t=>t())}}},v=o(e,s,y),x=a(e,l,m),b=o(e,h,g);if(r.push(...v,...x,...b),!s.length&&!l.length&&!h.length){const t=()=>{g()};e.eventBus.once(`action:${this.id}.end`,t),r.push(()=>e.eventBus.off("fetcher:action:end",t))}}return r.length||c.size>0?{cleanup:()=>r.forEach(t=>t()),awaitExecPendings:async()=>{c.size>0&&await Promise.allSettled(Array.from(c))}}:void 0}async beforeExec(t,e){t.internal.actionStack||(t.internal.actionStack=[]);const i=t.internal.actionStack,s=i.length,r=i.length>0?i[i.length-1].id:void 0,n={...e,id:this.id,depth:s,parent:r};i.push(n),t.currentAction=n;const o={action:this,context:t,options:e,depth:s,stack:[...i]};t.eventBus.emit(`action:${this.id}.start`,o),t.eventBus.emit("action:start",o),await(this.onBeforeExec?.(t,e));return{entry:o,collectors:this.installCollectors(t,e)}}async afterExec(t,e,i,s){const r=t.internal.actionStack,n=r.length-1,o=s?.collectors;try{await(o?.awaitExecPendings()),t.lastResult=i,"response"!==i?.returnType||i.error||(t.lastResponse=i.result),e?.storeAs&&(t.outputs[e.storeAs]=i?.result),i?.error&&(t.currentAction.error=i.error),await(this.onAfterExec?.(t,e));const s={action:this,context:t,options:e,result:i,depth:n,stack:[...r]};i?.error&&(s.error=i.error);try{t.eventBus.emit(`action:${this.id}.end`,s)}catch(t){}try{t.eventBus.emit("action:end",s)}catch(t){}}finally{try{o?.cleanup()}finally{r.pop();const e=r.length;t.currentAction=e>0?r[e-1]:void 0}}}async execute(t,e){e?.args&&!e.params&&(e.params=e.args);const i=await this.beforeExec(t,e),s=e?.failOnError??!0;let r;try{return t.throwHttpErrors=s,r=await this.onExecute(t,e),r&&r.returnType||(r={status:1,returnType:this.returnType??"any",result:r}),r}catch(e){if(r={status:0,error:e,meta:{id:this.id,engineType:t.engine,capability:this.getCapability(t.engine)}},s)throw e;return r}finally{await this.afterExec(t,e,r,i)}}};s.registry=new Map,s.returnType="any",s.capabilities={http:"noop",browser:"noop"};var r=s;function n(t){return t?Array.isArray(t)?t:[t]:[]}function o(t,e,i){const s=[];for(const r of e)if("string"==typeof r||r instanceof RegExp){const e=(...t)=>{i(t[0])};t.eventBus.once(r,e),s.push(()=>t.eventBus.off(r,e))}return s}function a(t,e,i){const s=[];for(const r of e)if("string"==typeof r||r instanceof RegExp){const e=t=>i(r,t);t.eventBus.on(r,e),s.push(()=>t.eventBus.off(r,e))}return s}import{EventEmitter as c}from"events-ex";import{defaultsDeep as l}from"lodash-es";import{customAlphabet as h}from"nanoid";var u=h("0123456789abcdefghijklmnopqrstuvwxyz",12);import{defaultsDeep as w,merge as f}from"lodash-es";import{EventEmitter as d}from"events-ex";import{CommonError as p}from"@isdk/common-error";import{Configuration as y,KeyValueStore as m,PERSIST_STATE_KEY as g,RequestQueue as v}from"crawlee";function x(){let t=()=>{};const e=new Promise(e=>{t=e});return e.release=t,e}y.getGlobalConfig().set("persistStorage",!1);var b=class{constructor(){this.hdrs={},this.pendingRequests=new Map,this.requestCounter=0,this.actionEmitter=new d,this.isPageActive=!1,this.navigationLock=function(){const t=x();return t.release(),t}(),this.blockedTypes=new Set}static register(t){const e=t.id;if(!e)throw new Error("Engine must define static id");if(this.registry.has(e))throw new Error(`Engine id duplicated: ${e}`);this.registry.set(e,t)}static get(t){return this.registry.get(t)}static getByMode(t){for(const[e,i]of this.registry.entries())if(i.mode===t)return i}static async create(e,i){const s=w(i,e,t),r=s.engine??e.engine,n=r?this.get(r)??this.getByMode(r):null;if(n){const t=new n;return await t.initialize(e,s),t}}async _extract(t,e){const i=t.type;if(!e)return"array"===i?[]:null;if("object"===i){const{selector:i,properties:s}=t;let r=e;if(i){const t=await this._querySelectorAll(e,i);r=t.length>0?t[0]:null}if(!r)return null;const n={};for(const t in s)n[t]=await this._extract(s[t],r);return n}if("array"===i){const{selector:i,items:s}=t,r=i?await this._querySelectorAll(e,i):[e],n=[];for(const t of r)n.push(await this._extract(s,t));return n}const{selector:s}=t;let r=e;if(s){const t=await this._querySelectorAll(e,s);r=t.length>0?t[0]:null}return r?this._extractValue(t,r):null}async buildResponse(t){const e=await this._buildResponse(t),i=e.headers["content-type"]||"";return e.contentType=i.split(";")[0].trim(),e}waitFor(t){return this.dispatchAction({type:"waitFor",options:t})}click(t){return this.dispatchAction({type:"click",selector:t})}fill(t,e){return this.dispatchAction({type:"fill",selector:t,value:e})}submit(t,e){return this.dispatchAction({type:"submit",selector:t,options:e})}pause(t){return this.dispatchAction({type:"pause",message:t})}extract(t){const e=this._normalizeSchema(t);return this.dispatchAction({type:"extract",schema:e})}_normalizeSchema(t){const e=JSON.parse(JSON.stringify(t));if(e.properties)for(const t in e.properties)e.properties[t]=this._normalizeSchema(e.properties[t]);if(e.items&&(e.items=this._normalizeSchema(e.items)),"array"===e.type&&(e.attribute&&!e.items&&(e.items={attribute:e.attribute},delete e.attribute),e.items||(e.items={type:"string"})),e.selector&&(e.has||e.exclude)){const{selector:t,has:i,exclude:s}=e,r=t.split(",").map(t=>{let e=t.trim();return i&&(e=`${e}:has(${i})`),s&&(e=`${e}:not(${s})`),e}).join(", ");e.selector=r,delete e.has,delete e.exclude}return e}get id(){return this.constructor.id}async getState(){return{cookies:await this.cookies(),sessionState:await(this.crawler?.sessionPool?.getState())}}get mode(){return this.constructor.mode}get context(){return this.ctx}async initialize(t,e){if(this.ctx)return;f(t,e),this.ctx=t,this.opts=t,this.hdrs=function(t){const e={};if(t&&"object"==typeof t)for(const[i,s]of Object.entries(t))e[i.toLowerCase()]=s;return e}(t.headers),this._initialCookies=[...t.cookies??[]],t.internal||(t.internal={}),t.internal.engine=this,t.engine=this.mode,this.actionEmitter.setMaxListeners(100),this.requestQueue=await v.open();const i=await this._getSpecificCrawlerOptions(t);t.sessionState&&t.cookies&&t.cookies.length>0&&console.warn('[FetchEngine] Warning: Both "sessionState" and "cookies" are provided. Explicit "cookies" will override any conflicting cookies restored from "sessionState".');const s={...w(i,{requestQueue:this.requestQueue,maxConcurrency:1,minConcurrency:1,useSessionPool:!0,persistCookiesPerSession:!0,sessionPoolOptions:{maxPoolSize:1,persistenceOptions:{enable:!0},sessionOptions:{maxUsageCount:1e3,maxErrorScore:3}}}),requestHandler:this._requestHandler.bind(this),errorHandler:this._failedRequestHandler.bind(this),failedRequestHandler:this._failedRequestHandler.bind(this)};s.preNavigationHooks||(s.preNavigationHooks=[]),s.preNavigationHooks.unshift(({crawler:t,session:e,request:i},s)=>{if(this.currentSession=e,this._initialCookies&&this._initialCookies.length>0&&e){const t=this._initialCookies.map(t=>{const e={...t};return"no_restriction"===e.sameSite&&(e.sameSite="None"),e});e.setCookies(t,i.url),this._initialCookies=void 0}});const r=this.crawler=this._createCrawler(s),n=await m.open(null,{config:r.config}),o=await n.getValue(g);!t.sessionState||o&&!t.overrideSessionState||await n.setValue(g,t.sessionState),this.crawler.run().then(()=>{this.isCrawlerReady=!0}).catch(t=>{this.isCrawlerReady=!1,console.error("Crawler background error:",t)})}async cleanup(){await(this._cleanup?.()),await this._commonCleanup();const t=this.ctx;t&&t.internal?.engine===this&&(t.internal.engine=void 0),this.ctx=void 0,this.opts=void 0}async _executePendingActions(t){await new Promise(e=>{const i=async({action:e,resolve:i,reject:s})=>{try{if("dispose"===e.type)return this.actionEmitter.emit("dispose"),void i();i(await this.executeAction(t,e))}catch(t){s(t)}};this.actionEmitter.on("dispatch",i),this.actionEmitter.once("dispose",()=>{this.actionEmitter.removeListener("dispatch",i),e()})})}async _sharedRequestHandler(t){try{this.currentSession=t.session;const{request:e}=t;this.isPageActive=!0;const i=this.pendingRequests.get(e.userData.requestId);if(i){const s=await this._buildResponse(t),r=!s.statusCode||s.statusCode>=400;if(this.ctx?.throwHttpErrors&&r){const t=new p(`Request for ${s.finalUrl} failed with status ${s.statusCode||"N/A"}`,"request",s.statusCode);i.reject(t)}else this.lastResponse=s,i.resolve(s);this.pendingRequests.delete(e.userData.requestId)}await this._executePendingActions(t)}finally{this.isPageActive=!1,this.navigationLock.release()}}async _sharedFailedRequestHandler(t,e){const{request:i}=t,s=this.pendingRequests.get(i.userData.requestId);if(s&&e&&this.ctx?.throwHttpErrors){this.pendingRequests.delete(i.userData.requestId);const t=e.response,r=t?.statusCode||500,n=t?.url?t.url:i.url,o=new p(`Request${n?" for "+n:""} failed: ${e.message}`,"request",r);s.reject(o)}return this._sharedRequestHandler(t)}async dispatchAction(t){if(!this.isPageActive)throw new Error("No active page. Call goto() before performing actions.");return new Promise((e,i)=>{this.actionEmitter.emit("dispatch",{action:t,resolve:e,reject:i})})}async _requestHandler(t){await this._sharedRequestHandler(t)}async _failedRequestHandler(t,e){await this._sharedFailedRequestHandler(t,e)}async _commonCleanup(){if(this.isPageActive&&await this.dispatchAction({type:"dispose"}).catch(()=>{}),this.pendingRequests.size>0)for(const[,t]of this.pendingRequests)t.reject(new Error("Cleanup:Request cancelled"));if(this.actionEmitter.removeAllListeners(),this.crawler){try{await(this.crawler.teardown?.())}catch(t){console.error("ccrawler teardown error:",t)}this.crawler=void 0}this.isCrawlerReady=void 0,this.requestQueue&&(await this.requestQueue.drop(),this.requestQueue=void 0),this.pendingRequests.clear()}async blockResources(t,e){return e&&this.blockedTypes.clear(),t.forEach(t=>this.blockedTypes.add(t)),t.length}getContent(){return this.lastResponse?Promise.resolve(this.lastResponse):Promise.reject(new Error("No content fetched yet. Call goto() first."))}async headers(t,e){if(void 0===t)return{...this.hdrs};if("string"==typeof t&&void 0===e)return this.hdrs[t.toLowerCase()]||"";if(null!==t&&"object"==typeof t){const i={};for(const[e,s]of Object.entries(t))i[e.toLowerCase()]=String(s);return this.hdrs=!0===e?i:{...this.hdrs,...i},!0}return"string"==typeof t&&("string"==typeof e?this.hdrs[t.toLowerCase()]=e:null===e&&delete this.hdrs[t.toLowerCase()],!0)}async cookies(t){const e=this.lastResponse?.url||"";if(Array.isArray(t))return this.currentSession?this.currentSession.setCookies(t,e):this._initialCookies=[...t],!0;if(null===t)return this.currentSession,this._initialCookies=[],!0;if(this.currentSession){return this.currentSession.getCookies(e)}return[...this._initialCookies||[]]}async dispose(){await this.cleanup()}};async function E(t,e){let i;if(e?.engine){if(i=await b.create(t,{engine:e.engine}),!i)throw new Error(`No engine available for ${e.engine}`);return i}const s=function(t,e){if(!t||!e?.length)return null;const i=new URL(t);let s=e.find(t=>t.domain===i.hostname);s||(s=e.find(t=>i.hostname.endsWith(t.domain)));if(!s)return null;if(s.pathScope?.length){if(!s.pathScope.some(t=>i.pathname.startsWith(t)))return null}return s}(e?.url||t.url,t.sites),r=t.engine||s?.engine||"auto";return i=await b.create(t,{engine:r}),i||(i=await b.create(t,{engine:"http"})),i}b.registry=new Map;var k=class{constructor(t={},e){this.options=t,this.engine=e,this.closed=!1,this.id=u(),this.context=this.createContext(t)}async execute(t){await this.ensureEngine(t);const e=r.create(t);if(!e)throw new Error(`Unknown action: ${t.id||t.name}`);let i,s;this.context.internal.actionIndex=(this.context.internal.actionIndex||0)+1,this.context.currentAction={...t,index:this.context.internal.actionIndex,startedAt:Date.now()};try{return i=await e.execute(this.context,t),i}catch(t){throw s=t,s}finally{this.context.currentAction=void 0}}async executeAll(t){let e=0;try{for(;e<t.length;){const i=t[e];await this.execute(i),e++}const i=await this.execute({id:"getContent"});return{result:i?.result,outputs:this.getOutputs()}}catch(t){throw t.actionIndex=e,t}}getOutputs(){return this.context.outputs}async getState(){return this.context.internal.engine?.getState()}async dispose(){if(this.closed)return;const t=this.context.eventBus;t.emit("session:closing",{sessionId:this.id});try{await(this.context.internal.engine?.dispose())}finally{this.closed=!0}t.emit("session:closed",{sessionId:this.id})}async ensureEngine(t){if(this.closed)throw new Error("Session is closed");if(!this.context.internal.engine){const e=t?.params?.url??this.context.url;if(!await E(this.context,{url:e,engine:this.engine}))throw new Error("No engine found")}}createContext(e=this.options){const i=new c;return l({...e,id:this.id,eventBus:i,outputs:{},internal:{},execute:async t=>this.execute(t),action:async function(t,e,i){return this.execute({name:t,params:e,...i})}},t)}},S=class{constructor(t={}){this.defaults=t}async createSession(t){const e={...this.defaults,...t||{}};return new k(e)}async fetch(t,e){"string"!=typeof t&&(t=(e=t).url);const i=await this.createSession(e);try{const s=e?.actions||[];t&&0!==s.findIndex(e=>"goto"===e.id&&e.params?.url===t)&&s.unshift({id:"goto",params:{url:t}});return await i.executeAll(s)}finally{await i.dispose()}}};import{CheerioCrawler as q,ProxyConfiguration as C}from"crawlee";import*as $ from"cheerio";import{CommonError as R,ErrorCode as P,NotFoundError as T}from"@isdk/common-error";var A=class extends b{async _buildResponse(t){const{request:e,response:i,body:s,$:r}=t,n=r?.html();let o="string"==typeof s?s:Buffer.isBuffer(s)?s.toString("utf-8"):String(s??"");n&&n!==o&&(o=n);let a=i?.headers;if(!a&&i?.rawHeaders){a={};const t=i.rawHeaders;for(let e=0;e<t.length;e+=2)a[t[e].toLowerCase()]=t[e+1]}return{url:e.url,finalUrl:e.loadedUrl||e.url,statusCode:i?.statusCode??200,statusText:i?.statusMessage,headers:a||{},cookies:t.session?.getCookies(e.url),body:s,html:o,text:o}}async _querySelectorAll(t,e){const{$:i,el:s}=t;return s.find(e).toArray().map(t=>({$:i,el:i(t)}))}async _extractValue(t,e){const{el:i}=e,{attribute:s,type:r="string"}=t;if(0===i.length)return null;let n="";if(n=s?i.attr(s)??null:"html"===r?i.html():i.text().trim(),null===n)return null;switch(r){case"number":return parseFloat(n.replace(/[^0-9.-]+/g,""))||null;case"boolean":const t=n.toLowerCase();return"true"===t||"1"===t;default:return n}}async executeAction(t,e){const{$:i}=t;switch(e.type){case"dispose":return;case"extract":if(!i)throw new R(`Cheerio context not available for action: ${e.type}`,"extract");return this._extract(e.schema,{$:i,el:i.root()});case"click":{if(!i)throw new R(`Cheerio context not available for action: ${e.type}`,"click");const s=e.selector,r=i(s).first();let n;if(0===r.length)try{n=new URL(s,t.request.loadedUrl||t.request.url).href}catch{throw new R(`click: selector not found or invalid URL: ${s}`,"click")}else{if(!r.is("a")||!r.attr("href")){if(r.is('input[type="submit"], button[type="submit"], button, input')){const e=r.closest("form");if(e.length)return this.executeAction(t,{type:"submit",selector:e});throw new R("click: submit-like element without form","click")}throw new R(`click: unsupported element for http simulate. Selector: ${s}`,"click")}{const e=r.attr("href");n=new URL(e,t.request.loadedUrl||t.request.url).href}}const o=await t.sendRequest({url:n});return void this._updateStateAfterNavigation(t,o)}case"fill":{if(!i)throw new R(`Cheerio context not available for action: ${e.type}`),"fill";const s=i(e.selector).first();if(0===s.length)throw new R(`fill: selector not found: ${e.selector}`);if(!s.is("input, textarea, select"))throw new R(`fill: not a form field: ${e.selector}`);return s.val(e.value),void(this.lastResponse=await this.buildResponse(t))}case"waitFor":return void(e.options?.ms&&await new Promise(t=>setTimeout(t,e.options.ms)));case"pause":const s=this.ctx?.onPause;return void(s?(console.info(e.message||"Execution paused for manual intervention."),await s({message:e.message}),console.info("Resuming execution...")):console.warn("[PauseAction] was called, but no `onPause` handler was provided in fetchWeb options. Skipped."));case"submit":{if(!i)throw new R(`Cheerio context not available for action: ${e.type}`,"submit");const s="string"==typeof e.selector?i(e.selector).first():null!=e.selector?e.selector:i("form").first();if(0===s.length)throw new T(e.selector,"submit");const r=s.attr("action")||t.request.loadedUrl||t.request.url,n=(s.attr("method")||"GET").toUpperCase(),o=new URL(r,t.request.loadedUrl||t.request.url).href,a={};let c;if(s.find("input, select, textarea").each((t,e)=>{const s=i(e),r=s.attr("name");if(!r)return;const n=s.val();null!=n&&(a[r]=String(n))}),"GET"===n){const e=new URL(o);Object.entries(a).forEach(([t,i])=>e.searchParams.set(t,i)),c=await t.sendRequest({url:e.href,method:"GET"})}else{let i;const r={};"application/json"===(e.options?.enctype||s.attr("enctype")||"application/x-www-form-urlencoded")?(i=JSON.stringify(a),r["Content-Type"]="application/json"):(i=new URLSearchParams(a).toString(),r["Content-Type"]="application/x-www-form-urlencoded"),c=await t.sendRequest({url:o,method:"POST",body:i,headers:r})}return void this._updateStateAfterNavigation(t,c)}case"getContent":return this.buildResponse(t);default:throw new R(`Unknown action type: ${e.type}`,"CheerioFetchEngine.executeAction",P.NotSupported)}}_updateStateAfterNavigation(t,e){const i=e;let s=i.headers;if(!s&&i.rawHeaders){s={};for(let t=0;t<i.rawHeaders.length;t+=2)s[i.rawHeaders[t].toLowerCase()]=i.rawHeaders[t+1]}s=s||{};const r=i.body,n=$.load(r??"");t.$=n,t.response=i,t.body=r;const o=n.html(),a=n.text(),c=(s["content-type"]||"").split(";")[0].trim();this.lastResponse={url:t.request.url,finalUrl:i.url,statusCode:i.statusCode,statusText:i.statusMessage,headers:s,contentType:c,body:r,html:o,text:a}}_createCrawler(t){return new q(t)}_getSpecificCrawlerOptions(t){const e=this.opts?.proxy?"string"==typeof this.opts.proxy?[this.opts.proxy]:this.opts.proxy:void 0,i=e?.length?new C({proxyUrls:e}):void 0;return{additionalMimeTypes:["text/plain"],maxRequestRetries:1,requestHandlerTimeoutSecs:t.requestHandlerTimeoutSecs,proxyConfiguration:i,preNavigationHooks:[({session:e,request:i},s)=>{s.throwHttpErrors=t.throwHttpErrors,this.opts?.timeoutMs&&(s.timeout={request:this.opts.timeoutMs})}]}}async goto(t,e){this.isPageActive&&this.dispatchAction({type:"dispose"}).catch(()=>{});const i="req-"+ ++this.requestCounter,s=new Promise((t,s)=>{const r=e?.timeoutMs||this.opts?.timeoutMs||3e4,n=setTimeout(()=>{this.pendingRequests.delete(i),this.navigationLock.release(),s(new R(`goto timed out after ${r}ms.`,"gotoTimeout",P.RequestTimeout))},r);this.pendingRequests.set(i,{resolve:e=>{clearTimeout(n),t(e)},reject:t=>{clearTimeout(n),s(t)}})});return this.requestQueue.addRequest({...e,url:t,headers:{...this.hdrs,...e?.headers},userData:{requestId:i},uniqueKey:`${t}-${i}`}).catch(t=>{const e=this.pendingRequests.get(i);e&&(this.pendingRequests.delete(i),this.navigationLock.release(),e.reject(t))}),await this.navigationLock,this.navigationLock=x(),s}};A.id="cheerio",A.mode="http",b.register(A);import{PlaywrightCrawler as U}from"crawlee";import{firefox as _}from"playwright";import{CommonError as j,ErrorCode as O,NotFoundError as N}from"@isdk/common-error";var F=class extends b{async _buildResponse(t){const{page:e,response:i,request:s,session:r}=t;if(!e||e.isClosed())return{url:s.url,finalUrl:s.loadedUrl||s.url,statusCode:i?.status(),statusText:i?.statusText(),headers:await(i?.allHeaders())||{},cookies:[],body:"",html:"",text:""};const n=await e.content(),o=await e.textContent("body"),a=await e.context().cookies();return console.log("[PlaywrightEngine] Cookies from page context:",a),r&&r.setCookies(a,s.url),{url:e.url(),finalUrl:e.url(),statusCode:i?.status(),statusText:i?.statusText(),headers:await(i?.allHeaders())||{},cookies:a,body:n,html:n,text:o||""}}async _querySelectorAll(t,e){return t.locator(e).all()}async _extractValue(t,e){const{attribute:i,type:s="string"}=t;if(0===await e.count())return null;let r="";if(r=i?await e.getAttribute(i):"html"===s?await e.innerHTML():await e.textContent(),null===r)return null;switch(r=r.trim(),s){case"number":return parseFloat(r.replace(/[^0-9.-]+/g,""))||null;case"boolean":const t=r.toLowerCase();return"true"===t||"1"===t;default:return r}}async executeAction(t,e){const{page:i}=t,s=this.opts?.timeoutMs||3e4;switch(e.type){case"navigate":{const s=await i.goto(e.url,{waitUntil:e.opts?.waitUntil||"domcontentloaded",timeout:this.opts?.timeoutMs||3e4});s&&(t={...t,response:s});const r=await this.buildResponse(t);return this.lastResponse=r,r}case"extract":{const s=await this._extract(e.schema,i.locator("body"));return this.lastResponse=await this.buildResponse(t),s}case"click":{await i.click(e.selector,{timeout:s}),await i.waitForLoadState("networkidle",{timeout:s});const r=await this.buildResponse(t);return void(this.lastResponse=r)}case"fill":await i.fill(e.selector,e.value,{timeout:s});const r=await this.buildResponse(t);return void(this.lastResponse=r);case"waitFor":try{e.options?.selector&&await i.waitForSelector(e.options.selector,{timeout:s}),e.options?.networkIdle&&await i.waitForLoadState("networkidle",{timeout:s})}catch(t){if(!1!==e.options?.failOnTimeout)throw t}return void(e.options?.ms&&await i.waitForTimeout(e.options.ms));case"submit":{const r=e.selector||"form",n=i.locator(r).first();if(0===await n.count())throw new N(r,"submit");if("application/json"===(e.options?.enctype||"application/x-www-form-urlencoded")){const t=await n.elementHandle();if(!t)throw new j(`submit: could not get form handle for ${r}`,"submit");const e=await t.evaluate(async t=>{const e=new FormData(t),i={};e.forEach((t,e)=>{i[e]=t.toString()});const s=await fetch(t.action,{method:t.method,headers:{"Content-Type":"application/json"},body:JSON.stringify(i)}),r=await s.text();return{status:s.status,statusText:s.statusText,headers:Object.fromEntries(s.headers.entries()),body:r,html:r,text:r,url:t.action,finalUrl:s.url}});return await t.dispose(),await i.setContent(e.html),void(this.lastResponse=e)}return await n.evaluate(t=>t.submit()),await i.waitForLoadState("networkidle",{timeout:s}),void(this.lastResponse=await this.buildResponse(t))}case"pause":{const t=this.ctx?.onPause;return void(t?(console.info(e.message||"Execution paused for manual intervention."),await t({message:e.message}),console.info("Resuming execution...")):console.warn("[PauseAction] was called, but no `onPause` handler was provided in fetchWeb options. Skipped."))}case"getContent":return this.buildResponse(t);default:throw new j(`Unknown action type: ${e.type}`,"PlaywrightFetchEngine.executeAction",O.NotSupported)}}_createCrawler(t){return new U(t)}async _getSpecificCrawlerOptions(t){const e=t.browser?.headless??!0,i={maxRequestRetries:t.retries||3,headless:e,requestHandlerTimeoutSecs:t.requestHandlerTimeoutSecs,preNavigationHooks:[async({page:e,request:i},s)=>{s.throwHttpErrors=t.throwHttpErrors;const r=this.blockedTypes;r.size>0&&await e.route("**/*",t=>{r.has(t.request().resourceType())?t.abort():t.continue()})}]};if(this.opts?.antibot){i.browserPoolOptions={useFingerprints:!1};const{launchOptions:t}=await import("camoufox-js"),s=await t({headless:e});i.launchContext={launcher:_,launchOptions:s},i.postNavigationHooks=[async({page:t,handleCloudflareChallenge:e})=>{await e()}]}return i}async goto(t,e){if(this.isPageActive)return this.dispatchAction({type:"navigate",url:t,opts:e});if(!this.requestQueue)throw new j("RequestQueue not initialized","goto");const i="req-"+ ++this.requestCounter,s=new Promise((t,e)=>{this.pendingRequests.set(i,{resolve:t,reject:e})});return await this.requestQueue.addRequest({url:t,headers:this.hdrs,userData:{requestId:i,waitUntil:e?.waitUntil||"domcontentloaded"},uniqueKey:`${t}-${i}`}),s}};F.id="playwright",F.mode="browser",b.register(F);var H=class extends r{async onExecute(t,e){const{selector:i,...s}=e?.params||{};if(!i)throw new Error("Selector is required for click action");await this.delegateToEngine(t,"click",i,s)}};H.id="click",H.returnType="none",H.capabilities={http:"simulate",browser:"native"},r.register(H);var L=class extends r{async onExecute(t,e){const{selector:i,value:s,...r}=e?.params||{};if(!i)throw new Error("Selector is required for fill action");if(void 0===s)throw new Error("Value is required for fill action");await this.delegateToEngine(t,"fill",i,s,r)}};L.id="fill",L.returnType="none",L.capabilities={http:"simulate",browser:"native"},r.register(L);var M=class extends r{async onExecute(t,e){return await this.delegateToEngine(t,"getContent",e?.params)}};M.id="getContent",M.returnType="response",M.capabilities={http:"native",browser:"native"},r.register(M);var z=class extends r{async onExecute(t,e,i){const s=e?.params,r=s?.url||t.url;if(!r)throw new Error("URL is required for goto action");const n=t.internal.engine;if(!n)throw new Error("No engine available");t.url=r;return await n.goto(r,s)}};z.id="goto",z.returnType="response",z.capabilities={http:"native",browser:"native"},r.register(z);var B=class extends r{async onExecute(t,e){const{selector:i,...s}=e?.params||{};await this.delegateToEngine(t,"submit",i,s)}};B.id="submit",B.returnType="none",B.capabilities={http:"simulate",browser:"native"},r.register(B);var D=class extends r{async onExecute(t,e){const i=t.internal.engine;if(!i)throw new Error("No engine available");await i.waitFor(e?.params)}};D.id="waitFor",D.returnType="none",D.capabilities={http:"native",browser:"native"},r.register(D);var G=class extends r{async onExecute(t,e){const i=e?.params;if(!i)throw new Error("Schema is required for extract action");return this.delegateToEngine(t,"extract",i)}};G.id="extract",G.returnType="any",G.capabilities={http:"native",browser:"native"},r.register(G);var I=class extends r{async onExecute(t,e){const{selector:i,message:s,attribute:r}=e?.params||{},n=t.internal.engine;if("browser"===n?.mode){if(i){if(!await(n?.extract({selector:i,attribute:r})))return}n&&"pause"in n?await n.pause(s):console.warn("[PauseAction] was called, but the current engine does not support `pause`. Skipped.")}else console.warn("[PauseAction] can only run in browser engine. Skipped.")}};async function J(t,e){return(new S).fetch(t,e)}I.id="pause",I.capabilities={http:"native",browser:"native"},I.returnType="none",r.register(I);export{A as CheerioFetchEngine,H as ClickAction,t as DefaultFetcherProperties,G as ExtractAction,r as FetchAction,i as FetchActionResultStatus,b as FetchEngine,k as FetchSession,e as FetcherOptionKeys,L as FillAction,M as GetContentAction,z as GotoAction,I as PauseAction,F as PlaywrightFetchEngine,B as SubmitAction,D as WaitForAction,S as WebFetcher,J as fetchWeb};
1
+ var t={engine:"auto",enableSmart:!0,useSiteRegistry:!0,antibot:!1,headers:{},cookies:[],reuseCookies:!0,throwHttpErrors:void 0,proxy:[],blockResources:[],ignoreSslErrors:!0,browser:{engine:"playwright",headless:!0,waitUntil:"domcontentloaded"},http:{method:"GET"},timeoutMs:6e4,requestHandlerTimeoutSecs:void 0,maxConcurrency:1,maxRequestsPerMinute:1e3,delayBetweenRequestsMs:0,retries:0,sites:[]},e=Object.keys(t).concat(["actions","onPause"]),i=(t=>(t[t.Failed=0]="Failed",t[t.Success=1]="Success",t[t.Skipped=2]="Skipped",t))(i||{}),s=class t{static register(t){const e=t.id;if(!e)throw new Error("FetchAction.register: actionClass.id is required");this.registry.set(e,t)}static get(t){return this.registry.get(t)}static create(e){const i="string"==typeof e?e:e.id||e.name||e.action;if(!i)throw new Error("Action must have id, name or action");const s=i instanceof t?i.constructor:this.registry.get(i);return s?new s:void 0}static has(t){return this.registry.has(t)}static list(){return Array.from(this.registry.keys())}static getCapability(t){return this.capabilities[t]??"noop"}getCapability(t){return this.constructor.getCapability(t)}get id(){return this.constructor.id}get returnType(){return this.constructor.returnType}get capabilities(){return this.constructor.capabilities}async delegateToEngine(t,e,...i){const s=t.internal.engine;if(!s)throw new Error("No engine available");if("function"!=typeof s[e])throw new Error(`Engine does not have a method named '${String(e)}'`);return await s[e](...i)}installCollectors(e,i){const s=i?.collectors;if(!s?.length)return;const r=[],c=new Set;for(const i of s){const s=n(i.activateOn),l=n(i.collectOn),h=n(i.deactivateOn),u=!(i.background??!0),w=t.create(i);if(!w)continue;let f=!1,d=!1,p=0;const y=async t=>{if(!f&&!d){f=!0;try{await(w.onBeforeExec?.(e,i))}catch(t){e.eventBus.emit("collector:error",{action:this.id,collector:w.id,phase:"before",error:t})}}},m=async(t,s)=>{if(!d){f||await y(s);try{const r=Promise.resolve(w.onExecute?.(e,i,s)).then(s=>{var r,n;if(i.storeAs){((r=e.outputs)[n=i.storeAs]||(r[n]=[])).push(s)}return e.eventBus.emit("collector:result",{action:this.id,collector:i.id||i.name,event:t,result:s}),s}).catch(s=>{e.eventBus.emit("collector:error",{action:this.id,collector:i.id||i.name,event:t,phase:"exec",error:s})}).finally(()=>{p++});u&&(c.add(r),r.finally(()=>c.delete(r)))}catch(i){e.eventBus.emit("collector:error",{action:this.id,collector:w.id,event:t,phase:"exec",error:i})}}},g=async()=>{if(!d){0===p&&m("collector:after"),d=!0;try{await(w.onAfterExec?.(e,i))}catch(t){e.eventBus.emit("collector:error",{action:this.id,collector:i.id||i.name,phase:"after",error:t})}finally{e.eventBus.emit("collector:end",{action:this.id,collector:i.id||i.name}),x.forEach(t=>t())}}},v=o(e,s,y),x=a(e,l,m),b=o(e,h,g);if(r.push(...v,...x,...b),!s.length&&!l.length&&!h.length){const t=()=>{g()};e.eventBus.once(`action:${this.id}.end`,t),r.push(()=>e.eventBus.off("fetcher:action:end",t))}}return r.length||c.size>0?{cleanup:()=>r.forEach(t=>t()),awaitExecPendings:async()=>{c.size>0&&await Promise.allSettled(Array.from(c))}}:void 0}async beforeExec(t,e){t.internal.actionStack||(t.internal.actionStack=[]);const i=t.internal.actionStack,s=i.length,r=i.length>0?i[i.length-1].id:void 0,n={...e,id:this.id,depth:s,parent:r};i.push(n),t.currentAction=n;const o={action:this,context:t,options:e,depth:s,stack:[...i]};t.eventBus.emit(`action:${this.id}.start`,o),t.eventBus.emit("action:start",o),await(this.onBeforeExec?.(t,e));return{entry:o,collectors:this.installCollectors(t,e)}}async afterExec(t,e,i,s){const r=t.internal.actionStack,n=r.length-1,o=s?.collectors;try{await(o?.awaitExecPendings()),t.lastResult=i,"response"!==i?.returnType||i.error||(t.lastResponse=i.result),e?.storeAs&&(t.outputs[e.storeAs]=i?.result),i?.error&&(t.currentAction.error=i.error),await(this.onAfterExec?.(t,e));const s={action:this,context:t,options:e,result:i,depth:n,stack:[...r]};i?.error&&(s.error=i.error);try{t.eventBus.emit(`action:${this.id}.end`,s)}catch(t){}try{t.eventBus.emit("action:end",s)}catch(t){}}finally{try{o?.cleanup()}finally{r.pop();const e=r.length;t.currentAction=e>0?r[e-1]:void 0}}}async execute(t,e){e?.args&&!e.params&&(e.params=e.args);const i=await this.beforeExec(t,e),s=e?.failOnError??!0;let r;try{return t.throwHttpErrors=s,r=await this.onExecute(t,e),r&&r.returnType||(r={status:1,returnType:this.returnType??"any",result:r}),r}catch(e){if(r={status:0,error:e,meta:{id:this.id,engineType:t.engine,capability:this.getCapability(t.engine)}},s)throw e;return r}finally{await this.afterExec(t,e,r,i)}}};s.registry=new Map,s.returnType="any",s.capabilities={http:"noop",browser:"noop"};var r=s;function n(t){return t?Array.isArray(t)?t:[t]:[]}function o(t,e,i){const s=[];for(const r of e)if("string"==typeof r||r instanceof RegExp){const e=(...t)=>{i(t[0])};t.eventBus.once(r,e),s.push(()=>t.eventBus.off(r,e))}return s}function a(t,e,i){const s=[];for(const r of e)if("string"==typeof r||r instanceof RegExp){const e=t=>i(r,t);t.eventBus.on(r,e),s.push(()=>t.eventBus.off(r,e))}return s}import{EventEmitter as c}from"events-ex";import{defaultsDeep as l}from"lodash-es";import{customAlphabet as h}from"nanoid";var u=h("0123456789abcdefghijklmnopqrstuvwxyz",12);import{defaultsDeep as w,merge as f}from"lodash-es";import{EventEmitter as d}from"events-ex";import{CommonError as p}from"@isdk/common-error";import{Configuration as y,KeyValueStore as m,PERSIST_STATE_KEY as g,RequestQueue as v}from"crawlee";function x(){let t=()=>{};const e=new Promise(e=>{t=e});return e.release=t,e}y.getGlobalConfig().set("persistStorage",!1);var b=class{constructor(){this.hdrs={},this._initializedSessions=new Set,this.pendingRequests=new Map,this.requestCounter=0,this.actionEmitter=new d,this.isPageActive=!1,this.navigationLock=function(){const t=x();return t.release(),t}(),this.blockedTypes=new Set}static register(t){const e=t.id;if(!e)throw new Error("Engine must define static id");if(this.registry.has(e))throw new Error(`Engine id duplicated: ${e}`);this.registry.set(e,t)}static get(t){return this.registry.get(t)}static getByMode(t){for(const[e,i]of this.registry.entries())if(i.mode===t)return i}static async create(e,i){const s=w(i,e,t),r=s.engine??e.engine,n=r?this.get(r)??this.getByMode(r):null;if(n){const t=new n;return await t.initialize(e,s),t}}async _extract(t,e){const i=t.type;if(!e)return"array"===i?[]:null;if("object"===i){const{selector:i,properties:s}=t;let r=e;if(i){const t=await this._querySelectorAll(e,i);r=t.length>0?t[0]:null}if(!r)return null;const n={};for(const t in s)n[t]=await this._extract(s[t],r);return n}if("array"===i){const{selector:i,items:s}=t,r=i?await this._querySelectorAll(e,i):[e],n=[];for(const t of r)n.push(await this._extract(s,t));return n}const{selector:s}=t;let r=e;if(s){const t=await this._querySelectorAll(e,s);r=t.length>0?t[0]:null}return r?this._extractValue(t,r):null}async buildResponse(t){const e=await this._buildResponse(t),i=e.headers["content-type"]||"";return e.contentType=i.split(";")[0].trim(),!e.cookies&&t.session&&(e.cookies=t.session.getCookies(t.request.url)),this.crawler?.sessionPool&&(e.sessionState=await this.crawler.sessionPool.getState()),e}waitFor(t){return this.dispatchAction({type:"waitFor",options:t})}click(t){return this.dispatchAction({type:"click",selector:t})}fill(t,e){return this.dispatchAction({type:"fill",selector:t,value:e})}submit(t,e){return this.dispatchAction({type:"submit",selector:t,options:e})}pause(t){return this.dispatchAction({type:"pause",message:t})}extract(t){const e=this._normalizeSchema(t);return this.dispatchAction({type:"extract",schema:e})}_normalizeSchema(t){const e=JSON.parse(JSON.stringify(t));if(e.properties)for(const t in e.properties)e.properties[t]=this._normalizeSchema(e.properties[t]);if(e.items&&(e.items=this._normalizeSchema(e.items)),"array"===e.type&&(e.attribute&&!e.items&&(e.items={attribute:e.attribute},delete e.attribute),e.items||(e.items={type:"string"})),e.selector&&(e.has||e.exclude)){const{selector:t,has:i,exclude:s}=e,r=t.split(",").map(t=>{let e=t.trim();return i&&(e=`${e}:has(${i})`),s&&(e=`${e}:not(${s})`),e}).join(", ");e.selector=r,delete e.has,delete e.exclude}return e}get id(){return this.constructor.id}async getState(){return{cookies:await this.cookies(),sessionState:await(this.crawler?.sessionPool?.getState())}}get mode(){return this.constructor.mode}get context(){return this.ctx}async initialize(t,e){if(this.ctx)return;f(t,e),this.ctx=t,this.opts=t,this.hdrs=function(t){const e={};if(t&&"object"==typeof t)for(const[i,s]of Object.entries(t))e[i.toLowerCase()]=s;return e}(t.headers),this._initialCookies=[...t.cookies??[]],t.internal||(t.internal={}),t.internal.engine=this,t.engine=this.mode,this.actionEmitter.setMaxListeners(100),this.requestQueue=await v.open();const i=await this._getSpecificCrawlerOptions(t),s=w({persistenceOptions:{enable:!0}},t.sessionPoolOptions,{maxPoolSize:1,sessionOptions:{maxUsageCount:1e3,maxErrorScore:3}});t.sessionState&&t.cookies&&t.cookies.length>0&&console.warn('[FetchEngine] Warning: Both "sessionState" and "cookies" are provided. Explicit "cookies" will override any conflicting cookies restored from "sessionState".');const r={...w(i,{requestQueue:this.requestQueue,maxConcurrency:1,minConcurrency:1,useSessionPool:!0,persistCookiesPerSession:!0,sessionPoolOptions:s}),requestHandler:this._requestHandler.bind(this),errorHandler:this._failedRequestHandler.bind(this),failedRequestHandler:this._failedRequestHandler.bind(this)};r.preNavigationHooks||(r.preNavigationHooks=[]),r.preNavigationHooks.unshift(({crawler:t,session:e,request:i},s)=>{if(this.currentSession=e,e&&!this._initializedSessions.has(e.id)){if(this._initialCookies&&this._initialCookies.length>0){const t=this._initialCookies.map(t=>{const e={...t};return"no_restriction"===e.sameSite&&(e.sameSite="None"),e});e.setCookies(t,i.url)}this._initializedSessions.add(e.id)}});const n=this.crawler=this._createCrawler(r),o=await m.open(null,{config:n.config}),a=await o.getValue(g);!t.sessionState||a&&!t.overrideSessionState||await o.setValue(g,t.sessionState),this.crawler.run().then(()=>{this.isCrawlerReady=!0}).catch(t=>{this.isCrawlerReady=!1,console.error("Crawler background error:",t)})}async cleanup(){await(this._cleanup?.()),await this._commonCleanup();const t=this.ctx;t&&t.internal?.engine===this&&(t.internal.engine=void 0),this.ctx=void 0,this.opts=void 0}async _executePendingActions(t){await new Promise(e=>{const i=async({action:e,resolve:i,reject:s})=>{try{if("dispose"===e.type)return this.actionEmitter.emit("dispose"),void i();i(await this.executeAction(t,e))}catch(t){s(t)}};this.actionEmitter.on("dispatch",i),this.actionEmitter.once("dispose",()=>{this.actionEmitter.removeListener("dispatch",i),e()})})}async _sharedRequestHandler(t){const{request:e}=t;try{this.currentSession=t.session,this.isPageActive=!0;const i=this.pendingRequests.get(e.userData.requestId);if(i){const s=await this.buildResponse(t),r=!s.statusCode||s.statusCode>=400;if(this.ctx?.throwHttpErrors&&r){const t=new p(`Request for ${s.finalUrl} failed with status ${s.statusCode||"N/A"}`,"request",s.statusCode);i.reject(t)}else this.lastResponse=s,i.resolve(s);this.pendingRequests.delete(e.userData.requestId)}await this._executePendingActions(t)}finally{if(this.currentSession){const t=this.currentSession.getCookies(e.url);t&&(this._initialCookies=t)}this.isPageActive=!1,this.navigationLock.release()}}async _sharedFailedRequestHandler(t,e){const{request:i}=t,s=this.pendingRequests.get(i.userData.requestId);if(s&&e&&this.ctx?.throwHttpErrors){this.pendingRequests.delete(i.userData.requestId);const t=e.response,r=t?.statusCode||500,n=t?.url?t.url:i.url,o=new p(`Request${n?" for "+n:""} failed: ${e.message}`,"request",r);s.reject(o)}return this._sharedRequestHandler(t)}async dispatchAction(t){if(!this.isPageActive)throw new Error("No active page. Call goto() before performing actions.");return new Promise((e,i)=>{this.actionEmitter.emit("dispatch",{action:t,resolve:e,reject:i})})}async _requestHandler(t){await this._sharedRequestHandler(t)}async _failedRequestHandler(t,e){await this._sharedFailedRequestHandler(t,e)}async _commonCleanup(){if(this._initializedSessions.clear(),this.isPageActive&&await this.dispatchAction({type:"dispose"}).catch(()=>{}),this.pendingRequests.size>0)for(const[,t]of this.pendingRequests)t.reject(new Error("Cleanup:Request cancelled"));if(this.actionEmitter.removeAllListeners(),this.crawler){try{await(this.crawler.teardown?.())}catch(t){console.error("ccrawler teardown error:",t)}this.crawler=void 0}this.isCrawlerReady=void 0,this.requestQueue&&(await this.requestQueue.drop(),this.requestQueue=void 0),this.pendingRequests.clear()}async blockResources(t,e){return e&&this.blockedTypes.clear(),t.forEach(t=>this.blockedTypes.add(t)),t.length}getContent(){return this.lastResponse?Promise.resolve(this.lastResponse):Promise.reject(new Error("No content fetched yet. Call goto() first."))}async headers(t,e){if(void 0===t)return{...this.hdrs};if("string"==typeof t&&void 0===e)return this.hdrs[t.toLowerCase()]||"";if(null!==t&&"object"==typeof t){const i={};for(const[e,s]of Object.entries(t))i[e.toLowerCase()]=String(s);return this.hdrs=!0===e?i:{...this.hdrs,...i},!0}return"string"==typeof t&&("string"==typeof e?this.hdrs[t.toLowerCase()]=e:null===e&&delete this.hdrs[t.toLowerCase()],!0)}async cookies(t){const e=this.lastResponse?.url||"";if(Array.isArray(t))return this.currentSession?this.currentSession.setCookies(t,e):this._initialCookies=[...t],!0;if(null===t)return this.currentSession,this._initialCookies=[],!0;if(this.currentSession){return this.currentSession.getCookies(e)}return[...this._initialCookies||[]]}async dispose(){await this.cleanup()}};async function E(t,e){let i;if(e?.engine){if(i=await b.create(t,{engine:e.engine}),!i)throw new Error(`No engine available for ${e.engine}`);return i}const s=function(t,e){if(!t||!e?.length)return null;const i=new URL(t);let s=e.find(t=>t.domain===i.hostname);s||(s=e.find(t=>i.hostname.endsWith(t.domain)));if(!s)return null;if(s.pathScope?.length){if(!s.pathScope.some(t=>i.pathname.startsWith(t)))return null}return s}(e?.url||t.url,t.sites),r=t.engine||s?.engine||"auto";return i=await b.create(t,{engine:r}),i||(i=await b.create(t,{engine:"http"})),i}b.registry=new Map;var S=class{constructor(t={},e){this.options=t,this.engine=e,this.closed=!1,this.id=u(),this.context=this.createContext(t)}async execute(t){await this.ensureEngine(t);const e=r.create(t);if(!e)throw new Error(`Unknown action: ${t.id||t.name}`);let i,s;this.context.internal.actionIndex=(this.context.internal.actionIndex||0)+1,this.context.currentAction={...t,index:this.context.internal.actionIndex,startedAt:Date.now()};try{return i=await e.execute(this.context,t),i}catch(t){throw s=t,s}finally{this.context.currentAction=void 0}}async executeAll(t){let e=0;try{for(;e<t.length;){const i=t[e];await this.execute(i),e++}const i=await this.execute({id:"getContent"});return{result:i?.result,outputs:this.getOutputs()}}catch(t){throw t.actionIndex=e,t}}getOutputs(){return this.context.outputs}async getState(){return this.context.internal.engine?.getState()}async dispose(){if(this.closed)return;const t=this.context.eventBus;t.emit("session:closing",{sessionId:this.id});try{await(this.context.internal.engine?.dispose())}finally{this.closed=!0}t.emit("session:closed",{sessionId:this.id})}async ensureEngine(t){if(this.closed)throw new Error("Session is closed");if(!this.context.internal.engine){const e=t?.params?.url??this.context.url;if(!await E(this.context,{url:e,engine:this.engine}))throw new Error("No engine found")}}createContext(e=this.options){const i=new c;return l({...e,id:this.id,eventBus:i,outputs:{},internal:{},execute:async t=>this.execute(t),action:async function(t,e,i){return this.execute({name:t,params:e,...i})}},t)}},k=class{constructor(t={}){this.defaults=t}async createSession(t){const e={...this.defaults,...t||{}};return new S(e)}async fetch(t,e){"string"!=typeof t&&(t=(e=t).url);const i=await this.createSession(e);try{const s=e?.actions||[];t&&0!==s.findIndex(e=>"goto"===e.id&&e.params?.url===t)&&s.unshift({id:"goto",params:{url:t}});return await i.executeAll(s)}finally{await i.dispose()}}};import{CheerioCrawler as q,ProxyConfiguration as C}from"crawlee";import*as $ from"cheerio";import{CommonError as R,ErrorCode as P,NotFoundError as T}from"@isdk/common-error";var A=class extends b{_ensureCheerioContext(t){if(!t.$&&t.body){let e="string"==typeof t.body?t.body:Buffer.isBuffer(t.body)?t.body.toString("utf-8"):JSON.stringify(t.body);e.trim().startsWith("<")||(e=`<html><body><pre>${e}</pre></body></html>`),t.$=$.load(e)}}async _buildResponse(t){this._ensureCheerioContext(t);const{request:e,response:i,body:s,$:r}=t,n=r?.html();let o="string"==typeof s?s:Buffer.isBuffer(s)?s.toString("utf-8"):String(s??"");n&&n!==o&&(o=n);let a=i?.headers;if(!a&&i?.rawHeaders){a={};const t=i.rawHeaders;for(let e=0;e<t.length;e+=2)a[t[e].toLowerCase()]=t[e+1]}return{url:e.url,finalUrl:e.loadedUrl||e.url,statusCode:i?.statusCode??200,statusText:i?.statusMessage,headers:a||{},body:s,html:o,text:o}}async _querySelectorAll(t,e){const{$:i,el:s}=t;return s.find(e).toArray().map(t=>({$:i,el:i(t)}))}async _extractValue(t,e){const{el:i}=e,{attribute:s,type:r="string"}=t;if(0===i.length)return null;let n="";if(n=s?i.attr(s)??null:"html"===r?i.html():i.text().trim(),null===n)return null;switch(r){case"number":return parseFloat(n.replace(/[^0-9.-]+/g,""))||null;case"boolean":const t=n.toLowerCase();return"true"===t||"1"===t;default:return n}}async executeAction(t,e){const{$:i}=t;switch(e.type){case"dispose":return;case"extract":if(!i)throw new R(`Cheerio context not available for action: ${e.type}`,"extract");return this._extract(e.schema,{$:i,el:i.root()});case"click":{if(!i)throw new R(`Cheerio context not available for action: ${e.type}`,"click");const s=e.selector,r=i(s).first();let n;if(0===r.length)try{n=new URL(s,t.request.loadedUrl||t.request.url).href}catch{throw new R(`click: selector not found or invalid URL: ${s}`,"click")}else{if(!r.is("a")||!r.attr("href")){if(r.is('input[type="submit"], button[type="submit"], button, input')){const e=r.closest("form");if(e.length)return this.executeAction(t,{type:"submit",selector:e});throw new R("click: submit-like element without form","click")}throw new R(`click: unsupported element for http simulate. Selector: ${s}`,"click")}{const e=r.attr("href");n=new URL(e,t.request.loadedUrl||t.request.url).href}}const o=await t.sendRequest({url:n});return void await this._updateStateAfterNavigation(t,o)}case"fill":{if(!i)throw new R(`Cheerio context not available for action: ${e.type}`),"fill";const s=i(e.selector).first();if(0===s.length)throw new R(`fill: selector not found: ${e.selector}`);if(!s.is("input, textarea, select"))throw new R(`fill: not a form field: ${e.selector}`);return s.val(e.value),void(this.lastResponse=await this.buildResponse(t))}case"waitFor":return void(e.options?.ms&&await new Promise(t=>setTimeout(t,e.options.ms)));case"pause":const s=this.ctx?.onPause;return void(s?(console.info(e.message||"Execution paused for manual intervention."),await s({message:e.message}),console.info("Resuming execution...")):console.warn("[PauseAction] was called, but no `onPause` handler was provided in fetchWeb options. Skipped."));case"submit":{if(!i)throw new R(`Cheerio context not available for action: ${e.type}`,"submit");const s="string"==typeof e.selector?i(e.selector).first():null!=e.selector?e.selector:i("form").first();if(0===s.length)throw new T(e.selector,"submit");const r=s.attr("action")||t.request.loadedUrl||t.request.url,n=(s.attr("method")||"GET").toUpperCase(),o=new URL(r,t.request.loadedUrl||t.request.url).href,a={};let c;if(s.find("input, select, textarea").each((t,e)=>{const s=i(e),r=s.attr("name");if(!r)return;const n=s.val();null!=n&&(a[r]=String(n))}),"GET"===n){const e=new URL(o);Object.entries(a).forEach(([t,i])=>e.searchParams.set(t,i)),c=await t.sendRequest({url:e.href,method:"GET"})}else{let i;const r={};"application/json"===(e.options?.enctype||s.attr("enctype")||"application/x-www-form-urlencoded")?(i=JSON.stringify(a),r["Content-Type"]="application/json"):(i=new URLSearchParams(a).toString(),r["Content-Type"]="application/x-www-form-urlencoded"),c=await t.sendRequest({url:o,method:"POST",body:i,headers:r})}return void await this._updateStateAfterNavigation(t,c)}case"getContent":return this.buildResponse(t);default:throw new R(`Unknown action type: ${e.type}`,"CheerioFetchEngine.executeAction",P.NotSupported)}}async _updateStateAfterNavigation(t,e){const i=e;t.response=i,t.body=i.body,t.$=void 0,i.url&&(t.request.loadedUrl=i.url),this.lastResponse=await this.buildResponse(t)}_createCrawler(t){return new q(t)}_getSpecificCrawlerOptions(t){const e=this.opts?.proxy?"string"==typeof this.opts.proxy?[this.opts.proxy]:this.opts.proxy:void 0,i=e?.length?new C({proxyUrls:e}):void 0;return{additionalMimeTypes:["text/plain"],maxRequestRetries:1,requestHandlerTimeoutSecs:t.requestHandlerTimeoutSecs,proxyConfiguration:i,preNavigationHooks:[({session:e,request:i},s)=>{s.throwHttpErrors=t.throwHttpErrors,this.opts?.timeoutMs&&(s.timeout={request:this.opts.timeoutMs})}]}}async goto(t,e){this.isPageActive&&this.dispatchAction({type:"dispose"}).catch(()=>{});const i="req-"+ ++this.requestCounter,s=new Promise((t,s)=>{const r=e?.timeoutMs||this.opts?.timeoutMs||3e4,n=setTimeout(()=>{this.pendingRequests.delete(i),this.navigationLock.release(),s(new R(`goto timed out after ${r}ms.`,"gotoTimeout",P.RequestTimeout))},r);this.pendingRequests.set(i,{resolve:e=>{clearTimeout(n),t(e)},reject:t=>{clearTimeout(n),s(t)}})});return this.requestQueue.addRequest({...e,url:t,headers:{...this.hdrs,...e?.headers},userData:{requestId:i},uniqueKey:`${t}-${i}`}).catch(t=>{const e=this.pendingRequests.get(i);e&&(this.pendingRequests.delete(i),this.navigationLock.release(),e.reject(t))}),await this.navigationLock,this.navigationLock=x(),s}};A.id="cheerio",A.mode="http",b.register(A);import{PlaywrightCrawler as _}from"crawlee";import{firefox as U}from"playwright";import{CommonError as O,ErrorCode as j,NotFoundError as N}from"@isdk/common-error";var F=class extends b{async _buildResponse(t){const{page:e,response:i,request:s,session:r}=t;if(!e||e.isClosed())return{url:s.url,finalUrl:s.loadedUrl||s.url,statusCode:i?.status(),statusText:i?.statusText(),headers:await(i?.allHeaders())||{},body:"",html:"",text:""};const n=await e.content(),o=await e.textContent("body"),a=await e.context().cookies();return r&&r.setCookies(a,s.url),{url:e.url(),finalUrl:e.url(),statusCode:i?.status(),statusText:i?.statusText(),headers:await(i?.allHeaders())||{},cookies:a,body:n,html:n,text:o||""}}async _querySelectorAll(t,e){return t.locator(e).all()}async _extractValue(t,e){const{attribute:i,type:s="string"}=t;if(0===await e.count())return null;let r="";if(r=i?await e.getAttribute(i):"html"===s?await e.innerHTML():await e.textContent(),null===r)return null;switch(r=r.trim(),s){case"number":return parseFloat(r.replace(/[^0-9.-]+/g,""))||null;case"boolean":const t=r.toLowerCase();return"true"===t||"1"===t;default:return r}}async executeAction(t,e){const{page:i}=t,s=this.opts?.timeoutMs||3e4;switch(e.type){case"navigate":{const s=await i.goto(e.url,{waitUntil:e.opts?.waitUntil||"domcontentloaded",timeout:this.opts?.timeoutMs||3e4});s&&(t={...t,response:s});const r=await this.buildResponse(t);return this.lastResponse=r,r}case"extract":{const s=await this._extract(e.schema,i.locator("body"));return this.lastResponse=await this.buildResponse(t),s}case"click":{await i.click(e.selector,{timeout:s}),await i.waitForLoadState("networkidle",{timeout:s});const r=await this.buildResponse(t);return void(this.lastResponse=r)}case"fill":await i.fill(e.selector,e.value,{timeout:s});const r=await this.buildResponse(t);return void(this.lastResponse=r);case"waitFor":try{e.options?.selector&&await i.waitForSelector(e.options.selector,{timeout:s}),e.options?.networkIdle&&await i.waitForLoadState("networkidle",{timeout:s})}catch(t){if(!1!==e.options?.failOnTimeout)throw t}return void(e.options?.ms&&await i.waitForTimeout(e.options.ms));case"submit":{const r=e.selector||"form",n=i.locator(r).first();if(0===await n.count())throw new N(r,"submit");if("application/json"===(e.options?.enctype||"application/x-www-form-urlencoded")){const t=await n.elementHandle();if(!t)throw new O(`submit: could not get form handle for ${r}`,"submit");const e=await t.evaluate(async t=>{const e=new FormData(t),i={};e.forEach((t,e)=>{i[e]=t.toString()});const s=await fetch(t.action,{method:t.method,headers:{"Content-Type":"application/json"},body:JSON.stringify(i)}),r=await s.text();return{status:s.status,statusText:s.statusText,headers:Object.fromEntries(s.headers.entries()),body:r,html:r,text:r,url:t.action,finalUrl:s.url}});return await t.dispose(),await i.setContent(e.html),void(this.lastResponse=e)}return await n.evaluate(t=>t.submit()),await i.waitForLoadState("networkidle",{timeout:s}),void(this.lastResponse=await this.buildResponse(t))}case"pause":{const t=this.ctx?.onPause;return void(t?(console.info(e.message||"Execution paused for manual intervention."),await t({message:e.message}),console.info("Resuming execution...")):console.warn("[PauseAction] was called, but no `onPause` handler was provided in fetchWeb options. Skipped."))}case"getContent":return this.buildResponse(t);default:throw new O(`Unknown action type: ${e.type}`,"PlaywrightFetchEngine.executeAction",j.NotSupported)}}_createCrawler(t){return new _(t)}async _getSpecificCrawlerOptions(t){const e=t.browser?.headless??!0,i={maxRequestRetries:t.retries||3,headless:e,requestHandlerTimeoutSecs:t.requestHandlerTimeoutSecs,preNavigationHooks:[async({page:e,request:i},s)=>{s.throwHttpErrors=t.throwHttpErrors;const r=this.blockedTypes;r.size>0&&await e.route("**/*",t=>{r.has(t.request().resourceType())?t.abort():t.continue()})}]};if(this.opts?.antibot){i.browserPoolOptions={useFingerprints:!1};const{launchOptions:t}=await import("camoufox-js"),s=await t({headless:e});i.launchContext={launcher:U,launchOptions:s},i.postNavigationHooks=[async({page:t,handleCloudflareChallenge:e})=>{await e()}]}return i}async goto(t,e){if(this.isPageActive)return this.dispatchAction({type:"navigate",url:t,opts:e});if(!this.requestQueue)throw new O("RequestQueue not initialized","goto");const i="req-"+ ++this.requestCounter,s=new Promise((t,e)=>{this.pendingRequests.set(i,{resolve:t,reject:e})});return await this.requestQueue.addRequest({url:t,headers:this.hdrs,userData:{requestId:i,waitUntil:e?.waitUntil||"domcontentloaded"},uniqueKey:`${t}-${i}`}),s}};F.id="playwright",F.mode="browser",b.register(F);var H=class extends r{async onExecute(t,e){const{selector:i,...s}=e?.params||{};if(!i)throw new Error("Selector is required for click action");await this.delegateToEngine(t,"click",i,s)}};H.id="click",H.returnType="none",H.capabilities={http:"simulate",browser:"native"},r.register(H);var L=class extends r{async onExecute(t,e){const{selector:i,value:s,...r}=e?.params||{};if(!i)throw new Error("Selector is required for fill action");if(void 0===s)throw new Error("Value is required for fill action");await this.delegateToEngine(t,"fill",i,s,r)}};L.id="fill",L.returnType="none",L.capabilities={http:"simulate",browser:"native"},r.register(L);var M=class extends r{async onExecute(t,e){return await this.delegateToEngine(t,"getContent",e?.params)}};M.id="getContent",M.returnType="response",M.capabilities={http:"native",browser:"native"},r.register(M);var B=class extends r{async onExecute(t,e,i){const s=e?.params,r=s?.url||t.url;if(!r)throw new Error("URL is required for goto action");const n=t.internal.engine;if(!n)throw new Error("No engine available");t.url=r;return await n.goto(r,s)}};B.id="goto",B.returnType="response",B.capabilities={http:"native",browser:"native"},r.register(B);var z=class extends r{async onExecute(t,e){const{selector:i,...s}=e?.params||{};await this.delegateToEngine(t,"submit",i,s)}};z.id="submit",z.returnType="none",z.capabilities={http:"simulate",browser:"native"},r.register(z);var D=class extends r{async onExecute(t,e){const i=t.internal.engine;if(!i)throw new Error("No engine available");await i.waitFor(e?.params)}};D.id="waitFor",D.returnType="none",D.capabilities={http:"native",browser:"native"},r.register(D);var J=class extends r{async onExecute(t,e){const i=e?.params;if(!i)throw new Error("Schema is required for extract action");return this.delegateToEngine(t,"extract",i)}};J.id="extract",J.returnType="any",J.capabilities={http:"native",browser:"native"},r.register(J);var G=class extends r{async onExecute(t,e){const{selector:i,message:s,attribute:r}=e?.params||{},n=t.internal.engine;if("browser"===n?.mode){if(i){if(!await(n?.extract({selector:i,attribute:r})))return}n&&"pause"in n?await n.pause(s):console.warn("[PauseAction] was called, but the current engine does not support `pause`. Skipped.")}else console.warn("[PauseAction] can only run in browser engine. Skipped.")}};async function I(t,e){return(new k).fetch(t,e)}G.id="pause",G.capabilities={http:"native",browser:"native"},G.returnType="none",r.register(G);export{A as CheerioFetchEngine,H as ClickAction,t as DefaultFetcherProperties,J as ExtractAction,r as FetchAction,i as FetchActionResultStatus,b as FetchEngine,S as FetchSession,e as FetcherOptionKeys,L as FillAction,M as GetContentAction,B as GotoAction,G as PauseAction,F as PlaywrightFetchEngine,z as SubmitAction,D as WaitForAction,k as WebFetcher,I as fetchWeb};
package/docs/README.md CHANGED
@@ -137,7 +137,10 @@ This is the main entry point for the library.
137
137
  * `engine` ('http' | 'browser' | 'auto'): The engine to use. Defaults to `auto`.
138
138
  * `actions` (FetchActionOptions[]): An array of action objects to execute. (Supports `action`/`name` as alias for `id`, and `args` as alias for `params`)
139
139
  * `headers` (Record<string, string>): Headers to use for all requests.
140
- * ...and many other options for proxy, cookies, retries, etc.
140
+ * `cookies` (Cookie[]): Array of cookies to use.
141
+ * `sessionState` (any): Crawlee session state to restore.
142
+ * `sessionPoolOptions` (SessionPoolOptions): Advanced configuration for the underlying Crawlee SessionPool.
143
+ * ...and many other options for proxy, retries, etc.
141
144
 
142
145
  ### Built-in Actions
143
146
 
@@ -161,6 +164,7 @@ The `fetchWeb` function returns an object containing:
161
164
  * `statusCode`: HTTP status code.
162
165
  * `headers`: HTTP headers.
163
166
  * `cookies`: Array of cookies.
167
+ * `sessionState`: Crawlee session state.
164
168
  * `text`, `html`: Page content.
165
169
  * `outputs` (Record<string, any>): Data extracted and stored via `storeAs`.
166
170
 
@@ -127,15 +127,22 @@ Submits a form.
127
127
 
128
128
  Pauses execution to wait for one or more conditions to be met.
129
129
 
130
- In `browser` mode, if multiple conditions are provided, they are awaited sequentially. For example, it will first wait for the selector to appear, then wait for the network to be idle, and finally wait for the specified duration.
130
+ In `browser` mode, if multiple conditions are provided, they are awaited sequentially. For example, it will first wait for the selector to appear, then wait for the network to be idle, and **finally** wait for the specified duration (`ms`).
131
131
 
132
132
  * **`id`**: `waitFor`
133
133
  * **`params`**: An object specifying the wait condition, which can contain one or more of the following keys:
134
- * **`ms`** (number): Waits for the specified number of milliseconds. Supported by both engines.
135
134
  * **`selector`** (string): Waits for an element matching the selector to appear on the page. Supported only in `browser` mode.
136
135
  * **`networkIdle`** (boolean): Waits until the network is idle (i.e., no new network requests for a period of time). Supported only in `browser` mode.
136
+ * **`ms`** (number): Waits for the specified number of milliseconds **after** other conditions are met. Supported by both engines.
137
137
  * **`returns`**: `none`
138
138
 
139
+ > **⚠️ Critical Distinction: `ms` vs Timeout**
140
+ >
141
+ > Do not confuse the **`ms`** parameter with a timeout setting.
142
+ >
143
+ > * **`ms` (Duration)**: This forces the script to sleep/pause for a fixed amount of time. If used with `selector`, it adds an **extra delay** after the element is found.
144
+ > * **Timeout (Deadline)**: The maximum time the script will wait for a condition (like `selector` or `networkIdle`) to be met before failing is controlled by the session-level **`timeoutMs`** option (default is usually 30s), not this `ms` parameter.
145
+
139
146
  #### `pause`
140
147
 
141
148
  Pauses the execution of the Action Script to allow for manual user intervention (e.g., solving a CAPTCHA).
@@ -133,7 +133,10 @@ searchGoogle('gemini');
133
133
  * `engine` ('http' | 'browser' | 'auto'): 要使用的引擎。默认为 `auto`。
134
134
  * `actions` (FetchActionOptions[]): 要执行的动作对象数组。(支持 `action`/`name` 作为 `id` 的别名,`args` 作为 `params` 的别名)
135
135
  * `headers` (Record<string, string>): 用于所有请求的头信息。
136
- * ...以及许多其他用于代理、Cookie、重试等的选项。
136
+ * `cookies` (Cookie[]): 要使用的 Cookie 数组。
137
+ * `sessionState` (any): 要恢复的 Crawlee 会话状态。
138
+ * `sessionPoolOptions` (SessionPoolOptions): 底层 Crawlee SessionPool 的高级配置。
139
+ * ...以及许多其他用于代理、重试等的选项。
137
140
 
138
141
  ### 内置动作
139
142
 
@@ -157,6 +160,7 @@ searchGoogle('gemini');
157
160
  * `statusCode`: HTTP 状态码。
158
161
  * `headers`: HTTP 头信息。
159
162
  * `cookies`: Cookie 数组。
163
+ * `sessionState`: Crawlee 会话状态。
160
164
  * `text`, `html`: 页面内容。
161
165
  * `outputs` (Record<string, any>): 通过 `storeAs` 提取并存储的数据。
162
166