@isdk/web-fetcher 0.2.6 → 0.2.8
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.cn.md +4 -1
- package/README.engine.cn.md +31 -1
- package/README.engine.md +31 -1
- package/README.md +4 -1
- package/dist/index.d.mts +12 -3
- package/dist/index.d.ts +12 -3
- package/dist/index.js +1 -1
- package/dist/index.mjs +1 -1
- package/docs/README.md +4 -1
- package/docs/_media/README.cn.md +4 -1
- package/docs/_media/README.engine.md +31 -1
- package/docs/classes/CheerioFetchEngine.md +95 -71
- package/docs/classes/ClickAction.md +23 -23
- package/docs/classes/ExtractAction.md +23 -23
- package/docs/classes/FetchAction.md +23 -23
- package/docs/classes/FetchEngine.md +83 -67
- package/docs/classes/FetchSession.md +24 -12
- package/docs/classes/FillAction.md +23 -23
- package/docs/classes/GetContentAction.md +23 -23
- package/docs/classes/GotoAction.md +23 -23
- package/docs/classes/PauseAction.md +23 -23
- package/docs/classes/PlaywrightFetchEngine.md +95 -71
- package/docs/classes/SubmitAction.md +23 -23
- package/docs/classes/WaitForAction.md +23 -23
- package/docs/classes/WebFetcher.md +5 -5
- package/docs/enumerations/FetchActionResultStatus.md +4 -4
- package/docs/functions/fetchWeb.md +2 -2
- package/docs/interfaces/BaseFetchActionProperties.md +11 -11
- package/docs/interfaces/BaseFetchCollectorActionProperties.md +15 -15
- package/docs/interfaces/BaseFetcherProperties.md +46 -22
- package/docs/interfaces/DispatchedEngineAction.md +4 -4
- package/docs/interfaces/ExtractActionProperties.md +11 -11
- package/docs/interfaces/FetchActionInContext.md +15 -15
- package/docs/interfaces/FetchActionProperties.md +12 -12
- package/docs/interfaces/FetchActionResult.md +6 -6
- package/docs/interfaces/FetchContext.md +68 -32
- package/docs/interfaces/FetchEngineContext.md +63 -27
- package/docs/interfaces/FetchMetadata.md +5 -5
- package/docs/interfaces/FetchResponse.md +13 -13
- package/docs/interfaces/FetchReturnTypeRegistry.md +7 -7
- package/docs/interfaces/FetchSite.md +61 -25
- package/docs/interfaces/FetcherOptions.md +60 -24
- package/docs/interfaces/GotoActionOptions.md +6 -6
- package/docs/interfaces/PendingEngineRequest.md +3 -3
- package/docs/interfaces/SubmitActionOptions.md +2 -2
- package/docs/interfaces/WaitForActionOptions.md +5 -5
- package/docs/type-aliases/BaseFetchActionOptions.md +1 -1
- package/docs/type-aliases/BaseFetchCollectorOptions.md +1 -1
- package/docs/type-aliases/BrowserEngine.md +1 -1
- package/docs/type-aliases/FetchActionCapabilities.md +1 -1
- package/docs/type-aliases/FetchActionCapabilityMode.md +1 -1
- package/docs/type-aliases/FetchActionOptions.md +1 -1
- package/docs/type-aliases/FetchEngineAction.md +1 -1
- package/docs/type-aliases/FetchEngineType.md +1 -1
- package/docs/type-aliases/FetchReturnType.md +1 -1
- package/docs/type-aliases/FetchReturnTypeFor.md +1 -1
- package/docs/type-aliases/OnFetchPauseCallback.md +1 -1
- package/docs/type-aliases/ResourceType.md +1 -1
- package/docs/variables/DefaultFetcherProperties.md +1 -1
- package/docs/variables/FetcherOptionKeys.md +1 -1
- package/package.json +3 -2
package/README.cn.md
CHANGED
|
@@ -133,7 +133,10 @@ searchGoogle('gemini');
|
|
|
133
133
|
* `engine` ('http' | 'browser' | 'auto'): 要使用的引擎。默认为 `auto`。
|
|
134
134
|
* `actions` (FetchActionOptions[]): 要执行的动作对象数组。(支持 `action`/`name` 作为 `id` 的别名,`args` 作为 `params` 的别名)
|
|
135
135
|
* `headers` (Record<string, string>): 用于所有请求的头信息。
|
|
136
|
-
*
|
|
136
|
+
* `cookies` (Cookie[]): 要使用的 Cookie 数组。
|
|
137
|
+
* `sessionState` (any): 要恢复的 Crawlee 会话状态。
|
|
138
|
+
* `sessionPoolOptions` (SessionPoolOptions): 底层 Crawlee SessionPool 的高级配置。
|
|
139
|
+
* ...以及许多其他用于代理、重试等的选项。
|
|
137
140
|
|
|
138
141
|
### 内置动作
|
|
139
142
|
|
package/README.engine.cn.md
CHANGED
|
@@ -24,13 +24,43 @@
|
|
|
24
24
|
* **关键抽象**:
|
|
25
25
|
* **生命周期**:`initialize()` 和 `cleanup()` 方法。
|
|
26
26
|
* **核心操作**:`goto()`、`getContent()`、`click()`、`fill()`、`submit()`、`waitFor()`、`extract()`。
|
|
27
|
-
* **配置与状态**:`headers()`、`cookies()`、`blockResources()`、`getState()`。
|
|
27
|
+
* **配置与状态**:`headers()`、`cookies()`、`blockResources()`、`getState()`、`sessionPoolOptions`。
|
|
28
28
|
* **静态注册表**:它维护所有可用引擎实现的静态注册表(`FetchEngine.register`),允许通过 `id` 或 `mode` 动态选择引擎。
|
|
29
29
|
|
|
30
30
|
### `FetchEngine.create(context, options)`
|
|
31
31
|
|
|
32
32
|
此静态工厂方法是创建引擎实例的指定入口点。它会自动选择并初始化合适的引擎。
|
|
33
33
|
|
|
34
|
+
### `FetchSession(options, engine)`
|
|
35
|
+
|
|
36
|
+
`FetchSession` 类管理抓取操作的生命周期。它现在在构造函数中支持一个可选的 `engine` 参数,用于为该会话强制指定引擎实现,从而绕过任何自动检测或基于注册表的选择。
|
|
37
|
+
|
|
38
|
+
### 引擎选择优先级
|
|
39
|
+
|
|
40
|
+
当库决定使用哪个引擎时(通过内部的 `maybeCreateEngine`),它遵循以下优先级:
|
|
41
|
+
|
|
42
|
+
1. **显式强制引擎**:如果在会话或引擎创建期间显式传递了引擎 ID(例如 `FetchSession` 中的新 `engine` 参数)。
|
|
43
|
+
2. **配置引擎**:在 `FetcherOptions` 中定义的 `engine` 属性。
|
|
44
|
+
3. **站点注册表**:如果目标 URL 匹配已配置 `sites` 注册表中的域名,则使用该站点首选的引擎。
|
|
45
|
+
4. **默认值**:默认为 `auto`。如果启用了 `enableSmart`,它会智能地在 `http` 和 `browser`之间切换,否则默认为 `http`。
|
|
46
|
+
|
|
47
|
+
### 会话管理与状态持久化
|
|
48
|
+
|
|
49
|
+
引擎支持在多次执行之间持久化和恢复会话状态(主要是 Cookie)。
|
|
50
|
+
|
|
51
|
+
* **`sessionState`**: 一个完整的状态对象(源自 Crawlee 的 SessionPool),可用于完全恢复之前的会话。这在引擎初始化期间设置。
|
|
52
|
+
* **`sessionPoolOptions`**: 允许对底层的 Crawlee `SessionPool` 进行高级配置(例如 `maxUsageCount`, `maxPoolSize`)。
|
|
53
|
+
> **注意**: 为了确保会话状态管理的正常运行,`persistenceOptions.enable` 会被强制设置为 `true`。
|
|
54
|
+
* **`overrideSessionState`**: 如果设置为 `true`,则强制引擎使用提供的 `sessionState` 覆盖存储中的任何现有持久化状态。当你希望确保会话以确切提供的状态启动,忽略持久化层中的任何陈旧数据时,这非常有用。
|
|
55
|
+
* **`cookies`**: 用于会话的显式 Cookie 数组。
|
|
56
|
+
|
|
57
|
+
**优先级规则:**
|
|
58
|
+
如果同时提供了 `sessionState` 和 `cookies`,引擎将采用**“合并并覆盖”**策略:
|
|
59
|
+
1. 首先从 `sessionState` 恢复会话。
|
|
60
|
+
2. 然后在之上应用显式的 `cookies`。
|
|
61
|
+
* **结果**:`sessionState` 中任何冲突的 Cookie 都会被显式的 `cookies` **覆盖**。
|
|
62
|
+
* 如果两者同时存在,系统将记录一条警告日志,以提醒用户这种覆盖行为。
|
|
63
|
+
|
|
34
64
|
---
|
|
35
65
|
|
|
36
66
|
## 🏗️ 3. 架构和工作流程
|
package/README.engine.md
CHANGED
|
@@ -24,13 +24,43 @@ This is the abstract base class that defines the contract for all fetch engines.
|
|
|
24
24
|
* **Key Abstractions**:
|
|
25
25
|
* **Lifecycle**: `initialize()` and `cleanup()` methods.
|
|
26
26
|
* **Core Actions**: `goto()`, `getContent()`, `click()`, `fill()`, `submit()`, `waitFor()`, `extract()`.
|
|
27
|
-
* **Configuration & State**: `headers()`, `cookies()`, `blockResources()`, `getState()`.
|
|
27
|
+
* **Configuration & State**: `headers()`, `cookies()`, `blockResources()`, `getState()`, `sessionPoolOptions`.
|
|
28
28
|
* **Static Registry**: It maintains a static registry of all available engine implementations (`FetchEngine.register`), allowing for dynamic selection by `id` or `mode`.
|
|
29
29
|
|
|
30
30
|
### `FetchEngine.create(context, options)`
|
|
31
31
|
|
|
32
32
|
This static factory method is the designated entry point for creating an engine instance. It automatically selects and initializes the appropriate engine.
|
|
33
33
|
|
|
34
|
+
### `FetchSession(options, engine)`
|
|
35
|
+
|
|
36
|
+
The `FetchSession` class manages the lifecycle of a fetch operation. It now supports an optional `engine` parameter in its constructor to force a specific engine implementation for that session, bypassing any auto-detection or registry-based selection.
|
|
37
|
+
|
|
38
|
+
### Engine Selection Priority
|
|
39
|
+
|
|
40
|
+
When the library determines which engine to use (via internal `maybeCreateEngine`), it follows this priority:
|
|
41
|
+
|
|
42
|
+
1. **Explicit Forced Engine**: If an engine ID is explicitly passed during session or engine creation (e.g., the new `engine` parameter in `FetchSession`).
|
|
43
|
+
2. **Configuration Engine**: The `engine` property defined in `FetcherOptions`.
|
|
44
|
+
3. **Site Registry**: If the target URL matches a domain in the configured `sites` registry, it uses the engine preferred for that site.
|
|
45
|
+
4. **Default**: Defaults to `auto`, which intelligently switches between `http` and `browser` if `enableSmart` is enabled, or defaults to `http` otherwise.
|
|
46
|
+
|
|
47
|
+
### Session Management & State Persistence
|
|
48
|
+
|
|
49
|
+
The engine supports persisting and restoring session state (primarily cookies) between executions.
|
|
50
|
+
|
|
51
|
+
* **`sessionState`**: A comprehensive state object (derived from Crawlee's SessionPool) that can be used to fully restore a previous session. This is set during engine initialization.
|
|
52
|
+
* **`sessionPoolOptions`**: Allows advanced configuration of the underlying Crawlee `SessionPool` (e.g., `maxUsageCount`, `maxPoolSize`).
|
|
53
|
+
> **Note**: `persistenceOptions.enable` is forced to `true` to ensure proper session state management.
|
|
54
|
+
* **`overrideSessionState`**: If set to `true`, it forces the engine to overwrite any existing persistent state in the storage with the provided `sessionState`. This is useful when you want to ensure the session starts with the exact state provided, ignoring any stale data in the persistence layer.
|
|
55
|
+
* **`cookies`**: An array of explicit cookies to use for the session.
|
|
56
|
+
|
|
57
|
+
**Precedence Rule:**
|
|
58
|
+
If both `sessionState` and `cookies` are provided, the engine adopts a **"Merge and Override"** strategy:
|
|
59
|
+
1. The session is first restored from the `sessionState`.
|
|
60
|
+
2. The explicit `cookies` are then applied on top.
|
|
61
|
+
* **Result:** Any conflicting cookies in `sessionState` will be **overwritten** by the explicit `cookies`.
|
|
62
|
+
* A warning will be logged if both are present to alert the user of this override behavior.
|
|
63
|
+
|
|
34
64
|
---
|
|
35
65
|
|
|
36
66
|
## 🏗️ 3. Architecture and Workflow
|
package/README.md
CHANGED
|
@@ -133,7 +133,10 @@ This is the main entry point for the library.
|
|
|
133
133
|
* `engine` ('http' | 'browser' | 'auto'): The engine to use. Defaults to `auto`.
|
|
134
134
|
* `actions` (FetchActionOptions[]): An array of action objects to execute. (Supports `action`/`name` as alias for `id`, and `args` as alias for `params`)
|
|
135
135
|
* `headers` (Record<string, string>): Headers to use for all requests.
|
|
136
|
-
*
|
|
136
|
+
* `cookies` (Cookie[]): Array of cookies to use.
|
|
137
|
+
* `sessionState` (any): Crawlee session state to restore.
|
|
138
|
+
* `sessionPoolOptions` (SessionPoolOptions): Advanced configuration for the underlying Crawlee SessionPool.
|
|
139
|
+
* ...and many other options for proxy, retries, etc.
|
|
137
140
|
|
|
138
141
|
### Built-in Actions
|
|
139
142
|
|
package/dist/index.d.mts
CHANGED
|
@@ -1,4 +1,4 @@
|
|
|
1
|
-
import { CrawlingContext, BasicCrawler, BasicCrawlerOptions, RequestQueue, Cookie, CheerioCrawlingContext, CheerioCrawler, CheerioCrawlerOptions, PlaywrightCrawlingContext, PlaywrightCrawler, PlaywrightCrawlerOptions } from 'crawlee';
|
|
1
|
+
import { CrawlingContext, BasicCrawler, BasicCrawlerOptions, RequestQueue, Cookie, Session, CheerioCrawlingContext, CheerioCrawler, CheerioCrawlerOptions, PlaywrightCrawlingContext, PlaywrightCrawler, PlaywrightCrawlerOptions, SessionPoolOptions } from 'crawlee';
|
|
2
2
|
export { Cookie } from 'crawlee';
|
|
3
3
|
import { EventEmitter } from 'events-ex';
|
|
4
4
|
|
|
@@ -946,7 +946,9 @@ declare abstract class FetchEngine<TContext extends CrawlingContext = any, TCraw
|
|
|
946
946
|
protected isCrawlerReady?: boolean;
|
|
947
947
|
protected requestQueue?: RequestQueue;
|
|
948
948
|
protected hdrs: Record<string, string>;
|
|
949
|
-
protected
|
|
949
|
+
protected _initialCookies?: Cookie[];
|
|
950
|
+
protected _initializedSessions: Set<string>;
|
|
951
|
+
protected currentSession?: Session;
|
|
950
952
|
protected pendingRequests: Map<string, PendingEngineRequest>;
|
|
951
953
|
protected requestCounter: number;
|
|
952
954
|
protected actionEmitter: EventEmitter;
|
|
@@ -1072,6 +1074,7 @@ declare abstract class FetchEngine<TContext extends CrawlingContext = any, TCraw
|
|
|
1072
1074
|
*/
|
|
1073
1075
|
getState(): Promise<{
|
|
1074
1076
|
cookies: Cookie[];
|
|
1077
|
+
sessionState?: any;
|
|
1075
1078
|
}>;
|
|
1076
1079
|
/**
|
|
1077
1080
|
* Gets the execution mode of this engine (`'http'` or `'browser'`).
|
|
@@ -1256,6 +1259,7 @@ type CheerioNode = ReturnType<CheerioSelection['first']>;
|
|
|
1256
1259
|
declare class CheerioFetchEngine extends FetchEngine<CheerioCrawlingContext, CheerioCrawler, CheerioCrawlerOptions> {
|
|
1257
1260
|
static readonly id = "cheerio";
|
|
1258
1261
|
static readonly mode = "http";
|
|
1262
|
+
private _ensureCheerioContext;
|
|
1259
1263
|
protected _buildResponse(context: CheerioCrawlingContext): Promise<FetchResponse>;
|
|
1260
1264
|
protected _querySelectorAll(context: {
|
|
1261
1265
|
$: CheerioAPI;
|
|
@@ -1410,6 +1414,9 @@ interface BaseFetcherProperties {
|
|
|
1410
1414
|
antibot?: boolean;
|
|
1411
1415
|
headers?: Record<string, string>;
|
|
1412
1416
|
cookies?: Cookie[];
|
|
1417
|
+
sessionState?: any;
|
|
1418
|
+
sessionPoolOptions?: SessionPoolOptions;
|
|
1419
|
+
overrideSessionState?: boolean;
|
|
1413
1420
|
reuseCookies?: boolean;
|
|
1414
1421
|
throwHttpErrors?: boolean;
|
|
1415
1422
|
proxy?: string | string[];
|
|
@@ -1489,10 +1496,11 @@ declare const FetcherOptionKeys: string[];
|
|
|
1489
1496
|
|
|
1490
1497
|
declare class FetchSession {
|
|
1491
1498
|
private options;
|
|
1499
|
+
readonly engine?: string | undefined;
|
|
1492
1500
|
readonly id: string;
|
|
1493
1501
|
readonly context: FetchContext;
|
|
1494
1502
|
private closed;
|
|
1495
|
-
constructor(options?: FetcherOptions);
|
|
1503
|
+
constructor(options?: FetcherOptions, engine?: string | undefined);
|
|
1496
1504
|
/**
|
|
1497
1505
|
* 执行单个动作
|
|
1498
1506
|
*/
|
|
@@ -1504,6 +1512,7 @@ declare class FetchSession {
|
|
|
1504
1512
|
getOutputs(): Record<string, any>;
|
|
1505
1513
|
getState(): Promise<{
|
|
1506
1514
|
cookies: Cookie[];
|
|
1515
|
+
sessionState?: any;
|
|
1507
1516
|
} | undefined>;
|
|
1508
1517
|
dispose(): Promise<void>;
|
|
1509
1518
|
private ensureEngine;
|
package/dist/index.d.ts
CHANGED
|
@@ -1,4 +1,4 @@
|
|
|
1
|
-
import { CrawlingContext, BasicCrawler, BasicCrawlerOptions, RequestQueue, Cookie, CheerioCrawlingContext, CheerioCrawler, CheerioCrawlerOptions, PlaywrightCrawlingContext, PlaywrightCrawler, PlaywrightCrawlerOptions } from 'crawlee';
|
|
1
|
+
import { CrawlingContext, BasicCrawler, BasicCrawlerOptions, RequestQueue, Cookie, Session, CheerioCrawlingContext, CheerioCrawler, CheerioCrawlerOptions, PlaywrightCrawlingContext, PlaywrightCrawler, PlaywrightCrawlerOptions, SessionPoolOptions } from 'crawlee';
|
|
2
2
|
export { Cookie } from 'crawlee';
|
|
3
3
|
import { EventEmitter } from 'events-ex';
|
|
4
4
|
|
|
@@ -946,7 +946,9 @@ declare abstract class FetchEngine<TContext extends CrawlingContext = any, TCraw
|
|
|
946
946
|
protected isCrawlerReady?: boolean;
|
|
947
947
|
protected requestQueue?: RequestQueue;
|
|
948
948
|
protected hdrs: Record<string, string>;
|
|
949
|
-
protected
|
|
949
|
+
protected _initialCookies?: Cookie[];
|
|
950
|
+
protected _initializedSessions: Set<string>;
|
|
951
|
+
protected currentSession?: Session;
|
|
950
952
|
protected pendingRequests: Map<string, PendingEngineRequest>;
|
|
951
953
|
protected requestCounter: number;
|
|
952
954
|
protected actionEmitter: EventEmitter;
|
|
@@ -1072,6 +1074,7 @@ declare abstract class FetchEngine<TContext extends CrawlingContext = any, TCraw
|
|
|
1072
1074
|
*/
|
|
1073
1075
|
getState(): Promise<{
|
|
1074
1076
|
cookies: Cookie[];
|
|
1077
|
+
sessionState?: any;
|
|
1075
1078
|
}>;
|
|
1076
1079
|
/**
|
|
1077
1080
|
* Gets the execution mode of this engine (`'http'` or `'browser'`).
|
|
@@ -1256,6 +1259,7 @@ type CheerioNode = ReturnType<CheerioSelection['first']>;
|
|
|
1256
1259
|
declare class CheerioFetchEngine extends FetchEngine<CheerioCrawlingContext, CheerioCrawler, CheerioCrawlerOptions> {
|
|
1257
1260
|
static readonly id = "cheerio";
|
|
1258
1261
|
static readonly mode = "http";
|
|
1262
|
+
private _ensureCheerioContext;
|
|
1259
1263
|
protected _buildResponse(context: CheerioCrawlingContext): Promise<FetchResponse>;
|
|
1260
1264
|
protected _querySelectorAll(context: {
|
|
1261
1265
|
$: CheerioAPI;
|
|
@@ -1410,6 +1414,9 @@ interface BaseFetcherProperties {
|
|
|
1410
1414
|
antibot?: boolean;
|
|
1411
1415
|
headers?: Record<string, string>;
|
|
1412
1416
|
cookies?: Cookie[];
|
|
1417
|
+
sessionState?: any;
|
|
1418
|
+
sessionPoolOptions?: SessionPoolOptions;
|
|
1419
|
+
overrideSessionState?: boolean;
|
|
1413
1420
|
reuseCookies?: boolean;
|
|
1414
1421
|
throwHttpErrors?: boolean;
|
|
1415
1422
|
proxy?: string | string[];
|
|
@@ -1489,10 +1496,11 @@ declare const FetcherOptionKeys: string[];
|
|
|
1489
1496
|
|
|
1490
1497
|
declare class FetchSession {
|
|
1491
1498
|
private options;
|
|
1499
|
+
readonly engine?: string | undefined;
|
|
1492
1500
|
readonly id: string;
|
|
1493
1501
|
readonly context: FetchContext;
|
|
1494
1502
|
private closed;
|
|
1495
|
-
constructor(options?: FetcherOptions);
|
|
1503
|
+
constructor(options?: FetcherOptions, engine?: string | undefined);
|
|
1496
1504
|
/**
|
|
1497
1505
|
* 执行单个动作
|
|
1498
1506
|
*/
|
|
@@ -1504,6 +1512,7 @@ declare class FetchSession {
|
|
|
1504
1512
|
getOutputs(): Record<string, any>;
|
|
1505
1513
|
getState(): Promise<{
|
|
1506
1514
|
cookies: Cookie[];
|
|
1515
|
+
sessionState?: any;
|
|
1507
1516
|
} | undefined>;
|
|
1508
1517
|
dispose(): Promise<void>;
|
|
1509
1518
|
private ensureEngine;
|
package/dist/index.js
CHANGED
|
@@ -1 +1 @@
|
|
|
1
|
-
"use strict";var t,e=Object.create,i=Object.defineProperty,s=Object.getOwnPropertyDescriptor,n=Object.getOwnPropertyNames,r=Object.getPrototypeOf,o=Object.prototype.hasOwnProperty,a=(t,e,r,a)=>{if(e&&"object"==typeof e||"function"==typeof e)for(let c of n(e))o.call(t,c)||c===r||i(t,c,{get:()=>e[c],enumerable:!(a=s(e,c))||a.enumerable});return t},c={};((t,e)=>{for(var s in e)i(t,s,{get:e[s],enumerable:!0})})(c,{CheerioFetchEngine:()=>j,ClickAction:()=>H,DefaultFetcherProperties:()=>l,ExtractAction:()=>W,FetchAction:()=>f,FetchActionResultStatus:()=>h,FetchEngine:()=>S,FetchSession:()=>$,FetcherOptionKeys:()=>u,FillAction:()=>M,GetContentAction:()=>L,GotoAction:()=>G,PauseAction:()=>B,PlaywrightFetchEngine:()=>N,SubmitAction:()=>z,WaitForAction:()=>D,WebFetcher:()=>R,fetchWeb:()=>I}),module.exports=(t=c,a(i({},"__esModule",{value:!0}),t));var l={engine:"auto",enableSmart:!0,useSiteRegistry:!0,antibot:!1,headers:{},cookies:[],reuseCookies:!0,throwHttpErrors:void 0,proxy:[],blockResources:[],ignoreSslErrors:!0,browser:{engine:"playwright",headless:!0,waitUntil:"domcontentloaded"},http:{method:"GET"},timeoutMs:6e4,requestHandlerTimeoutSecs:void 0,maxConcurrency:1,maxRequestsPerMinute:1e3,delayBetweenRequestsMs:0,retries:0,sites:[]},u=Object.keys(l).concat(["actions","onPause"]),h=(t=>(t[t.Failed=0]="Failed",t[t.Success=1]="Success",t[t.Skipped=2]="Skipped",t))(h||{}),w=class t{static register(t){const e=t.id;if(!e)throw new Error("FetchAction.register: actionClass.id is required");this.registry.set(e,t)}static get(t){return this.registry.get(t)}static create(e){const i="string"==typeof e?e:e.id||e.name||e.action;if(!i)throw new Error("Action must have id, name or action");const s=i instanceof t?i.constructor:this.registry.get(i);return s?new s:void 0}static has(t){return this.registry.has(t)}static list(){return Array.from(this.registry.keys())}static getCapability(t){return this.capabilities[t]??"noop"}getCapability(t){return this.constructor.getCapability(t)}get id(){return this.constructor.id}get returnType(){return this.constructor.returnType}get capabilities(){return this.constructor.capabilities}async delegateToEngine(t,e,...i){const s=t.internal.engine;if(!s)throw new Error("No engine available");if("function"!=typeof s[e])throw new Error(`Engine does not have a method named '${String(e)}'`);return await s[e](...i)}installCollectors(e,i){const s=i?.collectors;if(!s?.length)return;const n=[],r=new Set;for(const i of s){const s=d(i.activateOn),o=d(i.collectOn),a=d(i.deactivateOn),c=!(i.background??!0),l=t.create(i);if(!l)continue;let u=!1,h=!1,w=0;const f=async t=>{if(!u&&!h){u=!0;try{await(l.onBeforeExec?.(e,i))}catch(t){e.eventBus.emit("collector:error",{action:this.id,collector:l.id,phase:"before",error:t})}}},m=async(t,s)=>{if(!h){u||await f(s);try{const n=Promise.resolve(l.onExecute?.(e,i,s)).then(s=>{var n,r;if(i.storeAs){((n=e.outputs)[r=i.storeAs]||(n[r]=[])).push(s)}return e.eventBus.emit("collector:result",{action:this.id,collector:i.id||i.name,event:t,result:s}),s}).catch(s=>{e.eventBus.emit("collector:error",{action:this.id,collector:i.id||i.name,event:t,phase:"exec",error:s})}).finally(()=>{w++});c&&(r.add(n),n.finally(()=>r.delete(n)))}catch(i){e.eventBus.emit("collector:error",{action:this.id,collector:l.id,event:t,phase:"exec",error:i})}}},g=async()=>{if(!h){0===w&&m("collector:after"),h=!0;try{await(l.onAfterExec?.(e,i))}catch(t){e.eventBus.emit("collector:error",{action:this.id,collector:i.id||i.name,phase:"after",error:t})}finally{e.eventBus.emit("collector:end",{action:this.id,collector:i.id||i.name}),v.forEach(t=>t())}}},b=p(e,s,f),v=y(e,o,m),x=p(e,a,g);if(n.push(...b,...v,...x),!s.length&&!o.length&&!a.length){const t=()=>{g()};e.eventBus.once(`action:${this.id}.end`,t),n.push(()=>e.eventBus.off("fetcher:action:end",t))}}return n.length||r.size>0?{cleanup:()=>n.forEach(t=>t()),awaitExecPendings:async()=>{r.size>0&&await Promise.allSettled(Array.from(r))}}:void 0}async beforeExec(t,e){t.internal.actionStack||(t.internal.actionStack=[]);const i=t.internal.actionStack,s=i.length,n=i.length>0?i[i.length-1].id:void 0,r={...e,id:this.id,depth:s,parent:n};i.push(r),t.currentAction=r;const o={action:this,context:t,options:e,depth:s,stack:[...i]};t.eventBus.emit(`action:${this.id}.start`,o),t.eventBus.emit("action:start",o),await(this.onBeforeExec?.(t,e));return{entry:o,collectors:this.installCollectors(t,e)}}async afterExec(t,e,i,s){const n=t.internal.actionStack,r=n.length-1,o=s?.collectors;try{await(o?.awaitExecPendings()),t.lastResult=i,"response"!==i?.returnType||i.error||(t.lastResponse=i.result),e?.storeAs&&(t.outputs[e.storeAs]=i?.result),i?.error&&(t.currentAction.error=i.error),await(this.onAfterExec?.(t,e));const s={action:this,context:t,options:e,result:i,depth:r,stack:[...n]};i?.error&&(s.error=i.error);try{t.eventBus.emit(`action:${this.id}.end`,s)}catch(t){}try{t.eventBus.emit("action:end",s)}catch(t){}}finally{try{o?.cleanup()}finally{n.pop();const e=n.length;t.currentAction=e>0?n[e-1]:void 0}}}async execute(t,e){e?.args&&!e.params&&(e.params=e.args);const i=await this.beforeExec(t,e);let s;try{const i=e?.failOnError??!0;return t.throwHttpErrors=i,s=await this.onExecute(t,e),s&&s.returnType||(s={status:1,returnType:this.returnType??"any",result:s}),s}catch(i){if(s={status:0,error:i,meta:{id:this.id,engineType:t.engine,capability:this.getCapability(t.engine)}},e?.failOnError)throw i;return s}finally{await this.afterExec(t,e,s,i)}}};w.registry=new Map,w.returnType="any",w.capabilities={http:"noop",browser:"noop"};var f=w;function d(t){return t?Array.isArray(t)?t:[t]:[]}function p(t,e,i){const s=[];for(const n of e)if("string"==typeof n||n instanceof RegExp){const e=(...t)=>{i(t[0])};t.eventBus.once(n,e),s.push(()=>t.eventBus.off(n,e))}return s}function y(t,e,i){const s=[];for(const n of e)if("string"==typeof n||n instanceof RegExp){const e=t=>i(n,t);t.eventBus.on(n,e),s.push(()=>t.eventBus.off(n,e))}return s}var m=require("events-ex");var g=require("lodash-es"),b=(0,require("nanoid").customAlphabet)("0123456789abcdefghijklmnopqrstuvwxyz",12);var v=require("lodash-es"),x=require("events-ex"),q=require("@isdk/common-error"),E=require("crawlee");function k(){let t=()=>{};const e=new Promise(e=>{t=e});return e.release=t,e}E.Configuration.getGlobalConfig().set("persistStorage",!1);var S=class{constructor(){this.hdrs={},this.jar=[],this.pendingRequests=new Map,this.requestCounter=0,this.actionEmitter=new x.EventEmitter,this.isPageActive=!1,this.navigationLock=function(){const t=k();return t.release(),t}(),this.blockedTypes=new Set}static register(t){const e=t.id;if(!e)throw new Error("Engine must define static id");if(this.registry.has(e))throw new Error(`Engine id duplicated: ${e}`);this.registry.set(e,t)}static get(t){return this.registry.get(t)}static getByMode(t){for(const[e,i]of this.registry.entries())if(i.mode===t)return i}static async create(t,e){const i=(0,v.defaultsDeep)(e,t,l),s=i.engine??t.engine,n=s?this.get(s)??this.getByMode(s):null;if(n){const e=new n;return await e.initialize(t,i),e}}async _extract(t,e){const i=t.type;if(!e)return"array"===i?[]:null;if("object"===i){const{selector:i,properties:s}=t;let n=e;if(i){const t=await this._querySelectorAll(e,i);n=t.length>0?t[0]:null}if(!n)return null;const r={};for(const t in s)r[t]=await this._extract(s[t],n);return r}if("array"===i){const{selector:i,items:s}=t,n=i?await this._querySelectorAll(e,i):[e],r=[];for(const t of n)r.push(await this._extract(s,t));return r}const{selector:s}=t;let n=e;if(s){const t=await this._querySelectorAll(e,s);n=t.length>0?t[0]:null}return n?this._extractValue(t,n):null}async buildResponse(t){const e=await this._buildResponse(t),i=e.headers["content-type"]||"";return e.contentType=i.split(";")[0].trim(),e}waitFor(t){return this.dispatchAction({type:"waitFor",options:t})}click(t){return this.dispatchAction({type:"click",selector:t})}fill(t,e){return this.dispatchAction({type:"fill",selector:t,value:e})}submit(t,e){return this.dispatchAction({type:"submit",selector:t,options:e})}pause(t){return this.dispatchAction({type:"pause",message:t})}extract(t){const e=this._normalizeSchema(t);return this.dispatchAction({type:"extract",schema:e})}_normalizeSchema(t){const e=JSON.parse(JSON.stringify(t));if(e.properties)for(const t in e.properties)e.properties[t]=this._normalizeSchema(e.properties[t]);if(e.items&&(e.items=this._normalizeSchema(e.items)),"array"===e.type&&(e.attribute&&!e.items&&(e.items={attribute:e.attribute},delete e.attribute),e.items||(e.items={type:"string"})),e.selector&&(e.has||e.exclude)){const{selector:t,has:i,exclude:s}=e,n=t.split(",").map(t=>{let e=t.trim();return i&&(e=`${e}:has(${i})`),s&&(e=`${e}:not(${s})`),e}).join(", ");e.selector=n,delete e.has,delete e.exclude}return e}get id(){return this.constructor.id}async getState(){return{cookies:await this.cookies()}}get mode(){return this.constructor.mode}get context(){return this.ctx}async initialize(t,e){if(this.ctx)return;(0,v.merge)(t,e),this.ctx=t,this.opts=t,this.hdrs=function(t){const e={};if(t&&"object"==typeof t)for(const[i,s]of Object.entries(t))e[i.toLowerCase()]=s;return e}(t.headers),this.jar=[...t.cookies??[]],t.internal||(t.internal={}),t.internal.engine=this,t.engine=this.mode,this.actionEmitter.setMaxListeners(100),this.requestQueue=await E.RequestQueue.open();const i=await this._getSpecificCrawlerOptions(t),s={...(0,v.defaultsDeep)(i,{requestQueue:this.requestQueue,maxConcurrency:1,minConcurrency:1,useSessionPool:!0,persistCookiesPerSession:!0,sessionPoolOptions:{maxPoolSize:1,persistenceOptions:{enable:!1},sessionOptions:{maxUsageCount:1e3,maxErrorScore:3}}}),requestHandler:this._requestHandler.bind(this),errorHandler:this._failedRequestHandler.bind(this),failedRequestHandler:this._failedRequestHandler.bind(this)};this.crawler=this._createCrawler(s),this.crawler.run().then(()=>{this.isCrawlerReady=!0}).catch(t=>{this.isCrawlerReady=!1,console.error("Crawler background error:",t)})}async cleanup(){await(this._cleanup?.()),await this._commonCleanup();const t=this.ctx;t&&t.internal?.engine===this&&(t.internal.engine=void 0),this.ctx=void 0,this.opts=void 0}async _executePendingActions(t){await new Promise(e=>{const i=async({action:e,resolve:i,reject:s})=>{try{if("dispose"===e.type)return this.actionEmitter.emit("dispose"),void i();i(await this.executeAction(t,e))}catch(t){s(t)}};this.actionEmitter.on("dispatch",i),this.actionEmitter.once("dispose",()=>{this.actionEmitter.removeListener("dispatch",i),e()})})}async _sharedRequestHandler(t){try{const{request:e}=t;this.isPageActive=!0;const i=this.pendingRequests.get(e.userData.requestId);if(i){const s=await this._buildResponse(t),n=!s.statusCode||s.statusCode>=400;if(this.ctx?.throwHttpErrors&&n){const t=new q.CommonError(`Request for ${s.finalUrl} failed with status ${s.statusCode||"N/A"}`,"request",s.statusCode);i.reject(t)}else this.lastResponse=s,i.resolve(s);this.pendingRequests.delete(e.userData.requestId)}await this._executePendingActions(t)}finally{this.isPageActive=!1,this.navigationLock.release()}}async _sharedFailedRequestHandler(t,e){const{request:i}=t,s=this.pendingRequests.get(i.userData.requestId);if(s&&e&&this.ctx?.throwHttpErrors){this.pendingRequests.delete(i.userData.requestId);const t=e.response,n=t?.statusCode||500,r=t?.url?t.url:i.url,o=new q.CommonError(`Request${r?" for "+r:""} failed: ${e.message}`,"request",n);s.reject(o)}return this._sharedRequestHandler(t)}async dispatchAction(t){if(!this.isPageActive)throw new Error("No active page. Call goto() before performing actions.");return new Promise((e,i)=>{this.actionEmitter.emit("dispatch",{action:t,resolve:e,reject:i})})}async _requestHandler(t){await this._sharedRequestHandler(t)}async _failedRequestHandler(t,e){await this._sharedFailedRequestHandler(t,e)}async _commonCleanup(){if(this.isPageActive&&await this.dispatchAction({type:"dispose"}).catch(()=>{}),this.pendingRequests.size>0)for(const[,t]of this.pendingRequests)t.reject(new Error("Cleanup:Request cancelled"));if(this.actionEmitter.removeAllListeners(),this.crawler){try{await(this.crawler.teardown?.())}catch(t){console.error("ccrawler teardown error:",t)}this.crawler=void 0}this.isCrawlerReady=void 0,this.requestQueue&&(await this.requestQueue.drop(),this.requestQueue=void 0),this.pendingRequests.clear()}async blockResources(t,e){return e&&this.blockedTypes.clear(),t.forEach(t=>this.blockedTypes.add(t)),t.length}getContent(){return this.lastResponse?Promise.resolve(this.lastResponse):Promise.reject(new Error("No content fetched yet. Call goto() first."))}async headers(t,e){if(void 0===t)return{...this.hdrs};if("string"==typeof t&&void 0===e)return this.hdrs[t.toLowerCase()]||"";if(null!==t&&"object"==typeof t){const i={};for(const[e,s]of Object.entries(t))i[e.toLowerCase()]=String(s);return this.hdrs=!0===e?i:{...this.hdrs,...i},!0}return"string"==typeof t&&("string"==typeof e?this.hdrs[t.toLowerCase()]=e:null===e&&delete this.hdrs[t.toLowerCase()],!0)}async cookies(t){return Array.isArray(t)?(this.jar=[...t],!0):null===t?(this.jar=[],!0):[...this.jar]}async dispose(){await this.cleanup()}};async function C(t,e){const i=function(t,e){if(!t||!e?.length)return null;const i=new URL(t);let s=e.find(t=>t.domain===i.hostname);s||(s=e.find(t=>i.hostname.endsWith(t.domain)));if(!s)return null;if(s.pathScope?.length){if(!s.pathScope.some(t=>i.pathname.startsWith(t)))return null}return s}(e?.url||t.url,t.sites),s=t.engine||i?.engine||"auto";let n=await S.create(t,{engine:s});return n||(n=await S.create(t,{engine:"http"})),n}S.registry=new Map;var $=class{constructor(t={}){this.options=t,this.closed=!1,this.id=b(),this.context=this.createContext(t)}async execute(t){await this.ensureEngine(t);const e=f.create(t);if(!e)throw new Error(`Unknown action: ${t.id||t.name}`);let i,s;this.context.internal.actionIndex=(this.context.internal.actionIndex||0)+1,this.context.currentAction={...t,index:this.context.internal.actionIndex,startedAt:Date.now()};try{return i=await e.execute(this.context,t),i}catch(t){throw s=t,s}finally{this.context.currentAction=void 0}}async executeAll(t){let e=0;try{for(;e<t.length;){const i=t[e];await this.execute(i),e++}const i=await this.execute({id:"getContent"});return{result:i?.result,outputs:this.getOutputs()}}catch(t){throw t.actionIndex=e,t}}getOutputs(){return this.context.outputs}async getState(){return this.context.internal.engine?.getState()}async dispose(){if(this.closed)return;const t=this.context.eventBus;t.emit("session:closing",{sessionId:this.id});try{await(this.context.internal.engine?.dispose())}finally{this.closed=!0}t.emit("session:closed",{sessionId:this.id})}async ensureEngine(t){if(this.closed)throw new Error("Session is closed");if(!this.context.internal.engine){const e=t?.params?.url??this.context.url;if(!await C(this.context,{url:e}))throw new Error("No engine found")}}createContext(t=this.options){const e=new m.EventEmitter;return(0,g.defaultsDeep)({...t,id:this.id,eventBus:e,outputs:{},internal:{},execute:async t=>this.execute(t),action:async function(t,e,i){return this.execute({name:t,params:e,...i})}},l)}},R=class{constructor(t={}){this.defaults=t}async createSession(t){const e={...this.defaults,...t||{}};return new $(e)}async fetch(t,e){"string"!=typeof t&&(t=(e=t).url);const i=await this.createSession(e);try{const s=e?.actions||[];t&&0!==s.findIndex(e=>"goto"===e.id&&e.params?.url===t)&&s.unshift({id:"goto",params:{url:t}});return await i.executeAll(s)}finally{await i.dispose()}}},A=require("crawlee"),P=((t,s,n)=>(n=null!=t?e(r(t)):{},a(!s&&t&&t.__esModule?n:i(n,"default",{value:t,enumerable:!0}),t)))(require("cheerio")),F=require("@isdk/common-error"),j=class extends S{async _buildResponse(t){const{request:e,response:i,body:s,$:n}=t,r=n?.html();let o="string"==typeof s?s:Buffer.isBuffer(s)?s.toString("utf-8"):String(s??"");r&&r!==o&&(o=r);let a=i?.headers;if(!a&&i?.rawHeaders){a={};const t=i.rawHeaders;for(let e=0;e<t.length;e+=2)a[t[e].toLowerCase()]=t[e+1]}return{url:e.url,finalUrl:e.loadedUrl||e.url,statusCode:i?.statusCode??200,statusText:i?.statusMessage,headers:a||{},cookies:t.session?.getCookies(e.url),body:s,html:o,text:o}}async _querySelectorAll(t,e){const{$:i,el:s}=t;return s.find(e).toArray().map(t=>({$:i,el:i(t)}))}async _extractValue(t,e){const{el:i}=e,{attribute:s,type:n="string"}=t;if(0===i.length)return null;let r="";if(r=s?i.attr(s)??null:"html"===n?i.html():i.text().trim(),null===r)return null;switch(n){case"number":return parseFloat(r.replace(/[^0-9.-]+/g,""))||null;case"boolean":const t=r.toLowerCase();return"true"===t||"1"===t;default:return r}}async executeAction(t,e){const{$:i}=t;switch(e.type){case"dispose":return;case"extract":if(!i)throw new F.CommonError(`Cheerio context not available for action: ${e.type}`,"extract");return this._extract(e.schema,{$:i,el:i.root()});case"click":{if(!i)throw new F.CommonError(`Cheerio context not available for action: ${e.type}`,"click");const s=e.selector,n=i(s).first();let r;if(0===n.length)try{r=new URL(s,t.request.loadedUrl||t.request.url).href}catch{throw new F.CommonError(`click: selector not found or invalid URL: ${s}`,"click")}else{if(!n.is("a")||!n.attr("href")){if(n.is('input[type="submit"], button[type="submit"], button, input')){const e=n.closest("form");if(e.length)return this.executeAction(t,{type:"submit",selector:e});throw new F.CommonError("click: submit-like element without form","click")}throw new F.CommonError(`click: unsupported element for http simulate. Selector: ${s}`,"click")}{const e=n.attr("href");r=new URL(e,t.request.loadedUrl||t.request.url).href}}const o=await t.sendRequest({url:r});return void this._updateStateAfterNavigation(t,o)}case"fill":{if(!i)throw new F.CommonError(`Cheerio context not available for action: ${e.type}`),"fill";const s=i(e.selector).first();if(0===s.length)throw new F.CommonError(`fill: selector not found: ${e.selector}`);if(!s.is("input, textarea, select"))throw new F.CommonError(`fill: not a form field: ${e.selector}`);return s.val(e.value),void(this.lastResponse=await this.buildResponse(t))}case"waitFor":return void(e.options?.ms&&await new Promise(t=>setTimeout(t,e.options.ms)));case"pause":const s=this.ctx?.onPause;return void(s?(console.info(e.message||"Execution paused for manual intervention."),await s({message:e.message}),console.info("Resuming execution...")):console.warn("[PauseAction] was called, but no `onPause` handler was provided in fetchWeb options. Skipped."));case"submit":{if(!i)throw new F.CommonError(`Cheerio context not available for action: ${e.type}`,"submit");const s="string"==typeof e.selector?i(e.selector).first():null!=e.selector?e.selector:i("form").first();if(0===s.length)throw new F.NotFoundError(e.selector,"submit");const n=s.attr("action")||t.request.loadedUrl||t.request.url,r=(s.attr("method")||"GET").toUpperCase(),o=new URL(n,t.request.loadedUrl||t.request.url).href,a={};let c;if(s.find("input, select, textarea").each((t,e)=>{const s=i(e),n=s.attr("name");if(!n)return;const r=s.val();null!=r&&(a[n]=String(r))}),"GET"===r){const e=new URL(o);Object.entries(a).forEach(([t,i])=>e.searchParams.set(t,i)),c=await t.sendRequest({url:e.href,method:"GET"})}else{let i;const n={};"application/json"===(e.options?.enctype||s.attr("enctype")||"application/x-www-form-urlencoded")?(i=JSON.stringify(a),n["Content-Type"]="application/json"):(i=new URLSearchParams(a).toString(),n["Content-Type"]="application/x-www-form-urlencoded"),c=await t.sendRequest({url:o,method:"POST",body:i,headers:n})}return void this._updateStateAfterNavigation(t,c)}case"getContent":return this.buildResponse(t);default:throw new F.CommonError(`Unknown action type: ${e.type}`,"CheerioFetchEngine.executeAction",F.ErrorCode.NotSupported)}}_updateStateAfterNavigation(t,e){const i=e;let s=i.headers;if(!s&&i.rawHeaders){s={};for(let t=0;t<i.rawHeaders.length;t+=2)s[i.rawHeaders[t].toLowerCase()]=i.rawHeaders[t+1]}s=s||{};const n=i.body,r=P.load(n??"");t.$=r,t.response=i,t.body=n;const o=r.html(),a=r.text(),c=(s["content-type"]||"").split(";")[0].trim();this.lastResponse={url:t.request.url,finalUrl:i.url,statusCode:i.statusCode,statusText:i.statusMessage,headers:s,contentType:c,body:n,html:o,text:a}}_createCrawler(t){return new A.CheerioCrawler(t)}_getSpecificCrawlerOptions(t){const e=this.opts?.proxy?"string"==typeof this.opts.proxy?[this.opts.proxy]:this.opts.proxy:void 0,i=e?.length?new A.ProxyConfiguration({proxyUrls:e}):void 0;return{additionalMimeTypes:["text/plain"],maxRequestRetries:1,requestHandlerTimeoutSecs:t.requestHandlerTimeoutSecs,proxyConfiguration:i,preNavigationHooks:[({session:e,request:i},s)=>{if(s.throwHttpErrors=t.throwHttpErrors,this.opts?.timeoutMs&&(s.timeout={request:this.opts.timeoutMs}),this.jar.length>0&&e)for(const t of this.jar){const s=`${t.name}=${t.value}`;e.setCookie(s,i.url)}}]}}async goto(t,e){this.isPageActive&&this.dispatchAction({type:"dispose"}).catch(()=>{});const i="req-"+ ++this.requestCounter,s=new Promise((t,s)=>{const n=e?.timeoutMs||this.opts?.timeoutMs||3e4,r=setTimeout(()=>{this.pendingRequests.delete(i),this.navigationLock.release(),s(new F.CommonError(`goto timed out after ${n}ms.`,"gotoTimeout",F.ErrorCode.RequestTimeout))},n);this.pendingRequests.set(i,{resolve:e=>{clearTimeout(r),t(e)},reject:t=>{clearTimeout(r),s(t)}})});return this.requestQueue.addRequest({...e,url:t,headers:{...this.hdrs,...e?.headers},userData:{requestId:i},uniqueKey:`${t}-${i}`}).catch(t=>{const e=this.pendingRequests.get(i);e&&(this.pendingRequests.delete(i),this.navigationLock.release(),e.reject(t))}),await this.navigationLock,this.navigationLock=k(),s}};j.id="cheerio",j.mode="http",S.register(j);var T=require("crawlee"),O=require("playwright"),_=require("camoufox-js"),U=require("@isdk/common-error"),N=class extends S{async _buildResponse(t){const{page:e,response:i,request:s}=t;if(!e||e.isClosed())return{url:s.url,finalUrl:s.loadedUrl||s.url,statusCode:i?.status(),statusText:i?.statusText(),headers:await(i?.allHeaders())||{},cookies:[],body:"",html:"",text:""};const n=await e.content(),r=await e.textContent("body");return{url:e.url(),finalUrl:e.url(),statusCode:i?.status(),statusText:i?.statusText(),headers:await(i?.allHeaders())||{},cookies:await e.context().cookies(),body:n,html:n,text:r||""}}async _querySelectorAll(t,e){return t.locator(e).all()}async _extractValue(t,e){const{attribute:i,type:s="string"}=t;if(0===await e.count())return null;let n="";if(n=i?await e.getAttribute(i):"html"===s?await e.innerHTML():await e.textContent(),null===n)return null;switch(n=n.trim(),s){case"number":return parseFloat(n.replace(/[^0-9.-]+/g,""))||null;case"boolean":const t=n.toLowerCase();return"true"===t||"1"===t;default:return n}}async executeAction(t,e){const{page:i}=t,s=this.opts?.timeoutMs||3e4;switch(e.type){case"navigate":{const s=await i.goto(e.url,{waitUntil:e.opts?.waitUntil||"domcontentloaded",timeout:this.opts?.timeoutMs||3e4});s&&(t={...t,response:s});const n=await this.buildResponse(t);return this.lastResponse=n,n}case"extract":{const s=await this._extract(e.schema,i.locator("body"));return this.lastResponse=await this.buildResponse(t),s}case"click":{await i.click(e.selector,{timeout:s}),await i.waitForLoadState("networkidle",{timeout:s});const n=await this.buildResponse(t);return void(this.lastResponse=n)}case"fill":await i.fill(e.selector,e.value,{timeout:s});const n=await this.buildResponse(t);return void(this.lastResponse=n);case"waitFor":try{e.options?.selector&&await i.waitForSelector(e.options.selector,{timeout:s}),e.options?.networkIdle&&await i.waitForLoadState("networkidle",{timeout:s})}catch(t){if(!1!==e.options?.failOnTimeout)throw t}return void(e.options?.ms&&await i.waitForTimeout(e.options.ms));case"submit":{const n=e.selector||"form",r=i.locator(n).first();if(0===await r.count())throw new U.NotFoundError(n,"submit");if("application/json"===(e.options?.enctype||"application/x-www-form-urlencoded")){const t=await r.elementHandle();if(!t)throw new U.CommonError(`submit: could not get form handle for ${n}`,"submit");const e=await t.evaluate(async t=>{const e=new FormData(t),i={};e.forEach((t,e)=>{i[e]=t.toString()});const s=await fetch(t.action,{method:t.method,headers:{"Content-Type":"application/json"},body:JSON.stringify(i)}),n=await s.text();return{status:s.status,statusText:s.statusText,headers:Object.fromEntries(s.headers.entries()),body:n,html:n,text:n,url:t.action,finalUrl:s.url}});return await t.dispose(),await i.setContent(e.html),void(this.lastResponse=e)}return await r.evaluate(t=>t.submit()),await i.waitForLoadState("networkidle",{timeout:s}),void(this.lastResponse=await this.buildResponse(t))}case"pause":{const t=this.ctx?.onPause;return void(t?(console.info(e.message||"Execution paused for manual intervention."),await t({message:e.message}),console.info("Resuming execution...")):console.warn("[PauseAction] was called, but no `onPause` handler was provided in fetchWeb options. Skipped."))}case"getContent":return this.buildResponse(t);default:throw new U.CommonError(`Unknown action type: ${e.type}`,"PlaywrightFetchEngine.executeAction",U.ErrorCode.NotSupported)}}_createCrawler(t){return new T.PlaywrightCrawler(t)}async _getSpecificCrawlerOptions(t){const e=t.browser?.headless??!0,i={maxRequestRetries:t.retries||3,headless:e,requestHandlerTimeoutSecs:t.requestHandlerTimeoutSecs,preNavigationHooks:[async({page:e,request:i},s)=>{if(s.throwHttpErrors=t.throwHttpErrors,this.jar.length>0)try{const t=this.jar.map(t=>{const e={...t};return e.domain||e.url||(e.url=i.url),"no_restriction"===e.sameSite&&(e.sameSite="None"),e});await e.context().addCookies(t)}catch(t){console.error("[Playwright] Failed to restore cookies:",t)}const n=this.blockedTypes;n.size>0&&await e.route("**/*",t=>{n.has(t.request().resourceType())?t.abort():t.continue()})}]};if(this.opts?.antibot){i.browserPoolOptions={useFingerprints:!1};const t=await(0,_.launchOptions)({headless:e});i.launchContext={launcher:O.firefox,launchOptions:t},i.postNavigationHooks=[async({page:t,handleCloudflareChallenge:e})=>{await e()}]}return i}async goto(t,e){if(this.isPageActive)return this.dispatchAction({type:"navigate",url:t,opts:e});if(!this.requestQueue)throw new U.CommonError("RequestQueue not initialized","goto");const i="req-"+ ++this.requestCounter,s=new Promise((t,e)=>{this.pendingRequests.set(i,{resolve:t,reject:e})});return await this.requestQueue.addRequest({url:t,headers:this.hdrs,userData:{requestId:i,waitUntil:e?.waitUntil||"domcontentloaded"},uniqueKey:`${t}-${i}`}),s}};N.id="playwright",N.mode="browser",S.register(N);var H=class extends f{async onExecute(t,e){const{selector:i,...s}=e?.params||{};if(!i)throw new Error("Selector is required for click action");await this.delegateToEngine(t,"click",i,s)}};H.id="click",H.returnType="none",H.capabilities={http:"simulate",browser:"native"},f.register(H);var M=class extends f{async onExecute(t,e){const{selector:i,value:s,...n}=e?.params||{};if(!i)throw new Error("Selector is required for fill action");if(void 0===s)throw new Error("Value is required for fill action");await this.delegateToEngine(t,"fill",i,s,n)}};M.id="fill",M.returnType="none",M.capabilities={http:"simulate",browser:"native"},f.register(M);var L=class extends f{async onExecute(t,e){return await this.delegateToEngine(t,"getContent",e?.params)}};L.id="getContent",L.returnType="response",L.capabilities={http:"native",browser:"native"},f.register(L);var G=class extends f{async onExecute(t,e,i){const s=e?.params,n=s?.url||t.url;if(!n)throw new Error("URL is required for goto action");const r=t.internal.engine;if(!r)throw new Error("No engine available");t.url=n;return await r.goto(n,s)}};G.id="goto",G.returnType="response",G.capabilities={http:"native",browser:"native"},f.register(G);var z=class extends f{async onExecute(t,e){const{selector:i,...s}=e?.params||{};await this.delegateToEngine(t,"submit",i,s)}};z.id="submit",z.returnType="none",z.capabilities={http:"simulate",browser:"native"},f.register(z);var D=class extends f{async onExecute(t,e){const i=t.internal.engine;if(!i)throw new Error("No engine available");await i.waitFor(e?.params)}};D.id="waitFor",D.returnType="none",D.capabilities={http:"native",browser:"native"},f.register(D);var W=class extends f{async onExecute(t,e){const i=e?.params;if(!i)throw new Error("Schema is required for extract action");return this.delegateToEngine(t,"extract",i)}};W.id="extract",W.returnType="any",W.capabilities={http:"native",browser:"native"},f.register(W);var B=class extends f{async onExecute(t,e){const{selector:i,message:s,attribute:n}=e?.params||{},r=t.internal.engine;if("browser"===r?.mode){if(i){if(!await(r?.extract({selector:i,attribute:n})))return}r&&"pause"in r?await r.pause(s):console.warn("[PauseAction] was called, but the current engine does not support `pause`. Skipped.")}else console.warn("[PauseAction] can only run in browser engine. Skipped.")}};async function I(t,e){return(new R).fetch(t,e)}B.id="pause",B.capabilities={http:"native",browser:"native"},B.returnType="none",f.register(B);
|
|
1
|
+
"use strict";var t,e=Object.create,i=Object.defineProperty,s=Object.getOwnPropertyDescriptor,n=Object.getOwnPropertyNames,r=Object.getPrototypeOf,o=Object.prototype.hasOwnProperty,a=(t,e,r,a)=>{if(e&&"object"==typeof e||"function"==typeof e)for(let c of n(e))o.call(t,c)||c===r||i(t,c,{get:()=>e[c],enumerable:!(a=s(e,c))||a.enumerable});return t},c={};((t,e)=>{for(var s in e)i(t,s,{get:e[s],enumerable:!0})})(c,{CheerioFetchEngine:()=>O,ClickAction:()=>N,DefaultFetcherProperties:()=>l,ExtractAction:()=>W,FetchAction:()=>f,FetchActionResultStatus:()=>h,FetchEngine:()=>k,FetchSession:()=>$,FetcherOptionKeys:()=>u,FillAction:()=>H,GetContentAction:()=>M,GotoAction:()=>L,PauseAction:()=>z,PlaywrightFetchEngine:()=>U,SubmitAction:()=>B,WaitForAction:()=>G,WebFetcher:()=>R,fetchWeb:()=>D}),module.exports=(t=c,a(i({},"__esModule",{value:!0}),t));var l={engine:"auto",enableSmart:!0,useSiteRegistry:!0,antibot:!1,headers:{},cookies:[],reuseCookies:!0,throwHttpErrors:void 0,proxy:[],blockResources:[],ignoreSslErrors:!0,browser:{engine:"playwright",headless:!0,waitUntil:"domcontentloaded"},http:{method:"GET"},timeoutMs:6e4,requestHandlerTimeoutSecs:void 0,maxConcurrency:1,maxRequestsPerMinute:1e3,delayBetweenRequestsMs:0,retries:0,sites:[]},u=Object.keys(l).concat(["actions","onPause"]),h=(t=>(t[t.Failed=0]="Failed",t[t.Success=1]="Success",t[t.Skipped=2]="Skipped",t))(h||{}),w=class t{static register(t){const e=t.id;if(!e)throw new Error("FetchAction.register: actionClass.id is required");this.registry.set(e,t)}static get(t){return this.registry.get(t)}static create(e){const i="string"==typeof e?e:e.id||e.name||e.action;if(!i)throw new Error("Action must have id, name or action");const s=i instanceof t?i.constructor:this.registry.get(i);return s?new s:void 0}static has(t){return this.registry.has(t)}static list(){return Array.from(this.registry.keys())}static getCapability(t){return this.capabilities[t]??"noop"}getCapability(t){return this.constructor.getCapability(t)}get id(){return this.constructor.id}get returnType(){return this.constructor.returnType}get capabilities(){return this.constructor.capabilities}async delegateToEngine(t,e,...i){const s=t.internal.engine;if(!s)throw new Error("No engine available");if("function"!=typeof s[e])throw new Error(`Engine does not have a method named '${String(e)}'`);return await s[e](...i)}installCollectors(e,i){const s=i?.collectors;if(!s?.length)return;const n=[],r=new Set;for(const i of s){const s=d(i.activateOn),o=d(i.collectOn),a=d(i.deactivateOn),c=!(i.background??!0),l=t.create(i);if(!l)continue;let u=!1,h=!1,w=0;const f=async t=>{if(!u&&!h){u=!0;try{await(l.onBeforeExec?.(e,i))}catch(t){e.eventBus.emit("collector:error",{action:this.id,collector:l.id,phase:"before",error:t})}}},m=async(t,s)=>{if(!h){u||await f(s);try{const n=Promise.resolve(l.onExecute?.(e,i,s)).then(s=>{var n,r;if(i.storeAs){((n=e.outputs)[r=i.storeAs]||(n[r]=[])).push(s)}return e.eventBus.emit("collector:result",{action:this.id,collector:i.id||i.name,event:t,result:s}),s}).catch(s=>{e.eventBus.emit("collector:error",{action:this.id,collector:i.id||i.name,event:t,phase:"exec",error:s})}).finally(()=>{w++});c&&(r.add(n),n.finally(()=>r.delete(n)))}catch(i){e.eventBus.emit("collector:error",{action:this.id,collector:l.id,event:t,phase:"exec",error:i})}}},g=async()=>{if(!h){0===w&&m("collector:after"),h=!0;try{await(l.onAfterExec?.(e,i))}catch(t){e.eventBus.emit("collector:error",{action:this.id,collector:i.id||i.name,phase:"after",error:t})}finally{e.eventBus.emit("collector:end",{action:this.id,collector:i.id||i.name}),v.forEach(t=>t())}}},b=p(e,s,f),v=y(e,o,m),x=p(e,a,g);if(n.push(...b,...v,...x),!s.length&&!o.length&&!a.length){const t=()=>{g()};e.eventBus.once(`action:${this.id}.end`,t),n.push(()=>e.eventBus.off("fetcher:action:end",t))}}return n.length||r.size>0?{cleanup:()=>n.forEach(t=>t()),awaitExecPendings:async()=>{r.size>0&&await Promise.allSettled(Array.from(r))}}:void 0}async beforeExec(t,e){t.internal.actionStack||(t.internal.actionStack=[]);const i=t.internal.actionStack,s=i.length,n=i.length>0?i[i.length-1].id:void 0,r={...e,id:this.id,depth:s,parent:n};i.push(r),t.currentAction=r;const o={action:this,context:t,options:e,depth:s,stack:[...i]};t.eventBus.emit(`action:${this.id}.start`,o),t.eventBus.emit("action:start",o),await(this.onBeforeExec?.(t,e));return{entry:o,collectors:this.installCollectors(t,e)}}async afterExec(t,e,i,s){const n=t.internal.actionStack,r=n.length-1,o=s?.collectors;try{await(o?.awaitExecPendings()),t.lastResult=i,"response"!==i?.returnType||i.error||(t.lastResponse=i.result),e?.storeAs&&(t.outputs[e.storeAs]=i?.result),i?.error&&(t.currentAction.error=i.error),await(this.onAfterExec?.(t,e));const s={action:this,context:t,options:e,result:i,depth:r,stack:[...n]};i?.error&&(s.error=i.error);try{t.eventBus.emit(`action:${this.id}.end`,s)}catch(t){}try{t.eventBus.emit("action:end",s)}catch(t){}}finally{try{o?.cleanup()}finally{n.pop();const e=n.length;t.currentAction=e>0?n[e-1]:void 0}}}async execute(t,e){e?.args&&!e.params&&(e.params=e.args);const i=await this.beforeExec(t,e),s=e?.failOnError??!0;let n;try{return t.throwHttpErrors=s,n=await this.onExecute(t,e),n&&n.returnType||(n={status:1,returnType:this.returnType??"any",result:n}),n}catch(e){if(n={status:0,error:e,meta:{id:this.id,engineType:t.engine,capability:this.getCapability(t.engine)}},s)throw e;return n}finally{await this.afterExec(t,e,n,i)}}};w.registry=new Map,w.returnType="any",w.capabilities={http:"noop",browser:"noop"};var f=w;function d(t){return t?Array.isArray(t)?t:[t]:[]}function p(t,e,i){const s=[];for(const n of e)if("string"==typeof n||n instanceof RegExp){const e=(...t)=>{i(t[0])};t.eventBus.once(n,e),s.push(()=>t.eventBus.off(n,e))}return s}function y(t,e,i){const s=[];for(const n of e)if("string"==typeof n||n instanceof RegExp){const e=t=>i(n,t);t.eventBus.on(n,e),s.push(()=>t.eventBus.off(n,e))}return s}var m=require("events-ex");var g=require("lodash-es"),b=(0,require("nanoid").customAlphabet)("0123456789abcdefghijklmnopqrstuvwxyz",12);var v=require("lodash-es"),x=require("events-ex"),q=require("@isdk/common-error"),E=require("crawlee");function S(){let t=()=>{};const e=new Promise(e=>{t=e});return e.release=t,e}E.Configuration.getGlobalConfig().set("persistStorage",!1);var k=class{constructor(){this.hdrs={},this._initializedSessions=new Set,this.pendingRequests=new Map,this.requestCounter=0,this.actionEmitter=new x.EventEmitter,this.isPageActive=!1,this.navigationLock=function(){const t=S();return t.release(),t}(),this.blockedTypes=new Set}static register(t){const e=t.id;if(!e)throw new Error("Engine must define static id");if(this.registry.has(e))throw new Error(`Engine id duplicated: ${e}`);this.registry.set(e,t)}static get(t){return this.registry.get(t)}static getByMode(t){for(const[e,i]of this.registry.entries())if(i.mode===t)return i}static async create(t,e){const i=(0,v.defaultsDeep)(e,t,l),s=i.engine??t.engine,n=s?this.get(s)??this.getByMode(s):null;if(n){const e=new n;return await e.initialize(t,i),e}}async _extract(t,e){const i=t.type;if(!e)return"array"===i?[]:null;if("object"===i){const{selector:i,properties:s}=t;let n=e;if(i){const t=await this._querySelectorAll(e,i);n=t.length>0?t[0]:null}if(!n)return null;const r={};for(const t in s)r[t]=await this._extract(s[t],n);return r}if("array"===i){const{selector:i,items:s}=t,n=i?await this._querySelectorAll(e,i):[e],r=[];for(const t of n)r.push(await this._extract(s,t));return r}const{selector:s}=t;let n=e;if(s){const t=await this._querySelectorAll(e,s);n=t.length>0?t[0]:null}return n?this._extractValue(t,n):null}async buildResponse(t){const e=await this._buildResponse(t),i=e.headers["content-type"]||"";return e.contentType=i.split(";")[0].trim(),e}waitFor(t){return this.dispatchAction({type:"waitFor",options:t})}click(t){return this.dispatchAction({type:"click",selector:t})}fill(t,e){return this.dispatchAction({type:"fill",selector:t,value:e})}submit(t,e){return this.dispatchAction({type:"submit",selector:t,options:e})}pause(t){return this.dispatchAction({type:"pause",message:t})}extract(t){const e=this._normalizeSchema(t);return this.dispatchAction({type:"extract",schema:e})}_normalizeSchema(t){const e=JSON.parse(JSON.stringify(t));if(e.properties)for(const t in e.properties)e.properties[t]=this._normalizeSchema(e.properties[t]);if(e.items&&(e.items=this._normalizeSchema(e.items)),"array"===e.type&&(e.attribute&&!e.items&&(e.items={attribute:e.attribute},delete e.attribute),e.items||(e.items={type:"string"})),e.selector&&(e.has||e.exclude)){const{selector:t,has:i,exclude:s}=e,n=t.split(",").map(t=>{let e=t.trim();return i&&(e=`${e}:has(${i})`),s&&(e=`${e}:not(${s})`),e}).join(", ");e.selector=n,delete e.has,delete e.exclude}return e}get id(){return this.constructor.id}async getState(){return{cookies:await this.cookies(),sessionState:await(this.crawler?.sessionPool?.getState())}}get mode(){return this.constructor.mode}get context(){return this.ctx}async initialize(t,e){if(this.ctx)return;(0,v.merge)(t,e),this.ctx=t,this.opts=t,this.hdrs=function(t){const e={};if(t&&"object"==typeof t)for(const[i,s]of Object.entries(t))e[i.toLowerCase()]=s;return e}(t.headers),this._initialCookies=[...t.cookies??[]],t.internal||(t.internal={}),t.internal.engine=this,t.engine=this.mode,this.actionEmitter.setMaxListeners(100),this.requestQueue=await E.RequestQueue.open();const i=await this._getSpecificCrawlerOptions(t),s=(0,v.defaultsDeep)({persistenceOptions:{enable:!0}},t.sessionPoolOptions,{maxPoolSize:1,sessionOptions:{maxUsageCount:1e3,maxErrorScore:3}});t.sessionState&&t.cookies&&t.cookies.length>0&&console.warn('[FetchEngine] Warning: Both "sessionState" and "cookies" are provided. Explicit "cookies" will override any conflicting cookies restored from "sessionState".');const n={...(0,v.defaultsDeep)(i,{requestQueue:this.requestQueue,maxConcurrency:1,minConcurrency:1,useSessionPool:!0,persistCookiesPerSession:!0,sessionPoolOptions:s}),requestHandler:this._requestHandler.bind(this),errorHandler:this._failedRequestHandler.bind(this),failedRequestHandler:this._failedRequestHandler.bind(this)};n.preNavigationHooks||(n.preNavigationHooks=[]),n.preNavigationHooks.unshift(({crawler:t,session:e,request:i},s)=>{if(this.currentSession=e,e&&!this._initializedSessions.has(e.id)){if(this._initialCookies&&this._initialCookies.length>0){const t=this._initialCookies.map(t=>{const e={...t};return"no_restriction"===e.sameSite&&(e.sameSite="None"),e});e.setCookies(t,i.url)}this._initializedSessions.add(e.id)}});const r=this.crawler=this._createCrawler(n),o=await E.KeyValueStore.open(null,{config:r.config}),a=await o.getValue(E.PERSIST_STATE_KEY);!t.sessionState||a&&!t.overrideSessionState||await o.setValue(E.PERSIST_STATE_KEY,t.sessionState),this.crawler.run().then(()=>{this.isCrawlerReady=!0}).catch(t=>{this.isCrawlerReady=!1,console.error("Crawler background error:",t)})}async cleanup(){await(this._cleanup?.()),await this._commonCleanup();const t=this.ctx;t&&t.internal?.engine===this&&(t.internal.engine=void 0),this.ctx=void 0,this.opts=void 0}async _executePendingActions(t){await new Promise(e=>{const i=async({action:e,resolve:i,reject:s})=>{try{if("dispose"===e.type)return this.actionEmitter.emit("dispose"),void i();i(await this.executeAction(t,e))}catch(t){s(t)}};this.actionEmitter.on("dispatch",i),this.actionEmitter.once("dispose",()=>{this.actionEmitter.removeListener("dispatch",i),e()})})}async _sharedRequestHandler(t){const{request:e}=t;try{this.currentSession=t.session,this.isPageActive=!0;const i=this.pendingRequests.get(e.userData.requestId);if(i){const s=await this._buildResponse(t),n=!s.statusCode||s.statusCode>=400;if(this.ctx?.throwHttpErrors&&n){const t=new q.CommonError(`Request for ${s.finalUrl} failed with status ${s.statusCode||"N/A"}`,"request",s.statusCode);i.reject(t)}else this.lastResponse=s,i.resolve(s);this.pendingRequests.delete(e.userData.requestId)}await this._executePendingActions(t)}finally{if(this.currentSession){const t=this.currentSession.getCookies(e.url);t&&(this._initialCookies=t)}this.isPageActive=!1,this.navigationLock.release()}}async _sharedFailedRequestHandler(t,e){const{request:i}=t,s=this.pendingRequests.get(i.userData.requestId);if(s&&e&&this.ctx?.throwHttpErrors){this.pendingRequests.delete(i.userData.requestId);const t=e.response,n=t?.statusCode||500,r=t?.url?t.url:i.url,o=new q.CommonError(`Request${r?" for "+r:""} failed: ${e.message}`,"request",n);s.reject(o)}return this._sharedRequestHandler(t)}async dispatchAction(t){if(!this.isPageActive)throw new Error("No active page. Call goto() before performing actions.");return new Promise((e,i)=>{this.actionEmitter.emit("dispatch",{action:t,resolve:e,reject:i})})}async _requestHandler(t){await this._sharedRequestHandler(t)}async _failedRequestHandler(t,e){await this._sharedFailedRequestHandler(t,e)}async _commonCleanup(){if(this._initializedSessions.clear(),this.isPageActive&&await this.dispatchAction({type:"dispose"}).catch(()=>{}),this.pendingRequests.size>0)for(const[,t]of this.pendingRequests)t.reject(new Error("Cleanup:Request cancelled"));if(this.actionEmitter.removeAllListeners(),this.crawler){try{await(this.crawler.teardown?.())}catch(t){console.error("ccrawler teardown error:",t)}this.crawler=void 0}this.isCrawlerReady=void 0,this.requestQueue&&(await this.requestQueue.drop(),this.requestQueue=void 0),this.pendingRequests.clear()}async blockResources(t,e){return e&&this.blockedTypes.clear(),t.forEach(t=>this.blockedTypes.add(t)),t.length}getContent(){return this.lastResponse?Promise.resolve(this.lastResponse):Promise.reject(new Error("No content fetched yet. Call goto() first."))}async headers(t,e){if(void 0===t)return{...this.hdrs};if("string"==typeof t&&void 0===e)return this.hdrs[t.toLowerCase()]||"";if(null!==t&&"object"==typeof t){const i={};for(const[e,s]of Object.entries(t))i[e.toLowerCase()]=String(s);return this.hdrs=!0===e?i:{...this.hdrs,...i},!0}return"string"==typeof t&&("string"==typeof e?this.hdrs[t.toLowerCase()]=e:null===e&&delete this.hdrs[t.toLowerCase()],!0)}async cookies(t){const e=this.lastResponse?.url||"";if(Array.isArray(t))return this.currentSession?this.currentSession.setCookies(t,e):this._initialCookies=[...t],!0;if(null===t)return this.currentSession,this._initialCookies=[],!0;if(this.currentSession){return this.currentSession.getCookies(e)}return[...this._initialCookies||[]]}async dispose(){await this.cleanup()}};async function C(t,e){let i;if(e?.engine){if(i=await k.create(t,{engine:e.engine}),!i)throw new Error(`No engine available for ${e.engine}`);return i}const s=function(t,e){if(!t||!e?.length)return null;const i=new URL(t);let s=e.find(t=>t.domain===i.hostname);s||(s=e.find(t=>i.hostname.endsWith(t.domain)));if(!s)return null;if(s.pathScope?.length){if(!s.pathScope.some(t=>i.pathname.startsWith(t)))return null}return s}(e?.url||t.url,t.sites),n=t.engine||s?.engine||"auto";return i=await k.create(t,{engine:n}),i||(i=await k.create(t,{engine:"http"})),i}k.registry=new Map;var $=class{constructor(t={},e){this.options=t,this.engine=e,this.closed=!1,this.id=b(),this.context=this.createContext(t)}async execute(t){await this.ensureEngine(t);const e=f.create(t);if(!e)throw new Error(`Unknown action: ${t.id||t.name}`);let i,s;this.context.internal.actionIndex=(this.context.internal.actionIndex||0)+1,this.context.currentAction={...t,index:this.context.internal.actionIndex,startedAt:Date.now()};try{return i=await e.execute(this.context,t),i}catch(t){throw s=t,s}finally{this.context.currentAction=void 0}}async executeAll(t){let e=0;try{for(;e<t.length;){const i=t[e];await this.execute(i),e++}const i=await this.execute({id:"getContent"});return{result:i?.result,outputs:this.getOutputs()}}catch(t){throw t.actionIndex=e,t}}getOutputs(){return this.context.outputs}async getState(){return this.context.internal.engine?.getState()}async dispose(){if(this.closed)return;const t=this.context.eventBus;t.emit("session:closing",{sessionId:this.id});try{await(this.context.internal.engine?.dispose())}finally{this.closed=!0}t.emit("session:closed",{sessionId:this.id})}async ensureEngine(t){if(this.closed)throw new Error("Session is closed");if(!this.context.internal.engine){const e=t?.params?.url??this.context.url;if(!await C(this.context,{url:e,engine:this.engine}))throw new Error("No engine found")}}createContext(t=this.options){const e=new m.EventEmitter;return(0,g.defaultsDeep)({...t,id:this.id,eventBus:e,outputs:{},internal:{},execute:async t=>this.execute(t),action:async function(t,e,i){return this.execute({name:t,params:e,...i})}},l)}},R=class{constructor(t={}){this.defaults=t}async createSession(t){const e={...this.defaults,...t||{}};return new $(e)}async fetch(t,e){"string"!=typeof t&&(t=(e=t).url);const i=await this.createSession(e);try{const s=e?.actions||[];t&&0!==s.findIndex(e=>"goto"===e.id&&e.params?.url===t)&&s.unshift({id:"goto",params:{url:t}});return await i.executeAll(s)}finally{await i.dispose()}}},A=require("crawlee"),P=((t,s,n)=>(n=null!=t?e(r(t)):{},a(!s&&t&&t.__esModule?n:i(n,"default",{value:t,enumerable:!0}),t)))(require("cheerio")),F=require("@isdk/common-error"),O=class extends k{_ensureCheerioContext(t){if(!t.$&&t.body){let e="string"==typeof t.body?t.body:Buffer.isBuffer(t.body)?t.body.toString("utf-8"):JSON.stringify(t.body);e.trim().startsWith("<")||(e=`<html><body><pre>${e}</pre></body></html>`),t.$=P.load(e)}}async _buildResponse(t){this._ensureCheerioContext(t);const{request:e,response:i,body:s,$:n}=t,r=n?.html();let o="string"==typeof s?s:Buffer.isBuffer(s)?s.toString("utf-8"):String(s??"");r&&r!==o&&(o=r);let a=i?.headers;if(!a&&i?.rawHeaders){a={};const t=i.rawHeaders;for(let e=0;e<t.length;e+=2)a[t[e].toLowerCase()]=t[e+1]}return{url:e.url,finalUrl:e.loadedUrl||e.url,statusCode:i?.statusCode??200,statusText:i?.statusMessage,headers:a||{},cookies:t.session?.getCookies(e.url),body:s,html:o,text:o}}async _querySelectorAll(t,e){const{$:i,el:s}=t;return s.find(e).toArray().map(t=>({$:i,el:i(t)}))}async _extractValue(t,e){const{el:i}=e,{attribute:s,type:n="string"}=t;if(0===i.length)return null;let r="";if(r=s?i.attr(s)??null:"html"===n?i.html():i.text().trim(),null===r)return null;switch(n){case"number":return parseFloat(r.replace(/[^0-9.-]+/g,""))||null;case"boolean":const t=r.toLowerCase();return"true"===t||"1"===t;default:return r}}async executeAction(t,e){const{$:i}=t;switch(e.type){case"dispose":return;case"extract":if(!i)throw new F.CommonError(`Cheerio context not available for action: ${e.type}`,"extract");return this._extract(e.schema,{$:i,el:i.root()});case"click":{if(!i)throw new F.CommonError(`Cheerio context not available for action: ${e.type}`,"click");const s=e.selector,n=i(s).first();let r;if(0===n.length)try{r=new URL(s,t.request.loadedUrl||t.request.url).href}catch{throw new F.CommonError(`click: selector not found or invalid URL: ${s}`,"click")}else{if(!n.is("a")||!n.attr("href")){if(n.is('input[type="submit"], button[type="submit"], button, input')){const e=n.closest("form");if(e.length)return this.executeAction(t,{type:"submit",selector:e});throw new F.CommonError("click: submit-like element without form","click")}throw new F.CommonError(`click: unsupported element for http simulate. Selector: ${s}`,"click")}{const e=n.attr("href");r=new URL(e,t.request.loadedUrl||t.request.url).href}}const o=await t.sendRequest({url:r});return void this._updateStateAfterNavigation(t,o)}case"fill":{if(!i)throw new F.CommonError(`Cheerio context not available for action: ${e.type}`),"fill";const s=i(e.selector).first();if(0===s.length)throw new F.CommonError(`fill: selector not found: ${e.selector}`);if(!s.is("input, textarea, select"))throw new F.CommonError(`fill: not a form field: ${e.selector}`);return s.val(e.value),void(this.lastResponse=await this.buildResponse(t))}case"waitFor":return void(e.options?.ms&&await new Promise(t=>setTimeout(t,e.options.ms)));case"pause":const s=this.ctx?.onPause;return void(s?(console.info(e.message||"Execution paused for manual intervention."),await s({message:e.message}),console.info("Resuming execution...")):console.warn("[PauseAction] was called, but no `onPause` handler was provided in fetchWeb options. Skipped."));case"submit":{if(!i)throw new F.CommonError(`Cheerio context not available for action: ${e.type}`,"submit");const s="string"==typeof e.selector?i(e.selector).first():null!=e.selector?e.selector:i("form").first();if(0===s.length)throw new F.NotFoundError(e.selector,"submit");const n=s.attr("action")||t.request.loadedUrl||t.request.url,r=(s.attr("method")||"GET").toUpperCase(),o=new URL(n,t.request.loadedUrl||t.request.url).href,a={};let c;if(s.find("input, select, textarea").each((t,e)=>{const s=i(e),n=s.attr("name");if(!n)return;const r=s.val();null!=r&&(a[n]=String(r))}),"GET"===r){const e=new URL(o);Object.entries(a).forEach(([t,i])=>e.searchParams.set(t,i)),c=await t.sendRequest({url:e.href,method:"GET"})}else{let i;const n={};"application/json"===(e.options?.enctype||s.attr("enctype")||"application/x-www-form-urlencoded")?(i=JSON.stringify(a),n["Content-Type"]="application/json"):(i=new URLSearchParams(a).toString(),n["Content-Type"]="application/x-www-form-urlencoded"),c=await t.sendRequest({url:o,method:"POST",body:i,headers:n})}return void this._updateStateAfterNavigation(t,c)}case"getContent":return this.buildResponse(t);default:throw new F.CommonError(`Unknown action type: ${e.type}`,"CheerioFetchEngine.executeAction",F.ErrorCode.NotSupported)}}_updateStateAfterNavigation(t,e){const i=e;let s=i.headers;if(!s&&i.rawHeaders){s={};for(let t=0;t<i.rawHeaders.length;t+=2)s[i.rawHeaders[t].toLowerCase()]=i.rawHeaders[t+1]}s=s||{};const n=i.body,r=P.load(n??"");t.$=r,t.response=i,t.body=n;const o=r.html(),a=r.text(),c=(s["content-type"]||"").split(";")[0].trim();this.lastResponse={url:t.request.url,finalUrl:i.url,statusCode:i.statusCode,statusText:i.statusMessage,headers:s,contentType:c,body:n,html:o,text:a}}_createCrawler(t){return new A.CheerioCrawler(t)}_getSpecificCrawlerOptions(t){const e=this.opts?.proxy?"string"==typeof this.opts.proxy?[this.opts.proxy]:this.opts.proxy:void 0,i=e?.length?new A.ProxyConfiguration({proxyUrls:e}):void 0;return{additionalMimeTypes:["text/plain"],maxRequestRetries:1,requestHandlerTimeoutSecs:t.requestHandlerTimeoutSecs,proxyConfiguration:i,preNavigationHooks:[({session:e,request:i},s)=>{s.throwHttpErrors=t.throwHttpErrors,this.opts?.timeoutMs&&(s.timeout={request:this.opts.timeoutMs})}]}}async goto(t,e){this.isPageActive&&this.dispatchAction({type:"dispose"}).catch(()=>{});const i="req-"+ ++this.requestCounter,s=new Promise((t,s)=>{const n=e?.timeoutMs||this.opts?.timeoutMs||3e4,r=setTimeout(()=>{this.pendingRequests.delete(i),this.navigationLock.release(),s(new F.CommonError(`goto timed out after ${n}ms.`,"gotoTimeout",F.ErrorCode.RequestTimeout))},n);this.pendingRequests.set(i,{resolve:e=>{clearTimeout(r),t(e)},reject:t=>{clearTimeout(r),s(t)}})});return this.requestQueue.addRequest({...e,url:t,headers:{...this.hdrs,...e?.headers},userData:{requestId:i},uniqueKey:`${t}-${i}`}).catch(t=>{const e=this.pendingRequests.get(i);e&&(this.pendingRequests.delete(i),this.navigationLock.release(),e.reject(t))}),await this.navigationLock,this.navigationLock=S(),s}};O.id="cheerio",O.mode="http",k.register(O);var j=require("crawlee"),T=require("playwright"),_=require("@isdk/common-error"),U=class extends k{async _buildResponse(t){const{page:e,response:i,request:s,session:n}=t;if(!e||e.isClosed())return{url:s.url,finalUrl:s.loadedUrl||s.url,statusCode:i?.status(),statusText:i?.statusText(),headers:await(i?.allHeaders())||{},cookies:[],body:"",html:"",text:""};const r=await e.content(),o=await e.textContent("body"),a=await e.context().cookies();return console.log("[PlaywrightEngine] Cookies from page context:",a),n&&n.setCookies(a,s.url),{url:e.url(),finalUrl:e.url(),statusCode:i?.status(),statusText:i?.statusText(),headers:await(i?.allHeaders())||{},cookies:a,body:r,html:r,text:o||""}}async _querySelectorAll(t,e){return t.locator(e).all()}async _extractValue(t,e){const{attribute:i,type:s="string"}=t;if(0===await e.count())return null;let n="";if(n=i?await e.getAttribute(i):"html"===s?await e.innerHTML():await e.textContent(),null===n)return null;switch(n=n.trim(),s){case"number":return parseFloat(n.replace(/[^0-9.-]+/g,""))||null;case"boolean":const t=n.toLowerCase();return"true"===t||"1"===t;default:return n}}async executeAction(t,e){const{page:i}=t,s=this.opts?.timeoutMs||3e4;switch(e.type){case"navigate":{const s=await i.goto(e.url,{waitUntil:e.opts?.waitUntil||"domcontentloaded",timeout:this.opts?.timeoutMs||3e4});s&&(t={...t,response:s});const n=await this.buildResponse(t);return this.lastResponse=n,n}case"extract":{const s=await this._extract(e.schema,i.locator("body"));return this.lastResponse=await this.buildResponse(t),s}case"click":{await i.click(e.selector,{timeout:s}),await i.waitForLoadState("networkidle",{timeout:s});const n=await this.buildResponse(t);return void(this.lastResponse=n)}case"fill":await i.fill(e.selector,e.value,{timeout:s});const n=await this.buildResponse(t);return void(this.lastResponse=n);case"waitFor":try{e.options?.selector&&await i.waitForSelector(e.options.selector,{timeout:s}),e.options?.networkIdle&&await i.waitForLoadState("networkidle",{timeout:s})}catch(t){if(!1!==e.options?.failOnTimeout)throw t}return void(e.options?.ms&&await i.waitForTimeout(e.options.ms));case"submit":{const n=e.selector||"form",r=i.locator(n).first();if(0===await r.count())throw new _.NotFoundError(n,"submit");if("application/json"===(e.options?.enctype||"application/x-www-form-urlencoded")){const t=await r.elementHandle();if(!t)throw new _.CommonError(`submit: could not get form handle for ${n}`,"submit");const e=await t.evaluate(async t=>{const e=new FormData(t),i={};e.forEach((t,e)=>{i[e]=t.toString()});const s=await fetch(t.action,{method:t.method,headers:{"Content-Type":"application/json"},body:JSON.stringify(i)}),n=await s.text();return{status:s.status,statusText:s.statusText,headers:Object.fromEntries(s.headers.entries()),body:n,html:n,text:n,url:t.action,finalUrl:s.url}});return await t.dispose(),await i.setContent(e.html),void(this.lastResponse=e)}return await r.evaluate(t=>t.submit()),await i.waitForLoadState("networkidle",{timeout:s}),void(this.lastResponse=await this.buildResponse(t))}case"pause":{const t=this.ctx?.onPause;return void(t?(console.info(e.message||"Execution paused for manual intervention."),await t({message:e.message}),console.info("Resuming execution...")):console.warn("[PauseAction] was called, but no `onPause` handler was provided in fetchWeb options. Skipped."))}case"getContent":return this.buildResponse(t);default:throw new _.CommonError(`Unknown action type: ${e.type}`,"PlaywrightFetchEngine.executeAction",_.ErrorCode.NotSupported)}}_createCrawler(t){return new j.PlaywrightCrawler(t)}async _getSpecificCrawlerOptions(t){const e=t.browser?.headless??!0,i={maxRequestRetries:t.retries||3,headless:e,requestHandlerTimeoutSecs:t.requestHandlerTimeoutSecs,preNavigationHooks:[async({page:e,request:i},s)=>{s.throwHttpErrors=t.throwHttpErrors;const n=this.blockedTypes;n.size>0&&await e.route("**/*",t=>{n.has(t.request().resourceType())?t.abort():t.continue()})}]};if(this.opts?.antibot){i.browserPoolOptions={useFingerprints:!1};const{launchOptions:t}=await import("camoufox-js"),s=await t({headless:e});i.launchContext={launcher:T.firefox,launchOptions:s},i.postNavigationHooks=[async({page:t,handleCloudflareChallenge:e})=>{await e()}]}return i}async goto(t,e){if(this.isPageActive)return this.dispatchAction({type:"navigate",url:t,opts:e});if(!this.requestQueue)throw new _.CommonError("RequestQueue not initialized","goto");const i="req-"+ ++this.requestCounter,s=new Promise((t,e)=>{this.pendingRequests.set(i,{resolve:t,reject:e})});return await this.requestQueue.addRequest({url:t,headers:this.hdrs,userData:{requestId:i,waitUntil:e?.waitUntil||"domcontentloaded"},uniqueKey:`${t}-${i}`}),s}};U.id="playwright",U.mode="browser",k.register(U);var N=class extends f{async onExecute(t,e){const{selector:i,...s}=e?.params||{};if(!i)throw new Error("Selector is required for click action");await this.delegateToEngine(t,"click",i,s)}};N.id="click",N.returnType="none",N.capabilities={http:"simulate",browser:"native"},f.register(N);var H=class extends f{async onExecute(t,e){const{selector:i,value:s,...n}=e?.params||{};if(!i)throw new Error("Selector is required for fill action");if(void 0===s)throw new Error("Value is required for fill action");await this.delegateToEngine(t,"fill",i,s,n)}};H.id="fill",H.returnType="none",H.capabilities={http:"simulate",browser:"native"},f.register(H);var M=class extends f{async onExecute(t,e){return await this.delegateToEngine(t,"getContent",e?.params)}};M.id="getContent",M.returnType="response",M.capabilities={http:"native",browser:"native"},f.register(M);var L=class extends f{async onExecute(t,e,i){const s=e?.params,n=s?.url||t.url;if(!n)throw new Error("URL is required for goto action");const r=t.internal.engine;if(!r)throw new Error("No engine available");t.url=n;return await r.goto(n,s)}};L.id="goto",L.returnType="response",L.capabilities={http:"native",browser:"native"},f.register(L);var B=class extends f{async onExecute(t,e){const{selector:i,...s}=e?.params||{};await this.delegateToEngine(t,"submit",i,s)}};B.id="submit",B.returnType="none",B.capabilities={http:"simulate",browser:"native"},f.register(B);var G=class extends f{async onExecute(t,e){const i=t.internal.engine;if(!i)throw new Error("No engine available");await i.waitFor(e?.params)}};G.id="waitFor",G.returnType="none",G.capabilities={http:"native",browser:"native"},f.register(G);var W=class extends f{async onExecute(t,e){const i=e?.params;if(!i)throw new Error("Schema is required for extract action");return this.delegateToEngine(t,"extract",i)}};W.id="extract",W.returnType="any",W.capabilities={http:"native",browser:"native"},f.register(W);var z=class extends f{async onExecute(t,e){const{selector:i,message:s,attribute:n}=e?.params||{},r=t.internal.engine;if("browser"===r?.mode){if(i){if(!await(r?.extract({selector:i,attribute:n})))return}r&&"pause"in r?await r.pause(s):console.warn("[PauseAction] was called, but the current engine does not support `pause`. Skipped.")}else console.warn("[PauseAction] can only run in browser engine. Skipped.")}};async function D(t,e){return(new R).fetch(t,e)}z.id="pause",z.capabilities={http:"native",browser:"native"},z.returnType="none",f.register(z);
|
package/dist/index.mjs
CHANGED
|
@@ -1 +1 @@
|
|
|
1
|
-
var t={engine:"auto",enableSmart:!0,useSiteRegistry:!0,antibot:!1,headers:{},cookies:[],reuseCookies:!0,throwHttpErrors:void 0,proxy:[],blockResources:[],ignoreSslErrors:!0,browser:{engine:"playwright",headless:!0,waitUntil:"domcontentloaded"},http:{method:"GET"},timeoutMs:6e4,requestHandlerTimeoutSecs:void 0,maxConcurrency:1,maxRequestsPerMinute:1e3,delayBetweenRequestsMs:0,retries:0,sites:[]},e=Object.keys(t).concat(["actions","onPause"]),s=(t=>(t[t.Failed=0]="Failed",t[t.Success=1]="Success",t[t.Skipped=2]="Skipped",t))(s||{}),i=class t{static register(t){const e=t.id;if(!e)throw new Error("FetchAction.register: actionClass.id is required");this.registry.set(e,t)}static get(t){return this.registry.get(t)}static create(e){const s="string"==typeof e?e:e.id||e.name||e.action;if(!s)throw new Error("Action must have id, name or action");const i=s instanceof t?s.constructor:this.registry.get(s);return i?new i:void 0}static has(t){return this.registry.has(t)}static list(){return Array.from(this.registry.keys())}static getCapability(t){return this.capabilities[t]??"noop"}getCapability(t){return this.constructor.getCapability(t)}get id(){return this.constructor.id}get returnType(){return this.constructor.returnType}get capabilities(){return this.constructor.capabilities}async delegateToEngine(t,e,...s){const i=t.internal.engine;if(!i)throw new Error("No engine available");if("function"!=typeof i[e])throw new Error(`Engine does not have a method named '${String(e)}'`);return await i[e](...s)}installCollectors(e,s){const i=s?.collectors;if(!i?.length)return;const r=[],c=new Set;for(const s of i){const i=n(s.activateOn),l=n(s.collectOn),u=n(s.deactivateOn),h=!(s.background??!0),w=t.create(s);if(!w)continue;let f=!1,d=!1,p=0;const y=async t=>{if(!f&&!d){f=!0;try{await(w.onBeforeExec?.(e,s))}catch(t){e.eventBus.emit("collector:error",{action:this.id,collector:w.id,phase:"before",error:t})}}},m=async(t,i)=>{if(!d){f||await y(i);try{const r=Promise.resolve(w.onExecute?.(e,s,i)).then(i=>{var r,n;if(s.storeAs){((r=e.outputs)[n=s.storeAs]||(r[n]=[])).push(i)}return e.eventBus.emit("collector:result",{action:this.id,collector:s.id||s.name,event:t,result:i}),i}).catch(i=>{e.eventBus.emit("collector:error",{action:this.id,collector:s.id||s.name,event:t,phase:"exec",error:i})}).finally(()=>{p++});h&&(c.add(r),r.finally(()=>c.delete(r)))}catch(s){e.eventBus.emit("collector:error",{action:this.id,collector:w.id,event:t,phase:"exec",error:s})}}},g=async()=>{if(!d){0===p&&m("collector:after"),d=!0;try{await(w.onAfterExec?.(e,s))}catch(t){e.eventBus.emit("collector:error",{action:this.id,collector:s.id||s.name,phase:"after",error:t})}finally{e.eventBus.emit("collector:end",{action:this.id,collector:s.id||s.name}),x.forEach(t=>t())}}},v=o(e,i,y),x=a(e,l,m),b=o(e,u,g);if(r.push(...v,...x,...b),!i.length&&!l.length&&!u.length){const t=()=>{g()};e.eventBus.once(`action:${this.id}.end`,t),r.push(()=>e.eventBus.off("fetcher:action:end",t))}}return r.length||c.size>0?{cleanup:()=>r.forEach(t=>t()),awaitExecPendings:async()=>{c.size>0&&await Promise.allSettled(Array.from(c))}}:void 0}async beforeExec(t,e){t.internal.actionStack||(t.internal.actionStack=[]);const s=t.internal.actionStack,i=s.length,r=s.length>0?s[s.length-1].id:void 0,n={...e,id:this.id,depth:i,parent:r};s.push(n),t.currentAction=n;const o={action:this,context:t,options:e,depth:i,stack:[...s]};t.eventBus.emit(`action:${this.id}.start`,o),t.eventBus.emit("action:start",o),await(this.onBeforeExec?.(t,e));return{entry:o,collectors:this.installCollectors(t,e)}}async afterExec(t,e,s,i){const r=t.internal.actionStack,n=r.length-1,o=i?.collectors;try{await(o?.awaitExecPendings()),t.lastResult=s,"response"!==s?.returnType||s.error||(t.lastResponse=s.result),e?.storeAs&&(t.outputs[e.storeAs]=s?.result),s?.error&&(t.currentAction.error=s.error),await(this.onAfterExec?.(t,e));const i={action:this,context:t,options:e,result:s,depth:n,stack:[...r]};s?.error&&(i.error=s.error);try{t.eventBus.emit(`action:${this.id}.end`,i)}catch(t){}try{t.eventBus.emit("action:end",i)}catch(t){}}finally{try{o?.cleanup()}finally{r.pop();const e=r.length;t.currentAction=e>0?r[e-1]:void 0}}}async execute(t,e){e?.args&&!e.params&&(e.params=e.args);const s=await this.beforeExec(t,e);let i;try{const s=e?.failOnError??!0;return t.throwHttpErrors=s,i=await this.onExecute(t,e),i&&i.returnType||(i={status:1,returnType:this.returnType??"any",result:i}),i}catch(s){if(i={status:0,error:s,meta:{id:this.id,engineType:t.engine,capability:this.getCapability(t.engine)}},e?.failOnError)throw s;return i}finally{await this.afterExec(t,e,i,s)}}};i.registry=new Map,i.returnType="any",i.capabilities={http:"noop",browser:"noop"};var r=i;function n(t){return t?Array.isArray(t)?t:[t]:[]}function o(t,e,s){const i=[];for(const r of e)if("string"==typeof r||r instanceof RegExp){const e=(...t)=>{s(t[0])};t.eventBus.once(r,e),i.push(()=>t.eventBus.off(r,e))}return i}function a(t,e,s){const i=[];for(const r of e)if("string"==typeof r||r instanceof RegExp){const e=t=>s(r,t);t.eventBus.on(r,e),i.push(()=>t.eventBus.off(r,e))}return i}import{EventEmitter as c}from"events-ex";import{defaultsDeep as l}from"lodash-es";import{customAlphabet as u}from"nanoid";var h=u("0123456789abcdefghijklmnopqrstuvwxyz",12);import{defaultsDeep as w,merge as f}from"lodash-es";import{EventEmitter as d}from"events-ex";import{CommonError as p}from"@isdk/common-error";import{Configuration as y,RequestQueue as m}from"crawlee";function g(){let t=()=>{};const e=new Promise(e=>{t=e});return e.release=t,e}y.getGlobalConfig().set("persistStorage",!1);var v=class{constructor(){this.hdrs={},this.jar=[],this.pendingRequests=new Map,this.requestCounter=0,this.actionEmitter=new d,this.isPageActive=!1,this.navigationLock=function(){const t=g();return t.release(),t}(),this.blockedTypes=new Set}static register(t){const e=t.id;if(!e)throw new Error("Engine must define static id");if(this.registry.has(e))throw new Error(`Engine id duplicated: ${e}`);this.registry.set(e,t)}static get(t){return this.registry.get(t)}static getByMode(t){for(const[e,s]of this.registry.entries())if(s.mode===t)return s}static async create(e,s){const i=w(s,e,t),r=i.engine??e.engine,n=r?this.get(r)??this.getByMode(r):null;if(n){const t=new n;return await t.initialize(e,i),t}}async _extract(t,e){const s=t.type;if(!e)return"array"===s?[]:null;if("object"===s){const{selector:s,properties:i}=t;let r=e;if(s){const t=await this._querySelectorAll(e,s);r=t.length>0?t[0]:null}if(!r)return null;const n={};for(const t in i)n[t]=await this._extract(i[t],r);return n}if("array"===s){const{selector:s,items:i}=t,r=s?await this._querySelectorAll(e,s):[e],n=[];for(const t of r)n.push(await this._extract(i,t));return n}const{selector:i}=t;let r=e;if(i){const t=await this._querySelectorAll(e,i);r=t.length>0?t[0]:null}return r?this._extractValue(t,r):null}async buildResponse(t){const e=await this._buildResponse(t),s=e.headers["content-type"]||"";return e.contentType=s.split(";")[0].trim(),e}waitFor(t){return this.dispatchAction({type:"waitFor",options:t})}click(t){return this.dispatchAction({type:"click",selector:t})}fill(t,e){return this.dispatchAction({type:"fill",selector:t,value:e})}submit(t,e){return this.dispatchAction({type:"submit",selector:t,options:e})}pause(t){return this.dispatchAction({type:"pause",message:t})}extract(t){const e=this._normalizeSchema(t);return this.dispatchAction({type:"extract",schema:e})}_normalizeSchema(t){const e=JSON.parse(JSON.stringify(t));if(e.properties)for(const t in e.properties)e.properties[t]=this._normalizeSchema(e.properties[t]);if(e.items&&(e.items=this._normalizeSchema(e.items)),"array"===e.type&&(e.attribute&&!e.items&&(e.items={attribute:e.attribute},delete e.attribute),e.items||(e.items={type:"string"})),e.selector&&(e.has||e.exclude)){const{selector:t,has:s,exclude:i}=e,r=t.split(",").map(t=>{let e=t.trim();return s&&(e=`${e}:has(${s})`),i&&(e=`${e}:not(${i})`),e}).join(", ");e.selector=r,delete e.has,delete e.exclude}return e}get id(){return this.constructor.id}async getState(){return{cookies:await this.cookies()}}get mode(){return this.constructor.mode}get context(){return this.ctx}async initialize(t,e){if(this.ctx)return;f(t,e),this.ctx=t,this.opts=t,this.hdrs=function(t){const e={};if(t&&"object"==typeof t)for(const[s,i]of Object.entries(t))e[s.toLowerCase()]=i;return e}(t.headers),this.jar=[...t.cookies??[]],t.internal||(t.internal={}),t.internal.engine=this,t.engine=this.mode,this.actionEmitter.setMaxListeners(100),this.requestQueue=await m.open();const s=await this._getSpecificCrawlerOptions(t),i={...w(s,{requestQueue:this.requestQueue,maxConcurrency:1,minConcurrency:1,useSessionPool:!0,persistCookiesPerSession:!0,sessionPoolOptions:{maxPoolSize:1,persistenceOptions:{enable:!1},sessionOptions:{maxUsageCount:1e3,maxErrorScore:3}}}),requestHandler:this._requestHandler.bind(this),errorHandler:this._failedRequestHandler.bind(this),failedRequestHandler:this._failedRequestHandler.bind(this)};this.crawler=this._createCrawler(i),this.crawler.run().then(()=>{this.isCrawlerReady=!0}).catch(t=>{this.isCrawlerReady=!1,console.error("Crawler background error:",t)})}async cleanup(){await(this._cleanup?.()),await this._commonCleanup();const t=this.ctx;t&&t.internal?.engine===this&&(t.internal.engine=void 0),this.ctx=void 0,this.opts=void 0}async _executePendingActions(t){await new Promise(e=>{const s=async({action:e,resolve:s,reject:i})=>{try{if("dispose"===e.type)return this.actionEmitter.emit("dispose"),void s();s(await this.executeAction(t,e))}catch(t){i(t)}};this.actionEmitter.on("dispatch",s),this.actionEmitter.once("dispose",()=>{this.actionEmitter.removeListener("dispatch",s),e()})})}async _sharedRequestHandler(t){try{const{request:e}=t;this.isPageActive=!0;const s=this.pendingRequests.get(e.userData.requestId);if(s){const i=await this._buildResponse(t),r=!i.statusCode||i.statusCode>=400;if(this.ctx?.throwHttpErrors&&r){const t=new p(`Request for ${i.finalUrl} failed with status ${i.statusCode||"N/A"}`,"request",i.statusCode);s.reject(t)}else this.lastResponse=i,s.resolve(i);this.pendingRequests.delete(e.userData.requestId)}await this._executePendingActions(t)}finally{this.isPageActive=!1,this.navigationLock.release()}}async _sharedFailedRequestHandler(t,e){const{request:s}=t,i=this.pendingRequests.get(s.userData.requestId);if(i&&e&&this.ctx?.throwHttpErrors){this.pendingRequests.delete(s.userData.requestId);const t=e.response,r=t?.statusCode||500,n=t?.url?t.url:s.url,o=new p(`Request${n?" for "+n:""} failed: ${e.message}`,"request",r);i.reject(o)}return this._sharedRequestHandler(t)}async dispatchAction(t){if(!this.isPageActive)throw new Error("No active page. Call goto() before performing actions.");return new Promise((e,s)=>{this.actionEmitter.emit("dispatch",{action:t,resolve:e,reject:s})})}async _requestHandler(t){await this._sharedRequestHandler(t)}async _failedRequestHandler(t,e){await this._sharedFailedRequestHandler(t,e)}async _commonCleanup(){if(this.isPageActive&&await this.dispatchAction({type:"dispose"}).catch(()=>{}),this.pendingRequests.size>0)for(const[,t]of this.pendingRequests)t.reject(new Error("Cleanup:Request cancelled"));if(this.actionEmitter.removeAllListeners(),this.crawler){try{await(this.crawler.teardown?.())}catch(t){console.error("ccrawler teardown error:",t)}this.crawler=void 0}this.isCrawlerReady=void 0,this.requestQueue&&(await this.requestQueue.drop(),this.requestQueue=void 0),this.pendingRequests.clear()}async blockResources(t,e){return e&&this.blockedTypes.clear(),t.forEach(t=>this.blockedTypes.add(t)),t.length}getContent(){return this.lastResponse?Promise.resolve(this.lastResponse):Promise.reject(new Error("No content fetched yet. Call goto() first."))}async headers(t,e){if(void 0===t)return{...this.hdrs};if("string"==typeof t&&void 0===e)return this.hdrs[t.toLowerCase()]||"";if(null!==t&&"object"==typeof t){const s={};for(const[e,i]of Object.entries(t))s[e.toLowerCase()]=String(i);return this.hdrs=!0===e?s:{...this.hdrs,...s},!0}return"string"==typeof t&&("string"==typeof e?this.hdrs[t.toLowerCase()]=e:null===e&&delete this.hdrs[t.toLowerCase()],!0)}async cookies(t){return Array.isArray(t)?(this.jar=[...t],!0):null===t?(this.jar=[],!0):[...this.jar]}async dispose(){await this.cleanup()}};async function x(t,e){const s=function(t,e){if(!t||!e?.length)return null;const s=new URL(t);let i=e.find(t=>t.domain===s.hostname);i||(i=e.find(t=>s.hostname.endsWith(t.domain)));if(!i)return null;if(i.pathScope?.length){if(!i.pathScope.some(t=>s.pathname.startsWith(t)))return null}return i}(e?.url||t.url,t.sites),i=t.engine||s?.engine||"auto";let r=await v.create(t,{engine:i});return r||(r=await v.create(t,{engine:"http"})),r}v.registry=new Map;var b=class{constructor(t={}){this.options=t,this.closed=!1,this.id=h(),this.context=this.createContext(t)}async execute(t){await this.ensureEngine(t);const e=r.create(t);if(!e)throw new Error(`Unknown action: ${t.id||t.name}`);let s,i;this.context.internal.actionIndex=(this.context.internal.actionIndex||0)+1,this.context.currentAction={...t,index:this.context.internal.actionIndex,startedAt:Date.now()};try{return s=await e.execute(this.context,t),s}catch(t){throw i=t,i}finally{this.context.currentAction=void 0}}async executeAll(t){let e=0;try{for(;e<t.length;){const s=t[e];await this.execute(s),e++}const s=await this.execute({id:"getContent"});return{result:s?.result,outputs:this.getOutputs()}}catch(t){throw t.actionIndex=e,t}}getOutputs(){return this.context.outputs}async getState(){return this.context.internal.engine?.getState()}async dispose(){if(this.closed)return;const t=this.context.eventBus;t.emit("session:closing",{sessionId:this.id});try{await(this.context.internal.engine?.dispose())}finally{this.closed=!0}t.emit("session:closed",{sessionId:this.id})}async ensureEngine(t){if(this.closed)throw new Error("Session is closed");if(!this.context.internal.engine){const e=t?.params?.url??this.context.url;if(!await x(this.context,{url:e}))throw new Error("No engine found")}}createContext(e=this.options){const s=new c;return l({...e,id:this.id,eventBus:s,outputs:{},internal:{},execute:async t=>this.execute(t),action:async function(t,e,s){return this.execute({name:t,params:e,...s})}},t)}},E=class{constructor(t={}){this.defaults=t}async createSession(t){const e={...this.defaults,...t||{}};return new b(e)}async fetch(t,e){"string"!=typeof t&&(t=(e=t).url);const s=await this.createSession(e);try{const i=e?.actions||[];t&&0!==i.findIndex(e=>"goto"===e.id&&e.params?.url===t)&&i.unshift({id:"goto",params:{url:t}});return await s.executeAll(i)}finally{await s.dispose()}}};import{CheerioCrawler as k,ProxyConfiguration as S}from"crawlee";import*as q from"cheerio";import{CommonError as C,ErrorCode as $,NotFoundError as R}from"@isdk/common-error";var P=class extends v{async _buildResponse(t){const{request:e,response:s,body:i,$:r}=t,n=r?.html();let o="string"==typeof i?i:Buffer.isBuffer(i)?i.toString("utf-8"):String(i??"");n&&n!==o&&(o=n);let a=s?.headers;if(!a&&s?.rawHeaders){a={};const t=s.rawHeaders;for(let e=0;e<t.length;e+=2)a[t[e].toLowerCase()]=t[e+1]}return{url:e.url,finalUrl:e.loadedUrl||e.url,statusCode:s?.statusCode??200,statusText:s?.statusMessage,headers:a||{},cookies:t.session?.getCookies(e.url),body:i,html:o,text:o}}async _querySelectorAll(t,e){const{$:s,el:i}=t;return i.find(e).toArray().map(t=>({$:s,el:s(t)}))}async _extractValue(t,e){const{el:s}=e,{attribute:i,type:r="string"}=t;if(0===s.length)return null;let n="";if(n=i?s.attr(i)??null:"html"===r?s.html():s.text().trim(),null===n)return null;switch(r){case"number":return parseFloat(n.replace(/[^0-9.-]+/g,""))||null;case"boolean":const t=n.toLowerCase();return"true"===t||"1"===t;default:return n}}async executeAction(t,e){const{$:s}=t;switch(e.type){case"dispose":return;case"extract":if(!s)throw new C(`Cheerio context not available for action: ${e.type}`,"extract");return this._extract(e.schema,{$:s,el:s.root()});case"click":{if(!s)throw new C(`Cheerio context not available for action: ${e.type}`,"click");const i=e.selector,r=s(i).first();let n;if(0===r.length)try{n=new URL(i,t.request.loadedUrl||t.request.url).href}catch{throw new C(`click: selector not found or invalid URL: ${i}`,"click")}else{if(!r.is("a")||!r.attr("href")){if(r.is('input[type="submit"], button[type="submit"], button, input')){const e=r.closest("form");if(e.length)return this.executeAction(t,{type:"submit",selector:e});throw new C("click: submit-like element without form","click")}throw new C(`click: unsupported element for http simulate. Selector: ${i}`,"click")}{const e=r.attr("href");n=new URL(e,t.request.loadedUrl||t.request.url).href}}const o=await t.sendRequest({url:n});return void this._updateStateAfterNavigation(t,o)}case"fill":{if(!s)throw new C(`Cheerio context not available for action: ${e.type}`),"fill";const i=s(e.selector).first();if(0===i.length)throw new C(`fill: selector not found: ${e.selector}`);if(!i.is("input, textarea, select"))throw new C(`fill: not a form field: ${e.selector}`);return i.val(e.value),void(this.lastResponse=await this.buildResponse(t))}case"waitFor":return void(e.options?.ms&&await new Promise(t=>setTimeout(t,e.options.ms)));case"pause":const i=this.ctx?.onPause;return void(i?(console.info(e.message||"Execution paused for manual intervention."),await i({message:e.message}),console.info("Resuming execution...")):console.warn("[PauseAction] was called, but no `onPause` handler was provided in fetchWeb options. Skipped."));case"submit":{if(!s)throw new C(`Cheerio context not available for action: ${e.type}`,"submit");const i="string"==typeof e.selector?s(e.selector).first():null!=e.selector?e.selector:s("form").first();if(0===i.length)throw new R(e.selector,"submit");const r=i.attr("action")||t.request.loadedUrl||t.request.url,n=(i.attr("method")||"GET").toUpperCase(),o=new URL(r,t.request.loadedUrl||t.request.url).href,a={};let c;if(i.find("input, select, textarea").each((t,e)=>{const i=s(e),r=i.attr("name");if(!r)return;const n=i.val();null!=n&&(a[r]=String(n))}),"GET"===n){const e=new URL(o);Object.entries(a).forEach(([t,s])=>e.searchParams.set(t,s)),c=await t.sendRequest({url:e.href,method:"GET"})}else{let s;const r={};"application/json"===(e.options?.enctype||i.attr("enctype")||"application/x-www-form-urlencoded")?(s=JSON.stringify(a),r["Content-Type"]="application/json"):(s=new URLSearchParams(a).toString(),r["Content-Type"]="application/x-www-form-urlencoded"),c=await t.sendRequest({url:o,method:"POST",body:s,headers:r})}return void this._updateStateAfterNavigation(t,c)}case"getContent":return this.buildResponse(t);default:throw new C(`Unknown action type: ${e.type}`,"CheerioFetchEngine.executeAction",$.NotSupported)}}_updateStateAfterNavigation(t,e){const s=e;let i=s.headers;if(!i&&s.rawHeaders){i={};for(let t=0;t<s.rawHeaders.length;t+=2)i[s.rawHeaders[t].toLowerCase()]=s.rawHeaders[t+1]}i=i||{};const r=s.body,n=q.load(r??"");t.$=n,t.response=s,t.body=r;const o=n.html(),a=n.text(),c=(i["content-type"]||"").split(";")[0].trim();this.lastResponse={url:t.request.url,finalUrl:s.url,statusCode:s.statusCode,statusText:s.statusMessage,headers:i,contentType:c,body:r,html:o,text:a}}_createCrawler(t){return new k(t)}_getSpecificCrawlerOptions(t){const e=this.opts?.proxy?"string"==typeof this.opts.proxy?[this.opts.proxy]:this.opts.proxy:void 0,s=e?.length?new S({proxyUrls:e}):void 0;return{additionalMimeTypes:["text/plain"],maxRequestRetries:1,requestHandlerTimeoutSecs:t.requestHandlerTimeoutSecs,proxyConfiguration:s,preNavigationHooks:[({session:e,request:s},i)=>{if(i.throwHttpErrors=t.throwHttpErrors,this.opts?.timeoutMs&&(i.timeout={request:this.opts.timeoutMs}),this.jar.length>0&&e)for(const t of this.jar){const i=`${t.name}=${t.value}`;e.setCookie(i,s.url)}}]}}async goto(t,e){this.isPageActive&&this.dispatchAction({type:"dispose"}).catch(()=>{});const s="req-"+ ++this.requestCounter,i=new Promise((t,i)=>{const r=e?.timeoutMs||this.opts?.timeoutMs||3e4,n=setTimeout(()=>{this.pendingRequests.delete(s),this.navigationLock.release(),i(new C(`goto timed out after ${r}ms.`,"gotoTimeout",$.RequestTimeout))},r);this.pendingRequests.set(s,{resolve:e=>{clearTimeout(n),t(e)},reject:t=>{clearTimeout(n),i(t)}})});return this.requestQueue.addRequest({...e,url:t,headers:{...this.hdrs,...e?.headers},userData:{requestId:s},uniqueKey:`${t}-${s}`}).catch(t=>{const e=this.pendingRequests.get(s);e&&(this.pendingRequests.delete(s),this.navigationLock.release(),e.reject(t))}),await this.navigationLock,this.navigationLock=g(),i}};P.id="cheerio",P.mode="http",v.register(P);import{PlaywrightCrawler as T}from"crawlee";import{firefox as A}from"playwright";import{launchOptions as U}from"camoufox-js";import{CommonError as _,ErrorCode as j,NotFoundError as O}from"@isdk/common-error";var F=class extends v{async _buildResponse(t){const{page:e,response:s,request:i}=t;if(!e||e.isClosed())return{url:i.url,finalUrl:i.loadedUrl||i.url,statusCode:s?.status(),statusText:s?.statusText(),headers:await(s?.allHeaders())||{},cookies:[],body:"",html:"",text:""};const r=await e.content(),n=await e.textContent("body");return{url:e.url(),finalUrl:e.url(),statusCode:s?.status(),statusText:s?.statusText(),headers:await(s?.allHeaders())||{},cookies:await e.context().cookies(),body:r,html:r,text:n||""}}async _querySelectorAll(t,e){return t.locator(e).all()}async _extractValue(t,e){const{attribute:s,type:i="string"}=t;if(0===await e.count())return null;let r="";if(r=s?await e.getAttribute(s):"html"===i?await e.innerHTML():await e.textContent(),null===r)return null;switch(r=r.trim(),i){case"number":return parseFloat(r.replace(/[^0-9.-]+/g,""))||null;case"boolean":const t=r.toLowerCase();return"true"===t||"1"===t;default:return r}}async executeAction(t,e){const{page:s}=t,i=this.opts?.timeoutMs||3e4;switch(e.type){case"navigate":{const i=await s.goto(e.url,{waitUntil:e.opts?.waitUntil||"domcontentloaded",timeout:this.opts?.timeoutMs||3e4});i&&(t={...t,response:i});const r=await this.buildResponse(t);return this.lastResponse=r,r}case"extract":{const i=await this._extract(e.schema,s.locator("body"));return this.lastResponse=await this.buildResponse(t),i}case"click":{await s.click(e.selector,{timeout:i}),await s.waitForLoadState("networkidle",{timeout:i});const r=await this.buildResponse(t);return void(this.lastResponse=r)}case"fill":await s.fill(e.selector,e.value,{timeout:i});const r=await this.buildResponse(t);return void(this.lastResponse=r);case"waitFor":try{e.options?.selector&&await s.waitForSelector(e.options.selector,{timeout:i}),e.options?.networkIdle&&await s.waitForLoadState("networkidle",{timeout:i})}catch(t){if(!1!==e.options?.failOnTimeout)throw t}return void(e.options?.ms&&await s.waitForTimeout(e.options.ms));case"submit":{const r=e.selector||"form",n=s.locator(r).first();if(0===await n.count())throw new O(r,"submit");if("application/json"===(e.options?.enctype||"application/x-www-form-urlencoded")){const t=await n.elementHandle();if(!t)throw new _(`submit: could not get form handle for ${r}`,"submit");const e=await t.evaluate(async t=>{const e=new FormData(t),s={};e.forEach((t,e)=>{s[e]=t.toString()});const i=await fetch(t.action,{method:t.method,headers:{"Content-Type":"application/json"},body:JSON.stringify(s)}),r=await i.text();return{status:i.status,statusText:i.statusText,headers:Object.fromEntries(i.headers.entries()),body:r,html:r,text:r,url:t.action,finalUrl:i.url}});return await t.dispose(),await s.setContent(e.html),void(this.lastResponse=e)}return await n.evaluate(t=>t.submit()),await s.waitForLoadState("networkidle",{timeout:i}),void(this.lastResponse=await this.buildResponse(t))}case"pause":{const t=this.ctx?.onPause;return void(t?(console.info(e.message||"Execution paused for manual intervention."),await t({message:e.message}),console.info("Resuming execution...")):console.warn("[PauseAction] was called, but no `onPause` handler was provided in fetchWeb options. Skipped."))}case"getContent":return this.buildResponse(t);default:throw new _(`Unknown action type: ${e.type}`,"PlaywrightFetchEngine.executeAction",j.NotSupported)}}_createCrawler(t){return new T(t)}async _getSpecificCrawlerOptions(t){const e=t.browser?.headless??!0,s={maxRequestRetries:t.retries||3,headless:e,requestHandlerTimeoutSecs:t.requestHandlerTimeoutSecs,preNavigationHooks:[async({page:e,request:s},i)=>{if(i.throwHttpErrors=t.throwHttpErrors,this.jar.length>0)try{const t=this.jar.map(t=>{const e={...t};return e.domain||e.url||(e.url=s.url),"no_restriction"===e.sameSite&&(e.sameSite="None"),e});await e.context().addCookies(t)}catch(t){console.error("[Playwright] Failed to restore cookies:",t)}const r=this.blockedTypes;r.size>0&&await e.route("**/*",t=>{r.has(t.request().resourceType())?t.abort():t.continue()})}]};if(this.opts?.antibot){s.browserPoolOptions={useFingerprints:!1};const t=await U({headless:e});s.launchContext={launcher:A,launchOptions:t},s.postNavigationHooks=[async({page:t,handleCloudflareChallenge:e})=>{await e()}]}return s}async goto(t,e){if(this.isPageActive)return this.dispatchAction({type:"navigate",url:t,opts:e});if(!this.requestQueue)throw new _("RequestQueue not initialized","goto");const s="req-"+ ++this.requestCounter,i=new Promise((t,e)=>{this.pendingRequests.set(s,{resolve:t,reject:e})});return await this.requestQueue.addRequest({url:t,headers:this.hdrs,userData:{requestId:s,waitUntil:e?.waitUntil||"domcontentloaded"},uniqueKey:`${t}-${s}`}),i}};F.id="playwright",F.mode="browser",v.register(F);var N=class extends r{async onExecute(t,e){const{selector:s,...i}=e?.params||{};if(!s)throw new Error("Selector is required for click action");await this.delegateToEngine(t,"click",s,i)}};N.id="click",N.returnType="none",N.capabilities={http:"simulate",browser:"native"},r.register(N);var H=class extends r{async onExecute(t,e){const{selector:s,value:i,...r}=e?.params||{};if(!s)throw new Error("Selector is required for fill action");if(void 0===i)throw new Error("Value is required for fill action");await this.delegateToEngine(t,"fill",s,i,r)}};H.id="fill",H.returnType="none",H.capabilities={http:"simulate",browser:"native"},r.register(H);var L=class extends r{async onExecute(t,e){return await this.delegateToEngine(t,"getContent",e?.params)}};L.id="getContent",L.returnType="response",L.capabilities={http:"native",browser:"native"},r.register(L);var M=class extends r{async onExecute(t,e,s){const i=e?.params,r=i?.url||t.url;if(!r)throw new Error("URL is required for goto action");const n=t.internal.engine;if(!n)throw new Error("No engine available");t.url=r;return await n.goto(r,i)}};M.id="goto",M.returnType="response",M.capabilities={http:"native",browser:"native"},r.register(M);var z=class extends r{async onExecute(t,e){const{selector:s,...i}=e?.params||{};await this.delegateToEngine(t,"submit",s,i)}};z.id="submit",z.returnType="none",z.capabilities={http:"simulate",browser:"native"},r.register(z);var D=class extends r{async onExecute(t,e){const s=t.internal.engine;if(!s)throw new Error("No engine available");await s.waitFor(e?.params)}};D.id="waitFor",D.returnType="none",D.capabilities={http:"native",browser:"native"},r.register(D);var B=class extends r{async onExecute(t,e){const s=e?.params;if(!s)throw new Error("Schema is required for extract action");return this.delegateToEngine(t,"extract",s)}};B.id="extract",B.returnType="any",B.capabilities={http:"native",browser:"native"},r.register(B);var G=class extends r{async onExecute(t,e){const{selector:s,message:i,attribute:r}=e?.params||{},n=t.internal.engine;if("browser"===n?.mode){if(s){if(!await(n?.extract({selector:s,attribute:r})))return}n&&"pause"in n?await n.pause(i):console.warn("[PauseAction] was called, but the current engine does not support `pause`. Skipped.")}else console.warn("[PauseAction] can only run in browser engine. Skipped.")}};async function I(t,e){return(new E).fetch(t,e)}G.id="pause",G.capabilities={http:"native",browser:"native"},G.returnType="none",r.register(G);export{P as CheerioFetchEngine,N as ClickAction,t as DefaultFetcherProperties,B as ExtractAction,r as FetchAction,s as FetchActionResultStatus,v as FetchEngine,b as FetchSession,e as FetcherOptionKeys,H as FillAction,L as GetContentAction,M as GotoAction,G as PauseAction,F as PlaywrightFetchEngine,z as SubmitAction,D as WaitForAction,E as WebFetcher,I as fetchWeb};
|
|
1
|
+
var t={engine:"auto",enableSmart:!0,useSiteRegistry:!0,antibot:!1,headers:{},cookies:[],reuseCookies:!0,throwHttpErrors:void 0,proxy:[],blockResources:[],ignoreSslErrors:!0,browser:{engine:"playwright",headless:!0,waitUntil:"domcontentloaded"},http:{method:"GET"},timeoutMs:6e4,requestHandlerTimeoutSecs:void 0,maxConcurrency:1,maxRequestsPerMinute:1e3,delayBetweenRequestsMs:0,retries:0,sites:[]},e=Object.keys(t).concat(["actions","onPause"]),i=(t=>(t[t.Failed=0]="Failed",t[t.Success=1]="Success",t[t.Skipped=2]="Skipped",t))(i||{}),s=class t{static register(t){const e=t.id;if(!e)throw new Error("FetchAction.register: actionClass.id is required");this.registry.set(e,t)}static get(t){return this.registry.get(t)}static create(e){const i="string"==typeof e?e:e.id||e.name||e.action;if(!i)throw new Error("Action must have id, name or action");const s=i instanceof t?i.constructor:this.registry.get(i);return s?new s:void 0}static has(t){return this.registry.has(t)}static list(){return Array.from(this.registry.keys())}static getCapability(t){return this.capabilities[t]??"noop"}getCapability(t){return this.constructor.getCapability(t)}get id(){return this.constructor.id}get returnType(){return this.constructor.returnType}get capabilities(){return this.constructor.capabilities}async delegateToEngine(t,e,...i){const s=t.internal.engine;if(!s)throw new Error("No engine available");if("function"!=typeof s[e])throw new Error(`Engine does not have a method named '${String(e)}'`);return await s[e](...i)}installCollectors(e,i){const s=i?.collectors;if(!s?.length)return;const r=[],c=new Set;for(const i of s){const s=n(i.activateOn),l=n(i.collectOn),h=n(i.deactivateOn),u=!(i.background??!0),w=t.create(i);if(!w)continue;let f=!1,d=!1,p=0;const y=async t=>{if(!f&&!d){f=!0;try{await(w.onBeforeExec?.(e,i))}catch(t){e.eventBus.emit("collector:error",{action:this.id,collector:w.id,phase:"before",error:t})}}},m=async(t,s)=>{if(!d){f||await y(s);try{const r=Promise.resolve(w.onExecute?.(e,i,s)).then(s=>{var r,n;if(i.storeAs){((r=e.outputs)[n=i.storeAs]||(r[n]=[])).push(s)}return e.eventBus.emit("collector:result",{action:this.id,collector:i.id||i.name,event:t,result:s}),s}).catch(s=>{e.eventBus.emit("collector:error",{action:this.id,collector:i.id||i.name,event:t,phase:"exec",error:s})}).finally(()=>{p++});u&&(c.add(r),r.finally(()=>c.delete(r)))}catch(i){e.eventBus.emit("collector:error",{action:this.id,collector:w.id,event:t,phase:"exec",error:i})}}},g=async()=>{if(!d){0===p&&m("collector:after"),d=!0;try{await(w.onAfterExec?.(e,i))}catch(t){e.eventBus.emit("collector:error",{action:this.id,collector:i.id||i.name,phase:"after",error:t})}finally{e.eventBus.emit("collector:end",{action:this.id,collector:i.id||i.name}),x.forEach(t=>t())}}},v=o(e,s,y),x=a(e,l,m),b=o(e,h,g);if(r.push(...v,...x,...b),!s.length&&!l.length&&!h.length){const t=()=>{g()};e.eventBus.once(`action:${this.id}.end`,t),r.push(()=>e.eventBus.off("fetcher:action:end",t))}}return r.length||c.size>0?{cleanup:()=>r.forEach(t=>t()),awaitExecPendings:async()=>{c.size>0&&await Promise.allSettled(Array.from(c))}}:void 0}async beforeExec(t,e){t.internal.actionStack||(t.internal.actionStack=[]);const i=t.internal.actionStack,s=i.length,r=i.length>0?i[i.length-1].id:void 0,n={...e,id:this.id,depth:s,parent:r};i.push(n),t.currentAction=n;const o={action:this,context:t,options:e,depth:s,stack:[...i]};t.eventBus.emit(`action:${this.id}.start`,o),t.eventBus.emit("action:start",o),await(this.onBeforeExec?.(t,e));return{entry:o,collectors:this.installCollectors(t,e)}}async afterExec(t,e,i,s){const r=t.internal.actionStack,n=r.length-1,o=s?.collectors;try{await(o?.awaitExecPendings()),t.lastResult=i,"response"!==i?.returnType||i.error||(t.lastResponse=i.result),e?.storeAs&&(t.outputs[e.storeAs]=i?.result),i?.error&&(t.currentAction.error=i.error),await(this.onAfterExec?.(t,e));const s={action:this,context:t,options:e,result:i,depth:n,stack:[...r]};i?.error&&(s.error=i.error);try{t.eventBus.emit(`action:${this.id}.end`,s)}catch(t){}try{t.eventBus.emit("action:end",s)}catch(t){}}finally{try{o?.cleanup()}finally{r.pop();const e=r.length;t.currentAction=e>0?r[e-1]:void 0}}}async execute(t,e){e?.args&&!e.params&&(e.params=e.args);const i=await this.beforeExec(t,e),s=e?.failOnError??!0;let r;try{return t.throwHttpErrors=s,r=await this.onExecute(t,e),r&&r.returnType||(r={status:1,returnType:this.returnType??"any",result:r}),r}catch(e){if(r={status:0,error:e,meta:{id:this.id,engineType:t.engine,capability:this.getCapability(t.engine)}},s)throw e;return r}finally{await this.afterExec(t,e,r,i)}}};s.registry=new Map,s.returnType="any",s.capabilities={http:"noop",browser:"noop"};var r=s;function n(t){return t?Array.isArray(t)?t:[t]:[]}function o(t,e,i){const s=[];for(const r of e)if("string"==typeof r||r instanceof RegExp){const e=(...t)=>{i(t[0])};t.eventBus.once(r,e),s.push(()=>t.eventBus.off(r,e))}return s}function a(t,e,i){const s=[];for(const r of e)if("string"==typeof r||r instanceof RegExp){const e=t=>i(r,t);t.eventBus.on(r,e),s.push(()=>t.eventBus.off(r,e))}return s}import{EventEmitter as c}from"events-ex";import{defaultsDeep as l}from"lodash-es";import{customAlphabet as h}from"nanoid";var u=h("0123456789abcdefghijklmnopqrstuvwxyz",12);import{defaultsDeep as w,merge as f}from"lodash-es";import{EventEmitter as d}from"events-ex";import{CommonError as p}from"@isdk/common-error";import{Configuration as y,KeyValueStore as m,PERSIST_STATE_KEY as g,RequestQueue as v}from"crawlee";function x(){let t=()=>{};const e=new Promise(e=>{t=e});return e.release=t,e}y.getGlobalConfig().set("persistStorage",!1);var b=class{constructor(){this.hdrs={},this._initializedSessions=new Set,this.pendingRequests=new Map,this.requestCounter=0,this.actionEmitter=new d,this.isPageActive=!1,this.navigationLock=function(){const t=x();return t.release(),t}(),this.blockedTypes=new Set}static register(t){const e=t.id;if(!e)throw new Error("Engine must define static id");if(this.registry.has(e))throw new Error(`Engine id duplicated: ${e}`);this.registry.set(e,t)}static get(t){return this.registry.get(t)}static getByMode(t){for(const[e,i]of this.registry.entries())if(i.mode===t)return i}static async create(e,i){const s=w(i,e,t),r=s.engine??e.engine,n=r?this.get(r)??this.getByMode(r):null;if(n){const t=new n;return await t.initialize(e,s),t}}async _extract(t,e){const i=t.type;if(!e)return"array"===i?[]:null;if("object"===i){const{selector:i,properties:s}=t;let r=e;if(i){const t=await this._querySelectorAll(e,i);r=t.length>0?t[0]:null}if(!r)return null;const n={};for(const t in s)n[t]=await this._extract(s[t],r);return n}if("array"===i){const{selector:i,items:s}=t,r=i?await this._querySelectorAll(e,i):[e],n=[];for(const t of r)n.push(await this._extract(s,t));return n}const{selector:s}=t;let r=e;if(s){const t=await this._querySelectorAll(e,s);r=t.length>0?t[0]:null}return r?this._extractValue(t,r):null}async buildResponse(t){const e=await this._buildResponse(t),i=e.headers["content-type"]||"";return e.contentType=i.split(";")[0].trim(),e}waitFor(t){return this.dispatchAction({type:"waitFor",options:t})}click(t){return this.dispatchAction({type:"click",selector:t})}fill(t,e){return this.dispatchAction({type:"fill",selector:t,value:e})}submit(t,e){return this.dispatchAction({type:"submit",selector:t,options:e})}pause(t){return this.dispatchAction({type:"pause",message:t})}extract(t){const e=this._normalizeSchema(t);return this.dispatchAction({type:"extract",schema:e})}_normalizeSchema(t){const e=JSON.parse(JSON.stringify(t));if(e.properties)for(const t in e.properties)e.properties[t]=this._normalizeSchema(e.properties[t]);if(e.items&&(e.items=this._normalizeSchema(e.items)),"array"===e.type&&(e.attribute&&!e.items&&(e.items={attribute:e.attribute},delete e.attribute),e.items||(e.items={type:"string"})),e.selector&&(e.has||e.exclude)){const{selector:t,has:i,exclude:s}=e,r=t.split(",").map(t=>{let e=t.trim();return i&&(e=`${e}:has(${i})`),s&&(e=`${e}:not(${s})`),e}).join(", ");e.selector=r,delete e.has,delete e.exclude}return e}get id(){return this.constructor.id}async getState(){return{cookies:await this.cookies(),sessionState:await(this.crawler?.sessionPool?.getState())}}get mode(){return this.constructor.mode}get context(){return this.ctx}async initialize(t,e){if(this.ctx)return;f(t,e),this.ctx=t,this.opts=t,this.hdrs=function(t){const e={};if(t&&"object"==typeof t)for(const[i,s]of Object.entries(t))e[i.toLowerCase()]=s;return e}(t.headers),this._initialCookies=[...t.cookies??[]],t.internal||(t.internal={}),t.internal.engine=this,t.engine=this.mode,this.actionEmitter.setMaxListeners(100),this.requestQueue=await v.open();const i=await this._getSpecificCrawlerOptions(t),s=w({persistenceOptions:{enable:!0}},t.sessionPoolOptions,{maxPoolSize:1,sessionOptions:{maxUsageCount:1e3,maxErrorScore:3}});t.sessionState&&t.cookies&&t.cookies.length>0&&console.warn('[FetchEngine] Warning: Both "sessionState" and "cookies" are provided. Explicit "cookies" will override any conflicting cookies restored from "sessionState".');const r={...w(i,{requestQueue:this.requestQueue,maxConcurrency:1,minConcurrency:1,useSessionPool:!0,persistCookiesPerSession:!0,sessionPoolOptions:s}),requestHandler:this._requestHandler.bind(this),errorHandler:this._failedRequestHandler.bind(this),failedRequestHandler:this._failedRequestHandler.bind(this)};r.preNavigationHooks||(r.preNavigationHooks=[]),r.preNavigationHooks.unshift(({crawler:t,session:e,request:i},s)=>{if(this.currentSession=e,e&&!this._initializedSessions.has(e.id)){if(this._initialCookies&&this._initialCookies.length>0){const t=this._initialCookies.map(t=>{const e={...t};return"no_restriction"===e.sameSite&&(e.sameSite="None"),e});e.setCookies(t,i.url)}this._initializedSessions.add(e.id)}});const n=this.crawler=this._createCrawler(r),o=await m.open(null,{config:n.config}),a=await o.getValue(g);!t.sessionState||a&&!t.overrideSessionState||await o.setValue(g,t.sessionState),this.crawler.run().then(()=>{this.isCrawlerReady=!0}).catch(t=>{this.isCrawlerReady=!1,console.error("Crawler background error:",t)})}async cleanup(){await(this._cleanup?.()),await this._commonCleanup();const t=this.ctx;t&&t.internal?.engine===this&&(t.internal.engine=void 0),this.ctx=void 0,this.opts=void 0}async _executePendingActions(t){await new Promise(e=>{const i=async({action:e,resolve:i,reject:s})=>{try{if("dispose"===e.type)return this.actionEmitter.emit("dispose"),void i();i(await this.executeAction(t,e))}catch(t){s(t)}};this.actionEmitter.on("dispatch",i),this.actionEmitter.once("dispose",()=>{this.actionEmitter.removeListener("dispatch",i),e()})})}async _sharedRequestHandler(t){const{request:e}=t;try{this.currentSession=t.session,this.isPageActive=!0;const i=this.pendingRequests.get(e.userData.requestId);if(i){const s=await this._buildResponse(t),r=!s.statusCode||s.statusCode>=400;if(this.ctx?.throwHttpErrors&&r){const t=new p(`Request for ${s.finalUrl} failed with status ${s.statusCode||"N/A"}`,"request",s.statusCode);i.reject(t)}else this.lastResponse=s,i.resolve(s);this.pendingRequests.delete(e.userData.requestId)}await this._executePendingActions(t)}finally{if(this.currentSession){const t=this.currentSession.getCookies(e.url);t&&(this._initialCookies=t)}this.isPageActive=!1,this.navigationLock.release()}}async _sharedFailedRequestHandler(t,e){const{request:i}=t,s=this.pendingRequests.get(i.userData.requestId);if(s&&e&&this.ctx?.throwHttpErrors){this.pendingRequests.delete(i.userData.requestId);const t=e.response,r=t?.statusCode||500,n=t?.url?t.url:i.url,o=new p(`Request${n?" for "+n:""} failed: ${e.message}`,"request",r);s.reject(o)}return this._sharedRequestHandler(t)}async dispatchAction(t){if(!this.isPageActive)throw new Error("No active page. Call goto() before performing actions.");return new Promise((e,i)=>{this.actionEmitter.emit("dispatch",{action:t,resolve:e,reject:i})})}async _requestHandler(t){await this._sharedRequestHandler(t)}async _failedRequestHandler(t,e){await this._sharedFailedRequestHandler(t,e)}async _commonCleanup(){if(this._initializedSessions.clear(),this.isPageActive&&await this.dispatchAction({type:"dispose"}).catch(()=>{}),this.pendingRequests.size>0)for(const[,t]of this.pendingRequests)t.reject(new Error("Cleanup:Request cancelled"));if(this.actionEmitter.removeAllListeners(),this.crawler){try{await(this.crawler.teardown?.())}catch(t){console.error("ccrawler teardown error:",t)}this.crawler=void 0}this.isCrawlerReady=void 0,this.requestQueue&&(await this.requestQueue.drop(),this.requestQueue=void 0),this.pendingRequests.clear()}async blockResources(t,e){return e&&this.blockedTypes.clear(),t.forEach(t=>this.blockedTypes.add(t)),t.length}getContent(){return this.lastResponse?Promise.resolve(this.lastResponse):Promise.reject(new Error("No content fetched yet. Call goto() first."))}async headers(t,e){if(void 0===t)return{...this.hdrs};if("string"==typeof t&&void 0===e)return this.hdrs[t.toLowerCase()]||"";if(null!==t&&"object"==typeof t){const i={};for(const[e,s]of Object.entries(t))i[e.toLowerCase()]=String(s);return this.hdrs=!0===e?i:{...this.hdrs,...i},!0}return"string"==typeof t&&("string"==typeof e?this.hdrs[t.toLowerCase()]=e:null===e&&delete this.hdrs[t.toLowerCase()],!0)}async cookies(t){const e=this.lastResponse?.url||"";if(Array.isArray(t))return this.currentSession?this.currentSession.setCookies(t,e):this._initialCookies=[...t],!0;if(null===t)return this.currentSession,this._initialCookies=[],!0;if(this.currentSession){return this.currentSession.getCookies(e)}return[...this._initialCookies||[]]}async dispose(){await this.cleanup()}};async function E(t,e){let i;if(e?.engine){if(i=await b.create(t,{engine:e.engine}),!i)throw new Error(`No engine available for ${e.engine}`);return i}const s=function(t,e){if(!t||!e?.length)return null;const i=new URL(t);let s=e.find(t=>t.domain===i.hostname);s||(s=e.find(t=>i.hostname.endsWith(t.domain)));if(!s)return null;if(s.pathScope?.length){if(!s.pathScope.some(t=>i.pathname.startsWith(t)))return null}return s}(e?.url||t.url,t.sites),r=t.engine||s?.engine||"auto";return i=await b.create(t,{engine:r}),i||(i=await b.create(t,{engine:"http"})),i}b.registry=new Map;var k=class{constructor(t={},e){this.options=t,this.engine=e,this.closed=!1,this.id=u(),this.context=this.createContext(t)}async execute(t){await this.ensureEngine(t);const e=r.create(t);if(!e)throw new Error(`Unknown action: ${t.id||t.name}`);let i,s;this.context.internal.actionIndex=(this.context.internal.actionIndex||0)+1,this.context.currentAction={...t,index:this.context.internal.actionIndex,startedAt:Date.now()};try{return i=await e.execute(this.context,t),i}catch(t){throw s=t,s}finally{this.context.currentAction=void 0}}async executeAll(t){let e=0;try{for(;e<t.length;){const i=t[e];await this.execute(i),e++}const i=await this.execute({id:"getContent"});return{result:i?.result,outputs:this.getOutputs()}}catch(t){throw t.actionIndex=e,t}}getOutputs(){return this.context.outputs}async getState(){return this.context.internal.engine?.getState()}async dispose(){if(this.closed)return;const t=this.context.eventBus;t.emit("session:closing",{sessionId:this.id});try{await(this.context.internal.engine?.dispose())}finally{this.closed=!0}t.emit("session:closed",{sessionId:this.id})}async ensureEngine(t){if(this.closed)throw new Error("Session is closed");if(!this.context.internal.engine){const e=t?.params?.url??this.context.url;if(!await E(this.context,{url:e,engine:this.engine}))throw new Error("No engine found")}}createContext(e=this.options){const i=new c;return l({...e,id:this.id,eventBus:i,outputs:{},internal:{},execute:async t=>this.execute(t),action:async function(t,e,i){return this.execute({name:t,params:e,...i})}},t)}},S=class{constructor(t={}){this.defaults=t}async createSession(t){const e={...this.defaults,...t||{}};return new k(e)}async fetch(t,e){"string"!=typeof t&&(t=(e=t).url);const i=await this.createSession(e);try{const s=e?.actions||[];t&&0!==s.findIndex(e=>"goto"===e.id&&e.params?.url===t)&&s.unshift({id:"goto",params:{url:t}});return await i.executeAll(s)}finally{await i.dispose()}}};import{CheerioCrawler as C,ProxyConfiguration as q}from"crawlee";import*as $ from"cheerio";import{CommonError as R,ErrorCode as P,NotFoundError as T}from"@isdk/common-error";var A=class extends b{_ensureCheerioContext(t){if(!t.$&&t.body){let e="string"==typeof t.body?t.body:Buffer.isBuffer(t.body)?t.body.toString("utf-8"):JSON.stringify(t.body);e.trim().startsWith("<")||(e=`<html><body><pre>${e}</pre></body></html>`),t.$=$.load(e)}}async _buildResponse(t){this._ensureCheerioContext(t);const{request:e,response:i,body:s,$:r}=t,n=r?.html();let o="string"==typeof s?s:Buffer.isBuffer(s)?s.toString("utf-8"):String(s??"");n&&n!==o&&(o=n);let a=i?.headers;if(!a&&i?.rawHeaders){a={};const t=i.rawHeaders;for(let e=0;e<t.length;e+=2)a[t[e].toLowerCase()]=t[e+1]}return{url:e.url,finalUrl:e.loadedUrl||e.url,statusCode:i?.statusCode??200,statusText:i?.statusMessage,headers:a||{},cookies:t.session?.getCookies(e.url),body:s,html:o,text:o}}async _querySelectorAll(t,e){const{$:i,el:s}=t;return s.find(e).toArray().map(t=>({$:i,el:i(t)}))}async _extractValue(t,e){const{el:i}=e,{attribute:s,type:r="string"}=t;if(0===i.length)return null;let n="";if(n=s?i.attr(s)??null:"html"===r?i.html():i.text().trim(),null===n)return null;switch(r){case"number":return parseFloat(n.replace(/[^0-9.-]+/g,""))||null;case"boolean":const t=n.toLowerCase();return"true"===t||"1"===t;default:return n}}async executeAction(t,e){const{$:i}=t;switch(e.type){case"dispose":return;case"extract":if(!i)throw new R(`Cheerio context not available for action: ${e.type}`,"extract");return this._extract(e.schema,{$:i,el:i.root()});case"click":{if(!i)throw new R(`Cheerio context not available for action: ${e.type}`,"click");const s=e.selector,r=i(s).first();let n;if(0===r.length)try{n=new URL(s,t.request.loadedUrl||t.request.url).href}catch{throw new R(`click: selector not found or invalid URL: ${s}`,"click")}else{if(!r.is("a")||!r.attr("href")){if(r.is('input[type="submit"], button[type="submit"], button, input')){const e=r.closest("form");if(e.length)return this.executeAction(t,{type:"submit",selector:e});throw new R("click: submit-like element without form","click")}throw new R(`click: unsupported element for http simulate. Selector: ${s}`,"click")}{const e=r.attr("href");n=new URL(e,t.request.loadedUrl||t.request.url).href}}const o=await t.sendRequest({url:n});return void this._updateStateAfterNavigation(t,o)}case"fill":{if(!i)throw new R(`Cheerio context not available for action: ${e.type}`),"fill";const s=i(e.selector).first();if(0===s.length)throw new R(`fill: selector not found: ${e.selector}`);if(!s.is("input, textarea, select"))throw new R(`fill: not a form field: ${e.selector}`);return s.val(e.value),void(this.lastResponse=await this.buildResponse(t))}case"waitFor":return void(e.options?.ms&&await new Promise(t=>setTimeout(t,e.options.ms)));case"pause":const s=this.ctx?.onPause;return void(s?(console.info(e.message||"Execution paused for manual intervention."),await s({message:e.message}),console.info("Resuming execution...")):console.warn("[PauseAction] was called, but no `onPause` handler was provided in fetchWeb options. Skipped."));case"submit":{if(!i)throw new R(`Cheerio context not available for action: ${e.type}`,"submit");const s="string"==typeof e.selector?i(e.selector).first():null!=e.selector?e.selector:i("form").first();if(0===s.length)throw new T(e.selector,"submit");const r=s.attr("action")||t.request.loadedUrl||t.request.url,n=(s.attr("method")||"GET").toUpperCase(),o=new URL(r,t.request.loadedUrl||t.request.url).href,a={};let c;if(s.find("input, select, textarea").each((t,e)=>{const s=i(e),r=s.attr("name");if(!r)return;const n=s.val();null!=n&&(a[r]=String(n))}),"GET"===n){const e=new URL(o);Object.entries(a).forEach(([t,i])=>e.searchParams.set(t,i)),c=await t.sendRequest({url:e.href,method:"GET"})}else{let i;const r={};"application/json"===(e.options?.enctype||s.attr("enctype")||"application/x-www-form-urlencoded")?(i=JSON.stringify(a),r["Content-Type"]="application/json"):(i=new URLSearchParams(a).toString(),r["Content-Type"]="application/x-www-form-urlencoded"),c=await t.sendRequest({url:o,method:"POST",body:i,headers:r})}return void this._updateStateAfterNavigation(t,c)}case"getContent":return this.buildResponse(t);default:throw new R(`Unknown action type: ${e.type}`,"CheerioFetchEngine.executeAction",P.NotSupported)}}_updateStateAfterNavigation(t,e){const i=e;let s=i.headers;if(!s&&i.rawHeaders){s={};for(let t=0;t<i.rawHeaders.length;t+=2)s[i.rawHeaders[t].toLowerCase()]=i.rawHeaders[t+1]}s=s||{};const r=i.body,n=$.load(r??"");t.$=n,t.response=i,t.body=r;const o=n.html(),a=n.text(),c=(s["content-type"]||"").split(";")[0].trim();this.lastResponse={url:t.request.url,finalUrl:i.url,statusCode:i.statusCode,statusText:i.statusMessage,headers:s,contentType:c,body:r,html:o,text:a}}_createCrawler(t){return new C(t)}_getSpecificCrawlerOptions(t){const e=this.opts?.proxy?"string"==typeof this.opts.proxy?[this.opts.proxy]:this.opts.proxy:void 0,i=e?.length?new q({proxyUrls:e}):void 0;return{additionalMimeTypes:["text/plain"],maxRequestRetries:1,requestHandlerTimeoutSecs:t.requestHandlerTimeoutSecs,proxyConfiguration:i,preNavigationHooks:[({session:e,request:i},s)=>{s.throwHttpErrors=t.throwHttpErrors,this.opts?.timeoutMs&&(s.timeout={request:this.opts.timeoutMs})}]}}async goto(t,e){this.isPageActive&&this.dispatchAction({type:"dispose"}).catch(()=>{});const i="req-"+ ++this.requestCounter,s=new Promise((t,s)=>{const r=e?.timeoutMs||this.opts?.timeoutMs||3e4,n=setTimeout(()=>{this.pendingRequests.delete(i),this.navigationLock.release(),s(new R(`goto timed out after ${r}ms.`,"gotoTimeout",P.RequestTimeout))},r);this.pendingRequests.set(i,{resolve:e=>{clearTimeout(n),t(e)},reject:t=>{clearTimeout(n),s(t)}})});return this.requestQueue.addRequest({...e,url:t,headers:{...this.hdrs,...e?.headers},userData:{requestId:i},uniqueKey:`${t}-${i}`}).catch(t=>{const e=this.pendingRequests.get(i);e&&(this.pendingRequests.delete(i),this.navigationLock.release(),e.reject(t))}),await this.navigationLock,this.navigationLock=x(),s}};A.id="cheerio",A.mode="http",b.register(A);import{PlaywrightCrawler as U}from"crawlee";import{firefox as _}from"playwright";import{CommonError as O,ErrorCode as j,NotFoundError as N}from"@isdk/common-error";var F=class extends b{async _buildResponse(t){const{page:e,response:i,request:s,session:r}=t;if(!e||e.isClosed())return{url:s.url,finalUrl:s.loadedUrl||s.url,statusCode:i?.status(),statusText:i?.statusText(),headers:await(i?.allHeaders())||{},cookies:[],body:"",html:"",text:""};const n=await e.content(),o=await e.textContent("body"),a=await e.context().cookies();return console.log("[PlaywrightEngine] Cookies from page context:",a),r&&r.setCookies(a,s.url),{url:e.url(),finalUrl:e.url(),statusCode:i?.status(),statusText:i?.statusText(),headers:await(i?.allHeaders())||{},cookies:a,body:n,html:n,text:o||""}}async _querySelectorAll(t,e){return t.locator(e).all()}async _extractValue(t,e){const{attribute:i,type:s="string"}=t;if(0===await e.count())return null;let r="";if(r=i?await e.getAttribute(i):"html"===s?await e.innerHTML():await e.textContent(),null===r)return null;switch(r=r.trim(),s){case"number":return parseFloat(r.replace(/[^0-9.-]+/g,""))||null;case"boolean":const t=r.toLowerCase();return"true"===t||"1"===t;default:return r}}async executeAction(t,e){const{page:i}=t,s=this.opts?.timeoutMs||3e4;switch(e.type){case"navigate":{const s=await i.goto(e.url,{waitUntil:e.opts?.waitUntil||"domcontentloaded",timeout:this.opts?.timeoutMs||3e4});s&&(t={...t,response:s});const r=await this.buildResponse(t);return this.lastResponse=r,r}case"extract":{const s=await this._extract(e.schema,i.locator("body"));return this.lastResponse=await this.buildResponse(t),s}case"click":{await i.click(e.selector,{timeout:s}),await i.waitForLoadState("networkidle",{timeout:s});const r=await this.buildResponse(t);return void(this.lastResponse=r)}case"fill":await i.fill(e.selector,e.value,{timeout:s});const r=await this.buildResponse(t);return void(this.lastResponse=r);case"waitFor":try{e.options?.selector&&await i.waitForSelector(e.options.selector,{timeout:s}),e.options?.networkIdle&&await i.waitForLoadState("networkidle",{timeout:s})}catch(t){if(!1!==e.options?.failOnTimeout)throw t}return void(e.options?.ms&&await i.waitForTimeout(e.options.ms));case"submit":{const r=e.selector||"form",n=i.locator(r).first();if(0===await n.count())throw new N(r,"submit");if("application/json"===(e.options?.enctype||"application/x-www-form-urlencoded")){const t=await n.elementHandle();if(!t)throw new O(`submit: could not get form handle for ${r}`,"submit");const e=await t.evaluate(async t=>{const e=new FormData(t),i={};e.forEach((t,e)=>{i[e]=t.toString()});const s=await fetch(t.action,{method:t.method,headers:{"Content-Type":"application/json"},body:JSON.stringify(i)}),r=await s.text();return{status:s.status,statusText:s.statusText,headers:Object.fromEntries(s.headers.entries()),body:r,html:r,text:r,url:t.action,finalUrl:s.url}});return await t.dispose(),await i.setContent(e.html),void(this.lastResponse=e)}return await n.evaluate(t=>t.submit()),await i.waitForLoadState("networkidle",{timeout:s}),void(this.lastResponse=await this.buildResponse(t))}case"pause":{const t=this.ctx?.onPause;return void(t?(console.info(e.message||"Execution paused for manual intervention."),await t({message:e.message}),console.info("Resuming execution...")):console.warn("[PauseAction] was called, but no `onPause` handler was provided in fetchWeb options. Skipped."))}case"getContent":return this.buildResponse(t);default:throw new O(`Unknown action type: ${e.type}`,"PlaywrightFetchEngine.executeAction",j.NotSupported)}}_createCrawler(t){return new U(t)}async _getSpecificCrawlerOptions(t){const e=t.browser?.headless??!0,i={maxRequestRetries:t.retries||3,headless:e,requestHandlerTimeoutSecs:t.requestHandlerTimeoutSecs,preNavigationHooks:[async({page:e,request:i},s)=>{s.throwHttpErrors=t.throwHttpErrors;const r=this.blockedTypes;r.size>0&&await e.route("**/*",t=>{r.has(t.request().resourceType())?t.abort():t.continue()})}]};if(this.opts?.antibot){i.browserPoolOptions={useFingerprints:!1};const{launchOptions:t}=await import("camoufox-js"),s=await t({headless:e});i.launchContext={launcher:_,launchOptions:s},i.postNavigationHooks=[async({page:t,handleCloudflareChallenge:e})=>{await e()}]}return i}async goto(t,e){if(this.isPageActive)return this.dispatchAction({type:"navigate",url:t,opts:e});if(!this.requestQueue)throw new O("RequestQueue not initialized","goto");const i="req-"+ ++this.requestCounter,s=new Promise((t,e)=>{this.pendingRequests.set(i,{resolve:t,reject:e})});return await this.requestQueue.addRequest({url:t,headers:this.hdrs,userData:{requestId:i,waitUntil:e?.waitUntil||"domcontentloaded"},uniqueKey:`${t}-${i}`}),s}};F.id="playwright",F.mode="browser",b.register(F);var H=class extends r{async onExecute(t,e){const{selector:i,...s}=e?.params||{};if(!i)throw new Error("Selector is required for click action");await this.delegateToEngine(t,"click",i,s)}};H.id="click",H.returnType="none",H.capabilities={http:"simulate",browser:"native"},r.register(H);var L=class extends r{async onExecute(t,e){const{selector:i,value:s,...r}=e?.params||{};if(!i)throw new Error("Selector is required for fill action");if(void 0===s)throw new Error("Value is required for fill action");await this.delegateToEngine(t,"fill",i,s,r)}};L.id="fill",L.returnType="none",L.capabilities={http:"simulate",browser:"native"},r.register(L);var M=class extends r{async onExecute(t,e){return await this.delegateToEngine(t,"getContent",e?.params)}};M.id="getContent",M.returnType="response",M.capabilities={http:"native",browser:"native"},r.register(M);var B=class extends r{async onExecute(t,e,i){const s=e?.params,r=s?.url||t.url;if(!r)throw new Error("URL is required for goto action");const n=t.internal.engine;if(!n)throw new Error("No engine available");t.url=r;return await n.goto(r,s)}};B.id="goto",B.returnType="response",B.capabilities={http:"native",browser:"native"},r.register(B);var z=class extends r{async onExecute(t,e){const{selector:i,...s}=e?.params||{};await this.delegateToEngine(t,"submit",i,s)}};z.id="submit",z.returnType="none",z.capabilities={http:"simulate",browser:"native"},r.register(z);var D=class extends r{async onExecute(t,e){const i=t.internal.engine;if(!i)throw new Error("No engine available");await i.waitFor(e?.params)}};D.id="waitFor",D.returnType="none",D.capabilities={http:"native",browser:"native"},r.register(D);var J=class extends r{async onExecute(t,e){const i=e?.params;if(!i)throw new Error("Schema is required for extract action");return this.delegateToEngine(t,"extract",i)}};J.id="extract",J.returnType="any",J.capabilities={http:"native",browser:"native"},r.register(J);var G=class extends r{async onExecute(t,e){const{selector:i,message:s,attribute:r}=e?.params||{},n=t.internal.engine;if("browser"===n?.mode){if(i){if(!await(n?.extract({selector:i,attribute:r})))return}n&&"pause"in n?await n.pause(s):console.warn("[PauseAction] was called, but the current engine does not support `pause`. Skipped.")}else console.warn("[PauseAction] can only run in browser engine. Skipped.")}};async function I(t,e){return(new S).fetch(t,e)}G.id="pause",G.capabilities={http:"native",browser:"native"},G.returnType="none",r.register(G);export{A as CheerioFetchEngine,H as ClickAction,t as DefaultFetcherProperties,J as ExtractAction,r as FetchAction,i as FetchActionResultStatus,b as FetchEngine,k as FetchSession,e as FetcherOptionKeys,L as FillAction,M as GetContentAction,B as GotoAction,G as PauseAction,F as PlaywrightFetchEngine,z as SubmitAction,D as WaitForAction,S as WebFetcher,I as fetchWeb};
|
package/docs/README.md
CHANGED
|
@@ -137,7 +137,10 @@ This is the main entry point for the library.
|
|
|
137
137
|
* `engine` ('http' | 'browser' | 'auto'): The engine to use. Defaults to `auto`.
|
|
138
138
|
* `actions` (FetchActionOptions[]): An array of action objects to execute. (Supports `action`/`name` as alias for `id`, and `args` as alias for `params`)
|
|
139
139
|
* `headers` (Record<string, string>): Headers to use for all requests.
|
|
140
|
-
*
|
|
140
|
+
* `cookies` (Cookie[]): Array of cookies to use.
|
|
141
|
+
* `sessionState` (any): Crawlee session state to restore.
|
|
142
|
+
* `sessionPoolOptions` (SessionPoolOptions): Advanced configuration for the underlying Crawlee SessionPool.
|
|
143
|
+
* ...and many other options for proxy, retries, etc.
|
|
141
144
|
|
|
142
145
|
### Built-in Actions
|
|
143
146
|
|
package/docs/_media/README.cn.md
CHANGED
|
@@ -133,7 +133,10 @@ searchGoogle('gemini');
|
|
|
133
133
|
* `engine` ('http' | 'browser' | 'auto'): 要使用的引擎。默认为 `auto`。
|
|
134
134
|
* `actions` (FetchActionOptions[]): 要执行的动作对象数组。(支持 `action`/`name` 作为 `id` 的别名,`args` 作为 `params` 的别名)
|
|
135
135
|
* `headers` (Record<string, string>): 用于所有请求的头信息。
|
|
136
|
-
*
|
|
136
|
+
* `cookies` (Cookie[]): 要使用的 Cookie 数组。
|
|
137
|
+
* `sessionState` (any): 要恢复的 Crawlee 会话状态。
|
|
138
|
+
* `sessionPoolOptions` (SessionPoolOptions): 底层 Crawlee SessionPool 的高级配置。
|
|
139
|
+
* ...以及许多其他用于代理、重试等的选项。
|
|
137
140
|
|
|
138
141
|
### 内置动作
|
|
139
142
|
|