@isdk/web-searcher 0.1.1 → 0.1.3
- package/README.cn.md +49 -26
- package/README.md +50 -27
- package/dist/index.d.mts +30 -1
- package/dist/index.d.ts +30 -1
- package/dist/index.js +1 -1
- package/dist/index.mjs +1 -1
- package/docs/README.md +50 -27
- package/docs/classes/GoogleSearcher.md +27 -27
- package/docs/classes/WebSearcher.md +27 -27
- package/docs/interfaces/CustomTimeRange.md +3 -3
- package/docs/interfaces/PaginationConfig.md +26 -6
- package/docs/interfaces/SearchContext.md +4 -4
- package/docs/interfaces/SearchOptions.md +23 -8
- package/docs/interfaces/StandardSearchResult.md +56 -6
- package/docs/type-aliases/SafeSearchLevel.md +1 -1
- package/docs/type-aliases/SearchCategory.md +1 -1
- package/docs/type-aliases/SearchTimeRange.md +1 -1
- package/docs/type-aliases/SearchTimeRangePreset.md +1 -1
- package/docs/type-aliases/SearcherConstructor.md +1 -1
- package/package.json +16 -4
package/docs/README.md
CHANGED
|
@@ -15,53 +15,57 @@ Building a robust search scraper involves more than just fetching a URL. You oft
|
|
|
15
15
|
- **Data Cleaning**: Parse raw HTML and resolve redirect links.
|
|
16
16
|
- **Flexibility**: Switch between HTTP (fast) and Browser (anti-bot) modes easily.
|
|
17
17
|
|
|
18
|
-
This module encapsulates these patterns into a reusable `
|
|
18
|
+
This module encapsulates these patterns into a reusable `WebSearcher` class.
|
|
19
19
|
|
|
20
20
|
## 🚀 Quick Start
|
|
21
21
|
|
|
22
22
|
### 1. One-off Search
|
|
23
23
|
|
|
24
|
-
|
|
24
|
+
> **⚠️ Note on `GoogleSearcher`**: The `GoogleSearcher` class used in these examples is a **demo implementation** included for educational purposes. It is not intended for production use.
|
|
25
|
+
>
|
|
26
|
+
> * It lacks advanced anti-bot handling (CAPTCHA solving, proxy rotation) required for scraping Google reliably at scale.
|
|
27
|
+
> * The extracted data may be **inaccurate or misaligned** due to Google's frequent DOM changes and A/B testing.
|
|
28
|
+
|
|
29
|
+
Use the static `WebSearcher.search` method for quick, disposable tasks. It automatically creates a session, fetches results, and cleans up.
|
|
25
30
|
|
|
26
31
|
```typescript
|
|
27
|
-
import {
|
|
28
|
-
import { GoogleSearcher } from '@isdk/web-fetcher/search/engines/google';
|
|
32
|
+
import { GoogleSearcher, WebSearcher } from '@isdk/web-fetcher';
|
|
29
33
|
|
|
30
34
|
// Register the engine (only needs to be done once)
|
|
31
|
-
|
|
35
|
+
WebSearcher.register(GoogleSearcher);
|
|
32
36
|
|
|
33
37
|
// Search!
|
|
34
38
|
// The 'limit' parameter ensures we fetch enough pages to get 20 results.
|
|
35
39
|
// Note: The engine name is case-sensitive and derived from the class name (e.g., 'GoogleSearcher' -> 'Google')
|
|
36
|
-
const results = await
|
|
40
|
+
const results = await WebSearcher.search('Google', 'open source', { limit: 20 });
|
|
37
41
|
|
|
38
42
|
console.log(results);
|
|
39
43
|
```
|
|
40
44
|
|
|
41
45
|
### 2. Stateful Session
|
|
42
46
|
|
|
43
|
-
Since `
|
|
47
|
+
Since `WebSearcher` extends `FetchSession`, you can instantiate it to keep cookies and storage alive across multiple requests. This is useful for authenticated searches or avoiding bot detection by behaving like a human.
|
|
48
|
+
|
|
49
|
+
### 🛡️ Core Principle: Template is Law
|
|
44
50
|
|
|
45
|
-
|
|
46
|
-
When creating a session, options are merged in the following order:
|
|
47
|
-
1. **Template Default**: Defined in the Searcher class (highest priority for structural options).
|
|
48
|
-
2. **User Options**: Passed to the constructor (can fill missing defaults or override if allowed).
|
|
51
|
+
The `template` defined in the `WebSearcher` subclass acts as the authoritative "blueprint".
|
|
49
52
|
|
|
50
|
-
|
|
53
|
+
- **Template Priority**: If the template defines a property (e.g., `engine: 'browser'`, `headers`), that value is **locked** and cannot be overridden by user options. This ensures engine stability.
|
|
54
|
+
- **User Flexibility**: Properties **not** explicitly defined in the template (such as `proxy`, `timeoutMs`, or custom variables) can be freely set by the user in the constructor or `search()` method.
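The priority rule above can be sketched as a plain object merge. This is only an illustration under stated assumptions: `mergeWithTemplate` and the `Options` type are hypothetical names, not part of the package, and the real implementation may merge nested objects differently.

```typescript
// Hypothetical helper (not the library's API): shows the "template wins" rule.
type Options = Record<string, unknown>;

function mergeWithTemplate(template: Options, userOptions: Options): Options {
  // User options go first, template keys are spread last, so any property
  // the template defines overwrites the user's value (it is "locked").
  return { ...userOptions, ...template };
}

const template: Options = { engine: 'browser', headers: { 'Accept-Language': 'en' } };

const merged = mergeWithTemplate(template, {
  engine: 'http',                 // ignored: the template already defines `engine`
  proxy: 'http://my-proxy:8080',  // kept: the template does not define `proxy`
  timeoutMs: 30000,               // kept: the template does not define `timeoutMs`
});

console.log(merged['engine']);    // 'browser'
console.log(merged['proxy']);     // 'http://my-proxy:8080'
```

The sketch is a shallow merge: a template-defined `headers` object replaces a user-supplied one wholesale, which is one way to read "locked".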
|
|
51
55
|
|
|
52
56
|
```typescript
|
|
53
57
|
// Create a persistent session
|
|
54
58
|
const google = new GoogleSearcher({
|
|
55
|
-
headless: false, // Override
|
|
59
|
+
headless: false, // Override if not locked in template
|
|
56
60
|
proxy: 'http://my-proxy:8080',
|
|
57
|
-
timeoutMs: 30000 // Set a global timeout
|
|
61
|
+
timeoutMs: 30000 // Set a global timeout (valid if template doesn't define it)
|
|
58
62
|
});
|
|
59
63
|
|
|
60
64
|
try {
|
|
61
65
|
// First query
|
|
62
66
|
// You can also pass runtime options to override session defaults or inject variables
|
|
63
67
|
const results1 = await google.search('term A', {
|
|
64
|
-
timeoutMs: 60000, // Override timeout just for this search
|
|
68
|
+
timeoutMs: 60000, // Override session timeout just for this search
|
|
65
69
|
extraParam: 'value' // Can be used in template as ${extraParam}
|
|
66
70
|
});
|
|
67
71
|
|
|
@@ -75,11 +79,11 @@ try {
|
|
|
75
79
|
|
|
76
80
|
## 🛠️ Implementing a New Search Engine
|
|
77
81
|
|
|
78
|
-
To support a new website, create a class that extends `
|
|
82
|
+
To support a new website, create a class that extends `WebSearcher`.
|
|
79
83
|
|
|
80
84
|
### Step 1: Define the Template
|
|
81
85
|
|
|
82
|
-
To support a new website, create a class that extends `
|
|
86
|
+
The engine name is automatically derived from the class name (e.g., `MyBlogSearcher` -> `MyBlog`), but you can customize it and add aliases using static properties.
|
|
83
87
|
|
|
84
88
|
The `template` property defines the "Blueprint" for your search. It's a standard `FetcherOptions` object but supports **variable injection**.
|
|
85
89
|
|
|
@@ -91,10 +95,10 @@ Supported variables:
|
|
|
91
95
|
- `${limit}`: The requested limit.
|
|
92
96
|
|
|
93
97
|
```typescript
|
|
94
|
-
import {
|
|
98
|
+
import { WebSearcher } from '@isdk/web-fetcher/search';
|
|
95
99
|
import { FetcherOptions } from '@isdk/web-fetcher/types';
|
|
96
100
|
|
|
97
|
-
export class MyBlogSearcher extends
|
|
101
|
+
export class MyBlogSearcher extends WebSearcher {
|
|
98
102
|
static name = 'blog'; // Custom name (case-sensitive)
|
|
99
103
|
static alias = ['myblog', 'news'];
|
|
100
104
|
|
|
@@ -124,7 +128,7 @@ export class MyBlogSearcher extends Searcher {
|
|
|
124
128
|
|
|
125
129
|
### Step 2: Configure Pagination
|
|
126
130
|
|
|
127
|
-
Tell the `
|
|
131
|
+
Tell the `WebSearcher` how to navigate to the next page. Implement the `pagination` getter.
|
|
128
132
|
|
|
129
133
|
#### Option A: URL Parameters (Offset/Page)
|
|
130
134
|
|
|
@@ -156,7 +160,7 @@ protected override get pagination() {
|
|
|
156
160
|
|
|
157
161
|
### Step 3: Transform & Clean Data
|
|
158
162
|
|
|
159
|
-
Override `transform` to clean data. Since `
|
|
163
|
+
Override `transform` to clean data. Since `WebSearcher` is a `FetchSession`, you can also make extra requests (like resolving redirects) using `this`.
|
|
160
164
|
|
|
161
165
|
```typescript
|
|
162
166
|
protected override async transform(outputs: Record<string, any>) {
|
|
@@ -171,24 +175,43 @@ protected override async transform(outputs: Record<string, any>) {
|
|
|
171
175
|
}
|
|
172
176
|
```
|
|
173
177
|
|
|
174
|
-
|
|
178
|
+
## 🧠 Advanced Concepts
|
|
179
|
+
|
|
180
|
+
### Auto-Pagination: `limit` vs `maxPages`
|
|
181
|
+
|
|
182
|
+
The `WebSearcher` is designed to be result-oriented. When you call `search()`, you specify how many results you want, and the searcher handles the pagination logic.
|
|
183
|
+
|
|
184
|
+
- **`limit`**: Your target number of total results.
|
|
185
|
+
- **`maxPages`**: The safety threshold. It limits how many pages (fetch cycles) the searcher is allowed to navigate to satisfy your `limit`.
|
|
175
186
|
|
|
176
|
-
|
|
187
|
+
**Example Logic:**
|
|
188
|
+
If you request `{ limit: 50 }` but each page only has 5 results:
|
|
177
189
|
|
|
178
|
-
|
|
190
|
+
1. The searcher fetches page 1 (5 results).
|
|
191
|
+
2. It sees `5 < 50`, so it fetches page 2.
|
|
192
|
+
3. It continues until it has 50 results **OR** it reaches `maxPages` (default 10).
|
|
193
|
+
|
|
194
|
+
This prevents infinite loops if the "Next" button selector is broken or if the search engine keeps returning the same results.
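The interplay of `limit` and `maxPages` can be sketched as a bounded loop. This is a minimal illustration, not the searcher's actual code: `collect`, `FetchPage`, and `fakePage` are hypothetical names, and the early exit on an empty page is an assumption.

```typescript
// Hypothetical stand-ins; the real WebSearcher internals may differ.
type Result = { url: string };
type FetchPage = (page: number) => Promise<Result[]>;

async function collect(
  fetchPage: FetchPage,
  limit: number,
  maxPages = 10, // safety threshold, mirroring the documented default
): Promise<Result[]> {
  const results: Result[] = [];
  // Stop when enough results are collected OR the page budget is spent.
  for (let page = 0; page < maxPages && results.length < limit; page++) {
    const batch = await fetchPage(page);
    if (batch.length === 0) break; // assumption: an empty page means no more results
    results.push(...batch);
  }
  return results.slice(0, limit); // never return more than requested
}

// A fake engine that yields 5 results per page.
const fakePage: FetchPage = async (page) =>
  Array.from({ length: 5 }, (_, i) => ({ url: `https://example.com/${page * 5 + i}` }));

collect(fakePage, 50).then((r) => console.log(r.length));    // 50 (10 pages x 5)
collect(fakePage, 50, 3).then((r) => console.log(r.length)); // 15 (capped by maxPages)
```

With `maxPages` lowered to 3, the loop returns 15 results even though 50 were requested, which is the trade-off the safety threshold makes.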
|
|
179
195
|
|
|
180
196
|
### User-defined Transforms
|
|
181
197
|
|
|
182
198
|
Users can provide their own `transform` when calling `search`. This runs **after** the engine's built-in transform.
|
|
183
199
|
|
|
200
|
+
This is useful for **filtering out ads** or irrelevant content. If your transform filters out results, the auto-pagination logic fetches more pages until the final list meets your requested `limit` with only valid entries, or until `maxPages` is reached.
|
|
201
|
+
|
|
184
202
|
```typescript
|
|
185
203
|
await google.search('test', {
|
|
186
|
-
|
|
204
|
+
limit: 20,
|
|
205
|
+
// Example: Filter out sponsored results and only keep PDFs
|
|
206
|
+
transform: (results) => {
|
|
207
|
+
return results.filter(r => {
|
|
208
|
+
const isAd = r.isSponsored || r.url.includes('googleadservices.com');
|
|
209
|
+
return !isAd && r.url.endsWith('.pdf');
|
|
210
|
+
});
|
|
211
|
+
}
|
|
187
212
|
});
|
|
188
213
|
```
|
|
189
214
|
|
|
190
|
-
If the user filters out results, the auto-pagination logic will kick in to fetch more pages to meet the requested limit.
|
|
191
|
-
|
|
192
215
|
### Standardized Search Options
|
|
193
216
|
|
|
194
217
|
When calling `search()`, you can provide standardized options that the search engine will map to specific parameters:
|
|
package/docs/classes/GoogleSearcher.md
CHANGED
|
@@ -6,7 +6,7 @@
|
|
|
6
6
|
|
|
7
7
|
# Class: GoogleSearcher
|
|
8
8
|
|
|
9
|
-
Defined in: [web-searcher/src/engines/google.ts:24](https://github.com/isdk/web-searcher.js/blob/
|
|
9
|
+
Defined in: [web-searcher/src/engines/google.ts:24](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/engines/google.ts#L24)
|
|
10
10
|
|
|
11
11
|
A sample implementation of a Google Search scraper.
|
|
12
12
|
|
|
@@ -37,7 +37,7 @@ Use this class to understand:
|
|
|
37
37
|
|
|
38
38
|
> **new GoogleSearcher**(`options?`): `GoogleSearcher`
|
|
39
39
|
|
|
40
|
-
Defined in: web-fetcher/dist/index.d.ts:
|
|
40
|
+
Defined in: web-fetcher/dist/index.d.ts:2275
|
|
41
41
|
|
|
42
42
|
Creates a new FetchSession.
|
|
43
43
|
|
|
@@ -63,7 +63,7 @@ Configuration options for the fetcher.
|
|
|
63
63
|
|
|
64
64
|
> `protected` **closed**: `boolean`
|
|
65
65
|
|
|
66
|
-
Defined in: web-fetcher/dist/index.d.ts:
|
|
66
|
+
Defined in: web-fetcher/dist/index.d.ts:2269
|
|
67
67
|
|
|
68
68
|
#### Inherited from
|
|
69
69
|
|
|
@@ -75,7 +75,7 @@ Defined in: web-fetcher/dist/index.d.ts:2186
|
|
|
75
75
|
|
|
76
76
|
> `readonly` **context**: `FetchContext`
|
|
77
77
|
|
|
78
|
-
Defined in: web-fetcher/dist/index.d.ts:
|
|
78
|
+
Defined in: web-fetcher/dist/index.d.ts:2268
|
|
79
79
|
|
|
80
80
|
The execution context for this session, containing configurations, event bus, and shared state.
|
|
81
81
|
|
|
@@ -89,7 +89,7 @@ The execution context for this session, containing configurations, event bus, an
|
|
|
89
89
|
|
|
90
90
|
> `readonly` **id**: `string`
|
|
91
91
|
|
|
92
|
-
Defined in: web-fetcher/dist/index.d.ts:
|
|
92
|
+
Defined in: web-fetcher/dist/index.d.ts:2264
|
|
93
93
|
|
|
94
94
|
Unique identifier for the session.
|
|
95
95
|
|
|
@@ -103,7 +103,7 @@ Unique identifier for the session.
|
|
|
103
103
|
|
|
104
104
|
> `protected` **options**: `FetcherOptions`
|
|
105
105
|
|
|
106
|
-
Defined in: web-fetcher/dist/index.d.ts:
|
|
106
|
+
Defined in: web-fetcher/dist/index.d.ts:2260
|
|
107
107
|
|
|
108
108
|
#### Inherited from
|
|
109
109
|
|
|
@@ -115,7 +115,7 @@ Defined in: web-fetcher/dist/index.d.ts:2177
|
|
|
115
115
|
|
|
116
116
|
> `static` **\_isFactory**: `boolean` = `false`
|
|
117
117
|
|
|
118
|
-
Defined in: [web-searcher/src/searcher.ts:33](https://github.com/isdk/web-searcher.js/blob/
|
|
118
|
+
Defined in: [web-searcher/src/searcher.ts:33](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/searcher.ts#L33)
|
|
119
119
|
|
|
120
120
|
#### Inherited from
|
|
121
121
|
|
|
@@ -127,7 +127,7 @@ Defined in: [web-searcher/src/searcher.ts:33](https://github.com/isdk/web-search
|
|
|
127
127
|
|
|
128
128
|
> `static` **alias**: `string`[]
|
|
129
129
|
|
|
130
|
-
Defined in: [web-searcher/src/engines/google.ts:25](https://github.com/isdk/web-searcher.js/blob/
|
|
130
|
+
Defined in: [web-searcher/src/engines/google.ts:25](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/engines/google.ts#L25)
|
|
131
131
|
|
|
132
132
|
Engine alias(es). Can be a single string or an array of strings.
|
|
133
133
|
Useful for registering shorthand names (e.g., 'g' for 'Google').
|
|
@@ -142,7 +142,7 @@ Useful for registering shorthand names (e.g., 'g' for 'Google').
|
|
|
142
142
|
|
|
143
143
|
> `static` **createObject**: (`name`, ...`args`) => [`WebSearcher`](WebSearcher.md)
|
|
144
144
|
|
|
145
|
-
Defined in: [web-searcher/src/searcher.ts:78](https://github.com/isdk/web-searcher.js/blob/
|
|
145
|
+
Defined in: [web-searcher/src/searcher.ts:78](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/searcher.ts#L78)
|
|
146
146
|
|
|
147
147
|
Creates an instance of the registered search engine.
|
|
148
148
|
|
|
@@ -176,7 +176,7 @@ An instance of the search engine.
|
|
|
176
176
|
|
|
177
177
|
> `static` **forEach**: (`cb`) => `void`
|
|
178
178
|
|
|
179
|
-
Defined in: [web-searcher/src/searcher.ts:85](https://github.com/isdk/web-searcher.js/blob/
|
|
179
|
+
Defined in: [web-searcher/src/searcher.ts:85](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/searcher.ts#L85)
|
|
180
180
|
|
|
181
181
|
Iterates over all registered engines.
|
|
182
182
|
|
|
@@ -202,7 +202,7 @@ Callback function to invoke for each registered engine.
|
|
|
202
202
|
|
|
203
203
|
> `static` **get**: (`name`) => *typeof* [`WebSearcher`](WebSearcher.md)
|
|
204
204
|
|
|
205
|
-
Defined in: [web-searcher/src/searcher.ts:69](https://github.com/isdk/web-searcher.js/blob/
|
|
205
|
+
Defined in: [web-searcher/src/searcher.ts:69](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/searcher.ts#L69)
|
|
206
206
|
|
|
207
207
|
Retrieves a registered search engine class by name.
|
|
208
208
|
|
|
@@ -230,7 +230,7 @@ The search engine class constructor.
|
|
|
230
230
|
|
|
231
231
|
> `static` `optional` **name**: `string`
|
|
232
232
|
|
|
233
|
-
Defined in: [web-searcher/src/searcher.ts:40](https://github.com/isdk/web-searcher.js/blob/
|
|
233
|
+
Defined in: [web-searcher/src/searcher.ts:40](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/searcher.ts#L40)
|
|
234
234
|
|
|
235
235
|
Custom engine name. If not provided, it is derived from the class name.
|
|
236
236
|
For example, `GoogleSearcher` becomes `Google`.
|
|
@@ -245,7 +245,7 @@ For example, `GoogleSearcher` becomes `Google`.
|
|
|
245
245
|
|
|
246
246
|
> `static` **register**: (`ctor`, `options?`) => `boolean`
|
|
247
247
|
|
|
248
|
-
Defined in: [web-searcher/src/searcher.ts:54](https://github.com/isdk/web-searcher.js/blob/
|
|
248
|
+
Defined in: [web-searcher/src/searcher.ts:54](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/searcher.ts#L54)
|
|
249
249
|
|
|
250
250
|
Registers a search engine class.
|
|
251
251
|
|
|
@@ -279,7 +279,7 @@ Registration options. If a string is provided, it is used as the registered name
|
|
|
279
279
|
|
|
280
280
|
> `static` **setAliases**: (`ctor`, ...`aliases`) => `void`
|
|
281
281
|
|
|
282
|
-
Defined in: [web-searcher/src/searcher.ts:93](https://github.com/isdk/web-searcher.js/blob/
|
|
282
|
+
Defined in: [web-searcher/src/searcher.ts:93](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/searcher.ts#L93)
|
|
283
283
|
|
|
284
284
|
Sets aliases for a registered engine.
|
|
285
285
|
|
|
@@ -311,7 +311,7 @@ Aliases to add.
|
|
|
311
311
|
|
|
312
312
|
> `static` **unregister**: (`name?`) => `void`
|
|
313
313
|
|
|
314
|
-
Defined in: [web-searcher/src/searcher.ts:61](https://github.com/isdk/web-searcher.js/blob/
|
|
314
|
+
Defined in: [web-searcher/src/searcher.ts:61](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/searcher.ts#L61)
|
|
315
315
|
|
|
316
316
|
Unregisters a search engine.
|
|
317
317
|
|
|
@@ -339,7 +339,7 @@ The name or class to unregister.
|
|
|
339
339
|
|
|
340
340
|
> **get** **pagination**(): [`PaginationConfig`](../interfaces/PaginationConfig.md)
|
|
341
341
|
|
|
342
|
-
Defined in: [web-searcher/src/engines/google.ts:61](https://github.com/isdk/web-searcher.js/blob/
|
|
342
|
+
Defined in: [web-searcher/src/engines/google.ts:61](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/engines/google.ts#L61)
|
|
343
343
|
|
|
344
344
|
Configures pagination for Google Search results.
|
|
345
345
|
Uses the 'start' URL parameter, incrementing by 10 for each page.
|
|
@@ -360,7 +360,7 @@ Uses the 'start' URL parameter, incrementing by 10 for each page.
|
|
|
360
360
|
|
|
361
361
|
> **get** **template**(): `FetcherOptions`
|
|
362
362
|
|
|
363
|
-
Defined in: [web-searcher/src/engines/google.ts:32](https://github.com/isdk/web-searcher.js/blob/
|
|
363
|
+
Defined in: [web-searcher/src/engines/google.ts:32](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/engines/google.ts#L32)
|
|
364
364
|
|
|
365
365
|
Defines the fetch template for Google Search.
|
|
366
366
|
|
|
@@ -380,7 +380,7 @@ The fetcher configuration including the URL pattern and extraction rules.
|
|
|
380
380
|
|
|
381
381
|
> `protected` **createContext**(`options`): `FetchContext`
|
|
382
382
|
|
|
383
|
-
Defined in: [web-searcher/src/searcher.ts:155](https://github.com/isdk/web-searcher.js/blob/
|
|
383
|
+
Defined in: [web-searcher/src/searcher.ts:155](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/searcher.ts#L155)
|
|
384
384
|
|
|
385
385
|
#### Parameters
|
|
386
386
|
|
|
@@ -402,7 +402,7 @@ Defined in: [web-searcher/src/searcher.ts:155](https://github.com/isdk/web-searc
|
|
|
402
402
|
|
|
403
403
|
> **dispose**(): `Promise`\<`void`\>
|
|
404
404
|
|
|
405
|
-
Defined in: web-fetcher/dist/index.d.ts:
|
|
405
|
+
Defined in: web-fetcher/dist/index.d.ts:2334
|
|
406
406
|
|
|
407
407
|
Disposes of the session and its associated engine.
|
|
408
408
|
|
|
@@ -425,7 +425,7 @@ This method should be called when the session is no longer needed to free up res
|
|
|
425
425
|
|
|
426
426
|
> **execute**\<`R`\>(`actionOptions`, `context?`): `Promise`\<`FetchActionResult`\<`R`\>\>
|
|
427
427
|
|
|
428
|
-
Defined in: web-fetcher/dist/index.d.ts:
|
|
428
|
+
Defined in: web-fetcher/dist/index.d.ts:2289
|
|
429
429
|
|
|
430
430
|
Executes a single action within the session.
|
|
431
431
|
|
|
@@ -473,7 +473,7 @@ await session.execute({ name: 'goto', params: { url: 'https://example.com' } });
|
|
|
473
473
|
|
|
474
474
|
> **executeAll**(`actions`, `options?`): `Promise`\<\{ `outputs`: `Record`\<`string`, `any`\>; `result`: `FetchResponse` \| `undefined`; \}\>
|
|
475
475
|
|
|
476
|
-
Defined in: web-fetcher/dist/index.d.ts:
|
|
476
|
+
Defined in: web-fetcher/dist/index.d.ts:2306
|
|
477
477
|
|
|
478
478
|
Executes a sequence of actions.
|
|
479
479
|
|
|
@@ -517,7 +517,7 @@ const { result, outputs } = await session.executeAll([
|
|
|
517
517
|
|
|
518
518
|
> `protected` **formatOptions**(`options`): `Record`\<`string`, `any`\>
|
|
519
519
|
|
|
520
|
-
Defined in: [web-searcher/src/engines/google.ts:82](https://github.com/isdk/web-searcher.js/blob/
|
|
520
|
+
Defined in: [web-searcher/src/engines/google.ts:82](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/engines/google.ts#L82)
|
|
521
521
|
|
|
522
522
|
Maps standard `SearchOptions` to Google's specific URL parameters.
|
|
523
523
|
|
|
@@ -551,7 +551,7 @@ A map of variables to inject into the URL template.
|
|
|
551
551
|
|
|
552
552
|
> **getOutputs**(): `Record`\<`string`, `any`\>
|
|
553
553
|
|
|
554
|
-
Defined in: web-fetcher/dist/index.d.ts:
|
|
554
|
+
Defined in: web-fetcher/dist/index.d.ts:2317
|
|
555
555
|
|
|
556
556
|
Retrieves all outputs accumulated during the session.
|
|
557
557
|
|
|
@@ -571,7 +571,7 @@ A record of stored output data.
|
|
|
571
571
|
|
|
572
572
|
> **getState**(): `Promise`\<\{ `cookies`: `Cookie`[]; `sessionState?`: `any`; \} \| `undefined`\>
|
|
573
573
|
|
|
574
|
-
Defined in: web-fetcher/dist/index.d.ts:
|
|
574
|
+
Defined in: web-fetcher/dist/index.d.ts:2323
|
|
575
575
|
|
|
576
576
|
Gets the current state of the session, including cookies and engine-specific state.
|
|
577
577
|
|
|
@@ -591,7 +591,7 @@ A promise resolving to the session state, or undefined if no engine is initializ
|
|
|
591
591
|
|
|
592
592
|
> **search**(`query`, `options`): `Promise`\<[`StandardSearchResult`](../interfaces/StandardSearchResult.md)[]\>
|
|
593
593
|
|
|
594
|
-
Defined in: [web-searcher/src/searcher.ts:182](https://github.com/isdk/web-searcher.js/blob/
|
|
594
|
+
Defined in: [web-searcher/src/searcher.ts:182](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/searcher.ts#L182)
|
|
595
595
|
|
|
596
596
|
Executes a search query.
|
|
597
597
|
|
|
@@ -628,7 +628,7 @@ A promise resolving to an array of standardized search results.
|
|
|
628
628
|
|
|
629
629
|
> `protected` **transform**(`outputs`): `Promise`\<`any`[]\>
|
|
630
630
|
|
|
631
|
-
Defined in: [web-searcher/src/engines/google.ts:144](https://github.com/isdk/web-searcher.js/blob/
|
|
631
|
+
Defined in: [web-searcher/src/engines/google.ts:144](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/engines/google.ts#L144)
|
|
632
632
|
|
|
633
633
|
Cleans and normalizes the extracted results.
|
|
634
634
|
Specifically, it unwraps Google's redirect URLs (starting with `/url?q=`).
|
|
@@ -657,7 +657,7 @@ An array of cleaned search results.
|
|
|
657
657
|
|
|
658
658
|
> `static` **search**(`engineName`, `query`, `options`): `Promise`\<[`StandardSearchResult`](../interfaces/StandardSearchResult.md)[]\>
|
|
659
659
|
|
|
660
|
-
Defined in: [web-searcher/src/searcher.ts:106](https://github.com/isdk/web-searcher.js/blob/
|
|
660
|
+
Defined in: [web-searcher/src/searcher.ts:106](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/searcher.ts#L106)
|
|
661
661
|
|
|
662
662
|
Static helper to execute a one-off search.
|
|
663
663
|
|
|
package/docs/classes/WebSearcher.md
CHANGED
|
@@ -6,7 +6,7 @@
|
|
|
6
6
|
|
|
7
7
|
# Abstract Class: WebSearcher
|
|
8
8
|
|
|
9
|
-
Defined in: [web-searcher/src/searcher.ts:31](https://github.com/isdk/web-searcher.js/blob/
|
|
9
|
+
Defined in: [web-searcher/src/searcher.ts:31](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/searcher.ts#L31)
|
|
10
10
|
|
|
11
11
|
The abstract base class for all search engines.
|
|
12
12
|
|
|
@@ -41,7 +41,7 @@ WebSearcher.register(MySearcher);
|
|
|
41
41
|
|
|
42
42
|
> **new WebSearcher**(`options?`): `WebSearcher`
|
|
43
43
|
|
|
44
|
-
Defined in: web-fetcher/dist/index.d.ts:
|
|
44
|
+
Defined in: web-fetcher/dist/index.d.ts:2275
|
|
45
45
|
|
|
46
46
|
Creates a new FetchSession.
|
|
47
47
|
|
|
@@ -67,7 +67,7 @@ Configuration options for the fetcher.
|
|
|
67
67
|
|
|
68
68
|
> `protected` **closed**: `boolean`
|
|
69
69
|
|
|
70
|
-
Defined in: web-fetcher/dist/index.d.ts:
|
|
70
|
+
Defined in: web-fetcher/dist/index.d.ts:2269
|
|
71
71
|
|
|
72
72
|
#### Inherited from
|
|
73
73
|
|
|
@@ -79,7 +79,7 @@ Defined in: web-fetcher/dist/index.d.ts:2186
|
|
|
79
79
|
|
|
80
80
|
> `readonly` **context**: `FetchContext`
|
|
81
81
|
|
|
82
|
-
Defined in: web-fetcher/dist/index.d.ts:
|
|
82
|
+
Defined in: web-fetcher/dist/index.d.ts:2268
|
|
83
83
|
|
|
84
84
|
The execution context for this session, containing configurations, event bus, and shared state.
|
|
85
85
|
|
|
@@ -93,7 +93,7 @@ The execution context for this session, containing configurations, event bus, an
|
|
|
93
93
|
|
|
94
94
|
> `readonly` **id**: `string`
|
|
95
95
|
|
|
96
|
-
Defined in: web-fetcher/dist/index.d.ts:
|
|
96
|
+
Defined in: web-fetcher/dist/index.d.ts:2264
|
|
97
97
|
|
|
98
98
|
Unique identifier for the session.
|
|
99
99
|
|
|
@@ -107,7 +107,7 @@ Unique identifier for the session.
|
|
|
107
107
|
|
|
108
108
|
> `protected` **options**: `FetcherOptions`
|
|
109
109
|
|
|
110
|
-
Defined in: web-fetcher/dist/index.d.ts:
|
|
110
|
+
Defined in: web-fetcher/dist/index.d.ts:2260
|
|
111
111
|
|
|
112
112
|
#### Inherited from
|
|
113
113
|
|
|
@@ -119,7 +119,7 @@ Defined in: web-fetcher/dist/index.d.ts:2177
|
|
|
119
119
|
|
|
120
120
|
> `static` **\_isFactory**: `boolean` = `false`
|
|
121
121
|
|
|
122
|
-
Defined in: [web-searcher/src/searcher.ts:33](https://github.com/isdk/web-searcher.js/blob/
|
|
122
|
+
Defined in: [web-searcher/src/searcher.ts:33](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/searcher.ts#L33)
|
|
123
123
|
|
|
124
124
|
***
|
|
125
125
|
|
|
@@ -127,7 +127,7 @@ Defined in: [web-searcher/src/searcher.ts:33](https://github.com/isdk/web-search
|
|
|
127
127
|
|
|
128
128
|
> `static` `optional` **alias**: `string` \| `string`[]
|
|
129
129
|
|
|
130
|
-
Defined in: [web-searcher/src/searcher.ts:45](https://github.com/isdk/web-searcher.js/blob/
|
|
130
|
+
Defined in: [web-searcher/src/searcher.ts:45](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/searcher.ts#L45)
|
|
131
131
|
|
|
132
132
|
Engine alias(es). Can be a single string or an array of strings.
|
|
133
133
|
Useful for registering shorthand names (e.g., 'g' for 'Google').
|
|
@@ -138,7 +138,7 @@ Useful for registering shorthand names (e.g., 'g' for 'Google').
|
|
|
138
138
|
|
|
139
139
|
> `static` **createObject**: (`name`, ...`args`) => `WebSearcher`
|
|
140
140
|
|
|
141
|
-
Defined in: [web-searcher/src/searcher.ts:78](https://github.com/isdk/web-searcher.js/blob/
|
|
141
|
+
Defined in: [web-searcher/src/searcher.ts:78](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/searcher.ts#L78)
|
|
142
142
|
|
|
143
143
|
Creates an instance of the registered search engine.
|
|
144
144
|
|
|
@@ -168,7 +168,7 @@ An instance of the search engine.
|
|
|
168
168
|
|
|
169
169
|
> `static` **forEach**: (`cb`) => `void`
|
|
170
170
|
|
|
171
|
-
Defined in: [web-searcher/src/searcher.ts:85](https://github.com/isdk/web-searcher.js/blob/
|
|
171
|
+
Defined in: [web-searcher/src/searcher.ts:85](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/searcher.ts#L85)
|
|
172
172
|
|
|
173
173
|
Iterates over all registered engines.
|
|
174
174
|
|
|
@@ -190,7 +190,7 @@ Callback function to invoke for each registered engine.
|
|
|
190
190
|
|
|
191
191
|
> `static` **get**: (`name`) => *typeof* `WebSearcher`
|
|
192
192
|
|
|
193
|
-
Defined in: [web-searcher/src/searcher.ts:69](https://github.com/isdk/web-searcher.js/blob/
|
|
193
|
+
Defined in: [web-searcher/src/searcher.ts:69](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/searcher.ts#L69)
|
|
194
194
|
|
|
195
195
|
Retrieves a registered search engine class by name.
|
|
196
196
|
|
|
@@ -214,7 +214,7 @@ The search engine class constructor.
|
|
|
214
214
|
|
|
215
215
|
> `static` `optional` **name**: `string`
|
|
216
216
|
|
|
217
|
-
Defined in: [web-searcher/src/searcher.ts:40](https://github.com/isdk/web-searcher.js/blob/
|
|
217
|
+
Defined in: [web-searcher/src/searcher.ts:40](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/searcher.ts#L40)
|
|
218
218
|
|
|
219
219
|
Custom engine name. If not provided, it is derived from the class name.
|
|
220
220
|
For example, `GoogleSearcher` becomes `Google`.
|
|
@@ -225,7 +225,7 @@ For example, `GoogleSearcher` becomes `Google`.
|
|
|
225
225
|
|
|
226
226
|
> `static` **register**: (`ctor`, `options?`) => `boolean`
|
|
227
227
|
|
|
228
|
-
Defined in: [web-searcher/src/searcher.ts:54](https://github.com/isdk/web-searcher.js/blob/
|
|
228
|
+
Defined in: [web-searcher/src/searcher.ts:54](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/searcher.ts#L54)
|
|
229
229
|
|
|
230
230
|
Registers a search engine class.
|
|
231
231
|
|
|
@@ -255,7 +255,7 @@ Registration options. If a string is provided, it is used as the registered name
|
|
|
255
255
|
|
|
256
256
|
> `static` **setAliases**: (`ctor`, ...`aliases`) => `void`
|
|
257
257
|
|
|
258
|
-
Defined in: [web-searcher/src/searcher.ts:93](https://github.com/isdk/web-searcher.js/blob/
|
|
258
|
+
Defined in: [web-searcher/src/searcher.ts:93](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/searcher.ts#L93)
|
|
259
259
|
|
|
260
260
|
Sets aliases for a registered engine.
|
|
261
261
|
|
|
@@ -283,7 +283,7 @@ Aliases to add.
|
|
|
283
283
|
|
|
284
284
|
> `static` **unregister**: (`name?`) => `void`
|
|
285
285
|
|
|
286
|
-
Defined in: [web-searcher/src/searcher.ts:61](https://github.com/isdk/web-searcher.js/blob/
|
|
286
|
+
Defined in: [web-searcher/src/searcher.ts:61](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/searcher.ts#L61)
|
|
287
287
|
|
|
288
288
|
Unregisters a search engine.
|
|
289
289
|
|
|
@@ -307,7 +307,7 @@ The name or class to unregister.
|
|
|
307
307
|
|
|
308
308
|
> **get** **pagination**(): [`PaginationConfig`](../interfaces/PaginationConfig.md) \| `undefined`
|
|
309
309
|
|
|
310
|
-
Defined in: [web-searcher/src/searcher.ts:151](https://github.com/isdk/web-searcher.js/blob/
|
|
310
|
+
Defined in: [web-searcher/src/searcher.ts:151](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/searcher.ts#L151)
|
|
311
311
|
|
|
312
312
|
Optional pagination configuration.
|
|
313
313
|
Defines how the searcher navigates to subsequent pages.
|
|
@@ -326,7 +326,7 @@ If undefined, the searcher will only fetch the first page.
|
|
|
326
326
|
|
|
327
327
|
> **get** `abstract` **template**(): `FetcherOptions`
|
|
328
328
|
|
|
329
|
-
Defined in: [web-searcher/src/searcher.ts:143](https://github.com/isdk/web-searcher.js/blob/
|
|
329
|
+
Defined in: [web-searcher/src/searcher.ts:143](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/searcher.ts#L143)
|
|
330
330
|
|
|
331
331
|
The declarative template for the fetch options.
|
|
332
332
|
|
|
@@ -356,7 +356,7 @@ get template() {
|
|
|
356
356
|
|
|
357
357
|
> `protected` **createContext**(`options`): `FetchContext`
|
|
358
358
|
|
|
359
|
-
Defined in: [web-searcher/src/searcher.ts:155](https://github.com/isdk/web-searcher.js/blob/
|
|
359
|
+
Defined in: [web-searcher/src/searcher.ts:155](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/searcher.ts#L155)
|
|
360
360
|
|
|
361
361
|
#### Parameters
|
|
362
362
|
|
|
@@ -378,7 +378,7 @@ Defined in: [web-searcher/src/searcher.ts:155](https://github.com/isdk/web-searc
|
|
|
378
378
|
|
|
379
379
|
> **dispose**(): `Promise`\<`void`\>
|
|
380
380
|
|
|
381
|
-
Defined in: web-fetcher/dist/index.d.ts:
|
|
381
|
+
Defined in: web-fetcher/dist/index.d.ts:2334
|
|
382
382
|
|
|
383
383
|
Disposes of the session and its associated engine.
|
|
384
384
|
|
|
@@ -401,7 +401,7 @@ This method should be called when the session is no longer needed to free up res
|
|
|
401
401
|
|
|
402
402
|
> **execute**\<`R`\>(`actionOptions`, `context?`): `Promise`\<`FetchActionResult`\<`R`\>\>
|
|
403
403
|
|
|
404
|
-
Defined in: web-fetcher/dist/index.d.ts:
|
|
404
|
+
Defined in: web-fetcher/dist/index.d.ts:2289
|
|
405
405
|
|
|
406
406
|
Executes a single action within the session.
|
|
407
407
|
|
|
@@ -449,7 +449,7 @@ await session.execute({ name: 'goto', params: { url: 'https://example.com' } });
|
|
|
449
449
|
|
|
450
450
|
> **executeAll**(`actions`, `options?`): `Promise`\<\{ `outputs`: `Record`\<`string`, `any`\>; `result`: `FetchResponse` \| `undefined`; \}\>
|
|
451
451
|
|
|
452
|
-
Defined in: web-fetcher/dist/index.d.ts:
|
|
452
|
+
Defined in: web-fetcher/dist/index.d.ts:2306
|
|
453
453
|
|
|
454
454
|
Executes a sequence of actions.
|
|
455
455
|
|
|
@@ -493,7 +493,7 @@ const { result, outputs } = await session.executeAll([
|
|
|
493
493
|
|
|
494
494
|
> `protected` **formatOptions**(`options`): `Record`\<`string`, `any`\>
|
|
495
495
|
|
|
496
|
-
Defined in: [web-searcher/src/searcher.ts:
|
|
496
|
+
Defined in: [web-searcher/src/searcher.ts:309](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/searcher.ts#L309)
|
|
497
497
|
|
|
498
498
|
Transforms standard options into engine-specific template variables.
|
|
499
499
|
|
|
@@ -521,7 +521,7 @@ A dictionary of variables to be injected into the template.
|
|
|
521
521
|
|
|
522
522
|
> **getOutputs**(): `Record`\<`string`, `any`\>
|
|
523
523
|
|
|
524
|
-
Defined in: web-fetcher/dist/index.d.ts:
|
|
524
|
+
Defined in: web-fetcher/dist/index.d.ts:2317
|
|
525
525
|
|
|
526
526
|
Retrieves all outputs accumulated during the session.
|
|
527
527
|
|
|
@@ -541,7 +541,7 @@ A record of stored output data.
|
|
|
541
541
|
|
|
542
542
|
> **getState**(): `Promise`\<\{ `cookies`: `Cookie`[]; `sessionState?`: `any`; \} \| `undefined`\>
|
|
543
543
|
|
|
544
|
-
Defined in: web-fetcher/dist/index.d.ts:
|
|
544
|
+
Defined in: web-fetcher/dist/index.d.ts:2323
|
|
545
545
|
|
|
546
546
|
Gets the current state of the session, including cookies and engine-specific state.
|
|
547
547
|
|
|
@@ -561,7 +561,7 @@ A promise resolving to the session state, or undefined if no engine is initializ
|
|
|
561
561
|
|
|
562
562
|
> **search**(`query`, `options`): `Promise`\<[`StandardSearchResult`](../interfaces/StandardSearchResult.md)[]\>
|
|
563
563
|
|
|
564
|
-
Defined in: [web-searcher/src/searcher.ts:182](https://github.com/isdk/web-searcher.js/blob/
|
|
564
|
+
Defined in: [web-searcher/src/searcher.ts:182](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/searcher.ts#L182)
|
|
565
565
|
|
|
566
566
|
Executes a search query.
|
|
567
567
|
|
|
@@ -594,7 +594,7 @@ A promise resolving to an array of standardized search results.
|
|
|
594
594
|
|
|
595
595
|
> `protected` **transform**(`outputs`, `context`): `Promise`\<[`StandardSearchResult`](../interfaces/StandardSearchResult.md)[]\>
|
|
596
596
|
|
|
597
|
-
Defined in: [web-searcher/src/searcher.ts:
|
|
597
|
+
Defined in: [web-searcher/src/searcher.ts:291](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/searcher.ts#L291)
|
|
598
598
|
|
|
599
599
|
Transform and clean the raw extracted results.
|
|
600
600
|
|
|
@@ -627,7 +627,7 @@ A promise resolving to an array of standardized search results.
|
|
|
627
627
|
|
|
628
628
|
> `static` **search**(`engineName`, `query`, `options`): `Promise`\<[`StandardSearchResult`](../interfaces/StandardSearchResult.md)[]\>
|
|
629
629
|
|
|
630
|
-
Defined in: [web-searcher/src/searcher.ts:106](https://github.com/isdk/web-searcher.js/blob/
|
|
630
|
+
Defined in: [web-searcher/src/searcher.ts:106](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/searcher.ts#L106)
|
|
631
631
|
|
|
632
632
|
Static helper to execute a one-off search.
|
|
633
633
|
|