@isdk/web-searcher 0.1.3 → 0.1.5
- package/README.cn.md +168 -8
- package/README.md +168 -8
- package/dist/index.d.mts +221 -12
- package/dist/index.d.ts +221 -12
- package/dist/index.js +1 -1
- package/dist/index.mjs +1 -1
- package/docs/README.md +168 -8
- package/docs/classes/GoogleSearcher.md +171 -44
- package/docs/classes/WebSearcher.md +158 -45
- package/docs/functions/extractDate.md +42 -0
- package/docs/functions/extractMetadataFrom.md +40 -0
- package/docs/functions/fetchHeaders.md +34 -0
- package/docs/functions/fetchPartial.md +41 -0
- package/docs/functions/normalizeDate.md +29 -0
- package/docs/functions/parseHeaders.md +28 -0
- package/docs/functions/parseHtml.md +31 -0
- package/docs/functions/testUrlsByLatency.md +38 -0
- package/docs/globals.md +18 -0
- package/docs/interfaces/CustomTimeRange.md +3 -3
- package/docs/interfaces/ExtractOptions.md +54 -0
- package/docs/interfaces/FetchExtractorOptions.md +35 -0
- package/docs/interfaces/FetcherOptions.md +424 -0
- package/docs/interfaces/HtmlData.md +53 -0
- package/docs/interfaces/MetadataResult.md +27 -0
- package/docs/interfaces/PaginationConfig.md +9 -9
- package/docs/interfaces/SearchContext.md +30 -4
- package/docs/interfaces/SearchOptions.md +77 -11
- package/docs/interfaces/StandardSearchResult.md +10 -10
- package/docs/interfaces/VerifiedUrl.md +25 -0
- package/docs/type-aliases/MetadataType.md +13 -0
- package/docs/type-aliases/SafeSearchLevel.md +1 -1
- package/docs/type-aliases/SearchCategory.md +2 -2
- package/docs/type-aliases/SearchTimeRange.md +1 -1
- package/docs/type-aliases/SearchTimeRangePreset.md +2 -2
- package/docs/type-aliases/SearcherConstructor.md +2 -2
- package/package.json +3 -2
|
@@ -6,7 +6,7 @@
|
|
|
6
6
|
|
|
7
7
|
# Abstract Class: WebSearcher
|
|
8
8
|
|
|
9
|
-
Defined in: [web-searcher/src/searcher.ts:
|
|
9
|
+
Defined in: [web-searcher/src/searcher.ts:32](https://github.com/isdk/web-searcher.js/blob/955bc509edda39926bd12c6c2b8c28da7eb13ff5/src/searcher.ts#L32)
|
|
10
10
|
|
|
11
11
|
The abstract base class for all search engines.
|
|
12
12
|
|
|
@@ -41,7 +41,7 @@ WebSearcher.register(MySearcher);
|
|
|
41
41
|
|
|
42
42
|
> **new WebSearcher**(`options?`): `WebSearcher`
|
|
43
43
|
|
|
44
|
-
Defined in: web-fetcher/dist/index.d.ts:
|
|
44
|
+
Defined in: web-fetcher/dist/index.d.ts:1171
|
|
45
45
|
|
|
46
46
|
Creates a new FetchSession.
|
|
47
47
|
|
|
@@ -49,7 +49,7 @@ Creates a new FetchSession.
|
|
|
49
49
|
|
|
50
50
|
##### options?
|
|
51
51
|
|
|
52
|
-
`FetcherOptions`
|
|
52
|
+
[`FetcherOptions`](../interfaces/FetcherOptions.md)
|
|
53
53
|
|
|
54
54
|
Configuration options for the fetcher.
|
|
55
55
|
|
|
@@ -67,7 +67,7 @@ Configuration options for the fetcher.
|
|
|
67
67
|
|
|
68
68
|
> `protected` **closed**: `boolean`
|
|
69
69
|
|
|
70
|
-
Defined in: web-fetcher/dist/index.d.ts:
|
|
70
|
+
Defined in: web-fetcher/dist/index.d.ts:1165
|
|
71
71
|
|
|
72
72
|
#### Inherited from
|
|
73
73
|
|
|
@@ -79,7 +79,7 @@ Defined in: web-fetcher/dist/index.d.ts:2269
|
|
|
79
79
|
|
|
80
80
|
> `readonly` **context**: `FetchContext`
|
|
81
81
|
|
|
82
|
-
Defined in: web-fetcher/dist/index.d.ts:
|
|
82
|
+
Defined in: web-fetcher/dist/index.d.ts:1164
|
|
83
83
|
|
|
84
84
|
The execution context for this session, containing configurations, event bus, and shared state.
|
|
85
85
|
|
|
@@ -93,7 +93,7 @@ The execution context for this session, containing configurations, event bus, an
|
|
|
93
93
|
|
|
94
94
|
> `readonly` **id**: `string`
|
|
95
95
|
|
|
96
|
-
Defined in: web-fetcher/dist/index.d.ts:
|
|
96
|
+
Defined in: web-fetcher/dist/index.d.ts:1160
|
|
97
97
|
|
|
98
98
|
Unique identifier for the session.
|
|
99
99
|
|
|
@@ -105,9 +105,9 @@ Unique identifier for the session.
|
|
|
105
105
|
|
|
106
106
|
### options
|
|
107
107
|
|
|
108
|
-
> `protected` **options**: `FetcherOptions`
|
|
108
|
+
> `protected` **options**: [`FetcherOptions`](../interfaces/FetcherOptions.md)
|
|
109
109
|
|
|
110
|
-
Defined in: web-fetcher/dist/index.d.ts:
|
|
110
|
+
Defined in: web-fetcher/dist/index.d.ts:1156
|
|
111
111
|
|
|
112
112
|
#### Inherited from
|
|
113
113
|
|
|
@@ -119,7 +119,7 @@ Defined in: web-fetcher/dist/index.d.ts:2260
|
|
|
119
119
|
|
|
120
120
|
> `static` **\_isFactory**: `boolean` = `false`
|
|
121
121
|
|
|
122
|
-
Defined in: [web-searcher/src/searcher.ts:
|
|
122
|
+
Defined in: [web-searcher/src/searcher.ts:34](https://github.com/isdk/web-searcher.js/blob/955bc509edda39926bd12c6c2b8c28da7eb13ff5/src/searcher.ts#L34)
|
|
123
123
|
|
|
124
124
|
***
|
|
125
125
|
|
|
@@ -127,7 +127,7 @@ Defined in: [web-searcher/src/searcher.ts:33](https://github.com/isdk/web-search
|
|
|
127
127
|
|
|
128
128
|
> `static` `optional` **alias**: `string` \| `string`[]
|
|
129
129
|
|
|
130
|
-
Defined in: [web-searcher/src/searcher.ts:
|
|
130
|
+
Defined in: [web-searcher/src/searcher.ts:46](https://github.com/isdk/web-searcher.js/blob/955bc509edda39926bd12c6c2b8c28da7eb13ff5/src/searcher.ts#L46)
|
|
131
131
|
|
|
132
132
|
Engine alias(es). Can be a single string or an array of strings.
|
|
133
133
|
Useful for registering shorthand names (e.g., 'g' for 'Google').
|
|
@@ -138,7 +138,7 @@ Useful for registering shorthand names (e.g., 'g' for 'Google').
|
|
|
138
138
|
|
|
139
139
|
> `static` **createObject**: (`name`, ...`args`) => `WebSearcher`
|
|
140
140
|
|
|
141
|
-
Defined in: [web-searcher/src/searcher.ts:
|
|
141
|
+
Defined in: [web-searcher/src/searcher.ts:85](https://github.com/isdk/web-searcher.js/blob/955bc509edda39926bd12c6c2b8c28da7eb13ff5/src/searcher.ts#L85)
|
|
142
142
|
|
|
143
143
|
Creates an instance of the registered search engine.
|
|
144
144
|
|
|
@@ -164,11 +164,31 @@ An instance of the search engine.
|
|
|
164
164
|
|
|
165
165
|
***
|
|
166
166
|
|
|
167
|
+
### currentInstanceIndex?
|
|
168
|
+
|
|
169
|
+
> `static` `optional` **currentInstanceIndex**: `number`
|
|
170
|
+
|
|
171
|
+
Defined in: [web-searcher/src/searcher.ts:52](https://github.com/isdk/web-searcher.js/blob/955bc509edda39926bd12c6c2b8c28da7eb13ff5/src/searcher.ts#L52)
|
|
172
|
+
|
|
173
|
+
Globally shared index for tracking the currently active instance (node) across sessions.
|
|
174
|
+
|
|
175
|
+
***
|
|
176
|
+
|
|
177
|
+
### defaultBaseUrls?
|
|
178
|
+
|
|
179
|
+
> `static` `optional` **defaultBaseUrls**: `string`[]
|
|
180
|
+
|
|
181
|
+
Defined in: [web-searcher/src/searcher.ts:49](https://github.com/isdk/web-searcher.js/blob/955bc509edda39926bd12c6c2b8c28da7eb13ff5/src/searcher.ts#L49)
|
|
182
|
+
|
|
183
|
+
Default base URLs for engines that support multiple instances.
|
|
184
|
+
|
|
185
|
+
***
|
|
186
|
+
|
|
167
187
|
### forEach()
|
|
168
188
|
|
|
169
189
|
> `static` **forEach**: (`cb`) => `void`
|
|
170
190
|
|
|
171
|
-
Defined in: [web-searcher/src/searcher.ts:
|
|
191
|
+
Defined in: [web-searcher/src/searcher.ts:92](https://github.com/isdk/web-searcher.js/blob/955bc509edda39926bd12c6c2b8c28da7eb13ff5/src/searcher.ts#L92)
|
|
172
192
|
|
|
173
193
|
Iterates over all registered engines.
|
|
174
194
|
|
|
@@ -190,7 +210,7 @@ Callback function to invoke for each registered engine.
|
|
|
190
210
|
|
|
191
211
|
> `static` **get**: (`name`) => *typeof* `WebSearcher`
|
|
192
212
|
|
|
193
|
-
Defined in: [web-searcher/src/searcher.ts:
|
|
213
|
+
Defined in: [web-searcher/src/searcher.ts:76](https://github.com/isdk/web-searcher.js/blob/955bc509edda39926bd12c6c2b8c28da7eb13ff5/src/searcher.ts#L76)
|
|
194
214
|
|
|
195
215
|
Retrieves a registered search engine class by name.
|
|
196
216
|
|
|
@@ -214,7 +234,7 @@ The search engine class constructor.
|
|
|
214
234
|
|
|
215
235
|
> `static` `optional` **name**: `string`
|
|
216
236
|
|
|
217
|
-
Defined in: [web-searcher/src/searcher.ts:
|
|
237
|
+
Defined in: [web-searcher/src/searcher.ts:41](https://github.com/isdk/web-searcher.js/blob/955bc509edda39926bd12c6c2b8c28da7eb13ff5/src/searcher.ts#L41)
|
|
218
238
|
|
|
219
239
|
Custom engine name. If not provided, it is derived from the class name.
|
|
220
240
|
For example, `GoogleSearcher` becomes `Google`.
|
|
@@ -225,7 +245,7 @@ For example, `GoogleSearcher` becomes `Google`.
|
|
|
225
245
|
|
|
226
246
|
> `static` **register**: (`ctor`, `options?`) => `boolean`
|
|
227
247
|
|
|
228
|
-
Defined in: [web-searcher/src/searcher.ts:
|
|
248
|
+
Defined in: [web-searcher/src/searcher.ts:61](https://github.com/isdk/web-searcher.js/blob/955bc509edda39926bd12c6c2b8c28da7eb13ff5/src/searcher.ts#L61)
|
|
229
249
|
|
|
230
250
|
Registers a search engine class.
|
|
231
251
|
|
|
@@ -255,7 +275,7 @@ Registration options. If a string is provided, it is used as the registered name
|
|
|
255
275
|
|
|
256
276
|
> `static` **setAliases**: (`ctor`, ...`aliases`) => `void`
|
|
257
277
|
|
|
258
|
-
Defined in: [web-searcher/src/searcher.ts:
|
|
278
|
+
Defined in: [web-searcher/src/searcher.ts:100](https://github.com/isdk/web-searcher.js/blob/955bc509edda39926bd12c6c2b8c28da7eb13ff5/src/searcher.ts#L100)
|
|
259
279
|
|
|
260
280
|
Sets aliases for a registered engine.
|
|
261
281
|
|
|
@@ -283,7 +303,7 @@ Aliases to add.
|
|
|
283
303
|
|
|
284
304
|
> `static` **unregister**: (`name?`) => `void`
|
|
285
305
|
|
|
286
|
-
Defined in: [web-searcher/src/searcher.ts:
|
|
306
|
+
Defined in: [web-searcher/src/searcher.ts:68](https://github.com/isdk/web-searcher.js/blob/955bc509edda39926bd12c6c2b8c28da7eb13ff5/src/searcher.ts#L68)
|
|
287
307
|
|
|
288
308
|
Unregisters a search engine.
|
|
289
309
|
|
|
@@ -307,7 +327,7 @@ The name or class to unregister.
|
|
|
307
327
|
|
|
308
328
|
> **get** **pagination**(): [`PaginationConfig`](../interfaces/PaginationConfig.md) \| `undefined`
|
|
309
329
|
|
|
310
|
-
Defined in: [web-searcher/src/searcher.ts:
|
|
330
|
+
Defined in: [web-searcher/src/searcher.ts:198](https://github.com/isdk/web-searcher.js/blob/955bc509edda39926bd12c6c2b8c28da7eb13ff5/src/searcher.ts#L198)
|
|
311
331
|
|
|
312
332
|
Optional pagination configuration.
|
|
313
333
|
Defines how the searcher navigates to subsequent pages.
|
|
@@ -324,15 +344,17 @@ If undefined, the searcher will only fetch the first page.
|
|
|
324
344
|
|
|
325
345
|
#### Get Signature
|
|
326
346
|
|
|
327
|
-
> **get**
|
|
347
|
+
> **get** **template**(): [`FetcherOptions`](../interfaces/FetcherOptions.md)
|
|
328
348
|
|
|
329
|
-
Defined in: [web-searcher/src/searcher.ts:
|
|
349
|
+
Defined in: [web-searcher/src/searcher.ts:188](https://github.com/isdk/web-searcher.js/blob/955bc509edda39926bd12c6c2b8c28da7eb13ff5/src/searcher.ts#L188)
|
|
330
350
|
|
|
331
351
|
The declarative template for the fetch options.
|
|
332
352
|
|
|
333
|
-
Subclasses
|
|
353
|
+
Subclasses can implement this getter to provide the engine configuration,
|
|
334
354
|
including the base URL, search parameters pattern, and extraction rules.
|
|
335
355
|
|
|
356
|
+
This getter is **optional** if you override [getTemplate](#gettemplate).
|
|
357
|
+
|
|
336
358
|
Supports variable injection using syntax like `${query}`, `${offset}`, etc.
|
|
337
359
|
|
|
338
360
|
##### Example
|
|
@@ -348,21 +370,47 @@ get template() {
|
|
|
348
370
|
|
|
349
371
|
##### Returns
|
|
350
372
|
|
|
351
|
-
`FetcherOptions`
|
|
373
|
+
[`FetcherOptions`](../interfaces/FetcherOptions.md)
|
|
352
374
|
|
|
353
375
|
## Methods
|
|
354
376
|
|
|
377
|
+
### \_logDebug()
|
|
378
|
+
|
|
379
|
+
> `protected` **\_logDebug**(`category`, ...`args`): `void`
|
|
380
|
+
|
|
381
|
+
Defined in: web-fetcher/dist/index.d.ts:1172
|
|
382
|
+
|
|
383
|
+
#### Parameters
|
|
384
|
+
|
|
385
|
+
##### category
|
|
386
|
+
|
|
387
|
+
`string`
|
|
388
|
+
|
|
389
|
+
##### args
|
|
390
|
+
|
|
391
|
+
...`any`[]
|
|
392
|
+
|
|
393
|
+
#### Returns
|
|
394
|
+
|
|
395
|
+
`void`
|
|
396
|
+
|
|
397
|
+
#### Inherited from
|
|
398
|
+
|
|
399
|
+
`FetchSession._logDebug`
|
|
400
|
+
|
|
401
|
+
***
|
|
402
|
+
|
|
355
403
|
### createContext()
|
|
356
404
|
|
|
357
405
|
> `protected` **createContext**(`options`): `FetchContext`
|
|
358
406
|
|
|
359
|
-
Defined in: [web-searcher/src/searcher.ts:
|
|
407
|
+
Defined in: [web-searcher/src/searcher.ts:216](https://github.com/isdk/web-searcher.js/blob/955bc509edda39926bd12c6c2b8c28da7eb13ff5/src/searcher.ts#L216)
|
|
360
408
|
|
|
361
409
|
#### Parameters
|
|
362
410
|
|
|
363
411
|
##### options
|
|
364
412
|
|
|
365
|
-
`FetcherOptions` = `...`
|
|
413
|
+
[`FetcherOptions`](../interfaces/FetcherOptions.md) = `...`
|
|
366
414
|
|
|
367
415
|
#### Returns
|
|
368
416
|
|
|
@@ -378,7 +426,7 @@ Defined in: [web-searcher/src/searcher.ts:155](https://github.com/isdk/web-searc
|
|
|
378
426
|
|
|
379
427
|
> **dispose**(): `Promise`\<`void`\>
|
|
380
428
|
|
|
381
|
-
Defined in: web-fetcher/dist/index.d.ts:
|
|
429
|
+
Defined in: web-fetcher/dist/index.d.ts:1231
|
|
382
430
|
|
|
383
431
|
Disposes of the session and its associated engine.
|
|
384
432
|
|
|
@@ -401,7 +449,7 @@ This method should be called when the session is no longer needed to free up res
|
|
|
401
449
|
|
|
402
450
|
> **execute**\<`R`\>(`actionOptions`, `context?`): `Promise`\<`FetchActionResult`\<`R`\>\>
|
|
403
451
|
|
|
404
|
-
Defined in: web-fetcher/dist/index.d.ts:
|
|
452
|
+
Defined in: web-fetcher/dist/index.d.ts:1186
|
|
405
453
|
|
|
406
454
|
Executes a single action within the session.
|
|
407
455
|
|
|
@@ -449,7 +497,7 @@ await session.execute({ name: 'goto', params: { url: 'https://example.com' } });
|
|
|
449
497
|
|
|
450
498
|
> **executeAll**(`actions`, `options?`): `Promise`\<\{ `outputs`: `Record`\<`string`, `any`\>; `result`: `FetchResponse` \| `undefined`; \}\>
|
|
451
499
|
|
|
452
|
-
Defined in: web-fetcher/dist/index.d.ts:
|
|
500
|
+
Defined in: web-fetcher/dist/index.d.ts:1203
|
|
453
501
|
|
|
454
502
|
Executes a sequence of actions.
|
|
455
503
|
|
|
@@ -457,13 +505,13 @@ Executes a sequence of actions.
|
|
|
457
505
|
|
|
458
506
|
##### actions
|
|
459
507
|
|
|
460
|
-
`_RequireAtLeastOne`\<`FetchActionProperties`, `"
|
|
508
|
+
`_RequireAtLeastOne`\<`FetchActionProperties`, `"name"` \| `"id"` \| `"action"`\>[]
|
|
461
509
|
|
|
462
510
|
An array of action options to be executed in order.
|
|
463
511
|
|
|
464
512
|
##### options?
|
|
465
513
|
|
|
466
|
-
`Partial
|
|
514
|
+
`Partial`\<[`FetcherOptions`](../interfaces/FetcherOptions.md)\> & `object`
|
|
467
515
|
|
|
468
516
|
Optional temporary configuration overrides (e.g., timeoutMs, headers) for this batch of actions.
|
|
469
517
|
These overrides do not affect the main session context.
|
|
@@ -493,7 +541,7 @@ const { result, outputs } = await session.executeAll([
|
|
|
493
541
|
|
|
494
542
|
> `protected` **formatOptions**(`options`): `Record`\<`string`, `any`\>
|
|
495
543
|
|
|
496
|
-
Defined in: [web-searcher/src/searcher.ts:
|
|
544
|
+
Defined in: [web-searcher/src/searcher.ts:457](https://github.com/isdk/web-searcher.js/blob/955bc509edda39926bd12c6c2b8c28da7eb13ff5/src/searcher.ts#L457)
|
|
497
545
|
|
|
498
546
|
Transforms standard options into engine-specific template variables.
|
|
499
547
|
|
|
@@ -521,7 +569,7 @@ A dictionary of variables to be injected into the template.
|
|
|
521
569
|
|
|
522
570
|
> **getOutputs**(): `Record`\<`string`, `any`\>
|
|
523
571
|
|
|
524
|
-
Defined in: web-fetcher/dist/index.d.ts:
|
|
572
|
+
Defined in: web-fetcher/dist/index.d.ts:1214
|
|
525
573
|
|
|
526
574
|
Retrieves all outputs accumulated during the session.
|
|
527
575
|
|
|
@@ -541,7 +589,7 @@ A record of stored output data.
|
|
|
541
589
|
|
|
542
590
|
> **getState**(): `Promise`\<\{ `cookies`: `Cookie`[]; `sessionState?`: `any`; \} \| `undefined`\>
|
|
543
591
|
|
|
544
|
-
Defined in: web-fetcher/dist/index.d.ts:
|
|
592
|
+
Defined in: web-fetcher/dist/index.d.ts:1220
|
|
545
593
|
|
|
546
594
|
Gets the current state of the session, including cookies and engine-specific state.
|
|
547
595
|
|
|
@@ -557,16 +605,49 @@ A promise resolving to the session state, or undefined if no engine is initializ
|
|
|
557
605
|
|
|
558
606
|
***
|
|
559
607
|
|
|
608
|
+
### getTemplate()
|
|
609
|
+
|
|
610
|
+
> `protected` **getTemplate**(`variables`, `options`): [`FetcherOptions`](../interfaces/FetcherOptions.md)
|
|
611
|
+
|
|
612
|
+
Defined in: [web-searcher/src/searcher.ts:212](https://github.com/isdk/web-searcher.js/blob/955bc509edda39926bd12c6c2b8c28da7eb13ff5/src/searcher.ts#L212)
|
|
613
|
+
|
|
614
|
+
Dynamically retrieves the fetch template based on current variables and search options.
|
|
615
|
+
|
|
616
|
+
Subclasses can override this method to return different extraction rules (actions)
|
|
617
|
+
or URL patterns based on the search category, region, or other parameters.
|
|
618
|
+
|
|
619
|
+
#### Parameters
|
|
620
|
+
|
|
621
|
+
##### variables
|
|
622
|
+
|
|
623
|
+
`Record`\<`string`, `any`\>
|
|
624
|
+
|
|
625
|
+
The calculated variables (from formatOptions, pagination, etc.).
|
|
626
|
+
|
|
627
|
+
##### options
|
|
628
|
+
|
|
629
|
+
[`SearchOptions`](../interfaces/SearchOptions.md)
|
|
630
|
+
|
|
631
|
+
The original search options provided by the user.
|
|
632
|
+
|
|
633
|
+
#### Returns
|
|
634
|
+
|
|
635
|
+
[`FetcherOptions`](../interfaces/FetcherOptions.md)
|
|
636
|
+
|
|
637
|
+
The fetcher configuration to be used for the current request.
|
|
638
|
+
|
|
639
|
+
***
|
|
640
|
+
|
|
560
641
|
### search()
|
|
561
642
|
|
|
562
643
|
> **search**(`query`, `options`): `Promise`\<[`StandardSearchResult`](../interfaces/StandardSearchResult.md)[]\>
|
|
563
644
|
|
|
564
|
-
Defined in: [web-searcher/src/searcher.ts:
|
|
645
|
+
Defined in: [web-searcher/src/searcher.ts:246](https://github.com/isdk/web-searcher.js/blob/955bc509edda39926bd12c6c2b8c28da7eb13ff5/src/searcher.ts#L246)
|
|
565
646
|
|
|
566
647
|
Executes a search query.
|
|
567
648
|
|
|
568
|
-
This method handles the pagination loop, variable injection,
|
|
569
|
-
and result transformation.
|
|
649
|
+
This method handles the pagination loop, multi-instance failover, variable injection,
|
|
650
|
+
fetching, and result transformation.
|
|
570
651
|
|
|
571
652
|
#### Parameters
|
|
572
653
|
|
|
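The `getTemplate(variables, options)` hook added in this hunk can be illustrated with a minimal, self-contained sketch. Everything named here is an assumption made for illustration: `TemplateSketch` and `SearchOptionsSketch` are simplified stand-ins for the package's `FetcherOptions` and `SearchOptions`, and the `category` value, URLs, and selectors are hypothetical. The sketch also builds the URL directly rather than using the `${query}` injection syntax the docs mention.

```typescript
// Hypothetical, simplified stand-ins for FetcherOptions and SearchOptions.
interface TemplateSketch {
  url: string;
  selector: string; // illustrative extraction rule, not the real schema
}

interface SearchOptionsSketch {
  category?: string;
}

// Mirrors the documented idea: choose a template from the calculated
// variables and the user's original search options.
function getTemplateSketch(
  variables: Record<string, any>,
  options: SearchOptionsSketch,
): TemplateSketch {
  const q = encodeURIComponent(String(variables.query ?? ''));
  if (options.category === 'images') {
    // Different URL pattern and extraction rule for image searches.
    return { url: `https://search.example/images?q=${q}`, selector: 'img.result' };
  }
  // Default: general web search.
  return { url: `https://search.example/search?q=${q}`, selector: 'div.result' };
}
```

In a real subclass the return value would presumably be a full `FetcherOptions` object; the two-field shape above only shows the branching.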
@@ -594,7 +675,7 @@ A promise resolving to an array of standardized search results.
|
|
|
594
675
|
|
|
595
676
|
> `protected` **transform**(`outputs`, `context`): `Promise`\<[`StandardSearchResult`](../interfaces/StandardSearchResult.md)[]\>
|
|
596
677
|
|
|
597
|
-
Defined in: [web-searcher/src/searcher.ts:
|
|
678
|
+
Defined in: [web-searcher/src/searcher.ts:439](https://github.com/isdk/web-searcher.js/blob/955bc509edda39926bd12c6c2b8c28da7eb13ff5/src/searcher.ts#L439)
|
|
598
679
|
|
|
599
680
|
Transform and clean the raw extracted results.
|
|
600
681
|
|
|
@@ -623,24 +704,56 @@ A promise resolving to an array of standardized search results.
|
|
|
623
704
|
|
|
624
705
|
***
|
|
625
706
|
|
|
707
|
+
### validateFetchResult()
|
|
708
|
+
|
|
709
|
+
> `protected` **validateFetchResult**(`results`, `context`): `Promise`\<`boolean`\>
|
|
710
|
+
|
|
711
|
+
Defined in: [web-searcher/src/searcher.ts:421](https://github.com/isdk/web-searcher.js/blob/955bc509edda39926bd12c6c2b8c28da7eb13ff5/src/searcher.ts#L421)
|
|
712
|
+
|
|
713
|
+
Hook for subclasses to validate fetched results before they are accepted.
|
|
714
|
+
If this returns false, the instance manager will consider the fetch a failure
|
|
715
|
+
and automatically switch to the next available baseUrl (if any).
|
|
716
|
+
|
|
717
|
+
#### Parameters
|
|
718
|
+
|
|
719
|
+
##### results
|
|
720
|
+
|
|
721
|
+
[`StandardSearchResult`](../interfaces/StandardSearchResult.md)[]
|
|
722
|
+
|
|
723
|
+
The extracted results.
|
|
724
|
+
|
|
725
|
+
##### context
|
|
726
|
+
|
|
727
|
+
[`SearchContext`](../interfaces/SearchContext.md)
|
|
728
|
+
|
|
729
|
+
Context including the current baseUrl and page.
|
|
730
|
+
|
|
731
|
+
#### Returns
|
|
732
|
+
|
|
733
|
+
`Promise`\<`boolean`\>
|
|
734
|
+
|
|
735
|
+
A promise resolving to true if valid, false otherwise.
|
|
736
|
+
|
|
737
|
+
***
|
|
738
|
+
|
|
626
739
|
### search()
|
|
627
740
|
|
|
628
|
-
> `static` **search**(`
|
|
741
|
+
> `static` **search**(`engineNames`, `query`, `options`): `Promise`\<[`StandardSearchResult`](../interfaces/StandardSearchResult.md)[]\>
|
|
629
742
|
|
|
630
|
-
Defined in: [web-searcher/src/searcher.ts:
|
|
743
|
+
Defined in: [web-searcher/src/searcher.ts:113](https://github.com/isdk/web-searcher.js/blob/955bc509edda39926bd12c6c2b8c28da7eb13ff5/src/searcher.ts#L113)
|
|
631
744
|
|
|
632
|
-
Static helper to execute a one-off search.
|
|
745
|
+
Static helper to execute a one-off search or a fallback chain.
|
|
633
746
|
|
|
634
|
-
It creates an instance of the specified engine, executes the search, and
|
|
635
|
-
|
|
747
|
+
It creates an instance of the specified engine(s), executes the search, and automatically
|
|
748
|
+
falls back to the next engine in the list if the current one fails or is exhausted.
|
|
636
749
|
|
|
637
750
|
#### Parameters
|
|
638
751
|
|
|
639
|
-
#####
|
|
752
|
+
##### engineNames
|
|
640
753
|
|
|
641
|
-
|
|
754
|
+
The name(s) of the engine(s) to use (e.g., 'Google' or ['SearXNG', 'Google']).
|
|
642
755
|
|
|
643
|
-
|
|
756
|
+
`string` | `string`[]
|
|
644
757
|
|
|
645
758
|
##### query
|
|
646
759
|
|
|
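The interaction between `validateFetchResult()` and switching to the next `baseUrl` might look roughly like the sketch below. It does not use the package's instance manager: `ResultSketch`, `FetchFromBase`, and `searchWithInstanceFailover` are hypothetical names. Treating an empty result list as invalid, treating a thrown error like a failed validation, and returning an empty array when every instance fails are all assumptions of this sketch.

```typescript
interface ResultSketch {
  title: string;
  url: string;
}

// Hypothetical: fetches and extracts results from one instance (base URL).
type FetchFromBase = (baseUrl: string) => Promise<ResultSketch[]>;

// Hook-style validator. Assumption: an empty list counts as a failed fetch.
async function validateFetchResultSketch(results: ResultSketch[]): Promise<boolean> {
  return results.length > 0;
}

// Try each base URL in order; move on when the fetch throws or validation fails.
async function searchWithInstanceFailover(
  baseUrls: string[],
  fetchFromBase: FetchFromBase,
): Promise<ResultSketch[]> {
  for (const baseUrl of baseUrls) {
    try {
      const results = await fetchFromBase(baseUrl);
      if (await validateFetchResultSketch(results)) return results;
    } catch (_err) {
      // Assumption: a thrown error is handled like a failed validation.
    }
  }
  // Assumption: no instance produced valid results, so return nothing.
  return [];
}
```

A subclass could tighten the validator, for example by rejecting pages whose results all point back at the instance itself.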
@@ -650,7 +763,7 @@ The search query string.
|
|
|
650
763
|
|
|
651
764
|
##### options
|
|
652
765
|
|
|
653
|
-
[`SearchOptions`](../interfaces/SearchOptions.md) & `FetcherOptions` = `{}`
|
|
766
|
+
[`SearchOptions`](../interfaces/SearchOptions.md) & [`FetcherOptions`](../interfaces/FetcherOptions.md) = `{}`
|
|
654
767
|
|
|
655
768
|
Combined search options and fetcher options.
|
|
656
769
|
|
|
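The fallback chain described for the static `search(engineNames, query, options)` helper can be sketched without the real registry. In this sketch the engines are plain functions in a hypothetical `engines` record, and "fails or is exhausted" is interpreted as "throws or returns no results", which is an assumption rather than the package's documented rule. Skipping unknown names and rethrowing the last error are also choices made only for this example.

```typescript
interface EngineResult {
  title: string;
  url: string;
}

// Hypothetical stand-in for a registered engine's search method.
type EngineSearchFn = (query: string) => Promise<EngineResult[]>;

async function searchWithEngineFallback(
  engineNames: string | string[],
  query: string,
  engines: Record<string, EngineSearchFn>,
): Promise<EngineResult[]> {
  // Accept a single name or a list, as the documented signature does.
  const names = Array.isArray(engineNames) ? engineNames : [engineNames];
  let lastError: unknown;
  for (const name of names) {
    const run = engines[name];
    if (!run) continue; // unknown engine name: skipped in this sketch
    try {
      const results = await run(query);
      if (results.length > 0) return results; // first engine with results wins
    } catch (err) {
      lastError = err; // remember the failure and try the next engine
    }
  }
  if (lastError !== undefined) throw lastError;
  return [];
}
```

With the real package, the equivalent call would presumably be the one shown in the parameter description, passing `['SearXNG', 'Google']` as `engineNames`.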
@@ -0,0 +1,42 @@
|
|
|
1
|
+
[**@isdk/web-searcher**](../README.md)
|
|
2
|
+
|
|
3
|
+
***
|
|
4
|
+
|
|
5
|
+
[@isdk/web-searcher](../globals.md) / extractDate
|
|
6
|
+
|
|
7
|
+
# Function: extractDate()
|
|
8
|
+
|
|
9
|
+
> **extractDate**(`url`, `options`): `Promise`\<`string` \| `null`\>
|
|
10
|
+
|
|
11
|
+
Defined in: [web-searcher/src/utils/extractor/date-extractor.ts:30](https://github.com/isdk/web-searcher.js/blob/955bc509edda39926bd12c6c2b8c28da7eb13ff5/src/utils/extractor/date-extractor.ts#L30)
|
|
12
|
+
|
|
13
|
+
High-level convenience function to extract the publication or modification date from a URL.
|
|
14
|
+
It performs a partial fetch of the content and applies multiple extraction rules
|
|
15
|
+
(LD+JSON, Meta tags, Time tags, Headers) to find the most reliable date.
|
|
16
|
+
|
|
17
|
+
## Parameters
|
|
18
|
+
|
|
19
|
+
### url
|
|
20
|
+
|
|
21
|
+
`string`
|
|
22
|
+
|
|
23
|
+
The web page URL to analyze.
|
|
24
|
+
|
|
25
|
+
### options
|
|
26
|
+
|
|
27
|
+
[`ExtractOptions`](../interfaces/ExtractOptions.md) = `{}`
|
|
28
|
+
|
|
29
|
+
Fetch and extraction options.
|
|
30
|
+
|
|
31
|
+
## Returns
|
|
32
|
+
|
|
33
|
+
`Promise`\<`string` \| `null`\>
|
|
34
|
+
|
|
35
|
+
An ISO 8601 date string, or null if no valid date could be found.
|
|
36
|
+
|
|
37
|
+
## Example
|
|
38
|
+
|
|
39
|
+
```ts
|
|
40
|
+
const date = await extractDate('https://example.com/article');
|
|
41
|
+
console.log(date); // "2024-01-20T12:00:00.000Z"
|
|
42
|
+
```
|
|
@@ -0,0 +1,40 @@
|
|
|
1
|
+
[**@isdk/web-searcher**](../README.md)
|
|
2
|
+
|
|
3
|
+
***
|
|
4
|
+
|
|
5
|
+
[@isdk/web-searcher](../globals.md) / extractMetadataFrom
|
|
6
|
+
|
|
7
|
+
# Function: extractMetadataFrom()
|
|
8
|
+
|
|
9
|
+
> **extractMetadataFrom**(`result`, `type`): `string` \| `null`
|
|
10
|
+
|
|
11
|
+
Defined in: [web-searcher/src/utils/extractor/extractor.ts:27](https://github.com/isdk/web-searcher.js/blob/955bc509edda39926bd12c6c2b8c28da7eb13ff5/src/utils/extractor/extractor.ts#L27)
|
|
12
|
+
|
|
13
|
+
Extracts specific metadata from parsed HTML and headers based on a requested type.
|
|
14
|
+
Currently supports 'date' extraction with a prioritized fallback mechanism.
|
|
15
|
+
|
|
16
|
+
## Parameters
|
|
17
|
+
|
|
18
|
+
### result
|
|
19
|
+
|
|
20
|
+
An object containing the raw HTML content and response headers.
|
|
21
|
+
|
|
22
|
+
#### content
|
|
23
|
+
|
|
24
|
+
`string`
|
|
25
|
+
|
|
26
|
+
#### headers
|
|
27
|
+
|
|
28
|
+
`Headers`
|
|
29
|
+
|
|
30
|
+
### type
|
|
31
|
+
|
|
32
|
+
`string`
|
|
33
|
+
|
|
34
|
+
The type of metadata to extract.
|
|
35
|
+
|
|
36
|
+
## Returns
|
|
37
|
+
|
|
38
|
+
`string` \| `null`
|
|
39
|
+
|
|
40
|
+
The extracted and normalized value, or null if not found.
|
|
@@ -0,0 +1,34 @@
|
|
|
1
|
+
[**@isdk/web-searcher**](../README.md)
|
|
2
|
+
|
|
3
|
+
***
|
|
4
|
+
|
|
5
|
+
[@isdk/web-searcher](../globals.md) / fetchHeaders
|
|
6
|
+
|
|
7
|
+
# Function: fetchHeaders()
|
|
8
|
+
|
|
9
|
+
> **fetchHeaders**(`url`, `options`): `Promise`\<`Headers` \| `null`\>
|
|
10
|
+
|
|
11
|
+
Defined in: [web-searcher/src/utils/extractor/fetcher.ts:19](https://github.com/isdk/web-searcher.js/blob/955bc509edda39926bd12c6c2b8c28da7eb13ff5/src/utils/extractor/fetcher.ts#L19)
|
|
12
|
+
|
|
13
|
+
Fetches only the HTTP headers for a given URL using a HEAD request.
|
|
14
|
+
Useful for checking 'last-modified' without downloading the body.
|
|
15
|
+
|
|
16
|
+
## Parameters
|
|
17
|
+
|
|
18
|
+
### url
|
|
19
|
+
|
|
20
|
+
`string`
|
|
21
|
+
|
|
22
|
+
The URL to check.
|
|
23
|
+
|
|
24
|
+
### options
|
|
25
|
+
|
|
26
|
+
[`FetchExtractorOptions`](../interfaces/FetchExtractorOptions.md) = `{}`
|
|
27
|
+
|
|
28
|
+
Request options.
|
|
29
|
+
|
|
30
|
+
## Returns
|
|
31
|
+
|
|
32
|
+
`Promise`\<`Headers` \| `null`\>
|
|
33
|
+
|
|
34
|
+
The Headers object, or null on failure.
|
|
@@ -0,0 +1,41 @@
|
|
|
1
|
+
[**@isdk/web-searcher**](../README.md)
|
|
2
|
+
|
|
3
|
+
***
|
|
4
|
+
|
|
5
|
+
[@isdk/web-searcher](../globals.md) / fetchPartial
|
|
6
|
+
|
|
7
|
+
# Function: fetchPartial()
|
|
8
|
+
|
|
9
|
+
> **fetchPartial**(`url`, `maxBytes`, `options`): `Promise`\<\{ `content`: `string`; `headers`: `Headers`; \} \| `null`\>
|
|
10
|
+
|
|
11
|
+
Defined in: [web-searcher/src/utils/extractor/fetcher.ts:55](https://github.com/isdk/web-searcher.js/blob/955bc509edda39926bd12c6c2b8c28da7eb13ff5/src/utils/extractor/fetcher.ts#L55)
|
|
12
|
+
|
|
13
|
+
Fetches a partial amount of content from a URL.
|
|
14
|
+
Automatically handles character set detection from the Content-Type header.
|
|
15
|
+
Aborts the request once the specified maxBytes is reached.
|
|
16
|
+
|
|
17
|
+
## Parameters
|
|
18
|
+
|
|
19
|
+
### url
|
|
20
|
+
|
|
21
|
+
`string`
|
|
22
|
+
|
|
23
|
+
The URL to fetch.
|
|
24
|
+
|
|
25
|
+
### maxBytes
|
|
26
|
+
|
|
27
|
+
`number` = `32768`
|
|
28
|
+
|
|
29
|
+
The maximum number of bytes to read. Defaults to 32KB.
|
|
30
|
+
|
|
31
|
+
### options
|
|
32
|
+
|
|
33
|
+
[`FetchExtractorOptions`](../interfaces/FetchExtractorOptions.md) = `{}`
|
|
34
|
+
|
|
35
|
+
Request options.
|
|
36
|
+
|
|
37
|
+
## Returns
|
|
38
|
+
|
|
39
|
+
`Promise`\<\{ `content`: `string`; `headers`: `Headers`; \} \| `null`\>
|
|
40
|
+
|
|
41
|
+
An object containing the decoded content string and the response headers.
|
|
@@ -0,0 +1,29 @@
|
|
|
1
|
+
[**@isdk/web-searcher**](../README.md)
|
|
2
|
+
|
|
3
|
+
***
|
|
4
|
+
|
|
5
|
+
[@isdk/web-searcher](../globals.md) / normalizeDate
|
|
6
|
+
|
|
7
|
+
# Function: normalizeDate()
|
|
8
|
+
|
|
9
|
+
> **normalizeDate**(`dateStr`): `string` \| `null`
|
|
10
|
+
|
|
11
|
+
Defined in: [web-searcher/src/utils/extractor/date-normalizer.ts:9](https://github.com/isdk/web-searcher.js/blob/955bc509edda39926bd12c6c2b8c28da7eb13ff5/src/utils/extractor/date-normalizer.ts#L9)
|
|
12
|
+
|
|
13
|
+
Normalizes a date string into a standard ISO 8601 format (UTC).
|
|
14
|
+
It handles various formats (YYYY-MM-DD, RFC2822, etc.) and performs
|
|
15
|
+
aggressive cleaning and sanity checks.
|
|
16
|
+
|
|
17
|
+
## Parameters
|
|
18
|
+
|
|
19
|
+
### dateStr
|
|
20
|
+
|
|
21
|
+
The raw date string to normalize.
|
|
22
|
+
|
|
23
|
+
`string` | `null`
|
|
24
|
+
|
|
25
|
+
## Returns
|
|
26
|
+
|
|
27
|
+
`string` \| `null`
|
|
28
|
+
|
|
29
|
+
An ISO 8601 string (e.g., "2024-01-20T00:00:00.000Z") or null if invalid.
|
|
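The behaviour documented for `normalizeDate()` can be approximated with the built-in `Date` object. The sketch below is not the package's implementation: `normalizeDateSketch` is a hypothetical name, the cleaning step is reduced to trimming whitespace, and the only sanity check is an assumed rule that rejects dates more than one day in the future. Parsing of non-ISO formats through `Date.parse` depends on the JavaScript engine.

```typescript
function normalizeDateSketch(dateStr: string | null): string | null {
  if (!dateStr) return null;
  const cleaned = dateStr.trim();
  if (cleaned === '') return null;

  // Date.parse returns NaN for strings it cannot interpret.
  const time = Date.parse(cleaned);
  if (isNaN(time)) return null;

  // Assumed sanity check: publication dates should not lie in the future.
  const oneDayMs = 24 * 60 * 60 * 1000;
  if (time > Date.now() + oneDayMs) return null;

  // toISOString always reports the instant in UTC.
  return new Date(time).toISOString();
}
```

For a date-only ISO input such as `'2024-01-20'`, the ECMAScript date parsing rules treat the value as UTC midnight, so the sketch returns `"2024-01-20T00:00:00.000Z"`, matching the example in the Returns description.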
@@ -0,0 +1,28 @@
|
|
|
1
|
+
[**@isdk/web-searcher**](../README.md)
|
|
2
|
+
|
|
3
|
+
***
|
|
4
|
+
|
|
5
|
+
[@isdk/web-searcher](../globals.md) / parseHeaders
|
|
6
|
+
|
|
7
|
+
# Function: parseHeaders()
|
|
8
|
+
|
|
9
|
+
> **parseHeaders**(`headers`): `Record`\<`string`, `string`\>
|
|
10
|
+
|
|
11
|
+
Defined in: [web-searcher/src/utils/extractor/parser.ts:25](https://github.com/isdk/web-searcher.js/blob/955bc509edda39926bd12c6c2b8c28da7eb13ff5/src/utils/extractor/parser.ts#L25)
|
|
12
|
+
|
|
13
|
+
Converts a Web API Headers object into a plain JavaScript record.
|
|
14
|
+
All header names are converted to lowercase for consistent access.
|
|
15
|
+
|
|
16
|
+
## Parameters
|
|
17
|
+
|
|
18
|
+
### headers
|
|
19
|
+
|
|
20
|
+
`Headers`
|
|
21
|
+
|
|
22
|
+
The Headers object to parse.
|
|
23
|
+
|
|
24
|
+
## Returns
|
|
25
|
+
|
|
26
|
+
`Record`\<`string`, `string`\>
|
|
27
|
+
|
|
28
|
+
A record where keys are lowercase header names.
|
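The conversion that `parseHeaders()` describes can be sketched in a few lines. `parseHeadersSketch` is a hypothetical name, and the parameter is typed as any iterable of `[name, value]` pairs so the example runs without a Web API `Headers` type being available. How the real function handles repeated header names is not stated in the docs; in this sketch the last value wins.

```typescript
function parseHeadersSketch(
  headers: Iterable<[string, string]>,
): Record<string, string> {
  const record: Record<string, string> = {};
  for (const [name, value] of headers) {
    // Lowercase every name so lookups do not depend on the sender's casing.
    record[name.toLowerCase()] = value;
  }
  return record;
}

// Usage with plain pairs.
const pairs: Array<[string, string]> = [
  ['Content-Type', 'text/html; charset=utf-8'],
  ['Last-Modified', 'Sat, 20 Jan 2024 12:00:00 GMT'],
];
const parsed = parseHeadersSketch(pairs);
console.log(parsed['last-modified']); // "Sat, 20 Jan 2024 12:00:00 GMT"
```

A Web API `Headers` object iterates as `[name, value]` pairs, so it should be usable with the same function where that type is available.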