@isdk/web-searcher 0.1.1

@@ -0,0 +1,695 @@
1
+ [**@isdk/web-searcher**](../README.md)
2
+
3
+ ***
4
+
5
+ [@isdk/web-searcher](../globals.md) / GoogleSearcher
6
+
7
+ # Class: GoogleSearcher
8
+
9
+ Defined in: [web-searcher/src/engines/google.ts:24](https://github.com/isdk/web-searcher.js/blob/e9a6e5ec9526780489427743389b927a5c16db5c/src/engines/google.ts#L24)
10
+
11
+ A sample implementation of a Google Search scraper.
12
+
13
+ ## Remarks
14
+
15
+ **⚠️ DEMO ONLY ⚠️**
16
+
17
+ This class serves as a **reference implementation** to demonstrate how to extend
18
+ the `WebSearcher` base class. It is **NOT intended for production use**.
19
+
20
+ Google frequently changes its HTML structure and employs sophisticated anti-bot measures.
21
+ A production-grade Google scraper would require robust proxy rotation, CAPTCHA solving,
22
+ and constant selector maintenance; the alternative is to use an official API.
23
+
24
+ Use this class to understand:
25
+ 1. How to define a fetch template with variable injection.
26
+ 2. How to map standard options (like time range) to engine-specific URL parameters.
27
+ 3. How to handle pagination.
28
+ 4. How to transform and clean raw extracted data.
29
+
30
+ ## Extends
31
+
32
+ - [`WebSearcher`](WebSearcher.md)
33
+
34
+ ## Constructors
35
+
36
+ ### Constructor
37
+
38
+ > **new GoogleSearcher**(`options?`): `GoogleSearcher`
39
+
40
+ Defined in: web-fetcher/dist/index.d.ts:2192
41
+
42
+ Creates a new FetchSession.
43
+
44
+ #### Parameters
45
+
46
+ ##### options?
47
+
48
+ `FetcherOptions`
49
+
50
+ Configuration options for the fetcher.
51
+
52
+ #### Returns
53
+
54
+ `GoogleSearcher`
55
+
56
+ #### Inherited from
57
+
58
+ [`WebSearcher`](WebSearcher.md).[`constructor`](WebSearcher.md#constructor)
59
+
60
+ ## Properties
61
+
62
+ ### closed
63
+
64
+ > `protected` **closed**: `boolean`
65
+
66
+ Defined in: web-fetcher/dist/index.d.ts:2186
67
+
68
+ #### Inherited from
69
+
70
+ [`WebSearcher`](WebSearcher.md).[`closed`](WebSearcher.md#closed)
71
+
72
+ ***
73
+
74
+ ### context
75
+
76
+ > `readonly` **context**: `FetchContext`
77
+
78
+ Defined in: web-fetcher/dist/index.d.ts:2185
79
+
80
+ The execution context for this session, containing configurations, event bus, and shared state.
81
+
82
+ #### Inherited from
83
+
84
+ [`WebSearcher`](WebSearcher.md).[`context`](WebSearcher.md#context)
85
+
86
+ ***
87
+
88
+ ### id
89
+
90
+ > `readonly` **id**: `string`
91
+
92
+ Defined in: web-fetcher/dist/index.d.ts:2181
93
+
94
+ Unique identifier for the session.
95
+
96
+ #### Inherited from
97
+
98
+ [`WebSearcher`](WebSearcher.md).[`id`](WebSearcher.md#id)
99
+
100
+ ***
101
+
102
+ ### options
103
+
104
+ > `protected` **options**: `FetcherOptions`
105
+
106
+ Defined in: web-fetcher/dist/index.d.ts:2177
107
+
108
+ #### Inherited from
109
+
110
+ [`WebSearcher`](WebSearcher.md).[`options`](WebSearcher.md#options)
111
+
112
+ ***
113
+
114
+ ### \_isFactory
115
+
116
+ > `static` **\_isFactory**: `boolean` = `false`
117
+
118
+ Defined in: [web-searcher/src/searcher.ts:33](https://github.com/isdk/web-searcher.js/blob/e9a6e5ec9526780489427743389b927a5c16db5c/src/searcher.ts#L33)
119
+
120
+ #### Inherited from
121
+
122
+ [`WebSearcher`](WebSearcher.md).[`_isFactory`](WebSearcher.md#_isfactory)
123
+
124
+ ***
125
+
126
+ ### alias
127
+
128
+ > `static` **alias**: `string`[]
129
+
130
+ Defined in: [web-searcher/src/engines/google.ts:25](https://github.com/isdk/web-searcher.js/blob/e9a6e5ec9526780489427743389b927a5c16db5c/src/engines/google.ts#L25)
131
+
132
+ Engine alias(es). Can be a single string or an array of strings.
133
+ Useful for registering shorthand names (e.g., 'g' for 'Google').
134
+
135
+ #### Overrides
136
+
137
+ [`WebSearcher`](WebSearcher.md).[`alias`](WebSearcher.md#alias)
138
+
139
+ ***
140
+
141
+ ### createObject()
142
+
143
+ > `static` **createObject**: (`name`, ...`args`) => [`WebSearcher`](WebSearcher.md)
144
+
145
+ Defined in: [web-searcher/src/searcher.ts:78](https://github.com/isdk/web-searcher.js/blob/e9a6e5ec9526780489427743389b927a5c16db5c/src/searcher.ts#L78)
146
+
147
+ Creates an instance of the registered search engine.
148
+
149
+ #### Parameters
150
+
151
+ ##### name
152
+
153
+ `string`
154
+
155
+ The name of the engine.
156
+
157
+ ##### args
158
+
159
+ ...`any`[]
160
+
161
+ Arguments to pass to the constructor.
162
+
163
+ #### Returns
164
+
165
+ [`WebSearcher`](WebSearcher.md)
166
+
167
+ An instance of the search engine.
168
+
169
+ #### Inherited from
170
+
171
+ [`WebSearcher`](WebSearcher.md).[`createObject`](WebSearcher.md#createobject)
172
+
173
+ ***
174
+
175
+ ### forEach()
176
+
177
+ > `static` **forEach**: (`cb`) => `void`
178
+
179
+ Defined in: [web-searcher/src/searcher.ts:85](https://github.com/isdk/web-searcher.js/blob/e9a6e5ec9526780489427743389b927a5c16db5c/src/searcher.ts#L85)
180
+
181
+ Iterates over all registered engines.
182
+
183
+ #### Parameters
184
+
185
+ ##### cb
186
+
187
+ (`ctor`, `name`) => `void`
188
+
189
+ Callback function to invoke for each registered engine.
190
+
191
+ #### Returns
192
+
193
+ `void`
194
+
195
+ #### Inherited from
196
+
197
+ [`WebSearcher`](WebSearcher.md).[`forEach`](WebSearcher.md#foreach)
198
+
199
+ ***
200
+
201
+ ### get()
202
+
203
+ > `static` **get**: (`name`) => *typeof* [`WebSearcher`](WebSearcher.md)
204
+
205
+ Defined in: [web-searcher/src/searcher.ts:69](https://github.com/isdk/web-searcher.js/blob/e9a6e5ec9526780489427743389b927a5c16db5c/src/searcher.ts#L69)
206
+
207
+ Retrieves a registered search engine class by name.
208
+
209
+ #### Parameters
210
+
211
+ ##### name
212
+
213
+ `string`
214
+
215
+ The name of the engine (e.g., 'Google').
216
+
217
+ #### Returns
218
+
219
+ *typeof* [`WebSearcher`](WebSearcher.md)
220
+
221
+ The search engine class constructor.
222
+
223
+ #### Inherited from
224
+
225
+ [`WebSearcher`](WebSearcher.md).[`get`](WebSearcher.md#get)
226
+
227
+ ***
228
+
229
+ ### name?
230
+
231
+ > `static` `optional` **name**: `string`
232
+
233
+ Defined in: [web-searcher/src/searcher.ts:40](https://github.com/isdk/web-searcher.js/blob/e9a6e5ec9526780489427743389b927a5c16db5c/src/searcher.ts#L40)
234
+
235
+ Custom engine name. If not provided, it is derived from the class name.
236
+ For example, `GoogleSearcher` becomes `Google`.
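The default naming rule can be illustrated with a small sketch. The helper name `deriveEngineName` and the exact suffix rule are assumptions; the documentation above only confirms the `GoogleSearcher` to `Google` example.

```typescript
// Hypothetical helper, not part of @isdk/web-searcher.
// Assumes the engine name is the class name minus a trailing "Searcher".
function deriveEngineName(className: string): string {
  const suffix = 'Searcher';
  const hasSuffix =
    className.length > suffix.length && className.endsWith(suffix);
  return hasSuffix ? className.slice(0, -suffix.length) : className;
}

const engineName = deriveEngineName('GoogleSearcher'); // 'Google'
```

A class name without the suffix is returned unchanged in this sketch; the library may handle that case differently.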
237
+
238
+ #### Inherited from
239
+
240
+ [`WebSearcher`](WebSearcher.md).[`name`](WebSearcher.md#name)
241
+
242
+ ***
243
+
244
+ ### register()
245
+
246
+ > `static` **register**: (`ctor`, `options?`) => `boolean`
247
+
248
+ Defined in: [web-searcher/src/searcher.ts:54](https://github.com/isdk/web-searcher.js/blob/e9a6e5ec9526780489427743389b927a5c16db5c/src/searcher.ts#L54)
249
+
250
+ Registers a search engine class.
251
+
252
+ #### Parameters
253
+
254
+ ##### ctor
255
+
256
+ *typeof* [`WebSearcher`](WebSearcher.md)
257
+
258
+ The search engine class to register.
259
+
260
+ ##### options?
261
+
262
+ `string` | `IBaseFactoryOptions`
263
+
264
+ Registration options. If a string is provided, it is used as the registered name.
265
+
266
+ #### Returns
267
+
268
+ `boolean`
269
+
270
+ `true` if registration was successful.
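To make the relationship between `register`, `get`, and aliases concrete, here is a minimal name-to-class registry sketch. It is not the library's implementation: `EngineRegistry`, `EngineCtor`, and the rule that a duplicate name is rejected with `false` are all assumptions made for illustration.

```typescript
// Hypothetical registry; names and duplicate handling are assumptions.
type EngineCtor = new () => object;

class EngineRegistry {
  private byName = new Map<string, EngineCtor>();

  // Returns true when the name was free and the class was stored.
  register(ctor: EngineCtor, name: string): boolean {
    if (this.byName.has(name)) return false; // assumed: duplicates are rejected
    this.byName.set(name, ctor);
    return true;
  }

  // Points each alias (e.g. 'g') at an already known class.
  setAliases(ctor: EngineCtor, ...aliases: string[]): void {
    for (const alias of aliases) this.byName.set(alias, ctor);
  }

  get(name: string): EngineCtor | undefined {
    return this.byName.get(name);
  }
}

class Google {}
const registry = new EngineRegistry();
registry.register(Google, 'Google');
registry.setAliases(Google, 'g');
```

With this layout a lookup by alias and a lookup by full name resolve to the same class.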
271
+
272
+ #### Inherited from
273
+
274
+ [`WebSearcher`](WebSearcher.md).[`register`](WebSearcher.md#register)
275
+
276
+ ***
277
+
278
+ ### setAliases()
279
+
280
+ > `static` **setAliases**: (`ctor`, ...`aliases`) => `void`
281
+
282
+ Defined in: [web-searcher/src/searcher.ts:93](https://github.com/isdk/web-searcher.js/blob/e9a6e5ec9526780489427743389b927a5c16db5c/src/searcher.ts#L93)
283
+
284
+ Sets aliases for a registered engine.
285
+
286
+ #### Parameters
287
+
288
+ ##### ctor
289
+
290
+ *typeof* [`WebSearcher`](WebSearcher.md)
291
+
292
+ The search engine class.
293
+
294
+ ##### aliases
295
+
296
+ ...`string`[]
297
+
298
+ Aliases to add.
299
+
300
+ #### Returns
301
+
302
+ `void`
303
+
304
+ #### Inherited from
305
+
306
+ [`WebSearcher`](WebSearcher.md).[`setAliases`](WebSearcher.md#setaliases)
307
+
308
+ ***
309
+
310
+ ### unregister()
311
+
312
+ > `static` **unregister**: (`name?`) => `void`
313
+
314
+ Defined in: [web-searcher/src/searcher.ts:61](https://github.com/isdk/web-searcher.js/blob/e9a6e5ec9526780489427743389b927a5c16db5c/src/searcher.ts#L61)
315
+
316
+ Unregisters a search engine.
317
+
318
+ #### Parameters
319
+
320
+ ##### name?
321
+
322
+ `string` | *typeof* [`WebSearcher`](WebSearcher.md)
323
+
324
+ The name or class to unregister.
325
+
326
+ #### Returns
327
+
328
+ `void`
329
+
330
+ #### Inherited from
331
+
332
+ [`WebSearcher`](WebSearcher.md).[`unregister`](WebSearcher.md#unregister)
333
+
334
+ ## Accessors
335
+
336
+ ### pagination
337
+
338
+ #### Get Signature
339
+
340
+ > **get** **pagination**(): [`PaginationConfig`](../interfaces/PaginationConfig.md)
341
+
342
+ Defined in: [web-searcher/src/engines/google.ts:61](https://github.com/isdk/web-searcher.js/blob/e9a6e5ec9526780489427743389b927a5c16db5c/src/engines/google.ts#L61)
343
+
344
+ Configures pagination for Google Search results.
345
+ Uses the 'start' URL parameter, incrementing by 10 for each page.
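The offset arithmetic can be sketched as follows. The helper names, the zero-based page index, and the hard-coded base URL are assumptions for illustration; the real `PaginationConfig` shape is not shown in this document.

```typescript
// Hypothetical helpers; the library expresses this through PaginationConfig.
const RESULTS_PER_PAGE = 10;

// Maps a zero-based page index to the value of the `start` parameter.
function startOffset(pageIndex: number): number {
  if (!Number.isInteger(pageIndex) || pageIndex < 0) {
    throw new RangeError('pageIndex must be a non-negative integer');
  }
  return pageIndex * RESULTS_PER_PAGE;
}

// Builds a search URL for the given page (base URL assumed).
function buildSearchUrl(query: string, pageIndex: number): string {
  const q = encodeURIComponent(query);
  return `https://www.google.com/search?q=${q}&start=${startOffset(pageIndex)}`;
}

const secondPage = buildSearchUrl('web scraping', 1);
// 'https://www.google.com/search?q=web%20scraping&start=10'
```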
346
+
347
+ ##### Returns
348
+
349
+ [`PaginationConfig`](../interfaces/PaginationConfig.md)
350
+
351
+ #### Overrides
352
+
353
+ [`WebSearcher`](WebSearcher.md).[`pagination`](WebSearcher.md#pagination)
354
+
355
+ ***
356
+
357
+ ### template
358
+
359
+ #### Get Signature
360
+
361
+ > **get** **template**(): `FetcherOptions`
362
+
363
+ Defined in: [web-searcher/src/engines/google.ts:32](https://github.com/isdk/web-searcher.js/blob/e9a6e5ec9526780489427743389b927a5c16db5c/src/engines/google.ts#L32)
364
+
365
+ Defines the fetch template for Google Search.
366
+
367
+ ##### Returns
368
+
369
+ `FetcherOptions`
370
+
371
+ The fetcher configuration including the URL pattern and extraction rules.
372
+
373
+ #### Overrides
374
+
375
+ [`WebSearcher`](WebSearcher.md).[`template`](WebSearcher.md#template)
376
+
377
+ ## Methods
378
+
379
+ ### createContext()
380
+
381
+ > `protected` **createContext**(`options`): `FetchContext`
382
+
383
+ Defined in: [web-searcher/src/searcher.ts:155](https://github.com/isdk/web-searcher.js/blob/e9a6e5ec9526780489427743389b927a5c16db5c/src/searcher.ts#L155)
384
+
385
+ #### Parameters
386
+
387
+ ##### options
388
+
389
+ `FetcherOptions` = `...`
390
+
391
+ #### Returns
392
+
393
+ `FetchContext`
394
+
395
+ #### Inherited from
396
+
397
+ [`WebSearcher`](WebSearcher.md).[`createContext`](WebSearcher.md#createcontext)
398
+
399
+ ***
400
+
401
+ ### dispose()
402
+
403
+ > **dispose**(): `Promise`\<`void`\>
404
+
405
+ Defined in: web-fetcher/dist/index.d.ts:2251
406
+
407
+ Disposes of the session and its associated engine.
408
+
409
+ #### Returns
410
+
411
+ `Promise`\<`void`\>
412
+
413
+ #### Remarks
414
+
415
+ This method should be called when the session is no longer needed to free up resources
416
+ (e.g., closing browser instances, purging temporary storage).
417
+
418
+ #### Inherited from
419
+
420
+ [`WebSearcher`](WebSearcher.md).[`dispose`](WebSearcher.md#dispose)
421
+
422
+ ***
423
+
424
+ ### execute()
425
+
426
+ > **execute**\<`R`\>(`actionOptions`, `context?`): `Promise`\<`FetchActionResult`\<`R`\>\>
427
+
428
+ Defined in: web-fetcher/dist/index.d.ts:2206
429
+
430
+ Executes a single action within the session.
431
+
432
+ #### Type Parameters
433
+
434
+ ##### R
435
+
436
+ `R` *extends* `FetchReturnType` = `"response"`
437
+
438
+ The expected return type of the action.
439
+
440
+ #### Parameters
441
+
442
+ ##### actionOptions
443
+
444
+ `_RequireAtLeastOne`
445
+
446
+ Configuration for the action to be executed.
447
+
448
+ ##### context?
449
+
450
+ `FetchContext`
451
+
452
+ Optional context override for this specific execution. Defaults to the session context.
453
+
454
+ #### Returns
455
+
456
+ `Promise`\<`FetchActionResult`\<`R`\>\>
457
+
458
+ A promise that resolves to the result of the action.
459
+
460
+ #### Example
461
+
462
+ ```ts
463
+ await session.execute({ name: 'goto', params: { url: 'https://example.com' } });
464
+ ```
465
+
466
+ #### Inherited from
467
+
468
+ [`WebSearcher`](WebSearcher.md).[`execute`](WebSearcher.md#execute)
469
+
470
+ ***
471
+
472
+ ### executeAll()
473
+
474
+ > **executeAll**(`actions`, `options?`): `Promise`\<\{ `outputs`: `Record`\<`string`, `any`\>; `result`: `FetchResponse` \| `undefined`; \}\>
475
+
476
+ Defined in: web-fetcher/dist/index.d.ts:2223
477
+
478
+ Executes a sequence of actions.
479
+
480
+ #### Parameters
481
+
482
+ ##### actions
483
+
484
+ `_RequireAtLeastOne`\<`FetchActionProperties`, `"id"` \| `"name"` \| `"action"`\>[]
485
+
486
+ An array of action options to be executed in order.
487
+
488
+ ##### options?
489
+
490
+ `Partial`\<`FetcherOptions`\> & `object`
491
+
492
+ Optional temporary configuration overrides (e.g., timeoutMs, headers) for this batch of actions.
493
+ These overrides do not affect the main session context.
494
+
495
+ #### Returns
496
+
497
+ `Promise`\<\{ `outputs`: `Record`\<`string`, `any`\>; `result`: `FetchResponse` \| `undefined`; \}\>
498
+
499
+ A promise that resolves to an object containing the result of the last action and all accumulated outputs.
500
+
501
+ #### Example
502
+
503
+ ```ts
504
+ const { result, outputs } = await session.executeAll([
505
+ { name: 'goto', params: { url: 'https://example.com' } },
506
+ { name: 'extract', params: { schema: { title: 'h1' } }, storeAs: 'data' }
507
+ ], { timeoutMs: 30000 });
508
+ ```
509
+
510
+ #### Inherited from
511
+
512
+ [`WebSearcher`](WebSearcher.md).[`executeAll`](WebSearcher.md#executeall)
513
+
514
+ ***
515
+
516
+ ### formatOptions()
517
+
518
+ > `protected` **formatOptions**(`options`): `Record`\<`string`, `any`\>
519
+
520
+ Defined in: [web-searcher/src/engines/google.ts:82](https://github.com/isdk/web-searcher.js/blob/e9a6e5ec9526780489427743389b927a5c16db5c/src/engines/google.ts#L82)
521
+
522
+ Maps standard `SearchOptions` to Google's specific URL parameters.
523
+
524
+ - `timeRange` -> `tbs` (e.g., 'qdr:d' for day)
525
+ - `category` -> `tbm` (e.g., 'isch' for images)
526
+ - `region` -> `gl`
527
+ - `language` -> `hl`
528
+ - `safeSearch` -> `safe`
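The mapping above might look roughly like the sketch below. The option shape `SketchSearchOptions` and the function name are hypothetical. Only `qdr:d` and `isch` come from this document; the other values (`qdr:w`, `vid`, `active`, and so on) are commonly seen Google URL values and are assumptions here.

```typescript
// Hypothetical option shape; the real SearchOptions interface may differ.
interface SketchSearchOptions {
  timeRange?: 'day' | 'week' | 'month' | 'year';
  category?: 'images' | 'videos' | 'news';
  region?: string;
  language?: string;
  safeSearch?: boolean;
}

// Only 'qdr:d' and 'isch' are confirmed by the docs; the rest are assumed.
const TIME_RANGE_TO_TBS = { day: 'qdr:d', week: 'qdr:w', month: 'qdr:m', year: 'qdr:y' };
const CATEGORY_TO_TBM = { images: 'isch', videos: 'vid', news: 'nws' };

// Translates standard options into Google URL parameters, skipping unset ones.
function toGoogleParams(options: SketchSearchOptions): Record<string, string> {
  const params: Record<string, string> = {};
  if (options.timeRange) params.tbs = TIME_RANGE_TO_TBS[options.timeRange];
  if (options.category) params.tbm = CATEGORY_TO_TBM[options.category];
  if (options.region) params.gl = options.region;
  if (options.language) params.hl = options.language;
  if (options.safeSearch !== undefined) {
    params.safe = options.safeSearch ? 'active' : 'off';
  }
  return params;
}

const params = toGoogleParams({ timeRange: 'day', category: 'images' });
// { tbs: 'qdr:d', tbm: 'isch' }
```

Leaving unset options out of the result keeps the generated URL free of empty parameters.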
529
+
530
+ #### Parameters
531
+
532
+ ##### options
533
+
534
+ [`SearchOptions`](../interfaces/SearchOptions.md)
535
+
536
+ The user-provided search options.
537
+
538
+ #### Returns
539
+
540
+ `Record`\<`string`, `any`\>
541
+
542
+ A map of variables to inject into the URL template.
543
+
544
+ #### Overrides
545
+
546
+ [`WebSearcher`](WebSearcher.md).[`formatOptions`](WebSearcher.md#formatoptions)
547
+
548
+ ***
549
+
550
+ ### getOutputs()
551
+
552
+ > **getOutputs**(): `Record`\<`string`, `any`\>
553
+
554
+ Defined in: web-fetcher/dist/index.d.ts:2234
555
+
556
+ Retrieves all outputs accumulated during the session.
557
+
558
+ #### Returns
559
+
560
+ `Record`\<`string`, `any`\>
561
+
562
+ A record of stored output data.
563
+
564
+ #### Inherited from
565
+
566
+ [`WebSearcher`](WebSearcher.md).[`getOutputs`](WebSearcher.md#getoutputs)
567
+
568
+ ***
569
+
570
+ ### getState()
571
+
572
+ > **getState**(): `Promise`\<\{ `cookies`: `Cookie`[]; `sessionState?`: `any`; \} \| `undefined`\>
573
+
574
+ Defined in: web-fetcher/dist/index.d.ts:2240
575
+
576
+ Gets the current state of the session, including cookies and engine-specific state.
577
+
578
+ #### Returns
579
+
580
+ `Promise`\<\{ `cookies`: `Cookie`[]; `sessionState?`: `any`; \} \| `undefined`\>
581
+
582
+ A promise resolving to the session state, or undefined if no engine is initialized.
583
+
584
+ #### Inherited from
585
+
586
+ [`WebSearcher`](WebSearcher.md).[`getState`](WebSearcher.md#getstate)
587
+
588
+ ***
589
+
590
+ ### search()
591
+
592
+ > **search**(`query`, `options`): `Promise`\<[`StandardSearchResult`](../interfaces/StandardSearchResult.md)[]\>
593
+
594
+ Defined in: [web-searcher/src/searcher.ts:182](https://github.com/isdk/web-searcher.js/blob/e9a6e5ec9526780489427743389b927a5c16db5c/src/searcher.ts#L182)
595
+
596
+ Executes a search query.
597
+
598
+ This method handles the pagination loop, variable injection, fetching,
599
+ and result transformation.
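A simplified picture of such a pagination loop is sketched below. It is not the library's code: `collectResults`, `PageFetcher`, and `SketchResult` are hypothetical, and fetching plus extraction are collapsed into a single injected `fetchPage` callback.

```typescript
// Hypothetical types and names; this is not the library's implementation.
interface SketchResult {
  title: string;
  url: string;
}
type PageFetcher = (query: string, start: number) => Promise<SketchResult[]>;

// Fetches page after page until `limit` results are collected
// or a page comes back empty.
async function collectResults(
  query: string,
  limit: number,
  fetchPage: PageFetcher,
  pageSize = 10,
): Promise<SketchResult[]> {
  const results: SketchResult[] = [];
  for (let start = 0; results.length < limit; start += pageSize) {
    const page = await fetchPage(query, start);
    if (page.length === 0) break; // no more results available
    results.push(...page);
  }
  return results.slice(0, limit); // trim the overshoot from the last page
}
```

Injecting `fetchPage` keeps the loop testable without network access; a stub that returns fixed pages is enough to exercise it.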
600
+
601
+ #### Parameters
602
+
603
+ ##### query
604
+
605
+ `string`
606
+
607
+ The search query string.
608
+
609
+ ##### options
610
+
611
+ [`SearchOptions`](../interfaces/SearchOptions.md) = `{}`
612
+
613
+ Optional search parameters (e.g., limit, timeRange).
614
+
615
+ #### Returns
616
+
617
+ `Promise`\<[`StandardSearchResult`](../interfaces/StandardSearchResult.md)[]\>
618
+
619
+ A promise resolving to an array of standardized search results.
620
+
621
+ #### Inherited from
622
+
623
+ [`WebSearcher`](WebSearcher.md).[`search`](WebSearcher.md#search)
624
+
625
+ ***
626
+
627
+ ### transform()
628
+
629
+ > `protected` **transform**(`outputs`): `Promise`\<`any`[]\>
630
+
631
+ Defined in: [web-searcher/src/engines/google.ts:144](https://github.com/isdk/web-searcher.js/blob/e9a6e5ec9526780489427743389b927a5c16db5c/src/engines/google.ts#L144)
632
+
633
+ Cleans and normalizes the extracted results.
634
+ Specifically, it unwraps Google's redirect URLs (starting with `/url?q=`).
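The unwrapping step can be sketched as a small pure function. The name `unwrapGoogleRedirect` is hypothetical, and the library's actual `transform()` may handle more cases than this.

```typescript
// Hypothetical helper; the library's actual transform() may differ.
function unwrapGoogleRedirect(href: string): string {
  const prefix = '/url?q=';
  if (!href.startsWith(prefix)) return href; // already a direct link

  // The target URL is the `q` value; other parameters follow after `&`.
  const rest = href.slice(prefix.length);
  const end = rest.indexOf('&');
  const encoded = end === -1 ? rest : rest.slice(0, end);
  try {
    return decodeURIComponent(encoded);
  } catch {
    return href; // malformed percent-encoding: keep the original link
  }
}

const direct = unwrapGoogleRedirect('/url?q=https://example.com/a%3Fb%3D1&sa=U');
// 'https://example.com/a?b=1'
```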
635
+
636
+ #### Parameters
637
+
638
+ ##### outputs
639
+
640
+ `Record`\<`string`, `any`\>
641
+
642
+ The raw outputs from the fetcher.
643
+
644
+ #### Returns
645
+
646
+ `Promise`\<`any`[]\>
647
+
648
+ An array of cleaned search results.
649
+
650
+ #### Overrides
651
+
652
+ [`WebSearcher`](WebSearcher.md).[`transform`](WebSearcher.md#transform)
653
+
654
+ ***
655
+
656
+ ### search()
657
+
658
+ > `static` **search**(`engineName`, `query`, `options`): `Promise`\<[`StandardSearchResult`](../interfaces/StandardSearchResult.md)[]\>
659
+
660
+ Defined in: [web-searcher/src/searcher.ts:106](https://github.com/isdk/web-searcher.js/blob/e9a6e5ec9526780489427743389b927a5c16db5c/src/searcher.ts#L106)
661
+
662
+ Static helper to execute a one-off search.
663
+
664
+ It creates an instance of the specified engine, executes the search, and then
665
+ automatically disposes of the session.
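The create, search, dispose sequence maps naturally onto `try`/`finally`. The sketch below assumes a minimal hypothetical interface, `DisposableSearcher`, and assumes disposal also happens when the search fails; the documentation states only that the session is disposed automatically.

```typescript
// Hypothetical minimal interface standing in for a searcher session.
interface DisposableSearcher<T> {
  search(query: string): Promise<T[]>;
  dispose(): Promise<void>;
}

// Runs a single search and always releases the session afterwards.
async function searchOnce<T>(
  create: () => DisposableSearcher<T>,
  query: string,
): Promise<T[]> {
  const searcher = create();
  try {
    return await searcher.search(query);
  } finally {
    await searcher.dispose(); // runs on success and on failure
  }
}
```

For repeated queries, creating one instance and calling `dispose()` once at the end avoids paying the setup cost on every search.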
666
+
667
+ #### Parameters
668
+
669
+ ##### engineName
670
+
671
+ `string`
672
+
673
+ The name of the engine to use (e.g., 'Google').
674
+
675
+ ##### query
676
+
677
+ `string`
678
+
679
+ The search query string.
680
+
681
+ ##### options
682
+
683
+ [`SearchOptions`](../interfaces/SearchOptions.md) & `FetcherOptions` = `{}`
684
+
685
+ Combined search options and fetcher options.
686
+
687
+ #### Returns
688
+
689
+ `Promise`\<[`StandardSearchResult`](../interfaces/StandardSearchResult.md)[]\>
690
+
691
+ A promise resolving to an array of standardized search results.
692
+
693
+ #### Inherited from
694
+
695
+ [`WebSearcher`](WebSearcher.md).[`search`](WebSearcher.md#search-2)