@isdk/web-searcher 0.1.3 → 0.1.5
- package/README.cn.md +168 -8
- package/README.md +168 -8
- package/dist/index.d.mts +221 -12
- package/dist/index.d.ts +221 -12
- package/dist/index.js +1 -1
- package/dist/index.mjs +1 -1
- package/docs/README.md +168 -8
- package/docs/classes/GoogleSearcher.md +171 -44
- package/docs/classes/WebSearcher.md +158 -45
- package/docs/functions/extractDate.md +42 -0
- package/docs/functions/extractMetadataFrom.md +40 -0
- package/docs/functions/fetchHeaders.md +34 -0
- package/docs/functions/fetchPartial.md +41 -0
- package/docs/functions/normalizeDate.md +29 -0
- package/docs/functions/parseHeaders.md +28 -0
- package/docs/functions/parseHtml.md +31 -0
- package/docs/functions/testUrlsByLatency.md +38 -0
- package/docs/globals.md +18 -0
- package/docs/interfaces/CustomTimeRange.md +3 -3
- package/docs/interfaces/ExtractOptions.md +54 -0
- package/docs/interfaces/FetchExtractorOptions.md +35 -0
- package/docs/interfaces/FetcherOptions.md +424 -0
- package/docs/interfaces/HtmlData.md +53 -0
- package/docs/interfaces/MetadataResult.md +27 -0
- package/docs/interfaces/PaginationConfig.md +9 -9
- package/docs/interfaces/SearchContext.md +30 -4
- package/docs/interfaces/SearchOptions.md +77 -11
- package/docs/interfaces/StandardSearchResult.md +10 -10
- package/docs/interfaces/VerifiedUrl.md +25 -0
- package/docs/type-aliases/MetadataType.md +13 -0
- package/docs/type-aliases/SafeSearchLevel.md +1 -1
- package/docs/type-aliases/SearchCategory.md +2 -2
- package/docs/type-aliases/SearchTimeRange.md +1 -1
- package/docs/type-aliases/SearchTimeRangePreset.md +2 -2
- package/docs/type-aliases/SearcherConstructor.md +2 -2
- package/package.json +3 -2
@@ -0,0 +1,31 @@
+[**@isdk/web-searcher**](../README.md)
+
+***
+
+[@isdk/web-searcher](../globals.md) / parseHtml
+
+# Function: parseHtml()
+
+> **parseHtml**(`html`): [`HtmlData`](../interfaces/HtmlData.md)
+
+Defined in: [web-searcher/src/utils/extractor/parser.ts:49](https://github.com/isdk/web-searcher.js/blob/955bc509edda39926bd12c6c2b8c28da7eb13ff5/src/utils/extractor/parser.ts#L49)
+
+Parses an HTML string to extract generic metadata structures (Meta tags, JSON-LD, Time tags).
+
+This function does not perform field-specific logic (like finding a date); it simply
+
+collects available structured data.
+
+## Parameters
+
+### html
+
+`string`
+
+The raw HTML content to parse.
+
+## Returns
+
+[`HtmlData`](../interfaces/HtmlData.md)
+
+An object containing grouped metadata from the HTML.
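To make the "collect, but do not interpret" contract of `parseHtml` concrete, here is a minimal sketch of a function with the same return shape. It is not the package's implementation: `parseHtmlSketch`, `HtmlDataSketch`, and `attr` are hypothetical names, and the regex-based scanning is an assumption chosen only to keep the example dependency-free.

```typescript
// Hypothetical stand-in for illustration; the real parseHtml may work differently.
interface HtmlDataSketch {
  meta: Record<string, string>;
  jsonLd: unknown[];
  time: { datetime: string | null; text: string }[];
}

// Read one quoted attribute value from a tag's source text, or null if absent.
function attr(tag: string, name: string): string | null {
  const re = new RegExp("\\b" + name + "\\s*=\\s*(?:\"([^\"]*)\"|'([^']*)')", "i");
  const m = re.exec(tag);
  if (!m) return null;
  const value = m[1] !== undefined ? m[1] : m[2];
  return value === undefined ? null : value;
}

function parseHtmlSketch(html: string): HtmlDataSketch {
  const data: HtmlDataSketch = { meta: {}, jsonLd: [], time: [] };
  let m: RegExpExecArray | null;

  // <meta name=... content=...> and <meta property=... content=...>, keys lowercased.
  const metaRe = /<meta\b[^>]*>/gi;
  while ((m = metaRe.exec(html)) !== null) {
    const key = attr(m[0], "name") || attr(m[0], "property");
    const content = attr(m[0], "content");
    if (key && content !== null) data.meta[key.toLowerCase()] = content;
  }

  // JSON-LD blocks; malformed JSON is skipped rather than thrown.
  const ldRe = /<script\b[^>]*type\s*=\s*["']application\/ld\+json["'][^>]*>([\s\S]*?)<\/script>/gi;
  while ((m = ldRe.exec(html)) !== null) {
    try {
      data.jsonLd.push(JSON.parse(m[1]));
    } catch (_err) {
      // ignore this block
    }
  }

  // <time> tags: keep the datetime attribute and the tag-stripped text.
  const timeRe = /<time\b([^>]*)>([\s\S]*?)<\/time>/gi;
  while ((m = timeRe.exec(html)) !== null) {
    data.time.push({
      datetime: attr(m[1], "datetime"),
      text: m[2].replace(/<[^>]*>/g, "").trim(),
    });
  }
  return data;
}
```

Note that nothing here decides which value is "the" publication date; that field-specific step is left to the caller, as the description above states.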
@@ -0,0 +1,38 @@
+[**@isdk/web-searcher**](../README.md)
+
+***
+
+[@isdk/web-searcher](../globals.md) / testUrlsByLatency
+
+# Function: testUrlsByLatency()
+
+> **testUrlsByLatency**(`urls`, `options`): `Promise`\<[`VerifiedUrl`](../interfaces/VerifiedUrl.md)[]\>
+
+Defined in: [web-searcher/src/utils/latency.ts:12](https://github.com/isdk/web-searcher.js/blob/955bc509edda39926bd12c6c2b8c28da7eb13ff5/src/utils/latency.ts#L12)
+
+A general utility to test a list of URLs for availability and latency.
+Returns a list of verified URLs sorted by response time.
+
+## Parameters
+
+### urls
+
+`string`[]
+
+### options
+
+#### limit?
+
+`number`
+
+#### testPath?
+
+`string`
+
+#### timeout?
+
+`number`
+
+## Returns
+
+`Promise`\<[`VerifiedUrl`](../interfaces/VerifiedUrl.md)[]\>
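The ranking step that `testUrlsByLatency` describes (drop unavailable URLs, sort the rest by response time, optionally cap the list) can be sketched without any network access. Everything named here is an assumption for illustration: `ProbeResult`, `rankByLatency`, and the `latencyMs` field are hypothetical and do not mirror the actual fields of `VerifiedUrl`, which this diff does not show.

```typescript
// Hypothetical probe record: latencyMs is null when the URL failed or timed out.
interface ProbeResult {
  url: string;
  latencyMs: number | null;
}

interface RankedUrl {
  url: string;
  latencyMs: number;
}

// Keep reachable URLs only, fastest first, optionally capped at `limit` entries.
function rankByLatency(probes: ProbeResult[], limit?: number): RankedUrl[] {
  const reachable: RankedUrl[] = [];
  for (const probe of probes) {
    if (probe.latencyMs !== null) {
      reachable.push({ url: probe.url, latencyMs: probe.latencyMs });
    }
  }
  reachable.sort((a, b) => a.latencyMs - b.latencyMs);
  return limit === undefined ? reachable : reachable.slice(0, limit);
}
```

In a real caller the probe records would come from timed requests against each URL (presumably using `testPath` and `timeout`); only the pure sorting and limiting logic is shown.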
package/docs/globals.md
CHANGED
@@ -12,15 +12,33 @@
 ## Interfaces
 
 - [CustomTimeRange](interfaces/CustomTimeRange.md)
+- [ExtractOptions](interfaces/ExtractOptions.md)
+- [FetcherOptions](interfaces/FetcherOptions.md)
+- [FetchExtractorOptions](interfaces/FetchExtractorOptions.md)
+- [HtmlData](interfaces/HtmlData.md)
+- [MetadataResult](interfaces/MetadataResult.md)
 - [PaginationConfig](interfaces/PaginationConfig.md)
 - [SearchContext](interfaces/SearchContext.md)
 - [SearchOptions](interfaces/SearchOptions.md)
 - [StandardSearchResult](interfaces/StandardSearchResult.md)
+- [VerifiedUrl](interfaces/VerifiedUrl.md)
 
 ## Type Aliases
 
+- [MetadataType](type-aliases/MetadataType.md)
 - [SafeSearchLevel](type-aliases/SafeSearchLevel.md)
 - [SearchCategory](type-aliases/SearchCategory.md)
 - [SearcherConstructor](type-aliases/SearcherConstructor.md)
 - [SearchTimeRange](type-aliases/SearchTimeRange.md)
 - [SearchTimeRangePreset](type-aliases/SearchTimeRangePreset.md)
+
+## Functions
+
+- [extractDate](functions/extractDate.md)
+- [extractMetadataFrom](functions/extractMetadataFrom.md)
+- [fetchHeaders](functions/fetchHeaders.md)
+- [fetchPartial](functions/fetchPartial.md)
+- [normalizeDate](functions/normalizeDate.md)
+- [parseHeaders](functions/parseHeaders.md)
+- [parseHtml](functions/parseHtml.md)
+- [testUrlsByLatency](functions/testUrlsByLatency.md)
@@ -6,7 +6,7 @@
 
 # Interface: CustomTimeRange
 
-Defined in: [web-searcher/src/types.ts:
+Defined in: [web-searcher/src/types.ts:113](https://github.com/isdk/web-searcher.js/blob/955bc509edda39926bd12c6c2b8c28da7eb13ff5/src/types.ts#L113)
 
 ## Properties
 
@@ -14,7 +14,7 @@ Defined in: [web-searcher/src/types.ts:104](https://github.com/isdk/web-searcher
 
 > **from**: `string` \| `Date`
 
-Defined in: [web-searcher/src/types.ts:
+Defined in: [web-searcher/src/types.ts:115](https://github.com/isdk/web-searcher.js/blob/955bc509edda39926bd12c6c2b8c28da7eb13ff5/src/types.ts#L115)
 
 Start date (Date object or string like 'YYYY-MM-DD').
 
@@ -24,6 +24,6 @@ Start date (Date object or string like 'YYYY-MM-DD').
 
 > `optional` **to**: `string` \| `Date`
 
-Defined in: [web-searcher/src/types.ts:
+Defined in: [web-searcher/src/types.ts:117](https://github.com/isdk/web-searcher.js/blob/955bc509edda39926bd12c6c2b8c28da7eb13ff5/src/types.ts#L117)
 
 End date (Date object or string like 'YYYY-MM-DD'). Defaults to current date if omitted.
@@ -0,0 +1,54 @@
+[**@isdk/web-searcher**](../README.md)
+
+***
+
+[@isdk/web-searcher](../globals.md) / ExtractOptions
+
+# Interface: ExtractOptions
+
+Defined in: [web-searcher/src/utils/extractor/date-extractor.ts:7](https://github.com/isdk/web-searcher.js/blob/955bc509edda39926bd12c6c2b8c28da7eb13ff5/src/utils/extractor/date-extractor.ts#L7)
+
+Options for the extractDate function.
+
+## Extends
+
+- [`FetchExtractorOptions`](FetchExtractorOptions.md)
+
+## Properties
+
+### headers?
+
+> `optional` **headers**: `Record`\<`string`, `string`\>
+
+Defined in: [web-searcher/src/utils/extractor/fetcher.ts:8](https://github.com/isdk/web-searcher.js/blob/955bc509edda39926bd12c6c2b8c28da7eb13ff5/src/utils/extractor/fetcher.ts#L8)
+
+Custom HTTP headers to include in the request.
+
+#### Inherited from
+
+[`FetchExtractorOptions`](FetchExtractorOptions.md).[`headers`](FetchExtractorOptions.md#headers)
+
+***
+
+### maxBytes?
+
+> `optional` **maxBytes**: `number`
+
+Defined in: [web-searcher/src/utils/extractor/date-extractor.ts:12](https://github.com/isdk/web-searcher.js/blob/955bc509edda39926bd12c6c2b8c28da7eb13ff5/src/utils/extractor/date-extractor.ts#L12)
+
+Maximum number of bytes to download from the URL.
+Defaults to 32768 (32KB), which is usually enough for the HTML <head>.
+
+***
+
+### timeout?
+
+> `optional` **timeout**: `number`
+
+Defined in: [web-searcher/src/utils/extractor/fetcher.ts:6](https://github.com/isdk/web-searcher.js/blob/955bc509edda39926bd12c6c2b8c28da7eb13ff5/src/utils/extractor/fetcher.ts#L6)
+
+Timeout in milliseconds. Defaults vary by function (5s to 10s).
+
+#### Inherited from
+
+[`FetchExtractorOptions`](FetchExtractorOptions.md).[`timeout`](FetchExtractorOptions.md#timeout)
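The `maxBytes` cap amounts to "stop reading once enough of the document has arrived". The sketch below shows that truncation on chunks that are already in memory; `takeBytes` is a hypothetical helper, not an export of the package, and how the library actually reads the response body is not shown in this diff.

```typescript
// Hypothetical helper: concatenate chunks until maxBytes is reached, then stop.
// The default of 32768 mirrors the documented maxBytes default.
function takeBytes(chunks: Uint8Array[], maxBytes: number = 32768): Uint8Array {
  const out = new Uint8Array(maxBytes);
  let written = 0;
  for (const chunk of chunks) {
    const room = maxBytes - written;
    if (room <= 0) break; // cap reached; later chunks are never touched
    const slice = chunk.subarray(0, room); // may cut the final chunk short
    out.set(slice, written);
    written += slice.length;
  }
  return out.subarray(0, written);
}
```

With a streaming response the same loop would run over chunks as they arrive, and the request could be aborted at the `break`, which is where the bandwidth saving would come from.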
@@ -0,0 +1,35 @@
+[**@isdk/web-searcher**](../README.md)
+
+***
+
+[@isdk/web-searcher](../globals.md) / FetchExtractorOptions
+
+# Interface: FetchExtractorOptions
+
+Defined in: [web-searcher/src/utils/extractor/fetcher.ts:4](https://github.com/isdk/web-searcher.js/blob/955bc509edda39926bd12c6c2b8c28da7eb13ff5/src/utils/extractor/fetcher.ts#L4)
+
+Options for network requests.
+
+## Extended by
+
+- [`ExtractOptions`](ExtractOptions.md)
+
+## Properties
+
+### headers?
+
+> `optional` **headers**: `Record`\<`string`, `string`\>
+
+Defined in: [web-searcher/src/utils/extractor/fetcher.ts:8](https://github.com/isdk/web-searcher.js/blob/955bc509edda39926bd12c6c2b8c28da7eb13ff5/src/utils/extractor/fetcher.ts#L8)
+
+Custom HTTP headers to include in the request.
+
+***
+
+### timeout?
+
+> `optional` **timeout**: `number`
+
+Defined in: [web-searcher/src/utils/extractor/fetcher.ts:6](https://github.com/isdk/web-searcher.js/blob/955bc509edda39926bd12c6c2b8c28da7eb13ff5/src/utils/extractor/fetcher.ts#L6)
+
+Timeout in milliseconds. Defaults vary by function (5s to 10s).
@@ -0,0 +1,424 @@
+[**@isdk/web-searcher**](../README.md)
+
+***
+
+[@isdk/web-searcher](../globals.md) / FetcherOptions
+
+# Interface: FetcherOptions
+
+Defined in: web-fetcher/dist/index.d.ts:1108
+
+## Extends
+
+- `BaseFetcherProperties`
+
+## Properties
+
+### actions?
+
+> `optional` **actions**: `_RequireAtLeastOne`\<`FetchActionProperties`, `"name"` \| `"id"` \| `"action"`\>[]
+
+Defined in: web-fetcher/dist/index.d.ts:1109
+
+***
+
+### antibot?
+
+> `optional` **antibot**: `boolean`
+
+Defined in: web-fetcher/dist/index.d.ts:1048
+
+#### Inherited from
+
+`BaseFetcherProperties.antibot`
+
+***
+
+### blockResources?
+
+> `optional` **blockResources**: `string`[]
+
+Defined in: web-fetcher/dist/index.d.ts:1061
+
+#### Inherited from
+
+`BaseFetcherProperties.blockResources`
+
+***
+
+### browser?
+
+> `optional` **browser**: `object`
+
+Defined in: web-fetcher/dist/index.d.ts:1071
+
+#### engine?
+
+> `optional` **engine**: `BrowserEngine`
+
+Browser engine; defaults to playwright.
+
+- `playwright`: use the Playwright engine
+- `puppeteer`: use the Puppeteer engine
+
+#### headless?
+
+> `optional` **headless**: `boolean`
+
+#### launchOptions?
+
+> `optional` **launchOptions**: `Record`\<`string`, `any`\>
+
+#### waitUntil?
+
+> `optional` **waitUntil**: `"load"` \| `"domcontentloaded"` \| `"networkidle"` \| `"commit"`
+
+#### Inherited from
+
+`BaseFetcherProperties.browser`
+
+***
+
+### cache?
+
+> `optional` **cache**: `FetchCacheOptions`
+
+Defined in: web-fetcher/dist/index.d.ts:1069
+
+Cache configuration for persistent HTTP caching.
+
+#### Inherited from
+
+`BaseFetcherProperties.cache`
+
+***
+
+### cookies?
+
+> `optional` **cookies**: `Cookie`[]
+
+Defined in: web-fetcher/dist/index.d.ts:1051
+
+#### Inherited from
+
+`BaseFetcherProperties.cookies`
+
+***
+
+### debug?
+
+> `optional` **debug**: `string` \| `boolean` \| `string`[]
+
+Defined in: web-fetcher/dist/index.d.ts:1049
+
+#### Inherited from
+
+`BaseFetcherProperties.debug`
+
+***
+
+### delayBetweenRequestsMs?
+
+> `optional` **delayBetweenRequestsMs**: `number`
+
+Defined in: web-fetcher/dist/index.d.ts:1091
+
+#### Inherited from
+
+`BaseFetcherProperties.delayBetweenRequestsMs`
+
+***
+
+### enableSmart?
+
+> `optional` **enableSmart**: `boolean`
+
+Defined in: web-fetcher/dist/index.d.ts:1044
+
+#### Inherited from
+
+`BaseFetcherProperties.enableSmart`
+
+***
+
+### engine?
+
+> `optional` **engine**: `string`
+
+Defined in: web-fetcher/dist/index.d.ts:1043
+
+Fetch mode.
+
+- `http`: fetch over HTTP
+- `browser`: fetch with a browser
+- `auto`: uses "smart detection" to choose between http and browser; if smart is not enabled and the site is not in the site registry, this is equivalent to http.
+
+#### Inherited from
+
+`BaseFetcherProperties.engine`
+
+***
+
+### headers?
+
+> `optional` **headers**: `Record`\<`string`, `string`\>
+
+Defined in: web-fetcher/dist/index.d.ts:1050
+
+#### Inherited from
+
+`BaseFetcherProperties.headers`
+
+***
+
+### http?
+
+> `optional` **http**: `object`
+
+Defined in: web-fetcher/dist/index.d.ts:1083
+
+#### body?
+
+> `optional` **body**: `any`
+
+#### method?
+
+> `optional` **method**: `"GET"` \| `"POST"` \| `"PUT"` \| `"PATCH"` \| `"DELETE"`
+
+#### Inherited from
+
+`BaseFetcherProperties.http`
+
+***
+
+### ignoreSslErrors?
+
+> `optional` **ignoreSslErrors**: `boolean`
+
+Defined in: web-fetcher/dist/index.d.ts:1070
+
+#### Inherited from
+
+`BaseFetcherProperties.ignoreSslErrors`
+
+***
+
+### maxConcurrency?
+
+> `optional` **maxConcurrency**: `number`
+
+Defined in: web-fetcher/dist/index.d.ts:1089
+
+#### Inherited from
+
+`BaseFetcherProperties.maxConcurrency`
+
+***
+
+### maxRequestsPerMinute?
+
+> `optional` **maxRequestsPerMinute**: `number`
+
+Defined in: web-fetcher/dist/index.d.ts:1090
+
+#### Inherited from
+
+`BaseFetcherProperties.maxRequestsPerMinute`
+
+***
+
+### onPause?
+
+> `optional` **onPause**: `OnFetchPauseCallback`
+
+Defined in: web-fetcher/dist/index.d.ts:1110
+
+***
+
+### output?
+
+> `optional` **output**: `object`
+
+Defined in: web-fetcher/dist/index.d.ts:1056
+
+#### cookies?
+
+> `optional` **cookies**: `boolean`
+
+#### sessionState?
+
+> `optional` **sessionState**: `boolean`
+
+#### Inherited from
+
+`BaseFetcherProperties.output`
+
+***
+
+### overrideSessionState?
+
+> `optional` **overrideSessionState**: `boolean`
+
+Defined in: web-fetcher/dist/index.d.ts:1054
+
+#### Inherited from
+
+`BaseFetcherProperties.overrideSessionState`
+
+***
+
+### proxy?
+
+> `optional` **proxy**: `string` \| `string`[]
+
+Defined in: web-fetcher/dist/index.d.ts:1060
+
+#### Inherited from
+
+`BaseFetcherProperties.proxy`
+
+***
+
+### requestHandlerTimeoutSecs?
+
+> `optional` **requestHandlerTimeoutSecs**: `number`
+
+Defined in: web-fetcher/dist/index.d.ts:1088
+
+#### Inherited from
+
+`BaseFetcherProperties.requestHandlerTimeoutSecs`
+
+***
+
+### retries?
+
+> `optional` **retries**: `number`
+
+Defined in: web-fetcher/dist/index.d.ts:1092
+
+#### Inherited from
+
+`BaseFetcherProperties.retries`
+
+***
+
+### sessionPoolOptions?
+
+> `optional` **sessionPoolOptions**: `SessionPoolOptions`
+
+Defined in: web-fetcher/dist/index.d.ts:1053
+
+#### Inherited from
+
+`BaseFetcherProperties.sessionPoolOptions`
+
+***
+
+### sessionState?
+
+> `optional` **sessionState**: `any`
+
+Defined in: web-fetcher/dist/index.d.ts:1052
+
+#### Inherited from
+
+`BaseFetcherProperties.sessionState`
+
+***
+
+### sites?
+
+> `optional` **sites**: `FetchSite`[]
+
+Defined in: web-fetcher/dist/index.d.ts:1093
+
+#### Inherited from
+
+`BaseFetcherProperties.sites`
+
+***
+
+### storage?
+
+> `optional` **storage**: `StorageOptions`
+
+Defined in: web-fetcher/dist/index.d.ts:1065
+
+Storage configuration for session isolation and persistence.
+
+#### Inherited from
+
+`BaseFetcherProperties.storage`
+
+***
+
+### syncStateOnUpgrade?
+
+> `optional` **syncStateOnUpgrade**: `boolean`
+
+Defined in: web-fetcher/dist/index.d.ts:1045
+
+#### Inherited from
+
+`BaseFetcherProperties.syncStateOnUpgrade`
+
+***
+
+### throwHttpErrors?
+
+> `optional` **throwHttpErrors**: `boolean`
+
+Defined in: web-fetcher/dist/index.d.ts:1055
+
+#### Inherited from
+
+`BaseFetcherProperties.throwHttpErrors`
+
+***
+
+### timeoutMs?
+
+> `optional` **timeoutMs**: `number`
+
+Defined in: web-fetcher/dist/index.d.ts:1087
+
+#### Inherited from
+
+`BaseFetcherProperties.timeoutMs`
+
+***
+
+### upgradeThresholdMs?
+
+> `optional` **upgradeThresholdMs**: `number`
+
+Defined in: web-fetcher/dist/index.d.ts:1046
+
+#### Inherited from
+
+`BaseFetcherProperties.upgradeThresholdMs`
+
+***
+
+### url?
+
+> `optional` **url**: `string`
+
+Defined in: web-fetcher/dist/index.d.ts:1094
+
+#### Inherited from
+
+`BaseFetcherProperties.url`
+
+***
+
+### useSiteRegistry?
+
+> `optional` **useSiteRegistry**: `boolean`
+
+Defined in: web-fetcher/dist/index.d.ts:1047
+
+#### Inherited from
+
+`BaseFetcherProperties.useSiteRegistry`
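The note on `engine: auto` in `FetcherOptions` describes a fallback rule that is easy to misread, so here is one possible reading written out as code. This is an interpretation, not the library's logic: `resolveEngine`, `registryMode`, and `detect` are hypothetical names, and the order in which the site registry and smart detection are consulted is an assumption. The only rule taken from the documentation is the last one (no smart mode and no registry entry means `http`).

```typescript
type FetchMode = "http" | "browser";

// Hypothetical resolver for illustration only.
function resolveEngine(
  engine: FetchMode | "auto",
  enableSmart: boolean,
  registryMode: FetchMode | undefined, // assumed: mode recorded for this site, if any
  detect: () => FetchMode // assumed: stand-in for "smart detection"
): FetchMode {
  if (engine !== "auto") return engine; // explicit modes are used as given
  if (registryMode !== undefined) return registryMode; // assumed precedence
  if (enableSmart) return detect();
  return "http"; // documented fallback for auto
}
```

Calling `resolveEngine("auto", false, undefined, detect)` returns `"http"` without invoking `detect`, which matches the documented equivalence.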
@@ -0,0 +1,53 @@
+[**@isdk/web-searcher**](../README.md)
+
+***
+
+[@isdk/web-searcher](../globals.md) / HtmlData
+
+# Interface: HtmlData
+
+Defined in: [web-searcher/src/utils/extractor/parser.ts:4](https://github.com/isdk/web-searcher.js/blob/955bc509edda39926bd12c6c2b8c28da7eb13ff5/src/utils/extractor/parser.ts#L4)
+
+Represents structured data extracted from an HTML document.
+
+## Properties
+
+### jsonLd
+
+> **jsonLd**: `any`[]
+
+Defined in: [web-searcher/src/utils/extractor/parser.ts:8](https://github.com/isdk/web-searcher.js/blob/955bc509edda39926bd12c6c2b8c28da7eb13ff5/src/utils/extractor/parser.ts#L8)
+
+Array of parsed JSON-LD objects found in the document.
+
+***
+
+### meta
+
+> **meta**: `Record`\<`string`, `string`\>
+
+Defined in: [web-searcher/src/utils/extractor/parser.ts:6](https://github.com/isdk/web-searcher.js/blob/955bc509edda39926bd12c6c2b8c28da7eb13ff5/src/utils/extractor/parser.ts#L6)
+
+Map of meta tag names/properties to their content. Keys are lowercase.
+
+***
+
+### time
+
+> **time**: `object`[]
+
+Defined in: [web-searcher/src/utils/extractor/parser.ts:10](https://github.com/isdk/web-searcher.js/blob/955bc509edda39926bd12c6c2b8c28da7eb13ff5/src/utils/extractor/parser.ts#L10)
+
+Array of data from HTML <time> tags.
+
+#### datetime
+
+> **datetime**: `string` \| `null`
+
+The value of the 'datetime' attribute, if present.
+
+#### text
+
+> **text**: `string`
+
+The text content within the <time> tag, with HTML stripped.
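Because `HtmlData` only groups raw values, any field-specific choice is left to the consumer. The sketch below shows one way a caller might pick a date candidate from an object of this shape. `HtmlDataShape` is a local copy of the documented properties and `firstDateCandidate` is a hypothetical helper; the lookup order and the keys it checks (`article:published_time` from Open Graph, `datePublished` from schema.org) are assumptions and are not claimed to match what `extractDate` does.

```typescript
// Local mirror of the documented HtmlData properties.
interface HtmlDataShape {
  meta: Record<string, string>;
  jsonLd: any[];
  time: { datetime: string | null; text: string }[];
}

// Hypothetical consumer: return the first plausible date string, or null.
function firstDateCandidate(data: HtmlDataShape): string | null {
  // 1. Meta tags (keys are lowercase, per the interface description).
  const fromMeta = data.meta["article:published_time"];
  if (fromMeta) return fromMeta;

  // 2. JSON-LD objects carrying a string datePublished.
  for (const item of data.jsonLd) {
    if (item && typeof item.datePublished === "string") return item.datePublished;
  }

  // 3. The first <time> tag that has a datetime attribute.
  for (const entry of data.time) {
    if (entry.datetime) return entry.datetime;
  }
  return null;
}
```

The returned string is still unnormalized; converting it to a canonical form would be a separate step (the package lists a `normalizeDate` function for that purpose).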