@isdk/web-searcher 0.1.4 → 0.1.6
- package/README.cn.md +196 -7
- package/README.md +196 -7
- package/dist/index.d.mts +234 -11
- package/dist/index.d.ts +234 -11
- package/dist/index.js +1 -1
- package/dist/index.mjs +1 -1
- package/docs/README.md +196 -7
- package/docs/classes/GoogleSearcher.md +289 -60
- package/docs/classes/WebSearcher.md +264 -61
- package/docs/functions/extractDate.md +42 -0
- package/docs/functions/extractMetadataFrom.md +40 -0
- package/docs/functions/fetchHeaders.md +34 -0
- package/docs/functions/fetchPartial.md +41 -0
- package/docs/functions/normalizeDate.md +29 -0
- package/docs/functions/parseHeaders.md +28 -0
- package/docs/functions/parseHtml.md +31 -0
- package/docs/functions/testUrlsByLatency.md +42 -0
- package/docs/globals.md +18 -0
- package/docs/interfaces/CustomTimeRange.md +3 -3
- package/docs/interfaces/ExtractOptions.md +54 -0
- package/docs/interfaces/FetchExtractorOptions.md +35 -0
- package/docs/interfaces/FetcherOptions.md +436 -0
- package/docs/interfaces/HtmlData.md +53 -0
- package/docs/interfaces/MetadataResult.md +27 -0
- package/docs/interfaces/PaginationConfig.md +9 -9
- package/docs/interfaces/SearchContext.md +30 -4
- package/docs/interfaces/SearchOptions.md +77 -11
- package/docs/interfaces/StandardSearchResult.md +10 -10
- package/docs/interfaces/VerifiedUrl.md +25 -0
- package/docs/type-aliases/MetadataType.md +13 -0
- package/docs/type-aliases/SafeSearchLevel.md +1 -1
- package/docs/type-aliases/SearchCategory.md +2 -2
- package/docs/type-aliases/SearchTimeRange.md +1 -1
- package/docs/type-aliases/SearchTimeRangePreset.md +1 -1
- package/docs/type-aliases/SearcherConstructor.md +2 -2
- package/package.json +3 -2
package/docs/functions/fetchHeaders.md
@@ -0,0 +1,34 @@
+[**@isdk/web-searcher**](../README.md)
+
+***
+
+[@isdk/web-searcher](../globals.md) / fetchHeaders
+
+# Function: fetchHeaders()
+
+> **fetchHeaders**(`url`, `options`): `Promise`\<`Headers` \| `null`\>
+
+Defined in: [web-searcher/src/utils/extractor/fetcher.ts:19](https://github.com/isdk/web-searcher.js/blob/0c4757eb75b3b7c5af0231806f11e7b3c3166736/src/utils/extractor/fetcher.ts#L19)
+
+Fetches only the HTTP headers for a given URL using a HEAD request.
+Useful for checking 'last-modified' without downloading the body.
+
+## Parameters
+
+### url
+
+`string`
+
+The URL to check.
+
+### options
+
+[`FetchExtractorOptions`](../interfaces/FetchExtractorOptions.md) = `{}`
+
+Request options.
+
+## Returns
+
+`Promise`\<`Headers` \| `null`\>
+
+The Headers object, or null on failure.
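The `fetchHeaders()` page above describes a HEAD request that resolves to a `Headers` object or `null`. The sketch below shows one way that pattern can be written with the standard Fetch API. It is not the package's implementation: the names `headOnly` and `HeadOptions`, the injectable `fetchImpl` parameter, the 5000 ms fallback timeout, and treating a non-2xx status as a failure are all assumptions made for illustration.

```typescript
// Hypothetical stand-in for the documented options (timeout, headers).
interface HeadOptions {
  timeout?: number;
  headers?: Record<string, string>;
}

// Illustrative only: issues a HEAD request and resolves to the response
// headers, or to null on any failure (network error, timeout, non-2xx status).
async function headOnly(
  url: string,
  options: HeadOptions = {},
  fetchImpl: typeof fetch = fetch, // injectable so the sketch can run offline
): Promise<Headers | null> {
  const controller = new AbortController();
  // Assumed fallback; the package's real default is not stated for this function.
  const timer = setTimeout(() => controller.abort(), options.timeout ?? 5000);
  try {
    const res = await fetchImpl(url, {
      method: "HEAD",
      headers: options.headers,
      signal: controller.signal,
    });
    return res.ok ? res.headers : null;
  } catch {
    return null; // network errors and aborts both end up here
  } finally {
    clearTimeout(timer);
  }
}
```

A caller would typically read `headers.get("last-modified")` from the result and fall back to a body fetch when it is missing, since servers are not required to send that header.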
package/docs/functions/fetchPartial.md
@@ -0,0 +1,41 @@
+[**@isdk/web-searcher**](../README.md)
+
+***
+
+[@isdk/web-searcher](../globals.md) / fetchPartial
+
+# Function: fetchPartial()
+
+> **fetchPartial**(`url`, `maxBytes`, `options`): `Promise`\<\{ `content`: `string`; `headers`: `Headers`; \} \| `null`\>
+
+Defined in: [web-searcher/src/utils/extractor/fetcher.ts:55](https://github.com/isdk/web-searcher.js/blob/0c4757eb75b3b7c5af0231806f11e7b3c3166736/src/utils/extractor/fetcher.ts#L55)
+
+Fetches a partial amount of content from a URL.
+Automatically handles character set detection from the Content-Type header.
+Aborts the request once the specified maxBytes is reached.
+
+## Parameters
+
+### url
+
+`string`
+
+The URL to fetch.
+
+### maxBytes
+
+`number` = `32768`
+
+The maximum number of bytes to read. Defaults to 32KB.
+
+### options
+
+[`FetchExtractorOptions`](../interfaces/FetchExtractorOptions.md) = `{}`
+
+Request options.
+
+## Returns
+
+`Promise`\<\{ `content`: `string`; `headers`: `Headers`; \} \| `null`\>
+
+An object containing the decoded content string and the response headers.
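The `fetchPartial()` page above combines two ideas: stop reading once `maxBytes` have arrived, and decode the bytes using the charset named in `Content-Type`. The sketch below illustrates both on a standard `Response` object. The helper names `charsetOf` and `readPartial` are hypothetical, and the UTF-8 fallback for a missing or unrecognized charset is an assumption rather than documented behavior.

```typescript
// Illustrative only: pulls the charset out of a Content-Type value,
// falling back to UTF-8 when none is declared (assumed fallback).
function charsetOf(contentType: string | null): string {
  const match = /charset\s*=\s*"?([^";\s]+)/i.exec(contentType ?? "");
  return match?.[1]?.toLowerCase() ?? "utf-8";
}

// Illustrative only: reads at most maxBytes from a response body, then
// cancels the stream so the remainder is never downloaded.
async function readPartial(res: Response, maxBytes = 32768): Promise<string> {
  if (!res.body) return "";
  const reader = res.body.getReader();
  const chunks: Uint8Array[] = [];
  let received = 0;
  while (received < maxBytes) {
    const { done, value } = await reader.read();
    if (done || !value) break;
    const chunk = value.subarray(0, maxBytes - received); // trim the last chunk
    chunks.push(chunk);
    received += chunk.length;
  }
  await reader.cancel(); // stop the transfer once enough bytes have arrived

  // Join the collected chunks into one buffer before decoding.
  const bytes = new Uint8Array(received);
  let offset = 0;
  for (const chunk of chunks) {
    bytes.set(chunk, offset);
    offset += chunk.length;
  }

  let decoder: TextDecoder;
  try {
    decoder = new TextDecoder(charsetOf(res.headers.get("content-type")));
  } catch {
    decoder = new TextDecoder("utf-8"); // unknown label: fall back to UTF-8
  }
  return decoder.decode(bytes);
}
```

Cutting at a byte boundary can split a multi-byte character, in which case the decoder emits a replacement character at the end of the string. For metadata that lives in the HTML head this is usually harmless.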
package/docs/functions/normalizeDate.md
@@ -0,0 +1,29 @@
+[**@isdk/web-searcher**](../README.md)
+
+***
+
+[@isdk/web-searcher](../globals.md) / normalizeDate
+
+# Function: normalizeDate()
+
+> **normalizeDate**(`dateStr`): `string` \| `null`
+
+Defined in: [web-searcher/src/utils/extractor/date-normalizer.ts:9](https://github.com/isdk/web-searcher.js/blob/0c4757eb75b3b7c5af0231806f11e7b3c3166736/src/utils/extractor/date-normalizer.ts#L9)
+
+Normalizes a date string into a standard ISO 8601 format (UTC).
+It handles various formats (YYYY-MM-DD, RFC2822, etc.) and performs
+aggressive cleaning and sanity checks.
+
+## Parameters
+
+### dateStr
+
+The raw date string to normalize.
+
+`string` | `null`
+
+## Returns
+
+`string` \| `null`
+
+An ISO 8601 string (e.g., "2024-01-20T00:00:00.000Z") or null if invalid.
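The `normalizeDate()` page above mentions cleaning, parsing several formats, and sanity checks, without listing the rules. The sketch below shows the general shape of such a normalizer using only the built-in `Date`. The function name `toIsoUtc` and the plausibility window (nothing before 1990, nothing more than a day in the future) are assumptions for illustration, not the package's actual rules.

```typescript
// Illustrative only: trims the input, parses it with the built-in Date,
// rejects implausible values, and re-serializes as an ISO 8601 UTC string.
function toIsoUtc(dateStr: string | null): string | null {
  if (!dateStr) return null;
  const cleaned = dateStr.trim().replace(/\s+/g, " ");
  if (cleaned === "") return null;

  const parsed = new Date(cleaned);
  if (Number.isNaN(parsed.getTime())) return null;

  // Assumed sanity window: 1990-01-01 up to one day ahead of now.
  const earliest = Date.UTC(1990, 0, 1);
  const latest = Date.now() + 24 * 60 * 60 * 1000;
  const time = parsed.getTime();
  if (time < earliest || time > latest) return null;

  return parsed.toISOString();
}

console.log(toIsoUtc("2024-01-20")); // "2024-01-20T00:00:00.000Z"
console.log(toIsoUtc("garbage")); // null
```

Only the ISO date-time format is guaranteed by ECMAScript. Parsing of other layouts, including RFC 2822 strings, is engine-dependent, which is one reason a dedicated normalizer tends to add its own cleaning step first.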
package/docs/functions/parseHeaders.md
@@ -0,0 +1,28 @@
+[**@isdk/web-searcher**](../README.md)
+
+***
+
+[@isdk/web-searcher](../globals.md) / parseHeaders
+
+# Function: parseHeaders()
+
+> **parseHeaders**(`headers`): `Record`\<`string`, `string`\>
+
+Defined in: [web-searcher/src/utils/extractor/parser.ts:25](https://github.com/isdk/web-searcher.js/blob/0c4757eb75b3b7c5af0231806f11e7b3c3166736/src/utils/extractor/parser.ts#L25)
+
+Converts a Web API Headers object into a plain JavaScript record.
+All header names are converted to lowercase for consistent access.
+
+## Parameters
+
+### headers
+
+`Headers`
+
+The Headers object to parse.
+
+## Returns
+
+`Record`\<`string`, `string`\>
+
+A record where keys are lowercase header names.
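The conversion described on the `parseHeaders()` page above can be sketched in a few lines with the standard `Headers.forEach` method. The function name `headersToRecord` is hypothetical, and this is an illustration of the idea rather than the package's code.

```typescript
// Illustrative only: copies a Web API Headers object into a plain record
// whose keys are lowercase header names.
function headersToRecord(headers: Headers): Record<string, string> {
  const record: Record<string, string> = {};
  headers.forEach((value, name) => {
    // Headers iteration already yields lowercase names; lowercasing again
    // keeps the guarantee explicit.
    record[name.toLowerCase()] = value;
  });
  return record;
}

const record = headersToRecord(
  new Headers({ "Content-Type": "text/html", "Last-Modified": "Sat, 20 Jan 2024 00:00:00 GMT" }),
);
console.log(record["content-type"]); // "text/html"
```

A plain record is convenient when the headers need to be serialized or compared, since `Headers` instances do not survive `JSON.stringify`.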
package/docs/functions/parseHtml.md
@@ -0,0 +1,31 @@
+[**@isdk/web-searcher**](../README.md)
+
+***
+
+[@isdk/web-searcher](../globals.md) / parseHtml
+
+# Function: parseHtml()
+
+> **parseHtml**(`html`): [`HtmlData`](../interfaces/HtmlData.md)
+
+Defined in: [web-searcher/src/utils/extractor/parser.ts:49](https://github.com/isdk/web-searcher.js/blob/0c4757eb75b3b7c5af0231806f11e7b3c3166736/src/utils/extractor/parser.ts#L49)
+
+Parses an HTML string to extract generic metadata structures (Meta tags, JSON-LD, Time tags).
+
+This function does not perform field-specific logic (like finding a date); it simply
+
+collects available structured data.
+
+## Parameters
+
+### html
+
+`string`
+
+The raw HTML content to parse.
+
+## Returns
+
+[`HtmlData`](../interfaces/HtmlData.md)
+
+An object containing grouped metadata from the HTML.
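The `parseHtml()` page above names three sources of structured data: meta tags, JSON-LD blocks, and time tags. The sketch below shows how such a collection pass might look with regular expressions. Everything here is hypothetical: the fields of the real `HtmlData` interface are not shown in this diff, so `HtmlDataSketch`, `collectMetadata`, and the helpers are illustrative names, and a regex scan is a simplification that a real parser may not use.

```typescript
// Hypothetical result shape; the package's real HtmlData fields are not shown here.
interface HtmlDataSketch {
  meta: Record<string, string>;
  jsonLd: unknown[];
  times: string[];
}

// Collects every match of a global pattern.
function allMatches(pattern: RegExp, text: string): RegExpExecArray[] {
  const out: RegExpExecArray[] = [];
  let m: RegExpExecArray | null;
  pattern.lastIndex = 0;
  while ((m = pattern.exec(text)) !== null) out.push(m);
  return out;
}

// Reads one quoted attribute value out of a single tag's source text.
function attr(tag: string, name: string): string | undefined {
  const m = new RegExp(`\\s${name}\\s*=\\s*["']([^"']*)["']`, "i").exec(tag);
  return m?.[1];
}

// Illustrative only: gathers raw metadata without interpreting any field.
function collectMetadata(html: string): HtmlDataSketch {
  const data: HtmlDataSketch = { meta: {}, jsonLd: [], times: [] };

  for (const m of allMatches(/<meta\b[^>]*>/gi, html)) {
    const key = attr(m[0], "property") ?? attr(m[0], "name");
    const content = attr(m[0], "content");
    if (key && content !== undefined) data.meta[key.toLowerCase()] = content;
  }

  const ldPattern =
    /<script\b[^>]*type\s*=\s*["']application\/ld\+json["'][^>]*>([\s\S]*?)<\/script>/gi;
  for (const m of allMatches(ldPattern, html)) {
    try {
      data.jsonLd.push(JSON.parse(m[1] ?? ""));
    } catch {
      // Malformed JSON-LD is skipped rather than treated as fatal.
    }
  }

  for (const m of allMatches(/<time\b[^>]*>/gi, html)) {
    const datetime = attr(m[0], "datetime");
    if (datetime) data.times.push(datetime);
  }
  return data;
}
```

Keeping collection separate from interpretation, as the page describes, lets a later step decide which of `article:published_time`, a JSON-LD `datePublished`, or a `<time>` element should win.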
package/docs/functions/testUrlsByLatency.md
@@ -0,0 +1,42 @@
+[**@isdk/web-searcher**](../README.md)
+
+***
+
+[@isdk/web-searcher](../globals.md) / testUrlsByLatency
+
+# Function: testUrlsByLatency()
+
+> **testUrlsByLatency**(`urls`, `options`): `Promise`\<[`VerifiedUrl`](../interfaces/VerifiedUrl.md)[]\>
+
+Defined in: [web-searcher/src/utils/latency.ts:12](https://github.com/isdk/web-searcher.js/blob/0c4757eb75b3b7c5af0231806f11e7b3c3166736/src/utils/latency.ts#L12)
+
+A general utility to test a list of URLs for availability and latency.
+Returns a list of verified URLs sorted by response time.
+
+## Parameters
+
+### urls
+
+`string`[]
+
+### options
+
+#### limit?
+
+`number`
+
+#### proxy?
+
+`string`
+
+#### testPath?
+
+`string`
+
+#### timeout?
+
+`number`
+
+## Returns
+
+`Promise`\<[`VerifiedUrl`](../interfaces/VerifiedUrl.md)[]\>
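The `testUrlsByLatency()` page above describes probing a list of URLs and returning the reachable ones sorted by response time. The sketch below shows that pattern with the network call abstracted behind a `probe` callback so it can run offline. The names `rankByLatency`, `TimedUrl`, and `Probe` are hypothetical, the fields of the real `VerifiedUrl` interface are not shown in this diff, and the `proxy` and `testPath` options are not modeled.

```typescript
// Hypothetical result shape; the real VerifiedUrl fields are not shown in this diff.
interface TimedUrl {
  url: string;
  latencyMs: number;
}

// A probe resolves when the URL is reachable and rejects otherwise.
type Probe = (url: string) => Promise<void>;

// Illustrative only: probes all URLs concurrently, drops the ones that
// fail, and returns the rest ordered from fastest to slowest.
async function rankByLatency(
  urls: string[],
  probe: Probe,
  limit: number = urls.length,
): Promise<TimedUrl[]> {
  const timed = await Promise.all(
    urls.map(async (url): Promise<TimedUrl | null> => {
      const started = Date.now();
      try {
        await probe(url);
        return { url, latencyMs: Date.now() - started };
      } catch {
        return null; // unreachable URLs are dropped
      }
    }),
  );
  return timed
    .filter((entry): entry is TimedUrl => entry !== null)
    .sort((a, b) => a.latencyMs - b.latencyMs)
    .slice(0, limit);
}
```

In practice the probe would wrap a HEAD or GET request with its own timeout. Running the probes concurrently keeps total time close to the slowest single probe rather than the sum of all of them.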
package/docs/globals.md CHANGED
@@ -12,15 +12,33 @@
 ## Interfaces
 
 - [CustomTimeRange](interfaces/CustomTimeRange.md)
+- [ExtractOptions](interfaces/ExtractOptions.md)
+- [FetcherOptions](interfaces/FetcherOptions.md)
+- [FetchExtractorOptions](interfaces/FetchExtractorOptions.md)
+- [HtmlData](interfaces/HtmlData.md)
+- [MetadataResult](interfaces/MetadataResult.md)
 - [PaginationConfig](interfaces/PaginationConfig.md)
 - [SearchContext](interfaces/SearchContext.md)
 - [SearchOptions](interfaces/SearchOptions.md)
 - [StandardSearchResult](interfaces/StandardSearchResult.md)
+- [VerifiedUrl](interfaces/VerifiedUrl.md)
 
 ## Type Aliases
 
+- [MetadataType](type-aliases/MetadataType.md)
 - [SafeSearchLevel](type-aliases/SafeSearchLevel.md)
 - [SearchCategory](type-aliases/SearchCategory.md)
 - [SearcherConstructor](type-aliases/SearcherConstructor.md)
 - [SearchTimeRange](type-aliases/SearchTimeRange.md)
 - [SearchTimeRangePreset](type-aliases/SearchTimeRangePreset.md)
+
+## Functions
+
+- [extractDate](functions/extractDate.md)
+- [extractMetadataFrom](functions/extractMetadataFrom.md)
+- [fetchHeaders](functions/fetchHeaders.md)
+- [fetchPartial](functions/fetchPartial.md)
+- [normalizeDate](functions/normalizeDate.md)
+- [parseHeaders](functions/parseHeaders.md)
+- [parseHtml](functions/parseHtml.md)
+- [testUrlsByLatency](functions/testUrlsByLatency.md)
package/docs/interfaces/CustomTimeRange.md
@@ -6,7 +6,7 @@
 
 # Interface: CustomTimeRange
 
-Defined in: [web-searcher/src/types.ts:
+Defined in: [web-searcher/src/types.ts:113](https://github.com/isdk/web-searcher.js/blob/0c4757eb75b3b7c5af0231806f11e7b3c3166736/src/types.ts#L113)
 
 ## Properties
 
@@ -14,7 +14,7 @@ Defined in: [web-searcher/src/types.ts:104](https://github.com/isdk/web-searcher
 
 > **from**: `string` \| `Date`
 
-Defined in: [web-searcher/src/types.ts:
+Defined in: [web-searcher/src/types.ts:115](https://github.com/isdk/web-searcher.js/blob/0c4757eb75b3b7c5af0231806f11e7b3c3166736/src/types.ts#L115)
 
 Start date (Date object or string like 'YYYY-MM-DD').
 
@@ -24,6 +24,6 @@ Start date (Date object or string like 'YYYY-MM-DD').
 
 > `optional` **to**: `string` \| `Date`
 
-Defined in: [web-searcher/src/types.ts:
+Defined in: [web-searcher/src/types.ts:117](https://github.com/isdk/web-searcher.js/blob/0c4757eb75b3b7c5af0231806f11e7b3c3166736/src/types.ts#L117)
 
 End date (Date object or string like 'YYYY-MM-DD'). Defaults to current date if omitted.
package/docs/interfaces/ExtractOptions.md
@@ -0,0 +1,54 @@
+[**@isdk/web-searcher**](../README.md)
+
+***
+
+[@isdk/web-searcher](../globals.md) / ExtractOptions
+
+# Interface: ExtractOptions
+
+Defined in: [web-searcher/src/utils/extractor/date-extractor.ts:7](https://github.com/isdk/web-searcher.js/blob/0c4757eb75b3b7c5af0231806f11e7b3c3166736/src/utils/extractor/date-extractor.ts#L7)
+
+Options for the extractDate function.
+
+## Extends
+
+- [`FetchExtractorOptions`](FetchExtractorOptions.md)
+
+## Properties
+
+### headers?
+
+> `optional` **headers**: `Record`\<`string`, `string`\>
+
+Defined in: [web-searcher/src/utils/extractor/fetcher.ts:8](https://github.com/isdk/web-searcher.js/blob/0c4757eb75b3b7c5af0231806f11e7b3c3166736/src/utils/extractor/fetcher.ts#L8)
+
+Custom HTTP headers to include in the request.
+
+#### Inherited from
+
+[`FetchExtractorOptions`](FetchExtractorOptions.md).[`headers`](FetchExtractorOptions.md#headers)
+
+***
+
+### maxBytes?
+
+> `optional` **maxBytes**: `number`
+
+Defined in: [web-searcher/src/utils/extractor/date-extractor.ts:12](https://github.com/isdk/web-searcher.js/blob/0c4757eb75b3b7c5af0231806f11e7b3c3166736/src/utils/extractor/date-extractor.ts#L12)
+
+Maximum number of bytes to download from the URL.
+Defaults to 32768 (32KB), which is usually enough for the HTML <head>.
+
+***
+
+### timeout?
+
+> `optional` **timeout**: `number`
+
+Defined in: [web-searcher/src/utils/extractor/fetcher.ts:6](https://github.com/isdk/web-searcher.js/blob/0c4757eb75b3b7c5af0231806f11e7b3c3166736/src/utils/extractor/fetcher.ts#L6)
+
+Timeout in milliseconds. Defaults vary by function (5s to 10s).
+
+#### Inherited from
+
+[`FetchExtractorOptions`](FetchExtractorOptions.md).[`timeout`](FetchExtractorOptions.md#timeout)
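The `ExtractOptions` page above shows an interface that extends the request options with `maxBytes`, all fields optional. The sketch below mirrors that inheritance locally and shows one way defaults could be resolved. The interface and function names are hypothetical stand-ins, the 10000 ms timeout is an assumption picked from the documented 5s to 10s range, and only the 32768 byte default comes from the page itself.

```typescript
// Local stand-ins that mirror the documented shapes; not imported from the package.
interface RequestOptionsSketch {
  timeout?: number;
  headers?: Record<string, string>;
}

interface ExtractOptionsSketch extends RequestOptionsSketch {
  maxBytes?: number;
}

// Illustrative only: fills in every omitted field so later code can rely
// on concrete values.
function withDefaults(options: ExtractOptionsSketch = {}): Required<ExtractOptionsSketch> {
  return {
    timeout: options.timeout ?? 10000, // assumed; documented defaults vary by function
    maxBytes: options.maxBytes ?? 32768, // documented default (32KB)
    headers: { ...options.headers }, // copy so the caller's object is not shared
  };
}

const resolved = withDefaults({ headers: { "user-agent": "example-agent" } });
console.log(resolved.maxBytes); // 32768
```

Because `ExtractOptionsSketch` extends the request options, any function that accepts the base shape also accepts the extended one, which matches how the same options type is documented for `fetchHeaders` and `fetchPartial`.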
package/docs/interfaces/FetchExtractorOptions.md
@@ -0,0 +1,35 @@
+[**@isdk/web-searcher**](../README.md)
+
+***
+
+[@isdk/web-searcher](../globals.md) / FetchExtractorOptions
+
+# Interface: FetchExtractorOptions
+
+Defined in: [web-searcher/src/utils/extractor/fetcher.ts:4](https://github.com/isdk/web-searcher.js/blob/0c4757eb75b3b7c5af0231806f11e7b3c3166736/src/utils/extractor/fetcher.ts#L4)
+
+Options for network requests.
+
+## Extended by
+
+- [`ExtractOptions`](ExtractOptions.md)
+
+## Properties
+
+### headers?
+
+> `optional` **headers**: `Record`\<`string`, `string`\>
+
+Defined in: [web-searcher/src/utils/extractor/fetcher.ts:8](https://github.com/isdk/web-searcher.js/blob/0c4757eb75b3b7c5af0231806f11e7b3c3166736/src/utils/extractor/fetcher.ts#L8)
+
+Custom HTTP headers to include in the request.
+
+***
+
+### timeout?
+
+> `optional` **timeout**: `number`
+
+Defined in: [web-searcher/src/utils/extractor/fetcher.ts:6](https://github.com/isdk/web-searcher.js/blob/0c4757eb75b3b7c5af0231806f11e7b3c3166736/src/utils/extractor/fetcher.ts#L6)
+
+Timeout in milliseconds. Defaults vary by function (5s to 10s).