@isdk/web-searcher 0.1.4 → 0.1.6

Files changed (36)
  1. package/README.cn.md +196 -7
  2. package/README.md +196 -7
  3. package/dist/index.d.mts +234 -11
  4. package/dist/index.d.ts +234 -11
  5. package/dist/index.js +1 -1
  6. package/dist/index.mjs +1 -1
  7. package/docs/README.md +196 -7
  8. package/docs/classes/GoogleSearcher.md +289 -60
  9. package/docs/classes/WebSearcher.md +264 -61
  10. package/docs/functions/extractDate.md +42 -0
  11. package/docs/functions/extractMetadataFrom.md +40 -0
  12. package/docs/functions/fetchHeaders.md +34 -0
  13. package/docs/functions/fetchPartial.md +41 -0
  14. package/docs/functions/normalizeDate.md +29 -0
  15. package/docs/functions/parseHeaders.md +28 -0
  16. package/docs/functions/parseHtml.md +31 -0
  17. package/docs/functions/testUrlsByLatency.md +42 -0
  18. package/docs/globals.md +18 -0
  19. package/docs/interfaces/CustomTimeRange.md +3 -3
  20. package/docs/interfaces/ExtractOptions.md +54 -0
  21. package/docs/interfaces/FetchExtractorOptions.md +35 -0
  22. package/docs/interfaces/FetcherOptions.md +436 -0
  23. package/docs/interfaces/HtmlData.md +53 -0
  24. package/docs/interfaces/MetadataResult.md +27 -0
  25. package/docs/interfaces/PaginationConfig.md +9 -9
  26. package/docs/interfaces/SearchContext.md +30 -4
  27. package/docs/interfaces/SearchOptions.md +77 -11
  28. package/docs/interfaces/StandardSearchResult.md +10 -10
  29. package/docs/interfaces/VerifiedUrl.md +25 -0
  30. package/docs/type-aliases/MetadataType.md +13 -0
  31. package/docs/type-aliases/SafeSearchLevel.md +1 -1
  32. package/docs/type-aliases/SearchCategory.md +2 -2
  33. package/docs/type-aliases/SearchTimeRange.md +1 -1
  34. package/docs/type-aliases/SearchTimeRangePreset.md +1 -1
  35. package/docs/type-aliases/SearcherConstructor.md +2 -2
  36. package/package.json +3 -2
@@ -0,0 +1,34 @@
+ [**@isdk/web-searcher**](../README.md)
+
+ ***
+
+ [@isdk/web-searcher](../globals.md) / fetchHeaders
+
+ # Function: fetchHeaders()
+
+ > **fetchHeaders**(`url`, `options`): `Promise`\<`Headers` \| `null`\>
+
+ Defined in: [web-searcher/src/utils/extractor/fetcher.ts:19](https://github.com/isdk/web-searcher.js/blob/0c4757eb75b3b7c5af0231806f11e7b3c3166736/src/utils/extractor/fetcher.ts#L19)
+
+ Fetches only the HTTP headers for a given URL using a HEAD request.
+ Useful for checking 'last-modified' without downloading the body.
+
+ ## Parameters
+
+ ### url
+
+ `string`
+
+ The URL to check.
+
+ ### options
+
+ [`FetchExtractorOptions`](../interfaces/FetchExtractorOptions.md) = `{}`
+
+ Request options.
+
+ ## Returns
+
+ `Promise`\<`Headers` \| `null`\>
+
+ The Headers object, or null on failure.
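The `fetchHeaders` entry describes a HEAD-only request that is used to read `Last-Modified` without downloading the body. The block below is a minimal sketch of that technique, not the package's implementation. It assumes a runtime with global `fetch`, `Headers` and `AbortSignal.timeout` (Node 18+). The names `headOnly`, `lastModifiedOf` and `RequestOptionsSketch`, the 5s default, and the rule that non-2xx responses count as failures are all hypothetical.

```typescript
// Hypothetical sketch of a HEAD-only request (not @isdk/web-searcher's code).
// Assumes global fetch, Headers and AbortSignal.timeout (Node 18+).

interface RequestOptionsSketch {
  timeout?: number;                 // milliseconds
  headers?: Record<string, string>; // extra request headers
}

async function headOnly(url: string, options: RequestOptionsSketch = {}): Promise<Headers | null> {
  try {
    const res = await fetch(url, {
      method: 'HEAD', // ask the server for headers only, no body
      headers: options.headers,
      signal: AbortSignal.timeout(options.timeout ?? 5000), // assumed 5s default
    });
    // Assumption: non-2xx responses are treated as failures.
    return res.ok ? res.headers : null;
  } catch {
    return null; // network error, DNS failure or timeout
  }
}

// Reads the Last-Modified header (an HTTP-date) and converts it to a Date.
function lastModifiedOf(headers: Headers | null): Date | null {
  const value = headers?.get('last-modified');
  if (!value) return null;
  const time = Date.parse(value);
  return Number.isNaN(time) ? null : new Date(time);
}
```

Typical usage would be `lastModifiedOf(await headOnly('https://example.com/'))`. Splitting the network call from the header parsing keeps the parsing step testable without a server.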
@@ -0,0 +1,41 @@
+ [**@isdk/web-searcher**](../README.md)
+
+ ***
+
+ [@isdk/web-searcher](../globals.md) / fetchPartial
+
+ # Function: fetchPartial()
+
+ > **fetchPartial**(`url`, `maxBytes`, `options`): `Promise`\<\{ `content`: `string`; `headers`: `Headers`; \} \| `null`\>
+
+ Defined in: [web-searcher/src/utils/extractor/fetcher.ts:55](https://github.com/isdk/web-searcher.js/blob/0c4757eb75b3b7c5af0231806f11e7b3c3166736/src/utils/extractor/fetcher.ts#L55)
+
+ Fetches a partial amount of content from a URL.
+ Automatically handles character set detection from the Content-Type header.
+ Aborts the request once the specified maxBytes is reached.
+
+ ## Parameters
+
+ ### url
+
+ `string`
+
+ The URL to fetch.
+
+ ### maxBytes
+
+ `number` = `32768`
+
+ The maximum number of bytes to read. Defaults to 32KB.
+
+ ### options
+
+ [`FetchExtractorOptions`](../interfaces/FetchExtractorOptions.md) = `{}`
+
+ Request options.
+
+ ## Returns
+
+ `Promise`\<\{ `content`: `string`; `headers`: `Headers`; \} \| `null`\>
+
+ An object containing the decoded content string and the response headers.
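The `fetchPartial` entry combines three steps: read at most `maxBytes` from the body, stop the download, and decode using the charset from `Content-Type`. The block below is a minimal sketch of those steps using the WHATWG Streams reader API. It is not the package's code. The helper names, the UTF-8 fallback and the 10s timeout are assumptions.

```typescript
// Hypothetical sketch of a partial fetch (not @isdk/web-searcher's code).
// Assumes global fetch, ReadableStream, TextDecoder and AbortSignal.timeout.

// Extracts the charset parameter from a Content-Type value; assumes UTF-8 otherwise.
function charsetFromContentType(contentType: string | null): string {
  const match = contentType?.match(/charset\s*=\s*"?([^";\s]+)"?/i);
  return match ? match[1].toLowerCase() : 'utf-8';
}

// Reads chunks until maxBytes have been collected, then cancels the stream
// so the rest of the body is not downloaded.
async function readUpTo(body: ReadableStream<Uint8Array>, maxBytes: number): Promise<Uint8Array> {
  const reader = body.getReader();
  const chunks: Uint8Array[] = [];
  let total = 0;
  try {
    while (total < maxBytes) {
      const { done, value } = await reader.read();
      if (done) break;
      const take = value.subarray(0, maxBytes - total); // trim the last chunk
      chunks.push(take);
      total += take.length;
    }
  } finally {
    await reader.cancel().catch(() => {}); // abort the remaining body
  }
  const out = new Uint8Array(total);
  let offset = 0;
  for (const chunk of chunks) {
    out.set(chunk, offset);
    offset += chunk.length;
  }
  return out;
}

async function fetchPartialSketch(
  url: string,
  maxBytes = 32768,
  timeout = 10000, // assumed default
): Promise<{ content: string; headers: Headers } | null> {
  try {
    const res = await fetch(url, { signal: AbortSignal.timeout(timeout) });
    if (!res.ok || !res.body) return null;
    const bytes = await readUpTo(res.body, maxBytes);
    let decoder: TextDecoder;
    try {
      decoder = new TextDecoder(charsetFromContentType(res.headers.get('content-type')));
    } catch {
      decoder = new TextDecoder('utf-8'); // unknown charset label
    }
    return { content: decoder.decode(bytes), headers: res.headers };
  } catch {
    return null;
  }
}
```

A truncated multi-byte sequence at the cut-off point decodes to a replacement character. That is usually harmless when the goal is to scan the HTML `<head>` for metadata.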
@@ -0,0 +1,29 @@
+ [**@isdk/web-searcher**](../README.md)
+
+ ***
+
+ [@isdk/web-searcher](../globals.md) / normalizeDate
+
+ # Function: normalizeDate()
+
+ > **normalizeDate**(`dateStr`): `string` \| `null`
+
+ Defined in: [web-searcher/src/utils/extractor/date-normalizer.ts:9](https://github.com/isdk/web-searcher.js/blob/0c4757eb75b3b7c5af0231806f11e7b3c3166736/src/utils/extractor/date-normalizer.ts#L9)
+
+ Normalizes a date string into a standard ISO 8601 format (UTC).
+ It handles various formats (YYYY-MM-DD, RFC2822, etc.) and performs
+ aggressive cleaning and sanity checks.
+
+ ## Parameters
+
+ ### dateStr
+
+ The raw date string to normalize.
+
+ `string` | `null`
+
+ ## Returns
+
+ `string` \| `null`
+
+ An ISO 8601 string (e.g., "2024-01-20T00:00:00.000Z") or null if invalid.
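The `normalizeDate` entry mentions cleaning, parsing of several formats, and sanity checks, ending in a UTC ISO string. The block below is a minimal sketch of that pipeline, not the package's implementation. The label-stripping regex and the 1990 lower bound on the year are hypothetical choices. Parsing relies on `Date.parse`, which is fully specified only for ISO strings. RFC 2822 dates parse in V8 and other major engines, but that behaviour is implementation-defined.

```typescript
// Hypothetical sketch of date normalization (not @isdk/web-searcher's code).
function normalizeDateSketch(dateStr: string | null, minYear = 1990): string | null {
  if (!dateStr) return null;
  // Cleaning: collapse whitespace and strip a leading label such as "Published:".
  const cleaned = dateStr
    .replace(/\s+/g, ' ')
    .replace(/^[^0-9A-Za-z]*(published|updated|posted)\s*:?\s*/i, '')
    .trim();
  // A bare YYYY-MM-DD string is parsed as UTC midnight per the ECMAScript spec.
  const time = Date.parse(cleaned);
  if (Number.isNaN(time)) return null;
  // Sanity check: reject implausible years (bounds are assumptions).
  const year = new Date(time).getUTCFullYear();
  if (year < minYear || year > new Date().getUTCFullYear() + 1) return null;
  return new Date(time).toISOString();
}
```

With this sketch, `normalizeDateSketch('2024-01-20')` yields `"2024-01-20T00:00:00.000Z"`, which matches the example in the Returns section.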
@@ -0,0 +1,28 @@
+ [**@isdk/web-searcher**](../README.md)
+
+ ***
+
+ [@isdk/web-searcher](../globals.md) / parseHeaders
+
+ # Function: parseHeaders()
+
+ > **parseHeaders**(`headers`): `Record`\<`string`, `string`\>
+
+ Defined in: [web-searcher/src/utils/extractor/parser.ts:25](https://github.com/isdk/web-searcher.js/blob/0c4757eb75b3b7c5af0231806f11e7b3c3166736/src/utils/extractor/parser.ts#L25)
+
+ Converts a Web API Headers object into a plain JavaScript record.
+ All header names are converted to lowercase for consistent access.
+
+ ## Parameters
+
+ ### headers
+
+ `Headers`
+
+ The Headers object to parse.
+
+ ## Returns
+
+ `Record`\<`string`, `string`\>
+
+ A record where keys are lowercase header names.
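The `parseHeaders` entry converts a `Headers` object into a plain record with lowercase keys. The block below is a minimal sketch of that conversion under a hypothetical name, not the package's source. Iterating a `Headers` object already yields lowercase names under the Fetch standard, so the `toLowerCase()` call here is only defensive.

```typescript
// Hypothetical sketch: Headers -> plain record with lowercase keys.
function headersToRecord(headers: Headers): Record<string, string> {
  const record: Record<string, string> = {};
  headers.forEach((value, name) => {
    record[name.toLowerCase()] = value; // names are normally lowercase already
  });
  return record;
}
```

A plain record is easier to serialize and to pass to code that does not know about the Web `Headers` class. The trade-off is that lookups become case-sensitive, which is why the keys are normalized.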
@@ -0,0 +1,31 @@
+ [**@isdk/web-searcher**](../README.md)
+
+ ***
+
+ [@isdk/web-searcher](../globals.md) / parseHtml
+
+ # Function: parseHtml()
+
+ > **parseHtml**(`html`): [`HtmlData`](../interfaces/HtmlData.md)
+
+ Defined in: [web-searcher/src/utils/extractor/parser.ts:49](https://github.com/isdk/web-searcher.js/blob/0c4757eb75b3b7c5af0231806f11e7b3c3166736/src/utils/extractor/parser.ts#L49)
+
+ Parses an HTML string to extract generic metadata structures (Meta tags, JSON-LD, Time tags).
+
+ This function does not perform field-specific logic (like finding a date); it simply
+
+ collects available structured data.
+
+ ## Parameters
+
+ ### html
+
+ `string`
+
+ The raw HTML content to parse.
+
+ ## Returns
+
+ [`HtmlData`](../interfaces/HtmlData.md)
+
+ An object containing grouped metadata from the HTML.
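The `parseHtml` entry collects three kinds of structured data (meta tags, JSON-LD and `<time>` tags) without interpreting them. The block below is a deliberately simplified, regex-based sketch of that collection step. It is not the package's parser, and a real HTML parser is more robust. The result shape `HtmlDataSketch` is hypothetical, because the fields of `HtmlData` are not shown here.

```typescript
// Hypothetical, regex-based sketch of metadata collection (not the real parser).
interface HtmlDataSketch {
  meta: Record<string, string>; // property/name/itemprop -> content
  jsonLd: unknown[];            // parsed <script type="application/ld+json"> blocks
  times: string[];              // datetime attributes of <time> elements
}

// Reads a single- or double-quoted attribute value from an opening tag.
function attr(tag: string, name: string): string | undefined {
  const m = tag.match(new RegExp(`\\b${name}\\s*=\\s*(?:"([^"]*)"|'([^']*)')`, 'i'));
  return m ? (m[1] ?? m[2]) : undefined;
}

function parseHtmlSketch(html: string): HtmlDataSketch {
  const result: HtmlDataSketch = { meta: {}, jsonLd: [], times: [] };
  for (const [tag] of html.matchAll(/<meta\b[^>]*>/gi)) {
    const key = attr(tag, 'property') ?? attr(tag, 'name') ?? attr(tag, 'itemprop');
    const content = attr(tag, 'content');
    if (key && content !== undefined) result.meta[key.toLowerCase()] = content;
  }
  const ldRe = /<script\b[^>]*type\s*=\s*["']application\/ld\+json["'][^>]*>([\s\S]*?)<\/script>/gi;
  for (const [, body] of html.matchAll(ldRe)) {
    try {
      result.jsonLd.push(JSON.parse(body));
    } catch {
      // skip malformed JSON-LD blocks
    }
  }
  for (const [tag] of html.matchAll(/<time\b[^>]*>/gi)) {
    const datetime = attr(tag, 'datetime');
    if (datetime) result.times.push(datetime);
  }
  return result;
}
```

Keeping collection separate from interpretation follows the split described in the entry. A later step, such as a date extractor, can decide which of these sources to trust.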
@@ -0,0 +1,42 @@
+ [**@isdk/web-searcher**](../README.md)
+
+ ***
+
+ [@isdk/web-searcher](../globals.md) / testUrlsByLatency
+
+ # Function: testUrlsByLatency()
+
+ > **testUrlsByLatency**(`urls`, `options`): `Promise`\<[`VerifiedUrl`](../interfaces/VerifiedUrl.md)[]\>
+
+ Defined in: [web-searcher/src/utils/latency.ts:12](https://github.com/isdk/web-searcher.js/blob/0c4757eb75b3b7c5af0231806f11e7b3c3166736/src/utils/latency.ts#L12)
+
+ A general utility to test a list of URLs for availability and latency.
+ Returns a list of verified URLs sorted by response time.
+
+ ## Parameters
+
+ ### urls
+
+ `string`[]
+
+ ### options
+
+ #### limit?
+
+ `number`
+
+ #### proxy?
+
+ `string`
+
+ #### testPath?
+
+ `string`
+
+ #### timeout?
+
+ `number`
+
+ ## Returns
+
+ `Promise`\<[`VerifiedUrl`](../interfaces/VerifiedUrl.md)[]\>
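The `testUrlsByLatency` entry probes a list of URLs, drops the ones that fail, and returns the rest sorted by response time. The block below is a minimal sketch of that pattern, not the package's code. All URLs are probed in parallel, each with its own timeout. The result shape `LatencyResult` is hypothetical, because the fields of `VerifiedUrl` are not shown here. The injectable `probe` parameter is also an assumption; it lets the ranking logic be tested without a network. This sketch does not model the documented `proxy` and `testPath` options.

```typescript
// Hypothetical sketch of latency ranking (not @isdk/web-searcher's code).
interface LatencyResult {
  url: string;
  latency: number; // milliseconds
}

type Probe = (url: string, signal: AbortSignal) => Promise<boolean>;

// Default probe: a HEAD request that counts 2xx responses as available.
const fetchProbe: Probe = async (url, signal) => {
  const res = await fetch(url, { method: 'HEAD', signal });
  return res.ok;
};

async function rankUrlsByLatency(
  urls: string[],
  opts: { timeout?: number; limit?: number; probe?: Probe } = {},
): Promise<LatencyResult[]> {
  const { timeout = 5000, limit = urls.length, probe = fetchProbe } = opts;
  // Probe every URL concurrently and time each probe individually.
  const settled = await Promise.all(
    urls.map(async (url) => {
      const start = performance.now();
      try {
        const ok = await probe(url, AbortSignal.timeout(timeout));
        return ok ? { url, latency: performance.now() - start } : null;
      } catch {
        return null; // timeout, DNS error, connection refused
      }
    }),
  );
  return settled
    .filter((r): r is LatencyResult => r !== null)
    .sort((a, b) => a.latency - b.latency) // fastest first
    .slice(0, limit);
}
```

Probing in parallel keeps the total wall time close to the slowest successful probe, or the timeout, rather than to the sum of all probes.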
package/docs/globals.md CHANGED
@@ -12,15 +12,33 @@
  ## Interfaces
 
  - [CustomTimeRange](interfaces/CustomTimeRange.md)
+ - [ExtractOptions](interfaces/ExtractOptions.md)
+ - [FetcherOptions](interfaces/FetcherOptions.md)
+ - [FetchExtractorOptions](interfaces/FetchExtractorOptions.md)
+ - [HtmlData](interfaces/HtmlData.md)
+ - [MetadataResult](interfaces/MetadataResult.md)
  - [PaginationConfig](interfaces/PaginationConfig.md)
  - [SearchContext](interfaces/SearchContext.md)
  - [SearchOptions](interfaces/SearchOptions.md)
  - [StandardSearchResult](interfaces/StandardSearchResult.md)
+ - [VerifiedUrl](interfaces/VerifiedUrl.md)
 
  ## Type Aliases
 
+ - [MetadataType](type-aliases/MetadataType.md)
  - [SafeSearchLevel](type-aliases/SafeSearchLevel.md)
  - [SearchCategory](type-aliases/SearchCategory.md)
  - [SearcherConstructor](type-aliases/SearcherConstructor.md)
  - [SearchTimeRange](type-aliases/SearchTimeRange.md)
  - [SearchTimeRangePreset](type-aliases/SearchTimeRangePreset.md)
+
+ ## Functions
+
+ - [extractDate](functions/extractDate.md)
+ - [extractMetadataFrom](functions/extractMetadataFrom.md)
+ - [fetchHeaders](functions/fetchHeaders.md)
+ - [fetchPartial](functions/fetchPartial.md)
+ - [normalizeDate](functions/normalizeDate.md)
+ - [parseHeaders](functions/parseHeaders.md)
+ - [parseHtml](functions/parseHtml.md)
+ - [testUrlsByLatency](functions/testUrlsByLatency.md)
@@ -6,7 +6,7 @@
 
  # Interface: CustomTimeRange
 
- Defined in: [web-searcher/src/types.ts:104](https://github.com/isdk/web-searcher.js/blob/7bcd8cca4a3a7fc201a5cf3e3b4283f267eadcea/src/types.ts#L104)
+ Defined in: [web-searcher/src/types.ts:113](https://github.com/isdk/web-searcher.js/blob/0c4757eb75b3b7c5af0231806f11e7b3c3166736/src/types.ts#L113)
 
  ## Properties
 
@@ -14,7 +14,7 @@ Defined in: [web-searcher/src/types.ts:104](https://github.com/isdk/web-searcher
 
  > **from**: `string` \| `Date`
 
- Defined in: [web-searcher/src/types.ts:106](https://github.com/isdk/web-searcher.js/blob/7bcd8cca4a3a7fc201a5cf3e3b4283f267eadcea/src/types.ts#L106)
+ Defined in: [web-searcher/src/types.ts:115](https://github.com/isdk/web-searcher.js/blob/0c4757eb75b3b7c5af0231806f11e7b3c3166736/src/types.ts#L115)
 
  Start date (Date object or string like 'YYYY-MM-DD').
 
@@ -24,6 +24,6 @@ Start date (Date object or string like 'YYYY-MM-DD').
 
  > `optional` **to**: `string` \| `Date`
 
- Defined in: [web-searcher/src/types.ts:108](https://github.com/isdk/web-searcher.js/blob/7bcd8cca4a3a7fc201a5cf3e3b4283f267eadcea/src/types.ts#L108)
+ Defined in: [web-searcher/src/types.ts:117](https://github.com/isdk/web-searcher.js/blob/0c4757eb75b3b7c5af0231806f11e7b3c3166736/src/types.ts#L117)
 
  End date (Date object or string like 'YYYY-MM-DD'). Defaults to current date if omitted.
@@ -0,0 +1,54 @@
+ [**@isdk/web-searcher**](../README.md)
+
+ ***
+
+ [@isdk/web-searcher](../globals.md) / ExtractOptions
+
+ # Interface: ExtractOptions
+
+ Defined in: [web-searcher/src/utils/extractor/date-extractor.ts:7](https://github.com/isdk/web-searcher.js/blob/0c4757eb75b3b7c5af0231806f11e7b3c3166736/src/utils/extractor/date-extractor.ts#L7)
+
+ Options for the extractDate function.
+
+ ## Extends
+
+ - [`FetchExtractorOptions`](FetchExtractorOptions.md)
+
+ ## Properties
+
+ ### headers?
+
+ > `optional` **headers**: `Record`\<`string`, `string`\>
+
+ Defined in: [web-searcher/src/utils/extractor/fetcher.ts:8](https://github.com/isdk/web-searcher.js/blob/0c4757eb75b3b7c5af0231806f11e7b3c3166736/src/utils/extractor/fetcher.ts#L8)
+
+ Custom HTTP headers to include in the request.
+
+ #### Inherited from
+
+ [`FetchExtractorOptions`](FetchExtractorOptions.md).[`headers`](FetchExtractorOptions.md#headers)
+
+ ***
+
+ ### maxBytes?
+
+ > `optional` **maxBytes**: `number`
+
+ Defined in: [web-searcher/src/utils/extractor/date-extractor.ts:12](https://github.com/isdk/web-searcher.js/blob/0c4757eb75b3b7c5af0231806f11e7b3c3166736/src/utils/extractor/date-extractor.ts#L12)
+
+ Maximum number of bytes to download from the URL.
+ Defaults to 32768 (32KB), which is usually enough for the HTML <head>.
+
+ ***
+
+ ### timeout?
+
+ > `optional` **timeout**: `number`
+
+ Defined in: [web-searcher/src/utils/extractor/fetcher.ts:6](https://github.com/isdk/web-searcher.js/blob/0c4757eb75b3b7c5af0231806f11e7b3c3166736/src/utils/extractor/fetcher.ts#L6)
+
+ Timeout in milliseconds. Defaults vary by function (5s to 10s).
+
+ #### Inherited from
+
+ [`FetchExtractorOptions`](FetchExtractorOptions.md).[`timeout`](FetchExtractorOptions.md#timeout)
@@ -0,0 +1,35 @@
+ [**@isdk/web-searcher**](../README.md)
+
+ ***
+
+ [@isdk/web-searcher](../globals.md) / FetchExtractorOptions
+
+ # Interface: FetchExtractorOptions
+
+ Defined in: [web-searcher/src/utils/extractor/fetcher.ts:4](https://github.com/isdk/web-searcher.js/blob/0c4757eb75b3b7c5af0231806f11e7b3c3166736/src/utils/extractor/fetcher.ts#L4)
+
+ Options for network requests.
+
+ ## Extended by
+
+ - [`ExtractOptions`](ExtractOptions.md)
+
+ ## Properties
+
+ ### headers?
+
+ > `optional` **headers**: `Record`\<`string`, `string`\>
+
+ Defined in: [web-searcher/src/utils/extractor/fetcher.ts:8](https://github.com/isdk/web-searcher.js/blob/0c4757eb75b3b7c5af0231806f11e7b3c3166736/src/utils/extractor/fetcher.ts#L8)
+
+ Custom HTTP headers to include in the request.
+
+ ***
+
+ ### timeout?
+
+ > `optional` **timeout**: `number`
+
+ Defined in: [web-searcher/src/utils/extractor/fetcher.ts:6](https://github.com/isdk/web-searcher.js/blob/0c4757eb75b3b7c5af0231806f11e7b3c3166736/src/utils/extractor/fetcher.ts#L6)
+
+ Timeout in milliseconds. Defaults vary by function (5s to 10s).
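The two option interfaces above form a small hierarchy: `ExtractOptions` adds `maxBytes` to the `headers` and `timeout` it inherits from `FetchExtractorOptions`. The block below transcribes the documented properties into TypeScript and adds a hypothetical `withDefaults` helper to show how optional fields might be resolved. The 32768-byte default for `maxBytes` comes from the docs. The 10s timeout is an assumption, because the docs only say that timeout defaults vary by function, from 5s to 10s.

```typescript
// The documented option interfaces, transcribed to TypeScript.
interface FetchExtractorOptions {
  timeout?: number;                 // milliseconds
  headers?: Record<string, string>; // custom HTTP request headers
}

interface ExtractOptions extends FetchExtractorOptions {
  maxBytes?: number; // documented default: 32768 (32KB)
}

// Hypothetical helper: resolve every optional field to a concrete value.
function withDefaults(options: ExtractOptions = {}): Required<ExtractOptions> {
  return {
    timeout: options.timeout ?? 10_000, // assumed default
    headers: { ...options.headers },    // copy so callers cannot mutate it
    maxBytes: options.maxBytes ?? 32_768,
  };
}
```

Because `ExtractOptions` extends `FetchExtractorOptions`, an `ExtractOptions` value can be passed wherever the base type is expected, and the extra `maxBytes` field is simply ignored.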