@isdk/web-searcher 0.1.2 → 0.1.3
- package/README.cn.md +31 -13
- package/README.md +32 -14
- package/dist/index.d.mts +30 -1
- package/dist/index.d.ts +30 -1
- package/dist/index.js +1 -1
- package/dist/index.mjs +1 -1
- package/docs/README.md +32 -14
- package/docs/classes/GoogleSearcher.md +27 -27
- package/docs/classes/WebSearcher.md +27 -27
- package/docs/interfaces/CustomTimeRange.md +3 -3
- package/docs/interfaces/PaginationConfig.md +26 -6
- package/docs/interfaces/SearchContext.md +4 -4
- package/docs/interfaces/SearchOptions.md +23 -8
- package/docs/interfaces/StandardSearchResult.md +56 -6
- package/docs/type-aliases/SafeSearchLevel.md +1 -1
- package/docs/type-aliases/SearchCategory.md +1 -1
- package/docs/type-aliases/SearchTimeRange.md +1 -1
- package/docs/type-aliases/SearchTimeRangePreset.md +1 -1
- package/docs/type-aliases/SearcherConstructor.md +1 -1
- package/package.json +1 -1
package/README.cn.md
CHANGED
|
@@ -42,27 +42,26 @@ console.log(results);
|
|
|
42
42
|
|
|
43
43
|
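Since `WebSearcher` extends `FetchSession`, you can instantiate it to keep cookies and storage alive across multiple requests. This is useful for searches that require login, or for avoiding anti-bot measures by mimicking human behavior.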
|
|
44
44
|
|
|
45
|
-
|
|
46
|
-
When creating a session, options are merged in the following order:
|
|
45
|
+
### 🛡️ Core Principle: Template is Law
|
|
47
46
|
|
|
48
|
-
|
|
49
|
-
2. **User Options**: Options passed to the constructor (they can fill in missing defaults, or override where allowed).
|
|
47
|
+
The `template` defined in a `WebSearcher` subclass is the authoritative "blueprint".
|
|
50
48
|
|
|
51
|
-
|
|
49
|
+
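- **Template Priority**: If the template defines a property (e.g., `engine: 'browser'` or specific `headers`), that value is **locked** and user options cannot override it. This keeps the scraping logic stable.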
|
|
50
|
+
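- **User Flexibility**: Properties **not** explicitly locked in the template (such as `proxy`, `timeoutMs`, or custom variables) can be set freely in the constructor or the `search()` method.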
|
|
52
51
|
|
|
53
52
|
```typescript
|
|
54
53
|
// Create a persistent session
|
|
55
54
|
const google = new GoogleSearcher({
|
|
56
|
-
headless: false, //
|
|
55
|
+
headless: false, // Can be overridden if not locked in the template
|
|
57
56
|
proxy: 'http://my-proxy:8080',
|
|
58
|
-
timeoutMs: 30000 //
|
|
57
|
+
timeoutMs: 30000 // Valid (assuming the GoogleSearcher template does not explicitly set timeoutMs)
|
|
59
58
|
});
|
|
60
59
|
|
|
61
60
|
try {
|
|
62
61
|
// First query
|
|
63
62
|
// You can also pass runtime options to override session defaults or inject variables
|
|
64
63
|
const results1 = await google.search('term A', {
|
|
65
|
-
timeoutMs: 60000, //
|
|
64
|
+
timeoutMs: 60000, // Override the timeout for this search only
|
|
66
65
|
extraParam: 'value' // Available in the template as ${extraParam}
|
|
67
66
|
});
|
|
68
67
|
|
|
@@ -174,22 +173,41 @@ protected override async transform(outputs: Record<string, any>) {
|
|
|
174
173
|
|
|
175
174
|
## 🧠 Advanced Concepts
|
|
176
175
|
|
|
177
|
-
###
|
|
176
|
+
### Auto-Pagination: How `limit` and `maxPages` Relate
|
|
178
177
|
|
|
179
|
-
`WebSearcher`
|
|
178
|
+
`WebSearcher` is designed to be result-oriented. When you call `search()`, you only specify how many results you want, and the searcher handles pagination automatically.
|
|
179
|
+
|
|
180
|
+
- **`limit`**: The total number of results you want.
|
|
181
|
+
- **`maxPages`**: The safety threshold. It caps how many pages (pagination cycles) the searcher may fetch to satisfy `limit`.
|
|
182
|
+
|
|
183
|
+
**Example of how they work together:**
|
|
184
|
+
If you request `{ limit: 50 }` but each page has only 5 results:
|
|
185
|
+
|
|
186
|
+
1. The searcher fetches page 1 (5 results).
|
|
187
|
+
2. It sees that `5 < 50`, so it automatically fetches page 2.
|
|
188
|
+
3. The loop continues until it has 50 results **or** reaches the `maxPages` limit (10 pages by default).
|
|
189
|
+
|
|
190
|
+
This mechanism prevents unbounded fetching when the "Next" selector breaks or the engine gets stuck in a loop, protecting your system resources.
|
|
180
191
|
|
|
181
192
|
### User-defined Transforms
|
|
182
193
|
|
|
183
194
|
Users can provide their own `transform` when calling `search`. It runs **after** the engine's built-in transform.
|
|
184
195
|
|
|
196
|
+
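This is very useful for **filtering out ads** or irrelevant content. If the transform filters out some results, auto-pagination **kicks in automatically** to fetch more pages, up to `maxPages`, so the final list can both meet the `limit` and contain only valid entries.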
|
|
197
|
+
|
|
185
198
|
```typescript
|
|
186
199
|
await google.search('test', {
|
|
187
|
-
|
|
200
|
+
limit: 20,
|
|
201
|
+
// Example: filter out sponsored results (ads) and keep only PDFs
|
|
202
|
+
transform: (results) => {
|
|
203
|
+
return results.filter(r => {
|
|
204
|
+
const isAd = r.isSponsored || r.url.includes('googleadservices.com');
|
|
205
|
+
return !isAd && r.url.endsWith('.pdf');
|
|
206
|
+
});
|
|
207
|
+
}
|
|
188
208
|
});
|
|
189
209
|
```
|
|
190
210
|
|
|
191
|
-
If the user filters out results, the auto-pagination logic will kick in to fetch more pages to meet the requested limit.
|
|
192
|
-
|
|
193
211
|
### Standardized Search Options
|
|
194
212
|
|
|
195
213
|
When calling `search()`, you can provide standardized options, which the search engine maps to its specific parameters:
|
package/README.md
CHANGED
|
@@ -42,27 +42,26 @@ console.log(results);
|
|
|
42
42
|
|
|
43
43
|
Since `WebSearcher` extends `FetchSession`, you can instantiate it to keep cookies and storage alive across multiple requests. This is useful for authenticated searches or avoiding bot detection by behaving like a human.
|
|
44
44
|
|
|
45
|
-
|
|
46
|
-
When creating a session, options are merged in the following order:
|
|
45
|
+
### 🛡️ Core Principle: Template is Law
|
|
47
46
|
|
|
48
|
-
|
|
49
|
-
2. **User Options**: Passed to the constructor (can fill missing defaults or override if allowed).
|
|
47
|
+
The `template` defined in the `WebSearcher` subclass acts as the authoritative "blueprint".
|
|
50
48
|
|
|
51
|
-
|
|
49
|
+
- **Template Priority**: If the template defines a property (e.g., `engine: 'browser'`, `headers`), that value is **locked** and cannot be overridden by user options. This ensures engine stability.
|
|
50
|
+
- **User Flexibility**: Properties **not** explicitly defined in the template (such as `proxy`, `timeoutMs`, or custom variables) can be freely set by the user in the constructor or `search()` method.
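The two rules above amount to a "fill only what is missing" merge. The bundle in this diff calls lodash's `defaultsDeep({}, template, options)` for this; the sketch below is a hand-rolled stand-in so that it runs without dependencies. `fillDefaults`, `mergeWithTemplate`, and the `customFlag` option are hypothetical names used only for illustration.

```typescript
type Plain = Record<string, unknown>;

function isPlainObject(value: unknown): value is Plain {
  return typeof value === 'object' && value !== null && !Array.isArray(value);
}

// Hypothetical helper: copies keys from `source` into `target` only where
// `target` has no value yet, recursing into nested plain objects.
function fillDefaults(target: Plain, source: Plain): Plain {
  for (const key of Object.keys(source)) {
    const current = target[key];
    const incoming = source[key];
    if (current === undefined) {
      // Deep-copy nested objects so the source is never mutated later.
      target[key] = isPlainObject(incoming) ? fillDefaults({}, incoming) : incoming;
    } else if (isPlainObject(current) && isPlainObject(incoming)) {
      fillDefaults(current, incoming);
    }
  }
  return target;
}

// The template is applied first, so its values are already in place
// when the user options arrive and can only fill the remaining gaps.
function mergeWithTemplate(template: Plain, userOptions: Plain): Plain {
  return fillDefaults(fillDefaults({}, template), userOptions);
}

const template = { engine: 'browser', browser: { headless: false } };

const merged = mergeWithTemplate(template, {
  engine: 'http',                                // ignored: locked by the template
  browser: { headless: true, customFlag: 50 },   // headless ignored, customFlag kept
  proxy: 'http://my-proxy:8080',                 // kept: the template does not define it
});

console.log(merged);
```

In this sketch `engine` and `browser.headless` keep the template's values, while `proxy` and the hypothetical `customFlag` come from the user options.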
|
|
52
51
|
|
|
53
52
|
```typescript
|
|
54
53
|
// Create a persistent session
|
|
55
54
|
const google = new GoogleSearcher({
|
|
56
|
-
headless: false, // Override
|
|
55
|
+
headless: false, // Override if not locked in template
|
|
57
56
|
proxy: 'http://my-proxy:8080',
|
|
58
|
-
timeoutMs: 30000 // Set a global timeout
|
|
57
|
+
timeoutMs: 30000 // Set a global timeout (valid if template doesn't define it)
|
|
59
58
|
});
|
|
60
59
|
|
|
61
60
|
try {
|
|
62
61
|
// First query
|
|
63
62
|
// You can also pass runtime options to override session defaults or inject variables
|
|
64
63
|
const results1 = await google.search('term A', {
|
|
65
|
-
timeoutMs: 60000, // Override timeout just for this search
|
|
64
|
+
timeoutMs: 60000, // Override session timeout just for this search
|
|
66
65
|
extraParam: 'value' // Can be used in template as ${extraParam}
|
|
67
66
|
});
|
|
68
67
|
|
|
@@ -172,24 +171,43 @@ protected override async transform(outputs: Record<string, any>) {
|
|
|
172
171
|
}
|
|
173
172
|
```
|
|
174
173
|
|
|
175
|
-
|
|
174
|
+
### 🧠 Advanced Concepts
|
|
176
175
|
|
|
177
|
-
### Auto-Pagination
|
|
176
|
+
### Auto-Pagination: `limit` vs `maxPages`
|
|
178
177
|
|
|
179
|
-
The `WebSearcher` is
|
|
178
|
+
The `WebSearcher` is designed to be result-oriented. When you call `search()`, you specify how many results you want, and the searcher handles the pagination logic.
|
|
179
|
+
|
|
180
|
+
- **`limit`**: Your target number of total results.
|
|
181
|
+
- **`maxPages`**: The safety threshold. It limits how many pages (fetch cycles) the searcher is allowed to navigate to satisfy your `limit`.
|
|
182
|
+
|
|
183
|
+
**Example Logic:**
|
|
184
|
+
If you request `{ limit: 50 }` but each page only has 5 results:
|
|
185
|
+
|
|
186
|
+
1. The searcher fetches page 1 (5 results).
|
|
187
|
+
2. It sees `5 < 50`, so it fetches page 2.
|
|
188
|
+
3. It continues until it has 50 results **OR** it reaches `maxPages` (default 10).
|
|
189
|
+
|
|
190
|
+
This prevents infinite loops if the "Next" button selector is broken or if the search engine keeps returning the same results.
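The interplay of `limit` and `maxPages` can be sketched as a bounded loop. This is a simplified, synchronous stand-in rather than the library's code: `collectResults`, `FetchPage`, and `fivePerPage` are hypothetical names, and the real searcher fetches pages asynchronously.

```typescript
type Result = { url: string };

// Hypothetical stand-in for one fetch cycle.
type FetchPage = (pageIndex: number) => Result[];

function collectResults(fetchPage: FetchPage, limit = 10, maxPages = 10) {
  const results: Result[] = [];
  let pagesFetched = 0;
  // Keep fetching until the target is met or the page budget is spent.
  while (results.length < limit && pagesFetched < maxPages) {
    const batch = fetchPage(pagesFetched);
    pagesFetched++;
    if (batch.length === 0) break; // an empty page ends the search early
    results.push(...batch);
  }
  // Never return more than the caller asked for.
  return { results: results.slice(0, limit), pagesFetched };
}

// Fake engine that always returns 5 results per page.
const fivePerPage: FetchPage = (pageIndex) => {
  const batch: Result[] = [];
  for (let i = 0; i < 5; i++) {
    batch.push({ url: `https://example.com/${pageIndex * 5 + i}` });
  }
  return batch;
};

const full = collectResults(fivePerPage, 50, 10);  // reaches the limit on the last allowed page
const capped = collectResults(fivePerPage, 50, 3); // page budget runs out first

console.log(full.results.length, full.pagesFetched);
console.log(capped.results.length, capped.pagesFetched);
```

With 5 results per page, a `limit` of 50 needs all 10 pages; lowering `maxPages` to 3 returns only the 15 results collected so far.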
|
|
180
191
|
|
|
181
192
|
### User-defined Transforms
|
|
182
193
|
|
|
183
194
|
Users can provide their own `transform` when calling `search`. This runs **after** the engine's built-in transform.
|
|
184
195
|
|
|
196
|
+
This is useful for **filtering out ads** or irrelevant content. If the transform removes results, auto-pagination fetches further pages, up to `maxPages`, so that the final list can still reach the requested `limit` with only valid entries. In the bundled `search()` loop, pagination also stops early when a page yields no results after the transform.
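To see why filtering triggers extra fetches, here is a small simulation over pre-baked pages. It is a sketch under assumptions: `searchWithTransform` and the sample pages are hypothetical, and the early exit on an empty filtered page mirrors the check visible in the bundled `search()` loop in this diff.

```typescript
type Result = { url: string; isSponsored?: boolean };
type Transform = (results: Result[]) => Result[];

// Hypothetical simulation: `pages` stands in for what each fetch cycle returns.
function searchWithTransform(
  pages: Result[][],
  transform: Transform,
  limit: number,
  maxPages = 10,
): Result[] {
  const kept: Result[] = [];
  for (const page of pages.slice(0, maxPages)) {
    const filtered = transform(page);
    if (filtered.length === 0) break; // nothing survived on this page: stop paginating
    kept.push(...filtered);
    if (kept.length >= limit) break;  // target reached
  }
  return kept.slice(0, limit);
}

const onlyPdfs: Transform = (results) =>
  results.filter((r) => !r.isSponsored && r.url.endsWith('.pdf'));

const pages: Result[][] = [
  [{ url: 'https://a.example/1.pdf' }, { url: 'https://ads.example/x', isSponsored: true }],
  [{ url: 'https://b.example/2.pdf' }, { url: 'https://b.example/page.html' }],
  [{ url: 'https://c.example/3.pdf' }],
];

// Each page contributes one PDF, so three pages are needed for three results.
const found = searchWithTransform(pages, onlyPdfs, 3);

// If the first page is filtered down to nothing, later pages are never fetched.
const stalled = searchWithTransform(
  [[{ url: 'https://ads.example/x', isSponsored: true }], [{ url: 'https://c.example/3.pdf' }]],
  onlyPdfs,
  3,
);

console.log(found.map((r) => r.url), stalled.length);
```

The second call shows a practical consequence: a very strict filter can end the search before `limit` or `maxPages` is reached.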
|
|
197
|
+
|
|
185
198
|
```typescript
|
|
186
199
|
await google.search('test', {
|
|
187
|
-
|
|
200
|
+
limit: 20,
|
|
201
|
+
// Example: Filter out sponsored results and only keep PDFs
|
|
202
|
+
transform: (results) => {
|
|
203
|
+
return results.filter(r => {
|
|
204
|
+
const isAd = r.isSponsored || r.url.includes('googleadservices.com');
|
|
205
|
+
return !isAd && r.url.endsWith('.pdf');
|
|
206
|
+
});
|
|
207
|
+
}
|
|
188
208
|
});
|
|
189
209
|
```
|
|
190
210
|
|
|
191
|
-
If the user filters out results, the auto-pagination logic will kick in to fetch more pages to meet the requested limit.
|
|
192
|
-
|
|
193
211
|
### Standardized Search Options
|
|
194
212
|
|
|
195
213
|
When calling `search()`, you can provide standardized options that the search engine will map to specific parameters:
|
package/dist/index.d.mts
CHANGED
|
@@ -15,7 +15,17 @@ interface StandardSearchResult {
|
|
|
15
15
|
snippet?: string;
|
|
16
16
|
/** An optional image URL associated with the result. */
|
|
17
17
|
image?: string;
|
|
18
|
-
/**
|
|
18
|
+
/** The date the result was published or last updated. */
|
|
19
|
+
date?: string | Date;
|
|
20
|
+
/** The author or source name of the result. */
|
|
21
|
+
author?: string;
|
|
22
|
+
/** The favicon URL of the source website. */
|
|
23
|
+
favicon?: string;
|
|
24
|
+
/** The rank or position of the result (usually 1-indexed). */
|
|
25
|
+
rank?: number;
|
|
26
|
+
/** The source website name (e.g., 'GitHub', 'StackOverflow'). */
|
|
27
|
+
source?: string;
|
|
28
|
+
/** Allows for engine-specific extra fields (e.g., siteIcon, category). */
|
|
19
29
|
[key: string]: any;
|
|
20
30
|
}
|
|
21
31
|
/**
|
|
@@ -52,6 +62,16 @@ interface PaginationConfig {
|
|
|
52
62
|
* Required if type is 'click-next'.
|
|
53
63
|
*/
|
|
54
64
|
nextButtonSelector?: string;
|
|
65
|
+
/**
|
|
66
|
+
* The safety threshold for the maximum number of pages to fetch automatically
|
|
67
|
+
* in a single search call.
|
|
68
|
+
*
|
|
69
|
+
* Even if the requested `limit` of results hasn't been reached, the searcher
|
|
70
|
+
* will stop after this many pages to prevent infinite loops or excessive API usage.
|
|
71
|
+
*
|
|
72
|
+
* @default 10
|
|
73
|
+
*/
|
|
74
|
+
maxPages?: number;
|
|
55
75
|
}
|
|
56
76
|
/**
|
|
57
77
|
* Context object passed to the transform function.
|
|
@@ -80,6 +100,15 @@ type SafeSearchLevel = 'off' | 'moderate' | 'strict';
|
|
|
80
100
|
interface SearchOptions {
|
|
81
101
|
/** The maximum number of results to retrieve. */
|
|
82
102
|
limit?: number;
|
|
103
|
+
/**
|
|
104
|
+
* The maximum number of pages (fetch cycles) allowed to reach the requested `limit`.
|
|
105
|
+
*
|
|
106
|
+
* This is a safety guard. If the `limit` is high but each page has few results,
|
|
107
|
+
* the searcher will stop once this page count is reached.
|
|
108
|
+
*
|
|
109
|
+
* If not provided, it defaults to the value in `PaginationConfig` or 10.
|
|
110
|
+
*/
|
|
111
|
+
maxPages?: number;
|
|
83
112
|
/**
|
|
84
113
|
* Date range for the search results.
|
|
85
114
|
* Default: 'all'
|
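The fallback described in the `SearchOptions.maxPages` comment above (per-call value, then `PaginationConfig`, then 10) matches the expression in the bundled code, `options.maxPages || pagination?.maxPages || 10`. A minimal sketch follows; `resolveMaxPages` and `DEFAULT_MAX_PAGES` are hypothetical names, not part of the package's API.

```typescript
const DEFAULT_MAX_PAGES = 10;

// Hypothetical helper mirroring the fallback chain in the bundle.
// Because it uses `||`, a value of 0 is treated as "not provided".
function resolveMaxPages(
  searchOptions: { maxPages?: number },
  pagination?: { maxPages?: number },
): number {
  return searchOptions.maxPages || pagination?.maxPages || DEFAULT_MAX_PAGES;
}

console.log(resolveMaxPages({ maxPages: 3 }, { maxPages: 5 })); // per-call value wins
console.log(resolveMaxPages({}, { maxPages: 5 }));              // engine's pagination config
console.log(resolveMaxPages({}));                               // built-in default
```

One consequence of the `||` chain is that `maxPages: 0` cannot be used to disable pagination; it falls through to the next value.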
package/dist/index.d.ts
CHANGED
|
@@ -15,7 +15,17 @@ interface StandardSearchResult {
|
|
|
15
15
|
snippet?: string;
|
|
16
16
|
/** An optional image URL associated with the result. */
|
|
17
17
|
image?: string;
|
|
18
|
-
/**
|
|
18
|
+
/** The date the result was published or last updated. */
|
|
19
|
+
date?: string | Date;
|
|
20
|
+
/** The author or source name of the result. */
|
|
21
|
+
author?: string;
|
|
22
|
+
/** The favicon URL of the source website. */
|
|
23
|
+
favicon?: string;
|
|
24
|
+
/** The rank or position of the result (usually 1-indexed). */
|
|
25
|
+
rank?: number;
|
|
26
|
+
/** The source website name (e.g., 'GitHub', 'StackOverflow'). */
|
|
27
|
+
source?: string;
|
|
28
|
+
/** Allows for engine-specific extra fields (e.g., siteIcon, category). */
|
|
19
29
|
[key: string]: any;
|
|
20
30
|
}
|
|
21
31
|
/**
|
|
@@ -52,6 +62,16 @@ interface PaginationConfig {
|
|
|
52
62
|
* Required if type is 'click-next'.
|
|
53
63
|
*/
|
|
54
64
|
nextButtonSelector?: string;
|
|
65
|
+
/**
|
|
66
|
+
* The safety threshold for the maximum number of pages to fetch automatically
|
|
67
|
+
* in a single search call.
|
|
68
|
+
*
|
|
69
|
+
* Even if the requested `limit` of results hasn't been reached, the searcher
|
|
70
|
+
* will stop after this many pages to prevent infinite loops or excessive API usage.
|
|
71
|
+
*
|
|
72
|
+
* @default 10
|
|
73
|
+
*/
|
|
74
|
+
maxPages?: number;
|
|
55
75
|
}
|
|
56
76
|
/**
|
|
57
77
|
* Context object passed to the transform function.
|
|
@@ -80,6 +100,15 @@ type SafeSearchLevel = 'off' | 'moderate' | 'strict';
|
|
|
80
100
|
interface SearchOptions {
|
|
81
101
|
/** The maximum number of results to retrieve. */
|
|
82
102
|
limit?: number;
|
|
103
|
+
/**
|
|
104
|
+
* The maximum number of pages (fetch cycles) allowed to reach the requested `limit`.
|
|
105
|
+
*
|
|
106
|
+
* This is a safety guard. If the `limit` is high but each page has few results,
|
|
107
|
+
* the searcher will stop once this page count is reached.
|
|
108
|
+
*
|
|
109
|
+
* If not provided, it defaults to the value in `PaginationConfig` or 10.
|
|
110
|
+
*/
|
|
111
|
+
maxPages?: number;
|
|
83
112
|
/**
|
|
84
113
|
* Date range for the search results.
|
|
85
114
|
* Default: 'all'
|
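Since the new `date` field is typed `string | Date`, consumers may want to normalize it before sorting or comparing. The sketch below uses a local `ResultLike` interface that copies the optional fields shown in the typings above; treating `url` and `title` as required is an assumption, and `publishedAt` is a hypothetical helper.

```typescript
// Local approximation of StandardSearchResult for illustration only.
interface ResultLike {
  url: string;   // assumed required
  title: string; // assumed required
  snippet?: string;
  date?: string | Date;
  author?: string;
  favicon?: string;
  rank?: number;
  source?: string;
  [key: string]: any;
}

// Hypothetical helper: returns a valid Date, or undefined when the
// field is missing or cannot be parsed.
function publishedAt(result: ResultLike): Date | undefined {
  if (result.date === undefined) return undefined;
  const parsed = result.date instanceof Date ? result.date : new Date(result.date);
  return isNaN(parsed.getTime()) ? undefined : parsed;
}

const sample: ResultLike = {
  url: 'https://github.com/isdk/web-searcher.js',
  title: 'web-searcher.js',
  date: '2024-01-15T00:00:00Z',
  rank: 1,
  source: 'GitHub',
};

console.log(publishedAt(sample)?.toISOString());
console.log(publishedAt({ url: 'https://example.com', title: 'x', date: 'not a date' }));
```

Returning `undefined` for unparseable strings keeps engine-specific date formats from leaking `Invalid Date` objects into downstream code.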
package/dist/index.js
CHANGED
|
@@ -1 +1 @@
|
|
|
1
|
-
"use strict";var t,e=Object.defineProperty,r=Object.getOwnPropertyDescriptor,s=Object.getOwnPropertyNames,a=Object.prototype.hasOwnProperty,i={};((t,r)=>{for(var s in r)e(t,s,{get:r[s],enumerable:!0})})(i,{GoogleSearcher:()=>
|
|
1
|
+
"use strict";var t,e=Object.defineProperty,r=Object.getOwnPropertyDescriptor,s=Object.getOwnPropertyNames,a=Object.prototype.hasOwnProperty,i={};((t,r)=>{for(var s in r)e(t,s,{get:r[s],enumerable:!0})})(i,{GoogleSearcher:()=>f,WebSearcher:()=>h}),module.exports=(t=i,((t,i,n,o)=>{if(i&&"object"==typeof i||"function"==typeof i)for(let c of s(i))a.call(t,c)||c===n||e(t,c,{get:()=>i[c],enumerable:!(o=r(i,c))||o.enumerable});return t})(e({},"__esModule",{value:!0}),t));var n=require("@isdk/web-fetcher"),o=require("custom-factory"),c=require("lodash-es");function l(t,e){if("string"==typeof t)return t.replace(/\$\{(.*?)\}/g,(t,r)=>{const s=e[r.trim()];return void 0!==s?String(s):""});if(Array.isArray(t))return t.map(t=>l(t,e));if((0,c.isPlainObject)(t)){const r={};for(const s in t)Object.prototype.hasOwnProperty.call(t,s)&&(r[s]=l(t[s],e));return r}return t}var u=require("lodash-es"),h=class extends n.FetchSession{static async search(t,e,r={}){const s=this.createObject(t,r);if(!s)throw new Error(`Search engine not found: ${t}`);try{return await s.search(e,r)}finally{await s.dispose()}}get pagination(){}createContext(t=this.options){const e=this.template,r=(0,u.defaultsDeep)({},e,t);return e.engine&&"auto"!==e.engine||!t.engine||(r.engine=t.engine),super.createContext(r)}async search(t,e={}){const r=e.limit||10,s=[];let a=0;const i=this.pagination?.startValue??0,n=this.pagination?.increment??1,o=e.maxPages||this.pagination?.maxPages||10;for(;s.length<r;){const c=this.formatOptions(e),h=i+a*n,f={...e,...c,query:t,page:a+i,offset:h,limit:r},m=l(this.template,f),d=(0,u.defaultsDeep)({},m,e),g=[];if(0===a||"url-param"===this.pagination?.type?d.url&&g.push({id:"goto",params:{url:d.url}}):"click-next"===this.pagination?.type&&this.pagination.nextButtonSelector&&(g.push({id:"click",params:{selector:this.pagination.nextButtonSelector}}),g.push({id:"waitFor",params:{networkIdle:!0,ms:500}})),d.actions){const 
t=d.actions.filter(t=>!(g.length>0&&"goto"===g[0].id&&"goto"===t.id));g.push(...t)}d.engine&&this.context.engine!==d.engine&&d.engine;const{outputs:p}=await this.executeAll(g),w={query:t,page:a,limit:e.limit};let y=[];if(y=await this.transform(p,w),e.transform&&(y=await e.transform(y,w)),!y||0===y.length)break;if(s.push(...y),s.length>=r||!this.pagination)break;if(a++,a>=o)break}return s.slice(0,r)}async transform(t,e){return t.results||[]}formatOptions(t){return{...t}}};h._isFactory=!1,(0,o.addBaseFactoryAbility)(h),h.prototype.name="Searcher";var f=class extends h{get template(){return{engine:"browser",browser:{headless:!1},url:"https://www.google.com/search?q=${query}&start=${offset}&tbs=${tbs}&tbm=${tbm}&gl=${gl}&hl=${hl}&safe=${safe}",actions:[{id:"extract",storeAs:"results",params:{type:"array",selector:"#main #search",items:{url:{selector:"a:has(h3)",attribute:"href",required:!0},title:{selector:"a:has(h3) h3",required:!0,mode:"innerText"},snippet:{selector:"div[style*='-webkit-line-clamp']",type:"html"}}}}]}}get pagination(){return{type:"url-param",paramName:"start",startValue:0,increment:10}}formatOptions(t){const e={};if(t.timeRange)if("string"==typeof t.timeRange){const r={day:"qdr:d",week:"qdr:w",month:"qdr:m",year:"qdr:y"};r[t.timeRange]&&(e.tbs=r[t.timeRange])}else{const r=new Date(t.timeRange.from),s=t.timeRange.to?new Date(t.timeRange.to):new Date;if(!isNaN(r.getTime())&&!isNaN(s.getTime())){const t=t=>`${t.getMonth()+1}/${t.getDate()}/${t.getFullYear()}`;e.tbs=`cdr:1,cd_min:${t(r)},cd_max:${t(s)}`}}if(t.category){const r={images:"isch",videos:"vid",news:"nws"};r[t.category]&&(e.tbm=r[t.category])}return t.region&&(e.gl=t.region),t.language&&(e.hl=t.language),t.safeSearch&&("strict"===t.safeSearch?e.safe="active":"off"===t.safeSearch&&(e.safe="images")),e}async transform(t){const e=t.results||[];return Array.isArray(e)?e.map(t=>{if(t.url&&t.url.startsWith("/url?q="))try{const e=new 
URL(t.url,"https://www.google.com").searchParams.get("q");e&&(t.url=e)}catch(t){}return t}):[]}};f.alias=["google"];
|
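The minified bundle above contains a small helper that substitutes `${name}` placeholders throughout the template. A readable approximation is sketched below; `interpolate` is a hypothetical name, and this version treats any non-array object as a plain object, whereas the bundle checks with lodash's `isPlainObject`.

```typescript
type Vars = Record<string, unknown>;

// Recursively replaces ${name} placeholders in strings, arrays, and objects.
// Unknown variables become empty strings, as in the bundled helper.
function interpolate(value: unknown, vars: Vars): unknown {
  if (typeof value === 'string') {
    return value.replace(/\$\{(.*?)\}/g, (_match, name: string) => {
      const replacement = vars[name.trim()];
      return replacement !== undefined ? String(replacement) : '';
    });
  }
  if (Array.isArray(value)) {
    return value.map((item) => interpolate(item, vars));
  }
  if (typeof value === 'object' && value !== null) {
    const source = value as Record<string, unknown>;
    const out: Record<string, unknown> = {};
    for (const key of Object.keys(source)) {
      out[key] = interpolate(source[key], vars);
    }
    return out;
  }
  return value; // numbers, booleans, null, undefined pass through unchanged
}

const url = interpolate(
  'https://www.google.com/search?q=${query}&start=${offset}&tbs=${tbs}',
  { query: 'cats', offset: 10 }, // tbs is intentionally missing
);

console.log(url);
```

Note that values are inserted with `String(...)` and no URL encoding, so in this sketch a query containing `&` or spaces would need to be encoded by the caller.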
package/dist/index.mjs
CHANGED
|
@@ -1 +1 @@
|
|
|
1
|
-
import{FetchSession as t}from"@isdk/web-fetcher";import{addBaseFactoryAbility as r}from"custom-factory";import{isPlainObject as e}from"lodash-es";function s(t,r){if("string"==typeof t)return t.replace(/\$\{(.*?)\}/g,(t,e)=>{const s=r[e.trim()];return void 0!==s?String(s):""});if(Array.isArray(t))return t.map(t=>s(t,r));if(e(t)){const e={};for(const a in t)Object.prototype.hasOwnProperty.call(t,a)&&(e[a]=s(t[a],r));return e}return t}import{defaultsDeep as a}from"lodash-es";var i=class extends t{static async search(t,r,e={}){const s=this.createObject(t,e);if(!s)throw new Error(`Search engine not found: ${t}`);try{return await s.search(r,e)}finally{await s.dispose()}}get pagination(){}createContext(t=this.options){const r=this.template,e=a({},r,t);return r.engine&&"auto"!==r.engine||!t.engine||(e.engine=t.engine),super.createContext(e)}async search(t,r={}){const e=r.limit||10,i=[];let o=0;const n=this.pagination?.startValue??0,c=this.pagination?.increment??1;for(;i.length<e;){const l=this.formatOptions(r),m=n+o*c,
|
|
1
|
+
import{FetchSession as t}from"@isdk/web-fetcher";import{addBaseFactoryAbility as r}from"custom-factory";import{isPlainObject as e}from"lodash-es";function s(t,r){if("string"==typeof t)return t.replace(/\$\{(.*?)\}/g,(t,e)=>{const s=r[e.trim()];return void 0!==s?String(s):""});if(Array.isArray(t))return t.map(t=>s(t,r));if(e(t)){const e={};for(const a in t)Object.prototype.hasOwnProperty.call(t,a)&&(e[a]=s(t[a],r));return e}return t}import{defaultsDeep as a}from"lodash-es";var i=class extends t{static async search(t,r,e={}){const s=this.createObject(t,e);if(!s)throw new Error(`Search engine not found: ${t}`);try{return await s.search(r,e)}finally{await s.dispose()}}get pagination(){}createContext(t=this.options){const r=this.template,e=a({},r,t);return r.engine&&"auto"!==r.engine||!t.engine||(e.engine=t.engine),super.createContext(e)}async search(t,r={}){const e=r.limit||10,i=[];let o=0;const n=this.pagination?.startValue??0,c=this.pagination?.increment??1,h=r.maxPages||this.pagination?.maxPages||10;for(;i.length<e;){const l=this.formatOptions(r),m=n+o*c,f={...r,...l,query:t,page:o+n,offset:m,limit:e},u=s(this.template,f),p=a({},u,r),d=[];if(0===o||"url-param"===this.pagination?.type?p.url&&d.push({id:"goto",params:{url:p.url}}):"click-next"===this.pagination?.type&&this.pagination.nextButtonSelector&&(d.push({id:"click",params:{selector:this.pagination.nextButtonSelector}}),d.push({id:"waitFor",params:{networkIdle:!0,ms:500}})),p.actions){const t=p.actions.filter(t=>!(d.length>0&&"goto"===d[0].id&&"goto"===t.id));d.push(...t)}p.engine&&this.context.engine!==p.engine&&p.engine;const{outputs:w}=await this.executeAll(d),g={query:t,page:o,limit:r.limit};let y=[];if(y=await this.transform(w,g),r.transform&&(y=await r.transform(y,g)),!y||0===y.length)break;if(i.push(...y),i.length>=e||!this.pagination)break;if(o++,o>=h)break}return i.slice(0,e)}async transform(t,r){return t.results||[]}formatOptions(t){return{...t}}};i._isFactory=!1,r(i),i.prototype.name="Searcher";var 
o=class extends i{get template(){return{engine:"browser",browser:{headless:!1},url:"https://www.google.com/search?q=${query}&start=${offset}&tbs=${tbs}&tbm=${tbm}&gl=${gl}&hl=${hl}&safe=${safe}",actions:[{id:"extract",storeAs:"results",params:{type:"array",selector:"#main #search",items:{url:{selector:"a:has(h3)",attribute:"href",required:!0},title:{selector:"a:has(h3) h3",required:!0,mode:"innerText"},snippet:{selector:"div[style*='-webkit-line-clamp']",type:"html"}}}}]}}get pagination(){return{type:"url-param",paramName:"start",startValue:0,increment:10}}formatOptions(t){const r={};if(t.timeRange)if("string"==typeof t.timeRange){const e={day:"qdr:d",week:"qdr:w",month:"qdr:m",year:"qdr:y"};e[t.timeRange]&&(r.tbs=e[t.timeRange])}else{const e=new Date(t.timeRange.from),s=t.timeRange.to?new Date(t.timeRange.to):new Date;if(!isNaN(e.getTime())&&!isNaN(s.getTime())){const t=t=>`${t.getMonth()+1}/${t.getDate()}/${t.getFullYear()}`;r.tbs=`cdr:1,cd_min:${t(e)},cd_max:${t(s)}`}}if(t.category){const e={images:"isch",videos:"vid",news:"nws"};e[t.category]&&(r.tbm=e[t.category])}return t.region&&(r.gl=t.region),t.language&&(r.hl=t.language),t.safeSearch&&("strict"===t.safeSearch?r.safe="active":"off"===t.safeSearch&&(r.safe="images")),r}async transform(t){const r=t.results||[];return Array.isArray(r)?r.map(t=>{if(t.url&&t.url.startsWith("/url?q="))try{const r=new URL(t.url,"https://www.google.com").searchParams.get("q");r&&(t.url=r)}catch(t){}return t}):[]}};o.alias=["google"];export{o as GoogleSearcher,i as WebSearcher};
|
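The `formatOptions` method in the bundled `GoogleSearcher` maps the standardized `timeRange` option to Google's `tbs` URL parameter. The sketch below restates that mapping in readable form; `toTbs`, `CustomRange`, and the `Date | string` field types are assumptions for illustration.

```typescript
type TimeRangePreset = 'all' | 'day' | 'week' | 'month' | 'year';

// Field types are assumed; the bundle only shows `from` and an optional `to`.
interface CustomRange {
  from: Date | string;
  to?: Date | string;
}

// 'all' has no entry, so it produces no tbs parameter.
const PRESET_TBS: Partial<Record<TimeRangePreset, string>> = {
  day: 'qdr:d',
  week: 'qdr:w',
  month: 'qdr:m',
  year: 'qdr:y',
};

const toDate = (value: Date | string): Date =>
  value instanceof Date ? value : new Date(value);

// M/D/YYYY in local time, matching the formatter in the bundle.
const formatDate = (d: Date): string =>
  `${d.getMonth() + 1}/${d.getDate()}/${d.getFullYear()}`;

function toTbs(timeRange: TimeRangePreset | CustomRange): string | undefined {
  if (typeof timeRange === 'string') return PRESET_TBS[timeRange];
  const from = toDate(timeRange.from);
  const to = timeRange.to ? toDate(timeRange.to) : new Date(); // open-ended ranges end "now"
  if (isNaN(from.getTime()) || isNaN(to.getTime())) return undefined;
  return `cdr:1,cd_min:${formatDate(from)},cd_max:${formatDate(to)}`;
}

console.log(toTbs('week'));
console.log(toTbs({ from: new Date(2024, 0, 15), to: new Date(2024, 1, 1) }));
```

Because the formatter reads local-time components, a range given as UTC date strings may shift by a day depending on the machine's time zone.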
package/docs/README.md
CHANGED
|
@@ -46,27 +46,26 @@ console.log(results);
|
|
|
46
46
|
|
|
47
47
|
Since `WebSearcher` extends `FetchSession`, you can instantiate it to keep cookies and storage alive across multiple requests. This is useful for authenticated searches or avoiding bot detection by behaving like a human.
|
|
48
48
|
|
|
49
|
-
|
|
50
|
-
When creating a session, options are merged in the following order:
|
|
49
|
+
### 🛡️ Core Principle: Template is Law
|
|
51
50
|
|
|
52
|
-
|
|
53
|
-
2. **User Options**: Passed to the constructor (can fill missing defaults or override if allowed).
|
|
51
|
+
The `template` defined in the `WebSearcher` subclass acts as the authoritative "blueprint".
|
|
54
52
|
|
|
55
|
-
|
|
53
|
+
- **Template Priority**: If the template defines a property (e.g., `engine: 'browser'`, `headers`), that value is **locked** and cannot be overridden by user options. This ensures engine stability.
|
|
54
|
+
- **User Flexibility**: Properties **not** explicitly defined in the template (such as `proxy`, `timeoutMs`, or custom variables) can be freely set by the user in the constructor or `search()` method.
|
|
56
55
|
|
|
57
56
|
```typescript
|
|
58
57
|
// Create a persistent session
|
|
59
58
|
const google = new GoogleSearcher({
|
|
60
|
-
headless: false, // Override
|
|
59
|
+
headless: false, // Override if not locked in template
|
|
61
60
|
proxy: 'http://my-proxy:8080',
|
|
62
|
-
timeoutMs: 30000 // Set a global timeout
|
|
61
|
+
timeoutMs: 30000 // Set a global timeout (valid if template doesn't define it)
|
|
63
62
|
});
|
|
64
63
|
|
|
65
64
|
try {
|
|
66
65
|
// First query
|
|
67
66
|
// You can also pass runtime options to override session defaults or inject variables
|
|
68
67
|
const results1 = await google.search('term A', {
|
|
69
|
-
timeoutMs: 60000, // Override timeout just for this search
|
|
68
|
+
timeoutMs: 60000, // Override session timeout just for this search
|
|
70
69
|
extraParam: 'value' // Can be used in template as ${extraParam}
|
|
71
70
|
});
|
|
72
71
|
|
|
@@ -176,24 +175,43 @@ protected override async transform(outputs: Record<string, any>) {
|
|
|
176
175
|
}
|
|
177
176
|
```
|
|
178
177
|
|
|
179
|
-
|
|
178
|
+
### 🧠 Advanced Concepts
|
|
180
179
|
|
|
181
|
-
### Auto-Pagination
|
|
180
|
+
### Auto-Pagination: `limit` vs `maxPages`
|
|
182
181
|
|
|
183
|
-
The `WebSearcher` is
|
|
182
|
+
The `WebSearcher` is designed to be result-oriented. When you call `search()`, you specify how many results you want, and the searcher handles the pagination logic.
|
|
183
|
+
|
|
184
|
+
- **`limit`**: Your target number of total results.
|
|
185
|
+
- **`maxPages`**: The safety threshold. It limits how many pages (fetch cycles) the searcher is allowed to navigate to satisfy your `limit`.
|
|
186
|
+
|
|
187
|
+
**Example Logic:**
|
|
188
|
+
If you request `{ limit: 50 }` but each page only has 5 results:
|
|
189
|
+
|
|
190
|
+
1. The searcher fetches page 1 (5 results).
|
|
191
|
+
2. It sees `5 < 50`, so it fetches page 2.
|
|
192
|
+
3. It continues until it has 50 results **OR** it reaches `maxPages` (default 10).
|
|
193
|
+
|
|
194
|
+
This prevents infinite loops if the "Next" button selector is broken or if the search engine keeps returning the same results.
|
|
184
195
|
|
|
185
196
|
### User-defined Transforms
|
|
186
197
|
|
|
187
198
|
Users can provide their own `transform` when calling `search`. This runs **after** the engine's built-in transform.
|
|
188
199
|
|
|
200
|
+
This is useful for **filtering out ads** or irrelevant content. If the transform removes results, auto-pagination fetches further pages, up to `maxPages`, so that the final list can still reach the requested `limit` with only valid entries. In the bundled `search()` loop, pagination also stops early when a page yields no results after the transform.
|
|
201
|
+
|
|
189
202
|
```typescript
|
|
190
203
|
await google.search('test', {
|
|
191
|
-
|
|
204
|
+
limit: 20,
|
|
205
|
+
// Example: Filter out sponsored results and only keep PDFs
|
|
206
|
+
transform: (results) => {
|
|
207
|
+
return results.filter(r => {
|
|
208
|
+
const isAd = r.isSponsored || r.url.includes('googleadservices.com');
|
|
209
|
+
return !isAd && r.url.endsWith('.pdf');
|
|
210
|
+
});
|
|
211
|
+
}
|
|
192
212
|
});
|
|
193
213
|
```
|
|
194
214
|
|
|
195
|
-
If the user filters out results, the auto-pagination logic will kick in to fetch more pages to meet the requested limit.
|
|
196
|
-
|
|
197
215
|
### Standardized Search Options
|
|
198
216
|
|
|
199
217
|
When calling `search()`, you can provide standardized options that the search engine will map to specific parameters:
|
package/docs/classes/GoogleSearcher.md
CHANGED
|
@@ -6,7 +6,7 @@
|
|
|
6
6
|
|
|
7
7
|
# Class: GoogleSearcher
|
|
8
8
|
|
|
9
|
-
Defined in: [web-searcher/src/engines/google.ts:24](https://github.com/isdk/web-searcher.js/blob/
|
|
9
|
+
Defined in: [web-searcher/src/engines/google.ts:24](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/engines/google.ts#L24)
|
|
10
10
|
|
|
11
11
|
A sample implementation of a Google Search scraper.
|
|
12
12
|
|
|
@@ -37,7 +37,7 @@ Use this class to understand:
|
|
|
37
37
|
|
|
38
38
|
> **new GoogleSearcher**(`options?`): `GoogleSearcher`
|
|
39
39
|
|
|
40
|
-
Defined in: web-fetcher/dist/index.d.ts:
|
|
40
|
+
Defined in: web-fetcher/dist/index.d.ts:2275
|
|
41
41
|
|
|
42
42
|
Creates a new FetchSession.
|
|
43
43
|
|
|
@@ -63,7 +63,7 @@ Configuration options for the fetcher.
|
|
|
63
63
|
|
|
64
64
|
> `protected` **closed**: `boolean`
|
|
65
65
|
|
|
66
|
-
Defined in: web-fetcher/dist/index.d.ts:
|
|
66
|
+
Defined in: web-fetcher/dist/index.d.ts:2269
|
|
67
67
|
|
|
68
68
|
#### Inherited from
|
|
69
69
|
|
|
@@ -75,7 +75,7 @@ Defined in: web-fetcher/dist/index.d.ts:2186
|
|
|
75
75
|
|
|
76
76
|
> `readonly` **context**: `FetchContext`
|
|
77
77
|
|
|
78
|
-
Defined in: web-fetcher/dist/index.d.ts:
|
|
78
|
+
Defined in: web-fetcher/dist/index.d.ts:2268
|
|
79
79
|
|
|
80
80
|
The execution context for this session, containing configurations, event bus, and shared state.
|
|
81
81
|
|
|
@@ -89,7 +89,7 @@ The execution context for this session, containing configurations, event bus, an
|
|
|
89
89
|
|
|
90
90
|
> `readonly` **id**: `string`
|
|
91
91
|
|
|
92
|
-
Defined in: web-fetcher/dist/index.d.ts:
|
|
92
|
+
Defined in: web-fetcher/dist/index.d.ts:2264
|
|
93
93
|
|
|
94
94
|
Unique identifier for the session.
|
|
95
95
|
|
|
@@ -103,7 +103,7 @@ Unique identifier for the session.
|
|
|
103
103
|
|
|
104
104
|
> `protected` **options**: `FetcherOptions`
|
|
105
105
|
|
|
106
|
-
Defined in: web-fetcher/dist/index.d.ts:
|
|
106
|
+
Defined in: web-fetcher/dist/index.d.ts:2260
|
|
107
107
|
|
|
108
108
|
#### Inherited from
|
|
109
109
|
|
|
@@ -115,7 +115,7 @@ Defined in: web-fetcher/dist/index.d.ts:2177
|
|
|
115
115
|
|
|
116
116
|
> `static` **\_isFactory**: `boolean` = `false`
|
|
117
117
|
|
|
118
|
-
Defined in: [web-searcher/src/searcher.ts:33](https://github.com/isdk/web-searcher.js/blob/
|
|
118
|
+
Defined in: [web-searcher/src/searcher.ts:33](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/searcher.ts#L33)
|
|
119
119
|
|
|
120
120
|
#### Inherited from
|
|
121
121
|
|
|
@@ -127,7 +127,7 @@ Defined in: [web-searcher/src/searcher.ts:33](https://github.com/isdk/web-search
|
|
|
127
127
|
|
|
128
128
|
> `static` **alias**: `string`[]
|
|
129
129
|
|
|
130
|
-
Defined in: [web-searcher/src/engines/google.ts:25](https://github.com/isdk/web-searcher.js/blob/
|
|
130
|
+
Defined in: [web-searcher/src/engines/google.ts:25](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/engines/google.ts#L25)
|
|
131
131
|
|
|
132
132
|
Engine alias(es). Can be a single string or an array of strings.
|
|
133
133
|
Useful for registering shorthand names (e.g., 'g' for 'Google').
|
|
@@ -142,7 +142,7 @@ Useful for registering shorthand names (e.g., 'g' for 'Google').
|
|
|
142
142
|
|
|
143
143
|
> `static` **createObject**: (`name`, ...`args`) => [`WebSearcher`](WebSearcher.md)
|
|
144
144
|
|
|
145
|
-
Defined in: [web-searcher/src/searcher.ts:78](https://github.com/isdk/web-searcher.js/blob/
|
|
145
|
+
Defined in: [web-searcher/src/searcher.ts:78](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/searcher.ts#L78)
|
|
146
146
|
|
|
147
147
|
Creates an instance of the registered search engine.
|
|
148
148
|
|
|
@@ -176,7 +176,7 @@ An instance of the search engine.
|
|
|
176
176
|
|
|
177
177
|
> `static` **forEach**: (`cb`) => `void`
|
|
178
178
|
|
|
179
|
-
Defined in: [web-searcher/src/searcher.ts:85](https://github.com/isdk/web-searcher.js/blob/
|
|
179
|
+
Defined in: [web-searcher/src/searcher.ts:85](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/searcher.ts#L85)
|
|
180
180
|
|
|
181
181
|
Iterates over all registered engines.
|
|
182
182
|
|
|
@@ -202,7 +202,7 @@ Callback function to invoke for each registered engine.
|
|
|
202
202
|
|
|
203
203
|
> `static` **get**: (`name`) => *typeof* [`WebSearcher`](WebSearcher.md)
|
|
204
204
|
|
|
205
|
-
Defined in: [web-searcher/src/searcher.ts:69](https://github.com/isdk/web-searcher.js/blob/
|
|
205
|
+
Defined in: [web-searcher/src/searcher.ts:69](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/searcher.ts#L69)
|
|
206
206
|
|
|
207
207
|
Retrieves a registered search engine class by name.
|
|
208
208
|
|
|
@@ -230,7 +230,7 @@ The search engine class constructor.
|
|
|
230
230
|
|
|
231
231
|
> `static` `optional` **name**: `string`
|
|
232
232
|
|
|
233
|
-
Defined in: [web-searcher/src/searcher.ts:40](https://github.com/isdk/web-searcher.js/blob/
|
|
233
|
+
Defined in: [web-searcher/src/searcher.ts:40](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/searcher.ts#L40)
|
|
234
234
|
|
|
235
235
|
Custom engine name. If not provided, it is derived from the class name.
|
|
236
236
|
For example, `GoogleSearcher` becomes `Google`.
|
|
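The name derivation described above can be sketched as follows. This is a minimal illustration, assuming the rule is simply "strip a trailing `Searcher` suffix"; `deriveEngineName` is a hypothetical helper and not part of the package, whose actual derivation logic is not shown in this diff.

```ts
// Hypothetical helper: derive a registry name from a class name by
// removing a trailing "Searcher" suffix (assumed rule, for illustration).
function deriveEngineName(className: string): string {
  const derived = className.replace(/Searcher$/, '');
  // Fall back to the full class name if stripping would leave nothing.
  return derived.length > 0 ? derived : className;
}

console.log(deriveEngineName('GoogleSearcher')); // "Google"
```

An explicit static `name` would take precedence over any derived value, as the description states.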
@@ -245,7 +245,7 @@ For example, `GoogleSearcher` becomes `Google`.
|
|
|
245
245
|
|
|
246
246
|
> `static` **register**: (`ctor`, `options?`) => `boolean`
|
|
247
247
|
|
|
248
|
-
Defined in: [web-searcher/src/searcher.ts:54](https://github.com/isdk/web-searcher.js/blob/
|
|
248
|
+
Defined in: [web-searcher/src/searcher.ts:54](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/searcher.ts#L54)
|
|
249
249
|
|
|
250
250
|
Registers a search engine class.
|
|
251
251
|
|
|
@@ -279,7 +279,7 @@ Registration options. If a string is provided, it is used as the registered name
|
|
|
279
279
|
|
|
280
280
|
> `static` **setAliases**: (`ctor`, ...`aliases`) => `void`
|
|
281
281
|
|
|
282
|
-
Defined in: [web-searcher/src/searcher.ts:93](https://github.com/isdk/web-searcher.js/blob/
|
|
282
|
+
Defined in: [web-searcher/src/searcher.ts:93](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/searcher.ts#L93)
|
|
283
283
|
|
|
284
284
|
Sets aliases for a registered engine.
|
|
285
285
|
|
|
@@ -311,7 +311,7 @@ Aliases to add.
|
|
|
311
311
|
|
|
312
312
|
> `static` **unregister**: (`name?`) => `void`
|
|
313
313
|
|
|
314
|
-
Defined in: [web-searcher/src/searcher.ts:61](https://github.com/isdk/web-searcher.js/blob/
|
|
314
|
+
Defined in: [web-searcher/src/searcher.ts:61](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/searcher.ts#L61)
|
|
315
315
|
|
|
316
316
|
Unregisters a search engine.
|
|
317
317
|
|
|
@@ -339,7 +339,7 @@ The name or class to unregister.
|
|
|
339
339
|
|
|
340
340
|
> **get** **pagination**(): [`PaginationConfig`](../interfaces/PaginationConfig.md)
|
|
341
341
|
|
|
342
|
-
Defined in: [web-searcher/src/engines/google.ts:61](https://github.com/isdk/web-searcher.js/blob/
|
|
342
|
+
Defined in: [web-searcher/src/engines/google.ts:61](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/engines/google.ts#L61)
|
|
343
343
|
|
|
344
344
|
Configures pagination for Google Search results.
|
|
345
345
|
Uses the 'start' URL parameter, incrementing by 10 for each page.
|
|
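To make the offset arithmetic concrete, here is a small sketch of how a `url-param` pagination value could be computed per page. The `UrlParamPagination` type and `pageParamValue` function are illustrative stand-ins, and a `startValue` of 0 is an assumption; only the `start` parameter name and the increment of 10 come from the description above.

```ts
// Illustrative subset of a url-param pagination config (not the package's type).
interface UrlParamPagination {
  type: 'url-param';
  paramName: string;  // URL parameter carrying the offset, e.g. 'start'
  startValue: number; // value used for the first page (assumed to be 0 here)
  increment: number;  // step added for each subsequent page
}

// Value of the pagination parameter for a 0-based page index.
function pageParamValue(cfg: UrlParamPagination, page: number): number {
  return cfg.startValue + page * cfg.increment;
}

const googleLike: UrlParamPagination = {
  type: 'url-param',
  paramName: 'start',
  startValue: 0,
  increment: 10,
};

const offsets = [0, 1, 2].map((page) => pageParamValue(googleLike, page));
console.log(offsets); // [ 0, 10, 20 ]
```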
@@ -360,7 +360,7 @@ Uses the 'start' URL parameter, incrementing by 10 for each page.
|
|
|
360
360
|
|
|
361
361
|
> **get** **template**(): `FetcherOptions`
|
|
362
362
|
|
|
363
|
-
Defined in: [web-searcher/src/engines/google.ts:32](https://github.com/isdk/web-searcher.js/blob/
|
|
363
|
+
Defined in: [web-searcher/src/engines/google.ts:32](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/engines/google.ts#L32)
|
|
364
364
|
|
|
365
365
|
Defines the fetch template for Google Search.
|
|
366
366
|
|
|
@@ -380,7 +380,7 @@ The fetcher configuration including the URL pattern and extraction rules.
|
|
|
380
380
|
|
|
381
381
|
> `protected` **createContext**(`options`): `FetchContext`
|
|
382
382
|
|
|
383
|
-
Defined in: [web-searcher/src/searcher.ts:155](https://github.com/isdk/web-searcher.js/blob/
|
|
383
|
+
Defined in: [web-searcher/src/searcher.ts:155](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/searcher.ts#L155)
|
|
384
384
|
|
|
385
385
|
#### Parameters
|
|
386
386
|
|
|
@@ -402,7 +402,7 @@ Defined in: [web-searcher/src/searcher.ts:155](https://github.com/isdk/web-searc
|
|
|
402
402
|
|
|
403
403
|
> **dispose**(): `Promise`\<`void`\>
|
|
404
404
|
|
|
405
|
-
Defined in: web-fetcher/dist/index.d.ts:
|
|
405
|
+
Defined in: web-fetcher/dist/index.d.ts:2334
|
|
406
406
|
|
|
407
407
|
Disposes of the session and its associated engine.
|
|
408
408
|
|
|
@@ -425,7 +425,7 @@ This method should be called when the session is no longer needed to free up res
|
|
|
425
425
|
|
|
426
426
|
> **execute**\<`R`\>(`actionOptions`, `context?`): `Promise`\<`FetchActionResult`\<`R`\>\>
|
|
427
427
|
|
|
428
|
-
Defined in: web-fetcher/dist/index.d.ts:
|
|
428
|
+
Defined in: web-fetcher/dist/index.d.ts:2289
|
|
429
429
|
|
|
430
430
|
Executes a single action within the session.
|
|
431
431
|
|
|
@@ -473,7 +473,7 @@ await session.execute({ name: 'goto', params: { url: 'https://example.com' } });
|
|
|
473
473
|
|
|
474
474
|
> **executeAll**(`actions`, `options?`): `Promise`\<\{ `outputs`: `Record`\<`string`, `any`\>; `result`: `FetchResponse` \| `undefined`; \}\>
|
|
475
475
|
|
|
476
|
-
Defined in: web-fetcher/dist/index.d.ts:
|
|
476
|
+
Defined in: web-fetcher/dist/index.d.ts:2306
|
|
477
477
|
|
|
478
478
|
Executes a sequence of actions.
|
|
479
479
|
|
|
@@ -517,7 +517,7 @@ const { result, outputs } = await session.executeAll([
|
|
|
517
517
|
|
|
518
518
|
> `protected` **formatOptions**(`options`): `Record`\<`string`, `any`\>
|
|
519
519
|
|
|
520
|
-
Defined in: [web-searcher/src/engines/google.ts:82](https://github.com/isdk/web-searcher.js/blob/
|
|
520
|
+
Defined in: [web-searcher/src/engines/google.ts:82](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/engines/google.ts#L82)
|
|
521
521
|
|
|
522
522
|
Maps standard `SearchOptions` to Google's specific URL parameters.
|
|
523
523
|
|
|
@@ -551,7 +551,7 @@ A map of variables to inject into the URL template.
|
|
|
551
551
|
|
|
552
552
|
> **getOutputs**(): `Record`\<`string`, `any`\>
|
|
553
553
|
|
|
554
|
-
Defined in: web-fetcher/dist/index.d.ts:
|
|
554
|
+
Defined in: web-fetcher/dist/index.d.ts:2317
|
|
555
555
|
|
|
556
556
|
Retrieves all outputs accumulated during the session.
|
|
557
557
|
|
|
@@ -571,7 +571,7 @@ A record of stored output data.
|
|
|
571
571
|
|
|
572
572
|
> **getState**(): `Promise`\<\{ `cookies`: `Cookie`[]; `sessionState?`: `any`; \} \| `undefined`\>
|
|
573
573
|
|
|
574
|
-
Defined in: web-fetcher/dist/index.d.ts:
|
|
574
|
+
Defined in: web-fetcher/dist/index.d.ts:2323
|
|
575
575
|
|
|
576
576
|
Gets the current state of the session, including cookies and engine-specific state.
|
|
577
577
|
|
|
@@ -591,7 +591,7 @@ A promise resolving to the session state, or undefined if no engine is initializ
|
|
|
591
591
|
|
|
592
592
|
> **search**(`query`, `options`): `Promise`\<[`StandardSearchResult`](../interfaces/StandardSearchResult.md)[]\>
|
|
593
593
|
|
|
594
|
-
Defined in: [web-searcher/src/searcher.ts:182](https://github.com/isdk/web-searcher.js/blob/
|
|
594
|
+
Defined in: [web-searcher/src/searcher.ts:182](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/searcher.ts#L182)
|
|
595
595
|
|
|
596
596
|
Executes a search query.
|
|
597
597
|
|
|
@@ -628,7 +628,7 @@ A promise resolving to an array of standardized search results.
|
|
|
628
628
|
|
|
629
629
|
> `protected` **transform**(`outputs`): `Promise`\<`any`[]\>
|
|
630
630
|
|
|
631
|
-
Defined in: [web-searcher/src/engines/google.ts:144](https://github.com/isdk/web-searcher.js/blob/
|
|
631
|
+
Defined in: [web-searcher/src/engines/google.ts:144](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/engines/google.ts#L144)
|
|
632
632
|
|
|
633
633
|
Cleans and normalizes the extracted results.
|
|
634
634
|
Specifically, it unwraps Google's redirect URLs (starting with `/url?q=`).
|
|
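The redirect unwrapping mentioned above might look roughly like this. It is a sketch under the assumption that the target address is carried in the `q` query parameter; `unwrapGoogleRedirect` is a hypothetical name, and the package's real `transform` implementation is not visible in this diff.

```ts
// Hypothetical helper: turn "/url?q=<target>&..." into "<target>".
// Links that do not match the redirect shape are returned unchanged.
function unwrapGoogleRedirect(href: string): string {
  if (!href.startsWith('/url?q=')) return href;
  // A base is needed only so that the relative path can be parsed.
  const parsed = new URL(href, 'https://www.google.com');
  // searchParams.get() also percent-decodes the value.
  return parsed.searchParams.get('q') ?? href;
}

console.log(unwrapGoogleRedirect('/url?q=https%3A%2F%2Fexample.com%2Fpage&sa=U'));
// https://example.com/page
```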
@@ -657,7 +657,7 @@ An array of cleaned search results.
|
|
|
657
657
|
|
|
658
658
|
> `static` **search**(`engineName`, `query`, `options`): `Promise`\<[`StandardSearchResult`](../interfaces/StandardSearchResult.md)[]\>
|
|
659
659
|
|
|
660
|
-
Defined in: [web-searcher/src/searcher.ts:106](https://github.com/isdk/web-searcher.js/blob/
|
|
660
|
+
Defined in: [web-searcher/src/searcher.ts:106](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/searcher.ts#L106)
|
|
661
661
|
|
|
662
662
|
Static helper to execute a one-off search.
|
|
663
663
|
|
|
@@ -6,7 +6,7 @@
|
|
|
6
6
|
|
|
7
7
|
# Abstract Class: WebSearcher
|
|
8
8
|
|
|
9
|
-
Defined in: [web-searcher/src/searcher.ts:31](https://github.com/isdk/web-searcher.js/blob/
|
|
9
|
+
Defined in: [web-searcher/src/searcher.ts:31](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/searcher.ts#L31)
|
|
10
10
|
|
|
11
11
|
The abstract base class for all search engines.
|
|
12
12
|
|
|
@@ -41,7 +41,7 @@ WebSearcher.register(MySearcher);
|
|
|
41
41
|
|
|
42
42
|
> **new WebSearcher**(`options?`): `WebSearcher`
|
|
43
43
|
|
|
44
|
-
Defined in: web-fetcher/dist/index.d.ts:
|
|
44
|
+
Defined in: web-fetcher/dist/index.d.ts:2275
|
|
45
45
|
|
|
46
46
|
Creates a new FetchSession.
|
|
47
47
|
|
|
@@ -67,7 +67,7 @@ Configuration options for the fetcher.
|
|
|
67
67
|
|
|
68
68
|
> `protected` **closed**: `boolean`
|
|
69
69
|
|
|
70
|
-
Defined in: web-fetcher/dist/index.d.ts:
|
|
70
|
+
Defined in: web-fetcher/dist/index.d.ts:2269
|
|
71
71
|
|
|
72
72
|
#### Inherited from
|
|
73
73
|
|
|
@@ -79,7 +79,7 @@ Defined in: web-fetcher/dist/index.d.ts:2186
|
|
|
79
79
|
|
|
80
80
|
> `readonly` **context**: `FetchContext`
|
|
81
81
|
|
|
82
|
-
Defined in: web-fetcher/dist/index.d.ts:
|
|
82
|
+
Defined in: web-fetcher/dist/index.d.ts:2268
|
|
83
83
|
|
|
84
84
|
The execution context for this session, containing configurations, event bus, and shared state.
|
|
85
85
|
|
|
@@ -93,7 +93,7 @@ The execution context for this session, containing configurations, event bus, an
|
|
|
93
93
|
|
|
94
94
|
> `readonly` **id**: `string`
|
|
95
95
|
|
|
96
|
-
Defined in: web-fetcher/dist/index.d.ts:
|
|
96
|
+
Defined in: web-fetcher/dist/index.d.ts:2264
|
|
97
97
|
|
|
98
98
|
Unique identifier for the session.
|
|
99
99
|
|
|
@@ -107,7 +107,7 @@ Unique identifier for the session.
|
|
|
107
107
|
|
|
108
108
|
> `protected` **options**: `FetcherOptions`
|
|
109
109
|
|
|
110
|
-
Defined in: web-fetcher/dist/index.d.ts:
|
|
110
|
+
Defined in: web-fetcher/dist/index.d.ts:2260
|
|
111
111
|
|
|
112
112
|
#### Inherited from
|
|
113
113
|
|
|
@@ -119,7 +119,7 @@ Defined in: web-fetcher/dist/index.d.ts:2177
|
|
|
119
119
|
|
|
120
120
|
> `static` **\_isFactory**: `boolean` = `false`
|
|
121
121
|
|
|
122
|
-
Defined in: [web-searcher/src/searcher.ts:33](https://github.com/isdk/web-searcher.js/blob/
|
|
122
|
+
Defined in: [web-searcher/src/searcher.ts:33](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/searcher.ts#L33)
|
|
123
123
|
|
|
124
124
|
***
|
|
125
125
|
|
|
@@ -127,7 +127,7 @@ Defined in: [web-searcher/src/searcher.ts:33](https://github.com/isdk/web-search
|
|
|
127
127
|
|
|
128
128
|
> `static` `optional` **alias**: `string` \| `string`[]
|
|
129
129
|
|
|
130
|
-
Defined in: [web-searcher/src/searcher.ts:45](https://github.com/isdk/web-searcher.js/blob/
|
|
130
|
+
Defined in: [web-searcher/src/searcher.ts:45](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/searcher.ts#L45)
|
|
131
131
|
|
|
132
132
|
Engine alias(es). Can be a single string or an array of strings.
|
|
133
133
|
Useful for registering shorthand names (e.g., 'g' for 'Google').
|
|
@@ -138,7 +138,7 @@ Useful for registering shorthand names (e.g., 'g' for 'Google').
|
|
|
138
138
|
|
|
139
139
|
> `static` **createObject**: (`name`, ...`args`) => `WebSearcher`
|
|
140
140
|
|
|
141
|
-
Defined in: [web-searcher/src/searcher.ts:78](https://github.com/isdk/web-searcher.js/blob/
|
|
141
|
+
Defined in: [web-searcher/src/searcher.ts:78](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/searcher.ts#L78)
|
|
142
142
|
|
|
143
143
|
Creates an instance of the registered search engine.
|
|
144
144
|
|
|
@@ -168,7 +168,7 @@ An instance of the search engine.
|
|
|
168
168
|
|
|
169
169
|
> `static` **forEach**: (`cb`) => `void`
|
|
170
170
|
|
|
171
|
-
Defined in: [web-searcher/src/searcher.ts:85](https://github.com/isdk/web-searcher.js/blob/
|
|
171
|
+
Defined in: [web-searcher/src/searcher.ts:85](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/searcher.ts#L85)
|
|
172
172
|
|
|
173
173
|
Iterates over all registered engines.
|
|
174
174
|
|
|
@@ -190,7 +190,7 @@ Callback function to invoke for each registered engine.
|
|
|
190
190
|
|
|
191
191
|
> `static` **get**: (`name`) => *typeof* `WebSearcher`
|
|
192
192
|
|
|
193
|
-
Defined in: [web-searcher/src/searcher.ts:69](https://github.com/isdk/web-searcher.js/blob/
|
|
193
|
+
Defined in: [web-searcher/src/searcher.ts:69](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/searcher.ts#L69)
|
|
194
194
|
|
|
195
195
|
Retrieves a registered search engine class by name.
|
|
196
196
|
|
|
@@ -214,7 +214,7 @@ The search engine class constructor.
|
|
|
214
214
|
|
|
215
215
|
> `static` `optional` **name**: `string`
|
|
216
216
|
|
|
217
|
-
Defined in: [web-searcher/src/searcher.ts:40](https://github.com/isdk/web-searcher.js/blob/
|
|
217
|
+
Defined in: [web-searcher/src/searcher.ts:40](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/searcher.ts#L40)
|
|
218
218
|
|
|
219
219
|
Custom engine name. If not provided, it is derived from the class name.
|
|
220
220
|
For example, `GoogleSearcher` becomes `Google`.
|
|
@@ -225,7 +225,7 @@ For example, `GoogleSearcher` becomes `Google`.
|
|
|
225
225
|
|
|
226
226
|
> `static` **register**: (`ctor`, `options?`) => `boolean`
|
|
227
227
|
|
|
228
|
-
Defined in: [web-searcher/src/searcher.ts:54](https://github.com/isdk/web-searcher.js/blob/
|
|
228
|
+
Defined in: [web-searcher/src/searcher.ts:54](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/searcher.ts#L54)
|
|
229
229
|
|
|
230
230
|
Registers a search engine class.
|
|
231
231
|
|
|
@@ -255,7 +255,7 @@ Registration options. If a string is provided, it is used as the registered name
|
|
|
255
255
|
|
|
256
256
|
> `static` **setAliases**: (`ctor`, ...`aliases`) => `void`
|
|
257
257
|
|
|
258
|
-
Defined in: [web-searcher/src/searcher.ts:93](https://github.com/isdk/web-searcher.js/blob/
|
|
258
|
+
Defined in: [web-searcher/src/searcher.ts:93](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/searcher.ts#L93)
|
|
259
259
|
|
|
260
260
|
Sets aliases for a registered engine.
|
|
261
261
|
|
|
@@ -283,7 +283,7 @@ Aliases to add.
|
|
|
283
283
|
|
|
284
284
|
> `static` **unregister**: (`name?`) => `void`
|
|
285
285
|
|
|
286
|
-
Defined in: [web-searcher/src/searcher.ts:61](https://github.com/isdk/web-searcher.js/blob/
|
|
286
|
+
Defined in: [web-searcher/src/searcher.ts:61](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/searcher.ts#L61)
|
|
287
287
|
|
|
288
288
|
Unregisters a search engine.
|
|
289
289
|
|
|
@@ -307,7 +307,7 @@ The name or class to unregister.
|
|
|
307
307
|
|
|
308
308
|
> **get** **pagination**(): [`PaginationConfig`](../interfaces/PaginationConfig.md) \| `undefined`
|
|
309
309
|
|
|
310
|
-
Defined in: [web-searcher/src/searcher.ts:151](https://github.com/isdk/web-searcher.js/blob/
|
|
310
|
+
Defined in: [web-searcher/src/searcher.ts:151](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/searcher.ts#L151)
|
|
311
311
|
|
|
312
312
|
Optional pagination configuration.
|
|
313
313
|
Defines how the searcher navigates to subsequent pages.
|
|
@@ -326,7 +326,7 @@ If undefined, the searcher will only fetch the first page.
|
|
|
326
326
|
|
|
327
327
|
> **get** `abstract` **template**(): `FetcherOptions`
|
|
328
328
|
|
|
329
|
-
Defined in: [web-searcher/src/searcher.ts:143](https://github.com/isdk/web-searcher.js/blob/
|
|
329
|
+
Defined in: [web-searcher/src/searcher.ts:143](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/searcher.ts#L143)
|
|
330
330
|
|
|
331
331
|
The declarative template for the fetch options.
|
|
332
332
|
|
|
@@ -356,7 +356,7 @@ get template() {
|
|
|
356
356
|
|
|
357
357
|
> `protected` **createContext**(`options`): `FetchContext`
|
|
358
358
|
|
|
359
|
-
Defined in: [web-searcher/src/searcher.ts:155](https://github.com/isdk/web-searcher.js/blob/
|
|
359
|
+
Defined in: [web-searcher/src/searcher.ts:155](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/searcher.ts#L155)
|
|
360
360
|
|
|
361
361
|
#### Parameters
|
|
362
362
|
|
|
@@ -378,7 +378,7 @@ Defined in: [web-searcher/src/searcher.ts:155](https://github.com/isdk/web-searc
|
|
|
378
378
|
|
|
379
379
|
> **dispose**(): `Promise`\<`void`\>
|
|
380
380
|
|
|
381
|
-
Defined in: web-fetcher/dist/index.d.ts:
|
|
381
|
+
Defined in: web-fetcher/dist/index.d.ts:2334
|
|
382
382
|
|
|
383
383
|
Disposes of the session and its associated engine.
|
|
384
384
|
|
|
@@ -401,7 +401,7 @@ This method should be called when the session is no longer needed to free up res
|
|
|
401
401
|
|
|
402
402
|
> **execute**\<`R`\>(`actionOptions`, `context?`): `Promise`\<`FetchActionResult`\<`R`\>\>
|
|
403
403
|
|
|
404
|
-
Defined in: web-fetcher/dist/index.d.ts:
|
|
404
|
+
Defined in: web-fetcher/dist/index.d.ts:2289
|
|
405
405
|
|
|
406
406
|
Executes a single action within the session.
|
|
407
407
|
|
|
@@ -449,7 +449,7 @@ await session.execute({ name: 'goto', params: { url: 'https://example.com' } });
|
|
|
449
449
|
|
|
450
450
|
> **executeAll**(`actions`, `options?`): `Promise`\<\{ `outputs`: `Record`\<`string`, `any`\>; `result`: `FetchResponse` \| `undefined`; \}\>
|
|
451
451
|
|
|
452
|
-
Defined in: web-fetcher/dist/index.d.ts:
|
|
452
|
+
Defined in: web-fetcher/dist/index.d.ts:2306
|
|
453
453
|
|
|
454
454
|
Executes a sequence of actions.
|
|
455
455
|
|
|
@@ -493,7 +493,7 @@ const { result, outputs } = await session.executeAll([
|
|
|
493
493
|
|
|
494
494
|
> `protected` **formatOptions**(`options`): `Record`\<`string`, `any`\>
|
|
495
495
|
|
|
496
|
-
Defined in: [web-searcher/src/searcher.ts:
|
|
496
|
+
Defined in: [web-searcher/src/searcher.ts:309](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/searcher.ts#L309)
|
|
497
497
|
|
|
498
498
|
Transforms standard options into engine-specific template variables.
|
|
499
499
|
|
|
@@ -521,7 +521,7 @@ A dictionary of variables to be injected into the template.
|
|
|
521
521
|
|
|
522
522
|
> **getOutputs**(): `Record`\<`string`, `any`\>
|
|
523
523
|
|
|
524
|
-
Defined in: web-fetcher/dist/index.d.ts:
|
|
524
|
+
Defined in: web-fetcher/dist/index.d.ts:2317
|
|
525
525
|
|
|
526
526
|
Retrieves all outputs accumulated during the session.
|
|
527
527
|
|
|
@@ -541,7 +541,7 @@ A record of stored output data.
|
|
|
541
541
|
|
|
542
542
|
> **getState**(): `Promise`\<\{ `cookies`: `Cookie`[]; `sessionState?`: `any`; \} \| `undefined`\>
|
|
543
543
|
|
|
544
|
-
Defined in: web-fetcher/dist/index.d.ts:
|
|
544
|
+
Defined in: web-fetcher/dist/index.d.ts:2323
|
|
545
545
|
|
|
546
546
|
Gets the current state of the session, including cookies and engine-specific state.
|
|
547
547
|
|
|
@@ -561,7 +561,7 @@ A promise resolving to the session state, or undefined if no engine is initializ
|
|
|
561
561
|
|
|
562
562
|
> **search**(`query`, `options`): `Promise`\<[`StandardSearchResult`](../interfaces/StandardSearchResult.md)[]\>
|
|
563
563
|
|
|
564
|
-
Defined in: [web-searcher/src/searcher.ts:182](https://github.com/isdk/web-searcher.js/blob/
|
|
564
|
+
Defined in: [web-searcher/src/searcher.ts:182](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/searcher.ts#L182)
|
|
565
565
|
|
|
566
566
|
Executes a search query.
|
|
567
567
|
|
|
@@ -594,7 +594,7 @@ A promise resolving to an array of standardized search results.
|
|
|
594
594
|
|
|
595
595
|
> `protected` **transform**(`outputs`, `context`): `Promise`\<[`StandardSearchResult`](../interfaces/StandardSearchResult.md)[]\>
|
|
596
596
|
|
|
597
|
-
Defined in: [web-searcher/src/searcher.ts:
|
|
597
|
+
Defined in: [web-searcher/src/searcher.ts:291](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/searcher.ts#L291)
|
|
598
598
|
|
|
599
599
|
Transform and clean the raw extracted results.
|
|
600
600
|
|
|
@@ -627,7 +627,7 @@ A promise resolving to an array of standardized search results.
|
|
|
627
627
|
|
|
628
628
|
> `static` **search**(`engineName`, `query`, `options`): `Promise`\<[`StandardSearchResult`](../interfaces/StandardSearchResult.md)[]\>
|
|
629
629
|
|
|
630
|
-
Defined in: [web-searcher/src/searcher.ts:106](https://github.com/isdk/web-searcher.js/blob/
|
|
630
|
+
Defined in: [web-searcher/src/searcher.ts:106](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/searcher.ts#L106)
|
|
631
631
|
|
|
632
632
|
Static helper to execute a one-off search.
|
|
633
633
|
|
|
@@ -6,7 +6,7 @@
|
|
|
6
6
|
|
|
7
7
|
# Interface: CustomTimeRange
|
|
8
8
|
|
|
9
|
-
Defined in: [web-searcher/src/types.ts:
|
|
9
|
+
Defined in: [web-searcher/src/types.ts:104](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/types.ts#L104)
|
|
10
10
|
|
|
11
11
|
## Properties
|
|
12
12
|
|
|
@@ -14,7 +14,7 @@ Defined in: [web-searcher/src/types.ts:78](https://github.com/isdk/web-searcher.
|
|
|
14
14
|
|
|
15
15
|
> **from**: `string` \| `Date`
|
|
16
16
|
|
|
17
|
-
Defined in: [web-searcher/src/types.ts:
|
|
17
|
+
Defined in: [web-searcher/src/types.ts:106](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/types.ts#L106)
|
|
18
18
|
|
|
19
19
|
Start date (Date object or string like 'YYYY-MM-DD').
|
|
20
20
|
|
|
@@ -24,6 +24,6 @@ Start date (Date object or string like 'YYYY-MM-DD').
|
|
|
24
24
|
|
|
25
25
|
> `optional` **to**: `string` \| `Date`
|
|
26
26
|
|
|
27
|
-
Defined in: [web-searcher/src/types.ts:
|
|
27
|
+
Defined in: [web-searcher/src/types.ts:108](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/types.ts#L108)
|
|
28
28
|
|
|
29
29
|
End date (Date object or string like 'YYYY-MM-DD'). Defaults to current date if omitted.
|
|
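As an illustration of the `from`/`to` semantics, the sketch below normalizes a range into two `Date` objects and fills in a missing `to` with the current date. The interface is redeclared locally so the snippet stands alone, and `normalizeTimeRange` is a hypothetical helper rather than a package export.

```ts
// Local copy of the documented shape, redeclared so the sketch is self-contained.
interface CustomTimeRange {
  from: string | Date;
  to?: string | Date;
}

// Hypothetical helper: coerce both ends to Date, defaulting `to` to "now".
function normalizeTimeRange(
  range: CustomTimeRange,
  now: Date = new Date(),
): { from: Date; to: Date } {
  const toDate = (value: string | Date): Date =>
    value instanceof Date ? value : new Date(value);
  return {
    from: toDate(range.from),
    to: range.to !== undefined ? toDate(range.to) : now,
  };
}

const normalized = normalizeTimeRange({ from: '2024-01-01' });
console.log(normalized.from.toISOString()); // 2024-01-01T00:00:00.000Z
```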
@@ -6,7 +6,7 @@
|
|
|
6
6
|
|
|
7
7
|
# Interface: PaginationConfig
|
|
8
8
|
|
|
9
|
-
Defined in: [web-searcher/src/types.ts:
|
|
9
|
+
Defined in: [web-searcher/src/types.ts:41](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/types.ts#L41)
|
|
10
10
|
|
|
11
11
|
Configuration for pagination strategies.
|
|
12
12
|
Defines how the searcher should navigate to the next page of results.
|
|
@@ -17,7 +17,7 @@ Defines how the searcher should navigate to the next page of results.
|
|
|
17
17
|
|
|
18
18
|
> `optional` **increment**: `number`
|
|
19
19
|
|
|
20
|
-
Defined in: [web-searcher/src/types.ts:
|
|
20
|
+
Defined in: [web-searcher/src/types.ts:68](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/types.ts#L68)
|
|
21
21
|
|
|
22
22
|
The increment step for each page.
|
|
23
23
|
- If the parameter represents an item offset (like Google's 'start'), this might be 10.
|
|
@@ -31,11 +31,31 @@ The increment step for each page.
|
|
|
31
31
|
|
|
32
32
|
***
|
|
33
33
|
|
|
34
|
+
### maxPages?
|
|
35
|
+
|
|
36
|
+
> `optional` **maxPages**: `number`
|
|
37
|
+
|
|
38
|
+
Defined in: [web-searcher/src/types.ts:85](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/types.ts#L85)
|
|
39
|
+
|
|
40
|
+
The safety threshold for the maximum number of pages to fetch automatically
|
|
41
|
+
in a single search call.
|
|
42
|
+
|
|
43
|
+
Even if the requested `limit` of results hasn't been reached, the searcher
|
|
44
|
+
will stop after this many pages to prevent infinite loops or excessive API usage.
|
|
45
|
+
|
|
46
|
+
#### Default
|
|
47
|
+
|
|
48
|
+
```ts
|
|
49
|
+
10
|
|
50
|
+
```
|
|
51
|
+
|
|
52
|
+
***
|
|
53
|
+
|
|
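The stopping rule described for `maxPages` can be sketched as a loop with two exit conditions. This is a simplified, synchronous illustration: `collectResults` and its `fetchPage` callback are hypothetical, real page fetching would be asynchronous, and stopping on an empty page is an added assumption.

```ts
// Hypothetical sketch: gather results page by page until either the
// requested limit is met or the maxPages safety threshold is reached.
function collectResults<T>(
  fetchPage: (page: number) => T[],
  limit: number,
  maxPages: number = 10, // documented default
): { results: T[]; pagesFetched: number } {
  const results: T[] = [];
  let pagesFetched = 0;
  while (results.length < limit && pagesFetched < maxPages) {
    const items = fetchPage(pagesFetched);
    pagesFetched += 1;
    if (items.length === 0) break; // assumption: an empty page means no more results
    results.push(...items);
  }
  return { results: results.slice(0, limit), pagesFetched };
}

// Two results per page, a limit of 100, but at most 3 pages:
const sparse = collectResults((page) => [`r${page}a`, `r${page}b`], 100, 3);
console.log(sparse.results.length, sparse.pagesFetched); // 6 3
```

The point of the guard is visible in the usage: the limit of 100 is never reached, yet the loop ends after three fetches instead of continuing indefinitely.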
34
54
|
### nextButtonSelector?
|
|
35
55
|
|
|
36
56
|
> `optional` **nextButtonSelector**: `string`
|
|
37
57
|
|
|
38
|
-
Defined in: [web-searcher/src/types.ts:
|
|
58
|
+
Defined in: [web-searcher/src/types.ts:74](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/types.ts#L74)
|
|
39
59
|
|
|
40
60
|
The CSS selector for the "Next" page button.
|
|
41
61
|
Required if type is 'click-next'.
|
|
@@ -46,7 +66,7 @@ Required if type is 'click-next'.
|
|
|
46
66
|
|
|
47
67
|
> `optional` **paramName**: `string`
|
|
48
68
|
|
|
49
|
-
Defined in: [web-searcher/src/types.ts:
|
|
69
|
+
Defined in: [web-searcher/src/types.ts:54](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/types.ts#L54)
|
|
50
70
|
|
|
51
71
|
The name of the URL parameter used for pagination.
|
|
52
72
|
Required if type is 'url-param'.
|
|
@@ -63,7 +83,7 @@ Required if type is 'url-param'.
|
|
|
63
83
|
|
|
64
84
|
> `optional` **startValue**: `number`
|
|
65
85
|
|
|
66
|
-
Defined in: [web-searcher/src/types.ts:
|
|
86
|
+
Defined in: [web-searcher/src/types.ts:60](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/types.ts#L60)
|
|
67
87
|
|
|
68
88
|
The starting value for the pagination parameter.
|
|
69
89
|
|
|
@@ -79,7 +99,7 @@ The starting value for the pagination parameter.
|
|
|
79
99
|
|
|
80
100
|
> **type**: `"url-param"` \| `"click-next"`
|
|
81
101
|
|
|
82
|
-
Defined in: [web-searcher/src/types.ts:
|
|
102
|
+
Defined in: [web-searcher/src/types.ts:47](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/types.ts#L47)
|
|
83
103
|
|
|
84
104
|
The type of pagination mechanism:
|
|
85
105
|
- 'url-param': Pagination is handled by modifying URL parameters (e.g., `?page=2` or `?start=10`).
|
|
@@ -6,7 +6,7 @@
|
|
|
6
6
|
|
|
7
7
|
# Interface: SearchContext
|
|
8
8
|
|
|
9
|
-
Defined in: [web-searcher/src/types.ts:
|
|
9
|
+
Defined in: [web-searcher/src/types.ts:91](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/types.ts#L91)
|
|
10
10
|
|
|
11
11
|
Context object passed to the transform function.
|
|
12
12
|
|
|
@@ -16,7 +16,7 @@ Context object passed to the transform function.
|
|
|
16
16
|
|
|
17
17
|
> `optional` **limit**: `number`
|
|
18
18
|
|
|
19
|
-
Defined in: [web-searcher/src/types.ts:
|
|
19
|
+
Defined in: [web-searcher/src/types.ts:99](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/types.ts#L99)
|
|
20
20
|
|
|
21
21
|
The requested limit of results.
|
|
22
22
|
|
|
@@ -26,7 +26,7 @@ The requested limit of results.
|
|
|
26
26
|
|
|
27
27
|
> **page**: `number`
|
|
28
28
|
|
|
29
|
-
Defined in: [web-searcher/src/types.ts:
|
|
29
|
+
Defined in: [web-searcher/src/types.ts:96](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/types.ts#L96)
|
|
30
30
|
|
|
31
31
|
The current page index (0-based).
|
|
32
32
|
|
|
@@ -36,6 +36,6 @@ The current page index (0-based).
|
|
|
36
36
|
|
|
37
37
|
> **query**: `string`
|
|
38
38
|
|
|
39
|
-
Defined in: [web-searcher/src/types.ts:
|
|
39
|
+
Defined in: [web-searcher/src/types.ts:93](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/types.ts#L93)
|
|
40
40
|
|
|
41
41
|
The original search query.
|
|
@@ -6,7 +6,7 @@
|
|
|
6
6
|
|
|
7
7
|
# Interface: SearchOptions
|
|
8
8
|
|
|
9
|
-
Defined in: [web-searcher/src/types.ts:
|
|
9
|
+
Defined in: [web-searcher/src/types.ts:120](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/types.ts#L120)
|
|
10
10
|
|
|
11
11
|
Options provided when executing a search.
|
|
12
12
|
|
|
@@ -22,7 +22,7 @@ Any other custom variables to be injected into the template.
|
|
|
22
22
|
|
|
23
23
|
> `optional` **category**: [`SearchCategory`](../type-aliases/SearchCategory.md)
|
|
24
24
|
|
|
25
|
-
Defined in: [web-searcher/src/types.ts:
|
|
25
|
+
Defined in: [web-searcher/src/types.ts:144](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/types.ts#L144)
|
|
26
26
|
|
|
27
27
|
The category of results to return.
|
|
28
28
|
Default: 'all' (web search)
|
|
@@ -33,7 +33,7 @@ Default: 'all' (web search)
|
|
|
33
33
|
|
|
34
34
|
> `optional` **language**: `string`
|
|
35
35
|
|
|
36
|
-
Defined in: [web-searcher/src/types.ts:
|
|
36
|
+
Defined in: [web-searcher/src/types.ts:154](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/types.ts#L154)
|
|
37
37
|
|
|
38
38
|
Language code (ISO 639-1) for the interface or results (e.g., 'en', 'zh-CN').
|
|
39
39
|
|
|
@@ -43,17 +43,32 @@ Language code (ISO 639-1) for the interface or results (e.g., 'en', 'zh-CN').
|
|
|
43
43
|
|
|
44
44
|
> `optional` **limit**: `number`
|
|
45
45
|
|
|
46
|
-
Defined in: [web-searcher/src/types.ts:
|
|
46
|
+
Defined in: [web-searcher/src/types.ts:122](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/types.ts#L122)
|
|
47
47
|
|
|
48
48
|
The maximum number of results to retrieve.
|
|
49
49
|
|
|
50
50
|
***
|
|
51
51
|
|
|
52
|
+
### maxPages?
|
|
53
|
+
|
|
54
|
+
> `optional` **maxPages**: `number`
|
|
55
|
+
|
|
56
|
+
Defined in: [web-searcher/src/types.ts:132](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/types.ts#L132)
|
|
57
|
+
|
|
58
|
+
The maximum number of pages (fetch cycles) allowed to reach the requested `limit`.
|
|
59
|
+
|
|
60
|
+
This is a safety guard. If the `limit` is high but each page has few results,
|
|
61
|
+
the searcher will stop once this page count is reached.
|
|
62
|
+
|
|
63
|
+
If not provided, it defaults to the value in `PaginationConfig` or 10.
|
|
64
|
+
|
|
65
|
+
***
|
|
66
|
+
|
|
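The fallback order stated above (per-call option, then the engine's `PaginationConfig`, then 10) can be expressed with nullish coalescing. `resolveMaxPages` is a hypothetical helper written for illustration; how the package resolves the value internally is not shown in this diff.

```ts
// Hypothetical helper: per-call maxPages wins, then the pagination
// config's maxPages, then the documented default of 10.
function resolveMaxPages(
  options: { maxPages?: number },
  pagination?: { maxPages?: number },
): number {
  return options.maxPages ?? pagination?.maxPages ?? 10;
}

console.log(resolveMaxPages({ maxPages: 3 }, { maxPages: 5 })); // 3
console.log(resolveMaxPages({}, { maxPages: 5 }));              // 5
console.log(resolveMaxPages({}));                               // 10
```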
52
67
|
### region?
|
|
53
68
|
|
|
54
69
|
> `optional` **region**: `string`
|
|
55
70
|
|
|
56
|
-
Defined in: [web-searcher/src/types.ts:
|
|
71
|
+
Defined in: [web-searcher/src/types.ts:149](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/types.ts#L149)
|
|
57
72
|
|
|
58
73
|
Region code (ISO 3166-1 alpha-2) to bias results (e.g., 'US', 'CN', 'JP').
|
|
59
74
|
|
|
@@ -63,7 +78,7 @@ Region code (ISO 3166-1 alpha-2) to bias results (e.g., 'US', 'CN', 'JP').
|
|
|
63
78
|
|
|
64
79
|
> `optional` **safeSearch**: [`SafeSearchLevel`](../type-aliases/SafeSearchLevel.md)
|
|
65
80
|
|
|
66
|
-
Defined in: [web-searcher/src/types.ts:
|
|
81
|
+
Defined in: [web-searcher/src/types.ts:160](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/types.ts#L160)
|
|
67
82
|
|
|
68
83
|
Safe search filtering level.
|
|
69
84
|
Default: engine dependent (usually 'moderate' or 'strict' by default).
|
|
@@ -74,7 +89,7 @@ Default: engine dependent (usually 'moderate' or 'strict' by default).
|
|
|
74
89
|
|
|
75
90
|
> `optional` **timeRange**: [`SearchTimeRange`](../type-aliases/SearchTimeRange.md)
|
|
76
91
|
|
|
77
|
-
Defined in: [web-searcher/src/types.ts:
|
|
92
|
+
Defined in: [web-searcher/src/types.ts:138](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/types.ts#L138)
|
|
78
93
|
|
|
79
94
|
Date range for the search results.
|
|
80
95
|
Default: 'all'
|
|
@@ -85,7 +100,7 @@ Default: 'all'
|
|
|
85
100
|
|
|
86
101
|
> `optional` **transform**: (`results`, `context`) => [`StandardSearchResult`](StandardSearchResult.md)[] \| `Promise`\<[`StandardSearchResult`](StandardSearchResult.md)[]\>
|
|
87
102
|
|
|
88
|
-
Defined in: [web-searcher/src/types.ts:
|
|
103
|
+
Defined in: [web-searcher/src/types.ts:166](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/types.ts#L166)
|
|
89
104
|
|
|
90
105
|
A custom transform function to filter or modify results at runtime.
|
|
91
106
|
This runs AFTER the engine-level transform.
|
|
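To illustrate how the `SearchOptions` fields touched in this diff fit together, here is a minimal sketch. The types are local stand-ins rather than imports from `@isdk/web-searcher`. `SearchOptionsSketch` and `httpsOnly` are hypothetical names, `context` is typed as `unknown` because the shape of `SearchContext` is not shown here, and `timeRange` is narrowed to the preset strings only.

```typescript
// Local stand-ins for the documented types (assumption: not the package's own exports).
interface StandardSearchResult {
  title: string;
  url: string;
  snippet?: string;
  [key: string]: any;
}

type SafeSearchLevel = 'off' | 'moderate' | 'strict';
type SearchTimeRangePreset = 'all' | 'day' | 'week' | 'month' | 'year';

// Hypothetical subset of SearchOptions covering only the fields shown in this diff.
interface SearchOptionsSketch {
  region?: string; // ISO 3166-1 alpha-2, e.g. 'US'
  safeSearch?: SafeSearchLevel;
  timeRange?: SearchTimeRangePreset; // the documented type also accepts CustomTimeRange
  transform?: (
    results: StandardSearchResult[],
    context: unknown, // SearchContext in the documented API
  ) => StandardSearchResult[] | Promise<StandardSearchResult[]>;
}

// Runtime transform: keep HTTPS results only and trim their snippets.
// Per the docs, a transform like this runs after the engine-level transform.
function httpsOnly(results: StandardSearchResult[]): StandardSearchResult[] {
  return results
    .filter((r) => r.url.startsWith('https://'))
    .map((r) => (r.snippet ? { ...r, snippet: r.snippet.trim() } : r));
}

const options: SearchOptionsSketch = {
  region: 'US',
  safeSearch: 'moderate',
  timeRange: 'week',
  transform: httpsOnly,
};
```

An object shaped like `options` would be passed wherever the package accepts `SearchOptions`; the exact call site is not shown in this part of the diff.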
@@ -6,7 +6,7 @@
|
|
|
6
6
|
|
|
7
7
|
# Interface: StandardSearchResult
|
|
8
8
|
|
|
9
|
-
Defined in: [web-searcher/src/types.ts:5](https://github.com/isdk/web-searcher.js/blob/
|
|
9
|
+
Defined in: [web-searcher/src/types.ts:5](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/types.ts#L5)
|
|
10
10
|
|
|
11
11
|
Interface representing a standardized search result item.
|
|
12
12
|
This ensures consistency across different search engines.
|
|
@@ -15,35 +15,85 @@ This ensures consistency across different search engines.
|
|
|
15
15
|
|
|
16
16
|
\[`key`: `string`\]: `any`
|
|
17
17
|
|
|
18
|
-
Allows for engine-specific extra fields (e.g.,
|
|
18
|
+
Allows for engine-specific extra fields (e.g., siteIcon, category).
|
|
19
19
|
|
|
20
20
|
## Properties
|
|
21
21
|
|
|
22
|
+
### author?
|
|
23
|
+
|
|
24
|
+
> `optional` **author**: `string`
|
|
25
|
+
|
|
26
|
+
Defined in: [web-searcher/src/types.ts:22](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/types.ts#L22)
|
|
27
|
+
|
|
28
|
+
The author or source name of the result.
|
|
29
|
+
|
|
30
|
+
***
|
|
31
|
+
|
|
32
|
+
### date?
|
|
33
|
+
|
|
34
|
+
> `optional` **date**: `string` \| `Date`
|
|
35
|
+
|
|
36
|
+
Defined in: [web-searcher/src/types.ts:19](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/types.ts#L19)
|
|
37
|
+
|
|
38
|
+
The date the result was published or last updated.
|
|
39
|
+
|
|
40
|
+
***
|
|
41
|
+
|
|
42
|
+
### favicon?
|
|
43
|
+
|
|
44
|
+
> `optional` **favicon**: `string`
|
|
45
|
+
|
|
46
|
+
Defined in: [web-searcher/src/types.ts:25](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/types.ts#L25)
|
|
47
|
+
|
|
48
|
+
The favicon URL of the source website.
|
|
49
|
+
|
|
50
|
+
***
|
|
51
|
+
|
|
22
52
|
### image?
|
|
23
53
|
|
|
24
54
|
> `optional` **image**: `string`
|
|
25
55
|
|
|
26
|
-
Defined in: [web-searcher/src/types.ts:16](https://github.com/isdk/web-searcher.js/blob/
|
|
56
|
+
Defined in: [web-searcher/src/types.ts:16](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/types.ts#L16)
|
|
27
57
|
|
|
28
58
|
An optional image URL associated with the result.
|
|
29
59
|
|
|
30
60
|
***
|
|
31
61
|
|
|
62
|
+
### rank?
|
|
63
|
+
|
|
64
|
+
> `optional` **rank**: `number`
|
|
65
|
+
|
|
66
|
+
Defined in: [web-searcher/src/types.ts:28](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/types.ts#L28)
|
|
67
|
+
|
|
68
|
+
The rank or position of the result (usually 1-indexed).
|
|
69
|
+
|
|
70
|
+
***
|
|
71
|
+
|
|
32
72
|
### snippet?
|
|
33
73
|
|
|
34
74
|
> `optional` **snippet**: `string`
|
|
35
75
|
|
|
36
|
-
Defined in: [web-searcher/src/types.ts:13](https://github.com/isdk/web-searcher.js/blob/
|
|
76
|
+
Defined in: [web-searcher/src/types.ts:13](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/types.ts#L13)
|
|
37
77
|
|
|
38
78
|
A brief snippet or description of the result.
|
|
39
79
|
|
|
40
80
|
***
|
|
41
81
|
|
|
82
|
+
### source?
|
|
83
|
+
|
|
84
|
+
> `optional` **source**: `string`
|
|
85
|
+
|
|
86
|
+
Defined in: [web-searcher/src/types.ts:31](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/types.ts#L31)
|
|
87
|
+
|
|
88
|
+
The source website name (e.g., 'GitHub', 'StackOverflow').
|
|
89
|
+
|
|
90
|
+
***
|
|
91
|
+
|
|
42
92
|
### title
|
|
43
93
|
|
|
44
94
|
> **title**: `string`
|
|
45
95
|
|
|
46
|
-
Defined in: [web-searcher/src/types.ts:7](https://github.com/isdk/web-searcher.js/blob/
|
|
96
|
+
Defined in: [web-searcher/src/types.ts:7](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/types.ts#L7)
|
|
47
97
|
|
|
48
98
|
The title of the search result.
|
|
49
99
|
|
|
@@ -53,6 +103,6 @@ The title of the search result.
|
|
|
53
103
|
|
|
54
104
|
> **url**: `string`
|
|
55
105
|
|
|
56
|
-
Defined in: [web-searcher/src/types.ts:10](https://github.com/isdk/web-searcher.js/blob/
|
|
106
|
+
Defined in: [web-searcher/src/types.ts:10](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/types.ts#L10)
|
|
57
107
|
|
|
58
108
|
The URL of the search result.
|
|
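The fields added to `StandardSearchResult` in 0.1.3 (`author`, `date`, `favicon`, `rank`, `source`) are all optional, so consumers may want to normalize them before use. The sketch below restates the interface locally, following the documented property list, and adds two hypothetical helpers, `withRanks` and `parsedDate`, that are not part of the package.

```typescript
// Local restatement of the documented interface (assumption: mirrors the docs above).
interface StandardSearchResult {
  title: string;
  url: string;
  snippet?: string;
  image?: string;
  date?: string | Date;
  author?: string;
  favicon?: string;
  rank?: number;
  source?: string;
  [key: string]: any; // engine-specific extras, e.g. siteIcon, category
}

// Hypothetical helper: fill in a 1-indexed rank where the engine did not supply one.
function withRanks(results: StandardSearchResult[]): StandardSearchResult[] {
  return results.map((r, i) => ({ ...r, rank: r.rank ?? i + 1 }));
}

// Hypothetical helper: normalize `date` to a Date, returning undefined when it
// is missing or cannot be parsed.
function parsedDate(r: StandardSearchResult): Date | undefined {
  if (r.date === undefined) return undefined;
  const d = r.date instanceof Date ? r.date : new Date(r.date);
  return Number.isNaN(d.getTime()) ? undefined : d;
}
```

Because `date` is typed as `string | Date`, a helper along the lines of `parsedDate` keeps downstream sorting or filtering from having to handle both representations.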
@@ -8,4 +8,4 @@
|
|
|
8
8
|
|
|
9
9
|
> **SafeSearchLevel** = `"off"` \| `"moderate"` \| `"strict"`
|
|
10
10
|
|
|
11
|
-
Defined in: [web-searcher/src/types.ts:
|
|
11
|
+
Defined in: [web-searcher/src/types.ts:115](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/types.ts#L115)
|
|
@@ -8,4 +8,4 @@
|
|
|
8
8
|
|
|
9
9
|
> **SearchCategory** = `"all"` \| `"images"` \| `"videos"` \| `"news"`
|
|
10
10
|
|
|
11
|
-
Defined in: [web-searcher/src/types.ts:
|
|
11
|
+
Defined in: [web-searcher/src/types.ts:113](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/types.ts#L113)
|
|
@@ -8,4 +8,4 @@
|
|
|
8
8
|
|
|
9
9
|
> **SearchTimeRange** = [`SearchTimeRangePreset`](SearchTimeRangePreset.md) \| [`CustomTimeRange`](../interfaces/CustomTimeRange.md)
|
|
10
10
|
|
|
11
|
-
Defined in: [web-searcher/src/types.ts:
|
|
11
|
+
Defined in: [web-searcher/src/types.ts:111](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/types.ts#L111)
|
|
@@ -8,4 +8,4 @@
|
|
|
8
8
|
|
|
9
9
|
> **SearchTimeRangePreset** = `"all"` \| `"day"` \| `"week"` \| `"month"` \| `"year"`
|
|
10
10
|
|
|
11
|
-
Defined in: [web-searcher/src/types.ts:
|
|
11
|
+
Defined in: [web-searcher/src/types.ts:102](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/types.ts#L102)
|
|
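The string-literal unions above (`SafeSearchLevel`, `SearchCategory`, `SearchTimeRangePreset`) only exist at compile time. When values arrive from untyped input such as query strings or config files, a runtime check can be useful. The following is a sketch under that assumption. The constant arrays and the `isOneOf` guard are hypothetical and not exported by the package; the literal values are copied from the type aliases shown in this diff.

```typescript
// Literal values copied from the documented type aliases.
const SAFE_SEARCH_LEVELS = ['off', 'moderate', 'strict'] as const;
const SEARCH_CATEGORIES = ['all', 'images', 'videos', 'news'] as const;
const TIME_RANGE_PRESETS = ['all', 'day', 'week', 'month', 'year'] as const;

// Derived unions; structurally identical to the documented aliases.
type SafeSearchLevel = (typeof SAFE_SEARCH_LEVELS)[number];
type SearchCategory = (typeof SEARCH_CATEGORIES)[number];
type SearchTimeRangePreset = (typeof TIME_RANGE_PRESETS)[number];

// Hypothetical type guard: narrows an unknown value to one of the allowed literals.
function isOneOf<T extends string>(
  allowed: readonly T[],
  value: unknown,
): value is T {
  return typeof value === 'string' && (allowed as readonly string[]).includes(value);
}

// Example: fall back to 'moderate' when the input is not a valid level.
function toSafeSearch(input: unknown): SafeSearchLevel {
  return isOneOf(SAFE_SEARCH_LEVELS, input) ? input : 'moderate';
}
```

The fallback to `'moderate'` in `toSafeSearch` is an illustrative choice only; the documentation states that the actual default is engine dependent.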
@@ -8,7 +8,7 @@
|
|
|
8
8
|
|
|
9
9
|
> **SearcherConstructor** = (`options?`) => [`WebSearcher`](../classes/WebSearcher.md)
|
|
10
10
|
|
|
11
|
-
Defined in: [web-searcher/src/searcher.ts:10](https://github.com/isdk/web-searcher.js/blob/
|
|
11
|
+
Defined in: [web-searcher/src/searcher.ts:10](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/searcher.ts#L10)
|
|
12
12
|
|
|
13
13
|
Constructor definition for Searcher subclasses.
|
|
14
14
|
|
package/package.json
CHANGED
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
{
|
|
2
2
|
"name": "@isdk/web-searcher",
|
|
3
|
-
"version": "0.1.
|
|
3
|
+
"version": "0.1.3",
|
|
4
4
|
"description": "A high-level framework for building search engine scrapers, supporting multi-page navigation, session persistence, and result standardization.",
|
|
5
5
|
"license": "MIT",
|
|
6
6
|
"author": "Riceball LEE <snowyu.lee@gmail.com>",
|