@isdk/web-searcher 0.1.2 → 0.1.3

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.cn.md CHANGED
@@ -42,27 +42,26 @@ console.log(results);
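
  Since `WebSearcher` extends `FetchSession`, you can instantiate it to keep cookies and storage across multiple requests. This is useful for searches that require login, or for avoiding anti-bot detection by simulating human behavior.

- **Configuration precedence:**
- When creating a session, options are merged in the following order:
+ ### 🛡️ Core Principle: Template is Law

- 1. **Template Default**: defined in the WebSearcher class (structural options have the highest priority).
- 2. **User Options**: passed to the constructor (can fill in missing defaults, or override where allowed).
+ The `template` defined in a `WebSearcher` subclass is the authoritative "blueprint".

- *Note: if the template sets `engine: 'auto'` (the default), the user-provided `engine` option is respected.*
+ - **Template priority**: if the template defines a property (such as `engine: 'browser'` or specific `headers`), that value is **locked** and cannot be overridden by user options. This keeps the scraping logic stable.
+ - **User flexibility**: properties **not** explicitly locked in the template (such as `proxy`, `timeoutMs`, or custom variables) can be set freely by the user in the constructor or the `search()` method.

  ```typescript
  // Create a persistent session
  const google = new GoogleSearcher({
- headless: false, // Override default options (e.g., show the browser)
+ headless: false, // Can be overridden if not locked in the template
  proxy: 'http://my-proxy:8080',
- timeoutMs: 30000 // Set a global timeout for requests
+ timeoutMs: 30000 // Valid (assuming the GoogleSearcher template does not explicitly set timeoutMs)
  });

  try {
  // First query
  // You can also pass runtime options to override session defaults or inject variables
  const results1 = await google.search('term A', {
- timeoutMs: 60000, // Override the timeout for this search only
+ timeoutMs: 60000, // Override the timeout for this search
  extraParam: 'value' // Can be used in the template via ${extraParam}
  });
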
@@ -174,22 +173,41 @@ protected override async transform(outputs: Record<string, any>) {
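
  ## 🧠 Advanced Concepts

- ### Auto-Pagination & Filtering
+ ### Auto-Pagination: How `limit` and `maxPages` Relate

- `WebSearcher` is smart. If you request `limit: 10` but the first page returns only 5 results (or your `transform` filters some out), it automatically fetches the next page until the limit is met.
+ `WebSearcher` is designed to be result-oriented. When you call `search()`, you only specify how many results you want, and the searcher handles the pagination automatically.
+
+ - **`limit`**: the total number of results you want to get.
+ - **`maxPages`**: a safety threshold. It caps the number of pages (pagination loop iterations) the searcher may fetch while trying to satisfy `limit`.
+
+ **How they work together (example):**
+ If you request `{ limit: 50 }` but each page has only 5 results:
+
+ 1. The searcher fetches page 1 (5 results).
+ 2. It sees that `5 < 50`, so it automatically fetches page 2.
+ 3. The loop continues until 50 results are collected **or** the `maxPages` limit is reached (10 pages by default).
+
+ This mechanism prevents endless fetching caused by a broken "Next" selector or an engine stuck in a loop, protecting your system resources.

  ### User-defined Transforms

  Users can provide their own `transform` when calling `search`. It runs **after** the engine's built-in transform.

+ This is especially powerful for **filtering out ads** or irrelevant content. If the user filters out some results, the auto-pagination logic **kicks in automatically** to fetch more pages, so that the final result list both meets the `limit` and contains only valid entries.
+
  ```typescript
  await google.search('test', {
- transform: (results) => results.filter(r => r.url.endsWith('.pdf'))
+ limit: 20,
+ // Example: filter out sponsored results (ads) and keep only PDFs
+ transform: (results) => {
+ return results.filter(r => {
+ const isAd = r.isSponsored || r.url.includes('googleadservices.com');
+ return !isAd && r.url.endsWith('.pdf');
+ });
+ }
  });
  ```

- If the user filters out results, the auto-pagination logic kicks in to fetch more pages until the requested limit is met.
-
  ### Standardized Search Options

  When calling `search()`, you can provide standardized options, which the search engine maps to its specific parameters: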
package/README.md CHANGED
@@ -42,27 +42,26 @@ console.log(results);

  Since `WebSearcher` extends `FetchSession`, you can instantiate it to keep cookies and storage alive across multiple requests. This is useful for authenticated searches or avoiding bot detection by behaving like a human.

- **Configuration Precedence:**
- When creating a session, options are merged in the following order:
+ ### 🛡️ Core Principle: Template is Law

- 1. **Template Default**: Defined in the WebSearcher class (highest priority for structural options).
- 2. **User Options**: Passed to the constructor (can fill missing defaults or override if allowed).
+ The `template` defined in the `WebSearcher` subclass acts as the authoritative "blueprint".

- *Note: If the template sets `engine: 'auto'` (default), user-provided `engine` option will be respected.*
+ - **Template Priority**: If the template defines a property (e.g., `engine: 'browser'`, `headers`), that value is **locked** and cannot be overridden by user options. This ensures engine stability.
+ - **User Flexibility**: Properties **not** explicitly defined in the template (such as `proxy`, `timeoutMs`, or custom variables) can be freely set by the user in the constructor or `search()` method.

  ```typescript
  // Create a persistent session
  const google = new GoogleSearcher({
- headless: false, // Override default options (e.g., show browser)
+ headless: false, // Override if not locked in template
  proxy: 'http://my-proxy:8080',
- timeoutMs: 30000 // Set a global timeout for requests
+ timeoutMs: 30000 // Set a global timeout (valid if template doesn't define it)
  });

  try {
  // First query
  // You can also pass runtime options to override session defaults or inject variables
  const results1 = await google.search('term A', {
- timeoutMs: 60000, // Override timeout just for this search
+ timeoutMs: 60000, // Override session timeout just for this search
  extraParam: 'value' // Can be used in template as ${extraParam}
  });
 
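The "template wins, user fills the gaps" rule can be sketched without dependencies. This is a minimal sketch, not the package's code. `fillDefaults` is a hypothetical stand-in for lodash's `defaultsDeep`, which the 0.1.3 bundle calls as `defaultsDeep({}, template, options)` inside `createContext()`. The stand-in is simplified and treats arrays as opaque values. The sketch also reproduces one detail that the new README text no longer mentions. Reading the minified `createContext()`, a user-supplied `engine` is still honored when the template's `engine` is unset or `'auto'`. The option values are illustrative.

```typescript
type Options = { [key: string]: unknown };

// Treats any non-null, non-array object as a nested record to merge into.
function isRecord(value: unknown): value is Options {
  return typeof value === 'object' && value !== null && !Array.isArray(value);
}

// Simplified stand-in for lodash's defaultsDeep: copies a key from `source`
// only where `target` is undefined, recursing into nested records.
// Arrays are treated as opaque values (lodash would merge them index by index).
function fillDefaults(target: Options, source: Options): Options {
  for (const [key, value] of Object.entries(source)) {
    const current = target[key];
    if (current === undefined) {
      target[key] = isRecord(value) ? fillDefaults({}, value) : value;
    } else if (isRecord(current) && isRecord(value)) {
      fillDefaults(current, value);
    }
  }
  return target;
}

// Template first, so its values win; user options only fill the gaps.
// Exception observed in the 0.1.3 createContext(): a user `engine` applies
// when the template's engine is unset or 'auto'.
function mergeSessionOptions(template: Options, user: Options): Options {
  const merged = fillDefaults(fillDefaults({}, template), user);
  if ((!template.engine || template.engine === 'auto') && user.engine) {
    merged.engine = user.engine;
  }
  return merged;
}

// Illustrative values only.
const template = { engine: 'browser', browser: { headless: false } };
const merged = mergeSessionOptions(template, {
  engine: 'http',
  proxy: 'http://my-proxy:8080',
  timeoutMs: 30000,
  browser: { headless: true },
});
// merged.engine stays 'browser', proxy and timeoutMs are filled in,
// and merged.browser.headless stays false.
```

Cloning the template into a fresh object first keeps the subclass's template untouched across sessions, which matters if the `template` getter ever returns a shared object.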
@@ -172,24 +171,43 @@ protected override async transform(outputs: Record<string, any>) {
  }
  ```

- ## 🧠 Advanced Concepts
+ ### 🧠 Advanced Concepts

- ### Auto-Pagination & Filtering
+ ### Auto-Pagination: `limit` vs `maxPages`

- The `WebSearcher` is smart. If you request `limit: 10`, but the first page only returns 5 results (or if your `transform` filters out results), it will automatically fetch the next page until the limit is met.
+ The `WebSearcher` is designed to be result-oriented. When you call `search()`, you specify how many results you want, and the searcher handles the pagination logic.
+
+ - **`limit`**: Your target number of total results.
+ - **`maxPages`**: The safety threshold. It limits how many pages (fetch cycles) the searcher is allowed to navigate to satisfy your `limit`.
+
+ **Example Logic:**
+ If you request `{ limit: 50 }` but each page only has 5 results:
+
+ 1. The searcher fetches page 1 (5 results).
+ 2. It sees `5 < 50`, so it fetches page 2.
+ 3. It continues until it has 50 results **OR** it reaches `maxPages` (default 10).
+
+ This prevents infinite loops if the "Next" button selector is broken or if the search engine keeps returning the same results.

  ### User-defined Transforms

  Users can provide their own `transform` when calling `search`. This runs **after** the engine's built-in transform.

+ This is extremely powerful for **filtering out ads** or irrelevant content. If the user filters out results, the auto-pagination logic will automatically kick in to fetch more pages to ensure the final result list meets your requested `limit` with only valid entries.
+
  ```typescript
  await google.search('test', {
- transform: (results) => results.filter(r => r.url.endsWith('.pdf'))
+ limit: 20,
+ // Example: Filter out sponsored results and only keep PDFs
+ transform: (results) => {
+ return results.filter(r => {
+ const isAd = r.isSponsored || r.url.includes('googleadservices.com');
+ return !isAd && r.url.endsWith('.pdf');
+ });
+ }
  });
  ```

- If the user filters out results, the auto-pagination logic will kick in to fetch more pages to meet the requested limit.
-
  ### Standardized Search Options

  When calling `search()`, you can provide standardized options that the search engine will map to specific parameters:
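The auto-pagination loop and the refill-after-filtering behavior can be modeled synchronously. This is a sketch under stated assumptions, not the library's `search()`:

- `fetchPage` is a hypothetical stand-in for one `executeAll()` round trip.
- The user transform receives only the results; the real one also gets a `{ query, page, limit }` context.
- `PaginationConfig.maxPages` is left out.

Reading the 0.1.3 bundle, the loop stops as soon as a page yields zero results after transforms. A filter that empties a whole page (for example, a page with no PDFs) therefore ends pagination early instead of fetching more. The sketch keeps that behavior. Engines without a `pagination` getter fetch only one page, which is also not modeled here.

```typescript
// Minimal result shape for this sketch.
interface Result {
  url: string;
  isSponsored?: boolean;
}

interface LoopOptions {
  limit?: number;
  maxPages?: number;
  transform?: (results: Result[]) => Result[];
}

// Hypothetical page source: raw results for a zero-based page index.
type FetchPage = (pageIndex: number) => Result[];

function collect(fetchPage: FetchPage, options: LoopOptions = {}) {
  const limit = options.limit || 10;
  const maxPages = options.maxPages || 10;
  const collected: Result[] = [];
  let pagesFetched = 0;
  while (collected.length < limit) {
    let page = fetchPage(pagesFetched);
    pagesFetched++;
    if (options.transform) page = options.transform(page);
    if (page.length === 0) break; // an empty page (after transforms) ends the loop
    collected.push(...page);
    if (collected.length >= limit) break;
    if (pagesFetched >= maxPages) break; // safety cap
  }
  return { results: collected.slice(0, limit), pagesFetched };
}

// Five results per page, the first two of each marked as sponsored.
const fivePerPage: FetchPage = (i) =>
  Array.from({ length: 5 }, (_, j) => ({
    url: `https://example.com/doc-${i}-${j}.pdf`,
    isSponsored: j < 2,
  }));

const capped = collect(fivePerPage, { limit: 50, maxPages: 4 });
// capped.results.length === 20, capped.pagesFetched === 4

const filtered = collect(fivePerPage, {
  limit: 6,
  transform: (rs) => rs.filter((r) => !r.isSponsored),
});
// filtered.results.length === 6, filtered.pagesFetched === 2
```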
package/dist/index.d.mts CHANGED
@@ -15,7 +15,17 @@ interface StandardSearchResult {
  snippet?: string;
  /** An optional image URL associated with the result. */
  image?: string;
- /** Allows for engine-specific extra fields (e.g., rank, author, date). */
+ /** The date the result was published or last updated. */
+ date?: string | Date;
+ /** The author or source name of the result. */
+ author?: string;
+ /** The favicon URL of the source website. */
+ favicon?: string;
+ /** The rank or position of the result (usually 1-indexed). */
+ rank?: number;
+ /** The source website name (e.g., 'GitHub', 'StackOverflow'). */
+ source?: string;
+ /** Allows for engine-specific extra fields (e.g., siteIcon, category). */
  [key: string]: any;
  }
  /**
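The new optional fields on `StandardSearchResult` would typically be filled by an engine's `transform`. The hypothetical `enrich` helper below is an illustration only and is not part of the package. It makes the following assumptions:

- `url` and `title` are required strings; they sit outside this hunk.
- Results are numbered 1-indexed, as the `rank` doc comment suggests.
- The URL hostname is used as a fallback for `source`.
- Parseable date strings are converted to `Date`.

```typescript
// Local copy of the 0.1.3 fields; `url` and `title` are assumed required.
interface StandardSearchResult {
  url: string;
  title: string;
  snippet?: string;
  image?: string;
  date?: string | Date;
  author?: string;
  favicon?: string;
  rank?: number;
  source?: string;
  [key: string]: any;
}

// Hypothetical helper, not part of @isdk/web-searcher.
// `offset` is the number of results that precede this page.
function enrich(results: StandardSearchResult[], offset = 0): StandardSearchResult[] {
  return results.map((result, i) => {
    const out: StandardSearchResult = { ...result, rank: result.rank ?? offset + i + 1 };
    if (out.source === undefined) {
      try {
        out.source = new URL(result.url).hostname;
      } catch {
        // Relative or malformed URLs: leave `source` unset.
      }
    }
    if (typeof out.date === 'string') {
      const parsed = new Date(out.date);
      if (!Number.isNaN(parsed.getTime())) out.date = parsed;
    }
    return out;
  });
}

const enriched = enrich(
  [
    { url: 'https://github.com/isdk/web-searcher.js', title: 'web-searcher.js', date: '2024-05-01' },
    { url: '/url?q=relative', title: 'Unresolved link', rank: 7 },
  ],
  10,
);
// enriched[0].rank === 11 and enriched[0].source === 'github.com';
// enriched[1] keeps rank 7 and gets no source.
```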
@@ -52,6 +62,16 @@ interface PaginationConfig {
  * Required if type is 'click-next'.
  */
  nextButtonSelector?: string;
+ /**
+ * The safety threshold for the maximum number of pages to fetch automatically
+ * in a single search call.
+ *
+ * Even if the requested `limit` of results hasn't been reached, the searcher
+ * will stop after this many pages to prevent infinite loops or excessive API usage.
+ *
+ * @default 10
+ */
+ maxPages?: number;
  }
  /**
  * Context object passed to the transform function.
@@ -80,6 +100,15 @@ type SafeSearchLevel = 'off' | 'moderate' | 'strict';
  interface SearchOptions {
  /** The maximum number of results to retrieve. */
  limit?: number;
+ /**
+ * The maximum number of pages (fetch cycles) allowed to reach the requested `limit`.
+ *
+ * This is a safety guard. If the `limit` is high but each page has few results,
+ * the searcher will stop once this page count is reached.
+ *
+ * If not provided, it defaults to the value in `PaginationConfig` or 10.
+ */
+ maxPages?: number;
  /**
  * Date range for the search results.
  * Default: 'all'
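The page budget and the per-page offset can be sketched from the minified `search()` in this release. `maxPages` is taken from the call's options first, then from `PaginationConfig.maxPages`, and falls back to 10. Each page's offset is `startValue + pageIndex * increment`. The `...Like` interfaces are local stand-ins for the published types. Because the bundle uses `||`, an explicit `maxPages: 0` falls through to the next source rather than disabling pagination.

```typescript
// Local stand-ins for the relevant fields of PaginationConfig and SearchOptions.
interface PaginationLike {
  startValue?: number;
  increment?: number;
  maxPages?: number;
}
interface SearchOptionsLike {
  maxPages?: number;
}

// Per-call option first, then the engine's PaginationConfig, then 10.
// `||` mirrors the bundle, so 0 falls through to the next source.
function resolveMaxPages(options: SearchOptionsLike, pagination?: PaginationLike): number {
  return options.maxPages || pagination?.maxPages || 10;
}

// Offsets for every page the loop may request (before `limit` cuts it short).
function plannedOffsets(options: SearchOptionsLike, pagination?: PaginationLike): number[] {
  const start = pagination?.startValue ?? 0;
  const step = pagination?.increment ?? 1;
  const pages = resolveMaxPages(options, pagination);
  return Array.from({ length: pages }, (_, pageIndex) => start + pageIndex * step);
}

// GoogleSearcher's pagination getter returns startValue 0 and increment 10.
const googleOffsets = plannedOffsets({ maxPages: 3 }, { startValue: 0, increment: 10 });
// googleOffsets is [0, 10, 20], i.e. the values substituted into `start=${offset}`
```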
package/dist/index.d.ts CHANGED
@@ -15,7 +15,17 @@ interface StandardSearchResult {
  snippet?: string;
  /** An optional image URL associated with the result. */
  image?: string;
- /** Allows for engine-specific extra fields (e.g., rank, author, date). */
+ /** The date the result was published or last updated. */
+ date?: string | Date;
+ /** The author or source name of the result. */
+ author?: string;
+ /** The favicon URL of the source website. */
+ favicon?: string;
+ /** The rank or position of the result (usually 1-indexed). */
+ rank?: number;
+ /** The source website name (e.g., 'GitHub', 'StackOverflow'). */
+ source?: string;
+ /** Allows for engine-specific extra fields (e.g., siteIcon, category). */
  [key: string]: any;
  }
  /**
@@ -52,6 +62,16 @@ interface PaginationConfig {
  * Required if type is 'click-next'.
  */
  nextButtonSelector?: string;
+ /**
+ * The safety threshold for the maximum number of pages to fetch automatically
+ * in a single search call.
+ *
+ * Even if the requested `limit` of results hasn't been reached, the searcher
+ * will stop after this many pages to prevent infinite loops or excessive API usage.
+ *
+ * @default 10
+ */
+ maxPages?: number;
  }
  /**
  * Context object passed to the transform function.
@@ -80,6 +100,15 @@ type SafeSearchLevel = 'off' | 'moderate' | 'strict';
  interface SearchOptions {
  /** The maximum number of results to retrieve. */
  limit?: number;
+ /**
+ * The maximum number of pages (fetch cycles) allowed to reach the requested `limit`.
+ *
+ * This is a safety guard. If the `limit` is high but each page has few results,
+ * the searcher will stop once this page count is reached.
+ *
+ * If not provided, it defaults to the value in `PaginationConfig` or 10.
+ */
+ maxPages?: number;
  /**
  * Date range for the search results.
  * Default: 'all'
package/dist/index.js CHANGED
@@ -1 +1 @@
- "use strict";var t,e=Object.defineProperty,r=Object.getOwnPropertyDescriptor,s=Object.getOwnPropertyNames,a=Object.prototype.hasOwnProperty,i={};((t,r)=>{for(var s in r)e(t,s,{get:r[s],enumerable:!0})})(i,{GoogleSearcher:()=>h,WebSearcher:()=>f}),module.exports=(t=i,((t,i,n,o)=>{if(i&&"object"==typeof i||"function"==typeof i)for(let c of s(i))a.call(t,c)||c===n||e(t,c,{get:()=>i[c],enumerable:!(o=r(i,c))||o.enumerable});return t})(e({},"__esModule",{value:!0}),t));var n=require("@isdk/web-fetcher"),o=require("custom-factory"),c=require("lodash-es");function l(t,e){if("string"==typeof t)return t.replace(/\$\{(.*?)\}/g,(t,r)=>{const s=e[r.trim()];return void 0!==s?String(s):""});if(Array.isArray(t))return t.map(t=>l(t,e));if((0,c.isPlainObject)(t)){const r={};for(const s in t)Object.prototype.hasOwnProperty.call(t,s)&&(r[s]=l(t[s],e));return r}return t}var u=require("lodash-es"),f=class extends n.FetchSession{static async search(t,e,r={}){const s=this.createObject(t,r);if(!s)throw new Error(`Search engine not found: ${t}`);try{return await s.search(e,r)}finally{await s.dispose()}}get pagination(){}createContext(t=this.options){const e=this.template,r=(0,u.defaultsDeep)({},e,t);return e.engine&&"auto"!==e.engine||!t.engine||(r.engine=t.engine),super.createContext(r)}async search(t,e={}){const r=e.limit||10,s=[];let a=0;const i=this.pagination?.startValue??0,n=this.pagination?.increment??1;for(;s.length<r;){const o=this.formatOptions(e),c=i+a*n,f={...e,...o,query:t,page:a+i,offset:c,limit:r},h=l(this.template,f),m=(0,u.defaultsDeep)({},h,e),d=[];if(0===a||"url-param"===this.pagination?.type?m.url&&d.push({id:"goto",params:{url:m.url}}):"click-next"===this.pagination?.type&&this.pagination.nextButtonSelector&&(d.push({id:"click",params:{selector:this.pagination.nextButtonSelector}}),d.push({id:"waitFor",params:{networkIdle:!0,ms:500}})),m.actions){const 
t=m.actions.filter(t=>!(d.length>0&&"goto"===d[0].id&&"goto"===t.id));d.push(...t)}m.engine&&this.context.engine!==m.engine&&m.engine;const{outputs:g}=await this.executeAll(d),p={query:t,page:a,limit:e.limit};let w=[];if(w=await this.transform(g,p),e.transform&&(w=await e.transform(w,p)),!w||0===w.length)break;if(s.push(...w),s.length>=r||!this.pagination)break;if(a++,a>10)break}return s.slice(0,r)}async transform(t,e){return t.results||[]}formatOptions(t){return{...t}}};f._isFactory=!1,(0,o.addBaseFactoryAbility)(f),f.prototype.name="Searcher";var h=class extends f{get template(){return{engine:"browser",browser:{headless:!1},url:"https://www.google.com/search?q=${query}&start=${offset}&tbs=${tbs}&tbm=${tbm}&gl=${gl}&hl=${hl}&safe=${safe}",actions:[{id:"extract",storeAs:"results",params:{type:"array",selector:"#main #search",items:{url:{selector:"a:has(h3)",attribute:"href",required:!0},title:{selector:"a:has(h3) h3",required:!0,mode:"innerText"},snippet:{selector:"div[style*='-webkit-line-clamp']",type:"html"}}}}]}}get pagination(){return{type:"url-param",paramName:"start",startValue:0,increment:10}}formatOptions(t){const e={};if(t.timeRange)if("string"==typeof t.timeRange){const r={day:"qdr:d",week:"qdr:w",month:"qdr:m",year:"qdr:y"};r[t.timeRange]&&(e.tbs=r[t.timeRange])}else{const r=new Date(t.timeRange.from),s=t.timeRange.to?new Date(t.timeRange.to):new Date;if(!isNaN(r.getTime())&&!isNaN(s.getTime())){const t=t=>`${t.getMonth()+1}/${t.getDate()}/${t.getFullYear()}`;e.tbs=`cdr:1,cd_min:${t(r)},cd_max:${t(s)}`}}if(t.category){const r={images:"isch",videos:"vid",news:"nws"};r[t.category]&&(e.tbm=r[t.category])}return t.region&&(e.gl=t.region),t.language&&(e.hl=t.language),t.safeSearch&&("strict"===t.safeSearch?e.safe="active":"off"===t.safeSearch&&(e.safe="images")),e}async transform(t){const e=t.results||[];return Array.isArray(e)?e.map(t=>{if(t.url&&t.url.startsWith("/url?q="))try{const e=new 
URL(t.url,"https://www.google.com").searchParams.get("q");e&&(t.url=e)}catch(t){}return t}):[]}};h.alias=["google"];
+ "use strict";var t,e=Object.defineProperty,r=Object.getOwnPropertyDescriptor,s=Object.getOwnPropertyNames,a=Object.prototype.hasOwnProperty,i={};((t,r)=>{for(var s in r)e(t,s,{get:r[s],enumerable:!0})})(i,{GoogleSearcher:()=>f,WebSearcher:()=>h}),module.exports=(t=i,((t,i,n,o)=>{if(i&&"object"==typeof i||"function"==typeof i)for(let c of s(i))a.call(t,c)||c===n||e(t,c,{get:()=>i[c],enumerable:!(o=r(i,c))||o.enumerable});return t})(e({},"__esModule",{value:!0}),t));var n=require("@isdk/web-fetcher"),o=require("custom-factory"),c=require("lodash-es");function l(t,e){if("string"==typeof t)return t.replace(/\$\{(.*?)\}/g,(t,r)=>{const s=e[r.trim()];return void 0!==s?String(s):""});if(Array.isArray(t))return t.map(t=>l(t,e));if((0,c.isPlainObject)(t)){const r={};for(const s in t)Object.prototype.hasOwnProperty.call(t,s)&&(r[s]=l(t[s],e));return r}return t}var u=require("lodash-es"),h=class extends n.FetchSession{static async search(t,e,r={}){const s=this.createObject(t,r);if(!s)throw new Error(`Search engine not found: ${t}`);try{return await s.search(e,r)}finally{await s.dispose()}}get pagination(){}createContext(t=this.options){const e=this.template,r=(0,u.defaultsDeep)({},e,t);return e.engine&&"auto"!==e.engine||!t.engine||(r.engine=t.engine),super.createContext(r)}async search(t,e={}){const r=e.limit||10,s=[];let a=0;const i=this.pagination?.startValue??0,n=this.pagination?.increment??1,o=e.maxPages||this.pagination?.maxPages||10;for(;s.length<r;){const c=this.formatOptions(e),h=i+a*n,f={...e,...c,query:t,page:a+i,offset:h,limit:r},m=l(this.template,f),d=(0,u.defaultsDeep)({},m,e),g=[];if(0===a||"url-param"===this.pagination?.type?d.url&&g.push({id:"goto",params:{url:d.url}}):"click-next"===this.pagination?.type&&this.pagination.nextButtonSelector&&(g.push({id:"click",params:{selector:this.pagination.nextButtonSelector}}),g.push({id:"waitFor",params:{networkIdle:!0,ms:500}})),d.actions){const 
t=d.actions.filter(t=>!(g.length>0&&"goto"===g[0].id&&"goto"===t.id));g.push(...t)}d.engine&&this.context.engine!==d.engine&&d.engine;const{outputs:p}=await this.executeAll(g),w={query:t,page:a,limit:e.limit};let y=[];if(y=await this.transform(p,w),e.transform&&(y=await e.transform(y,w)),!y||0===y.length)break;if(s.push(...y),s.length>=r||!this.pagination)break;if(a++,a>=o)break}return s.slice(0,r)}async transform(t,e){return t.results||[]}formatOptions(t){return{...t}}};h._isFactory=!1,(0,o.addBaseFactoryAbility)(h),h.prototype.name="Searcher";var f=class extends h{get template(){return{engine:"browser",browser:{headless:!1},url:"https://www.google.com/search?q=${query}&start=${offset}&tbs=${tbs}&tbm=${tbm}&gl=${gl}&hl=${hl}&safe=${safe}",actions:[{id:"extract",storeAs:"results",params:{type:"array",selector:"#main #search",items:{url:{selector:"a:has(h3)",attribute:"href",required:!0},title:{selector:"a:has(h3) h3",required:!0,mode:"innerText"},snippet:{selector:"div[style*='-webkit-line-clamp']",type:"html"}}}}]}}get pagination(){return{type:"url-param",paramName:"start",startValue:0,increment:10}}formatOptions(t){const e={};if(t.timeRange)if("string"==typeof t.timeRange){const r={day:"qdr:d",week:"qdr:w",month:"qdr:m",year:"qdr:y"};r[t.timeRange]&&(e.tbs=r[t.timeRange])}else{const r=new Date(t.timeRange.from),s=t.timeRange.to?new Date(t.timeRange.to):new Date;if(!isNaN(r.getTime())&&!isNaN(s.getTime())){const t=t=>`${t.getMonth()+1}/${t.getDate()}/${t.getFullYear()}`;e.tbs=`cdr:1,cd_min:${t(r)},cd_max:${t(s)}`}}if(t.category){const r={images:"isch",videos:"vid",news:"nws"};r[t.category]&&(e.tbm=r[t.category])}return t.region&&(e.gl=t.region),t.language&&(e.hl=t.language),t.safeSearch&&("strict"===t.safeSearch?e.safe="active":"off"===t.safeSearch&&(e.safe="images")),e}async transform(t){const e=t.results||[];return Array.isArray(e)?e.map(t=>{if(t.url&&t.url.startsWith("/url?q="))try{const e=new 
URL(t.url,"https://www.google.com").searchParams.get("q");e&&(t.url=e)}catch(t){}return t}):[]}};f.alias=["google"];
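This bundle does more than read `maxPages`. It also changes the loop's exit check from `if(a++,a>10)break` to `if(a++,a>=o)break`, where `a` is the zero-based page index and `o` is the resolved `maxPages`. The sketch below only counts loop iterations under each check. It assumes every page returns results and `limit` is never reached, so only the page cap applies. On that reading, 0.1.2's hard-coded cap allowed 11 pages, while 0.1.3 stops after exactly `maxPages`.

```typescript
// Counts fetch iterations until the exit check fires; assumes every page
// returns results and `limit` is never reached, so only the page cap applies.
function countPagesFetched(shouldStop: (pageIndex: number) => boolean): number {
  let pageIndex = 0; // `a` in the minified bundle
  let fetched = 0;
  for (;;) {
    fetched++; // one executeAll() round trip per iteration
    pageIndex++;
    if (shouldStop(pageIndex)) break;
  }
  return fetched;
}

const pagesIn012 = countPagesFetched((a) => a > 10); // 0.1.2: if(a++,a>10)break
const defaultMaxPages = 10; // 0.1.3 default when no maxPages is given
const pagesIn013 = countPagesFetched((a) => a >= defaultMaxPages); // 0.1.3: if(a++,a>=o)break

console.log(pagesIn012, pagesIn013); // prints "11 10"
```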
package/dist/index.mjs CHANGED
@@ -1 +1 @@
- import{FetchSession as t}from"@isdk/web-fetcher";import{addBaseFactoryAbility as r}from"custom-factory";import{isPlainObject as e}from"lodash-es";function s(t,r){if("string"==typeof t)return t.replace(/\$\{(.*?)\}/g,(t,e)=>{const s=r[e.trim()];return void 0!==s?String(s):""});if(Array.isArray(t))return t.map(t=>s(t,r));if(e(t)){const e={};for(const a in t)Object.prototype.hasOwnProperty.call(t,a)&&(e[a]=s(t[a],r));return e}return t}import{defaultsDeep as a}from"lodash-es";var i=class extends t{static async search(t,r,e={}){const s=this.createObject(t,e);if(!s)throw new Error(`Search engine not found: ${t}`);try{return await s.search(r,e)}finally{await s.dispose()}}get pagination(){}createContext(t=this.options){const r=this.template,e=a({},r,t);return r.engine&&"auto"!==r.engine||!t.engine||(e.engine=t.engine),super.createContext(e)}async search(t,r={}){const e=r.limit||10,i=[];let o=0;const n=this.pagination?.startValue??0,c=this.pagination?.increment??1;for(;i.length<e;){const l=this.formatOptions(r),m=n+o*c,h={...r,...l,query:t,page:o+n,offset:m,limit:e},f=s(this.template,h),u=a({},f,r),p=[];if(0===o||"url-param"===this.pagination?.type?u.url&&p.push({id:"goto",params:{url:u.url}}):"click-next"===this.pagination?.type&&this.pagination.nextButtonSelector&&(p.push({id:"click",params:{selector:this.pagination.nextButtonSelector}}),p.push({id:"waitFor",params:{networkIdle:!0,ms:500}})),u.actions){const t=u.actions.filter(t=>!(p.length>0&&"goto"===p[0].id&&"goto"===t.id));p.push(...t)}u.engine&&this.context.engine!==u.engine&&u.engine;const{outputs:d}=await this.executeAll(p),w={query:t,page:o,limit:r.limit};let g=[];if(g=await this.transform(d,w),r.transform&&(g=await r.transform(g,w)),!g||0===g.length)break;if(i.push(...g),i.length>=e||!this.pagination)break;if(o++,o>10)break}return i.slice(0,e)}async transform(t,r){return t.results||[]}formatOptions(t){return{...t}}};i._isFactory=!1,r(i),i.prototype.name="Searcher";var o=class extends i{get 
template(){return{engine:"browser",browser:{headless:!1},url:"https://www.google.com/search?q=${query}&start=${offset}&tbs=${tbs}&tbm=${tbm}&gl=${gl}&hl=${hl}&safe=${safe}",actions:[{id:"extract",storeAs:"results",params:{type:"array",selector:"#main #search",items:{url:{selector:"a:has(h3)",attribute:"href",required:!0},title:{selector:"a:has(h3) h3",required:!0,mode:"innerText"},snippet:{selector:"div[style*='-webkit-line-clamp']",type:"html"}}}}]}}get pagination(){return{type:"url-param",paramName:"start",startValue:0,increment:10}}formatOptions(t){const r={};if(t.timeRange)if("string"==typeof t.timeRange){const e={day:"qdr:d",week:"qdr:w",month:"qdr:m",year:"qdr:y"};e[t.timeRange]&&(r.tbs=e[t.timeRange])}else{const e=new Date(t.timeRange.from),s=t.timeRange.to?new Date(t.timeRange.to):new Date;if(!isNaN(e.getTime())&&!isNaN(s.getTime())){const t=t=>`${t.getMonth()+1}/${t.getDate()}/${t.getFullYear()}`;r.tbs=`cdr:1,cd_min:${t(e)},cd_max:${t(s)}`}}if(t.category){const e={images:"isch",videos:"vid",news:"nws"};e[t.category]&&(r.tbm=e[t.category])}return t.region&&(r.gl=t.region),t.language&&(r.hl=t.language),t.safeSearch&&("strict"===t.safeSearch?r.safe="active":"off"===t.safeSearch&&(r.safe="images")),r}async transform(t){const r=t.results||[];return Array.isArray(r)?r.map(t=>{if(t.url&&t.url.startsWith("/url?q="))try{const r=new URL(t.url,"https://www.google.com").searchParams.get("q");r&&(t.url=r)}catch(t){}return t}):[]}};o.alias=["google"];export{o as GoogleSearcher,i as WebSearcher};
+ import{FetchSession as t}from"@isdk/web-fetcher";import{addBaseFactoryAbility as r}from"custom-factory";import{isPlainObject as e}from"lodash-es";function s(t,r){if("string"==typeof t)return t.replace(/\$\{(.*?)\}/g,(t,e)=>{const s=r[e.trim()];return void 0!==s?String(s):""});if(Array.isArray(t))return t.map(t=>s(t,r));if(e(t)){const e={};for(const a in t)Object.prototype.hasOwnProperty.call(t,a)&&(e[a]=s(t[a],r));return e}return t}import{defaultsDeep as a}from"lodash-es";var i=class extends t{static async search(t,r,e={}){const s=this.createObject(t,e);if(!s)throw new Error(`Search engine not found: ${t}`);try{return await s.search(r,e)}finally{await s.dispose()}}get pagination(){}createContext(t=this.options){const r=this.template,e=a({},r,t);return r.engine&&"auto"!==r.engine||!t.engine||(e.engine=t.engine),super.createContext(e)}async search(t,r={}){const e=r.limit||10,i=[];let o=0;const n=this.pagination?.startValue??0,c=this.pagination?.increment??1,h=r.maxPages||this.pagination?.maxPages||10;for(;i.length<e;){const l=this.formatOptions(r),m=n+o*c,f={...r,...l,query:t,page:o+n,offset:m,limit:e},u=s(this.template,f),p=a({},u,r),d=[];if(0===o||"url-param"===this.pagination?.type?p.url&&d.push({id:"goto",params:{url:p.url}}):"click-next"===this.pagination?.type&&this.pagination.nextButtonSelector&&(d.push({id:"click",params:{selector:this.pagination.nextButtonSelector}}),d.push({id:"waitFor",params:{networkIdle:!0,ms:500}})),p.actions){const t=p.actions.filter(t=>!(d.length>0&&"goto"===d[0].id&&"goto"===t.id));d.push(...t)}p.engine&&this.context.engine!==p.engine&&p.engine;const{outputs:w}=await this.executeAll(d),g={query:t,page:o,limit:r.limit};let y=[];if(y=await this.transform(w,g),r.transform&&(y=await r.transform(y,g)),!y||0===y.length)break;if(i.push(...y),i.length>=e||!this.pagination)break;if(o++,o>=h)break}return i.slice(0,e)}async transform(t,r){return t.results||[]}formatOptions(t){return{...t}}};i._isFactory=!1,r(i),i.prototype.name="Searcher";var 
o=class extends i{get template(){return{engine:"browser",browser:{headless:!1},url:"https://www.google.com/search?q=${query}&start=${offset}&tbs=${tbs}&tbm=${tbm}&gl=${gl}&hl=${hl}&safe=${safe}",actions:[{id:"extract",storeAs:"results",params:{type:"array",selector:"#main #search",items:{url:{selector:"a:has(h3)",attribute:"href",required:!0},title:{selector:"a:has(h3) h3",required:!0,mode:"innerText"},snippet:{selector:"div[style*='-webkit-line-clamp']",type:"html"}}}}]}}get pagination(){return{type:"url-param",paramName:"start",startValue:0,increment:10}}formatOptions(t){const r={};if(t.timeRange)if("string"==typeof t.timeRange){const e={day:"qdr:d",week:"qdr:w",month:"qdr:m",year:"qdr:y"};e[t.timeRange]&&(r.tbs=e[t.timeRange])}else{const e=new Date(t.timeRange.from),s=t.timeRange.to?new Date(t.timeRange.to):new Date;if(!isNaN(e.getTime())&&!isNaN(s.getTime())){const t=t=>`${t.getMonth()+1}/${t.getDate()}/${t.getFullYear()}`;r.tbs=`cdr:1,cd_min:${t(e)},cd_max:${t(s)}`}}if(t.category){const e={images:"isch",videos:"vid",news:"nws"};e[t.category]&&(r.tbm=e[t.category])}return t.region&&(r.gl=t.region),t.language&&(r.hl=t.language),t.safeSearch&&("strict"===t.safeSearch?r.safe="active":"off"===t.safeSearch&&(r.safe="images")),r}async transform(t){const r=t.results||[];return Array.isArray(r)?r.map(t=>{if(t.url&&t.url.startsWith("/url?q="))try{const r=new URL(t.url,"https://www.google.com").searchParams.get("q");r&&(t.url=r)}catch(t){}return t}):[]}};o.alias=["google"];export{o as GoogleSearcher,i as WebSearcher};
package/docs/README.md CHANGED
@@ -46,27 +46,26 @@ console.log(results);

  Since `WebSearcher` extends `FetchSession`, you can instantiate it to keep cookies and storage alive across multiple requests. This is useful for authenticated searches or avoiding bot detection by behaving like a human.

- **Configuration Precedence:**
- When creating a session, options are merged in the following order:
+ ### 🛡️ Core Principle: Template is Law

- 1. **Template Default**: Defined in the WebSearcher class (highest priority for structural options).
- 2. **User Options**: Passed to the constructor (can fill missing defaults or override if allowed).
+ The `template` defined in the `WebSearcher` subclass acts as the authoritative "blueprint".

- *Note: If the template sets `engine: 'auto'` (default), user-provided `engine` option will be respected.*
+ - **Template Priority**: If the template defines a property (e.g., `engine: 'browser'`, `headers`), that value is **locked** and cannot be overridden by user options. This ensures engine stability.
+ - **User Flexibility**: Properties **not** explicitly defined in the template (such as `proxy`, `timeoutMs`, or custom variables) can be freely set by the user in the constructor or `search()` method.

  ```typescript
  // Create a persistent session
  const google = new GoogleSearcher({
- headless: false, // Override default options (e.g., show browser)
+ headless: false, // Override if not locked in template
  proxy: 'http://my-proxy:8080',
- timeoutMs: 30000 // Set a global timeout for requests
+ timeoutMs: 30000 // Set a global timeout (valid if template doesn't define it)
  });

  try {
  // First query
  // You can also pass runtime options to override session defaults or inject variables
  const results1 = await google.search('term A', {
- timeoutMs: 60000, // Override timeout just for this search
+ timeoutMs: 60000, // Override session timeout just for this search
  extraParam: 'value' // Can be used in template as ${extraParam}
  });

@@ -176,24 +175,43 @@ protected override async transform(outputs: Record<string, any>) {
  }
  ```

- ## 🧠 Advanced Concepts
+ ### 🧠 Advanced Concepts

- ### Auto-Pagination & Filtering
+ ### Auto-Pagination: `limit` vs `maxPages`

- The `WebSearcher` is smart. If you request `limit: 10`, but the first page only returns 5 results (or if your `transform` filters out results), it will automatically fetch the next page until the limit is met.
+ The `WebSearcher` is designed to be result-oriented. When you call `search()`, you specify how many results you want, and the searcher handles the pagination logic.
+
+ - **`limit`**: Your target number of total results.
+ - **`maxPages`**: The safety threshold. It limits how many pages (fetch cycles) the searcher is allowed to navigate to satisfy your `limit`.
+
+ **Example Logic:**
+ If you request `{ limit: 50 }` but each page only has 5 results:
+
+ 1. The searcher fetches page 1 (5 results).
+ 2. It sees `5 < 50`, so it fetches page 2.
+ 3. It continues until it has 50 results **OR** it reaches `maxPages` (default 10).
+
+ This prevents infinite loops if the "Next" button selector is broken or if the search engine keeps returning the same results.

  ### User-defined Transforms

  Users can provide their own `transform` when calling `search`. This runs **after** the engine's built-in transform.

+ This is extremely powerful for **filtering out ads** or irrelevant content. If the user filters out results, the auto-pagination logic will automatically kick in to fetch more pages to ensure the final result list meets your requested `limit` with only valid entries.
+
  ```typescript
  await google.search('test', {
- transform: (results) => results.filter(r => r.url.endsWith('.pdf'))
+ limit: 20,
+ // Example: Filter out sponsored results and only keep PDFs
+ transform: (results) => {
+ return results.filter(r => {
+ const isAd = r.isSponsored || r.url.includes('googleadservices.com');
+ return !isAd && r.url.endsWith('.pdf');
+ });
+ }
  });
  ```

- If the user filters out results, the auto-pagination logic will kick in to fetch more pages to meet the requested limit.
-
  ### Standardized Search Options

  When calling `search()`, you can provide standardized options that the search engine will map to specific parameters:
@@ -6,7 +6,7 @@

  # Class: GoogleSearcher

- Defined in: [web-searcher/src/engines/google.ts:24](https://github.com/isdk/web-searcher.js/blob/6ce291d521b8526526b386fab6dda19d36d0bece/src/engines/google.ts#L24)
+ Defined in: [web-searcher/src/engines/google.ts:24](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/engines/google.ts#L24)

  A sample implementation of a Google Search scraper.

@@ -37,7 +37,7 @@ Use this class to understand:
37
37
 
38
38
  > **new GoogleSearcher**(`options?`): `GoogleSearcher`
39
39
 
40
- Defined in: web-fetcher/dist/index.d.ts:2192
40
+ Defined in: web-fetcher/dist/index.d.ts:2275
41
41
 
42
42
  Creates a new FetchSession.
43
43
 
@@ -63,7 +63,7 @@ Configuration options for the fetcher.
63
63
 
64
64
  > `protected` **closed**: `boolean`
65
65
 
66
- Defined in: web-fetcher/dist/index.d.ts:2186
66
+ Defined in: web-fetcher/dist/index.d.ts:2269
67
67
 
68
68
  #### Inherited from
69
69
 
@@ -75,7 +75,7 @@ Defined in: web-fetcher/dist/index.d.ts:2186
75
75
 
76
76
  > `readonly` **context**: `FetchContext`
77
77
 
78
- Defined in: web-fetcher/dist/index.d.ts:2185
78
+ Defined in: web-fetcher/dist/index.d.ts:2268
79
79
 
80
80
  The execution context for this session, containing configurations, event bus, and shared state.
81
81
 
@@ -89,7 +89,7 @@ The execution context for this session, containing configurations, event bus, an
89
89
 
90
90
  > `readonly` **id**: `string`
91
91
 
92
- Defined in: web-fetcher/dist/index.d.ts:2181
92
+ Defined in: web-fetcher/dist/index.d.ts:2264
93
93
 
94
94
  Unique identifier for the session.
95
95
 
@@ -103,7 +103,7 @@ Unique identifier for the session.
103
103
 
104
104
  > `protected` **options**: `FetcherOptions`
105
105
 
106
- Defined in: web-fetcher/dist/index.d.ts:2177
106
+ Defined in: web-fetcher/dist/index.d.ts:2260
107
107
 
108
108
  #### Inherited from
109
109
 
@@ -115,7 +115,7 @@ Defined in: web-fetcher/dist/index.d.ts:2177
115
115
 
116
116
  > `static` **\_isFactory**: `boolean` = `false`
117
117
 
118
- Defined in: [web-searcher/src/searcher.ts:33](https://github.com/isdk/web-searcher.js/blob/6ce291d521b8526526b386fab6dda19d36d0bece/src/searcher.ts#L33)
118
+ Defined in: [web-searcher/src/searcher.ts:33](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/searcher.ts#L33)
119
119
 
120
120
  #### Inherited from
121
121
 
@@ -127,7 +127,7 @@ Defined in: [web-searcher/src/searcher.ts:33](https://github.com/isdk/web-search
127
127
 
128
128
  > `static` **alias**: `string`[]
129
129
 
130
- Defined in: [web-searcher/src/engines/google.ts:25](https://github.com/isdk/web-searcher.js/blob/6ce291d521b8526526b386fab6dda19d36d0bece/src/engines/google.ts#L25)
130
+ Defined in: [web-searcher/src/engines/google.ts:25](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/engines/google.ts#L25)
131
131
 
132
132
  Engine alias(es). Can be a single string or an array of strings.
133
133
  Useful for registering shorthand names (e.g., 'g' for 'Google').
@@ -142,7 +142,7 @@ Useful for registering shorthand names (e.g., 'g' for 'Google').
142
142
 
143
143
  > `static` **createObject**: (`name`, ...`args`) => [`WebSearcher`](WebSearcher.md)
144
144
 
145
- Defined in: [web-searcher/src/searcher.ts:78](https://github.com/isdk/web-searcher.js/blob/6ce291d521b8526526b386fab6dda19d36d0bece/src/searcher.ts#L78)
145
+ Defined in: [web-searcher/src/searcher.ts:78](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/searcher.ts#L78)
146
146
 
147
147
  Creates an instance of the registered search engine.
148
148
 
@@ -176,7 +176,7 @@ An instance of the search engine.
176
176
 
177
177
  > `static` **forEach**: (`cb`) => `void`
178
178
 
179
- Defined in: [web-searcher/src/searcher.ts:85](https://github.com/isdk/web-searcher.js/blob/6ce291d521b8526526b386fab6dda19d36d0bece/src/searcher.ts#L85)
179
+ Defined in: [web-searcher/src/searcher.ts:85](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/searcher.ts#L85)
180
180
 
181
181
  Iterates over all registered engines.
182
182
 
@@ -202,7 +202,7 @@ Callback function to invoke for each registered engine.
202
202
 
203
203
  > `static` **get**: (`name`) => *typeof* [`WebSearcher`](WebSearcher.md)
204
204
 
205
- Defined in: [web-searcher/src/searcher.ts:69](https://github.com/isdk/web-searcher.js/blob/6ce291d521b8526526b386fab6dda19d36d0bece/src/searcher.ts#L69)
205
+ Defined in: [web-searcher/src/searcher.ts:69](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/searcher.ts#L69)
206
206
 
207
207
  Retrieves a registered search engine class by name.
208
208
 
@@ -230,7 +230,7 @@ The search engine class constructor.
230
230
 
231
231
  > `static` `optional` **name**: `string`
232
232
 
233
- Defined in: [web-searcher/src/searcher.ts:40](https://github.com/isdk/web-searcher.js/blob/6ce291d521b8526526b386fab6dda19d36d0bece/src/searcher.ts#L40)
233
+ Defined in: [web-searcher/src/searcher.ts:40](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/searcher.ts#L40)
234
234
 
235
235
  Custom engine name. If not provided, it is derived from the class name.
236
236
  For example, `GoogleSearcher` becomes `Google`.
@@ -245,7 +245,7 @@ For example, `GoogleSearcher` becomes `Google`.
245
245
 
246
246
  > `static` **register**: (`ctor`, `options?`) => `boolean`
247
247
 
248
- Defined in: [web-searcher/src/searcher.ts:54](https://github.com/isdk/web-searcher.js/blob/6ce291d521b8526526b386fab6dda19d36d0bece/src/searcher.ts#L54)
248
+ Defined in: [web-searcher/src/searcher.ts:54](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/searcher.ts#L54)
249
249
 
250
250
  Registers a search engine class.
251
251
 
@@ -279,7 +279,7 @@ Registration options. If a string is provided, it is used as the registered name
279
279
 
280
280
  > `static` **setAliases**: (`ctor`, ...`aliases`) => `void`
281
281
 
282
- Defined in: [web-searcher/src/searcher.ts:93](https://github.com/isdk/web-searcher.js/blob/6ce291d521b8526526b386fab6dda19d36d0bece/src/searcher.ts#L93)
282
+ Defined in: [web-searcher/src/searcher.ts:93](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/searcher.ts#L93)
283
283
 
284
284
  Sets aliases for a registered engine.
285
285
 
@@ -311,7 +311,7 @@ Aliases to add.
311
311
 
312
312
  > `static` **unregister**: (`name?`) => `void`
313
313
 
314
- Defined in: [web-searcher/src/searcher.ts:61](https://github.com/isdk/web-searcher.js/blob/6ce291d521b8526526b386fab6dda19d36d0bece/src/searcher.ts#L61)
314
+ Defined in: [web-searcher/src/searcher.ts:61](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/searcher.ts#L61)
315
315
 
316
316
  Unregisters a search engine.
317
317
 
@@ -339,7 +339,7 @@ The name or class to unregister.
339
339
 
340
340
  > **get** **pagination**(): [`PaginationConfig`](../interfaces/PaginationConfig.md)
341
341
 
342
- Defined in: [web-searcher/src/engines/google.ts:61](https://github.com/isdk/web-searcher.js/blob/6ce291d521b8526526b386fab6dda19d36d0bece/src/engines/google.ts#L61)
342
+ Defined in: [web-searcher/src/engines/google.ts:61](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/engines/google.ts#L61)
343
343
 
344
344
  Configures pagination for Google Search results.
345
345
  Uses the 'start' URL parameter, incrementing by 10 for each page.
@@ -360,7 +360,7 @@ Uses the 'start' URL parameter, incrementing by 10 for each page.
360
360
 
361
361
  > **get** **template**(): `FetcherOptions`
362
362
 
363
- Defined in: [web-searcher/src/engines/google.ts:32](https://github.com/isdk/web-searcher.js/blob/6ce291d521b8526526b386fab6dda19d36d0bece/src/engines/google.ts#L32)
363
+ Defined in: [web-searcher/src/engines/google.ts:32](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/engines/google.ts#L32)
364
364
 
365
365
  Defines the fetch template for Google Search.
366
366
 
@@ -380,7 +380,7 @@ The fetcher configuration including the URL pattern and extraction rules.
380
380
 
381
381
  > `protected` **createContext**(`options`): `FetchContext`
382
382
 
383
- Defined in: [web-searcher/src/searcher.ts:155](https://github.com/isdk/web-searcher.js/blob/6ce291d521b8526526b386fab6dda19d36d0bece/src/searcher.ts#L155)
383
+ Defined in: [web-searcher/src/searcher.ts:155](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/searcher.ts#L155)
384
384
 
385
385
  #### Parameters
386
386
 
@@ -402,7 +402,7 @@ Defined in: [web-searcher/src/searcher.ts:155](https://github.com/isdk/web-searc
402
402
 
403
403
  > **dispose**(): `Promise`\<`void`\>
404
404
 
405
- Defined in: web-fetcher/dist/index.d.ts:2251
405
+ Defined in: web-fetcher/dist/index.d.ts:2334
406
406
 
407
407
  Disposes of the session and its associated engine.
408
408
 
@@ -425,7 +425,7 @@ This method should be called when the session is no longer needed to free up res
425
425
 
426
426
  > **execute**\<`R`\>(`actionOptions`, `context?`): `Promise`\<`FetchActionResult`\<`R`\>\>
427
427
 
428
- Defined in: web-fetcher/dist/index.d.ts:2206
428
+ Defined in: web-fetcher/dist/index.d.ts:2289
429
429
 
430
430
  Executes a single action within the session.
431
431
 
@@ -473,7 +473,7 @@ await session.execute({ name: 'goto', params: { url: 'https://example.com' } });
473
473
 
474
474
  > **executeAll**(`actions`, `options?`): `Promise`\<\{ `outputs`: `Record`\<`string`, `any`\>; `result`: `FetchResponse` \| `undefined`; \}\>
475
475
 
476
- Defined in: web-fetcher/dist/index.d.ts:2223
476
+ Defined in: web-fetcher/dist/index.d.ts:2306
477
477
 
478
478
  Executes a sequence of actions.
479
479
 
@@ -517,7 +517,7 @@ const { result, outputs } = await session.executeAll([
517
517
 
518
518
  > `protected` **formatOptions**(`options`): `Record`\<`string`, `any`\>
519
519
 
520
- Defined in: [web-searcher/src/engines/google.ts:82](https://github.com/isdk/web-searcher.js/blob/6ce291d521b8526526b386fab6dda19d36d0bece/src/engines/google.ts#L82)
520
+ Defined in: [web-searcher/src/engines/google.ts:82](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/engines/google.ts#L82)
521
521
 
522
522
  Maps standard `SearchOptions` to Google's specific URL parameters.
523
523
 
@@ -551,7 +551,7 @@ A map of variables to inject into the URL template.
551
551
 
552
552
  > **getOutputs**(): `Record`\<`string`, `any`\>
553
553
 
554
- Defined in: web-fetcher/dist/index.d.ts:2234
554
+ Defined in: web-fetcher/dist/index.d.ts:2317
555
555
 
556
556
  Retrieves all outputs accumulated during the session.
557
557
 
@@ -571,7 +571,7 @@ A record of stored output data.
571
571
 
572
572
  > **getState**(): `Promise`\<\{ `cookies`: `Cookie`[]; `sessionState?`: `any`; \} \| `undefined`\>
573
573
 
574
- Defined in: web-fetcher/dist/index.d.ts:2240
574
+ Defined in: web-fetcher/dist/index.d.ts:2323
575
575
 
576
576
  Gets the current state of the session, including cookies and engine-specific state.
577
577
 
@@ -591,7 +591,7 @@ A promise resolving to the session state, or undefined if no engine is initializ
591
591
 
592
592
  > **search**(`query`, `options`): `Promise`\<[`StandardSearchResult`](../interfaces/StandardSearchResult.md)[]\>
593
593
 
594
- Defined in: [web-searcher/src/searcher.ts:182](https://github.com/isdk/web-searcher.js/blob/6ce291d521b8526526b386fab6dda19d36d0bece/src/searcher.ts#L182)
594
+ Defined in: [web-searcher/src/searcher.ts:182](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/searcher.ts#L182)
595
595
 
596
596
  Executes a search query.
597
597
 
@@ -628,7 +628,7 @@ A promise resolving to an array of standardized search results.
628
628
 
629
629
  > `protected` **transform**(`outputs`): `Promise`\<`any`[]\>
630
630
 
631
- Defined in: [web-searcher/src/engines/google.ts:144](https://github.com/isdk/web-searcher.js/blob/6ce291d521b8526526b386fab6dda19d36d0bece/src/engines/google.ts#L144)
631
+ Defined in: [web-searcher/src/engines/google.ts:144](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/engines/google.ts#L144)
632
632
 
633
633
  Cleans and normalizes the extracted results.
634
634
  Specifically, it unwraps Google's redirect URLs (starting with `/url?q=`).
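A minimal sketch of what unwrapping a `/url?q=` redirect involves, using only the standard WHATWG `URL` API. The helper name `unwrapGoogleRedirect` is hypothetical, and this is not the library's implementation; the real `transform` may handle additional cases.

```typescript
// Hypothetical helper illustrating the unwrapping step; not the library's code.
function unwrapGoogleRedirect(href: string): string {
  // Only relative Google redirect links are rewritten.
  if (!href.startsWith('/url?q=')) return href;
  // Resolve against a placeholder origin so the URL parser accepts the relative href.
  const target = new URL(href, 'https://www.google.com').searchParams.get('q');
  // searchParams.get() returns the percent-decoded value, or null if `q` is missing.
  return target || href;
}

console.log(unwrapGoogleRedirect('/url?q=https://example.com/page&sa=U'));
// https://example.com/page
```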
@@ -657,7 +657,7 @@ An array of cleaned search results.
657
657
 
658
658
  > `static` **search**(`engineName`, `query`, `options`): `Promise`\<[`StandardSearchResult`](../interfaces/StandardSearchResult.md)[]\>
659
659
 
660
- Defined in: [web-searcher/src/searcher.ts:106](https://github.com/isdk/web-searcher.js/blob/6ce291d521b8526526b386fab6dda19d36d0bece/src/searcher.ts#L106)
660
+ Defined in: [web-searcher/src/searcher.ts:106](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/searcher.ts#L106)
661
661
 
662
662
  Static helper to execute a one-off search.
663
663
 
@@ -6,7 +6,7 @@
6
6
 
7
7
  # Abstract Class: WebSearcher
8
8
 
9
- Defined in: [web-searcher/src/searcher.ts:31](https://github.com/isdk/web-searcher.js/blob/6ce291d521b8526526b386fab6dda19d36d0bece/src/searcher.ts#L31)
9
+ Defined in: [web-searcher/src/searcher.ts:31](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/searcher.ts#L31)
10
10
 
11
11
  The abstract base class for all search engines.
12
12
 
@@ -41,7 +41,7 @@ WebSearcher.register(MySearcher);
41
41
 
42
42
  > **new WebSearcher**(`options?`): `WebSearcher`
43
43
 
44
- Defined in: web-fetcher/dist/index.d.ts:2192
44
+ Defined in: web-fetcher/dist/index.d.ts:2275
45
45
 
46
46
  Creates a new FetchSession.
47
47
 
@@ -67,7 +67,7 @@ Configuration options for the fetcher.
67
67
 
68
68
  > `protected` **closed**: `boolean`
69
69
 
70
- Defined in: web-fetcher/dist/index.d.ts:2186
70
+ Defined in: web-fetcher/dist/index.d.ts:2269
71
71
 
72
72
  #### Inherited from
73
73
 
@@ -79,7 +79,7 @@ Defined in: web-fetcher/dist/index.d.ts:2186
79
79
 
80
80
  > `readonly` **context**: `FetchContext`
81
81
 
82
- Defined in: web-fetcher/dist/index.d.ts:2185
82
+ Defined in: web-fetcher/dist/index.d.ts:2268
83
83
 
84
84
  The execution context for this session, containing configurations, event bus, and shared state.
85
85
 
@@ -93,7 +93,7 @@ The execution context for this session, containing configurations, event bus, an
93
93
 
94
94
  > `readonly` **id**: `string`
95
95
 
96
- Defined in: web-fetcher/dist/index.d.ts:2181
96
+ Defined in: web-fetcher/dist/index.d.ts:2264
97
97
 
98
98
  Unique identifier for the session.
99
99
 
@@ -107,7 +107,7 @@ Unique identifier for the session.
107
107
 
108
108
  > `protected` **options**: `FetcherOptions`
109
109
 
110
- Defined in: web-fetcher/dist/index.d.ts:2177
110
+ Defined in: web-fetcher/dist/index.d.ts:2260
111
111
 
112
112
  #### Inherited from
113
113
 
@@ -119,7 +119,7 @@ Defined in: web-fetcher/dist/index.d.ts:2177
119
119
 
120
120
  > `static` **\_isFactory**: `boolean` = `false`
121
121
 
122
- Defined in: [web-searcher/src/searcher.ts:33](https://github.com/isdk/web-searcher.js/blob/6ce291d521b8526526b386fab6dda19d36d0bece/src/searcher.ts#L33)
122
+ Defined in: [web-searcher/src/searcher.ts:33](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/searcher.ts#L33)
123
123
 
124
124
  ***
125
125
 
@@ -127,7 +127,7 @@ Defined in: [web-searcher/src/searcher.ts:33](https://github.com/isdk/web-search
127
127
 
128
128
  > `static` `optional` **alias**: `string` \| `string`[]
129
129
 
130
- Defined in: [web-searcher/src/searcher.ts:45](https://github.com/isdk/web-searcher.js/blob/6ce291d521b8526526b386fab6dda19d36d0bece/src/searcher.ts#L45)
130
+ Defined in: [web-searcher/src/searcher.ts:45](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/searcher.ts#L45)
131
131
 
132
132
  Engine alias(es). Can be a single string or an array of strings.
133
133
  Useful for registering shorthand names (e.g., 'g' for 'Google').
@@ -138,7 +138,7 @@ Useful for registering shorthand names (e.g., 'g' for 'Google').
138
138
 
139
139
  > `static` **createObject**: (`name`, ...`args`) => `WebSearcher`
140
140
 
141
- Defined in: [web-searcher/src/searcher.ts:78](https://github.com/isdk/web-searcher.js/blob/6ce291d521b8526526b386fab6dda19d36d0bece/src/searcher.ts#L78)
141
+ Defined in: [web-searcher/src/searcher.ts:78](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/searcher.ts#L78)
142
142
 
143
143
  Creates an instance of the registered search engine.
144
144
 
@@ -168,7 +168,7 @@ An instance of the search engine.
168
168
 
169
169
  > `static` **forEach**: (`cb`) => `void`
170
170
 
171
- Defined in: [web-searcher/src/searcher.ts:85](https://github.com/isdk/web-searcher.js/blob/6ce291d521b8526526b386fab6dda19d36d0bece/src/searcher.ts#L85)
171
+ Defined in: [web-searcher/src/searcher.ts:85](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/searcher.ts#L85)
172
172
 
173
173
  Iterates over all registered engines.
174
174
 
@@ -190,7 +190,7 @@ Callback function to invoke for each registered engine.
190
190
 
191
191
  > `static` **get**: (`name`) => *typeof* `WebSearcher`
192
192
 
193
- Defined in: [web-searcher/src/searcher.ts:69](https://github.com/isdk/web-searcher.js/blob/6ce291d521b8526526b386fab6dda19d36d0bece/src/searcher.ts#L69)
193
+ Defined in: [web-searcher/src/searcher.ts:69](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/searcher.ts#L69)
194
194
 
195
195
  Retrieves a registered search engine class by name.
196
196
 
@@ -214,7 +214,7 @@ The search engine class constructor.
214
214
 
215
215
  > `static` `optional` **name**: `string`
216
216
 
217
- Defined in: [web-searcher/src/searcher.ts:40](https://github.com/isdk/web-searcher.js/blob/6ce291d521b8526526b386fab6dda19d36d0bece/src/searcher.ts#L40)
217
+ Defined in: [web-searcher/src/searcher.ts:40](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/searcher.ts#L40)
218
218
 
219
219
  Custom engine name. If not provided, it is derived from the class name.
220
220
  For example, `GoogleSearcher` becomes `Google`.
@@ -225,7 +225,7 @@ For example, `GoogleSearcher` becomes `Google`.
225
225
 
226
226
  > `static` **register**: (`ctor`, `options?`) => `boolean`
227
227
 
228
- Defined in: [web-searcher/src/searcher.ts:54](https://github.com/isdk/web-searcher.js/blob/6ce291d521b8526526b386fab6dda19d36d0bece/src/searcher.ts#L54)
228
+ Defined in: [web-searcher/src/searcher.ts:54](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/searcher.ts#L54)
229
229
 
230
230
  Registers a search engine class.
231
231
 
@@ -255,7 +255,7 @@ Registration options. If a string is provided, it is used as the registered name
255
255
 
256
256
  > `static` **setAliases**: (`ctor`, ...`aliases`) => `void`
257
257
 
258
- Defined in: [web-searcher/src/searcher.ts:93](https://github.com/isdk/web-searcher.js/blob/6ce291d521b8526526b386fab6dda19d36d0bece/src/searcher.ts#L93)
258
+ Defined in: [web-searcher/src/searcher.ts:93](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/searcher.ts#L93)
259
259
 
260
260
  Sets aliases for a registered engine.
261
261
 
@@ -283,7 +283,7 @@ Aliases to add.
283
283
 
284
284
  > `static` **unregister**: (`name?`) => `void`
285
285
 
286
- Defined in: [web-searcher/src/searcher.ts:61](https://github.com/isdk/web-searcher.js/blob/6ce291d521b8526526b386fab6dda19d36d0bece/src/searcher.ts#L61)
286
+ Defined in: [web-searcher/src/searcher.ts:61](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/searcher.ts#L61)
287
287
 
288
288
  Unregisters a search engine.
289
289
 
@@ -307,7 +307,7 @@ The name or class to unregister.
307
307
 
308
308
  > **get** **pagination**(): [`PaginationConfig`](../interfaces/PaginationConfig.md) \| `undefined`
309
309
 
310
- Defined in: [web-searcher/src/searcher.ts:151](https://github.com/isdk/web-searcher.js/blob/6ce291d521b8526526b386fab6dda19d36d0bece/src/searcher.ts#L151)
310
+ Defined in: [web-searcher/src/searcher.ts:151](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/searcher.ts#L151)
311
311
 
312
312
  Optional pagination configuration.
313
313
  Defines how the searcher navigates to subsequent pages.
@@ -326,7 +326,7 @@ If undefined, the searcher will only fetch the first page.
326
326
 
327
327
  > **get** `abstract` **template**(): `FetcherOptions`
328
328
 
329
- Defined in: [web-searcher/src/searcher.ts:143](https://github.com/isdk/web-searcher.js/blob/6ce291d521b8526526b386fab6dda19d36d0bece/src/searcher.ts#L143)
329
+ Defined in: [web-searcher/src/searcher.ts:143](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/searcher.ts#L143)
330
330
 
331
331
  The declarative template for the fetch options.
332
332
 
@@ -356,7 +356,7 @@ get template() {
356
356
 
357
357
  > `protected` **createContext**(`options`): `FetchContext`
358
358
 
359
- Defined in: [web-searcher/src/searcher.ts:155](https://github.com/isdk/web-searcher.js/blob/6ce291d521b8526526b386fab6dda19d36d0bece/src/searcher.ts#L155)
359
+ Defined in: [web-searcher/src/searcher.ts:155](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/searcher.ts#L155)
360
360
 
361
361
  #### Parameters
362
362
 
@@ -378,7 +378,7 @@ Defined in: [web-searcher/src/searcher.ts:155](https://github.com/isdk/web-searc
378
378
 
379
379
  > **dispose**(): `Promise`\<`void`\>
380
380
 
381
- Defined in: web-fetcher/dist/index.d.ts:2251
381
+ Defined in: web-fetcher/dist/index.d.ts:2334
382
382
 
383
383
  Disposes of the session and its associated engine.
384
384
 
@@ -401,7 +401,7 @@ This method should be called when the session is no longer needed to free up res
401
401
 
402
402
  > **execute**\<`R`\>(`actionOptions`, `context?`): `Promise`\<`FetchActionResult`\<`R`\>\>
403
403
 
404
- Defined in: web-fetcher/dist/index.d.ts:2206
404
+ Defined in: web-fetcher/dist/index.d.ts:2289
405
405
 
406
406
  Executes a single action within the session.
407
407
 
@@ -449,7 +449,7 @@ await session.execute({ name: 'goto', params: { url: 'https://example.com' } });
449
449
 
450
450
  > **executeAll**(`actions`, `options?`): `Promise`\<\{ `outputs`: `Record`\<`string`, `any`\>; `result`: `FetchResponse` \| `undefined`; \}\>
451
451
 
452
- Defined in: web-fetcher/dist/index.d.ts:2223
452
+ Defined in: web-fetcher/dist/index.d.ts:2306
453
453
 
454
454
  Executes a sequence of actions.
455
455
 
@@ -493,7 +493,7 @@ const { result, outputs } = await session.executeAll([
493
493
 
494
494
  > `protected` **formatOptions**(`options`): `Record`\<`string`, `any`\>
495
495
 
496
- Defined in: [web-searcher/src/searcher.ts:308](https://github.com/isdk/web-searcher.js/blob/6ce291d521b8526526b386fab6dda19d36d0bece/src/searcher.ts#L308)
496
+ Defined in: [web-searcher/src/searcher.ts:309](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/searcher.ts#L309)
497
497
 
498
498
  Transforms standard options into engine-specific template variables.
499
499
 
@@ -521,7 +521,7 @@ A dictionary of variables to be injected into the template.
521
521
 
522
522
  > **getOutputs**(): `Record`\<`string`, `any`\>
523
523
 
524
- Defined in: web-fetcher/dist/index.d.ts:2234
524
+ Defined in: web-fetcher/dist/index.d.ts:2317
525
525
 
526
526
  Retrieves all outputs accumulated during the session.
527
527
 
@@ -541,7 +541,7 @@ A record of stored output data.
541
541
 
542
542
  > **getState**(): `Promise`\<\{ `cookies`: `Cookie`[]; `sessionState?`: `any`; \} \| `undefined`\>
543
543
 
544
- Defined in: web-fetcher/dist/index.d.ts:2240
544
+ Defined in: web-fetcher/dist/index.d.ts:2323
545
545
 
546
546
  Gets the current state of the session, including cookies and engine-specific state.
547
547
 
@@ -561,7 +561,7 @@ A promise resolving to the session state, or undefined if no engine is initializ
561
561
 
562
562
  > **search**(`query`, `options`): `Promise`\<[`StandardSearchResult`](../interfaces/StandardSearchResult.md)[]\>
563
563
 
564
- Defined in: [web-searcher/src/searcher.ts:182](https://github.com/isdk/web-searcher.js/blob/6ce291d521b8526526b386fab6dda19d36d0bece/src/searcher.ts#L182)
564
+ Defined in: [web-searcher/src/searcher.ts:182](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/searcher.ts#L182)
565
565
 
566
566
  Executes a search query.
567
567
 
@@ -594,7 +594,7 @@ A promise resolving to an array of standardized search results.
594
594
 
595
595
  > `protected` **transform**(`outputs`, `context`): `Promise`\<[`StandardSearchResult`](../interfaces/StandardSearchResult.md)[]\>
596
596
 
597
- Defined in: [web-searcher/src/searcher.ts:290](https://github.com/isdk/web-searcher.js/blob/6ce291d521b8526526b386fab6dda19d36d0bece/src/searcher.ts#L290)
597
+ Defined in: [web-searcher/src/searcher.ts:291](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/searcher.ts#L291)
598
598
 
599
599
  Transform and clean the raw extracted results.
600
600
 
@@ -627,7 +627,7 @@ A promise resolving to an array of standardized search results.
627
627
 
628
628
  > `static` **search**(`engineName`, `query`, `options`): `Promise`\<[`StandardSearchResult`](../interfaces/StandardSearchResult.md)[]\>
629
629
 
630
- Defined in: [web-searcher/src/searcher.ts:106](https://github.com/isdk/web-searcher.js/blob/6ce291d521b8526526b386fab6dda19d36d0bece/src/searcher.ts#L106)
630
+ Defined in: [web-searcher/src/searcher.ts:106](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/searcher.ts#L106)
631
631
 
632
632
  Static helper to execute a one-off search.
633
633
 
@@ -6,7 +6,7 @@
6
6
 
7
7
  # Interface: CustomTimeRange
8
8
 
9
- Defined in: [web-searcher/src/types.ts:78](https://github.com/isdk/web-searcher.js/blob/6ce291d521b8526526b386fab6dda19d36d0bece/src/types.ts#L78)
9
+ Defined in: [web-searcher/src/types.ts:104](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/types.ts#L104)
10
10
 
11
11
  ## Properties
12
12
 
@@ -14,7 +14,7 @@ Defined in: [web-searcher/src/types.ts:78](https://github.com/isdk/web-searcher.
14
14
 
15
15
  > **from**: `string` \| `Date`
16
16
 
17
- Defined in: [web-searcher/src/types.ts:80](https://github.com/isdk/web-searcher.js/blob/6ce291d521b8526526b386fab6dda19d36d0bece/src/types.ts#L80)
17
+ Defined in: [web-searcher/src/types.ts:106](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/types.ts#L106)
18
18
 
19
19
  Start date (Date object or string like 'YYYY-MM-DD').
20
20
 
@@ -24,6 +24,6 @@ Start date (Date object or string like 'YYYY-MM-DD').
24
24
 
25
25
  > `optional` **to**: `string` \| `Date`
26
26
 
27
- Defined in: [web-searcher/src/types.ts:82](https://github.com/isdk/web-searcher.js/blob/6ce291d521b8526526b386fab6dda19d36d0bece/src/types.ts#L82)
27
+ Defined in: [web-searcher/src/types.ts:108](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/types.ts#L108)
28
28
 
29
29
  End date (Date object or string like 'YYYY-MM-DD'). Defaults to current date if omitted.
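One way a consumer might normalize a `CustomTimeRange` into a concrete pair of dates, applying the documented "defaults to current date" rule for `to`. The interface below is a local mirror declared for self-containment, and the helper `resolveTimeRange` with its injectable `now` parameter is hypothetical. Note that date-only strings such as `'2024-01-01'` are parsed by `Date` as UTC midnight.

```typescript
// Local mirror of the documented CustomTimeRange fields.
interface CustomTimeRangeSketch {
  from: string | Date;
  to?: string | Date;
}

// Hypothetical helper: coerce both ends to Date, defaulting `to` to "now".
function resolveTimeRange(
  range: CustomTimeRangeSketch,
  now: Date = new Date(),
): { from: Date; to: Date } {
  const toDate = (v: string | Date): Date => (v instanceof Date ? v : new Date(v));
  const from = toDate(range.from);
  const to = range.to !== undefined ? toDate(range.to) : now;
  // Reject unparseable strings instead of silently producing Invalid Date.
  if (Number.isNaN(from.getTime()) || Number.isNaN(to.getTime())) {
    throw new RangeError('Invalid date in time range');
  }
  return { from, to };
}

const r = resolveTimeRange({ from: '2024-01-01' }, new Date('2024-02-01'));
console.log(r.from.toISOString(), r.to.toISOString());
// 2024-01-01T00:00:00.000Z 2024-02-01T00:00:00.000Z
```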
@@ -6,7 +6,7 @@
6
6
 
7
7
  # Interface: PaginationConfig
8
8
 
9
- Defined in: [web-searcher/src/types.ts:26](https://github.com/isdk/web-searcher.js/blob/6ce291d521b8526526b386fab6dda19d36d0bece/src/types.ts#L26)
9
+ Defined in: [web-searcher/src/types.ts:41](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/types.ts#L41)
10
10
 
11
11
  Configuration for pagination strategies.
12
12
  Defines how the searcher should navigate to the next page of results.
@@ -17,7 +17,7 @@ Defines how the searcher should navigate to the next page of results.
17
17
 
18
18
  > `optional` **increment**: `number`
19
19
 
20
- Defined in: [web-searcher/src/types.ts:53](https://github.com/isdk/web-searcher.js/blob/6ce291d521b8526526b386fab6dda19d36d0bece/src/types.ts#L53)
20
+ Defined in: [web-searcher/src/types.ts:68](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/types.ts#L68)
21
21
 
22
22
  The increment step for each page.
23
23
  - If the parameter represents an item offset (like Google's 'start'), this might be 10.
@@ -31,11 +31,31 @@ The increment step for each page.
31
31
 
32
32
  ***
33
33
 
34
+ ### maxPages?
35
+
36
+ > `optional` **maxPages**: `number`
37
+
38
+ Defined in: [web-searcher/src/types.ts:85](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/types.ts#L85)
39
+
40
+ The safety threshold for the maximum number of pages to fetch automatically
41
+ in a single search call.
42
+
43
+ Even if the requested `limit` of results hasn't been reached, the searcher
44
+ will stop after this many pages to prevent infinite loops or excessive API usage.
45
+
46
+ #### Default
47
+
48
+ ```ts
49
+ 10
50
+ ```
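To illustrate how `maxPages` bounds a `'url-param'` strategy, the sketch below lists the parameter values that would be visited before the cap stops pagination. The interface is a local mirror of the fields documented on this page; the Google-like values are illustrative; and the formula `startValue + i * increment`, plus the fallbacks for a missing `startValue` or `increment`, are assumptions rather than documented behavior.

```typescript
// Local mirror of the PaginationConfig fields documented on this page.
interface PaginationConfigSketch {
  type: 'url-param' | 'click-next';
  paramName?: string;
  startValue?: number;
  increment?: number;
  nextButtonSelector?: string;
  maxPages?: number;
}

// Illustrative Google-style config: offset parameter `start`, 10 items per page,
// at most 5 automatic fetches per search() call.
const googleLike: PaginationConfigSketch = {
  type: 'url-param',
  paramName: 'start',
  startValue: 0,
  increment: 10,
  maxPages: 5,
};

// Parameter values a 'url-param' strategy would visit before maxPages stops it.
function pageParamValues(cfg: PaginationConfigSketch): number[] {
  if (cfg.type !== 'url-param') return [];
  const start = cfg.startValue ?? 0; // assumed fallback
  const step = cfg.increment ?? 1;   // assumed fallback
  const pages = cfg.maxPages ?? 10;  // documented default
  return Array.from({ length: pages }, (_, i) => start + i * step);
}

console.log(pageParamValues(googleLike)); // [ 0, 10, 20, 30, 40 ]
```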
51
+
52
+ ***
53
+
34
54
  ### nextButtonSelector?
35
55
 
36
56
  > `optional` **nextButtonSelector**: `string`
37
57
 
38
- Defined in: [web-searcher/src/types.ts:59](https://github.com/isdk/web-searcher.js/blob/6ce291d521b8526526b386fab6dda19d36d0bece/src/types.ts#L59)
58
+ Defined in: [web-searcher/src/types.ts:74](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/types.ts#L74)
39
59
 
40
60
  The CSS selector for the "Next" page button.
41
61
  Required if type is 'click-next'.
@@ -46,7 +66,7 @@ Required if type is 'click-next'.
46
66
 
47
67
  > `optional` **paramName**: `string`
48
68
 
49
- Defined in: [web-searcher/src/types.ts:39](https://github.com/isdk/web-searcher.js/blob/6ce291d521b8526526b386fab6dda19d36d0bece/src/types.ts#L39)
69
+ Defined in: [web-searcher/src/types.ts:54](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/types.ts#L54)
50
70
 
51
71
  The name of the URL parameter used for pagination.
52
72
  Required if type is 'url-param'.
@@ -63,7 +83,7 @@ Required if type is 'url-param'.
63
83
 
64
84
  > `optional` **startValue**: `number`
65
85
 
66
- Defined in: [web-searcher/src/types.ts:45](https://github.com/isdk/web-searcher.js/blob/6ce291d521b8526526b386fab6dda19d36d0bece/src/types.ts#L45)
86
+ Defined in: [web-searcher/src/types.ts:60](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/types.ts#L60)
67
87
 
68
88
  The starting value for the pagination parameter.
69
89
 
@@ -79,7 +99,7 @@ The starting value for the pagination parameter.
79
99
 
80
100
  > **type**: `"url-param"` \| `"click-next"`
81
101
 
82
- Defined in: [web-searcher/src/types.ts:32](https://github.com/isdk/web-searcher.js/blob/6ce291d521b8526526b386fab6dda19d36d0bece/src/types.ts#L32)
102
+ Defined in: [web-searcher/src/types.ts:47](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/types.ts#L47)
83
103
 
84
104
  The type of pagination mechanism:
85
105
  - 'url-param': Pagination is handled by modifying URL parameters (e.g., `?page=2` or `?start=10`).
@@ -6,7 +6,7 @@
6
6
 
7
7
  # Interface: SearchContext
8
8
 
9
- Defined in: [web-searcher/src/types.ts:65](https://github.com/isdk/web-searcher.js/blob/6ce291d521b8526526b386fab6dda19d36d0bece/src/types.ts#L65)
9
+ Defined in: [web-searcher/src/types.ts:91](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/types.ts#L91)
10
10
 
11
11
  Context object passed to the transform function.
12
12
 
@@ -16,7 +16,7 @@ Context object passed to the transform function.
16
16
 
17
17
  > `optional` **limit**: `number`
18
18
 
19
- Defined in: [web-searcher/src/types.ts:73](https://github.com/isdk/web-searcher.js/blob/6ce291d521b8526526b386fab6dda19d36d0bece/src/types.ts#L73)
19
+ Defined in: [web-searcher/src/types.ts:99](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/types.ts#L99)
20
20
 
21
21
  The requested limit of results.
22
22
 
@@ -26,7 +26,7 @@ The requested limit of results.
26
26
 
27
27
  > **page**: `number`
28
28
 
29
- Defined in: [web-searcher/src/types.ts:70](https://github.com/isdk/web-searcher.js/blob/6ce291d521b8526526b386fab6dda19d36d0bece/src/types.ts#L70)
29
+ Defined in: [web-searcher/src/types.ts:96](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/types.ts#L96)
30
30
 
31
31
  The current page index (0-based).
32
32
 
@@ -36,6 +36,6 @@ The current page index (0-based).
36
36
 
37
37
  > **query**: `string`
38
38
 
39
- Defined in: [web-searcher/src/types.ts:67](https://github.com/isdk/web-searcher.js/blob/6ce291d521b8526526b386fab6dda19d36d0bece/src/types.ts#L67)
39
+ Defined in: [web-searcher/src/types.ts:93](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/types.ts#L93)
40
40
 
41
41
  The original search query.
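A runtime `transform`, documented on `SearchOptions`, receives the `SearchContext` fields above as its second argument: `query`, the 0-based `page`, and an optional `limit`. The sketch below is minimal and uses local copies of the types that include only the documented fields. `keepRelevant` is a hypothetical example and is not part of the package.

```typescript
// Local mirrors of the documented shapes (only the fields used here).
interface StandardSearchResult {
  title: string;
  url: string;
  snippet?: string;
  [key: string]: any;
}

interface SearchContext {
  query: string;  // the original search query
  page: number;   // current page index (0-based)
  limit?: number; // requested result limit
}

// Hypothetical runtime transform: keep only results whose title or snippet
// mentions the query (case-insensitive), preserving the original order.
function keepRelevant(
  results: StandardSearchResult[],
  context: SearchContext,
): StandardSearchResult[] {
  const needle = context.query.toLowerCase();
  return results.filter((r) =>
    `${r.title} ${r.snippet ?? ""}`.toLowerCase().includes(needle),
  );
}
```

With the `GoogleSearcher` from the README, you would pass it as `google.search('typescript', { transform: keepRelevant })`. Filtering can leave a page short of `limit`, so the searcher may fetch further pages. `maxPages` caps how many.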
@@ -6,7 +6,7 @@
6
6
 
7
7
  # Interface: SearchOptions
8
8
 
9
- Defined in: [web-searcher/src/types.ts:94](https://github.com/isdk/web-searcher.js/blob/6ce291d521b8526526b386fab6dda19d36d0bece/src/types.ts#L94)
9
+ Defined in: [web-searcher/src/types.ts:120](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/types.ts#L120)
10
10
 
11
11
  Options provided when executing a search.
12
12
 
@@ -22,7 +22,7 @@ Any other custom variables to be injected into the template.
22
22
 
23
23
  > `optional` **category**: [`SearchCategory`](../type-aliases/SearchCategory.md)
24
24
 
25
- Defined in: [web-searcher/src/types.ts:108](https://github.com/isdk/web-searcher.js/blob/6ce291d521b8526526b386fab6dda19d36d0bece/src/types.ts#L108)
25
+ Defined in: [web-searcher/src/types.ts:144](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/types.ts#L144)
26
26
 
27
27
  The category of results to return.
28
28
  Default: 'all' (web search)
@@ -33,7 +33,7 @@ Default: 'all' (web search)
33
33
 
34
34
  > `optional` **language**: `string`
35
35
 
36
- Defined in: [web-searcher/src/types.ts:118](https://github.com/isdk/web-searcher.js/blob/6ce291d521b8526526b386fab6dda19d36d0bece/src/types.ts#L118)
36
+ Defined in: [web-searcher/src/types.ts:154](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/types.ts#L154)
37
37
 
38
38
  Language code (ISO 639-1) for the interface or results (e.g., 'en', 'zh-CN').
39
39
 
@@ -43,17 +43,32 @@ Language code (ISO 639-1) for the interface or results (e.g., 'en', 'zh-CN').
43
43
 
44
44
  > `optional` **limit**: `number`
45
45
 
46
- Defined in: [web-searcher/src/types.ts:96](https://github.com/isdk/web-searcher.js/blob/6ce291d521b8526526b386fab6dda19d36d0bece/src/types.ts#L96)
46
+ Defined in: [web-searcher/src/types.ts:122](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/types.ts#L122)
47
47
 
48
48
  The maximum number of results to retrieve.
49
49
 
50
50
  ***
51
51
 
52
+ ### maxPages?
53
+
54
+ > `optional` **maxPages**: `number`
55
+
56
+ Defined in: [web-searcher/src/types.ts:132](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/types.ts#L132)
57
+
58
+ The maximum number of pages (fetch cycles) allowed to reach the requested `limit`.
59
+
60
+ This is a safety guard. If the `limit` is high but each page has few results,
61
+ the searcher will stop once this page count is reached.
62
+
63
+ If not provided, it defaults to the value in `PaginationConfig` or 10.
64
+
65
+ ***
66
+
52
67
  ### region?
53
68
 
54
69
  > `optional` **region**: `string`
55
70
 
56
- Defined in: [web-searcher/src/types.ts:113](https://github.com/isdk/web-searcher.js/blob/6ce291d521b8526526b386fab6dda19d36d0bece/src/types.ts#L113)
71
+ Defined in: [web-searcher/src/types.ts:149](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/types.ts#L149)
57
72
 
58
73
  Region code (ISO 3166-1 alpha-2) to bias results (e.g., 'US', 'CN', 'JP').
59
74
 
@@ -63,7 +78,7 @@ Region code (ISO 3166-1 alpha-2) to bias results (e.g., 'US', 'CN', 'JP').
63
78
 
64
79
  > `optional` **safeSearch**: [`SafeSearchLevel`](../type-aliases/SafeSearchLevel.md)
65
80
 
66
- Defined in: [web-searcher/src/types.ts:124](https://github.com/isdk/web-searcher.js/blob/6ce291d521b8526526b386fab6dda19d36d0bece/src/types.ts#L124)
81
+ Defined in: [web-searcher/src/types.ts:160](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/types.ts#L160)
67
82
 
68
83
  Safe search filtering level.
69
84
  Default: engine dependent (usually 'moderate' or 'strict' by default).
@@ -74,7 +89,7 @@ Default: engine dependent (usually 'moderate' or 'strict' by default).
74
89
 
75
90
  > `optional` **timeRange**: [`SearchTimeRange`](../type-aliases/SearchTimeRange.md)
76
91
 
77
- Defined in: [web-searcher/src/types.ts:102](https://github.com/isdk/web-searcher.js/blob/6ce291d521b8526526b386fab6dda19d36d0bece/src/types.ts#L102)
92
+ Defined in: [web-searcher/src/types.ts:138](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/types.ts#L138)
78
93
 
79
94
  Date range for the search results.
80
95
  Default: 'all'
@@ -85,7 +100,7 @@ Default: 'all'
85
100
 
86
101
  > `optional` **transform**: (`results`, `context`) => [`StandardSearchResult`](StandardSearchResult.md)[] \| `Promise`\<[`StandardSearchResult`](StandardSearchResult.md)[]\>
87
102
 
88
- Defined in: [web-searcher/src/types.ts:130](https://github.com/isdk/web-searcher.js/blob/6ce291d521b8526526b386fab6dda19d36d0bece/src/types.ts#L130)
103
+ Defined in: [web-searcher/src/types.ts:166](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/types.ts#L166)
89
104
 
90
105
  A custom transform function to filter or modify results at runtime.
91
106
  This runs AFTER the engine-level transform.
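The new `maxPages` option limits the fetch loop that tries to reach `limit`. The sketch below models the documented behavior under a few assumptions. `collectResults` and the `fetchPage` callback are hypothetical stand-ins for the searcher's internals. An empty page is assumed to end the loop. The fallback chain mirrors the documented default: the option first, then the `PaginationConfig` value, then 10.

```typescript
// Documented fallback chain: SearchOptions.maxPages, then PaginationConfig, then 10.
function resolveMaxPages(optionMax?: number, configMax?: number): number {
  return optionMax ?? configMax ?? 10;
}

// Hypothetical fetch loop: keep requesting pages until `limit` results are
// collected or `maxPages` fetch cycles have run, whichever comes first.
async function collectResults<T>(
  fetchPage: (page: number) => Promise<T[]>,
  limit: number,
  maxPages: number,
): Promise<{ results: T[]; pagesFetched: number }> {
  const results: T[] = [];
  let page = 0; // 0-based page index, as in SearchContext.page
  while (results.length < limit && page < maxPages) {
    const batch = await fetchPage(page);
    page++;
    if (batch.length === 0) break; // assumption: an empty page means no more results
    results.push(...batch);
  }
  // Trim any overshoot from the last page down to the requested limit
  return { results: results.slice(0, limit), pagesFetched: page };
}
```

Suppose each page returns two results, with `limit: 100` and `maxPages: 3`. The loop stops after three fetches with six results rather than requesting 50 pages.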
@@ -6,7 +6,7 @@
6
6
 
7
7
  # Interface: StandardSearchResult
8
8
 
9
- Defined in: [web-searcher/src/types.ts:5](https://github.com/isdk/web-searcher.js/blob/6ce291d521b8526526b386fab6dda19d36d0bece/src/types.ts#L5)
9
+ Defined in: [web-searcher/src/types.ts:5](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/types.ts#L5)
10
10
 
11
11
  Interface representing a standardized search result item.
12
12
  This ensures consistency across different search engines.
@@ -15,35 +15,85 @@ This ensures consistency across different search engines.
15
15
 
16
16
  \[`key`: `string`\]: `any`
17
17
 
18
- Allows for engine-specific extra fields (e.g., rank, author, date).
18
+ Allows for engine-specific extra fields (e.g., siteIcon, category).
19
19
 
20
20
  ## Properties
21
21
 
22
+ ### author?
23
+
24
+ > `optional` **author**: `string`
25
+
26
+ Defined in: [web-searcher/src/types.ts:22](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/types.ts#L22)
27
+
28
+ The author or source name of the result.
29
+
30
+ ***
31
+
32
+ ### date?
33
+
34
+ > `optional` **date**: `string` \| `Date`
35
+
36
+ Defined in: [web-searcher/src/types.ts:19](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/types.ts#L19)
37
+
38
+ The date the result was published or last updated.
39
+
40
+ ***
41
+
42
+ ### favicon?
43
+
44
+ > `optional` **favicon**: `string`
45
+
46
+ Defined in: [web-searcher/src/types.ts:25](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/types.ts#L25)
47
+
48
+ The favicon URL of the source website.
49
+
50
+ ***
51
+
22
52
  ### image?
23
53
 
24
54
  > `optional` **image**: `string`
25
55
 
26
- Defined in: [web-searcher/src/types.ts:16](https://github.com/isdk/web-searcher.js/blob/6ce291d521b8526526b386fab6dda19d36d0bece/src/types.ts#L16)
56
+ Defined in: [web-searcher/src/types.ts:16](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/types.ts#L16)
27
57
 
28
58
  An optional image URL associated with the result.
29
59
 
30
60
  ***
31
61
 
62
+ ### rank?
63
+
64
+ > `optional` **rank**: `number`
65
+
66
+ Defined in: [web-searcher/src/types.ts:28](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/types.ts#L28)
67
+
68
+ The rank or position of the result (usually 1-indexed).
69
+
70
+ ***
71
+
32
72
  ### snippet?
33
73
 
34
74
  > `optional` **snippet**: `string`
35
75
 
36
- Defined in: [web-searcher/src/types.ts:13](https://github.com/isdk/web-searcher.js/blob/6ce291d521b8526526b386fab6dda19d36d0bece/src/types.ts#L13)
76
+ Defined in: [web-searcher/src/types.ts:13](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/types.ts#L13)
37
77
 
38
78
  A brief snippet or description of the result.
39
79
 
40
80
  ***
41
81
 
82
+ ### source?
83
+
84
+ > `optional` **source**: `string`
85
+
86
+ Defined in: [web-searcher/src/types.ts:31](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/types.ts#L31)
87
+
88
+ The source website name (e.g., 'GitHub', 'StackOverflow').
89
+
90
+ ***
91
+
42
92
  ### title
43
93
 
44
94
  > **title**: `string`
45
95
 
46
- Defined in: [web-searcher/src/types.ts:7](https://github.com/isdk/web-searcher.js/blob/6ce291d521b8526526b386fab6dda19d36d0bece/src/types.ts#L7)
96
+ Defined in: [web-searcher/src/types.ts:7](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/types.ts#L7)
47
97
 
48
98
  The title of the search result.
49
99
 
@@ -53,6 +103,6 @@ The title of the search result.
53
103
 
54
104
  > **url**: `string`
55
105
 
56
- Defined in: [web-searcher/src/types.ts:10](https://github.com/isdk/web-searcher.js/blob/6ce291d521b8526526b386fab6dda19d36d0bece/src/types.ts#L10)
106
+ Defined in: [web-searcher/src/types.ts:10](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/types.ts#L10)
57
107
 
58
108
  The URL of the search result.
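Version 0.1.3 turns `author`, `date`, `favicon`, `rank`, and `source` into typed optional fields; previously they were listed only as examples of the free-form index signature. The sketch below shows a minimal post-processing step that fills `rank` as a 1-indexed position and converts `date` strings to `Date`. The `normalizeResults` helper and its `offset` parameter (the number of results already returned by earlier pages) are hypothetical.

```typescript
// Local mirror of the 0.1.3 StandardSearchResult shape.
interface StandardSearchResult {
  title: string;
  url: string;
  snippet?: string;
  image?: string;
  date?: string | Date;
  author?: string;
  favicon?: string;
  rank?: number;
  source?: string;
  [key: string]: any;
}

// Hypothetical helper: keep an engine-provided rank, otherwise assign a
// 1-indexed position; convert parsable date strings to Date objects.
function normalizeResults(
  results: StandardSearchResult[],
  offset = 0,
): StandardSearchResult[] {
  return results.map((r, i) => {
    const out: StandardSearchResult = { ...r, rank: r.rank ?? offset + i + 1 };
    if (typeof r.date === "string") {
      const parsed = new Date(r.date);
      // Keep the raw string when it cannot be parsed (e.g. "yesterday")
      if (!Number.isNaN(parsed.getTime())) out.date = parsed;
    }
    return out;
  });
}
```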
@@ -8,4 +8,4 @@
8
8
 
9
9
  > **SafeSearchLevel** = `"off"` \| `"moderate"` \| `"strict"`
10
10
 
11
- Defined in: [web-searcher/src/types.ts:89](https://github.com/isdk/web-searcher.js/blob/6ce291d521b8526526b386fab6dda19d36d0bece/src/types.ts#L89)
11
+ Defined in: [web-searcher/src/types.ts:115](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/types.ts#L115)
@@ -8,4 +8,4 @@
8
8
 
9
9
  > **SearchCategory** = `"all"` \| `"images"` \| `"videos"` \| `"news"`
10
10
 
11
- Defined in: [web-searcher/src/types.ts:87](https://github.com/isdk/web-searcher.js/blob/6ce291d521b8526526b386fab6dda19d36d0bece/src/types.ts#L87)
11
+ Defined in: [web-searcher/src/types.ts:113](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/types.ts#L113)
@@ -8,4 +8,4 @@
8
8
 
9
9
  > **SearchTimeRange** = [`SearchTimeRangePreset`](SearchTimeRangePreset.md) \| [`CustomTimeRange`](../interfaces/CustomTimeRange.md)
10
10
 
11
- Defined in: [web-searcher/src/types.ts:85](https://github.com/isdk/web-searcher.js/blob/6ce291d521b8526526b386fab6dda19d36d0bece/src/types.ts#L85)
11
+ Defined in: [web-searcher/src/types.ts:111](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/types.ts#L111)
@@ -8,4 +8,4 @@
8
8
 
9
9
  > **SearchTimeRangePreset** = `"all"` \| `"day"` \| `"week"` \| `"month"` \| `"year"`
10
10
 
11
- Defined in: [web-searcher/src/types.ts:76](https://github.com/isdk/web-searcher.js/blob/6ce291d521b8526526b386fab6dda19d36d0bece/src/types.ts#L76)
11
+ Defined in: [web-searcher/src/types.ts:102](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/types.ts#L102)
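The string-literal unions above (`SafeSearchLevel`, `SearchCategory`, `SearchTimeRangePreset`) exist only at compile time. Values read from CLI flags or query strings therefore need a runtime check before you pass them as `safeSearch`, `category`, or a preset `timeRange`. In the minimal sketch below, the types are derived from `as const` arrays that copy the documented literals. `isOneOf` and `toSafeSearch` are hypothetical helpers and are not exported by the package.

```typescript
// Runtime copies of the documented literal unions.
const SAFE_SEARCH_LEVELS = ["off", "moderate", "strict"] as const;
const SEARCH_CATEGORIES = ["all", "images", "videos", "news"] as const;
const TIME_RANGE_PRESETS = ["all", "day", "week", "month", "year"] as const;

type SafeSearchLevel = (typeof SAFE_SEARCH_LEVELS)[number];
type SearchCategory = (typeof SEARCH_CATEGORIES)[number];
type SearchTimeRangePreset = (typeof TIME_RANGE_PRESETS)[number];

// Generic type guard: true only for a string that is one of `values`.
function isOneOf<T extends string>(values: readonly T[], input: unknown): input is T {
  return typeof input === "string" && (values as readonly string[]).includes(input);
}

// Hypothetical helper: returns undefined for unknown input, so the option can
// be omitted and the engine-dependent default applies instead.
function toSafeSearch(raw: unknown): SafeSearchLevel | undefined {
  return isOneOf(SAFE_SEARCH_LEVELS, raw) ? raw : undefined;
}
```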
@@ -8,7 +8,7 @@
8
8
 
9
9
  > **SearcherConstructor** = (`options?`) => [`WebSearcher`](../classes/WebSearcher.md)
10
10
 
11
- Defined in: [web-searcher/src/searcher.ts:10](https://github.com/isdk/web-searcher.js/blob/6ce291d521b8526526b386fab6dda19d36d0bece/src/searcher.ts#L10)
11
+ Defined in: [web-searcher/src/searcher.ts:10](https://github.com/isdk/web-searcher.js/blob/e17f1bcb40984e389c2901da9e3b4886a969899a/src/searcher.ts#L10)
12
12
 
13
13
  Constructor definition for Searcher subclasses.
14
14
 
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "@isdk/web-searcher",
3
- "version": "0.1.2",
3
+ "version": "0.1.3",
4
4
  "description": "A high-level framework for building search engine scrapers, supporting multi-page navigation, session persistence, and result standardization.",
5
5
  "license": "MIT",
6
6
  "author": "Riceball LEE <snowyu.lee@gmail.com>",