@isdk/proxy 0.1.1 → 0.1.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (29)

package/README.cn.md +191 -9
package/README.md +191 -7
package/dist/index.d.mts +272 -30
package/dist/index.d.ts +272 -30
package/dist/index.js +1 -1
package/dist/index.mjs +1 -1
package/docs/README.md +191 -7
package/docs/classes/SmartCache.md +59 -11
package/docs/functions/createCachedFetch.md +1 -1
package/docs/functions/createFetchWithCache.md +1 -1
package/docs/functions/extractData.md +34 -5
package/docs/functions/fetchWithCache.md +11 -1
package/docs/functions/generateCacheKey.md +34 -4
package/docs/functions/getSiteConfig.md +39 -0
package/docs/functions/isAllowed.md +35 -8
package/docs/functions/isGlob.md +23 -0
package/docs/functions/isMatch.md +44 -0
package/docs/globals.md +5 -0
package/docs/interfaces/BodyFilterConfig.md +77 -0
package/docs/interfaces/CacheEntry.md +9 -9
package/docs/interfaces/CacheMetadata.md +8 -8
package/docs/interfaces/CacheRule.md +80 -0
package/docs/interfaces/FetchWithCacheContext.md +43 -13
package/docs/interfaces/FetchWithCacheOptions.md +40 -10
package/docs/interfaces/KeyFilterConfig.md +11 -7
package/docs/interfaces/ProxyConfig.md +4 -4
package/docs/interfaces/SiteCacheConfig.md +46 -11
package/docs/interfaces/SmartCacheOptions.md +32 -6
package/package.json +4 -2

package/README.cn.md CHANGED

@@ -13,6 +13,8 @@
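 ## Key Features
 - **🚀 Hybrid Multi-tier Cache**: L1 (LRU memory) provides very fast responses; L2 (content-addressable disk via `cacache`) provides persistent storage.
+- **📥 HTTP POST & Multi-method Support**: Full caching support for POST, PUT, and other non-GET methods, with built-in request body fingerprinting.
+- **🎯 Precision Rule Interception**: Use `cacheRules` for surgically precise cache control over specific paths or query parameters.
 - **🌊 Native Streaming**: Built entirely on stream pipelines internally, which inherently guards against out-of-memory (OOM) failures when proxying large files.
 - **🧠 Intelligent Metadata Residency**: Regardless of file size, metadata (Headers, Status, Policy) always stays in memory, ensuring nanosecond-level cache policy decisions.
 - **🔄 Stale-While-Revalidate (SWR)**: Return stale data immediately while silently updating the cache in the background, for "zero-wait" responses.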
@@ -31,6 +33,8 @@ pnpm add @isdk/proxy
 The primary way to use `@isdk/proxy` is through the `fetchWithCache` function, which can wrap any HTTP request logic.
+### Basic Usage (GET Requests)
 ```typescript
 import { SmartCache, createCachedFetch } from '@isdk/proxy';
@@ -40,31 +44,100 @@ const cache = new SmartCache({
   maxMemorySize: 1024 * 1024 // Memory threshold: 1MB
 });
-// 2. Create a pre-configured cached fetcher (automatically guards against cache stampedes)
+// 2. Create a pre-configured cached fetcher
 const myFetch = createCachedFetch({
   cache,
   config: {
     staleIfError: true,
-    forceCache: false // Set to true to ignore no-store and force-cache everything; suited to offline apps
   },
   backgroundUpdate: true // Enable SWR (silent background update after expiry)
 });
-// 3. Use it happily anywhere in your app!
-const request = new Request('https://api.example.com/data');
-const response = await myFetch(request, (req) => fetch(req)); // Pass in any fetch function that returns Promise<Response>
+// 3. Use it happily!
+const response = await myFetch(new Request('https://api.example.com/data'), (req) => fetch(req));
+console.log(response.headers.get('x-proxy-cache'));
+```
+### Advanced Usage: Caching POST Requests
-console.log(response.headers.get('x-proxy-cache')); // Prints: "MISS", "HIT", "STALE", or "STALE_IF_ERROR"
-const data = await response.json();
+You can enable POST/PUT caching by configuring `methods`, and use the `body` filter to exclude dynamic fields in the request body (such as timestamps or nonces), which keeps the cache key stable.
+```typescript
+const myPostFetch = createCachedFetch({
+  cache,
+  config: {
+    methods: ['GET', 'POST'], // Allow caching POST
+    body: {
+      exclude: ['timestamp', 'nonce'] // Ignore these dynamic fields when generating the cache key
+    },
+    cacheRules: [
+      { method: 'POST', path: '/api/v1/query' } // Apply only to specific POST endpoints
+    ],
+    forceCache: true // Backends usually send no Cache-Control for POST requests, so forced caching is recommended
+  }
+});
+```
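+## Configuration Reference: `SiteCacheConfig`
+| Option | Type | Description |
+| :--- | :--- | :--- |
+| `methods` | `string[]` | List of HTTP methods allowed to be cached. Defaults to `['GET', 'HEAD']` only. |
+| `cacheRules` | `CacheRule[]` | Fine-grained interception rules. If configured, a request must match at least one rule to be cached. |
+| `query` | `KeyFilterConfig` | URL query parameter filtering (`include` whitelist / `exclude` blacklist). |
+| `headers` | `KeyFilterConfig` | Request header filtering. |
+| `cookies` | `KeyFilterConfig` | Cookie field filtering. |
+| `body` | `KeyFilterConfig` | Request body field filtering, **JSON only**. |
+| `staleIfError`| `boolean` | Whether to force-return stale local cache when the network request fails. |
+| `forceCache` | `boolean` | Whether to force caching regardless of origin directives; commonly used for offline apps. |
+### `CacheRule` Object
+- `method`: The HTTP method to match.
+- `path`: Path matching (supports **RegExp**, **Glob wildcards**, **array format**, or **prefix match**).
+- `query`: Key-value matching. A value can be a `string` (exact/Glob match), `true` (the parameter must exist), `false` (the parameter must not exist), or a `RegExp` (regex match).
+- `body`: Body content matching (supports **RegExp**, **Glob wildcards**, or **array format**).
+### Pattern Matching
+`@isdk/proxy` provides powerful pattern matching for all configurable fields:
+| Pattern Type | Example | Description |
+| :--- | :--- | :--- |
+| **RegExp** | `/api/v[12]/.*/i` | JavaScript RegExp (in JSON, written as a string such as `"/api/v[12]/.*/i"`) |
+| **Glob** | `/**/*.json` | File-path-style wildcard matching |
+| **Negation** | `['!/api/private/**', '/api/**']` | Exclusion match (prefixed with `!`) |
+| **Array** | `['/api/v1/*', '/api/v2/*']` | Combination of multiple patterns (OR logic, negation takes precedence) |
+| **Boolean** | `true` / `false` | For query parameters: must exist / must not exist |
+**Advanced pattern matching example:**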
+```typescript
+const myFetch = createCachedFetch({
+  cache,
+  config: {
+    cacheRules: [
+      {
+        path: ['/api/v1/items/*', '!/api/v1/items/private/*'], // v1 items, excluding private
+        query: {
+          format: '/^(json|xml)$/',     // Regex match on the format parameter
+          'page*': true                  // Glob: any parameter starting with page must exist
+        },
+        body: /\"action\"\s*:\s*\"query\"/ // Regex match on the body content
+      }
+    ]
+  }
+});
 ```
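 ## Adapters
 `@isdk/proxy` aims to be a clean, environment-agnostic core. While the core library stays pure, you can easily integrate or find adapters for specific environments:
-- **MSW Adapter**: See `@isdk/proxy-msw` (separate package) to use this caching engine as an MSW interceptor.
+- **HTTP Proxy Server (Node.js)**: See [@isdk/proxy-server](https://www.npmjs.com/package/@isdk/proxy-server) (separate package) for starting a standalone HTTP caching proxy server.
+- **Crawlee Adapter**: See [@isdk/proxy-crawlee](https://www.npmjs.com/package/@isdk/proxy-crawlee) (separate package) for integrating into the Crawlee web crawler lifecycle.
+- **MSW Adapter**: See `@isdk/proxy-msw` (separate package) to use this caching engine as an MSW interceptor.
 - **Axios Adapter**: Easily implemented by converting the Axios config into a Web-standard `Request`.
-- **Crawlee Adapter**: Can be integrated into the crawler lifecycle to reduce repeated fetches.
 ## Architecture in Detail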
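To illustrate why excluding dynamic body fields keeps the cache key stable in the POST example of this hunk, here is a minimal, self-contained sketch. It is not the package's implementation: `canonicalize` and `bodyKey` are hypothetical helpers, the sketch assumes exclusion applies at every nesting level, and it builds the key as a plain string where one would typically hash the result.

```typescript
// Hypothetical JSON type used only for this sketch.
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Recursively drop excluded field names and sort object keys, so that
// logically equal bodies serialize to the same string.
function canonicalize(value: Json, exclude: ReadonlySet<string>): Json {
  if (Array.isArray(value)) {
    return value.map((item) => canonicalize(item, exclude));
  }
  if (value !== null && typeof value === "object") {
    const out: { [key: string]: Json } = {};
    for (const key of Object.keys(value).sort()) {
      if (!exclude.has(key)) {
        out[key] = canonicalize(value[key], exclude);
      }
    }
    return out;
  }
  return value;
}

// Assumed key layout: method, URL, then the canonical JSON body.
function bodyKey(method: string, url: string, body: Json, exclude: string[] = []): string {
  const canonical = JSON.stringify(canonicalize(body, new Set(exclude)));
  return [method.toUpperCase(), url, canonical].join(" ");
}

const url = "https://api.example.com/api/v1/query";
const keyA = bodyKey("post", url, { q: "cats", timestamp: 1, nonce: "x" }, ["timestamp", "nonce"]);
const keyB = bodyKey("POST", url, { nonce: "y", timestamp: 2, q: "cats" }, ["timestamp", "nonce"]);
console.log(keyA === keyB); // true
```

Two requests that differ only in `timestamp`, `nonce`, or key order map to the same key, while a different `q` value produces a different one.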
@@ -124,6 +197,115 @@ const data = await response.json();
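 - **`options.maxMemorySize`**: Size threshold (in bytes) for response bodies to enter memory (L1); files larger than this are streamed directly to disk (default `1048576`, i.e., 1MB).
 - **`options.storagePath`**: Physical storage path for the L2 disk cache (cacache) (defaults to the operating system's temp directory).
+### Utility Functions
+The following utility functions are exported for advanced usage:
+#### `isMatch(pattern, value, usePrefix?)`
+Universal pattern matching function. Supports RegExp, Glob, array patterns (including negation), and string prefix/exact matching.
+- **`pattern`**: `string | RegExp | (string | RegExp)[]`
+- **`value`**: The string to test
+- **`usePrefix`**: For plain strings, whether to use prefix match instead of exact match (default: `false`)
+- **Returns**: `boolean`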
+```typescript
+import { isMatch } from '@isdk/proxy';
+isMatch('/api/v[12]/.*', '/api/v1/users');     // RegExp
+isMatch('/api/**/*.json', '/api/v1/data.json'); // Glob wildcard
+isMatch(['!/private/**', '/api/**'], '/api/data'); // Negation pattern
+```
+#### `isGlob(pattern)`
+Checks whether a string uses Glob syntax.
+- **`pattern`**: `string`
+- **Returns**: `boolean`
+#### `getSiteConfig(urlString, proxyConfig)`
+Gets the site-level cache configuration for a URL.
+- **`urlString`**: The full request URL
+- **`proxyConfig`**: A `ProxyConfig` object containing the `sites` and `default` configuration
+- **Returns**: `SiteCacheConfig`
+```typescript
+import { getSiteConfig } from '@isdk/proxy';
+const config = getSiteConfig('https://api.example.com/data', {
+  default: { methods: ['GET'] },
+  sites: {
+    'api.example.com': { methods: ['GET', 'POST'], forceCache: true },
+    '/internal/': { staleIfError: true } // prefix match
+  }
+});
+```
+#### `isAllowed(key, config, defaultAllowed?)`
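+Checks whether the given key is allowed to participate in cache fingerprinting.
+- **`key`**: The key name to check
+- **`config`**: A `KeyFilterConfig` object supporting `include` (whitelist) or `exclude` (blacklist)
+- **`defaultAllowed`**: Optional. The default value used when there is no config or the config does not match
+- **Returns**: `boolean | undefined`
+**Priority logic**:
+1. `exclude` hit → returns `false` immediately (highest priority)
+2. `include` exists and hits → returns `true`
+3. `include` exists but does not hit → returns `false`
+4. Neither is configured → uses `defaultAllowed` (returns `undefined` if not provided)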
+```typescript
+import { isAllowed } from '@isdk/proxy';
+// No config
+isAllowed('key'); // undefined (falsy)
+// Whitelist
+isAllowed('id', { include: ['id', 'name'] }); // true
+isAllowed('email', { include: ['id', 'name'] }); // false
+// Blacklist
+isAllowed('password', { exclude: ['password'] }); // false
+isAllowed('name', { exclude: ['password'] }); // undefined (falsy)
+// defaultAllowed is needed to set a default
+isAllowed('name', { exclude: ['password'] }, true); // true
+```
+#### `extractData(source, config, defaultAllowed?)`
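+Extracts and normalizes data from a source object according to the filter config. Used to generate cache fingerprints.
+- **`source`**: The original data object (Query, Headers, Cookies, etc.)
+- **`config`**: `KeyFilterConfig` object
+- **`defaultAllowed`**: Optional. Whether to allow extraction when there is no config or the config does not match (default `false`)
+- **Returns**: `Record<string, string[]>`, the normalized data, with lowercase keys and sorted array values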
+```typescript
+import { extractData } from '@isdk/proxy';
+const headers = { 'Content-Type': 'application/json', 'X-Request-Id': '123' };
+// No keys are extracted by default
+extractData(headers); // {}
+// Extract all keys
+extractData(headers, undefined, true); // { 'content-type': ['application/json'], 'x-request-id': ['123'] }
+// Whitelist
+extractData(headers, { include: ['content-type'] }); // { 'content-type': ['application/json'] }
+// Blacklist
+extractData(headers, { include: ['*'], exclude: ['x-request-id'] }, true); // { 'content-type': ['application/json'] }
+```
 ### Cache Status Headers
 Every `Response` processed and returned by `@isdk/proxy` has an `x-proxy-cache` field injected into its headers so its lifecycle can be observed. Possible values:
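
The four-step priority order documented for `isAllowed` in this hunk can be sketched as follows. This is a simplified reading of the README text, not the package's source: `KeyFilterSketch` and `isAllowedSketch` are hypothetical names, and the sketch compares names exactly, leaving out the Glob and RegExp matching the real filter supports.

```typescript
// Hypothetical stand-in for the package's KeyFilterConfig (exact names only).
interface KeyFilterSketch {
  include?: string[];
  exclude?: string[];
}

function isAllowedSketch(
  key: string,
  config?: KeyFilterSketch,
  defaultAllowed?: boolean,
): boolean | undefined {
  // 1. An exclude hit always wins.
  if (config && config.exclude && config.exclude.some((name) => name === key)) {
    return false;
  }
  // 2 and 3. When an include list exists, membership decides the result.
  if (config && config.include) {
    return config.include.some((name) => name === key);
  }
  // 4. Otherwise fall back to the caller's default (undefined when omitted).
  return defaultAllowed;
}

console.log(isAllowedSketch("password", { exclude: ["password"] })); // false
console.log(isAllowedSketch("name", { exclude: ["password"] }, true)); // true
```

Because step 4 returns `undefined` when no default is passed, callers that need a strict boolean have to supply `defaultAllowed` themselves, which matches the last README example above.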

package/README.md CHANGED

@@ -15,6 +15,8 @@ In high-concurrency environments—like **API Proxies**, **Web Scrapers**, or **
 ## Key Features
 - **🚀 Hybrid Multi-tier Cache**: Extreme speed with L1 (LRU Memory) and persistence with L2 (Content Addressable Disk via `cacache`).
+- **📥 HTTP POST & Method Support**: Full support for caching POST, PUT, and other methods with intelligent request body fingerprinting.
+- **🎯 Precision Filtering**: Fine-grained `cacheRules` to intercept specific paths or query parameters.
 - **🌊 Streaming Native**: Fully stream-based internal pipeline natively prevents Out-Of-Memory (OOM) issues when proxying large files.
 - **🧠 Intelligent Meta-Residency**: Metadata (Headers, Status, Policy) stays in memory regardless of body size, ensuring nanosecond cache policy evaluations.
 - **🔄 Stale-While-Revalidate (SWR)**: Serve stale content instantly while updating the cache silently in the background.
@@ -33,6 +35,8 @@ pnpm add @isdk/proxy
 The primary way to use `@isdk/proxy` is via the `fetchWithCache` function, which can wrap any HTTP request logic.
+### Basic Usage (GET)
 ```typescript
 import { SmartCache, createCachedFetch } from '@isdk/proxy';
@@ -42,28 +46,98 @@ const cache = new SmartCache({
   maxMemorySize: 1024 * 1024 // 1MB threshold
 });
-// 2. Create a pre-configured cached fetcher (automatically tracks concurrent requests)
+// 2. Create a pre-configured cached fetcher
 const myFetch = createCachedFetch({
   cache,
   config: {
     staleIfError: true,
-    forceCache: false // Set to true to cache everything (ignore no-store) for offline-first apps
   },
   backgroundUpdate: true // Enable SWR
 });
-// 3. Use it anywhere in your app!
-const request = new Request('https://api.example.com/data');
-const response = await myFetch(request, (req) => fetch(req));
+// 3. Use it!
+const response = await myFetch(new Request('https://api.example.com/data'), (req) => fetch(req));
+console.log(response.headers.get('x-proxy-cache'));
+```
+### Advanced Usage: Caching POST Requests
-console.log(response.headers.get('x-proxy-cache')); // "MISS", "HIT", "STALE", or "STALE_IF_ERROR"
-const data = await response.json();
+You can cache POST/PUT requests by enabling methods and defining body filters to ignore dynamic fields (like timestamps) in the request body.
+```typescript
+const myPostFetch = createCachedFetch({
+  cache,
+  config: {
+    methods: ['GET', 'POST'], // Enable POST caching
+    body: {
+      exclude: ['timestamp', 'nonce'] // Ignore these fields when generating cache keys
+    },
+    cacheRules: [
+      { method: 'POST', path: '/api/v1/query' } // Only cache specific POST endpoints
+    ],
+    forceCache: true // Often needed for POST if backend doesn't send Cache-Control
+  }
+});
+```
+## Configuration: `SiteCacheConfig`
+| Field | Type | Description |
+| :--- | :--- | :--- |
+| `methods` | `string[]` | List of allowed HTTP methods. Default: `['GET', 'HEAD']`. |
+| `cacheRules` | `CacheRule[]` | Fine-grained rules. If set, a request must match at least one rule to be cached. |
+| `query` | `KeyFilterConfig` | Filters for URL search parameters (`include`/`exclude`). |
+| `headers` | `KeyFilterConfig` | Filters for request headers. |
+| `cookies` | `KeyFilterConfig` | Filters for cookies. |
+| `body` | `KeyFilterConfig` | Filters for JSON request body fields. |
+| `staleIfError`| `boolean` | Serve stale cache on network failure. |
+| `forceCache` | `boolean` | Ignore `no-store` and force caching (useful for offline support). |
+### `CacheRule` Object
+- `method`: HTTP method to match.
+- `path`: URL pathname matching (supports **RegExp**, **Glob**, **Array**, or **prefix match**).
+- `query`: Key-value pairs. Values can be `string` (exact/Glob match), `true` (must exist), `false` (must not exist), or `RegExp`.
+- `body`: Body content matching (supports **RegExp**, **Glob**, or **Array**).
+### Pattern Matching
+`@isdk/proxy` provides powerful pattern matching for all configurable fields:
+| Pattern Type | Example | Description |
+| :--- | :--- | :--- |
+| **RegExp** | `/api/v[12]/.*/i` | JavaScript RegExp (in JSON, use string like `"/api/v[12]/.*/i"`) |
+| **Glob** | `/**/*.json` | File path style wildcard matching |
+| **Negation** | `['!/api/private/**', '/api/**']` | Exclude patterns (prefixed with `!`, checked first) |
+| **Array** | `['/api/v1/*', '/api/v2/*']` | Multiple patterns (OR logic, negative takes precedence) |
+| **Boolean** | `true` / `false` | For query params: must/must not exist |
+**Example with advanced pattern matching:**
+```typescript
+const myFetch = createCachedFetch({
+  cache,
+  config: {
+    cacheRules: [
+      {
+        path: ['/api/v1/items/*', '!/api/v1/items/private/*'], // v1 items, exclude private
+        query: {
+          format: '/^(json|xml)$/',     // Regex for format param
+          'page*': true                  // Glob: any param starting with 'page' must exist
+        },
+        body: /\"action\"\s*:\s*\"query\"/ // Regex body match
+      }
+    ]
+  }
+});
 ```
 ## Adapters
 `@isdk/proxy` is designed to be framework-agnostic. While the core library is pure, you can find (or build) adapters for specific environments:
+- **HTTP Caching Proxy Server (Node.js)**: See [@isdk/proxy-server](https://www.npmjs.com/package/@isdk/proxy-server) (separate package) for running a standalone HTTP forward proxy.
+- **Crawlee Adapter**: See [@isdk/proxy-crawlee](https://www.npmjs.com/package/@isdk/proxy-crawlee) (separate package) for integrating with Crawlee web scraping lifecycle.
 - **MSW Adapter**: See `@isdk/proxy-msw` (separate package) to use this caching engine as an MSW interceptor.
 - **Axios Adapter**: Easily implemented by converting Axios config to Web `Request`.
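
The `query` conditions of a `CacheRule` described in this hunk (`string`, `true`, `false`, `RegExp`) can be read as the following minimal sketch. It is an illustration under stated assumptions, not the library's matcher: `QueryCondition` and `matchesQuery` are hypothetical names, parameters are passed as a plain object, string conditions are compared exactly, and Glob keys such as `'page*'` or regex-as-string values are not handled.

```typescript
// Hypothetical condition type mirroring the documented value kinds.
type QueryCondition = string | boolean | RegExp;

// Returns true only when every condition in the rule holds for the params.
function matchesQuery(
  params: Record<string, string>,
  rule: Record<string, QueryCondition>,
): boolean {
  return Object.keys(rule).every((name) => {
    const cond = rule[name];
    const present = Object.prototype.hasOwnProperty.call(params, name);
    if (cond === true) return present;   // parameter must exist
    if (cond === false) return !present; // parameter must not exist
    if (!present) return false;          // value conditions need a value
    return cond instanceof RegExp ? cond.test(params[name]) : params[name] === cond;
  });
}

const rule: Record<string, QueryCondition> = {
  format: /^(json|xml)$/,
  page: true,
  debug: false,
};
console.log(matchesQuery({ format: "json", page: "2" }, rule)); // true
console.log(matchesQuery({ format: "csv", page: "2" }, rule));  // false
```

All conditions are combined with AND here, which is an assumption; the README only states that a request must match at least one rule in `cacheRules`.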
@@ -116,9 +190,119 @@ The hybrid multi-tier storage engine.
 - **`options.maxMemorySize`**: Threshold (in bytes) for offloading bodies to disk (default `1048576`, i.e., 1MB).
 - **`options.storagePath`**: Disk storage path for the `cacache` engine (defaults to a system temp folder).
+### Utility Functions
+Exported from `@isdk/proxy` for advanced usage:
+#### `isMatch(pattern, value, usePrefix?)`
+Universal pattern matching function. Supports RegExp, Glob, array patterns (with negation), and string prefix/exact matching.
+- **`pattern`**: `string | RegExp | (string | RegExp)[]`
+- **`value`**: The string to test against
+- **`usePrefix`**: For plain strings, use prefix match instead of exact match (default: `false`)
+- **Returns**: `boolean`
+```typescript
+import { isMatch } from '@isdk/proxy';
+isMatch('/api/v[12]/.*', '/api/v1/users');     // RegExp
+isMatch('/api/**/*.json', '/api/v1/data.json'); // Glob
+isMatch(['!/private/**', '/api/**'], '/api/data'); // Negation
+```
+#### `isGlob(pattern)`
+Check if a pattern is Glob syntax.
+- **`pattern`**: `string`
+- **Returns**: `boolean`
+#### `getSiteConfig(urlString, proxyConfig)`
+Get the site-specific cache configuration for a given URL.
+- **`urlString`**: Full URL to match
+- **`proxyConfig`**: `ProxyConfig` object with `sites` and `default` config
+- **Returns**: `SiteCacheConfig`
+```typescript
+import { getSiteConfig } from '@isdk/proxy';
+const config = getSiteConfig('https://api.example.com/data', {
+  default: { methods: ['GET'] },
+  sites: {
+    'api.example.com': { methods: ['GET', 'POST'], forceCache: true },
+    '/internal/': { staleIfError: true } // prefix match
+  }
+});
+```
+#### `isAllowed(key, config, defaultAllowed?)`
+Check if a key is allowed to participate in cache key fingerprinting.
+- **`key`**: The key name to check
+- **`config`**: `KeyFilterConfig` with `include` (whitelist) or `exclude` (blacklist)
+- **`defaultAllowed`**: Optional. Default value when no config or no match
+- **Returns**: `boolean | undefined`
+**Priority Logic**:
+1. `exclude` hit → returns `false` (highest priority)
+2. `include` exists and hits → returns `true`
+3. `include` exists but no hit → returns `false`
+4. No config → uses `defaultAllowed` (returns `undefined` if not provided)
+```typescript
+import { isAllowed } from '@isdk/proxy';
+// No config
+isAllowed('key'); // undefined (falsy)
+// Whitelist
+isAllowed('id', { include: ['id', 'name'] }); // true
+isAllowed('email', { include: ['id', 'name'] }); // false
+// Blacklist
+isAllowed('password', { exclude: ['password'] }); // false
+isAllowed('name', { exclude: ['password'] }); // undefined (falsy)
+// Need defaultAllowed to set default
+isAllowed('name', { exclude: ['password'] }, true); // true
+```
+#### `extractData(source, config, defaultAllowed?)`
+Extract and normalize data from a source object based on filter config. Used for generating cache fingerprints.
+- **`source`**: Original data object (Query, Headers, Cookies, etc.)
+- **`config`**: `KeyFilterConfig` object
+- **`defaultAllowed`**: Optional. Whether to allow extraction when no config or no match (default `false`)
+- **Returns**: `Record<string, string[]>` normalized data with lowercase keys and sorted array values
+```typescript
+import { extractData } from '@isdk/proxy';
+const headers = { 'Content-Type': 'application/json', 'X-Request-Id': '123' };
+// No extraction by default
+extractData(headers); // {}
+// Extract all keys
+extractData(headers, undefined, true); // { 'content-type': ['application/json'], 'x-request-id': ['123'] }
+// Whitelist
+extractData(headers, { include: ['content-type'] }); // { 'content-type': ['application/json'] }
+// Blacklist
+extractData(headers, { include: ['*'], exclude: ['x-request-id'] }, true); // { 'content-type': ['application/json'] }
+```
 ### Cache Status Headers
 Every response processed by `@isdk/proxy` will include an `x-proxy-cache` header indicating its lifecycle:
 - `HIT`: Served entirely from L1 or L2 cache.
 - `MISS`: Bypassed cache and fetched from the origin server.
 - `STALE`: Served from stale cache while a background update was initiated (SWR).