npm - @peterwangze/claude-trigger-router - Versions diffs - 1.5.0 → 1.7.0 - Mend

@peterwangze/claude-trigger-router 1.5.0 → 1.7.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (9)

package/README.md +22 -6
package/dist/cli.js +7593 -7063
package/dist/cli.js.map +4 -4
package/docs/configuration-guide.md +11 -4
package/docs/release-notes-v1.6.0.md +36 -0
package/docs/release-notes-v1.7.0.md +36 -0
package/docs/releasing.md +5 -5
package/docs/server-maintainer-guide.md +20 -2
package/package.json +1 -1

package/docs/configuration-guide.md CHANGED

@@ -320,18 +320,20 @@ The Health summary only explains existing trace / metrics / anomaly data and does not change
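 - `models`: list of models that can be registered. Fields reuse the minimal model-config semantics of `Models[]`; multiple entries with the same `id` are compiled into a single logical model pool.
 - `upstream_services`: list of upstream service references; stores only the service ID, base URL and an optional token.
-- `strategy`: optional; currently supports `priority` and `least-latency`. The default is `priority`. When `least-latency` is set explicitly, the healthy endpoint with the lowest average latency among existing successful latency samples is preferred; with no samples, it falls back to priority.
+- `strategy`: optional; currently supports `priority`, `least-latency`, `round-robin`, `health-aware` and `cost-aware`. The default is `priority`. `least-latency` picks the low-latency endpoint from the successful-latency window; `round-robin` picks the least-used endpoint by current success count; `health-aware` prefers healthy, low-failure, low-latency endpoints; `cost-aware` uses cost metadata to pick the low-cost endpoint.
 - `metadata.pool_endpoint_id`: optional; gives a pool endpoint a stable ID.
-- `metadata.pool_priority`: optional; a smaller number means higher priority. Under the `priority` strategy it directly determines the active endpoint; under `least-latency` it is the stable fallback order when there are no latency samples or latencies are equal.
+- `metadata.pool_priority`: optional; a smaller number means higher priority. Under the `priority` strategy it directly determines the active endpoint; under the other strategies it is the stable fallback order when the relevant samples are missing or scores are tied.
 - `metadata.pool_enabled`: optional; when set to `false`, the endpoint stays in the pool but never becomes the active endpoint.
 - `metadata.upstream_service_id`: optional; links the endpoint to `upstream_services[].id` for maintainer observability and later scheduling.
+- `metadata.cost_per_1m_input_tokens` / `metadata.cost_per_1m_output_tokens` / `metadata.cost_currency`: optional; cost hints used for display and for cost-aware scheduling.
+- `metadata.rate_limit_rpm` / `metadata.rate_limit_tpm`: optional; used for display and to identify rate capacity during scheduling.
 Example: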
 ```yaml
 Registration:
   enabled: true
-  strategy: "least-latency"
+  strategy: "health-aware"
   upstream_services:
     - id: "edge-router"
       base_url: "https://edge.example.com"
@@ -346,6 +348,11 @@ Registration:
         pool_endpoint_id: "sonnet-edge-a"
         pool_priority: 10
         upstream_service_id: "edge-router"
+        cost_per_1m_input_tokens: 3
+        cost_per_1m_output_tokens: 15
+        cost_currency: "USD"
+        rate_limit_rpm: 120
+        rate_limit_tpm: 240000
     - id: sonnet
       api: "https://edge-b.example.com/v1"
       key: "${EDGE_B_MODEL_KEY}"
@@ -356,7 +363,7 @@ Registration:
         pool_priority: 20
 ```
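-The compiled result can be viewed via `GET /api/models/compiled`, `POST /api/models/compiled/preview`, or the Compiled Models area of `/ui`. At the current stage the pool compiles the active endpoint into a real internal provider; if no top-level `Models[]` entry with the same name overrides it, a logical model id such as `Router.default: sonnet` resolves to the active pool endpoint, and `model_pool:<modelId>:<endpointId>` is recorded in the governance trace. When the current pool endpoint returns a non-streaming upstream error, the runtime picks the next enabled endpoint according to the current strategy for one local retry and records `model_pool_fallback:<modelId>:<endpointId>`; the failed endpoint enters a short cooldown, and once consecutive failures reach the threshold it enters a longer `open` circuit-breaker state, after which logical model resolution and fallback candidate selection preferentially skip endpoints that are cooling down or open. Successful responses are written to the latency window, and the compiled model pool, `/api/models/pool-health` and `/ui` show each endpoint's average latency; when `Registration.strategy: "least-latency"` is set, this latency window takes part in active endpoint and fallback candidate selection. When remote client configuration is enabled, the local `/api/remote-status` fetches a read-only, redacted summary from the remote `/api/registration`, helping users confirm how many models and upstream services the remote server has registered, but it does not automatically sync or overwrite the local `Registration`.
+The compiled result can be viewed via `GET /api/models/compiled`, `POST /api/models/compiled/preview`, or the Compiled Models area of `/ui`. At the current stage the pool compiles the active endpoint into a real internal provider; if no top-level `Models[]` entry with the same name overrides it, a logical model id such as `Router.default: sonnet` resolves to the active pool endpoint, and `model_pool:<modelId>:<endpointId>` is recorded in the governance trace. When the current pool endpoint returns a non-streaming upstream error, the runtime picks the next enabled endpoint according to the current strategy for one local retry and records `model_pool_fallback:<modelId>:<endpointId>`; the failed endpoint enters a short cooldown, and once consecutive failures reach the threshold it enters a longer `open` circuit-breaker state, after which logical model resolution and fallback candidate selection preferentially skip endpoints that are cooling down or open. Successful responses are written to the latency window, and the compiled model pool, `/api/models/pool-health` and `/ui` show each endpoint's average latency, cost and rate-limit metadata; maintainers can also call `POST /api/models/pool-health/probe` or click "主动探测" (active probe) in `/ui` to run a lightweight `HEAD` probe against enabled endpoints: 2xx/3xx/4xx counts as network-reachable and records a success latency, while 5xx or a network error records a failure and reuses the cooldown / circuit breaker. When remote client configuration is enabled, the local `/api/remote-status` fetches a read-only, redacted summary from the remote `/api/registration`, helping users confirm how many models and upstream services the remote server has registered, but it does not automatically sync or overwrite the local `Registration`.
 Node/cluster orchestration fields such as `nodes`, `node_id` and `cluster` are explicitly not supported at present.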
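
Reviewer sketch (not taken from the package source): the strategy description in this hunk can be read as a lexicographic ranking. Unhealthy endpoints sort last, then comes a strategy-specific score, and `pool_priority` breaks ties or stands in when samples are missing. The `PoolEndpoint` shape, the runtime stat fields (`avgLatencyMs`, `successCount`, `failureCount`, `healthy`) and the choice to sum input and output cost are all assumptions made for illustration.

```typescript
type Strategy = "priority" | "least-latency" | "round-robin" | "health-aware" | "cost-aware";

// Hypothetical endpoint shape; only the metadata-derived fields come from the guide.
interface PoolEndpoint {
  id: string;               // metadata.pool_endpoint_id
  priority: number;         // metadata.pool_priority (smaller wins)
  enabled: boolean;         // metadata.pool_enabled
  healthy: boolean;         // assumed: false while cooling down or circuit-open
  avgLatencyMs?: number;    // assumed: undefined when no success samples exist
  successCount: number;     // assumed runtime counter
  failureCount: number;     // assumed runtime counter
  costPer1mInput?: number;  // metadata.cost_per_1m_input_tokens
  costPer1mOutput?: number; // metadata.cost_per_1m_output_tokens
}

// Comparison that stays well-defined when both sides are Infinity.
const cmp = (a: number, b: number): number => (a < b ? -1 : a > b ? 1 : 0);

// Strategy-specific sort keys; a missing sample becomes Infinity so that
// endpoints without data tie with each other and fall through to priority.
function strategyKeys(e: PoolEndpoint, strategy: Strategy): number[] {
  switch (strategy) {
    case "least-latency":
      return [e.avgLatencyMs ?? Infinity];
    case "round-robin":
      return [e.successCount];
    case "health-aware":
      return [e.failureCount, e.avgLatencyMs ?? Infinity];
    case "cost-aware":
      return [(e.costPer1mInput ?? Infinity) + (e.costPer1mOutput ?? Infinity)];
    case "priority":
      return [];
  }
}

// Returns enabled endpoints in selection order: index 0 would be the active
// endpoint, index 1 the first fallback candidate.
function rankEndpoints(pool: PoolEndpoint[], strategy: Strategy): PoolEndpoint[] {
  return pool
    .filter((e) => e.enabled)
    .map((e) => ({ e, keys: [e.healthy ? 0 : 1, ...strategyKeys(e, strategy), e.priority] }))
    .sort((a, b) => {
      for (let i = 0; i < a.keys.length; i++) {
        const c = cmp(a.keys[i], b.keys[i]);
        if (c !== 0) return c;
      }
      return 0;
    })
    .map((x) => x.e);
}

const examplePool: PoolEndpoint[] = [
  { id: "sonnet-edge-a", priority: 10, enabled: true, healthy: true, avgLatencyMs: 900,
    successCount: 5, failureCount: 0, costPer1mInput: 3, costPer1mOutput: 15 },
  { id: "sonnet-edge-b", priority: 20, enabled: true, healthy: true, avgLatencyMs: 300,
    successCount: 2, failureCount: 1, costPer1mInput: 1, costPer1mOutput: 5 },
  { id: "sonnet-edge-c", priority: 5, enabled: false, healthy: true,
    successCount: 0, failureCount: 0 },
];
```

Ranking once and reading both the active endpoint and the fallback candidate from the same list is one way to get the "same ranking for active endpoint and fallback candidate" behaviour that the v1.7.0 notes describe.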

package/docs/release-notes-v1.6.0.md ADDED

@@ -0,0 +1,36 @@
+# v1.6.0 Release Notes
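+`v1.6.0` is positioned as the "multi-model benefit operationalization release". This version does not keep extending service features or model pool strategies; it first completes the evidence chain maintainers need to judge whether a multi-model combination really delivers quality/speed gains.
+## Main themes of this release
+- Benchmark history: `ctr eval --input/--run` supports `--save-history`, which writes the evaluation summary to `~/.claude-trigger-router/benchmark-history.json`; `ctr eval --history` shows recent scores, trends and Top models.
+- History API/UI: adds `/api/benchmark/history`; the `/ui` maintainer workbench shows benchmark history, recent trends and Top models, and displays on the same screen the alignment summary of task comparison / quality evidence from real traces.
+- Human calibration UI: adds `/api/benchmark/calibration` and the `/ui` Human calibration form, so maintainers can append human-scored samples to the benchmark history; the history file stores only summaries and does not persist raw model output.
+- Core routing scenario task set: fixed tasks gain `routeScenario`, covering default, think, long_context, background, rule_hit and candidate_selection, and keep server_ops / pool_health as evidence for later service work.
+- Aligning evaluation with real traces: the offline evaluation report adds `byRouteScenario`, and the UI benchmark history shows offline history, real task comparison and quality evidence together, so rubric scores are not read in isolation.
+## Release boundaries
+This version focuses on benefit operationalization and does not claim that CTR is a complete server/cloud hosting platform, a complete model pool operations platform or an agent platform. The following items move to later versions and are not release commitments for `v1.6.0`:
+- Default security policy for server deployments, a key rotation handbook and a hosted-maintenance checklist.
+- Active health probing for model pools, cost/rate metadata and more scheduling strategies.
+- handoff summary, tool capability guardrail, input/output guardrails and span-based traces.
+- More complete visual trend charts or an external reporting system.
+## Must run before release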
+```bash
+npm run release:verify
+npm run release:stage
+```
+Confirm before the official release:
+- `package.json` and `package-lock.json` versions are both `1.6.0`
+- `ctr version` outputs `Version: 1.6.0`
+- the `v1.6.0` tag matches the package version
+- `@peterwangze/claude-trigger-router@1.6.0` does not exist in the npm registry
+- the npm trusted publisher points to `publish.yml` in `peterwangze/claude-trigger-router`
+- the GitHub publish workflow uses Node 24 / npm 11.5.1+
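
Reviewer sketch: the version-related items in this checklist could be checked mechanically before tagging. The function below is a hypothetical helper, not part of `release:verify`; it assumes the caller has already collected the raw facts (file versions, `ctr version` output, tag name, published versions) and only compares them.

```typescript
// Hypothetical input shape; how each value is gathered is left to the caller.
interface ReleaseFacts {
  packageJsonVersion: string;
  packageLockVersion: string;
  ctrVersionOutput: string;    // raw stdout of `ctr version`
  gitTag: string;              // e.g. "v1.6.0"
  publishedVersions: string[]; // versions already present in the npm registry
}

// Returns a list of human-readable problems; an empty list means the
// version-related checklist items all hold for `expected`.
function checkReleaseVersions(expected: string, facts: ReleaseFacts): string[] {
  const problems: string[] = [];
  if (facts.packageJsonVersion !== expected) {
    problems.push(`package.json is ${facts.packageJsonVersion}, expected ${expected}`);
  }
  if (facts.packageLockVersion !== expected) {
    problems.push(`package-lock.json is ${facts.packageLockVersion}, expected ${expected}`);
  }
  if (!facts.ctrVersionOutput.includes(`Version: ${expected}`)) {
    problems.push("`ctr version` does not report the expected version");
  }
  if (facts.gitTag !== `v${expected}`) {
    problems.push(`tag ${facts.gitTag} does not match v${expected}`);
  }
  if (facts.publishedVersions.includes(expected)) {
    problems.push(`${expected} is already published`);
  }
  return problems;
}
```

The trusted publisher and workflow runtime items are not covered here, since they depend on registry and CI settings rather than on version strings.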

package/docs/release-notes-v1.7.0.md ADDED

@@ -0,0 +1,36 @@
+# v1.7.0 Release Notes
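+`v1.7.0` is positioned as the "remote service and model pool security experience release". This version builds on the existing server deploy, managed key, quota, remote forward, registration model pool and pool health foundations. The goal is to let service maintainers expose the service more safely, let remote users connect reliably, and make model pool scheduling more explainable.
+## Main themes of this release
+- Default security policy for server deployments: `ctr deploy init --target server` generates `Runtime.security`, which states that public listening requires auth, that the bootstrap key is admin-only and that remote clients use managed `client + read-only` keys, and recommends placing the service behind an HTTPS reverse proxy or inside a private network; `/api/service-info` returns the same policy and a deployment checklist.
+- Key rotation and hosted maintenance: adds `POST /api/auth/keys/:id/rotate`; an admin can generate a replacement managed key, the new secret is returned only once, and the old key is revoked immediately. The README, the `/ui` auth guide and the server maintainer guide now cover periodic rotation, handover and suspected-leak handling.
+- Active health probing for model pools: adds `POST /api/models/pool-health/probe`, which operator/admin can trigger; it runs a lightweight `HEAD` probe against enabled endpoints and writes reachable latency or failures into the existing cooldown / circuit breaker / latency health.
+- Cost and rate metadata: `Registration.models[].metadata` supports `cost_per_1m_input_tokens`, `cost_per_1m_output_tokens`, `cost_currency`, `rate_limit_rpm` and `rate_limit_tpm`, shown in the compiled model pool, `/api/models/pool-health` and `/ui`.
+- Richer model pool strategies: `Registration.strategy` supports `priority`, `least-latency`, `round-robin`, `health-aware` and `cost-aware`; the active endpoint and fallback candidates use the same ranking.
+## Release boundaries
+This version focuses on self-hosted service security and model pool operations and does not claim a complete cloud/hosted control plane, node cluster orchestration or an agent platform. The following items move to later versions and are not release commitments for `v1.7.0`:
+- Service discovery, node/cluster orchestration and multi-node auto-registration.
+- More complete external reporting, alert channels and visual trend systems.
+- handoff summary, tool capability guardrail, input/output guardrails and span-based traces.
+- User, organization, billing and audit control planes of a full hosted platform.
+## Must run before release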
+```bash
+npm run release:verify
+npm run release:stage
+```
+Confirm before the official release:
+- `package.json` and `package-lock.json` versions are both `1.7.0`
+- `ctr version` outputs `Version: 1.7.0`
+- the `v1.7.0` tag matches the package version
+- `@peterwangze/claude-trigger-router@1.7.0` does not exist in the npm registry
+- the npm trusted publisher points to `publish.yml` in `peterwangze/claude-trigger-router`
+- the GitHub publish workflow uses Node 24 / npm 11.5.1+
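
Reviewer sketch: the rotation semantics described for `POST /api/auth/keys/:id/rotate` (one replacement key, new secret returned once, old key revoked immediately, scopes/quota preserved unless overridden) can be modelled with an in-memory store. Everything below is illustrative: `ManagedKey`, `RotateOverrides` and `rotateKey` are hypothetical names, and the caller supplies the new id and secret so the sketch stays free of any real key-generation logic.

```typescript
// Hypothetical record; the stored key deliberately holds no secret,
// mirroring the "returned once" rule.
interface ManagedKey {
  id: string;
  scopes: string[];
  quota: number | null;
  status: "active" | "revoked";
}

interface RotateOverrides {
  scopes?: string[];
  quota?: number;
}

// Creates the replacement, revokes the old key in the same step, and hands
// the new secret back to the caller exactly once.
function rotateKey(
  store: Map<string, ManagedKey>,
  oldId: string,
  newId: string,
  newSecret: string,
  overrides: RotateOverrides = {},
): { key: ManagedKey; secret: string } {
  const old = store.get(oldId);
  if (!old || old.status !== "active") {
    throw new Error(`key ${oldId} is not an active managed key`);
  }
  const replacement: ManagedKey = {
    id: newId,
    scopes: overrides.scopes ?? [...old.scopes],
    quota: overrides.quota ?? old.quota,
    status: "active",
  };
  store.set(newId, replacement);
  store.set(oldId, { ...old, status: "revoked" });
  return { key: replacement, secret: newSecret };
}
```

Because the old key is revoked before the caller sees the new secret, a client that has not yet switched would start failing at once; the maintainer guide's advice to confirm the new secret works and then check `GET /api/auth/keys` fits that ordering.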

package/docs/releasing.md CHANGED

@@ -7,7 +7,7 @@
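 - `Release Check`: runs the pre-release checks on PRs, pushes to `master` and manual triggers
 - `Publish Package`: runs the official publish on version tags, GitHub Releases or manual triggers
-The priority of this `v1.5.0` minor release is stabilizing the basic entry features and consolidating usability. Before extending benchmark, service, model pool or agent/tool features any further, the release checks must first protect `setup / start / status / code / doctor / ui`, config save/repair/migration, the real user flow after packaging and the guard tests for basic UI interactions.
+The priority of this `v1.7.0` minor release is the remote service and model pool security experience. The release checks must protect both the existing `setup / start / status / code / doctor / ui` entry main paths and server deploy, managed key, quota, remote forward, model pool health, active probing and model pool scheduling strategies.
 ## One-time preparation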
@@ -26,14 +26,14 @@
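 1. Update the version number
    - A minor release such as `vX.Y.0` also requires updating the version-dependent test cases, the README release positioning and the corresponding release notes.
-   - The release boundary of this `v1.5.0` is defined by `docs/release-notes-v1.5.0.md`: it focuses on stabilizing the basic entry features and consolidating usability, and does not claim a complete benchmark operations platform or a complete cloud platform.
+   - The release boundary of this `v1.7.0` is defined by `docs/release-notes-v1.7.0.md`: it focuses on the remote service and model pool security experience, and does not claim a complete cloud hosted control plane, node cluster orchestration or an agent platform.
 2. Run the release package verification locally first: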
 ```bash
 npm run release:verify
 ```
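-During v1.5.0, an extra entry-stability pass is recommended before the official `release:verify`:
+During v1.7.0, an extra service-security and model-pool pass is recommended before the official `release:verify`: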
 ```bash
 npm test -- --run --coverage
@@ -43,7 +43,7 @@ npm run test:e2e:cli
 npm run test:e2e:acceptance
 ```
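-The coverage scope has been extended from the early `src/trigger/**/*.ts` to the setup, config, models, protocols, governance, server, auth, doctor and cli main chain; `test:ui` is a source-side `/ui` DOM smoke test that protects basic interactions such as config loading, compiled preview, save-failure messages and the Health action; `test:e2e:cli:entry` is a shorter post-packaging entry smoke test that first protects init, doctor, start/status/stop, setup fresh, setup remote client, setup server deployment, code and ui; when new entry features are added later, add the corresponding guard tests first and only then extend low-frequency capabilities.
+The coverage scope has been extended from the early `src/trigger/**/*.ts` to the setup, config, models, protocols, governance, server, auth, doctor and cli main chain; `test:ui` is a source-side `/ui` DOM smoke test that protects basic interactions such as config loading, compiled preview, save-failure messages, the Health action, benchmark history and the human calibration form; `test:e2e:cli:entry` is a shorter post-packaging entry smoke test that first protects init, doctor, start/status/stop, setup fresh, setup remote client, setup server deployment, code and ui; when new entry features are added later, add the corresponding guard tests first and only then extend low-frequency capabilities.
 This step runs, in order: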
@@ -80,7 +80,7 @@ npm run test:e2e:acceptance
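 - The safety prompt and the "no extra file modifications" boundary when the target port is occupied by a different service
 - Safe cleanup of leftover / stale PID files
 - Whether the `.release-stage\ctr-release-home.cmd` wrapper generated by `release:stage` really points to the isolated `.release-home`
-- The v1.5.0 entry-stability release commitments: packaged entry smoke, UI DOM smoke, the config-save safety line, remote client setup and server deployment setup
+- The v1.7.0 service-security and model-pool release commitments: server security policy, managed key rotation, pool health probe, cost/rate metadata and the round-robin / health-aware / cost-aware strategies
 Continue to the official release only after this step passes, to avoid discovering after publishing that the package contents, CLI startup or the setup main flow is broken.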

package/docs/server-maintainer-guide.md CHANGED

@@ -16,12 +16,15 @@ The command creates a server-oriented config with:
 - `HOST: "0.0.0.0"`
 - `Runtime.mode: "server"`
+- `Runtime.security` defaults for public auth, admin-only bootstrap key, HTTPS/private-network exposure and recommended managed key scopes
 - a bootstrap `APIKEY`
 - logging enabled
 - editable `Models` and `Router.default`
 Edit `Models[].key`, `Models[].api`, `Models[].interface` and `Models[].model` before exposing the service.
+`GET /api/service-info` returns the same security policy and a deployment checklist so maintainers can verify the live service before sharing the URL.
 ## 2. Diagnose and start
 ```bash
@@ -55,7 +58,19 @@ Recommended remote-user scopes:
 `client` allows model calls. `read-only` allows ready/status checks such as `/api/health`, `/api/service-info`, compiled model summaries and governance GET endpoints. `operator` is for day-to-day maintenance writes such as restart, governance snapshots, schedules, anomaly thresholds and archive deletion; it cannot read or save config and cannot manage auth keys. Generated secrets are returned once.
-## 4. Expose safely
+## 4. Rotate keys
+Use admin auth to rotate a managed key when a remote client changes owner, a secret may have leaked, or a regular maintenance window requires renewal:
+```text
+POST /api/auth/keys/:id/rotate
+```
+Rotation creates one replacement key, returns the new secret once, preserves the old key's scopes/quota unless overrides are provided, and immediately revokes the old key. After the client confirms the new secret works, check `GET /api/auth/keys` and `GET /api/auth/audit` for the expected active/revoked state.
+Use `POST /api/auth/keys/:id/revoke` when no replacement should be issued.
+## 5. Expose safely
 Prefer one of these deployment envelopes:
@@ -67,8 +82,9 @@ Before exposing the service to other machines:
 - keep auth enabled with bootstrap or active managed keys
 - put public deployments behind HTTPS reverse proxy or private network access
 - give remote users managed `client + read-only` keys, not admin/bootstrap keys
+- rotate managed keys during ownership changes or suspected leaks
-## 5. Daily maintenance
+## 6. Daily maintenance
 Use:
@@ -79,3 +95,5 @@ ctr ui
 ```
 `ctr ui` opens the workbench. The maintainer area shows security status, auth scope guidance, quota usage, governance health and routing outcome summaries.
+For model pools, use `GET /api/models/pool-health` for current health and `POST /api/models/pool-health/probe` for an operator-triggered lightweight reachability probe. The probe does not send a model request; it uses `HEAD` against enabled endpoints, records latency for reachable endpoints, and records failures into the existing cooldown/circuit breaker state.
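
Reviewer sketch: the probe rule stated in this diff (2xx/3xx/4xx means reachable and records latency; 5xx or a network error records a failure that feeds the cooldown / circuit breaker) can be written as two small pure functions. The type names, the `openThreshold` default of 3, the `"cooldown"` state label and the treatment of 1xx as a failure are assumptions for illustration; the docs only name an `open` state and an unspecified failure threshold.

```typescript
type ProbeOutcome =
  | { reachable: true; latencyMs: number }
  | { reachable: false; reason: string };

// `status` is the HTTP status of the HEAD response, or null when the request
// failed at the network level (DNS, connection refused, timeout).
function classifyProbe(status: number | null, latencyMs: number): ProbeOutcome {
  if (status !== null && status >= 200 && status < 500) {
    return { reachable: true, latencyMs };
  }
  return { reachable: false, reason: status === null ? "network-error" : `http-${status}` };
}

// Hypothetical per-endpoint health record.
interface EndpointHealth {
  state: "closed" | "cooldown" | "open";
  consecutiveFailures: number;
  latencySamples: number[];
}

// Folds one probe outcome into the health record without mutating the input.
function applyProbe(h: EndpointHealth, outcome: ProbeOutcome, openThreshold = 3): EndpointHealth {
  if (outcome.reachable) {
    return {
      state: "closed",
      consecutiveFailures: 0,
      latencySamples: [...h.latencySamples, outcome.latencyMs],
    };
  }
  const failures = h.consecutiveFailures + 1;
  return {
    state: failures >= openThreshold ? "open" : "cooldown",
    consecutiveFailures: failures,
    latencySamples: h.latencySamples,
  };
}
```

Treating 4xx as reachable matches the guide's point that the probe checks network reachability only: an endpoint that rejects an unauthenticated `HEAD` is still up.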

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "@peterwangze/claude-trigger-router",
-  "version": "1.5.0",
+  "version": "1.7.0",
   "description": "Intelligent trigger-based router for Claude Code with automatic task type detection and model routing",
   "bin": {
     "ctr": "dist/cli.js"