pi-cache-optimizer 2.1.0 → 2.2.0
- package/README.md +9 -1
- package/README.zh-CN.md +14 -2
- package/index.ts +85 -15
- package/package.json +1 -1
package/README.md
CHANGED
@@ -65,7 +65,15 @@ Generic OpenAI-compatible proxies are **not** treated as OpenAI-family just beca
 pi install npm:pi-cache-optimizer
 ```
 
-After installation, `PI_CACHE_RETENTION=long` is applied automatically, the system prompt is reordered automatically, `~/.pi/agent/models.json` is auto-seeded with a DeepSeek block when no DeepSeek-like model is configured, and the footer shows cache stats after supported model-family responses with exposed usage.
+After installation, `PI_CACHE_RETENTION=long` is applied automatically, the system prompt is reordered and skills are compressed automatically, session-overview churn is stripped automatically, `~/.pi/agent/models.json` is auto-seeded with a DeepSeek block when no DeepSeek-like model is configured, and the footer shows cache stats after supported model-family responses with exposed usage.
+
+## Opt-out
+
+| Env var | Effect |
+|---------|--------|
+| `PI_CACHE_OPTIMIZER_NO_AUTO_CONFIG=1` | Skip DeepSeek `models.json` auto-seed |
+| `PI_CACHE_OPTIMIZER_NO_SKILL_COMPRESSION=1` | Keep pi's verbose `<available_skills>` XML (opt out of one-line index) |
+| `PI_CACHE_OPTIMIZER_OPENAI_CACHE_KEY=1` | Add `prompt_cache_key` to OpenAI-family requests (opt-in) |
 
 ## Uninstall
 
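The opt-out table in the hunk above implies a small flag check when the extension loads. The block below is a minimal sketch of that idea, not the package's code: the `readOptOuts` helper and the `OptimizerFlags` shape are hypothetical, and it assumes a variable counts as set only when its value is exactly `1` (the diff does not show how `index.ts` parses these variables). The environment is passed in as a plain object so the sketch stays self-contained.

```typescript
// Hypothetical helper; not part of pi-cache-optimizer's published API.
type Env = Record<string, string | undefined>;

interface OptimizerFlags {
  autoConfig: boolean; // seed a DeepSeek block into models.json
  skillCompression: boolean; // replace <available_skills> XML with a one-line index
  openAiCacheKey: boolean; // add prompt_cache_key to OpenAI-family requests
}

// Assumption: a variable is "on" only when it equals the string "1".
function isSet(env: Env, name: string): boolean {
  return env[name] === "1";
}

function readOptOuts(env: Env): OptimizerFlags {
  return {
    // The two NO_* variables switch a default-on feature off.
    autoConfig: !isSet(env, "PI_CACHE_OPTIMIZER_NO_AUTO_CONFIG"),
    skillCompression: !isSet(env, "PI_CACHE_OPTIMIZER_NO_SKILL_COMPRESSION"),
    // The cache-key variable switches a default-off feature on.
    openAiCacheKey: isSet(env, "PI_CACHE_OPTIMIZER_OPENAI_CACHE_KEY"),
  };
}
```

With an empty environment this sketch leaves auto-config and skill compression on and the OpenAI cache key off, which matches the defaults the README describes.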
package/README.zh-CN.md
CHANGED
@@ -23,6 +23,9 @@
 | Feature | How | Manual action required? |
 |------|------|:---:|
 | 🔄 Reorder system prompt | `before_agent_start` hook: stable prefix first, dynamic context last | ❌ Automatic |
+| 🗜️ Compress Skills XML | Replaces pi's four-line-per-skill XML with a compact one-line index grouped by skills-root (about 93% smaller) | ❌ Automatic |
+| 🧹 Strip session-overview dynamic tail fields | Removes `RECENT COMMITS`, `Working directory`, and `Line count` from `<session-overview>`; these fields change every turn and break the prefix cache | ❌ Automatic |
+| 🛡️ Integrity guard | Detects whether the prompt reorder accidentally truncated a trellis structural marker; if so, falls back to the original prompt and shows `⚠️ integrity` in the footer | ❌ Automatic |
 | ⏳ Long cache retention | Sets `PI_CACHE_RETENTION=long` when the extension loads; Pi/provider compat decides what is actually sent | ❌ Automatic |
 | 🔗 Conservative compat reminders | DeepSeek session-affinity reminder, plus an obvious cache-control reminder for Claude-compatible endpoints | ⚠️ See below |
 | 📊 Provider-specific footer stats | Shows read-only cache stats for supported provider families in the Pi footer/status | ❌ Automatic |
@@ -65,7 +68,15 @@ Generic OpenAI-compatible proxies are **not**, merely because they use an OpenAI-shaped API or
 pi install npm:pi-cache-optimizer
 ```
 
-After installation, `PI_CACHE_RETENTION=long` **takes effect automatically**, system prompt
+After installation, `PI_CACHE_RETENTION=long` **takes effect automatically**, the system prompt is **reordered automatically**, skills are compressed automatically, and the session-overview dynamic tail fields are stripped automatically; if `~/.pi/agent/models.json` has no DeepSeek-like model yet, a `deepseek` provider block is seeded automatically; once a response from a supported model family completes and exposes usage, the footer status bar shows cache stats.
+
+## Opt-out
+
+| Env var | Effect |
+|---------|------|
+| `PI_CACHE_OPTIMIZER_NO_AUTO_CONFIG=1` | Skip the automatic DeepSeek write to `models.json` |
+| `PI_CACHE_OPTIMIZER_NO_SKILL_COMPRESSION=1` | Keep pi's verbose `<available_skills>` XML (opt out of the one-line index) |
+| `PI_CACHE_OPTIMIZER_OPENAI_CACHE_KEY=1` | Add `prompt_cache_key` to OpenAI-family requests (must be enabled explicitly) |
 
 ## Uninstall
 
@@ -196,7 +207,8 @@ Pi itself also decides, based on model compat and `PI_CACHE_RETENTION`, whether to send
 
 This package now has provider-family stats adapters, but it still avoids blind generalization:
 
-- DeepSeek cache is an automatic prefix/KV cache. Hits are best-effort, and proxies may hide DeepSeek usage fields.
+- DeepSeek cache is an automatic prefix/KV cache. Hits are best-effort, and proxies may hide DeepSeek usage fields. DeepSeek's Anthropic API compatibility layer **explicitly ignores `cache_control` markers** (for all content types), so explicit cache breakpoints of the kind Claude Code uses have no effect on DeepSeek.
+- **Kiro / kiro-api**: the `pi-provider-kiro` extension uses the AWS CodeWhisperer / Q Developer streaming protocol (not Anthropic Messages / OpenAI Chat Completions / Bedrock Converse). That protocol has no place to inject `cache_control` markers and does not return `cache_read_input_tokens`. For Kiro Claude models the footer shows **0%**; this is a limitation of `pi-provider-kiro`, not a bug in this extension. Do not add special-case logic to bump these numbers.
 - OpenAI-family prompt caching takes effect automatically only when the real upstream supports it and the prompt is long enough. The adapter is based on model names and is deliberately conservative; it does not use provider/API/base URL metadata to infer official OpenAI support.
 - Claude prompt caching depends on explicit Anthropic cache-control breakpoints. This version only reports the stats that Pi/provider expose; it does not insert breakpoints or modify the request body.
 - Gemini/Vertex may expose an implicit cached-content token count. This version does not create, save, update, or delete explicit Gemini cached-content resources.
package/index.ts
CHANGED
@@ -51,11 +51,12 @@ const NO_SKILL_COMPRESSION_ENV = "PI_CACHE_OPTIMIZER_NO_SKILL_COMPRESSION";
 const DEEPSEEK_API_KEY_ENV = "DEEPSEEK_API_KEY";
 
 // WORM-flag: if optimizeSystemPrompt ever detects that its blind-replace
-// logic has accidentally truncated
-//
-// publishStatus reads it once, appends a footer warning, then
-// The flag surface is kept separate from the regular
-// so that a one-turn glitch doesn't poison the
+// logic has accidentally truncated a structural marker (any XML tag or
+// HTML comment boundary marker present in the original prompt), we flip
+// this. publishStatus reads it once, appends a footer warning, then
+// resets it. The flag surface is kept separate from the regular
+// cache-stats counter so that a one-turn glitch doesn't poison the
+// persisted metrics.
 let promptTruncationDetected = false;
 
 // Minimum count of skills before compression is worth applying.
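The comment in the hunk above describes a read-once flag: the guard sets it, `publishStatus` reads it a single time, appends a footer warning, and resets it. The block below is a minimal sketch of that pattern under stated assumptions. `flagTruncation` and `consumeIntegrityWarning` are hypothetical names, not functions from `index.ts`; the `⚠️ integrity` text is the footer label mentioned in the README.

```typescript
// Module-level flag, named as in index.ts.
let promptTruncationDetected = false;

// Hypothetical setter standing in for the guard inside optimizeSystemPrompt.
function flagTruncation(): void {
  promptTruncationDetected = true;
}

// Hypothetical stand-in for the footer step of publishStatus: read the flag
// once, return the warning text, and reset so the next turn starts clean.
function consumeIntegrityWarning(): string {
  if (!promptTruncationDetected) return "";
  promptTruncationDetected = false;
  return "⚠️ integrity";
}
```

Because the flag is reset on read, a single bad turn produces a single warning, and nothing about it needs to be stored with the cache-stats counters.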
@@ -387,6 +388,62 @@ function stripSessionOverviewChurn(prompt: string): string {
   return before + cleaned + after;
 }
 
+/**
+ * Extract structural markers from a prompt for the integrity guard.
+ *
+ * The guard runs in `optimizeSystemPrompt` to catch cases where the
+ * blind `rest.replace(part, "")` reorder accidentally eats text inside
+ * an extension-injected structural block (e.g., trellis
+ * `<workflow-state>`, a hypothetical `<task-tracker>`, or AGENTS.md
+ * `<!-- TRELLIS:START -->` markers). When the original prompt contains
+ * a marker that the result is missing, we fall back to the original
+ * prompt rather than ship a corrupted one.
+ *
+ * Three marker categories are recognized (covers ~99% of real-world
+ * extension injection patterns in the pi ecosystem):
+ *
+ *   1. XML-style opening tags `<tagname>` (lowercase, alpha-num + `_`/`-`)
+ *   2. XML-style closing tags `</tagname>`
+ *   3. HTML comment START/END `<!-- NAME:START -->` / `<!-- NAME:END -->`
+ *
+ * Tags with attributes (e.g., `<task id="42">`) are not currently emitted
+ * by any pi extension we know of and are skipped to keep the regex tight.
+ * Markdown headers, horizontal rules, and timestamp patterns are not
+ * usable as guards because they have no closing form to verify.
+ *
+ * The check is deliberately set-based (presence/absence) rather than
+ * count-based: a single occurrence per request is the universal
+ * convention, and a count drop with the same set of unique tags would
+ * be a different class of bug not catchable here.
+ */
+function extractStructuralMarkers(prompt: string): {
+  openingTags: Set<string>;
+  closingTags: Set<string>;
+  commentMarkers: Set<string>;
+} {
+  const openingTags = new Set<string>();
+  const closingTags = new Set<string>();
+  const commentMarkers = new Set<string>();
+
+  // Opening tags: <tagname> with no attributes and no leading slash.
+  // Tagname must start with a letter and contain only alpha-num, `-`, `_`.
+  for (const match of prompt.matchAll(/<([a-z][a-z0-9_-]*)>/gi)) {
+    openingTags.add(match[1].toLowerCase());
+  }
+  // Closing tags: </tagname>
+  for (const match of prompt.matchAll(/<\/([a-z][a-z0-9_-]*)>/gi)) {
+    closingTags.add(match[1].toLowerCase());
+  }
+  // HTML comments with NAME:START or NAME:END inside.
+  // Trellis emits `<!-- TRELLIS:START -->` / `<!-- TRELLIS:END -->` in
+  // the AGENTS.md managed block; other extensions follow this convention.
+  for (const match of prompt.matchAll(/<!--\s*([A-Z][A-Z0-9_-]*):(START|END)\s*-->/g)) {
+    commentMarkers.add(`${match[1]}:${match[2]}`);
+  }
+
+  return { openingTags, closingTags, commentMarkers };
+}
+
 function optimizeSystemPrompt(
   original: string,
   opts: BuildSystemPromptOptions,
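To make the three marker categories concrete, the block below repeats `extractStructuralMarkers` as it appears in the hunk above (return type left to inference) and runs it on a sample prompt. The sample prompt is made up for illustration; only the function body comes from the diff.

```typescript
// Copied from the hunk above so the example is self-contained.
function extractStructuralMarkers(prompt: string) {
  const openingTags = new Set<string>();
  const closingTags = new Set<string>();
  const commentMarkers = new Set<string>();
  for (const match of prompt.matchAll(/<([a-z][a-z0-9_-]*)>/gi)) {
    openingTags.add(match[1].toLowerCase());
  }
  for (const match of prompt.matchAll(/<\/([a-z][a-z0-9_-]*)>/gi)) {
    closingTags.add(match[1].toLowerCase());
  }
  for (const match of prompt.matchAll(/<!--\s*([A-Z][A-Z0-9_-]*):(START|END)\s*-->/g)) {
    commentMarkers.add(`${match[1]}:${match[2]}`);
  }
  return { openingTags, closingTags, commentMarkers };
}

// Illustrative prompt (an assumption, not taken from a real pi session).
const sample = [
  "<workflow-state>",
  "step 2 of 5",
  "</workflow-state>",
  "<!-- TRELLIS:START -->",
  "managed rules",
  "<!-- TRELLIS:END -->",
  '<task id="42">skipped as an opening tag</task>',
].join("\n");

const markers = extractStructuralMarkers(sample);
console.log([...markers.openingTags]); // ["workflow-state"]
console.log([...markers.closingTags]); // ["workflow-state", "task"]
console.log([...markers.commentMarkers]); // ["TRELLIS:START", "TRELLIS:END"]
```

On this sample, the attribute-bearing `<task id="42">` is skipped as an opening tag while its closing `</task>` is still collected, so the opening and closing sets need not mirror each other.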
@@ -420,17 +477,29 @@ function optimizeSystemPrompt(
     stablePrefix +
     (dynamicRemainder.length > 0 ? "\n\n---\n\n" + dynamicRemainder : "");
 
-  // Sanity check:
-  // markers
-  //
-  //
-  //
-  // trellis
+  // Sanity check: scan ALL structural markers (XML tags + HTML comment
+  // boundary markers) in the original and verify each one survives the
+  // reorder. If any marker drops, the blind `rest.replace(part, "")`
+  // logic ate something it shouldn't have — fall back to the original
+  // prompt and flag the footer warning. This is provider-agnostic and
+  // extension-agnostic: trellis `<workflow-state>`, a hypothetical
+  // `<task-tracker>`, AGENTS.md `<!-- TRELLIS:START -->`, etc., are all
+  // protected without code changes when new extensions ship.
   //
-  //
-  //
-  //
-
+  // Our skills compression runs BEFORE optimizeSystemPrompt and replaces
+  // pi's verbose `<available_skills>` block with a compressed text
+  // section that has no XML tag. So `original` here (post-compression)
+  // does not contain `<available_skills>` and the result doesn't either
+  // — no false positive.
+  const originalMarkers = extractStructuralMarkers(original);
+  const resultMarkers = extractStructuralMarkers(systemPrompt);
+
+  const missing =
+    [...originalMarkers.openingTags].some((tag) => !resultMarkers.openingTags.has(tag)) ||
+    [...originalMarkers.closingTags].some((tag) => !resultMarkers.closingTags.has(tag)) ||
+    [...originalMarkers.commentMarkers].some((m) => !resultMarkers.commentMarkers.has(m));
+
+  if (missing) {
     promptTruncationDetected = true;
     return { systemPrompt: original, stablePrefix: "", changed: false };
   }
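The guard in the hunk above is set-based: it falls back when a marker disappears entirely, and it does not notice a drop in how often a marker occurs. The block below is a simplified sketch of that behavior, not the code from `index.ts`. `markerSet`, `anyMissing`, and `guard` are hypothetical helpers, and the sketch tracks only attribute-free opening and closing tags in a single set.

```typescript
// Simplified, hypothetical marker scan: attribute-free XML-style tags only.
function markerSet(prompt: string): Set<string> {
  const found = new Set<string>();
  for (const m of prompt.matchAll(/<\/?[a-z][a-z0-9_-]*>/gi)) {
    found.add(m[0].toLowerCase());
  }
  return found;
}

// True when at least one marker from the original is absent from the result.
function anyMissing(original: string, result: string): boolean {
  const after = markerSet(result);
  return [...markerSet(original)].some((marker) => !after.has(marker));
}

// Presence/absence guard: ship the candidate only if every marker survived.
function guard(original: string, candidate: string): string {
  return anyMissing(original, candidate) ? original : candidate;
}

const original = "<workflow-state>plan</workflow-state>\nrules";

// A clean reorder keeps both tags, so the candidate is accepted.
const reordered = "rules\n<workflow-state>plan</workflow-state>";

// A bad replace drops the closing tag, so the guard returns the original.
const truncated = "rules\n<workflow-state>plan";

// A count drop with the same unique tags passes, as the doc comment warns.
const twice = "<note>a</note><note>b</note>";
const once = "<note>a</note>";

console.log(guard(original, reordered) === reordered); // true
console.log(guard(original, truncated) === original); // true
console.log(guard(twice, once) === once); // true
```

The last case illustrates the trade-off the doc comment names: a presence check is cheap and has no false positives from legitimate reordering, at the cost of missing duplicate-marker losses.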
@@ -1240,6 +1309,7 @@ export const __internals_for_tests = {
   buildStableCandidates,
   optimizeSystemPrompt,
   stripSessionOverviewChurn,
+  extractStructuralMarkers,
   formatSkillsForPrompt,
   formatSkillsForPromptCompressed,
   compressSkillsInSystemPrompt,
package/package.json
CHANGED
@@ -1,6 +1,6 @@
 {
   "name": "pi-cache-optimizer",
-  "version": "2.
+  "version": "2.2.0",
   "description": "Pi extension that improves provider-side KV/prompt cache hit rates (DeepSeek, OpenAI, Claude, Gemini) by reordering the system prompt, requesting long retention, and showing footer cache stats. Renamed from pi-deepseek-cache-optimizer.",
   "keywords": [
     "pi-package",