pi-cache-optimizer 2.0.2 → 2.1.1
- package/README.md +9 -1
- package/README.zh-CN.md +14 -2
- package/index.ts +256 -6
- package/package.json +1 -1
package/README.md
CHANGED
|
@@ -65,7 +65,15 @@ Generic OpenAI-compatible proxies are **not** treated as OpenAI-family just beca
|
|
|
65
65
|
pi install npm:pi-cache-optimizer
|
|
66
66
|
```
|
|
67
67
|
|
|
68
|
-
After installation, `PI_CACHE_RETENTION=long` is applied automatically, the system prompt is reordered automatically, `~/.pi/agent/models.json` is auto-seeded with a DeepSeek block when no DeepSeek-like model is configured, and the footer shows cache stats after supported model-family responses with exposed usage.
|
|
68
|
+
After installation, `PI_CACHE_RETENTION=long` is applied automatically, the system prompt is reordered and skills are compressed automatically, session-overview churn is stripped automatically, `~/.pi/agent/models.json` is auto-seeded with a DeepSeek block when no DeepSeek-like model is configured, and the footer shows cache stats after supported model-family responses with exposed usage.
|
|
69
|
+
|
|
70
|
+
## Opt-out
|
|
71
|
+
|
|
72
|
+
| Env var | Effect |
|
|
73
|
+
|---------|--------|
|
|
74
|
+
| `PI_CACHE_OPTIMIZER_NO_AUTO_CONFIG=1` | Skip DeepSeek `models.json` auto-seed |
|
|
75
|
+
| `PI_CACHE_OPTIMIZER_NO_SKILL_COMPRESSION=1` | Keep pi's verbose `<available_skills>` XML (opt out of one-line index) |
|
|
76
|
+
| `PI_CACHE_OPTIMIZER_OPENAI_CACHE_KEY=1` | Add `prompt_cache_key` to OpenAI-family requests (opt-in) |
|
|
69
77
|
|
|
70
78
|
## Uninstall
|
|
71
79
|
|
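The opt-out table above maps to plain environment checks. The diff calls a helper named `isEnabledEnv` without showing its body, so the version below is a hypothetical stand-in that assumes `1` and `true` (case-insensitive) count as enabled; the package's real parsing rules may differ. `readFlags` and `OptimizerFlags` are likewise illustrative names, not part of the package.

```typescript
// Hypothetical stand-in for the package's `isEnabledEnv`; the accepted
// values ("1", "true") are an assumption, not taken from the source.
function isEnabledEnv(value: string | undefined): boolean {
  if (value === undefined) return false;
  const normalized = value.trim().toLowerCase();
  return normalized === "1" || normalized === "true";
}

// Env var names as they appear in the README table.
const NO_AUTO_CONFIG_ENV = "PI_CACHE_OPTIMIZER_NO_AUTO_CONFIG";
const NO_SKILL_COMPRESSION_ENV = "PI_CACHE_OPTIMIZER_NO_SKILL_COMPRESSION";
const OPENAI_CACHE_KEY_ENV = "PI_CACHE_OPTIMIZER_OPENAI_CACHE_KEY";

interface OptimizerFlags {
  autoConfig: boolean;       // on unless opted out
  skillCompression: boolean; // on unless opted out
  openaiCacheKey: boolean;   // off unless opted in
}

// Takes the environment as a parameter so it can be exercised without
// touching the real process environment.
function readFlags(env: Record<string, string | undefined>): OptimizerFlags {
  return {
    autoConfig: !isEnabledEnv(env[NO_AUTO_CONFIG_ENV]),
    skillCompression: !isEnabledEnv(env[NO_SKILL_COMPRESSION_ENV]),
    openaiCacheKey: isEnabledEnv(env[OPENAI_CACHE_KEY_ENV]),
  };
}
```

Two of the variables are opt-outs and one is an opt-in, so an empty environment leaves auto-config and skill compression on and the OpenAI cache key off.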
package/README.zh-CN.md
CHANGED
|
@@ -23,6 +23,9 @@
|
|
|
23
23
|
| Feature | How | Manual action required? |
|
|
24
24
|
|------|------|:---:|
|
|
25
25
|
| 🔄 Reorder system prompt | `before_agent_start` hook: stable prefix first, dynamic context after | ❌ Automatic |
|
|
26
|
+
| 🗜️ Compress Skills XML | Replaces pi's four-line-per-skill XML with a compact one-line index grouped by skills root (about 93% smaller) | ❌ Automatic |
|
|
27
|
+
| 🧹 Strip session-overview churn fields | Removes `RECENT COMMITS`, `Working directory`, and `Line count` from `<session-overview>`; these fields change every turn and break the prefix cache | ❌ Automatic |
|
|
28
|
+
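| 🛡️ Integrity guard | Detects whether prompt reordering accidentally truncated a trellis structural marker; if so, falls back to the original prompt and shows `⚠️ integrity` in the footer | ❌ Automatic |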
|
|
26
29
|
| ⏳ Long cache retention | Sets `PI_CACHE_RETENTION=long` when the extension loads; Pi/provider compat decides what is actually sent | ❌ Automatic |
|
|
27
30
|
| 🔗 Conservative compat hints | DeepSeek session-affinity hint, plus an obvious cache-control hint for Claude-compatible endpoints | ⚠️ See below |
|
|
28
31
|
| 📊 Provider-specific footer stats | Shows read-only cache stats for supported provider families in the Pi footer/status | ❌ Automatic |
|
|
@@ -65,7 +68,15 @@ Generic OpenAI-compatible proxies will **not**, merely for using an OpenAI-shaped API or
|
|
|
65
68
|
pi install npm:pi-cache-optimizer
|
|
66
69
|
```
|
|
67
70
|
|
|
68
|
-
After installation, `PI_CACHE_RETENTION=long` **takes effect automatically**, system prompt
|
|
71
|
+
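After installation, `PI_CACHE_RETENTION=long` **takes effect automatically**, the system prompt is **reordered automatically**, skills are compressed automatically, and session-overview churn fields are stripped automatically; if `~/.pi/agent/models.json` has no DeepSeek-like model yet, a `deepseek` provider block is seeded automatically; once a response from a supported model family completes and exposes usage, the footer status bar shows cache stats.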
|
|
72
|
+
|
|
73
|
+
## Opt-out
|
|
74
|
+
|
|
75
|
+
| Env var | Effect |
|
|
76
|
+
|---------|------|
|
|
77
|
+
| `PI_CACHE_OPTIMIZER_NO_AUTO_CONFIG=1` | Skip the automatic DeepSeek write to `models.json` |
|
|
78
|
+
| `PI_CACHE_OPTIMIZER_NO_SKILL_COMPRESSION=1` | Keep pi's verbose `<available_skills>` XML (opt out of the one-line index) |
|
|
79
|
+
| `PI_CACHE_OPTIMIZER_OPENAI_CACHE_KEY=1` | Add `prompt_cache_key` to OpenAI-family requests (must be enabled explicitly) |
|
|
69
80
|
|
|
70
81
|
## Uninstall
|
|
71
82
|
|
|
@@ -196,7 +207,8 @@ Pi itself also decides, based on model compat and `PI_CACHE_RETENTION`, whether to send
|
|
|
196
207
|
|
|
197
208
|
This package now has provider-family stats adapters, but still avoids blind generalization:
|
|
198
209
|
|
|
199
|
-
- DeepSeek cache is automatic prefix/KV caching. Hits are best-effort, and proxies may hide DeepSeek usage fields.
|
|
210
|
+
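- DeepSeek cache is automatic prefix/KV caching. Hits are best-effort, and proxies may hide DeepSeek usage fields. DeepSeek's Anthropic API compatibility layer **explicitly ignores `cache_control` markers** (for all content types), so explicit cache breakpoints of the kind Claude Code uses have no effect on DeepSeek.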
|
|
211
|
+
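- **Kiro / kiro-api**: the `pi-provider-kiro` extension uses the AWS CodeWhisperer / Q Developer streaming protocol (not Anthropic Messages, OpenAI Chat Completions, or Bedrock Converse). That protocol has no place to inject `cache_control` markers and does not return `cache_read_input_tokens`. For Kiro Claude models the footer shows **0%**; this is a limitation of `pi-provider-kiro`, not a bug in this extension. Do not add special-case logic to bump these numbers.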
|
|
200
212
|
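- OpenAI-family prompt caching only takes effect automatically when the real upstream supports it and the prompt is long enough. The adapter is model-name based and deliberately conservative; it does not infer official OpenAI support from provider/API/base URL metadata.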
|
|
201
213
|
- Claude prompt caching depends on explicit Anthropic cache-control breakpoints. This version only reports stats exposed by Pi/provider; it does not insert breakpoints or modify the request body.
|
|
202
214
|
- Gemini/Vertex may expose implicit cached-content token counts. This version does not create, save, update, or delete explicit Gemini cached-content resources.
|
package/index.ts
CHANGED
|
@@ -7,7 +7,7 @@ import {
|
|
|
7
7
|
} from "node:fs";
|
|
8
8
|
import { mkdir, readFile, rename, unlink, writeFile } from "node:fs/promises";
|
|
9
9
|
import { homedir } from "node:os";
|
|
10
|
-
import { join } from "node:path";
|
|
10
|
+
import { dirname, join } from "node:path";
|
|
11
11
|
import type { BuildSystemPromptOptions, ExtensionAPI, ExtensionContext } from "@earendil-works/pi-coding-agent";
|
|
12
12
|
|
|
13
13
|
/**
|
|
@@ -47,8 +47,24 @@ const CACHE_PROVIDER_IDS: CacheProviderId[] = ["deepseek", "openai", "claude", "
|
|
|
47
47
|
const OPENAI_CACHE_KEY_ENV = "PI_CACHE_OPTIMIZER_OPENAI_CACHE_KEY";
|
|
48
48
|
const OPENAI_PROMPT_CACHE_KEY_PREFIX = "pi-dsco-";
|
|
49
49
|
const NO_AUTO_CONFIG_ENV = "PI_CACHE_OPTIMIZER_NO_AUTO_CONFIG";
|
|
50
|
+
const NO_SKILL_COMPRESSION_ENV = "PI_CACHE_OPTIMIZER_NO_SKILL_COMPRESSION";
|
|
50
51
|
const DEEPSEEK_API_KEY_ENV = "DEEPSEEK_API_KEY";
|
|
51
52
|
|
|
53
|
+
// One-shot flag: if optimizeSystemPrompt ever detects that its blind-replace
|
|
54
|
+
// logic has accidentally truncated the trellis `<workflow-state>` block
|
|
55
|
+
// (or any structural marker from an upstream extension), we flip this.
|
|
56
|
+
// publishStatus reads it once, appends a footer warning, then resets it.
|
|
57
|
+
// The flag surface is kept separate from the regular cache-stats counter
|
|
58
|
+
// so that a one-turn glitch doesn't poison the persisted metrics.
|
|
59
|
+
let promptTruncationDetected = false;
|
|
60
|
+
|
|
61
|
+
// Minimum count of skills before compression is worth applying.
|
|
62
|
+
// Below this, pi's verbose XML block is small enough that the overhead of
|
|
63
|
+
// an additional one-line index isn't worth the loss of per-skill
|
|
64
|
+
// description hints. The 31-skill snapshot in this repo was 13.3 KB; one
|
|
65
|
+
// or two skills is well under 1 KB and not worth touching.
|
|
66
|
+
const SKILL_COMPRESSION_MIN_COUNT = 4;
|
|
67
|
+
|
|
52
68
|
// Minimum trimmed length for a candidate to qualify as a stable-prefix "part".
|
|
53
69
|
//
|
|
54
70
|
// `optimizeSystemPrompt` removes each accepted candidate from the dynamic
|
|
@@ -166,6 +182,121 @@ function formatSkillsForPrompt(skills: NonNullable<BuildSystemPromptOptions["ski
|
|
|
166
182
|
return lines.join("\n");
|
|
167
183
|
}
|
|
168
184
|
|
|
185
|
+
/**
|
|
186
|
+
* Compressed alternative to `formatSkillsForPrompt`.
|
|
187
|
+
*
|
|
188
|
+
* Pi emits a four-line XML block per skill (`<name>`, `<description>`,
|
|
189
|
+
* `<location>`) plus a three-sentence preamble. With 31 skills active in
|
|
190
|
+
* this repo that block measured 13.3 KB — 61.5 % of the total system
|
|
191
|
+
* prompt. The full description text matters when the model has to decide
|
|
192
|
+
* which skill to load, but the model can read SKILL.md on demand: the
|
|
193
|
+
* names alone plus a known location pattern is enough to identify
|
|
194
|
+
* candidates.
|
|
195
|
+
*
|
|
196
|
+
* This compressed form preserves:
|
|
197
|
+
* 1. The instruction to read SKILL.md when a task matches a skill name.
|
|
198
|
+
* 2. The relative-path resolution rule (parent of SKILL.md is the
|
|
199
|
+
* skill directory).
|
|
200
|
+
* 3. Discoverability of every skill: name + location prefix per skill.
|
|
201
|
+
*
|
|
202
|
+
* It drops:
|
|
203
|
+
* - Per-skill description text (model loads it via `read` when a name
|
|
204
|
+
* matches a task).
|
|
205
|
+
* - The `<available_skills>` XML envelope and per-skill XML overhead
|
|
206
|
+
* (~110 bytes per skill of pure structure, plus the location path).
|
|
207
|
+
*
|
|
208
|
+
* Output shape is a single text block grouped by skill-root directory so
|
|
209
|
+
* the model can compute each skill's full path by name. Names are sorted
|
|
210
|
+
* alphabetically within each group for determinism (cache stability).
|
|
211
|
+
*/
|
|
212
|
+
function formatSkillsForPromptCompressed(
|
|
213
|
+
skills: NonNullable<BuildSystemPromptOptions["skills"]>,
|
|
214
|
+
): string {
|
|
215
|
+
const visibleSkills = skills.filter((skill) => !skill.disableModelInvocation);
|
|
216
|
+
if (visibleSkills.length === 0) return "";
|
|
217
|
+
|
|
218
|
+
const groups = new Map<string, string[]>();
|
|
219
|
+
for (const skill of visibleSkills) {
|
|
220
|
+
// skill.filePath = .../<skill-name>/SKILL.md, so dirname is the
|
|
221
|
+
// skill directory and dirname-of-dirname is the skills root.
|
|
222
|
+
const skillDir = dirname(skill.filePath);
|
|
223
|
+
const root = dirname(skillDir);
|
|
224
|
+
const list = groups.get(root) ?? [];
|
|
225
|
+
list.push(skill.name);
|
|
226
|
+
groups.set(root, list);
|
|
227
|
+
}
|
|
228
|
+
|
|
229
|
+
// Sort group entries by root for determinism: same skill set under the
|
|
230
|
+
// same roots must always produce the same string, otherwise the
|
|
231
|
+
// provider prompt-prefix cache loses on prompt builder runs that
|
|
232
|
+
// happened to populate the underlying Map in a different insertion order.
|
|
233
|
+
const sortedGroups = [...groups.entries()].sort(([a], [b]) =>
|
|
234
|
+
a < b ? -1 : a > b ? 1 : 0,
|
|
235
|
+
);
|
|
236
|
+
|
|
237
|
+
const lines: string[] = [
|
|
238
|
+
"",
|
|
239
|
+
"",
|
|
240
|
+
"The following skills provide specialized instructions for specific tasks. When a skill name matches the task you are doing, read the SKILL.md at the listed location to load the full instructions. When a SKILL.md references a relative path, resolve it against the skill directory (parent of SKILL.md / dirname of the path) and use that absolute path in tool commands.",
|
|
241
|
+
];
|
|
242
|
+
|
|
243
|
+
for (const [root, names] of sortedGroups) {
|
|
244
|
+
names.sort();
|
|
245
|
+
lines.push("");
|
|
246
|
+
lines.push(`Skills under ${root}/<name>/SKILL.md:`);
|
|
247
|
+
// Wrap the name list at ~80 columns for readability without
|
|
248
|
+
// affecting determinism. Each line is ` name1, name2, name3,`.
|
|
249
|
+
let buf = " ";
|
|
250
|
+
for (let i = 0; i < names.length; i++) {
|
|
251
|
+
const name = names[i];
|
|
252
|
+
const piece = (buf === " " ? "" : ", ") + name;
|
|
253
|
+
if (buf.length > 2 && buf.length + piece.length > 80) {
|
|
254
|
+
lines.push(`${buf},`);
|
|
255
|
+
buf = ` ${name}`;
|
|
256
|
+
} else {
|
|
257
|
+
buf += piece;
|
|
258
|
+
}
|
|
259
|
+
}
|
|
260
|
+
if (buf.length > 2) lines.push(buf);
|
|
261
|
+
}
|
|
262
|
+
|
|
263
|
+
return lines.join("\n");
|
|
264
|
+
}
|
|
265
|
+
|
|
266
|
+
/**
|
|
267
|
+
* Replace pi's verbose `<available_skills>` block in `prompt` with the
|
|
268
|
+
* compressed one-line index form. Idempotent: if the verbose form is not
|
|
269
|
+
* present (compression already applied, or skill count below threshold),
|
|
270
|
+
* the prompt is returned unchanged.
|
|
271
|
+
*
|
|
272
|
+
* Opt-out: set `PI_CACHE_OPTIMIZER_NO_SKILL_COMPRESSION=1`.
|
|
273
|
+
*
|
|
274
|
+
* Pre-conditions for compression to fire:
|
|
275
|
+
* - opts.skills present and visible-skill count >= SKILL_COMPRESSION_MIN_COUNT
|
|
276
|
+
* - Verbose block (built from the same `opts.skills`) is found in
|
|
277
|
+
* `prompt` (substring match, no regex). This anchors the substitution
|
|
278
|
+
* to pi's own emitter; if pi changes the format, we no-op rather
|
|
279
|
+
* than mangle.
|
|
280
|
+
*/
|
|
281
|
+
function compressSkillsInSystemPrompt(
|
|
282
|
+
prompt: string,
|
|
283
|
+
opts: BuildSystemPromptOptions,
|
|
284
|
+
): string {
|
|
285
|
+
if (isEnabledEnv(process.env[NO_SKILL_COMPRESSION_ENV])) return prompt;
|
|
286
|
+
if (!opts.skills || opts.skills.length === 0) return prompt;
|
|
287
|
+
|
|
288
|
+
const visible = opts.skills.filter((skill) => !skill.disableModelInvocation);
|
|
289
|
+
if (visible.length < SKILL_COMPRESSION_MIN_COUNT) return prompt;
|
|
290
|
+
|
|
291
|
+
const verbose = formatSkillsForPrompt(opts.skills);
|
|
292
|
+
if (!verbose || !prompt.includes(verbose)) return prompt;
|
|
293
|
+
|
|
294
|
+
const compressed = formatSkillsForPromptCompressed(opts.skills);
|
|
295
|
+
if (!compressed || compressed.length >= verbose.length) return prompt;
|
|
296
|
+
|
|
297
|
+
return prompt.replace(verbose, compressed);
|
|
298
|
+
}
|
|
299
|
+
|
|
169
300
|
function buildStableCandidates(opts: BuildSystemPromptOptions): string[] {
|
|
170
301
|
const candidates: string[] = [];
|
|
171
302
|
|
|
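The grouping and wrapping logic of `formatSkillsForPromptCompressed` in the hunk above can be tried in isolation. The sketch below is a condensed re-statement, not the package's code: `SkillLike`, `parentDir`, and `compressedSkillIndex` are hypothetical names, `parentDir` is a POSIX-only stand-in for `dirname` from `node:path`, the preamble sentence is omitted, and the two-space indent is an assumption (the extracted diff appears to have collapsed whitespace).

```typescript
// Minimal shape of a skill for this sketch; the real type comes from
// BuildSystemPromptOptions["skills"] and has more fields.
interface SkillLike {
  name: string;
  filePath: string; // .../<skills-root>/<skill-name>/SKILL.md
  disableModelInvocation?: boolean;
}

// POSIX-only stand-in for node:path `dirname`, to keep the sketch dependency-free.
function parentDir(path: string): string {
  const idx = path.lastIndexOf("/");
  if (idx === -1) return ".";
  return idx === 0 ? "/" : path.slice(0, idx);
}

function compressedSkillIndex(skills: SkillLike[], width = 80): string[] {
  // Group visible skill names by skills root (two levels above SKILL.md).
  const groups = new Map<string, string[]>();
  for (const skill of skills) {
    if (skill.disableModelInvocation) continue;
    const root = parentDir(parentDir(skill.filePath));
    groups.set(root, [...(groups.get(root) ?? []), skill.name]);
  }

  const lines: string[] = [];
  // Sort roots and names so the same skill set always yields the same text.
  for (const root of [...groups.keys()].sort()) {
    const names = [...(groups.get(root) ?? [])].sort();
    lines.push(`Skills under ${root}/<name>/SKILL.md:`);
    let buf = "  ";
    for (const name of names) {
      const piece = (buf === "  " ? "" : ", ") + name;
      if (buf.length > 2 && buf.length + piece.length > width) {
        // Current line is full: flush it with a trailing comma and start a new one.
        lines.push(`${buf},`);
        buf = `  ${name}`;
      } else {
        buf += piece;
      }
    }
    if (buf.length > 2) lines.push(buf);
  }
  return lines;
}
```

Because both the roots and the names are sorted, passing the same skills in a different order yields the same lines, which is the property the prefix cache depends on.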
@@ -195,12 +326,67 @@ function buildStableCandidates(opts: BuildSystemPromptOptions): string[] {
|
|
|
195
326
|
}
|
|
196
327
|
|
|
197
328
|
if (opts.skills && opts.skills.length > 0) {
|
|
329
|
+
// Push BOTH forms so `optimizeSystemPrompt` finds whichever is
|
|
330
|
+
// actually present in the prompt. The `rest.includes(part)`
|
|
331
|
+
// short-circuit skips the form that isn't there. The two strings
|
|
332
|
+
// are mutually distinguishable (the verbose form contains the
|
|
333
|
+
// literal `<available_skills>` envelope; the compressed form
|
|
334
|
+
// contains `Skills under ` and no XML tags) so they cannot
|
|
335
|
+
// accidentally match each other.
|
|
198
336
|
candidates.push(formatSkillsForPrompt(opts.skills));
|
|
337
|
+
candidates.push(formatSkillsForPromptCompressed(opts.skills));
|
|
199
338
|
}
|
|
200
339
|
|
|
201
340
|
return candidates;
|
|
202
341
|
}
|
|
203
342
|
|
|
343
|
+
/**
|
|
344
|
+
* Strip per-turn churn from trellis `<session-overview>` block.
|
|
345
|
+
*
|
|
346
|
+
* Trellis injects a session-overview that includes `RECENT COMMITS`
|
|
347
|
+
* (shifts on every git commit), `Working directory: Clean/N uncommitted`
|
|
348
|
+
* (shifts on every edit/commit), and `Line count: N / 2000` (shifts on
|
|
349
|
+
* every journal append). These fields are at the tail of the
|
|
350
|
+
* session-overview and poison the prompt-prefix cache for everything
|
|
351
|
+
* that follows.
|
|
352
|
+
*
|
|
353
|
+
* This function surgically removes those three churn fields from the
|
|
354
|
+
* `<session-overview>...</session-overview>` block. The remaining
|
|
355
|
+
* fields (DEVELOPER, GIT STATUS branch-only, CURRENT TASK, ACTIVE
|
|
356
|
+
* TASKS, MY TASKS, JOURNAL FILE active-file-only, PACKAGES, PATHS)
|
|
357
|
+
* are stable within a session and become cache-friendlier.
|
|
358
|
+
*
|
|
359
|
+
* No-op when the `<session-overview>` tag is not present (e.g.
|
|
360
|
+
* trellis hook chose not to inject it, or a different extension
|
|
361
|
+
* owns the prompt).
|
|
362
|
+
*/
|
|
363
|
+
function stripSessionOverviewChurn(prompt: string): string {
|
|
364
|
+
const startTag = "<session-overview>";
|
|
365
|
+
const endTag = "</session-overview>";
|
|
366
|
+
|
|
367
|
+
const startIdx = prompt.indexOf(startTag);
|
|
368
|
+
if (startIdx === -1) return prompt;
|
|
369
|
+
|
|
370
|
+
const endIdx = prompt.indexOf(endTag, startIdx + startTag.length);
|
|
371
|
+
if (endIdx === -1) return prompt;
|
|
372
|
+
|
|
373
|
+
const before = prompt.slice(0, startIdx + startTag.length);
|
|
374
|
+
const inner = prompt.slice(startIdx + startTag.length, endIdx);
|
|
375
|
+
const after = prompt.slice(endIdx);
|
|
376
|
+
|
|
377
|
+
let cleaned = inner
|
|
378
|
+
// Drop the RECENT COMMITS section (from the heading through the
|
|
379
|
+
// next heading or end of inner). The model sees commit history
|
|
380
|
+
// via `git log`; carrying it in every system prompt is redundant.
|
|
381
|
+
.replace(/\n## RECENT COMMITS\n[\s\S]*?(?=\n## |$)/, "")
|
|
382
|
+
// Drop "Working directory: ..." (Git status tail churn).
|
|
383
|
+
.replace(/\nWorking directory:[^\n]*/g, "")
|
|
384
|
+
// Drop "Line count: N / NNNN" (Journal tail churn).
|
|
385
|
+
.replace(/\nLine count:[^\n]*/g, "");
|
|
386
|
+
|
|
387
|
+
return before + cleaned + after;
|
|
388
|
+
}
|
|
389
|
+
|
|
204
390
|
function optimizeSystemPrompt(
|
|
205
391
|
original: string,
|
|
206
392
|
opts: BuildSystemPromptOptions,
|
|
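To see what `stripSessionOverviewChurn` from the hunk above removes, the function can be run against a small prompt. The function body below repeats the logic shown in the diff so the example runs on its own; the sample `<session-overview>` content is hypothetical, and real trellis output may be laid out differently.

```typescript
// Same logic as the `stripSessionOverviewChurn` hunk above, repeated here so
// the example is self-contained.
function stripSessionOverviewChurn(prompt: string): string {
  const startTag = "<session-overview>";
  const endTag = "</session-overview>";

  const startIdx = prompt.indexOf(startTag);
  if (startIdx === -1) return prompt;
  const endIdx = prompt.indexOf(endTag, startIdx + startTag.length);
  if (endIdx === -1) return prompt;

  const before = prompt.slice(0, startIdx + startTag.length);
  const inner = prompt.slice(startIdx + startTag.length, endIdx);
  const after = prompt.slice(endIdx);

  const cleaned = inner
    // Heading through the next "## " heading (or the end of the block).
    .replace(/\n## RECENT COMMITS\n[\s\S]*?(?=\n## |$)/, "")
    .replace(/\nWorking directory:[^\n]*/g, "")
    .replace(/\nLine count:[^\n]*/g, "");

  return before + cleaned + after;
}

// Hypothetical session-overview, for illustration only.
const samplePrompt = [
  "You are a coding agent.",
  "<session-overview>",
  "## GIT STATUS",
  "Branch: main",
  "Working directory: 3 uncommitted",
  "## RECENT COMMITS",
  "abc1234 fix footer",
  "def5678 add guard",
  "## JOURNAL FILE",
  "Active file: journal-1.md",
  "Line count: 412 / 2000",
  "</session-overview>",
  "Tail instructions.",
].join("\n");

const strippedSample = stripSessionOverviewChurn(samplePrompt);
console.log(strippedSample);
```

In this sample only the branch and the active journal file remain inside the block. One edge case that may be worth a test: if the `## RECENT COMMITS` heading is followed directly by another `## ` heading with no lines between them, the pattern has already consumed the separating newline, so the lazy match runs on to the heading after that.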
@@ -230,10 +416,27 @@ function optimizeSystemPrompt(
|
|
|
230
416
|
return { systemPrompt: original, stablePrefix: "", changed: false };
|
|
231
417
|
}
|
|
232
418
|
|
|
419
|
+
const systemPrompt =
|
|
420
|
+
stablePrefix +
|
|
421
|
+
(dynamicRemainder.length > 0 ? "\n\n---\n\n" + dynamicRemainder : "");
|
|
422
|
+
|
|
423
|
+
// Sanity check: if trellis (or another extension) injected structural
|
|
424
|
+
// markers into the prompt that happen to share a substring with one of
|
|
425
|
+
// our stable candidates, the blind `rest.replace(part, "")` could
|
|
426
|
+
// silently eat part of the dynamic layer. We anchor on
|
|
427
|
+
// `<workflow-state>` because it is the most stable structural marker
|
|
428
|
+
// trellis emits and is never a stable candidate itself.
|
|
429
|
+
//
|
|
430
|
+
// When the marker was present in the original but is missing in the
|
|
431
|
+
// result, the reorder is unsafe — fall back to the original prompt
|
|
432
|
+
// so the model gets a complete prompt, and flag the footer warning.
|
|
433
|
+
if (original.includes("<workflow-state>") && !systemPrompt.includes("<workflow-state>")) {
|
|
434
|
+
promptTruncationDetected = true;
|
|
435
|
+
return { systemPrompt: original, stablePrefix: "", changed: false };
|
|
436
|
+
}
|
|
437
|
+
|
|
233
438
|
return {
|
|
234
|
-
systemPrompt
|
|
235
|
-
stablePrefix +
|
|
236
|
-
(dynamicRemainder.length > 0 ? "\n\n---\n\n" + dynamicRemainder : ""),
|
|
439
|
+
systemPrompt,
|
|
237
440
|
stablePrefix,
|
|
238
441
|
changed: true,
|
|
239
442
|
};
|
|
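The integrity guard in the hunk above can be illustrated with a much-simplified reorder. This is a sketch under stated assumptions: `reorderWithGuard` and `GuardedReorder` are hypothetical names, the lifting loop omits the real function's rules (such as the minimum candidate length), and the `truncated` field stands in for the module-level `promptTruncationDetected` flag.

```typescript
interface GuardedReorder {
  systemPrompt: string;
  stablePrefix: string;
  changed: boolean;
  truncated: boolean; // true when the guard fired and the original was kept
}

// Simplified reorder: lift each stable part found in the prompt to the front,
// then verify that the structural marker survived the move.
function reorderWithGuard(
  original: string,
  stableParts: string[],
  marker = "<workflow-state>",
): GuardedReorder {
  let rest = original;
  const lifted: string[] = [];
  for (const part of stableParts) {
    if (!rest.includes(part)) continue;
    rest = rest.replace(part, ""); // blind removal, as described in the source comment
    lifted.push(part);
  }

  const stablePrefix = lifted.join("\n\n");
  if (stablePrefix.length === 0) {
    return { systemPrompt: original, stablePrefix: "", changed: false, truncated: false };
  }

  const remainder = rest.trim();
  const systemPrompt =
    stablePrefix + (remainder.length > 0 ? "\n\n---\n\n" + remainder : "");

  // Guard: the marker was in the original but the reorder split or removed it.
  if (original.includes(marker) && !systemPrompt.includes(marker)) {
    return { systemPrompt: original, stablePrefix: "", changed: false, truncated: true };
  }
  return { systemPrompt, stablePrefix, changed: true, truncated: false };
}
```

The guard only fires when a lifted part overlaps the marker text itself, for example a stable candidate that begins with `state>`. In that case the caller receives the untouched original, which costs cache efficiency for a turn but keeps the prompt complete.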
@@ -1036,7 +1239,12 @@ function emitDeepseekApiKeyHintIfNeeded(
|
|
|
1036
1239
|
export const __internals_for_tests = {
|
|
1037
1240
|
buildStableCandidates,
|
|
1038
1241
|
optimizeSystemPrompt,
|
|
1242
|
+
stripSessionOverviewChurn,
|
|
1243
|
+
formatSkillsForPrompt,
|
|
1244
|
+
formatSkillsForPromptCompressed,
|
|
1245
|
+
compressSkillsInSystemPrompt,
|
|
1039
1246
|
MIN_STABLE_CANDIDATE_LENGTH,
|
|
1247
|
+
SKILL_COMPRESSION_MIN_COUNT,
|
|
1040
1248
|
};
|
|
1041
1249
|
|
|
1042
1250
|
export default function (pi: ExtensionAPI) {
|
|
@@ -1120,7 +1328,17 @@ export default function (pi: ExtensionAPI) {
|
|
|
1120
1328
|
await rollOverStatsIfNeeded(ctx);
|
|
1121
1329
|
|
|
1122
1330
|
const adapter = selectAdapterForModel(model);
|
|
1123
|
-
|
|
1331
|
+
let statusText: string | undefined = adapter ? formatCacheStats(adapter, getStatsForAdapter(adapter)) : undefined;
|
|
1332
|
+
|
|
1333
|
+
// If optimizeSystemPrompt detected structural truncation on this or
|
|
1334
|
+
// a recent turn, flag it once in the footer so the user knows to
|
|
1335
|
+
// /reload before continuing. The flag resets after emission so a
|
|
1336
|
+
// single-turn glitch does not permanently taint the footer.
|
|
1337
|
+
if (promptTruncationDetected && statusText !== undefined) {
|
|
1338
|
+
statusText = statusText + " ⚠️ integrity";
|
|
1339
|
+
promptTruncationDetected = false;
|
|
1340
|
+
}
|
|
1341
|
+
|
|
1124
1342
|
if (statusText === lastStatusText) return;
|
|
1125
1343
|
|
|
1126
1344
|
lastStatusText = statusText;
|
|
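The footer warning in the hunk above is a read-once signal: it is shown on the next status update that has text to attach it to, then cleared. The package uses a module-level `let`; the class below is only an illustrative alternative, and `OneShotFlag` and `buildStatusText` are hypothetical names.

```typescript
// A flag that reports "raised" at most once per raise().
class OneShotFlag {
  private raised = false;

  raise(): void {
    this.raised = true;
  }

  // Returns the current state and clears it in the same step.
  consume(): boolean {
    const was = this.raised;
    this.raised = false;
    return was;
  }
}

function buildStatusText(stats: string | undefined, flag: OneShotFlag): string | undefined {
  // Mirror the hunk: the flag is only consumed when there is status text,
  // so a warning raised while no adapter matches is not lost.
  if (stats !== undefined && flag.consume()) {
    return stats + " ⚠️ integrity";
  }
  return stats;
}
```

Keeping the flag apart from the cache counters means a single bad turn shows up once in the footer without altering the persisted statistics.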
@@ -1145,13 +1363,45 @@ export default function (pi: ExtensionAPI) {
|
|
|
1145
1363
|
});
|
|
1146
1364
|
|
|
1147
1365
|
pi.on("before_agent_start", async (event, _ctx) => {
|
|
1148
|
-
|
|
1366
|
+
// Step 1: strip per-turn churn from <session-overview>.
|
|
1367
|
+
// Removing RECENT COMMITS, Working directory status, and
|
|
1368
|
+
// Journal line count makes more of the session-overview stable
|
|
1369
|
+
// across turns, which DeepSeek's prefix cache can then retain.
|
|
1370
|
+
const strippedPrompt = stripSessionOverviewChurn(event.systemPrompt);
|
|
1371
|
+
|
|
1372
|
+
// Step 2: compress skills XML → one-line index.
|
|
1373
|
+
// The compressed form is just as cache-stable as the verbose one
|
|
1374
|
+
// because both are derived deterministically from the same
|
|
1375
|
+
// `event.systemPromptOptions.skills`.
|
|
1376
|
+
// No-op if opted out, below SKILL_COMPRESSION_MIN_COUNT, or if pi
|
|
1377
|
+
// emitted a format we don't recognize.
|
|
1378
|
+
const compressedPrompt = compressSkillsInSystemPrompt(
|
|
1379
|
+
strippedPrompt,
|
|
1380
|
+
event.systemPromptOptions,
|
|
1381
|
+
);
|
|
1382
|
+
|
|
1383
|
+
// Step 3: lift stable content above dynamic content for cache
|
|
1384
|
+
// stability. Operates on the (stripped + compressed) prompt so the
|
|
1385
|
+
// cache key derived from `stablePrefix` reflects what actually
|
|
1386
|
+
// ships to the provider.
|
|
1387
|
+
const optimized = optimizeSystemPrompt(compressedPrompt, event.systemPromptOptions);
|
|
1149
1388
|
latestPromptCacheKey = buildPromptCacheKey(optimized.stablePrefix);
|
|
1150
1389
|
|
|
1151
1390
|
if (optimized.changed && optimized.systemPrompt.trim().length > 0) {
|
|
1152
1391
|
return { systemPrompt: optimized.systemPrompt };
|
|
1153
1392
|
}
|
|
1154
1393
|
|
|
1394
|
+
// Reorder didn't apply but compression might have. Return the
|
|
1395
|
+
// compressed (or stripped) prompt directly so we still benefit from
|
|
1396
|
+
// the volume cut even when reorder is a no-op (e.g., short sessions
|
|
1397
|
+
// where no stable candidate is long enough).
|
|
1398
|
+
if (compressedPrompt !== strippedPrompt && compressedPrompt.trim().length > 0) {
|
|
1399
|
+
return { systemPrompt: compressedPrompt };
|
|
1400
|
+
}
|
|
1401
|
+
if (strippedPrompt !== event.systemPrompt && strippedPrompt.trim().length > 0) {
|
|
1402
|
+
return { systemPrompt: strippedPrompt };
|
|
1403
|
+
}
|
|
1404
|
+
|
|
1155
1405
|
return {};
|
|
1156
1406
|
});
|
|
1157
1407
|
|
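The `before_agent_start` handler in the last hunk applies three steps and then returns the most-processed prompt that actually changed. The sketch below isolates that precedence with the steps injected as plain functions; `runPipeline`, `Transform`, and `Optimized` are hypothetical names, and the cache-key bookkeeping from the real handler is left out.

```typescript
type Transform = (prompt: string) => string;

interface Optimized {
  systemPrompt: string;
  changed: boolean;
}

// Returns an override only when some step changed the prompt; an empty
// object means "leave the prompt as it is".
function runPipeline(
  original: string,
  strip: Transform,
  compress: Transform,
  optimize: (prompt: string) => Optimized,
): { systemPrompt?: string } {
  const stripped = strip(original);      // step 1: remove per-turn churn
  const compressed = compress(stripped); // step 2: shrink the skills block
  const optimized = optimize(compressed); // step 3: lift stable content

  // Prefer the fully reordered prompt when the reorder applied.
  if (optimized.changed && optimized.systemPrompt.trim().length > 0) {
    return { systemPrompt: optimized.systemPrompt };
  }
  // Otherwise fall back to the latest intermediate result that differs.
  if (compressed !== stripped && compressed.trim().length > 0) {
    return { systemPrompt: compressed };
  }
  if (stripped !== original && stripped.trim().length > 0) {
    return { systemPrompt: stripped };
  }
  return {};
}
```

Each fallback compares against the previous stage rather than the original, so a compressed prompt already includes the stripping, and the stripped-only branch is reached only when compression was a no-op.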
package/package.json
CHANGED
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
{
|
|
2
2
|
"name": "pi-cache-optimizer",
|
|
3
|
-
"version": "2.
|
|
3
|
+
"version": "2.1.1",
|
|
4
4
|
"description": "Pi extension that improves provider-side KV/prompt cache hit rates (DeepSeek, OpenAI, Claude, Gemini) by reordering the system prompt, requesting long retention, and showing footer cache stats. Renamed from pi-deepseek-cache-optimizer.",
|
|
5
5
|
"keywords": [
|
|
6
6
|
"pi-package",
|