
opencode-acp 1.4.0 → 1.4.1


Files changed (3)

package/README.md CHANGED

@@ -22,78 +22,41 @@ The model decides <em>when</em> and <em>what</em> to compress — not a hard lim
 ## Why ACP
-ACP is **model-driven context management** for OpenCode. Instead of passively
-truncating at a hard limit, it exposes tools that let the model decide **when**
-and **what** to compress — producing high-fidelity summaries of completed
-segments while freeing context space. The model controls what to keep, which is
-strictly better than blind truncation.
-### What makes ACP different
-- **Full block lifecycle, model-driven.** The model can `compress` a range into a
-  summary, `decompress` to restore any block on demand, and `mark_block` /
-  `unmark_block` to flag blocks for deferred deletion. The model owns its own
-  context lifecycle — not just "create a block and hope GC handles it".
-- **Cache-aware by design.** Summaries merge into existing user turns and batch
-  cleanup does a *single* cache break, so prefix-cache hit ratios stay near **90%**
-  even when sessions run at 70%+ context utilization (see [Proven at scale](#proven-at-scale)).
-- **Pressure-aware GC.** Instead of blind age-based truncation that silently
-  drops important info (task IDs, file paths, decisions), ACP consolidates marked
-  blocks first and demotes blind truncation to a last-resort fallback at 100%.
-- **Two compression modes.** *Range* mode (contiguous spans → block summaries)
-  and *message* mode (surgical per-message summaries for scattered content).
-- **Protected content.** Tool outputs, file patterns, and user messages you mark
-  protected are injected into summaries, so nothing critical is ever lost.
-- **Automatic strategies.** Deduplication (same tool + args → keep last) and
-  purge-errors (drop errored inputs after N turns), recalculated on compress —
-  not on every turn.
-- **Production-grade configuration.** 3-layer merge (global → config-dir →
-  project), per-model context-limit overrides, and user-editable prompts.
-### A hardened fork of DCP
-ACP started as a fork of [DCP](https://github.com/Tarquinen/opencode-dynamic-context-pruning)
-and now diverges so far that the original is a small subset. Beyond the features
-above, it ships **37 bug fixes** that make the core production-stable — state
-persistence across restarts, real token reporting (was returning 0), GC
-deactivation, reversed-boundary auto-recovery, 268× logger/tokenizer speedup,
-dialog-role confusion fixes, and skipping OpenCode's internal title/summary
-agents so session titles keep generating. Core deltas vs the original:
-| | DCP (original) | ACP |
-|---|---|---|
-| **Max stable session** | ~200 messages | 10,000+ |
-| **Per-turn overhead** | 20 – 50 s | ~90 ms |
-| **Model-driven decompress + block cleanup** | No | Yes |
-| **State survives restart** | No | Yes |
+ACP hands all context-management authority to the model itself, rather than
+relying on external models or complex external mechanisms. It is, to date, the
+best context-management implementation on the market.
----
+This brings two concrete effects:
-## Proven at scale
+- **It saves about two-thirds of tokens.** A model with a 1,000,000-token context
+  window effectively runs in the **200,000–300,000 token range**.
+- **It supports ultra-long sessions without losing key content** — **500M-token-level
+  cumulative context, 100,000 messages per session**.
-ACP is battle-tested on real, long-running engineering sessions. Aggregate stats
-from a single developer workstation (1,445 sessions, 69,097 model turns):
+---
-| Metric | Value |
-|--------|-------|
-| Total tokens processed (incl. prompt-cache reads) | **6.17 billion** |
-| Billable tokens (input + output + reasoning) | 828 million |
-| Prompt-cache hit ratio (average) | ~87% |
-| Compression blocks created (all-time) | 4,894 |
+## Proven at scale
-Two representative heavy sessions (anonymized) — the headline number is **total
-tokens pushed through the model**, not peak context:
+Real engineering context, in practice.
-| Session | Span | Turns | **Total tokens** | Cache hit | Context p50 | Context p95 | Peak |
-|---------|------|-------|------------------|-----------|-------------|-------------|------|
-| Session 1 | 6 days | 2,694 | **582 M** | 86.2% | 1.2 K (<1%) | 251 K (25%) | 488 K (49%) |
-| Session 2 | 2 days | 1,536 | **463 M** | 89.0% | 1.8 K (<1%) | 335 K (34%) | 769 K (77%) |
+**Supports 500M-token-level cumulative context, with p95 context around 30% and
+an average prompt-cache hit ratio above 85%.** (This is an average across
+sessions, not a per-session figure; [Impact on Prompt Caching](#impact-on-prompt-caching)
+explains why it saves far more tokens than traditional compression.)
-The picture this paints: the *median* turn is tiny (short tool exchanges), but
-the heavy turns regularly reach **250K–335K context (25–34% of the 1M window)**
-and occasionally spike to **49–77%**. Even at those spikes the prefix-cache hit
-ratio stays near **90%** — the payoff of ACP's cache-aware compression, which
-prunes from the tail (preserving the shared prefix) instead of truncating blindly.
+| | Session 1 | Session 2 |
+|---|---|---|
+| **Messages** | 3,024 | 2,028 |
+| **Total tokens processed** | 582 M | 463 M |
+| **Prompt-cache hit ratio** | 86.2% | 89.0% |
+| **Context p50 (median)** | 1.2 K (<1%) | 1.8 K (<1%) |
+| **Context p75** | 2.8 K | 3.5 K |
+| **Context p90** | 108 K (11%) | 58 K (6%) |
+| **Context p95** | 251 K (25%) | 335 K (34%) |
+| **Context p99** | 425 K (43%) | 442 K (44%) |
+| **Peak** | 488 K (49%) | 769 K (77%) |
+(Context percentages are of the 1M window.)
 ---
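The percentages in the new table can be reproduced from the raw token counts. Below is a minimal TypeScript sketch, assuming the 1M-token window stated in the README; `WINDOW_TOKENS` and `windowShare` are hypothetical names used only for illustration and are not part of the package.

```typescript
// Window size taken from the README note: "Context percentages are of the 1M window."
const WINDOW_TOKENS = 1_000_000;

/** Share of the context window in use, as a whole-number percentage. */
function windowShare(contextTokens: number, windowTokens: number = WINDOW_TOKENS): number {
  // Multiply before dividing so that values such as 33.5 stay exact before rounding.
  return Math.round((contextTokens * 100) / windowTokens);
}

// Token counts copied from the table (Session 1 and Session 2).
console.log(windowShare(251_000)); // 25  (Session 1, p95)
console.log(windowShare(335_000)); // 34  (Session 2, p95)
console.log(windowShare(488_000)); // 49  (Session 1, peak)
console.log(windowShare(769_000)); // 77  (Session 2, peak)
```

The same helper applied to the p90 and p99 rows gives 11%, 6%, 43%, and 44%, matching the table.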
@@ -117,39 +80,70 @@ Or add to your opencode config:
 ## How It Works
-ACP reduces context size through a compress tool and automatic cleanup. Your session history is never modified -- ACP replaces pruned content with placeholders before sending requests to your LLM.
+ACP hands the context-compression tool directly to the model. The model is
+**100% responsible** for context compression. The model's available tools are
+mainly: **compress**, **decompress**, and **delete** (`mark_block` / `unmark_block`).
-### Compress
+### Lifecycle
-Compress is a tool exposed to your model that replaces closed, stale conversation content with high-fidelity technical summaries. You can think of this as a much smarter version of Opencode's compaction process. Instead of triggering statically when your session reaches its maximum context and on the entire coding session, Compress allows the model to pick when to activate based on task completion, and to only compress the specific messages that are no longer needed verbatim.
+Three operations: **compress**, **decompress**, and **delete**. Content loops
+between raw and compressed, and eventually terminates in deletion:
-ACP supports two compression modes:
+```mermaid
+stateDiagram-v2
+    Raw --> Compressed : compress
+    Compressed --> Raw : decompress
+    Compressed --> Deleted : delete
+```
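The state diagram added in this hunk can also be read as a small transition table. The TypeScript sketch below is an illustration only: the type and function names (`BlockState`, `Operation`, `applyOperation`) are hypothetical and do not come from the package, which exposes these operations as model-facing tools.

```typescript
// States and operations as named in the README's state diagram.
type BlockState = "Raw" | "Compressed" | "Deleted";
type Operation = "compress" | "decompress" | "delete";

// Allowed transitions: Raw <-> Compressed, and Compressed -> Deleted.
const TRANSITIONS: Record<BlockState, Partial<Record<Operation, BlockState>>> = {
  Raw: { compress: "Compressed" },
  Compressed: { decompress: "Raw", delete: "Deleted" },
  Deleted: {}, // terminal state: no operation leads out of it
};

/** Returns the next state, or throws if the diagram has no such edge. */
function applyOperation(state: BlockState, op: Operation): BlockState {
  const next = TRANSITIONS[state][op];
  if (next === undefined) {
    throw new Error(`Invalid transition: ${op} from ${state}`);
  }
  return next;
}

console.log(applyOperation("Raw", "compress"));        // Compressed
console.log(applyOperation("Compressed", "decompress")); // Raw
```

Modeling `Deleted` with an empty transition map makes it a terminal state, so any attempt to decompress deleted content fails loudly instead of silently succeeding.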
+### Compression strategy
+The system injects a prompt telling the model the current context ratio, the
+compression ratio, whether context is idle, and compression suggestions. When the
+trigger ratio is hit, content is compressed in **priority order**:
-- **`range` mode** compresses contiguous spans of conversation into one or more summaries.
-- **`message` mode** (experimental) compresses individual raw messages independently, letting the model manage context much more surgically.
+1. Agent/subagent review & consultation results (largest block of uncompressed content)
+2. Verbose command output (build/test runs, git diff/log/status, directory listings)
+3. Exploration that led nowhere (failed approaches, dead-end searches)
+4. Redundant tool results (reading the same file repeatedly, repeated status checks)
+5. Intermediate steps of completed multi-step tasks
+6. Resolved discussion threads (once a decision is recorded)
+7. Large file contents already used
-In `range` mode, when a new compression overlaps an earlier one, the earlier summary is nested inside the new one so information is preserved through layers of compression rather than diluted away. In both modes, protected tool outputs (such as subagents and skills) and protected file patterns are kept in compression summaries, ensuring that the most important information is never lost. You can also enable `protectUserMessages` to preserve your messages verbatim during compression, though note that large prompts (e.g. copy-pasting log files in the prompt) will then never be compressed away.
+After compression, the original content is replaced by a short **block** that
+references the original (recoverable via `decompress`).
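The priority order listed above can be sketched as a ranking over candidate segments. In ACP the model itself makes this choice, guided by the injected prompt; the TypeScript below only illustrates the ordering. The category labels, the `Candidate` shape, and `pickForCompression` are all hypothetical, and counting a segment's full size as "freed" is a simplification that ignores the short block left behind.

```typescript
// Hypothetical labels for the seven categories, in the README's priority order.
const PRIORITY = [
  "agent-results",
  "verbose-output",
  "dead-end-exploration",
  "redundant-tool-results",
  "intermediate-steps",
  "resolved-discussion",
  "used-file-contents",
] as const;

type Category = (typeof PRIORITY)[number];

interface Candidate {
  id: string;
  category: Category;
  tokens: number;
}

/** Picks candidates in priority order until roughly `tokensToFree` tokens are covered. */
function pickForCompression(candidates: Candidate[], tokensToFree: number): Candidate[] {
  // Copy before sorting so the caller's array is left untouched.
  const ranked = [...candidates].sort(
    (a, b) => PRIORITY.indexOf(a.category) - PRIORITY.indexOf(b.category),
  );
  const picked: Candidate[] = [];
  let freed = 0;
  for (const candidate of ranked) {
    if (freed >= tokensToFree) break;
    picked.push(candidate);
    freed += candidate.tokens;
  }
  return picked;
}

const picked = pickForCompression(
  [
    { id: "a", category: "used-file-contents", tokens: 5000 },
    { id: "b", category: "agent-results", tokens: 3000 },
    { id: "c", category: "verbose-output", tokens: 2000 },
  ],
  4000,
);
console.log(picked.map((c) => c.id)); // [ 'b', 'c' ]
```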
-### Deduplication
+### Decompression strategy
-Identifies repeated tool calls (same tool, same arguments) and keeps only the most recent output. Recalculated when the compress tool runs, so prompt cache is only impacted alongside compression.
+The model decides when to decompress. When the context grows large enough to
+interfere with the model's self-attention, the short blocks let the model compress
+some content first, handle the urgent matter, and then decompress what it needs
+in later work.
-### Purge Errors
+### Deletion strategy
-Prunes inputs from errored tool calls after a configurable number of turns (default: 4). Error messages are preserved; only the potentially large input content is removed. Recalculated on compress tool use.
+To handle the accumulation of many small historical blocks, the new version adds
+a deletion strategy. The model decides whether to delete. **Once deleted, content
+is irrecoverable.** This replaces the original forced GC, so that forced garbage
+collection no longer deletes things the model considers important.
-### Deferred Block Cleanup (mark_block)
+---
+## Impact on Prompt Caching
-Besides `compress` and `decompress`, ACP exposes `mark_block` / `unmark_block` tools. These give the model a **zero-cache-cost** way to flag compressed blocks it no longer needs in detail, deferring all consolidation into a single operation when context pressure rises.
+Historically, ACP has fixed many of the low-cache-hit-rate problems caused by
+DCP. The overall cache hit rate is now **~87%**.
-- **`mark_block`** flags a block for later cleanup. The block stays fully active and keeps serving prompt-cache hits — nothing changes immediately.
-- When context usage crosses configurable thresholds, ACP consolidates all marked blocks into one summary in a **single cache break** (instead of losing them one at a time):
-  - **Low (default 60%)**: a nudge reminds the model that marked blocks can be merged.
-  - **High (default 75%)**: all marked blocks are auto merge-compressed into one.
-  - **Force (default 90%)**: all old-gen blocks are merged regardless of marks — a last resort before age-based GC truncation.
-- **`unmark_block`** removes the flag if the model changes its mind.
+Compared to traditional compression, which compresses only at 80–90% context usage
+and then invalidates the cache for 100% of the context, ACP's effective hit rate
+is higher.
-This is purely additive — existing GC behavior is retained as the ultimate fallback at 100%, and merged blocks still respond to `decompress`. Thresholds are configurable under `gc.batchCleanup`.
+Additionally, ACP keeps total context around **~30% most of the time**, versus the
+traditional **50–80%**. So total token savings are far higher than traditional
+compression.
+**Conclusion:** ACP simultaneously raises the overall cache hit rate **and**
+ensures key context information is not lost.
 ---
@@ -368,22 +362,6 @@ For the `compress` tool, `compress.protectedTools` ensures specific tool outputs
 ---
-## Impact on Prompt Caching
-LLM providers cache prompts based on exact prefix matching. When ACP prunes content, it changes messages, which invalidates cached prefixes from that point forward.
-**Trade-off:** You lose some cache reads but gain token savings from reduced context size and fewer hallucinations from stale context. In most cases, especially in long sessions, the savings outweigh the cache miss cost.
-> [!NOTE]
-> In testing, cache hit rates were approximately 85% with ACP vs 90% without.
-**No impact for:**
-- **Request-based billing** -- Providers like GitHub Copilot that charge per request, not tokens.
-- **Uniform token pricing** -- Providers like Cerebras that bill cached and uncached tokens at the same rate.
----
 ## Migrating from DCP
 ACP is a drop-in replacement for DCP. To migrate:

package/README.zh-CN.md CHANGED

@@ -22,50 +22,34 @@
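 ## Why choose ACP
-ACP is **model-driven context management** for OpenCode. Instead of passively truncating at a hard limit, it exposes tools that let the model decide **when** to compress and **what** to compress: it produces high-fidelity summaries of completed segments, preserving important details while freeing context space. The model controls which information is kept, which is strictly better than blind truncation.
+ACP hands all authority over context management to the model itself, rather than relying on external models or assorted complex mechanisms. It is, to date, the best implementation of context management on the market.
-### What makes ACP different
+This has two effects:
-- **Full block lifecycle, model-driven.** The model can `compress` a range into a summary, `decompress` to restore any block on demand, and `mark_block` / `unmark_block` to flag blocks for deferred deletion. The model owns its own context lifecycle, rather than "creating a block and hoping GC handles it".
-- **Cache-aware design.** Summaries merge into existing user turns and batch cleanup causes only a *single* cache break, so even when sessions run at 70%+ context utilization the prefix-cache hit ratio stays near **90%** (see [Proven in practice](#实战验证)).
-- **Pressure-aware GC.** Instead of blindly truncating by age (which silently drops important information such as task IDs, file paths, and decisions), ACP consolidates marked blocks first and demotes blind truncation to a last-resort fallback at 100%.
-- **Two compression modes.** *Range* mode (contiguous spans → block summaries) and *message* mode (precise per-message summaries for scattered content).
-- **Protected content.** Tool outputs, file patterns, and user messages that you mark as protected are injected into summaries, so critical information is never lost.
-- **Automatic strategies.** Deduplication (same tool + arguments → keep only the last) and purge-errors (drop errored inputs after N turns), recalculated on compress, not on every turn.
-- **Production-grade configuration.** 3-layer merge (global → config directory → project), per-model context-limit overrides, and user-editable prompts.
-### A hardened fork of DCP
-ACP started as a fork of [DCP](https://github.com/Tarquinen/opencode-dynamic-context-pruning) and has since diverged so far that the original amounts to only a small subset. Beyond the features above, it includes **37 bug fixes** that bring the core to production stability: state persistence across restarts, real token reporting (previously returned 0), GC deactivation, reversed-boundary auto-recovery, a 268× logger/tokenizer speedup, dialog-role confusion fixes, and skipping OpenCode's built-in title/summary agents to restore title generation. Core differences from the original:
-| | DCP (original) | ACP |
-|---|---|---|
-| **Max stable session** | ~200 messages | 10,000+ |
-| **Per-turn overhead** | 20 – 50 s | ~90 ms |
-| **Model-driven decompress + block cleanup** | No | Yes |
-| **State survives restart** | No | Yes |
+- **Saves tokens (about two-thirds).** A model with a 1,000,000-token context window actually runs only in the **200,000–300,000 token** range.
+- **Ultra-long context without losing key content**: supports **500M-level context and 100,000 messages per session**.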
 ---
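 ## Proven in practice
-ACP has been tested on real, long-running engineering sessions. Cumulative statistics from a single development machine (1,445 sessions, 69,097 model turns):
-| Metric | Value |
-|------|------|
-| Total tokens processed (incl. prompt-cache reads) | **6.17 billion** |
-| Billable tokens (input + output + reasoning) | 828 million |
-| Prompt-cache hit ratio (average) | ~87% |
-| Compression blocks created (cumulative) | 4,894 |
-Two representative heavy sessions (anonymized). The headline number is **cumulative tokens passed through the model**, not peak context:
+Context behavior in real engineering work.
-| Session | Span | Turns | **Cumulative tokens** | Cache hit | Context p50 | Context p95 | Peak |
-|------|------|--------|----------------|----------|------------|------------|------|
-| Session 1 | 6 days | 2,694 | **582 M** | 86.2% | 1.2 K (<1%) | 251 K (25%) | 488 K (49%) |
-| Session 2 | 2 days | 1,536 | **463 M** | 89.0% | 1.8 K (<1%) | 335 K (34%) | 769 K (77%) |
+**Supports 500M-level token volume, with a p95 context ratio around 30% and an average cache hit ratio above 85%.** (Note that this is the average cache hit ratio, not a per-session one; [Impact on Prompt Caching](#对-prompt-缓存的影响) below explains that this in fact saves far more tokens than traditional compression algorithms.)
-What this data shows: the *median* turn is small (short tool exchanges), but heavy turns regularly reach **250K–330K context (25–34% of the 1M window)** and occasionally spike to **49–77%**. Even at those spikes, the prefix-cache hit ratio stays near **90%**. This is the value of ACP's cache-aware compression: it trims from the tail (preserving the shared prefix) rather than truncating blindly.
+| | Session 1 | Session 2 |
+|---|---|---|
+| **Total messages** | 3,024 | 2,028 |
+| **Cumulative tokens processed** | 582 M | 463 M |
+| **Prompt-cache hit ratio** | 86.2% | 89.0% |
+| **Context p50 (median)** | 1.2 K (<1%) | 1.8 K (<1%) |
+| **Context p75** | 2.8 K | 3.5 K |
+| **Context p90** | 108 K (11%) | 58 K (6%) |
+| **Context p95** | 251 K (25%) | 335 K (34%) |
+| **Context p99** | 425 K (43%) | 442 K (44%) |
+| **Peak** | 488 K (49%) | 769 K (77%) |
+(All context percentages are relative to the 1M window.)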
 ---
@@ -89,39 +73,52 @@ opencode plugin opencode-acp@latest --global
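 ## How it works
-ACP reduces context size through the `compress` tool and automatic cleanup. Your session history is never modified: ACP replaces pruned content with placeholders before sending requests to the LLM.
+ACP hands the context-compression tools directly to the model. The model is **fully responsible** for context compression. The tools available to the model are mainly **compress**, **decompress**, and **delete** (`mark_block` / `unmark_block`).
+### Lifecycle
+Three operations: **compress**, **decompress**, and **delete**. Content cycles between raw and compressed, and ultimately ends in deletion: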
+```mermaid
+stateDiagram-v2
+    Raw --> Compressed : compress
+    Compressed --> Raw : decompress
+    Compressed --> Deleted : delete
+```
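+### Compression strategy
-### Compress
+The system injects a prompt that tells the model the current context ratio, the compression ratio, whether the context is idle, and compression suggestions. When the trigger ratio is hit, content is compressed in **priority order**:
-Compress is a tool exposed to the model that replaces closed, stale conversation content with high-fidelity technical summaries. You can think of it as a smarter version of OpenCode's built-in compaction process. Rather than triggering statically when the session reaches its maximum context and compressing the entire coding session, Compress lets the model choose when to activate based on task completion, and compresses only the specific messages that no longer need to be kept verbatim.
+1. Agent/subagent review and consultation results (the largest block of uncompressed content)
+2. Verbose command output (build/test runs, git diff/log/status, directory listings)
+3. Exploration that led nowhere (failed approaches, dead-end searches)
+4. Redundant tool results (reading the same file repeatedly, repeated status checks)
+5. Intermediate steps of completed multi-step tasks
+6. Settled discussions (once the decision has been recorded)
+7. Large file contents that have already been used
-ACP supports two compression modes:
+After compression, the original content is replaced by a short **block** that references the original content (recoverable via `decompress`).
-- **`range` mode** compresses contiguous spans of conversation into one or more summaries.
-- **`message` mode** (experimental) compresses individual raw messages independently, letting the model manage context at a finer grain.
+### Decompression strategy
-In `range` mode, when a new compression overlaps an earlier one, the earlier summary is nested inside the new one, so that information is preserved through layers of compression rather than diluted. In both modes, protected tool outputs (such as subagents and skills) and protected file patterns are kept in compression summaries, ensuring that the most important information is never lost. You can also enable `protectUserMessages` to preserve your messages verbatim during compression, but note that large prompts (e.g. log files copy-pasted into the prompt) will then never be compressed away.
+The model decides when to decompress. When the context grows large enough to interfere with the model's self-attention, the short blocks let the model compress part of the content first, deal with the urgent matter, and then decompress as needed in later work.
-### Deduplication
+### Deletion strategy
-Identifies repeated tool calls (same tool, same arguments) and keeps only the most recent output. Recalculated when the `compress` tool runs, so the prompt cache is affected only at compression time.
+To cope with the accumulation of many small historical blocks, the new version adds a deletion strategy. The model decides whether to delete. **Once deleted, content cannot be recovered.** This replaces the previous forced GC, so that forced garbage collection no longer deletes content the model considers important.
-### Purge errors
+---
-Prunes the inputs of errored tool calls after a configurable number of turns (default: 4). Error messages are preserved; only the potentially large input content is removed. Recalculated when the `compress` tool is used.
+## Impact on Prompt Caching
-### Deferred block cleanup (mark_block)
+Historically, ACP fixed many low-cache-hit-rate problems caused by DCP. The overall cache hit rate is currently about **87%**.
-Besides `compress` and `decompress`, ACP also provides the `mark_block` / `unmark_block` tools. They let the model flag, at **zero cache cost**, compressed blocks whose details it no longer needs, deferring all consolidation to a single pass when context pressure rises.
+Compared with traditional compression, which compresses only at 80–90% and, once it does, forces 100% of the context to re-hit, ACP's hit rate is in practice higher.
-- **`mark_block`** flags a block for later cleanup. The block remains fully active and keeps serving prompt-cache hits; nothing changes immediately.
-- When context usage crosses configurable thresholds, ACP consolidates all marked blocks into one summary in a **single cache break** (rather than losing them one by one):
-  - **Low threshold (default 60%)**: reminds the model that marked blocks can be merged.
-  - **High threshold (default 75%)**: all marked blocks are automatically merge-compressed into one.
-  - **Force threshold (default 90%)**: all old-generation blocks are merged regardless of marks; this is the last resort before age-based GC truncation.
-- **`unmark_block`** removes the flag if the model changes its mind.
+In addition, ACP keeps total context at around **~30%** most of the time, whereas traditional approaches sit at 50–80%. Total token savings are therefore far higher than with traditional compression.
-This mechanism is purely additive: existing GC behavior is retained as the final fallback at 100%, and merged blocks can still be restored with `decompress`. Thresholds are configurable under `gc.batchCleanup`.
+**Conclusion:** ACP raises the overall cache hit rate while ensuring that key context information is not lost.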
 ---
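@@ -340,22 +337,6 @@ ACP exposes six editable prompts:
 ---
-## Impact on Prompt Caching
-LLM providers cache prompts based on exact prefix matching. When ACP prunes content, it modifies messages, which invalidates the cached prefix from that point onward.
-**Trade-off:** You lose some cache reads, but gain token savings from the reduced context size and fewer hallucinations caused by stale context. In most cases, especially in long sessions, the savings outweigh the cost of the cache misses.
-> [!NOTE]
-> In testing, the cache hit rate was about 85% with ACP and about 90% without.
-**No impact in these cases:**
-- **Request-based billing**: providers such as GitHub Copilot that charge per request rather than per token.
-- **Uniform token pricing**: providers such as Cerebras that charge the same price for cached and uncached tokens.
----
 ## Migrating from DCP
 ACP is a drop-in replacement for DCP. Migration steps: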

package/package.json CHANGED

@@ -1,7 +1,7 @@
 {
     "$schema": "https://json.schemastore.org/package.json",
     "name": "opencode-acp",
-    "version": "1.4.0",
+    "version": "1.4.1",
     "type": "module",
     "description": "Active Context Pruning — model-driven context management for OpenCode (hardened fork of DCP with 34 bug fixes)",
     "main": "./dist/index.js",