opencode-engram 0.1.0

package/LICENSE ADDED
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2026 NocturnesLK

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
package/README.md ADDED
@@ -0,0 +1,259 @@
# opencode-engram

[简体中文](./README.zh.md) | [繁體中文](./README.zht.md)

A pull-based conversation history retrieval plugin for [OpenCode](https://github.com/opencode-ai/opencode).

Internet information is public-tier experience, local environment information is project-tier experience, and the conversation history agents accumulate during their work — reasoning chains, rejected paths, user constraints — is task-tier experience. Yet this history, the closest to the actual work, almost always sinks into storage right after it is produced, never to be reused by subsequent agents. Engram treats conversation history as an equally important **third information source**, letting agents pull it on demand during execution — at the moment of maximum information, the role that best understands the need autonomously decides what it needs.

Engram implements two functional modules: **Context Charting** — an approach that challenges traditional context compression — and **Upstream History Retrieval**, which enables sub-agents to retrieve the conversation history of upstream agents on demand.

> This is a personal project, not officially developed by or affiliated with OpenCode. OpenCode was chosen as the host platform because its openness in conversation history access and plugin integration made this exploration possible.

## Features

The current mainstream paradigm for context migration is push-based: **context is filtered or distilled and then handed off to the next agent.** For example, in multi-agent systems a parent agent summarizes context into a prompt for the sub-agent; in context compression, context is compressed into a summary and passed to a new version of the same agent.

The push model, however, has an inherent limitation: **it requires anticipating all necessary information before the next agent begins, then transfers it in a highly lossy, irreversible manner.** More dangerous than the information loss itself is that the next agent cannot even perceive the loss — it treats what it receives as the complete reality. So things often end up like this: the agent reasons with total confidence, follows a dubious chain of logic, and produces a skewed result.

Pull is the opposite paradigm: **the next agent fetches the context it needs by itself.** During execution, the agent retrieves information on demand as actual gaps arise. Web search, codebase exploration, and most memory systems follow this paradigm. Compared to push, pull has a critical advantage: **the moment of information filtering is delayed from "before work starts" to "when the need emerges", and the filtering entity shifts from an external role to the agent itself. At the moment of maximum information, the role that best understands the need makes the judgment.**

The problem is that context migration scenarios have virtually no pull-based designs. Why? I believe it is because people have long overlooked a third important information source beyond internet information and local environment information — **conversation history**. This history has almost never been treated as a resource agents can use; it is not revisited by subsequent agents, nor even by the same agent's compressed self within the same session.

Why has conversation history been neglected for so long? Because the mainstream focuses only on byproducts of conversation history — extracting it into fragmented bits, distilling it into structured summaries — while ignoring direct use of the raw history itself. This history is closest to the agent, yet least utilized. Engram's starting point is to restore conversation history as a resource agents can directly access, used the same way they use web search and codebase exploration.

### Context Charting

Context compression is where the limitations of the push paradigm are most concentrated.

When conversation turns reach the context window limit, the standard approach is to call an LLM to summarize the conversation history into a text summary, then replace the original history with the summary to free up space. Under this model, the agent in subsequent turns reads a further-distilled summary but treats it as the complete factual basis of the history for its reasoning.

This model has at least three fundamental contradictions that cannot be resolved by optimizing prompts:

- **Cognitive Asymmetry:** Even if prompt optimization makes the model aware it is reading a summary, you cannot give it the meta-cognition of "what was compressed away." Knowing something is missing without knowing *what* is missing is no different from not knowing at all, because the model can only reason over what it has.
- **Upfront Decision Risk:** The compressing agent must decide what to keep and what to discard without knowing what the subsequent tasks will be. This is an irreversible information-loss decision.
- **Cumulative Distortion:** As the conversation continues, summaries get summarized again, forming "summaries of summaries." After multiple iterations, early original solutions, user corrections, and fine-grained constraints gradually disappear, causing the agent's behavior to drift further and further from the original requirements.

In reality, the raw conversation history was never lost — traditional designs simply treated it as a sunk cost. Engram's approach is to abandon the idea of "replacing history with text summaries" and instead provide a set of **structured history navigation data**.

```mermaid
flowchart LR
    Source["50 turns of conversation"]

    subgraph Traditional["Traditional Compression"]
        direction TB
        B["Text summary (lossy replacement)"]
        C1["Subsequent reasoning"]
        B --> C1
        C1 -.->|"Cannot trace back to original history"| B
    end

    subgraph Charting["Context Charting"]
        direction TB
        E["Navigation data block (lossless index)"]
        C2["Subsequent reasoning"]
        E --> C2
        C2 <-->|"Pull original history on demand"| E
    end

    Source -->|"LLM summary"| B
    Source -->|"Extract metadata"| E

    style Source fill:#1a1a2e,stroke:#58a6ff,color:#fff
    style B fill:#4a2020,stroke:#ff6b6b,color:#fff
    style E fill:#1a2a1a,stroke:#3fb950,color:#fff
```

In Engram, "charting" means generating a set of structured navigation data for the currently visible conversation. When compression is triggered, Engram injects a structured data block into the context:

- **Conversation Overview:** A structured index of the currently visible turns, with previews of the user and agent messages within each turn. Subsequent agents can use tools to dive into turns of interest for further exploration.
- **Recent Process:** A message window near the latest visible turn, letting the agent quickly understand the current state.
- **Retrieval Guidance:** Injected prompts that build a mental model for the agent, making it value conversation history as much as local environment information and activating an exploratory mindset.
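A small sketch of the idea behind the overview. The `Msg` and `Turn` shapes, field names, and preview length below are illustrative assumptions, not the plugin's actual data model; the point is that charting extracts metadata and keeps message ids for later pulls, rather than discarding content:

```typescript
// Illustrative only: these shapes are assumptions, not Engram's real data model.
type Msg = { id: string; role: "user" | "assistant"; text: string };
type Turn = {
  index: number;
  userPreview: string;
  assistantPreview: string;
  messageIds: string[]; // kept so the original content stays reachable
};

const PREVIEW_LEN = 40;
const preview = (s: string): string =>
  s.length <= PREVIEW_LEN ? s : s.slice(0, PREVIEW_LEN) + "…";

// Build a navigation index: no LLM call, nothing is lost; each turn keeps
// short previews plus the ids needed to pull the full messages on demand.
function chartTurns(history: Msg[]): Turn[] {
  const turns: Turn[] = [];
  for (const msg of history) {
    if (msg.role === "user") {
      turns.push({
        index: turns.length,
        userPreview: preview(msg.text),
        assistantPreview: "",
        messageIds: [msg.id],
      });
    } else if (turns.length > 0) {
      const last = turns[turns.length - 1];
      last.assistantPreview = preview(msg.text);
      last.messageIds.push(msg.id);
    }
  }
  return turns;
}
```

Compared with an LLM summary of the same history, such an index is cheap to produce, deterministic, and reversible, because every `messageIds` entry points back at the stored original.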


<details>
<summary>Implementation Notes</summary>

Theoretically, the step where the compressing agent generates a summary could be skipped entirely, but OpenCode does not yet provide an interface to bypass it. For simplicity, Engram currently uses a temporary approach: prompt injection makes the compressing agent return very brief output, and the structured navigation data then replaces the original content directly. This is not free — it still incurs one compression inference cost — but that cost is usually extremely low. (Perhaps OpenCode can eventually be encouraged to expose an interface that removes this step.)

</details>

Context Charting fundamentally resolves the core contradictions of context compression:

- The conversation overview is a simple preview of historical messages, naturally lacking in detail. The agent knows something is missing, so it naturally chooses to pull history to fill the gaps.
- During execution, when encountering actual information gaps, the agent autonomously decides the scope and depth of retrospection, shifting the moment of information filtering from compression time to consumption time.
- Conversation history suffers no loss whatsoever over time. No matter how many turns the conversation goes through, the oldest history remains within the agent's reach.

### Upstream History Retrieval

Beyond compression issues in long conversations, multi-agent collaboration faces similar challenges.

Consider a typical scenario in multi-agent collaboration: the user and the main agent go through complex multi-turn discussions, settle on a plan, and accumulate a large number of constraints and process context along the way. Then the main agent wants to delegate the implementation to a sub-agent. At this point, it faces a choice: what information should go into the prompt?

A structural problem surfaces here: the main agent must make that judgment before the sub-agent starts working, but the sub-agent's actual needs only emerge during execution.

The more fundamental problem is that reasoning chains, rejected paths, and implicit constraints accumulated over multiple rounds of discussion cannot be compressed into a single prompt. The sub-agent naturally starts work with a massive context deficit — no one knows whether the missing context is truly important, and the main agent cannot guarantee nothing has been missed.

Even if the plan is perfect and the main agent somehow passes it over completely (e.g., by referencing a plan document), the sub-agent still loses the process context of how the plan was discussed, and is likely to execute in ways that deviate from expectations due to misunderstandings.

An obvious solution is to dump all the context directly into the sub-agent's window — but then token waste and context decay follow. Take it a step further and filter for only the necessary context? Then we are back to the same problem.

Engram's solution: provide the sub-agent with a set of retrieval tools pointing directly at the main agent's complete conversation history. Whenever it encounters an information gap during execution, it looks the answer up itself.

```mermaid
sequenceDiagram
    autonumber
    participant History as Upstream Conversation History
    participant Engram as Engram
    participant Agent as Sub-agent

    Note over Agent: Encounters information gap

    Agent->>Engram: history_search("implementation plan for API A")
    Engram->>History: Search
    History-->>Engram: Matched records
    Engram-->>Agent: Result preview (turn index + summary)

    Note over Agent: Preview confirms relevance, dive deeper

    Agent->>Engram: history_pull_message("msg_064")
    Engram->>History: Read
    History-->>Engram: Original message
    Engram-->>Agent: Full message (technical constraints + field definitions)

    Note over Agent: Context filled, continue task
```

Engram provides agents with specialized retrieval tools, each designed around cognitive pathways and the principle of progressive disclosure, maximizing retrieval efficiency while minimizing token consumption. Tokens are mainly spent on the paths the agent actually dives into, not on loading everything upfront.

The entire system accesses OpenCode's existing conversation storage in a read-only manner. It writes no data and maintains no derivative models: zero maintenance cost, zero consistency issues.

The core of this design is not efficiency optimization, although the token savings are significant. The core is a transfer of judgment: **"what information is relevant to the current task" is no longer predicted by the main agent at the moment of delegation, but determined autonomously by the sub-agent at the moment of execution, based on actual needs.** The former is the moment of least information; the latter is the moment of maximum information.


### Conversation History Access Tools

Context Charting and Upstream History Retrieval share the same set of retrieval tools. These tools are layered according to the data model: **Turn → Message → Part**, following the principle of **Progressive Disclosure**. When executing a task, an agent starts from the low-token-cost index layer and only issues higher-token-cost content pulls to deeper levels after confirming relevance through previews. This architecture preserves information completeness while keeping token costs under control in ultra-long sessions through precise control of disclosure depth.

```mermaid
flowchart TB
    L1["L1: Turn Layer (Turn)<br/>history_browse_turns<br/><i>Low token · High information compression ratio</i>"]
    L2["L2: Message Layer (Message)<br/>history_browse_messages<br/><i>Medium token · High navigation precision</i>"]
    L3["L3: Content Layer (Part)<br/>history_pull_message / part<br/><i>High token · Full-text detail disclosure</i>"]
    S["Direct Entry<br/>history_search<br/><i>Jump to content layer by keyword</i>"]

    L1 --> L2 --> L3
    S -.->|"Skip layers"| L3

    classDef l1 fill:#163b22,stroke:#3fb950,stroke-width:2px,color:#fff
    classDef l2 fill:#3d2b14,stroke:#f0883e,stroke-width:2px,color:#fff
    classDef l3 fill:#112740,stroke:#58a6ff,stroke-width:2px,color:#fff
    classDef search fill:#2a1a2e,stroke:#d2a8ff,stroke-width:2px,color:#fff

    class L1 l1
    class L2 l2
    class L3 l3
    class S search
```

`history_search` provides direct access at the **Part level**. When an agent already knows specific keywords or tool-call characteristics (e.g., `bash`), it can skip the layer hierarchy and jump directly to specific content.

For detailed tool interface documentation, see [docs/tools.md](docs/tools.md).

#### L1: Turn Preview (history_browse_turns)

This tool provides a global index of the conversation. Each turn contains only a preview of the user's intent and the assistant's execution metadata (tool statistics, list of modified files). This lets the agent scan hundreds of turns at minimal token cost and quickly locate targets.

#### L2: Message Preview (history_browse_messages)

View message sequences and their metadata (attachments, tool status) anchored by `message_id`. This layer is used for secondary confirmation within target turns, preventing the agent from blindly pulling full text before verifying contextual relevance.

#### L3: Full Pull (history_pull_message / history_pull_part)

This is the highest-token-consumption layer. `history_pull_message` splits a message into independent parts by type. If content is truncated due to length limits, the agent can call `history_pull_part` to retrieve the complete text of that part. Only key content filtered through the previous two layers is allowed into the main context window.
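As a concrete, deliberately simplified picture of this layering, the sketch below models a read-only store with an L2-style browse that returns only metadata and an L3-style pull that returns full parts. The message shapes and function signatures here are assumptions for illustration; the real interfaces are documented in docs/tools.md:

```typescript
// Toy read-only store demonstrating the L2 -> L3 drill-down.
// Shapes and signatures are illustrative assumptions, not Engram's API.
type Part = { id: string; type: "text" | "tool"; content: string };
type Message = { id: string; role: string; parts: Part[] };

const store: Message[] = [
  {
    id: "msg_064",
    role: "assistant",
    parts: [
      { id: "prt_1", type: "text", content: "API A must keep the v1 field names." },
      { id: "prt_2", type: "tool", content: "bash: npm run typecheck" },
    ],
  },
];

// L2: cheap metadata only (ids, roles, part counts); no content is exposed yet.
function browseMessages(): { id: string; role: string; parts: number }[] {
  return store.map((m) => ({ id: m.id, role: m.role, parts: m.parts.length }));
}

// L3: full pull of a single message, split into its parts by type.
function pullMessage(id: string): Part[] {
  const msg = store.find((m) => m.id === id);
  return msg ? msg.parts : [];
}
```

The token asymmetry is the point: an agent can scan `browseMessages()` output across many messages cheaply, and pay the full cost of `pullMessage` only for the ids that survive the preview check.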

#### Arbitrary Session Access

The tools natively support access to any session, but since this project has not yet introduced a session discovery tool, users need to explicitly specify the target session ID in their instructions. The agent can then work directly from that session's history. For example:

- "Continue from the latest progress in session `ses_xxx`, and complete the unfinished parts."
- "Draw on the problem-solving experience in session `ses_xxx` to find a viable solution for the current problem."

Session IDs can be obtained via the `opencode session list` command.

## Design Philosophy

### Zero Infrastructure

Unlike most "memory" systems, Engram does not maintain vector databases, does not run embedding pipelines, and does not write any derivative data. The entire system accesses OpenCode's existing conversation storage in a **read-only manner**, with full-text search computed in real time at query time.

This means zero maintenance cost, zero consistency issues, and zero additional storage. The conversation history itself is the database — no need to translate it again.
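A minimal sketch of what query-time search means, assuming a naive keyword-overlap ranking (the plugin's actual matching and ranking are not specified here): the store is only read, and nothing is precomputed or persisted between queries.

```typescript
// Query-time full-text search over a read-only store: no index is built or
// written anywhere; matching is computed fresh for each query.
// The ranking here (count of matched terms) is a deliberately naive assumption.
type StoredPart = { messageId: string; text: string };

function historySearch(parts: StoredPart[], query: string): StoredPart[] {
  const terms = query.toLowerCase().split(/\s+/).filter(Boolean);
  return parts
    .map((p) => ({
      part: p,
      hits: terms.filter((t) => p.text.toLowerCase().includes(t)).length,
    }))
    .filter((r) => r.hits > 0)        // drop non-matches
    .sort((a, b) => b.hits - a.hits)  // more matched terms ranks higher
    .map((r) => r.part);
}
```

Because the store is the single source of truth, there is no derivative index that can drift out of sync with the conversation history.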

### Transfer of Judgment

Engram's design does not pursue more efficient information transfer or more precise context compression. Its core is a reallocation of judgment: **"what information is relevant to the current task" is no longer predicted by an external role at the moment of least information, but determined autonomously by the agent itself at the moment of maximum information.**

This principle runs through both functional modules: Context Charting delays the moment of information filtering from compression time to consumption time; Upstream History Retrieval shifts the filtering entity from the main agent to the sub-agent.

## Quick Start

**Prerequisites:** Node.js 22+, with [OpenCode](https://github.com/opencode-ai/opencode) installed.

Register the plugin in your `opencode.json(c)` configuration file:

```jsonc
{
  "plugin": ["opencode-engram"]
}
```

Once done, restart OpenCode. No additional configuration is needed by default; all features work out of the box.

## Configuration

Both functional modules can be independently enabled or disabled via configuration:

```jsonc
{
  "upstream_history": {
    "enable": true // Upstream History Retrieval, enabled by default
  },
  "context_charting": {
    "enable": true // Context Charting, enabled by default
  }
}
```

The configuration file is `opencode-engram.json` / `opencode-engram.jsonc`, located in the project root or in the global OpenCode configuration directory; project-level configuration overrides global configuration. Configuration also lets you fine-tune the level of detail exposed in tool output, balancing output quality against token consumption, and control how tool inputs and outputs are displayed, with support for custom tools. For complete configuration field documentation, see [docs/config.md](docs/config.md).
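For example, a project that wants upstream retrieval but not charting could ship a project-level file overriding just that flag. This sketch uses only the `enable` fields shown above and assumes omitted fields fall back to global or default values:

```jsonc
// ./opencode-engram.jsonc (project root), overriding the global config
{
  "context_charting": {
    "enable": false // turn off Context Charting for this project only
  }
  // "upstream_history" omitted: assumed to keep its global/default value
}
```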

## Contributing

Issues and PRs are welcome.

```bash
# Clone and install
git clone https://github.com/NocturnesLK/opencode-engram.git
cd opencode-engram
npm ci

# Type checking
npm run typecheck

# Run tests (80% coverage threshold)
npm run test:coverage

# Run a single test file
npx vitest run src/runtime/runtime.test.ts
```

Test files are co-located with source files (`foo.ts` corresponds to `foo.test.ts`) and use the Vitest framework. Please add tests alongside new features.

## Roadmap

- [ ] **Charting Benchmark:** Establish benchmarks to quantify the benefits of Context Charting over traditional compression in long-conversation tasks
- [ ] **Cross-platform Support:** Extend the retrieval tools to platforms beyond OpenCode (Claude Code, etc.)
- [ ] **Agent Audit:** Add a third functional module in which an independent audit agent pulls the execution history of a target agent, for development, debugging, evaluation, and iteration of that agent

## License

MIT © 2026 NocturnesLK