npm - @bd7pil/opencode-deep-memory - Versions diffs - 0.7.0 → 0.8.1 - Mend

@bd7pil/opencode-deep-memory 0.7.0 → 0.8.1

Files changed (4)

package/README.md CHANGED

@@ -7,11 +7,11 @@
 OpenCode sessions are stateless. Every restart is a cold start. Native compaction
 destroys conversation content. **deep-memory** adds three layers:
-| Layer | Hook | Purpose |
-|-------|------|---------|
-| **Remember** | `memory_search`, `memory_store`, `memory_forget`, `memory_expand` | Decisions, constraints, gotchas survive across sessions via BM25 + CJK search. Storage at `.deep-memory/` in your project root — visible, version-controllable. |
-| **Recover** | `session.created`, `experimental.session.compacting` | Checkpoint captures conversation before compaction destroys it. Resume injection recalls everything on a new session (3000 token first-turn budget). |
-| **Compress** | `experimental.chat.messages.transform` | Old reasoning, metadata, system injections, and thinking tags stripped deterministically — no LLM calls. Cache-stable sentinel replacements preserve prompt cache. |
+| Layer | What survives | How |
+|-------|--------------|-----|
+| **Remember** | Decisions, constraints, gotchas | `memory_search` / `memory_store` — BM25 + CJK search across sessions |
+| **Recover** | Full conversation context | Checkpoint captures before compaction; resume injection on new session |
+| **Compress** | Token budget | Deterministic stripping + pressure-triggered deep compression — no LLM calls |
 ## Quick start
@@ -30,125 +30,116 @@ OpenCode auto-installs on startup. Memory appears at `.deep-memory/` in your pro
 ## How it works
 ```
-                         ┌─────────────────────────────┐
-                         │     system.transform         │
-                         │   m[0] stable (cache hit)    │
-                         │   m[1] volatile (per-turn)   │
-                         │   repo map (code symbols)    │
-                         └─────────────────────────────┘
-                                     ▲
-┌──────────────┐    ┌──────────────┐ │  ┌───────────────────────────┐
-│ chat.message │    │  chat.params │ │  │      messages.transform   │
-│ keyword→notes│    │ agent→budget │ │  │  ① Layer 1: strip reason. │
-│  "记住"/"rem" │    │ main 800t    │ │  │  ② Layer 2: deep compress │
-│              │    │ oracle 400t  │ │  │     dedup / error purge / │
-└──────────────┘    └──────────────┘ │  │     tool compress / JSON / │
-                                     │  │     message prune / CCR   │
-                     ┌──────────────┘  └───────────────────────────┘
-                     │
-┌────────────────────┴────────────────────────┐
-│                  event                      │
-│  session.created → resume + dream schedule  │
-│  session.idle    → enrichment + notify      │
-│  session.compacted → checkpoint             │
-└─────────────────────────────────────────────┘
+┌─────────────────────────────────────────────────────────────────┐
+│  messages.transform (every turn)                                │
+│  ├─ Strip reasoning/thinking parts (physical removal)           │
+│  ├─ Remove system-injected messages (physical removal)          │
+│  ├─ Truncate old tool errors                                    │
+│  └─ Deep compress: dedup / tool output / JSON / assistant text  │
+└─────────────────────────────────────────────────────────────────┘
+┌─────────────────────────────────────────────────────────────────┐
+│  system.transform (every turn)                                  │
+│  ├─ Inject stable: MEMORY.md constraints + tool hint (cache hit)│
+│  └─ Inject volatile: BM25 search results + repo map symbols     │
+└─────────────────────────────────────────────────────────────────┘
+┌─────────────────────────────────────────────────────────────────┐
+│  compacting (before OpenCode destroys messages)                 │
+│  ├─ Capture raw messages → checkpoint.raw.json                  │
+│  ├─ Extract knowledge → checkpoint.md                           │
+│  └─ Inject structured handoff prompt for LLM                    │
+└─────────────────────────────────────────────────────────────────┘
+┌─────────────────────────────────────────────────────────────────┐
+│  events                                                         │
+│  ├─ session.created → resume + dream schedule                   │
+│  ├─ session.idle    → enrichment                                │
+│  └─ session.compacted → pressure calibration                    │
+└─────────────────────────────────────────────────────────────────┘
 ```
 ## Context compression
-Two compression layers run automatically, no LLM calls required.
+Two layers, fully automatic, no LLM calls.
-### Layer 1: Deterministic stripping
+### Layer 1: Deterministic stripping (always active)
-Always active, strips disposable content from old messages:
+| Target | Action |
+|--------|--------|
+| Old reasoning/thinking parts | Physical removal |
+| System injections (`<system-reminder>`, etc.) | Physical removal |
+| Tool errors >100 chars (older than 4 turns) | Truncate |
+| Inline `<thinking>` tags | Regex strip |
-| What gets stripped | How | Why safe |
-|--------------------|-----|----------|
-| `reasoning_details` metadata | Delete the JSON blob | Billing metadata, never reaches model |
-| Old reasoning text | Replace with `[cleared]` | Conclusions are in assistant text |
-| System injections | Replace with `[stripped]` | `<system-reminder>` stale after one turn |
-| Tool errors >100 chars | Truncate | An old error only needs "it failed" |
-| Inline `<thinking>` tags | Regex strip | Process, not product |
+No marker pollution — old content is physically removed, not replaced with `[cleared]` or `[stripped]`. This prevents [context confusion](https://www.philschmid.de/context-engineering-part-2).
 ### Layer 2: Deep compression (pressure-triggered)
-Activates when context pressure exceeds thresholds. Inspired by
-[DCP](https://github.com/Opencode-DCP/opencode-dynamic-context-pruning),
-[Headroom](https://github.com/chopratejas/headroom), and
-[Edgee](https://github.com/edgee-ai/edgee).
 | Pressure | Threshold | Actions |
 |----------|-----------|---------|
-| **always** | every turn | tool dedup + error purge + tool output compress + JSON crush (all reversible via CCR) |
-| **medium** | ≥ 30% context | + old message text truncation (lossy, extracts key info) |
-| **high** | ≥ 50% context | + nudge (alerts model to save important findings)
+| **always** | every turn | tool dedup + error purge + tool output compress + JSON crush + assistant text compress |
+| **medium** | ≥ 50K tokens | + memory nudge (prompts LLM to use `memory_store`) |
+| **high** | ≥ 150K tokens | + pressure nudge (prompts LLM to summarize old tasks) |
-What gets compressed at medium+:
+Thresholds are absolute, not percentage-based — they work consistently across 200K and 1M+ context windows. Based on [Focus Agent](https://arxiv.org/html/2601.07190v1) research.
 | Target | Strategy | Source |
 |--------|----------|--------|
-| Duplicate tool calls | Signature matching (`toolName::sortedParams`) | DCP |
-| Old error inputs | Purge inputs after 4 turns | DCP |
-| File reads | Keep first 50 + key lines + last 20 | Edgee |
-| Command outputs | Keep errors + last 30 lines | Edgee |
-| Search results | Keep top-20, group by file | Edgee |
-| JSON arrays | Keep first 30% + last 15% + dedup middle | Headroom SmartCrusher |
-| Old assistant text | Extract key info (headings, code, errors) | DCP |
+| Duplicate tool calls | Signature matching | [DCP][] |
+| Old error inputs | Purge after 4 turns | [DCP][] |
+| File reads | Keep head + key lines + tail | [Edgee][] |
+| Command outputs | Keep errors + tail | [Edgee][] |
+| Search results | Keep top-20, group by file | [Edgee][] |
+| JSON arrays | Head + dedup middle + tail | [Headroom][] |
+| Old assistant text | Preserve structure, compress prose | [LLMLingua][] |
-All compressed content is **reversible** via CCR (Compress-Cache-Retrieve):
-originals are cached with SHA-256 hash and 5-minute TTL.
-Models can retrieve them via the `deep_expand` tool.
+All compressed content is **reversible** via CCR (Compress-Cache-Retrieve) — originals cached with SHA-256 hash, retrievable via `deep_expand` tool.
-**Never touched**: user messages, recent 8 messages, protected tools
-(question, edit, write, todowrite, memory_store/search/forget).
+**Never touched**: user messages, recent 4K tokens, protected tools (question, edit, write, todowrite, memory_*).
-## Toast notifications
+## Memory nudge
-After each LLM turn, deep-memory shows a toast notification (bottom-right corner) summarizing
-what was compressed and injected. The notification level is chosen automatically:
+Detects decisions, constraints, and fixes in conversation — nudges the LLM to persist them.
-| Scenario | Level | Content |
-|----------|-------|---------|
-| Injection only (no compression) | minimal | One-line summary: `-8.5K stripped` |
-| Compression (short session) | detailed | Progress bar + per-category breakdown |
-| Compression + rich context (repo-map, memory, checkpoint) | extended | Full panel with budget usage |
+| Pattern | Example | Nudge |
+|---------|---------|-------|
+| Decision | "我决定用 PostgreSQL" / "I'll use PostgreSQL" | `memory_store(type="decision")` |
+| Constraint | "不能用 eval()" / "must not use eval()" | `memory_store(type="constraint")` |
+| Error fix | "修复了权限问题" / "fixed the permission error" | `memory_store(type="gotcha")` |
-Example toast (detailed level):
+English + Chinese. Pressure nudge and memory nudge have independent cooldowns.
-```
-deep-memory | compressed
-─ Compression ─────────────────────────────
-│████░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░│
-  reasoning -6.2K | metadata -2.1K | tool_err -0.8K
-─ Injection ───────────────────────────────
-  m[0] stable 1055B ✓  m[1] volatile 574B
-  tier=main | mode=normal
-  repo-map: 12 symbols | memory: 8 entries
-```
+## Tools
+| Tool | Purpose |
+|------|---------|
+| `memory_search` | Search persistent memory (BM25 + CJK bigram) |
+| `memory_store` | Store decisions, constraints, gotchas, facts, notes |
+| `memory_forget` | Remove stale memory entries |
+| `memory_expand` | Retrieve original content of a compressed message |
+| `deep_expand` | Retrieve original content via CCR hash |
-## Cache-stable injection
+## Compaction
-Each turn pushes two system prompt fragments:
+When OpenCode compacts a session:
-- **Stable** (`<deep-memory-stable>`): constraints, rules, and the tool hint.
-  These change only when MEMORY.md is updated — typically across sessions, not turns.
-  Because they're byte-identical turn after turn, the provider's prompt cache hits on this prefix.
+1. **Capture** raw messages to `checkpoint.raw.json`
+2. **Extract** knowledge via 5 heuristic extractors
+3. **Write** structured `checkpoint.md`
+4. **Inject** Hermes-8 structured prompt + Codex-style handoff prefix
-- **Volatile** (`<deep-memory-volatile>`): context-aware search results from the user's
-  current query, tier-allocated by importance, plus repo map symbols for recently-read files.
-  This is the only part that changes per turn.
+The LLM produces: Task Overview → Progress → Key Decisions → Constraints → Files Modified → Errors → Next Steps → Critical Context
-The injection budget adapts to the agent: main orchestrator gets 800 tokens per turn
-(3000 on session resume), deep-reasoning agents get 400, and tool subagents get 80.
+## Memory consolidation
-## Memory search (BM25 + CJK bigram)
+| Cycle | Trigger | Action |
+|-------|---------|--------|
+| **Auto-dream** | 7 days or notes.md >20 lines | Consolidate notes + checkpoints → MEMORY.md |
+| **Auto-distill** | 30 days | Package recurring workflows → skill candidates |
+| **Enrichment** | Session idle after compaction | LLM enriches checkpoint with cross-references |
-Instead of SQLite FTS5, we use a pure-JS BM25 engine with a CJK-aware tokenizer.
-Chinese runs are split into sliding 2-character bigrams (`"权限死锁"` →
-`["权","权限","限死","死锁","锁"]`), making multi-character CJK phrases searchable
-without an embedding model. Latin text uses standard whitespace/punctuation splitting.
-The index is rebuilt from Markdown files on startup (<250ms for 2000 entries) and
-updated incrementally on writes.
+New projects: MEMORY.md auto-bootstraps from notes.md. Both agents have `memory_forget` enabled.
 ## Configuration
@@ -161,81 +152,72 @@ updated incrementally on writes.
 ## Storage
 ```
-<project>/.deep-memory/       ← version-controllable
+<project>/.deep-memory/
 ├── MEMORY.md                   persistent decisions/constraints/gotchas
 ├── notes.md                    keyword captures
 ├── checkpoint.md               last compaction extraction
+├── checkpoint.raw.json         raw messages dump
 ├── .schedule.json              dream/distill state
-└── sessions/<sid>/              per-session archive
+├── .compaction-log.jsonl       compaction audit trail
+└── sessions/<sid>/             per-session archive
 ```
-## Tools
-| Tool | Purpose |
-|------|---------|
-| `memory_search` | Search persistent memory across sessions (BM25 + CJK) |
-| `memory_store` | Store decisions, constraints, gotchas, facts, notes |
-| `memory_forget` | Remove memory entries matching a query |
-| `memory_expand` | Decompress a sentinel reference to its original content |
-| `deep_expand` | Retrieve original content compressed by CCR (use `[ccr:HASH]` marker) |
-| `deep_expand` | Retrieve original content compressed by CCR (use `[ccr:HASH]` marker) |
 ## Commands
-Copy `.opencode/command/*.md` to your project:
 - `/checkpoint` — manually capture session state
 - `/dream` — consolidate notes into persistent memory
 - `/distill` — package recurring workflows into skills
-## Design
+## Development
-**Memory entries** carry a type (`decision`, `constraint`, `gotcha`, `fact`, `note`) and
-an importance score. Importance is heuristically derived from entry type, recency,
-frequency across sessions, and keyword-match relevance to the current query —
-no LLM calls required.
+```bash
+npm install
+npm run verify   # typecheck + test (363) + build + smoke (49)
+```
-Entries are stored as Markdown sections (e.g. `## Decisions`, `## Constraints`) in
-`MEMORY.md`, with `[date]` timestamps for time-based decay. The BM25 index is rebuilt
-from these files on startup and updated incrementally on write.
+## Acknowledgments
-Background consolidation runs on a 7-day cycle (auto-dream) plus an accumulation trigger
-(when `notes.md` exceeds 20 lines). A separate 30-day cycle (auto-distill) packages
-recurring workflows into skill candidates. Both use background sessions to avoid
-consuming the main session's context budget.
+**[DCP][]** — Dynamic Context Pruning for OpenCode. Tool dedup, error purge, and nudge system.
-## Acknowledgments
+**[Headroom][]** — JSON array crush and CCR (Compress-Cache-Retrieve).
+**[Edgee][]** — Per-tool compression strategies (read, bash, grep, glob).
-**[MiMo-Code][]** — a terminal-native AI coding assistant with persistent memory that keeps a
-deep understanding of your project across sessions while continuously improving itself.
+**[Contextomizer][]** — Content type detection pipeline.
-**[Magic Context][]** — unbounded context. Memory that manages itself. One session, for life.
-The hippocampus for coding agents, part of CortexKit.
+**[Focus Agent][]** — Absolute token thresholds and assistant text compression research.
-**[Aider][]** — AI pair programming in your terminal. Lets you pair program with LLMs to start
-a new project or build on your existing codebase.
+**[LLMLingua][]** — Selective compression: preserve structure, compress prose.
-**[Roo Code][]** — a whole dev team of AI agents in your code editor.
+**[Codex CLI][]** — Handoff prefix pattern for compaction continuity.
-**[Continue][]** — pioneering open-source coding agent, available as a CLI, VS Code extension,
-and JetBrains plugin.
+**[Google ADK][]** — Append-only event compaction architecture.
-**[OpenHands][]** — Code Less, Make More. A community focused on AI-driven development.
+**[Hermes][]** — 8-section structured compaction prompt design.
-**[Plandex][]** — an AI coding agent designed for large tasks and real world projects.
+**[MiMo-Code][]** — Terminal-native AI coding assistant with persistent memory.
-**[DCP][]** — Dynamic Context Pruning for OpenCode. Our tool deduplication, error purging,
-and nudge system are inspired by DCP's architecture.
+**[Magic Context][]** — Unbounded context for coding agents.
-**[Headroom][]** — compress tool outputs, logs, files, RAG chunks for AI agents.
-Our JSON array crush and CCR (Compress-Cache-Retrieve) are derived from Headroom's SmartCrusher.
+**[Aider][]** — AI pair programming in your terminal.
-**[Edgee][]** — agent gateway that compresses tokens before LLM providers.
-Our per-tool compression strategies (read, bash, grep, glob) are inspired by Edgee's approach.
+**[Roo Code][]** — A whole dev team of AI agents in your code editor.
-**[Contextomizer][]** — ultra-fast deterministic library for transforming bloated tool outputs.
-Our content type detection pipeline is inspired by Contextomizer's approach.
+**[Continue][]** — Pioneering open-source coding agent.
+**[OpenHands][]** — Code Less, Make More.
+**[Plandex][]** — AI coding agent for large tasks and real world projects.
+[DCP]: https://github.com/Opencode-DCP/opencode-dynamic-context-pruning
+[Headroom]: https://github.com/chopratejas/headroom
+[Edgee]: https://github.com/edgee-ai/edgee
+[Contextomizer]: https://github.com/GandalFran/contextomizer
+[Focus Agent]: https://arxiv.org/html/2601.07190v1
+[LLMLingua]: https://github.com/microsoft/LLMLingua
+[Codex CLI]: https://github.com/openai/codex
+[Google ADK]: https://github.com/google/adk-python
+[Hermes]: https://github.com/NousResearch/hermes-agent
 [MiMo-Code]: https://github.com/XiaomiMiMo/MiMo-Code
 [Magic Context]: https://github.com/cortexkit/magic-context
 [Aider]: https://github.com/Aider-AI/aider
@@ -243,35 +225,6 @@ Our content type detection pipeline is inspired by Contextomizer's approach.
 [Continue]: https://github.com/continuedev/continue
 [OpenHands]: https://github.com/All-Hands-AI/OpenHands
 [Plandex]: https://github.com/plandex-ai/plandex
-[DCP]: https://github.com/Opencode-DCP/opencode-dynamic-context-pruning
-[Headroom]: https://github.com/chopratejas/headroom
-[Edgee]: https://github.com/edgee-ai/edgee
-[Contextomizer]: https://github.com/GandalFran/contextomizer
-## Development
-```bash
-npm install
-npm run verify   # typecheck + test (363) + build + smoke (49)
-```
-Stats: 54 source files, 27 test files (363 tests), 10 compress modules, 49 smoke checks.
-## CI/CD (npm Trusted Publishing)
-Releases use npm OIDC Trusted Publishing — no token needed. To set up for a fork:
-1. **npmjs.com** → Package Settings → Trusted Publishers → Add:
-   - Owner: your GitHub username
-   - Repository: your fork name
-   - Workflow filename: `publish.yml`
-2. **package.json** → update `repository.url` to match your fork
-3. **Push a tag** → GitHub Actions auto-publishes:
-   ```bash
-   git tag v1.0.0 && git push origin v1.0.0
-   ```
-Requirements: npm CLI ≥ 11.5.1, Node.js ≥ 22, `id-token: write` permission, public repository.
 ## License

package/dist/index.js CHANGED

@@ -261,6 +261,7 @@ var PluginState = class {
   _ccrCache = /* @__PURE__ */ new Map();
   _lastInputTokens = 0;
   _lastNudgeMessageCount = /* @__PURE__ */ new Map();
+  _lastMemoryNudgeMessageCount = /* @__PURE__ */ new Map();
   _lastCCRCleanup = 0;
   _modelContextWindow = 0;
   agentOf(sessionID) {
@@ -274,6 +275,7 @@ var PluginState = class {
     this._models.delete(sessionID);
     this._lastUserText.delete(sessionID);
     this._lastNudgeMessageCount.delete(sessionID);
+    this._lastMemoryNudgeMessageCount.delete(sessionID);
   }
   recordModel(sessionID, model) {
     this._models.set(sessionID, model);
@@ -423,6 +425,13 @@ var PluginState = class {
     const last = this._lastNudgeMessageCount.get(sessionID);
     return last != null ? currentMessageCount - last : Number.POSITIVE_INFINITY;
   }
+  recordMemoryNudge(sessionID, messageCount) {
+    this._lastMemoryNudgeMessageCount.set(sessionID, messageCount);
+  }
+  messagesSinceLastMemoryNudge(sessionID, currentMessageCount) {
+    const last = this._lastMemoryNudgeMessageCount.get(sessionID);
+    return last != null ? currentMessageCount - last : Number.POSITIVE_INFINITY;
+  }
   setModelContextWindow(tokens) {
     if (tokens > 0) this._modelContextWindow = tokens;
   }
@@ -1127,7 +1136,7 @@ async function runDream(opts) {
         tools: {
           memory_search: true,
           memory_store: true,
-          memory_forget: false,
+          memory_forget: true,
           read: true,
           list: true
         }
@@ -1228,23 +1237,38 @@ async function handleSessionCreatedForDream(args) {
   }
   const notesPath = memoryFilePath("project", "notes", projectPath);
   let notesLines = 0;
+  let notesContent = "";
   try {
-    const content = fs5.readFileSync(notesPath, "utf8");
-    if (content.trim().length === 0) {
+    notesContent = fs5.readFileSync(notesPath, "utf8");
+    if (notesContent.trim().length === 0) {
       logger?.debug("auto-dream: notes.md is empty, skipping spawn");
       return;
     }
-    notesLines = content.split("\n").filter((l) => l.trim()).length;
+    notesLines = notesContent.split("\n").filter((l) => l.trim()).length;
   } catch {
     logger?.debug("auto-dream: notes.md not found, skipping spawn");
     return;
   }
   const memoryPath = memoryFilePath("project", "memory", projectPath);
   if (!fs5.existsSync(memoryPath) || fs5.statSync(memoryPath).size < 50) {
-    logger?.debug("auto-dream: MEMORY.md missing or too small, skipping", {
-      sessionID: info.id
-    });
-    return;
+    if (notesLines >= 5) {
+      try {
+        fs5.writeFileSync(memoryPath, notesContent, "utf8");
+        logger?.info("auto-dream: bootstrapped MEMORY.md from notes.md", {
+          notesLines
+        });
+      } catch (err) {
+        logger?.warn("auto-dream: failed to bootstrap MEMORY.md", {
+          error: err instanceof Error ? err.message : String(err)
+        });
+        return;
+      }
+    } else {
+      logger?.debug("auto-dream: MEMORY.md missing and notes too small, skipping", {
+        sessionID: info.id
+      });
+      return;
+    }
   }
   const isSevenDayDue = schedule.lastDream === null || Date.now() - Date.parse(schedule.lastDream) > DREAM_INTERVAL_MS;
   let isAccumulationDue = false;
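The hunk above changes what happens when `MEMORY.md` is missing or smaller than 50 bytes: instead of always skipping, the handler now copies `notes.md` into `MEMORY.md` when the notes have at least 5 non-empty lines. The branching can be sketched as a pure function. The function name and the returned labels are hypothetical, and the file I/O and the write-failure path are left out:

```javascript
// Decision-only sketch of the bootstrap branch; names are illustrative,
// not taken from the package.
function bootstrapDecision({ memoryExists, memorySize, notesLines }) {
  // A MEMORY.md of 50 bytes or more is treated as usable as-is.
  if (memoryExists && memorySize >= 50) return "proceed";
  // Otherwise seed MEMORY.md from notes.md only if the notes are substantial.
  return notesLines >= 5 ? "bootstrap" : "skip";
}

console.log(bootstrapDecision({ memoryExists: false, memorySize: 0, notesLines: 7 })); // bootstrap
console.log(bootstrapDecision({ memoryExists: true, memorySize: 20, notesLines: 3 })); // skip
```

In 0.7.0 both of these inputs would have resulted in a skip.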
@@ -1386,7 +1410,7 @@ async function runDistill(opts) {
         tools: {
           memory_search: true,
           memory_store: true,
-          memory_forget: false,
+          memory_forget: true,
           read: true,
           list: true
         }
@@ -15301,6 +15325,43 @@ function maxContextFrom(modelContextWindow) {
   if (calibratedMaxContext > 0) return calibratedMaxContext;
   return FALLBACK_MAX_CONTEXT;
 }
+function estimateTokens2(text) {
+  let cjk = 0;
+  let other = 0;
+  for (const ch of text) {
+    if (/[\u4e00-\u9fff\u3400-\u4dbf\u3000-\u303f\uff00-\uffef\u3040-\u309f\u30a0-\u30ff]/.test(ch)) {
+      cjk++;
+    } else {
+      other++;
+    }
+  }
+  return Math.ceil(cjk * 0.7 + other / 3.8);
+}
+function extractTokensFromMessages(messages) {
+  let total = 0;
+  for (const msg of messages) {
+    for (const part of msg.parts) {
+      if (typeof part !== "object" || part === null) continue;
+      const p = part;
+      if (p["type"] === "text" && typeof p["text"] === "string") {
+        total += estimateTokens2(p["text"]);
+      } else if (p["type"] === "tool") {
+        const state = p["state"];
+        if (state?.["output"] && typeof state["output"] === "string") {
+          total += estimateTokens2(state["output"]);
+        }
+        if (state?.["error"] && typeof state["error"] === "string") {
+          total += estimateTokens2(state["error"]);
+        }
+      } else if (p["type"] === "reasoning" || p["type"] === "thinking") {
+        if (typeof p["text"] === "string") {
+          total += estimateTokens2(p["text"]);
+        }
+      }
+    }
+  }
+  return total;
+}
 function extractInputTokensFromMessages(messages) {
   let best = 0;
   for (let i = messages.length - 1; i >= 0; i--) {
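The estimator added in this hunk counts each CJK character as 0.7 tokens and every other character as 1/3.8 of a token, then rounds up. A standalone sketch of the same heuristic makes the arithmetic easy to check by hand. The function name `estimateTokensSketch` is illustrative; the character class and the weights are copied from the hunk:

```javascript
// Same character class as the hunk above.
const CJK_CHAR = /[\u4e00-\u9fff\u3400-\u4dbf\u3000-\u303f\uff00-\uffef\u3040-\u309f\u30a0-\u30ff]/;

function estimateTokensSketch(text) {
  let cjk = 0;
  let other = 0;
  // for...of iterates by code point, so a surrogate pair is counted once.
  for (const ch of text) {
    if (CJK_CHAR.test(ch)) cjk++;
    else other++;
  }
  return Math.ceil(cjk * 0.7 + other / 3.8);
}

console.log(estimateTokensSketch("hello world")); // 11 / 3.8 = 2.89..., rounded up to 3
console.log(estimateTokensSketch("权限死锁"));     // 4 * 0.7 = 2.8, rounded up to 3
```

This is a rough character-count heuristic, not a tokenizer, so the result should be read as an order-of-magnitude estimate.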
@@ -15323,7 +15384,7 @@ function extractInputTokensFromMessages(messages) {
 function detectPressure(messages, modelContextWindow) {
   const ctx = maxContextFrom(modelContextWindow || 0);
   const inputTokens = extractInputTokensFromMessages(messages);
-  const estimated = inputTokens > 0 ? inputTokens : 0;
+  const estimated = inputTokens > 0 ? inputTokens : extractTokensFromMessages(messages);
   const ratio = Math.min(estimated / ctx, 1);
   let level;
   if (estimated >= PRESSURE_HIGH_TOKENS) level = "high";
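The one-line change in `detectPressure` matters when the provider has not reported an input token count: in 0.7.0 the estimate fell back to 0, so pressure could never rise; in 0.8.1 it falls back to the local character-based estimate. A minimal sketch of the classification, assuming the 50K and 150K thresholds from the README table in this diff. Only `PRESSURE_HIGH_TOKENS` appears in the hunk; the medium constant name, the `"low"` label, and the function name are assumptions:

```javascript
// Values from the README table; names other than PRESSURE_HIGH_TOKENS are assumed.
const PRESSURE_MEDIUM_TOKENS = 50_000;
const PRESSURE_HIGH_TOKENS = 150_000;

function classifyPressure(reportedInputTokens, fallbackEstimate) {
  // Prefer the provider-reported count; otherwise use the local estimate.
  const estimated = reportedInputTokens > 0 ? reportedInputTokens : fallbackEstimate;
  let level = "low";
  if (estimated >= PRESSURE_HIGH_TOKENS) level = "high";
  else if (estimated >= PRESSURE_MEDIUM_TOKENS) level = "medium";
  return { level, estimatedTokens: estimated };
}

console.log(classifyPressure(0, 60_000).level);   // medium
console.log(classifyPressure(200_000, 10).level); // high
```

Because the thresholds are absolute token counts, the level does not depend on the model's context window size.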
@@ -15350,17 +15411,17 @@ function buildNudgeText(level) {
 var MEMORY_NUDGE_COOLDOWN = 3;
 var DECISION_PATTERNS = [
   /\b(?:decided|decision|chose|chosen|picked|selected)\b/i,
-  /\b(?:采用|选择|决定|确定|选用)\b/,
+  /(?:采用|选择|决定|确定|选用)/,
   /\b(?:use|using|go with|went with)\b.*\b(?:because|since|due to)\b/i
 ];
 var CONSTRAINT_PATTERNS = [
   /\b(?:must not|cannot|should not|do not|never|always)\b/i,
   /\b(?:constraint|restriction|limitation|requirement)\b/i,
-  /\b(?:不能|必须|禁止|约束|限制|要求|务必)\b/
+  /(?:不能|必须|禁止|约束|限制|要求|务必)/
 ];
 var ERROR_FIX_PATTERNS = [
   /\b(?:fix|fixed|resolve|resolved|patch|corrected)\b/i,
-  /\b(?:修复|修复了|解决|解决了)\b/,
+  /(?:修复|修复了|解决|解决了)/,
   /\b(?:the (?:bug|error|issue) (?:was|is)|root cause)\b/i
 ];
 function detectMemoryNudge(messages, messagesSinceLastNudge) {
@@ -15374,13 +15435,14 @@ function detectMemoryNudge(messages, messagesSinceLastNudge) {
   const hasRecentToolError = recentMessages.some(
     (m) => m.parts.some((p) => p.type === "tool" && p.state?.status === "error")
   );
+  const recentAll = recentUserText + "\n" + recentAssistantText;
   if (hasRecentToolError && ERROR_FIX_PATTERNS.some((p) => p.test(recentAssistantText))) {
     return { injected: true, type: "gotcha" };
   }
-  if (CONSTRAINT_PATTERNS.some((p) => p.test(recentUserText))) {
+  if (CONSTRAINT_PATTERNS.some((p) => p.test(recentAll))) {
     return { injected: true, type: "constraint" };
   }
-  if (DECISION_PATTERNS.some((p) => p.test(recentAssistantText))) {
+  if (DECISION_PATTERNS.some((p) => p.test(recentAll))) {
     return { injected: true, type: "decision" };
   }
   return { injected: false, type: null };
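The pattern changes in the previous hunk drop `\b` around the Chinese alternations. In JavaScript regular expressions without Unicode-aware flags, `\b` is defined in terms of `\w`, which covers only `[A-Za-z0-9_]`. A CJK character is therefore a non-word character, and no word boundary exists between two adjacent CJK characters or between a CJK character and a space. The effect can be reproduced with the patterns from the diff:

```javascript
const withBoundary = /\b(?:采用|选择|决定|确定|选用)\b/;  // 0.7.0 form
const withoutBoundary = /(?:采用|选择|决定|确定|选用)/;   // 0.8.1 form
const sample = "我决定用 PostgreSQL";

console.log(withBoundary.test(sample));    // false: no \b next to 决 or 定
console.log(withoutBoundary.test(sample)); // true
console.log(withBoundary.test("a决定b"));  // true: ASCII letters create boundaries
```

So the 0.7.0 patterns matched Chinese keywords only when they were directly adjacent to ASCII word characters, which is rarely the case in running Chinese text.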
@@ -15774,17 +15836,19 @@ function runCompressionPipeline(ctx) {
     estimatedTokens: pressure.estimatedTokens
   };
   const sid = sessionID || "default";
-  const messagesSinceNudge = state.messagesSinceLastNudge(sid, messages.length);
-  if (shouldInjectNudge(pressure.level, messagesSinceNudge)) {
+  const currentMsgCount = messages.length;
+  const pressureSince = state.messagesSinceLastNudge(sid, currentMsgCount);
+  if (shouldInjectNudge(pressure.level, pressureSince)) {
     if (injectIntoLastAssistant(messages, buildNudgeText(pressure.level))) {
       stats.nudgeInjected = true;
-      state.recordNudge(sid, messages.length);
+      state.recordNudge(sid, currentMsgCount);
     }
   }
-  const memoryNudge = detectMemoryNudge(messages, state.messagesSinceLastNudge(sid, messages.length));
+  const memorySince = state.messagesSinceLastMemoryNudge(sid, currentMsgCount);
+  const memoryNudge = detectMemoryNudge(messages, memorySince);
   if (memoryNudge.injected) {
     if (injectIntoLastAssistant(messages, buildMemoryNudge(memoryNudge.type))) {
-      state.recordNudge(sid, messages.length);
+      state.recordMemoryNudge(sid, currentMsgCount);
       logger?.debug("compress: memory nudge", { type: memoryNudge.type });
     }
   }
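The last hunk gives the memory nudge its own counter. In 0.7.0 both nudges read and wrote `_lastNudgeMessageCount`, so firing one reset the cooldown of the other. A reduced sketch of the two independent counters, using the method names from the diff on a hypothetical `NudgeTracker` class that stands in for `PluginState`:

```javascript
// Hypothetical stand-in for the two per-session counters on PluginState.
class NudgeTracker {
  #pressure = new Map();
  #memory = new Map();

  recordNudge(sessionID, count) { this.#pressure.set(sessionID, count); }
  recordMemoryNudge(sessionID, count) { this.#memory.set(sessionID, count); }

  messagesSinceLastNudge(sessionID, current) {
    const last = this.#pressure.get(sessionID);
    // No previous nudge means the cooldown has effectively expired.
    return last != null ? current - last : Number.POSITIVE_INFINITY;
  }

  messagesSinceLastMemoryNudge(sessionID, current) {
    const last = this.#memory.get(sessionID);
    return last != null ? current - last : Number.POSITIVE_INFINITY;
  }
}

const tracker = new NudgeTracker();
tracker.recordNudge("s1", 10); // pressure nudge fired at message 10

console.log(tracker.messagesSinceLastNudge("s1", 12));       // 2
console.log(tracker.messagesSinceLastMemoryNudge("s1", 12)); // Infinity
```

With separate maps, a recent pressure nudge no longer suppresses a memory nudge for the same session.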