@parkgogogo/openclaw-reflection 0.1.0 → 0.1.3

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/INSTALL.md CHANGED
@@ -1,78 +1,90 @@
1
- # OpenClaw Install Guide
1
+ # OpenClaw Plugin Install Skill
2
2
 
3
- This file is for OpenClaw itself to follow when installing `@parkgogogo/openclaw-reflection`.
3
+ Use this file when installing `@parkgogogo/openclaw-reflection`.
4
4
 
5
- ## Goal
5
+ ## Role
6
6
 
7
- Install the packaged plugin, enable it under the `openclaw-reflection` plugin id, provide a valid LLM config, and restart the gateway so the plugin becomes active.
7
+ Install the package, collect only missing config, write the plugin entry, validate config, and restart the gateway.
8
8
 
9
- ## Install From npm
9
+ ## Ask Only If Missing
10
10
 
11
- Use the published package, not a linked workspace checkout:
11
+ - `workspaceDir`
12
+ Recommendation: use OpenClaw's current workspace for the active profile.
13
+ Only ask if the current workspace is unknown or the operator wants a different one.
14
+ - `llm.baseURL`
15
+ - `llm.apiKey`
16
+ - `llm.model`
17
+ - whether the profile already uses `plugins.allow`
12
18
 
13
- ```bash
14
- openclaw plugins install @parkgogogo/openclaw-reflection@0.1.0
15
- ```
19
+ ## Defaults
20
+
21
+ - `bufferSize`: `50`
22
+ - `logLevel`: `info`
23
+ - `memoryGate.enabled`: `true`
24
+ - `memoryGate.windowSize`: `10`
25
+ - `consolidation.enabled`: `false`
26
+ - `consolidation.schedule`: `0 2 * * *`
27
+ - recommended model: `x-ai/grok-4.1-fast`
28
+
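For reference, the default `consolidation.schedule` value is a standard five-field cron expression; a quick sketch of how its fields read, assuming standard cron semantics:

```python
# Split the default consolidation schedule into its five cron fields.
minute, hour, day_of_month, month, day_of_week = "0 2 * * *".split()

# "*" means "every"; fixed values pin the time.
print(f"minute={minute} hour={hour} -> runs daily at 02:00")
```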
29
+ ## Rules
16
30
 
17
- If you are installing from a local release artifact instead:
31
+ - Install from npm, not a linked checkout.
32
+ - Write config under `plugins.entries.openclaw-reflection`.
33
+ - Do not leave `llm.baseURL`, `llm.apiKey`, or `llm.model` empty.
34
+ - Only add `openclaw-reflection` to `plugins.allow` if the profile already uses an allowlist.
35
+
36
+ ## Steps
37
+
38
+ ### 1. Install
18
39
 
19
40
  ```bash
20
- openclaw plugins install ./parkgogogo-openclaw-reflection-0.1.0.tgz
41
+ openclaw plugins install @parkgogogo/openclaw-reflection@0.1.0
21
42
  ```
22
43
 
23
- ## Write Config
44
+ ### 2. Write Config
24
45
 
25
- Add this under `plugins.entries.openclaw-reflection` in the active OpenClaw profile config:
46
+ Use the current OpenClaw workspace as `workspaceDir` unless the operator explicitly wants another workspace.
26
47
 
27
- ```json
48
+ ```jsonc
28
49
  {
29
50
  "enabled": true,
30
51
  "config": {
31
- "workspaceDir": "/absolute/path/to/agent-workspace",
52
+ "workspaceDir": "/absolute/path/to/current-openclaw-workspace",
32
53
  "bufferSize": 50,
33
54
  "logLevel": "info",
34
55
  "llm": {
35
- "baseURL": "https://api.openai.com/v1",
56
+ "baseURL": "https://openrouter.ai/api/v1",
36
57
  "apiKey": "YOUR_API_KEY",
37
- "model": "gpt-4.1-mini"
58
+ "model": "x-ai/grok-4.1-fast"
38
59
  },
39
60
  "memoryGate": {
40
61
  "enabled": true,
41
62
  "windowSize": 10
42
63
  },
43
64
  "consolidation": {
44
- "enabled": true,
65
+ "enabled": false,
45
66
  "schedule": "0 2 * * *"
46
67
  }
47
68
  }
48
69
  }
49
70
  ```
50
71
 
51
- Make sure `plugins.allow` includes `openclaw-reflection` if the profile uses an allowlist.
72
+ If the profile uses `plugins.allow`, ensure it contains:
52
73
 
53
- ## Validate And Restart
74
+ ```json
75
+ ["openclaw-reflection"]
76
+ ```
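For orientation, the two keys live side by side under `plugins` in the profile config. A minimal sketch; the surrounding layout is assumed from the paths used in this guide:

```jsonc
{
  "plugins": {
    "allow": ["openclaw-reflection"],  // only when the profile already uses an allowlist
    "entries": {
      "openclaw-reflection": { "enabled": true, "config": { /* see step 2 */ } }
    }
  }
}
```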
54
77
 
55
- Run:
78
+ ### 3. Validate
56
79
 
57
80
  ```bash
58
81
  openclaw config validate --json
59
- openclaw gateway run --force --verbose
60
82
  ```
61
83
 
62
- In another shell, confirm the gateway is live:
84
+ Stop if validation fails.
85
+
86
+ ### 4. Restart
63
87
 
64
88
  ```bash
65
- curl --fail http://127.0.0.1:PORT/health
89
+ openclaw gateway restart
66
90
  ```
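Steps 3 and 4 can be chained so the restart only happens after a clean validation. A sketch using only the `openclaw` commands shown above; it returns `False` when the CLI is not on `PATH`:

```python
import shutil
import subprocess

def validate_then_restart() -> bool:
    """Run `openclaw config validate --json`; restart the gateway only on success."""
    if shutil.which("openclaw") is None:
        return False  # openclaw CLI not installed in this environment
    validate = subprocess.run(["openclaw", "config", "validate", "--json"])
    if validate.returncode != 0:
        return False  # stop here, per step 3
    subprocess.run(["openclaw", "gateway", "restart"], check=True)
    return True

print(validate_then_restart())
```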
67
-
68
- ## Expected Success Signals
69
-
70
- - The gateway starts without plugin registration errors
71
- - The plugin log contains `Plugin registered successfully, all hooks active`
72
- - `GET /health` returns JSON containing `"ok": true`
73
-
74
- ## Do Not Do This
75
-
76
- - Do not use `openclaw plugins install -l` unless you are actively developing this repository
77
- - Do not leave `llm.baseURL`, `llm.apiKey`, or `llm.model` empty
78
- - Do not configure the plugin under the old id `reflection-plugin`
package/README.md CHANGED
@@ -1,14 +1,29 @@
1
1
  # OpenClaw Reflection
2
2
 
3
+ <p align="center">
4
+ <img src="./assets/openclaw-reflection-logo.png" alt="OpenClaw Reflection logo" width="180" />
5
+ </p>
6
+
7
+ <p align="center"><strong>Make OpenClaw's native memory system sharper without replacing it.</strong></p>
8
+
3
9
  ![OpenClaw Plugin](https://img.shields.io/badge/OpenClaw-Plugin-111111?style=flat-square)
4
10
  ![TypeScript](https://img.shields.io/badge/TypeScript-5.x-3178c6?style=flat-square)
5
- ![memoryGate 16/16](https://img.shields.io/badge/memoryGate-16%2F16%20passed-2ea043?style=flat-square)
6
- ![writer guardian 16/16](https://img.shields.io/badge/writer%20guardian-16%2F16%20passed-2ea043?style=flat-square)
11
+ ![memory_gate 18 cases](https://img.shields.io/badge/memory_gate-18%20benchmark%20cases-2ea043?style=flat-square)
12
+ ![write_guardian 14 cases](https://img.shields.io/badge/write_guardian-14%20benchmark%20cases-2ea043?style=flat-square)
7
13
 
8
- **Make OpenClaw's native memory system sharper without replacing it.**
14
+ Chinese version: [README.zh-CN.md](./README.zh-CN.md)
9
15
 
10
16
  OpenClaw Reflection is an additive layer on top of OpenClaw's built-in Markdown memory system. It captures message flow, keeps thread noise out of long-term memory, writes durable knowledge into the same human-readable memory files OpenClaw already uses, and periodically consolidates them so your agent gets sharper over time instead of messier.
11
17
 
18
+ ## Current Scope
19
+
20
+ Reflection currently supports:
21
+
22
+ - a single agent
23
+ - multiple sessions for that same agent
24
+
25
+ Reflection does not currently support multi-agent memory coordination or per-agent routing across multiple agents in one OpenClaw setup.
26
+
12
27
  ## Built On OpenClaw Memory
13
28
 
14
29
  OpenClaw memory is already workspace-native: the source of truth is Markdown files in the agent workspace, not a hidden database. In the official model, daily logs live under `memory/YYYY-MM-DD.md`, while `MEMORY.md` is the curated long-term layer.
@@ -19,17 +34,15 @@ Reflection builds on top of that system instead of replacing it.
19
34
  - It does **not** require replacing OpenClaw's default `memory-core`
20
35
  - It does **not** take over the active `plugins.slots.memory` role
21
36
  - It works by listening to message hooks and curating the same workspace memory files
37
+ - It analyzes and curates `USER.md`, `MEMORY.md`, `TOOLS.md`, `IDENTITY.md`, and `SOUL.md` based on conversation flow
22
38
 
23
39
  In practice, that means low migration risk and low conceptual overhead: you keep OpenClaw's native MEMORY workflow, and Reflection enhances the capture, filtering, routing, and consolidation steps around it.
24
40
 
25
41
  ## Why People Install It
26
42
 
27
- Most chat memory systems fail in one of two ways:
43
+ OpenClaw's core long-term files such as `USER.md`, `TOOLS.md`, `IDENTITY.md`, and `SOUL.md` are hard to improve continuously in the default setup.
28
44
 
29
- - they forget too much, so you keep re-explaining the same context
30
- - they remember too much, so temporary thread noise pollutes long-term memory
31
-
32
- Reflection is built to fix both.
45
+ Reflection is built to solve that:

33
46
 
34
47
  - Keep stable user preferences and collaboration habits
35
48
  - Preserve durable shared context across sessions
@@ -37,15 +50,20 @@ Reflection is built to fix both.
37
50
  - Refuse one-off tasks, active thread chatter, and misrouted writes
38
51
  - Periodically consolidate memory so it stays usable
39
52
 
53
+ ## Core Mechanism
54
+
55
+ Reflection uses LLM analysis over recent conversation context and adds two control points: `memory_gate` and `write_guardian`.
56
+
57
+ - `memory_gate` analyzes the conversation and decides which durable fact, if any, should be written and which target file it belongs to
58
+ - `write_guardian` acts as the write gate and follows OpenClaw's file responsibilities to decide whether a write should be accepted, rejected, or merged into the target file
59
+
40
60
  ## Install
41
61
 
42
62
  ### Recommended for users: install the plugin package
43
63
 
44
- OpenClaw can install plugins directly from a package source. That is the right distribution path for Reflection, because users should not need to clone the repository or run `pnpm install` just to use the plugin.
64
+ For an install flow that OpenClaw itself can follow, including which config questions to ask first, see [INSTALL.md](./INSTALL.md).
45
65
 
46
- For a step-by-step installation flow that OpenClaw can follow directly, see [INSTALL.md](./INSTALL.md).
47
-
48
- Registry install after publishing:
66
+ Install from the registry:
49
67
 
50
68
  ```bash
51
69
  openclaw plugins install <npm-spec>
@@ -61,25 +79,25 @@ openclaw plugins install @parkgogogo/openclaw-reflection
61
79
 
62
80
  Put the following under `plugins.entries.openclaw-reflection` in your OpenClaw config:
63
81
 
64
- ```json
82
+ ```jsonc
65
83
  {
66
- "enabled": true,
84
+ "enabled": true, // Enable the plugin entry
67
85
  "config": {
68
- "workspaceDir": "/absolute/path/to/your-agent-workspace",
69
- "bufferSize": 50,
70
- "logLevel": "info",
86
+ "workspaceDir": "/absolute/path/to/your-agent-workspace", // Workspace where MEMORY.md, USER.md, TOOLS.md, IDENTITY.md, and SOUL.md live
87
+ "bufferSize": 50, // Session buffer size used to collect recent messages
88
+ "logLevel": "info", // Runtime log verbosity: debug, info, warn, or error
71
89
  "llm": {
72
- "baseURL": "https://api.openai.com/v1",
73
- "apiKey": "YOUR_API_KEY",
74
- "model": "gpt-4.1-mini"
90
+ "baseURL": "https://openrouter.ai/api/v1", // OpenAI-compatible provider base URL
91
+ "apiKey": "YOUR_API_KEY", // Provider API key used for analysis and writing
92
+ "model": "x-ai/grok-4.1-fast" // Recommended model for plugin runtime
75
93
  },
76
94
  "memoryGate": {
77
- "enabled": true,
78
- "windowSize": 10
95
+ "enabled": true, // Enable durable-memory filtering before any write
96
+ "windowSize": 10 // Number of recent messages included in memory_gate analysis
79
97
  },
80
98
  "consolidation": {
81
- "enabled": true,
82
- "schedule": "0 2 * * *"
99
+ "enabled": false, // Keep disabled by default; enable only if you want scheduled cleanup
100
+ "schedule": "0 2 * * *" // Cron expression used when consolidation is enabled
83
101
  }
84
102
  }
85
103
  }
@@ -89,6 +107,13 @@ Put the following under `plugins.entries.openclaw-reflection` in your OpenClaw c
89
107
 
90
108
  Once the gateway restarts, Reflection will begin listening to `message_received` and `before_message_write`, then writing curated memory files into your configured `workspaceDir`.
91
109
 
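As a mental model for the `bufferSize` and `memoryGate.windowSize` settings above (not the plugin's actual implementation): the session buffer behaves like a bounded queue, and `memory_gate` analyzes only its tail:

```python
from collections import deque

BUFFER_SIZE = 50   # config.bufferSize
WINDOW_SIZE = 10   # config.memoryGate.windowSize

buffer = deque(maxlen=BUFFER_SIZE)      # oldest messages fall off automatically
for i in range(120):                    # simulate a long session
    buffer.append(f"msg-{i}")

window = list(buffer)[-WINDOW_SIZE:]    # slice seen by memory_gate analysis
print(len(buffer), window[0], window[-1])
```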
110
+ ### Observability command
111
+
112
+ - Reflection now writes an independent write_guardian audit log to:
113
+ - `<workspaceDir>/.openclaw-reflection/write-guardian.log.jsonl`
114
+ - Registered command: `/openclaw-reflection`
115
+ - Returns the most recent 10 write_guardian behaviors (written/refused/failed/skipped), including decision, target file, and reason.
116
+
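The audit log is JSON Lines, one record per write_guardian decision. The exact field names below are an assumption based on the description above (decision, target file, reason); a sketch of tallying the last 10 entries:

```python
import json
from collections import Counter

# Synthetic records in the described shape; the real file lives at
# <workspaceDir>/.openclaw-reflection/write-guardian.log.jsonl
sample = "\n".join([
    '{"decision": "written", "target": "USER.md", "reason": "durable preference"}',
    '{"decision": "refused", "target": "MEMORY.md", "reason": "thread noise"}',
    '{"decision": "skipped", "target": "TOOLS.md", "reason": "duplicate fact"}',
])

entries = [json.loads(line) for line in sample.splitlines() if line]
tally = Counter(entry["decision"] for entry in entries[-10:])
print(dict(tally))
```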
92
117
  ## What You Get
93
118
 
94
119
  | You want | Reflection gives you |
@@ -96,42 +121,30 @@ Once the gateway restarts, Reflection will begin listening to `message_received`
96
121
  | A memory system you can inspect | Plain Markdown files you can open, edit, diff, and version |
97
122
  | Better continuity across sessions | Durable facts routed into the right long-term file |
98
123
  | Less memory pollution | Gatekeeping that refuses temporary or misrouted content |
99
- | A system that stays usable over time | Scheduled consolidation for existing memory files |
100
-
101
- ## Why This Beats Naive Memory
102
-
103
- | Naive memory | Reflection |
104
- | -------------------------------- | ------------------------------------------------ |
105
- | Appends whatever seems memorable | Filters for durable signal before writing |
106
- | Hides memory in a black box | Stores memory in readable Markdown files |
107
- | Mixes all facts together | Routes facts into purpose-specific files |
108
- | Lets bad writes accumulate | Adds writer guarding and scheduled consolidation |
124
+ | A system that stays usable over time | Optional scheduled consolidation for existing memory files |
109
125
 
110
126
  ## How It Works
111
127
 
112
- ```mermaid
113
- flowchart LR
114
- A["Incoming conversation"] --> B["Session buffer"]
115
- B --> C["memoryGate"]
116
- C -->|durable fact| D["Writer guardian"]
117
- C -->|thread noise| E["No write"]
118
- D --> F["MEMORY.md / USER.md / SOUL.md / IDENTITY.md / TOOLS.md"]
119
- F --> G["Scheduled consolidation"]
120
- ```
128
+ ![OpenClaw Reflection flowchart](./assets/memory-flowchart.png)
121
129
 
122
130
  In practice, the pipeline is simple:
123
131
 
124
132
  1. Reflection captures conversation context from OpenClaw hooks.
125
- 2. `memoryGate` decides whether the candidate fact is durable enough to keep.
133
+ 2. `memory_gate` decides whether the candidate fact is durable enough to keep.
126
134
  3. A file-specific guardian either rewrites the target memory file or refuses the write.
127
- 4. Scheduled consolidation keeps `MEMORY.md`, `USER.md`, `SOUL.md`, and `TOOLS.md` compact over time.
135
+ 4. When enabled, scheduled consolidation keeps `MEMORY.md`, `USER.md`, `SOUL.md`, and `TOOLS.md` compact over time.
128
136
 
129
137
  ## Proof, Not Just Promises
130
138
 
131
- This repo already includes offline eval coverage for the two hardest parts of the system:
139
+ The active default offline benchmark currently includes:
140
+
141
+ - `memory_gate`: `18` benchmark cases
142
+ - `write_guardian`: `14` benchmark cases
132
143
 
133
- - [`memoryGate`: 16/16 passed on V2](./evals/results/2026-03-08-memory-gate-v2-16-of-16.md)
134
- - [`writer guardian`: 16/16 passed on V2](./evals/results/2026-03-08-writer-guardian-v2-16-of-16.md)
144
+ The most recent archived result snapshots in this repo are:
145
+
146
+ - [`memory_gate`: 16/16 passed on V2](./evals/results/2026-03-08-memory-gate-v2-16-of-16.md)
147
+ - [`write_guardian`: 16/16 passed on V2](./evals/results/2026-03-08-write-guardian-v2-16-of-16.md)
135
148
 
136
149
  These evals focus on the failure modes that make long-term memory systems unreliable:
137
150
 
@@ -163,32 +176,69 @@ These evals focus on the failure modes that make long-term memory systems unreli
163
176
  | `llm.model` | `gpt-4.1-mini` | Model used for analysis and consolidation |
164
177
  | `memoryGate.enabled` | `true` | Enable long-term memory filtering |
165
178
  | `memoryGate.windowSize` | `10` | Message window used during analysis |
166
- | `consolidation.enabled` | `true` | Enable scheduled consolidation |
179
+ | `consolidation.enabled` | `false` | Enable scheduled consolidation |
167
180
  | `consolidation.schedule` | `0 2 * * *` | Cron expression for consolidation |
168
181
 
169
182
  ## Built For
170
183
 
171
184
  - personal agents that should get better over weeks, not just one session
185
+ - single-agent OpenClaw setups with many sessions
172
186
  - teams that want memory with reviewability and version control
173
187
  - OpenClaw users who do not want a black-box memory store
174
188
  - agents that need stronger continuity without turning every chat into permanent history
175
189
 
176
190
  ## Development And Evals
177
191
 
192
+ Recommended model for real plugin use:
193
+
194
+ - `x-ai/grok-4.1-fast`
195
+
196
+ The development eval setup in this repository currently uses:
197
+
198
+ - eval model: `x-ai/grok-4.1-fast`
199
+ - judge model: `openai/gpt-5.4`
200
+
178
201
  ```bash
179
202
  pnpm run typecheck
180
203
  pnpm run eval:memory-gate
181
- pnpm run eval:writer-guardian
204
+ pnpm run eval:write-guardian
182
205
  pnpm run eval:all
206
+
207
+ node evals/run.mjs \
208
+ --suite memory-gate \
209
+ --models-config evals/models.json \
210
+ --baseline grok-fast \
211
+ --output evals/results/$(date +%F)-memory-gate-matrix.json \
212
+ --markdown-output evals/results/$(date +%F)-memory-gate-matrix.md
183
213
  ```
184
214
 
215
+ `evals/models.json` defines only the comparison matrix. The shared provider endpoint and key still come from `EVAL_BASE_URL` and `EVAL_API_KEY`. JSON output is the source of truth for automation and history, while the Markdown artifact is the readable leaderboard summary.
216
+
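The schema of `evals/models.json` is not shown in this README; purely as an illustration of "matrix only, no credentials", a file consistent with the `--baseline grok-fast` flag above might look like:

```jsonc
{
  // Hypothetical shape: ids are local labels, models are provider model names.
  "models": [
    { "id": "grok-fast", "model": "x-ai/grok-4.1-fast" },
    { "id": "qwen-flash", "model": "qwen/qwen3.5-flash-02-23" }
  ]
  // No baseURL or apiKey here: those come from EVAL_BASE_URL / EVAL_API_KEY.
}
```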
185
217
  More eval details: [evals/README.md](./evals/README.md)
186
218
 
187
- Fast packaged-plugin regression on a reused local OpenClaw profile:
219
+ ## Model Selection
188
220
 
189
- ```bash
190
- pnpm run e2e:openclaw-plugin
191
- ```
221
+ Benchmark date: `2026-03-09`
222
+ Scope: `memory_gate` only, `18` cases, shared OpenRouter-compatible `EVAL_*` route
223
+
224
+ | Model | Pass/Total | Accuracy | Errors (P/S/E) | Recommendation | Best For |
225
+ | --- | --- | --- | --- | --- | --- |
226
+ | `x-ai/grok-4.1-fast` | `17/18` | `94.4%` | `0/0/0` | Default baseline | Daily eval baseline |
227
+ | `qwen/qwen3.5-flash-02-23` | `17/18` | `94.4%` | `0/1/0` | Good backup option | Cost-sensitive cross-checks |
228
+ | `google/gemini-2.5-flash-lite` | `16/18` | `88.9%` | `0/0/0` | Fast iteration candidate | Cheap prompt iteration |
229
+ | `inception/mercury-2` | `11/18` | `61.1%` | `0/0/0` | Not recommended as default | Exploratory comparisons only |
230
+ | `minimax/minimax-m2.5` | `9/18` | `50.0%` | `0/0/0` | Not recommended as default | Occasional sanity checks only |
231
+ | `openai/gpt-4o-mini` | `4/18` | `22.2%` | `18/0/0` | Not recommended on current route | Avoid on current OpenRouter path |
232
+
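The accuracy column is just Pass/Total rounded to one decimal place; recomputing it from the table's figures:

```python
# Pass/Total pairs copied from the benchmark table above.
rows = {
    "x-ai/grok-4.1-fast": (17, 18),
    "qwen/qwen3.5-flash-02-23": (17, 18),
    "google/gemini-2.5-flash-lite": (16, 18),
    "inception/mercury-2": (11, 18),
    "minimax/minimax-m2.5": (9, 18),
    "openai/gpt-4o-mini": (4, 18),
}
for model, (passed, total) in rows.items():
    print(f"{model}: {passed / total:.1%}")
```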
233
+ How to choose:
234
+
235
+ - Default to `x-ai/grok-4.1-fast` because it had the best overall stability in this round with no internal errors.
236
+ - Use `qwen/qwen3.5-flash-02-23` as the strongest backup when you want similar accuracy but can tolerate one schema failure in this benchmark.
237
+ - Use `google/gemini-2.5-flash-lite` for cheaper, faster prompt iteration when slightly lower boundary accuracy is acceptable.
238
+ - Avoid `inception/mercury-2` and `minimax/minimax-m2.5` as defaults because they frequently collapse `SOUL`, `IDENTITY`, or `NO_WRITE` boundaries into the wrong bucket.
239
+ - Avoid `openai/gpt-4o-mini` on the current OpenRouter/Azure-backed route because all `18` cases surfaced provider-side structured-output errors.
240
+
241
+ Source artifact: [2026-03-09-memory-gate-openrouter-model-benchmark.md](./evals/results/2026-03-09-memory-gate-openrouter-model-benchmark.md)
192
242
 
193
243
  ## Links
194
244
 
@@ -0,0 +1,219 @@
1
+ # OpenClaw Reflection
2
+
3
+ <p align="center">
4
+ <img src="./assets/openclaw-reflection-logo.png" alt="OpenClaw Reflection logo" width="180" />
5
+ </p>
6
+
7
+ <p align="center"><strong>Make OpenClaw's Markdown memory cleaner, more stable, and more sustainable without replacing the native memory system.</strong></p>
8
+
9
+ English version: [README.md](./README.md)
10
+
11
+ ![OpenClaw Plugin](https://img.shields.io/badge/OpenClaw-Plugin-111111?style=flat-square)
12
+ ![TypeScript](https://img.shields.io/badge/TypeScript-5.x-3178c6?style=flat-square)
13
+ ![memory_gate 18 cases](https://img.shields.io/badge/memory_gate-18%20benchmark%20cases-2ea043?style=flat-square)
14
+ ![write_guardian 14 cases](https://img.shields.io/badge/write_guardian-14%20benchmark%20cases-2ea043?style=flat-square)
15
+
16
+ OpenClaw Reflection is an enhancement layer on top of OpenClaw's native Markdown memory. It listens to message flow, filters out thread noise, writes genuinely long-lived information back into OpenClaw's core memory files, and periodically consolidates those files so long-term use does not make them messier.
17
+
18
+ ## Current Scope
19
+
20
+ Reflection currently supports:
21
+
22
+ - a single agent
23
+ - multiple sessions for that same agent
24
+
25
+ It does not yet support memory coordination across multiple agents, or per-agent routing of long-term memory in a multi-agent OpenClaw setup.
26
+
27
+ ## Built On OpenClaw's Native Memory
28
+
29
+ OpenClaw's memory is already workspace-native: the source of truth is Markdown files in the agent workspace, not a hidden database. In the official model, daily records usually live under `memory/YYYY-MM-DD.md`, while `MEMORY.md` is the curated long-term layer.
30
+
31
+ Reflection is positioned as an enhancement, not a replacement:
32
+
33
+ - it introduces no new private memory store
34
+ - it does not require replacing OpenClaw's default `memory-core`
35
+ - it does not take over `plugins.slots.memory`
36
+ - it captures, filters, routes, and consolidates directly around the existing Markdown memory files
37
+ - based on the conversation, it analyzes and curates `USER.md`, `MEMORY.md`, `TOOLS.md`, `IDENTITY.md`, and `SOUL.md`
38
+
39
+ This means low migration cost, low conceptual overhead, and files that are easy to review by hand and keep under version control.
40
+
41
+ ## Why Install It
42
+
43
+ In OpenClaw's default setup, the core `USER.md`, `TOOLS.md`, `IDENTITY.md`, and `SOUL.md` files are hard to improve iteratively on their own.
44
+
45
+ Reflection is built to solve exactly that:
46
+
47
+ - keep stable user preferences and collaboration habits
48
+ - preserve long-term context that stays valuable across sessions
49
+ - split long-term memory across `MEMORY.md`, `USER.md`, `SOUL.md`, `IDENTITY.md`, and `TOOLS.md`
50
+ - refuse one-off tasks, short-lived thread chatter, and misrouted content
51
+ - periodically consolidate long-term memory to keep files from bloating and drifting
52
+
53
+ ## Core Mechanism
54
+
55
+ Reflection uses LLM analysis over recent conversation and adds two tools: `memory_gate` and `write_guardian`.
56
+
57
+ - `memory_gate` analyzes the conversation to decide which facts should be recorded and which file each belongs to
58
+
59
+ - `write_guardian` acts as the write gate: following OpenClaw's official guidance, it decides whether a write should happen and consolidates the facts into the target file
60
+
61
+ ## Install
62
+
63
+ ### Recommended: install the packaged plugin
64
+
65
+ For a more detailed install guide, see [INSTALL.md](./INSTALL.md). That file is now written as an install skill for OpenClaw itself to execute, including which config questions to ask the operator before installing.
66
+
67
+ Manual install:
68
+
69
+ ```bash
70
+ openclaw plugins install @parkgogogo/openclaw-reflection
71
+ ```
72
+
73
+ ### Add The Plugin Config
74
+
75
+ Put the following under `plugins.entries.openclaw-reflection` in your OpenClaw profile:
76
+
77
+ ```jsonc
78
+ {
79
+ "enabled": true, // Enable this plugin entry
80
+ "config": {
81
+ "workspaceDir": "/absolute/path/to/your-agent-workspace", // Agent workspace directory that holds the long-term memory files
82
+ "bufferSize": 50, // Session buffer size used to retain recent message context
83
+ "logLevel": "info", // Runtime log level: debug, info, warn, or error
84
+ "llm": {
85
+ "baseURL": "https://openrouter.ai/api/v1", // OpenAI-compatible provider base URL
86
+ "apiKey": "YOUR_API_KEY", // Provider API key used for analysis and write decisions
87
+ "model": "x-ai/grok-4.1-fast" // Recommended model for the plugin runtime
88
+ },
89
+ "memoryGate": {
90
+ "enabled": true, // Enable filtering before long-term memory writes
91
+ "windowSize": 10 // Recent-message window used by memory_gate analysis
92
+ },
93
+ "consolidation": {
94
+ "enabled": false, // Disabled by default; enable only if you want scheduled consolidation
95
+ "schedule": "0 2 * * *" // Cron expression used once consolidation is enabled
96
+ }
97
+ }
98
+ }
99
+ ```
100
+
101
+ ### Restart The OpenClaw Gateway
102
+
103
+ Once the gateway restarts, Reflection starts listening to `message_received` and `before_message_write`, and writes curated long-term information into your configured `workspaceDir`.
104
+
105
+ ### Observability Command
106
+
107
+ - Reflection now writes a separate audit log for write_guardian:
108
+ - `<workspaceDir>/.openclaw-reflection/write-guardian.log.jsonl`
109
+ - Registered command: `/openclaw-reflection`
110
+ - Returns the most recent 10 write_guardian behaviors (written/refused/failed/skipped), including the decision, target file, and reason.
111
+
112
+ ## What You Get
113
+
114
+ | What you want | What Reflection gives you |
115
+ | ------------------------ | ---------------------------------------------- |
116
+ | A memory system you can inspect and edit | Plain Markdown files you can open, diff, and version |
117
+ | More stable cross-session continuity | Long-term facts routed into the right file |
118
+ | Less memory pollution | Filtering of temporary thread content and misrouted writes |
119
+ | Still maintainable after long-term use | Optional scheduled consolidation that keeps files from degrading |
120
+
121
+ ## How It Works
122
+
123
+ ![OpenClaw Reflection flowchart](./assets/memory-flowchart.png)
124
+
125
+ The pipeline is straightforward:
126
+
127
+ 1. Reflection captures conversation context from OpenClaw hooks.
128
+ 2. `memory_gate` judges whether a candidate fact is durable and stable enough to keep.
129
+ 3. A file-specific `write_guardian` decides whether to write to the target file, rewriting its content when needed.
130
+ 4. When enabled, `consolidation` periodically tidies the long-term files to control redundancy and stale information.
131
+
132
+ ## Eval Coverage
133
+
134
+ We maintain a small, manually verified dataset and use x-ai/grok-4.1-fast to optimize the prompts and refine `memory_gate` and `write_guardian`.
135
+
136
+ The current default offline benchmark includes:
137
+
138
+ - `memory_gate`: `18` benchmark cases
139
+ - `write_guardian`: `14` benchmark cases
140
+
141
+ The most recent archived result snapshots in this repo are:
142
+
143
+ - [`memory_gate`: 16/16 passed on V2](./evals/results/2026-03-08-memory-gate-v2-16-of-16.md)
144
+ - [`write_guardian`: 16/16 passed on V2](./evals/results/2026-03-08-write-guardian-v2-16-of-16.md)
145
+
146
+ These evals focus on:
147
+
148
+ - refusing current-thread noise
149
+ - preventing user facts from being written to the wrong file
150
+ - preserving `SOUL` continuity rules
151
+ - correctly replacing outdated `IDENTITY` metadata
152
+ - keeping `TOOLS.md` limited to local tool mappings instead of treating it as a tool registry
153
+
154
+ ## Long-Term Memory Files
155
+
156
+ | File | Purpose |
157
+ | ------------- | ---------------------------------------------- |
158
+ | `MEMORY.md` | Durable shared context, key conclusions, long-term background facts |
159
+ | `USER.md` | Stable user preferences, collaboration style, personal background that stays helpful |
160
+ | `SOUL.md` | Assistant principles, boundaries, continuity rules |
161
+ | `IDENTITY.md` | Explicit identity metadata such as name, temperament, appearance description |
162
+ | `TOOLS.md` | Environment-specific tool aliases, endpoints, device names, local tool mappings |
163
+
164
+ ## Development And Eval Commands
165
+
166
+ Recommended model for real plugin use:
167
+
168
+ - `x-ai/grok-4.1-fast`
169
+
170
+ The development eval setup in this repository currently uses:
171
+
172
+ - eval model: `x-ai/grok-4.1-fast`
173
+ - judge model: `openai/gpt-5.4`
174
+
175
+ ```bash
176
+ pnpm run typecheck
177
+ pnpm run eval:memory-gate
178
+ pnpm run eval:write-guardian
179
+ pnpm run eval:all
180
+
181
+ node evals/run.mjs \
182
+ --suite memory-gate \
183
+ --models-config evals/models.json \
184
+ --baseline grok-fast \
185
+ --output evals/results/$(date +%F)-memory-gate-matrix.json \
186
+ --markdown-output evals/results/$(date +%F)-memory-gate-matrix.md
187
+ ```
188
+
189
+ `evals/models.json` only defines the multi-model comparison matrix; the shared provider endpoint and key still come from `EVAL_BASE_URL` and `EVAL_API_KEY`. The JSON output is the baseline for automation and history tracking, while the Markdown output is the human-readable leaderboard summary.
190
+
191
+ More eval details: [evals/README.md](./evals/README.md)
192
+
193
+ ## Model Selection
194
+
195
+ Benchmark date: `2026-03-09`
196
+ Scope: `memory_gate` only, `18` cases, shared OpenRouter-compatible `EVAL_*` route
197
+
198
+ | Model | Pass/Total | Accuracy | Errors (P/S/E) | Recommendation | Best For |
199
+ | --- | --- | --- | --- | --- | --- |
200
+ | `x-ai/grok-4.1-fast` | `17/18` | `94.4%` | `0/0/0` | Default baseline | Daily eval baseline |
201
+ | `qwen/qwen3.5-flash-02-23` | `17/18` | `94.4%` | `0/1/0` | Strong backup | Cost-sensitive cross-checks |
202
+ | `google/gemini-2.5-flash-lite` | `16/18` | `88.9%` | `0/0/0` | Cheap, fast candidate | Low-cost prompt iteration |
203
+ | `inception/mercury-2` | `11/18` | `61.1%` | `0/0/0` | Not recommended as default | Exploratory comparisons only |
204
+ | `minimax/minimax-m2.5` | `9/18` | `50.0%` | `0/0/0` | Not recommended as default | Occasional sanity checks only |
205
+ | `openai/gpt-4o-mini` | `4/18` | `22.2%` | `18/0/0` | Not recommended on current route | Avoid on current OpenRouter path |
206
+
207
+ How to choose:
208
+
209
+ - Default to `x-ai/grok-4.1-fast`: it had the best overall stability in this round, with no internal errors.
210
+ - For similar accuracy where one schema failure is acceptable, `qwen/qwen3.5-flash-02-23` is the strongest backup.
211
+ - If low cost and fast iteration matter more, use `google/gemini-2.5-flash-lite`, accepting that it is slightly weaker on some `TOOLS` boundaries.
212
+ - Do not use `inception/mercury-2` or `minimax/minimax-m2.5` as default baselines; they often classify `SOUL`, `IDENTITY`, or `NO_WRITE` into the wrong bucket.
213
+ - Do not pick `openai/gpt-4o-mini` on the current OpenRouter/Azure route: all `18` cases hit provider-side structured-output errors.
214
+
215
+ Source results: [2026-03-09-memory-gate-openrouter-model-benchmark.md](./evals/results/2026-03-09-memory-gate-openrouter-model-benchmark.md)
216
+
217
+ ## Links
218
+
219
+ - OpenClaw plugin docs: [docs.openclaw.ai/tools/plugin](https://docs.openclaw.ai/tools/plugin)
Binary file
@@ -54,7 +54,7 @@
54
54
  "properties": {
55
55
  "enabled": {
56
56
  "type": "boolean",
57
- "default": true
57
+ "default": false
58
58
  },
59
59
  "schedule": {
60
60
  "type": "string",
package/package.json CHANGED
@@ -1,13 +1,15 @@
1
1
  {
2
2
  "name": "@parkgogogo/openclaw-reflection",
3
- "version": "0.1.0",
3
+ "version": "0.1.3",
4
4
  "description": "OpenClaw plugin that enhances native Markdown memory with filtering, curation, and consolidation",
5
5
  "type": "module",
6
6
  "main": "src/index.ts",
7
7
  "files": [
8
+ "assets/",
8
9
  "src/",
9
10
  "openclaw.plugin.json",
10
11
  "README.md",
12
+ "README.zh-CN.md",
11
13
  "INSTALL.md"
12
14
  ],
13
15
  "repository": {
@@ -19,12 +21,13 @@
19
21
  "url": "https://github.com/parkgogogo/openclaw-reflection/issues"
20
22
  },
21
23
  "scripts": {
22
- "build": "tsc --noEmit",
24
+ "build": "tsc -p tsconfig.json",
23
25
  "clean": "rm -rf logs",
26
+ "test": "pnpm run build && node --test tests/*.test.mjs",
24
27
  "typecheck": "tsc --noEmit",
25
28
  "e2e:openclaw-plugin": "bash scripts/e2e-openclaw-plugin.sh",
26
29
  "eval:memory-gate": "pnpm exec tsc && node evals/run.mjs --suite memory-gate",
27
- "eval:writer-guardian": "pnpm exec tsc && node evals/run.mjs --suite writer-guardian",
30
+ "eval:write-guardian": "pnpm exec tsc && node evals/run.mjs --suite write-guardian",
28
31
  "eval:all": "pnpm exec tsc && node evals/run.mjs --suite all"
29
32
  },
30
33
  "keywords": [
package/src/config.ts CHANGED
@@ -21,7 +21,7 @@ const DEFAULT_CONFIG: PluginConfig = {
21
21
  windowSize: 10,
22
22
  },
23
23
  consolidation: {
24
- enabled: true,
24
+ enabled: false,
25
25
  schedule: "0 2 * * *",
26
26
  },
27
27
  };