clawmem 0.8.2 → 0.8.4
- package/CLAUDE.md +10 -11
- package/README.md +7 -78
- package/SKILL.md +6 -3
- package/package.json +1 -1
- package/src/clawmem.ts +89 -17
- package/src/entity.ts +67 -7
- package/src/store.ts +8 -0
- /package/src/openclaw/{plugin.json → openclaw.plugin.json} +0 -0
package/CLAUDE.md
CHANGED
@@ -206,18 +206,17 @@ systemctl --user status clawmem-watcher.service clawmem-embed.timer
 
 When using ClawMem with OpenClaw, choose one of two deployment options:
 
-
+**Active Memory coexistence:** ClawMem is fully compatible with OpenClaw's Active Memory plugin (v2026.4.10+). They search different backends (ClawMem vault vs dreaming/wiki) and inject into different prompt regions (user prompt vs system prompt). Both can run simultaneously — no configuration needed.
+
+**OpenClaw v2026.4.10+ recommended:** Fixes a config normalization bug where `plugins.slots.contextEngine` was silently dropped (#64192).
 
-
+### Option 1: ClawMem Exclusive (Recommended)
 
-
-- No context window waste (avoids 10-15% duplicate injection)
-- Prevents OpenClaw native memory auto-initialization on updates
-- All memory in ClawMem's hybrid search + graph traversal system
+ClawMem handles 100% of structured memory. Disable native memory search (not Active Memory — that's separate and compatible):
 
 **Configuration:**
 ```bash
-# Disable OpenClaw's native memory
+# Disable OpenClaw's native memory search
 openclaw config set agents.defaults.memorySearch.extraPaths "[]"
 
 # Verify
@@ -235,7 +234,7 @@ ls ~/.openclaw/agents/main/memory/
 
 ### Option 2: Hybrid (ClawMem + Native)
 
-Run both ClawMem and OpenClaw's native memory for redundancy.
+Run both ClawMem and OpenClaw's native memory search for redundancy.
 
 **Configuration:**
 ```bash
@@ -243,9 +242,9 @@ openclaw config set agents.defaults.memorySearch.extraPaths '["~/documents", "~/
 ```
 
 **Tradeoffs:**
--
--
--
+- Redundant recall from two independent systems
+- 10-15% context window waste from duplicate facts
+- Two memory indices to maintain
 
 **Recommendation:** Use Option 1 unless you have a specific need for redundant memory systems.
 
package/README.md
CHANGED
@@ -47,82 +47,7 @@ ClawMem turns your markdown notes, project docs, and research dumps into persist
 
 Runs fully local with no API keys and no cloud services. Integrates via Claude Code hooks and MCP tools, as an OpenClaw ContextEngine plugin, or as a Hermes Agent MemoryProvider plugin. All modes share the same vault for cross-runtime memory. Works with any MCP-compatible client.
 
-
-
-- **Entity resolution + co-occurrence graph** — LLM entity extraction with quality filters, type-agnostic canonical resolution within [compatibility buckets](docs/internals/entity-resolution.md) (extensible type vocabulary), IDF-based entity edge scoring, co-occurrence tracking, entity graph traversal for ENTITY intent queries
-- **MPFP graph retrieval** — Multi-Path Fact Propagation with meta-path patterns per intent, hop-synchronized edge cache, Forward Push with α=0.15 teleport probability. Replaces single-beam traversal for causal/entity/temporal queries.
-- **Temporal query extraction** — regex-based date range extraction from natural language queries ("last week", "March 2026"), wired as WHERE filters into BM25 and vector search
-- **4-way parallel retrieval** — temporal proximity and entity graph channels added as parallel RRF legs in `query` tool (Tier 3 only), alongside existing BM25 + vector channels
-- **3-tier consolidation** — facts to observations (auto-generated, with proof_count and trend enum) to mental models. Background worker synthesizes clusters of related observations into consolidated patterns.
-- **Observation invalidation** — soft invalidation (invalidated_at/invalidated_by/superseded_by columns). Observations with confidence ≤ 0.2 after contradiction are filtered from search results.
-- **Memory nudge** — periodic ephemeral `<vault-nudge>` injection prompting lifecycle tool use after N turns of inactivity. Configurable via `CLAWMEM_NUDGE_INTERVAL`.
-
-### v0.7.1 Safety Release
-
-Five independent safety gates around the consolidation pipeline and context surfacing, aimed at preventing contamination, cross-entity merges, and unchecked contradictions from landing in the vault. Every extraction ships with full unit + integration test coverage (+158 tests on top of the v0.7.0 baseline). See [consolidation safety](docs/concepts/architecture.md#consolidation-safety-v071) for the architectural walkthrough.
-
-- **Taxonomy cleanup** — standardized on the A-MEM `contradicts` (plural) convention across the entire codebase, eliminating silent query misses on the legacy singular form
-- **Name-aware merge safety** — the Phase 2 consolidation worker gate extracts entity anchors (via `entity_mentions`, with lexical proper-noun fallback) and runs dual-threshold normalized 3-gram cosine similarity before merging similar observations. Cross-entity merges are hard-rejected when anchor sets differ materially, preventing context bleed where "Alice decided X" merges into "Bob decided X". Thresholds are env-overridable (`CLAWMEM_MERGE_SCORE_NORMAL`=0.93, `_STRICT`=0.98). Dry-run mode via `CLAWMEM_MERGE_GUARD_DRY_RUN` for calibration.
-- **Contradiction-aware merge gate** — after the name-aware gate passes, a deterministic heuristic (negation asymmetry, number/date mismatch) plus an LLM check detect contradictory merges. Blocked merges route to `link` policy (insert new row + `contradicts` edge, default) or `supersede` policy (mark old row `status='inactive'`). Configurable via `CLAWMEM_CONTRADICTION_POLICY` and `CLAWMEM_CONTRADICTION_MIN_CONFIDENCE`. Phase 3 deductive synthesis applies the same gate to deductive dedupe matches.
-- **Anti-contamination deductive synthesis** — every Phase 3 draft runs through a three-layer validator: deterministic pre-checks (empty conclusion, invalid source_indices, pool-only entity contamination via `entity_mentions`) + LLM validator (fail-open with `validatorFallbackAccepts` counter) + dedupe. Per-reason rejection stats exposed via `DeductiveSynthesisStats` so Phase 3 yield can be diagnosed without enabling extra logging.
-- **Context instruction + relationship snippets** — `context-surfacing` now always prepends an `<instruction>` block framing the surfaced facts as background knowledge the model already holds, and appends an optional `<relationships>` block listing memory-graph edges where BOTH endpoints are in the surfaced doc set. The relationships block is the first thing dropped when the payload would overflow `CLAWMEM_PROFILE`'s token budget, preserving facts-first behaviour while giving the model graph-level reasoning hooks directly in-prompt.
-
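The removed v0.7.1 notes describe a merge gate built on normalized 3-gram cosine similarity with two thresholds. The sketch below illustrates that idea only. It assumes character trigrams over lowercased, whitespace-collapsed text, and the names `trigramCosine` and `passesMergeGate` are hypothetical. Only the default values 0.93 and 0.98 come from the notes; how ClawMem chooses between the normal and strict threshold is not shown in this diff, so a plain `strict` flag stands in for it.

```typescript
// Hypothetical sketch: character-trigram cosine similarity with two thresholds.
// Only the default values (0.93 normal, 0.98 strict) come from the release notes.
const MERGE_SCORE_NORMAL = 0.93;
const MERGE_SCORE_STRICT = 0.98;

function trigramCounts(text: string): Record<string, number> {
  // Normalize: lowercase and collapse runs of whitespace.
  const s = text.toLowerCase().replace(/\s+/g, " ").trim();
  const counts: Record<string, number> = {};
  for (let i = 0; i + 3 <= s.length; i++) {
    const gram = s.slice(i, i + 3);
    counts[gram] = (counts[gram] ?? 0) + 1;
  }
  return counts;
}

function vectorNorm(counts: Record<string, number>): number {
  let sum = 0;
  for (const gram of Object.keys(counts)) {
    const n = counts[gram] ?? 0;
    sum += n * n;
  }
  return Math.sqrt(sum);
}

function trigramCosine(a: string, b: string): number {
  const ca = trigramCounts(a);
  const cb = trigramCounts(b);
  let dot = 0;
  for (const gram of Object.keys(ca)) {
    dot += (ca[gram] ?? 0) * (cb[gram] ?? 0);
  }
  const denom = vectorNorm(ca) * vectorNorm(cb);
  // Texts shorter than three characters have no trigrams; treat as dissimilar.
  return denom === 0 ? 0 : dot / denom;
}

function passesMergeGate(a: string, b: string, strict: boolean): boolean {
  const threshold = strict ? MERGE_SCORE_STRICT : MERGE_SCORE_NORMAL;
  return trigramCosine(a, b) >= threshold;
}
```

Under these assumptions, "Alice decided X" and "Bob decided X" share only the trigrams of " decided x" and score well below 0.93, so the similarity check alone already refuses the merge. The notes describe the entity-anchor comparison as a separate hard rejection, which this sketch does not model.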
-### v0.7.2 Post-Import Conversation Synthesis
-
-Opt-in LLM pass that runs **after** `clawmem mine` finishes indexing an imported collection. Operates on the freshly imported `content_type='conversation'` documents and extracts structured knowledge facts (decisions / preferences / milestones / problems) plus cross-fact relations, writing each fact as a first-class searchable document alongside the raw conversation exchanges. See [post-import synthesis](docs/concepts/architecture.md#post-import-conversation-synthesis-v072) for the architectural walkthrough.
-
-- **New CLI flag** — `clawmem mine <dir> --synthesize [--synthesis-max-docs N]`. Off by default. When omitted, existing mine behaviour is byte-identical to v0.7.1.
-- **Two-pass pipeline** — Pass 1 extracts facts per conversation via the existing LLM, saves each via dedup-aware `saveMemory`, and populates a local alias map. Pass 2 resolves cross-fact links against the local map first, falling back to collection-scoped SQL lookup. Forward references (link to a fact extracted later in the same run) are resolved correctly.
-- **Idempotent reruns** — synthesized fact paths are a pure function of `(sourceDocId, slug(title), short sha256(normalizedTitle))`, so reruns over the same conversation batch hit the `saveMemory` update branch instead of creating parallel rows. Same-slug collisions are disambiguated by the stable hash suffix, not encounter order.
-- **Fail-closed link resolution** — when two different facts claim the same normalized title or alias, the resolver treats the link as ambiguous and counts it unresolved. Pre-existing docs with duplicate titles in the collection do not silently bind either.
-- **Weight-monotonic relation upsert** — `memory_relations` insert uses `ON CONFLICT DO UPDATE SET weight = MAX(weight, excluded.weight)`, which is idempotent on equal-weight reruns but still accepts stronger later evidence without double-counting.
-- **Non-fatal failure model** — any LLM failure, JSON parse error, saveMemory collision, or relation insert error is counted and logged, never re-thrown. Synthesis failure after `indexCollection` commits does not roll back the mine import.
-- **Split operator counters** — `llmFailures` counts actual LLM path failures (null, thrown, non-array JSON), while `docsWithNoFacts` counts docs where the LLM responded validly but returned zero structured facts. Previously these were conflated as `nullCalls`.
-
-Adds +63 tests (46 unit + 5 integration + 12 regression) on top of the v0.7.1 baseline.
-
-### v0.8.0 Quiet-Window Heavy Maintenance Lane
-
-A second, longer-interval consolidation worker that keeps Phase 2 + Phase 3 running on large vaults without starving interactive sessions. Off by default — set `CLAWMEM_HEAVY_LANE=true` to enable. The existing 5-minute light-lane worker is unchanged. See [heavy maintenance lane](docs/concepts/architecture.md#heavy-maintenance-lane-v080) for the architectural walkthrough.
-
-- **Quiet-window gating** — the heavy lane only runs inside the hours set by `CLAWMEM_HEAVY_LANE_WINDOW_START` / `CLAWMEM_HEAVY_LANE_WINDOW_END` (0-23). Supports midnight wraparound (e.g., 22→6). Null on either bound means "always in window".
-- **Query-rate gating via `context_usage`** — counts hook injections in the last 10 minutes and skips the tick when the rate exceeds `CLAWMEM_HEAVY_LANE_MAX_USAGES` (default 30). No new `query_activity` table; reuses v0.7.0 telemetry.
-- **DB-backed worker leases** — exclusivity enforced via a new `worker_leases` table with atomic `INSERT ... ON CONFLICT DO UPDATE ... WHERE expires_at <= ?` acquisition, random 16-byte fencing tokens, and TTL reclaim. Safe under multi-process contention; any SQLite error translates to a `lease_unavailable` skip rather than a thrown exception.
-- **Stale-first selection** — Phase 2 and Phase 3 reorder their candidate sets by `COALESCE(recall_stats.last_recalled_at, documents.last_accessed_at, documents.modified_at) ASC` so long-unseen docs bubble up first. Empty `recall_stats` falls through to access-time without erroring.
-- **Optional surprisal selector** — `CLAWMEM_HEAVY_LANE_SURPRISAL=true` plumbs k-NN anomaly-ranked doc ids (via the existing `computeSurprisalScores`) into Phase 2 as an explicit `candidateIds` filter. Degrades to stale-first on vaults without embeddings and logs `selector: 'surprisal-fallback-stale'` in the journal.
-- **`maintenance_runs` journal** — every scheduled attempt writes a row: `status` (`started`/`completed`/`failed`/`skipped`), `reason` for skips, selected/processed/created/null_call counts, and a `metrics_json` payload with selector type and full `DeductiveSynthesisStats` breakdown. Operators can reconstruct any lane decision without reading worker logs.
-- **Force-enforce merge gate** — the heavy lane passes `guarded: true` to `consolidateObservations`, which overrides `CLAWMEM_MERGE_GUARD_DRY_RUN` inside `findSimilarConsolidation` so experimenting operators cannot weaken heavy-lane enforcement via env flag.
-
-Adds +56 tests (13 worker-lease + 35 maintenance unit + 8 maintenance integration) on top of the v0.7.2 baseline.
-
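The quiet-window rule in the removed v0.8.0 notes (hours 0-23, midnight wraparound such as 22→6, a null bound meaning "always in window") can be sketched as a small predicate. This is an illustration, not ClawMem's code: the function name `inQuietWindow` is hypothetical, and it assumes the start hour is inclusive and the end hour exclusive, which the notes do not specify.

```typescript
// Hypothetical helper: is `hour` (0-23) inside the configured quiet window?
// Assumes an inclusive start and an exclusive end; the release notes do not
// say which convention ClawMem actually uses.
function inQuietWindow(hour: number, start: number | null, end: number | null): boolean {
  // A null bound means "always in window", per the release notes.
  if (start === null || end === null) return true;
  // Same-day window, e.g. 9 -> 17 covers 9 through 16.
  if (start <= end) return hour >= start && hour < end;
  // Midnight wraparound, e.g. 22 -> 6 covers 22, 23, 0, ..., 5.
  return hour >= start || hour < end;
}
```

With these conventions a window whose start equals its end is empty; an implementation that wants "all day" for that case would need an extra branch.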
-### v0.8.1 Multi-Turn Prior-Query Lookback
-
-`context-surfacing` now builds its retrieval query from the current prompt plus up to two recent same-session prior prompts, so a short follow-up turn ("do the same for X", "explain the rationale") can still inherit the vocabulary of earlier turns. The raw prompt is persisted in a new nullable `context_usage.query_text` column so future hook ticks can reconstitute the multi-turn query from the DB. See [multi-turn lookback](docs/concepts/architecture.md#multi-turn-prior-query-lookback-v081) for the full walkthrough.
-
-- **Additive schema migration** — new nullable `query_text TEXT` column on `context_usage`, guarded by `PRAGMA table_info`. Pre-v0.8.1 stores get the column added on first open; ad-hoc stores that skip the migration path degrade transparently via a feature-detect WeakMap so `insertUsageFn` never writes a column that doesn't exist.
-- **Discovery path only** — the multi-turn query feeds vector search, BM25, and query expansion. Cross-encoder reranking continues to use the RAW current prompt so relevance scoring is not diluted by older turns, and composite scoring / snippet extraction / dedupe / routing-hint detection all remain on the raw prompt as well.
-- **Privacy-conscious persistence split** — gated skip paths (slash commands, `MIN_PROMPT_LENGTH`, `shouldSkipRetrieval`, heartbeat dedupe) do NOT persist their raw text because those turns are not meaningful user questions and carry a higher sensitivity profile. Post-retrieval empty paths (empty result set, threshold blocked, budget blocked) DO persist so a follow-up turn can still inherit the intent even when the current turn surfaced nothing.
-- **Current-first truncation** — the combined query is clamped to 2000 chars with the current prompt preserved verbatim at the head. Older priors are dropped first when the budget runs out. If the current prompt alone already exceeds the cap, priors are omitted entirely and the current prompt is truncated.
-- **SQL-level self-match guard** — duplicate submits of the same prompt are filtered out of the lookback SELECT via `AND query_text != ?` so a retry burst cannot eat into the 2-prior budget and leave the lookback window underfilled.
-- **10-minute max age, session-scoped** — priors older than 10 minutes or from a different `session_id` are invisible to the lookback. All fallback paths (missing column, DB error, no matching rows) return the current prompt unchanged — the hook never throws on lookback failures.
-
-Adds +27 tests (22 unit + 5 integration) on top of the v0.8.0 baseline.
-
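The current-first truncation rule from the removed v0.8.1 notes can be sketched in a few lines. The 2000-character cap and the ordering (current prompt at the head, older priors dropped first) come from the notes. The function name `buildLookbackQuery`, the newline separator, and the choice to keep only whole priors rather than cutting one partway are assumptions made for illustration.

```typescript
// Hypothetical sketch of current-first truncation. The 2000-char cap and the
// "drop older priors first" rule come from the release notes; the newline
// separator and whole-prior granularity are assumptions.
const MAX_QUERY_CHARS = 2000;

function buildLookbackQuery(
  current: string,
  priorsNewestFirst: string[],
  cap: number = MAX_QUERY_CHARS,
): string {
  // If the current prompt alone meets or exceeds the cap, omit priors and truncate it.
  if (current.length >= cap) return current.slice(0, cap);
  let query = current;
  for (const prior of priorsNewestFirst) {
    const candidate = query + "\n" + prior;
    // Stop at the first prior that does not fit, so older ones are dropped first.
    if (candidate.length > cap) break;
    query = candidate;
  }
  return query;
}
```

Called as `buildLookbackQuery(prompt, [mostRecentPrior, olderPrior])`, the result always starts with the current prompt, and the older prior is the first thing to go when the budget is tight.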
-### v0.8.2 Dual-Host Worker Architecture
-
-Both maintenance lanes can now be hosted by the long-lived `clawmem watch` watcher service in addition to the existing per-session `clawmem mcp` host. This makes the systemd-managed watcher the canonical 24/7 home for the v0.8.0 heavy maintenance lane — its quiet-window logic finally sees a live worker at the configured hours regardless of whether any Claude Code session is open. The light consolidation lane (Phase 1 backfill + Phase 2 merge + Phase 3 deductive synthesis + Phase 4 recall stats) now also acquires its own DB-backed `worker_leases` row before each tick, symmetric with the heavy lane's existing exclusivity, so multiple host processes against the same vault cannot race on Phase 2 merges or Phase 3 deductive writes.
-
-- **Light-lane worker lease** — `runConsolidationTick` wraps every tick (Phase 1 → 4) in `withWorkerLease` against a new `light-consolidation` worker name with a 10-minute TTL. Two host processes (e.g. one watcher service + one per-session stdio MCP) cannot both consolidate the same near-duplicate observations or both INSERT a duplicate row into `consolidated_observations`. Phase 1 enrichment is also serialized — overkill for cost but cleaner for symmetry. The in-process `isRunning` reentrancy guard remains the cheap first defense before the SQLite lease round-trip.
-- **`cmdWatch` hosts both workers** — `clawmem watch` honors the same `CLAWMEM_ENABLE_CONSOLIDATION` and `CLAWMEM_HEAVY_LANE` env-var gates as `cmdMcp`. Off by default. Mirror the existing systemd unit (or your wrapper `.env`) to opt in. The recommended deployment for v0.8.2+ is to set both env vars on `clawmem-watcher.service` and leave `cmdMcp` unset, so the heavy lane has a continuously available host independent of Claude Code session lifecycle.
-- **`cmdMcp` is now a fallback host with a heavy-lane warning** — `cmdMcp` retains the same env-var gates so non-watcher deployments (e.g. macOS users running everything via Claude Code launchd) keep working unchanged. When `CLAWMEM_HEAVY_LANE=true` is set on a stdio MCP host, `cmdMcp` emits a one-line warning to stderr advising operators to move heavy-lane hosting to `clawmem watch` instead.
-- **Async drain on shutdown** — both worker stop helpers (`stopConsolidationWorker` and the closure returned by `startHeavyMaintenanceWorker`) are now `async`, clearing their `setInterval` AND polling their in-flight running flag until any mid-tick worker drains. This guarantees the worker's `withWorkerLease` finally block runs against a still-open store, so the lease is released cleanly instead of leaking until TTL expiry. Bounded waits (15s light, 30s heavy) prevent a stuck tick from wedging shutdown indefinitely; the next process reclaims any stranded lease atomically.
-- **Signal handlers registered before worker startup** — both `cmdWatch` and `cmdMcp` now register their `SIGINT`/`SIGTERM` handlers BEFORE any worker initialization. A signal arriving in the brief window between worker startup and handler registration would otherwise terminate the host via the default signal action (exit 143) and skip the async drain entirely.
-- **Subprocess smoke test** — new `tests/integration/cmdwatch-workers.integration.test.ts` spawns `bun src/clawmem.ts watch` against a temp vault with short worker intervals, exercises the env-var gates, exercises a real heavy-lane tick (slow path, ~35s), and asserts the lease is released cleanly on `SIGTERM`.
-- **Bug fix: removed dead skill-vault watcher block from `clawmem.ts cmdWatch()`** — a try/catch wrapped block had been silently destructuring `getSkillContentRoot` from `./config.ts`, but that helper is forge-internal and was never exported in public ClawMem. The runtime catch swallowed the failure so it had no observable effect, but TypeScript flagged a static `TS2339` error on the destructure. v0.8.2 removes the dead code path. No behavior change for public users.
-
-Adds +15 tests (9 light-lane lease unit + 5 cmdWatch fast subprocess + 1 cmdWatch slow subprocess) on top of the v0.8.1 baseline.
-
-For operational guidance — enabling the workers via systemd drop-in, tuning intervals to your usage pattern, monitoring queries, and rollback steps — see [docs/guides/systemd-services.md](docs/guides/systemd-services.md#background-maintenance-workers-v082).
+Full version history is in [RELEASE_NOTES.md](RELEASE_NOTES.md). Upgrade instructions for existing vaults are in [docs/guides/upgrading.md](docs/guides/upgrading.md).
 
 ## Architecture
 
@@ -261,7 +186,7 @@ clawmem setup mcp # Register MCP server in ~/.claude.json (31 tools)
 ClawMem registers as a native ContextEngine plugin - OpenClaw's pluggable interface for context management. Same 90/10 automatic retrieval, delivered through OpenClaw's lifecycle system instead of Claude Code hooks.
 
 ```bash
-clawmem setup openclaw #
+clawmem setup openclaw # Auto-installs plugin, prints remaining steps
 ```
 
 **What the plugin provides:**
@@ -271,11 +196,15 @@ clawmem setup openclaw # Shows installation steps
 - **5 agent tools** - `clawmem_search`, `clawmem_get`, `clawmem_session_log`, `clawmem_timeline`, `clawmem_similar`
 - **Session lifecycle hooks** - `session_start`, `session_end`, `before_reset` safety net
 
-Disable OpenClaw's native memory
+Disable OpenClaw's native memory search to avoid duplicate injection:
 ```bash
 openclaw config set agents.defaults.memorySearch.extraPaths "[]"
 ```
 
+ClawMem coexists cleanly with OpenClaw's [Active Memory](https://docs.openclaw.ai/concepts/active-memory) plugin (v2026.4.10+) — they search different backends and inject into different prompt regions, so both can run simultaneously without conflict. See the [OpenClaw plugin guide](docs/guides/openclaw-plugin.md#coexistence-with-openclaw-active-memory) for details.
+
+> **OpenClaw v2026.4.10+** recommended — fixes a config normalization bug where `plugins.slots.contextEngine` was silently dropped (#64192).
+
 **Alternative:** OpenClaw agents can also use ClawMem's MCP server directly (`clawmem setup mcp`), with or without hooks. This gives full access to all 31 MCP tools but bypasses OpenClaw's ContextEngine lifecycle, so you lose token budget awareness, native compaction orchestration, and the `afterTurn()` message pipeline. The ContextEngine plugin is recommended for new OpenClaw setups; MCP is available as an additional or standalone integration.
 
 #### Hermes Agent
package/SKILL.md
CHANGED
@@ -605,12 +605,15 @@ Phase 3 deductive synthesis applies the same `contradicts` link for any draft th
 
 ## OpenClaw Integration
 
+**Active Memory coexistence:** ClawMem is fully compatible with OpenClaw's Active Memory plugin (v2026.4.10+). They search different backends and inject into different prompt regions — both can run simultaneously. The deployment options below control native memory search (`memorySearch.extraPaths`), not Active Memory.
+
+**OpenClaw v2026.4.10+ recommended** — fixes contextEngine slot being silently dropped during config normalization (#64192).
+
 ### Option 1: ClawMem Exclusive (Recommended)
 
-ClawMem handles 100% of memory.
+ClawMem handles 100% of structured memory. Disable native memory search:
 
 ```bash
-# Disable OpenClaw's native memory
 openclaw config set agents.defaults.memorySearch.extraPaths "[]"
 ```
 
@@ -618,7 +621,7 @@ openclaw config set agents.defaults.memorySearch.extraPaths "[]"
 
 ### Option 2: Hybrid
 
-Run both ClawMem and OpenClaw native memory.
+Run both ClawMem and OpenClaw native memory search.
 
 ```bash
 openclaw config set agents.defaults.memorySearch.extraPaths '["~/documents", "~/notes"]'
package/package.json
CHANGED
package/src/clawmem.ts
CHANGED
@@ -1300,43 +1300,115 @@ function cmdPath() {
 
 async function cmdSetupOpenClaw(args: string[]) {
   const remove = args.includes("--remove");
-  const binPath = findClawmemBinary();
   const pluginDir = pathResolve(import.meta.dir, "openclaw");
+  const extensionsDir = pathResolve(process.env.HOME || "~", ".openclaw", "extensions");
+  const linkPath = pathResolve(extensionsDir, "clawmem");
+
+  // Check if openclaw CLI is available
+  const hasOpenClawCli = (() => {
+    try {
+      const r = Bun.spawnSync(["openclaw", "--version"], { stdout: "pipe", stderr: "pipe" });
+      return r.exitCode === 0;
+    } catch { return false; }
+  })();
 
   if (remove) {
-
-
-
+    // Actually uninstall — mirror of install behavior
+    let removed = false;
+    try {
+      const stat = await import("fs").then(m => m.lstatSync(linkPath));
+      if (stat.isSymbolicLink() || stat.isDirectory()) {
+        const { unlinkSync, rmSync } = await import("fs");
+        if (stat.isSymbolicLink()) {
+          unlinkSync(linkPath);
+        } else {
+          rmSync(linkPath, { recursive: true });
+        }
+        console.log(`${c.green}Removed plugin from ${linkPath}${c.reset}`);
+        removed = true;
+      }
+    } catch (e: any) {
+      if (e.code !== "ENOENT") throw e;
+      console.log(`${c.dim}Plugin not installed at ${linkPath}${c.reset}`);
+    }
+
+    if (hasOpenClawCli) {
+      Bun.spawnSync(["openclaw", "config", "set", "plugins.slots.contextEngine", "legacy"], { stdout: "inherit", stderr: "inherit" });
+      console.log(`${c.green}Reset context engine slot to legacy${c.reset}`);
+    } else if (removed) {
+      console.log(`${c.dim}openclaw CLI not found — manually run: openclaw config set plugins.slots.contextEngine legacy${c.reset}`);
+    }
     return;
   }
 
-  //
+  // Verify plugin source files exist
   if (!existsSync(pathResolve(pluginDir, "index.ts"))) {
     die(`OpenClaw plugin files not found at ${pluginDir}`);
   }
+  if (!existsSync(pathResolve(pluginDir, "openclaw.plugin.json"))) {
+    die(`Plugin manifest not found at ${pluginDir}/openclaw.plugin.json`);
+  }
 
-
-
-
-
+  // Create extensions directory
+  if (!existsSync(extensionsDir)) {
+    mkdirSync(extensionsDir, { recursive: true });
+  }
+
+  // Remove stale symlink/directory if present
+  try {
+    const { lstatSync, unlinkSync, rmSync } = await import("fs");
+    const stat = lstatSync(linkPath);
+    if (stat.isSymbolicLink()) {
+      const { readlinkSync } = await import("fs");
+      const target = readlinkSync(linkPath);
+      if (target === pluginDir) {
+        console.log(`${c.dim}Symlink already correct at ${linkPath}${c.reset}`);
+      } else {
+        unlinkSync(linkPath);
+        console.log(`${c.dim}Replaced stale symlink (was → ${target})${c.reset}`);
+      }
+    } else if (stat.isDirectory()) {
+      rmSync(linkPath, { recursive: true });
+      console.log(`${c.dim}Replaced existing directory at ${linkPath}${c.reset}`);
+    } else {
+      // Regular file or other non-symlink, non-directory — conflict
+      die(`${linkPath} exists but is not a symlink or directory. Remove it manually and re-run setup.`);
+    }
+  } catch (e: any) {
+    if (e.code !== "ENOENT") throw e;
+  }
+
+  // Create symlink
+  if (!existsSync(linkPath)) {
+    const { symlinkSync } = await import("fs");
+    symlinkSync(pluginDir, linkPath);
+  }
+  console.log(`${c.green}Installed plugin: ${linkPath} → ${pluginDir}${c.reset}`);
+
+  // Version warning
   console.log();
-  console.log(`${c.bold}
+  console.log(`${c.bold}Note:${c.reset} OpenClaw v2026.4.10+ recommended — earlier versions`);
+  console.log(`have a bug where plugins.slots.contextEngine is silently dropped`);
+  console.log(`during config normalization (openclaw/openclaw#64192).`);
+
+  // Remaining steps — gateway must restart BEFORE setting the context engine slot,
+  // otherwise OpenClaw hasn't discovered the plugin yet and the slot assignment
+  // fails or is ignored (the exact bug reported in issue #5).
   console.log();
-  console.log(
-  console.log(` ${c.cyan}ln -s ${pluginDir} ~/.openclaw/extensions/clawmem${c.reset}`);
+  console.log(`${c.bold}Next steps:${c.reset}`);
   console.log();
-  console.log(`
-  console.log(` ${c.cyan}
+  console.log(` 1. Restart OpenClaw gateway to discover the plugin:`);
+  console.log(` ${c.cyan}openclaw gateway restart${c.reset}`);
   console.log();
-  console.log(`
+  console.log(` 2. Set ClawMem as the active context engine (after restart):`);
   console.log(` ${c.cyan}openclaw config set plugins.slots.contextEngine clawmem${c.reset}`);
   console.log();
-  console.log(`
+  console.log(` 3. Configure GPU endpoints (if not using defaults):`);
   console.log(` ${c.cyan}openclaw config set plugins.entries.clawmem.config.gpuEmbed http://YOUR_GPU:8088${c.reset}`);
   console.log(` ${c.cyan}openclaw config set plugins.entries.clawmem.config.gpuLlm http://YOUR_GPU:8089${c.reset}`);
   console.log(` ${c.cyan}openclaw config set plugins.entries.clawmem.config.gpuRerank http://YOUR_GPU:8090${c.reset}`);
   console.log();
-  console.log(`
+  console.log(` 4. Start the REST API (for agent tools):`);
   console.log(` ${c.cyan}clawmem serve &${c.reset}`);
   console.log();
   console.log(`${c.dim}ClawMem will work alongside Claude Code hooks — both modes share the same vault.${c.reset}`);
package/src/entity.ts
CHANGED
@@ -161,6 +161,53 @@ function makeEntityId(name: string, type: string, vault: string = 'default'): st
   return `${vault}:${type}:${normalized}`;
 }
 
+// =============================================================================
+// Entity Cap (content-type-aware, §1.5 v0.8.3)
+// =============================================================================
+
+/**
+ * Per-content-type entity cap applied to LLM extraction output.
+ *
+ * Long-form content (research dumps, conversation synthesis, hub/index docs)
+ * legitimately mentions more distinct entities than short decision records or
+ * handoff notes. A flat cap of 10 silently dropped real entities on long-form
+ * documents. This map lets each content type keep its full entity set up to a
+ * type-appropriate ceiling, while short types stay tight to suppress LLM noise.
+ *
+ * Unknown or untyped documents fall through to the default cap of 10 (matches
+ * pre-v0.8.3 behavior — backward compatible for any caller that doesn't pass
+ * a contentType).
+ */
+const ENTITY_CAP_BY_TYPE: Record<string, number> = {
+  research: 15, // long-form research dumps
+  hub: 12, // architecture docs, indexes
+  conversation: 12, // synthesized conversation exports
+  decision: 8, // short decision records
+  deductive: 8, // inferred observations
+  note: 8, // session notes
+  handoff: 8, // session handoffs
+  progress: 8, // progress logs
+  project: 10, // generic project content
+};
+
+/**
+ * Return the entity cap for a given content type. Falls back to 10 for
+ * undefined or unknown types (pre-v0.8.3 behavior).
+ *
+ * Input is trimmed + lowercased before lookup so values from hand-authored
+ * frontmatter or older imported docs (e.g. "Research", " conversation ") map
+ * cleanly to the canonical lowercase keys in `ENTITY_CAP_BY_TYPE`. The DB
+ * `documents.content_type` column is not normalized at the write boundary,
+ * so normalization has to happen here to avoid silent fall-through to the
+ * default cap of 10.
+ */
+export function entityCapForContentType(contentType?: string): number {
+  if (!contentType) return 10;
+  const key = contentType.trim().toLowerCase();
+  if (!key) return 10;
+  return ENTITY_CAP_BY_TYPE[key] ?? 10;
+}
+
 // =============================================================================
 // Entity Extraction (LLM-based)
 // =============================================================================
|
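Note on the hunk above: the lookup trims and lowercases its input before indexing the cap table, and returns 10 for anything it does not recognize. A minimal standalone sketch of that behavior follows. The names `CAPS`, `DEFAULT_CAP`, and `capForType` are hypothetical stand-ins, and the own-property check is an added assumption that is not part of the diff. It is there because a plain-object lookup also resolves inherited keys such as `constructor`, which `??` alone would not catch.

```typescript
// Sketch only: same cap values as the hunk above, hypothetical names.
const CAPS: Record<string, number> = {
  research: 15,
  hub: 12,
  conversation: 12,
  decision: 8,
  deductive: 8,
  note: 8,
  handoff: 8,
  progress: 8,
  project: 10,
};

const DEFAULT_CAP = 10; // pre-v0.8.3 flat cap, per the diff comments

function capForType(contentType?: string | null): number {
  if (!contentType) return DEFAULT_CAP;
  // Normalize hand-authored values such as "Research" or " conversation ".
  const key = contentType.trim().toLowerCase();
  if (!key) return DEFAULT_CAP;
  // Assumption (not in the diff): ignore keys inherited from Object.prototype.
  if (!Object.prototype.hasOwnProperty.call(CAPS, key)) return DEFAULT_CAP;
  return CAPS[key] ?? DEFAULT_CAP;
}

// Hypothetical sample inputs covering the normalization and fallback paths.
const samples = ["Research", " conversation ", "handoff", "unknown", "   ", "constructor", undefined];
console.log(samples.map((s) => capForType(s)));
```

With these inputs the first three resolve to 15, 12, and 8, and every remaining input falls through to the default of 10.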
@@ -168,14 +215,25 @@ function makeEntityId(name: string, type: string, vault: string = 'default'): st
|
|
|
168
215
|
/**
|
|
169
216
|
* Extract named entities from document content using LLM.
|
|
170
217
|
* Returns a list of (name, type) pairs.
|
|
218
|
+
*
|
|
219
|
+
* @param contentType Optional document content_type. When provided, caps the
|
|
220
|
+
* returned entity list using `entityCapForContentType`. When omitted, uses
|
|
221
|
+
* the default cap of 10 (backward compatible).
|
|
171
222
|
*/
|
|
172
223
|
export async function extractEntities(
|
|
173
224
|
llm: LLM,
|
|
174
225
|
title: string,
|
|
175
|
-
content: string
|
|
226
|
+
content: string,
|
|
227
|
+
contentType?: string
|
|
176
228
|
): Promise<ExtractedEntity[]> {
|
|
177
229
|
const truncated = content.slice(0, 2000);
|
|
178
230
|
|
|
231
|
+
// v0.8.3 (§1.5): compute the cap up front so we can thread it into BOTH
|
|
232
|
+
// the prompt ("0-N entities") and the post-LLM slice. Without the dynamic
|
|
233
|
+
// prompt, a compliant model stops at the hardcoded 10 even when we'd
|
|
234
|
+
// accept 15 — the slice becomes a no-op and §1.5 is only half-effective.
|
|
235
|
+
const cap = entityCapForContentType(contentType);
|
|
236
|
+
|
|
179
237
|
const prompt = `Extract named entities from this document. Include people, projects, services, tools, organizations, and specific technical components.
|
|
180
238
|
|
|
181
239
|
Title: ${title}
|
|
@@ -189,7 +247,7 @@ Return ONLY valid JSON array:
|
|
|
189
247
|
Rules:
|
|
190
248
|
- Only include specific, named entities (not generic concepts like "database" or "testing")
|
|
191
249
|
- Normalize names: "VM 202" not "vm202", "ClawMem" not "clawmem"
|
|
192
|
-
- 0
|
|
250
|
+
- 0-${cap} entities. Return empty array [] if no specific entities found
|
|
193
251
|
- Include the most specific type for each entity
|
|
194
252
|
- Do NOT extract the document's title as an entity
|
|
195
253
|
- Do NOT extract heading labels, section names, or sentence fragments
|
|
@@ -217,7 +275,7 @@ Return ONLY the JSON array. /no_think`;
|
|
|
217
275
|
['person', 'project', 'service', 'tool', 'concept', 'org', 'location'].includes(e.type)
|
|
218
276
|
)
|
|
219
277
|
.filter(e => !isLowQualityEntity(e.name, e.type, title))
|
|
220
|
-
.slice(0,
|
|
278
|
+
.slice(0, cap);
|
|
221
279
|
} catch (err) {
|
|
222
280
|
console.log(`[entity] LLM extraction failed:`, err);
|
|
223
281
|
return [];
|
|
@@ -454,12 +512,14 @@ export async function enrichDocumentEntities(
|
|
|
454
512
|
): Promise<number> {
|
|
455
513
|
try {
|
|
456
514
|
// Get document content (snapshot for extraction)
|
|
515
|
+
// v0.8.3 (§1.5): fetch content_type so extractEntities can apply a
|
|
516
|
+
// content-type-aware cap instead of the flat slice(0, 10).
|
|
457
517
|
const doc = db.prepare(`
|
|
458
|
-
SELECT d.title, c.doc as body
|
|
518
|
+
SELECT d.title, d.content_type, c.doc as body
|
|
459
519
|
FROM documents d
|
|
460
520
|
JOIN content c ON c.hash = d.hash
|
|
461
521
|
WHERE d.id = ? AND d.active = 1
|
|
462
|
-
`).get(docId) as { title: string; body: string } | null;
|
|
522
|
+
`).get(docId) as { title: string; content_type: string | null; body: string } | null;
|
|
463
523
|
|
|
464
524
|
if (!doc) {
|
|
465
525
|
console.log(`[entity] Document ${docId} not found or inactive`);
|
|
@@ -478,8 +538,8 @@ export async function enrichDocumentEntities(
|
|
|
478
538
|
return 0; // Same input, already enriched — skip
|
|
479
539
|
}
|
|
480
540
|
|
|
481
|
-
// Step 1: Extract entities via LLM
|
|
482
|
-
const entities = await extractEntities(llm, doc.title, doc.body);
|
|
541
|
+
// Step 1: Extract entities via LLM (cap is content-type-aware as of v0.8.3 §1.5)
|
|
542
|
+
const entities = await extractEntities(llm, doc.title, doc.body, doc.content_type ?? undefined);
|
|
483
543
|
|
|
484
544
|
// Recheck input hash before writing — abort if content changed during LLM call
|
|
485
545
|
const recheckHash = db.prepare(`
|
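Note on the entity.ts changes above: the comment in `extractEntities` explains that the cap has to reach both the prompt rule and the post-LLM slice, otherwise a compliant model stops at the old limit and the larger slice never takes effect. The sketch below illustrates that reasoning in isolation. `countRule`, `capEntities`, and the fake entity list are hypothetical and stand in for the real prompt assembly and LLM output.

```typescript
// Sketch only: hypothetical helpers, not the functions in entity.ts.
type ExtractedEntity = { name: string; type: string };

// The rule line interpolated into the prompt, as in the "+" line of the diff.
function countRule(cap: number): string {
  return `- 0-${cap} entities. Return empty array [] if no specific entities found`;
}

// The post-LLM guard: never keep more than `cap` entities.
function capEntities(entities: ExtractedEntity[], cap: number): ExtractedEntity[] {
  return entities.slice(0, cap);
}

// Hypothetical model output: 20 made-up entities.
const fakeEntities: ExtractedEntity[] = Array.from({ length: 20 }, (_, i) => ({
  name: `Tool ${i + 1}`,
  type: "tool",
}));

const researchCap = 15; // cap for "research" content per the table in the diff

console.log(countRule(researchCap));
console.log(capEntities(fakeEntities, researchCap).length);

// If the model had obeyed a prompt that still asked for at most 10 entities,
// the slice at 15 would change nothing:
console.log(capEntities(fakeEntities.slice(0, 10), researchCap).length);
```

The last call shows the half-effective case the diff comment describes: the slice only matters when the prompt allows the model to return up to the same cap.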
package/src/store.ts
CHANGED
|
@@ -1543,6 +1543,10 @@ export function createStore(dbPath?: string, opts?: { readonly?: boolean; busyTi
|
|
|
1543
1543
|
|
|
1544
1544
|
// Usage relation tracking — records relations between documents
|
|
1545
1545
|
insertRelation: (fromDoc: number, toDoc: number, relType: string, weight: number = 1.0) => {
|
|
1546
|
+
// v0.8.3 (§1.3): reject self-loops at the API boundary. A document
|
|
1547
|
+
// relating to itself has no informational value for graph traversal
|
|
1548
|
+
// and would pollute intent_search/find_similar neighborhoods.
|
|
1549
|
+
if (fromDoc === toDoc) return;
|
|
1546
1550
|
db.prepare(`
|
|
1547
1551
|
INSERT INTO memory_relations (source_id, target_id, relation_type, weight, created_at)
|
|
1548
1552
|
VALUES (?, ?, ?, ?, ?)
|
|
@@ -4224,6 +4228,10 @@ export async function syncBeadsIssues(
|
|
|
4224
4228
|
const targetRow = db.prepare(`SELECT doc_id FROM beads_issues WHERE beads_id = ?`).get(dep.target_id) as { doc_id: number } | undefined;
|
|
4225
4229
|
|
|
4226
4230
|
if (sourceRow && targetRow) {
|
|
4231
|
+
// v0.8.3 (§1.3): mirror of insertRelation self-loop guard. Beads can
|
|
4232
|
+
// theoretically express a self-dependency (e.g. a `relates-to` edge
|
|
4233
|
+
// from an issue to itself); skip those before they land in the graph.
|
|
4234
|
+
if (sourceRow.doc_id === targetRow.doc_id) continue;
|
|
4227
4235
|
db.prepare(`
|
|
4228
4236
|
INSERT OR IGNORE INTO memory_relations (source_id, target_id, relation_type, weight, metadata, created_at)
|
|
4229
4237
|
VALUES (?, ?, ?, 1.0, ?, ?)
|
|
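Note on the store.ts changes above: both hunks drop an edge whose source and target are the same document before it reaches `memory_relations`. The sketch below shows the same guard against an in-memory edge list instead of SQLite. `RelationGraph`, its `neighbors` method, and the boolean return value are hypothetical additions for illustration; the `insertRelation` in the diff simply returns without a value.

```typescript
// Sketch only: an in-memory stand-in for the memory_relations table.
type Relation = {
  sourceId: number;
  targetId: number;
  relType: string;
  weight: number;
};

class RelationGraph {
  private edges: Relation[] = [];

  // Returns false when the edge is rejected (hypothetical; the diff returns void).
  insertRelation(fromDoc: number, toDoc: number, relType: string, weight: number = 1.0): boolean {
    // Self-loop guard, as in the diff: a document relating to itself is skipped.
    if (fromDoc === toDoc) return false;
    this.edges.push({ sourceId: fromDoc, targetId: toDoc, relType, weight });
    return true;
  }

  // Outgoing neighbors of a document, used here to show the graph stays clean.
  neighbors(docId: number): number[] {
    return this.edges.filter((e) => e.sourceId === docId).map((e) => e.targetId);
  }
}

const graph = new RelationGraph();
console.log(graph.insertRelation(1, 1, "relates-to")); // rejected self-loop
console.log(graph.insertRelation(1, 2, "relates-to")); // accepted edge
console.log(graph.neighbors(1));
```

Putting the check at the insert boundary means traversal code never has to filter a document out of its own neighborhood.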
File without changes
|