@simbimbo/memory-ocmemog 0.1.16 → 0.1.18

package/CHANGELOG.md CHANGED
@@ -1,5 +1,77 @@
  # Changelog
 
+ ## Unreleased
+
+ ## 0.1.17 — 2026-03-26
+
+ Promotion/governance observability, anti-cruft hardening, queue/runtime summary parity, and release validation recovery.
+
+ ### Highlights
+ - surfaced agent-scoped auto-hydration policy and decision reasons in runtime and dedicated sidecar diagnostics
+ - added request-level embedding execution diagnostics and promoted key embedding outcomes into the search execution path summary
+ - added compact governance, reinforcement, and suppression rollups to `/memory/search` diagnostics, including per-bucket parity
+ - added queue health, severity, invalid/retrying payload indicators, and doctor-style aliases to `runtimeSummary.queue`
+ - added governance queue/review/audit/rollback/auto-resolve diagnostics plus normalized priority labels and concise explanations
+ - added promotion decision explanations, verification summaries, quality summaries, and richer rejection reasons
+ - activated anti-cruft promotion gating for low-confidence generic memories, including a distinct redundant-generic rejection path
+ - repaired a malformed local edit in `promote.py` and revalidated the branch tip with full test coverage before release
+ - full validation passed: `188 passed`
+
+ ### Highlights
+ - improved lexical retrieval scoring to consider token overlap, ordered phrase overlap, and light prefix matching instead of relying on blunt substring-or-overlap behavior
+ - kept the retrieval path bounded and hybrid by continuing to blend lexical, semantic, reinforcement, promotion, recency, and lane-aware signals
+ - added lightweight `searchDiagnostics` to `/memory/search` so retrieval strategy, lane, bucket counts, result compaction, timing, and vector-search scan/prefilter details are visible in the API response
+ - added a bounded lexical prefilter inside `vector_index.search_memory()` so semantic ranking can prefer lexically relevant candidates before cosine scoring without introducing ANN complexity
+ - aligned README and architecture/usage docs with the actual shipped hybrid retrieval behavior
+ - added regression coverage for partial-phrase lexical matches, sidecar search diagnostics, vector prefilter behavior, malformed queue-line recovery, bounded async retry behavior, and doctor visibility for retrying queue payloads
+ - hardened async queue processing so malformed queue JSON is skipped/acknowledged instead of blocking later valid entries in the same queue file
+ - added bounded retry tracking for valid queue payload failures so poison items are retried a small number of times and then dropped/acknowledged instead of blocking the queue forever
+ - improved doctor queue health output so malformed queue lines and retrying poison items are reported separately with clearer hints and samples
+ - added `runtimeSummary` to sidecar/runtime payloads so provider path, hash-fallback state, degraded/ready mode, and compatibility residue are explicit to operators
+ - expanded `/memory/search` diagnostics with a request-level `execution_path` block so provider-configured, provider-skipped, local-fallback-expected, and route-exception-fallback behavior is explicit per request
+ - added `reviewDiagnostics` to `/memory/governance/review/summary` so cache freshness, item count, kind breakdown, and active filters are explicit to operators
+ - added an `explanation` block to `/memory/governance/review` items so per-item rationale and source/target status context are easier to render and review
+ - added normalized governance `priority_label` values on review items and `priority_label_counts` in review summary diagnostics for simpler operator triage
+ - added per-agent auto-hydration controls (`OCMEMOG_AUTO_HYDRATION_ALLOW_AGENT_IDS` / `OCMEMOG_AUTO_HYDRATION_DENY_AGENT_IDS`) so prompt-time continuity can be scoped by `ctx.agentId` without disabling global ingest/checkpoint behavior
+ - surfaced the active auto-hydration agent policy in `runtimeSummary.auto_hydration` for easier operator verification and debugging
+ - added explicit plugin-side hydration decision reasons so skips can be traced to global disable vs denylist vs allowlist mismatch
+ - added `/memory/auto_hydration/policy` so operators can query the current agent-specific prompt-hydration decision from the sidecar
+ - improved plugin hydration observability with structured skip/apply decision logs that include agent id, decision reason, and prepend sizes
+ - added compact `governance_summary` payloads to retrieval results so search consumers can triage governance state without unpacking the full provenance structure
+ - enriched `runtimeSummary` embedding observability with local embedding model and embedding path readiness details so provider vs local/simple fallback is clearer to operators
+ - added request-level embedding execution diagnostics to vector/search responses so operators can tell whether provider embedding was attempted, whether local fallback ran, and which path actually produced the query embedding
+ - made retrieval reinforcement scoring frequency-aware and exposed `reinforcement_count` in retrieval signals so repeated successful experience can influence ranking without unbounded growth
+ - added light recency-aware reinforcement weighting and exposed `reinforcement_weighted_count` so newer successful experiences matter more than stale ones
+ - added bounded negative reinforcement handling and exposed `reinforcement_negative_count` / `reinforcement_negative_penalty` so failed experience can depress ranking in an explainable way
+ - added compact promotion decision explanations so distill/promote outcomes are easier to inspect and render
+ - added compact promotion verification summaries so confidence/threshold semantics are easier to interpret consistently
+ - enriched rejected promotion reasons so generic-destination low-confidence cases are easier to distinguish from ordinary below-threshold outcomes
+ - added `quality_summary` to promotion decisions so low-value generic candidates are easier to identify as likely memory cruft (`drop` / high-noise-risk) while stronger specific memories are easier to keep
+ - activated a first anti-cruft retention gate: low-confidence candidates that only resolve to generic `knowledge` are now rejected as `rejected_as_generic_cruft`
+ - added a second anti-cruft distinction for low-confidence generic candidates that merely restate existing generic knowledge: `rejected_as_redundant_generic_cruft`
+ - added a weak-specific ambiguity distinction so below-threshold candidates that fit a specific bucket only loosely are labeled `rejected_as_ambiguous_specific_memory` and surfaced for review more clearly
+ - added compact reinforcement rollups to `/memory/search` diagnostics so operators can see visible and retrieval-side reinforcement totals, including per-bucket visible counts
+ - extended retrieval-side reinforcement diagnostics with per-bucket totals for parity with other search/operator rollups
+ - extended visible and retrieval-side reinforcement rollups with bounded negative/polarity totals so failed experience is inspectable too
+ - promoted the key embedding outcome fields into `searchDiagnostics.execution_path` so request-level scanability is better without drilling into nested vector diagnostics
+ - added compact queue health snapshots to `runtimeSummary` so operators can see queue depth, last run, processed totals, error counts, and worker status from normal sidecar payloads
+ - extended `runtimeSummary.queue` with lightweight severity/hints so backlog/worker/error conditions are easier to judge from normal runtime payloads
+ - added compact invalid-line and retrying-payload indicators to `runtimeSummary.queue` so normal runtime payloads can distinguish queue corruption from poison-item retry churn
+ - added doctor-style queue aliases (`queue_depth`, `queue_backlog_severity`) to `runtimeSummary.queue` to reduce translation friction between runtime payloads and doctor output
+ - added compact queue worker config issues to `runtimeSummary.queue` so invalid poll/batch settings surface in normal runtime payloads too
+ - normalized runtime summary sub-blocks so `queue`, `embedding_path_summary`, and `auto_hydration` all expose a small shared operator shape (`enabled`, `status`, `issues`) where appropriate
+ - added compact `queueDiagnostics` to `/memory/governance/queue` so operators can quickly see item counts plus bucket/kind/priority-label breakdowns
+ - added compact `explanation` blocks to governance queue items so queue consumers get short rationale and target-reference context without unpacking raw fields
+ - added normalized `priority_label` values to governance queue items so queue and review surfaces share the same urgency vocabulary
+ - added compact `autoResolveDiagnostics` to `/memory/governance/auto_resolve` so operators can quickly see action totals plus reason/kind breakdowns and the active policy profile
+ - added compact `auditDiagnostics` to `/memory/governance/audit` so operators can quickly see audit item totals plus event/status breakdowns
+ - added compact `rollbackDiagnostics` to `/memory/governance/rollback` so operators can quickly see whether rollback succeeded and how the outcome was classified
+ - reframed governance review actions as apply/dismiss and added `/memory/governance/review/auto_apply` so routine review handling does not depend on dashboard/user approval input
+ - added `governance_rollup` to `/memory/search` diagnostics so search consumers can quickly see visible result status counts and needs-review totals
+ - extended visible governance rollups with per-bucket breakdowns so search consumers can see where visible governance pressure is concentrated
+ - added retrieval governance suppression counts so `/memory/search` diagnostics can report how many candidates were hidden as `superseded` or `duplicate`
+ - extended retrieval governance suppression diagnostics with per-bucket breakdowns so search consumers can see where hidden-governance pressure is concentrated
+
  ## 0.1.16 — 2026-03-25
 
  Platform support doc clarification for Linux/Windows service guidance.
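The 0.1.17 notes describe reinforcement weighting that is frequency-aware, lightly recency-aware and polarity-aware. They expose it through four fields:

- `reinforcement_count`
- `reinforcement_weighted_count`
- `reinforcement_negative_count`
- `reinforcement_negative_penalty`

The Python sketch below illustrates one way those properties can combine. It is not ocmemog's implementation. The following are all assumptions chosen to show the idea:

- the half-life decay
- the saturating frequency factor
- the per-failure penalty and its cap
- every constant

```python
from dataclasses import dataclass

# Hypothetical tuning constants; ocmemog's real values are not documented here.
HALF_LIFE_DAYS = 30.0   # a success this many days old counts half as much
POSITIVE_SCALE = 0.5    # echoes the older flat "reward_score * 0.5" bonus
NEGATIVE_STEP = 0.1     # penalty added per recorded failure
NEGATIVE_CAP = 0.3      # upper bound on the total failure penalty


@dataclass
class ReinforcementEvent:
    reward: float    # > 0 for a successful outcome, < 0 for a failed one
    age_days: float  # how long ago the outcome was recorded


def reinforcement_signals(events: list[ReinforcementEvent]) -> dict:
    """Sketch of frequency-, recency-, and polarity-aware reinforcement."""
    positives = [e for e in events if e.reward > 0]
    negatives = [e for e in events if e.reward < 0]

    # Recency: each success decays with a half-life, so stale wins count less.
    weights = [0.5 ** (e.age_days / HALF_LIFE_DAYS) for e in positives]
    weighted_count = sum(weights)
    mean_reward = (
        sum(w * e.reward for w, e in zip(weights, positives)) / weighted_count
        if weighted_count > 0
        else 0.0
    )

    # Frequency: a saturating factor in [0, 1) lets repeats help without unbounded growth.
    saturation = weighted_count / (weighted_count + 1.0)
    bonus = POSITIVE_SCALE * mean_reward * saturation

    # Polarity: failures subtract a bounded penalty.
    penalty = min(NEGATIVE_CAP, NEGATIVE_STEP * len(negatives))

    return {
        "reinforcement_count": len(positives),
        "reinforcement_weighted_count": weighted_count,
        "reinforcement_negative_count": len(negatives),
        "reinforcement_negative_penalty": penalty,
        "reinforcement_score": bonus - penalty,
    }
```

The saturation factor approaches 1 but never reaches it. So the positive bonus stays below `POSITIVE_SCALE` times the recency-weighted mean reward, however many successes accumulate. The failure penalty is capped the same way.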
package/README.md CHANGED
@@ -3,11 +3,38 @@
  **ocmemog** is an advanced memory engine for OpenClaw that combines durable long-term memory, transcript-backed continuity, conversation hydration, checkpoint expansion, and pondering inside a sidecar-based plugin architecture.
 
  It is designed to go beyond simple memory search by providing:
- - **durable memory and semantic retrieval**
+ - **durable memory and hybrid retrieval (lexical + semantic)**
+ - **operator-visible search diagnostics for retrieval and vector-search behavior**
+ - **bounded vector search with lightweight lexical prefiltering**
  - **lossless-style conversation continuity**
  - **checkpointing, branch-aware hydration, and turn expansion**
  - **transcript ingestion with anchored context recovery**
  - **pondering and reflection generation**
+ - **durable queue behavior that skips malformed queued payloads, bounds poison-item retries, and exposes clearer queue health diagnostics**
+ - **compact runtime summaries that make provider/fallback/degraded state explicit in sidecar responses, including local embedding model/path readiness**
+ - **request-level embedding path diagnostics inside search/vector-search responses, with promoted top-level execution summaries**
+ - **frequency-aware, recency-aware, and polarity-aware reinforcement signals in retrieval ranking**
+ - **compact reinforcement rollups in search diagnostics, including per-bucket and negative/polarity parity**
+ - **more consistent runtime summary sub-blocks across embedding, queue, and auto-hydration surfaces**
+ - **compact queue health snapshots in runtime summaries, including severity, hints, invalid/retrying indicators, doctor-style aliases, and worker-config issues**
+ - **request-level search execution diagnostics that show provider-skip vs local-fallback vs route-fallback behavior**
+ - **governance review summary diagnostics for cache freshness and review-kind breakdowns**
+ - **governance review item explanations that make duplicate/contradiction/supersession rationale easier to render**
+ - **normalized governance priority labels for easier operator triage**
+ - **a sidecar hydration-policy diagnostics route for agent-specific continuity debugging**
+ - **compact governance summaries in retrieval results to bridge search and review workflows**
+ - **promotion decision explanations, verification summaries, and quality summaries for better distill/promote operator clarity**
+ - **an explicit anti-cruft quality signal so weak generic memories are easier to spot and avoid keeping long-term**
+ - **active anti-cruft retention gates that reject low-confidence generic memories, especially redundant generic junk, while flagging weak specific-fit memories more explicitly for review**
+ - **compact governance queue diagnostics for faster operator triage**
+ - **governance review apply/dismiss + auto-apply flows that do not depend on dashboard approval input**
+ - **governance queue item explanations that align queue and review surfaces**
+ - **shared normalized priority labels across governance queue and review items**
+ - **compact governance auto-resolve diagnostics for faster operator triage**
+ - **compact governance audit diagnostics for faster operator triage**
+ - **compact governance rollback diagnostics for faster operator triage**
+ - **governance rollups in search diagnostics for faster operator triage, including per-bucket visible breakdowns**
+ - **hidden-by-governance suppression counts in retrieval diagnostics, including per-bucket breakdowns**
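The request-level embedding diagnostics in this release expose four fields:

- `provider_attempted`
- `local_fallback_used`
- `embedding_path_used`, with values `provider`, `local_simple` or `local_model`
- `embedding_generated`

Together they amount to recording which branch of a fallback chain produced the query vector. The Python sketch below illustrates that idea under these assumptions:

- The provider and the local model are plain callables.
- `simple_hash_embedding` is a hypothetical stand-in for the deterministic local hash vectors.
- `local_fallback_used` is set only when a provider was attempted and failed.

```python
import hashlib
from typing import Callable, Optional, Sequence

Embedder = Callable[[str], Sequence[float]]


def simple_hash_embedding(text: str, dims: int = 16) -> list:
    """Hypothetical stand-in for deterministic local hash vectors."""
    vec = [0.0] * dims
    for token in text.lower().split():
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        vec[digest[0] % dims] += 1.0
    return vec


def embed_with_diagnostics(
    text: str,
    provider: Optional[Embedder] = None,
    local_model: Optional[Embedder] = None,
):
    diag = {
        "provider_attempted": False,
        "local_fallback_used": False,
        "embedding_path_used": None,
        "embedding_generated": False,
    }
    vec: list = []
    if provider is not None:
        diag["provider_attempted"] = True
        try:
            vec = list(provider(text))
            diag["embedding_path_used"] = "provider"
        except Exception:
            diag["local_fallback_used"] = True  # provider failed; try local paths

    if diag["embedding_path_used"] is None and local_model is not None:
        try:
            vec = list(local_model(text))
            diag["embedding_path_used"] = "local_model"
        except Exception:
            vec = []

    if diag["embedding_path_used"] is None:
        vec = simple_hash_embedding(text)
        diag["embedding_path_used"] = "local_simple"

    # An all-zero vector (e.g. an empty query) is reported as "not generated".
    diag["embedding_generated"] = any(v != 0.0 for v in vec)
    return vec, diag
```

Keeping the outcome in a flat dictionary makes it easy to copy the same four keys into a top-level execution-path summary.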
 
  Architecture at a glance:
  - **OpenClaw plugin (`index.ts`)** handles tools and hook integration
@@ -136,6 +163,11 @@ Optional environment variables:
  - `OCMEMOG_SHUTDOWN_TIMING` (`true` enables shutdown timing logs; defaults to `true`)
  - `OCMEMOG_API_TOKEN` (optional; if set, requests must include `x-ocmemog-token` or `Authorization: Bearer ...`; OpenClaw plugin users should also set the plugin `config.token` field)
  - `OCMEMOG_AUTO_HYDRATION` (`true` to re-enable prompt-time continuity prepending; defaults to `false` as a safety guard until the host runtime is verified not to persist prepended context into session history)
+ - `OCMEMOG_AUTO_HYDRATION_ALLOW_AGENT_IDS` (comma-separated `ctx.agentId` allowlist for prompt-time hydration; when set, only matching agents receive before-prompt hydration)
+ - `OCMEMOG_AUTO_HYDRATION_DENY_AGENT_IDS` (comma-separated `ctx.agentId` denylist for prompt-time hydration; checked before the allowlist so specific agents can be blocked even when global hydration remains enabled)
+ - `runtimeSummary.auto_hydration` now exposes the active auto-hydration policy so operators can verify agent scoping from sidecar/runtime payloads
+ - plugin-side hydration gating now has explicit decision reasons (`disabled_globally`, `denied_by_agent_id`, `not_in_allowlist`, `allowed_by_allowlist`, `allowed_globally`) for clearer debugging/logging
+ - plugin logs now record structured prompt-hydration decision context for both skipped and applied hydration events
  - `OCMEMOG_LAPTOP_MODE` (`auto` by default; on macOS battery power this slows watcher polling, reduces ingest batch size, and disables sentiment reinforcement unless explicitly overridden)
  - `OCMEMOG_LOCAL_LLM_BASE_URL` (default: `http://127.0.0.1:18080/v1`; local OpenAI-compatible text endpoint, e.g. llama.cpp)
  - `OCMEMOG_LOCAL_LLM_MODEL` (default: `qwen2.5-7b-instruct`; matches the active Qwen2.5-7B-Instruct GGUF runtime)
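The documented check order can be expressed as a small pure function:

1. The global switch.
2. The denylist.
3. The allowlist.

The real gating lives in the TypeScript plugin. This Python transliteration is only a sketch, and it makes two assumptions:

- Only the literal `true` (case-insensitive) enables `OCMEMOG_AUTO_HYDRATION`.
- Agent IDs are compared after trimming whitespace.

The reason strings are the ones listed above.

```python
import os
from typing import Mapping, Optional, Tuple


def _parse_ids(raw: Optional[str]) -> set:
    """Split a comma-separated agent-id list, ignoring blanks and whitespace."""
    return {part.strip() for part in (raw or "").split(",") if part.strip()}


def hydration_decision(
    agent_id: Optional[str], env: Mapping[str, str] = os.environ
) -> Tuple[bool, str]:
    """Return (hydrate?, reason) using the documented check order."""
    if env.get("OCMEMOG_AUTO_HYDRATION", "false").strip().lower() != "true":
        return False, "disabled_globally"

    deny = _parse_ids(env.get("OCMEMOG_AUTO_HYDRATION_DENY_AGENT_IDS"))
    allow = _parse_ids(env.get("OCMEMOG_AUTO_HYDRATION_ALLOW_AGENT_IDS"))

    # The denylist is checked before the allowlist.
    if agent_id in deny:
        return False, "denied_by_agent_id"
    if allow:
        if agent_id in allow:
            return True, "allowed_by_allowlist"
        return False, "not_in_allowlist"
    return True, "allowed_globally"


if __name__ == "__main__":
    demo_env = {
        "OCMEMOG_AUTO_HYDRATION": "true",
        "OCMEMOG_AUTO_HYDRATION_ALLOW_AGENT_IDS": "agent-a,agent-b",
        "OCMEMOG_AUTO_HYDRATION_DENY_AGENT_IDS": "agent-x",
    }
    for agent in ("agent-a", "agent-x", "agent-c"):
        print(agent, hydration_decision(agent, demo_env))
```

Returning the reason alongside the boolean keeps the skip/apply log line and the decision itself from drifting apart.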
@@ -30,22 +30,81 @@ The main SQLite database owns these tables:
 
  ## Retrieval flow
 
- The current sidecar behavior is simpler than brAIn's full memory architecture:
+ The current sidecar retrieval path is a bounded hybrid ranker rather than a pure substring search:
 
  1. `/memory/search` calls `retrieval.retrieve_for_queries()`.
- 2. Retrieval scans `knowledge`, `reflections`, `directives`, and `tasks` for substring matches.
- 3. Result scoring combines:
- - keyword hit: `1.0` on substring match
- - reinforcement bonus: `reward_score * 0.5`
- - confidence bonus: `promotion confidence * 0.3`
- 4. If `knowledge` has no keyword hit, retrieval falls back to `vector_index.search_memory()`.
- 5. The sidecar flattens the bucketed results into a plugin-friendly response.
+ 2. Each query fans into `retrieval.retrieve()` across the selected categories.
+ 3. Lexical ranking now combines:
+ - exact substring hit (`1.0` when the full query appears)
+ - token overlap ratio
+ - ordered phrase/sequence overlap
+ - light prefix matching for partial-word queries
+ 4. Semantic ranking runs through `vector_index.search_memory()` across the selected embedded categories.
+ 5. Final scoring blends:
+ - keyword score
+ - semantic score
+ - reinforcement history
+ - promotion confidence
+ - recency
+ - optional lane bonus when lane-aware metadata matches
+ 6. Superseded / duplicate memories are filtered out, contested memories are penalized, and the sidecar flattens the ranked bucketed results into a plugin-friendly response.
+ 7. The sidecar response now includes lightweight `searchDiagnostics` so operators can inspect the active retrieval strategy, lane selection, per-bucket counts, result compaction, elapsed time, vector-search scan/prefilter behavior, and request-level execution path (provider-configured/provider-skipped/local-fallback-expected/route-exception-fallback) without scraping logs.
+ - vector search diagnostics now also carry the actual embedding execution outcome for the request (provider attempted, local fallback used, winning path, embedding generated)
+ - the top-level execution-path summary now promotes the key embedding outcome fields for faster operator scanning
+ 8. Retrieval items now also carry a compact `governance_summary` so retrieval and governance surfaces share a simpler bridge for status/triage without forcing every consumer to parse the full governance/provenance structure.
+ - retrieval signals now also expose `reinforcement_count`, `reinforcement_weighted_count`, `reinforcement_negative_count`, and `reinforcement_negative_penalty`, and reinforcement weighting is now frequency-aware, lightly recency-aware, and polarity-aware instead of pure flat averaging
+ 9. `/memory/search` diagnostics now include a governance rollup over the visible results so search consumers can quickly see how governance state is affecting the returned set.
+ - this now includes both overall visible status counts and per-bucket visible rollups
+ 10. Retrieval diagnostics also track governance-suppressed candidates (`superseded` / `duplicate`) so the search response can explain what governance hid before the visible result set was assembled.
+ 11. Suppression diagnostics now include per-bucket breakdowns so operators can see which memory classes are carrying the most governance cleanup pressure.
+ 12. Search diagnostics now also include compact reinforcement rollups so operators can see how much visible retrieval weight is coming from repeated successful experience, overall and by bucket.
+ - retrieval-side reinforcement diagnostics now mirror that with their own per-bucket totals for parity with governance suppression reporting
+ - reinforcement rollups now also expose bounded negative/polarity totals so operators can see when failed experience is dragging visible results downward
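The lexical components of step 3 can be sketched in Python as follows. Only the exact-substring score of `1.0` comes from the docs. These choices are assumptions for illustration:

- the tokenizer
- a token-level longest common subsequence to stand in for "ordered phrase overlap"
- the three-character minimum for prefix matches
- the 0.5/0.3/0.2 weights

The final blend with semantic, reinforcement, promotion, recency and lane signals (step 5) is not shown, because its weights are not documented.

```python
import re

# Hypothetical weights and limits; only the 1.0 substring score is documented.
W_OVERLAP, W_ORDERED, W_PREFIX = 0.5, 0.3, 0.2
MIN_PREFIX_LEN = 3


def _tokens(text: str) -> list:
    return re.findall(r"[a-z0-9]+", text.lower())


def _lcs_len(a: list, b: list) -> int:
    """Length of the longest common (order-preserving) token subsequence."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b):
            cur.append(prev[j] + 1 if x == y else max(prev[j + 1], cur[j]))
        prev = cur
    return prev[-1]


def lexical_score(query: str, text: str) -> float:
    q = query.strip().lower()
    if q and q in text.lower():
        return 1.0  # exact substring hit

    q_tokens, d_tokens = _tokens(query), _tokens(text)
    if not q_tokens or not d_tokens:
        return 0.0
    doc_vocab = set(d_tokens)

    overlap = sum(t in doc_vocab for t in q_tokens) / len(q_tokens)
    ordered = _lcs_len(q_tokens, d_tokens) / len(q_tokens)
    # Prefix credit only for query tokens with no exact match, e.g. "retr" -> "retrieval".
    prefix = sum(
        1
        for t in q_tokens
        if t not in doc_vocab
        and len(t) >= MIN_PREFIX_LEN
        and any(d.startswith(t) for d in doc_vocab)
    ) / len(q_tokens)

    return W_OVERLAP * overlap + W_ORDERED * ordered + W_PREFIX * prefix
```

Prefix credit only applies to tokens without an exact match, so with these weights a partial match peaks at 0.8 and a full substring hit always ranks higher. Tokens that appear in the wrong order also lose some of the ordered-overlap credit.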
 
  Operational limits:
 
- - Semantic fallback now rehydrates any embedded bucket (`knowledge`, `runbooks`, `lessons`) when there are no keyword hits.
+ - Retrieval is still bounded to recent rows per category before ranking, so this is not a full-corpus search engine yet.
  - Default embeddings are local hash vectors (`OCMEMOG_EMBED_MODEL_LOCAL=simple`; legacy alias: `BRAIN_EMBED_MODEL_LOCAL`), which are deterministic but weak.
- - `runbooks`, `lessons`, `directives`, `reflections`, and `tasks` are now included in the default searchable categories and embedding index.
+ - `runbooks`, `lessons`, `directives`, `reflections`, and `tasks` are included in the default searchable categories and embedding index.
+ - Semantic ranking currently depends on the active embedding backend and the bounded candidate window in `vector_index.search_memory()`.
+ - Vector search now supports a lightweight lexical prefilter over the bounded scan window before cosine ranking, which improves relevance without changing the no-ANN local-first design.
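A bounded scan with a lexical prefilter ahead of cosine ranking can be sketched as below. This is not `vector_index.search_memory()` itself. The following are all assumptions:

- the row format `(id, text, vector)`, ordered newest first
- the whitespace-token overlap used as the prefilter
- the fallback to the whole scan window when nothing matches lexically
- every diagnostic key other than `scan_limit` and `prefilter_limit`

```python
import math
from typing import List, Sequence, Tuple

Row = Tuple[str, str, Sequence[float]]  # (row_id, text, embedding), newest first


def _cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search_memory_sketch(
    query: str,
    query_vec: Sequence[float],
    rows: List[Row],
    *,
    scan_limit: int = 500,
    prefilter_limit: int = 50,
    top_k: int = 5,
):
    window = rows[:scan_limit]  # bounded scan over recent rows; no ANN index
    q_tokens = set(query.lower().split())

    # Lexical prefilter: keep the rows with the most query-token overlap.
    scored = [(len(q_tokens & set(row[1].lower().split())), row) for row in window]
    hits = sorted((s for s in scored if s[0] > 0), key=lambda s: s[0], reverse=True)
    candidates = [row for _, row in hits[:prefilter_limit]]
    fallback_used = not candidates
    if fallback_used:
        candidates = list(window)  # no lexical signal: cosine-rank the whole window

    ranked = sorted(
        ((row_id, _cosine(query_vec, vec)) for row_id, _, vec in candidates),
        key=lambda pair: pair[1],
        reverse=True,
    )
    diagnostics = {
        "scan_limit": scan_limit,
        "prefilter_limit": prefilter_limit,
        "candidate_rows": len(candidates),        # hypothetical key name
        "prefilter_fallback_used": fallback_used,  # hypothetical key name
    }
    return ranked[:top_k], diagnostics
```

Cost stays linear in `scan_limit`, which matches the no-ANN, local-first design. The trade-off is that rows outside the recent window are never considered.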
+
+ Queue/async ingest behavior note:
+
+ - the async ingest queue is append-only on disk and processed in bounded batches
+ - malformed queue lines are skipped and acknowledged rather than blocking valid entries behind them
+ - valid payload failures are retried in-queue with a bounded retry counter before eventual drop/ack to avoid permanent poison-pill blockage
+ - operational visibility for these cases remains in queue stats / doctor health rather than crashing the sidecar, and doctor now distinguishes malformed queue damage from retrying poison items
+
+ Promotion decisions now expose a compact explanation object so operator surfaces can render why a candidate was promoted or rejected, what threshold applied, and which bucket was selected.
+ They now also expose a compact verification summary so confidence/threshold semantics are easier to interpret uniformly.
+ Rejected promotions also now distinguish generic-destination low-confidence cases from ordinary below-threshold destination-specific failures.
+
+ To combat long-term memory cruft, promotion decisions now also expose a compact `quality_summary`.
+ That summary is intentionally simple and operator-oriented:
+ - `quality`: low / medium / high
+ - `keep_recommendation`: drop / review / keep
+ - `noise_risk`: high / medium / low
+ - `destination_specificity`: generic / specific
+ - `margin`: confidence minus threshold
+
+ Design intent:
+ - weak generic memories should be visibly low-quality and easier to reject/prune later
+ - more specific, higher-margin memories should be visibly safer to keep
+ - this started as an explainability/control surface, but it now also drives a small active anti-cruft gate
+ - it gives future automation a better input for “only good memories are remembered” without requiring a risky schema or policy rewrite first
+
+ Current active anti-cruft rules:
+ - if a candidate is below the promotion threshold **and** only resolves to the generic `knowledge` destination,
+ it is treated as likely cruft and rejected with `rejected_as_generic_cruft`
+ - if that same low-confidence generic candidate is also textually redundant with existing generic knowledge,
+ it is classified more specifically as `rejected_as_redundant_generic_cruft`
+ - if a candidate resolves to a more specific destination but still falls modestly below threshold, it is now called out as `rejected_as_ambiguous_specific_memory`
+ - why these are the right first rules:
+ - generic low-confidence memories are among the easiest ways for memory stores to accumulate junk
+ - redundant generic memories are even worse because they increase clutter without increasing recall value
+ - weak specific-fit memories are not necessarily junk, but they should be made explicitly reviewable instead of being treated as cleanly trustworthy
+ - the rules are intentionally narrow to reduce surprise and avoid over-pruning while the quality system matures
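The active anti-cruft rules can be read as a short decision function.

Taken from the docs:

- the labels `rejected_as_generic_cruft`, `rejected_as_redundant_generic_cruft` and `rejected_as_ambiguous_specific_memory`
- `margin`, defined as confidence minus threshold

Hypothetical in this Python sketch:

- the 0.1 band for "modestly below threshold"
- the `difflib` similarity cutoff used to decide textual redundancy
- the `promoted` and `rejected_below_threshold` labels

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Iterable

# Hypothetical knobs; the docs only say "modestly below threshold" and "textually redundant".
AMBIGUITY_BAND = 0.1
REDUNDANCY_RATIO = 0.9


@dataclass
class Candidate:
    text: str
    confidence: float
    destination: str  # "knowledge" is the generic bucket; e.g. "runbooks" is specific


def _normalize(text: str) -> str:
    return " ".join(text.lower().split())


def _is_redundant(text: str, existing_generic: Iterable[str]) -> bool:
    norm = _normalize(text)
    return any(
        SequenceMatcher(None, norm, _normalize(other)).ratio() >= REDUNDANCY_RATIO
        for other in existing_generic
    )


def promotion_decision(candidate: Candidate, threshold: float, existing_generic: Iterable[str]) -> dict:
    margin = candidate.confidence - threshold  # documented as confidence minus threshold
    generic = candidate.destination == "knowledge"

    if margin >= 0:
        decision = "promoted"  # hypothetical label
    elif generic:
        decision = (
            "rejected_as_redundant_generic_cruft"
            if _is_redundant(candidate.text, existing_generic)
            else "rejected_as_generic_cruft"
        )
    elif margin >= -AMBIGUITY_BAND:
        decision = "rejected_as_ambiguous_specific_memory"
    else:
        decision = "rejected_below_threshold"  # hypothetical label for ordinary misses

    return {
        "decision": decision,
        "quality_summary": {
            "destination_specificity": "generic" if generic else "specific",
            "margin": round(margin, 4),
        },
    }
```

The redundancy check runs only on the already-rejected generic branch. That keeps the comparison cost off the common promotion path.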
 
  ## Write paths
 
@@ -84,6 +143,26 @@ Known caveat:
 
  ## Sidecar contract
 
+ The sidecar exposes a compact runtime summary in route payloads so operators can quickly tell whether the sidecar is in ready/degraded mode, which embedding provider path is active, which local embedding model is configured, whether hash-embedding fallback is in effect, what the current queue health snapshot looks like, and how much compatibility residue remains.
+ That queue snapshot now includes lightweight severity/hints so the normal runtime payload carries some operational judgment instead of raw counters only.
+ It also now distinguishes invalid queue lines from retrying payloads, which brings a compact slice of doctor-style queue diagnosis into ordinary runtime payloads.
+ To reduce translation friction with doctor output, the runtime queue snapshot now also carries doctor-style aliases such as `queue_depth` and `queue_backlog_severity`.
+ It now also carries compact worker-config issue reporting so invalid poll/batch settings can surface in normal runtime payloads without a separate doctor run.
+ As part of the runtime-summary consistency pass, the main operator-facing sub-blocks now expose a small shared shape (`enabled`, `status`, `issues`) where it fits, which makes the overall summary easier to consume uniformly.
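To make the shape of `runtimeSummary.queue` concrete, here is a Python sketch of how such a snapshot could be assembled.

These field names come from the docs:

- `severity` and `hints`
- `invalid_lines`, `retrying_lines` and `max_retry_seen`
- `config_issues`
- the doctor-style aliases `queue_depth` and `queue_backlog_severity`
- the shared `enabled`, `status` and `issues` shape

These parts are assumptions:

- the compact `depth` key
- the depth thresholds
- the hint wording
- the config-issue codes
- the `ok`/`degraded` status values
- the escalation rules

```python
from typing import List

SEVERITY_RANK = {"ok": 0, "warn": 1, "high": 2}


def _worse(current: str, candidate: str) -> str:
    return candidate if SEVERITY_RANK[candidate] > SEVERITY_RANK[current] else current


def queue_summary(*, depth: int, invalid_lines: int, retrying_lines: int,
                  max_retry_seen: int, worker_running: bool, poll_seconds: float,
                  batch_size: int, warn_depth: int = 100, high_depth: int = 1000) -> dict:
    hints: List[str] = []
    config_issues: List[str] = []

    # Backlog severity depends on depth alone (hypothetical thresholds).
    backlog = "high" if depth >= high_depth else "warn" if depth >= warn_depth else "ok"
    severity = backlog
    if backlog != "ok":
        hints.append("queue backlog is growing; check worker throughput")
    if depth > 0 and not worker_running:
        severity = _worse(severity, "high")
        hints.append("items are queued but the worker is not running")
    if invalid_lines:
        severity = _worse(severity, "warn")
        hints.append("malformed queue lines were skipped; inspect the queue file")
    if retrying_lines:
        severity = _worse(severity, "warn")
        hints.append("payloads are being retried; possible poison items")
    if poll_seconds <= 0:
        config_issues.append("invalid_poll_interval")
    if batch_size <= 0:
        config_issues.append("invalid_batch_size")
    if config_issues:
        severity = _worse(severity, "warn")
        hints.append("queue worker config is invalid")

    return {
        # Shared operator shape used across runtimeSummary sub-blocks.
        "enabled": True,
        "status": "ok" if severity == "ok" else "degraded",
        "issues": list(config_issues),
        # Compact queue fields.
        "depth": depth,
        "severity": severity,
        "hints": hints,
        "invalid_lines": invalid_lines,
        "retrying_lines": retrying_lines,
        "max_retry_seen": max_retry_seen,
        "config_issues": config_issues,
        # Doctor-style aliases.
        "queue_depth": depth,
        "queue_backlog_severity": backlog,
    }
```

Keeping backlog severity separate from overall severity lets the doctor-style alias report depth pressure alone. Meanwhile, `severity` also reflects worker, corruption and config problems.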
+
+ Governance review summary responses now also expose lightweight diagnostics so operators can tell whether they are seeing cached data, how many review items are present, and how the queue splits across review kinds without scraping the full list.
+ Governance queue responses now also expose lightweight queue diagnostics so operators can quickly see item counts plus bucket/kind/priority-label breakdowns.
+ Governance auto-resolve responses now also expose lightweight diagnostics so operators can quickly see action totals plus reason/kind breakdowns and the active policy profile.
+ Governance audit responses now also expose lightweight diagnostics so operators can quickly see audit item totals plus event/status breakdowns.
+ Governance rollback responses now also expose lightweight diagnostics so operators can quickly see whether rollback succeeded and how the outcome was classified.
+ Governance queue items now also carry compact explanation blocks so queue surfaces and review surfaces are more aligned in how they present rationale.
+ Queue items now also share the same normalized priority-label vocabulary as review items, reducing operator/UI translation work.
+
+ Individual governance review items now also carry a compact explanation object so operator surfaces can render human-readable rationale and status context without reverse-engineering the raw review payload.
+ The governance review flow is now framed as apply/dismiss plus optional auto-apply, rather than requiring a dashboard-bound human approval step for routine cases.
+
+ Review items and review-summary diagnostics now also expose normalized priority labels so operator surfaces can reason about urgency without inventing their own bucket thresholds.
+
  The sidecar exposes:
 
  - `GET /healthz`
@@ -17,6 +17,9 @@ The release gate is now codified by:
  ## Validation
  - [ ] Install test deps for sidecar route tests: `python3 -m pip install -r requirements-test.txt`
  - [ ] `./scripts/ocmemog-release-check.sh`
+ - [ ] If prompt-time hydration behavior changed, validate the plugin gating path too (for example `node --test tests/test_auto_hydration_agent_scope.ts`) so agent-scoped `before_prompt_build` controls are covered
+ - [ ] If runtime/operator summary surfaces changed, validate the targeted runtime parity tests too (for example `tests/test_namespace_compat.py`) so `runtimeSummary` queue / embedding / auto-hydration blocks stay aligned
+ - [ ] If promotion/retention behavior changed, validate targeted promotion tests (for example `tests/test_profile_buckets.py`) and verify docs still reflect current anti-cruft gates and quality signals, including redundant-generic and ambiguous-specific rejection behavior when applicable
  - [ ] Verify `tests/test_doctor.py` still passes for doctor health surfaces if you changed check coverage
  - [ ] Verify `reports/release-gate-proof.json` exists after a passing gate and documents:
  - live ingest/search/get/hydrate verification
@@ -34,6 +37,8 @@ GitHub CI runs the same release check command so local and CI validation remain
  - [ ] Verify optional prereq install path is documented correctly
  - [ ] Verify LaunchAgent load path still matches repo scripts
  - [ ] Verify sidecar health check passes after install
+ - [ ] Verify any new plugin env controls are documented in README/usage/release notes (for example `OCMEMOG_AUTO_HYDRATION_ALLOW_AGENT_IDS` / `OCMEMOG_AUTO_HYDRATION_DENY_AGENT_IDS`)
+ - [ ] Verify README/usage/release notes still describe the current operator/runtime surfaces (`runtimeSummary`, `searchDiagnostics`, queue snapshot, hydration policy route) accurately after release-bound changes
 
  ## Public artifacts
  - [ ] Push `main`
package/docs/usage.md CHANGED
@@ -81,6 +81,13 @@ Default state location in this repo is `.ocmemog-state/`.
 
  On shutdown, set `OCMEMOG_SHUTDOWN_DRAIN_QUEUE=true` to synchronously flush queued ingest entries before exit. This is useful for short-running deployments and tests that expect strong delivery guarantees.
 
+ Queue behavior notes:
+ - malformed queue lines are now treated as durable queue errors and skipped/acknowledged so a single bad payload does not block later valid work
+ - valid payload failures are retried with a bounded in-queue retry marker (`_ocmemog_retry_count`) instead of blocking forever on the first poison item
+ - `OCMEMOG_INGEST_MAX_RETRIES` controls how many failed attempts a queued payload gets before it is dropped and recorded as a retry-exhausted error
+ - runtime queue stats keep the last queue parse/retry error visible via `QUEUE_STATS["last_error"]`
+ - `ocmemog-doctor` queue health now distinguishes invalid queue lines from retrying payloads so operators can tell parsing damage apart from poison-item retries
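The skip-malformed and bounded-retry behavior described above can be sketched in Python as a single batch pass over queue lines. The `_ocmemog_retry_count` marker and the `OCMEMOG_INGEST_MAX_RETRIES` variable come from these notes. The following are assumptions:

- the default of 3 retries
- the batch size
- the `stats` dictionary layout, loosely modeled on `QUEUE_STATS["last_error"]`
- treating non-object JSON as malformed
- re-appending retried payloads to the end of the queue

```python
import json
import os
from typing import Callable, List, Optional, Tuple

RETRY_KEY = "_ocmemog_retry_count"


def process_queue_batch(
    lines: List[str],
    handler: Callable[[dict], None],
    *,
    batch_size: int = 10,
    max_retries: Optional[int] = None,
    stats: Optional[dict] = None,
) -> Tuple[List[str], dict]:
    """Process one bounded batch; return (lines still queued, stats)."""
    if max_retries is None:
        # The default of 3 is an assumption; the docs only name the variable.
        max_retries = int(os.environ.get("OCMEMOG_INGEST_MAX_RETRIES", "3"))
    if stats is None:
        stats = {"processed": 0, "invalid": 0, "dropped": 0, "last_error": None}

    batch, rest = lines[:batch_size], lines[batch_size:]
    requeued: List[str] = []
    for line in batch:
        error = None
        try:
            payload = json.loads(line)
            if not isinstance(payload, dict):
                error = "invalid queue line: not a JSON object"
        except json.JSONDecodeError as exc:
            error = f"invalid queue line: {exc}"
        if error:
            # Acknowledge and skip so one bad line cannot block later entries.
            stats["invalid"] += 1
            stats["last_error"] = error
            continue

        try:
            handler(payload)
            stats["processed"] += 1
        except Exception as exc:
            attempts = int(payload.get(RETRY_KEY, 0)) + 1
            if attempts >= max_retries:
                # Retry budget exhausted: drop (acknowledge) and record the error.
                stats["dropped"] += 1
                stats["last_error"] = f"retry exhausted after {attempts} attempts: {exc}"
            else:
                payload[RETRY_KEY] = attempts
                requeued.append(json.dumps(payload))

    # Retried payloads go behind work that has not been tried yet.
    return rest + requeued, stats
```

Storing the retry count inside the payload keeps the queue append-only. No side table is needed to remember how often an item has failed.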
+
  ## Plugin API
 
  Health:
@@ -191,12 +198,75 @@ Notes:
  - Valid sidecar categories today are `knowledge`, `reflections`, `directives`, `tasks`, `runbooks`, and `lessons`.
  - `/memory/get` currently expects a `table:id` reference.
  - Runtime degradation is reported in every sidecar response.
+ - Sidecar responses now also include `runtimeSummary`, a compact operator-facing summary of runtime mode, embedding provider, local embedding model, embedding path readiness/fallback state, queue health snapshot, shim surface count, and missing dependency count.
+ - `runtimeSummary.queue` now includes lightweight operational judgment too: `severity` (`ok|warn|high`) plus short `hints` for backlog/worker/error situations.
+ - `runtimeSummary.queue` now also distinguishes `invalid_lines`, `retrying_lines`, and `max_retry_seen`, so normal runtime payloads can hint at queue corruption vs poison-item retry churn without a full doctor pass.
+ - For parity with doctor-style outputs, `runtimeSummary.queue` now also exposes `queue_depth` and `queue_backlog_severity` aliases alongside the compact fields.
205
+ - `runtimeSummary.queue.config_issues` now surfaces compact worker-config validation problems (for example invalid poll interval or batch size), plus a hint when config is invalid.
206
+ - For consistency across runtime summary sub-blocks, `queue`, `embedding_path_summary`, and `auto_hydration` now all expose a small common shape: `enabled`, `status`, and `issues`.
207
+ - Prompt-time auto-hydration can now be scoped per OpenClaw agent via plugin env vars:
208
+ - `OCMEMOG_AUTO_HYDRATION_ALLOW_AGENT_IDS=agent-a,agent-b`
209
+ - `OCMEMOG_AUTO_HYDRATION_DENY_AGENT_IDS=agent-x`
210
+ - ingest/checkpoint hooks remain global; only `before_prompt_build` hydration is agent-scoped
211
+ - the active auto-hydration policy is surfaced in `runtimeSummary.auto_hydration`
212
+ - plugin-side decision reasons now distinguish `disabled_globally`, `denied_by_agent_id`, `not_in_allowlist`, `allowed_by_allowlist`, and `allowed_globally` for easier debugging
213
+ - plugin logs now include structured decision context for both skipped and applied prompt hydration, including agent id, reason, and prepend sizes
214
+ - `/memory/search` now also returns `searchDiagnostics` with lightweight operator-facing retrieval metadata such as strategy, lane, bucket counts, result counts, query token count, elapsed time, vector-search diagnostics (`scan_limit`, `prefilter_limit`, candidate rows, fallback usage), and an `execution_path` block that clarifies provider-configured vs provider-skipped vs local-fallback-expected vs route-exception-fallback behavior.
215
+ - `searchDiagnostics.execution_path` now also promotes key embedding outcome fields (`provider_attempted`, `embedding_generated`, `embedding_path_used`, `local_fallback_used`) so the top-level request summary is easier to scan without drilling into nested vector diagnostics.
216
+ - `searchDiagnostics.vector_search.embedding` now carries per-request embedding execution details such as whether a provider was attempted, whether local fallback was actually used, what path won (`provider`, `local_simple`, `local_model`), and whether an embedding was generated at all.
217
+ - `searchDiagnostics` now also includes `governance_rollup` so operators can quickly see visible result status counts, how many returned items still need governance review, and per-bucket visible rollups for categories such as `knowledge`, `runbooks`, or `lessons`.
218
+ - `searchDiagnostics.retrieval_governance` now reports how many candidates were hidden before return because governance marked them `superseded` or `duplicate`, including per-bucket breakdowns such as `knowledge`, `runbooks`, or `lessons`.
219
+ - `searchDiagnostics.reinforcement_rollup` and `searchDiagnostics.retrieval_reinforcement` now summarize visible reinforcement pressure and retrieval-side reinforcement totals, including per-bucket visible and retrieval-side reinforcement counts.
220
+ - Reinforcement rollups now also include negative/polarity totals (`negative_reinforcement_result_count`, `total_negative_penalty`) so operators can see when failed experience is actively depressing the visible result set.
221
+ - Retrieval results now include a compact `governance_summary` alongside the full governance payload so dashboards/operators can quickly see status, canonical/relationship references, contradiction count, and `needs_review` without unpacking the full provenance structure.
222
+ - `/memory/governance/review/summary` now returns `reviewDiagnostics` so operators can see cache hit/freshness, item count, kind breakdown, and active filters without inferring from the raw item list.
223
+ - `/memory/governance/review` items now include an `explanation` block with a short human-facing rationale plus source/target memory status, so dashboards and operators do not have to reconstruct meaning from raw fields alone.
224
+ - Governance review items now also include a normalized `priority_label` (`none|low|medium|high|critical`), and review summary diagnostics include `priority_label_counts` for quick operator triage.
225
+ - Governance review actions are now modeled as `apply` / `dismiss` rather than as human approvals, and `/memory/governance/review/auto_apply` can apply current review items directly without dashboard or user approval input.
226
+ - `/memory/governance/queue` now returns `queueDiagnostics` so operators can see item count plus bucket/kind/priority-label breakdowns without scanning the full queue manually.
227
+ - Governance queue items now also include an `explanation` block with short human-facing rationale and target-reference context, so queue consumers do not have to reconstruct meaning from raw kind/priority fields alone.
228
+ - Governance queue items now also carry the same normalized `priority_label` (`none|low|medium|high|critical`) used by governance review items.
229
+ - `/memory/governance/auto_resolve` now returns `autoResolveDiagnostics` so operators can see action counts plus reason/kind breakdowns and the active policy profile without unpacking the full action list manually.
230
+ - `/memory/governance/audit` now returns `auditDiagnostics` so operators can quickly see audit item counts plus event/status breakdowns without scanning the raw log-derived entries manually.
231
+ - `/memory/governance/rollback` now returns `rollbackDiagnostics` so operators can quickly see whether rollback succeeded and how the outcome was classified.
232
+ - `/memory/auto_hydration/policy` accepts an `agent_id` and returns the current prompt-time hydration decision (`allowed`, `reason`, allowlist, denylist, and scoping state) so agent-specific continuity policy can be debugged from the sidecar.
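The agent-scoping rules listed above resolve in a fixed order: the global `OCMEMOG_AUTO_HYDRATION` switch, then the denylist, then the allowlist. Below is a minimal Python sketch of that precedence, mirroring the order of checks in the plugin's TypeScript `getAutoHydrationDecision()` in `index.ts`. The names `auto_hydration_decision` and `parse_agent_ids` and the explicit `env` mapping parameter are assumptions made for illustration; they are not part of the plugin.

```python
from typing import Mapping, Optional

TRUTHY = {"1", "true", "yes"}  # same truthy strings the plugin accepts


def parse_agent_ids(raw: Optional[str]) -> list[str]:
    # Comma-separated IDs, trimmed, empty entries dropped.
    return [part.strip() for part in (raw or "").split(",") if part.strip()]


def auto_hydration_decision(agent_id: Optional[str], env: Mapping[str, str]) -> dict:
    """Decide whether prompt-time hydration applies to one agent (hypothetical sketch)."""
    agent = (agent_id or "").strip() or None
    allow = parse_agent_ids(env.get("OCMEMOG_AUTO_HYDRATION_ALLOW_AGENT_IDS"))
    deny = parse_agent_ids(env.get("OCMEMOG_AUTO_HYDRATION_DENY_AGENT_IDS"))
    enabled = env.get("OCMEMOG_AUTO_HYDRATION", "false").strip().lower() in TRUTHY

    if not enabled:
        allowed, reason = False, "disabled_globally"
    elif agent is not None and agent in deny:
        # Checked before the allowlist, so a denied agent stays denied even if allowlisted.
        allowed, reason = False, "denied_by_agent_id"
    elif allow:
        # A non-empty allowlist excludes every agent not named in it (including no agent id).
        allowed = agent is not None and agent in allow
        reason = "allowed_by_allowlist" if allowed else "not_in_allowlist"
    else:
        allowed, reason = True, "allowed_globally"
    return {
        "enabled": enabled,
        "allowed": allowed,
        "reason": reason,
        "agent_id": agent,
        "allow_agent_ids": allow,
        "deny_agent_ids": deny,
    }
```

Passing `os.environ` as `env` reproduces the process-level behavior; passing a plain dict makes the policy easy to exercise in isolation.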
194
233
 
195
234
  ## What is safe to rely on
196
235
 
197
236
  - `store.init_db()` creates the local schema automatically
237
+ - promotion decisions now return an `explanation` block describing why a candidate was promoted or rejected, what threshold applied, and which destination bucket was chosen
238
+ - promotion decisions now also return a compact `verification_summary` (`status`, `reason`, `confidence`, `threshold`, `margin`) so verification/confidence semantics are easier to interpret consistently
239
+ - rejected promotions now use slightly richer reasons, distinguishing plain below-threshold outcomes from below-threshold generic-destination cases
240
+ - promotion decisions now also return a `quality_summary` designed specifically to fight long-term memory cruft:
241
+ - `quality` (`low|medium|high`)
242
+ - `keep_recommendation` (`drop|review|keep`)
243
+ - `noise_risk` (`high|medium|low`)
244
+ - `destination_specificity` (`generic|specific`)
245
+ - `margin` (confidence minus threshold)
246
+ - practical meaning:
247
+ - low-confidence generic `knowledge` candidates are now explicitly labeled as high-risk noise and recommended for drop
248
+ - stronger, more specific promoted memories are labeled as keep-worthy
249
+ - this does not replace governance/review yet, but it gives operator surfaces and future automation a clearer signal for “remember this” vs “don’t keep this around”
250
+ - the anti-cruft gate is now partially active, not just advisory:
251
+ - low-confidence candidates that only resolve to the generic `knowledge` destination are rejected as likely cruft instead of being treated like ordinary generic rejects
252
+ - this shows up explicitly as `rejected_as_generic_cruft` in both `verification_summary.reason` and `explanation.reason`
253
+ - if the low-confidence generic candidate is also textually redundant with existing generic knowledge, it is now flagged more specifically as `rejected_as_redundant_generic_cruft`
254
+ - `quality_summary.redundant_generic=true` marks this stronger case of near-duplicate generic junk
255
+ - low-confidence candidates that do resolve to a more specific bucket can now still be called out as weak/ambiguous specific memories instead of being lumped into an undifferentiated threshold failure
256
+ - `quality_summary.ambiguous_specific=true` plus `rejected_as_ambiguous_specific_memory` mark these cases for review rather than treating them like generic junk
257
+ - intent: weak generic memories should fail early so they do not accumulate as low-value long-term memory objects, especially when they merely restate generic knowledge that is already kept; weak specific-fit memories, by contrast, stay visible as review-worthy instead of silently blending into ordinary rejects
198
258
  - `retrieval.retrieve_for_queries()` is the main sidecar search path
199
- - `vector_index.search_memory()` provides a semantic fallback over `knowledge`, `runbooks`, `lessons`, `directives`, `reflections`, and `tasks` when keyword retrieval misses
259
+ - search is hybrid-ranked, not substring-only:
260
+ - lexical scoring blends exact match, token overlap, ordered phrase overlap, and light prefix matching
261
+ - semantic scoring comes from `vector_index.search_memory()` across the selected embedded categories
262
+ - final ranking also considers reinforcement history, promotion confidence, recency, and optional lane bonuses
263
+ - reinforcement is now frequency-aware rather than flat-average only; repeated successful experiences increase strength up to a bounded cap and expose `reinforcement_count` in retrieval signals
264
+ - reinforcement is also recency-aware: newer successful experiences count more than stale ones, and retrieval signals now expose `reinforcement_weighted_count`
265
+ - negative reinforcement is now modeled explicitly with bounded penalties; retrieval signals expose `reinforcement_negative_count` and `reinforcement_negative_penalty`
266
+ - `vector_index.search_memory()` remains a bounded semantic scan rather than a full ANN index
267
+ - it now supports a lightweight lexical prefilter before cosine ranking
268
+ - `OCMEMOG_SEARCH_VECTOR_SCAN_LIMIT` bounds the candidate window
269
+ - `OCMEMOG_SEARCH_VECTOR_PREFILTER_LIMIT` bounds the lexically-biased shortlist used before cosine scoring
200
270
  - `probe_runtime()` exposes missing shim replacements and optional embedding warnings
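The anti-cruft promotion gate described above can be sketched as a small classifier over confidence, threshold, and destination bucket. This is a hypothetical illustration, not ocmemog's implementation: the reason strings and summary field names come from the notes, but the threshold is a caller-supplied parameter, redundancy is passed in as a precomputed flag, the `promoted` reason label is a placeholder, and the mapping to `quality`/`keep_recommendation`/`noise_risk` is an assumption. It also treats every below-threshold candidate as low-confidence, so it does not model the separate plain below-threshold reason.

```python
from dataclasses import dataclass

GENERIC_DESTINATION = "knowledge"  # the notes name `knowledge` as the generic bucket


@dataclass
class Candidate:
    confidence: float
    destination: str  # e.g. "knowledge", "runbooks", "lessons"
    redundant: bool = False  # assumption: caller already compared it with kept generic knowledge


def promotion_decision(c: Candidate, threshold: float) -> dict:
    """Hypothetical anti-cruft gate; the quality mapping below is illustrative only."""
    margin = round(c.confidence - threshold, 4)  # confidence minus threshold
    generic = c.destination == GENERIC_DESTINATION
    if margin >= 0:
        reason = "promoted"  # placeholder label, not taken from ocmemog
        quality, keep, noise = ("medium", "review", "medium") if generic else ("high", "keep", "low")
    elif generic:
        # Weak generic memories fail early; near-duplicates get the more specific reason.
        reason = "rejected_as_redundant_generic_cruft" if c.redundant else "rejected_as_generic_cruft"
        quality, keep, noise = "low", "drop", "high"
    else:
        # Weak but specific-fit memories stay visible for review instead of being dropped.
        reason = "rejected_as_ambiguous_specific_memory"
        quality, keep, noise = "low", "review", "medium"
    return {
        "verification_summary": {
            "status": "promoted" if margin >= 0 else "rejected",
            "reason": reason,
            "confidence": c.confidence,
            "threshold": threshold,
            "margin": margin,
        },
        "quality_summary": {
            "quality": quality,
            "keep_recommendation": keep,
            "noise_risk": noise,
            "destination_specificity": "generic" if generic else "specific",
            "margin": margin,
            "redundant_generic": generic and margin < 0 and c.redundant,
            "ambiguous_specific": (not generic) and margin < 0,
        },
    }
```

For example, `promotion_decision(Candidate(0.4, "knowledge", redundant=True), threshold=0.6)` yields `rejected_as_redundant_generic_cruft` with a `drop` recommendation, while the same confidence routed to `runbooks` is flagged as an ambiguous specific memory for review.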
201
271
 
202
272
  ## What is not safe to rely on yet
package/index.ts CHANGED
@@ -12,9 +12,6 @@ type PluginConfig = {
12
12
  token?: string;
13
13
  };
14
14
 
15
- const AUTO_HYDRATION_ENABLED = ["1", "true", "yes"].includes(
16
- String(process.env.OCMEMOG_AUTO_HYDRATION ?? "false").trim().toLowerCase(),
17
- );
18
15
  const DURABLE_OUTBOX_ENABLED = !["0", "false", "no"].includes(
19
16
  String(process.env.OCMEMOG_DURABLE_OUTBOX ?? "true").trim().toLowerCase(),
20
17
  );
@@ -521,6 +518,83 @@ function buildTurnMetadata(message: unknown, ctx: { agentId?: string; sessionKey
521
518
  };
522
519
  }
523
520
 
521
+ function autoHydrationEnabled(): boolean {
522
+ return ["1", "true", "yes"].includes(String(process.env.OCMEMOG_AUTO_HYDRATION ?? "false").trim().toLowerCase());
523
+ }
524
+
525
+ function parseAgentIdList(raw: string | undefined): string[] {
526
+ return String(raw ?? "")
527
+ .split(",")
528
+ .map((value) => value.trim())
529
+ .filter(Boolean);
530
+ }
531
+
532
+ export function getAutoHydrationDecision(agentId?: string): {
533
+ enabled: boolean;
534
+ allowed: boolean;
535
+ reason:
536
+ | 'disabled_globally'
537
+ | 'denied_by_agent_id'
538
+ | 'not_in_allowlist'
539
+ | 'allowed_by_allowlist'
540
+ | 'allowed_globally';
541
+ agentId?: string;
542
+ allowAgentIds: string[];
543
+ denyAgentIds: string[];
544
+ } {
545
+ const normalized = String(agentId ?? '').trim() || undefined;
546
+ const allowAgentIds = parseAgentIdList(process.env.OCMEMOG_AUTO_HYDRATION_ALLOW_AGENT_IDS);
547
+ const denyAgentIds = parseAgentIdList(process.env.OCMEMOG_AUTO_HYDRATION_DENY_AGENT_IDS);
548
+ if (!autoHydrationEnabled()) {
549
+ return {
550
+ enabled: false,
551
+ allowed: false,
552
+ reason: 'disabled_globally',
553
+ agentId: normalized,
554
+ allowAgentIds,
555
+ denyAgentIds,
556
+ };
557
+ }
558
+ if (normalized && denyAgentIds.includes(normalized)) {
559
+ return {
560
+ enabled: true,
561
+ allowed: false,
562
+ reason: 'denied_by_agent_id',
563
+ agentId: normalized,
564
+ allowAgentIds,
565
+ denyAgentIds,
566
+ };
567
+ }
568
+ if (allowAgentIds.length > 0) {
569
+ const allowed = Boolean(normalized && allowAgentIds.includes(normalized));
570
+ return {
571
+ enabled: true,
572
+ allowed,
573
+ reason: allowed ? 'allowed_by_allowlist' : 'not_in_allowlist',
574
+ agentId: normalized,
575
+ allowAgentIds,
576
+ denyAgentIds,
577
+ };
578
+ }
579
+ return {
580
+ enabled: true,
581
+ allowed: true,
582
+ reason: 'allowed_globally',
583
+ agentId: normalized,
584
+ allowAgentIds,
585
+ denyAgentIds,
586
+ };
587
+ }
588
+
589
+ export function formatAutoHydrationDecisionLog(decision: ReturnType<typeof getAutoHydrationDecision>): string {
590
+ const agent = decision.agentId ?? '<none>';
591
+ return `agent=${agent} allowed=${String(decision.allowed)} reason=${decision.reason} allow_agents=${decision.allowAgentIds.join('|') || '<all>'} deny_agents=${decision.denyAgentIds.join('|') || '<none>'}`;
592
+ }
593
+
594
+ export function shouldAutoHydrateForAgent(agentId?: string): boolean {
595
+ return getAutoHydrationDecision(agentId).allowed;
596
+ }
597
+
524
598
  function registerAutomaticContinuityHooks(api: OpenClawPluginApi, config: PluginConfig) {
525
599
  void flushOutbox(api, config).catch((error) => {
526
600
  api.logger.warn(`ocmemog durable outbox startup flush failed: ${error instanceof Error ? error.message : String(error)}`);
@@ -562,10 +636,20 @@ function registerAutomaticContinuityHooks(api: OpenClawPluginApi, config: Plugin
562
636
  // failures if a host runtime persists prepended context into transcript history.
563
637
  // Keep the memory backend and sidecar tools active, but only prepend continuity
564
638
  // when explicitly enabled and after the host runtime has been validated.
565
- api.logger.info(`ocmemog auto hydration env raw=${String(process.env.OCMEMOG_AUTO_HYDRATION ?? '<unset>')} computed=${String(AUTO_HYDRATION_ENABLED)}`);
566
- if (AUTO_HYDRATION_ENABLED) {
639
+ const allowAgentIds = parseAgentIdList(process.env.OCMEMOG_AUTO_HYDRATION_ALLOW_AGENT_IDS);
640
+ const denyAgentIds = parseAgentIdList(process.env.OCMEMOG_AUTO_HYDRATION_DENY_AGENT_IDS);
641
+ const hydrationEnabled = autoHydrationEnabled();
642
+ api.logger.info(
643
+ `ocmemog auto hydration env raw=${String(process.env.OCMEMOG_AUTO_HYDRATION ?? '<unset>')} computed=${String(hydrationEnabled)} allow_agents=${allowAgentIds.join('|') || '<all>'} deny_agents=${denyAgentIds.join('|') || '<none>'}`,
644
+ );
645
+ if (hydrationEnabled) {
567
646
  api.on("before_prompt_build", async (event, ctx) => {
568
647
  try {
648
+ const hydrationDecision = getAutoHydrationDecision(ctx.agentId);
649
+ if (!hydrationDecision.allowed) {
650
+ api.logger.info(`ocmemog auto hydration skipped ${formatAutoHydrationDecisionLog(hydrationDecision)}`);
651
+ return;
652
+ }
569
653
  const scope = resolveHydrationScope(event.messages ?? [], ctx);
570
654
  if (!scope.session_id && !scope.thread_id && !scope.conversation_id) {
571
655
  return;
@@ -579,7 +663,7 @@ function registerAutomaticContinuityHooks(api: OpenClawPluginApi, config: Plugin
579
663
  const continuityContext = buildHydrationContext(payload);
580
664
  const prependContext = [briefContext, continuityContext].filter(Boolean).join("\n\n");
581
665
  api.logger.info(
582
- `ocmemog hydration prepend sizes brief=${briefContext.length} continuity=${continuityContext.length} combined=${prependContext.length}`,
666
+ `ocmemog auto hydration applied ${formatAutoHydrationDecisionLog(hydrationDecision)} brief=${briefContext.length} continuity=${continuityContext.length} combined=${prependContext.length}`,
583
667
  );
584
668
  if (!prependContext) {
585
669
  return;
package/ocmemog/doctor.py CHANGED
@@ -412,14 +412,28 @@ def _run_queue_health(_: None) -> CheckResult:
412
412
 
413
413
  invalid = 0
414
414
  total = 0
415
+ retrying = 0
416
+ max_retry_seen = 0
415
417
  invalid_samples: list[dict[str, Any]] = []
418
+ retry_samples: list[dict[str, Any]] = []
416
419
  for raw_line in queue_path.read_text(encoding="utf-8").splitlines():
417
420
  line = raw_line.strip()
418
421
  if not line:
419
422
  continue
420
423
  total += 1
421
424
  try:
422
- json.loads(line)
425
+ payload = json.loads(line)
426
+ if isinstance(payload, dict):
427
+ retry_count = int(payload.get("_ocmemog_retry_count", 0) or 0)
428
+ if retry_count > 0:
429
+ retrying += 1
430
+ max_retry_seen = max(max_retry_seen, retry_count)
431
+ if len(retry_samples) < 3:
432
+ retry_samples.append({
433
+ "line_no": total,
434
+ "retry_count": retry_count,
435
+ "kind": str(payload.get("kind") or payload.get("_ocmemog_task") or ""),
436
+ })
423
437
  except Exception:
424
438
  invalid += 1
425
439
  if len(invalid_samples) < 3:
@@ -430,6 +444,9 @@ def _run_queue_health(_: None) -> CheckResult:
430
444
  if invalid:
431
445
  status = "warn"
432
446
  messages.append(f"Queue has {invalid} invalid line(s).")
447
+ if retrying:
448
+ status = "warn"
449
+ messages.append(f"Queue has {retrying} retrying payload(s) (max retry count {max_retry_seen}).")
433
450
  if depth > 25:
434
451
  status = "warn"
435
452
  messages.append(f"Queue backlog is elevated ({depth}).")
@@ -448,6 +465,8 @@ def _run_queue_health(_: None) -> CheckResult:
448
465
  hints: list[str] = []
449
466
  if invalid > 0:
450
467
  hints.append("Run --fix repair-queue to drop invalid queue entries.")
468
+ if retrying > 0:
469
+ hints.append("Inspect the queue retrying payloads; repeated retries usually indicate a poison item or downstream ingest/postprocess failure.")
451
470
  if depth > 0 and not worker_enabled:
452
471
  hints.append("Enable OCMEMOG_INGEST_ASYNC_WORKER or flush with POST /memory/ingest_flush.")
453
472
  if depth > 1000:
@@ -479,6 +498,8 @@ def _run_queue_health(_: None) -> CheckResult:
479
498
  "queue_depth": depth,
480
499
  "queue_path": str(queue_path),
481
500
  "invalid_lines": invalid,
501
+ "retrying_lines": retrying,
502
+ "max_retry_seen": max_retry_seen,
482
503
  "lines_seen": total,
483
504
  "stats": stats,
484
505
  "queue_bytes": queue_size,
@@ -487,6 +508,7 @@ def _run_queue_health(_: None) -> CheckResult:
487
508
  "queue_worker_batch_max": worker_batch_max,
488
509
  "queue_config_issues": queue_config,
489
510
  "invalid_payload_samples": invalid_samples,
511
+ "retrying_payload_samples": retry_samples,
490
512
  "ingest_worker_running": bool(app._INGEST_WORKER_THREAD and app._INGEST_WORKER_THREAD.is_alive()),
491
513
  "queue_backlog_severity": backlog_severity,
492
514
  "queue_hints": hints,