RubyGems - claude_memory - Versions diffs - 0.13.0 → 0.13.1 - Mend

claude_memory 0.13.0 → 0.13.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (13)

checksums.yaml +4 -4
data/.claude/memory.sqlite3 +0 -0
data/.claude-plugin/marketplace.json +1 -1
data/.claude-plugin/plugin.json +1 -1
data/CHANGELOG.md +10 -0
data/docs/improvements.md +79 -0
data/lib/claude_memory/distill/null_distiller.rb +24 -2
data/lib/claude_memory/hook/context_injector.rb +35 -10
data/lib/claude_memory/observe/reflector.rb +32 -16
data/lib/claude_memory/observe/token_overlap_matcher.rb +55 -0
data/lib/claude_memory/version.rb +1 -1
data/lib/claude_memory.rb +1 -0
metadata +2 -1

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: 93e553fc87bd36ebe27b8709a28dea9bd206a669491327fc8b6340e6078a8f67
-  data.tar.gz: c19722c5028ed5746f58792bb34e57939600deaa0a59b6ff00a421b3ed829bdc
+  metadata.gz: '03389773428d20d899e3320e349cfd6a55e9c92df7b00837cf85466fe0741985'
+  data.tar.gz: 14c91b7778987ffd1acbf11b5b58c4ccf0ed7eda6f187534ea27b4dcc987bf67
 SHA512:
-  metadata.gz: 7bd3a98c636c525e2bfc579e9e265ecef3a07311d4ba543d692abc092ddc0822df86832b9e5b53d802597a6dbe7495aa192210d6d4e5961c21383daac22cef09
-  data.tar.gz: b587239c11c342551f35937ecdbb108254269306fa98fbaac6680d23f8e427a32f26cbb1524c8f19a08da1773e292b6d75df92e101cb724d59fd65017f275832
+  metadata.gz: b16bf521bfd7c496f4723a6029f981121ea2a76643eafb5045ab9528ab51543b47caf7d51474f21d69bb725e9544aa85c295ad303ee94cd7267f3d8cf4233184
+  data.tar.gz: a3d8a65e03d9c62fb7d4e2feb1ec37b32c9513eb13fe3c7c6488922192a842e67ac237562b186278948cb4e8ef9372a8e5f2d63c45ab34e9a6ac4604102ff0d7

data/.claude/memory.sqlite3 CHANGED

Binary file

data/.claude-plugin/marketplace.json CHANGED

@@ -7,7 +7,7 @@
   "plugins": [
     {
       "name": "claude-memory",
-      "version": "0.13.0",
+      "version": "0.13.1",
       "source": "./",
       "description": "Long-term memory for Claude Code. Recalls architecture, conventions, and decisions across sessions, plus an episodic observation log of what happened — so Claude explains your codebase without file traversal, follows your patterns, learns from corrections, and never re-asks what it already learned.",
       "repository": "https://github.com/codenamev/claude_memory"

data/.claude-plugin/plugin.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "claude-memory",
-  "version": "0.13.0",
+  "version": "0.13.1",
   "description": "Long-term memory for Claude Code. Recalls architecture, conventions, and decisions across sessions, plus an episodic observation log of what happened — so Claude explains your codebase without file traversal, follows your patterns, learns from corrections, and never re-asks what it already learned.",
   "author": {
     "name": "Valentino Stoll",

data/CHANGELOG.md CHANGED

@@ -4,6 +4,16 @@ All notable changes to this project will be documented in this file.
 ## [Unreleased]
+## [0.13.1] - 2026-06-23
+Theme: **The observational layer, audited and repaired.** A critical examination of every observation in a real (dogfooding) project DB found the episodic layer was producing ~no useful observations and injecting noise into sessions — three evidence-backed defects, now fixed (the data is in `docs/improvements.md` #72–#75). No schema changes, no breaking changes.
+### Fixed
+- **High-precision Layer-1 observation filter (#74).** The high-recall Observer was scraping code, doc, and transcript fragments past `noise_body?` and injecting them into SessionStart (measured: 38 of 117 obvious-noise rows slipped through — spec fixtures, CHANGELOG table rows, benchmark tree output, even the distiller's own source comments). Strengthened the gate to reject code/JSON `key: "value"`, method calls, table pipes, box-drawing glyphs, `(vector)` labels, and raw JSONL fields, and to require a body to begin like a prose sentence. Verified against the real noise corpus: every sampled fragment now rejected, clean prose decisions/conventions kept.
+- **Corroboration can finally accumulate (#73).** Dedup matched on exact normalized strings, so varied wording of the same event never folded — every observation stayed at `corroboration_count = 1` and the promotion gate could never fire. Replaced exact grouping with greedy clustering over an injected similarity matcher; the default `Observe::TokenOverlapMatcher` (lexical Jaccard, deterministic, free) folds near-duplicates so corroboration climbs toward promotion. Pure synonym paraphrases still need real embeddings (measured: tfidf can't separate them from unrelated text on short bodies) — injectable via the `matcher:` seam.
+- **Observation capture elevated to a first-class SessionStart ask (#72).** Authoring observations was a paragraph buried in the optional deep-distill prompt, which fires almost never (`store_extraction` had zero calls in the layer's lifetime; Layer-1 auto-ingest outnumbered it ~100:1). Decoupled it into its own prominent `## Log What Happened` section. Whether the LLM-authored (Layer-2) path now fires is measurable via the `mcp_extraction` content-item source.
 ## [0.13.0] - 2026-06-18
 Theme: **Episodic memory — a second kind of memory.** ClaudeMemory gains an append-only *observation* layer ("what happened") that complements the semantic fact store ("what is true"), modeled on [Mastra's Observational Memory](docs/influence/mastra-observational-memory.md). Observations accrue automatically, are deduplicated/consolidated by reflection, and are promoted to facts only after corroboration — making repeated sighting an anti-hallucination gate built into the memory model. Schema advances to v20 (additive; no breaking changes to existing facts/queries).

data/docs/improvements.md CHANGED

@@ -504,6 +504,85 @@ Source: `docs/influence/mastra-observational-memory.md` — architecture study o
 ---
+### 71. Exclude the project DB from the published gem (the 28MB gem is almost entirely the ~96MB dogfooding DB, compressed)
+Source: 2026-06-18 live observation while building the 0.13.0 release gem.
+**Problem.** `claude_memory.gemspec` builds its file list from `git ls-files` and rejects `bin/ Gemfile .gitignore .rspec spec/ .github/ .standard.yml` — but **not** `.claude/memory.sqlite3`, which is tracked (per the "always commit the project DB" convention). So the published gem *ships the repo's own dogfooding memory database*: the working-tree DB is ~96MB, compressing to a **28MB gem** (v0.6.0 was 280KB; the gem has been silently growing — 0.9.1 was 19MB — as the DB accumulates). Gem users get nothing from it (they init their own empty DB on install), it bloats every download, and it's trending toward RubyGems' 100MB ceiling.
+**Fix.** Add `.claude/` (or at least `.claude/memory.sqlite3` + WAL/SHM siblings) to the gemspec reject filter. Verify with `gem build` that the gem drops to <1MB and that nothing in the gem actually requires the file at runtime (it shouldn't — runtime opens the *user's* DB path via `Configuration`). Add a spec asserting `Gem::Specification.load(...).files` excludes `.claude/memory.sqlite3` so it can't regress.
+**Why High.** Low effort, high impact: ~28MB → <1MB published gem, and it removes a slow-growing landmine before it actually exceeds the RubyGems size limit and blocks a release. Not introduced by 0.13.0 — pre-existing and compounding.
+**Note on the convention.** This does *not* conflict with "always commit `.claude/memory.sqlite3`" — that's about repo reproducibility for collaborators. Shipping it *in the gem* is a separate, unintended consequence of the `git ls-files` manifest.
+---
+> **Observational-layer audit (2026-06-23).** A critical examination of every observation in this project's DB found the episodic layer is, in practice, producing ~no useful observations and is injecting noise into sessions. Four root causes (#72–#75), each backed by the live data below. Snapshot at audit time: **113 active observations, 0 consolidated, 0 expired, 0 promoted, every `corroboration_count = 1`**; only `decision`/`preference` kinds; every row traces to a `claude_code` transcript on the `observational-layer-*` branches. The count grew 99 → 105 → 113 *during* the audit session — the dogfooding loop is live and compounding. These supersede the optimistic framing of #68; the mechanism is sound, the inputs and the matching are not.
+### 72. Layer-2 (Claude-as-observer) produces **zero** observations — the quality source is silent ⭐
+Source: 2026-06-23 observational-layer audit.
+**🟡 Structural fix shipped 2026-06-23** (option a). Decoupled the observation-capture ask from the buried, rarely-fired deep-distill paragraph into its own prominent SessionStart section (`ContextInjector#format_observation_capture_prompt` — "## Log What Happened"). This maximizes the chance Claude authors observations, but persistence still rides a `store_extraction` tool call, so **effectiveness is not yet proven** — whether Layer-2 now actually fires is measurable via the `mcp_extraction` content-item source and needs real-session validation (and ultimately the #75 eval). Not closing this until that signal turns positive.
+**Problem.** The design's quality observations were always meant to come from **Layer-2** (Claude-as-observer): the SessionStart prompt (`ContextInjector#format_distillation_prompt`) asks Claude to populate the `observations` field of its `memory.store_extraction` call. **The vehicle is dormant.** Evidence (sharpened 2026-06-23 with `mcp_tool_calls.called_at` + `activity_events`):
+- `store_extraction` was invoked **4 times ever — all on 2026-04-17 to 2026-04-30**, i.e. *six-plus weeks before the observational layer (and the `observations` parameter) shipped on 2026-06-16/18*. Those calls **could not** have carried observations; the field didn't exist yet.
+- **Since the layer shipped, `store_extraction` has fired ZERO times.** Layer-2 has never run, not once, in the feature's entire life. (Corroborating: `store_extraction` creates a synthetic `source: "mcp_extraction"` content_item; **0 of the 113 observations trace to `mcp_extraction`** — all to `claude_code` ingest.)
+- **Layer-1 auto-ingest dominates ~100:1**: `activity_events` shows **409 `hook_ingest` vs 4 `store_extraction` total**. Content flows in automatically on every Stop/SessionEnd hook without Claude's involvement, so the Layer-2 deep-distill path — gated on a *fresh session* with *undistilled ≥200-char* items that Claude must *choose* to act on — essentially never triggers.
+**Why this is the highest-leverage finding.** Layer-1 (regex over raw transcript) *cannot* produce episodic narrative; it only scrapes fragments (see #74, which makes those scrapes high-precision but still not narrative). The design delegated quality to Layer-2, but Layer-2 is structurally dormant. It isn't a prompt-wording problem (the `store_extraction` schema *does* expose `observations` with a good description, and the prompt *does* ask); it's that **the path the observations ride on doesn't fire** in normal operation. So the episodic log is, and will remain, 100% Layer-1 scrapes until observation authoring moves onto a path that actually runs.
+**Fix (design fork — not a one-shot code change).** Options:
+- **(a) Author observations on the Layer-1 hook path, but with an LLM.** The hook can't call Claude (no API budget), so this means: have the *next* SessionStart context inject the raw undistilled tail and ask Claude to emit observations directly as part of the normal turn (not gated behind a voluntary `store_extraction` deep-distill). Rides the session, no extra cost — same mechanism the fact-injection uses.
+- **(b) Make Layer-2 fire reliably** — lower the fresh-session/≥200-char gate, or make observation emission a first-class, early, non-optional instruction. Risk: effectiveness is unmeasurable without real A/B sessions, and the headless-recall gap (`project_headless_retrieval_gap.md`) says Claude often won't call MCP tools at all.
+- **(c) Derive observations from what Claude already produces** — the `decisions`/`facts` it extracts are higher-signal than regex scrapes; synthesize observations from those deterministically.
+- Regardless: **add telemetry distinguishing Layer-1 vs Layer-2 observation provenance** so "is Layer-2 firing?" is a dashboard number, not a forensic dig.
+**Cross-links.** Blocks the value premise of #68; #74/#73 only make the Layer-1 floor less bad and the loop functional — neither makes the log *good*. This is the one that does.
+### 73. Observation dedup/corroboration is normalized-**exact**, so the promotion loop can never fire ⭐
+**✅ Shipped 2026-06-23.** Replaced exact-string grouping with greedy clustering over an injected similarity matcher (`Reflector#dedupe_scope`). Default matcher is `Observe::TokenOverlapMatcher` — lexical Jaccard token-overlap (deterministic, free, no embedding dependency), threshold 0.5. Folds the common case (one event re-observed with slightly different wording → corroboration now accumulates and can cross the promotion gate) while keeping unrelated statements apart (Jaccard ~0). **Deliberate limitation, data-driven:** measured that tfidf cosine (0.32) can't separate pure synonym paraphrases from unrelated pairs (0.13) on short bodies, so neither the default lexical matcher nor tfidf folds "use SQLite" vs "chose SQLite" — that needs real embeddings, which can be injected via the `matcher:` seam (any object responding to `similar?(a, b)`) when fastembed is configured.
+Source: 2026-06-23 observational-layer audit.
+**Problem.** `Observe::Reflector#dedupe` folds observations by `group_by { [scope, normalize(body)] }` where `normalize` is just `downcase` + whitespace-collapse + strip. Two observations corroborate **only if their bodies are identical after lowercasing and whitespace normalization**. Real captures of "the same thing" are never identical after normalization: e.g. three of the stored variants, "PreCompact hook set.", "PreCompact hook set — the design's Mastra-token-threshold analog.", and "PreCompact set alongside ingest + sweep.", describe one event but never fold. Result, confirmed in the data: **every observation has `corroboration_count = 1`; 0 consolidated; 0 promoted.** The corroboration gate, the layer's headline anti-hallucination feature, is **dead by construction** on any varied text. It can only fire if the *exact same string* recurs, which regex fragments from different transcript chunks essentially never do.
+**Fix.** Corroboration/dedup must be **semantic**, not exact: reuse the existing embedding stack (`Embeddings` + sqlite-vec) to fold observations above a similarity threshold, or fold on a normalized *subject+kind* key rather than the full body. Until then, the promotion gate provides no value and the "graduate after 2 sightings" story is unsupported. Add a spec that two paraphrases of one event corroborate.
+**Cross-links.** Without this, #72's quality observations still wouldn't promote.
+### 74. Layer-1 Observer ingests code/doc/transcript fragments; `noise_body?` lets ~⅓ through
+**✅ Shipped 2026-06-23** (commit `d81a684`). Strengthened `NOISE_BODY_SIGNATURE` (code/JSON `key: "value"`, method calls, spaced table pipes, box-drawing glyphs, `(vector)` labels, JSONL fields) and added a prose-start requirement. Verified against the audit corpus: all five real noise samples now rejected, clean prose kept. **Residual (not a regression):** *truncated-prose* fragments with no code signature (e.g. "encompasses how to use fr…") can still slip — that's the greedy `.+` capture, and ultimately the Layer-2 question (#72), not the noise filter.
+Source: 2026-06-23 observational-layer audit.
+**Problem.** The Layer-1 Observer runs `decided to (.+)` / `we always|never (.+)` over **raw transcript text**, which on this repo (and any repo whose sessions discuss code) is saturated with trigger phrases inside source, specs, docs, and tool output. The `noise_body?` filter (`NOISE_BODY_SIGNATURE = /\bdef\s|\bclass\s|\bmodule\s|=>|::|","|":\s*"|[{}]|\$\(|&&|\|\|/`) is tuned for code-*syntax* and misses prose/table/transcript fragments. Measured against the live 113: the filter catches **39**, but **38 obvious-noise rows slip through** (≈44% look like noise by a conservative heuristic; manual review puts it higher). Concrete slipped examples actually sitting in the injected log:
+- `[89] decided to use SQLite", kind: "decision", priority: 1) expect(id).to be_a(Integer)…` — a **spec fixture line**.
+- `[104] decided to gate promotion on corroboration" | | Changes | Explicitly…` — a **CHANGELOG table row** (`| |` ≠ `||`, so it dodges the filter).
+- `` [48]–[55] / first-person `we always|never`)… `` — fragments of the **distiller's own source-code comment**.
+- `[99] · (vector) 78 ├─ How frozen_string_literal…` — **benchmark tree output**.
+These are priority-1 `decision` rows, so they *are* injected into Block 1 of SessionStart (observed live in this session's own context) — spending context budget on garbage and risking misdirection (e.g. `[46] decided to use Postgres.`, a fixture string, implying a stack the project doesn't use).
+**Fix.** Make Layer-1 high-precision-or-silent: reject bodies that look like code/markdown/transcript (leading `-`/`#`/`|`, table pipes, `key: "value"` shapes, tree glyphs `├─└─`, `(vector)`, backtick-dense spans, JSONL artifacts) — invert the default from high-recall to high-precision, since the recall here is ~all noise. Pair with the `ContentSanitizer`/Observer border. (This is the P1 item from the 2026-06-18 quality review, now empirically confirmed and worse than estimated.)
+**Cross-links.** Even fully fixed, Layer-1 is a stopgap until #72; together they decide whether the log is signal or noise.
+### 75. The episodic layer has no fair test — this repo is a pathological self-pollution case
+Source: 2026-06-23 observational-layer audit.
+**Problem.** Every observation traces to this project's *own* `claude_code` design transcripts, whose specs literally contain `insert_observation(body: "decided to use SQLite")` and whose docs are full of "decided to…" prose. claude_memory dogfooding on its own repo is the **worst possible self-test** for the Observer — it maximizes trigger-text density and self-ingestion. So the audit above measures *self-pollution*, not the design's ceiling; a normal Rails/Django app would look very different. We currently have **no measurement of the layer's value on a representative project**, and the optimistic compression/promotion story in #68 was never validated.
+**Fix.** Stand up the deferred **LongMemEval-style episodic suite** (#67/#68 medium item) and/or capture a real non-claude_memory project trace as a fixture, and report observation precision (signal vs noise), corroboration/promotion rates, and compression on *that*. Treat "episodic layer adds value" as **unproven** in public materials until this exists (the 0.13.0 blog draft already hedges accordingly). Until then, the self-pollution makes the dashboard Observations panel actively misleading on this repo.
+**Cross-links.** Gates any future episodic value claim; depends on #72–#74 being fixed first to be worth measuring.
+---
 ## Medium Priority
 ### ~~18. Shell Completion for CLI~~ ✅ Implemented 2026-03-20
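Fix #71 above amounts to one more prefix in the gemspec's reject filter. A minimal sketch, assuming the gemspec filters `git ls-files` output by path prefix (the actual gemspec is not part of this diff); `REJECT_PREFIXES` and `gem_files` are hypothetical names:

```ruby
# Hypothetical sketch of the #71 fix. The prefixes are the rejects #71 says
# the gemspec already applies, plus ".claude/" so the dogfooding DB and its
# -wal/-shm siblings stay out of the published gem.
REJECT_PREFIXES = %w[
  bin/ Gemfile .gitignore .rspec spec/ .github/ .standard.yml .claude/
].freeze

# tracked: the repo's tracked paths (in a gemspec, the lines of `git ls-files`).
def gem_files(tracked)
  tracked.reject { |path| path.start_with?(*REJECT_PREFIXES) }
end

tracked = %w[
  lib/claude_memory.rb
  lib/claude_memory/observe/token_overlap_matcher.rb
  .claude/memory.sqlite3
  .claude/memory.sqlite3-wal
  spec/reflector_spec.rb
]
puts gem_files(tracked) # prints only the two lib/ paths
```

A regression spec along the lines #71 describes would assert the same exclusion against the loaded gemspec's `files`.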

data/lib/claude_memory/distill/null_distiller.rb CHANGED

@@ -49,8 +49,24 @@ module ClaudeMemory
         /\bwe\s+(?:should\s+)?(?:always|never)\s+(.+)/i
       ].freeze
-      # Bodies that look like code / JSON / shell rather than a statement.
-      NOISE_BODY_SIGNATURE = /\bdef\s|\bclass\s|\bmodule\s|=>|::|","|":\s*"|[{}]|\$\(|&&|\|\|/
+      # Bodies that look like code / JSON / shell / markup / transcript rather
+      # than a prose statement. High-precision gate: the Layer-1 observer scrapes
+      # raw transcript spans, which on a code-heavy project are dominated by
+      # source, specs, docs, and tool output — none of which are observations.
+      # (2026-06-23 audit, improvements #74: the prior signature let 38/117
+      # obvious-noise rows through — spec fixtures like `kind: "decision"`,
+      # CHANGELOG table rows, benchmark tree output, the distiller's own source
+      # comments — and they were being injected into SessionStart.)
+      NOISE_BODY_SIGNATURE = Regexp.union(
+        /\bdef\s|\bclass\s|\bmodule\s/,                      # Ruby definitions
+        /=>|::|","|":\s*"|[{}]|\$\(|&&|\|\|/,                # code / JSON / shell punctuation
+        /\w+:\s*["\[{\d]/,                                   # code/JSON key: "value" / key: 1 / key: [
+        /\w\(/,                                              # method/function call: expect(, insert_observation(
+        /\s\|\s/,                                            # spaced table pipe (doc / CHANGELOG rows)
+        /[\u{2500}-\u{257f}]/,                               # box-drawing glyphs (tree / benchmark output)
+        /\(vector\)|\(text\)/,                               # benchmark mode labels
+        /parentUuid|isSidechain|toolUseID|hookName|"type":/  # raw JSONL transcript fields
+      )
       def distill(text, content_item_id: nil)
         entities = extract_entities(text)
@@ -166,7 +182,13 @@ module ClaudeMemory
         (s[/\A.{0,240}?[.!?](?=\s|\z)/m] || s[0, 240]).to_s.strip
       end
+      # A usable observation reads as a prose sentence. Reject anything that
+      # doesn't begin like one (leading /, |, ·, or box-drawing glyphs from a
+      # code comment or tool output) or that carries a code/markup/transcript
+      # signature.
       def noise_body?(body)
+        return true unless body.match?(/\A[A-Za-z]/)
         body.match?(NOISE_BODY_SIGNATURE)
       end
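To experiment with the strengthened gate outside the gem, here is a standalone copy of `NOISE_BODY_SIGNATURE` and `noise_body?` from the hunk above, run against bodies modeled on the #74 audit samples. The sample strings are illustrative reconstructions of what a `decided to (.+)` capture would leave, not rows from the real DB:

```ruby
# Standalone copy of the 0.13.1 Layer-1 noise gate from
# ClaudeMemory::Distill::NullDistiller, for experimenting outside the gem.
NOISE_BODY_SIGNATURE = Regexp.union(
  /\bdef\s|\bclass\s|\bmodule\s/,                      # Ruby definitions
  /=>|::|","|":\s*"|[{}]|\$\(|&&|\|\|/,                # code / JSON / shell punctuation
  /\w+:\s*["\[{\d]/,                                   # key: "value" / key: 1 / key: [
  /\w\(/,                                              # method/function call
  /\s\|\s/,                                            # spaced table pipe
  /[\u{2500}-\u{257f}]/,                               # box-drawing glyphs
  /\(vector\)|\(text\)/,                               # benchmark mode labels
  /parentUuid|isSidechain|toolUseID|hookName|"type":/  # raw JSONL transcript fields
)

# true = reject: the body does not open like a sentence, or looks like code/markup.
def noise_body?(body)
  return true unless body.match?(/\A[A-Za-z]/)
  body.match?(NOISE_BODY_SIGNATURE)
end

# Illustrative bodies modeled on the #74 examples, with the expected verdict.
SAMPLES = {
  'use SQLite", kind: "decision", priority: 1) expect(id).to be_a(Integer)' => true, # spec fixture
  'gate promotion on corroboration" | | Changes | Explicitly' => true,               # CHANGELOG row
  "· (vector) 78 ├─ How frozen_string_literal" => true,                               # benchmark tree
  "keep the Reflector shell-side so the sweep costs no API budget." => false         # prose
}.freeze

SAMPLES.each do |body, expected|
  verdict = noise_body?(body)
  puts "#{verdict == expected ? "ok" : "MISMATCH"}  noise=#{verdict}  #{body}"
end
```

The tree-output sample is rejected by the prose-start check alone: it opens with `·`, so the signature is never consulted.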

data/lib/claude_memory/hook/context_injector.rb CHANGED

@@ -73,7 +73,12 @@ module ClaudeMemory
         if fresh_session?
           undistilled = fetch_undistilled(MAX_UNDISTILLED)
-          sections << format_distillation_prompt(undistilled) if undistilled.any?
+          if undistilled.any?
+            sections << format_distillation_prompt(undistilled)
+            # The episodic-capture ask is its own prominent section (#72), not a
+            # buried paragraph inside the deep-distill prompt.
+            sections << format_observation_capture_prompt
+          end
           promotion = fetch_promotion_candidates(MAX_PROMOTION_CANDIDATES)
           sections << format_observation_reflection(promotion) if promotion.any?
@@ -247,15 +252,7 @@ module ClaudeMemory
           "in the object (e.g., \"… because …\", \"… so that …\", \"caused by …\",",
           "\"breaks when …\"). A fact with a reason is recoverable once stale; a",
           "bare conclusion is dead weight. Prefer one fact-with-reason over two",
-          "facts-without.",
-          "",
-          "**Also log what happened (episodic layer):** in the same",
-          "`memory.store_extraction` call, populate `observations` — one per",
-          "discrete event (a decision made, a preference stated, a notable action",
-          "or outcome). Each: a concise `body` of what happened, a `kind`",
-          "(decision/preference/event/…), and a reason for decisions/preferences.",
-          "Observations record \"what happened\"; facts record \"what is true\". They",
-          "accumulate, and a corroborated observation can later graduate into a fact."
+          "facts-without."
         ]
         items.each do |item|
@@ -269,6 +266,34 @@ module ClaudeMemory
         lines.join("\n")
       end
+      # First-class, standalone ask for the episodic layer (#72). Authoring
+      # observations was previously a paragraph buried inside the optional
+      # deep-distill flow above, and that flow fires almost never — so the
+      # episodic log was 100% Layer-1 scrapes. This decouples it: a prominent,
+      # lightweight instruction to log "what happened" directly, the same way
+      # the fact context rides the session. Effectiveness is measurable via the
+      # `mcp_extraction` content-item source (Layer-2) vs `claude_code` (Layer-1).
+      def format_observation_capture_prompt
+        <<~PROMPT.strip
+          ## Log What Happened (episodic memory)
+          Record the recent narrative as **observations** — "what happened",
+          complementing the facts above ("what is true"). For each discrete
+          event in the recent work above (a decision made, a preference stated,
+          a notable fix or outcome), call `memory.store_extraction` with an
+          `observations` array — one entry per event:
+          - `body`: one concise sentence of what happened (embed a reason for
+            decisions/preferences — "… because …", "… so that …")
+          - `kind`: `decision`, `preference`, or `event`
+          - `priority`: 1 important, 2 maybe, 3 info
+          Keep it to genuine events worth remembering — skip routine steps and
+          code output. Observations accumulate and a corroborated one graduates
+          into a fact. Send them with the facts in the same call, or on their own.
+        PROMPT
+      end
       def format_section(title, items)
         items = items.compact.uniq
         return nil if items.empty?
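The new method's comment says Layer-2 effectiveness is measurable by content-item source (`mcp_extraction` for Layer-2, `claude_code` for Layer-1), which is also the provenance telemetry #72 asks for. A minimal sketch of that tally, assuming observation rows have already been joined to their content item's source under a `:source` key; `LAYER_BY_SOURCE` and `layer_split` are hypothetical, not part of the gem:

```ruby
# Hypothetical mapping, per the 0.13.1 notes: mcp_extraction => Layer-2,
# claude_code => Layer-1. Anything else is bucketed as :other.
LAYER_BY_SOURCE = {
  "mcp_extraction" => :layer2, # Claude-as-observer via memory.store_extraction
  "claude_code" => :layer1     # regex Observer over hook-ingested transcripts
}.freeze

# rows: observation hashes carrying their content item's source (assumed :source key).
# Returns per-layer counts and the fraction of observations authored by Layer-2.
def layer_split(rows)
  counts = Hash.new(0)
  rows.each { |row| counts[LAYER_BY_SOURCE.fetch(row[:source], :other)] += 1 }
  total = counts.values.sum
  share = total.zero? ? 0.0 : counts[:layer2].fdiv(total)
  {counts: counts, layer2_share: share}
end

# The #72 audit snapshot: 0 of 113 observations trace to mcp_extraction.
audit_snapshot = Array.new(113) { {source: "claude_code"} }
puts layer_split(audit_snapshot)[:layer2_share] # prints 0.0
```

Surfacing `layer2_share` over time turns "is Layer-2 firing?" into a single number rather than a forensic query.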

data/lib/claude_memory/observe/reflector.rb CHANGED

@@ -11,8 +11,12 @@ module ClaudeMemory
     # timer (Claude Code has no cron hook) and without extra API cost.
     #
     # Two passes, both provenance-preserving (tombstone, never hard-delete):
-    #   - dedupe: collapse near-identical active observations (same scope,
-    #     normalized body) into the newest, linking losers via consolidated_into.
+    #   - dedupe: collapse near-duplicate active observations (same scope) into
+    #     the newest, linking losers via consolidated_into. Similarity is decided
+    #     by an injected matcher (default: lexical token-overlap, #73) so the
+    #     promotion gate can actually accumulate corroboration — exact-string
+    #     matching never folded varied wording, leaving every observation at
+    #     corroboration 1 (the 2026-06-23 audit finding).
     #   - expire_stale_info: retire info-level (🟢 / priority 3) observations
     #     older than the TTL to bound context size. Important (🔴) and maybe
     #     (🟡) are never expired — only the lowest-signal tier ages out.
@@ -31,9 +35,10 @@ module ClaudeMemory
         end
       end
-      def initialize(store, info_ttl_days: DEFAULT_INFO_TTL_DAYS)
+      def initialize(store, info_ttl_days: DEFAULT_INFO_TTL_DAYS, matcher: TokenOverlapMatcher.new)
         @store = store
         @info_ttl_days = info_ttl_days
+        @matcher = matcher
       end
       # @return [Result] number of observations deduped and expired
@@ -51,21 +56,36 @@ module ClaudeMemory
       def dedupe
         active = @store.observations.where(status: "active").order(:id).all
+        active.group_by { |o| o[:scope] }.sum { |_scope, rows| dedupe_scope(rows) }
+      end
+      # Greedy clustering within one scope: the newest observation in a cluster
+      # is the keeper; older near-duplicates fold into it. O(n²) matcher calls,
+      # but n is bounded (#74 cut the inflow; expire_stale_info bounds the tail).
+      def dedupe_scope(rows)
+        return 0 if rows.size < 2
+        ordered = rows.sort_by { |r| [r[:observed_at].to_s, r[:id]] }.reverse
+        folded = {}
         merged = 0
-        active.group_by { |o| [o[:scope], normalize(o[:body])] }.each_value do |rows|
-          next if rows.size < 2
+        ordered.each do |keeper|
+          next if folded[keeper[:id]]
+          ordered.each do |other|
+            next if other[:id] == keeper[:id] || folded[other[:id]]
+            next unless @matcher.similar?(keeper[:body], other[:body])
-          keeper = rows.max_by { |r| [r[:observed_at].to_s, r[:id]] }
-          rows.each do |loser|
-            next if loser[:id] == keeper[:id]
-            # Fold the loser's sightings into the keeper before tombstoning so
-            # corroboration survives consolidation and can cross the promotion
+            # Fold the duplicate's sightings into the keeper before tombstoning
+            # so corroboration survives consolidation and can cross the promotion
             # threshold. A duplicate IS a repeated sighting.
-            @store.increment_corroboration(keeper[:id], by: loser[:corroboration_count] || 1)
-            @store.tombstone_observation(loser[:id], into_id: keeper[:id])
+            @store.increment_corroboration(keeper[:id], by: other[:corroboration_count] || 1)
+            @store.tombstone_observation(other[:id], into_id: keeper[:id])
+            folded[other[:id]] = true
             merged += 1
           end
+          folded[keeper[:id]] = true
         end
         merged
@@ -82,10 +102,6 @@ module ClaudeMemory
         ids.each { |id| @store.expire_observation(id) }
         ids.size
       end
-      def normalize(body)
-        body.to_s.downcase.gsub(/\s+/, " ").strip
-      end
     end
   end
 end
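The effect of replacing exact grouping with greedy clustering can be reproduced on plain hashes. This is a sketch, not the gem's code: `cluster` mirrors the `dedupe_scope` loop above but tracks corroboration in a Hash instead of calling the store. `ExactMatcher` (the removed `normalize` rule) and `JaccardMatcher` (a stopword-free stand-in for `TokenOverlapMatcher`) are illustrative matchers; like the real seam, they only need to respond to `similar?(a, b)`:

```ruby
require "set"

# Pre-0.13.1 rule: fold only bodies identical after downcase + whitespace collapse.
class ExactMatcher
  def similar?(a, b)
    normalize(a) == normalize(b)
  end

  private

  def normalize(body)
    body.to_s.downcase.gsub(/\s+/, " ").strip
  end
end

# Simplified lexical Jaccard (no stopword list), threshold 0.5.
class JaccardMatcher
  def similar?(a, b)
    x, y = [a, b].map { |s| s.to_s.downcase.scan(/[a-z0-9]+/).to_set }
    return false if x.empty? || y.empty?
    (x & y).size.fdiv((x | y).size) >= 0.5
  end
end

# Greedy clustering as in Reflector#dedupe_scope: newest first, each unfolded
# row becomes a keeper and absorbs every unfolded row the matcher accepts.
# Returns {keeper_id => corroboration_count} for the surviving rows.
def cluster(rows, matcher)
  ordered = rows.sort_by { |r| [r[:observed_at].to_s, r[:id]] }.reverse
  counts = rows.to_h { |r| [r[:id], r[:corroboration_count] || 1] }
  folded = Set.new
  ordered.each do |keeper|
    next if folded.include?(keeper[:id])
    ordered.each do |other|
      next if other[:id] == keeper[:id] || folded.include?(other[:id])
      next unless matcher.similar?(keeper[:body], other[:body])
      counts[keeper[:id]] += counts.delete(other[:id]) # a duplicate IS a sighting
      folded << other[:id]
    end
    folded << keeper[:id]
  end
  counts
end

rows = [
  {id: 1, observed_at: "2026-06-20", body: "PreCompact hook set."},
  {id: 2, observed_at: "2026-06-21", body: "PreCompact hook was set."},
  {id: 3, observed_at: "2026-06-22", body: "Chose SQLite for storage."}
]
p cluster(rows, ExactMatcher.new)   # nothing folds: every row stays at corroboration 1
p cluster(rows, JaccardMatcher.new) # row 1 folds into row 2, which reaches corroboration 2
```

Under the exact rule the one-word difference keeps rows 1 and 2 apart forever; under the overlap rule (3 shared of 4 distinct tokens) they fold, and the unrelated row 3 stays separate.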

data/lib/claude_memory/observe/token_overlap_matcher.rb ADDED

@@ -0,0 +1,55 @@
+# frozen_string_literal: true
+module ClaudeMemory
+  module Observe
+    # Default observation-similarity matcher: lexical token-overlap (Jaccard).
+    #
+    # Deterministic, free, no embedding dependency — so it runs shell-side in
+    # the Reflector's sweep pass at no extra cost. Two bodies are "the same
+    # sighting" when their significant-word sets overlap past the threshold.
+    # This folds the common case (one event re-observed with slightly different
+    # wording — "PreCompact hook set." / "PreCompact hook set — the design
+    # analog", Jaccard 0.6) while keeping unrelated observations apart (distinct
+    # developer statements share ~no content words → Jaccard ~0).
+    #
+    # It does NOT capture pure synonym paraphrases ("use SQLite" vs "chose
+    # SQLite") — no free lexical method can on short text (measured 2026-06-23:
+    # tfidf cosine 0.32 for that pair, indistinguishable from unrelated pairs
+    # at 0.13). For paraphrase folding, inject a semantic matcher backed by real
+    # embeddings: the Reflector accepts any object responding to
+    # `similar?(body_a, body_b)`.
+    class TokenOverlapMatcher
+      DEFAULT_THRESHOLD = 0.5
+      # Function words carry no episodic signal; dropping them focuses the
+      # overlap on subject/verb content.
+      STOPWORDS = %w[
+        a an the to of in on at for and or but we i it is are was were be been
+        this that these those with as by from into our your their its do does
+      ].to_set.freeze
+      def initialize(threshold: DEFAULT_THRESHOLD)
+        @threshold = threshold
+      end
+      # @return [Boolean] true when the two bodies are near-duplicate sightings
+      def similar?(body_a, body_b)
+        a = significant_tokens(body_a)
+        b = significant_tokens(body_b)
+        return false if a.empty? || b.empty?
+        intersection = (a & b).size.to_f
+        union = (a | b).size
+        (intersection / union) >= @threshold
+      end
+      private
+      def significant_tokens(body)
+        body.to_s.downcase.scan(/[a-z0-9]+/)
+          .reject { |word| word.length < 2 || STOPWORDS.include?(word) }
+          .to_set
+      end
+    end
+  end
+end
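The threshold arithmetic behind the class comment, worked by hand with `significant_tokens` (lowercase, split on `[a-z0-9]+`, drop one-character words and `STOPWORDS`). For the near-duplicate pair quoted in the comment, the stopword `the` drops out and the dash yields no token; for the synonym pair, neither `use` nor `chose` is a stopword, so only `sqlite` is shared:

$$
\begin{aligned}
J(A,B) &= \frac{|A \cap B|}{|A \cup B|} \\[4pt]
A &= \{\text{precompact}, \text{hook}, \text{set}\},\quad
B = \{\text{precompact}, \text{hook}, \text{set}, \text{design}, \text{analog}\}
&&\Rightarrow J = \tfrac{3}{5} = 0.6 \ge 0.5 \;(\text{similar}) \\[4pt]
A' &= \{\text{use}, \text{sqlite}\},\quad
B' = \{\text{chose}, \text{sqlite}\}
&&\Rightarrow J = \tfrac{1}{3} \approx 0.33 < 0.5 \;(\text{not folded})
\end{aligned}
$$

This is why synonym paraphrases need a semantic matcher injected through the `matcher:` seam.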

data/lib/claude_memory/version.rb CHANGED

@@ -1,5 +1,5 @@
 # frozen_string_literal: true
 module ClaudeMemory
-  VERSION = "0.13.0"
+  VERSION = "0.13.1"
 end

data/lib/claude_memory.rb CHANGED

@@ -126,6 +126,7 @@ require_relative "claude_memory/domain/provenance"
 require_relative "claude_memory/domain/conflict"
 require_relative "claude_memory/domain/observation"
 require_relative "claude_memory/observe/observations_renderer"
+require_relative "claude_memory/observe/token_overlap_matcher"
 require_relative "claude_memory/observe/reflector"
 require_relative "claude_memory/embeddings/model_registry"
 require_relative "claude_memory/embeddings/inspector"

metadata CHANGED

@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: claude_memory
 version: !ruby/object:Gem::Version
-  version: 0.13.0
+  version: 0.13.1
 platform: ruby
 authors:
 - Valentino Stoll
@@ -368,6 +368,7 @@ files:
 - lib/claude_memory/mcp/tools.rb
 - lib/claude_memory/observe/observations_renderer.rb
 - lib/claude_memory/observe/reflector.rb
+- lib/claude_memory/observe/token_overlap_matcher.rb
 - lib/claude_memory/otel/attributes.rb
 - lib/claude_memory/otel/constants.rb
 - lib/claude_memory/otel/ingestor.rb