npm - @onlooker-community/ecosystem - Versions diffs - 0.18.0 → 0.20.0 - Mend

@onlooker-community/ecosystem 0.18.0 → 0.20.0

Files changed (57)

package/.claude-plugin/marketplace.json +13 -0
package/.claude-plugin/plugin.json +1 -1
package/.release-please-manifest.json +4 -2
package/CHANGELOG.md +14 -0
package/CLAUDE.md +1 -0
package/docs/memory-architecture.md +102 -0
package/package.json +3 -3
package/plugins/curator/docs/adr/001-staleness-tiers.md +100 -0
package/plugins/curator/docs/design.md +311 -0
package/plugins/historian/docs/adr/001-local-embeddings-only.md +96 -0
package/plugins/historian/docs/design.md +317 -0
package/plugins/librarian/.claude-plugin/plugin.json +14 -0
package/plugins/librarian/CHANGELOG.md +10 -0
package/plugins/librarian/README.md +51 -0
package/plugins/librarian/config.json +52 -0
package/plugins/librarian/docs/adr/001-propose-dont-auto-write.md +87 -0
package/plugins/librarian/docs/design.md +301 -0
package/plugins/librarian/hooks/hooks.json +26 -0
package/plugins/librarian/scripts/hooks/librarian-session-end.sh +312 -0
package/plugins/librarian/scripts/hooks/librarian-session-start.sh +103 -0
package/plugins/librarian/scripts/lib/librarian-archivist-reader.sh +67 -0
package/plugins/librarian/scripts/lib/librarian-classifier.sh +139 -0
package/plugins/librarian/scripts/lib/librarian-config.sh +74 -0
package/plugins/librarian/scripts/lib/librarian-durability.sh +77 -0
package/plugins/librarian/scripts/lib/librarian-emit.sh +72 -0
package/plugins/librarian/scripts/lib/librarian-project-key.sh +83 -0
package/plugins/librarian/scripts/lib/librarian-storage.sh +222 -0
package/plugins/librarian/scripts/lib/librarian-ulid.sh +50 -0
package/plugins/warden/.claude-plugin/plugin.json +14 -0
package/plugins/warden/CHANGELOG.md +10 -0
package/plugins/warden/config.json +51 -0
package/plugins/warden/docs/adr/001-detect-after-ingest-gate-before-action.md +62 -0
package/plugins/warden/docs/design.md +123 -0
package/plugins/warden/hooks/hooks.json +73 -0
package/plugins/warden/scripts/hooks/warden-post-tool-use.sh +201 -0
package/plugins/warden/scripts/hooks/warden-pre-tool-use.sh +94 -0
package/plugins/warden/scripts/hooks/warden-session-start.sh +52 -0
package/plugins/warden/scripts/lib/warden-cli.sh +124 -0
package/plugins/warden/scripts/lib/warden-config.sh +79 -0
package/plugins/warden/scripts/lib/warden-evaluator.sh +246 -0
package/plugins/warden/scripts/lib/warden-events.sh +85 -0
package/plugins/warden/scripts/lib/warden-gate-state.sh +105 -0
package/plugins/warden/scripts/lib/warden-patterns.sh +132 -0
package/plugins/warden/scripts/lib/warden-sanitizer.sh +80 -0
package/plugins/warden/scripts/lib/warden-scanner.sh +119 -0
package/plugins/warden/scripts/lib/warden-ulid.sh +50 -0
package/plugins/warden/skills/warden/SKILL.md +49 -0
package/release-please-config.json +32 -0
package/test/bats/librarian-session-end.bats +182 -0
package/test/bats/librarian-session-start.bats +136 -0
package/test/bats/warden-config.bats +54 -0
package/test/bats/warden-events.bats +85 -0
package/test/bats/warden-gate-state.bats +67 -0
package/test/bats/warden-patterns.bats +58 -0
package/test/bats/warden-sanitizer.bats +53 -0
package/test/bats/warden-scanner.bats +56 -0
package/test/bats/warden-ulid.bats +30 -0

package/.claude-plugin/marketplace.json CHANGED

@@ -111,6 +111,19 @@
       "license": "MIT",
       "keywords": ["synthesis", "recommendations", "observability", "coaching", "patterns", "weekly"],
       "tags": ["observability", "coaching"]
+    },
+    {
+      "name": "warden",
+      "source": "./plugins/warden",
+      "description": "Untrusted-content gate. Scans content flowing in through WebFetch and Read for prompt-injection patterns, and when a threat is detected closes a session-scoped gate that blocks Write, Edit, and Bash until the user explicitly clears it. Grounded in Meta's Agents Rule of Two — warden removes the agent's external-actions property while untrusted content is in play. Requires the ecosystem plugin.",
+      "author": {
+        "name": "Onlooker Community"
+      },
+      "homepage": "https://onlooker.dev",
+      "repository": "https://github.com/onlooker-community/ecosystem",
+      "license": "MIT",
+      "keywords": ["security", "prompt-injection", "rule-of-two", "safety", "content-gate", "untrusted-content"],
+      "tags": ["safety", "security"]
     }
   ]
 }

package/.claude-plugin/plugin.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "ecosystem",
-  "version": "0.18.0",
+  "version": "0.20.0",
   "description": "Observability substrate for Claude Code. Provides the shared ~/.onlooker/ storage root, canonical schema-validated event emission, session and tool tracking hooks, and prompt rules. Required by all other Onlooker plugins.",
   "author": {
     "name": "Onlooker Community",

package/.release-please-manifest.json CHANGED

@@ -1,5 +1,5 @@
 {
-  ".": "0.18.0",
+  ".": "0.20.0",
   "plugins/archivist": "0.1.0",
   "plugins/tribunal": "1.0.1",
   "plugins/echo": "0.2.0",
@@ -7,5 +7,7 @@
   "plugins/governor": "0.2.0",
   "plugins/compass": "0.2.0",
   "plugins/scribe": "0.2.0",
-  "plugins/counsel": "0.2.0"
+  "plugins/counsel": "0.2.0",
+  "plugins/warden": "0.2.0",
+  "plugins/librarian": "0.1.0"
 }

package/CHANGELOG.md CHANGED

@@ -1,5 +1,19 @@
 # Changelog
+## [0.20.0](https://github.com/onlooker-community/ecosystem/compare/ecosystem-v0.19.0...ecosystem-v0.20.0) (2026-06-04)
+### Features
+* **librarian:** land plugin end-to-end with memory layer designs :seedling: ([#55](https://github.com/onlooker-community/ecosystem/issues/55)) ([d4821ef](https://github.com/onlooker-community/ecosystem/commit/d4821efabfeb587e460e898d7db8f92fcc3f2c61))
+## [0.19.0](https://github.com/onlooker-community/ecosystem/compare/ecosystem-v0.18.0...ecosystem-v0.19.0) (2026-06-02)
+### Features
+* **warden:** untrusted-content gate enforcing the Agents Rule of Two :shield: ([#53](https://github.com/onlooker-community/ecosystem/issues/53)) ([210aa51](https://github.com/onlooker-community/ecosystem/commit/210aa51bff66226a0eec1f17292a2af4ea4ef56a))
 ## [0.18.0](https://github.com/onlooker-community/ecosystem/compare/ecosystem-v0.17.0...ecosystem-v0.18.0) (2026-06-02)

package/CLAUDE.md CHANGED

@@ -36,6 +36,7 @@ scripts/lib/onlooker-event.mjs  ← canonical event builder; all plugins route t
 | echo | Stop | Regression-tests prompt changes after each agent stop |
 | governor | SessionStart, PreToolUse (Task), PostToolUse (Task), Stop | Budget gates on subagent spawns; tracks spend per session |
 | tribunal | Stop + skill invocation | Post-task quality gate; also invokable via `/tribunal` |
+| warden | PostToolUse (WebFetch, Read), PreToolUse (Write, Edit, MultiEdit, Bash), SessionStart + skill invocation | Scans ingested content for injection; closes a content gate that blocks write-class tools until cleared via `/warden` |
 Plugins communicate by emitting events to the JSONL log — they do not call each other directly. All plugins depend on the ecosystem substrate; no plugin depends on another plugin directly.

package/docs/memory-architecture.md ADDED

@@ -0,0 +1,102 @@
+# Memory Architecture
+This document describes how the Onlooker ecosystem manages memory across timescales — from the in-session working context, through compaction-resistant session memory, into durable cross-session knowledge, and back out as just-in-time recall.
+It is the companion to [`architecture.md`](architecture.md), which describes the substrate. This doc describes how the memory-flavored plugins compose on top of it.
+---
+## The four timescales
+| Timescale | Lifetime | Substrate | Plugin |
+|-----------|----------|-----------|--------|
+| Working | Within one model turn | Conversation context window | (the model itself) |
+| Session | One Claude Code session, across compactions | `~/.onlooker/archivist/<project-key>/` | **archivist** |
+| Durable | Across sessions, weeks–months | `~/.claude/projects/<encoded-project>/memory/` | **librarian**, **curator** |
+| Episodic | Across sessions, retrieved on similarity | `~/.onlooker/historian/<project-key>/` | **historian** |
+Each plugin operates on exactly one substrate. They communicate by events, not by reaching across substrates.
+---
+## The pipeline
+```mermaid
+flowchart LR
+    session[Active session]
+    session -->|PreCompact| archivist[(archivist artifacts<br/>per session)]
+    archivist -->|SessionEnd| librarian{librarian:<br/>worth keeping?}
+    librarian -->|propose| memstore[(typed memory store<br/>user / feedback / project / reference)]
+    memstore -->|SessionStart audit| curator{curator:<br/>still valid?}
+    curator -->|prune proposals| memstore
+    session -->|SessionEnd| historian[(historian vector store<br/>transcript embeddings)]
+    historian -->|UserPromptSubmit similarity| session
+    memstore -->|SessionStart inject| session
+    archivist -->|SessionStart inject| session
+```
+### Flow of a fact
+A non-obvious project fact ("the auth migration is driven by legal, not tech debt") goes through:
+1. **Captured during session** — surfaces in the conversation, lands in the transcript.
+2. **Extracted at compaction by archivist** — written as a `decisions/<ulid>.json` artifact when context fills.
+3. **Reinjected by archivist next session** — appears in `additionalContext` if it ranks within the recency/pinning budget. The fact survives session boundaries but lives in a per-session, recency-ranked space.
+4. **Promoted by librarian** — at session end (or on demand), librarian decides "this should be a `project` memory" and writes it to the typed memory store with provenance pointing back to the archivist artifact.
+5. **Audited by curator** — periodically, curator checks whether the **Why:** is still load-bearing (has the legal review concluded? did the migration land?). If stale, curator surfaces a prune proposal.
+6. **Recalled by historian (separately)** — if a future session encounters a similar problem ("we're being asked to rip out X middleware again"), historian retrieves the transcript chunk from the original session where this was discussed, providing fuller context than the distilled project memory can carry.
+---
+## Boundaries between memory plugins
+The memory plugins overlap in spirit but operate on different substrates and answer different questions. Misunderstanding these boundaries is the most likely failure mode.
+| Plugin | Question it answers | Storage | Trigger |
+|--------|---------------------|---------|---------|
+| archivist | "What did we decide / try / leave open in *this* session?" | `~/.onlooker/archivist/<project-key>/` | PreCompact, SessionStart |
+| librarian | "Which of those decisions deserves to live forever?" | `~/.claude/projects/<encoded-project>/memory/` (writes proposals; user confirms) | SessionEnd, scheduled |
+| curator | "Which of our durable memories are now wrong, stale, or contradictory?" | reads & proposes prunes against the typed memory store | SessionStart (cheap), scheduled (LLM) |
+| historian | "Have we seen something like this before, and if so, what happened?" | `~/.onlooker/historian/<project-key>/` (embeddings) | SessionEnd, UserPromptSubmit |
+| scribe | "What's a readable artifact of what we did?" | scribe's own output dir | session end |
+| cartographer | "Are the instruction files (CLAUDE.md, AGENTS.md, rules/) internally consistent?" | reads instruction files; emits findings | SessionStart, instruction-file writes |
+| counsel | "What does the *event log* across all plugins tell us to improve?" | reads JSONL log; produces brief | weekly |
+**The two easiest confusions:**
+1. **cartographer vs. curator.** Cartographer audits the *instruction files* the user maintains by hand (CLAUDE.md, AGENTS.md, `.claude/rules/`). Curator audits the *typed auto-memory store* the librarian writes into. Same shape of audit, different substrate. They are parallel, not redundant.
+2. **archivist vs. librarian.** Archivist keeps everything from a session, ranked by recency, and reinjects into the next one. Librarian promotes a small subset into the typed memory store where it will be reinjected *every* session, with classification (user/feedback/project/reference) and provenance.
+---
+## Shared invariants
+All three new memory plugins follow the ecosystem conventions:
+- **Project keying** for plugin-owned storage (`~/.onlooker/<plugin>/<project-key>/`) uses the SHA256-of-remote-URL scheme from [`architecture.md`](architecture.md#project-keying), so cross-clone consistency is preserved for plugin state. Note: the typed memory store the librarian writes into is keyed by Claude Code's per-checkout path encoding (`~/.claude/projects/<encoded-project>/memory/`), which is a different scheme — see [librarian ADR-001](../plugins/librarian/docs/adr/001-propose-dont-auto-write.md) for how this asymmetry is handled.
+- **No cross-plugin runtime dependencies.** Librarian degrades gracefully if archivist artifacts are absent (no proposals; emits `librarian.scan.skipped`). Curator degrades gracefully if the memory store is empty. Historian degrades gracefully if no past transcripts are indexed.
+- **Event emission via `onlooker-event.mjs`.** Event types: `librarian.*`, `curator.*`, `historian.*`. Registered in `@onlooker-community/schema` before first emission.
+- **Fail-soft on missing `~/.onlooker/`.** Plugins must not block a session they were not invited to.
+- **Context-budget awareness.** Reinjection at `SessionStart` competes for the same budget archivist already uses. Each plugin's design lays out its budget cap explicitly; the ecosystem does not yet enforce a global ceiling. See [Open Questions](#open-questions) below.
+---
+## Open Questions
+1. **Global reinjection ceiling.** Archivist, librarian (via the typed memory store it writes into), historian, counsel, and cartographer all surface things at `SessionStart`. Each respects its own cap, but there is no shared budget. A `governor.context_budget.*` event series could attribute context spend per plugin and provide a soft global cap. Currently each plugin sets its own ceiling independently.
+2. **Memory store path scheme divergence.** Claude Code's typed memory store uses per-checkout path encoding; the ecosystem uses SHA256-of-remote-URL for cross-clone sharing. Librarian writes to the former (so the user sees promotions in the same place as their hand-written memories), but its own operational state (proposals queue, provenance log) lives at `~/.onlooker/librarian/<project-key>/` under the ecosystem scheme. This asymmetry is handled in [librarian ADR-001](../plugins/librarian/docs/adr/001-propose-dont-auto-write.md) but remains an awkwardness worth revisiting if Claude Code adopts a remote-derived key.
+3. **Promotion provenance after deletion.** When the user manually deletes a promoted memory, librarian should learn from that — at minimum, don't re-propose the same content. Sketched in librarian's design as the "tombstone" mechanism but not yet decided.
+4. **Historian and secrets.** Embedding entire transcripts means tokens, paths, and other sensitive strings that appeared in a past conversation become recallable. Deletion semantics (purge by session_id, by date range, by pattern) need to be defined before historian goes beyond a design sketch.
+---
+## Plugin design documents
+- [Librarian](../plugins/librarian/docs/design.md) — consolidates archivist artifacts into the typed memory store
+- [Curator](../plugins/curator/docs/design.md) — detects stale, contradictory, and decayed memories
+- [Historian](../plugins/historian/docs/design.md) — episodic retrieval over past session transcripts
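
Note on the project-keying invariant in `docs/memory-architecture.md` above: the document names a SHA256-of-remote-URL scheme for plugin-owned storage but defers the details to `architecture.md`, which is not part of this diff. The following is a minimal sketch of what such a key derivation could look like, not the package's implementation. The function names, the whitespace-only normalization, and the use of the full 64-character hex digest are all assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def project_key(remote_url: str) -> str:
    """Derive a stable key from a git remote URL.

    Assumption: the URL is only stripped of surrounding whitespace and the
    full hex digest is used. The real scheme may normalize or truncate.
    """
    normalized = remote_url.strip()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def plugin_storage_dir(plugin: str, remote_url: str) -> Path:
    """Build the ~/.onlooker/<plugin>/<project-key>/ path described in the doc."""
    return Path.home() / ".onlooker" / plugin / project_key(remote_url)


if __name__ == "__main__":
    url = "https://github.com/onlooker-community/ecosystem"
    # Two clones of the same remote resolve to the same storage directory.
    print(plugin_storage_dir("librarian", url))
```

Because the key depends only on the remote URL, two checkouts of the same repository share plugin state, which is the cross-clone consistency the document describes. The typed memory store under `~/.claude/projects/<encoded-project>/memory/` is keyed per checkout instead, so a derivation like this would not apply to it.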

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "@onlooker-community/ecosystem",
-  "version": "0.18.0",
+  "version": "0.20.0",
   "description": "Agents, skills, hooks, commands, rules, and MCP configurations that power [Onlooker](https://onlooker.dev)",
   "author": {
     "name": "Onlooker Community",
@@ -19,14 +19,14 @@
     "onlooker-install": "install.sh"
   },
   "dependencies": {
-    "@onlooker-community/schema": "^2.4.0"
+    "@onlooker-community/schema": "^2.5.0"
   },
   "scripts": {
     "postinstall": "echo '\\n  onlooker-ecosystem installed!\\n  Run: npx onlooker-install typescript\\n  Docs: https://github.com/onlooker-community/ecosystem\\n'",
     "test": "npm run test:bats && npm run test:schema",
     "test:bats": "bats test/bats",
     "test:schema": "node --test test/node/*.test.mjs",
-    "test:shellcheck": "shellcheck -S error -x install.sh scripts/common.sh scripts/hooks/*.sh scripts/lib/*.sh plugins/archivist/scripts/hooks/*.sh plugins/archivist/scripts/lib/*.sh plugins/tribunal/scripts/hooks/*.sh plugins/tribunal/scripts/lib/*.sh plugins/echo/scripts/hooks/*.sh plugins/echo/scripts/lib/*.sh plugins/governor/scripts/hooks/*.sh plugins/governor/scripts/lib/*.sh plugins/compass/scripts/hooks/*.sh plugins/compass/scripts/lib/*.sh plugins/scribe/scripts/hooks/*.sh plugins/scribe/scripts/lib/*.sh plugins/counsel/scripts/hooks/*.sh plugins/counsel/scripts/lib/*.sh",
+    "test:shellcheck": "shellcheck -S error -x install.sh scripts/common.sh scripts/hooks/*.sh scripts/lib/*.sh plugins/archivist/scripts/hooks/*.sh plugins/archivist/scripts/lib/*.sh plugins/tribunal/scripts/hooks/*.sh plugins/tribunal/scripts/lib/*.sh plugins/echo/scripts/hooks/*.sh plugins/echo/scripts/lib/*.sh plugins/governor/scripts/hooks/*.sh plugins/governor/scripts/lib/*.sh plugins/compass/scripts/hooks/*.sh plugins/compass/scripts/lib/*.sh plugins/scribe/scripts/hooks/*.sh plugins/scribe/scripts/lib/*.sh plugins/counsel/scripts/hooks/*.sh plugins/counsel/scripts/lib/*.sh plugins/warden/scripts/hooks/*.sh plugins/warden/scripts/lib/*.sh plugins/librarian/scripts/hooks/*.sh plugins/librarian/scripts/lib/*.sh",
     "lint:references": "node scripts/lint/check-references.mjs",
     "lint:manifests": "node scripts/lint/check-manifests.mjs",
     "coverage:node": "node scripts/coverage/run-coverage.mjs",

package/plugins/curator/docs/adr/001-staleness-tiers.md ADDED

@@ -0,0 +1,100 @@
+# ADR-001: Two-Tier Staleness Checks — Cheap Every Session, LLM Weekly
+- Status: Accepted
+- Date: 2026-06-02
+- Deciders: Meagan
+- Tags: curator, memory, performance, rate-limiting
+## Context and Problem Statement
+Curator audits the typed memory store for staleness, broken references, contradictions, and decayed dates. The natural place to surface findings is at `SessionStart` — that's when the user is paged in for the day's work and most receptive to a one-line "Curator: 2 findings to review" pointer.
+But `SessionStart` fires on every session, including every restart after compaction and every new branch checkout. Whatever curator runs at `SessionStart` runs *a lot*. Two pressures push against each other:
+- The checks need to be cheap enough not to delay session startup or accumulate hidden cost.
+- The most valuable check (contradiction detection between memory pairs) requires an LLM call per candidate pair, and the candidate set grows with the memory store.
+A single check tier — either "all checks every session" or "all checks on a manual trigger" — fails the gradient. The first is too expensive at scale; the second is too easy to forget, and findings rot quietly.
+## Decision Drivers
+- **`SessionStart` is a hot path.** Cartographer's design explicitly documents that audits run as a detached background process to avoid blocking sessions. Curator inherits the same constraint.
+- **Cheap checks are nearly free.** Date pattern matching, file-exists checks on path references, and `rg` calls for symbol references all sit in the millisecond range. They can run every session inside a wall-clock budget without observable latency.
+- **LLM contradiction checks scale poorly.** Pairwise comparison is O(N²) on the candidate set even after similarity-filtering. A 100-memory store today may have ~10 candidate pairs; a 500-memory store has more. Running this every session is paying a recurring cost that grows with the user's investment in the system — the worst kind of cost curve.
+- **The signal decays slowly.** A contradiction between two memories does not become more or less true between Monday and Tuesday. Weekly cadence captures real changes (new memories, edited memories) without burning compute on a static problem.
+- **User-pull beats system-push for big sweeps.** A manual `/curator scan` skill exists for users who want to force a full sweep — that's the right surface for "I just landed a big refactor; re-check everything."
+- **Cartographer precedent.** Cartographer runs a periodic background audit and surfaces findings as events. Curator's tiered cadence is the moral equivalent: cheap checks are the "always-on" surface, the LLM sweep is the "periodic" surface.
+## Considered Options
+1. **One tier: all checks every session.** Simple, predictable, expensive at scale. Wall-clock budget acts as a ceiling but degrades coverage as the store grows.
+2. **One tier: all checks on manual trigger only.** Cheap and predictable but loses the always-on signal users want from `SessionStart`. Findings rot.
+3. **Two tiers: cheap checks every session, LLM sweep at most once per N days.** Preserves the always-on cheap signal and amortizes the expensive sweep over time.
+4. **Three tiers: cheap every session, mid-cost (e.g., rg-based symbol re-check) daily, LLM weekly.** Finer granularity. Adds operational complexity (multiple watermarks, multiple skip-reason cases) without an obvious payoff at current memory-store sizes.
+5. **Adaptive cadence.** Run the LLM sweep when the memory store has changed by more than X% since last sweep. Conceptually elegant but introduces a change-detection layer that itself needs validation.
+## Decision
+We adopt **Option 3: two tiers with a watermark-gated LLM sweep at most once per `llm_sweep_interval_days` (default: 7)**.
+The cheap tier runs on every `SessionStart` inside a `wall_clock_budget_ms` budget (default: 500ms). It performs:
+- Date pattern parsing and grace-period checks.
+- Path reference existence checks.
+- Symbol reference `rg` checks (capped at `max_symbol_checks_per_session`).
+- Usage tracker reads from the JSONL log (when the `memory.recalled` emitter ships).
+If the cheap tier runs over budget, curator emits `curator.scan.skipped` with `reason: "over_budget"` and exits without partial results. The wall-clock budget exists to make budget overruns visible as events rather than as silent slowdowns.
+The LLM tier runs only when `now - last_llm_sweep_at >= llm_sweep_interval_days`. The watermark lives at `~/.onlooker/curator/<project-key>/last_llm_sweep.json`. When the gate opens:
+- Pairwise Jaccard similarity is computed across all memories.
+- Pairs above `contradiction_similarity_threshold` (default: 0.4) with opposing sentiment markers proceed to LLM evaluation.
+- Up to `max_pair_evaluations_per_sweep` (default: 50) calls are made, watermark advances regardless of whether the cap is hit.
+The manual `/curator scan` skill bypasses both rate gates. Useful for "I just landed N memories; re-check now."
+Option 4 was rejected because the user-visible benefit of a third tier is unclear at current memory-store sizes (typically tens of entries, not hundreds). It can be added later without changing the architecture if scale changes the calculus.
+Option 5 was rejected as a default because change-detection introduces a calibration problem (what counts as significant change?) on top of the staleness-detection problem. It is plausible as a future optimization layered on Option 3 (e.g., advance the LLM watermark when the memory store has changed by less than a threshold, to defer the next sweep).
+## Consequences
+### Positive
+- `SessionStart` latency stays bounded by the cheap-check wall-clock budget — typically well under 500ms.
+- The expensive LLM sweep runs at a cadence that matches the rate-of-change of the underlying signal (memories don't contradict each other on a per-minute basis).
+- A manual override (`/curator scan`) gives users the always-available "scan now" surface without making it the default.
+- Budget overruns are observable: `curator.scan.skipped` with `reason: "over_budget"` is a leading indicator that the memory store has grown enough to need tuning.
+- The watermark-gated sweep also caps cost: a user with a single high-traffic project pays at most N LLM-sweep calls per `llm_sweep_interval_days`.
+### Negative
+- A contradiction introduced today may go up to 7 days before being flagged. This is the cost of weekly cadence. Mitigation: the manual `/curator scan` exists; users who notice they just added two conflicting memories can force a check.
+- The two-tier model is slightly more complex than one tier. Two watermarks (cheap-tier last-run, LLM-tier last-sweep), two budget knobs, two skip reasons. The added complexity is small relative to the cost savings.
+- The cheap tier's wall-clock budget interacts badly with very large memory stores. At ≥200 memories, the cheap tier may itself need a per-memory cap (e.g., "scan only memories touched in the last N days"). Not yet a problem; flagged as a future open question.
+### Neutral
+- The choice of 7 days for `llm_sweep_interval_days` is a guess. Users who care can override. A future calibration could measure how often pair-similarity-with-opposing-sentiment results change verdict between consecutive sweeps; if the rate is low, the interval could grow.
+## Implementation Notes
+- The cheap-tier wall-clock budget is checked between sub-tasks, not within them. A single `rg` call that itself runs over budget is allowed to finish; the gate prevents *the next* sub-task from starting.
+- The LLM watermark is updated *before* the sweep begins, not after. This prevents a sweep that crashes midway from being retried immediately on the next `SessionStart`. The downside: a crashed sweep delays the next full check by the interval. Acceptable — better than a crash-retry loop.
+- Findings dedup uses `deduped_hash`. For `contradiction` findings the hash includes both memory bodies' content hashes, so an edit to either body re-opens the finding for re-evaluation on the next sweep.
+- Manual `/curator scan` updates the watermark like an automatic sweep — a user who runs the skill on Monday and Tuesday gets two LLM sweeps in two days, which is fine because they explicitly asked for it.
+- The cheap-tier and LLM-tier are independently configurable via `cheap_checks.enabled` and `llm_sweep.enabled`. Users can run either tier alone (e.g., LLM sweep off in a budget-sensitive project).
+## Validation
+- A memory store of ~50 entries should produce cheap-check sweeps under 200ms on a typical macOS dev laptop. If `curator.scan.skipped` with `reason: "over_budget"` fires at this size, the wall-clock budget needs tuning, not the design.
+- An LLM sweep at ~50 memories should make ≤20 pair-evaluation calls in the common case. The `max_pair_evaluations_per_sweep` cap of 50 should not be reached. If it is reached regularly, similarity threshold and sentiment-marker filtering need tuning.
+- Findings open for more than two LLM-sweep intervals without user action should produce a `curator.finding.aged_unhandled` summary event — an integration point for counsel's weekly brief.
+## References
+- Cartographer design — precedent for background audits with per-finding events (`plugins/cartographer/docs/design.md`)
+- Compass ADR-001 — precedent for explicit budget gates and skip-reason events (`plugins/compass/docs/adr/001-evaluate-prompts-in-context.md`)
+- Memory architecture overview (`docs/memory-architecture.md`)
+- Curator design (`../design.md`)

package/plugins/curator/docs/design.md ADDED

@@ -0,0 +1,311 @@
+# Curator — Plugin Design
+**Plugin name:** `curator`
+**Tagline:** *Tends the memory garden.*
+**Status:** Design (pre-implementation)
+Curator is the maintenance layer for the user's typed memory store. It runs cheap heuristic checks at every `SessionStart` and an LLM-backed conflict sweep at most weekly, surfaces stale references, decayed dates, and contradicting entries, and proposes prunes for user review. It does not edit the memory store directly — the same posture librarian and cartographer adopt for durable substrates.
+It sits in the [memory architecture](../../../docs/memory-architecture.md) downstream of librarian: librarian writes (with user confirmation); curator audits. Curator is parallel to cartographer: same shape (audit, propose, surface), different substrate. Cartographer audits hand-maintained instruction files (CLAUDE.md, AGENTS.md, `.claude/rules/`); curator audits the typed auto-memory store at `~/.claude/projects/<encoded-project>/memory/`.
+---
+## Failure Modes Curator Addresses
+**A — Decayed date references.** A project memory says "merge freeze begins 2026-03-05 for mobile release cut." After March 5 passes, the memory is at best uninformative and at worst misleading (the model continues to flag work as freeze-sensitive). Curator detects past-tense date markers and proposes removal or refactor.
+**B — Stale path references.** A reference memory says "see `scripts/legacy_ingest.py` for the old pipeline shape." The file has since been deleted. The memory now points to nothing. Curator validates path references on a periodic sweep and flags broken ones.
+**C — Contradicting memories.** A user memory says "prefer functional patterns" and a feedback memory says "yes, the class-based approach was right for this hot path." Both are true in their original contexts. The model has to reconcile them at runtime, often badly. Curator's LLM-backed sweep finds high-similarity, opposing-sentiment pairs and surfaces the contradiction for human disambiguation.
+**D — Unused memories (weakest signal).** A memory has been in the store for 90 days and has never been surfaced as relevant in any session (signal: no `memory.recalled` event references it). It might be load-bearing as a backstop, or it might be dead weight. Curator flags but does not propose removal — the signal is too noisy for action.
+**E — Type drift.** A `project` memory ("we're rewriting auth for compliance") becomes a `feedback` memory ("this directory looks weird because of legal review") once the rewrite is done. The original type still fits but a better type now exists. Curator can detect type-drift candidates but the action (re-classification) is necessarily manual.
+---
+## Architecture
+```
+SessionStart hook fires
+        │
+        ▼
+┌──────────────────────┐
+│   Rate Gate          │  cheap checks: every session
+│                      │  LLM checks: once per llm_sweep_interval_days
+└─────────┬────────────┘
+          │
+          ▼
+┌──────────────────────┐
+│  Memory Reader       │  reads MEMORY.md + *.md files from memory store
+│                      │  parses frontmatter (name, description, type)
+└─────────┬────────────┘
+          │
+          ▼ (cheap sweep, every session)
+┌──────────────────────┐
+│  Date Checker        │  parse dates from bodies; flag past-tense markers
+└─────────┬────────────┘
+          │
+          ▼
+┌──────────────────────┐
+│  Reference Checker   │  validate path refs (file exists), symbol refs
+│                      │  (rg the symbol; warn on zero matches), URL refs
+│                      │  (HEAD with budget; skipped without consent)
+└─────────┬────────────┘
+          │
+          ▼
+┌──────────────────────┐
+│  Usage Tracker       │  read JSONL log; correlate memory IDs with
+│                      │  memory.recalled events from N days
+└─────────┬────────────┘
+          │
+          ▼ (LLM sweep, if interval elapsed)
+┌──────────────────────┐
+│  Similarity Matrix   │  Jaccard on token sets; pairs with sim > threshold
+│                      │  → LLM contradiction check
+└─────────┬────────────┘
+          │
+          ▼
+┌──────────────────────┐
+│  Findings Store      │  ~/.onlooker/curator/<key>/findings/<ulid>.json
+└─────────┬────────────┘
+          │ at SessionStart
+          ▼
+┌──────────────────────┐
+│ Surfacer             │  "Curator: 2 stale, 1 contradicting findings."
+│                      │  Review via /curator review.
+└──────────────────────┘
+```
+### Rate Gate
+Three categories of check, three cadences:
+- **Cheap checks (date, reference, usage):** run every `SessionStart`. Combined wall-clock budget: ≤500ms. Above that, curator emits `curator.scan.skipped` with `reason: "over_budget"` and defers.
+- **LLM contradiction sweep:** runs at most once per `llm_sweep_interval_days` (default: 7) per project. Watermark stored at `~/.onlooker/curator/<project-key>/last_llm_sweep.json`.
+- **Manual sweep:** `/curator scan` forces a full sweep including the LLM pass, ignoring rate gates.
+The rate gate exists because curator runs on every session start, and a quadratic LLM pass on a growing memory store is the worst kind of background cost: invisible, recurring, and proportional to user investment.
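The LLM-sweep cadence can be sketched as a watermark comparison. This is a minimal illustration, not the plugin's implementation: the `last_sweep_at` field name and the ISO-8601 timestamp format inside `last_llm_sweep.json` are assumptions, since the design only names the file.

```python
import json
from datetime import datetime, timedelta, timezone
from pathlib import Path
from typing import Optional


def llm_sweep_due(watermark_path: Path, interval_days: int = 7,
                  now: Optional[datetime] = None) -> bool:
    """Return True when the LLM sweep may run again for this project."""
    now = now or datetime.now(timezone.utc)
    try:
        data = json.loads(watermark_path.read_text(encoding="utf-8"))
        # "last_sweep_at" is a hypothetical field name.
        last = datetime.fromisoformat(data["last_sweep_at"])
        if last.tzinfo is None:
            last = last.replace(tzinfo=timezone.utc)
    except (OSError, ValueError, KeyError, TypeError):
        # Missing or unreadable watermark: treat the project as never swept.
        return True
    return now - last >= timedelta(days=interval_days)
```

A missing watermark is treated as "due" so that a fresh project gets its first sweep; a manual `/curator scan` would bypass this check entirely.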
+### Memory Reader
+Parses the typed memory store:
+1. Reads `~/.claude/projects/<encoded-project>/memory/MEMORY.md` for the index entries.
+2. For each line of the form `- [Title](file.md) — hook`, resolves `file.md` against the memory dir.
+3. Reads each referenced file. Parses YAML frontmatter (`name`, `description`, `type`). The body after frontmatter is the memory content.
+4. If a file is referenced from `MEMORY.md` but does not exist, that is itself a `curator.finding.broken_index` finding and is surfaced immediately.
+5. If a file exists in the memory dir but is not referenced from `MEMORY.md`, that is a `curator.finding.orphaned_memory` finding and is also surfaced.
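Steps 1, 2, 4 and 5 amount to a set difference between the files the index names and the files on disk. The sketch below assumes a flat memory directory and that index entries start with `- [Title](file.md)`; the function name `audit_index` is hypothetical, and frontmatter parsing (step 3) is left out.

```python
import re
from pathlib import Path

# Matches index lines of the form "- [Title](file.md)" followed by a hook.
INDEX_LINE = re.compile(r"^\s*-\s*\[(?P<title>[^\]]+)\]\((?P<file>[^)]+\.md)\)")


def audit_index(memory_dir: Path) -> dict:
    """Compare MEMORY.md entries against the *.md files actually present."""
    referenced = set()
    index_text = (memory_dir / "MEMORY.md").read_text(encoding="utf-8")
    for line in index_text.splitlines():
        match = INDEX_LINE.match(line)
        if match:
            referenced.add(match.group("file"))
    on_disk = {p.name for p in memory_dir.glob("*.md") if p.name != "MEMORY.md"}
    return {
        "broken_index": sorted(referenced - on_disk),     # indexed, file missing
        "orphaned_memory": sorted(on_disk - referenced),  # file present, not indexed
    }
```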
+### Date Checker
+For each memory body, scans for date patterns and absolute references:
+- **ISO-8601 dates** (`2026-03-05`, `2026-03-05T10:00:00Z`).
+- **Quarter markers** (`Q1 2026`, `2026Q3`).
+- **Named deadlines** with absolute dates nearby (`freeze`, `deadline`, `release cut`, `migration`, `cutover`, `EOL`, `expires`).
+- **Relative-to-write markers** when the frontmatter has a discoverable write date (`promoted_at`, `created_at`): phrases like "next week", "by end of month", "this Friday" relative to that date.
+For each match, compares to today's date. If a date is more than `date_grace_period_days` (default: 14) in the past, emits `curator.finding.date_decayed` with the matched phrase and the gap in days.
+The check does not propose removal automatically — past dates often have lingering relevance ("freeze on 2026-03-05" might still document why a code shape is the way it is). The user decides whether to remove, refactor, or keep.
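For the ISO-8601 case only, the grace-period comparison might look like the following sketch. Quarter markers, named deadlines, and relative phrases are not handled here, and the function name `decayed_dates` is an illustrative assumption; the returned keys mirror the `matched_phrase` and `days_past` payload fields.

```python
import re
from datetime import date

# Digit lookarounds instead of \b so that "2026-03-05T10:00:00Z" still matches.
ISO_DATE = re.compile(r"(?<!\d)(\d{4})-(\d{2})-(\d{2})(?!\d)")


def decayed_dates(body: str, today: date, grace_days: int = 14) -> list:
    """Return ISO dates in `body` that are more than `grace_days` in the past."""
    findings = []
    for match in ISO_DATE.finditer(body):
        year, month, day = (int(part) for part in match.groups())
        try:
            found = date(year, month, day)
        except ValueError:
            continue  # date-shaped but invalid, e.g. 2026-13-40
        days_past = (today - found).days
        if days_past > grace_days:
            findings.append({"matched_phrase": match.group(0), "days_past": days_past})
    return findings
```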
+### Reference Checker
+For each memory body, scans for three kinds of references:
+1. **Path references.** Patterns matching `path/to/file.ext` heuristics. For each candidate path, resolves against the repo root (from `git rev-parse --show-toplevel`). If the path does not exist, emits `curator.finding.path_broken` with the memory file and the broken path.
+2. **Symbol references.** Heuristic: backtick-wrapped identifiers (`` `myFunction` ``, `` `MyClass` ``) that look like code identifiers (CamelCase or snake_case with no spaces, length ≥ 3). For each, runs `rg --type-add 'all:*' --type all -F 'identifier'` in the repo root. If zero matches, emits `curator.finding.symbol_missing`.
+3. **URL references.** Optional, disabled by default. When `check_urls: true` and the URL host is not in `url_allowlist`, curator emits `curator.finding.url_unchecked` (a record that the memory contains an external URL it cannot validate without network). URLs in the allowlist (and only those) are HEAD-checked under a wall-clock budget.
+The reference checker treats matches as evidence of liveness, not correctness. A symbol that grep-matches might still be the wrong symbol; a path that resolves might point to renamed content. The checker is a smoke alarm, not an inspection: it signals that something may be wrong without confirming what.
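The symbol heuristic can be illustrated by the candidate-extraction step alone, before any `rg` call. This sketch encodes one reading of "CamelCase or snake_case": plain lowercase words such as `the` are skipped, which is an assumption rather than something the design states. The name `symbol_candidates` is hypothetical.

```python
import re

BACKTICKED = re.compile(r"`([^`\s]+)`")
IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def symbol_candidates(body: str) -> list:
    """Return backtick-wrapped tokens that look like code identifiers."""
    seen = []
    for token in BACKTICKED.findall(body):
        if len(token) < 3 or not IDENTIFIER.match(token):
            continue  # too short, or contains path/punctuation characters
        looks_camel = token[0].isupper() or re.search(r"[a-z][A-Z]", token) is not None
        looks_snake = "_" in token
        if (looks_camel or looks_snake) and token not in seen:
            seen.append(token)
    return seen
```

Each returned candidate would then be searched for in the repo root; zero matches is what produces `curator.finding.symbol_missing`.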
+### Usage Tracker
+Reads `~/.onlooker/logs/onlooker-events.jsonl` (rate-limited; the tail is enough for usage windows) for events of type `memory.recalled` and `memory.referenced` over the last `usage_window_days` (default: 30). For each memory file, computes recall count.
+The Onlooker event log does not yet emit `memory.recalled` events. Adding that emitter belongs to the ecosystem substrate (so all plugins benefit), not to curator. Until it ships, the usage tracker emits `curator.finding.unused_undetectable` once per scan and skips the rest of the pass. This is recorded as a hard dependency in [Open Questions #1](#open-questions).
+When the emitter ships: memories with zero recalls in the window are flagged `curator.finding.unused_low_signal`. The finding is informational only — the design does not propose removal based on usage alone, because the recall signal is itself noisy (the model may not surface a memory it should have, and a recalled memory may have been irrelevant).
+### Similarity Matrix and Contradiction Check (LLM sweep)
+Run at most once per `llm_sweep_interval_days`:
+1. Compute pairwise Jaccard similarity over normalized token sets (lowercased, stopwords removed, top-K tokens per body).
+2. Filter to pairs where similarity ≥ `contradiction_similarity_threshold` (default: 0.4) and where the two memories have at least one opposing sentiment marker (one contains `always`/`prefer`/`do` and the other contains `never`/`avoid`/`don't`).
+3. For each surviving pair, call Haiku with both memory bodies and ask:
+```
+You are evaluating whether two memory entries contradict each other in practice.
+Two memories CONTRADICT when applying both leads to inconsistent action.
+Two memories COMPLEMENT when they apply in different contexts and a careful reader
+   can follow both.
+Two memories are REDUNDANT when one strictly subsumes the other.
+RULES:
+- Output only: {"verdict": "<contradict|complement|redundant|unrelated>",
+                "rationale": "<≤30 words>"}
+<memory_a>
+title: {{TITLE_A}}
+body: {{BODY_A}}
+</memory_a>
+<memory_b>
+title: {{TITLE_B}}
+body: {{BODY_B}}
+</memory_b>
+```
+Model: `claude-haiku-4-5-20251001`. Temperature 0.2. Max output tokens: 96.
+`contradict` verdicts become `curator.finding.contradiction`. `redundant` verdicts become `curator.finding.redundant_pair`. `complement` and `unrelated` are logged but not surfaced.
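Steps 1 and 2 of the sweep (the pre-filter that runs before any LLM call) can be sketched as below. The stopword list is a tiny placeholder, the top-K truncation is omitted, and the function names are illustrative assumptions; only the marker words and the 0.4 default come from the design.

```python
import re
from itertools import combinations

STOPWORDS = {"the", "a", "an", "and", "or", "for", "to", "of", "in", "is", "was", "this"}
POSITIVE = {"always", "prefer", "do"}
NEGATIVE = {"never", "avoid", "don't"}


def tokens(body: str) -> set:
    """Lowercased word set; apostrophes are kept so "don't" stays one token."""
    return set(re.findall(r"[a-z']+", body.lower()))


def jaccard(a: set, b: set) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def candidate_pairs(memories: dict, threshold: float = 0.4) -> list:
    """Return (file_a, file_b, similarity) for pairs worth sending to the LLM."""
    pairs = []
    for (name_a, body_a), (name_b, body_b) in combinations(memories.items(), 2):
        raw_a, raw_b = tokens(body_a), tokens(body_b)
        # Markers are checked before stopword removal so "do" is not lost.
        opposing = bool((raw_a & POSITIVE and raw_b & NEGATIVE)
                        or (raw_a & NEGATIVE and raw_b & POSITIVE))
        sim = jaccard(raw_a - STOPWORDS, raw_b - STOPWORDS)
        if opposing and sim >= threshold:
            pairs.append((name_a, name_b, round(sim, 2)))
    return pairs
```

Only the pairs this filter returns would be passed to the contradiction prompt, subject to the `max_pair_evaluations_per_sweep` cap.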
+### Findings Store and Surfacer
+Each finding is written to `~/.onlooker/curator/<project-key>/findings/<ulid>.json`:
+```json
+{
+  "id": "01J...",
+  "kind": "date_decayed | path_broken | symbol_missing | url_unchecked | unused_low_signal | contradiction | redundant_pair | broken_index | orphaned_memory",
+  "memory_files": ["feedback_no_trailing_summaries.md"],
+  "detail": { ... kind-specific ... },
+  "created_at": "2026-06-02T18:24:11Z",
+  "deduped_hash": "...",
+  "status": "open | acknowledged | resolved"
+}
+```
+The `deduped_hash` prevents the same finding from being re-emitted every session. Same shape as cartographer's `payload.finding_hash`.
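One possible construction for `deduped_hash` is a digest over the finding kind, the memory file names, and a hash of each memory body, so that editing a body yields a new hash (the concern raised in Open Questions #4). The design does not specify the hash inputs; everything below, including the name `finding_hash`, is an assumption.

```python
import hashlib
import json


def finding_hash(kind: str, memory_bodies: dict, detail=None) -> str:
    """Stable digest for a finding; `memory_bodies` maps file name -> body text."""
    body_digests = {
        name: hashlib.sha256(body.encode("utf-8")).hexdigest()
        for name, body in memory_bodies.items()
    }
    # sort_keys makes the digest independent of dict insertion order.
    payload = json.dumps(
        {"kind": kind, "bodies": body_digests, "detail": detail},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```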
+At `SessionStart`, curator counts open findings by kind and emits a one-line `additionalContext` pointer:
+> Curator: 1 contradiction, 2 path-broken, 1 date-decayed. Review with `/curator review`.
+The pointer caps the injected context at one line; finding details live in the skill, not in context.
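A minimal sketch of how the pointer line could be assembled from stored findings. The ordering (most frequent kind first), the truncation by slicing, and the zero-findings message are assumptions; `max_chars` and `skip_when_zero` mirror the `surfacer` config keys.

```python
from collections import Counter
from typing import Optional


def pointer_line(findings: list, max_chars: int = 200,
                 skip_when_zero: bool = True) -> Optional[str]:
    """Build the one-line SessionStart pointer from finding records."""
    counts = Counter(f["kind"] for f in findings if f.get("status") == "open")
    if not counts:
        # Hypothetical wording for the zero case; the design only shows non-zero output.
        return None if skip_when_zero else "Curator: no open findings."
    parts = [f"{n} {kind.replace('_', '-')}" for kind, n in counts.most_common()]
    line = "Curator: " + ", ".join(parts) + ". Review with `/curator review`."
    return line[:max_chars]
```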
+---
+## Integration Points
+**Librarian.** Curator reads the `source: "librarian"` provenance so that it can apply different staleness criteria to librarian-promoted memories than to hand-written ones. Whether it should is an open question; the current default treats both identically.
+**Cartographer.** Same shape; different substrate. They can run independently. Curator's findings format intentionally mirrors cartographer's so a future unified findings dashboard can render both.
+**Ecosystem substrate.** Curator depends on a `memory.recalled` / `memory.referenced` event emitter that does not yet exist. Until it ships, the usage tracker is dormant.
+**Counsel.** Counsel reads curator's findings as part of the weekly observability brief; curator does not need to know about counsel.
+**Historian.** Independent. Curator audits the distilled memory store; historian operates on the transcript embeddings. A path that's stale in a memory is not made fresh by being in a transcript.
+---
+## Configuration (`config.json`)
+```json
+{
+  "plugin_name": "curator",
+  "storage_path": "${ONLOOKER_DIR:-$HOME/.onlooker}",
+  "curator": {
+    "enabled": false,
+    "memory_store_path": "${HOME}/.claude/projects/${CLAUDE_PROJECT_ENCODED}/memory",
+    "cheap_checks": {
+      "enabled": true,
+      "wall_clock_budget_ms": 500,
+      "skip_if_session_age_under_seconds": 5
+    },
+    "date_check": {
+      "enabled": true,
+      "date_grace_period_days": 14
+    },
+    "reference_check": {
+      "enabled": true,
+      "check_urls": false,
+      "url_allowlist": []
+    },
+    "usage_tracker": {
+      "enabled": true,
+      "usage_window_days": 30
+    },
+    "llm_sweep": {
+      "enabled": true,
+      "model": "claude-haiku-4-5-20251001",
+      "temperature": 0.2,
+      "max_output_tokens": 96,
+      "interval_days": 7,
+      "max_pair_evaluations_per_sweep": 50,
+      "contradiction_similarity_threshold": 0.40
+    },
+    "surfacer": {
+      "max_pointer_chars": 200,
+      "skip_when_zero": true
+    }
+  }
+}
+```
+`skip_if_session_age_under_seconds` exists because a session start followed quickly by another session start (compaction, restart) shouldn't re-run the cheap checks.
+---
+## Events
+| Event | Trigger | Key payload fields |
+|---|---|---|
+| `curator.scan.started` | Scan run begins | `mode: cheap\|llm\|manual`, `findings_open_before` |
+| `curator.scan.completed` | Scan run ends | `findings_new`, `findings_resolved`, `duration_ms` |
+| `curator.scan.skipped` | Skipped by rate gate | `reason: over_budget\|llm_interval_not_elapsed\|disabled` |
+| `curator.finding.date_decayed` | A dated phrase is past the grace period | `memory_file`, `matched_phrase`, `days_past` |
+| `curator.finding.path_broken` | Path reference does not resolve | `memory_file`, `broken_path` |
+| `curator.finding.symbol_missing` | Backticked identifier returns zero rg matches | `memory_file`, `symbol` |
+| `curator.finding.url_unchecked` | URL present, host not in allowlist | `memory_file`, `url_host` |
+| `curator.finding.unused_low_signal` | Zero recalls in window (when emitter exists) | `memory_file`, `window_days` |
+| `curator.finding.unused_undetectable` | Usage emitter not present | `note: "memory.recalled events not implemented"` |
+| `curator.finding.contradiction` | LLM verdict `contradict` | `memory_a`, `memory_b`, `rationale` |
+| `curator.finding.redundant_pair` | LLM verdict `redundant` | `memory_a`, `memory_b`, `rationale` |
+| `curator.finding.broken_index` | MEMORY.md references missing file | `referenced_file` |
+| `curator.finding.orphaned_memory` | Memory file not referenced from MEMORY.md | `memory_file` |
+| `curator.finding.acknowledged` | User acknowledged finding via skill (no action taken) | `finding_id` |
+| `curator.finding.resolved` | User resolved finding via skill (action taken) | `finding_id`, `action: prune\|edit\|reclassify\|defer` |
+---
+## Skills
+**`/curator review`** — interactive walkthrough of open findings. For each: shows the memory body excerpt, the finding kind and detail, and offers prune / edit / reclassify / acknowledge / defer.
+**`/curator scan`** — forces a full sweep including the LLM pass. Ignores rate gates.
+**`/curator calibrate`** — runs the LLM sweep against the current memory store and reports precision against a labeled set (which the user maintains in `~/.onlooker/curator/<project-key>/calibration_labels.json`). Useful for tuning `contradiction_similarity_threshold`.
+---
+## Open Questions
+1. **`memory.recalled` event dependency.** The usage tracker requires an event emitter in the ecosystem substrate that does not yet exist. The substrate change is small (`UserPromptExpansion` hook can emit an event each time a memory is reinjected) but it is a prerequisite. Until then, the usage signal is dormant — `curator.finding.unused_undetectable` is emitted once per scan to make the missing capability visible.
+2. **Librarian-promoted vs. hand-written staleness.** A librarian-promoted memory was distilled from a session; its staleness criteria might be "the source session is older than X." A hand-written memory has no equivalent decay marker. The current design treats them identically; the provenance field is captured but not yet used differently.
+3. **LLM sweep cost growth.** Candidate pairs grow as O(N²) in the number of memories: N memories yield N(N−1)/2 pairs before filtering (4,950 at N = 100; 124,750 at N = 500). At 100 memories with similarity-filtering, the sweep is typically under 10 LLM calls; at 500 memories the worst case approaches the `max_pair_evaluations_per_sweep` cap. A smarter pre-filter (e.g., embedding-based clustering to limit pair candidates) becomes worthwhile around 200 memories.
+4. **Finding dedup vs. re-evaluation.** A `date_decayed` finding for `2026-03-05` is the same fact every session — `deduped_hash` prevents re-emission. But a `contradiction` finding between two memories may be re-evaluated if either memory's body changes; the dedup hash should include both bodies' hashes, not just memory IDs.
+5. **Auto-prune as a future opt-in.** Like librarian's `auto_promote`, curator could grow an `auto_prune` mode for high-confidence findings (e.g., `path_broken` with no possible interpretation). Deferred until the cheap-check precision is measured in practice.
+6. **Type-drift detection.** Mentioned as failure mode E but not addressed by the current checks. Would require an LLM call per memory: "given this body, what type fits best?" — too expensive for every session, plausible for the weekly sweep.
+7. **Interaction with `~/.claude/CLAUDE.md`.** Global instructions in `~/.claude/CLAUDE.md` shape behavior but live outside the typed memory store. Curator does not audit them — cartographer does. If the boundary moves (e.g., librarian gains the ability to propose `~/.claude/CLAUDE.md` edits), curator and cartographer will need a shared rule for which substrate owns which file.
+---
+## Non-Goals
+- Does not edit the memory store automatically — same posture as librarian and cartographer.
+- Does not write new memories — that is librarian's job.
+- Does not perform retrieval — the typed memory store reinjection mechanism is owned elsewhere.
+- Does not audit instruction files (CLAUDE.md, AGENTS.md, `.claude/rules/`) — that is cartographer's job.
+- Does not synthesize cross-session improvement briefs — that is counsel's job.
+- Does not block any tool call — curator's surfacer is informational only.