@onlooker-community/ecosystem 0.19.0 → 0.21.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (44)
  1. package/.claude-plugin/marketplace.json +26 -0
  2. package/.claude-plugin/plugin.json +1 -1
  3. package/.release-please-manifest.json +4 -2
  4. package/CHANGELOG.md +14 -0
  5. package/docs/memory-architecture.md +102 -0
  6. package/package.json +3 -3
  7. package/plugins/curator/.claude-plugin/plugin.json +14 -0
  8. package/plugins/curator/CHANGELOG.md +10 -0
  9. package/plugins/curator/README.md +55 -0
  10. package/plugins/curator/config.json +41 -0
  11. package/plugins/curator/docs/adr/001-staleness-tiers.md +100 -0
  12. package/plugins/curator/docs/design.md +311 -0
  13. package/plugins/curator/hooks/hooks.json +15 -0
  14. package/plugins/curator/scripts/hooks/curator-session-start.sh +343 -0
  15. package/plugins/curator/scripts/lib/curator-checks.sh +155 -0
  16. package/plugins/curator/scripts/lib/curator-config.sh +67 -0
  17. package/plugins/curator/scripts/lib/curator-emit.sh +61 -0
  18. package/plugins/curator/scripts/lib/curator-memory-reader.sh +225 -0
  19. package/plugins/curator/scripts/lib/curator-project-key.sh +82 -0
  20. package/plugins/curator/scripts/lib/curator-storage.sh +176 -0
  21. package/plugins/curator/scripts/lib/curator-ulid.sh +43 -0
  22. package/plugins/historian/docs/adr/001-local-embeddings-only.md +96 -0
  23. package/plugins/historian/docs/design.md +317 -0
  24. package/plugins/librarian/.claude-plugin/plugin.json +14 -0
  25. package/plugins/librarian/CHANGELOG.md +10 -0
  26. package/plugins/librarian/README.md +51 -0
  27. package/plugins/librarian/config.json +52 -0
  28. package/plugins/librarian/docs/adr/001-propose-dont-auto-write.md +87 -0
  29. package/plugins/librarian/docs/design.md +301 -0
  30. package/plugins/librarian/hooks/hooks.json +26 -0
  31. package/plugins/librarian/scripts/hooks/librarian-session-end.sh +312 -0
  32. package/plugins/librarian/scripts/hooks/librarian-session-start.sh +103 -0
  33. package/plugins/librarian/scripts/lib/librarian-archivist-reader.sh +67 -0
  34. package/plugins/librarian/scripts/lib/librarian-classifier.sh +139 -0
  35. package/plugins/librarian/scripts/lib/librarian-config.sh +74 -0
  36. package/plugins/librarian/scripts/lib/librarian-durability.sh +77 -0
  37. package/plugins/librarian/scripts/lib/librarian-emit.sh +72 -0
  38. package/plugins/librarian/scripts/lib/librarian-project-key.sh +83 -0
  39. package/plugins/librarian/scripts/lib/librarian-storage.sh +222 -0
  40. package/plugins/librarian/scripts/lib/librarian-ulid.sh +50 -0
  41. package/release-please-config.json +32 -0
  42. package/test/bats/curator-session-start.bats +316 -0
  43. package/test/bats/librarian-session-end.bats +182 -0
  44. package/test/bats/librarian-session-start.bats +136 -0
@@ -0,0 +1,311 @@
1
+ # Curator — Plugin Design
2
+
3
+ **Plugin name:** `curator`
4
+ **Tagline:** *Tends the memory garden.*
5
+ **Status:** Design (pre-implementation)
6
+
7
+ Curator is the maintenance layer for the user's typed memory store. It runs cheap heuristic checks at every `SessionStart` and an LLM-backed conflict sweep at most weekly, surfaces stale references, decayed dates, and contradicting entries, and proposes prunes for user review. It does not edit the memory store directly — the same posture librarian and cartographer adopt for durable substrates.
8
+
9
+ It sits in the [memory architecture](../../../docs/memory-architecture.md) downstream of librarian: librarian writes (with user confirmation); curator audits. Curator is parallel to cartographer: same shape (audit, propose, surface), different substrate. Cartographer audits hand-maintained instruction files (CLAUDE.md, AGENTS.md, `.claude/rules/`); curator audits the typed auto-memory store at `~/.claude/projects/<encoded-project>/memory/`.
10
+
11
+ ---
12
+
13
+ ## Failure Modes Curator Addresses
14
+
15
+ **A — Decayed date references.** A project memory says "merge freeze begins 2026-03-05 for mobile release cut." After March 5 passes, the memory is at best uninformative and at worst misleading (the model continues to flag work as freeze-sensitive). Curator detects past-tense date markers and proposes removal or refactor.
16
+
17
+ **B — Stale path references.** A reference memory says "see `scripts/legacy_ingest.py` for the old pipeline shape." The file has since been deleted, so the memory now points to nothing. Curator validates path references during the cheap sweep at each session start and flags broken ones.
18
+
19
+ **C — Contradicting memories.** A user memory says "prefer functional patterns" and a feedback memory says "yes, the class-based approach was right for this hot path." Both are true in their original contexts. The model has to reconcile them at runtime, often badly. Curator's LLM-backed sweep finds high-similarity, opposing-sentiment pairs and surfaces the contradiction for human disambiguation.
20
+
21
+ **D — Unused memories (weakest signal).** A memory has been in the store for 90 days and has never been surfaced as relevant in any session (signal: no `memory.recalled` event references it). It might be load-bearing as a backstop, or it might be dead weight. Curator flags but does not propose removal — the signal is too noisy for action.
22
+
23
+ **E — Type drift.** A `project` memory ("we're rewriting auth for compliance") is better recast as a `feedback` memory ("this directory looks weird because of legal review") once the rewrite is done. The original type still fits, but a better one now exists. Curator can detect type-drift candidates, but the action (reclassification) is necessarily manual.
24
+
25
+ ---
26
+
27
+ ## Architecture
28
+
29
+ ```
30
+ SessionStart hook fires
31
+           │
32
+           ▼
33
+ ┌──────────────────────┐
34
+ │ Rate Gate            │ cheap checks: every session
35
+ │                      │ LLM checks: once per llm_sweep_interval_days
36
+ └─────────┬────────────┘
37
+           │
38
+           ▼
39
+ ┌──────────────────────┐
40
+ │ Memory Reader        │ reads MEMORY.md + *.md files from memory store
41
+ │                      │ parses frontmatter (name, description, type)
42
+ └─────────┬────────────┘
43
+           │
44
+           ▼ (cheap sweep, every session)
45
+ ┌──────────────────────┐
46
+ │ Date Checker         │ parse dates from bodies; flag past-tense markers
47
+ └─────────┬────────────┘
48
+           │
49
+           ▼
50
+ ┌──────────────────────┐
51
+ │ Reference Checker    │ validate path refs (file exists), symbol refs
52
+ │                      │ (rg the symbol; warn on zero matches), URL refs
53
+ │                      │ (HEAD with budget; skipped without consent)
54
+ └─────────┬────────────┘
55
+           │
56
+           ▼
57
+ ┌──────────────────────┐
58
+ │ Usage Tracker        │ read JSONL log; correlate memory IDs with
59
+ │                      │ memory.recalled events from N days
60
+ └─────────┬────────────┘
61
+           │
62
+           ▼ (LLM sweep, if interval elapsed)
63
+ ┌──────────────────────┐
64
+ │ Similarity Matrix    │ Jaccard on token sets; pairs with sim > threshold
65
+ │                      │ → LLM contradiction check
66
+ └─────────┬────────────┘
67
+           │
68
+           ▼
69
+ ┌──────────────────────┐
70
+ │ Findings Store       │ ~/.onlooker/curator/<key>/findings/<ulid>.json
71
+ └─────────┬────────────┘
72
+           │ at SessionStart
73
+           ▼
74
+ ┌──────────────────────┐
75
+ │ Surfacer             │ "Curator: 2 stale, 1 contradicting findings."
76
+ │                      │ Review via /curator review.
77
+ └──────────────────────┘
78
+ ```
79
+
80
+ ### Rate Gate
81
+
82
+ Three categories of check, three cadences:
83
+
84
+ - **Cheap checks (date, reference, usage):** run every `SessionStart`. Combined wall-clock budget: ≤500ms. Above that, curator emits `curator.scan.skipped` with `reason: "over_budget"` and defers.
85
+ - **LLM contradiction sweep:** runs at most once per `llm_sweep_interval_days` (default: 7) per project. Watermark stored at `~/.onlooker/curator/<project-key>/last_llm_sweep.json`.
86
+ - **Manual sweep:** `/curator scan` forces a full sweep including the LLM pass, ignoring rate gates.
87
+
88
+ The rate gate exists because curator runs on every session start, and a quadratic LLM pass on a growing memory store is the worst kind of background cost: invisible, recurring, and proportional to user investment.
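The LLM-tier gate reduces to comparing a stored watermark with the current time. The bash sketch below (jq required) assumes, hypothetically, that `last_llm_sweep.json` holds a single `last_sweep_at` UTC timestamp; the real watermark schema is not specified in this design, and `curator_llm_sweep_due` is an illustrative name rather than a plugin function.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of the plugin. Returns 0 ("sweep due") when
# the watermark is missing or unreadable, or when at least interval_days
# have elapsed since the recorded sweep. Assumed watermark shape:
#   {"last_sweep_at": "2026-06-02T18:24:11Z"}
curator_llm_sweep_due() {
  local watermark_file="$1" interval_days="$2" now_s="${3:-$(date +%s)}"
  local last_s

  # No watermark yet: treat as never swept.
  [[ -f "$watermark_file" ]] || return 0

  # fromdateiso8601 expects YYYY-MM-DDTHH:MM:SSZ; a missing or malformed
  # field makes jq fail, which also counts as "due".
  last_s=$(jq -r '.last_sweep_at | fromdateiso8601' "$watermark_file" 2>/dev/null) \
    || return 0
  [[ "$last_s" =~ ^[0-9]+$ ]] || return 0

  # Due once the elapsed seconds reach the interval.
  (( now_s - last_s >= interval_days * 86400 ))
}
```

A missing or unreadable watermark counts as "due" here, which errs toward running the sweep; a cost-conscious implementation might prefer the opposite default.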
89
+
90
+ ### Memory Reader
91
+
92
+ Parses the typed memory store:
93
+
94
+ 1. Reads `~/.claude/projects/<encoded-project>/memory/MEMORY.md` for the index entries.
95
+ 2. For each line of the form `- [Title](file.md) — hook`, resolves `file.md` against the memory dir.
96
+ 3. Reads each referenced file. Parses YAML frontmatter (`name`, `description`, `type`). The body after frontmatter is the memory content.
97
+ 4. If a file is referenced from `MEMORY.md` but does not exist, that itself is a `curator.finding.broken_index`, surfaced immediately.
98
+ 5. If a file exists in the memory dir but is not referenced from `MEMORY.md`, that is a `curator.finding.orphaned_memory`, also surfaced.
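Steps 2, 4 and 5 amount to a set difference between the filenames the index references and the `*.md` files on disk. A rough sketch, assuming index lines begin with `- [Title](file.md)` and that every reference is a bare filename inside the memory dir (subdirectory paths are not handled); the function names are hypothetical:

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the MEMORY.md index audit, not the plugin's code.

# Print filenames referenced by index lines "- [Title](file.md) ...",
# C-sorted so comm(1) can compare them.
_index_refs() {
  grep -oE '^- \[[^]]*]\([^)]+\.md\)' "$1" 2>/dev/null \
    | sed -E 's/^- \[[^]]*]\(([^)]+)\)$/\1/' \
    | LC_ALL=C sort -u
}

# Print *.md files in the memory dir, excluding the index itself.
_on_disk() {
  local f
  for f in "$1"/*.md; do
    [[ -f "$f" ]] || continue   # an unmatched glob stays literal; skip it
    f=${f##*/}
    [[ "$f" == "MEMORY.md" ]] || printf '%s\n' "$f"
  done | LC_ALL=C sort -u
}

curator_index_audit() {
  local mem_dir="$1"
  # comm -23: referenced but missing on disk -> broken_index
  LC_ALL=C comm -23 <(_index_refs "$mem_dir/MEMORY.md") <(_on_disk "$mem_dir") \
    | sed 's/^/broken_index /'
  # comm -13: present on disk but never referenced -> orphaned_memory
  LC_ALL=C comm -13 <(_index_refs "$mem_dir/MEMORY.md") <(_on_disk "$mem_dir") \
    | sed 's/^/orphaned_memory /'
}
```

Frontmatter parsing (step 3) is left out; each output line is a `kind filename` pair that could feed the findings writer.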
99
+
100
+ ### Date Checker
101
+
102
+ For each memory body, scans for date patterns and absolute references:
103
+
104
+ - **ISO-8601 dates** (`2026-03-05`, `2026-03-05T10:00:00Z`).
105
+ - **Quarter markers** (`Q1 2026`, `2026Q3`).
106
+ - **Named deadlines** with absolute dates nearby (`freeze`, `deadline`, `release cut`, `migration`, `cutover`, `EOL`, `expires`).
107
+ - **Relative-to-write markers** when the frontmatter has a discoverable write date (`promoted_at`, `created_at`): phrases like "next week", "by end of month", "this Friday" relative to that date.
108
+
109
+ For each match, compares to today's date. If a date is more than `date_grace_period_days` (default: 14) in the past, emits `curator.finding.date_decayed` with the matched phrase and the gap in days.
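A minimal sketch of the ISO-8601 part of the check, assuming jq is available for date arithmetic (`fromdateiso8601`). It prints each `YYYY-MM-DD` date older than the grace period together with the gap in days; quarter markers, named deadlines and relative phrases are not covered, and `curator_decayed_dates` is a hypothetical name:

```bash
#!/usr/bin/env bash
# Hypothetical sketch: print "<date> <days_past>" for each YYYY-MM-DD date in
# a memory body that lies more than grace_days before today (UTC).

# Epoch seconds for midnight UTC on the given YYYY-MM-DD.
_day_epoch() {
  jq -rn --arg d "$1" '($d + "T00:00:00Z") | fromdateiso8601'
}

curator_decayed_dates() {
  local body="$1" grace_days="$2" today="${3:-$(date -u +%Y-%m-%d)}"
  local now_s d d_s days
  now_s=$(_day_epoch "$today") || return 1

  # Full timestamps (2026-03-05T10:00:00Z) are matched on their date part.
  while read -r d; do
    d_s=$(_day_epoch "$d" 2>/dev/null) || continue   # skip strings jq cannot parse
    days=$(( (now_s - d_s) / 86400 ))
    if (( days > grace_days )); then
      printf '%s %d\n' "$d" "$days"
    fi
  done < <(grep -oE '[0-9]{4}-[0-9]{2}-[0-9]{2}' <<<"$body" | LC_ALL=C sort -u)
}
```

For a body mentioning `2026-03-05`, checked on `2026-04-01` with the default 14-day grace period, the output is `2026-03-05 27`. The strict `>` means a date exactly 14 days old is not yet flagged, matching the "more than" wording above.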
110
+
111
+ The check does not propose removal automatically — past dates often have lingering relevance ("freeze on 2026-03-05" might still document why a code shape is the way it is). The user decides whether to remove, refactor, or keep.
112
+
113
+ ### Reference Checker
114
+
115
+ For each memory body, scans for three kinds of references:
116
+
117
+ 1. **Path references.** Patterns matching `path/to/file.ext` heuristics. For each candidate path, resolves against the repo root (from `git rev-parse --show-toplevel`). If the path does not exist, emits `curator.finding.path_broken` with the memory file and the broken path.
118
+ 2. **Symbol references.** Heuristic: backtick-wrapped identifiers (`` `myFunction` ``, `` `MyClass` ``) that look like code identifiers (CamelCase or snake_case with no spaces, length ≥ 3). For each, runs `rg --type-add 'all:*' --type all -F 'identifier'` in the repo root. If zero matches, emits `curator.finding.symbol_missing`.
119
+ 3. **URL references.** Optional, disabled by default. When `check_urls: true` and the URL host is not in `url_allowlist`, curator emits `curator.finding.url_unchecked` (a record that the memory contains an external URL it cannot validate without network). URLs in the allowlist (and only those) are HEAD-checked under a wall-clock budget.
120
+
121
+ The reference checker treats matches as evidence of liveness, not correctness. A symbol that grep-matches might still be the wrong symbol; a path that resolves might point to renamed content. The checker is a smoke alarm, not a fire inspector: it can tell you something may be wrong, but not that everything is right.
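The two local heuristics can be sketched in bash as follows. The regular expressions are illustrative assumptions, not the plugin's patterns, and the `rg` step that turns a candidate symbol into `symbol_missing` is omitted so the sketch runs without ripgrep installed:

```bash
#!/usr/bin/env bash
# Hypothetical sketches of the path and symbol heuristics; URL handling omitted.

# Path candidates contain at least one "/" and end in a file extension,
# e.g. scripts/legacy_ingest.py. Prints candidates missing under repo_root.
curator_broken_paths() {
  local repo_root="$1" body="$2" p
  while read -r p; do
    [[ -e "$repo_root/$p" ]] || printf '%s\n' "$p"
  done < <(LC_ALL=C grep -oE '[[:alnum:]_.-]+(/[[:alnum:]_.-]+)+\.[[:alnum:]]+' <<<"$body" \
             | LC_ALL=C sort -u)
}

# Symbol candidates: backticked tokens, length >= 3, no spaces, that look like
# code: they contain "_" (snake_case) or a lower-to-upper transition
# (camelCase / PascalCase). Plain words such as `foo` are dropped.
curator_symbol_candidates() {
  LC_ALL=C grep -oE '`[[:alpha:]_][[:alnum:]_]{2,}`' <<<"$1" \
    | tr -d '`' \
    | LC_ALL=C grep -E '_|[[:lower:]][[:upper:]]' \
    | LC_ALL=C sort -u
}
```

Each missing path maps to `path_broken`; each symbol candidate would then be searched with `rg -F` and reported as `symbol_missing` on zero matches.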
122
+
123
+ ### Usage Tracker
124
+
125
+ Reads `~/.onlooker/logs/onlooker-events.jsonl` for events of type `memory.recalled` and `memory.referenced` over the last `usage_window_days` (default: 30). The read is bounded: only the tail of the log is needed to cover the usage window. For each memory file, computes a recall count.
126
+
127
+ The Onlooker event log does not yet emit `memory.recalled` events. Adding that emitter belongs to the ecosystem substrate (so all plugins benefit), not to curator. Until it ships, the usage tracker emits `curator.finding.unused_undetectable` once per scan and skips the rest of the pass. This is recorded as a hard dependency in [Open Questions #1](#open-questions).
128
+
129
+ When the emitter ships: memories with zero recalls in the window are flagged `curator.finding.unused_low_signal`. The finding is informational only — the design does not propose removal based on usage alone, because the recall signal is itself noisy (the model may not surface a memory it should have, and a recalled memory may have been irrelevant).
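Once the emitter exists, the zero-recall pass is a filter over the event log. A sketch with jq (1.6 or later, for `--args`), assuming a hypothetical event shape (`event_type`, whole-second UTC `timestamp`, `payload.memory_file`) that this design does not define:

```bash
#!/usr/bin/env bash
# Hypothetical sketch. Assumed (not confirmed) event shape, one object per line:
#   {"event_type":"memory.recalled","timestamp":"2026-05-20T10:00:00Z",
#    "payload":{"memory_file":"user_style.md"}}
# Prints each given memory file that has zero recalls inside the window.
curator_unused_memories() {
  local log="$1" window_days="$2" now_s="$3"
  shift 3
  local files_json
  files_json=$(jq -cn '$ARGS.positional' --args "$@")

  jq -rn \
    --argjson cutoff "$(( now_s - window_days * 86400 ))" \
    --argjson files "$files_json" '
    # Collect memory files recalled or referenced at or after the cutoff.
    [ inputs
      | select(.event_type == "memory.recalled" or .event_type == "memory.referenced")
      | select((.timestamp | fromdateiso8601) >= $cutoff)
      | .payload.memory_file ] as $recalled
    # Emit every candidate file that never appears in that list.
    | $files[]
    | select(. as $f | any($recalled[]; . == $f) | not)' "$log"
}
```

The output is only input for `curator.finding.unused_low_signal`; nothing here proposes removal.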
130
+
131
+ ### Similarity Matrix and Contradiction Check (LLM sweep)
132
+
133
+ Run at most once per `llm_sweep_interval_days`:
134
+
135
+ 1. Compute pairwise Jaccard similarity over normalized token sets (lowercased, stopwords removed, top-K tokens per body).
136
+ 2. Filter to pairs where similarity ≥ `contradiction_similarity_threshold` (default: 0.4) and where the two memories have at least one opposing sentiment marker (one contains `always`/`prefer`/`do` and the other contains `never`/`avoid`/`don't`).
137
+ 3. For each surviving pair, call Haiku with both memory bodies and ask:
138
+
139
+ ```
140
+ You are evaluating whether two memory entries contradict each other in practice.
141
+
142
+ Two memories CONTRADICT when applying both leads to inconsistent action.
143
+ Two memories COMPLEMENT when they apply in different contexts and a careful reader
144
+ can follow both.
145
+ Two memories are REDUNDANT when one strictly subsumes the other.
146
+
147
+ RULES:
148
+ - Output only: {"verdict": "<contradict|complement|redundant|unrelated>",
149
+ "rationale": "<≤30 words>"}
150
+
151
+ <memory_a>
152
+ title: {{TITLE_A}}
153
+ body: {{BODY_A}}
154
+ </memory_a>
155
+
156
+ <memory_b>
157
+ title: {{TITLE_B}}
158
+ body: {{BODY_B}}
159
+ </memory_b>
160
+ ```
161
+
162
+ Model: `claude-haiku-4-5-20251001`. Temperature 0.2. Max output tokens: 96.
163
+
164
+ `contradict` verdicts become `curator.finding.contradiction`. `redundant` verdicts become `curator.finding.redundant_pair`. `complement` and `unrelated` are logged but not surfaced.
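Steps 1 and 2 (the pre-LLM filter) can be sketched in bash. The stopword list is a small sample, the "top-K tokens per body" cap is omitted, and all names are hypothetical:

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the similarity and opposing-sentiment filter.
STOPWORDS=(a an and are as at be by for in is it of on or the this that to was with)

# Lowercase, split on anything except letters, digits and apostrophes,
# drop stopwords, and emit a sorted unique token set.
_tokens() {
  tr '[:upper:]' '[:lower:]' <<<"$1" \
    | tr -cs "[:alnum:]'" '\n' \
    | sed '/^$/d' \
    | grep -vxF -f <(printf '%s\n' "${STOPWORDS[@]}") \
    | LC_ALL=C sort -u
}

# Jaccard similarity: |A intersect B| / |A union B|, two decimals.
curator_jaccard() {
  local inter union
  inter=$(( $(LC_ALL=C comm -12 <(_tokens "$1") <(_tokens "$2") | wc -l) ))
  union=$(( $(LC_ALL=C sort -u <(_tokens "$1") <(_tokens "$2") | wc -l) ))
  LC_ALL=C awk -v i="$inter" -v u="$union" \
    'BEGIN { printf "%.2f\n", (u > 0 ? i / u : 0) }'
}

# Succeeds when one body has a positive marker and the other a negative one.
_has_marker() { grep -qiwE "$2" <<<"$1"; }
curator_opposing() {
  local pos='always|prefer|do' neg="never|avoid|don't"
  { _has_marker "$1" "$pos" && _has_marker "$2" "$neg"; } ||
    { _has_marker "$1" "$neg" && _has_marker "$2" "$pos"; }
}
```

For "Always prefer functional patterns in the data layer." and "Avoid functional patterns in the data layer hot path.", 4 of 9 distinct tokens are shared, so `curator_jaccard` prints `0.44` (above the 0.40 default) and `curator_opposing` succeeds on `prefer`/`avoid`; the pair would go to Haiku.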
165
+
166
+ ### Findings Store and Surfacer
167
+
168
+ Each finding is written to `~/.onlooker/curator/<project-key>/findings/<ulid>.json`:
169
+
170
+ ```json
171
+ {
172
+ "id": "01J...",
173
+ "kind": "date_decayed | path_broken | symbol_missing | url_unchecked | unused_low_signal | contradiction | redundant_pair | broken_index | orphaned_memory",
174
+ "memory_files": ["feedback_no_trailing_summaries.md"],
175
+ "detail": { ... kind-specific ... },
176
+ "created_at": "2026-06-02T18:24:11Z",
177
+ "deduped_hash": "...",
178
+ "status": "open | acknowledged | resolved"
179
+ }
180
+ ```
181
+
182
+ The `deduped_hash` prevents the same finding from being re-emitted every session. Same shape as cartographer's `payload.finding_hash`.
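The hook script builds its hash input as the finding kind, a `|`, and the detail object re-serialized with `jq -cS`. A sketch of that idea, assuming SHA-256 as the digest (the digest that `curator_finding_hash` actually uses is not shown here) and a hypothetical `curator_dedup_hash` name:

```bash
#!/usr/bin/env bash
# Hypothetical sketch: deduped_hash = SHA-256("<kind>|<canonical detail JSON>").
_sha256() {
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum | cut -d' ' -f1
  else
    shasum -a 256 | cut -d' ' -f1   # macOS fallback
  fi
}

curator_dedup_hash() {
  local kind="$1" detail_json="$2" canonical
  # -S sorts object keys, so key order in the detail does not change the hash.
  canonical=$(jq -cS . <<<"$detail_json") || return 1
  printf '%s|%s' "$kind" "$canonical" | _sha256
}
```

Two detail objects that differ only in key order hash identically, while the same detail under a different kind does not.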
183
+
184
+ At `SessionStart`, curator counts open findings by kind and emits a one-line `additionalContext` pointer:
185
+
186
+ > Curator: 1 contradiction, 2 path-broken, 1 date-decayed. Review with `/curator review`.
187
+
188
+ The pointer caps the injection at one line; finding details live in the skill, not in context.
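A sketch of the pointer builder, assuming findings are stored one JSON object per file with `kind` and `status` fields, as in the record shape above. Kinds are grouped alphabetically, so the order can differ from the example line; `curator_pointer` is a hypothetical name:

```bash
#!/usr/bin/env bash
# Hypothetical sketch: count open findings by kind and print one pointer line,
# truncated to max_chars. Prints nothing when no findings are open
# (the skip_when_zero behaviour).
curator_pointer() {
  local dir="$1" max_chars="${2:-200}" counts line restore_glob
  restore_glob=$(shopt -p nullglob || true)
  shopt -s nullglob
  local files=("$dir"/*.json)
  eval "$restore_glob"   # put nullglob back the way the caller had it
  (( ${#files[@]} > 0 )) || return 0

  # group_by sorts kinds alphabetically; "_" is shown as "-".
  counts=$(jq -rs '
    map(select(.status == "open"))
    | group_by(.kind)
    | map("\(length) \(.[0].kind | split("_") | join("-"))")
    | join(", ")' "${files[@]}") || return 0
  [[ -n "$counts" ]] || return 0

  line="Curator: ${counts}. Review with \`/curator review\`."
  printf '%s\n' "${line:0:max_chars}"
}
```

With one open `contradiction` and two open `path_broken` findings, it prints a line starting `Curator: 1 contradiction, 2 path-broken.`; resolved findings are ignored, and an empty store prints nothing.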
189
+
190
+ ---
191
+
192
+ ## Integration Points
193
+
194
+ **Librarian.** Curator can use the `source: "librarian"` provenance to apply different staleness criteria to librarian-promoted memories vs. hand-written ones. This is an open question; the current default treats them identically.
195
+
196
+ **Cartographer.** Same shape; different substrate. They can run independently. Curator's findings format intentionally mirrors cartographer's so a future unified findings dashboard can render both.
197
+
198
+ **Ecosystem substrate.** Curator depends on a `memory.recalled` / `memory.referenced` event emitter that does not yet exist. Until it ships, the usage tracker is dormant.
199
+
200
+ **Counsel.** Counsel reads curator's findings as part of the weekly observability brief; curator does not need to know about counsel.
201
+
202
+ **Historian.** Independent. Curator audits the distilled memory store; historian operates on the transcript embeddings. A path that's stale in a memory is not made fresh by being in a transcript.
203
+
204
+ ---
205
+
206
+ ## Configuration (`config.json`)
207
+
208
+ ```json
209
+ {
210
+ "plugin_name": "curator",
211
+ "storage_path": "${ONLOOKER_DIR:-$HOME/.onlooker}",
212
+ "curator": {
213
+ "enabled": false,
214
+ "memory_store_path": "${HOME}/.claude/projects/${CLAUDE_PROJECT_ENCODED}/memory",
215
+ "cheap_checks": {
216
+ "enabled": true,
217
+ "wall_clock_budget_ms": 500,
218
+ "skip_if_session_age_under_seconds": 5
219
+ },
220
+ "date_check": {
221
+ "enabled": true,
222
+ "date_grace_period_days": 14
223
+ },
224
+ "reference_check": {
225
+ "enabled": true,
226
+ "check_urls": false,
227
+ "url_allowlist": []
228
+ },
229
+ "usage_tracker": {
230
+ "enabled": true,
231
+ "usage_window_days": 30
232
+ },
233
+ "llm_sweep": {
234
+ "enabled": true,
235
+ "model": "claude-haiku-4-5-20251001",
236
+ "temperature": 0.2,
237
+ "max_output_tokens": 96,
238
+ "interval_days": 7,
239
+ "max_pair_evaluations_per_sweep": 50,
240
+ "contradiction_similarity_threshold": 0.40
241
+ },
242
+ "surfacer": {
243
+ "max_pointer_chars": 200,
244
+ "skip_when_zero": true
245
+ }
246
+ }
247
+ }
248
+ ```
249
+
250
+ `skip_if_session_age_under_seconds` exists because a session start followed quickly by another session start (compaction, restart) shouldn't re-run the cheap checks.
251
+
252
+ ---
253
+
254
+ ## Events
255
+
256
+ | Event | Trigger | Key payload fields |
257
+ |---|---|---|
258
+ | `curator.scan.started` | Scan run begins | `mode: cheap\|llm\|manual`, `findings_open_before` |
259
+ | `curator.scan.completed` | Scan run ends | `findings_new`, `findings_resolved`, `duration_ms` |
260
+ | `curator.scan.skipped` | Skipped by rate gate | `reason: over_budget\|llm_interval_not_elapsed\|disabled` |
261
+ | `curator.finding.date_decayed` | A dated phrase is past the grace period | `memory_file`, `matched_phrase`, `days_past` |
262
+ | `curator.finding.path_broken` | Path reference does not resolve | `memory_file`, `broken_path` |
263
+ | `curator.finding.symbol_missing` | Backticked identifier returns zero rg matches | `memory_file`, `symbol` |
264
+ | `curator.finding.url_unchecked` | URL present, host not in allowlist | `memory_file`, `url_host` |
265
+ | `curator.finding.unused_low_signal` | Zero recalls in window (when emitter exists) | `memory_file`, `window_days` |
266
+ | `curator.finding.unused_undetectable` | Usage emitter not present | `note: "memory.recalled events not implemented"` |
267
+ | `curator.finding.contradiction` | LLM verdict `contradict` | `memory_a`, `memory_b`, `rationale` |
268
+ | `curator.finding.redundant_pair` | LLM verdict `redundant` | `memory_a`, `memory_b`, `rationale` |
269
+ | `curator.finding.broken_index` | MEMORY.md references missing file | `referenced_file` |
270
+ | `curator.finding.orphaned_memory` | Memory file not referenced from MEMORY.md | `memory_file` |
271
+ | `curator.finding.acknowledged` | User acknowledged finding via skill (no action taken) | `finding_id` |
272
+ | `curator.finding.resolved` | User resolved finding via skill (action taken) | `finding_id`, `action: prune\|edit\|reclassify\|defer` |
273
+
274
+ ---
275
+
276
+ ## Skills
277
+
278
+ **`/curator review`** — interactive walkthrough of open findings. For each: shows the memory body excerpt, the finding kind and detail, and offers prune / edit / reclassify / acknowledge / defer.
279
+
280
+ **`/curator scan`** — forces a full sweep including the LLM pass. Ignores rate gates.
281
+
282
+ **`/curator calibrate`** — runs the LLM sweep against the current memory store and reports precision against a labeled set (which the user maintains in `~/.onlooker/curator/<project-key>/calibration_labels.json`). Useful for tuning `contradiction_similarity_threshold`.
283
+
284
+ ---
285
+
286
+ ## Open Questions
287
+
288
+ 1. **`memory.recalled` event dependency.** The usage tracker requires an event emitter in the ecosystem substrate that does not yet exist. The substrate change is small (`UserPromptExpansion` hook can emit an event each time a memory is reinjected) but it is a prerequisite. Until then, the usage signal is dormant — `curator.finding.unused_undetectable` is emitted once per scan to make the missing capability visible.
289
+
290
+ 2. **Librarian-promoted vs. hand-written staleness.** A librarian-promoted memory was distilled from a session; its staleness criteria might be "the source session is older than X." A hand-written memory has no equivalent decay marker. The current design treats them identically; the provenance field is captured but not yet used differently.
291
+
292
+ 3. **LLM sweep cost growth.** Candidate pairs grow as O(N²): N memories yield N(N-1)/2 pairs before filtering (4,950 at N = 100; 124,750 at N = 500). With similarity filtering, a 100-memory store typically needs under 10 LLM calls per sweep; at 500 memories the worst case approaches the `max_pair_evaluations_per_sweep` cap. A smarter pre-filter (e.g., embedding-based clustering to limit pair candidates) becomes worthwhile around 200 memories.
293
+
294
+ 4. **Finding dedup vs. re-evaluation.** A `date_decayed` finding for `2026-03-05` is the same fact every session — `deduped_hash` prevents re-emission. But a `contradiction` finding between two memories may be re-evaluated if either memory's body changes; the dedup hash should include both bodies' hashes, not just memory IDs.
295
+
296
+ 5. **Auto-prune as a future opt-in.** Like librarian's `auto_promote`, curator could grow an `auto_prune` mode for high-confidence findings (e.g., `path_broken` with no possible interpretation). Deferred until the cheap-check precision is measured in practice.
297
+
298
+ 6. **Type-drift detection.** Mentioned as failure mode E but not addressed by the current checks. Would require an LLM call per memory: "given this body, what type fits best?" — too expensive for every session, plausible for the weekly sweep.
299
+
300
+ 7. **Interaction with `~/.claude/CLAUDE.md`.** Global instructions in `~/.claude/CLAUDE.md` shape behavior but live outside the typed memory store. Curator does not audit them — cartographer does. If the boundary moves (e.g., librarian gains the ability to propose `~/.claude/CLAUDE.md` edits), curator and cartographer will need a shared rule for which substrate owns which file.
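For question 4, one way to make a contradiction's hash track both bodies while staying independent of pair order is to hash each body, sort the two `file:digest` keys, and hash the result. A sketch under that assumption (SHA-256, the key layout and the `curator_pair_hash` name are illustrative, not settled design):

```bash
#!/usr/bin/env bash
# Hypothetical sketch: order-independent, body-sensitive pair hash.
_sha256() {
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum | cut -d' ' -f1
  else
    shasum -a 256 | cut -d' ' -f1
  fi
}

curator_pair_hash() {
  local a_file="$1" a_body="$2" b_file="$3" b_body="$4" a_key b_key
  a_key="${a_file}:$(printf '%s' "$a_body" | _sha256)"
  b_key="${b_file}:$(printf '%s' "$b_body" | _sha256)"
  # Sort the two keys so (a, b) and (b, a) produce the same input.
  printf 'contradiction|%s\n' \
    "$(printf '%s\n%s\n' "$a_key" "$b_key" | LC_ALL=C sort | paste -sd '|' -)" \
    | _sha256
}
```

Editing either body changes the hash, so the pair would be re-evaluated; swapping the arguments does not.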
301
+
302
+ ---
303
+
304
+ ## Non-Goals
305
+
306
+ - Does not edit the memory store automatically — same posture as librarian and cartographer.
307
+ - Does not write new memories — that is librarian's job.
308
+ - Does not perform retrieval — the typed memory store reinjection mechanism is owned elsewhere.
309
+ - Does not audit instruction files (CLAUDE.md, AGENTS.md, `.claude/rules/`) — that is cartographer's job.
310
+ - Does not synthesize cross-session improvement briefs — that is counsel's job.
311
+ - Does not block any tool call — curator's surfacer is informational only.
@@ -0,0 +1,15 @@
1
+ {
2
+ "hooks": {
3
+ "SessionStart": [
4
+ {
5
+ "matcher": "*",
6
+ "hooks": [
7
+ {
8
+ "type": "command",
9
+ "command": "\"$CLAUDE_PLUGIN_ROOT\"/scripts/hooks/curator-session-start.sh"
10
+ }
11
+ ]
12
+ }
13
+ ]
14
+ }
15
+ }
@@ -0,0 +1,343 @@
1
+ #!/usr/bin/env bash
2
+ # Curator SessionStart hook.
3
+ #
4
+ # Runs cheap-tier checks against the typed memory store and emits findings
5
+ # under ~/.onlooker/curator/<project-key>/findings/. Surfaces a one-line
6
+ # pointer to /curator review when open findings exist.
7
+ #
8
+ # Hook contract:
9
+ # - Always exits 0. Never blocks session start.
10
+ # - Emits valid hookSpecificOutput JSON even when nothing to inject.
11
+ # - No-ops when curator.enabled is not true.
12
+ # - No-ops when no git context, no memory store path, or no checks pass
13
+ # the rate gate.
14
+ #
15
+ # LLM contradiction sweep is deferred to a follow-up commit.
16
+
17
+ set -uo pipefail
18
+
19
+ SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
20
+ PLUGIN_ROOT="$(cd "${SCRIPT_DIR}/../.." && pwd)"
21
+
22
+ _ECOSYSTEM_ROOT="${ONLOOKER_ECOSYSTEM_ROOT:-}"
23
+ if [[ -z "$_ECOSYSTEM_ROOT" ]]; then
24
+ _candidate="$(cd "${PLUGIN_ROOT}/../.." 2>/dev/null && pwd)"
25
+ if [[ -f "${_candidate}/scripts/lib/validate-path.sh" ]]; then
26
+ _ECOSYSTEM_ROOT="$_candidate"
27
+ fi
28
+ fi
29
+ if [[ -n "$_ECOSYSTEM_ROOT" && -f "${_ECOSYSTEM_ROOT}/scripts/lib/validate-path.sh" ]]; then
30
+ # shellcheck disable=SC1091
31
+ CLAUDE_PLUGIN_ROOT="$_ECOSYSTEM_ROOT" source "${_ECOSYSTEM_ROOT}/scripts/lib/validate-path.sh"
32
+ fi
33
+
34
+ # shellcheck source=../lib/curator-config.sh
35
+ source "${PLUGIN_ROOT}/scripts/lib/curator-config.sh"
36
+ # shellcheck source=../lib/curator-project-key.sh
37
+ source "${PLUGIN_ROOT}/scripts/lib/curator-project-key.sh"
38
+ # shellcheck source=../lib/curator-ulid.sh
39
+ source "${PLUGIN_ROOT}/scripts/lib/curator-ulid.sh"
40
+ # shellcheck source=../lib/curator-storage.sh
41
+ source "${PLUGIN_ROOT}/scripts/lib/curator-storage.sh"
42
+ # shellcheck source=../lib/curator-emit.sh
43
+ source "${PLUGIN_ROOT}/scripts/lib/curator-emit.sh"
44
+ # shellcheck source=../lib/curator-memory-reader.sh
45
+ source "${PLUGIN_ROOT}/scripts/lib/curator-memory-reader.sh"
46
+ # shellcheck source=../lib/curator-checks.sh
47
+ source "${PLUGIN_ROOT}/scripts/lib/curator-checks.sh"
48
+
49
+ _emit() {
50
+ local context="${1:-}"
51
+ jq -cn --arg ctx "$context" '{
52
+ hookSpecificOutput: {
53
+ hookEventName: "SessionStart",
54
+ additionalContext: $ctx
55
+ }
56
+ }'
57
+ }
58
+
59
+ INPUT=$(cat 2>/dev/null || true)
60
+ CWD=$(printf '%s' "$INPUT" | jq -r '.cwd // ""' 2>/dev/null) || CWD=""
61
+ SESSION_ID=$(printf '%s' "$INPUT" | jq -r '.session_id // ""' 2>/dev/null) || SESSION_ID=""
62
+ [[ -z "$CWD" ]] && CWD="$(pwd)"
63
+ [[ -z "$SESSION_ID" ]] && SESSION_ID="unknown"
64
+
65
+ REPO_ROOT=$(curator_project_repo_root "$CWD")
66
+ curator_config_load "$REPO_ROOT"
67
+
68
+ if ! curator_config_enabled; then
69
+ _emit ""
70
+ exit 0
71
+ fi
72
+
73
+ PROJECT_KEY=$(curator_project_key "$CWD")
74
+ if [[ -z "$PROJECT_KEY" ]]; then
75
+ _emit ""
76
+ exit 0
77
+ fi
78
+
79
+ curator_storage_init "$PROJECT_KEY" || { _emit ""; exit 0; }
80
+ REMOTE_URL=$(curator_project_remote_url "$CWD")
81
+ curator_storage_write_manifest "$PROJECT_KEY" "$REMOTE_URL" "$REPO_ROOT" || true
82
+
83
+ # ----------------------------------------------------------------------------
84
+ # Resolve the typed memory store path. Skip the audit if it can't be resolved.
85
+ # ----------------------------------------------------------------------------
86
+
87
+ MEM_PATH_TEMPLATE=$(curator_config_get '.curator.memory_store_path')
88
+ if [[ -z "$MEM_PATH_TEMPLATE" || "$MEM_PATH_TEMPLATE" == "null" ]]; then
89
+ MEM_PATH_TEMPLATE='${HOME}/.claude/projects/${CLAUDE_PROJECT_ENCODED}/memory'
90
+ fi
91
+ MEM_DIR=$(curator_memory_resolve_path "$MEM_PATH_TEMPLATE")
92
+
93
+ if [[ -z "$MEM_DIR" || ! -d "$MEM_DIR" ]]; then
94
+ # No memory store, nothing to audit. Still emit a scan event so the
95
+ # observability stream shows curator ran.
96
+ curator_emit "curator.scan.started" "$SESSION_ID" "$(jq -cn '{ mode: "cheap" }')"
97
+ curator_emit "curator.scan.complete" "$SESSION_ID" "$(jq -cn '{
98
+ mode: "cheap", outcome: "ok",
99
+ findings_new: 0, findings_resolved: 0, duration_ms: 0
100
+ }')"
101
+ _emit ""
102
+ exit 0
103
+ fi
104
+
105
+ # ----------------------------------------------------------------------------
106
+ # Cheap-tier rate gate.
107
+ #
108
+ # Three knobs:
109
+ # cheap_checks.enabled global on/off for the cheap tier
110
+ # cheap_checks.wall_clock_budget_ms abort phases past this elapsed
111
+ # surfacer.max_pointer_chars truncate additionalContext at this
112
+ # ----------------------------------------------------------------------------
113
+
114
+ CHEAP_ENABLED=$(curator_config_get '.curator.cheap_checks.enabled')
115
+ SCAN_START_MS=$(python3 -c 'import time; print(int(time.time() * 1000))' 2>/dev/null) \
116
+ || SCAN_START_MS=$(($(date +%s) * 1000))
117
+ SCAN_START_S=$((SCAN_START_MS / 1000))
118
+
119
+ curator_emit "curator.scan.started" "$SESSION_ID" "$(jq -cn '{ mode: "cheap" }')"
120
+
121
+ if [[ "$CHEAP_ENABLED" == "false" ]]; then
122
+ # Cheap tier explicitly off — emit scan.complete with skip_reason
123
+ # and skip straight to the surfacer (which reads previously-persisted
124
+ # findings, if any).
125
+ curator_emit "curator.scan.complete" "$SESSION_ID" "$(jq -cn \
126
+ --arg mode "cheap" --arg outcome "skipped" \
127
+ --arg skip_reason "disabled" \
128
+ --argjson findings_new 0 --argjson findings_resolved 0 \
129
+ --argjson duration_ms 0 \
130
+ '{ mode: $mode, outcome: $outcome, skip_reason: $skip_reason,
131
+ findings_new: $findings_new, findings_resolved: $findings_resolved,
132
+ duration_ms: $duration_ms }')"
133
+ FINDINGS_NEW=0
134
+ # Skip the per-check pipeline; fall through to the surfacer.
135
+ OUTCOME_FOR_SCAN_COMPLETE="skipped"
136
+ else
137
+ OUTCOME_FOR_SCAN_COMPLETE="ok"
138
+ fi
139
+
140
+ BUDGET_MS=$(curator_config_get '.curator.cheap_checks.wall_clock_budget_ms')
141
+ [[ -z "$BUDGET_MS" || "$BUDGET_MS" == "null" ]] && BUDGET_MS=500
142
+
143
+ _curator_now_ms() {
144
+ python3 -c 'import time; print(int(time.time() * 1000))' 2>/dev/null \
145
+ || echo "$(( $(date +%s) * 1000 ))"
146
+ }
147
+
148
+ _curator_over_budget() {
149
+ local now elapsed
150
+ now=$(_curator_now_ms)
151
+ elapsed=$((now - SCAN_START_MS))
152
+ (( elapsed > BUDGET_MS ))
153
+ }
154
+
155
+ # When the cheap tier is enabled, run the four checks under the budget
156
+ # gate. Each phase checks the budget BEFORE its work — partial phases
157
+ # are allowed to finish since check work itself is cheap.
158
+ DATE_FINDINGS='[]'
159
+ PATH_FINDINGS='[]'
160
+ BROKEN_INDEX='[]'
161
+ ORPHANED='[]'
162
+ BUDGET_TRIPPED="false"
163
+ MEMORIES='[]'
164
+
165
+ if [[ "$CHEAP_ENABLED" != "false" ]]; then
166
+ if _curator_over_budget; then
167
+ BUDGET_TRIPPED="true"
168
+ else
169
+ MEMORIES=$(curator_memory_load_all "$MEM_DIR") || MEMORIES='[]'
170
+ fi
171
+
172
+ DATE_GRACE=$(curator_config_get '.curator.date_check.date_grace_period_days')
173
+ [[ -z "$DATE_GRACE" || "$DATE_GRACE" == "null" ]] && DATE_GRACE=14
174
+ DATE_CHECK_ENABLED=$(curator_config_get '.curator.date_check.enabled')
175
+
176
+ if [[ "$BUDGET_TRIPPED" != "true" && "$DATE_CHECK_ENABLED" != "false" ]]; then
177
+ if _curator_over_budget; then
178
+ BUDGET_TRIPPED="true"
179
+ else
180
+ DATE_FINDINGS=$(curator_check_dates "$MEMORIES" "$DATE_GRACE") || DATE_FINDINGS='[]'
181
+ fi
182
+ fi
183
+
184
+ REF_CHECK_ENABLED=$(curator_config_get '.curator.reference_check.enabled')
185
+ if [[ "$BUDGET_TRIPPED" != "true" && "$REF_CHECK_ENABLED" != "false" && -n "$REPO_ROOT" ]]; then
186
+ if _curator_over_budget; then
187
+ BUDGET_TRIPPED="true"
188
+ else
189
+ PATH_FINDINGS=$(curator_check_paths "$MEMORIES" "$REPO_ROOT") || PATH_FINDINGS='[]'
190
+ fi
191
+ fi
192
+
193
+ if [[ "$BUDGET_TRIPPED" != "true" ]]; then
194
+ if _curator_over_budget; then
195
+ BUDGET_TRIPPED="true"
196
+ else
197
+ BROKEN_INDEX=$(curator_check_broken_index "$MEMORIES") || BROKEN_INDEX='[]'
198
+ ORPHANED=$(curator_check_orphaned "$MEMORIES") || ORPHANED='[]'
199
+ fi
200
+ fi
201
+ fi
202
+
+ # ----------------------------------------------------------------------------
+ # Persist findings (with dedup by deduped_hash) and emit per-finding events.
+ # Skipped entirely when the cheap tier is disabled; the disabled path above
+ # already emitted scan.complete and set FINDINGS_NEW=0.
+ # ----------------------------------------------------------------------------
+
+ NOW_TS=$(date -u +"%Y-%m-%dT%H:%M:%SZ")
+ [[ "$CHEAP_ENABLED" == "false" ]] || FINDINGS_NEW=0
+
+ _write_finding() {
+ local kind="$1"
+ local payload="$2"
+ local hash_input
+ hash_input="${kind}|$(printf '%s' "$payload" | jq -cS '.')"
+ local hash
+ hash=$(curator_finding_hash "$hash_input") || hash=""
+ [[ -z "$hash" ]] && return 0
+
+ # Dedup: skip if an open finding with the same hash already exists.
+ if curator_storage_has_finding_with_hash "$PROJECT_KEY" "$hash"; then
+ return 0
+ fi
+
+ local id record
+ id=$(curator_ulid)
+ record=$(jq -n \
+ --arg id "$id" \
+ --arg kind "$kind" \
+ --arg created_at "$NOW_TS" \
+ --arg deduped_hash "$hash" \
+ --argjson detail "$payload" \
+ '{
+ id: $id, kind: $kind, created_at: $created_at,
+ status: "open", deduped_hash: $deduped_hash, detail: $detail
+ }')
+ curator_storage_write_finding "$PROJECT_KEY" "$id" "$record" >/dev/null || return 0
+ FINDINGS_NEW=$((FINDINGS_NEW + 1))
+
+ # Per-kind event payload.
+ local event_type event_payload
+ event_type="curator.finding.${kind}"
+ event_payload=$(jq -cn --arg fid "$id" --argjson detail "$payload" \
+ '{ finding_id: $fid } + $detail')
+ curator_emit "$event_type" "$SESSION_ID" "$event_payload"
+ }
+
+ # Convert each finding-array entry into a stored + emitted finding.
+ _emit_kind_findings() {
+ local kind="$1" findings_json="$2"
+ local n
+ n=$(printf '%s' "$findings_json" | jq 'length' 2>/dev/null) || n=0
+ local i payload
+ for ((i = 0; i < n; i++)); do
+ payload=$(printf '%s' "$findings_json" | jq -c ".[$i]")
+ [[ -z "$payload" || "$payload" == "null" ]] && continue
+ _write_finding "$kind" "$payload"
+ done
+ }
+
+ if [[ "$CHEAP_ENABLED" != "false" ]]; then
+ _emit_kind_findings "date_decayed" "$DATE_FINDINGS"
+ _emit_kind_findings "path_broken" "$PATH_FINDINGS"
+ _emit_kind_findings "broken_index" "$BROKEN_INDEX"
+ _emit_kind_findings "orphaned_memory" "$ORPHANED"
+ fi
+
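`_write_finding` hashes `kind|payload` and skips the write when an open finding with the same hash already exists, so rerunning the scan at every SessionStart does not pile up duplicate records. The hook canonicalizes the payload with `jq -cS` first; `-S` sorts object keys, so key order cannot change the hash. The sketch below shows the idempotent write under two assumptions: `curator_finding_hash` is taken to be a SHA-256 of its input, and an in-memory associative array (`OPEN_HASHES`) stands in for the on-disk store. The helpers `finding_hash` and `write_finding` are hypothetical, and the payloads are written already compact.

```shell
#!/usr/bin/env bash
# Sketch of hash-based finding dedup (bash 4+ for associative arrays).

declare -A OPEN_HASHES=()   # stand-in for open findings on disk, keyed by hash
FINDINGS_NEW=0

# Hypothetical hash helper: hex SHA-256 of the input.
finding_hash() {
  printf '%s' "$1" | sha256sum | cut -d' ' -f1
}

# Store a finding unless an open one with the same hash already exists.
write_finding() {
  local kind="$1" payload="$2" hash
  hash=$(finding_hash "${kind}|${payload}") || return 0
  [[ -z "$hash" ]] && return 0
  [[ -n "${OPEN_HASHES[$hash]:-}" ]] && return 0   # duplicate: skip
  OPEN_HASHES[$hash]="$payload"
  FINDINGS_NEW=$((FINDINGS_NEW + 1))
}

write_finding path_broken '{"path":"src/a.ts"}'
write_finding path_broken '{"path":"src/a.ts"}'   # same kind and payload: skipped
write_finding date_decayed '{"path":"src/a.ts"}'  # same payload, other kind: stored
echo "new findings: $FINDINGS_NEW"
```

The sketch stores two findings out of three calls. Prefixing the kind to the hash input keeps identical payloads from different checks distinct, which is why the third call is not treated as a duplicate.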
+ # ----------------------------------------------------------------------------
+ # Watermark + scan.complete. The disabled-tier branch above already emitted
+ # scan.complete; this branch fires only when the cheap tier ran (success or
+ # budget tripped).
+ # ----------------------------------------------------------------------------
+
+ if [[ "$CHEAP_ENABLED" != "false" ]]; then
+ curator_storage_write_watermark "$(curator_last_cheap_scan_path "$PROJECT_KEY")" || true
+
+ DURATION_MS=$(( $(_curator_now_ms) - SCAN_START_MS ))
+ if [[ "$BUDGET_TRIPPED" == "true" ]]; then
+ curator_emit "curator.scan.complete" "$SESSION_ID" "$(jq -cn \
+ --arg mode "cheap" --arg outcome "skipped" \
+ --arg skip_reason "over_budget" \
+ --argjson findings_new "$FINDINGS_NEW" \
+ --argjson findings_resolved 0 \
+ --argjson duration_ms "$DURATION_MS" \
+ '{ mode: $mode, outcome: $outcome, skip_reason: $skip_reason,
+ findings_new: $findings_new,
+ findings_resolved: $findings_resolved,
+ duration_ms: $duration_ms }')"
+ else
+ curator_emit "curator.scan.complete" "$SESSION_ID" "$(jq -cn \
+ --arg mode "cheap" --arg outcome "ok" \
+ --argjson findings_new "$FINDINGS_NEW" \
+ --argjson findings_resolved 0 \
+ --argjson duration_ms "$DURATION_MS" \
+ '{ mode: $mode, outcome: $outcome,
+ findings_new: $findings_new,
+ findings_resolved: $findings_resolved,
+ duration_ms: $duration_ms }')"
+ fi
+ fi
+
+ # ----------------------------------------------------------------------------
+ # Surfacer.
+ # ----------------------------------------------------------------------------
+
+ SKIP_WHEN_ZERO=$(curator_config_get '.curator.surfacer.skip_when_zero')
+ [[ -z "$SKIP_WHEN_ZERO" || "$SKIP_WHEN_ZERO" == "null" ]] && SKIP_WHEN_ZERO="true"
+
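The hook repeats one idiom for every optional setting: read a key with `curator_config_get`, then fall back to a default when the result is empty or the literal string `null`. The `null` case presumably arises because the getter is backed by `jq -r`, which prints `null` for a missing key. The hypothetical helper below, `config_or_default`, folds the idiom into one call. Since the real implementation of `curator_config_get` is not shown here, it is stubbed with a fixed lookup table.

```shell
#!/usr/bin/env bash
# Sketch: collapse the "empty or null means default" idiom into one helper.

# Stub standing in for curator_config_get; the real one reads config.json.
curator_config_get() {
  case "$1" in
    .curator.surfacer.max_pointer_chars) echo "120" ;;
    .curator.surfacer.skip_when_zero)    echo "null" ;;   # key missing
    *)                                   echo "" ;;
  esac
}

# Hypothetical helper: print the configured value, or $2 when unset.
config_or_default() {
  local value
  value=$(curator_config_get "$1")
  if [[ -z "$value" || "$value" == "null" ]]; then
    value="$2"
  fi
  printf '%s\n' "$value"
}

MAX_POINTER=$(config_or_default '.curator.surfacer.max_pointer_chars' 200)
SKIP_WHEN_ZERO=$(config_or_default '.curator.surfacer.skip_when_zero' true)
DATE_GRACE=$(config_or_default '.curator.date_check.date_grace_period_days' 14)
echo "$MAX_POINTER $SKIP_WHEN_ZERO $DATE_GRACE"
```

Doing the fallback in shell, rather than with jq's `//` alternative operator, matters for booleans. `//` replaces `false` as well as `null`, so an explicit `skip_when_zero: false` would silently become the default `true`. The shell test lets `false` through unchanged.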
+ OPEN_COUNT=$(curator_storage_count_open "$PROJECT_KEY")
+ [[ -z "$OPEN_COUNT" || "$OPEN_COUNT" == "null" ]] && OPEN_COUNT=0
+
+ if [[ "$OPEN_COUNT" -eq 0 && "$SKIP_WHEN_ZERO" == "true" ]]; then
+ _emit ""
+ exit 0
+ fi
+
+ # Build a compact "2 path-broken, 1 date-decayed" descriptor for the
+ # pointer message.
+ COUNTS_BY_KIND=$(curator_storage_open_counts_by_kind "$PROJECT_KEY")
+ SUMMARY=$(printf '%s' "$COUNTS_BY_KIND" | jq -r '
+ map( (.count|tostring) + " " + (.kind | gsub("_"; "-")) )
+ | join(", ")
+ ')
+
+ CONTEXT=$(printf 'Curator: %s open finding%s (%s). Review with `/curator review`.' \
+ "$OPEN_COUNT" \
+ "$([ "$OPEN_COUNT" -eq 1 ] && echo "" || echo "s")" \
+ "$SUMMARY")
+
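The pointer message is assembled from per-kind counts. Each `{kind, count}` entry becomes `<count> <kind>` with underscores turned into hyphens, the entries are joined with `, `, and `finding` is pluralized unless the total is exactly 1. The pure-bash sketch below applies the same formatting. It assumes the counts arrive as `kind count` lines instead of the JSON array that `curator_storage_open_counts_by_kind` returns; that array's shape is inferred from the jq filter. It also derives the total by summing the counts, whereas the hook queries it separately with `curator_storage_count_open`. `build_pointer` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch of the SessionStart pointer text; reads "kind count" lines on stdin.

build_pointer() {
  local kind count total=0 summary="" plural="s"
  while read -r kind count; do
    [[ -z "$kind" ]] && continue
    total=$((total + count))
    # "path_broken" -> "path-broken"; entries joined with ", ".
    summary+="${summary:+, }${count} ${kind//_/-}"
  done
  (( total == 1 )) && plural=""
  printf 'Curator: %s open finding%s (%s). Review with `/curator review`.\n' \
    "$total" "$plural" "$summary"
}

POINTER=$(printf '%s\n' "path_broken 2" "date_decayed 1" | build_pointer)
echo "$POINTER"
```

For the two sample kinds this yields `Curator: 3 open findings (2 path-broken, 1 date-decayed). Review with `/curator review`.`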
+ # Cap the pointer length so a long per-kind summary never overflows the
+ # user's SessionStart context.
+ MAX_POINTER=$(curator_config_get '.curator.surfacer.max_pointer_chars')
+ [[ -z "$MAX_POINTER" || "$MAX_POINTER" == "null" ]] && MAX_POINTER=200
+ if [[ "${#CONTEXT}" -gt "$MAX_POINTER" ]]; then
+ # Reserve room for the truncation ellipsis without exceeding the cap.
+ TRUNC=$((MAX_POINTER - 1))
+ (( TRUNC < 1 )) && TRUNC=1
+ CONTEXT="${CONTEXT:0:TRUNC}…"
+ fi
+
+ _emit "$CONTEXT"
+ exit 0
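The cap cuts the message to `max_pointer_chars - 1` characters and appends the one-character `…`, so a truncated pointer lands exactly on the cap. The sketch below expresses the same rule as a reusable function, `cap_pointer` (a hypothetical name), with two deliberate differences from the hook. First, it uses an ASCII `...` suffix, so character and byte counts agree in any locale. Second, when the cap is too small to hold the suffix, it hard-cuts instead of clamping; the hook's `TRUNC=1` clamp returns two characters for a cap of 1.

```shell
#!/usr/bin/env bash
# Sketch: cap a message at max characters, marking truncation with "...".

cap_pointer() {
  local text="$1" max="$2" suffix="..."
  if (( ${#text} <= max )); then
    printf '%s\n' "$text"
  elif (( max <= ${#suffix} )); then
    # No room for the suffix: hard-cut to the cap.
    printf '%s\n' "${text:0:max}"
  else
    # Keep max - 3 characters so the suffix ends exactly at the cap.
    printf '%s\n' "${text:0:max - ${#suffix}}${suffix}"
  fi
}

SHORT=$(cap_pointer "Curator: 1 open finding (1 path-broken)." 200)
LONG=$(cap_pointer "$(printf 'x%.0s' {1..300})" 200)
echo "${#SHORT} ${#LONG}"
```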