@onlooker-community/ecosystem 0.19.0 → 0.21.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/.claude-plugin/marketplace.json +26 -0
- package/.claude-plugin/plugin.json +1 -1
- package/.release-please-manifest.json +4 -2
- package/CHANGELOG.md +14 -0
- package/docs/memory-architecture.md +102 -0
- package/package.json +3 -3
- package/plugins/curator/.claude-plugin/plugin.json +14 -0
- package/plugins/curator/CHANGELOG.md +10 -0
- package/plugins/curator/README.md +55 -0
- package/plugins/curator/config.json +41 -0
- package/plugins/curator/docs/adr/001-staleness-tiers.md +100 -0
- package/plugins/curator/docs/design.md +311 -0
- package/plugins/curator/hooks/hooks.json +15 -0
- package/plugins/curator/scripts/hooks/curator-session-start.sh +343 -0
- package/plugins/curator/scripts/lib/curator-checks.sh +155 -0
- package/plugins/curator/scripts/lib/curator-config.sh +67 -0
- package/plugins/curator/scripts/lib/curator-emit.sh +61 -0
- package/plugins/curator/scripts/lib/curator-memory-reader.sh +225 -0
- package/plugins/curator/scripts/lib/curator-project-key.sh +82 -0
- package/plugins/curator/scripts/lib/curator-storage.sh +176 -0
- package/plugins/curator/scripts/lib/curator-ulid.sh +43 -0
- package/plugins/historian/docs/adr/001-local-embeddings-only.md +96 -0
- package/plugins/historian/docs/design.md +317 -0
- package/plugins/librarian/.claude-plugin/plugin.json +14 -0
- package/plugins/librarian/CHANGELOG.md +10 -0
- package/plugins/librarian/README.md +51 -0
- package/plugins/librarian/config.json +52 -0
- package/plugins/librarian/docs/adr/001-propose-dont-auto-write.md +87 -0
- package/plugins/librarian/docs/design.md +301 -0
- package/plugins/librarian/hooks/hooks.json +26 -0
- package/plugins/librarian/scripts/hooks/librarian-session-end.sh +312 -0
- package/plugins/librarian/scripts/hooks/librarian-session-start.sh +103 -0
- package/plugins/librarian/scripts/lib/librarian-archivist-reader.sh +67 -0
- package/plugins/librarian/scripts/lib/librarian-classifier.sh +139 -0
- package/plugins/librarian/scripts/lib/librarian-config.sh +74 -0
- package/plugins/librarian/scripts/lib/librarian-durability.sh +77 -0
- package/plugins/librarian/scripts/lib/librarian-emit.sh +72 -0
- package/plugins/librarian/scripts/lib/librarian-project-key.sh +83 -0
- package/plugins/librarian/scripts/lib/librarian-storage.sh +222 -0
- package/plugins/librarian/scripts/lib/librarian-ulid.sh +50 -0
- package/release-please-config.json +32 -0
- package/test/bats/curator-session-start.bats +316 -0
- package/test/bats/librarian-session-end.bats +182 -0
- package/test/bats/librarian-session-start.bats +136 -0
|
@@ -0,0 +1,10 @@
|
|
|
1
|
+
# Changelog
|
|
2
|
+
|
|
3
|
+
## [0.1.0](https://github.com/onlooker-community/ecosystem/compare/librarian-v0.0.1...librarian-v0.1.0) (2026-06-04)
|
|
4
|
+
|
|
5
|
+
|
|
6
|
+
### Features
|
|
7
|
+
|
|
8
|
+
* **librarian:** land plugin end-to-end with memory layer designs :seedling: ([#55](https://github.com/onlooker-community/ecosystem/issues/55)) ([d4821ef](https://github.com/onlooker-community/ecosystem/commit/d4821efabfeb587e460e898d7db8f92fcc3f2c61))
|
|
9
|
+
|
|
10
|
+
## Changelog
|
|
@@ -0,0 +1,51 @@
|
|
|
1
|
+
# Librarian
|
|
2
|
+
|
|
3
|
+
Consolidates archivist's per-session artifacts into the user's durable typed memory store.
|
|
4
|
+
|
|
5
|
+
When a session ends, Librarian reads the decisions, dead-ends, and open questions that archivist captured during the session, decides which deserve to live beyond the session, classifies them into the four memory types (user, feedback, project, reference), and queues them as proposals for explicit confirmation. By default Librarian never writes to the typed memory store directly — see [ADR-001](docs/adr/001-propose-dont-auto-write.md) for why.
|
|
6
|
+
|
|
7
|
+
Librarian is a sibling plugin to [`ecosystem`](../../) and assumes the Onlooker observability substrate (`~/.onlooker/`) is present. It depends on [archivist](../archivist) at the data layer (it reads archivist's artifact files) but not at runtime.
|
|
8
|
+
|
|
9
|
+
## How it works
|
|
10
|
+
|
|
11
|
+
| Hook | What Librarian does |
|
|
12
|
+
|------|---------------------|
|
|
13
|
+
| `SessionEnd` | Scans archivist artifacts since last watermark, runs the durability filter, classifies surviving candidates with Haiku, detects conflicts/duplicates against existing memories, writes proposals to `~/.onlooker/librarian/<project-key>/proposals/`. |
|
|
14
|
+
| `SessionStart` | Counts pending proposals; if any, injects a single-line pointer pointing the user at `/librarian review`. |
|
|
15
|
+
|
|
16
|
+
## Activation
|
|
17
|
+
|
|
18
|
+
Librarian is **off by default**. Enable per-project in `.claude/settings.json`:
|
|
19
|
+
|
|
20
|
+
```json
|
|
21
|
+
{
|
|
22
|
+
"librarian": {
|
|
23
|
+
"enabled": true
|
|
24
|
+
}
|
|
25
|
+
}
|
|
26
|
+
```
|
|
27
|
+
|
|
28
|
+
Or globally in `~/.claude/settings.json`. See [`config.json`](config.json) for the full set of tunable defaults.
|
|
29
|
+
|
|
30
|
+
## Storage layout
|
|
31
|
+
|
|
32
|
+
```text
|
|
33
|
+
~/.onlooker/librarian/<project-key>/
|
|
34
|
+
├── manifest.json # project metadata
|
|
35
|
+
├── last_scan.json # watermark for incremental scans
|
|
36
|
+
├── proposals/<ulid>.json # pending/resolved proposals
|
|
37
|
+
└── tombstones/<body_hash>.json # records of rejected/pruned promotions
|
|
38
|
+
```
|
|
39
|
+
|
|
40
|
+
Accepted promotions land in the user's typed memory store at `~/.claude/projects/<encoded-project>/memory/` with a `source: "librarian"` provenance trailer in the frontmatter.
|
|
41
|
+
|
|
42
|
+
## Status
|
|
43
|
+
|
|
44
|
+
This plugin is **in design / scaffolding phase**. The hook entry points exist and load cleanly but the scan + classify + propose pipeline is not yet implemented. See [`docs/design.md`](docs/design.md) for the full design and [`docs/adr/001-propose-dont-auto-write.md`](docs/adr/001-propose-dont-auto-write.md) for the load-bearing decision.
|
|
45
|
+
|
|
46
|
+
## Requirements
|
|
47
|
+
|
|
48
|
+
- The `ecosystem` plugin installed (for `~/.onlooker/` substrate).
|
|
49
|
+
- `archivist` plugin installed (Librarian reads its artifact files; Librarian degrades to no-op if archivist is absent).
|
|
50
|
+
- `claude` CLI on `PATH` for the classifier call.
|
|
51
|
+
- `jq` for JSON manipulation.
|
|
@@ -0,0 +1,52 @@
|
|
|
1
|
+
{
|
|
2
|
+
"plugin_name": "librarian",
|
|
3
|
+
"storage_path": "~/.onlooker",
|
|
4
|
+
"librarian": {
|
|
5
|
+
"enabled": false,
|
|
6
|
+
"auto_promote": false,
|
|
7
|
+
"auto_promote_threshold": 0.85,
|
|
8
|
+
"memory_store_path": "${HOME}/.claude/projects/${CLAUDE_PROJECT_ENCODED}/memory",
|
|
9
|
+
"scan": {
|
|
10
|
+
"trigger": "SessionEnd",
|
|
11
|
+
"bootstrap_lookback_days": 14,
|
|
12
|
+
"min_detail_chars": 40
|
|
13
|
+
},
|
|
14
|
+
"classifier": {
|
|
15
|
+
"model": "claude-haiku-4-5-20251001",
|
|
16
|
+
"temperature": 0.2,
|
|
17
|
+
"max_output_tokens": 256,
|
|
18
|
+
"min_classifier_confidence": 0.6
|
|
19
|
+
},
|
|
20
|
+
"durability_filter": {
|
|
21
|
+
"marker_phrases": [
|
|
22
|
+
"always",
|
|
23
|
+
"never",
|
|
24
|
+
"remember",
|
|
25
|
+
"from now on",
|
|
26
|
+
"every time",
|
|
27
|
+
"whenever",
|
|
28
|
+
"prefer",
|
|
29
|
+
"the reason",
|
|
30
|
+
"because",
|
|
31
|
+
"historically",
|
|
32
|
+
"legacy",
|
|
33
|
+
"compliance",
|
|
34
|
+
"requirement"
|
|
35
|
+
],
|
|
36
|
+
"require_grounding": false,
|
|
37
|
+
"repetition_min_sessions": 2
|
|
38
|
+
},
|
|
39
|
+
"conflict": {
|
|
40
|
+
"duplicate_threshold": 0.7,
|
|
41
|
+
"merge_candidate_threshold": 0.45,
|
|
42
|
+
"conflict_keyword_overlap": 0.5
|
|
43
|
+
},
|
|
44
|
+
"surfacer": {
|
|
45
|
+
"max_pending_for_inject": 20,
|
|
46
|
+
"skip_inject_when_zero": true
|
|
47
|
+
},
|
|
48
|
+
"tombstones": {
|
|
49
|
+
"ttl_days": 180
|
|
50
|
+
}
|
|
51
|
+
}
|
|
52
|
+
}
|
|
@@ -0,0 +1,87 @@
|
|
|
1
|
+
# ADR-001: Librarian Proposes, Doesn't Auto-Write by Default
|
|
2
|
+
|
|
3
|
+
- Status: Accepted
|
|
4
|
+
- Date: 2026-06-02
|
|
5
|
+
- Deciders: Meagan
|
|
6
|
+
- Tags: librarian, memory, safety-default, user-confirmation
|
|
7
|
+
|
|
8
|
+
## Context and Problem Statement
|
|
9
|
+
|
|
10
|
+
Librarian's job is to consolidate archivist's per-session artifacts into the user's durable typed memory store at `~/.claude/projects/<encoded-project>/memory/`. The typed memory store is reinjected into every future session — anything written there persistently shapes the model's behavior across sessions.
|
|
11
|
+
|
|
12
|
+
This creates a sharp asymmetry: a missed promotion is silently absorbed (the user just doesn't get a new memory they might have wanted), but a wrongly-accepted promotion is much worse. It silently bloats the memory store, may contradict existing entries, may be reinjected in contexts where it's misleading, and is hard to detect — the model dutifully follows the planted memory without surfacing where it came from. The user only notices when behavior goes subtly wrong over multiple sessions.
|
|
13
|
+
|
|
14
|
+
The question is whether librarian should write proposals directly to the typed memory store (with curator as a downstream cleanup mechanism), or queue proposals for explicit user confirmation.
|
|
15
|
+
|
|
16
|
+
## Decision Drivers
|
|
17
|
+
|
|
18
|
+
- **Asymmetric cost.** False positive (wrong promotion) is silent, slow to detect, and pollutes future sessions; false negative (missed promotion) is recoverable on the next scan or via explicit user request.
|
|
19
|
+
- **Promotion is a load-bearing edit.** The typed memory store is part of the system prompt at every session start. Writes to it deserve the same care as writes to CLAUDE.md.
|
|
20
|
+
- **Classifier confidence is noisy.** Haiku-grade classification with a small structured prompt is reliable enough for ranking but not reliable enough for unsupervised commits to a high-leverage substrate. Calibration runs in the design doc show meaningful per-repo variation in the precision-confidence relationship.
|
|
21
|
+
- **Cartographer precedent.** Cartographer detects issues in instruction files (CLAUDE.md, AGENTS.md) and emits findings — it does not auto-edit those files. The typed memory store is the same kind of substrate (durable, system-level). The same posture applies.
|
|
22
|
+
- **Curator is downstream, not a safety net.** Curator catches stale/contradictory memories, but its checks are also heuristic. Relying on curator to undo bad librarian promotions stacks two probabilistic systems and degrades both.
|
|
23
|
+
- **User-experience grain.** A "Librarian has 4 pending proposals — `/librarian review`" pointer at session start is a low-cost interruption; a wrongly-promoted memory is a high-cost interruption that arrives later and harder to attribute.
|
|
24
|
+
|
|
25
|
+
## Considered Options
|
|
26
|
+
|
|
27
|
+
1. **Auto-promote on high confidence.** Anything classified above a confidence threshold (e.g., 0.85) writes directly to the typed memory store; lower-confidence items go to the proposal queue. Curator catches mistakes.
|
|
28
|
+
2. **Always queue proposals (proposed default).** Every promotion goes through the proposal queue. User confirms via `/librarian review` or by setting `auto_promote: true` opt-in.
|
|
29
|
+
3. **Stage to a separate "shadow" store first.** Librarian writes to a `~/.claude/projects/<encoded-project>/memory/_librarian_pending/` subdirectory; the user manually moves items into the main store.
|
|
30
|
+
4. **No queue — interactive confirmation at promotion time.** Block at `SessionEnd` to walk the user through each candidate immediately. Decisions are made in-the-moment with session context fresh.
|
|
31
|
+
|
|
32
|
+
## Decision
|
|
33
|
+
|
|
34
|
+
We adopt **Option 2: always queue proposals; auto-promotion is opt-in via `auto_promote: true`**.
|
|
35
|
+
|
|
36
|
+
The proposal queue lives at `~/.onlooker/librarian/<project-key>/proposals/<ulid>.json`. At `SessionStart`, librarian injects a single-line pointer indicating proposal count. The user reviews via `/librarian review`, which walks each proposal interactively. Accepted proposals are written to the typed memory store with provenance fields (`source: "librarian"`, `source_session_id`, `source_artifact_ids`, `classifier_confidence`, `promoted_at`). Rejected proposals are logged as tombstones (body hash) so the same content is not re-proposed indefinitely.
|
|
37
|
+
|
|
38
|
+
Users who explicitly want auto-promotion set `auto_promote: true` in their settings. When auto-promote is enabled, proposals with `classifier_confidence >= auto_promote_threshold` (default: 0.85) are written directly and an `additionalContext` notice surfaces what was promoted, so the user retains visibility even when they opted out of the queue. Conflict-state proposals (duplicate, merge_candidate, conflict_candidate) always queue regardless of auto-promote setting — those need a human disambiguation.
|
|
39
|
+
|
|
40
|
+
Option 1 is rejected because the asymmetry between false-positive and false-negative cost is too large to bet on a confidence threshold being correctly calibrated on first install. Confidence calibration drifts as model versions change.
|
|
41
|
+
|
|
42
|
+
Option 3 is rejected because a shadow store complicates the typed memory store's directory shape and is not visible from the model's reinjection — which means the user has to remember to check it. The proposal queue is a better surface for the same goal.
|
|
43
|
+
|
|
44
|
+
Option 4 is rejected because `SessionEnd` is exactly when the user wants to stop working. Forcing an interactive review at that moment trades a known-bad moment (interruption at the end of work) for a slightly better confirmation surface (fresh context). The cost-benefit favors deferring review to the next session start, where the user is paged for the day's work anyway.
|
|
45
|
+
|
|
46
|
+
## Consequences
|
|
47
|
+
|
|
48
|
+
### Positive
|
|
49
|
+
|
|
50
|
+
- The typed memory store remains under direct user control. Promotions are visible, attributable, and reversible at the moment of promotion.
|
|
51
|
+
- The asymmetry between false-positive and false-negative cost is respected — the cheaper failure mode is the default.
|
|
52
|
+
- Users who trust librarian after calibration can opt into auto-promotion with a single config key; the design does not penalize the high-trust case forever.
|
|
53
|
+
- Provenance metadata is captured at promotion time, which makes curator's downstream job easier: it can use provenance to distinguish librarian-promoted memories from hand-written ones and apply different staleness criteria.
|
|
54
|
+
- The default posture matches cartographer's: detect, propose, surface — never silently edit a substrate the user maintains by hand.
|
|
55
|
+
|
|
56
|
+
### Negative
|
|
57
|
+
|
|
58
|
+
- The first-run experience is wordier: a user who enables librarian sees "Librarian has N pending proposals" at every session start until they review the queue. Without explicit triage, the queue can accumulate.
|
|
59
|
+
- A user with `auto_promote: false` who never runs `/librarian review` gets no benefit from librarian — the proposals pile up unseen, and the typed memory store stays empty.
|
|
60
|
+
- The proposal queue is a new substrate with its own state management. Stale proposals (e.g., for memories that have since been hand-written by the user) need to be detected and cleaned.
|
|
61
|
+
|
|
62
|
+
### Neutral
|
|
63
|
+
|
|
64
|
+
- Adoption pattern likely mirrors archivist's: opt-in, with a small group of high-trust users flipping `auto_promote: true` after seeing the proposals work well for their projects. The design accommodates both populations.
|
|
65
|
+
|
|
66
|
+
## Implementation Notes
|
|
67
|
+
|
|
68
|
+
- The `auto_promote_threshold` defaults to 0.85, deliberately above the calibrated noise floor for the Haiku classifier. The `/librarian calibrate` skill measures per-repo precision at this threshold and recommends adjustments.
|
|
69
|
+
- When `auto_promote: true` writes a promotion directly, librarian emits both `librarian.candidate.proposed` and `librarian.proposal.accepted` in immediate succession, with `accepted_via: "auto"` in the latter. This preserves a uniform event trail regardless of acceptance path.
|
|
70
|
+
- Conflict-state proposals (`duplicate`, `merge_candidate`, `conflict_candidate`) always queue. The conflict resolution requires user judgment that no confidence threshold authorizes.
|
|
71
|
+
- Tombstones (rejection records) are stored at `~/.onlooker/librarian/<project-key>/tombstones/` keyed by body hash. The hash includes the proposed memory's normalized body but not its title or filename, so trivial rewordings of the same rejected content are caught.
|
|
72
|
+
- Tombstones expire after `tombstone_ttl_days` (default: 180) so a rejection from six months ago doesn't permanently silence a promotion that would now be wanted. See [librarian design — Open Questions #1](../design.md#open-questions) for the unresolved tradeoff.
|
|
73
|
+
|
|
74
|
+
## Validation
|
|
75
|
+
|
|
76
|
+
This decision is validated against the librarian failure modes recorded in the design doc:
|
|
77
|
+
|
|
78
|
+
- **Failure mode D (pollution from automatic promotion)** is mitigated by default — auto-promote requires explicit opt-in.
|
|
79
|
+
- **Failure mode C (memory store starvation)** is mitigated as long as the user runs `/librarian review` at least occasionally. If they never review, librarian provides no benefit; this is acceptable because the alternative (auto-promotion they didn't sign up for) is worse.
|
|
80
|
+
- After 30 days of use in a single repo, the ratio of accepted to rejected proposals should fall above 0.7 to indicate the classifier is well-calibrated for the project. A lower ratio is the signal to run `/librarian calibrate` and adjust thresholds.
|
|
81
|
+
|
|
82
|
+
## References
|
|
83
|
+
|
|
84
|
+
- Cartographer design — precedent for "detect and propose, do not auto-edit" on durable substrates (`plugins/cartographer/docs/design.md`)
|
|
85
|
+
- Compass ADR-001 — precedent for evaluator-substrate decisions being load-bearing (`plugins/compass/docs/adr/001-evaluate-prompts-in-context.md`)
|
|
86
|
+
- Memory architecture overview (`docs/memory-architecture.md`)
|
|
87
|
+
- Librarian design (`../design.md`)
|
|
@@ -0,0 +1,301 @@
|
|
|
1
|
+
# Librarian — Plugin Design
|
|
2
|
+
|
|
3
|
+
**Plugin name:** `librarian`
|
|
4
|
+
**Tagline:** *Promotes what's worth keeping.*
|
|
5
|
+
**Status:** Design (pre-implementation)
|
|
6
|
+
|
|
7
|
+
Librarian is the consolidation layer between archivist's per-session artifacts and the user's durable typed memory store. It watches what archivist writes during a session, decides which of those decisions, dead-ends, and open questions deserve to live beyond the session, classifies them into the existing memory types (user/feedback/project/reference), and proposes promotions for user confirmation. It does not write to the typed memory store automatically by default — see [ADR-001](adr/001-propose-dont-auto-write.md).
|
|
8
|
+
|
|
9
|
+
It sits in the [memory architecture](../../../docs/memory-architecture.md) between archivist (session-scoped) and curator (maintenance). Where archivist treats every session as fresh and ranks by recency, librarian asks: "should this fact survive across sessions, and if so, what kind of memory is it?"
|
|
10
|
+
|
|
11
|
+
---
|
|
12
|
+
|
|
13
|
+
## Failure Modes Librarian Addresses
|
|
14
|
+
|
|
15
|
+
**A — The same decision rediscovered every session.** A user explains why the auth middleware is being rewritten (legal compliance, not tech debt). Archivist captures it as a decision; it gets reinjected next session within the recency budget. Three sessions later it has aged out, and the agent re-asks "why are we rewriting this?" Librarian promotes load-bearing project facts into the typed memory store, where they don't decay on recency alone.
|
|
16
|
+
|
|
17
|
+
**B — Feedback observed but not generalized.** During a session, the user says "stop summarizing what you just did at the end of every response." The model corrects course for the rest of that session. Archivist may or may not capture it as a decision (it's not a code decision). Next session the model summarizes again. Librarian detects the corrective pattern, classifies it as `feedback`, and proposes it for the typed memory store with **Why:** and **How to apply:** fields filled in.
|
|
18
|
+
|
|
19
|
+
**C — Memory store starvation.** A user who never says "remember that…" ends up with a near-empty typed memory store, even after months of work. Archivist captures everything per-session but nothing accumulates. Librarian provides the promotion path that doesn't depend on the user explicitly invoking it.
|
|
20
|
+
|
|
21
|
+
**D — Pollution from automatic promotion (counter-failure).** If librarian wrote directly to the typed memory store, every false positive would silently bloat the file and degrade future sessions. The design avoids this by defaulting to propose-only — see [ADR-001](adr/001-propose-dont-auto-write.md).
|
|
22
|
+
|
|
23
|
+
---
|
|
24
|
+
|
|
25
|
+
## Architecture
|
|
26
|
+
|
|
27
|
+
```
|
|
28
|
+
SessionEnd hook fires (or skill invocation)
|
|
29
|
+
│
|
|
30
|
+
▼
|
|
31
|
+
┌──────────────────────┐
|
|
32
|
+
│ Artifact Reader │ reads archivist artifacts since last librarian
|
|
33
|
+
│ │ scan; reads recent transcript tail for context
|
|
34
|
+
└─────────┬────────────┘
|
|
35
|
+
│
|
|
36
|
+
▼
|
|
37
|
+
┌──────────────────────┐
|
|
38
|
+
│ Durability Filter │ cheap heuristics — keep candidates that show
|
|
39
|
+
│ │ signs of durability (repetition, marker phrases,
|
|
40
|
+
│ │ explicit user preference language)
|
|
41
|
+
└─────────┬────────────┘
|
|
42
|
+
│ candidates remain
|
|
43
|
+
▼
|
|
44
|
+
┌──────────────────────┐
|
|
45
|
+
│ Type Classifier │ Haiku call: user / feedback / project / reference
|
|
46
|
+
│ (LLM) │ emits null for "session-only — don't promote"
|
|
47
|
+
└─────────┬────────────┘
|
|
48
|
+
│
|
|
49
|
+
▼
|
|
50
|
+
┌──────────────────────┐
|
|
51
|
+
│ Conflict / Dup │ compare against existing memory files; merge,
|
|
52
|
+
│ Detector │ supersede, or flag conflict
|
|
53
|
+
└─────────┬────────────┘
|
|
54
|
+
│
|
|
55
|
+
▼
|
|
56
|
+
┌──────────────────────┐
|
|
57
|
+
│ Proposal Queue │ written to ~/.onlooker/librarian/<key>/proposals/
|
|
58
|
+
│ │ │
|
|
59
|
+
└─────────┬────────────┘
|
|
60
|
+
│ at next SessionStart
|
|
61
|
+
▼
|
|
62
|
+
┌──────────────────────┐
|
|
63
|
+
│ Surfacer │ injects "Librarian proposes N promotions"
|
|
64
|
+
│ │ pointer; full review via /librarian skill
|
|
65
|
+
└──────────────────────┘
|
|
66
|
+
```
|
|
67
|
+
|
|
68
|
+
### Artifact Reader
|
|
69
|
+
|
|
70
|
+
Reads `~/.onlooker/archivist/<project-key>/decisions/`, `dead_ends/`, and `open_questions/` for artifacts created since the last librarian scan. The last-scan watermark is `~/.onlooker/librarian/<project-key>/last_scan.json` (an ISO-8601 timestamp). On first run, librarian scans the last `bootstrap_lookback_days` (default: 14) of artifacts.
|
|
71
|
+
|
|
72
|
+
Each artifact carries `session_id` and `created_at`. Librarian groups candidates by `session_id` so the classifier has session-shaped context, not a flat fire-hose.
|
|
73
|
+
|
|
74
|
+
### Durability Filter
|
|
75
|
+
|
|
76
|
+
A cheap pre-LLM filter that drops obvious session-only items before paying for classification. Heuristics, in order:
|
|
77
|
+
|
|
78
|
+
1. **Marker-phrase boost.** Artifact summary or detail contains one of: `always`, `never`, `remember`, `from now on`, `every time`, `whenever`, `prefer`, `the reason`, `because`, `historically`, `legacy`, `compliance`, `requirement`. These markers correlate strongly with durable facts in calibration runs (target: ≥60% precision; revalidate per repo via `/librarian calibrate`).
|
|
79
|
+
2. **Reference grounding.** Artifact `files` field lists at least one path that still exists in the repo (paths that have been deleted suggest a fact about completed/abandoned work, not durable knowledge).
|
|
80
|
+
3. **Repetition across sessions.** Same canonical summary (after normalization: lowercased, stopwords removed, top-3 tokens) appears in archivist artifacts from ≥2 distinct sessions. Strong durability signal.
|
|
81
|
+
4. **Drop list.** Specific patterns that almost never promote well: "the test is failing", "let me check", "I'll come back to this", artifact whose detail is shorter than `min_detail_chars` (default: 40).
|
|
82
|
+
|
|
83
|
+
Filter outputs a candidate set. If the set is empty, librarian emits `librarian.scan.empty` and exits.
|
|
84
|
+
|
|
85
|
+
### Type Classifier
|
|
86
|
+
|
|
87
|
+
For each remaining candidate, librarian calls Haiku once with a structured prompt. The prompt presents the four memory types from the user's CLAUDE.md (user, feedback, project, reference) with examples, and asks the model to emit one of those four labels or `null` for "session-only — don't promote."
|
|
88
|
+
|
|
89
|
+
```
|
|
90
|
+
You are classifying a session artifact for promotion into a long-term memory store.
|
|
91
|
+
|
|
92
|
+
The store has four types:
|
|
93
|
+
- user: durable facts about the user's role, expertise, or working style
|
|
94
|
+
- feedback: corrections or validated preferences ("don't do X", "yes, keep doing Y")
|
|
95
|
+
- project: ongoing work facts, decisions, constraints not derivable from the code
|
|
96
|
+
- reference: pointers to external systems (issue trackers, dashboards, channels)
|
|
97
|
+
|
|
98
|
+
RULES:
|
|
99
|
+
- Output only JSON: {"type": "<user|feedback|project|reference|null>",
|
|
100
|
+
"title": "<≤60 chars>",
|
|
101
|
+
"body": "<the memory content; structure per type>",
|
|
102
|
+
"confidence": <float 0–1>}
|
|
103
|
+
- Use null when the artifact is interesting but session-only (a specific bug fix,
|
|
104
|
+
a one-off question that got answered, an exploration that didn't change anything).
|
|
105
|
+
- For feedback and project types, include **Why:** and **How to apply:** lines.
|
|
106
|
+
|
|
107
|
+
<artifact>
|
|
108
|
+
kind: {{ARTIFACT_KIND}}
|
|
109
|
+
summary: {{SUMMARY}}
|
|
110
|
+
detail: {{DETAIL}}
|
|
111
|
+
files: {{FILES_LIST}}
|
|
112
|
+
session_id: {{SESSION_ID}}
|
|
113
|
+
created_at: {{CREATED_AT}}
|
|
114
|
+
</artifact>
|
|
115
|
+
|
|
116
|
+
<surrounding_session_context>
|
|
117
|
+
{{SESSION_CONTEXT_EXCERPT}}
|
|
118
|
+
</surrounding_session_context>
|
|
119
|
+
```
|
|
120
|
+
|
|
121
|
+
Model: `claude-haiku-4-5-20251001`. Temperature 0.2 (slight stability vs. 0 for noise-floor robustness). Max output tokens: 256.
|
|
122
|
+
|
|
123
|
+
Candidates with `confidence < min_classifier_confidence` (default: 0.6) are dropped silently — the cost of a missed promotion is low; the cost of a noisy proposal queue is high.
|
|
124
|
+
|
|
125
|
+
### Conflict / Duplicate Detector
|
|
126
|
+
|
|
127
|
+
For each surviving candidate, librarian:
|
|
128
|
+
|
|
129
|
+
1. Reads `MEMORY.md` and each referenced memory file from `~/.claude/projects/<encoded-project>/memory/`.
|
|
130
|
+
2. Computes token-set similarity (Jaccard on lowercased, stopword-stripped token sets) against each existing memory's body.
|
|
131
|
+
3. If max similarity ≥ `duplicate_threshold` (default: 0.7), classifies as `duplicate` — annotated and dropped from proposals, but logged as `librarian.candidate.dropped` with `reason: "duplicate"`.
|
|
132
|
+
4. If `merge_candidate_threshold` (default: 0.45) ≤ similarity < `duplicate_threshold`, classifies as `merge_candidate` — the proposal records the existing memory's filename so the surfacer can offer "merge into X" as a resolution path.
|
|
133
|
+
5. If `conflict_keyword_overlap` (default: 0.5) overlap on the body but opposing sentiment markers (`always`/`never`, `do`/`don't`, `prefer`/`avoid`), classifies as `conflict_candidate` — the surfacer offers "this contradicts X — supersede, keep both, or drop new."
|
|
134
|
+
|
|
135
|
+
Conflict detection at this stage is cheap pattern matching, not an LLM call. Curator does the deeper LLM-based contradiction sweep separately.
|
|
136
|
+
|
|
137
|
+
### Proposal Queue
|
|
138
|
+
|
|
139
|
+
Each proposal is written to `~/.onlooker/librarian/<project-key>/proposals/<ulid>.json`:
|
|
140
|
+
|
|
141
|
+
```json
|
|
142
|
+
{
|
|
143
|
+
"id": "01J...",
|
|
144
|
+
"created_at": "2026-06-02T18:24:11Z",
|
|
145
|
+
"source_artifact_ids": ["01J...", "01J..."],
|
|
146
|
+
"source_session_ids": ["..."],
|
|
147
|
+
"proposed": {
|
|
148
|
+
"type": "feedback",
|
|
149
|
+
"filename": "feedback_no_trailing_summaries.md",
|
|
150
|
+
"title": "Don't write trailing summaries",
|
|
151
|
+
"body": "...",
|
|
152
|
+
"classifier_confidence": 0.84
|
|
153
|
+
},
|
|
154
|
+
"conflict_state": "none | duplicate | merge_candidate | conflict_candidate",
|
|
155
|
+
"conflict_with": ["existing_memory_filename.md"],
|
|
156
|
+
"status": "pending | accepted | rejected | superseded"
|
|
157
|
+
}
|
|
158
|
+
```
|
|
159
|
+
|
|
160
|
+
### Surfacer
|
|
161
|
+
|
|
162
|
+
At `SessionStart`, librarian counts proposals where `status == "pending"`. If `count > 0`, it injects a single short pointer into `additionalContext`:
|
|
163
|
+
|
|
164
|
+
> Librarian has 4 pending memory promotion proposals. Review with `/librarian review`.
|
|
165
|
+
|
|
166
|
+
It does not inject the proposal bodies — the SessionStart context budget is precious and the user shouldn't have to skim them inline. The `/librarian review` skill walks the user through each proposal interactively, accepting, rejecting, or editing each one. Accepted proposals are written to the typed memory store with a provenance trailer:
|
|
167
|
+
|
|
168
|
+
```markdown
|
|
169
|
+
---
|
|
170
|
+
name: Don't write trailing summaries
|
|
171
|
+
description: user-specific terseness preference — corrected during session
|
|
172
|
+
type: feedback
|
|
173
|
+
source: librarian
|
|
174
|
+
source_session_id: 01J...
|
|
175
|
+
source_artifact_ids: ["01J...", "01J..."]
|
|
176
|
+
classifier_confidence: 0.84
|
|
177
|
+
promoted_at: 2026-06-02T19:01:33Z
|
|
178
|
+
---
|
|
179
|
+
|
|
180
|
+
Don't write trailing summaries of what you just did.
|
|
181
|
+
|
|
182
|
+
**Why:** User explicitly said "stop summarizing... I can read the diff" during session 01J...
|
|
183
|
+
**How to apply:** Treat the end of turn as a stopping point; surface only what's new since the user last looked, not a recap.
|
|
184
|
+
```
|
|
185
|
+
|
|
186
|
+
The provenance fields let curator later detect that a memory was librarian-promoted (so curator can use a different staleness heuristic for it) and trace a promoted memory back to the originating session if the user wants to investigate.
|
|
187
|
+
|
|
188
|
+
---
|
|
189
|
+
|
|
190
|
+
## Integration Points
|
|
191
|
+
|
|
192
|
+
**Archivist.** Librarian reads archivist's artifact directories. If archivist is not installed, librarian emits `librarian.scan.skipped` with `reason: "archivist_not_present"` and exits. Librarian does not require archivist to be running in the current session — it reads artifacts from the on-disk store, which is durable.
|
|
193
|
+
|
|
194
|
+
**Curator.** Once a memory is promoted, curator owns its maintenance. Librarian does not re-touch memories it has already promoted. The `source: "librarian"` provenance field tells curator which memories came through this pipeline; curator may want different staleness criteria for librarian-promoted memories vs. hand-written ones (open question).
|
|
195
|
+
|
|
196
|
+
**Historian.** Independent. A librarian-promoted memory is a distilled summary; the corresponding historian transcript chunk is the verbatim source. They complement each other and do not need cross-references at the storage level.
|
|
197
|
+
|
|
198
|
+
**Scribe.** Different goal. Scribe produces readable narrative artifacts of the session. Librarian produces structured durable knowledge. The same session might generate both. They can run in parallel without coordination.
|
|
199
|
+
|
|
200
|
+
**Compass / Tribunal / Warden / Echo / Governor.** No interaction.
|
|
201
|
+
|
|
202
|
+
---
|
|
203
|
+
|
|
204
|
+
## Configuration (`config.json`)
|
|
205
|
+
|
|
206
|
+
```json
|
|
207
|
+
{
|
|
208
|
+
"plugin_name": "librarian",
|
|
209
|
+
"storage_path": "${ONLOOKER_DIR:-$HOME/.onlooker}",
|
|
210
|
+
"librarian": {
|
|
211
|
+
"enabled": false,
|
|
212
|
+
"auto_promote": false,
|
|
213
|
+
"memory_store_path": "${HOME}/.claude/projects/${CLAUDE_PROJECT_ENCODED}/memory",
|
|
214
|
+
"scan": {
|
|
215
|
+
"trigger": "SessionEnd",
|
|
216
|
+
"bootstrap_lookback_days": 14,
|
|
217
|
+
"min_detail_chars": 40
|
|
218
|
+
},
|
|
219
|
+
"classifier": {
|
|
220
|
+
"model": "claude-haiku-4-5-20251001",
|
|
221
|
+
"temperature": 0.2,
|
|
222
|
+
"max_output_tokens": 256,
|
|
223
|
+
"min_classifier_confidence": 0.6
|
|
224
|
+
},
|
|
225
|
+
"durability_filter": {
|
|
226
|
+
"marker_phrases": [
|
|
227
|
+
"always", "never", "remember", "from now on", "every time",
|
|
228
|
+
"whenever", "prefer", "the reason", "because", "historically",
|
|
229
|
+
"legacy", "compliance", "requirement"
|
|
230
|
+
],
|
|
231
|
+
"require_grounding": false,
|
|
232
|
+
"repetition_min_sessions": 2
|
|
233
|
+
},
|
|
234
|
+
"conflict": {
|
|
235
|
+
"duplicate_threshold": 0.70,
|
|
236
|
+
"merge_candidate_threshold": 0.45,
|
|
237
|
+
"conflict_keyword_overlap": 0.50
|
|
238
|
+
},
|
|
239
|
+
"surfacer": {
|
|
240
|
+
"max_pending_for_inject": 20,
|
|
241
|
+
"skip_inject_when_zero": true
|
|
242
|
+
}
|
|
243
|
+
}
|
|
244
|
+
}
|
|
245
|
+
```
|
|
246
|
+
|
|
247
|
+
`memory_store_path` resolves `${CLAUDE_PROJECT_ENCODED}` at hook time from the Claude Code project-path encoding scheme. If the env var is not set, librarian degrades to skill-only mode and emits `librarian.config.warning`.
|
|
248
|
+
|
|
249
|
+
---
|
|
250
|
+
|
|
251
|
+
## Events
|
|
252
|
+
|
|
253
|
+
| Event | Trigger | Key payload fields |
|
|
254
|
+
|---|---|---|
|
|
255
|
+
| `librarian.scan.started` | Scan begins | `last_scan_at`, `artifact_count_in_window` |
|
|
256
|
+
| `librarian.scan.empty` | Scan finds no candidates | `artifact_count_in_window` |
|
|
257
|
+
| `librarian.scan.skipped` | Scan cannot run | `reason: archivist_not_present\|memory_path_unresolved\|disabled` |
|
|
258
|
+
| `librarian.candidate.proposed` | Proposal written to queue | `proposal_id`, `type`, `classifier_confidence`, `conflict_state` |
|
|
259
|
+
| `librarian.candidate.dropped` | Candidate dropped pre-queue | `reason: duplicate\|low_confidence\|classified_null` |
|
|
260
|
+
| `librarian.proposal.accepted` | User accepted via skill | `proposal_id`, `final_filename` |
|
|
261
|
+
| `librarian.proposal.rejected` | User rejected via skill | `proposal_id`, `reason` (optional) |
|
|
262
|
+
| `librarian.proposal.merged` | User chose merge-into-X | `proposal_id`, `merged_into_filename` |
|
|
263
|
+
| `librarian.proposal.superseded` | User chose supersede-X | `proposal_id`, `superseded_filename` |
|
|
264
|
+
| `librarian.tombstone.created` | A previously-accepted promotion was manually deleted; recorded to prevent re-proposal | `original_filename`, `body_hash` |
|
|
265
|
+
|
|
266
|
+
---
|
|
267
|
+
|
|
268
|
+
## Skills
|
|
269
|
+
|
|
270
|
+
**`/librarian review`** — interactive walkthrough of pending proposals. For each: shows the proposed memory, the source artifact(s), the conflict state and existing memory if applicable, and offers accept / reject / edit / merge / supersede / defer.
|
|
271
|
+
|
|
272
|
+
**`/librarian calibrate`** — runs the durability filter and classifier against the last N sessions' archivist artifacts (default: 30 days) without writing proposals, and reports precision/recall against a labeled subset. Outputs a recommended `min_classifier_confidence` threshold for the project.
|
|
273
|
+
|
|
274
|
+
**`/librarian scan`** — manual trigger for the scan pipeline outside of `SessionEnd`. Useful when archivist artifacts have accumulated but no session has ended since librarian was enabled.
|
|
275
|
+
|
|
276
|
+
---
|
|
277
|
+
|
|
278
|
+
## Open Questions
|
|
279
|
+
|
|
280
|
+
1. **Provenance loop with curator.** If curator prunes a librarian-promoted memory and librarian later re-detects the same content, librarian will re-propose. A tombstone mechanism (recording `body_hash` of rejected/pruned proposals) prevents this but introduces its own staleness concerns — when does a tombstone expire?
|
|
281
|
+
|
|
282
|
+
2. **Cross-clone consistency.** The typed memory store is per-checkout. A promotion made in one clone is invisible to another clone of the same repo. Mirroring promoted bodies (not the full memory store) to `~/.onlooker/librarian/<project-key>/promoted/` would enable cross-clone sync but doubles the write path.
|
|
283
|
+
|
|
284
|
+
3. **Calibration baselines.** The marker-phrase list is a guess. `/librarian calibrate` is the answer in principle, but it requires a labeled set of "did this artifact deserve promotion?" judgments. The skill can bootstrap by treating any artifact whose summary later appeared in a hand-written memory as a positive label, but that's circular when the typed memory store is sparse.
|
|
285
|
+
|
|
286
|
+
4. **Memory body authorship.** The classifier writes the proposed memory body. The user may want to edit it before acceptance. The `/librarian review` skill provides edit, but heavy editing implies the classifier output was bad — which calibration the skill should surface (not just the threshold).
|
|
287
|
+
|
|
288
|
+
5. **Type drift over time.** A `project` memory ("we're rewriting auth for compliance") might become a `feedback` memory once the rewrite is done ("the compliance constraint is why this directory looks weird"). Librarian only assigns type at promotion; curator may need a re-classification capability.
|
|
289
|
+
|
|
290
|
+
6. **Encoding scheme for `CLAUDE_PROJECT_ENCODED`.** Claude Code encodes the project path by replacing `/` with `-` and prepending `-`. This is a Claude Code internal convention; relying on it makes librarian fragile to encoding changes. A more robust resolution would call Claude Code's own project-resolution logic if exposed.
|
|
291
|
+
|
|
292
|
+
---
|
|
293
|
+
|
|
294
|
+
## Non-Goals
|
|
295
|
+
|
|
296
|
+
- Does not write to the typed memory store without user confirmation by default (see [ADR-001](adr/001-propose-dont-auto-write.md)).
|
|
297
|
+
- Does not curate, prune, or audit existing memories — that is curator's job.
|
|
298
|
+
- Does not perform retrieval at runtime — that is historian's job for transcripts and the typed memory store's reinjection for distilled facts.
|
|
299
|
+
- Does not generate readable session artifacts — that is scribe's job.
|
|
300
|
+
- Does not score the *quality* of a promoted memory after the fact — that is curator's domain.
|
|
301
|
+
- Does not synthesize cross-session patterns into briefs — that is counsel's job.
|
|
@@ -0,0 +1,26 @@
|
|
|
1
|
+
{
|
|
2
|
+
"hooks": {
|
|
3
|
+
"SessionEnd": [
|
|
4
|
+
{
|
|
5
|
+
"matcher": "*",
|
|
6
|
+
"hooks": [
|
|
7
|
+
{
|
|
8
|
+
"type": "command",
|
|
9
|
+
"command": "\"$CLAUDE_PLUGIN_ROOT\"/scripts/hooks/librarian-session-end.sh"
|
|
10
|
+
}
|
|
11
|
+
]
|
|
12
|
+
}
|
|
13
|
+
],
|
|
14
|
+
"SessionStart": [
|
|
15
|
+
{
|
|
16
|
+
"matcher": "*",
|
|
17
|
+
"hooks": [
|
|
18
|
+
{
|
|
19
|
+
"type": "command",
|
|
20
|
+
"command": "\"$CLAUDE_PLUGIN_ROOT\"/scripts/hooks/librarian-session-start.sh"
|
|
21
|
+
}
|
|
22
|
+
]
|
|
23
|
+
}
|
|
24
|
+
]
|
|
25
|
+
}
|
|
26
|
+
}
|