sostenuto 0.1.0

package/LICENSE ADDED
@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2026 llu929
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
package/README.md ADDED
@@ -0,0 +1,63 @@
1
+ # sostenuto
2
+
3
+ *The pedal that sustains only the notes already held. A self-hosted memory system for AI companions where chosen memories persist across every reset.*
4
+
5
+ ---
6
+
7
+ **Sostenuto** *(It., "sustained")* — the middle pedal on a grand piano sustains only the notes already sounding when it's pressed; everything played afterward stays dry. This project applies the same principle to AI memory: the memories you choose to hold persist across every context window, every session, every surface — and the rest is allowed to fade.
8
+
9
+ Not "the AI remembers everything." **Selective persistence, by design.**
10
+
11
+ ## Why
12
+
13
+ People form genuine, long-running relationships with AI — and then hit the wall everyone hits: the relationship doesn't survive the context window. Provider memory features store generic preferences; they don't carry *relational texture* — the shared concepts, the corrections, the rituals, the moments that make a relationship a relationship.
14
+
15
+ Sostenuto is the memory layer for that problem:
16
+
17
+ - **Structured relational memory** — memory objects tagged with domain, emotional valence + arousal, salience, sensitivity, and a usage policy.
18
+ - **Initiative ≠ access** — `proactive_use` controls whether a memory surfaces *unprompted* (`yes` / `only_when_relevant` / `no`), separately from whether it's *retrievable*. Sensitive memories stay reachable when explicitly referenced, without ever being volunteered.
19
+ - **Two-tier guidance** — most memories are content-only. A curated few carry a short, positive `should_do` instruction that silently shapes behavior. Restriction lists are never auto-generated: lean, warm, action-oriented — not a wall of caution.
20
+ - **Time-decayed retrieval** — semantic search scored by `similarity × e^(−λ·age)`; recency matters, but the deep past stays findable.
21
+ - **Reinforce, don't duplicate** — new observations that match existing memories add evidence and confidence instead of creating copies; content upgrades preserve full version history.
22
+ - **Migration** — import months of existing conversations (a structured export prompt + import pipeline) so a relationship can move *into* Sostenuto without starting over.
23
+
24
+ ## What ships here
25
+
26
+ ```
27
+ db/schema.sql Consolidated Postgres + pgvector schema (Supabase-ready)
28
+ src/memory/ Memory objects: dedup, reinforce, version history, scoring
29
+ src/retrieval/ Embeddings, time-decayed semantic search, prompt assembly
30
+ src/classify/ Session classification with a pluggable LLM executor
31
+ src/migrate/ Conversation-export prompt + structured importer
32
+ mcp/ Thin MCP server (recall / remember / context) — try it
33
+ from your own Claude Desktop or Claude Code in minutes
34
+ templates/ Persona + classification calibration — your companion's
35
+ voice lives here, in files you edit, not in our code
36
+ docs/ Memory model, usage-policy semantics, deployment patterns
37
+ ```
38
+
39
+ ## Model support
40
+
41
+ Sostenuto is **model-agnostic** with first-class Claude support. The classifier accepts transcripts with optional reasoning blocks — when your model exposes its thinking (Claude does), Sostenuto mines it for perception that never made it into rendered replies, producing the companion's private diary and thinking-highlights. Without reasoning access, everything else works unchanged.
42
+
43
+ The classification executor is pluggable: Anthropic API, any OpenAI-compatible endpoint (OpenAI, Gemini, DeepSeek, Ollama, vLLM, …), or your own.
44
+
45
+ ## Status
46
+
47
+ 🚧 **Under construction.** Schema is stable; modules are being extracted from a private system that has run in production daily since early 2026 (260+ memory objects across 70+ sessions and three surfaces). Watch the repo if you want the rest as it lands.
48
+
49
+ ## Roadmap
50
+
51
+ - **Trajectory safety reference** — depth without the dependency trap: this project's design philosophy includes conversation-trajectory awareness (emotional volatility, dependency, recovery capacity) rather than engagement maximization. A reference design is planned; the memory schema already carries the hooks (valence, arousal, sensitivity).
52
+ - Decay engine (Ebbinghaus-style, arousal-modulated) over `memory_objects`
53
+ - Provider-agnostic chat-surface example
54
+
55
+ ## Name
56
+
57
+ > Attacca described the boundary-crossing; Sostenuto describes the *memory model*.
58
+
59
+ The sostenuto pedal holds only the notes already sounding when it's pressed — everything played after stays dry. That's not "the AI remembers." That's selective persistence: pinned memories sustain, the rest decays. The mechanism, not a vibe.
60
+
61
+ ## License
62
+
63
+ [MIT](LICENSE)
package/db/schema.sql ADDED
@@ -0,0 +1,302 @@
1
+ -- ============================================================
2
+ -- Sostenuto — consolidated schema
3
+ -- Selective long-term memory for AI companions.
4
+ --
5
+ -- Run once in your Supabase SQL Editor (or any Postgres with
6
+ -- pgvector). Single-user by design — no user_id columns; isolate
7
+ -- tenants at the project level.
8
+ --
9
+ -- NOTE on vector dimensions: 1024 matches voyage-3-large at
10
+ -- output_dimension=1024. If you use a different embedding model,
11
+ -- change every `vector(1024)` below to its output dimension —
12
+ -- embedding spaces cannot be mixed within a column.
13
+ -- ============================================================
14
+
15
+ CREATE EXTENSION IF NOT EXISTS vector;
16
+
17
+ -- ─── Sessions ────────────────────────────────────────────────
18
+ -- One row per conversation session, on any surface. Classification
19
+ -- enriches the row at session end (or incrementally during it).
20
+
21
+ CREATE TABLE IF NOT EXISTS sessions (
22
+ id BIGSERIAL PRIMARY KEY,
23
+ started_at TIMESTAMPTZ DEFAULT NOW(),
24
+ ended_at TIMESTAMPTZ,
25
+ end_type TEXT, -- free-form: natural | goodnight | abrupt | ...
26
+ source TEXT, -- surface tag, e.g. 'web' | 'terminal' | 'import'
27
+ external_session_id TEXT, -- upsert key for surfaces with their own session ids
28
+ -- (e.g. a Claude Code session UUID)
29
+ headline TEXT, -- one sentence: what actually mattered
30
+ detailed_summary TEXT, -- arc-shaped summary (early → middle → late)
31
+ diary_entry TEXT, -- first-person reflection in the companion's voice
32
+ thinking_highlights JSONB DEFAULT '[]'::jsonb, -- [{moment, thought}] mined from model
33
+ -- reasoning when the provider exposes it
34
+ key_points JSONB DEFAULT '[]'::jsonb, -- [{type, content, valence, weight}]
35
+ semantic_context JSONB, -- cached retrieval results for this session
36
+ summary_embedding vector(1024),
37
+ last_classified_message_count INT, -- incremental-classification watermark
38
+ mood_delta DOUBLE PRECISION,
39
+ connection_delta DOUBLE PRECISION,
40
+ attunement_delta DOUBLE PRECISION
41
+ );
42
+
43
+ CREATE INDEX IF NOT EXISTS idx_sessions_ended ON sessions (ended_at DESC);
44
+ CREATE INDEX IF NOT EXISTS idx_sessions_source ON sessions (source);
45
+ CREATE INDEX IF NOT EXISTS idx_sessions_external ON sessions (external_session_id);
46
+
47
+ -- ─── Messages ────────────────────────────────────────────────
48
+
49
+ CREATE TABLE IF NOT EXISTS messages (
50
+ id UUID PRIMARY KEY,
51
+ session_id BIGINT REFERENCES sessions(id) ON DELETE CASCADE,
52
+ role TEXT NOT NULL,
53
+ content TEXT NOT NULL,
54
+ thinking TEXT, -- model reasoning, when available; the classifier
55
+ -- mines this for perception that didn't surface
56
+ created_at TIMESTAMPTZ DEFAULT NOW()
57
+ );
58
+
59
+ CREATE INDEX IF NOT EXISTS idx_messages_session ON messages (session_id, created_at);
60
+
61
+ -- ─── Memory objects — the heart of Sostenuto ─────────────────
62
+ -- Durable knowledge units distilled from sessions. Each carries
63
+ -- emotional coordinates (valence/arousal), a salience score, and a
64
+ -- usage policy (proactive_use) controlling *initiative*, distinct
65
+ -- from retrievability. Like the sostenuto pedal: items you choose
66
+ -- to hold sustain; the rest is allowed to fade.
67
+
68
+ CREATE TABLE IF NOT EXISTS memory_objects (
69
+ id BIGSERIAL PRIMARY KEY,
70
+ source_session_id BIGINT, -- provenance (not FK: survives session cleanup)
71
+ domain TEXT NOT NULL CHECK (domain IN (
72
+ 'user_self', 'agent_self', 'relational', 'evidence')),
73
+ type TEXT NOT NULL CHECK (type IN (
74
+ 'fact', 'preference', 'trajectory', 'somatic_affective',
75
+ 'interpretive_frame', 'project', 'boundary', 'commitment',
76
+ 'ritual', 'shared_concept', 'recurring_subject',
77
+ 'contradiction', 'style_adjustment', 'voice_note',
78
+ 'constraint', 'context_note', 'brief', 'resume_guidance',
79
+ 'continuation', 'other')),
80
+ content TEXT NOT NULL,
81
+ evidence_refs JSONB DEFAULT '[]'::jsonb, -- provenance trail; grows on reinforce
82
+ epistemic_status TEXT NOT NULL DEFAULT 'inferred' CHECK (epistemic_status IN (
83
+ 'explicit', 'inferred', 'co_created',
84
+ 'assistant_reflection', 'system_generated')),
85
+ time_scope TEXT NOT NULL DEFAULT 'ongoing' CHECK (time_scope IN (
86
+ 'momentary', 'session', 'active_project',
87
+ 'ongoing', 'historical', 'deprecated')),
88
+ sensitivity TEXT NOT NULL DEFAULT 'low' CHECK (sensitivity IN (
89
+ 'low', 'medium', 'high')),
90
+ -- Tier 2 model-facing guidance. Most memories are content-only (Tier 1).
91
+ -- A curated subset carries a short positive instruction; should_not_do is
92
+ -- never auto-populated — manual entries only (lean-not-cautious).
93
+ should_do TEXT,
94
+ should_not_do TEXT,
95
+ confidence DOUBLE PRECISION DEFAULT 0.5
96
+ CHECK (confidence >= 0 AND confidence <= 1),
97
+ status TEXT NOT NULL DEFAULT 'candidate' CHECK (status IN (
98
+ 'candidate', 'confirmed', 'active', 'reinforced',
99
+ 'revised', 'deprecated', 'forgotten')),
100
+ source_surface TEXT DEFAULT 'system',
101
+ embedding vector(1024),
102
+ -- Structured usage policy (machine-read; never dumped into prompts):
103
+ -- valence (-1..1), arousal (0..1), salience (0..1), stability,
104
+ -- proactive_use: 'yes' | 'only_when_relevant' | 'no'
105
+ -- (controls initiative, NOT access — 'no' items still retrieve on
106
+ -- explicit anchor, i.e. high-similarity reference by the user),
107
+ -- retrieval_conditions, do_not_use_when, future_response_guidance,
108
+ -- retrieval_keywords[], source_memory_type, import_policy
109
+ usage_guidance JSONB DEFAULT '{}'::jsonb,
110
+ -- Append-only log of content upgrades (preserves provenance on rewrite)
111
+ version_history JSONB DEFAULT '[]'::jsonb,
112
+ created_at TIMESTAMPTZ DEFAULT NOW(),
113
+ updated_at TIMESTAMPTZ DEFAULT NOW(),
114
+ last_reinforced_at TIMESTAMPTZ
115
+ );
116
+
117
+ CREATE INDEX IF NOT EXISTS idx_mo_domain_status ON memory_objects (domain, status);
118
+ CREATE INDEX IF NOT EXISTS idx_mo_type ON memory_objects (type);
119
+ CREATE INDEX IF NOT EXISTS idx_mo_status ON memory_objects (status);
120
+ CREATE INDEX IF NOT EXISTS idx_mo_sensitivity ON memory_objects (sensitivity);
121
+ CREATE INDEX IF NOT EXISTS idx_mo_source_session ON memory_objects (source_session_id);
122
+ CREATE INDEX IF NOT EXISTS idx_mo_created ON memory_objects (created_at DESC);
123
+ CREATE INDEX IF NOT EXISTS idx_mo_proactive_use ON memory_objects ((usage_guidance->>'proactive_use'));
124
+
125
+ -- ─── Key point embeddings ────────────────────────────────────
126
+ -- Fine-grained retrieval over session key points.
127
+
128
+ CREATE TABLE IF NOT EXISTS key_point_embeddings (
129
+ id BIGSERIAL PRIMARY KEY,
130
+ session_id BIGINT REFERENCES sessions(id) ON DELETE CASCADE,
131
+ type TEXT,
132
+ content TEXT NOT NULL,
133
+ embedding vector(1024),
134
+ created_at TIMESTAMPTZ DEFAULT NOW()
135
+ );
136
+
137
+ CREATE INDEX IF NOT EXISTS idx_kpe_session ON key_point_embeddings (session_id);
138
+
139
+ -- ─── Agent state (singleton) ─────────────────────────────────
140
+ -- Continuous emotional axes, updated by per-session deltas, clamped.
141
+ -- The four shipped axes are a default, not a doctrine — redefine them
142
+ -- for your companion. Visible state is part of the design: the user
143
+ -- can always read these values.
144
+
145
+ CREATE TABLE IF NOT EXISTS agent_state (
146
+ id BIGINT PRIMARY KEY DEFAULT 1 CHECK (id = 1),
147
+ connection DOUBLE PRECISION DEFAULT 0.3, -- 0..1 pull to reach out
148
+ discretion DOUBLE PRECISION DEFAULT 0.5, -- 0..1 restraint
149
+ mood DOUBLE PRECISION DEFAULT 0.0, -- -1..1
150
+ attunement DOUBLE PRECISION DEFAULT 0.3, -- 0..1 sense of where the user is
151
+ proactive_enabled BOOLEAN DEFAULT FALSE, -- user-controlled off-switch
152
+ last_updated TIMESTAMPTZ DEFAULT NOW()
153
+ );
154
+
155
+ INSERT INTO agent_state (id) VALUES (1) ON CONFLICT DO NOTHING;
156
+
157
+ -- ─── User profile (singleton) ────────────────────────────────
158
+ -- Stable identity-level facts about the user. Regenerated wholesale
159
+ -- every N sessions / T days — the only compaction step in the system.
160
+
161
+ CREATE TABLE IF NOT EXISTS user_profile (
162
+ id BIGINT PRIMARY KEY DEFAULT 1 CHECK (id = 1),
163
+ content TEXT DEFAULT '',
164
+ last_refreshed TIMESTAMPTZ DEFAULT NOW(),
165
+ sessions_since_refresh INT DEFAULT 0
166
+ );
167
+
168
+ INSERT INTO user_profile (id) VALUES (1) ON CONFLICT DO NOTHING;
169
+
170
+ -- ─── Relationship context brief (singleton) ──────────────────
171
+ -- Dense "what to know right now" orientation paragraph, distinct from
172
+ -- user_profile (identity facts) — this is current relational texture.
173
+
174
+ CREATE TABLE IF NOT EXISTS relationship_context_brief (
175
+ id BIGINT PRIMARY KEY DEFAULT 1 CHECK (id = 1),
176
+ content TEXT NOT NULL DEFAULT '',
177
+ source_session_id BIGINT,
178
+ refreshed_at TIMESTAMPTZ DEFAULT NOW(),
179
+ version_history JSONB DEFAULT '[]'::jsonb
180
+ );
181
+
182
+ INSERT INTO relationship_context_brief (id) VALUES (1) ON CONFLICT DO NOTHING;
183
+
184
+ -- ============================================================
185
+ -- Search RPCs — time-decayed cosine similarity
186
+ -- score = similarity * exp(-decay_rate * age_days)
187
+ -- decay_rate 0.03 ≈ a month-old match keeps ~40% of its score.
188
+ -- ============================================================
189
+
190
+ CREATE OR REPLACE FUNCTION search_summaries(
191
+ query_embedding vector(1024),
192
+ match_threshold DOUBLE PRECISION DEFAULT 0.3,
193
+ match_count INT DEFAULT 10,
194
+ decay_rate DOUBLE PRECISION DEFAULT 0.03
195
+ )
196
+ RETURNS TABLE (
197
+ session_id BIGINT,
198
+ content TEXT,
199
+ similarity DOUBLE PRECISION,
200
+ age_days DOUBLE PRECISION,
201
+ decayed_score DOUBLE PRECISION,
202
+ created_at TIMESTAMPTZ
203
+ ) LANGUAGE plpgsql AS $$
204
+ BEGIN
205
+ RETURN QUERY
206
+ SELECT
207
+ s.id,
208
+ s.detailed_summary,
209
+ 1 - (s.summary_embedding <=> query_embedding) AS similarity,
210
+ (EXTRACT(EPOCH FROM (NOW() - COALESCE(s.ended_at, s.started_at))) / 86400.0)::DOUBLE PRECISION AS age_days,
211
+ (1 - (s.summary_embedding <=> query_embedding)) *
212
+ EXP(-decay_rate * EXTRACT(EPOCH FROM (NOW() - COALESCE(s.ended_at, s.started_at))) / 86400.0)
213
+ AS decayed_score,
214
+ s.started_at
215
+ FROM sessions s
216
+ WHERE s.summary_embedding IS NOT NULL
217
+ AND 1 - (s.summary_embedding <=> query_embedding) > match_threshold
218
+ ORDER BY decayed_score DESC
219
+ LIMIT match_count;
220
+ END;
221
+ $$;
222
+
223
+ CREATE OR REPLACE FUNCTION search_key_points(
224
+ query_embedding vector(1024),
225
+ match_threshold DOUBLE PRECISION DEFAULT 0.3,
226
+ match_count INT DEFAULT 10,
227
+ decay_rate DOUBLE PRECISION DEFAULT 0.03
228
+ )
229
+ RETURNS TABLE (
230
+ session_id BIGINT,
231
+ content TEXT,
232
+ key_point_type TEXT,
233
+ similarity DOUBLE PRECISION,
234
+ age_days DOUBLE PRECISION,
235
+ decayed_score DOUBLE PRECISION,
236
+ created_at TIMESTAMPTZ
237
+ ) LANGUAGE plpgsql AS $$
238
+ BEGIN
239
+ RETURN QUERY
240
+ SELECT
241
+ k.session_id,
242
+ k.content,
243
+ k.type,
244
+ 1 - (k.embedding <=> query_embedding) AS similarity,
245
+ (EXTRACT(EPOCH FROM (NOW() - k.created_at)) / 86400.0)::DOUBLE PRECISION AS age_days,
246
+ (1 - (k.embedding <=> query_embedding)) *
247
+ EXP(-decay_rate * EXTRACT(EPOCH FROM (NOW() - k.created_at)) / 86400.0)
248
+ AS decayed_score,
249
+ k.created_at
250
+ FROM key_point_embeddings k
251
+ WHERE k.embedding IS NOT NULL
252
+ AND 1 - (k.embedding <=> query_embedding) > match_threshold
253
+ ORDER BY decayed_score DESC
254
+ LIMIT match_count;
255
+ END;
256
+ $$;
257
+
258
+ CREATE OR REPLACE FUNCTION search_memory_objects(
259
+ query_embedding vector(1024),
260
+ match_threshold DOUBLE PRECISION DEFAULT 0.3,
261
+ match_count INT DEFAULT 10,
262
+ decay_rate DOUBLE PRECISION DEFAULT 0.02,
263
+ domain_filter TEXT[] DEFAULT NULL,
264
+ status_filter TEXT[] DEFAULT ARRAY['active', 'confirmed', 'reinforced']
265
+ )
266
+ RETURNS TABLE (
267
+ id BIGINT,
268
+ domain TEXT,
269
+ type TEXT,
270
+ content TEXT,
271
+ epistemic_status TEXT,
272
+ sensitivity TEXT,
273
+ confidence DOUBLE PRECISION,
274
+ similarity DOUBLE PRECISION,
275
+ decayed_score DOUBLE PRECISION,
276
+ status TEXT,
277
+ source_session_id BIGINT,
278
+ last_reinforced_at TIMESTAMPTZ,
279
+ usage_guidance JSONB
280
+ ) LANGUAGE plpgsql AS $$
281
+ BEGIN
282
+ RETURN QUERY
283
+ SELECT
284
+ mo.id, mo.domain, mo.type, mo.content,
285
+ mo.epistemic_status, mo.sensitivity, mo.confidence,
286
+ 1 - (mo.embedding <=> query_embedding) AS similarity,
287
+ (1 - (mo.embedding <=> query_embedding)) *
288
+ EXP(-decay_rate * EXTRACT(EPOCH FROM (NOW() - mo.created_at)) / 86400.0)
289
+ AS decayed_score,
290
+ mo.status,
291
+ mo.source_session_id,
292
+ mo.last_reinforced_at,
293
+ mo.usage_guidance
294
+ FROM memory_objects mo
295
+ WHERE mo.embedding IS NOT NULL
296
+ AND mo.status = ANY(status_filter)
297
+ AND (domain_filter IS NULL OR mo.domain = ANY(domain_filter))
298
+ AND 1 - (mo.embedding <=> query_embedding) > match_threshold
299
+ ORDER BY decayed_score DESC
300
+ LIMIT match_count;
301
+ END;
302
+ $$;
@@ -0,0 +1,128 @@
1
+ # Deployment patterns
2
+
3
+ Sostenuto is a library, not a service — it runs wherever your companion
4
+ runs. These are the wiring patterns that work, learned in production.
5
+
6
+ ## Where classification fires
7
+
8
+ `closeSession()` needs to run when a session ends (or periodically during
9
+ long ones). Where that hook lives depends on your surface:
10
+
11
+ **Chat backend (request handler).** Detect session end (an explicit
12
+ goodbye, an idle timeout sweep on the next request) and call
13
+ `closeSession` before the response cycle finishes.
14
+
15
+ > ⚠️ **Serverless platforms kill fire-and-forget work.** On Vercel/Lambda
16
+ > and friends, background promises die when the response stream closes —
17
+ > classification will silently never complete and sessions will stay
18
+ > half-saved. Either `await` the close before finishing the response
19
+ > (adds a few seconds, once per session), or use the queue pattern below.
20
+ > This failure mode is invisible until you go looking; design for it up
21
+ > front.
22
+
23
+ **CLI / IDE hooks.** Tools like Claude Code expose lifecycle hooks
24
+ (SessionStart / Stop). A Stop hook that parses the transcript into turns
25
+ and calls `closeSession` gives you guaranteed capture after every
26
+ response — the incremental watermark keeps repeated invocations cheap.
27
+
28
+ **Queue worker (the action-row pattern).** For serverless surfaces or
29
+ expensive work, write an intention row to a table
30
+ (`{action_type, payload, status: 'pending'}`) and let a small persistent
31
+ worker poll and execute. The producer returns instantly; the consumer
32
+ runs on infrastructure that's allowed to take its time.
33
+
34
+ Hard-won rules for the worker:
35
+ - **Allow-list action types.** The executor refuses anything unknown —
36
+ the queue is writable by more things than you think.
37
+ - **Rate-limit side effects** (anything that emails, posts, spends).
38
+ - **Status flow** `pending → running → done/failed`, with errors stored
39
+ on the row. Failed actions don't retry silently; you can see and
40
+ re-queue them.
41
+ - **Generous timeouts** on LLM calls (5 min) — classification of a long
42
+ session through a busy provider can be slow, and a timeout marks the
43
+ action failed even though a retry would have succeeded.
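The worker rules above can be sketched in a few lines. This is a minimal in-memory illustration, not the project's worker: `runAction`, `withTimeout`, the `handlers` map, and the `error` / `result` fields on the row are hypothetical names, and rate limiting is left out for brevity.

```javascript
// Hypothetical sketch of the action-row rules: allow-list, visible status
// flow, errors stored on the row, and a generous timeout on slow work.
const LLM_TIMEOUT_MS = 5 * 60 * 1000; // 5 minutes, as suggested above

function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
  });
  // Clear the timer either way so the process is not kept alive.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

async function runAction(row, handlers, timeoutMs = LLM_TIMEOUT_MS) {
  // Allow-list: only action types registered as own keys of `handlers` run.
  const known = Object.prototype.hasOwnProperty.call(handlers, row.action_type);
  if (!known) {
    row.status = "failed";
    row.error = `unknown action_type: ${row.action_type}`;
    return row;
  }
  row.status = "running";
  try {
    const work = Promise.resolve(handlers[row.action_type](row.payload));
    row.result = await withTimeout(work, timeoutMs);
    row.status = "done";
  } catch (err) {
    // No silent retry: the error stays on the row for inspection and re-queue.
    row.status = "failed";
    row.error = String((err && err.message) || err);
  }
  return row;
}
```

In a real deployment the row would live in a database table and the status updates would be writes to that table; the control flow stays the same.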
44
+
45
+ ## Prompt caching: why the stable block is wide
46
+
47
+ `assembleSystemPrompt()` returns `{ stable, volatile }`. Send `stable` as
48
+ a cached prefix (Anthropic: a system block with
49
+ `cache_control: {type: "ephemeral"}`; OpenAI: automatic prefix caching)
50
+ and `volatile` uncached.
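As a rough illustration of the Anthropic case, the sketch below builds a Messages API request body with the stable block marked for caching and the volatile block left unmarked. `buildCachedRequest` and its parameters are hypothetical names; the model identifier and `max_tokens` value are whatever your deployment uses, and only the shape of the `system` array is the point here.

```javascript
// Sketch only: assembles a request body, it does not call the API.
// Assumes `stable` and `volatile` are the two strings returned by
// assembleSystemPrompt(), as described above.
function buildCachedRequest({ stable, volatile }, messages, model, maxTokens = 1024) {
  const system = [
    // The cache breakpoint covers the prefix up to and including this block.
    { type: "text", text: stable, cache_control: { type: "ephemeral" } },
  ];
  // The volatile block follows the breakpoint, so it is sent uncached.
  if (volatile) {
    system.push({ type: "text", text: volatile });
  }
  return { model, max_tokens: maxTokens, system, messages };
}
```

Keeping the volatile text after the cached block matters: anything that changes before the breakpoint invalidates the cached prefix.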
51
+
52
+ The design intentionally puts *everything that doesn't change within a
53
+ session* into the stable block — persona, profile, state, recent memory,
54
+ orientation, behavior guidance, the session's cached semantic context —
55
+ even though that makes the prefix large:
56
+
57
+ - Cache reads are ~10% of base input price (Anthropic). A 6k-token cached
58
+ prefix costs less per turn than a 1k-token uncached one.
59
+ - Providers have minimum cacheable sizes; a too-small prefix silently
60
+ doesn't cache at all.
61
+ - The cost asymmetry: each session pays one cache *write* on turn 1, then
62
+ every subsequent turn reads cheap. Short sessions are proportionally
63
+ the most expensive per turn — accept it; brief check-ins are worth it.
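The arithmetic behind these bullets can be checked with a small calculator. Costs are expressed in base-input-token equivalents; the 0.1 read multiplier comes from the text above, while the 1.25 write multiplier is an assumption for illustration and should be replaced with your provider's current pricing. The function names are hypothetical.

```javascript
const CACHE_READ_MULTIPLIER = 0.1;   // from the text: reads are ~10% of base price
const CACHE_WRITE_MULTIPLIER = 1.25; // assumption: a premium on the first write

// Total prefix cost for a session: one cache write, then cheap reads.
function cachedPrefixCost(prefixTokens, turns) {
  if (turns < 1) return 0;
  const write = prefixTokens * CACHE_WRITE_MULTIPLIER;
  const reads = (turns - 1) * prefixTokens * CACHE_READ_MULTIPLIER;
  return write + reads;
}

// Total prefix cost when the same prefix is sent uncached on every turn.
function uncachedPrefixCost(prefixTokens, turns) {
  return prefixTokens * turns;
}

// Average per-turn cost of the cached prefix over a whole session.
function cachedCostPerTurn(prefixTokens, turns) {
  return turns < 1 ? 0 : cachedPrefixCost(prefixTokens, turns) / turns;
}
```

Under these assumptions a 6k-token prefix read from cache costs about 600 token-equivalents on each later turn, below the 1,000 of a 1k-token uncached prefix, and the per-turn average falls as the session gets longer.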
64
+
65
+ Semantic retrieval runs **once per session** on the first substantive
66
+ message (`isSubstantiveQuery` filters greetings) and is cached on the
67
+ session row — both for cost and so the stable block stays stable.
68
+
69
+ ## Classification economics
70
+
71
+ - Use a fast, cheap model for classification (the default executor is a
72
+ small-model Anthropic config). Reserve your strongest model for the
73
+ conversation. Classification is structured extraction; it doesn't need
74
+ frontier reasoning.
75
+ - Incremental mode keeps long sessions affordable: re-classification
76
+ costs O(new turns), not O(whole transcript).
77
+ - The executor interface is intentionally minimal (`complete({system,
78
+ user}) → text`) so the backend is fully yours: any API, a local model
79
+ via an OpenAI-compatible server — or, if you have a subscription that
80
+ exposes headless completion, a private bridge executor gives
81
+ classification at zero marginal cost.
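To make the executor contract concrete, here is a minimal sketch of an executor backed by an OpenAI-compatible chat-completions endpoint. The factory name, its options, and the choice to return an object with a `complete` method are assumptions; the text above only fixes the `complete({system, user}) → text` shape. `baseUrl` is expected to include the version prefix (for example a local server's `/v1` path).

```javascript
// Hypothetical executor factory. `fetchImpl` is injectable so the executor
// can be exercised without network access.
function makeOpenAICompatibleExecutor({ baseUrl, apiKey, model, fetchImpl = globalThis.fetch }) {
  return {
    async complete({ system, user }) {
      const res = await fetchImpl(`${baseUrl}/chat/completions`, {
        method: "POST",
        headers: {
          "Content-Type": "application/json",
          Authorization: `Bearer ${apiKey}`,
        },
        body: JSON.stringify({
          model,
          messages: [
            { role: "system", content: system },
            { role: "user", content: user },
          ],
        }),
      });
      if (!res.ok) {
        throw new Error(`executor request failed with HTTP ${res.status}`);
      }
      const data = await res.json();
      // Return plain text only, matching the minimal interface.
      return data.choices[0].message.content;
    },
  };
}
```

Because the interface is just text in and text out, swapping this for a different backend does not touch the classification code.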
82
+
83
+ ## Embedding discipline
84
+
85
+ - **One model, one dimension, forever** (or re-embed everything).
86
+ Vectors from different models can't be compared; the `vector(1024)`
87
+ in the schema must match your embedder's output.
88
+ - Use the document/query `input_type` distinction where your provider
89
+ supports it — it measurably improves retrieval.
90
+ - Embedding writes are **best-effort by design**: `closeSession` logs and
91
+ continues if the embedding provider is down, because a session that
92
+ closes cleanly without semantic indexing is repairable (backfill), but
93
+ a session that fails to close loses the classification. Keep a backfill
94
+ script that finds `summary_embedding IS NULL` rows and repairs them.
95
+
96
+ ## Multi-surface continuity
97
+
98
+ One Supabase project = one memory. Any number of surfaces (a web app, a
99
+ CLI hook, the MCP server, a scheduled worker) read and write the same
100
+ tables, so the relationship follows the user across surfaces. Tag rows
101
+ with `source` so you can audit per-surface behavior later — the tag has
102
+ no effect on retrieval, but you will eventually want it for debugging.
103
+
104
+ Two cautions from production:
105
+
106
+ - **One conversation, one surface at a time.** Continuing the same
107
+ session from two clients concurrently corrupts conversational state in
108
+ surface-specific ways (and some providers' signed reasoning blocks make
109
+ the corruption unrecoverable). Memory is shared; live sessions
110
+ shouldn't be.
111
+ - **Hooks only fire where they're installed.** A session on a surface
112
+ without lifecycle hooks (e.g. a provider's cloud UI) writes nothing.
113
+ Decide per-surface: install a hook, route through the queue, or accept
114
+ the gap knowingly.
115
+
116
+ ## Proactive outreach (if you build it)
117
+
118
+ The schema carries `agent_state.proactive_enabled` and the connection
119
+ axis for a reason: companions that can initiate contact need discipline
120
+ more than they need capability. The rules that held up:
121
+
122
+ - Quiet hours, absolutely.
123
+ - Cooldown after any outreach; longer cooldown after a session the user
124
+ initiated (don't crowd them).
125
+ - A user-controlled off-switch (`proactive_enabled`) honored everywhere.
126
+ - **Visible state**: the user can read the companion's axis values at
127
+ any time. Nothing about the companion's wanting is hidden from the
128
+ person it wants.
@@ -0,0 +1,105 @@
1
+ # The memory model
2
+
3
+ How Sostenuto decides what to keep, what to surface, and what to let fade.
4
+
5
+ ## The unit: memory objects
6
+
7
+ A memory object is one durable piece of knowledge distilled from conversation — a fact, a preference, a shared concept, a correction, a commitment. Not a summary; a discrete thing with its own identity.
8
+
9
+ Each carries:
10
+
11
+ | Field | What it means |
12
+ |---|---|
13
+ | `domain` | Who it's about: `user_self`, `agent_self` (the companion in this relationship), `relational` (the relationship itself), `evidence` (verbatim quotes worth keeping) |
14
+ | `type` | What kind of thing it is — fact, preference, ritual, boundary, shared_concept, … (see `db/schema.sql` for the full vocabulary) |
15
+ | `content` | The memory, specific and grounded |
16
+ | `evidence_refs` | Provenance: which sessions support it. Grows over time — see *Reinforcement* below |
17
+ | `confidence` | 0–1; rises with reinforcement |
18
+ | `sensitivity` | `low` / `medium` / `high` — descriptive metadata. **Sensitivity never gates retrieval** (see below) |
19
+ | `status` | Lifecycle: `candidate → active → reinforced` (also `confirmed`, `revised`, `deprecated`, `forgotten`; see `db/schema.sql`) |
20
+ | `should_do` / `should_not_do` | Tier 2 guidance — see *Two tiers* |
21
+ | `usage_guidance` | The machine-read policy object (below) |
22
+ | `version_history` | Append-only log of every content rewrite — provenance is never lost |
23
+
24
+ ## Emotional coordinates: valence and arousal
25
+
26
+ Two orthogonal dimensions (Russell's circumplex), supplied by the classifier or inferred by formula:
27
+
28
+ - **valence** (−1…1): emotional charge. Painful ↔ warm.
29
+ - **arousal** (0…1): intensity. A settled preference is low-arousal; a friction moment, a peak, a marked commitment is high-arousal. *A quiet warm memory can be high valence and low arousal* — they measure different things.
30
+
31
+ Arousal exists to modulate **decay** and to weight surfacing of unresolved, intense material. Decay modulation is planned: high-arousal memories will fade more slowly. The Ebbinghaus-style decay engine is on the roadmap, and the data model already carries everything it needs.
32
+
33
+ **salience** (0…1) is a third, distinct number: importance-to-surface. A high-arousal moment can still be low-salience (too situational to bring forward), and vice versa.
34
+
35
+ ## Initiative ≠ access: `proactive_use`
36
+
37
+ The single most important policy distinction in Sostenuto:
38
+
39
+ | Value | Meaning |
40
+ |---|---|
41
+ | `yes` | Always-on. Injected into every session's orientation block. Small, curated set — identity-level. |
42
+ | `only_when_relevant` | The default. Surfaces through semantic retrieval when the conversation matches it. |
43
+ | `no` | Never volunteered. **Still retrievable** — but only on *explicit anchor*: the user's message must match it at high similarity (default ≥ 0.65; calibrated for query-type embeddings, which score lower than document-pair similarity). |
44
+
45
+ `proactive_use` controls whether the companion *brings something up*. It does not control whether the companion *can remember it when asked*. The user clearly referencing a memory is consent to recall it; incidental similarity is not.
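The anchor gate described above can be sketched as a filter over retrieval results. The row shape (`similarity`, `usage_guidance.proactive_use`) follows what `search_memory_objects` returns in the schema; the function name and the fallback to `only_when_relevant` for rows with no policy set are assumptions.

```javascript
const ANCHOR_THRESHOLD = 0.65; // default from the table above

// Keep everything except 'no' items that were matched only incidentally.
function applyInitiativePolicy(matches, anchorThreshold = ANCHOR_THRESHOLD) {
  return matches.filter((m) => {
    const policy =
      (m.usage_guidance && m.usage_guidance.proactive_use) || "only_when_relevant";
    if (policy !== "no") return true;
    // 'no' items come back only on an explicit, high-similarity reference.
    return m.similarity >= anchorThreshold;
  });
}
```

Note that `sensitivity` does not appear in the filter at all, which is the corollary in the next paragraph.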
46
+
47
+ Corollary: **sensitivity does not gate retrieval.** High-sensitivity memories are part of the relationship and must stay findable when referenced. If something shouldn't auto-surface, that's a `proactive_use` decision — made by policy or curation, never by a blanket sensitivity rule. Blanket rules turn warmth into bureaucracy.
48
+
49
+ ## Two tiers: content vs. instruction
50
+
51
+ Most memories (Tier 1) are **content-only** — they surface as themselves and the model responds to what they say. A small curated subset (Tier 2) carries `should_do`: a short, positive instruction distilling a rule the user taught — a boundary, a style correction, an operating principle. These render in the behavior-guidance block and silently shape the companion's conduct.
52
+
53
+ Two deliberate asymmetries:
54
+
55
+ 1. **Only items that earned an instruction get one.** Auto-generating guidance for every memory produces generic noise; the cap (default 8 items) stays meaningful because most memories never enter the block at all.
56
+ 2. **`should_not_do` is never auto-populated.** If present, it was set by hand and means it. The default posture is lean, warm, action-oriented — restrictions are added deliberately, not accumulated defensively. When multiple constraints could apply, the companion should default to warm and present, not cautious and short.
57
+
58
+ ## The write path: reinforce, don't duplicate
59
+
60
+ Every candidate memory is embedded and searched against existing memories before insert (`src/memory/store.js`):
61
+
62
+ ```
63
+ similarity ≥ 0.88 → may UPGRADE content (near-paraphrase, substantially
64
+ more complete, concrete) — old content archived to
65
+ version_history
66
+ similarity ≥ 0.75 → REINFORCE: evidence_refs grows, confidence rises,
67
+ status → 'reinforced'. Content untouched.
68
+ below 0.75 → INSERT as new
69
+ ```
70
+
71
+ The dual threshold matters: between 0.75 and 0.88, related-but-distinct memories *link* (shared evidence trail) without overwriting each other. A memory reinforced across many sessions accumulates a cross-session provenance trail — and ranks above one-off observations in the behavior-guidance block (evidence count is the tiebreaker after salience).
72
+
73
+ Batches process sequentially so that near-duplicates *within* one batch collapse correctly: the first occurrence inserts, the second reinforces it.
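The thresholds and the sequential batch rule can be sketched against an in-memory store. This is an illustration, not the code in `src/memory/store.js`: the function names are hypothetical, the "more complete" test is a crude length heuristic standing in for a real judgment, and the confidence step of 0.1 is an assumption. The text does not say whether an upgrade also counts as reinforcement, so the sketch only archives and replaces content.

```javascript
const UPGRADE_AT = 0.88;
const REINFORCE_AT = 0.75;

function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na && nb ? dot / Math.sqrt(na * nb) : 0;
}

function decideWrite(similarity, isMoreComplete) {
  if (similarity >= UPGRADE_AT && isMoreComplete) return "upgrade";
  if (similarity >= REINFORCE_AT) return "reinforce";
  return "insert";
}

// Candidates are handled one at a time so that a later near-duplicate in the
// same batch sees the row inserted by an earlier one.
function writeBatch(store, candidates) {
  for (const c of candidates) {
    let best = null;
    let bestSim = -Infinity;
    for (const m of store) {
      const sim = cosine(m.embedding, c.embedding);
      if (sim > bestSim) { best = m; bestSim = sim; }
    }
    // Stand-in heuristic (assumption): much longer content is "more complete".
    const moreComplete = best !== null && c.content.length > 1.5 * best.content.length;
    const action = best === null ? "insert" : decideWrite(bestSim, moreComplete);
    if (action === "insert") {
      store.push({
        content: c.content, embedding: c.embedding,
        evidence_refs: [c.session_id], confidence: 0.5,
        status: "candidate", version_history: [],
      });
    } else if (action === "reinforce") {
      best.evidence_refs.push(c.session_id);
      best.confidence = Math.min(1, best.confidence + 0.1); // assumed step
      best.status = "reinforced";
    } else {
      best.version_history.push({ content: best.content }); // archive old text
      best.content = c.content;
      best.embedding = c.embedding;
    }
  }
  return store;
}
```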
74
+
75
+ ## The read paths
76
+
77
+ Four channels feed prompt assembly (`src/retrieval/assembly.js`):
78
+
79
+ 1. **Proactive block** — `proactive_use='yes'`, ranked by status then confidence.
80
+ 2. **Behavior guidance** — Tier 2, ranked salience → evidence count, capped small.
81
+ 3. **Recent sessions** — recency window: top N in full (summary + diary + key points), next M as headlines.
82
+ 4. **Semantic retrieval** — query-matched, time-decayed (`similarity × e^(−λ·age)`), fanned across session summaries, key points, and memory objects, anchor-gated for `proactive_use='no'`.
83
+
84
+ Channels 1–3 are stable within a session and live in the cacheable prefix; channel 4 is computed once per session on the first substantive message and cached on the session row.
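The scoring formula in channel 4 is small enough to show directly. This mirrors the `similarity * exp(-decay_rate * age_days)` expression used by the search functions in `db/schema.sql`; the JavaScript function names and the `ageDays` field are illustrative only.

```javascript
// score = similarity * e^(-lambda * age in days)
function decayedScore(similarity, ageDays, decayRate = 0.03) {
  return similarity * Math.exp(-decayRate * ageDays);
}

// Rank matches by decayed score, highest first, without mutating the input.
function rankByDecayedScore(matches, decayRate = 0.03) {
  return matches
    .map((m) => ({ ...m, decayedScore: decayedScore(m.similarity, m.ageDays, decayRate) }))
    .sort((a, b) => b.decayedScore - a.decayedScore);
}
```

With the default rate, a 30-day-old match keeps roughly 40% of its similarity, so a strong old match can lose to a moderate recent one while still scoring above zero and remaining findable.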
85
+
86
+ ## Sessions, classification, and the watermark
87
+
88
+ Sessions are classified by an LLM (`src/classify/`) into headline, arc-shaped summary, first-person diary, thinking-highlights, key points, emotion deltas, and candidate memories. Two prompt modes:
89
+
90
+ - **Full** — first classification of a session, phase-marked for long transcripts.
91
+ - **Incremental** — re-classification receives the prior record + only the new turns. Cost stays O(new). The watermark (`last_classified_message_count`) and a minimum-new-turns threshold prevent churn.
92
+
93
+ Emotion deltas are cumulative per session; on re-classification only the *net* difference is applied to agent state, so nothing double-counts.
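A minimal sketch of the net-difference rule, assuming the session row stores the cumulative deltas from its previous classification (`mood_delta`, `connection_delta`, `attunement_delta` in the schema, shortened here to axis names). The function names are hypothetical; the clamp ranges follow the comments on `agent_state`.

```javascript
const AXIS_RANGE = {
  mood: [-1, 1],
  connection: [0, 1],
  attunement: [0, 1],
};

function clamp(value, lo, hi) {
  return Math.min(hi, Math.max(lo, value));
}

// `previous` and `current` are cumulative per-session deltas; only their
// difference is applied, so re-classifying a session never double-counts.
function applyNetDeltas(state, previous, current) {
  const next = { ...state };
  for (const axis of Object.keys(AXIS_RANGE)) {
    const net = (current[axis] ?? 0) - (previous[axis] ?? 0);
    const [lo, hi] = AXIS_RANGE[axis];
    next[axis] = clamp(state[axis] + net, lo, hi);
  }
  return next;
}
```

On the first classification of a session `previous` is empty, so the full cumulative delta is applied; later passes apply only what changed.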
94
+
95
+ ## Forgetting
96
+
97
+ Sostenuto forgets in gradients, not deletions:
98
+
99
+ 1. Recency windows (only the top sessions enter the prompt in full)
100
+ 2. Time-decay scoring in retrieval (older items need higher similarity to compete)
101
+ 3. Status lifecycle (`deprecated` / `forgotten` items are excluded from all reads, reversibly)
102
+ 4. Caps with ranked eviction (behavior guidance, hot key points)
103
+ 5. *(Roadmap)* the decay engine: confidence erosion over time since last reinforcement, modulated by arousal — with `proactive_use='yes'` items floored so curated memory never silently disappears
104
+
105
+ Hard deletion exists (it's your database), but the design treats forgetting as a ranking problem, not a destruction problem. The sostenuto pedal doesn't silence the other strings — it just doesn't sustain them.