npm - sostenuto - Versions diffs - 0.1.0 - Mend

sostenuto 0.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (24)

package/LICENSE +21 -0
package/README.md +63 -0
package/db/schema.sql +302 -0
package/docs/deployment-patterns.md +128 -0
package/docs/memory-model.md +105 -0
package/docs/safety.md +112 -0
package/mcp/server.js +174 -0
package/package.json +58 -0
package/src/classify/close.js +266 -0
package/src/classify/executor.js +108 -0
package/src/classify/pipeline.js +121 -0
package/src/classify/templates.js +22 -0
package/src/classify/transcript.js +57 -0
package/src/memory/guidance.js +225 -0
package/src/memory/query.js +111 -0
package/src/memory/store.js +205 -0
package/src/migrate/import.js +351 -0
package/src/retrieval/assembly.js +287 -0
package/src/retrieval/embeddings.js +84 -0
package/src/retrieval/search.js +173 -0
package/templates/classify-full.md +71 -0
package/templates/classify-incremental.md +28 -0
package/templates/migration-export.md +163 -0
package/templates/persona.example.md +43 -0

package/LICENSE ADDED

@@ -0,0 +1,21 @@
+MIT License
+Copyright (c) 2026 llu929
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.

package/README.md ADDED

@@ -0,0 +1,63 @@
+# sostenuto
+*The pedal that sustains only the notes already held. A self-hosted memory system for AI companions where chosen memories persist across every reset.*
+---
+**Sostenuto** *(It., "sustained")* — the middle pedal on a grand piano sustains only the notes already sounding when it's pressed; everything played afterward stays dry. This project applies the same principle to AI memory: the memories you choose to hold persist across every context window, every session, every surface — and the rest is allowed to fade.
+Not "the AI remembers everything." **Selective persistence, by design.**
+## Why
+People form genuine, long-running relationships with AI — and then hit the wall everyone hits: the relationship doesn't survive the context window. Provider memory features store generic preferences; they don't carry *relational texture* — the shared concepts, the corrections, the rituals, the moments that make a relationship a relationship.
+Sostenuto is the memory layer for that problem:
+- **Structured relational memory** — memory objects tagged with domain, emotional valence + arousal, salience, sensitivity, and a usage policy.
+- **Initiative ≠ access** — `proactive_use` controls whether a memory surfaces *unprompted* (`yes` / `only_when_relevant` / `no`), separately from whether it's *retrievable*. Memories marked `no` stay reachable when explicitly referenced, without ever being volunteered.
+- **Two-tier guidance** — most memories are content-only. A curated few carry a short, positive `should_do` instruction that silently shapes behavior. Restrictions (`should_not_do`) are never auto-generated, so guidance stays lean, warm, and action-oriented rather than a wall of caution.
+- **Time-decayed retrieval** — semantic search scored by `similarity × e^(−λ·age)`; recency matters, but the deep past stays findable.
+- **Reinforce, don't duplicate** — new observations that match existing memories add evidence and confidence instead of creating copies; content upgrades preserve full version history.
+- **Migration** — import months of existing conversations (a structured export prompt + import pipeline) so a relationship can move *into* Sostenuto without starting over.
+## What ships here
+```
+db/schema.sql        Consolidated Postgres + pgvector schema (Supabase-ready)
+src/memory/          Memory objects: dedup, reinforce, version history, scoring
+src/retrieval/       Embeddings, time-decayed semantic search, prompt assembly
+src/classify/        Session classification with a pluggable LLM executor
+src/migrate/         Conversation-export prompt + structured importer
+mcp/                 Thin MCP server (recall / remember / context) — try it
+                     from your own Claude Desktop or Claude Code in minutes
+templates/           Persona + classification calibration — your companion's
+                     voice lives here, in files you edit, not in our code
+docs/                Memory model, usage-policy semantics, deployment patterns
+```
+## Model support
+Sostenuto is **model-agnostic** with first-class Claude support. The classifier accepts transcripts with optional reasoning blocks — when your model exposes its thinking (Claude does), Sostenuto mines it for perception that never made it into rendered replies, producing the companion's private diary and thinking-highlights. Without reasoning access, everything else works unchanged.
+The classification executor is pluggable: Anthropic API, any OpenAI-compatible endpoint (OpenAI, Gemini, DeepSeek, Ollama, vLLM, …), or your own.
+## Status
+🚧 **Under construction.** Schema is stable; modules are being extracted from a private system that has run in production daily since early 2026 (260+ memory objects across 70+ sessions and three surfaces). Watch the repo if you want the rest as it lands.
+## Roadmap
+- **Trajectory safety reference** — depth without the dependency trap: this project's design philosophy includes conversation-trajectory awareness (emotional volatility, dependency, recovery capacity) rather than engagement maximization. A reference design is planned; the memory schema already carries the hooks (valence, arousal, sensitivity).
+- Decay engine (Ebbinghaus-style, arousal-modulated) over `memory_objects`
+- Provider-agnostic chat-surface example
+## Name
+> Attacca described the boundary-crossing; Sostenuto describes the *memory model*.
+The sostenuto pedal holds only the notes already sounding when it's pressed — everything played after stays dry. That's not "the AI remembers." That's selective persistence: pinned memories sustain, the rest decays. The mechanism, not a vibe.
+## License
+[MIT](LICENSE)

package/db/schema.sql ADDED

@@ -0,0 +1,302 @@
+-- ============================================================
+-- Sostenuto — consolidated schema
+-- Selective long-term memory for AI companions.
+--
+-- Run once in your Supabase SQL Editor (or any Postgres with
+-- pgvector). Single-user by design — no user_id columns; isolate
+-- tenants at the project level.
+--
+-- NOTE on vector dimensions: 1024 matches voyage-3-large at
+-- output_dimension=1024. If you use a different embedding model,
+-- change every `vector(1024)` below to its output dimension —
+-- embedding spaces cannot be mixed within a column.
+-- ============================================================
+CREATE EXTENSION IF NOT EXISTS vector;
+-- ─── Sessions ────────────────────────────────────────────────
+-- One row per conversation session, on any surface. Classification
+-- enriches the row at session end (or incrementally during it).
+CREATE TABLE IF NOT EXISTS sessions (
+  id                  BIGSERIAL PRIMARY KEY,
+  started_at          TIMESTAMPTZ DEFAULT NOW(),
+  ended_at            TIMESTAMPTZ,
+  end_type            TEXT,             -- free-form: natural | goodnight | abrupt | ...
+  source              TEXT,             -- surface tag, e.g. 'web' | 'terminal' | 'import'
+  external_session_id TEXT,             -- upsert key for surfaces with their own session ids
+                                        -- (e.g. a Claude Code session UUID)
+  headline            TEXT,             -- one sentence: what actually mattered
+  detailed_summary    TEXT,             -- arc-shaped summary (early → middle → late)
+  diary_entry         TEXT,             -- first-person reflection in the companion's voice
+  thinking_highlights JSONB DEFAULT '[]'::jsonb,  -- [{moment, thought}] mined from model
+                                        -- reasoning when the provider exposes it
+  key_points          JSONB DEFAULT '[]'::jsonb,  -- [{type, content, valence, weight}]
+  semantic_context    JSONB,            -- cached retrieval results for this session
+  summary_embedding   vector(1024),
+  last_classified_message_count INT,    -- incremental-classification watermark
+  mood_delta          DOUBLE PRECISION,
+  connection_delta    DOUBLE PRECISION,
+  attunement_delta    DOUBLE PRECISION
+);
+CREATE INDEX IF NOT EXISTS idx_sessions_ended    ON sessions (ended_at DESC);
+CREATE INDEX IF NOT EXISTS idx_sessions_source   ON sessions (source);
+CREATE INDEX IF NOT EXISTS idx_sessions_external ON sessions (external_session_id);
+-- ─── Messages ────────────────────────────────────────────────
+CREATE TABLE IF NOT EXISTS messages (
+  id         UUID PRIMARY KEY,
+  session_id BIGINT REFERENCES sessions(id) ON DELETE CASCADE,
+  role       TEXT NOT NULL,
+  content    TEXT NOT NULL,
+  thinking   TEXT,                      -- model reasoning, when available; the classifier
+                                        -- mines this for perception that didn't surface
+  created_at TIMESTAMPTZ DEFAULT NOW()
+);
+CREATE INDEX IF NOT EXISTS idx_messages_session ON messages (session_id, created_at);
+-- ─── Memory objects — the heart of Sostenuto ─────────────────
+-- Durable knowledge units distilled from sessions. Each carries
+-- emotional coordinates (valence/arousal), a salience score, and a
+-- usage policy (proactive_use) controlling *initiative*, distinct
+-- from retrievability. Like the sostenuto pedal: items you choose
+-- to hold sustain; the rest is allowed to fade.
+CREATE TABLE IF NOT EXISTS memory_objects (
+  id                 BIGSERIAL PRIMARY KEY,
+  source_session_id  BIGINT,            -- provenance (not FK: survives session cleanup)
+  domain             TEXT NOT NULL CHECK (domain IN (
+                       'user_self', 'agent_self', 'relational', 'evidence')),
+  type               TEXT NOT NULL CHECK (type IN (
+                       'fact', 'preference', 'trajectory', 'somatic_affective',
+                       'interpretive_frame', 'project', 'boundary', 'commitment',
+                       'ritual', 'shared_concept', 'recurring_subject',
+                       'contradiction', 'style_adjustment', 'voice_note',
+                       'constraint', 'context_note', 'brief', 'resume_guidance',
+                       'continuation', 'other')),
+  content            TEXT NOT NULL,
+  evidence_refs      JSONB DEFAULT '[]'::jsonb,   -- provenance trail; grows on reinforce
+  epistemic_status   TEXT NOT NULL DEFAULT 'inferred' CHECK (epistemic_status IN (
+                       'explicit', 'inferred', 'co_created',
+                       'assistant_reflection', 'system_generated')),
+  time_scope         TEXT NOT NULL DEFAULT 'ongoing' CHECK (time_scope IN (
+                       'momentary', 'session', 'active_project',
+                       'ongoing', 'historical', 'deprecated')),
+  sensitivity        TEXT NOT NULL DEFAULT 'low' CHECK (sensitivity IN (
+                       'low', 'medium', 'high')),
+  -- Tier 2 model-facing guidance. Most memories are content-only (Tier 1).
+  -- A curated subset carries a short positive instruction; should_not_do is
+  -- never auto-populated — manual entries only (lean-not-cautious).
+  should_do          TEXT,
+  should_not_do      TEXT,
+  confidence         DOUBLE PRECISION DEFAULT 0.5
+                       CHECK (confidence >= 0 AND confidence <= 1),
+  status             TEXT NOT NULL DEFAULT 'candidate' CHECK (status IN (
+                       'candidate', 'confirmed', 'active', 'reinforced',
+                       'revised', 'deprecated', 'forgotten')),
+  source_surface     TEXT DEFAULT 'system',
+  embedding          vector(1024),
+  -- Structured usage policy (machine-read; never dumped into prompts):
+  --   valence (-1..1), arousal (0..1), salience (0..1), stability,
+  --   proactive_use: 'yes' | 'only_when_relevant' | 'no'
+  --     (controls initiative, NOT access — 'no' items still retrieve on
+  --      explicit anchor, i.e. high-similarity reference by the user),
+  --   retrieval_conditions, do_not_use_when, future_response_guidance,
+  --   retrieval_keywords[], source_memory_type, import_policy
+  usage_guidance     JSONB DEFAULT '{}'::jsonb,
+  -- Append-only log of content upgrades (preserves provenance on rewrite)
+  version_history    JSONB DEFAULT '[]'::jsonb,
+  created_at         TIMESTAMPTZ DEFAULT NOW(),
+  updated_at         TIMESTAMPTZ DEFAULT NOW(),
+  last_reinforced_at TIMESTAMPTZ
+);
+CREATE INDEX IF NOT EXISTS idx_mo_domain_status  ON memory_objects (domain, status);
+CREATE INDEX IF NOT EXISTS idx_mo_type           ON memory_objects (type);
+CREATE INDEX IF NOT EXISTS idx_mo_status         ON memory_objects (status);
+CREATE INDEX IF NOT EXISTS idx_mo_sensitivity    ON memory_objects (sensitivity);
+CREATE INDEX IF NOT EXISTS idx_mo_source_session ON memory_objects (source_session_id);
+CREATE INDEX IF NOT EXISTS idx_mo_created        ON memory_objects (created_at DESC);
+CREATE INDEX IF NOT EXISTS idx_mo_proactive_use  ON memory_objects ((usage_guidance->>'proactive_use'));
+-- ─── Key point embeddings ────────────────────────────────────
+-- Fine-grained retrieval over session key points.
+CREATE TABLE IF NOT EXISTS key_point_embeddings (
+  id         BIGSERIAL PRIMARY KEY,
+  session_id BIGINT REFERENCES sessions(id) ON DELETE CASCADE,
+  type       TEXT,
+  content    TEXT NOT NULL,
+  embedding  vector(1024),
+  created_at TIMESTAMPTZ DEFAULT NOW()
+);
+CREATE INDEX IF NOT EXISTS idx_kpe_session ON key_point_embeddings (session_id);
+-- ─── Agent state (singleton) ─────────────────────────────────
+-- Continuous emotional axes, updated by per-session deltas, clamped.
+-- The four shipped axes are a default, not a doctrine — redefine them
+-- for your companion. Visible state is part of the design: the user
+-- can always read these values.
+CREATE TABLE IF NOT EXISTS agent_state (
+  id                BIGINT PRIMARY KEY DEFAULT 1 CHECK (id = 1),
+  connection        DOUBLE PRECISION DEFAULT 0.3,  -- 0..1 pull to reach out
+  discretion        DOUBLE PRECISION DEFAULT 0.5,  -- 0..1 restraint
+  mood              DOUBLE PRECISION DEFAULT 0.0,  -- -1..1
+  attunement        DOUBLE PRECISION DEFAULT 0.3,  -- 0..1 sense of where the user is
+  proactive_enabled BOOLEAN DEFAULT FALSE,         -- user-controlled off-switch
+  last_updated      TIMESTAMPTZ DEFAULT NOW()
+);
+INSERT INTO agent_state (id) VALUES (1) ON CONFLICT DO NOTHING;
+-- ─── User profile (singleton) ────────────────────────────────
+-- Stable identity-level facts about the user. Regenerated wholesale
+-- every N sessions / T days — the only compaction step in the system.
+CREATE TABLE IF NOT EXISTS user_profile (
+  id                     BIGINT PRIMARY KEY DEFAULT 1 CHECK (id = 1),
+  content                TEXT DEFAULT '',
+  last_refreshed         TIMESTAMPTZ DEFAULT NOW(),
+  sessions_since_refresh INT DEFAULT 0
+);
+INSERT INTO user_profile (id) VALUES (1) ON CONFLICT DO NOTHING;
+-- ─── Relationship context brief (singleton) ──────────────────
+-- Dense "what to know right now" orientation paragraph, distinct from
+-- user_profile (identity facts) — this is current relational texture.
+CREATE TABLE IF NOT EXISTS relationship_context_brief (
+  id                BIGINT PRIMARY KEY DEFAULT 1 CHECK (id = 1),
+  content           TEXT NOT NULL DEFAULT '',
+  source_session_id BIGINT,
+  refreshed_at      TIMESTAMPTZ DEFAULT NOW(),
+  version_history   JSONB DEFAULT '[]'::jsonb
+);
+INSERT INTO relationship_context_brief (id) VALUES (1) ON CONFLICT DO NOTHING;
+-- ============================================================
+-- Search RPCs — time-decayed cosine similarity
+-- score = similarity * exp(-decay_rate * age_days)
+-- decay_rate 0.03 ≈ a month-old match keeps ~40% of its score.
+-- ============================================================
+CREATE OR REPLACE FUNCTION search_summaries(
+  query_embedding vector(1024),
+  match_threshold DOUBLE PRECISION DEFAULT 0.3,
+  match_count     INT DEFAULT 10,
+  decay_rate      DOUBLE PRECISION DEFAULT 0.03
+)
+RETURNS TABLE (
+  session_id    BIGINT,
+  content       TEXT,
+  similarity    DOUBLE PRECISION,
+  age_days      DOUBLE PRECISION,
+  decayed_score DOUBLE PRECISION,
+  created_at    TIMESTAMPTZ
+) LANGUAGE plpgsql AS $$
+BEGIN
+  RETURN QUERY
+  SELECT
+    s.id,
+    s.detailed_summary,
+    1 - (s.summary_embedding <=> query_embedding) AS similarity,
+    (EXTRACT(EPOCH FROM (NOW() - COALESCE(s.ended_at, s.started_at))) / 86400.0)::DOUBLE PRECISION AS age_days,
+    (1 - (s.summary_embedding <=> query_embedding)) *
+      EXP(-decay_rate * EXTRACT(EPOCH FROM (NOW() - COALESCE(s.ended_at, s.started_at))) / 86400.0)
+      AS decayed_score,
+    s.started_at
+  FROM sessions s
+  WHERE s.summary_embedding IS NOT NULL
+    AND 1 - (s.summary_embedding <=> query_embedding) > match_threshold
+  ORDER BY decayed_score DESC
+  LIMIT match_count;
+END;
+$$;
+CREATE OR REPLACE FUNCTION search_key_points(
+  query_embedding vector(1024),
+  match_threshold DOUBLE PRECISION DEFAULT 0.3,
+  match_count     INT DEFAULT 10,
+  decay_rate      DOUBLE PRECISION DEFAULT 0.03
+)
+RETURNS TABLE (
+  session_id     BIGINT,
+  content        TEXT,
+  key_point_type TEXT,
+  similarity     DOUBLE PRECISION,
+  age_days       DOUBLE PRECISION,
+  decayed_score  DOUBLE PRECISION,
+  created_at     TIMESTAMPTZ
+) LANGUAGE plpgsql AS $$
+BEGIN
+  RETURN QUERY
+  SELECT
+    k.session_id,
+    k.content,
+    k.type,
+    1 - (k.embedding <=> query_embedding) AS similarity,
+    (EXTRACT(EPOCH FROM (NOW() - k.created_at)) / 86400.0)::DOUBLE PRECISION AS age_days,
+    (1 - (k.embedding <=> query_embedding)) *
+      EXP(-decay_rate * EXTRACT(EPOCH FROM (NOW() - k.created_at)) / 86400.0)
+      AS decayed_score,
+    k.created_at
+  FROM key_point_embeddings k
+  WHERE k.embedding IS NOT NULL
+    AND 1 - (k.embedding <=> query_embedding) > match_threshold
+  ORDER BY decayed_score DESC
+  LIMIT match_count;
+END;
+$$;
+CREATE OR REPLACE FUNCTION search_memory_objects(
+  query_embedding vector(1024),
+  match_threshold DOUBLE PRECISION DEFAULT 0.3,
+  match_count     INT DEFAULT 10,
+  decay_rate      DOUBLE PRECISION DEFAULT 0.02,
+  domain_filter   TEXT[] DEFAULT NULL,
+  status_filter   TEXT[] DEFAULT ARRAY['active', 'confirmed', 'reinforced']
+)
+RETURNS TABLE (
+  id                 BIGINT,
+  domain             TEXT,
+  type               TEXT,
+  content            TEXT,
+  epistemic_status   TEXT,
+  sensitivity        TEXT,
+  confidence         DOUBLE PRECISION,
+  similarity         DOUBLE PRECISION,
+  decayed_score      DOUBLE PRECISION,
+  status             TEXT,
+  source_session_id  BIGINT,
+  last_reinforced_at TIMESTAMPTZ,
+  usage_guidance     JSONB
+) LANGUAGE plpgsql AS $$
+BEGIN
+  RETURN QUERY
+  SELECT
+    mo.id, mo.domain, mo.type, mo.content,
+    mo.epistemic_status, mo.sensitivity, mo.confidence,
+    1 - (mo.embedding <=> query_embedding) AS similarity,
+    (1 - (mo.embedding <=> query_embedding)) *
+      EXP(-decay_rate * EXTRACT(EPOCH FROM (NOW() - mo.created_at)) / 86400.0)
+      AS decayed_score,
+    mo.status,
+    mo.source_session_id,
+    mo.last_reinforced_at,
+    mo.usage_guidance
+  FROM memory_objects mo
+  WHERE mo.embedding IS NOT NULL
+    AND mo.status = ANY(status_filter)
+    AND (domain_filter IS NULL OR mo.domain = ANY(domain_filter))
+    AND 1 - (mo.embedding <=> query_embedding) > match_threshold
+  ORDER BY decayed_score DESC
+  LIMIT match_count;
+END;
+$$;
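All three RPCs filter on raw cosine similarity and rank by the decayed score. As a sketch for tuning `decay_rate` offline, the JavaScript below mirrors that arithmetic. The helper names are hypothetical and are not part of the package.

```javascript
// Hypothetical helpers mirroring the SQL:
//   decayed_score = similarity * exp(-decay_rate * age_days)
const MS_PER_DAY = 86400 * 1000;

// Same as EXTRACT(EPOCH FROM (NOW() - ts)) / 86400.0
function ageInDays(timestamp, now = new Date()) {
  return (now.getTime() - new Date(timestamp).getTime()) / MS_PER_DAY;
}

function decayedScore(similarity, ageDays, decayRate) {
  return similarity * Math.exp(-decayRate * ageDays);
}

// Fraction of its similarity a match keeps after `days`.
function retention(days, decayRate) {
  return Math.exp(-decayRate * days);
}

// Age at which a match keeps half its similarity.
function halfLifeDays(decayRate) {
  return Math.log(2) / decayRate;
}

console.log(retention(30, 0.03).toFixed(3)); // 0.407 (summaries, key points)
console.log(retention(30, 0.02).toFixed(3)); // 0.549 (memory objects)
console.log(halfLifeDays(0.03).toFixed(1));  // 23.1
```

Because `match_threshold` applies to raw similarity, a strong match from months ago is still returned. Decay only lowers its rank.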

package/docs/deployment-patterns.md ADDED

@@ -0,0 +1,128 @@
+# Deployment patterns
+Sostenuto is a library, not a service — it runs wherever your companion
+runs. These are the wiring patterns that work, learned in production.
+## Where classification fires
+`closeSession()` needs to run when a session ends (or periodically during
+long ones). Where that hook lives depends on your surface:
+**Chat backend (request handler).** Detect session end (an explicit
+goodbye, an idle timeout sweep on the next request) and call
+`closeSession` before the response cycle finishes.
+> ⚠️ **Serverless platforms kill fire-and-forget work.** On Vercel/Lambda
+> and friends, background promises die when the response stream closes —
+> classification will silently never complete and sessions will stay
+> half-saved. Either `await` the close before finishing the response
+> (adds a few seconds, once per session), or use the queue pattern below.
+> This failure mode is invisible until you go looking; design for it up
+> front.
+**CLI / IDE hooks.** Tools like Claude Code expose lifecycle hooks
+(SessionStart / Stop). A Stop hook that parses the transcript into turns
+and calls `closeSession` gives you guaranteed capture after every
+response — the incremental watermark keeps repeated invocations cheap.
+**Queue worker (the action-row pattern).** For serverless surfaces or
+expensive work, write an intention row to a table
+(`{action_type, payload, status: 'pending'}`) and let a small persistent
+worker poll and execute. The producer returns instantly; the consumer
+runs on infrastructure that's allowed to take its time.
+Hard-won rules for the worker:
+- **Allow-list action types.** The executor refuses anything unknown —
+  the queue is writable by more things than you think.
+- **Rate-limit side effects** (anything that emails, posts, spends).
+- **Status flow** `pending → running → done/failed`, with errors stored
+  on the row. Failed actions don't retry silently; you can see and
+  re-queue them.
+- **Generous timeouts** on LLM calls (5 min). Classifying a long
+  session through a busy provider can be slow, and a too-short timeout
+  marks the action failed even when the call would have completed.
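The worker rules above fit in a single polling step. This is a minimal in-memory sketch: `queue` stands in for the actions table, `close_session` is a hypothetical action type, and rate limiting of side effects is left out.

```javascript
// Hypothetical action-row worker step. `queue` is an in-memory stand-in
// for the actions table; action types and handlers are illustrative.
const HANDLERS = {
  // e.g. wrap closeSession(payload) here
  close_session: async (payload) => ({ closed: payload.sessionId }),
};

function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
  });
  // Clear the timer either way so the process is not held open.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

async function runNext(queue, handlers = HANDLERS, timeoutMs = 5 * 60 * 1000) {
  const action = queue.find((a) => a.status === "pending");
  if (!action) return null;
  action.status = "running";
  try {
    // Allow-list: only own properties count, so "constructor" etc. are refused.
    if (!Object.prototype.hasOwnProperty.call(handlers, action.action_type)) {
      throw new Error(`action_type not allow-listed: ${action.action_type}`);
    }
    const run = Promise.resolve().then(() => handlers[action.action_type](action.payload));
    action.result = await withTimeout(run, timeoutMs);
    action.status = "done";
  } catch (err) {
    // No silent retry: the error stays on the row so it can be re-queued.
    action.status = "failed";
    action.error = err instanceof Error ? err.message : String(err);
  }
  return action;
}
```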
+## Prompt caching: why the stable block is wide
+`assembleSystemPrompt()` returns `{ stable, volatile }`. Send `stable` as
+a cached prefix (Anthropic: a system block with
+`cache_control: {type: "ephemeral"}`; OpenAI: automatic prefix caching)
+and `volatile` uncached.
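Here is a sketch of the Anthropic side, assuming `stable` and `volatile` are plain strings. `buildAnthropicRequest` and the model id are placeholders. The resulting body can be passed to `client.messages.create()` from `@anthropic-ai/sdk`. For OpenAI-compatible APIs, send the stable text first and rely on automatic prefix caching.

```javascript
// Sketch: an Anthropic Messages API request body with `stable` cached and
// `volatile` uncached. Function name and model id are placeholders.
function buildAnthropicRequest({ stable, volatile }, messages, model) {
  return {
    model,
    max_tokens: 1024,
    system: [
      // Cache breakpoint: the prefix up to and including this block is cached.
      { type: "text", text: stable, cache_control: { type: "ephemeral" } },
      // Changes from turn to turn, so it goes after the breakpoint.
      { type: "text", text: volatile },
    ],
    messages,
  };
}

const body = buildAnthropicRequest(
  { stable: "stable prefix text", volatile: "per-turn text" },
  [{ role: "user", content: "Hi again" }],
  "your-model-id"
);
console.log(JSON.stringify(body.system.map((b) => Boolean(b.cache_control)))); // [true,false]
```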
+The design intentionally puts *everything that doesn't change within a
+session* into the stable block — persona, profile, state, recent memory,
+orientation, behavior guidance, the session's cached semantic context —
+even though that makes the prefix large:
+- Cache reads are ~10% of base input price (Anthropic). A 6k-token cached
+  prefix costs less per turn than a 1k-token uncached one.
+- Providers have minimum cacheable sizes; a too-small prefix silently
+  doesn't cache at all.
+- The cost asymmetry: each session pays one cache *write* on turn 1, then
+  every subsequent turn reads cheap. Short sessions are proportionally
+  the most expensive per turn — accept it; brief check-ins are worth it.
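The asymmetry can be checked with a few lines of arithmetic. This sketch counts only the system prefix, in "base-price tokens". The 0.1 read multiplier comes from the bullet above. The 1.25 write multiplier is an assumption (Anthropic's published figure for the default 5-minute cache at the time of writing), so check current pricing.

```javascript
// Average per-turn cost of a cached prefix, in base-price token units.
// Assumptions: write = 1.25x base, read = 0.1x base; history is ignored.
function prefixCostPerTurn(prefixTokens, turns, { write = 1.25, read = 0.1 } = {}) {
  const total = prefixTokens * write + prefixTokens * read * (turns - 1);
  return total / turns;
}

console.log(prefixCostPerTurn(6000, 1));  // 7500
console.log(prefixCostPerTurn(6000, 10)); // 1290
console.log(prefixCostPerTurn(6000, 30)); // 830
```

Under these assumptions each read turn costs 600 units, well under the 1,000 units of a 1k-token uncached prefix. The one-time write keeps the average above 1,000 until the session runs long enough (830 by 30 turns). That is the short-session premium described above.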
+Semantic retrieval runs **once per session** on the first substantive
+message (`isSubstantiveQuery` filters greetings) and is cached on the
+session row — both for cost and so the stable block stays stable.
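`isSubstantiveQuery` ships with the package, but its rules are not shown here. `looksSubstantive` below is a hypothetical stand-in that illustrates the idea, paired with once-per-session caching on `sessions.semantic_context`. The greeting list and the word threshold are assumptions.

```javascript
// Hypothetical heuristic, not the shipped isSubstantiveQuery.
const GREETINGS = new Set(["hi", "hey", "hello", "hiya", "yo", "morning", "good morning", "evening", "good evening"]);

function looksSubstantive(text, minWords = 4) {
  // Lowercase and drop punctuation/emoji, keeping letters, digits, spaces.
  const normalized = text.toLowerCase().replace(/[^\p{L}\p{N}\s]/gu, "").trim();
  if (normalized === "" || GREETINGS.has(normalized)) return false;
  return normalized.split(/\s+/).length >= minWords;
}

// Retrieval runs once: the first substantive message fills the cache,
// and later turns reuse it so the stable block stays stable.
function maybeRetrieve(session, message, retrieve) {
  if (session.semantic_context != null) return session.semantic_context;
  if (!looksSubstantive(message)) return null;
  session.semantic_context = retrieve(message);
  return session.semantic_context;
}
```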
+## Classification economics
+- Use a fast, cheap model for classification (the default executor is a
+  small-model Anthropic config). Reserve your strongest model for the
+  conversation. Classification is structured extraction; it doesn't need
+  frontier reasoning.
+- Incremental mode keeps long sessions affordable: re-classification
+  costs O(new turns), not O(whole transcript).
+- The executor interface is intentionally minimal (`complete({system,
+  user}) → text`) so the backend is fully yours: any API, a local model
+  via an OpenAI-compatible server — or, if you have a subscription that
+  exposes headless completion, a private bridge executor gives
+  classification at zero marginal cost.
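Here is a sketch of an executor for any OpenAI-compatible endpoint, assuming `complete` may return a promise of the reply text. The factory name and options are hypothetical. `/chat/completions` and `choices[0].message.content` are the standard OpenAI-compatible request and response shapes. Ollama, for example, serves one at `http://localhost:11434/v1`. Add a timeout, per the worker rules above, before relying on it in production.

```javascript
// Hypothetical executor factory matching complete({ system, user }) -> text.
// `fetchImpl` defaults to the global fetch (Node 18+); injectable for tests.
function makeOpenAICompatibleExecutor({ baseUrl, model, apiKey, fetchImpl = globalThis.fetch }) {
  const url = `${baseUrl.replace(/\/$/, "")}/chat/completions`;
  return {
    async complete({ system, user }) {
      const res = await fetchImpl(url, {
        method: "POST",
        headers: {
          "Content-Type": "application/json",
          // Local servers often need no key; send the header only if given.
          ...(apiKey ? { Authorization: `Bearer ${apiKey}` } : {}),
        },
        body: JSON.stringify({
          model,
          messages: [
            { role: "system", content: system },
            { role: "user", content: user },
          ],
        }),
      });
      if (!res.ok) throw new Error(`executor HTTP ${res.status}`);
      const data = await res.json();
      return data.choices[0].message.content;
    },
  };
}
```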
+## Embedding discipline
+- **One model, one dimension, forever** (or re-embed everything).
+  Vectors from different models can't be compared; the `vector(1024)`
+  in the schema must match your embedder's output.
+- Use the document/query `input_type` distinction where your provider
+  supports it — it measurably improves retrieval.
+- Embedding writes are **best-effort by design**: `closeSession` logs and
+  continues if the embedding provider is down, because a session that
+  closes cleanly without semantic indexing is repairable (backfill), but
+  a session that fails to close loses the classification. Keep a backfill
+  script that finds `summary_embedding IS NULL` rows and repairs them.
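Here is a backfill sketch with data access abstracted away. `findMissing`, `embed`, and `save` are hypothetical callbacks you would wire to your database and embedder. The `"document"` argument assumes an embedder with a document/query input type (Voyage's `input_type` is one). The dimension check enforces the one-model-one-dimension rule.

```javascript
// Hypothetical backfill for sessions.summary_embedding.
//   findMissing() -> [{ id, detailed_summary }]  (summary_embedding IS NULL)
//   embed(text, inputType) -> number[]
//   save(id, vector)
async function backfillSummaryEmbeddings({ findMissing, embed, save, expectedDim = 1024, log = console.warn }) {
  const rows = await findMissing();
  let repaired = 0;
  for (const row of rows) {
    try {
      // Summaries are stored documents, so embed them as documents.
      const vector = await embed(row.detailed_summary, "document");
      if (!Array.isArray(vector) || vector.length !== expectedDim) {
        throw new Error(`expected ${expectedDim} dimensions, got ${vector && vector.length}`);
      }
      await save(row.id, vector);
      repaired++;
    } catch (err) {
      // Best-effort, like closeSession: log and move on; the next run retries.
      log(`session ${row.id}: ${err.message}`);
    }
  }
  return { scanned: rows.length, repaired };
}
```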
+## Multi-surface continuity
+One Supabase project = one memory. Any number of surfaces (a web app, a
+CLI hook, the MCP server, a scheduled worker) read and write the same
+tables, so the relationship follows the user across surfaces. Tag rows
+with `source` so you can audit per-surface behavior later — the tag has
+no effect on retrieval, but you will eventually want it for debugging.
+Two cautions from production:
+- **One conversation, one surface at a time.** Continuing the same
+  session from two clients concurrently corrupts conversational state in
+  surface-specific ways (and some providers' signed reasoning blocks make
+  the corruption unrecoverable). Memory is shared; live sessions
+  shouldn't be.
+- **Hooks only fire where they're installed.** A session on a surface
+  without lifecycle hooks (e.g. a provider's cloud UI) writes nothing.
+  Decide per-surface: install a hook, route through the queue, or accept
+  the gap knowingly.
+## Proactive outreach (if you build it)
+The schema carries `agent_state.proactive_enabled` and the connection
+axis for a reason: companions that can initiate contact need discipline
+more than they need capability. The rules that held up:
+- Quiet hours, absolutely.
+- Cooldown after any outreach; longer cooldown after a session the user
+  initiated (don't crowd them).
+- A user-controlled off-switch (`proactive_enabled`) honored everywhere.
+- **Visible state**: the user can read the companion's axis values at
+  any time. Nothing about the companion's wanting is hidden from the
+  person it wants.
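Here is a minimal gate for those rules. It assumes a 22:00 to 08:00 quiet window and 24 h / 48 h cooldowns, all illustrative defaults rather than values from the package. `proactiveEnabled` mirrors `agent_state.proactive_enabled`. The caller supplies the user's local hour, which keeps time zones out of the function.

```javascript
// Hypothetical outreach gate; every default here is an assumption.
const HOUR_MS = 3600 * 1000;

function inQuietHours(hour, start, end) {
  // The window may wrap midnight, e.g. 22 -> 8.
  return start <= end ? hour >= start && hour < end : hour >= start || hour < end;
}

function mayReachOut({ proactiveEnabled, localHour, nowMs, lastOutreachMs, lastSession }, opts = {}) {
  const { quietStart = 22, quietEnd = 8, cooldownH = 24, userInitiatedCooldownH = 48 } = opts;
  if (!proactiveEnabled) return { ok: false, reason: "off-switch" };
  if (inQuietHours(localHour, quietStart, quietEnd)) return { ok: false, reason: "quiet hours" };
  if (lastOutreachMs != null && nowMs - lastOutreachMs < cooldownH * HOUR_MS) {
    return { ok: false, reason: "outreach cooldown" };
  }
  // Longer cooldown after a session the user started: don't crowd them.
  if (lastSession && lastSession.userInitiated && nowMs - lastSession.endedMs < userInitiatedCooldownH * HOUR_MS) {
    return { ok: false, reason: "user-initiated session cooldown" };
  }
  return { ok: true };
}
```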

package/docs/memory-model.md ADDED

@@ -0,0 +1,105 @@
+# The memory model
+How Sostenuto decides what to keep, what to surface, and what to let fade.
+## The unit: memory objects
+A memory object is one durable piece of knowledge distilled from conversation — a fact, a preference, a shared concept, a correction, a commitment. Not a summary; a discrete thing with its own identity.
+Each carries:
+| Field | What it means |
+|---|---|
+| `domain` | Who it's about: `user_self`, `agent_self` (the companion in this relationship), `relational` (the relationship itself), `evidence` (verbatim quotes worth keeping) |
+| `type` | What kind of thing it is — fact, preference, ritual, boundary, shared_concept, … (see `db/schema.sql` for the full vocabulary) |
+| `content` | The memory, specific and grounded |
+| `evidence_refs` | Provenance: which sessions support it. Grows over time — see *Reinforcement* below |
+| `confidence` | 0–1; rises with reinforcement |
+| `sensitivity` | `low` / `medium` / `high` — descriptive metadata. **Sensitivity never gates retrieval** (see below) |
+| `status` | Lifecycle: `candidate → active → reinforced` (and `revised` / `deprecated` / `forgotten`) |
+| `should_do` / `should_not_do` | Tier 2 guidance — see *Two tiers* |
+| `usage_guidance` | The machine-read policy object (below) |
+| `version_history` | Append-only log of every content rewrite — provenance is never lost |
+## Emotional coordinates: valence and arousal
+Two orthogonal dimensions (Russell's circumplex), supplied by the classifier or inferred by formula:
+- **valence** (−1…1): emotional charge. Painful ↔ warm.
+- **arousal** (0…1): intensity. A settled preference is low-arousal; a friction moment, a peak, a marked commitment is high-arousal. *A quiet warm memory can be high valence and low arousal* — they measure different things.
+Arousal exists to modulate **decay** (planned: high-arousal memories fade more slowly; the Ebbinghaus-style decay engine is on the roadmap, and the data model already carries everything it needs) and to weight the surfacing of unresolved, intense material.
+**salience** (0…1) is a third, distinct number: importance-to-surface. A high-arousal moment can still be low-salience (too situational to bring forward), and vice versa.
+## Initiative ≠ access: `proactive_use`
+The single most important policy distinction in Sostenuto:
+| Value | Meaning |
+|---|---|
+| `yes` | Always-on. Injected into every session's orientation block. Small, curated set — identity-level. |
+| `only_when_relevant` | The default. Surfaces through semantic retrieval when the conversation matches it. |
+| `no` | Never volunteered. **Still retrievable** — but only on *explicit anchor*: the user's message must match it at high similarity (default ≥ 0.65; calibrated for query-type embeddings, which score lower than document-pair similarity). |
+`proactive_use` controls whether the companion *brings something up*. It does not control whether the companion *can remember it when asked*. The user clearly referencing a memory is consent to recall it; incidental similarity is not.
+Corollary: **sensitivity does not gate retrieval.** High-sensitivity memories are part of the relationship and must stay findable when referenced. If something shouldn't auto-surface, that's a `proactive_use` decision — made by policy or curation, never by a blanket sensitivity rule. Blanket rules turn warmth into bureaucracy.
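The rule reduces to a post-filter on retrieved rows. This sketch assumes rows shaped like `search_memory_objects` output and treats a missing `proactive_use` as the default `only_when_relevant`. The function names are hypothetical.

```javascript
// Documented default anchor threshold for query-type embeddings.
const ANCHOR_THRESHOLD = 0.65;

function isSurfaceable(row, anchorThreshold = ANCHOR_THRESHOLD) {
  const policy = (row.usage_guidance || {}).proactive_use || "only_when_relevant";
  if (policy === "no") {
    // Never volunteered: only an explicit, high-similarity reference recalls it.
    return row.similarity >= anchorThreshold;
  }
  // 'yes' and 'only_when_relevant' pass. Sensitivity is deliberately not consulted.
  return true;
}

function gateRetrieved(rows, anchorThreshold = ANCHOR_THRESHOLD) {
  return rows.filter((row) => isSurfaceable(row, anchorThreshold));
}

const retrieved = [
  { id: 1, similarity: 0.52, sensitivity: "high", usage_guidance: { proactive_use: "only_when_relevant" } },
  { id: 2, similarity: 0.58, sensitivity: "low", usage_guidance: { proactive_use: "no" } },
  { id: 3, similarity: 0.71, sensitivity: "high", usage_guidance: { proactive_use: "no" } },
];
console.log(gateRetrieved(retrieved).map((r) => r.id)); // [ 1, 3 ]
```

Row 1 is high-sensitivity but surfaces on ordinary relevance. Row 2 is low-sensitivity but stays quiet, because incidental similarity is not an anchor.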
+## Two tiers: content vs. instruction
+Most memories (Tier 1) are **content-only** — they surface as themselves and the model responds to what they say. A small curated subset (Tier 2) carries `should_do`: a short, positive instruction distilling a rule the user taught — a boundary, a style correction, an operating principle. These render in the behavior-guidance block and silently shape the companion's conduct.
+Two deliberate asymmetries:
+1. **Only items that earned an instruction get one.** Auto-generating guidance for every memory produces generic noise; the cap (default 8 items) stays meaningful because most memories never enter the block at all.
+2. **`should_not_do` is never auto-populated.** If present, it was set by hand and means it. The default posture is lean, warm, action-oriented — restrictions are added deliberately, not accumulated defensively. When multiple constraints could apply, the companion should default to warm and present, not cautious and short.
+## The write path: reinforce, don't duplicate
+Every candidate memory is embedded and searched against existing memories before insert (`src/memory/store.js`):
+```
+similarity ≥ 0.88  →  may UPGRADE content (near-paraphrase, substantially
+                      more complete, concrete) — old content archived to
+                      version_history
+similarity ≥ 0.75  →  REINFORCE: evidence_refs grows, confidence rises,
+                      status → 'reinforced'. Content untouched.
+below 0.75         →  INSERT as new
+```
+The dual threshold matters: between 0.75 and 0.88, related-but-distinct memories *link* (shared evidence trail) without overwriting each other. A memory reinforced across many sessions accumulates a cross-session provenance trail — and ranks above one-off observations in the behavior-guidance block (evidence count is the tiebreaker after salience).
+Batches process sequentially so that near-duplicates *within* one batch collapse correctly: the first occurrence inserts, the second reinforces it.
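Here is a minimal in-memory sketch of this write path, for illustration only. The real logic lives in `src/memory/store.js` and may differ. Three choices here are assumptions: approximating "substantially more complete" as 30% longer, the +0.1 confidence step, and counting an upgrade as a reinforcement.

```javascript
const UPGRADE_AT = 0.88;
const REINFORCE_AT = 0.75;

function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Assumption: "substantially more complete" approximated as 30% longer.
function isMoreComplete(candidate, existing) {
  return candidate.content.length > existing.content.length * 1.3;
}

function writeBatch(store, candidates) {
  // Sequential: each candidate sees rows inserted earlier in the same batch.
  for (const cand of candidates) {
    let best = null;
    let bestSim = -Infinity;
    for (const mem of store) {
      const sim = cosine(cand.embedding, mem.embedding);
      if (sim > bestSim) {
        best = mem;
        bestSim = sim;
      }
    }
    if (best && bestSim >= REINFORCE_AT) {
      best.evidence_refs.push(cand.sessionId);
      best.confidence = Math.min(1, best.confidence + 0.1); // assumed step size
      best.status = "reinforced";
      if (bestSim >= UPGRADE_AT && isMoreComplete(cand, best)) {
        // Archive before rewriting so provenance is never lost.
        best.version_history.push({ content: best.content, replaced_at: new Date().toISOString() });
        best.content = cand.content;
        best.embedding = cand.embedding;
      }
    } else {
      store.push({
        content: cand.content,
        embedding: cand.embedding,
        evidence_refs: [cand.sessionId],
        confidence: 0.5,
        status: "candidate",
        version_history: [],
      });
    }
  }
  return store;
}
```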
+## The read paths
+Four channels feed prompt assembly (`src/retrieval/assembly.js`):
+1. **Proactive block** — `proactive_use='yes'`, ranked by status then confidence.
+2. **Behavior guidance** — Tier 2, ranked salience → evidence count, capped small.
+3. **Recent sessions** — recency window: top N in full (summary + diary + key points), next M as headlines.
+4. **Semantic retrieval** — query-matched, time-decayed (`similarity × e^(−λ·age)`), fanned across session summaries, key points, and memory objects, anchor-gated for `proactive_use='no'`.
+Channels 1–3 are stable within a session and live in the cacheable prefix; channel 4 is computed once per session on the first substantive message and cached on the session row.
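Channel 2 can be sketched as a filter, sort, and cap over memory rows. This assumes salience lives in `usage_guidance.salience`, as the schema comment lists. It also assumes guidance reads the same statuses as the default `search_memory_objects` filter. The function name is hypothetical.

```javascript
// Assumption: same statuses as search_memory_objects' default status_filter.
const GUIDANCE_STATUSES = new Set(["active", "confirmed", "reinforced"]);

function selectBehaviorGuidance(memories, cap = 8) {
  return memories
    // Tier 2 only: a non-empty should_do on a readable memory.
    .filter((m) => typeof m.should_do === "string" && m.should_do.trim() !== "" && GUIDANCE_STATUSES.has(m.status))
    .map((m) => ({
      text: m.should_do.trim(),
      salience: Number((m.usage_guidance || {}).salience ?? 0),
      evidence: Array.isArray(m.evidence_refs) ? m.evidence_refs.length : 0,
    }))
    // Salience first; evidence count breaks ties.
    .sort((a, b) => b.salience - a.salience || b.evidence - a.evidence)
    .slice(0, cap)
    .map((g) => g.text);
}
```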
+## Sessions, classification, and the watermark
+Sessions are classified by an LLM (`src/classify/`) into headline, arc-shaped summary, first-person diary, thinking-highlights, key points, emotion deltas, and candidate memories. Two prompt modes:
+- **Full** — first classification of a session, phase-marked for long transcripts.
+- **Incremental** — re-classification receives the prior record + only the new turns. Cost stays O(new). The watermark (`last_classified_message_count`) and a minimum-new-turns threshold prevent churn.
+Emotion deltas are cumulative per session; on re-classification only the *net* difference is applied to agent state, so nothing double-counts.
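The watermark and net-delta rules reduce to a few lines of bookkeeping. In this sketch, `MIN_NEW_TURNS = 4` is an assumed threshold and the function names are hypothetical. The column names and axis ranges come from `db/schema.sql`.

```javascript
// Assumed threshold; the shipped default may differ.
const MIN_NEW_TURNS = 4;

// Returns the unclassified tail, or null when too few turns have accrued.
function newTurnsSince(session, messages, minNew = MIN_NEW_TURNS) {
  const watermark = session.last_classified_message_count || 0;
  const fresh = messages.slice(watermark);
  return fresh.length >= minNew ? fresh : null;
}

// Axis ranges from the agent_state comments in db/schema.sql.
const AXES = { mood: [-1, 1], connection: [0, 1], attunement: [0, 1] };
const clamp = (x, lo, hi) => Math.min(hi, Math.max(lo, x));

// `cumulative` holds this session's total deltas from the latest
// classification; only the change since the previous run is applied.
function applyNetDeltas(agentState, session, cumulative) {
  for (const [axis, [lo, hi]] of Object.entries(AXES)) {
    if (typeof cumulative[axis] !== "number") continue;
    const key = `${axis}_delta`; // sessions.mood_delta, connection_delta, ...
    const net = cumulative[axis] - (session[key] || 0);
    agentState[axis] = clamp(agentState[axis] + net, lo, hi);
    session[key] = cumulative[axis];
  }
  return agentState;
}

// After a successful run, advance the watermark.
function markClassified(session, messages) {
  session.last_classified_message_count = messages.length;
}
```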
+## Forgetting
+Sostenuto forgets in gradients, not deletions:
+1. Recency windows (only the top sessions enter the prompt in full)
+2. Time-decay scoring in retrieval (old needs higher similarity to compete)
+3. Status lifecycle (`deprecated` / `forgotten` items are excluded from all reads, reversibly)
+4. Caps with ranked eviction (behavior guidance, hot key points)
+5. *(Roadmap)* the decay engine: confidence erosion over time since last reinforcement, modulated by arousal — with `proactive_use='yes'` items floored so curated memory never silently disappears
+Hard deletion exists (it's your database), but the design treats forgetting as a ranking problem, not a destruction problem. The sostenuto pedal doesn't silence the other strings — it just doesn't sustain them.
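The decay engine is not built yet, so the following is one possible shape, not the planned design. It uses Ebbinghaus-style retention `R = exp(-t / S)`, with stability `S` scaled up by arousal and time measured from `last_reinforced_at`. A floor protects `proactive_use='yes'` items. Every constant and name here is an assumption.

```javascript
// Hypothetical decay sketch; not the roadmap implementation.
const DAY_MS = 86400 * 1000;

function decayedConfidence(memory, nowMs, { baseStabilityDays = 60, arousalGain = 2, floor = 0.5 } = {}) {
  const guidance = memory.usage_guidance || {};
  const last = Date.parse(memory.last_reinforced_at || memory.created_at);
  const days = Math.max(0, (nowMs - last) / DAY_MS);
  // Higher arousal -> larger stability -> slower fading.
  const arousal = Number(guidance.arousal ?? 0);
  const stability = baseStabilityDays * (1 + arousalGain * arousal);
  const decayed = memory.confidence * Math.exp(-days / stability);
  // Curated items never sink below the floor (or their own confidence, if lower).
  if (guidance.proactive_use === "yes") return Math.max(decayed, Math.min(floor, memory.confidence));
  return decayed;
}
```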