engrm 0.1.0

Files changed (82)
  1. package/.mcp.json +9 -0
  2. package/AUTH-DESIGN.md +436 -0
  3. package/BRIEF.md +197 -0
  4. package/CLAUDE.md +44 -0
  5. package/COMPETITIVE.md +174 -0
  6. package/CONTEXT-OPTIMIZATION.md +305 -0
  7. package/INFRASTRUCTURE.md +252 -0
  8. package/LICENSE +105 -0
  9. package/MARKET.md +230 -0
  10. package/PLAN.md +278 -0
  11. package/README.md +121 -0
  12. package/SENTINEL.md +293 -0
  13. package/SERVER-API-PLAN.md +553 -0
  14. package/SPEC.md +843 -0
  15. package/SWOT.md +148 -0
  16. package/SYNC-ARCHITECTURE.md +294 -0
  17. package/VIBE-CODER-STRATEGY.md +250 -0
  18. package/bun.lock +375 -0
  19. package/hooks/post-tool-use.ts +144 -0
  20. package/hooks/session-start.ts +64 -0
  21. package/hooks/stop.ts +131 -0
  22. package/mem-page.html +1305 -0
  23. package/package.json +30 -0
  24. package/src/capture/dedup.test.ts +103 -0
  25. package/src/capture/dedup.ts +76 -0
  26. package/src/capture/extractor.test.ts +245 -0
  27. package/src/capture/extractor.ts +330 -0
  28. package/src/capture/quality.test.ts +168 -0
  29. package/src/capture/quality.ts +104 -0
  30. package/src/capture/retrospective.test.ts +115 -0
  31. package/src/capture/retrospective.ts +121 -0
  32. package/src/capture/scanner.test.ts +131 -0
  33. package/src/capture/scanner.ts +100 -0
  34. package/src/capture/scrubber.test.ts +144 -0
  35. package/src/capture/scrubber.ts +181 -0
  36. package/src/cli.ts +517 -0
  37. package/src/config.ts +238 -0
  38. package/src/context/inject.test.ts +940 -0
  39. package/src/context/inject.ts +382 -0
  40. package/src/embeddings/backfill.ts +50 -0
  41. package/src/embeddings/embedder.test.ts +76 -0
  42. package/src/embeddings/embedder.ts +139 -0
  43. package/src/lifecycle/aging.test.ts +103 -0
  44. package/src/lifecycle/aging.ts +36 -0
  45. package/src/lifecycle/compaction.test.ts +264 -0
  46. package/src/lifecycle/compaction.ts +190 -0
  47. package/src/lifecycle/purge.test.ts +100 -0
  48. package/src/lifecycle/purge.ts +37 -0
  49. package/src/lifecycle/scheduler.test.ts +120 -0
  50. package/src/lifecycle/scheduler.ts +101 -0
  51. package/src/provisioning/browser-auth.ts +172 -0
  52. package/src/provisioning/provision.test.ts +198 -0
  53. package/src/provisioning/provision.ts +94 -0
  54. package/src/register.test.ts +167 -0
  55. package/src/register.ts +178 -0
  56. package/src/server.ts +436 -0
  57. package/src/storage/migrations.test.ts +244 -0
  58. package/src/storage/migrations.ts +261 -0
  59. package/src/storage/outbox.test.ts +229 -0
  60. package/src/storage/outbox.ts +131 -0
  61. package/src/storage/projects.test.ts +137 -0
  62. package/src/storage/projects.ts +184 -0
  63. package/src/storage/sqlite.test.ts +798 -0
  64. package/src/storage/sqlite.ts +934 -0
  65. package/src/storage/vec.test.ts +198 -0
  66. package/src/sync/auth.test.ts +76 -0
  67. package/src/sync/auth.ts +68 -0
  68. package/src/sync/client.ts +183 -0
  69. package/src/sync/engine.test.ts +94 -0
  70. package/src/sync/engine.ts +127 -0
  71. package/src/sync/pull.test.ts +279 -0
  72. package/src/sync/pull.ts +170 -0
  73. package/src/sync/push.test.ts +117 -0
  74. package/src/sync/push.ts +230 -0
  75. package/src/tools/get.ts +34 -0
  76. package/src/tools/pin.ts +47 -0
  77. package/src/tools/save.test.ts +301 -0
  78. package/src/tools/save.ts +231 -0
  79. package/src/tools/search.test.ts +69 -0
  80. package/src/tools/search.ts +181 -0
  81. package/src/tools/timeline.ts +64 -0
  82. package/tsconfig.json +22 -0
package/SPEC.md ADDED
@@ -0,0 +1,843 @@
# Technical Specification — Engrm

## 1. Project Identity

### The Problem

The same project lives at different paths on different machines:

- `/Users/david/code/aimy-agent` (MacBook)
- `/home/david/projects/aimy-agent` (desktop)
- `/Volumes/Data/devs/aimy-agent` (external drive)

A string like `"aimy-agent"` is ambiguous — two different repos could share the same directory name. We need a **canonical project ID** that's the same everywhere.

### Solution: Git Remote as Canonical ID

For git repos (the vast majority of real projects), the remote URL is a globally unique, stable identifier. Normalise it to strip protocol and auth variations:

```
git@github.com:unimpossible/aimy-agent.git       → github.com/unimpossible/aimy-agent
https://github.com/unimpossible/aimy-agent.git   → github.com/unimpossible/aimy-agent
https://david@github.com/unimpossible/aimy-agent → github.com/unimpossible/aimy-agent
```

**Normalisation rules**:

1. Strip the protocol (`https://`, `git@`, `ssh://`)
2. Replace `:` with `/` (for SSH-style URLs)
3. Strip the `.git` suffix
4. Strip auth credentials (`user@`)
5. Lowercase the host

**Fallbacks** (in order):

1. Git remote `origin` URL → normalised (preferred)
2. Any other git remote, if `origin` doesn't exist
3. Manual `project_id` in `.engrm.json` in the project root (for non-git projects)
4. Last resort: the directory name (not great, but better than failing)
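The five rules above can be sketched as a small normaliser. This is an illustrative sketch, not the shipped implementation; the regexes are assumptions that cover the three example URL shapes:

```typescript
// Apply the normalisation rules in order: strip protocol, strip git@/user@
// credentials, convert the ssh-style colon to a slash, drop the .git suffix,
// and lowercase only the host portion (paths stay case-sensitive).
function normalizeGitRemote(url: string): string {
  let s = url.trim();
  s = s.replace(/^(ssh:\/\/|https?:\/\/|git:\/\/)/, ""); // rule 1: protocol
  s = s.replace(/^git@/, "");                            // rule 1: git@ form
  s = s.replace(/^[^\/@:]+@/, "");                       // rule 4: user@ credentials
  s = s.replace(/:/, "/");                               // rule 2: ssh colon → slash
  s = s.replace(/\.git$/, "");                           // rule 3: .git suffix
  const slash = s.indexOf("/");
  if (slash === -1) return s.toLowerCase();
  return s.slice(0, slash).toLowerCase() + s.slice(slash); // rule 5: lowercase host only
}
```

All three example forms collapse to `github.com/unimpossible/aimy-agent`.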
### Local Projects Table

```sql
CREATE TABLE projects (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  canonical_id TEXT UNIQUE NOT NULL,  -- normalised git remote URL
  name TEXT NOT NULL,                 -- human-readable (repo name portion)
  local_path TEXT,                    -- path on THIS machine
  remote_url TEXT,                    -- original git remote URL
  first_seen_epoch INTEGER NOT NULL,
  last_active_epoch INTEGER NOT NULL  -- updated on every observation
);
```

**Auto-detection**: On session start, the plugin runs `git remote get-url origin` in the working directory. If the canonical ID exists in `projects`, update `local_path` and `last_active_epoch`. If not, insert a new row. Zero config for the developer.

### Project Config File (Optional)

For non-git projects or overrides, drop a `.engrm.json` in the project root:

```json
{
  "project_id": "internal/design-system",
  "name": "Design System"
}
```

This is optional. Git repos need nothing — it's fully automatic.

---

## 2. Observation Schema

### Observation Lifecycle

Observations are not permanent. They have a lifecycle that manages growth and keeps search results relevant.

```
┌────────────┐    30 days      ┌────────────┐    90 days      ┌────────────┐
│   active   │ ──────────────→ │   aging    │ ──────────────→ │  archived  │
│            │                 │            │                 │            │
│ Full text  │                 │ Full text  │                 │ Summarised │
│ in FTS5    │                 │ in FTS5    │                 │ out of FTS │
│ in Vector  │                 │ in Vector  │                 │ out of Vec │
│ Full score │                 │ 0.7x score │                 │ Local only │
└────────────┘                 └────────────┘                 └─────┬──────┘
                                                                   │ 12 months
                                                                   ▼
                                                             ┌────────────┐
                                                             │   purged   │
                                                             │ (deleted)  │
                                                             └────────────┘
```

| State | Age | Search weight | In FTS5 | In Candengo Vector | Counts toward quota |
|---|---|---|---|---|---|
| `active` | 0-30 days | 1.0x | Yes | Yes | Yes |
| `aging` | 30-90 days | 0.7x | Yes | Yes | Yes |
| `archived` | 90-365 days | 0.3x (local only) | No | Removed | No |
| `purged` | >365 days | — | — | — | No |
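The table above is a pure mapping from age to state and search weight. A minimal sketch (thresholds copied from the table; `pinned` is set explicitly by the user, never derived from age):

```typescript
type Lifecycle = "active" | "aging" | "archived" | "purged";

// Derive lifecycle state from an observation's age in days.
function lifecycleForAge(ageDays: number): Lifecycle {
  if (ageDays < 30) return "active";
  if (ageDays < 90) return "aging";
  if (ageDays < 365) return "archived";
  return "purged";
}

// Search-weight multiplier per state; purged rows never appear in results.
function searchWeight(state: Lifecycle): number {
  if (state === "active") return 1.0;
  if (state === "aging") return 0.7;
  if (state === "archived") return 0.3;
  return 0;
}
```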
**Key insight**: Archived observations are removed from Candengo Vector (freeing quota and reducing noise in cross-device search) but kept in local SQLite. They're still searchable locally at reduced weight. This means the free tier's 10K limit applies to active+aging observations in the vector store, not total local history.

**Exceptions**: Observations can be **pinned** (`lifecycle = 'pinned'`) to prevent aging. Useful for architectural decisions, critical gotchas, and other knowledge that stays relevant indefinitely.

### Observation Quality Scoring

Not all observations are equal. A quality score (0.0-1.0) is assigned at capture time and influences search ranking and lifecycle.

| Signal | Score contribution | Rationale |
|---|---|---|
| Bug fix with root cause | +0.3 | High-value, prevents repeat debugging |
| Architectural decision | +0.3 | Long-lived, affects future work |
| Multiple files modified | +0.2 | Indicates non-trivial change |
| Error → fix sequence | +0.2 | Problem-solution pair is reusable knowledge |
| Test failure → fix | +0.2 | Specific, actionable |
| Pattern/gotcha identified | +0.2 | Transferable to other contexts |
| Single file read | +0.0 | Low signal, likely navigation |
| Simple config change | +0.05 | Minor, rarely worth retrieving |
| Duplicate of recent observation | -0.3 | Redundant |

Observations with quality < 0.1 are **not saved**. This is the primary noise filter.
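A sketch of the scoring as additive contributions clamped to [0, 1]. The signal flags are illustrative names, not the real extractor's interface; the shipped `src/capture/quality.ts` derives signals from tool-use events:

```typescript
// Hypothetical signal flags; contributions match the table above.
interface QualitySignals {
  bugfixWithRootCause?: boolean;
  architecturalDecision?: boolean;
  multipleFilesModified?: boolean;
  errorFixSequence?: boolean;
  duplicateOfRecent?: boolean;
}

function qualityScore(s: QualitySignals): number {
  let score = 0;
  if (s.bugfixWithRootCause) score += 0.3;
  if (s.architecturalDecision) score += 0.3;
  if (s.multipleFilesModified) score += 0.2;
  if (s.errorFixSequence) score += 0.2;
  if (s.duplicateOfRecent) score -= 0.3;
  return Math.min(1, Math.max(0, score)); // clamp to the 0.0-1.0 range
}

const SAVE_THRESHOLD = 0.1; // observations scoring below this are dropped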
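```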
**Compaction**: When observations are archived (90 days), related observations from the same project and session are **compacted** — summarised into a single "digest" observation. 20 observations from a debugging session become one concise summary of what was wrong, what was tried, and what fixed it. This preserves the knowledge while dramatically reducing storage.

### Local SQLite Schema

```sql
-- Projects (canonical identity across machines)
CREATE TABLE projects (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  canonical_id TEXT UNIQUE NOT NULL,   -- normalised git remote URL
  name TEXT NOT NULL,                  -- human-readable
  local_path TEXT,                     -- path on THIS machine
  remote_url TEXT,                     -- original git remote URL
  first_seen_epoch INTEGER NOT NULL,
  last_active_epoch INTEGER NOT NULL
);

-- Core observations table
CREATE TABLE observations (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  session_id TEXT,
  project_id INTEGER NOT NULL REFERENCES projects(id),
  type TEXT NOT NULL,                  -- bugfix | discovery | decision | pattern | change | feature | refactor
  title TEXT NOT NULL,
  narrative TEXT,                      -- Detailed description
  facts TEXT,                          -- JSON array of factual assertions
  concepts TEXT,                       -- JSON array: how-it-works, why-it-exists, etc.
  files_read TEXT,                     -- JSON array of RELATIVE file paths
  files_modified TEXT,                 -- JSON array of RELATIVE file paths
  quality REAL DEFAULT 0.5,            -- 0.0-1.0, influences search rank and lifecycle
  lifecycle TEXT DEFAULT 'active',     -- active | aging | archived | purged | pinned
  sensitivity TEXT DEFAULT 'shared',   -- shared | personal | secret
  user_id TEXT NOT NULL,
  device_id TEXT NOT NULL,
  agent TEXT DEFAULT 'claude-code',
  created_at TEXT NOT NULL,
  created_at_epoch INTEGER NOT NULL,
  archived_at_epoch INTEGER,           -- when moved to archived state
  compacted_into INTEGER REFERENCES observations(id)  -- if summarised into a digest
);

-- Session tracking
CREATE TABLE sessions (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  session_id TEXT UNIQUE NOT NULL,
  project_id INTEGER REFERENCES projects(id),
  user_id TEXT NOT NULL,
  device_id TEXT NOT NULL,
  agent TEXT DEFAULT 'claude-code',
  status TEXT DEFAULT 'active',        -- active | completed
  observation_count INTEGER DEFAULT 0, -- running count for this session
  started_at_epoch INTEGER,
  completed_at_epoch INTEGER
);

-- Session summaries (generated on Stop hook)
CREATE TABLE session_summaries (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  session_id TEXT UNIQUE NOT NULL,
  project_id INTEGER REFERENCES projects(id),
  user_id TEXT NOT NULL,
  request TEXT,                        -- What was asked
  investigated TEXT,                   -- What was explored
  learned TEXT,                        -- Key learnings
  completed TEXT,                      -- What was delivered
  next_steps TEXT,                     -- Follow-up items
  created_at_epoch INTEGER
);

-- Sync outbox (offline-first queue)
CREATE TABLE sync_outbox (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  record_type TEXT NOT NULL,           -- observation | summary
  record_id INTEGER NOT NULL,
  status TEXT DEFAULT 'pending',       -- pending | syncing | synced | failed
  retry_count INTEGER DEFAULT 0,
  max_retries INTEGER DEFAULT 10,
  last_error TEXT,
  created_at_epoch INTEGER NOT NULL,
  synced_at_epoch INTEGER,
  next_retry_epoch INTEGER
);

-- Sync high-water mark
CREATE TABLE sync_state (
  key TEXT PRIMARY KEY,
  value TEXT NOT NULL
);
-- Keys: "last_synced_epoch", "last_backfill_epoch"

-- FTS5 for local offline search (only active + aging observations)
CREATE VIRTUAL TABLE observations_fts USING fts5(
  title, narrative, facts, concepts,
  content=observations,
  content_rowid=id
);

-- Indexes
CREATE INDEX idx_observations_project ON observations(project_id);
CREATE INDEX idx_observations_type ON observations(type);
CREATE INDEX idx_observations_created ON observations(created_at_epoch);
CREATE INDEX idx_observations_session ON observations(session_id);
CREATE INDEX idx_observations_lifecycle ON observations(lifecycle);
CREATE INDEX idx_observations_quality ON observations(quality);
CREATE INDEX idx_projects_canonical ON projects(canonical_id);
CREATE INDEX idx_outbox_status ON sync_outbox(status, next_retry_epoch);
CREATE INDEX idx_outbox_record ON sync_outbox(record_type, record_id);
```

### File Paths: Always Relative

File paths in observations are stored **relative to the project root**, never absolute. This ensures paths match across machines where the project lives at different locations.

```
Absolute (on this machine): /Users/david/code/aimy-agent/models/interview.py
Project root:               /Users/david/code/aimy-agent/
Stored in observation:      models/interview.py
```

The plugin resolves relative paths at capture time using the project's `local_path`.
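A sketch of that conversion using Node's path module (the function name is illustrative; output is forced to forward slashes so stored paths match across platforms):

```typescript
import { relative, resolve } from "node:path";

// Convert an absolute path to the project-relative form stored in observations.
function toStoredPath(projectRoot: string, absolutePath: string): string {
  // Forward slashes regardless of platform, so "models/interview.py" matches everywhere.
  return relative(resolve(projectRoot), resolve(absolutePath)).split("\\").join("/");
}
```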
### Candengo Vector Document Mapping

Each observation maps to a single Candengo Vector document:

```json
{
  "site_id": "unimpossible",
  "namespace": "dev-memory",
  "source_type": "discovery",
  "source_id": "david-desktop1-obs-1234",
  "content": "## Database type mismatch causing 500 errors\n\nThe interview endpoint was returning 500 errors because project_id was defined as UUID in the model but the database column was TEXT. Fixed by changing the SQLAlchemy column type to String.\n\nFacts:\n- project_id column type mismatch between model (UUID) and database (TEXT)\n- Fix: change Column(UUID) to Column(String) in Interview model\n- Affects: models/interview.py",
  "metadata": {
    "project_canonical": "github.com/unimpossible/aimy-agent",
    "project_name": "aimy-agent",
    "user_id": "david",
    "device_id": "desktop-1",
    "agent": "claude-code",
    "title": "Fixed project_id column type mismatch in Interview model",
    "type": "bugfix",
    "quality": 0.7,
    "concepts": ["problem-solution", "gotcha"],
    "files_modified": ["models/interview.py"],
    "session_id": "abc-123",
    "created_at_epoch": 1740700000,
    "local_id": 1234
  }
}
```

**Content composition**: Concatenate `title + narrative + facts` into a single `content` field. Candengo Vector auto-chunks at 400 tokens with 50-token overlap. Most observations fit in a single chunk.

**Source ID format**: `"{user_id}-{device_id}-obs-{local_sqlite_id}"` — unique across users AND devices. The old format `"{user_id}-obs-{id}"` could collide because two devices both start local IDs at 1.

**Project matching**: Candengo Vector metadata includes `project_canonical` (the normalised git remote URL). When searching, the plugin sends its current project's canonical ID as a metadata filter. This ensures "aimy-agent on laptop" and "aimy-agent on desktop" match to the same project — because they share the same git remote.

### Deduplication

Before saving a new observation, check for near-duplicates:

```
1. Query local SQLite: recent observations (last 24h) for the same project
2. Simple heuristic: title similarity > 0.8 (Jaccard on word tokens)
3. If duplicate found: merge new facts into the existing observation, don't create a new row
4. If no duplicate: save as a new observation
```

Full semantic deduplication (via Candengo Vector similarity) runs during the compaction job, not on the hot path. This keeps capture fast.
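Step 2's heuristic as code. A minimal sketch of Jaccard similarity over lowercase word tokens (the shipped `src/capture/dedup.ts` may tokenise differently):

```typescript
// Jaccard similarity: |intersection| / |union| over word-token sets.
function jaccardTitleSimilarity(a: string, b: string): number {
  const tokens = (s: string) => new Set(s.toLowerCase().split(/\W+/).filter(Boolean));
  const ta = tokens(a);
  const tb = tokens(b);
  let intersection = 0;
  for (const t of ta) if (tb.has(t)) intersection++;
  const union = ta.size + tb.size - intersection;
  return union === 0 ? 1 : intersection / union;
}

const DUP_THRESHOLD = 0.8; // titles above this are merged, not re-saved
```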
---

## 3. MCP Tool Interface

### Tool: `search`

Find relevant observations from memory.

```typescript
{
  name: "search",
  description: "Search memory for relevant observations, discoveries, and decisions",
  inputSchema: {
    query: string,              // Natural language search query
    project?: string,           // Filter to specific project (auto-detected from cwd if omitted)
    type?: string,              // Filter by observation type
    limit?: number,             // Max results (default: 10)
    scope?: "personal" | "team" | "all",  // Search scope (default: "all")
    include_archived?: boolean  // Include archived observations (default: false)
  }
}
```

Returns a compact index with IDs (~50-100 tokens per result):

```
| ID | Time | Type | Q | Title | Project | By |
```

The `Q` column shows the quality score as a visual indicator (●●●○○ = 0.6). Results are ranked by `(relevance × quality × lifecycle_weight)`.

**Default behaviour**: Searches the current project only. Pass `project: "*"` to search across all projects.

### Tool: `timeline`

Get chronological context around a specific observation.

```typescript
{
  name: "timeline",
  inputSchema: {
    anchor: number,         // Observation ID to centre on
    depth_before?: number,  // Observations before (default: 3)
    depth_after?: number,   // Observations after (default: 3)
    project?: string        // Auto-detected if omitted
  }
}
```

### Tool: `get_observations`

Fetch full details for specific observation IDs.

```typescript
{
  name: "get_observations",
  inputSchema: {
    ids: number[]
  }
}
```

Returns full observation details (~500-1000 tokens per result).

### Tool: `save_observation`

Manually save an observation (most are captured automatically via hooks).

```typescript
{
  name: "save_observation",
  inputSchema: {
    text: string,      // Observation content (required)
    title?: string,    // Brief title (auto-generated if omitted)
    type?: string,     // Observation type (auto-classified if omitted)
    project?: string,  // Project (auto-detected from cwd if omitted)
    pin?: boolean      // Pin to prevent aging (default: false)
  }
}
```

### Tool: `pin_observation`

Prevent an observation from aging out. Use for architectural decisions, critical gotchas, and other knowledge that stays relevant indefinitely.

```typescript
{
  name: "pin_observation",
  inputSchema: {
    id: number,
    pinned: boolean  // true to pin, false to unpin
  }
}
```

---

## 4. Sync Engine

### Sync States

```
          ┌───────────┐
          │  pending  │ ← observation saved to SQLite
          └─────┬─────┘
                │ sync attempt
          ┌─────▼─────┐
          │  syncing  │ ← HTTP request in flight
          └─────┬─────┘
               ╱ ╲
        success   failure
             ╱     ╲
    ┌───────▼──┐  ┌─▼───────┐
    │  synced  │  │ failed  │ ← retry_count++
    └──────────┘  └────┬────┘
                       │ next_retry_epoch reached
                       │ (if retry_count < max_retries)
                 ┌─────▼─────┐
                 │  pending  │ ← back in queue
                 └───────────┘
```

### Retry Schedule (Exponential Backoff)

| Retry # | Delay |
|---|---|
| 1 | 30 seconds |
| 2 | 1 minute |
| 3 | 2 minutes |
| 4 | 5 minutes |
| 5+ | 5 minutes (cap) |
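One way to encode the schedule (a sketch; the real engine stores `next_retry_epoch` in the outbox row):

```typescript
// Delay in seconds before the given retry attempt (1-based), capped at 5 minutes.
function retryDelaySeconds(retryCount: number): number {
  const schedule = [30, 60, 120, 300]; // retries 1-4, per the table
  return schedule[Math.min(retryCount, schedule.length) - 1];
}
```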
### Sync Triggers

1. **Immediate**: On observation save → fire-and-forget push
2. **Timer**: Every 30 seconds → flush pending outbox items (batch of 50)
3. **Startup**: On boot → check high-water mark, sync anything newer
4. **Manual**: `engrm sync` CLI command → force full sync

### Backfill Algorithm (High-Water Mark)

```
1. Read last_synced_epoch from sync_state table
2. SELECT * FROM observations
   WHERE created_at_epoch > last_synced_epoch
   AND sensitivity != 'secret'
3. Batch push to Candengo Vector via POST /v1/ingest/batch
4. Update last_synced_epoch to max(created_at_epoch) of pushed items
```

No need to query remote for existing IDs. Simple, efficient, scales to any count.
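The scan in steps 1-2 and 4 can be sketched as a pure function over in-memory rows; the real implementation runs the equivalent SQL against SQLite:

```typescript
interface OutRow {
  id: number;
  created_at_epoch: number;
  sensitivity: string;
}

// Select rows newer than the high-water mark (excluding secrets) and compute
// the new mark from the selected batch.
function selectForBackfill(rows: OutRow[], lastSyncedEpoch: number): { batch: OutRow[]; newMark: number } {
  const batch = rows.filter(
    (r) => r.created_at_epoch > lastSyncedEpoch && r.sensitivity !== "secret"
  );
  const newMark = batch.reduce((m, r) => Math.max(m, r.created_at_epoch), lastSyncedEpoch);
  return { batch, newMark };
}
```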
### Conflict Resolution

**Strategy: Source ID namespacing, no conflicts possible**

- Each observation has a unique `source_id`: `"{user_id}-{device_id}-obs-{local_id}"`
- Candengo Vector upserts by `source_id` — re-syncing is idempotent
- No two users can overwrite each other's observations
- No two devices can collide (both start local autoincrement at 1, but device_id distinguishes them)

Search results show attribution (`user_id/device_id`) so users know who captured what and where.

---

## 5. Lifecycle Management

### Aging Job

Runs once per day (on plugin startup if >24h since last run):

```
1. UPDATE observations SET lifecycle = 'aging'
   WHERE lifecycle = 'active'
   AND created_at_epoch < now() - 30 days

2. For observations moving to 'aging':
   - No immediate action (still in FTS5 and Candengo Vector)
   - Search scoring applies 0.7x weight
```

### Archival + Compaction Job

Runs once per week:

```
1. Find observations WHERE lifecycle = 'aging'
   AND created_at_epoch < now() - 90 days

2. Group by (project_id, session_id)

3. For each group:
   a. Generate a "digest" observation:
      - Summarise: what was the session about, key findings, outcomes
      - Type: "digest"
      - Quality: max(quality) of source observations
      - Lifecycle: "pinned" (digests don't age out)
   b. Mark source observations:
      - lifecycle = 'archived'
      - compacted_into = digest.id
      - archived_at_epoch = now()

4. Remove archived observations from FTS5 index

5. Queue removal from Candengo Vector:
   - DELETE source_ids from Candengo Vector
   - Ingest the new digest observation
```

This means a 3-month-old debugging session with 25 observations becomes a single digest observation. The detail is preserved in local SQLite (for forensic use) but doesn't pollute search results or count toward vector storage quota.
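Step 2's grouping can be sketched as a bucketing pass keyed on `(project_id, session_id)`, so each bucket feeds one digest:

```typescript
interface AgingObs {
  id: number;
  project_id: number;
  session_id: string;
}

// Bucket aging observations by (project_id, session_id) for compaction.
function groupForCompaction(obs: AgingObs[]): Map<string, AgingObs[]> {
  const groups = new Map<string, AgingObs[]>();
  for (const o of obs) {
    const key = `${o.project_id}:${o.session_id}`;
    const bucket = groups.get(key) ?? [];
    bucket.push(o);
    groups.set(key, bucket);
  }
  return groups;
}
```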
### Purge Job

Runs monthly. Permanently deletes archived observations older than 12 months:

```
DELETE FROM observations
WHERE lifecycle = 'archived'
AND archived_at_epoch < now() - 12 months
```

Digests and individually pinned observations (`lifecycle = 'pinned'`) are never purged.

### Quota Calculation

For free-tier enforcement (10K observation limit):

```sql
SELECT COUNT(*) FROM observations
WHERE lifecycle IN ('active', 'aging')
AND sensitivity != 'secret'
```

Only active + aging observations that are synced to Candengo Vector count toward the quota. Archived and local-only observations are free. This means compaction directly frees quota — a user generating 500 observations/month who stays under 10K active+aging can use the free tier indefinitely.

---

## 6. Secret Scrubbing Pipeline

### Pre-Storage Scrubbing

Before any observation is saved (even to local SQLite), content is scrubbed:

```
Input text → Pattern matching → Replacement → Scrubbed text
```

### Default Patterns

| Pattern | Replacement | Catches |
|---|---|---|
| `sk-[a-zA-Z0-9]{20,}` | `[REDACTED_API_KEY]` | OpenAI keys |
| `Bearer [a-zA-Z0-9\-._~+/]+=*` | `[REDACTED_BEARER]` | Auth headers |
| `password[=:]\s*\S+` | `password=[REDACTED]` | Passwords in config |
| `postgresql://[^\s]+` | `[REDACTED_DB_URL]` | Postgres connection strings |
| `mongodb://[^\s]+` | `[REDACTED_DB_URL]` | Mongo connection strings |
| `mysql://[^\s]+` | `[REDACTED_DB_URL]` | MySQL connection strings |
| `AKIA[A-Z0-9]{16}` | `[REDACTED_AWS_KEY]` | AWS access keys |
| `ghp_[a-zA-Z0-9]{36}` | `[REDACTED_GH_TOKEN]` | GitHub tokens |
| `cvk_[a-f0-9]{64}` | `[REDACTED_CANDENGO_KEY]` | Candengo API keys |
| Custom patterns from config | User-defined | Organisation-specific |
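A sketch of the pipeline applying a few of the default patterns in order (the shipped scrubber also loads `custom_patterns` from config):

```typescript
// (pattern, replacement) pairs from the table above; global flag so every
// occurrence in the text is redacted, not just the first.
const PATTERNS: Array<[RegExp, string]> = [
  [/sk-[a-zA-Z0-9]{20,}/g, "[REDACTED_API_KEY]"],
  [/AKIA[A-Z0-9]{16}/g, "[REDACTED_AWS_KEY]"],
  [/ghp_[a-zA-Z0-9]{36}/g, "[REDACTED_GH_TOKEN]"],
  [/postgresql:\/\/[^\s]+/g, "[REDACTED_DB_URL]"],
];

// Run every pattern over the text; order matters if patterns overlap.
function scrub(text: string): string {
  return PATTERNS.reduce((t, [re, repl]) => t.replace(re, repl), text);
}
```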
### Sensitivity Levels

| Level | Behaviour |
|---|---|
| `shared` (default) | Scrubbed, stored locally, synced to Candengo Vector |
| `personal` | Scrubbed, stored locally, synced but only visible to the same user |
| `secret` | Scrubbed, stored locally only, **never synced** |

---

## 7. Configuration & Authentication

> **Full auth design**: See `AUTH-DESIGN.md` for the complete authentication specification, including all flows, server-side requirements, and the implementation timeline.

### Settings File: `~/.engrm/settings.json`

All configuration lives in a single directory: `~/.engrm/`.

```json
{
  "candengo_url": "https://www.candengo.com",
  "candengo_api_key": "cvk_...",
  "site_id": "unimpossible",
  "namespace": "dev-memory",
  "user_id": "david",
  "user_email": "david@example.com",
  "device_id": "macbook-a1b2c3d4",
  "teams": [
    { "id": "team_abc123", "name": "Unimpossible", "namespace": "dev-memory" }
  ],
  "sync": {
    "enabled": true,
    "interval_seconds": 30,
    "batch_size": 50
  },
  "search": {
    "default_limit": 10,
    "local_boost": 1.2,
    "scope": "all"
  },
  "scrubbing": {
    "enabled": true,
    "custom_patterns": [],
    "default_sensitivity": "shared"
  }
}
```

**Changes from the prior version**: Added `user_email` and the `teams[]` array (supports multi-team membership).

### Credential Types

| Prefix | Type | Lifetime | Purpose |
|--------|------|----------|---------|
| `cvk_` | API key | Permanent (revocable) | The ONE credential for all sync API calls |
| `cmt_` | Provisioning token | 1 hour, single-use | Web signup → exchange for a `cvk_` key |

The `cvk_` API key is the single credential type for sync operations. Multiple authentication flows exist to **obtain** this key conveniently, but the credential itself is always the same.

**CI/CD**: The `ENGRM_TOKEN` environment variable takes precedence over `settings.json`, allowing pipelines to authenticate without config files.

### Authentication Flows

Four flows to obtain a `cvk_` API key, covering all environments:

| Flow | Command | Use Case |
|------|---------|----------|
| **Browser OAuth** | `engrm init` | Desktop developers (default) |
| **Device code** | `engrm init --no-browser` | SSH, headless, WSL (RFC 8628) |
| **Provisioning token** | `engrm init --token=cmt_...` | Web signup copy-paste |
| **Manual** | `engrm init --manual` | Air-gapped, self-hosted |

All flows write the same `cvk_` API key to `~/.engrm/settings.json`.

#### Browser OAuth Flow (Default)

```
1. User runs: engrm init
2. CLI opens browser → candengo.com/connect/mem
3. User logs in → clicks "Authorize"
4. Redirect to localhost callback with auth code
5. CLI exchanges code → receives cvk_ API key
6. Writes settings.json + registers MCP server
```

#### Device Code Flow (Headless)

Auto-detected when no browser is available, or via `--no-browser`:

```
1. CLI requests device code from server
2. Prints: "Open https://candengo.com/connect/mem/device — Enter code: XXXX-YYYY"
3. User authorises on any device with a browser
4. CLI polls until authorised → receives cvk_ API key
5. Writes settings.json
```

#### Provisioning Token Flow (Web Signup)

```
1. User signs up at engrm.dev
2. Page shows: npx engrm init --token=cmt_abc123...
3. CLI exchanges cmt_ token → receives cvk_ API key
4. Writes settings.json + registers MCP server
```

### Team Provisioning

Teams are **explicitly** created and joined — not auto-provisioned.

- **Personal namespace**: auto-provisioned on first auth (any flow)
- **Team creation**: admin at `engrm.dev/team`
- **Team join**: invite link `engrm.dev/join/{code}` or `engrm team join --code=INVITE_CODE`
- **Multi-team**: the `teams[]` array supports belonging to multiple teams simultaneously

### Token Revocation

- `engrm auth revoke` — revoke the current key
- `engrm auth rotate` — atomic: create a new key, update settings, revoke the old one
- Web dashboard: `engrm.dev/dashboard` — manage all keys
- The server stores key hashes only (SHA-256), with a `key_prefix` for identification

### What `engrm init` Does

```
1. Authenticate via one of the four flows above
2. Exchange credentials → get cvk_ API key + account info
3. Create ~/.engrm/ directory
4. Write settings.json with credentials
5. Generate device_id (hostname + random suffix)
6. Create local SQLite database
7. Register MCP server in Claude Code:
   - Write to ~/.claude/mcp.json (or project .mcp.json)
8. Register hooks in Claude Code:
   - Write to ~/.claude/hooks.json (or merge with existing)
9. Test connection to Candengo Vector (if online)
10. Print success message with next steps
```

### Self-Hosted Provisioning

For self-hosted deployments:

```bash
npx engrm init --url=https://vector.internal.company.com --token=cmt_...
```

Or manual config for air-gapped environments:

```bash
npx engrm init --manual
# Prompts for: endpoint, api_key, site_id, namespace, user_id
```

### Device ID

Auto-generated on first run using `hostname + random suffix`. Stored in settings. Used to tag all observations so you know which machine they came from.
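A sketch of `hostname + random suffix` producing IDs shaped like the example `macbook-a1b2c3d4`. The slugging rules and 4-byte suffix length are assumptions; the spec only fixes the overall shape:

```typescript
import { hostname } from "node:os";
import { randomBytes } from "node:crypto";

// e.g. "macbook-a1b2c3d4": lowercased, slugged hostname plus 8 hex chars.
function generateDeviceId(): string {
  const host = hostname()
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // collapse non-alphanumerics to hyphens
    .replace(/^-+|-+$/g, "");    // trim leading/trailing hyphens
  return `${host}-${randomBytes(4).toString("hex")}`;
}
```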
705
+
706
+ ## 8. Claude Code Integration
707
+
708
+ ### MCP Server Registration
709
+
710
+ `.mcp.json` in the project or user config:
711
+ ```json
712
+ {
713
+ "mcpServers": {
714
+ "engrm": {
715
+ "type": "stdio",
716
+ "command": "bun",
717
+ "args": ["run", "/path/to/engrm/src/server.ts"]
718
+ }
719
+ }
720
+ }
721
+ ```
722
+
723
### Claude Code Hooks

Hooks are configured in `~/.claude/settings.json` (or `.claude/settings.json` for project scope) under the `"hooks"` key:

```json
{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Edit|Write|Bash",
        "hooks": [
          {
            "type": "command",
            "command": "bun run /path/to/engrm/hooks/post-tool-use.ts"
          }
        ]
      }
    ],
    "Stop": [
      {
        "hooks": [
          {
            "type": "command",
            "command": "bun run /path/to/engrm/hooks/stop.ts"
          }
        ]
      }
    ]
  }
}
```

### Hook Data Flow

**PostToolUse** receives JSON on stdin:

```json
{
  "session_id": "abc123",
  "hook_event_name": "PostToolUse",
  "tool_name": "Edit",
  "tool_input": { "file_path": "...", "old_string": "...", "new_string": "..." },
  "tool_response": "Successfully edited /path/to/file.txt",
  "cwd": "/path/to/project"
}
```

The hook runs the observation extractor, which decides whether this tool use is worth capturing: signal (edits, error→fix sequences, dependency changes) is captured; noise (reads, navigation, `git status`) is skipped.
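That decision can be sketched as below, assuming the payload shape above. The tool lists and the `git status` pattern are illustrative, not the shipped extractor's actual rules:

```typescript
interface PostToolUseEvent {
  session_id: string;
  tool_name: string;
  tool_input: Record<string, unknown>;
  cwd: string;
}

// Signal: tool uses that change state and are worth remembering.
const SIGNAL_TOOLS = new Set(["Edit", "Write", "Bash"]);
// Noise: read-only navigation that would pollute memory.
const NOISE_TOOLS = new Set(["Read", "Glob", "Grep"]);

function shouldCapture(event: PostToolUseEvent): boolean {
  if (NOISE_TOOLS.has(event.tool_name)) return false;
  if (!SIGNAL_TOOLS.has(event.tool_name)) return false;
  // Even among signal tools, skip trivial read-only shell commands.
  if (event.tool_name === "Bash") {
    const cmd = String(event.tool_input["command"] ?? "");
    if (/^git (status|log|diff)\b/.test(cmd)) return false;
  }
  return true;
}
```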

**Stop** receives JSON on stdin:

```json
{
  "session_id": "abc123",
  "hook_event_name": "Stop",
  "stop_hook_active": false,
  "last_assistant_message": "...",
  "cwd": "/path/to/project"
}
```

Stop hook actions:
1. Check `stop_hook_active` to prevent infinite loops
2. Mark session as completed in SQLite
3. Generate session summary (deferred to Phase 3)
4. Exit 0 to allow Claude to stop
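The steps above can be sketched as follows. `handleStop` and `markSessionCompleted` are illustrative names; the real hook writes to the local SQLite database rather than logging:

```typescript
interface StopEvent {
  session_id: string;
  stop_hook_active: boolean;
  cwd: string;
}

// Placeholder for the SQLite update performed by the real hook.
function markSessionCompleted(sessionId: string): void {
  console.log(`session ${sessionId} completed`);
}

function handleStop(event: StopEvent): number {
  // 1. Guard against infinite loops: if a Stop hook is already active,
  //    do nothing further and let Claude stop.
  if (event.stop_hook_active) return 0;
  // 2. Mark the session as completed.
  markSessionCompleted(event.session_id);
  // 3. Session summary generation is deferred to Phase 3.
  // 4. Exit 0 so Claude is allowed to stop.
  return 0;
}
```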

---

## 9. Search Pipeline

### Hybrid Search Architecture

```
Query: "how to handle SSE streaming errors"
+ project_canonical: "github.com/unimpossible/aimy-agent" (auto-detected)

├──→ Local SQLite FTS5 (always, instant)
│      → BM25 keyword matching
│      → Filter: project_id + lifecycle IN ('active', 'aging', 'pinned')
│      → Returns top 20 candidates with scores
│
└──→ Candengo Vector /v1/search (if online)
       → metadata_filter: project_canonical = "github.com/unimpossible/aimy-agent"
       → BGE-M3 hybrid dense+sparse
       → Cross-encoder reranking
       → Returns top 20 candidates with scores

Result Merger
  → Deduplicate by source_id
  → Normalise scores to 0-1
  → Weighted combination:
    - Candengo score × 0.6 (semantic quality)
    - Local FTS score × 0.15 (keyword precision)
    - Quality score × 0.15 (observation quality)
    - Recency bonus × 0.1 (newer slightly preferred)
  → Lifecycle weight: active=1.0, aging=0.7, pinned=1.0
  → Device boost: current device × 1.1
  → Return top N results
```
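The merger's scoring can be sketched as below. The weights and lifecycle factors match the diagram above; the `Candidate` field names are assumptions, and all input scores are presumed already normalised to 0-1:

```typescript
interface Candidate {
  sourceId: number;
  candengoScore: number; // 0-1, semantic quality
  ftsScore: number;      // 0-1, keyword precision
  qualityScore: number;  // 0-1, observation quality
  recency: number;       // 0-1, newer → higher
  lifecycle: "active" | "aging" | "pinned";
  isCurrentDevice: boolean;
}

const LIFECYCLE_WEIGHT = { active: 1.0, aging: 0.7, pinned: 1.0 } as const;

function combinedScore(c: Candidate): number {
  // Weighted combination from the diagram (weights sum to 1.0).
  const base =
    c.candengoScore * 0.6 +
    c.ftsScore * 0.15 +
    c.qualityScore * 0.15 +
    c.recency * 0.1;
  // Lifecycle weight and a 1.1× boost for the current device.
  const deviceBoost = c.isCurrentDevice ? 1.1 : 1.0;
  return base * LIFECYCLE_WEIGHT[c.lifecycle] * deviceBoost;
}
```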

**Project scoping**: Search is scoped to the current project by default. The canonical project ID (from the git remote) is sent as a metadata filter to Candengo Vector and applied as a WHERE clause in local SQLite. A search in the `aimy-agent` project on your laptop therefore matches observations from `aimy-agent` on your desktop, because both resolve to `github.com/unimpossible/aimy-agent`.

**Cross-project search**: Pass `project: "*"` to search across all projects. Useful for finding patterns that apply across codebases ("how did we handle CORS last time, in any project?").

### Result Format (Compact Index)

```markdown
### Memory Search Results (aimy-agent)

| ID | Time | T | Q | Title | By |
|----|------|---|---|-------|----|
| #574 | 4:26 PM | D | ●●●○○ | Conversation lookup allows multiple active per user | david/laptop |
| #573 | 4:26 PM | B | ●●●●○ | Chat stream crashes due to duplicate conversations | david/desktop |
| #201 | Feb 12 | P | ●●●●● | SSE streaming must wrap generator in try/finally | sarah/laptop |
| 📌 #89 | Jan 5 | DC | ●●●●● | Use FastAPI BackgroundTasks for non-blocking writes | david/laptop |

Access full details: get_observations([574, 573, 201, 89])
```

Legend: B=bugfix, D=discovery, F=feature, R=refactor, C=change, P=pattern, DC=decision, DG=digest
📌 = pinned (won't age out)