@pentatonic-ai/ai-agent-sdk 0.6.0 → 0.7.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (94) hide show
  1. package/README.md +170 -69
  2. package/bin/__tests__/callback-server.test.js +4 -1
  3. package/bin/cli.js +41 -164
  4. package/bin/commands/config.js +251 -0
  5. package/package.json +2 -1
  6. package/packages/doctor/__tests__/detect.test.js +2 -6
  7. package/packages/doctor/src/checks/local-memory.js +164 -196
  8. package/packages/doctor/src/detect.js +11 -3
  9. package/packages/memory/src/corpus/adapters.js +104 -0
  10. package/packages/memory/src/corpus/cli.js +72 -7
  11. package/packages/memory/src/corpus/index.js +1 -1
  12. package/packages/memory-engine/.env.example +13 -0
  13. package/packages/memory-engine/README.md +131 -0
  14. package/packages/memory-engine/bench/README.md +99 -0
  15. package/packages/memory-engine/bench/scorecards-engine/agent-coding__pentatonic-baseline__20260427-142523.json +1115 -0
  16. package/packages/memory-engine/bench/scorecards-engine/chat-recall__pentatonic-baseline__20260427-142648.json +819 -0
  17. package/packages/memory-engine/bench/scorecards-engine/circular-economy__pentatonic-baseline__20260427-142757.json +1278 -0
  18. package/packages/memory-engine/bench/scorecards-engine/customer-support__pentatonic-baseline__20260427-142900.json +1018 -0
  19. package/packages/memory-engine/bench/scorecards-engine/marketplace-ops__pentatonic-baseline__20260427-142957.json +1038 -0
  20. package/packages/memory-engine/bench/scorecards-engine/product-catalogue__pentatonic-baseline__20260427-143122.json +961 -0
  21. package/packages/memory-engine/bench/scorecards-engine-via-docker/agent-coding__pentatonic-memory__20260427-161812.json +1115 -0
  22. package/packages/memory-engine/bench/scorecards-engine-via-docker/chat-recall__pentatonic-memory__20260427-161701.json +819 -0
  23. package/packages/memory-engine/bench/scorecards-engine-via-docker/circular-economy__pentatonic-memory__20260427-161713.json +1278 -0
  24. package/packages/memory-engine/bench/scorecards-engine-via-docker/customer-support__pentatonic-memory__20260427-161723.json +1018 -0
  25. package/packages/memory-engine/bench/scorecards-engine-via-docker/marketplace-ops__pentatonic-memory__20260427-161732.json +1038 -0
  26. package/packages/memory-engine/bench/scorecards-engine-via-docker/product-catalogue__pentatonic-memory__20260427-161741.json +937 -0
  27. package/packages/memory-engine/bench/scorecards-engine-via-l2-7-layer-populated/agent-coding__pentatonic-memory__20260427-184718.json +1115 -0
  28. package/packages/memory-engine/bench/scorecards-engine-via-l2-7-layer-populated/chat-recall__pentatonic-memory__20260427-184614.json +819 -0
  29. package/packages/memory-engine/bench/scorecards-engine-via-l2-7-layer-populated/circular-economy__pentatonic-memory__20260427-184809.json +1278 -0
  30. package/packages/memory-engine/bench/scorecards-engine-via-l2-7-layer-populated/customer-support__pentatonic-memory__20260427-184854.json +1018 -0
  31. package/packages/memory-engine/bench/scorecards-engine-via-l2-7-layer-populated/marketplace-ops__pentatonic-memory__20260427-184929.json +1038 -0
  32. package/packages/memory-engine/bench/scorecards-engine-via-l2-7-layer-populated/product-catalogue__pentatonic-memory__20260427-185015.json +961 -0
  33. package/packages/memory-engine/bench/scorecards-engine-via-l2-empty-layers/agent-coding__pentatonic-memory__20260427-175252.json +1115 -0
  34. package/packages/memory-engine/bench/scorecards-engine-via-l2-empty-layers/chat-recall__pentatonic-memory__20260427-175312.json +819 -0
  35. package/packages/memory-engine/bench/scorecards-engine-via-l2-empty-layers/circular-economy__pentatonic-memory__20260427-175335.json +1278 -0
  36. package/packages/memory-engine/bench/scorecards-engine-via-l2-empty-layers/customer-support__pentatonic-memory__20260427-175355.json +1018 -0
  37. package/packages/memory-engine/bench/scorecards-engine-via-l2-empty-layers/marketplace-ops__pentatonic-memory__20260427-175413.json +1038 -0
  38. package/packages/memory-engine/bench/scorecards-engine-via-l2-empty-layers/product-catalogue__pentatonic-memory__20260427-175430.json +883 -0
  39. package/packages/memory-engine/bench/scorecards-engine-via-shim/agent-coding__pentatonic-memory__20260427-155409.json +1115 -0
  40. package/packages/memory-engine/bench/scorecards-engine-via-shim/chat-recall__pentatonic-memory__20260427-155421.json +819 -0
  41. package/packages/memory-engine/bench/scorecards-engine-via-shim/circular-economy__pentatonic-memory__20260427-155433.json +1278 -0
  42. package/packages/memory-engine/bench/scorecards-engine-via-shim/customer-support__pentatonic-memory__20260427-155443.json +1018 -0
  43. package/packages/memory-engine/bench/scorecards-engine-via-shim/marketplace-ops__pentatonic-memory__20260427-155453.json +1038 -0
  44. package/packages/memory-engine/bench/scorecards-engine-via-shim/product-catalogue__pentatonic-memory__20260427-155503.json +937 -0
  45. package/packages/memory-engine/bench/scorecards-pentatonic-baseline/agent-coding__pentatonic-memory-latest__20260427-145103.json +1115 -0
  46. package/packages/memory-engine/bench/scorecards-pentatonic-baseline/agent-coding__pentatonic-memory__20260427-144909.json +1115 -0
  47. package/packages/memory-engine/bench/scorecards-pentatonic-baseline/chat-recall__pentatonic-memory-latest__20260427-145153.json +819 -0
  48. package/packages/memory-engine/bench/scorecards-pentatonic-baseline/chat-recall__pentatonic-memory__20260427-145120.json +542 -0
  49. package/packages/memory-engine/bench/scorecards-pentatonic-baseline/circular-economy__pentatonic-memory-latest__20260427-145313.json +1278 -0
  50. package/packages/memory-engine/bench/scorecards-pentatonic-baseline/circular-economy__pentatonic-memory__20260427-145207.json +894 -0
  51. package/packages/memory-engine/bench/scorecards-pentatonic-baseline/customer-support__pentatonic-memory-latest__20260427-145412.json +1018 -0
  52. package/packages/memory-engine/bench/scorecards-pentatonic-baseline/customer-support__pentatonic-memory__20260427-145327.json +680 -0
  53. package/packages/memory-engine/bench/scorecards-pentatonic-baseline/marketplace-ops__pentatonic-memory-latest__20260427-145517.json +1038 -0
  54. package/packages/memory-engine/bench/scorecards-pentatonic-baseline/marketplace-ops__pentatonic-memory__20260427-145422.json +693 -0
  55. package/packages/memory-engine/bench/scorecards-pentatonic-baseline/product-catalogue__pentatonic-memory-latest__20260427-145616.json +961 -0
  56. package/packages/memory-engine/bench/scorecards-pentatonic-baseline/product-catalogue__pentatonic-memory__20260427-145528.json +727 -0
  57. package/packages/memory-engine/compat/Dockerfile +11 -0
  58. package/packages/memory-engine/compat/server.py +680 -0
  59. package/packages/memory-engine/docker-compose.yml +243 -0
  60. package/packages/memory-engine/docs/MIGRATION.md +178 -0
  61. package/packages/memory-engine/docs/RUNBOOK-AWS.md +375 -0
  62. package/packages/memory-engine/docs/why-v05-underperforms.md +138 -0
  63. package/packages/memory-engine/engine/README.md +52 -0
  64. package/packages/memory-engine/engine/l2-hybridrag-proxy.py +1543 -0
  65. package/packages/memory-engine/engine/l5-comms-layer.py +663 -0
  66. package/packages/memory-engine/engine/l6-document-store.py +1018 -0
  67. package/packages/memory-engine/engine/services/l2/Dockerfile +41 -0
  68. package/packages/memory-engine/engine/services/l2/init_databases.py +81 -0
  69. package/packages/memory-engine/engine/services/l2/l2-hybridrag-proxy.py +1543 -0
  70. package/packages/memory-engine/engine/services/l4/Dockerfile +15 -0
  71. package/packages/memory-engine/engine/services/l4/server.py +235 -0
  72. package/packages/memory-engine/engine/services/l5/Dockerfile +9 -0
  73. package/packages/memory-engine/engine/services/l5/l5-comms-layer.py +678 -0
  74. package/packages/memory-engine/engine/services/l6/Dockerfile +11 -0
  75. package/packages/memory-engine/engine/services/l6/l6-document-store.py +1016 -0
  76. package/packages/memory-engine/engine/services/nv-embed/Dockerfile +28 -0
  77. package/packages/memory-engine/engine/services/nv-embed/server.py +152 -0
  78. package/packages/memory-engine/pme_memory/__init__.py +0 -0
  79. package/packages/memory-engine/pme_memory/__main__.py +129 -0
  80. package/packages/memory-engine/pme_memory/artifacts.py +95 -0
  81. package/packages/memory-engine/pme_memory/embed.py +74 -0
  82. package/packages/memory-engine/pme_memory/health.py +36 -0
  83. package/packages/memory-engine/pme_memory/hygiene.py +159 -0
  84. package/packages/memory-engine/pme_memory/indexer.py +200 -0
  85. package/packages/memory-engine/pme_memory/needs.py +55 -0
  86. package/packages/memory-engine/pme_memory/provenance.py +80 -0
  87. package/packages/memory-engine/pme_memory/scoring.py +168 -0
  88. package/packages/memory-engine/pme_memory/search.py +52 -0
  89. package/packages/memory-engine/pme_memory/store.py +86 -0
  90. package/packages/memory-engine/pme_memory/synthesis.py +114 -0
  91. package/packages/memory-engine/pyproject.toml +65 -0
  92. package/packages/memory-engine/scripts/kg-extractor.py +557 -0
  93. package/packages/memory-engine/scripts/kg-preflexor-v2.py +738 -0
  94. package/packages/memory-engine/tests/test_api_contract.sh +57 -0
package/README.md CHANGED
@@ -27,7 +27,7 @@ Two products that share one TES account, one install line, and one dashboard:
27
27
 
28
28
  | Product | What it does | When you want it |
29
29
  |---|---|---|
30
- | **Memory** | Persistent, searchable memory for your AI agent — semantic + keyword retrieval, distillation, decay, repo onboarding. Runs locally (Docker) or hosted (TES). | You want your agent to remember conversations, preferences, and codebase context across sessions. |
30
+ | **Memory** | Persistent, searchable memory for your AI agent — 7-layer hybrid retrieval (BM25 + vector + KG + reranker), repo onboarding via references. Runs locally (Docker) or hosted (TES). | You want your agent to remember conversations, preferences, and codebase context across sessions. |
31
31
  | **Observability** | Wrap your LLM client and capture every call — tokens, tool calls, latency, content. Events flow to TES for the dashboard, analytics, and search attribution. | You want to know what your agent is actually doing in production. |
32
32
 
33
33
  Both products are sold separately, but you can use either, both, or neither. Plugins for **Claude Code** and **OpenClaw** install everything at once if you'd rather skip the SDK glue.
@@ -44,10 +44,9 @@ Both products are sold separately, but you can use either, both, or neither. Plu
44
44
 
45
45
  - [TES — the platform](#tes--the-platform)
46
46
  - [Memory](#memory)
47
- - [Hosted (cloud)](#hosted-cloud)
48
47
  - [Local (self-hosted)](#local-self-hosted)
48
+ - [Hosted (cloud)](#hosted-cloud)
49
49
  - [Use as a library](#use-as-a-library)
50
- - [Distilled memory](#distilled-memory)
51
50
  - [Observability](#observability)
52
51
  - [Wrap your LLM client](#wrap-your-llm-client)
53
52
  - [Supported providers](#supported-providers)
@@ -87,63 +86,155 @@ To check connection state later: `npx @pentatonic-ai/ai-agent-sdk whoami`. To po
87
86
 
88
87
  ## Memory
89
88
 
90
- Persistent, searchable memory for AI agents. Multi-signal retrieval (vector + BM25 + recency + frequency), HyDE query expansion, atomic-fact distillation, and four memory layers (episodic, semantic, procedural, working).
89
+ Persistent, searchable memory for AI agents. Backed by a 7-layer hybrid retrieval engine — BM25 keyword (L0), core files (L1), HybridRAG orchestrator (L2), Knowledge Graph entities (L3), vector index (L4), comms-namespace vectors (L5), and a document store with cross-encoder reranker (L6). Reciprocal Rank Fusion stitches them at query time.
91
90
 
92
- Two deployment modes same API, same plugins, same library:
91
+ Same engine, same wire format (`/store`, `/search`, `/forget`, `/store-batch`, `/health`), two deployment modes:
93
92
 
94
- ### Hosted (cloud)
93
+ ### Local (self-hosted)
94
+
95
+ Run the full engine stack on your own machine via Docker. No API keys, no cloud, fully offline. Embeddings come from your local Ollama; quality depends on the model you pull (768d `nomic-embed-text` is the default and works fine on a laptop).
96
+
97
+ **Prerequisites**
98
+
99
+ - Docker + Docker Compose v2
100
+ - Ollama installed on the host (https://ollama.com)
101
+ - A pulled embedding model: `ollama pull nomic-embed-text`
95
102
 
96
- Run on Pentatonic's infrastructure. Higher-dimensional embeddings (NV-Embed-v2, 4096d), per-tenant Postgres, team-wide shared memory, the dashboard.
103
+ If you'll run Claude Code (or anything else) inside a Docker container that needs to reach the engine, **make Ollama listen on all interfaces** so containers can reach it via `host.docker.internal`:
97
104
 
98
105
  ```bash
99
- # 1. Get a TES account (see [TES — the platform](#tes--the-platform))
100
- npx @pentatonic-ai/ai-agent-sdk login
106
+ sudo mkdir -p /etc/systemd/system/ollama.service.d
107
+ echo -e '[Service]\nEnvironment="OLLAMA_HOST=0.0.0.0:11434"' \
108
+ | sudo tee /etc/systemd/system/ollama.service.d/override.conf
109
+ sudo systemctl daemon-reload
110
+ sudo systemctl restart ollama
111
+ ```
101
112
 
102
- # 2. Install the SDK
103
- npm install @pentatonic-ai/ai-agent-sdk
104
- # or: pip install pentatonic-ai-agent-sdk
113
+ **Bring up the engine**
114
+
115
+ ```bash
116
+ git clone https://github.com/Pentatonic-Ltd/ai-agent-sdk.git
117
+ cd ai-agent-sdk/packages/memory-engine
118
+
119
+ # Default .env points at Ollama on the host. Edit if your Ollama is
120
+ # elsewhere or you want to use a higher-quality model (e.g. mxbai-embed-large
121
+ # at 1024d → set EMBED_DIM=1024 and EMBED_MODEL_NAME=mxbai-embed-large).
122
+ cat > .env <<'EOF'
123
+ PME_NV_EMBED_ENABLED=false
124
+ NV_EMBED_URL=http://host.docker.internal:11434/v1/embeddings
125
+ EMBED_MODEL_NAME=nomic-embed-text
126
+ EMBED_DIM=768
127
+ OLLAMA_DIM=768
128
+ PME_OLLAMA_URL=http://host.docker.internal:11434/api/embeddings
129
+ PME_EMBED_MODEL=nomic-embed-text
130
+ L5_OLLAMA_EMBED_URL=http://host.docker.internal:11434/api/embed
131
+ L5_OLLAMA_EMBED_MODEL=nomic-embed-text
132
+ PME_HYDE_ENABLED=false
133
+ PME_RERANK_ENABLED=true
134
+ PME_PORT=8099
135
+ CLIENT_ID=local
136
+ NEO4J_AUTH=neo4j/local-dev-pw
137
+ NEO4J_PASSWORD=local-dev-pw
138
+ EOF
139
+
140
+ docker compose up -d --scale nv-embed=0
105
141
  ```
106
142
 
107
- That's itmemory operations now go through TES.
143
+ First run pulls images and builds engine containers — ~10–15 min. Subsequent restarts take seconds.
108
144
 
109
- ### Local (self-hosted)
145
+ **Verify**
146
+
147
+ ```bash
148
+ curl -s http://localhost:8099/health | jq
149
+ # Status should be "ok" or "degraded" with most layers reporting ok.
150
+
151
+ curl -sX POST http://localhost:8099/store \
152
+ -H "content-type: application/json" \
153
+ -d '{"content":"hello memory","metadata":{"arena":"local"}}' | jq
154
+
155
+ curl -sX POST http://localhost:8099/search \
156
+ -H "content-type: application/json" \
157
+ -d '{"query":"hello","limit":3,"min_score":0.001}' | jq
158
+ ```
159
+
160
+ If `/search` returns the row from `/store`, the engine is live.
110
161
 
111
- Run the full stack on your own machine. PostgreSQL + pgvector + Ollama in Docker. No API keys, no cloud. Pi 5 with 8GB RAM works fine (`nomic-embed-text` ~300MB + `llama3.2:3b` ~2GB).
162
+ **Connect Claude Code**
163
+
164
+ The `tes-memory` plugin's hooks already speak the engine's wire format. Three steps:
165
+
166
+ 1. Install the plugin (once):
167
+ ```
168
+ /plugin marketplace add Pentatonic-Ltd/ai-agent-sdk
169
+ /plugin install tes-memory@pentatonic-ai
170
+ ```
171
+ 2. Point it at your local engine. Edit `~/.claude-pentatonic/tes-memory.local.md` (create if missing):
172
+ ```yaml
173
+ ---
174
+ mode: local
175
+ memory_url: http://localhost:8099
176
+ ---
177
+ ```
178
+ 3. Reload: `/reload-plugins` (or restart Claude Code if status reports stale state — MCP server processes need a full restart to pick up plugin updates).
179
+
180
+ Verify:
181
+
182
+ ```
183
+ /tes-memory:tes-status
184
+ ```
185
+
186
+ Should report `✓ Connected to local memory engine`. Now every prompt auto-searches engine memory and every turn auto-stores. The footer `🧠 Matched N memories from Pentatonic Memory` shows hits.
187
+
188
+ **Seed memory from your codebase or docs (optional)**
189
+
190
+ Drop the cold-start problem on day one by pre-populating the engine with references to your code/docs:
112
191
 
113
192
  ```bash
114
- npx @pentatonic-ai/ai-agent-sdk memory
193
+ MEMORY_ENGINE_URL=http://localhost:8099 \
194
+ npx @pentatonic-ai/ai-agent-sdk ingest ~/code/my-project
115
195
  ```
116
196
 
117
- This starts Postgres + pgvector, Ollama, and the memory server. It pulls embedding and chat models, and writes the local config.
197
+ References-mode by default — stores path + signature pointers, not full file contents. See [Repository Onboarding](#repository-onboarding-corpus-ingest) for details.
198
+
199
+ **Tuning**
118
200
 
119
- Change models:
201
+ Change embedding model: pull a different one, edit `EMBED_MODEL_NAME` + `EMBED_DIM` in `.env`, then `docker compose down -v && docker compose up -d --scale nv-embed=0` (the `-v` is required because Milvus collections are dim-locked at creation; switching dims means recreating).
202
+
203
+ | Model | Dim | Notes |
204
+ |---|---|---|
205
+ | `nomic-embed-text` (default) | 768 | Smallest; works on any laptop |
206
+ | `mxbai-embed-large` | 1024 | Better recall; ~600 MB download |
207
+ | `nv-embed-v2` (via gateway) | 4096 | Production-grade; needs a hosted endpoint or GPU |
208
+
209
+ ### Hosted (cloud)
210
+
211
+ Run on Pentatonic's infrastructure. NV-Embed-v2 (4096d) embeddings via the AI gateway, managed Postgres/Neo4j/Qdrant/Milvus, dashboard. The engine still ships in this repo — hosted just deploys it for you.
120
212
 
121
213
  ```bash
122
- EMBEDDING_MODEL=mxbai-embed-large LLM_MODEL=qwen2.5:7b npx @pentatonic-ai/ai-agent-sdk memory
214
+ # 1. Get a TES account
215
+ npx @pentatonic-ai/ai-agent-sdk login
216
+
217
+ # 2. Install the SDK
218
+ npm install @pentatonic-ai/ai-agent-sdk
219
+ # or: pip install pentatonic-ai-agent-sdk
123
220
  ```
124
221
 
222
+ Memory operations route through TES → engine. No client-side change between local and hosted.
223
+
125
224
  ### Use as a library
126
225
 
127
226
  ```javascript
128
- import { createMemorySystem } from '@pentatonic-ai/ai-agent-sdk/memory';
227
+ import { engineAdapter, ingestCorpus } from '@pentatonic-ai/ai-agent-sdk/memory/corpus';
129
228
 
130
- const memory = createMemorySystem({
131
- db: pgPool,
132
- embedding: { url: 'http://localhost:11434/v1', model: 'nomic-embed-text' },
133
- llm: { url: 'http://localhost:11434/v1', model: 'llama3.2:3b' },
229
+ const adapter = engineAdapter({
230
+ engineUrl: 'http://localhost:8099',
231
+ arena: 'my-app',
134
232
  });
135
-
136
- await memory.migrate();
137
- await memory.ensureLayers('my-app');
138
- await memory.ingest('User prefers dark mode', { clientId: 'my-app' });
139
- const results = await memory.search('preferences', { clientId: 'my-app' });
233
+ await adapter.init();
234
+ await adapter.ingestChunk('User prefers dark mode', { kind: 'note' });
140
235
  ```
141
236
 
142
- ### Distilled memory
143
-
144
- A background LLM pass extracts atomic facts from each raw turn and stores each as its own node in the semantic layer, linked back to the source. A query like *"what does Phil drink?"* matches *"Phil drinks cortado"* more reliably than a mixed paragraph covering food, drinks, and hobbies. Default-on; the raw turn is still preserved.
145
-
146
- > **Store latency note (v0.5.4+):** on the local memory server, `store_memory` now awaits distillation before returning instead of running it fire-and-forget. This fixed a bug where distillation was being killed mid-flight (atoms never got embeddings, so they were unreachable by semantic search), but it means stores now take as long as your configured LLM takes to produce atoms — typically 5–30s on `llama3.2:3b`, up to the `chat()` timeout ceiling (60s default, overridable via `opts.timeout`). Cloudflare Worker deployments pass `ctx.waitUntil` and still return fast. Set `opts.distill: false` on the ingest call if you want the old fast-return behaviour at the cost of no atoms.
237
+ For raw `/search` and `/store`, just `fetch()` against `${engineUrl}/search` etc. The wire format is documented in `packages/memory-engine/docs/MIGRATION.md`.
147
238
 
148
239
  ---
149
240
 
@@ -216,17 +307,26 @@ Works with both local and hosted memory. Install once, switch modes via config.
216
307
  /plugin install tes-memory@pentatonic-ai
217
308
  ```
218
309
 
219
- For hosted TES, run `npx @pentatonic-ai/ai-agent-sdk login` once in your terminal the plugin's MCP server, hooks, and tools all auto-discover the credentials written to `~/.config/tes/credentials.json`. To verify the connection later, ask Claude `/tes-memory:tes-status`.
310
+ **Local engine** — bring up the engine first ([Memory > Local](#local-self-hosted)), then point the plugin at it. Edit `~/.claude-pentatonic/tes-memory.local.md`:
311
+
312
+ ```yaml
313
+ ---
314
+ mode: local
315
+ memory_url: http://localhost:8099
316
+ ---
317
+ ```
318
+
319
+ **Hosted TES** — run `login` once, the plugin auto-discovers `~/.config/tes/credentials.json`:
220
320
 
221
- For local memory:
222
321
  ```bash
223
- npx @pentatonic-ai/ai-agent-sdk memory
322
+ npx @pentatonic-ai/ai-agent-sdk login
224
323
  ```
225
324
 
226
- **What it tracks:**
227
- - Every conversation turn — user messages, assistant responses, tool calls, duration
228
- - Automatic memory search — relevant memories injected as context on every prompt
229
- - Automatic memory storage every turn stored with embeddings and HyDE queries
325
+ Either way, verify with `/tes-memory:tes-status` in Claude Code. The plugin's MCP server, hooks, and tools all read the same config.
326
+
327
+ **What it tracks (auto, every turn):**
328
+ - Memory search at prompt time — relevant memories injected as context
329
+ - Memory store at turn end — every conversation turn persisted
230
330
  - Token usage — input, output, cache read, cache creation tokens per turn
231
331
 
232
332
  ### OpenClaw
@@ -249,7 +349,7 @@ Or use the CLI directly:
249
349
  openclaw pentatonic-memory local
250
350
  ```
251
351
 
252
- **What it does:** OpenClaw's context engine hooks fire on every lifecycle event — `ingest` stores user/assistant messages with embeddings + HyDE + distillation; `assemble` injects relevant memories as system-prompt context before every model run; `compact` runs the decay cycle when the context window fills; `after-turn` consolidates high-access memories into the semantic layer. Plus agent-callable tools: `memory_search`, `memory_store`, `memory_layers`.
352
+ **What it does:** OpenClaw's context engine hooks fire on every lifecycle event — `ingest` stores user/assistant messages via the engine's `/store` endpoint (BM25 + vector + KG indexing in parallel); `assemble` calls `/search` to inject relevant memories as system-prompt context; `compact` and `after-turn` are managed by the engine's own decay/consolidation. Plus agent-callable tools: `memory_search`, `memory_store`, `memory_layers`.
253
353
 
254
354
  After setup, config lives in `~/.openclaw/pentatonic-memory.json`. To switch modes, run setup again or edit directly.
255
355
 
@@ -263,11 +363,7 @@ You can also configure via `openclaw.json`:
263
363
  "pentatonic-memory": {
264
364
  "enabled": true,
265
365
  "config": {
266
- "database_url": "postgres://memory:memory@localhost:5433/memory",
267
- "embedding_url": "http://localhost:11435/v1",
268
- "embedding_model": "nomic-embed-text",
269
- "llm_url": "http://localhost:11435/v1",
270
- "llm_model": "llama3.2:3b"
366
+ "memory_url": "http://localhost:8099"
271
367
  }
272
368
  }
273
369
  }
@@ -396,9 +492,11 @@ import { normalizeResponse } from "@pentatonic-ai/ai-agent-sdk";
396
492
  const { content, model, usage, toolCalls } = normalizeResponse(openaiResponse);
397
493
  ```
398
494
 
399
- ### `createMemorySystem(deps)` — Memory
495
+ ### `engineAdapter(config)` — Memory
496
+
497
+ Thin HTTP client for the memory engine. `config = { engineUrl, arena, apiKey? }`. Returns `{ ingestChunk(content, metadata), deleteByCorpusFile(repoAbs, relPath), init() }`. See [Use as a library](#use-as-a-library).
400
498
 
401
- Returns a memory instance with `.migrate()`, `.ensureLayers(clientId)`, `.ingest(content, opts)`, `.search(query, opts)`, and more. See [Use as a library](#use-as-a-library).
499
+ For raw `/store` / `/search` calls, just `fetch()` against `${engineUrl}` directly — the wire format is documented in `packages/memory-engine/docs/MIGRATION.md`.
402
500
 
403
501
  ---
404
502
 
@@ -424,9 +522,9 @@ npx @pentatonic-ai/ai-agent-sdk doctor --path local
424
522
  What gets checked:
425
523
 
426
524
  - **Universal** — Node version, disk space, SDK config-file permissions
427
- - **Local Memory** — Postgres + pgvector + migrations, embedding/LLM endpoints, memory server port
525
+ - **Local engine** — engine `/health`, per-layer health (L0–L6), embedding endpoint reachability
428
526
  - **Hosted TES** — endpoint reachable, API key authenticates
429
- - **Self-hosted platform** — HybridRAG, Qdrant, Neo4j, vLLM (each optional, skipped when its env var is unset)
527
+ - **Plugin config** — `tes-memory.local.md` parses, `memory_url` reachable
430
528
 
431
529
  ### Plugins
432
530
 
@@ -458,24 +556,27 @@ See [`packages/doctor/README.md`](packages/doctor/README.md) for the full plugin
458
556
  ## Architecture
459
557
 
460
558
  ```
461
- Your code
462
- |
463
- +---------------+---------------+
464
- | |
465
- Memory product Observability product
466
- (createMemorySystem) (TESClient.wrap)
467
- | |
468
- | |
469
- +----+----+ |
470
- | | |
471
- Local Hosted ---------------------- TES
472
- (Docker) (Cloudflare cloud)
473
- | |
474
- PG+pgvector PG, R2, Queues,
475
- + Ollama Workers, Modules
476
- (deep-memory,
477
- conversation-
478
- analytics, )
559
+ Your code / Claude Code plugin / OpenClaw plugin
560
+ |
561
+ +-------------------+--------------------+
562
+ | |
563
+ Memory product Observability product
564
+ (engine HTTP API) (TESClient.wrap)
565
+ | |
566
+ | POST /store /search /forget | CHAT_TURN events
567
+ ▼ ▼
568
+ +----------------+ +-----------------+
569
+ | memory engine | | TES |
570
+ | (compat shim) | | (Cloudflare) |
571
+ +----------------+ | Workers, R2, |
572
+ | | Queues, Pages |
573
+ +----------+----------+ +--------+--------+
574
+ | | |
575
+ Local Hosted ---------------------------+
576
+ (your machine) (Pentatonic-managed)
577
+ | |
578
+ docker compose AWS/GCP container cluster
579
+ + host Ollama + AI gateway (NV-Embed-v2)
479
580
  ```
480
581
 
481
582
  Plugins (Claude Code, OpenClaw) are lightweight integrations on top of both products — they call into memory and emit observability events on the user's behalf.
@@ -1,7 +1,10 @@
1
1
  import { startCallbackServer } from "../lib/callback-server.js";
2
2
 
3
3
  async function fetchCallback(port, qs) {
4
- const url = `http://localhost:${port}/callback?${qs}`;
4
+ // Use 127.0.0.1 not "localhost" — undici (Node 18+) resolves localhost to
5
+ // ::1 first, but the server binds to 127.0.0.1 only, so on IPv6-preferring
6
+ // hosts (GitHub Actions runners) the IPv6 attempt ECONNREFUSEs.
7
+ const url = `http://127.0.0.1:${port}/callback?${qs}`;
5
8
  const res = await fetch(url);
6
9
  return { status: res.status, text: await res.text() };
7
10
  }
package/bin/cli.js CHANGED
@@ -1,10 +1,6 @@
1
1
  #!/usr/bin/env node
2
2
 
3
3
  import { createInterface } from "readline";
4
- import { execFileSync } from "child_process";
5
- import { existsSync, mkdirSync, readFileSync, writeFileSync } from "fs";
6
- import { join } from "path";
7
- import { homedir } from "os";
8
4
 
9
5
  const DEFAULT_ENDPOINT = "https://api.pentatonic.com";
10
6
 
@@ -31,10 +27,10 @@ function parseArgs() {
31
27
  flags.alert = true;
32
28
  } else if (a === "--no-plugins") {
33
29
  flags.noPlugins = true;
34
- } else if (a === "--local") {
35
- flags.local = true;
36
- } else if (a === "--remote") {
37
- flags.remote = true;
30
+ } else if (a === "--engine-url" && args[i + 1]) {
31
+ flags.engineUrl = args[++i];
32
+ } else if (a.startsWith("--engine-url=")) {
33
+ flags.engineUrl = a.split("=")[1];
38
34
  } else if (!a.startsWith("--")) {
39
35
  // First non-flag arg is the command; subsequent ones are subcommand
40
36
  // arguments handled by the dispatched cmd (e.g. `ingest <path>`).
@@ -77,124 +73,14 @@ function ask(question) {
77
73
  return new Promise((resolve) => rl.question(question, resolve));
78
74
  }
79
75
 
80
- function spinner(text) {
81
- const frames = ["⠋", "⠙", "⠹", "⠸", "⠼", "⠴", "⠦", "⠧", "⠇", "⠏"];
82
- let i = 0;
83
- const id = setInterval(() => {
84
- process.stdout.write(`\r${frames[i++ % frames.length]} ${text}`);
85
- }, 80);
86
- return {
87
- stop(result) {
88
- clearInterval(id);
89
- process.stdout.write(`\r✓ ${result}\n`);
90
- },
91
- fail(msg) {
92
- clearInterval(id);
93
- process.stdout.write(`\r✗ ${msg}\n`);
94
- },
95
- };
96
- }
97
-
98
- async function setupLocalMemory() {
99
- console.log(`\n Local Memory Setup\n`);
100
-
101
- // Check Docker
102
- try {
103
- execFileSync("docker", ["info"], { stdio: "pipe" });
104
- } catch {
105
- console.error(" Error: Docker is required. Install it from https://docker.com\n");
106
- process.exit(1);
107
- }
108
-
109
- const memoryDir = new URL("../packages/memory", import.meta.url).pathname;
110
-
111
- // Start infrastructure + memory server
112
- const infraSpinner = spinner("Starting memory server + PostgreSQL + Ollama...");
113
- try {
114
- execFileSync("docker", ["compose", "up", "-d", "memory", "postgres", "ollama"], {
115
- cwd: memoryDir,
116
- stdio: "pipe",
117
- });
118
- infraSpinner.stop("Memory stack running!");
119
- } catch (err) {
120
- infraSpinner.fail(`Failed to start: ${err.message}`);
121
- process.exit(1);
122
- }
123
-
124
- // Pull models
125
- const embModel = process.env.EMBEDDING_MODEL || "nomic-embed-text";
126
- const llmModel = process.env.LLM_MODEL || "llama3.2:3b";
127
-
128
- const embSpinner = spinner(`Pulling ${embModel}...`);
129
- try {
130
- execFileSync("docker", ["compose", "exec", "ollama", "ollama", "pull", embModel], {
131
- cwd: memoryDir,
132
- stdio: "pipe",
133
- });
134
- embSpinner.stop(`${embModel} ready!`);
135
- } catch {
136
- embSpinner.fail(`Failed to pull ${embModel}. Run manually: docker compose exec ollama ollama pull ${embModel}`);
137
- }
138
-
139
- const llmSpinner = spinner(`Pulling ${llmModel}...`);
140
- try {
141
- execFileSync("docker", ["compose", "exec", "ollama", "ollama", "pull", llmModel], {
142
- cwd: memoryDir,
143
- stdio: "pipe",
144
- });
145
- llmSpinner.stop(`${llmModel} ready!`);
146
- } catch {
147
- llmSpinner.fail(`Failed to pull ${llmModel}. Run manually: docker compose exec ollama ollama pull ${llmModel}`);
148
- }
149
-
150
- // Write local config (warn if hosted config exists)
151
- const configDir = join(homedir(), ".claude-pentatonic");
152
- if (!existsSync(configDir)) {
153
- mkdirSync(configDir, { recursive: true });
154
- }
155
-
156
- const configPath = join(configDir, "tes-memory.local.md");
157
- if (existsSync(configPath)) {
158
- const existing = readFileSync(configPath, "utf-8");
159
- if (existing.includes("tes_endpoint") && !existing.includes("mode: local")) {
160
- console.log("\n ⚠ Hosted TES config detected. Switching to local mode will");
161
- console.log(" disable hosted memory. To restore, run: npx @pentatonic-ai/ai-agent-sdk init\n");
162
- const confirm = await ask(" Switch to local mode? (y/n): ");
163
- if (confirm.toLowerCase() !== "y") {
164
- console.log(" Cancelled. Hosted config unchanged.\n");
165
- rl.close();
166
- return;
167
- }
168
- }
169
- }
170
-
171
- writeFileSync(
172
- configPath,
173
- `---
174
- mode: local
175
- memory_url: http://localhost:3333
176
- ---
177
- `
178
- );
179
-
180
- console.log(`\n Config written to ${configPath}`);
181
-
182
- const sdkDir = new URL("..", import.meta.url).pathname;
183
-
184
- console.log(`
185
- Memory server: http://localhost:3333
186
- Hooks are auto-configured to use local memory.
187
-
188
- Install the plugin in Claude Code:
189
- /plugin marketplace add Pentatonic-Ltd/ai-agent-sdk
190
- /plugin install tes-memory@pentatonic-ai
191
-
192
- You're ready! Every prompt auto-searches memory,
193
- every turn auto-stores. No MCP setup needed.
194
- `);
195
-
196
- rl.close();
197
- }
76
+ // setupLocalMemory + its `spinner` helper were the legacy "bring up
77
+ // Postgres + Ollama" wrapper for the in-process memory server. Removed
78
+ // in favour of:
79
+ // - `tes config local` — writes the plugin config + prints engine
80
+ // bring-up instructions
81
+ // - `cd packages/memory-engine && docker compose up -d` → runs the
82
+ // actual engine
83
+ // `ask` is kept for any future interactive prompts.
198
84
 
199
85
 
200
86
  async function main() {
@@ -233,11 +119,23 @@ async function main() {
233
119
  process.exit(exitCode);
234
120
  }
235
121
 
236
- // `memory` is kept as a shortcut to skip the local-or-remote question
237
- // for users with that command in scripts/docs. New users should use init.
238
- if (flags.command === "memory") {
239
- await setupLocalMemory();
240
- return;
122
+ // tes config <local|hosted|show> point Claude Code's tes-memory
123
+ // plugin at a memory backend, or inspect what's configured. Each
124
+ // subcommand is a thin scaffold:
125
+ // local → write mode: local + memory_url; print engine bring-up steps
126
+ // hosted → run the login flow (delegates to runLoginCommand)
127
+ // show → read and print the current plugin config
128
+ // Future: `tes config set <key> <value>` for engine env-var tweaks.
129
+ if (flags.command === "config") {
130
+ const sub = process.argv.slice(3).find((a) => !a.startsWith("--"));
131
+ const { runConfigCommand } = await import("./commands/config.js");
132
+ const { exitCode } = await runConfigCommand({
133
+ sub,
134
+ endpoint: TES_ENDPOINT,
135
+ engineUrl: flags.engineUrl,
136
+ });
137
+ rl.close();
138
+ process.exit(exitCode);
241
139
  }
242
140
 
243
141
  // Corpus subcommands — onboarding/repo ingest (spec 01)
@@ -268,18 +166,20 @@ async function main() {
268
166
  process.exit(code);
269
167
  }
270
168
 
271
- if (flags.command !== "init") {
272
- console.log(`
169
+ console.log(`
273
170
  @pentatonic-ai/ai-agent-sdk
274
171
 
275
172
  Usage:
276
- npx @pentatonic-ai/ai-agent-sdk login Sign in with TES (browser-based OAuth)
173
+ npx @pentatonic-ai/ai-agent-sdk login First-time hosted setup: browser sign-in + writes credentials
277
174
  npx @pentatonic-ai/ai-agent-sdk whoami Show current login identity
278
- npx @pentatonic-ai/ai-agent-sdk init [deprecated] Alias for 'login'
279
- npx @pentatonic-ai/ai-agent-sdk init --local Set up local Docker memory stack
280
- npx @pentatonic-ai/ai-agent-sdk memory Shortcut for 'init --local'
175
+ npx @pentatonic-ai/ai-agent-sdk config <sub> Configure memory backend; see 'config --help'
281
176
  npx @pentatonic-ai/ai-agent-sdk doctor Run health checks (exit 0/1/2)
282
177
 
178
+ config subcommands:
179
+ config local Point plugin at a local memory engine
180
+ config hosted Switch to hosted (delegates to login)
181
+ config show Print current plugin config + creds
182
+
283
183
  Memory corpus (onboarding):
284
184
  npx @pentatonic-ai/ai-agent-sdk onboard Interactive: pick paths, ingest, install hooks
285
185
  npx @pentatonic-ai/ai-agent-sdk ingest <path> One-shot ingest of a path (any folder works)
@@ -290,8 +190,8 @@ Memory corpus (onboarding):
290
190
  npx @pentatonic-ai/ai-agent-sdk corpus reset Wipe local corpus state
291
191
  npx @pentatonic-ai/ai-agent-sdk install-git-hook Install post-commit hook in cwd
292
192
 
293
- Tenant for corpus commands is read from these env vars:
294
- TES_ENDPOINT, TES_CLIENT_ID, TES_API_KEY
193
+ Corpus commands route to the backend configured via 'config' (local engine
194
+ or hosted TES). Override with env vars: MEMORY_ENGINE_URL, TES_ENDPOINT,
295
195
 
296
196
  doctor flags:
297
197
  --json Emit a JSON report
@@ -301,31 +201,8 @@ doctor flags:
301
201
  --timeout <ms> Per-check timeout (default 10000)
302
202
 
303
203
  For docs, see https://api.pentatonic.com
304
- `);
305
- process.exit(0);
306
- }
307
-
308
- // init: --local still routes to setupLocalMemory (Docker stack —
309
- // separate concern). Anything else (no flag, --remote, mode prompt)
310
- // delegates to login via runInitAlias which emits a one-line
311
- // deprecation warning. setupHostedTes (the old form-based hosted
312
- // flow) is gone; init has been replaced by `login` for one major
313
- // release, then `init` itself goes away.
314
- if (flags.local && flags.remote) {
315
- console.error("\n Error: --local and --remote are mutually exclusive\n");
316
- process.exit(1);
317
- }
318
- if (flags.local) {
319
- await setupLocalMemory();
320
- return;
321
- }
322
- // Non-local path → login alias.
323
- const { runInitAlias } = await import("./commands/login.js");
324
- const { exitCode } = await runInitAlias({
325
- endpoint: TES_ENDPOINT,
326
- });
327
- rl.close();
328
- process.exit(exitCode);
204
+ `);
205
+ process.exit(0);
329
206
  }
330
207
 
331
208
  main().catch((err) => {