@pentatonic-ai/ai-agent-sdk 0.5.11 → 0.7.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (119)
  1. package/README.md +345 -174
  2. package/bin/__tests__/callback-server.test.js +70 -0
  3. package/bin/__tests__/credentials.test.js +58 -0
  4. package/bin/__tests__/login.test.js +210 -0
  5. package/bin/__tests__/pkce.test.js +39 -0
  6. package/bin/__tests__/whoami.test.js +77 -0
  7. package/bin/cli.js +109 -440
  8. package/bin/commands/config.js +251 -0
  9. package/bin/commands/login.js +219 -0
  10. package/bin/commands/whoami.js +41 -0
  11. package/bin/lib/callback-server.js +137 -0
  12. package/bin/lib/credentials.js +100 -0
  13. package/bin/lib/pkce.js +26 -0
  14. package/package.json +4 -2
  15. package/packages/doctor/__tests__/detect.test.js +2 -6
  16. package/packages/doctor/src/checks/local-memory.js +164 -196
  17. package/packages/doctor/src/detect.js +11 -3
  18. package/packages/memory/src/__tests__/corpus-chunkers.test.js +143 -0
  19. package/packages/memory/src/__tests__/corpus-discover.test.js +175 -0
  20. package/packages/memory/src/__tests__/corpus-ingest.test.js +236 -0
  21. package/packages/memory/src/__tests__/corpus-signatures.test.js +175 -0
  22. package/packages/memory/src/__tests__/corpus-state.test.js +161 -0
  23. package/packages/memory/src/__tests__/ingest-corpus-opts.test.js +129 -0
  24. package/packages/memory/src/__tests__/search-kind.test.js +108 -0
  25. package/packages/memory/src/corpus/adapters.js +398 -0
  26. package/packages/memory/src/corpus/chunkers.js +328 -0
  27. package/packages/memory/src/corpus/cli.js +613 -0
  28. package/packages/memory/src/corpus/discover.js +379 -0
  29. package/packages/memory/src/corpus/index.js +68 -0
  30. package/packages/memory/src/corpus/ingest.js +356 -0
  31. package/packages/memory/src/corpus/signatures.js +280 -0
  32. package/packages/memory/src/corpus/state.js +134 -0
  33. package/packages/memory/src/index.js +18 -0
  34. package/packages/memory/src/ingest.js +20 -11
  35. package/packages/memory/src/openclaw/index.js +39 -1
  36. package/packages/memory/src/search.js +30 -7
  37. package/packages/memory-engine/.env.example +13 -0
  38. package/packages/memory-engine/README.md +131 -0
  39. package/packages/memory-engine/bench/README.md +99 -0
  40. package/packages/memory-engine/bench/scorecards-engine/agent-coding__pentatonic-baseline__20260427-142523.json +1115 -0
  41. package/packages/memory-engine/bench/scorecards-engine/chat-recall__pentatonic-baseline__20260427-142648.json +819 -0
  42. package/packages/memory-engine/bench/scorecards-engine/circular-economy__pentatonic-baseline__20260427-142757.json +1278 -0
  43. package/packages/memory-engine/bench/scorecards-engine/customer-support__pentatonic-baseline__20260427-142900.json +1018 -0
  44. package/packages/memory-engine/bench/scorecards-engine/marketplace-ops__pentatonic-baseline__20260427-142957.json +1038 -0
  45. package/packages/memory-engine/bench/scorecards-engine/product-catalogue__pentatonic-baseline__20260427-143122.json +961 -0
  46. package/packages/memory-engine/bench/scorecards-engine-via-docker/agent-coding__pentatonic-memory__20260427-161812.json +1115 -0
  47. package/packages/memory-engine/bench/scorecards-engine-via-docker/chat-recall__pentatonic-memory__20260427-161701.json +819 -0
  48. package/packages/memory-engine/bench/scorecards-engine-via-docker/circular-economy__pentatonic-memory__20260427-161713.json +1278 -0
  49. package/packages/memory-engine/bench/scorecards-engine-via-docker/customer-support__pentatonic-memory__20260427-161723.json +1018 -0
  50. package/packages/memory-engine/bench/scorecards-engine-via-docker/marketplace-ops__pentatonic-memory__20260427-161732.json +1038 -0
  51. package/packages/memory-engine/bench/scorecards-engine-via-docker/product-catalogue__pentatonic-memory__20260427-161741.json +937 -0
  52. package/packages/memory-engine/bench/scorecards-engine-via-l2-7-layer-populated/agent-coding__pentatonic-memory__20260427-184718.json +1115 -0
  53. package/packages/memory-engine/bench/scorecards-engine-via-l2-7-layer-populated/chat-recall__pentatonic-memory__20260427-184614.json +819 -0
  54. package/packages/memory-engine/bench/scorecards-engine-via-l2-7-layer-populated/circular-economy__pentatonic-memory__20260427-184809.json +1278 -0
  55. package/packages/memory-engine/bench/scorecards-engine-via-l2-7-layer-populated/customer-support__pentatonic-memory__20260427-184854.json +1018 -0
  56. package/packages/memory-engine/bench/scorecards-engine-via-l2-7-layer-populated/marketplace-ops__pentatonic-memory__20260427-184929.json +1038 -0
  57. package/packages/memory-engine/bench/scorecards-engine-via-l2-7-layer-populated/product-catalogue__pentatonic-memory__20260427-185015.json +961 -0
  58. package/packages/memory-engine/bench/scorecards-engine-via-l2-empty-layers/agent-coding__pentatonic-memory__20260427-175252.json +1115 -0
  59. package/packages/memory-engine/bench/scorecards-engine-via-l2-empty-layers/chat-recall__pentatonic-memory__20260427-175312.json +819 -0
  60. package/packages/memory-engine/bench/scorecards-engine-via-l2-empty-layers/circular-economy__pentatonic-memory__20260427-175335.json +1278 -0
  61. package/packages/memory-engine/bench/scorecards-engine-via-l2-empty-layers/customer-support__pentatonic-memory__20260427-175355.json +1018 -0
  62. package/packages/memory-engine/bench/scorecards-engine-via-l2-empty-layers/marketplace-ops__pentatonic-memory__20260427-175413.json +1038 -0
  63. package/packages/memory-engine/bench/scorecards-engine-via-l2-empty-layers/product-catalogue__pentatonic-memory__20260427-175430.json +883 -0
  64. package/packages/memory-engine/bench/scorecards-engine-via-shim/agent-coding__pentatonic-memory__20260427-155409.json +1115 -0
  65. package/packages/memory-engine/bench/scorecards-engine-via-shim/chat-recall__pentatonic-memory__20260427-155421.json +819 -0
  66. package/packages/memory-engine/bench/scorecards-engine-via-shim/circular-economy__pentatonic-memory__20260427-155433.json +1278 -0
  67. package/packages/memory-engine/bench/scorecards-engine-via-shim/customer-support__pentatonic-memory__20260427-155443.json +1018 -0
  68. package/packages/memory-engine/bench/scorecards-engine-via-shim/marketplace-ops__pentatonic-memory__20260427-155453.json +1038 -0
  69. package/packages/memory-engine/bench/scorecards-engine-via-shim/product-catalogue__pentatonic-memory__20260427-155503.json +937 -0
  70. package/packages/memory-engine/bench/scorecards-pentatonic-baseline/agent-coding__pentatonic-memory-latest__20260427-145103.json +1115 -0
  71. package/packages/memory-engine/bench/scorecards-pentatonic-baseline/agent-coding__pentatonic-memory__20260427-144909.json +1115 -0
  72. package/packages/memory-engine/bench/scorecards-pentatonic-baseline/chat-recall__pentatonic-memory-latest__20260427-145153.json +819 -0
  73. package/packages/memory-engine/bench/scorecards-pentatonic-baseline/chat-recall__pentatonic-memory__20260427-145120.json +542 -0
  74. package/packages/memory-engine/bench/scorecards-pentatonic-baseline/circular-economy__pentatonic-memory-latest__20260427-145313.json +1278 -0
  75. package/packages/memory-engine/bench/scorecards-pentatonic-baseline/circular-economy__pentatonic-memory__20260427-145207.json +894 -0
  76. package/packages/memory-engine/bench/scorecards-pentatonic-baseline/customer-support__pentatonic-memory-latest__20260427-145412.json +1018 -0
  77. package/packages/memory-engine/bench/scorecards-pentatonic-baseline/customer-support__pentatonic-memory__20260427-145327.json +680 -0
  78. package/packages/memory-engine/bench/scorecards-pentatonic-baseline/marketplace-ops__pentatonic-memory-latest__20260427-145517.json +1038 -0
  79. package/packages/memory-engine/bench/scorecards-pentatonic-baseline/marketplace-ops__pentatonic-memory__20260427-145422.json +693 -0
  80. package/packages/memory-engine/bench/scorecards-pentatonic-baseline/product-catalogue__pentatonic-memory-latest__20260427-145616.json +961 -0
  81. package/packages/memory-engine/bench/scorecards-pentatonic-baseline/product-catalogue__pentatonic-memory__20260427-145528.json +727 -0
  82. package/packages/memory-engine/compat/Dockerfile +11 -0
  83. package/packages/memory-engine/compat/server.py +680 -0
  84. package/packages/memory-engine/docker-compose.yml +243 -0
  85. package/packages/memory-engine/docs/MIGRATION.md +178 -0
  86. package/packages/memory-engine/docs/RUNBOOK-AWS.md +375 -0
  87. package/packages/memory-engine/docs/why-v05-underperforms.md +138 -0
  88. package/packages/memory-engine/engine/README.md +52 -0
  89. package/packages/memory-engine/engine/l2-hybridrag-proxy.py +1543 -0
  90. package/packages/memory-engine/engine/l5-comms-layer.py +663 -0
  91. package/packages/memory-engine/engine/l6-document-store.py +1018 -0
  92. package/packages/memory-engine/engine/services/l2/Dockerfile +41 -0
  93. package/packages/memory-engine/engine/services/l2/init_databases.py +81 -0
  94. package/packages/memory-engine/engine/services/l2/l2-hybridrag-proxy.py +1543 -0
  95. package/packages/memory-engine/engine/services/l4/Dockerfile +15 -0
  96. package/packages/memory-engine/engine/services/l4/server.py +235 -0
  97. package/packages/memory-engine/engine/services/l5/Dockerfile +9 -0
  98. package/packages/memory-engine/engine/services/l5/l5-comms-layer.py +678 -0
  99. package/packages/memory-engine/engine/services/l6/Dockerfile +11 -0
  100. package/packages/memory-engine/engine/services/l6/l6-document-store.py +1016 -0
  101. package/packages/memory-engine/engine/services/nv-embed/Dockerfile +28 -0
  102. package/packages/memory-engine/engine/services/nv-embed/server.py +152 -0
  103. package/packages/memory-engine/pme_memory/__init__.py +0 -0
  104. package/packages/memory-engine/pme_memory/__main__.py +129 -0
  105. package/packages/memory-engine/pme_memory/artifacts.py +95 -0
  106. package/packages/memory-engine/pme_memory/embed.py +74 -0
  107. package/packages/memory-engine/pme_memory/health.py +36 -0
  108. package/packages/memory-engine/pme_memory/hygiene.py +159 -0
  109. package/packages/memory-engine/pme_memory/indexer.py +200 -0
  110. package/packages/memory-engine/pme_memory/needs.py +55 -0
  111. package/packages/memory-engine/pme_memory/provenance.py +80 -0
  112. package/packages/memory-engine/pme_memory/scoring.py +168 -0
  113. package/packages/memory-engine/pme_memory/search.py +52 -0
  114. package/packages/memory-engine/pme_memory/store.py +86 -0
  115. package/packages/memory-engine/pme_memory/synthesis.py +114 -0
  116. package/packages/memory-engine/pyproject.toml +65 -0
  117. package/packages/memory-engine/scripts/kg-extractor.py +557 -0
  118. package/packages/memory-engine/scripts/kg-preflexor-v2.py +738 -0
  119. package/packages/memory-engine/tests/test_api_contract.sh +57 -0
package/README.md CHANGED
@@ -6,11 +6,11 @@
  </picture>
  </p>

- <h3 align="center">AI Agent SDK</h3>
+ <h3 align="center">Pentatonic AI Agent SDK</h3>

  <p align="center">
- Observability, memory, and analytics for LLM applications.<br>
- Run locally or use hosted TES. JavaScript &amp; Python.
+ Memory and observability for AI agents.<br>
+ Two products on one platform (TES). One install. JavaScript &amp; Python.
  </p>

  <p align="center">
@@ -21,166 +21,321 @@

  ---

+ ## What's in this SDK
+
+ Two products that share one TES account, one install line, and one dashboard:
+
+ | Product | What it does | When you want it |
+ |---|---|---|
+ | **Memory** | Persistent, searchable memory for your AI agent — 7-layer hybrid retrieval (BM25 + vector + KG + reranker), repo onboarding via references. Runs locally (Docker) or hosted (TES). | You want your agent to remember conversations, preferences, and codebase context across sessions. |
+ | **Observability** | Wrap your LLM client and capture every call — tokens, tool calls, latency, content. Events flow to TES for the dashboard, analytics, and search attribution. | You want to know what your agent is actually doing in production. |
+
+ The two products are sold separately, and neither requires the other; adopt one or both. Plugins for **Claude Code** and **OpenClaw** install everything at once if you'd rather skip the SDK glue.
+
+ ## Pick your path
+
+ - 🧠 **I want memory in my agent** → [Memory](#memory)
+ - 📊 **I want to instrument my LLM calls** → [Observability](#observability)
+ - 🔌 **I'm using Claude Code or OpenClaw** → [Plugins](#plugins)
+ - 📂 **I want to seed memory from my codebase or docs** → [Repository onboarding](#repository-onboarding-corpus-ingest)
+ - 🩺 **I want to check my install** → [Health checks (`doctor`)](#health-checks-doctor)
+
  ## Table of Contents

- - [Overview](#overview)
- - [Local Memory (self-hosted)](#local-memory-self-hosted)
- - [Hosted TES](#hosted-tes)
- - [Claude Code Plugin](#claude-code-plugin)
- - [OpenClaw Plugin](#openclaw-plugin)
- - [SDK: Wrap Your LLM Client](#sdk-wrap-your-llm-client)
- - [Supported Providers](#supported-providers)
+ - [TES — the platform](#tes--the-platform)
+ - [Memory](#memory)
+   - [Local (self-hosted)](#local-self-hosted)
+   - [Hosted (cloud)](#hosted-cloud)
+   - [Use as a library](#use-as-a-library)
+ - [Observability](#observability)
+   - [Wrap your LLM client](#wrap-your-llm-client)
+   - [Supported providers](#supported-providers)
+ - [Plugins](#plugins)
+   - [Claude Code](#claude-code)
+   - [OpenClaw](#openclaw)
+ - [Repository Onboarding (corpus ingest)](#repository-onboarding-corpus-ingest)
  - [API Reference](#api-reference)
  - [Health Checks (`doctor`)](#health-checks-doctor)
  - [Architecture](#architecture)

- ## Overview
+ ---
+
+ ## TES — the platform
+
+ **TES** (Thing Event System) is Pentatonic's account-and-events backbone. Both products in this SDK run on it: memory writes/queries land in TES, observability events stream to it, and the dashboard reads from it.
+
+ You only need a TES account if you're using **hosted memory** or **observability** (observability always sends events to TES). **Local memory** runs entirely on your machine and needs no TES account.
+
+ ```bash
+ # One-time: open browser, sign in or sign up, get API keys
+ npx @pentatonic-ai/ai-agent-sdk login
+ ```
+
+ `login` opens your browser at the hosted sign-in page. New users click "Sign up" to create a tenant (clientId + region + email + password). After verification the CLI writes credentials to `~/.config/tes/credentials.json` (mode 0600). The Claude Code plugin, OpenClaw plugin, hooks, and corpus CLI all auto-discover this file — no manual paste step.
+
+ ```
+ ✓ Connected as you@example.com on tenant `your-clientid`
+ ✓ Credentials written to ~/.config/tes/credentials.json
+ ```
+
+ To check connection state later: `npx @pentatonic-ai/ai-agent-sdk whoami`. To point at a local TES dev instance: `npx @pentatonic-ai/ai-agent-sdk login --endpoint http://localhost:8788`.
+
+ (`init` still works as a deprecated alias for `login`; it is slated for removal after one major release.)
+
+ ---
+
+ ## Memory

- Two ways to use the SDK:
+ Persistent, searchable memory for AI agents. Backed by a 7-layer hybrid retrieval engine — BM25 keyword (L0), core files (L1), HybridRAG orchestrator (L2), Knowledge Graph entities (L3), vector index (L4), comms-namespace vectors (L5), and a document store with cross-encoder reranker (L6). Reciprocal Rank Fusion stitches them at query time.
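The fusion step named above can be sketched in a few lines. This is a generic illustration of Reciprocal Rank Fusion, not this engine's actual code, and `k = 60` is the conventional constant from the RRF literature, assumed here rather than taken from the engine's config.

```javascript
// Generic Reciprocal Rank Fusion: merge ranked lists of doc ids into one
// fused ranking. Each list contributes 1 / (k + rank) per document, so
// items that rank well in several layers rise to the top.
function rrfFuse(rankedLists, k = 60) {
  const scores = new Map();
  for (const ranking of rankedLists) {
    ranking.forEach((docId, i) => {
      // rank is 1-based: first place contributes 1 / (k + 1)
      scores.set(docId, (scores.get(docId) ?? 0) + 1 / (k + i + 1));
    });
  }
  // Highest fused score first
  return [...scores.keys()].sort((a, b) => scores.get(b) - scores.get(a));
}

// Docs ranked by two hypothetical layers (say BM25 and vector search):
console.log(rrfFuse([["a", "b", "c"], ["b", "d", "a"]])); // → ["b", "a", "d", "c"]
```

"b" wins because it appears near the top of both lists, which is exactly why RRF is a robust way to stitch heterogeneous retrievers without score normalization.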

- **Local Memory** -- Run a fully private memory system on your own machine. PostgreSQL + pgvector + Ollama in Docker. No API keys, no cloud. Your agent gets persistent, searchable memory backed by multi-signal retrieval and HyDE query expansion.
+ Same engine, same wire format (`/store`, `/search`, `/forget`, `/store-batch`, `/health`), two deployment modes:

- **Hosted TES** -- Connect to Pentatonic's Thing Event System for production-grade observability, higher-dimensional embeddings, conversation analytics, and team-wide shared memory.
+ ### Local (self-hosted)

- Both paths work with Claude Code and OpenClaw. The plugins auto-search on every prompt and auto-store every conversation turn.
+ Run the full engine stack on your own machine via Docker. No API keys, no cloud, fully offline. Embeddings come from your local Ollama; quality depends on the model you pull (768d `nomic-embed-text` is the default and works fine on a laptop).

- ## Local Memory (self-hosted)
+ **Prerequisites**

- Run the full memory stack locally. Requires Docker and ~4GB disk for models.
+ - Docker + Docker Compose v2
+ - Ollama installed on the host (https://ollama.com)
+ - A pulled embedding model: `ollama pull nomic-embed-text`

- ### 1. Set up
+ If you'll run Claude Code (or anything else) inside a Docker container that needs to reach the engine, **make Ollama listen on all interfaces** so containers can reach it via `host.docker.internal`:

  ```bash
- npx @pentatonic-ai/ai-agent-sdk memory
+ sudo mkdir -p /etc/systemd/system/ollama.service.d
+ echo -e '[Service]\nEnvironment="OLLAMA_HOST=0.0.0.0:11434"' \
+ | sudo tee /etc/systemd/system/ollama.service.d/override.conf
+ sudo systemctl daemon-reload
+ sudo systemctl restart ollama
  ```

- This starts PostgreSQL + pgvector, Ollama, and the memory server. It pulls embedding and chat models, and writes the local config.
-
- ### 2. Install the Claude Code plugin
+ **Bring up the engine**

+ ```bash
+ git clone https://github.com/Pentatonic-Ltd/ai-agent-sdk.git
+ cd ai-agent-sdk/packages/memory-engine
+
+ # Default .env points at Ollama on the host. Edit if your Ollama is
+ # elsewhere or you want to use a higher-quality model (e.g. mxbai-embed-large
+ # at 1024d → set EMBED_DIM=1024 and EMBED_MODEL_NAME=mxbai-embed-large).
+ cat > .env <<'EOF'
+ PME_NV_EMBED_ENABLED=false
+ NV_EMBED_URL=http://host.docker.internal:11434/v1/embeddings
+ EMBED_MODEL_NAME=nomic-embed-text
+ EMBED_DIM=768
+ OLLAMA_DIM=768
+ PME_OLLAMA_URL=http://host.docker.internal:11434/api/embeddings
+ PME_EMBED_MODEL=nomic-embed-text
+ L5_OLLAMA_EMBED_URL=http://host.docker.internal:11434/api/embed
+ L5_OLLAMA_EMBED_MODEL=nomic-embed-text
+ PME_HYDE_ENABLED=false
+ PME_RERANK_ENABLED=true
+ PME_PORT=8099
+ CLIENT_ID=local
+ NEO4J_AUTH=neo4j/local-dev-pw
+ NEO4J_PASSWORD=local-dev-pw
+ EOF
+
+ docker compose up -d --scale nv-embed=0
  ```
- /plugin marketplace add Pentatonic-Ltd/ai-agent-sdk
- /plugin install tes-memory@pentatonic-ai
+
+ First run pulls images and builds engine containers — ~10–15 min. Subsequent restarts are seconds.
+
+ **Verify**
+
+ ```bash
+ curl -s http://localhost:8099/health | jq
+ # Status should be "ok" or "degraded" with most layers reporting ok.
+
+ curl -sX POST http://localhost:8099/store \
+ -H "content-type: application/json" \
+ -d '{"content":"hello memory","metadata":{"arena":"local"}}' | jq
+
+ curl -sX POST http://localhost:8099/search \
+ -H "content-type: application/json" \
+ -d '{"query":"hello","limit":3,"min_score":0.001}' | jq
  ```

- That's it. The plugin hooks automatically search memories on every prompt and store every conversation turn. Fully local, fully private.
+ If `/search` returns the row from `/store`, the engine is live.

- ### What you get
+ **Connect Claude Code**

- - **Automatic memory** -- every conversation turn is stored with embeddings and HyDE query expansion
- - **Semantic search** -- multi-signal retrieval combining vector similarity, BM25 full-text, recency decay, and access frequency
- - **Memory layers** -- episodic (recent), semantic (consolidated), procedural (how-to), working (temporary)
- - **Distilled memory** -- a background LLM pass extracts atomic facts from each raw turn and stores each as its own node in the semantic layer, linked back to the source. A query like *"what does Phil drink?"* matches *"Phil drinks cortado"* more reliably than a mixed paragraph covering food, drinks, and hobbies. Default-on; the raw turn is still preserved.
- - **Decay and consolidation** -- memories fade over time; frequently accessed ones get promoted
+ The `tes-memory` plugin's hooks already speak the engine's wire format. Three steps:

- > **Store latency note (v0.5.4+):** on the local memory server, `store_memory` now awaits distillation before returning instead of running it fire-and-forget. This fixed a bug where distillation was being killed mid-flight (atoms never got embeddings, so they were unreachable by semantic search), but it means stores now take as long as your configured LLM takes to produce atoms — typically 5–30s on `llama3.2:3b`, up to the `chat()` timeout ceiling (60s default, overridable via `opts.timeout`). Cloudflare Worker deployments pass `ctx.waitUntil` and still return fast. Set `opts.distill: false` on the ingest call if you want the old fast-return behaviour at the cost of no atoms.
+ 1. Install the plugin (once):
+ ```
+ /plugin marketplace add Pentatonic-Ltd/ai-agent-sdk
+ /plugin install tes-memory@pentatonic-ai
+ ```
+ 2. Point it at your local engine. Edit `~/.claude-pentatonic/tes-memory.local.md` (create if missing):
+ ```yaml
+ ---
+ mode: local
+ memory_url: http://localhost:8099
+ ---
+ ```
+ 3. Reload: `/reload-plugins` (or restart Claude Code if status reports stale state — MCP server processes need a full restart to pick up plugin updates).

- ### Change models
+ Verify:
+
+ ```
+ /tes-memory:tes-status
+ ```
+
+ Should report `✓ Connected to local memory engine`. Now every prompt auto-searches engine memory and every turn auto-stores. The footer `🧠 Matched N memories from Pentatonic Memory` shows hits.
+
+ **Seed memory from your codebase or docs (optional)**
+
+ Avoid the cold-start problem on day one by pre-populating the engine with references to your code/docs:

  ```bash
- EMBEDDING_MODEL=mxbai-embed-large LLM_MODEL=qwen2.5:7b npx @pentatonic-ai/ai-agent-sdk memory
+ MEMORY_ENGINE_URL=http://localhost:8099 \
+ npx @pentatonic-ai/ai-agent-sdk ingest ~/code/my-project
  ```

- ### Raspberry Pi
+ References mode is the default — it stores path + signature pointers, not full file contents. See [Repository Onboarding](#repository-onboarding-corpus-ingest) for details.
+
+ **Tuning**
+
+ Change embedding model: pull a different one, edit `EMBED_MODEL_NAME` + `EMBED_DIM` in `.env`, then `docker compose down -v && docker compose up -d --scale nv-embed=0` (the `-v` is required because Milvus collections are dim-locked at creation; switching dims means recreating).
+
+ | Model | Dim | Notes |
+ |---|---|---|
+ | `nomic-embed-text` (default) | 768 | Smallest; works on any laptop |
+ | `mxbai-embed-large` | 1024 | Better recall; ~600 MB download |
+ | `nv-embed-v2` (via gateway) | 4096 | Production-grade; needs a hosted endpoint or GPU |
+
+ ### Hosted (cloud)

- Pi 5 with 8GB RAM runs the full stack. `nomic-embed-text` (~300MB) + `llama3.2:3b` (~2GB) leaves plenty of headroom.
+ Run on Pentatonic's infrastructure. NV-Embed-v2 (4096d) embeddings via the AI gateway, managed Postgres/Neo4j/Qdrant/Milvus, dashboard. The engine still ships in this repo; hosted mode just deploys it for you.
+
+ ```bash
+ # 1. Get a TES account
+ npx @pentatonic-ai/ai-agent-sdk login
+
+ # 2. Install the SDK
+ npm install @pentatonic-ai/ai-agent-sdk
+ # or: pip install pentatonic-ai-agent-sdk
+ ```
+
+ Memory operations route through TES → engine. No client-side change between local and hosted.

  ### Use as a library

  ```javascript
- import { createMemorySystem } from '@pentatonic-ai/ai-agent-sdk/memory';
+ import { engineAdapter, ingestCorpus } from '@pentatonic-ai/ai-agent-sdk/memory/corpus';

- const memory = createMemorySystem({
- db: pgPool,
- embedding: { url: 'http://localhost:11434/v1', model: 'nomic-embed-text' },
- llm: { url: 'http://localhost:11434/v1', model: 'llama3.2:3b' },
+ const adapter = engineAdapter({
+ engineUrl: 'http://localhost:8099',
+ arena: 'my-app',
  });
-
- await memory.migrate();
- await memory.ensureLayers('my-app');
- await memory.ingest('User prefers dark mode', { clientId: 'my-app' });
- const results = await memory.search('preferences', { clientId: 'my-app' });
+ await adapter.init();
+ await adapter.ingestChunk('User prefers dark mode', { kind: 'note' });
  ```

- ## Hosted TES
+ For raw `/search` and `/store`, just `fetch()` against `${engineUrl}/search` etc. The wire format is documented in `packages/memory-engine/docs/MIGRATION.md`.
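As a sketch of what those raw calls look like from JavaScript, the helpers below build `fetch()` arguments mirroring the curl examples in the local-setup section. Only the fields shown in those examples (`content`/`metadata` for `/store`, `query`/`limit`/`min_score` for `/search`) are used; anything else would be an assumption.

```javascript
// Build fetch() arguments for the engine's raw wire format. Payload
// fields mirror the curl examples in this README; no extra fields assumed.
function storeRequest(engineUrl, content, metadata = {}) {
  return [
    `${engineUrl}/store`,
    {
      method: "POST",
      headers: { "content-type": "application/json" },
      body: JSON.stringify({ content, metadata }),
    },
  ];
}

function searchRequest(engineUrl, query, limit = 3, minScore = 0.001) {
  return [
    `${engineUrl}/search`,
    {
      method: "POST",
      headers: { "content-type": "application/json" },
      body: JSON.stringify({ query, limit, min_score: minScore }),
    },
  ];
}

// usage: await fetch(...storeRequest("http://localhost:8099", "hello memory"));
//        await fetch(...searchRequest("http://localhost:8099", "hello"));
```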
 
- Connect to Pentatonic's hosted infrastructure for production use.
+ ---

- ### 1. Create an account
+ ## Observability

- ```bash
- npx @pentatonic-ai/ai-agent-sdk init
- ```
+ Wrap your LLM client and every call automatically emits a `CHAT_TURN` event to TES — input/output tokens, tool calls, model, latency, content. Events flow into the TES dashboard, where you get session metrics, search attribution, dead-end detection, and full-text + semantic search across conversations.

- This walks you through account creation, email verification, and API key generation. You'll get:
+ Observability requires a TES account (hosted or self-hosted Pentatonic platform). Events have nowhere to go without one.

- ```
- TES_ENDPOINT=https://your-company.api.pentatonic.com
- TES_CLIENT_ID=your-company
- TES_API_KEY=tes_your-company_xxxxx
- ```
+ ### Wrap your LLM client

- ### 2. Install
+ **JavaScript**

- ```bash
- npm install @pentatonic-ai/ai-agent-sdk
+ ```js
+ import { TESClient } from "@pentatonic-ai/ai-agent-sdk";
+
+ const tes = new TESClient({
+ clientId: process.env.TES_CLIENT_ID,
+ apiKey: process.env.TES_API_KEY,
+ endpoint: process.env.TES_ENDPOINT,
+ });
+
+ const ai = tes.wrap(new OpenAI(), { sessionId: "conv-123" });
+ const result = await ai.chat.completions.create({
+ model: "gpt-4o",
+ messages: [{ role: "user", content: "Hello!" }],
+ });
  ```

- ```bash
- pip install pentatonic-ai-agent-sdk
+ **Python**
+
+ ```python
+ from pentatonic_agent_events import TESClient
+
+ tes = TESClient(
+ client_id=os.environ["TES_CLIENT_ID"],
+ api_key=os.environ["TES_API_KEY"],
+ endpoint=os.environ["TES_ENDPOINT"],
+ )
+
+ ai = tes.wrap(OpenAI(), session_id="conv-123")
+ result = ai.chat.completions.create(
+ model="gpt-4o",
+ messages=[{"role": "user", "content": "Hello!"}],
+ )
  ```

- ### What you get (in addition to local features)
+ ### Supported providers

- - **Higher-dimensional embeddings** -- NV-Embed-v2 (4096d) for better retrieval accuracy
- - **Conversation analytics** -- session metrics, search attribution, dead-end detection
- - **Team-wide shared memory** -- semantic search across your team's AI interactions
- - **Admin dashboard** -- visualize conversations, token usage, and memory explorer
- - **Multi-tenancy** -- isolated databases per client
+ | Provider | Detection | Intercepted Method |
+ |----------|-----------|-------------------|
+ | OpenAI | `client.chat.completions.create` | `chat.completions.create()` |
+ | Anthropic | `client.messages.create` | `messages.create()` |
+ | Workers AI | `client.run` (JS only) | `run()` |

- ## Claude Code Plugin
+ All other methods pass through unchanged.
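The wrap-and-pass-through contract can be illustrated with a few lines of generic interception code. This is not the SDK's implementation, and the emitted event shape is invented for the example; it only shows the pattern: one known method is timed and reported, everything else on the client is left untouched.

```javascript
// Generic method-interception sketch: replace one async method with a
// timed wrapper that reports an event, and leave every other property
// of the client alone. Event fields here are illustrative, not the
// SDK's CHAT_TURN schema.
function instrument(client, methodName, onEvent) {
  const original = client[methodName].bind(client);
  client[methodName] = async (...args) => {
    const start = Date.now();
    const result = await original(...args);
    onEvent({ method: methodName, latencyMs: Date.now() - start });
    return result; // caller sees the untouched return value
  };
  return client; // all other properties pass through unchanged
}

// usage sketch with a fake client:
const events = [];
const fake = {
  run: async (input) => ({ echo: input }),
  other: () => "untouched",
};
instrument(fake, "run", (e) => events.push(e));
```

A real wrapper additionally walks nested paths such as `chat.completions.create` and attaches provider-specific token accounting, but the forwarding behavior is the same.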
 
- Works with both local and hosted setups. Install once, switch modes via config.
+ ---

- ### Install via marketplace
+ ## Plugins
+
+ If you use Claude Code or OpenClaw, the plugin gives you both products at once — every conversation turn is captured (observability) AND searched/stored as memory. No SDK glue to write.
+
+ ### Claude Code
+
+ Works with both local and hosted memory. Install once, switch modes via config.

  ```
  /plugin marketplace add Pentatonic-Ltd/ai-agent-sdk
  /plugin install tes-memory@pentatonic-ai
  ```

- ### Set up
+ **Local engine** — bring up the engine first ([Memory > Local](#local-self-hosted)), then point the plugin at it. Edit `~/.claude-pentatonic/tes-memory.local.md`:

- For hosted TES:
- ```
- /tes-memory:tes-setup
+ ```yaml
+ ---
+ mode: local
+ memory_url: http://localhost:8099
+ ---
  ```

- For local memory:
+ **Hosted TES** — run `login` once, the plugin auto-discovers `~/.config/tes/credentials.json`:
+
  ```bash
- npx @pentatonic-ai/ai-agent-sdk memory
+ npx @pentatonic-ai/ai-agent-sdk login
  ```

- ### What it tracks
-
- - **Every conversation turn** -- user messages, assistant responses, tool calls, duration
- - **Automatic memory search** -- relevant memories injected as context on every prompt
- - **Automatic memory storage** -- every turn stored with embeddings and HyDE queries
- - **Token usage** -- input, output, cache read, cache creation tokens per turn
+ Either way, verify with `/tes-memory:tes-status` in Claude Code. The plugin's MCP server, hooks, and tools all read the same config.

- ## OpenClaw Plugin
+ **What it tracks (auto, every turn):**
+ - Memory search at prompt time — relevant memories injected as context
+ - Memory store at turn end — every conversation turn persisted
+ - Token usage — input, output, cache read, cache creation tokens per turn

- Works with both local and hosted setups. Just tell OpenClaw to set it up.
-
- ### Install
+ ### OpenClaw

  ```bash
  openclaw plugins install @pentatonic-ai/openclaw-memory-plugin
  ```

- ### Set up
-
- Tell OpenClaw:
+ Then tell OpenClaw:

  ```
  Set up pentatonic memory
@@ -194,18 +349,7 @@ Or use the CLI directly:
  openclaw pentatonic-memory local
  ```

- ### What it does
-
- OpenClaw's context engine hooks fire on every lifecycle event:
-
- - **Ingest** -- every user and assistant message is stored with embeddings and HyDE query expansion, then distilled into atomic facts in the background (see [Distilled memory](#what-you-get))
- - **Assemble** -- relevant memories are injected as system prompt context before every model run
- - **Compact** -- decay cycle runs when the context window fills
- - **After turn** -- high-access memories get consolidated to the semantic layer
-
- Plus agent-callable tools: `memory_search`, `memory_store`, `memory_layers`.
-
- ### Configuration
+ **What it does:** OpenClaw's context engine hooks fire on every lifecycle event — `ingest` stores user/assistant messages via the engine's `/store` endpoint (BM25 + vector + KG indexing in parallel); `assemble` calls `/search` to inject relevant memories as system-prompt context; `compact` and `after-turn` are managed by the engine's own decay/consolidation. Plus agent-callable tools: `memory_search`, `memory_store`, `memory_layers`.

  After setup, config lives in `~/.openclaw/pentatonic-memory.json`. To switch modes, run setup again or edit directly.

@@ -219,11 +363,7 @@ You can also configure via `openclaw.json`:
  "pentatonic-memory": {
  "enabled": true,
  "config": {
- "database_url": "postgres://memory:memory@localhost:5433/memory",
- "embedding_url": "http://localhost:11435/v1",
- "embedding_model": "nomic-embed-text",
- "llm_url": "http://localhost:11435/v1",
- "llm_model": "llama3.2:3b"
+ "memory_url": "http://localhost:8099"
  }
  }
  }
@@ -241,57 +381,80 @@ For hosted mode, replace the config block with:
  }
  }

- ## SDK: Wrap Your LLM Client
+ ---

- **JavaScript**
+ ## Repository Onboarding (corpus ingest)

- ```js
- import { TESClient } from "@pentatonic-ai/ai-agent-sdk";
+ The memory layer starts empty. To avoid the cold-start problem where retrieval has nothing useful to return for the first days of use, you can ingest your repos (or any folder of docs) on day one:

- const tes = new TESClient({
- clientId: process.env.TES_CLIENT_ID,
- apiKey: process.env.TES_API_KEY,
- endpoint: process.env.TES_ENDPOINT,
- });
+ ```bash
+ # Interactive — picks paths, shows a cost preview, ingests, offers
+ # to install a git post-commit hook so memory stays current
+ npx @pentatonic-ai/ai-agent-sdk onboard

257
- const ai = tes.wrap(new OpenAI(), { sessionId: "conv-123" });
258
- const result = await ai.chat.completions.create({
259
- model: "gpt-4o",
260
- messages: [{ role: "user", content: "Hello!" }],
261
- });
395
+ # One-shot ingest of a single path
396
+ npx @pentatonic-ai/ai-agent-sdk ingest ~/code/my-app
397
+ npx @pentatonic-ai/ai-agent-sdk ingest ~/Documents/design-notes # any folder works
398
+
399
+ # See what's tracked and how big the corpus is
400
+ npx @pentatonic-ai/ai-agent-sdk status
401
+
402
+ # Delta-resync everything that's tracked (or one path)
403
+ npx @pentatonic-ai/ai-agent-sdk resync
404
+
405
+ # Manage the tracked-paths list
406
+ npx @pentatonic-ai/ai-agent-sdk corpus list
407
+ npx @pentatonic-ai/ai-agent-sdk corpus remove ~/code/old-project
408
+ npx @pentatonic-ai/ai-agent-sdk corpus reset
262
409
  ```
263
410
 
264
- **Python**
411
+ Tenant credentials come from env vars (`TES_ENDPOINT`, `TES_CLIENT_ID`, `TES_API_KEY`) or `~/.config/tes/credentials.json` if you used `npx @pentatonic-ai/ai-agent-sdk login`. To point at a TES instance running on `localhost`, set `TES_ENDPOINT=http://localhost:8788`.
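For a local setup, the three variables above can be exported together — the values here are placeholders, not real credentials:

```shell
# Illustrative placeholder values — substitute your own tenant credentials
export TES_ENDPOINT=http://localhost:8788   # local TES instance
export TES_CLIENT_ID=my-client-id
export TES_API_KEY=my-api-key
```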
265
412
 
266
- ```python
267
- from pentatonic_agent_events import TESClient
413
+ ### What gets stored: references, not content
268
414
 
269
- tes = TESClient(
270
- client_id=os.environ["TES_CLIENT_ID"],
271
- api_key=os.environ["TES_API_KEY"],
272
- endpoint=os.environ["TES_ENDPOINT"],
273
- )
415
+ By default, ingest stores **pointers to source content** (path + line range + a short signature/summary), not full chunk content. Per-language strategies:
274
416
 
275
- ai = tes.wrap(OpenAI(), session_id="conv-123")
276
- result = ai.chat.completions.create(
277
- model="gpt-4o",
278
- messages=[{"role": "user", "content": "Hello!"}],
279
- )
280
- ```
417
+ - **Markdown** — one reference per H1/H2 section
418
+ - **JS / TS** — one per top-level `function` / `class` / `const` / `export`
419
+ - **Python** — one per top-level `def` / `class`
420
+ - **JSON / YAML** — collapsed top-level keys
421
+ - **Other** — single file-level reference
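As a rough illustration — the field names below are hypothetical, not the SDK's actual schema — a stored reference for a TypeScript function might carry a path, a line range, and a short signature/summary:

```json
{
  "kind": "corpus_ref",
  "path": "src/auth/session.ts",
  "lines": [42, 87],
  "signature": "export async function refreshSession(token: string): Promise<Session>",
  "summary": "Rotates the session token and persists the new expiry."
}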
281
422
 
282
- ## Supported Providers
423
+ Why pointers? **Code mutates between ingests.** Embedded chunks of old source rot silently — the LLM keeps confidently citing functions you've since rewritten, with retrieval evidence to back it up. Pointers rot loudly: when a file moves or changes, `Read` fails or returns different content, and the agent observes and adjusts. Stale-but-confident is the worst class of memory bug; loud-and-self-correcting is qualitatively better for source code.
283
424
 
284
- | Provider | Detection | Intercepted Method |
285
- |----------|-----------|-------------------|
286
- | OpenAI | `client.chat.completions.create` | `chat.completions.create()` |
287
- | Anthropic | `client.messages.create` | `messages.create()` |
288
- | Workers AI | `client.run` (JS only) | `run()` |
425
+ It also means proprietary source never leaves your machine — only the index (path + summary) is sent to the hosted TES, and the agent reads actual file contents at query time on its own.
289
426
 
290
- All other methods pass through unchanged.
427
+ If you need a self-contained index (e.g. for air-gapped retrieval where the source isn't available at query time), opt into legacy chunk-content storage by passing `mode: "content"` to `ingestCorpus` when using the SDK as a library.
428
+
429
+ ### What gets ingested, what doesn't
430
+
431
+ Any folder works — git is not required. The walker honors `.gitignore` and `.tesignore` if present, plus a hard-exclude list for secrets and credentials that **cannot be overridden** even with `!pattern` rules:
432
+
433
+ - `.env*` (any environment file)
434
+ - `*.pem`, `*.key`, `*.crt`, `*.p12`, `*.pfx`, `*.jks`
435
+ - `id_rsa`, `id_ed25519`, `id_ecdsa`, `id_dsa` (SSH private keys)
436
+ - `.ssh/`, `.aws/`, `.gcp/`, `.azure/` (whole directories)
437
+ - `.npmrc`, `.pypirc`, `.netrc`
438
+ - `secrets/`, `credentials/`, `service-account.*`
439
+ - `*_secret*`, `*_token*`, `*_password*`
440
+
441
+ Plus directory-level skips: `.git`, `node_modules`, `dist`, `build`, `.next`, `venv`, `__pycache__`, `target`, `.terraform`, etc. And extension skips for binaries, lockfiles, and minified output. Files larger than 512 KB are skipped by default (override with adapter options if you need to).
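A `.tesignore` uses gitignore-style pattern syntax. For example, to keep an archive folder and generated snapshots out of the index while re-including one file (remembering that `!` rules still cannot resurrect the hard-excluded secret patterns above):

```
# .tesignore — gitignore-style patterns
docs/archive/
**/*.snap
!docs/archive/README.md
```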
442
+
443
+ ### How it stays current
444
+
445
+ For git repos, accepting the prompt during `onboard` installs a post-commit hook at `.git/hooks/post-commit` that re-ingests files changed in each commit. The hook is non-fatal — it never blocks a commit. Install manually any time with:
446
+
447
+ ```bash
448
+ npx @pentatonic-ai/ai-agent-sdk install-git-hook
449
+ ```
450
+
451
+ For non-git folders, re-run `ingest` or `resync` whenever the source changes. Re-ingest is cheap: the SDK keeps a content-hash per file and skips anything that hasn't changed since the last run.
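The skip logic is conceptually just a per-file content hash compared against the previous run. A minimal sketch of the idea (not the SDK's actual implementation):

```javascript
import { createHash } from "node:crypto";
import { readFileSync, writeFileSync, mkdtempSync } from "node:fs";
import { join } from "node:path";
import { tmpdir } from "node:os";

// Returns true when a file's content differs from what `seen` recorded
// last run, updating `seen` so the next call can skip it.
function needsReingest(absPath, seen) {
  const hash = createHash("sha256").update(readFileSync(absPath)).digest("hex");
  if (seen.get(absPath) === hash) return false; // unchanged → skip
  seen.set(absPath, hash);
  return true;
}

// Demo against a throwaway file
const dir = mkdtempSync(join(tmpdir(), "tes-demo-"));
const file = join(dir, "a.txt");
const seen = new Map();
writeFileSync(file, "v1");
console.log(needsReingest(file, seen)); // true — first sight
console.log(needsReingest(file, seen)); // false — unchanged
writeFileSync(file, "v2");
console.log(needsReingest(file, seen)); // true — content changed
```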
452
+
453
+ ---
291
454
 
292
455
  ## API Reference
293
456
 
294
- ### `TESClient(config)`
457
+ ### `TESClient(config)` — Observability
295
458
 
296
459
  | Param | Type | Default | Description |
297
460
  |-------|------|---------|-------------|
@@ -329,6 +492,14 @@ import { normalizeResponse } from "@pentatonic-ai/ai-agent-sdk";
329
492
  const { content, model, usage, toolCalls } = normalizeResponse(openaiResponse);
330
493
  ```
331
494
 
495
+ ### `engineAdapter(config)` — Memory
496
+
497
+ Thin HTTP client for the memory engine. `config = { engineUrl, arena, apiKey? }`. Returns `{ ingestChunk(content, metadata), deleteByCorpusFile(repoAbs, relPath), init() }`. See [Use as a library](#use-as-a-library).
498
+
499
+ For raw `/store` / `/search` calls, just `fetch()` against `${engineUrl}` directly — the wire format is documented in `packages/memory-engine/docs/MIGRATION.md`.
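A raw call can be sketched with a small request builder — the payload shape below is an illustrative assumption, so check `packages/memory-engine/docs/MIGRATION.md` for the real wire format:

```javascript
// Hypothetical payload shape for illustration only — see MIGRATION.md
// for the engine's actual /store body.
function storeRequest(engineUrl, arena, content, metadata = {}) {
  return {
    url: `${engineUrl}/store`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ arena, content, metadata }),
    },
  };
}

// Usage (requires a running engine):
//   const { url, init } = storeRequest("http://localhost:8099", "default", "note text");
//   const res = await fetch(url, init);
console.log(storeRequest("http://localhost:8099", "default", "hi").url); // http://localhost:8099/store
```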
500
+
501
+ ---
502
+
332
503
  ## Health Checks (`doctor`)
333
504
 
334
505
  Run a full health check of your SDK install at any time:
@@ -337,9 +508,7 @@ Run a full health check of your SDK install at any time:
337
508
  npx @pentatonic-ai/ai-agent-sdk doctor
338
509
  ```
339
510
 
340
- `doctor` auto-detects which install path you're on (Local Memory, Hosted
341
- TES, or self-hosted Pentatonic platform) and runs only the checks that
342
- apply. Exit code is `0` for all-clear, `1` for warnings, `2` for critical.
511
+ `doctor` auto-detects which install path you're on (Local Memory, Hosted TES, or self-hosted Pentatonic platform) and runs only the checks that apply. Exit code is `0` for all-clear, `1` for warnings, `2` for critical.
343
512
 
344
513
  Common flags:
345
514
 
@@ -353,17 +522,13 @@ npx @pentatonic-ai/ai-agent-sdk doctor --path local
353
522
  What gets checked:
354
523
 
355
524
  - **Universal** — Node version, disk space, SDK config-file permissions
356
- - **Local Memory** — Postgres + pgvector + migrations, embedding/LLM
357
- endpoints, memory server port
525
+ - **Local engine** — engine `/health`, per-layer health (L0–L6), embedding endpoint reachability
358
526
  - **Hosted TES** — endpoint reachable, API key authenticates
359
- - **Self-hosted platform** — HybridRAG, Qdrant, Neo4j, vLLM (each
360
- optional, skipped when its env var is unset)
527
+ - **Plugin config** — `tes-memory.local.md` parses, `memory_url` reachable
361
528
 
362
529
  ### Plugins
363
530
 
364
- Drop a `.mjs` file into `~/.config/pentatonic-ai/doctor-plugins/` to add
365
- your own checks. Useful for app-specific things — internal APIs, ingest
366
- freshness, custom infrastructure — without forking the SDK.
531
+ Drop a `.mjs` file into `~/.config/pentatonic-ai/doctor-plugins/` to add your own checks. Useful for app-specific things — internal APIs, ingest freshness, custom infrastructure — without forking the SDK.
367
532
 
368
533
  ```js
369
534
  // ~/.config/pentatonic-ai/doctor-plugins/my-app.mjs
@@ -384,32 +549,38 @@ export default {
384
549
  };
385
550
  ```
386
551
 
387
- See [`packages/doctor/README.md`](packages/doctor/README.md) for the full
388
- plugin contract and programmatic API.
552
+ See [`packages/doctor/README.md`](packages/doctor/README.md) for the full plugin contract and programmatic API.
553
+
554
+ ---
389
555
 
390
556
  ## Architecture
391
557
 
392
558
  ```
393
- +-------------------+ +-------------------+
394
- | Claude Code Plugin| | OpenClaw Plugin |
395
- | (hooks: auto- | | (context engine: |
396
- | search + store) | | ingest, assemble, |
397
- +--------+----------+ | compact, tools) |
398
- | +--------+----------+
399
- | |
400
- +------------+------------+
401
- |
402
- +-----------+-----------+
403
- | |
404
- Local Memory Hosted TES
405
- (Docker) (Cloud)
406
- | |
407
- +----+----+----+ +---+----+---+
408
- | | | | | | | |
409
- PG Ollama MCP HTTP PG R2 Queue Workers
410
- pgvector API pgvector Modules
559
+ Your code / Claude Code plugin / OpenClaw plugin
560
+ |
561
+ +-------------------+--------------------+
562
+ | |
563
+ Memory product Observability product
564
+ (engine HTTP API) (TESClient.wrap)
565
+ | |
566
+ | POST /store /search /forget | CHAT_TURN events
567
+ ▼ ▼
568
+ +----------------+ +-----------------+
569
+ | memory engine | | TES |
570
+ | (compat shim) | | (Cloudflare) |
571
+ +----------------+ | Workers, R2, |
572
+ | | Queues, Pages |
573
+ +----------+----------+ +--------+--------+
574
+ | | |
575
+ Local Hosted ---------------------------+
576
+ (your machine) (Pentatonic-managed)
577
+ | |
578
+ docker compose AWS/GCP container cluster
579
+ + host Ollama + AI gateway (NV-Embed-v2)
411
580
  ```
412
581
 
582
+ Plugins (Claude Code, OpenClaw) are lightweight integrations on top of both products — they call into memory and emit observability events on the user's behalf.
583
+
413
584
  ## License
414
585
 
415
586
  MIT