npm - @pentatonic-ai/ai-agent-sdk - Versions diffs - 0.8.6 → 0.8.7 - Mend

@pentatonic-ai/ai-agent-sdk 0.8.6 → 0.8.7

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (11)

package/README.md +2 -2
package/dist/index.cjs +1 -1
package/dist/index.js +1 -1
package/package.json +1 -1
package/packages/memory/README.md +33 -0
package/packages/memory/openclaw-plugin/README.md +25 -0
package/packages/memory/openclaw-plugin/openclaw.plugin.json +15 -8
package/packages/memory/openclaw-plugin/package.json +1 -1
package/packages/memory/src/server.js +16 -0
package/packages/memory-engine/MIGRATION.md +219 -0
package/packages/memory-engine/README.md +20 -6

package/README.md CHANGED

@@ -238,7 +238,7 @@ await adapter.init();
 await adapter.ingestChunk('User prefers dark mode', { kind: 'note' });
 ```
-For raw `/search` and `/store`, just `fetch()` against `${engineUrl}/search` etc. The wire format is documented in `packages/memory-engine/docs/MIGRATION.md`.
+For raw `/search` and `/store`, just `fetch()` against `${engineUrl}/search` etc. The wire format is documented in `packages/memory-engine/MIGRATION.md`.
 ---
@@ -504,7 +504,7 @@ const { content, model, usage, toolCalls } = normalizeResponse(openaiResponse);
 Thin HTTP client for the memory engine. `config = { engineUrl, arena, apiKey? }`. Returns `{ ingestChunk(content, metadata), deleteByCorpusFile(repoAbs, relPath), init() }`. See [Use as a library](#use-as-a-library).
-For raw `/store` / `/search` calls, just `fetch()` against `${engineUrl}` directly — the wire format is documented in `packages/memory-engine/docs/MIGRATION.md`.
+For raw `/store` / `/search` calls, just `fetch()` against `${engineUrl}` directly — the wire format is documented in `packages/memory-engine/MIGRATION.md`.
 ---
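
Both hunks point readers at raw `fetch()` calls against `${engineUrl}/store` and `${engineUrl}/search`. Below is a minimal sketch of those calls. It assumes Node 18+ (global `fetch`) and the request fields documented in `packages/memory-engine/MIGRATION.md`, which is added in this release. The helper names `buildSearchBody`, `buildStoreBody`, and `postJson` are hypothetical and are not SDK exports.

```javascript
// Hypothetical helpers (not SDK exports) for the memory engine's raw HTTP API.
// Field names follow the MIGRATION.md wire format.

// Build a /search body. Optional fields the caller leaves unset are omitted,
// so the engine applies its documented defaults (limit 16, min_score 0, hybrid).
function buildSearchBody(query, { limit, minScore, clientId, userId, method } = {}) {
  const body = { query };
  if (limit !== undefined) body.limit = limit;
  if (minScore !== undefined) body.min_score = minScore;
  if (clientId !== undefined) body.client_id = clientId; // tenant scope
  if (userId !== undefined) body.user_id = userId;
  if (method !== undefined) body.method = method; // "hybrid" | "vector" | "bm25"
  return body;
}

// Build a /store body; metadata.arena is the tenant-scoping key
// (the engine falls back to "default" when it is absent).
function buildStoreBody(content, metadata = {}) {
  return { content, metadata };
}

// POST a JSON body to the engine and return the parsed JSON response.
// fetchImpl defaults to the global fetch and can be swapped out in tests.
async function postJson(engineUrl, path, body, fetchImpl = globalThis.fetch) {
  const base = engineUrl.replace(/\/+$/, ""); // tolerate a trailing slash
  const res = await fetchImpl(`${base}${path}`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
  });
  if (!res.ok) throw new Error(`${path} failed: HTTP ${res.status}`);
  return res.json();
}

// Usage against a local engine on the default compat-shim port:
// const engineUrl = "http://localhost:8099";
// await postJson(engineUrl, "/store",
//   buildStoreBody("User prefers dark mode", { arena: "my-app", kind: "note" }));
// const { results } = await postJson(engineUrl, "/search",
//   buildSearchBody("dark mode", { clientId: "my-app", limit: 5 }));
```

Because unset options are left out of the body rather than sent as `null`, server-side defaults stay authoritative.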

package/dist/index.cjs CHANGED

@@ -906,7 +906,7 @@ function fireAndForgetEmit(clientConfig, sessionOpts, messages, result, model) {
 }
 // src/telemetry.js
-var VERSION = "0.8.6";
+var VERSION = "0.8.7";
 var TELEMETRY_URL = "https://sdk-telemetry.philip-134.workers.dev";
 function machineId() {
   const raw = typeof process !== "undefined" ? `${process.env?.USER || process.env?.USERNAME || "u"}:${process.platform || "x"}:${process.arch || "x"}` : "browser";

package/dist/index.js CHANGED

@@ -875,7 +875,7 @@ function fireAndForgetEmit(clientConfig, sessionOpts, messages, result, model) {
 }
 // src/telemetry.js
-var VERSION = "0.8.6";
+var VERSION = "0.8.7";
 var TELEMETRY_URL = "https://sdk-telemetry.philip-134.workers.dev";
 function machineId() {
   const raw = typeof process !== "undefined" ? `${process.env?.USER || process.env?.USERNAME || "u"}:${process.platform || "x"}:${process.arch || "x"}` : "browser";

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "@pentatonic-ai/ai-agent-sdk",
-  "version": "0.8.6",
+  "version": "0.8.7",
   "description": "TES SDK — LLM observability and lifecycle tracking via Pentatonic Thing Event System. Track token usage, tool calls, and conversations. Manage things through event-sourced lifecycle stages with AI enrichment and vector search.",
   "type": "module",
   "main": "./dist/index.cjs",

package/packages/memory/README.md CHANGED

@@ -1,5 +1,38 @@
 # Memory System
+> ## ⚠️ DEPRECATED — use the 7-layer memory engine instead
+>
+> This package is the **legacy** single-process MCP server backed by
+> PostgreSQL + pgvector + Ollama. It's superseded by the **7-layer
+> memory engine** at [`packages/memory-engine/`](../memory-engine/),
+> which is what the top-level SDK README walks users into and what
+> TES production runs.
+>
+> | | This package (legacy) | `packages/memory-engine/` (current) |
+> |---|---|---|
+> | Wire | MCP over stdio | HTTP (`/store`, `/search`, ...) |
+> | Storage | One Postgres table, one embedding | 7 layers fused via RRF |
+> | Features | HyDE expansion | KG entities (L3), cross-encoder reranker (L6), multi-collection (L5), arena scoping, /health/deep |
+> | Bench accuracy | (not benched) | 84.6% / p50 110ms |
+> | Deploys | Single node process | docker compose stack |
+>
+> **What still works:** this server keeps running until v1.0. Existing
+> deployments continue to function; no breaking changes here. A startup
+> warning prints to stderr on every cold-start (suppress with
+> `PENTATONIC_DEPRECATION_QUIET=1`).
+>
+> **What to do:** new installs should follow the engine path — see the
+> top-level [README → Memory → Local](../../README.md#local-self-hosted)
+> section. Existing installs can keep running this server through the
+> v0.9 line; migration guidance for v1.0 will land before then.
+>
+> **Why deprecate:** the engine subsumes every feature this server
+> offers (and adds graph, reranker, multi-store fusion), so maintaining
+> both paths fragments testing, security review, and operator-facing
+> docs without a payoff. One product is clearer than two.
+---
 Self-hosted memory system for AI agents. Give Claude Code or OpenClaw persistent, searchable memory backed by PostgreSQL, pgvector, and Ollama. Fully local — no API keys, no cloud dependencies.
 ## What You Get

package/packages/memory/openclaw-plugin/README.md CHANGED

@@ -2,6 +2,31 @@
 Persistent, searchable memory for OpenClaw. Local (Docker + Ollama) or hosted (Pentatonic TES).
+> ## ⚠️ The local-mode config below targets a deprecated backend
+>
+> The `database_url` / `embedding_url` / `llm_url` config fields shown
+> in this README configure the **legacy** `packages/memory/` Postgres+
+> Ollama+pgvector MCP server, which is being retired in favour of the
+> 7-layer memory engine at `packages/memory-engine/`. Both backends
+> still work; the legacy one will be removed in v1.0.
+>
+> **For new installs, prefer:**
+>
+> ```json
+> "pentatonic-memory": {
+>   "enabled": true,
+>   "config": {
+>     "mode": "local",
+>     "memory_url": "http://localhost:8099"
+>   }
+> }
+> ```
+>
+> …with the engine brought up via `docker compose up -d` from
+> `packages/memory-engine/`. See the
+> [top-level SDK README](../../../README.md#local-self-hosted) for the
+> walkthrough.
 ## Install
 ```bash

package/packages/memory/openclaw-plugin/openclaw.plugin.json CHANGED

@@ -2,7 +2,7 @@
   "id": "pentatonic-memory",
   "name": "Pentatonic Memory",
   "description": "Persistent, searchable memory with multi-signal retrieval and HyDE query expansion. Local (Docker + Ollama) or hosted (Pentatonic TES).",
-  "version": "0.8.5",
+  "version": "0.8.6",
   "kind": "context-engine",
   "configSchema": {
     "type": "object",
@@ -16,28 +16,35 @@
       },
       "memory_url": {
         "type": "string",
-        "default": "http://localhost:3333",
-        "description": "Memory server HTTP URL (local mode, default: http://localhost:3333)"
+        "default": "http://localhost:8099",
+        "description": "Memory engine HTTP URL (local mode). Default 8099 = packages/memory-engine compat shim. (3333 was the legacy single-process MCP server port — deprecated.)"
       },
       "database_url": {
         "type": "string",
-        "description": "PostgreSQL connection string (local mode)"
+        "description": "PostgreSQL connection string. DEPRECATED — targets the legacy packages/memory MCP server, retired in favor of memory_url + the 7-layer engine. Removal targeted v1.0.",
+        "deprecated": true
       },
       "embedding_url": {
         "type": "string",
-        "description": "OpenAI-compatible embeddings endpoint (local mode)"
+        "description": "OpenAI-compatible embeddings endpoint. DEPRECATED — same scope as database_url; the engine manages its own embedding routing via L*_EMBED_PROVIDER server-side.",
+        "deprecated": true
       },
       "embedding_model": {
         "type": "string",
-        "default": "nomic-embed-text"
+        "default": "nomic-embed-text",
+        "description": "Legacy-mode embedding model. DEPRECATED — paired with database_url.",
+        "deprecated": true
       },
       "llm_url": {
         "type": "string",
-        "description": "OpenAI-compatible chat endpoint for HyDE (local mode)"
+        "description": "OpenAI-compatible chat endpoint for HyDE. DEPRECATED — HyDE is not used by the 7-layer engine; legacy-mode only.",
+        "deprecated": true
       },
       "llm_model": {
         "type": "string",
-        "default": "llama3.2:3b"
+        "default": "llama3.2:3b",
+        "description": "Legacy-mode chat model for HyDE. DEPRECATED — paired with llm_url.",
+        "deprecated": true
       },
       "tes_endpoint": {
         "type": "string",
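
The 0.8.6 plugin schema now flags the legacy local-mode fields (`database_url`, `embedding_url`, `embedding_model`, `llm_url`, `llm_model`) with `"deprecated": true`. A setup script could surface those flags to users before startup. Below is a minimal sketch. It assumes the fields sit under `configSchema.properties`, as in standard JSON Schema (the hunk elides the lines in between). `findDeprecatedKeys` is a hypothetical helper, not part of OpenClaw or the plugin, and the config values are placeholders.

```javascript
// Hypothetical helper: list config keys that the plugin's configSchema
// marks as deprecated, so a setup script can warn before startup.
function findDeprecatedKeys(configSchema, userConfig) {
  const props = configSchema?.properties ?? {};
  return Object.keys(userConfig ?? {}).filter(
    (key) => props[key]?.deprecated === true
  );
}

// Trimmed-down copy of the 0.8.6 schema (only the fields used here).
const schema = {
  type: "object",
  properties: {
    memory_url: { type: "string", default: "http://localhost:8099" },
    database_url: { type: "string", deprecated: true },
    embedding_url: { type: "string", deprecated: true },
    llm_url: { type: "string", deprecated: true },
  },
};

// A legacy-style local config (placeholder values).
const legacyConfig = {
  mode: "local",
  database_url: "postgres://user@localhost/memory",
  embedding_url: "http://localhost:11434/v1",
};

for (const key of findDeprecatedKeys(schema, legacyConfig)) {
  console.warn(`pentatonic-memory: "${key}" is deprecated; set memory_url instead.`);
}
```

Keys that the schema does not describe (such as `mode` in this trimmed copy) are ignored rather than flagged.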

package/packages/memory/openclaw-plugin/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "@pentatonic-ai/openclaw-memory-plugin",
-  "version": "0.8.5",
+  "version": "0.8.6",
   "description": "Pentatonic Memory plugin for OpenClaw — persistent, searchable memory with multi-signal retrieval and HyDE query expansion",
   "type": "module",
   "main": "index.js",

package/packages/memory/src/server.js CHANGED

@@ -40,6 +40,22 @@ process.on("unhandledRejection", (err) => {
   process.stderr.write(`[memory-server] Unhandled rejection: ${err?.message || err}\n`);
 });
+// Deprecation notice — see packages/memory/README.md for context.
+// This MCP server (Postgres+pgvector+Ollama, single-process) is being
+// retired in favour of the 7-layer engine at packages/memory-engine/.
+// Targeted for removal in v1.0; in the meantime everything keeps
+// working. Print once on startup so operators see the signal in logs
+// without flooding the conversation surface.
+if (process.env.PENTATONIC_DEPRECATION_QUIET !== "1") {
+  process.stderr.write(
+    "[memory-server] DEPRECATED: this server (Postgres+pgvector+Ollama MCP) " +
+      "is superseded by the 7-layer memory engine at packages/memory-engine/. " +
+      "Existing deployments keep working; removal targeted for v1.0. " +
+      "See README → Memory → Local for the migration path. " +
+      "Suppress this warning with PENTATONIC_DEPRECATION_QUIET=1.\n"
+  );
+}
 const CLIENT_ID = process.env.CLIENT_ID || "default";
 function createMemory() {

package/packages/memory-engine/MIGRATION.md ADDED

@@ -0,0 +1,219 @@
+# Wire format & migration
+`pentatonic-memory-engine` is a drop-in replacement for `pentatonic-memory` v0.5.x.
+Same HTTP API, same request/response shapes — your existing SDK client code
+keeps working. This doc covers:
+1. **Wire format** — what every endpoint accepts and returns
+2. **What changed vs v0.5.x** — new endpoints + small additive fields
+3. **Operational notes** — per-layer health, deep healthchecks, env vars
+---
+## Wire format
+All endpoints accept and return JSON. Base URL is the compat-shim host
+(default `http://localhost:8099`).
+### `POST /store`
+Write a memory.
+```json
+{
+  "content": "User prefers dark mode",
+  "metadata": {
+    "arena": "my-app",         // tenant-scoping key. Defaults to "default".
+    "kind": "note",             // optional; opaque, surfaced on read
+    "source_file": "config.md", // optional; routed to L1/L6 paths
+    "contact_email": "...",     // optional; triggers L3 Person extraction
+    "contact_name": "...",      // optional; same
+    "channel": "email",         // optional; tags ChannelStat denormalisation
+    "direction": "inbound"      // optional; same
+  },
+  "client_id": "my-app"         // optional; alternative to metadata.arena
+}
+```
+Response:
+```json
+{
+  "id": "cc830b145b1e36514f73fd508aac885a",
+  "content": "User prefers dark mode",
+  "layerId": "ml_my-app_episodic",
+  "engine": {
+    "l0": 1, "l4_qmd": 1, "l4": 1, "l5": 1, "l6": 1,
+    "l3_chunks": 0, "l3_entities": 0  // non-zero when metadata.contact_* present
+  }
+}
+```
+### `POST /store-batch`
+Same body as `/store` but `content` is replaced with `records: [{content, metadata}, ...]`.
+30–50× faster than calling `/store` per record because L2 issues one
+batched `/v1/embeddings` call instead of one per record.
+### `POST /search`
+Hybrid search across all 7 layers, RRF-fused.
+```json
+{
+  "query": "dark mode preferences",
+  "limit": 10,                  // default 16
+  "min_score": 0.3,              // default 0; filter low-similarity hits
+  "client_id": "my-app",         // tenant scope
+  "user_id": "alice@example.com", // optional; spans clientId AND clientId:user_id arenas
+  "method": "hybrid"             // hybrid (default) | vector | bm25
+}
+```
+Response:
+```json
+{
+  "results": [
+    {
+      "id": "cc830b145b1e36514f73fd508aac885a.md",
+      "content": "User prefers dark mode",
+      "metadata": { "arena": "my-app", "kind": "note", ... },
+      "similarity": 0.87,
+      "layer_id": "ml_my-app_episodic",
+      "source": "cc830b145b1e36514f73fd508aac885a.md",
+      "engine_layer": ""        // which layer produced this hit (l0/l3/l4/l5/l6)
+    }
+  ]
+}
+```
+### `POST /forget`
+Delete memories matching a filter. Two modes:
+- **Per-arena delete**: `{"arena": "my-app"}` removes everything tagged with
+  that arena across L0/L4/L5/L6 + L3 chunks + L3 Entity nodes.
+- **Global wipe**: `{"confirm": "GLOBAL_WIPE"}` (literal string, no arena) wipes
+  every layer for every tenant. Intended for dev resets only.
+Returns counts deleted per layer.
+### `GET /health`
+Shallow health. Returns 200 even when `status: "degraded"` — the body's
+`status` is the verdict.
+```json
+{
+  "status": "ok",
+  "version": "0.1.0",
+  "engine": "pentatonic-memory-engine",
+  "layers": {
+    "l0": "ok", "l1": "ok", "l2": "ok", "l3": "ok",
+    "l4": "ok", "l5": "ok", "l6": "ok", "nv_embed": "ok"
+  },
+  "memories": {
+    "l0_bm25_chunks": 233142,
+    "l4_vectors": 64212,
+    "l5_chats_chunks": 109671,
+    "l6_vector_chunks": 68220,
+    "l6_fts_chunks": 39703
+  }
+}
+```
+Layer values are either `"ok"`, `"degraded"`, `"http <code>"`, or
+`"unreachable: <reason>"`. The aggregate `status` is `degraded` if any
+single layer is non-ok and `down` if ≥3 are non-ok.
+The `memories` field is a per-layer chunk-count dict (since v0.8.4 — previously
+a single int that only reported L6's count and misled operators about real
+corpus size).
+### `GET /health/deep` (since v0.8.4)
+Synthetic round-trip per layer: embed a sentinel → write to layer → search
+for it → assert hit. Slower (~1–2s); intended for ops/cron, not for compose
+healthchecks.
+```json
+{
+  "status": "ok",
+  "ok": true,
+  "layers": {
+    "l4": {"status": "ok", "ok": true, "embed_ms": 12.6, "write_ms": 2.2,
+           "search_ms": 0.0, "hit": true, "total_ms": 17.1},
+    "l5": {"status": "ok", "ok": true,
+           "collections": {"chats": true, "emails": true, "contacts": true, "memory": true},
+           "embed_ms": 9.4, "write_ms": 7.6, "search_ms": 1.6, "hit": true},
+    "l6": {"status": "ok", "ok": true, "embed_ms": 11.3, "write_ms": 635.1,
+           "search_ms": 59.5, "hit": true, "reranker": "ok"}
+  }
+}
+```
+Sentinel rows are stored under arena `__healthcheck__` with a fixed id, so
+the probe pollutes the corpus by at most one row per layer (upserts, never
+accumulates).
+### `POST /aggregate` (since v0.8.x)
+Typed-Person aggregation over the L3 graph. Counts `(:Person)-[:COMMUNICATED]->(:Chunk)`
+edges by `group_by` keys. See `packages/memory-engine/compat/server.py` for
+the full schema — used today by the relationships UI in the TES module.
+---
+## What changed vs `pentatonic-memory` v0.5.x
+| | v0.5.x | This package |
+|---|---|---|
+| Storage | Single Postgres + pgvector + HNSW | 7-layer fusion (SQLite FTS, Neo4j, sqlite-vec, Milvus, Milvus+rerank) |
+| Embedding | One model, one dim, ingest-time | Per-layer configurable; provider-aware (`L*_EMBED_PROVIDER`) |
+| Endpoints added | — | `/store-batch`, `/forget`, `/health/deep`, `/aggregate` |
+| `/health` body | `{status, layers}` only | + `version`, `engine`, `nv_embed` layer, per-layer `memories` dict |
+| `/store` engine fields | absent | `engine: { l0, l3_chunks, l3_entities, l4, l5, l6 }` per-layer write counts |
+| Backwards-incompat changes | — | **None.** All v0.5 client code keeps working. |
+The engine container's compat shim (`compat/server.py`) is the API surface;
+the layer services behind it can be replaced or scaled independently
+without changing client code.
+---
+## Operational notes
+### Provider-aware embedding (since v0.8.0)
+Each layer service picks an embedding provider via `L*_EMBED_PROVIDER`:
+- `openai` (default) — Bearer auth, `/v1/embeddings` path, OpenAI-shaped body
+- `pentatonic-gateway` — X-API-Key auth, `/v1/embed` path
+- `cohere` — `{texts, input_type}` body shape
+A 401 from the configured provider auto-detects against the other built-ins
+(opt out per layer with `L*_EMBED_AUTODETECT=false`). See
+`engine/services/_shared/embed_provider.py` for the dispatch table.
+### L2 concurrency (since v0.8.4)
+L2 hybridrag-proxy is async throughout — `AsyncGraphDatabase` for Neo4j,
+`httpx.AsyncClient` for L4/L5/L6 fan-out, `asyncio.to_thread` for sqlite +
+PyTorch reranker. Fan-out across layers runs concurrently via
+`asyncio.gather`. Under sustained ingest, `/health` and `/search` no
+longer compete for a saturated threadpool.
+### L5 collection bootstrap (since v0.8.4)
+L5's serve() ensures all four collections (`chats`, `emails`, `contacts`,
+`memory`) exist at startup. Previously only `chats` was bootstrapped and
+writes to the others would 500.
+### Health check semantics
+- `compose` healthcheck and `engine-runner.sh` deploy gate use `/health` —
+  it's fast (<50ms) and returns HTTP 200 regardless of body status.
+- Operators/cron should use `/health/deep` for real functional validation.
+- A `compose` healthcheck on `/health/deep` would burn embedding budget
+  every 10s — avoid.
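
`MIGRATION.md` states the `/health` verdict rule in prose. Each layer reports `"ok"`, `"degraded"`, `"http <code>"`, or `"unreachable: <reason>"`. The aggregate `status` is `degraded` when any layer is non-ok and `down` when three or more are. The endpoint returns HTTP 200 in every case, so clients have to read the body. The sketch below re-derives that rule client-side, for example to report which layers are failing. `summarizeHealth` is a hypothetical name. The document does not say whether `nv_embed` counts toward the threshold, so this sketch treats every key in `layers` alike.

```javascript
// Re-derive the /health verdict from per-layer values, following the rule
// documented in MIGRATION.md: any non-"ok" layer => "degraded",
// three or more non-"ok" layers => "down".
// Assumption: every key in `layers` (including nv_embed) counts equally.
function summarizeHealth(layers) {
  const failing = Object.entries(layers)
    .filter(([, value]) => value !== "ok")
    .map(([name, value]) => ({ name, value }));
  let status = "ok";
  if (failing.length >= 3) status = "down";
  else if (failing.length > 0) status = "degraded";
  return { status, failing };
}

const { status, failing } = summarizeHealth({
  l0: "ok", l1: "ok", l2: "ok", l3: "unreachable: connection refused",
  l4: "ok", l5: "http 500", l6: "ok", nv_embed: "ok",
});
console.log(status); // degraded
console.log(failing.map((f) => f.name).join(",")); // l3,l5
```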

package/packages/memory-engine/README.md CHANGED

@@ -52,7 +52,7 @@ client (any) ───► POST /forget ──►   (FastAPI)  │──►│
                                                      ├──────────────────┤
                                                      │  L4  sqlite-vec   │
                                                      ├──────────────────┤
-                                                     │  L5  Qdrant comms │
+                                                     │  L5  Milvus comms │
                                                      ├──────────────────┤
                                                      │  L6  Document     │
                                                      │      Store +      │
@@ -76,7 +76,7 @@ Each layer indexes the same content differently. Search runs all seven in parall
 | L2 | HybridRAG orchestrator | Fan-out + RRF fusion across all layers | Python FastAPI |
 | L3 | Knowledge Graph | Entity-aware retrieval, multi-hop relationships | Neo4j (OSS) |
 | L4 | Vector index | High-recall semantic search | sqlite-vec |
-| L5 | Comms / multi-collection vectors | Chat / email / contact / memory namespaces | Qdrant |
+| L5 | Comms / multi-collection vectors | Chat / email / contact / memory namespaces | Milvus (Lite by default; standalone via compose) |
 | L6 | Document store | Per-arena docs + cross-encoder reranker | sqlite + Milvus + MiniLM |
 ## Quick start
@@ -92,7 +92,19 @@ Wait ~30s for layers to come up. Verify:
 ```bash
 curl http://localhost:8099/health
-# → {"status":"ok","layers":{"l0":"ok","l1":"ok","l2":"ok","l3":"ok","l4":"ok","l5":"ok","l6":"ok"},"engine":"pentatonic-memory-engine"}
+# → {
+#     "status": "ok",
+#     "version": "0.1.0",
+#     "engine": "pentatonic-memory-engine",
+#     "layers": {"l0":"ok","l1":"ok","l2":"ok","l3":"ok","l4":"ok","l5":"ok","l6":"ok","nv_embed":"ok"},
+#     "memories": {
+#       "l0_bm25_chunks": N, "l4_vectors": N,
+#       "l5_chats_chunks": N, "l6_vector_chunks": N, "l6_fts_chunks": N
+#     }
+#   }
+# Or run real functional round-trips per layer (slower; ~1–2s):
+curl http://localhost:8099/health/deep
 ```
 Now point your existing `pentatonic-memory` SDK client at `http://localhost:8099` — no code change.
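
The quick start says to wait about 30 s for the layers to come up before checking `/health`. `/health` answers HTTP 200 even when the engine is degraded, so a readiness wait has to poll the body's `status` field instead of the HTTP code. Below is a minimal sketch. It assumes Node 18+ (global `fetch`). `waitForEngine` and its timeout and interval defaults are illustrative choices, not engine settings.

```javascript
// Poll GET /health until the body reports status "ok" or the deadline passes.
// HTTP 200 alone is not enough: the engine also returns 200 for "degraded".
// Hypothetical helper; timeout/interval defaults are arbitrary.
async function waitForEngine(
  engineUrl,
  { timeoutMs = 60_000, intervalMs = 2_000, fetchImpl = globalThis.fetch } = {}
) {
  const deadline = Date.now() + timeoutMs;
  let last = "no response";
  while (Date.now() < deadline) {
    try {
      const res = await fetchImpl(`${engineUrl}/health`);
      const body = await res.json();
      if (body.status === "ok") return body;
      last = `status=${body.status}`;
    } catch (err) {
      last = err.message; // engine not listening yet, or a non-JSON reply
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`engine not healthy after ${timeoutMs} ms (${last})`);
}

// Usage after `docker compose up -d`:
// const health = await waitForEngine("http://localhost:8099");
// console.log(health.memories);
```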
@@ -122,10 +134,12 @@ Both modes populate all 7 layers on `/store-batch` (since v0.2). The mode flag o
 |---|---|---|---|
 | `POST /store` | ✅ | ✅ | Same request/response shape |
 | `POST /search` | ✅ | ✅ | Same request/response shape; ?mode=vector/text both supported |
-| `GET /health` | ✅ | ✅ | Returns aggregate health across all 7 layers |
-| `POST /store-batch` | ❌ | ✅ | New: batch-ingest N records in one HTTP call (30-50× faster) |
+| `GET /health` | ✅ | ✅ | Returns aggregate health across all 7 layers + nv-embed reachability + per-layer `memories` counts |
+| `GET /health/deep` | ❌ | ✅ | NEW (v0.8.4): synthetic embed → write → search round-trip per layer. Slower (~1–2s); for ops/monitoring on demand. |
+| `POST /store-batch` | ❌ | ✅ | Batch-ingest N records in one HTTP call (30-50× faster) |
 | `POST /forget` | ❌ (regression) | ✅ | Restored from v0.4.x; supports `metadata_contains` filter |
+| `POST /aggregate` | ❌ | ✅ | NEW (v0.8.x): typed-Person aggregation over the L3 graph — counts COMMUNICATED edges per channel via the ChannelStat denormalisation |
-Migration: see `docs/MIGRATION.md`.
+Migration: see [`MIGRATION.md`](MIGRATION.md) for the wire-format walkthrough.
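
The new `/health/deep` row targets ops and on-demand monitoring rather than compose healthchecks. The response shape shown in `MIGRATION.md` gives each layer an `ok` flag and, for the sentinel round-trip, a `hit` flag. L5 also reports a per-collection map. A cron job could reduce that body to a list of failing layers. Below is a sketch. `deepCheckFailures` is a hypothetical name, and treating a `false` collection entry as "not ready" and signalling failure through a non-zero exit code are both assumptions.

```javascript
// Reduce a GET /health/deep body to human-readable failures.
// A layer fails if ok !== true, or if its sentinel search missed (hit === false).
function deepCheckFailures(body) {
  const failures = [];
  for (const [name, layer] of Object.entries(body.layers ?? {})) {
    if (layer.ok !== true) failures.push(`${name}: status=${layer.status}`);
    else if (layer.hit === false) failures.push(`${name}: sentinel not found`);
    // L5 reports per-collection readiness; flag any collection that is not true.
    for (const [coll, ready] of Object.entries(layer.collections ?? {})) {
      if (ready !== true) failures.push(`${name}.${coll}: collection not ready`);
    }
  }
  return failures;
}

// Cron-style usage (Node 18+): set a non-zero exit code so the scheduler alerts.
async function main(engineUrl = "http://localhost:8099") {
  const res = await fetch(`${engineUrl}/health/deep`);
  const failures = deepCheckFailures(await res.json());
  if (failures.length > 0) {
    console.error(failures.join("\n"));
    process.exitCode = 1;
  }
}
// main();
```

Each probe takes roughly 1 to 2 s and writes sentinel rows, so it belongs on a cron schedule, not in a 10 s compose healthcheck.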