@pentatonic-ai/ai-agent-sdk 0.3.0-beta.3 → 0.4.0-beta.1

This diff reflects changes between publicly released package versions as they appear in their respective public registries, and is provided for informational purposes only.
package/README.md CHANGED
@@ -1,400 +1,275 @@
1
- # @pentatonic-ai/ai-agent-sdk
1
+ <p align="center">
2
+ <picture>
3
+ <source media="(prefers-color-scheme: dark)" srcset="https://raw.githubusercontent.com/Pentatonic-Ltd/ai-agent-sdk/main/.github/logo-light.svg">
4
+ <source media="(prefers-color-scheme: light)" srcset="https://raw.githubusercontent.com/Pentatonic-Ltd/ai-agent-sdk/main/.github/logo-dark.svg">
5
+ <img alt="Pentatonic" src="https://raw.githubusercontent.com/Pentatonic-Ltd/ai-agent-sdk/main/.github/logo-dark.svg" width="200">
6
+ </picture>
7
+ </p>
2
8
 
3
- LLM observability SDK — track token usage, tool calls, and conversations via [Pentatonic TES](https://api.pentatonic.com).
9
+ <h3 align="center">AI Agent SDK</h3>
4
10
 
5
- Provider-agnostic: automatically wraps OpenAI, Anthropic, and Cloudflare Workers AI clients. Available for both **JavaScript** and **Python**.
11
+ <p align="center">
12
+ Observability, memory, and analytics for LLM applications.<br>
13
+ Run locally or use hosted TES. JavaScript &amp; Python.
14
+ </p>
6
15
 
7
- ## Getting Started
16
+ <p align="center">
17
+ <a href="https://www.npmjs.com/package/@pentatonic-ai/ai-agent-sdk"><img src="https://img.shields.io/npm/v/@pentatonic-ai/ai-agent-sdk?style=flat-square&color=00fba9&label=npm" alt="npm"></a>
18
+ <a href="https://pypi.org/project/pentatonic-ai-agent-sdk/"><img src="https://img.shields.io/pypi/v/pentatonic-ai-agent-sdk?style=flat-square&color=00fba9&label=pypi" alt="PyPI"></a>
19
+ <a href="https://github.com/Pentatonic-Ltd/ai-agent-sdk/blob/main/LICENSE"><img src="https://img.shields.io/github/license/Pentatonic-Ltd/ai-agent-sdk?style=flat-square&color=333" alt="License"></a>
20
+ </p>
8
21
 
9
- ### 1. Create an account and get your API key
22
+ ---
10
23
 
11
- ```bash
12
- npx @pentatonic-ai/ai-agent-sdk init
13
- ```
24
+ ## Table of Contents
14
25
 
15
- This will walk you through:
16
- - Creating a Pentatonic account (email, company name, password)
17
- - Choosing a data region (EU or US)
18
- - Email verification
19
- - Generating your API key
26
+ - [Overview](#overview)
27
+ - [Local Memory (self-hosted)](#local-memory-self-hosted)
28
+ - [Hosted TES](#hosted-tes)
29
+ - [Claude Code Plugin](#claude-code-plugin)
30
+ - [SDK: Wrap Your LLM Client](#sdk-wrap-your-llm-client)
31
+ - [Supported Providers](#supported-providers)
32
+ - [API Reference](#api-reference)
33
+ - [Architecture](#architecture)
20
34
 
21
- At the end you'll see your credentials:
35
+ ## Overview
22
36
 
23
- ```
24
- TES_ENDPOINT=https://api.pentatonic.com
25
- TES_CLIENT_ID=your-company
26
- TES_API_KEY=tes_your-company_xxxxx
27
- ```
37
+ Two ways to use the SDK:
28
38
 
29
- Add these to your environment (`.env`, secrets manager, etc.) and the CLI will install the SDK for you.
39
+ **Local Memory** -- Run a fully private memory system on your own machine. PostgreSQL + pgvector + Ollama in Docker. No API keys, no cloud. Your agent gets persistent, searchable memory backed by multi-signal retrieval and HyDE query expansion.
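The HyDE step can be sketched as follows. This is a minimal illustration of the idea (generate a hypothetical answer, embed that instead of the raw query, then search), not the SDK's actual implementation; `llm`, `embed`, and `search` are stand-ins for whatever OpenAI-compatible client and vector store you run.

```javascript
// HyDE (Hypothetical Document Embeddings), sketched:
// rather than embedding the raw query, ask a small LLM to draft a
// hypothetical answer first, then embed THAT and search with it.
async function hydeSearch(query, { llm, embed, search }) {
  // 1. Generate a hypothetical passage that would answer the query.
  const hypothetical = await llm(
    `Write a short passage that answers: ${query}`
  );
  // 2. Embed the hypothetical passage, not the query itself.
  const vector = await embed(hypothetical);
  // 3. Run a vector search against stored memory embeddings.
  return search(vector);
}
```

The hypothetical passage tends to sit closer to stored answers in embedding space than the bare question does, which is what makes the expansion worthwhile.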
30
40
 
31
- ### 2. Or install manually
41
+ **Hosted TES** -- Connect to Pentatonic's Thing Event System for production-grade observability, higher-dimensional embeddings, conversation analytics, and team-wide shared memory.
32
42
 
33
- If you already have an account, install the SDK directly:
43
+ Both paths use the same Claude Code plugin. The hooks auto-search on every prompt and auto-store every conversation turn.
34
44
 
35
- ```bash
36
- npm install @pentatonic-ai/ai-agent-sdk
37
- ```
45
+ ## Local Memory (self-hosted)
46
+
47
+ Run the full memory stack locally. Requires Docker and ~4GB of disk space for models.
48
+
49
+ ### 1. Set up
38
50
 
39
51
  ```bash
40
- pip install pentatonic-ai-agent-sdk
52
+ npx @pentatonic-ai/ai-agent-sdk memory
41
53
  ```
42
54
 
43
- You can create API keys in the [Pentatonic dashboard](https://api.pentatonic.com).
55
+ This starts PostgreSQL + pgvector, Ollama, and the memory server, pulls the embedding and chat models, and writes the local config.
44
56
 
45
- ## Quick Start
57
+ ### 2. Install the Claude Code plugin
46
58
 
47
- #### JavaScript
59
+ ```
60
+ /plugin marketplace add Pentatonic-Ltd/ai-agent-sdk
61
+ /plugin install tes-memory@pentatonic-ai
62
+ ```
48
63
 
49
- ```js
50
- import { TESClient } from "@pentatonic-ai/ai-agent-sdk";
64
+ That's it. The plugin hooks automatically search memories on every prompt and store every conversation turn. Fully local, fully private.
51
65
 
52
- const tes = new TESClient({
53
- clientId: process.env.TES_CLIENT_ID,
54
- apiKey: process.env.TES_API_KEY,
55
- endpoint: process.env.TES_ENDPOINT,
56
- });
57
- ```
66
+ ### What you get
58
67
 
59
- #### Python
68
+ - **Automatic memory** -- every conversation turn is stored with embeddings and HyDE query expansion
69
+ - **Semantic search** -- multi-signal retrieval combining vector similarity, BM25 full-text, recency decay, and access frequency
70
+ - **Memory layers** -- episodic (recent), semantic (consolidated), procedural (how-to), working (temporary)
71
+ - **Decay and consolidation** -- memories fade over time; frequently accessed ones get promoted
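The multi-signal ranking and decay described above can be illustrated with a toy scoring function. The weights and decay constant here are made up for illustration; the actual retriever's tuning may differ.

```javascript
// Toy multi-signal relevance score: blend vector similarity, BM25,
// recency decay, and access frequency. Weights are illustrative only.
function memoryScore({ cosine, bm25, ageDays, accessCount }) {
  const recency = Math.exp(-ageDays / 30);        // exponential fade over ~a month
  const frequency = Math.log1p(accessCount) / 5;  // diminishing returns on reuse
  return 0.5 * cosine + 0.25 * bm25 + 0.15 * recency + 0.1 * frequency;
}
```

A fresh, frequently accessed memory outscores a stale one at equal text similarity, which is also the intuition behind promoting hot memories during consolidation.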
60
72
 
61
- ```python
62
- from pentatonic_agent_events import TESClient
63
- import os
73
+ ### Change models
64
74
 
65
- tes = TESClient(
66
- client_id=os.environ["TES_CLIENT_ID"],
67
- api_key=os.environ["TES_API_KEY"],
68
- endpoint=os.environ["TES_ENDPOINT"],
69
- )
75
+ ```bash
76
+ EMBEDDING_MODEL=mxbai-embed-large LLM_MODEL=qwen2.5:7b npx @pentatonic-ai/ai-agent-sdk memory
70
77
  ```
71
78
 
72
- ### Wrap any LLM client (automatic tracking)
73
-
74
- `tes.wrap()` auto-detects your client and intercepts every call — each one emits a `CHAT_TURN` event automatically. Pass an optional `sessionId` to link events from the same conversation, and `metadata` to attach custom fields.
79
+ ### Raspberry Pi
75
80
 
76
- #### JavaScript OpenAI
81
+ A Pi 5 with 8GB of RAM runs the full stack. `nomic-embed-text` (~300MB) + `llama3.2:3b` (~2GB) leave plenty of headroom.
77
82
 
78
- ```js
79
- import OpenAI from "openai";
83
+ ### Use as a library
80
84
 
81
- const ai = tes.wrap(new OpenAI(), { sessionId: "conv-123", metadata: { userId: "u_1" } });
85
+ ```javascript
86
+ import { createMemorySystem } from '@pentatonic/memory';
82
87
 
83
- // Every create() call automatically emits a CHAT_TURN event
84
- const result = await ai.chat.completions.create({
85
- model: "gpt-4o",
86
- messages: [{ role: "user", content: "Hello!" }],
88
+ const memory = createMemorySystem({
89
+ db: pgPool,
90
+ embedding: { url: 'http://localhost:11434/v1', model: 'nomic-embed-text' },
91
+ llm: { url: 'http://localhost:11434/v1', model: 'llama3.2:3b' },
87
92
  });
88
93
 
89
- ai.sessionId; // "conv-123" — or auto-generated UUID if not provided
94
+ await memory.migrate();
95
+ await memory.ensureLayers('my-app');
96
+ await memory.ingest('User prefers dark mode', { clientId: 'my-app' });
97
+ const results = await memory.search('preferences', { clientId: 'my-app' });
90
98
  ```
91
99
 
92
- #### Python — OpenAI
100
+ ## Hosted TES
93
101
 
94
- ```python
95
- from openai import OpenAI
102
+ Connect to Pentatonic's hosted infrastructure for production use.
96
103
 
97
- ai = tes.wrap(OpenAI(), session_id="conv-123", metadata={"user_id": "u_1"})
104
+ ### 1. Create an account
98
105
 
99
- # Every create() call automatically emits a CHAT_TURN event
100
- result = ai.chat.completions.create(
101
- model="gpt-4o",
102
- messages=[{"role": "user", "content": "Hello!"}],
103
- )
104
-
105
- ai.session_id # "conv-123" — or auto-generated UUID if not provided
106
+ ```bash
107
+ npx @pentatonic-ai/ai-agent-sdk init
106
108
  ```
107
109
 
108
- #### JavaScript Anthropic
109
-
110
- ```js
111
- import Anthropic from "@anthropic-ai/sdk";
112
-
113
- const claude = tes.wrap(new Anthropic());
110
+ This walks you through account creation, email verification, and API key generation. You'll get:
114
111
 
115
- const result = await claude.messages.create({
116
- model: "claude-sonnet-4-6-20250514",
117
- max_tokens: 1024,
118
- messages: [{ role: "user", content: "Hello!" }],
119
- });
120
112
  ```
121
-
122
- #### Python — Anthropic
123
-
124
- ```python
125
- from anthropic import Anthropic
126
-
127
- claude = tes.wrap(Anthropic())
128
-
129
- result = claude.messages.create(
130
- model="claude-sonnet-4-6-20250514",
131
- max_tokens=1024,
132
- messages=[{"role": "user", "content": "Hello!"}],
133
- )
113
+ TES_ENDPOINT=https://your-company.api.pentatonic.com
114
+ TES_CLIENT_ID=your-company
115
+ TES_API_KEY=tes_your-company_xxxxx
134
116
  ```
135
117
 
136
- #### JavaScript — Cloudflare Workers AI
137
-
138
- ```js
139
- // Cloudflare Workers AI binding
140
- const ai = tes.wrap(env.AI, { sessionId: sid, metadata: { shop: shopDomain } });
118
+ ### 2. Install
141
119
 
142
- // run() is intercepted automatically
143
- const result = await ai.run("@cf/meta/llama-3.1-8b-instruct", {
144
- messages: [{ role: "user", content: "Hello!" }],
145
- });
120
+ ```bash
121
+ npm install @pentatonic-ai/ai-agent-sdk
146
122
  ```
147
123
 
148
- > **Note:** Workers AI is a Cloudflare-specific binding and is only available in JavaScript.
124
+ ```bash
125
+ pip install pentatonic-ai-agent-sdk
126
+ ```
149
127
 
150
- ### Tool-calling loops
128
+ ### What you get (in addition to local features)
151
129
 
152
- For multi-round tool loops, just keep calling the wrapped client. Each `create()`/`run()` call emits its own event, and they're linked by `sessionId`. The dashboard aggregates tokens, tool calls, and turns per session automatically.
130
+ - **Higher-dimensional embeddings** -- NV-Embed-v2 (4096d) for better retrieval accuracy
131
+ - **Conversation analytics** -- session metrics, search attribution, dead-end detection
132
+ - **Team-wide shared memory** -- semantic search across your team's AI interactions
133
+ - **Admin dashboard** -- visualize conversations and token usage, and explore stored memories
134
+ - **Multi-tenancy** -- isolated databases per client
153
135
 
154
- #### JavaScript
136
+ ## Claude Code Plugin
155
137
 
156
- ```js
157
- const ai = tes.wrap(new OpenAI(), { sessionId: "conv-101" });
138
+ Works with both local and hosted setups. Install once, switch modes via config.
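For reference, the mode switch lives in a small config file; local setup writes this to `~/.claude-pentatonic/tes-memory.local.md` (see the CLI source in this diff):

```
---
mode: local
memory_url: http://localhost:3333
---
```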
158
139
 
159
- // Round 1: AI requests a tool call — emits event with tool_calls
160
- const r1 = await ai.chat.completions.create({
161
- model: "gpt-4o",
162
- messages: [{ role: "user", content: "Find me running shoes" }],
163
- tools: [searchTool],
164
- });
140
+ ### Install via marketplace
165
141
 
166
- // Execute tool, feed results back...
167
-
168
- // Round 2: AI responds with final answer — emits another event
169
- const r2 = await ai.chat.completions.create({
170
- model: "gpt-4o",
171
- messages: [...messages, { role: "tool", content: toolResult }],
172
- });
173
-
174
- // That's it. No manual emit needed. Both events share sessionId "conv-101".
142
+ ```
143
+ /plugin marketplace add Pentatonic-Ltd/ai-agent-sdk
144
+ /plugin install tes-memory@pentatonic-ai
175
145
  ```
176
146
 
177
- #### Python
178
-
179
- ```python
180
- ai = tes.wrap(OpenAI(), session_id="conv-101")
181
-
182
- r1 = ai.chat.completions.create(
183
- model="gpt-4o",
184
- messages=[{"role": "user", "content": "Find me running shoes"}],
185
- tools=[search_tool],
186
- )
187
-
188
- # Execute tool, feed results back...
147
+ ### Set up
189
148
 
190
- r2 = ai.chat.completions.create(
191
- model="gpt-4o",
192
- messages=[*messages, {"role": "tool", "content": tool_result}],
193
- )
149
+ For hosted TES:
150
+ ```
151
+ /tes-memory:tes-setup
152
+ ```
194
153
 
195
- # No manual emit needed.
154
+ For local memory:
155
+ ```bash
156
+ npx @pentatonic-ai/ai-agent-sdk memory
196
157
  ```
197
158
 
198
- ### Manual session (full control)
159
+ ### What it tracks
160
+
161
+ - **Every conversation turn** -- user messages, assistant responses, tool calls, duration
162
+ - **Automatic memory search** -- relevant memories injected as context on every prompt
163
+ - **Automatic memory storage** -- every turn stored with embeddings and HyDE queries
164
+ - **Token usage** -- input, output, cache read, cache creation tokens per turn
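Totaling the four per-turn counters can look like this. A sketch, not plugin code; the field names follow Anthropic's `usage` block (`cache_read_input_tokens`, `cache_creation_input_tokens`).

```javascript
// Sum one turn's tracked token counters into input/output totals.
// Cache reads and cache creation both count toward input processed.
function turnTokens(usage) {
  const {
    input_tokens = 0,
    output_tokens = 0,
    cache_read_input_tokens = 0,
    cache_creation_input_tokens = 0,
  } = usage;
  return {
    input: input_tokens + cache_read_input_tokens + cache_creation_input_tokens,
    output: output_tokens,
  };
}
```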
199
165
 
200
- If you don't want to use `tes.wrap()`, create a session directly:
166
+ ## SDK: Wrap Your LLM Client
201
167
 
202
- #### JavaScript
168
+ **JavaScript**
203
169
 
204
170
  ```js
205
- const session = tes.session({
206
- sessionId: "conv-123",
207
- metadata: { userId: "u_456" },
208
- });
171
+ import { TESClient } from "@pentatonic-ai/ai-agent-sdk";
209
172
 
210
- // Call your LLM however you like
211
- const response = await openai.chat.completions.create({
212
- model: "gpt-4o",
213
- messages: [{ role: "user", content: "What is 2+2?" }],
173
+ const tes = new TESClient({
174
+ clientId: process.env.TES_CLIENT_ID,
175
+ apiKey: process.env.TES_API_KEY,
176
+ endpoint: process.env.TES_ENDPOINT,
214
177
  });
215
178
 
216
- // Record the response (accumulates tokens, tool calls, model)
217
- session.record(response);
218
-
219
- // Emit when the turn is complete
220
- await session.emitChatTurn({
221
- userMessage: "What is 2+2?",
222
- assistantResponse: response.choices[0].message.content,
179
+ const ai = tes.wrap(new OpenAI(), { sessionId: "conv-123" });
180
+ const result = await ai.chat.completions.create({
181
+ model: "gpt-4o",
182
+ messages: [{ role: "user", content: "Hello!" }],
223
183
  });
224
184
  ```
225
185
 
226
- #### Python
186
+ **Python**
227
187
 
228
188
  ```python
229
- session = tes.session(
230
- session_id="conv-123",
231
- metadata={"user_id": "u_456"},
232
- )
189
+ from pentatonic_agent_events import TESClient
233
190
 
234
- response = openai.chat.completions.create(
235
- model="gpt-4o",
236
- messages=[{"role": "user", "content": "What is 2+2?"}],
191
+ tes = TESClient(
192
+ client_id=os.environ["TES_CLIENT_ID"],
193
+ api_key=os.environ["TES_API_KEY"],
194
+ endpoint=os.environ["TES_ENDPOINT"],
237
195
  )
238
196
 
239
- session.record(response)
240
-
241
- session.emit_chat_turn(
242
- user_message="What is 2+2?",
243
- assistant_response=response["choices"][0]["message"]["content"],
197
+ ai = tes.wrap(OpenAI(), session_id="conv-123")
198
+ result = ai.chat.completions.create(
199
+ model="gpt-4o",
200
+ messages=[{"role": "user", "content": "Hello!"}],
244
201
  )
245
202
  ```
246
203
 
247
- ## API Reference
248
-
249
- ### `TESClient`
250
-
251
- Creates a new client.
252
-
253
- #### JavaScript
254
-
255
- ```js
256
- new TESClient({ clientId, apiKey, endpoint, headers?, userId?, captureContent?, maxContentLength? })
257
- ```
258
-
259
- #### Python
260
-
261
- ```python
262
- TESClient(client_id, api_key, endpoint, headers=None, user_id=None, capture_content=True, max_content_length=4096)
263
- ```
264
-
265
- | Param (JS / Python) | Type | Default | Description |
266
- |----------------------|------|---------|-------------|
267
- | `clientId` / `client_id` | `string` | *required* | Your application/tenant identifier |
268
- | `apiKey` / `api_key` | `string` | *required* | TES service API key (sent as `x-service-key` header) |
269
- | `endpoint` / `endpoint` | `string` | *required* | TES instance URL (must be `https://`, except `localhost` for dev) |
270
- | `headers` / `headers` | `object` / `dict` | `{}` | Additional headers to include in every request |
271
- | `userId` / `user_id` | `string` | `null` / `None` | Optional user identifier — included as `data.attributes.userId` on every event. Enables user-scoped memory and attribution. |
272
- | `captureContent` / `capture_content` | `boolean` / `bool` | `true` / `True` | Whether to include message content in events |
273
- | `maxContentLength` / `max_content_length` | `number` / `int` | `4096` | Truncate content beyond this length |
274
-
275
- ### `tes.wrap(client, opts?)`
276
-
277
- Returns a Proxy (JS) or wrapper (Python) around any supported LLM client. Every intercepted call emits a `CHAT_TURN` event automatically.
278
-
279
- #### JavaScript
280
-
281
- ```js
282
- const ai = tes.wrap(client, { sessionId, userId, metadata });
283
- ```
284
-
285
- #### Python
286
-
287
- ```python
288
- ai = tes.wrap(client, session_id=None, user_id=None, metadata=None)
289
- ```
290
-
291
- | Option (JS / Python) | Type | Default | Description |
292
- |----------------------|------|---------|-------------|
293
- | `sessionId` / `session_id` | `string` | `crypto.randomUUID()` / `uuid.uuid4()` | Links events from the same conversation |
294
- | `userId` / `user_id` | `string` | Inherits from client | Override the user identifier for this wrapped instance |
295
- | `metadata` / `metadata` | `object` / `dict` | `{}` | Custom fields included in every emitted event |
296
-
297
- Auto-detects the provider:
204
+ ## Supported Providers
298
205
 
299
- | Client | Detection | Intercepted method |
300
- |--------|-----------|-------------------|
206
+ | Provider | Detection | Intercepted Method |
207
+ |----------|-----------|-------------------|
301
208
  | OpenAI | `client.chat.completions.create` | `chat.completions.create()` |
302
209
  | Anthropic | `client.messages.create` | `messages.create()` |
303
210
  | Workers AI | `client.run` (JS only) | `run()` |
304
211
 
305
- All other methods/properties pass through unchanged. The wrapped client exposes `ai.sessionId` (JS) or `ai.session_id` (Python).
306
-
307
- ### `tes.session(opts?)`
308
-
309
- Returns a `Session` instance.
212
+ All other methods pass through unchanged.
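The detection and pass-through behavior can be sketched with a plain `Proxy`. This illustrates the interception pattern, not the SDK's source; `emit` stands in for the event pipeline, and only the OpenAI branch is shown.

```javascript
// Sketch of provider interception: detect the client by shape, wrap the
// matching method, and let every other property pass through untouched.
function wrapClient(client, emit) {
  // Detection mirrors the table above: OpenAI exposes chat.completions.create.
  const isOpenAI = typeof client.chat?.completions?.create === "function";
  if (!isOpenAI) return client; // (Anthropic / Workers AI branches elided)

  return new Proxy(client, {
    get(target, prop, receiver) {
      if (prop === "chat") {
        return {
          completions: {
            create: async (params) => {
              const result = await target.chat.completions.create(params);
              emit({ type: "CHAT_TURN", model: params.model }); // fire event
              return result;
            },
          },
        };
      }
      return Reflect.get(target, prop, receiver); // pass-through
    },
  });
}
```

Because only the `chat` property is intercepted, everything else (API keys, helper methods, other endpoints) behaves exactly as on the unwrapped client.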
310
213
 
311
- | Option (JS / Python) | Type | Default | Description |
312
- |----------------------|------|---------|-------------|
313
- | `sessionId` / `session_id` | `string` | `crypto.randomUUID()` / `uuid.uuid4()` | Conversation/session identifier |
314
- | `metadata` / `metadata` | `object` / `dict` | `{}` | Extra fields included in every emitted event |
315
-
316
- ### `session.record(rawResponse)`
317
-
318
- Normalizes an LLM response and accumulates token usage, tool calls, and model info. Accepts responses from any supported provider. Returns the normalized response.
319
-
320
- ### `session.emitChatTurn()` / `session.emit_chat_turn()`
321
-
322
- Sends a `CHAT_TURN` event to TES with accumulated usage data, then resets counters.
214
+ ## API Reference
323
215
 
324
- | Param (JS / Python) | Type | Description |
325
- |---------------------|------|-------------|
326
- | `userMessage` / `user_message` | `string` | The user's message |
327
- | `assistantResponse` / `assistant_response` | `string` | The assistant's response |
328
- | `turnNumber` / `turn_number` | `number` / `int` | Optional turn number |
216
+ ### `TESClient(config)`
329
217
 
330
- ### `session.emitToolUse()` / `session.emit_tool_use()`
218
+ | Param | Type | Default | Description |
219
+ |-------|------|---------|-------------|
220
+ | `clientId` | `string` | required | Your tenant identifier |
221
+ | `apiKey` | `string` | required | TES API key |
222
+ | `endpoint` | `string` | required | TES instance URL |
223
+ | `userId` | `string` | `null` | User identifier for attribution |
224
+ | `captureContent` | `boolean` | `true` | Include message content in events |
225
+ | `maxContentLength` | `number` | `4096` | Truncate content beyond this length |
331
226
 
332
- Sends a `TOOL_USE` event for individual tool invocations.
227
+ ### `tes.wrap(client, opts?)`
333
228
 
334
- | Param (JS / Python) | Type | Description |
335
- |---------------------|------|-------------|
336
- | `tool` / `tool` | `string` | Tool name |
337
- | `args` / `args` | `object` / `dict` | Tool arguments |
338
- | `resultSummary` / `result_summary` | `string` | Optional result summary |
339
- | `durationMs` / `duration_ms` | `number` / `int` | Optional duration in milliseconds |
340
- | `turnNumber` / `turn_number` | `number` / `int` | Optional turn number |
229
+ Returns an instrumented proxy. Every intercepted call emits a `CHAT_TURN` event.
341
230
 
342
- ### `session.emitSessionStart()` / `session.emit_session_start()`
231
+ | Option | Type | Default | Description |
232
+ |--------|------|---------|-------------|
233
+ | `sessionId` | `string` | auto-generated UUID | Links events from the same conversation |
234
+ | `metadata` | `object` | `{}` | Custom fields on every event |
343
235
 
344
- Sends a `SESSION_START` event.
236
+ ### `tes.session(opts?)`
345
237
 
346
- ### `session.totalUsage` / `session.total_usage`
238
+ Returns a `Session` for manual event emission.
347
239
 
348
- Returns current accumulated usage: `{ prompt_tokens, completion_tokens, total_tokens, ai_rounds }`.
240
+ ### `session.emitChatTurn({ userMessage, assistantResponse, turnNumber? })`
349
241
 
350
- ### `normalizeResponse(raw)` / `normalize_response(raw)`
242
+ Emits a `CHAT_TURN` event with accumulated data, then resets.
351
243
 
352
- Standalone utility to normalize any LLM response into a consistent shape:
244
+ ### `normalizeResponse(raw)`
353
245
 
354
- #### JavaScript
246
+ Standalone utility to normalize any LLM response:
355
247
 
356
248
  ```js
357
249
  import { normalizeResponse } from "@pentatonic-ai/ai-agent-sdk";
358
250
 
359
- const normalized = normalizeResponse(openaiResponse);
360
- // { content, model, usage: { prompt_tokens, completion_tokens }, toolCalls: [{ tool, args }] }
251
+ const { content, model, usage, toolCalls } = normalizeResponse(openaiResponse);
361
252
  ```
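As an illustration of what normalization does, here is a sketch over the two most common response shapes. This is not the SDK's actual code; field names follow the public OpenAI and Anthropic response formats.

```javascript
// Sketch: flatten OpenAI- and Anthropic-style responses into one shape.
function normalizeSketch(raw) {
  if (raw.choices) {
    // OpenAI-style: choices[0].message, usage.prompt_tokens / completion_tokens
    const msg = raw.choices[0].message;
    return {
      content: msg.content,
      model: raw.model,
      usage: {
        prompt_tokens: raw.usage?.prompt_tokens ?? 0,
        completion_tokens: raw.usage?.completion_tokens ?? 0,
      },
      toolCalls: (msg.tool_calls ?? []).map((t) => ({
        tool: t.function.name,
        args: JSON.parse(t.function.arguments),
      })),
    };
  }
  // Anthropic-style: content blocks, usage.input_tokens / output_tokens
  return {
    content: raw.content.filter((b) => b.type === "text").map((b) => b.text).join(""),
    model: raw.model,
    usage: {
      prompt_tokens: raw.usage?.input_tokens ?? 0,
      completion_tokens: raw.usage?.output_tokens ?? 0,
    },
    toolCalls: raw.content
      .filter((b) => b.type === "tool_use")
      .map((b) => ({ tool: b.name, args: b.input })),
  };
}
```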
362
253
 
363
- #### Python
254
+ ## Architecture
364
255
 
365
- ```python
366
- from pentatonic_agent_events import normalize_response
367
-
368
- normalized = normalize_response(openai_response)
369
- # { "content", "model", "usage": { "prompt_tokens", "completion_tokens" }, "tool_calls": [{ "tool", "args" }] }
370
256
  ```
371
-
372
- > **Note:** In Python, the normalized response uses `tool_calls` (snake_case) instead of `toolCalls` (camelCase).
373
-
374
- ## Events Emitted
375
-
376
- All events are sent to the TES GraphQL API (`emitEvent` mutation) authenticated via `x-service-key` and `x-client-id` headers.
377
-
378
- | Event Type | Entity Type | When |
379
- |------------|-------------|------|
380
- | `CHAT_TURN` | `conversation` | Every `create()`/`run()` call via `wrap()`, or manually via `session.emitChatTurn()` |
381
- | `TOOL_USE` | `conversation` | Via `session.emitToolUse()` (manual only) |
382
- | `SESSION_START` | `conversation` | Via `session.emitSessionStart()` (manual only) |
383
-
384
- ## Supported Providers
385
-
386
- | Provider | Auto-wrap | Manual session | Response normalization |
387
- |----------|-----------|---------------|----------------------|
388
- | **OpenAI** (and compatible: Azure, Groq, Together, Mistral) | JS + Python | JS + Python | JS + Python |
389
- | **Anthropic** | JS + Python | JS + Python | JS + Python |
390
- | **Cloudflare Workers AI** | JS only | JS only | JS + Python |
391
-
392
- ## Security
393
-
394
- - **HTTPS enforced:** The SDK rejects non-HTTPS endpoints (except `localhost` for development)
395
- - **API key protection:** Stored as a non-enumerable property (JS) or private attribute (Python) — won't appear in `JSON.stringify`, `repr()`, or error reporters
396
- - **Content controls:** Set `captureContent: false` (JS) or `capture_content=False` (Python) to omit message content from events, or use `maxContentLength` / `max_content_length` to truncate
397
- - **No runtime dependencies:** Both the JavaScript and Python SDKs have zero external runtime dependencies
257
+ +-----------------------+
258
+ | Claude Code Plugin |
259
+ | (hooks: auto-search |
260
+ | + auto-store) |
261
+ +-----------+-----------+
262
+ |
263
+ +-----------+-----------+
264
+ | |
265
+ Local Memory Hosted TES
266
+ (Docker) (Cloud)
267
+ | |
268
+ +----+----+----+ +---+----+---+
269
+ | | | | | | | |
270
+ PG Ollama MCP HTTP PG R2 Queue Workers
271
+ pgvector API pgvector Modules
272
+ ```
398
273
 
399
274
  ## License
400
275
 
package/bin/cli.js CHANGED
@@ -2,6 +2,9 @@
2
2
 
3
3
  import { createInterface } from "readline";
4
4
  import { execFileSync } from "child_process";
5
+ import { existsSync, mkdirSync, writeFileSync } from "fs";
6
+ import { join } from "path";
7
+ import { homedir } from "os";
5
8
 
6
9
  const DEFAULT_ENDPOINT = "https://api.pentatonic.com";
7
10
 
@@ -135,16 +138,109 @@ function toClientId(companyName) {
135
138
  .replace(/^-|-$/g, "");
136
139
  }
137
140
 
141
+ async function setupLocalMemory() {
142
+ console.log(`\n @pentatonic/memory — Local Setup\n`);
143
+
144
+ // Check Docker
145
+ try {
146
+ execFileSync("docker", ["info"], { stdio: "pipe" });
147
+ } catch {
148
+ console.error(" Error: Docker is required. Install it from https://docker.com\n");
149
+ process.exit(1);
150
+ }
151
+
152
+ const memoryDir = new URL("../packages/memory", import.meta.url).pathname;
153
+
154
+ // Start infrastructure + memory server
155
+ const infraSpinner = spinner("Starting memory server + PostgreSQL + Ollama...");
156
+ try {
157
+ execFileSync("docker", ["compose", "up", "-d", "memory", "postgres", "ollama"], {
158
+ cwd: memoryDir,
159
+ stdio: "pipe",
160
+ });
161
+ infraSpinner.stop("Memory stack running!");
162
+ } catch (err) {
163
+ infraSpinner.fail(`Failed to start: ${err.message}`);
164
+ process.exit(1);
165
+ }
166
+
167
+ // Pull models
168
+ const embModel = process.env.EMBEDDING_MODEL || "nomic-embed-text";
169
+ const llmModel = process.env.LLM_MODEL || "llama3.2:3b";
170
+
171
+ const embSpinner = spinner(`Pulling ${embModel}...`);
172
+ try {
173
+ execFileSync("docker", ["compose", "exec", "ollama", "ollama", "pull", embModel], {
174
+ cwd: memoryDir,
175
+ stdio: "pipe",
176
+ });
177
+ embSpinner.stop(`${embModel} ready!`);
178
+ } catch {
179
+ embSpinner.fail(`Failed to pull ${embModel}. Run manually: docker compose exec ollama ollama pull ${embModel}`);
180
+ }
181
+
182
+ const llmSpinner = spinner(`Pulling ${llmModel}...`);
183
+ try {
184
+ execFileSync("docker", ["compose", "exec", "ollama", "ollama", "pull", llmModel], {
185
+ cwd: memoryDir,
186
+ stdio: "pipe",
187
+ });
188
+ llmSpinner.stop(`${llmModel} ready!`);
189
+ } catch {
190
+ llmSpinner.fail(`Failed to pull ${llmModel}. Run manually: docker compose exec ollama ollama pull ${llmModel}`);
191
+ }
192
+
193
+ // Write local config
194
+ const configDir = join(homedir(), ".claude-pentatonic");
195
+ if (!existsSync(configDir)) {
196
+ mkdirSync(configDir, { recursive: true });
197
+ }
198
+
199
+ const configPath = join(configDir, "tes-memory.local.md");
200
+ writeFileSync(
201
+ configPath,
202
+ `---
203
+ mode: local
204
+ memory_url: http://localhost:3333
205
+ ---
206
+ `
207
+ );
208
+
209
+ console.log(`\n Config written to ${configPath}`);
210
+
211
+ const sdkDir = new URL("..", import.meta.url).pathname;
212
+
213
+ console.log(`
214
+ Memory server: http://localhost:3333
215
+ Hooks are auto-configured to use local memory.
216
+
217
+ Install the plugin in Claude Code:
218
+ /plugin marketplace add Pentatonic-Ltd/ai-agent-sdk
219
+ /plugin install tes-memory@pentatonic-ai
220
+
221
+ You're ready! Every prompt auto-searches memory,
222
+ every turn auto-stores. No MCP setup needed.
223
+ `);
224
+
225
+ rl.close();
226
+ }
227
+
138
228
  async function main() {
139
229
  const flags = parseArgs();
140
230
  const TES_ENDPOINT = flags.endpoint || DEFAULT_ENDPOINT;
141
231
 
232
+ if (flags.command === "memory") {
233
+ await setupLocalMemory();
234
+ return;
235
+ }
236
+
142
237
  if (flags.command !== "init") {
143
238
  console.log(`
144
239
  @pentatonic-ai/ai-agent-sdk
145
240
 
146
241
  Usage:
147
- npx @pentatonic-ai/ai-agent-sdk init Set up account and install SDK
242
+ npx @pentatonic-ai/ai-agent-sdk init Set up hosted TES account
243
+ npx @pentatonic-ai/ai-agent-sdk memory Set up local memory stack
148
244
  npx @pentatonic-ai/ai-agent-sdk init --endpoint URL Use a custom TES endpoint
149
245
 
150
246
  For docs, see https://api.pentatonic.com
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "@pentatonic-ai/ai-agent-sdk",
3
- "version": "0.3.0-beta.3",
3
+ "version": "0.4.0-beta.1",
4
4
  "description": "TES SDK — LLM observability and lifecycle tracking via Pentatonic Thing Event System. Track token usage, tool calls, and conversations. Manage things through event-sourced lifecycle stages with AI enrichment and vector search.",
5
5
  "type": "module",
6
6
  "main": "./dist/index.cjs",