@pentatonic-ai/ai-agent-sdk 0.3.0-beta.3 → 0.4.0-beta.1

This diff reflects changes between publicly released package versions as they appear in their respective public registries, and is provided for informational purposes only.
package/README.md CHANGED
@@ -1,400 +1,275 @@
1
- # @pentatonic-ai/ai-agent-sdk
1
+ <p align="center">
2
+ <picture>
3
+ <source media="(prefers-color-scheme: dark)" srcset="https://raw.githubusercontent.com/Pentatonic-Ltd/ai-agent-sdk/main/.github/logo-light.svg">
4
+ <source media="(prefers-color-scheme: light)" srcset="https://raw.githubusercontent.com/Pentatonic-Ltd/ai-agent-sdk/main/.github/logo-dark.svg">
5
+ <img alt="Pentatonic" src="https://raw.githubusercontent.com/Pentatonic-Ltd/ai-agent-sdk/main/.github/logo-dark.svg" width="200">
6
+ </picture>
7
+ </p>
2
8
 
3
- LLM observability SDK — track token usage, tool calls, and conversations via [Pentatonic TES](https://api.pentatonic.com).
9
+ <h3 align="center">AI Agent SDK</h3>
4
10
 
5
- Provider-agnostic: automatically wraps OpenAI, Anthropic, and Cloudflare Workers AI clients. Available for both **JavaScript** and **Python**.
11
+ <p align="center">
12
+ Observability, memory, and analytics for LLM applications.<br>
13
+ Run locally or use hosted TES. JavaScript &amp; Python.
14
+ </p>
6
15
 
7
- ## Getting Started
16
+ <p align="center">
17
+ <a href="https://www.npmjs.com/package/@pentatonic-ai/ai-agent-sdk"><img src="https://img.shields.io/npm/v/@pentatonic-ai/ai-agent-sdk?style=flat-square&color=00fba9&label=npm" alt="npm"></a>
18
+ <a href="https://pypi.org/project/pentatonic-ai-agent-sdk/"><img src="https://img.shields.io/pypi/v/pentatonic-ai-agent-sdk?style=flat-square&color=00fba9&label=pypi" alt="PyPI"></a>
19
+ <a href="https://github.com/Pentatonic-Ltd/ai-agent-sdk/blob/main/LICENSE"><img src="https://img.shields.io/github/license/Pentatonic-Ltd/ai-agent-sdk?style=flat-square&color=333" alt="License"></a>
20
+ </p>
8
21
 
9
- ### 1. Create an account and get your API key
22
+ ---
10
23
 
11
- ```bash
12
- npx @pentatonic-ai/ai-agent-sdk init
13
- ```
24
+ ## Table of Contents
14
25
 
15
- This will walk you through:
16
- - Creating a Pentatonic account (email, company name, password)
17
- - Choosing a data region (EU or US)
18
- - Email verification
19
- - Generating your API key
26
+ - [Overview](#overview)
27
+ - [Local Memory (self-hosted)](#local-memory-self-hosted)
28
+ - [Hosted TES](#hosted-tes)
29
+ - [Claude Code Plugin](#claude-code-plugin)
30
+ - [SDK: Wrap Your LLM Client](#sdk-wrap-your-llm-client)
31
+ - [Supported Providers](#supported-providers)
32
+ - [API Reference](#api-reference)
33
+ - [Architecture](#architecture)
20
34
 
21
- At the end you'll see your credentials:
35
+ ## Overview
22
36
 
23
- ```
24
- TES_ENDPOINT=https://api.pentatonic.com
25
- TES_CLIENT_ID=your-company
26
- TES_API_KEY=tes_your-company_xxxxx
27
- ```
37
+ Two ways to use the SDK:
28
38
 
29
- Add these to your environment (`.env`, secrets manager, etc.) and the CLI will install the SDK for you.
39
+ **Local Memory** -- Run a fully private memory system on your own machine. PostgreSQL + pgvector + Ollama in Docker. No API keys, no cloud. Your agent gets persistent, searchable memory backed by multi-signal retrieval and HyDE query expansion.
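The HyDE step can be sketched as follows. This is a minimal illustration of the idea (generate a hypothetical answer, embed that instead of the raw query, then search), not the SDK's actual implementation; `llm`, `embed`, and `search` are stand-ins for whatever OpenAI-compatible client and vector store you run.

```javascript
// HyDE (Hypothetical Document Embeddings), sketched:
// rather than embedding the raw query, ask a small LLM to draft a
// hypothetical answer first, then embed THAT and search with it.
async function hydeSearch(query, { llm, embed, search }) {
  // 1. Generate a hypothetical passage that would answer the query.
  const hypothetical = await llm(
    `Write a short passage that answers: ${query}`
  );
  // 2. Embed the hypothetical passage, not the query itself.
  const vector = await embed(hypothetical);
  // 3. Run a vector search against stored memory embeddings.
  return search(vector);
}
```

The hypothetical passage tends to sit closer to stored answers in embedding space than the bare question does, which is what makes the expansion worthwhile.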
30
40
 
31
- ### 2. Or install manually
41
+ **Hosted TES** -- Connect to Pentatonic's Thing Event System for production-grade observability, higher-dimensional embeddings, conversation analytics, and team-wide shared memory.
32
42
 
33
- If you already have an account, install the SDK directly:
43
+ Both paths use the same Claude Code plugin. The hooks auto-search on every prompt and auto-store every conversation turn.
34
44
 
35
- ```bash
36
- npm install @pentatonic-ai/ai-agent-sdk
37
- ```
45
+ ## Local Memory (self-hosted)
46
+
47
+ Run the full memory stack locally. Requires Docker and ~4GB of disk space for models.
48
+
49
+ ### 1. Set up
38
50
 
39
51
  ```bash
40
- pip install pentatonic-ai-agent-sdk
52
+ npx @pentatonic-ai/ai-agent-sdk memory
41
53
  ```
42
54
 
43
- You can create API keys in the [Pentatonic dashboard](https://api.pentatonic.com).
55
+ This starts PostgreSQL + pgvector, Ollama, and the memory server, pulls the embedding and chat models, and writes the local config.
44
56
 
45
- ## Quick Start
57
+ ### 2. Install the Claude Code plugin
46
58
 
47
- #### JavaScript
59
+ ```
60
+ /plugin marketplace add Pentatonic-Ltd/ai-agent-sdk
61
+ /plugin install tes-memory@pentatonic-ai
62
+ ```
48
63
 
49
- ```js
50
- import { TESClient } from "@pentatonic-ai/ai-agent-sdk";
64
+ That's it. The plugin hooks automatically search memories on every prompt and store every conversation turn. Fully local, fully private.
51
65
 
52
- const tes = new TESClient({
53
- clientId: process.env.TES_CLIENT_ID,
54
- apiKey: process.env.TES_API_KEY,
55
- endpoint: process.env.TES_ENDPOINT,
56
- });
57
- ```
66
+ ### What you get
58
67
 
59
- #### Python
68
+ - **Automatic memory** -- every conversation turn is stored with embeddings and HyDE query expansion
69
+ - **Semantic search** -- multi-signal retrieval combining vector similarity, BM25 full-text, recency decay, and access frequency
70
+ - **Memory layers** -- episodic (recent), semantic (consolidated), procedural (how-to), working (temporary)
71
+ - **Decay and consolidation** -- memories fade over time; frequently accessed ones get promoted
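The multi-signal ranking and decay described above can be illustrated with a toy scoring function. The weights and decay constant here are made up for illustration; the actual retriever's tuning may differ.

```javascript
// Toy multi-signal relevance score: blend vector similarity, BM25,
// recency decay, and access frequency. Weights are illustrative only.
function memoryScore({ cosine, bm25, ageDays, accessCount }) {
  const recency = Math.exp(-ageDays / 30);        // exponential fade over ~a month
  const frequency = Math.log1p(accessCount) / 5;  // diminishing returns on reuse
  return 0.5 * cosine + 0.25 * bm25 + 0.15 * recency + 0.1 * frequency;
}
```

A fresh, frequently accessed memory outscores a stale one at equal text similarity, which is also the intuition behind promoting hot memories during consolidation.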
60
72
 
61
- ```python
62
- from pentatonic_agent_events import TESClient
63
- import os
73
+ ### Change models
64
74
 
65
- tes = TESClient(
66
- client_id=os.environ["TES_CLIENT_ID"],
67
- api_key=os.environ["TES_API_KEY"],
68
- endpoint=os.environ["TES_ENDPOINT"],
69
- )
75
+ ```bash
76
+ EMBEDDING_MODEL=mxbai-embed-large LLM_MODEL=qwen2.5:7b npx @pentatonic-ai/ai-agent-sdk memory
70
77
  ```
71
78
 
72
- ### Wrap any LLM client (automatic tracking)
73
-
74
- `tes.wrap()` auto-detects your client and intercepts every call — each one emits a `CHAT_TURN` event automatically. Pass an optional `sessionId` to link events from the same conversation, and `metadata` to attach custom fields.
79
+ ### Raspberry Pi
75
80
 
76
- #### JavaScript OpenAI
81
+ A Pi 5 with 8GB of RAM runs the full stack. `nomic-embed-text` (~300MB) + `llama3.2:3b` (~2GB) leave plenty of headroom.
77
82
 
78
- ```js
79
- import OpenAI from "openai";
83
+ ### Use as a library
80
84
 
81
- const ai = tes.wrap(new OpenAI(), { sessionId: "conv-123", metadata: { userId: "u_1" } });
85
+ ```javascript
86
+ import { createMemorySystem } from '@pentatonic/memory';
82
87
 
83
- // Every create() call automatically emits a CHAT_TURN event
84
- const result = await ai.chat.completions.create({
85
- model: "gpt-4o",
86
- messages: [{ role: "user", content: "Hello!" }],
88
+ const memory = createMemorySystem({
89
+ db: pgPool,
90
+ embedding: { url: 'http://localhost:11434/v1', model: 'nomic-embed-text' },
91
+ llm: { url: 'http://localhost:11434/v1', model: 'llama3.2:3b' },
87
92
  });
88
93
 
89
- ai.sessionId; // "conv-123" — or auto-generated UUID if not provided
94
+ await memory.migrate();
95
+ await memory.ensureLayers('my-app');
96
+ await memory.ingest('User prefers dark mode', { clientId: 'my-app' });
97
+ const results = await memory.search('preferences', { clientId: 'my-app' });
90
98
  ```
91
99
 
92
- #### Python — OpenAI
100
+ ## Hosted TES
93
101
 
94
- ```python
95
- from openai import OpenAI
102
+ Connect to Pentatonic's hosted infrastructure for production use.
96
103
 
97
- ai = tes.wrap(OpenAI(), session_id="conv-123", metadata={"user_id": "u_1"})
104
+ ### 1. Create an account
98
105
 
99
- # Every create() call automatically emits a CHAT_TURN event
100
- result = ai.chat.completions.create(
101
- model="gpt-4o",
102
- messages=[{"role": "user", "content": "Hello!"}],
103
- )
104
-
105
- ai.session_id # "conv-123" — or auto-generated UUID if not provided
106
+ ```bash
107
+ npx @pentatonic-ai/ai-agent-sdk init
106
108
  ```
107
109
 
108
- #### JavaScript Anthropic
109
-
110
- ```js
111
- import Anthropic from "@anthropic-ai/sdk";
112
-
113
- const claude = tes.wrap(new Anthropic());
110
+ This walks you through account creation, email verification, and API key generation. You'll get:
114
111
 
115
- const result = await claude.messages.create({
116
- model: "claude-sonnet-4-6-20250514",
117
- max_tokens: 1024,
118
- messages: [{ role: "user", content: "Hello!" }],
119
- });
120
112
  ```
121
-
122
- #### Python — Anthropic
123
-
124
- ```python
125
- from anthropic import Anthropic
126
-
127
- claude = tes.wrap(Anthropic())
128
-
129
- result = claude.messages.create(
130
- model="claude-sonnet-4-6-20250514",
131
- max_tokens=1024,
132
- messages=[{"role": "user", "content": "Hello!"}],
133
- )
113
+ TES_ENDPOINT=https://your-company.api.pentatonic.com
114
+ TES_CLIENT_ID=your-company
115
+ TES_API_KEY=tes_your-company_xxxxx
134
116
  ```
135
117
 
136
- #### JavaScript — Cloudflare Workers AI
137
-
138
- ```js
139
- // Cloudflare Workers AI binding
140
- const ai = tes.wrap(env.AI, { sessionId: sid, metadata: { shop: shopDomain } });
118
+ ### 2. Install
141
119
 
142
- // run() is intercepted automatically
143
- const result = await ai.run("@cf/meta/llama-3.1-8b-instruct", {
144
- messages: [{ role: "user", content: "Hello!" }],
145
- });
120
+ ```bash
121
+ npm install @pentatonic-ai/ai-agent-sdk
146
122
  ```
147
123
 
148
- > **Note:** Workers AI is a Cloudflare-specific binding and is only available in JavaScript.
124
+ ```bash
125
+ pip install pentatonic-ai-agent-sdk
126
+ ```
149
127
 
150
- ### Tool-calling loops
128
+ ### What you get (in addition to local features)
151
129
 
152
- For multi-round tool loops, just keep calling the wrapped client. Each `create()`/`run()` call emits its own event, and they're linked by `sessionId`. The dashboard aggregates tokens, tool calls, and turns per session automatically.
130
+ - **Higher-dimensional embeddings** -- NV-Embed-v2 (4096d) for better retrieval accuracy
131
+ - **Conversation analytics** -- session metrics, search attribution, dead-end detection
132
+ - **Team-wide shared memory** -- semantic search across your team's AI interactions
133
+ - **Admin dashboard** -- visualize conversations and token usage, and explore stored memories
134
+ - **Multi-tenancy** -- isolated databases per client
153
135
 
154
- #### JavaScript
136
+ ## Claude Code Plugin
155
137
 
156
- ```js
157
- const ai = tes.wrap(new OpenAI(), { sessionId: "conv-101" });
138
+ Works with both local and hosted setups. Install once, switch modes via config.
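For reference, the mode switch lives in a small config file; local setup writes this to `~/.claude-pentatonic/tes-memory.local.md` (see the CLI source in this diff):

```
---
mode: local
memory_url: http://localhost:3333
---
```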
158
139
 
159
- // Round 1: AI requests a tool call — emits event with tool_calls
160
- const r1 = await ai.chat.completions.create({
161
- model: "gpt-4o",
162
- messages: [{ role: "user", content: "Find me running shoes" }],
163
- tools: [searchTool],
164
- });
140
+ ### Install via marketplace
165
141
 
166
- // Execute tool, feed results back...
167
-
168
- // Round 2: AI responds with final answer — emits another event
169
- const r2 = await ai.chat.completions.create({
170
- model: "gpt-4o",
171
- messages: [...messages, { role: "tool", content: toolResult }],
172
- });
173
-
174
- // That's it. No manual emit needed. Both events share sessionId "conv-101".
142
+ ```
143
+ /plugin marketplace add Pentatonic-Ltd/ai-agent-sdk
144
+ /plugin install tes-memory@pentatonic-ai
175
145
  ```
176
146
 
177
- #### Python
178
-
179
- ```python
180
- ai = tes.wrap(OpenAI(), session_id="conv-101")
181
-
182
- r1 = ai.chat.completions.create(
183
- model="gpt-4o",
184
- messages=[{"role": "user", "content": "Find me running shoes"}],
185
- tools=[search_tool],
186
- )
187
-
188
- # Execute tool, feed results back...
147
+ ### Set up
189
148
 
190
- r2 = ai.chat.completions.create(
191
- model="gpt-4o",
192
- messages=[*messages, {"role": "tool", "content": tool_result}],
193
- )
149
+ For hosted TES:
150
+ ```
151
+ /tes-memory:tes-setup
152
+ ```
194
153
 
195
- # No manual emit needed.
154
+ For local memory:
155
+ ```bash
156
+ npx @pentatonic-ai/ai-agent-sdk memory
196
157
  ```
197
158
 
198
- ### Manual session (full control)
159
+ ### What it tracks
160
+
161
+ - **Every conversation turn** -- user messages, assistant responses, tool calls, duration
162
+ - **Automatic memory search** -- relevant memories injected as context on every prompt
163
+ - **Automatic memory storage** -- every turn stored with embeddings and HyDE queries
164
+ - **Token usage** -- input, output, cache read, cache creation tokens per turn
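Totaling the four per-turn counters can look like this. A sketch, not plugin code; the field names follow Anthropic's `usage` block (`cache_read_input_tokens`, `cache_creation_input_tokens`).

```javascript
// Sum one turn's tracked token counters into input/output totals.
// Cache reads and cache creation both count toward input processed.
function turnTokens(usage) {
  const {
    input_tokens = 0,
    output_tokens = 0,
    cache_read_input_tokens = 0,
    cache_creation_input_tokens = 0,
  } = usage;
  return {
    input: input_tokens + cache_read_input_tokens + cache_creation_input_tokens,
    output: output_tokens,
  };
}
```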
199
165
 
200
- If you don't want to use `tes.wrap()`, create a session directly:
166
+ ## SDK: Wrap Your LLM Client
201
167
 
202
- #### JavaScript
168
+ **JavaScript**
203
169
 
204
170
  ```js
205
- const session = tes.session({
206
- sessionId: "conv-123",
207
- metadata: { userId: "u_456" },
208
- });
171
+ import { TESClient } from "@pentatonic-ai/ai-agent-sdk";
209
172
 
210
- // Call your LLM however you like
211
- const response = await openai.chat.completions.create({
212
- model: "gpt-4o",
213
- messages: [{ role: "user", content: "What is 2+2?" }],
173
+ const tes = new TESClient({
174
+ clientId: process.env.TES_CLIENT_ID,
175
+ apiKey: process.env.TES_API_KEY,
176
+ endpoint: process.env.TES_ENDPOINT,
214
177
  });
215
178
 
216
- // Record the response (accumulates tokens, tool calls, model)
217
- session.record(response);
218
-
219
- // Emit when the turn is complete
220
- await session.emitChatTurn({
221
- userMessage: "What is 2+2?",
222
- assistantResponse: response.choices[0].message.content,
179
+ const ai = tes.wrap(new OpenAI(), { sessionId: "conv-123" });
180
+ const result = await ai.chat.completions.create({
181
+ model: "gpt-4o",
182
+ messages: [{ role: "user", content: "Hello!" }],
223
183
  });
224
184
  ```
225
185
 
226
- #### Python
186
+ **Python**
227
187
 
228
188
  ```python
229
- session = tes.session(
230
- session_id="conv-123",
231
- metadata={"user_id": "u_456"},
232
- )
189
+ from pentatonic_agent_events import TESClient
233
190
 
234
- response = openai.chat.completions.create(
235
- model="gpt-4o",
236
- messages=[{"role": "user", "content": "What is 2+2?"}],
191
+ tes = TESClient(
192
+ client_id=os.environ["TES_CLIENT_ID"],
193
+ api_key=os.environ["TES_API_KEY"],
194
+ endpoint=os.environ["TES_ENDPOINT"],
237
195
  )
238
196
 
239
- session.record(response)
240
-
241
- session.emit_chat_turn(
242
- user_message="What is 2+2?",
243
- assistant_response=response["choices"][0]["message"]["content"],
197
+ ai = tes.wrap(OpenAI(), session_id="conv-123")
198
+ result = ai.chat.completions.create(
199
+ model="gpt-4o",
200
+ messages=[{"role": "user", "content": "Hello!"}],
244
201
  )
245
202
  ```
246
203
 
247
- ## API Reference
248
-
249
- ### `TESClient`
250
-
251
- Creates a new client.
252
-
253
- #### JavaScript
254
-
255
- ```js
256
- new TESClient({ clientId, apiKey, endpoint, headers?, userId?, captureContent?, maxContentLength? })
257
- ```
258
-
259
- #### Python
260
-
261
- ```python
262
- TESClient(client_id, api_key, endpoint, headers=None, user_id=None, capture_content=True, max_content_length=4096)
263
- ```
264
-
265
- | Param (JS / Python) | Type | Default | Description |
266
- |----------------------|------|---------|-------------|
267
- | `clientId` / `client_id` | `string` | *required* | Your application/tenant identifier |
268
- | `apiKey` / `api_key` | `string` | *required* | TES service API key (sent as `x-service-key` header) |
269
- | `endpoint` / `endpoint` | `string` | *required* | TES instance URL (must be `https://`, except `localhost` for dev) |
270
- | `headers` / `headers` | `object` / `dict` | `{}` | Additional headers to include in every request |
271
- | `userId` / `user_id` | `string` | `null` / `None` | Optional user identifier — included as `data.attributes.userId` on every event. Enables user-scoped memory and attribution. |
272
- | `captureContent` / `capture_content` | `boolean` / `bool` | `true` / `True` | Whether to include message content in events |
273
- | `maxContentLength` / `max_content_length` | `number` / `int` | `4096` | Truncate content beyond this length |
274
-
275
- ### `tes.wrap(client, opts?)`
276
-
277
- Returns a Proxy (JS) or wrapper (Python) around any supported LLM client. Every intercepted call emits a `CHAT_TURN` event automatically.
278
-
279
- #### JavaScript
280
-
281
- ```js
282
- const ai = tes.wrap(client, { sessionId, userId, metadata });
283
- ```
284
-
285
- #### Python
286
-
287
- ```python
288
- ai = tes.wrap(client, session_id=None, user_id=None, metadata=None)
289
- ```
290
-
291
- | Option (JS / Python) | Type | Default | Description |
292
- |----------------------|------|---------|-------------|
293
- | `sessionId` / `session_id` | `string` | `crypto.randomUUID()` / `uuid.uuid4()` | Links events from the same conversation |
294
- | `userId` / `user_id` | `string` | Inherits from client | Override the user identifier for this wrapped instance |
295
- | `metadata` / `metadata` | `object` / `dict` | `{}` | Custom fields included in every emitted event |
296
-
297
- Auto-detects the provider:
204
+ ## Supported Providers
298
205
 
299
- | Client | Detection | Intercepted method |
300
- |--------|-----------|-------------------|
206
+ | Provider | Detection | Intercepted Method |
207
+ |----------|-----------|-------------------|
301
208
  | OpenAI | `client.chat.completions.create` | `chat.completions.create()` |
302
209
  | Anthropic | `client.messages.create` | `messages.create()` |
303
210
  | Workers AI | `client.run` (JS only) | `run()` |
304
211
 
305
- All other methods/properties pass through unchanged. The wrapped client exposes `ai.sessionId` (JS) or `ai.session_id` (Python).
306
-
307
- ### `tes.session(opts?)`
308
-
309
- Returns a `Session` instance.
212
+ All other methods pass through unchanged.
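The detection and pass-through behavior can be sketched with a plain `Proxy`. This illustrates the interception pattern, not the SDK's source; `emit` stands in for the event pipeline, and only the OpenAI branch is shown.

```javascript
// Sketch of provider interception: detect the client by shape, wrap the
// matching method, and let every other property pass through untouched.
function wrapClient(client, emit) {
  // Detection mirrors the table above: OpenAI exposes chat.completions.create.
  const isOpenAI = typeof client.chat?.completions?.create === "function";
  if (!isOpenAI) return client; // (Anthropic / Workers AI branches elided)

  return new Proxy(client, {
    get(target, prop, receiver) {
      if (prop === "chat") {
        return {
          completions: {
            create: async (params) => {
              const result = await target.chat.completions.create(params);
              emit({ type: "CHAT_TURN", model: params.model }); // fire event
              return result;
            },
          },
        };
      }
      return Reflect.get(target, prop, receiver); // pass-through
    },
  });
}
```

Because only the `chat` property is intercepted, everything else (API keys, helper methods, other endpoints) behaves exactly as on the unwrapped client.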
310
213
 
311
- | Option (JS / Python) | Type | Default | Description |
312
- |----------------------|------|---------|-------------|
313
- | `sessionId` / `session_id` | `string` | `crypto.randomUUID()` / `uuid.uuid4()` | Conversation/session identifier |
314
- | `metadata` / `metadata` | `object` / `dict` | `{}` | Extra fields included in every emitted event |
315
-
316
- ### `session.record(rawResponse)`
317
-
318
- Normalizes an LLM response and accumulates token usage, tool calls, and model info. Accepts responses from any supported provider. Returns the normalized response.
319
-
320
- ### `session.emitChatTurn()` / `session.emit_chat_turn()`
321
-
322
- Sends a `CHAT_TURN` event to TES with accumulated usage data, then resets counters.
214
+ ## API Reference
323
215
 
324
- | Param (JS / Python) | Type | Description |
325
- |---------------------|------|-------------|
326
- | `userMessage` / `user_message` | `string` | The user's message |
327
- | `assistantResponse` / `assistant_response` | `string` | The assistant's response |
328
- | `turnNumber` / `turn_number` | `number` / `int` | Optional turn number |
216
+ ### `TESClient(config)`
329
217
 
330
- ### `session.emitToolUse()` / `session.emit_tool_use()`
218
+ | Param | Type | Default | Description |
219
+ |-------|------|---------|-------------|
220
+ | `clientId` | `string` | required | Your tenant identifier |
221
+ | `apiKey` | `string` | required | TES API key |
222
+ | `endpoint` | `string` | required | TES instance URL |
223
+ | `userId` | `string` | `null` | User identifier for attribution |
224
+ | `captureContent` | `boolean` | `true` | Include message content in events |
225
+ | `maxContentLength` | `number` | `4096` | Truncate content beyond this length |
331
226
 
332
- Sends a `TOOL_USE` event for individual tool invocations.
227
+ ### `tes.wrap(client, opts?)`
333
228
 
334
- | Param (JS / Python) | Type | Description |
335
- |---------------------|------|-------------|
336
- | `tool` / `tool` | `string` | Tool name |
337
- | `args` / `args` | `object` / `dict` | Tool arguments |
338
- | `resultSummary` / `result_summary` | `string` | Optional result summary |
339
- | `durationMs` / `duration_ms` | `number` / `int` | Optional duration in milliseconds |
340
- | `turnNumber` / `turn_number` | `number` / `int` | Optional turn number |
229
+ Returns an instrumented proxy. Every intercepted call emits a `CHAT_TURN` event.
341
230
 
342
- ### `session.emitSessionStart()` / `session.emit_session_start()`
231
+ | Option | Type | Default | Description |
232
+ |--------|------|---------|-------------|
233
+ | `sessionId` | `string` | auto-generated UUID | Links events from the same conversation |
234
+ | `metadata` | `object` | `{}` | Custom fields on every event |
343
235
 
344
- Sends a `SESSION_START` event.
236
+ ### `tes.session(opts?)`
345
237
 
346
- ### `session.totalUsage` / `session.total_usage`
238
+ Returns a `Session` for manual event emission.
347
239
 
348
- Returns current accumulated usage: `{ prompt_tokens, completion_tokens, total_tokens, ai_rounds }`.
240
+ ### `session.emitChatTurn({ userMessage, assistantResponse, turnNumber? })`
349
241
 
350
- ### `normalizeResponse(raw)` / `normalize_response(raw)`
242
+ Emits a `CHAT_TURN` event with accumulated data, then resets.
351
243
 
352
- Standalone utility to normalize any LLM response into a consistent shape:
244
+ ### `normalizeResponse(raw)`
353
245
 
354
- #### JavaScript
246
+ Standalone utility to normalize any LLM response:
355
247
 
356
248
  ```js
357
249
  import { normalizeResponse } from "@pentatonic-ai/ai-agent-sdk";
358
250
 
359
- const normalized = normalizeResponse(openaiResponse);
360
- // { content, model, usage: { prompt_tokens, completion_tokens }, toolCalls: [{ tool, args }] }
251
+ const { content, model, usage, toolCalls } = normalizeResponse(openaiResponse);
361
252
  ```
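As an illustration of what normalization does, here is a sketch over the two most common response shapes. This is not the SDK's actual code; field names follow the public OpenAI and Anthropic response formats.

```javascript
// Sketch: flatten OpenAI- and Anthropic-style responses into one shape.
function normalizeSketch(raw) {
  if (raw.choices) {
    // OpenAI-style: choices[0].message, usage.prompt_tokens / completion_tokens
    const msg = raw.choices[0].message;
    return {
      content: msg.content,
      model: raw.model,
      usage: {
        prompt_tokens: raw.usage?.prompt_tokens ?? 0,
        completion_tokens: raw.usage?.completion_tokens ?? 0,
      },
      toolCalls: (msg.tool_calls ?? []).map((t) => ({
        tool: t.function.name,
        args: JSON.parse(t.function.arguments),
      })),
    };
  }
  // Anthropic-style: content blocks, usage.input_tokens / output_tokens
  return {
    content: raw.content.filter((b) => b.type === "text").map((b) => b.text).join(""),
    model: raw.model,
    usage: {
      prompt_tokens: raw.usage?.input_tokens ?? 0,
      completion_tokens: raw.usage?.output_tokens ?? 0,
    },
    toolCalls: raw.content
      .filter((b) => b.type === "tool_use")
      .map((b) => ({ tool: b.name, args: b.input })),
  };
}
```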
362
253
 
363
- #### Python
254
+ ## Architecture
364
255
 
365
- ```python
366
- from pentatonic_agent_events import normalize_response
367
-
368
- normalized = normalize_response(openai_response)
369
- # { "content", "model", "usage": { "prompt_tokens", "completion_tokens" }, "tool_calls": [{ "tool", "args" }] }
370
256
  ```
371
-
372
- > **Note:** In Python, the normalized response uses `tool_calls` (snake_case) instead of `toolCalls` (camelCase).
373
-
374
- ## Events Emitted
375
-
376
- All events are sent to the TES GraphQL API (`emitEvent` mutation) authenticated via `x-service-key` and `x-client-id` headers.
377
-
378
- | Event Type | Entity Type | When |
379
- |------------|-------------|------|
380
- | `CHAT_TURN` | `conversation` | Every `create()`/`run()` call via `wrap()`, or manually via `session.emitChatTurn()` |
381
- | `TOOL_USE` | `conversation` | Via `session.emitToolUse()` (manual only) |
382
- | `SESSION_START` | `conversation` | Via `session.emitSessionStart()` (manual only) |
383
-
384
- ## Supported Providers
385
-
386
- | Provider | Auto-wrap | Manual session | Response normalization |
387
- |----------|-----------|---------------|----------------------|
388
- | **OpenAI** (and compatible: Azure, Groq, Together, Mistral) | JS + Python | JS + Python | JS + Python |
389
- | **Anthropic** | JS + Python | JS + Python | JS + Python |
390
- | **Cloudflare Workers AI** | JS only | JS only | JS + Python |
391
-
392
- ## Security
393
-
394
- - **HTTPS enforced:** The SDK rejects non-HTTPS endpoints (except `localhost` for development)
395
- - **API key protection:** Stored as a non-enumerable property (JS) or private attribute (Python) — won't appear in `JSON.stringify`, `repr()`, or error reporters
396
- - **Content controls:** Set `captureContent: false` (JS) or `capture_content=False` (Python) to omit message content from events, or use `maxContentLength` / `max_content_length` to truncate
397
- - **No runtime dependencies:** Both the JavaScript and Python SDKs have zero external runtime dependencies
257
+ +-----------------------+
258
+ | Claude Code Plugin |
259
+ | (hooks: auto-search |
260
+ | + auto-store) |
261
+ +-----------+-----------+
262
+ |
263
+ +-----------+-----------+
264
+ | |
265
+ Local Memory Hosted TES
266
+ (Docker) (Cloud)
267
+ | |
268
+ +----+----+----+ +---+----+---+
269
+ | | | | | | | |
270
+ PG Ollama MCP HTTP PG R2 Queue Workers
271
+ pgvector API pgvector Modules
272
+ ```
398
273
 
399
274
  ## License
400
275
 
package/bin/cli.js CHANGED
@@ -2,6 +2,9 @@
2
2
 
3
3
  import { createInterface } from "readline";
4
4
  import { execFileSync } from "child_process";
5
+ import { existsSync, mkdirSync, writeFileSync } from "fs";
6
+ import { join } from "path";
7
+ import { homedir } from "os";
5
8
 
6
9
  const DEFAULT_ENDPOINT = "https://api.pentatonic.com";
7
10
 
@@ -135,16 +138,109 @@ function toClientId(companyName) {
135
138
  .replace(/^-|-$/g, "");
136
139
  }
137
140
 
141
+ async function setupLocalMemory() {
142
+ console.log(`\n @pentatonic/memory — Local Setup\n`);
143
+
144
+ // Check Docker
145
+ try {
146
+ execFileSync("docker", ["info"], { stdio: "pipe" });
147
+ } catch {
148
+ console.error(" Error: Docker is required. Install it from https://docker.com\n");
149
+ process.exit(1);
150
+ }
151
+
152
+ const memoryDir = new URL("../packages/memory", import.meta.url).pathname;
153
+
154
+ // Start infrastructure + memory server
155
+ const infraSpinner = spinner("Starting memory server + PostgreSQL + Ollama...");
156
+ try {
157
+ execFileSync("docker", ["compose", "up", "-d", "memory", "postgres", "ollama"], {
158
+ cwd: memoryDir,
159
+ stdio: "pipe",
160
+ });
161
+ infraSpinner.stop("Memory stack running!");
162
+ } catch (err) {
163
+ infraSpinner.fail(`Failed to start: ${err.message}`);
164
+ process.exit(1);
165
+ }
166
+
167
+ // Pull models
168
+ const embModel = process.env.EMBEDDING_MODEL || "nomic-embed-text";
169
+ const llmModel = process.env.LLM_MODEL || "llama3.2:3b";
170
+
171
+ const embSpinner = spinner(`Pulling ${embModel}...`);
172
+ try {
173
+ execFileSync("docker", ["compose", "exec", "ollama", "ollama", "pull", embModel], {
174
+ cwd: memoryDir,
175
+ stdio: "pipe",
176
+ });
177
+ embSpinner.stop(`${embModel} ready!`);
178
+ } catch {
179
+ embSpinner.fail(`Failed to pull ${embModel}. Run manually: docker compose exec ollama ollama pull ${embModel}`);
180
+ }
181
+
182
+ const llmSpinner = spinner(`Pulling ${llmModel}...`);
183
+ try {
184
+ execFileSync("docker", ["compose", "exec", "ollama", "ollama", "pull", llmModel], {
185
+ cwd: memoryDir,
186
+ stdio: "pipe",
187
+ });
188
+ llmSpinner.stop(`${llmModel} ready!`);
189
+ } catch {
190
+ llmSpinner.fail(`Failed to pull ${llmModel}. Run manually: docker compose exec ollama ollama pull ${llmModel}`);
191
+ }
192
+
193
+ // Write local config
194
+ const configDir = join(homedir(), ".claude-pentatonic");
195
+ if (!existsSync(configDir)) {
196
+ mkdirSync(configDir, { recursive: true });
197
+ }
198
+
199
+ const configPath = join(configDir, "tes-memory.local.md");
200
+ writeFileSync(
201
+ configPath,
202
+ `---
203
+ mode: local
204
+ memory_url: http://localhost:3333
205
+ ---
206
+ `
207
+ );
208
+
209
+ console.log(`\n Config written to ${configPath}`);
210
+
211
+ const sdkDir = new URL("..", import.meta.url).pathname;
212
+
213
+ console.log(`
214
+ Memory server: http://localhost:3333
215
+ Hooks are auto-configured to use local memory.
216
+
217
+ Install the plugin in Claude Code:
218
+ /plugin marketplace add Pentatonic-Ltd/ai-agent-sdk
219
+ /plugin install tes-memory@pentatonic-ai
220
+
221
+ You're ready! Every prompt auto-searches memory,
222
+ every turn auto-stores. No MCP setup needed.
223
+ `);
224
+
225
+ rl.close();
226
+ }
227
+
138
228
  async function main() {
139
229
  const flags = parseArgs();
140
230
  const TES_ENDPOINT = flags.endpoint || DEFAULT_ENDPOINT;
141
231
 
232
+ if (flags.command === "memory") {
233
+ await setupLocalMemory();
234
+ return;
235
+ }
236
+
142
237
  if (flags.command !== "init") {
143
238
  console.log(`
144
239
  @pentatonic-ai/ai-agent-sdk
145
240
 
146
241
  Usage:
147
- npx @pentatonic-ai/ai-agent-sdk init Set up account and install SDK
242
+ npx @pentatonic-ai/ai-agent-sdk init Set up hosted TES account
243
+ npx @pentatonic-ai/ai-agent-sdk memory Set up local memory stack
148
244
  npx @pentatonic-ai/ai-agent-sdk init --endpoint URL Use a custom TES endpoint
149
245
 
150
246
  For docs, see https://api.pentatonic.com
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "@pentatonic-ai/ai-agent-sdk",
3
- "version": "0.3.0-beta.3",
3
+ "version": "0.4.0-beta.1",
4
4
  "description": "TES SDK — LLM observability and lifecycle tracking via Pentatonic Thing Event System. Track token usage, tool calls, and conversations. Manage things through event-sourced lifecycle stages with AI enrichment and vector search.",
5
5
  "type": "module",
6
6
  "main": "./dist/index.cjs",