@sochdb/sochdb 0.4.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/LICENSE +201 -0
- package/README.md +3349 -0
- package/_bin/aarch64-apple-darwin/libsochdb_storage.dylib +0 -0
- package/_bin/aarch64-apple-darwin/sochdb-bulk +0 -0
- package/_bin/aarch64-apple-darwin/sochdb-grpc-server +0 -0
- package/_bin/aarch64-apple-darwin/sochdb-server +0 -0
- package/_bin/x86_64-pc-windows-msvc/sochdb-bulk.exe +0 -0
- package/_bin/x86_64-pc-windows-msvc/sochdb-grpc-server.exe +0 -0
- package/_bin/x86_64-pc-windows-msvc/sochdb_storage.dll +0 -0
- package/_bin/x86_64-unknown-linux-gnu/libsochdb_storage.so +0 -0
- package/_bin/x86_64-unknown-linux-gnu/sochdb-bulk +0 -0
- package/_bin/x86_64-unknown-linux-gnu/sochdb-grpc-server +0 -0
- package/_bin/x86_64-unknown-linux-gnu/sochdb-server +0 -0
- package/bin/sochdb-bulk.js +80 -0
- package/bin/sochdb-grpc-server.js +80 -0
- package/bin/sochdb-server.js +84 -0
- package/dist/cjs/analytics.js +196 -0
- package/dist/cjs/database.js +929 -0
- package/dist/cjs/embedded/database.js +236 -0
- package/dist/cjs/embedded/ffi/bindings.js +113 -0
- package/dist/cjs/embedded/ffi/library-finder.js +135 -0
- package/dist/cjs/embedded/index.js +14 -0
- package/dist/cjs/embedded/transaction.js +172 -0
- package/dist/cjs/errors.js +71 -0
- package/dist/cjs/format.js +176 -0
- package/dist/cjs/grpc-client.js +328 -0
- package/dist/cjs/index.js +75 -0
- package/dist/cjs/ipc-client.js +504 -0
- package/dist/cjs/query.js +154 -0
- package/dist/cjs/server-manager.js +295 -0
- package/dist/cjs/sql-engine.js +874 -0
- package/dist/esm/analytics.js +196 -0
- package/dist/esm/database.js +931 -0
- package/dist/esm/embedded/database.js +239 -0
- package/dist/esm/embedded/ffi/bindings.js +142 -0
- package/dist/esm/embedded/ffi/library-finder.js +135 -0
- package/dist/esm/embedded/index.js +14 -0
- package/dist/esm/embedded/transaction.js +176 -0
- package/dist/esm/errors.js +71 -0
- package/dist/esm/format.js +179 -0
- package/dist/esm/grpc-client.js +333 -0
- package/dist/esm/index.js +75 -0
- package/dist/esm/ipc-client.js +505 -0
- package/dist/esm/query.js +159 -0
- package/dist/esm/server-manager.js +295 -0
- package/dist/esm/sql-engine.js +875 -0
- package/dist/types/analytics.d.ts +66 -0
- package/dist/types/analytics.d.ts.map +1 -0
- package/dist/types/database.d.ts +523 -0
- package/dist/types/database.d.ts.map +1 -0
- package/dist/types/embedded/database.d.ts +105 -0
- package/dist/types/embedded/database.d.ts.map +1 -0
- package/dist/types/embedded/ffi/bindings.d.ts +24 -0
- package/dist/types/embedded/ffi/bindings.d.ts.map +1 -0
- package/dist/types/embedded/ffi/library-finder.d.ts +17 -0
- package/dist/types/embedded/ffi/library-finder.d.ts.map +1 -0
- package/dist/types/embedded/index.d.ts +9 -0
- package/dist/types/embedded/index.d.ts.map +1 -0
- package/dist/types/embedded/transaction.d.ts +21 -0
- package/dist/types/embedded/transaction.d.ts.map +1 -0
- package/dist/types/errors.d.ts +36 -0
- package/dist/types/errors.d.ts.map +1 -0
- package/dist/types/format.d.ts +117 -0
- package/dist/types/format.d.ts.map +1 -0
- package/dist/types/grpc-client.d.ts +120 -0
- package/dist/types/grpc-client.d.ts.map +1 -0
- package/dist/types/index.d.ts +50 -0
- package/dist/types/index.d.ts.map +1 -0
- package/dist/types/ipc-client.d.ts +177 -0
- package/dist/types/ipc-client.d.ts.map +1 -0
- package/dist/types/query.d.ts +85 -0
- package/dist/types/query.d.ts.map +1 -0
- package/dist/types/server-manager.d.ts +29 -0
- package/dist/types/server-manager.d.ts.map +1 -0
- package/dist/types/sql-engine.d.ts +100 -0
- package/dist/types/sql-engine.d.ts.map +1 -0
- package/package.json +90 -0
- package/scripts/postinstall.js +50 -0
package/README.md
ADDED
|
@@ -0,0 +1,3349 @@
# SochDB Node.js SDK v0.4.0

**Dual-mode architecture: Embedded (FFI) + Server (gRPC/IPC)**
Choose the deployment mode that fits your needs.

## Architecture: Flexible Deployment

```
┌─────────────────────────────────────────────────────────────┐
│                     DEPLOYMENT OPTIONS                      │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  1. EMBEDDED MODE (FFI)          2. SERVER MODE (gRPC)      │
│  ┌─────────────────────┐         ┌─────────────────────┐    │
│  │   Node.js App       │         │   Node.js App       │    │
│  │   ├─ Database.open()│         │   ├─ SochDBClient() │    │
│  │   └─ Direct FFI     │         │   └─ gRPC calls     │    │
│  │         │           │         │         │           │    │
│  │         ▼           │         │         ▼           │    │
│  │   libsochdb_storage │         │   sochdb-grpc       │    │
│  │   (Rust native)     │         │   (Rust server)     │    │
│  └─────────────────────┘         └─────────────────────┘    │
│                                                             │
│  ✅ No server needed             ✅ Multi-language          │
│  ✅ Local files                  ✅ Centralized logic       │
│  ✅ Simple deployment            ✅ Production scale        │
└─────────────────────────────────────────────────────────────┘
```

### When to Use Each Mode

**Embedded Mode (FFI):**
- ✅ Local development and testing
- ✅ Jupyter notebooks and data science
- ✅ Single-process applications
- ✅ Edge deployments without network
- ✅ No server setup required

**Server Mode (gRPC):**
- ✅ Production deployments
- ✅ Multi-language teams (Python, Node.js, Go)
- ✅ Distributed systems
- ✅ Centralized business logic
- ✅ Horizontal scaling

---

## Installation

```bash
npm install @sochdb/sochdb
```

Or from source:
```bash
cd sochdb-typescript-sdk
npm install
```

# SochDB Node.js SDK Documentation

**Version 0.4.0** | LLM-Optimized Embedded Database with Native Vector Search

---

## Table of Contents

1. [Quick Start](#1-quick-start)
2. [Installation](#2-installation)
3. [Architecture Overview](#3-architecture-overview)
4. [Core Key-Value Operations](#4-core-key-value-operations)
5. [Transactions (ACID with SSI)](#5-transactions-acid-with-ssi)
6. [Query Builder](#6-query-builder)
7. [Prefix Scanning](#7-prefix-scanning)
8. [SQL Operations](#8-sql-operations)
9. [Table Management & Index Policies](#9-table-management--index-policies)
10. [Namespaces & Multi-Tenancy](#10-namespaces--multi-tenancy)
11. [Collections & Vector Search](#11-collections--vector-search)
12. [Hybrid Search (Vector + BM25)](#12-hybrid-search-vector--bm25)
13. [Graph Operations](#13-graph-operations)
14. [Temporal Graph (Time-Travel)](#14-temporal-graph-time-travel)
15. [Semantic Cache](#15-semantic-cache)
16. [Context Query Builder (LLM Optimization) and Session](#16-context-query-builder-llm-optimization)
17. [Atomic Multi-Index Writes](#17-atomic-multi-index-writes)
18. [Recovery & WAL Management](#18-recovery--wal-management)
19. [Checkpoints & Snapshots](#19-checkpoints--snapshots)
20. [Compression & Storage](#20-compression--storage)
21. [Statistics & Monitoring](#21-statistics--monitoring)
22. [Distributed Tracing](#22-distributed-tracing)
23. [Workflow & Run Tracking](#23-workflow--run-tracking)
24. [Server Mode (gRPC Client)](#24-server-mode-grpc-client)
25. [IPC Client (Unix Sockets)](#25-ipc-client-unix-sockets)
26. [Standalone VectorIndex](#26-standalone-vectorindex)
27. [Vector Utilities](#27-vector-utilities)
28. [Data Formats (TOON/JSON/Columnar)](#28-data-formats-toonjsoncolumnar)
29. [Policy Service](#29-policy-service)
30. [MCP (Model Context Protocol)](#30-mcp-model-context-protocol)
31. [Configuration Reference](#31-configuration-reference)
32. [Error Handling](#32-error-handling)
33. [Async Support](#33-async-support)
34. [Building & Development](#34-building--development)
35. [Complete Examples](#35-complete-examples)
36. [Migration Guide](#36-migration-guide)

---

## 1. Quick Start

```python
from sochdb import Database

# Open (or create) a database
db = Database.open("./my_database")

# Store and retrieve data
db.put(b"hello", b"world")
value = db.get(b"hello")  # b"world"

# Use transactions for atomic operations
with db.transaction() as txn:
    txn.put(b"key1", b"value1")
    txn.put(b"key2", b"value2")
    # Commits automatically on success, rolls back on exception

# Clean up
db.delete(b"hello")
db.close()
```

**30-Second Overview:**
- **Key-Value**: Fast reads/writes with `get`/`put`/`delete`
- **Transactions**: ACID with SSI isolation
- **Vector Search**: HNSW-based semantic search
- **Hybrid Search**: Combine vectors with BM25 keyword search
- **Graph**: Build and traverse knowledge graphs
- **LLM-Optimized**: TOON format uses 40-60% fewer tokens than JSON

---

## 2. Installation

```bash
npm install @sochdb/sochdb
```

**Platform Support:**

| Platform | Architecture | Status |
|----------|--------------|--------|
| Linux | x86_64, aarch64 | ✅ Full support |
| macOS | x86_64, arm64 | ✅ Full support |
| Windows | x86_64 | ✅ Full support |

**Optional Dependencies:**
```bash
# For async support
npm install @sochdb/sochdb[async]

# For server mode
npm install @sochdb/sochdb[grpc]

# Everything
npm install @sochdb/sochdb[all]
```

---

## 3. Architecture Overview

SochDB supports two deployment modes:

### Embedded Mode (Default)

Direct Rust bindings via FFI. No server required.

```python
from sochdb import Database

with Database.open("./mydb") as db:
    db.put(b"key", b"value")
    value = db.get(b"key")
```

**Best for:** Local development, notebooks, single-process applications.

### Server Mode (gRPC)

Thin client that connects to a `sochdb-grpc` server.

```python
from sochdb import SochDBClient

client = SochDBClient("localhost:50051")
client.put(b"key", b"value", namespace="default")
value = client.get(b"key", namespace="default")
```

**Best for:** Production, multi-process, distributed systems.

### Feature Comparison

| Feature | Embedded | Server |
|---------|----------|--------|
| Setup | `npm install` only | Server + client |
| Performance | Fastest (in-process) | Network overhead |
| Multi-process | ❌ | ✅ |
| Horizontal scaling | ❌ | ✅ |
| Vector search | ✅ | ✅ |
| Graph operations | ✅ | ✅ |
| Semantic cache | ✅ | ✅ |
| Context service | Limited | ✅ Full |
| MCP integration | ❌ | ✅ |

```
┌─────────────────────────────────────────────────────────────┐
│                     DEPLOYMENT OPTIONS                      │
├─────────────────────────────────────────────────────────────┤
│  EMBEDDED MODE (FFI)             SERVER MODE (gRPC)         │
│  ┌─────────────────────┐         ┌─────────────────────┐    │
│  │   Node.js App       │         │   Node.js App       │    │
│  │   ├─ Database.open()│         │   ├─ SochDBClient() │    │
│  │   └─ Direct FFI     │         │   └─ gRPC calls     │    │
│  │         │           │         │         │           │    │
│  │         ▼           │         │         ▼           │    │
│  │   libsochdb_storage │         │   sochdb-grpc       │    │
│  │   (Rust native)     │         │   (Rust server)     │    │
│  └─────────────────────┘         └─────────────────────┘    │
│                                                             │
│  ✅ No server needed             ✅ Multi-language          │
│  ✅ Local files                  ✅ Centralized logic       │
│  ✅ Simple deployment            ✅ Production scale        │
└─────────────────────────────────────────────────────────────┘
```

---

## 4. Core Key-Value Operations

All keys and values are **bytes**.

### Basic Operations

```python
from sochdb import Database

db = Database.open("./my_db")

# Store data
db.put(b"user:1", b"Alice")
db.put(b"user:2", b"Bob")

# Retrieve data
user = db.get(b"user:1")  # Returns b"Alice" or None

# Check existence
exists = db.exists(b"user:1")  # True

# Delete data
db.delete(b"user:1")

db.close()
```

### Path-Based Keys (Hierarchical)

Organize data hierarchically with path-based access:

```python
# Store with path (strings are converted to bytes internally)
db.put_path("users/alice/name", b"Alice Smith")
db.put_path("users/alice/email", b"alice@example.com")
db.put_path("users/bob/name", b"Bob Jones")

# Retrieve by path
name = db.get_path("users/alice/name")  # b"Alice Smith"

# Delete by path
db.delete_path("users/alice/email")

# List at path (like listing a directory)
children = db.list_path("users/")  # ["alice", "bob"]
```
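
To make the directory-style listing concrete, here is a minimal sketch of how immediate children could be derived from flat path keys. It works on a plain Python list rather than on SochDB; `list_children` is a hypothetical helper, and the SDK's actual `list_path` behavior (ordering, handling of deeper levels) is an assumption here.

```python
def list_children(keys, prefix):
    """Return the sorted, de-duplicated first path segment under `prefix`."""
    children = set()
    for key in keys:
        if key.startswith(prefix):
            rest = key[len(prefix):]
            if rest:
                # Keep only the segment before the next "/"
                children.add(rest.split("/", 1)[0])
    return sorted(children)


keys = ["users/alice/name", "users/alice/email", "users/bob/name"]
print(list_children(keys, "users/"))  # ['alice', 'bob']
```

Because the keys are flat strings, "listing a directory" reduces to a prefix filter plus a split on the next separator.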

### With TTL (Time-To-Live)

```python
# Store with expiration (seconds)
db.put(b"session:abc123", b"user_data", ttl_seconds=3600)  # Expires in 1 hour

# TTL of 0 means no expiration
db.put(b"permanent_key", b"value", ttl_seconds=0)
```
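
The TTL rules above (a positive value expires the entry, `0` keeps it forever) can be sketched with a toy in-memory store. This is an illustration only: `TTLStore`, the injectable clock, and lazy eviction on read are assumptions made for the example, not a description of how SochDB expires keys.

```python
import time


class TTLStore:
    """Toy in-memory store; ttl_seconds=0 means the entry never expires."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._data = {}  # key -> (value, expires_at or None)

    def put(self, key, value, ttl_seconds=0):
        expires_at = self._clock() + ttl_seconds if ttl_seconds > 0 else None
        self._data[key] = (value, expires_at)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and self._clock() >= expires_at:
            del self._data[key]  # lazily evict on read
            return None
        return value


now = [0.0]
store = TTLStore(clock=lambda: now[0])
store.put(b"session:abc123", b"user_data", ttl_seconds=3600)
store.put(b"permanent_key", b"value", ttl_seconds=0)
now[0] = 3601.0  # advance the fake clock past one hour
print(store.get(b"session:abc123"), store.get(b"permanent_key"))  # None b'value'
```

Passing the clock in as a parameter keeps the example deterministic, which is also a convenient way to test expiry logic without sleeping.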

### Batch Operations

```python
# Batch put (more efficient than individual puts)
db.put_batch([
    (b"key1", b"value1"),
    (b"key2", b"value2"),
    (b"key3", b"value3"),
])

# Batch get
values = db.get_batch([b"key1", b"key2", b"key3"])
# Returns: [b"value1", b"value2", b"value3"] (None for missing keys)

# Batch delete
db.delete_batch([b"key1", b"key2", b"key3"])
```

### Context Manager

```python
with Database.open("./my_db") as db:
    db.put(b"key", b"value")
    # Automatically closes when exiting
```

---

## 5. Transactions (ACID with SSI)

SochDB provides full ACID transactions with **Serializable Snapshot Isolation (SSI)**.

### Context Manager Pattern (Recommended)

```python
# Commits automatically on success, rolls back on exception
with db.transaction() as txn:
    txn.put(b"accounts/alice", b"1000")
    txn.put(b"accounts/bob", b"500")

    # Reads within the transaction see its own writes
    balance = txn.get(b"accounts/alice")  # b"1000"

    # If an exception occurs, the transaction rolls back automatically
```

### Closure Pattern (Rust-Style)

```python
# Using with_transaction for automatic commit/rollback
def transfer_funds(txn):
    alice = int(txn.get(b"accounts/alice") or b"0")
    bob = int(txn.get(b"accounts/bob") or b"0")

    txn.put(b"accounts/alice", str(alice - 100).encode())
    txn.put(b"accounts/bob", str(bob + 100).encode())

    return "Transfer complete"

result = db.with_transaction(transfer_funds)
```
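
The commit-on-return, rollback-on-exception contract of the closure pattern can be illustrated with a toy transaction over a plain dictionary. `ToyTxn` and this `with_transaction` function are hypothetical stand-ins written for the example; they buffer writes and apply them only on commit, and they do not model SSI, conflict detection, or durability.

```python
class ToyTxn:
    """Buffers writes; they reach the store only on commit."""

    def __init__(self, store):
        self._store = store
        self._writes = {}

    def get(self, key):
        # Reads see the transaction's own pending writes first
        if key in self._writes:
            return self._writes[key]
        return self._store.get(key)

    def put(self, key, value):
        self._writes[key] = value

    def commit(self):
        self._store.update(self._writes)


def with_transaction(store, fn):
    """Run fn(txn); commit if it returns, discard writes if it raises."""
    txn = ToyTxn(store)
    result = fn(txn)  # an exception here skips commit, so nothing is applied
    txn.commit()
    return result


store = {b"accounts/alice": b"1000", b"accounts/bob": b"500"}


def transfer(txn):
    alice = int(txn.get(b"accounts/alice"))
    bob = int(txn.get(b"accounts/bob"))
    txn.put(b"accounts/alice", str(alice - 100).encode())
    txn.put(b"accounts/bob", str(bob + 100).encode())
    return "Transfer complete"


def failing(txn):
    txn.put(b"accounts/alice", b"0")
    raise RuntimeError("simulated failure")


print(with_transaction(store, transfer))  # Transfer complete
try:
    with_transaction(store, failing)
except RuntimeError:
    pass
print(store[b"accounts/alice"], store[b"accounts/bob"])  # b'900' b'600'
```

The failed call leaves both balances untouched because its buffered write is never applied.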

### Manual Transaction Control

```python
txn = db.begin_transaction()
try:
    txn.put(b"key1", b"value1")
    txn.put(b"key2", b"value2")

    commit_ts = txn.commit()  # Returns HLC timestamp
    print(f"Committed at: {commit_ts}")
except Exception:
    txn.abort()
    raise
```

### Transaction Properties

```python
txn = db.transaction()
print(f"Transaction ID: {txn.id}")         # Unique identifier
print(f"Start timestamp: {txn.start_ts}")  # HLC start time
print(f"Isolation: {txn.isolation}")       # "serializable"
```

### SSI Conflict Handling

```python
from sochdb import TransactionConflictError

MAX_RETRIES = 3

for attempt in range(MAX_RETRIES):
    try:
        with db.transaction() as txn:
            # Read and modify
            value = int(txn.get(b"counter") or b"0")
            txn.put(b"counter", str(value + 1).encode())
        break  # Success
    except TransactionConflictError:
        if attempt == MAX_RETRIES - 1:
            raise
        # Retry on conflict
        continue
```
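
The retry loop above can be factored into a reusable helper. The sketch below adds exponential backoff with jitter, a common refinement that the README does not itself prescribe. `ConflictError` is a local stand-in for the SDK's conflict exception, and `run_with_retry` and its parameters are hypothetical names chosen for illustration.

```python
import random
import time


class ConflictError(Exception):
    """Local stand-in for the SDK's conflict exception."""


def run_with_retry(operation, max_retries=3, base_delay=0.01, sleep=time.sleep):
    """Call operation(); on ConflictError, back off exponentially and retry."""
    for attempt in range(max_retries):
        try:
            return operation()
        except ConflictError:
            if attempt == max_retries - 1:
                raise  # out of attempts: surface the conflict to the caller
            # Exponential backoff with jitter so retries do not collide again
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay))


calls = {"n": 0}


def flaky_increment():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConflictError("simulated conflict")
    return "committed"


result = run_with_retry(flaky_increment, sleep=lambda _: None)
print(result, calls["n"])  # committed 3
```

The whole read-modify-write body belongs inside `operation`, so that each retry re-reads fresh values instead of reusing stale ones.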

### All Transaction Operations

```python
with db.transaction() as txn:
    # Key-value
    txn.put(key, value)
    txn.get(key)
    txn.delete(key)
    txn.exists(key)

    # Path-based
    txn.put_path(path, value)
    txn.get_path(path)

    # Batch operations
    txn.put_batch(pairs)
    txn.get_batch(keys)

    # Scanning
    for k, v in txn.scan_prefix(b"prefix/"):
        print(k, v)

    # SQL (within transaction isolation)
    result = txn.execute("SELECT * FROM users WHERE id = 1")
```

### Isolation Levels

```python
from sochdb import IsolationLevel

# Default: Serializable (strongest)
with db.transaction(isolation=IsolationLevel.SERIALIZABLE) as txn:
    pass

# Snapshot isolation (faster, allows some anomalies)
with db.transaction(isolation=IsolationLevel.SNAPSHOT) as txn:
    pass

# Read committed (fastest, least isolation)
with db.transaction(isolation=IsolationLevel.READ_COMMITTED) as txn:
    pass
```

---

## 6. Query Builder

Fluent API for building efficient queries with predicate pushdown.

### Basic Query

```python
# Query with prefix and limit
results = (
    db.query("users/")
    .limit(10)
    .execute()
)

for key, value in results:
    print(f"{key.decode()}: {value.decode()}")
```

### Filtered Query

```python
from sochdb import CompareOp

# Query with filters
results = (
    db.query("orders/")
    .where("status", CompareOp.EQ, "pending")
    .where("amount", CompareOp.GT, 100)
    .order_by("created_at", descending=True)
    .limit(50)
    .offset(10)
    .execute()
)
```

### Column Selection

```python
# Select specific fields only
results = (
    db.query("users/")
    .select(["name", "email"])  # Only fetch these columns
    .where("active", CompareOp.EQ, True)
    .execute()
)
```

### Aggregate Queries

```python
# Count
count = (
    db.query("orders/")
    .where("status", CompareOp.EQ, "completed")
    .count()
)

# Sum (for numeric columns)
total = db.query("orders/").sum("amount")

# Group by
results = (
    db.query("orders/")
    .select(["status", "COUNT(*)", "SUM(amount)"])
    .group_by("status")
    .execute()
)
```

### Query in Transaction

```python
with db.transaction() as txn:
    results = (
        txn.query("users/")
        .where("role", CompareOp.EQ, "admin")
        .execute()
    )
```

---

## 7. Prefix Scanning

Iterate efficiently over keys that share a common prefix.

### Safe Prefix Scan (Recommended)

```python
# Requires a prefix of at least 2 bytes (prevents accidental full scans)
for key, value in db.scan_prefix(b"users/"):
    print(f"{key.decode()}: {value.decode()}")

# Raises ValueError if the prefix is shorter than 2 bytes
```
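
One common way to implement a prefix scan over sorted keys is to turn the prefix into a half-open key range. The sketch below does this with `bisect` on a sorted list and enforces the 2-byte minimum described above. It is an illustration under stated assumptions: `prefix_upper_bound` and this `scan_prefix` are hypothetical helpers, and whether SochDB's storage engine works this way is not confirmed by the README.

```python
import bisect


def prefix_upper_bound(prefix: bytes):
    """Smallest byte string greater than every key starting with `prefix`."""
    trimmed = prefix.rstrip(b"\xff")
    if not trimmed:
        return None  # prefix is empty or all 0xff: no finite upper bound
    return trimmed[:-1] + bytes([trimmed[-1] + 1])


def scan_prefix(sorted_keys, prefix: bytes, min_len: int = 2):
    """Return keys with `prefix` from a sorted list; reject short prefixes."""
    if len(prefix) < min_len:
        raise ValueError(f"prefix must be at least {min_len} bytes")
    lo = bisect.bisect_left(sorted_keys, prefix)
    upper = prefix_upper_bound(prefix)
    hi = len(sorted_keys) if upper is None else bisect.bisect_left(sorted_keys, upper)
    return sorted_keys[lo:hi]


keys = sorted([b"orders/1", b"users/alice", b"users/bob", b"users0", b"zeta"])
print(scan_prefix(keys, b"users/"))  # [b'users/alice', b'users/bob']
```

Note that `b"users0"` is excluded: it sorts immediately after every `b"users/..."` key and is exactly the computed upper bound.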

### Unchecked Prefix Scan

```python
# For internal operations needing empty/short prefixes
# WARNING: Can cause expensive full-database scans
for key, value in db.scan_prefix_unchecked(b""):
    print(f"All keys: {key}")
```

### Batched Scanning (Up to 1000x Fewer FFI Calls)

```python
# Fetches 1000 results per FFI call instead of 1
# Performance: 10,000 results = 10 FFI calls vs 10,000 calls

for key, value in db.scan_batched(b"prefix/", batch_size=1000):
    process(key, value)
```
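
The call-count arithmetic in the comments above is a ceiling division. The short sketch below reproduces it; `ffi_calls` is a hypothetical helper, and it assumes no extra call is needed to detect the end of the scan.

```python
import math


def ffi_calls(total_results: int, batch_size: int) -> int:
    """Number of boundary crossings needed to fetch `total_results` rows."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    return math.ceil(total_results / batch_size)


for batch_size in (1, 100, 1000):
    print(batch_size, ffi_calls(10_000, batch_size))
```

With 10,000 results this gives 10,000 calls at a batch size of 1 and 10 calls at a batch size of 1000, matching the figures quoted above. The reduction is in per-call overhead, so the end-to-end speedup depends on how expensive each crossing is relative to the work done per row.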

### Reverse Scan

```python
# Scan in reverse order (newest first)
for key, value in db.scan_prefix(b"logs/", reverse=True):
    print(key, value)
```

### Range Scan

```python
# Scan within a specific range
for key, value in db.scan_range(b"users/a", b"users/m"):
    print(key, value)  # All users from "a" to "m"
```

### Streaming Large Results

```python
# For very large result sets, use streaming to avoid memory issues
for batch in db.scan_stream(b"logs/", batch_size=10000):
    for key, value in batch:
        process(key, value)
    # Memory is freed after processing each batch
```
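
The batch-at-a-time shape of a streaming scan can be sketched with a generator that groups any iterable into fixed-size lists. `stream_batches` is a hypothetical helper operating on an in-memory iterable; it shows the iteration pattern, not how the SDK's `scan_stream` is implemented.

```python
from itertools import islice


def stream_batches(pairs, batch_size):
    """Yield lists of at most `batch_size` items from any iterable."""
    iterator = iter(pairs)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch  # only one batch is held in memory at a time


rows = ((f"logs/{i:05d}".encode(), b"entry") for i in range(25))
sizes = [len(batch) for batch in stream_batches(rows, batch_size=10)]
print(sizes)  # [10, 10, 5]
```

Because the source is consumed lazily, peak memory is bounded by one batch rather than by the full result set, as long as the caller does not hold on to earlier batches.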

---

## 8. SQL Operations

Execute SQL queries for familiar relational patterns.

### Creating Tables

```python
db.execute_sql("""
    CREATE TABLE users (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        email TEXT UNIQUE,
        age INTEGER,
        created_at TEXT DEFAULT CURRENT_TIMESTAMP
    )
""")

db.execute_sql("""
    CREATE TABLE posts (
        id INTEGER PRIMARY KEY,
        user_id INTEGER REFERENCES users(id),
        title TEXT NOT NULL,
        content TEXT,
        likes INTEGER DEFAULT 0
    )
""")
```

### CRUD Operations

```python
# Insert
db.execute_sql("""
    INSERT INTO users (id, name, email, age)
    VALUES (1, 'Alice', 'alice@example.com', 30)
""")

# Insert with parameters (prevents SQL injection)
db.execute_sql(
    "INSERT INTO users (id, name, email, age) VALUES (?, ?, ?, ?)",
    params=[2, "Bob", "bob@example.com", 25]
)

# Select
result = db.execute_sql("SELECT * FROM users WHERE age > 25")
for row in result.rows:
    print(row)  # {'id': 1, 'name': 'Alice', ...}

# Update
db.execute_sql("UPDATE users SET email = 'alice.new@example.com' WHERE id = 1")

# Delete
db.execute_sql("DELETE FROM users WHERE id = 2")
```
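
Why `?` placeholders prevent injection is easiest to see by running one. The sketch below uses Python's standard `sqlite3` module as a stand-in engine, since it supports the same `?` placeholder style; it says nothing about how SochDB binds parameters internally.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")

# A hostile value that would break out of a naively formatted string literal
hostile_name = "Bob'); DROP TABLE users; --"

# The driver passes the value separately from the SQL text, so it is stored
# as data and never parsed as SQL.
conn.execute("INSERT INTO users (id, name) VALUES (?, ?)", (2, hostile_name))

stored = conn.execute("SELECT name FROM users WHERE id = ?", (2,)).fetchone()[0]
table_count = conn.execute(
    "SELECT COUNT(*) FROM sqlite_master WHERE type = 'table' AND name = 'users'"
).fetchone()[0]
print(stored == hostile_name, table_count)  # True 1
conn.close()
```

The hostile string round-trips unchanged and the table still exists, which is the guarantee parameter binding is meant to give.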

### Upsert (Insert or Update)

```python
# Insert or update on conflict
db.execute_sql("""
    INSERT INTO users (id, name, email) VALUES (1, 'Alice', 'alice@example.com')
    ON CONFLICT (id) DO UPDATE SET
        name = excluded.name,
        email = excluded.email
""")
```
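
The effect of `ON CONFLICT ... DO UPDATE` can be checked with the same statement shape in Python's standard `sqlite3` module, used here purely as a stand-in engine (it needs SQLite 3.24 or newer). Whether SochDB's SQL engine matches SQLite's semantics in every detail is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")

upsert = """
    INSERT INTO users (id, name, email) VALUES (?, ?, ?)
    ON CONFLICT (id) DO UPDATE SET
        name = excluded.name,
        email = excluded.email
"""
conn.execute(upsert, (1, "Alice", "alice@example.com"))       # inserts
conn.execute(upsert, (1, "Alice S.", "alice.s@example.com"))  # updates in place

rows = conn.execute("SELECT id, name, email FROM users").fetchall()
print(rows)  # [(1, 'Alice S.', 'alice.s@example.com')]
conn.close()
```

`excluded` refers to the row that the failed insert would have written, so the second call overwrites `name` and `email` without creating a second row.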

### Query Results

```python
from sochdb import SQLQueryResult

result = db.execute_sql("SELECT id, name FROM users")

print(f"Columns: {result.columns}")  # ['id', 'name']
print(f"Row count: {len(result.rows)}")
print(f"Execution time: {result.execution_time_ms}ms")

for row in result.rows:
    print(f"ID: {row['id']}, Name: {row['name']}")

# Convert to different formats
df = result.to_dataframe()  # pandas DataFrame
json_data = result.to_json()
```

### Index Management

```python
# Create index
db.execute_sql("CREATE INDEX idx_users_email ON users(email)")

# Create unique index (a distinct name, so it does not clash with the index above)
db.execute_sql("CREATE UNIQUE INDEX idx_users_email_unique ON users(email)")

# Drop index
db.execute_sql("DROP INDEX IF EXISTS idx_users_email")

# List indexes
indexes = db.list_indexes("users")
```

### Prepared Statements

```python
# Prepare once, execute many times
stmt = db.prepare("SELECT * FROM users WHERE age > ? AND status = ?")

# Execute with different parameters
young_active = stmt.execute([25, "active"])
old_active = stmt.execute([50, "active"])

# Close when done
stmt.close()
```

### Dialect Support

SochDB auto-detects SQL dialects:

```python
# PostgreSQL style
db.execute_sql("INSERT INTO users VALUES (1, 'Alice') ON CONFLICT DO NOTHING")

# MySQL style
db.execute_sql("INSERT IGNORE INTO users VALUES (1, 'Alice')")

# SQLite style
db.execute_sql("INSERT OR IGNORE INTO users VALUES (1, 'Alice')")
```

---

## 9. Table Management & Index Policies

### Table Information

```python
# Get table schema
schema = db.get_table_schema("users")
print(f"Columns: {schema.columns}")
print(f"Primary key: {schema.primary_key}")
print(f"Indexes: {schema.indexes}")

# List all tables
tables = db.list_tables()

# Drop table
db.execute_sql("DROP TABLE IF EXISTS old_table")
```

### Index Policies

Configure per-table indexing strategies for optimal performance:

```python
# Policy constants
Database.INDEX_WRITE_OPTIMIZED  # 0 - O(1) insert, O(N) scan
Database.INDEX_BALANCED         # 1 - O(1) amortized insert, O(log K) scan
Database.INDEX_SCAN_OPTIMIZED   # 2 - O(log N) insert, O(log N + K) scan
Database.INDEX_APPEND_ONLY      # 3 - O(1) insert, O(N) scan (time-series)

# Set by constant
db.set_table_index_policy("logs", Database.INDEX_APPEND_ONLY)

# Set by string
db.set_table_index_policy("users", "scan_optimized")

# Get current policy
policy = db.get_table_index_policy("users")
print(f"Policy: {policy}")  # "scan_optimized"
```

### Policy Selection Guide

| Policy | Insert | Scan | Best For |
|--------|--------|------|----------|
| `write_optimized` | O(1) | O(N) | High-write ingestion |
| `balanced` | O(1) amortized | O(log K) | General use (default) |
| `scan_optimized` | O(log N) | O(log N + K) | Analytics, read-heavy |
| `append_only` | O(1) | O(N) | Time-series, logs |
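
Since a policy can be given either as a numeric constant or as a string, it can help to see the two forms side by side. The sketch below mirrors the four values from the table in a local `IntEnum`; `IndexPolicy`, `parse_policy`, and `policy_name` are hypothetical names for illustration and are not part of the SDK.

```python
from enum import IntEnum


class IndexPolicy(IntEnum):
    """Numeric values mirror the constants listed in this section."""
    WRITE_OPTIMIZED = 0
    BALANCED = 1
    SCAN_OPTIMIZED = 2
    APPEND_ONLY = 3


def parse_policy(value):
    """Accept either the numeric constant or its lowercase string name."""
    if isinstance(value, str):
        return IndexPolicy[value.upper()]
    return IndexPolicy(value)


def policy_name(policy):
    """Return the lowercase string form used in the selection guide."""
    return IndexPolicy(policy).name.lower()


print(int(parse_policy("scan_optimized")), policy_name(3))  # 2 append_only
```

An unknown string raises `KeyError` and an out-of-range number raises `ValueError`, so typos in a policy name fail loudly rather than silently falling back to a default.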
|
|
759
|
+
|
|
760
|
+
---
|
|
761
|
+
|
|
762
|
+
## 10. Namespaces & Multi-Tenancy
|
|
763
|
+
|
|
764
|
+
Organize data into logical namespaces for tenant isolation.
|
|
765
|
+
|
|
766
|
+
### Creating Namespaces
|
|
767
|
+
|
|
768
|
+
```typescript
|
|
769
|
+
from sochdb import NamespaceConfig
|
|
770
|
+
|
|
771
|
+
# Create namespace with metadata
|
|
772
|
+
ns = db.create_namespace(
|
|
773
|
+
name="tenant_123",
|
|
774
|
+
display_name="Acme Corp",
|
|
775
|
+
labels={"tier": "premium", "region": "us-east"}
|
|
776
|
+
)
|
|
777
|
+
|
|
778
|
+
# Simple creation
|
|
779
|
+
ns = db.create_namespace("tenant_456")
|
|
780
|
+
```
|
|
781
|
+
|
|
782
|
+
### Getting Namespaces
|
|
783
|
+
|
|
784
|
+
```typescript
|
|
785
|
+
# Get existing namespace
|
|
786
|
+
ns = db.namespace("tenant_123")
|
|
787
|
+
|
|
788
|
+
# Get or create (idempotent)
|
|
789
|
+
ns = db.get_or_create_namespace("tenant_123")
|
|
790
|
+
|
|
791
|
+
# Check if exists
|
|
792
|
+
exists = db.namespace_exists("tenant_123")
|
|
793
|
+
```

### Context Manager for Scoped Operations

```python
with db.use_namespace("tenant_123") as ns:
    # All operations automatically scoped to tenant_123
    collection = ns.collection("documents")
    ns.put("config/key", b"value")

    # No need to specify namespace in each call
```

### Namespace Operations

```python
# List all namespaces
namespaces = db.list_namespaces()
print(namespaces)  # ['tenant_123', 'tenant_456']

# Get namespace info
info = db.namespace_info("tenant_123")
print(f"Created: {info['created_at']}")
print(f"Labels: {info['labels']}")
print(f"Size: {info['size_bytes']}")

# Update labels
db.update_namespace("tenant_123", labels={"tier": "enterprise"})

# Delete namespace (WARNING: deletes all data in namespace)
db.delete_namespace("old_tenant", force=True)
```

### Namespace-Scoped Key-Value

```python
ns = db.namespace("tenant_123")

# Operations automatically prefixed with namespace
ns.put("users/alice", b"data")  # Actually: tenant_123/users/alice
ns.get("users/alice")
ns.delete("users/alice")

# Scan within namespace
for key, value in ns.scan("users/"):
    print(key, value)  # Keys shown without namespace prefix
```
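
The prefixing behaviour noted in the comments above can be pictured with a plain dictionary. This is a toy model rather than SochDB's implementation: `NamespaceView` and the backing `store` dict are hypothetical names used only for illustration.

```python
class NamespaceView:
    """Toy model of a namespace-scoped view over one shared key-value store."""

    def __init__(self, store, namespace):
        self._store = store
        self._prefix = namespace + "/"

    def put(self, key, value):
        # The stored key carries the namespace prefix.
        self._store[self._prefix + key] = value

    def get(self, key):
        return self._store.get(self._prefix + key)

    def delete(self, key):
        self._store.pop(self._prefix + key, None)

    def scan(self, prefix=""):
        # Yield matching entries with the namespace prefix stripped again.
        full = self._prefix + prefix
        for key in sorted(self._store):
            if key.startswith(full):
                yield key[len(self._prefix):], self._store[key]


store = {}
tenant_a = NamespaceView(store, "tenant_123")
tenant_b = NamespaceView(store, "tenant_456")
tenant_a.put("users/alice", b"data")
tenant_b.put("users/bob", b"other")
visible = list(tenant_a.scan("users/"))
print(visible)
```

Both tenants share one physical key space, yet each view only sees keys under its own prefix, which is the isolation property the section describes.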

### Cross-Namespace Operations

```python
# Copy data between namespaces
db.copy_between_namespaces(
    source_ns="tenant_123",
    target_ns="tenant_456",
    prefix="shared/"
)
```

---

## 11. Collections & Vector Search

Collections store documents with embeddings for semantic search using HNSW.

### Collection Configuration

```python
from sochdb import (
    CollectionConfig,
    DistanceMetric,
    QuantizationType,
)

config = CollectionConfig(
    name="documents",
    dimension=384,                       # Embedding dimension (must match your model)
    metric=DistanceMetric.COSINE,        # COSINE, EUCLIDEAN, DOT_PRODUCT
    m=16,                                # HNSW M parameter (connections per node)
    ef_construction=100,                 # HNSW construction quality
    ef_search=50,                        # HNSW search quality (higher = slower but better)
    quantization=QuantizationType.NONE,  # NONE, SCALAR (int8), PQ (product quantization)
    enable_hybrid_search=False,          # Enable BM25 + vector
    content_field=None,                  # Field for BM25 indexing
)
```

### Creating Collections

```python
ns = db.namespace("default")

# With config object
collection = ns.create_collection(config)

# With parameters (simpler)
collection = ns.create_collection(
    name="documents",
    dimension=384,
    metric=DistanceMetric.COSINE
)

# Get existing collection
collection = ns.collection("documents")
```

### Inserting Documents

```python
# Single insert
collection.insert(
    id="doc1",
    vector=[0.1, 0.2, ...],  # 384-dim float array
    metadata={"title": "Introduction", "author": "Alice", "category": "tech"}
)

# Batch insert (more efficient for bulk loading)
collection.insert_batch(
    ids=["doc1", "doc2", "doc3"],
    vectors=[[...], [...], [...]],  # List of vectors
    metadata=[
        {"title": "Doc 1"},
        {"title": "Doc 2"},
        {"title": "Doc 3"}
    ]
)

# Multi-vector insert (multiple vectors per document, e.g., chunks)
collection.insert_multi(
    id="long_doc",
    vectors=[[...], [...], [...]],  # Multiple vectors for same doc
    metadata={"title": "Long Document"}
)
```

### Vector Search

```python
from sochdb import SearchRequest

# Using SearchRequest (full control)
request = SearchRequest(
    vector=[0.15, 0.25, ...],    # Query vector
    k=10,                        # Number of results
    ef_search=100,               # Search quality (overrides collection default)
    filter={"author": "Alice"},  # Metadata filter
    min_score=0.7,               # Minimum similarity score
    include_vectors=False,       # Include vectors in results
    include_metadata=True,       # Include metadata in results
)
results = collection.search(request)

# Convenience method (simpler)
results = collection.vector_search(
    vector=[0.15, 0.25, ...],
    k=10,
    filter={"author": "Alice"}
)

# Process results
for result in results:
    print(f"ID: {result.id}")
    print(f"Score: {result.score:.4f}")  # Similarity score
    print(f"Metadata: {result.metadata}")
```

### Metadata Filtering

```python
# Equality
filter={"author": "Alice"}

# Comparison operators
filter={"age": {"$gt": 30}}          # Greater than
filter={"age": {"$gte": 30}}         # Greater than or equal
filter={"age": {"$lt": 30}}          # Less than
filter={"age": {"$lte": 30}}         # Less than or equal
filter={"author": {"$ne": "Alice"}}  # Not equal

# Array operators
filter={"category": {"$in": ["tech", "science"]}}  # In array
filter={"category": {"$nin": ["sports"]}}          # Not in array

# Logical operators
filter={"$and": [{"author": "Alice"}, {"year": 2024}]}
filter={"$or": [{"category": "tech"}, {"category": "science"}]}
filter={"$not": {"author": "Bob"}}

# Nested filters
filter={
    "$and": [
        {"$or": [{"category": "tech"}, {"category": "science"}]},
        {"year": {"$gte": 2020}}
    ]
}
```
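
To make the operator semantics concrete, here is a small pure-Python evaluator for filters of this shape. It is a sketch under the assumption that the operators behave like their MongoDB-style namesakes; `matches` and `_compare` are hypothetical helpers, and how SochDB treats missing fields is not stated in this document.

```python
def _compare(op, value, operand):
    """Evaluate one operator against a metadata value."""
    if op == "$ne":
        return value != operand
    if op == "$in":
        return value in operand
    if op == "$nin":
        return value not in operand
    if value is None:
        return False  # assumption: a missing field fails ordering comparisons
    if op == "$gt":
        return value > operand
    if op == "$gte":
        return value >= operand
    if op == "$lt":
        return value < operand
    if op == "$lte":
        return value <= operand
    raise ValueError(f"unsupported operator: {op}")


def matches(filter_spec, metadata):
    """Return True when metadata satisfies every clause of filter_spec."""
    for key, cond in filter_spec.items():
        if key == "$and":
            ok = all(matches(sub, metadata) for sub in cond)
        elif key == "$or":
            ok = any(matches(sub, metadata) for sub in cond)
        elif key == "$not":
            ok = not matches(cond, metadata)
        elif isinstance(cond, dict):
            ok = all(_compare(op, metadata.get(key), arg) for op, arg in cond.items())
        else:
            ok = metadata.get(key) == cond
        if not ok:
            return False
    return True


nested = {
    "$and": [
        {"$or": [{"category": "tech"}, {"category": "science"}]},
        {"year": {"$gte": 2020}},
    ]
}
docs = [
    {"category": "tech", "year": 2021},
    {"category": "science", "year": 2019},
    {"category": "sports", "year": 2022},
]
selected = [d for d in docs if matches(nested, d)]
print(selected)
```

Only the first document passes: the second fails the year clause and the third fails the category clause.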

### Collection Management

```python
# Get collection
collection = ns.get_collection("documents")
# or
collection = ns.collection("documents")

# List collections
collections = ns.list_collections()

# Collection info
info = collection.info()
print(f"Name: {info['name']}")
print(f"Dimension: {info['dimension']}")
print(f"Count: {info['count']}")
print(f"Metric: {info['metric']}")
print(f"Index size: {info['index_size_bytes']}")

# Delete collection
ns.delete_collection("old_collection")

# Individual document operations
doc = collection.get("doc1")
collection.delete("doc1")
collection.update("doc1", metadata={"category": "updated"})
count = collection.count()
```

### Quantization for Memory Efficiency

```python
# Scalar quantization (int8) - 4x memory reduction
config = CollectionConfig(
    name="documents",
    dimension=384,
    quantization=QuantizationType.SCALAR
)

# Product quantization - 32x memory reduction
config = CollectionConfig(
    name="documents",
    dimension=768,
    quantization=QuantizationType.PQ,
    pq_num_subvectors=96,   # 768/96 = 8 dimensions per subvector
    pq_num_centroids=256    # 8-bit codes
)
```
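
The 4x and 32x figures follow from simple arithmetic if the unquantized baseline is 32-bit floats, which is an assumption here. The helper below (`bytes_per_vector`, a hypothetical name) counts only the per-vector payload; PQ codebooks and the HNSW graph add overhead that it ignores.

```python
def bytes_per_vector(dimension, quantization="none",
                     pq_num_subvectors=0, pq_num_centroids=256):
    """Approximate storage for one vector, excluding index overhead."""
    if quantization == "none":
        return dimension * 4                      # float32: 4 bytes per dimension
    if quantization == "scalar":
        return dimension                          # int8: 1 byte per dimension
    if quantization == "pq":
        bits_per_code = (pq_num_centroids - 1).bit_length()
        return (pq_num_subvectors * bits_per_code + 7) // 8
    raise ValueError(f"unknown quantization: {quantization}")


scalar_ratio = bytes_per_vector(384) / bytes_per_vector(384, "scalar")
pq_ratio = bytes_per_vector(768) / bytes_per_vector(
    768, "pq", pq_num_subvectors=96, pq_num_centroids=256
)
print(scalar_ratio, pq_ratio)
```

With 96 subvectors and 256 centroids, each 768-dimensional vector shrinks from 3072 bytes to 96 one-byte codes.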

---

## 12. Hybrid Search (Vector + BM25)

Combine vector similarity with BM25 keyword matching so that results reflect both semantic closeness and exact term overlap.

### Enable Hybrid Search

```python
config = CollectionConfig(
    name="articles",
    dimension=384,
    enable_hybrid_search=True,  # Enable BM25 indexing
    content_field="text"        # Field to index for BM25
)
collection = ns.create_collection(config)

# Insert with text content
collection.insert(
    id="article1",
    vector=[...],
    metadata={
        "title": "Machine Learning Tutorial",
        "text": "This tutorial covers the basics of machine learning...",
        "category": "tech"
    }
)
```

### Keyword Search (BM25 Only)

```python
results = collection.keyword_search(
    query="machine learning tutorial",
    k=10,
    filter={"category": "tech"}
)
```

### Hybrid Search (Vector + BM25)

```python
# Combine vector and keyword search
results = collection.hybrid_search(
    vector=[0.1, 0.2, ...],         # Query embedding
    text_query="machine learning",  # Keyword query
    k=10,
    alpha=0.7,  # 0.0 = pure keyword, 1.0 = pure vector, 0.5 = balanced
    filter={"category": "tech"}
)
```

### Full SearchRequest for Hybrid

```python
request = SearchRequest(
    vector=[0.1, 0.2, ...],
    text_query="machine learning",
    k=10,
    alpha=0.7,                     # Blend factor
    rrf_k=60.0,                    # RRF k parameter (Reciprocal Rank Fusion)
    filter={"category": "tech"},
    aggregate="max",               # max | mean | first (for multi-vector docs)
    as_of="2024-01-01T00:00:00Z",  # Time-travel query
    include_vectors=False,
    include_metadata=True,
    include_scores=True,
)
results = collection.search(request)

# Access detailed results
print(f"Query time: {results.query_time_ms}ms")
print(f"Total matches: {results.total_count}")
print(f"Vector results: {results.vector_results}")    # Results from vector search
print(f"Keyword results: {results.keyword_results}")  # Results from BM25
print(f"Fused results: {results.fused_results}")      # Combined results
```
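
Reciprocal Rank Fusion scores each document by summing `1 / (rrf_k + rank)` over the ranked lists it appears in. How SochDB combines `alpha` with `rrf_k` is not spelled out here, so the sketch below assumes one plausible reading: `alpha` weights the vector list and `1 - alpha` weights the keyword list. `rrf_fuse` is a hypothetical helper, not part of the SDK.

```python
def rrf_fuse(vector_ids, keyword_ids, alpha=0.5, rrf_k=60.0):
    """Fuse two ranked ID lists with alpha-weighted Reciprocal Rank Fusion."""
    scores = {}
    for rank, doc_id in enumerate(vector_ids, start=1):
        scores[doc_id] = scores.get(doc_id, 0.0) + alpha / (rrf_k + rank)
    for rank, doc_id in enumerate(keyword_ids, start=1):
        scores[doc_id] = scores.get(doc_id, 0.0) + (1.0 - alpha) / (rrf_k + rank)
    # Highest fused score first; sorted() is stable, so ties keep insertion order.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


fused = rrf_fuse(["a", "b", "c"], ["b", "d"], alpha=0.7)
for doc_id, score in fused:
    print(f"{doc_id}: {score:.5f}")
```

Document `b` rises to the top because it appears in both lists, even though it is only second in the vector ranking. Because RRF works on ranks rather than raw scores, BM25 and cosine scores never need to be normalized against each other.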

---

## 13. Graph Operations

Build and query knowledge graphs.

### Adding Nodes

```python
# Add a node
db.add_node(
    namespace="default",
    node_id="alice",
    node_type="person",
    properties={"role": "engineer", "team": "ml", "level": "senior"}
)

db.add_node("default", "project_x", "project", {"status": "active", "priority": "high"})
db.add_node("default", "bob", "person", {"role": "manager", "team": "ml"})
```

### Adding Edges

```python
# Add directed edge
db.add_edge(
    namespace="default",
    from_id="alice",
    edge_type="works_on",
    to_id="project_x",
    properties={"role": "lead", "since": "2024-01"}
)

db.add_edge("default", "alice", "reports_to", "bob")
db.add_edge("default", "bob", "manages", "project_x")
```

### Graph Traversal

```python
# BFS traversal from a starting node
nodes, edges = db.traverse(
    namespace="default",
    start_node="alice",
    max_depth=3,
    order="bfs"  # "bfs" or "dfs"
)

for node in nodes:
    print(f"Node: {node['id']} ({node['node_type']})")
    print(f"  Properties: {node['properties']}")

for edge in edges:
    print(f"{edge['from_id']} --{edge['edge_type']}--> {edge['to_id']}")
```
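
For readers who want to see what a depth-limited BFS does, the following standalone sketch walks the same three example edges with an in-memory adjacency list. `bfs_traverse` is a hypothetical helper written for illustration; it makes no claim about how `db.traverse` is implemented.

```python
from collections import deque


def bfs_traverse(edges, start_node, max_depth):
    """Breadth-first walk over (from_id, edge_type, to_id) triples."""
    adjacency = {}
    for from_id, edge_type, to_id in edges:
        adjacency.setdefault(from_id, []).append((edge_type, to_id))

    visited = [start_node]           # nodes in discovery order
    seen = {start_node}
    followed = []                    # edges expanded within the depth limit
    queue = deque([(start_node, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue                 # do not expand beyond the depth limit
        for edge_type, to_id in adjacency.get(node, []):
            followed.append((node, edge_type, to_id))
            if to_id not in seen:
                seen.add(to_id)
                visited.append(to_id)
                queue.append((to_id, depth + 1))
    return visited, followed


edges = [
    ("alice", "works_on", "project_x"),
    ("alice", "reports_to", "bob"),
    ("bob", "manages", "project_x"),
]
nodes, followed = bfs_traverse(edges, "alice", max_depth=1)
print(nodes, followed)
```

With `max_depth=1` only Alice's own edges are followed; raising the limit to 2 also picks up Bob's `manages` edge.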

### Filtered Traversal

```python
# Traverse with filters
nodes, edges = db.traverse(
    namespace="default",
    start_node="alice",
    max_depth=2,
    edge_types=["works_on", "reports_to"],  # Only follow these edge types
    node_types=["person", "project"],       # Only include these node types
    node_filter={"team": "ml"}              # Filter nodes by properties
)
```

### Graph Queries

```python
# Find shortest path
path = db.find_path(
    namespace="default",
    from_id="alice",
    to_id="project_y",
    max_depth=5
)

# Get neighbors
neighbors = db.get_neighbors(
    namespace="default",
    node_id="alice",
    direction="outgoing"  # "outgoing", "incoming", "both"
)

# Get specific edge
edge = db.get_edge("default", "alice", "works_on", "project_x")

# Delete node (and all connected edges)
db.delete_node("default", "old_node")

# Delete edge
db.delete_edge("default", "alice", "works_on", "project_old")
```

---

## 14. Temporal Graph (Time-Travel)

Track state changes over time with temporal edges.

### Adding Temporal Edges

```python
import time

now = int(time.time() * 1000)  # milliseconds since epoch
one_hour = 60 * 60 * 1000

# Record: Door was open from 10:00 to 11:00
db.add_temporal_edge(
    namespace="smart_home",
    from_id="door_front",
    edge_type="STATE",
    to_id="open",
    valid_from=now - one_hour,  # Start time (ms)
    valid_until=now,            # End time (ms)
    properties={"sensor": "motion_1", "confidence": 0.95}
)

# Record: Light is currently on (no end time yet)
db.add_temporal_edge(
    namespace="smart_home",
    from_id="light_living",
    edge_type="STATE",
    to_id="on",
    valid_from=now,
    valid_until=0,  # 0 = still valid (no end time)
    properties={"brightness": "80%", "color": "warm"}
)
```

### Time-Travel Queries

```python
# Query modes:
# - "CURRENT": Edges valid right now
# - "POINT_IN_TIME": Edges valid at specific timestamp
# - "RANGE": All edges within a time range

# What is the current state?
edges = db.query_temporal_graph(
    namespace="smart_home",
    node_id="door_front",
    mode="CURRENT",
    edge_type="STATE"
)
current_state = edges[0]["to_id"] if edges else "unknown"

# Was the door open 1.5 hours ago?
edges = db.query_temporal_graph(
    namespace="smart_home",
    node_id="door_front",
    mode="POINT_IN_TIME",
    timestamp=now - int(1.5 * 60 * 60 * 1000)
)
was_open = any(e["to_id"] == "open" for e in edges)

# All state changes in last hour
edges = db.query_temporal_graph(
    namespace="smart_home",
    node_id="door_front",
    mode="RANGE",
    start_time=now - one_hour,
    end_time=now
)
for edge in edges:
    print(f"State: {edge['to_id']} from {edge['valid_from']} to {edge['valid_until']}")
```
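
The three query modes reduce to two interval predicates. The sketch below treats each edge as the half-open interval `[valid_from, valid_until)` with `valid_until == 0` meaning "open-ended"; whether SochDB includes or excludes the end boundary is an assumption, and `is_valid_at` and `overlaps` are hypothetical helpers.

```python
ONE_HOUR = 60 * 60 * 1000
NOW = 1_700_000_000_000  # fixed example timestamp in milliseconds


def is_valid_at(edge, timestamp):
    """POINT_IN_TIME check; CURRENT is the same check with timestamp = now."""
    ends = edge["valid_until"]
    return edge["valid_from"] <= timestamp and (ends == 0 or timestamp < ends)


def overlaps(edge, start_time, end_time):
    """RANGE check: does the edge's validity interval intersect the window?"""
    ends = edge["valid_until"]
    return edge["valid_from"] < end_time and (ends == 0 or ends > start_time)


door_open = {"to_id": "open", "valid_from": NOW - ONE_HOUR, "valid_until": NOW}
light_on = {"to_id": "on", "valid_from": NOW, "valid_until": 0}

print(is_valid_at(door_open, NOW - ONE_HOUR // 2))      # inside the interval
print(is_valid_at(door_open, NOW - int(1.5 * ONE_HOUR)))  # before it started
print(is_valid_at(light_on, NOW + 1))                   # open-ended edge
print(overlaps(door_open, NOW - ONE_HOUR, NOW))
```

Under these assumptions the door was not yet open 1.5 hours ago, since its edge only starts one hour before `NOW`.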

### End a Temporal Edge

```python
import time

# Close the current "on" state
db.end_temporal_edge(
    namespace="smart_home",
    from_id="light_living",
    edge_type="STATE",
    to_id="on",
    end_time=int(time.time() * 1000)
)
```

---

## 15. Semantic Cache

Cache LLM responses with similarity-based retrieval for cost savings.

### Storing Cached Responses

```python
# Store response with embedding
db.cache_put(
    cache_name="llm_responses",
    key="What is Python?",      # Original query (for display/debugging)
    value="Python is a high-level programming language...",
    embedding=[0.1, 0.2, ...],  # Query embedding (384-dim)
    ttl_seconds=3600,           # Expire in 1 hour (0 = no expiry)
    metadata={"model": "claude-3", "tokens": 150}
)
```

### Cache Lookup

```python
# Check cache before calling LLM
cached = db.cache_get(
    cache_name="llm_responses",
    query_embedding=[0.12, 0.18, ...],  # Embed the new query
    threshold=0.85                      # Cosine similarity threshold
)

if cached:
    print("Cache HIT!")
    print(f"Original query: {cached['key']}")
    print(f"Response: {cached['value']}")
    print(f"Similarity: {cached['score']:.4f}")
else:
    print("Cache MISS - calling LLM...")
    # Call LLM and cache the result
```
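
A semantic cache lookup boils down to "find the unexpired entry most similar to the query, and accept it only if the similarity clears the threshold". The in-memory model below illustrates that logic with a linear scan; `TinySemanticCache` is a hypothetical class, and the explicit `now` argument exists only to keep the example deterministic.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class TinySemanticCache:
    def __init__(self):
        self._entries = []

    def put(self, key, value, embedding, ttl_seconds=0, now=0.0):
        # ttl_seconds == 0 means the entry never expires.
        expires_at = now + ttl_seconds if ttl_seconds > 0 else None
        self._entries.append(
            {"key": key, "value": value, "embedding": embedding, "expires_at": expires_at}
        )

    def get(self, query_embedding, threshold, now=0.0):
        best = None
        for entry in self._entries:
            if entry["expires_at"] is not None and now >= entry["expires_at"]:
                continue  # skip expired entries
            score = cosine(entry["embedding"], query_embedding)
            if score >= threshold and (best is None or score > best["score"]):
                best = {"key": entry["key"], "value": entry["value"], "score": score}
        return best


cache = TinySemanticCache()
cache.put("What is Python?", "Python is a language...", [1.0, 0.0], ttl_seconds=3600)
hit = cache.get([0.9, 0.1], threshold=0.85, now=10)
miss = cache.get([0.0, 1.0], threshold=0.85, now=10)
expired = cache.get([0.9, 0.1], threshold=0.85, now=3600)
print(hit, miss, expired)
```

A higher threshold trades hit rate for safety: near-duplicate questions still hit, while loosely related ones fall through to the LLM.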

### Cache Management

```python
# Delete specific entry
db.cache_delete("llm_responses", key="What is Python?")

# Clear entire cache
db.cache_clear("llm_responses")

# Get cache statistics
stats = db.cache_stats("llm_responses")
print(f"Total entries: {stats['count']}")
print(f"Hit rate: {stats['hit_rate']:.2%}")
print(f"Memory usage: {stats['size_bytes']}")
```

### Full Usage Pattern

```python
def get_llm_response(query: str, embed_fn, llm_fn):
    """Get response from cache or LLM."""
    query_embedding = embed_fn(query)

    # Try cache first
    cached = db.cache_get(
        cache_name="llm_responses",
        query_embedding=query_embedding,
        threshold=0.90
    )

    if cached:
        return cached['value']

    # Cache miss - call LLM
    response = llm_fn(query)

    # Store in cache
    db.cache_put(
        cache_name="llm_responses",
        key=query,
        value=response,
        embedding=query_embedding,
        ttl_seconds=86400  # 24 hours
    )

    return response
```

---

## 16. Context Query Builder (LLM Optimization)

Assemble LLM context with token budgeting and priority-based truncation.

### Basic Context Query

```python
from sochdb import ContextQueryBuilder, ContextFormat, TruncationStrategy

# Build context for LLM
context = ContextQueryBuilder() \
    .for_session("session_123") \
    .with_budget(4096) \
    .format(ContextFormat.TOON) \
    .literal("SYSTEM", priority=0, text="You are a helpful assistant.") \
    .section("USER_PROFILE", priority=1) \
        .get("user.profile.{name, preferences}") \
        .done() \
    .section("HISTORY", priority=2) \
        .last(10, "messages") \
        .where_eq("session_id", "session_123") \
        .done() \
    .section("KNOWLEDGE", priority=3) \
        .search("documents", "$query_embedding", k=5) \
        .done() \
    .execute()

print(f"Token count: {context.token_count}")
print(f"Context:\n{context.text}")
```

### Section Types

| Type | Method | Description |
|------|--------|-------------|
| `literal` | `.literal(name, priority, text)` | Static text content |
| `get` | `.get(path)` | Fetch specific data by path |
| `last` | `.last(n, table)` | Most recent N records from table |
| `search` | `.search(collection, embedding, k)` | Vector similarity search |
| `sql` | `.sql(query)` | SQL query results |

### Truncation Strategies

```python
# Drop from end (keep beginning) - default
.truncation(TruncationStrategy.TAIL_DROP)

# Drop from beginning (keep end)
.truncation(TruncationStrategy.HEAD_DROP)

# Proportionally truncate across sections
.truncation(TruncationStrategy.PROPORTIONAL)

# Fail if budget exceeded
.truncation(TruncationStrategy.STRICT)
```
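
The difference between the four strategies is easiest to see on a tiny example. The sketch below is an illustration only: it counts whitespace-separated words as tokens, takes sections already sorted by priority, and uses a hypothetical `truncate` function with lowercase strategy names. The real builder may tokenize and prioritize differently.

```python
def truncate(sections, budget, strategy="tail_drop"):
    """sections: list of (name, text) in priority order. Returns name -> kept text."""
    tokens = {name: text.split() for name, text in sections}
    total = sum(len(toks) for toks in tokens.values())
    if total <= budget:
        return {name: " ".join(toks) for name, toks in tokens.items()}
    if strategy == "strict":
        raise ValueError(f"context needs {total} tokens, budget is {budget}")
    if strategy == "proportional":
        # Every section keeps the same fraction of its tokens.
        kept = {name: toks[: len(toks) * budget // total] for name, toks in tokens.items()}
    else:
        flat = [(name, tok) for name, toks in tokens.items() for tok in toks]
        if strategy == "tail_drop":
            flat = flat[:budget]                  # keep the beginning
        else:
            flat = flat[len(flat) - budget:]      # head_drop: keep the end
        kept = {name: [] for name in tokens}
        for name, tok in flat:
            kept[name].append(tok)
    return {name: " ".join(toks) for name, toks in kept.items()}


sections = [("SYSTEM", "a b"), ("HISTORY", "c d e f"), ("KNOWLEDGE", "g h i j")]
for strategy in ("tail_drop", "head_drop", "proportional"):
    print(strategy, truncate(sections, 5, strategy))
```

Tail-drop protects the highest-priority sections at the front, head-drop favors the most recent material at the end, and proportional truncation keeps a little of everything.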

### Variables and Bindings

```python
from sochdb import ContextValue

context = ContextQueryBuilder() \
    .for_session("session_123") \
    .set_var("query_embedding", ContextValue.Embedding([0.1, 0.2, ...])) \
    .set_var("user_id", ContextValue.String("user_456")) \
    .section("KNOWLEDGE", priority=2) \
        .search("documents", "$query_embedding", k=5) \
        .done() \
    .execute()
```

### Output Formats

```python
# TOON format (40-60% fewer tokens)
.format(ContextFormat.TOON)

# JSON format
.format(ContextFormat.JSON)

# Markdown format (human-readable)
.format(ContextFormat.MARKDOWN)

# Plain text
.format(ContextFormat.TEXT)
```


## Session Management (Agent Context)

Stateful session management for agentic use cases with permissions, sandboxing, audit logging, and budget tracking.

### Session Overview

```
Agent session abc123:
  cwd: /agents/abc123
  vars: $model = "gpt-4", $budget = 1000
  permissions: fs:rw, db:rw, calc:*
  audit: [read /data/users, write /agents/abc123/cache]
```

### Creating Sessions

```python
from sochdb import SessionManager, AgentContext
from datetime import timedelta

# Create session manager with idle timeout
session_mgr = SessionManager(idle_timeout=timedelta(hours=1))

# Create a new session
session = session_mgr.create_session("session_abc123")

# Get existing session
session = session_mgr.get_session("session_abc123")

# Get or create (idempotent)
session = session_mgr.get_or_create("session_abc123")

# Remove session
session_mgr.remove_session("session_abc123")

# Cleanup expired sessions
removed_count = session_mgr.cleanup_expired()

# Get active session count
count = session_mgr.session_count()
```

### Agent Context

```python
from sochdb import AgentContext, ContextValue

# Create agent context
ctx = AgentContext("session_abc123")
print(f"Session ID: {ctx.session_id}")
print(f"Working dir: {ctx.working_dir}")  # /agents/session_abc123

# Create with custom working directory
ctx = AgentContext.with_working_dir("session_abc123", "/custom/path")

# Create with full permissions (trusted agents)
ctx = AgentContext.with_full_permissions("session_abc123")
```

### Session Variables

```python
# Set variables
ctx.set_var("model", ContextValue.String("gpt-4"))
ctx.set_var("budget", ContextValue.Number(1000.0))
ctx.set_var("debug", ContextValue.Bool(True))
ctx.set_var("tags", ContextValue.List([
    ContextValue.String("ml"),
    ContextValue.String("production")
]))

# Get variables
model = ctx.get_var("model")  # Returns ContextValue or None
budget = ctx.get_var("budget")

# Peek (read-only, no audit)
value = ctx.peek_var("model")

# Variable substitution in strings
text = ctx.substitute_vars("Using $model with budget $budget")
# Result: "Using gpt-4 with budget 1000"
```
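
The `$name` substitution shown above can be approximated with a single regular expression. This is a sketch, not the SDK's algorithm: the function below is a standalone stand-in that takes a plain dict, and both the choice to leave unknown variables untouched and the rendering of whole floats without a decimal point are assumptions made to reproduce the example output.

```python
import re


def substitute_vars(template, variables):
    """Replace $name placeholders with values from a dict of variables."""

    def replace(match):
        name = match.group(1)
        if name not in variables:
            return match.group(0)   # assumption: unknown variables stay as-is
        value = variables[name]
        if isinstance(value, float) and value.is_integer():
            return str(int(value))  # render 1000.0 as "1000"
        return str(value)

    return re.sub(r"\$([A-Za-z_][A-Za-z0-9_]*)", replace, template)


text = substitute_vars(
    "Using $model with budget $budget",
    {"model": "gpt-4", "budget": 1000.0},
)
print(text)
```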

### Context Value Types

```python
from sochdb import ContextValue

# String
ContextValue.String("hello")

# Number (float)
ContextValue.Number(42.5)

# Boolean
ContextValue.Bool(True)

# List
ContextValue.List([
    ContextValue.String("a"),
    ContextValue.Number(1.0)
])

# Object (dict)
ContextValue.Object({
    "key": ContextValue.String("value"),
    "count": ContextValue.Number(10.0)
})

# Null
ContextValue.Null()
```

### Permissions

```python
from sochdb import (
    AgentPermissions,
    AuditOperation,
    FsPermissions,
    DbPermissions,
    NetworkPermissions
)

# Configure permissions
ctx.permissions = AgentPermissions(
    filesystem=FsPermissions(
        read=True,
        write=True,
        mkdir=True,
        delete=False,
        allowed_paths=["/agents/session_abc123", "/shared/data"]
    ),
    database=DbPermissions(
        read=True,
        write=True,
        create=False,
        drop=False,
        allowed_tables=["user_*", "cache_*"]  # Pattern matching
    ),
    calculator=True,
    network=NetworkPermissions(
        http=True,
        allowed_domains=["api.example.com", "*.internal.net"]
    )
)

# Check permissions before operations
try:
    ctx.check_fs_permission("/agents/session_abc123/data.json", AuditOperation.FS_READ)
    # Permission granted
except ContextError as e:
    print(f"Permission denied: {e}")

try:
    ctx.check_db_permission("user_profiles", AuditOperation.DB_QUERY)
    # Permission granted
except ContextError as e:
    print(f"Permission denied: {e}")
```
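
The allow-lists above mix literal entries with wildcard patterns such as `user_*` and `*.internal.net`. The sketch below assumes these are shell-style globs and that path permissions are prefix checks on a normalized path; neither is confirmed by this document, and `table_allowed`, `domain_allowed`, and `path_allowed` are hypothetical helpers.

```python
import posixpath
from fnmatch import fnmatchcase


def table_allowed(table, allowed_tables):
    """True when the table name matches any glob pattern in the allow-list."""
    return any(fnmatchcase(table, pattern) for pattern in allowed_tables)


def domain_allowed(domain, allowed_domains):
    return any(fnmatchcase(domain, pattern) for pattern in allowed_domains)


def path_allowed(path, allowed_paths):
    """True when the normalized path is an allowed root or lies beneath one."""
    normalized = posixpath.normpath(path)
    for root in allowed_paths:
        root = root.rstrip("/")
        if normalized == root or normalized.startswith(root + "/"):
            return True
    return False


tables = ["user_*", "cache_*"]
paths = ["/agents/session_abc123", "/shared/data"]
domains = ["api.example.com", "*.internal.net"]

print(table_allowed("user_profiles", tables))
print(path_allowed("/agents/session_abc123/data.json", paths))
print(path_allowed("/agents/session_abc123/../other/secret", paths))
print(domain_allowed("svc.internal.net", domains))
```

Normalizing before the prefix check matters: without it, a path containing `..` could appear to sit under an allowed root while actually pointing elsewhere.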

### Budget Tracking

```python
from sochdb import OperationBudget

# Configure budget limits
ctx.budget = OperationBudget(
    max_tokens=100000,    # Maximum tokens (input + output)
    max_cost=5000,        # Maximum cost in cents ($50.00)
    max_operations=10000  # Maximum operation count
)

# Consume budget (called automatically by operations)
try:
    ctx.consume_budget(tokens=500, cost=10)  # 500 tokens, $0.10
except ContextError as e:
    if "Budget exceeded" in str(e):
        print("Budget limit reached!")

# Check budget status
print(f"Tokens used: {ctx.budget.tokens_used}/{ctx.budget.max_tokens}")
print(f"Cost used: ${ctx.budget.cost_used / 100:.2f}/${ctx.budget.max_cost / 100:.2f}")
print(f"Operations: {ctx.budget.operations_used}/{ctx.budget.max_operations}")
```
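
The accounting behind a budget like this is a check-then-commit counter. The dataclass below is a minimal sketch under two assumptions: costs are integer cents (consistent with the `/ 100` formatting above), and a rejected call leaves the counters unchanged. `Budget` and `BudgetExceeded` are hypothetical names, not SDK types.

```python
from dataclasses import dataclass


class BudgetExceeded(Exception):
    pass


@dataclass
class Budget:
    max_tokens: int
    max_cost: int          # cents
    max_operations: int
    tokens_used: int = 0
    cost_used: int = 0
    operations_used: int = 0

    def consume(self, tokens=0, cost=0):
        """Charge one operation, or raise without changing any counter."""
        if (
            self.tokens_used + tokens > self.max_tokens
            or self.cost_used + cost > self.max_cost
            or self.operations_used + 1 > self.max_operations
        ):
            raise BudgetExceeded("Budget exceeded")
        self.tokens_used += tokens
        self.cost_used += cost
        self.operations_used += 1


budget = Budget(max_tokens=100000, max_cost=5000, max_operations=10000)
budget.consume(tokens=500, cost=10)
print(f"Cost used: ${budget.cost_used / 100:.2f}/${budget.max_cost / 100:.2f}")
```

Checking all three limits before mutating any counter keeps the budget consistent even when a single call would overshoot only one of them.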

### Session Transactions

```python
# Begin transaction within session
ctx.begin_transaction(tx_id=12345)

# Create savepoint
ctx.savepoint("before_update")

# Record pending writes (for rollback)
ctx.record_pending_write(
    resource_type=ResourceType.FILE,
    resource_key="/agents/session_abc123/data.json",
    original_value=b'{"old": "data"}'
)

# Commit transaction
ctx.commit_transaction()

# Or rollback
pending_writes = ctx.rollback_transaction()
for write in pending_writes:
    print(f"Rolling back: {write.resource_key}")
    # Restore original_value
```
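
Recording the original value before each write is the classic undo-log technique. In the API above the caller restores values from the returned pending writes; the toy `UndoLog` below (a hypothetical class over a plain dict) performs the restore itself so the whole cycle, including savepoints, fits in one runnable example.

```python
_MISSING = object()  # marks keys that did not exist before the write


class UndoLog:
    def __init__(self, store):
        self.store = store
        self.pending = []     # (key, original value) in write order
        self.savepoints = {}  # savepoint name -> length of pending at that time

    def write(self, key, value):
        self.pending.append((key, self.store.get(key, _MISSING)))
        self.store[key] = value

    def savepoint(self, name):
        self.savepoints[name] = len(self.pending)

    def rollback(self, to_savepoint=None):
        """Undo writes newest-first, back to a savepoint or to the start."""
        mark = self.savepoints[to_savepoint] if to_savepoint else 0
        while len(self.pending) > mark:
            key, original = self.pending.pop()
            if original is _MISSING:
                self.store.pop(key, None)
            else:
                self.store[key] = original

    def commit(self):
        self.pending.clear()
        self.savepoints.clear()


store = {"data.json": b'{"old": "data"}'}
log = UndoLog(store)
log.write("data.json", b'{"new": "data"}')
log.savepoint("before_update")
log.write("cache.bin", b"tmp")
log.rollback("before_update")
after_partial = dict(store)
log.rollback()
print(after_partial, store)
```

Undoing in reverse order matters when the same key is written more than once: the oldest recorded original is the last one restored.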

### Path Resolution

```python
# Paths are resolved relative to working directory
ctx = AgentContext.with_working_dir("session_abc123", "/home/agent")

# Relative paths
resolved = ctx.resolve_path("data.json")  # /home/agent/data.json

# Absolute paths pass through
resolved = ctx.resolve_path("/absolute/path")  # /absolute/path
```
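
The two cases above match the behaviour of POSIX path joining, where an absolute second argument replaces the base. The standalone function below reproduces them with the standard library; the extra `normpath` call that collapses `..` segments is an assumption, since this document does not say whether `resolve_path` normalizes.

```python
import posixpath


def resolve_path(working_dir, path):
    """Join path onto working_dir; absolute paths replace the base entirely."""
    return posixpath.normpath(posixpath.join(working_dir, path))


print(resolve_path("/home/agent", "data.json"))
print(resolve_path("/home/agent", "/absolute/path"))
print(resolve_path("/home/agent", "../shared/notes.txt"))
```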

### Audit Trail

```python
# All operations are automatically logged
# Audit entry includes: timestamp, operation, resource, result, metadata

# Export audit log
audit_log = ctx.export_audit()
for entry in audit_log:
    print(f"[{entry['timestamp']}] {entry['operation']}: {entry['resource']} -> {entry['result']}")

# Example output:
# [1705312345] var.set: model -> success
# [1705312346] fs.read: /data/config.json -> success
# [1705312347] db.query: users -> success
# [1705312348] fs.write: /forbidden/file -> denied:path not in allowed paths
```

### Audit Operations

```python
from sochdb import AuditOperation

# Filesystem operations
AuditOperation.FS_READ
AuditOperation.FS_WRITE
AuditOperation.FS_MKDIR
AuditOperation.FS_DELETE
AuditOperation.FS_LIST

# Database operations
AuditOperation.DB_QUERY
AuditOperation.DB_INSERT
AuditOperation.DB_UPDATE
AuditOperation.DB_DELETE

# Other operations
AuditOperation.CALCULATE
AuditOperation.VAR_SET
AuditOperation.VAR_GET
AuditOperation.TX_BEGIN
AuditOperation.TX_COMMIT
AuditOperation.TX_ROLLBACK
```

### Tool Registry

```python
from sochdb import ToolDefinition, ToolCallRecord
from datetime import datetime

# Register tools available to the agent
ctx.register_tool(ToolDefinition(
    name="search_documents",
    description="Search documents by semantic similarity",
    parameters_schema='{"type": "object", "properties": {"query": {"type": "string"}}}',
    requires_confirmation=False
))

ctx.register_tool(ToolDefinition(
    name="delete_file",
    description="Delete a file from the filesystem",
    parameters_schema='{"type": "object", "properties": {"path": {"type": "string"}}}',
    requires_confirmation=True  # Requires user confirmation
))

# Record tool calls
ctx.record_tool_call(ToolCallRecord(
    call_id="call_001",
    tool_name="search_documents",
    arguments='{"query": "machine learning"}',
    result='[{"id": "doc1", "score": 0.95}]',
    error=None,
    timestamp=datetime.now()
))

# Access tool call history
for call in ctx.tool_calls:
    print(f"{call.tool_name}: {call.result or call.error}")
```

### Session Lifecycle

```python
from datetime import timedelta

# Check session age
age = ctx.age()
print(f"Session age: {age}")

# Check idle time
idle = ctx.idle_time()
print(f"Idle time: {idle}")

# Check if expired
if ctx.is_expired(idle_timeout=timedelta(hours=1)):
    print("Session has expired!")
```
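
Age and idle time differ only in which timestamp they are measured from. The sketch below makes that explicit; `SessionClock` is a hypothetical helper that takes the current time as a parameter so it can be tested, and it is not SochDB's implementation.

```python
from datetime import datetime, timedelta


class SessionClock:
    # Hypothetical helper, not SochDB's implementation.
    def __init__(self, now: datetime) -> None:
        self.created_at = now
        self.last_active = now

    def touch(self, now: datetime) -> None:
        """Record activity, which resets the idle timer but not the age."""
        self.last_active = now

    def age(self, now: datetime) -> timedelta:
        return now - self.created_at

    def idle_time(self, now: datetime) -> timedelta:
        return now - self.last_active

    def is_expired(self, now: datetime, idle_timeout: timedelta) -> bool:
        return self.idle_time(now) > idle_timeout


start = datetime(2024, 1, 15, 12, 0, 0)
clock = SessionClock(start)
clock.touch(start + timedelta(minutes=30))
later = start + timedelta(hours=2)
print(clock.age(later), clock.idle_time(later))
```

Under this model a session that keeps receiving activity never expires, however old it is, because expiry depends only on idle time.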

### Complete Session Example

```python
from sochdb import (
    SessionManager, AgentContext, ContextValue,
    AgentPermissions, FsPermissions, DbPermissions,
    OperationBudget, ToolDefinition, AuditOperation
)
from datetime import timedelta

# Initialize session manager
session_mgr = SessionManager(idle_timeout=timedelta(hours=2))

# Create session for an agent
session_id = "agent_session_12345"
ctx = session_mgr.get_or_create(session_id)

# Configure the agent
ctx.permissions = AgentPermissions(
    filesystem=FsPermissions(
        read=True,
        write=True,
        allowed_paths=[f"/agents/{session_id}", "/shared"]
    ),
    database=DbPermissions(
        read=True,
        write=True,
        allowed_tables=["documents", "cache_*"]
    ),
    calculator=True
)

ctx.budget = OperationBudget(
    max_tokens=50000,
    max_cost=1000,  # $10.00
    max_operations=1000
)

# Set initial variables
ctx.set_var("model", ContextValue.String("claude-3-sonnet"))
ctx.set_var("temperature", ContextValue.Number(0.7))

# Register available tools
ctx.register_tool(ToolDefinition(
    name="vector_search",
    description="Search vectors by similarity",
    parameters_schema='{"type": "object", "properties": {"query": {"type": "string"}, "k": {"type": "integer"}}}',
    requires_confirmation=False
))

# Perform operations with permission checks
def safe_read_file(ctx: AgentContext, path: str) -> bytes:
    resolved = ctx.resolve_path(path)
    ctx.check_fs_permission(resolved, AuditOperation.FS_READ)
    ctx.consume_budget(tokens=100, cost=1)
    # ... actual file read ...
    return b"file contents"

def safe_db_query(ctx: AgentContext, table: str, query: str):
    ctx.check_db_permission(table, AuditOperation.DB_QUERY)
    ctx.consume_budget(tokens=500, cost=5)
    # ... actual query ...
    return []

# Use in transaction
ctx.begin_transaction(tx_id=1)
try:
    # Operations here...
    ctx.commit_transaction()
except Exception:
    ctx.rollback_transaction()
    raise

# Export audit trail for debugging/compliance
audit = ctx.export_audit()
print(f"Session performed {len(audit)} operations")

# Cleanup
session_mgr.cleanup_expired()
```

### Session Errors

```python
from sochdb import AuditOperation, ContextError

try:
    ctx.check_fs_permission("/forbidden", AuditOperation.FS_READ)
except ContextError as e:
    if e.is_permission_denied():
        print(f"Permission denied: {e.message}")
    elif e.is_variable_not_found():
        print(f"Variable not found: {e.variable_name}")
    elif e.is_budget_exceeded():
        print(f"Budget exceeded: {e.budget_type}")
    elif e.is_transaction_error():
        print(f"Transaction error: {e.message}")
    elif e.is_invalid_path():
        print(f"Invalid path: {e.path}")
    elif e.is_session_expired():
        print("Session has expired")
```

---

## 17. Atomic Multi-Index Writes

Ensure consistency across KV storage, vectors, and graphs with atomic operations.

### Problem Without Atomicity

```
# Without atomic writes, a crash can leave:
# - Embedding exists but graph edges don't
# - KV data exists but embedding is missing
# - Partial graph relationships
```

### Atomic Memory Writer

```python
from sochdb import AtomicMemoryWriter, MemoryOp

writer = AtomicMemoryWriter(db)

# Build atomic operation set
result = writer.write_atomic(
    memory_id="memory_123",
    ops=[
        # Store the blob/content
        MemoryOp.PutBlob(
            key=b"memories/memory_123/content",
            value=b"Meeting notes: discussed project timeline..."
        ),

        # Store the embedding
        MemoryOp.PutEmbedding(
            collection="memories",
            id="memory_123",
            embedding=[0.1, 0.2, ...],
            metadata={"type": "meeting", "date": "2024-01-15"}
        ),

        # Create graph nodes
        MemoryOp.CreateNode(
            namespace="default",
            node_id="memory_123",
            node_type="memory",
            properties={"importance": "high"}
        ),

        # Create graph edges
        MemoryOp.CreateEdge(
            namespace="default",
            from_id="memory_123",
            edge_type="relates_to",
            to_id="project_x",
            properties={}
        ),
    ]
)

print(f"Intent ID: {result.intent_id}")
print(f"Operations applied: {result.ops_applied}")
print(f"Status: {result.status}")  # "committed"
```

### How It Works

```
1. Write intent(id, ops...) to WAL    ← Crash-safe
2. Apply ops one-by-one
3. Write commit(id) to WAL            ← All-or-nothing
4. Recovery replays incomplete intents
```
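
The four steps above can be modelled in a few lines. The sketch below is an in-memory toy under stated assumptions: `ToyStore`, its list-based WAL, and the `crash_after` parameter are all hypothetical, and operations are plain key/value puts so that replaying one twice is harmless. It is not SochDB's implementation.

```python
from typing import Dict, List, Tuple

Op = Tuple[str, str]  # (key, value); applying an op twice is harmless


class ToyStore:
    # Hypothetical in-memory model of the four steps above.
    def __init__(self) -> None:
        self.wal: List[tuple] = []
        self.data: Dict[str, str] = {}

    def write_atomic(self, intent_id: int, ops: List[Op], crash_after: int = -1) -> None:
        self.wal.append(("intent", intent_id, list(ops)))  # step 1
        for i, (key, value) in enumerate(ops):             # step 2
            if i == crash_after:
                return  # simulate a crash before the commit record
            self.data[key] = value
        self.wal.append(("commit", intent_id))             # step 3

    def recover(self) -> List[int]:
        """Step 4: replay every intent that has no matching commit record."""
        committed = {rec[1] for rec in self.wal if rec[0] == "commit"}
        replayed = []
        for rec in list(self.wal):
            if rec[0] == "intent" and rec[1] not in committed:
                for key, value in rec[2]:
                    self.data[key] = value
                self.wal.append(("commit", rec[1]))
                replayed.append(rec[1])
        return replayed


store = ToyStore()
ops = [("blob", "notes"), ("embedding", "vec"), ("edge", "relates_to")]
store.write_atomic(1, ops, crash_after=1)
after_crash = dict(store.data)   # only the first op landed
replayed = store.recover()       # the incomplete intent is rolled forward
print(after_crash, replayed, store.data)
```

Because the intent record holds the full operation list, recovery can finish the write rather than undo it, which is why each operation needs to be safe to apply more than once.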

---

## 18. Recovery & WAL Management

SochDB uses Write-Ahead Logging (WAL) for durability with automatic recovery.

### Recovery Manager

```python
from sochdb import RecoveryManager

recovery = db.recovery()

# Check if recovery is needed
if recovery.needs_recovery():
    result = recovery.recover()
    print(f"Status: {result.status}")
    print(f"Replayed entries: {result.replayed_entries}")
```

### WAL Verification

```python
# Verify WAL integrity
result = recovery.verify_wal()

print(f"Valid: {result.is_valid}")
print(f"Total entries: {result.total_entries}")
print(f"Valid entries: {result.valid_entries}")
print(f"Corrupted: {result.corrupted_entries}")
print(f"Last valid LSN: {result.last_valid_lsn}")

if result.checksum_errors:
    for error in result.checksum_errors:
        print(f"Checksum error at LSN {error.lsn}: expected {error.expected}, got {error.actual}")
```

### Force Checkpoint

```python
# Force a checkpoint (flush memtable to disk)
result = recovery.checkpoint()

print(f"Checkpoint LSN: {result.checkpoint_lsn}")
print(f"Duration: {result.duration_ms}ms")
```

### WAL Statistics

```python
stats = recovery.wal_stats()

print(f"Total size: {stats.total_size_bytes} bytes")
print(f"Active size: {stats.active_size_bytes} bytes")
print(f"Archived size: {stats.archived_size_bytes} bytes")
print(f"Entry count: {stats.entry_count}")
print(f"Oldest LSN: {stats.oldest_entry_lsn}")
print(f"Newest LSN: {stats.newest_entry_lsn}")
```

### WAL Truncation

```python
# Truncate WAL after checkpoint (reclaim disk space)
result = recovery.truncate_wal(up_to_lsn=12345)

print(f"Truncated to LSN: {result.up_to_lsn}")
print(f"Bytes freed: {result.bytes_freed}")
```

### Open with Auto-Recovery

```python
from sochdb import open_with_recovery

# Automatically recovers if needed
db = open_with_recovery("./my_database")
```

---

## 19. Checkpoints & Snapshots

### Application Checkpoints

Save and restore application state for workflow interruption/resumption.

```python
from sochdb import CheckpointService

checkpoint_svc = db.checkpoint_service()

# Create a checkpoint
checkpoint_id = checkpoint_svc.create(
    name="workflow_step_3",
    state=serialized_state,  # bytes
    metadata={"step": "3", "user": "alice", "workflow": "data_pipeline"}
)

# Restore checkpoint
state = checkpoint_svc.restore(checkpoint_id)

# List checkpoints
checkpoints = checkpoint_svc.list()
for cp in checkpoints:
    print(f"{cp.name}: {cp.created_at}, {cp.state_size} bytes")

# Delete checkpoint
checkpoint_svc.delete(checkpoint_id)
```

### Workflow Checkpointing

```python
# Create a workflow run
run_id = checkpoint_svc.create_run(
    workflow="data_pipeline",
    params={"input_file": "data.csv", "batch_size": 1000}
)

# Save checkpoint at each node/step
checkpoint_svc.save_node_checkpoint(
    run_id=run_id,
    node_id="transform_step",
    state=step_state,
    metadata={"rows_processed": 5000}
)

# Load latest checkpoint for a node
checkpoint = checkpoint_svc.load_node_checkpoint(run_id, "transform_step")

# List all checkpoints for a run
node_checkpoints = checkpoint_svc.list_run_checkpoints(run_id)
```

### Snapshot Reader (Point-in-Time)

```python
# Create a consistent snapshot for reading
snapshot = db.snapshot()

# Read from snapshot (doesn't see newer writes)
value = snapshot.get(b"key")

# All reads within snapshot see consistent state
with db.snapshot() as snap:
    v1 = snap.get(b"key1")
    v2 = snap.get(b"key2")  # Same consistent view

# Meanwhile, writes continue in main DB
db.put(b"key1", b"new_value")  # Snapshot doesn't see this
```
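
One common way to get this behaviour is to keep several versions per key and let a snapshot remember the sequence number at which it was taken. The sketch below illustrates that idea only; `VersionedStore` and `Snapshot` are hypothetical classes, and nothing here is claimed about how SochDB implements snapshots internally.

```python
from typing import Dict, List, Optional, Tuple


class VersionedStore:
    # Hypothetical multi-version map; illustrates the idea, not SochDB internals.
    def __init__(self) -> None:
        self._seq = 0
        self._versions: Dict[bytes, List[Tuple[int, bytes]]] = {}

    def put(self, key: bytes, value: bytes) -> None:
        self._seq += 1
        self._versions.setdefault(key, []).append((self._seq, value))

    def get_at(self, key: bytes, seq: int) -> Optional[bytes]:
        """Return the newest value written at or before sequence number seq."""
        result = None
        for version_seq, value in self._versions.get(key, []):
            if version_seq <= seq:
                result = value
        return result

    def snapshot(self) -> "Snapshot":
        return Snapshot(self, self._seq)


class Snapshot:
    def __init__(self, store: VersionedStore, seq: int) -> None:
        self._store = store
        self._seq = seq

    def get(self, key: bytes) -> Optional[bytes]:
        return self._store.get_at(key, self._seq)


store = VersionedStore()
store.put(b"key1", b"old_value")
snap = store.snapshot()
store.put(b"key1", b"new_value")
print(snap.get(b"key1"))
print(store.snapshot().get(b"key1"))
```

The older snapshot keeps returning the value that was current when it was created, while a fresh snapshot sees the later write.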

---

## 20. Compression & Storage

### Compression Settings

```python
from sochdb import CompressionType, Database

db = Database.open("./my_db", config={
    # Compression for SST files
    "compression": CompressionType.LZ4,       # LZ4 (fast), ZSTD (better ratio), NONE
    "compression_level": 3,                   # ZSTD: 1-22, LZ4: ignored

    # Compression for WAL
    "wal_compression": CompressionType.NONE,  # Usually NONE for WAL (already sequential)
})
```

### Compression Comparison

| Type | Ratio | Compress Speed | Decompress Speed | Use Case |
|------|-------|----------------|------------------|----------|
| `NONE` | 1x | N/A | N/A | Already compressed data |
| `LZ4` | ~2.5x | ~780 MB/s | ~4500 MB/s | General use (default) |
| `ZSTD` | ~3.5x | ~520 MB/s | ~1800 MB/s | Cold storage, large datasets |

### Storage Statistics

```python
stats = db.storage_stats()

print(f"Data size: {stats.data_size_bytes}")
print(f"Index size: {stats.index_size_bytes}")
print(f"WAL size: {stats.wal_size_bytes}")
print(f"Compression ratio: {stats.compression_ratio:.2f}x")
print(f"SST files: {stats.sst_file_count}")
print(f"Levels: {stats.level_stats}")
```

### Compaction Control

```python
# Manual compaction (reclaim space, optimize reads)
db.compact()

# Compact specific level
db.compact_level(level=0)

# Get compaction stats
stats = db.compaction_stats()
print(f"Pending compactions: {stats.pending_compactions}")
print(f"Running compactions: {stats.running_compactions}")
```

---

## 21. Statistics & Monitoring

### Database Statistics

```python
stats = db.stats()

# Transaction stats
print(f"Active transactions: {stats.active_transactions}")
print(f"Committed transactions: {stats.committed_transactions}")
print(f"Aborted transactions: {stats.aborted_transactions}")
print(f"Conflict rate: {stats.conflict_rate:.2%}")

# Operation stats
print(f"Total reads: {stats.total_reads}")
print(f"Total writes: {stats.total_writes}")
print(f"Cache hit rate: {stats.cache_hit_rate:.2%}")

# Storage stats
print(f"Key count: {stats.key_count}")
print(f"Total data size: {stats.total_data_bytes}")
```

### Token Statistics (LLM Optimization)

```python
stats = db.token_stats()

print(f"TOON tokens emitted: {stats.toon_tokens_emitted}")
print(f"Equivalent JSON tokens: {stats.json_tokens_equivalent}")
print(f"Token savings: {stats.token_savings_percent:.1f}%")
```

### Performance Metrics

```python
metrics = db.performance_metrics()

# Latency percentiles
print(f"Read P50: {metrics.read_latency_p50_us}µs")
print(f"Read P99: {metrics.read_latency_p99_us}µs")
print(f"Write P50: {metrics.write_latency_p50_us}µs")
print(f"Write P99: {metrics.write_latency_p99_us}µs")

# Throughput
print(f"Reads/sec: {metrics.reads_per_second}")
print(f"Writes/sec: {metrics.writes_per_second}")
```
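
For readers comparing these numbers against their own measurements, the sketch below shows one way to derive P50 and P99 from raw latency samples using the nearest-rank definition. The `percentile` helper and the sample values are made up for illustration; the README does not state which percentile method SochDB uses.

```python
import math
from typing import List


def percentile(samples: List[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of values at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Illustrative latencies in microseconds, including one slow outlier.
latencies_us = [120, 95, 101, 3000, 110, 98, 105, 99, 102, 97]
print(f"Read P50: {percentile(latencies_us, 50)}µs")
print(f"Read P99: {percentile(latencies_us, 99)}µs")
```

The single outlier leaves P50 untouched but dominates P99, which is why both figures are worth tracking.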

---

## 22. Distributed Tracing

Track operations for debugging and performance analysis.

### Starting Traces

```python
from sochdb import TraceStore

traces = TraceStore(db)

# Start a trace run
run = traces.start_run(
    name="user_request",
    resource={"service": "api", "version": "1.0.0"}
)
trace_id = run.trace_id
```

### Creating Spans

```python
# TraceStatus is used at the end of this example; importing it from sochdb is an assumption
from sochdb import SpanKind, SpanStatusCode, TraceStatus

# Start root span
root_span = traces.start_span(
    trace_id=trace_id,
    name="handle_request",
    parent_span_id=None,
    kind=SpanKind.SERVER
)

# Start child span
db_span = traces.start_span(
    trace_id=trace_id,
    name="database_query",
    parent_span_id=root_span.span_id,
    kind=SpanKind.CLIENT
)

# Add attributes
traces.set_span_attributes(trace_id, db_span.span_id, {
    "db.system": "sochdb",
    "db.operation": "SELECT",
    "db.table": "users"
})

# End spans
traces.end_span(trace_id, db_span.span_id, SpanStatusCode.OK)
traces.end_span(trace_id, root_span.span_id, SpanStatusCode.OK)

# End the trace run
traces.end_run(trace_id, TraceStatus.COMPLETED)
```

### Domain Events

```python
# Log retrieval (for RAG debugging)
traces.log_retrieval(
    trace_id=trace_id,
    query="user query",
    results=[{"id": "doc1", "score": 0.95}],
    latency_ms=15
)

# Log LLM call
traces.log_llm_call(
    trace_id=trace_id,
    model="claude-3-sonnet",
    input_tokens=500,
    output_tokens=200,
    latency_ms=1200
)
```

---

## 23. Workflow & Run Tracking

Track long-running workflows with events and state.

### Creating Workflow Runs

```python
from sochdb import WorkflowService, RunStatus

workflow_svc = db.workflow_service()

# Create a new run
run = workflow_svc.create_run(
    run_id="run_123",
    workflow="data_pipeline",
    params={"input": "data.csv", "output": "results.json"}
)

print(f"Run ID: {run.run_id}")
print(f"Status: {run.status}")
print(f"Created: {run.created_at}")
```

### Appending Events

```python
from sochdb import WorkflowEvent, EventType

# Append events as workflow progresses
workflow_svc.append_event(WorkflowEvent(
    run_id="run_123",
    event_type=EventType.NODE_STARTED,
    node_id="extract",
    data={"input_file": "data.csv"}
))

workflow_svc.append_event(WorkflowEvent(
    run_id="run_123",
    event_type=EventType.NODE_COMPLETED,
    node_id="extract",
    data={"rows_extracted": 10000}
))
```

### Querying Events

```python
# Get all events for a run
events = workflow_svc.get_events("run_123")

# Get events since a sequence number
new_events = workflow_svc.get_events("run_123", since_seq=10, limit=100)

# Stream events (for real-time monitoring)
for event in workflow_svc.stream_events("run_123"):
    print(f"[{event.seq}] {event.event_type}: {event.node_id}")
```
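
Sequence numbers make incremental polling straightforward: remember the last `seq` seen and ask only for newer events. The sketch below models this with a hypothetical in-memory `EventLog`; treating `since_seq` as exclusive and numbering events from 1 are assumptions, not documented SochDB behaviour.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Event:
    seq: int
    event_type: str
    node_id: str


class EventLog:
    # Hypothetical append-only log; names mirror the calls above but are not SochDB's API.
    def __init__(self) -> None:
        self._events: Dict[str, List[Event]] = {}

    def append(self, run_id: str, event_type: str, node_id: str) -> Event:
        events = self._events.setdefault(run_id, [])
        event = Event(seq=len(events) + 1, event_type=event_type, node_id=node_id)
        events.append(event)
        return event

    def get_events(self, run_id: str, since_seq: int = 0, limit: int = 100) -> List[Event]:
        """Return up to `limit` events whose seq is strictly greater than since_seq."""
        newer = [e for e in self._events.get(run_id, []) if e.seq > since_seq]
        return newer[:limit]


log = EventLog()
log.append("run_123", "NODE_STARTED", "extract")
log.append("run_123", "NODE_COMPLETED", "extract")
log.append("run_123", "NODE_STARTED", "transform")

cursor = 0
batch = log.get_events("run_123", since_seq=cursor, limit=2)
cursor = batch[-1].seq  # remember the last seq seen
rest = log.get_events("run_123", since_seq=cursor)
print([e.seq for e in batch], [e.seq for e in rest])
```

Keeping the cursor on the client side means a monitor can restart and resume from where it left off without missing or repeating events.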

### Update Run Status

```python
# Update status
workflow_svc.update_run_status("run_123", RunStatus.COMPLETED)

# Or mark as failed
workflow_svc.update_run_status("run_123", RunStatus.FAILED)
```

---

## 24. Server Mode (gRPC Client)

Full-featured client for distributed deployments.

### Connection

```python
from sochdb import SochDBClient

# Basic connection
client = SochDBClient("localhost:50051")

# With TLS
client = SochDBClient("localhost:50051", secure=True, ca_cert="ca.pem")

# With authentication
client = SochDBClient("localhost:50051", api_key="your_api_key")

# Context manager
with SochDBClient("localhost:50051") as client:
    client.put(b"key", b"value")
```

### Key-Value Operations

```python
# Put with TTL
client.put(b"key", b"value", namespace="default", ttl_seconds=3600)

# Get
value = client.get(b"key", namespace="default")

# Delete
client.delete(b"key", namespace="default")

# Batch operations
client.put_batch([
    (b"key1", b"value1"),
    (b"key2", b"value2"),
], namespace="default")
```

### Vector Operations (Server Mode)

```python
# Create index
client.create_index(
    name="embeddings",
    dimension=384,
    metric="cosine",
    m=16,
    ef_construction=200
)

# Insert vectors
client.insert_vectors(
    index_name="embeddings",
    ids=[1, 2, 3],
    vectors=[[...], [...], [...]]
)

# Search
results = client.search(
    index_name="embeddings",
    query=[0.1, 0.2, ...],
    k=10,
    ef_search=50
)

for result in results:
    print(f"ID: {result.id}, Distance: {result.distance}")
```

### Collection Operations (Server Mode)

```python
# Create collection
client.create_collection(
    name="documents",
    dimension=384,
    namespace="default",
    metric="cosine"
)

# Add documents
client.add_documents(
    collection_name="documents",
    documents=[
        {"id": "1", "content": "Hello", "embedding": [...], "metadata": {...}},
        {"id": "2", "content": "World", "embedding": [...], "metadata": {...}}
    ],
    namespace="default"
)

# Search
results = client.search_collection(
    collection_name="documents",
    query_vector=[...],
    k=10,
    namespace="default",
    filter={"author": "Alice"}
)
```

### Context Service (Server Mode)

```python
# Query context for LLM
context = client.query_context(
    session_id="session_123",
    sections=[
        {"name": "system", "priority": 0, "type": "literal",
         "content": "You are a helpful assistant."},
        {"name": "history", "priority": 1, "type": "recent",
         "table": "messages", "top_k": 10},
        {"name": "knowledge", "priority": 2, "type": "search",
         "collection": "documents", "embedding": [...], "top_k": 5}
    ],
    token_limit=4096,
    format="toon"
)

print(context.text)
print(f"Tokens used: {context.token_count}")
```

---

## 25. IPC Client (Unix Sockets)

Local server communication via Unix sockets (lower latency than gRPC).

```python
from sochdb import IpcClient

# Connect
client = IpcClient.connect("/tmp/sochdb.sock", timeout=30.0)

# Basic operations
client.put(b"key", b"value")
value = client.get(b"key")
client.delete(b"key")

# Path operations
client.put_path(["users", "alice"], b"data")
value = client.get_path(["users", "alice"])

# Query
result = client.query("users/", limit=100)

# Scan
results = client.scan("prefix/")

# Transactions
txn_id = client.begin_transaction()
# ... operations ...
commit_ts = client.commit(txn_id)
# or client.abort(txn_id)

# Admin
client.ping()
client.checkpoint()
stats = client.stats()

client.close()
```

---

## 26. Standalone VectorIndex

Direct HNSW index operations without collections.

```python
from sochdb import VectorIndex, VectorIndexConfig, DistanceMetric
import numpy as np

# Create index
config = VectorIndexConfig(
    dimension=384,
    metric=DistanceMetric.COSINE,
    m=16,
    ef_construction=200,
    ef_search=50,
    max_elements=100000
)
index = VectorIndex(config)

# Insert single vector
index.insert(id=1, vector=np.array([0.1, 0.2, ...], dtype=np.float32))

# Batch insert
ids = np.array([1, 2, 3], dtype=np.uint64)
vectors = np.array([[...], [...], [...]], dtype=np.float32)
count = index.insert_batch(ids, vectors)

# Fast batch insert (returns failures)
inserted, failed = index.insert_batch_fast(ids, vectors)

# Search
query = np.array([0.1, 0.2, ...], dtype=np.float32)
results = index.search(query, k=10, ef_search=100)

for vec_id, distance in results:
    print(f"ID: {vec_id}, Distance: {distance}")

# Properties
print(f"Size: {len(index)}")
print(f"Dimension: {index.dimension}")

# Save/load
index.save("./index.bin")
index = VectorIndex.load("./index.bin")
```

---

## 27. Vector Utilities

Standalone vector operations for preprocessing and analysis.

```python
from sochdb import vector

# Distance calculations
a = [1.0, 0.0, 0.0]
b = [0.707, 0.707, 0.0]

cosine_dist = vector.cosine_distance(a, b)
euclidean_dist = vector.euclidean_distance(a, b)
dot_product = vector.dot_product(a, b)

print(f"Cosine distance: {cosine_dist:.4f}")
print(f"Euclidean distance: {euclidean_dist:.4f}")
print(f"Dot product: {dot_product:.4f}")

# Normalize a vector
v = [3.0, 4.0]
normalized = vector.normalize(v)
print(f"Normalized: {normalized}")  # [0.6, 0.8]

# Batch normalize
vectors = [[3.0, 4.0], [1.0, 0.0]]
normalized_batch = vector.normalize_batch(vectors)

# Compute centroid
vectors = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
centroid = vector.centroid(vectors)

# Cosine similarity (1 - distance)
similarity = vector.cosine_similarity(a, b)
```
|
|
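
For readers who want to see what these helpers compute, here is a minimal pure-Python sketch of the same operations. The function names mirror the calls above, but the bodies are illustrative assumptions rather than the SDK's implementation, and `cosine_distance` is taken to be `1 - cosine_similarity`, as the last comment in the block above suggests. Zero-length vectors are not handled.

```python
import math

def dot_product(a, b):
    """Sum of element-wise products."""
    return sum(x * y for x, y in zip(a, b))

def norm(v):
    """Euclidean (L2) length of a vector."""
    return math.sqrt(dot_product(v, v))

def cosine_similarity(a, b):
    return dot_product(a, b) / (norm(a) * norm(b))

def cosine_distance(a, b):
    # Assumption: distance is defined as 1 - similarity.
    return 1.0 - cosine_similarity(a, b)

def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def normalize(v):
    """Scale a vector to unit length."""
    length = norm(v)
    return [x / length for x in v]

def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]

print(normalize([3.0, 4.0]))
print(centroid([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]))
```

Normalizing before indexing makes dot product and cosine similarity rank results identically, which is one reason a batch normalize step is commonly run ahead of inserts.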

---

## 28. Data Formats (TOON/JSON/Columnar)

### Wire Formats

```python
from sochdb import WireFormat

# Available formats
WireFormat.TOON      # Token-efficient (40-66% fewer tokens)
WireFormat.JSON      # Standard JSON
WireFormat.COLUMNAR  # Raw columnar for analytics

# Parse from string
fmt = WireFormat.from_string("toon")

# Convert between formats
data = {"users": [{"id": 1, "name": "Alice"}]}
toon_data = WireFormat.to_toon(data)
json_data = WireFormat.to_json(data)
```

### TOON Format Benefits

TOON uses **40-60% fewer tokens** than JSON:

```
# JSON (15 tokens)
{"users": [{"id": 1, "name": "Alice"}]}

# TOON (9 tokens)
users:
  - id: 1
    name: Alice
```
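
Most of the saving in the comparison above comes from dropping JSON's braces, brackets, quotes, and commas. As a rough illustration, the toy encoder below reproduces the shape of that example for dicts whose values are scalars, nested dicts, or lists of flat dicts, and then compares character counts. It is not the TOON specification or SochDB's encoder: `to_toonish` is a hypothetical helper, and character counts are only a proxy for tokens, which depend on the tokenizer.

```python
import json

def to_toonish(value, indent=0):
    """Render a dict in an indentation-based layout (toy sketch only)."""
    pad = "  " * indent
    lines = []
    for key, item in value.items():
        if isinstance(item, list):
            lines.append(f"{pad}{key}:")
            for row in item:
                first = True
                for field, cell in row.items():
                    # The first field of each row carries the list marker.
                    marker = "- " if first else "  "
                    lines.append(f"{pad}  {marker}{field}: {cell}")
                    first = False
        elif isinstance(item, dict):
            lines.append(f"{pad}{key}:")
            lines.append(to_toonish(item, indent + 1))
        else:
            lines.append(f"{pad}{key}: {item}")
    return "\n".join(lines)

data = {"users": [{"id": 1, "name": "Alice"}]}
as_json = json.dumps(data)
as_toonish = to_toonish(data)

print(as_toonish)
print(len(as_json), "characters as JSON")
print(len(as_toonish), "characters in the indented layout")
```

The relative saving tends to grow with the number of rows, because JSON repeats its punctuation for every record.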

### Context Formats

```python
from sochdb import ContextFormat, WireFormat

ContextFormat.TOON      # Token-efficient
ContextFormat.JSON      # Structured data
ContextFormat.MARKDOWN  # Human-readable

# Format capabilities
from sochdb import FormatCapabilities

# Convert between formats
ctx_fmt = FormatCapabilities.wire_to_context(WireFormat.TOON)
wire_fmt = FormatCapabilities.context_to_wire(ContextFormat.JSON)

# Check round-trip support
if FormatCapabilities.supports_round_trip(WireFormat.TOON):
    print("Safe for decode(encode(x)) == x")
```

---

## 29. Policy Service

Register and evaluate access control policies.

```python
from sochdb import PolicyService

policy_svc = db.policy_service()

# Register a policy
policy_svc.register(
    policy_id="read_own_data",
    name="Users can read their own data",
    trigger="READ",
    action="ALLOW",
    condition="resource.owner == user.id"
)

# Register another policy
policy_svc.register(
    policy_id="admin_all",
    name="Admins can do everything",
    trigger="*",
    action="ALLOW",
    condition="user.role == 'admin'"
)

# Evaluate policy
result = policy_svc.evaluate(
    action="READ",
    resource="documents/123",
    context={"user.id": "alice", "user.role": "user", "resource.owner": "alice"}
)

if result.allowed:
    print("Access granted")
else:
    print(f"Access denied: {result.reason}")
    print(f"Denying policy: {result.policy_id}")

# List policies
policies = policy_svc.list()
for p in policies:
    print(f"{p.policy_id}: {p.name}")

# Delete policy
policy_svc.delete("old_policy")
```
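
The README does not spell out how conditions are evaluated. One plausible reading, sketched below purely as an assumption, is that a trigger of `"*"` matches any action, each side of `==` is either a single-quoted literal or a key looked up in the context dict, the first matching policy decides, and the default is deny. The `Policy` dataclass and the `resolve`, `condition_holds`, and `evaluate` helpers are hypothetical and are not part of the SDK.

```python
from dataclasses import dataclass

@dataclass
class Policy:
    policy_id: str
    trigger: str     # action name, or "*" for any action
    action: str      # "ALLOW" or "DENY"
    condition: str   # only "<operand> == <operand>" is supported in this sketch

def resolve(operand, context):
    """A single-quoted operand is a literal; anything else is a context key."""
    operand = operand.strip()
    if len(operand) >= 2 and operand[0] == operand[-1] == "'":
        return operand[1:-1]
    return context.get(operand)

def condition_holds(condition, context):
    left, right = condition.split("==")
    left_value = resolve(left, context)
    # A missing context key never satisfies a condition.
    return left_value is not None and left_value == resolve(right, context)

def evaluate(policies, action, context):
    """Return (allowed, policy_id); the first matching policy decides, default deny."""
    for policy in policies:
        if policy.trigger not in ("*", action):
            continue
        if condition_holds(policy.condition, context):
            return policy.action == "ALLOW", policy.policy_id
    return False, None

policies = [
    Policy("read_own_data", "READ", "ALLOW", "resource.owner == user.id"),
    Policy("admin_all", "*", "ALLOW", "user.role == 'admin'"),
]
context = {"user.id": "alice", "user.role": "user", "resource.owner": "alice"}
print(evaluate(policies, "READ", context))
```

Default deny is the conservative choice for access control: a request that no policy mentions is rejected rather than silently allowed.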

---

## 30. MCP (Model Context Protocol)

Integrate SochDB as an MCP tool provider.

### Built-in MCP Tools

| Tool | Description |
|------|-------------|
| `sochdb_query` | Execute ToonQL/SQL queries |
| `sochdb_context_query` | Fetch AI-optimized context |
| `sochdb_put` | Store key-value data |
| `sochdb_get` | Retrieve data by key |
| `sochdb_search` | Vector similarity search |

### Using MCP Tools (Server Mode)

```python
# List available tools
tools = client.list_mcp_tools()
for tool in tools:
    print(f"{tool.name}: {tool.description}")

# Get tool schema
schema = client.get_mcp_tool_schema("sochdb_search")
print(schema)

# Execute tool
result = client.execute_mcp_tool(
    name="sochdb_query",
    arguments={"query": "SELECT * FROM users", "format": "toon"}
)
print(result)
```

### Register Custom Tool

```python
# Register a custom tool
client.register_mcp_tool(
    name="search_documents",
    description="Search documents by semantic similarity",
    input_schema={
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "Search query"},
            "k": {"type": "integer", "description": "Number of results", "default": 10}
        },
        "required": ["query"]
    }
)
```
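
An `input_schema` like the one above tells the caller which arguments are required and which have defaults. The sketch below shows how a tool handler might apply such a schema before running. It covers only the `required`, `type`, and `default` keywords used above, and `prepare_arguments` is a hypothetical helper, not part of the SDK or of any JSON Schema library.

```python
JSON_TYPES = {
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
}

def prepare_arguments(input_schema, arguments):
    """Check required fields and basic types, then fill in declared defaults."""
    missing = [name for name in input_schema.get("required", [])
               if name not in arguments]
    if missing:
        raise ValueError(f"missing required arguments: {missing}")

    prepared = {}
    for name, spec in input_schema.get("properties", {}).items():
        if name in arguments:
            value = arguments[name]
            expected = JSON_TYPES.get(spec.get("type"))
            # Note: Python treats bool as a subclass of int, so True passes
            # the integer check here; a full validator would reject it.
            if expected is not None and not isinstance(value, expected):
                raise TypeError(f"{name} must be of type {spec['type']}")
            prepared[name] = value
        elif "default" in spec:
            prepared[name] = spec["default"]
    return prepared

search_schema = {
    "type": "object",
    "properties": {
        "query": {"type": "string", "description": "Search query"},
        "k": {"type": "integer", "description": "Number of results", "default": 10},
    },
    "required": ["query"],
}

print(prepare_arguments(search_schema, {"query": "vector databases"}))
```

Arguments that the schema does not declare are dropped in this sketch; whether to reject them instead is a design choice.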

---

## 31. Configuration Reference

### Database Configuration

```python
from sochdb import Database, CompressionType, SyncMode

db = Database.open("./my_db", config={
    # Durability
    "wal_enabled": True,                          # Write-ahead logging
    "sync_mode": SyncMode.NORMAL,                 # FULL, NORMAL, OFF

    # Performance
    "memtable_size_bytes": 64 * 1024 * 1024,      # 64MB (flush threshold)
    "block_cache_size_bytes": 256 * 1024 * 1024,  # 256MB
    "group_commit": True,                         # Batch commits

    # Compression
    "compression": CompressionType.LZ4,

    # Index policy
    "index_policy": "balanced",

    # Background workers
    "compaction_threads": 2,
    "flush_threads": 1,
})
```

### Sync Modes

| Mode | Speed | Safety | Use Case |
|------|-------|--------|----------|
| `OFF` | ~10x faster | Risk of data loss | Development, caches |
| `NORMAL` | Balanced | Fsync at checkpoints | Default |
| `FULL` | Slowest | Fsync every commit | Financial data |
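
To make the trade-off in the table concrete, the sketch below shows what "fsync every commit" versus "fsync at checkpoints" means for an append-only log, using only the Python standard library. It is a simplified model, not SochDB's write-ahead log: `append_record` and `checkpoint` are hypothetical, the mode names are plain strings, and whether `NORMAL` flushes to the operating system on every commit is an assumption.

```python
import os
import tempfile

def append_record(log_file, payload, sync_mode):
    """Append one record and apply the durability policy for this commit."""
    log_file.write(payload)
    if sync_mode == "OFF":
        return                       # bytes may still sit in user-space buffers
    log_file.flush()                 # hand the bytes to the operating system
    if sync_mode == "FULL":
        os.fsync(log_file.fileno())  # force them to stable storage right now

def checkpoint(log_file):
    """Deferred durability point, as used by the NORMAL mode in this sketch."""
    log_file.flush()
    os.fsync(log_file.fileno())

path = os.path.join(tempfile.mkdtemp(), "wal.log")
with open(path, "ab") as log_file:
    for i in range(3):
        append_record(log_file, f"record-{i}\n".encode(), sync_mode="NORMAL")
    checkpoint(log_file)

with open(path, "rb") as written:
    print(written.read().decode(), end="")
```

The cost difference comes from `os.fsync`, which blocks until the device acknowledges the write; calling it once per checkpoint instead of once per commit amortizes that wait over many records.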

### CollectionConfig Reference

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `name` | str | required | Collection name |
| `dimension` | int | required | Vector dimension |
| `metric` | DistanceMetric | COSINE | COSINE, EUCLIDEAN, DOT_PRODUCT |
| `m` | int | 16 | HNSW M parameter |
| `ef_construction` | int | 100 | HNSW build quality |
| `ef_search` | int | 50 | HNSW search quality |
| `quantization` | QuantizationType | NONE | NONE, SCALAR, PQ |
| `enable_hybrid_search` | bool | False | Enable BM25 |
| `content_field` | str | None | Field for BM25 indexing |

### Environment Variables

| Variable | Description |
|----------|-------------|
| `TOONDB_LIB_PATH` | Custom path to native library |
| `TOONDB_DISABLE_ANALYTICS` | Disable anonymous usage tracking |
| `TOONDB_LOG_LEVEL` | Log level (DEBUG, INFO, WARN, ERROR) |

---

## 32. Error Handling

### Error Types

```python
from sochdb import (
    # Base
    SochDBError,

    # Connection
    ConnectionError,
    ConnectionTimeoutError,

    # Transaction
    TransactionError,
    TransactionConflictError,  # SSI conflict - retry
    TransactionTimeoutError,

    # Storage
    DatabaseError,
    CorruptionError,
    DiskFullError,

    # Namespace
    NamespaceNotFoundError,
    NamespaceExistsError,
    NamespaceAccessError,

    # Collection
    CollectionNotFoundError,
    CollectionExistsError,
    CollectionConfigError,

    # Validation
    ValidationError,
    DimensionMismatchError,
    InvalidMetadataError,

    # Query
    QueryError,
    QuerySyntaxError,
    QueryTimeoutError,
)
```

### Error Handling Pattern

```python
from sochdb import (
    SochDBError,
    TransactionConflictError,
    DimensionMismatchError,
    CollectionNotFoundError,
)

try:
    with db.transaction() as txn:
        txn.put(b"key", b"value")

except TransactionConflictError as e:
    # SSI conflict - safe to retry
    print(f"Conflict detected: {e}")

except DimensionMismatchError as e:
    # Vector dimension wrong
    print(f"Expected {e.expected} dimensions, got {e.actual}")

except CollectionNotFoundError as e:
    # Collection doesn't exist
    print(f"Collection not found: {e.collection}")

except SochDBError as e:
    # All other SochDB errors
    print(f"Error: {e}")
    print(f"Code: {e.code}")
    print(f"Remediation: {e.remediation}")
```

### Error Information

```python
try:
    ...  # any operation that may raise a SochDBError
except SochDBError as e:
    print(f"Message: {e.message}")
    print(f"Code: {e.code}")                # ErrorCode enum
    print(f"Details: {e.details}")          # Additional context
    print(f"Remediation: {e.remediation}")  # How to fix
    print(f"Retryable: {e.retryable}")      # Safe to retry?
```
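
Because a serialization conflict is described above as safe to retry, callers usually wrap the transaction in a retry loop. The following is a minimal sketch of that pattern with exponential backoff and jitter. To keep it runnable without a database, `TransactionConflictError` is a local stand-in for the SDK class of the same name, and `run_with_retry` and `flaky_commit` are hypothetical helpers.

```python
import random
import time

class TransactionConflictError(Exception):
    """Local stand-in for the SDK error of the same name."""
    retryable = True

def run_with_retry(operation, max_attempts=5, base_delay=0.01):
    """Call operation(), retrying on conflicts with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransactionConflictError:
            if attempt == max_attempts:
                raise  # give up and surface the conflict to the caller
            # Double the delay each attempt; jitter avoids lockstep retries.
            delay = base_delay * (2 ** (attempt - 1))
            time.sleep(delay * random.uniform(0.5, 1.0))

attempts = {"count": 0}

def flaky_commit():
    """Simulated transaction that conflicts twice, then succeeds."""
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise TransactionConflictError("simulated conflict")
    return "committed"

print(run_with_retry(flaky_commit))
```

The operation passed in should re-read whatever it depends on at each attempt, since the conflict means another transaction changed that data.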

---

## 33. Async Support

Optional async/await support for non-blocking operations.

```python
import asyncio

from sochdb import AsyncDatabase, SearchRequest

async def main():
    # Open async database
    db = await AsyncDatabase.open("./my_db")

    # Async operations
    await db.put(b"key", b"value")
    value = await db.get(b"key")

    # Async transactions
    async with db.transaction() as txn:
        await txn.put(b"key1", b"value1")
        await txn.put(b"key2", b"value2")

    # Async vector search
    results = await db.collection("docs").search(SearchRequest(
        vector=[0.1, 0.2, ...],
        k=10
    ))

    await db.close()

# Run
asyncio.run(main())
```

**Note:** Requires `npm install @sochdb/sochdb[async]`

---

## 34. Building & Development

### Building Native Extensions

```bash
# Build for current platform
typescript build_native.ts

# Build only FFI libraries
typescript build_native.ts --libs

# Build for all platforms
typescript build_native.ts --all

# Clean
typescript build_native.ts --clean
```

### Library Discovery

The SDK looks for native libraries in this order:
1. `TOONDB_LIB_PATH` environment variable
2. Bundled in wheel: `lib/{target}/`
3. Package directory
4. Development builds: `target/release/`, `target/debug/`
5. System paths: `/usr/local/lib`, `/usr/lib`
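
The search order above can be sketched as a generator of candidate paths that is walked until a file is found. This is an illustration of the listed order only, not the SDK's actual finder: `candidate_paths` and `find_library` are hypothetical names, and treating `TOONDB_LIB_PATH` as the path to the library file itself (rather than to a directory) is an assumption.

```python
import os
import tempfile
from pathlib import Path

def candidate_paths(lib_name, target, package_dir, env=os.environ):
    """Yield candidate library paths in the documented search order."""
    override = env.get("TOONDB_LIB_PATH")
    if override:
        yield Path(override)                              # 1. explicit override
    yield Path(package_dir) / "lib" / target / lib_name   # 2. bundled per target
    yield Path(package_dir) / lib_name                    # 3. package directory
    for profile in ("release", "debug"):                  # 4. development builds
        yield Path("target") / profile / lib_name
    for system_dir in ("/usr/local/lib", "/usr/lib"):     # 5. system paths
        yield Path(system_dir) / lib_name

def find_library(lib_name, target, package_dir, env=os.environ):
    """Return the first candidate that exists on disk, or None."""
    for path in candidate_paths(lib_name, target, package_dir, env):
        if path.is_file():
            return path
    return None

# Demonstration against a throwaway directory that mimics a bundled layout.
with tempfile.TemporaryDirectory() as package_dir:
    target = "x86_64-unknown-linux-gnu"
    bundled = Path(package_dir) / "lib" / target / "libsochdb_storage.so"
    bundled.parent.mkdir(parents=True)
    bundled.touch()
    found = find_library("libsochdb_storage.so", target, package_dir, env={})
    print(found == bundled)
```

Putting the environment variable first lets a developer point the SDK at a locally built library without touching the installed package.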

### Running Tests

```bash
# All tests
pytest

# Specific test file
pytest tests/test_vector_search.ts

# With coverage
pytest --cov=sochdb

# Performance tests
pytest tests/perf/ --benchmark
```

### Package Structure

```
sochdb/
├── __init__.ts          # Public API exports
├── database.ts          # Database, Transaction
├── namespace.ts         # Namespace, Collection
├── vector.ts            # VectorIndex, utilities
├── grpc_client.ts       # SochDBClient (server mode)
├── ipc_client.ts        # IpcClient (Unix sockets)
├── context.ts           # ContextQueryBuilder
├── atomic.ts            # AtomicMemoryWriter
├── recovery.ts          # RecoveryManager
├── checkpoint.ts        # CheckpointService
├── workflow.ts          # WorkflowService
├── trace.ts             # TraceStore
├── policy.ts            # PolicyService
├── format.ts            # WireFormat, ContextFormat
├── errors.ts            # All error types
├── _bin/                # Bundled binaries
└── lib/                 # FFI libraries
```

---

## 35. Complete Examples

### RAG Pipeline Example

```python
from sochdb import Database, CollectionConfig, DistanceMetric, SearchRequest

# Setup
db = Database.open("./rag_db")
ns = db.get_or_create_namespace("rag")

# Create collection for documents
collection = ns.create_collection(CollectionConfig(
    name="documents",
    dimension=384,
    metric=DistanceMetric.COSINE,
    enable_hybrid_search=True,
    content_field="text"
))

# Index documents
def index_document(doc_id: str, text: str, embed_fn):
    embedding = embed_fn(text)
    collection.insert(
        id=doc_id,
        vector=embedding,
        metadata={"text": text, "indexed_at": "2024-01-15"}
    )

# Retrieve relevant context
def retrieve_context(query: str, embed_fn, k: int = 5) -> list:
    query_embedding = embed_fn(query)

    results = collection.hybrid_search(
        vector=query_embedding,
        text_query=query,
        k=k,
        alpha=0.7  # 70% vector, 30% keyword
    )

    return [r.metadata["text"] for r in results]

# Full RAG pipeline
def rag_query(query: str, embed_fn, llm_fn):
    # 1. Retrieve
    context_docs = retrieve_context(query, embed_fn)

    # 2. Build context
    from sochdb import ContextQueryBuilder, ContextFormat

    context = ContextQueryBuilder() \
        .for_session("rag_session") \
        .with_budget(4096) \
        .literal("SYSTEM", 0, "Answer based on the provided context.") \
        .literal("CONTEXT", 1, "\n\n".join(context_docs)) \
        .literal("QUESTION", 2, query) \
        .execute()

    # 3. Generate
    response = llm_fn(context.text)

    return response

db.close()
```
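
The `alpha=0.7` argument above is described as "70% vector, 30% keyword". The README does not say how the two score lists are combined, so the sketch below assumes the simplest scheme that matches that description: min-max normalize each list, then take a weighted sum. `min_max` and `hybrid_rank` are hypothetical helpers, and the scores are made-up inputs chosen only to show the arithmetic.

```python
def min_max(scores):
    """Scale a {doc_id: score} map to [0, 1]; a constant map becomes all 1.0."""
    low, high = min(scores.values()), max(scores.values())
    if high == low:
        return {doc: 1.0 for doc in scores}
    return {doc: (score - low) / (high - low) for doc, score in scores.items()}

def hybrid_rank(vector_scores, keyword_scores, alpha=0.7, k=5):
    """Fuse two score maps as alpha * vector + (1 - alpha) * keyword."""
    vec = min_max(vector_scores)
    kw = min_max(keyword_scores)
    fused = {
        doc: alpha * vec.get(doc, 0.0) + (1 - alpha) * kw.get(doc, 0.0)
        for doc in set(vec) | set(kw)
    }
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)[:k]

# Illustrative inputs: similarity scores and BM25-style keyword scores.
vector_scores = {"doc1": 0.92, "doc2": 0.85, "doc3": 0.40}
keyword_scores = {"doc2": 7.1, "doc3": 9.3, "doc4": 2.0}

for doc_id, score in hybrid_rank(vector_scores, keyword_scores):
    print(doc_id, round(score, 3))
```

Normalizing first matters because keyword scores and similarity scores live on different scales; without it the weight would not mean what the comment says.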

### Knowledge Graph Example

```python
from sochdb import Database
import time

db = Database.open("./knowledge_graph")

# Build a knowledge graph
db.add_node("kg", "alice", "person", {"role": "engineer", "level": "senior"})
db.add_node("kg", "bob", "person", {"role": "manager"})
db.add_node("kg", "project_ai", "project", {"status": "active", "budget": 100000})
db.add_node("kg", "ml_team", "team", {"size": 5})

db.add_edge("kg", "alice", "works_on", "project_ai", {"role": "lead"})
db.add_edge("kg", "alice", "member_of", "ml_team")
db.add_edge("kg", "bob", "manages", "project_ai")
db.add_edge("kg", "bob", "leads", "ml_team")

# Query: Find all projects Alice works on
nodes, edges = db.traverse("kg", "alice", max_depth=1)
projects = [n for n in nodes if n["node_type"] == "project"]
print(f"Alice's projects: {[p['id'] for p in projects]}")

# Query: Who manages Alice's projects?
for project in projects:
    nodes, edges = db.traverse("kg", project["id"], max_depth=1)
    managers = [e["from_id"] for e in edges if e["edge_type"] == "manages"]
    print(f"{project['id']} managed by: {managers}")

db.close()
```

### Multi-Tenant SaaS Example

```python
from sochdb import Database

db = Database.open("./saas_db")

# Create tenant namespaces
for tenant in ["acme_corp", "globex", "initech"]:
    ns = db.create_namespace(
        name=tenant,
        labels={"tier": "premium" if tenant == "acme_corp" else "standard"}
    )

    # Create tenant-specific collections
    ns.create_collection(
        name="documents",
        dimension=384
    )

# Tenant-scoped operations
with db.use_namespace("acme_corp") as ns:
    collection = ns.collection("documents")

    # All operations isolated to acme_corp
    collection.insert(
        id="doc1",
        vector=[0.1] * 384,
        metadata={"title": "Acme Internal Doc"}
    )

    # Search only searches acme_corp's documents
    results = collection.vector_search(
        vector=[0.1] * 384,
        k=10
    )

# Cleanup
db.close()
```

---

## 36. Migration Guide

### From v0.2.x to v0.3.x

```python
# Old: scan() with range
for k, v in db.scan(b"users/", b"users0"):  # DEPRECATED
    pass

# New: scan_prefix()
for k, v in db.scan_prefix(b"users/"):
    pass

# Old: execute_sql returns tuple
columns, rows = db.execute_sql("SELECT * FROM users")

# New: execute_sql returns SQLQueryResult
result = db.execute_sql("SELECT * FROM users")
columns = result.columns
rows = result.rows
```
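
The deprecated range form works because `0` is the byte immediately after `/` in ASCII, so `b"users0"` is the smallest key that sorts after every key beginning with `b"users/"`. The sketch below computes that upper bound for an arbitrary prefix, which is the bookkeeping a prefix scan saves callers from doing by hand. `prefix_upper_bound` is a hypothetical helper, not an SDK function.

```python
def prefix_upper_bound(prefix: bytes):
    """Smallest key greater than every key starting with prefix, or None."""
    trimmed = prefix.rstrip(b"\xff")  # trailing 0xFF bytes cannot be incremented
    if not trimmed:
        return None                   # empty or all-0xFF prefix has no upper bound
    # Increment the last remaining byte and drop everything after it.
    return trimmed[:-1] + bytes([trimmed[-1] + 1])

print(prefix_upper_bound(b"users/"))
```

Keys compare byte by byte, so any key that starts with the prefix falls in the half-open range from the prefix up to this bound.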

### From SQLite/PostgreSQL

```python
# SQLite
# conn = sqlite3.connect("app.db")
# cursor = conn.execute("SELECT * FROM users")

# SochDB (same SQL, embedded)
db = Database.open("./app_db")
result = db.execute_sql("SELECT * FROM users")
```

### From Redis

```python
# Redis
# r = redis.Redis()
# r.set("key", "value")
# r.get("key")

# SochDB
db = Database.open("./cache_db")
db.put(b"key", b"value")
db.get(b"key")

# With TTL
db.put(b"session:123", b"data", ttl_seconds=3600)
```

### From Pinecone/Weaviate

```python
# Pinecone
# index.upsert(vectors=[(id, embedding, metadata)])
# results = index.query(vector=query, top_k=10)

# SochDB
collection = db.namespace("default").collection("vectors")
collection.insert(id=id, vector=embedding, metadata=metadata)
results = collection.vector_search(vector=query, k=10)
```

---

## Performance

**Network Overhead:**
- gRPC: ~100-200 μs per request (local)
- IPC: ~50-100 μs per request (Unix socket)

**Batch Operations:**
- Vector insert: 50,000 vectors/sec (batch mode)
- Vector search: 20,000 queries/sec (47 μs/query)

**Recommendation:**
- Use **batch operations** for high throughput
- Use **IPC** for same-machine communication
- Use **gRPC** for distributed systems
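
The per-query latency and the queries-per-second figure above are two views of the same measurement, assuming queries run one after another on a single thread. The short sketch below does the conversion and also estimates how long a bulk load would take at the quoted insert rate. The helper names are illustrative, and the one-million-vector workload is an assumed example, not a benchmark from this README.

```python
def queries_per_second(latency_us):
    """Sequential throughput implied by a per-query latency in microseconds."""
    return 1_000_000 / latency_us

def seconds_for(items, rate_per_second):
    """Time needed to process a number of items at a steady rate."""
    return items / rate_per_second

# 47 microseconds per query is a little over 21,000 queries per second,
# which the list above appears to round to 20,000.
print(round(queries_per_second(47)))

# Loading one million vectors at 50,000 vectors per second.
print(seconds_for(1_000_000, 50_000), "seconds")
```

The same arithmetic shows why transport choice matters for small requests: 100-200 μs of gRPC overhead per request is several times the search latency itself.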

---

## FAQ

**Q: Which mode should I use?**
A:
- **Embedded (FFI)**: For local dev, notebooks, single-process apps
- **Server (gRPC)**: For production, multi-language, distributed systems

**Q: Can I switch between modes?**
A: Yes! Both modes have the same API. Change `Database.open()` to `SochDBClient()` and vice versa.

**Q: Do temporal graphs work in embedded mode?**
A: Yes! As of v0.4.0, temporal graphs work in both embedded and server modes with identical APIs.

**Q: Is embedded mode slower than server mode?**
A: Embedded mode is faster for single-process use (no network overhead). Server mode is better for distributed deployments.

**Q: Where is the business logic?**
A: All business logic is in Rust. Embedded mode uses FFI bindings, server mode uses gRPC. Same Rust code, different transport.

**Q: What about the old "fat client" Database class?**
A: It's still here as embedded mode! We now support dual-mode: embedded FFI + server gRPC.

---

## Examples

See the [examples/](examples/) directory for complete working examples:

**Embedded Mode (FFI - No Server):**
- [23_collections_embedded.ts](examples/23_collections_embedded.ts) - Document storage, JSON, transactions
- [22_namespaces.ts](examples/22_namespaces.ts) - Multi-tenant isolation with key prefixes
- [24_batch_operations.ts](examples/24_batch_operations.ts) - Atomic writes, rollback, conditional updates
- [25_temporal_graph_embedded.ts](examples/25_temporal_graph_embedded.ts) - Time-travel queries (NEW!)

**Server Mode (gRPC - Requires Server):**
- [21_temporal_graph.ts](examples/21_temporal_graph.ts) - Temporal graphs via gRPC

---

## Getting Help

- **Documentation**: https://sochdb.dev
- **GitHub Issues**: https://github.com/sochdb/sochdb/issues
- **Examples**: See [examples/](examples/) directory

---

## Contributing

Interested in contributing? See [CONTRIBUTING.md](CONTRIBUTING.md) for:
- Development environment setup
- Building from source
- Running tests
- Code style guidelines
- Pull request process

---

## License

Apache License 2.0