@sochdb/sochdb 0.5.1 → 0.5.2
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +357 -3164
- package/_bin/aarch64-apple-darwin/libsochdb_storage.dylib +0 -0
- package/_bin/aarch64-apple-darwin/sochdb-bulk +0 -0
- package/_bin/aarch64-apple-darwin/sochdb-grpc-server +0 -0
- package/_bin/aarch64-apple-darwin/sochdb-server +0 -0
- package/_bin/x86_64-pc-windows-msvc/sochdb-bulk.exe +0 -0
- package/_bin/x86_64-pc-windows-msvc/sochdb-grpc-server.exe +0 -0
- package/_bin/x86_64-pc-windows-msvc/sochdb_storage.dll +0 -0
- package/_bin/x86_64-unknown-linux-gnu/libsochdb_storage.so +0 -0
- package/_bin/x86_64-unknown-linux-gnu/sochdb-bulk +0 -0
- package/_bin/x86_64-unknown-linux-gnu/sochdb-grpc-server +0 -0
- package/_bin/x86_64-unknown-linux-gnu/sochdb-server +0 -0
- package/dist/cjs/embedded/database.js +279 -1
- package/dist/cjs/index.js +3 -3
- package/dist/cjs/namespace.js +33 -6
- package/dist/cjs/queue.js +143 -24
- package/dist/esm/embedded/database.js +279 -1
- package/dist/esm/index.js +3 -3
- package/dist/esm/namespace.js +33 -6
- package/dist/esm/queue.js +143 -24
- package/dist/types/embedded/database.d.ts +69 -0
- package/dist/types/embedded/database.d.ts.map +1 -1
- package/dist/types/index.d.ts +2 -2
- package/dist/types/namespace.d.ts.map +1 -1
- package/dist/types/queue.d.ts.map +1 -1
- package/package.json +1 -1
package/README.md
CHANGED
@@ -27,54 +27,40 @@ Choose the deployment mode that fits your needs.
 
 # SochDB Node.js SDK Documentation
 
-**LLM-Optimized Embedded Database with Native Vector Search
+> **Version 0.5.2** — LLM-Optimized Embedded Database with Native Vector Search
 
 ---
 
 ## Table of Contents
 
 1. [Quick Start](#1-quick-start)
-2. [
-
+2. [Features](#features)
+   - [Memory System](#memory-system---llm-native-memory-for-ai-agents)
+   - [Semantic Cache](#semantic-cache---llm-response-caching)
+   - [Context Query Builder](#context-query-builder---token-aware-llm-context)
    - [Namespace API](#namespace-api---multi-tenant-isolation)
    - [Priority Queue API](#priority-queue-api---task-processing)
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-25. [Statistics & Monitoring](#25-statistics--monitoring)
-26. [Distributed Tracing](#26-distributed-tracing)
-27. [Workflow & Run Tracking](#27-workflow--run-tracking)
-28. [Server Mode (gRPC Client)](#28-server-mode-grpc-client)
-29. [IPC Client (Unix Sockets)](#29-ipc-client-unix-sockets)
-30. [Standalone VectorIndex](#30-standalone-vectorindex)
-31. [Vector Utilities](#31-vector-utilities)
-32. [Data Formats (TOON/JSON/Columnar)](#32-data-formats-toonjsoncolumnar)
-33. [Policy Service](#33-policy-service)
-34. [MCP (Model Context Protocol)](#34-mcp-model-context-protocol)
-35. [Configuration Reference](#35-configuration-reference)
-36. [Error Handling](#36-error-handling)
-37. [Async Support](#37-async-support)
-38. [Building & Development](#38-building--development)
-39. [Complete Examples](#39-complete-examples)
-40. [Migration Guide](#40-migration-guide)
+3. [Architecture](#architecture-flexible-deployment)
+4. [System Requirements](#system-requirements)
+5. [Troubleshooting](#troubleshooting)
+6. [Vector Search (Native HNSW)](#-vector-search---native-hnsw)
+7. [API Reference](#api-reference)
+   - [Core Key-Value Operations](#core-key-value-operations)
+   - [Transactions (ACID with SSI)](#transactions-acid-with-ssi)
+   - [Prefix Scanning](#prefix-scanning)
+   - [Namespaces & Collections](#namespaces--collections)
+   - [Priority Queues](#priority-queues)
+   - [Graph Operations](#graph-operations)
+   - [Semantic Cache](#semantic-cache)
+   - [Context Query Builder](#context-query-builder)
+   - [Memory System (LLM-Native)](#memory-system-llm-native)
+   - [Data Formats (TOON/JSON/Columnar)](#data-formats-toonjsoncolumnar)
+   - [Policy Service & MCP](#policy-service--mcp)
+   - [Server Mode (IPC / gRPC)](#server-mode-ipc--grpc)
+   - [Checkpoints & Statistics](#checkpoints--statistics)
+   - [Error Handling](#error-handling)
+   - [Configuration Reference](#configuration-reference)
+   - [Performance](#performance)
 
 ---
 
@@ -780,3291 +766,498 @@ class HnswIndex {
|
|
|
780
766
|
}
|
|
781
767
|
```
|
|
782
768
|
|
|
769
|
+
### Engine Status
|
|
770
|
+
|
|
771
|
+
| Component | Status |
|
|
772
|
+
|-----------|--------|
|
|
773
|
+
| **Cost-based optimizer** | ✅ Production-ready — full cost model, cardinality estimation, plan caching |
|
|
774
|
+
| **Adaptive group commit** | ✅ Implemented — Little's Law-based batch sizing |
|
|
775
|
+
| **WAL compaction** | ⚠️ Partial — manual `checkpoint()` + `truncateWal()` available |
|
|
776
|
+
| **HNSW vector index** | ✅ Production-ready — direct FFI bindings |
|
|
777
|
+
|
|
783
778
|
### Roadmap
|
|
784
779
|
|
|
785
|
-
- **Current**: Direct HNSW FFI bindings
|
|
780
|
+
- **Current**: Direct HNSW FFI bindings with production cost-based optimizer
|
|
786
781
|
- **Next**: Collection API auto-uses HNSW in embedded mode
|
|
787
782
|
- **Future**: Persistent HNSW indexes with disk storage
|
|
788
783
|
|
|
789
784
|
---
|
|
790
785
|
|
|
791
|
-
# SochDB Node.js SDK Documentation
|
|
792
|
-
|
|
793
|
-
**LLM-Optimized Embedded Database with Native Vector Search**
|
|
794
|
-
|
|
795
786
|
---
|
|
796
787
|
|
|
797
|
-
##
|
|
788
|
+
## API Reference
|
|
798
789
|
|
|
799
|
-
|
|
800
|
-
|
|
801
|
-
|
|
802
|
-
- [Namespace API](#namespace-api---multi-tenant-isolation)
|
|
803
|
-
- [Priority Queue API](#priority-queue-api---task-processing)
|
|
804
|
-
4. [Architecture Overview](#4-architecture-overview)
|
|
805
|
-
5. [Core Key-Value Operations](#5-core-key-value-operations)
|
|
806
|
-
6. [Transactions (ACID with SSI)](#6-transactions-acid-with-ssi)
|
|
807
|
-
7. [Query Builder](#7-query-builder)
|
|
808
|
-
8. [Prefix Scanning](#8-prefix-scanning)
|
|
809
|
-
9. [SQL Operations](#9-sql-operations)
|
|
810
|
-
10. [Table Management & Index Policies](#10-table-management--index-policies)
|
|
811
|
-
11. [Namespaces & Collections](#11-namespaces--collections)
|
|
812
|
-
12. [Priority Queues](#12-priority-queues)
|
|
813
|
-
13. [Vector Search](#13-vector-search)
|
|
814
|
-
14. [Hybrid Search (Vector + BM25)](#14-hybrid-search-vector--bm25)
|
|
815
|
-
15. [Graph Operations](#15-graph-operations)
|
|
816
|
-
16. [Temporal Graph (Time-Travel)](#16-temporal-graph-time-travel)
|
|
817
|
-
17. [Semantic Cache](#17-semantic-cache)
|
|
818
|
-
18. [Context Query Builder (LLM Optimization)](#18-context-query-builder-llm-optimization)
|
|
819
|
-
19. [Atomic Multi-Index Writes](#19-atomic-multi-index-writes)
|
|
820
|
-
20. [Recovery & WAL Management](#20-recovery--wal-management)
|
|
821
|
-
21. [Checkpoints & Snapshots](#21-checkpoints--snapshots)
|
|
822
|
-
22. [Compression & Storage](#22-compression--storage)
|
|
823
|
-
23. [Statistics & Monitoring](#23-statistics--monitoring)
|
|
824
|
-
24. [Server Mode (gRPC Client)](#24-server-mode-grpc-client)
|
|
825
|
-
25. [IPC Client (Unix Sockets)](#25-ipc-client-unix-sockets)
|
|
826
|
-
26. [Error Handling](#26-error-handling)
|
|
827
|
-
27. [Complete Examples](#27-complete-examples)
|
|
790
|
+
> **Version 0.5.2** — Complete API documentation with TypeScript examples.
|
|
791
|
+
|
|
792
|
+
All core logic runs in the Rust engine via FFI. The SDK is a thin client.
|
|
828
793
|
|
|
829
794
|
---
|
|
830
795
|
|
|
831
|
-
|
|
796
|
+
### Core Key-Value Operations
|
|
832
797
|
|
|
833
798
|
```typescript
|
|
834
|
-
from sochdb
|
|
799
|
+
import { EmbeddedDatabase } from '@sochdb/sochdb';
|
|
835
800
|
|
|
836
|
-
|
|
837
|
-
db = Database.open("./my_database")
|
|
801
|
+
const db = EmbeddedDatabase.open('./mydb');
|
|
838
802
|
|
|
839
|
-
|
|
840
|
-
db.put(
|
|
841
|
-
value = db.get(
|
|
803
|
+
// Put / Get / Delete
|
|
804
|
+
await db.put(Buffer.from('user:1'), Buffer.from('{"name":"Alice"}'));
|
|
805
|
+
const value = await db.get(Buffer.from('user:1'));
|
|
806
|
+
console.log(value?.toString()); // {"name":"Alice"}
|
|
807
|
+
await db.delete(Buffer.from('user:1'));
|
|
842
808
|
|
|
843
|
-
|
|
844
|
-
|
|
845
|
-
|
|
846
|
-
txn.put(b"key2", b"value2")
|
|
847
|
-
# Auto-commits on success, auto-rollbacks on exception
|
|
809
|
+
// Path-based keys (hierarchical)
|
|
810
|
+
await db.putPath('/users/alice/profile', Buffer.from('{"age":30}'));
|
|
811
|
+
const profile = await db.getPath('/users/alice/profile');
|
|
848
812
|
|
|
849
|
-
|
|
850
|
-
db.delete(b"hello")
|
|
851
|
-
db.close()
|
|
813
|
+
db.close();
|
|
852
814
|
```
|
|
853
815
|
|
|
854
|
-
**30-Second Overview:**
|
|
855
|
-
- **Key-Value**: Fast reads/writes with `get`/`put`/`delete`
|
|
856
|
-
- **Transactions**: ACID with SSI isolation
|
|
857
|
-
- **Vector Search**: HNSW-based semantic search
|
|
858
|
-
- **Hybrid Search**: Combine vectors with BM25 keyword search
|
|
859
|
-
- **Graph**: Build and traverse knowledge graphs
|
|
860
|
-
- **LLM-Optimized**: TOON format uses 40-60% fewer tokens than JSON
|
|
861
|
-
|
|
862
816
|
---
|
|
863
817
|
|
|
864
|
-
|
|
865
|
-
|
|
866
|
-
```bash
|
|
867
|
-
npm install @sochdb/sochdb
|
|
868
|
-
```
|
|
818
|
+
### Transactions (ACID with SSI)
|
|
869
819
|
|
|
870
|
-
|
|
871
|
-
| Platform | Architecture | Status |
|
|
872
|
-
|----------|--------------|--------|
|
|
873
|
-
| Linux | x86_64, aarch64 | ✅ Full support |
|
|
874
|
-
| macOS | x86_64, arm64 | ✅ Full support |
|
|
875
|
-
| Windows | x86_64 | ✅ Full support |
|
|
820
|
+
SochDB uses Serializable Snapshot Isolation for full ACID transactions:
|
|
876
821
|
|
|
877
|
-
|
|
878
|
-
|
|
879
|
-
# For async support
|
|
880
|
-
npm install @sochdb/sochdb[async]
|
|
822
|
+
```typescript
|
|
823
|
+
const db = EmbeddedDatabase.open('./mydb');
|
|
881
824
|
|
|
882
|
-
|
|
883
|
-
|
|
825
|
+
// Auto-managed transaction
|
|
826
|
+
await db.withTransaction(async (txn) => {
|
|
827
|
+
await txn.put(Buffer.from('key1'), Buffer.from('val1'));
|
|
828
|
+
await txn.put(Buffer.from('key2'), Buffer.from('val2'));
|
|
829
|
+
const v = await txn.get(Buffer.from('key1'));
|
|
830
|
+
// Auto-commits on success, auto-aborts on throw
|
|
831
|
+
});
|
|
884
832
|
|
|
885
|
-
|
|
886
|
-
|
|
833
|
+
// Manual transaction control
|
|
834
|
+
const txn = db.transaction();
|
|
835
|
+
try {
|
|
836
|
+
await txn.put(Buffer.from('balance:alice'), Buffer.from('100'));
|
|
837
|
+
await txn.put(Buffer.from('balance:bob'), Buffer.from('200'));
|
|
838
|
+
await txn.commit(); // Single atomic fsync
|
|
839
|
+
} catch (err) {
|
|
840
|
+
await txn.abort();
|
|
841
|
+
throw err;
|
|
842
|
+
}
|
|
887
843
|
```
|
|
888
844
|
|
|
889
845
|
---
|
|
890
846
|
|
|
891
|
-
|
|
892
|
-
|
|
893
|
-
SochDB supports two deployment modes:
|
|
894
|
-
|
|
895
|
-
### Embedded Mode (Default)
|
|
896
|
-
|
|
897
|
-
Direct Rust bindings via FFI. No server required.
|
|
898
|
-
|
|
899
|
-
```typescript
|
|
900
|
-
from sochdb import Database
|
|
901
|
-
|
|
902
|
-
with Database.open("./mydb") as db:
|
|
903
|
-
db.put(b"key", b"value")
|
|
904
|
-
value = db.get(b"key")
|
|
905
|
-
```
|
|
906
|
-
|
|
907
|
-
**Best for:** Local development, notebooks, single-process applications.
|
|
908
|
-
|
|
909
|
-
### Server Mode (gRPC)
|
|
910
|
-
|
|
911
|
-
Thin client connecting to `sochdb-grpc` server.
|
|
847
|
+
### Prefix Scanning
|
|
912
848
|
|
|
913
849
|
```typescript
|
|
914
|
-
|
|
915
|
-
|
|
916
|
-
client = SochDBClient("localhost:50051")
|
|
917
|
-
client.put(b"key", b"value", namespace="default")
|
|
918
|
-
value = client.get(b"key", namespace="default")
|
|
919
|
-
```
|
|
920
|
-
|
|
921
|
-
**Best for:** Production, multi-process, distributed systems.
|
|
850
|
+
const db = EmbeddedDatabase.open('./mydb');
|
|
922
851
|
|
|
923
|
-
|
|
852
|
+
// Insert test data
|
|
853
|
+
await db.put(Buffer.from('user:1'), Buffer.from('Alice'));
|
|
854
|
+
await db.put(Buffer.from('user:2'), Buffer.from('Bob'));
|
|
855
|
+
await db.put(Buffer.from('user:3'), Buffer.from('Charlie'));
|
|
924
856
|
|
|
925
|
-
|
|
926
|
-
|
|
927
|
-
|
|
928
|
-
|
|
929
|
-
| Multi-process | ❌ | ✅ |
|
|
930
|
-
| Horizontal scaling | ❌ | ✅ |
|
|
931
|
-
| Vector search | ✅ | ✅ |
|
|
932
|
-
| Graph operations | ✅ | ✅ |
|
|
933
|
-
| Semantic cache | ✅ | ✅ |
|
|
934
|
-
| Context service | Limited | ✅ Full |
|
|
935
|
-
| MCP integration | ❌ | ✅ |
|
|
857
|
+
// Scan all keys with prefix
|
|
858
|
+
for await (const [key, value] of db.scanPrefix(Buffer.from('user:'))) {
|
|
859
|
+
console.log(`${key.toString()} = ${value.toString()}`);
|
|
860
|
+
}
|
|
936
861
|
|
|
937
|
-
|
|
938
|
-
|
|
939
|
-
|
|
940
|
-
|
|
941
|
-
|
|
942
|
-
|
|
943
|
-
│ │ Node.js App │ │ Node.js App │ │
|
|
944
|
-
│ │ ├─ Database.open()│ │ ├─ SochDBClient() │ │
|
|
945
|
-
│ │ └─ Direct FFI │ │ └─ gRPC calls │ │
|
|
946
|
-
│ │ │ │ │ │ │ │
|
|
947
|
-
│ │ ▼ │ │ ▼ │ │
|
|
948
|
-
│ │ libsochdb_storage │ │ sochdb-grpc │ │
|
|
949
|
-
│ │ (Rust native) │ │ (Rust server) │ │
|
|
950
|
-
│ └─────────────────────┘ └─────────────────────┘ │
|
|
951
|
-
│ │
|
|
952
|
-
│ ✅ No server needed ✅ Multi-language │
|
|
953
|
-
│ ✅ Local files ✅ Centralized logic │
|
|
954
|
-
│ ✅ Simple deployment ✅ Production scale │
|
|
955
|
-
└─────────────────────────────────────────────────────────────┘
|
|
862
|
+
// Transaction-scoped scan
|
|
863
|
+
const txn = db.transaction();
|
|
864
|
+
for await (const [k, v] of txn.scanPrefix(Buffer.from('order:'))) {
|
|
865
|
+
console.log(`${k.toString()} = ${v.toString()}`);
|
|
866
|
+
}
|
|
867
|
+
await txn.commit();
|
|
956
868
|
```
|
|
957
869
|
|
|
958
870
|
---
|
|
959
871
|
|
|
960
|
-
|
|
961
|
-
|
|
962
|
-
All keys and values are **bytes**.
|
|
963
|
-
|
|
964
|
-
### Basic Operations
|
|
965
|
-
|
|
966
|
-
```typescript
|
|
967
|
-
from sochdb import Database
|
|
968
|
-
|
|
969
|
-
db = Database.open("./my_db")
|
|
970
|
-
|
|
971
|
-
# Store data
|
|
972
|
-
db.put(b"user:1", b"Alice")
|
|
973
|
-
db.put(b"user:2", b"Bob")
|
|
974
|
-
|
|
975
|
-
# Retrieve data
|
|
976
|
-
user = db.get(b"user:1") # Returns b"Alice" or None
|
|
977
|
-
|
|
978
|
-
# Check existence
|
|
979
|
-
exists = db.exists(b"user:1") # True
|
|
980
|
-
|
|
981
|
-
# Delete data
|
|
982
|
-
db.delete(b"user:1")
|
|
983
|
-
|
|
984
|
-
db.close()
|
|
985
|
-
```
|
|
986
|
-
|
|
987
|
-
### Path-Based Keys (Hierarchical)
|
|
872
|
+
### Namespaces & Collections
|
|
988
873
|
|
|
989
|
-
|
|
874
|
+
Multi-tenant isolation with vector-enabled collections:
|
|
990
875
|
|
|
991
876
|
```typescript
|
|
992
|
-
|
|
993
|
-
db.put_path("users/alice/name", b"Alice Smith")
|
|
994
|
-
db.put_path("users/alice/email", b"alice@example.com")
|
|
995
|
-
db.put_path("users/bob/name", b"Bob Jones")
|
|
996
|
-
|
|
997
|
-
# Retrieve by path
|
|
998
|
-
name = db.get_path("users/alice/name") # b"Alice Smith"
|
|
999
|
-
|
|
1000
|
-
# Delete by path
|
|
1001
|
-
db.delete_path("users/alice/email")
|
|
877
|
+
import { EmbeddedDatabase } from '@sochdb/sochdb';
|
|
1002
878
|
|
|
1003
|
-
|
|
1004
|
-
children = db.list_path("users/") # ["alice", "bob"]
|
|
1005
|
-
```
|
|
879
|
+
const db = EmbeddedDatabase.open('./mydb');
|
|
1006
880
|
|
|
1007
|
-
|
|
881
|
+
// Create or get a namespace
|
|
882
|
+
const ns = await db.getOrCreateNamespace('tenant_1', {
|
|
883
|
+
displayName: 'Tenant One',
|
|
884
|
+
labels: { tier: 'premium' },
|
|
885
|
+
});
|
|
1008
886
|
|
|
1009
|
-
|
|
1010
|
-
|
|
1011
|
-
|
|
887
|
+
// Create a collection with vector search
|
|
888
|
+
const docs = await ns.createCollection({
|
|
889
|
+
name: 'documents',
|
|
890
|
+
dimension: 384,
|
|
891
|
+
metric: 'cosine',
|
|
892
|
+
indexed: true,
|
|
893
|
+
hnswM: 16,
|
|
894
|
+
hnswEfConstruction: 100,
|
|
895
|
+
});
|
|
1012
896
|
|
|
1013
|
-
|
|
1014
|
-
|
|
1015
|
-
|
|
897
|
+
// Insert vectors with metadata
|
|
898
|
+
const id = await docs.insert(
|
|
899
|
+
[0.1, 0.2, 0.3 /* ...384 dims */],
|
|
900
|
+
{ title: 'Introduction to AI', author: 'Alice' },
|
|
901
|
+
);
|
|
1016
902
|
|
|
1017
|
-
|
|
903
|
+
// Batch insert
|
|
904
|
+
const ids = await docs.insertMany(
|
|
905
|
+
[[0.1, 0.2 /* ... */], [0.3, 0.4 /* ... */]],
|
|
906
|
+
[{ title: 'Doc 1' }, { title: 'Doc 2' }],
|
|
907
|
+
);
|
|
1018
908
|
|
|
1019
|
-
|
|
1020
|
-
|
|
1021
|
-
|
|
1022
|
-
|
|
1023
|
-
|
|
1024
|
-
|
|
1025
|
-
|
|
1026
|
-
|
|
1027
|
-
# Batch get
|
|
1028
|
-
values = db.get_batch([b"key1", b"key2", b"key3"])
|
|
1029
|
-
# Returns: [b"value1", b"value2", b"value3"] (None for missing keys)
|
|
1030
|
-
|
|
1031
|
-
# Batch delete
|
|
1032
|
-
db.delete_batch([b"key1", b"key2", b"key3"])
|
|
1033
|
-
```
|
|
909
|
+
// Search with optional filter
|
|
910
|
+
const results = await docs.search({
|
|
911
|
+
queryVector: [0.1, 0.2, 0.3 /* ...384 dims */],
|
|
912
|
+
k: 5,
|
|
913
|
+
filter: { author: 'Alice' },
|
|
914
|
+
includeMetadata: true,
|
|
915
|
+
});
|
|
916
|
+
results.forEach(r => console.log(`${r.id}: score=${r.score.toFixed(4)}`));
|
|
1034
917
|
|
|
1035
|
-
|
|
918
|
+
// Collection management
|
|
919
|
+
const count = await docs.count();
|
|
920
|
+
const collections = await ns.listCollections(); // ['documents']
|
|
921
|
+
await ns.deleteCollection('documents');
|
|
1036
922
|
|
|
1037
|
-
|
|
1038
|
-
|
|
1039
|
-
|
|
1040
|
-
# Automatically closes when exiting
|
|
923
|
+
// Namespace management
|
|
924
|
+
const namespaces = await db.listNamespaces();
|
|
925
|
+
await db.deleteNamespace('tenant_1');
|
|
1041
926
|
```
|
|
1042
927
|
|
|
1043
928
|
---
|
|
1044
929
|
|
|
1045
|
-
|
|
1046
|
-
|
|
1047
|
-
SochDB provides full ACID transactions with **Serializable Snapshot Isolation (SSI)**.
|
|
1048
|
-
|
|
1049
|
-
### Context Manager Pattern (Recommended)
|
|
1050
|
-
|
|
1051
|
-
```typescript
|
|
1052
|
-
# Auto-commits on success, auto-rollbacks on exception
|
|
1053
|
-
with db.transaction() as txn:
|
|
1054
|
-
txn.put(b"accounts/alice", b"1000")
|
|
1055
|
-
txn.put(b"accounts/bob", b"500")
|
|
1056
|
-
|
|
1057
|
-
# Read within transaction sees your writes
|
|
1058
|
-
balance = txn.get(b"accounts/alice") # b"1000"
|
|
1059
|
-
|
|
1060
|
-
# If exception occurs, rolls back automatically
|
|
1061
|
-
```
|
|
1062
|
-
|
|
1063
|
-
### Closure Pattern (Rust-Style)
|
|
1064
|
-
|
|
1065
|
-
```typescript
|
|
1066
|
-
# Using with_transaction for automatic commit/rollback
|
|
1067
|
-
def transfer_funds(txn):
|
|
1068
|
-
alice = int(txn.get(b"accounts/alice") or b"0")
|
|
1069
|
-
bob = int(txn.get(b"accounts/bob") or b"0")
|
|
1070
|
-
|
|
1071
|
-
txn.put(b"accounts/alice", str(alice - 100).encode())
|
|
1072
|
-
txn.put(b"accounts/bob", str(bob + 100).encode())
|
|
1073
|
-
|
|
1074
|
-
return "Transfer complete"
|
|
1075
|
-
|
|
1076
|
-
result = db.with_transaction(transfer_funds)
|
|
1077
|
-
```
|
|
1078
|
-
|
|
1079
|
-
### Manual Transaction Control
|
|
1080
|
-
|
|
1081
|
-
```typescript
|
|
1082
|
-
txn = db.begin_transaction()
|
|
1083
|
-
try:
|
|
1084
|
-
txn.put(b"key1", b"value1")
|
|
1085
|
-
txn.put(b"key2", b"value2")
|
|
1086
|
-
|
|
1087
|
-
commit_ts = txn.commit() # Returns HLC timestamp
|
|
1088
|
-
print(f"Committed at: {commit_ts}")
|
|
1089
|
-
except Exception as e:
|
|
1090
|
-
txn.abort()
|
|
1091
|
-
raise
|
|
1092
|
-
```
|
|
1093
|
-
|
|
1094
|
-
### Transaction Properties
|
|
1095
|
-
|
|
1096
|
-
```typescript
|
|
1097
|
-
txn = db.transaction()
|
|
1098
|
-
print(f"Transaction ID: {txn.id}") # Unique identifier
|
|
1099
|
-
print(f"Start timestamp: {txn.start_ts}") # HLC start time
|
|
1100
|
-
print(f"Isolation: {txn.isolation}") # "serializable"
|
|
1101
|
-
```
|
|
930
|
+
### Priority Queues
|
|
1102
931
|
|
|
1103
|
-
|
|
932
|
+
Ordered task queues with priority-based dequeue, ack/nack, and dead-letter support:
|
|
1104
933
|
|
|
1105
934
|
```typescript
|
|
1106
|
-
from sochdb
|
|
1107
|
-
|
|
1108
|
-
MAX_RETRIES = 3
|
|
1109
|
-
|
|
1110
|
-
for attempt in range(MAX_RETRIES):
|
|
1111
|
-
try:
|
|
1112
|
-
with db.transaction() as txn:
|
|
1113
|
-
# Read and modify
|
|
1114
|
-
value = int(txn.get(b"counter") or b"0")
|
|
1115
|
-
txn.put(b"counter", str(value + 1).encode())
|
|
1116
|
-
break # Success
|
|
1117
|
-
except TransactionConflictError:
|
|
1118
|
-
if attempt == MAX_RETRIES - 1:
|
|
1119
|
-
raise
|
|
1120
|
-
# Retry on conflict
|
|
1121
|
-
continue
|
|
1122
|
-
```
|
|
935
|
+
import { createQueue, EmbeddedDatabase } from '@sochdb/sochdb';
|
|
1123
936
|
|
|
1124
|
-
|
|
937
|
+
const db = EmbeddedDatabase.open('./mydb');
|
|
1125
938
|
|
|
1126
|
-
|
|
1127
|
-
|
|
1128
|
-
|
|
1129
|
-
|
|
1130
|
-
|
|
1131
|
-
|
|
1132
|
-
txn.exists(key)
|
|
1133
|
-
|
|
1134
|
-
# Path-based
|
|
1135
|
-
txn.put_path(path, value)
|
|
1136
|
-
txn.get_path(path)
|
|
1137
|
-
|
|
1138
|
-
# Batch operations
|
|
1139
|
-
txn.put_batch(pairs)
|
|
1140
|
-
txn.get_batch(keys)
|
|
1141
|
-
|
|
1142
|
-
# Scanning
|
|
1143
|
-
for k, v in txn.scan_prefix(b"prefix/"):
|
|
1144
|
-
print(k, v)
|
|
1145
|
-
|
|
1146
|
-
# SQL (within transaction isolation)
|
|
1147
|
-
result = txn.execute("SELECT * FROM users WHERE id = 1")
|
|
1148
|
-
```
|
|
939
|
+
// Create a queue
|
|
940
|
+
const queue = createQueue(db, 'background-jobs', {
|
|
941
|
+
visibilityTimeout: 30,
|
|
942
|
+
maxRetries: 3,
|
|
943
|
+
deadLetterQueue: 'failed-jobs',
|
|
944
|
+
});
|
|
1149
945
|
|
|
1150
|
-
|
|
946
|
+
// Enqueue tasks (lower priority = higher urgency)
|
|
947
|
+
const taskId = await queue.enqueue(
|
|
948
|
+
1,
|
|
949
|
+
Buffer.from(JSON.stringify({ action: 'send_email', to: 'alice@example.com' })),
|
|
950
|
+
{ source: 'api', retryable: true },
|
|
951
|
+
);
|
|
1151
952
|
|
|
1152
|
-
|
|
1153
|
-
|
|
953
|
+
await queue.enqueue(10, Buffer.from('low priority'));
|
|
954
|
+
await queue.enqueue(1, Buffer.from('high priority'));
|
|
1154
955
|
|
|
1155
|
-
|
|
1156
|
-
|
|
1157
|
-
|
|
956
|
+
// Dequeue highest priority task
|
|
957
|
+
const task = await queue.dequeue('worker-1');
|
|
958
|
+
if (task) {
|
|
959
|
+
console.log(`Processing: ${task.taskId}`);
|
|
960
|
+
console.log(`Priority: ${task.priority}, State: ${task.state}`);
|
|
961
|
+
|
|
962
|
+
try {
|
|
963
|
+
// Process task...
|
|
964
|
+
await queue.ack(task.taskId); // Mark completed
|
|
965
|
+
} catch (err) {
|
|
966
|
+
await queue.nack(task.taskId); // Re-queue for retry
|
|
967
|
+
}
|
|
968
|
+
}
|
|
1158
969
|
|
|
1159
|
-
|
|
1160
|
-
|
|
1161
|
-
|
|
970
|
+
// Queue statistics
|
|
971
|
+
const stats = await queue.stats();
|
|
972
|
+
console.log(`Pending: ${stats.pending}, Claimed: ${stats.claimed}`);
|
|
1162
973
|
|
|
1163
|
-
|
|
1164
|
-
|
|
1165
|
-
pass
|
|
974
|
+
// Purge completed/dead-lettered tasks
|
|
975
|
+
const purged = await queue.purge();
|
|
1166
976
|
```
|
|
1167
977
|
|
|
1168
978
|
---
|
|
1169
979
|
|
|
1170
|
-
|
|
1171
|
-
|
|
1172
|
-
Fluent API for building efficient queries with predicate pushdown.
|
|
1173
|
-
|
|
1174
|
-
### Basic Query
|
|
1175
|
-
|
|
1176
|
-
```typescript
|
|
1177
|
-
# Query with prefix and limit
|
|
1178
|
-
results = db.query("users/")
|
|
1179
|
-
.limit(10)
|
|
1180
|
-
.execute()
|
|
1181
|
-
|
|
1182
|
-
for key, value in results:
|
|
1183
|
-
print(f"{key.decode()}: {value.decode()}")
|
|
1184
|
-
```
|
|
1185
|
-
|
|
1186
|
-
### Filtered Query
|
|
1187
|
-
|
|
1188
|
-
```typescript
|
|
1189
|
-
from sochdb import CompareOp
|
|
1190
|
-
|
|
1191
|
-
# Query with filters
|
|
1192
|
-
results = db.query("orders/")
|
|
1193
|
-
.where("status", CompareOp.EQ, "pending")
|
|
1194
|
-
.where("amount", CompareOp.GT, 100)
|
|
1195
|
-
.order_by("created_at", descending=True)
|
|
1196
|
-
.limit(50)
|
|
1197
|
-
.offset(10)
|
|
1198
|
-
.execute()
|
|
1199
|
-
```
|
|
980
|
+
### Graph Operations
|
|
1200
981
|
|
|
1201
|
-
|
|
982
|
+
Graph overlay stored as key-value pairs in the Rust engine:
|
|
1202
983
|
|
|
1203
984
|
```typescript
|
|
1204
|
-
|
|
1205
|
-
results = db.query("users/")
|
|
1206
|
-
.select(["name", "email"]) # Only fetch these columns
|
|
1207
|
-
.where("active", CompareOp.EQ, True)
|
|
1208
|
-
.execute()
|
|
1209
|
-
```
|
|
1210
|
-
|
|
1211
|
-
### Aggregate Queries
|
|
985
|
+
const db = EmbeddedDatabase.open('./mydb');
|
|
1212
986
|
|
|
1213
|
-
|
|
1214
|
-
|
|
1215
|
-
|
|
1216
|
-
|
|
1217
|
-
.count()
|
|
1218
|
-
|
|
1219
|
-
# Sum (for numeric columns)
|
|
1220
|
-
total = db.query("orders/")
|
|
1221
|
-
.sum("amount")
|
|
1222
|
-
|
|
1223
|
-
# Group by
|
|
1224
|
-
results = db.query("orders/")
|
|
1225
|
-
.select(["status", "COUNT(*)", "SUM(amount)"])
|
|
1226
|
-
.group_by("status")
|
|
1227
|
-
.execute()
|
|
1228
|
-
```
|
|
987
|
+
// Add nodes
|
|
988
|
+
await db.addNode('social', 'alice', 'person', { name: 'Alice', role: 'engineer' });
|
|
989
|
+
await db.addNode('social', 'bob', 'person', { name: 'Bob', role: 'designer' });
|
|
990
|
+
await db.addNode('social', 'acme', 'company', { name: 'Acme Corp' });
|
|
1229
991
|
|
|
1230
|
-
|
|
992
|
+
// Add edges
|
|
993
|
+
await db.addEdge('social', 'alice', 'works_at', 'acme');
|
|
994
|
+
await db.addEdge('social', 'bob', 'works_at', 'acme');
|
|
995
|
+
await db.addEdge('social', 'alice', 'knows', 'bob');
|
|
1231
996
|
|
|
1232
|
-
|
|
1233
|
-
|
|
1234
|
-
|
|
1235
|
-
.where("role", CompareOp.EQ, "admin")
|
|
1236
|
-
.execute()
|
|
997
|
+
// Graph traversal (BFS or DFS)
|
|
998
|
+
const result = await db.traverse('social', 'alice', 2, 'bfs');
|
|
999
|
+
console.log(`Nodes: ${result.nodes.length}, Edges: ${result.edges.length}`);
|
|
1237
1000
|
```
|
|
1238
1001
|
|
|
1239
1002
|
---
|
|
1240
1003
|
|
|
1241
|
-
|
|
1242
|
-
|
|
1243
|
-
Iterate over keys with common prefixes efficiently.
|
|
1244
|
-
|
|
1245
|
-
### Safe Prefix Scan (Recommended)
|
|
1246
|
-
|
|
1247
|
-
```typescript
|
|
1248
|
-
# Requires minimum 2-byte prefix (prevents accidental full scans)
|
|
1249
|
-
for key, value in db.scan_prefix(b"users/"):
|
|
1250
|
-
print(f"{key.decode()}: {value.decode()}")
|
|
1251
|
-
|
|
1252
|
-
# Raises ValueError if prefix < 2 bytes
|
|
1253
|
-
```
|
|
1254
|
-
|
|
1255
|
-
### Unchecked Prefix Scan
|
|
1256
|
-
|
|
1257
|
-
```typescript
|
|
1258
|
-
# For internal operations needing empty/short prefixes
|
|
1259
|
-
# WARNING: Can cause expensive full-database scans
|
|
1260
|
-
for key, value in db.scan_prefix_unchecked(b""):
|
|
1261
|
-
print(f"All keys: {key}")
|
|
1262
|
-
```
|
|
1004
|
+
### Semantic Cache
|
|
1263
1005
|
|
|
1264
|
-
|
|
1006
|
+
Cache LLM responses with vector-similarity retrieval:
|
|
1265
1007
|
|
|
1266
1008
|
```typescript
|
|
1267
|
-
|
|
1268
|
-
# Performance: 10,000 results = 10 FFI calls vs 10,000 calls
|
|
1269
|
-
|
|
1270
|
-
for key, value in db.scan_batched(b"prefix/", batch_size=1000):
|
|
1271
|
-
process(key, value)
|
|
1272
|
-
```
|
|
1009
|
+
import { SemanticCache, EmbeddedDatabase } from '@sochdb/sochdb';
|
|
1273
1010
|
|
|
1274
|
-
|
|
1011
|
+
const db = EmbeddedDatabase.open('./cache_db');
|
|
1012
|
+
const cache = new SemanticCache(db, 'llm_responses');
|
|
1275
1013
|
|
|
1276
|
-
|
|
1277
|
-
|
|
1278
|
-
|
|
1279
|
-
|
|
1280
|
-
|
|
1014
|
+
// Store response with embedding
|
|
1015
|
+
await cache.put(
|
|
1016
|
+
'What is machine learning?',
|
|
1017
|
+
'Machine learning is a subset of AI...',
|
|
1018
|
+
[0.1, 0.2, 0.3 /* ... */], // embedding vector
|
|
1019
|
+
3600, // TTL seconds
|
|
1020
|
+
{ model: 'gpt-4', tokens: 42 },
|
|
1021
|
+
);
|
|
1281
1022
|
|
|
1282
|
-
|
|
1023
|
+
// Check cache before calling LLM
|
|
1024
|
+
const hit = await cache.get(queryEmbedding, 0.85);
|
|
1025
|
+
if (hit) {
|
|
1026
|
+
console.log(`Cache HIT (score: ${hit.score.toFixed(4)})`);
|
|
1027
|
+
console.log(`Response: ${hit.value}`);
|
|
1028
|
+
}
|
|
1283
1029
|
|
|
1284
|
-
|
|
1285
|
-
|
|
1286
|
-
|
|
1287
|
-
|
|
1030
|
+
// Cache management
|
|
1031
|
+
const stats = await cache.stats();
|
|
1032
|
+
console.log(`Hit rate: ${(stats.hitRate * 100).toFixed(1)}%`);
|
|
1033
|
+
await cache.purgeExpired();
|
|
1034
|
+
await cache.clear();
|
|
1288
1035
|
```
|
|
1289
1036
|
|
|
1290
|
-
|
|
1037
|
+
Convenience methods on `EmbeddedDatabase`:
|
|
1291
1038
|
|
|
1292
1039
|
```typescript
|
|
1293
|
-
|
|
1294
|
-
|
|
1295
|
-
|
|
1296
|
-
|
|
1297
|
-
# Memory is freed after processing each batch
|
|
1040
|
+
await db.cachePut('my_cache', 'key', 'value', [0.1, 0.2 /* ... */], 3600);
|
|
1041
|
+
const val = await db.cacheGet('my_cache', [0.1, 0.2 /* ... */], 0.85);
|
|
1042
|
+
await db.cacheDelete('my_cache', 'key');
|
|
1043
|
+
await db.cacheClear('my_cache');
|
|
1298
1044
|
```
|
|
1299
1045
|
|
|
1300
1046
|
---
|
|
1301
1047
|
|
|
1302
|
-
|
|
1303
|
-
|
|
1304
|
-
Execute SQL queries for familiar relational patterns.
|
|
1305
|
-
|
|
1306
|
-
### Creating Tables
|
|
1307
|
-
|
|
1308
|
-
```typescript
|
|
1309
|
-
db.execute_sql("""
|
|
1310
|
-
CREATE TABLE users (
|
|
1311
|
-
id INTEGER PRIMARY KEY,
|
|
1312
|
-
name TEXT NOT NULL,
|
|
1313
|
-
email TEXT UNIQUE,
|
|
1314
|
-
-        age INTEGER,
-        created_at TEXT DEFAULT CURRENT_TIMESTAMP
-    )
-""")
-
-db.execute_sql("""
-    CREATE TABLE posts (
-        id INTEGER PRIMARY KEY,
-        user_id INTEGER REFERENCES users(id),
-        title TEXT NOT NULL,
-        content TEXT,
-        likes INTEGER DEFAULT 0
-    )
-""")
-```
-
-### CRUD Operations
-
-```typescript
-# Insert
-db.execute_sql("""
-    INSERT INTO users (id, name, email, age)
-    VALUES (1, 'Alice', 'alice@example.com', 30)
-""")
-
-# Insert with parameters (prevents SQL injection)
-db.execute_sql(
-    "INSERT INTO users (id, name, email, age) VALUES (?, ?, ?, ?)",
-    params=[2, "Bob", "bob@example.com", 25]
-)
-
-# Select
-result = db.execute_sql("SELECT * FROM users WHERE age > 25")
-for row in result.rows:
-    print(row)  # {'id': 1, 'name': 'Alice', ...}
-
-# Update
-db.execute_sql("UPDATE users SET email = 'alice.new@example.com' WHERE id = 1")
-
-# Delete
-db.execute_sql("DELETE FROM users WHERE id = 2")
-```
-
-### Upsert (Insert or Update)
+### Context Query Builder

-
-# Insert or update on conflict
-db.execute_sql("""
-    INSERT INTO users (id, name, email) VALUES (1, 'Alice', 'alice@example.com')
-    ON CONFLICT (id) DO UPDATE SET
-        name = excluded.name,
-        email = excluded.email
-""")
-```
-
-### Query Results
+Token-budget-aware context assembly for LLM prompts:

 ```typescript
-from sochdb
-
-result = db.execute_sql("SELECT id, name FROM users")
+import { createContextBuilder, ContextOutputFormat, TruncationStrategy } from '@sochdb/sochdb';

-
-
-
+const result = createContextBuilder()
+  .forSession('session_123')
+  .withBudget(4096)
+  .setFormat(ContextOutputFormat.MARKDOWN)
+  .setTruncation(TruncationStrategy.PROPORTIONAL)
+  .literal('system', 100, 'You are a helpful assistant.')
+  .literal('user_query', 90, 'Tell me about SochDB')
+  .section('context')
+  .execute();

-
-
-
-# Convert to different formats
-df = result.to_dataframe()  # pandas DataFrame
-json_data = result.to_json()
+console.log(`Tokens used: ${result.tokenCount}`);
+console.log(result.text);
 ```
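The `PROPORTIONAL` truncation selected above shrinks every section by the same factor so the assembled context fits the token budget. A minimal stand-alone sketch of that idea (illustrative only; `Section` and `truncateProportional` are hypothetical names, not SDK exports):

```typescript
// Illustrative sketch of proportional truncation: each section keeps the
// same fraction of its tokens so the combined size fits the budget.
// `Section` and `truncateProportional` are hypothetical, not SDK exports.
interface Section { name: string; tokens: string[] }

function truncateProportional(sections: Section[], budget: number): Section[] {
  const total = sections.reduce((n, s) => n + s.tokens.length, 0);
  if (total <= budget) return sections; // already within budget
  const ratio = budget / total;
  return sections.map((s) => ({
    name: s.name,
    tokens: s.tokens.slice(0, Math.floor(s.tokens.length * ratio)),
  }));
}

const out = truncateProportional(
  [
    { name: 'system', tokens: Array(40).fill('t') },
    { name: 'history', tokens: Array(60).fill('t') },
  ],
  50,
);
// With a 50-token budget over 100 tokens, each section keeps half:
// 'system' keeps 20 tokens, 'history' keeps 30.
```

Compared with `TAIL_DROP`, which discards whole low-priority sections from the end, this keeps some of every section at reduced length.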

-
-
-```typescript
-# Create index
-db.execute_sql("CREATE INDEX idx_users_email ON users(email)")
-
-# Create unique index
-db.execute_sql("CREATE UNIQUE INDEX idx_users_email ON users(email)")
-
-# Drop index
-db.execute_sql("DROP INDEX IF EXISTS idx_users_email")
+---

-
-indexes = db.list_indexes("users")
-```
+### Memory System (LLM-Native)

-
+Complete memory for AI agents — extraction, consolidation, hybrid retrieval:

 ```typescript
-
-
-
-# Execute with different parameters
-young_active = stmt.execute([25, "active"])
-old_active = stmt.execute([50, "active"])
-
-# Close when done
-stmt.close()
-```
-
-### Dialect Support
+import {
+  EmbeddedDatabase, ExtractionPipeline, Consolidator, HybridRetriever, AllowedSet,
+} from '@sochdb/sochdb';

-
+const db = EmbeddedDatabase.open('./memory_db');

-
-
-
+// 1. Extract entities and relations
+const pipeline = ExtractionPipeline.fromDatabase(db, 'user_123', {
+  entityTypes: ['person', 'organization', 'location'],
+  minConfidence: 0.7,
+});
+const extracted = await pipeline.extractAndCommit(
+  'Alice joined Acme Corp in San Francisco.',
+  async (text) => ({
+    entities: [
+      { name: 'Alice', entity_type: 'person', confidence: 0.95 },
+      { name: 'Acme Corp', entity_type: 'organization', confidence: 0.9 },
+    ],
+    relations: [
+      { from_entity: 'Alice', relation_type: 'works_at', to_entity: 'Acme Corp' },
+    ],
+  }),
+);

-
-
+// 2. Consolidate facts
+const consolidator = Consolidator.fromDatabase(db, 'user_123');
+await consolidator.add({
+  fact: { subject: 'Alice', predicate: 'lives_in', object: 'SF' },
+  source: 'conversation_1',
+  confidence: 0.9,
+});
+await consolidator.consolidate();
+const facts = await consolidator.getCanonicalFacts();

-
-
+// 3. Hybrid retrieval (vector + BM25 with RRF fusion)
+const retriever = HybridRetriever.fromDatabase(db, 'user_123', 'documents');
+await retriever.indexDocuments([
+  { id: 'doc1', content: 'Alice is a software engineer', embedding: [0.1, 0.2 /* ... */] },
+]);
+const results = await retriever.retrieve('Who is Alice?', [0.1, 0.2 /* ... */], AllowedSet.allowAll(), 5);
 ```
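Reciprocal Rank Fusion (RRF), which the hybrid retriever uses to merge the vector and BM25 rankings, scores each document by summing `1 / (k + rank)` across the input rankings. A stand-alone illustrative version (not the SDK's internal implementation; `k = 60` is the conventional constant):

```typescript
// Illustrative RRF: score(d) = sum over rankings of 1 / (k + rank(d)),
// where rank is 1-based. Stand-alone sketch, not SDK-internal code.
function rrfFuse(rankings: string[][], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, i) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + i + 1));
    });
  }
  // Highest fused score first.
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}

// 'doc1' is ranked first by both lists, so it stays first after fusion.
const fused = rrfFuse([
  ['doc1', 'doc2', 'doc3'], // vector ranking
  ['doc1', 'doc3', 'doc2'], // BM25 ranking
]);
```

Because RRF only uses ranks, it needs no score normalization between the cosine-similarity and BM25 scales.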

 ---

-
-
-### Table Information
-
-```typescript
-# Get table schema
-schema = db.get_table_schema("users")
-print(f"Columns: {schema.columns}")
-print(f"Primary key: {schema.primary_key}")
-print(f"Indexes: {schema.indexes}")
-
-# List all tables
-tables = db.list_tables()
-
-# Drop table
-db.execute_sql("DROP TABLE IF EXISTS old_table")
-```
-
-### Index Policies
-
-Configure per-table indexing strategies for optimal performance:
+### Data Formats (TOON/JSON/Columnar)

 ```typescript
-
-
-
-
-
-
-
-
+// TOON format — compact, LLM-optimized
+const toon = EmbeddedDatabase.toToon('users', [
+  { id: 1, name: 'Alice', role: 'engineer' },
+  { id: 2, name: 'Bob', role: 'designer' },
+]);
+// [users]
+// id|name|role
+// 1|Alice|engineer
+// 2|Bob|designer

-
-
-
-# Get current policy
-policy = db.get_table_index_policy("users")
-print(f"Policy: {policy}")  # "scan_optimized"
+// JSON round-trip
+const json = EmbeddedDatabase.toJson('users', [{ id: 1, name: 'Alice' }]);
+const parsed = EmbeddedDatabase.fromJson(json);
 ```
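The TOON layout shown in the comments above is simple: a bracketed table name, a pipe-separated header row, then one pipe-separated line per record. A stand-alone encoder sketch reproducing that layout (illustrative only; `toToonSketch` is not the SDK's `toToon`):

```typescript
// Illustrative encoder for the TOON table layout shown above:
// "[name]" header line, pipe-separated column names, one row per record.
// `toToonSketch` is a hypothetical helper, not the SDK's toToon.
function toToonSketch(table: string, rows: Record<string, unknown>[]): string {
  if (rows.length === 0) return `[${table}]`;
  const cols = Object.keys(rows[0]);
  const lines = [`[${table}]`, cols.join('|')];
  for (const row of rows) {
    lines.push(cols.map((c) => String(row[c])).join('|'));
  }
  return lines.join('\n');
}

const encoded = toToonSketch('users', [
  { id: 1, name: 'Alice', role: 'engineer' },
  { id: 2, name: 'Bob', role: 'designer' },
]);
// [users]
// id|name|role
// 1|Alice|engineer
// 2|Bob|designer
```

The token savings over JSON come from emitting each column name once instead of repeating keys in every record.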

-### Policy Selection Guide
-
-| Policy | Insert | Scan | Best For |
-|--------|--------|------|----------|
-| `write_optimized` | O(1) | O(N) | High-write ingestion |
-| `balanced` | O(1) amortized | O(log K) | General use (default) |
-| `scan_optimized` | O(log N) | O(log N + K) | Analytics, read-heavy |
-| `append_only` | O(1) | O(N) | Time-series, logs |
-
 ---

-
-
-Organize data into logical namespaces for tenant isolation.
-
-### Creating Namespaces
-
-```typescript
-from sochdb import NamespaceConfig
-
-# Create namespace with metadata
-ns = db.create_namespace(
-    name="tenant_123",
-    display_name="Acme Corp",
-    labels={"tier": "premium", "region": "us-east"}
-)
-
-# Simple creation
-ns = db.create_namespace("tenant_456")
-```
-
-### Getting Namespaces
+### Policy Service & MCP

 ```typescript
-
-ns = db.namespace("tenant_123")
-
-# Get or create (idempotent)
-ns = db.get_or_create_namespace("tenant_123")
-
-# Check if exists
-exists = db.namespace_exists("tenant_123")
-```
+import { PolicyService, McpServer } from '@sochdb/sochdb';

-
+// Access control
+const policy = new PolicyService(db);
+policy.addRule({
+  name: 'Allow Write for Admins',
+  resource: '*', action: 'write', effect: 'allow',
+  condition: (ctx) => ctx.role === 'admin',
+});
+const allowed = policy.evaluate({ role: 'admin' }, 'documents', 'write');

-
-
-
-
-ns.put("config/key", b"value")
-
-# No need to specify namespace in each call
+// MCP — expose DB tools to AI agents
+const server = new McpServer(db, { name: 'my-db', version: '1.0.0' });
+server.registerDefaultTools();
+await server.start();
 ```

-
-
-```typescript
-# List all namespaces
-namespaces = db.list_namespaces()
-print(namespaces)  # ['tenant_123', 'tenant_456']
-
-# Get namespace info
-info = db.namespace_info("tenant_123")
-print(f"Created: {info['created_at']}")
-print(f"Labels: {info['labels']}")
-print(f"Size: {info['size_bytes']}")
-
-# Update labels
-db.update_namespace("tenant_123", labels={"tier": "enterprise"})
-
-# Delete namespace (WARNING: deletes all data in namespace)
-db.delete_namespace("old_tenant", force=True)
-```
+---

-###
+### Server Mode (IPC / gRPC)

 ```typescript
-
-
-# Operations automatically prefixed with namespace
-ns.put("users/alice", b"data")  # Actually: tenant_123/users/alice
-ns.get("users/alice")
-ns.delete("users/alice")
-
-# Scan within namespace
-for key, value in ns.scan("users/"):
-    print(key, value)  # Keys shown without namespace prefix
-```
+import { IpcClient, SochDBClient } from '@sochdb/sochdb';

-
+// IPC Client (Unix socket)
+const ipc = new IpcClient('/tmp/sochdb.sock');
+await ipc.connect();
+await ipc.put(Buffer.from('key'), Buffer.from('value'));
+const val = await ipc.get(Buffer.from('key'));
+await ipc.addNode('ns', 'alice', 'person', { name: 'Alice' });
+ipc.close();

-
-
-
-
-
-
-)
+// gRPC Client
+const grpc = new SochDBClient('localhost:50051');
+await grpc.createIndex('embeddings', 384, 'cosine');
+await grpc.insertVectors('embeddings', [1, 2], [[0.1, 0.2 /* ... */], [0.3, 0.4 /* ... */]]);
+const searchResults = await grpc.search('embeddings', [0.1, 0.2 /* ... */], 5);
+grpc.close();
 ```

 ---

-
-
-Collections store documents with embeddings for semantic search using HNSW.
-
-### Collection Configuration
-
-```typescript
-from sochdb import (
-    CollectionConfig,
-    DistanceMetric,
-    QuantizationType,
-)
-
-config = CollectionConfig(
-    name="documents",
-    dimension=384,                 # Embedding dimension (must match your model)
-    metric=DistanceMetric.COSINE,  # COSINE, EUCLIDEAN, DOT_PRODUCT
-    m=16,                          # HNSW M parameter (connections per node)
-    ef_construction=100,           # HNSW construction quality
-    ef_search=50,                  # HNSW search quality (higher = slower but better)
-    quantization=QuantizationType.NONE,  # NONE, SCALAR (int8), PQ (product quantization)
-    enable_hybrid_search=False,    # Enable BM25 + vector
-    content_field=None,            # Field for BM25 indexing
-)
-```
-
-### Creating Collections
+### Checkpoints & Statistics

 ```typescript
-
-
-# With config object
-collection = ns.create_collection(config)
+const db = EmbeddedDatabase.open('./mydb');

-
-
-    name="documents",
-    dimension=384,
-    metric=DistanceMetric.COSINE
-)
+const lsn = await db.checkpoint();
+console.log(`Checkpoint LSN: ${lsn}`);

-
-
+const stats = await db.stats();
+console.log(`Memtable: ${stats.memtableSizeBytes} bytes`);
+console.log(`WAL: ${stats.walSizeBytes} bytes`);
+console.log(`Active txns: ${stats.activeTransactions}`);
 ```

-
-
-```typescript
-# Single insert
-collection.insert(
-    id="doc1",
-    vector=[0.1, 0.2, ...],  # 384-dim float array
-    metadata={"title": "Introduction", "author": "Alice", "category": "tech"}
-)
-
-# Batch insert (more efficient for bulk loading)
-collection.insert_batch(
-    ids=["doc1", "doc2", "doc3"],
-    vectors=[[...], [...], [...]],  # List of vectors
-    metadata=[
-        {"title": "Doc 1"},
-        {"title": "Doc 2"},
-        {"title": "Doc 3"}
-    ]
-)
-
-# Multi-vector insert (multiple vectors per document, e.g., chunks)
-collection.insert_multi(
-    id="long_doc",
-    vectors=[[...], [...], [...]],  # Multiple vectors for same doc
-    metadata={"title": "Long Document"}
-)
-```
+---

-###
+### Error Handling

 ```typescript
-
-
-
-
-    vector=[0.15, 0.25, ...],    # Query vector
-    k=10,                        # Number of results
-    ef_search=100,               # Search quality (overrides collection default)
-    filter={"author": "Alice"},  # Metadata filter
-    min_score=0.7,               # Minimum similarity score
-    include_vectors=False,       # Include vectors in results
-    include_metadata=True,       # Include metadata in results
-)
-results = collection.search(request)
-
-# Convenience method (simpler)
-results = collection.vector_search(
-    vector=[0.15, 0.25, ...],
-    k=10,
-    filter={"author": "Alice"}
-)
-
-# Process results
-for result in results:
-    print(f"ID: {result.id}")
-    print(f"Score: {result.score:.4f}")  # Similarity score
-    print(f"Metadata: {result.metadata}")
-```
-
-### Metadata Filtering
+import {
+  SochDBError, TransactionError, DatabaseLockedError,
+  NamespaceNotFoundError, CollectionExistsError,
+} from '@sochdb/sochdb';

-
-
-
-
-
-
-
-
-
-
-
-
-filter={"category": {"$in": ["tech", "science"]}}  # In array
-filter={"category": {"$nin": ["sports"]}}          # Not in array
-
-# Logical operators
-filter={"$and": [{"author": "Alice"}, {"year": 2024}]}
-filter={"$or": [{"category": "tech"}, {"category": "science"}]}
-filter={"$not": {"author": "Bob"}}
-
-# Nested filters
-filter={
-    "$and": [
-        {"$or": [{"category": "tech"}, {"category": "science"}]},
-        {"year": {"$gte": 2020}}
-    ]
+try {
+  await db.withTransaction(async (txn) => {
+    await txn.put(Buffer.from('key'), Buffer.from('value'));
+  });
+} catch (err) {
+  if (err instanceof DatabaseLockedError) {
+    console.error('Database locked by another process');
+  } else if (err instanceof TransactionError) {
+    console.error('Transaction conflict (SSI) — retry');
+  } else if (err instanceof NamespaceNotFoundError) {
+    console.error('Namespace does not exist');
+  }
 }
 ```
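Since SSI conflicts surface as a retryable error, callers typically re-run the whole transaction. A generic, self-contained retry wrapper along these lines works (a hedged sketch; `retryOnConflict` is not an SDK export):

```typescript
// Generic retry wrapper for optimistic-concurrency conflicts.
// `retryOnConflict` is a hypothetical helper, not an SDK export.
async function retryOnConflict<T>(
  fn: () => Promise<T>,
  isConflict: (err: unknown) => boolean,
  maxAttempts = 3,
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await fn(); // run the whole transaction body
    } catch (err) {
      if (!isConflict(err) || attempt >= maxAttempts) throw err;
      // Small randomized backoff before re-running the transaction.
      await new Promise((r) => setTimeout(r, 10 * attempt + Math.random() * 10));
    }
  }
}
```

In SochDB terms, `fn` would wrap the `db.withTransaction(...)` call and `isConflict` would be `(e) => e instanceof TransactionError`, so only serialization conflicts are retried while other errors propagate immediately.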
|
|
1713
1223
|
|
|
1714
|
-
|
|
1715
|
-
|
|
1716
|
-
```typescript
|
|
1717
|
-
# Get collection
|
|
1718
|
-
collection = ns.get_collection("documents")
|
|
1719
|
-
# or
|
|
1720
|
-
collection = ns.collection("documents")
|
|
1721
|
-
|
|
1722
|
-
# List collections
|
|
1723
|
-
collections = ns.list_collections()
|
|
1724
|
-
|
|
1725
|
-
# Collection info
|
|
1726
|
-
info = collection.info()
|
|
1727
|
-
print(f"Name: {info['name']}")
|
|
1728
|
-
print(f"Dimension: {info['dimension']}")
|
|
1729
|
-
print(f"Count: {info['count']}")
|
|
1730
|
-
print(f"Metric: {info['metric']}")
|
|
1731
|
-
print(f"Index size: {info['index_size_bytes']}")
|
|
1732
|
-
|
|
1733
|
-
# Delete collection
|
|
1734
|
-
ns.delete_collection("old_collection")
|
|
1735
|
-
|
|
1736
|
-
# Individual document operations
|
|
1737
|
-
doc = collection.get("doc1")
|
|
1738
|
-
collection.delete("doc1")
|
|
1739
|
-
collection.update("doc1", metadata={"category": "updated"})
|
|
1740
|
-
count = collection.count()
|
|
1741
|
-
```
|
|
1224
|
+
---
|
|
1742
1225
|
|
|
1743
|
-
###
|
|
1226
|
+
### Configuration Reference
|
|
1744
1227
|
|
|
1745
1228
|
```typescript
|
|
1746
|
-
|
|
1747
|
-
|
|
1748
|
-
|
|
1749
|
-
|
|
1750
|
-
|
|
1751
|
-
|
|
1752
|
-
|
|
1753
|
-
# Product quantization - 32x memory reduction
|
|
1754
|
-
config = CollectionConfig(
|
|
1755
|
-
name="documents",
|
|
1756
|
-
dimension=768,
|
|
1757
|
-
quantization=QuantizationType.PQ,
|
|
1758
|
-
pq_num_subvectors=96, # 768/96 = 8 dimensions per subvector
|
|
1759
|
-
pq_num_centroids=256 # 8-bit codes
|
|
1760
|
-
)
|
|
1229
|
+
const db = EmbeddedDatabase.open('./mydb', {
|
|
1230
|
+
walEnabled: true,
|
|
1231
|
+
syncMode: 'full', // 'full' | 'normal' | 'off'
|
|
1232
|
+
memtableSizeBytes: 64 * 1024 * 1024,
|
|
1233
|
+
groupCommit: true,
|
|
1234
|
+
indexPolicy: 'balanced', // 'write_optimized' | 'balanced' | 'scan_optimized' | 'append_only'
|
|
1235
|
+
});
|
|
1761
1236
|
```
|
|
1762
1237
|
|
|
1763
|
-
|
|
1764
|
-
|
|
1765
|
-
|
|
1766
|
-
|
|
1767
|
-
Combine vector similarity with keyword matching for best results.
|
|
1238
|
+
| Environment Variable | Description |
|
|
1239
|
+
|---------------------|-------------|
|
|
1240
|
+
| `SOCHDB_LIB_PATH` | Custom path to native library |
|
|
1241
|
+
| `SOCHDB_LOG_LEVEL` | Log level (DEBUG, INFO, WARN, ERROR) |
|
|
1768
1242
|
|
|
1769
|
-
|
|
1243
|
+
---
|
|
1770
1244
|
|
|
1771
|
-
|
|
1772
|
-
config = CollectionConfig(
|
|
1773
|
-
name="articles",
|
|
1774
|
-
dimension=384,
|
|
1775
|
-
enable_hybrid_search=True, # Enable BM25 indexing
|
|
1776
|
-
content_field="text" # Field to index for BM25
|
|
1777
|
-
)
|
|
1778
|
-
collection = ns.create_collection(config)
|
|
1779
|
-
|
|
1780
|
-
# Insert with text content
|
|
1781
|
-
collection.insert(
|
|
1782
|
-
id="article1",
|
|
1783
|
-
vector=[...],
|
|
1784
|
-
metadata={
|
|
1785
|
-
"title": "Machine Learning Tutorial",
|
|
1786
|
-
"text": "This tutorial covers the basics of machine learning...",
|
|
1787
|
-
"category": "tech"
|
|
1788
|
-
}
|
|
1789
|
-
)
|
|
1790
|
-
```
|
|
1791
|
-
|
|
1792
|
-
### Keyword Search (BM25 Only)
|
|
1793
|
-
|
|
1794
|
-
```typescript
|
|
1795
|
-
results = collection.keyword_search(
|
|
1796
|
-
query="machine learning tutorial",
|
|
1797
|
-
k=10,
|
|
1798
|
-
filter={"category": "tech"}
|
|
1799
|
-
)
|
|
1800
|
-
```
|
|
1801
|
-
|
|
1802
|
-
### Hybrid Search (Vector + BM25)
|
|
1803
|
-
|
|
1804
|
-
```typescript
|
|
1805
|
-
# Combine vector and keyword search
|
|
1806
|
-
results = collection.hybrid_search(
|
|
1807
|
-
vector=[0.1, 0.2, ...], # Query embedding
|
|
1808
|
-
text_query="machine learning", # Keyword query
|
|
1809
|
-
k=10,
|
|
1810
|
-
alpha=0.7, # 0.0 = pure keyword, 1.0 = pure vector, 0.5 = balanced
|
|
1811
|
-
filter={"category": "tech"}
|
|
1812
|
-
)
|
|
1813
|
-
```
|
|
1814
|
-
|
|
1815
|
-
### Full SearchRequest for Hybrid
|
|
1816
|
-
|
|
1817
|
-
```typescript
|
|
1818
|
-
request = SearchRequest(
|
|
1819
|
-
vector=[0.1, 0.2, ...],
|
|
1820
|
-
text_query="machine learning",
|
|
1821
|
-
k=10,
|
|
1822
|
-
alpha=0.7, # Blend factor
|
|
1823
|
-
rrf_k=60.0, # RRF k parameter (Reciprocal Rank Fusion)
|
|
1824
|
-
filter={"category": "tech"},
|
|
1825
|
-
aggregate="max", # max | mean | first (for multi-vector docs)
|
|
1826
|
-
as_of="2024-01-01T00:00:00Z", # Time-travel query
|
|
1827
|
-
include_vectors=False,
|
|
1828
|
-
include_metadata=True,
|
|
1829
|
-
include_scores=True,
|
|
1830
|
-
)
|
|
1831
|
-
results = collection.search(request)
|
|
1832
|
-
|
|
1833
|
-
# Access detailed results
|
|
1834
|
-
print(f"Query time: {results.query_time_ms}ms")
|
|
1835
|
-
print(f"Total matches: {results.total_count}")
|
|
1836
|
-
print(f"Vector results: {results.vector_results}") # Results from vector search
|
|
1837
|
-
print(f"Keyword results: {results.keyword_results}") # Results from BM25
|
|
1838
|
-
print(f"Fused results: {results.fused_results}") # Combined results
|
|
1839
|
-
```
|
|
1840
|
-
|
|
1841
|
-
---
|
|
1842
|
-
|
|
1843
|
-
## 13. Graph Operations
|
|
1844
|
-
|
|
1845
|
-
Build and query knowledge graphs.
|
|
1846
|
-
|
|
1847
|
-
### Adding Nodes
|
|
1848
|
-
|
|
1849
|
-
```typescript
|
|
1850
|
-
# Add a node
|
|
1851
|
-
db.add_node(
|
|
1852
|
-
namespace="default",
|
|
1853
|
-
node_id="alice",
|
|
1854
|
-
node_type="person",
|
|
1855
|
-
properties={"role": "engineer", "team": "ml", "level": "senior"}
|
|
1856
|
-
)
|
|
1857
|
-
|
|
1858
|
-
db.add_node("default", "project_x", "project", {"status": "active", "priority": "high"})
|
|
1859
|
-
db.add_node("default", "bob", "person", {"role": "manager", "team": "ml"})
|
|
1860
|
-
```
|
|
1861
|
-
|
|
1862
|
-
### Adding Edges
|
|
1863
|
-
|
|
1864
|
-
```typescript
|
|
1865
|
-
# Add directed edge
|
|
1866
|
-
db.add_edge(
|
|
1867
|
-
namespace="default",
|
|
1868
|
-
from_id="alice",
|
|
1869
|
-
edge_type="works_on",
|
|
1870
|
-
to_id="project_x",
|
|
1871
|
-
properties={"role": "lead", "since": "2024-01"}
|
|
1872
|
-
)
|
|
1873
|
-
|
|
1874
|
-
db.add_edge("default", "alice", "reports_to", "bob")
|
|
1875
|
-
db.add_edge("default", "bob", "manages", "project_x")
|
|
1876
|
-
```
|
|
1877
|
-
|
|
1878
|
-
### Graph Traversal
|
|
1879
|
-
|
|
1880
|
-
```typescript
|
|
1881
|
-
# BFS traversal from a starting node
|
|
1882
|
-
nodes, edges = db.traverse(
|
|
1883
|
-
namespace="default",
|
|
1884
|
-
start_node="alice",
|
|
1885
|
-
max_depth=3,
|
|
1886
|
-
order="bfs" # "bfs" or "dfs"
|
|
1887
|
-
)
|
|
1888
|
-
|
|
1889
|
-
for node in nodes:
|
|
1890
|
-
print(f"Node: {node['id']} ({node['node_type']})")
|
|
1891
|
-
print(f" Properties: {node['properties']}")
|
|
1892
|
-
|
|
1893
|
-
for edge in edges:
|
|
1894
|
-
print(f"{edge['from_id']} --{edge['edge_type']}--> {edge['to_id']}")
|
|
1895
|
-
```
|
|
1896
|
-
|
|
1897
|
-
### Filtered Traversal
|
|
1898
|
-
|
|
1899
|
-
```typescript
|
|
1900
|
-
# Traverse with filters
|
|
1901
|
-
nodes, edges = db.traverse(
|
|
1902
|
-
namespace="default",
|
|
1903
|
-
start_node="alice",
|
|
1904
|
-
max_depth=2,
|
|
1905
|
-
edge_types=["works_on", "reports_to"], # Only follow these edge types
|
|
1906
|
-
node_types=["person", "project"], # Only include these node types
|
|
1907
|
-
node_filter={"team": "ml"} # Filter nodes by properties
|
|
1908
|
-
)
|
|
1909
|
-
```
|
|
1910
|
-
|
|
1911
|
-
### Graph Queries
|
|
1912
|
-
|
|
1913
|
-
```typescript
|
|
1914
|
-
# Find shortest path
|
|
1915
|
-
path = db.find_path(
|
|
1916
|
-
namespace="default",
|
|
1917
|
-
from_id="alice",
|
|
1918
|
-
to_id="project_y",
|
|
1919
|
-
max_depth=5
|
|
1920
|
-
)
|
|
1921
|
-
|
|
1922
|
-
# Get neighbors
|
|
1923
|
-
neighbors = db.get_neighbors(
|
|
1924
|
-
namespace="default",
|
|
1925
|
-
node_id="alice",
|
|
1926
|
-
direction="outgoing" # "outgoing", "incoming", "both"
|
|
1927
|
-
)
|
|
1928
|
-
|
|
1929
|
-
# Get specific edge
|
|
1930
|
-
edge = db.get_edge("default", "alice", "works_on", "project_x")
|
|
1931
|
-
|
|
1932
|
-
# Delete node (and all connected edges)
|
|
1933
|
-
db.delete_node("default", "old_node")
|
|
1934
|
-
|
|
1935
|
-
# Delete edge
|
|
1936
|
-
db.delete_edge("default", "alice", "works_on", "project_old")
|
|
1937
|
-
```
|
|
1938
|
-
|
|
1939
|
-
---
|
|
1940
|
-
|
|
1941
|
-
## 14. Temporal Graph (Time-Travel)
|
|
1942
|
-
|
|
1943
|
-
Track state changes over time with temporal edges.
|
|
1944
|
-
|
|
1945
|
-
### Adding Temporal Edges
|
|
1946
|
-
|
|
1947
|
-
```typescript
|
|
1948
|
-
import time
|
|
1949
|
-
|
|
1950
|
-
now = int(time.time() * 1000) # milliseconds since epoch
|
|
1951
|
-
one_hour = 60 * 60 * 1000
|
|
1952
|
-
|
|
1953
|
-
# Record: Door was open from 10:00 to 11:00
|
|
1954
|
-
db.add_temporal_edge(
|
|
1955
|
-
namespace="smart_home",
|
|
1956
|
-
from_id="door_front",
|
|
1957
|
-
edge_type="STATE",
|
|
1958
|
-
to_id="open",
|
|
1959
|
-
valid_from=now - one_hour, # Start time (ms)
|
|
1960
|
-
valid_until=now, # End time (ms)
|
|
1961
|
-
properties={"sensor": "motion_1", "confidence": 0.95}
|
|
1962
|
-
)
|
|
1963
|
-
|
|
1964
|
-
# Record: Light is currently on (no end time yet)
|
|
1965
|
-
db.add_temporal_edge(
|
|
1966
|
-
namespace="smart_home",
|
|
1967
|
-
from_id="light_living",
|
|
1968
|
-
edge_type="STATE",
|
|
1969
|
-
to_id="on",
|
|
1970
|
-
valid_from=now,
|
|
1971
|
-
valid_until=0, # 0 = still valid (no end time)
|
|
1972
|
-
properties={"brightness": "80%", "color": "warm"}
|
|
1973
|
-
)
|
|
1974
|
-
```
|
|
1975
|
-
|
|
1976
|
-
### Time-Travel Queries
|
|
1977
|
-
|
|
1978
|
-
```typescript
|
|
1979
|
-
# Query modes:
|
|
1980
|
-
# - "CURRENT": Edges valid right now
|
|
1981
|
-
# - "POINT_IN_TIME": Edges valid at specific timestamp
|
|
1982
|
-
# - "RANGE": All edges within a time range
|
|
1983
|
-
|
|
1984
|
-
# What is the current state?
|
|
1985
|
-
edges = db.query_temporal_graph(
|
|
1986
|
-
namespace="smart_home",
|
|
1987
|
-
node_id="door_front",
|
|
1988
|
-
mode="CURRENT",
|
|
1989
|
-
edge_type="STATE"
|
|
1990
|
-
)
|
|
1991
|
-
current_state = edges[0]["to_id"] if edges else "unknown"
|
|
1992
|
-
|
|
1993
|
-
# Was the door open 1.5 hours ago?
|
|
1994
|
-
edges = db.query_temporal_graph(
|
|
1995
|
-
namespace="smart_home",
|
|
1996
|
-
node_id="door_front",
|
|
1997
|
-
mode="POINT_IN_TIME",
|
|
1998
|
-
timestamp=now - int(1.5 * 60 * 60 * 1000)
|
|
1999
|
-
)
|
|
2000
|
-
was_open = any(e["to_id"] == "open" for e in edges)
|
|
2001
|
-
|
|
2002
|
-
# All state changes in last hour
|
|
2003
|
-
edges = db.query_temporal_graph(
|
|
2004
|
-
namespace="smart_home",
|
|
2005
|
-
node_id="door_front",
|
|
2006
|
-
mode="RANGE",
|
|
2007
|
-
start_time=now - one_hour,
|
|
2008
|
-
end_time=now
|
|
2009
|
-
)
|
|
2010
|
-
for edge in edges:
|
|
2011
|
-
print(f"State: {edge['to_id']} from {edge['valid_from']} to {edge['valid_until']}")
|
|
2012
|
-
```
|
|
2013
|
-
|
|
2014
|
-
### End a Temporal Edge
|
|
2015
|
-
|
|
2016
|
-
```typescript
|
|
2017
|
-
# Close the current "on" state
|
|
2018
|
-
db.end_temporal_edge(
|
|
2019
|
-
namespace="smart_home",
|
|
2020
|
-
from_id="light_living",
|
|
2021
|
-
edge_type="STATE",
|
|
2022
|
-
to_id="on",
|
|
2023
|
-
end_time=int(time.time() * 1000)
|
|
2024
|
-
)
|
|
2025
|
-
```
|
|
2026
|
-
|
|
2027
|
-
---
|
|
2028
|
-
|
|
2029
|
-
## 15. Semantic Cache
|
|
2030
|
-
|
|
2031
|
-
Cache LLM responses with similarity-based retrieval for cost savings.
|
|
2032
|
-
|
|
2033
|
-
### Storing Cached Responses
|
|
2034
|
-
|
|
2035
|
-
```typescript
|
|
2036
|
-
# Store response with embedding
|
|
2037
|
-
db.cache_put(
|
|
2038
|
-
cache_name="llm_responses",
|
|
2039
|
-
key="What is Python?", # Original query (for display/debugging)
|
|
2040
|
-
value="Python is a high-level programming language...",
|
|
2041
|
-
embedding=[0.1, 0.2, ...], # Query embedding (384-dim)
|
|
2042
|
-
ttl_seconds=3600, # Expire in 1 hour (0 = no expiry)
|
|
2043
|
-
metadata={"model": "claude-3", "tokens": 150}
|
|
2044
|
-
)
|
|
2045
|
-
```
|
|
2046
|
-
|
|
2047
|
-
### Cache Lookup
|
|
2048
|
-
|
|
2049
|
-
```typescript
|
|
2050
|
-
# Check cache before calling LLM
|
|
2051
|
-
cached = db.cache_get(
|
|
2052
|
-
cache_name="llm_responses",
|
|
2053
|
-
query_embedding=[0.12, 0.18, ...], # Embed the new query
|
|
2054
|
-
threshold=0.85 # Cosine similarity threshold
|
|
2055
|
-
)
|
|
2056
|
-
|
|
2057
|
-
if cached:
|
|
2058
|
-
print(f"Cache HIT!")
|
|
2059
|
-
print(f"Original query: {cached['key']}")
|
|
2060
|
-
print(f"Response: {cached['value']}")
|
|
2061
|
-
print(f"Similarity: {cached['score']:.4f}")
|
|
2062
|
-
else:
|
|
2063
|
-
print("Cache MISS - calling LLM...")
|
|
2064
|
-
# Call LLM and cache the result
|
|
2065
|
-
```
|
|
2066
|
-
|
|
2067
|
-
### Cache Management
|
|
2068
|
-
|
|
2069
|
-
```typescript
|
|
2070
|
-
# Delete specific entry
|
|
2071
|
-
db.cache_delete("llm_responses", key="What is Python?")
|
|
2072
|
-
|
|
2073
|
-
# Clear entire cache
|
|
2074
|
-
db.cache_clear("llm_responses")
|
|
2075
|
-
|
|
2076
|
-
# Get cache statistics
|
|
2077
|
-
stats = db.cache_stats("llm_responses")
|
|
2078
|
-
print(f"Total entries: {stats['count']}")
|
|
2079
|
-
print(f"Hit rate: {stats['hit_rate']:.2%}")
|
|
2080
|
-
print(f"Memory usage: {stats['size_bytes']}")
|
|
2081
|
-
```
|
|
2082
|
-
|
|
2083
|
-
### Full Usage Pattern
|
|
2084
|
-
|
|
2085
|
-
```typescript
|
|
2086
|
-
def get_llm_response(query: str, embed_fn, llm_fn):
|
|
2087
|
-
"""Get response from cache or LLM."""
|
|
2088
|
-
query_embedding = embed_fn(query)
|
|
2089
|
-
|
|
2090
|
-
# Try cache first
|
|
2091
|
-
cached = db.cache_get(
|
|
2092
|
-
cache_name="llm_responses",
|
|
2093
|
-
query_embedding=query_embedding,
|
|
2094
|
-
threshold=0.90
|
|
2095
|
-
)
|
|
2096
|
-
|
|
2097
|
-
if cached:
|
|
2098
|
-
return cached['value']
|
|
2099
|
-
|
|
2100
|
-
# Cache miss - call LLM
|
|
2101
|
-
response = llm_fn(query)
|
|
2102
|
-
|
|
2103
|
-
# Store in cache
|
|
2104
|
-
db.cache_put(
|
|
2105
|
-
cache_name="llm_responses",
|
|
2106
|
-
key=query,
|
|
2107
|
-
value=response,
|
|
2108
|
-
embedding=query_embedding,
|
|
2109
|
-
ttl_seconds=86400 # 24 hours
|
|
2110
|
-
)
|
|
2111
|
-
|
|
2112
|
-
return response
|
|
2113
|
-
```
|
|
2114
|
-
|
|
2115
|
-
---
|
|
2116
|
-
|
|
2117
|
-
## 16. Context Query Builder (LLM Optimization)
|
|
2118
|
-
|
|
2119
|
-
Assemble LLM context with token budgeting and priority-based truncation.
|
|
2120
|
-
|
|
2121
|
-
### Basic Context Query
|
|
2122
|
-
|
|
2123
|
-
```typescript
|
|
2124
|
-
from sochdb import ContextQueryBuilder, ContextFormat, TruncationStrategy
|
|
2125
|
-
|
|
2126
|
-
# Build context for LLM
|
|
2127
|
-
context = ContextQueryBuilder() \
|
|
2128
|
-
.for_session("session_123") \
|
|
2129
|
-
.with_budget(4096) \
|
|
2130
|
-
.format(ContextFormat.TOON) \
|
|
2131
|
-
.literal("SYSTEM", priority=0, text="You are a helpful assistant.") \
|
|
2132
|
-
.section("USER_PROFILE", priority=1) \
|
|
2133
|
-
.get("user.profile.{name, preferences}") \
|
|
2134
|
-
.done() \
|
|
2135
|
-
.section("HISTORY", priority=2) \
|
|
2136
|
-
.last(10, "messages") \
|
|
2137
|
-
.where_eq("session_id", "session_123") \
|
|
2138
|
-
.done() \
|
|
2139
|
-
.section("KNOWLEDGE", priority=3) \
|
|
2140
|
-
.search("documents", "$query_embedding", k=5) \
|
|
2141
|
-
.done() \
|
|
2142
|
-
.execute()
|
|
2143
|
-
|
|
2144
|
-
print(f"Token count: {context.token_count}")
|
|
2145
|
-
print(f"Context:\n{context.text}")
|
|
2146
|
-
```
|
|
2147
|
-
|
|
2148
|
-
### Section Types

| Type | Method | Description |
|------|--------|-------------|
| `literal` | `.literal(name, priority, text)` | Static text content |
| `get` | `.get(path)` | Fetch specific data by path |
| `last` | `.last(n, table)` | Most recent N records from a table |
| `search` | `.search(collection, embedding, k)` | Vector similarity search |
| `sql` | `.sql(query)` | SQL query results |

### Truncation Strategies

```python
# Drop from end (keep beginning) - default
.truncation(TruncationStrategy.TAIL_DROP)

# Drop from beginning (keep end)
.truncation(TruncationStrategy.HEAD_DROP)

# Proportionally truncate across sections
.truncation(TruncationStrategy.PROPORTIONAL)

# Fail if budget exceeded
.truncation(TruncationStrategy.STRICT)
```

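To make the `PROPORTIONAL` strategy concrete, here is an illustrative sketch (not the SochDB implementation) of how a budget can be split so every section shrinks by the same factor:

```python
# Illustrative sketch: proportional truncation scales each section's token
# allotment by the same factor so the total fits the budget.
def proportional_truncate(sections, budget):
    """sections: list of (name, token_count) pairs -> dict of allotted tokens."""
    total = sum(tokens for _, tokens in sections)
    if total <= budget:
        return {name: tokens for name, tokens in sections}
    scale = budget / total  # same shrink factor for every section
    return {name: int(tokens * scale) for name, tokens in sections}

allotted = proportional_truncate(
    [("SYSTEM", 200), ("HISTORY", 3000), ("KNOWLEDGE", 5000)], budget=4096
)
print(allotted)
```

Flooring each allotment keeps the total at or under the budget; `TAIL_DROP` and `HEAD_DROP` instead cut whole sections from one end in priority order.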
### Variables and Bindings

```python
from sochdb import ContextValue

context = ContextQueryBuilder() \
    .for_session("session_123") \
    .set_var("query_embedding", ContextValue.Embedding([0.1, 0.2, ...])) \
    .set_var("user_id", ContextValue.String("user_456")) \
    .section("KNOWLEDGE", priority=2) \
        .search("documents", "$query_embedding", k=5) \
        .done() \
    .execute()
```

### Output Formats

```python
# TOON format (40-60% fewer tokens)
.format(ContextFormat.TOON)

# JSON format
.format(ContextFormat.JSON)

# Markdown format (human-readable)
.format(ContextFormat.MARKDOWN)

# Plain text
.format(ContextFormat.TEXT)
```

## Session Management (Agent Context)

Stateful session management for agentic use cases with permissions, sandboxing, audit logging, and budget tracking.

### Session Overview

```
Agent session abc123:
  cwd: /agents/abc123
  vars: $model = "gpt-4", $budget = 1000
  permissions: fs:rw, db:rw, calc:*
  audit: [read /data/users, write /agents/abc123/cache]
```

### Creating Sessions

```python
from sochdb import SessionManager, AgentContext
from datetime import timedelta

# Create session manager with idle timeout
session_mgr = SessionManager(idle_timeout=timedelta(hours=1))

# Create a new session
session = session_mgr.create_session("session_abc123")

# Get existing session
session = session_mgr.get_session("session_abc123")

# Get or create (idempotent)
session = session_mgr.get_or_create("session_abc123")

# Remove session
session_mgr.remove_session("session_abc123")

# Cleanup expired sessions
removed_count = session_mgr.cleanup_expired()

# Get active session count
count = session_mgr.session_count()
```

### Agent Context

```python
from sochdb import AgentContext, ContextValue

# Create agent context
ctx = AgentContext("session_abc123")
print(f"Session ID: {ctx.session_id}")
print(f"Working dir: {ctx.working_dir}")  # /agents/session_abc123

# Create with custom working directory
ctx = AgentContext.with_working_dir("session_abc123", "/custom/path")

# Create with full permissions (trusted agents)
ctx = AgentContext.with_full_permissions("session_abc123")
```

### Session Variables

```python
# Set variables
ctx.set_var("model", ContextValue.String("gpt-4"))
ctx.set_var("budget", ContextValue.Number(1000.0))
ctx.set_var("debug", ContextValue.Bool(True))
ctx.set_var("tags", ContextValue.List([
    ContextValue.String("ml"),
    ContextValue.String("production")
]))

# Get variables
model = ctx.get_var("model")  # Returns ContextValue or None
budget = ctx.get_var("budget")

# Peek (read-only, no audit)
value = ctx.peek_var("model")

# Variable substitution in strings
text = ctx.substitute_vars("Using $model with budget $budget")
# Result: "Using gpt-4 with budget 1000"
```

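The `$name` substitution shown above behaves like a simple template expansion. A minimal standalone sketch (illustrative only, not the SDK's internals):

```python
import re

# Illustrative sketch of $variable substitution: replace each $name with the
# stored value; unknown names are left untouched.
def substitute_vars(text, variables):
    def repl(match):
        value = variables.get(match.group(1))
        return match.group(0) if value is None else str(value)
    return re.sub(r"\$([A-Za-z_][A-Za-z_0-9]*)", repl, text)

session_vars = {"model": "gpt-4", "budget": 1000}
print(substitute_vars("Using $model with budget $budget", session_vars))
# → Using gpt-4 with budget 1000
```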
### Context Value Types

```python
from sochdb import ContextValue

# String
ContextValue.String("hello")

# Number (float)
ContextValue.Number(42.5)

# Boolean
ContextValue.Bool(True)

# List
ContextValue.List([
    ContextValue.String("a"),
    ContextValue.Number(1.0)
])

# Object (dict)
ContextValue.Object({
    "key": ContextValue.String("value"),
    "count": ContextValue.Number(10.0)
})

# Null
ContextValue.Null()
```

### Permissions

```python
from sochdb import (
    AgentPermissions,
    FsPermissions,
    DbPermissions,
    NetworkPermissions,
    AuditOperation,
    ContextError
)

# Configure permissions
ctx.permissions = AgentPermissions(
    filesystem=FsPermissions(
        read=True,
        write=True,
        mkdir=True,
        delete=False,
        allowed_paths=["/agents/session_abc123", "/shared/data"]
    ),
    database=DbPermissions(
        read=True,
        write=True,
        create=False,
        drop=False,
        allowed_tables=["user_*", "cache_*"]  # Pattern matching
    ),
    calculator=True,
    network=NetworkPermissions(
        http=True,
        allowed_domains=["api.example.com", "*.internal.net"]
    )
)

# Check permissions before operations
try:
    ctx.check_fs_permission("/agents/session_abc123/data.json", AuditOperation.FS_READ)
    # Permission granted
except ContextError as e:
    print(f"Permission denied: {e}")

try:
    ctx.check_db_permission("user_profiles", AuditOperation.DB_QUERY)
    # Permission granted
except ContextError as e:
    print(f"Permission denied: {e}")
```

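Patterns like `"user_*"` in `allowed_tables` follow glob-style matching. A sketch of how such a check can work (illustrative; the SDK performs this internally):

```python
from fnmatch import fnmatch

# Illustrative sketch: a table is allowed if it matches any glob pattern.
def table_allowed(table, allowed_patterns):
    return any(fnmatch(table, pattern) for pattern in allowed_patterns)

allowed = ["user_*", "cache_*"]
print(table_allowed("user_profiles", allowed))  # True
print(table_allowed("secrets", allowed))        # False
```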
### Budget Tracking

```python
from sochdb import OperationBudget

# Configure budget limits
ctx.budget = OperationBudget(
    max_tokens=100000,    # Maximum tokens (input + output)
    max_cost=5000,        # Maximum cost in cents ($50.00)
    max_operations=10000  # Maximum operation count
)

# Consume budget (called automatically by operations)
try:
    ctx.consume_budget(tokens=500, cost=10)  # 500 tokens, $0.10
except ContextError as e:
    if "Budget exceeded" in str(e):
        print("Budget limit reached!")

# Check budget status
print(f"Tokens used: {ctx.budget.tokens_used}/{ctx.budget.max_tokens}")
print(f"Cost used: ${ctx.budget.cost_used / 100:.2f}/${ctx.budget.max_cost / 100:.2f}")
print(f"Operations: {ctx.budget.operations_used}/{ctx.budget.max_operations}")
```

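The bookkeeping behind `consume_budget` amounts to checking every limit before committing any counter. A minimal sketch with the same field names as `OperationBudget` (illustrative only, not the SDK's implementation):

```python
# Illustrative sketch of budget accounting: reject an operation if any
# limit (tokens, cost, operation count) would be exceeded, otherwise
# update all three counters atomically.
class BudgetExceeded(Exception):
    pass

class Budget:
    def __init__(self, max_tokens, max_cost, max_operations):
        self.max_tokens = max_tokens
        self.max_cost = max_cost
        self.max_operations = max_operations
        self.tokens_used = self.cost_used = self.operations_used = 0

    def consume(self, tokens, cost):
        if (self.tokens_used + tokens > self.max_tokens
                or self.cost_used + cost > self.max_cost
                or self.operations_used + 1 > self.max_operations):
            raise BudgetExceeded("Budget exceeded")
        self.tokens_used += tokens
        self.cost_used += cost
        self.operations_used += 1

b = Budget(max_tokens=1000, max_cost=100, max_operations=10)
b.consume(tokens=500, cost=10)
print(b.tokens_used)  # 500
```

Checking all limits before mutating any counter keeps the budget consistent even when one dimension is exhausted.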
### Session Transactions

```python
from sochdb import ResourceType

# Begin transaction within session
ctx.begin_transaction(tx_id=12345)

# Create savepoint
ctx.savepoint("before_update")

# Record pending writes (for rollback)
ctx.record_pending_write(
    resource_type=ResourceType.FILE,
    resource_key="/agents/session_abc123/data.json",
    original_value=b'{"old": "data"}'
)

# Commit transaction
ctx.commit_transaction()

# Or rollback
pending_writes = ctx.rollback_transaction()
for write in pending_writes:
    print(f"Rolling back: {write.resource_key}")
    # Restore original_value
```

### Path Resolution

```python
# Paths are resolved relative to the working directory
ctx = AgentContext.with_working_dir("session_abc123", "/home/agent")

# Relative paths
resolved = ctx.resolve_path("data.json")  # /home/agent/data.json

# Absolute paths pass through
resolved = ctx.resolve_path("/absolute/path")  # /absolute/path
```

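The resolution rule above is simple: join relative paths onto the working directory, pass absolute paths through unchanged. A standalone sketch of that semantics (illustrative, not the SDK's code):

```python
from pathlib import PurePosixPath

# Illustrative sketch of resolve_path: relative paths are joined onto the
# working directory; absolute paths are returned as-is.
def resolve_path(working_dir, path):
    p = PurePosixPath(path)
    return str(p) if p.is_absolute() else str(PurePosixPath(working_dir) / p)

print(resolve_path("/home/agent", "data.json"))       # /home/agent/data.json
print(resolve_path("/home/agent", "/absolute/path"))  # /absolute/path
```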
### Audit Trail

```python
# All operations are automatically logged
# Audit entry includes: timestamp, operation, resource, result, metadata

# Export audit log
audit_log = ctx.export_audit()
for entry in audit_log:
    print(f"[{entry['timestamp']}] {entry['operation']}: {entry['resource']} -> {entry['result']}")

# Example output:
# [1705312345] var.set: model -> success
# [1705312346] fs.read: /data/config.json -> success
# [1705312347] db.query: users -> success
# [1705312348] fs.write: /forbidden/file -> denied:path not in allowed paths
```

### Audit Operations

```python
from sochdb import AuditOperation

# Filesystem operations
AuditOperation.FS_READ
AuditOperation.FS_WRITE
AuditOperation.FS_MKDIR
AuditOperation.FS_DELETE
AuditOperation.FS_LIST

# Database operations
AuditOperation.DB_QUERY
AuditOperation.DB_INSERT
AuditOperation.DB_UPDATE
AuditOperation.DB_DELETE

# Other operations
AuditOperation.CALCULATE
AuditOperation.VAR_SET
AuditOperation.VAR_GET
AuditOperation.TX_BEGIN
AuditOperation.TX_COMMIT
AuditOperation.TX_ROLLBACK
```

### Tool Registry

```python
from sochdb import ToolDefinition, ToolCallRecord
from datetime import datetime

# Register tools available to the agent
ctx.register_tool(ToolDefinition(
    name="search_documents",
    description="Search documents by semantic similarity",
    parameters_schema='{"type": "object", "properties": {"query": {"type": "string"}}}',
    requires_confirmation=False
))

ctx.register_tool(ToolDefinition(
    name="delete_file",
    description="Delete a file from the filesystem",
    parameters_schema='{"type": "object", "properties": {"path": {"type": "string"}}}',
    requires_confirmation=True  # Requires user confirmation
))

# Record tool calls
ctx.record_tool_call(ToolCallRecord(
    call_id="call_001",
    tool_name="search_documents",
    arguments='{"query": "machine learning"}',
    result='[{"id": "doc1", "score": 0.95}]',
    error=None,
    timestamp=datetime.now()
))

# Access tool call history
for call in ctx.tool_calls:
    print(f"{call.tool_name}: {call.result or call.error}")
```

### Session Lifecycle

```python
# Check session age
age = ctx.age()
print(f"Session age: {age}")

# Check idle time
idle = ctx.idle_time()
print(f"Idle time: {idle}")

# Check if expired
if ctx.is_expired(idle_timeout=timedelta(hours=1)):
    print("Session has expired!")
```

### Complete Session Example

```python
from sochdb import (
    SessionManager, AgentContext, ContextValue,
    AgentPermissions, FsPermissions, DbPermissions,
    OperationBudget, ToolDefinition, AuditOperation
)
from datetime import timedelta

# Initialize session manager
session_mgr = SessionManager(idle_timeout=timedelta(hours=2))

# Create session for an agent
session_id = "agent_session_12345"
ctx = session_mgr.get_or_create(session_id)

# Configure the agent
ctx.permissions = AgentPermissions(
    filesystem=FsPermissions(
        read=True,
        write=True,
        allowed_paths=[f"/agents/{session_id}", "/shared"]
    ),
    database=DbPermissions(
        read=True,
        write=True,
        allowed_tables=["documents", "cache_*"]
    ),
    calculator=True
)

ctx.budget = OperationBudget(
    max_tokens=50000,
    max_cost=1000,  # $10.00
    max_operations=1000
)

# Set initial variables
ctx.set_var("model", ContextValue.String("claude-3-sonnet"))
ctx.set_var("temperature", ContextValue.Number(0.7))

# Register available tools
ctx.register_tool(ToolDefinition(
    name="vector_search",
    description="Search vectors by similarity",
    parameters_schema='{"type": "object", "properties": {"query": {"type": "string"}, "k": {"type": "integer"}}}',
    requires_confirmation=False
))

# Perform operations with permission checks
def safe_read_file(ctx: AgentContext, path: str) -> bytes:
    resolved = ctx.resolve_path(path)
    ctx.check_fs_permission(resolved, AuditOperation.FS_READ)
    ctx.consume_budget(tokens=100, cost=1)
    # ... actual file read ...
    return b"file contents"

def safe_db_query(ctx: AgentContext, table: str, query: str):
    ctx.check_db_permission(table, AuditOperation.DB_QUERY)
    ctx.consume_budget(tokens=500, cost=5)
    # ... actual query ...
    return []

# Use in transaction
ctx.begin_transaction(tx_id=1)
try:
    # Operations here...
    ctx.commit_transaction()
except Exception:
    ctx.rollback_transaction()
    raise

# Export audit trail for debugging/compliance
audit = ctx.export_audit()
print(f"Session performed {len(audit)} operations")

# Cleanup
session_mgr.cleanup_expired()
```

### Session Errors

```python
from sochdb import ContextError

try:
    ctx.check_fs_permission("/forbidden", AuditOperation.FS_READ)
except ContextError as e:
    if e.is_permission_denied():
        print(f"Permission denied: {e.message}")
    elif e.is_variable_not_found():
        print(f"Variable not found: {e.variable_name}")
    elif e.is_budget_exceeded():
        print(f"Budget exceeded: {e.budget_type}")
    elif e.is_transaction_error():
        print(f"Transaction error: {e.message}")
    elif e.is_invalid_path():
        print(f"Invalid path: {e.path}")
    elif e.is_session_expired():
        print("Session has expired")
```

---

## 17. Atomic Multi-Index Writes

Ensure consistency across KV storage, vectors, and graphs with atomic operations.

### Problem Without Atomicity

```
# Without atomic writes, a crash can leave:
# - Embedding exists but graph edges don't
# - KV data exists but embedding is missing
# - Partial graph relationships
```

### Atomic Memory Writer

```python
from sochdb import AtomicMemoryWriter, MemoryOp

writer = AtomicMemoryWriter(db)

# Build atomic operation set
result = writer.write_atomic(
    memory_id="memory_123",
    ops=[
        # Store the blob/content
        MemoryOp.PutBlob(
            key=b"memories/memory_123/content",
            value=b"Meeting notes: discussed project timeline..."
        ),

        # Store the embedding
        MemoryOp.PutEmbedding(
            collection="memories",
            id="memory_123",
            embedding=[0.1, 0.2, ...],
            metadata={"type": "meeting", "date": "2024-01-15"}
        ),

        # Create graph nodes
        MemoryOp.CreateNode(
            namespace="default",
            node_id="memory_123",
            node_type="memory",
            properties={"importance": "high"}
        ),

        # Create graph edges
        MemoryOp.CreateEdge(
            namespace="default",
            from_id="memory_123",
            edge_type="relates_to",
            to_id="project_x",
            properties={}
        ),
    ]
)

print(f"Intent ID: {result.intent_id}")
print(f"Operations applied: {result.ops_applied}")
print(f"Status: {result.status}")  # "committed"
```

### How It Works

```
1. Write intent(id, ops...) to WAL   ← Crash-safe
2. Apply ops one-by-one
3. Write commit(id) to WAL           ← All-or-nothing
4. Recovery replays incomplete intents
```
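The steps above can be sketched with an in-memory "WAL" (illustrative only, not SochDB's storage engine): an intent record with no matching commit record is re-applied on recovery, then committed.

```python
# Illustrative sketch of intent/commit recovery: replay every intent that
# never reached its commit record, then write the missing commit.
def recover(wal, apply_op):
    committed = {rec["id"] for rec in wal if rec["type"] == "commit"}
    for rec in list(wal):
        if rec["type"] == "intent" and rec["id"] not in committed:
            for op in rec["ops"]:
                apply_op(op)  # ops must be idempotent to be replay-safe
            wal.append({"type": "commit", "id": rec["id"]})

wal = [
    {"type": "intent", "id": 1, "ops": ["put_blob", "put_embedding"]},
    {"type": "commit", "id": 1},
    {"type": "intent", "id": 2, "ops": ["put_blob", "create_edge"]},  # crashed mid-way
]
replayed = []
recover(wal, replayed.append)
print(replayed)  # ['put_blob', 'create_edge']
```

Note the precondition in the comment: replay only yields all-or-nothing semantics when each operation is idempotent, since a crash can interrupt the replay itself.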

---

## 18. Recovery & WAL Management

SochDB uses Write-Ahead Logging (WAL) for durability with automatic recovery.

### Recovery Manager

```python
from sochdb import RecoveryManager

recovery = db.recovery()

# Check if recovery is needed
if recovery.needs_recovery():
    result = recovery.recover()
    print(f"Status: {result.status}")
    print(f"Replayed entries: {result.replayed_entries}")
```

### WAL Verification

```python
# Verify WAL integrity
result = recovery.verify_wal()

print(f"Valid: {result.is_valid}")
print(f"Total entries: {result.total_entries}")
print(f"Valid entries: {result.valid_entries}")
print(f"Corrupted: {result.corrupted_entries}")
print(f"Last valid LSN: {result.last_valid_lsn}")

if result.checksum_errors:
    for error in result.checksum_errors:
        print(f"Checksum error at LSN {error.lsn}: expected {error.expected}, got {error.actual}")
```

### Force Checkpoint

```python
# Force a checkpoint (flush memtable to disk)
result = recovery.checkpoint()

print(f"Checkpoint LSN: {result.checkpoint_lsn}")
print(f"Duration: {result.duration_ms}ms")
```

### WAL Statistics

```python
stats = recovery.wal_stats()

print(f"Total size: {stats.total_size_bytes} bytes")
print(f"Active size: {stats.active_size_bytes} bytes")
print(f"Archived size: {stats.archived_size_bytes} bytes")
print(f"Entry count: {stats.entry_count}")
print(f"Oldest LSN: {stats.oldest_entry_lsn}")
print(f"Newest LSN: {stats.newest_entry_lsn}")
```

### WAL Truncation

```python
# Truncate WAL after checkpoint (reclaim disk space)
result = recovery.truncate_wal(up_to_lsn=12345)

print(f"Truncated to LSN: {result.up_to_lsn}")
print(f"Bytes freed: {result.bytes_freed}")
```

### Open with Auto-Recovery

```python
from sochdb import open_with_recovery

# Automatically recovers if needed
db = open_with_recovery("./my_database")
```

---

## 19. Checkpoints & Snapshots

### Application Checkpoints

Save and restore application state for workflow interruption/resumption.

```python
from sochdb import CheckpointService

checkpoint_svc = db.checkpoint_service()

# Create a checkpoint
checkpoint_id = checkpoint_svc.create(
    name="workflow_step_3",
    state=serialized_state,  # bytes
    metadata={"step": "3", "user": "alice", "workflow": "data_pipeline"}
)

# Restore checkpoint
state = checkpoint_svc.restore(checkpoint_id)

# List checkpoints
checkpoints = checkpoint_svc.list()
for cp in checkpoints:
    print(f"{cp.name}: {cp.created_at}, {cp.state_size} bytes")

# Delete checkpoint
checkpoint_svc.delete(checkpoint_id)
```

### Workflow Checkpointing

```python
# Create a workflow run
run_id = checkpoint_svc.create_run(
    workflow="data_pipeline",
    params={"input_file": "data.csv", "batch_size": 1000}
)

# Save checkpoint at each node/step
checkpoint_svc.save_node_checkpoint(
    run_id=run_id,
    node_id="transform_step",
    state=step_state,
    metadata={"rows_processed": 5000}
)

# Load latest checkpoint for a node
checkpoint = checkpoint_svc.load_node_checkpoint(run_id, "transform_step")

# List all checkpoints for a run
node_checkpoints = checkpoint_svc.list_run_checkpoints(run_id)
```

### Snapshot Reader (Point-in-Time)
|
|
2838
|
-
|
|
2839
|
-
```typescript
|
|
2840
|
-
# Create a consistent snapshot for reading
|
|
2841
|
-
snapshot = db.snapshot()
|
|
2842
|
-
|
|
2843
|
-
# Read from snapshot (doesn't see newer writes)
|
|
2844
|
-
value = snapshot.get(b"key")
|
|
2845
|
-
|
|
2846
|
-
# All reads within snapshot see consistent state
|
|
2847
|
-
with db.snapshot() as snap:
|
|
2848
|
-
v1 = snap.get(b"key1")
|
|
2849
|
-
v2 = snap.get(b"key2") # Same consistent view
|
|
2850
|
-
|
|
2851
|
-
# Meanwhile, writes continue in main DB
|
|
2852
|
-
db.put(b"key1", b"new_value") # Snapshot doesn't see this
|
|
2853
|
-
```
|
|
2854
|
-
|
|
2855
|
-
---
|
|
2856
|
-
|
|
2857
|
-
## 20. Compression & Storage

### Compression Settings

```python
from sochdb import CompressionType

db = Database.open("./my_db", config={
    # Compression for SST files
    "compression": CompressionType.LZ4,  # LZ4 (fast), ZSTD (better ratio), NONE
    "compression_level": 3,              # ZSTD: 1-22, LZ4: ignored

    # Compression for WAL
    "wal_compression": CompressionType.NONE,  # Usually NONE for WAL (already sequential)
})
```

### Compression Comparison

| Type | Ratio | Compress Speed | Decompress Speed | Use Case |
|------|-------|----------------|------------------|----------|
| `NONE` | 1x | N/A | N/A | Already compressed data |
| `LZ4` | ~2.5x | ~780 MB/s | ~4500 MB/s | General use (default) |
| `ZSTD` | ~3.5x | ~520 MB/s | ~1800 MB/s | Cold storage, large datasets |
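The ratios above translate directly into disk footprint. A quick back-of-the-envelope calculation (approximate, since real ratios depend on the data):

```python
# Estimate on-disk size from the approximate ratios in the table above.
def compressed_size_gb(raw_gb, ratio):
    return raw_gb / ratio

print(f"LZ4:  {compressed_size_gb(10, 2.5):.1f} GB")  # 10 GB raw -> 4.0 GB
print(f"ZSTD: {compressed_size_gb(10, 3.5):.1f} GB")  # 10 GB raw -> ~2.9 GB
```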
### Storage Statistics

```python
stats = db.storage_stats()

print(f"Data size: {stats.data_size_bytes}")
print(f"Index size: {stats.index_size_bytes}")
print(f"WAL size: {stats.wal_size_bytes}")
print(f"Compression ratio: {stats.compression_ratio:.2f}x")
print(f"SST files: {stats.sst_file_count}")
print(f"Levels: {stats.level_stats}")
```

### Compaction Control

```python
# Manual compaction (reclaim space, optimize reads)
db.compact()

# Compact specific level
db.compact_level(level=0)

# Get compaction stats
stats = db.compaction_stats()
print(f"Pending compactions: {stats.pending_compactions}")
print(f"Running compactions: {stats.running_compactions}")
```

---

## 21. Statistics & Monitoring

### Database Statistics

```python
stats = db.stats()

# Transaction stats
print(f"Active transactions: {stats.active_transactions}")
print(f"Committed transactions: {stats.committed_transactions}")
print(f"Aborted transactions: {stats.aborted_transactions}")
print(f"Conflict rate: {stats.conflict_rate:.2%}")

# Operation stats
print(f"Total reads: {stats.total_reads}")
print(f"Total writes: {stats.total_writes}")
print(f"Cache hit rate: {stats.cache_hit_rate:.2%}")

# Storage stats
print(f"Key count: {stats.key_count}")
print(f"Total data size: {stats.total_data_bytes}")
```

|
|
2934
|
-
|
|
2935
|
-
### Token Statistics (LLM Optimization)
|
|
2936
|
-
|
|
2937
|
-
```typescript
|
|
2938
|
-
stats = db.token_stats()
|
|
2939
|
-
|
|
2940
|
-
print(f"TOON tokens emitted: {stats.toon_tokens_emitted}")
|
|
2941
|
-
print(f"Equivalent JSON tokens: {stats.json_tokens_equivalent}")
|
|
2942
|
-
print(f"Token savings: {stats.token_savings_percent:.1f}%")
|
|
2943
|
-
```
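The savings percentage follows directly from the two counts (the field names below mirror the stats object; the formula is the standard relative reduction, shown here as an illustrative sketch):

```python
# Savings = relative reduction of TOON tokens vs. the JSON-equivalent count.
def token_savings_percent(toon_tokens, json_tokens):
    return (1 - toon_tokens / json_tokens) * 100

print(f"{token_savings_percent(4200, 8400):.1f}%")  # → 50.0%
```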
### Performance Metrics

```python
metrics = db.performance_metrics()

# Latency percentiles
print(f"Read P50: {metrics.read_latency_p50_us}µs")
print(f"Read P99: {metrics.read_latency_p99_us}µs")
print(f"Write P50: {metrics.write_latency_p50_us}µs")
print(f"Write P99: {metrics.write_latency_p99_us}µs")

# Throughput
print(f"Reads/sec: {metrics.reads_per_second}")
print(f"Writes/sec: {metrics.writes_per_second}")
```

---

## 22. Distributed Tracing

Track operations for debugging and performance analysis.

### Starting Traces

```python
from sochdb import TraceStore

traces = TraceStore(db)

# Start a trace run
run = traces.start_run(
    name="user_request",
    resource={"service": "api", "version": "1.0.0"}
)
trace_id = run.trace_id
```

### Creating Spans

```python
from sochdb import SpanKind, SpanStatusCode, TraceStatus

# Start root span
root_span = traces.start_span(
    trace_id=trace_id,
    name="handle_request",
    parent_span_id=None,
    kind=SpanKind.SERVER
)

# Start child span
db_span = traces.start_span(
    trace_id=trace_id,
    name="database_query",
    parent_span_id=root_span.span_id,
    kind=SpanKind.CLIENT
)

# Add attributes
traces.set_span_attributes(trace_id, db_span.span_id, {
    "db.system": "sochdb",
    "db.operation": "SELECT",
    "db.table": "users"
})

# End spans
traces.end_span(trace_id, db_span.span_id, SpanStatusCode.OK)
traces.end_span(trace_id, root_span.span_id, SpanStatusCode.OK)

# End the trace run
traces.end_run(trace_id, TraceStatus.COMPLETED)
```
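Calling `end_span` by hand is easy to forget on error paths. A hedged convenience wrapper (not part of the SDK) can guarantee it; here `traces` is any object exposing the `start_span`/`end_span` methods shown above, and the demo uses a minimal recording stub in place of a real `TraceStore` (a real caller would pass `SpanStatusCode` values rather than the plain strings the stub uses):

```python
from contextlib import contextmanager
from types import SimpleNamespace

# Hypothetical helper: always end the span, marking it ERROR if the body raises.
@contextmanager
def traced_span(traces, trace_id, name, parent_span_id=None, kind=None):
    s = traces.start_span(trace_id=trace_id, name=name,
                          parent_span_id=parent_span_id, kind=kind)
    try:
        yield s
        traces.end_span(trace_id, s.span_id, "OK")
    except Exception:
        traces.end_span(trace_id, s.span_id, "ERROR")
        raise

# Demo with a recording stub standing in for TraceStore:
class StubTraces:
    def __init__(self):
        self.events = []
    def start_span(self, trace_id, name, parent_span_id, kind):
        self.events.append(("start", name))
        return SimpleNamespace(span_id=name)
    def end_span(self, trace_id, span_id, status):
        self.events.append(("end", span_id, status))

stub = StubTraces()
with traced_span(stub, "t1", "database_query"):
    pass
print(stub.events)  # [('start', 'database_query'), ('end', 'database_query', 'OK')]
```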
### Domain Events

```python
# Log retrieval (for RAG debugging)
traces.log_retrieval(
    trace_id=trace_id,
    query="user query",
    results=[{"id": "doc1", "score": 0.95}],
    latency_ms=15
)

# Log LLM call
traces.log_llm_call(
    trace_id=trace_id,
    model="claude-3-sonnet",
    input_tokens=500,
    output_tokens=200,
    latency_ms=1200
)
```

---

## 23. Workflow & Run Tracking

Track long-running workflows with events and state.

### Creating Workflow Runs

```python
from sochdb import WorkflowService, RunStatus

workflow_svc = db.workflow_service()

# Create a new run
run = workflow_svc.create_run(
    run_id="run_123",
    workflow="data_pipeline",
    params={"input": "data.csv", "output": "results.json"}
)

print(f"Run ID: {run.run_id}")
print(f"Status: {run.status}")
print(f"Created: {run.created_at}")
```

### Appending Events
|
|
3065
|
-
|
|
3066
|
-
```typescript
|
|
3067
|
-
from sochdb import WorkflowEvent, EventType
|
|
3068
|
-
|
|
3069
|
-
# Append events as workflow progresses
|
|
3070
|
-
workflow_svc.append_event(WorkflowEvent(
|
|
3071
|
-
run_id="run_123",
|
|
3072
|
-
event_type=EventType.NODE_STARTED,
|
|
3073
|
-
node_id="extract",
|
|
3074
|
-
data={"input_file": "data.csv"}
|
|
3075
|
-
))
|
|
3076
|
-
|
|
3077
|
-
workflow_svc.append_event(WorkflowEvent(
|
|
3078
|
-
run_id="run_123",
|
|
3079
|
-
event_type=EventType.NODE_COMPLETED,
|
|
3080
|
-
node_id="extract",
|
|
3081
|
-
data={"rows_extracted": 10000}
|
|
3082
|
-
))
|
|
3083
|
-
```
|
|
3084
|
-
|
|
3085
|
-
### Querying Events
|
|
3086
|
-
|
|
3087
|
-
```typescript
|
|
3088
|
-
# Get all events for a run
|
|
3089
|
-
events = workflow_svc.get_events("run_123")
|
|
3090
|
-
|
|
3091
|
-
# Get events since a sequence number
|
|
3092
|
-
new_events = workflow_svc.get_events("run_123", since_seq=10, limit=100)
|
|
3093
|
-
|
|
3094
|
-
# Stream events (for real-time monitoring)
|
|
3095
|
-
for event in workflow_svc.stream_events("run_123"):
|
|
3096
|
-
print(f"[{event.seq}] {event.event_type}: {event.node_id}")
|
|
3097
|
-
```
|
|
3098
|
-
|
|
3099
|
-
### Update Run Status
|
|
3100
|
-
|
|
3101
|
-
```typescript
|
|
3102
|
-
# Update status
|
|
3103
|
-
workflow_svc.update_run_status("run_123", RunStatus.COMPLETED)
|
|
3104
|
-
|
|
3105
|
-
# Or mark as failed
|
|
3106
|
-
workflow_svc.update_run_status("run_123", RunStatus.FAILED)
|
|
3107
|
-
```
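
The event log is append-only, so consumers can fold it into a current-state view at any time. A minimal sketch of that fold (plain dicts stand in for the `WorkflowEvent` fields shown above; `fold_status` is illustrative, not part of the SDK):

```python
def fold_status(events):
    """Fold an ordered event log into a per-node status map."""
    status = {}
    for ev in events:
        if ev["event_type"] == "NODE_STARTED":
            status[ev["node_id"]] = "running"
        elif ev["event_type"] == "NODE_COMPLETED":
            status[ev["node_id"]] = "completed"
    return status

events = [
    {"event_type": "NODE_STARTED", "node_id": "extract"},
    {"event_type": "NODE_COMPLETED", "node_id": "extract"},
    {"event_type": "NODE_STARTED", "node_id": "transform"},
]
print(fold_status(events))  # {'extract': 'completed', 'transform': 'running'}
```

Because events carry a sequence number, the same fold can resume from a checkpoint via `get_events(..., since_seq=...)` instead of replaying from the start.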

---

## 24. Server Mode (gRPC Client)

Full-featured client for distributed deployments.

### Connection

```python
from sochdb import SochDBClient

# Basic connection
client = SochDBClient("localhost:50051")

# With TLS
client = SochDBClient("localhost:50051", secure=True, ca_cert="ca.pem")

# With authentication
client = SochDBClient("localhost:50051", api_key="your_api_key")

# Context manager
with SochDBClient("localhost:50051") as client:
    client.put(b"key", b"value")
```

### Key-Value Operations

```python
# Put with TTL
client.put(b"key", b"value", namespace="default", ttl_seconds=3600)

# Get
value = client.get(b"key", namespace="default")

# Delete
client.delete(b"key", namespace="default")

# Batch operations
client.put_batch([
    (b"key1", b"value1"),
    (b"key2", b"value2"),
], namespace="default")
```

### Vector Operations (Server Mode)

```python
# Create index
client.create_index(
    name="embeddings",
    dimension=384,
    metric="cosine",
    m=16,
    ef_construction=200
)

# Insert vectors
client.insert_vectors(
    index_name="embeddings",
    ids=[1, 2, 3],
    vectors=[[...], [...], [...]]
)

# Search
results = client.search(
    index_name="embeddings",
    query=[0.1, 0.2, ...],
    k=10,
    ef_search=50
)

for result in results:
    print(f"ID: {result.id}, Distance: {result.distance}")
```

### Collection Operations (Server Mode)

```python
# Create collection
client.create_collection(
    name="documents",
    dimension=384,
    namespace="default",
    metric="cosine"
)

# Add documents
client.add_documents(
    collection_name="documents",
    documents=[
        {"id": "1", "content": "Hello", "embedding": [...], "metadata": {...}},
        {"id": "2", "content": "World", "embedding": [...], "metadata": {...}}
    ],
    namespace="default"
)

# Search
results = client.search_collection(
    collection_name="documents",
    query_vector=[...],
    k=10,
    namespace="default",
    filter={"author": "Alice"}
)
```

### Context Service (Server Mode)

```python
# Query context for LLM
context = client.query_context(
    session_id="session_123",
    sections=[
        {"name": "system", "priority": 0, "type": "literal",
         "content": "You are a helpful assistant."},
        {"name": "history", "priority": 1, "type": "recent",
         "table": "messages", "top_k": 10},
        {"name": "knowledge", "priority": 2, "type": "search",
         "collection": "documents", "embedding": [...], "top_k": 5}
    ],
    token_limit=4096,
    format="toon"
)

print(context.text)
print(f"Tokens used: {context.token_count}")
```

---

## 25. IPC Client (Unix Sockets)

Local server communication via Unix sockets (lower latency than gRPC).

```python
from sochdb import IpcClient

# Connect
client = IpcClient.connect("/tmp/sochdb.sock", timeout=30.0)

# Basic operations
client.put(b"key", b"value")
value = client.get(b"key")
client.delete(b"key")

# Path operations
client.put_path(["users", "alice"], b"data")
value = client.get_path(["users", "alice"])

# Query
result = client.query("users/", limit=100)

# Scan
results = client.scan("prefix/")

# Transactions
txn_id = client.begin_transaction()
# ... operations ...
commit_ts = client.commit(txn_id)
# or client.abort(txn_id)

# Admin
client.ping()
client.checkpoint()
stats = client.stats()

client.close()
```
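
The manual begin/commit/abort sequence composes into a small helper that always resolves the transaction. A sketch against the client surface shown above (any object with `begin_transaction`, `commit`, and `abort` methods works; `with_transaction` is not part of the SDK):

```python
def with_transaction(client, fn):
    """Run fn(txn_id) inside a transaction; commit on success, abort on error."""
    txn_id = client.begin_transaction()
    try:
        result = fn(txn_id)
    except Exception:
        client.abort(txn_id)
        raise
    commit_ts = client.commit(txn_id)
    return commit_ts, result
```

On success the caller gets the commit timestamp back; on any exception the transaction is aborted and the error re-raised.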

---

## 26. Standalone VectorIndex

Direct HNSW index operations without collections.

```python
from sochdb import VectorIndex, VectorIndexConfig, DistanceMetric
import numpy as np

# Create index
config = VectorIndexConfig(
    dimension=384,
    metric=DistanceMetric.COSINE,
    m=16,
    ef_construction=200,
    ef_search=50,
    max_elements=100000
)
index = VectorIndex(config)

# Insert single vector
index.insert(id=1, vector=np.array([0.1, 0.2, ...], dtype=np.float32))

# Batch insert
ids = np.array([1, 2, 3], dtype=np.uint64)
vectors = np.array([[...], [...], [...]], dtype=np.float32)
count = index.insert_batch(ids, vectors)

# Fast batch insert (returns failures)
inserted, failed = index.insert_batch_fast(ids, vectors)

# Search
query = np.array([0.1, 0.2, ...], dtype=np.float32)
results = index.search(query, k=10, ef_search=100)

for id, distance in results:
    print(f"ID: {id}, Distance: {distance}")

# Properties
print(f"Size: {len(index)}")
print(f"Dimension: {index.dimension}")

# Save/load
index.save("./index.bin")
index = VectorIndex.load("./index.bin")
```

---

## 27. Vector Utilities

Standalone vector operations for preprocessing and analysis.

```python
from sochdb import vector

# Distance calculations
a = [1.0, 0.0, 0.0]
b = [0.707, 0.707, 0.0]

cosine_dist = vector.cosine_distance(a, b)
euclidean_dist = vector.euclidean_distance(a, b)
dot_product = vector.dot_product(a, b)

print(f"Cosine distance: {cosine_dist:.4f}")
print(f"Euclidean distance: {euclidean_dist:.4f}")
print(f"Dot product: {dot_product:.4f}")

# Normalize a vector
v = [3.0, 4.0]
normalized = vector.normalize(v)
print(f"Normalized: {normalized}")  # [0.6, 0.8]

# Batch normalize
vectors = [[3.0, 4.0], [1.0, 0.0]]
normalized_batch = vector.normalize_batch(vectors)

# Compute centroid
vectors = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
centroid = vector.centroid(vectors)

# Cosine similarity (1 - distance)
similarity = vector.cosine_similarity(a, b)
```
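
These helpers follow the standard definitions, so they can be sanity-checked in plain Python without sochdb installed (same `a` and `b` as above):

```python
import math

a = [1.0, 0.0, 0.0]
b = [0.707, 0.707, 0.0]

dot = sum(x * y for x, y in zip(a, b))
norm_a = math.sqrt(sum(x * x for x in a))
norm_b = math.sqrt(sum(x * x for x in b))

cosine_similarity = dot / (norm_a * norm_b)
cosine_distance = 1.0 - cosine_similarity
euclidean_distance = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

print(round(cosine_distance, 4))     # 0.2929 (b sits 45 degrees from a)
print(round(euclidean_distance, 4))  # 0.7653
```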

---

## 28. Data Formats (TOON/JSON/Columnar)

### Wire Formats

```python
from sochdb import WireFormat

# Available formats
WireFormat.TOON      # Token-efficient (40-60% fewer tokens)
WireFormat.JSON      # Standard JSON
WireFormat.COLUMNAR  # Raw columnar for analytics

# Parse from string
fmt = WireFormat.from_string("toon")

# Convert between formats
data = {"users": [{"id": 1, "name": "Alice"}]}
toon_data = WireFormat.to_toon(data)
json_data = WireFormat.to_json(data)
```

### TOON Format Benefits

TOON uses **40-60% fewer tokens** than JSON:

```
# JSON (15 tokens)
{"users": [{"id": 1, "name": "Alice"}]}

# TOON (9 tokens)
users:
  - id: 1
    name: Alice
```
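
Token counts depend on the tokenizer, but the saving shows up even with raw character counts as a crude proxy (the TOON string below just mirrors the layout above; it is illustrative, not produced by the SDK):

```python
import json

data = {"users": [{"id": 1, "name": "Alice"}]}

json_text = json.dumps(data)
toon_text = "users:\n  - id: 1\n    name: Alice"

# TOON drops the braces, quotes, and repeated punctuation
print(len(json_text), len(toon_text))  # JSON is the longer encoding
```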

### Context Formats

```python
from sochdb import ContextFormat

ContextFormat.TOON      # Token-efficient
ContextFormat.JSON      # Structured data
ContextFormat.MARKDOWN  # Human-readable

# Format capabilities
from sochdb import FormatCapabilities

# Convert between formats
ctx_fmt = FormatCapabilities.wire_to_context(WireFormat.TOON)
wire_fmt = FormatCapabilities.context_to_wire(ContextFormat.JSON)

# Check round-trip support
if FormatCapabilities.supports_round_trip(WireFormat.TOON):
    print("Safe for decode(encode(x)) = x")
```

---

## 29. Policy Service

Register and evaluate access control policies.

```python
from sochdb import PolicyService

policy_svc = db.policy_service()

# Register a policy
policy_svc.register(
    policy_id="read_own_data",
    name="Users can read their own data",
    trigger="READ",
    action="ALLOW",
    condition="resource.owner == user.id"
)

# Register another policy
policy_svc.register(
    policy_id="admin_all",
    name="Admins can do everything",
    trigger="*",
    action="ALLOW",
    condition="user.role == 'admin'"
)

# Evaluate a policy
result = policy_svc.evaluate(
    action="READ",
    resource="documents/123",
    context={"user.id": "alice", "user.role": "user", "resource.owner": "alice"}
)

if result.allowed:
    print("Access granted")
else:
    print(f"Access denied: {result.reason}")
    print(f"Denying policy: {result.policy_id}")

# List policies
policies = policy_svc.list()
for p in policies:
    print(f"{p.policy_id}: {p.name}")

# Delete a policy
policy_svc.delete("old_policy")
```

---

## 30. MCP (Model Context Protocol)

Integrate SochDB as an MCP tool provider.

### Built-in MCP Tools

| Tool | Description |
|------|-------------|
| `sochdb_query` | Execute ToonQL/SQL queries |
| `sochdb_context_query` | Fetch AI-optimized context |
| `sochdb_put` | Store key-value data |
| `sochdb_get` | Retrieve data by key |
| `sochdb_search` | Vector similarity search |

### Using MCP Tools (Server Mode)

```python
# List available tools
tools = client.list_mcp_tools()
for tool in tools:
    print(f"{tool.name}: {tool.description}")

# Get tool schema
schema = client.get_mcp_tool_schema("sochdb_search")
print(schema)

# Execute a tool
result = client.execute_mcp_tool(
    name="sochdb_query",
    arguments={"query": "SELECT * FROM users", "format": "toon"}
)
print(result)
```

### Register Custom Tool

```python
# Register a custom tool
client.register_mcp_tool(
    name="search_documents",
    description="Search documents by semantic similarity",
    input_schema={
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "Search query"},
            "k": {"type": "integer", "description": "Number of results", "default": 10}
        },
        "required": ["query"]
    }
)
```

---

## 31. Configuration Reference

### Database Configuration

```python
from sochdb import Database, CompressionType, SyncMode

db = Database.open("./my_db", config={
    # Durability
    "wal_enabled": True,           # Write-ahead logging
    "sync_mode": SyncMode.NORMAL,  # FULL, NORMAL, OFF

    # Performance
    "memtable_size_bytes": 64 * 1024 * 1024,      # 64MB (flush threshold)
    "block_cache_size_bytes": 256 * 1024 * 1024,  # 256MB
    "group_commit": True,          # Batch commits

    # Compression
    "compression": CompressionType.LZ4,

    # Index policy
    "index_policy": "balanced",

    # Background workers
    "compaction_threads": 2,
    "flush_threads": 1,
})
```

### Sync Modes

| Mode | Speed | Safety | Use Case |
|------|-------|--------|----------|
| `OFF` | ~10x faster | Risk of data loss | Development, caches |
| `NORMAL` | Balanced | Fsync at checkpoints | Default |
| `FULL` | Slowest | Fsync every commit | Financial data |
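
For example, a ledger-style store that cannot afford to lose an acknowledged write would pick `FULL` and accept the slower commits; a sketch using the config surface above (the path and settings are illustrative):

```python
from sochdb import Database, SyncMode

# Every commit is fsync'd before returning: slowest mode, but no
# acknowledged write can be lost on power failure.
ledger = Database.open("./ledger_db", config={
    "wal_enabled": True,
    "sync_mode": SyncMode.FULL,
})
```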

### CollectionConfig Reference

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `name` | str | required | Collection name |
| `dimension` | int | required | Vector dimension |
| `metric` | DistanceMetric | COSINE | COSINE, EUCLIDEAN, DOT_PRODUCT |
| `m` | int | 16 | HNSW M parameter |
| `ef_construction` | int | 100 | HNSW build quality |
| `ef_search` | int | 50 | HNSW search quality |
| `quantization` | QuantizationType | NONE | NONE, SCALAR, PQ |
| `enable_hybrid_search` | bool | False | Enable BM25 |
| `content_field` | str | None | Field for BM25 indexing |

### Environment Variables

| Variable | Description |
|----------|-------------|
| `TOONDB_LIB_PATH` | Custom path to native library |
| `TOONDB_DISABLE_ANALYTICS` | Disable anonymous usage tracking |
| `TOONDB_LOG_LEVEL` | Log level (DEBUG, INFO, WARN, ERROR) |

---

## 32. Error Handling

### Error Types

```python
from sochdb import (
    # Base
    SochDBError,

    # Connection
    ConnectionError,
    ConnectionTimeoutError,

    # Transaction
    TransactionError,
    TransactionConflictError,  # SSI conflict - retry
    TransactionTimeoutError,

    # Storage
    DatabaseError,
    CorruptionError,
    DiskFullError,

    # Namespace
    NamespaceNotFoundError,
    NamespaceExistsError,
    NamespaceAccessError,

    # Collection
    CollectionNotFoundError,
    CollectionExistsError,
    CollectionConfigError,

    # Validation
    ValidationError,
    DimensionMismatchError,
    InvalidMetadataError,

    # Query
    QueryError,
    QuerySyntaxError,
    QueryTimeoutError,
)
```

### Error Handling Pattern

```python
from sochdb import (
    SochDBError,
    TransactionConflictError,
    DimensionMismatchError,
    CollectionNotFoundError,
)

try:
    with db.transaction() as txn:
        txn.put(b"key", b"value")

except TransactionConflictError as e:
    # SSI conflict - safe to retry
    print(f"Conflict detected: {e}")

except DimensionMismatchError as e:
    # Vector dimension wrong
    print(f"Expected {e.expected} dimensions, got {e.actual}")

except CollectionNotFoundError as e:
    # Collection doesn't exist
    print(f"Collection not found: {e.collection}")

except SochDBError as e:
    # All other SochDB errors
    print(f"Error: {e}")
    print(f"Code: {e.code}")
    print(f"Remediation: {e.remediation}")
```

### Error Information

```python
try:
    ...  # some sochdb operation
except SochDBError as e:
    print(f"Message: {e.message}")
    print(f"Code: {e.code}")                # ErrorCode enum
    print(f"Details: {e.details}")          # Additional context
    print(f"Remediation: {e.remediation}")  # How to fix
    print(f"Retryable: {e.retryable}")      # Safe to retry?
```
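
The `retryable` flag makes a generic retry wrapper straightforward. A sketch (the `TransientError` class below is a stand-in with the same attribute so the snippet runs without a database; in real code you would catch `SochDBError`):

```python
import time

def with_retries(fn, attempts=3, base_delay=0.05):
    """Call fn(); retry with exponential backoff while the error allows it."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception as e:
            if not getattr(e, "retryable", False) or attempt == attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))

class TransientError(Exception):
    retryable = True  # mimics SochDBError.retryable

calls = {"n": 0}

def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TransientError("transient conflict")
    return "ok"

print(with_retries(flaky))  # ok (after two retried failures)
```

Non-retryable errors propagate immediately; only errors that declare themselves safe are retried, and the last attempt always re-raises.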

---

## 33. Async Support

Optional async/await support for non-blocking operations.

```python
import asyncio

from sochdb import AsyncDatabase, SearchRequest

async def main():
    # Open async database
    db = await AsyncDatabase.open("./my_db")

    # Async operations
    await db.put(b"key", b"value")
    value = await db.get(b"key")

    # Async transactions
    async with db.transaction() as txn:
        await txn.put(b"key1", b"value1")
        await txn.put(b"key2", b"value2")

    # Async vector search
    results = await db.collection("docs").search(SearchRequest(
        vector=[0.1, 0.2, ...],
        k=10
    ))

    await db.close()

# Run
asyncio.run(main())
```

**Note:** Requires `pip install "sochdb[async]"`

---

## 34. Building & Development

### Building Native Extensions

```bash
# Build for current platform
python build_native.py

# Build only FFI libraries
python build_native.py --libs

# Build for all platforms
python build_native.py --all

# Clean
python build_native.py --clean
```

### Library Discovery

The SDK looks for native libraries in this order:

1. `TOONDB_LIB_PATH` environment variable
2. Bundled in wheel: `lib/{target}/`
3. Package directory
4. Development builds: `target/release/`, `target/debug/`
5. System paths: `/usr/local/lib`, `/usr/lib`

### Running Tests

```bash
# All tests
pytest

# Specific test file
pytest tests/test_vector_search.py

# With coverage
pytest --cov=sochdb

# Performance tests
pytest tests/perf/ --benchmark
```

### Package Structure

```
sochdb/
├── __init__.py      # Public API exports
├── database.py      # Database, Transaction
├── namespace.py     # Namespace, Collection
├── vector.py        # VectorIndex, utilities
├── grpc_client.py   # SochDBClient (server mode)
├── ipc_client.py    # IpcClient (Unix sockets)
├── context.py       # ContextQueryBuilder
├── atomic.py        # AtomicMemoryWriter
├── recovery.py      # RecoveryManager
├── checkpoint.py    # CheckpointService
├── workflow.py      # WorkflowService
├── trace.py         # TraceStore
├── policy.py        # PolicyService
├── format.py        # WireFormat, ContextFormat
├── errors.py        # All error types
├── _bin/            # Bundled binaries
└── lib/             # FFI libraries
```

---

## 35. Complete Examples

### RAG Pipeline Example

```python
from sochdb import Database, CollectionConfig, DistanceMetric, SearchRequest

# Setup
db = Database.open("./rag_db")
ns = db.get_or_create_namespace("rag")

# Create collection for documents
collection = ns.create_collection(CollectionConfig(
    name="documents",
    dimension=384,
    metric=DistanceMetric.COSINE,
    enable_hybrid_search=True,
    content_field="text"
))

# Index documents
def index_document(doc_id: str, text: str, embed_fn):
    embedding = embed_fn(text)
    collection.insert(
        id=doc_id,
        vector=embedding,
        metadata={"text": text, "indexed_at": "2024-01-15"}
    )

# Retrieve relevant context
def retrieve_context(query: str, embed_fn, k: int = 5) -> list:
    query_embedding = embed_fn(query)

    results = collection.hybrid_search(
        vector=query_embedding,
        text_query=query,
        k=k,
        alpha=0.7  # 70% vector, 30% keyword
    )

    return [r.metadata["text"] for r in results]

# Full RAG pipeline
def rag_query(query: str, embed_fn, llm_fn):
    # 1. Retrieve
    context_docs = retrieve_context(query, embed_fn)

    # 2. Build context
    from sochdb import ContextQueryBuilder

    context = ContextQueryBuilder() \
        .for_session("rag_session") \
        .with_budget(4096) \
        .literal("SYSTEM", 0, "Answer based on the provided context.") \
        .literal("CONTEXT", 1, "\n\n".join(context_docs)) \
        .literal("QUESTION", 2, query) \
        .execute()

    # 3. Generate
    response = llm_fn(context.text)

    return response

db.close()
```
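
The `alpha` knob in `hybrid_search` weights the vector score against the BM25 keyword score. Under the common linear-blend convention (shown for intuition; this is not necessarily SochDB's exact internal formula):

```python
def blend(vector_score, keyword_score, alpha=0.7):
    """Linearly blend normalized vector and keyword relevance scores."""
    return alpha * vector_score + (1 - alpha) * keyword_score

# Strong semantic match, weak keyword match:
print(round(blend(0.9, 0.2), 2))  # 0.69
```

Raising `alpha` toward 1.0 favors semantic similarity; lowering it favors exact keyword hits.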

### Knowledge Graph Example

```python
from sochdb import Database

db = Database.open("./knowledge_graph")

# Build a knowledge graph
db.add_node("kg", "alice", "person", {"role": "engineer", "level": "senior"})
db.add_node("kg", "bob", "person", {"role": "manager"})
db.add_node("kg", "project_ai", "project", {"status": "active", "budget": 100000})
db.add_node("kg", "ml_team", "team", {"size": 5})

db.add_edge("kg", "alice", "works_on", "project_ai", {"role": "lead"})
db.add_edge("kg", "alice", "member_of", "ml_team")
db.add_edge("kg", "bob", "manages", "project_ai")
db.add_edge("kg", "bob", "leads", "ml_team")

# Query: Find all projects Alice works on
nodes, edges = db.traverse("kg", "alice", max_depth=1)
projects = [n for n in nodes if n["node_type"] == "project"]
print(f"Alice's projects: {[p['id'] for p in projects]}")

# Query: Who manages Alice's projects?
for project in projects:
    nodes, edges = db.traverse("kg", project["id"], max_depth=1)
    managers = [e["from_id"] for e in edges if e["edge_type"] == "manages"]
    print(f"{project['id']} managed by: {managers}")

db.close()
```

### Multi-Tenant SaaS Example

```python
from sochdb import Database

db = Database.open("./saas_db")

# Create tenant namespaces
for tenant in ["acme_corp", "globex", "initech"]:
    ns = db.create_namespace(
        name=tenant,
        labels={"tier": "premium" if tenant == "acme_corp" else "standard"}
    )

    # Create tenant-specific collections
    ns.create_collection(
        name="documents",
        dimension=384
    )

# Tenant-scoped operations
with db.use_namespace("acme_corp") as ns:
    collection = ns.collection("documents")

    # All operations isolated to acme_corp
    collection.insert(
        id="doc1",
        vector=[0.1] * 384,
        metadata={"title": "Acme Internal Doc"}
    )

    # Search only searches acme_corp's documents
    results = collection.vector_search(
        vector=[0.1] * 384,
        k=10
    )

# Cleanup
db.close()
```

---

## 36. Migration Guide
|
|
3931
|
-
|
|
3932
|
-
### From v0.2.x to v0.3.x
|
|
3933
|
-
|
|
3934
|
-
```typescript
|
|
3935
|
-
# Old: scan() with range
|
|
3936
|
-
for k, v in db.scan(b"users/", b"users0"): # DEPRECATED
|
|
3937
|
-
pass
|
|
3938
|
-
|
|
3939
|
-
# New: scan_prefix()
|
|
3940
|
-
for k, v in db.scan_prefix(b"users/"):
|
|
3941
|
-
pass
|
|
3942
|
-
|
|
3943
|
-
# Old: execute_sql returns tuple
|
|
3944
|
-
columns, rows = db.execute_sql("SELECT * FROM users")
|
|
3945
|
-
|
|
3946
|
-
# New: execute_sql returns SQLQueryResult
|
|
3947
|
-
result = db.execute_sql("SELECT * FROM users")
|
|
3948
|
-
columns = result.columns
|
|
3949
|
-
rows = result.rows
|
|
3950
|
-
```
|
|
3951
|
-
|
|
3952
|
-
### From SQLite/PostgreSQL
|
|
3953
|
-
|
|
3954
|
-
```typescript
|
|
3955
|
-
# SQLite
|
|
3956
|
-
# conn = sqlite3.connect("app.db")
|
|
3957
|
-
# cursor = conn.execute("SELECT * FROM users")
|
|
3958
|
-
|
|
3959
|
-
# SochDB (same SQL, embedded)
|
|
3960
|
-
db = Database.open("./app_db")
|
|
3961
|
-
result = db.execute_sql("SELECT * FROM users")
|
|
3962
|
-
```
|
|
3963
|
-
|
|
3964
|
-
### From Redis
|
|
3965
|
-
|
|
3966
|
-
```typescript
|
|
3967
|
-
# Redis
|
|
3968
|
-
# r = redis.Redis()
|
|
3969
|
-
# r.set("key", "value")
|
|
3970
|
-
# r.get("key")
|
|
3971
|
-
|
|
3972
|
-
# SochDB
|
|
3973
|
-
db = Database.open("./cache_db")
|
|
3974
|
-
db.put(b"key", b"value")
|
|
3975
|
-
db.get(b"key")
|
|
3976
|
-
|
|
3977
|
-
# With TTL
|
|
3978
|
-
db.put(b"session:123", b"data", ttl_seconds=3600)
|
|
3979
|
-
```
|
|
3980
|
-
|
|
3981
|
-
### From Pinecone/Weaviate

```python
# Pinecone
# index.upsert(vectors=[(id, embedding, metadata)])
# results = index.query(vector=query, top_k=10)

# SochDB
collection = db.namespace("default").collection("vectors")
collection.insert(id=id, vector=embedding, metadata=metadata)
results = collection.vector_search(vector=query, k=10)
```

---

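Conceptually, `vector_search(vector=query, k=10)` returns the `k` stored vectors most similar to the query. A brute-force cosine-similarity version of that contract (SochDB itself answers this with an HNSW index rather than a linear scan):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def vector_search(store, query, k):
    """store: {id: vector}. Returns [(id, score)], best match first."""
    scored = [(vid, cosine(vec, query)) for vid, vec in store.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

store = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.7, 0.7]}
top = vector_search(store, [1.0, 0.0], k=2)
assert [vid for vid, _ in top] == ["a", "c"]
```

The linear scan is O(n) per query; an HNSW index trades exactness for roughly logarithmic search, which is how large collections stay in the low-millisecond range.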
## Performance

**Network Overhead:**
- gRPC: ~100-200 μs per request (local)
- IPC: ~50-100 μs per request (Unix socket)

**Batch Operations:**
- Vector insert: 50,000 vectors/sec (batch mode)
- Vector search: 20,000 queries/sec (47 μs/query)

**Recommendations:**
- Use **batch operations** for high throughput
- Use **IPC** for same-machine communication
- Use **gRPC** for distributed systems

---

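Batching and group commit pay off because a fixed per-request cost (an fsync, a network round-trip) is shared across many items. A back-of-envelope model, using this README's own figures (~5 ms per fsync'd write, ~60 µs amortized); the writer count of 83 is illustrative:

```python
def amortized_latency_us(fixed_cost_us: float, per_item_us: float, batch: int) -> float:
    """Per-item latency when a fixed cost is shared by `batch` items."""
    return fixed_cost_us / batch + per_item_us

# One ~5 ms fsync shared by ~83 concurrent writers comes to ~60 µs each
assert round(amortized_latency_us(5000, 0, 83)) == 60

# A 150 µs gRPC round-trip carrying 100 vectors costs 1.5 µs of
# transport per vector instead of 150 µs for one-at-a-time inserts
assert amortized_latency_us(150, 0, 100) == 1.5
```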
## FAQ

**Q: Which mode should I use?**
A:
- **Embedded (FFI)**: for local development, notebooks, single-process apps
- **Server (gRPC)**: for production, multi-language, distributed systems

**Q: Can I switch between modes?**
A: Yes. Both modes expose the same API; change `Database.open()` to `SochDBClient()` and vice versa.

**Q: Do temporal graphs work in embedded mode?**
A: Yes. Temporal graphs work in both embedded and server modes with identical APIs.

**Q: Is embedded mode slower than server mode?**
A: Embedded mode is faster for single-process use (no network overhead). Server mode is better for distributed deployments.

**Q: Where is the business logic?**
A: All business logic is in Rust. Embedded mode reaches it through FFI bindings, server mode through gRPC: same Rust code, different transport.

**Q: What about the old "fat client" Database class?**
A: It is still here, as embedded mode. SochDB now supports dual-mode operation: embedded FFI plus server gRPC.

---

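Because both modes share one API surface, the switch can be confined to a single construction site. A hypothetical helper sketching that pattern (`select_mode` and the URL-scheme convention are assumptions for illustration, not SDK API):

```python
def select_mode(target: str) -> str:
    """Decide which constructor to use for a target string: a server
    address means SochDBClient() (gRPC), anything else is treated as a
    local path for Database.open() (embedded FFI).

    The scheme prefixes below are a hypothetical convention, not part
    of the SochDB SDK.
    """
    server_schemes = ("grpc://", "tcp://", "http://")
    return "server" if target.startswith(server_schemes) else "embedded"

assert select_mode("./app_db") == "embedded"
assert select_mode("grpc://localhost:50051") == "server"
```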
## Examples

See the [examples/](examples/) directory for complete working examples:

**Embedded Mode (FFI - No Server):**
- [23_collections_embedded.ts](examples/23_collections_embedded.ts) - Document storage, JSON, transactions
- [22_namespaces.ts](examples/22_namespaces.ts) - Multi-tenant isolation with key prefixes
- [24_batch_operations.ts](examples/24_batch_operations.ts) - Atomic writes, rollback, conditional updates
- [25_temporal_graph_embedded.ts](examples/25_temporal_graph_embedded.ts) - Time-travel queries (NEW!)

**Server Mode (gRPC - Requires Server):**
- [21_temporal_graph.ts](examples/21_temporal_graph.ts) - Temporal graphs via gRPC

---

### Performance

| Operation | Latency / limit |
|-----------|-----------------|
| KV Read | ~100ns |
| KV Write (fsync) | ~5ms |
| KV Write (concurrent, amortized) | ~60µs |
| Vector Search (HNSW, 1M vectors) | <5ms |
| Prefix Scan (per item) | ~200ns |
| Max Concurrent Readers | 1024 |

---

## Contributing

See [CONTRIBUTING.md](CONTRIBUTING.md) for development setup, building from source, and pull request guidelines.

---