@sochdb/sochdb 0.5.1 → 0.5.3

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -27,54 +27,40 @@ Choose the deployment mode that fits your needs.
27
27
 
28
28
  # SochDB Node.js SDK Documentation
29
29
 
30
- **LLM-Optimized Embedded Database with Native Vector Search**
30
+ > **Version 0.5.3** — LLM-Optimized Embedded Database with Native Vector Search
31
31
 
32
32
  ---
33
33
 
34
34
  ## Table of Contents
35
35
 
36
36
  1. [Quick Start](#1-quick-start)
37
- 2. [Installation](#2-installation)
38
- 3. [Features](#3-features)
37
+ 2. [Features](#features)
38
+    - [Memory System](#memory-system---llm-native-memory-for-ai-agents)
39
+    - [Semantic Cache](#semantic-cache---llm-response-caching)
40
+    - [Context Query Builder](#context-query-builder---token-aware-llm-context)
39
41
     - [Namespace API](#namespace-api---multi-tenant-isolation)
40
42
     - [Priority Queue API](#priority-queue-api---task-processing)
41
- 4. [Architecture Overview](#4-architecture-overview)
42
- 5. [Core Key-Value Operations](#5-core-key-value-operations)
43
- 6. [Transactions (ACID with SSI)](#6-transactions-acid-with-ssi)
44
- 7. [Query Builder](#7-query-builder)
45
- 8. [Prefix Scanning](#8-prefix-scanning)
46
- 9. [SQL Operations](#9-sql-operations)
47
- 10. [Table Management & Index Policies](#10-table-management--index-policies)
48
- 11. [Namespaces & Collections](#11-namespaces--collections)
49
- 12. [Priority Queues](#12-priority-queues)
50
- 13. [Vector Search](#13-vector-search)
51
- 14. [Hybrid Search (Vector + BM25)](#14-hybrid-search-vector--bm25)
52
- 15. [Graph Operations](#15-graph-operations)
53
- 16. [Temporal Graph (Time-Travel)](#16-temporal-graph-time-travel)
54
- 17. [Semantic Cache](#17-semantic-cache)
55
- 18. [Memory System](#18-memory-system)
56
- 19. [Session Management](#19-session-management)
57
- 20. [Context Query Builder (LLM Optimization)](#20-context-query-builder-llm-optimization)
58
- 21. [Atomic Multi-Index Writes](#21-atomic-multi-index-writes)
59
- 22. [Recovery & WAL Management](#22-recovery--wal-management)
60
- 23. [Checkpoints & Snapshots](#23-checkpoints--snapshots)
61
- 24. [Compression & Storage](#24-compression--storage)
62
- 25. [Statistics & Monitoring](#25-statistics--monitoring)
63
- 26. [Distributed Tracing](#26-distributed-tracing)
64
- 27. [Workflow & Run Tracking](#27-workflow--run-tracking)
65
- 28. [Server Mode (gRPC Client)](#28-server-mode-grpc-client)
66
- 29. [IPC Client (Unix Sockets)](#29-ipc-client-unix-sockets)
67
- 30. [Standalone VectorIndex](#30-standalone-vectorindex)
68
- 31. [Vector Utilities](#31-vector-utilities)
69
- 32. [Data Formats (TOON/JSON/Columnar)](#32-data-formats-toonjsoncolumnar)
70
- 33. [Policy Service](#33-policy-service)
71
- 34. [MCP (Model Context Protocol)](#34-mcp-model-context-protocol)
72
- 35. [Configuration Reference](#35-configuration-reference)
73
- 36. [Error Handling](#36-error-handling)
74
- 37. [Async Support](#37-async-support)
75
- 38. [Building & Development](#38-building--development)
76
- 39. [Complete Examples](#39-complete-examples)
77
- 40. [Migration Guide](#40-migration-guide)
43
+ 3. [Architecture](#architecture-flexible-deployment)
44
+ 4. [System Requirements](#system-requirements)
45
+ 5. [Troubleshooting](#troubleshooting)
46
+ 6. [Vector Search (Native HNSW)](#-vector-search---native-hnsw)
47
+ 7. [API Reference](#api-reference)
48
+    - [Core Key-Value Operations](#core-key-value-operations)
49
+    - [Transactions (ACID with SSI)](#transactions-acid-with-ssi)
50
+    - [Prefix Scanning](#prefix-scanning)
51
+    - [Namespaces & Collections](#namespaces--collections)
52
+    - [Priority Queues](#priority-queues)
53
+    - [Graph Operations](#graph-operations)
54
+    - [Semantic Cache](#semantic-cache)
55
+    - [Context Query Builder](#context-query-builder)
56
+    - [Memory System (LLM-Native)](#memory-system-llm-native)
57
+    - [Data Formats (TOON/JSON/Columnar)](#data-formats-toonjsoncolumnar)
58
+    - [Policy Service & MCP](#policy-service--mcp)
59
+    - [Server Mode (IPC / gRPC)](#server-mode-ipc--grpc)
60
+    - [Checkpoints & Statistics](#checkpoints--statistics)
61
+    - [Error Handling](#error-handling)
62
+    - [Configuration Reference](#configuration-reference)
63
+    - [Performance](#performance)
78
64
 
79
65
  ---
80
66
 
@@ -780,3291 +766,498 @@ class HnswIndex {
780
766
  }
781
767
  ```
782
768
 
769
+ ### Engine Status
770
+
771
+ | Component | Status |
772
+ |-----------|--------|
773
+ | **Cost-based optimizer** | ✅ Production-ready — full cost model, cardinality estimation, plan caching |
774
+ | **Adaptive group commit** | ✅ Implemented — Little's Law-based batch sizing |
775
+ | **WAL compaction** | ⚠️ Partial — manual `checkpoint()` + `truncateWal()` available |
776
+ | **HNSW vector index** | ✅ Production-ready — direct FFI bindings |
777
+
783
778
  ### Roadmap
784
779
 
785
- - **Current**: Direct HNSW FFI bindings
780
+ - **Current**: Direct HNSW FFI bindings with production cost-based optimizer
786
781
  - **Next**: Collection API auto-uses HNSW in embedded mode
787
782
  - **Future**: Persistent HNSW indexes with disk storage
788
783
 
789
784
  ---
790
785
 
791
- # SochDB Node.js SDK Documentation
792
-
793
- **LLM-Optimized Embedded Database with Native Vector Search**
794
-
795
786
  ---
796
787
 
797
- ## Table of Contents
788
+ ## API Reference
798
789
 
799
- 1. [Quick Start](#1-quick-start)
800
- 2. [Installation](#2-installation)
801
- 3. [Features](#3-features)
802
- - [Namespace API](#namespace-api---multi-tenant-isolation)
803
- - [Priority Queue API](#priority-queue-api---task-processing)
804
- 4. [Architecture Overview](#4-architecture-overview)
805
- 5. [Core Key-Value Operations](#5-core-key-value-operations)
806
- 6. [Transactions (ACID with SSI)](#6-transactions-acid-with-ssi)
807
- 7. [Query Builder](#7-query-builder)
808
- 8. [Prefix Scanning](#8-prefix-scanning)
809
- 9. [SQL Operations](#9-sql-operations)
810
- 10. [Table Management & Index Policies](#10-table-management--index-policies)
811
- 11. [Namespaces & Collections](#11-namespaces--collections)
812
- 12. [Priority Queues](#12-priority-queues)
813
- 13. [Vector Search](#13-vector-search)
814
- 14. [Hybrid Search (Vector + BM25)](#14-hybrid-search-vector--bm25)
815
- 15. [Graph Operations](#15-graph-operations)
816
- 16. [Temporal Graph (Time-Travel)](#16-temporal-graph-time-travel)
817
- 17. [Semantic Cache](#17-semantic-cache)
818
- 18. [Context Query Builder (LLM Optimization)](#18-context-query-builder-llm-optimization)
819
- 19. [Atomic Multi-Index Writes](#19-atomic-multi-index-writes)
820
- 20. [Recovery & WAL Management](#20-recovery--wal-management)
821
- 21. [Checkpoints & Snapshots](#21-checkpoints--snapshots)
822
- 22. [Compression & Storage](#22-compression--storage)
823
- 23. [Statistics & Monitoring](#23-statistics--monitoring)
824
- 24. [Server Mode (gRPC Client)](#24-server-mode-grpc-client)
825
- 25. [IPC Client (Unix Sockets)](#25-ipc-client-unix-sockets)
826
- 26. [Error Handling](#26-error-handling)
827
- 27. [Complete Examples](#27-complete-examples)
790
+ > **Version 0.5.3**: Complete API documentation with TypeScript examples.
791
+
792
+ All core logic runs in the Rust engine via FFI. The SDK is a thin client.
828
793
 
829
794
  ---
830
795
 
831
- ## 1. Quick Start
796
+ ### Core Key-Value Operations
832
797
 
833
798
  ```typescript
834
- from sochdb import Database
799
+ import { EmbeddedDatabase } from '@sochdb/sochdb';
835
800
 
836
- # Open (or create) a database
837
- db = Database.open("./my_database")
801
+ const db = EmbeddedDatabase.open('./mydb');
838
802
 
839
- # Store and retrieve data
840
- db.put(b"hello", b"world")
841
- value = db.get(b"hello") # b"world"
803
+ // Put / Get / Delete
804
+ await db.put(Buffer.from('user:1'), Buffer.from('{"name":"Alice"}'));
805
+ const value = await db.get(Buffer.from('user:1'));
806
+ console.log(value?.toString()); // {"name":"Alice"}
807
+ await db.delete(Buffer.from('user:1'));
842
808
 
843
- # Use transactions for atomic operations
844
- with db.transaction() as txn:
845
- txn.put(b"key1", b"value1")
846
- txn.put(b"key2", b"value2")
847
- # Auto-commits on success, auto-rollbacks on exception
809
+ // Path-based keys (hierarchical)
810
+ await db.putPath('/users/alice/profile', Buffer.from('{"age":30}'));
811
+ const profile = await db.getPath('/users/alice/profile');
848
812
 
849
- # Clean up
850
- db.delete(b"hello")
851
- db.close()
813
+ db.close();
852
814
  ```
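Because keys and values are raw `Buffer`s, structured values need explicit serialization. Below is a minimal sketch of typed JSON helpers layered on `put`/`get` calls shaped like the ones above. The `KvLike` interface, the `putJson`/`getJson` helpers, and the in-memory `MemoryKv` stand-in are hypothetical illustrations, not SDK exports; `MemoryKv` exists only so the sketch runs without the native engine.

```typescript
// Hypothetical minimal interface mirroring the put/get calls shown above.
interface KvLike {
  put(key: Buffer, value: Buffer): Promise<void>;
  get(key: Buffer): Promise<Buffer | null | undefined>;
}

// Serialize a JSON-compatible value to UTF-8 bytes and store it under a string key.
async function putJson<T>(db: KvLike, key: string, value: T): Promise<void> {
  await db.put(Buffer.from(key, 'utf8'), Buffer.from(JSON.stringify(value), 'utf8'));
}

// Read and parse a JSON value; returns undefined when the key is absent.
async function getJson<T>(db: KvLike, key: string): Promise<T | undefined> {
  const raw = await db.get(Buffer.from(key, 'utf8'));
  return raw ? (JSON.parse(raw.toString('utf8')) as T) : undefined;
}

// In-memory stand-in so the sketch runs without the native engine.
class MemoryKv implements KvLike {
  private readonly data = new Map<string, Buffer>();

  async put(key: Buffer, value: Buffer): Promise<void> {
    this.data.set(key.toString('hex'), Buffer.from(value)); // copy the value bytes
  }

  async get(key: Buffer): Promise<Buffer | undefined> {
    return this.data.get(key.toString('hex'));
  }
}
```

With the real SDK you would pass the database instance in place of `MemoryKv`, assuming its `put`/`get` signatures match this interface.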
853
815
 
854
- **30-Second Overview:**
855
- - **Key-Value**: Fast reads/writes with `get`/`put`/`delete`
856
- - **Transactions**: ACID with SSI isolation
857
- - **Vector Search**: HNSW-based semantic search
858
- - **Hybrid Search**: Combine vectors with BM25 keyword search
859
- - **Graph**: Build and traverse knowledge graphs
860
- - **LLM-Optimized**: TOON format uses 40-60% fewer tokens than JSON
861
-
862
816
  ---
863
817
 
864
- ## 2. Installation
865
-
866
- ```bash
867
- npm install @sochdb/sochdb
868
- ```
818
+ ### Transactions (ACID with SSI)
869
819
 
870
- **Platform Support:**
871
- | Platform | Architecture | Status |
872
- |----------|--------------|--------|
873
- | Linux | x86_64, aarch64 | ✅ Full support |
874
- | macOS | x86_64, arm64 | ✅ Full support |
875
- | Windows | x86_64 | ✅ Full support |
820
+ SochDB uses Serializable Snapshot Isolation for full ACID transactions:
876
821
 
877
- **Optional Dependencies:**
878
- ```bash
879
- # For async support
880
- npm install @sochdb/sochdb[async]
822
+ ```typescript
823
+ const db = EmbeddedDatabase.open('./mydb');
881
824
 
882
- # For server mode
883
- npm install @sochdb/sochdb[grpc]
825
+ // Auto-managed transaction
826
+ await db.withTransaction(async (txn) => {
827
+   await txn.put(Buffer.from('key1'), Buffer.from('val1'));
828
+   await txn.put(Buffer.from('key2'), Buffer.from('val2'));
829
+   const v = await txn.get(Buffer.from('key1')); // sees this transaction's own uncommitted write
830
+   // Auto-commits on success, auto-aborts on throw
831
+ });
884
832
 
885
- # Everything
886
- npm install @sochdb/sochdb[all]
833
+ // Manual transaction control
834
+ const txn = db.transaction();
835
+ try {
836
+   await txn.put(Buffer.from('balance:alice'), Buffer.from('100'));
837
+   await txn.put(Buffer.from('balance:bob'), Buffer.from('200'));
838
+   await txn.commit(); // Single atomic fsync
839
+ } catch (err) {
840
+   await txn.abort();
841
+   throw err;
842
+ }
887
843
  ```
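Under SSI, a commit can fail when a concurrent transaction conflicts with it, and the usual remedy is to re-run the whole transaction body against a fresh snapshot. A generic retry sketch follows. The conflict test is passed in as `isRetryable` because this section does not name the SDK's conflict error type; `withRetry`, the attempt count, and the backoff values are illustrative assumptions.

```typescript
// Re-run `run` when it fails with an error that `isRetryable` accepts
// (for example an SSI serialization conflict), with exponential backoff.
// `isRetryable` is caller-supplied: no SDK error class is assumed here.
async function withRetry<T>(
  run: () => Promise<T>,
  isRetryable: (err: unknown) => boolean,
  maxAttempts = 3,
  baseDelayMs = 10,
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await run();
    } catch (err) {
      // Give up on non-retryable errors or once the attempt budget is spent.
      if (attempt >= maxAttempts || !isRetryable(err)) throw err;
      const delayMs = baseDelayMs * 2 ** (attempt - 1);
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
}
```

Wrap the entire `db.withTransaction(...)` call as `run`, not individual `put`s, so each retry opens a new transaction and re-reads current values.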
888
844
 
889
845
  ---
890
846
 
891
- ## 3. Architecture Overview
892
-
893
- SochDB supports two deployment modes:
894
-
895
- ### Embedded Mode (Default)
896
-
897
- Direct Rust bindings via FFI. No server required.
898
-
899
- ```typescript
900
- from sochdb import Database
901
-
902
- with Database.open("./mydb") as db:
903
- db.put(b"key", b"value")
904
- value = db.get(b"key")
905
- ```
906
-
907
- **Best for:** Local development, notebooks, single-process applications.
908
-
909
- ### Server Mode (gRPC)
910
-
911
- Thin client connecting to `sochdb-grpc` server.
847
+ ### Prefix Scanning
912
848
 
913
849
  ```typescript
914
- from sochdb import SochDBClient
915
-
916
- client = SochDBClient("localhost:50051")
917
- client.put(b"key", b"value", namespace="default")
918
- value = client.get(b"key", namespace="default")
919
- ```
920
-
921
- **Best for:** Production, multi-process, distributed systems.
850
+ const db = EmbeddedDatabase.open('./mydb');
922
851
 
923
- ### Feature Comparison
852
+ // Insert test data
853
+ await db.put(Buffer.from('user:1'), Buffer.from('Alice'));
854
+ await db.put(Buffer.from('user:2'), Buffer.from('Bob'));
855
+ await db.put(Buffer.from('user:3'), Buffer.from('Charlie'));
924
856
 
925
- | Feature | Embedded | Server |
926
- |---------|----------|--------|
927
- | Setup | `npm install` only | Server + client |
928
- | Performance | Fastest (in-process) | Network overhead |
929
- | Multi-process | ❌ | ✅ |
930
- | Horizontal scaling | ❌ | ✅ |
931
- | Vector search | ✅ | ✅ |
932
- | Graph operations | ✅ | ✅ |
933
- | Semantic cache | ✅ | ✅ |
934
- | Context service | Limited | ✅ Full |
935
- | MCP integration | ❌ | ✅ |
857
+ // Scan all keys with prefix
858
+ for await (const [key, value] of db.scanPrefix(Buffer.from('user:'))) {
859
+   console.log(`${key.toString()} = ${value.toString()}`);
860
+ }
936
861
 
937
- ```
938
- ┌─────────────────────────────────────────────────────────────┐
939
- │ DEPLOYMENT OPTIONS │
940
- ├─────────────────────────────────────────────────────────────┤
941
- │ EMBEDDED MODE (FFI) SERVER MODE (gRPC) │
942
- │ ┌─────────────────────┐ ┌─────────────────────┐ │
943
- │ │ Node.js App │ │ Node.js App │ │
944
- │ │ ├─ Database.open()│ │ ├─ SochDBClient() │ │
945
- │ │ └─ Direct FFI │ │ └─ gRPC calls │ │
946
- │ │ │ │ │ │ │ │
947
- │ │ ▼ │ │ ▼ │ │
948
- │ │ libsochdb_storage │ │ sochdb-grpc │ │
949
- │ │ (Rust native) │ │ (Rust server) │ │
950
- │ └─────────────────────┘ └─────────────────────┘ │
951
- │ │
952
- │ ✅ No server needed ✅ Multi-language │
953
- │ ✅ Local files ✅ Centralized logic │
954
- │ ✅ Simple deployment ✅ Production scale │
955
- └─────────────────────────────────────────────────────────────┘
862
+ // Transaction-scoped scan
863
+ const txn = db.transaction();
864
+ for await (const [k, v] of txn.scanPrefix(Buffer.from('order:'))) {
865
+   console.log(`${k.toString()} = ${v.toString()}`);
866
+ }
867
+ await txn.commit();
956
868
  ```
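Ordered key-value stores commonly implement a prefix scan as a range scan over the half-open interval `[prefix, successor(prefix))`, where the successor is the smallest key greater than every key that starts with the prefix. The sketch below shows that general technique on a sorted in-memory array. It illustrates lexicographic byte-order semantics only and is not a description of how the Rust engine implements `scanPrefix`; `prefixSuccessor` and `scanPrefixSorted` are hypothetical names.

```typescript
// Smallest byte string greater than every key beginning with `prefix`:
// drop trailing 0xff bytes, then increment the last remaining byte.
// Returns null when no upper bound exists (empty prefix or all 0xff bytes).
function prefixSuccessor(prefix: Buffer): Buffer | null {
  let end = prefix.length;
  while (end > 0 && prefix[end - 1] === 0xff) end--;
  if (end === 0) return null;
  const next = Buffer.from(prefix.subarray(0, end)); // copy; never mutate the input
  next[end - 1] += 1; // safe: this byte is < 0xff
  return next;
}

// Yield entries in the half-open range [prefix, successor(prefix)).
// `entries` must already be sorted by Buffer.compare on the key.
function* scanPrefixSorted(
  entries: Array<[Buffer, Buffer]>,
  prefix: Buffer,
): Generator<[Buffer, Buffer]> {
  const upper = prefixSuccessor(prefix);
  for (const [k, v] of entries) {
    if (Buffer.compare(k, prefix) < 0) continue; // before the range
    if (upper && Buffer.compare(k, upper) >= 0) break; // past the range
    yield [k, v];
  }
}
```

For example, a scan for `user:` stops at `user;`, because `;` (0x3b) is the byte immediately after `:` (0x3a).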
957
869
 
958
870
  ---
959
871
 
960
- ## 4. Core Key-Value Operations
961
-
962
- All keys and values are **bytes**.
963
-
964
- ### Basic Operations
965
-
966
- ```typescript
967
- from sochdb import Database
968
-
969
- db = Database.open("./my_db")
970
-
971
- # Store data
972
- db.put(b"user:1", b"Alice")
973
- db.put(b"user:2", b"Bob")
974
-
975
- # Retrieve data
976
- user = db.get(b"user:1") # Returns b"Alice" or None
977
-
978
- # Check existence
979
- exists = db.exists(b"user:1") # True
980
-
981
- # Delete data
982
- db.delete(b"user:1")
983
-
984
- db.close()
985
- ```
986
-
987
- ### Path-Based Keys (Hierarchical)
872
+ ### Namespaces & Collections
988
873
 
989
- Organize data hierarchically with path-based access:
874
+ Multi-tenant isolation with vector-enabled collections:
990
875
 
991
876
  ```typescript
992
- # Store with path (strings auto-converted to bytes internally)
993
- db.put_path("users/alice/name", b"Alice Smith")
994
- db.put_path("users/alice/email", b"alice@example.com")
995
- db.put_path("users/bob/name", b"Bob Jones")
996
-
997
- # Retrieve by path
998
- name = db.get_path("users/alice/name") # b"Alice Smith"
999
-
1000
- # Delete by path
1001
- db.delete_path("users/alice/email")
877
+ import { EmbeddedDatabase } from '@sochdb/sochdb';
1002
878
 
1003
- # List at path (like listing directory)
1004
- children = db.list_path("users/") # ["alice", "bob"]
1005
- ```
879
+ const db = EmbeddedDatabase.open('./mydb');
1006
880
 
1007
- ### With TTL (Time-To-Live)
881
+ // Create or get a namespace
882
+ const ns = await db.getOrCreateNamespace('tenant_1', {
883
+   displayName: 'Tenant One',
884
+   labels: { tier: 'premium' },
885
+ });
1008
886
 
1009
- ```typescript
1010
- # Store with expiration (seconds)
1011
- db.put(b"session:abc123", b"user_data", ttl_seconds=3600) # Expires in 1 hour
887
+ // Create a collection with vector search
888
+ const docs = await ns.createCollection({
889
+   name: 'documents',
890
+   dimension: 384,
891
+   metric: 'cosine',
892
+   indexed: true,
893
+   hnswM: 16,
894
+   hnswEfConstruction: 100,
895
+ });
1012
896
 
1013
- # TTL of 0 means no expiration
1014
- db.put(b"permanent_key", b"value", ttl_seconds=0)
1015
- ```
897
+ // Insert vectors with metadata
898
+ const id = await docs.insert(
899
+   [0.1, 0.2, 0.3 /* ...384 dims */],
900
+   { title: 'Introduction to AI', author: 'Alice' },
901
+ );
1016
902
 
1017
- ### Batch Operations
903
+ // Batch insert
904
+ const ids = await docs.insertMany(
905
+   [[0.1, 0.2 /* ... */], [0.3, 0.4 /* ... */]],
906
+   [{ title: 'Doc 1' }, { title: 'Doc 2' }],
907
+ );
1018
908
 
1019
- ```typescript
1020
- # Batch put (more efficient than individual puts)
1021
- db.put_batch([
1022
- (b"key1", b"value1"),
1023
- (b"key2", b"value2"),
1024
- (b"key3", b"value3"),
1025
- ])
1026
-
1027
- # Batch get
1028
- values = db.get_batch([b"key1", b"key2", b"key3"])
1029
- # Returns: [b"value1", b"value2", b"value3"] (None for missing keys)
1030
-
1031
- # Batch delete
1032
- db.delete_batch([b"key1", b"key2", b"key3"])
1033
- ```
909
+ // Search with optional filter
910
+ const results = await docs.search({
911
+   queryVector: [0.1, 0.2, 0.3 /* ...384 dims */],
912
+   k: 5,
913
+   filter: { author: 'Alice' },
914
+   includeMetadata: true,
915
+ });
916
+ results.forEach(r => console.log(`${r.id}: score=${r.score.toFixed(4)}`));
1034
917
 
1035
- ### Context Manager
918
+ // Collection management
919
+ const count = await docs.count();
920
+ const collections = await ns.listCollections(); // ['documents']
921
+ await ns.deleteCollection('documents');
1036
922
 
1037
- ```typescript
1038
- with Database.open("./my_db") as db:
1039
- db.put(b"key", b"value")
1040
- # Automatically closes when exiting
923
+ // Namespace management
924
+ const namespaces = await db.listNamespaces();
925
+ await db.deleteNamespace('tenant_1');
1041
926
  ```
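For intuition about what `search({ queryVector, k, filter })` returns on a `metric: 'cosine'` collection, here is an exact brute-force reference: score every vector by cosine similarity, keep only items whose metadata matches the filter, and return the best `k`. The HNSW index approximates this ranking without scoring every vector. Treating `filter` as exact equality on every listed key is an assumption, since the SDK's filter semantics are not specified here; `bruteForceSearch` and the `Item`/`Hit` shapes are hypothetical.

```typescript
type Metadata = Record<string, unknown>;
interface Item { id: string; vector: number[]; metadata: Metadata; }
interface Hit { id: string; score: number; metadata: Metadata; }

// Cosine similarity; 0 when either vector has zero magnitude.
function cosine(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error('dimension mismatch');
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

// Exact top-k: filter on metadata equality, score, sort descending, truncate.
function bruteForceSearch(items: Item[], query: number[], k: number, filter: Metadata = {}): Hit[] {
  return items
    .filter((it) => Object.entries(filter).every(([key, val]) => it.metadata[key] === val))
    .map((it) => ({ id: it.id, score: cosine(it.vector, query), metadata: it.metadata }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

Running an exact pass like this on a small sample is also a common way to measure the recall of approximate (HNSW) results.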
1042
927
 
1043
928
  ---
1044
929
 
1045
- ## 5. Transactions (ACID with SSI)
1046
-
1047
- SochDB provides full ACID transactions with **Serializable Snapshot Isolation (SSI)**.
1048
-
1049
- ### Context Manager Pattern (Recommended)
1050
-
1051
- ```typescript
1052
- # Auto-commits on success, auto-rollbacks on exception
1053
- with db.transaction() as txn:
1054
- txn.put(b"accounts/alice", b"1000")
1055
- txn.put(b"accounts/bob", b"500")
1056
-
1057
- # Read within transaction sees your writes
1058
- balance = txn.get(b"accounts/alice") # b"1000"
1059
-
1060
- # If exception occurs, rolls back automatically
1061
- ```
1062
-
1063
- ### Closure Pattern (Rust-Style)
1064
-
1065
- ```typescript
1066
- # Using with_transaction for automatic commit/rollback
1067
- def transfer_funds(txn):
1068
- alice = int(txn.get(b"accounts/alice") or b"0")
1069
- bob = int(txn.get(b"accounts/bob") or b"0")
1070
-
1071
- txn.put(b"accounts/alice", str(alice - 100).encode())
1072
- txn.put(b"accounts/bob", str(bob + 100).encode())
1073
-
1074
- return "Transfer complete"
1075
-
1076
- result = db.with_transaction(transfer_funds)
1077
- ```
1078
-
1079
- ### Manual Transaction Control
1080
-
1081
- ```typescript
1082
- txn = db.begin_transaction()
1083
- try:
1084
- txn.put(b"key1", b"value1")
1085
- txn.put(b"key2", b"value2")
1086
-
1087
- commit_ts = txn.commit() # Returns HLC timestamp
1088
- print(f"Committed at: {commit_ts}")
1089
- except Exception as e:
1090
- txn.abort()
1091
- raise
1092
- ```
1093
-
1094
- ### Transaction Properties
1095
-
1096
- ```typescript
1097
- txn = db.transaction()
1098
- print(f"Transaction ID: {txn.id}") # Unique identifier
1099
- print(f"Start timestamp: {txn.start_ts}") # HLC start time
1100
- print(f"Isolation: {txn.isolation}") # "serializable"
1101
- ```
930
+ ### Priority Queues
1102
931
 
1103
- ### SSI Conflict Handling
932
+ Ordered task queues with priority-based dequeue, ack/nack, and dead-letter support:
1104
933
 
1105
934
  ```typescript
1106
- from sochdb import TransactionConflictError
1107
-
1108
- MAX_RETRIES = 3
1109
-
1110
- for attempt in range(MAX_RETRIES):
1111
- try:
1112
- with db.transaction() as txn:
1113
- # Read and modify
1114
- value = int(txn.get(b"counter") or b"0")
1115
- txn.put(b"counter", str(value + 1).encode())
1116
- break # Success
1117
- except TransactionConflictError:
1118
- if attempt == MAX_RETRIES - 1:
1119
- raise
1120
- # Retry on conflict
1121
- continue
1122
- ```
935
+ import { createQueue, EmbeddedDatabase } from '@sochdb/sochdb';
1123
936
 
1124
- ### All Transaction Operations
937
+ const db = EmbeddedDatabase.open('./mydb');
1125
938
 
1126
- ```typescript
1127
- with db.transaction() as txn:
1128
- # Key-value
1129
- txn.put(key, value)
1130
- txn.get(key)
1131
- txn.delete(key)
1132
- txn.exists(key)
1133
-
1134
- # Path-based
1135
- txn.put_path(path, value)
1136
- txn.get_path(path)
1137
-
1138
- # Batch operations
1139
- txn.put_batch(pairs)
1140
- txn.get_batch(keys)
1141
-
1142
- # Scanning
1143
- for k, v in txn.scan_prefix(b"prefix/"):
1144
- print(k, v)
1145
-
1146
- # SQL (within transaction isolation)
1147
- result = txn.execute("SELECT * FROM users WHERE id = 1")
1148
- ```
939
+ // Create a queue
940
+ const queue = createQueue(db, 'background-jobs', {
941
+   visibilityTimeout: 30,
942
+   maxRetries: 3,
943
+   deadLetterQueue: 'failed-jobs',
944
+ });
1149
945
 
1150
- ### Isolation Levels
946
+ // Enqueue tasks (lower priority = higher urgency)
947
+ const taskId = await queue.enqueue(
948
+   1, // priority
949
+   Buffer.from(JSON.stringify({ action: 'send_email', to: 'alice@example.com' })),
950
+   { source: 'api', retryable: true },
951
+ );
1151
952
 
1152
- ```typescript
1153
- from sochdb import IsolationLevel
953
+ await queue.enqueue(10, Buffer.from('low priority'));
954
+ await queue.enqueue(1, Buffer.from('high priority'));
1154
955
 
1155
- # Default: Serializable (strongest)
1156
- with db.transaction(isolation=IsolationLevel.SERIALIZABLE) as txn:
1157
- pass
956
+ // Dequeue highest priority task
957
+ const task = await queue.dequeue('worker-1');
958
+ if (task) {
959
+   console.log(`Processing: ${task.taskId}`);
960
+   console.log(`Priority: ${task.priority}, State: ${task.state}`);
961
+
962
+   try {
963
+     // Process task...
964
+     await queue.ack(task.taskId); // Mark completed
965
+   } catch (err) {
966
+     await queue.nack(task.taskId); // Re-queue for retry
967
+   }
968
+ }
1158
969
 
1159
- # Snapshot isolation (faster, allows some anomalies)
1160
- with db.transaction(isolation=IsolationLevel.SNAPSHOT) as txn:
1161
- pass
970
+ // Queue statistics
971
+ const stats = await queue.stats();
972
+ console.log(`Pending: ${stats.pending}, Claimed: ${stats.claimed}`);
1162
973
 
1163
- # Read committed (fastest, least isolation)
1164
- with db.transaction(isolation=IsolationLevel.READ_COMMITTED) as txn:
1165
- pass
974
+ // Purge completed/dead-lettered tasks
975
+ const purged = await queue.purge();
1166
976
  ```
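A toy in-memory model of the queue semantics described above: the lowest priority number is dequeued first, a claimed task is hidden from other workers until it is acked or nacked, and a task nacked more than `maxRetries` times moves to a dead-letter list. `ToyQueue`, its fields, FIFO tie-breaking, and the retry-counting rule are illustrative assumptions; the real queue runs in the Rust engine, and this sketch omits `visibilityTimeout` expiry entirely.

```typescript
type TaskState = 'pending' | 'claimed' | 'completed' | 'dead';

interface ToyTask {
  taskId: number;
  priority: number; // lower number = more urgent
  payload: Buffer;
  state: TaskState;
  failures: number; // times the task has been nacked
}

class ToyQueue {
  private readonly tasks = new Map<number, ToyTask>(); // Map iterates in insertion order
  private readonly maxRetries: number;
  private nextId = 1;
  readonly deadLetters: ToyTask[] = [];

  constructor(maxRetries: number) {
    this.maxRetries = maxRetries;
  }

  enqueue(priority: number, payload: Buffer): number {
    const taskId = this.nextId++;
    this.tasks.set(taskId, { taskId, priority, payload, state: 'pending', failures: 0 });
    return taskId;
  }

  // Claim the most urgent pending task; ties go to the earliest enqueued.
  // (Linear scan for clarity; a binary heap avoids O(n) work per dequeue.)
  dequeue(): ToyTask | undefined {
    let best: ToyTask | undefined;
    for (const t of this.tasks.values()) {
      if (t.state === 'pending' && (!best || t.priority < best.priority)) best = t;
    }
    if (best) best.state = 'claimed';
    return best;
  }

  ack(taskId: number): void {
    const t = this.tasks.get(taskId);
    if (t && t.state === 'claimed') t.state = 'completed';
  }

  // Put a failed task back, or dead-letter it once maxRetries retries are used up.
  nack(taskId: number): void {
    const t = this.tasks.get(taskId);
    if (!t || t.state !== 'claimed') return;
    t.failures++;
    if (t.failures > this.maxRetries) {
      t.state = 'dead';
      this.deadLetters.push(t);
    } else {
      t.state = 'pending';
    }
  }
}
```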
1167
977
 
1168
978
  ---
1169
979
 
1170
- ## 6. Query Builder
1171
-
1172
- Fluent API for building efficient queries with predicate pushdown.
1173
-
1174
- ### Basic Query
1175
-
1176
- ```typescript
1177
- # Query with prefix and limit
1178
- results = db.query("users/")
1179
- .limit(10)
1180
- .execute()
1181
-
1182
- for key, value in results:
1183
- print(f"{key.decode()}: {value.decode()}")
1184
- ```
1185
-
1186
- ### Filtered Query
1187
-
1188
- ```typescript
1189
- from sochdb import CompareOp
1190
-
1191
- # Query with filters
1192
- results = db.query("orders/")
1193
- .where("status", CompareOp.EQ, "pending")
1194
- .where("amount", CompareOp.GT, 100)
1195
- .order_by("created_at", descending=True)
1196
- .limit(50)
1197
- .offset(10)
1198
- .execute()
1199
- ```
980
+ ### Graph Operations
1200
981
 
1201
- ### Column Selection
982
+ Graph overlay stored as key-value pairs in the Rust engine:
1202
983
 
1203
984
  ```typescript
1204
- # Select specific fields only
1205
- results = db.query("users/")
1206
- .select(["name", "email"]) # Only fetch these columns
1207
- .where("active", CompareOp.EQ, True)
1208
- .execute()
1209
- ```
1210
-
1211
- ### Aggregate Queries
985
+ const db = EmbeddedDatabase.open('./mydb');
1212
986
 
1213
- ```typescript
1214
- # Count
1215
- count = db.query("orders/")
1216
- .where("status", CompareOp.EQ, "completed")
1217
- .count()
1218
-
1219
- # Sum (for numeric columns)
1220
- total = db.query("orders/")
1221
- .sum("amount")
1222
-
1223
- # Group by
1224
- results = db.query("orders/")
1225
- .select(["status", "COUNT(*)", "SUM(amount)"])
1226
- .group_by("status")
1227
- .execute()
1228
- ```
987
+ // Add nodes
988
+ await db.addNode('social', 'alice', 'person', { name: 'Alice', role: 'engineer' });
989
+ await db.addNode('social', 'bob', 'person', { name: 'Bob', role: 'designer' });
990
+ await db.addNode('social', 'acme', 'company', { name: 'Acme Corp' });
1229
991
 
1230
- ### Query in Transaction
992
+ // Add edges
993
+ await db.addEdge('social', 'alice', 'works_at', 'acme');
994
+ await db.addEdge('social', 'bob', 'works_at', 'acme');
995
+ await db.addEdge('social', 'alice', 'knows', 'bob');
1231
996
 
1232
- ```typescript
1233
- with db.transaction() as txn:
1234
- results = txn.query("users/")
1235
- .where("role", CompareOp.EQ, "admin")
1236
- .execute()
997
+ // Graph traversal (BFS or DFS)
998
+ const result = await db.traverse('social', 'alice', 2, 'bfs');
999
+ console.log(`Nodes: ${result.nodes.length}, Edges: ${result.edges.length}`);
1237
1000
  ```
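`traverse(graph, start, depth, 'bfs')` returns the nodes and edges reachable within a hop limit. Here is a self-contained sketch of depth-bounded breadth-first search over an in-memory edge list. Following outgoing edges only, and the `Edge`/`TraversalResult` shapes, are assumptions: the engine's edge direction and any result fields beyond `nodes` and `edges` are not documented here.

```typescript
interface Edge { from: string; type: string; to: string; }
interface TraversalResult { nodes: string[]; edges: Edge[]; }

// Breadth-first traversal from `start`, following outgoing edges up to `maxDepth` hops.
function bfs(edges: Edge[], start: string, maxDepth: number): TraversalResult {
  // Build an adjacency list keyed by source node.
  const out = new Map<string, Edge[]>();
  for (const e of edges) {
    const list = out.get(e.from) ?? [];
    list.push(e);
    out.set(e.from, list);
  }

  const visited = new Set<string>([start]);
  const visitedEdges: Edge[] = [];
  let frontier = [start];
  // Each iteration expands one hop; every node is expanded at most once.
  for (let depth = 0; depth < maxDepth && frontier.length > 0; depth++) {
    const next: string[] = [];
    for (const node of frontier) {
      for (const e of out.get(node) ?? []) {
        visitedEdges.push(e);
        if (!visited.has(e.to)) {
          visited.add(e.to);
          next.push(e.to);
        }
      }
    }
    frontier = next;
  }
  return { nodes: Array.from(visited), edges: visitedEdges };
}
```

Swapping the `frontier` array for a stack, and expanding one node at a time, would turn this into the depth-first variant.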
1238
1001
 
1239
1002
  ---
1240
1003
 
1241
- ## 7. Prefix Scanning
1242
-
1243
- Iterate over keys with common prefixes efficiently.
1244
-
1245
- ### Safe Prefix Scan (Recommended)
1246
-
1247
- ```typescript
1248
- # Requires minimum 2-byte prefix (prevents accidental full scans)
1249
- for key, value in db.scan_prefix(b"users/"):
1250
- print(f"{key.decode()}: {value.decode()}")
1251
-
1252
- # Raises ValueError if prefix < 2 bytes
1253
- ```
1254
-
1255
- ### Unchecked Prefix Scan
1256
-
1257
- ```typescript
1258
- # For internal operations needing empty/short prefixes
1259
- # WARNING: Can cause expensive full-database scans
1260
- for key, value in db.scan_prefix_unchecked(b""):
1261
- print(f"All keys: {key}")
1262
- ```
1004
+ ### Semantic Cache
1263
1005
 
1264
- ### Batched Scanning (1000x Faster)
1006
+ Cache LLM responses with vector-similarity retrieval:
1265
1007
 
1266
1008
  ```typescript
1267
- # Fetches 1000 results per FFI call instead of 1
1268
- # Performance: 10,000 results = 10 FFI calls vs 10,000 calls
1269
-
1270
- for key, value in db.scan_batched(b"prefix/", batch_size=1000):
1271
- process(key, value)
1272
- ```
1009
+ import { SemanticCache, EmbeddedDatabase } from '@sochdb/sochdb';
1273
1010
 
1274
- ### Reverse Scan
1011
+ const db = EmbeddedDatabase.open('./cache_db');
1012
+ const cache = new SemanticCache(db, 'llm_responses');
1275
1013
 
1276
- ```typescript
1277
- # Scan in reverse order (newest first)
1278
- for key, value in db.scan_prefix(b"logs/", reverse=True):
1279
- print(key, value)
1280
- ```
1014
+ // Store response with embedding
1015
+ await cache.put(
1016
+   'What is machine learning?',
1017
+   'Machine learning is a subset of AI...',
1018
+   [0.1, 0.2, 0.3 /* ... */], // embedding vector
1019
+   3600, // TTL seconds
1020
+   { model: 'gpt-4', tokens: 42 },
1021
+ );
1281
1022
 
1282
- ### Range Scan
1023
+ // Check cache before calling LLM
1024
+ const hit = await cache.get(queryEmbedding, 0.85); // queryEmbedding: embedding of the incoming prompt; 0.85: similarity threshold
1025
+ if (hit) {
1026
+   console.log(`Cache HIT (score: ${hit.score.toFixed(4)})`);
1027
+   console.log(`Response: ${hit.value}`);
1028
+ }
1283
1029
 
1284
- ```typescript
1285
- # Scan within a specific range
1286
- for key, value in db.scan_range(b"users/a", b"users/m"):
1287
- print(key, value) # All users from "a" to "m"
1030
+ // Cache management
1031
+ const stats = await cache.stats();
1032
+ console.log(`Hit rate: ${(stats.hitRate * 100).toFixed(1)}%`);
1033
+ await cache.purgeExpired();
1034
+ await cache.clear();
1288
1035
  ```
1289
1036
 
1290
- ### Streaming Large Results
1037
+ Convenience methods on `EmbeddedDatabase`:
1291
1038
 
1292
1039
  ```typescript
1293
- # For very large result sets, use streaming to avoid memory issues
1294
- for batch in db.scan_stream(b"logs/", batch_size=10000):
1295
- for key, value in batch:
1296
- process(key, value)
1297
- # Memory is freed after processing each batch
1040
+ await db.cachePut('my_cache', 'key', 'value', [0.1, 0.2 /* ... */], 3600);
1041
+ const val = await db.cacheGet('my_cache', [0.1, 0.2 /* ... */], 0.85);
1042
+ await db.cacheDelete('my_cache', 'key');
1043
+ await db.cacheClear('my_cache');
1298
1044
  ```
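The lookup rule behind `cache.get(embedding, threshold)` can be sketched as follows: among unexpired entries, return the most similar one if its similarity is at least the threshold, otherwise report a miss. This sketch assumes cosine similarity and a TTL counted in seconds from insertion; `ToySemanticCache` is hypothetical and does not reflect the engine's actual index or eviction policy.

```typescript
interface CacheEntry { key: string; value: string; embedding: number[]; expiresAt: number; }
interface CacheHit { key: string; value: string; score: number; }

function cosineSim(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error('embedding dimension mismatch');
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

class ToySemanticCache {
  private entries: CacheEntry[] = [];
  private readonly now: () => number;

  // `now` (milliseconds) is injectable so expiry can be tested without waiting.
  constructor(now: () => number = Date.now) {
    this.now = now;
  }

  put(key: string, value: string, embedding: number[], ttlSeconds: number): void {
    this.entries.push({ key, value, embedding, expiresAt: this.now() + ttlSeconds * 1000 });
  }

  // Highest-scoring unexpired entry at or above `threshold`, or null on a miss.
  get(query: number[], threshold: number): CacheHit | null {
    const t = this.now();
    let best: CacheHit | null = null;
    for (const e of this.entries) {
      if (e.expiresAt <= t) continue; // skip expired entries
      const score = cosineSim(e.embedding, query);
      if (score >= threshold && (best === null || score > best.score)) {
        best = { key: e.key, value: e.value, score };
      }
    }
    return best;
  }
}
```

Raising the threshold trades hit rate for precision: fewer paraphrases match, but a hit is less likely to answer a different question.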
1299
1045
 
1300
1046
  ---
1301
1047
 
1302
- ## 8. SQL Operations
1303
-
1304
- Execute SQL queries for familiar relational patterns.
1305
-
1306
- ### Creating Tables
1307
-
1308
- ```typescript
1309
- db.execute_sql("""
1310
- CREATE TABLE users (
1311
- id INTEGER PRIMARY KEY,
1312
- name TEXT NOT NULL,
1313
- email TEXT UNIQUE,
1314
- age INTEGER,
1315
- created_at TEXT DEFAULT CURRENT_TIMESTAMP
1316
- )
1317
- """)
1318
-
1319
- db.execute_sql("""
1320
- CREATE TABLE posts (
1321
- id INTEGER PRIMARY KEY,
1322
- user_id INTEGER REFERENCES users(id),
1323
- title TEXT NOT NULL,
1324
- content TEXT,
1325
- likes INTEGER DEFAULT 0
1326
- )
1327
- """)
1328
- ```
1329
-
1330
- ### CRUD Operations
1331
-
1332
- ```typescript
1333
- # Insert
1334
- db.execute_sql("""
1335
- INSERT INTO users (id, name, email, age)
1336
- VALUES (1, 'Alice', 'alice@example.com', 30)
1337
- """)
1338
-
1339
- # Insert with parameters (prevents SQL injection)
1340
- db.execute_sql(
1341
- "INSERT INTO users (id, name, email, age) VALUES (?, ?, ?, ?)",
1342
- params=[2, "Bob", "bob@example.com", 25]
1343
- )
1344
-
1345
- # Select
1346
- result = db.execute_sql("SELECT * FROM users WHERE age > 25")
1347
- for row in result.rows:
1348
- print(row) # {'id': 1, 'name': 'Alice', ...}
1349
-
1350
- # Update
1351
- db.execute_sql("UPDATE users SET email = 'alice.new@example.com' WHERE id = 1")
1352
-
1353
- # Delete
1354
- db.execute_sql("DELETE FROM users WHERE id = 2")
1355
- ```
1356
-
1357
- ### Upsert (Insert or Update)
1048
+ ### Context Query Builder
1358
1049
 
1359
- ```typescript
1360
- # Insert or update on conflict
1361
- db.execute_sql("""
1362
- INSERT INTO users (id, name, email) VALUES (1, 'Alice', 'alice@example.com')
1363
- ON CONFLICT (id) DO UPDATE SET
1364
- name = excluded.name,
1365
- email = excluded.email
1366
- """)
1367
- ```
1368
-
1369
- ### Query Results
1050
+ Token-budget-aware context assembly for LLM prompts:
1370
1051
 
1371
1052
  ```typescript
1372
- from sochdb import SQLQueryResult
1373
-
1374
- result = db.execute_sql("SELECT id, name FROM users")
1053
+ import { createContextBuilder, ContextOutputFormat, TruncationStrategy } from '@sochdb/sochdb';
1375
1054
 
1376
- print(f"Columns: {result.columns}") # ['id', 'name']
1377
- print(f"Row count: {len(result.rows)}")
1378
- print(f"Execution time: {result.execution_time_ms}ms")
1055
+ const result = createContextBuilder()
1056
+   .forSession('session_123')
1057
+   .withBudget(4096)
1058
+   .setFormat(ContextOutputFormat.MARKDOWN)
1059
+   .setTruncation(TruncationStrategy.PROPORTIONAL)
1060
+   .literal('system', 100, 'You are a helpful assistant.')
1061
+   .literal('user_query', 90, 'Tell me about SochDB')
1062
+   .section('context')
1063
+   .execute();
1379
1064
 
1380
- for row in result.rows:
1381
- print(f"ID: {row['id']}, Name: {row['name']}")
1382
-
1383
- # Convert to different formats
1384
- df = result.to_dataframe() # pandas DataFrame
1385
- json_data = result.to_json()
1065
+ console.log(`Tokens used: ${result.tokenCount}`);
1066
+ console.log(result.text);
1386
1067
  ```
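One plausible reading of `TruncationStrategy.PROPORTIONAL` is that, when the assembled sections exceed the budget, each section is cut by the same fraction instead of dropping whole sections. The sketch below implements that interpretation only. `fitProportionally` is hypothetical, and counting whitespace-separated words as tokens is a stand-in for a real tokenizer, so its numbers will not match the `tokenCount` the SDK reports.

```typescript
interface Section { name: string; text: string; }

// Crude token estimate: whitespace-separated words (assumption; not a real tokenizer).
const countTokens = (s: string): number => s.split(/\s+/).filter(Boolean).length;

// Shrink every section by the same ratio so the total fits within `budget` tokens.
function fitProportionally(sections: Section[], budget: number): Section[] {
  const total = sections.reduce((n, s) => n + countTokens(s.text), 0);
  if (total <= budget) return sections; // already fits: return unchanged
  const ratio = budget / total;
  return sections.map((s) => {
    const words = s.text.split(/\s+/).filter(Boolean);
    const keep = Math.floor(words.length * ratio); // flooring keeps the sum <= budget
    return { name: s.name, text: words.slice(0, keep).join(' ') };
  });
}
```

Flooring each share guarantees the total never exceeds the budget, at the cost of occasionally leaving a few tokens unused.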
1387
1068
 
1388
- ### Index Management
1389
-
1390
- ```typescript
1391
- # Create index
1392
- db.execute_sql("CREATE INDEX idx_users_email ON users(email)")
1393
-
1394
- # Create unique index
1395
- db.execute_sql("CREATE UNIQUE INDEX idx_users_email ON users(email)")
1396
-
1397
- # Drop index
1398
- db.execute_sql("DROP INDEX IF EXISTS idx_users_email")
1069
+ ---
1399
1070
 
1400
- # List indexes
1401
- indexes = db.list_indexes("users")
1402
- ```
1071
+ ### Memory System (LLM-Native)
1403
1072
 
1404
- ### Prepared Statements
1073
+ Complete memory for AI agents — extraction, consolidation, hybrid retrieval:
1405
1074
 
1406
1075
  ```typescript
1407
- # Prepare once, execute many times
1408
- stmt = db.prepare("SELECT * FROM users WHERE age > ? AND status = ?")
1409
-
1410
- # Execute with different parameters
1411
- young_active = stmt.execute([25, "active"])
1412
- old_active = stmt.execute([50, "active"])
1413
-
1414
- # Close when done
1415
- stmt.close()
1416
- ```
1417
-
1418
- ### Dialect Support
1076
+ import {
1077
+ EmbeddedDatabase, ExtractionPipeline, Consolidator, HybridRetriever, AllowedSet,
1078
+ } from '@sochdb/sochdb';
1419
1079
 
1420
- SochDB auto-detects SQL dialects:
1080
+ const db = EmbeddedDatabase.open('./memory_db');
1421
1081
 
1422
- ```typescript
1423
- # PostgreSQL style
1424
- db.execute_sql("INSERT INTO users VALUES (1, 'Alice') ON CONFLICT DO NOTHING")
1082
+ // 1. Extract entities and relations
1083
+ const pipeline = ExtractionPipeline.fromDatabase(db, 'user_123', {
1084
+ entityTypes: ['person', 'organization', 'location'],
1085
+ minConfidence: 0.7,
1086
+ });
1087
+ const extracted = await pipeline.extractAndCommit(
1088
+ 'Alice joined Acme Corp in San Francisco.',
1089
+ async (text) => ({
1090
+ entities: [
1091
+ { name: 'Alice', entity_type: 'person', confidence: 0.95 },
1092
+ { name: 'Acme Corp', entity_type: 'organization', confidence: 0.9 },
1093
+ ],
1094
+ relations: [
1095
+ { from_entity: 'Alice', relation_type: 'works_at', to_entity: 'Acme Corp' },
1096
+ ],
1097
+ }),
1098
+ );
1425
1099
 
1426
- # MySQL style
1427
- db.execute_sql("INSERT IGNORE INTO users VALUES (1, 'Alice')")
1100
+ // 2. Consolidate facts
1101
+ const consolidator = Consolidator.fromDatabase(db, 'user_123');
1102
+ await consolidator.add({
1103
+ fact: { subject: 'Alice', predicate: 'lives_in', object: 'SF' },
1104
+ source: 'conversation_1',
1105
+ confidence: 0.9,
1106
+ });
1107
+ await consolidator.consolidate();
1108
+ const facts = await consolidator.getCanonicalFacts();
1428
1109
 
1429
- # SQLite style
1430
- db.execute_sql("INSERT OR IGNORE INTO users VALUES (1, 'Alice')")
1110
+ // 3. Hybrid retrieval (vector + BM25 with RRF fusion)
1111
+ const retriever = HybridRetriever.fromDatabase(db, 'user_123', 'documents');
1112
+ await retriever.indexDocuments([
1113
+ { id: 'doc1', content: 'Alice is a software engineer', embedding: [0.1, 0.2 /* ... */] },
1114
+ ]);
1115
+ const results = await retriever.retrieve('Who is Alice?', [0.1, 0.2 /* ... */], AllowedSet.allowAll(), 5);
1431
1116
  ```
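`HybridRetriever` is described as fusing vector and BM25 results with Reciprocal Rank Fusion (RRF). RRF is a standard, library-independent technique: each document scores `1 / (k + rank)` in every ranked list it appears in, and those scores are summed. The sketch below is a minimal version. It assumes string document IDs, 1-based ranks, and `k = 60`, which is a commonly used default. It is not the SDK's implementation.

```typescript
// Reciprocal Rank Fusion: score(d) = sum over input rankings of 1 / (k + rank_d),
// where rank_d is the 1-based position of d in that ranking.
// Documents missing from a ranking simply get no contribution from it.
function reciprocalRankFusion(
  rankings: string[][],
  k = 60,
): Array<{ id: string; score: number }> {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      const rank = index + 1;
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  // Highest fused score first.
  return Array.from(scores.entries())
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score);
}

const vectorHits = ['doc1', 'doc2', 'doc3'];
const bm25Hits = ['doc2', 'doc3', 'doc4'];
const fused = reciprocalRankFusion([vectorHits, bm25Hits]);
```

RRF uses only ranks, so it does not require the vector scores and BM25 scores to be on comparable scales. That is the usual reason to prefer it over a weighted sum of raw scores.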
1432
1117
 
1433
1118
  ---
1434
1119
 
1435
- ## 9. Table Management & Index Policies
1436
-
1437
- ### Table Information
1438
-
1439
- ```typescript
1440
- # Get table schema
1441
- schema = db.get_table_schema("users")
1442
- print(f"Columns: {schema.columns}")
1443
- print(f"Primary key: {schema.primary_key}")
1444
- print(f"Indexes: {schema.indexes}")
1445
-
1446
- # List all tables
1447
- tables = db.list_tables()
1448
-
1449
- # Drop table
1450
- db.execute_sql("DROP TABLE IF EXISTS old_table")
1451
- ```
1452
-
1453
- ### Index Policies
1454
-
1455
- Configure per-table indexing strategies for optimal performance:
1120
+ ### Data Formats (TOON/JSON/Columnar)
1456
1121
 
1457
1122
  ```typescript
1458
- # Policy constants
1459
- Database.INDEX_WRITE_OPTIMIZED # 0 - O(1) insert, O(N) scan
1460
- Database.INDEX_BALANCED # 1 - O(1) amortized insert, O(log K) scan
1461
- Database.INDEX_SCAN_OPTIMIZED # 2 - O(log N) insert, O(log N + K) scan
1462
- Database.INDEX_APPEND_ONLY # 3 - O(1) insert, O(N) scan (time-series)
1463
-
1464
- # Set by constant
1465
- db.set_table_index_policy("logs", Database.INDEX_APPEND_ONLY)
1123
+ // TOON format — compact, LLM-optimized
1124
+ const toon = EmbeddedDatabase.toToon('users', [
1125
+ { id: 1, name: 'Alice', role: 'engineer' },
1126
+ { id: 2, name: 'Bob', role: 'designer' },
1127
+ ]);
1128
+ // [users]
1129
+ // id|name|role
1130
+ // 1|Alice|engineer
1131
+ // 2|Bob|designer
1466
1132
 
1467
- # Set by string
1468
- db.set_table_index_policy("users", "scan_optimized")
1469
-
1470
- # Get current policy
1471
- policy = db.get_table_index_policy("users")
1472
- print(f"Policy: {policy}") # "scan_optimized"
1133
+ // JSON round-trip
1134
+ const json = EmbeddedDatabase.toJson('users', [{ id: 1, name: 'Alice' }]);
1135
+ const parsed = EmbeddedDatabase.fromJson(json);
1473
1136
  ```
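The comment block shows what TOON output looks like, but not how it is built. The sketch below reproduces that exact layout: a `[table]` header, one pipe-separated line of field names, then one line per row. This is an assumption about the format, not the SDK's `EmbeddedDatabase.toToon`. `toToonSketch` is a hypothetical name. It takes column names from the first row and performs no escaping, so values must not contain `|` or newlines.

```typescript
type Row = Record<string, string | number | boolean>;

// Hypothetical encoder mirroring the TOON layout shown in the comment above.
function toToonSketch(table: string, rows: Row[]): string {
  if (rows.length === 0) return `[${table}]`;
  const fields = Object.keys(rows[0]); // column order taken from the first row
  const lines = [`[${table}]`, fields.join('|')];
  for (const row of rows) {
    // Missing fields become empty cells so every line keeps the same column count.
    lines.push(fields.map((f) => String(row[f] ?? '')).join('|'));
  }
  return lines.join('\n');
}

const toonText = toToonSketch('users', [
  { id: 1, name: 'Alice', role: 'engineer' },
  { id: 2, name: 'Bob', role: 'designer' },
]);
```

In this layout, field names appear once per table instead of once per row, as they would in JSON. That repetition is what the format avoids spending tokens on.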
1474
1137
 
1475
- ### Policy Selection Guide
1476
-
1477
- | Policy | Insert | Scan | Best For |
1478
- |--------|--------|------|----------|
1479
- | `write_optimized` | O(1) | O(N) | High-write ingestion |
1480
- | `balanced` | O(1) amortized | O(log K) | General use (default) |
1481
- | `scan_optimized` | O(log N) | O(log N + K) | Analytics, read-heavy |
1482
- | `append_only` | O(1) | O(N) | Time-series, logs |
1483
-
1484
1138
  ---
1485
1139
 
1486
- ## 10. Namespaces & Multi-Tenancy
1487
-
1488
- Organize data into logical namespaces for tenant isolation.
1489
-
1490
- ### Creating Namespaces
1491
-
1492
- ```typescript
1493
- from sochdb import NamespaceConfig
1494
-
1495
- # Create namespace with metadata
1496
- ns = db.create_namespace(
1497
- name="tenant_123",
1498
- display_name="Acme Corp",
1499
- labels={"tier": "premium", "region": "us-east"}
1500
- )
1501
-
1502
- # Simple creation
1503
- ns = db.create_namespace("tenant_456")
1504
- ```
1505
-
1506
- ### Getting Namespaces
1140
+ ### Policy Service & MCP
1507
1141
 
1508
1142
  ```typescript
1509
- # Get existing namespace
1510
- ns = db.namespace("tenant_123")
1511
-
1512
- # Get or create (idempotent)
1513
- ns = db.get_or_create_namespace("tenant_123")
1514
-
1515
- # Check if exists
1516
- exists = db.namespace_exists("tenant_123")
1517
- ```
1143
+ import { PolicyService, McpServer } from '@sochdb/sochdb';
1518
1144
 
1519
- ### Context Manager for Scoped Operations
1145
+ // Access control
1146
+ const policy = new PolicyService(db);
1147
+ policy.addRule({
1148
+ name: 'Allow Write for Admins',
1149
+ resource: '*', action: 'write', effect: 'allow',
1150
+ condition: (ctx) => ctx.role === 'admin',
1151
+ });
1152
+ const allowed = policy.evaluate({ role: 'admin' }, 'documents', 'write');
1520
1153
 
1521
- ```typescript
1522
- with db.use_namespace("tenant_123") as ns:
1523
- # All operations automatically scoped to tenant_123
1524
- collection = ns.collection("documents")
1525
- ns.put("config/key", b"value")
1526
-
1527
- # No need to specify namespace in each call
1154
+ // MCP — expose DB tools to AI agents
1155
+ const server = new McpServer(db, { name: 'my-db', version: '1.0.0' });
1156
+ server.registerDefaultTools();
1157
+ await server.start();
1528
1158
  ```
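The snippet does not say how `PolicyService.evaluate` combines several rules. The sketch below follows a common access-control convention, which is an assumption here and may differ from the SDK. Access is denied by default, `'*'` matches any resource, and any matching `deny` rule overrides every matching `allow`. The `Rule` shape mirrors the object passed to `policy.addRule`, and `evaluate` is a hypothetical standalone function.

```typescript
// Hypothetical rule shape mirroring the object passed to policy.addRule above.
interface Rule {
  name: string;
  resource: string; // exact resource name, or '*' for any resource
  action: string;
  effect: 'allow' | 'deny';
  condition?: (ctx: Record<string, unknown>) => boolean;
}

// Assumed semantics: default deny; an explicit deny beats any allow.
function evaluate(
  rules: Rule[],
  ctx: Record<string, unknown>,
  resource: string,
  action: string,
): boolean {
  const matching = rules.filter(
    (r) =>
      (r.resource === '*' || r.resource === resource) &&
      r.action === action &&
      (r.condition ? r.condition(ctx) : true),
  );
  if (matching.some((r) => r.effect === 'deny')) return false;
  return matching.some((r) => r.effect === 'allow');
}

const rules: Rule[] = [
  {
    name: 'Allow Write for Admins',
    resource: '*',
    action: 'write',
    effect: 'allow',
    condition: (ctx) => ctx.role === 'admin',
  },
];
```

With default deny, forgetting to add a rule fails closed rather than silently granting access.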
1529
1159
 
1530
- ### Namespace Operations
1531
-
1532
- ```typescript
1533
- # List all namespaces
1534
- namespaces = db.list_namespaces()
1535
- print(namespaces) # ['tenant_123', 'tenant_456']
1536
-
1537
- # Get namespace info
1538
- info = db.namespace_info("tenant_123")
1539
- print(f"Created: {info['created_at']}")
1540
- print(f"Labels: {info['labels']}")
1541
- print(f"Size: {info['size_bytes']}")
1542
-
1543
- # Update labels
1544
- db.update_namespace("tenant_123", labels={"tier": "enterprise"})
1545
-
1546
- # Delete namespace (WARNING: deletes all data in namespace)
1547
- db.delete_namespace("old_tenant", force=True)
1548
- ```
1160
+ ---
1549
1161
 
1550
- ### Namespace-Scoped Key-Value
1162
+ ### Server Mode (IPC / gRPC)
1551
1163
 
1552
1164
  ```typescript
1553
- ns = db.namespace("tenant_123")
1554
-
1555
- # Operations automatically prefixed with namespace
1556
- ns.put("users/alice", b"data") # Actually: tenant_123/users/alice
1557
- ns.get("users/alice")
1558
- ns.delete("users/alice")
1559
-
1560
- # Scan within namespace
1561
- for key, value in ns.scan("users/"):
1562
- print(key, value) # Keys shown without namespace prefix
1563
- ```
1165
+ import { IpcClient, SochDBClient } from '@sochdb/sochdb';
1564
1166
 
1565
- ### Cross-Namespace Operations
1167
+ // IPC Client (Unix socket)
1168
+ const ipc = new IpcClient('/tmp/sochdb.sock');
1169
+ await ipc.connect();
1170
+ await ipc.put(Buffer.from('key'), Buffer.from('value'));
1171
+ const val = await ipc.get(Buffer.from('key'));
1172
+ await ipc.addNode('ns', 'alice', 'person', { name: 'Alice' });
1173
+ ipc.close();
1566
1174
 
1567
- ```typescript
1568
- # Copy data between namespaces
1569
- db.copy_between_namespaces(
1570
- source_ns="tenant_123",
1571
- target_ns="tenant_456",
1572
- prefix="shared/"
1573
- )
1175
+ // gRPC Client
1176
+ const grpc = new SochDBClient('localhost:50051');
1177
+ await grpc.createIndex('embeddings', 384, 'cosine');
1178
+ await grpc.insertVectors('embeddings', [1, 2], [[0.1, 0.2 /* ... */], [0.3, 0.4 /* ... */]]);
1179
+ const searchResults = await grpc.search('embeddings', [0.1, 0.2 /* ... */], 5);
1180
+ grpc.close();
1574
1181
  ```
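`grpc.createIndex('embeddings', 384, 'cosine')` declares a 384-dimensional index that ranks by cosine similarity, so every inserted and query vector must have 384 entries. For reference, here is the metric itself in plain TypeScript. It is independent of the SDK. The server may normalize vectors first or report a distance such as `1 - similarity` instead, so treat this as a definition, not as the server's scoring code.

```typescript
// Cosine similarity: dot(a, b) / (|a| * |b|), a value in [-1, 1].
// 1 means same direction, 0 means orthogonal, -1 means opposite.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error('dimension mismatch');
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // Convention for this sketch: a zero vector is treated as unrelated to everything.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

Because the norms are divided out, cosine similarity ignores vector length. This is why it is a common default for text embeddings, where direction carries the meaning.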
1575
1182
 
1576
1183
  ---
1577
1184
 
1578
- ## 11. Collections & Vector Search
1579
-
1580
- Collections store documents with embeddings for semantic search using HNSW.
1581
-
1582
- ### Collection Configuration
1583
-
1584
- ```typescript
1585
- from sochdb import (
1586
- CollectionConfig,
1587
- DistanceMetric,
1588
- QuantizationType,
1589
- )
1590
-
1591
- config = CollectionConfig(
1592
- name="documents",
1593
- dimension=384, # Embedding dimension (must match your model)
1594
- metric=DistanceMetric.COSINE, # COSINE, EUCLIDEAN, DOT_PRODUCT
1595
- m=16, # HNSW M parameter (connections per node)
1596
- ef_construction=100, # HNSW construction quality
1597
- ef_search=50, # HNSW search quality (higher = slower but better)
1598
- quantization=QuantizationType.NONE, # NONE, SCALAR (int8), PQ (product quantization)
1599
- enable_hybrid_search=False, # Enable BM25 + vector
1600
- content_field=None, # Field for BM25 indexing
1601
- )
1602
- ```
1603
-
1604
- ### Creating Collections
1185
+ ### Checkpoints & Statistics
1605
1186
 
1606
1187
  ```typescript
1607
- ns = db.namespace("default")
1608
-
1609
- # With config object
1610
- collection = ns.create_collection(config)
1188
+ const db = EmbeddedDatabase.open('./mydb');
1611
1189
 
1612
- # With parameters (simpler)
1613
- collection = ns.create_collection(
1614
- name="documents",
1615
- dimension=384,
1616
- metric=DistanceMetric.COSINE
1617
- )
1190
+ const lsn = await db.checkpoint();
1191
+ console.log(`Checkpoint LSN: ${lsn}`);
1618
1192
 
1619
- # Get existing collection
1620
- collection = ns.collection("documents")
1193
+ const stats = await db.stats();
1194
+ console.log(`Memtable: ${stats.memtableSizeBytes} bytes`);
1195
+ console.log(`WAL: ${stats.walSizeBytes} bytes`);
1196
+ console.log(`Active txns: ${stats.activeTransactions}`);
1621
1197
  ```
1622
1198
 
1623
- ### Inserting Documents
1624
-
1625
- ```typescript
1626
- # Single insert
1627
- collection.insert(
1628
- id="doc1",
1629
- vector=[0.1, 0.2, ...], # 384-dim float array
1630
- metadata={"title": "Introduction", "author": "Alice", "category": "tech"}
1631
- )
1632
-
1633
- # Batch insert (more efficient for bulk loading)
1634
- collection.insert_batch(
1635
- ids=["doc1", "doc2", "doc3"],
1636
- vectors=[[...], [...], [...]], # List of vectors
1637
- metadata=[
1638
- {"title": "Doc 1"},
1639
- {"title": "Doc 2"},
1640
- {"title": "Doc 3"}
1641
- ]
1642
- )
1643
-
1644
- # Multi-vector insert (multiple vectors per document, e.g., chunks)
1645
- collection.insert_multi(
1646
- id="long_doc",
1647
- vectors=[[...], [...], [...]], # Multiple vectors for same doc
1648
- metadata={"title": "Long Document"}
1649
- )
1650
- ```
1199
+ ---
1651
1200
 
1652
- ### Vector Search
1201
+ ### Error Handling
1653
1202
 
1654
1203
  ```typescript
1655
- from sochdb import SearchRequest
1656
-
1657
- # Using SearchRequest (full control)
1658
- request = SearchRequest(
1659
- vector=[0.15, 0.25, ...], # Query vector
1660
- k=10, # Number of results
1661
- ef_search=100, # Search quality (overrides collection default)
1662
- filter={"author": "Alice"}, # Metadata filter
1663
- min_score=0.7, # Minimum similarity score
1664
- include_vectors=False, # Include vectors in results
1665
- include_metadata=True, # Include metadata in results
1666
- )
1667
- results = collection.search(request)
1668
-
1669
- # Convenience method (simpler)
1670
- results = collection.vector_search(
1671
- vector=[0.15, 0.25, ...],
1672
- k=10,
1673
- filter={"author": "Alice"}
1674
- )
1675
-
1676
- # Process results
1677
- for result in results:
1678
- print(f"ID: {result.id}")
1679
- print(f"Score: {result.score:.4f}") # Similarity score
1680
- print(f"Metadata: {result.metadata}")
1681
- ```
1682
-
1683
- ### Metadata Filtering
1204
+ import {
1205
+ SochDBError, TransactionError, DatabaseLockedError,
1206
+ NamespaceNotFoundError, CollectionExistsError,
1207
+ } from '@sochdb/sochdb';
1684
1208
 
1685
- ```typescript
1686
- # Equality
1687
- filter={"author": "Alice"}
1688
-
1689
- # Comparison operators
1690
- filter={"age": {"$gt": 30}} # Greater than
1691
- filter={"age": {"$gte": 30}} # Greater than or equal
1692
- filter={"age": {"$lt": 30}} # Less than
1693
- filter={"age": {"$lte": 30}} # Less than or equal
1694
- filter={"author": {"$ne": "Alice"}} # Not equal
1695
-
1696
- # Array operators
1697
- filter={"category": {"$in": ["tech", "science"]}} # In array
1698
- filter={"category": {"$nin": ["sports"]}} # Not in array
1699
-
1700
- # Logical operators
1701
- filter={"$and": [{"author": "Alice"}, {"year": 2024}]}
1702
- filter={"$or": [{"category": "tech"}, {"category": "science"}]}
1703
- filter={"$not": {"author": "Bob"}}
1704
-
1705
- # Nested filters
1706
- filter={
1707
- "$and": [
1708
- {"$or": [{"category": "tech"}, {"category": "science"}]},
1709
- {"year": {"$gte": 2020}}
1710
- ]
1209
+ try {
1210
+ await db.withTransaction(async (txn) => {
1211
+ await txn.put(Buffer.from('key'), Buffer.from('value'));
1212
+ });
1213
+ } catch (err) {
1214
+ if (err instanceof DatabaseLockedError) {
1215
+ console.error('Database locked by another process');
1216
+ } else if (err instanceof TransactionError) {
1217
+ console.error('Transaction conflict (SSI); retry the transaction');
1218
+ } else if (err instanceof NamespaceNotFoundError) {
1219
+ console.error('Namespace does not exist');
1220
+ } else { throw err; } // rethrow errors not handled above
1711
1221
  }
1712
1222
  ```
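The handler above only logs a `TransactionError`. Under serializable snapshot isolation, a conflict aborts one of the competing transactions, and the usual remedy is to run the whole transaction again. The sketch below is a generic retry helper with exponential backoff. Several details are assumptions: `ConflictError` is a stand-in for the SDK's `TransactionError` so that the block runs on its own, and the attempt limit and delays are arbitrary.

```typescript
// Stand-in for the SDK's TransactionError so this sketch runs on its own.
class ConflictError extends Error {}

// Re-run `op` while it fails with a retryable error, waiting
// baseDelayMs * 2^attempt between attempts. Non-retryable errors, and the
// last retryable one once maxAttempts is reached, are rethrown unchanged.
async function withRetry<T>(
  op: () => Promise<T>,
  isRetryable: (err: unknown) => boolean,
  maxAttempts = 5,
  baseDelayMs = 10,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await op();
    } catch (err) {
      if (!isRetryable(err) || attempt + 1 >= maxAttempts) throw err;
      await new Promise<void>((resolve) => setTimeout(resolve, baseDelayMs * 2 ** attempt));
    }
  }
}
```

With the SDK, you would wrap the `db.withTransaction(...)` call in `withRetry` and pass `(err) => err instanceof TransactionError` as the predicate. Each retry re-runs the entire callback rather than only the commit, so every attempt reads fresh data.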
1713
1223
 
1714
- ### Collection Management
1715
-
1716
- ```typescript
1717
- # Get collection
1718
- collection = ns.get_collection("documents")
1719
- # or
1720
- collection = ns.collection("documents")
1721
-
1722
- # List collections
1723
- collections = ns.list_collections()
1724
-
1725
- # Collection info
1726
- info = collection.info()
1727
- print(f"Name: {info['name']}")
1728
- print(f"Dimension: {info['dimension']}")
1729
- print(f"Count: {info['count']}")
1730
- print(f"Metric: {info['metric']}")
1731
- print(f"Index size: {info['index_size_bytes']}")
1732
-
1733
- # Delete collection
1734
- ns.delete_collection("old_collection")
1735
-
1736
- # Individual document operations
1737
- doc = collection.get("doc1")
1738
- collection.delete("doc1")
1739
- collection.update("doc1", metadata={"category": "updated"})
1740
- count = collection.count()
1741
- ```
1224
+ ---
1742
1225
 
1743
- ### Quantization for Memory Efficiency
1226
+ ### Configuration Reference
1744
1227
 
1745
1228
  ```typescript
1746
- # Scalar quantization (int8) - 4x memory reduction
1747
- config = CollectionConfig(
1748
- name="documents",
1749
- dimension=384,
1750
- quantization=QuantizationType.SCALAR
1751
- )
1752
-
1753
- # Product quantization - 32x memory reduction
1754
- config = CollectionConfig(
1755
- name="documents",
1756
- dimension=768,
1757
- quantization=QuantizationType.PQ,
1758
- pq_num_subvectors=96, # 768/96 = 8 dimensions per subvector
1759
- pq_num_centroids=256 # 8-bit codes
1760
- )
1229
+ const db = EmbeddedDatabase.open('./mydb', {
1230
+ walEnabled: true,
1231
+ syncMode: 'full', // 'full' | 'normal' | 'off'
1232
+ memtableSizeBytes: 64 * 1024 * 1024,
1233
+ groupCommit: true,
1234
+ indexPolicy: 'balanced', // 'write_optimized' | 'balanced' | 'scan_optimized' | 'append_only'
1235
+ });
1761
1236
  ```
1762
1237
 
1763
- ---
1764
-
1765
- ## 12. Hybrid Search (Vector + BM25)
1766
-
1767
- Combine vector similarity with keyword matching for best results.
1238
+ | Environment Variable | Description |
1239
+ |---------------------|-------------|
1240
+ | `SOCHDB_LIB_PATH` | Custom path to native library |
1241
+ | `SOCHDB_LOG_LEVEL` | Log level (DEBUG, INFO, WARN, ERROR) |
1768
1242
 
1769
- ### Enable Hybrid Search
1243
+ ---
1770
1244
 
1771
- ```typescript
1772
- config = CollectionConfig(
1773
- name="articles",
1774
- dimension=384,
1775
- enable_hybrid_search=True, # Enable BM25 indexing
1776
- content_field="text" # Field to index for BM25
1777
- )
1778
- collection = ns.create_collection(config)
1779
-
1780
- # Insert with text content
1781
- collection.insert(
1782
- id="article1",
1783
- vector=[...],
1784
- metadata={
1785
- "title": "Machine Learning Tutorial",
1786
- "text": "This tutorial covers the basics of machine learning...",
1787
- "category": "tech"
1788
- }
1789
- )
1790
- ```
1791
-
1792
- ### Keyword Search (BM25 Only)
1793
-
1794
- ```typescript
1795
- results = collection.keyword_search(
1796
- query="machine learning tutorial",
1797
- k=10,
1798
- filter={"category": "tech"}
1799
- )
1800
- ```
1801
-
1802
- ### Hybrid Search (Vector + BM25)
1803
-
1804
- ```typescript
1805
- # Combine vector and keyword search
1806
- results = collection.hybrid_search(
1807
- vector=[0.1, 0.2, ...], # Query embedding
1808
- text_query="machine learning", # Keyword query
1809
- k=10,
1810
- alpha=0.7, # 0.0 = pure keyword, 1.0 = pure vector, 0.5 = balanced
1811
- filter={"category": "tech"}
1812
- )
1813
- ```
1814
-
1815
- ### Full SearchRequest for Hybrid
1816
-
1817
- ```typescript
1818
- request = SearchRequest(
1819
- vector=[0.1, 0.2, ...],
1820
- text_query="machine learning",
1821
- k=10,
1822
- alpha=0.7, # Blend factor
1823
- rrf_k=60.0, # RRF k parameter (Reciprocal Rank Fusion)
1824
- filter={"category": "tech"},
1825
- aggregate="max", # max | mean | first (for multi-vector docs)
1826
- as_of="2024-01-01T00:00:00Z", # Time-travel query
1827
- include_vectors=False,
1828
- include_metadata=True,
1829
- include_scores=True,
1830
- )
1831
- results = collection.search(request)
1832
-
1833
- # Access detailed results
1834
- print(f"Query time: {results.query_time_ms}ms")
1835
- print(f"Total matches: {results.total_count}")
1836
- print(f"Vector results: {results.vector_results}") # Results from vector search
1837
- print(f"Keyword results: {results.keyword_results}") # Results from BM25
1838
- print(f"Fused results: {results.fused_results}") # Combined results
1839
- ```
1840
-
1841
- ---
1842
-
1843
- ## 13. Graph Operations
1844
-
1845
- Build and query knowledge graphs.
1846
-
1847
- ### Adding Nodes
1848
-
1849
- ```typescript
1850
- # Add a node
1851
- db.add_node(
1852
- namespace="default",
1853
- node_id="alice",
1854
- node_type="person",
1855
- properties={"role": "engineer", "team": "ml", "level": "senior"}
1856
- )
1857
-
1858
- db.add_node("default", "project_x", "project", {"status": "active", "priority": "high"})
1859
- db.add_node("default", "bob", "person", {"role": "manager", "team": "ml"})
1860
- ```
1861
-
1862
- ### Adding Edges
1863
-
1864
- ```typescript
1865
- # Add directed edge
1866
- db.add_edge(
1867
- namespace="default",
1868
- from_id="alice",
1869
- edge_type="works_on",
1870
- to_id="project_x",
1871
- properties={"role": "lead", "since": "2024-01"}
1872
- )
1873
-
1874
- db.add_edge("default", "alice", "reports_to", "bob")
1875
- db.add_edge("default", "bob", "manages", "project_x")
1876
- ```
1877
-
1878
- ### Graph Traversal
1879
-
1880
- ```typescript
1881
- # BFS traversal from a starting node
1882
- nodes, edges = db.traverse(
1883
- namespace="default",
1884
- start_node="alice",
1885
- max_depth=3,
1886
- order="bfs" # "bfs" or "dfs"
1887
- )
1888
-
1889
- for node in nodes:
1890
- print(f"Node: {node['id']} ({node['node_type']})")
1891
- print(f" Properties: {node['properties']}")
1892
-
1893
- for edge in edges:
1894
- print(f"{edge['from_id']} --{edge['edge_type']}--> {edge['to_id']}")
1895
- ```
1896
-
1897
- ### Filtered Traversal
1898
-
1899
- ```typescript
1900
- # Traverse with filters
1901
- nodes, edges = db.traverse(
1902
- namespace="default",
1903
- start_node="alice",
1904
- max_depth=2,
1905
- edge_types=["works_on", "reports_to"], # Only follow these edge types
1906
- node_types=["person", "project"], # Only include these node types
1907
- node_filter={"team": "ml"} # Filter nodes by properties
1908
- )
1909
- ```
1910
-
1911
- ### Graph Queries
1912
-
1913
- ```typescript
1914
- # Find shortest path
1915
- path = db.find_path(
1916
- namespace="default",
1917
- from_id="alice",
1918
- to_id="project_y",
1919
- max_depth=5
1920
- )
1921
-
1922
- # Get neighbors
1923
- neighbors = db.get_neighbors(
1924
- namespace="default",
1925
- node_id="alice",
1926
- direction="outgoing" # "outgoing", "incoming", "both"
1927
- )
1928
-
1929
- # Get specific edge
1930
- edge = db.get_edge("default", "alice", "works_on", "project_x")
1931
-
1932
- # Delete node (and all connected edges)
1933
- db.delete_node("default", "old_node")
1934
-
1935
- # Delete edge
1936
- db.delete_edge("default", "alice", "works_on", "project_old")
1937
- ```
1938
-
1939
- ---
1940
-
1941
- ## 14. Temporal Graph (Time-Travel)
1942
-
1943
- Track state changes over time with temporal edges.
1944
-
1945
- ### Adding Temporal Edges
1946
-
1947
- ```typescript
1948
- import time
1949
-
1950
- now = int(time.time() * 1000) # milliseconds since epoch
1951
- one_hour = 60 * 60 * 1000
1952
-
1953
- # Record: Door was open from 10:00 to 11:00
1954
- db.add_temporal_edge(
1955
- namespace="smart_home",
1956
- from_id="door_front",
1957
- edge_type="STATE",
1958
- to_id="open",
1959
- valid_from=now - one_hour, # Start time (ms)
1960
- valid_until=now, # End time (ms)
1961
- properties={"sensor": "motion_1", "confidence": 0.95}
1962
- )
1963
-
1964
- # Record: Light is currently on (no end time yet)
1965
- db.add_temporal_edge(
1966
- namespace="smart_home",
1967
- from_id="light_living",
1968
- edge_type="STATE",
1969
- to_id="on",
1970
- valid_from=now,
1971
- valid_until=0, # 0 = still valid (no end time)
1972
- properties={"brightness": "80%", "color": "warm"}
1973
- )
1974
- ```
1975
-
1976
- ### Time-Travel Queries
1977
-
1978
- ```typescript
1979
- # Query modes:
1980
- # - "CURRENT": Edges valid right now
1981
- # - "POINT_IN_TIME": Edges valid at specific timestamp
1982
- # - "RANGE": All edges within a time range
1983
-
1984
- # What is the current state?
1985
- edges = db.query_temporal_graph(
1986
- namespace="smart_home",
1987
- node_id="door_front",
1988
- mode="CURRENT",
1989
- edge_type="STATE"
1990
- )
1991
- current_state = edges[0]["to_id"] if edges else "unknown"
1992
-
1993
- # Was the door open 1.5 hours ago?
1994
- edges = db.query_temporal_graph(
1995
- namespace="smart_home",
1996
- node_id="door_front",
1997
- mode="POINT_IN_TIME",
1998
- timestamp=now - int(1.5 * 60 * 60 * 1000)
1999
- )
2000
- was_open = any(e["to_id"] == "open" for e in edges)
2001
-
2002
- # All state changes in last hour
2003
- edges = db.query_temporal_graph(
2004
- namespace="smart_home",
2005
- node_id="door_front",
2006
- mode="RANGE",
2007
- start_time=now - one_hour,
2008
- end_time=now
2009
- )
2010
- for edge in edges:
2011
- print(f"State: {edge['to_id']} from {edge['valid_from']} to {edge['valid_until']}")
2012
- ```
2013
-
2014
- ### End a Temporal Edge
2015
-
2016
- ```typescript
2017
- # Close the current "on" state
2018
- db.end_temporal_edge(
2019
- namespace="smart_home",
2020
- from_id="light_living",
2021
- edge_type="STATE",
2022
- to_id="on",
2023
- end_time=int(time.time() * 1000)
2024
- )
2025
- ```
2026
-
2027
- ---
2028
-
2029
- ## 15. Semantic Cache
2030
-
2031
- Cache LLM responses with similarity-based retrieval for cost savings.
2032
-
2033
- ### Storing Cached Responses
2034
-
2035
- ```typescript
2036
- # Store response with embedding
2037
- db.cache_put(
2038
- cache_name="llm_responses",
2039
- key="What is Python?", # Original query (for display/debugging)
2040
- value="Python is a high-level programming language...",
2041
- embedding=[0.1, 0.2, ...], # Query embedding (384-dim)
2042
- ttl_seconds=3600, # Expire in 1 hour (0 = no expiry)
2043
- metadata={"model": "claude-3", "tokens": 150}
2044
- )
2045
- ```
2046
-
2047
- ### Cache Lookup
2048
-
2049
- ```typescript
2050
- # Check cache before calling LLM
2051
- cached = db.cache_get(
2052
- cache_name="llm_responses",
2053
- query_embedding=[0.12, 0.18, ...], # Embed the new query
2054
- threshold=0.85 # Cosine similarity threshold
2055
- )
2056
-
2057
- if cached:
2058
- print(f"Cache HIT!")
2059
- print(f"Original query: {cached['key']}")
2060
- print(f"Response: {cached['value']}")
2061
- print(f"Similarity: {cached['score']:.4f}")
2062
- else:
2063
- print("Cache MISS - calling LLM...")
2064
- # Call LLM and cache the result
2065
- ```
2066
-
2067
- ### Cache Management
2068
-
2069
- ```typescript
2070
- # Delete specific entry
2071
- db.cache_delete("llm_responses", key="What is Python?")
2072
-
2073
- # Clear entire cache
2074
- db.cache_clear("llm_responses")
2075
-
2076
- # Get cache statistics
2077
- stats = db.cache_stats("llm_responses")
2078
- print(f"Total entries: {stats['count']}")
2079
- print(f"Hit rate: {stats['hit_rate']:.2%}")
2080
- print(f"Memory usage: {stats['size_bytes']}")
2081
- ```
2082
-
2083
- ### Full Usage Pattern
2084
-
2085
- ```typescript
2086
- def get_llm_response(query: str, embed_fn, llm_fn):
2087
- """Get response from cache or LLM."""
2088
- query_embedding = embed_fn(query)
2089
-
2090
- # Try cache first
2091
- cached = db.cache_get(
2092
- cache_name="llm_responses",
2093
- query_embedding=query_embedding,
2094
- threshold=0.90
2095
- )
2096
-
2097
- if cached:
2098
- return cached['value']
2099
-
2100
- # Cache miss - call LLM
2101
- response = llm_fn(query)
2102
-
2103
- # Store in cache
2104
- db.cache_put(
2105
- cache_name="llm_responses",
2106
- key=query,
2107
- value=response,
2108
- embedding=query_embedding,
2109
- ttl_seconds=86400 # 24 hours
2110
- )
2111
-
2112
- return response
2113
- ```
2114
-
2115
- ---
2116
-
2117
- ## 16. Context Query Builder (LLM Optimization)
2118
-
2119
- Assemble LLM context with token budgeting and priority-based truncation.
2120
-
2121
- ### Basic Context Query
2122
-
2123
- ```typescript
2124
- from sochdb import ContextQueryBuilder, ContextFormat, TruncationStrategy
2125
-
2126
- # Build context for LLM
2127
- context = ContextQueryBuilder() \
2128
- .for_session("session_123") \
2129
- .with_budget(4096) \
2130
- .format(ContextFormat.TOON) \
2131
- .literal("SYSTEM", priority=0, text="You are a helpful assistant.") \
2132
- .section("USER_PROFILE", priority=1) \
2133
- .get("user.profile.{name, preferences}") \
2134
- .done() \
2135
- .section("HISTORY", priority=2) \
2136
- .last(10, "messages") \
2137
- .where_eq("session_id", "session_123") \
2138
- .done() \
2139
- .section("KNOWLEDGE", priority=3) \
2140
- .search("documents", "$query_embedding", k=5) \
2141
- .done() \
2142
- .execute()
2143
-
2144
- print(f"Token count: {context.token_count}")
2145
- print(f"Context:\n{context.text}")
2146
- ```
2147
-
2148
- ### Section Types
2149
-
2150
- | Type | Method | Description |
2151
- |------|--------|-------------|
2152
- | `literal` | `.literal(name, priority, text)` | Static text content |
2153
- | `get` | `.get(path)` | Fetch specific data by path |
2154
- | `last` | `.last(n, table)` | Most recent N records from table |
2155
- | `search` | `.search(collection, embedding, k)` | Vector similarity search |
2156
- | `sql` | `.sql(query)` | SQL query results |
2157
-
2158
- ### Truncation Strategies
2159
-
2160
- ```typescript
2161
- # Drop from end (keep beginning) - default
2162
- .truncation(TruncationStrategy.TAIL_DROP)
2163
-
2164
- # Drop from beginning (keep end)
2165
- .truncation(TruncationStrategy.HEAD_DROP)
2166
-
2167
- # Proportionally truncate across sections
2168
- .truncation(TruncationStrategy.PROPORTIONAL)
2169
-
2170
- # Fail if budget exceeded
2171
- .truncation(TruncationStrategy.STRICT)
2172
- ```
2173
-
2174
- ### Variables and Bindings
2175
-
2176
- ```typescript
2177
- from sochdb import ContextValue
2178
-
2179
- context = ContextQueryBuilder() \
2180
- .for_session("session_123") \
2181
- .set_var("query_embedding", ContextValue.Embedding([0.1, 0.2, ...])) \
2182
- .set_var("user_id", ContextValue.String("user_456")) \
2183
- .section("KNOWLEDGE", priority=2) \
2184
- .search("documents", "$query_embedding", k=5) \
2185
- .done() \
2186
- .execute()
2187
- ```
2188
-
2189
- ### Output Formats
2190
-
2191
- ```typescript
2192
- # TOON format (40-60% fewer tokens)
2193
- .format(ContextFormat.TOON)
2194
-
2195
- # JSON format
2196
- .format(ContextFormat.JSON)
2197
-
2198
- # Markdown format (human-readable)
2199
- .format(ContextFormat.MARKDOWN)
2200
-
2201
- # Plain text
2202
- .format(ContextFormat.TEXT)
2203
- ```
2204
-
2205
-
2206
- ## Session Management (Agent Context)
2207
-
2208
- Stateful session management for agentic use cases with permissions, sandboxing, audit logging, and budget tracking.
2209
-
2210
- ### Session Overview
2211
-
2212
- ```
2213
- Agent session abc123:
2214
- cwd: /agents/abc123
2215
- vars: $model = "gpt-4", $budget = 1000
2216
- permissions: fs:rw, db:rw, calc:*
2217
- audit: [read /data/users, write /agents/abc123/cache]
2218
- ```
2219
-
2220
- ### Creating Sessions
2221
-
2222
- ```typescript
2223
- from sochdb import SessionManager, AgentContext
2224
- from datetime import timedelta
2225
-
2226
- # Create session manager with idle timeout
2227
- session_mgr = SessionManager(idle_timeout=timedelta(hours=1))
2228
-
2229
- # Create a new session
2230
- session = session_mgr.create_session("session_abc123")
2231
-
2232
- # Get existing session
2233
- session = session_mgr.get_session("session_abc123")
2234
-
2235
- # Get or create (idempotent)
2236
- session = session_mgr.get_or_create("session_abc123")
2237
-
2238
- # Remove session
2239
- session_mgr.remove_session("session_abc123")
2240
-
2241
- # Cleanup expired sessions
2242
- removed_count = session_mgr.cleanup_expired()
2243
-
2244
- # Get active session count
2245
- count = session_mgr.session_count()
2246
- ```
2247
-
2248
- ### Agent Context
2249
-
2250
- ```typescript
2251
- from sochdb import AgentContext, ContextValue
2252
-
2253
- # Create agent context
2254
- ctx = AgentContext("session_abc123")
2255
- print(f"Session ID: {ctx.session_id}")
2256
- print(f"Working dir: {ctx.working_dir}") # /agents/session_abc123
2257
-
2258
- # Create with custom working directory
2259
- ctx = AgentContext.with_working_dir("session_abc123", "/custom/path")
2260
-
2261
- # Create with full permissions (trusted agents)
2262
- ctx = AgentContext.with_full_permissions("session_abc123")
2263
- ```
2264
-
2265
- ### Session Variables
2266
-
2267
- ```typescript
2268
- # Set variables
2269
- ctx.set_var("model", ContextValue.String("gpt-4"))
2270
- ctx.set_var("budget", ContextValue.Number(1000.0))
2271
- ctx.set_var("debug", ContextValue.Bool(True))
2272
- ctx.set_var("tags", ContextValue.List([
2273
- ContextValue.String("ml"),
2274
- ContextValue.String("production")
2275
- ]))
2276
-
2277
- # Get variables
2278
- model = ctx.get_var("model") # Returns ContextValue or None
2279
- budget = ctx.get_var("budget")
2280
-
2281
- # Peek (read-only, no audit)
2282
- value = ctx.peek_var("model")
2283
-
2284
- # Variable substitution in strings
2285
- text = ctx.substitute_vars("Using $model with budget $budget")
2286
- # Result: "Using gpt-4 with budget 1000"
2287
- ```
2288
-
- ### Context Value Types
-
- ```typescript
- from sochdb import ContextValue
-
- # String
- ContextValue.String("hello")
-
- # Number (float)
- ContextValue.Number(42.5)
-
- # Boolean
- ContextValue.Bool(True)
-
- # List
- ContextValue.List([
-     ContextValue.String("a"),
-     ContextValue.Number(1.0)
- ])
-
- # Object (dict)
- ContextValue.Object({
-     "key": ContextValue.String("value"),
-     "count": ContextValue.Number(10.0)
- })
-
- # Null
- ContextValue.Null()
- ```
-
- ### Permissions
-
- ```typescript
- from sochdb import (
-     AgentPermissions,
-     FsPermissions,
-     DbPermissions,
-     NetworkPermissions
- )
-
- # Configure permissions
- ctx.permissions = AgentPermissions(
-     filesystem=FsPermissions(
-         read=True,
-         write=True,
-         mkdir=True,
-         delete=False,
-         allowed_paths=["/agents/session_abc123", "/shared/data"]
-     ),
-     database=DbPermissions(
-         read=True,
-         write=True,
-         create=False,
-         drop=False,
-         allowed_tables=["user_*", "cache_*"] # Pattern matching
-     ),
-     calculator=True,
-     network=NetworkPermissions(
-         http=True,
-         allowed_domains=["api.example.com", "*.internal.net"]
-     )
- )
-
- # Check permissions before operations
- try:
-     ctx.check_fs_permission("/agents/session_abc123/data.json", AuditOperation.FS_READ)
-     # Permission granted
- except ContextError as e:
-     print(f"Permission denied: {e}")
-
- try:
-     ctx.check_db_permission("user_profiles", AuditOperation.DB_QUERY)
-     # Permission granted
- except ContextError as e:
-     print(f"Permission denied: {e}")
- ```
-
- ### Budget Tracking
-
- ```typescript
- from sochdb import OperationBudget
-
- # Configure budget limits
- ctx.budget = OperationBudget(
-     max_tokens=100000, # Maximum tokens (input + output)
-     max_cost=5000, # Maximum cost in millicents ($50.00)
-     max_operations=10000 # Maximum operation count
- )
-
- # Consume budget (called automatically by operations)
- try:
-     ctx.consume_budget(tokens=500, cost=10) # 500 tokens, $0.10
- except ContextError as e:
-     if "Budget exceeded" in str(e):
-         print("Budget limit reached!")
-
- # Check budget status
- print(f"Tokens used: {ctx.budget.tokens_used}/{ctx.budget.max_tokens}")
- print(f"Cost used: ${ctx.budget.cost_used / 100:.2f}/${ctx.budget.max_cost / 100:.2f}")
- print(f"Operations: {ctx.budget.operations_used}/{ctx.budget.max_operations}")
- ```
-
- ### Session Transactions
-
- ```typescript
- # Begin transaction within session
- ctx.begin_transaction(tx_id=12345)
-
- # Create savepoint
- ctx.savepoint("before_update")
-
- # Record pending writes (for rollback)
- ctx.record_pending_write(
-     resource_type=ResourceType.FILE,
-     resource_key="/agents/session_abc123/data.json",
-     original_value=b'{"old": "data"}'
- )
-
- # Commit transaction
- ctx.commit_transaction()
-
- # Or rollback
- pending_writes = ctx.rollback_transaction()
- for write in pending_writes:
-     print(f"Rolling back: {write.resource_key}")
-     # Restore original_value
- ```
-
- ### Path Resolution
-
- ```typescript
- # Paths are resolved relative to working directory
- ctx = AgentContext.with_working_dir("session_abc123", "/home/agent")
-
- # Relative paths
- resolved = ctx.resolve_path("data.json") # /home/agent/data.json
-
- # Absolute paths pass through
- resolved = ctx.resolve_path("/absolute/path") # /absolute/path
- ```
-
- ### Audit Trail
-
- ```typescript
- # All operations are automatically logged
- # Audit entry includes: timestamp, operation, resource, result, metadata
-
- # Export audit log
- audit_log = ctx.export_audit()
- for entry in audit_log:
-     print(f"[{entry['timestamp']}] {entry['operation']}: {entry['resource']} -> {entry['result']}")
-
- # Example output:
- # [1705312345] var.set: model -> success
- # [1705312346] fs.read: /data/config.json -> success
- # [1705312347] db.query: users -> success
- # [1705312348] fs.write: /forbidden/file -> denied:path not in allowed paths
- ```
-
- ### Audit Operations
-
- ```typescript
- from sochdb import AuditOperation
-
- # Filesystem operations
- AuditOperation.FS_READ
- AuditOperation.FS_WRITE
- AuditOperation.FS_MKDIR
- AuditOperation.FS_DELETE
- AuditOperation.FS_LIST
-
- # Database operations
- AuditOperation.DB_QUERY
- AuditOperation.DB_INSERT
- AuditOperation.DB_UPDATE
- AuditOperation.DB_DELETE
-
- # Other operations
- AuditOperation.CALCULATE
- AuditOperation.VAR_SET
- AuditOperation.VAR_GET
- AuditOperation.TX_BEGIN
- AuditOperation.TX_COMMIT
- AuditOperation.TX_ROLLBACK
- ```
-
- ### Tool Registry
-
- ```typescript
- from sochdb import ToolDefinition, ToolCallRecord
- from datetime import datetime
-
- # Register tools available to the agent
- ctx.register_tool(ToolDefinition(
-     name="search_documents",
-     description="Search documents by semantic similarity",
-     parameters_schema='{"type": "object", "properties": {"query": {"type": "string"}}}',
-     requires_confirmation=False
- ))
-
- ctx.register_tool(ToolDefinition(
-     name="delete_file",
-     description="Delete a file from the filesystem",
-     parameters_schema='{"type": "object", "properties": {"path": {"type": "string"}}}',
-     requires_confirmation=True # Requires user confirmation
- ))
-
- # Record tool calls
- ctx.record_tool_call(ToolCallRecord(
-     call_id="call_001",
-     tool_name="search_documents",
-     arguments='{"query": "machine learning"}',
-     result='[{"id": "doc1", "score": 0.95}]',
-     error=None,
-     timestamp=datetime.now()
- ))
-
- # Access tool call history
- for call in ctx.tool_calls:
-     print(f"{call.tool_name}: {call.result or call.error}")
- ```
-
- ### Session Lifecycle
-
- ```typescript
- # Check session age
- age = ctx.age()
- print(f"Session age: {age}")
-
- # Check idle time
- idle = ctx.idle_time()
- print(f"Idle time: {idle}")
-
- # Check if expired
- if ctx.is_expired(idle_timeout=timedelta(hours=1)):
-     print("Session has expired!")
- ```
-
- ### Complete Session Example
-
- ```typescript
- from sochdb import (
-     SessionManager, AgentContext, ContextValue,
-     AgentPermissions, FsPermissions, DbPermissions,
-     OperationBudget, ToolDefinition, AuditOperation
- )
- from datetime import timedelta
-
- # Initialize session manager
- session_mgr = SessionManager(idle_timeout=timedelta(hours=2))
-
- # Create session for an agent
- session_id = "agent_session_12345"
- ctx = session_mgr.get_or_create(session_id)
-
- # Configure the agent
- ctx.permissions = AgentPermissions(
-     filesystem=FsPermissions(
-         read=True,
-         write=True,
-         allowed_paths=[f"/agents/{session_id}", "/shared"]
-     ),
-     database=DbPermissions(
-         read=True,
-         write=True,
-         allowed_tables=["documents", "cache_*"]
-     ),
-     calculator=True
- )
-
- ctx.budget = OperationBudget(
-     max_tokens=50000,
-     max_cost=1000, # $10.00
-     max_operations=1000
- )
-
- # Set initial variables
- ctx.set_var("model", ContextValue.String("claude-3-sonnet"))
- ctx.set_var("temperature", ContextValue.Number(0.7))
-
- # Register available tools
- ctx.register_tool(ToolDefinition(
-     name="vector_search",
-     description="Search vectors by similarity",
-     parameters_schema='{"type": "object", "properties": {"query": {"type": "string"}, "k": {"type": "integer"}}}',
-     requires_confirmation=False
- ))
-
- # Perform operations with permission checks
- def safe_read_file(ctx: AgentContext, path: str) -> bytes:
-     resolved = ctx.resolve_path(path)
-     ctx.check_fs_permission(resolved, AuditOperation.FS_READ)
-     ctx.consume_budget(tokens=100, cost=1)
-     # ... actual file read ...
-     return b"file contents"
-
- def safe_db_query(ctx: AgentContext, table: str, query: str):
-     ctx.check_db_permission(table, AuditOperation.DB_QUERY)
-     ctx.consume_budget(tokens=500, cost=5)
-     # ... actual query ...
-     return []
-
- # Use in transaction
- ctx.begin_transaction(tx_id=1)
- try:
-     # Operations here...
-     ctx.commit_transaction()
- except Exception as e:
-     ctx.rollback_transaction()
-     raise
-
- # Export audit trail for debugging/compliance
- audit = ctx.export_audit()
- print(f"Session performed {len(audit)} operations")
-
- # Cleanup
- session_mgr.cleanup_expired()
- ```
-
- ### Session Errors
-
- ```typescript
- from sochdb import ContextError
-
- try:
-     ctx.check_fs_permission("/forbidden", AuditOperation.FS_READ)
- except ContextError as e:
-     if e.is_permission_denied():
-         print(f"Permission denied: {e.message}")
-     elif e.is_variable_not_found():
-         print(f"Variable not found: {e.variable_name}")
-     elif e.is_budget_exceeded():
-         print(f"Budget exceeded: {e.budget_type}")
-     elif e.is_transaction_error():
-         print(f"Transaction error: {e.message}")
-     elif e.is_invalid_path():
-         print(f"Invalid path: {e.path}")
-     elif e.is_session_expired():
-         print("Session has expired")
- ```
- ---
-
- ## 17. Atomic Multi-Index Writes
-
- Ensure consistency across KV storage, vectors, and graphs with atomic operations.
-
- ### Problem Without Atomicity
-
- ```
- # Without atomic writes, a crash can leave:
- # - Embedding exists but graph edges don't
- # - KV data exists but embedding is missing
- # - Partial graph relationships
- ```
-
- ### Atomic Memory Writer
-
- ```typescript
- from sochdb import AtomicMemoryWriter, MemoryOp
-
- writer = AtomicMemoryWriter(db)
-
- # Build atomic operation set
- result = writer.write_atomic(
-     memory_id="memory_123",
-     ops=[
-         # Store the blob/content
-         MemoryOp.PutBlob(
-             key=b"memories/memory_123/content",
-             value=b"Meeting notes: discussed project timeline..."
-         ),
-
-         # Store the embedding
-         MemoryOp.PutEmbedding(
-             collection="memories",
-             id="memory_123",
-             embedding=[0.1, 0.2, ...],
-             metadata={"type": "meeting", "date": "2024-01-15"}
-         ),
-
-         # Create graph nodes
-         MemoryOp.CreateNode(
-             namespace="default",
-             node_id="memory_123",
-             node_type="memory",
-             properties={"importance": "high"}
-         ),
-
-         # Create graph edges
-         MemoryOp.CreateEdge(
-             namespace="default",
-             from_id="memory_123",
-             edge_type="relates_to",
-             to_id="project_x",
-             properties={}
-         ),
-     ]
- )
-
- print(f"Intent ID: {result.intent_id}")
- print(f"Operations applied: {result.ops_applied}")
- print(f"Status: {result.status}") # "committed"
- ```
-
- ### How It Works
-
- ```
- 1. Write intent(id, ops...) to WAL ← Crash-safe
- 2. Apply ops one-by-one
- 3. Write commit(id) to WAL ← All-or-nothing
- 4. Recovery replays incomplete intents
- ```
-
- ---
-
- ## 18. Recovery & WAL Management
-
- SochDB uses Write-Ahead Logging (WAL) for durability with automatic recovery.
-
- ### Recovery Manager
-
- ```typescript
- from sochdb import RecoveryManager
-
- recovery = db.recovery()
-
- # Check if recovery is needed
- if recovery.needs_recovery():
-     result = recovery.recover()
-     print(f"Status: {result.status}")
-     print(f"Replayed entries: {result.replayed_entries}")
- ```
-
- ### WAL Verification
-
- ```typescript
- # Verify WAL integrity
- result = recovery.verify_wal()
-
- print(f"Valid: {result.is_valid}")
- print(f"Total entries: {result.total_entries}")
- print(f"Valid entries: {result.valid_entries}")
- print(f"Corrupted: {result.corrupted_entries}")
- print(f"Last valid LSN: {result.last_valid_lsn}")
-
- if result.checksum_errors:
-     for error in result.checksum_errors:
-         print(f"Checksum error at LSN {error.lsn}: expected {error.expected}, got {error.actual}")
- ```
-
- ### Force Checkpoint
-
- ```typescript
- # Force a checkpoint (flush memtable to disk)
- result = recovery.checkpoint()
-
- print(f"Checkpoint LSN: {result.checkpoint_lsn}")
- print(f"Duration: {result.duration_ms}ms")
- ```
-
- ### WAL Statistics
-
- ```typescript
- stats = recovery.wal_stats()
-
- print(f"Total size: {stats.total_size_bytes} bytes")
- print(f"Active size: {stats.active_size_bytes} bytes")
- print(f"Archived size: {stats.archived_size_bytes} bytes")
- print(f"Entry count: {stats.entry_count}")
- print(f"Oldest LSN: {stats.oldest_entry_lsn}")
- print(f"Newest LSN: {stats.newest_entry_lsn}")
- ```
-
- ### WAL Truncation
-
- ```typescript
- # Truncate WAL after checkpoint (reclaim disk space)
- result = recovery.truncate_wal(up_to_lsn=12345)
-
- print(f"Truncated to LSN: {result.up_to_lsn}")
- print(f"Bytes freed: {result.bytes_freed}")
- ```
-
- ### Open with Auto-Recovery
-
- ```typescript
- from sochdb import open_with_recovery
-
- # Automatically recovers if needed
- db = open_with_recovery("./my_database")
- ```
-
- ---
-
- ## 19. Checkpoints & Snapshots
-
- ### Application Checkpoints
-
- Save and restore application state for workflow interruption/resumption.
-
- ```typescript
- from sochdb import CheckpointService
-
- checkpoint_svc = db.checkpoint_service()
-
- # Create a checkpoint
- checkpoint_id = checkpoint_svc.create(
-     name="workflow_step_3",
-     state=serialized_state, # bytes
-     metadata={"step": "3", "user": "alice", "workflow": "data_pipeline"}
- )
-
- # Restore checkpoint
- state = checkpoint_svc.restore(checkpoint_id)
-
- # List checkpoints
- checkpoints = checkpoint_svc.list()
- for cp in checkpoints:
-     print(f"{cp.name}: {cp.created_at}, {cp.state_size} bytes")
-
- # Delete checkpoint
- checkpoint_svc.delete(checkpoint_id)
- ```
-
- ### Workflow Checkpointing
-
- ```typescript
- # Create a workflow run
- run_id = checkpoint_svc.create_run(
-     workflow="data_pipeline",
-     params={"input_file": "data.csv", "batch_size": 1000}
- )
-
- # Save checkpoint at each node/step
- checkpoint_svc.save_node_checkpoint(
-     run_id=run_id,
-     node_id="transform_step",
-     state=step_state,
-     metadata={"rows_processed": 5000}
- )
-
- # Load latest checkpoint for a node
- checkpoint = checkpoint_svc.load_node_checkpoint(run_id, "transform_step")
-
- # List all checkpoints for a run
- node_checkpoints = checkpoint_svc.list_run_checkpoints(run_id)
- ```
-
- ### Snapshot Reader (Point-in-Time)
-
- ```typescript
- # Create a consistent snapshot for reading
- snapshot = db.snapshot()
-
- # Read from snapshot (doesn't see newer writes)
- value = snapshot.get(b"key")
-
- # All reads within snapshot see consistent state
- with db.snapshot() as snap:
-     v1 = snap.get(b"key1")
-     v2 = snap.get(b"key2") # Same consistent view
-
- # Meanwhile, writes continue in main DB
- db.put(b"key1", b"new_value") # Snapshot doesn't see this
- ```
-
- ---
-
- ## 20. Compression & Storage
-
- ### Compression Settings
-
- ```typescript
- from sochdb import CompressionType
-
- db = Database.open("./my_db", config={
-     # Compression for SST files
-     "compression": CompressionType.LZ4, # LZ4 (fast), ZSTD (better ratio), NONE
-     "compression_level": 3, # ZSTD: 1-22, LZ4: ignored
-
-     # Compression for WAL
-     "wal_compression": CompressionType.NONE, # Usually NONE for WAL (already sequential)
- })
- ```
-
- ### Compression Comparison
-
- | Type | Ratio | Compress Speed | Decompress Speed | Use Case |
- |------|-------|----------------|------------------|----------|
- | `NONE` | 1x | N/A | N/A | Already compressed data |
- | `LZ4` | ~2.5x | ~780 MB/s | ~4500 MB/s | General use (default) |
- | `ZSTD` | ~3.5x | ~520 MB/s | ~1800 MB/s | Cold storage, large datasets |
-
- ### Storage Statistics
-
- ```typescript
- stats = db.storage_stats()
-
- print(f"Data size: {stats.data_size_bytes}")
- print(f"Index size: {stats.index_size_bytes}")
- print(f"WAL size: {stats.wal_size_bytes}")
- print(f"Compression ratio: {stats.compression_ratio:.2f}x")
- print(f"SST files: {stats.sst_file_count}")
- print(f"Levels: {stats.level_stats}")
- ```
-
- ### Compaction Control
-
- ```typescript
- # Manual compaction (reclaim space, optimize reads)
- db.compact()
-
- # Compact specific level
- db.compact_level(level=0)
-
- # Get compaction stats
- stats = db.compaction_stats()
- print(f"Pending compactions: {stats.pending_compactions}")
- print(f"Running compactions: {stats.running_compactions}")
- ```
-
- ---
-
- ## 21. Statistics & Monitoring
-
- ### Database Statistics
-
- ```typescript
- stats = db.stats()
-
- # Transaction stats
- print(f"Active transactions: {stats.active_transactions}")
- print(f"Committed transactions: {stats.committed_transactions}")
- print(f"Aborted transactions: {stats.aborted_transactions}")
- print(f"Conflict rate: {stats.conflict_rate:.2%}")
-
- # Operation stats
- print(f"Total reads: {stats.total_reads}")
- print(f"Total writes: {stats.total_writes}")
- print(f"Cache hit rate: {stats.cache_hit_rate:.2%}")
-
- # Storage stats
- print(f"Key count: {stats.key_count}")
- print(f"Total data size: {stats.total_data_bytes}")
- ```
-
- ### Token Statistics (LLM Optimization)
-
- ```typescript
- stats = db.token_stats()
-
- print(f"TOON tokens emitted: {stats.toon_tokens_emitted}")
- print(f"Equivalent JSON tokens: {stats.json_tokens_equivalent}")
- print(f"Token savings: {stats.token_savings_percent:.1f}%")
- ```
-
- ### Performance Metrics
-
- ```typescript
- metrics = db.performance_metrics()
-
- # Latency percentiles
- print(f"Read P50: {metrics.read_latency_p50_us}µs")
- print(f"Read P99: {metrics.read_latency_p99_us}µs")
- print(f"Write P50: {metrics.write_latency_p50_us}µs")
- print(f"Write P99: {metrics.write_latency_p99_us}µs")
-
- # Throughput
- print(f"Reads/sec: {metrics.reads_per_second}")
- print(f"Writes/sec: {metrics.writes_per_second}")
- ```
-
- ---
-
- ## 22. Distributed Tracing
-
- Track operations for debugging and performance analysis.
-
- ### Starting Traces
-
- ```typescript
- from sochdb import TraceStore
-
- traces = TraceStore(db)
-
- # Start a trace run
- run = traces.start_run(
-     name="user_request",
-     resource={"service": "api", "version": "1.0.0"}
- )
- trace_id = run.trace_id
- ```
-
- ### Creating Spans
-
- ```typescript
- from sochdb import SpanKind, SpanStatusCode
-
- # Start root span
- root_span = traces.start_span(
-     trace_id=trace_id,
-     name="handle_request",
-     parent_span_id=None,
-     kind=SpanKind.SERVER
- )
-
- # Start child span
- db_span = traces.start_span(
-     trace_id=trace_id,
-     name="database_query",
-     parent_span_id=root_span.span_id,
-     kind=SpanKind.CLIENT
- )
-
- # Add attributes
- traces.set_span_attributes(trace_id, db_span.span_id, {
-     "db.system": "sochdb",
-     "db.operation": "SELECT",
-     "db.table": "users"
- })
-
- # End spans
- traces.end_span(trace_id, db_span.span_id, SpanStatusCode.OK)
- traces.end_span(trace_id, root_span.span_id, SpanStatusCode.OK)
-
- # End the trace run
- traces.end_run(trace_id, TraceStatus.COMPLETED)
- ```
-
- ### Domain Events
-
- ```typescript
- # Log retrieval (for RAG debugging)
- traces.log_retrieval(
-     trace_id=trace_id,
-     query="user query",
-     results=[{"id": "doc1", "score": 0.95}],
-     latency_ms=15
- )
-
- # Log LLM call
- traces.log_llm_call(
-     trace_id=trace_id,
-     model="claude-3-sonnet",
-     input_tokens=500,
-     output_tokens=200,
-     latency_ms=1200
- )
- ```
-
- ---
-
- ## 23. Workflow & Run Tracking
-
- Track long-running workflows with events and state.
-
- ### Creating Workflow Runs
-
- ```typescript
- from sochdb import WorkflowService, RunStatus
-
- workflow_svc = db.workflow_service()
-
- # Create a new run
- run = workflow_svc.create_run(
-     run_id="run_123",
-     workflow="data_pipeline",
-     params={"input": "data.csv", "output": "results.json"}
- )
-
- print(f"Run ID: {run.run_id}")
- print(f"Status: {run.status}")
- print(f"Created: {run.created_at}")
- ```
-
- ### Appending Events
-
- ```typescript
- from sochdb import WorkflowEvent, EventType
-
- # Append events as workflow progresses
- workflow_svc.append_event(WorkflowEvent(
-     run_id="run_123",
-     event_type=EventType.NODE_STARTED,
-     node_id="extract",
-     data={"input_file": "data.csv"}
- ))
-
- workflow_svc.append_event(WorkflowEvent(
-     run_id="run_123",
-     event_type=EventType.NODE_COMPLETED,
-     node_id="extract",
-     data={"rows_extracted": 10000}
- ))
- ```
-
- ### Querying Events
-
- ```typescript
- # Get all events for a run
- events = workflow_svc.get_events("run_123")
-
- # Get events since a sequence number
- new_events = workflow_svc.get_events("run_123", since_seq=10, limit=100)
-
- # Stream events (for real-time monitoring)
- for event in workflow_svc.stream_events("run_123"):
-     print(f"[{event.seq}] {event.event_type}: {event.node_id}")
- ```
-
- ### Update Run Status
-
- ```typescript
- # Update status
- workflow_svc.update_run_status("run_123", RunStatus.COMPLETED)
-
- # Or mark as failed
- workflow_svc.update_run_status("run_123", RunStatus.FAILED)
- ```
-
- ---
-
- ## 24. Server Mode (gRPC Client)
-
- Full-featured client for distributed deployments.
-
- ### Connection
-
- ```typescript
- from sochdb import SochDBClient
-
- # Basic connection
- client = SochDBClient("localhost:50051")
-
- # With TLS
- client = SochDBClient("localhost:50051", secure=True, ca_cert="ca.pem")
-
- # With authentication
- client = SochDBClient("localhost:50051", api_key="your_api_key")
-
- # Context manager
- with SochDBClient("localhost:50051") as client:
-     client.put(b"key", b"value")
- ```
-
- ### Key-Value Operations
-
- ```typescript
- # Put with TTL
- client.put(b"key", b"value", namespace="default", ttl_seconds=3600)
-
- # Get
- value = client.get(b"key", namespace="default")
-
- # Delete
- client.delete(b"key", namespace="default")
-
- # Batch operations
- client.put_batch([
-     (b"key1", b"value1"),
-     (b"key2", b"value2"),
- ], namespace="default")
- ```
-
- ### Vector Operations (Server Mode)
-
- ```typescript
- # Create index
- client.create_index(
-     name="embeddings",
-     dimension=384,
-     metric="cosine",
-     m=16,
-     ef_construction=200
- )
-
- # Insert vectors
- client.insert_vectors(
-     index_name="embeddings",
-     ids=[1, 2, 3],
-     vectors=[[...], [...], [...]]
- )
-
- # Search
- results = client.search(
-     index_name="embeddings",
-     query=[0.1, 0.2, ...],
-     k=10,
-     ef_search=50
- )
-
- for result in results:
-     print(f"ID: {result.id}, Distance: {result.distance}")
- ```
-
- ### Collection Operations (Server Mode)
-
- ```typescript
- # Create collection
- client.create_collection(
-     name="documents",
-     dimension=384,
-     namespace="default",
-     metric="cosine"
- )
-
- # Add documents
- client.add_documents(
-     collection_name="documents",
-     documents=[
-         {"id": "1", "content": "Hello", "embedding": [...], "metadata": {...}},
-         {"id": "2", "content": "World", "embedding": [...], "metadata": {...}}
-     ],
-     namespace="default"
- )
-
- # Search
- results = client.search_collection(
-     collection_name="documents",
-     query_vector=[...],
-     k=10,
-     namespace="default",
-     filter={"author": "Alice"}
- )
- ```
-
- ### Context Service (Server Mode)
-
- ```typescript
- # Query context for LLM
- context = client.query_context(
-     session_id="session_123",
-     sections=[
-         {"name": "system", "priority": 0, "type": "literal",
-          "content": "You are a helpful assistant."},
-         {"name": "history", "priority": 1, "type": "recent",
-          "table": "messages", "top_k": 10},
-         {"name": "knowledge", "priority": 2, "type": "search",
-          "collection": "documents", "embedding": [...], "top_k": 5}
-     ],
-     token_limit=4096,
-     format="toon"
- )
-
- print(context.text)
- print(f"Tokens used: {context.token_count}")
- ```
-
- ---
-
- ## 25. IPC Client (Unix Sockets)
-
- Local server communication via Unix sockets (lower latency than gRPC).
-
- ```typescript
- from sochdb import IpcClient
-
- # Connect
- client = IpcClient.connect("/tmp/sochdb.sock", timeout=30.0)
-
- # Basic operations
- client.put(b"key", b"value")
- value = client.get(b"key")
- client.delete(b"key")
-
- # Path operations
- client.put_path(["users", "alice"], b"data")
- value = client.get_path(["users", "alice"])
-
- # Query
- result = client.query("users/", limit=100)
-
- # Scan
- results = client.scan("prefix/")
-
- # Transactions
- txn_id = client.begin_transaction()
- # ... operations ...
- commit_ts = client.commit(txn_id)
- # or client.abort(txn_id)
-
- # Admin
- client.ping()
- client.checkpoint()
- stats = client.stats()
-
- client.close()
- ```
-
- ---
-
- ## 26. Standalone VectorIndex
-
- Direct HNSW index operations without collections.
-
- ```typescript
- from sochdb import VectorIndex, VectorIndexConfig, DistanceMetric
- import numpy as np
-
- # Create index
- config = VectorIndexConfig(
-     dimension=384,
-     metric=DistanceMetric.COSINE,
-     m=16,
-     ef_construction=200,
-     ef_search=50,
-     max_elements=100000
- )
- index = VectorIndex(config)
-
- # Insert single vector
- index.insert(id=1, vector=np.array([0.1, 0.2, ...], dtype=np.float32))
-
- # Batch insert
- ids = np.array([1, 2, 3], dtype=np.uint64)
- vectors = np.array([[...], [...], [...]], dtype=np.float32)
- count = index.insert_batch(ids, vectors)
-
- # Fast batch insert (returns failures)
- inserted, failed = index.insert_batch_fast(ids, vectors)
-
- # Search
- query = np.array([0.1, 0.2, ...], dtype=np.float32)
- results = index.search(query, k=10, ef_search=100)
-
- for id, distance in results:
-     print(f"ID: {id}, Distance: {distance}")
-
- # Properties
- print(f"Size: {len(index)}")
- print(f"Dimension: {index.dimension}")
-
- # Save/load
- index.save("./index.bin")
- index = VectorIndex.load("./index.bin")
- ```
-
- ---
-
- ## 27. Vector Utilities
-
- Standalone vector operations for preprocessing and analysis.
-
- ```typescript
- from sochdb import vector
-
- # Distance calculations
- a = [1.0, 0.0, 0.0]
- b = [0.707, 0.707, 0.0]
-
- cosine_dist = vector.cosine_distance(a, b)
- euclidean_dist = vector.euclidean_distance(a, b)
- dot_product = vector.dot_product(a, b)
-
- print(f"Cosine distance: {cosine_dist:.4f}")
- print(f"Euclidean distance: {euclidean_dist:.4f}")
- print(f"Dot product: {dot_product:.4f}")
-
- # Normalize a vector
- v = [3.0, 4.0]
- normalized = vector.normalize(v)
- print(f"Normalized: {normalized}") # [0.6, 0.8]
-
- # Batch normalize
- vectors = [[3.0, 4.0], [1.0, 0.0]]
- normalized_batch = vector.normalize_batch(vectors)
-
- # Compute centroid
- vectors = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
- centroid = vector.centroid(vectors)
-
- # Cosine similarity (1 - distance)
- similarity = vector.cosine_similarity(a, b)
- ```
-
- ---
-
- ## 28. Data Formats (TOON/JSON/Columnar)
-
- ### Wire Formats
-
- ```typescript
- from sochdb import WireFormat
-
- # Available formats
- WireFormat.TOON # Token-efficient (40-66% fewer tokens)
- WireFormat.JSON # Standard JSON
- WireFormat.COLUMNAR # Raw columnar for analytics
-
- # Parse from string
- fmt = WireFormat.from_string("toon")
-
- # Convert between formats
- data = {"users": [{"id": 1, "name": "Alice"}]}
- toon_data = WireFormat.to_toon(data)
- json_data = WireFormat.to_json(data)
- ```
-
- ### TOON Format Benefits
-
- TOON uses **40-60% fewer tokens** than JSON:
-
- ```
- # JSON (15 tokens)
- {"users": [{"id": 1, "name": "Alice"}]}
-
- # TOON (9 tokens)
- users:
-   - id: 1
-     name: Alice
- ```
-
- ### Context Formats
-
- ```typescript
- from sochdb import ContextFormat
-
- ContextFormat.TOON # Token-efficient
- ContextFormat.JSON # Structured data
- ContextFormat.MARKDOWN # Human-readable
-
- # Format capabilities
- from sochdb import FormatCapabilities
-
- # Convert between formats
- ctx_fmt = FormatCapabilities.wire_to_context(WireFormat.TOON)
- wire_fmt = FormatCapabilities.context_to_wire(ContextFormat.JSON)
-
- # Check round-trip support
- if FormatCapabilities.supports_round_trip(WireFormat.TOON):
-     print("Safe for decode(encode(x)) = x")
- ```
-
- ---
-
3424
- ## 29. Policy Service
-
- Register and evaluate access control policies.
-
- ```typescript
- from sochdb import PolicyService
-
- policy_svc = db.policy_service()
-
- # Register a policy
- policy_svc.register(
-     policy_id="read_own_data",
-     name="Users can read their own data",
-     trigger="READ",
-     action="ALLOW",
-     condition="resource.owner == user.id"
- )
-
- # Register another policy
- policy_svc.register(
-     policy_id="admin_all",
-     name="Admins can do everything",
-     trigger="*",
-     action="ALLOW",
-     condition="user.role == 'admin'"
- )
-
- # Evaluate policy
- result = policy_svc.evaluate(
-     action="READ",
-     resource="documents/123",
-     context={"user.id": "alice", "user.role": "user", "resource.owner": "alice"}
- )
-
- if result.allowed:
-     print("Access granted")
- else:
-     print(f"Access denied: {result.reason}")
-     print(f"Denying policy: {result.policy_id}")
-
- # List policies
- policies = policy_svc.list()
- for p in policies:
-     print(f"{p.policy_id}: {p.name}")
-
- # Delete policy
- policy_svc.delete("old_policy")
- ```
-
- ---
-
- ## 30. MCP (Model Context Protocol)
-
- Integrate SochDB as an MCP tool provider.
-
- ### Built-in MCP Tools
-
- | Tool | Description |
- |------|-------------|
- | `sochdb_query` | Execute ToonQL/SQL queries |
- | `sochdb_context_query` | Fetch AI-optimized context |
- | `sochdb_put` | Store key-value data |
- | `sochdb_get` | Retrieve data by key |
- | `sochdb_search` | Vector similarity search |
-
- ### Using MCP Tools (Server Mode)
-
- ```typescript
- # List available tools
- tools = client.list_mcp_tools()
- for tool in tools:
-     print(f"{tool.name}: {tool.description}")
-
- # Get tool schema
- schema = client.get_mcp_tool_schema("sochdb_search")
- print(schema)
-
- # Execute tool
- result = client.execute_mcp_tool(
-     name="sochdb_query",
-     arguments={"query": "SELECT * FROM users", "format": "toon"}
- )
- print(result)
- ```
-
- ### Register Custom Tool
-
- ```typescript
- # Register a custom tool
- client.register_mcp_tool(
-     name="search_documents",
-     description="Search documents by semantic similarity",
-     input_schema={
-         "type": "object",
-         "properties": {
-             "query": {"type": "string", "description": "Search query"},
-             "k": {"type": "integer", "description": "Number of results", "default": 10}
-         },
-         "required": ["query"]
-     }
- )
- ```
-
- ---
-
- ## 31. Configuration Reference
-
- ### Database Configuration
-
- ```typescript
- from sochdb import Database, CompressionType, SyncMode
-
- db = Database.open("./my_db", config={
-     # Durability
-     "wal_enabled": True, # Write-ahead logging
-     "sync_mode": SyncMode.NORMAL, # FULL, NORMAL, OFF
-
-     # Performance
-     "memtable_size_bytes": 64 * 1024 * 1024, # 64MB (flush threshold)
-     "block_cache_size_bytes": 256 * 1024 * 1024, # 256MB
-     "group_commit": True, # Batch commits
-
-     # Compression
-     "compression": CompressionType.LZ4,
-
-     # Index policy
-     "index_policy": "balanced",
-
-     # Background workers
-     "compaction_threads": 2,
-     "flush_threads": 1,
- })
- ```
-
- ### Sync Modes
-
- | Mode | Speed | Safety | Use Case |
- |------|-------|--------|----------|
- | `OFF` | ~10x faster | Risk of data loss | Development, caches |
- | `NORMAL` | Balanced | Fsync at checkpoints | Default |
- | `FULL` | Slowest | Fsync every commit | Financial data |
-
- ### CollectionConfig Reference
-
- | Field | Type | Default | Description |
- |-------|------|---------|-------------|
- | `name` | str | required | Collection name |
- | `dimension` | int | required | Vector dimension |
- | `metric` | DistanceMetric | COSINE | COSINE, EUCLIDEAN, DOT_PRODUCT |
- | `m` | int | 16 | HNSW M parameter |
- | `ef_construction` | int | 100 | HNSW build quality |
- | `ef_search` | int | 50 | HNSW search quality |
- | `quantization` | QuantizationType | NONE | NONE, SCALAR, PQ |
- | `enable_hybrid_search` | bool | False | Enable BM25 |
- | `content_field` | str | None | Field for BM25 indexing |
-
- ### Environment Variables
-
- | Variable | Description |
- |----------|-------------|
- | `TOONDB_LIB_PATH` | Custom path to native library |
- | `TOONDB_DISABLE_ANALYTICS` | Disable anonymous usage tracking |
- | `TOONDB_LOG_LEVEL` | Log level (DEBUG, INFO, WARN, ERROR) |
-
- ---
-
- ## 32. Error Handling
-
- ### Error Types
-
- ```typescript
- from sochdb import (
-     # Base
-     SochDBError,
-
-     # Connection
-     ConnectionError,
-     ConnectionTimeoutError,
-
-     # Transaction
-     TransactionError,
-     TransactionConflictError, # SSI conflict - retry
-     TransactionTimeoutError,
-
-     # Storage
-     DatabaseError,
-     CorruptionError,
-     DiskFullError,
-
-     # Namespace
-     NamespaceNotFoundError,
-     NamespaceExistsError,
-     NamespaceAccessError,
-
-     # Collection
-     CollectionNotFoundError,
-     CollectionExistsError,
-     CollectionConfigError,
-
-     # Validation
-     ValidationError,
-     DimensionMismatchError,
-     InvalidMetadataError,
-
-     # Query
-     QueryError,
-     QuerySyntaxError,
-     QueryTimeoutError,
- )
- ```
-
- ### Error Handling Pattern
-
- ```typescript
- from sochdb import (
-     SochDBError,
-     TransactionConflictError,
-     DimensionMismatchError,
-     CollectionNotFoundError,
- )
-
- try:
-     with db.transaction() as txn:
-         txn.put(b"key", b"value")
-
- except TransactionConflictError as e:
-     # SSI conflict - safe to retry
-     print(f"Conflict detected: {e}")
-
- except DimensionMismatchError as e:
-     # Vector dimension wrong
-     print(f"Expected {e.expected} dimensions, got {e.actual}")
-
- except CollectionNotFoundError as e:
-     # Collection doesn't exist
-     print(f"Collection not found: {e.collection}")
-
- except SochDBError as e:
-     # All other SochDB errors
-     print(f"Error: {e}")
-     print(f"Code: {e.code}")
-     print(f"Remediation: {e.remediation}")
- ```
-
- ### Error Information
-
- ```typescript
- try:
-     # ...
- except SochDBError as e:
-     print(f"Message: {e.message}")
-     print(f"Code: {e.code}") # ErrorCode enum
-     print(f"Details: {e.details}") # Additional context
-     print(f"Remediation: {e.remediation}") # How to fix
-     print(f"Retryable: {e.retryable}") # Safe to retry?
- ```
-
- ---
-
- ## 33. Async Support
-
- Optional async/await support for non-blocking operations.
-
- ```typescript
- from sochdb import AsyncDatabase
-
- async def main():
-     # Open async database
-     db = await AsyncDatabase.open("./my_db")
-
-     # Async operations
-     await db.put(b"key", b"value")
-     value = await db.get(b"key")
-
-     # Async transactions
-     async with db.transaction() as txn:
-         await txn.put(b"key1", b"value1")
-         await txn.put(b"key2", b"value2")
-
-     # Async vector search
-     results = await db.collection("docs").search(SearchRequest(
-         vector=[0.1, 0.2, ...],
-         k=10
-     ))
-
-     await db.close()
-
- # Run
- import asyncio
- asyncio.run(main())
- ```
-
- **Note:** Requires `npm install @sochdb/sochdb[async]`
-
- ---
-
- ## 34. Building & Development
-
- ### Building Native Extensions
-
- ```bash
- # Build for current platform
- typescript build_native.ts
-
- # Build only FFI libraries
- typescript build_native.ts --libs
-
- # Build for all platforms
- typescript build_native.ts --all
-
- # Clean
- typescript build_native.ts --clean
- ```
-
- ### Library Discovery
-
- The SDK looks for native libraries in this order:
- 1. `TOONDB_LIB_PATH` environment variable
- 2. Bundled in wheel: `lib/{target}/`
- 3. Package directory
- 4. Development builds: `target/release/`, `target/debug/`
- 5. System paths: `/usr/local/lib`, `/usr/lib`
-
- ### Running Tests
-
- ```bash
- # All tests
- pytest
-
- # Specific test file
- pytest tests/test_vector_search.ts
-
- # With coverage
- pytest --cov=sochdb
-
- # Performance tests
- pytest tests/perf/ --benchmark
- ```
-
- ### Package Structure
-
- ```
- sochdb/
- ├── __init__.ts # Public API exports
- ├── database.ts # Database, Transaction
- ├── namespace.ts # Namespace, Collection
- ├── vector.ts # VectorIndex, utilities
- ├── grpc_client.ts # SochDBClient (server mode)
- ├── ipc_client.ts # IpcClient (Unix sockets)
- ├── context.ts # ContextQueryBuilder
- ├── atomic.ts # AtomicMemoryWriter
- ├── recovery.ts # RecoveryManager
- ├── checkpoint.ts # CheckpointService
- ├── workflow.ts # WorkflowService
- ├── trace.ts # TraceStore
- ├── policy.ts # PolicyService
- ├── format.ts # WireFormat, ContextFormat
- ├── errors.ts # All error types
- ├── _bin/ # Bundled binaries
- └── lib/ # FFI libraries
- ```
-
- ---
-
- ## 35. Complete Examples
-
- ### RAG Pipeline Example
-
- ```typescript
- from sochdb import Database, CollectionConfig, DistanceMetric, SearchRequest
-
- # Setup
- db = Database.open("./rag_db")
- ns = db.get_or_create_namespace("rag")
-
- # Create collection for documents
- collection = ns.create_collection(CollectionConfig(
-     name="documents",
-     dimension=384,
-     metric=DistanceMetric.COSINE,
-     enable_hybrid_search=True,
-     content_field="text"
- ))
-
- # Index documents
- def index_document(doc_id: str, text: str, embed_fn):
-     embedding = embed_fn(text)
-     collection.insert(
-         id=doc_id,
-         vector=embedding,
-         metadata={"text": text, "indexed_at": "2024-01-15"}
-     )
-
- # Retrieve relevant context
- def retrieve_context(query: str, embed_fn, k: int = 5) -> list:
-     query_embedding = embed_fn(query)
-
-     results = collection.hybrid_search(
-         vector=query_embedding,
-         text_query=query,
-         k=k,
-         alpha=0.7 # 70% vector, 30% keyword
-     )
-
-     return [r.metadata["text"] for r in results]
-
- # Full RAG pipeline
- def rag_query(query: str, embed_fn, llm_fn):
-     # 1. Retrieve
-     context_docs = retrieve_context(query, embed_fn)
-
-     # 2. Build context
-     from sochdb import ContextQueryBuilder, ContextFormat
-
-     context = ContextQueryBuilder() \
-         .for_session("rag_session") \
-         .with_budget(4096) \
-         .literal("SYSTEM", 0, "Answer based on the provided context.") \
-         .literal("CONTEXT", 1, "\n\n".join(context_docs)) \
-         .literal("QUESTION", 2, query) \
-         .execute()
-
-     # 3. Generate
-     response = llm_fn(context.text)
-
-     return response
-
- db.close()
- ```
-
- ### Knowledge Graph Example
-
- ```typescript
- from sochdb import Database
- import time
-
- db = Database.open("./knowledge_graph")
-
- # Build a knowledge graph
- db.add_node("kg", "alice", "person", {"role": "engineer", "level": "senior"})
- db.add_node("kg", "bob", "person", {"role": "manager"})
- db.add_node("kg", "project_ai", "project", {"status": "active", "budget": 100000})
- db.add_node("kg", "ml_team", "team", {"size": 5})
-
- db.add_edge("kg", "alice", "works_on", "project_ai", {"role": "lead"})
- db.add_edge("kg", "alice", "member_of", "ml_team")
- db.add_edge("kg", "bob", "manages", "project_ai")
- db.add_edge("kg", "bob", "leads", "ml_team")
-
- # Query: Find all projects Alice works on
- nodes, edges = db.traverse("kg", "alice", max_depth=1)
- projects = [n for n in nodes if n["node_type"] == "project"]
- print(f"Alice's projects: {[p['id'] for p in projects]}")
-
- # Query: Who manages Alice's projects?
- for project in projects:
-     nodes, edges = db.traverse("kg", project["id"], max_depth=1)
-     managers = [e["from_id"] for e in edges if e["edge_type"] == "manages"]
-     print(f"{project['id']} managed by: {managers}")
-
- db.close()
- ```
-
- ### Multi-Tenant SaaS Example
-
- ```typescript
- from sochdb import Database
-
- db = Database.open("./saas_db")
-
- # Create tenant namespaces
- for tenant in ["acme_corp", "globex", "initech"]:
-     ns = db.create_namespace(
-         name=tenant,
-         labels={"tier": "premium" if tenant == "acme_corp" else "standard"}
-     )
-
-     # Create tenant-specific collections
-     ns.create_collection(
-         name="documents",
-         dimension=384
-     )
-
- # Tenant-scoped operations
- with db.use_namespace("acme_corp") as ns:
-     collection = ns.collection("documents")
-
-     # All operations isolated to acme_corp
-     collection.insert(
-         id="doc1",
-         vector=[0.1] * 384,
-         metadata={"title": "Acme Internal Doc"}
-     )
-
-     # Search only searches acme_corp's documents
-     results = collection.vector_search(
-         vector=[0.1] * 384,
-         k=10
-     )
-
- # Cleanup
- db.close()
- ```
-
- ---
-
- ## 36. Migration Guide
-
- ### From v0.2.x to v0.3.x
-
- ```typescript
- # Old: scan() with range
- for k, v in db.scan(b"users/", b"users0"): # DEPRECATED
-     pass
-
- # New: scan_prefix()
- for k, v in db.scan_prefix(b"users/"):
-     pass
-
- # Old: execute_sql returns tuple
- columns, rows = db.execute_sql("SELECT * FROM users")
-
- # New: execute_sql returns SQLQueryResult
- result = db.execute_sql("SELECT * FROM users")
- columns = result.columns
- rows = result.rows
- ```
-
- ### From SQLite/PostgreSQL
-
- ```typescript
- # SQLite
- # conn = sqlite3.connect("app.db")
- # cursor = conn.execute("SELECT * FROM users")
-
- # SochDB (same SQL, embedded)
- db = Database.open("./app_db")
- result = db.execute_sql("SELECT * FROM users")
- ```
-
- ### From Redis
-
- ```typescript
- # Redis
- # r = redis.Redis()
- # r.set("key", "value")
- # r.get("key")
-
- # SochDB
- db = Database.open("./cache_db")
- db.put(b"key", b"value")
- db.get(b"key")
-
- # With TTL
- db.put(b"session:123", b"data", ttl_seconds=3600)
- ```
-
- ### From Pinecone/Weaviate
-
- ```typescript
- # Pinecone
- # index.upsert(vectors=[(id, embedding, metadata)])
- # results = index.query(vector=query, top_k=10)
-
- # SochDB
- collection = db.namespace("default").collection("vectors")
- collection.insert(id=id, vector=embedding, metadata=metadata)
- results = collection.vector_search(vector=query, k=10)
- ```
-
- ---
-
- ## Performance
-
- **Network Overhead:**
- - gRPC: ~100-200 μs per request (local)
- - IPC: ~50-100 μs per request (Unix socket)
-
- **Batch Operations:**
- - Vector insert: 50,000 vectors/sec (batch mode)
- - Vector search: 20,000 queries/sec (47 μs/query)
-
- **Recommendation:**
- - Use **batch operations** for high throughput
- - Use **IPC** for same-machine communication
- - Use **gRPC** for distributed systems
-
- ---
-
- ## FAQ
-
- **Q: Which mode should I use?**
- A:
- - **Embedded (FFI)**: For local dev, notebooks, single-process apps
- - **Server (gRPC)**: For production, multi-language, distributed systems
-
- **Q: Can I switch between modes?**
- A: Yes! Both modes have the same API. Change `Database.open()` to `SochDBClient()` and vice versa.
-
- **Q: Do temporal graphs work in embedded mode?**
- A: Yes! Temporal graphs work in both embedded and server modes with identical APIs.
-
- **Q: Is embedded mode slower than server mode?**
- A: Embedded mode is faster for single-process use (no network overhead). Server mode is better for distributed deployments.
-
- **Q: Where is the business logic?**
- A: All business logic is in Rust. Embedded mode uses FFI bindings, server mode uses gRPC. Same Rust code, different transport.
-
- **Q: What about the old "fat client" Database class?**
- A: It's still here as embedded mode! We now support dual-mode: embedded FFI + server gRPC.
-
- ---
-
- ## Examples
-
- See the [examples/](examples/) directory for complete working examples:
-
- **Embedded Mode (FFI - No Server):**
- - [23_collections_embedded.ts](examples/23_collections_embedded.ts) - Document storage, JSON, transactions
- - [22_namespaces.ts](examples/22_namespaces.ts) - Multi-tenant isolation with key prefixes
- - [24_batch_operations.ts](examples/24_batch_operations.ts) - Atomic writes, rollback, conditional updates
- - [25_temporal_graph_embedded.ts](examples/25_temporal_graph_embedded.ts) - Time-travel queries (NEW!)
-
- **Server Mode (gRPC - Requires Server):**
- - [21_temporal_graph.ts](examples/21_temporal_graph.ts) - Temporal graphs via gRPC
-
- ---
-
- ## Getting Help
+ ### Performance
 
- - **Documentation**: https://sochdb.dev
- - **GitHub Issues**: https://github.com/sochdb/sochdb/issues
- - **Examples**: See [examples/](examples/) directory
+ | Operation | Latency |
+ |-----------|---------|
+ | KV Read | ~100ns |
+ | KV Write (fsync) | ~5ms |
+ | KV Write (concurrent, amortized) | ~60µs |
+ | Vector Search (HNSW, 1M vectors) | <5ms |
+ | Prefix Scan (per item) | ~200ns |
+ | Max Concurrent Readers | 1024 |
 
  ---
 
  ## Contributing
 
- Interested in contributing? See [CONTRIBUTING.md](CONTRIBUTING.md) for:
- - Development environment setup
- - Building from source
- - Running tests
- - Code style guidelines
- - Pull request process
+ See [CONTRIBUTING.md](CONTRIBUTING.md) for development setup, building from source, and pull request guidelines.
 
  ---