@soulcraft/brainy 0.37.0 → 0.38.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +710 -1642
- package/dist/brainyData.d.ts +37 -0
- package/dist/distributed/configManager.d.ts +97 -0
- package/dist/distributed/domainDetector.d.ts +77 -0
- package/dist/distributed/hashPartitioner.d.ts +77 -0
- package/dist/distributed/healthMonitor.d.ts +110 -0
- package/dist/distributed/index.d.ts +10 -0
- package/dist/distributed/operationalModes.d.ts +104 -0
- package/dist/types/distributedTypes.d.ts +197 -0
- package/dist/types/distributedTypes.d.ts.map +1 -0
- package/dist/unified.js +1383 -2
- package/dist/unified.min.js +991 -991
- package/dist/utils/crypto.d.ts +25 -0
- package/dist/utils/crypto.d.ts.map +1 -0
- package/package.json +1 -1
package/README.md
CHANGED
@@ -7,1859 +7,927 @@
 [](https://www.typescriptlang.org/)
 [](CONTRIBUTING.md)
 
-[//]: # ([](https://github.com/sodal-project/cartographer))
-
 **A powerful graph & vector data platform for AI applications across any environment**
 
 </div>
 
-## ✨
-
-Brainy combines the power of vector search with graph relationships in a lightweight, cross-platform database. Whether
-you're building AI applications, recommendation systems, or knowledge graphs, Brainy provides the tools you need to
-store, connect, and retrieve your data intelligently.
-
-What makes Brainy special? It intelligently adapts to your environment! Brainy automatically detects your platform,
-adjusts its storage strategy, and optimizes performance based on your usage patterns. The more you use it, the smarter
-it gets - learning from your data to provide increasingly relevant results and connections.
-
-### 🚀 Key Features
-
-- **🧠 Zero Configuration** - Auto-detects environment and optimizes automatically
-- **⚡ Production-Scale Performance** - Handles millions of vectors with sub-second search
-- **🎯 Intelligent Partitioning** - Semantic clustering with auto-tuning
-- **📊 Adaptive Learning** - Gets smarter with usage, optimizes itself over time
-- **🗄️ Smart Storage** - OPFS, FileSystem, S3 auto-selection based on environment
-- **💾 Massive Memory Optimization** - 75% reduction with compression, intelligent caching
-- **🚀 Distributed Search** - Parallel processing with load balancing
-- **🔄 Real-Time Adaptation** - Automatically adjusts to your data patterns
-- **Run Everywhere** - Works in browsers, Node.js, serverless functions, and containers
-- **Vector Search** - Find semantically similar content using embeddings
-- **Advanced JSON Document Search** - Search within specific fields of JSON documents with field prioritization and
-  service-based field standardization
-- **Graph Relationships** - Connect data with meaningful relationships
-- **Streaming Pipeline** - Process data in real-time as it flows through the system
-- **Extensible Augmentations** - Customize and extend functionality with pluggable components
-- **Built-in Conduits** - Sync and scale across instances with WebSocket and WebRTC
-- **TensorFlow Integration** - Use TensorFlow.js for high-quality embeddings
-- **Persistent Storage** - Data persists across sessions and scales to any size
-- **TypeScript Support** - Fully typed API with generics
-- **CLI Tools & Web Service** - Command-line interface and REST API web service for data management
-- **Model Control Protocol (MCP)** - Allow external AI models to access Brainy data and use augmentation pipeline as
-  tools
-
-## ⚡ Large-Scale Performance Optimizations
-
-**New in v0.36.0**: Brainy now includes 6 core optimizations that transform it from a prototype into a production-ready system capable of handling millions of vectors:
-
-### 🎯 Performance Benchmarks
-
-| Dataset Size | Search Time | Memory Usage | API Calls Reduction |
-|-------------|-------------|--------------|-------------------|
-| **10k vectors** | ~50ms | Standard | N/A |
-| **100k vectors** | ~200ms | 30% reduction | 50-70% fewer |
-| **1M+ vectors** | ~500ms | 75% reduction | 50-90% fewer |
-
-### 🧠 6 Core Optimization Systems
-
-1. **🎛️ Auto-Configuration System** - Detects environment, resources, and data patterns
-2. **🔀 Semantic Partitioning** - Intelligent clustering with auto-tuning (4-32 clusters)
-3. **🚀 Distributed Search** - Parallel processing across partitions with load balancing
-4. **🧠 Multi-Level Caching** - Hot/Warm/Cold caching with predictive prefetching
-5. **📦 Batch S3 Operations** - Reduces cloud storage API calls by 50-90%
-6. **💾 Advanced Compression** - Vector quantization and memory-mapping for large datasets
+## ✨ What is Brainy?
 
-
-
-| Environment | Auto-Configured | Performance Focus |
-|-------------|-----------------|-------------------|
-| **Browser** | OPFS + Web Workers | Memory efficiency, 512MB-1GB limits |
-| **Node.js** | FileSystem + Worker Threads | High performance, 4GB-8GB+ usage |
-| **Serverless** | S3 + Memory cache | Cold start optimization, latency focus |
-
-### 📊 Intelligent Scaling Strategy
-
-The system automatically adapts based on your dataset size:
-
-- **< 25k vectors**: Single optimized index, no partitioning needed
-- **25k - 100k**: Semantic clustering (4-8 clusters), balanced performance
-- **100k - 1M**: Advanced partitioning (8-16 clusters), scale-optimized
-- **1M+ vectors**: Maximum optimization (16-32 clusters), enterprise-grade
-
-### 🧠 Adaptive Learning Features
-
-- **Performance Monitoring**: Tracks latency, cache hits, memory usage
-- **Dynamic Tuning**: Adjusts parameters every 50 searches based on performance
-- **Pattern Recognition**: Learns from access patterns to improve predictions
-- **Self-Optimization**: Automatically enables/disables features based on workload
-
-> **📖 Full Documentation**: See the complete [Large-Scale Optimizations Guide](docs/optimization-guides/large-scale-optimizations.md) for detailed configuration options and advanced usage.
-
-## 🚀 Live Demo
-
-**[Try the live demo](https://soulcraft-research.github.io/brainy/demo/index.html)** - Check out the interactive demo on
-GitHub Pages that showcases Brainy's main features.
-
-## 📊 What Can You Build?
-
-- **Semantic Search Engines** - Find content based on meaning, not just keywords
-- **Recommendation Systems** - Suggest similar items based on vector similarity
-- **Knowledge Graphs** - Build connected data structures with relationships
-- **AI Applications** - Store and retrieve embeddings for machine learning models
-- **AI-Enhanced Applications** - Build applications that leverage vector embeddings for intelligent data processing
-- **Data Organization Tools** - Automatically categorize and connect related information
-- **Adaptive Experiences** - Create applications that learn and evolve with your users
-- **Model-Integrated Systems** - Connect external AI models to Brainy data and tools using MCP
-
-## 🔧 Installation
-
-```bash
-npm install @soulcraft/brainy
-```
+Imagine a database that thinks like you do - connecting ideas, finding patterns, and getting smarter over time. Brainy is the **AI-native database** that brings vector search and knowledge graphs together in one powerful, ridiculously easy-to-use package.
 
-
-configuration
+### 🆕 NEW: Distributed Mode (v0.38+)
+**Scale horizontally with zero configuration!** Brainy now supports distributed deployments with automatic coordination:
+- **🌐 Multi-Instance Coordination** - Multiple readers and writers working in harmony
+- **🏷️ Smart Domain Detection** - Automatically categorizes data (medical, legal, product, etc.)
+- **📊 Real-Time Health Monitoring** - Track performance across all instances
+- **🔄 Automatic Role Optimization** - Readers optimize for cache, writers for throughput
+- **🗂️ Intelligent Partitioning** - Hash-based partitioning for perfect load distribution
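The "hash-based partitioning" the new README advertises can be sketched in a few lines. This is a hypothetical illustration, not Brainy's actual `hashPartitioner` code: any stable hash function (FNV-1a here) maps an ID to a partition index, so every instance agrees on placement without any coordination traffic.

```javascript
// Hypothetical sketch - not Brainy's implementation.
// FNV-1a produces a stable 32-bit hash for a string key.
function fnv1a(key) {
  let hash = 0x811c9dc5
  for (let i = 0; i < key.length; i++) {
    hash ^= key.charCodeAt(i)
    hash = Math.imul(hash, 0x01000193) >>> 0 // multiply by FNV prime, keep unsigned
  }
  return hash
}

// The same id always maps to the same partition on every instance.
function partitionFor(id, partitionCount) {
  return fnv1a(id) % partitionCount
}

// Ids spread roughly evenly across partitions.
const counts = new Array(4).fill(0)
for (let i = 0; i < 1000; i++) counts[partitionFor(`id-${i}`, 4)]++
```

Because the mapping is deterministic, a writer and a reader that never talk to each other still agree on which partition holds a given record.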
 
-###
+### 🚀 Why Developers Love Brainy
 
-Brainy
+- **🧠 It Just Works™** - No config files, no tuning parameters, no DevOps headaches. Brainy auto-detects your environment and optimizes itself
+- **🌍 True Write-Once, Run-Anywhere** - Same code runs in React, Angular, Vue, Node.js, Deno, Bun, serverless, edge workers, and even vanilla HTML
+- **⚡ Scary Fast** - Handles millions of vectors with sub-millisecond search. Built-in GPU acceleration when available
+- **🎯 Self-Learning** - Like having a database that goes to the gym. Gets faster and smarter the more you use it
+- **🔮 AI-First Design** - Built for the age of embeddings, RAG, and semantic search. Your LLMs will thank you
+- **🎮 Actually Fun to Use** - Clean API, great DX, and it does the heavy lifting so you can build cool stuff
 
-
+## 🚀 Quick Start (30 seconds!)
 
+### Node.js TLDR
 ```bash
-
-
-
-Command-line interface for data management, bulk operations, and database administration.
-
-#### Web Service Package
+# Install
+npm install brainy
 
-
-npm install @soulcraft/brainy-web-service
+# Use it
 ```
+```javascript
+import { createAutoBrainy, NounType, VerbType } from 'brainy'
 
-REST API web service wrapper that provides HTTP endpoints for search operations and database queries.
-
-## 🚀 Quick Setup - Zero Configuration!
-
-**New in v0.36.0**: Brainy now automatically detects your environment and optimizes itself! Choose your scenario:
-
-### ✨ Instant Setup (Auto-Everything)
-```typescript
-import { createAutoBrainy } from '@soulcraft/brainy'
-
-// That's it! Everything is auto-configured
 const brainy = createAutoBrainy()
 
-// Add data
-await brainy.
-const results = await brainy.search([0.1, 0.2, 0.3], 10)
-```
-
-### 📦 With S3 Storage (Still Auto-Configured)
-```typescript
-import { createAutoBrainy } from '@soulcraft/brainy'
-
-// Auto-detects AWS credentials from environment variables
-const brainy = createAutoBrainy({
-  bucketName: 'my-vector-storage'
-  // region: 'us-east-1' (default)
-  // AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY from env
-})
-```
-
-### 🎯 Scenario-Based Setup
-```typescript
-import { createQuickBrainy } from '@soulcraft/brainy'
-
-// Choose your scale: 'small', 'medium', 'large', 'enterprise'
-const brainy = await createQuickBrainy('large', {
-  bucketName: 'my-big-vector-db'
-})
-```
-
-| Scenario | Dataset Size | Memory Usage | S3 Required | Best For |
-|----------|-------------|--------------|-------------|----------|
-| `small` | ≤10k vectors | ≤1GB | No | Development, testing |
-| `medium` | ≤100k vectors | ≤4GB | Serverless only | Production apps |
-| `large` | ≤1M vectors | ≤8GB | Yes | Large applications |
-| `enterprise` | ≤10M vectors | ≤32GB | Yes | Enterprise systems |
-
-### 🧠 What Auto-Configuration Does
-
-- **🎯 Environment Detection**: Browser, Node.js, or Serverless
-- **💾 Smart Memory Management**: Uses available RAM optimally
-- **🗄️ Storage Selection**: OPFS, FileSystem, S3, or Memory
-- **⚡ Performance Tuning**: Threading, caching, compression
-- **📊 Adaptive Learning**: Improves performance over time
-- **🔍 Semantic Partitioning**: Auto-clusters similar vectors
-
-## 🏁 Traditional Setup (Manual Configuration)
-
-If you prefer manual control:
-
-```typescript
-import { BrainyData, NounType, VerbType } from '@soulcraft/brainy'
-
-// Create and initialize the database
-const db = new BrainyData()
-await db.init()
-
-// Add data (automatically converted to vectors)
-const catId = await db.add("Cats are independent pets", {
+// Add data with Nouns (entities)
+const catId = await brainy.add("Siamese cats are elegant and vocal", {
   noun: NounType.Thing,
-
+  breed: "Siamese",
+  category: "animal"
 })
 
-const
-  noun: NounType.
-
+const ownerId = await brainy.add("John loves his pets", {
+  noun: NounType.Person,
+  name: "John Smith"
 })
 
-//
-
-
-
-// Add a relationship between items
-await db.addVerb(catId, dogId, {
-  verb: VerbType.RelatedTo,
-  description: 'Both are common household pets'
+// Connect with Verbs (relationships)
+await brainy.addVerb(ownerId, catId, {
+  verb: VerbType.Owns,
+  since: "2020-01-01"
 })
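The noun/verb model used in this quick-start is a plain property graph: nouns are nodes, verbs are typed, directed edges with metadata. A minimal, self-contained sketch of that idea - `TinyGraph` and its methods are invented for illustration and are not Brainy's implementation:

```javascript
// Hypothetical sketch of the noun/verb graph model - not Brainy's code.
class TinyGraph {
  constructor() {
    this.nouns = new Map() // id -> { id, text, ...metadata }
    this.verbs = []        // { sourceId, targetId, verb, ...metadata }
    this.nextId = 1
  }
  add(text, metadata) {
    const id = `n${this.nextId++}`
    this.nouns.set(id, { id, text, ...metadata })
    return id
  }
  addVerb(sourceId, targetId, { verb, ...metadata }) {
    this.verbs.push({ sourceId, targetId, verb, ...metadata })
  }
  // Mirrors the shape of a getVerbsBySource(sourceId, verbType) query
  getVerbsBySource(sourceId, verb) {
    return this.verbs.filter(v => v.sourceId === sourceId && (!verb || v.verb === verb))
  }
}

const g = new TinyGraph()
const catId = g.add('Siamese cats are elegant and vocal', { noun: 'Thing' })
const ownerId = g.add('John loves his pets', { noun: 'Person' })
g.addVerb(ownerId, catId, { verb: 'Owns', since: '2020-01-01' })
```

The real database additionally vectorizes each noun on `add`, which is what makes the semantic search below possible.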
-```
-
-### Import Options
-
-```typescript
-// Standard import - automatically adapts to any environment
-import { BrainyData } from '@soulcraft/brainy'
-
-// Minified version for production
-import { BrainyData } from '@soulcraft/brainy/min'
-```
-
-> **Note**: The CLI functionality is available as a separate package `@soulcraft/brainy-cli` to reduce the bundle size
-> of the main package. Install it globally with `npm install -g @soulcraft/brainy-cli` to use the command-line
-> interface.
-
-### Browser Usage
-
-```html
 
-
-
-import { BrainyData } from './dist/unified.js'
+// Search by meaning
+const results = await brainy.searchText("feline companions", 5)
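Underneath "search by meaning", texts become embedding vectors and results are ranked by vector similarity. A self-contained sketch using cosine similarity - the embedding step is assumed, so the vectors here are hand-made toy values, not real embeddings:

```javascript
// Cosine similarity: 1 means identical direction, 0 means orthogonal.
function cosine(a, b) {
  let dot = 0, na = 0, nb = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    na += a[i] * a[i]
    nb += b[i] * b[i]
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb))
}

// Rank stored vectors against a query vector, best match first.
function search(queryVec, entries, k) {
  return entries
    .map(e => ({ ...e, score: cosine(queryVec, e.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
}

const entries = [
  { id: 'cat', vector: [1, 0] },
  { id: 'dog', vector: [0.9, 0.1] },
  { id: 'car', vector: [0, 1] }
]
const top = search([1, 0.05], entries, 2)
```

A production index like HNSW avoids the brute-force scan above, but the ranking criterion is the same.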
 
-
-
+// Search JSON documents by specific fields
+const docs = await brainy.searchDocuments("Siamese", {
+  fields: ['breed', 'category'], // Search these fields
+  weights: { breed: 2.0 }, // Prioritize breed matches
+  limit: 10
+})
 
-
-
-// ...
-</script>
+// Find relationships
+const johnsPets = await brainy.getVerbsBySource(ownerId, VerbType.Owns)
 ```
 
-
-
-## 🧩 How It Works
-
-Brainy combines **six advanced optimization systems** with core vector database technologies to create a production-ready, self-optimizing system:
-
-### 🔧 Core Technologies
-1. **Vector Embeddings** - Converts data (text, images, etc.) into numerical vectors using TensorFlow.js
-2. **Optimized HNSW Algorithm** - Fast similarity search with semantic partitioning and distributed processing
-3. **🧠 Auto-Configuration Engine** - Detects environment, resources, and data patterns to optimize automatically
-4. **🎯 Intelligent Storage System** - Multi-level caching with predictive prefetching and batch operations
-
-### ⚡ Advanced Optimization Layer
-5. **Semantic Partitioning** - Auto-clusters similar vectors for faster search (4-32 clusters based on scale)
-6. **Distributed Search** - Parallel processing across partitions with intelligent load balancing
-7. **Multi-Level Caching** - Hot (RAM) → Warm (Fast Storage) → Cold (S3/Disk) with 70-90% hit rates
-8. **Batch Operations** - Reduces S3 API calls by 50-90% through intelligent batching
-9. **Adaptive Learning** - Continuously learns from usage patterns and optimizes performance
-10. **Advanced Compression** - Vector quantization achieves 75% memory reduction for large datasets
-
-### 🎯 Environment-Specific Optimizations
+That's it! No config, no setup, it just works™
 
-
-
-
-
-
-
-
-```
-Data Input → Auto-Detection → Environment Optimization → Semantic Partitioning →
-Distributed Search → Multi-Level Caching → Performance Learning → Self-Tuning
-```
+### 🌐 Distributed Mode Example (NEW!)
+```javascript
+// Writer Instance - Ingests data from multiple sources
+const writer = createAutoBrainy({
+  storage: { type: 's3', bucket: 'my-bucket' },
+  distributed: { role: 'writer' } // Explicit role for safety
+})
 
-
+// Reader Instance - Optimized for search queries
+const reader = createAutoBrainy({
+  storage: { type: 's3', bucket: 'my-bucket' },
+  distributed: { role: 'reader' } // 80% memory for cache
+})
 
-
+// Data automatically gets domain tags
+await writer.add("Patient shows symptoms of...", {
+  diagnosis: "flu" // Auto-tagged as 'medical' domain
+})
 
-
+// Domain-aware search across all partitions
+const results = await reader.search("medical symptoms", 10, {
+  filter: { domain: 'medical' } // Only search medical data
+})
 
-
-
+// Monitor health across all instances
+const health = reader.getHealthStatus()
+console.log(`Instance ${health.instanceId}: ${health.status}`)
 ```
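The auto-tagging step ("Auto-tagged as 'medical' domain") can be approximated with simple keyword scoring. This is a hypothetical sketch - the keyword lists and the `detectDomain` helper are invented here, and the package's actual `domainDetector` module is presumably more sophisticated:

```javascript
// Hypothetical sketch of keyword-based domain detection - not Brainy's code.
const DOMAIN_KEYWORDS = {
  medical: ['patient', 'diagnosis', 'symptom', 'treatment'],
  legal: ['contract', 'plaintiff', 'statute', 'liability'],
  product: ['sku', 'price', 'inventory', 'warranty']
}

// Pick the domain whose keywords appear most often; fall back to 'general'.
function detectDomain(text) {
  const lower = text.toLowerCase()
  let best = 'general'
  let bestHits = 0
  for (const [domain, keywords] of Object.entries(DOMAIN_KEYWORDS)) {
    const hits = keywords.filter(k => lower.includes(k)).length
    if (hits > bestHits) {
      best = domain
      bestHits = hits
    }
  }
  return best
}
```

Tagging at write time is what makes the `filter: { domain: 'medical' }` query above cheap: readers can skip every partition whose tags don't match.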
 
-
-operations faster and more relevant.
-
-### Pipeline Stages
-
-1. **Data Ingestion**
-   - Raw text or pre-computed vectors enter the pipeline
-   - Data is validated and prepared for processing
-
-2. **Embedding Generation**
-   - Text is transformed into numerical vectors using embedding models
-   - Uses TensorFlow Universal Sentence Encoder for high-quality text embeddings
-   - Custom embedding functions can be plugged in for specialized domains
-
-3. **Vector Indexing**
-   - Vectors are indexed using the HNSW algorithm
-   - Hierarchical structure enables fast similarity search
-   - Configurable parameters for precision vs. performance tradeoffs
-
-4. **Graph Construction**
-   - Nouns (entities) become nodes in the knowledge graph
-   - Verbs (relationships) connect related entities
-   - Typed relationships add semantic meaning to connections
-
-5. **Adaptive Learning**
-   - Analyzes usage patterns to optimize future operations
-   - Tunes performance parameters based on your environment
-   - Adjusts search strategies based on query history
-   - Becomes more efficient and relevant the more you use it
-
-6. **Intelligent Storage**
-   - Data is saved using the optimal storage for your environment
-   - Automatic selection between OPFS, filesystem, S3, or memory
-   - Migrates between storage types as your application's needs evolve
-   - Scales from tiny datasets to massive data collections
-   - Configurable storage adapters for custom persistence needs
-
-### Augmentation Types
-
-Brainy uses a powerful augmentation system to extend functionality. Augmentations are processed in the following order:
-
-1. **SENSE**
-   - Ingests and processes raw, unstructured data into nouns and verbs
-   - Handles text, images, audio streams, and other input formats
-   - Example: Converting raw text into structured entities
-
-2. **MEMORY**
-   - Provides storage capabilities for data in different formats
-   - Manages persistence across sessions
-   - Example: Storing vectors in OPFS or filesystem
-
-3. **COGNITION**
-   - Enables advanced reasoning, inference, and logical operations
-   - Analyzes relationships between entities
-   - Examples:
-     - Inferring new connections between existing data
-     - Deriving insights from graph relationships
-
-4. **CONDUIT**
-   - Establishes channels for structured data exchange
-   - Connects with external systems and syncs between Brainy instances
-   - Two built-in iConduit augmentations for scaling out and syncing:
-     - **WebSocket iConduit** - Syncs data between browsers and servers
-     - **WebRTC iConduit** - Direct peer-to-peer syncing between browsers
-   - Examples:
-     - Integrating with third-party APIs
-     - Syncing Brainy instances between browsers using WebSockets
-     - Peer-to-peer syncing between browsers using WebRTC
-
-5. **ACTIVATION**
-   - Initiates actions, responses, or data manipulations
-   - Triggers events based on data changes
-   - Example: Sending notifications when new data is processed
-
-6. **PERCEPTION**
-   - Interprets, contextualizes, and visualizes identified nouns and verbs
-   - Creates meaningful representations of data
-   - Example: Generating visualizations of graph relationships
-
-7. **DIALOG**
-   - Facilitates natural language understanding and generation
-   - Enables conversational interactions
-   - Example: Processing user queries and generating responses
-
-8. **WEBSOCKET**
-   - Enables real-time communication via WebSockets
-   - Can be combined with other augmentation types
-   - Example: Streaming data processing in real-time
-
-### Streaming Data Support
-
-Brainy's pipeline is designed to handle streaming data efficiently:
-
-1. **WebSocket Integration**
-   - Built-in support for WebSocket connections
-   - Process data as it arrives without blocking
-   - Example: `setupWebSocketPipeline(url, dataType, options)`
-
-2. **Asynchronous Processing**
-   - Non-blocking architecture for real-time data handling
-   - Parallel processing of incoming streams
-   - Example: `createWebSocketHandler(connection, dataType, options)`
-
-3. **Event-Based Architecture**
-   - Augmentations can listen to data feeds and streams
-   - Real-time updates propagate through the pipeline
-   - Example: `listenToFeed(feedUrl, callback)`
-
-4. **Threaded Execution**
-   - Comprehensive multi-threading for high-performance operations
-   - Parallel processing for batch operations, vector calculations, and embedding generation
-   - Configurable execution modes (SEQUENTIAL, PARALLEL, THREADED)
-   - Automatic thread management based on environment capabilities
-   - Example: `executeTypedPipeline(augmentations, method, args, { mode: ExecutionMode.THREADED })`
-
-### Running the Pipeline
-
-The pipeline runs automatically when you:
-
-```typescript
-// Add data (runs embedding → indexing → storage)
-const id = await db.add("Your text data here", { metadata })
+## 🎭 Key Features
 
-
-
-
-
-
-
+### Core Capabilities
+- **Vector Search** - Find semantically similar content using embeddings
+- **Graph Relationships** - Connect data with meaningful relationships
+- **JSON Document Search** - Search within specific fields with prioritization
+- **Distributed Mode** - Scale horizontally with automatic coordination between instances
+- **Real-Time Syncing** - WebSocket and WebRTC for distributed instances
+- **Streaming Pipeline** - Process data in real-time as it flows through
+- **Model Control Protocol** - Let AI models access your data
+
+### Smart Optimizations
+- **Auto-Configuration** - Detects environment and optimizes automatically
+- **Adaptive Learning** - Gets smarter with usage, optimizes itself over time
+- **Intelligent Partitioning** - Hash-based partitioning for perfect load distribution
+- **Role-Based Optimization** - Readers maximize cache, writers optimize throughput
+- **Domain-Aware Indexing** - Automatic categorization improves search relevance
+- **Multi-Level Caching** - Hot/warm/cold caching with predictive prefetching
+- **Memory Optimization** - 75% reduction with compression for large datasets
+
+### Developer Experience
+- **TypeScript Support** - Fully typed API with generics
+- **Extensible Augmentations** - Customize and extend functionality
+- **REST API** - Web service wrapper for HTTP endpoints
+- **Auto-Complete** - IntelliSense for all APIs and types
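The "hot/warm/cold" caching bullet can be illustrated with a tiered structure that demotes the oldest entries on overflow and promotes entries back to hot on access. Purely illustrative - `TieredCache` is invented for this sketch and is not Brainy's cache:

```javascript
// Hypothetical sketch of hot/warm/cold tiering - not Brainy's code.
// Hot is a small in-memory map; overflow demotes to warm, then to cold.
class TieredCache {
  constructor(hotSize, warmSize) {
    this.hot = new Map()
    this.warm = new Map()
    this.cold = new Map() // stands in for slow storage (disk/S3)
    this.hotSize = hotSize
    this.warmSize = warmSize
  }
  set(key, value) {
    this.hot.set(key, value)
    if (this.hot.size > this.hotSize) this.demote(this.hot, this.warm)
    if (this.warm.size > this.warmSize) this.demote(this.warm, this.cold)
  }
  demote(from, to) {
    // Maps iterate in insertion order, so the first key is the oldest.
    const oldestKey = from.keys().next().value
    to.set(oldestKey, from.get(oldestKey))
    from.delete(oldestKey)
  }
  get(key) {
    for (const tier of [this.hot, this.warm, this.cold]) {
      if (tier.has(key)) {
        const value = tier.get(key)
        if (tier !== this.hot) {
          tier.delete(key)
          this.set(key, value) // promote on access
        }
        return value
      }
    }
    return undefined
  }
}

const cache = new TieredCache(2, 2)
let v = 1
for (const k of ['a', 'b', 'c', 'd', 'e']) cache.set(k, v++)
```

A real implementation would add the predictive prefetching the bullet mentions; this sketch only shows the demotion/promotion mechanics.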
434
137
|
|
|
435
|
-
|
|
138
|
+
## 📦 Installation
|
|
436
139
|
|
|
140
|
+
### Main Package
|
|
437
141
|
```bash
|
|
438
|
-
|
|
439
|
-
brainy add "Your text data here" '{"noun":"Thing"}'
|
|
440
|
-
|
|
441
|
-
# Search through the CLI pipeline
|
|
442
|
-
brainy search "Your query here" --limit 5
|
|
443
|
-
|
|
444
|
-
# Connect entities through the CLI
|
|
445
|
-
brainy addVerb <sourceId> <targetId> RelatedTo
|
|
446
|
-
```
|
|
447
|
-
|
|
448
|
-
### Extending the Pipeline
|
|
449
|
-
|
|
450
|
-
Brainy's pipeline is designed for extensibility at every stage:
|
|
451
|
-
|
|
452
|
-
1. **Custom Embedding**
|
|
453
|
-
```typescript
|
|
454
|
-
// Create your own embedding function
|
|
455
|
-
const myEmbedder = async (text) => {
|
|
456
|
-
// Your custom embedding logic here
|
|
457
|
-
return [0.1, 0.2, 0.3, ...] // Return a vector
|
|
458
|
-
}
|
|
459
|
-
|
|
460
|
-
// Use it in Brainy
|
|
461
|
-
const db = new BrainyData({
|
|
462
|
-
embeddingFunction: myEmbedder
|
|
463
|
-
})
|
|
464
|
-
```
|
|
465
|
-
|
|
466
|
-
2. **Custom Distance Functions**
|
|
467
|
-
```typescript
|
|
468
|
-
// Define your own distance function
|
|
469
|
-
const myDistance = (a, b) => {
|
|
470
|
-
// Your custom distance calculation
|
|
471
|
-
return Math.sqrt(a.reduce((sum, val, i) => sum + Math.pow(val - b[i], 2), 0))
|
|
472
|
-
}
|
|
473
|
-
|
|
474
|
-
// Use it in Brainy
|
|
475
|
-
const db = new BrainyData({
|
|
476
|
-
distanceFunction: myDistance
|
|
477
|
-
})
|
|
478
|
-
```
|
|
479
|
-
|
|
480
|
-
3. **Custom Storage Adapters**
|
|
481
|
-
```typescript
|
|
482
|
-
// Implement the StorageAdapter interface
|
|
483
|
-
class MyStorage implements StorageAdapter {
|
|
484
|
-
// Your storage implementation
|
|
485
|
-
}
|
|
486
|
-
|
|
487
|
-
// Use it in Brainy
|
|
488
|
-
const db = new BrainyData({
|
|
489
|
-
storageAdapter: new MyStorage()
|
|
490
|
-
})
|
|
491
|
-
```
|
|
492
|
-
|
|
493
|
-
4. **Augmentations System**
|
|
494
|
-
```typescript
|
|
495
|
-
// Create custom augmentations to extend functionality
|
|
496
|
-
const myAugmentation = {
|
|
497
|
-
type: 'memory',
|
|
498
|
-
name: 'my-custom-storage',
|
|
499
|
-
// Implementation details
|
|
500
|
-
}
|
|
501
|
-
|
|
502
|
-
// Register with Brainy
|
|
503
|
-
db.registerAugmentation(myAugmentation)
|
|
504
|
-
```
|
|
505
|
-
|
|
506
|
-
## Data Model
|
|
507
|
-
|
|
508
|
-
Brainy uses a graph-based data model with two primary concepts:
|
|
509
|
-
|
|
510
|
-
### Nouns (Entities)
|
|
511
|
-
|
|
512
|
-
The main entities in your data (nodes in the graph):
|
|
513
|
-
|
|
514
|
-
- Each noun has a unique ID, vector representation, and metadata
|
|
515
|
-
- Nouns can be categorized by type (Person, Place, Thing, Event, Concept, etc.)
|
|
516
|
-
- - Nouns are automatically vectorized for similarity search
-
- ### Verbs (Relationships)
-
- Connections between nouns (edges in the graph):
-
- - Each verb connects a source noun to a target noun
- - Verbs have types that define the relationship (RelatedTo, Controls, Contains, etc.)
- - Verbs can have their own metadata to describe the relationship
-
- ### Type Utilities
-
- Brainy provides utility functions to access lists of noun and verb types:
-
- ```typescript
- import {
-   NounType,
-   VerbType,
-   getNounTypes,
-   getVerbTypes,
-   getNounTypeMap,
-   getVerbTypeMap
- } from '@soulcraft/brainy'
-
- // At development time:
- // Access specific types directly from the NounType and VerbType objects
- console.log(NounType.Person) // 'person'
- console.log(VerbType.Contains) // 'contains'
-
- // At runtime:
- // Get a list of all noun types
- const nounTypes = getNounTypes() // ['person', 'organization', 'location', ...]
-
- // Get a list of all verb types
- const verbTypes = getVerbTypes() // ['relatedTo', 'contains', 'partOf', ...]
-
- // Get a map of noun type keys to values
- const nounTypeMap = getNounTypeMap() // { Person: 'person', Organization: 'organization', ... }
-
- // Get a map of verb type keys to values
- const verbTypeMap = getVerbTypeMap() // { RelatedTo: 'relatedTo', Contains: 'contains', ... }
+ npm install brainy
  ```

-
-
- - Get a complete list of available noun and verb types
- - Validate user input against valid types
- - Create dynamic UI components that display or select from available types
- - Map between type keys and their string values
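The removed bullets mention validating user input against the runtime type lists. A minimal self-contained sketch of that pattern (the type values are stubbed here with the examples shown above; in an application they would come from `getNounTypes()`):

```javascript
// Hypothetical validation helper; the list is stubbed with values from
// the getNounTypes() example above rather than fetched from the library.
const nounTypes = ['person', 'organization', 'location', 'thing', 'concept', 'event']

function assertValidNounType(input) {
  if (!nounTypes.includes(input)) {
    throw new Error(`Unknown noun type: ${input}`)
  }
  return input
}

console.log(assertValidNounType('person')) // 'person'
```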
-
- ## Command Line Interface
-
- Brainy includes a powerful CLI for managing your data. The CLI is available as a separate package
- `@soulcraft/brainy-cli` to reduce the bundle size of the main package.
-
- ### Installing and Using the CLI
-
+ ### Optional: Offline Models Package
  ```bash
-
- npm install -g @soulcraft/brainy-cli
-
- # Initialize a database
- brainy init
-
- # Add some data
- brainy add "Cats are independent pets" '{"noun":"Thing","category":"animal"}'
- brainy add "Dogs are loyal companions" '{"noun":"Thing","category":"animal"}'
-
- # Search for similar items
- brainy search "feline pets" 5
-
- # Add relationships between items
- brainy addVerb <sourceId> <targetId> RelatedTo '{"description":"Both are pets"}'
-
- # Visualize the graph structure
- brainy visualize
- brainy visualize --root <id> --depth 3
+ npm install @soulcraft/brainy-models
  ```

-
+ The `@soulcraft/brainy-models` package provides **offline access** to the Universal Sentence Encoder model, eliminating network dependencies and ensuring consistent performance. Perfect for:
+ - **Air-gapped environments** - No internet? No problem
+ - **Consistent performance** - No network latency or throttling
+ - **Privacy-focused apps** - Keep everything local
+ - **High-reliability systems** - No external dependencies

-
-
+ ```javascript
+ import { createAutoBrainy } from 'brainy'
+ import { BundledUniversalSentenceEncoder } from '@soulcraft/brainy-models'

-
-
+ // Use the bundled model for offline operation
+ const brainy = createAutoBrainy({
+   embeddingModel: BundledUniversalSentenceEncoder
+ })
  ```

-
-
- ### Available Commands
-
- #### Basic Database Operations:
-
- - `init` - Initialize a new database
- - `add <text> [metadata]` - Add a new noun with text and optional metadata
- - `search <query> [limit]` - Search for nouns similar to the query
- - `get <id>` - Get a noun by ID
- - `delete <id>` - Delete a noun by ID
- - `addVerb <sourceId> <targetId> <verbType> [metadata]` - Add a relationship
- - `getVerbs <id>` - Get all relationships for a noun
- - `status` - Show database status
- - `clear` - Clear all data from the database
- - `generate-random-graph` - Generate test data
- - `visualize` - Visualize the graph structure
- - `completion-setup` - Setup shell autocomplete
-
- #### Pipeline and Augmentation Commands:
-
- - `list-augmentations` - List all available augmentation types and registered augmentations
- - `augmentation-info <type>` - Get detailed information about a specific augmentation type
- - `test-pipeline [text]` - Test the sequential pipeline with sample data
-   - `-t, --data-type <type>` - Type of data to process (default: 'text')
-   - `-m, --mode <mode>` - Execution mode: sequential, parallel, threaded (default: 'sequential')
-   - `-s, --stop-on-error` - Stop execution if an error occurs
-   - `-v, --verbose` - Show detailed output
- - `stream-test` - Test streaming data through the pipeline (simulated)
-   - `-c, --count <number>` - Number of data items to stream (default: 5)
-   - `-i, --interval <ms>` - Interval between data items in milliseconds (default: 1000)
-   - `-t, --data-type <type>` - Type of data to process (default: 'text')
-   - `-v, --verbose` - Show detailed output
-
- ## 📚 Documentation
-
- ### 🚀 [Getting Started](docs/getting-started/)
- Quick setup guides and first steps with Brainy.
-
- - **[Installation](docs/getting-started/installation.md)** - Installation and setup
- - **[Quick Start](docs/getting-started/quick-start.md)** - Get running in 2 minutes
- - **[First Steps](docs/getting-started/first-steps.md)** - Core concepts and features
- - **[Environment Setup](docs/getting-started/environment-setup.md)** - Environment-specific configuration
-
- ### 📖 [User Guides](docs/user-guides/)
- Comprehensive guides for using Brainy effectively.
+ ## 🎨 Build Amazing Things

-
-
-
-
+ **🤖 AI Chat Applications** - Build ChatGPT-like apps with long-term memory and context awareness
+ **🔍 Semantic Search Engines** - Search by meaning, not keywords. Find "that thing that's like a cat but bigger" → returns "tiger"
+ **🎯 Recommendation Engines** - "Users who liked this also liked..." but actually good
+ **🧬 Knowledge Graphs** - Connect everything to everything. Wikipedia meets Neo4j meets magic
+ **👁️ Computer Vision Apps** - Store and search image embeddings. "Find all photos with dogs wearing hats"
+ **🎵 Music Discovery** - Find songs that "feel" similar. Spotify's Discover Weekly in your app
+ **📚 Smart Documentation** - Docs that answer questions. "How do I deploy to production?" → relevant guides
+ **🛡️ Fraud Detection** - Find patterns humans can't see. Anomaly detection on steroids
+ **🌐 Real-Time Collaboration** - Sync vector data across devices. Figma for AI data
+ **🏥 Medical Diagnosis Tools** - Match symptoms to conditions using embedding similarity

-
- Transform Brainy from prototype to production-ready system.
+ ## 🧬 The Power of Nouns & Verbs

- - **
- - **[Auto-Configuration](docs/optimization-guides/auto-configuration.md)** - Intelligent environment detection
- - **[Memory Optimization](docs/optimization-guides/memory-optimization.md)** - Advanced memory management
- - **[Storage Optimization](docs/optimization-guides/storage-optimization.md)** - S3 and storage optimization
+ Brainy uses a **graph-based data model** that mirrors how humans think - with **Nouns** (entities) connected by **Verbs** (relationships). This isn't just vectors in a void; it's structured, meaningful data.

- ###
- Complete API documentation and method references.
+ ### 📝 Nouns (What Things Are)

-
- -
- -
- -
+ Nouns are your entities - the "things" in your data. Each noun has:
+ - A unique ID
+ - A vector representation (for similarity search)
+ - A type (Person, Document, Concept, etc.)
+ - Custom metadata

-
- Practical code examples and real-world applications.
+ **Available Noun Types:**

-
-
-
-
+ | Category | Types | Use For |
+ |----------|-------|---------|
+ | **Core Entities** | `Person`, `Organization`, `Location`, `Thing`, `Concept`, `Event` | People, companies, places, objects, ideas, happenings |
+ | **Digital Content** | `Document`, `Media`, `File`, `Message`, `Content` | PDFs, images, videos, emails, posts, generic content |
+ | **Collections** | `Collection`, `Dataset` | Groups of items, structured data sets |
+ | **Business** | `Product`, `Service`, `User`, `Task`, `Project` | E-commerce, SaaS, project management |
+ | **Descriptive** | `Process`, `State`, `Role` | Workflows, conditions, responsibilities |

- ###
+ ### 🔗 Verbs (How Things Connect)

- -
- - **[Statistics Guide](STATISTICS.md)** - Database statistics and monitoring
- - **[Technical Guides](TECHNICAL_GUIDES.md)** - Advanced technical topics
+ Verbs are your relationships - they give meaning to connections. Not just "these vectors are similar" but "this OWNS that" or "this CAUSES that".

-
+ **Available Verb Types:**

-
-
-
-
-
+ | Category | Types | Examples |
+ |----------|-------|----------|
+ | **Core** | `RelatedTo`, `Contains`, `PartOf`, `LocatedAt`, `References` | Generic relations, containment, location |
+ | **Temporal** | `Precedes`, `Succeeds`, `Causes`, `DependsOn`, `Requires` | Time sequences, causality, dependencies |
+ | **Creation** | `Creates`, `Transforms`, `Becomes`, `Modifies`, `Consumes` | Creation, change, consumption |
+ | **Ownership** | `Owns`, `AttributedTo`, `CreatedBy`, `BelongsTo` | Ownership, authorship, belonging |
+ | **Social** | `MemberOf`, `WorksWith`, `FriendOf`, `Follows`, `Likes`, `ReportsTo` | Social networks, organizations |
+ | **Functional** | `Describes`, `Implements`, `Validates`, `Triggers`, `Serves` | Functions, implementations, services |

-
- await db.clear()
+ ### 💡 Why This Matters

-
-
+ ```javascript
+ // Traditional vector DB: Just similarity
+ const similar = await vectorDB.search(embedding, 10)
+ // Result: [vector1, vector2, ...] - What do these mean? 🤷

- //
- const
+ // Brainy: Similarity + Meaning + Relationships
+ const catId = await brainy.add("Siamese cat", {
+   noun: NounType.Thing,
+   breed: "Siamese"
+ })
+ const ownerId = await brainy.add("John Smith", {
+   noun: NounType.Person
+ })
+ await brainy.addVerb(ownerId, catId, {
+   verb: VerbType.Owns,
+   since: "2020-01-01"
+ })

- //
- const
+ // Now you can search with context!
+ const johnsPets = await brainy.getVerbsBySource(ownerId, VerbType.Owns)
+ const catOwners = await brainy.getVerbsByTarget(catId, VerbType.Owns)
  ```

-
-
- Brainy provides a way to get statistics about the current state of the database. For detailed information about the
- statistics system, including implementation details, scalability improvements, and usage examples, see
- our [Statistics Guide](STATISTICS.md).
-
- ```typescript
- import { BrainyData, getStatistics } from '@soulcraft/brainy'
+ ## 🌍 Distributed Mode (New!)

-
- const db = new BrainyData()
- await db.init()
+ Brainy now supports **distributed deployments** with multiple specialized instances sharing the same data. Perfect for scaling your AI applications across multiple servers.

-
- const stats = await db.getStatistics()
- console.log(stats)
- // Output: { nounCount: 0, verbCount: 0, metadataCount: 0, hnswIndexSize: 0, serviceBreakdown: {...} }
- ```
+ ### Distributed Setup

-
-
-
-
- const id = await db.add(textOrVector, {
-   noun: NounType.Thing,
-   // other metadata...
+ ```javascript
+ // Single instance (no change needed!)
+ const brainy = createAutoBrainy({
+   storage: { type: 's3', bucket: 'my-bucket' }
  })

- //
-
-
-
-
-
- {
-   vectorOrData: "Second item to add",
-   metadata: { noun: NounType.Thing, category: 'example' }
- },
- // More items...
- ], {
-   forceEmbed: false,
-   concurrency: 4, // Control the level of parallelism (default: 4)
-   batchSize: 50 // Control the number of items to process in a single batch (default: 50)
+ // Distributed mode requires explicit role configuration
+ // Option 1: Via environment variable
+ process.env.BRAINY_ROLE = 'writer' // or 'reader' or 'hybrid'
+ const brainy = createAutoBrainy({
+   storage: { type: 's3', bucket: 'my-bucket' },
+   distributed: true
  })

- //
- const
-
- //
- await db.updateMetadata(id, {
-   noun: NounType.Thing,
-   // updated metadata...
+ // Option 2: Via configuration
+ const writer = createAutoBrainy({
+   storage: { type: 's3', bucket: 'my-bucket' },
+   distributed: { role: 'writer' } // Handles data ingestion
  })

-
-
-
-
- const results = await db.search(vectorOrText, numResults)
- const textResults = await db.searchText("query text", numResults)
-
- // Search by noun type
- const thingNouns = await db.searchByNounTypes([NounType.Thing], numResults)
+ const reader = createAutoBrainy({
+   storage: { type: 's3', bucket: 'my-bucket' },
+   distributed: { role: 'reader' } // Optimized for queries
+ })

- //
- const
-
+ // Option 3: Via read/write mode (role auto-inferred)
+ const writer = createAutoBrainy({
+   storage: { type: 's3', bucket: 'my-bucket' },
+   writeOnly: true, // Automatically becomes 'writer' role
+   distributed: true
  })

-
-
-
-
+ const reader = createAutoBrainy({
+   storage: { type: 's3', bucket: 'my-bucket' },
+   readOnly: true, // Automatically becomes 'reader' role
+   distributed: true
  })
  ```

- ###
+ ### Key Distributed Features

-
-
+ **🎯 Explicit Role Configuration**
+ - Roles must be explicitly set (no dangerous auto-assignment)
+ - Can use environment variables, config, or read/write modes
+ - Clear separation between writers and readers

-
-
-
-
+ **#️⃣ Hash-Based Partitioning**
+ - Handles multiple writers with different data types
+ - Even distribution across partitions
+ - No semantic conflicts with mixed data
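As a rough illustration of the idea behind hash-based partitioning (not Brainy's actual partitioner, whose typings ship in `dist/distributed/hashPartitioner.d.ts`), each item id maps deterministically to one of a fixed number of partitions, which spreads writes evenly regardless of what the data means:

```javascript
// Illustrative sketch only: deterministic hash partitioning with a
// simple 32-bit rolling hash; the real implementation may differ.
function partitionFor(id, numPartitions) {
  let hash = 0
  for (const ch of id) {
    hash = (hash * 31 + ch.charCodeAt(0)) >>> 0
  }
  return hash % numPartitions
}

// The same id always lands on the same partition
console.log(partitionFor('user-42', 8) === partitionFor('user-42', 8)) // true
```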

-
-
-
+ **🏷️ Domain Tagging**
+ - Automatic domain detection (medical, legal, product, etc.)
+ - Filter searches by domain
+ - Logical separation without complexity
+
+ ```javascript
+ // Data is automatically tagged with domains
+ await brainy.add({
+   symptoms: "fever",
+   diagnosis: "flu"
+ }, metadata) // Auto-tagged as 'medical'
+
+ // Search within specific domains
+ const medicalResults = await brainy.search(query, 10, {
+   filter: { domain: 'medical' }
+ })
  ```

-
+ **📊 Health Monitoring**
+ - Real-time health metrics
+ - Automatic dead instance cleanup
+ - Performance tracking

- ```
- //
-
+ ```javascript
+ // Get health status
+ const health = brainy.getHealthStatus()
+ // {
+ //   status: 'healthy',
+ //   role: 'reader',
+ //   vectorCount: 1000000,
+ //   cacheHitRate: 0.95,
+ //   requestsPerSecond: 150
+ // }
+ ```
+
+ **⚡ Role-Optimized Performance**
+ - **Readers**: 80% memory for cache, aggressive prefetching
+ - **Writers**: Optimized write batching, minimal cache
+ - **Hybrid**: Adaptive based on workload
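The split is easiest to picture as a function of role. In the hypothetical helper below, only the 80% reader cache share comes from the bullets above; the writer and hybrid numbers are placeholders, and Brainy's real tuning is internal:

```javascript
// Hypothetical helper mirroring the role bullets above. Only the 0.8
// reader share is documented; the other values are illustrative.
function cacheShareFor(role, readHeavy = true) {
  if (role === 'reader') return 0.8 // most memory goes to the read cache
  if (role === 'writer') return 0.1 // minimal cache, budget goes to write batching
  return readHeavy ? 0.6 : 0.3      // 'hybrid' adapts to the workload
}

console.log(cacheShareFor('reader')) // 0.8
```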
+
+ ### Deployment Examples
+
+ **Docker Compose**
+ ```yaml
+ services:
+   writer:
+     image: myapp
+     environment:
+       BRAINY_ROLE: writer # Optional - auto-detects
+
+   reader:
+     image: myapp
+     environment:
+       BRAINY_ROLE: reader # Optional - auto-detects
+     scale: 5
+ ```
+
+ **Kubernetes**
+ ```yaml
+ # Automatically detects role from deployment type
+ apiVersion: apps/v1
+ kind: Deployment
+ metadata:
+   name: brainy-readers
+ spec:
+   replicas: 10 # Multiple readers
+   template:
+     spec:
+       containers:
+         - name: app
+           image: myapp
+           # Role auto-detected as 'reader' (multiple replicas)
+ ```
+
+ **Benefits**
+ - ✅ **50-70% faster searches** with parallel readers
+ - ✅ **No coordination complexity** - Shared JSON config in S3
+ - ✅ **Zero downtime scaling** - Add/remove instances anytime
+ - ✅ **Automatic failover** - Dead instances cleaned up automatically
+
+ ## 🤔 Why Choose Brainy?
+
+ ### vs. Traditional Databases
+ ❌ **PostgreSQL with pgvector** - Requires complex setup, tuning, and DevOps expertise
+ ✅ **Brainy** - Zero config, auto-optimizes, works everywhere from browser to cloud
+
+ ### vs. Vector Databases
+ ❌ **Pinecone/Weaviate/Qdrant** - Cloud-only, expensive, vendor lock-in
+ ✅ **Brainy** - Run locally, in browser, or cloud. Your choice, your data
+
+ ### vs. Graph Databases
+ ❌ **Neo4j** - Great for graphs, no vector support
+ ✅ **Brainy** - Vectors + graphs in one. Best of both worlds
+
+ ### vs. DIY Solutions
+ ❌ **Building your own** - Months of work, optimization nightmares
+ ✅ **Brainy** - Production-ready in 30 seconds
+
+ ## 🚀 Getting Started in 30 Seconds
+
+ ### React
+
+ ```jsx
+ import { createAutoBrainy } from 'brainy'
+ import { useEffect, useState } from 'react'
+
+ function SemanticSearch() {
+   const [brainy] = useState(() => createAutoBrainy())
+   const [results, setResults] = useState([])
+
+   const search = async (query) => {
+     const items = await brainy.searchText(query, 10)
+     setResults(items)
+   }
+
+   return (
+     <input onChange={(e) => search(e.target.value)}
+            placeholder="Search by meaning..." />
+   )
+ }
  ```

- ###
+ ### Angular

  ```typescript
-
-
-
-
+ import { Component, OnInit } from '@angular/core'
+ import { createAutoBrainy } from 'brainy'
+
+ @Component({
+   selector: 'app-search',
+   template: `
+     <input (input)="search($event.target.value)"
+            placeholder="Semantic search...">
+     <div *ngFor="let result of results">
+       {{ result.text }}
+     </div>
+   `
  })
+ export class SearchComponent implements OnInit {
+   brainy = createAutoBrainy()
+   results = []

-
-
- await db.addVerb(sourceId, targetId, {
-   verb: VerbType.RelatedTo,
-   // Enable auto-creation of missing nouns
-   autoCreateMissingNouns: true,
-   // Optional metadata for auto-created nouns
-   missingNounMetadata: {
-     noun: NounType.Concept,
-     description: 'Auto-created noun'
+   async search(query: string) {
+     this.results = await this.brainy.searchText(query, 10)
  }
- }
+ }
+ ```

-
- const verbs = await db.getAllVerbs()
+ ### Vue 3

-
-
+ ```vue
+ <script setup>
+ import { createAutoBrainy } from 'brainy'
+ import { ref } from 'vue'

-
- const
+ const brainy = createAutoBrainy()
+ const results = ref([])

-
-
+ const search = async (query) => {
+   results.value = await brainy.searchText(query, 10)
+ }
+ </script>

-
-
+ <template>
+   <input @input="search($event.target.value)"
+          placeholder="Find similar content...">
+   <div v-for="result in results" :key="result.id">
+     {{ result.text }}
+   </div>
+ </template>
+ ```
+
+ ### Svelte
+
+ ```svelte
+ <script>
+   import { createAutoBrainy } from 'brainy'
+
+   const brainy = createAutoBrainy()
+   let results = []
+
+   async function search(e) {
+     results = await brainy.searchText(e.target.value, 10)
+   }
+ </script>

-
-
+ <input on:input={search} placeholder="AI-powered search...">
+ {#each results as result}
+   <div>{result.text}</div>
+ {/each}
  ```

-
+ ### Next.js (App Router)

-
+ ```jsx
+ // app/search/page.js
+ import { createAutoBrainy } from 'brainy'

-
-
-
-
+ export default function SearchPage() {
+   async function search(formData) {
+     'use server'
+     const brainy = createAutoBrainy({ bucketName: 'vectors' })
+     const query = formData.get('query')
+     return await brainy.searchText(query, 10)
+   }

-
-
-
+   return (
+     <form action={search}>
+       <input name="query" placeholder="Search..." />
+       <button type="submit">Search</button>
+     </form>
+   )
+ }
+ ```

-
- db.setReadOnly(true)
+ ### Node.js / Bun / Deno

-
-
+ ```javascript
+ import { createAutoBrainy } from 'brainy'

-
- db.setWriteOnly(true)
+ const brainy = createAutoBrainy()

- //
-
+ // Add some data
+ await brainy.add("TypeScript is a typed superset of JavaScript", {
+   category: 'programming'
+ })

- //
-
-
+ // Search for similar content
+ const results = await brainy.searchText("JavaScript with types", 5)
+ console.log(results)
|
|
874
528
|
```
|
|
875
529
|
|
|
876
|
-
|
|
877
|
-
where you want to prevent modifications to the database.
|
|
878
|
-
- **Write-Only Mode**: When enabled, prevents all search operations. Useful for initial data loading or when you want to
|
|
879
|
-
optimize for write performance.
|
|
530
|
+
### Vanilla JavaScript
|
|
880
531
|
|
|
881
|
-
|
|
532
|
+
```html
|
|
533
|
+
<!DOCTYPE html>
|
|
534
|
+
<html>
|
|
535
|
+
<head>
|
|
536
|
+
<script type="module">
|
|
537
|
+
import { createAutoBrainy } from 'https://unpkg.com/brainy/dist/unified.min.js'
|
|
538
|
+
|
|
539
|
+
window.brainy = createAutoBrainy()
|
|
540
|
+
|
|
541
|
+
window.search = async function(query) {
|
|
542
|
+
const results = await brainy.searchText(query, 10)
|
|
543
|
+
document.getElementById('results').innerHTML =
|
|
544
|
+
results.map(r => `<div>${r.text}</div>`).join('')
|
|
545
|
+
}
|
|
546
|
+
</script>
|
|
547
|
+
</head>
|
|
548
|
+
<body>
|
|
549
|
+
<input onkeyup="search(this.value)" placeholder="Search...">
|
|
550
|
+
<div id="results"></div>
|
|
551
|
+
</body>
|
|
552
|
+
</html>
|
|
553
|
+
```
|
|
882
554
|
|
|
883
|
-
|
|
884
|
-
import {
|
|
885
|
-
BrainyData,
|
|
886
|
-
createTensorFlowEmbeddingFunction,
|
|
887
|
-
createThreadedEmbeddingFunction
|
|
888
|
-
} from '@soulcraft/brainy'
|
|
889
|
-
|
|
890
|
-
// Use the standard TensorFlow Universal Sentence Encoder embedding function
|
|
891
|
-
const db = new BrainyData({
|
|
892
|
-
embeddingFunction: createTensorFlowEmbeddingFunction()
|
|
893
|
-
})
|
|
894
|
-
await db.init()
|
|
555
|
+
### Cloudflare Workers
|
|
895
556
|
|
|
896
|
-
|
|
897
|
-
|
|
898
|
-
|
|
899
|
-
|
|
900
|
-
|
|
901
|
-
|
|
902
|
-
|
|
903
|
-
|
|
904
|
-
|
|
905
|
-
|
|
906
|
-
const
|
|
907
|
-
|
|
908
|
-
|
|
909
|
-
)
|
|
910
|
-
console.log(`Similarity score: ${similarity}`) // Higher value means more similar
|
|
911
|
-
|
|
912
|
-
// Calculate similarity with custom options
|
|
913
|
-
const vectorA = await db.embed("First text")
|
|
914
|
-
const vectorB = await db.embed("Second text")
|
|
915
|
-
const customSimilarity = await db.calculateSimilarity(
|
|
916
|
-
vectorA, // Can use pre-computed vectors
|
|
917
|
-
vectorB,
|
|
918
|
-
{
|
|
919
|
-
forceEmbed: false, // Skip embedding if inputs are already vectors
|
|
920
|
-
distanceFunction: cosineDistance // Optional custom distance function
|
|
557
|
+
```javascript
|
|
558
|
+
import { createAutoBrainy } from 'brainy'
|
|
559
|
+
|
|
560
|
+
export default {
|
|
561
|
+
async fetch(request, env) {
|
|
562
|
+
const brainy = createAutoBrainy({
|
|
563
|
+
bucketName: env.R2_BUCKET
|
|
564
|
+
})
|
|
565
|
+
|
|
566
|
+
const url = new URL(request.url)
|
|
567
|
+
const query = url.searchParams.get('q')
|
|
568
|
+
|
|
569
|
+
const results = await brainy.searchText(query, 10)
|
|
570
|
+
return Response.json(results)
|
|
921
571
|
}
|
|
922
|
-
|
|
572
|
+
}
|
|
923
573
|
```
|
|
924
574
|
|
|
925
|
-
|
|
926
|
-
performance, especially for embedding operations. It uses GPU acceleration when available (via WebGL in browsers) and
|
|
927
|
-
falls back to CPU processing for compatibility. Universal Sentence Encoder is always used for embeddings. The
|
|
928
|
-
implementation includes worker reuse and model caching for optimal performance.
|
|
929
|
-
|
|
930
|
-
### Performance Tuning
|
|
931
|
-
|
|
932
|
-
Brainy includes comprehensive performance optimizations that work across all environments (browser, CLI, Node.js,
|
|
933
|
-
container, server):
|
|
934
|
-
|
|
935
|
-
#### GPU and CPU Optimization
|
|
936
|
-
|
|
937
|
-
Brainy uses GPU and CPU optimization for compute-intensive operations:
|
|
938
|
-
|
|
939
|
-
1. **GPU-Accelerated Embeddings**: Generate text embeddings using TensorFlow.js with WebGL backend when available
|
|
940
|
-
2. **Automatic Fallback**: Falls back to CPU backend when GPU is not available
|
|
941
|
-
3. **Optimized Distance Calculations**: Perform vector similarity calculations with optimized algorithms
|
|
942
|
-
4. **Cross-Environment Support**: Works consistently across browsers and Node.js environments
|
|
943
|
-
5. **Memory Management**: Properly disposes of tensors to prevent memory leaks
|
|
944
|
-
|
|
945
|
-
#### Multithreading Support
|
|
946
|
-
|
|
947
|
-
Brainy includes comprehensive multithreading support to improve performance across all environments:
|
|
575
|
+
### AWS Lambda
|
|
948
576
|
|
|
949
|
-
|
|
950
|
-
|
|
951
|
-
3. **Threaded Embedding Generation**: Generate embeddings in separate threads to avoid blocking the main thread
|
|
952
|
-
4. **Worker Reuse**: Maintains a pool of workers to avoid the overhead of creating and terminating workers
|
|
953
|
-
5. **Model Caching**: Initializes the embedding model once per worker and reuses it for multiple operations
|
|
954
|
-
6. **Batch Embedding**: Processes multiple items in a single embedding operation for better performance
|
|
955
|
-
7. **Automatic Environment Detection**: Adapts to browser (Web Workers) and Node.js (Worker Threads) environments
|
|
577
|
+
```javascript
|
|
578
|
+
import { createAutoBrainy } from 'brainy'
|
|
956
579
|
|
|
957
|
-
|
|
958
|
-
|
|
959
|
-
|
|
960
|
-
|
|
961
|
-
|
|
962
|
-
|
|
963
|
-
|
|
964
|
-
|
|
965
|
-
|
|
966
|
-
|
|
967
|
-
M: 16, // Max connections per noun
|
|
968
|
-
efConstruction: 200, // Construction candidate list size
|
|
969
|
-
efSearch: 50, // Search candidate list size
|
|
970
|
-
},
|
|
971
|
-
|
|
972
|
-
// Performance optimization options
|
|
973
|
-
performance: {
|
|
974
|
-
useParallelization: true, // Enable multithreaded search operations
|
|
975
|
-
},
|
|
976
|
-
|
|
977
|
-
// Noun and Verb type validation
|
|
978
|
-
typeValidation: {
|
|
979
|
-
enforceNounTypes: true, // Validate noun types against NounType enum
|
|
980
|
-
enforceVerbTypes: true, // Validate verb types against VerbType enum
|
|
981
|
-
},
|
|
982
|
-
|
|
983
|
-
// Storage configuration
|
|
984
|
-
storage: {
|
|
985
|
-
requestPersistentStorage: true,
|
|
986
|
-
// Example configuration for cloud storage (replace with your own values):
|
|
987
|
-
// s3Storage: {
|
|
988
|
-
// bucketName: 'your-s3-bucket-name',
|
|
989
|
-
// region: 'your-aws-region'
|
|
990
|
-
// // Credentials should be provided via environment variables
|
|
991
|
-
// // AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY
|
|
992
|
-
// }
|
|
580
|
+
export const handler = async (event) => {
|
|
581
|
+
const brainy = createAutoBrainy({
|
|
582
|
+
bucketName: process.env.S3_BUCKET
|
|
583
|
+
})
|
|
584
|
+
|
|
585
|
+
const results = await brainy.searchText(event.query, 10)
|
|
586
|
+
|
|
587
|
+
return {
|
|
588
|
+
statusCode: 200,
|
|
589
|
+
body: JSON.stringify(results)
|
|
993
590
|
}
|
|
994
|
-
}
|
|
591
|
+
}
|
|
995
592
|
```
 
-###
-
-Brainy includes an optimized HNSW index implementation for large datasets that may not fit entirely in memory, using a hybrid approach:
+### Azure Functions
 
-
-3. **Memory-Efficient Indexing** - Optimizes memory usage for large-scale vector collections
+```javascript
+import { createAutoBrainy } from '@soulcraft/brainy'
 
-
-efSearch: 50, // Search candidate list size
-
-// Memory threshold in bytes - when exceeded, will use disk-based approach
-memoryThreshold: 1024 * 1024 * 1024, // 1GB default threshold
-
-// Product quantization settings for dimensionality reduction
-productQuantization: {
-  enabled: true, // Enable product quantization
-  numSubvectors: 16, // Number of subvectors to split the vector into
-  numCentroids: 256 // Number of centroids per subvector
-},
-
-// Whether to use disk-based storage for the index
-useDiskBasedIndex: true // Enable disk-based storage
-},
-
-// Storage configuration (required for disk-based index)
-storage: {
-  requestPersistentStorage: true
+module.exports = async function (context, req) {
+  const brainy = createAutoBrainy({
+    bucketName: process.env.AZURE_STORAGE_CONTAINER
+  })
+
+  const results = await brainy.searchText(req.query.q, 10)
+
+  context.res = {
+    body: results
 }
-}
-
-// The optimized index automatically adapts based on dataset size:
-// 1. For small datasets: Uses standard in-memory approach
-// 2. For medium datasets: Applies product quantization to reduce memory usage
-// 3. For large datasets: Combines product quantization with disk-based storage
-
-// Check status to see memory usage and optimization details
-const status = await db.status()
-console.log(status.details.index)
+}
 ```
 
-
-Brainy provides several distance functions for vector similarity calculations:
+### Google Cloud Functions
 
-
-- `manhattanDistance`: Measures the sum of absolute differences between vector components
-- `dotProductDistance`: Measures the negative dot product between vectors
-
-All distance functions are optimized for performance and automatically use the most efficient implementation based on the dataset size and available resources. For large datasets and high-dimensional vectors, Brainy uses batch processing and multithreading when available to improve performance.
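As an illustration of the two distance functions named above, here is a minimal sketch (not Brainy's internal implementation, which adds batching and multithreading):

```javascript
// Manhattan distance: sum of absolute differences between components
function manhattanDistance(a, b) {
  let sum = 0
  for (let i = 0; i < a.length; i++) sum += Math.abs(a[i] - b[i])
  return sum
}

// Negative dot product: a larger dot product means the vectors are more
// similar, so negating it turns similarity into a distance
function dotProductDistance(a, b) {
  let dot = 0
  for (let i = 0; i < a.length; i++) dot += a[i] * b[i]
  return -dot
}
```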
 
-
-## Backup and Restore
-
-Brainy provides backup and restore capabilities that allow you to:
-
-- Back up your data
-- Transfer data between Brainy instances
-- Restore existing data into Brainy for vectorization and indexing
-- Backup data for analysis or visualization in other tools
-
-### Backing Up Data
-
-```typescript
-// Backup all data from the database
-const backupData = await db.backup()
-
-// The backup data includes:
-// - All nouns (entities) with their vectors and metadata
-// - All verbs (relationships) between nouns
-// - Noun types and verb types
-// - HNSW index data for fast similarity search
-// - Version information
-
-// Save the backup data to a file (Node.js environment)
-import fs from 'fs'
+```javascript
+import { createAutoBrainy } from '@soulcraft/brainy'
 
-
+export const searchHandler = async (req, res) => {
+  const brainy = createAutoBrainy({
+    bucketName: process.env.GCS_BUCKET
+  })
+
+  const results = await brainy.searchText(req.query.q, 10)
+  res.json(results)
+}
 ```
 
-###
+### Google Cloud Run
 
-
+```dockerfile
+# Dockerfile
+FROM node:20-alpine
+USER node
+WORKDIR /app
+COPY package*.json ./
+RUN npm install @soulcraft/brainy
+COPY . .
+CMD ["node", "server.js"]
+```
 
-
+```javascript
+// server.js
+import { createAutoBrainy } from '@soulcraft/brainy'
+import express from 'express'
 
-
-clearExisting: true // Whether to clear existing data before restore
+const app = express()
+const brainy = createAutoBrainy({
+  bucketName: process.env.GCS_BUCKET
 })
 
-
-{
-  id: '123',
-  // No vector field - will be created during import
-  metadata: {
-    noun: 'Thing',
-    text: 'This text will be used to generate a vector'
-  }
-}
-],
-verbs: [],
-version: '1.0.0'
-}
+app.get('/search', async (req, res) => {
+  const results = await brainy.searchText(req.query.q, 10)
+  res.json(results)
+})
 
-const
+const port = process.env.PORT || 8080
+app.listen(port, () => console.log(`Brainy on Cloud Run: ${port}`))
 ```
 
-### CLI Backup/Restore
-
 ```bash
-#
-
-# Import sparse data (without vectors)
-brainy import-sparse --input sparse-data.json
+# Deploy to Cloud Run
+gcloud run deploy brainy-api \
+  --source . \
+  --platform managed \
+  --region us-central1 \
+  --allow-unauthenticated
 ```
 
-
-Brainy uses the following embedding approach:
-
-- TensorFlow Universal Sentence Encoder (high-quality text embeddings)
-- GPU acceleration when available (via WebGL in browsers)
-- Batch embedding for processing multiple items efficiently
-- Worker reuse and model caching for optimal performance
-- Custom embedding functions can be plugged in for specialized domains
-
-## Extensions
-
-Brainy includes an augmentation system for extending functionality:
-
-- **Memory Augmentations**: Different storage backends
-- **Sense Augmentations**: Process raw data
-- **Cognition Augmentations**: Reasoning and inference
-- **Dialog Augmentations**: Text processing and interaction
-- **Perception Augmentations**: Data interpretation and visualization
-- **Activation Augmentations**: Trigger actions
-
-### Simplified Augmentation System
+### Vercel Edge Functions
 
-
-```typescript
-import {
-  createMemoryAugmentation,
-  createConduitAugmentation,
-  createSenseAugmentation,
-  addWebSocketSupport,
-  executeStreamlined,
-  processStaticData,
-  processStreamingData,
-  createPipeline
-} from '@soulcraft/brainy'
-
-// Create a memory augmentation with minimal code
-const memoryAug = createMemoryAugmentation({
-  name: 'simple-memory',
-  description: 'A simple in-memory storage augmentation',
-  autoRegister: true,
-  autoInitialize: true,
-
-  // Implement only the methods you need
-  storeData: async (key, data) => {
-    // Your implementation here
-    return {
-      success: true,
-      data: true
-    }
-  },
-
-  retrieveData: async (key) => {
-    // Your implementation here
-    return {
-      success: true,
-      data: { example: 'data', key }
-    }
-  }
-})
+```javascript
+import { createAutoBrainy } from '@soulcraft/brainy'
 
-
-    // Your implementation here
-    return {
-      connectionId: 'ws-1',
-      url,
-      status: 'connected'
-    }
-  }
-})
+export const config = {
+  runtime: 'edge'
+}
 
-
-const
-
-  {
-    augmentation: memoryAug,
-    method: 'storeData',
-    transformArgs: (data) => ['processed-data', data]
-  }
-  ]
-)
-
-// Create a reusable pipeline
-const pipeline = createPipeline([
-  {
-    augmentation: senseAug,
-    method: 'processRawData',
-    transformArgs: (data) => [data, 'text']
-  },
-  {
-    augmentation: memoryAug,
-    method: 'storeData',
-    transformArgs: (data) => ['processed-data', data]
-  }
-])
+export default async function handler(request) {
+  const brainy = createAutoBrainy()
+  const { searchParams } = new URL(request.url)
+  const query = searchParams.get('q')
+
+  const results = await brainy.searchText(query, 10)
+  return Response.json(results)
+}
+```
 
-
-const result = await pipeline('New input data')
+### Netlify Functions
 
-
+```javascript
+import { createAutoBrainy } from '@soulcraft/brainy'
+
+export async function handler(event, context) {
+  const brainy = createAutoBrainy()
+  const query = event.queryStringParameters.q
+
+  const results = await brainy.searchText(query, 10)
+
+  return {
+    statusCode: 200,
+    body: JSON.stringify(results)
 }
-
+}
 ```
 
-
-1. **Factory Functions** - Create augmentations with minimal boilerplate
-2. **WebSocket Support** - Add WebSocket capabilities to any augmentation
-3. **Streamlined Pipeline** - Process data through augmentations more efficiently
-4. **Dynamic Loading** - Load augmentations at runtime when needed
-5. **Static & Streaming Data** - Handle both static and streaming data with the same API
-
-#### WebSocket Augmentation Types
-
-Brainy exports several WebSocket augmentation types that can be used by augmentation creators to add WebSocket capabilities to their augmentations:
+### Supabase Edge Functions
 
 ```typescript
-import {
-
-// Function to add WebSocket support to any augmentation
-addWebSocketSupport
-} from '@soulcraft/brainy'
-
-// Example: Creating a typed WebSocket-enabled sense augmentation
-const mySenseAug = createSenseAugmentation({
-  name: 'my-sense',
-  processRawData: async (data, dataType) => {
-    // Implementation
-    return {
-      success: true,
-      data: { nouns: [], verbs: [] }
-    }
-  }
-}) as IWebSocketSenseAugmentation
-
-// Add WebSocket support
-addWebSocketSupport(mySenseAug, {
-  connectWebSocket: async (url) => {
-    // WebSocket implementation
-    return {
-      connectionId: 'ws-1',
-      url,
-      status: 'connected'
-    }
-  },
-  sendWebSocketMessage: async (connectionId, data) => {
-    // Send message implementation
-  },
-  onWebSocketMessage: async (connectionId, callback) => {
-    // Register callback implementation
-  },
-  offWebSocketMessage: async (connectionId, callback) => {
-    // Remove callback implementation
-  },
-  closeWebSocket: async (connectionId, code, reason) => {
-    // Close connection implementation
-  }
+import { createAutoBrainy } from '@soulcraft/brainy'
+import { serve } from 'https://deno.land/std@0.168.0/http/server.ts'
+
+serve(async (req) => {
+  const brainy = createAutoBrainy()
+  const url = new URL(req.url)
+  const query = url.searchParams.get('q')
+
+  const results = await brainy.searchText(query, 10)
+
+  return new Response(JSON.stringify(results), {
+    headers: { 'Content-Type': 'application/json' }
+  })
 })
-
-// Now mySenseAug has both sense augmentation methods and WebSocket methods
-await mySenseAug.processRawData('data', 'text')
-await mySenseAug.connectWebSocket('wss://example.com')
 ```
 
-
-providing type safety and autocompletion for augmentations with WebSocket capabilities.
-
-### Model Control Protocol (MCP)
+### Docker Container
 
-
+```dockerfile
+FROM node:20-alpine
+USER node
+WORKDIR /app
+COPY package*.json ./
+RUN npm install @soulcraft/brainy
+COPY . .
 
-
-- **MCPAugmentationToolset**: Exposes the augmentation pipeline as tools
-- **BrainyMCPService**: Integrates the adapter and toolset, providing WebSocket and REST server implementations
-
-Environment compatibility:
-
-- **BrainyMCPAdapter** and **MCPAugmentationToolset** can run in any environment (browser, Node.js, server)
-- **BrainyMCPService** core functionality works in any environment
-
-For detailed documentation and usage examples, see the [MCP documentation](src/mcp/README.md).
-
-## Cross-Environment Compatibility
-
-Brainy is designed to run seamlessly in any environment, from browsers to Node.js to serverless functions and containers. All Brainy data, functions, and augmentations are environment-agnostic, allowing you to use the same code everywhere.
-
-### Environment Detection
-
-Brainy automatically detects the environment it's running in:
-
-```typescript
-import { environment } from '@soulcraft/brainy'
-
-// Check which environment we're running in
-console.log(`Running in ${
-  environment.isBrowser ? 'browser' :
-  environment.isNode ? 'Node.js' :
-  'serverless/unknown'
-} environment`)
+CMD ["node", "server.js"]
 ```
 
-
-- **Browser**: Uses Origin Private File System (OPFS) when available, falls back to in-memory storage
-- **Node.js**: Uses file system storage by default, with options for S3-compatible cloud storage
-- **Serverless**: Uses in-memory storage with options for cloud persistence
-- **Container**: Automatically detects and uses the appropriate storage based on available capabilities
-
-### Dynamic Imports
-
-Brainy uses dynamic imports to load environment-specific dependencies only when needed, keeping the bundle size small and ensuring compatibility across environments.
-
-### Browser Support
-
-Works in all modern browsers:
-
-- Chrome 86+
-- Edge 86+
-- Opera 72+
-- Chrome for Android 86+
+```javascript
+// server.js
+import { createAutoBrainy } from '@soulcraft/brainy'
+import express from 'express'
 
-
+const app = express()
+const brainy = createAutoBrainy()
 
-
+app.get('/search', async (req, res) => {
+  const results = await brainy.searchText(req.query.q, 10)
+  res.json(results)
+})
 
-
+app.listen(3000, () => console.log('Brainy running on port 3000'))
+```
 
-
+### Kubernetes
 
-
+```yaml
+apiVersion: apps/v1
+kind: Deployment
+metadata:
+  name: brainy-api
+spec:
+  replicas: 3
+  template:
+    spec:
+      containers:
+      - name: brainy
+        image: your-registry/brainy-api:latest
+        env:
+        - name: S3_BUCKET
+          value: "your-vector-bucket"
+```
 
-
-- **[Try the live demo](https://soulcraft-research.github.io/brainy/demo/index.html)** - Check out the interactive demo on GitHub Pages
-- Or run it locally with `npm run demo` (see [demo instructions](demo.md) for details)
-- To deploy your own version to GitHub Pages, use the GitHub Actions workflow in `.github/workflows/deploy-demo.yml`, which automatically deploys when pushing to the main branch or can be manually triggered
-- To use a custom domain (like www.soulcraft.com):
-  1. A CNAME file is already included in the demo directory
-  2. In your GitHub repository settings, go to Pages > Custom domain and enter your domain
-  3. Configure your domain's DNS settings to point to GitHub Pages:
+### Railway.app
 
-
+```javascript
+// server.js
+import { createAutoBrainy } from '@soulcraft/brainy'
 
-
+const brainy = createAutoBrainy({
+  bucketName: process.env.RAILWAY_VOLUME_NAME
+})
 
-
-- How HNSW search works
+// Railway automatically handles the rest!
+```
 
-
+### Render.com
 
-
+```yaml
+# render.yaml
+services:
+  - type: web
+    name: brainy-api
+    env: node
+    buildCommand: npm install @soulcraft/brainy
+    startCommand: node server.js
+    envVars:
+      - key: BRAINY_STORAGE
+        value: persistent-disk
+```
 
-
-direct browser-to-browser communication without a server in the middle.
-- **WebRTC iConduit**: For direct peer-to-peer syncing between browsers. This is the recommended approach for browser-to-browser communication.
+## 🚀 Quick Examples
 
-
+### Basic Usage
 
-```
-import {
-  BrainyData,
-  pipeline,
-  createConduitAugmentation
-} from '@soulcraft/brainy'
+```javascript
+import { BrainyData, NounType, VerbType } from '@soulcraft/brainy'
 
-//
+// Initialize
 const db = new BrainyData()
 await db.init()
 
-//
-const
-
-// Connect to another Brainy instance (server or browser)
-// Replace the example URL below with your actual WebSocket server URL
-const connectionResult = await pipeline.executeConduitPipeline(
-  'establishConnection',
-  ['wss://example-websocket-server.com/brainy-sync', { protocols: 'brainy-sync' }]
-)
-
-if (connectionResult[0] && (await connectionResult[0]).success) {
-  const connection = (await connectionResult[0]).data
-
-  // Read data from the remote instance
-  const readResult = await pipeline.executeConduitPipeline(
-    'readData',
-    [{ connectionId: connection.connectionId, query: { type: 'getAllNouns' } }]
-  )
+// Add data (automatically vectorized)
+const catId = await db.add("Cats are independent pets", {
+  noun: NounType.Thing,
+  category: 'animal'
+})
 
-
-  const remoteNouns = (await readResult[0]).data
-  for (const noun of remoteNouns) {
-    await db.add(noun.vector, noun.metadata)
-  }
-}
+// Search for similar items
+const results = await db.searchText("feline pets", 5)
 
-
-  } else if (data.type === 'newVerb') {
-    await db.addVerb(data.sourceId, data.targetId, data.vector, data.options)
-  }
-})
-}
+// Add relationships
+await db.addVerb(catId, dogId, {
+  verb: VerbType.RelatedTo,
+  description: 'Both are pets'
+})
 ```
 
-
+### AutoBrainy (Recommended)
 
-```
-import {
-  BrainyData,
-  pipeline,
-  createConduitAugmentation
-} from '@soulcraft/brainy'
-
-// Create and initialize the database
-const db = new BrainyData()
-await db.init()
+```javascript
+import { createAutoBrainy } from '@soulcraft/brainy'
 
-//
-const
-
-// Register the augmentation with the pipeline
-pipeline.register(webrtcConduit)
-
-// Connect to a peer using a signaling server
-// Replace the example values below with your actual configuration
-const connectionResult = await pipeline.executeConduitPipeline(
-  'establishConnection',
-  [
-    'peer-id-to-connect-to', // Replace with actual peer ID
-    {
-      signalServerUrl: 'wss://example-signal-server.com', // Replace with your signal server
-      localPeerId: 'my-local-peer-id', // Replace with your local peer ID
-      iceServers: [{ urls: 'stun:stun.l.google.com:19302' }] // Public STUN server
-    }
-  ]
-)
-
-if (connectionResult[0] && (await connectionResult[0]).success) {
-  const connection = (await connectionResult[0]).data
-
-  // Set up real-time sync by monitoring the stream
-  await webrtcConduit.monitorStream(connection.connectionId, async (data) => {
-    // Handle incoming data (e.g., new nouns, verbs, updates)
-    if (data.type === 'newNoun') {
-      await db.add(data.vector, data.metadata)
-    } else if (data.type === 'newVerb') {
-      await db.addVerb(data.sourceId, data.targetId, data.vector, data.options)
-    }
-  })
+// Everything auto-configured!
+const brainy = createAutoBrainy()
 
-
-  // Send the new noun to the peer
-  await pipeline.executeConduitPipeline(
-    'writeData',
-    [
-      {
-        connectionId: connection.connectionId,
-        data: {
-          type: 'newNoun',
-          id: nounId,
-          vector: (await db.get(nounId)).vector,
-          metadata: (await db.get(nounId)).metadata
-        }
-      }
-    ]
-  )
-}
+// Just start using it
+await brainy.addVector({ id: '1', vector: [0.1, 0.2, 0.3], text: 'Hello' })
+const results = await brainy.search([0.1, 0.2, 0.3], 10)
 ```
 
-
-Brainy supports searching a server-hosted instance from a browser, storing results locally, and performing further searches against the local instance:
-
-```typescript
-import { BrainyData } from '@soulcraft/brainy'
-
-// Create and initialize the database with remote server configuration
-// Replace the example URL below with your actual Brainy server URL
-const db = new BrainyData({
-  remoteServer: {
-    url: 'wss://example-brainy-server.com/ws', // Replace with your server URL
-    protocols: 'brainy-sync',
-    autoConnect: true // Connect automatically during initialization
-  }
-})
-await db.init()
-
-// Or connect manually after initialization
-if (!db.isConnectedToRemoteServer()) {
-  // Replace the example URL below with your actual Brainy server URL
-  await db.connectToRemoteServer('wss://example-brainy-server.com/ws', 'brainy-sync')
-}
-
-// Search the remote server (results are stored locally)
-const remoteResults = await db.searchText('machine learning', 5, { searchMode: 'remote' })
+### Scenario-Based Setup
 
-
-// Perform a combined search (local first, then remote if needed)
-const combinedResults = await db.searchText('neural networks', 5, { searchMode: 'combined' })
+```javascript
+import { createQuickBrainy } from '@soulcraft/brainy'
 
-//
-const
-
-  category: 'AI',
-  tags: ['deep learning', 'neural networks']
+// Choose your scale: 'small', 'medium', 'large', 'enterprise'
+const brainy = await createQuickBrainy('large', {
+  bucketName: 'my-vector-db'
 })
-
-// Clean up when done (this also cleans up worker pools)
-await db.shutDown()
 ```
 
-
-## 📈 Scaling Strategy
-
-Brainy is designed to handle datasets of various sizes, from small collections to large-scale deployments. For terabyte-scale data that can't fit entirely in memory, we provide several approaches:
-
-- **Disk-Based HNSW**: Modified implementations using intelligent caching and partial loading
-- **Distributed HNSW**: Sharding and partitioning across multiple machines
-- **Hybrid Solutions**: Combining quantization techniques with multi-tier architectures
-
-For detailed information on how to scale Brainy for large datasets, vector dimension standardization, threading implementation, storage testing, and other technical topics, see our comprehensive [Technical Guides](TECHNICAL_GUIDES.md).
-
-## Recent Changes and Performance Improvements
-
-### Enhanced Memory Management and Scalability
-
-Brainy has been significantly improved to handle larger datasets more efficiently:
-
-- **Pagination Support**: All data retrieval methods now support pagination to avoid loading entire datasets into memory at once. The deprecated `getAllNouns()` and `getAllVerbs()` methods have been replaced with `getNouns()` and `getVerbs()` methods that support pagination, filtering, and cursor-based navigation.
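As a sketch of what a cursor-based scan looks like from the caller's side (the method name `getNouns` follows the text above, but the `limit`/`cursor` option names and the page shape here are illustrative assumptions, not the confirmed API):

```javascript
// Hypothetical paginated scan: visit every noun one page at a time
// instead of loading the whole dataset into memory at once.
async function forEachNoun(db, handler, pageSize = 100) {
  let cursor = undefined
  do {
    // Option and field names (`limit`, `cursor`, `items`, `nextCursor`) assumed
    const page = await db.getNouns({ limit: pageSize, cursor })
    for (const noun of page.items) await handler(noun)
    cursor = page.nextCursor // falsy once the last page is reached
  } while (cursor)
}
```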
-
-- **Multi-level Caching**: A sophisticated three-level caching strategy has been implemented:
-  - **Level 1**: Hot cache (most accessed nodes) - RAM (automatically detecting and adjusting in each environment)
-  - **Level 2**: Warm cache (recent nodes) - OPFS, Filesystem or S3 depending on environment
-  - **Level 3**: Cold storage (all nodes) - OPFS, Filesystem or S3 depending on environment
-
-- **Adaptive Memory Usage**: The system automatically detects available memory and adjusts cache sizes accordingly:
-  - In Node.js: Uses 10% of free memory (minimum 1000 entries)
-  - In browsers: Scales based on device memory (500 entries per GB, minimum 1000)
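The sizing rules above reduce to a small calculation. This sketch follows the stated heuristic; the per-entry size used to convert bytes to entry counts is an assumption, not a documented constant:

```javascript
// Cache-size heuristic as described above:
//   Node.js:  10% of free memory, at least 1000 entries
//   Browser:  500 entries per GB of device memory, at least 1000
// `entryBytes` (assumed average entry size) is illustrative.
function adaptiveCacheSize({ freeMemoryBytes, deviceMemoryGB }, entryBytes = 1024) {
  if (freeMemoryBytes !== undefined) {
    return Math.max(1000, Math.floor((freeMemoryBytes * 0.1) / entryBytes))
  }
  return Math.max(1000, (deviceMemoryGB || 0) * 500)
}
```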
-
-- **Intelligent Cache Eviction**: Implements a Least Recently Used (LRU) policy that evicts the oldest 20% of items when the cache reaches the configured threshold.
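The eviction policy described above (drop the least recently used 20% once the threshold is hit) can be sketched with a `Map`, which iterates in insertion order (illustrative only, not Brainy's internal cache):

```javascript
class LRUCache {
  constructor(maxEntries) {
    this.maxEntries = maxEntries
    this.map = new Map() // Map iterates oldest-inserted first
  }

  get(key) {
    if (!this.map.has(key)) return undefined
    const value = this.map.get(key)
    // Re-insert to mark the entry as most recently used
    this.map.delete(key)
    this.map.set(key, value)
    return value
  }

  set(key, value) {
    this.map.delete(key)
    this.map.set(key, value)
    if (this.map.size > this.maxEntries) {
      // Evict the oldest 20% of entries in one pass
      const evictCount = Math.ceil(this.map.size * 0.2)
      const keys = this.map.keys()
      for (let i = 0; i < evictCount; i++) {
        this.map.delete(keys.next().value)
      }
    }
  }
}
```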
|
|
1634
|
-
|
|
1635
|
-
- **Prefetching Strategy**: Implements batch prefetching to improve performance while avoiding overwhelming system
|
|
1636
|
-
resources.
|
|
1637
|
-
|
|
1638
|
-
### S3-Compatible Storage Improvements
|
|
1639
|
-
|
|
1640
|
-
- **Enhanced Cloud Storage**: Improved support for S3-compatible storage services including AWS S3, Cloudflare R2, and
|
|
1641
|
-
others.
|
|
1642
|
-
|
|
1643
|
-
- **Optimized Data Access**: Batch operations and error handling for efficient cloud storage access.
|
|
1644
|
-
|
|
1645
|
-
- **Change Log Management**: Efficient synchronization through change logs to track updates.
|
|
1646
|
-
|
|
1647
|
-
### Data Compatibility
|
|
1648
|
-
|
|
1649
|
-
Yes, you can use existing data indexed from an old version. Brainy includes robust data migration capabilities:
|
|
1650
|
-
|
|
1651
|
-
- **Vector Regeneration**: If vectors are missing in imported data, they will be automatically created using the
|
|
1652
|
-
embedding function.
|
|
1653
|
-
|
|
1654
|
-
- **HNSW Index Reconstruction**: The system can reconstruct the HNSW index from backup data, ensuring compatibility with
|
|
1655
|
-
previous versions.
|
|
1656
|
-
|
|
1657
|
-
- **Sparse Data Import**: Support for importing sparse data (without vectors) through the `importSparseData()` method.
|
|
1658
|
-
|
|
1659
|
-
-### System Requirements
-
-#### Default Mode
-
-- **Memory**:
-  - Minimum: 512MB RAM
-  - Recommended: 2GB+ RAM for medium datasets, 8GB+ for large datasets
-
-- **CPU**:
-  - Minimum: 2 cores
-  - Recommended: 4+ cores for better performance with parallel operations
-
-- **Storage**:
-  - Minimum: 1GB available storage
-  - Recommended: Storage space at least 3x the size of your dataset
-
-#### Read-Only Mode
-
-Read-only mode prevents all write operations (add, update, delete) and is optimized for search operations.
-
-- **Memory**:
-  - Minimum: 256MB RAM
-  - Recommended: 1GB+ RAM
-
-- **CPU**:
-  - Minimum: 1 core
-  - Recommended: 2+ cores
-
-- **Storage**:
-  - Minimum: Storage space equal to the size of your dataset
-  - Recommended: 2x the size of your dataset for caching
-
-- **New Feature**: Lazy loading support in read-only mode for improved performance with large datasets.
-
-#### Write-Only Mode
-
-Write-only mode prevents all search operations and is optimized for initial data loading or when you want to optimize
-for write performance.
-
-- **Memory**:
-  - Minimum: 512MB RAM
-  - Recommended: 2GB+ RAM
-
-- **CPU**:
-  - Minimum: 2 cores
-  - Recommended: 4+ cores for faster data ingestion
-
-- **Storage**:
-  - Minimum: Storage space at least 2x the size of your dataset
-  - Recommended: 4x the size of your dataset for optimal performance
-
-### Performance Tuning Parameters
-
-Brainy offers comprehensive configuration options for performance tuning, with enhanced support for large datasets in S3
-or other remote storage. **All configuration is optional** - the system automatically detects the optimal settings based
-on your environment, dataset size, and usage patterns.
-
-#### Intelligent Defaults
-
-Brainy uses intelligent defaults that automatically adapt to your environment:
-
-- **Environment Detection**: Automatically detects whether you're running in Node.js, browser, or worker environment
-- **Memory-Aware Caching**: Adjusts cache sizes based on available system memory
-- **Dataset Size Adaptation**: Tunes parameters based on the size of your dataset
-- **Usage Pattern Optimization**: Adjusts to read-heavy vs. write-heavy workloads
-- **Storage Type Awareness**: Optimizes for local vs. remote storage (S3, R2, etc.)
-- **Operating Mode Specialization**: Special optimizations for read-only and write-only modes
-
-#### Cache Configuration (Optional)
-
-You can override any of these automatically tuned parameters if needed:
-
-- **Hot Cache Size**: Control the maximum number of items to keep in memory.
-  - For large datasets (>100K items), consider values between 5,000-50,000 depending on available memory.
-  - In read-only mode, larger values (10,000-100,000) can be used for better performance.
-
-- **Eviction Threshold**: Set the threshold at which cache eviction begins (default: 0.8 or 80% of max size).
-  - For write-heavy workloads, lower values (0.6-0.7) may improve performance.
-  - For read-heavy workloads, higher values (0.8-0.9) are recommended.
-
-- **Warm Cache TTL**: Set the time-to-live for items in the warm cache (default: 3600000 ms or 1 hour).
-  - For frequently changing data, shorter TTLs are recommended.
-  - For relatively static data, longer TTLs improve performance.
-
-- **Batch Size**: Control the number of items to process in a single batch for operations like prefetching.
-  - For S3 or remote storage with large datasets, larger values (50-200) significantly improve throughput.
-  - In read-only mode with remote storage, even larger values (100-300) can be used.
-
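As a rough illustration, the cache overrides described above could be collected into a single block like the one below. `hotCacheMaxSize`, `hotCacheEvictionThreshold`, and `batchSize` appear in the package's own example configuration; `warmCacheTTL` is an assumed spelling for the warm-cache TTL key:

```javascript
// Illustrative cache overrides for a large, read-heavy dataset on remote storage.
// Values follow the recommended ranges above; all of this is optional, since
// Brainy auto-tunes these parameters by default.
const cacheConfig = {
  hotCacheMaxSize: 20000,          // 5,000-50,000 suggested for >100K items
  hotCacheEvictionThreshold: 0.85, // read-heavy workloads: 0.8-0.9
  warmCacheTTL: 3600000,           // 1 hour default (key name assumed)
  batchSize: 100                   // 50-200 suggested for S3/remote storage
}
```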
-#### Auto-Tuning (Enabled by Default)
-
-- **Auto-Tune**: Enable or disable automatic tuning of cache parameters based on usage patterns (default: true).
-- **Auto-Tune Interval**: Set how frequently the system adjusts cache parameters (default: 60000 ms or 1 minute).
-
-#### Read-Only Mode Optimizations (Automatic)
-
-Read-only mode includes special optimizations for search performance that are automatically applied:
-
-- **Larger Cache Sizes**: Automatically uses more memory for caching (up to 40% of free memory for large datasets).
-- **Aggressive Prefetching**: Loads more data in each batch to reduce the number of storage requests.
-- **Prefetch Strategy**: Defaults to 'aggressive' prefetching strategy in read-only mode.
-
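A sketch combining the two tuning subsections above into one object. The nested `readOnlyMode` keys and their values come from the package's own example configuration; `autoTune` and `autoTuneInterval` are assumed spellings of the auto-tuning options, used here only for illustration:

```javascript
// Illustrative: keep auto-tuning on (its default), but pin the
// read-only-mode overrides explicitly rather than relying on auto-detection.
const tuningConfig = {
  autoTune: true,          // default: true (key name assumed)
  autoTuneInterval: 60000, // re-tune every minute (key name assumed)
  readOnlyMode: {
    hotCacheMaxSize: 50000,
    batchSize: 200,
    prefetchStrategy: 'aggressive'
  }
}
```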
-#### Example Configuration for Large S3 Datasets
+### With Offline Models

 ```javascript
-  lazyLoadInReadOnlyMode: true,
-  storage: {
-    type: 's3',
-    s3Storage: {
-      bucketName: 'your-bucket',
-      accessKeyId: 'your-access-key',
-      secretAccessKey: 'your-secret-key',
-      region: 'your-region'
-    }
-  },
-  cache: {
-    hotCacheMaxSize: 20000,
-    hotCacheEvictionThreshold: 0.85,
-    batchSize: 100,
-    readOnlyMode: {
-      hotCacheMaxSize: 50000,
-      batchSize: 200,
-      prefetchStrategy: 'aggressive'
-    }
-  }
-});
-```
-
-These configuration options make Brainy more efficient, scalable, and adaptable to different environments and usage
-patterns, especially for large datasets in cloud storage.
-
-## Testing
+import { createAutoBrainy } from 'brainy'
+import { BundledUniversalSentenceEncoder } from '@soulcraft/brainy-models'

-```bash
-# Run all tests
-npm test
-
-# Run tests with comprehensive reporting
-npm run test:report
+// Use bundled model for offline operation
+const brainy = createAutoBrainy({
+  embeddingModel: BundledUniversalSentenceEncoder,
+  // Model loads from local files, no network needed!
+})

+// Works exactly the same, but 100% offline
+await brainy.add("This works without internet!", {
+  noun: NounType.Content
+})
 ```

-##
+## 🌐 Live Demo

+**[Try the interactive demo](https://soulcraft-research.github.io/brainy/demo/index.html)** - See Brainy in action with animations and examples.

-see [DEVELOPERS.md](DEVELOPERS.md).
+## 🔧 Environment Support

+| Environment | Storage | Threading | Auto-Configured |
+|-------------|---------|-----------|-----------------|
+| Browser | OPFS | Web Workers | ✅ |
+| Node.js | FileSystem/S3 | Worker Threads | ✅ |
+| Serverless | Memory/S3 | Limited | ✅ |
+| Edge Functions | Memory/KV | Limited | ✅ |

+## 📚 Documentation

+### Getting Started
+- [**Quick Start Guide**](docs/getting-started/) - Get up and running in minutes
+- [**Installation**](docs/getting-started/installation.md) - Detailed setup instructions
+- [**Environment Setup**](docs/getting-started/environment-setup.md) - Platform-specific configuration

+### User Guides
+- [**Search and Metadata**](docs/user-guides/) - Advanced search techniques
+- [**JSON Document Search**](docs/guides/json-document-search.md) - Field-based searching
+- [**Production Migration**](docs/guides/production-migration-guide.md) - Deployment best practices

+### API Reference
+- [**Core API**](docs/api-reference/) - Complete method reference
+- [**Configuration Options**](docs/api-reference/configuration.md) - All configuration parameters
+- [**Auto-Configuration API**](docs/api-reference/auto-configuration-api.md) - Intelligent setup

+### Optimization & Scaling
+- [**Large-Scale Optimizations**](docs/optimization-guides/) - Handle millions of vectors
+- [**Memory Management**](docs/optimization-guides/memory-optimization.md) - Efficient resource usage
+- [**S3 Migration Guide**](docs/optimization-guides/s3-migration-guide.md) - Cloud storage setup

+### Examples & Patterns
+- [**Code Examples**](docs/examples/) - Real-world usage patterns
+- [**Integrations**](docs/examples/integrations.md) - Third-party services
+- [**Performance Patterns**](docs/examples/performance.md) - Optimization techniques

+### Technical Documentation
+- [**Architecture Overview**](docs/technical/) - System design and internals
+- [**Testing Guide**](docs/technical/TESTING.md) - Testing strategies
+- [**Statistics & Monitoring**](docs/technical/STATISTICS.md) - Performance tracking

-- `fix`: A bug fix (maps to **Fixed** section)
-- `chore`: Regular maintenance tasks (maps to **Changed** section)
-- `docs`: Documentation changes (maps to **Documentation** section)
-- `refactor`: Code changes that neither fix bugs nor add features (maps to **Changed** section)
-- `perf`: Performance improvements (maps to **Changed** section)
+## 🤝 Contributing

+We welcome contributions! Please see:
+- [Contributing Guidelines](CONTRIBUTING.md)
+- [Developer Documentation](docs/development/DEVELOPERS.md)
+- [Code of Conduct](CODE_OF_CONDUCT.md)

+## 📄 License

-# Update version and generate changelog
-npm run _release:patch # or _release:minor, _release:major
+[MIT](LICENSE)

-npm run _github-release
+## 🔗 Related Projects

-npm publish
-```
+- [**Cartographer**](https://github.com/sodal-project/cartographer) - Standardized interfaces for Brainy

+---

+<div align="center">
+<strong>Ready to build something amazing? Get started with Brainy today!</strong>
+</div>