@soulcraft/brainy 0.37.0 → 0.38.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -7,1859 +7,927 @@
7
7
  [![TypeScript](https://img.shields.io/badge/TypeScript-5.4.5-blue.svg)](https://www.typescriptlang.org/)
8
8
  [![PRs Welcome](https://img.shields.io/badge/PRs-welcome-brightgreen.svg)](CONTRIBUTING.md)
9
9
 
10
- [//]: # ([![Cartographer](https://img.shields.io/badge/Cartographer-Official%20Standard-brightgreen)](https://github.com/sodal-project/cartographer))
11
-
12
10
  **A powerful graph & vector data platform for AI applications across any environment**
13
11
 
14
12
  </div>
15
13
 
16
- ## ✨ Overview
17
-
18
- Brainy combines the power of vector search with graph relationships in a lightweight, cross-platform database. Whether
19
- you're building AI applications, recommendation systems, or knowledge graphs, Brainy provides the tools you need to
20
- store, connect, and retrieve your data intelligently.
21
-
22
- What makes Brainy special? It intelligently adapts to your environment! Brainy automatically detects your platform,
23
- adjusts its storage strategy, and optimizes performance based on your usage patterns. The more you use it, the smarter
24
- it gets - learning from your data to provide increasingly relevant results and connections.
25
-
26
- ### 🚀 Key Features
27
-
28
- - **🧠 Zero Configuration** - Auto-detects environment and optimizes automatically
29
- - **⚡ Production-Scale Performance** - Handles millions of vectors with sub-second search
30
- - **🎯 Intelligent Partitioning** - Semantic clustering with auto-tuning
31
- - **📊 Adaptive Learning** - Gets smarter with usage, optimizes itself over time
32
- - **🗄️ Smart Storage** - OPFS, FileSystem, S3 auto-selection based on environment
33
- - **💾 Massive Memory Optimization** - 75% reduction with compression, intelligent caching
34
- - **🚀 Distributed Search** - Parallel processing with load balancing
35
- - **🔄 Real-Time Adaptation** - Automatically adjusts to your data patterns
36
- - **Run Everywhere** - Works in browsers, Node.js, serverless functions, and containers
37
- - **Vector Search** - Find semantically similar content using embeddings
38
- - **Advanced JSON Document Search** - Search within specific fields of JSON documents with field prioritization and
39
- service-based field standardization
40
- - **Graph Relationships** - Connect data with meaningful relationships
41
- - **Streaming Pipeline** - Process data in real-time as it flows through the system
42
- - **Extensible Augmentations** - Customize and extend functionality with pluggable components
43
- - **Built-in Conduits** - Sync and scale across instances with WebSocket and WebRTC
44
- - **TensorFlow Integration** - Use TensorFlow.js for high-quality embeddings
45
- - **Persistent Storage** - Data persists across sessions and scales to any size
46
- - **TypeScript Support** - Fully typed API with generics
47
- - **CLI Tools & Web Service** - Command-line interface and REST API web service for data management
48
- - **Model Control Protocol (MCP)** - Allow external AI models to access Brainy data and use augmentation pipeline as
49
- tools
50
-
51
- ## ⚡ Large-Scale Performance Optimizations
52
-
53
- **New in v0.36.0**: Brainy now includes 6 core optimizations that transform it from a prototype into a production-ready system capable of handling millions of vectors:
54
-
55
- ### 🎯 Performance Benchmarks
56
-
57
- | Dataset Size | Search Time | Memory Usage | API Calls Reduction |
58
- |-------------|-------------|--------------|-------------------|
59
- | **10k vectors** | ~50ms | Standard | N/A |
60
- | **100k vectors** | ~200ms | 30% reduction | 50-70% fewer |
61
- | **1M+ vectors** | ~500ms | 75% reduction | 50-90% fewer |
62
-
63
- ### 🧠 6 Core Optimization Systems
64
-
65
- 1. **🎛️ Auto-Configuration System** - Detects environment, resources, and data patterns
66
- 2. **🔀 Semantic Partitioning** - Intelligent clustering with auto-tuning (4-32 clusters)
67
- 3. **🚀 Distributed Search** - Parallel processing across partitions with load balancing
68
- 4. **🧠 Multi-Level Caching** - Hot/Warm/Cold caching with predictive prefetching
69
- 5. **📦 Batch S3 Operations** - Reduces cloud storage API calls by 50-90%
70
- 6. **💾 Advanced Compression** - Vector quantization and memory-mapping for large datasets
14
+ ## ✨ What is Brainy?
71
15
 
72
- ### 🎯 Automatic Environment Detection
73
-
74
- | Environment | Auto-Configured | Performance Focus |
75
- |-------------|-----------------|-------------------|
76
- | **Browser** | OPFS + Web Workers | Memory efficiency, 512MB-1GB limits |
77
- | **Node.js** | FileSystem + Worker Threads | High performance, 4GB-8GB+ usage |
78
- | **Serverless** | S3 + Memory cache | Cold start optimization, latency focus |
79
-
80
- ### 📊 Intelligent Scaling Strategy
81
-
82
- The system automatically adapts based on your dataset size:
83
-
84
- - **< 25k vectors**: Single optimized index, no partitioning needed
85
- - **25k - 100k**: Semantic clustering (4-8 clusters), balanced performance
86
- - **100k - 1M**: Advanced partitioning (8-16 clusters), scale-optimized
87
- - **1M+ vectors**: Maximum optimization (16-32 clusters), enterprise-grade
88
-
89
- ### 🧠 Adaptive Learning Features
90
-
91
- - **Performance Monitoring**: Tracks latency, cache hits, memory usage
92
- - **Dynamic Tuning**: Adjusts parameters every 50 searches based on performance
93
- - **Pattern Recognition**: Learns from access patterns to improve predictions
94
- - **Self-Optimization**: Automatically enables/disables features based on workload
95
-
96
- > **📖 Full Documentation**: See the complete [Large-Scale Optimizations Guide](docs/optimization-guides/large-scale-optimizations.md) for detailed configuration options and advanced usage.
97
-
98
- ## 🚀 Live Demo
99
-
100
- **[Try the live demo](https://soulcraft-research.github.io/brainy/demo/index.html)** - Check out the interactive demo on
101
- GitHub Pages that showcases Brainy's main features.
102
-
103
- ## 📊 What Can You Build?
104
-
105
- - **Semantic Search Engines** - Find content based on meaning, not just keywords
106
- - **Recommendation Systems** - Suggest similar items based on vector similarity
107
- - **Knowledge Graphs** - Build connected data structures with relationships
108
- - **AI Applications** - Store and retrieve embeddings for machine learning models
109
- - **AI-Enhanced Applications** - Build applications that leverage vector embeddings for intelligent data processing
110
- - **Data Organization Tools** - Automatically categorize and connect related information
111
- - **Adaptive Experiences** - Create applications that learn and evolve with your users
112
- - **Model-Integrated Systems** - Connect external AI models to Brainy data and tools using MCP
113
-
114
- ## 🔧 Installation
115
-
116
- ```bash
117
- npm install @soulcraft/brainy
118
- ```
16
+ Imagine a database that thinks like you do - connecting ideas, finding patterns, and getting smarter over time. Brainy is the **AI-native database** that brings vector search and knowledge graphs together in one powerful, ridiculously easy-to-use package.
119
17
 
120
- TensorFlow.js packages are included as bundled dependencies and will be automatically installed without any additional
121
- configuration.
18
+ ### 🆕 NEW: Distributed Mode (v0.38+)
19
+ **Scale horizontally with a single line of configuration!** Brainy now supports distributed deployments with automatic coordination:
20
+ - **🌐 Multi-Instance Coordination** - Multiple readers and writers working in harmony
21
+ - **🏷️ Smart Domain Detection** - Automatically categorizes data (medical, legal, product, etc.)
22
+ - **📊 Real-Time Health Monitoring** - Track performance across all instances
23
+ - **🔄 Automatic Role Optimization** - Readers optimize for cache, writers for throughput
24
+ - **🗂️ Intelligent Partitioning** - Hash-based partitioning for even load distribution
122
25
 
123
- ### Additional Packages
26
+ ### 🚀 Why Developers Love Brainy
124
27
 
125
- Brainy offers specialized packages for different use cases:
28
+ - **🧠 It Just Works™** - No config files, no tuning parameters, no DevOps headaches. Brainy auto-detects your environment and optimizes itself
29
+ - **🌍 True Write-Once, Run-Anywhere** - Same code runs in React, Angular, Vue, Node.js, Deno, Bun, serverless, edge workers, and even vanilla HTML
30
+ - **⚡ Scary Fast** - Handles millions of vectors with sub-second search. Built-in GPU acceleration when available
31
+ - **🎯 Self-Learning** - Like having a database that goes to the gym. Gets faster and smarter the more you use it
32
+ - **🔮 AI-First Design** - Built for the age of embeddings, RAG, and semantic search. Your LLMs will thank you
33
+ - **🎮 Actually Fun to Use** - Clean API, great DX, and it does the heavy lifting so you can build cool stuff
126
34
 
127
- #### CLI Package
35
+ ## 🚀 Quick Start (30 seconds!)
128
36
 
37
+ ### Node.js TLDR
129
38
  ```bash
130
- npm install -g @soulcraft/brainy-cli
131
- ```
132
-
133
- Command-line interface for data management, bulk operations, and database administration.
134
-
135
- #### Web Service Package
39
+ # Install
40
+ npm install @soulcraft/brainy
136
41
 
137
- ```bash
138
- npm install @soulcraft/brainy-web-service
42
+ # Use it
139
43
  ```
44
+ ```javascript
45
+ import { createAutoBrainy, NounType, VerbType } from '@soulcraft/brainy'
140
46
 
141
- REST API web service wrapper that provides HTTP endpoints for search operations and database queries.
142
-
143
- ## 🚀 Quick Setup - Zero Configuration!
144
-
145
- **New in v0.36.0**: Brainy now automatically detects your environment and optimizes itself! Choose your scenario:
146
-
147
- ### ✨ Instant Setup (Auto-Everything)
148
- ```typescript
149
- import { createAutoBrainy } from '@soulcraft/brainy'
150
-
151
- // That's it! Everything is auto-configured
152
47
  const brainy = createAutoBrainy()
153
48
 
154
- // Add data and search - all optimizations enabled automatically
155
- await brainy.addVector({ id: '1', vector: [0.1, 0.2, 0.3], text: 'Hello world' })
156
- const results = await brainy.search([0.1, 0.2, 0.3], 10)
157
- ```
158
-
159
- ### 📦 With S3 Storage (Still Auto-Configured)
160
- ```typescript
161
- import { createAutoBrainy } from '@soulcraft/brainy'
162
-
163
- // Auto-detects AWS credentials from environment variables
164
- const brainy = createAutoBrainy({
165
- bucketName: 'my-vector-storage'
166
- // region: 'us-east-1' (default)
167
- // AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY from env
168
- })
169
- ```
170
-
171
- ### 🎯 Scenario-Based Setup
172
- ```typescript
173
- import { createQuickBrainy } from '@soulcraft/brainy'
174
-
175
- // Choose your scale: 'small', 'medium', 'large', 'enterprise'
176
- const brainy = await createQuickBrainy('large', {
177
- bucketName: 'my-big-vector-db'
178
- })
179
- ```
180
-
181
- | Scenario | Dataset Size | Memory Usage | S3 Required | Best For |
182
- |----------|-------------|--------------|-------------|----------|
183
- | `small` | ≤10k vectors | ≤1GB | No | Development, testing |
184
- | `medium` | ≤100k vectors | ≤4GB | Serverless only | Production apps |
185
- | `large` | ≤1M vectors | ≤8GB | Yes | Large applications |
186
- | `enterprise` | ≤10M vectors | ≤32GB | Yes | Enterprise systems |
187
-
188
- ### 🧠 What Auto-Configuration Does
189
-
190
- - **🎯 Environment Detection**: Browser, Node.js, or Serverless
191
- - **💾 Smart Memory Management**: Uses available RAM optimally
192
- - **🗄️ Storage Selection**: OPFS, FileSystem, S3, or Memory
193
- - **⚡ Performance Tuning**: Threading, caching, compression
194
- - **📊 Adaptive Learning**: Improves performance over time
195
- - **🔍 Semantic Partitioning**: Auto-clusters similar vectors
196
-
197
- ## 🏁 Traditional Setup (Manual Configuration)
198
-
199
- If you prefer manual control:
200
-
201
- ```typescript
202
- import { BrainyData, NounType, VerbType } from '@soulcraft/brainy'
203
-
204
- // Create and initialize the database
205
- const db = new BrainyData()
206
- await db.init()
207
-
208
- // Add data (automatically converted to vectors)
209
- const catId = await db.add("Cats are independent pets", {
49
+ // Add data with Nouns (entities)
50
+ const catId = await brainy.add("Siamese cats are elegant and vocal", {
210
51
  noun: NounType.Thing,
211
- category: 'animal'
52
+ breed: "Siamese",
53
+ category: "animal"
212
54
  })
213
55
 
214
- const dogId = await db.add("Dogs are loyal companions", {
215
- noun: NounType.Thing,
216
- category: 'animal'
56
+ const ownerId = await brainy.add("John loves his pets", {
57
+ noun: NounType.Person,
58
+ name: "John Smith"
217
59
  })
218
60
 
219
- // Search for similar items
220
- const results = await db.searchText("feline pets", 2)
221
- console.log(results)
222
-
223
- // Add a relationship between items
224
- await db.addVerb(catId, dogId, {
225
- verb: VerbType.RelatedTo,
226
- description: 'Both are common household pets'
61
+ // Connect with Verbs (relationships)
62
+ await brainy.addVerb(ownerId, catId, {
63
+ verb: VerbType.Owns,
64
+ since: "2020-01-01"
227
65
  })
228
- ```
229
-
230
- ### Import Options
231
-
232
- ```typescript
233
- // Standard import - automatically adapts to any environment
234
- import { BrainyData } from '@soulcraft/brainy'
235
-
236
- // Minified version for production
237
- import { BrainyData } from '@soulcraft/brainy/min'
238
- ```
239
-
240
- > **Note**: The CLI functionality is available as a separate package `@soulcraft/brainy-cli` to reduce the bundle size
241
- > of the main package. Install it globally with `npm install -g @soulcraft/brainy-cli` to use the command-line
242
- > interface.
243
-
244
- ### Browser Usage
245
-
246
- ```html
247
66
 
248
- <script type="module">
249
- // Use local files instead of CDN
250
- import { BrainyData } from './dist/unified.js'
67
+ // Search by meaning
68
+ const results = await brainy.searchText("feline companions", 5)
251
69
 
252
- // Or minified version
253
- // import { BrainyData } from './dist/unified.min.js'
70
+ // Search JSON documents by specific fields
71
+ const docs = await brainy.searchDocuments("Siamese", {
72
+ fields: ['breed', 'category'], // Search these fields
73
+ weights: { breed: 2.0 }, // Prioritize breed matches
74
+ limit: 10
75
+ })
254
76
 
255
- const db = new BrainyData()
256
- await db.init()
257
- // ...
258
- </script>
77
+ // Find relationships
78
+ const johnsPets = await brainy.getVerbsBySource(ownerId, VerbType.Owns)
259
79
  ```
260
80
 
261
- Modern bundlers like Webpack, Rollup, and Vite will automatically use the unified build which adapts to any environment.
262
-
263
- ## 🧩 How It Works
264
-
265
- Brainy combines **six advanced optimization systems** with core vector database technologies to create a production-ready, self-optimizing system:
266
-
267
- ### 🔧 Core Technologies
268
- 1. **Vector Embeddings** - Converts data (text, images, etc.) into numerical vectors using TensorFlow.js
269
- 2. **Optimized HNSW Algorithm** - Fast similarity search with semantic partitioning and distributed processing
270
- 3. **🧠 Auto-Configuration Engine** - Detects environment, resources, and data patterns to optimize automatically
271
- 4. **🎯 Intelligent Storage System** - Multi-level caching with predictive prefetching and batch operations
272
-
273
- ### ⚡ Advanced Optimization Layer
274
- 5. **Semantic Partitioning** - Auto-clusters similar vectors for faster search (4-32 clusters based on scale)
275
- 6. **Distributed Search** - Parallel processing across partitions with intelligent load balancing
276
- 7. **Multi-Level Caching** - Hot (RAM) → Warm (Fast Storage) → Cold (S3/Disk) with 70-90% hit rates
277
- 8. **Batch Operations** - Reduces S3 API calls by 50-90% through intelligent batching
278
- 9. **Adaptive Learning** - Continuously learns from usage patterns and optimizes performance
279
- 10. **Advanced Compression** - Vector quantization achieves 75% memory reduction for large datasets
280
-
281
- ### 🎯 Environment-Specific Optimizations
81
+ That's it! No config, no setup, it just works™
282
82
 
283
- | Environment | Storage | Threading | Memory | Focus |
284
- |-------------|---------|-----------|---------|-------|
285
- | **Browser** | OPFS + Cache | Web Workers | 512MB-1GB | Responsiveness |
286
- | **Node.js** | FileSystem + S3 | Worker Threads | 4GB-8GB+ | Throughput |
287
- | **Serverless** | S3 + Memory | Limited | 1GB-2GB | Cold Start Speed |
288
-
289
- ### 🔄 Adaptive Intelligence Flow
290
- ```
291
- Data Input → Auto-Detection → Environment Optimization → Semantic Partitioning →
292
- Distributed Search → Multi-Level Caching → Performance Learning → Self-Tuning
293
- ```
83
+ ### 🌐 Distributed Mode Example (NEW!)
84
+ ```javascript
85
+ // Writer Instance - Ingests data from multiple sources
86
+ const writer = createAutoBrainy({
87
+ storage: { type: 's3', bucket: 'my-bucket' },
88
+ distributed: { role: 'writer' } // Explicit role for safety
89
+ })
294
90
 
295
- The system **automatically adapts** to your environment, learns from your usage patterns, and **continuously optimizes itself** for better performance over time.
91
+ // Reader Instance - Optimized for search queries
92
+ const reader = createAutoBrainy({
93
+ storage: { type: 's3', bucket: 'my-bucket' },
94
+ distributed: { role: 'reader' } // 80% memory for cache
95
+ })
296
96
 
297
- ## 🚀 The Brainy Pipeline
97
+ // Data automatically gets domain tags
98
+ await writer.add("Patient shows symptoms of...", {
99
+ diagnosis: "flu" // Auto-tagged as 'medical' domain
100
+ })
298
101
 
299
- Brainy's data processing pipeline transforms raw data into searchable, connected knowledge that gets smarter over time:
102
+ // Domain-aware search across all partitions
103
+ const results = await reader.search("medical symptoms", 10, {
104
+ filter: { domain: 'medical' } // Only search medical data
105
+ })
300
106
 
301
- ```
302
- Raw Data → Embedding → Vector Storage → Graph Connections → Adaptive Learning → Query & Retrieval
107
+ // Monitor health across all instances
108
+ const health = reader.getHealthStatus()
109
+ console.log(`Instance ${health.instanceId}: ${health.status}`)
303
110
  ```
304
111
 
305
- Each time data flows through this pipeline, Brainy learns more about your usage patterns and environment, making future
306
- operations faster and more relevant.
307
-
308
- ### Pipeline Stages
309
-
310
- 1. **Data Ingestion**
311
- - Raw text or pre-computed vectors enter the pipeline
312
- - Data is validated and prepared for processing
313
-
314
- 2. **Embedding Generation**
315
- - Text is transformed into numerical vectors using embedding models
316
- - Uses TensorFlow Universal Sentence Encoder for high-quality text embeddings
317
- - Custom embedding functions can be plugged in for specialized domains
318
-
319
- 3. **Vector Indexing**
320
- - Vectors are indexed using the HNSW algorithm
321
- - Hierarchical structure enables fast similarity search
322
- - Configurable parameters for precision vs. performance tradeoffs
323
-
324
- 4. **Graph Construction**
325
- - Nouns (entities) become nodes in the knowledge graph
326
- - Verbs (relationships) connect related entities
327
- - Typed relationships add semantic meaning to connections
328
-
329
- 5. **Adaptive Learning**
330
- - Analyzes usage patterns to optimize future operations
331
- - Tunes performance parameters based on your environment
332
- - Adjusts search strategies based on query history
333
- - Becomes more efficient and relevant the more you use it
334
-
335
- 6. **Intelligent Storage**
336
- - Data is saved using the optimal storage for your environment
337
- - Automatic selection between OPFS, filesystem, S3, or memory
338
- - Migrates between storage types as your application's needs evolve
339
- - Scales from tiny datasets to massive data collections
340
- - Configurable storage adapters for custom persistence needs
341
-
342
- ### Augmentation Types
343
-
344
- Brainy uses a powerful augmentation system to extend functionality. Augmentations are processed in the following order:
345
-
346
- 1. **SENSE**
347
- - Ingests and processes raw, unstructured data into nouns and verbs
348
- - Handles text, images, audio streams, and other input formats
349
- - Example: Converting raw text into structured entities
350
-
351
- 2. **MEMORY**
352
- - Provides storage capabilities for data in different formats
353
- - Manages persistence across sessions
354
- - Example: Storing vectors in OPFS or filesystem
355
-
356
- 3. **COGNITION**
357
- - Enables advanced reasoning, inference, and logical operations
358
- - Analyzes relationships between entities
359
- - Examples:
360
- - Inferring new connections between existing data
361
- - Deriving insights from graph relationships
362
-
363
- 4. **CONDUIT**
364
- - Establishes channels for structured data exchange
365
- - Connects with external systems and syncs between Brainy instances
366
- - Two built-in iConduit augmentations for scaling out and syncing:
367
- - **WebSocket iConduit** - Syncs data between browsers and servers
368
- - **WebRTC iConduit** - Direct peer-to-peer syncing between browsers
369
- - Examples:
370
- - Integrating with third-party APIs
371
- - Syncing Brainy instances between browsers using WebSockets
372
- - Peer-to-peer syncing between browsers using WebRTC
373
-
374
- 5. **ACTIVATION**
375
- - Initiates actions, responses, or data manipulations
376
- - Triggers events based on data changes
377
- - Example: Sending notifications when new data is processed
378
-
379
- 6. **PERCEPTION**
380
- - Interprets, contextualizes, and visualizes identified nouns and verbs
381
- - Creates meaningful representations of data
382
- - Example: Generating visualizations of graph relationships
383
-
384
- 7. **DIALOG**
385
- - Facilitates natural language understanding and generation
386
- - Enables conversational interactions
387
- - Example: Processing user queries and generating responses
388
-
389
- 8. **WEBSOCKET**
390
- - Enables real-time communication via WebSockets
391
- - Can be combined with other augmentation types
392
- - Example: Streaming data processing in real-time
393
-
394
- ### Streaming Data Support
395
-
396
- Brainy's pipeline is designed to handle streaming data efficiently:
397
-
398
- 1. **WebSocket Integration**
399
- - Built-in support for WebSocket connections
400
- - Process data as it arrives without blocking
401
- - Example: `setupWebSocketPipeline(url, dataType, options)`
402
-
403
- 2. **Asynchronous Processing**
404
- - Non-blocking architecture for real-time data handling
405
- - Parallel processing of incoming streams
406
- - Example: `createWebSocketHandler(connection, dataType, options)`
407
-
408
- 3. **Event-Based Architecture**
409
- - Augmentations can listen to data feeds and streams
410
- - Real-time updates propagate through the pipeline
411
- - Example: `listenToFeed(feedUrl, callback)`
412
-
413
- 4. **Threaded Execution**
414
- - Comprehensive multi-threading for high-performance operations
415
- - Parallel processing for batch operations, vector calculations, and embedding generation
416
- - Configurable execution modes (SEQUENTIAL, PARALLEL, THREADED)
417
- - Automatic thread management based on environment capabilities
418
- - Example: `executeTypedPipeline(augmentations, method, args, { mode: ExecutionMode.THREADED })`
419
-
420
- ### Running the Pipeline
421
-
422
- The pipeline runs automatically when you:
423
-
424
- ```typescript
425
- // Add data (runs embedding → indexing → storage)
426
- const id = await db.add("Your text data here", { metadata })
112
+ ## 🎭 Key Features
427
113
 
428
- // Search (runs embedding → similarity search)
429
- const results = await db.searchText("Your query here", 5)
430
-
431
- // Connect entities (runs graph construction → storage)
432
- await db.addVerb(sourceId, targetId, { verb: VerbType.RelatedTo })
433
- ```
114
+ ### Core Capabilities
115
+ - **Vector Search** - Find semantically similar content using embeddings
116
+ - **Graph Relationships** - Connect data with meaningful relationships
117
+ - **JSON Document Search** - Search within specific fields with prioritization
118
+ - **Distributed Mode** - Scale horizontally with automatic coordination between instances
119
+ - **Real-Time Syncing** - WebSocket and WebRTC for distributed instances
120
+ - **Streaming Pipeline** - Process data in real-time as it flows through (see the sketch after this list)
121
+ - **Model Control Protocol** - Let AI models access your data
122
+
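+ For the syncing and streaming items above, the 0.37 README documented a `setupWebSocketPipeline(url, dataType, options)` helper. A minimal sketch, assuming that helper is still exported - the URL and empty options object are placeholders, not a tested configuration:
+
+ ```javascript
+ import { setupWebSocketPipeline } from '@soulcraft/brainy'
+
+ // Feed a live WebSocket stream straight into the pipeline so items are
+ // embedded and indexed as they arrive, without blocking other work
+ setupWebSocketPipeline('wss://example.com/feed', 'text', {})
+ ```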
123
+ ### Smart Optimizations
124
+ - **Auto-Configuration** - Detects environment and optimizes automatically
125
+ - **Adaptive Learning** - Gets smarter with usage, optimizes itself over time
126
+ - **Intelligent Partitioning** - Hash-based partitioning for perfect load distribution
127
+ - **Role-Based Optimization** - Readers maximize cache, writers optimize throughput
128
+ - **Domain-Aware Indexing** - Automatic categorization improves search relevance
129
+ - **Multi-Level Caching** - Hot/warm/cold caching with predictive prefetching
130
+ - **Memory Optimization** - 75% reduction with compression for large datasets
131
+
132
+ ### Developer Experience
133
+ - **TypeScript Support** - Fully typed API with generics
134
+ - **Extensible Augmentations** - Customize and extend functionality (see the sketch after this list)
135
+ - **REST API** - Web service wrapper for HTTP endpoints
136
+ - **Auto-Complete** - IntelliSense for all APIs and types
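+ Augmentations are plain objects registered on the instance. A minimal sketch following the shape and `registerAugmentation` call shown in the 0.37 README - the implementation fields are placeholders, so check the augmentation docs for the full interface:
+
+ ```javascript
+ // A custom MEMORY augmentation - one of the pluggable augmentation types
+ const myAugmentation = {
+   type: 'memory',
+   name: 'my-custom-storage'
+   // ...implementation details
+ }
+
+ brainy.registerAugmentation(myAugmentation)
+ ```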
434
137
 
435
- Using the CLI:
138
+ ## 📦 Installation
436
139
 
140
+ ### Main Package
437
141
  ```bash
438
- # Add data through the CLI pipeline
439
- brainy add "Your text data here" '{"noun":"Thing"}'
440
-
441
- # Search through the CLI pipeline
442
- brainy search "Your query here" --limit 5
443
-
444
- # Connect entities through the CLI
445
- brainy addVerb <sourceId> <targetId> RelatedTo
446
- ```
447
-
448
- ### Extending the Pipeline
449
-
450
- Brainy's pipeline is designed for extensibility at every stage:
451
-
452
- 1. **Custom Embedding**
453
- ```typescript
454
- // Create your own embedding function
455
- const myEmbedder = async (text) => {
456
- // Your custom embedding logic here
457
- return [0.1, 0.2, 0.3, ...] // Return a vector
458
- }
459
-
460
- // Use it in Brainy
461
- const db = new BrainyData({
462
- embeddingFunction: myEmbedder
463
- })
464
- ```
465
-
466
- 2. **Custom Distance Functions**
467
- ```typescript
468
- // Define your own distance function
469
- const myDistance = (a, b) => {
470
- // Your custom distance calculation
471
- return Math.sqrt(a.reduce((sum, val, i) => sum + Math.pow(val - b[i], 2), 0))
472
- }
473
-
474
- // Use it in Brainy
475
- const db = new BrainyData({
476
- distanceFunction: myDistance
477
- })
478
- ```
479
-
480
- 3. **Custom Storage Adapters**
481
- ```typescript
482
- // Implement the StorageAdapter interface
483
- class MyStorage implements StorageAdapter {
484
- // Your storage implementation
485
- }
486
-
487
- // Use it in Brainy
488
- const db = new BrainyData({
489
- storageAdapter: new MyStorage()
490
- })
491
- ```
492
-
493
- 4. **Augmentations System**
494
- ```typescript
495
- // Create custom augmentations to extend functionality
496
- const myAugmentation = {
497
- type: 'memory',
498
- name: 'my-custom-storage',
499
- // Implementation details
500
- }
501
-
502
- // Register with Brainy
503
- db.registerAugmentation(myAugmentation)
504
- ```
505
-
506
- ## Data Model
507
-
508
- Brainy uses a graph-based data model with two primary concepts:
509
-
510
- ### Nouns (Entities)
511
-
512
- The main entities in your data (nodes in the graph):
513
-
514
- - Each noun has a unique ID, vector representation, and metadata
515
- - Nouns can be categorized by type (Person, Place, Thing, Event, Concept, etc.)
516
- - Nouns are automatically vectorized for similarity search
517
-
518
- ### Verbs (Relationships)
519
-
520
- Connections between nouns (edges in the graph):
521
-
522
- - Each verb connects a source noun to a target noun
523
- - Verbs have types that define the relationship (RelatedTo, Controls, Contains, etc.)
524
- - Verbs can have their own metadata to describe the relationship
525
-
526
- ### Type Utilities
527
-
528
- Brainy provides utility functions to access lists of noun and verb types:
529
-
530
- ```typescript
531
- import {
532
- NounType,
533
- VerbType,
534
- getNounTypes,
535
- getVerbTypes,
536
- getNounTypeMap,
537
- getVerbTypeMap
538
- } from '@soulcraft/brainy'
539
-
540
- // At development time:
541
- // Access specific types directly from the NounType and VerbType objects
542
- console.log(NounType.Person) // 'person'
543
- console.log(VerbType.Contains) // 'contains'
544
-
545
- // At runtime:
546
- // Get a list of all noun types
547
- const nounTypes = getNounTypes() // ['person', 'organization', 'location', ...]
548
-
549
- // Get a list of all verb types
550
- const verbTypes = getVerbTypes() // ['relatedTo', 'contains', 'partOf', ...]
551
-
552
- // Get a map of noun type keys to values
553
- const nounTypeMap = getNounTypeMap() // { Person: 'person', Organization: 'organization', ... }
554
-
555
- // Get a map of verb type keys to values
556
- const verbTypeMap = getVerbTypeMap() // { RelatedTo: 'relatedTo', Contains: 'contains', ... }
142
+ npm install @soulcraft/brainy
557
143
  ```
558
144
 
559
- These utility functions make it easy to:
560
-
561
- - Get a complete list of available noun and verb types
562
- - Validate user input against valid types
563
- - Create dynamic UI components that display or select from available types
564
- - Map between type keys and their string values
565
-
566
- ## Command Line Interface
567
-
568
- Brainy includes a powerful CLI for managing your data. The CLI is available as a separate package
569
- `@soulcraft/brainy-cli` to reduce the bundle size of the main package.
570
-
571
- ### Installing and Using the CLI
572
-
145
+ ### Optional: Offline Models Package
573
146
  ```bash
574
- # Install the CLI globally
575
- npm install -g @soulcraft/brainy-cli
576
-
577
- # Initialize a database
578
- brainy init
579
-
580
- # Add some data
581
- brainy add "Cats are independent pets" '{"noun":"Thing","category":"animal"}'
582
- brainy add "Dogs are loyal companions" '{"noun":"Thing","category":"animal"}'
583
-
584
- # Search for similar items
585
- brainy search "feline pets" 5
586
-
587
- # Add relationships between items
588
- brainy addVerb <sourceId> <targetId> RelatedTo '{"description":"Both are pets"}'
589
-
590
- # Visualize the graph structure
591
- brainy visualize
592
- brainy visualize --root <id> --depth 3
147
+ npm install @soulcraft/brainy-models
593
148
  ```
594
149
 
595
- ### Using the CLI in Your Code
150
+ The `@soulcraft/brainy-models` package provides **offline access** to the Universal Sentence Encoder model, eliminating network dependencies and ensuring consistent performance. Perfect for:
151
+ - **Air-gapped environments** - No internet? No problem
152
+ - **Consistent performance** - No network latency or throttling
153
+ - **Privacy-focused apps** - Keep everything local
154
+ - **High-reliability systems** - No external dependencies
596
155
 
597
- The CLI functionality is available as a separate package `@soulcraft/brainy-cli`. If you need CLI functionality in your
598
- application, install the CLI package:
156
+ ```javascript
157
+ import { createAutoBrainy } from '@soulcraft/brainy'
158
+ import { BundledUniversalSentenceEncoder } from '@soulcraft/brainy-models'
599
159
 
600
- ```bash
601
- npm install @soulcraft/brainy-cli
160
+ // Use the bundled model for offline operation
161
+ const brainy = createAutoBrainy({
162
+ embeddingModel: BundledUniversalSentenceEncoder
163
+ })
602
164
  ```
603
165
 
604
- Then you can use the CLI commands programmatically or through the command line interface.
605
-
606
- ### Available Commands
607
-
608
- #### Basic Database Operations:
609
-
610
- - `init` - Initialize a new database
611
- - `add <text> [metadata]` - Add a new noun with text and optional metadata
612
- - `search <query> [limit]` - Search for nouns similar to the query
613
- - `get <id>` - Get a noun by ID
614
- - `delete <id>` - Delete a noun by ID
615
- - `addVerb <sourceId> <targetId> <verbType> [metadata]` - Add a relationship
616
- - `getVerbs <id>` - Get all relationships for a noun
617
- - `status` - Show database status
618
- - `clear` - Clear all data from the database
619
- - `generate-random-graph` - Generate test data
620
- - `visualize` - Visualize the graph structure
621
- - `completion-setup` - Setup shell autocomplete
622
-
623
- #### Pipeline and Augmentation Commands:
624
-
625
- - `list-augmentations` - List all available augmentation types and registered augmentations
626
- - `augmentation-info <type>` - Get detailed information about a specific augmentation type
627
- - `test-pipeline [text]` - Test the sequential pipeline with sample data
628
- - `-t, --data-type <type>` - Type of data to process (default: 'text')
629
- - `-m, --mode <mode>` - Execution mode: sequential, parallel, threaded (default: 'sequential')
630
- - `-s, --stop-on-error` - Stop execution if an error occurs
631
- - `-v, --verbose` - Show detailed output
632
- - `stream-test` - Test streaming data through the pipeline (simulated)
633
- - `-c, --count <number>` - Number of data items to stream (default: 5)
634
- - `-i, --interval <ms>` - Interval between data items in milliseconds (default: 1000)
635
- - `-t, --data-type <type>` - Type of data to process (default: 'text')
636
- - `-v, --verbose` - Show detailed output
637
-
638
- ## 📚 Documentation
639
-
640
- ### 🚀 [Getting Started](docs/getting-started/)
641
- Quick setup guides and first steps with Brainy.
642
-
643
- - **[Installation](docs/getting-started/installation.md)** - Installation and setup
644
- - **[Quick Start](docs/getting-started/quick-start.md)** - Get running in 2 minutes
645
- - **[First Steps](docs/getting-started/first-steps.md)** - Core concepts and features
646
- - **[Environment Setup](docs/getting-started/environment-setup.md)** - Environment-specific configuration
647
-
648
- ### 📖 [User Guides](docs/user-guides/)
649
- Comprehensive guides for using Brainy effectively.
166
+ ## 🎨 Build Amazing Things
650
167
 
651
- - **[Search and Metadata](docs/user-guides/SEARCH_AND_METADATA_GUIDE.md)** - Advanced search techniques
652
- - **[Write-Only Mode](docs/user-guides/WRITEONLY_MODE_IMPLEMENTATION.md)** - High-throughput data loading
653
- - **[JSON Document Search](docs/guides/json-document-search.md)** - Search within JSON fields
654
- - **[Production Migration](docs/guides/production-migration-guide.md)** - Deployment best practices
168
+ - **🤖 AI Chat Applications** - Build ChatGPT-like apps with long-term memory and context awareness
+ - **🔍 Semantic Search Engines** - Search by meaning, not keywords. Find "that thing that's like a cat but bigger" → returns "tiger"
+ - **🎯 Recommendation Engines** - "Users who liked this also liked..." but actually good
+ - **🧬 Knowledge Graphs** - Connect everything to everything. Wikipedia meets Neo4j meets magic
+ - **👁️ Computer Vision Apps** - Store and search image embeddings. "Find all photos with dogs wearing hats"
+ - **🎵 Music Discovery** - Find songs that "feel" similar. Spotify's Discover Weekly in your app
+ - **📚 Smart Documentation** - Docs that answer questions. "How do I deploy to production?" → relevant guides
+ - **🛡️ Fraud Detection** - Find patterns humans can't see. Anomaly detection on steroids
+ - **🌐 Real-Time Collaboration** - Sync vector data across devices. Figma for AI data
+ - **🏥 Medical Diagnosis Tools** - Match symptoms to conditions using embedding similarity
655
178
 
656
- ### [Optimization Guides](docs/optimization-guides/)
657
- Transform Brainy from prototype to production-ready system.
179
+ ## 🧬 The Power of Nouns & Verbs
658
180
 
659
- - **[Large-Scale Optimizations](docs/optimization-guides/large-scale-optimizations.md)** - Complete v0.36.0 optimization system
660
- - **[Auto-Configuration](docs/optimization-guides/auto-configuration.md)** - Intelligent environment detection
661
- - **[Memory Optimization](docs/optimization-guides/memory-optimization.md)** - Advanced memory management
662
- - **[Storage Optimization](docs/optimization-guides/storage-optimization.md)** - S3 and storage optimization
181
+ Brainy uses a **graph-based data model** that mirrors how humans think - with **Nouns** (entities) connected by **Verbs** (relationships). This isn't just vectors in a void; it's structured, meaningful data.
663
182
 
664
- ### 🔧 [API Reference](docs/api-reference/)
665
- Complete API documentation and method references.
183
+ ### 📝 Nouns (What Things Are)
666
184
 
667
- - **[Core API](docs/api-reference/core-api.md)** - Main BrainyData class methods
668
- - **[Vector Operations](docs/api-reference/vector-operations.md)** - Vector storage and search
669
- - **[Configuration](docs/api-reference/configuration.md)** - System configuration
670
- - **[Auto-Configuration API](docs/api-reference/auto-configuration-api.md)** - Intelligent configuration
185
+ Nouns are your entities - the "things" in your data. Each noun has:
186
+ - A unique ID
187
+ - A vector representation (for similarity search)
188
+ - A type (Person, Document, Concept, etc.)
189
+ - Custom metadata
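+ Fetching a noun back shows all four parts. A sketch reusing `catId` from the Quick Start and assuming the `get` method documented in the 0.37 README is still exposed; the returned shape is illustrative:
+
+ ```javascript
+ const noun = await brainy.get(catId)
+ // Illustrative shape:
+ // {
+ //   id: '...',                                     // unique ID
+ //   vector: [0.12, -0.03, ...],                    // similarity-search representation
+ //   metadata: { noun: 'thing', breed: 'Siamese' }  // type plus custom metadata
+ // }
+ ```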
671
190
 
672
- ### 💡 [Examples](docs/examples/)
673
- Practical code examples and real-world applications.
191
+ **Available Noun Types:**
674
192
 
675
- - **[Basic Usage](docs/examples/basic-usage.md)** - Simple examples to get started
676
- - **[Advanced Patterns](docs/examples/advanced-patterns.md)** - Complex use cases
677
- - **[Integrations](docs/examples/integrations.md)** - Third-party service integrations
678
- - **[Performance Examples](docs/examples/performance.md)** - Optimization and scaling
193
+ | Category | Types | Use For |
194
+ |----------|-------|---------|
195
+ | **Core Entities** | `Person`, `Organization`, `Location`, `Thing`, `Concept`, `Event` | People, companies, places, objects, ideas, happenings |
196
+ | **Digital Content** | `Document`, `Media`, `File`, `Message`, `Content` | PDFs, images, videos, emails, posts, generic content |
197
+ | **Collections** | `Collection`, `Dataset` | Groups of items, structured data sets |
198
+ | **Business** | `Product`, `Service`, `User`, `Task`, `Project` | E-commerce, SaaS, project management |
199
+ | **Descriptive** | `Process`, `State`, `Role` | Workflows, conditions, responsibilities |
679
200
 
680
- ### 🔬 Technical Documentation
201
+ ### 🔗 Verbs (How Things Connect)
681
202
 
682
- - **[Testing Guide](docs/technical/TESTING.md)** - Testing strategies and best practices
683
- - **[Statistics Guide](STATISTICS.md)** - Database statistics and monitoring
684
- - **[Technical Guides](TECHNICAL_GUIDES.md)** - Advanced technical topics
203
+ Verbs are your relationships - they give meaning to connections. Not just "these vectors are similar" but "this OWNS that" or "this CAUSES that".
685
204
 
686
- ## API Reference
205
+ **Available Verb Types:**
687
206
 
688
- ### Database Management
689
-
690
- ```typescript
691
- // Initialize the database
692
- await db.init()
207
+ | Category | Types | Examples |
208
+ |----------|-------|----------|
209
+ | **Core** | `RelatedTo`, `Contains`, `PartOf`, `LocatedAt`, `References` | Generic relations, containment, location |
210
+ | **Temporal** | `Precedes`, `Succeeds`, `Causes`, `DependsOn`, `Requires` | Time sequences, causality, dependencies |
211
+ | **Creation** | `Creates`, `Transforms`, `Becomes`, `Modifies`, `Consumes` | Creation, change, consumption |
212
+ | **Ownership** | `Owns`, `AttributedTo`, `CreatedBy`, `BelongsTo` | Ownership, authorship, belonging |
213
+ | **Social** | `MemberOf`, `WorksWith`, `FriendOf`, `Follows`, `Likes`, `ReportsTo` | Social networks, organizations |
214
+ | **Functional** | `Describes`, `Implements`, `Validates`, `Triggers`, `Serves` | Functions, implementations, services |
693
215
 
694
- // Clear all data
695
- await db.clear()
216
+ ### 💡 Why This Matters
696
217
 
697
- // Get database status
698
- const status = await db.status()
218
+ ```javascript
219
+ // Traditional vector DB: Just similarity
220
+ const similar = await vectorDB.search(embedding, 10)
221
+ // Result: [vector1, vector2, ...] - What do these mean? 🤷
699
222
 
700
- // Backup all data from the database
701
- const backupData = await db.backup()
223
+ // Brainy: Similarity + Meaning + Relationships
224
+ const catId = await brainy.add("Siamese cat", {
225
+ noun: NounType.Thing,
226
+ breed: "Siamese"
227
+ })
228
+ const ownerId = await brainy.add("John Smith", {
229
+ noun: NounType.Person
230
+ })
231
+ await brainy.addVerb(ownerId, catId, {
232
+ verb: VerbType.Owns,
233
+ since: "2020-01-01"
234
+ })
702
235
 
703
- // Restore data into the database
704
- const restoreResult = await db.restore(backupData, { clearExisting: true })
236
+ // Now you can search with context!
237
+ const johnsPets = await brainy.getVerbsBySource(ownerId, VerbType.Owns)
238
+ const catOwners = await brainy.getVerbsByTarget(catId, VerbType.Owns)
705
239
  ```
706
240
 
707
- ### Database Statistics
708
-
709
- Brainy provides a way to get statistics about the current state of the database. For detailed information about the
710
- statistics system, including implementation details, scalability improvements, and usage examples, see
711
- our [Statistics Guide](STATISTICS.md).
712
-
713
- ```typescript
714
- import { BrainyData, getStatistics } from '@soulcraft/brainy'
241
+ ## 🌍 Distributed Mode (New!)
715
242
 
716
- // Create and initialize the database
717
- const db = new BrainyData()
718
- await db.init()
243
+ Brainy now supports **distributed deployments** with multiple specialized instances sharing the same data. Perfect for scaling your AI applications across multiple servers.
719
244
 
720
- // Get statistics using the instance method
721
- const stats = await db.getStatistics()
722
- console.log(stats)
723
- // Output: { nounCount: 0, verbCount: 0, metadataCount: 0, hnswIndexSize: 0, serviceBreakdown: {...} }
724
- ```
245
+ ### Distributed Setup
725
246
 
726
- ### Working with Nouns (Entities)
727
-
728
- ```typescript
729
- // Add a noun (automatically vectorized)
730
- const id = await db.add(textOrVector, {
731
- noun: NounType.Thing,
732
- // other metadata...
247
+ ```javascript
248
+ // Single instance (no change needed!)
249
+ const brainy = createAutoBrainy({
250
+ storage: { type: 's3', bucket: 'my-bucket' }
733
251
  })
734
252
 
735
- // Add multiple nouns in parallel (with multithreading and batch embedding)
736
- const ids = await db.addBatch([
737
- {
738
- vectorOrData: "First item to add",
739
- metadata: { noun: NounType.Thing, category: 'example' }
740
- },
741
- {
742
- vectorOrData: "Second item to add",
743
- metadata: { noun: NounType.Thing, category: 'example' }
744
- },
745
- // More items...
746
- ], {
747
- forceEmbed: false,
748
- concurrency: 4, // Control the level of parallelism (default: 4)
749
- batchSize: 50 // Control the number of items to process in a single batch (default: 50)
253
+ // Distributed mode requires explicit role configuration
254
+ // Option 1: Via environment variable
255
+ process.env.BRAINY_ROLE = 'writer' // or 'reader' or 'hybrid'
256
+ const envBrainy = createAutoBrainy({
257
+ storage: { type: 's3', bucket: 'my-bucket' },
258
+ distributed: true
750
259
  })
751
260
 
752
- // Retrieve a noun
753
- const noun = await db.get(id)
754
-
755
- // Update noun metadata
756
- await db.updateMetadata(id, {
757
- noun: NounType.Thing,
758
- // updated metadata...
261
+ // Option 2: Via configuration
262
+ const writer = createAutoBrainy({
263
+ storage: { type: 's3', bucket: 'my-bucket' },
264
+ distributed: { role: 'writer' } // Handles data ingestion
759
265
  })
760
266
 
761
- // Delete a noun
762
- await db.delete(id)
763
-
764
- // Search for similar nouns
765
- const results = await db.search(vectorOrText, numResults)
766
- const textResults = await db.searchText("query text", numResults)
767
-
768
- // Search by noun type
769
- const thingNouns = await db.searchByNounTypes([NounType.Thing], numResults)
267
+ const reader = createAutoBrainy({
268
+ storage: { type: 's3', bucket: 'my-bucket' },
269
+ distributed: { role: 'reader' } // Optimized for queries
270
+ })
770
271
 
771
- // Search within specific fields of JSON documents
772
- const fieldResults = await db.search("Acme Corporation", 10, {
773
- searchField: "company"
272
+ // Option 3: Via read/write mode (role auto-inferred)
273
+ const ingestOnly = createAutoBrainy({
274
+ storage: { type: 's3', bucket: 'my-bucket' },
275
+ writeOnly: true, // Automatically becomes 'writer' role
276
+ distributed: true
774
277
  })
775
278
 
776
- // Search using standard field names across different services
777
- const titleResults = await db.searchByStandardField("title", "climate change", 10)
778
- const authorResults = await db.searchByStandardField("author", "johndoe", 10, {
779
- services: ["github", "reddit"]
279
+ const queryOnly = createAutoBrainy({
280
+ storage: { type: 's3', bucket: 'my-bucket' },
281
+ readOnly: true, // Automatically becomes 'reader' role
282
+ distributed: true
780
283
  })
781
284
  ```
782
285
 
783
- ### Field Standardization and Service Tracking
286
+ ### Key Distributed Features
784
287
 
785
- Brainy automatically tracks field names from JSON documents and associates them with the service that inserted the data.
786
- This enables powerful cross-service search capabilities:
288
+ **🎯 Explicit Role Configuration**
289
+ - Roles must be explicitly set (no dangerous auto-assignment)
290
+ - Can use environment variables, config, or read/write modes
291
+ - Clear separation between writers and readers
787
292
 
788
- ```typescript
789
- // Get all available field names organized by service
790
- const fieldNames = await db.getAvailableFieldNames()
791
- // Example output: { "github": ["repository.name", "issue.title"], "reddit": ["title", "selftext"] }
293
+ **#️⃣ Hash-Based Partitioning**
294
+ - Handles multiple writers with different data types
295
+ - Even distribution across partitions
296
+ - No semantic conflicts with mixed data
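+ The idea in miniature - generic, illustrative code for hash-based partitioning, not Brainy's internal implementation:
+
+ ```javascript
+ // A stable hash of the item ID spreads writes evenly across partitions,
+ // regardless of what the data means - hence "no semantic conflicts"
+ function partitionFor(id, partitionCount) {
+   let hash = 0
+   for (const ch of id) hash = (hash * 31 + ch.charCodeAt(0)) >>> 0
+   return hash % partitionCount
+ }
+
+ partitionFor('noun-42', 8) // always maps this ID to the same partition
+ ```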
792
297
 
793
- // Get standard field mappings
794
- const standardMappings = await db.getStandardFieldMappings()
795
- // Example output: { "title": { "github": ["repository.name"], "reddit": ["title"] } }
298
+ **🏷️ Domain Tagging**
299
+ - Automatic domain detection (medical, legal, product, etc.)
300
+ - Filter searches by domain
301
+ - Logical separation without complexity
302
+
303
+ ```javascript
304
+ // Data is automatically tagged with domains
305
+ await brainy.add({
306
+ symptoms: "fever",
307
+ diagnosis: "flu"
308
+ }, { /* your metadata */ }) // Auto-tagged as 'medical'
309
+
310
+ // Search within specific domains
311
+ const medicalResults = await brainy.search(query, 10, {
312
+ filter: { domain: 'medical' }
313
+ })
796
314
  ```
797
315
 
798
- When adding data, specify the service name to ensure proper field tracking:
316
+ **📊 Health Monitoring**
317
+ - Real-time health metrics
318
+ - Automatic dead instance cleanup
319
+ - Performance tracking
799
320
 
800
- ```typescript
801
- // Add data with service name
802
- await db.add(jsonData, metadata, { service: "github" })
321
+ ```javascript
322
+ // Get health status
323
+ const health = brainy.getHealthStatus()
324
+ // {
325
+ // status: 'healthy',
326
+ // role: 'reader',
327
+ // vectorCount: 1000000,
328
+ // cacheHitRate: 0.95,
329
+ // requestsPerSecond: 150
330
+ // }
331
+ ```
332
+
333
+ **⚡ Role-Optimized Performance**
334
+ - **Readers**: 80% memory for cache, aggressive prefetching
335
+ - **Writers**: Optimized write batching, minimal cache
336
+ - **Hybrid**: Adaptive based on workload
337
+
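+ When one process must both ingest and serve queries, the `'hybrid'` role mentioned above is the middle ground. A sketch using the same options as the setup examples:
+
+ ```javascript
+ const hybrid = createAutoBrainy({
+   storage: { type: 's3', bucket: 'my-bucket' },
+   distributed: { role: 'hybrid' } // balances cache memory against write batching
+ })
+ ```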
338
+ ### Deployment Examples
339
+
340
+ **Docker Compose**
341
+ ```yaml
+ services:
+   writer:
+     image: myapp
+     environment:
+       BRAINY_ROLE: writer   # Explicit role (no auto-assignment)
+
+   reader:
+     image: myapp
+     environment:
+       BRAINY_ROLE: reader   # Explicit role (no auto-assignment)
+     scale: 5
+ ```
354
+
355
+ **Kubernetes**
356
+ ```yaml
+ # Set each replica's role explicitly via the BRAINY_ROLE environment variable
+ apiVersion: apps/v1
+ kind: Deployment
+ metadata:
+   name: brainy-readers
+ spec:
+   replicas: 10  # Multiple readers
+   template:
+     spec:
+       containers:
+         - name: app
+           image: myapp
+           env:
+             - name: BRAINY_ROLE
+               value: reader  # Explicit role for every replica
+ ```
371
+
372
+ **Benefits**
373
+ - ✅ **50-70% faster searches** with parallel readers
374
+ - ✅ **No coordination complexity** - Shared JSON config in S3
375
+ - ✅ **Zero downtime scaling** - Add/remove instances anytime
376
+ - ✅ **Automatic failover** - Dead instances cleaned up automatically
377
+
378
+ ## 🤔 Why Choose Brainy?
379
+
380
+ ### vs. Traditional Databases
381
+ - ❌ **PostgreSQL with pgvector** - Requires complex setup, tuning, and DevOps expertise
+ - ✅ **Brainy** - Zero config, auto-optimizes, works everywhere from browser to cloud
+
+ ### vs. Vector Databases
+ - ❌ **Pinecone/Weaviate/Qdrant** - Cloud-only, expensive, vendor lock-in
+ - ✅ **Brainy** - Run locally, in browser, or cloud. Your choice, your data
+
+ ### vs. Graph Databases
+ - ❌ **Neo4j** - Great for graphs, no vector support
+ - ✅ **Brainy** - Vectors + graphs in one. Best of both worlds
+
+ ### vs. DIY Solutions
+ - ❌ **Building your own** - Months of work, optimization nightmares
+ - ✅ **Brainy** - Production-ready in 30 seconds
395
+
396
+ ## 🚀 Getting Started in 30 Seconds
397
+
398
+ ### React
399
+
400
+ ```jsx
+ import { createAutoBrainy } from '@soulcraft/brainy'
+ import { useState } from 'react'
+
+ function SemanticSearch() {
+   const [brainy] = useState(() => createAutoBrainy())
+   const [results, setResults] = useState([])
+
+   const search = async (query) => {
+     const items = await brainy.searchText(query, 10)
+     setResults(items)
+   }
+
+   return (
+     <>
+       <input onChange={(e) => search(e.target.value)}
+              placeholder="Search by meaning..." />
+       {results.map((r) => <div key={r.id}>{r.text}</div>)}
+     </>
+   )
+ }
803
418
  ```
804
419
 
805
- ### Working with Verbs (Relationships)
420
+ ### Angular
806
421
 
807
422
  ```typescript
808
- // Add a relationship between nouns
809
- await db.addVerb(sourceId, targetId, {
810
- verb: VerbType.RelatedTo,
811
- // other metadata...
423
+ import { Component } from '@angular/core'
424
+ import { createAutoBrainy } from '@soulcraft/brainy'
425
+
426
+ @Component({
427
+ selector: 'app-search',
428
+ template: `
429
+ <input (input)="search($event.target.value)"
430
+ placeholder="Semantic search...">
431
+ <div *ngFor="let result of results">
432
+ {{ result.text }}
433
+ </div>
434
+ `
812
435
  })
436
+ export class SearchComponent implements OnInit {
437
+ brainy = createAutoBrainy()
438
+ results = []
813
439
 
814
- // Add a relationship with auto-creation of missing nouns
815
- // This is useful when the target noun might not exist yet
816
- await db.addVerb(sourceId, targetId, {
817
- verb: VerbType.RelatedTo,
818
- // Enable auto-creation of missing nouns
819
- autoCreateMissingNouns: true,
820
- // Optional metadata for auto-created nouns
821
- missingNounMetadata: {
822
- noun: NounType.Concept,
823
- description: 'Auto-created noun'
440
+ async search(query: string) {
441
+ this.results = await this.brainy.searchText(query, 10)
824
442
  }
825
- })
443
+ }
444
+ ```
826
445
 
827
- // Get all relationships
828
- const verbs = await db.getAllVerbs()
446
+ ### Vue 3
829
447
 
830
- // Get relationships by source noun
831
- const outgoingVerbs = await db.getVerbsBySource(sourceId)
448
+ ```vue
449
+ <script setup>
450
+ import { createAutoBrainy } from '@soulcraft/brainy'
451
+ import { ref } from 'vue'
832
452
 
833
- // Get relationships by target noun
834
- const incomingVerbs = await db.getVerbsByTarget(targetId)
453
+ const brainy = createAutoBrainy()
454
+ const results = ref([])
835
455
 
836
- // Get relationships by type
837
- const containsVerbs = await db.getVerbsByType(VerbType.Contains)
456
+ const search = async (query) => {
457
+ results.value = await brainy.searchText(query, 10)
458
+ }
459
+ </script>
838
460
 
839
- // Get a specific relationship
840
- const verb = await db.getVerb(verbId)
461
+ <template>
462
+ <input @input="search($event.target.value)"
463
+ placeholder="Find similar content...">
464
+ <div v-for="result in results" :key="result.id">
465
+ {{ result.text }}
466
+ </div>
467
+ </template>
468
+ ```
469
+
470
+ ### Svelte
471
+
472
+ ```svelte
473
+ <script>
474
+ import { createAutoBrainy } from '@soulcraft/brainy'
475
+
476
+ const brainy = createAutoBrainy()
477
+ let results = []
478
+
479
+ async function search(e) {
480
+ results = await brainy.searchText(e.target.value, 10)
481
+ }
482
+ </script>
841
483
 
842
- // Delete a relationship
843
- await db.deleteVerb(verbId)
484
+ <input on:input={search} placeholder="AI-powered search...">
485
+ {#each results as result}
486
+ <div>{result.text}</div>
487
+ {/each}
844
488
  ```
845
489
 
846
- ## Advanced Configuration
490
+ ### Next.js (App Router)
847
491
 
848
- ### Database Modes
492
+ ```jsx
493
+ // app/search/page.js
494
+ import { createAutoBrainy } from '@soulcraft/brainy'
849
495
 
850
- Brainy supports special operational modes that restrict certain operations:
851
-
852
- ```typescript
853
- import { BrainyData } from '@soulcraft/brainy'
496
+ export default function SearchPage() {
497
+ async function search(formData) {
498
+ 'use server'
499
+ const brainy = createAutoBrainy({ bucketName: 'vectors' })
500
+ const query = formData.get('query')
501
+ return await brainy.searchText(query, 10)
502
+ }
854
503
 
855
- // Create and initialize the database
856
- const db = new BrainyData()
857
- await db.init()
504
+ return (
505
+ <form action={search}>
506
+ <input name="query" placeholder="Search..." />
507
+ <button type="submit">Search</button>
508
+ </form>
509
+ )
510
+ }
511
+ ```
858
512
 
859
- // Set the database to read-only mode (prevents write operations)
860
- db.setReadOnly(true)
513
+ ### Node.js / Bun / Deno
861
514
 
862
- // Check if the database is in read-only mode
863
- const isReadOnly = db.isReadOnly() // Returns true
515
+ ```javascript
516
+ import { createAutoBrainy } from '@soulcraft/brainy'
864
517
 
865
- // Set the database to write-only mode (prevents search operations)
866
- db.setWriteOnly(true)
518
+ const brainy = createAutoBrainy()
867
519
 
868
- // Check if the database is in write-only mode
869
- const isWriteOnly = db.isWriteOnly() // Returns true
520
+ // Add some data
521
+ await brainy.add("TypeScript is a typed superset of JavaScript", {
522
+ category: 'programming'
523
+ })
870
524
 
871
- // Reset to normal mode (allows both read and write operations)
872
- db.setReadOnly(false)
873
- db.setWriteOnly(false)
525
+ // Search for similar content
526
+ const results = await brainy.searchText("JavaScript with types", 5)
527
+ console.log(results)
874
528
  ```
875
529
 
876
- - **Read-Only Mode**: When enabled, prevents all write operations (add, update, delete). Useful for deployment scenarios
877
- where you want to prevent modifications to the database.
878
- - **Write-Only Mode**: When enabled, prevents all search operations. Useful for initial data loading or when you want to
879
- optimize for write performance.
530
+ ### Vanilla JavaScript
880
531
 
881
- ### Embedding
532
+ ```html
533
+ <!DOCTYPE html>
534
+ <html>
535
+ <head>
536
+ <script type="module">
537
+ import { createAutoBrainy } from 'https://unpkg.com/@soulcraft/brainy/dist/unified.min.js'
538
+
539
+ window.brainy = createAutoBrainy()
540
+
541
+ window.search = async function(query) {
542
+ const results = await brainy.searchText(query, 10)
543
+ document.getElementById('results').innerHTML =
544
+ results.map(r => `<div>${r.text}</div>`).join('')
545
+ }
546
+ </script>
547
+ </head>
548
+ <body>
549
+ <input onkeyup="search(this.value)" placeholder="Search...">
550
+ <div id="results"></div>
551
+ </body>
552
+ </html>
553
+ ```
882
554
 
883
- ```typescript
884
- import {
885
- BrainyData,
886
- createTensorFlowEmbeddingFunction,
887
- createThreadedEmbeddingFunction
888
- } from '@soulcraft/brainy'
889
-
890
- // Use the standard TensorFlow Universal Sentence Encoder embedding function
891
- const db = new BrainyData({
892
- embeddingFunction: createTensorFlowEmbeddingFunction()
893
- })
894
- await db.init()
555
+ ### Cloudflare Workers
895
556
 
896
- // Or use the threaded embedding function for better performance
897
- const threadedDb = new BrainyData({
898
- embeddingFunction: createThreadedEmbeddingFunction()
899
- })
900
- await threadedDb.init()
901
-
902
- // Directly embed text to vectors
903
- const vector = await db.embed("Some text to convert to a vector")
904
-
905
- // Calculate similarity between two texts or vectors
906
- const similarity = await db.calculateSimilarity(
907
- "Cats are furry pets",
908
- "Felines make good companions"
909
- )
910
- console.log(`Similarity score: ${similarity}`) // Higher value means more similar
911
-
912
- // Calculate similarity with custom options
913
- const vectorA = await db.embed("First text")
914
- const vectorB = await db.embed("Second text")
915
- const customSimilarity = await db.calculateSimilarity(
916
- vectorA, // Can use pre-computed vectors
917
- vectorB,
918
- {
919
- forceEmbed: false, // Skip embedding if inputs are already vectors
920
- distanceFunction: cosineDistance // Optional custom distance function
557
+ ```javascript
558
+ import { createAutoBrainy } from '@soulcraft/brainy'
559
+
560
+ export default {
561
+ async fetch(request, env) {
562
+ const brainy = createAutoBrainy({
563
+ bucketName: env.R2_BUCKET
564
+ })
565
+
566
+ const url = new URL(request.url)
567
+ const query = url.searchParams.get('q')
568
+
569
+ const results = await brainy.searchText(query, 10)
570
+ return Response.json(results)
921
571
  }
922
- )
572
+ }
923
573
  ```
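+
+ Note that `searchParams.get('q')` returns `null` when the parameter is missing, so a guard keeps the worker from searching for the literal string "null"; the same handler with that check added (a sketch):
+
+ ```javascript
+ import { createAutoBrainy } from '@soulcraft/brainy'
+
+ export default {
+   async fetch(request, env) {
+     const query = new URL(request.url).searchParams.get('q')
+
+     // Reject requests without a query instead of embedding "null".
+     if (!query) {
+       return new Response('Missing ?q= parameter', { status: 400 })
+     }
+
+     const brainy = createAutoBrainy({ bucketName: env.R2_BUCKET })
+     const results = await brainy.searchText(query, 10)
+     return Response.json(results)
+   }
+ }
+ ```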
924
574
 
925
- The threaded embedding function runs in a separate thread (Web Worker in browsers, Worker Thread in Node.js) to improve
926
- performance, especially for embedding operations. It uses GPU acceleration when available (via WebGL in browsers) and
927
- falls back to CPU processing for compatibility. Universal Sentence Encoder is always used for embeddings. The
928
- implementation includes worker reuse and model caching for optimal performance.
929
-
930
- ### Performance Tuning
931
-
932
- Brainy includes comprehensive performance optimizations that work across all environments (browser, CLI, Node.js,
933
- container, server):
934
-
935
- #### GPU and CPU Optimization
936
-
937
- Brainy uses GPU and CPU optimization for compute-intensive operations:
938
-
939
- 1. **GPU-Accelerated Embeddings**: Generate text embeddings using TensorFlow.js with WebGL backend when available
940
- 2. **Automatic Fallback**: Falls back to CPU backend when GPU is not available
941
- 3. **Optimized Distance Calculations**: Perform vector similarity calculations with optimized algorithms
942
- 4. **Cross-Environment Support**: Works consistently across browsers and Node.js environments
943
- 5. **Memory Management**: Properly disposes of tensors to prevent memory leaks
944
-
945
- #### Multithreading Support
946
-
947
- Brainy includes comprehensive multithreading support to improve performance across all environments:
575
+ ### AWS Lambda
948
576
 
949
- 1. **Parallel Batch Processing**: Add multiple items concurrently with controlled parallelism
950
- 2. **Multithreaded Vector Search**: Perform distance calculations in parallel for faster search operations
951
- 3. **Threaded Embedding Generation**: Generate embeddings in separate threads to avoid blocking the main thread
952
- 4. **Worker Reuse**: Maintains a pool of workers to avoid the overhead of creating and terminating workers
953
- 5. **Model Caching**: Initializes the embedding model once per worker and reuses it for multiple operations
954
- 6. **Batch Embedding**: Processes multiple items in a single embedding operation for better performance
955
- 7. **Automatic Environment Detection**: Adapts to browser (Web Workers) and Node.js (Worker Threads) environments
577
+ ```javascript
578
+ import { createAutoBrainy } from '@soulcraft/brainy'
956
579
 
957
- ```typescript
958
- import { BrainyData, euclideanDistance } from '@soulcraft/brainy'
959
-
960
- // Configure with custom options
961
- const db = new BrainyData({
962
- // Use Euclidean distance instead of default cosine distance
963
- distanceFunction: euclideanDistance,
964
-
965
- // HNSW index configuration for search performance
966
- hnsw: {
967
- M: 16, // Max connections per noun
968
- efConstruction: 200, // Construction candidate list size
969
- efSearch: 50, // Search candidate list size
970
- },
971
-
972
- // Performance optimization options
973
- performance: {
974
- useParallelization: true, // Enable multithreaded search operations
975
- },
976
-
977
- // Noun and Verb type validation
978
- typeValidation: {
979
- enforceNounTypes: true, // Validate noun types against NounType enum
980
- enforceVerbTypes: true, // Validate verb types against VerbType enum
981
- },
982
-
983
- // Storage configuration
984
- storage: {
985
- requestPersistentStorage: true,
986
- // Example configuration for cloud storage (replace with your own values):
987
- // s3Storage: {
988
- // bucketName: 'your-s3-bucket-name',
989
- // region: 'your-aws-region'
990
- // // Credentials should be provided via environment variables
991
- // // AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY
992
- // }
580
+ export const handler = async (event) => {
581
+ const brainy = createAutoBrainy({
582
+ bucketName: process.env.S3_BUCKET
583
+ })
584
+
585
+ const results = await brainy.searchText(event.query, 10)
586
+
587
+ return {
588
+ statusCode: 200,
589
+ body: JSON.stringify(results)
993
590
  }
994
- })
591
+ }
995
592
  ```
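+
+ Lambda execution environments are reused between invocations, so hoisting initialization to module scope avoids re-creating the client on warm starts. The same handler with that change (a sketch, assuming `createAutoBrainy` is safe to call at module scope, as the container examples below do):
+
+ ```javascript
+ import { createAutoBrainy } from '@soulcraft/brainy'
+
+ // Initialized once per execution environment, reused across warm starts.
+ const brainy = createAutoBrainy({
+   bucketName: process.env.S3_BUCKET
+ })
+
+ export const handler = async (event) => {
+   const results = await brainy.searchText(event.query, 10)
+   return {
+     statusCode: 200,
+     body: JSON.stringify(results)
+   }
+ }
+ ```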
996
593
 
997
- ### Optimized HNSW for Large Datasets
998
-
999
- Brainy includes an optimized HNSW index implementation for large datasets that may not fit entirely in memory, using a
1000
- hybrid approach:
594
+ ### Azure Functions
1001
595
 
1002
- 1. **Product Quantization** - Reduces vector dimensionality while preserving similarity relationships
1003
- 2. **Disk-Based Storage** - Offloads vectors to disk when memory usage exceeds a threshold
1004
- 3. **Memory-Efficient Indexing** - Optimizes memory usage for large-scale vector collections
596
+ ```javascript
597
+ import { createAutoBrainy } from '@soulcraft/brainy'
1005
598
 
1006
- ```typescript
1007
- import { BrainyData } from '@soulcraft/brainy'
1008
-
1009
- // Configure with optimized HNSW index for large datasets
1010
- const db = new BrainyData({
1011
- hnswOptimized: {
1012
- // Standard HNSW parameters
1013
- M: 16, // Max connections per noun
1014
- efConstruction: 200, // Construction candidate list size
1015
- efSearch: 50, // Search candidate list size
1016
-
1017
- // Memory threshold in bytes - when exceeded, will use disk-based approach
1018
- memoryThreshold: 1024 * 1024 * 1024, // 1GB default threshold
1019
-
1020
- // Product quantization settings for dimensionality reduction
1021
- productQuantization: {
1022
- enabled: true, // Enable product quantization
1023
- numSubvectors: 16, // Number of subvectors to split the vector into
1024
- numCentroids: 256 // Number of centroids per subvector
1025
- },
1026
-
1027
- // Whether to use disk-based storage for the index
1028
- useDiskBasedIndex: true // Enable disk-based storage
1029
- },
1030
-
1031
- // Storage configuration (required for disk-based index)
1032
- storage: {
1033
- requestPersistentStorage: true
599
+ export default async function (context, req) {
600
+ const brainy = createAutoBrainy({
601
+ bucketName: process.env.AZURE_STORAGE_CONTAINER
602
+ })
603
+
604
+ const results = await brainy.searchText(req.query.q, 10)
605
+
606
+ context.res = {
607
+ body: results
1034
608
  }
1035
- })
1036
-
1037
- // The optimized index automatically adapts based on dataset size:
1038
- // 1. For small datasets: Uses standard in-memory approach
1039
- // 2. For medium datasets: Applies product quantization to reduce memory usage
1040
- // 3. For large datasets: Combines product quantization with disk-based storage
1041
-
1042
- // Check status to see memory usage and optimization details
1043
- const status = await db.status()
1044
- console.log(status.details.index)
609
+ }
1045
610
  ```
1046
611
 
1047
- ## Distance Functions
1048
-
1049
- Brainy provides several distance functions for vector similarity calculations:
612
+ ### Google Cloud Functions
1050
613
 
1051
- - `cosineDistance` (default): Measures the cosine of the angle between vectors (1 - cosine similarity)
1052
- - `euclideanDistance`: Measures the straight-line distance between vectors
1053
- - `manhattanDistance`: Measures the sum of absolute differences between vector components
1054
- - `dotProductDistance`: Measures the negative dot product between vectors
1055
-
1056
- All distance functions are optimized for performance and automatically use the most efficient implementation based on
1057
- the dataset size and available resources. For large datasets and high-dimensional vectors, Brainy uses batch processing
1058
- and multithreading when available to improve performance.
1059
-
1060
- ## Backup and Restore
1061
-
1062
- Brainy provides backup and restore capabilities that allow you to:
1063
-
1064
- - Back up your data
1065
- - Transfer data between Brainy instances
1066
- - Restore existing data into Brainy for vectorization and indexing
1067
- - Backup data for analysis or visualization in other tools
1068
-
1069
- ### Backing Up Data
1070
-
1071
- ```typescript
1072
- // Backup all data from the database
1073
- const backupData = await db.backup()
1074
-
1075
- // The backup data includes:
1076
- // - All nouns (entities) with their vectors and metadata
1077
- // - All verbs (relationships) between nouns
1078
- // - Noun types and verb types
1079
- // - HNSW index data for fast similarity search
1080
- // - Version information
1081
-
1082
- // Save the backup data to a file (Node.js environment)
1083
- import fs from 'fs'
614
+ ```javascript
615
+ import { createAutoBrainy } from '@soulcraft/brainy'
1084
616
 
1085
- fs.writeFileSync('brainy-backup.json', JSON.stringify(backupData, null, 2))
617
+ export const searchHandler = async (req, res) => {
618
+ const brainy = createAutoBrainy({
619
+ bucketName: process.env.GCS_BUCKET
620
+ })
621
+
622
+ const results = await brainy.searchText(req.query.q, 10)
623
+ res.json(results)
624
+ }
1086
625
  ```
1087
626
 
1088
- ### Restoring Data
627
+ ### Google Cloud Run
1089
628
 
1090
- Brainy's restore functionality can handle:
629
+ ```dockerfile
630
+ # Dockerfile
631
+ FROM node:20-alpine
+ WORKDIR /app
+ COPY package*.json ./
+ RUN npm install @soulcraft/brainy
+ COPY . .
+ USER node
+ CMD ["node", "server.js"]
638
+ ```
1091
639
 
1092
- 1. Complete backups with vectors and index data
1093
- 2. Sparse data without vectors (vectors will be created during restore)
1094
- 3. Data without HNSW index (index will be reconstructed if needed)
640
+ ```javascript
641
+ // server.js
642
+ import { createAutoBrainy } from '@soulcraft/brainy'
643
+ import express from 'express'
1095
644
 
1096
- ```typescript
1097
- // Restore data with all options
1098
- const restoreResult = await db.restore(backupData, {
1099
- clearExisting: true // Whether to clear existing data before restore
645
+ const app = express()
646
+ const brainy = createAutoBrainy({
647
+ bucketName: process.env.GCS_BUCKET
1100
648
  })
1101
649
 
1102
- // Import sparse data (without vectors)
1103
- // Vectors will be automatically created using the embedding function
1104
- const sparseData = {
1105
- nouns: [
1106
- {
1107
- id: '123',
1108
- // No vector field - will be created during import
1109
- metadata: {
1110
- noun: 'Thing',
1111
- text: 'This text will be used to generate a vector'
1112
- }
1113
- }
1114
- ],
1115
- verbs: [],
1116
- version: '1.0.0'
1117
- }
650
+ app.get('/search', async (req, res) => {
651
+ const results = await brainy.searchText(req.query.q, 10)
652
+ res.json(results)
653
+ })
1118
654
 
1119
- const sparseImportResult = await db.importSparseData(sparseData)
655
+ const port = process.env.PORT || 8080
656
+ app.listen(port, () => console.log(`Brainy on Cloud Run: ${port}`))
1120
657
  ```
1121
658
 
1122
- ### CLI Backup/Restore
1123
-
1124
659
  ```bash
1125
- # Backup data to a file
1126
- brainy backup --output brainy-backup.json
1127
-
1128
- # Restore data from a file
1129
- brainy restore --input brainy-backup.json --clear-existing
1130
-
1131
- # Import sparse data (without vectors)
1132
- brainy import-sparse --input sparse-data.json
660
+ # Deploy to Cloud Run
661
+ gcloud run deploy brainy-api \
662
+ --source . \
663
+ --platform managed \
664
+ --region us-central1 \
665
+ --allow-unauthenticated
1133
666
  ```
1134
667
 
1135
- ## Embedding
1136
-
1137
- Brainy uses the following embedding approach:
1138
-
1139
- - TensorFlow Universal Sentence Encoder (high-quality text embeddings)
1140
- - GPU acceleration when available (via WebGL in browsers)
1141
- - Batch embedding for processing multiple items efficiently
1142
- - Worker reuse and model caching for optimal performance
1143
- - Custom embedding functions can be plugged in for specialized domains
1144
-
1145
- ## Extensions
1146
-
1147
- Brainy includes an augmentation system for extending functionality:
1148
-
1149
- - **Memory Augmentations**: Different storage backends
1150
- - **Sense Augmentations**: Process raw data
1151
- - **Cognition Augmentations**: Reasoning and inference
1152
- - **Dialog Augmentations**: Text processing and interaction
1153
- - **Perception Augmentations**: Data interpretation and visualization
1154
- - **Activation Augmentations**: Trigger actions
1155
-
1156
- ### Simplified Augmentation System
668
+ ### Vercel Edge Functions
1157
669
 
1158
- Brainy provides a simplified factory system for creating, importing, and executing augmentations with minimal
1159
- boilerplate:
1160
-
1161
- ```typescript
1162
- import {
1163
- createMemoryAugmentation,
1164
- createConduitAugmentation,
1165
- createSenseAugmentation,
1166
- addWebSocketSupport,
1167
- executeStreamlined,
1168
- processStaticData,
1169
- processStreamingData,
1170
- createPipeline
1171
- } from '@soulcraft/brainy'
1172
-
1173
- // Create a memory augmentation with minimal code
1174
- const memoryAug = createMemoryAugmentation({
1175
- name: 'simple-memory',
1176
- description: 'A simple in-memory storage augmentation',
1177
- autoRegister: true,
1178
- autoInitialize: true,
1179
-
1180
- // Implement only the methods you need
1181
- storeData: async (key, data) => {
1182
- // Your implementation here
1183
- return {
1184
- success: true,
1185
- data: true
1186
- }
1187
- },
1188
-
1189
- retrieveData: async (key) => {
1190
- // Your implementation here
1191
- return {
1192
- success: true,
1193
- data: { example: 'data', key }
1194
- }
1195
- }
1196
- })
670
+ ```javascript
671
+ import { createAutoBrainy } from '@soulcraft/brainy'
1197
672
 
1198
- // Add WebSocket support to any augmentation
1199
- const wsAugmentation = addWebSocketSupport(memoryAug, {
1200
- connectWebSocket: async (url) => {
1201
- // Your implementation here
1202
- return {
1203
- connectionId: 'ws-1',
1204
- url,
1205
- status: 'connected'
1206
- }
1207
- }
1208
- })
673
+ export const config = {
674
+ runtime: 'edge'
675
+ }
1209
676
 
1210
- // Process static data through a pipeline
1211
- const result = await processStaticData(
1212
- 'Input data',
1213
- [
1214
- {
1215
- augmentation: senseAug,
1216
- method: 'processRawData',
1217
- transformArgs: (data) => [data, 'text']
1218
- },
1219
- {
1220
- augmentation: memoryAug,
1221
- method: 'storeData',
1222
- transformArgs: (data) => ['processed-data', data]
1223
- }
1224
- ]
1225
- )
1226
-
1227
- // Create a reusable pipeline
1228
- const pipeline = createPipeline([
1229
- {
1230
- augmentation: senseAug,
1231
- method: 'processRawData',
1232
- transformArgs: (data) => [data, 'text']
1233
- },
1234
- {
1235
- augmentation: memoryAug,
1236
- method: 'storeData',
1237
- transformArgs: (data) => ['processed-data', data]
1238
- }
1239
- ])
677
+ export default async function handler(request) {
678
+ const brainy = createAutoBrainy()
679
+ const { searchParams } = new URL(request.url)
680
+ const query = searchParams.get('q')
681
+
682
+ const results = await brainy.searchText(query, 10)
683
+ return Response.json(results)
684
+ }
685
+ ```
1240
686
 
1241
- // Use the pipeline
1242
- const result = await pipeline('New input data')
687
+ ### Netlify Functions
1243
688
 
1244
- // Dynamically load augmentations at runtime
1245
- const loadedAugmentations = await loadAugmentationModule(
1246
- import('./my-augmentations.js'),
1247
- {
1248
- autoRegister: true,
1249
- autoInitialize: true
689
+ ```javascript
690
+ import { createAutoBrainy } from '@soulcraft/brainy'
691
+
692
+ export async function handler(event, context) {
693
+ const brainy = createAutoBrainy()
694
+ const query = event.queryStringParameters.q
695
+
696
+ const results = await brainy.searchText(query, 10)
697
+
698
+ return {
699
+ statusCode: 200,
700
+ body: JSON.stringify(results)
1250
701
  }
1251
- )
702
+ }
1252
703
  ```
1253
704
 
1254
- The simplified augmentation system provides:
1255
-
1256
- 1. **Factory Functions** - Create augmentations with minimal boilerplate
1257
- 2. **WebSocket Support** - Add WebSocket capabilities to any augmentation
1258
- 3. **Streamlined Pipeline** - Process data through augmentations more efficiently
1259
- 4. **Dynamic Loading** - Load augmentations at runtime when needed
1260
- 5. **Static & Streaming Data** - Handle both static and streaming data with the same API
1261
-
1262
- #### WebSocket Augmentation Types
1263
-
1264
- Brainy exports several WebSocket augmentation types that can be used by augmentation creators to add WebSocket
1265
- capabilities to their augmentations:
705
+ ### Supabase Edge Functions
1266
706
 
1267
707
  ```typescript
1268
- import {
1269
- // Base WebSocket support interface
1270
- IWebSocketSupport,
1271
-
1272
- // Combined WebSocket augmentation types
1273
- IWebSocketSenseAugmentation,
1274
- IWebSocketConduitAugmentation,
1275
- IWebSocketCognitionAugmentation,
1276
- IWebSocketMemoryAugmentation,
1277
- IWebSocketPerceptionAugmentation,
1278
- IWebSocketDialogAugmentation,
1279
- IWebSocketActivationAugmentation,
1280
-
1281
- // Function to add WebSocket support to any augmentation
1282
- addWebSocketSupport
1283
- } from '@soulcraft/brainy'
1284
-
1285
- // Example: Creating a typed WebSocket-enabled sense augmentation
1286
- const mySenseAug = createSenseAugmentation({
1287
- name: 'my-sense',
1288
- processRawData: async (data, dataType) => {
1289
- // Implementation
1290
- return {
1291
- success: true,
1292
- data: { nouns: [], verbs: [] }
1293
- }
1294
- }
1295
- }) as IWebSocketSenseAugmentation
1296
-
1297
- // Add WebSocket support
1298
- addWebSocketSupport(mySenseAug, {
1299
- connectWebSocket: async (url) => {
1300
- // WebSocket implementation
1301
- return {
1302
- connectionId: 'ws-1',
1303
- url,
1304
- status: 'connected'
1305
- }
1306
- },
1307
- sendWebSocketMessage: async (connectionId, data) => {
1308
- // Send message implementation
1309
- },
1310
- onWebSocketMessage: async (connectionId, callback) => {
1311
- // Register callback implementation
1312
- },
1313
- offWebSocketMessage: async (connectionId, callback) => {
1314
- // Remove callback implementation
1315
- },
1316
- closeWebSocket: async (connectionId, code, reason) => {
1317
- // Close connection implementation
1318
- }
708
+ import { createAutoBrainy } from 'npm:@soulcraft/brainy'
709
+ import { serve } from 'https://deno.land/std@0.168.0/http/server.ts'
710
+
711
+ serve(async (req) => {
712
+ const brainy = createAutoBrainy()
713
+ const url = new URL(req.url)
714
+ const query = url.searchParams.get('q')
715
+
716
+ const results = await brainy.searchText(query, 10)
717
+
718
+ return new Response(JSON.stringify(results), {
719
+ headers: { 'Content-Type': 'application/json' }
720
+ })
1319
721
  })
1320
-
1321
- // Now mySenseAug has both sense augmentation methods and WebSocket methods
1322
- await mySenseAug.processRawData('data', 'text')
1323
- await mySenseAug.connectWebSocket('wss://example.com')
1324
722
  ```
1325
723
 
1326
- These WebSocket augmentation types combine the base augmentation interfaces with the `IWebSocketSupport` interface,
1327
- providing type safety and autocompletion for augmentations with WebSocket capabilities.
1328
-
1329
- ### Model Control Protocol (MCP)
724
+ ### Docker Container
1330
725
 
1331
- Brainy includes a Model Control Protocol (MCP) implementation that allows external models to access Brainy data and use
1332
- the augmentation pipeline as tools:
726
+ ```dockerfile
727
+ FROM node:20-alpine
+ WORKDIR /app
+ COPY package*.json ./
+ RUN npm install @soulcraft/brainy
+ COPY . .
+ USER node
1333
733
 
1334
- - **BrainyMCPAdapter**: Provides access to Brainy data through MCP
1335
- - **MCPAugmentationToolset**: Exposes the augmentation pipeline as tools
1336
- - **BrainyMCPService**: Integrates the adapter and toolset, providing WebSocket and REST server implementations
1337
-
1338
- Environment compatibility:
1339
-
1340
- - **BrainyMCPAdapter** and **MCPAugmentationToolset** can run in any environment (browser, Node.js, server)
1341
- - **BrainyMCPService** core functionality works in any environment
1342
-
1343
- For detailed documentation and usage examples, see the [MCP documentation](src/mcp/README.md).
1344
-
1345
- ## Cross-Environment Compatibility
1346
-
1347
- Brainy is designed to run seamlessly in any environment, from browsers to Node.js to serverless functions and
1348
- containers. All Brainy data, functions, and augmentations are environment-agnostic, allowing you to use the same code
1349
- everywhere.
1350
-
1351
- ### Environment Detection
1352
-
1353
- Brainy automatically detects the environment it's running in:
1354
-
1355
- ```typescript
1356
- import { environment } from '@soulcraft/brainy'
1357
-
1358
- // Check which environment we're running in
1359
- console.log(`Running in ${
1360
- environment.isBrowser ? 'browser' :
1361
- environment.isNode ? 'Node.js' :
1362
- 'serverless/unknown'
1363
- } environment`)
734
+ CMD ["node", "server.js"]
1364
735
  ```
1365
736
 
1366
- ### Adaptive Storage
1367
-
1368
- Storage adapters are automatically selected based on the environment:
1369
-
1370
- - **Browser**: Uses Origin Private File System (OPFS) when available, falls back to in-memory storage
1371
- - **Node.js**: Uses file system storage by default, with options for S3-compatible cloud storage
1372
- - **Serverless**: Uses in-memory storage with options for cloud persistence
1373
- - **Container**: Automatically detects and uses the appropriate storage based on available capabilities
1374
-
1375
- ### Dynamic Imports
1376
-
1377
- Brainy uses dynamic imports to load environment-specific dependencies only when needed, keeping the bundle size small
1378
- and ensuring compatibility across environments.
1379
-
1380
- ### Browser Support
1381
-
1382
- Works in all modern browsers:
1383
-
1384
- - Chrome 86+
1385
- - Edge 86+
1386
- - Opera 72+
1387
- - Chrome for Android 86+
737
+ ```javascript
738
+ // server.js
739
+ import { createAutoBrainy } from '@soulcraft/brainy'
740
+ import express from 'express'
1388
741
 
1389
- For browsers without OPFS support, falls back to in-memory storage.
742
+ const app = express()
743
+ const brainy = createAutoBrainy()
1390
744
 
1391
- ## Related Projects
745
+ app.get('/search', async (req, res) => {
746
+ const results = await brainy.searchText(req.query.q, 10)
747
+ res.json(results)
748
+ })
1392
749
 
1393
- - **[Cartographer](https://github.com/sodal-project/cartographer)** - A companion project that provides standardized
1394
- interfaces for interacting with Brainy
750
+ app.listen(3000, () => console.log('Brainy running on port 3000'))
751
+ ```
1395
752
 
1396
- ## Demo
753
+ ### Kubernetes
1397
754
 
1398
- The repository includes a comprehensive demo that showcases Brainy's main features:
755
+ ```yaml
756
+ apiVersion: apps/v1
+ kind: Deployment
+ metadata:
+   name: brainy-api
+ spec:
+   replicas: 3
+   selector:
+     matchLabels:
+       app: brainy-api
+   template:
+     metadata:
+       labels:
+         app: brainy-api
+     spec:
+       containers:
+         - name: brainy
+           image: your-registry/brainy-api:latest
+           env:
+             - name: S3_BUCKET
+               value: "your-vector-bucket"
770
+ ```
1399
771
 
1400
- - `demo/index.html` - A single demo page with animations demonstrating Brainy's features.
1401
- - **[Try the live demo](https://soulcraft-research.github.io/brainy/demo/index.html)** - Check out the
1402
- interactive demo on
1403
- GitHub Pages
1404
- - Or run it locally with `npm run demo` (see [demo instructions](demo.md) for details)
1405
- - To deploy your own version to GitHub Pages, use the GitHub Actions workflow in
1406
- `.github/workflows/deploy-demo.yml`,
1407
- which automatically deploys when pushing to the main branch or can be manually triggered
1408
- - To use a custom domain (like www.soulcraft.com):
1409
- 1. A CNAME file is already included in the demo directory
1410
- 2. In your GitHub repository settings, go to Pages > Custom domain and enter your domain
1411
- 3. Configure your domain's DNS settings to point to GitHub Pages:
772
+ ### Railway.app
1412
773
 
1413
- - Add a CNAME record for www pointing to `<username>.github.io` (e.g., `soulcraft-research.github.io`)
1414
- - Or for an apex domain (soulcraft.com), add A records pointing to GitHub Pages IP addresses
774
+ ```javascript
775
+ // server.js
776
+ import { createAutoBrainy } from '@soulcraft/brainy'
1415
777
 
1416
- The demo showcases:
778
+ const brainy = createAutoBrainy({
779
+ bucketName: process.env.RAILWAY_VOLUME_NAME
780
+ })
1417
781
 
1418
- - How Brainy runs in different environments (browser, Node.js, server, cloud)
1419
- - How the noun-verb data model works
1420
- - How HNSW search works
782
+ // Railway automatically handles the rest!
783
+ ```
1421
784
 
1422
- ## Syncing Brainy Instances
785
+ ### Render.com
1423
786
 
1424
- You can use the conduit augmentations to sync Brainy instances:
787
+ ```yaml
788
+ # render.yaml
789
+ services:
790
+ - type: web
791
+ name: brainy-api
792
+ env: node
793
+ buildCommand: npm install @soulcraft/brainy
794
+ startCommand: node server.js
795
+ envVars:
796
+ - key: BRAINY_STORAGE
797
+ value: persistent-disk
798
+ ```
1425
799
 
1426
- - **WebSocket iConduit**: For syncing between browsers and servers, or between servers. WebSockets cannot be used for
1427
- direct browser-to-browser communication without a server in the middle.
1428
- - **WebRTC iConduit**: For direct peer-to-peer syncing between browsers. This is the recommended approach for
1429
- browser-to-browser communication.
800
+ ## 🚀 Quick Examples
1430
801
 
1431
- #### WebSocket Sync Example
802
+ ### Basic Usage
1432
803
 
1433
- ```typescript
1434
- import {
1435
- BrainyData,
1436
- pipeline,
1437
- createConduitAugmentation
1438
- } from '@soulcraft/brainy'
804
+ ```javascript
805
+ import { BrainyData, NounType, VerbType } from '@soulcraft/brainy'
1439
806
 
1440
- // Create and initialize the database
807
+ // Initialize
1441
808
  const db = new BrainyData()
1442
809
  await db.init()
1443
810
 
1444
- // Create a WebSocket conduit augmentation
1445
- const wsConduit = await createConduitAugmentation('websocket', 'my-websocket-sync')
1446
-
1447
- // Register the augmentation with the pipeline
1448
- pipeline.register(wsConduit)
1449
-
1450
- // Connect to another Brainy instance (server or browser)
1451
- // Replace the example URL below with your actual WebSocket server URL
1452
- const connectionResult = await pipeline.executeConduitPipeline(
1453
- 'establishConnection',
1454
- ['wss://example-websocket-server.com/brainy-sync', { protocols: 'brainy-sync' }]
1455
- )
1456
-
1457
- if (connectionResult[0] && (await connectionResult[0]).success) {
1458
- const connection = (await connectionResult[0]).data
1459
-
1460
- // Read data from the remote instance
1461
- const readResult = await pipeline.executeConduitPipeline(
1462
- 'readData',
1463
- [{ connectionId: connection.connectionId, query: { type: 'getAllNouns' } }]
1464
- )
811
+ // Add data (automatically vectorized)
+ const catId = await db.add("Cats are independent pets", {
+   noun: NounType.Thing,
+   category: 'animal'
+ })
+ const dogId = await db.add("Dogs are loyal companions", {
+   noun: NounType.Thing,
+   category: 'animal'
+ })
1465
816
 
1466
- // Process and add the received data to the local instance
1467
- if (readResult[0] && (await readResult[0]).success) {
1468
- const remoteNouns = (await readResult[0]).data
1469
- for (const noun of remoteNouns) {
1470
- await db.add(noun.vector, noun.metadata)
1471
- }
1472
- }
817
+ // Search for similar items
818
+ const results = await db.searchText("feline pets", 5)
1473
819
 
1474
- // Set up real-time sync by monitoring the stream
1475
- await wsConduit.monitorStream(connection.connectionId, async (data) => {
1476
- // Handle incoming data (e.g., new nouns, verbs, updates)
1477
- if (data.type === 'newNoun') {
1478
- await db.add(data.vector, data.metadata)
1479
- } else if (data.type === 'newVerb') {
1480
- await db.addVerb(data.sourceId, data.targetId, data.vector, data.options)
1481
- }
1482
- })
1483
- }
820
+ // Add relationships
821
+ await db.addVerb(catId, dogId, {
822
+ verb: VerbType.RelatedTo,
823
+ description: 'Both are pets'
824
+ })
1484
825
  ```
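+
+ You can also compare two pieces of text directly, without storing either of them, using the `calculateSimilarity()` method; a short sketch:
+
+ ```javascript
+ // Higher score means more similar.
+ const score = await db.calculateSimilarity(
+   'Cats are independent pets',
+   'Felines make good companions'
+ )
+ console.log(`Similarity: ${score}`)
+ ```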
1485
826
 
1486
- #### WebRTC Peer-to-Peer Sync Example
827
+ ### AutoBrainy (Recommended)
1487
828
 
1488
- ```typescript
1489
- import {
1490
- BrainyData,
1491
- pipeline,
1492
- createConduitAugmentation
1493
- } from '@soulcraft/brainy'
1494
-
1495
- // Create and initialize the database
1496
- const db = new BrainyData()
1497
- await db.init()
829
+ ```javascript
830
+ import { createAutoBrainy } from '@soulcraft/brainy'
1498
831
 
1499
- // Create a WebRTC conduit augmentation
1500
- const webrtcConduit = await createConduitAugmentation('webrtc', 'my-webrtc-sync')
1501
-
1502
- // Register the augmentation with the pipeline
1503
- pipeline.register(webrtcConduit)
1504
-
1505
- // Connect to a peer using a signaling server
1506
- // Replace the example values below with your actual configuration
1507
- const connectionResult = await pipeline.executeConduitPipeline(
1508
- 'establishConnection',
1509
- [
1510
- 'peer-id-to-connect-to', // Replace with actual peer ID
1511
- {
1512
- signalServerUrl: 'wss://example-signal-server.com', // Replace with your signal server
1513
- localPeerId: 'my-local-peer-id', // Replace with your local peer ID
1514
- iceServers: [{ urls: 'stun:stun.l.google.com:19302' }] // Public STUN server
1515
- }
1516
- ]
1517
- )
1518
-
1519
- if (connectionResult[0] && (await connectionResult[0]).success) {
1520
- const connection = (await connectionResult[0]).data
1521
-
1522
- // Set up real-time sync by monitoring the stream
1523
- await webrtcConduit.monitorStream(connection.connectionId, async (data) => {
1524
- // Handle incoming data (e.g., new nouns, verbs, updates)
1525
- if (data.type === 'newNoun') {
1526
- await db.add(data.vector, data.metadata)
1527
- } else if (data.type === 'newVerb') {
1528
- await db.addVerb(data.sourceId, data.targetId, data.vector, data.options)
1529
- }
1530
- })
832
+ // Everything auto-configured!
833
+ const brainy = createAutoBrainy()
1531
834
 
1532
- // When adding new data locally, also send to the peer
1533
- const nounId = await db.add("New data to sync", { noun: "Thing" })
1534
-
1535
- // Send the new noun to the peer
1536
- await pipeline.executeConduitPipeline(
1537
- 'writeData',
1538
- [
1539
- {
1540
- connectionId: connection.connectionId,
1541
- data: {
1542
- type: 'newNoun',
1543
- id: nounId,
1544
- vector: (await db.get(nounId)).vector,
1545
- metadata: (await db.get(nounId)).metadata
1546
- }
1547
- }
1548
- ]
1549
- )
1550
- }
835
+ // Just start using it
836
+ await brainy.addVector({ id: '1', vector: [0.1, 0.2, 0.3], text: 'Hello' })
837
+ const results = await brainy.search([0.1, 0.2, 0.3], 10)
1551
838
  ```
1552
839
 
1553
- #### Browser-Server Search Example
1554
-
1555
- Brainy supports searching a server-hosted instance from a browser, storing results locally, and performing further
1556
- searches against the local instance:
1557
-
1558
- ```typescript
1559
- import { BrainyData } from '@soulcraft/brainy'
1560
-
1561
- // Create and initialize the database with remote server configuration
1562
- // Replace the example URL below with your actual Brainy server URL
1563
- const db = new BrainyData({
1564
- remoteServer: {
1565
- url: 'wss://example-brainy-server.com/ws', // Replace with your server URL
1566
- protocols: 'brainy-sync',
1567
- autoConnect: true // Connect automatically during initialization
1568
- }
1569
- })
1570
- await db.init()
1571
-
1572
- // Or connect manually after initialization
1573
- if (!db.isConnectedToRemoteServer()) {
1574
- // Replace the example URL below with your actual Brainy server URL
1575
- await db.connectToRemoteServer('wss://example-brainy-server.com/ws', 'brainy-sync')
1576
- }
1577
-
1578
- // Search the remote server (results are stored locally)
1579
- const remoteResults = await db.searchText('machine learning', 5, { searchMode: 'remote' })
840
+ ### Scenario-Based Setup
1580
841
 
1581
- // Search the local database (includes previously stored results)
1582
- const localResults = await db.searchText('machine learning', 5, { searchMode: 'local' })
1583
-
1584
- // Perform a combined search (local first, then remote if needed)
1585
- const combinedResults = await db.searchText('neural networks', 5, { searchMode: 'combined' })
842
+ ```javascript
843
+ import { createQuickBrainy } from '@soulcraft/brainy'
1586
844
 
1587
- // Add data to both local and remote instances
1588
- const id = await db.addToBoth('Deep learning is a subset of machine learning', {
1589
- noun: 'Concept',
1590
- category: 'AI',
1591
- tags: ['deep learning', 'neural networks']
845
+ // Choose your scale: 'small', 'medium', 'large', 'enterprise'
846
+ const brainy = await createQuickBrainy('large', {
847
+ bucketName: 'my-vector-db'
1592
848
  })
1593
-
1594
- // Clean up when done (this also cleans up worker pools)
1595
- await db.shutDown()
1596
849
  ```
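+
+ In a long-running process, release resources when you are done; a sketch using the `shutDown()` method (its availability on quick-setup instances is assumed here):
+
+ ```javascript
+ // Releases worker pools and other resources held by the instance.
+ await brainy.shutDown()
+ ```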
1597
850
 
1598
- ---
1599
-
1600
- ## 📈 Scaling Strategy
1601
-
1602
- Brainy is designed to handle datasets of various sizes, from small collections to large-scale deployments. For
1603
- terabyte-scale data that can't fit entirely in memory, we provide several approaches:
1604
-
1605
- - **Disk-Based HNSW**: Modified implementations using intelligent caching and partial loading
1606
- - **Distributed HNSW**: Sharding and partitioning across multiple machines
1607
- - **Hybrid Solutions**: Combining quantization techniques with multi-tier architectures
1608
-
1609
- For detailed information on how to scale Brainy for large datasets, vector dimension standardization, threading
1610
- implementation, storage testing, and other technical topics, see our
1611
- comprehensive [Technical Guides](TECHNICAL_GUIDES.md).
1612
-
1613
- ## Recent Changes and Performance Improvements
1614
-
1615
- ### Enhanced Memory Management and Scalability
1616
-
1617
- Brainy has been significantly improved to handle larger datasets more efficiently:
1618
-
1619
- - **Pagination Support**: All data retrieval methods now support pagination to avoid loading entire datasets into memory
1620
- at once. The deprecated `getAllNouns()` and `getAllVerbs()` methods have been replaced with `getNouns()` and
1621
- `getVerbs()` methods that support pagination, filtering, and cursor-based navigation.
1622
-
1623
- - **Multi-level Caching**: A sophisticated three-level caching strategy has been implemented:
1624
- - **Level 1**: Hot cache (most accessed nodes) - RAM (automatically detecting and adjusting in each environment)
1625
- - **Level 2**: Warm cache (recent nodes) - OPFS, Filesystem or S3 depending on environment
1626
- - **Level 3**: Cold storage (all nodes) - OPFS, Filesystem or S3 depending on environment
1627
-
1628
- - **Adaptive Memory Usage**: The system automatically detects available memory and adjusts cache sizes accordingly:
1629
- - In Node.js: Uses 10% of free memory (minimum 1000 entries)
1630
- - In browsers: Scales based on device memory (500 entries per GB, minimum 1000)
1631
-
1632
- - **Intelligent Cache Eviction**: Implements a Least Recently Used (LRU) policy that evicts the oldest 20% of items when
1633
- the cache reaches the configured threshold.
1634
-
1635
- - **Prefetching Strategy**: Implements batch prefetching to improve performance while avoiding overwhelming system
1636
- resources.
1637
-
1638
- ### S3-Compatible Storage Improvements
1639
-
1640
- - **Enhanced Cloud Storage**: Improved support for S3-compatible storage services including AWS S3, Cloudflare R2, and
1641
- others.
1642
-
1643
- - **Optimized Data Access**: Batch operations and error handling for efficient cloud storage access.
1644
-
1645
- - **Change Log Management**: Efficient synchronization through change logs to track updates.
1646
-
1647
- ### Data Compatibility
1648
-
1649
- Yes, you can use existing data indexed from an old version. Brainy includes robust data migration capabilities:
1650
-
1651
- - **Vector Regeneration**: If vectors are missing in imported data, they will be automatically created using the
1652
- embedding function.
1653
-
1654
- - **HNSW Index Reconstruction**: The system can reconstruct the HNSW index from backup data, ensuring compatibility with
1655
- previous versions.
1656
-
1657
- - **Sparse Data Import**: Support for importing sparse data (without vectors) through the `importSparseData()` method.
1658
-
1659
- ### System Requirements
1660
-
1661
- #### Default Mode
1662
-
1663
- - **Memory**:
1664
- - Minimum: 512MB RAM
1665
- - Recommended: 2GB+ RAM for medium datasets, 8GB+ for large datasets
1666
-
1667
- - **CPU**:
1668
- - Minimum: 2 cores
1669
- - Recommended: 4+ cores for better performance with parallel operations
1670
-
1671
- - **Storage**:
1672
- - Minimum: 1GB available storage
1673
- - Recommended: Storage space at least 3x the size of your dataset
1674
-
1675
- #### Read-Only Mode
1676
-
1677
- Read-only mode prevents all write operations (add, update, delete) and is optimized for search operations.
1678
-
1679
- - **Memory**:
1680
- - Minimum: 256MB RAM
1681
- - Recommended: 1GB+ RAM
1682
-
1683
- - **CPU**:
1684
- - Minimum: 1 core
1685
- - Recommended: 2+ cores
1686
-
1687
- - **Storage**:
1688
- - Minimum: Storage space equal to the size of your dataset
1689
- - Recommended: 2x the size of your dataset for caching
1690
-
1691
- - **New Feature**: Lazy loading support in read-only mode for improved performance with large datasets.
1692
-
1693
- #### Write-Only Mode
1694
-
1695
- Write-only mode prevents all search operations and is optimized for initial data loading or when you want to optimize
1696
- for write performance.
1697
-
1698
- - **Memory**:
1699
- - Minimum: 512MB RAM
1700
- - Recommended: 2GB+ RAM
1701
-
1702
- - **CPU**:
1703
- - Minimum: 2 cores
1704
- - Recommended: 4+ cores for faster data ingestion
1705
-
1706
- - **Storage**:
1707
- - Minimum: Storage space at least 2x the size of your dataset
1708
- - Recommended: 4x the size of your dataset for optimal performance
1709
-
1710
- ### Performance Tuning Parameters
1711
-
1712
- Brainy offers comprehensive configuration options for performance tuning, with enhanced support for large datasets in S3
1713
- or other remote storage. **All configuration is optional** - the system automatically detects the optimal settings based
1714
- on your environment, dataset size, and usage patterns.
1715
-
1716
- #### Intelligent Defaults
1717
-
1718
- Brainy uses intelligent defaults that automatically adapt to your environment:
1719
-
1720
- - **Environment Detection**: Automatically detects whether you're running in Node.js, browser, or worker environment
1721
- - **Memory-Aware Caching**: Adjusts cache sizes based on available system memory
1722
- - **Dataset Size Adaptation**: Tunes parameters based on the size of your dataset
1723
- - **Usage Pattern Optimization**: Adjusts to read-heavy vs. write-heavy workloads
1724
- - **Storage Type Awareness**: Optimizes for local vs. remote storage (S3, R2, etc.)
1725
- - **Operating Mode Specialization**: Special optimizations for read-only and write-only modes
1726
-
1727
- #### Cache Configuration (Optional)
1728
-
1729
- You can override any of these automatically tuned parameters if needed:
1730
-
1731
- - **Hot Cache Size**: Control the maximum number of items to keep in memory.
1732
- - For large datasets (>100K items), consider values between 5,000-50,000 depending on available memory.
1733
- - In read-only mode, larger values (10,000-100,000) can be used for better performance.
1734
-
1735
- - **Eviction Threshold**: Set the threshold at which cache eviction begins (default: 0.8 or 80% of max size).
1736
- - For write-heavy workloads, lower values (0.6-0.7) may improve performance.
1737
- - For read-heavy workloads, higher values (0.8-0.9) are recommended.
1738
-
1739
- - **Warm Cache TTL**: Set the time-to-live for items in the warm cache (default: 3600000 ms or 1 hour).
1740
- - For frequently changing data, shorter TTLs are recommended.
1741
- - For relatively static data, longer TTLs improve performance.
1742
-
1743
- - **Batch Size**: Control the number of items to process in a single batch for operations like prefetching.
1744
- - For S3 or remote storage with large datasets, larger values (50-200) significantly improve throughput.
1745
- - In read-only mode with remote storage, even larger values (100-300) can be used.
1746
-
1747
- #### Auto-Tuning (Enabled by Default)
1748
-
1749
- - **Auto-Tune**: Enable or disable automatic tuning of cache parameters based on usage patterns (default: true).
1750
- - **Auto-Tune Interval**: Set how frequently the system adjusts cache parameters (default: 60000 ms or 1 minute).
1751
-
1752
- #### Read-Only Mode Optimizations (Automatic)
1753
-
1754
- Read-only mode includes special optimizations for search performance that are automatically applied:
1755
-
1756
- - **Larger Cache Sizes**: Automatically uses more memory for caching (up to 40% of free memory for large datasets).
1757
- - **Aggressive Prefetching**: Loads more data in each batch to reduce the number of storage requests.
1758
- - **Prefetch Strategy**: Defaults to 'aggressive' prefetching strategy in read-only mode.
1759
-
1760
- #### Example Configuration for Large S3 Datasets
851
+ ### With Offline Models
1761
852
 
1762
853
  ```javascript
1763
- const brainy = new BrainyData({
1764
- readOnly: true,
1765
- lazyLoadInReadOnlyMode: true,
1766
- storage: {
1767
- type: 's3',
1768
- s3Storage: {
1769
- bucketName: 'your-bucket',
1770
- accessKeyId: 'your-access-key',
1771
- secretAccessKey: 'your-secret-key',
1772
- region: 'your-region'
1773
- }
1774
- },
1775
- cache: {
1776
- hotCacheMaxSize: 20000,
1777
- hotCacheEvictionThreshold: 0.85,
1778
- batchSize: 100,
1779
- readOnlyMode: {
1780
- hotCacheMaxSize: 50000,
1781
- batchSize: 200,
1782
- prefetchStrategy: 'aggressive'
1783
- }
1784
- }
1785
- });
1786
- ```
1787
-
1788
- These configuration options make Brainy more efficient, scalable, and adaptable to different environments and usage
1789
- patterns, especially for large datasets in cloud storage.
1790
-
1791
- ## Testing
854
+ import { createAutoBrainy, NounType } from '@soulcraft/brainy'
855
+ import { BundledUniversalSentenceEncoder } from '@soulcraft/brainy-models'
1792
856
 
1793
- Brainy uses Vitest for testing. For detailed information about testing in Brainy, including test configuration, scripts,
1794
- reporting tools, and best practices, see our [Testing Guide](docs/technical/TESTING.md).
1795
-
1796
- Here are some common test commands:
1797
-
1798
- ```bash
1799
- # Run all tests
1800
- npm test
1801
-
1802
- # Run tests with comprehensive reporting
1803
- npm run test:report
857
+ // Use bundled model for offline operation
858
+ const brainy = createAutoBrainy({
859
+ embeddingModel: BundledUniversalSentenceEncoder,
860
+ // Model loads from local files, no network needed!
861
+ })
1804
862
 
1805
- # Run tests with coverage
1806
- npm run test:coverage
863
+ // Works exactly the same, but 100% offline
864
+ await brainy.add("This works without internet!", {
865
+ noun: NounType.Content
866
+ })
1807
867
  ```
1808
868
 
1809
- ## Contributing
869
+ ## 🌐 Live Demo
1810
870
 
1811
- For detailed contribution guidelines, please see [CONTRIBUTING.md](CONTRIBUTING.md).
871
+ **[Try the interactive demo](https://soulcraft-research.github.io/brainy/demo/index.html)** - See Brainy in action with animations and examples.
1812
872
 
1813
- For developer documentation, including building, testing, and publishing instructions, please
1814
- see [DEVELOPERS.md](DEVELOPERS.md).
873
+ ## 🔧 Environment Support
1815
874
 
1816
- We have a [Code of Conduct](CODE_OF_CONDUCT.md) that all contributors are expected to follow.
875
+ | Environment | Storage | Threading | Auto-Configured |
876
+ |-------------|---------|-----------|-----------------|
877
+ | Browser | OPFS | Web Workers | ✅ |
878
+ | Node.js | FileSystem/S3 | Worker Threads | ✅ |
879
+ | Serverless | Memory/S3 | Limited | ✅ |
880
+ | Edge Functions | Memory/KV | Limited | ✅ |
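+
+ To check what the auto-configuration actually selected on your platform, a quick sketch using the `status()` method (assuming the auto-configured instance exposes it; the report shape may vary by version):
+
+ ```javascript
+ import { createAutoBrainy } from '@soulcraft/brainy'
+
+ const brainy = createAutoBrainy()
+
+ // Reports the selected storage adapter and index configuration.
+ const status = await brainy.status()
+ console.log(status.details)
+ ```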
1817
881
 
1818
- ### Commit Message Format
882
+ ## 📚 Documentation
1819
883
 
1820
- For best results with automatic changelog generation, follow
1821
- the [Conventional Commits](https://www.conventionalcommits.org/) specification for your commit messages:
884
+ ### Getting Started
885
+ - [**Quick Start Guide**](docs/getting-started/) - Get up and running in minutes
886
+ - [**Installation**](docs/getting-started/installation.md) - Detailed setup instructions
887
+ - [**Environment Setup**](docs/getting-started/environment-setup.md) - Platform-specific configuration
1822
888
 
1823
- ```
1824
- AI Template for automated commit messages:
889
+ ### User Guides
890
+ - [**Search and Metadata**](docs/user-guides/) - Advanced search techniques
891
+ - [**JSON Document Search**](docs/guides/json-document-search.md) - Field-based searching
892
+ - [**Production Migration**](docs/guides/production-migration-guide.md) - Deployment best practices
1825
893
 
1826
- Use Conventional Commit format
1827
- Specify the changes in a structured format
1828
- Add information about the purpose of the commit
1829
- ```
894
+ ### API Reference
895
+ - [**Core API**](docs/api-reference/) - Complete method reference
896
+ - [**Configuration Options**](docs/api-reference/configuration.md) - All configuration parameters
897
+ - [**Auto-Configuration API**](docs/api-reference/auto-configuration-api.md) - Intelligent setup
1830
898
 
1831
- ```
1832
- <type>(<scope>): <description>
899
+ ### Optimization & Scaling
900
+ - [**Large-Scale Optimizations**](docs/optimization-guides/) - Handle millions of vectors
901
+ - [**Memory Management**](docs/optimization-guides/memory-optimization.md) - Efficient resource usage
902
+ - [**S3 Migration Guide**](docs/optimization-guides/s3-migration-guide.md) - Cloud storage setup
1833
903
 
1834
- [optional body]
904
+ ### Examples & Patterns
905
+ - [**Code Examples**](docs/examples/) - Real-world usage patterns
906
+ - [**Integrations**](docs/examples/integrations.md) - Third-party services
907
+ - [**Performance Patterns**](docs/examples/performance.md) - Optimization techniques
1835
908
 
1836
- [optional footer(s)]
1837
- ```
1838
-
1839
- Where `<type>` is one of:
909
+ ### Technical Documentation
910
+ - [**Architecture Overview**](docs/technical/) - System design and internals
911
+ - [**Testing Guide**](docs/technical/TESTING.md) - Testing strategies
912
+ - [**Statistics & Monitoring**](docs/technical/STATISTICS.md) - Performance tracking
1840
913
 
1841
- - `feat`: A new feature (maps to **Added** section)
1842
- - `fix`: A bug fix (maps to **Fixed** section)
1843
- - `chore`: Regular maintenance tasks (maps to **Changed** section)
1844
- - `docs`: Documentation changes (maps to **Documentation** section)
1845
- - `refactor`: Code changes that neither fix bugs nor add features (maps to **Changed** section)
1846
- - `perf`: Performance improvements (maps to **Changed** section)
914
+ ## 🤝 Contributing
1847
915
 
1848
- ### Manual Release Process
916
+ We welcome contributions! Please see:
917
+ - [Contributing Guidelines](CONTRIBUTING.md)
918
+ - [Developer Documentation](docs/development/DEVELOPERS.md)
919
+ - [Code of Conduct](CODE_OF_CONDUCT.md)
1849
920
 
1850
- If you need more control over the release process, you can use the individual commands:
921
+ ## 📄 License
1851
922
 
1852
- ```bash
1853
- # Update version and generate changelog
1854
- npm run _release:patch # or _release:minor, _release:major
923
+ [MIT](LICENSE)
1855
924
 
1856
- # Create GitHub release
1857
- npm run _github-release
925
+ ## 🔗 Related Projects
1858
926
 
1859
- # Publish to NPM
1860
- npm publish
1861
- ```
927
+ - [**Cartographer**](https://github.com/sodal-project/cartographer) - Standardized interfaces for Brainy
1862
928
 
1863
- ## License
929
+ ---
1864
930
 
1865
- [MIT](LICENSE)
931
+ <div align="center">
932
+ <strong>Ready to build something amazing? Get started with Brainy today!</strong>
933
+ </div>