@soulcraft/brainy 0.37.0 → 0.39.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -7,1859 +7,1004 @@
7
7
  [![TypeScript](https://img.shields.io/badge/TypeScript-5.4.5-blue.svg)](https://www.typescriptlang.org/)
8
8
  [![PRs Welcome](https://img.shields.io/badge/PRs-welcome-brightgreen.svg)](CONTRIBUTING.md)
9
9
 
10
- [//]: # ([![Cartographer](https://img.shields.io/badge/Cartographer-Official%20Standard-brightgreen)](https://github.com/sodal-project/cartographer))
11
-
12
10
  **A powerful graph & vector data platform for AI applications across any environment**
13
11
 
14
12
  </div>
15
13
 
16
- ## ✨ Overview
17
-
18
- Brainy combines the power of vector search with graph relationships in a lightweight, cross-platform database. Whether
19
- you're building AI applications, recommendation systems, or knowledge graphs, Brainy provides the tools you need to
20
- store, connect, and retrieve your data intelligently.
21
-
22
- What makes Brainy special? It intelligently adapts to your environment! Brainy automatically detects your platform,
23
- adjusts its storage strategy, and optimizes performance based on your usage patterns. The more you use it, the smarter
24
- it gets - learning from your data to provide increasingly relevant results and connections.
25
-
26
- ### 🚀 Key Features
27
-
28
- - **🧠 Zero Configuration** - Auto-detects environment and optimizes automatically
29
- - **⚡ Production-Scale Performance** - Handles millions of vectors with sub-second search
30
- - **🎯 Intelligent Partitioning** - Semantic clustering with auto-tuning
31
- - **📊 Adaptive Learning** - Gets smarter with usage, optimizes itself over time
32
- - **🗄️ Smart Storage** - OPFS, FileSystem, S3 auto-selection based on environment
33
- - **💾 Massive Memory Optimization** - 75% reduction with compression, intelligent caching
34
- - **🚀 Distributed Search** - Parallel processing with load balancing
35
- - **🔄 Real-Time Adaptation** - Automatically adjusts to your data patterns
36
- - **Run Everywhere** - Works in browsers, Node.js, serverless functions, and containers
37
- - **Vector Search** - Find semantically similar content using embeddings
38
- - **Advanced JSON Document Search** - Search within specific fields of JSON documents with field prioritization and
39
- service-based field standardization
40
- - **Graph Relationships** - Connect data with meaningful relationships
41
- - **Streaming Pipeline** - Process data in real-time as it flows through the system
42
- - **Extensible Augmentations** - Customize and extend functionality with pluggable components
43
- - **Built-in Conduits** - Sync and scale across instances with WebSocket and WebRTC
44
- - **TensorFlow Integration** - Use TensorFlow.js for high-quality embeddings
45
- - **Persistent Storage** - Data persists across sessions and scales to any size
46
- - **TypeScript Support** - Fully typed API with generics
47
- - **CLI Tools & Web Service** - Command-line interface and REST API web service for data management
48
- - **Model Control Protocol (MCP)** - Allow external AI models to access Brainy data and use augmentation pipeline as
49
- tools
50
-
51
- ## ⚡ Large-Scale Performance Optimizations
52
-
53
- **New in v0.36.0**: Brainy now includes 6 core optimizations that transform it from a prototype into a production-ready system capable of handling millions of vectors:
54
-
55
- ### 🎯 Performance Benchmarks
56
-
57
- | Dataset Size | Search Time | Memory Usage | API Calls Reduction |
58
- |-------------|-------------|--------------|-------------------|
59
- | **10k vectors** | ~50ms | Standard | N/A |
60
- | **100k vectors** | ~200ms | 30% reduction | 50-70% fewer |
61
- | **1M+ vectors** | ~500ms | 75% reduction | 50-90% fewer |
62
-
63
- ### 🧠 6 Core Optimization Systems
14
+ ## ✨ What is Brainy?
64
15
 
65
- 1. **🎛️ Auto-Configuration System** - Detects environment, resources, and data patterns
66
- 2. **🔀 Semantic Partitioning** - Intelligent clustering with auto-tuning (4-32 clusters)
67
- 3. **🚀 Distributed Search** - Parallel processing across partitions with load balancing
68
- 4. **🧠 Multi-Level Caching** - Hot/Warm/Cold caching with predictive prefetching
69
- 5. **📦 Batch S3 Operations** - Reduces cloud storage API calls by 50-90%
70
- 6. **💾 Advanced Compression** - Vector quantization and memory-mapping for large datasets
16
+ Imagine a database that thinks like you do - connecting ideas, finding patterns, and getting smarter over time. Brainy
17
+ is the **AI-native database** that brings vector search and knowledge graphs together in one powerful, ridiculously
18
+ easy-to-use package.
71
19
 
72
- ### 🎯 Automatic Environment Detection
20
+ ### 🆕 NEW: Distributed Mode (v0.38+)
73
21
 
74
- | Environment | Auto-Configured | Performance Focus |
75
- |-------------|-----------------|-------------------|
76
- | **Browser** | OPFS + Web Workers | Memory efficiency, 512MB-1GB limits |
77
- | **Node.js** | FileSystem + Worker Threads | High performance, 4GB-8GB+ usage |
78
- | **Serverless** | S3 + Memory cache | Cold start optimization, latency focus |
22
+ **Scale horizontally with minimal configuration!** Brainy now supports distributed deployments with automatic coordination:
79
23
 
80
- ### 📊 Intelligent Scaling Strategy
24
+ - **🌐 Multi-Instance Coordination** - Multiple readers and writers working in harmony
25
+ - **🏷️ Smart Domain Detection** - Automatically categorizes data (medical, legal, product, etc.)
26
+ - **📊 Real-Time Health Monitoring** - Track performance across all instances
27
+ - **🔄 Automatic Role Optimization** - Readers optimize for cache, writers for throughput
28
+ - **🗂️ Intelligent Partitioning** - Hash-based partitioning for even load distribution
81
29
 
82
- The system automatically adapts based on your dataset size:
30
+ ### 🚀 Why Developers Love Brainy
83
31
 
84
- - **< 25k vectors**: Single optimized index, no partitioning needed
85
- - **25k - 100k**: Semantic clustering (4-8 clusters), balanced performance
86
- - **100k - 1M**: Advanced partitioning (8-16 clusters), scale-optimized
87
- - **1M+ vectors**: Maximum optimization (16-32 clusters), enterprise-grade
32
+ - **🧠 Zero-to-Smart™** - No config files, no tuning parameters, no DevOps headaches. Brainy auto-detects your
33
+ environment and optimizes itself
34
+ - **🌐 True Write-Once, Run-Anywhere** - Same code runs in React, Angular, Vue, Node.js, Deno, Bun, serverless, edge
35
+ workers, and even vanilla HTML (see the browser sketch below)
36
+ - **⚡ Scary Fast** - Handles millions of vectors with sub-millisecond search. Built-in GPU acceleration when available
37
+ - **🎯 Self-Learning** - Like having a database that goes to the gym. Gets faster and smarter the more you use it
38
+ - **🔮 AI-First Design** - Built for the age of embeddings, RAG, and semantic search. Your LLMs will thank you
39
+ - **🎮 Actually Fun to Use** - Clean API, great DX, and it does the heavy lifting so you can build cool stuff
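+
+ A minimal sketch of the vanilla-HTML path (illustrative - any bundler, or an import map that resolves the package, works the same way):
+
+ ```javascript
+ // Runs inside a <script type="module"> tag; top-level await works there
+ import { BrainyData } from '@soulcraft/brainy'
+
+ const db = new BrainyData()
+ await db.init()
+ console.log(await db.searchText('feline pets', 5))
+ ```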
88
40
 
89
- ### 🧠 Adaptive Learning Features
41
+ ### 🚀 NEW: Ultra-Fast Search Performance + Auto-Configuration
90
42
 
91
- - **Performance Monitoring**: Tracks latency, cache hits, memory usage
92
- - **Dynamic Tuning**: Adjusts parameters every 50 searches based on performance
93
- - **Pattern Recognition**: Learns from access patterns to improve predictions
94
- - **Self-Optimization**: Automatically enables/disables features based on workload
43
+ **Your searches just got 100x faster AND Brainy now configures itself!** Advanced performance with zero setup:
95
44
 
96
- > **📖 Full Documentation**: See the complete [Large-Scale Optimizations Guide](docs/optimization-guides/large-scale-optimizations.md) for detailed configuration options and advanced usage.
45
+ - **🤖 Intelligent Auto-Configuration** - Detects environment and usage patterns, optimizes automatically
46
+ - **⚡ Smart Result Caching** - Repeated queries return in <1ms with automatic cache invalidation
47
+ - **📄 Cursor-Based Pagination** - Navigate millions of results with constant O(k) performance
48
+ - **🔄 Real-Time Data Sync** - Cache automatically updates when data changes, even in distributed scenarios
49
+ - **📊 Performance Monitoring** - Built-in hit rate and memory usage tracking with adaptive optimization
50
+ - **🎯 Zero Breaking Changes** - All existing code works unchanged, just faster and smarter
97
51
 
98
- ## 🚀 Live Demo
99
-
100
- **[Try the live demo](https://soulcraft-research.github.io/brainy/demo/index.html)** - Check out the interactive demo on
101
- GitHub Pages that showcases Brainy's main features.
102
-
103
- ## 📊 What Can You Build?
104
-
105
- - **Semantic Search Engines** - Find content based on meaning, not just keywords
106
- - **Recommendation Systems** - Suggest similar items based on vector similarity
107
- - **Knowledge Graphs** - Build connected data structures with relationships
108
- - **AI Applications** - Store and retrieve embeddings for machine learning models
109
- - **AI-Enhanced Applications** - Build applications that leverage vector embeddings for intelligent data processing
110
- - **Data Organization Tools** - Automatically categorize and connect related information
111
- - **Adaptive Experiences** - Create applications that learn and evolve with your users
112
- - **Model-Integrated Systems** - Connect external AI models to Brainy data and tools using MCP
113
-
114
- ## 🔧 Installation
52
+ ```javascript
53
+ // Zero configuration - everything optimized automatically!
54
+ const brainy = new BrainyData() // Auto-detects environment & optimizes
55
+ await brainy.init()
56
+
57
+ // Caching happens automatically - no setup needed!
58
+ const results1 = await brainy.search('query', 10) // ~50ms first time
59
+ const results2 = await brainy.search('query', 10) // <1ms cached hit!
60
+
61
+ // Advanced pagination works instantly
62
+ const page1 = await brainy.searchWithCursor('query', 100)
63
+ const page2 = await brainy.searchWithCursor('query', 100, {
64
+ cursor: page1.cursor // Constant time, no matter how deep!
65
+ })
115
66
 
116
- ```bash
117
- npm install @soulcraft/brainy
67
+ // Monitor auto-optimized performance
68
+ const stats = brainy.getCacheStats()
69
+ console.log(`Auto-tuned cache hit rate: ${(stats.search.hitRate * 100).toFixed(1)}%`)
118
70
  ```
119
71
 
120
- TensorFlow.js packages are included as bundled dependencies and will be automatically installed without any additional
121
- configuration.
122
-
123
- ### Additional Packages
124
-
125
- Brainy offers specialized packages for different use cases:
72
+ ## 🚀 Quick Start (30 seconds!)
126
73
 
127
- #### CLI Package
74
+ ### Node.js TL;DR
128
75
 
129
76
  ```bash
130
- npm install -g @soulcraft/brainy-cli
131
- ```
132
-
133
- Command-line interface for data management, bulk operations, and database administration.
134
-
135
- #### Web Service Package
77
+ # Install
78
+ npm install @soulcraft/brainy
136
79
 
137
- ```bash
138
- npm install @soulcraft/brainy-web-service
80
+ # Use it
139
81
  ```
140
82
 
141
- REST API web service wrapper that provides HTTP endpoints for search operations and database queries.
142
-
143
- ## 🚀 Quick Setup - Zero Configuration!
144
-
145
- **New in v0.36.0**: Brainy now automatically detects your environment and optimizes itself! Choose your scenario:
146
-
147
- ### ✨ Instant Setup (Auto-Everything)
148
- ```typescript
149
- import { createAutoBrainy } from '@soulcraft/brainy'
83
+ ```javascript
84
+ import { createAutoBrainy, NounType, VerbType } from '@soulcraft/brainy'
150
85
 
151
- // That's it! Everything is auto-configured
152
86
  const brainy = createAutoBrainy()
153
87
 
154
- // Add data and search - all optimizations enabled automatically
155
- await brainy.addVector({ id: '1', vector: [0.1, 0.2, 0.3], text: 'Hello world' })
156
- const results = await brainy.search([0.1, 0.2, 0.3], 10)
157
- ```
158
-
159
- ### ๐Ÿ“ฆ With S3 Storage (Still Auto-Configured)
160
- ```typescript
161
- import { createAutoBrainy } from '@soulcraft/brainy'
162
-
163
- // Auto-detects AWS credentials from environment variables
164
- const brainy = createAutoBrainy({
165
- bucketName: 'my-vector-storage'
166
- // region: 'us-east-1' (default)
167
- // AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY from env
168
- })
169
- ```
170
-
171
- ### 🎯 Scenario-Based Setup
172
- ```typescript
173
- import { createQuickBrainy } from '@soulcraft/brainy'
174
-
175
- // Choose your scale: 'small', 'medium', 'large', 'enterprise'
176
- const brainy = await createQuickBrainy('large', {
177
- bucketName: 'my-big-vector-db'
178
- })
179
- ```
180
-
181
- | Scenario | Dataset Size | Memory Usage | S3 Required | Best For |
182
- |----------|-------------|--------------|-------------|----------|
183
- | `small` | ≤10k vectors | ≤1GB | No | Development, testing |
184
- | `medium` | ≤100k vectors | ≤4GB | Serverless only | Production apps |
185
- | `large` | ≤1M vectors | ≤8GB | Yes | Large applications |
186
- | `enterprise` | ≤10M vectors | ≤32GB | Yes | Enterprise systems |
187
-
188
- ### 🧠 What Auto-Configuration Does
189
-
190
- - **🎯 Environment Detection**: Browser, Node.js, or Serverless
191
- - **💾 Smart Memory Management**: Uses available RAM optimally
192
- - **🗄️ Storage Selection**: OPFS, FileSystem, S3, or Memory
193
- - **⚡ Performance Tuning**: Threading, caching, compression
194
- - **📊 Adaptive Learning**: Improves performance over time
195
- - **🔍 Semantic Partitioning**: Auto-clusters similar vectors
196
-
197
- ## ๐Ÿ Traditional Setup (Manual Configuration)
198
-
199
- If you prefer manual control:
200
-
201
- ```typescript
202
- import { BrainyData, NounType, VerbType } from '@soulcraft/brainy'
203
-
204
- // Create and initialize the database
205
- const db = new BrainyData()
206
- await db.init()
207
-
208
- // Add data (automatically converted to vectors)
209
- const catId = await db.add("Cats are independent pets", {
88
+ // Add data with Nouns (entities)
89
+ const catId = await brainy.add("Siamese cats are elegant and vocal", {
210
90
  noun: NounType.Thing,
211
- category: 'animal'
91
+ breed: "Siamese",
92
+ category: "animal"
212
93
  })
213
94
 
214
- const dogId = await db.add("Dogs are loyal companions", {
215
- noun: NounType.Thing,
216
- category: 'animal'
95
+ const ownerId = await brainy.add("John loves his pets", {
96
+ noun: NounType.Person,
97
+ name: "John Smith"
217
98
  })
218
99
 
219
- // Search for similar items
220
- const results = await db.searchText("feline pets", 2)
221
- console.log(results)
222
-
223
- // Add a relationship between items
224
- await db.addVerb(catId, dogId, {
225
- verb: VerbType.RelatedTo,
226
- description: 'Both are common household pets'
100
+ // Connect with Verbs (relationships)
101
+ await brainy.addVerb(ownerId, catId, {
102
+ verb: VerbType.Owns,
103
+ since: "2020-01-01"
227
104
  })
228
- ```
229
-
230
- ### Import Options
231
-
232
- ```typescript
233
- // Standard import - automatically adapts to any environment
234
- import { BrainyData } from '@soulcraft/brainy'
235
-
236
- // Minified version for production
237
- import { BrainyData } from '@soulcraft/brainy/min'
238
- ```
239
-
240
- > **Note**: The CLI functionality is available as a separate package `@soulcraft/brainy-cli` to reduce the bundle size
241
- > of the main package. Install it globally with `npm install -g @soulcraft/brainy-cli` to use the command-line
242
- > interface.
243
-
244
- ### Browser Usage
245
-
246
- ```html
247
-
248
- <script type="module">
249
- // Use local files instead of CDN
250
- import { BrainyData } from './dist/unified.js'
251
-
252
- // Or minified version
253
- // import { BrainyData } from './dist/unified.min.js'
254
-
255
- const db = new BrainyData()
256
- await db.init()
257
- // ...
258
- </script>
259
- ```
260
-
261
- Modern bundlers like Webpack, Rollup, and Vite will automatically use the unified build which adapts to any environment.
262
105
 
263
- ## 🧩 How It Works
106
+ // Search by meaning
107
+ const results = await brainy.searchText("feline companions", 5)
264
108
 
265
- Brainy combines **six advanced optimization systems** with core vector database technologies to create a production-ready, self-optimizing system:
266
-
267
- ### 🔧 Core Technologies
268
- 1. **Vector Embeddings** - Converts data (text, images, etc.) into numerical vectors using TensorFlow.js
269
- 2. **Optimized HNSW Algorithm** - Fast similarity search with semantic partitioning and distributed processing
270
- 3. **🧠 Auto-Configuration Engine** - Detects environment, resources, and data patterns to optimize automatically
271
- 4. **🎯 Intelligent Storage System** - Multi-level caching with predictive prefetching and batch operations
272
-
273
- ### ⚡ Advanced Optimization Layer
274
- 5. **Semantic Partitioning** - Auto-clusters similar vectors for faster search (4-32 clusters based on scale)
275
- 6. **Distributed Search** - Parallel processing across partitions with intelligent load balancing
276
- 7. **Multi-Level Caching** - Hot (RAM) → Warm (Fast Storage) → Cold (S3/Disk) with 70-90% hit rates
277
- 8. **Batch Operations** - Reduces S3 API calls by 50-90% through intelligent batching
278
- 9. **Adaptive Learning** - Continuously learns from usage patterns and optimizes performance
279
- 10. **Advanced Compression** - Vector quantization achieves 75% memory reduction for large datasets
280
-
281
- ### 🎯 Environment-Specific Optimizations
282
-
283
- | Environment | Storage | Threading | Memory | Focus |
284
- |-------------|---------|-----------|---------|-------|
285
- | **Browser** | OPFS + Cache | Web Workers | 512MB-1GB | Responsiveness |
286
- | **Node.js** | FileSystem + S3 | Worker Threads | 4GB-8GB+ | Throughput |
287
- | **Serverless** | S3 + Memory | Limited | 1GB-2GB | Cold Start Speed |
288
-
289
- ### 🔄 Adaptive Intelligence Flow
290
- ```
291
- Data Input → Auto-Detection → Environment Optimization → Semantic Partitioning →
292
- Distributed Search → Multi-Level Caching → Performance Learning → Self-Tuning
293
- ```
294
-
295
- The system **automatically adapts** to your environment, learns from your usage patterns, and **continuously optimizes itself** for better performance over time.
296
-
297
- ## 🚀 The Brainy Pipeline
298
-
299
- Brainy's data processing pipeline transforms raw data into searchable, connected knowledge that gets smarter over time:
109
+ // Search JSON documents by specific fields
110
+ const docs = await brainy.searchDocuments("Siamese", {
111
+ fields: ['breed', 'category'], // Search these fields
112
+ weights: { breed: 2.0 }, // Prioritize breed matches
113
+ limit: 10
114
+ })
300
115
 
301
- ```
302
- Raw Data → Embedding → Vector Storage → Graph Connections → Adaptive Learning → Query & Retrieval
116
+ // Find relationships
117
+ const johnsPets = await brainy.getVerbsBySource(ownerId, VerbType.Owns)
303
118
  ```
304
119
 
305
- Each time data flows through this pipeline, Brainy learns more about your usage patterns and environment, making future
306
- operations faster and more relevant.
307
-
308
- ### Pipeline Stages
309
-
310
- 1. **Data Ingestion**
311
- - Raw text or pre-computed vectors enter the pipeline
312
- - Data is validated and prepared for processing
313
-
314
- 2. **Embedding Generation**
315
- - Text is transformed into numerical vectors using embedding models
316
- - Uses TensorFlow Universal Sentence Encoder for high-quality text embeddings
317
- - Custom embedding functions can be plugged in for specialized domains
318
-
319
- 3. **Vector Indexing**
320
- - Vectors are indexed using the HNSW algorithm
321
- - Hierarchical structure enables fast similarity search
322
- - Configurable parameters for precision vs. performance tradeoffs
323
-
324
- 4. **Graph Construction**
325
- - Nouns (entities) become nodes in the knowledge graph
326
- - Verbs (relationships) connect related entities
327
- - Typed relationships add semantic meaning to connections
328
-
329
- 5. **Adaptive Learning**
330
- - Analyzes usage patterns to optimize future operations
331
- - Tunes performance parameters based on your environment
332
- - Adjusts search strategies based on query history
333
- - Becomes more efficient and relevant the more you use it
334
-
335
- 6. **Intelligent Storage**
336
- - Data is saved using the optimal storage for your environment
337
- - Automatic selection between OPFS, filesystem, S3, or memory
338
- - Migrates between storage types as your application's needs evolve
339
- - Scales from tiny datasets to massive data collections
340
- - Configurable storage adapters for custom persistence needs
341
-
342
- ### Augmentation Types
343
-
344
- Brainy uses a powerful augmentation system to extend functionality. Augmentations are processed in the following order:
345
-
346
- 1. **SENSE**
347
- - Ingests and processes raw, unstructured data into nouns and verbs
348
- - Handles text, images, audio streams, and other input formats
349
- - Example: Converting raw text into structured entities
350
-
351
- 2. **MEMORY**
352
- - Provides storage capabilities for data in different formats
353
- - Manages persistence across sessions
354
- - Example: Storing vectors in OPFS or filesystem
355
-
356
- 3. **COGNITION**
357
- - Enables advanced reasoning, inference, and logical operations
358
- - Analyzes relationships between entities
359
- - Examples:
360
- - Inferring new connections between existing data
361
- - Deriving insights from graph relationships
362
-
363
- 4. **CONDUIT**
364
- - Establishes channels for structured data exchange
365
- - Connects with external systems and syncs between Brainy instances
366
- - Two built-in iConduit augmentations for scaling out and syncing:
367
- - **WebSocket iConduit** - Syncs data between browsers and servers
368
- - **WebRTC iConduit** - Direct peer-to-peer syncing between browsers
369
- - Examples:
370
- - Integrating with third-party APIs
371
- - Syncing Brainy instances between browsers using WebSockets
372
- - Peer-to-peer syncing between browsers using WebRTC
373
-
374
- 5. **ACTIVATION**
375
- - Initiates actions, responses, or data manipulations
376
- - Triggers events based on data changes
377
- - Example: Sending notifications when new data is processed
378
-
379
- 6. **PERCEPTION**
380
- - Interprets, contextualizes, and visualizes identified nouns and verbs
381
- - Creates meaningful representations of data
382
- - Example: Generating visualizations of graph relationships
383
-
384
- 7. **DIALOG**
385
- - Facilitates natural language understanding and generation
386
- - Enables conversational interactions
387
- - Example: Processing user queries and generating responses
388
-
389
- 8. **WEBSOCKET**
390
- - Enables real-time communication via WebSockets
391
- - Can be combined with other augmentation types
392
- - Example: Streaming data processing in real-time
393
-
394
- ### Streaming Data Support
395
-
396
- Brainy's pipeline is designed to handle streaming data efficiently:
397
-
398
- 1. **WebSocket Integration**
399
- - Built-in support for WebSocket connections
400
- - Process data as it arrives without blocking
401
- - Example: `setupWebSocketPipeline(url, dataType, options)`
402
-
403
- 2. **Asynchronous Processing**
404
- - Non-blocking architecture for real-time data handling
405
- - Parallel processing of incoming streams
406
- - Example: `createWebSocketHandler(connection, dataType, options)`
407
-
408
- 3. **Event-Based Architecture**
409
- - Augmentations can listen to data feeds and streams
410
- - Real-time updates propagate through the pipeline
411
- - Example: `listenToFeed(feedUrl, callback)`
412
-
413
- 4. **Threaded Execution**
414
- - Comprehensive multi-threading for high-performance operations
415
- - Parallel processing for batch operations, vector calculations, and embedding generation
416
- - Configurable execution modes (SEQUENTIAL, PARALLEL, THREADED)
417
- - Automatic thread management based on environment capabilities
418
- - Example: `executeTypedPipeline(augmentations, method, args, { mode: ExecutionMode.THREADED })`
419
-
420
- ### Running the Pipeline
421
-
422
- The pipeline runs automatically when you:
120
+ That's it! No config, no setup, Zero-to-Smart™.
423
121
 
424
- ```typescript
425
- // Add data (runs embedding โ†’ indexing โ†’ storage)
426
- const id = await db.add("Your text data here", { metadata })
427
-
428
- // Search (runs embedding โ†’ similarity search)
429
- const results = await db.searchText("Your query here", 5)
122
+ ### ๐ŸŒ Distributed Mode Example (NEW!)
430
123
 
431
- // Connect entities (runs graph construction โ†’ storage)
432
- await db.addVerb(sourceId, targetId, { verb: VerbType.RelatedTo })
433
- ```
124
+ ```javascript
125
+ // Writer Instance - Ingests data from multiple sources
126
+ const writer = createAutoBrainy({
127
+ storage: { s3Storage: { bucketName: 'my-bucket' } },
128
+ distributed: { role: 'writer' } // Explicit role for safety
129
+ })
434
130
 
435
- Using the CLI:
131
+ // Reader Instance - Optimized for search queries
132
+ const reader = createAutoBrainy({
133
+ storage: { s3Storage: { bucketName: 'my-bucket' } },
134
+ distributed: { role: 'reader' } // 80% memory for cache
135
+ })
436
136
 
437
- ```bash
438
- # Add data through the CLI pipeline
439
- brainy add "Your text data here" '{"noun":"Thing"}'
137
+ // Data automatically gets domain tags
138
+ await writer.add("Patient shows symptoms of...", {
139
+ diagnosis: "flu" // Auto-tagged as 'medical' domain
140
+ })
440
141
 
441
- # Search through the CLI pipeline
442
- brainy search "Your query here" --limit 5
142
+ // Domain-aware search across all partitions
143
+ const results = await reader.search("medical symptoms", 10, {
144
+ filter: { domain: 'medical' } // Only search medical data
145
+ })
443
146
 
444
- # Connect entities through the CLI
445
- brainy addVerb <sourceId> <targetId> RelatedTo
147
+ // Monitor health across all instances
148
+ const health = reader.getHealthStatus()
149
+ console.log(`Instance ${health.instanceId}: ${health.status}`)
446
150
  ```
447
151
 
448
- ### Extending the Pipeline
449
-
450
- Brainy's pipeline is designed for extensibility at every stage:
451
-
452
- 1. **Custom Embedding**
453
- ```typescript
454
- // Create your own embedding function
455
- const myEmbedder = async (text) => {
456
- // Your custom embedding logic here
457
- return [0.1, 0.2, 0.3, ...] // Return a vector
458
- }
459
-
460
- // Use it in Brainy
461
- const db = new BrainyData({
462
- embeddingFunction: myEmbedder
463
- })
464
- ```
465
-
466
- 2. **Custom Distance Functions**
467
- ```typescript
468
- // Define your own distance function
469
- const myDistance = (a, b) => {
470
- // Your custom distance calculation
471
- return Math.sqrt(a.reduce((sum, val, i) => sum + Math.pow(val - b[i], 2), 0))
472
- }
473
-
474
- // Use it in Brainy
475
- const db = new BrainyData({
476
- distanceFunction: myDistance
477
- })
478
- ```
479
-
480
- 3. **Custom Storage Adapters**
481
- ```typescript
482
- // Implement the StorageAdapter interface
483
- class MyStorage implements StorageAdapter {
484
- // Your storage implementation
485
- }
486
-
487
- // Use it in Brainy
488
- const db = new BrainyData({
489
- storageAdapter: new MyStorage()
490
- })
491
- ```
492
-
493
- 4. **Augmentations System**
494
- ```typescript
495
- // Create custom augmentations to extend functionality
496
- const myAugmentation = {
497
- type: 'memory',
498
- name: 'my-custom-storage',
499
- // Implementation details
500
- }
501
-
502
- // Register with Brainy
503
- db.registerAugmentation(myAugmentation)
504
- ```
505
-
506
- ## Data Model
507
-
508
- Brainy uses a graph-based data model with two primary concepts:
509
-
510
- ### Nouns (Entities)
511
-
512
- The main entities in your data (nodes in the graph):
513
-
514
- - Each noun has a unique ID, vector representation, and metadata
515
- - Nouns can be categorized by type (Person, Place, Thing, Event, Concept, etc.)
516
- - Nouns are automatically vectorized for similarity search
517
-
518
- ### Verbs (Relationships)
519
-
520
- Connections between nouns (edges in the graph):
521
-
522
- - Each verb connects a source noun to a target noun
523
- - Verbs have types that define the relationship (RelatedTo, Controls, Contains, etc.)
524
- - Verbs can have their own metadata to describe the relationship
525
-
526
- ### Type Utilities
527
-
528
- Brainy provides utility functions to access lists of noun and verb types:
529
-
530
- ```typescript
531
- import {
532
- NounType,
533
- VerbType,
534
- getNounTypes,
535
- getVerbTypes,
536
- getNounTypeMap,
537
- getVerbTypeMap
538
- } from '@soulcraft/brainy'
539
-
540
- // At development time:
541
- // Access specific types directly from the NounType and VerbType objects
542
- console.log(NounType.Person) // 'person'
543
- console.log(VerbType.Contains) // 'contains'
544
-
545
- // At runtime:
546
- // Get a list of all noun types
547
- const nounTypes = getNounTypes() // ['person', 'organization', 'location', ...]
548
-
549
- // Get a list of all verb types
550
- const verbTypes = getVerbTypes() // ['relatedTo', 'contains', 'partOf', ...]
551
-
552
- // Get a map of noun type keys to values
553
- const nounTypeMap = getNounTypeMap() // { Person: 'person', Organization: 'organization', ... }
554
-
555
- // Get a map of verb type keys to values
556
- const verbTypeMap = getVerbTypeMap() // { RelatedTo: 'relatedTo', Contains: 'contains', ... }
557
- ```
152
+ ## 🎭 Key Features
558
153
 
559
- These utility functions make it easy to:
154
+ ### Core Capabilities
560
155
 
561
- - Get a complete list of available noun and verb types
562
- - Validate user input against valid types
563
- - Create dynamic UI components that display or select from available types
564
- - Map between type keys and their string values
156
+ - **Vector Search** - Find semantically similar content using embeddings
157
+ - **Graph Relationships** - Connect data with meaningful relationships
158
+ - **JSON Document Search** - Search within specific fields with prioritization
159
+ - **Distributed Mode** - Scale horizontally with automatic coordination between instances
160
+ - **Real-Time Syncing** - WebSocket and WebRTC for distributed instances
161
+ - **Streaming Pipeline** - Process data in real-time as it flows through
162
+ - **Model Control Protocol** - Let AI models access your data
163
+
164
+ ### Smart Optimizations
165
+
166
+ - **🤖 Intelligent Auto-Configuration** - Detects environment, usage patterns, and optimizes everything automatically
167
+ - **⚡ Runtime Performance Adaptation** - Continuously monitors and self-tunes based on real usage
168
+ - **🌐 Distributed Mode Detection** - Automatically enables real-time updates for shared storage scenarios
169
+ - **📊 Workload-Aware Optimization** - Adapts cache size and TTL based on read/write patterns
170
+ - **🧠 Adaptive Learning** - Gets smarter with usage, learns from your data access patterns
171
+ - **#️⃣ Intelligent Partitioning** - Hash-based partitioning for even load distribution
172
+ - **🎯 Role-Based Optimization** - Readers maximize cache, writers optimize throughput
173
+ - **🏷️ Domain-Aware Indexing** - Automatic categorization improves search relevance
174
+ - **🗂️ Multi-Level Caching** - Hot/warm/cold caching with predictive prefetching
175
+ - **💾 Memory Optimization** - 75% reduction with compression for large datasets
176
+
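+ None of this needs a switch to flip; one quick way to watch it work is the `getCacheStats()` call from the performance example above:
+
+ ```javascript
+ // Hit rate and memory figures come from the built-in performance monitoring
+ const stats = brainy.getCacheStats()
+ console.log(`Cache hit rate: ${(stats.search.hitRate * 100).toFixed(1)}%`)
+ ```
+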
177
+ ### Developer Experience
565
178
 
566
- ## Command Line Interface
179
+ - **TypeScript Support** - Fully typed API with generics
180
+ - **Extensible Augmentations** - Customize and extend functionality (see the sketch below)
181
+ - **REST API** - Web service wrapper for HTTP endpoints
182
+ - **Auto-Complete** - IntelliSense for all APIs and types
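+
+ A minimal sketch of a custom augmentation, using the `registerAugmentation()` hook documented in earlier releases of this README (the object shape here is illustrative):
+
+ ```javascript
+ // An augmentation is a plain object: a type, a name, and the
+ // implementation details for that augmentation type
+ const myAugmentation = {
+   type: 'memory',
+   name: 'my-custom-storage'
+   // ...implementation details
+ }
+
+ brainy.registerAugmentation(myAugmentation)
+ ```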
567
183
 
568
- Brainy includes a powerful CLI for managing your data. The CLI is available as a separate package
569
- `@soulcraft/brainy-cli` to reduce the bundle size of the main package.
184
+ ## 📦 Installation
570
185
 
571
- ### Installing and Using the CLI
186
+ ### Main Package
572
187
 
573
188
  ```bash
574
- # Install the CLI globally
575
- npm install -g @soulcraft/brainy-cli
576
-
577
- # Initialize a database
578
- brainy init
579
-
580
- # Add some data
581
- brainy add "Cats are independent pets" '{"noun":"Thing","category":"animal"}'
582
- brainy add "Dogs are loyal companions" '{"noun":"Thing","category":"animal"}'
583
-
584
- # Search for similar items
585
- brainy search "feline pets" 5
586
-
587
- # Add relationships between items
588
- brainy addVerb <sourceId> <targetId> RelatedTo '{"description":"Both are pets"}'
589
-
590
- # Visualize the graph structure
591
- brainy visualize
592
- brainy visualize --root <id> --depth 3
189
+ npm install @soulcraft/brainy
593
190
  ```
594
191
 
595
- ### Using the CLI in Your Code
596
-
597
- The CLI functionality is available as a separate package `@soulcraft/brainy-cli`. If you need CLI functionality in your
598
- application, install the CLI package:
192
+ ### Optional: Offline Models Package
599
193
 
600
194
  ```bash
601
- npm install @soulcraft/brainy-cli
195
+ npm install @soulcraft/brainy-models
602
196
  ```
603
197
 
604
- Then you can use the CLI commands programmatically or through the command line interface.
605
-
606
- ### Available Commands
607
-
608
- #### Basic Database Operations:
609
-
610
- - `init` - Initialize a new database
611
- - `add <text> [metadata]` - Add a new noun with text and optional metadata
612
- - `search <query> [limit]` - Search for nouns similar to the query
613
- - `get <id>` - Get a noun by ID
614
- - `delete <id>` - Delete a noun by ID
615
- - `addVerb <sourceId> <targetId> <verbType> [metadata]` - Add a relationship
616
- - `getVerbs <id>` - Get all relationships for a noun
617
- - `status` - Show database status
618
- - `clear` - Clear all data from the database
619
- - `generate-random-graph` - Generate test data
620
- - `visualize` - Visualize the graph structure
621
- - `completion-setup` - Setup shell autocomplete
622
-
623
- #### Pipeline and Augmentation Commands:
624
-
625
- - `list-augmentations` - List all available augmentation types and registered augmentations
626
- - `augmentation-info <type>` - Get detailed information about a specific augmentation type
627
- - `test-pipeline [text]` - Test the sequential pipeline with sample data
628
- - `-t, --data-type <type>` - Type of data to process (default: 'text')
629
- - `-m, --mode <mode>` - Execution mode: sequential, parallel, threaded (default: 'sequential')
630
- - `-s, --stop-on-error` - Stop execution if an error occurs
631
- - `-v, --verbose` - Show detailed output
632
- - `stream-test` - Test streaming data through the pipeline (simulated)
633
- - `-c, --count <number>` - Number of data items to stream (default: 5)
634
- - `-i, --interval <ms>` - Interval between data items in milliseconds (default: 1000)
635
- - `-t, --data-type <type>` - Type of data to process (default: 'text')
636
- - `-v, --verbose` - Show detailed output
637
-
638
- ## 📚 Documentation
639
-
640
- ### 🚀 [Getting Started](docs/getting-started/)
641
- Quick setup guides and first steps with Brainy.
642
-
643
- - **[Installation](docs/getting-started/installation.md)** - Installation and setup
644
- - **[Quick Start](docs/getting-started/quick-start.md)** - Get running in 2 minutes
645
- - **[First Steps](docs/getting-started/first-steps.md)** - Core concepts and features
646
- - **[Environment Setup](docs/getting-started/environment-setup.md)** - Environment-specific configuration
198
+ The `@soulcraft/brainy-models` package provides **offline access** to the Universal Sentence Encoder model, eliminating
199
+ network dependencies and ensuring consistent performance. Perfect for:
647
200
 
648
- ### 📖 [User Guides](docs/user-guides/)
649
- Comprehensive guides for using Brainy effectively.
201
+ - **Air-gapped environments** - No internet? No problem
202
+ - **Consistent performance** - No network latency or throttling
203
+ - **Privacy-focused apps** - Keep everything local
204
+ - **High-reliability systems** - No external dependencies
650
205
 
651
- - **[Search and Metadata](docs/user-guides/SEARCH_AND_METADATA_GUIDE.md)** - Advanced search techniques
652
- - **[Write-Only Mode](docs/user-guides/WRITEONLY_MODE_IMPLEMENTATION.md)** - High-throughput data loading
653
- - **[JSON Document Search](docs/guides/json-document-search.md)** - Search within JSON fields
654
- - **[Production Migration](docs/guides/production-migration-guide.md)** - Deployment best practices
655
-
656
- ### ⚡ [Optimization Guides](docs/optimization-guides/)
657
- Transform Brainy from prototype to production-ready system.
658
-
659
- - **[Large-Scale Optimizations](docs/optimization-guides/large-scale-optimizations.md)** - Complete v0.36.0 optimization system
660
- - **[Auto-Configuration](docs/optimization-guides/auto-configuration.md)** - Intelligent environment detection
661
- - **[Memory Optimization](docs/optimization-guides/memory-optimization.md)** - Advanced memory management
662
- - **[Storage Optimization](docs/optimization-guides/storage-optimization.md)** - S3 and storage optimization
663
-
664
- ### 🔧 [API Reference](docs/api-reference/)
665
- Complete API documentation and method references.
206
+ ```javascript
207
+ import { createAutoBrainy } from '@soulcraft/brainy'
208
+ import { BundledUniversalSentenceEncoder } from '@soulcraft/brainy-models'
666
209
 
667
- - **[Core API](docs/api-reference/core-api.md)** - Main BrainyData class methods
668
- - **[Vector Operations](docs/api-reference/vector-operations.md)** - Vector storage and search
669
- - **[Configuration](docs/api-reference/configuration.md)** - System configuration
670
- - **[Auto-Configuration API](docs/api-reference/auto-configuration-api.md)** - Intelligent configuration
210
+ // Use the bundled model for offline operation
211
+ const brainy = createAutoBrainy({
212
+ embeddingModel: BundledUniversalSentenceEncoder
213
+ })
214
+ ```
671
215
 
672
- ### 💡 [Examples](docs/examples/)
673
- Practical code examples and real-world applications.
216
+ ## 🎨 Build Amazing Things
674
217
 
675
- - **[Basic Usage](docs/examples/basic-usage.md)** - Simple examples to get started
676
- - **[Advanced Patterns](docs/examples/advanced-patterns.md)** - Complex use cases
677
- - **[Integrations](docs/examples/integrations.md)** - Third-party service integrations
678
- - **[Performance Examples](docs/examples/performance.md)** - Optimization and scaling
218
+ **🤖 AI Chat Applications** - Build ChatGPT-like apps with long-term memory and context awareness
219
+ **🔍 Semantic Search Engines** - Search by meaning, not keywords. Find "that thing that's like a cat but bigger" →
220
+ returns "tiger"
221
+ **🎯 Recommendation Engines** - "Users who liked this also liked..." but actually good
222
+ **🧬 Knowledge Graphs** - Connect everything to everything. Wikipedia meets Neo4j meets magic
223
+ **👁️ Computer Vision Apps** - Store and search image embeddings. "Find all photos with dogs wearing hats"
224
+ **🎵 Music Discovery** - Find songs that "feel" similar. Spotify's Discover Weekly in your app
225
+ **📚 Smart Documentation** - Docs that answer questions. "How do I deploy to production?" → relevant guides
226
+ **🛡️ Fraud Detection** - Find patterns humans can't see. Anomaly detection on steroids
227
+ **🌐 Real-Time Collaboration** - Sync vector data across devices. Figma for AI data
228
+ **🏥 Medical Diagnosis Tools** - Match symptoms to conditions using embedding similarity
679
229
 
680
- ### 🔬 Technical Documentation
230
+ ## 🧬 The Power of Nouns & Verbs
681
231
 
682
- - **[Testing Guide](docs/technical/TESTING.md)** - Testing strategies and best practices
683
- - **[Statistics Guide](STATISTICS.md)** - Database statistics and monitoring
684
- - **[Technical Guides](TECHNICAL_GUIDES.md)** - Advanced technical topics
232
+ Brainy uses a **graph-based data model** that mirrors how humans think - with **Nouns** (entities) connected by
233
+ **Verbs** (relationships). This isn't just vectors in a void; it's structured, meaningful data.
685
234
 
686
- ## API Reference
235
+ ### ๐Ÿ“ Nouns (What Things Are)
687
236
 
688
- ### Database Management
237
+ Nouns are your entities - the "things" in your data. Each noun has:
689
238
 
690
- ```typescript
691
- // Initialize the database
692
- await db.init()
239
+ - A unique ID
240
+ - A vector representation (for similarity search)
241
+ - A type (Person, Document, Concept, etc.)
242
+ - Custom metadata
693
243
 
694
- // Clear all data
695
- await db.clear()
244
+ **Available Noun Types:**
696
245
 
697
- // Get database status
698
- const status = await db.status()
246
+ | Category | Types | Use For |
247
+ |---------------------|-------------------------------------------------------------------|-------------------------------------------------------|
248
+ | **Core Entities** | `Person`, `Organization`, `Location`, `Thing`, `Concept`, `Event` | People, companies, places, objects, ideas, happenings |
249
+ | **Digital Content** | `Document`, `Media`, `File`, `Message`, `Content` | PDFs, images, videos, emails, posts, generic content |
250
+ | **Collections** | `Collection`, `Dataset` | Groups of items, structured data sets |
251
+ | **Business** | `Product`, `Service`, `User`, `Task`, `Project` | E-commerce, SaaS, project management |
252
+ | **Descriptive** | `Process`, `State`, `Role` | Workflows, conditions, responsibilities |
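+
+ Any of these drops straight into `add()`; a quick sketch with a couple of types from the table above:
+
+ ```javascript
+ // Same add() call as the Quick Start, just with different noun types
+ const reportId = await brainy.add("Q3 earnings report", {
+   noun: NounType.Document,
+   format: "pdf"
+ })
+ const acmeId = await brainy.add("Acme Corporation", {
+   noun: NounType.Organization
+ })
+ ```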
699
253
 
700
- // Backup all data from the database
701
- const backupData = await db.backup()
254
+ ### 🔗 Verbs (How Things Connect)
702
255
 
703
- // Restore data into the database
704
- const restoreResult = await db.restore(backupData, { clearExisting: true })
705
- ```
256
+ Verbs are your relationships - they give meaning to connections. Not just "these vectors are similar" but "this OWNS
257
+ that" or "this CAUSES that".
706
258
 
707
- ### Database Statistics
259
+ **Available Verb Types:**
708
260
 
709
- Brainy provides a way to get statistics about the current state of the database. For detailed information about the
710
- statistics system, including implementation details, scalability improvements, and usage examples, see
711
- our [Statistics Guide](STATISTICS.md).
261
+ | Category | Types | Examples |
262
+ |----------------|----------------------------------------------------------------------|------------------------------------------|
263
+ | **Core** | `RelatedTo`, `Contains`, `PartOf`, `LocatedAt`, `References` | Generic relations, containment, location |
264
+ | **Temporal** | `Precedes`, `Succeeds`, `Causes`, `DependsOn`, `Requires` | Time sequences, causality, dependencies |
265
+ | **Creation** | `Creates`, `Transforms`, `Becomes`, `Modifies`, `Consumes` | Creation, change, consumption |
266
+ | **Ownership** | `Owns`, `AttributedTo`, `CreatedBy`, `BelongsTo` | Ownership, authorship, belonging |
267
+ | **Social** | `MemberOf`, `WorksWith`, `FriendOf`, `Follows`, `Likes`, `ReportsTo` | Social networks, organizations |
268
+ | **Functional** | `Describes`, `Implements`, `Validates`, `Triggers`, `Serves` | Functions, implementations, services |
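+
+ Each verb type also works as a lookup key; a minimal sketch, assuming the `getVerbsByType()` helper documented in earlier releases alongside `getVerbsBySource()`/`getVerbsByTarget()`:
+
+ ```javascript
+ // Fetch every Contains relationship in the graph
+ const containsVerbs = await brainy.getVerbsByType(VerbType.Contains)
+ ```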
712
269
 
713
- ```typescript
714
- import { BrainyData, getStatistics } from '@soulcraft/brainy'
270
+ ### 💡 Why This Matters
715
271
 
716
- // Create and initialize the database
717
- const db = new BrainyData()
718
- await db.init()
719
-
720
- // Get statistics using the instance method
721
- const stats = await db.getStatistics()
722
- console.log(stats)
723
- // Output: { nounCount: 0, verbCount: 0, metadataCount: 0, hnswIndexSize: 0, serviceBreakdown: {...} }
724
- ```
725
-
726
- ### Working with Nouns (Entities)
272
+ ```javascript
273
+ // Traditional vector DB: Just similarity
274
+ const similar = await vectorDB.search(embedding, 10)
275
+ // Result: [vector1, vector2, ...] - What do these mean? 🤷
727
276
 
728
- ```typescript
729
- // Add a noun (automatically vectorized)
730
- const id = await db.add(textOrVector, {
277
+ // Brainy: Similarity + Meaning + Relationships
278
+ const catId = await brainy.add("Siamese cat", {
731
279
  noun: NounType.Thing,
732
- // other metadata...
280
+ breed: "Siamese"
733
281
  })
734
-
735
- // Add multiple nouns in parallel (with multithreading and batch embedding)
736
- const ids = await db.addBatch([
737
- {
738
- vectorOrData: "First item to add",
739
- metadata: { noun: NounType.Thing, category: 'example' }
740
- },
741
- {
742
- vectorOrData: "Second item to add",
743
- metadata: { noun: NounType.Thing, category: 'example' }
744
- },
745
- // More items...
746
- ], {
747
- forceEmbed: false,
748
- concurrency: 4, // Control the level of parallelism (default: 4)
749
- batchSize: 50 // Control the number of items to process in a single batch (default: 50)
282
+ const ownerId = await brainy.add("John Smith", {
283
+ noun: NounType.Person
750
284
  })
751
-
752
- // Retrieve a noun
753
- const noun = await db.get(id)
754
-
755
- // Update noun metadata
756
- await db.updateMetadata(id, {
757
- noun: NounType.Thing,
758
- // updated metadata...
285
+ await brainy.addVerb(ownerId, catId, {
286
+ verb: VerbType.Owns,
287
+ since: "2020-01-01"
759
288
  })
760
289
 
761
- // Delete a noun
762
- await db.delete(id)
763
-
764
- // Search for similar nouns
765
- const results = await db.search(vectorOrText, numResults)
766
- const textResults = await db.searchText("query text", numResults)
767
-
768
- // Search by noun type
769
- const thingNouns = await db.searchByNounTypes([NounType.Thing], numResults)
770
-
771
- // Search within specific fields of JSON documents
772
- const fieldResults = await db.search("Acme Corporation", 10, {
773
- searchField: "company"
774
- })
775
-
776
- // Search using standard field names across different services
777
- const titleResults = await db.searchByStandardField("title", "climate change", 10)
778
- const authorResults = await db.searchByStandardField("author", "johndoe", 10, {
779
- services: ["github", "reddit"]
780
- })
781
- ```
782
-
783
- ### Field Standardization and Service Tracking
784
-
785
- Brainy automatically tracks field names from JSON documents and associates them with the service that inserted the data.
786
- This enables powerful cross-service search capabilities:
787
-
788
- ```typescript
789
- // Get all available field names organized by service
790
- const fieldNames = await db.getAvailableFieldNames()
791
- // Example output: { "github": ["repository.name", "issue.title"], "reddit": ["title", "selftext"] }
792
-
793
- // Get standard field mappings
794
- const standardMappings = await db.getStandardFieldMappings()
795
- // Example output: { "title": { "github": ["repository.name"], "reddit": ["title"] } }
290
+ // Now you can search with context!
291
+ const johnsPets = await brainy.getVerbsBySource(ownerId, VerbType.Owns)
292
+ const catOwners = await brainy.getVerbsByTarget(catId, VerbType.Owns)
796
293
  ```
797
294
 
798
- When adding data, specify the service name to ensure proper field tracking:
295
+ ## ๐ŸŒ Distributed Mode (New!)
799
296
 
800
- ```typescript
801
- // Add data with service name
802
- await db.add(jsonData, metadata, { service: "github" })
803
- ```
297
+ Brainy now supports **distributed deployments** with multiple specialized instances sharing the same data. Perfect for
298
+ scaling your AI applications across multiple servers.
804
299
 
805
- ### Working with Verbs (Relationships)
300
+ ### Distributed Setup
806
301
 
807
- ```typescript
808
- // Add a relationship between nouns
809
- await db.addVerb(sourceId, targetId, {
810
- verb: VerbType.RelatedTo,
811
- // other metadata...
302
+ ```javascript
303
+ // Single instance (no change needed!)
304
+ const brainy = createAutoBrainy({
305
+ storage: { s3Storage: { bucketName: 'my-bucket' } }
812
306
  })
813
307
 
814
- // Add a relationship with auto-creation of missing nouns
815
- // This is useful when the target noun might not exist yet
816
- await db.addVerb(sourceId, targetId, {
817
- verb: VerbType.RelatedTo,
818
- // Enable auto-creation of missing nouns
819
- autoCreateMissingNouns: true,
820
- // Optional metadata for auto-created nouns
821
- missingNounMetadata: {
822
- noun: NounType.Concept,
823
- description: 'Auto-created noun'
824
- }
308
+ // Distributed mode requires explicit role configuration
309
+ // Option 1: Via environment variable
310
+ process.env.BRAINY_ROLE = 'writer' // or 'reader' or 'hybrid'
311
+ const distributedBrainy = createAutoBrainy({
312
+ storage: { s3Storage: { bucketName: 'my-bucket' } },
313
+ distributed: true
825
314
  })
826
315
 
827
- // Get all relationships
828
- const verbs = await db.getAllVerbs()
829
-
830
- // Get relationships by source noun
831
- const outgoingVerbs = await db.getVerbsBySource(sourceId)
832
-
833
- // Get relationships by target noun
834
- const incomingVerbs = await db.getVerbsByTarget(targetId)
316
+ // Option 2: Via configuration
317
+ const writer = createAutoBrainy({
318
+ storage: { s3Storage: { bucketName: 'my-bucket' } },
319
+ distributed: { role: 'writer' } // Handles data ingestion
320
+ })
835
321
 
836
- // Get relationships by type
837
- const containsVerbs = await db.getVerbsByType(VerbType.Contains)
322
+ const reader = createAutoBrainy({
323
+ storage: { s3Storage: { bucketName: 'my-bucket' } },
324
+ distributed: { role: 'reader' } // Optimized for queries
325
+ })
838
326
 
839
- // Get a specific relationship
840
- const verb = await db.getVerb(verbId)
327
+ // Option 3: Via read/write mode (role auto-inferred)
328
+ const writerByMode = createAutoBrainy({
329
+ storage: { s3Storage: { bucketName: 'my-bucket' } },
330
+ writeOnly: true, // Automatically becomes 'writer' role
331
+ distributed: true
332
+ })
841
333
 
842
- // Delete a relationship
843
- await db.deleteVerb(verbId)
334
+ const readerByMode = createAutoBrainy({
335
+ storage: { s3Storage: { bucketName: 'my-bucket' } },
336
+ readOnly: true, // Automatically becomes 'reader' role
337
+ distributed: true
338
+ })
844
339
  ```
845
340
 
846
- ## Advanced Configuration
341
+ ### Key Distributed Features
847
342
 
848
- ### Database Modes
343
+ **🎯 Explicit Role Configuration**
849
344
 
850
- Brainy supports special operational modes that restrict certain operations:
345
+ - Roles must be explicitly set (no dangerous auto-assignment)
346
+ - Can use environment variables, config, or read/write modes
347
+ - Clear separation between writers and readers
851
348
 
852
- ```typescript
853
- import { BrainyData } from '@soulcraft/brainy'
854
-
855
- // Create and initialize the database
856
- const db = new BrainyData()
857
- await db.init()
858
-
859
- // Set the database to read-only mode (prevents write operations)
860
- db.setReadOnly(true)
349
+ **#๏ธโƒฃ Hash-Based Partitioning**
861
350
 
862
- // Check if the database is in read-only mode
863
- const isReadOnly = db.isReadOnly() // Returns true
351
+ - Handles multiple writers with different data types
352
+ - Even distribution across partitions
353
+ - No semantic conflicts with mixed data
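+
+ For intuition, this is the classic hash-partitioning idea (illustration only - Brainy does this internally, and nothing here is part of its public API):
+
+ ```javascript
+ import { createHash } from 'node:crypto'
+
+ // Hash the id, then map it onto one of N partitions: the same id always
+ // lands in the same partition, and ids spread evenly across partitions
+ function partitionFor(id, partitionCount) {
+   const digest = createHash('sha256').update(id).digest()
+   return digest.readUInt32BE(0) % partitionCount
+ }
+
+ partitionFor('patient-123', 16) // Stable result on every instance
+ ```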
864
354
 
865
- // Set the database to write-only mode (prevents search operations)
866
- db.setWriteOnly(true)
355
+ **๐Ÿท๏ธ Domain Tagging**
867
356
 
868
- // Check if the database is in write-only mode
869
- const isWriteOnly = db.isWriteOnly() // Returns true
357
+ - Automatic domain detection (medical, legal, product, etc.)
358
+ - Filter searches by domain
359
+ - Logical separation without complexity
870
360
 
871
- // Reset to normal mode (allows both read and write operations)
872
- db.setReadOnly(false)
873
- db.setWriteOnly(false)
361
+ ```javascript
362
+ // Data is automatically tagged with domains
363
+ await brainy.add({
364
+ symptoms: "fever",
365
+ diagnosis: "flu"
366
+ }, metadata) // Auto-tagged as 'medical'
367
+
368
+ // Search within specific domains
369
+ const medicalResults = await brainy.search(query, 10, {
370
+ filter: { domain: 'medical' }
371
+ })
874
372
  ```
875
373
 
876
- - **Read-Only Mode**: When enabled, prevents all write operations (add, update, delete). Useful for deployment scenarios
877
- where you want to prevent modifications to the database.
878
- - **Write-Only Mode**: When enabled, prevents all search operations. Useful for initial data loading or when you want to
879
- optimize for write performance.
880
-
881
- ### Embedding
374
+ **📊 Health Monitoring**
882
375
 
883
- ```typescript
884
- import {
885
- BrainyData,
886
- createTensorFlowEmbeddingFunction,
887
- createThreadedEmbeddingFunction
888
- } from '@soulcraft/brainy'
889
-
890
- // Use the standard TensorFlow Universal Sentence Encoder embedding function
891
- const db = new BrainyData({
892
- embeddingFunction: createTensorFlowEmbeddingFunction()
893
- })
894
- await db.init()
376
+ - Real-time health metrics
377
+ - Automatic dead instance cleanup
378
+ - Performance tracking
895
379
 
896
- // Or use the threaded embedding function for better performance
897
- const threadedDb = new BrainyData({
898
- embeddingFunction: createThreadedEmbeddingFunction()
899
- })
900
- await threadedDb.init()
901
-
902
- // Directly embed text to vectors
903
- const vector = await db.embed("Some text to convert to a vector")
904
-
905
- // Calculate similarity between two texts or vectors
906
- const similarity = await db.calculateSimilarity(
907
- "Cats are furry pets",
908
- "Felines make good companions"
909
- )
910
- console.log(`Similarity score: ${similarity}`) // Higher value means more similar
911
-
912
- // Calculate similarity with custom options
913
- const vectorA = await db.embed("First text")
914
- const vectorB = await db.embed("Second text")
915
- const customSimilarity = await db.calculateSimilarity(
916
- vectorA, // Can use pre-computed vectors
917
- vectorB,
918
- {
919
- forceEmbed: false, // Skip embedding if inputs are already vectors
920
- distanceFunction: cosineDistance // Optional custom distance function
921
- }
922
- )
380
+ ```javascript
381
+ // Get health status
382
+ const health = brainy.getHealthStatus()
383
+ // {
384
+ // status: 'healthy',
385
+ // role: 'reader',
386
+ // vectorCount: 1000000,
387
+ // cacheHitRate: 0.95,
388
+ // requestsPerSecond: 150
389
+ // }
923
390
  ```
924
391
 
925
- The threaded embedding function runs in a separate thread (Web Worker in browsers, Worker Thread in Node.js) to improve
926
- performance, especially for embedding operations. It uses GPU acceleration when available (via WebGL in browsers) and
927
- falls back to CPU processing for compatibility. Universal Sentence Encoder is always used for embeddings. The
928
- implementation includes worker reuse and model caching for optimal performance.
929
-
930
- ### Performance Tuning
931
-
932
- Brainy includes comprehensive performance optimizations that work across all environments (browser, CLI, Node.js,
933
- container, server):
392
+ **⚡ Role-Optimized Performance**
934
393
 
935
- #### GPU and CPU Optimization
394
+ - **Readers**: 80% memory for cache, aggressive prefetching
395
+ - **Writers**: Optimized write batching, minimal cache
396
+ - **Hybrid**: Adaptive based on workload
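+
+ A minimal sketch of the third role ('hybrid' appears above among the BRAINY_ROLE options):
+
+ ```javascript
+ // One instance that both ingests and serves queries; memory shifts
+ // between cache and write batching as the workload changes
+ const hybrid = createAutoBrainy({
+   storage: { s3Storage: { bucketName: 'my-bucket' } },
+   distributed: { role: 'hybrid' }
+ })
+ ```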
936
397
 
937
- Brainy uses GPU and CPU optimization for compute-intensive operations:
398
+ ### Deployment Examples
938
399
 
939
- 1. **GPU-Accelerated Embeddings**: Generate text embeddings using TensorFlow.js with WebGL backend when available
940
- 2. **Automatic Fallback**: Falls back to CPU backend when GPU is not available
941
- 3. **Optimized Distance Calculations**: Perform vector similarity calculations with optimized algorithms
942
- 4. **Cross-Environment Support**: Works consistently across browsers and Node.js environments
943
- 5. **Memory Management**: Properly disposes of tensors to prevent memory leaks
400
+ **Docker Compose**
944
401
 
945
- #### Multithreading Support
402
+ ```yaml
403
+ services:
404
+ writer:
405
+ image: myapp
406
+ environment:
407
+ BRAINY_ROLE: writer # Required - distributed roles are explicit
946
408
 
947
- Brainy includes comprehensive multithreading support to improve performance across all environments:
409
+ reader:
410
+ image: myapp
411
+ environment:
412
+ BRAINY_ROLE: reader # Required - distributed roles are explicit
413
+ scale: 5
414
+ ```
948
415
 
949
- 1. **Parallel Batch Processing**: Add multiple items concurrently with controlled parallelism
950
- 2. **Multithreaded Vector Search**: Perform distance calculations in parallel for faster search operations
951
- 3. **Threaded Embedding Generation**: Generate embeddings in separate threads to avoid blocking the main thread
952
- 4. **Worker Reuse**: Maintains a pool of workers to avoid the overhead of creating and terminating workers
953
- 5. **Model Caching**: Initializes the embedding model once per worker and reuses it for multiple operations
954
- 6. **Batch Embedding**: Processes multiple items in a single embedding operation for better performance
955
- 7. **Automatic Environment Detection**: Adapts to browser (Web Workers) and Node.js (Worker Threads) environments
416
+ **Kubernetes**
956
417
 
957
- ```typescript
958
- import { BrainyData, euclideanDistance } from '@soulcraft/brainy'
959
-
960
- // Configure with custom options
961
- const db = new BrainyData({
962
- // Use Euclidean distance instead of default cosine distance
963
- distanceFunction: euclideanDistance,
964
-
965
- // HNSW index configuration for search performance
966
- hnsw: {
967
- M: 16, // Max connections per noun
968
- efConstruction: 200, // Construction candidate list size
969
- efSearch: 50, // Search candidate list size
970
- },
971
-
972
- // Performance optimization options
973
- performance: {
974
- useParallelization: true, // Enable multithreaded search operations
975
- },
976
-
977
- // Noun and Verb type validation
978
- typeValidation: {
979
- enforceNounTypes: true, // Validate noun types against NounType enum
980
- enforceVerbTypes: true, // Validate verb types against VerbType enum
981
- },
982
-
983
- // Storage configuration
984
- storage: {
985
- requestPersistentStorage: true,
986
- // Example configuration for cloud storage (replace with your own values):
987
- // s3Storage: {
988
- // bucketName: 'your-s3-bucket-name',
989
- // region: 'your-aws-region'
990
- // // Credentials should be provided via environment variables
991
- // // AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY
992
- // }
993
- }
994
- })
418
+ ```yaml
419
+ # Automatically detects role from deployment type
420
+ apiVersion: apps/v1
421
+ kind: Deployment
422
+ metadata:
423
+ name: brainy-readers
424
+ spec:
425
+ replicas: 10 # Multiple readers
+ selector:
+ matchLabels:
+ app: brainy-readers
426
+ template:
+ metadata:
+ labels:
+ app: brainy-readers
427
+ spec:
428
+ containers:
429
+ - name: app
430
+ image: myapp
431
+ # Role auto-detected as 'reader' (multiple replicas)
995
432
  ```
996
433
 
997
- ### Optimized HNSW for Large Datasets
434
+ **Benefits**
998
435
 
999
- Brainy includes an optimized HNSW index implementation for large datasets that may not fit entirely in memory, using a
1000
- hybrid approach:
436
+ - ✅ **50-70% faster searches** with parallel readers
437
+ - ✅ **No coordination complexity** - Shared JSON config in S3
438
+ - ✅ **Zero downtime scaling** - Add/remove instances anytime
439
+ - ✅ **Automatic failover** - Dead instances cleaned up automatically
1001
440
 
1002
- 1. **Product Quantization** - Reduces vector dimensionality while preserving similarity relationships
1003
- 2. **Disk-Based Storage** - Offloads vectors to disk when memory usage exceeds a threshold
1004
- 3. **Memory-Efficient Indexing** - Optimizes memory usage for large-scale vector collections
441
+ ## 🤔 Why Choose Brainy?
1005
442
 
1006
- ```typescript
1007
- import { BrainyData } from '@soulcraft/brainy'
1008
-
1009
- // Configure with optimized HNSW index for large datasets
1010
- const db = new BrainyData({
1011
- hnswOptimized: {
1012
- // Standard HNSW parameters
1013
- M: 16, // Max connections per noun
1014
- efConstruction: 200, // Construction candidate list size
1015
- efSearch: 50, // Search candidate list size
1016
-
1017
- // Memory threshold in bytes - when exceeded, will use disk-based approach
1018
- memoryThreshold: 1024 * 1024 * 1024, // 1GB default threshold
1019
-
1020
- // Product quantization settings for dimensionality reduction
1021
- productQuantization: {
1022
- enabled: true, // Enable product quantization
1023
- numSubvectors: 16, // Number of subvectors to split the vector into
1024
- numCentroids: 256 // Number of centroids per subvector
1025
- },
1026
-
1027
- // Whether to use disk-based storage for the index
1028
- useDiskBasedIndex: true // Enable disk-based storage
1029
- },
1030
-
1031
- // Storage configuration (required for disk-based index)
1032
- storage: {
1033
- requestPersistentStorage: true
1034
- }
1035
- })
443
+ ### vs. Traditional Databases
1036
444
 
1037
- // The optimized index automatically adapts based on dataset size:
1038
- // 1. For small datasets: Uses standard in-memory approach
1039
- // 2. For medium datasets: Applies product quantization to reduce memory usage
1040
- // 3. For large datasets: Combines product quantization with disk-based storage
445
+ โŒ **PostgreSQL with pgvector** - Requires complex setup, tuning, and DevOps expertise
446
+ โœ… **Brainy** - Zero config, auto-optimizes, works everywhere from browser to cloud
1041
447
 
1042
- // Check status to see memory usage and optimization details
1043
- const status = await db.status()
1044
- console.log(status.details.index)
1045
- ```
448
+ ### vs. Vector Databases
1046
449
 
1047
- ## Distance Functions
450
+ โŒ **Pinecone/Weaviate/Qdrant** - Cloud-only, expensive, vendor lock-in
451
+ โœ… **Brainy** - Run locally, in browser, or cloud. Your choice, your data
1048
452
 
1049
- Brainy provides several distance functions for vector similarity calculations:
453
+ ### vs. Graph Databases
1050
454
 
1051
- - `cosineDistance` (default): Measures the cosine of the angle between vectors (1 - cosine similarity)
1052
- - `euclideanDistance`: Measures the straight-line distance between vectors
1053
- - `manhattanDistance`: Measures the sum of absolute differences between vector components
1054
- - `dotProductDistance`: Measures the negative dot product between vectors
455
+ โŒ **Neo4j** - Great for graphs, no vector support
456
+ โœ… **Brainy** - Vectors + graphs in one. Best of both worlds
1055
457
 
1056
- All distance functions are optimized for performance and automatically use the most efficient implementation based on
1057
- the dataset size and available resources. For large datasets and high-dimensional vectors, Brainy uses batch processing
1058
- and multithreading when available to improve performance.
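-
- Any of these functions can be passed wherever a distance function is accepted, for example through the `calculateSimilarity` options shown earlier (a minimal sketch; `manhattanDistance` is assumed to be exported alongside `euclideanDistance`):
-
- ```javascript
- import { BrainyData, manhattanDistance } from '@soulcraft/brainy'
-
- const db = new BrainyData()
- await db.init()
-
- // Compare two texts using Manhattan distance instead of the default cosine distance
- const score = await db.calculateSimilarity('First text', 'Second text', {
- distanceFunction: manhattanDistance
- })
- ```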
458
+ ### vs. DIY Solutions
1059
459
 
1060
- ## Backup and Restore
460
+ โŒ **Building your own** - Months of work, optimization nightmares
461
+ โœ… **Brainy** - Production-ready in 30 seconds
1061
462
 
1062
- Brainy provides backup and restore capabilities that allow you to:
463
+ ## 🚀 Getting Started in 30 Seconds
1063
464
 
1064
- - Back up your data
1065
- - Transfer data between Brainy instances
1066
- - Restore existing data into Brainy for vectorization and indexing
1067
- - Back up data for analysis or visualization in other tools
465
+ ### React
1068
466
 
1069
- ### Backing Up Data
467
+ ```jsx
468
+ import { createAutoBrainy } from '@soulcraft/brainy'
469
+ import { useState } from 'react'
1070
470
 
1071
- ```typescript
1072
- // Backup all data from the database
1073
- const backupData = await db.backup()
471
+ function SemanticSearch() {
472
+ const [brainy] = useState(() => createAutoBrainy())
473
+ const [results, setResults] = useState([])
1074
474
 
1075
- // The backup data includes:
1076
- // - All nouns (entities) with their vectors and metadata
1077
- // - All verbs (relationships) between nouns
1078
- // - Noun types and verb types
1079
- // - HNSW index data for fast similarity search
1080
- // - Version information
1081
-
1082
- // Save the backup data to a file (Node.js environment)
1083
- import fs from 'fs'
475
+ const search = async (query) => {
476
+ const items = await brainy.searchText(query, 10)
477
+ setResults(items)
478
+ }
1084
479
 
1085
- fs.writeFileSync('brainy-backup.json', JSON.stringify(backupData, null, 2))
480
+ return (
481
+ <>
+ <input onChange={(e) => search(e.target.value)}
482
+ placeholder="Search by meaning..." />
+ {results.map(r => <div key={r.id}>{r.text}</div>)}
+ </>
483
+ )
484
+ }
1086
485
  ```
1087
486
 
1088
- ### Restoring Data
1089
-
1090
- Brainy's restore functionality can handle:
1091
-
1092
- 1. Complete backups with vectors and index data
1093
- 2. Sparse data without vectors (vectors will be created during restore)
1094
- 3. Data without HNSW index (index will be reconstructed if needed)
487
+ ### Angular
1095
488
 
1096
489
  ```typescript
1097
- // Restore data with all options
1098
- const restoreResult = await db.restore(backupData, {
1099
- clearExisting: true // Whether to clear existing data before restore
490
+ import { Component } from '@angular/core'
491
+ import { createAutoBrainy } from '@soulcraft/brainy'
492
+
493
+ @Component({
494
+ selector: 'app-search',
495
+ template: `
496
+ <input (input)="search($event.target.value)"
497
+ placeholder="Semantic search...">
498
+ <div *ngFor="let result of results">
499
+ {{ result.text }}
500
+ </div>
501
+ `
1100
502
  })
503
+ export class SearchComponent {
504
+ brainy = createAutoBrainy()
505
+ results = []
1101
506
 
1102
- // Import sparse data (without vectors)
1103
- // Vectors will be automatically created using the embedding function
1104
- const sparseData = {
1105
- nouns: [
1106
- {
1107
- id: '123',
1108
- // No vector field - will be created during import
1109
- metadata: {
1110
- noun: 'Thing',
1111
- text: 'This text will be used to generate a vector'
1112
- }
1113
- }
1114
- ],
1115
- verbs: [],
1116
- version: '1.0.0'
507
+ async search(query: string) {
508
+ this.results = await this.brainy.searchText(query, 10)
509
+ }
1117
510
  }
1118
-
1119
- const sparseImportResult = await db.importSparseData(sparseData)
1120
- ```
1121
-
1122
- ### CLI Backup/Restore
1123
-
1124
- ```bash
1125
- # Backup data to a file
1126
- brainy backup --output brainy-backup.json
1127
-
1128
- # Restore data from a file
1129
- brainy restore --input brainy-backup.json --clear-existing
1130
-
1131
- # Import sparse data (without vectors)
1132
- brainy import-sparse --input sparse-data.json
1133
511
  ```
1134
512
 
1135
- ## Embedding
1136
-
1137
- Brainy uses the following embedding approach:
1138
-
1139
- - TensorFlow Universal Sentence Encoder (high-quality text embeddings)
1140
- - GPU acceleration when available (via WebGL in browsers)
1141
- - Batch embedding for processing multiple items efficiently
1142
- - Worker reuse and model caching for optimal performance
1143
- - Custom embedding functions can be plugged in for specialized domains
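-
- A custom embedding function plugs in through the `embeddingFunction` option shown earlier (a minimal sketch; the async text-to-vector signature is an assumption, and the toy vector below stands in for a real domain model):
-
- ```javascript
- import { BrainyData } from '@soulcraft/brainy'
-
- const db = new BrainyData({
- // Assumption: an embedding function maps text to a numeric vector asynchronously
- embeddingFunction: async (text) => {
- // Toy embedding for illustration only - replace with a real model call
- const vec = new Array(512).fill(0)
- for (let i = 0; i < text.length; i++) vec[i % 512] += text.charCodeAt(i) / 255
- return vec
- }
- })
- await db.init()
- ```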
1144
-
1145
- ## Extensions
1146
-
1147
- Brainy includes an augmentation system for extending functionality:
513
+ ### Vue 3
1148
514
 
1149
- - **Memory Augmentations**: Different storage backends
1150
- - **Sense Augmentations**: Process raw data
1151
- - **Cognition Augmentations**: Reasoning and inference
1152
- - **Dialog Augmentations**: Text processing and interaction
1153
- - **Perception Augmentations**: Data interpretation and visualization
1154
- - **Activation Augmentations**: Trigger actions
515
+ ```vue
1155
516
 
1156
- ### Simplified Augmentation System
517
+ <script setup>
518
+ import { createAutoBrainy } from '@soulcraft/brainy'
519
+ import { ref } from 'vue'
1157
520
 
1158
- Brainy provides a simplified factory system for creating, importing, and executing augmentations with minimal
1159
- boilerplate:
521
+ const brainy = createAutoBrainy()
522
+ const results = ref([])
1160
523
 
1161
- ```typescript
1162
- import {
1163
- createMemoryAugmentation,
1164
- createConduitAugmentation,
1165
- createSenseAugmentation,
1166
- addWebSocketSupport,
1167
- executeStreamlined,
1168
- processStaticData,
1169
- processStreamingData,
1170
- createPipeline,
- loadAugmentationModule
1171
- } from '@soulcraft/brainy'
1172
-
1173
- // Create a memory augmentation with minimal code
1174
- const memoryAug = createMemoryAugmentation({
1175
- name: 'simple-memory',
1176
- description: 'A simple in-memory storage augmentation',
1177
- autoRegister: true,
1178
- autoInitialize: true,
1179
-
1180
- // Implement only the methods you need
1181
- storeData: async (key, data) => {
1182
- // Your implementation here
1183
- return {
1184
- success: true,
1185
- data: true
1186
- }
1187
- },
1188
-
1189
- retrieveData: async (key) => {
1190
- // Your implementation here
1191
- return {
1192
- success: true,
1193
- data: { example: 'data', key }
1194
- }
524
+ const search = async (query) => {
525
+ results.value = await brainy.searchText(query, 10)
1195
526
  }
1196
- })
-
- // Create a sense augmentation for the pipeline examples below
- const senseAug = createSenseAugmentation({
- name: 'simple-sense',
- processRawData: async (data, dataType) => ({
- success: true,
- data: { nouns: [], verbs: [] }
- })
- })
1197
-
1198
- // Add WebSocket support to any augmentation
1199
- const wsAugmentation = addWebSocketSupport(memoryAug, {
1200
- connectWebSocket: async (url) => {
1201
- // Your implementation here
1202
- return {
1203
- connectionId: 'ws-1',
1204
- url,
1205
- status: 'connected'
1206
- }
1207
- }
1208
- })
527
+ </script>
1209
528
 
1210
- // Process static data through a pipeline
1211
- const result = await processStaticData(
1212
- 'Input data',
1213
- [
1214
- {
1215
- augmentation: senseAug,
1216
- method: 'processRawData',
1217
- transformArgs: (data) => [data, 'text']
1218
- },
1219
- {
1220
- augmentation: memoryAug,
1221
- method: 'storeData',
1222
- transformArgs: (data) => ['processed-data', data]
1223
- }
1224
- ]
1225
- )
1226
-
1227
- // Create a reusable pipeline
1228
- const pipeline = createPipeline([
1229
- {
1230
- augmentation: senseAug,
1231
- method: 'processRawData',
1232
- transformArgs: (data) => [data, 'text']
1233
- },
1234
- {
1235
- augmentation: memoryAug,
1236
- method: 'storeData',
1237
- transformArgs: (data) => ['processed-data', data]
529
+ <template>
530
+ <input @input="search($event.target.value)"
531
+ placeholder="Find similar content...">
532
+ <div v-for="result in results" :key="result.id">
533
+ {{ result.text }}
534
+ </div>
535
+ </template>
536
+ ```
537
+
538
+ ### Svelte
539
+
540
+ ```svelte
541
+ <script>
542
+ import { createAutoBrainy } from '@soulcraft/brainy'
543
+
544
+ const brainy = createAutoBrainy()
545
+ let results = []
546
+
547
+ async function search(e) {
548
+ results = await brainy.searchText(e.target.value, 10)
1238
549
  }
1239
- ])
1240
-
1241
- // Use the pipeline
1242
- const pipelineResult = await pipeline('New input data')
550
+ </script>
1243
551
 
1244
- // Dynamically load augmentations at runtime
1245
- const loadedAugmentations = await loadAugmentationModule(
1246
- import('./my-augmentations.js'),
1247
- {
1248
- autoRegister: true,
1249
- autoInitialize: true
1250
- }
1251
- )
552
+ <input on:input={search} placeholder="AI-powered search...">
553
+ {#each results as result}
554
+ <div>{result.text}</div>
555
+ {/each}
1252
556
  ```
1253
557
 
1254
- The simplified augmentation system provides:
1255
-
1256
- 1. **Factory Functions** - Create augmentations with minimal boilerplate
1257
- 2. **WebSocket Support** - Add WebSocket capabilities to any augmentation
1258
- 3. **Streamlined Pipeline** - Process data through augmentations more efficiently
1259
- 4. **Dynamic Loading** - Load augmentations at runtime when needed
1260
- 5. **Static & Streaming Data** - Handle both static and streaming data with the same API
558
+ ### Next.js (App Router)
1261
559
 
1262
- #### WebSocket Augmentation Types
560
+ ```jsx
561
+ // app/search/page.js
562
+ import { createAutoBrainy } from '@soulcraft/brainy'
1263
563
 
1264
- Brainy exports several WebSocket augmentation types that can be used by augmentation creators to add WebSocket
1265
- capabilities to their augmentations:
1266
-
1267
- ```typescript
1268
- import {
1269
- // Base WebSocket support interface
1270
- IWebSocketSupport,
1271
-
1272
- // Combined WebSocket augmentation types
1273
- IWebSocketSenseAugmentation,
1274
- IWebSocketConduitAugmentation,
1275
- IWebSocketCognitionAugmentation,
1276
- IWebSocketMemoryAugmentation,
1277
- IWebSocketPerceptionAugmentation,
1278
- IWebSocketDialogAugmentation,
1279
- IWebSocketActivationAugmentation,
1280
-
1281
- // Function to add WebSocket support to any augmentation
1282
- addWebSocketSupport
1283
- } from '@soulcraft/brainy'
1284
-
1285
- // Example: Creating a typed WebSocket-enabled sense augmentation
1286
- const mySenseAug = createSenseAugmentation({
1287
- name: 'my-sense',
1288
- processRawData: async (data, dataType) => {
1289
- // Implementation
1290
- return {
1291
- success: true,
1292
- data: { nouns: [], verbs: [] }
1293
- }
564
+ export default function SearchPage() {
565
+ async function search(formData) {
566
+ 'use server'
567
+ const brainy = createAutoBrainy({ bucketName: 'vectors' })
568
+ const query = formData.get('query')
569
+ return await brainy.searchText(query, 10)
1294
570
  }
1295
- }) as IWebSocketSenseAugmentation
1296
-
1297
- // Add WebSocket support
1298
- addWebSocketSupport(mySenseAug, {
1299
- connectWebSocket: async (url) => {
1300
- // WebSocket implementation
1301
- return {
1302
- connectionId: 'ws-1',
1303
- url,
1304
- status: 'connected'
1305
- }
1306
- },
1307
- sendWebSocketMessage: async (connectionId, data) => {
1308
- // Send message implementation
1309
- },
1310
- onWebSocketMessage: async (connectionId, callback) => {
1311
- // Register callback implementation
1312
- },
1313
- offWebSocketMessage: async (connectionId, callback) => {
1314
- // Remove callback implementation
1315
- },
1316
- closeWebSocket: async (connectionId, code, reason) => {
1317
- // Close connection implementation
1318
- }
1319
- })
1320
571
 
1321
- // Now mySenseAug has both sense augmentation methods and WebSocket methods
1322
- await mySenseAug.processRawData('data', 'text')
1323
- await mySenseAug.connectWebSocket('wss://example.com')
572
+ return (
573
+ <form action={search}>
574
+ <input name="query" placeholder="Search..." />
575
+ <button type="submit">Search</button>
576
+ </form>
577
+ )
578
+ }
1324
579
  ```
1325
580
 
1326
- These WebSocket augmentation types combine the base augmentation interfaces with the `IWebSocketSupport` interface,
1327
- providing type safety and autocompletion for augmentations with WebSocket capabilities.
1328
-
1329
- ### Model Control Protocol (MCP)
1330
-
1331
- Brainy includes a Model Control Protocol (MCP) implementation that allows external models to access Brainy data and use
1332
- the augmentation pipeline as tools:
581
+ ### Node.js / Bun / Deno
1333
582
 
1334
- - **BrainyMCPAdapter**: Provides access to Brainy data through MCP
1335
- - **MCPAugmentationToolset**: Exposes the augmentation pipeline as tools
1336
- - **BrainyMCPService**: Integrates the adapter and toolset, providing WebSocket and REST server implementations
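-
- A hypothetical wiring sketch (the class names are the documented ones, but the constructor signatures here are assumptions; see the MCP documentation linked below for the real API):
-
- ```javascript
- import { BrainyData, BrainyMCPAdapter, MCPAugmentationToolset, BrainyMCPService } from '@soulcraft/brainy'
-
- const db = new BrainyData()
- await db.init()
-
- // Assumption: the adapter and service wrap an initialized BrainyData instance
- const adapter = new BrainyMCPAdapter(db)
- const toolset = new MCPAugmentationToolset()
- const mcpService = new BrainyMCPService(adapter, toolset)
- ```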
1337
-
1338
- Environment compatibility:
1339
-
1340
- - **BrainyMCPAdapter** and **MCPAugmentationToolset** can run in any environment (browser, Node.js, server)
1341
- - **BrainyMCPService** core functionality works in any environment
1342
-
1343
- For detailed documentation and usage examples, see the [MCP documentation](src/mcp/README.md).
1344
-
1345
- ## Cross-Environment Compatibility
1346
-
1347
- Brainy is designed to run seamlessly in any environment, from browsers to Node.js to serverless functions and
1348
- containers. All Brainy data, functions, and augmentations are environment-agnostic, allowing you to use the same code
1349
- everywhere.
583
+ ```javascript
584
+ import { createAutoBrainy } from '@soulcraft/brainy'
1350
585
 
1351
- ### Environment Detection
586
+ const brainy = createAutoBrainy()
1352
587
 
1353
- Brainy automatically detects the environment it's running in:
588
+ // Add some data
589
+ await brainy.add("TypeScript is a typed superset of JavaScript", {
590
+ category: 'programming'
591
+ })
1354
592
 
1355
- ```typescript
1356
- import { environment } from '@soulcraft/brainy'
1357
-
1358
- // Check which environment we're running in
1359
- console.log(`Running in ${
1360
- environment.isBrowser ? 'browser' :
1361
- environment.isNode ? 'Node.js' :
1362
- 'serverless/unknown'
1363
- } environment`)
593
+ // Search for similar content
594
+ const results = await brainy.searchText("JavaScript with types", 5)
595
+ console.log(results)
1364
596
  ```
1365
597
 
1366
- ### Adaptive Storage
1367
-
1368
- Storage adapters are automatically selected based on the environment:
1369
-
1370
- - **Browser**: Uses Origin Private File System (OPFS) when available, falls back to in-memory storage
1371
- - **Node.js**: Uses file system storage by default, with options for S3-compatible cloud storage
1372
- - **Serverless**: Uses in-memory storage with options for cloud persistence
1373
- - **Container**: Automatically detects and uses the appropriate storage based on available capabilities
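-
- Auto-selection can be overridden through the `storage` option (a minimal sketch reusing the S3 fields from the configuration example earlier in this document):
-
- ```javascript
- import { BrainyData } from '@soulcraft/brainy'
-
- const db = new BrainyData({
- storage: {
- s3Storage: {
- bucketName: 'your-s3-bucket-name',
- region: 'your-aws-region'
- // Credentials should be provided via AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY
- }
- }
- })
- await db.init()
- ```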
1374
-
1375
- ### Dynamic Imports
1376
-
1377
- Brainy uses dynamic imports to load environment-specific dependencies only when needed, keeping the bundle size small
1378
- and ensuring compatibility across environments.
1379
-
1380
- ### Browser Support
1381
-
1382
- Works in all modern browsers:
1383
-
1384
- - Chrome 86+
1385
- - Edge 86+
1386
- - Opera 72+
1387
- - Chrome for Android 86+
1388
-
1389
- For browsers without OPFS support, falls back to in-memory storage.
598
+ ### Vanilla JavaScript
1390
599
 
1391
- ## Related Projects
1392
-
1393
- - **[Cartographer](https://github.com/sodal-project/cartographer)** - A companion project that provides standardized
1394
- interfaces for interacting with Brainy
1395
-
1396
- ## Demo
1397
-
1398
- The repository includes a comprehensive demo that showcases Brainy's main features:
1399
-
1400
- - `demo/index.html` - A single demo page with animations demonstrating Brainy's features.
1401
- - **[Try the live demo](https://soulcraft-research.github.io/brainy/demo/index.html)** - Check out the
1402
- interactive demo on
1403
- GitHub Pages
1404
- - Or run it locally with `npm run demo` (see [demo instructions](demo.md) for details)
1405
- - To deploy your own version to GitHub Pages, use the GitHub Actions workflow in
1406
- `.github/workflows/deploy-demo.yml`,
1407
- which automatically deploys when pushing to the main branch or can be manually triggered
1408
- - To use a custom domain (like www.soulcraft.com):
1409
- 1. A CNAME file is already included in the demo directory
1410
- 2. In your GitHub repository settings, go to Pages > Custom domain and enter your domain
1411
- 3. Configure your domain's DNS settings to point to GitHub Pages:
600
+ ```html
601
+ <!DOCTYPE html>
602
+ <html>
603
+ <head>
604
+ <script type="module">
605
+ import { createAutoBrainy } from 'https://unpkg.com/@soulcraft/brainy/dist/unified.min.js'
606
+
607
+ window.brainy = createAutoBrainy()
608
+
609
+ window.search = async function(query) {
610
+ const results = await brainy.searchText(query, 10)
611
+ document.getElementById('results').innerHTML =
612
+ results.map(r => `<div>${r.text}</div>`).join('')
613
+ }
614
+ </script>
615
+ </head>
616
+ <body>
617
+ <input onkeyup="search(this.value)" placeholder="Search...">
618
+ <div id="results"></div>
619
+ </body>
620
+ </html>
621
+ ```
1412
622
 
1413
- - Add a CNAME record for www pointing to `<username>.github.io` (e.g., `soulcraft-research.github.io`)
1414
- - Or for an apex domain (soulcraft.com), add A records pointing to GitHub Pages IP addresses
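-
- In zone-file form (the A-record addresses are GitHub Pages' published defaults; verify them against GitHub's current documentation):
-
- ```
- www CNAME soulcraft-research.github.io.
- @ A 185.199.108.153
- @ A 185.199.109.153
- @ A 185.199.110.153
- @ A 185.199.111.153
- ```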
623
+ ### Cloudflare Workers
1415
624
 
1416
- The demo showcases:
625
+ ```javascript
626
+ import { createAutoBrainy } from '@soulcraft/brainy'
1417
627
 
1418
- - How Brainy runs in different environments (browser, Node.js, server, cloud)
1419
- - How the noun-verb data model works
1420
- - How HNSW search works
628
+ export default {
629
+ async fetch(request, env) {
630
+ const brainy = createAutoBrainy({
631
+ bucketName: env.R2_BUCKET
632
+ })
1421
633
 
1422
- ## Syncing Brainy Instances
634
+ const url = new URL(request.url)
635
+ const query = url.searchParams.get('q')
1423
636
 
1424
- You can use the conduit augmentations to sync Brainy instances:
637
+ const results = await brainy.searchText(query, 10)
638
+ return Response.json(results)
639
+ }
640
+ }
641
+ ```
1425
642
 
1426
- - **WebSocket iConduit**: For syncing between browsers and servers, or between servers. WebSockets cannot be used for
1427
- direct browser-to-browser communication without a server in the middle.
1428
- - **WebRTC iConduit**: For direct peer-to-peer syncing between browsers. This is the recommended approach for
1429
- browser-to-browser communication.
643
+ ### AWS Lambda
1430
644
 
1431
- #### WebSocket Sync Example
645
+ ```javascript
646
+ import { createAutoBrainy } from '@soulcraft/brainy'
1432
647
 
1433
- ```typescript
1434
- import {
1435
- BrainyData,
1436
- pipeline,
1437
- createConduitAugmentation
1438
- } from '@soulcraft/brainy'
648
+ export const handler = async (event) => {
649
+ const brainy = createAutoBrainy({
650
+ bucketName: process.env.S3_BUCKET
651
+ })
1439
652
 
1440
- // Create and initialize the database
1441
- const db = new BrainyData()
1442
- await db.init()
653
+ const results = await brainy.searchText(event.query, 10)
1443
654
 
1444
- // Create a WebSocket conduit augmentation
1445
- const wsConduit = await createConduitAugmentation('websocket', 'my-websocket-sync')
655
+ return {
656
+ statusCode: 200,
657
+ body: JSON.stringify(results)
658
+ }
659
+ }
660
+ ```
1446
661
 
1447
- // Register the augmentation with the pipeline
1448
- pipeline.register(wsConduit)
662
+ ### Azure Functions
1449
663
 
1450
- // Connect to another Brainy instance (server or browser)
1451
- // Replace the example URL below with your actual WebSocket server URL
1452
- const connectionResult = await pipeline.executeConduitPipeline(
1453
- 'establishConnection',
1454
- ['wss://example-websocket-server.com/brainy-sync', { protocols: 'brainy-sync' }]
1455
- )
664
+ ```javascript
665
+ import { createAutoBrainy } from '@soulcraft/brainy'
1456
666
 
1457
- if (connectionResult[0] && (await connectionResult[0]).success) {
1458
- const connection = (await connectionResult[0]).data
667
+ export default async function (context, req) {
668
+ const brainy = createAutoBrainy({
669
+ bucketName: process.env.AZURE_STORAGE_CONTAINER
670
+ })
1459
671
 
1460
- // Read data from the remote instance
1461
- const readResult = await pipeline.executeConduitPipeline(
1462
- 'readData',
1463
- [{ connectionId: connection.connectionId, query: { type: 'getAllNouns' } }]
1464
- )
672
+ const results = await brainy.searchText(req.query.q, 10)
1465
673
 
1466
- // Process and add the received data to the local instance
1467
- if (readResult[0] && (await readResult[0]).success) {
1468
- const remoteNouns = (await readResult[0]).data
1469
- for (const noun of remoteNouns) {
1470
- await db.add(noun.vector, noun.metadata)
1471
- }
674
+ context.res = {
675
+ body: results
1472
676
  }
1473
-
1474
- // Set up real-time sync by monitoring the stream
1475
- await wsConduit.monitorStream(connection.connectionId, async (data) => {
1476
- // Handle incoming data (e.g., new nouns, verbs, updates)
1477
- if (data.type === 'newNoun') {
1478
- await db.add(data.vector, data.metadata)
1479
- } else if (data.type === 'newVerb') {
1480
- await db.addVerb(data.sourceId, data.targetId, data.vector, data.options)
1481
- }
1482
- })
1483
677
  }
1484
678
  ```
1485
679
 
1486
- #### WebRTC Peer-to-Peer Sync Example
1487
-
1488
- ```typescript
1489
- import {
1490
- BrainyData,
1491
- pipeline,
1492
- createConduitAugmentation
1493
- } from '@soulcraft/brainy'
680
+ ### Google Cloud Functions
1494
681
 
1495
- // Create and initialize the database
1496
- const db = new BrainyData()
1497
- await db.init()
682
+ ```javascript
683
+ import { createAutoBrainy } from '@soulcraft/brainy'
1498
684
 
1499
- // Create a WebRTC conduit augmentation
1500
- const webrtcConduit = await createConduitAugmentation('webrtc', 'my-webrtc-sync')
1501
-
1502
- // Register the augmentation with the pipeline
1503
- pipeline.register(webrtcConduit)
1504
-
1505
- // Connect to a peer using a signaling server
1506
- // Replace the example values below with your actual configuration
1507
- const connectionResult = await pipeline.executeConduitPipeline(
1508
- 'establishConnection',
1509
- [
1510
- 'peer-id-to-connect-to', // Replace with actual peer ID
1511
- {
1512
- signalServerUrl: 'wss://example-signal-server.com', // Replace with your signal server
1513
- localPeerId: 'my-local-peer-id', // Replace with your local peer ID
1514
- iceServers: [{ urls: 'stun:stun.l.google.com:19302' }] // Public STUN server
1515
- }
1516
- ]
1517
- )
1518
-
1519
- if (connectionResult[0] && (await connectionResult[0]).success) {
1520
- const connection = (await connectionResult[0]).data
1521
-
1522
- // Set up real-time sync by monitoring the stream
1523
- await webrtcConduit.monitorStream(connection.connectionId, async (data) => {
1524
- // Handle incoming data (e.g., new nouns, verbs, updates)
1525
- if (data.type === 'newNoun') {
1526
- await db.add(data.vector, data.metadata)
1527
- } else if (data.type === 'newVerb') {
1528
- await db.addVerb(data.sourceId, data.targetId, data.vector, data.options)
1529
- }
685
+ export const searchHandler = async (req, res) => {
686
+ const brainy = createAutoBrainy({
687
+ bucketName: process.env.GCS_BUCKET
1530
688
  })
1531
689
 
1532
- // When adding new data locally, also send to the peer
1533
- const nounId = await db.add("New data to sync", { noun: "Thing" })
1534
-
1535
- // Send the new noun to the peer
1536
- await pipeline.executeConduitPipeline(
1537
- 'writeData',
1538
- [
1539
- {
1540
- connectionId: connection.connectionId,
1541
- data: {
1542
- type: 'newNoun',
1543
- id: nounId,
1544
- vector: (await db.get(nounId)).vector,
1545
- metadata: (await db.get(nounId)).metadata
1546
- }
1547
- }
1548
- ]
1549
- )
690
+ const results = await brainy.searchText(req.query.q, 10)
691
+ res.json(results)
1550
692
  }
1551
693
  ```
1552
694
 
1553
- #### Browser-Server Search Example
695
+ ### Google Cloud Run
1554
696
 
1555
- Brainy supports searching a server-hosted instance from a browser, storing results locally, and performing further
1556
- searches against the local instance:
1557
-
1558
- ```typescript
1559
- import { BrainyData } from '@soulcraft/brainy'
1560
-
1561
- // Create and initialize the database with remote server configuration
1562
- // Replace the example URL below with your actual Brainy server URL
1563
- const db = new BrainyData({
1564
- remoteServer: {
1565
- url: 'wss://example-brainy-server.com/ws', // Replace with your server URL
1566
- protocols: 'brainy-sync',
1567
- autoConnect: true // Connect automatically during initialization
1568
- }
1569
- })
1570
- await db.init()
1571
-
1572
- // Or connect manually after initialization
1573
- if (!db.isConnectedToRemoteServer()) {
1574
- // Replace the example URL below with your actual Brainy server URL
1575
- await db.connectToRemoteServer('wss://example-brainy-server.com/ws', 'brainy-sync')
1576
- }
1577
-
1578
- // Search the remote server (results are stored locally)
1579
- const remoteResults = await db.searchText('machine learning', 5, { searchMode: 'remote' })
697
+ ```dockerfile
698
+ # Dockerfile
699
+ FROM node:20-alpine
700
+ WORKDIR /app
701
+ COPY --chown=node:node package*.json ./
702
+ USER node
703
+ RUN npm install @soulcraft/brainy
704
+ COPY --chown=node:node . .
705
+ CMD ["node", "server.js"]
706
+ ```
1580
707
 
1581
- // Search the local database (includes previously stored results)
1582
- const localResults = await db.searchText('machine learning', 5, { searchMode: 'local' })
708
+ ```javascript
709
+ // server.js
710
+ import { createAutoBrainy } from '@soulcraft/brainy'
711
+ import express from 'express'
1583
712
 
1584
- // Perform a combined search (local first, then remote if needed)
1585
- const combinedResults = await db.searchText('neural networks', 5, { searchMode: 'combined' })
713
+ const app = express()
714
+ const brainy = createAutoBrainy({
715
+ bucketName: process.env.GCS_BUCKET
716
+ })
1586
717
 
1587
- // Add data to both local and remote instances
1588
- const id = await db.addToBoth('Deep learning is a subset of machine learning', {
1589
- noun: 'Concept',
1590
- category: 'AI',
1591
- tags: ['deep learning', 'neural networks']
718
+ app.get('/search', async (req, res) => {
719
+ const results = await brainy.searchText(req.query.q, 10)
720
+ res.json(results)
1592
721
  })
1593
722
 
1594
- // Clean up when done (this also cleans up worker pools)
1595
- await db.shutDown()
723
+ const port = process.env.PORT || 8080
724
+ app.listen(port, () => console.log(`Brainy on Cloud Run: ${port}`))
1596
725
  ```
1597
726
 
1598
- ---
1599
-
1600
- ## 📈 Scaling Strategy
1601
-
1602
- Brainy is designed to handle datasets of various sizes, from small collections to large-scale deployments. For
1603
- terabyte-scale data that can't fit entirely in memory, we provide several approaches:
1604
-
1605
- - **Disk-Based HNSW**: Modified implementations using intelligent caching and partial loading
1606
- - **Distributed HNSW**: Sharding and partitioning across multiple machines
1607
- - **Hybrid Solutions**: Combining quantization techniques with multi-tier architectures
1608
-
1609
- For detailed information on how to scale Brainy for large datasets, vector dimension standardization, threading
1610
- implementation, storage testing, and other technical topics, see our
1611
- comprehensive [Technical Guides](TECHNICAL_GUIDES.md).
1612
-
1613
- ## Recent Changes and Performance Improvements
1614
-
1615
- ### Enhanced Memory Management and Scalability
1616
-
1617
- Brainy has been significantly improved to handle larger datasets more efficiently:
727
+ ```bash
728
+ # Deploy to Cloud Run
729
+ gcloud run deploy brainy-api \
730
+ --source . \
731
+ --platform managed \
732
+ --region us-central1 \
733
+ --allow-unauthenticated
734
+ ```
1618
735
 
1619
- - **Pagination Support**: All data retrieval methods now support pagination to avoid loading entire datasets into memory
1620
- at once. The deprecated `getAllNouns()` and `getAllVerbs()` methods have been replaced with `getNouns()` and
1621
- `getVerbs()` methods that support pagination, filtering, and cursor-based navigation.
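-
- A minimal sketch of the cursor-based pattern, given an initialized `db` instance (the `limit`, `cursor`, `items`, and `nextCursor` names are assumptions based on the description above):
-
- ```javascript
- let cursor = undefined
- do {
- const page = await db.getNouns({ limit: 100, cursor })
- for (const noun of page.items) {
- // process each noun without loading the whole dataset
- }
- cursor = page.nextCursor
- } while (cursor)
- ```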
736
+ ### Vercel Edge Functions
1622
737
 
1623
- - **Multi-level Caching**: A sophisticated three-level caching strategy has been implemented:
1624
- - **Level 1**: Hot cache (most accessed nodes) - RAM (automatically detecting and adjusting in each environment)
1625
- - **Level 2**: Warm cache (recent nodes) - OPFS, Filesystem or S3 depending on environment
1626
- - **Level 3**: Cold storage (all nodes) - OPFS, Filesystem or S3 depending on environment
738
+ ```javascript
739
+ import { createAutoBrainy } from '@soulcraft/brainy'
1627
740
 
1628
- - **Adaptive Memory Usage**: The system automatically detects available memory and adjusts cache sizes accordingly:
1629
- - In Node.js: Uses 10% of free memory (minimum 1000 entries)
1630
- - In browsers: Scales based on device memory (500 entries per GB, minimum 1000)
741
+ export const config = {
742
+ runtime: 'edge'
743
+ }
1631
744
 
1632
- - **Intelligent Cache Eviction**: Implements a Least Recently Used (LRU) policy that evicts the oldest 20% of items when
1633
- the cache reaches the configured threshold.
745
+ export default async function handler(request) {
746
+ const brainy = createAutoBrainy()
747
+ const { searchParams } = new URL(request.url)
748
+ const query = searchParams.get('q')
1634
749
 
1635
- - **Prefetching Strategy**: Implements batch prefetching to improve performance while avoiding overwhelming system
1636
- resources.
750
+ const results = await brainy.searchText(query, 10)
751
+ return Response.json(results)
752
+ }
753
+ ```
1637
754
 
1638
- ### S3-Compatible Storage Improvements
755
+ ### Netlify Functions
1639
756
 
1640
- - **Enhanced Cloud Storage**: Improved support for S3-compatible storage services including AWS S3, Cloudflare R2, and
1641
- others.
757
+ ```javascript
758
+ import { createAutoBrainy } from '@soulcraft/brainy'
1642
759
 
1643
- - **Optimized Data Access**: Batch operations and error handling for efficient cloud storage access.
760
+ export async function handler(event, context) {
761
+ const brainy = createAutoBrainy()
762
+ const query = event.queryStringParameters.q
1644
763
 
1645
- - **Change Log Management**: Efficient synchronization through change logs to track updates.
764
+ const results = await brainy.searchText(query, 10)
1646
765
 
1647
- ### Data Compatibility
766
+ return {
767
+ statusCode: 200,
768
+ body: JSON.stringify(results)
769
+ }
770
+ }
771
+ ```
1648
772
 
1649
- Yes, you can keep using data indexed by an older version. Brainy includes robust data migration capabilities:
773
+ ### Supabase Edge Functions
1650
774
 
1651
- - **Vector Regeneration**: If vectors are missing in imported data, they will be automatically created using the
1652
- embedding function.
775
+ ```typescript
776
+ import { createAutoBrainy } from 'npm:@soulcraft/brainy'
777
+ import { serve } from 'https://deno.land/std@0.168.0/http/server.ts'
1653
778
 
1654
- - **HNSW Index Reconstruction**: The system can reconstruct the HNSW index from backup data, ensuring compatibility with
1655
- previous versions.
779
+ serve(async (req) => {
780
+ const brainy = createAutoBrainy()
781
+ const url = new URL(req.url)
782
+ const query = url.searchParams.get('q')
1656
783
 
1657
- - **Sparse Data Import**: Support for importing sparse data (without vectors) through the `importSparseData()` method.
784
+ const results = await brainy.searchText(query, 10)
1658
785
 
1659
- ### System Requirements
786
+ return new Response(JSON.stringify(results), {
787
+ headers: { 'Content-Type': 'application/json' }
788
+ })
789
+ })
790
+ ```
1660
791
 
1661
- #### Default Mode
792
+ ### Docker Container
1662
793
 
1663
- - **Memory**:
1664
- - Minimum: 512MB RAM
1665
- - Recommended: 2GB+ RAM for medium datasets, 8GB+ for large datasets
794
+ ```dockerfile
795
+ FROM node:20-alpine
796
+ WORKDIR /app
797
+ COPY --chown=node:node package*.json ./
798
+ USER node
799
+ RUN npm install @soulcraft/brainy
800
+ COPY --chown=node:node . .
1666
801
 
1667
- - **CPU**:
1668
- - Minimum: 2 cores
1669
- - Recommended: 4+ cores for better performance with parallel operations
802
+ CMD ["node", "server.js"]
803
+ ```
1670
804
 
1671
- - **Storage**:
1672
- - Minimum: 1GB available storage
1673
- - Recommended: Storage space at least 3x the size of your dataset
805
+ ```javascript
806
+ // server.js
807
+ import { createAutoBrainy } from '@soulcraft/brainy'
808
+ import express from 'express'
1674
809
 
1675
- #### Read-Only Mode
810
+ const app = express()
811
+ const brainy = createAutoBrainy()
1676
812
 
1677
- Read-only mode prevents all write operations (add, update, delete) and is optimized for search operations.
813
+ app.get('/search', async (req, res) => {
814
+ const results = await brainy.searchText(req.query.q, 10)
815
+ res.json(results)
816
+ })
1678
817
 
1679
- - **Memory**:
1680
- - Minimum: 256MB RAM
1681
- - Recommended: 1GB+ RAM
818
+ app.listen(3000, () => console.log('Brainy running on port 3000'))
819
+ ```
1682
820
 
1683
- - **CPU**:
1684
- - Minimum: 1 core
1685
- - Recommended: 2+ cores
821
+ ### Kubernetes
1686
822
 
1687
- - **Storage**:
1688
- - Minimum: Storage space equal to the size of your dataset
1689
- - Recommended: 2x the size of your dataset for caching
823
+ ```yaml
824
+ apiVersion: apps/v1
825
+ kind: Deployment
826
+ metadata:
827
+ name: brainy-api
828
+ spec:
829
+ replicas: 3
+ selector:
+ matchLabels:
+ app: brainy-api
830
+ template:
+ metadata:
+ labels:
+ app: brainy-api
831
+ spec:
832
+ containers:
833
+ - name: brainy
834
+ image: your-registry/brainy-api:latest
835
+ env:
836
+ - name: S3_BUCKET
837
+ value: "your-vector-bucket"
838
+ ```
1690
839
 
1691
- - **New Feature**: Lazy loading support in read-only mode for improved performance with large datasets.
840
+ ### Railway.app
1692
841
 
1693
- #### Write-Only Mode
842
+ ```javascript
843
+ // server.js
844
+ import { createAutoBrainy } from '@soulcraft/brainy'
1694
845
 
1695
- Write-only mode prevents all search operations and is optimized for initial data loading or when you want to optimize
1696
- for write performance.
846
+ const brainy = createAutoBrainy({
847
+ bucketName: process.env.RAILWAY_VOLUME_NAME
848
+ })
1697
849
 
1698
- - **Memory**:
1699
- - Minimum: 512MB RAM
1700
- - Recommended: 2GB+ RAM
850
+ // Railway automatically handles the rest!
851
+ ```
1701
852
 
1702
- - **CPU**:
1703
- - Minimum: 2 cores
1704
- - Recommended: 4+ cores for faster data ingestion
853
+ ### Render.com
1705
854
 
1706
- - **Storage**:
1707
- - Minimum: Storage space at least 2x the size of your dataset
1708
- - Recommended: 4x the size of your dataset for optimal performance
855
+ ```yaml
856
+ # render.yaml
857
+ services:
858
+ - type: web
859
+ name: brainy-api
860
+ env: node
861
+ buildCommand: npm install @soulcraft/brainy
862
+ startCommand: node server.js
863
+ envVars:
864
+ - key: BRAINY_STORAGE
865
+ value: persistent-disk
866
+ ```
1709
867
 
1710
- ### Performance Tuning Parameters
868
+ ## 🚀 Quick Examples
1711
869
 
1712
- Brainy offers comprehensive configuration options for performance tuning, with enhanced support for large datasets in S3
1713
- or other remote storage. **All configuration is optional** - the system automatically detects the optimal settings based
1714
- on your environment, dataset size, and usage patterns.
870
+ ### Basic Usage
1715
871
 
1716
- #### Intelligent Defaults
872
+ ```javascript
873
+ import { BrainyData, NounType, VerbType } from '@soulcraft/brainy'
1717
874
 
1718
- Brainy uses intelligent defaults that automatically adapt to your environment:
875
+ // Initialize
876
+ const db = new BrainyData()
877
+ await db.init()
1719
878
 
1720
- - **Environment Detection**: Automatically detects whether you're running in Node.js, browser, or worker environment
1721
- - **Memory-Aware Caching**: Adjusts cache sizes based on available system memory
1722
- - **Dataset Size Adaptation**: Tunes parameters based on the size of your dataset
1723
- - **Usage Pattern Optimization**: Adjusts to read-heavy vs. write-heavy workloads
1724
- - **Storage Type Awareness**: Optimizes for local vs. remote storage (S3, R2, etc.)
1725
- - **Operating Mode Specialization**: Special optimizations for read-only and write-only modes
879
+ // Add data (automatically vectorized)
880
+ const catId = await db.add("Cats are independent pets", {
881
+ noun: NounType.Thing,
882
+ category: 'animal'
883
+ })
1726
884
 
1727
- #### Cache Configuration (Optional)
885
+ // Search for similar items
886
+ const results = await db.searchText("feline pets", 5)
1728
887
 
1729
- You can override any of these automatically tuned parameters if needed:
888
+ // Add a second item so the relationship has a target
+ const dogId = await db.add("Dogs are loyal companions", {
+ noun: NounType.Thing,
+ category: 'animal'
+ })
+
+ // Add relationships
889
+ await db.addVerb(catId, dogId, {
890
+ verb: VerbType.RelatedTo,
891
+ description: 'Both are pets'
892
+ })
893
+ ```
1730
894
 
1731
- - **Hot Cache Size**: Control the maximum number of items to keep in memory.
1732
- - For large datasets (>100K items), consider values between 5,000-50,000 depending on available memory.
1733
- - In read-only mode, larger values (10,000-100,000) can be used for better performance.
895
+ ### AutoBrainy (Recommended)
1734
896
 
1735
- - **Eviction Threshold**: Set the threshold at which cache eviction begins (default: 0.8 or 80% of max size).
1736
- - For write-heavy workloads, lower values (0.6-0.7) may improve performance.
1737
- - For read-heavy workloads, higher values (0.8-0.9) are recommended.
897
+ ```javascript
898
+ import { createAutoBrainy } from '@soulcraft/brainy'
1738
899
 
1739
- - **Warm Cache TTL**: Set the time-to-live for items in the warm cache (default: 3600000 ms or 1 hour).
1740
- - For frequently changing data, shorter TTLs are recommended.
1741
- - For relatively static data, longer TTLs improve performance.
900
+ // Everything auto-configured!
901
+ const brainy = createAutoBrainy()
1742
902
 
1743
- - **Batch Size**: Control the number of items to process in a single batch for operations like prefetching.
1744
- - For S3 or remote storage with large datasets, larger values (50-200) significantly improve throughput.
1745
- - In read-only mode with remote storage, even larger values (100-300) can be used.
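-
- Taken together, a minimal sketch (`hotCacheMaxSize`, `hotCacheEvictionThreshold`, and `batchSize` appear in the example below; `warmCacheTTL` is an assumed name for the warm cache TTL described above):
-
- ```javascript
- const brainy = new BrainyData({
- cache: {
- hotCacheMaxSize: 10000,
- hotCacheEvictionThreshold: 0.8,
- warmCacheTTL: 3600000, // 1 hour
- batchSize: 100
- }
- });
- ```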
903
+ // Just start using it
904
+ await brainy.addVector({ id: '1', vector: [0.1, 0.2, 0.3], text: 'Hello' })
905
+ const results = await brainy.search([0.1, 0.2, 0.3], 10)
906
+ ```
1746
907
 
1747
- #### Auto-Tuning (Enabled by Default)
908
+ ### Scenario-Based Setup
1748
909
 
1749
- - **Auto-Tune**: Enable or disable automatic tuning of cache parameters based on usage patterns (default: true).
1750
- - **Auto-Tune Interval**: Set how frequently the system adjusts cache parameters (default: 60000 ms or 1 minute).
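-
- A minimal sketch of overriding these (the `autoTune` and `autoTuneInterval` option names are assumptions matching the parameters described above):
-
- ```javascript
- const brainy = new BrainyData({
- cache: {
- autoTune: true, // assumption: automatic tuning on (the default)
- autoTuneInterval: 300000 // assumption: re-tune every 5 minutes
- }
- });
- ```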
910
+ ```javascript
911
+ import { createQuickBrainy } from '@soulcraft/brainy'
1751
912
 
1752
- #### Read-Only Mode Optimizations (Automatic)
913
+ // Choose your scale: 'small', 'medium', 'large', 'enterprise'
914
+ const brainy = await createQuickBrainy('large', {
915
+ bucketName: 'my-vector-db'
916
+ })
917
+ ```
1753
918
 
1754
- Read-only mode includes special optimizations for search performance that are automatically applied:
919
+ ### With Offline Models
1755
920
 
1756
- - **Larger Cache Sizes**: Automatically uses more memory for caching (up to 40% of free memory for large datasets).
1757
- - **Aggressive Prefetching**: Loads more data in each batch to reduce the number of storage requests.
1758
- - **Prefetch Strategy**: Defaults to 'aggressive' prefetching strategy in read-only mode.
921
+ ```javascript
922
+ import { createAutoBrainy, NounType } from '@soulcraft/brainy'
923
+ import { BundledUniversalSentenceEncoder } from '@soulcraft/brainy-models'
1759
924
 
1760
- #### Example Configuration for Large S3 Datasets
925
+ // Use bundled model for offline operation
926
+ const brainy = createAutoBrainy({
927
+ embeddingModel: BundledUniversalSentenceEncoder,
928
+ // Model loads from local files, no network needed!
929
+ })
1761
930
 
1762
- ```javascript
1763
- const brainy = new BrainyData({
1764
- readOnly: true,
1765
- lazyLoadInReadOnlyMode: true,
1766
- storage: {
1767
- type: 's3',
1768
- s3Storage: {
1769
- bucketName: 'your-bucket',
1770
- accessKeyId: 'your-access-key',
1771
- secretAccessKey: 'your-secret-key',
1772
- region: 'your-region'
1773
- }
1774
- },
1775
- cache: {
1776
- hotCacheMaxSize: 20000,
1777
- hotCacheEvictionThreshold: 0.85,
1778
- batchSize: 100,
1779
- readOnlyMode: {
1780
- hotCacheMaxSize: 50000,
1781
- batchSize: 200,
1782
- prefetchStrategy: 'aggressive'
1783
- }
1784
- }
1785
- });
931
+ // Works exactly the same, but 100% offline
932
+ await brainy.add("This works without internet!", {
933
+ noun: NounType.Content
934
+ })
1786
935
  ```
1787
936
 
1788
- These configuration options make Brainy more efficient, scalable, and adaptable to different environments and usage
1789
- patterns, especially for large datasets in cloud storage.
937
+ ## ๐ŸŒ Live Demo
1790
938
 
1791
- ## Testing
939
+ **[Try the interactive demo](https://soulcraft-research.github.io/brainy/demo/index.html)** - See Brainy in action with
940
+ animations and examples.
1792
941
 
1793
- Brainy uses Vitest for testing. For detailed information about testing in Brainy, including test configuration, scripts,
1794
- reporting tools, and best practices, see our [Testing Guide](docs/technical/TESTING.md).
942
+ ## 🔧 Environment Support
1795
943
 
1796
- Here are some common test commands:
944
+ | Environment | Storage | Threading | Auto-Configured |
945
+ |----------------|---------------|----------------|-----------------|
946
+ | Browser | OPFS | Web Workers | ✅ |
947
+ | Node.js | FileSystem/S3 | Worker Threads | ✅ |
948
+ | Serverless | Memory/S3 | Limited | ✅ |
949
+ | Edge Functions | Memory/KV | Limited | ✅ |
1797
950
 
1798
- ```bash
1799
- # Run all tests
1800
- npm test
1801
-
1802
- # Run tests with comprehensive reporting
1803
- npm run test:report
951
+ ## 📚 Documentation
1804
952
 
1805
- # Run tests with coverage
1806
- npm run test:coverage
1807
- ```
953
+ ### Getting Started
1808
954
 
1809
- ## Contributing
955
+ - [**Quick Start Guide**](docs/getting-started/) - Get up and running in minutes
956
+ - [**Installation**](docs/getting-started/installation.md) - Detailed setup instructions
957
+ - [**Environment Setup**](docs/getting-started/environment-setup.md) - Platform-specific configuration
1810
958
 
1811
- For detailed contribution guidelines, please see [CONTRIBUTING.md](CONTRIBUTING.md).
959
+ ### User Guides
1812
960
 
1813
- For developer documentation, including building, testing, and publishing instructions, please
1814
- see [DEVELOPERS.md](DEVELOPERS.md).
961
+ - [**Search and Metadata**](docs/user-guides/) - Advanced search techniques
962
+ - [**JSON Document Search**](docs/guides/json-document-search.md) - Field-based searching
963
+ - [**Production Migration**](docs/guides/production-migration-guide.md) - Deployment best practices
1815
964
 
1816
- We have a [Code of Conduct](CODE_OF_CONDUCT.md) that all contributors are expected to follow.
965
+ ### API Reference
1817
966
 
1818
- ### Commit Message Format
967
+ - [**Core API**](docs/api-reference/) - Complete method reference
968
+ - [**Configuration Options**](docs/api-reference/configuration.md) - All configuration parameters
1819
969
 
1820
- For best results with automatic changelog generation, follow
1821
- the [Conventional Commits](https://www.conventionalcommits.org/) specification for your commit messages:
970
+ ### Optimization & Scaling
1822
971
 
1823
- ```
1824
- AI Template for automated commit messages:
972
+ - [**Performance Features Guide**](docs/PERFORMANCE_FEATURES.md) - Advanced caching, auto-configuration, and
973
+ optimization
974
+ - [**Large-Scale Optimizations**](docs/optimization-guides/) - Handle millions of vectors
975
+ - [**Memory Management**](docs/optimization-guides/memory-optimization.md) - Efficient resource usage
976
+ - [**S3 Migration Guide**](docs/optimization-guides/s3-migration-guide.md) - Cloud storage setup
1825
977
 
1826
- Use Conventional Commit format
1827
- Specify the changes in a structured format
1828
- Add information about the purpose of the commit
1829
- ```
978
+ ### Examples & Patterns
1830
979
 
1831
- ```
1832
- <type>(<scope>): <description>
980
+ - [**Code Examples**](docs/examples/) - Real-world usage patterns
981
+ - [**Integrations**](docs/examples/integrations.md) - Third-party services
982
+ - [**Performance Patterns**](docs/examples/performance.md) - Optimization techniques
1833
983
 
1834
- [optional body]
984
+ ### Technical Documentation
1835
985
 
1836
- [optional footer(s)]
1837
- ```
986
+ - [**Architecture Overview**](docs/technical/) - System design and internals
987
+ - [**Testing Guide**](docs/technical/TESTING.md) - Testing strategies
988
+ - [**Statistics & Monitoring**](docs/technical/STATISTICS.md) - Performance tracking
1838
989
 
1839
- Where `<type>` is one of:
990
+ ## ๐Ÿค Contributing
1840
991
 
1841
- - `feat`: A new feature (maps to **Added** section)
1842
- - `fix`: A bug fix (maps to **Fixed** section)
1843
- - `chore`: Regular maintenance tasks (maps to **Changed** section)
1844
- - `docs`: Documentation changes (maps to **Documentation** section)
1845
- - `refactor`: Code changes that neither fix bugs nor add features (maps to **Changed** section)
1846
- - `perf`: Performance improvements (maps to **Changed** section)
992
+ We welcome contributions! Please see:
1847
993
 
1848
- ### Manual Release Process
994
+ - [Contributing Guidelines](CONTRIBUTING.md)
995
+ - [Developer Documentation](docs/development/DEVELOPERS.md)
996
+ - [Code of Conduct](CODE_OF_CONDUCT.md)
1849
997
 
1850
- If you need more control over the release process, you can use the individual commands:
998
+ ## 📄 License
1851
999
 
1852
- ```bash
1853
- # Update version and generate changelog
1854
- npm run _release:patch # or _release:minor, _release:major
1000
+ [MIT](LICENSE)
1855
1001
 
1856
- # Create GitHub release
1857
- npm run _github-release
1002
+ ## 🔗 Related Projects
1858
1003
 
1859
- # Publish to NPM
1860
- npm publish
1861
- ```
1004
+ - [**Cartographer**](https://github.com/sodal-project/cartographer) - Standardized interfaces for Brainy
1862
1005
 
1863
- ## License
1006
+ ---
1864
1007
 
1865
- [MIT](LICENSE)
1008
+ <div align="center">
1009
+ <strong>Ready to build something amazing? Get started with Brainy today!</strong>
1010
+ </div>