@soulcraft/brainy 0.9.5
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/LICENSE +21 -0
- package/README.demo.md +59 -0
- package/README.md +1257 -0
- package/brainy.png +0 -0
- package/cli-wrapper.js +56 -0
- package/dist/augmentationFactory.d.ts +87 -0
- package/dist/augmentationPipeline.d.ts +205 -0
- package/dist/augmentationRegistry.d.ts +48 -0
- package/dist/augmentationRegistryLoader.d.ts +147 -0
- package/dist/augmentations/conduitAugmentations.d.ts +173 -0
- package/dist/augmentations/memoryAugmentations.d.ts +71 -0
- package/dist/augmentations/serverSearchAugmentations.d.ts +168 -0
- package/dist/brainy.js +116929 -0
- package/dist/brainy.min.js +16107 -0
- package/dist/brainyData.d.ts +507 -0
- package/dist/cli.d.ts +7 -0
- package/dist/coreTypes.d.ts +131 -0
- package/dist/examples/basicUsage.d.ts +5 -0
- package/dist/hnsw/hnswIndex.d.ts +96 -0
- package/dist/hnsw/hnswIndexOptimized.d.ts +167 -0
- package/dist/index.d.ts +49 -0
- package/dist/mcp/brainyMCPAdapter.d.ts +69 -0
- package/dist/mcp/brainyMCPService.d.ts +99 -0
- package/dist/mcp/index.d.ts +14 -0
- package/dist/mcp/mcpAugmentationToolset.d.ts +68 -0
- package/dist/pipeline.d.ts +281 -0
- package/dist/sequentialPipeline.d.ts +114 -0
- package/dist/storage/fileSystemStorage.d.ts +123 -0
- package/dist/storage/opfsStorage.d.ts +244 -0
- package/dist/storage/s3CompatibleStorage.d.ts +158 -0
- package/dist/types/augmentations.d.ts +324 -0
- package/dist/types/augmentations.d.ts.map +1 -0
- package/dist/types/brainyDataInterface.d.ts +51 -0
- package/dist/types/brainyDataInterface.d.ts.map +1 -0
- package/dist/types/fileSystemTypes.d.ts +6 -0
- package/dist/types/fileSystemTypes.d.ts.map +1 -0
- package/dist/types/graphTypes.d.ts +134 -0
- package/dist/types/graphTypes.d.ts.map +1 -0
- package/dist/types/mcpTypes.d.ts +140 -0
- package/dist/types/mcpTypes.d.ts.map +1 -0
- package/dist/types/pipelineTypes.d.ts +27 -0
- package/dist/types/pipelineTypes.d.ts.map +1 -0
- package/dist/types/tensorflowTypes.d.ts +7 -0
- package/dist/types/tensorflowTypes.d.ts.map +1 -0
- package/dist/unified.d.ts +12 -0
- package/dist/unified.js +117122 -0
- package/dist/unified.min.js +16107 -0
- package/dist/utils/distance.d.ts +32 -0
- package/dist/utils/embedding.d.ts +55 -0
- package/dist/utils/environment.d.ts +28 -0
- package/dist/utils/index.d.ts +3 -0
- package/dist/utils/version.d.ts +6 -0
- package/dist/utils/workerUtils.d.ts +28 -0
- package/package.json +156 -0
package/README.md
ADDED
<div align="center">
<img src="./brainy.png" alt="Brainy Logo" width="200"/>
<br/><br/>

[](LICENSE)
[](https://nodejs.org/)
[](https://www.typescriptlang.org/)
[](CONTRIBUTING.md)
[](https://github.com/sodal-project/cartographer)

**A powerful graph & vector data platform for AI applications across any environment**

</div>

## ✨ Overview

Brainy combines the power of vector search with graph relationships in a lightweight, cross-platform database. Whether you're building AI applications, recommendation systems, or knowledge graphs, Brainy provides the tools you need to store, connect, and retrieve your data intelligently.

What makes Brainy special? It intelligently adapts to your environment! Brainy automatically detects your platform, adjusts its storage strategy, and optimizes performance based on your usage patterns. The more you use it, the smarter it gets - learning from your data to provide increasingly relevant results and connections.

### 🚀 Key Features

- **Run Everywhere** - Works in browsers, Node.js, serverless functions, and containers
- **Vector Search** - Find semantically similar content using embeddings
- **Graph Relationships** - Connect data with meaningful relationships
- **Streaming Pipeline** - Process data in real-time as it flows through the system
- **Extensible Augmentations** - Customize and extend functionality with pluggable components
- **Built-in Conduits** - Sync and scale across instances with WebSocket and WebRTC
- **TensorFlow Integration** - Use TensorFlow.js for high-quality embeddings
- **Adaptive Intelligence** - Automatically optimizes for your environment and usage patterns
- **Persistent Storage** - Data persists across sessions and scales to any size
- **TypeScript Support** - Fully typed API with generics
- **CLI Tools** - Powerful command-line interface for data management
- **Model Control Protocol (MCP)** - Allow external AI models to access Brainy data and use the augmentation pipeline as tools

## 🚀 Live Demo

**[Try the live demo](https://soulcraft-research.github.io/brainy/demo/index.html)** - Check out the interactive demo on GitHub Pages that showcases Brainy's main features.

## 📊 What Can You Build?

- **Semantic Search Engines** - Find content based on meaning, not just keywords
- **Recommendation Systems** - Suggest similar items based on vector similarity
- **Knowledge Graphs** - Build connected data structures with relationships
- **AI Applications** - Store and retrieve embeddings for machine learning models
- **AI-Enhanced Applications** - Build applications that leverage vector embeddings for intelligent data processing
- **Data Organization Tools** - Automatically categorize and connect related information
- **Adaptive Experiences** - Create applications that learn and evolve with your users
- **Model-Integrated Systems** - Connect external AI models to Brainy data and tools using MCP

## 🔧 Installation

```bash
npm install @soulcraft/brainy
```

TensorFlow.js packages are included as required dependencies and will be automatically installed. If you encounter dependency conflicts, you may need to use the `--legacy-peer-deps` flag:

```bash
npm install @soulcraft/brainy --legacy-peer-deps
```

## 🏁 Quick Start

Brainy uses a unified build that automatically adapts to your environment (Node.js, browser, or serverless):

```typescript
import { BrainyData, NounType, VerbType } from '@soulcraft/brainy'

// Create and initialize the database
const db = new BrainyData()
await db.init()

// Add data (automatically converted to vectors)
const catId = await db.add("Cats are independent pets", {
  noun: NounType.Thing,
  category: 'animal'
})

const dogId = await db.add("Dogs are loyal companions", {
  noun: NounType.Thing,
  category: 'animal'
})

// Search for similar items
const results = await db.searchText("feline pets", 2)
console.log(results)

// Add a relationship between items
await db.addVerb(catId, dogId, {
  verb: VerbType.RelatedTo,
  description: 'Both are common household pets'
})
```

### Import Options

```typescript
// Standard import - automatically adapts to any environment
import { BrainyData } from '@soulcraft/brainy'

// Minified version for production
import { BrainyData } from '@soulcraft/brainy/min'
```

### Browser Usage

```html
<script type="module">
  // Use local files instead of CDN
  import { BrainyData } from './dist/unified.js'

  // Or minified version
  // import { BrainyData } from './dist/unified.min.js'

  const db = new BrainyData()
  await db.init()
  // ...
</script>
```

Modern bundlers like Webpack, Rollup, and Vite will automatically use the unified build, which adapts to any environment.

## 🧩 How It Works

Brainy combines four key technologies to create its adaptive intelligence:

1. **Vector Embeddings** - Converts data (text, images, etc.) into numerical vectors that capture semantic meaning
2. **HNSW Algorithm** - Enables fast similarity search through a hierarchical graph structure
3. **Adaptive Environment Detection** - Automatically senses your platform and optimizes accordingly:
   - Detects browser, Node.js, and serverless environments
   - Adjusts performance parameters based on available resources
   - Learns from query patterns to optimize future searches
   - Tunes itself for your specific use cases
4. **Intelligent Storage Selection** - Uses the best available storage option for your environment:
   - Browser: Origin Private File System (OPFS)
   - Node.js: File system
   - Server: S3-compatible storage (optional)
   - Serverless: In-memory storage with optional cloud persistence
   - Fallback: In-memory storage
   - Automatically migrates between storage types as needed

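The environment-detection step can be pictured as plain feature checks on the host globals. The sketch below is illustrative only - `detectEnvironment` and `pickStorage` are hypothetical helpers, not Brainy's API - but it shows the kind of test that drives the storage choices above:

```typescript
// Hypothetical sketch of environment detection via feature checks on globals.
// Brainy's real detection logic lives inside the library; this only illustrates the idea.
type Environment = 'browser' | 'node' | 'unknown'

const g = globalThis as any

function detectEnvironment(): Environment {
  // Browsers expose a global `window` with a `document`
  if (typeof g.window !== 'undefined' && typeof g.window.document !== 'undefined') {
    return 'browser'
  }
  // Node.js exposes `process.versions.node`
  if (typeof g.process !== 'undefined' && g.process.versions?.node) {
    return 'node'
  }
  return 'unknown'
}

// The detected environment then drives the storage selection described above
function pickStorage(env: Environment): 'opfs' | 'filesystem' | 'memory' {
  switch (env) {
    case 'browser':
      return 'opfs'
    case 'node':
      return 'filesystem'
    default:
      return 'memory'
  }
}
```

In Brainy itself this all happens automatically at `init()`; you only reach for explicit storage configuration when you want to override the default choice.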
## 🚀 The Brainy Pipeline

Brainy's data processing pipeline transforms raw data into searchable, connected knowledge that gets smarter over time:

```
Raw Data → Embedding → Vector Storage → Graph Connections → Adaptive Learning → Query & Retrieval
```

Each time data flows through this pipeline, Brainy learns more about your usage patterns and environment, making future operations faster and more relevant.

### Pipeline Stages

1. **Data Ingestion**
   - Raw text or pre-computed vectors enter the pipeline
   - Data is validated and prepared for processing

2. **Embedding Generation**
   - Text is transformed into numerical vectors using embedding models
   - Uses TensorFlow Universal Sentence Encoder for high-quality text embeddings
   - Custom embedding functions can be plugged in for specialized domains

3. **Vector Indexing**
   - Vectors are indexed using the HNSW algorithm
   - Hierarchical structure enables fast similarity search
   - Configurable parameters for precision vs. performance tradeoffs

4. **Graph Construction**
   - Nouns (entities) become nodes in the knowledge graph
   - Verbs (relationships) connect related entities
   - Typed relationships add semantic meaning to connections

5. **Adaptive Learning**
   - Analyzes usage patterns to optimize future operations
   - Tunes performance parameters based on your environment
   - Adjusts search strategies based on query history
   - Becomes more efficient and relevant the more you use it

6. **Intelligent Storage**
   - Data is saved using the optimal storage for your environment
   - Automatic selection between OPFS, filesystem, S3, or memory
   - Migrates between storage types as your application's needs evolve
   - Scales from tiny datasets to massive data collections
   - Configurable storage adapters for custom persistence needs

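To see what the vector-indexing stage is accelerating, here is the naive alternative: once everything is a vector, search is just scoring the query against every stored vector and keeping the top k. This is a self-contained sketch (the helper names are made up, and Brainy replaces this linear scan with its HNSW traversal):

```typescript
// Cosine similarity between two equal-length vectors
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0
  let normA = 0
  let normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB))
}

// Brute-force nearest-neighbour search: score every stored vector.
// HNSW exists precisely to avoid this O(n) scan on large collections.
function bruteForceSearch(
  query: number[],
  items: { id: string; vector: number[] }[],
  k: number
): { id: string; score: number }[] {
  return items
    .map(({ id, vector }) => ({ id, score: cosineSimilarity(query, vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
}
```

The HNSW parameters shown later (`M`, `efConstruction`, `efSearch`) tune how thoroughly the index explores its graph instead of performing this exhaustive comparison.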
### Augmentation Types

Brainy uses a powerful augmentation system to extend functionality. Augmentations are processed in the following order:

1. **SENSE**
   - Ingests and processes raw, unstructured data into nouns and verbs
   - Handles text, images, audio streams, and other input formats
   - Example: Converting raw text into structured entities

2. **MEMORY**
   - Provides storage capabilities for data in different formats
   - Manages persistence across sessions
   - Example: Storing vectors in OPFS or filesystem

3. **COGNITION**
   - Enables advanced reasoning, inference, and logical operations
   - Analyzes relationships between entities
   - Examples:
     - Inferring new connections between existing data
     - Deriving insights from graph relationships

4. **CONDUIT**
   - Establishes channels for structured data exchange
   - Connects with external systems and syncs between Brainy instances
   - Two built-in iConduit augmentations for scaling out and syncing:
     - **WebSocket iConduit** - Syncs data between browsers and servers
     - **WebRTC iConduit** - Direct peer-to-peer syncing between browsers
   - Examples:
     - Integrating with third-party APIs
     - Syncing Brainy instances between browsers using WebSockets
     - Peer-to-peer syncing between browsers using WebRTC

5. **ACTIVATION**
   - Initiates actions, responses, or data manipulations
   - Triggers events based on data changes
   - Example: Sending notifications when new data is processed

6. **PERCEPTION**
   - Interprets, contextualizes, and visualizes identified nouns and verbs
   - Creates meaningful representations of data
   - Example: Generating visualizations of graph relationships

7. **DIALOG**
   - Facilitates natural language understanding and generation
   - Enables conversational interactions
   - Example: Processing user queries and generating responses

8. **WEBSOCKET**
   - Enables real-time communication via WebSockets
   - Can be combined with other augmentation types
   - Example: Streaming data processing in real-time

### Streaming Data Support

Brainy's pipeline is designed to handle streaming data efficiently:

1. **WebSocket Integration**
   - Built-in support for WebSocket connections
   - Process data as it arrives without blocking
   - Example: `setupWebSocketPipeline(url, dataType, options)`

2. **Asynchronous Processing**
   - Non-blocking architecture for real-time data handling
   - Parallel processing of incoming streams
   - Example: `createWebSocketHandler(connection, dataType, options)`

3. **Event-Based Architecture**
   - Augmentations can listen to data feeds and streams
   - Real-time updates propagate through the pipeline
   - Example: `listenToFeed(feedUrl, callback)`

4. **Threaded Execution**
   - Comprehensive multi-threading for high-performance operations
   - Parallel processing for batch operations, vector calculations, and embedding generation
   - Configurable execution modes (SEQUENTIAL, PARALLEL, THREADED)
   - Automatic thread management based on environment capabilities
   - Example: `executeTypedPipeline(augmentations, method, args, { mode: ExecutionMode.THREADED })`

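The non-blocking idea behind items 1 and 2 can be sketched without any Brainy APIs: the receive callback only appends to a queue and returns, while an async loop drains the queue in the background. A minimal sketch, assuming a generic `handle` function (the real entry points are the `setupWebSocketPipeline`/`createWebSocketHandler` helpers listed above):

```typescript
// Minimal non-blocking stream processor: enqueue on arrival, drain asynchronously.
class StreamProcessor<T> {
  private queue: T[] = []
  private draining = false
  public processed: T[] = []

  constructor(private handle: (item: T) => Promise<void>) {}

  // Called from the message callback; returns immediately without awaiting work
  push(item: T): void {
    this.queue.push(item)
    void this.drain()
  }

  private async drain(): Promise<void> {
    if (this.draining) return // a drain loop is already running
    this.draining = true
    while (this.queue.length > 0) {
      const item = this.queue.shift()!
      await this.handle(item) // e.g. embed + index the incoming data
      this.processed.push(item)
    }
    this.draining = false
  }
}
```

Because `push` never awaits, a WebSocket `onmessage` handler wired to it can keep up with a fast feed while embedding and indexing proceed at their own pace.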
### Build System

Brainy uses a modern build system that optimizes for both Node.js and browser environments:

1. **ES Modules**
   - Built as ES modules for maximum compatibility
   - Works in modern browsers and Node.js environments
   - Separate optimized builds for browser and Node.js

2. **Environment-Specific Builds**
   - **Node.js Build**: Optimized for server environments with full functionality
   - **Browser Build**: Optimized for browser environments with reduced bundle size
   - Conditional exports in package.json for automatic environment detection

3. **Environment Detection**
   - Automatically detects whether it's running in a browser or Node.js
   - Loads appropriate dependencies and functionality based on the environment
   - Provides a consistent API across all environments

4. **TypeScript**
   - Written in TypeScript for type safety and a better developer experience
   - Generates type definitions for TypeScript users
   - Compiled to ES2020 for modern JavaScript environments

5. **Build Scripts**
   - `npm run build`: Builds the Node.js version
   - `npm run build:browser`: Builds the browser-optimized version
   - `npm run build:all`: Builds both versions
   - `npm run demo`: Builds all versions and starts a demo server
   - `npm run deploy:demo`: Deploys the examples directory to GitHub Pages

### Running the Pipeline

The pipeline runs automatically when you add, search, or connect data:

```typescript
// Add data (runs embedding → indexing → storage)
const id = await db.add("Your text data here", { metadata })

// Search (runs embedding → similarity search)
const results = await db.searchText("Your query here", 5)

// Connect entities (runs graph construction → storage)
await db.addVerb(sourceId, targetId, { verb: VerbType.RelatedTo })
```

Using the CLI:

```bash
# Add data through the CLI pipeline
brainy add "Your text data here" '{"noun":"Thing"}'

# Search through the CLI pipeline
brainy search "Your query here" --limit 5

# Connect entities through the CLI
brainy addVerb <sourceId> <targetId> RelatedTo
```

### Extending the Pipeline

Brainy's pipeline is designed for extensibility at every stage:

1. **Custom Embedding**
   ```typescript
   // Create your own embedding function
   const myEmbedder = async (text) => {
     // Your custom embedding logic here
     return [0.1, 0.2, 0.3, ...] // Return a vector
   }

   // Use it in Brainy
   const db = new BrainyData({
     embeddingFunction: myEmbedder
   })
   ```

2. **Custom Distance Functions**
   ```typescript
   // Define your own distance function
   const myDistance = (a, b) => {
     // Your custom distance calculation
     return Math.sqrt(a.reduce((sum, val, i) => sum + Math.pow(val - b[i], 2), 0))
   }

   // Use it in Brainy
   const db = new BrainyData({
     distanceFunction: myDistance
   })
   ```

3. **Custom Storage Adapters**
   ```typescript
   // Implement the StorageAdapter interface
   class MyStorage implements StorageAdapter {
     // Your storage implementation
   }

   // Use it in Brainy
   const db = new BrainyData({
     storageAdapter: new MyStorage()
   })
   ```

4. **Augmentations System**
   ```typescript
   // Create custom augmentations to extend functionality
   const myAugmentation = {
     type: 'memory',
     name: 'my-custom-storage',
     // Implementation details
   }

   // Register with Brainy
   db.registerAugmentation(myAugmentation)
   ```

## Data Model

Brainy uses a graph-based data model with two primary concepts:

### Nouns (Entities)

The main entities in your data (nodes in the graph):

- Each noun has a unique ID, vector representation, and metadata
- Nouns can be categorized by type (Person, Place, Thing, Event, Concept, etc.)
- Nouns are automatically vectorized for similarity search

### Verbs (Relationships)

Connections between nouns (edges in the graph):

- Each verb connects a source noun to a target noun
- Verbs have types that define the relationship (RelatedTo, Controls, Contains, etc.)
- Verbs can have their own metadata to describe the relationship

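In TypeScript terms, the two concepts can be pictured as record shapes like the following. These interfaces are illustrative simplifications, not Brainy's actual exported types (those live in the package's `.d.ts` files):

```typescript
// Simplified, illustrative shapes for the two core concepts.
interface Noun {
  id: string            // unique ID
  vector: number[]      // vector representation used for similarity search
  metadata: {
    noun: string        // noun type, e.g. 'Person', 'Place', 'Thing'
    [key: string]: unknown
  }
}

interface Verb {
  id: string
  sourceId: string      // the noun this relationship starts from
  targetId: string      // the noun it points to
  metadata: {
    verb: string        // verb type, e.g. 'RelatedTo', 'Contains'
    [key: string]: unknown
  }
}

// A noun and a verb as they would relate to each other in the graph
const cat: Noun = {
  id: 'noun-1',
  vector: [0.1, 0.2],
  metadata: { noun: 'Thing', category: 'animal' }
}

const rel: Verb = {
  id: 'verb-1',
  sourceId: 'noun-1',
  targetId: 'noun-2',
  metadata: { verb: 'RelatedTo' }
}
```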
## Command Line Interface

Brainy includes a powerful CLI for managing your data:

```bash
# Install globally
npm install -g @soulcraft/brainy --legacy-peer-deps

# Initialize a database
brainy init

# Add some data
brainy add "Cats are independent pets" '{"noun":"Thing","category":"animal"}'
brainy add "Dogs are loyal companions" '{"noun":"Thing","category":"animal"}'

# Search for similar items
brainy search "feline pets" 5

# Add relationships between items
brainy addVerb <sourceId> <targetId> RelatedTo '{"description":"Both are pets"}'

# Visualize the graph structure
brainy visualize
brainy visualize --root <id> --depth 3
```

### Development Usage

```bash
# Run the CLI directly from the source
npm run cli help

# Generate a random graph for testing
npm run cli generate-random-graph --noun-count 20 --verb-count 40
```

### Available Commands

#### Basic Database Operations:

- `init` - Initialize a new database
- `add <text> [metadata]` - Add a new noun with text and optional metadata
- `search <query> [limit]` - Search for nouns similar to the query
- `get <id>` - Get a noun by ID
- `delete <id>` - Delete a noun by ID
- `addVerb <sourceId> <targetId> <verbType> [metadata]` - Add a relationship
- `getVerbs <id>` - Get all relationships for a noun
- `status` - Show database status
- `clear` - Clear all data from the database
- `generate-random-graph` - Generate test data
- `visualize` - Visualize the graph structure
- `completion-setup` - Set up shell autocomplete

#### Pipeline and Augmentation Commands:

- `list-augmentations` - List all available augmentation types and registered augmentations
- `augmentation-info <type>` - Get detailed information about a specific augmentation type
- `test-pipeline [text]` - Test the sequential pipeline with sample data
  - `-t, --data-type <type>` - Type of data to process (default: 'text')
  - `-m, --mode <mode>` - Execution mode: sequential, parallel, threaded (default: 'sequential')
  - `-s, --stop-on-error` - Stop execution if an error occurs
  - `-v, --verbose` - Show detailed output
- `stream-test` - Test streaming data through the pipeline (simulated)
  - `-c, --count <number>` - Number of data items to stream (default: 5)
  - `-i, --interval <ms>` - Interval between data items in milliseconds (default: 1000)
  - `-t, --data-type <type>` - Type of data to process (default: 'text')
  - `-v, --verbose` - Show detailed output

## API Reference

### Database Management

```typescript
// Initialize the database
await db.init()

// Clear all data
await db.clear()

// Get database status
const status = await db.status()

// Back up all data from the database
const backupData = await db.backup()

// Restore data into the database
const restoreResult = await db.restore(backupData, { clearExisting: true })
```

### Working with Nouns (Entities)

```typescript
// Add a noun (automatically vectorized)
const id = await db.add(textOrVector, {
  noun: NounType.Thing,
  // other metadata...
})

// Add multiple nouns in parallel (with multithreading)
const ids = await db.addBatch([
  {
    vectorOrData: "First item to add",
    metadata: { noun: NounType.Thing, category: 'example' }
  },
  {
    vectorOrData: "Second item to add",
    metadata: { noun: NounType.Thing, category: 'example' }
  },
  // More items...
], {
  forceEmbed: false,
  concurrency: 4 // Control the level of parallelism (default: 4)
})

// Retrieve a noun
const noun = await db.get(id)

// Update noun metadata
await db.updateMetadata(id, {
  noun: NounType.Thing,
  // updated metadata...
})

// Delete a noun
await db.delete(id)

// Search for similar nouns
const results = await db.search(vectorOrText, numResults)
const textResults = await db.searchText("query text", numResults)

// Search by noun type
const thingNouns = await db.searchByNounTypes([NounType.Thing], numResults)
```

### Working with Verbs (Relationships)

```typescript
// Add a relationship between nouns
await db.addVerb(sourceId, targetId, {
  verb: VerbType.RelatedTo,
  // other metadata...
})

// Get all relationships
const verbs = await db.getAllVerbs()

// Get relationships by source noun
const outgoingVerbs = await db.getVerbsBySource(sourceId)

// Get relationships by target noun
const incomingVerbs = await db.getVerbsByTarget(targetId)

// Get relationships by type
const containsVerbs = await db.getVerbsByType(VerbType.Contains)

// Get a specific relationship
const verb = await db.getVerb(verbId)

// Delete a relationship
await db.deleteVerb(verbId)
```

## Advanced Configuration

### Embedding

```typescript
import {
  BrainyData,
  createTensorFlowEmbeddingFunction,
  createThreadedEmbeddingFunction
} from '@soulcraft/brainy'

// Use the standard TensorFlow Universal Sentence Encoder embedding function
const db = new BrainyData({
  embeddingFunction: createTensorFlowEmbeddingFunction()
})
await db.init()

// Or use the threaded embedding function for better performance
const threadedDb = new BrainyData({
  embeddingFunction: createThreadedEmbeddingFunction({
    fallbackToMain: true // Fall back to main thread if threading fails
  })
})
await threadedDb.init()

// Directly embed text to vectors
const vector = await db.embed("Some text to convert to a vector")
```

The threaded embedding function runs in a separate thread (Web Worker in browsers, Worker Thread in Node.js) to improve performance, especially for CPU-intensive embedding operations. It automatically falls back to the main thread if threading is not available in the current environment.
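The `fallbackToMain` behaviour is the familiar try-the-worker-first pattern. A minimal sketch with hypothetical function types, not the library's internals:

```typescript
// Hypothetical sketch of the fallback-to-main-thread pattern.
type EmbedFn = (text: string) => Promise<number[]>

function withFallback(threaded: EmbedFn, mainThread: EmbedFn): EmbedFn {
  return async (text) => {
    try {
      // Prefer the worker-based path when it is available
      return await threaded(text)
    } catch {
      // Threading unavailable (or the worker failed): run inline instead
      return mainThread(text)
    }
  }
}
```

`createThreadedEmbeddingFunction({ fallbackToMain: true })` gives you this behaviour without wiring it up yourself.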
### Performance Tuning

Brainy includes comprehensive multithreading support to improve performance across all environments:

1. **Parallel Batch Processing**: Add multiple items concurrently with controlled parallelism
2. **Multithreaded Vector Search**: Perform distance calculations in parallel for faster search operations
3. **Threaded Embedding Generation**: Generate embeddings in separate threads to avoid blocking the main thread
4. **Automatic Environment Detection**: Adapts to browser (Web Workers) and Node.js (Worker Threads) environments

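The "controlled parallelism" in item 1 is the standard concurrency-pool pattern: start a fixed number of workers that each pull the next item until the list is empty. A self-contained sketch of what a `concurrency` option like `addBatch`'s does conceptually (not Brainy's actual implementation):

```typescript
// Run `worker` over `items` with at most `concurrency` tasks in flight.
async function mapWithConcurrency<T, R>(
  items: T[],
  worker: (item: T) => Promise<R>,
  concurrency = 4
): Promise<R[]> {
  const results: R[] = new Array(items.length)
  let next = 0

  async function runLane(): Promise<void> {
    while (next < items.length) {
      const i = next++ // claim the next index (safe: claims happen synchronously)
      results[i] = await worker(items[i])
    }
  }

  // Start up to `concurrency` lanes that keep pulling work until the list is empty
  const lanes = Array.from(
    { length: Math.min(concurrency, items.length) },
    () => runLane()
  )
  await Promise.all(lanes)
  return results
}
```

Capping parallelism like this keeps memory and embedding-model pressure bounded while still overlapping I/O and computation.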
```typescript
import { BrainyData, euclideanDistance } from '@soulcraft/brainy'

// Configure with custom options
const db = new BrainyData({
  // Use Euclidean distance instead of default cosine distance
  distanceFunction: euclideanDistance,

  // HNSW index configuration for search performance
  hnsw: {
    M: 16, // Max connections per noun
    efConstruction: 200, // Construction candidate list size
    efSearch: 50, // Search candidate list size
  },

  // Multithreading options
  threading: {
    useParallelization: true, // Enable multithreaded search operations
  },

  // Noun and Verb type validation
  typeValidation: {
    enforceNounTypes: true, // Validate noun types against NounType enum
    enforceVerbTypes: true, // Validate verb types against VerbType enum
  },

  // Storage configuration
  storage: {
    requestPersistentStorage: true,
    // Example configuration for cloud storage (replace with your own values):
    // s3Storage: {
    //   bucketName: 'your-s3-bucket-name',
    //   region: 'your-aws-region'
    //   // Credentials should be provided via environment variables
    //   // AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY
    // }
  }
})
```

### Optimized HNSW for Large Datasets

Brainy includes an optimized HNSW index implementation for large datasets that may not fit entirely in memory, using a hybrid approach:

1. **Product Quantization** - Reduces vector dimensionality while preserving similarity relationships
2. **Disk-Based Storage** - Offloads vectors to disk when memory usage exceeds a threshold
3. **Memory-Efficient Indexing** - Optimizes memory usage for large-scale vector collections

|
|
661
|
+
|
|
662
|
+
```typescript
|
|
663
|
+
import { BrainyData } from '@soulcraft/brainy'
|
|
664
|
+
|
|
665
|
+
// Configure with optimized HNSW index for large datasets
|
|
666
|
+
const db = new BrainyData({
|
|
667
|
+
hnswOptimized: {
|
|
668
|
+
// Standard HNSW parameters
|
|
669
|
+
M: 16, // Max connections per noun
|
|
670
|
+
efConstruction: 200, // Construction candidate list size
|
|
671
|
+
efSearch: 50, // Search candidate list size
|
|
672
|
+
|
|
673
|
+
// Memory threshold in bytes - when exceeded, will use disk-based approach
|
|
674
|
+
memoryThreshold: 1024 * 1024 * 1024, // 1GB default threshold
|
|
675
|
+
|
|
676
|
+
// Product quantization settings for dimensionality reduction
|
|
677
|
+
productQuantization: {
|
|
678
|
+
enabled: true, // Enable product quantization
|
|
679
|
+
numSubvectors: 16, // Number of subvectors to split the vector into
|
|
680
|
+
numCentroids: 256 // Number of centroids per subvector
|
|
681
|
+
},
|
|
682
|
+
|
|
683
|
+
// Whether to use disk-based storage for the index
|
|
684
|
+
useDiskBasedIndex: true // Enable disk-based storage
|
|
685
|
+
},
|
|
686
|
+
|
|
687
|
+
// Storage configuration (required for disk-based index)
|
|
688
|
+
storage: {
|
|
689
|
+
requestPersistentStorage: true
|
|
690
|
+
}
|
|
691
|
+
})
|
|
692
|
+
|
|
693
|
+
// The optimized index automatically adapts based on dataset size:
|
|
694
|
+
// 1. For small datasets: Uses standard in-memory approach
|
|
695
|
+
// 2. For medium datasets: Applies product quantization to reduce memory usage
|
|
696
|
+
// 3. For large datasets: Combines product quantization with disk-based storage
|
|
697
|
+
|
|
698
|
+
// Check status to see memory usage and optimization details
|
|
699
|
+
const status = await db.status()
|
|
700
|
+
console.log(status.details.index)
|
|
701
|
+
```
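
To build intuition for what product quantization does, here is a toy sketch in plain TypeScript (an illustration of the concept only, not Brainy's internal implementation): each vector is split into subvectors, and each subvector is replaced by the index of its nearest centroid, shrinking a float vector to a handful of small integers.

```typescript
// Toy product quantization: encode a vector as one centroid index per subvector.
type Codebook = number[][][] // codebook[subvector][centroid][dimension]

function splitVector(v: number[], numSubvectors: number): number[][] {
  const size = v.length / numSubvectors
  const parts: number[][] = []
  for (let i = 0; i < numSubvectors; i++) {
    parts.push(v.slice(i * size, (i + 1) * size))
  }
  return parts
}

function nearestCentroid(sub: number[], centroids: number[][]): number {
  let best = 0
  let bestDist = Infinity
  centroids.forEach((c, i) => {
    const d = c.reduce((acc, x, j) => acc + (x - sub[j]) ** 2, 0)
    if (d < bestDist) {
      bestDist = d
      best = i
    }
  })
  return best
}

// Encode: a d-dimensional float vector becomes numSubvectors small integers
function encode(v: number[], codebook: Codebook): number[] {
  return splitVector(v, codebook.length).map((sub, i) => nearestCentroid(sub, codebook[i]))
}

// Example: 4-dimensional vectors, 2 subvectors, 2 centroids per subvector
const codebook: Codebook = [
  [[0, 0], [1, 1]], // centroids for the first half of the vector
  [[0, 0], [1, 1]]  // centroids for the second half
]
console.log(encode([0.9, 1.1, 0.1, -0.1], codebook)) // → [ 1, 0 ]
```

With 16 subvectors and 256 centroids each (the defaults above), a vector can be stored as roughly 16 bytes of centroid indices instead of hundreds of floats.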

## Distance Functions

- `cosineDistance` (default)
- `euclideanDistance`
- `manhattanDistance`
- `dotProductDistance`
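
For intuition, the four distances correspond to the standard formulas below (a plain-TypeScript sketch; Brainy exports its own implementations of these functions):

```typescript
// Plain-TypeScript sketches of the four distance functions (illustrative only)
const dot = (a: number[], b: number[]) => a.reduce((s, x, i) => s + x * b[i], 0)
const norm = (a: number[]) => Math.sqrt(dot(a, a))

// Cosine distance: 1 minus the cosine of the angle between a and b
const cosineDistance = (a: number[], b: number[]) => 1 - dot(a, b) / (norm(a) * norm(b))

// Euclidean (L2) distance: straight-line distance
const euclideanDistance = (a: number[], b: number[]) =>
  Math.sqrt(a.reduce((s, x, i) => s + (x - b[i]) ** 2, 0))

// Manhattan (L1) distance: sum of absolute coordinate differences
const manhattanDistance = (a: number[], b: number[]) =>
  a.reduce((s, x, i) => s + Math.abs(x - b[i]), 0)

// Dot-product distance: a larger dot product means more similar, so negate it
const dotProductDistance = (a: number[], b: number[]) => -dot(a, b)

console.log(euclideanDistance([0, 0], [3, 4])) // → 5
console.log(manhattanDistance([0, 0], [3, 4])) // → 7
```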

## Backup and Restore

Brainy provides backup and restore capabilities that allow you to:

- Back up your data
- Transfer data between Brainy instances
- Restore existing data into Brainy for vectorization and indexing
- Back up data for analysis or visualization in other tools

### Backing Up Data

```typescript
import fs from 'fs'

// Back up all data from the database
const backupData = await db.backup()

// The backup data includes:
// - All nouns (entities) with their vectors and metadata
// - All verbs (relationships) between nouns
// - Noun types and verb types
// - HNSW index data for fast similarity search
// - Version information

// Save the backup data to a file (Node.js environment)
fs.writeFileSync('brainy-backup.json', JSON.stringify(backupData, null, 2))
```

### Restoring Data

Brainy's restore functionality can handle:

1. Complete backups with vectors and index data
2. Sparse data without vectors (vectors will be created during restore)
3. Data without an HNSW index (the index will be reconstructed if needed)

```typescript
// Restore data with all options
const restoreResult = await db.restore(backupData, {
  clearExisting: true // Whether to clear existing data before restoring
})

// Import sparse data (without vectors)
// Vectors will be created automatically using the embedding function
const sparseData = {
  nouns: [
    {
      id: '123',
      // No vector field - it will be created during import
      metadata: {
        noun: 'Thing',
        text: 'This text will be used to generate a vector'
      }
    }
  ],
  verbs: [],
  version: '1.0.0'
}

const sparseImportResult = await db.importSparseData(sparseData)
```
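
When assembling sparse data by hand, a small shape check can catch mistakes before calling `importSparseData`. The helper below is a convenience sketch, not part of Brainy's API; the field names follow the example above.

```typescript
// Minimal shape check for sparse-data payloads (convenience sketch, not Brainy's API)
interface SparseNoun {
  id: string
  vector?: number[]
  metadata: { noun: string; [key: string]: unknown }
}

interface SparseData {
  nouns: SparseNoun[]
  verbs: unknown[]
  version: string
}

function isSparseData(value: any): value is SparseData {
  return (
    Array.isArray(value?.nouns) &&
    value.nouns.every(
      (n: any) => typeof n?.id === 'string' && typeof n?.metadata?.noun === 'string'
    ) &&
    Array.isArray(value?.verbs) &&
    typeof value?.version === 'string'
  )
}

console.log(isSparseData({ nouns: [{ id: '123', metadata: { noun: 'Thing' } }], verbs: [], version: '1.0.0' })) // → true
console.log(isSparseData({ nouns: [{}] })) // → false
```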

### CLI Backup/Restore

```bash
# Backup data to a file
brainy backup --output brainy-backup.json

# Restore data from a file
brainy restore --input brainy-backup.json --clear-existing

# Import sparse data (without vectors)
brainy import-sparse --input sparse-data.json
```

## Embedding

Brainy uses the following embedding approach:

- TensorFlow Universal Sentence Encoder (high-quality text embeddings)
- Custom embedding functions can be plugged in for specialized domains
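
Conceptually, an embedding function is just a map from text to a fixed-length vector. The toy character-frequency embedder below illustrates the shape of such a function (for illustration only; a real custom embedding function would call a model such as the Universal Sentence Encoder):

```typescript
// Toy embedding: map text to a fixed-length, unit-normalized vector of letter frequencies.
// Illustrative only - production code would call a real embedding model instead.
function toyEmbed(text: string, dims = 26): number[] {
  const v: number[] = new Array(dims).fill(0)
  for (const ch of text.toLowerCase()) {
    const code = ch.charCodeAt(0) - 97 // 'a' -> 0 ... 'z' -> 25
    if (code >= 0 && code < dims) v[code] += 1
  }
  const len = Math.sqrt(v.reduce((s, x) => s + x * x, 0)) || 1
  return v.map((x) => x / len) // normalize to unit length
}

console.log(toyEmbed('abba').slice(0, 3)) // 'a' and 'b' components ≈ 0.707, 'c' is 0
```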

## Extensions

Brainy includes an augmentation system for extending functionality:

- **Memory Augmentations**: Different storage backends
- **Sense Augmentations**: Process raw data
- **Cognition Augmentations**: Reasoning and inference
- **Dialog Augmentations**: Text processing and interaction
- **Perception Augmentations**: Data interpretation and visualization
- **Activation Augmentations**: Trigger actions

### Simplified Augmentation System

Brainy provides a simplified factory system for creating, importing, and executing augmentations with minimal boilerplate:

```typescript
import {
  createMemoryAugmentation,
  createConduitAugmentation,
  createSenseAugmentation,
  addWebSocketSupport,
  executeStreamlined,
  processStaticData,
  processStreamingData,
  createPipeline,
  loadAugmentationModule
} from '@soulcraft/brainy'

// Create a memory augmentation with minimal code
const memoryAug = createMemoryAugmentation({
  name: 'simple-memory',
  description: 'A simple in-memory storage augmentation',
  autoRegister: true,
  autoInitialize: true,

  // Implement only the methods you need
  storeData: async (key, data) => {
    // Your implementation here
    return {
      success: true,
      data: true
    }
  },

  retrieveData: async (key) => {
    // Your implementation here
    return {
      success: true,
      data: { example: 'data', key }
    }
  }
})

// Add WebSocket support to any augmentation
const wsAugmentation = addWebSocketSupport(memoryAug, {
  connectWebSocket: async (url) => {
    // Your implementation here
    return {
      connectionId: 'ws-1',
      url,
      status: 'connected'
    }
  }
})

// Process static data through a pipeline
// (senseAug below is a sense augmentation created with createSenseAugmentation, omitted here)
const result = await processStaticData(
  'Input data',
  [
    {
      augmentation: senseAug,
      method: 'processRawData',
      transformArgs: (data) => [data, 'text']
    },
    {
      augmentation: memoryAug,
      method: 'storeData',
      transformArgs: (data) => ['processed-data', data]
    }
  ]
)

// Create a reusable pipeline
const pipeline = createPipeline([
  {
    augmentation: senseAug,
    method: 'processRawData',
    transformArgs: (data) => [data, 'text']
  },
  {
    augmentation: memoryAug,
    method: 'storeData',
    transformArgs: (data) => ['processed-data', data]
  }
])

// Use the pipeline
const pipelineResult = await pipeline('New input data')

// Dynamically load augmentations at runtime
const loadedAugmentations = await loadAugmentationModule(
  import('./my-augmentations.js'),
  {
    autoRegister: true,
    autoInitialize: true
  }
)
```

The simplified augmentation system provides:

1. **Factory Functions** - Create augmentations with minimal boilerplate
2. **WebSocket Support** - Add WebSocket capabilities to any augmentation
3. **Streamlined Pipeline** - Process data through augmentations more efficiently
4. **Dynamic Loading** - Load augmentations at runtime when needed
5. **Static & Streaming Data** - Handle both static and streaming data with the same API

### Model Control Protocol (MCP)

Brainy includes a Model Control Protocol (MCP) implementation that allows external models to access Brainy data and use the augmentation pipeline as tools:

- **BrainyMCPAdapter**: Provides access to Brainy data through MCP
- **MCPAugmentationToolset**: Exposes the augmentation pipeline as tools
- **BrainyMCPService**: Integrates the adapter and toolset, providing WebSocket and REST server implementations

Environment compatibility:

- **BrainyMCPAdapter** and **MCPAugmentationToolset** can run in any environment (browser, Node.js, server)
- **BrainyMCPService** core functionality works in any environment, but the server functionality (WebSocket/REST) is in the cloud-wrapper project

For detailed documentation and usage examples, see the [MCP documentation](src/mcp/README.md).

## Cross-Environment Compatibility

Brainy is designed to run seamlessly in any environment, from browsers to Node.js to serverless functions and containers. All Brainy data, functions, and augmentations are environment-agnostic, allowing you to use the same code everywhere.

### Environment Detection

Brainy automatically detects the environment it's running in:

```typescript
import { environment } from '@soulcraft/brainy'

// Check which environment we're running in
console.log(`Running in ${
  environment.isBrowser ? 'browser' :
  environment.isNode ? 'Node.js' :
  'serverless/unknown'
} environment`)
```
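
Detection of this kind typically boils down to probing globals. A minimal sketch (not Brainy's actual implementation):

```typescript
// Minimal environment probe via globals (illustrative - not Brainy's internal code)
const g = globalThis as any

// A browser exposes window and document; Node.js exposes process.versions.node
const isBrowser = typeof g.window !== 'undefined' && typeof g.document !== 'undefined'
const isNode = typeof g.process !== 'undefined' && !!g.process.versions?.node

console.log(isNode ? 'Node.js' : isBrowser ? 'browser' : 'serverless/unknown')
```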

### Adaptive Storage

Storage adapters are automatically selected based on the environment:

- **Browser**: Uses the Origin Private File System (OPFS) when available, falling back to in-memory storage
- **Node.js**: Uses file system storage by default, with options for S3-compatible cloud storage
- **Serverless**: Uses in-memory storage with options for cloud persistence
- **Container**: Automatically detects and uses the appropriate storage based on available capabilities

### Dynamic Imports

Brainy uses dynamic imports to load environment-specific dependencies only when needed, keeping the bundle size small and ensuring compatibility across environments.

### Browser Support

Works in the following modern browsers:

- Chrome 86+
- Edge 86+
- Opera 72+
- Chrome for Android 86+

For browsers without OPFS support, Brainy falls back to in-memory storage.

## Cloud Deployment

Brainy can be deployed as a standalone web service on various cloud platforms using the included cloud wrapper:

- **AWS Lambda and API Gateway**: Deploy as a serverless function behind API Gateway
- **Google Cloud Run**: Deploy as a containerized service
- **Cloudflare Workers**: Deploy as a serverless function on the edge

The cloud wrapper provides both RESTful and WebSocket APIs for all Brainy operations, enabling both request-response and real-time communication patterns. It supports multiple storage backends and can be configured via environment variables.

Key features of the cloud wrapper:

- RESTful API for standard CRUD operations
- WebSocket API for real-time updates and subscriptions
- Model Control Protocol (MCP) service for external model access
- Support for multiple storage backends (Memory, FileSystem, S3)
- Configurable via environment variables
- Deployment scripts for AWS, Google Cloud, and Cloudflare

### Deploying to the Cloud

You can deploy the cloud wrapper to various cloud platforms using the following npm scripts from the root directory:

```bash
# Deploy to AWS Lambda and API Gateway
npm run deploy:cloud:aws

# Deploy to Google Cloud Run
npm run deploy:cloud:gcp

# Deploy to Cloudflare Workers
npm run deploy:cloud:cloudflare

# Show available deployment options
npm run deploy:cloud
```

Before deploying, make sure to configure the appropriate environment variables in the `cloud-wrapper/.env` file. See the [Cloud Wrapper README](cloud-wrapper/README.md) for detailed configuration instructions and API documentation.

## Related Projects

- **[Cartographer](https://github.com/sodal-project/cartographer)** - A companion project that provides standardized interfaces for interacting with Brainy

## Demo

The repository includes a comprehensive demo that showcases Brainy's main features:

- `examples/demo.html` - A single demo page with animations demonstrating Brainy's features
- **[Try the live demo](https://soulcraft-research.github.io/brainy/demo/index.html)** - Check out the interactive demo on GitHub Pages
- Or run it locally with `npm run demo` (see the [demo instructions](README.demo.md) for details)
- To deploy your own version to GitHub Pages, run `npm run deploy:demo`
- To deploy automatically when pushing to the main branch, use the GitHub Actions workflow included in `.github/workflows/deploy-demo.yml`
- To use a custom domain (like www.soulcraft.com):
    1. A CNAME file is already included in the demo directory
    2. In your GitHub repository settings, go to Pages > Custom domain and enter your domain
    3. Configure your domain's DNS settings to point to GitHub Pages:
        - Add a CNAME record for www pointing to `<username>.github.io` (e.g., `soulcraft-research.github.io`)
        - Or, for an apex domain (soulcraft.com), add A records pointing to GitHub Pages' IP addresses

The demo showcases:

- How Brainy runs in different environments (browser, Node.js, server, cloud)
- How the noun-verb data model works
- How HNSW search works

## Syncing Brainy Instances

You can use the conduit augmentations to sync Brainy instances:

- **WebSocket iConduit**: For syncing between browsers and servers, or between servers. WebSockets cannot be used for direct browser-to-browser communication without a server in the middle.
- **WebRTC iConduit**: For direct peer-to-peer syncing between browsers. This is the recommended approach for browser-to-browser communication.

#### WebSocket Sync Example

```typescript
import {
  BrainyData,
  pipeline,
  createConduitAugmentation
} from '@soulcraft/brainy'

// Create and initialize the database
const db = new BrainyData()
await db.init()

// Create a WebSocket conduit augmentation
const wsConduit = await createConduitAugmentation('websocket', 'my-websocket-sync')

// Register the augmentation with the pipeline
pipeline.register(wsConduit)

// Connect to another Brainy instance (server or browser)
// Replace the example URL below with your actual WebSocket server URL
const connectionResult = await pipeline.executeConduitPipeline(
  'establishConnection',
  ['wss://example-websocket-server.com/brainy-sync', { protocols: 'brainy-sync' }]
)

if (connectionResult[0] && (await connectionResult[0]).success) {
  const connection = (await connectionResult[0]).data

  // Read data from the remote instance
  const readResult = await pipeline.executeConduitPipeline(
    'readData',
    [{ connectionId: connection.connectionId, query: { type: 'getAllNouns' } }]
  )

  // Process and add the received data to the local instance
  if (readResult[0] && (await readResult[0]).success) {
    const remoteNouns = (await readResult[0]).data
    for (const noun of remoteNouns) {
      await db.add(noun.vector, noun.metadata)
    }
  }

  // Set up real-time sync by monitoring the stream
  await wsConduit.monitorStream(connection.connectionId, async (data) => {
    // Handle incoming data (e.g., new nouns, verbs, updates)
    if (data.type === 'newNoun') {
      await db.add(data.vector, data.metadata)
    } else if (data.type === 'newVerb') {
      await db.addVerb(data.sourceId, data.targetId, data.vector, data.options)
    }
  })
}
```

#### WebRTC Peer-to-Peer Sync Example

```typescript
import {
  BrainyData,
  pipeline,
  createConduitAugmentation
} from '@soulcraft/brainy'

// Create and initialize the database
const db = new BrainyData()
await db.init()

// Create a WebRTC conduit augmentation
const webrtcConduit = await createConduitAugmentation('webrtc', 'my-webrtc-sync')

// Register the augmentation with the pipeline
pipeline.register(webrtcConduit)

// Connect to a peer using a signaling server
// Replace the example values below with your actual configuration
const connectionResult = await pipeline.executeConduitPipeline(
  'establishConnection',
  [
    'peer-id-to-connect-to', // Replace with the actual peer ID
    {
      signalServerUrl: 'wss://example-signal-server.com', // Replace with your signaling server
      localPeerId: 'my-local-peer-id', // Replace with your local peer ID
      iceServers: [{ urls: 'stun:stun.l.google.com:19302' }] // Public STUN server
    }
  ]
)

if (connectionResult[0] && (await connectionResult[0]).success) {
  const connection = (await connectionResult[0]).data

  // Set up real-time sync by monitoring the stream
  await webrtcConduit.monitorStream(connection.connectionId, async (data) => {
    // Handle incoming data (e.g., new nouns, verbs, updates)
    if (data.type === 'newNoun') {
      await db.add(data.vector, data.metadata)
    } else if (data.type === 'newVerb') {
      await db.addVerb(data.sourceId, data.targetId, data.vector, data.options)
    }
  })

  // When adding new data locally, also send it to the peer
  const nounId = await db.add('New data to sync', { noun: 'Thing' })
  const noun = await db.get(nounId)

  // Send the new noun to the peer
  await pipeline.executeConduitPipeline(
    'writeData',
    [
      {
        connectionId: connection.connectionId,
        data: {
          type: 'newNoun',
          id: nounId,
          vector: noun.vector,
          metadata: noun.metadata
        }
      }
    ]
  )
}
```

#### Browser-Server Search Example

Brainy supports searching a server-hosted instance from a browser, storing the results locally, and performing further searches against the local instance:

```typescript
import { BrainyData } from '@soulcraft/brainy'

// Create and initialize the database with a remote server configuration
// Replace the example URL below with your actual Brainy server URL
const db = new BrainyData({
  remoteServer: {
    url: 'wss://example-brainy-server.com/ws', // Replace with your server URL
    protocols: 'brainy-sync',
    autoConnect: true // Connect automatically during initialization
  }
})
await db.init()

// Or connect manually after initialization
if (!db.isConnectedToRemoteServer()) {
  // Replace the example URL below with your actual Brainy server URL
  await db.connectToRemoteServer('wss://example-brainy-server.com/ws', 'brainy-sync')
}

// Search the remote server (results are stored locally)
const remoteResults = await db.searchText('machine learning', 5, { searchMode: 'remote' })

// Search the local database (includes previously stored results)
const localResults = await db.searchText('machine learning', 5, { searchMode: 'local' })

// Perform a combined search (local first, then remote if needed)
const combinedResults = await db.searchText('neural networks', 5, { searchMode: 'combined' })

// Add data to both the local and remote instances
const id = await db.addToBoth('Deep learning is a subset of machine learning', {
  noun: 'Concept',
  category: 'AI',
  tags: ['deep learning', 'neural networks']
})

// Clean up when done
await db.shutDown()
```

For a complete demonstration of Brainy's features, see the [demo page](index.html).

## Requirements

- Node.js >= 23.11.0

## Contributing

For detailed contribution guidelines, please see [CONTRIBUTING.md](CONTRIBUTING.md).

We have a [Code of Conduct](CODE_OF_CONDUCT.md) that all contributors are expected to follow.

### Reporting Issues

We use GitHub issues to track bugs and feature requests. Please use the provided issue templates when creating a new issue:

- [Bug Report Template](.github/ISSUE_TEMPLATE/bug_report.md)
- [Feature Request Template](.github/ISSUE_TEMPLATE/feature_request.md)

### Code Style Guidelines

Brainy follows a specific code style to maintain consistency throughout the codebase:

1. **No Semicolons**: All code in the project should avoid semicolons wherever possible
2. **Formatting**: The project uses Prettier for code formatting
3. **Linting**: ESLint is configured with project-specific rules
4. **TypeScript Configuration**: Strict type checking is enabled with an ES2020 target
5. **Commit Messages**: Use the imperative mood and keep the first line concise

### Development Workflow

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Run tests with `npm test`
5. Submit a pull request

## License

[MIT](LICENSE)