@soulcraft/brainy 0.9.10 → 0.9.12
- package/README.md +50 -22
- package/dist/brainy.js +36414 -66507
- package/dist/brainy.min.js +1112 -4724
- package/dist/brainyData.d.ts +2 -0
- package/dist/hnsw/hnswIndex.d.ts +2 -0
- package/dist/unified.js +36414 -66507
- package/dist/unified.min.js +1112 -4724
- package/dist/utils/distance.d.ts +12 -1
- package/dist/utils/embedding.d.ts +19 -14
- package/dist/utils/environment.d.ts +1 -0
- package/dist/utils/index.d.ts +1 -0
- package/dist/utils/version.d.ts +1 -1
- package/dist/utils/workerUtils.d.ts +7 -17
- package/package.json +7 -2
package/README.md
CHANGED

@@ -6,7 +6,7 @@
 [](https://nodejs.org/)
 [](https://www.typescriptlang.org/)
 [](CONTRIBUTING.md)
-[](https://www.npmjs.com/package/@soulcraft/brainy)

 [//]: # ([](https://github.com/sodal-project/cartographer))

@@ -508,7 +508,7 @@ const id = await db.add(textOrVector, {
 // other metadata...
 })

-// Add multiple nouns in parallel (with multithreading)
+// Add multiple nouns in parallel (with multithreading and batch embedding)
 const ids = await db.addBatch([
 {
 vectorOrData: "First item to add",
@@ -521,7 +521,8 @@ const ids = await db.addBatch([
 // More items...
 ], {
 forceEmbed: false,
-concurrency: 4 // Control the level of parallelism (default: 4)
+concurrency: 4, // Control the level of parallelism (default: 4)
+batchSize: 50 // Control the number of items to process in a single batch (default: 50)
 })

 // Retrieve a noun
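
The new `concurrency` and `batchSize` options for `addBatch` combine two limits: how many items are grouped into one batch, and how many operations run at once. The sketch below shows one way to honor both limits in TypeScript. It is an illustration under assumptions, not Brainy's implementation: the helper name `processInBatches` is hypothetical, and the real `addBatch` may schedule batches and parallelism differently.

```typescript
// Hypothetical helper (not part of @soulcraft/brainy): runs `task` over `items` in
// batches of `batchSize`, with at most `concurrency` tasks in flight inside a batch.
// Results are returned in the same order as the input items.
async function processInBatches<T, R>(
  items: T[],
  task: (item: T) => Promise<R>,
  options: { concurrency?: number; batchSize?: number } = {}
): Promise<R[]> {
  const concurrency = Math.max(1, options.concurrency ?? 4)
  const batchSize = Math.max(1, options.batchSize ?? 50)
  const results: R[] = new Array(items.length)

  for (let start = 0; start < items.length; start += batchSize) {
    const end = Math.min(start + batchSize, items.length)
    let next = start // next unclaimed index in the current batch

    // Each runner keeps claiming the next index until the batch is exhausted.
    // The check-and-increment happens synchronously between awaits, so no lock is needed.
    const runner = async (): Promise<void> => {
      while (next < end) {
        const index = next++
        results[index] = await task(items[index])
      }
    }

    const runnerCount = Math.min(concurrency, end - start)
    await Promise.all(Array.from({ length: runnerCount }, () => runner()))
  }
  return results
}

// Usage: string lengths computed with 2 tasks in flight and 2 items per batch
processInBatches(['a', 'bb', 'ccc', 'dddd', 'eeeee'], async (s) => s.length, {
  concurrency: 2,
  batchSize: 2
}).then((lengths) => console.log(lengths)) // lengths: [1, 2, 3, 4, 5]
```

A shared index counter per batch keeps at most `concurrency` tasks active without a separate semaphore: each runner pulls the next item as soon as its previous one finishes.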
@@ -591,9 +592,7 @@ await db.init()

 // Or use the threaded embedding function for better performance
 const threadedDb = new BrainyData({
-embeddingFunction: createThreadedEmbeddingFunction({
-fallbackToMain: true // Fall back to main thread if threading fails
-})
+embeddingFunction: createThreadedEmbeddingFunction()
 })
 await threadedDb.init()

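
The README says the threaded embedding function runs in a Web Worker in browsers and in a Worker Thread in Node.js, and lists automatic environment detection among its multithreading features. A minimal sketch of such a check follows. The function name `detectThreadingEnvironment` and its return labels are hypothetical; Brainy's own detection (this release touches `dist/utils/environment.d.ts`) is not shown in the diff and may test different globals.

```typescript
type ThreadingEnvironment = 'browser' | 'node' | 'unknown'

// Hypothetical helper (not Brainy's API): guesses which worker mechanism is available.
// `scope` defaults to globalThis and is typed loosely so the code compiles without
// DOM or Node.js type definitions; tests can pass a fake scope object.
function detectThreadingEnvironment(scope: any = globalThis): ThreadingEnvironment {
  // A browser main thread exposes `window` and the Web Worker constructor.
  if (typeof scope.window !== 'undefined' && typeof scope.Worker === 'function') {
    return 'browser'
  }
  // Node.js reports its version here; Worker Threads come from the `node:worker_threads` module.
  if (typeof scope.process?.versions?.node === 'string') {
    return 'node'
  }
  return 'unknown'
}

console.log(detectThreadingEnvironment()) // "node" when run under Node.js
```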
@@ -602,17 +601,36 @@ const vector = await db.embed("Some text to convert to a vector")
 ```

 The threaded embedding function runs in a separate thread (Web Worker in browsers, Worker Thread in Node.js) to improve
-performance, especially for
-
+performance, especially for embedding operations. It uses GPU acceleration when available (via WebGL in browsers) and
+falls back to CPU processing for compatibility. Universal Sentence Encoder is always used for embeddings. The
+implementation includes worker reuse and model caching for optimal performance.

 ### Performance Tuning

-Brainy
+Brainy includes comprehensive performance optimizations that work across all environments (browser, CLI, Node.js,
+container, server):
+
+#### GPU and CPU Optimization
+
+Brainy uses GPU and CPU optimization for compute-intensive operations:
+
+1. **GPU-Accelerated Embeddings**: Generate text embeddings using TensorFlow.js with WebGL backend when available
+2. **Automatic Fallback**: Falls back to CPU backend when GPU is not available
+3. **Optimized Distance Calculations**: Perform vector similarity calculations with optimized algorithms
+4. **Cross-Environment Support**: Works consistently across browsers and Node.js environments
+5. **Memory Management**: Properly disposes of tensors to prevent memory leaks
+
+#### Multithreading Support
+
+Brainy includes comprehensive multithreading support to improve performance across all environments:

 1. **Parallel Batch Processing**: Add multiple items concurrently with controlled parallelism
 2. **Multithreaded Vector Search**: Perform distance calculations in parallel for faster search operations
 3. **Threaded Embedding Generation**: Generate embeddings in separate threads to avoid blocking the main thread
-4. **
+4. **Worker Reuse**: Maintains a pool of workers to avoid the overhead of creating and terminating workers
+5. **Model Caching**: Initializes the embedding model once per worker and reuses it for multiple operations
+6. **Batch Embedding**: Processes multiple items in a single embedding operation for better performance
+7. **Automatic Environment Detection**: Adapts to browser (Web Workers) and Node.js (Worker Threads) environments

 ```typescript
 import { BrainyData, euclideanDistance } from '@soulcraft/brainy'
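
The GPU and CPU list above describes using TensorFlow.js with the WebGL backend when available and falling back to the CPU backend otherwise. In TensorFlow.js, `tf.setBackend(name)` returns a promise that resolves to `true` if the backend initialized. The sketch below isolates the fallback loop behind an injected `trySetBackend` function so that it runs without TensorFlow.js installed; the helper `selectBackend` is hypothetical, not Brainy's code.

```typescript
// Hypothetical helper: tries backends in order of preference and returns the first one
// that initializes. With TensorFlow.js you could pass `(name) => tf.setBackend(name)`.
async function selectBackend(
  trySetBackend: (name: string) => Promise<boolean>,
  preferred: string[] = ['webgl', 'cpu']
): Promise<string> {
  for (const name of preferred) {
    try {
      if (await trySetBackend(name)) {
        return name // first backend that initialized successfully
      }
    } catch {
      // Initialization can also throw (e.g. no WebGL context); move on to the next backend.
    }
  }
  throw new Error(`No usable backend among: ${preferred.join(', ')}`)
}

// Usage with a stand-in that simulates a machine without WebGL
selectBackend(async (name) => name === 'cpu').then((name) => console.log(`Using backend: ${name}`))
// Using backend: cpu
```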
@@ -629,8 +647,8 @@ const db = new BrainyData({
 efSearch: 50, // Search candidate list size
 },

-//
-
+// Performance optimization options
+performance: {
 useParallelization: true, // Enable multithreaded search operations
 },

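
The `useParallelization` option enables multithreaded search operations. One common way to parallelize an exact nearest-neighbor scan is partition-and-merge: split the candidate vectors into chunks, find the top k of each chunk independently, and merge the partial results. This is correct because every item in the global top k is also in the top k of its own chunk. The sketch below runs the chunks sequentially to stay self-contained; in a multithreaded version each `chunkTopK` call could be sent to a worker. The function names are hypothetical, and Brainy's HNSW index (tuned above with `efSearch`) is an approximate structure that does not necessarily search this way.

```typescript
type Vector = number[]

function euclidean(a: Vector, b: Vector): number {
  let sum = 0
  for (let i = 0; i < a.length; i++) {
    const d = a[i] - b[i]
    sum += d * d
  }
  return Math.sqrt(sum)
}

// Top k (smallest distance) for one chunk; `offset` maps local indices back to global ones.
function chunkTopK(query: Vector, chunk: Vector[], offset: number, k: number) {
  return chunk
    .map((v, i) => ({ index: offset + i, distance: euclidean(query, v) }))
    .sort((x, y) => x.distance - y.distance)
    .slice(0, k)
}

// Hypothetical partition-and-merge search. Chunks are independent, so in a threaded
// version each chunkTopK call could run in its own worker; here they run one after another.
function partitionedTopK(query: Vector, vectors: Vector[], k: number, chunkCount = 4) {
  const chunkSize = Math.max(1, Math.ceil(vectors.length / chunkCount))
  const partial: { index: number; distance: number }[] = []
  for (let start = 0; start < vectors.length; start += chunkSize) {
    partial.push(...chunkTopK(query, vectors.slice(start, start + chunkSize), start, k))
  }
  // The global top k is contained in the union of the per-chunk top-k lists.
  return partial.sort((x, y) => x.distance - y.distance).slice(0, k)
}

const points: Vector[] = [[0, 0], [5, 5], [1, 0], [9, 9], [0, 2], [3, 3]]
console.log(partitionedTopK([0, 0], points, 3, 2).map((r) => r.index)) // indices [0, 2, 4]
```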
@@ -706,10 +724,16 @@ console.log(status.details.index)

 ## Distance Functions

-
-
-- `
-- `
+Brainy provides several distance functions for vector similarity calculations:
+
+- `cosineDistance` (default): Measures the cosine of the angle between vectors (1 - cosine similarity)
+- `euclideanDistance`: Measures the straight-line distance between vectors
+- `manhattanDistance`: Measures the sum of absolute differences between vector components
+- `dotProductDistance`: Measures the negative dot product between vectors
+
+All distance functions are optimized for performance and automatically use the most efficient implementation based on
+the dataset size and available resources. For large datasets and high-dimensional vectors, Brainy uses batch processing
+and multithreading when available to improve performance.

 ## Backup and Restore

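
The four distance functions listed above follow directly from their one-line definitions. The reference versions below are plain sketches for illustration; they reuse the names from the README but are not Brainy's optimized implementations, and the handling of zero vectors in `cosineDistance` is an assumption.

```typescript
type Vector = number[]

function assertSameDimension(a: Vector, b: Vector): void {
  if (a.length !== b.length) {
    throw new Error(`Dimension mismatch: ${a.length} vs ${b.length}`)
  }
}

function dot(a: Vector, b: Vector): number {
  assertSameDimension(a, b)
  let sum = 0
  for (let i = 0; i < a.length; i++) sum += a[i] * b[i]
  return sum
}

// 1 - cosine similarity: 0 = same direction, 1 = orthogonal, 2 = opposite direction
function cosineDistance(a: Vector, b: Vector): number {
  const normProduct = Math.sqrt(dot(a, a)) * Math.sqrt(dot(b, b))
  // Assumption: the angle to a zero vector is undefined, so treat it as orthogonal.
  if (normProduct === 0) return 1
  return 1 - dot(a, b) / normProduct
}

// Straight-line (L2) distance
function euclideanDistance(a: Vector, b: Vector): number {
  assertSameDimension(a, b)
  let sum = 0
  for (let i = 0; i < a.length; i++) {
    const d = a[i] - b[i]
    sum += d * d
  }
  return Math.sqrt(sum)
}

// Sum of absolute component differences (L1 distance)
function manhattanDistance(a: Vector, b: Vector): number {
  assertSameDimension(a, b)
  let sum = 0
  for (let i = 0; i < a.length; i++) sum += Math.abs(a[i] - b[i])
  return sum
}

// Negative dot product, so a larger dot product means a smaller distance
function dotProductDistance(a: Vector, b: Vector): number {
  return -dot(a, b)
}

console.log(cosineDistance([1, 0], [2, 0]))     // 0 (same direction)
console.log(cosineDistance([1, 0], [0, 1]))     // 1 (orthogonal)
console.log(euclideanDistance([0, 0], [3, 4]))  // 5
console.log(manhattanDistance([0, 0], [3, 4]))  // 7
console.log(dotProductDistance([1, 2], [3, 4])) // -11
```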
@@ -791,6 +815,9 @@ brainy import-sparse --input sparse-data.json
 Brainy uses the following embedding approach:

 - TensorFlow Universal Sentence Encoder (high-quality text embeddings)
+- GPU acceleration when available (via WebGL in browsers)
+- Batch embedding for processing multiple items efficiently
+- Worker reuse and model caching for optimal performance
 - Custom embedding functions can be plugged in for specialized domains

 ## Extensions
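
The embedding list now mentions batch embedding and model caching. The sketch below combines the two ideas under stated assumptions: the model is loaded at most once, with the loading promise shared by concurrent callers, and texts are sent to the model in chunks of `batchSize`, one call per chunk. `EmbeddingModel`, `embedBatch`, and `createBatchEmbedder` are hypothetical names; with the Universal Sentence Encoder, `loadModel` would wrap the library's own loading and embedding calls.

```typescript
// Hypothetical minimal model interface: embeds many texts in one call.
interface EmbeddingModel {
  embedBatch(texts: string[]): Promise<number[][]>
}

// Hypothetical factory: the returned function loads the model at most once (concurrent
// first calls share the same loading promise) and embeds texts in chunks of `batchSize`.
function createBatchEmbedder(
  loadModel: () => Promise<EmbeddingModel>,
  batchSize = 50
): (texts: string[]) => Promise<number[][]> {
  const size = Math.max(1, batchSize)
  let modelPromise: Promise<EmbeddingModel> | null = null // cached across calls

  const getModel = (): Promise<EmbeddingModel> => {
    if (modelPromise !== null) return modelPromise
    const loading = loadModel()
    modelPromise = loading
    // If loading fails, forget the cached promise so a later call can retry.
    loading.catch(() => {
      if (modelPromise === loading) modelPromise = null
    })
    return loading
  }

  return async (texts: string[]): Promise<number[][]> => {
    const model = await getModel()
    const vectors: number[][] = []
    for (let start = 0; start < texts.length; start += size) {
      const batch = texts.slice(start, start + size)
      vectors.push(...(await model.embedBatch(batch))) // one model call per batch
    }
    return vectors
  }
}

// Usage with a stand-in model whose "embedding" is [character count, word count]
const embedTexts = createBatchEmbedder(async () => ({
  embedBatch: async (texts: string[]) => texts.map((t) => [t.length, t.split(' ').length])
}), 2)
embedTexts(['a b', 'ccc', 'd e f']).then((vectors) => console.log(vectors)) // vectors: [[3, 2], [3, 1], [5, 3]]
```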
@@ -1033,8 +1060,9 @@ The repository includes a comprehensive demo that showcases Brainy's main featur
 which automatically deploys when pushing to the main branch or can be manually triggered
 - To use a custom domain (like www.soulcraft.com):
 1. A CNAME file is already included in the demo directory
-
-
+2. In your GitHub repository settings, go to Pages > Custom domain and enter your domain
+3. Configure your domain's DNS settings to point to GitHub Pages:
+
 - Add a CNAME record for www pointing to `<username>.github.io` (e.g., `soulcraft-research.github.io`)
 - Or for an apex domain (soulcraft.com), add A records pointing to GitHub Pages IP addresses

@@ -1216,7 +1244,7 @@ const id = await db.addToBoth('Deep learning is a subset of machine learning', {
 tags: ['deep learning', 'neural networks']
 })

-// Clean up when done
+// Clean up when done (this also cleans up worker pools)
 await db.shutDown()
 ```

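
The updated comment says `shutDown()` also cleans up worker pools, and the multithreading list describes keeping a pool of workers to avoid the cost of repeatedly creating and terminating them. The sketch below shows that pattern generically: workers are created lazily up to a limit, returned to the pool after each task, and terminated together on shutdown. `WorkerPool` and `PooledWorker` are hypothetical; a real pool would wrap browser `Worker` objects or Node.js `worker_threads` workers, and Brainy's own pool is not part of this diff.

```typescript
// Minimal contract for a pooled worker (hypothetical). A real pool would put a browser
// Worker or a `node:worker_threads` Worker behind this interface.
interface PooledWorker<In, Out> {
  run(input: In): Promise<Out>
  terminate(): void
}

type Waiter<In, Out> = {
  resolve: (worker: PooledWorker<In, Out>) => void
  reject: (error: Error) => void
}

// Hypothetical pool: creates workers lazily up to `maxWorkers`, reuses idle ones,
// and terminates all of them on shutDown().
class WorkerPool<In, Out> {
  private idle: PooledWorker<In, Out>[] = []
  private all: PooledWorker<In, Out>[] = []
  private waiting: Waiter<In, Out>[] = []
  private closed = false
  private readonly createWorker: () => PooledWorker<In, Out>
  private readonly maxWorkers: number

  constructor(createWorker: () => PooledWorker<In, Out>, maxWorkers = 4) {
    this.createWorker = createWorker
    this.maxWorkers = Math.max(1, maxWorkers)
  }

  async run(input: In): Promise<Out> {
    const worker = await this.acquire()
    try {
      return await worker.run(input)
    } finally {
      this.release(worker) // return the worker to the pool instead of terminating it
    }
  }

  // Reject queued tasks, terminate every worker, and refuse new work.
  shutDown(): void {
    this.closed = true
    for (const waiter of this.waiting) waiter.reject(new Error('Pool has been shut down'))
    for (const worker of this.all) worker.terminate()
    this.waiting = []
    this.idle = []
    this.all = []
  }

  private acquire(): Promise<PooledWorker<In, Out>> {
    if (this.closed) return Promise.reject(new Error('Pool has been shut down'))
    const reusable = this.idle.pop()
    if (reusable) return Promise.resolve(reusable)
    if (this.all.length < this.maxWorkers) {
      const worker = this.createWorker()
      this.all.push(worker)
      return Promise.resolve(worker)
    }
    // Every worker is busy: wait until one is released.
    return new Promise((resolve, reject) => this.waiting.push({ resolve, reject }))
  }

  private release(worker: PooledWorker<In, Out>): void {
    if (this.closed) return
    const waiter = this.waiting.shift()
    if (waiter) waiter.resolve(worker) // hand the worker straight to the next queued task
    else this.idle.push(worker)
  }
}

// Usage with an in-process stand-in "worker" that doubles numbers
const demoPool = new WorkerPool<number, number>(() => ({
  run: async (n) => n * 2,
  terminate: () => {}
}), 2)
Promise.all([1, 2, 3].map((n) => demoPool.run(n))).then((doubled) => {
  console.log(doubled) // doubled: [2, 4, 6]
  demoPool.shutDown()
})
```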
@@ -1274,9 +1302,9 @@ Brainy follows a specific code style to maintain consistency throughout the code
 The README badges are automatically updated during the build process:

 1. **npm Version Badge**: The npm version badge is automatically updated to match the version in package.json when:
-
-
-
+- Running `npm run build` (via the prebuild script)
+- Running `npm version` commands (patch, minor, major)
+- Manually running `node scripts/generate-version.js`

 This ensures that the badge always reflects the current version in package.json, even before publishing to npm.

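
According to the README, `scripts/generate-version.js` rewrites the npm badge to match the version in package.json. That script is not included in this diff, so the sketch below only illustrates the substitution step, assuming a shields.io-style badge URL containing `npm-v<MAJOR.MINOR.PATCH>`. The URL pattern, the sample badge line, and the function name `updateNpmBadge` are assumptions; a real script would also read package.json and write README.md back to disk.

```typescript
// Hypothetical: replace the version inside an npm badge URL with the one from package.json.
// Assumes badges like https://img.shields.io/badge/npm-v0.9.10-blue.svg and plain
// MAJOR.MINOR.PATCH versions (prerelease tags are not handled in this sketch).
function updateNpmBadge(readme: string, version: string): string {
  if (!/^\d+\.\d+\.\d+$/.test(version)) {
    throw new Error(`Expected a MAJOR.MINOR.PATCH version, got: ${version}`)
  }
  // A replacer function avoids `$`-pattern pitfalls such as "$1" followed by a digit.
  return readme.replace(
    /(img\.shields\.io\/badge\/npm-v)\d+\.\d+\.\d+/g,
    (_match: string, prefix: string) => prefix + version
  )
}

// Sample badge line in the assumed format
const before = '[![npm](https://img.shields.io/badge/npm-v0.9.10-blue.svg)](https://www.npmjs.com/package/@soulcraft/brainy)'
console.log(updateNpmBadge(before, '0.9.12'))
// [![npm](https://img.shields.io/badge/npm-v0.9.12-blue.svg)](https://www.npmjs.com/package/@soulcraft/brainy)
```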