@soulcraft/brainy 0.9.10 → 0.9.11

package/README.md CHANGED
@@ -6,7 +6,7 @@
  [![Node.js](https://img.shields.io/badge/node-%3E%3D23.11.0-brightgreen.svg)](https://nodejs.org/)
  [![TypeScript](https://img.shields.io/badge/TypeScript-5.1.6-blue.svg)](https://www.typescriptlang.org/)
  [![PRs Welcome](https://img.shields.io/badge/PRs-welcome-brightgreen.svg)](CONTRIBUTING.md)
- [![npm](https://img.shields.io/badge/npm-v0.9.10-blue.svg)](https://www.npmjs.com/package/@soulcraft/brainy)
+ [![npm](https://img.shields.io/badge/npm-v0.9.11-blue.svg)](https://www.npmjs.com/package/@soulcraft/brainy)
 
  [//]: # ([![Cartographer](https://img.shields.io/badge/Cartographer-Official%20Standard-brightgreen)](https://github.com/sodal-project/cartographer))
 
@@ -508,7 +508,7 @@ const id = await db.add(textOrVector, {
  // other metadata...
  })
 
- // Add multiple nouns in parallel (with multithreading)
+ // Add multiple nouns in parallel (with multithreading and batch embedding)
  const ids = await db.addBatch([
  {
  vectorOrData: "First item to add",
@@ -521,7 +521,8 @@ const ids = await db.addBatch([
  // More items...
  ], {
  forceEmbed: false,
- concurrency: 4 // Control the level of parallelism (default: 4)
+ concurrency: 4, // Control the level of parallelism (default: 4)
+ batchSize: 50 // Control the number of items to process in a single batch (default: 50)
  })
 
  // Retrieve a noun
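
Note on this hunk: the README does not say how `concurrency` and `batchSize` interact internally. As a rough sketch only, assuming items are first split into batches of at most `batchSize` and at most `concurrency` batches are in flight at once, the behaviour might look like the following. `chunk` and `processInBatches` are hypothetical names for illustration and are not part of Brainy's API; the defaults of 4 and 50 are taken from the comments in the hunk.

```typescript
// Hypothetical helper: split an array into consecutive chunks of at most `size` items.
function chunk<T>(items: T[], size: number): T[][] {
  if (size < 1) throw new Error('size must be at least 1')
  const out: T[][] = []
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size))
  }
  return out
}

// Hypothetical helper: run `worker` over every batch while keeping at most
// `concurrency` batches in flight. Results are returned in input order.
async function processInBatches<T, R>(
  items: T[],
  worker: (batch: T[]) => Promise<R[]>,
  options: { concurrency?: number; batchSize?: number } = {}
): Promise<R[]> {
  const concurrency = options.concurrency ?? 4 // assumed default, mirrors the README comment
  const batchSize = options.batchSize ?? 50 // assumed default, mirrors the README comment
  const batches = chunk(items, batchSize)
  const results = new Array<R[]>(batches.length)
  let next = 0

  // Each runner repeatedly claims the next unprocessed batch index.
  async function runner(): Promise<void> {
    while (next < batches.length) {
      const index = next++
      results[index] = await worker(batches[index])
    }
  }

  const runnerCount = Math.max(1, Math.min(concurrency, batches.length))
  await Promise.all(Array.from({ length: runnerCount }, () => runner()))

  // Flatten the per-batch results back into one list.
  return ([] as R[]).concat(...results)
}
```

With this shape, a batch is a natural unit for a single embedding call, while `concurrency` bounds how many such calls overlap.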
@@ -591,9 +592,7 @@ await db.init()
 
  // Or use the threaded embedding function for better performance
  const threadedDb = new BrainyData({
- embeddingFunction: createThreadedEmbeddingFunction({
- fallbackToMain: true // Fall back to main thread if threading fails
- })
+ embeddingFunction: createThreadedEmbeddingFunction()
  })
  await threadedDb.init()
 
@@ -602,17 +601,36 @@ const vector = await db.embed("Some text to convert to a vector")
  ```
 
  The threaded embedding function runs in a separate thread (Web Worker in browsers, Worker Thread in Node.js) to improve
- performance, especially for CPU-intensive embedding operations. It automatically falls back to the main thread if
- threading is not available in the current environment.
+ performance, especially for embedding operations. It uses GPU acceleration when available (via WebGL in browsers) and
+ falls back to CPU processing for compatibility. Universal Sentence Encoder is always used for embeddings. The
+ implementation includes worker reuse and model caching for optimal performance.
 
  ### Performance Tuning
 
- Brainy now includes comprehensive multithreading support to improve performance across all environments:
+ Brainy includes comprehensive performance optimizations that work across all environments (browser, CLI, Node.js,
+ container, server):
+
+ #### GPU and CPU Optimization
+
+ Brainy uses GPU and CPU optimization for compute-intensive operations:
+
+ 1. **GPU-Accelerated Embeddings**: Generate text embeddings using TensorFlow.js with WebGL backend when available
+ 2. **Automatic Fallback**: Falls back to CPU backend when GPU is not available
+ 3. **Optimized Distance Calculations**: Perform vector similarity calculations with optimized algorithms
+ 4. **Cross-Environment Support**: Works consistently across browsers and Node.js environments
+ 5. **Memory Management**: Properly disposes of tensors to prevent memory leaks
+
+ #### Multithreading Support
+
+ Brainy includes comprehensive multithreading support to improve performance across all environments:
 
  1. **Parallel Batch Processing**: Add multiple items concurrently with controlled parallelism
  2. **Multithreaded Vector Search**: Perform distance calculations in parallel for faster search operations
  3. **Threaded Embedding Generation**: Generate embeddings in separate threads to avoid blocking the main thread
- 4. **Automatic Environment Detection**: Adapts to browser (Web Workers) and Node.js (Worker Threads) environments
+ 4. **Worker Reuse**: Maintains a pool of workers to avoid the overhead of creating and terminating workers
+ 5. **Model Caching**: Initializes the embedding model once per worker and reuses it for multiple operations
+ 6. **Batch Embedding**: Processes multiple items in a single embedding operation for better performance
+ 7. **Automatic Environment Detection**: Adapts to browser (Web Workers) and Node.js (Worker Threads) environments
 
  ```typescript
  import { BrainyData, euclideanDistance } from '@soulcraft/brainy'
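
Note on this hunk: the added text describes two ideas, trying a GPU backend before falling back to CPU, and initializing the model once per worker. The diff does not show how Brainy implements either, so the following is only a sketch under assumptions. It assumes a backend switcher that resolves to `true` on success (TensorFlow.js's `tf.setBackend` has roughly this shape); `selectBackend` and `cachedLoader` are hypothetical names, not Brainy or TensorFlow.js APIs.

```typescript
// Shape of a backend switcher: resolves to true when the backend was activated.
type SetBackend = (name: string) => Promise<boolean>

// Hypothetical helper: try each backend in order and return the first one that works.
async function selectBackend(
  setBackend: SetBackend,
  preferred: string[] = ['webgl', 'cpu']
): Promise<string> {
  for (const name of preferred) {
    try {
      if (await setBackend(name)) return name
    } catch {
      // Treat a thrown error the same as "backend unavailable" and try the next one.
    }
  }
  throw new Error(`No usable backend among: ${preferred.join(', ')}`)
}

// Hypothetical helper: wrap an expensive async loader so that it runs at most once.
// Concurrent callers share the same in-flight promise.
function cachedLoader<T>(load: () => Promise<T>): () => Promise<T> {
  let cached: Promise<T> | undefined
  return () => {
    if (!cached) {
      cached = load().catch((err) => {
        cached = undefined // allow a retry after a failed load
        throw err
      })
    }
    return cached
  }
}
```

In a worker, something like `cachedLoader(() => loadModel())` would be created once at startup and called for every embedding request, so only the first request pays the load cost; `loadModel` here stands in for whatever model loader is actually used.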
@@ -629,8 +647,8 @@ const db = new BrainyData({
  efSearch: 50, // Search candidate list size
  },
 
- // Multithreading options
- threading: {
+ // Performance optimization options
+ performance: {
  useParallelization: true, // Enable multithreaded search operations
  },
 
@@ -706,10 +724,16 @@ console.log(status.details.index)
 
  ## Distance Functions
 
- - `cosineDistance` (default)
- - `euclideanDistance`
- - `manhattanDistance`
- - `dotProductDistance`
+ Brainy provides several distance functions for vector similarity calculations:
+
+ - `cosineDistance` (default): Measures the cosine of the angle between vectors (1 - cosine similarity)
+ - `euclideanDistance`: Measures the straight-line distance between vectors
+ - `manhattanDistance`: Measures the sum of absolute differences between vector components
+ - `dotProductDistance`: Measures the negative dot product between vectors
+
+ All distance functions are optimized for performance and automatically use the most efficient implementation based on
+ the dataset size and available resources. For large datasets and high-dimensional vectors, Brainy uses batch processing
+ and multithreading when available to improve performance.
 
  ## Backup and Restore
 
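
Note on this hunk: the four definitions added to the README can be written out as plain reference functions. These are unoptimized sketches that follow the wording in the hunk, not Brainy's actual implementations (which the README says are optimized and may use batching or multithreading). The length check and the handling of zero-length vectors in the cosine case are assumptions made here for illustration.

```typescript
type Vector = number[]

// Guard used by every function below; an assumption, not documented Brainy behaviour.
function assertSameLength(a: Vector, b: Vector): void {
  if (a.length !== b.length) throw new Error('Vectors must have the same length')
}

function dot(a: Vector, b: Vector): number {
  let sum = 0
  for (let i = 0; i < a.length; i++) sum += a[i] * b[i]
  return sum
}

// 1 - cosine similarity. Returns 1 when either vector has zero magnitude (assumption).
function cosineDistance(a: Vector, b: Vector): number {
  assertSameLength(a, b)
  const normA = Math.sqrt(dot(a, a))
  const normB = Math.sqrt(dot(b, b))
  if (normA === 0 || normB === 0) return 1
  return 1 - dot(a, b) / (normA * normB)
}

// Straight-line distance: square root of the summed squared differences.
function euclideanDistance(a: Vector, b: Vector): number {
  assertSameLength(a, b)
  let sum = 0
  for (let i = 0; i < a.length; i++) {
    const d = a[i] - b[i]
    sum += d * d
  }
  return Math.sqrt(sum)
}

// Sum of absolute component-wise differences.
function manhattanDistance(a: Vector, b: Vector): number {
  assertSameLength(a, b)
  let sum = 0
  for (let i = 0; i < a.length; i++) sum += Math.abs(a[i] - b[i])
  return sum
}

// Negative dot product, so that a larger dot product means a smaller distance.
function dotProductDistance(a: Vector, b: Vector): number {
  assertSameLength(a, b)
  return -dot(a, b)
}
```

For example, `euclideanDistance([0, 0], [3, 4])` evaluates to `5` and `manhattanDistance([1, 2], [4, -2])` to `7`.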
@@ -791,6 +815,9 @@ brainy import-sparse --input sparse-data.json
  Brainy uses the following embedding approach:
 
  - TensorFlow Universal Sentence Encoder (high-quality text embeddings)
+ - GPU acceleration when available (via WebGL in browsers)
+ - Batch embedding for processing multiple items efficiently
+ - Worker reuse and model caching for optimal performance
  - Custom embedding functions can be plugged in for specialized domains
 
  ## Extensions
@@ -1216,7 +1243,7 @@ const id = await db.addToBoth('Deep learning is a subset of machine learning', {
  tags: ['deep learning', 'neural networks']
  })
 
- // Clean up when done
+ // Clean up when done (this also cleans up worker pools)
  await db.shutDown()
  ```
 
@@ -1274,9 +1301,9 @@ Brainy follows a specific code style to maintain consistency throughout the code
  The README badges are automatically updated during the build process:
 
  1. **npm Version Badge**: The npm version badge is automatically updated to match the version in package.json when:
- - Running `npm run build` (via the prebuild script)
- - Running `npm version` commands (patch, minor, major)
- - Manually running `node scripts/generate-version.js`
+ - Running `npm run build` (via the prebuild script)
+ - Running `npm version` commands (patch, minor, major)
+ - Manually running `node scripts/generate-version.js`
 
  This ensures that the badge always reflects the current version in package.json, even before publishing to npm.
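
Note on this hunk: the contents of `scripts/generate-version.js` are not part of this diff. Purely as an illustration of the kind of rewrite the README describes, and assuming the badge keeps the `npm-v<version>-blue.svg` form seen in the first hunk, a badge update could be sketched as below. `updateNpmBadge` is a hypothetical name and this is not the project's actual script.

```typescript
// Hypothetical helper: replace the version inside the shields.io npm badge URL.
// Assumes the badge looks like ".../badge/npm-v0.9.10-blue.svg" and that the
// version itself contains no dashes (prerelease tags are not handled here).
function updateNpmBadge(readme: string, version: string): string {
  const badge = /(img\.shields\.io\/badge\/npm-v)[^-\s)]+(-blue\.svg)/g
  // A replacer function avoids ambiguity between "$1" and a version starting with a digit.
  return readme.replace(badge, (_match: string, prefix: string, suffix: string) => prefix + version + suffix)
}
```

Applied to the badge line from the first hunk with `'0.9.11'`, this turns `npm-v0.9.10-blue.svg` into `npm-v0.9.11-blue.svg` and leaves the rest of the text unchanged.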