@soulcraft/brainy 0.9.10 → 0.9.12

package/README.md CHANGED
@@ -6,7 +6,7 @@
  [![Node.js](https://img.shields.io/badge/node-%3E%3D23.11.0-brightgreen.svg)](https://nodejs.org/)
  [![TypeScript](https://img.shields.io/badge/TypeScript-5.1.6-blue.svg)](https://www.typescriptlang.org/)
  [![PRs Welcome](https://img.shields.io/badge/PRs-welcome-brightgreen.svg)](CONTRIBUTING.md)
- [![npm](https://img.shields.io/badge/npm-v0.9.10-blue.svg)](https://www.npmjs.com/package/@soulcraft/brainy)
+ [![npm](https://img.shields.io/badge/npm-v0.9.12-blue.svg)](https://www.npmjs.com/package/@soulcraft/brainy)
 
  [//]: # ([![Cartographer](https://img.shields.io/badge/Cartographer-Official%20Standard-brightgreen)](https://github.com/sodal-project/cartographer))
 
@@ -508,7 +508,7 @@ const id = await db.add(textOrVector, {
  // other metadata...
  })
 
- // Add multiple nouns in parallel (with multithreading)
+ // Add multiple nouns in parallel (with multithreading and batch embedding)
  const ids = await db.addBatch([
  {
  vectorOrData: "First item to add",
@@ -521,7 +521,8 @@ const ids = await db.addBatch([
  // More items...
  ], {
  forceEmbed: false,
- concurrency: 4 // Control the level of parallelism (default: 4)
+ concurrency: 4, // Control the level of parallelism (default: 4)
+ batchSize: 50 // Control the number of items to process in a single batch (default: 50)
  })
 
  // Retrieve a noun
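
Note on the hunk above: the README documents `concurrency` and `batchSize` as `addBatch` options, but the diff does not show how the library applies them. The following is a minimal sketch of one common way to combine the two, assuming the input is sliced into consecutive batches and at most `concurrency` batches are in flight at once. `processInBatches` and `handleBatch` are hypothetical names introduced for illustration; they are not part of the Brainy API, and the library's actual scheduling may differ.

```typescript
// Hypothetical helper, NOT part of @soulcraft/brainy.
// Splits `items` into batches of `batchSize` and processes up to
// `concurrency` batches at a time, preserving input order in the result.
async function processInBatches<T, R>(
  items: T[],
  handleBatch: (batch: T[]) => Promise<R[]>,
  options: { concurrency?: number; batchSize?: number } = {}
): Promise<R[]> {
  // Defaults mirror the values documented in the README (assumption).
  const concurrency = Math.max(1, options.concurrency ?? 4)
  const batchSize = Math.max(1, options.batchSize ?? 50)

  // Slice the input into consecutive batches.
  const batches: T[][] = []
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize))
  }

  const results: R[][] = new Array(batches.length)
  let next = 0

  // Each runner repeatedly claims the next unprocessed batch index.
  async function runner(): Promise<void> {
    while (next < batches.length) {
      const index = next++
      results[index] = await handleBatch(batches[index])
    }
  }

  const runnerCount = Math.min(concurrency, batches.length)
  await Promise.all(Array.from({ length: runnerCount }, () => runner()))

  // Flatten per-batch results back into one ordered array.
  return ([] as R[]).concat(...results)
}
```

In this sketch `handleBatch` would stand in for whatever embeds and stores one batch; a larger `batchSize` means fewer embedding calls, while `concurrency` caps how many of those calls overlap.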
@@ -591,9 +592,7 @@ await db.init()
 
  // Or use the threaded embedding function for better performance
  const threadedDb = new BrainyData({
- embeddingFunction: createThreadedEmbeddingFunction({
- fallbackToMain: true // Fall back to main thread if threading fails
- })
+ embeddingFunction: createThreadedEmbeddingFunction()
  })
  await threadedDb.init()
 
@@ -602,17 +601,36 @@ const vector = await db.embed("Some text to convert to a vector")
  ```
 
  The threaded embedding function runs in a separate thread (Web Worker in browsers, Worker Thread in Node.js) to improve
- performance, especially for CPU-intensive embedding operations. It automatically falls back to the main thread if
- threading is not available in the current environment.
+ performance, especially for embedding operations. It uses GPU acceleration when available (via WebGL in browsers) and
+ falls back to CPU processing for compatibility. Universal Sentence Encoder is always used for embeddings. The
+ implementation includes worker reuse and model caching for optimal performance.
 
  ### Performance Tuning
 
- Brainy now includes comprehensive multithreading support to improve performance across all environments:
+ Brainy includes comprehensive performance optimizations that work across all environments (browser, CLI, Node.js,
+ container, server):
+
+ #### GPU and CPU Optimization
+
+ Brainy uses GPU and CPU optimization for compute-intensive operations:
+
+ 1. **GPU-Accelerated Embeddings**: Generate text embeddings using TensorFlow.js with WebGL backend when available
+ 2. **Automatic Fallback**: Falls back to CPU backend when GPU is not available
+ 3. **Optimized Distance Calculations**: Perform vector similarity calculations with optimized algorithms
+ 4. **Cross-Environment Support**: Works consistently across browsers and Node.js environments
+ 5. **Memory Management**: Properly disposes of tensors to prevent memory leaks
+
+ #### Multithreading Support
+
+ Brainy includes comprehensive multithreading support to improve performance across all environments:
 
  1. **Parallel Batch Processing**: Add multiple items concurrently with controlled parallelism
  2. **Multithreaded Vector Search**: Perform distance calculations in parallel for faster search operations
  3. **Threaded Embedding Generation**: Generate embeddings in separate threads to avoid blocking the main thread
- 4. **Automatic Environment Detection**: Adapts to browser (Web Workers) and Node.js (Worker Threads) environments
+ 4. **Worker Reuse**: Maintains a pool of workers to avoid the overhead of creating and terminating workers
+ 5. **Model Caching**: Initializes the embedding model once per worker and reuses it for multiple operations
+ 6. **Batch Embedding**: Processes multiple items in a single embedding operation for better performance
+ 7. **Automatic Environment Detection**: Adapts to browser (Web Workers) and Node.js (Worker Threads) environments
 
  ```typescript
  import { BrainyData, euclideanDistance } from '@soulcraft/brainy'
@@ -629,8 +647,8 @@ const db = new BrainyData({
  efSearch: 50, // Search candidate list size
  },
 
- // Multithreading options
- threading: {
+ // Performance optimization options
+ performance: {
  useParallelization: true, // Enable multithreaded search operations
  },
 
@@ -706,10 +724,16 @@ console.log(status.details.index)
 
  ## Distance Functions
 
- - `cosineDistance` (default)
- - `euclideanDistance`
- - `manhattanDistance`
- - `dotProductDistance`
+ Brainy provides several distance functions for vector similarity calculations:
+
+ - `cosineDistance` (default): Measures the cosine of the angle between vectors (1 - cosine similarity)
+ - `euclideanDistance`: Measures the straight-line distance between vectors
+ - `manhattanDistance`: Measures the sum of absolute differences between vector components
+ - `dotProductDistance`: Measures the negative dot product between vectors
+
+ All distance functions are optimized for performance and automatically use the most efficient implementation based on
+ the dataset size and available resources. For large datasets and high-dimensional vectors, Brainy uses batch processing
+ and multithreading when available to improve performance.
 
  ## Backup and Restore
 
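
Note on the hunk above: the four distance definitions added to the README can be written out directly from their one-line descriptions. The sketch below is a plain, unoptimized reading of those descriptions and is not the library's implementation. The `...Sketch` function names, the `Vector` alias, the length check, and the handling of zero-length vectors in the cosine case are all assumptions made for illustration.

```typescript
type Vector = number[]

// Guard used by every function below (an assumption; the README does not
// say how mismatched dimensions are handled).
function assertSameLength(a: Vector, b: Vector): void {
  if (a.length !== b.length) {
    throw new Error(`Dimension mismatch: ${a.length} vs ${b.length}`)
  }
}

function dot(a: Vector, b: Vector): number {
  assertSameLength(a, b)
  let sum = 0
  for (let i = 0; i < a.length; i++) sum += a[i] * b[i]
  return sum
}

// 1 - cosine similarity.
function cosineDistanceSketch(a: Vector, b: Vector): number {
  const normA = Math.sqrt(dot(a, a))
  const normB = Math.sqrt(dot(b, b))
  // Assumption: a zero vector has no direction, so report distance 1.
  if (normA === 0 || normB === 0) return 1
  return 1 - dot(a, b) / (normA * normB)
}

// Straight-line (L2) distance.
function euclideanDistanceSketch(a: Vector, b: Vector): number {
  assertSameLength(a, b)
  let sum = 0
  for (let i = 0; i < a.length; i++) {
    const diff = a[i] - b[i]
    sum += diff * diff
  }
  return Math.sqrt(sum)
}

// Sum of absolute component differences (L1).
function manhattanDistanceSketch(a: Vector, b: Vector): number {
  assertSameLength(a, b)
  let sum = 0
  for (let i = 0; i < a.length; i++) sum += Math.abs(a[i] - b[i])
  return sum
}

// Negative dot product, so that a larger dot product means "closer".
function dotProductDistanceSketch(a: Vector, b: Vector): number {
  return -dot(a, b)
}
```

Note that the negative dot product can be negative and is not a metric in the strict sense; it only orders candidates so that smaller values rank as more similar.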
@@ -791,6 +815,9 @@ brainy import-sparse --input sparse-data.json
  Brainy uses the following embedding approach:
 
  - TensorFlow Universal Sentence Encoder (high-quality text embeddings)
+ - GPU acceleration when available (via WebGL in browsers)
+ - Batch embedding for processing multiple items efficiently
+ - Worker reuse and model caching for optimal performance
  - Custom embedding functions can be plugged in for specialized domains
 
  ## Extensions
@@ -1033,8 +1060,9 @@ The repository includes a comprehensive demo that showcases Brainy's main featur
  which automatically deploys when pushing to the main branch or can be manually triggered
  - To use a custom domain (like www.soulcraft.com):
  1. A CNAME file is already included in the demo directory
- 2. In your GitHub repository settings, go to Pages > Custom domain and enter your domain
- 3. Configure your domain's DNS settings to point to GitHub Pages:
+ 2. In your GitHub repository settings, go to Pages > Custom domain and enter your domain
+ 3. Configure your domain's DNS settings to point to GitHub Pages:
+
  - Add a CNAME record for www pointing to `<username>.github.io` (e.g., `soulcraft-research.github.io`)
  - Or for an apex domain (soulcraft.com), add A records pointing to GitHub Pages IP addresses
 
@@ -1216,7 +1244,7 @@ const id = await db.addToBoth('Deep learning is a subset of machine learning', {
  tags: ['deep learning', 'neural networks']
  })
 
- // Clean up when done
+ // Clean up when done (this also cleans up worker pools)
  await db.shutDown()
  ```
 
@@ -1274,9 +1302,9 @@ Brainy follows a specific code style to maintain consistency throughout the code
  The README badges are automatically updated during the build process:
 
  1. **npm Version Badge**: The npm version badge is automatically updated to match the version in package.json when:
- - Running `npm run build` (via the prebuild script)
- - Running `npm version` commands (patch, minor, major)
- - Manually running `node scripts/generate-version.js`
+ - Running `npm run build` (via the prebuild script)
+ - Running `npm version` commands (patch, minor, major)
+ - Manually running `node scripts/generate-version.js`
 
  This ensures that the badge always reflects the current version in package.json, even before publishing to npm.
 