npm - @soulcraft/brainy - Versions diffs - 0.9.10 → 0.9.12 - Mend

@soulcraft/brainy 0.9.10 → 0.9.12

Files changed (14)

package/README.md +50 -22
package/dist/brainy.js +36414 -66507
package/dist/brainy.min.js +1112 -4724
package/dist/brainyData.d.ts +2 -0
package/dist/hnsw/hnswIndex.d.ts +2 -0
package/dist/unified.js +36414 -66507
package/dist/unified.min.js +1112 -4724
package/dist/utils/distance.d.ts +12 -1
package/dist/utils/embedding.d.ts +19 -14
package/dist/utils/environment.d.ts +1 -0
package/dist/utils/index.d.ts +1 -0
package/dist/utils/version.d.ts +1 -1
package/dist/utils/workerUtils.d.ts +7 -17
package/package.json +7 -2

package/README.md CHANGED

@@ -6,7 +6,7 @@
 [![Node.js](https://img.shields.io/badge/node-%3E%3D23.11.0-brightgreen.svg)](https://nodejs.org/)
 [![TypeScript](https://img.shields.io/badge/TypeScript-5.1.6-blue.svg)](https://www.typescriptlang.org/)
 [![PRs Welcome](https://img.shields.io/badge/PRs-welcome-brightgreen.svg)](CONTRIBUTING.md)
-[![npm](https://img.shields.io/badge/npm-v0.9.10-blue.svg)](https://www.npmjs.com/package/@soulcraft/brainy)
+[![npm](https://img.shields.io/badge/npm-v0.9.12-blue.svg)](https://www.npmjs.com/package/@soulcraft/brainy)
 [//]: # ([![Cartographer]&#40;https://img.shields.io/badge/Cartographer-Official%20Standard-brightgreen&#41;]&#40;https://github.com/sodal-project/cartographer&#41;)
@@ -508,7 +508,7 @@ const id = await db.add(textOrVector, {
   // other metadata...
 })
-// Add multiple nouns in parallel (with multithreading)
+// Add multiple nouns in parallel (with multithreading and batch embedding)
 const ids = await db.addBatch([
   {
     vectorOrData: "First item to add",
@@ -521,7 +521,8 @@ const ids = await db.addBatch([
   // More items...
 ], {
   forceEmbed: false,
-  concurrency: 4 // Control the level of parallelism (default: 4)
+  concurrency: 4, // Control the level of parallelism (default: 4)
+  batchSize: 50   // Control the number of items to process in a single batch (default: 50)
 })
 // Retrieve a noun
@@ -591,9 +592,7 @@ await db.init()
 // Or use the threaded embedding function for better performance
 const threadedDb = new BrainyData({
-  embeddingFunction: createThreadedEmbeddingFunction({
-    fallbackToMain: true // Fall back to main thread if threading fails
-  })
+  embeddingFunction: createThreadedEmbeddingFunction()
 })
 await threadedDb.init()
@@ -602,17 +601,36 @@ const vector = await db.embed("Some text to convert to a vector")
 ```
 The threaded embedding function runs in a separate thread (Web Worker in browsers, Worker Thread in Node.js) to improve
-performance, especially for CPU-intensive embedding operations. It automatically falls back to the main thread if
-threading is not available in the current environment.
+performance, especially for embedding operations. It uses GPU acceleration when available (via WebGL in browsers) and
+falls back to CPU processing for compatibility. Universal Sentence Encoder is always used for embeddings. The
+implementation includes worker reuse and model caching for optimal performance.
 ### Performance Tuning
-Brainy now includes comprehensive multithreading support to improve performance across all environments:
+Brainy includes comprehensive performance optimizations that work across all environments (browser, CLI, Node.js,
+container, server):
+#### GPU and CPU Optimization
+Brainy uses GPU and CPU optimization for compute-intensive operations:
+1. **GPU-Accelerated Embeddings**: Generate text embeddings using TensorFlow.js with WebGL backend when available
+2. **Automatic Fallback**: Falls back to CPU backend when GPU is not available
+3. **Optimized Distance Calculations**: Perform vector similarity calculations with optimized algorithms
+4. **Cross-Environment Support**: Works consistently across browsers and Node.js environments
+5. **Memory Management**: Properly disposes of tensors to prevent memory leaks
+#### Multithreading Support
+Brainy includes comprehensive multithreading support to improve performance across all environments:
 1. **Parallel Batch Processing**: Add multiple items concurrently with controlled parallelism
 2. **Multithreaded Vector Search**: Perform distance calculations in parallel for faster search operations
 3. **Threaded Embedding Generation**: Generate embeddings in separate threads to avoid blocking the main thread
-4. **Automatic Environment Detection**: Adapts to browser (Web Workers) and Node.js (Worker Threads) environments
+4. **Worker Reuse**: Maintains a pool of workers to avoid the overhead of creating and terminating workers
+5. **Model Caching**: Initializes the embedding model once per worker and reuses it for multiple operations
+6. **Batch Embedding**: Processes multiple items in a single embedding operation for better performance
+7. **Automatic Environment Detection**: Adapts to browser (Web Workers) and Node.js (Worker Threads) environments
 ```typescript
 import { BrainyData, euclideanDistance } from '@soulcraft/brainy'
@@ -629,8 +647,8 @@ const db = new BrainyData({
     efSearch: 50,       // Search candidate list size
   },
-  // Multithreading options
-  threading: {
+  // Performance optimization options
+  performance: {
     useParallelization: true, // Enable multithreaded search operations
   },
@@ -706,10 +724,16 @@ console.log(status.details.index)
 ## Distance Functions
-- `cosineDistance` (default)
-- `euclideanDistance`
-- `manhattanDistance`
-- `dotProductDistance`
+Brainy provides several distance functions for vector similarity calculations:
+- `cosineDistance` (default): Measures the cosine of the angle between vectors (1 - cosine similarity)
+- `euclideanDistance`: Measures the straight-line distance between vectors
+- `manhattanDistance`: Measures the sum of absolute differences between vector components
+- `dotProductDistance`: Measures the negative dot product between vectors
+All distance functions are optimized for performance and automatically use the most efficient implementation based on
+the dataset size and available resources. For large datasets and high-dimensional vectors, Brainy uses batch processing
+and multithreading when available to improve performance.
 ## Backup and Restore
@@ -791,6 +815,9 @@ brainy import-sparse --input sparse-data.json
 Brainy uses the following embedding approach:
 - TensorFlow Universal Sentence Encoder (high-quality text embeddings)
+- GPU acceleration when available (via WebGL in browsers)
+- Batch embedding for processing multiple items efficiently
+- Worker reuse and model caching for optimal performance
 - Custom embedding functions can be plugged in for specialized domains
 ## Extensions
@@ -1033,8 +1060,9 @@ The repository includes a comprehensive demo that showcases Brainy's main featur
       which automatically deploys when pushing to the main branch or can be manually triggered
     - To use a custom domain (like www.soulcraft.com):
         1. A CNAME file is already included in the demo directory
-        2. In your GitHub repository settings, go to Pages > Custom domain and enter your domain
-        3. Configure your domain's DNS settings to point to GitHub Pages:
+            2. In your GitHub repository settings, go to Pages > Custom domain and enter your domain
+            3. Configure your domain's DNS settings to point to GitHub Pages:
             - Add a CNAME record for www pointing to `<username>.github.io` (e.g., `soulcraft-research.github.io`)
             - Or for an apex domain (soulcraft.com), add A records pointing to GitHub Pages IP addresses
@@ -1216,7 +1244,7 @@ const id = await db.addToBoth('Deep learning is a subset of machine learning', {
   tags: ['deep learning', 'neural networks']
 })
-// Clean up when done
+// Clean up when done (this also cleans up worker pools)
 await db.shutDown()
 ```
@@ -1274,9 +1302,9 @@ Brainy follows a specific code style to maintain consistency throughout the code
 The README badges are automatically updated during the build process:
 1. **npm Version Badge**: The npm version badge is automatically updated to match the version in package.json when:
-   - Running `npm run build` (via the prebuild script)
-   - Running `npm version` commands (patch, minor, major)
-   - Manually running `node scripts/generate-version.js`
+    - Running `npm run build` (via the prebuild script)
+    - Running `npm version` commands (patch, minor, major)
+    - Manually running `node scripts/generate-version.js`
 This ensures that the badge always reflects the current version in package.json, even before publishing to npm.
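
The expanded "Distance Functions" text in the README hunk above describes four metrics in words. As a reading aid, the following is a minimal sketch of those definitions in plain TypeScript. It is not Brainy's implementation: the function names (`cosine`, `euclidean`, `manhattan`, `negativeDot`), the `Vector` type, and the zero-vector handling are assumptions made for illustration, and none of the batching or multithreading mentioned in the README is modeled here.

```typescript
// Illustrative only: hypothetical helpers, not the package's exported functions.
type Vector = readonly number[]

// Guard against comparing vectors of different dimensionality.
function assertSameLength(a: Vector, b: Vector): void {
  if (a.length !== b.length) {
    throw new Error(`Dimension mismatch: ${a.length} vs ${b.length}`)
  }
}

// Sum of pairwise products.
function dot(a: Vector, b: Vector): number {
  assertSameLength(a, b)
  let sum = 0
  for (let i = 0; i < a.length; i++) sum += a[i] * b[i]
  return sum
}

// 1 - cosine similarity. Returning 1 for a zero-length vector is an
// assumption of this sketch, since the angle is undefined in that case.
function cosine(a: Vector, b: Vector): number {
  const normA = Math.sqrt(dot(a, a))
  const normB = Math.sqrt(dot(b, b))
  if (normA === 0 || normB === 0) return 1
  return 1 - dot(a, b) / (normA * normB)
}

// Straight-line distance between the two vectors.
function euclidean(a: Vector, b: Vector): number {
  assertSameLength(a, b)
  let sum = 0
  for (let i = 0; i < a.length; i++) {
    const d = a[i] - b[i]
    sum += d * d
  }
  return Math.sqrt(sum)
}

// Sum of absolute differences between components.
function manhattan(a: Vector, b: Vector): number {
  assertSameLength(a, b)
  let sum = 0
  for (let i = 0; i < a.length; i++) sum += Math.abs(a[i] - b[i])
  return sum
}

// Negative dot product, so that a larger dot product yields a smaller distance.
function negativeDot(a: Vector, b: Vector): number {
  return -dot(a, b)
}

console.log(euclidean([0, 0], [3, 4]))          // 5
console.log(manhattan([1, 2], [4, 6]))          // 7
console.log(negativeDot([1, 2, 3], [4, 5, 6]))  // -32
```

All four follow the convention that a smaller value means "more similar", which is why the dot product is negated rather than used directly.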