@soulcraft/brainy 0.32.0 → 0.34.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -27,6 +27,8 @@ it gets - learning from your data to provide increasingly relevant results and c
27
27
 
28
28
  - **Run Everywhere** - Works in browsers, Node.js, serverless functions, and containers
29
29
  - **Vector Search** - Find semantically similar content using embeddings
30
+ - **Advanced JSON Document Search** - Search within specific fields of JSON documents with field prioritization and
31
+ service-based field standardization
30
32
  - **Graph Relationships** - Connect data with meaningful relationships
31
33
  - **Streaming Pipeline** - Process data in real-time as it flows through the system
32
34
  - **Extensible Augmentations** - Customize and extend functionality with pluggable components
@@ -89,7 +91,7 @@ REST API web service wrapper that provides HTTP endpoints for search operations
89
91
  Brainy uses a unified build that automatically adapts to your environment (Node.js, browser, or serverless):
90
92
 
91
93
  ```typescript
92
- import {BrainyData, NounType, VerbType} from '@soulcraft/brainy'
94
+ import { BrainyData, NounType, VerbType } from '@soulcraft/brainy'
93
95
 
94
96
  // Create and initialize the database
95
97
  const db = new BrainyData()
@@ -97,13 +99,13 @@ await db.init()
97
99
 
98
100
  // Add data (automatically converted to vectors)
99
101
  const catId = await db.add("Cats are independent pets", {
100
- noun: NounType.Thing,
101
- category: 'animal'
102
+ noun: NounType.Thing,
103
+ category: 'animal'
102
104
  })
103
105
 
104
106
  const dogId = await db.add("Dogs are loyal companions", {
105
- noun: NounType.Thing,
106
- category: 'animal'
107
+ noun: NounType.Thing,
108
+ category: 'animal'
107
109
  })
108
110
 
109
111
  // Search for similar items
@@ -112,8 +114,8 @@ console.log(results)
112
114
 
113
115
  // Add a relationship between items
114
116
  await db.addVerb(catId, dogId, {
115
- verb: VerbType.RelatedTo,
116
- description: 'Both are common household pets'
117
+ verb: VerbType.RelatedTo,
118
+ description: 'Both are common household pets'
117
119
  })
118
120
  ```
119
121
 
@@ -121,10 +123,10 @@ await db.addVerb(catId, dogId, {
121
123
 
122
124
  ```typescript
123
125
  // Standard import - automatically adapts to any environment
124
- import {BrainyData} from '@soulcraft/brainy'
126
+ import { BrainyData } from '@soulcraft/brainy'
125
127
 
126
128
  // Minified version for production
127
- import {BrainyData} from '@soulcraft/brainy/min'
129
+ import { BrainyData } from '@soulcraft/brainy/min'
128
130
  ```
129
131
 
130
132
  > **Note**: The CLI functionality is available as a separate package `@soulcraft/brainy-cli` to reduce the bundle size
@@ -136,15 +138,15 @@ import {BrainyData} from '@soulcraft/brainy/min'
136
138
  ```html
137
139
 
138
140
  <script type="module">
139
- // Use local files instead of CDN
140
- import {BrainyData} from './dist/unified.js'
141
+ // Use local files instead of CDN
142
+ import { BrainyData } from './dist/unified.js'
141
143
 
142
- // Or minified version
143
- // import { BrainyData } from './dist/unified.min.js'
144
+ // Or minified version
145
+ // import { BrainyData } from './dist/unified.min.js'
144
146
 
145
- const db = new BrainyData()
146
- await db.init()
147
- // ...
147
+ const db = new BrainyData()
148
+ await db.init()
149
+ // ...
148
150
  </script>
149
151
  ```
150
152
 
@@ -299,13 +301,13 @@ The pipeline runs automatically when you:
299
301
 
300
302
  ```typescript
301
303
  // Add data (runs embedding → indexing → storage)
302
- const id = await db.add("Your text data here", {metadata})
304
+ const id = await db.add("Your text data here", { metadata })
303
305
 
304
306
  // Search (runs embedding → similarity search)
305
307
  const results = await db.searchText("Your query here", 5)
306
308
 
307
309
  // Connect entities (runs graph construction → storage)
308
- await db.addVerb(sourceId, targetId, {verb: VerbType.RelatedTo})
310
+ await db.addVerb(sourceId, targetId, { verb: VerbType.RelatedTo })
309
311
  ```
310
312
 
311
313
  Using the CLI:
@@ -404,13 +406,13 @@ Connections between nouns (edges in the graph):
404
406
  Brainy provides utility functions to access lists of noun and verb types:
405
407
 
406
408
  ```typescript
407
- import {
408
- NounType,
409
- VerbType,
410
- getNounTypes,
411
- getVerbTypes,
412
- getNounTypeMap,
413
- getVerbTypeMap
409
+ import {
410
+ NounType,
411
+ VerbType,
412
+ getNounTypes,
413
+ getVerbTypes,
414
+ getNounTypeMap,
415
+ getVerbTypeMap
414
416
  } from '@soulcraft/brainy'
415
417
 
416
418
  // At development time:
@@ -433,6 +435,7 @@ const verbTypeMap = getVerbTypeMap() // { RelatedTo: 'relatedTo', Contains: 'con
433
435
  ```
434
436
 
435
437
  These utility functions make it easy to:
438
+
436
439
  - Get a complete list of available noun and verb types
437
440
  - Validate user input against valid types
438
441
  - Create dynamic UI components that display or select from available types
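To make the validation use case concrete, here is a minimal sketch built only on the utilities imported above. It assumes `getNounTypes()` and `getVerbTypes()` return arrays of the valid type strings, as the surrounding comments suggest; the helper names below are illustrative.

```typescript
import { getNounTypes, getVerbTypes } from '@soulcraft/brainy'

// Assumption: both helpers return arrays of the valid type strings.
const validNounTypes = new Set<string>(getNounTypes())
const validVerbTypes = new Set<string>(getVerbTypes())

// Reject user-supplied type names that are not in the known lists
function assertValidNounType(input: string): void {
  if (!validNounTypes.has(input)) {
    throw new Error(`Unknown noun type: ${input}`)
  }
}

function assertValidVerbType(input: string): void {
  if (!validVerbTypes.has(input)) {
    throw new Error(`Unknown verb type: ${input}`)
  }
}
```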
@@ -528,15 +531,17 @@ const status = await db.status()
528
531
  const backupData = await db.backup()
529
532
 
530
533
  // Restore data into the database
531
- const restoreResult = await db.restore(backupData, {clearExisting: true})
534
+ const restoreResult = await db.restore(backupData, { clearExisting: true })
532
535
  ```
533
536
 
534
537
  ### Database Statistics
535
538
 
536
- Brainy provides a way to get statistics about the current state of the database. For detailed information about the statistics system, including implementation details, scalability improvements, and usage examples, see our [Statistics Guide](STATISTICS.md).
539
+ Brainy provides a way to get statistics about the current state of the database. For detailed information about the
540
+ statistics system, including implementation details, scalability improvements, and usage examples, see
541
+ our [Statistics Guide](STATISTICS.md).
537
542
 
538
543
  ```typescript
539
- import {BrainyData, getStatistics} from '@soulcraft/brainy'
544
+ import { BrainyData, getStatistics } from '@soulcraft/brainy'
540
545
 
541
546
  // Create and initialize the database
542
547
  const db = new BrainyData()
@@ -553,25 +558,25 @@ console.log(stats)
553
558
  ```typescript
554
559
  // Add a noun (automatically vectorized)
555
560
  const id = await db.add(textOrVector, {
556
- noun: NounType.Thing,
557
- // other metadata...
561
+ noun: NounType.Thing,
562
+ // other metadata...
558
563
  })
559
564
 
560
565
  // Add multiple nouns in parallel (with multithreading and batch embedding)
561
566
  const ids = await db.addBatch([
562
- {
563
- vectorOrData: "First item to add",
564
- metadata: {noun: NounType.Thing, category: 'example'}
565
- },
566
- {
567
- vectorOrData: "Second item to add",
568
- metadata: {noun: NounType.Thing, category: 'example'}
569
- },
570
- // More items...
567
+ {
568
+ vectorOrData: "First item to add",
569
+ metadata: { noun: NounType.Thing, category: 'example' }
570
+ },
571
+ {
572
+ vectorOrData: "Second item to add",
573
+ metadata: { noun: NounType.Thing, category: 'example' }
574
+ },
575
+ // More items...
571
576
  ], {
572
- forceEmbed: false,
573
- concurrency: 4, // Control the level of parallelism (default: 4)
574
- batchSize: 50 // Control the number of items to process in a single batch (default: 50)
577
+ forceEmbed: false,
578
+ concurrency: 4, // Control the level of parallelism (default: 4)
579
+ batchSize: 50 // Control the number of items to process in a single batch (default: 50)
575
580
  })
576
581
 
577
582
  // Retrieve a noun
@@ -579,8 +584,8 @@ const noun = await db.get(id)
579
584
 
580
585
  // Update noun metadata
581
586
  await db.updateMetadata(id, {
582
- noun: NounType.Thing,
583
- // updated metadata...
587
+ noun: NounType.Thing,
588
+ // updated metadata...
584
589
  })
585
590
 
586
591
  // Delete a noun
@@ -592,6 +597,39 @@ const textResults = await db.searchText("query text", numResults)
592
597
 
593
598
  // Search by noun type
594
599
  const thingNouns = await db.searchByNounTypes([NounType.Thing], numResults)
600
+
601
+ // Search within specific fields of JSON documents
602
+ const fieldResults = await db.search("Acme Corporation", 10, {
603
+ searchField: "company"
604
+ })
605
+
606
+ // Search using standard field names across different services
607
+ const titleResults = await db.searchByStandardField("title", "climate change", 10)
608
+ const authorResults = await db.searchByStandardField("author", "johndoe", 10, {
609
+ services: ["github", "reddit"]
610
+ })
611
+ ```
612
+
613
+ ### Field Standardization and Service Tracking
614
+
615
+ Brainy automatically tracks field names from JSON documents and associates them with the service that inserted the data.
616
+ This enables powerful cross-service search capabilities:
617
+
618
+ ```typescript
619
+ // Get all available field names organized by service
620
+ const fieldNames = await db.getAvailableFieldNames()
621
+ // Example output: { "github": ["repository.name", "issue.title"], "reddit": ["title", "selftext"] }
622
+
623
+ // Get standard field mappings
624
+ const standardMappings = await db.getStandardFieldMappings()
625
+ // Example output: { "title": { "github": ["repository.name"], "reddit": ["title"] } }
626
+ ```
627
+
628
+ When adding data, specify the service name to ensure proper field tracking:
629
+
630
+ ```typescript
631
+ // Add data with service name
632
+ await db.add(jsonData, metadata, { service: "github" })
595
633
  ```
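Taken together, a minimal end-to-end sketch of service-aware ingestion and standard-field search might look like the following, using the `db` instance from the earlier examples. The document shapes are illustrative and mirror the example output shown above; only calls already documented here (`db.add` with a `service` option, `getAvailableFieldNames()`, `searchByStandardField()`) are used.

```typescript
// Ingest JSON documents from two different services so their fields are tracked per service
await db.add(
  { repository: { name: 'brainy' }, issue: { title: 'Improve field search' } },
  { noun: NounType.Thing },
  { service: 'github' }
)
await db.add(
  { title: 'Improving vector search', selftext: 'Long-form post about semantic search' },
  { noun: NounType.Thing },
  { service: 'reddit' }
)

// Inspect which field names each service has contributed
const fieldNamesByService = await db.getAvailableFieldNames()

// Query the standardized "title" field across both services
const titleHits = await db.searchByStandardField('title', 'search', 10, {
  services: ['github', 'reddit']
})
```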
596
634
 
597
635
  ### Working with Verbs (Relationships)
@@ -599,21 +637,21 @@ const thingNouns = await db.searchByNounTypes([NounType.Thing], numResults)
599
637
  ```typescript
600
638
  // Add a relationship between nouns
601
639
  await db.addVerb(sourceId, targetId, {
602
- verb: VerbType.RelatedTo,
603
- // other metadata...
640
+ verb: VerbType.RelatedTo,
641
+ // other metadata...
604
642
  })
605
643
 
606
644
  // Add a relationship with auto-creation of missing nouns
607
645
  // This is useful when the target noun might not exist yet
608
646
  await db.addVerb(sourceId, targetId, {
609
- verb: VerbType.RelatedTo,
610
- // Enable auto-creation of missing nouns
611
- autoCreateMissingNouns: true,
612
- // Optional metadata for auto-created nouns
613
- missingNounMetadata: {
614
- noun: NounType.Concept,
615
- description: 'Auto-created noun'
616
- }
647
+ verb: VerbType.RelatedTo,
648
+ // Enable auto-creation of missing nouns
649
+ autoCreateMissingNouns: true,
650
+ // Optional metadata for auto-created nouns
651
+ missingNounMetadata: {
652
+ noun: NounType.Concept,
653
+ description: 'Auto-created noun'
654
+ }
617
655
  })
618
656
 
619
657
  // Get all relationships
@@ -665,32 +703,53 @@ db.setReadOnly(false)
665
703
  db.setWriteOnly(false)
666
704
  ```
667
705
 
668
- - **Read-Only Mode**: When enabled, prevents all write operations (add, update, delete). Useful for deployment scenarios where you want to prevent modifications to the database.
669
- - **Write-Only Mode**: When enabled, prevents all search operations. Useful for initial data loading or when you want to optimize for write performance.
706
+ - **Read-Only Mode**: When enabled, prevents all write operations (add, update, delete). Useful for deployment scenarios
707
+ where you want to prevent modifications to the database.
708
+ - **Write-Only Mode**: When enabled, prevents all search operations. Useful for initial data loading or when you want to
709
+ optimize for write performance.
670
710
 
671
711
  ### Embedding
672
712
 
673
713
  ```typescript
674
714
  import {
675
- BrainyData,
676
- createTensorFlowEmbeddingFunction,
677
- createThreadedEmbeddingFunction
715
+ BrainyData,
716
+ createTensorFlowEmbeddingFunction,
717
+ createThreadedEmbeddingFunction
678
718
  } from '@soulcraft/brainy'
679
719
 
680
720
  // Use the standard TensorFlow Universal Sentence Encoder embedding function
681
721
  const db = new BrainyData({
682
- embeddingFunction: createTensorFlowEmbeddingFunction()
722
+ embeddingFunction: createTensorFlowEmbeddingFunction()
683
723
  })
684
724
  await db.init()
685
725
 
686
726
  // Or use the threaded embedding function for better performance
687
727
  const threadedDb = new BrainyData({
688
- embeddingFunction: createThreadedEmbeddingFunction()
728
+ embeddingFunction: createThreadedEmbeddingFunction()
689
729
  })
690
730
  await threadedDb.init()
691
731
 
692
732
  // Directly embed text to vectors
693
733
  const vector = await db.embed("Some text to convert to a vector")
734
+
735
+ // Calculate similarity between two texts or vectors
736
+ const similarity = await db.calculateSimilarity(
737
+ "Cats are furry pets",
738
+ "Felines make good companions"
739
+ )
740
+ console.log(`Similarity score: ${similarity}`) // Higher value means more similar
741
+
742
+ // Calculate similarity with custom options
743
+ const vectorA = await db.embed("First text")
744
+ const vectorB = await db.embed("Second text")
745
+ const customSimilarity = await db.calculateSimilarity(
746
+ vectorA, // Can use pre-computed vectors
747
+ vectorB,
748
+ {
749
+ forceEmbed: false, // Skip embedding if inputs are already vectors
750
+ distanceFunction: cosineDistance // Optional custom distance function
751
+ }
752
+ )
694
753
  ```
695
754
 
696
755
  The threaded embedding function runs in a separate thread (Web Worker in browsers, Worker Thread in Node.js) to improve
@@ -726,42 +785,42 @@ Brainy includes comprehensive multithreading support to improve performance acro
726
785
  7. **Automatic Environment Detection**: Adapts to browser (Web Workers) and Node.js (Worker Threads) environments
727
786
 
728
787
  ```typescript
729
- import {BrainyData, euclideanDistance} from '@soulcraft/brainy'
788
+ import { BrainyData, euclideanDistance } from '@soulcraft/brainy'
730
789
 
731
790
  // Configure with custom options
732
791
  const db = new BrainyData({
733
- // Use Euclidean distance instead of default cosine distance
734
- distanceFunction: euclideanDistance,
735
-
736
- // HNSW index configuration for search performance
737
- hnsw: {
738
- M: 16, // Max connections per noun
739
- efConstruction: 200, // Construction candidate list size
740
- efSearch: 50, // Search candidate list size
741
- },
742
-
743
- // Performance optimization options
744
- performance: {
745
- useParallelization: true, // Enable multithreaded search operations
746
- },
747
-
748
- // Noun and Verb type validation
749
- typeValidation: {
750
- enforceNounTypes: true, // Validate noun types against NounType enum
751
- enforceVerbTypes: true, // Validate verb types against VerbType enum
752
- },
753
-
754
- // Storage configuration
755
- storage: {
756
- requestPersistentStorage: true,
757
- // Example configuration for cloud storage (replace with your own values):
758
- // s3Storage: {
759
- // bucketName: 'your-s3-bucket-name',
760
- // region: 'your-aws-region'
761
- // // Credentials should be provided via environment variables
762
- // // AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY
763
- // }
764
- }
792
+ // Use Euclidean distance instead of default cosine distance
793
+ distanceFunction: euclideanDistance,
794
+
795
+ // HNSW index configuration for search performance
796
+ hnsw: {
797
+ M: 16, // Max connections per noun
798
+ efConstruction: 200, // Construction candidate list size
799
+ efSearch: 50, // Search candidate list size
800
+ },
801
+
802
+ // Performance optimization options
803
+ performance: {
804
+ useParallelization: true, // Enable multithreaded search operations
805
+ },
806
+
807
+ // Noun and Verb type validation
808
+ typeValidation: {
809
+ enforceNounTypes: true, // Validate noun types against NounType enum
810
+ enforceVerbTypes: true, // Validate verb types against VerbType enum
811
+ },
812
+
813
+ // Storage configuration
814
+ storage: {
815
+ requestPersistentStorage: true,
816
+ // Example configuration for cloud storage (replace with your own values):
817
+ // s3Storage: {
818
+ // bucketName: 'your-s3-bucket-name',
819
+ // region: 'your-aws-region'
820
+ // // Credentials should be provided via environment variables
821
+ // // AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY
822
+ // }
823
+ }
765
824
  })
766
825
  ```
767
826
 
@@ -775,34 +834,34 @@ hybrid approach:
775
834
  3. **Memory-Efficient Indexing** - Optimizes memory usage for large-scale vector collections
776
835
 
777
836
  ```typescript
778
- import {BrainyData} from '@soulcraft/brainy'
837
+ import { BrainyData } from '@soulcraft/brainy'
779
838
 
780
839
  // Configure with optimized HNSW index for large datasets
781
840
  const db = new BrainyData({
782
- hnswOptimized: {
783
- // Standard HNSW parameters
784
- M: 16, // Max connections per noun
785
- efConstruction: 200, // Construction candidate list size
786
- efSearch: 50, // Search candidate list size
787
-
788
- // Memory threshold in bytes - when exceeded, will use disk-based approach
789
- memoryThreshold: 1024 * 1024 * 1024, // 1GB default threshold
790
-
791
- // Product quantization settings for dimensionality reduction
792
- productQuantization: {
793
- enabled: true, // Enable product quantization
794
- numSubvectors: 16, // Number of subvectors to split the vector into
795
- numCentroids: 256 // Number of centroids per subvector
796
- },
797
-
798
- // Whether to use disk-based storage for the index
799
- useDiskBasedIndex: true // Enable disk-based storage
841
+ hnswOptimized: {
842
+ // Standard HNSW parameters
843
+ M: 16, // Max connections per noun
844
+ efConstruction: 200, // Construction candidate list size
845
+ efSearch: 50, // Search candidate list size
846
+
847
+ // Memory threshold in bytes - when exceeded, will use disk-based approach
848
+ memoryThreshold: 1024 * 1024 * 1024, // 1GB default threshold
849
+
850
+ // Product quantization settings for dimensionality reduction
851
+ productQuantization: {
852
+ enabled: true, // Enable product quantization
853
+ numSubvectors: 16, // Number of subvectors to split the vector into
854
+ numCentroids: 256 // Number of centroids per subvector
800
855
  },
801
856
 
802
- // Storage configuration (required for disk-based index)
803
- storage: {
804
- requestPersistentStorage: true
805
- }
857
+ // Whether to use disk-based storage for the index
858
+ useDiskBasedIndex: true // Enable disk-based storage
859
+ },
860
+
861
+ // Storage configuration (required for disk-based index)
862
+ storage: {
863
+ requestPersistentStorage: true
864
+ }
806
865
  })
807
866
 
808
867
  // The optimized index automatically adapts based on dataset size:
@@ -867,24 +926,24 @@ Brainy's restore functionality can handle:
867
926
  ```typescript
868
927
  // Restore data with all options
869
928
  const restoreResult = await db.restore(backupData, {
870
- clearExisting: true // Whether to clear existing data before restore
929
+ clearExisting: true // Whether to clear existing data before restore
871
930
  })
872
931
 
873
932
  // Import sparse data (without vectors)
874
933
  // Vectors will be automatically created using the embedding function
875
934
  const sparseData = {
876
- nouns: [
877
- {
878
- id: '123',
879
- // No vector field - will be created during import
880
- metadata: {
881
- noun: 'Thing',
882
- text: 'This text will be used to generate a vector'
883
- }
884
- }
885
- ],
886
- verbs: [],
887
- version: '1.0.0'
935
+ nouns: [
936
+ {
937
+ id: '123',
938
+ // No vector field - will be created during import
939
+ metadata: {
940
+ noun: 'Thing',
941
+ text: 'This text will be used to generate a vector'
942
+ }
943
+ }
944
+ ],
945
+ verbs: [],
946
+ version: '1.0.0'
888
947
  }
889
948
 
890
949
  const sparseImportResult = await db.importSparseData(sparseData)
@@ -931,82 +990,82 @@ boilerplate:
931
990
 
932
991
  ```typescript
933
992
  import {
934
- createMemoryAugmentation,
935
- createConduitAugmentation,
936
- createSenseAugmentation,
937
- addWebSocketSupport,
938
- executeStreamlined,
939
- processStaticData,
940
- processStreamingData,
941
- createPipeline
993
+ createMemoryAugmentation,
994
+ createConduitAugmentation,
995
+ createSenseAugmentation,
996
+ addWebSocketSupport,
997
+ executeStreamlined,
998
+ processStaticData,
999
+ processStreamingData,
1000
+ createPipeline
942
1001
  } from '@soulcraft/brainy'
943
1002
 
944
1003
  // Create a memory augmentation with minimal code
945
1004
  const memoryAug = createMemoryAugmentation({
946
- name: 'simple-memory',
947
- description: 'A simple in-memory storage augmentation',
948
- autoRegister: true,
949
- autoInitialize: true,
950
-
951
- // Implement only the methods you need
952
- storeData: async (key, data) => {
953
- // Your implementation here
954
- return {
955
- success: true,
956
- data: true
957
- }
958
- },
1005
+ name: 'simple-memory',
1006
+ description: 'A simple in-memory storage augmentation',
1007
+ autoRegister: true,
1008
+ autoInitialize: true,
1009
+
1010
+ // Implement only the methods you need
1011
+ storeData: async (key, data) => {
1012
+ // Your implementation here
1013
+ return {
1014
+ success: true,
1015
+ data: true
1016
+ }
1017
+ },
959
1018
 
960
- retrieveData: async (key) => {
961
- // Your implementation here
962
- return {
963
- success: true,
964
- data: {example: 'data', key}
965
- }
1019
+ retrieveData: async (key) => {
1020
+ // Your implementation here
1021
+ return {
1022
+ success: true,
1023
+ data: { example: 'data', key }
966
1024
  }
1025
+ }
967
1026
  })
968
1027
 
969
1028
  // Add WebSocket support to any augmentation
970
1029
  const wsAugmentation = addWebSocketSupport(memoryAug, {
971
- connectWebSocket: async (url) => {
972
- // Your implementation here
973
- return {
974
- connectionId: 'ws-1',
975
- url,
976
- status: 'connected'
977
- }
1030
+ connectWebSocket: async (url) => {
1031
+ // Your implementation here
1032
+ return {
1033
+ connectionId: 'ws-1',
1034
+ url,
1035
+ status: 'connected'
978
1036
  }
1037
+ }
979
1038
  })
980
1039
 
981
1040
  // Process static data through a pipeline
982
1041
  const result = await processStaticData(
983
- 'Input data',
984
- [
985
- {
986
- augmentation: senseAug,
987
- method: 'processRawData',
988
- transformArgs: (data) => [data, 'text']
989
- },
990
- {
991
- augmentation: memoryAug,
992
- method: 'storeData',
993
- transformArgs: (data) => ['processed-data', data]
994
- }
995
- ]
996
- )
997
-
998
- // Create a reusable pipeline
999
- const pipeline = createPipeline([
1042
+ 'Input data',
1043
+ [
1000
1044
  {
1001
- augmentation: senseAug,
1002
- method: 'processRawData',
1003
- transformArgs: (data) => [data, 'text']
1045
+ augmentation: senseAug,
1046
+ method: 'processRawData',
1047
+ transformArgs: (data) => [data, 'text']
1004
1048
  },
1005
1049
  {
1006
- augmentation: memoryAug,
1007
- method: 'storeData',
1008
- transformArgs: (data) => ['processed-data', data]
1050
+ augmentation: memoryAug,
1051
+ method: 'storeData',
1052
+ transformArgs: (data) => ['processed-data', data]
1009
1053
  }
1054
+ ]
1055
+ )
1056
+
1057
+ // Create a reusable pipeline
1058
+ const pipeline = createPipeline([
1059
+ {
1060
+ augmentation: senseAug,
1061
+ method: 'processRawData',
1062
+ transformArgs: (data) => [data, 'text']
1063
+ },
1064
+ {
1065
+ augmentation: memoryAug,
1066
+ method: 'storeData',
1067
+ transformArgs: (data) => ['processed-data', data]
1068
+ }
1010
1069
  ])
1011
1070
 
1012
1071
  // Use the pipeline
@@ -1014,11 +1073,11 @@ const result = await pipeline('New input data')
1014
1073
 
1015
1074
  // Dynamically load augmentations at runtime
1016
1075
  const loadedAugmentations = await loadAugmentationModule(
1017
- import('./my-augmentations.js'),
1018
- {
1019
- autoRegister: true,
1020
- autoInitialize: true
1021
- }
1076
+ import('./my-augmentations.js'),
1077
+ {
1078
+ autoRegister: true,
1079
+ autoInitialize: true
1080
+ }
1022
1081
  )
1023
1082
  ```
1024
1083
 
@@ -1037,56 +1096,56 @@ capabilities to their augmentations:
1037
1096
 
1038
1097
  ```typescript
1039
1098
  import {
1040
- // Base WebSocket support interface
1041
- IWebSocketSupport,
1042
-
1043
- // Combined WebSocket augmentation types
1044
- IWebSocketSenseAugmentation,
1045
- IWebSocketConduitAugmentation,
1046
- IWebSocketCognitionAugmentation,
1047
- IWebSocketMemoryAugmentation,
1048
- IWebSocketPerceptionAugmentation,
1049
- IWebSocketDialogAugmentation,
1050
- IWebSocketActivationAugmentation,
1051
-
1052
- // Function to add WebSocket support to any augmentation
1053
- addWebSocketSupport
1099
+ // Base WebSocket support interface
1100
+ IWebSocketSupport,
1101
+
1102
+ // Combined WebSocket augmentation types
1103
+ IWebSocketSenseAugmentation,
1104
+ IWebSocketConduitAugmentation,
1105
+ IWebSocketCognitionAugmentation,
1106
+ IWebSocketMemoryAugmentation,
1107
+ IWebSocketPerceptionAugmentation,
1108
+ IWebSocketDialogAugmentation,
1109
+ IWebSocketActivationAugmentation,
1110
+
1111
+ // Function to add WebSocket support to any augmentation
1112
+ addWebSocketSupport
1054
1113
  } from '@soulcraft/brainy'
1055
1114
 
1056
1115
  // Example: Creating a typed WebSocket-enabled sense augmentation
1057
1116
  const mySenseAug = createSenseAugmentation({
1058
- name: 'my-sense',
1059
- processRawData: async (data, dataType) => {
1060
- // Implementation
1061
- return {
1062
- success: true,
1063
- data: {nouns: [], verbs: []}
1064
- }
1117
+ name: 'my-sense',
1118
+ processRawData: async (data, dataType) => {
1119
+ // Implementation
1120
+ return {
1121
+ success: true,
1122
+ data: { nouns: [], verbs: [] }
1065
1123
  }
1124
+ }
1066
1125
  }) as IWebSocketSenseAugmentation
1067
1126
 
1068
1127
  // Add WebSocket support
1069
1128
  addWebSocketSupport(mySenseAug, {
1070
- connectWebSocket: async (url) => {
1071
- // WebSocket implementation
1072
- return {
1073
- connectionId: 'ws-1',
1074
- url,
1075
- status: 'connected'
1076
- }
1077
- },
1078
- sendWebSocketMessage: async (connectionId, data) => {
1079
- // Send message implementation
1080
- },
1081
- onWebSocketMessage: async (connectionId, callback) => {
1082
- // Register callback implementation
1083
- },
1084
- offWebSocketMessage: async (connectionId, callback) => {
1085
- // Remove callback implementation
1086
- },
1087
- closeWebSocket: async (connectionId, code, reason) => {
1088
- // Close connection implementation
1129
+ connectWebSocket: async (url) => {
1130
+ // WebSocket implementation
1131
+ return {
1132
+ connectionId: 'ws-1',
1133
+ url,
1134
+ status: 'connected'
1089
1135
  }
1136
+ },
1137
+ sendWebSocketMessage: async (connectionId, data) => {
1138
+ // Send message implementation
1139
+ },
1140
+ onWebSocketMessage: async (connectionId, callback) => {
1141
+ // Register callback implementation
1142
+ },
1143
+ offWebSocketMessage: async (connectionId, callback) => {
1144
+ // Remove callback implementation
1145
+ },
1146
+ closeWebSocket: async (connectionId, code, reason) => {
1147
+ // Close connection implementation
1148
+ }
1090
1149
  })
1091
1150
 
1092
1151
  // Now mySenseAug has both sense augmentation methods and WebSocket methods
@@ -1124,13 +1183,13 @@ everywhere.
1124
1183
  Brainy automatically detects the environment it's running in:
1125
1184
 
1126
1185
  ```typescript
1127
- import {environment} from '@soulcraft/brainy'
1186
+ import { environment } from '@soulcraft/brainy'
1128
1187
 
1129
1188
  // Check which environment we're running in
1130
1189
  console.log(`Running in ${
1131
- environment.isBrowser ? 'browser' :
1132
- environment.isNode ? 'Node.js' :
1133
- 'serverless/unknown'
1190
+ environment.isBrowser ? 'browser' :
1191
+ environment.isNode ? 'Node.js' :
1192
+ 'serverless/unknown'
1134
1193
  } environment`)
1135
1194
  ```
1136
1195
 
@@ -1203,9 +1262,9 @@ You can use the conduit augmentations to sync Brainy instances:
1203
1262
 
1204
1263
  ```typescript
1205
1264
  import {
1206
- BrainyData,
1207
- pipeline,
1208
- createConduitAugmentation
1265
+ BrainyData,
1266
+ pipeline,
1267
+ createConduitAugmentation
1209
1268
  } from '@soulcraft/brainy'
1210
1269
 
1211
1270
  // Create and initialize the database
@@ -1221,36 +1280,36 @@ pipeline.register(wsConduit)
1221
1280
  // Connect to another Brainy instance (server or browser)
1222
1281
  // Replace the example URL below with your actual WebSocket server URL
1223
1282
  const connectionResult = await pipeline.executeConduitPipeline(
1224
- 'establishConnection',
1225
- ['wss://example-websocket-server.com/brainy-sync', {protocols: 'brainy-sync'}]
1283
+ 'establishConnection',
1284
+ ['wss://example-websocket-server.com/brainy-sync', { protocols: 'brainy-sync' }]
1226
1285
  )
1227
1286
 
1228
1287
  if (connectionResult[0] && (await connectionResult[0]).success) {
1229
- const connection = (await connectionResult[0]).data
1230
-
1231
- // Read data from the remote instance
1232
- const readResult = await pipeline.executeConduitPipeline(
1233
- 'readData',
1234
- [{connectionId: connection.connectionId, query: {type: 'getAllNouns'}}]
1235
- )
1236
-
1237
- // Process and add the received data to the local instance
1238
- if (readResult[0] && (await readResult[0]).success) {
1239
- const remoteNouns = (await readResult[0]).data
1240
- for (const noun of remoteNouns) {
1241
- await db.add(noun.vector, noun.metadata)
1242
- }
1288
+ const connection = (await connectionResult[0]).data
1289
+
1290
+ // Read data from the remote instance
1291
+ const readResult = await pipeline.executeConduitPipeline(
1292
+ 'readData',
1293
+ [{ connectionId: connection.connectionId, query: { type: 'getAllNouns' } }]
1294
+ )
1295
+
1296
+ // Process and add the received data to the local instance
1297
+ if (readResult[0] && (await readResult[0]).success) {
1298
+ const remoteNouns = (await readResult[0]).data
1299
+ for (const noun of remoteNouns) {
1300
+ await db.add(noun.vector, noun.metadata)
1243
1301
  }
1244
-
1245
- // Set up real-time sync by monitoring the stream
1246
- await wsConduit.monitorStream(connection.connectionId, async (data) => {
1247
- // Handle incoming data (e.g., new nouns, verbs, updates)
1248
- if (data.type === 'newNoun') {
1249
- await db.add(data.vector, data.metadata)
1250
- } else if (data.type === 'newVerb') {
1251
- await db.addVerb(data.sourceId, data.targetId, data.vector, data.options)
1252
- }
1253
- })
1302
+ }
1303
+
1304
+ // Set up real-time sync by monitoring the stream
1305
+ await wsConduit.monitorStream(connection.connectionId, async (data) => {
1306
+ // Handle incoming data (e.g., new nouns, verbs, updates)
1307
+ if (data.type === 'newNoun') {
1308
+ await db.add(data.vector, data.metadata)
1309
+ } else if (data.type === 'newVerb') {
1310
+ await db.addVerb(data.sourceId, data.targetId, data.vector, data.options)
1311
+ }
1312
+ })
1254
1313
  }
1255
1314
  ```
1256
1315
 
@@ -1258,9 +1317,9 @@ if (connectionResult[0] && (await connectionResult[0]).success) {
1258
1317
 
1259
1318
  ```typescript
1260
1319
  import {
1261
- BrainyData,
1262
- pipeline,
1263
- createConduitAugmentation
1320
+ BrainyData,
1321
+ pipeline,
1322
+ createConduitAugmentation
1264
1323
  } from '@soulcraft/brainy'
1265
1324
 
1266
1325
  // Create and initialize the database
@@ -1276,48 +1335,48 @@ pipeline.register(webrtcConduit)
1276
1335
  // Connect to a peer using a signaling server
1277
1336
  // Replace the example values below with your actual configuration
1278
1337
  const connectionResult = await pipeline.executeConduitPipeline(
1279
- 'establishConnection',
1280
- [
1281
- 'peer-id-to-connect-to', // Replace with actual peer ID
1282
- {
1283
- signalServerUrl: 'wss://example-signal-server.com', // Replace with your signal server
1284
- localPeerId: 'my-local-peer-id', // Replace with your local peer ID
1285
- iceServers: [{urls: 'stun:stun.l.google.com:19302'}] // Public STUN server
1286
- }
1287
- ]
1338
+ 'establishConnection',
1339
+ [
1340
+ 'peer-id-to-connect-to', // Replace with actual peer ID
1341
+ {
1342
+ signalServerUrl: 'wss://example-signal-server.com', // Replace with your signal server
1343
+ localPeerId: 'my-local-peer-id', // Replace with your local peer ID
1344
+ iceServers: [{ urls: 'stun:stun.l.google.com:19302' }] // Public STUN server
1345
+ }
1346
+ ]
1288
1347
  )
1289
1348
 
1290
1349
  if (connectionResult[0] && (await connectionResult[0]).success) {
1291
- const connection = (await connectionResult[0]).data
1292
-
1293
- // Set up real-time sync by monitoring the stream
1294
- await webrtcConduit.monitorStream(connection.connectionId, async (data) => {
1295
- // Handle incoming data (e.g., new nouns, verbs, updates)
1296
- if (data.type === 'newNoun') {
1297
- await db.add(data.vector, data.metadata)
1298
- } else if (data.type === 'newVerb') {
1299
- await db.addVerb(data.sourceId, data.targetId, data.vector, data.options)
1350
+ const connection = (await connectionResult[0]).data
1351
+
1352
+ // Set up real-time sync by monitoring the stream
1353
+ await webrtcConduit.monitorStream(connection.connectionId, async (data) => {
1354
+ // Handle incoming data (e.g., new nouns, verbs, updates)
1355
+ if (data.type === 'newNoun') {
1356
+ await db.add(data.vector, data.metadata)
1357
+ } else if (data.type === 'newVerb') {
1358
+ await db.addVerb(data.sourceId, data.targetId, data.vector, data.options)
1359
+ }
1360
+ })
1361
+
1362
+ // When adding new data locally, also send to the peer
1363
+ const nounId = await db.add("New data to sync", { noun: "Thing" })
1364
+
1365
+ // Send the new noun to the peer
1366
+ await pipeline.executeConduitPipeline(
1367
+ 'writeData',
1368
+ [
1369
+ {
1370
+ connectionId: connection.connectionId,
1371
+ data: {
1372
+ type: 'newNoun',
1373
+ id: nounId,
1374
+ vector: (await db.get(nounId)).vector,
1375
+ metadata: (await db.get(nounId)).metadata
1300
1376
  }
1301
- })
1302
-
1303
- // When adding new data locally, also send to the peer
1304
- const nounId = await db.add("New data to sync", {noun: "Thing"})
1305
-
1306
- // Send the new noun to the peer
1307
- await pipeline.executeConduitPipeline(
1308
- 'writeData',
1309
- [
1310
- {
1311
- connectionId: connection.connectionId,
1312
- data: {
1313
- type: 'newNoun',
1314
- id: nounId,
1315
- vector: (await db.get(nounId)).vector,
1316
- metadata: (await db.get(nounId)).metadata
1317
- }
1318
- }
1319
- ]
1320
- )
1377
+ }
1378
+ ]
1379
+ )
1321
1380
  }
1322
1381
  ```
1323
1382
 
@@ -1327,39 +1386,39 @@ Brainy supports searching a server-hosted instance from a browser, storing resul
1327
1386
  searches against the local instance:
1328
1387
 
1329
1388
  ```typescript
1330
- import {BrainyData} from '@soulcraft/brainy'
1389
+ import { BrainyData } from '@soulcraft/brainy'
1331
1390
 
1332
1391
  // Create and initialize the database with remote server configuration
1333
1392
  // Replace the example URL below with your actual Brainy server URL
1334
1393
  const db = new BrainyData({
1335
- remoteServer: {
1336
- url: 'wss://example-brainy-server.com/ws', // Replace with your server URL
1337
- protocols: 'brainy-sync',
1338
- autoConnect: true // Connect automatically during initialization
1339
- }
1394
+ remoteServer: {
1395
+ url: 'wss://example-brainy-server.com/ws', // Replace with your server URL
1396
+ protocols: 'brainy-sync',
1397
+ autoConnect: true // Connect automatically during initialization
1398
+ }
1340
1399
  })
1341
1400
  await db.init()
1342
1401
 
1343
1402
  // Or connect manually after initialization
1344
1403
  if (!db.isConnectedToRemoteServer()) {
1345
- // Replace the example URL below with your actual Brainy server URL
1346
- await db.connectToRemoteServer('wss://example-brainy-server.com/ws', 'brainy-sync')
1404
+ // Replace the example URL below with your actual Brainy server URL
1405
+ await db.connectToRemoteServer('wss://example-brainy-server.com/ws', 'brainy-sync')
1347
1406
  }
1348
1407
 
1349
1408
  // Search the remote server (results are stored locally)
1350
- const remoteResults = await db.searchText('machine learning', 5, {searchMode: 'remote'})
1409
+ const remoteResults = await db.searchText('machine learning', 5, { searchMode: 'remote' })
1351
1410
 
1352
1411
  // Search the local database (includes previously stored results)
1353
- const localResults = await db.searchText('machine learning', 5, {searchMode: 'local'})
1412
+ const localResults = await db.searchText('machine learning', 5, { searchMode: 'local' })
1354
1413
 
1355
1414
  // Perform a combined search (local first, then remote if needed)
1356
- const combinedResults = await db.searchText('neural networks', 5, {searchMode: 'combined'})
1415
+ const combinedResults = await db.searchText('neural networks', 5, { searchMode: 'combined' })
1357
1416
 
1358
1417
  // Add data to both local and remote instances
1359
1418
  const id = await db.addToBoth('Deep learning is a subset of machine learning', {
1360
- noun: 'Concept',
1361
- category: 'AI',
1362
- tags: ['deep learning', 'neural networks']
1419
+ noun: 'Concept',
1420
+ category: 'AI',
1421
+ tags: ['deep learning', 'neural networks']
1363
1422
  })
1364
1423
 
1365
1424
  // Clean up when done (this also cleans up worker pools)
@@ -1377,7 +1436,9 @@ terabyte-scale data that can't fit entirely in memory, we provide several approa
1377
1436
  - **Distributed HNSW**: Sharding and partitioning across multiple machines
1378
1437
  - **Hybrid Solutions**: Combining quantization techniques with multi-tier architectures
1379
1438
 
1380
- For detailed information on how to scale Brainy for large datasets, vector dimension standardization, threading implementation, storage testing, and other technical topics, see our comprehensive [Technical Guides](TECHNICAL_GUIDES.md).
1439
+ For detailed information on how to scale Brainy for large datasets, vector dimension standardization, threading
1440
+ implementation, storage testing, and other technical topics, see our
1441
+ comprehensive [Technical Guides](TECHNICAL_GUIDES.md).
1381
1442
 
1382
1443
  ## Recent Changes and Performance Improvements
1383
1444
 
@@ -1385,24 +1446,29 @@ For detailed information on how to scale Brainy for large datasets, vector dimen
1385
1446
 
1386
1447
  Brainy has been significantly improved to handle larger datasets more efficiently:
1387
1448
 
1388
- - **Pagination Support**: All data retrieval methods now support pagination to avoid loading entire datasets into memory at once. The deprecated `getAllNouns()` and `getAllVerbs()` methods have been replaced with `getNouns()` and `getVerbs()` methods that support pagination, filtering, and cursor-based navigation.
1449
+ - **Pagination Support**: All data retrieval methods now support pagination to avoid loading entire datasets into memory
1450
+ at once. The deprecated `getAllNouns()` and `getAllVerbs()` methods have been replaced with `getNouns()` and
1451
+ `getVerbs()` methods that support pagination, filtering, and cursor-based navigation.
1389
1452
 
1390
1453
  - **Multi-level Caching**: A sophisticated three-level caching strategy has been implemented:
1391
- - **Level 1**: Hot cache (most accessed nodes) - RAM (automatically detecting and adjusting in each environment)
1392
- - **Level 2**: Warm cache (recent nodes) - OPFS, Filesystem or S3 depending on environment
1393
- - **Level 3**: Cold storage (all nodes) - OPFS, Filesystem or S3 depending on environment
1454
+ - **Level 1**: Hot cache (most accessed nodes) - RAM (automatically detecting and adjusting in each environment)
1455
+ - **Level 2**: Warm cache (recent nodes) - OPFS, Filesystem or S3 depending on environment
1456
+ - **Level 3**: Cold storage (all nodes) - OPFS, Filesystem or S3 depending on environment
1394
1457
 
1395
1458
  - **Adaptive Memory Usage**: The system automatically detects available memory and adjusts cache sizes accordingly:
1396
- - In Node.js: Uses 10% of free memory (minimum 1000 entries)
1397
- - In browsers: Scales based on device memory (500 entries per GB, minimum 1000)
1459
+ - In Node.js: Uses 10% of free memory (minimum 1000 entries)
1460
+ - In browsers: Scales based on device memory (500 entries per GB, minimum 1000)
1398
1461
 
1399
- - **Intelligent Cache Eviction**: Implements a Least Recently Used (LRU) policy that evicts the oldest 20% of items when the cache reaches the configured threshold.
1462
+ - **Intelligent Cache Eviction**: Implements a Least Recently Used (LRU) policy that evicts the oldest 20% of items when
1463
+ the cache reaches the configured threshold.
1400
1464
 
1401
- - **Prefetching Strategy**: Implements batch prefetching to improve performance while avoiding overwhelming system resources.
1465
+ - **Prefetching Strategy**: Implements batch prefetching to improve performance while avoiding overwhelming system
1466
+ resources.
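As a concrete illustration of the pagination point above, the sketch below iterates the full noun set in pages instead of loading everything at once. Only the existence of `getNouns()`/`getVerbs()` and their support for pagination, filtering, and cursors is stated here; the specific option and result field names (`limit`, `cursor`, `items`, `nextCursor`) are assumptions for illustration.

```typescript
// Hypothetical option/result names (limit, cursor, items, nextCursor); see note above.
let cursor: string | undefined

do {
  const page = await db.getNouns({ limit: 500, cursor })

  for (const noun of page.items) {
    // Process each noun here without materializing the full dataset in memory
  }

  cursor = page.nextCursor
} while (cursor)
```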
1402
1467
 
1403
1468
  ### S3-Compatible Storage Improvements
1404
1469
 
1405
- - **Enhanced Cloud Storage**: Improved support for S3-compatible storage services including AWS S3, Cloudflare R2, and others.
1470
+ - **Enhanced Cloud Storage**: Improved support for S3-compatible storage services including AWS S3, Cloudflare R2, and
1471
+ others.
1406
1472
 
1407
1473
  - **Optimized Data Access**: Batch operations and error handling for efficient cloud storage access.
1408
1474
 
@@ -1412,9 +1478,11 @@ Brainy has been significantly improved to handle larger datasets more efficientl
1412
1478
 
1413
1479
  Yes, you can use existing data indexed from an old version. Brainy includes robust data migration capabilities:
1414
1480
 
1415
- - **Vector Regeneration**: If vectors are missing in imported data, they will be automatically created using the embedding function.
1481
+ - **Vector Regeneration**: If vectors are missing in imported data, they will be automatically created using the
1482
+ embedding function.
1416
1483
 
1417
- - **HNSW Index Reconstruction**: The system can reconstruct the HNSW index from backup data, ensuring compatibility with previous versions.
1484
+ - **HNSW Index Reconstruction**: The system can reconstruct the HNSW index from backup data, ensuring compatibility with
1485
+ previous versions.
1418
1486
 
1419
1487
  - **Sparse Data Import**: Support for importing sparse data (without vectors) through the `importSparseData()` method.
1420
1488
 
@@ -1422,66 +1490,138 @@ Yes, you can use existing data indexed from an old version. Brainy includes robu
1422
1490
 
1423
1491
  #### Default Mode
1424
1492
 
1425
- - **Memory**:
1426
- - Minimum: 512MB RAM
1427
- - Recommended: 2GB+ RAM for medium datasets, 8GB+ for large datasets
1428
-
1429
- - **CPU**:
1430
- - Minimum: 2 cores
1431
- - Recommended: 4+ cores for better performance with parallel operations
1493
+ - **Memory**:
1494
+ - Minimum: 512MB RAM
1495
+ - Recommended: 2GB+ RAM for medium datasets, 8GB+ for large datasets
1496
+
1497
+ - **CPU**:
1498
+ - Minimum: 2 cores
1499
+ - Recommended: 4+ cores for better performance with parallel operations
1432
1500
 
1433
1501
  - **Storage**:
1434
- - Minimum: 1GB available storage
1435
- - Recommended: Storage space at least 3x the size of your dataset
1502
+ - Minimum: 1GB available storage
1503
+ - Recommended: Storage space at least 3x the size of your dataset
1436
1504
 
1437
1505
  #### Read-Only Mode
1438
1506
 
1439
1507
  Read-only mode prevents all write operations (add, update, delete) and is optimized for search operations.
1440
1508
 
1441
- - **Memory**:
1442
- - Minimum: 256MB RAM
1443
- - Recommended: 1GB+ RAM
1444
-
1445
- - **CPU**:
1446
- - Minimum: 1 core
1447
- - Recommended: 2+ cores
1509
+ - **Memory**:
1510
+ - Minimum: 256MB RAM
1511
+ - Recommended: 1GB+ RAM
1512
+
1513
+ - **CPU**:
1514
+ - Minimum: 1 core
1515
+ - Recommended: 2+ cores
1448
1516
 
1449
1517
  - **Storage**:
1450
- - Minimum: Storage space equal to the size of your dataset
1451
- - Recommended: 2x the size of your dataset for caching
1518
+ - Minimum: Storage space equal to the size of your dataset
1519
+ - Recommended: 2x the size of your dataset for caching
1452
1520
 
1453
1521
  - **New Feature**: Lazy loading support in read-only mode for improved performance with large datasets.
1454
1522
 
1455
1523
  #### Write-Only Mode
1456
1524
 
1457
- Write-only mode prevents all search operations and is optimized for initial data loading or when you want to optimize for write performance.
1525
+ Write-only mode prevents all search operations and is optimized for initial data loading or when you want to optimize
1526
+ for write performance.
1458
1527
 
1459
- - **Memory**:
1460
- - Minimum: 512MB RAM
1461
- - Recommended: 2GB+ RAM
1462
-
1463
- - **CPU**:
1464
- - Minimum: 2 cores
1465
- - Recommended: 4+ cores for faster data ingestion
1528
+ - **Memory**:
1529
+ - Minimum: 512MB RAM
1530
+ - Recommended: 2GB+ RAM
1531
+
1532
+ - **CPU**:
1533
+ - Minimum: 2 cores
1534
+ - Recommended: 4+ cores for faster data ingestion
1466
1535
 
1467
1536
  - **Storage**:
1468
- - Minimum: Storage space at least 2x the size of your dataset
1469
- - Recommended: 4x the size of your dataset for optimal performance
1537
+ - Minimum: Storage space at least 2x the size of your dataset
1538
+ - Recommended: 4x the size of your dataset for optimal performance
1470
1539
 
1471
1540
  ### Performance Tuning Parameters
1472
1541
 
1473
- Brainy offers several configuration options for performance tuning:
1542
+ Brainy offers comprehensive configuration options for performance tuning, with enhanced support for large datasets in S3
1543
+ or other remote storage. **All configuration is optional** - the system automatically detects the optimal settings based
1544
+ on your environment, dataset size, and usage patterns.
1545
+
1546
+ #### Intelligent Defaults
1547
+
1548
+ Brainy uses intelligent defaults that automatically adapt to your environment:
1549
+
1550
+ - **Environment Detection**: Automatically detects whether you're running in Node.js, browser, or worker environment
1551
+ - **Memory-Aware Caching**: Adjusts cache sizes based on available system memory
1552
+ - **Dataset Size Adaptation**: Tunes parameters based on the size of your dataset
1553
+ - **Usage Pattern Optimization**: Adjusts to read-heavy vs. write-heavy workloads
1554
+ - **Storage Type Awareness**: Optimizes for local vs. remote storage (S3, R2, etc.)
1555
+ - **Operating Mode Specialization**: Special optimizations for read-only and write-only modes
1556
+
1557
+ #### Cache Configuration (Optional)
1558
+
1559
+ You can override any of these automatically tuned parameters if needed:
1474
1560
 
1475
1561
  - **Hot Cache Size**: Control the maximum number of items to keep in memory.
1562
+ - For large datasets (>100K items), consider values between 5,000-50,000 depending on available memory.
1563
+ - In read-only mode, larger values (10,000-100,000) can be used for better performance.
1564
+
1476
1565
  - **Eviction Threshold**: Set the threshold at which cache eviction begins (default: 0.8 or 80% of max size).
1477
- - **Warm Cache TTL**: Set the time-to-live for items in the warm cache (default: 24 hours).
1478
- - **Batch Size**: Control the number of items to process in a single batch for operations like prefetching (default: 10).
1566
+ - For write-heavy workloads, lower values (0.6-0.7) may improve performance.
1567
+ - For read-heavy workloads, higher values (0.8-0.9) are recommended.
1568
+
1569
+ - **Warm Cache TTL**: Set the time-to-live for items in the warm cache (default: 3600000 ms or 1 hour).
1570
+ - For frequently changing data, shorter TTLs are recommended.
1571
+ - For relatively static data, longer TTLs improve performance.
1572
+
1573
+ - **Batch Size**: Control the number of items to process in a single batch for operations like prefetching.
1574
+ - For S3 or remote storage with large datasets, larger values (50-200) significantly improve throughput.
1575
+ - In read-only mode with remote storage, even larger values (100-300) can be used.
1576
+
1577
+ #### Auto-Tuning (Enabled by Default)
1578
+
1579
+ - **Auto-Tune**: Enable or disable automatic tuning of cache parameters based on usage patterns (default: true).
1580
+ - **Auto-Tune Interval**: Set how frequently the system adjusts cache parameters (default: 60000 ms or 1 minute).
1581
+
1582
+ #### Read-Only Mode Optimizations (Automatic)
1583
+
1584
+ Read-only mode includes special optimizations for search performance that are automatically applied:
1479
1585
 
1480
- These improvements make Brainy more efficient, scalable, and adaptable to different environments and usage patterns.
1586
+ - **Larger Cache Sizes**: Automatically uses more memory for caching (up to 40% of free memory for large datasets).
1587
+ - **Aggressive Prefetching**: Loads more data in each batch to reduce the number of storage requests.
1588
+ - **Prefetch Strategy**: Defaults to 'aggressive' prefetching strategy in read-only mode.
1589
+
1590
+ #### Example Configuration for Large S3 Datasets
1591
+
1592
+ ```javascript
1593
+ const brainy = new BrainyData({
1594
+ readOnly: true,
1595
+ lazyLoadInReadOnlyMode: true,
1596
+ storage: {
1597
+ type: 's3',
1598
+ s3Storage: {
1599
+ bucketName: 'your-bucket',
1600
+ accessKeyId: 'your-access-key',
1601
+ secretAccessKey: 'your-secret-key',
1602
+ region: 'your-region'
1603
+ }
1604
+ },
1605
+ cache: {
1606
+ hotCacheMaxSize: 20000,
1607
+ hotCacheEvictionThreshold: 0.85,
1608
+ batchSize: 100,
1609
+ readOnlyMode: {
1610
+ hotCacheMaxSize: 50000,
1611
+ batchSize: 200,
1612
+ prefetchStrategy: 'aggressive'
1613
+ }
1614
+ }
1615
+ });
1616
+ ```
1617
+
1618
+ These configuration options make Brainy more efficient, scalable, and adaptable to different environments and usage
1619
+ patterns, especially for large datasets in cloud storage.
1481
1620
 
1482
1621
  ## Testing
1483
1622
 
1484
- Brainy uses Vitest for testing. For detailed information about testing in Brainy, including test configuration, scripts, reporting tools, and best practices, see our [Testing Guide](TESTING.md).
1623
+ Brainy uses Vitest for testing. For detailed information about testing in Brainy, including test configuration, scripts,
1624
+ reporting tools, and best practices, see our [Testing Guide](docs/technical/TESTING.md).
1485
1625
 
1486
1626
  Here are some common test commands:
1487
1627
 
@@ -1505,45 +1645,18 @@ see [DEVELOPERS.md](DEVELOPERS.md).
1505
1645
 
1506
1646
  We have a [Code of Conduct](CODE_OF_CONDUCT.md) that all contributors are expected to follow.
1507
1647
 
1508
- ## Release Workflow
1509
-
1510
- Brainy uses a streamlined release workflow that automates version updates, changelog generation, GitHub releases, and NPM deployment.
1511
-
1512
- ### Automated Release Process
1513
-
1514
- The release workflow combines several steps into a single command:
1515
-
1516
- 1. **Build the project** - Ensures the code compiles correctly
1517
- 2. **Run tests** - Verifies that all tests pass
1518
- 3. **Update version** - Bumps the version number (patch, minor, or major)
1519
- 4. **Generate changelog** - Automatically updates CHANGELOG.md with commit messages since the last release
1520
- 5. **Create GitHub release** - Creates a GitHub release with auto-generated notes
1521
- 6. **Publish to NPM** - Deploys the package to NPM
1522
-
1523
- ### Release Commands
1524
-
1525
- Use one of the following commands to release a new version:
1526
-
1527
- ```bash
1528
- # Release with patch version update (0.0.x)
1529
- npm run workflow:patch
1530
-
1531
- # Release with minor version update (0.x.0)
1532
- npm run workflow:minor
1533
-
1534
- # Release with major version update (x.0.0)
1535
- npm run workflow:major
1648
+ ### Commit Message Format
1536
1649
 
1537
- # Default workflow (same as patch)
1538
- npm run workflow
1650
+ For best results with automatic changelog generation, follow
1651
+ the [Conventional Commits](https://www.conventionalcommits.org/) specification for your commit messages:
1539
1652
 
1540
- # Dry run (build, test, and simulate version update without making changes)
1541
- npm run workflow:dry-run
1542
1653
  ```
1654
+ AI Template for automated commit messages:
1543
1655
 
1544
- ### Commit Message Format
1545
-
1546
- For best results with automatic changelog generation, follow the [Conventional Commits](https://www.conventionalcommits.org/) specification for your commit messages:
1656
+ Use Conventional Commit format
1657
+ Specify the changes in a structured format
1658
+ Add information about the purpose of the commit
1659
+ ```
1547
1660
 
1548
1661
  ```
1549
1662
  <type>(<scope>): <description>
@@ -1554,6 +1667,7 @@ For best results with automatic changelog generation, follow the [Conventional C
1554
1667
  ```
1555
1668
 
1556
1669
  Where `<type>` is one of:
1670
+
1557
1671
  - `feat`: A new feature (maps to **Added** section)
1558
1672
  - `fix`: A bug fix (maps to **Fixed** section)
1559
1673
  - `chore`: Regular maintenance tasks (maps to **Changed** section)
@@ -1567,10 +1681,10 @@ If you need more control over the release process, you can use the individual co
1567
1681
 
1568
1682
  ```bash
1569
1683
  # Update version and generate changelog
1570
- npm run release:patch # or release:minor, release:major
1684
+ npm run _release:patch # or _release:minor, _release:major
1571
1685
 
1572
1686
  # Create GitHub release
1573
- npm run github-release
1687
+ npm run _github-release
1574
1688
 
1575
1689
  # Publish to NPM
1576
1690
  npm publish