@soulcraft/brainy 3.32.1 → 3.34.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/CHANGELOG.md CHANGED
@@ -2,6 +2,218 @@
 
  All notable changes to this project will be documented in this file. See [standard-version](https://github.com/conventional-changelog/standard-version) for commit guidelines.
 
+ ### [3.34.0](https://github.com/soulcraftlabs/brainy/compare/v3.33.0...v3.34.0) (2025-10-09)
+
+ - test: adjust type-matching tests for real embeddings (v3.33.0) (1c5c77e)
+ - perf: pre-compute type embeddings at build time (zero runtime cost) (0d649b8)
+ - perf: optimize concept extraction for production (15x faster) (87eb60d)
+ - perf: implement smart count batching for 10x faster bulk operations (e52bcaf)
+
+
+ ## [3.33.0](https://github.com/soulcraftlabs/brainy/compare/v3.32.5...v3.33.0) (2025-10-09)
+
+ ### 🚀 Performance - Build-Time Type Embeddings (Zero Runtime Cost)
+
+ **Production Optimization: All type embeddings are now pre-computed at build time**
+
+ #### Problem
+ Type embeddings for 31 NounTypes + 40 VerbTypes were computed at runtime in 3 different places:
+ - `NeuralEntityExtractor` computed noun type embeddings on first use
+ - `BrainyTypes` computed all 31+40 type embeddings on init
+ - `NaturalLanguageProcessor` computed all 31+40 type embeddings on init
+ - **Result**: Every process restart = ~70+ embedding operations = 5-10 second initialization delay
+
+ #### Solution
+ Pre-computed type embeddings at build time (similar to pattern embeddings):
+ - Created `scripts/buildTypeEmbeddings.ts` - generates embeddings for all types once during build
+ - Created `src/neural/embeddedTypeEmbeddings.ts` - stores pre-computed embeddings as base64 data
+ - All consumers now load the pre-computed embeddings instead of computing them at runtime
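The entries above describe storing the build-time vectors as base64 data, and the generated `embeddedTypeEmbeddings.d.ts` documents each getter as "called once and cached". A minimal sketch of that round trip, assuming each vector is serialized as raw float32 bytes (platform byte order) with one base64 string per type; `encodeVector`, `decodeVector`, `getTypeEmbeddings` and the toy data are hypothetical, not the generated file's actual layout:

```typescript
// Build time: serialize a vector as base64 over its raw float32 bytes
export function encodeVector(vec: number[]): string {
  const floats = Float32Array.from(vec)
  return Buffer.from(floats.buffer, floats.byteOffset, floats.byteLength).toString('base64')
}

// Runtime: decode base64 back into numbers
export function decodeVector(b64: string): number[] {
  const bytes = Buffer.from(b64, 'base64')
  // Copy into a fresh ArrayBuffer so the Float32Array view starts at offset 0
  const aligned = new Uint8Array(bytes).buffer
  return Array.from(new Float32Array(aligned))
}

// Stand-in for the generated data (hypothetical type names, toy 3-d vectors)
const EMBEDDED: Record<string, string> = {
  concept: encodeVector([0.5, -0.25, 1]),
  topic: encodeVector([0, 0.75, -1]),
}

let cache: Map<string, number[]> | null = null

// Decodes on the first call only; later calls return the same Map
export function getTypeEmbeddings(): Map<string, number[]> {
  if (cache === null) {
    cache = new Map()
    for (const [type, b64] of Object.entries(EMBEDDED)) {
      cache.set(type, decodeVector(b64))
    }
  }
  return cache
}
```

In a real build the script would emit the base64 strings as literals, so nothing is encoded at runtime; calling `encodeVector` at module load here only keeps the sketch self-contained.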
+
+ #### Benefits
+ - ✅ **Zero runtime computation** - type embeddings loaded instantly from embedded data
+ - ✅ **Survives all restarts** - embeddings bundled in package, no re-computation needed
+ - ✅ **All 71 types available** - 31 noun + 40 verb types instantly accessible
+ - ✅ **~100KB overhead** - small memory cost for huge performance gain
+ - ✅ **Permanent optimization** - build once, fast forever
+
+ #### Build Process
+ ```bash
+ # Manual rebuild (if types change)
+ npm run build:types:force
+
+ # Automatic check (integrated into build)
+ npm run build # Rebuilds types only if source changed
+ ```
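`npm run build` is described as rebuilding type embeddings only when their source changed, via `scripts/check-type-embeddings.cjs`, whose contents are not shown. A minimal sketch of one plausible check, assuming it compares file modification times; the function name and the example paths are hypothetical:

```typescript
import { existsSync, statSync } from 'node:fs'

// True when the generated file is missing or older than any source it was
// derived from (a simple mtime comparison, not a content hash)
export function needsRebuild(generatedFile: string, sourceFiles: string[]): boolean {
  if (!existsSync(generatedFile)) return true
  const generatedAt = statSync(generatedFile).mtimeMs
  return sourceFiles.some((src) => statSync(src).mtimeMs > generatedAt)
}

// Hypothetical wiring in a build step:
// if (needsRebuild('src/neural/embeddedTypeEmbeddings.ts', ['src/types/graphTypes.ts'])) {
//   /* run the embedding build script */
// }
```

An mtime check is cheap but can trigger spurious rebuilds after a fresh checkout; hashing the type descriptions instead would avoid that at the cost of reading the sources.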
+
+ #### Files Changed
+ - `scripts/buildTypeEmbeddings.ts` - Build script to generate type embeddings
+ - `scripts/check-type-embeddings.cjs` - Check if rebuild needed
+ - `src/neural/embeddedTypeEmbeddings.ts` - Pre-computed embeddings (auto-generated)
+ - `src/neural/entityExtractor.ts` - Uses embedded types (no runtime computation)
+ - `src/augmentations/typeMatching/brainyTypes.ts` - Uses embedded types (instant init)
+ - `src/neural/naturalLanguageProcessor.ts` - Uses embedded types (instant init)
+ - `src/importers/SmartExcelImporter.ts` - Updated comments to reflect zero-cost embeddings
+ - `package.json` - Added type embedding build scripts
+
+ #### Impact
+ - v3.32.5: Type embeddings computed at runtime (2-31 operations per restart)
+ - v3.33.0: Type embeddings loaded instantly (0 operations, pre-computed at build)
+ - **Permanent 100% elimination of type embedding runtime cost**
+
+ ---
+
+ ### [3.32.5](https://github.com/soulcraftlabs/brainy/compare/v3.32.4...v3.32.5) (2025-10-09)
+
+ ### 🚀 Performance - Neural Extraction Optimization (15x Faster)
+
+ **Fixed: Concept extraction now production-ready for large files**
+
+ #### Problem
+ `brain.extractConcepts()` appeared to hang on large Excel/PDF/Markdown files:
+ - Previously initialized ALL 31 NounTypes (31 embedding operations)
+ - For a 100-row Excel file: 3,100+ embedding operations
+ - Caused apparent hangs/timeouts in production
+
+ #### Solution
+ Optimized `NeuralEntityExtractor` to only initialize requested types:
+ - `extractConcepts()` now only initializes Concept + Topic types (2 embeds vs 31)
+ - **15x faster initialization** (31 embeds → 2 embeds)
+ - Re-enabled concept extraction by default in Excel importer
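The fix restricts initialization to the types a call actually needs. A minimal sketch of that lazy, cached initialization, with a hypothetical `LazyTypeEmbeddings` class and an injected `embed` function standing in for the real model:

```typescript
// Any async text-to-vector function; the real extractor uses a transformer model
type EmbedFn = (text: string) => Promise<number[]>

export class LazyTypeEmbeddings {
  private readonly descriptions: Record<string, string>
  private readonly embed: EmbedFn
  private readonly cache = new Map<string, number[]>()

  constructor(descriptions: Record<string, string>, embed: EmbedFn) {
    this.descriptions = descriptions
    this.embed = embed
  }

  // Embeds only the requested types, reusing anything embedded before
  async get(types: string[]): Promise<Map<string, number[]>> {
    const result = new Map<string, number[]>()
    for (const type of types) {
      let vector = this.cache.get(type)
      if (vector === undefined) {
        const description = this.descriptions[type]
        if (description === undefined) throw new Error(`Unknown type: ${type}`)
        vector = await this.embed(description)
        this.cache.set(type, vector)
      }
      result.set(type, vector)
    }
    return result
  }
}
```

Asking for `['concept', 'topic']` costs two embedding calls no matter how many types are defined, which is the 31 → 2 reduction the entry describes.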
+
+ #### Performance Impact
+ - **Small files (<100 rows)**: 5-20 seconds (was: appeared to hang)
+ - **Medium files (100-500 rows)**: 20-100 seconds (was: timeout)
+ - **Large files (500+ rows)**: Concept extraction can be disabled if needed via `enableConceptExtraction: false`
+
+ #### Files Changed
+ - `src/neural/entityExtractor.ts`: Lazy type initialization
+ - `src/importers/SmartExcelImporter.ts`: Re-enabled with optimization notes
+
+ ### 🔧 Diagnostics - GCS Initialization Logging
+
+ **Added: Enhanced logging for GCS bucket scanning**
+
+ Added detailed diagnostic logs to help debug GCS initialization issues:
+ - Shows prefixes being scanned
+ - Displays file counts and sample filenames
+ - Warns if no entities are found
+
+ #### Files Changed
+ - `src/storage/adapters/gcsStorage.ts`: Enhanced `initializeCountsFromScan()` logging
+
+ ---
+
+ ### [3.32.3](https://github.com/soulcraftlabs/brainy/compare/v3.32.2...v3.32.3) (2025-10-09)
+
+ ### ⚡ Performance Optimization - Smart Count Batching for Production Scale
+
+ **Optimized: 10x faster bulk operations with storage-aware count batching**
+
+ #### What Changed
+ v3.32.2 fixed the critical container restart bug by persisting counts on EVERY operation. This made the system reliable but introduced performance overhead for bulk operations (1000 entities = 1000 GCS writes = ~50 seconds).
+
+ v3.32.3 introduces **Smart Count Batching** - a storage-type-aware optimization that maintains v3.32.2's reliability while dramatically improving bulk operation performance.
+
+ #### How It Works
+ - **Cloud storage** (GCS, S3, R2): Batches count persistence (10 operations OR 5 seconds, whichever comes first)
+ - **Local storage** (File System, Memory): Persists immediately (already fast, no benefit from batching)
+ - **Graceful shutdown hooks**: SIGTERM/SIGINT handlers flush pending counts before shutdown
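The entries name `scheduleCountPersist()` and `flushCounts()` but do not show them. A minimal sketch of the described policy (persist after 10 pending operations or 5 seconds, whichever comes first; write through on local storage), using a hypothetical `CountBatcher` class rather than Brainy's actual adapter code:

```typescript
// Writes the current counts to storage (e.g. one GCS/S3 object write)
type PersistFn = () => Promise<void>

export class CountBatcher {
  private pending = 0
  private timer: ReturnType<typeof setTimeout> | null = null
  private readonly persist: PersistFn
  private readonly isCloud: boolean
  private readonly maxOps: number
  private readonly maxDelayMs: number

  constructor(persist: PersistFn, isCloud: boolean, maxOps = 10, maxDelayMs = 5000) {
    this.persist = persist
    this.isCloud = isCloud
    this.maxOps = maxOps
    this.maxDelayMs = maxDelayMs
  }

  // Call after every count change
  async schedule(): Promise<void> {
    if (!this.isCloud) {
      await this.persist() // local storage: write through immediately
      return
    }
    this.pending++
    if (this.pending >= this.maxOps) {
      await this.flush() // operation threshold reached
    } else if (this.timer === null) {
      // First pending change: persist after maxDelayMs at the latest
      this.timer = setTimeout(() => {
        this.flush().catch((err) => console.error('count flush failed', err))
      }, this.maxDelayMs)
    }
  }

  // Persist pending changes now (what a shutdown hook would call)
  async flush(): Promise<void> {
    if (this.timer !== null) {
      clearTimeout(this.timer)
      this.timer = null
    }
    if (this.pending === 0) return
    this.pending = 0
    await this.persist()
  }
}
```

With these defaults, 25 changes in a tight loop on cloud storage cost two threshold writes; the remaining five are persisted by the timer or by an explicit `flush()`.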
+
+ #### Performance Impact
+
+ **API Use Case (1-10 entities):**
+ - Before: 2 entities = 100ms overhead, 10 entities = 500ms overhead
+ - After: 2 entities = 50ms overhead (batched at 5s), 10 entities = 50ms overhead (batched at threshold)
+ - **2-10x faster for small batches**
+
+ **Bulk Import (1000 entities via loop):**
+ - Before (v3.32.2): 1000 entities = 1000 GCS writes = ~50 seconds overhead
+ - After (v3.32.3): 1000 entities = 100 GCS writes = ~5 seconds overhead
+ - **10x faster for bulk operations**
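Both sets of figures are consistent with roughly 50 ms per count write (2 immediate writes ≈ 100 ms) and one write per 10 batched operations in a bulk loop:

$$
\begin{aligned}
\text{v3.32.2:}\quad & 1000 \text{ writes} \times 50\,\text{ms} = 50\,\text{s} \\
\text{v3.32.3:}\quad & \tfrac{1000}{10} = 100 \text{ writes} \times 50\,\text{ms} = 5\,\text{s}
\end{aligned}
$$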
+
+ #### Reliability Guarantees
+ ✅ **Container Restart Scenario:** Same reliability as v3.32.2
+ - Counts persist every 10 operations OR 5 seconds (whichever comes first)
+ - Maximum data loss window: 9 operations OR 5 seconds of data (only on ungraceful crash)
+
+ ✅ **Graceful Shutdown (Cloud Run/Fargate/Lambda):**
+ - SIGTERM/SIGINT handlers flush pending counts immediately
+ - Zero data loss on graceful container shutdown
+
+ ✅ **Production Ready:**
+ - Backward compatible (no breaking changes)
+ - Zero configuration required (automatic based on storage type)
+ - Works transparently for all existing code
+
+ #### Implementation Details
+ - `baseStorageAdapter.ts`: Added smart batching with `scheduleCountPersist()` and `flushCounts()`
+   - New method: `isCloudStorage()` - Detects storage type for adaptive strategy
+   - New method: `scheduleCountPersist()` - Smart batching logic
+   - New method: `flushCounts()` - Immediate flush for shutdown hooks
+   - Modified: 4 count methods to use smart batching instead of immediate persistence
+
+ - `gcsStorage.ts`: Added cloud storage detection
+   - Override `isCloudStorage()` to return `true` (enables batching)
+
+ - `s3CompatibleStorage.ts`: Added cloud storage detection
+   - Override `isCloudStorage()` to return `true` (enables batching)
+
+ - `brainy.ts`: Added graceful shutdown hooks
+   - `registerShutdownHooks()`: Handles SIGTERM, SIGINT, beforeExit
+   - Ensures pending count batches are flushed before container shutdown
+   - Critical for Cloud Run, Fargate, Lambda, and other containerized deployments
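The hook design described above (register process listeners once, then flush every tracked instance) can be sketched independently of Brainy. A minimal version, assuming only that each tracked object exposes a `flushCounts(): Promise<void>` method as the storage adapters do here; `track`, `untrack` and `flushAll` are hypothetical names:

```typescript
// Anything that can persist buffered counts, e.g. a storage adapter
interface Flushable {
  flushCounts(): Promise<void>
}

const tracked = new Set<Flushable>()
let hooksRegistered = false
let inFlight: Promise<void> | null = null

// Flush every tracked object; overlapping callers share one in-flight flush
export function flushAll(): Promise<void> {
  if (inFlight === null) {
    inFlight = (async () => {
      for (const f of tracked) {
        try {
          await f.flushCounts()
        } catch (err) {
          console.error('Failed to flush counts:', err) // keep flushing the others
        }
      }
    })().finally(() => {
      inFlight = null
    })
  }
  return inFlight
}

// Start tracking; process listeners are registered only on the first call
export function track(f: Flushable): void {
  tracked.add(f)
  if (hooksRegistered) return
  hooksRegistered = true
  for (const signal of ['SIGTERM', 'SIGINT'] as const) {
    process.once(signal, () => {
      flushAll().finally(() => process.exit(0))
    })
  }
  // `once`: async work started in a beforeExit listener makes Node emit
  // beforeExit again when that work finishes
  process.once('beforeExit', () => {
    void flushAll()
  })
}

export function untrack(f: Flushable): void {
  tracked.delete(f)
}
```

A `Set` makes repeated `track()` calls harmless and gives closed instances a way out via `untrack()`.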
+
+ #### Migration
+ **No action required!** This is a transparent performance optimization.
+ - ✅ Same public API
+ - ✅ Same reliability guarantees
+ - ✅ Better performance (automatic)
+
+ ---
+
+ ### [3.32.2](https://github.com/soulcraftlabs/brainy/compare/v3.32.1...v3.32.2) (2025-10-09)
+
+ ### 🐛 Critical Bug Fixes - Container Restart Persistence
+
+ **Fixed: brain.find({ where: {...} }) returns empty array after restart**
+ **Fixed: brain.init() returns 0 entities after container restart**
+
+ #### Root Cause
+ Count persistence was optimized to save only every 10 operations. If fewer than 10 entities were added before a container restart, counts were never persisted to storage. After restart: `totalNounCount = 0`, causing empty query results.
+
+ #### Impact
+ Critical for serverless/containerized deployments (Cloud Run, Fargate, Lambda) where containers restart frequently. The basic write→restart→read scenario was broken.
+
+ #### Changes
+ - `baseStorageAdapter.ts`: Persist counts on EVERY operation (not every 10)
+   - `incrementEntityCountSafe()`: Now persists immediately
+   - `decrementEntityCountSafe()`: Now persists immediately
+   - `incrementVerbCount()`: Now persists immediately
+   - `decrementVerbCount()`: Now persists immediately
+
+ - `gcsStorage.ts`: Better error handling for count initialization
+   - `initializeCounts()`: Fail loudly on network/permission errors
+   - `initializeCountsFromScan()`: Throw on scan failures instead of failing silently
+   - Added recovery logic with bucket scan fallback
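The `gcsStorage.ts` entries describe a load order: read persisted counts, fall back to a bucket scan when none exist, and surface real errors instead of silently starting from zero. A minimal sketch of that flow against a hypothetical `CountStore` interface (not the adapter's actual methods):

```typescript
type Counts = { nouns: number; verbs: number }

// Hypothetical storage primitives
interface CountStore {
  readCounts(): Promise<Counts | null> // null when no counts object exists yet
  scanCounts(): Promise<Counts> // count entity/verb objects in the bucket
  writeCounts(counts: Counts): Promise<void>
}

export async function initializeCounts(store: CountStore): Promise<Counts> {
  // Network or permission errors propagate: failing init beats reporting 0 entities
  const persisted = await store.readCounts()
  if (persisted !== null) return persisted

  // No persisted counts: rebuild them from the bucket and save for the next start
  const scanned = await store.scanCounts()
  await store.writeCounts(scanned)
  return scanned
}
```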
+
+ #### Test Scenario (Now Fixed)
+ ```typescript
+ // Service A: Add 2 entities
+ await brain.add({ data: 'Entity 1', metadata: { status: 'active' } })
+ await brain.add({ data: 'Entity 2', metadata: { status: 'active' } })
+
+ // Container restarts (Cloud Run, Fargate, etc.)
+
+ // Service B: Query data
+ const stats = await brain.getStats()
+ console.log(stats.entities.total) // Was: 0 ❌ | Now: 2 ✅
+
+ const results = await brain.find({ where: { status: 'active' }})
+ console.log(results.length) // Was: 0 ❌ | Now: 2 ✅
+ ```
+
+ ---
+
  ## [3.31.0](https://github.com/soulcraftlabs/brainy/compare/v3.30.2...v3.31.0) (2025-10-09)
 
  ### 🐛 Critical Bug Fixes - Production-Scale Import Performance
@@ -24,6 +24,8 @@ export interface TypeMatchResult {
  }
  /**
   * BrainyTypes - Intelligent type detection for nouns and verbs
+  * PRODUCTION OPTIMIZATION (v3.33.0): Uses pre-computed type embeddings
+  * Type embeddings are loaded instantly; only input objects are embedded at runtime
   */
  export declare class BrainyTypes {
      private embedder;
@@ -33,7 +35,9 @@ export declare class BrainyTypes {
      private cache;
      constructor();
      /**
-      * Initialize the type matcher by generating embeddings for all types
+      * Initialize the type matcher by loading pre-computed embeddings
+      * INSTANT - type embeddings are loaded from pre-computed data
+      * Only the model for input embedding needs initialization
       */
      init(): Promise<void>;
      /**
@@ -13,6 +13,7 @@
  import { NounType, VerbType } from '../../types/graphTypes.js';
  import { TransformerEmbedding } from '../../utils/embedding.js';
  import { cosineDistance } from '../../utils/distance.js';
+ import { getNounTypeEmbeddings, getVerbTypeEmbeddings } from '../../neural/embeddedTypeEmbeddings.js';
  /**
   * Type descriptions for semantic matching
   * These descriptions are used to generate embeddings for each type
@@ -109,6 +110,8 @@ const VERB_TYPE_DESCRIPTIONS = {
  };
  /**
   * BrainyTypes - Intelligent type detection for nouns and verbs
+  * PRODUCTION OPTIMIZATION (v3.33.0): Uses pre-computed type embeddings
+  * Type embeddings are loaded instantly; only input objects are embedded at runtime
   */
  export class BrainyTypes {
      constructor() {
@@ -116,23 +119,27 @@ export class BrainyTypes {
          this.verbEmbeddings = new Map();
          this.initialized = false;
          this.cache = new Map();
+         // Embedder only used for input objects, NOT for type embeddings
          this.embedder = new TransformerEmbedding({ verbose: false });
      }
      /**
-      * Initialize the type matcher by generating embeddings for all types
+      * Initialize the type matcher by loading pre-computed embeddings
+      * INSTANT - type embeddings are loaded from pre-computed data
+      * Only the model for input embedding needs initialization
       */
      async init() {
          if (this.initialized)
              return;
+         // Initialize embedder for input objects only
          await this.embedder.init();
-         // Generate embeddings for noun types
-         for (const [type, description] of Object.entries(NOUN_TYPE_DESCRIPTIONS)) {
-             const embedding = await this.embedder.embed(description);
+         // Load pre-computed type embeddings (instant, no computation)
+         const nounEmbeddings = getNounTypeEmbeddings();
+         const verbEmbeddings = getVerbTypeEmbeddings();
+         // Convert NounType/VerbType keys to strings for lookup
+         for (const [type, embedding] of nounEmbeddings.entries()) {
              this.nounEmbeddings.set(type, embedding);
          }
-         // Generate embeddings for verb types
-         for (const [type, description] of Object.entries(VERB_TYPE_DESCRIPTIONS)) {
-             const embedding = await this.embedder.embed(description);
+         for (const [type, embedding] of verbEmbeddings.entries()) {
              this.verbEmbeddings.set(type, embedding);
          }
          this.initialized = true;
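After this change `init()` only loads vectors; the per-call work is embedding the input and comparing it with each preloaded type vector. A minimal sketch of that comparison step, assuming plain cosine similarity (the module imports `cosineDistance`, whose exact definition is not shown here) and a hypothetical `bestType` helper:

```typescript
type Match = { type: string; similarity: number }

// Cosine similarity between two equal-length vectors (0 if either is all zeros)
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0
  let normA = 0
  let normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  if (normA === 0 || normB === 0) return 0
  return dot / (Math.sqrt(normA) * Math.sqrt(normB))
}

// Pick the type whose precomputed embedding is closest to the input embedding
export function bestType(input: number[], typeEmbeddings: Map<string, number[]>): Match | null {
  let best: Match | null = null
  for (const [type, vector] of typeEmbeddings) {
    const similarity = cosineSimilarity(input, vector)
    if (best === null || similarity > best.similarity) best = { type, similarity }
  }
  return best
}
```

Because the type vectors are fixed, only the input needs a model call; the comparison itself is a loop over 31 or 40 dot products.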
package/dist/brainy.d.ts CHANGED
@@ -20,6 +20,8 @@ import { BrainyInterface } from './types/brainyInterface.js';
   * Implements BrainyInterface to ensure consistency across integrations
   */
  export declare class Brainy<T = any> implements BrainyInterface<T> {
+     private static shutdownHooksRegisteredGlobally;
+     private static instances;
      private index;
      private storage;
      private metadataIndex;
@@ -48,6 +50,20 @@ export declare class Brainy<T = any> implements BrainyInterface<T> {
      init(overrides?: Partial<BrainyConfig & {
          dimensions?: number;
      }>): Promise<void>;
+     /**
+      * Register shutdown hooks for graceful count flushing (v3.32.3+)
+      *
+      * Ensures pending count batches are persisted before container shutdown.
+      * Critical for Cloud Run, Fargate, Lambda, and other containerized deployments.
+      *
+      * Handles:
+      * - SIGTERM: Graceful termination (Cloud Run, Fargate, Lambda)
+      * - SIGINT: Ctrl+C (development/local testing)
+      * - beforeExit: Node.js cleanup hook (fallback)
+      *
+      * NOTE: Registers globally (once for all instances) to avoid MaxListenersExceededWarning
+      */
+     private registerShutdownHooks;
      /**
       * Ensure Brainy is initialized
       */
package/dist/brainy.js CHANGED
@@ -42,6 +42,8 @@ export class Brainy {
          if (this.config.distributed?.enabled) {
              this.setupDistributedComponents();
          }
+         // Track this instance for shutdown hooks
+         Brainy.instances.push(this);
          // Index and storage are initialized in init() because they may need each other
      }
      /**
@@ -126,12 +128,63 @@ export class Brainy {
              if (this.config.warmup) {
                  await this.warmup();
              }
+             // Register shutdown hooks for graceful count flushing (once globally)
+             if (!Brainy.shutdownHooksRegisteredGlobally) {
+                 this.registerShutdownHooks();
+                 Brainy.shutdownHooksRegisteredGlobally = true;
+             }
              this.initialized = true;
          }
          catch (error) {
              throw new Error(`Failed to initialize Brainy: ${error}`);
          }
      }
+     /**
+      * Register shutdown hooks for graceful count flushing (v3.32.3+)
+      *
+      * Ensures pending count batches are persisted before container shutdown.
+      * Critical for Cloud Run, Fargate, Lambda, and other containerized deployments.
+      *
+      * Handles:
+      * - SIGTERM: Graceful termination (Cloud Run, Fargate, Lambda)
+      * - SIGINT: Ctrl+C (development/local testing)
+      * - beforeExit: Node.js cleanup hook (fallback)
+      *
+      * NOTE: Registers globally (once for all instances) to avoid MaxListenersExceededWarning
+      */
+     registerShutdownHooks() {
+         const flushOnShutdown = async () => {
+             console.log('⚠️ Shutdown signal received - flushing pending counts...');
+             try {
+                 // Flush counts for all Brainy instances
+                 let flushedCount = 0;
+                 for (const instance of Brainy.instances) {
+                     if (instance.storage && typeof instance.storage.flushCounts === 'function') {
+                         await instance.storage.flushCounts();
+                         flushedCount++;
+                     }
+                 }
+                 if (flushedCount > 0) {
+                     console.log(`✅ Counts flushed successfully (${flushedCount} instance${flushedCount > 1 ? 's' : ''})`);
+                 }
+             }
+             catch (error) {
+                 console.error('❌ Failed to flush counts on shutdown:', error);
+             }
+         };
+         // Graceful shutdown signals (registered once globally)
+         process.on('SIGTERM', async () => {
+             await flushOnShutdown();
+             process.exit(0);
+         });
+         process.on('SIGINT', async () => {
+             await flushOnShutdown();
+             process.exit(0);
+         });
+         process.on('beforeExit', async () => {
+             await flushOnShutdown();
+         });
+     }
      /**
       * Ensure Brainy is initialized
       */
@@ -2518,6 +2571,9 @@ export class Brainy {
          }
      }
  }
+ // Static shutdown hook tracking (global, not per-instance)
+ Brainy.shutdownHooksRegisteredGlobally = false;
+ Brainy.instances = [];
  // Re-export types for convenience
  export * from './types/brainy.types.js';
  export { NounType, VerbType } from './types/graphTypes.js';
@@ -37,6 +37,18 @@ export class SmartExcelImporter {
          const opts = {
              enableNeuralExtraction: true,
              enableRelationshipInference: true,
+             // CONCEPT EXTRACTION PRODUCTION-READY (v3.33.0+):
+             // Type embeddings are now pre-computed at build time - zero runtime cost!
+             // All 31 noun types + 40 verb types instantly available
+             //
+             // Performance profile:
+             // - Type embeddings: INSTANT (pre-computed at build time, ~100KB in-memory)
+             // - Model loading: ~2-5 seconds (one-time, cached after first use)
+             // - Per-row extraction: ~50-200ms depending on definition length
+             // - 100 rows: ~5-20 seconds total (production ready)
+             // - 1000 rows: ~50-200 seconds (disable if needed via enableConceptExtraction: false)
+             //
+             // Enabled by default for production use.
              enableConceptExtraction: true,
              confidenceThreshold: 0.6,
              termColumn: 'term|name|title|concept',
@@ -0,0 +1,34 @@
+ /**
+  * 🧠 BRAINY EMBEDDED TYPE EMBEDDINGS
+  *
+  * AUTO-GENERATED - DO NOT EDIT
+  * Generated: 2025-10-10T01:27:22.642Z
+  * Noun Types: 31
+  * Verb Types: 40
+  *
+  * This file contains pre-computed embeddings for all NounTypes and VerbTypes.
+  * No runtime computation needed, instant availability!
+  */
+ import { NounType, VerbType } from '../types/graphTypes.js';
+ import { Vector } from '../coreTypes.js';
+ export declare const TYPE_METADATA: {
+     nounTypes: number;
+     verbTypes: number;
+     totalTypes: number;
+     embeddingDimensions: number;
+     generatedAt: string;
+     sizeBytes: {
+         embeddings: number;
+         base64: number;
+     };
+ };
+ /**
+  * Get noun type embeddings as a Map for fast lookup
+  * This is called once and cached
+  */
+ export declare function getNounTypeEmbeddings(): Map<NounType, Vector>;
+ /**
+  * Get verb type embeddings as a Map for fast lookup
+  * This is called once and cached
+  */
+ export declare function getVerbTypeEmbeddings(): Map<VerbType, Vector>;