@soulcraft/brainy 4.1.3 → 4.2.0
- package/CHANGELOG.md +100 -7
- package/dist/brainy.d.ts +74 -16
- package/dist/brainy.js +74 -16
- package/dist/import/FormatDetector.d.ts +6 -1
- package/dist/import/FormatDetector.js +40 -1
- package/dist/import/ImportCoordinator.d.ts +155 -5
- package/dist/import/ImportCoordinator.js +346 -6
- package/dist/import/InstancePool.d.ts +136 -0
- package/dist/import/InstancePool.js +231 -0
- package/dist/importers/SmartCSVImporter.d.ts +2 -1
- package/dist/importers/SmartCSVImporter.js +11 -22
- package/dist/importers/SmartDOCXImporter.d.ts +125 -0
- package/dist/importers/SmartDOCXImporter.js +227 -0
- package/dist/importers/SmartExcelImporter.d.ts +12 -1
- package/dist/importers/SmartExcelImporter.js +40 -25
- package/dist/importers/SmartJSONImporter.d.ts +1 -0
- package/dist/importers/SmartJSONImporter.js +25 -6
- package/dist/importers/SmartMarkdownImporter.d.ts +2 -1
- package/dist/importers/SmartMarkdownImporter.js +11 -16
- package/dist/importers/SmartPDFImporter.d.ts +2 -1
- package/dist/importers/SmartPDFImporter.js +11 -22
- package/dist/importers/SmartYAMLImporter.d.ts +121 -0
- package/dist/importers/SmartYAMLImporter.js +275 -0
- package/dist/importers/VFSStructureGenerator.js +12 -0
- package/dist/neural/SmartExtractor.d.ts +279 -0
- package/dist/neural/SmartExtractor.js +592 -0
- package/dist/neural/SmartRelationshipExtractor.d.ts +217 -0
- package/dist/neural/SmartRelationshipExtractor.js +396 -0
- package/dist/neural/embeddedTypeEmbeddings.d.ts +1 -1
- package/dist/neural/embeddedTypeEmbeddings.js +2 -2
- package/dist/neural/entityExtractor.d.ts +3 -0
- package/dist/neural/entityExtractor.js +34 -36
- package/dist/neural/presets.d.ts +189 -0
- package/dist/neural/presets.js +365 -0
- package/dist/neural/signals/ContextSignal.d.ts +166 -0
- package/dist/neural/signals/ContextSignal.js +646 -0
- package/dist/neural/signals/EmbeddingSignal.d.ts +175 -0
- package/dist/neural/signals/EmbeddingSignal.js +435 -0
- package/dist/neural/signals/ExactMatchSignal.d.ts +220 -0
- package/dist/neural/signals/ExactMatchSignal.js +542 -0
- package/dist/neural/signals/PatternSignal.d.ts +159 -0
- package/dist/neural/signals/PatternSignal.js +478 -0
- package/dist/neural/signals/VerbContextSignal.d.ts +102 -0
- package/dist/neural/signals/VerbContextSignal.js +390 -0
- package/dist/neural/signals/VerbEmbeddingSignal.d.ts +131 -0
- package/dist/neural/signals/VerbEmbeddingSignal.js +304 -0
- package/dist/neural/signals/VerbExactMatchSignal.d.ts +115 -0
- package/dist/neural/signals/VerbExactMatchSignal.js +335 -0
- package/dist/neural/signals/VerbPatternSignal.d.ts +104 -0
- package/dist/neural/signals/VerbPatternSignal.js +457 -0
- package/dist/types/graphTypes.d.ts +2 -0
- package/package.json +4 -1
package/CHANGELOG.md
CHANGED
|
@@ -2,6 +2,11 @@
|
|
|
2
2
|
|
|
3
3
|
All notable changes to this project will be documented in this file. See [standard-version](https://github.com/conventional-changelog/standard-version) for commit guidelines.
|
|
4
4
|
|
|
5
|
+
### [4.1.4](https://github.com/soulcraftlabs/brainy/compare/v4.1.3...v4.1.4) (2025-10-21)
|
|
6
|
+
|
|
7
|
+
- feat: add import API validation and v4.x migration guide (a1a0576)
|
|
8
|
+
|
|
9
|
+
|
|
5
10
|
### [4.1.3](https://github.com/soulcraftlabs/brainy/compare/v4.1.2...v4.1.3) (2025-10-21)
|
|
6
11
|
|
|
7
12
|
- perf: make getRelations() pagination consistent and efficient (54d819c)
|
|
@@ -223,22 +228,110 @@ $ brainy import ./research-papers --extract-concepts --progress
|
|
|
223
228
|
|
|
224
229
|
### ⚠️ Breaking Changes
|
|
225
230
|
|
|
226
|
-
|
|
231
|
+
#### 💥 Import API Redesign
|
|
232
|
+
|
|
233
|
+
The import API has been redesigned for clarity and better feature control. **Old v3.x option names are no longer recognized** and will throw errors.
|
|
234
|
+
|
|
235
|
+
**What Changed:**
|
|
236
|
+
|
|
237
|
+
| v3.x Option | v4.x Option | Action Required |
|
|
238
|
+
|-------------|-------------|-----------------|
|
|
239
|
+
| `extractRelationships` | `enableRelationshipInference` | **Rename option** |
|
|
240
|
+
| `autoDetect` | *(removed)* | **Delete option** (always enabled) |
|
|
241
|
+
| `createFileStructure` | `vfsPath` | **Replace** with VFS path |
|
|
242
|
+
| `excelSheets` | *(removed)* | **Delete option** (all sheets processed) |
|
|
243
|
+
| `pdfExtractTables` | *(removed)* | **Delete option** (always enabled) |
|
|
244
|
+
| - | `enableNeuralExtraction` | **Add option** (new in v4.x) |
|
|
245
|
+
| - | `enableConceptExtraction` | **Add option** (new in v4.x) |
|
|
246
|
+
| - | `preserveSource` | **Add option** (new in v4.x) |
|
|
247
|
+
|
|
248
|
+
**Why These Changes?**
|
|
249
|
+
|
|
250
|
+
1. **Clearer option names**: `enableRelationshipInference` explicitly indicates AI-powered relationship inference
|
|
251
|
+
2. **Separation of concerns**: Neural extraction, relationship inference, and VFS are now separate, explicit options
|
|
252
|
+
3. **Better defaults**: Auto-detection and AI features are enabled by default
|
|
253
|
+
4. **Reduced confusion**: Removed redundant options like `autoDetect` and format-specific options
|
|
254
|
+
|
|
255
|
+
**Migration Examples:**
|
|
256
|
+
|
|
257
|
+
<details>
|
|
258
|
+
<summary>Example 1: Basic Excel Import</summary>
|
|
259
|
+
|
|
260
|
+
```typescript
|
|
261
|
+
// v3.x (OLD - Will throw error)
|
|
262
|
+
await brain.import('./glossary.xlsx', {
|
|
263
|
+
extractRelationships: true,
|
|
264
|
+
createFileStructure: true
|
|
265
|
+
})
|
|
266
|
+
|
|
267
|
+
// v4.x (NEW - Use this)
|
|
268
|
+
await brain.import('./glossary.xlsx', {
|
|
269
|
+
enableRelationshipInference: true,
|
|
270
|
+
vfsPath: '/imports/glossary'
|
|
271
|
+
})
|
|
272
|
+
```
|
|
273
|
+
</details>
|
|
274
|
+
|
|
275
|
+
<details>
|
|
276
|
+
<summary>Example 2: Full-Featured Import</summary>
|
|
227
277
|
|
|
228
|
-
|
|
278
|
+
```typescript
|
|
279
|
+
// v3.x (OLD - Will throw error)
|
|
280
|
+
await brain.import('./data.xlsx', {
|
|
281
|
+
extractRelationships: true,
|
|
282
|
+
autoDetect: true,
|
|
283
|
+
createFileStructure: true
|
|
284
|
+
})
|
|
285
|
+
|
|
286
|
+
// v4.x (NEW - Use this)
|
|
287
|
+
await brain.import('./data.xlsx', {
|
|
288
|
+
enableNeuralExtraction: true, // Extract entity names
|
|
289
|
+
enableRelationshipInference: true, // Infer semantic relationships
|
|
290
|
+
enableConceptExtraction: true, // Extract entity types
|
|
291
|
+
vfsPath: '/imports/data', // VFS directory
|
|
292
|
+
preserveSource: true // Save original file
|
|
293
|
+
})
|
|
294
|
+
```
|
|
295
|
+
</details>
|
|
296
|
+
|
|
297
|
+
**Error Messages:**
|
|
298
|
+
|
|
299
|
+
If you use old v3.x options, you'll get a clear error message:
|
|
300
|
+
|
|
301
|
+
```
|
|
302
|
+
❌ Invalid import options detected (Brainy v4.x breaking changes)
|
|
303
|
+
|
|
304
|
+
The following v3.x options are no longer supported:
|
|
305
|
+
|
|
306
|
+
❌ extractRelationships
|
|
307
|
+
→ Use: enableRelationshipInference
|
|
308
|
+
→ Why: Option renamed for clarity in v4.x
|
|
309
|
+
|
|
310
|
+
📖 Migration Guide: https://brainy.dev/docs/guides/migrating-to-v4
|
|
311
|
+
```
|
|
312
|
+
|
|
313
|
+
**Other v4.0.0 Features (Non-Breaking):**
|
|
314
|
+
|
|
315
|
+
All other v4.0.0 features are:
|
|
229
316
|
- ✅ Opt-in (lifecycle, compression, batch operations)
|
|
230
317
|
- ✅ Additive (new CLI commands, new methods)
|
|
231
318
|
- ✅ Non-breaking (existing code continues to work)
|
|
232
319
|
|
|
233
320
|
### 📝 Migration
|
|
234
321
|
|
|
235
|
-
**
|
|
322
|
+
**Import API migration required** if you use `brain.import()` with the old v3.x option names.
|
|
236
323
|
|
|
237
|
-
|
|
324
|
+
#### Required Changes:
|
|
238
325
|
1. Update to v4.0.0: `npm install @soulcraft/brainy@4.0.0`
|
|
239
|
-
2.
|
|
240
|
-
3.
|
|
241
|
-
|
|
326
|
+
2. Update import calls to use new option names (see table above)
|
|
327
|
+
3. Test your imports - you'll get clear error messages if you use old options
|
|
328
|
+
|
|
329
|
+
#### Optional Enhancements:
|
|
330
|
+
- Enable lifecycle policies: `brainy storage lifecycle set`
|
|
331
|
+
- Use batch operations: `brainy storage batch-delete entities.txt`
|
|
332
|
+
- See full migration guide: `docs/guides/migrating-to-v4.md`
|
|
333
|
+
|
|
334
|
+
**Complete Migration Guide:** [docs/guides/migrating-to-v4.md](./docs/guides/migrating-to-v4.md)
|
|
242
335
|
|
|
243
336
|
### 🎓 What This Means
|
|
244
337
|
|
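The CHANGELOG hunk above says that v3.x option names are rejected at runtime with a message naming the replacement. The following is a minimal sketch of that kind of check, not the package's actual `validateOptions` code, which this diff does not show. The names `DEPRECATED_OPTION_HINTS`, `findDeprecatedOptions` and `assertV4Options` are hypothetical; the hints are paraphrased from the migration table.

```typescript
// Hypothetical lookup table: v3.x option name -> migration hint (paraphrased from the CHANGELOG table).
const DEPRECATED_OPTION_HINTS = new Map<string, string>([
  ['extractRelationships', 'Use: enableRelationshipInference'],
  ['autoDetect', 'Delete option (auto-detection is always enabled)'],
  ['createFileStructure', 'Use: vfsPath'],
  ['excelSheets', 'Delete option (all sheets are processed)'],
  ['pdfExtractTables', 'Delete option (always enabled)']
])

// Return the deprecated keys present on an options object, in insertion order.
function findDeprecatedOptions(options: Record<string, unknown>): string[] {
  return Object.keys(options).filter((key) => DEPRECATED_OPTION_HINTS.has(key))
}

// Throw a single error listing every deprecated key and its hint; do nothing if the options are clean.
function assertV4Options(options: Record<string, unknown>): void {
  const deprecated = findDeprecatedOptions(options)
  if (deprecated.length === 0) return
  const details = deprecated
    .map((key) => `  ${key} -> ${DEPRECATED_OPTION_HINTS.get(key)}`)
    .join('\n')
  throw new Error(`Invalid import options detected:\n${details}`)
}
```

Running a helper like this before calling `brain.import()` would report every stale option in one pass.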
package/dist/brainy.d.ts
CHANGED
|
@@ -686,33 +686,91 @@ export declare class Brainy<T = any> implements BrainyInterface<T> {
|
|
|
686
686
|
limit?: number;
|
|
687
687
|
}): Promise<string[]>;
|
|
688
688
|
/**
|
|
689
|
-
* Import files with
|
|
689
|
+
* Import files with intelligent extraction and dual storage (VFS + Knowledge Graph)
|
|
690
690
|
*
|
|
691
691
|
* Unified import system that:
|
|
692
692
|
* - Auto-detects format (Excel, PDF, CSV, JSON, Markdown)
|
|
693
|
-
* - Extracts entities
|
|
693
|
+
* - Extracts entities with AI-powered name/type detection
|
|
694
|
+
* - Infers semantic relationships from context
|
|
694
695
|
* - Stores in both VFS (organized files) and Knowledge Graph (connected entities)
|
|
695
696
|
* - Links VFS files to graph entities
|
|
696
697
|
*
|
|
697
|
-
* @
|
|
698
|
-
* // Import from file path
|
|
699
|
-
* const result = await brain.import('/path/to/file.xlsx')
|
|
698
|
+
* @since 4.0.0
|
|
700
699
|
*
|
|
701
|
-
* @example
|
|
702
|
-
*
|
|
700
|
+
* @example Quick Start (All AI features enabled by default)
|
|
701
|
+
* ```typescript
|
|
702
|
+
* const result = await brain.import('./glossary.xlsx')
|
|
703
|
+
* // Auto-detects format, extracts entities, infers relationships
|
|
704
|
+
* ```
|
|
705
|
+
*
|
|
706
|
+
* @example Full-Featured Import (v4.x)
|
|
707
|
+
* ```typescript
|
|
708
|
+
* const result = await brain.import('./data.xlsx', {
|
|
709
|
+
* // AI features
|
|
710
|
+
* enableNeuralExtraction: true, // Extract entity names/metadata
|
|
711
|
+
* enableRelationshipInference: true, // Detect semantic relationships
|
|
712
|
+
* enableConceptExtraction: true, // Extract types/concepts
|
|
713
|
+
*
|
|
714
|
+
* // VFS features
|
|
715
|
+
* vfsPath: '/imports/my-data', // Store in VFS directory
|
|
716
|
+
* groupBy: 'type', // Organize by entity type
|
|
717
|
+
* preserveSource: true, // Keep original file
|
|
718
|
+
*
|
|
719
|
+
* // Progress tracking
|
|
720
|
+
* onProgress: (p) => console.log(p.message)
|
|
721
|
+
* })
|
|
722
|
+
* ```
|
|
723
|
+
*
|
|
724
|
+
* @example Performance Tuning (Large Files)
|
|
725
|
+
* ```typescript
|
|
726
|
+
* const result = await brain.import('./huge-file.csv', {
|
|
727
|
+
* enableDeduplication: false, // Skip dedup for speed
|
|
728
|
+
* confidenceThreshold: 0.8, // Higher threshold = fewer entities
|
|
729
|
+
* onProgress: (p) => console.log(`${p.processed}/${p.total}`)
|
|
730
|
+
* })
|
|
731
|
+
* ```
|
|
732
|
+
*
|
|
733
|
+
* @example Import from Buffer or Object
|
|
734
|
+
* ```typescript
|
|
735
|
+
* // From buffer
|
|
703
736
|
* const result = await brain.import(buffer, { format: 'pdf' })
|
|
704
737
|
*
|
|
705
|
-
*
|
|
706
|
-
* // Import JSON object
|
|
738
|
+
* // From object
|
|
707
739
|
* const result = await brain.import({ entities: [...] })
|
|
740
|
+
* ```
|
|
708
741
|
*
|
|
709
|
-
* @
|
|
710
|
-
*
|
|
711
|
-
*
|
|
712
|
-
*
|
|
713
|
-
*
|
|
714
|
-
*
|
|
715
|
-
*
|
|
742
|
+
* @throws {Error} If invalid options are provided (v4.x breaking changes)
|
|
743
|
+
*
|
|
744
|
+
* @see {@link https://brainy.dev/docs/api/import API Documentation}
|
|
745
|
+
* @see {@link https://brainy.dev/docs/guides/migrating-to-v4 Migration Guide}
|
|
746
|
+
*
|
|
747
|
+
* @remarks
|
|
748
|
+
* **⚠️ Breaking Changes from v3.x:**
|
|
749
|
+
*
|
|
750
|
+
* The import API was redesigned in v4.0.0 for clarity and better feature control.
|
|
751
|
+
* Old v3.x option names are **no longer recognized** and will throw errors.
|
|
752
|
+
*
|
|
753
|
+
* **Option Changes:**
|
|
754
|
+
* - ❌ `extractRelationships` → ✅ `enableRelationshipInference`
|
|
755
|
+
* - ❌ `createFileStructure` → ✅ `vfsPath: '/your/path'`
|
|
756
|
+
* - ❌ `autoDetect` → ✅ *(removed - always enabled)*
|
|
757
|
+
* - ❌ `excelSheets` → ✅ *(removed - all sheets processed)*
|
|
758
|
+
* - ❌ `pdfExtractTables` → ✅ *(removed - always enabled)*
|
|
759
|
+
*
|
|
760
|
+
* **New Options:**
|
|
761
|
+
* - ✅ `enableNeuralExtraction` - Extract entity names via AI
|
|
762
|
+
* - ✅ `enableConceptExtraction` - Extract entity types via AI
|
|
763
|
+
* - ✅ `preserveSource` - Save original file in VFS
|
|
764
|
+
*
|
|
765
|
+
* **If you get an error:**
|
|
766
|
+
* The error message includes migration instructions and examples.
|
|
767
|
+
* See the complete migration guide for all details.
|
|
768
|
+
*
|
|
769
|
+
* **Why these changes?**
|
|
770
|
+
* - Clearer option names (explicitly describe what they do)
|
|
771
|
+
* - Separation of concerns (neural, relationships, VFS are separate)
|
|
772
|
+
* - Better defaults (AI features enabled by default)
|
|
773
|
+
* - Reduced confusion (removed redundant options)
|
|
716
774
|
*/
|
|
717
775
|
import(source: Buffer | string | object, options?: {
|
|
718
776
|
format?: 'excel' | 'pdf' | 'csv' | 'json' | 'markdown';
|
package/dist/brainy.js
CHANGED
|
@@ -1593,33 +1593,91 @@ export class Brainy {
|
|
|
1593
1593
|
return options?.limit ? concepts.slice(0, options.limit) : concepts;
|
|
1594
1594
|
}
|
|
1595
1595
|
/**
|
|
1596
|
-
* Import files with
|
|
1596
|
+
* Import files with intelligent extraction and dual storage (VFS + Knowledge Graph)
|
|
1597
1597
|
*
|
|
1598
1598
|
* Unified import system that:
|
|
1599
1599
|
* - Auto-detects format (Excel, PDF, CSV, JSON, Markdown)
|
|
1600
|
-
* - Extracts entities
|
|
1600
|
+
* - Extracts entities with AI-powered name/type detection
|
|
1601
|
+
* - Infers semantic relationships from context
|
|
1601
1602
|
* - Stores in both VFS (organized files) and Knowledge Graph (connected entities)
|
|
1602
1603
|
* - Links VFS files to graph entities
|
|
1603
1604
|
*
|
|
1604
|
-
* @
|
|
1605
|
-
* // Import from file path
|
|
1606
|
-
* const result = await brain.import('/path/to/file.xlsx')
|
|
1605
|
+
* @since 4.0.0
|
|
1607
1606
|
*
|
|
1608
|
-
* @example
|
|
1609
|
-
*
|
|
1607
|
+
* @example Quick Start (All AI features enabled by default)
|
|
1608
|
+
* ```typescript
|
|
1609
|
+
* const result = await brain.import('./glossary.xlsx')
|
|
1610
|
+
* // Auto-detects format, extracts entities, infers relationships
|
|
1611
|
+
* ```
|
|
1612
|
+
*
|
|
1613
|
+
* @example Full-Featured Import (v4.x)
|
|
1614
|
+
* ```typescript
|
|
1615
|
+
* const result = await brain.import('./data.xlsx', {
|
|
1616
|
+
* // AI features
|
|
1617
|
+
* enableNeuralExtraction: true, // Extract entity names/metadata
|
|
1618
|
+
* enableRelationshipInference: true, // Detect semantic relationships
|
|
1619
|
+
* enableConceptExtraction: true, // Extract types/concepts
|
|
1620
|
+
*
|
|
1621
|
+
* // VFS features
|
|
1622
|
+
* vfsPath: '/imports/my-data', // Store in VFS directory
|
|
1623
|
+
* groupBy: 'type', // Organize by entity type
|
|
1624
|
+
* preserveSource: true, // Keep original file
|
|
1625
|
+
*
|
|
1626
|
+
* // Progress tracking
|
|
1627
|
+
* onProgress: (p) => console.log(p.message)
|
|
1628
|
+
* })
|
|
1629
|
+
* ```
|
|
1630
|
+
*
|
|
1631
|
+
* @example Performance Tuning (Large Files)
|
|
1632
|
+
* ```typescript
|
|
1633
|
+
* const result = await brain.import('./huge-file.csv', {
|
|
1634
|
+
* enableDeduplication: false, // Skip dedup for speed
|
|
1635
|
+
* confidenceThreshold: 0.8, // Higher threshold = fewer entities
|
|
1636
|
+
* onProgress: (p) => console.log(`${p.processed}/${p.total}`)
|
|
1637
|
+
* })
|
|
1638
|
+
* ```
|
|
1639
|
+
*
|
|
1640
|
+
* @example Import from Buffer or Object
|
|
1641
|
+
* ```typescript
|
|
1642
|
+
* // From buffer
|
|
1610
1643
|
* const result = await brain.import(buffer, { format: 'pdf' })
|
|
1611
1644
|
*
|
|
1612
|
-
*
|
|
1613
|
-
* // Import JSON object
|
|
1645
|
+
* // From object
|
|
1614
1646
|
* const result = await brain.import({ entities: [...] })
|
|
1647
|
+
* ```
|
|
1615
1648
|
*
|
|
1616
|
-
* @
|
|
1617
|
-
*
|
|
1618
|
-
*
|
|
1619
|
-
*
|
|
1620
|
-
*
|
|
1621
|
-
*
|
|
1622
|
-
*
|
|
1649
|
+
* @throws {Error} If invalid options are provided (v4.x breaking changes)
|
|
1650
|
+
*
|
|
1651
|
+
* @see {@link https://brainy.dev/docs/api/import API Documentation}
|
|
1652
|
+
* @see {@link https://brainy.dev/docs/guides/migrating-to-v4 Migration Guide}
|
|
1653
|
+
*
|
|
1654
|
+
* @remarks
|
|
1655
|
+
* **⚠️ Breaking Changes from v3.x:**
|
|
1656
|
+
*
|
|
1657
|
+
* The import API was redesigned in v4.0.0 for clarity and better feature control.
|
|
1658
|
+
* Old v3.x option names are **no longer recognized** and will throw errors.
|
|
1659
|
+
*
|
|
1660
|
+
* **Option Changes:**
|
|
1661
|
+
* - ❌ `extractRelationships` → ✅ `enableRelationshipInference`
|
|
1662
|
+
* - ❌ `createFileStructure` → ✅ `vfsPath: '/your/path'`
|
|
1663
|
+
* - ❌ `autoDetect` → ✅ *(removed - always enabled)*
|
|
1664
|
+
* - ❌ `excelSheets` → ✅ *(removed - all sheets processed)*
|
|
1665
|
+
* - ❌ `pdfExtractTables` → ✅ *(removed - always enabled)*
|
|
1666
|
+
*
|
|
1667
|
+
* **New Options:**
|
|
1668
|
+
* - ✅ `enableNeuralExtraction` - Extract entity names via AI
|
|
1669
|
+
* - ✅ `enableConceptExtraction` - Extract entity types via AI
|
|
1670
|
+
* - ✅ `preserveSource` - Save original file in VFS
|
|
1671
|
+
*
|
|
1672
|
+
* **If you get an error:**
|
|
1673
|
+
* The error message includes migration instructions and examples.
|
|
1674
|
+
* See the complete migration guide for all details.
|
|
1675
|
+
*
|
|
1676
|
+
* **Why these changes?**
|
|
1677
|
+
* - Clearer option names (explicitly describe what they do)
|
|
1678
|
+
* - Separation of concerns (neural, relationships, VFS are separate)
|
|
1679
|
+
* - Better defaults (AI features enabled by default)
|
|
1680
|
+
* - Reduced confusion (removed redundant options)
|
|
1623
1681
|
*/
|
|
1624
1682
|
async import(source, options) {
|
|
1625
1683
|
// Lazy load ImportCoordinator
|
|
@@ -8,7 +8,7 @@
|
|
|
8
8
|
*
|
|
9
9
|
* NO MOCKS - Production-ready implementation
|
|
10
10
|
*/
|
|
11
|
-
export type SupportedFormat = 'excel' | 'pdf' | 'csv' | 'json' | 'markdown';
|
|
11
|
+
export type SupportedFormat = 'excel' | 'pdf' | 'csv' | 'json' | 'markdown' | 'yaml' | 'docx';
|
|
12
12
|
export interface DetectionResult {
|
|
13
13
|
format: SupportedFormat;
|
|
14
14
|
confidence: number;
|
|
@@ -54,6 +54,11 @@ export declare class FormatDetector {
|
|
|
54
54
|
* Check if content looks like CSV
|
|
55
55
|
*/
|
|
56
56
|
private looksLikeCSV;
|
|
57
|
+
/**
|
|
58
|
+
* Check if content looks like YAML
|
|
59
|
+
* v4.2.0: Added YAML detection
|
|
60
|
+
*/
|
|
61
|
+
private looksLikeYAML;
|
|
57
62
|
/**
|
|
58
63
|
* Check if content is text-based (not binary)
|
|
59
64
|
*/
|
|
@@ -38,7 +38,11 @@ export class FormatDetector {
|
|
|
38
38
|
'.csv': 'csv',
|
|
39
39
|
'.json': 'json',
|
|
40
40
|
'.md': 'markdown',
|
|
41
|
-
'.markdown': 'markdown'
|
|
41
|
+
'.markdown': 'markdown',
|
|
42
|
+
'.yaml': 'yaml',
|
|
43
|
+
'.yml': 'yaml',
|
|
44
|
+
'.docx': 'docx',
|
|
45
|
+
'.doc': 'docx'
|
|
42
46
|
};
|
|
43
47
|
const format = extensionMap[ext];
|
|
44
48
|
if (format) {
|
|
@@ -63,6 +67,14 @@ export class FormatDetector {
|
|
|
63
67
|
evidence: ['Content starts with { or [', 'Valid JSON structure']
|
|
64
68
|
};
|
|
65
69
|
}
|
|
70
|
+
// YAML detection (v4.2.0)
|
|
71
|
+
if (this.looksLikeYAML(trimmed)) {
|
|
72
|
+
return {
|
|
73
|
+
format: 'yaml',
|
|
74
|
+
confidence: 0.90,
|
|
75
|
+
evidence: ['Contains YAML key: value patterns', 'YAML-style indentation']
|
|
76
|
+
};
|
|
77
|
+
}
|
|
66
78
|
// Markdown detection
|
|
67
79
|
if (this.looksLikeMarkdown(trimmed)) {
|
|
68
80
|
return {
|
|
@@ -233,6 +245,33 @@ export class FormatDetector {
|
|
|
233
245
|
}
|
|
234
246
|
return false;
|
|
235
247
|
}
|
|
248
|
+
/**
|
|
249
|
+
* Check if content looks like YAML
|
|
250
|
+
* v4.2.0: Added YAML detection
|
|
251
|
+
*/
|
|
252
|
+
looksLikeYAML(content) {
|
|
253
|
+
const lines = content.split('\n').filter(l => l.trim()).slice(0, 20);
|
|
254
|
+
if (lines.length < 2)
|
|
255
|
+
return false;
|
|
256
|
+
let yamlIndicators = 0;
|
|
257
|
+
for (const line of lines) {
|
|
258
|
+
const trimmed = line.trim();
|
|
259
|
+
// Check for YAML key: value pattern
|
|
260
|
+
if (/^[\w-]+:\s/.test(trimmed)) {
|
|
261
|
+
yamlIndicators++;
|
|
262
|
+
}
|
|
263
|
+
// Check for YAML list items (- item)
|
|
264
|
+
if (/^-\s+\w/.test(trimmed)) {
|
|
265
|
+
yamlIndicators++;
|
|
266
|
+
}
|
|
267
|
+
// Check for YAML document separator (---)
|
|
268
|
+
if (trimmed === '---' || trimmed === '...') {
|
|
269
|
+
yamlIndicators += 2;
|
|
270
|
+
}
|
|
271
|
+
}
|
|
272
|
+
// If >50% of lines have YAML indicators, it's likely YAML
|
|
273
|
+
return yamlIndicators / lines.length > 0.5;
|
|
274
|
+
}
|
|
236
275
|
/**
|
|
237
276
|
* Check if content is text-based (not binary)
|
|
238
277
|
*/
|
|
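The `looksLikeYAML` heuristic added in the hunk above can be tried in isolation. The block below is a standalone TypeScript port of the logic shown in the diff, a free function rather than the private method. The type annotations and sample inputs are additions for illustration.

```typescript
// Standalone port of the heuristic from the diff: score the first 20 non-empty lines
// and treat the content as YAML when the score exceeds half the number of lines.
function looksLikeYAML(content: string): boolean {
  const lines = content.split('\n').filter((l) => l.trim()).slice(0, 20)
  if (lines.length < 2) return false
  let yamlIndicators = 0
  for (const line of lines) {
    const trimmed = line.trim()
    // "key: value" pairs (the key must be followed by a colon and whitespace)
    if (/^[\w-]+:\s/.test(trimmed)) yamlIndicators++
    // "- item" list entries
    if (/^-\s+\w/.test(trimmed)) yamlIndicators++
    // document start/end markers count double
    if (trimmed === '---' || trimmed === '...') yamlIndicators += 2
  }
  return yamlIndicators / lines.length > 0.5
}

// Illustrative inputs (not taken from the package's tests).
const yamlSample = 'name: brainy\nversion: 4.2.0\ntags:\n  - graph\n  - vector'
const proseSample = 'Hello world\nThis is plain text'
const bulletSample = '- buy milk\n- call Bob'

const yamlDetected = looksLikeYAML(yamlSample)
const proseDetected = looksLikeYAML(proseSample)
const bulletDetected = looksLikeYAML(bulletSample)
```

Two properties of the heuristic as written are worth noting. A bare mapping key such as `tags:` scores nothing, because after trimming no whitespace follows the colon. A plain Markdown bullet list satisfies the list-item pattern on every line, so `bulletDetected` is `true`. That may matter because the diff places the YAML check before the Markdown check.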
@@ -15,13 +15,23 @@ import { ImportHistory } from './ImportHistory.js';
|
|
|
15
15
|
import { NounType, VerbType } from '../types/graphTypes.js';
|
|
16
16
|
export interface ImportSource {
|
|
17
17
|
/** Source type */
|
|
18
|
-
type: 'buffer' | 'path' | 'string' | 'object';
|
|
18
|
+
type: 'buffer' | 'path' | 'string' | 'object' | 'url';
|
|
19
19
|
/** Source data */
|
|
20
20
|
data: Buffer | string | object;
|
|
21
21
|
/** Optional filename hint */
|
|
22
22
|
filename?: string;
|
|
23
|
+
/** HTTP headers for URL imports (v4.2.0) */
|
|
24
|
+
headers?: Record<string, string>;
|
|
25
|
+
/** Basic authentication for URL imports (v4.2.0) */
|
|
26
|
+
auth?: {
|
|
27
|
+
username: string;
|
|
28
|
+
password: string;
|
|
29
|
+
};
|
|
23
30
|
}
|
|
24
|
-
|
|
31
|
+
/**
|
|
32
|
+
* Valid import options for v4.x
|
|
33
|
+
*/
|
|
34
|
+
export interface ValidImportOptions {
|
|
25
35
|
/** Force specific format (skip auto-detection) */
|
|
26
36
|
format?: SupportedFormat;
|
|
27
37
|
/** VFS root path for imported files */
|
|
@@ -52,9 +62,81 @@ export interface ImportOptions {
|
|
|
52
62
|
enableHistory?: boolean;
|
|
53
63
|
/** Chunk size for streaming large imports (0 = no streaming) */
|
|
54
64
|
chunkSize?: number;
|
|
55
|
-
/**
|
|
56
|
-
|
|
65
|
+
/**
|
|
66
|
+
* Progress callback for tracking import progress (v4.2.0+)
|
|
67
|
+
*
|
|
68
|
+
* **Streaming Architecture** (always enabled):
|
|
69
|
+
* - Indexes are flushed periodically during import (adaptive intervals)
|
|
70
|
+
* - Data is queryable progressively as import proceeds
|
|
71
|
+
* - `progress.queryable` is `true` after each flush
|
|
72
|
+
* - Provides crash resilience and live monitoring
|
|
73
|
+
*
|
|
74
|
+
* **Adaptive Flush Intervals**:
|
|
75
|
+
* - <1K entities: Flush every 100 entities (max 10 flushes)
|
|
76
|
+
* - 1K-10K entities: Flush every 1000 entities (10-100 flushes)
|
|
77
|
+
* - >10K entities: Flush every 5000 entities (low overhead)
|
|
78
|
+
*
|
|
79
|
+
* **Performance**:
|
|
80
|
+
* - Flush overhead: ~5-50ms per flush (~0.3% total time)
|
|
81
|
+
* - No configuration needed - works optimally out of the box
|
|
82
|
+
*
|
|
83
|
+
* @example
|
|
84
|
+
* ```typescript
|
|
85
|
+
* // Monitor import progress with live queries
|
|
86
|
+
* await brain.import(file, {
|
|
87
|
+
* onProgress: async (progress) => {
|
|
88
|
+
* console.log(`${progress.processed}/${progress.total}`)
|
|
89
|
+
*
|
|
90
|
+
* // Query data as it's imported!
|
|
91
|
+
* if (progress.queryable) {
|
|
92
|
+
* const count = await brain.count({ type: 'Product' })
|
|
93
|
+
* console.log(`${count} products imported so far`)
|
|
94
|
+
* }
|
|
95
|
+
* }
|
|
96
|
+
* })
|
|
97
|
+
* ```
|
|
98
|
+
*/
|
|
99
|
+
onProgress?: (progress: ImportProgress) => void | Promise<void>;
|
|
57
100
|
}
|
|
101
|
+
/**
|
|
102
|
+
* Deprecated import options from v3.x
|
|
103
|
+
* Using these will cause TypeScript compile errors
|
|
104
|
+
*
|
|
105
|
+
* @deprecated These options are no longer supported in v4.x
|
|
106
|
+
* @see {@link https://brainy.dev/docs/guides/migrating-to-v4 Migration Guide}
|
|
107
|
+
*/
|
|
108
|
+
export interface DeprecatedImportOptions {
|
|
109
|
+
/**
|
|
110
|
+
* @deprecated Use `enableRelationshipInference` instead
|
|
111
|
+
* @see {@link https://brainy.dev/docs/guides/migrating-to-v4 Migration Guide}
|
|
112
|
+
*/
|
|
113
|
+
extractRelationships?: never;
|
|
114
|
+
/**
|
|
115
|
+
* @deprecated Removed in v4.x - auto-detection is now always enabled
|
|
116
|
+
* @see {@link https://brainy.dev/docs/guides/migrating-to-v4 Migration Guide}
|
|
117
|
+
*/
|
|
118
|
+
autoDetect?: never;
|
|
119
|
+
/**
|
|
120
|
+
* @deprecated Use `vfsPath` to specify the directory path instead
|
|
121
|
+
* @see {@link https://brainy.dev/docs/guides/migrating-to-v4 Migration Guide}
|
|
122
|
+
*/
|
|
123
|
+
createFileStructure?: never;
|
|
124
|
+
/**
|
|
125
|
+
* @deprecated Removed in v4.x - all sheets are now processed automatically
|
|
126
|
+
* @see {@link https://brainy.dev/docs/guides/migrating-to-v4 Migration Guide}
|
|
127
|
+
*/
|
|
128
|
+
excelSheets?: never;
|
|
129
|
+
/**
|
|
130
|
+
* @deprecated Removed in v4.x - table extraction is now automatic for PDF imports
|
|
131
|
+
* @see {@link https://brainy.dev/docs/guides/migrating-to-v4 Migration Guide}
|
|
132
|
+
*/
|
|
133
|
+
pdfExtractTables?: never;
|
|
134
|
+
}
|
|
135
|
+
/**
|
|
136
|
+
* Complete import options interface
|
|
137
|
+
* Combines valid v4.x options with deprecated v3.x options (which cause TypeScript errors)
|
|
138
|
+
*/
|
|
139
|
+
export type ImportOptions = ValidImportOptions & DeprecatedImportOptions;
|
|
58
140
|
export interface ImportProgress {
|
|
59
141
|
stage: 'detecting' | 'extracting' | 'storing-vfs' | 'storing-graph' | 'relationships' | 'complete';
|
|
60
142
|
/** Phase of import - extraction or relationship building (v3.49.0) */
|
|
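The `ValidImportOptions & DeprecatedImportOptions` intersection above uses optional `never`-typed properties to turn v3.x option names into compile-time errors. The sketch below shows the pattern with trimmed-down stand-in interfaces. The `*Sketch` names are hypothetical and only two options from each side are included.

```typescript
// Hypothetical, reduced stand-ins for the interfaces shown in the diff.
interface ValidOptionsSketch {
  enableRelationshipInference?: boolean
  vfsPath?: string
}

interface DeprecatedOptionsSketch {
  extractRelationships?: never
  createFileStructure?: never
}

type OptionsSketch = ValidOptionsSketch & DeprecatedOptionsSketch

// v4.x names type-check normally.
const current: OptionsSketch = {
  enableRelationshipInference: true,
  vfsPath: '/imports/glossary'
}

// A v3.x name is rejected by the compiler; the directive below documents and suppresses that error.
const legacy: OptionsSketch = {
  // @ts-expect-error a `never`-typed optional property accepts no real value
  extractRelationships: true
}

// Types are erased at runtime, so the stale key is still present on the object.
const legacyKeys: string[] = Object.keys(legacy)
```

Because the key survives at runtime, plain JavaScript callers bypass the type-level guard. The diff covers them separately with the runtime `validateOptions` check declared later in this file.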
@@ -70,6 +152,15 @@ export interface ImportProgress {
|
|
|
70
152
|
throughput?: number;
|
|
71
153
|
/** Estimated time remaining in ms (v3.38.0) */
|
|
72
154
|
eta?: number;
|
|
155
|
+
/**
|
|
156
|
+
* Whether data is queryable at this point (v4.2.0+)
|
|
157
|
+
*
|
|
158
|
+
* When true, indexes have been flushed and queries will return up-to-date results.
|
|
159
|
+
* When false, data exists in storage but indexes may not be current (queries may be slower/incomplete).
|
|
160
|
+
*
|
|
161
|
+
* Only present during streaming imports with flushInterval > 0.
|
|
162
|
+
*/
|
|
163
|
+
queryable?: boolean;
|
|
73
164
|
}
|
|
74
165
|
export interface ImportResult {
|
|
75
166
|
/** Import ID for history tracking */
|
|
@@ -127,6 +218,8 @@ export declare class ImportCoordinator {
|
|
|
127
218
|
private csvImporter;
|
|
128
219
|
private jsonImporter;
|
|
129
220
|
private markdownImporter;
|
|
221
|
+
private yamlImporter;
|
|
222
|
+
private docxImporter;
|
|
130
223
|
private vfsGenerator;
|
|
131
224
|
constructor(brain: Brainy);
|
|
132
225
|
/**
|
|
@@ -139,12 +232,27 @@ export declare class ImportCoordinator {
|
|
|
139
232
|
getHistory(): ImportHistory;
|
|
140
233
|
/**
|
|
141
234
|
* Import from any source with auto-detection
|
|
235
|
+
* v4.2.0: Now supports URL imports with authentication
|
|
142
236
|
*/
|
|
143
|
-
import(source: Buffer | string | object, options?: ImportOptions): Promise<ImportResult>;
|
|
237
|
+
import(source: Buffer | string | object | ImportSource, options?: ImportOptions): Promise<ImportResult>;
|
|
144
238
|
/**
|
|
145
239
|
* Normalize source to ImportSource
|
|
240
|
+
* v4.2.0: Now async to support URL fetching
|
|
146
241
|
*/
|
|
147
242
|
private normalizeSource;
|
|
243
|
+
/**
|
|
244
|
+
* Check if value is an ImportSource object
|
|
245
|
+
*/
|
|
246
|
+
private isImportSource;
|
|
247
|
+
/**
|
|
248
|
+
* Check if string is a URL
|
|
249
|
+
*/
|
|
250
|
+
private isUrl;
|
|
251
|
+
/**
|
|
252
|
+
* Fetch content from URL
|
|
253
|
+
* v4.2.0: Supports authentication and custom headers
|
|
254
|
+
*/
|
|
255
|
+
private fetchUrl;
|
|
148
256
|
/**
|
|
149
257
|
* Check if string is a file path
|
|
150
258
|
*/
|
|
@@ -165,4 +273,46 @@ export declare class ImportCoordinator {
|
|
|
165
273
|
* Normalize extraction result to unified format (Excel-like structure)
|
|
166
274
|
*/
|
|
167
275
|
private normalizeExtractionResult;
|
|
276
|
+
/**
|
|
277
|
+
* Validate options and reject deprecated v3.x options (v4.0.0+)
|
|
278
|
+
* Throws clear errors with migration guidance
|
|
279
|
+
*/
|
|
280
|
+
private validateOptions;
|
|
281
|
+
/**
|
|
282
|
+
* Build detailed error message for invalid options
|
|
283
|
+
* Respects LOG_LEVEL for verbosity (detailed in dev, concise in prod)
|
|
284
|
+
*/
|
|
285
|
+
private buildValidationErrorMessage;
|
|
286
|
+
/**
|
|
287
|
+
* Get progressive flush interval based on CURRENT entity count (v4.2.0+)
|
|
288
|
+
*
|
|
289
|
+
* Unlike adaptive intervals (which require knowing total count upfront),
|
|
290
|
+
* progressive intervals adjust dynamically as import proceeds.
|
|
291
|
+
*
|
|
292
|
+
* Thresholds:
|
|
293
|
+
* - 0-999 entities: Flush every 100 (frequent updates for better UX)
|
|
294
|
+
* - 1K-9.9K entities: Flush every 1000 (balanced performance/responsiveness)
|
|
295
|
+
* - 10K+ entities: Flush every 5000 (performance focused, minimal overhead)
|
|
296
|
+
*
|
|
297
|
+
* Benefits:
|
|
298
|
+
* - Works with known totals (file imports)
|
|
299
|
+
* - Works with unknown totals (streaming APIs, database cursors)
|
|
300
|
+
* - Frequent updates early when user is watching
|
|
301
|
+
* - Efficient processing later when performance matters
|
|
302
|
+
* - Low overhead (~0.3% for large imports)
|
|
303
|
+
* - No configuration required
|
|
304
|
+
*
|
|
305
|
+
* Example:
|
|
306
|
+
* - Import with 50K entities:
|
|
307
|
+
* - Flushes at: 100, 200, ..., 900 (9 flushes with interval=100)
|
|
308
|
+
* - Interval increases to 1000 at entity #1000
|
|
309
|
+
* - Flushes at: 1000, 2000, ..., 9000 (9 more flushes)
|
|
310
|
+
* - Interval increases to 5000 at entity #10000
|
|
311
|
+
* - Flushes at: 10000, 15000, ..., 50000 (8 more flushes)
|
|
312
|
+
* - Total: ~26 flushes = ~1.3s overhead = 0.026% of import time
|
|
313
|
+
*
|
|
314
|
+
* @param currentEntityCount - Current number of entities imported so far
|
|
315
|
+
* @returns Current optimal flush interval
|
|
316
|
+
*/
|
|
317
|
+
private getProgressiveFlushInterval;
|
|
168
318
|
}
|
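The thresholds documented for `getProgressiveFlushInterval` can be sketched as follows. Only the three tiers come from the doc comment above. The function body and the trigger rule in `flushPoints` (flush whenever the running count is a multiple of the current interval) are assumptions, since the diff shows the declaration but not the implementation.

```typescript
// Interval tiers as documented: <1K -> 100, 1K to <10K -> 1000, 10K+ -> 5000.
function getProgressiveFlushInterval(currentEntityCount: number): number {
  if (currentEntityCount < 1000) return 100
  if (currentEntityCount < 10000) return 1000
  return 5000
}

// Assumed trigger rule (hypothetical helper): flush when the running count
// is an exact multiple of the interval that applies at that count.
function flushPoints(totalEntities: number): number[] {
  const points: number[] = []
  for (let count = 1; count <= totalEntities; count++) {
    if (count % getProgressiveFlushInterval(count) === 0) points.push(count)
  }
  return points
}

const pointsFor50K = flushPoints(50000)
```

Under this assumed rule a 50K-entity import flushes at 100 through 900, 1000 through 9000, and 10000 through 50000 in steps of 5000. That is nine flushes per tier and 27 in total, in line with the "~26" estimate in the doc comment.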