@soulcraft/brainy 4.3.2 → 4.5.0
- package/CHANGELOG.md +117 -0
- package/dist/augmentations/intelligentImport/handlers/csvHandler.js +33 -1
- package/dist/augmentations/intelligentImport/handlers/excelHandler.js +48 -2
- package/dist/augmentations/intelligentImport/handlers/pdfHandler.js +37 -0
- package/dist/augmentations/intelligentImport/types.d.ts +33 -0
- package/dist/brainy.d.ts +43 -3
- package/dist/brainy.js +83 -12
- package/dist/cli/commands/core.d.ts +3 -0
- package/dist/cli/commands/core.js +21 -3
- package/dist/cli/commands/import.js +69 -34
- package/dist/importers/SmartCSVImporter.js +35 -1
- package/dist/importers/SmartDOCXImporter.js +12 -0
- package/dist/importers/SmartExcelImporter.js +37 -1
- package/dist/importers/SmartJSONImporter.js +18 -0
- package/dist/importers/SmartMarkdownImporter.js +25 -2
- package/dist/importers/SmartPDFImporter.js +37 -1
- package/dist/importers/SmartYAMLImporter.js +12 -0
- package/dist/types/brainy.types.d.ts +98 -0
- package/dist/utils/import-progress-tracker.d.ts +140 -0
- package/dist/utils/import-progress-tracker.js +444 -0
- package/dist/vfs/PathResolver.js +2 -2
- package/dist/vfs/VirtualFileSystem.js +37 -9
- package/dist/vfs/semantic/projections/AuthorProjection.js +6 -3
- package/dist/vfs/semantic/projections/TagProjection.js +6 -3
- package/dist/vfs/semantic/projections/TemporalProjection.js +4 -2
- package/dist/vfs/types.d.ts +1 -0
- package/package.json +1 -1
package/CHANGELOG.md
CHANGED
|
@@ -2,6 +2,123 @@
|
|
|
2
2
|
|
|
3
3
|
All notable changes to this project will be documented in this file. See [standard-version](https://github.com/conventional-changelog/standard-version) for commit guidelines.
|
|
4
4
|
|
|
5
|
+
### [4.4.0](https://github.com/soulcraftlabs/brainy/compare/v4.3.2...v4.4.0) (2025-10-24)
|
|
6
|
+
|
|
7
|
+
- docs: update CHANGELOG for v4.4.0 release (a3c8a28)
|
|
8
|
+
- docs: add VFS filtering examples to brain.find() JSDoc (d435593)
|
|
9
|
+
- test: comprehensive tests for remaining APIs (17/17 passing) (f9e1bad)
|
|
10
|
+
- fix: add includeVFS to initializeRoot() - prevents duplicate root creation (fbf2605)
|
|
11
|
+
- fix: vfs.search() and vfs.findSimilar() now filter for VFS files only (0dda9dc)
|
|
12
|
+
- test: add comprehensive API verification tests (21/25 passing) (ce8530b)
|
|
13
|
+
- fix: wire up includeVFS parameter to ALL VFS-related APIs (6 critical bugs) (7582e3f)
|
|
14
|
+
- test: fix brain.add() return type usage in VFS tests (970f243)
|
|
15
|
+
- feat: brain.find() excludes VFS by default (Option 3C) (014b810)
|
|
16
|
+
- test: update VFS where clause tests for correct field names (86f5956)
|
|
17
|
+
- fix: VFS where clause field names + isVFS flag (f8d2d37)
|
|
18
|
+
|
|
19
|
+
|
|
20
|
+
## [4.4.0](https://github.com/soulcraftlabs/brainy/compare/v4.3.2...v4.4.0) (2025-10-24)
|
|
21
|
+
|
|
22
|
+
|
|
23
|
+
### 🎯 VFS Filtering Architecture (Option 3C)
|
|
24
|
+
|
|
25
|
+
Clean separation between VFS (Virtual File System) entities and knowledge graph entities with opt-in inclusion.
|
|
26
|
+
|
|
27
|
+
### ✨ Features
|
|
28
|
+
|
|
29
|
+
* **brain.similar()**: add includeVFS parameter for VFS filtering consistency
|
|
30
|
+
- New `includeVFS` parameter in `SimilarParams` interface
|
|
31
|
+
- Passes through to `brain.find()` for consistent VFS filtering
|
|
32
|
+
- Excludes VFS entities by default, opt-in with `includeVFS: true`
|
|
33
|
+
- Enables clean knowledge similarity queries without VFS pollution
|
|
34
|
+
|
|
35
|
+
### 🐛 Critical Bug Fixes
|
|
36
|
+
|
|
37
|
+
* **vfs.initializeRoot()**: add includeVFS to prevent duplicate root creation
|
|
38
|
+
- **Critical Fix**: VFS init was creating ~10 duplicate root entities (Workshop team issue)
|
|
39
|
+
- **Root Cause**: `initializeRoot()` called `brain.find()` without `includeVFS: true`, never found existing VFS root
|
|
40
|
+
- **Impact**: Every `vfs.init()` created a new root, causing empty `readdir('/')` results
|
|
41
|
+
- **Solution**: Added `includeVFS: true` to root entity lookup (line 171)
|
|
42
|
+
|
|
43
|
+
* **vfs.search()**: wire up includeVFS and add vfsType filter
|
|
44
|
+
- **Critical Fix**: `vfs.search()` returned 0 results after v4.3.3 VFS filtering
|
|
45
|
+
- **Root Cause**: Called `brain.find()` without `includeVFS: true`, excluded all VFS entities
|
|
46
|
+
- **Impact**: VFS semantic search completely broken
|
|
47
|
+
- **Solution**: Added `includeVFS: true` + `vfsType: 'file'` filter to return only VFS files
|
|
48
|
+
|
|
49
|
+
* **vfs.findSimilar()**: wire up includeVFS and add vfsType filter
|
|
50
|
+
- **Critical Fix**: `vfs.findSimilar()` returned 0 results or mixed knowledge entities
|
|
51
|
+
- **Root Cause**: Called `brain.similar()` without `includeVFS: true` or vfsType filter
|
|
52
|
+
- **Impact**: VFS similarity search broken, could return knowledge docs without .path property
|
|
53
|
+
- **Solution**: Added `includeVFS: true` + `vfsType: 'file'` filter
|
|
54
|
+
|
|
55
|
+
* **vfs.searchEntities()**: add includeVFS parameter
|
|
56
|
+
- Added `includeVFS: true` to ensure VFS entity search works correctly
|
|
57
|
+
|
|
58
|
+
* **VFS semantic projections**: fix all 3 projection classes
|
|
59
|
+
- **TagProjection**: Fixed 3 `brain.find()` calls with `includeVFS: true`
|
|
60
|
+
- **AuthorProjection**: Fixed 2 `brain.find()` calls with `includeVFS: true`
|
|
61
|
+
- **TemporalProjection**: Fixed 2 `brain.find()` calls with `includeVFS: true`
|
|
62
|
+
- **Impact**: VFS semantic views (/by-tag, /by-author, /by-date) were empty
|
|
63
|
+
|
|
64
|
+
### 📝 Documentation
|
|
65
|
+
|
|
66
|
+
* **JSDoc**: Added VFS filtering examples to `brain.find()` with 3 usage patterns
|
|
67
|
+
* **Inline comments**: Documented VFS filtering architecture at all usage sites
|
|
68
|
+
* **Code comments**: Explained critical bug fixes inline for maintainability
|
|
69
|
+
|
|
70
|
+
### ✅ Testing
|
|
71
|
+
|
|
72
|
+
* **45/49 APIs tested** (92% coverage) with 46 new integration tests
|
|
73
|
+
* **952/1005 tests passing** (95% pass rate) - all v4.4.0 changes verified
|
|
74
|
+
* Comprehensive tests for:
|
|
75
|
+
- brain.updateMany() - Batch metadata updates with merging
|
|
76
|
+
- brain.import() - CSV import with VFS integration
|
|
77
|
+
- vfs file operations (unlink, rmdir, rename, copy, move)
|
|
78
|
+
- neural.clusters() - Semantic clustering with VFS filtering
|
|
79
|
+
- Production scale verified (100 entities, 50 batch updates, 20 VFS files)
|
|
80
|
+
|
|
81
|
+
### 🏗️ Architecture
|
|
82
|
+
|
|
83
|
+
* **Option 3C**: VFS entities in graph with `isVFS` flag for clean separation
|
|
84
|
+
* **Default behavior**: `brain.find()` and `brain.similar()` exclude VFS by default
|
|
85
|
+
* **Opt-in inclusion**: Use `includeVFS: true` parameter to include VFS entities
|
|
86
|
+
* **VFS APIs**: Automatically filter for VFS-only (never return knowledge entities)
|
|
87
|
+
* **Cross-boundary relationships**: Link VFS files to knowledge entities with `brain.relate()`
|
|
88
|
+
|
|
89
|
+
### 🔍 API Behavior
|
|
90
|
+
|
|
91
|
+
**Before v4.4.0:**
|
|
92
|
+
```javascript
|
|
93
|
+
const results = await brain.find({ query: 'documentation' })
|
|
94
|
+
// Returned mixed knowledge + VFS files (confusing, polluted results)
|
|
95
|
+
```
|
|
96
|
+
|
|
97
|
+
**After v4.4.0:**
|
|
98
|
+
```javascript
|
|
99
|
+
// Clean knowledge queries (VFS excluded by default)
|
|
100
|
+
const knowledge = await brain.find({ query: 'documentation' })
|
|
101
|
+
// Returns only knowledge entities
|
|
102
|
+
|
|
103
|
+
// Opt-in to include VFS
|
|
104
|
+
const everything = await brain.find({
|
|
105
|
+
query: 'documentation',
|
|
106
|
+
includeVFS: true
|
|
107
|
+
})
|
|
108
|
+
// Returns knowledge + VFS files
|
|
109
|
+
|
|
110
|
+
// VFS-only search
|
|
111
|
+
const files = await vfs.search('documentation')
|
|
112
|
+
// Returns only VFS files (automatic filtering)
|
|
113
|
+
```
|
|
114
|
+
|
|
115
|
+
### 🎓 Migration Notes
|
|
116
|
+
|
|
117
|
+
**No breaking changes** - All existing code continues to work:
|
|
118
|
+
- Existing `brain.find()` queries get cleaner results (VFS excluded)
|
|
119
|
+
- VFS APIs now work correctly (bugs fixed)
|
|
120
|
+
- Add `includeVFS: true` only if you need VFS entities in knowledge queries
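Editorial illustration (not part of the changelog): the default-exclusion rule described above can be sketched as a small pure function. This is a minimal sketch, assuming the filter shape `{ isVFS: { notEquals: true } }` that appears in the `brainy.js` hunks of this diff; the helper name `buildMetadataFilter` is hypothetical and is not a Brainy API.

```javascript
// Hypothetical helper (not a Brainy API): isolates the VFS default-exclusion rule.
// Assumes the same params fields the diff uses: where, service, includeVFS.
function buildMetadataFilter(params = {}) {
  // Start from the caller's where clause, if any.
  const filter = { ...(params.where || {}) };
  if (params.service) filter.service = params.service;

  // Equivalent to params.where?.hasOwnProperty('isVFS') in the diff.
  const whereMentionsIsVFS =
    params.where != null &&
    Object.prototype.hasOwnProperty.call(params.where, 'isVFS');

  // Exclude VFS entities unless the caller opted in, or already
  // constrained isVFS explicitly in the where clause.
  if (params.includeVFS !== true && !whereMentionsIsVFS) {
    filter.isVFS = { notEquals: true };
  }
  return filter;
}

console.log(buildMetadataFilter({ where: { topic: 'docs' } }));
console.log(buildMetadataFilter({ where: { vfsType: 'file' }, includeVFS: true }));
console.log(buildMetadataFilter({ where: { isVFS: true } }));
```

The third call shows why the explicit `isVFS` check matters: without it, a caller's own `isVFS` condition would be overwritten by the automatic exclusion.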
|
|
121
|
+
|
|
5
122
|
### [4.2.4](https://github.com/soulcraftlabs/brainy/compare/v4.2.3...v4.2.4) (2025-10-23)
|
|
6
123
|
|
|
7
124
|
|
package/dist/augmentations/intelligentImport/handlers/csvHandler.js
CHANGED
|
@@ -30,13 +30,26 @@ export class CSVHandler extends BaseFormatHandler {
|
|
|
30
30
|
}
|
|
31
31
|
async process(data, options) {
|
|
32
32
|
const startTime = Date.now();
|
|
33
|
+
const progressHooks = options.progressHooks;
|
|
33
34
|
// Convert to buffer if string
|
|
34
35
|
const buffer = Buffer.isBuffer(data) ? data : Buffer.from(data, 'utf-8');
|
|
36
|
+
const totalBytes = buffer.length;
|
|
37
|
+
// v4.5.0: Report total bytes for progress tracking
|
|
38
|
+
if (progressHooks?.onBytesProcessed) {
|
|
39
|
+
progressHooks.onBytesProcessed(0);
|
|
40
|
+
}
|
|
41
|
+
if (progressHooks?.onCurrentItem) {
|
|
42
|
+
progressHooks.onCurrentItem('Detecting CSV encoding and delimiter...');
|
|
43
|
+
}
|
|
35
44
|
// Detect encoding
|
|
36
45
|
const detectedEncoding = options.encoding || this.detectEncodingSafe(buffer);
|
|
37
46
|
const text = buffer.toString(detectedEncoding);
|
|
38
47
|
// Detect delimiter if not specified
|
|
39
48
|
const delimiter = options.csvDelimiter || this.detectDelimiter(text);
|
|
49
|
+
// v4.5.0: Report progress - parsing started
|
|
50
|
+
if (progressHooks?.onCurrentItem) {
|
|
51
|
+
progressHooks.onCurrentItem(`Parsing CSV rows (delimiter: "${delimiter}")...`);
|
|
52
|
+
}
|
|
40
53
|
// Parse CSV
|
|
41
54
|
const hasHeaders = options.csvHeaders !== false;
|
|
42
55
|
const maxRows = options.maxRows;
|
|
@@ -50,19 +63,38 @@ export class CSVHandler extends BaseFormatHandler {
|
|
|
50
63
|
to: maxRows,
|
|
51
64
|
cast: false // We'll do type inference ourselves
|
|
52
65
|
});
|
|
66
|
+
// v4.5.0: Report bytes processed (entire file parsed)
|
|
67
|
+
if (progressHooks?.onBytesProcessed) {
|
|
68
|
+
progressHooks.onBytesProcessed(totalBytes);
|
|
69
|
+
}
|
|
53
70
|
// Convert to array of objects
|
|
54
71
|
const data = Array.isArray(records) ? records : [records];
|
|
72
|
+
// v4.5.0: Report data extraction progress
|
|
73
|
+
if (progressHooks?.onDataExtracted) {
|
|
74
|
+
progressHooks.onDataExtracted(data.length, data.length);
|
|
75
|
+
}
|
|
76
|
+
if (progressHooks?.onCurrentItem) {
|
|
77
|
+
progressHooks.onCurrentItem(`Extracted ${data.length} rows, inferring types...`);
|
|
78
|
+
}
|
|
55
79
|
// Infer types and convert values
|
|
56
80
|
const fields = data.length > 0 ? Object.keys(data[0]) : [];
|
|
57
81
|
const types = this.inferFieldTypes(data);
|
|
58
|
-
const convertedData = data.map(row => {
|
|
82
|
+
const convertedData = data.map((row, index) => {
|
|
59
83
|
const converted = {};
|
|
60
84
|
for (const [key, value] of Object.entries(row)) {
|
|
61
85
|
converted[key] = this.convertValue(value, types[key] || 'string');
|
|
62
86
|
}
|
|
87
|
+
// v4.5.0: Report progress every 1000 rows
|
|
88
|
+
if (progressHooks?.onCurrentItem && index > 0 && index % 1000 === 0) {
|
|
89
|
+
progressHooks.onCurrentItem(`Converting types: ${index}/${data.length} rows...`);
|
|
90
|
+
}
|
|
63
91
|
return converted;
|
|
64
92
|
});
|
|
65
93
|
const processingTime = Date.now() - startTime;
|
|
94
|
+
// v4.5.0: Final progress update
|
|
95
|
+
if (progressHooks?.onCurrentItem) {
|
|
96
|
+
progressHooks.onCurrentItem(`CSV processing complete: ${convertedData.length} rows`);
|
|
97
|
+
}
|
|
66
98
|
return {
|
|
67
99
|
format: this.format,
|
|
68
100
|
data: convertedData,
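Editorial illustration (not part of the diff): the throttled reporting added to the row-conversion loop can be tried in isolation. This is a rough sketch, assuming only the optional `onCurrentItem` hook and the 1000-row interval shown in the hunk above; `convertRows` and its `every` parameter are hypothetical names introduced here.

```javascript
// Hypothetical stand-alone version of the conversion loop with throttled messages.
// progressHooks may be undefined; optional chaining keeps the call safe.
function convertRows(rows, convertRow, progressHooks, every = 1000) {
  return rows.map((row, index) => {
    const converted = convertRow(row);
    // Skip index 0 and report only on multiples of `every` to avoid flooding the UI.
    if (progressHooks?.onCurrentItem && index > 0 && index % every === 0) {
      progressHooks.onCurrentItem(`Converting types: ${index}/${rows.length} rows...`);
    }
    return converted;
  });
}

const messages = [];
const rows = Array.from({ length: 2500 }, (_, i) => ({ n: String(i) }));
const out = convertRows(rows, (r) => ({ n: Number(r.n) }), {
  onCurrentItem: (msg) => messages.push(msg),
});
console.log(messages); // two messages: at row 1000 and at row 2000
console.log(out.length);
```

Because the condition requires `index > 0`, inputs shorter than 1001 rows produce no intermediate messages at all; only the start and completion messages from the handler would be seen.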
|
package/dist/augmentations/intelligentImport/handlers/excelHandler.js
CHANGED
|
@@ -19,8 +19,17 @@ export class ExcelHandler extends BaseFormatHandler {
|
|
|
19
19
|
}
|
|
20
20
|
async process(data, options) {
|
|
21
21
|
const startTime = Date.now();
|
|
22
|
+
const progressHooks = options.progressHooks;
|
|
22
23
|
// Convert to buffer if string (though Excel should always be binary)
|
|
23
24
|
const buffer = Buffer.isBuffer(data) ? data : Buffer.from(data, 'binary');
|
|
25
|
+
const totalBytes = buffer.length;
|
|
26
|
+
// v4.5.0: Report start
|
|
27
|
+
if (progressHooks?.onBytesProcessed) {
|
|
28
|
+
progressHooks.onBytesProcessed(0);
|
|
29
|
+
}
|
|
30
|
+
if (progressHooks?.onCurrentItem) {
|
|
31
|
+
progressHooks.onCurrentItem('Loading Excel workbook...');
|
|
32
|
+
}
|
|
24
33
|
try {
|
|
25
34
|
// Read workbook
|
|
26
35
|
const workbook = XLSX.read(buffer, {
|
|
@@ -31,10 +40,19 @@ export class ExcelHandler extends BaseFormatHandler {
|
|
|
31
40
|
});
|
|
32
41
|
// Determine which sheets to process
|
|
33
42
|
const sheetsToProcess = this.getSheetsToProcess(workbook, options);
|
|
43
|
+
// v4.5.0: Report workbook loaded
|
|
44
|
+
if (progressHooks?.onCurrentItem) {
|
|
45
|
+
progressHooks.onCurrentItem(`Processing ${sheetsToProcess.length} sheets...`);
|
|
46
|
+
}
|
|
34
47
|
// Extract data from sheets
|
|
35
48
|
const allData = [];
|
|
36
49
|
const sheetMetadata = {};
|
|
37
|
-
for (
|
|
50
|
+
for (let sheetIndex = 0; sheetIndex < sheetsToProcess.length; sheetIndex++) {
|
|
51
|
+
const sheetName = sheetsToProcess[sheetIndex];
|
|
52
|
+
// v4.5.0: Report current sheet
|
|
53
|
+
if (progressHooks?.onCurrentItem) {
|
|
54
|
+
progressHooks.onCurrentItem(`Reading sheet: ${sheetName} (${sheetIndex + 1}/${sheetsToProcess.length})`);
|
|
55
|
+
}
|
|
38
56
|
const sheet = workbook.Sheets[sheetName];
|
|
39
57
|
if (!sheet)
|
|
40
58
|
continue;
|
|
@@ -75,12 +93,28 @@ export class ExcelHandler extends BaseFormatHandler {
|
|
|
75
93
|
columnCount: headers.length,
|
|
76
94
|
headers
|
|
77
95
|
};
|
|
96
|
+
// v4.5.0: Estimate bytes processed (sheets are sequential)
|
|
97
|
+
const bytesProcessed = Math.floor(((sheetIndex + 1) / sheetsToProcess.length) * totalBytes);
|
|
98
|
+
if (progressHooks?.onBytesProcessed) {
|
|
99
|
+
progressHooks.onBytesProcessed(bytesProcessed);
|
|
100
|
+
}
|
|
101
|
+
// v4.5.0: Report extraction progress
|
|
102
|
+
if (progressHooks?.onDataExtracted) {
|
|
103
|
+
progressHooks.onDataExtracted(allData.length, undefined); // Total unknown until complete
|
|
104
|
+
}
|
|
105
|
+
}
|
|
106
|
+
// v4.5.0: Report data extraction complete
|
|
107
|
+
if (progressHooks?.onCurrentItem) {
|
|
108
|
+
progressHooks.onCurrentItem(`Extracted ${allData.length} rows, inferring types...`);
|
|
109
|
+
}
|
|
110
|
+
if (progressHooks?.onDataExtracted) {
|
|
111
|
+
progressHooks.onDataExtracted(allData.length, allData.length);
|
|
78
112
|
}
|
|
79
113
|
// Infer types (excluding _sheet field)
|
|
80
114
|
const fields = allData.length > 0 ? Object.keys(allData[0]).filter(k => k !== '_sheet') : [];
|
|
81
115
|
const types = this.inferFieldTypes(allData);
|
|
82
116
|
// Convert values to appropriate types
|
|
83
|
-
const convertedData = allData.map(row => {
|
|
117
|
+
const convertedData = allData.map((row, index) => {
|
|
84
118
|
const converted = {};
|
|
85
119
|
for (const [key, value] of Object.entries(row)) {
|
|
86
120
|
if (key === '_sheet') {
|
|
@@ -90,9 +124,21 @@ export class ExcelHandler extends BaseFormatHandler {
|
|
|
90
124
|
converted[key] = this.convertValue(value, types[key] || 'string');
|
|
91
125
|
}
|
|
92
126
|
}
|
|
127
|
+
// v4.5.0: Report progress every 1000 rows (avoid spam)
|
|
128
|
+
if (progressHooks?.onCurrentItem && index > 0 && index % 1000 === 0) {
|
|
129
|
+
progressHooks.onCurrentItem(`Converting types: ${index}/${allData.length} rows...`);
|
|
130
|
+
}
|
|
93
131
|
return converted;
|
|
94
132
|
});
|
|
133
|
+
// v4.5.0: Final progress - all bytes processed
|
|
134
|
+
if (progressHooks?.onBytesProcessed) {
|
|
135
|
+
progressHooks.onBytesProcessed(totalBytes);
|
|
136
|
+
}
|
|
95
137
|
const processingTime = Date.now() - startTime;
|
|
138
|
+
// v4.5.0: Report completion
|
|
139
|
+
if (progressHooks?.onCurrentItem) {
|
|
140
|
+
progressHooks.onCurrentItem(`Excel complete: ${sheetsToProcess.length} sheets, ${convertedData.length} rows`);
|
|
141
|
+
}
|
|
96
142
|
return {
|
|
97
143
|
format: this.format,
|
|
98
144
|
data: convertedData,
|
package/dist/augmentations/intelligentImport/handlers/pdfHandler.js
CHANGED
|
@@ -42,8 +42,17 @@ export class PDFHandler extends BaseFormatHandler {
|
|
|
42
42
|
}
|
|
43
43
|
async process(data, options) {
|
|
44
44
|
const startTime = Date.now();
|
|
45
|
+
const progressHooks = options.progressHooks;
|
|
45
46
|
// Convert to buffer
|
|
46
47
|
const buffer = Buffer.isBuffer(data) ? data : Buffer.from(data, 'binary');
|
|
48
|
+
const totalBytes = buffer.length;
|
|
49
|
+
// v4.5.0: Report start
|
|
50
|
+
if (progressHooks?.onBytesProcessed) {
|
|
51
|
+
progressHooks.onBytesProcessed(0);
|
|
52
|
+
}
|
|
53
|
+
if (progressHooks?.onCurrentItem) {
|
|
54
|
+
progressHooks.onCurrentItem('Loading PDF document...');
|
|
55
|
+
}
|
|
47
56
|
try {
|
|
48
57
|
// Load PDF document
|
|
49
58
|
const loadingTask = pdfjsLib.getDocument({
|
|
@@ -55,11 +64,19 @@ export class PDFHandler extends BaseFormatHandler {
|
|
|
55
64
|
// Extract metadata
|
|
56
65
|
const metadata = await pdfDoc.getMetadata();
|
|
57
66
|
const numPages = pdfDoc.numPages;
|
|
67
|
+
// v4.5.0: Report document loaded
|
|
68
|
+
if (progressHooks?.onCurrentItem) {
|
|
69
|
+
progressHooks.onCurrentItem(`Processing ${numPages} pages...`);
|
|
70
|
+
}
|
|
58
71
|
// Extract text and structure from all pages
|
|
59
72
|
const allData = [];
|
|
60
73
|
let totalTextLength = 0;
|
|
61
74
|
let detectedTables = 0;
|
|
62
75
|
for (let pageNum = 1; pageNum <= numPages; pageNum++) {
|
|
76
|
+
// v4.5.0: Report current page
|
|
77
|
+
if (progressHooks?.onCurrentItem) {
|
|
78
|
+
progressHooks.onCurrentItem(`Processing page ${pageNum} of ${numPages}`);
|
|
79
|
+
}
|
|
63
80
|
const page = await pdfDoc.getPage(pageNum);
|
|
64
81
|
const textContent = await page.getTextContent();
|
|
65
82
|
// Extract text items with positions
|
|
@@ -96,8 +113,28 @@ export class PDFHandler extends BaseFormatHandler {
|
|
|
96
113
|
});
|
|
97
114
|
}
|
|
98
115
|
}
|
|
116
|
+
// v4.5.0: Estimate bytes processed (pages are sequential)
|
|
117
|
+
const bytesProcessed = Math.floor((pageNum / numPages) * totalBytes);
|
|
118
|
+
if (progressHooks?.onBytesProcessed) {
|
|
119
|
+
progressHooks.onBytesProcessed(bytesProcessed);
|
|
120
|
+
}
|
|
121
|
+
// v4.5.0: Report extraction progress
|
|
122
|
+
if (progressHooks?.onDataExtracted) {
|
|
123
|
+
progressHooks.onDataExtracted(allData.length, undefined); // Total unknown until complete
|
|
124
|
+
}
|
|
125
|
+
}
|
|
126
|
+
// v4.5.0: Final progress - all bytes processed
|
|
127
|
+
if (progressHooks?.onBytesProcessed) {
|
|
128
|
+
progressHooks.onBytesProcessed(totalBytes);
|
|
129
|
+
}
|
|
130
|
+
if (progressHooks?.onDataExtracted) {
|
|
131
|
+
progressHooks.onDataExtracted(allData.length, allData.length);
|
|
99
132
|
}
|
|
100
133
|
const processingTime = Date.now() - startTime;
|
|
134
|
+
// v4.5.0: Report completion
|
|
135
|
+
if (progressHooks?.onCurrentItem) {
|
|
136
|
+
progressHooks.onCurrentItem(`PDF complete: ${numPages} pages, ${allData.length} items extracted`);
|
|
137
|
+
}
|
|
101
138
|
// Get all unique fields (excluding metadata fields)
|
|
102
139
|
const fields = allData.length > 0
|
|
103
140
|
? Object.keys(allData[0]).filter(k => !k.startsWith('_'))
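Editorial illustration (not part of the diff): both the PDF and Excel handlers estimate bytes processed by assuming each page or sheet accounts for an equal share of the file. A minimal sketch of that arithmetic follows; `estimateBytesProcessed` is a hypothetical name, and the guard for an empty document is an assumption added here, not something the handlers contain.

```javascript
// Hypothetical helper mirroring Math.floor((pageNum / numPages) * totalBytes).
// This is an estimate: real pages and sheets rarely have equal byte sizes.
function estimateBytesProcessed(unitsDone, totalUnits, totalBytes) {
  // Assumption: with nothing to iterate, treat the file as fully processed.
  if (totalUnits <= 0) return totalBytes;
  return Math.floor((unitsDone / totalUnits) * totalBytes);
}

const totalBytes = 1000;
const numPages = 3;
const estimates = [];
for (let pageNum = 1; pageNum <= numPages; pageNum++) {
  estimates.push(estimateBytesProcessed(pageNum, numPages, totalBytes));
}
console.log(estimates); // [ 333, 666, 1000 ]
```

The last unit always lands exactly on `totalBytes`, so the explicit final `onBytesProcessed(totalBytes)` call in the handlers is a safety net rather than a correction.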
|
package/dist/augmentations/intelligentImport/types.d.ts
CHANGED
|
@@ -2,6 +2,29 @@
|
|
|
2
2
|
* Types for Intelligent Import Augmentation
|
|
3
3
|
* Handles Excel, PDF, and CSV import with intelligent extraction
|
|
4
4
|
*/
|
|
5
|
+
/**
|
|
6
|
+
* Progress hooks for format handlers
|
|
7
|
+
*
|
|
8
|
+
* Handlers call these hooks to report progress during processing.
|
|
9
|
+
* This enables real-time progress tracking for any file format.
|
|
10
|
+
*/
|
|
11
|
+
export interface FormatHandlerProgressHooks {
|
|
12
|
+
/**
|
|
13
|
+
* Report bytes processed
|
|
14
|
+
* Call this as you read/parse the file
|
|
15
|
+
*/
|
|
16
|
+
onBytesProcessed?: (bytes: number) => void;
|
|
17
|
+
/**
|
|
18
|
+
* Set current processing context
|
|
19
|
+
* Examples: "Processing page 5", "Reading sheet: Q2 Sales"
|
|
20
|
+
*/
|
|
21
|
+
onCurrentItem?: (item: string) => void;
|
|
22
|
+
/**
|
|
23
|
+
* Report structured data extraction progress
|
|
24
|
+
* Examples: "Extracted 100 rows", "Parsed 50 paragraphs"
|
|
25
|
+
*/
|
|
26
|
+
onDataExtracted?: (count: number, total?: number) => void;
|
|
27
|
+
}
|
|
5
28
|
export interface FormatHandler {
|
|
6
29
|
/**
|
|
7
30
|
* Format name (e.g., 'csv', 'xlsx', 'pdf')
|
|
@@ -47,6 +70,16 @@ export interface FormatHandlerOptions {
|
|
|
47
70
|
maxRows?: number;
|
|
48
71
|
/** Whether to stream large files */
|
|
49
72
|
streaming?: boolean;
|
|
73
|
+
/**
|
|
74
|
+
* Progress hooks (v4.5.0)
|
|
75
|
+
* Handlers call these to report progress during processing
|
|
76
|
+
*/
|
|
77
|
+
progressHooks?: FormatHandlerProgressHooks;
|
|
78
|
+
/**
|
|
79
|
+
* Total file size in bytes (v4.5.0)
|
|
80
|
+
* Used for progress percentage calculation
|
|
81
|
+
*/
|
|
82
|
+
totalBytes?: number;
|
|
50
83
|
}
|
|
51
84
|
export interface ProcessedData {
|
|
52
85
|
/** Format that was processed */
|
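Editorial illustration (not part of the diff): a caller might implement the three optional hooks from `FormatHandlerProgressHooks` along the following lines. This is a sketch only; `createProgressRecorder` and the `state` object are hypothetical, and the percentage formula is an assumption about how `totalBytes` could be used, since the declaration only says it is "used for progress percentage calculation".

```javascript
// Hypothetical recorder: provides onBytesProcessed, onCurrentItem and
// onDataExtracted with the signatures declared in FormatHandlerProgressHooks.
function createProgressRecorder(totalBytes) {
  const state = { bytes: 0, percent: 0, currentItem: '', extracted: 0, total: undefined };
  const hooks = {
    onBytesProcessed(bytes) {
      state.bytes = bytes;
      // Assumed formula: clamp to 100 and avoid dividing by zero for empty input.
      state.percent = totalBytes > 0
        ? Math.min(100, Math.round((bytes / totalBytes) * 100))
        : 0;
    },
    onCurrentItem(item) {
      state.currentItem = item;
    },
    onDataExtracted(count, total) {
      state.extracted = count;
      state.total = total; // may be undefined while the total is still unknown
    },
  };
  return { hooks, state };
}

const { hooks, state } = createProgressRecorder(2048);
hooks.onBytesProcessed(0);
hooks.onCurrentItem('Reading sheet: Q2 Sales');
hooks.onBytesProcessed(512);
hooks.onDataExtracted(100, undefined);
console.log(state);
```

Passing `hooks` as `options.progressHooks` is what the handlers in this diff read; every hook is optional, so a caller can supply only the ones it needs.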
package/dist/brainy.d.ts
CHANGED
|
@@ -537,6 +537,27 @@ export declare class Brainy<T = any> implements BrainyInterface<T> {
|
|
|
537
537
|
* console.error('Search failed:', error)
|
|
538
538
|
* return []
|
|
539
539
|
* }
|
|
540
|
+
*
|
|
541
|
+
* @example
|
|
542
|
+
* // VFS Filtering (v4.4.0): Exclude VFS entities by default
|
|
543
|
+
* // Knowledge graph queries stay clean - no VFS files in results
|
|
544
|
+
* const knowledge = await brainy.find({ query: 'AI concepts' })
|
|
545
|
+
* // Returns only knowledge entities, VFS files excluded
|
|
546
|
+
*
|
|
547
|
+
* @example
|
|
548
|
+
* // Include VFS entities when needed
|
|
549
|
+
* const everything = await brainy.find({
|
|
550
|
+
* query: 'documentation',
|
|
551
|
+
* includeVFS: true // Opt-in to include VFS files
|
|
552
|
+
* })
|
|
553
|
+
* // Returns both knowledge entities AND VFS files
|
|
554
|
+
*
|
|
555
|
+
* @example
|
|
556
|
+
* // Search only VFS files
|
|
557
|
+
* const files = await brainy.find({
|
|
558
|
+
* where: { vfsType: 'file', extension: '.md' },
|
|
559
|
+
* includeVFS: true // Required to find VFS entities
|
|
560
|
+
* })
|
|
540
561
|
*/
|
|
541
562
|
find(query: string | FindParams<T>): Promise<Result<T>[]>;
|
|
542
563
|
/**
|
|
@@ -779,9 +800,27 @@ export declare class Brainy<T = any> implements BrainyInterface<T> {
|
|
|
779
800
|
* groupBy: 'type', // Organize by entity type
|
|
780
801
|
* preserveSource: true, // Keep original file
|
|
781
802
|
*
|
|
782
|
-
* // Progress tracking
|
|
783
|
-
* onProgress: (p) =>
|
|
803
|
+
* // Progress tracking (v4.5.0 - STANDARDIZED FOR ALL 7 FORMATS!)
|
|
804
|
+
* onProgress: (p) => {
|
|
805
|
+
* console.log(`[${p.stage}] ${p.message}`)
|
|
806
|
+
* console.log(`Entities: ${p.entities || 0}, Rels: ${p.relationships || 0}`)
|
|
807
|
+
* if (p.throughput) console.log(`Rate: ${p.throughput.toFixed(1)}/sec`)
|
|
808
|
+
* }
|
|
784
809
|
* })
|
|
810
|
+
* // THIS SAME HANDLER WORKS FOR CSV, PDF, Excel, JSON, Markdown, YAML, DOCX!
|
|
811
|
+
* ```
|
|
812
|
+
*
|
|
813
|
+
* @example Universal Progress Handler (v4.5.0)
|
|
814
|
+
* ```typescript
|
|
815
|
+
* // ONE handler for ALL 7 formats - no format-specific code needed!
|
|
816
|
+
* const universalProgress = (p) => {
|
|
817
|
+
* updateUI(p.stage, p.message, p.entities, p.relationships)
|
|
818
|
+
* }
|
|
819
|
+
*
|
|
820
|
+
* await brain.import(csvBuffer, { onProgress: universalProgress })
|
|
821
|
+
* await brain.import(pdfBuffer, { onProgress: universalProgress })
|
|
822
|
+
* await brain.import(excelBuffer, { onProgress: universalProgress })
|
|
823
|
+
* // Works for JSON, Markdown, YAML, DOCX too!
|
|
785
824
|
* ```
|
|
786
825
|
*
|
|
787
826
|
* @example Performance Tuning (Large Files)
|
|
@@ -806,6 +845,7 @@ export declare class Brainy<T = any> implements BrainyInterface<T> {
|
|
|
806
845
|
*
|
|
807
846
|
* @see {@link https://brainy.dev/docs/api/import API Documentation}
|
|
808
847
|
* @see {@link https://brainy.dev/docs/guides/migrating-to-v4 Migration Guide}
|
|
848
|
+
* @see {@link https://brainy.dev/docs/guides/standard-import-progress Standard Progress API (v4.5.0)}
|
|
809
849
|
*
|
|
810
850
|
* @remarks
|
|
811
851
|
* **⚠️ Breaking Changes from v3.x:**
|
|
@@ -836,7 +876,7 @@ export declare class Brainy<T = any> implements BrainyInterface<T> {
|
|
|
836
876
|
* - Reduced confusion (removed redundant options)
|
|
837
877
|
*/
|
|
838
878
|
import(source: Buffer | string | object, options?: {
|
|
839
|
-
format?: 'excel' | 'pdf' | 'csv' | 'json' | 'markdown';
|
|
879
|
+
format?: 'excel' | 'pdf' | 'csv' | 'json' | 'markdown' | 'yaml' | 'docx';
|
|
840
880
|
vfsPath?: string;
|
|
841
881
|
groupBy?: 'type' | 'sheet' | 'flat' | 'custom';
|
|
842
882
|
customGrouping?: (entity: any) => string;
|
package/dist/brainy.js
CHANGED
|
@@ -1012,6 +1012,27 @@ export class Brainy {
|
|
|
1012
1012
|
* console.error('Search failed:', error)
|
|
1013
1013
|
* return []
|
|
1014
1014
|
* }
|
|
1015
|
+
*
|
|
1016
|
+
* @example
|
|
1017
|
+
* // VFS Filtering (v4.4.0): Exclude VFS entities by default
|
|
1018
|
+
* // Knowledge graph queries stay clean - no VFS files in results
|
|
1019
|
+
* const knowledge = await brainy.find({ query: 'AI concepts' })
|
|
1020
|
+
* // Returns only knowledge entities, VFS files excluded
|
|
1021
|
+
*
|
|
1022
|
+
* @example
|
|
1023
|
+
* // Include VFS entities when needed
|
|
1024
|
+
* const everything = await brainy.find({
|
|
1025
|
+
* query: 'documentation',
|
|
1026
|
+
* includeVFS: true // Opt-in to include VFS files
|
|
1027
|
+
* })
|
|
1028
|
+
* // Returns both knowledge entities AND VFS files
|
|
1029
|
+
*
|
|
1030
|
+
* @example
|
|
1031
|
+
* // Search only VFS files
|
|
1032
|
+
* const files = await brainy.find({
|
|
1033
|
+
* where: { vfsType: 'file', extension: '.md' },
|
|
1034
|
+
* includeVFS: true // Required to find VFS entities
|
|
1035
|
+
* })
|
|
1015
1036
|
*/
|
|
1016
1037
|
async find(query) {
|
|
1017
1038
|
await this.ensureInitialized();
|
|
@@ -1056,6 +1077,12 @@ export class Brainy {
|
|
|
1056
1077
|
Object.assign(filter, params.where);
|
|
1057
1078
|
if (params.service)
|
|
1058
1079
|
filter.service = params.service;
|
|
1080
|
+
// v4.3.3: Exclude VFS entities by default (Option 3C architecture)
|
|
1081
|
+
// Only include VFS if explicitly requested via includeVFS: true
|
|
1082
|
+
// BUT: Don't add automatic exclusion if user explicitly queries isVFS in where clause
|
|
1083
|
+
if (params.includeVFS !== true && !params.where?.hasOwnProperty('isVFS')) {
|
|
1084
|
+
filter.isVFS = { notEquals: true };
|
|
1085
|
+
}
|
|
1059
1086
|
if (params.type) {
|
|
1060
1087
|
const types = Array.isArray(params.type) ? params.type : [params.type];
|
|
1061
1088
|
if (types.length === 1) {
|
|
@@ -1088,14 +1115,33 @@ export class Brainy {
|
|
|
1088
1115
|
if (!hasVectorSearchCriteria && !hasFilterCriteria && !hasGraphCriteria) {
|
|
1089
1116
|
const limit = params.limit || 20;
|
|
1090
1117
|
const offset = params.offset || 0;
|
|
1091
|
-
|
|
1092
|
-
|
|
1093
|
-
|
|
1094
|
-
|
|
1095
|
-
|
|
1096
|
-
|
|
1097
|
-
|
|
1098
|
-
|
|
1118
|
+
// v4.3.3: Apply VFS filtering even for empty queries
|
|
1119
|
+
let filter = {};
|
|
1120
|
+
if (params.includeVFS !== true) {
|
|
1121
|
+
filter.isVFS = { notEquals: true };
|
|
1122
|
+
}
|
|
1123
|
+
// Use metadata index if we need to filter VFS
|
|
1124
|
+
if (Object.keys(filter).length > 0) {
|
|
1125
|
+
const filteredIds = await this.metadataIndex.getIdsForFilter(filter);
|
|
1126
|
+
const pageIds = filteredIds.slice(offset, offset + limit);
|
|
1127
|
+
for (const id of pageIds) {
|
|
1128
|
+
const entity = await this.get(id);
|
|
1129
|
+
if (entity) {
|
|
1130
|
+
results.push(this.createResult(id, 1.0, entity));
|
|
1131
|
+
}
|
|
1132
|
+
}
|
|
1133
|
+
}
|
|
1134
|
+
else {
|
|
1135
|
+
// No filtering needed, use direct storage query
|
|
1136
|
+
const storageResults = await this.storage.getNouns({
|
|
1137
|
+
pagination: { limit: limit + offset, offset: 0 }
|
|
1138
|
+
});
|
|
1139
|
+
for (let i = offset; i < Math.min(offset + limit, storageResults.items.length); i++) {
|
|
1140
|
+
const noun = storageResults.items[i];
|
|
1141
|
+
if (noun) {
|
|
1142
|
+
const entity = await this.convertNounToEntity(noun);
|
|
1143
|
+
results.push(this.createResult(noun.id, 1.0, entity));
|
|
1144
|
+
}
|
|
1099
1145
|
}
|
|
1100
1146
|
}
|
|
1101
1147
|
return results;
|
|
@@ -1129,7 +1175,7 @@ export class Brainy {
|
|
|
1129
1175
|
results = Array.from(uniqueResults.values());
|
|
1130
1176
|
}
|
|
1131
1177
|
// Apply O(log n) metadata filtering using core MetadataIndexManager
|
|
1132
|
-
if (params.where || params.type || params.service) {
|
|
1178
|
+
if (params.where || params.type || params.service || params.includeVFS !== true) {
|
|
1133
1179
|
// Build filter object for metadata index
|
|
1134
1180
|
let filter = {};
|
|
1135
1181
|
// Base filter from where and service
|
|
@@ -1137,6 +1183,11 @@ export class Brainy {
|
|
|
1137
1183
|
Object.assign(filter, params.where);
|
|
1138
1184
|
if (params.service)
|
|
1139
1185
|
filter.service = params.service;
|
|
1186
|
+
// v4.3.3: Exclude VFS entities by default (Option 3C architecture)
|
|
1187
|
+
// BUT: Don't add automatic exclusion if user explicitly queries isVFS in where clause
|
|
1188
|
+
if (params.includeVFS !== true && !params.where?.hasOwnProperty('isVFS')) {
|
|
1189
|
+
filter.isVFS = { notEquals: true };
|
|
1190
|
+
}
|
|
1140
1191
|
if (params.type) {
|
|
1141
1192
|
const types = Array.isArray(params.type) ? params.type : [params.type];
|
|
1142
1193
|
if (types.length === 1) {
|
|
@@ -1361,7 +1412,8 @@ export class Brainy {
|
|
|
1361
1412
|
limit: params.limit,
|
|
1362
1413
|
type: params.type,
|
|
1363
1414
|
where: params.where,
|
|
1364
|
-
service: params.service
|
|
1415
|
+
service: params.service,
|
|
1416
|
+
includeVFS: params.includeVFS // v4.4.0: Pass through VFS filtering
|
|
1365
1417
|
});
|
|
1366
1418
|
}
|
|
1367
1419
|
// ============= BATCH OPERATIONS =============
|
|
@@ -1705,9 +1757,27 @@ export class Brainy {
|
|
|
1705
1757
|
* groupBy: 'type', // Organize by entity type
|
|
1706
1758
|
* preserveSource: true, // Keep original file
|
|
1707
1759
|
*
|
|
1708
|
-
* // Progress tracking
|
|
1709
|
-
* onProgress: (p) =>
|
|
1760
|
+
* // Progress tracking (v4.5.0 - STANDARDIZED FOR ALL 7 FORMATS!)
|
|
1761
|
+
* onProgress: (p) => {
|
|
1762
|
+
* console.log(`[${p.stage}] ${p.message}`)
|
|
1763
|
+
* console.log(`Entities: ${p.entities || 0}, Rels: ${p.relationships || 0}`)
|
|
1764
|
+
* if (p.throughput) console.log(`Rate: ${p.throughput.toFixed(1)}/sec`)
|
|
1765
|
+
* }
|
|
1710
1766
|
* })
|
|
1767
|
+
* // THIS SAME HANDLER WORKS FOR CSV, PDF, Excel, JSON, Markdown, YAML, DOCX!
|
|
1768
|
+
* ```
|
|
1769
|
+
*
|
|
1770
|
+
* @example Universal Progress Handler (v4.5.0)
|
|
1771
|
+
* ```typescript
|
|
1772
|
+
* // ONE handler for ALL 7 formats - no format-specific code needed!
|
|
1773
|
+
* const universalProgress = (p) => {
|
|
1774
|
+
* updateUI(p.stage, p.message, p.entities, p.relationships)
|
|
1775
|
+
* }
|
|
1776
|
+
*
|
|
1777
|
+
* await brain.import(csvBuffer, { onProgress: universalProgress })
|
|
1778
|
+
* await brain.import(pdfBuffer, { onProgress: universalProgress })
|
|
1779
|
+
* await brain.import(excelBuffer, { onProgress: universalProgress })
|
|
1780
|
+
* // Works for JSON, Markdown, YAML, DOCX too!
|
|
1711
1781
|
* ```
|
|
1712
1782
|
*
|
|
1713
1783
|
* @example Performance Tuning (Large Files)
|
|
@@ -1732,6 +1802,7 @@ export class Brainy {
|
|
|
1732
1802
|
*
|
|
1733
1803
|
* @see {@link https://brainy.dev/docs/api/import API Documentation}
|
|
1734
1804
|
* @see {@link https://brainy.dev/docs/guides/migrating-to-v4 Migration Guide}
|
|
1805
|
+
* @see {@link https://brainy.dev/docs/guides/standard-import-progress Standard Progress API (v4.5.0)}
|
|
1735
1806
|
*
|
|
1736
1807
|
* @remarks
|
|
1737
1808
|
* **⚠️ Breaking Changes from v3.x:**
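Editorial illustration (not part of the diff): the format-independent handler described in the JSDoc above can be written as a pure formatting function so it is easy to test. The field names `stage`, `message`, `entities`, `relationships` and `throughput` come from the JSDoc example; `formatProgress` is a hypothetical name and the sample stage value `'extracting'` is made up for the demonstration.

```javascript
// Hypothetical formatter for the progress object passed to onProgress.
// Missing counters fall back to 0; throughput is printed only when present.
function formatProgress(p) {
  const lines = [
    `[${p.stage}] ${p.message}`,
    `Entities: ${p.entities || 0}, Rels: ${p.relationships || 0}`,
  ];
  if (p.throughput) lines.push(`Rate: ${p.throughput.toFixed(1)}/sec`);
  return lines.join('\n');
}

// Sample values are invented; real stage names are not listed in this diff.
const sample = formatProgress({
  stage: 'extracting',
  message: 'Parsing rows',
  entities: 12,
  throughput: 3.456,
});
console.log(sample);
```

A handler such as `onProgress: (p) => console.log(formatProgress(p))` would then behave identically regardless of the import format.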
|
package/dist/cli/commands/core.d.ts
CHANGED
|
@@ -12,6 +12,8 @@ interface AddOptions extends CoreOptions {
|
|
|
12
12
|
id?: string;
|
|
13
13
|
metadata?: string;
|
|
14
14
|
type?: string;
|
|
15
|
+
confidence?: string;
|
|
16
|
+
weight?: string;
|
|
15
17
|
}
|
|
16
18
|
interface SearchOptions extends CoreOptions {
|
|
17
19
|
limit?: string;
|
|
@@ -25,6 +27,7 @@ interface SearchOptions extends CoreOptions {
|
|
|
25
27
|
via?: string;
|
|
26
28
|
explain?: boolean;
|
|
27
29
|
includeRelations?: boolean;
|
|
30
|
+
includeVfs?: boolean;
|
|
28
31
|
fusion?: string;
|
|
29
32
|
vectorWeight?: string;
|
|
30
33
|
graphWeight?: string;
|