@kodexa-ai/document-wasm-ts 8.0.0-20484695702

Files changed (39)
  1. package/README.md +563 -0
  2. package/dist/ContentNode.d.ts +207 -0
  3. package/dist/ContentNode.d.ts.map +1 -0
  4. package/dist/ContentNode.js +633 -0
  5. package/dist/ContentNode.js.map +1 -0
  6. package/dist/ExtractionEngine.d.ts +65 -0
  7. package/dist/ExtractionEngine.d.ts.map +1 -0
  8. package/dist/ExtractionEngine.js +115 -0
  9. package/dist/ExtractionEngine.js.map +1 -0
  10. package/dist/KddbDocument.d.ts +193 -0
  11. package/dist/KddbDocument.d.ts.map +1 -0
  12. package/dist/KddbDocument.js +575 -0
  13. package/dist/KddbDocument.js.map +1 -0
  14. package/dist/index.d.ts +54 -0
  15. package/dist/index.d.ts.map +1 -0
  16. package/dist/index.js +96 -0
  17. package/dist/index.js.map +1 -0
  18. package/dist/sqljs-bridge.bundle.js +330 -0
  19. package/dist/wasm/browser-bridge.d.ts +44 -0
  20. package/dist/wasm/browser-bridge.d.ts.map +1 -0
  21. package/dist/wasm/browser-bridge.js +104 -0
  22. package/dist/wasm/browser-bridge.js.map +1 -0
  23. package/dist/wasm/loader.d.ts +57 -0
  24. package/dist/wasm/loader.d.ts.map +1 -0
  25. package/dist/wasm/loader.js +193 -0
  26. package/dist/wasm/loader.js.map +1 -0
  27. package/dist/wasm/sqljs-bridge.d.ts +40 -0
  28. package/dist/wasm/sqljs-bridge.d.ts.map +1 -0
  29. package/dist/wasm/sqljs-bridge.js +98 -0
  30. package/dist/wasm/sqljs-bridge.js.map +1 -0
  31. package/dist/wasm/sqljs-core.d.ts +92 -0
  32. package/dist/wasm/sqljs-core.d.ts.map +1 -0
  33. package/dist/wasm/sqljs-core.js +372 -0
  34. package/dist/wasm/sqljs-core.js.map +1 -0
  35. package/dist/wasm/types.d.ts +179 -0
  36. package/dist/wasm/types.d.ts.map +1 -0
  37. package/dist/wasm/types.js +9 -0
  38. package/dist/wasm/types.js.map +1 -0
  39. package/package.json +62 -0
package/README.md ADDED
@@ -0,0 +1,563 @@
1
+ # Kodexa Document Models - TypeScript WASM Wrapper
2
+
3
+ A high-performance TypeScript wrapper for the Kodexa Go library, compiled to WebAssembly. It provides fast document processing in both Node.js and browser environments.
4
+
5
+ ## 🚀 Features
6
+
7
+ - **High Performance**: Direct access to Go library performance through WebAssembly
8
+ - **Cross-Platform**: Works in both Node.js and browsers
9
+ - **Type Safe**: Full TypeScript support with comprehensive type definitions
10
+ - **Memory Efficient**: Proper memory management with automatic cleanup
11
+ - **Complete API**: All Go library functions available through TypeScript interface
12
+
13
+ ## 📦 Installation
14
+
15
+ ```bash
16
+ npm install @kodexa-ai/document-wasm-ts
17
+ ```
18
+
19
+ ## 🏗️ Building from Source
20
+
21
+ ### Prerequisites
22
+
23
+ - Node.js 16+
24
+ - Go 1.22+
25
+ - TypeScript 5.8+
26
+
27
+ ### Build Steps
28
+
29
+ ```bash
30
+ # Install dependencies
31
+ npm install
32
+
33
+ # Build WASM module only (from Go source)
34
+ npm run build:wasm
35
+
36
+ # Build TypeScript library only
37
+ npm run build
38
+
39
+ # Build everything (WASM + TypeScript)
40
+ npm run build:all
41
+
42
+ # Run tests
43
+ npm test
44
+ ```
45
+
46
+ ### Build Scripts
47
+
48
+ - `npm run build:all` - Build both WASM and TypeScript
49
+ - `npm run build:wasm` - Build Go WASM module only (runs `make wasm wasm-support` in lib/go)
50
+ - `npm run build` - Build TypeScript library only
51
+ - `npm test` - Run test suite
52
+ - `npm run clean` - Clean dist artifacts
53
+
54
+ ## 🎯 Quick Start
55
+
56
+ ### Node.js
57
+
58
+ ```typescript
59
+ import { Kodexa } from '@kodexa-ai/document-wasm-ts';
60
+
61
+ async function main() {
62
+   // Initialize WASM module
63
+   await Kodexa.init();
64
+
65
+   // Create document from text
66
+   const document = await Kodexa.fromText('Hello, world!');
67
+
68
+   // Get root node
69
+   const root = await document.getRoot();
70
+   console.log(await root?.getContent()); // "Hello, world!"
71
+
72
+   // Cleanup
73
+   document.dispose();
74
+   Kodexa.cleanup();
75
+ }
76
+
77
+ main().catch(console.error);
78
+ ```
79
+
80
+ ### Browser
81
+
82
+ ```html
83
+ <!DOCTYPE html>
84
+ <html>
85
+ <head>
86
+   <!-- sql.js for in-browser SQLite -->
87
+   <script src="https://cdnjs.cloudflare.com/ajax/libs/sql.js/1.11.0/sql-wasm.js"></script>
88
+   <!-- Bridge script and Go WASM runtime -->
89
+   <script src="node_modules/@kodexa-ai/document-wasm-ts/dist/sqljs-bridge.bundle.js"></script>
90
+   <script src="node_modules/@kodexa-ai/document-wasm-ts/dist/wasm_exec.js"></script>
91
+ </head>
92
+ <body>
93
+   <script type="module">
94
+     import { Kodexa } from './node_modules/@kodexa-ai/document-wasm-ts/dist/index.js';
95
+
96
+     async function run() {
97
+       await Kodexa.init();
98
+       const doc = await Kodexa.fromText('Browser document');
99
+       console.log('Document created!');
100
+       doc.dispose();
101
+     }
102
+
103
+     run().catch(console.error);
104
+ </script>
105
+ </body>
106
+ </html>
107
+ ```
108
+
109
+ ## 📚 API Reference
110
+
111
+ ### Kodexa Class
112
+
113
+ Main entry point for the library:
114
+
115
+ ```typescript
116
+ // Initialize WASM module (required before use)
117
+ await Kodexa.init();
118
+
119
+ // Create documents
120
+ const doc1 = await Kodexa.createDocument();
121
+ const doc2 = await Kodexa.fromText('text content');
122
+ const doc3 = await Kodexa.fromJson('{"data": "json"}');
123
+ const doc4 = await Kodexa.fromKddb('/path/to/file.kddb');
124
+
125
+ // Check if WASM is loaded
126
+ const loaded = Kodexa.isLoaded();
127
+
128
+ // Cleanup resources
129
+ Kodexa.cleanup();
130
+ ```
131
+
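Because `init()` must complete before any other call, applications with several entry points may want a guard that initializes at most once. The sketch below is illustrative only: `WasmLoader` and `makeEnsureLoaded` are hypothetical names, not part of the package, and the interface simply mirrors the `init()` and `isLoaded()` calls shown above.

```typescript
// Hypothetical minimal shape mirroring the two static calls shown above.
interface WasmLoader {
  init(): Promise<void>;
  isLoaded(): boolean;
}

// Returns a function that initializes at most once; concurrent callers
// share the same in-flight promise instead of starting a second init.
function makeEnsureLoaded(loader: WasmLoader): () => Promise<void> {
  let pending: Promise<void> | null = null;
  return () => {
    if (loader.isLoaded()) {
      return Promise.resolve();
    }
    if (pending === null) {
      pending = loader.init().catch((error) => {
        pending = null; // allow a retry after a failed init
        throw error;
      });
    }
    return pending;
  };
}
```

Assuming the static methods on `Kodexa` match this shape, `const ensureLoaded = makeEnsureLoaded(Kodexa)` could then be awaited at the top of each entry point.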
132
+ ### GoDocument Class
133
+
134
+ High-level document operations:
135
+
136
+ ```typescript
137
+ // Create documents
138
+ const doc = await GoDocument.create();
139
+ const textDoc = await GoDocument.fromText('content');
140
+ const jsonDoc = await GoDocument.fromJson('{}');
141
+
142
+ // Document operations
143
+ const root = await doc.getRoot();
144
+ const json = await doc.toJson();
145
+ const kddbBytes = await doc.toKddb();
146
+
147
+ // Node management
148
+ const node = await doc.createNode('paragraph');
149
+ await doc.setContentNode(node);
150
+
151
+ // Selection
152
+ const nodes = await doc.select('paragraph');
153
+ const firstNode = await doc.selectFirst('heading');
154
+
155
+ // Metadata
156
+ await doc.setMetadataValue('key', 'value');
157
+ const value = await doc.getMetadataValue('key');
158
+
159
+ // Cleanup
160
+ doc.dispose();
161
+ ```
162
+
163
+ ### GoContentNode Class
164
+
165
+ Node manipulation and traversal:
166
+
167
+ ```typescript
168
+ // Basic properties
169
+ const nodeType = await node.getNodeType();
170
+ await node.setNodeType('heading');
171
+
172
+ const content = await node.getContent();
173
+ await node.setContent('New content');
174
+
175
+ const index = await node.getIndex();
176
+ await node.setIndex(0);
177
+
178
+ // Hierarchy
179
+ const parent = await node.getParent();
180
+ const children = await node.getChildren();
181
+ const childCount = await node.getChildCount();
182
+ const child = await node.getChild(0);
183
+ await node.addChild(childNode);
184
+
185
+ // Navigation
186
+ const next = await node.nextNode();
187
+ const prev = await node.previousNode();
188
+ const isFirst = await node.isFirstChild();
189
+ const isLast = await node.isLastChild();
190
+
191
+ // Tagging
192
+ await node.tag('important');
193
+ await node.tagWithOptions('label', { confidence: 0.95 });
194
+ const hasTag = await node.hasTag('important');
195
+ await node.removeTag('important');
196
+ const tags = await node.getTags();
197
+
198
+ // Features
199
+ await node.setFeature('style', 'color', ['blue']);
200
+ const feature = await node.getFeature('style', 'color');
201
+ const value = await node.getFeatureValue('style', 'color');
202
+ const hasFeature = await node.hasFeature('style', 'color');
203
+ const features = await node.getFeatures();
204
+ const styleFeatures = await node.getFeaturesOfType('style');
205
+
206
+ // Spatial data
207
+ await node.setBBox(10, 20, 300, 50);
208
+ const bbox = await node.getBBox();
209
+ const x = await node.getX();
210
+ const y = await node.getY();
211
+ await node.setRotate(45);
212
+
213
+ // Selection
214
+ const selected = await node.select('span');
215
+ const first = await node.selectFirst('span');
216
+
217
+ // Cleanup
218
+ node.dispose();
219
+ ```
220
+
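The hierarchy methods above compose into a depth-first traversal. The sketch below is illustrative only: `NodeLike` is a hypothetical minimal interface that mirrors the `getContent()` and `getChildren()` calls listed in this section, and it assumes `getChildren()` resolves to an array, which the README does not state explicitly.

```typescript
// Hypothetical minimal shape; a real content node is assumed to satisfy it.
interface NodeLike {
  getContent(): Promise<string | undefined>;
  getChildren(): Promise<NodeLike[]>;
}

interface ContentEntry {
  depth: number;
  content: string;
}

// Depth-first, pre-order walk that records each node's content and depth.
async function collectContent(
  node: NodeLike,
  depth = 0,
  out: ContentEntry[] = [],
): Promise<ContentEntry[]> {
  const content = await node.getContent();
  if (content !== undefined) {
    out.push({ depth, content });
  }
  for (const child of await node.getChildren()) {
    await collectContent(child, depth + 1, out);
  }
  return out;
}
```

Starting from the result of `doc.getRoot()` (after a null check), this would produce a flat outline of the document.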
221
+ ## 🎨 Examples
222
+
223
+ ### Document Creation and Manipulation
224
+
225
+ ```typescript
226
+ import { Kodexa } from '@kodexa-ai/document-wasm-ts';
227
+
228
+ async function documentExample() {
229
+   await Kodexa.init();
230
+
231
+   // Create document
232
+   const doc = await Kodexa.fromText('Sample document');
233
+   const root = await doc.getRoot();
234
+
235
+   // Create nodes
236
+   const heading = await doc.createNode('heading');
237
+   await heading.setContent('Main Title');
238
+
239
+   const paragraph = await doc.createNode('paragraph');
240
+   await paragraph.setContent('This is content.');
241
+
242
+   // Build hierarchy
243
+   if (root) {
244
+     await root.addChild(heading);
245
+     await root.addChild(paragraph);
246
+   }
247
+
248
+   // Tag and style
249
+   await heading.tag('title');
250
+   await paragraph.setFeature('style', 'font-size', ['14px']);
251
+
252
+   // Serialize
253
+   const json = await doc.toJson();
254
+   console.log('Document JSON:', json);
255
+
256
+   // Cleanup
257
+   doc.dispose();
258
+   Kodexa.cleanup();
259
+ }
260
+ ```
261
+
262
+ ### Advanced Node Operations
263
+
264
+ ```typescript
265
+ async function nodeExample() {
266
+   await Kodexa.init();
267
+
268
+   const doc = await Kodexa.createDocument();
269
+   const node = await doc.createNode('paragraph');
270
+
271
+   // Spatial positioning
272
+   await node.setBBox(100, 200, 400, 50);
273
+   const bbox = await node.getBBox();
274
+   console.log(`Position: ${bbox?.x},${bbox?.y}`);
275
+
276
+   // Multiple features
277
+   await node.setFeature('style', 'color', ['red']);
278
+   await node.setFeature('style', 'weight', ['bold']);
279
+   await node.setFeature('layout', 'margin', ['10px']);
280
+
281
+   // Get all style features
282
+   const styleFeatures = await node.getFeaturesOfType('style');
283
+   console.log('Style features:', styleFeatures);
284
+
285
+   // Navigation example
286
+   const parent = await node.getParent();
287
+   const siblings = parent ? await parent.getChildren() : [];
288
+   const isLast = await node.isLastChild();
289
+
290
+   doc.dispose();
291
+   Kodexa.cleanup();
292
+ }
293
+ ```
294
+
295
+ ### Performance Example
296
+
297
+ ```typescript
298
+ async function performanceExample() {
299
+   await Kodexa.init();
300
+
301
+   const start = Date.now();
302
+   const documents = [];
303
+
304
+   // Create 1000 documents
305
+   for (let i = 0; i < 1000; i++) {
306
+     const doc = await Kodexa.fromText(`Document ${i}`);
307
+     documents.push(doc);
308
+   }
309
+
310
+   const duration = Date.now() - start;
311
+   console.log(`Created 1000 documents in ${duration}ms`);
312
+
313
+   // Cleanup
314
+   documents.forEach(doc => doc.dispose());
315
+   Kodexa.cleanup();
316
+ }
317
+ ```
318
+
319
+ ## 🧪 Testing
320
+
321
+ ```bash
322
+ # Run all tests
323
+ npm test
324
+
325
+ # Run with coverage
326
+ npm run test:coverage
327
+
328
+ # Run specific test
329
+ npm test -- wasm-document.test.ts
330
+
331
+ # Run integration tests (requires WASM build)
332
+ WASM_INTEGRATION_TEST=true npm test
333
+ ```
334
+
335
+ ### HTML Test Files
336
+
337
+ The library includes HTML test files for interactive browser testing. These files must be served via HTTP (not opened directly with `file://`) due to CORS and ES module requirements.
338
+
339
+ **Available test files:**
340
+ - `test-extraction.html` - Test extraction engine functionality
341
+ - `test-queries.html` - Test document query functions (getLines, getNodeTypes, etc.)
342
+ - `test-minimal.html` - Minimal WASM loading and basic functionality test
343
+ - `kddb-compare.html` - Compare kddb file processing between implementations
344
+
345
+ **Serving the test files:**
346
+
347
+ ```bash
348
+ cd lib/typescript
349
+
350
+ # Option 1: Python (built-in)
351
+ python3 -m http.server 8080
352
+
353
+ # Option 2: Node.js http-server
354
+ npx http-server -p 8080
355
+
356
+ # Option 3: Node.js serve
357
+ npx serve -p 8080
358
+ ```
359
+
360
+ Then open `http://localhost:8080/test-queries.html` or `http://localhost:8080/test-extraction.html` in your browser.
361
+
362
+ **Note:** Make sure you've built the WASM module first with `npm run build:all`.
363
+
364
+ ## 🔧 Configuration
365
+
366
+ ### TypeScript Configuration
367
+
368
+ The library includes TypeScript definitions. Configure your `tsconfig.json`:
369
+
370
+ ```json
371
+ {
372
+   "compilerOptions": {
373
+     "target": "ES2020",
374
+     "module": "ESNext",
375
+     "moduleResolution": "node",
376
+     "allowSyntheticDefaultImports": true,
377
+     "esModuleInterop": true,
378
+     "strict": true
379
+   }
380
+ }
381
+ ```
382
+
383
+ ### Webpack Configuration
384
+
385
+ For browser usage with Webpack:
386
+
387
+ ```javascript
388
+ module.exports = {
389
+   resolve: {
390
+     fallback: {
391
+       "fs": false,
392
+       "path": false
393
+     }
394
+   },
395
+   experiments: {
396
+     asyncWebAssembly: true
397
+   }
398
+ };
399
+ ```
400
+
401
+ ## ⚡ Performance
402
+
403
+ The WASM wrapper provides significant performance benefits:
404
+
405
+ - **Document Creation**: ~0.1ms per document
406
+ - **Node Operations**: ~0.01ms per operation
407
+ - **Memory Usage**: ~50% less than pure JS implementations
408
+ - **File I/O**: Native Go performance for KDDB files
409
+
410
+ ### Benchmarks
411
+
412
+ ```
413
+ Operation | Pure JS | WASM | Improvement
414
+ -------------------------|----------|---------|------------
415
+ Create 1000 documents | 500ms | 100ms | 5x faster
416
+ Process large document | 2000ms | 400ms | 5x faster
417
+ Memory usage (1MB doc) | 5MB | 2.5MB | 50% less
418
+ ```
419
+
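To check figures like these on your own hardware, a small timing harness is enough. This is a sketch under stated assumptions: `averageMs` is a hypothetical helper, not part of the package, and because it relies on `Date.now()` it is only meaningful over many iterations.

```typescript
// Hypothetical helper: average wall-clock milliseconds per call of `op`.
async function averageMs(
  op: (iteration: number) => Promise<unknown> | unknown,
  iterations: number,
): Promise<number> {
  if (iterations <= 0) {
    throw new RangeError('iterations must be positive');
  }
  const start = Date.now();
  for (let i = 0; i < iterations; i++) {
    // Await each call so the measurement covers completed work only.
    await op(i);
  }
  return (Date.now() - start) / iterations;
}
```

Passing a callback that calls `Kodexa.fromText` and then `dispose()` on the result would time document creation without accumulating live documents.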
420
+ ## 🔒 Memory Management
421
+
422
+ Proper memory management is crucial for WASM applications:
423
+
424
+ ```typescript
425
+ // Always dispose of documents and nodes
426
+ const doc = await Kodexa.fromText('content');
427
+ try {
428
+   // Use document...
429
+ } finally {
430
+   doc.dispose(); // Free WASM memory
431
+ }
432
+
433
+ // Browser only: clean up when the page unloads
434
+ window.addEventListener('beforeunload', () => {
435
+   Kodexa.cleanup();
436
+ });
437
+ ```
438
+
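The `try`/`finally` pattern above can be factored into a reusable wrapper so that disposal cannot be forgotten. The sketch below is illustrative only: `HasDispose` and `withResource` are hypothetical names, and the only assumption about the library is that documents and nodes expose a synchronous `dispose()`, as the examples in this README show.

```typescript
// Hypothetical minimal shape for anything that owns WASM memory.
interface HasDispose {
  dispose(): void;
}

// Creates a resource, passes it to `use`, and disposes it afterwards,
// whether `use` returns normally or throws.
async function withResource<T extends HasDispose, R>(
  create: () => Promise<T>,
  use: (resource: T) => Promise<R> | R,
): Promise<R> {
  const resource = await create();
  try {
    return await use(resource);
  } finally {
    resource.dispose();
  }
}
```

With the library, `create` would be something like `() => Kodexa.fromText('content')`, and the callback would hold all the code that touches the document.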
439
+ ## 🐛 Troubleshooting
440
+
441
+ ### Common Issues
442
+
443
+ **WASM module not loading:**
444
+ ```typescript
445
+ // Check if WASM is supported
446
+ if (typeof WebAssembly === 'undefined') {
447
+   console.error('WebAssembly not supported');
448
+ }
449
+
450
+ // Check loading
451
+ try {
452
+   await Kodexa.init();
453
+ } catch (error) {
454
+   console.error('WASM init failed:', error);
455
+ }
456
+ ```
457
+
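A support check is easiest to reuse as a small function. The sketch below is one possible form, not part of the package: `isWasmSupported` is a hypothetical helper that reads the property off `globalThis`, which avoids a `ReferenceError` on runtimes where the `WebAssembly` global does not exist at all.

```typescript
// Returns true when the runtime exposes a usable WebAssembly API.
function isWasmSupported(): boolean {
  // Cast to `any` because the global may be absent from the configured typings.
  const wasm = (globalThis as any).WebAssembly;
  return (
    typeof wasm === 'object' &&
    wasm !== null &&
    typeof wasm.instantiate === 'function'
  );
}
```

Calling this before `Kodexa.init()` lets an application show a clear message instead of surfacing a low-level load failure.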
458
+ **Memory leaks:**
459
+ ```typescript
460
+ // Always dispose resources
461
+ const doc = await Kodexa.fromText('content');
462
+ // ... use document
463
+ doc.dispose(); // Required!
464
+
465
+ // Check for undisposed objects
466
+ // Use browser dev tools to monitor memory
467
+ ```
468
+
469
+ **Performance issues:**
470
+ ```typescript
471
+ // Sequential: each call is awaited before the next one starts
472
+ const nodes = [];
473
+ for (let i = 0; i < 1000; i++) {
474
+   nodes.push(await doc.createNode('item'));
475
+ }
476
+
477
+ // Better: start the calls together and await them as one batch
478
+ const batch = await Promise.all(
479
+   Array(1000).fill(0).map(() => doc.createNode('item'))
480
+ );
481
+ ```
482
+
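If starting every call at once puts too much pressure on memory, a middle ground is to run the work in fixed-size chunks. The sketch below is illustrative only: `inChunks` is a hypothetical helper, and whether chunking is faster than a single `Promise.all` for this library has not been measured here.

```typescript
// Runs `task` for indices 0..count-1 in chunks of `chunkSize`, so at most
// `chunkSize` promises are in flight at once. Results keep index order.
async function inChunks<T>(
  count: number,
  chunkSize: number,
  task: (index: number) => Promise<T>,
): Promise<T[]> {
  if (chunkSize <= 0) {
    throw new RangeError('chunkSize must be positive');
  }
  const results: T[] = [];
  for (let start = 0; start < count; start += chunkSize) {
    const end = Math.min(start + chunkSize, count);
    const chunk = await Promise.all(
      Array.from({ length: end - start }, (_, offset) => task(start + offset)),
    );
    results.push(...chunk);
  }
  return results;
}
```

In the scenario above, the task would be `() => doc.createNode('item')` with `count` set to 1000 and a chunk size chosen by experiment.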
483
+ ## 📄 License
484
+
485
+ This project is licensed under the same terms as the main Kodexa project.
486
+
487
+ ## 📦 Release Process
488
+
489
+ This package is automatically published to npm with provenance attestation via GitHub Actions.
490
+
491
+ ### Automatic Publishing
492
+
493
+ The package is **automatically published** on every push to the `main` or `develop` branch that modifies files in `kodexa-document/lib/typescript/`.
494
+
495
+ ### Version Format
496
+
497
+ Versions follow the format: `MAJOR.MINOR.PATCH-BUILDID`
498
+ - Example: `8.0.0-20484605521`
499
+ - The build ID is the GitHub Actions run ID, ensuring unique versions
500
+
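Tooling that needs to inspect these versions can split them with a single regular expression. The sketch below is illustrative only: `ParsedVersion` and `parseVersion` are hypothetical names, and the pattern assumes the build ID is always purely numeric, as in the example above.

```typescript
interface ParsedVersion {
  major: number;
  minor: number;
  patch: number;
  // Kept as a string: run IDs are larger than a 32-bit integer can hold.
  buildId: string;
}

// Parses `MAJOR.MINOR.PATCH-BUILDID`; returns null for any other shape.
function parseVersion(version: string): ParsedVersion | null {
  const match = /^(\d+)\.(\d+)\.(\d+)-(\d+)$/.exec(version);
  if (!match) {
    return null;
  }
  const [, major, minor, patch, buildId] = match;
  return {
    major: Number(major),
    minor: Number(minor),
    patch: Number(patch),
    buildId,
  };
}
```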
501
+ ### How to Release
502
+
503
+ Simply push your changes to `develop` or `main`:
504
+
505
+ ```bash
506
+ # Make your changes to the TypeScript package
507
+ git add .
508
+ git commit -m "feat: add new feature to document API"
509
+ git push origin develop
510
+ ```
511
+
512
+ The workflow will:
513
+ 1. Build the TypeScript package
514
+ 2. Generate a unique version using the GitHub run ID
515
+ 3. Publish to npm with provenance
516
+
517
+ ### Bumping Major/Minor Version
518
+
519
+ To change the base version (e.g., from 8.0.0 to 9.0.0):
520
+
521
+ 1. Update the version in `package.json`:
522
+ ```json
523
+ "version": "9.0.0"
524
+ ```
525
+
526
+ 2. Commit and push:
527
+ ```bash
528
+ git add package.json
529
+ git commit -m "chore: bump base version to 9.0.0"
530
+ git push origin develop
531
+ ```
532
+
533
+ The next build will publish as `9.0.0-<run-id>`.
534
+
535
+ ### Manual Publish (Dry Run)
536
+
537
+ To test publishing without actually releasing:
538
+ 1. Go to [Actions > Publish TypeScript Package](https://github.com/kodexa-ai/platform/actions/workflows/publish-typescript.yml)
539
+ 2. Click "Run workflow"
540
+ 3. Check "Dry run" option
541
+ 4. Click "Run workflow"
542
+
543
+ ### Build Traceability
544
+
545
+ Each published version includes the GitHub Actions run ID, allowing you to trace any version back to its exact build and commit.
546
+
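In practice, tracing a version means turning its build ID into a link to the workflow run. The sketch below is illustrative only: `buildRunUrl` is a hypothetical helper, and the default repository is an assumption taken from the workflow link in the dry-run instructions above.

```typescript
// Assumption: builds run in the repository named in the workflow link above.
const ASSUMED_REPO = 'kodexa-ai/platform';

// Maps a published version to the GitHub Actions run that produced it,
// or returns null when the version carries no numeric build ID suffix.
function buildRunUrl(version: string, repo: string = ASSUMED_REPO): string | null {
  const match = /-(\d+)$/.exec(version);
  return match ? `https://github.com/${repo}/actions/runs/${match[1]}` : null;
}
```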
547
+ ## 🤝 Contributing
548
+
549
+ 1. Fork the repository
550
+ 2. Create your feature branch (`git checkout -b feature/amazing-feature`)
551
+ 3. Commit your changes (`git commit -m 'Add amazing feature'`)
552
+ 4. Push to the branch (`git push origin feature/amazing-feature`)
553
+ 5. Open a Pull Request
554
+
555
+ ## 📞 Support
556
+
557
+ - Documentation: [https://docs.kodexa.com](https://docs.kodexa.com)
558
+ - Issues: [GitHub Issues](https://github.com/kodexa-ai/kodexa-document/issues)
559
+ - Discussions: [GitHub Discussions](https://github.com/kodexa-ai/kodexa-document/discussions)
560
+
561
+ ---
562
+
563
+ Made with ❤️ by the Kodexa team