@kreuzberg/wasm 4.6.3 → 4.7.1
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +27 -29
- package/dist/pkg/README.md +27 -29
- package/dist/pkg/kreuzberg_wasm.js +156 -160
- package/dist/pkg/kreuzberg_wasm_bg.js +156 -160
- package/dist/pkg/kreuzberg_wasm_bg.wasm +0 -0
- package/dist/pkg/kreuzberg_wasm_bg.wasm.d.ts +4 -5
- package/dist/types.d.ts +209 -0
- package/dist/types.d.ts.map +1 -1
- package/package.json +3 -3
package/README.md (changed)

````diff
@@ -22,7 +22,7 @@
 <img src="https://img.shields.io/maven-central/v/dev.kreuzberg/kreuzberg?label=Java&color=007ec6" alt="Java">
 </a>
 <a href="https://github.com/kreuzberg-dev/kreuzberg/releases">
-<img src="https://img.shields.io/github/v/tag/kreuzberg-dev/kreuzberg?label=Go&color=007ec6&filter=v4.
+<img src="https://img.shields.io/github/v/tag/kreuzberg-dev/kreuzberg?label=Go&color=007ec6&filter=v4.7.1" alt="Go">
 </a>
 <a href="https://www.nuget.org/packages/Kreuzberg/">
 <img src="https://img.shields.io/nuget/v/Kreuzberg?label=C%23&color=007ec6" alt="C#">
@@ -42,13 +42,16 @@
 
 <!-- Project Info -->
 <a href="https://github.com/kreuzberg-dev/kreuzberg/blob/main/LICENSE">
-<img src="https://img.shields.io/badge/License-MIT-
+<img src="https://img.shields.io/badge/License-MIT-007ec6" alt="License">
 </a>
 <a href="https://docs.kreuzberg.dev">
-<img src="https://img.shields.io/badge/docs-kreuzberg.dev-
+<img src="https://img.shields.io/badge/docs-kreuzberg.dev-007ec6" alt="Documentation">
+</a>
+<a href="https://docs.kreuzberg.dev/demo.html">
+<img src="https://img.shields.io/badge/%E2%96%B6%EF%B8%8F_Live_Demo-007ec6" alt="Live Demo">
 </a>
 <a href="https://huggingface.co/Kreuzberg">
-<img src="https://img.shields.io/badge/%F0%9F%A4%
+<img src="https://img.shields.io/badge/%F0%9F%A4%97_Hugging_Face-007ec6" alt="Hugging Face">
 </a>
 </div>
 
@@ -61,7 +64,7 @@
 </div>
 
 
-Extract text, tables, images, and metadata from 91+ file formats including PDF, Office documents, and images. WebAssembly bindings for browsers, Deno, and Cloudflare Workers with portable deployment and multi-threading support.
+Extract text, tables, images, and metadata from 91+ file formats and 248 programming languages including PDF, Office documents, and images. WebAssembly bindings for browsers, Deno, and Cloudflare Workers with portable deployment and multi-threading support.
 
 
 ## Installation
@@ -74,6 +77,7 @@ Install via one of the supported package managers:
 
 
 **npm:**
+
 ```bash
 npm install @kreuzberg/wasm
 ```
@@ -82,6 +86,7 @@ npm install @kreuzberg/wasm
 
 
 **pnpm:**
+
 ```bash
 pnpm add @kreuzberg/wasm
 ```
@@ -90,6 +95,7 @@ pnpm add @kreuzberg/wasm
 
 
 **yarn:**
+
 ```bash
 yarn add @kreuzberg/wasm
 ```
@@ -318,6 +324,19 @@ extractDocuments(fileBytes, mimes)
 | **Scientific** | `.tex`, `.latex`, `.typst`, `.jats`, `.ipynb`, `.docbook` | LaTeX, Jupyter notebooks, PubMed JATS |
 | **Documentation** | `.opml`, `.pod`, `.mdoc`, `.troff` | Technical documentation formats |
 
+#### Code Intelligence (248 Languages)
+
+| Feature | Description |
+|---------|-------------|
+| **Structure Extraction** | Functions, classes, methods, structs, interfaces, enums |
+| **Import/Export Analysis** | Module dependencies, re-exports, wildcard imports |
+| **Symbol Extraction** | Variables, constants, type aliases, properties |
+| **Docstring Parsing** | Google, NumPy, Sphinx, JSDoc, RustDoc, and 10+ formats |
+| **Diagnostics** | Parse errors with line/column positions |
+| **Syntax-Aware Chunking** | Split code by semantic boundaries, not arbitrary byte offsets |
+
+Powered by [tree-sitter-language-pack](https://github.com/kreuzberg-dev/tree-sitter-language-pack) — [documentation](https://docs.tree-sitter-language-pack.kreuzberg.dev).
+
 **[Complete Format Reference](https://kreuzberg.dev/reference/formats/)**
 
 ### Key Capabilities
@@ -337,6 +356,9 @@ extractDocuments(fileBytes, mimes)
 - **Batch Processing** - Efficiently process multiple documents in parallel
 - **Memory Efficient** - Stream large files without loading entirely into memory
 - **Language Detection** - Detect and support multiple languages in documents
+
+- **Code Intelligence** - Extract structure, imports, exports, symbols, and docstrings from [248 programming languages](https://docs.tree-sitter-language-pack.kreuzberg.dev) via tree-sitter
+
 - **Configuration** - Fine-grained control over extraction behavior
 
 ### Performance Characteristics
@@ -488,30 +510,6 @@ For advanced configuration options including language detection, table extractio
 
 **[Configuration Guide](https://kreuzberg.dev/guides/configuration/)**
 
-## Platform Limitations
-
-WASM runs in single-threaded environments without access to ONNX Runtime, which constrains some features:
-
-### Unsupported Features
-
-- **Layout Detection** – Requires RT-DETR model inference via ONNX Runtime, which is unavailable in WebAssembly
-- **Hardware Acceleration** – No GPU support (AccelerationConfig is not applicable)
-- **Concurrency Configuration** – Single-threaded WASM environment (ConcurrencyConfig does not apply)
-- **Email Codepage Configuration** – EmailConfig is not supported in WASM
-
-### Supported Features
-
-- **Text Extraction** – Full text content from all supported formats
-- **OCR via Tesseract WASM** – Scanned document and image OCR using browser-native Tesseract
-- **Embeddings** – FastEmbed-based vector generation
-- **Chunking** – Text segmentation for RAG pipelines
-- **Metadata Extraction** – Document properties, creation dates, page counts
-- **Table Extraction** – Structured table data from PDFs and spreadsheets
-- **Language Detection** – Identify document language
-- **Image Extraction** – Embedded images from documents
-
-All 91+ file formats supported by Kreuzberg are available in WASM, with the exception that features requiring ONNX Runtime (layout detection) will fail gracefully with an unsupported error.
-
 ## Documentation
 
 - **[Official Documentation](https://kreuzberg.dev/)**
````
package/dist/pkg/README.md (changed)

The diff for this file is identical to the package/README.md diff above.
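The README rows added in 4.7.1 describe syntax-aware chunking as splitting code "by semantic boundaries, not arbitrary byte offsets". The TypeScript sketch below illustrates that contrast under clearly stated assumptions: `chunkByOffset` and `chunkByTopLevelBlocks` are hypothetical helpers, not @kreuzberg/wasm exports, and simple brace-depth counting stands in for the tree-sitter parse the README credits. It therefore only suits brace-delimited languages and ignores braces inside strings and comments.

```typescript
// Hypothetical helpers for illustration only; they are NOT part of the
// @kreuzberg/wasm API, which (per the README) relies on tree-sitter grammars.

/** Naive chunking: cut every `size` characters, regardless of syntax. */
function chunkByOffset(source: string, size: number): string[] {
  const chunks: string[] = [];
  for (let i = 0; i < source.length; i += size) {
    chunks.push(source.slice(i, i + size));
  }
  return chunks;
}

/**
 * Crude "syntax-aware" chunking for brace-delimited languages: a chunk ends
 * whenever brace depth returns to zero at a closing `}`. Braces inside
 * strings or comments are NOT handled; a real chunker would walk a parse tree.
 */
function chunkByTopLevelBlocks(source: string): string[] {
  const chunks: string[] = [];
  let depth = 0;
  let start = 0;
  for (let i = 0; i < source.length; i++) {
    const ch = source[i];
    if (ch === "{") {
      depth++;
    } else if (ch === "}") {
      depth = Math.max(0, depth - 1);
      if (depth === 0) {
        // A top-level block just closed: emit it (trimmed) as one chunk.
        const chunk = source.slice(start, i + 1).trim();
        if (chunk.length > 0) chunks.push(chunk);
        start = i + 1;
      }
    }
  }
  // Keep any trailing text that is not part of a closed block.
  const rest = source.slice(start).trim();
  if (rest.length > 0) chunks.push(rest);
  return chunks;
}

const code = `function add(a, b) {
  return a + b;
}

class Counter {
  count = 0;
  inc() { this.count++; }
}`;

console.log(chunkByOffset(code, 24)); // boundaries fall mid-declaration
console.log(chunkByTopLevelBlocks(code)); // one chunk per top-level declaration
```

Offset chunking can cut a declaration at any character, while the brace-depth version keeps each top-level function or class intact; a parser-backed chunker would apply the same idea to languages without braces.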