@kreuzberg/node 4.0.0-rc.6 → 4.0.0-rc.8

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -1,26 +1,36 @@
- # Kreuzberg for Node.js
+ # Kreuzberg
+
+ [![Rust](https://img.shields.io/crates/v/kreuzberg?label=Rust)](https://crates.io/crates/kreuzberg)
+ [![Python](https://img.shields.io/pypi/v/kreuzberg?label=Python)](https://pypi.org/project/kreuzberg/)
+ [![TypeScript](https://img.shields.io/npm/v/@kreuzberg/node?label=TypeScript)](https://www.npmjs.com/package/@kreuzberg/node)
+ [![WASM](https://img.shields.io/npm/v/@kreuzberg/wasm?label=WASM)](https://www.npmjs.com/package/@kreuzberg/wasm)
+ [![Ruby](https://img.shields.io/gem/v/kreuzberg?label=Ruby)](https://rubygems.org/gems/kreuzberg)
+ [![Java](https://img.shields.io/maven-central/v/dev.kreuzberg/kreuzberg?label=Java)](https://central.sonatype.com/artifact/dev.kreuzberg/kreuzberg)
+ [![Go](https://img.shields.io/github/v/tag/kreuzberg-dev/kreuzberg?label=Go)](https://pkg.go.dev/github.com/kreuzberg-dev/kreuzberg)
+ [![C#](https://img.shields.io/nuget/v/Goldziher.Kreuzberg?label=C%23)](https://www.nuget.org/packages/Goldziher.Kreuzberg/)
 
- [![npm](https://img.shields.io/npm/v/kreuzberg)](https://www.npmjs.com/package/kreuzberg)
- [![Crates.io](https://img.shields.io/crates/v/kreuzberg)](https://crates.io/crates/kreuzberg)
- [![PyPI](https://img.shields.io/pypi/v/kreuzberg)](https://pypi.org/project/kreuzberg/)
- [![RubyGems](https://img.shields.io/gem/v/kreuzberg)](https://rubygems.org/gems/kreuzberg)
- [![Node Version](https://img.shields.io/node/v/kreuzberg)](https://www.npmjs.com/package/kreuzberg)
 [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
- [![Documentation](https://img.shields.io/badge/docs-kreuzberg.dev-blue)](https://kreuzberg.dev)
+ [![Documentation](https://img.shields.io/badge/docs-kreuzberg.dev-blue)](https://kreuzberg.dev/)
+ [![Discord](https://img.shields.io/badge/Discord-Join%20our%20community-7289da)](https://discord.gg/pXxagNK2zN)
 
 High-performance document intelligence for Node.js and TypeScript, powered by Rust.
 
- Extract text, tables, images, and metadata from 50+ file formats including PDF, DOCX, PPTX, XLSX, images, and more.
+ Extract text, tables, images, and metadata from 56 file formats including PDF, DOCX, PPTX, XLSX, images, and more.
+
+ > **Recommended for Node.js and Bun** - Native NAPI-RS bindings provide the best performance (2-3x faster than WASM).
+ >
+ > For browser, Deno, or Cloudflare Workers, use [@kreuzberg/wasm](../kreuzberg-wasm/) instead.
 
- > **🚀 Version 4.0.0 Release Candidate**
+ > **Version 4.0.0 Release Candidate**
 > This is a pre-release version. We invite you to test the library and [report any issues](https://github.com/kreuzberg-dev/kreuzberg/issues) you encounter.
 
 ## Features
 
- - **50+ File Formats**: PDF, DOCX, PPTX, XLSX, images, HTML, Markdown, XML, JSON, and more
+ - **56 File Formats**: PDF, DOCX, PPTX, XLSX, images, HTML, Markdown, XML, JSON, and more
 - **OCR Support**: Built-in Tesseract, EasyOCR, and PaddleOCR backends for scanned documents
 - **Table Extraction**: Advanced table detection and structured data extraction
- - **High Performance**: Native Rust core provides 10-50x performance improvements over pure JavaScript
+ - **Native Performance**: 2-3x faster than WASM; 10-50x faster than pure JavaScript
+ - **Zero-Copy Operations**: Direct system calls and minimal data copying
 - **Type-Safe**: Full TypeScript definitions for all methods, configurations, and return types
 - **Async/Sync APIs**: Both asynchronous and synchronous extraction methods
 - **Batch Processing**: Process multiple documents in parallel with optimized concurrency
@@ -29,6 +39,27 @@ Extract text, tables, images, and metadata from 50+ file formats including PDF,
 - **Caching**: Built-in result caching for faster repeated extractions
 - **Zero Configuration**: Works out of the box with sensible defaults
 
+ ## Why Use This Package?
+
+ Choose `@kreuzberg/node` if you're building with:
+
+ - **Node.js 18+** - Native bindings provide direct access to system resources
+ - **Bun** - Full compatibility with Bun's Node.js API
+ - **Performance-critical applications** - Processing large document batches or real-time extraction
+ - **Server-side extraction** - APIs, microservices, document processing pipelines
+
+ ### Comparison with @kreuzberg/wasm
+
+ | Aspect | `@kreuzberg/node` | `@kreuzberg/wasm` |
+ |--------|------------------|-------------------|
+ | **Performance** | 2-3x faster (native) | Standard baseline |
+ | **Environment** | Node.js, Bun | Browser, Deno, Workers, Node.js |
+ | **Bundle Size** | 10-15 MB (prebuilt binary) | 2-4 MB (WASM module) |
+ | **System Access** | Direct system calls | Sandboxed via WASM |
+ | **Best For** | Server-side, batch processing | Client-side, edge computing |
+
+ Use `@kreuzberg/wasm` for browser applications, Cloudflare Workers, Deno, or when you need a smaller bundle size.
+
 ## Requirements
 
 - Node.js 18 or higher
@@ -537,7 +568,7 @@ Processing 100 mixed documents (PDF, DOCX, XLSX):
 If you encounter errors about missing native modules:
 
 ```bash
- npm rebuild kreuzberg
+ npm rebuild @kreuzberg/node
 ```
 
 ### OCR Not Working
@@ -653,7 +684,7 @@ For comprehensive documentation, visit [https://kreuzberg.dev](https://kreuzberg
 
 ## Contributing
 
- We welcome contributions! Please see our [Contributing Guide](https://github.com/kreuzberg-dev/kreuzberg/blob/main/docs/contributing.md) for details.
+ We welcome contributions! Please see our [Contributing Guide](../../CONTRIBUTING.md) for details.
 
 ## License
 
@@ -666,4 +697,4 @@ MIT
 - [GitHub](https://github.com/kreuzberg-dev/kreuzberg)
 - [Issue Tracker](https://github.com/kreuzberg-dev/kreuzberg/issues)
 - [Changelog](https://github.com/kreuzberg-dev/kreuzberg/blob/main/CHANGELOG.md)
- - [npm Package](https://www.npmjs.com/package/kreuzberg)
+ - [npm Package](https://www.npmjs.com/package/@kreuzberg/node)
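The README's "Why Use This Package?" section and comparison table amount to a decision by runtime. The following is a minimal sketch of that decision, assuming hypothetical `detectRuntime` and `pickKreuzbergPackage` helpers that are not part of either package. The runtime probes (`Bun`, `Deno`, the Cloudflare Workers user agent, `process.versions.node`) are common heuristics, not an official detection API.

```typescript
// Hypothetical helpers (not exported by @kreuzberg/node or @kreuzberg/wasm).
type Runtime = "node" | "bun" | "deno" | "browser" | "cloudflare-workers";
type KreuzbergPackage = "@kreuzberg/node" | "@kreuzberg/wasm";

// Heuristic runtime probe; order matters because Bun and some Workers setups
// also expose Node-compatible globals such as `process`.
function detectRuntime(): Runtime {
  const g = globalThis as unknown as Record<string, any>;
  if (g.Bun !== undefined) return "bun";
  if (g.Deno !== undefined) return "deno";
  if (g.navigator?.userAgent === "Cloudflare-Workers") return "cloudflare-workers";
  if (g.process?.versions?.node !== undefined) return "node";
  return "browser"; // assumption: anything else is treated as a browser
}

// Follows the README: native bindings for Node.js/Bun, WASM everywhere else.
function pickKreuzbergPackage(runtime: Runtime): KreuzbergPackage {
  switch (runtime) {
    case "node":
    case "bun":
      return "@kreuzberg/node";
    case "deno":
    case "browser":
    case "cloudflare-workers":
      return "@kreuzberg/wasm";
  }
}
```

The table lists Node.js under both packages, so the WASM build can also run there. The helper follows the README's recommendation of native bindings for Node.js rather than excluding WASM.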
package/dist/cli.d.mts ADDED
@@ -0,0 +1,9 @@
+ #!/usr/bin/env node
+ /**
+ * Proxy entry point that forwards to the Rust-based Kreuzberg CLI.
+ *
+ * This keeps `npx kreuzberg` working without shipping an additional TypeScript CLI implementation.
+ */
+ declare function main(argv: string[]): number;
+
+ export { main };
package/dist/cli.d.ts ADDED
@@ -0,0 +1,9 @@
+ #!/usr/bin/env node
+ /**
+ * Proxy entry point that forwards to the Rust-based Kreuzberg CLI.
+ *
+ * This keeps `npx kreuzberg` working without shipping an additional TypeScript CLI implementation.
+ */
+ declare function main(argv: string[]): number;
+
+ export { main };
package/dist/cli.js ADDED
@@ -0,0 +1,78 @@
+ #!/usr/bin/env node
+ "use strict";
+ var __create = Object.create;
+ var __defProp = Object.defineProperty;
+ var __getOwnPropDesc = Object.getOwnPropertyDescriptor;
+ var __getOwnPropNames = Object.getOwnPropertyNames;
+ var __getProtoOf = Object.getPrototypeOf;
+ var __hasOwnProp = Object.prototype.hasOwnProperty;
+ var __export = (target, all) => {
+ for (var name in all)
+ __defProp(target, name, { get: all[name], enumerable: true });
+ };
+ var __copyProps = (to, from, except, desc) => {
+ if (from && typeof from === "object" || typeof from === "function") {
+ for (let key of __getOwnPropNames(from))
+ if (!__hasOwnProp.call(to, key) && key !== except)
+ __defProp(to, key, { get: () => from[key], enumerable: !(desc = __getOwnPropDesc(from, key)) || desc.enumerable });
+ }
+ return to;
+ };
+ var __toESM = (mod, isNodeMode, target) => (target = mod != null ? __create(__getProtoOf(mod)) : {}, __copyProps(
+ // If the importer is in node compatibility mode or this is not an ESM
+ // file that has been converted to a CommonJS file using a Babel-
+ // compatible transform (i.e. "__esModule" has not been set), then set
+ // "default" to the CommonJS "module.exports" for node compatibility.
+ isNodeMode || !mod || !mod.__esModule ? __defProp(target, "default", { value: mod, enumerable: true }) : target,
+ mod
+ ));
+ var __toCommonJS = (mod) => __copyProps(__defProp({}, "__esModule", { value: true }), mod);
+ var cli_exports = {};
+ __export(cli_exports, {
+ main: () => main
+ });
+ module.exports = __toCommonJS(cli_exports);
+ var import_node_child_process = require("node:child_process");
+ var import_node_fs = require("node:fs");
+ var import_node_path = require("node:path");
+ var import_node_url = require("node:url");
+ var import_which = __toESM(require("which"));
+ const import_meta = {};
+ function main(argv) {
+ const args = argv.slice(2);
+ let cliPath;
+ try {
+ cliPath = import_which.default.sync("kreuzberg-cli");
+ } catch {
+ }
+ if (!cliPath) {
+ const __dirname = typeof __filename !== "undefined" ? (0, import_node_path.dirname)(__filename) : (0, import_node_path.dirname)((0, import_node_url.fileURLToPath)(import_meta.url));
+ const devBinary = (0, import_node_path.join)(__dirname, "..", "..", "..", "target", "release", "kreuzberg");
+ if ((0, import_node_fs.existsSync)(devBinary)) {
+ cliPath = devBinary;
+ }
+ }
+ if (!cliPath) {
+ console.error(
+ "The embedded Kreuzberg CLI binary could not be located. This indicates a packaging issue; please open an issue at https://github.com/kreuzberg-dev/kreuzberg/issues so we can investigate."
+ );
+ return 1;
+ }
+ const result = (0, import_node_child_process.spawnSync)(cliPath, args, {
+ stdio: "inherit",
+ shell: false
+ });
+ if (result.error) {
+ console.error(`Failed to execute kreuzberg-cli: ${result.error.message}`);
+ return 1;
+ }
+ return result.status ?? 1;
+ }
+ if (require.main === module) {
+ process.exit(main(process.argv));
+ }
+ // Annotate the CommonJS export names for ESM import in node:
+ 0 && (module.exports = {
+ main
+ });
+ //# sourceMappingURL=cli.js.map
@@ -0,0 +1 @@
+ {"version":3,"sources":["../typescript/cli.ts"],"sourcesContent":["#!/usr/bin/env node\n\n/**\n * Proxy entry point that forwards to the Rust-based Kreuzberg CLI.\n *\n * This keeps `npx kreuzberg` working without shipping an additional TypeScript CLI implementation.\n */\n\nimport { spawnSync } from \"node:child_process\";\nimport { existsSync } from \"node:fs\";\nimport { dirname, join } from \"node:path\";\nimport { fileURLToPath } from \"node:url\";\nimport which from \"which\";\n\nfunction main(argv: string[]): number {\n\tconst args = argv.slice(2);\n\n\tlet cliPath: string | undefined;\n\ttry {\n\t\tcliPath = which.sync(\"kreuzberg-cli\");\n\t} catch {}\n\n\tif (!cliPath) {\n\t\tconst __dirname = typeof __filename !== \"undefined\" ? dirname(__filename) : dirname(fileURLToPath(import.meta.url));\n\t\tconst devBinary = join(__dirname, \"..\", \"..\", \"..\", \"target\", \"release\", \"kreuzberg\");\n\t\tif (existsSync(devBinary)) {\n\t\t\tcliPath = devBinary;\n\t\t}\n\t}\n\n\tif (!cliPath) {\n\t\tconsole.error(\n\t\t\t\"The embedded Kreuzberg CLI binary could not be located. \" +\n\t\t\t\t\"This indicates a packaging issue; please open an issue at \" +\n\t\t\t\t\"https://github.com/kreuzberg-dev/kreuzberg/issues so we can investigate.\",\n\t\t);\n\t\treturn 1;\n\t}\n\n\tconst result = spawnSync(cliPath, args, {\n\t\tstdio: \"inherit\",\n\t\tshell: false,\n\t});\n\n\tif (result.error) {\n\t\tconsole.error(`Failed to execute kreuzberg-cli: ${result.error.message}`);\n\t\treturn 1;\n\t}\n\n\treturn result.status ?? 
1;\n}\n\nif (require.main === module) {\n\tprocess.exit(main(process.argv));\n}\n\nexport { main };\n"],"mappings":";;;;;;;;;;;;;;;;;;;;;;;;;;;;;AAAA;AAAA;AAAA;AAAA;AAAA;AAQA,gCAA0B;AAC1B,qBAA2B;AAC3B,uBAA8B;AAC9B,sBAA8B;AAC9B,mBAAkB;AAZlB;AAcA,SAAS,KAAK,MAAwB;AACrC,QAAM,OAAO,KAAK,MAAM,CAAC;AAEzB,MAAI;AACJ,MAAI;AACH,cAAU,aAAAA,QAAM,KAAK,eAAe;AAAA,EACrC,QAAQ;AAAA,EAAC;AAET,MAAI,CAAC,SAAS;AACb,UAAM,YAAY,OAAO,eAAe,kBAAc,0BAAQ,UAAU,QAAI,8BAAQ,+BAAc,YAAY,GAAG,CAAC;AAClH,UAAM,gBAAY,uBAAK,WAAW,MAAM,MAAM,MAAM,UAAU,WAAW,WAAW;AACpF,YAAI,2BAAW,SAAS,GAAG;AAC1B,gBAAU;AAAA,IACX;AAAA,EACD;AAEA,MAAI,CAAC,SAAS;AACb,YAAQ;AAAA,MACP;AAAA,IAGD;AACA,WAAO;AAAA,EACR;AAEA,QAAM,aAAS,qCAAU,SAAS,MAAM;AAAA,IACvC,OAAO;AAAA,IACP,OAAO;AAAA,EACR,CAAC;AAED,MAAI,OAAO,OAAO;AACjB,YAAQ,MAAM,oCAAoC,OAAO,MAAM,OAAO,EAAE;AACxE,WAAO;AAAA,EACR;AAEA,SAAO,OAAO,UAAU;AACzB;AAEA,IAAI,QAAQ,SAAS,QAAQ;AAC5B,UAAQ,KAAK,KAAK,QAAQ,IAAI,CAAC;AAChC;","names":["which"]}
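The proxy tries two locations in order. First it looks for a `kreuzberg-cli` executable on `PATH` via `which.sync`. If that fails, it checks for a development build at `../../../target/release/kreuzberg`, relative to the script's directory. It also maps a missing exit status to `1`; `spawnSync` reports `status: null` when the child is killed by a signal. The following is a minimal sketch of that logic as pure functions, with the lookups injected so it can run without a real binary. `resolveCliPath`, `exitCodeFrom` and the `Lookups` shape are hypothetical names, not exports of the package.

```typescript
import { join } from "node:path";

// Injected lookups. `findOnPath` mirrors which.sync: it returns a path or throws.
interface Lookups {
  findOnPath: (name: string) => string;
  exists: (path: string) => boolean;
}

// Reproduces the resolution order in dist/cli.js:
// 1. `kreuzberg-cli` on PATH, 2. a repo-local release build, otherwise undefined.
function resolveCliPath(scriptDir: string, lookups: Lookups): string | undefined {
  let found: string | undefined;
  try {
    found = lookups.findOnPath("kreuzberg-cli");
  } catch {
    // Not on PATH; fall through to the development binary.
  }
  if (found) return found;
  const devBinary = join(scriptDir, "..", "..", "..", "target", "release", "kreuzberg");
  return lookups.exists(devBinary) ? devBinary : undefined;
}

// Spawn failures and signal terminations (status === null) both become exit code 1.
function exitCodeFrom(result: { status: number | null; error?: Error }): number {
  if (result.error) return 1;
  return result.status ?? 1;
}
```

As far as these two files show, `npx kreuzberg` succeeds only when `kreuzberg-cli` is already on `PATH` or the package runs from inside a source checkout that contains `target/release/kreuzberg`.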
package/dist/cli.mjs ADDED
@@ -0,0 +1,43 @@
+ #!/usr/bin/env node
+ import { spawnSync } from "node:child_process";
+ import { existsSync } from "node:fs";
+ import { dirname, join } from "node:path";
+ import { fileURLToPath } from "node:url";
+ import which from "which";
+ function main(argv) {
+ const args = argv.slice(2);
+ let cliPath;
+ try {
+ cliPath = which.sync("kreuzberg-cli");
+ } catch {
+ }
+ if (!cliPath) {
+ const __dirname = typeof __filename !== "undefined" ? dirname(__filename) : dirname(fileURLToPath(import.meta.url));
+ const devBinary = join(__dirname, "..", "..", "..", "target", "release", "kreuzberg");
+ if (existsSync(devBinary)) {
+ cliPath = devBinary;
+ }
+ }
+ if (!cliPath) {
+ console.error(
+ "The embedded Kreuzberg CLI binary could not be located. This indicates a packaging issue; please open an issue at https://github.com/kreuzberg-dev/kreuzberg/issues so we can investigate."
+ );
+ return 1;
+ }
+ const result = spawnSync(cliPath, args, {
+ stdio: "inherit",
+ shell: false
+ });
+ if (result.error) {
+ console.error(`Failed to execute kreuzberg-cli: ${result.error.message}`);
+ return 1;
+ }
+ return result.status ?? 1;
+ }
+ if (require.main === module) {
+ process.exit(main(process.argv));
+ }
+ export {
+ main
+ };
+ //# sourceMappingURL=cli.mjs.map
@@ -0,0 +1 @@
+ {"version":3,"sources":["../typescript/cli.ts"],"sourcesContent":["#!/usr/bin/env node\n\n/**\n * Proxy entry point that forwards to the Rust-based Kreuzberg CLI.\n *\n * This keeps `npx kreuzberg` working without shipping an additional TypeScript CLI implementation.\n */\n\nimport { spawnSync } from \"node:child_process\";\nimport { existsSync } from \"node:fs\";\nimport { dirname, join } from \"node:path\";\nimport { fileURLToPath } from \"node:url\";\nimport which from \"which\";\n\nfunction main(argv: string[]): number {\n\tconst args = argv.slice(2);\n\n\tlet cliPath: string | undefined;\n\ttry {\n\t\tcliPath = which.sync(\"kreuzberg-cli\");\n\t} catch {}\n\n\tif (!cliPath) {\n\t\tconst __dirname = typeof __filename !== \"undefined\" ? dirname(__filename) : dirname(fileURLToPath(import.meta.url));\n\t\tconst devBinary = join(__dirname, \"..\", \"..\", \"..\", \"target\", \"release\", \"kreuzberg\");\n\t\tif (existsSync(devBinary)) {\n\t\t\tcliPath = devBinary;\n\t\t}\n\t}\n\n\tif (!cliPath) {\n\t\tconsole.error(\n\t\t\t\"The embedded Kreuzberg CLI binary could not be located. \" +\n\t\t\t\t\"This indicates a packaging issue; please open an issue at \" +\n\t\t\t\t\"https://github.com/kreuzberg-dev/kreuzberg/issues so we can investigate.\",\n\t\t);\n\t\treturn 1;\n\t}\n\n\tconst result = spawnSync(cliPath, args, {\n\t\tstdio: \"inherit\",\n\t\tshell: false,\n\t});\n\n\tif (result.error) {\n\t\tconsole.error(`Failed to execute kreuzberg-cli: ${result.error.message}`);\n\t\treturn 1;\n\t}\n\n\treturn result.status ?? 
1;\n}\n\nif (require.main === module) {\n\tprocess.exit(main(process.argv));\n}\n\nexport { main };\n"],"mappings":";AAQA,SAAS,iBAAiB;AAC1B,SAAS,kBAAkB;AAC3B,SAAS,SAAS,YAAY;AAC9B,SAAS,qBAAqB;AAC9B,OAAO,WAAW;AAElB,SAAS,KAAK,MAAwB;AACrC,QAAM,OAAO,KAAK,MAAM,CAAC;AAEzB,MAAI;AACJ,MAAI;AACH,cAAU,MAAM,KAAK,eAAe;AAAA,EACrC,QAAQ;AAAA,EAAC;AAET,MAAI,CAAC,SAAS;AACb,UAAM,YAAY,OAAO,eAAe,cAAc,QAAQ,UAAU,IAAI,QAAQ,cAAc,YAAY,GAAG,CAAC;AAClH,UAAM,YAAY,KAAK,WAAW,MAAM,MAAM,MAAM,UAAU,WAAW,WAAW;AACpF,QAAI,WAAW,SAAS,GAAG;AAC1B,gBAAU;AAAA,IACX;AAAA,EACD;AAEA,MAAI,CAAC,SAAS;AACb,YAAQ;AAAA,MACP;AAAA,IAGD;AACA,WAAO;AAAA,EACR;AAEA,QAAM,SAAS,UAAU,SAAS,MAAM;AAAA,IACvC,OAAO;AAAA,IACP,OAAO;AAAA,EACR,CAAC;AAED,MAAI,OAAO,OAAO;AACjB,YAAQ,MAAM,oCAAoC,OAAO,MAAM,OAAO,EAAE;AACxE,WAAO;AAAA,EACR;AAEA,SAAO,OAAO,UAAU;AACzB;AAEA,IAAI,QAAQ,SAAS,QAAQ;AAC5B,UAAQ,KAAK,KAAK,QAAQ,IAAI,CAAC;AAChC;","names":[]}
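The ESM build (`dist/cli.mjs`) ends with the guard `if (require.main === module)`. That guard uses CommonJS globals that do not exist in an ES module scope. As a result, Node evaluating that file as an ES module would throw `ReferenceError: require is not defined`. A common ESM-safe alternative compares the module's own URL with the script Node was started with. The following is a minimal sketch of that check, assuming a hypothetical `isMainModule` helper that is not part of the package.

```typescript
import { realpathSync } from "node:fs";
import { pathToFileURL } from "node:url";

// True when `moduleUrl` (pass `import.meta.url`) is the entry script named by
// `argv1` (pass `process.argv[1]`). The argv path goes through realpathSync
// because npm bin shims are often symlinks, while Node normally resolves the
// main module to its real path before assigning its URL.
function isMainModule(moduleUrl: string, argv1: string | undefined): boolean {
  if (argv1 === undefined) return false;
  let entry = argv1;
  try {
    entry = realpathSync(argv1);
  } catch {
    // Path not found on disk; compare it as given.
  }
  return pathToFileURL(entry).href === moduleUrl;
}
```

In the entry file, the call would be `isMainModule(import.meta.url, process.argv[1])` in place of the `require.main` check. The CommonJS build (`dist/cli.js`) can keep its existing guard.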
@@ -0,0 +1,358 @@
+ /**
+ * Error types for Kreuzberg document intelligence framework.
+ *
+ * These error classes mirror the Rust core error types and provide
+ * type-safe error handling for TypeScript consumers.
+ *
+ * ## Error Hierarchy
+ *
+ * ```
+ * Error (JavaScript built-in)
+ * └── KreuzbergError (base class)
+ * ├── ValidationError
+ * ├── ParsingError
+ * ├── OcrError
+ * ├── CacheError
+ * ├── ImageProcessingError
+ * ├── PluginError
+ * ├── MissingDependencyError
+ * └── ... (other error types)
+ * ```
+ *
+ * @module errors
+ */
+ /**
+ * FFI error codes matching kreuzberg-ffi C library error types.
+ *
+ * @example
+ * ```typescript
+ * import { ErrorCode, getLastErrorCode } from '@kreuzberg/node';
+ *
+ * try {
+ * const result = await extractFile('document.pdf');
+ * } catch (error) {
+ * const code = getLastErrorCode();
+ * if (code === ErrorCode.Panic) {
+ * console.error('A panic occurred in the native library');
+ * }
+ * }
+ * ```
+ */
+ declare enum ErrorCode {
+ /**
+ * No error (success)
+ */
+ Success = 0,
+ /**
+ * Generic error
+ */
+ GenericError = 1,
+ /**
+ * Panic occurred in native code
+ */
+ Panic = 2,
+ /**
+ * Invalid argument provided
+ */
+ InvalidArgument = 3,
+ /**
+ * I/O error (file system, network, etc.)
+ */
+ IoError = 4,
+ /**
+ * Error parsing document content
+ */
+ ParsingError = 5,
+ /**
+ * Error in OCR processing
+ */
+ OcrError = 6,
+ /**
+ * Required system dependency is missing
+ */
+ MissingDependency = 7
+ }
+ /**
+ * Context information for panics in native code.
+ *
+ * Contains file location, line number, function name, panic message,
+ * and timestamp for debugging native library issues.
+ *
+ * @example
+ * ```typescript
+ * import { KreuzbergError } from '@kreuzberg/node';
+ *
+ * try {
+ * const result = await extractFile('document.pdf');
+ * } catch (error) {
+ * if (error instanceof KreuzbergError && error.panicContext) {
+ * console.error('Panic occurred:');
+ * console.error(`File: ${error.panicContext.file}`);
+ * console.error(`Line: ${error.panicContext.line}`);
+ * console.error(`Function: ${error.panicContext.function}`);
+ * console.error(`Message: ${error.panicContext.message}`);
+ * }
+ * }
+ * ```
+ */
+ interface PanicContext {
+ /**
+ * Source file where panic occurred
+ */
+ file: string;
+ /**
+ * Line number in source file
+ */
+ line: number;
+ /**
+ * Function name where panic occurred
+ */
+ function: string;
+ /**
+ * Panic message
+ */
+ message: string;
+ /**
+ * Unix timestamp (seconds since epoch)
+ */
+ timestamp_secs: number;
+ }
+ /**
+ * Base error class for all Kreuzberg errors.
+ *
+ * All error types thrown by Kreuzberg extend this class, allowing
+ * consumers to catch all Kreuzberg-specific errors with a single catch block.
+ *
+ * @example
+ * ```typescript
+ * import { extractFile, KreuzbergError } from '@kreuzberg/node';
+ *
+ * try {
+ * const result = await extractFile('document.pdf');
+ * } catch (error) {
+ * if (error instanceof KreuzbergError) {
+ * console.error('Kreuzberg error:', error.message);
+ * if (error.panicContext) {
+ * console.error('Panic at:', error.panicContext.file + ':' + error.panicContext.line);
+ * }
+ * } else {
+ * throw error; // Re-throw non-Kreuzberg errors
+ * }
+ * }
+ * ```
+ */
+ declare class KreuzbergError extends Error {
+ /**
+ * Panic context if error was caused by a panic in native code.
+ * Will be null for non-panic errors.
+ */
+ readonly panicContext: PanicContext | null;
+ constructor(message: string, panicContext?: PanicContext | null);
+ toJSON(): {
+ name: string;
+ message: string;
+ panicContext: PanicContext | null;
+ stack: string | undefined;
+ };
+ }
+ /**
+ * Error thrown when document validation fails.
+ *
+ * Validation errors occur when a document doesn't meet specified criteria,
+ * such as minimum content length, required metadata fields, or quality thresholds.
+ *
+ * @example
+ * ```typescript
+ * import { extractFile, ValidationError } from '@kreuzberg/node';
+ *
+ * try {
+ * const result = await extractFile('document.pdf');
+ * } catch (error) {
+ * if (error instanceof ValidationError) {
+ * console.error('Document validation failed:', error.message);
+ * }
+ * }
+ * ```
+ */
+ declare class ValidationError extends KreuzbergError {
+ constructor(message: string, panicContext?: PanicContext | null);
+ }
+ /**
+ * Error thrown when document parsing fails.
+ *
+ * Parsing errors occur when a document is corrupted, malformed, or cannot
+ * be processed by the extraction engine. This includes issues like:
+ * - Corrupted PDF files
+ * - Invalid XML/JSON syntax
+ * - Unsupported file format versions
+ * - Encrypted documents without valid passwords
+ *
+ * @example
+ * ```typescript
+ * import { extractFile, ParsingError } from '@kreuzberg/node';
+ *
+ * try {
+ * const result = await extractFile('corrupted.pdf');
+ * } catch (error) {
+ * if (error instanceof ParsingError) {
+ * console.error('Failed to parse document:', error.message);
+ * }
+ * }
+ * ```
+ */
+ declare class ParsingError extends KreuzbergError {
+ constructor(message: string, panicContext?: PanicContext | null);
+ }
+ /**
+ * Error thrown when OCR processing fails.
+ *
+ * OCR errors occur during optical character recognition, such as:
+ * - OCR backend initialization failures
+ * - Image preprocessing errors
+ * - Language model loading issues
+ * - OCR engine crashes
+ *
+ * @example
+ * ```typescript
+ * import { extractFile, OcrError } from '@kreuzberg/node';
+ *
+ * try {
+ * const result = await extractFile('scanned.pdf', null, {
+ * ocr: { backend: 'tesseract', language: 'eng' }
+ * });
+ * } catch (error) {
+ * if (error instanceof OcrError) {
+ * console.error('OCR processing failed:', error.message);
+ * }
+ * }
+ * ```
+ */
+ declare class OcrError extends KreuzbergError {
+ constructor(message: string, panicContext?: PanicContext | null);
+ }
+ /**
+ * Error thrown when cache operations fail.
+ *
+ * Cache errors are typically non-fatal and occur during caching operations, such as:
+ * - Cache directory creation failures
+ * - Disk write errors
+ * - Cache entry corruption
+ * - Insufficient disk space
+ *
+ * These errors are usually logged but don't prevent extraction from completing.
+ *
+ * @example
+ * ```typescript
+ * import { extractFile, CacheError } from '@kreuzberg/node';
+ *
+ * try {
+ * const result = await extractFile('document.pdf', null, {
+ * useCache: true
+ * });
+ * } catch (error) {
+ * if (error instanceof CacheError) {
+ * console.warn('Cache operation failed, continuing without cache:', error.message);
+ * }
+ * }
+ * ```
+ */
+ declare class CacheError extends KreuzbergError {
+ constructor(message: string, panicContext?: PanicContext | null);
+ }
+ /**
+ * Error thrown when image processing operations fail.
+ *
+ * Image processing errors occur during image manipulation, such as:
+ * - Image decoding failures
+ * - Unsupported image formats
+ * - Image resizing/scaling errors
+ * - DPI adjustment failures
+ * - Color space conversion issues
+ *
+ * @example
+ * ```typescript
+ * import { extractFile, ImageProcessingError } from '@kreuzberg/node';
+ *
+ * try {
+ * const result = await extractFile('document.pdf', null, {
+ * images: {
+ * extractImages: true,
+ * targetDpi: 300
+ * }
+ * });
+ * } catch (error) {
+ * if (error instanceof ImageProcessingError) {
+ * console.error('Image processing failed:', error.message);
+ * }
+ * }
+ * ```
+ */
+ declare class ImageProcessingError extends KreuzbergError {
+ constructor(message: string, panicContext?: PanicContext | null);
+ }
+ /**
+ * Error thrown when a plugin operation fails.
+ *
+ * Plugin errors occur in custom plugins (postprocessors, validators, OCR backends), such as:
+ * - Plugin initialization failures
+ * - Plugin processing errors
+ * - Plugin crashes or timeouts
+ * - Invalid plugin configuration
+ *
+ * The error message includes the plugin name to help identify which plugin failed.
+ *
+ * @example
+ * ```typescript
+ * import { extractFile, PluginError } from '@kreuzberg/node';
+ *
+ * try {
+ * const result = await extractFile('document.pdf');
+ * } catch (error) {
+ * if (error instanceof PluginError) {
+ * console.error(`Plugin '${error.pluginName}' failed:`, error.message);
+ * }
+ * }
+ * ```
+ */
+ declare class PluginError extends KreuzbergError {
+ /**
+ * Name of the plugin that threw the error.
+ */
+ readonly pluginName: string;
+ constructor(message: string, pluginName: string, panicContext?: PanicContext | null);
+ toJSON(): {
+ name: string;
+ message: string;
+ pluginName: string;
+ panicContext: PanicContext | null;
+ stack: string | undefined;
+ };
+ }
+ /**
+ * Error thrown when a required system dependency is missing.
+ *
+ * Missing dependency errors occur when external tools or libraries are not available, such as:
+ * - LibreOffice (for DOC/PPT/XLS files)
+ * - Tesseract OCR (for OCR processing)
+ * - ImageMagick (for image processing)
+ * - Poppler (for PDF rendering)
+ *
+ * @example
+ * ```typescript
+ * import { extractFile, MissingDependencyError } from '@kreuzberg/node';
+ *
+ * try {
+ * const result = await extractFile('document.doc');
+ * } catch (error) {
+ * if (error instanceof MissingDependencyError) {
+ * console.error('Missing dependency:', error.message);
+ * console.log('Please install LibreOffice to process DOC files');
+ * }
+ * }
+ * ```
+ */
+ declare class MissingDependencyError extends KreuzbergError {
+ constructor(message: string, panicContext?: PanicContext | null);
+ }
+
+ export { CacheError, ErrorCode, ImageProcessingError, KreuzbergError, MissingDependencyError, OcrError, type PanicContext, ParsingError, PluginError, ValidationError };
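The declarations above define only the shape of the error classes: a nullable `panicContext`, a `pluginName` on `PluginError`, and `toJSON()` methods. The following is a minimal, self-contained stand-in written against those shapes, so the catch-and-dispatch pattern from the doc comments can run without the native addon. The class bodies are assumptions, not the package's implementation, and `describeError` is a hypothetical helper. Every subclass also passes `instanceof KreuzbergError`, so the specific checks must come before the base-class check.

```typescript
interface PanicContext {
  file: string;
  line: number;
  function: string;
  message: string;
  timestamp_secs: number;
}

// Stand-in base class matching the declared shape (assumed implementation).
class KreuzbergError extends Error {
  readonly panicContext: PanicContext | null;

  constructor(message: string, panicContext: PanicContext | null = null) {
    super(message);
    // Keeps `instanceof` working even when compiled to ES5 targets.
    Object.setPrototypeOf(this, new.target.prototype);
    this.name = new.target.name;
    this.panicContext = panicContext;
  }

  toJSON() {
    return { name: this.name, message: this.message, panicContext: this.panicContext, stack: this.stack };
  }
}

class ParsingError extends KreuzbergError {}
class CacheError extends KreuzbergError {}

class PluginError extends KreuzbergError {
  readonly pluginName: string;

  constructor(message: string, pluginName: string, panicContext: PanicContext | null = null) {
    super(message, panicContext);
    this.pluginName = pluginName;
  }

  toJSON() {
    return { ...super.toJSON(), pluginName: this.pluginName };
  }
}

// Hypothetical dispatcher: most specific classes first, base class last.
function describeError(error: unknown): string {
  if (error instanceof PluginError) return `plugin ${error.pluginName} failed: ${error.message}`;
  if (error instanceof CacheError) return `cache (non-fatal): ${error.message}`;
  if (error instanceof KreuzbergError) {
    const ctx = error.panicContext;
    if (ctx) {
      // timestamp_secs is seconds since the Unix epoch; Date expects milliseconds.
      const when = new Date(ctx.timestamp_secs * 1000).toISOString();
      return `panic at ${ctx.file}:${ctx.line} in ${ctx.function} (${when}): ${ctx.message}`;
    }
    return `${error.name}: ${error.message}`;
  }
  return "not a Kreuzberg error";
}
```

If `KreuzbergError` were checked first, every `PluginError` and `CacheError` would fall into the generic branch. Keeping the base-class check last preserves the per-type handling shown in the individual doc comments.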