@opendataloader/pdf 1.0.5 → 1.1.0

package/README.md CHANGED
@@ -75,40 +75,42 @@ pip install -U opendataloader-pdf
 
  ### Usage
 
- - input_path can be either the path to a single document or the path to a folder.
- - If you don’t specify an output_folder, the output data will be saved in the same directory as the input document.
+ input_path can be either the path to a single document or the path to a folder.
 
  ```python
  import opendataloader_pdf
 
- opendataloader_pdf.run(
-     input_path="path/to/document.pdf",
-     output_folder="path/to/output",
-     generate_markdown=True,
-     generate_html=True,
-     generate_annotated_pdf=True,
+ opendataloader_pdf.convert(
+     input_path=["path/to/document.pdf", "path/to/folder"],
+     output_dir="path/to/output",
+     format=["json", "html", "pdf", "markdown"]
  )
  ```
 
- ### Function: run()
+ If you want to run it via CLI, you can use the following command on the terminal:
+
+ ```bash
+ opendataloader-pdf path/to/document.pdf path/to/folder -o path/to/output -f json html pdf markdown
+ ```
+
+ ### Function: convert()
 
  The main function to process PDFs.
 
- | Parameter | Type | Required | Default | Description |
- |--------------------------| ------ | -------- |--------------|---------------------------------------------------------------------------------------------------------------------------------------------|
- | `input_path` | `str` | ✅ Yes | — | Path to the input PDF file or folder. |
- | `output_folder` | `str` | No | input folder | Path to the output folder. |
- | `password` | `str` | No | `None` | Password for the PDF file. |
- | `replace_invalid_chars` | `str` | No | `" "` | Character to replace invalid or unrecognized characters (e.g., �, \u0000) |
- | `content_safety_off` | `str` | No | `None` | Disables one or more content safety filters. Accepts a comma-separated list of filter names. Arguments: all, hidden-text, off-page, tiny, hidden-ocg. |
- | `generate_markdown` | `bool` | No | `False` | If `True`, generates a Markdown output file. |
- | `generate_html` | `bool` | No | `False` | If `True`, generates an HTML output file. |
- | `generate_annotated_pdf` | `bool` | No | `False` | If `True`, generates an annotated PDF output file. |
- | `keep_line_breaks` | `bool` | No | `False` | If `True`, keeps line breaks in the output. |
- | `html_in_markdown` | `bool` | No | `False` | If `True`, uses HTML in the Markdown output. |
- | `add_image_to_markdown` | `bool` | No | `False` | If `True`, adds images to the Markdown output. |
- | `no_json` | `bool` | No | `False` | If `True`, disables the JSON output. |
- | `debug` | `bool` | No | `False` | If `True`, prints CLI messages to the console during execution. |
+ | Parameter | Type | Required | Default | Description |
+ |--------------------------|----------------| -------- |--------------|---------------------------------------------------------------------------------------------------------------------------------------------|
+ | `input_path` | `List[str]` | ✅ Yes | — | One or more PDF file paths or directories to process. |
+ | `output_dir` | `Optional[str]` | No | input folder | Directory where outputs are written. |
+ | `password` | `Optional[str]` | No | `None` | Password used for encrypted PDFs. |
+ | `format` | `Optional[List[str]]` | No | `None` | Output formats to generate (e.g. `"json"`, `"html"`, `"pdf"`, `"text"`, `"markdown"`, `"markdown-with-html"`, `"markdown-with-images"`). |
+ | `quiet` | `bool` | No | `False` | Suppresses CLI logging output when `True`. |
+ | `content_safety_off` | `Optional[List[str]]` | No | `None` | List of content safety filters to disable (e.g. `"all"`, `"hidden-text"`, `"off-page"`, `"tiny"`, `"hidden-ocg"`). |
+ | `keep_line_breaks` | `bool` | No | `False` | Preserves line breaks in text output when `True`. |
+ | `replace_invalid_chars` | `Optional[str]` | No | `None` | Replacement character for invalid or unrecognized characters (e.g., �, `\u0000`). |
+
+ ### Function: run()
+
+ Deprecated.
 
  <br/>
 
@@ -152,6 +154,24 @@ async function main() {
  main();
  ```
 
+ If you want to run it via CLI, you can use the following command:
+
+ ```bash
+ npx @opendataloader/pdf path/to/document.pdf -o path/to/output --markdown --html --pdf
+ ```
+
+ or you can install it globally:
+
+ ```bash
+ npm install -g @opendataloader/pdf
+ ```
+
+ then run:
+
+ ```bash
+ opendataloader-pdf path/to/document.pdf -o path/to/output --markdown --html --pdf
+ ```
+
  ### Function: run()
 
  `run(inputPath: string, options?: RunOptions): Promise<string>`
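Going by the `run()` implementation bundled in `dist/cli.cjs` in this same release, each `RunOptions` field is translated into a flag for the Java CLI, with the input path appended last. The sketch below reproduces that translation as a pure function so the mapping can be inspected without Java installed. The interface fields are taken from the published source; the helper name `buildCliArgs` is hypothetical and is not exported by the package.

```typescript
// Field names mirror the RunOptions interface in the published source.
interface RunOptions {
  outputFolder?: string;
  password?: string;
  replaceInvalidChars?: string;
  generateMarkdown?: boolean;
  generateHtml?: boolean;
  generateAnnotatedPdf?: boolean;
  keepLineBreaks?: boolean;
  contentSafetyOff?: string;
  htmlInMarkdown?: boolean;
  addImageToMarkdown?: boolean;
  noJson?: boolean;
  debug?: boolean;
}

// Hypothetical helper (not part of the package): builds the argument list
// in the same order the bundled run() pushes flags before spawning `java -jar`.
function buildCliArgs(inputPath: string, options: RunOptions = {}): string[] {
  const args: string[] = [];
  if (options.outputFolder) args.push("--output-dir", options.outputFolder);
  if (options.password) args.push("--password", options.password);
  if (options.replaceInvalidChars) args.push("--replace-invalid-chars", options.replaceInvalidChars);
  if (options.generateMarkdown) args.push("--markdown");
  if (options.generateHtml) args.push("--html");
  if (options.generateAnnotatedPdf) args.push("--pdf");
  if (options.keepLineBreaks) args.push("--keep-line-breaks");
  if (options.contentSafetyOff) args.push("--content-safety-off", options.contentSafetyOff);
  if (options.htmlInMarkdown) args.push("--markdown-with-html");
  if (options.addImageToMarkdown) args.push("--markdown-with-images");
  if (options.noJson) args.push("--no-json");
  // `debug` only affects logging in the wrapper; it adds no CLI flag.
  args.push(inputPath);
  return args;
}

const example = buildCliArgs("path/to/document.pdf", {
  outputFolder: "path/to/output",
  generateMarkdown: true,
  generateHtml: true,
});
console.log(example.join(" "));
// prints "--output-dir path/to/output --markdown --html path/to/document.pdf"
```

Note that this wrapper still emits the legacy per-format flags rather than the newer `-f/--format` list.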
@@ -321,16 +341,23 @@ The images are extracted from PDF as individual files and stored in a subfolder
  ```
  Options:
  -o,--output-dir <arg> Specifies the output directory for generated files
+ -p,--password <arg> Specifies the password for an encrypted PDF
+ -f,--format <arg> List of output formats to generate (json, text, html, pdf, markdown, markdown-with-html, markdown-with-images). Default: json
+ -q,--quiet Suppresses console logging output
+ --content-safety-off <arg> Disables one or more content safety filters. Accepts a list of filter names. Arguments: all, hidden-text, off-page, tiny, hidden-ocg
  --keep-line-breaks Preserves original line breaks in the extracted text
- --content-safety-off <arg> Disables one or more content safety filters. Accepts a comma-separated list of filter names. Arguments: all, hidden-text, off-page, tiny, hidden-ocg
- --markdown-with-html Sets the data extraction output format to Markdown with rendering complex elements like tables as HTML for better structure
- --markdown-with-images Sets the data extraction output format to Markdown with extracting images from the PDF and includes them as links
- --markdown Sets the data extraction output format to Markdown
- --html Sets the data extraction output format to HTML
+ --replace-invalid-chars <arg> Replaces invalid or unrecognized characters (e.g., �, \u0000) with the specified character
+ ```
+
+ The legacy options (for backward compatibility):
+
+ ```
  --no-json Disables the JSON output format
- -p,--password <arg> Specifies the password for an encrypted PDF
+ --html Sets the data extraction output format to HTML
  --pdf Generates a new PDF file where the extracted layout data is visualized as annotations
- --replace-invalid-chars <arg> Replaces invalid or unrecognized characters (e.g., �, \u0000) with the specified character
+ --markdown Sets the data extraction output format to Markdown
+ --markdown-with-html Sets the data extraction output format to Markdown with rendering complex elements like tables as HTML for better structure
+ --markdown-with-images Sets the data extraction output format to Markdown with extracting images from the PDF and includes them as links
  ```
 
  ### Schema of the JSON output
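The README hunk above moves the per-format switches into a "legacy options" block and introduces a single `-f,--format` list. For scripts that still pass the old switches, a small translation step may help during migration. The sketch below is an assumption, not documented behaviour: it assumes each legacy switch corresponds one-to-one to the `--format` value of the same name, that JSON is produced under the legacy switches unless `--no-json` is given, and that an explicit `--format` list replaces the `json` default. The names `LEGACY_FLAG_TO_FORMAT` and `legacyFlagsToFormats` are hypothetical.

```typescript
// Assumed one-to-one correspondence between legacy switches and --format values.
const LEGACY_FLAG_TO_FORMAT: Record<string, string> = {
  "--html": "html",
  "--pdf": "pdf",
  "--markdown": "markdown",
  "--markdown-with-html": "markdown-with-html",
  "--markdown-with-images": "markdown-with-images",
};

// Hypothetical helper: turns a list of legacy switches into a --format list.
// Switches it does not recognise are ignored rather than rejected.
function legacyFlagsToFormats(flags: string[]): string[] {
  // Assumption: JSON stays on unless --no-json is present.
  const formats: string[] = flags.includes("--no-json") ? [] : ["json"];
  for (const flag of flags) {
    const format = LEGACY_FLAG_TO_FORMAT[flag];
    if (format !== undefined && !formats.includes(format)) {
      formats.push(format);
    }
  }
  return formats;
}

console.log(legacyFlagsToFormats(["--markdown", "--html", "--pdf"]).join(" "));
// prints "json markdown html pdf"
```

The result can be passed after `-f` in place of the individual switches; whether the two forms produce identical output should be checked against the CLI itself.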
package/dist/cli.cjs ADDED
@@ -0,0 +1,276 @@
+ #!/usr/bin/env node
+ "use strict";
+ var __create = Object.create;
+ var __defProp = Object.defineProperty;
+ var __getOwnPropDesc = Object.getOwnPropertyDescriptor;
+ var __getOwnPropNames = Object.getOwnPropertyNames;
+ var __getProtoOf = Object.getPrototypeOf;
+ var __hasOwnProp = Object.prototype.hasOwnProperty;
+ var __copyProps = (to, from, except, desc) => {
+   if (from && typeof from === "object" || typeof from === "function") {
+     for (let key of __getOwnPropNames(from))
+       if (!__hasOwnProp.call(to, key) && key !== except)
+         __defProp(to, key, { get: () => from[key], enumerable: !(desc = __getOwnPropDesc(from, key)) || desc.enumerable });
+   }
+   return to;
+ };
+ var __toESM = (mod, isNodeMode, target) => (target = mod != null ? __create(__getProtoOf(mod)) : {}, __copyProps(
+   // If the importer is in node compatibility mode or this is not an ESM
+   // file that has been converted to a CommonJS file using a Babel-
+   // compatible transform (i.e. "__esModule" has not been set), then set
+   // "default" to the CommonJS "module.exports" for node compatibility.
+   isNodeMode || !mod || !mod.__esModule ? __defProp(target, "default", { value: mod, enumerable: true }) : target,
+   mod
+ ));
+
+ // src/index.ts
+ var import_child_process = require("child_process");
+ var path = __toESM(require("path"), 1);
+ var fs = __toESM(require("fs"), 1);
+ var import_url = require("url");
+ var import_meta = {};
+ var __filename = (0, import_url.fileURLToPath)(import_meta.url);
+ var __dirname = path.dirname(__filename);
+ var JAR_NAME = "opendataloader-pdf-cli.jar";
+ function getRedactedCommandString(command, commandArgs) {
+   const commandArgsForLogging = [...commandArgs];
+   const passwordIndex = commandArgsForLogging.indexOf("--password");
+   if (passwordIndex > -1 && passwordIndex + 1 < commandArgsForLogging.length) {
+     commandArgsForLogging[passwordIndex + 1] = "[REDACTED]";
+   }
+   return `${command} ${commandArgsForLogging.join(" ")}`;
+ }
+ function run(inputPath, options = {}) {
+   return new Promise((resolve, reject) => {
+     if (!fs.existsSync(inputPath)) {
+       return reject(new Error(`Input file or folder not found: ${inputPath}`));
+     }
+     const args = [];
+     if (options.outputFolder) {
+       args.push("--output-dir", options.outputFolder);
+     }
+     if (options.password) {
+       args.push("--password", options.password);
+     }
+     if (options.replaceInvalidChars) {
+       args.push("--replace-invalid-chars", options.replaceInvalidChars);
+     }
+     if (options.generateMarkdown) {
+       args.push("--markdown");
+     }
+     if (options.generateHtml) {
+       args.push("--html");
+     }
+     if (options.generateAnnotatedPdf) {
+       args.push("--pdf");
+     }
+     if (options.keepLineBreaks) {
+       args.push("--keep-line-breaks");
+     }
+     if (options.contentSafetyOff) {
+       args.push("--content-safety-off", options.contentSafetyOff);
+     }
+     if (options.htmlInMarkdown) {
+       args.push("--markdown-with-html");
+     }
+     if (options.addImageToMarkdown) {
+       args.push("--markdown-with-images");
+     }
+     if (options.noJson) {
+       args.push("--no-json");
+     }
+     args.push(inputPath);
+     const jarPath = path.join(__dirname, "..", "lib", JAR_NAME);
+     if (!fs.existsSync(jarPath)) {
+       return reject(
+         new Error(`JAR file not found at ${jarPath}. Please run the build script first.`)
+       );
+     }
+     const command = "java";
+     const commandArgs = ["-jar", jarPath, ...args];
+     if (options.debug) {
+       console.error(`Running command: ${getRedactedCommandString(command, commandArgs)}`);
+     }
+     const javaProcess = (0, import_child_process.spawn)(command, commandArgs);
+     let stdout = "";
+     let stderr = "";
+     javaProcess.stdout.on("data", (data) => {
+       const chunk = data.toString();
+       if (options.debug) {
+         process.stdout.write(chunk);
+       }
+       stdout += chunk;
+     });
+     javaProcess.stderr.on("data", (data) => {
+       const chunk = data.toString();
+       if (options.debug) {
+         process.stderr.write(chunk);
+       }
+       stderr += chunk;
+     });
+     javaProcess.on("close", (code) => {
+       if (code === 0) {
+         resolve(stdout);
+       } else {
+         const error = new Error(
+           `The opendataloader-pdf CLI exited with code ${code}.
+
+ ${stderr}`
+         );
+         reject(error);
+       }
+     });
+     javaProcess.on("error", (err) => {
+       if (err.message.includes("ENOENT")) {
+         reject(
+           new Error(
+             "'java' command not found. Please ensure Java is installed and in your system's PATH."
+           )
+         );
+       } else {
+         reject(err);
+       }
+     });
+   });
+ }
+
+ // src/cli.ts
+ function printHelp() {
+   console.log(`Usage: opendataloader-pdf [options] <input>`);
+   console.log("");
+   console.log("Options:");
+   console.log(" -o, --output-dir <path> Directory where outputs are written");
+   console.log(" -p, --password <password> Password for encrypted PDFs");
+   console.log(" --replace-invalid-chars <c> Replacement character for invalid characters");
+   console.log(" --content-safety-off <mode> Disable content safety filtering (provide mode)");
+   console.log(" --markdown Generate Markdown output");
+   console.log(" --html Generate HTML output");
+   console.log(" --pdf Generate annotated PDF output");
+   console.log(" --keep-line-breaks Preserve line breaks in text output");
+   console.log(" --markdown-with-html Allow raw HTML within Markdown output");
+   console.log(" --markdown-with-images Embed images in Markdown output");
+   console.log(" --no-json Disable JSON output generation");
+   console.log(" --debug Stream CLI logs directly to stdout/stderr");
+   console.log(" -h, --help Show this message and exit");
+ }
+ function parseArgs(argv) {
+   const options = {};
+   let inputPath;
+   let showHelp = false;
+   const readValue = (currentIndex, option) => {
+     const nextValue = argv[currentIndex + 1];
+     if (!nextValue || nextValue.startsWith("-")) {
+       throw new Error(`Option ${option} requires a value.`);
+     }
+     return { value: nextValue, nextIndex: currentIndex + 1 };
+   };
+   for (let i = 0; i < argv.length; i += 1) {
+     const arg = argv[i];
+     switch (arg) {
+       case "--help":
+       case "-h":
+         showHelp = true;
+         i = argv.length;
+         break;
+       case "--output-dir":
+       case "-o": {
+         const { value, nextIndex } = readValue(i, arg);
+         options.outputFolder = value;
+         i = nextIndex;
+         break;
+       }
+       case "--password":
+       case "-p": {
+         const { value, nextIndex } = readValue(i, arg);
+         options.password = value;
+         i = nextIndex;
+         break;
+       }
+       case "--replace-invalid-chars": {
+         const { value, nextIndex } = readValue(i, arg);
+         options.replaceInvalidChars = value;
+         i = nextIndex;
+         break;
+       }
+       case "--content-safety-off": {
+         const { value, nextIndex } = readValue(i, arg);
+         options.contentSafetyOff = value;
+         i = nextIndex;
+         break;
+       }
+       case "--markdown":
+         options.generateMarkdown = true;
+         break;
+       case "--html":
+         options.generateHtml = true;
+         break;
+       case "--pdf":
+         options.generateAnnotatedPdf = true;
+         break;
+       case "--keep-line-breaks":
+         options.keepLineBreaks = true;
+         break;
+       case "--markdown-with-html":
+         options.htmlInMarkdown = true;
+         break;
+       case "--markdown-with-images":
+         options.addImageToMarkdown = true;
+         break;
+       case "--no-json":
+         options.noJson = true;
+         break;
+       case "--debug":
+         options.debug = true;
+         break;
+       default:
+         if (arg.startsWith("-")) {
+           throw new Error(`Unknown option: ${arg}`);
+         }
+         if (inputPath) {
+           throw new Error("Multiple input paths provided. Only one input path is allowed.");
+         }
+         inputPath = arg;
+     }
+   }
+   return { inputPath, options, showHelp };
+ }
+ async function main() {
+   let parsed;
+   try {
+     parsed = parseArgs(process.argv.slice(2));
+   } catch (err) {
+     const message = err instanceof Error ? err.message : String(err);
+     console.error(message);
+     console.error("Use '--help' to see available options.");
+     return 1;
+   }
+   if (parsed.showHelp) {
+     printHelp();
+     return 0;
+   }
+   if (!parsed.inputPath) {
+     console.error("Missing required input path.");
+     console.error("Use '--help' to see usage information.");
+     return 1;
+   }
+   try {
+     const output = await run(parsed.inputPath, parsed.options);
+     if (output && !parsed.options.debug) {
+       process.stdout.write(output);
+     }
+     if (output && !output.endsWith("\n") && !parsed.options.debug) {
+       process.stdout.write("\n");
+     }
+     return 0;
+   } catch (err) {
+     const message = err instanceof Error ? err.message : String(err);
+     console.error(message);
+     return 1;
+   }
+ }
+ main().then((code) => {
+   if (code !== 0) {
+     process.exit(code);
+   }
+ });
+ //# sourceMappingURL=cli.cjs.map
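The bundled `getRedactedCommandString` masks the value after the first `--password` token. Inside this wrapper that is sufficient, because `parseArgs` folds `-p` into `options.password` and `run()` always emits the long form. If the same idea were applied to raw, user-supplied arguments, the short alias and repeated flags would also need handling. The following is a sketch of such a variant, not the package's code; `PASSWORD_FLAGS` and `redactPasswordArgs` are hypothetical names.

```typescript
// Flags whose following argument should be hidden in logs.
const PASSWORD_FLAGS: string[] = ["--password", "-p"];

// Hypothetical variant of the bundled helper: masks the value after every
// occurrence of a password flag, in either long or short form.
function redactPasswordArgs(command: string, commandArgs: string[]): string {
  const redacted = [...commandArgs];
  // Stop one short of the end: a trailing flag has no value to mask.
  for (let i = 0; i < redacted.length - 1; i += 1) {
    if (PASSWORD_FLAGS.includes(redacted[i])) {
      redacted[i + 1] = "[REDACTED]";
      i += 1; // skip the value that was just masked
    }
  }
  return `${command} ${redacted.join(" ")}`;
}

console.log(redactPasswordArgs("java", ["-jar", "cli.jar", "-p", "s3cret", "in.pdf"]));
// prints "java -jar cli.jar -p [REDACTED] in.pdf"
```

Like the original, this works on a copy of the argument array, so the arguments actually passed to the child process are left untouched.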
package/dist/cli.cjs.map ADDED
@@ -0,0 +1 @@
+ {"version":3,"sources":["../src/index.ts","../src/cli.ts"],"sourcesContent":["import { spawn } from 'child_process';\nimport * as path from 'path';\nimport * as fs from 'fs';\nimport { fileURLToPath } from 'url';\n\nconst __filename = fileURLToPath(import.meta.url);\nconst __dirname = path.dirname(__filename);\n\nconst JAR_NAME = 'opendataloader-pdf-cli.jar';\n\nfunction getRedactedCommandString(command: string, commandArgs: string[]): string {\n const commandArgsForLogging = [...commandArgs];\n const passwordIndex = commandArgsForLogging.indexOf('--password');\n if (passwordIndex > -1 && passwordIndex + 1 < commandArgsForLogging.length) {\n commandArgsForLogging[passwordIndex + 1] = '[REDACTED]';\n }\n return `${command} ${commandArgsForLogging.join(' ')}`;\n}\n\nexport interface RunOptions {\n outputFolder?: string;\n password?: string;\n replaceInvalidChars?: string;\n generateMarkdown?: boolean;\n generateHtml?: boolean;\n generateAnnotatedPdf?: boolean;\n keepLineBreaks?: boolean;\n contentSafetyOff?: string;\n htmlInMarkdown?: boolean;\n addImageToMarkdown?: boolean;\n noJson?: boolean;\n debug?: boolean;\n}\n\nexport function run(inputPath: string, options: RunOptions = {}): Promise<string> {\n return new Promise((resolve, reject) => {\n if (!fs.existsSync(inputPath)) {\n return reject(new Error(`Input file or folder not found: ${inputPath}`));\n }\n\n const args: string[] = [];\n if (options.outputFolder) {\n args.push('--output-dir', options.outputFolder);\n }\n if (options.password) {\n args.push('--password', options.password);\n }\n if (options.replaceInvalidChars) {\n args.push('--replace-invalid-chars', options.replaceInvalidChars);\n }\n if (options.generateMarkdown) {\n args.push('--markdown');\n }\n if (options.generateHtml) {\n args.push('--html');\n }\n if (options.generateAnnotatedPdf) {\n args.push('--pdf');\n }\n if (options.keepLineBreaks) {\n args.push('--keep-line-breaks');\n }\n if (options.contentSafetyOff) {\n 
args.push('--content-safety-off', options.contentSafetyOff);\n }\n if (options.htmlInMarkdown) {\n args.push('--markdown-with-html');\n }\n if (options.addImageToMarkdown) {\n args.push('--markdown-with-images');\n }\n if (options.noJson) {\n args.push('--no-json');\n }\n\n args.push(inputPath);\n\n const jarPath = path.join(__dirname, '..', 'lib', JAR_NAME);\n\n if (!fs.existsSync(jarPath)) {\n return reject(\n new Error(`JAR file not found at ${jarPath}. Please run the build script first.`),\n );\n }\n\n const command = 'java';\n const commandArgs = ['-jar', jarPath, ...args];\n\n if (options.debug) {\n console.error(`Running command: ${getRedactedCommandString(command, commandArgs)}`);\n }\n\n const javaProcess = spawn(command, commandArgs);\n\n let stdout = '';\n let stderr = '';\n\n javaProcess.stdout.on('data', (data) => {\n const chunk = data.toString();\n if (options.debug) {\n process.stdout.write(chunk);\n }\n stdout += chunk;\n });\n\n javaProcess.stderr.on('data', (data) => {\n const chunk = data.toString();\n if (options.debug) {\n process.stderr.write(chunk);\n }\n stderr += chunk;\n });\n\n javaProcess.on('close', (code) => {\n if (code === 0) {\n resolve(stdout);\n } else {\n const error = new Error(\n `The opendataloader-pdf CLI exited with code ${code}.\\n\\n${stderr}`,\n );\n reject(error);\n }\n });\n\n javaProcess.on('error', (err) => {\n if (err.message.includes('ENOENT')) {\n reject(\n new Error(\n \"'java' command not found. 
Please ensure Java is installed and in your system's PATH.\",\n ),\n );\n } else {\n reject(err);\n }\n });\n });\n}\n","#!/usr/bin/env node\nimport { run, RunOptions } from './index.js';\n\ninterface ParsedArgs {\n inputPath?: string;\n options: RunOptions;\n showHelp: boolean;\n}\n\nfunction printHelp(): void {\n console.log(`Usage: opendataloader-pdf [options] <input>`);\n console.log('');\n console.log('Options:');\n console.log(' -o, --output-dir <path> Directory where outputs are written');\n console.log(' -p, --password <password> Password for encrypted PDFs');\n console.log(' --replace-invalid-chars <c> Replacement character for invalid characters');\n console.log(' --content-safety-off <mode> Disable content safety filtering (provide mode)');\n console.log(' --markdown Generate Markdown output');\n console.log(' --html Generate HTML output');\n console.log(' --pdf Generate annotated PDF output');\n console.log(' --keep-line-breaks Preserve line breaks in text output');\n console.log(' --markdown-with-html Allow raw HTML within Markdown output');\n console.log(' --markdown-with-images Embed images in Markdown output');\n console.log(' --no-json Disable JSON output generation');\n console.log(' --debug Stream CLI logs directly to stdout/stderr');\n console.log(' -h, --help Show this message and exit');\n}\n\nfunction parseArgs(argv: string[]): ParsedArgs {\n const options: RunOptions = {};\n let inputPath: string | undefined;\n let showHelp = false;\n\n const readValue = (currentIndex: number, option: string): { value: string; nextIndex: number } => {\n const nextValue = argv[currentIndex + 1];\n if (!nextValue || nextValue.startsWith('-')) {\n throw new Error(`Option ${option} requires a value.`);\n }\n return { value: nextValue, nextIndex: currentIndex + 1 };\n };\n\n for (let i = 0; i < argv.length; i += 1) {\n const arg = argv[i];\n\n switch (arg) {\n case '--help':\n case '-h':\n showHelp = true;\n i = argv.length; // exit loop\n break;\n case 
'--output-dir':\n case '-o': {\n const { value, nextIndex } = readValue(i, arg);\n options.outputFolder = value;\n i = nextIndex;\n break;\n }\n case '--password':\n case '-p': {\n const { value, nextIndex } = readValue(i, arg);\n options.password = value;\n i = nextIndex;\n break;\n }\n case '--replace-invalid-chars': {\n const { value, nextIndex } = readValue(i, arg);\n options.replaceInvalidChars = value;\n i = nextIndex;\n break;\n }\n case '--content-safety-off': {\n const { value, nextIndex } = readValue(i, arg);\n options.contentSafetyOff = value;\n i = nextIndex;\n break;\n }\n case '--markdown':\n options.generateMarkdown = true;\n break;\n case '--html':\n options.generateHtml = true;\n break;\n case '--pdf':\n options.generateAnnotatedPdf = true;\n break;\n case '--keep-line-breaks':\n options.keepLineBreaks = true;\n break;\n case '--markdown-with-html':\n options.htmlInMarkdown = true;\n break;\n case '--markdown-with-images':\n options.addImageToMarkdown = true;\n break;\n case '--no-json':\n options.noJson = true;\n break;\n case '--debug':\n options.debug = true;\n break;\n default:\n if (arg.startsWith('-')) {\n throw new Error(`Unknown option: ${arg}`);\n }\n\n if (inputPath) {\n throw new Error('Multiple input paths provided. Only one input path is allowed.');\n }\n inputPath = arg;\n }\n }\n\n return { inputPath, options, showHelp };\n}\n\nasync function main(): Promise<number> {\n let parsed: ParsedArgs;\n\n try {\n parsed = parseArgs(process.argv.slice(2));\n } catch (err) {\n const message = err instanceof Error ? 
err.message : String(err);\n console.error(message);\n console.error(\"Use '--help' to see available options.\");\n return 1;\n }\n\n if (parsed.showHelp) {\n printHelp();\n return 0;\n }\n\n if (!parsed.inputPath) {\n console.error('Missing required input path.');\n console.error(\"Use '--help' to see usage information.\");\n return 1;\n }\n\n try {\n const output = await run(parsed.inputPath, parsed.options);\n if (output && !parsed.options.debug) {\n process.stdout.write(output);\n }\n if (output && !output.endsWith('\\n') && !parsed.options.debug) {\n process.stdout.write('\\n');\n }\n return 0;\n } catch (err) {\n const message = err instanceof Error ? err.message : String(err);\n console.error(message);\n return 1;\n }\n}\n\nmain().then((code) => {\n if (code !== 0) {\n process.exit(code);\n }\n});\n"],"mappings":";;;;;;;;;;;;;;;;;;;;;;;;;;AAAA,2BAAsB;AACtB,WAAsB;AACtB,SAAoB;AACpB,iBAA8B;AAH9B;AAKA,IAAM,iBAAa,0BAAc,YAAY,GAAG;AAChD,IAAM,YAAiB,aAAQ,UAAU;AAEzC,IAAM,WAAW;AAEjB,SAAS,yBAAyB,SAAiB,aAA+B;AAChF,QAAM,wBAAwB,CAAC,GAAG,WAAW;AAC7C,QAAM,gBAAgB,sBAAsB,QAAQ,YAAY;AAChE,MAAI,gBAAgB,MAAM,gBAAgB,IAAI,sBAAsB,QAAQ;AAC1E,0BAAsB,gBAAgB,CAAC,IAAI;AAAA,EAC7C;AACA,SAAO,GAAG,OAAO,IAAI,sBAAsB,KAAK,GAAG,CAAC;AACtD;AAiBO,SAAS,IAAI,WAAmB,UAAsB,CAAC,GAAoB;AAChF,SAAO,IAAI,QAAQ,CAAC,SAAS,WAAW;AACtC,QAAI,CAAI,cAAW,SAAS,GAAG;AAC7B,aAAO,OAAO,IAAI,MAAM,mCAAmC,SAAS,EAAE,CAAC;AAAA,IACzE;AAEA,UAAM,OAAiB,CAAC;AACxB,QAAI,QAAQ,cAAc;AACxB,WAAK,KAAK,gBAAgB,QAAQ,YAAY;AAAA,IAChD;AACA,QAAI,QAAQ,UAAU;AACpB,WAAK,KAAK,cAAc,QAAQ,QAAQ;AAAA,IAC1C;AACA,QAAI,QAAQ,qBAAqB;AAC/B,WAAK,KAAK,2BAA2B,QAAQ,mBAAmB;AAAA,IAClE;AACA,QAAI,QAAQ,kBAAkB;AAC5B,WAAK,KAAK,YAAY;AAAA,IACxB;AACA,QAAI,QAAQ,cAAc;AACxB,WAAK,KAAK,QAAQ;AAAA,IACpB;AACA,QAAI,QAAQ,sBAAsB;AAChC,WAAK,KAAK,OAAO;AAAA,IACnB;AACA,QAAI,QAAQ,gBAAgB;AAC1B,WAAK,KAAK,oBAAoB;AAAA,IAChC;AACA,QAAI,QAAQ,kBAAkB;AAC5B,WAAK,KAAK,wBAAwB,QAAQ,gBAAgB;AAAA,IAC5D;AACA,QAAI,QAAQ,gBAAgB;AAC1B,WAAK,KAAK,sBAAsB;AAAA,IAClC;AACA,QAAI,QAAQ,oBAAoB;AAC9B,WAAK,KAAK,wBAAwB;AAAA,IA
CpC;AACA,QAAI,QAAQ,QAAQ;AAClB,WAAK,KAAK,WAAW;AAAA,IACvB;AAEA,SAAK,KAAK,SAAS;AAEnB,UAAM,UAAe,UAAK,WAAW,MAAM,OAAO,QAAQ;AAE1D,QAAI,CAAI,cAAW,OAAO,GAAG;AAC3B,aAAO;AAAA,QACL,IAAI,MAAM,yBAAyB,OAAO,sCAAsC;AAAA,MAClF;AAAA,IACF;AAEA,UAAM,UAAU;AAChB,UAAM,cAAc,CAAC,QAAQ,SAAS,GAAG,IAAI;AAE7C,QAAI,QAAQ,OAAO;AACjB,cAAQ,MAAM,oBAAoB,yBAAyB,SAAS,WAAW,CAAC,EAAE;AAAA,IACpF;AAEA,UAAM,kBAAc,4BAAM,SAAS,WAAW;AAE9C,QAAI,SAAS;AACb,QAAI,SAAS;AAEb,gBAAY,OAAO,GAAG,QAAQ,CAAC,SAAS;AACtC,YAAM,QAAQ,KAAK,SAAS;AAC5B,UAAI,QAAQ,OAAO;AACjB,gBAAQ,OAAO,MAAM,KAAK;AAAA,MAC5B;AACA,gBAAU;AAAA,IACZ,CAAC;AAED,gBAAY,OAAO,GAAG,QAAQ,CAAC,SAAS;AACtC,YAAM,QAAQ,KAAK,SAAS;AAC5B,UAAI,QAAQ,OAAO;AACjB,gBAAQ,OAAO,MAAM,KAAK;AAAA,MAC5B;AACA,gBAAU;AAAA,IACZ,CAAC;AAED,gBAAY,GAAG,SAAS,CAAC,SAAS;AAChC,UAAI,SAAS,GAAG;AACd,gBAAQ,MAAM;AAAA,MAChB,OAAO;AACL,cAAM,QAAQ,IAAI;AAAA,UAChB,+CAA+C,IAAI;AAAA;AAAA,EAAQ,MAAM;AAAA,QACnE;AACA,eAAO,KAAK;AAAA,MACd;AAAA,IACF,CAAC;AAED,gBAAY,GAAG,SAAS,CAAC,QAAQ;AAC/B,UAAI,IAAI,QAAQ,SAAS,QAAQ,GAAG;AAClC;AAAA,UACE,IAAI;AAAA,YACF;AAAA,UACF;AAAA,QACF;AAAA,MACF,OAAO;AACL,eAAO,GAAG;AAAA,MACZ;AAAA,IACF,CAAC;AAAA,EACH,CAAC;AACH;;;AC/HA,SAAS,YAAkB;AACzB,UAAQ,IAAI,6CAA6C;AACzD,UAAQ,IAAI,EAAE;AACd,UAAQ,IAAI,UAAU;AACtB,UAAQ,IAAI,oEAAoE;AAChF,UAAQ,IAAI,4DAA4D;AACxE,UAAQ,IAAI,6EAA6E;AACzF,UAAQ,IAAI,gFAAgF;AAC5F,UAAQ,IAAI,yDAAyD;AACrE,UAAQ,IAAI,qDAAqD;AACjE,UAAQ,IAAI,8DAA8D;AAC1E,UAAQ,IAAI,oEAAoE;AAChF,UAAQ,IAAI,sEAAsE;AAClF,UAAQ,IAAI,gEAAgE;AAC5E,UAAQ,IAAI,+DAA+D;AAC3E,UAAQ,IAAI,0EAA0E;AACtF,UAAQ,IAAI,2DAA2D;AACzE;AAEA,SAAS,UAAU,MAA4B;AAC7C,QAAM,UAAsB,CAAC;AAC7B,MAAI;AACJ,MAAI,WAAW;AAEf,QAAM,YAAY,CAAC,cAAsB,WAAyD;AAChG,UAAM,YAAY,KAAK,eAAe,CAAC;AACvC,QAAI,CAAC,aAAa,UAAU,WAAW,GAAG,GAAG;AAC3C,YAAM,IAAI,MAAM,UAAU,MAAM,oBAAoB;AAAA,IACtD;AACA,WAAO,EAAE,OAAO,WAAW,WAAW,eAAe,EAAE;AAAA,EACzD;AAEA,WAAS,IAAI,GAAG,IAAI,KAAK,QAAQ,KAAK,GAAG;AACvC,UAAM,MAAM,KAAK,CAAC;AAElB,YAAQ,KAAK;AAAA,MACX,KAAK;AAAA,MACL,KAAK;AACH,mBAAW;AACX,YAAI,KAAK;AACT;AAAA,MACF,KAAK;AAAA,MACL,KAAK,MAAM;AACT,cAAM,EAAE,OAAO,UAAU,IAAI,UAAU,GAAG,GAAG;AAC7C,
gBAAQ,eAAe;AACvB,YAAI;AACJ;AAAA,MACF;AAAA,MACA,KAAK;AAAA,MACL,KAAK,MAAM;AACT,cAAM,EAAE,OAAO,UAAU,IAAI,UAAU,GAAG,GAAG;AAC7C,gBAAQ,WAAW;AACnB,YAAI;AACJ;AAAA,MACF;AAAA,MACA,KAAK,2BAA2B;AAC9B,cAAM,EAAE,OAAO,UAAU,IAAI,UAAU,GAAG,GAAG;AAC7C,gBAAQ,sBAAsB;AAC9B,YAAI;AACJ;AAAA,MACF;AAAA,MACA,KAAK,wBAAwB;AAC3B,cAAM,EAAE,OAAO,UAAU,IAAI,UAAU,GAAG,GAAG;AAC7C,gBAAQ,mBAAmB;AAC3B,YAAI;AACJ;AAAA,MACF;AAAA,MACA,KAAK;AACH,gBAAQ,mBAAmB;AAC3B;AAAA,MACF,KAAK;AACH,gBAAQ,eAAe;AACvB;AAAA,MACF,KAAK;AACH,gBAAQ,uBAAuB;AAC/B;AAAA,MACF,KAAK;AACH,gBAAQ,iBAAiB;AACzB;AAAA,MACF,KAAK;AACH,gBAAQ,iBAAiB;AACzB;AAAA,MACF,KAAK;AACH,gBAAQ,qBAAqB;AAC7B;AAAA,MACF,KAAK;AACH,gBAAQ,SAAS;AACjB;AAAA,MACF,KAAK;AACH,gBAAQ,QAAQ;AAChB;AAAA,MACF;AACE,YAAI,IAAI,WAAW,GAAG,GAAG;AACvB,gBAAM,IAAI,MAAM,mBAAmB,GAAG,EAAE;AAAA,QAC1C;AAEA,YAAI,WAAW;AACb,gBAAM,IAAI,MAAM,gEAAgE;AAAA,QAClF;AACA,oBAAY;AAAA,IAChB;AAAA,EACF;AAEA,SAAO,EAAE,WAAW,SAAS,SAAS;AACxC;AAEA,eAAe,OAAwB;AACrC,MAAI;AAEJ,MAAI;AACF,aAAS,UAAU,QAAQ,KAAK,MAAM,CAAC,CAAC;AAAA,EAC1C,SAAS,KAAK;AACZ,UAAM,UAAU,eAAe,QAAQ,IAAI,UAAU,OAAO,GAAG;AAC/D,YAAQ,MAAM,OAAO;AACrB,YAAQ,MAAM,wCAAwC;AACtD,WAAO;AAAA,EACT;AAEA,MAAI,OAAO,UAAU;AACnB,cAAU;AACV,WAAO;AAAA,EACT;AAEA,MAAI,CAAC,OAAO,WAAW;AACrB,YAAQ,MAAM,8BAA8B;AAC5C,YAAQ,MAAM,wCAAwC;AACtD,WAAO;AAAA,EACT;AAEA,MAAI;AACF,UAAM,SAAS,MAAM,IAAI,OAAO,WAAW,OAAO,OAAO;AACzD,QAAI,UAAU,CAAC,OAAO,QAAQ,OAAO;AACnC,cAAQ,OAAO,MAAM,MAAM;AAAA,IAC7B;AACA,QAAI,UAAU,CAAC,OAAO,SAAS,IAAI,KAAK,CAAC,OAAO,QAAQ,OAAO;AAC7D,cAAQ,OAAO,MAAM,IAAI;AAAA,IAC3B;AACA,WAAO;AAAA,EACT,SAAS,KAAK;AACZ,UAAM,UAAU,eAAe,QAAQ,IAAI,UAAU,OAAO,GAAG;AAC/D,YAAQ,MAAM,OAAO;AACrB,WAAO;AAAA,EACT;AACF;AAEA,KAAK,EAAE,KAAK,CAAC,SAAS;AACpB,MAAI,SAAS,GAAG;AACd,YAAQ,KAAK,IAAI;AAAA,EACnB;AACF,CAAC;","names":[]}
package/dist/cli.d.cts ADDED
@@ -0,0 +1 @@
+ #!/usr/bin/env node
package/dist/cli.d.ts ADDED
@@ -0,0 +1 @@
+ #!/usr/bin/env node
package/dist/cli.js ADDED
@@ -0,0 +1,252 @@
+ #!/usr/bin/env node
+
+ // src/index.ts
+ import { spawn } from "child_process";
+ import * as path from "path";
+ import * as fs from "fs";
+ import { fileURLToPath } from "url";
+ var __filename = fileURLToPath(import.meta.url);
+ var __dirname = path.dirname(__filename);
+ var JAR_NAME = "opendataloader-pdf-cli.jar";
+ function getRedactedCommandString(command, commandArgs) {
+   const commandArgsForLogging = [...commandArgs];
+   const passwordIndex = commandArgsForLogging.indexOf("--password");
+   if (passwordIndex > -1 && passwordIndex + 1 < commandArgsForLogging.length) {
+     commandArgsForLogging[passwordIndex + 1] = "[REDACTED]";
+   }
+   return `${command} ${commandArgsForLogging.join(" ")}`;
+ }
+ function run(inputPath, options = {}) {
+   return new Promise((resolve, reject) => {
+     if (!fs.existsSync(inputPath)) {
+       return reject(new Error(`Input file or folder not found: ${inputPath}`));
+     }
+     const args = [];
+     if (options.outputFolder) {
+       args.push("--output-dir", options.outputFolder);
+     }
+     if (options.password) {
+       args.push("--password", options.password);
+     }
+     if (options.replaceInvalidChars) {
+       args.push("--replace-invalid-chars", options.replaceInvalidChars);
+     }
+     if (options.generateMarkdown) {
+       args.push("--markdown");
+     }
+     if (options.generateHtml) {
+       args.push("--html");
+     }
+     if (options.generateAnnotatedPdf) {
+       args.push("--pdf");
+     }
+     if (options.keepLineBreaks) {
+       args.push("--keep-line-breaks");
+     }
+     if (options.contentSafetyOff) {
+       args.push("--content-safety-off", options.contentSafetyOff);
+     }
+     if (options.htmlInMarkdown) {
+       args.push("--markdown-with-html");
+     }
+     if (options.addImageToMarkdown) {
+       args.push("--markdown-with-images");
+     }
+     if (options.noJson) {
+       args.push("--no-json");
+     }
+     args.push(inputPath);
+     const jarPath = path.join(__dirname, "..", "lib", JAR_NAME);
+     if (!fs.existsSync(jarPath)) {
+       return reject(
+         new Error(`JAR file not found at ${jarPath}. Please run the build script first.`)
+       );
+     }
+     const command = "java";
+     const commandArgs = ["-jar", jarPath, ...args];
+     if (options.debug) {
+       console.error(`Running command: ${getRedactedCommandString(command, commandArgs)}`);
+     }
+     const javaProcess = spawn(command, commandArgs);
+     let stdout = "";
+     let stderr = "";
+     javaProcess.stdout.on("data", (data) => {
+       const chunk = data.toString();
+       if (options.debug) {
+         process.stdout.write(chunk);
+       }
+       stdout += chunk;
+     });
+     javaProcess.stderr.on("data", (data) => {
+       const chunk = data.toString();
+       if (options.debug) {
+         process.stderr.write(chunk);
+       }
+       stderr += chunk;
+     });
+     javaProcess.on("close", (code) => {
+       if (code === 0) {
+         resolve(stdout);
+       } else {
+         const error = new Error(
+           `The opendataloader-pdf CLI exited with code ${code}.
+
+ ${stderr}`
+         );
+         reject(error);
+       }
+     });
+     javaProcess.on("error", (err) => {
+       if (err.message.includes("ENOENT")) {
+         reject(
+           new Error(
+             "'java' command not found. Please ensure Java is installed and in your system's PATH."
+           )
+         );
+       } else {
+         reject(err);
+       }
+     });
+   });
+ }
+
+ // src/cli.ts
+ function printHelp() {
+   console.log(`Usage: opendataloader-pdf [options] <input>`);
+   console.log("");
+   console.log("Options:");
+   console.log(" -o, --output-dir <path> Directory where outputs are written");
+   console.log(" -p, --password <password> Password for encrypted PDFs");
+   console.log(" --replace-invalid-chars <c> Replacement character for invalid characters");
+   console.log(" --content-safety-off <mode> Disable content safety filtering (provide mode)");
+   console.log(" --markdown Generate Markdown output");
+   console.log(" --html Generate HTML output");
+   console.log(" --pdf Generate annotated PDF output");
+   console.log(" --keep-line-breaks Preserve line breaks in text output");
+   console.log(" --markdown-with-html Allow raw HTML within Markdown output");
+   console.log(" --markdown-with-images Embed images in Markdown output");
+   console.log(" --no-json Disable JSON output generation");
+   console.log(" --debug Stream CLI logs directly to stdout/stderr");
+   console.log(" -h, --help Show this message and exit");
+ }
+ function parseArgs(argv) {
+   const options = {};
+   let inputPath;
+   let showHelp = false;
+   const readValue = (currentIndex, option) => {
+     const nextValue = argv[currentIndex + 1];
+     if (!nextValue || nextValue.startsWith("-")) {
+       throw new Error(`Option ${option} requires a value.`);
+     }
+     return { value: nextValue, nextIndex: currentIndex + 1 };
+   };
+   for (let i = 0; i < argv.length; i += 1) {
+     const arg = argv[i];
+     switch (arg) {
+       case "--help":
+       case "-h":
+         showHelp = true;
+         i = argv.length;
+         break;
+       case "--output-dir":
+       case "-o": {
+         const { value, nextIndex } = readValue(i, arg);
+         options.outputFolder = value;
+         i = nextIndex;
+         break;
+       }
+       case "--password":
+       case "-p": {
+         const { value, nextIndex } = readValue(i, arg);
+         options.password = value;
+         i = nextIndex;
+         break;
+       }
+       case "--replace-invalid-chars": {
+         const { value, nextIndex } = readValue(i, arg);
+         options.replaceInvalidChars = value;
+         i = nextIndex;
+         break;
+       }
+       case "--content-safety-off": {
+         const { value, nextIndex } = readValue(i, arg);
+         options.contentSafetyOff = value;
+         i = nextIndex;
+         break;
+       }
+       case "--markdown":
+         options.generateMarkdown = true;
+         break;
+       case "--html":
+         options.generateHtml = true;
+         break;
+       case "--pdf":
+         options.generateAnnotatedPdf = true;
+         break;
+       case "--keep-line-breaks":
+         options.keepLineBreaks = true;
+         break;
+       case "--markdown-with-html":
+         options.htmlInMarkdown = true;
+         break;
+       case "--markdown-with-images":
+         options.addImageToMarkdown = true;
+         break;
+       case "--no-json":
+         options.noJson = true;
+         break;
+       case "--debug":
+         options.debug = true;
+         break;
+       default:
+         if (arg.startsWith("-")) {
+           throw new Error(`Unknown option: ${arg}`);
+         }
+         if (inputPath) {
+           throw new Error("Multiple input paths provided. Only one input path is allowed.");
+         }
+         inputPath = arg;
+     }
+   }
+   return { inputPath, options, showHelp };
+ }
+ async function main() {
+   let parsed;
+   try {
+     parsed = parseArgs(process.argv.slice(2));
+   } catch (err) {
+     const message = err instanceof Error ? err.message : String(err);
+     console.error(message);
+     console.error("Use '--help' to see available options.");
+     return 1;
+   }
+   if (parsed.showHelp) {
+     printHelp();
+     return 0;
+   }
+   if (!parsed.inputPath) {
+     console.error("Missing required input path.");
+     console.error("Use '--help' to see usage information.");
+     return 1;
+   }
+   try {
+     const output = await run(parsed.inputPath, parsed.options);
+     if (output && !parsed.options.debug) {
+       process.stdout.write(output);
+     }
+     if (output && !output.endsWith("\n") && !parsed.options.debug) {
+       process.stdout.write("\n");
+     }
+     return 0;
+   } catch (err) {
+     const message = err instanceof Error ? err.message : String(err);
+     console.error(message);
+     return 1;
+   }
+ }
+ main().then((code) => {
+   if (code !== 0) {
+     process.exit(code);
+   }
+ });
+ //# sourceMappingURL=cli.js.map
@@ -0,0 +1 @@
1
+ {"version":3,"sources":["../src/index.ts","../src/cli.ts"],"sourcesContent":["import { spawn } from 'child_process';\nimport * as path from 'path';\nimport * as fs from 'fs';\nimport { fileURLToPath } from 'url';\n\nconst __filename = fileURLToPath(import.meta.url);\nconst __dirname = path.dirname(__filename);\n\nconst JAR_NAME = 'opendataloader-pdf-cli.jar';\n\nfunction getRedactedCommandString(command: string, commandArgs: string[]): string {\n const commandArgsForLogging = [...commandArgs];\n const passwordIndex = commandArgsForLogging.indexOf('--password');\n if (passwordIndex > -1 && passwordIndex + 1 < commandArgsForLogging.length) {\n commandArgsForLogging[passwordIndex + 1] = '[REDACTED]';\n }\n return `${command} ${commandArgsForLogging.join(' ')}`;\n}\n\nexport interface RunOptions {\n outputFolder?: string;\n password?: string;\n replaceInvalidChars?: string;\n generateMarkdown?: boolean;\n generateHtml?: boolean;\n generateAnnotatedPdf?: boolean;\n keepLineBreaks?: boolean;\n contentSafetyOff?: string;\n htmlInMarkdown?: boolean;\n addImageToMarkdown?: boolean;\n noJson?: boolean;\n debug?: boolean;\n}\n\nexport function run(inputPath: string, options: RunOptions = {}): Promise<string> {\n return new Promise((resolve, reject) => {\n if (!fs.existsSync(inputPath)) {\n return reject(new Error(`Input file or folder not found: ${inputPath}`));\n }\n\n const args: string[] = [];\n if (options.outputFolder) {\n args.push('--output-dir', options.outputFolder);\n }\n if (options.password) {\n args.push('--password', options.password);\n }\n if (options.replaceInvalidChars) {\n args.push('--replace-invalid-chars', options.replaceInvalidChars);\n }\n if (options.generateMarkdown) {\n args.push('--markdown');\n }\n if (options.generateHtml) {\n args.push('--html');\n }\n if (options.generateAnnotatedPdf) {\n args.push('--pdf');\n }\n if (options.keepLineBreaks) {\n args.push('--keep-line-breaks');\n }\n if (options.contentSafetyOff) {\n 
args.push('--content-safety-off', options.contentSafetyOff);\n }\n if (options.htmlInMarkdown) {\n args.push('--markdown-with-html');\n }\n if (options.addImageToMarkdown) {\n args.push('--markdown-with-images');\n }\n if (options.noJson) {\n args.push('--no-json');\n }\n\n args.push(inputPath);\n\n const jarPath = path.join(__dirname, '..', 'lib', JAR_NAME);\n\n if (!fs.existsSync(jarPath)) {\n return reject(\n new Error(`JAR file not found at ${jarPath}. Please run the build script first.`),\n );\n }\n\n const command = 'java';\n const commandArgs = ['-jar', jarPath, ...args];\n\n if (options.debug) {\n console.error(`Running command: ${getRedactedCommandString(command, commandArgs)}`);\n }\n\n const javaProcess = spawn(command, commandArgs);\n\n let stdout = '';\n let stderr = '';\n\n javaProcess.stdout.on('data', (data) => {\n const chunk = data.toString();\n if (options.debug) {\n process.stdout.write(chunk);\n }\n stdout += chunk;\n });\n\n javaProcess.stderr.on('data', (data) => {\n const chunk = data.toString();\n if (options.debug) {\n process.stderr.write(chunk);\n }\n stderr += chunk;\n });\n\n javaProcess.on('close', (code) => {\n if (code === 0) {\n resolve(stdout);\n } else {\n const error = new Error(\n `The opendataloader-pdf CLI exited with code ${code}.\\n\\n${stderr}`,\n );\n reject(error);\n }\n });\n\n javaProcess.on('error', (err) => {\n if (err.message.includes('ENOENT')) {\n reject(\n new Error(\n \"'java' command not found. 
Please ensure Java is installed and in your system's PATH.\",\n ),\n );\n } else {\n reject(err);\n }\n });\n });\n}\n","#!/usr/bin/env node\nimport { run, RunOptions } from './index.js';\n\ninterface ParsedArgs {\n inputPath?: string;\n options: RunOptions;\n showHelp: boolean;\n}\n\nfunction printHelp(): void {\n console.log(`Usage: opendataloader-pdf [options] <input>`);\n console.log('');\n console.log('Options:');\n console.log(' -o, --output-dir <path> Directory where outputs are written');\n console.log(' -p, --password <password> Password for encrypted PDFs');\n console.log(' --replace-invalid-chars <c> Replacement character for invalid characters');\n console.log(' --content-safety-off <mode> Disable content safety filtering (provide mode)');\n console.log(' --markdown Generate Markdown output');\n console.log(' --html Generate HTML output');\n console.log(' --pdf Generate annotated PDF output');\n console.log(' --keep-line-breaks Preserve line breaks in text output');\n console.log(' --markdown-with-html Allow raw HTML within Markdown output');\n console.log(' --markdown-with-images Embed images in Markdown output');\n console.log(' --no-json Disable JSON output generation');\n console.log(' --debug Stream CLI logs directly to stdout/stderr');\n console.log(' -h, --help Show this message and exit');\n}\n\nfunction parseArgs(argv: string[]): ParsedArgs {\n const options: RunOptions = {};\n let inputPath: string | undefined;\n let showHelp = false;\n\n const readValue = (currentIndex: number, option: string): { value: string; nextIndex: number } => {\n const nextValue = argv[currentIndex + 1];\n if (!nextValue || nextValue.startsWith('-')) {\n throw new Error(`Option ${option} requires a value.`);\n }\n return { value: nextValue, nextIndex: currentIndex + 1 };\n };\n\n for (let i = 0; i < argv.length; i += 1) {\n const arg = argv[i];\n\n switch (arg) {\n case '--help':\n case '-h':\n showHelp = true;\n i = argv.length; // exit loop\n break;\n case 
'--output-dir':\n case '-o': {\n const { value, nextIndex } = readValue(i, arg);\n options.outputFolder = value;\n i = nextIndex;\n break;\n }\n case '--password':\n case '-p': {\n const { value, nextIndex } = readValue(i, arg);\n options.password = value;\n i = nextIndex;\n break;\n }\n case '--replace-invalid-chars': {\n const { value, nextIndex } = readValue(i, arg);\n options.replaceInvalidChars = value;\n i = nextIndex;\n break;\n }\n case '--content-safety-off': {\n const { value, nextIndex } = readValue(i, arg);\n options.contentSafetyOff = value;\n i = nextIndex;\n break;\n }\n case '--markdown':\n options.generateMarkdown = true;\n break;\n case '--html':\n options.generateHtml = true;\n break;\n case '--pdf':\n options.generateAnnotatedPdf = true;\n break;\n case '--keep-line-breaks':\n options.keepLineBreaks = true;\n break;\n case '--markdown-with-html':\n options.htmlInMarkdown = true;\n break;\n case '--markdown-with-images':\n options.addImageToMarkdown = true;\n break;\n case '--no-json':\n options.noJson = true;\n break;\n case '--debug':\n options.debug = true;\n break;\n default:\n if (arg.startsWith('-')) {\n throw new Error(`Unknown option: ${arg}`);\n }\n\n if (inputPath) {\n throw new Error('Multiple input paths provided. Only one input path is allowed.');\n }\n inputPath = arg;\n }\n }\n\n return { inputPath, options, showHelp };\n}\n\nasync function main(): Promise<number> {\n let parsed: ParsedArgs;\n\n try {\n parsed = parseArgs(process.argv.slice(2));\n } catch (err) {\n const message = err instanceof Error ? 
err.message : String(err);\n console.error(message);\n console.error(\"Use '--help' to see available options.\");\n return 1;\n }\n\n if (parsed.showHelp) {\n printHelp();\n return 0;\n }\n\n if (!parsed.inputPath) {\n console.error('Missing required input path.');\n console.error(\"Use '--help' to see usage information.\");\n return 1;\n }\n\n try {\n const output = await run(parsed.inputPath, parsed.options);\n if (output && !parsed.options.debug) {\n process.stdout.write(output);\n }\n if (output && !output.endsWith('\\n') && !parsed.options.debug) {\n process.stdout.write('\\n');\n }\n return 0;\n } catch (err) {\n const message = err instanceof Error ? err.message : String(err);\n console.error(message);\n return 1;\n }\n}\n\nmain().then((code) => {\n if (code !== 0) {\n process.exit(code);\n }\n});\n"],"mappings":";;;AAAA,SAAS,aAAa;AACtB,YAAY,UAAU;AACtB,YAAY,QAAQ;AACpB,SAAS,qBAAqB;AAE9B,IAAM,aAAa,cAAc,YAAY,GAAG;AAChD,IAAM,YAAiB,aAAQ,UAAU;AAEzC,IAAM,WAAW;AAEjB,SAAS,yBAAyB,SAAiB,aAA+B;AAChF,QAAM,wBAAwB,CAAC,GAAG,WAAW;AAC7C,QAAM,gBAAgB,sBAAsB,QAAQ,YAAY;AAChE,MAAI,gBAAgB,MAAM,gBAAgB,IAAI,sBAAsB,QAAQ;AAC1E,0BAAsB,gBAAgB,CAAC,IAAI;AAAA,EAC7C;AACA,SAAO,GAAG,OAAO,IAAI,sBAAsB,KAAK,GAAG,CAAC;AACtD;AAiBO,SAAS,IAAI,WAAmB,UAAsB,CAAC,GAAoB;AAChF,SAAO,IAAI,QAAQ,CAAC,SAAS,WAAW;AACtC,QAAI,CAAI,cAAW,SAAS,GAAG;AAC7B,aAAO,OAAO,IAAI,MAAM,mCAAmC,SAAS,EAAE,CAAC;AAAA,IACzE;AAEA,UAAM,OAAiB,CAAC;AACxB,QAAI,QAAQ,cAAc;AACxB,WAAK,KAAK,gBAAgB,QAAQ,YAAY;AAAA,IAChD;AACA,QAAI,QAAQ,UAAU;AACpB,WAAK,KAAK,cAAc,QAAQ,QAAQ;AAAA,IAC1C;AACA,QAAI,QAAQ,qBAAqB;AAC/B,WAAK,KAAK,2BAA2B,QAAQ,mBAAmB;AAAA,IAClE;AACA,QAAI,QAAQ,kBAAkB;AAC5B,WAAK,KAAK,YAAY;AAAA,IACxB;AACA,QAAI,QAAQ,cAAc;AACxB,WAAK,KAAK,QAAQ;AAAA,IACpB;AACA,QAAI,QAAQ,sBAAsB;AAChC,WAAK,KAAK,OAAO;AAAA,IACnB;AACA,QAAI,QAAQ,gBAAgB;AAC1B,WAAK,KAAK,oBAAoB;AAAA,IAChC;AACA,QAAI,QAAQ,kBAAkB;AAC5B,WAAK,KAAK,wBAAwB,QAAQ,gBAAgB;AAAA,IAC5D;AACA,QAAI,QAAQ,gBAAgB;AAC1B,WAAK,KAAK,sBAAsB;AAAA,IAClC;AACA,QAAI,QAAQ,oBAAoB;AAC9B,WAAK,KAAK,wBAAwB;AAAA,IACpC;AACA,QAAI,
QAAQ,QAAQ;AAClB,WAAK,KAAK,WAAW;AAAA,IACvB;AAEA,SAAK,KAAK,SAAS;AAEnB,UAAM,UAAe,UAAK,WAAW,MAAM,OAAO,QAAQ;AAE1D,QAAI,CAAI,cAAW,OAAO,GAAG;AAC3B,aAAO;AAAA,QACL,IAAI,MAAM,yBAAyB,OAAO,sCAAsC;AAAA,MAClF;AAAA,IACF;AAEA,UAAM,UAAU;AAChB,UAAM,cAAc,CAAC,QAAQ,SAAS,GAAG,IAAI;AAE7C,QAAI,QAAQ,OAAO;AACjB,cAAQ,MAAM,oBAAoB,yBAAyB,SAAS,WAAW,CAAC,EAAE;AAAA,IACpF;AAEA,UAAM,cAAc,MAAM,SAAS,WAAW;AAE9C,QAAI,SAAS;AACb,QAAI,SAAS;AAEb,gBAAY,OAAO,GAAG,QAAQ,CAAC,SAAS;AACtC,YAAM,QAAQ,KAAK,SAAS;AAC5B,UAAI,QAAQ,OAAO;AACjB,gBAAQ,OAAO,MAAM,KAAK;AAAA,MAC5B;AACA,gBAAU;AAAA,IACZ,CAAC;AAED,gBAAY,OAAO,GAAG,QAAQ,CAAC,SAAS;AACtC,YAAM,QAAQ,KAAK,SAAS;AAC5B,UAAI,QAAQ,OAAO;AACjB,gBAAQ,OAAO,MAAM,KAAK;AAAA,MAC5B;AACA,gBAAU;AAAA,IACZ,CAAC;AAED,gBAAY,GAAG,SAAS,CAAC,SAAS;AAChC,UAAI,SAAS,GAAG;AACd,gBAAQ,MAAM;AAAA,MAChB,OAAO;AACL,cAAM,QAAQ,IAAI;AAAA,UAChB,+CAA+C,IAAI;AAAA;AAAA,EAAQ,MAAM;AAAA,QACnE;AACA,eAAO,KAAK;AAAA,MACd;AAAA,IACF,CAAC;AAED,gBAAY,GAAG,SAAS,CAAC,QAAQ;AAC/B,UAAI,IAAI,QAAQ,SAAS,QAAQ,GAAG;AAClC;AAAA,UACE,IAAI;AAAA,YACF;AAAA,UACF;AAAA,QACF;AAAA,MACF,OAAO;AACL,eAAO,GAAG;AAAA,MACZ;AAAA,IACF,CAAC;AAAA,EACH,CAAC;AACH;;;AC/HA,SAAS,YAAkB;AACzB,UAAQ,IAAI,6CAA6C;AACzD,UAAQ,IAAI,EAAE;AACd,UAAQ,IAAI,UAAU;AACtB,UAAQ,IAAI,oEAAoE;AAChF,UAAQ,IAAI,4DAA4D;AACxE,UAAQ,IAAI,6EAA6E;AACzF,UAAQ,IAAI,gFAAgF;AAC5F,UAAQ,IAAI,yDAAyD;AACrE,UAAQ,IAAI,qDAAqD;AACjE,UAAQ,IAAI,8DAA8D;AAC1E,UAAQ,IAAI,oEAAoE;AAChF,UAAQ,IAAI,sEAAsE;AAClF,UAAQ,IAAI,gEAAgE;AAC5E,UAAQ,IAAI,+DAA+D;AAC3E,UAAQ,IAAI,0EAA0E;AACtF,UAAQ,IAAI,2DAA2D;AACzE;AAEA,SAAS,UAAU,MAA4B;AAC7C,QAAM,UAAsB,CAAC;AAC7B,MAAI;AACJ,MAAI,WAAW;AAEf,QAAM,YAAY,CAAC,cAAsB,WAAyD;AAChG,UAAM,YAAY,KAAK,eAAe,CAAC;AACvC,QAAI,CAAC,aAAa,UAAU,WAAW,GAAG,GAAG;AAC3C,YAAM,IAAI,MAAM,UAAU,MAAM,oBAAoB;AAAA,IACtD;AACA,WAAO,EAAE,OAAO,WAAW,WAAW,eAAe,EAAE;AAAA,EACzD;AAEA,WAAS,IAAI,GAAG,IAAI,KAAK,QAAQ,KAAK,GAAG;AACvC,UAAM,MAAM,KAAK,CAAC;AAElB,YAAQ,KAAK;AAAA,MACX,KAAK;AAAA,MACL,KAAK;AACH,mBAAW;AACX,YAAI,KAAK;AACT;AAAA,MACF,KAAK;AAAA,MACL,KAAK,MAAM;AACT,cAAM,EAAE,OAAO,UAAU,IAAI,UAAU,GAAG,GAAG;AAC7C,gBAAQ,eAAe;AACvB
,YAAI;AACJ;AAAA,MACF;AAAA,MACA,KAAK;AAAA,MACL,KAAK,MAAM;AACT,cAAM,EAAE,OAAO,UAAU,IAAI,UAAU,GAAG,GAAG;AAC7C,gBAAQ,WAAW;AACnB,YAAI;AACJ;AAAA,MACF;AAAA,MACA,KAAK,2BAA2B;AAC9B,cAAM,EAAE,OAAO,UAAU,IAAI,UAAU,GAAG,GAAG;AAC7C,gBAAQ,sBAAsB;AAC9B,YAAI;AACJ;AAAA,MACF;AAAA,MACA,KAAK,wBAAwB;AAC3B,cAAM,EAAE,OAAO,UAAU,IAAI,UAAU,GAAG,GAAG;AAC7C,gBAAQ,mBAAmB;AAC3B,YAAI;AACJ;AAAA,MACF;AAAA,MACA,KAAK;AACH,gBAAQ,mBAAmB;AAC3B;AAAA,MACF,KAAK;AACH,gBAAQ,eAAe;AACvB;AAAA,MACF,KAAK;AACH,gBAAQ,uBAAuB;AAC/B;AAAA,MACF,KAAK;AACH,gBAAQ,iBAAiB;AACzB;AAAA,MACF,KAAK;AACH,gBAAQ,iBAAiB;AACzB;AAAA,MACF,KAAK;AACH,gBAAQ,qBAAqB;AAC7B;AAAA,MACF,KAAK;AACH,gBAAQ,SAAS;AACjB;AAAA,MACF,KAAK;AACH,gBAAQ,QAAQ;AAChB;AAAA,MACF;AACE,YAAI,IAAI,WAAW,GAAG,GAAG;AACvB,gBAAM,IAAI,MAAM,mBAAmB,GAAG,EAAE;AAAA,QAC1C;AAEA,YAAI,WAAW;AACb,gBAAM,IAAI,MAAM,gEAAgE;AAAA,QAClF;AACA,oBAAY;AAAA,IAChB;AAAA,EACF;AAEA,SAAO,EAAE,WAAW,SAAS,SAAS;AACxC;AAEA,eAAe,OAAwB;AACrC,MAAI;AAEJ,MAAI;AACF,aAAS,UAAU,QAAQ,KAAK,MAAM,CAAC,CAAC;AAAA,EAC1C,SAAS,KAAK;AACZ,UAAM,UAAU,eAAe,QAAQ,IAAI,UAAU,OAAO,GAAG;AAC/D,YAAQ,MAAM,OAAO;AACrB,YAAQ,MAAM,wCAAwC;AACtD,WAAO;AAAA,EACT;AAEA,MAAI,OAAO,UAAU;AACnB,cAAU;AACV,WAAO;AAAA,EACT;AAEA,MAAI,CAAC,OAAO,WAAW;AACrB,YAAQ,MAAM,8BAA8B;AAC5C,YAAQ,MAAM,wCAAwC;AACtD,WAAO;AAAA,EACT;AAEA,MAAI;AACF,UAAM,SAAS,MAAM,IAAI,OAAO,WAAW,OAAO,OAAO;AACzD,QAAI,UAAU,CAAC,OAAO,QAAQ,OAAO;AACnC,cAAQ,OAAO,MAAM,MAAM;AAAA,IAC7B;AACA,QAAI,UAAU,CAAC,OAAO,SAAS,IAAI,KAAK,CAAC,OAAO,QAAQ,OAAO;AAC7D,cAAQ,OAAO,MAAM,IAAI;AAAA,IAC3B;AACA,WAAO;AAAA,EACT,SAAS,KAAK;AACZ,UAAM,UAAU,eAAe,QAAQ,IAAI,UAAU,OAAO,GAAG;AAC/D,YAAQ,MAAM,OAAO;AACrB,WAAO;AAAA,EACT;AACF;AAEA,KAAK,EAAE,KAAK,CAAC,SAAS;AACpB,MAAI,SAAS,GAAG;AACd,YAAQ,KAAK,IAAI;AAAA,EACnB;AACF,CAAC;","names":[]}
Binary file
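A minimal sketch, assuming nothing beyond plain Node.js, of how the `run()` wrapper added to `dist/cli.js` in this diff appears to map its options onto flags for the Java CLI, and how `getRedactedCommandString` masks the password before a debug log line is written. `buildArgs` and `redactCommand` are hypothetical names used only for illustration; they are not exports of the package, and the jar is never invoked here.

```javascript
// Illustrative only: buildArgs is a hypothetical helper, not part of
// @opendataloader/pdf. It copies the option-to-flag mapping used by run().
function buildArgs(inputPath, options = {}) {
  const args = [];
  if (options.outputFolder) args.push("--output-dir", options.outputFolder);
  if (options.password) args.push("--password", options.password);
  if (options.replaceInvalidChars) args.push("--replace-invalid-chars", options.replaceInvalidChars);
  if (options.generateMarkdown) args.push("--markdown");
  if (options.generateHtml) args.push("--html");
  if (options.generateAnnotatedPdf) args.push("--pdf");
  if (options.keepLineBreaks) args.push("--keep-line-breaks");
  if (options.contentSafetyOff) args.push("--content-safety-off", options.contentSafetyOff);
  if (options.htmlInMarkdown) args.push("--markdown-with-html");
  if (options.addImageToMarkdown) args.push("--markdown-with-images");
  if (options.noJson) args.push("--no-json");
  // The input path always goes last, after every flag.
  args.push(inputPath);
  return args;
}

// Same logic as getRedactedCommandString in the diff: mask the value that
// follows "--password" before the command line is logged.
function redactCommand(command, commandArgs) {
  const forLogging = [...commandArgs];
  const i = forLogging.indexOf("--password");
  if (i > -1 && i + 1 < forLogging.length) {
    forLogging[i + 1] = "[REDACTED]";
  }
  return `${command} ${forLogging.join(" ")}`;
}

const exampleArgs = buildArgs("report.pdf", {
  outputFolder: "out",
  password: "s3cret",
  generateMarkdown: true,
});
console.log(redactCommand("java", ["-jar", "opendataloader-pdf-cli.jar", ...exampleArgs]));
// prints: java -jar opendataloader-pdf-cli.jar --output-dir out --password [REDACTED] --markdown report.pdf
```

Because the wrapper always emits the long `--password` form, matching on that single flag is enough for redaction, even when the user typed `-p` on the command line.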
package/package.json CHANGED
@@ -1,11 +1,14 @@
  {
    "name": "@opendataloader/pdf",
-   "version": "1.0.5",
+   "version": "1.1.0",
    "description": "A Node.js wrapper for the opendataloader-pdf Java CLI.",
    "main": "./dist/index.cjs",
    "module": "./dist/index.js",
    "types": "./dist/index.d.ts",
    "type": "module",
+   "bin": {
+     "opendataloader-pdf": "./dist/cli.js"
+   },
    "exports": {
      ".": {
        "import": "./dist/index.js",