pdf-oxide-wasm 0.3.23 → 0.3.27
- package/README.md +141 -145
- package/package.json +1 -1
- package/pdf_oxide.d.ts +8 -0
- package/pdf_oxide.js +44 -0
- package/pdf_oxide_bg.wasm +0 -0
- package/pdf_oxide_bg.wasm.d.ts +3 -1
package/README.md
CHANGED
|
@@ -1,14 +1,34 @@
-#
+# PDF Oxide for WASM — The Fastest PDF Toolkit for Browsers, Deno, Bun & Edge
 
-
-Extract text, convert to markdown/HTML, search, fill forms, create and edit PDFs — all from WebAssembly.
-
-Built on the [pdf-oxide](https://github.com/yfedoseev/pdf_oxide) Rust core. No native binaries, no system dependencies.
+The fastest WebAssembly PDF library for text extraction, image extraction, and markdown conversion. Powered by a pure-Rust core compiled to WebAssembly. Runs in Node.js, browsers, Deno, Bun, and serverless edge runtimes — no native binaries, no `node-gyp`, no `postinstall`. 0.8ms mean per document, 5× faster than PyMuPDF, 15× faster than pypdf. 100% pass rate on 3,830 real-world PDFs. MIT / Apache-2.0 licensed.
 
 [](https://www.npmjs.com/package/pdf-oxide-wasm)
-[](https://opensource.org/licenses)
+
+> **Part of the [PDF Oxide](https://github.com/yfedoseev/pdf_oxide) toolkit.** Same Rust core, same speed, same 100% pass rate as the [Rust](https://docs.rs/pdf_oxide), [Python](https://github.com/yfedoseev/pdf_oxide/blob/main/python/README.md), [Go](https://github.com/yfedoseev/pdf_oxide/blob/main/go/README.md), [JavaScript / TypeScript (Node.js native)](https://github.com/yfedoseev/pdf_oxide/blob/main/js/README.md), and [C# / .NET](https://github.com/yfedoseev/pdf_oxide/blob/main/csharp/README.md) bindings.
+>
+> Need a faster Node.js binding with native code? Use [pdf-oxide](https://www.npmjs.com/package/pdf-oxide) instead — same API, native N-API addon.
 
-##
+## Quick Start
+
+```bash
+npm install pdf-oxide-wasm
+```
+
+```javascript
+const { WasmPdfDocument } = require("pdf-oxide-wasm");
+const fs = require("fs");
+
+const bytes = new Uint8Array(fs.readFileSync("paper.pdf"));
+const doc = new WasmPdfDocument(bytes);
+
+console.log(doc.extractText(0));
+console.log(doc.toMarkdown(0));
+
+doc.free();
+```
+
+## Why pdf-oxide-wasm?
 
 | Feature | pdf-oxide-wasm | pdf-parse | pdf-lib | pdfjs-dist |
 |---|---|---|---|---|
@@ -25,15 +45,47 @@ Built on the [pdf-oxide](https://github.com/yfedoseev/pdf_oxide) Rust core. No n
 | TypeScript types included | Yes | No | Yes | Yes |
 | License | MIT / Apache-2.0 | MIT | MIT | Apache-2.0 |
 
-
+- **Fast** — 0.8ms mean per document, 5× faster than PyMuPDF, 15× faster than pypdf
+- **Reliable** — 100% pass rate on 3,830 test PDFs, zero panics, zero timeouts
+- **Universal** — Runs in Node.js, browsers, Deno, Bun, and Cloudflare Workers without modification
+- **Zero install friction** — No native binaries, no `node-gyp`, no `postinstall` scripts
+- **Pure Rust core** — Memory-safe, panic-free, compiled straight to WebAssembly
+- **Full TypeScript support** — Type definitions ship in the package
+
+## Performance
+
+Benchmarked on 3,830 PDFs from three independent public test suites (veraPDF, Mozilla pdf.js, DARPA SafeDocs). Text extraction libraries only. Single-thread, 60s timeout, no warm-up.
+
+| Library | Mean | p99 | Pass Rate | License |
+|---------|------|-----|-----------|---------|
+| **PDF Oxide** | **0.8ms** | **9ms** | **100%** | **MIT / Apache-2.0** |
+| PyMuPDF | 4.6ms | 28ms | 99.3% | AGPL-3.0 |
+| pypdfium2 | 4.1ms | 42ms | 99.2% | Apache-2.0 |
+| pdftext | 7.3ms | 82ms | 99.0% | GPL-3.0 |
+| pdfminer | 16.8ms | 124ms | 98.8% | MIT |
+| pypdf | 12.1ms | 97ms | 98.4% | BSD-3 |
+
+99.5% text parity vs PyMuPDF and pypdfium2 across the full corpus. The WASM compilation preserves near-native performance — no garbage collection overhead, no child process spawning, no temp files.
+
+## Installation
 
 ```bash
 npm install pdf-oxide-wasm
 ```
 
-
+Works without modification in:
+
+- **Node.js** 18+ (CommonJS and ESM)
+- **Browsers** — Chrome, Firefox, Safari, Edge
+- **Cloudflare Workers** — runs in V8 isolates with WASM support
+- **Deno** — native WASM support
+- **Bun** — native WASM support
+
+No native binaries, no system dependencies, no build step.
 
-
+## API Tour
+
+### Open and extract text
 
 ```javascript
 const { WasmPdfDocument } = require("pdf-oxide-wasm");
@@ -43,14 +95,14 @@ const bytes = new Uint8Array(fs.readFileSync("document.pdf"));
 const doc = new WasmPdfDocument(bytes);
 
 console.log(`Pages: ${doc.pageCount()}`);
-console.log(doc.extractText(0));
-console.log(doc.toMarkdown(0));
-console.log(doc.toHtml(0));
+console.log(doc.extractText(0)); // plain text
+console.log(doc.toMarkdown(0)); // markdown
+console.log(doc.toHtml(0)); // HTML
 
 doc.free();
 ```
 
-
+ESM / TypeScript:
 
 ```typescript
 import { WasmPdfDocument } from "pdf-oxide-wasm";
@@ -65,23 +117,14 @@ const markdown = doc.toMarkdownAll();
 doc.free();
 ```
 
-###
-
-```javascript
-import { WasmPdf } from "pdf-oxide-wasm";
-
-const pdf = WasmPdf.fromMarkdown("# Invoice\n\nTotal: $42.00", "Invoice", "Acme Corp");
-const bytes = pdf.toBytes(); // Uint8Array — write to file or send as response
-```
-
-### Search inside a PDF
+### Search
 
 ```javascript
 const results = doc.search("quarterly revenue", true); // case-insensitive
 // Returns: [{ page, text, bbox, start_index, end_index, span_boxes }]
 ```
 
-###
+### Form fields
 
 ```javascript
 const fields = doc.getFormFields();
@@ -90,7 +133,16 @@ const fields = doc.getFormFields();
 doc.setFormFieldValue("name", "Jane Doe");
 doc.setFormFieldValue("agree_terms", true);
 
-const filledPdf = doc.saveToBytes();
+const filledPdf = doc.saveToBytes();
+```
+
+### Create a PDF from Markdown
+
+```javascript
+import { WasmPdf } from "pdf-oxide-wasm";
+
+const pdf = WasmPdf.fromMarkdown("# Invoice\n\nTotal: $42.00", "Invoice", "Acme Corp");
+const bytes = pdf.toBytes();
 ```
 
 ### Encrypt a PDF (AES-256)
@@ -104,133 +156,77 @@ const encrypted = doc.saveEncryptedToBytes(
 );
 ```
 
-
-
-**Text Extraction** — plain text, Markdown, and HTML output formats. Character-level and span-level extraction with bounding boxes, font names, sizes, weights, colors, and italic flags.
-
-**Format Conversion** — convert any page or all pages to Markdown (with heading detection, images, form fields), HTML (with optional CSS layout preservation), or structured plain text.
-
-**Full-Text Search** — regex and literal search across all pages or a single page. Case-insensitive, whole-word, and max-results options. Returns match positions with bounding boxes.
-
-**Image Extraction** — extract image metadata (dimensions, color space, bits per component, bounding boxes) and raw image bytes as PNG.
-
-**Form Fields** — read all AcroForm fields (text, button, choice, signature). Get/set individual field values. Export form data as FDF or XFDF. Flatten forms into static content. XFA detection.
-
-**PDF Creation** — generate PDFs from Markdown, HTML, plain text, or images (PNG/JPEG). Multi-image support (one page per image). Set title, author metadata.
-
-**PDF Editing** — set document metadata (title, author, subject, keywords). Rotate pages, set MediaBox/CropBox, crop margins. Erase (whiteout) regions. Reposition, resize, and set bounds on images. Flatten or apply redactions. Merge PDFs. Embed files.
-
-**Encryption** — AES-256 encryption with granular permissions (print, copy, modify, annotate).
-
-**Document Structure** — bookmarks/outline (table of contents), annotations (links, comments, form widgets), page labels, XMP metadata, vector paths.
-
-## API Reference
-
-### `WasmPdfDocument` — read, extract, search, and edit existing PDFs
-
-| Method | Description |
-|---|---|
-| `new(data)` | Load PDF from `Uint8Array` |
-| `pageCount()` | Number of pages |
-| `version()` | PDF version as `[major, minor]` |
-| `authenticate(password)` | Decrypt an encrypted PDF |
-| `hasStructureTree()` | Check for Tagged PDF structure |
-| **Text Extraction** | |
-| `extractText(page)` | Plain text from one page |
-| `extractAllText()` | Plain text from all pages |
-| `extractChars(page)` | Character-level data with positions |
-| `extractSpans(page)` | Span-level data with positions |
-| **Format Conversion** | |
-| `toMarkdown(page, headings?, images?, forms?)` | Markdown from one page |
-| `toMarkdownAll(headings?, images?, forms?)` | Markdown from all pages |
-| `toHtml(page, layout?, headings?, forms?)` | HTML from one page |
-| `toHtmlAll(layout?, headings?, forms?)` | HTML from all pages |
-| `toPlainText(page)` | Plain text with layout |
-| `toPlainTextAll()` | Plain text all pages |
-| **Search** | |
-| `search(pattern, caseInsensitive?, literal?, wholeWord?, max?)` | Search all pages |
-| `searchPage(page, pattern, ...)` | Search one page |
-| **Images** | |
-| `extractImages(page)` | Image metadata (dimensions, color space, bbox) |
-| `extractImageBytes(page)` | Image data as PNG `Uint8Array` |
-| `pageImages(page)` | Image placement info (bounds, matrix) |
-| **Forms** | |
-| `getFormFields()` | All form fields with types and values |
-| `getFormFieldValue(name)` | Get a single field value |
-| `setFormFieldValue(name, value)` | Set a field value |
-| `exportFormData(format?)` | Export as FDF or XFDF |
-| `hasXfa()` | Check for XFA form data |
-| `flattenForms()` | Flatten all form fields |
-| `flattenFormsOnPage(page)` | Flatten fields on one page |
-| **Document Structure** | |
-| `getOutline()` | Bookmarks / table of contents |
-| `getAnnotations(page)` | Page annotations |
-| `extractPaths(page)` | Vector paths (lines, curves) |
-| `pageLabels()` | Page label ranges |
-| `xmpMetadata()` | XMP metadata |
-| **Editing** | |
-| `setTitle(title)` | Set document title |
-| `setAuthor(author)` | Set document author |
-| `setSubject(subject)` | Set document subject |
-| `setKeywords(keywords)` | Set document keywords |
-| `setPageRotation(page, degrees)` | Set page rotation |
-| `rotatePage(page, degrees)` | Rotate page by degrees |
-| `rotateAllPages(degrees)` | Rotate all pages |
-| `pageMediaBox(page)` | Get MediaBox |
-| `setPageMediaBox(page, llx, lly, urx, ury)` | Set MediaBox |
-| `pageCropBox(page)` | Get CropBox |
-| `setPageCropBox(page, llx, lly, urx, ury)` | Set CropBox |
-| `cropMargins(left, right, top, bottom)` | Crop all page margins |
-| `eraseRegion(page, llx, lly, urx, ury)` | Whiteout a region |
-| `eraseRegions(page, rects)` | Whiteout multiple regions |
-| `repositionImage(page, name, x, y)` | Move an image |
-| `resizeImage(page, name, w, h)` | Resize an image |
-| `setImageBounds(page, name, x, y, w, h)` | Set image bounds |
-| `flattenPageAnnotations(page)` | Flatten page annotations |
-| `flattenAllAnnotations()` | Flatten all annotations |
-| `applyPageRedactions(page)` | Apply redactions on page |
-| `applyAllRedactions()` | Apply all redactions |
-| `mergeFrom(data)` | Merge another PDF |
-| `embedFile(name, data)` | Embed a file |
-| **Save** | |
-| `saveToBytes()` | Save edits → `Uint8Array` |
-| `saveEncryptedToBytes(userPwd, ownerPwd?, ...)` | Save with AES-256 encryption |
-| `free()` | Release WASM memory |
-
-### `WasmPdf` — create new PDFs
-
-| Method | Description |
-|---|---|
-| `fromMarkdown(content, title?, author?)` | Create PDF from Markdown |
-| `fromHtml(content, title?, author?)` | Create PDF from HTML |
-| `fromText(content, title?, author?)` | Create PDF from plain text |
-| `fromImageBytes(data)` | Create PDF from image (PNG/JPEG) |
-| `fromMultipleImageBytes(images)` | Create multi-page PDF from images |
-| `toBytes()` | Get PDF as `Uint8Array` |
-| `size` | PDF size in bytes |
-
-## Platform Compatibility
+### Render and extract images
 
-
+```javascript
+const images = doc.extractImages(0);
+const pngBytes = doc.extractImageBytes(0);
+```
 
-
-- **Browsers** — Chrome, Firefox, Safari, Edge
-- **Cloudflare Workers** — runs in V8 isolates with WASM support
-- **Deno** — native WASM support
-- **Bun** — native WASM support
+### Edit metadata, pages, and content
 
-
+```javascript
+doc.setTitle("Quarterly Report");
+doc.setAuthor("Finance Team");
+doc.setPageRotation(0, 90);
+doc.cropMargins(36, 36, 36, 36);
+doc.eraseRegion(0, 50, 50, 200, 100);
+doc.flattenAllAnnotations();
+
+const editedBytes = doc.saveToBytes();
+```
 
-##
+## Other languages
+
+PDF Oxide ships the same Rust core through six bindings:
+
+- **Rust** — `cargo add pdf_oxide` — see [docs.rs/pdf_oxide](https://docs.rs/pdf_oxide)
+- **Python** — `pip install pdf_oxide` — see [python/README.md](https://github.com/yfedoseev/pdf_oxide/blob/main/python/README.md)
+- **Go** — `go get github.com/yfedoseev/pdf_oxide/go` — see [go/README.md](https://github.com/yfedoseev/pdf_oxide/blob/main/go/README.md)
+- **JavaScript / TypeScript (Node.js native)** — `npm install pdf-oxide` — see [js/README.md](https://github.com/yfedoseev/pdf_oxide/blob/main/js/README.md)
+- **C# / .NET** — `dotnet add package PdfOxide` — see [csharp/README.md](https://github.com/yfedoseev/pdf_oxide/blob/main/csharp/README.md)
+
+A bug fix in the Rust core lands in every binding on the next release.
 
-
+## Documentation
 
-
+- **[Full Documentation](https://pdf.oxide.fyi)** — Complete documentation site
+- **[WASM Getting Started](https://github.com/yfedoseev/pdf_oxide/blob/main/docs/getting-started-wasm.md)** — Step-by-step WASM guide
+- **[Main Repository](https://github.com/yfedoseev/pdf_oxide)** — Rust core, CLI, MCP server, all bindings
+- **[Performance Benchmarks](https://pdf.oxide.fyi/docs/performance)** — Full benchmark methodology and results
+- **[GitHub Issues](https://github.com/yfedoseev/pdf_oxide/issues)** — Bug reports and feature requests
 
-
+## Use Cases
 
-
+- **Browser PDF tooling** — Extract, search, and convert PDFs entirely client-side, no server upload
+- **Edge / serverless workers** — Process PDFs in Cloudflare Workers, Vercel Edge, Deno Deploy
+- **RAG / LLM pipelines** — Convert PDFs to clean Markdown for retrieval-augmented generation
+- **PDF generation** — Create invoices, reports, certificates programmatically without a backend
+- **Universal Node.js packages** — Same code runs in Node.js, the browser, and edge runtimes
+
+## Why I built this
+
+I needed PyMuPDF's speed without its AGPL license, and I needed it in more than one language. Nothing existed that ticked all three boxes — fast, MIT, multi-language — so I wrote it. The Rust core is what does the real work; the bindings for Python, Go, JS/TS, C#, and WASM are thin shells around the same code, so a bug fix in one lands in all of them. It now passes 100% of the veraPDF + Mozilla pdf.js + DARPA SafeDocs test corpora (3,830 PDFs) on every platform I've tested.
+
+If it's useful to you, a star on GitHub genuinely helps. If something's broken or missing, [open an issue](https://github.com/yfedoseev/pdf_oxide/issues) — I read all of them.
+
+— Yury
 
 ## License
 
-MIT
+Dual-licensed under [MIT](https://github.com/yfedoseev/pdf_oxide/blob/main/LICENSE-MIT) or [Apache-2.0](https://github.com/yfedoseev/pdf_oxide/blob/main/LICENSE-APACHE) at your option. Unlike AGPL-licensed alternatives, pdf_oxide can be used freely in any project — commercial or open-source — with no copyleft restrictions.
+
+## Citation
+
+```bibtex
+@software{pdf_oxide,
+title = {PDF Oxide: Fast PDF Toolkit for Rust, Python, Go, JavaScript, and C#},
+author = {Yury Fedoseev},
+year = {2025},
+url = {https://github.com/yfedoseev/pdf_oxide}
+}
+```
+
+---
+
+**WASM** + **Rust core** | MIT / Apache-2.0 | 100% pass rate on 3,830 PDFs | 0.8ms mean | 5× faster than the industry leaders
|
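Every README example in this diff ends with an explicit `doc.free()`, which suggests that the WASM-side memory is not reclaimed automatically. If an extraction call throws before that line, the document would stay allocated. A minimal sketch of one way to guard against that follows. `withDocument` is a hypothetical helper, not part of pdf-oxide-wasm, and it takes the document class as a parameter so it can be exercised without the package installed.

```javascript
// Hypothetical helper (not part of pdf-oxide-wasm): constructs a document,
// hands it to a callback, and always calls free(), even when the callback throws.
function withDocument(DocumentClass, bytes, fn) {
  const doc = new DocumentClass(bytes);
  try {
    return fn(doc);
  } finally {
    // Runs on both the success path and the error path.
    doc.free();
  }
}

// Assumed usage with the real class, following the README's Quick Start:
//   const { WasmPdfDocument } = require("pdf-oxide-wasm");
//   const text = withDocument(WasmPdfDocument, bytes, (doc) => doc.extractText(0));
```

The callback's return value is passed through, so values that are plain strings or byte arrays can outlive the document. Anything that still references the document must be consumed inside the callback.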
package/package.json
CHANGED
|
@@ -1,6 +1,6 @@
 {
   "name": "pdf-oxide-wasm",
-  "version": "0.3.
+  "version": "0.3.27",
   "description": "Fast, zero-dependency PDF toolkit for Node.js, browsers, and edge runtimes — text extraction, markdown/HTML conversion, search, form filling, creation, and editing. Rust core compiled to WebAssembly.",
   "license": "MIT OR Apache-2.0",
   "repository": {
|
package/pdf_oxide.d.ts
CHANGED
|
@@ -717,6 +717,14 @@ export class WasmPdfDocument {
      * Validate PDF/A compliance. Level: "1b", "2b", etc.
      */
     validatePdfA(level: string): any;
+    /**
+     * Validate PDF/UA accessibility compliance.
+     */
+    validatePdfUa(level?: string | null): any;
+    /**
+     * Validate PDF/X print production compliance.
+     */
+    validatePdfX(level?: string | null): any;
     /**
      * Get the PDF version as [major, minor].
      */
|
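The typings above add `validatePdfUa` and `validatePdfX` next to the existing `validatePdfA`, all returning `any`, and the generated glue code throws when the WASM side reports an error. A minimal sketch of calling all three without letting one failure abort the others is shown below. `runComplianceChecks`, the `"2b"` default, and the report shape are assumptions made for this example; the structure of each validator's return value is not documented in this diff, so it is passed through untouched.

```javascript
// Hypothetical helper: runs the three validators declared in pdf_oxide.d.ts
// and records either the returned value or the thrown error for each.
function runComplianceChecks(doc, pdfALevel = "2b") {
  const checks = {
    pdfA: () => doc.validatePdfA(pdfALevel), // level is required per the typings
    pdfUa: () => doc.validatePdfUa(),        // level is optional per the typings
    pdfX: () => doc.validatePdfX(),          // level is optional per the typings
  };
  const report = {};
  for (const [name, run] of Object.entries(checks)) {
    try {
      // "completed" only means the call returned; it says nothing about compliance.
      report[name] = { completed: true, result: run() };
    } catch (err) {
      report[name] = { completed: false, error: String(err) };
    }
  }
  return report;
}
```

With a real document this might be called as `runComplianceChecks(doc)` before `doc.free()`, then inspected per key.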
package/pdf_oxide.js
CHANGED
|
@@ -2539,6 +2539,50 @@ class WasmPdfDocument {
             wasm.__wbindgen_add_to_stack_pointer(16);
         }
     }
+    /**
+     * Validate PDF/UA accessibility compliance.
+     * @param {string | null} [level]
+     * @returns {any}
+     */
+    validatePdfUa(level) {
+        try {
+            const retptr = wasm.__wbindgen_add_to_stack_pointer(-16);
+            var ptr0 = isLikeNone(level) ? 0 : passStringToWasm0(level, wasm.__wbindgen_export, wasm.__wbindgen_export2);
+            var len0 = WASM_VECTOR_LEN;
+            wasm.wasmpdfdocument_validatePdfUa(retptr, this.__wbg_ptr, ptr0, len0);
+            var r0 = getDataViewMemory0().getInt32(retptr + 4 * 0, true);
+            var r1 = getDataViewMemory0().getInt32(retptr + 4 * 1, true);
+            var r2 = getDataViewMemory0().getInt32(retptr + 4 * 2, true);
+            if (r2) {
+                throw takeObject(r1);
+            }
+            return takeObject(r0);
+        } finally {
+            wasm.__wbindgen_add_to_stack_pointer(16);
+        }
+    }
+    /**
+     * Validate PDF/X print production compliance.
+     * @param {string | null} [level]
+     * @returns {any}
+     */
+    validatePdfX(level) {
+        try {
+            const retptr = wasm.__wbindgen_add_to_stack_pointer(-16);
+            var ptr0 = isLikeNone(level) ? 0 : passStringToWasm0(level, wasm.__wbindgen_export, wasm.__wbindgen_export2);
+            var len0 = WASM_VECTOR_LEN;
+            wasm.wasmpdfdocument_validatePdfX(retptr, this.__wbg_ptr, ptr0, len0);
+            var r0 = getDataViewMemory0().getInt32(retptr + 4 * 0, true);
+            var r1 = getDataViewMemory0().getInt32(retptr + 4 * 1, true);
+            var r2 = getDataViewMemory0().getInt32(retptr + 4 * 2, true);
+            if (r2) {
+                throw takeObject(r1);
+            }
+            return takeObject(r0);
+        } finally {
+            wasm.__wbindgen_add_to_stack_pointer(16);
+        }
+    }
     /**
      * Get the PDF version as [major, minor].
      * @returns {Uint8Array}
|
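Both new methods in the hunk above read three little-endian 32-bit integers from a return slot after the WASM call: `r0` is handed to `takeObject` on success, `r1` is handed to `takeObject` and thrown, and `r2` decides which path is taken. The sketch below models that convention in isolation so the control flow can be followed without the real module. The `heap` array stands in for the glue's object table, and `writeReturnSlot` stands in for the WASM export; both are assumptions for illustration, not the actual wasm-bindgen internals.

```javascript
// Simplified model of the return-slot read seen in validatePdfUa / validatePdfX.
// `heap` is a plain array standing in for the glue's object table (assumption).
function readReturnSlot(view, retptr, heap) {
  const r0 = view.getInt32(retptr + 4 * 0, true); // handle of the success value
  const r1 = view.getInt32(retptr + 4 * 1, true); // handle of the error value
  const r2 = view.getInt32(retptr + 4 * 2, true); // non-zero when the call failed
  if (r2) {
    throw heap[r1];
  }
  return heap[r0];
}

// Hypothetical stand-in for a WASM export writing its outcome into the slot.
function writeReturnSlot(view, retptr, okHandle, errHandle, failed) {
  view.setInt32(retptr + 4 * 0, okHandle, true);
  view.setInt32(retptr + 4 * 1, errHandle, true);
  view.setInt32(retptr + 4 * 2, failed ? 1 : 0, true);
}
```

The practical consequence for callers is that a failed validation call surfaces as a thrown JavaScript value, so calls to these methods belong inside `try`/`catch`.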
package/pdf_oxide_bg.wasm
CHANGED
|
Binary file
|
package/pdf_oxide_bg.wasm.d.ts
CHANGED
|
@@ -108,6 +108,8 @@ export const wasmpdfdocument_toMarkdownAll: (a: number, b: number, c: number, d:
 export const wasmpdfdocument_toPlainText: (a: number, b: number, c: number) => void;
 export const wasmpdfdocument_toPlainTextAll: (a: number, b: number) => void;
 export const wasmpdfdocument_validatePdfA: (a: number, b: number, c: number, d: number) => void;
+export const wasmpdfdocument_validatePdfUa: (a: number, b: number, c: number, d: number) => void;
+export const wasmpdfdocument_validatePdfX: (a: number, b: number, c: number, d: number) => void;
 export const wasmpdfdocument_version: (a: number, b: number) => void;
 export const wasmpdfdocument_within: (a: number, b: number, c: number, d: number, e: number) => void;
 export const wasmpdfdocument_xmpMetadata: (a: number, b: number) => void;
@@ -133,8 +135,8 @@ export const disableLogging: () => void;
 export const __wbg_wasmocrengine_free: (a: number, b: number) => void;
 export const wasmpdfdocument_saveToBytes: (a: number, b: number) => void;
 export const wasmocrconfig_new: () => number;
-export const __wbg_wasmfooter_free: (a: number, b: number) => void;
 export const __wbg_wasmheader_free: (a: number, b: number) => void;
+export const __wbg_wasmfooter_free: (a: number, b: number) => void;
 export const __wbindgen_export: (a: number, b: number) => number;
 export const __wbindgen_export2: (a: number, b: number, c: number, d: number) => number;
 export const __wbindgen_export3: (a: number) => void;
|