npm - pdf-oxide-wasm - Versions diffs - 0.3.10 → 0.3.12 - Mend

pdf-oxide-wasm 0.3.10 → 0.3.12

Files changed (3)

package/README.md CHANGED

@@ -1,9 +1,40 @@
 # pdf-oxide-wasm
-High-performance PDF text extraction and manipulation via WebAssembly. Built on the [PDF Oxide](https://github.com/yfedoseev/pdf_oxide) Rust core.
+Fast, zero-dependency PDF toolkit for Node.js, browsers, and serverless edge runtimes.
+Extract text, convert to markdown/HTML, search, fill forms, create and edit PDFs — all from WebAssembly.
+Built on the [pdf-oxide](https://github.com/yfedoseev/pdf_oxide) Rust core. No native binaries, no system dependencies.
+[![npm](https://img.shields.io/npm/v/pdf-oxide-wasm)](https://www.npmjs.com/package/pdf-oxide-wasm)
+[![license](https://img.shields.io/badge/license-MIT%20OR%20Apache--2.0-blue)](https://github.com/yfedoseev/pdf_oxide/blob/main/LICENSE-MIT)
+## Why pdf-oxide-wasm
+| Feature | pdf-oxide-wasm | pdf-parse | pdf-lib | pdfjs-dist |
+|---|---|---|---|---|
+| Text extraction | Yes | Yes | No | Yes |
+| Markdown / HTML output | Yes | No | No | No |
+| PDF creation | Yes | No | Yes | No |
+| Form field read/write | Yes | No | Partial | No |
+| Full-text search (regex) | Yes | No | No | No |
+| Image extraction | Yes | No | No | No |
+| Merge, encrypt, edit | Yes | No | Yes | No |
+| Serverless / edge runtimes | Yes | No | No | No |
+| Zero native dependencies | Yes | Yes | Yes | No |
+| WebAssembly-based | Yes | No | No | No |
+| TypeScript types included | Yes | No | Yes | Yes |
+| License | MIT / Apache-2.0 | MIT | MIT | Apache-2.0 |
+## Install
+```bash
+npm install pdf-oxide-wasm
+```
 ## Quick Start
+### Extract text (Node.js — CommonJS)
 ```javascript
 const { WasmPdfDocument } = require("pdf-oxide-wasm");
 const fs = require("fs");
@@ -12,35 +43,193 @@ const bytes = new Uint8Array(fs.readFileSync("document.pdf"));
 const doc = new WasmPdfDocument(bytes);
 console.log(`Pages: ${doc.pageCount()}`);
-console.log(doc.extractText(0));
+console.log(doc.extractText(0));       // plain text from page 0
+console.log(doc.toMarkdown(0));        // markdown from page 0
+console.log(doc.toHtml(0));            // HTML from page 0
 doc.free();
 ```
-### ESM
+### Extract text (ESM / TypeScript)
-```javascript
+```typescript
 import { WasmPdfDocument } from "pdf-oxide-wasm";
+import { readFile } from "fs/promises";
-const bytes = new Uint8Array(await fs.promises.readFile("document.pdf"));
+const bytes = new Uint8Array(await readFile("document.pdf"));
 const doc = new WasmPdfDocument(bytes);
-const text = doc.extractText(0);
+const text = doc.extractAllText();
+const markdown = doc.toMarkdownAll();
 doc.free();
 ```
+### Create a PDF from Markdown
+```javascript
+import { WasmPdf } from "pdf-oxide-wasm";
+const pdf = WasmPdf.fromMarkdown("# Invoice\n\nTotal: $42.00", "Invoice", "Acme Corp");
+const bytes = pdf.toBytes(); // Uint8Array — write to file or send as response
+```
+### Search inside a PDF
+```javascript
+const results = doc.search("quarterly revenue", true); // case-insensitive
+// Returns: [{ page, text, bbox, start_index, end_index, span_boxes }]
+```
+### Read and fill form fields
+```javascript
+const fields = doc.getFormFields();
+// [{ name, field_type, value, tooltip, bounds, is_readonly, is_required }]
+doc.setFormFieldValue("name", "Jane Doe");
+doc.setFormFieldValue("agree_terms", true);
+const filledPdf = doc.saveToBytes(); // Uint8Array
+```
+### Encrypt a PDF (AES-256)
+```javascript
+const encrypted = doc.saveEncryptedToBytes(
+  "user-password",
+  "owner-password",
+  true,  // allow print
+  false, // deny copy
+);
+```
 ## Features
-- Text extraction (plain text, Markdown, HTML)
-- Character-level and span-level extraction with positions
-- PDF creation from Markdown, HTML, text, and images
-- Form field extraction and filling
-- PDF editing (metadata, rotation, cropping, annotations)
-- Encryption (AES-256)
-- Search with regex support
+**Text Extraction** — plain text, Markdown, and HTML output formats. Character-level and span-level extraction with bounding boxes, font names, sizes, weights, colors, and italic flags.
+**Format Conversion** — convert any page or all pages to Markdown (with heading detection, images, form fields), HTML (with optional CSS layout preservation), or structured plain text.
+**Full-Text Search** — regex and literal search across all pages or a single page. Case-insensitive, whole-word, and max-results options. Returns match positions with bounding boxes.
+**Image Extraction** — extract image metadata (dimensions, color space, bits per component, bounding boxes) and raw image bytes as PNG.
+**Form Fields** — read all AcroForm fields (text, button, choice, signature). Get/set individual field values. Export form data as FDF or XFDF. Flatten forms into static content. XFA detection.
+**PDF Creation** — generate PDFs from Markdown, HTML, plain text, or images (PNG/JPEG). Multi-image support (one page per image). Set title, author metadata.
+**PDF Editing** — set document metadata (title, author, subject, keywords). Rotate pages, set MediaBox/CropBox, crop margins. Erase (whiteout) regions. Reposition, resize, and set bounds on images. Flatten or apply redactions. Merge PDFs. Embed files.
+**Encryption** — AES-256 encryption with granular permissions (print, copy, modify, annotate).
+**Document Structure** — bookmarks/outline (table of contents), annotations (links, comments, form widgets), page labels, XMP metadata, vector paths.
+## API Reference
+### `WasmPdfDocument` — read, extract, search, and edit existing PDFs
+| Method | Description |
+|---|---|
+| `new(data)` | Load PDF from `Uint8Array` |
+| `pageCount()` | Number of pages |
+| `version()` | PDF version as `[major, minor]` |
+| `authenticate(password)` | Decrypt an encrypted PDF |
+| `hasStructureTree()` | Check for Tagged PDF structure |
+| **Text Extraction** | |
+| `extractText(page)` | Plain text from one page |
+| `extractAllText()` | Plain text from all pages |
+| `extractChars(page)` | Character-level data with positions |
+| `extractSpans(page)` | Span-level data with positions |
+| **Format Conversion** | |
+| `toMarkdown(page, headings?, images?, forms?)` | Markdown from one page |
+| `toMarkdownAll(headings?, images?, forms?)` | Markdown from all pages |
+| `toHtml(page, layout?, headings?, forms?)` | HTML from one page |
+| `toHtmlAll(layout?, headings?, forms?)` | HTML from all pages |
+| `toPlainText(page)` | Plain text with layout |
+| `toPlainTextAll()` | Plain text all pages |
+| **Search** | |
+| `search(pattern, caseInsensitive?, literal?, wholeWord?, max?)` | Search all pages |
+| `searchPage(page, pattern, ...)` | Search one page |
+| **Images** | |
+| `extractImages(page)` | Image metadata (dimensions, color space, bbox) |
+| `extractImageBytes(page)` | Image data as PNG `Uint8Array` |
+| `pageImages(page)` | Image placement info (bounds, matrix) |
+| **Forms** | |
+| `getFormFields()` | All form fields with types and values |
+| `getFormFieldValue(name)` | Get a single field value |
+| `setFormFieldValue(name, value)` | Set a field value |
+| `exportFormData(format?)` | Export as FDF or XFDF |
+| `hasXfa()` | Check for XFA form data |
+| `flattenForms()` | Flatten all form fields |
+| `flattenFormsOnPage(page)` | Flatten fields on one page |
+| **Document Structure** | |
+| `getOutline()` | Bookmarks / table of contents |
+| `getAnnotations(page)` | Page annotations |
+| `extractPaths(page)` | Vector paths (lines, curves) |
+| `pageLabels()` | Page label ranges |
+| `xmpMetadata()` | XMP metadata |
+| **Editing** | |
+| `setTitle(title)` | Set document title |
+| `setAuthor(author)` | Set document author |
+| `setSubject(subject)` | Set document subject |
+| `setKeywords(keywords)` | Set document keywords |
+| `setPageRotation(page, degrees)` | Set page rotation |
+| `rotatePage(page, degrees)` | Rotate page by degrees |
+| `rotateAllPages(degrees)` | Rotate all pages |
+| `pageMediaBox(page)` | Get MediaBox |
+| `setPageMediaBox(page, llx, lly, urx, ury)` | Set MediaBox |
+| `pageCropBox(page)` | Get CropBox |
+| `setPageCropBox(page, llx, lly, urx, ury)` | Set CropBox |
+| `cropMargins(left, right, top, bottom)` | Crop all page margins |
+| `eraseRegion(page, llx, lly, urx, ury)` | Whiteout a region |
+| `eraseRegions(page, rects)` | Whiteout multiple regions |
+| `repositionImage(page, name, x, y)` | Move an image |
+| `resizeImage(page, name, w, h)` | Resize an image |
+| `setImageBounds(page, name, x, y, w, h)` | Set image bounds |
+| `flattenPageAnnotations(page)` | Flatten page annotations |
+| `flattenAllAnnotations()` | Flatten all annotations |
+| `applyPageRedactions(page)` | Apply redactions on page |
+| `applyAllRedactions()` | Apply all redactions |
+| `mergeFrom(data)` | Merge another PDF |
+| `embedFile(name, data)` | Embed a file |
+| **Save** | |
+| `saveToBytes()` | Save edits → `Uint8Array` |
+| `saveEncryptedToBytes(userPwd, ownerPwd?, ...)` | Save with AES-256 encryption |
+| `free()` | Release WASM memory |
+### `WasmPdf` — create new PDFs
+| Method | Description |
+|---|---|
+| `fromMarkdown(content, title?, author?)` | Create PDF from Markdown |
+| `fromHtml(content, title?, author?)` | Create PDF from HTML |
+| `fromText(content, title?, author?)` | Create PDF from plain text |
+| `fromImageBytes(data)` | Create PDF from image (PNG/JPEG) |
+| `fromMultipleImageBytes(images)` | Create multi-page PDF from images |
+| `toBytes()` | Get PDF as `Uint8Array` |
+| `size` | PDF size in bytes |
+## Platform Compatibility
+Works without modification in:
+- **Node.js** 18+ (CommonJS and ESM)
+- **Browsers** — Chrome, Firefox, Safari, Edge
+- **Cloudflare Workers** — runs in V8 isolates with WASM support
+- **Deno** — native WASM support
+- **Bun** — native WASM support
+No native binaries, no `node-gyp`, no `postinstall` scripts. Install and use immediately.
+## Performance
+pdf-oxide-wasm is built on a Rust PDF parser compiled to WebAssembly. The Rust core ([pdf_oxide](https://crates.io/crates/pdf_oxide)) achieves 0.8ms mean extraction time across 3,830 test PDFs with a 100% success rate — the fastest PDF text extraction library available in Rust. The WASM compilation preserves near-native performance without garbage collection overhead or child process spawning.
+## Full Documentation
-## Documentation
+Complete guide with examples: [Getting Started with WASM](https://github.com/yfedoseev/pdf_oxide/blob/main/docs/getting-started-wasm.md)
-Full API reference and examples: [Getting Started (WASM)](https://github.com/yfedoseev/pdf_oxide/blob/main/docs/getting-started-wasm.md)
+Rust library documentation: [docs.rs/pdf_oxide](https://docs.rs/pdf_oxide)
 ## License
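
Note on the README examples in this diff: each one calls `doc.free()` as its last statement, so an exception thrown by an earlier call would skip the release of WASM memory. A minimal sketch of one way to guard against that is below. `withDocument`, `extractPages`, and `makeFakeDoc` are hypothetical names that are not part of pdf-oxide-wasm. The sketch relies only on `pageCount()`, `extractText(page)`, and `free()` from the README's API table, and it uses a stand-in object so that it runs without the package installed.

```javascript
// Hypothetical helper (not part of pdf-oxide-wasm): run `fn` with a document
// and always call free() afterwards, even if `fn` throws.
function withDocument(doc, fn) {
  try {
    return fn(doc);
  } finally {
    doc.free();
  }
}

// Collect plain text page by page, using only method names that the README lists.
function extractPages(doc) {
  const pages = [];
  for (let i = 0; i < doc.pageCount(); i++) {
    pages.push(doc.extractText(i));
  }
  return pages;
}

// Stand-in exposing the same three methods as the documented class.
// It exists only so this sketch can run without the real package.
function makeFakeDoc(texts) {
  return {
    freed: false,
    pageCount() { return texts.length; },
    extractText(page) { return texts[page]; },
    free() { this.freed = true; },
  };
}

const fakeDoc = makeFakeDoc(["first page", "second page"]);
const pages = withDocument(fakeDoc, extractPages);
console.log(pages.length, fakeDoc.freed);
```

With the real package, the README suggests the document would be constructed as `new WasmPdfDocument(bytes)` and passed to the helper in place of the stand-in.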

package/package.json CHANGED

@@ -1,13 +1,19 @@
 {
   "name": "pdf-oxide-wasm",
-  "version": "0.3.10",
-  "description": "High-performance PDF text extraction and manipulation via WebAssembly",
+  "version": "0.3.12",
+  "description": "Fast, zero-dependency PDF toolkit for Node.js, browsers, and edge runtimes — text extraction, markdown/HTML conversion, search, form filling, creation, and editing. Rust core compiled to WebAssembly.",
   "license": "MIT OR Apache-2.0",
   "repository": {
     "type": "git",
     "url": "https://github.com/yfedoseev/pdf_oxide"
   },
   "homepage": "https://github.com/yfedoseev/pdf_oxide/blob/main/docs/getting-started-wasm.md",
+  "bugs": {
+    "url": "https://github.com/yfedoseev/pdf_oxide/issues"
+  },
+  "engines": {
+    "node": ">=18"
+  },
   "files": [
     "pdf_oxide_bg.wasm",
     "pdf_oxide.js",
@@ -17,11 +23,32 @@
   ],
   "main": "pdf_oxide.js",
   "types": "pdf_oxide.d.ts",
+  "sideEffects": false,
   "keywords": [
     "pdf",
     "wasm",
     "webassembly",
+    "pdf-parser",
+    "pdf-extract",
     "text-extraction",
-    "pdf-parser"
+    "pdf-to-text",
+    "pdf-to-markdown",
+    "pdf-reader",
+    "pdf-search",
+    "pdf-form",
+    "pdf-creation",
+    "markdown-to-pdf",
+    "html-to-pdf",
+    "rust",
+    "rust-wasm",
+    "serverless",
+    "cloudflare-workers",
+    "node",
+    "browser",
+    "zero-dependency",
+    "pdf-metadata",
+    "pdf-merge",
+    "pdf-encrypt",
+    "typescript"
   ]
 }
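
Note on the new `engines` field: by default npm only warns when the running Node.js version falls outside `engines`, and fails the install only when `engine-strict` is enabled. An application that wants a hard failure could check the version at startup. The sketch below is one way to do that under simple assumptions: `meetsMinNodeMajor` is a hypothetical helper, and it compares major versions only rather than evaluating a full semver range.

```javascript
// Hypothetical helper: true when a Node.js version string such as "18.19.0"
// (the format of process.versions.node) has a major version >= minMajor.
// A leading "v", as in process.version, is tolerated.
function meetsMinNodeMajor(version, minMajor) {
  const major = Number.parseInt(String(version).replace(/^v/, "").split(".")[0], 10);
  return Number.isInteger(major) && major >= minMajor;
}

// Startup check mirroring the ">=18" constraint added in this diff.
if (typeof process !== "undefined" && process.versions && process.versions.node) {
  if (!meetsMinNodeMajor(process.versions.node, 18)) {
    console.warn("pdf-oxide-wasm declares engines.node >=18; found " + process.versions.node);
  }
}
```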

package/pdf_oxide_bg.wasm CHANGED

Binary file