pdfnative 1.2.0 → 1.3.0

package/README.md CHANGED
@@ -21,7 +21,7 @@ pdfnative ships as three coordinated packages — pick whichever entry point fit
 
  | Package | Latest | Use it for |
  |---|:---:|---|
- | [`pdfnative`](https://www.npmjs.com/package/pdfnative) | **v1.2.0** | The library itself — call from Node, browsers, Workers, Deno, Bun. |
+ | [`pdfnative`](https://www.npmjs.com/package/pdfnative) | **v1.3.0** | The library itself — call from Node, browsers, Workers, Deno, Bun. |
  | [`pdfnative-cli`](https://www.npmjs.com/package/pdfnative-cli) | **v0.3.0** | Render JSON → PDF, sign (RSA + ECDSA-SHA256, RFC 3161 detection), inspect, and verify CMS signatures from the shell. New in v0.3.0: `--watch`, `--template`, `--font {latin,emoji}`, auto signature placeholder. |
  | [`pdfnative-mcp`](https://www.npmjs.com/package/pdfnative-mcp) | **v0.3.0** | Use pdfnative from Claude Desktop, Cursor, Continue, Zed (or any stdio MCP client) — **9 structured tools** including the new `inspect_pdf`, a `pdfA` flag on every doc tool, multi-script `lang`, and per-tool `outputSchema` (MCP 2025-06-18). |
 
@@ -37,23 +37,24 @@ Detailed docs: [CLI guide](docs/guides/cli.md) · [MCP guide](docs/guides/mcp.md
 
  - **Zero dependencies** — built from scratch in pure TypeScript. Zero runtime dependencies, tree-shakeable, auditable
  - **ISO 32000-1 compliant** — valid xref tables, /Info metadata, proper font embedding
- - **16 Unicode scripts** — Thai, Japanese, Chinese (SC), Korean, Greek, Devanagari, Turkish, Vietnamese, Polish, Arabic, Hebrew, Cyrillic, Georgian, Armenian, Bengali, Tamil
+ - **22 Unicode scripts** — Thai, Japanese, Chinese (SC), Korean, Greek, Devanagari, Telugu, Turkish, Vietnamese, Polish, Arabic, Hebrew, Cyrillic, Georgian, Armenian, Bengali, Tamil, Sinhala, Tibetan, Khmer, Myanmar, Ethiopic
  - **Thai OpenType shaping** — GSUB substitution + GPOS mark-to-base + mark-to-mark positioning
  - **Arabic positional shaping** — GSUB isolated/initial/medial/final forms + lam-alef ligatures
- - **BiDi text layout** — Unicode Bidirectional Algorithm (UAX #9) with glyph mirroring, isolates (LRI/RLI/FSI/PDI), and explicit embeddings (LRE/RLE/LRO/RLO/PDF)
- - **USE-lite cluster classifier** — public API (`classifyUseCategory`, `classifyClusters`) with per-script tables for Devanagari, Bengali, Tamil (v1.2.0; shaper rewire lands in v1.3)
+ - **BiDi text layout** — Unicode Bidirectional Algorithm (UAX #9) with glyph mirroring, isolates (LRI/RLI/FSI/PDI), and explicit embeddings (LRE/RLE/LRO/RLO/PDF) including character-level X4–X5 overrides (v1.3.0)
+ - **USE-lite shaping** — `classifyUseCategory` / `classifyClusters` drive joiner classification across the Devanagari, Bengali, and Tamil shapers, fixing nukta+virama, half-form, eyelash-ra, and ya-phalaa edge cases (v1.3.0)
+ - **Colour emoji (COLRv1)** — opt-in Noto Color Emoji subset; solid + linear + radial gradient layers rendered as native PDF Form XObjects; monochrome fallback when not registered (v1.3.0). Variation selectors, ZWJ/ZWNJ, and skin-tone modifiers no longer leave tofu, and glyph `/BBox` is computed from contour bounds so emoji are never clipped (v1.3.0). [Guide →](docs/guides/colour-emoji.md)
  - **Multi-font fallback** — automatic cross-script font switching with continuation bias
  - **TTF subsetting** — only used glyphs embedded (dramatic file size reduction)
  - **Tagged PDF / PDF/A** — structure tree, /ActualText, XMP metadata, sRGB OutputIntent (PDF/A-1b, 2b, 2u, 3b with embedded file attachments)
  - **PDF Encryption** — AES-128 (V4/R4) and AES-256 (V5/R6), owner + user passwords, granular permissions
- - **Free-form document builder** — headings, paragraphs, lists, tables, images, barcodes, SVG paths, form fields, spacers, page breaks, table of contents
+ - **Free-form document builder** — headings, paragraphs, lists, tables, images, barcodes, SVG paths, form fields, spacers, page breaks, table of contents. Configurable block limit via `layout.maxBlocks` (default 100 000) for very large reports (v1.3.0)
  - **Smart tables** — multi-page slicing with repeated headers, auto-wrap on column overflow, zebra striping, captions, and smart auto-fit columns (v1.2.0). [Guide →](docs/guides/tables.md)
  - **Barcode & QR code generation** — Code 128, EAN-13, QR Code, Data Matrix, PDF417 — pure PDF path operators (no images)
  - **SVG path rendering** — path, rect, circle, ellipse, line, polyline, polygon as native PDF operators
  - **AcroForm fields** — text, multiline, checkbox, radio, dropdown, listbox with appearance streams (ISO 32000-1 §12.7)
  - **Digital signatures** — CMS/PKCS#7 detached signatures with RSA + ECDSA, SHA-256/384/512, X.509 parsing (ISO 32000-1 §12.8). One-call placeholder injection via `addSignaturePlaceholder()` (v1.2.0)
- - **Streaming output** — AsyncGenerator-based progressive PDF emission with configurable chunk size, plus object-boundary page-by-page streaming (`buildPDFStreamPageByPage()`, v1.2.0)
- - **PDF parser & modifier** — read existing PDFs (tokenizer, xref, object parser, FlateDecode inflate) + incremental modification
+ - **Streaming output** — AsyncGenerator-based progressive PDF emission with configurable chunk size, object-boundary page-by-page streaming, and **true constant-memory streaming** (`buildDocumentPDFStreamTrue()`, v1.3.0) where the full PDF binary never materialises. [Guide →](docs/guides/streaming.md)
+ - **PDF parser & modifier** — read existing PDFs (tokenizer, xref, object parser, FlateDecode inflate) + incremental modification. Read-only PDF/UA structural checker `validatePdfUA()` (ISO 14289-1: MarkInfo, StructTree, ParentTree, Lang, per-page MCID uniqueness) (v1.3.0)
  - **Image embedding** — JPEG (DCTDecode) and PNG (FlateDecode) with auto-scaling and alignment
  - **Hyperlinks** — PDF link annotations (/URI) with URL validation, blue underlined text, tagged /Link
  - **Header/footer templates** — configurable `PageTemplate` with left/center/right zones and `{page}`/`{pages}`/`{date}`/`{title}` placeholders
@@ -62,7 +63,7 @@ Detailed docs: [CLI guide](docs/guides/cli.md) · [MCP guide](docs/guides/mcp.md
  - **FlateDecode compression** — zlib stream compression (50–90% size reduction), zero-dependency, platform-native
  - **Web Worker support** — off-main-thread generation for large datasets
  - **Tree-shakeable** — ESM + CJS dual build with TypeScript declarations
- - **95%+ test coverage** — 1822+ tests across 53 files, fuzz suite, performance benchmarks
+ - **95%+ test coverage** — 1982+ tests across 71 files, fuzz suite, dual-mode visual-regression suite, performance benchmarks
  - **NPM provenance** — signed builds via GitHub Actions OIDC
  - **On-device generation** — runs in Node, browsers, Workers, Deno, Bun. No SaaS round-trip; documents never leave the calling process unless your application explicitly sends them
  - **No telemetry, no network calls** — verifiable in source. The library never opens a socket, fetches remote fonts, or phones home
@@ -86,7 +87,7 @@ npm install pdfnative
  - ❓ **FAQ:** [docs/guides/faq.md](docs/guides/faq.md) — fonts, encryption, signatures, comparisons.
  - 🛠️ **Troubleshooting:** [docs/guides/troubleshooting.md](docs/guides/troubleshooting.md) — common pitfalls.
  - 🎮 **Playgrounds:** [docs/playgrounds/extreme-scripts.html](docs/playgrounds/extreme-scripts.html) (live BiDi/Indic stress tests) and [docs/playgrounds/medical-800.html](docs/playgrounds/medical-800.html) (800-page Web Worker showcase).
- - 🧪 **Sample PDFs:** [scripts/generators/](scripts/generators/) — ~140 sample PDFs across 23 categories (see [Sample PDFs](#sample-pdfs) below).
+ - 🧪 **Sample PDFs:** [scripts/generators/](scripts/generators/) — ~187 sample PDFs across 32 categories (see [Sample PDFs](#sample-pdfs) below).
 
  ## Why pdfnative?
 
@@ -201,6 +202,12 @@ registerFonts({
  hy: () => import('pdfnative/fonts/noto-armenian-data.js'),
  bn: () => import('pdfnative/fonts/noto-bengali-data.js'),
  ta: () => import('pdfnative/fonts/noto-tamil-data.js'),
+ te: () => import('pdfnative/fonts/noto-telugu-data.js'), // v1.3.0
+ si: () => import('pdfnative/fonts/noto-sinhala-data.js'), // v1.3.0
+ bo: () => import('pdfnative/fonts/noto-tibetan-data.js'), // v1.3.0
+ km: () => import('pdfnative/fonts/noto-khmer-data.js'), // v1.3.0
+ my: () => import('pdfnative/fonts/noto-myanmar-data.js'), // v1.3.0
+ am: () => import('pdfnative/fonts/noto-ethiopic-data.js'), // v1.3.0
  // v1.1.0+ — optional Latin fallback for PDF/A documents with curly quotes,
  // em-dash, ellipsis, etc. (activates automatically when needed):
  latin: () => import('pdfnative/fonts/noto-sans-data.js'),
@@ -420,7 +427,7 @@ See [scripts/README.md](scripts/README.md) for the modular generator architectur
  | `sample-hy.pdf` | Armenian |
  | `sample-bn.pdf` | Bengali (GSUB conjuncts + GPOS marks) |
  | `sample-ta.pdf` | Tamil (GSUB + split vowel decomposition) |
- | `sample-multi.pdf` | Mixed: all 16 scripts in one PDF |
+ | `sample-multi.pdf` | Mixed: all 22 scripts in one PDF |
  | `sample-pagination.pdf` | 200 rows, multi-page layout |
 
  ### Diverse Use Cases (non-financial)
@@ -515,8 +522,14 @@ See [scripts/README.md](scripts/README.md) for the modular generator architectur
  | `doc-bengali.pdf` | Bengali document (GSUB conjuncts + GPOS marks) |
  | `doc-tamil.pdf` | Tamil document (GSUB substitution + split vowels) |
  | `doc-devanagari.pdf` | Hindi (Devanagari) document — GSUB conjuncts, reph reordering, matra reordering, split vowels |
+ | `doc-telugu.pdf` | Telugu document (virama conjuncts + GPOS marks, no reph) |
+ | `doc-sinhala.pdf` | Sinhala document (virama conjuncts + pre-base kombuva reordering) |
+ | `doc-tibetan.pdf` | Tibetan document (vertical subjoined-consonant stacking) |
+ | `doc-khmer.pdf` | Khmer document (USE-lite: coeng subscripts, pre-base vowels) |
+ | `doc-myanmar.pdf` | Myanmar document (USE-lite: medials, pre-base reordering) |
+ | `doc-amharic.pdf` | Amharic/Ethiopic document (syllabic abugida, no reordering) |
  | `doc-chinese-catalog.pdf` | Chinese product catalog (tables, ordering info) |
- | `doc-multi-language.pdf` | Multi-language: EN + Arabic + Japanese in one PDF |
+ | `doc-multi-language.pdf` | Multi-language showcase: all 22 Unicode scripts in one PDF |
  | `doc-invoice.pdf` | Invoice template (line items, totals, payment link) |
  | `doc-report-multipage.pdf` | 3-page technical report (7 sections, 4 tables) |
  | `doc-contract-bilingual.pdf` | Bilingual EN/AR contract (legal sections, signatures) |
@@ -698,6 +711,10 @@ See [scripts/README.md](scripts/README.md) for the modular generator architectur
  |----------|-------------|
  | `buildDocumentPDFStream(params, layout?, streamOpts?)` | Stream document PDF as `AsyncGenerator<Uint8Array>` |
  | `buildPDFStream(params, layout?, streamOpts?)` | Stream table PDF as `AsyncGenerator<Uint8Array>` |
+ | `buildDocumentPDFStreamTrue(params, layout?, streamOpts?)` | **True constant-memory** document streaming — frees each part as it yields (v1.3.0) |
+ | `buildPDFStreamTrue(params, layout?, streamOpts?)` | **True constant-memory** table streaming (v1.3.0) |
+ | `buildDocumentPDFStreamPageByPage(params, layout?)` | Stream document PDF chunked at PDF object boundaries |
+ | `buildPDFStreamPageByPage(params, layout?)` | Stream table PDF chunked at PDF object boundaries |
  | `validateDocumentStreamable(params, layout?)` | Validate document is compatible with streaming (no TOC, no `{pages}`) |
  | `validateTableStreamable(params, layout?)` | Validate table is compatible with streaming |
  | `chunkBinaryString(str, chunkSize)` | Split binary string into `Uint8Array` chunks |
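The streaming entries in the table above are described as yielding `AsyncGenerator<Uint8Array>`. Assuming the v1.3.0 `...StreamTrue` variants yield the same type, a minimal sketch of consuming such a generator chunk by chunk, without ever concatenating the chunks, might look like this. `fakePdfStream` is a hypothetical stand-in for a call such as `buildDocumentPDFStreamTrue(params)`, and `ChunkSink` / `drainToSink` are illustrative names, not part of pdfnative.

```typescript
// Illustrative sink: anything that can accept one chunk at a time
// (a file handle, an HTTP response, a WritableStream writer, ...).
interface ChunkSink {
  write(chunk: Uint8Array): void | Promise<void>;
}

// Hypothetical stand-in for a pdfnative stream such as
// buildDocumentPDFStreamTrue(params); it yields fixed-size dummy chunks.
async function* fakePdfStream(
  chunkCount: number,
  chunkSize: number,
): AsyncGenerator<Uint8Array> {
  for (let i = 0; i < chunkCount; i++) {
    yield new Uint8Array(chunkSize).fill(i);
  }
}

// Forward each chunk to the sink as it arrives and keep only running totals,
// so this function never holds more than the current chunk.
async function drainToSink(
  stream: AsyncIterable<Uint8Array>,
  sink: ChunkSink,
): Promise<{ chunks: number; bytes: number }> {
  let chunks = 0;
  let bytes = 0;
  for await (const chunk of stream) {
    await sink.write(chunk);
    chunks += 1;
    bytes += chunk.byteLength;
  }
  return { chunks, bytes };
}
```

Awaiting `sink.write` before pulling the next chunk gives simple backpressure: the generator is not asked for more data until the sink has accepted the previous chunk.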
@@ -731,6 +748,7 @@ See [scripts/README.md](scripts/README.md) for the modular generator architectur
  | `isRef(v)` / `isDict(v)` / `isArray(v)` / `isStream(v)` | Type guards for parsed PDF values |
  | `dictGet(dict, key)` / `dictGetName(dict, key)` | Dictionary value accessors |
  | `inflateSync(data)` | Decompress FlateDecode data (zlib inflate) |
+ | `validatePdfUA(bytes)` | Read-only PDF/UA structural checker — returns `{ valid, errors, warnings }` (v1.3.0) |
 
  ### Document Block Types
 
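The `validatePdfUA(bytes)` row above documents a return value of `{ valid, errors, warnings }`. The sketch below shows one way such a result could be turned into a short report. The element types are not stated in this diff, so the `PdfUAResult` interface assumes `errors` and `warnings` are arrays of strings; `summarizePdfUA` is a hypothetical helper, not part of pdfnative.

```typescript
// Assumed shape: the README only names the three fields, so the string[]
// element type is a guess made for this sketch.
interface PdfUAResult {
  valid: boolean;
  errors: string[];
  warnings: string[];
}

// Hypothetical helper: render a validation result as a human-readable report.
function summarizePdfUA(result: PdfUAResult): string {
  const lines: string[] = [
    result.valid
      ? "PDF/UA structural checks passed"
      : "PDF/UA structural checks failed",
  ];
  for (const e of result.errors) lines.push(`  error: ${e}`);
  for (const w of result.warnings) lines.push(`  warning: ${w}`);
  return lines.join("\n");
}

// Hand-written example result with a made-up message; real code would
// obtain the result from validatePdfUA(bytes) instead.
const sampleResult: PdfUAResult = {
  valid: false,
  errors: ["example error message"],
  warnings: [],
};
console.log(summarizePdfUA(sampleResult));
```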
@@ -811,6 +829,11 @@ const pdf = buildPDFBytes(params, { compress: true });
  | `shapeBengaliText(str, fontData)` | Bengali GSUB conjuncts + GPOS marks |
  | `shapeTamilText(str, fontData)` | Tamil GSUB + split vowel decomposition |
  | `shapeDevanagariText(str, fontData)` | Devanagari cluster shaping + GSUB/GPOS |
+ | `shapeTeluguText(str, fontData)` | Telugu GSUB conjuncts + GPOS marks (v1.3.0) |
+ | `shapeSinhalaText(str, fontData)` | Sinhala conjuncts + pre-base reorder + GSUB/GPOS (v1.3.0) |
+ | `shapeTibetanText(str, fontData)` | Tibetan vertical subjoined stacking (v1.3.0) |
+ | `shapeKhmerText(str, fontData)` | Khmer USE-lite — coeng subscripts + pre-base vowels (v1.3.0) |
+ | `shapeMyanmarText(str, fontData)` | Myanmar USE-lite — medials + virama stacking (v1.3.0) |
  | `detectFallbackLangs(texts, primaryLang)` | Detect needed fallback fonts |
  | `detectCharLang(codePoint)` | Map codepoint to preferred font language |
  | `splitTextByFont(str, fontEntries)` | Multi-font text run splitting |
@@ -821,6 +844,10 @@ const pdf = buildPDFBytes(params, { compress: true });
  | `shapeArabicText(str, fontData)` | Arabic GSUB positional shaping |
  | `containsArabic(text)` | Detect Arabic content |
  | `containsHebrew(text)` | Detect Hebrew content |
+ | `containsTelugu(text)` | Detect Telugu content (v1.3.0) |
+ | `isTeluguCodepoint(cp)` | Telugu codepoint predicate (v1.3.0) |
+ | `containsSinhala(text)` / `containsTibetan(text)` / `containsKhmer(text)` / `containsMyanmar(text)` / `containsEthiopic(text)` | Detect script content (v1.3.0) |
+ | `isSinhalaCodepoint(cp)` / `isTibetanCodepoint(cp)` / `isKhmerCodepoint(cp)` / `isMyanmarCodepoint(cp)` / `isEthiopicCodepoint(cp)` | Codepoint predicates (v1.3.0) |
 
  ### Layout Constants
 
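The detection helpers added above come in `containsX(text)` / `isXCodepoint(cp)` pairs. As a rough illustration of how such a pair can be built on Unicode block ranges, here is a sketch. The ranges are the primary Unicode blocks for each script; whether pdfnative also covers extension blocks (for example Ethiopic Supplement) is not stated in this diff, and `inScript` / `containsScript` are hypothetical names, not the library's exports.

```typescript
// Primary Unicode blocks for the six scripts added in v1.3.0.
// Extension blocks are deliberately left out of this sketch.
const SCRIPT_BLOCKS = {
  telugu: [0x0c00, 0x0c7f],
  sinhala: [0x0d80, 0x0dff],
  tibetan: [0x0f00, 0x0fff],
  myanmar: [0x1000, 0x109f],
  ethiopic: [0x1200, 0x137f],
  khmer: [0x1780, 0x17ff],
} as const;

type ScriptName = keyof typeof SCRIPT_BLOCKS;

// Codepoint predicate: true when cp falls inside the script's primary block.
function inScript(script: ScriptName, cp: number): boolean {
  const [lo, hi] = SCRIPT_BLOCKS[script];
  return cp >= lo && cp <= hi;
}

// Text predicate: true when at least one code point belongs to the script.
function containsScript(script: ScriptName, text: string): boolean {
  // for...of iterates by code point, so surrogate pairs are not split.
  for (const ch of text) {
    const cp = ch.codePointAt(0);
    if (cp !== undefined && inScript(script, cp)) return true;
  }
  return false;
}
```

A table keyed by script name keeps the two predicates generic; a library that exports one function per script, as the rows above suggest, could wrap the same lookup in thin per-script functions.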
@@ -971,9 +998,9 @@ src/
  ├── worker-api.ts # Worker/main-thread dispatch
  └── pdf-worker.ts # Self-contained worker entry
 
- fonts/ # Pre-built font data modules (16 scripts)
+ fonts/ # Pre-built font data modules (22 scripts)
  tools/ # CLI: build-font-data.cjs (TTF → JS module)
- scripts/ # Modular sample PDF generation (23 generators, 140+ PDFs)
+ scripts/ # Modular sample PDF generation (32 generators, 187+ PDFs)
  tests/ # 1726+ tests (48 files: unit + integration + fuzz + parser)
  bench/ # Performance benchmarks (vitest bench)
  ```
@@ -1191,7 +1218,7 @@ pdfnative targets ES2020 and works in any environment that supports `Uint8Array`
 
  ## Origin
 
- pdfnative was born inside [**plika.app**](https://plika.app) — a personal finance application where high-quality, multi-language PDF generation (bank statements, transaction reports) was a core requirement. Rather than depending on heavy third-party libraries, the PDF engine was built from scratch with zero dependencies, strict ISO compliance, and native support for 16 Unicode scripts.
+ pdfnative was born inside [**plika.app**](https://plika.app) — a personal finance application where high-quality, multi-language PDF generation (bank statements, transaction reports) was a core requirement. Rather than depending on heavy third-party libraries, the PDF engine was built from scratch with zero dependencies, strict ISO compliance, and native support for 22 Unicode scripts.
 
  The decision was then made to extract the engine into an independent open-source library so that everyone can benefit from production-grade PDF generation — not just plika.app users.