@dvvebond/core 0.2.12 → 0.2.14

package/README.md CHANGED
@@ -1,32 +1,26 @@
- # LibPDF
+ # @dvvebond/core
 
- [![npm](https://img.shields.io/npm/v/@libpdf/core)](https://www.npmjs.com/package/@libpdf/core)
- [![npm downloads](https://img.shields.io/npm/dm/@libpdf/core)](https://www.npmjs.com/package/@libpdf/core)
- [![CI](https://github.com/LibPDF-js/core/actions/workflows/ci.yml/badge.svg)](https://github.com/LibPDF-js/core/actions/workflows/ci.yml)
- [![GitHub stars](https://img.shields.io/github/stars/libpdf-js/core?style=flat)](https://github.com/LibPDF-js/core)
- [![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](LICENSE)
+ [![npm](https://img.shields.io/npm/v/@dvvebond/core)](https://www.npmjs.com/package/@dvvebond/core)
  [![TypeScript](https://img.shields.io/badge/TypeScript-5.0-blue?logo=typescript&logoColor=white)](https://www.typescriptlang.org/)
+ [![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](LICENSE)
 
- A modern PDF library for TypeScript. Parse, modify, and generate PDFs with a clean, intuitive API.
-
- > **Beta Software**: LibPDF is under active development and APIs may change between minor versions, but we use it in production at [Documenso](https://documenso.com) and consider it ready for real-world use.
-
- ## Why LibPDF?
-
- LibPDF was born from frustration. At [Documenso](https://documenso.com), we found ourselves wrestling with the JavaScript PDF ecosystem:
+ A fork of [@libpdf/core](https://github.com/LibPDF-js/core) with enhanced React components, Azure Document Intelligence integration, text extraction with bounding boxes, and enterprise PDF viewing features.
 
- - **PDF.js** is excellent for rendering and even has annotation editing — but it requires a browser
- - **pdf-lib** has a great API, but chokes on slightly malformed documents
- - **pdfkit** only generates, no parsing at all
+ ## Fork Enhancements
 
- We kept adding workarounds. A patch here for a malformed xref table. A hack there for an encrypted document. Eventually, we decided to build what we actually needed:
+ This fork extends the original LibPDF library with:
 
- - **Lenient like PDFBox and PDF.js**: opens documents other libraries reject
- - **Intuitive like pdf-lib**: clean, TypeScript-first API
- - **Complete**: encryption, digital signatures, incremental saves, form filling
+ - **React Integration Layer**: Production-ready React components and hooks for PDF viewing
+ - **Text Extraction with Bounding Boxes**: Character, word, line, and paragraph-level extraction with precise coordinates
+ - **Search Functionality**: Full-text search with highlighting and navigation
+ - **Viewport Management**: Virtual scrolling for large documents with efficient memory usage
+ - **Azure Document Intelligence Integration**: Process PDFs with Azure AI services
+ - **Coordinate Transformation**: Convert between PDF and screen coordinates
 
  ## Features
 
+ ### Core PDF Operations (from LibPDF)
+
  | Feature | Status | Notes |
  | ------------------ | ------ | ------------------------------------------ |
  | Parse any PDF | Yes | Graceful fallback for malformed documents |
@@ -42,170 +36,311 @@ We kept adding workarounds. A patch here for a malformed xref table. A hack ther
  | Images | Yes | JPEG, PNG (with alpha) |
  | Incremental Saves | Yes | Append changes, preserve signatures |
 
+ ### Enhanced Features (this fork)
+
+ | Feature | Description |
+ | -------------------------- | -------------------------------------------------------------- |
+ | React Components | ReactPDFViewer, PageNavigation, ZoomControls, SearchInput |
+ | React Hooks | usePDFViewer, usePDFSearch, useBoundingBoxOverlay, useViewport |
+ | Text Extraction | Hierarchical extraction (char/word/line/paragraph) |
+ | Bounding Box Visualization | Overlay system for highlighting text regions |
+ | Virtual Scrolling | Efficient rendering of large documents |
+ | Search Engine | Full-text search with match highlighting |
+ | Coordinate Transformation | PDF-to-screen and screen-to-PDF conversions |
+ | PDF.js Integration | Seamless integration with Mozilla's PDF.js |
+
  ## Installation
 
  ```bash
- npm install @libpdf/core
+ npm install @dvvebond/core
+ # or
+ bun add @dvvebond/core
  # or
- bun add @libpdf/core
+ pnpm add @dvvebond/core
+ ```
+
+ For React components:
+
+ ```bash
+ # React is a peer dependency
+ npm install @dvvebond/core react react-dom
  ```
 
  ## Quick Start
 
- ### Parse an existing PDF
+ ### Basic PDF Loading
 
  ```typescript
- import { PDF } from "@libpdf/core";
+ import { PDF } from "@dvvebond/core";
 
  const pdf = await PDF.load(bytes);
  const pages = pdf.getPages();
-
  console.log(`${pages.length} pages`);
  ```
 
- ### Open an encrypted PDF
-
- ```typescript
- const pdf = await PDF.load(bytes, { credentials: "password" });
+ ### React PDF Viewer
+
+ ```tsx
+ import { ReactPDFViewer, usePDFViewer, PageNavigation, ZoomControls } from "@dvvebond/core/react";
+ import { useRef } from "react";
+
+ function PDFViewerApp() {
+   const viewerRef = useRef<ReactPDFViewerRef>(null);
+
+   return (
+     <div style={{ height: "100vh", display: "flex", flexDirection: "column" }}>
+       <div className="toolbar">
+         <PageNavigation viewerRef={viewerRef} />
+         <ZoomControls viewerRef={viewerRef} />
+       </div>
+       <ReactPDFViewer
+         ref={viewerRef}
+         url="/document.pdf"
+         initialScale={1.0}
+         onPageChange={page => console.log("Current page:", page)}
+         onDocumentLoad={info => console.log("Loaded:", info.numPages, "pages")}
+       />
+     </div>
+   );
+ }
  ```
 
- ### Fill a form
+ ### Text Extraction with Bounding Boxes
 
  ```typescript
- const pdf = await PDF.load(bytes);
- const form = pdf.getForm();
-
- form.fill({
-   name: "Jane Doe",
-   email: "jane@example.com",
-   agreed: true,
+ import {
+   HierarchicalTextExtractor,
+   createHierarchicalTextExtractor,
+   type TextPage,
+ } from "@dvvebond/core";
+
+ // Extract text with full hierarchy
+ const extractor = createHierarchicalTextExtractor();
+ const pageText: TextPage = await extractor.extractPage(pdfPage, {
+   includeCharacters: true,
+   includeWords: true,
+   includeLines: true,
+   includeParagraphs: true,
  });
 
- const filled = await pdf.save();
+ // Access bounding boxes at any level
+ for (const paragraph of pageText.paragraphs) {
+   console.log("Paragraph bbox:", paragraph.boundingBox);
+   for (const line of paragraph.lines) {
+     console.log(" Line:", line.text, "at", line.boundingBox);
+   }
+ }
+ ```
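The following sketch is not part of the package diff. It illustrates one plausible way a line-level box can be derived from word-level boxes when building a char/word/line/paragraph hierarchy like the one in the README example above. The `Box` shape and the `unionBoxes` helper are hypothetical names chosen for illustration; they are not taken from `@dvvebond/core`, and the package may compute its boxes differently.

```typescript
// Hypothetical box shape for illustration; not the package's actual type.
interface Box {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Smallest axis-aligned box that contains every input box.
// Returns null for an empty list so callers must handle "no content".
function unionBoxes(boxes: Box[]): Box | null {
  if (boxes.length === 0) return null;
  let minX = Infinity;
  let minY = Infinity;
  let maxX = -Infinity;
  let maxY = -Infinity;
  for (const b of boxes) {
    minX = Math.min(minX, b.x);
    minY = Math.min(minY, b.y);
    maxX = Math.max(maxX, b.x + b.width);
    maxY = Math.max(maxY, b.y + b.height);
  }
  return { x: minX, y: minY, width: maxX - minX, height: maxY - minY };
}

// Two word boxes on the same baseline merge into one line box.
const wordBoxes: Box[] = [
  { x: 50, y: 100, width: 40, height: 12 },
  { x: 95, y: 100, width: 60, height: 12 },
];
const lineBox = unionBoxes(wordBoxes);
console.log(lineBox);
```

Applying the same union to line boxes would give a paragraph box, so each level of the hierarchy can be built from the one below it.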
+
+ ### Search with Highlighting
+
+ ```tsx
+ import { usePDFSearch, useBoundingBoxOverlay } from "@dvvebond/core/react";
+
+ function SearchableViewer() {
+   const { searchState, search, nextMatch, prevMatch, clearSearch } = usePDFSearch();
+   const { boundingBoxes, setHighlights } = useBoundingBoxOverlay();
+
+   const handleSearch = async (query: string) => {
+     const results = await search(query);
+     // Results include bounding boxes for highlighting
+     setHighlights(results.map(r => r.boundingBox));
+   };
+
+   return (
+     <div>
+       <input type="text" onChange={e => handleSearch(e.target.value)} placeholder="Search..." />
+       <span>
+         {searchState.currentMatch} / {searchState.totalMatches}
+       </span>
+       <button onClick={prevMatch}>Previous</button>
+       <button onClick={nextMatch}>Next</button>
+     </div>
+   );
+ }
  ```
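The following sketch is not part of the package diff. It shows, in plain TypeScript and without React, the kind of logic that full-text search with next/previous navigation usually needs: collecting case-insensitive matches per page and stepping through them with wrap-around. `Match`, `findMatches`, and `stepMatch` are hypothetical names; the real `usePDFSearch` hook may track matches differently (for example, with a 1-based `currentMatch`).

```typescript
// Hypothetical match record: which page, and the character range on it.
interface Match {
  pageIndex: number;
  start: number;
  end: number;
}

// Collect non-overlapping, case-insensitive matches across page texts.
// Offsets refer to the lowercased text, which equals the original
// offsets for ASCII input.
function findMatches(pages: string[], query: string): Match[] {
  const matches: Match[] = [];
  const needle = query.toLowerCase();
  if (needle.length === 0) return matches;
  pages.forEach((text, pageIndex) => {
    const haystack = text.toLowerCase();
    let from = 0;
    while (true) {
      const start = haystack.indexOf(needle, from);
      if (start === -1) break;
      matches.push({ pageIndex, start, end: start + needle.length });
      from = start + needle.length;
    }
  });
  return matches;
}

// Move to the next (+1) or previous (-1) match index, wrapping around.
// Returns -1 when there are no matches.
function stepMatch(current: number, total: number, delta: 1 | -1): number {
  if (total === 0) return -1;
  return (current + delta + total) % total;
}

const pageTexts = ["Hello hello", "Say HELLO again"];
const found = findMatches(pageTexts, "hello");
console.log(found.length, stepMatch(found.length - 1, found.length, 1));
```

Wrap-around keeps the Previous and Next buttons usable at either end of the match list; a viewer that prefers to stop at the ends would clamp instead of taking the remainder.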
 
- ### Sign a document
+ ### Bounding Box Visualization
 
  ```typescript
- import { PDF, P12Signer } from "@libpdf/core";
+ import { createBoundingBoxOverlay, type OverlayBoundingBox } from "@dvvebond/core";
 
- const pdf = await PDF.load(bytes);
- const signer = await P12Signer.create(p12Bytes, "password");
+ const overlay = createBoundingBoxOverlay(containerElement, {
+   pageWidth: 612,
+   pageHeight: 792,
+   scale: 1.5,
+ });
 
- const signed = await pdf.sign({
-   signer,
-   reason: "I approve this document",
+ // Add bounding boxes with different types
+ overlay.setBoundingBoxes([
+   { x: 50, y: 100, width: 200, height: 20, type: "word", text: "Hello" },
+   { x: 50, y: 130, width: 300, height: 20, type: "line", text: "Hello World" },
+   { x: 50, y: 100, width: 300, height: 100, type: "paragraph" },
+ ]);
+
+ // Control visibility by type
+ overlay.setVisibility({
+   character: false,
+   word: true,
+   line: true,
+   paragraph: false,
  });
  ```
 
- ### Merge PDFs
+ ### Virtual Scrolling for Large Documents
 
  ```typescript
- const merged = await PDF.merge([pdf1Bytes, pdf2Bytes, pdf3Bytes]);
- ```
+ import { createViewportManager, createVirtualScroller, type PageSource } from "@dvvebond/core";
 
- ### Draw on a page
+ const scroller = createVirtualScroller(containerElement, {
+   totalPages: 100,
+   pageHeight: 792,
+   pageGap: 10,
+   overscan: 2, // Render 2 extra pages above/below viewport
+ });
 
- ```typescript
- import { PDF, rgb } from "@libpdf/core";
+ const viewportManager = createViewportManager({
+   pageSource: pdfDocument,
+   scroller,
+   renderer: createCanvasRenderer(),
+ });
+
+ // Pages are automatically loaded/unloaded as user scrolls
+ scroller.on("visibleRangeChange", ({ startPage, endPage }) => {
+   console.log(`Visible pages: ${startPage} - ${endPage}`);
+ });
+ ```
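The following sketch is not part of the package diff. It shows how a visible page range can be computed for fixed-height pages, reusing the option names from the README example (`totalPages`, `pageHeight`, `pageGap`, `overscan`). The `visibleRange` function, the zero-based page indices, and the exact rounding are assumptions made for illustration; the package's scroller may use different conventions.

```typescript
// Option names mirror the README example; the shape itself is assumed.
interface ScrollerOptions {
  totalPages: number;
  pageHeight: number;
  pageGap: number;
  overscan: number;
}

// Zero-based, inclusive range of pages that should be rendered for a
// given scroll offset and viewport height.
function visibleRange(
  scrollTop: number,
  viewportHeight: number,
  options: ScrollerOptions,
): { startPage: number; endPage: number } {
  // Each page occupies its own height plus the gap that follows it.
  const stride = options.pageHeight + options.pageGap;
  // Pages touched by the top and bottom edges of the viewport. An edge
  // that falls inside a gap is attributed to the page above the gap,
  // which only makes the range slightly more generous.
  const first = Math.floor(scrollTop / stride);
  const last = Math.floor((scrollTop + viewportHeight - 1) / stride);
  // Widen by the overscan and clamp to the document.
  const startPage = Math.max(0, first - options.overscan);
  const endPage = Math.min(options.totalPages - 1, last + options.overscan);
  return { startPage, endPage };
}

const scrollerOptions: ScrollerOptions = {
  totalPages: 100,
  pageHeight: 792,
  pageGap: 10,
  overscan: 2,
};
const atTop = visibleRange(0, 1000, scrollerOptions);
const midDocument = visibleRange(8020, 1000, scrollerOptions);
console.log(atTop, midDocument);
```

Because every page has the same height, the range comes from two divisions rather than a scan over all pages, which is what keeps scrolling cheap for long documents. Variable page heights would need a prefix-sum array and a binary search instead.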
 
- const pdf = PDF.create();
- const page = pdf.addPage({ size: "letter" });
+ ### Coordinate Transformation
 
- page.drawText("Hello, World!", {
-   x: 50,
-   y: 700,
-   fontSize: 24,
-   color: rgb(0, 0, 0),
+ ```typescript
+ import {
+   createCoordinateTransformer,
+   getMousePdfCoordinates,
+   transformBoundingBoxes,
+ } from "@dvvebond/core";
+
+ const transformer = createCoordinateTransformer({
+   pageWidth: 612,
+   pageHeight: 792,
+   scale: 1.5,
+   rotation: 0,
  });
 
- page.drawRectangle({
-   x: 50,
-   y: 600,
-   width: 200,
-   height: 100,
-   color: rgb(0.9, 0.9, 0.9),
-   borderColor: rgb(0, 0, 0),
-   borderWidth: 1,
+ // Handle click events on PDF
+ containerElement.addEventListener("click", event => {
+   const pdfCoords = getMousePdfCoordinates(event, containerElement, transformer);
+   console.log(`Clicked at PDF coordinates: (${pdfCoords.x}, ${pdfCoords.y})`);
  });
 
- const output = await pdf.save();
+ // Transform bounding boxes from PDF to screen coordinates
+ const screenBoxes = transformBoundingBoxes(pdfBoundingBoxes, transformer);
  ```
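The following sketch is not part of the package diff. It spells out the arithmetic behind a PDF-to-screen conversion for the simplest case: an unrotated page whose PDF origin is the bottom-left corner with y pointing up, drawn on a screen whose origin is the top-left corner with y pointing down. `PageGeometry`, `pdfToScreen`, and `screenToPdf` are hypothetical names; rotation and crop-box offsets are deliberately left out, and the package's transformer may handle both.

```typescript
// Field names mirror the README example; the shape itself is assumed.
interface PageGeometry {
  pageWidth: number;
  pageHeight: number;
  scale: number;
}

interface Point {
  x: number;
  y: number;
}

// PDF user space (origin bottom-left, y up) to screen pixels
// (origin top-left, y down). Assumes rotation 0 and no crop offset.
function pdfToScreen(p: Point, g: PageGeometry): Point {
  return { x: p.x * g.scale, y: (g.pageHeight - p.y) * g.scale };
}

// Inverse of pdfToScreen under the same assumptions.
function screenToPdf(p: Point, g: PageGeometry): Point {
  return { x: p.x / g.scale, y: g.pageHeight - p.y / g.scale };
}

// US Letter page (612 x 792 points) shown at 150% zoom.
const letter: PageGeometry = { pageWidth: 612, pageHeight: 792, scale: 1.5 };
const onScreen = pdfToScreen({ x: 100, y: 692 }, letter);
const backToPdf = screenToPdf(onScreen, letter);
console.log(onScreen, backToPdf);
```

The y flip is the step that is easiest to get wrong: a point 100 units below the top of the page in PDF space (y = 692) lands 150 pixels below the top of the screen at scale 1.5.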
 
- ## Runtime Support
+ ### PDF.js Integration
 
- LibPDF runs everywhere:
+ ```typescript
+ import {
+   initializePDFJS,
+   loadPDFJSDocument,
+   getPDFJSTextContent,
+   createPDFJSRenderer,
+ } from "@dvvebond/core";
+
+ // Initialize PDF.js (call once at app startup)
+ await initializePDFJS();
+
+ // Load document
+ const doc = await loadPDFJSDocument(pdfBytes);
+ const page = await doc.getPage(1);
+
+ // Render to canvas
+ const renderer = createPDFJSRenderer(canvas, {
+   scale: 1.5,
+   enableTextLayer: true,
+ });
+ await renderer.render(page);
 
- - **Node.js** 20+
- - **Bun**
- - **Browsers** (modern, with Web Crypto)
+ // Extract text content
+ const textContent = await getPDFJSTextContent(page);
+ ```
 
- ## Known Limitations
+ ## React Hooks Reference
 
- Some features are not yet implemented:
+ ### usePDFViewer
 
- | Feature | Status | Notes |
- | --------------------------- | ---------------- | -------------------------------------- |
- | Signature verification | Not implemented | Signing works; verification is planned |
- | TrueType Collections (.ttc) | Not supported | Extract individual fonts first |
- | JBIG2 image decoding | Passthrough only | Images preserved but not decoded |
- | JPEG2000 (JPX) decoding | Passthrough only | Images preserved but not decoded |
- | Certificate encryption | Not supported | Password encryption works |
- | JavaScript actions | Ignored | Form calculations not executed |
+ Main hook for PDF viewer state management.
 
- These limitations are documented to set expectations. Most don't affect typical use cases like form filling, signing, or document manipulation.
+ ```tsx
+ const { currentPage, totalPages, scale, isLoading, error, goToPage, setScale, zoomIn, zoomOut } =
+   usePDFViewer(viewerRef);
+ ```
+
+ ### usePDFSearch
 
- ## Philosophy
+ Hook for search functionality.
 
- ### Be lenient
+ ```tsx
+ const {
+   searchState, // { query, matches, currentMatch, totalMatches, status }
+   search, // (query: string) => Promise<SearchResult[]>
+   nextMatch, // () => void
+   prevMatch, // () => void
+   clearSearch, // () => void
+ } = usePDFSearch(viewerRef);
+ ```
 
- Real-world PDFs are messy. Export a document through three different tools and you'll get three slightly different interpretations of the spec. LibPDF prioritizes _opening your document_ over strict compliance. When standard parsing fails, we fall back to brute-force recovery, scanning the entire file to rebuild the structure.
+ ### useBoundingBoxOverlay
 
- ### Two API layers
+ Hook for bounding box visualization.
 
- - **High-level**: `PDF`, `PDFPage`, `PDFForm` for common tasks
- - **Low-level**: `PdfDict`, `PdfArray`, `PdfStream` for full control
+ ```tsx
+ const {
+   boundingBoxes,
+   visibility,
+   setVisibility,
+   addBoundingBoxes,
+   clearBoundingBoxes,
+   highlightBox,
+ } = useBoundingBoxOverlay(viewerRef);
+ ```
 
- ## Demo
+ ### useViewport
 
- Run the interactive PDF viewer demo to explore LibPDF's viewing capabilities:
+ Hook for viewport information.
 
- ```bash
- bun run demo
+ ```tsx
+ const { viewportWidth, viewportHeight, scrollTop, scrollLeft, visiblePages } =
+   useViewport(viewerRef);
  ```
 
- See [demo/README.md](demo/README.md) for features and keyboard shortcuts.
+ ### useScrollPosition
 
- ## Documentation
+ Hook for scroll position tracking.
 
- Full documentation at [libpdf.dev](https://libpdf.dev)
+ ```tsx
+ const { scrollTop, scrollLeft, scrollTo, scrollToPage } = useScrollPosition(viewerRef);
+ ```
 
- ## Sponsors
+ ## Runtime Support
 
- LibPDF is developed by [Documenso](https://documenso.com), the open-source DocuSign alternative.
+ Works in all modern JavaScript environments:
 
- <a href="https://documenso.com">
-   <img src="apps/docs/public/sponsors/documenso.png" alt="Documenso" height="24">
- </a>
+ - **Node.js** 20+
+ - **Bun**
+ - **Browsers** (modern, with Web Crypto)
 
- ## Contributing
+ ## Migration from react-pdf
 
- We welcome contributions! See our [contributing guide](CONTRIBUTING.md) for details.
+ See [MIGRATION.md](./MIGRATION.md) for a detailed migration guide from react-pdf to @dvvebond/core.
 
- ```bash
- # Clone the repo
- git clone https://github.com/libpdf/core.git
- cd libpdf
+ ## API Reference
 
- # Install dependencies
- bun install
+ See [API.md](./API.md) for complete API documentation.
 
- # Run tests
- bun run test
+ ## Acknowledgments
 
- # Type check
- bun run typecheck
- ```
+ This project is a fork of [LibPDF](https://github.com/LibPDF-js/core) by [Documenso](https://documenso.com). The core PDF parsing and generation functionality is their excellent work.
 
  ## License