@dvvebond/core 0.2.12 → 0.2.14
- package/README.md +249 -114
- package/dist/index.cjs +60708 -0
- package/dist/index.cjs.map +1 -0
- package/dist/index.d.cts +21553 -0
- package/dist/index.d.mts +1 -581
- package/dist/index.mjs +2 -1081
- package/dist/index.mjs.map +1 -1
- package/dist/parsing-worker-host-CBKQ4mss.cjs +652 -0
- package/dist/parsing-worker-host-CBKQ4mss.cjs.map +1 -0
- package/dist/parsing-worker-host-DIPVulML.cjs +3 -0
- package/dist/react.cjs +49896 -0
- package/dist/react.cjs.map +1 -0
- package/dist/react.d.cts +11207 -0
- package/dist/react.d.mts +11207 -0
- package/dist/react.mjs +49855 -0
- package/dist/react.mjs.map +1 -0
- package/package.json +23 -8
package/README.md
CHANGED
|
@@ -1,32 +1,26 @@
|
|
|
1
|
-
#
|
|
1
|
+
# @dvvebond/core
|
|
2
2
|
|
|
3
|
-
[](https://www.npmjs.com/package/@libpdf/core)
|
|
5
|
-
[](https://github.com/LibPDF-js/core/actions/workflows/ci.yml)
|
|
6
|
-
[](https://github.com/LibPDF-js/core)
|
|
7
|
-
[](LICENSE)
|
|
3
|
+
[](https://www.npmjs.com/package/@dvvebond/core)
|
|
8
4
|
[](https://www.typescriptlang.org/)
|
|
5
|
+
[](LICENSE)
|
|
9
6
|
|
|
10
|
-
A
|
|
11
|
-
|
|
12
|
-
> **Beta Software**: LibPDF is under active development and APIs may change between minor versions, but we use it in production at [Documenso](https://documenso.com) and consider it ready for real-world use.
|
|
13
|
-
|
|
14
|
-
## Why LibPDF?
|
|
15
|
-
|
|
16
|
-
LibPDF was born from frustration. At [Documenso](https://documenso.com), we found ourselves wrestling with the JavaScript PDF ecosystem:
|
|
7
|
+
A fork of [@libpdf/core](https://github.com/LibPDF-js/core) with enhanced React components, Azure Document Intelligence integration, text extraction with bounding boxes, and enterprise PDF viewing features.
|
|
17
8
|
|
|
18
|
-
|
|
19
|
-
- **pdf-lib** has a great API, but chokes on slightly malformed documents
|
|
20
|
-
- **pdfkit** only generates, no parsing at all
|
|
9
|
+
## Fork Enhancements
|
|
21
10
|
|
|
22
|
-
|
|
11
|
+
This fork extends the original LibPDF library with:
|
|
23
12
|
|
|
24
|
-
- **
|
|
25
|
-
- **
|
|
26
|
-
- **
|
|
13
|
+
- **React Integration Layer**: Production-ready React components and hooks for PDF viewing
|
|
14
|
+
- **Text Extraction with Bounding Boxes**: Character, word, line, and paragraph-level extraction with precise coordinates
|
|
15
|
+
- **Search Functionality**: Full-text search with highlighting and navigation
|
|
16
|
+
- **Viewport Management**: Virtual scrolling for large documents with efficient memory usage
|
|
17
|
+
- **Azure Document Intelligence Integration**: Process PDFs with Azure AI services
|
|
18
|
+
- **Coordinate Transformation**: Convert between PDF and screen coordinates
|
|
27
19
|
|
|
28
20
|
## Features
|
|
29
21
|
|
|
22
|
+
### Core PDF Operations (from LibPDF)
|
|
23
|
+
|
|
30
24
|
| Feature | Status | Notes |
|
|
31
25
|
| ------------------ | ------ | ------------------------------------------ |
|
|
32
26
|
| Parse any PDF | Yes | Graceful fallback for malformed documents |
|
|
@@ -42,170 +36,311 @@ We kept adding workarounds. A patch here for a malformed xref table. A hack ther
|
|
|
42
36
|
| Images | Yes | JPEG, PNG (with alpha) |
|
|
43
37
|
| Incremental Saves | Yes | Append changes, preserve signatures |
|
|
44
38
|
|
|
39
|
+
### Enhanced Features (this fork)
|
|
40
|
+
|
|
41
|
+
| Feature | Description |
|
|
42
|
+
| -------------------------- | -------------------------------------------------------------- |
|
|
43
|
+
| React Components | ReactPDFViewer, PageNavigation, ZoomControls, SearchInput |
|
|
44
|
+
| React Hooks | usePDFViewer, usePDFSearch, useBoundingBoxOverlay, useViewport |
|
|
45
|
+
| Text Extraction | Hierarchical extraction (char/word/line/paragraph) |
|
|
46
|
+
| Bounding Box Visualization | Overlay system for highlighting text regions |
|
|
47
|
+
| Virtual Scrolling | Efficient rendering of large documents |
|
|
48
|
+
| Search Engine | Full-text search with match highlighting |
|
|
49
|
+
| Coordinate Transformation | PDF-to-screen and screen-to-PDF conversions |
|
|
50
|
+
| PDF.js Integration | Seamless integration with Mozilla's PDF.js |
|
|
51
|
+
|
|
45
52
|
## Installation
|
|
46
53
|
|
|
47
54
|
```bash
|
|
48
|
-
npm install @
|
|
55
|
+
npm install @dvvebond/core
|
|
56
|
+
# or
|
|
57
|
+
bun add @dvvebond/core
|
|
49
58
|
# or
|
|
50
|
-
|
|
59
|
+
pnpm add @dvvebond/core
|
|
60
|
+
```
|
|
61
|
+
|
|
62
|
+
For React components:
|
|
63
|
+
|
|
64
|
+
```bash
|
|
65
|
+
# React is a peer dependency
|
|
66
|
+
npm install @dvvebond/core react react-dom
|
|
51
67
|
```
|
|
52
68
|
|
|
53
69
|
## Quick Start
|
|
54
70
|
|
|
55
|
-
###
|
|
71
|
+
### Basic PDF Loading
|
|
56
72
|
|
|
57
73
|
```typescript
|
|
58
|
-
import { PDF } from "@
|
|
74
|
+
import { PDF } from "@dvvebond/core";
|
|
59
75
|
|
|
60
76
|
const pdf = await PDF.load(bytes);
|
|
61
77
|
const pages = pdf.getPages();
|
|
62
|
-
|
|
63
78
|
console.log(`${pages.length} pages`);
|
|
64
79
|
```
|
|
65
80
|
|
|
66
|
-
###
|
|
67
|
-
|
|
68
|
-
```
|
|
69
|
-
|
|
81
|
+
### React PDF Viewer
|
|
82
|
+
|
|
83
|
+
```tsx
|
|
84
|
+
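// ReactPDFViewerRef is used below; it is assumed here to be exported from the same entry point
import { ReactPDFViewer, usePDFViewer, PageNavigation, ZoomControls, type ReactPDFViewerRef } from "@dvvebond/core/react";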
|
|
85
|
+
import { useRef } from "react";
|
|
86
|
+
|
|
87
|
+
function PDFViewerApp() {
|
|
88
|
+
const viewerRef = useRef<ReactPDFViewerRef>(null);
|
|
89
|
+
|
|
90
|
+
return (
|
|
91
|
+
<div style={{ height: "100vh", display: "flex", flexDirection: "column" }}>
|
|
92
|
+
<div className="toolbar">
|
|
93
|
+
<PageNavigation viewerRef={viewerRef} />
|
|
94
|
+
<ZoomControls viewerRef={viewerRef} />
|
|
95
|
+
</div>
|
|
96
|
+
<ReactPDFViewer
|
|
97
|
+
ref={viewerRef}
|
|
98
|
+
url="/document.pdf"
|
|
99
|
+
initialScale={1.0}
|
|
100
|
+
onPageChange={page => console.log("Current page:", page)}
|
|
101
|
+
onDocumentLoad={info => console.log("Loaded:", info.numPages, "pages")}
|
|
102
|
+
/>
|
|
103
|
+
</div>
|
|
104
|
+
);
|
|
105
|
+
}
|
|
70
106
|
```
|
|
71
107
|
|
|
72
|
-
###
|
|
108
|
+
### Text Extraction with Bounding Boxes
|
|
73
109
|
|
|
74
110
|
```typescript
|
|
75
|
-
|
|
76
|
-
|
|
77
|
-
|
|
78
|
-
|
|
79
|
-
|
|
80
|
-
|
|
81
|
-
|
|
111
|
+
import {
|
|
112
|
+
HierarchicalTextExtractor,
|
|
113
|
+
createHierarchicalTextExtractor,
|
|
114
|
+
type TextPage,
|
|
115
|
+
} from "@dvvebond/core";
|
|
116
|
+
|
|
117
|
+
// Extract text with full hierarchy
|
|
118
|
+
const extractor = createHierarchicalTextExtractor();
|
|
119
|
+
const pageText: TextPage = await extractor.extractPage(pdfPage, {
|
|
120
|
+
includeCharacters: true,
|
|
121
|
+
includeWords: true,
|
|
122
|
+
includeLines: true,
|
|
123
|
+
includeParagraphs: true,
|
|
82
124
|
});
|
|
83
125
|
|
|
84
|
-
|
|
126
|
+
// Access bounding boxes at any level
|
|
127
|
+
for (const paragraph of pageText.paragraphs) {
|
|
128
|
+
console.log("Paragraph bbox:", paragraph.boundingBox);
|
|
129
|
+
for (const line of paragraph.lines) {
|
|
130
|
+
console.log(" Line:", line.text, "at", line.boundingBox);
|
|
131
|
+
}
|
|
132
|
+
}
|
|
133
|
+
```
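The hierarchy in the snippet above suggests that each level's bounding box encloses the boxes of the level below it (a line encloses its words, a paragraph its lines). The following is a minimal sketch of that enclosing-box computation, not the library's implementation. The `Box` shape and the `unionBoxes` helper are hypothetical names introduced for illustration; the field names simply mirror the `x`/`y`/`width`/`height` objects used in the overlay example of this README.

```typescript
// Hypothetical box shape; the library's actual bounding box type may differ.
interface Box {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Returns the smallest rectangle that encloses every input box,
// or null when the list is empty.
function unionBoxes(boxes: Box[]): Box | null {
  if (boxes.length === 0) return null;
  let minX = Infinity;
  let minY = Infinity;
  let maxX = -Infinity;
  let maxY = -Infinity;
  for (const b of boxes) {
    minX = Math.min(minX, b.x);
    minY = Math.min(minY, b.y);
    maxX = Math.max(maxX, b.x + b.width);
    maxY = Math.max(maxY, b.y + b.height);
  }
  return { x: minX, y: minY, width: maxX - minX, height: maxY - minY };
}

// Two word boxes on the same line, with slightly different heights.
const wordBoxes: Box[] = [
  { x: 50, y: 100, width: 40, height: 12 },
  { x: 95, y: 98, width: 60, height: 14 },
];
const lineBox = unionBoxes(wordBoxes);
console.log(lineBox); // { x: 50, y: 98, width: 105, height: 14 }
```

The same helper could be applied one level up, taking line boxes as input to approximate a paragraph box.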
|
|
134
|
+
|
|
135
|
+
### Search with Highlighting
|
|
136
|
+
|
|
137
|
+
```tsx
|
|
138
|
+
import { usePDFSearch, useBoundingBoxOverlay } from "@dvvebond/core/react";
|
|
139
|
+
|
|
140
|
+
function SearchableViewer() {
|
|
141
|
+
const { searchState, search, nextMatch, prevMatch, clearSearch } = usePDFSearch();
|
|
142
|
+
const { boundingBoxes, setHighlights } = useBoundingBoxOverlay();
|
|
143
|
+
|
|
144
|
+
const handleSearch = async (query: string) => {
|
|
145
|
+
const results = await search(query);
|
|
146
|
+
// Results include bounding boxes for highlighting
|
|
147
|
+
setHighlights(results.map(r => r.boundingBox));
|
|
148
|
+
};
|
|
149
|
+
|
|
150
|
+
return (
|
|
151
|
+
<div>
|
|
152
|
+
<input type="text" onChange={e => handleSearch(e.target.value)} placeholder="Search..." />
|
|
153
|
+
<span>
|
|
154
|
+
{searchState.currentMatch} / {searchState.totalMatches}
|
|
155
|
+
</span>
|
|
156
|
+
<button onClick={prevMatch}>Previous</button>
|
|
157
|
+
<button onClick={nextMatch}>Next</button>
|
|
158
|
+
</div>
|
|
159
|
+
);
|
|
160
|
+
}
|
|
85
161
|
```
|
|
86
162
|
|
|
87
|
-
###
|
|
163
|
+
### Bounding Box Visualization
|
|
88
164
|
|
|
89
165
|
```typescript
|
|
90
|
-
import {
|
|
166
|
+
import { createBoundingBoxOverlay, type OverlayBoundingBox } from "@dvvebond/core";
|
|
91
167
|
|
|
92
|
-
const
|
|
93
|
-
|
|
168
|
+
const overlay = createBoundingBoxOverlay(containerElement, {
|
|
169
|
+
pageWidth: 612,
|
|
170
|
+
pageHeight: 792,
|
|
171
|
+
scale: 1.5,
|
|
172
|
+
});
|
|
94
173
|
|
|
95
|
-
|
|
96
|
-
|
|
97
|
-
|
|
174
|
+
// Add bounding boxes with different types
|
|
175
|
+
overlay.setBoundingBoxes([
|
|
176
|
+
{ x: 50, y: 100, width: 200, height: 20, type: "word", text: "Hello" },
|
|
177
|
+
{ x: 50, y: 130, width: 300, height: 20, type: "line", text: "Hello World" },
|
|
178
|
+
{ x: 50, y: 100, width: 300, height: 100, type: "paragraph" },
|
|
179
|
+
]);
|
|
180
|
+
|
|
181
|
+
// Control visibility by type
|
|
182
|
+
overlay.setVisibility({
|
|
183
|
+
character: false,
|
|
184
|
+
word: true,
|
|
185
|
+
line: true,
|
|
186
|
+
paragraph: false,
|
|
98
187
|
});
|
|
99
188
|
```
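To make the per-type visibility idea concrete, here is a small sketch of how a visibility map might be applied to a list of typed boxes before drawing. It is an illustration under assumptions, not the overlay's actual logic: `TypedBox`, `Visibility`, and `visibleBoxes` are hypothetical names, and the box fields are copied from the example above.

```typescript
// Hypothetical types modeled on the objects passed to setBoundingBoxes above.
type BoxType = "character" | "word" | "line" | "paragraph";

interface TypedBox {
  x: number;
  y: number;
  width: number;
  height: number;
  type: BoxType;
  text?: string;
}

type Visibility = Record<BoxType, boolean>;

// Keeps only the boxes whose type is switched on in the visibility map.
function visibleBoxes(boxes: TypedBox[], visibility: Visibility): TypedBox[] {
  return boxes.filter((box) => visibility[box.type]);
}

const sampleBoxes: TypedBox[] = [
  { x: 50, y: 100, width: 200, height: 20, type: "word", text: "Hello" },
  { x: 50, y: 130, width: 300, height: 20, type: "line", text: "Hello World" },
  { x: 50, y: 100, width: 300, height: 100, type: "paragraph" },
];

const shown = visibleBoxes(sampleBoxes, {
  character: false,
  word: true,
  line: true,
  paragraph: false,
});
console.log(shown.map((box) => box.type)); // [ "word", "line" ]
```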
|
|
100
189
|
|
|
101
|
-
###
|
|
190
|
+
### Virtual Scrolling for Large Documents
|
|
102
191
|
|
|
103
192
|
```typescript
|
|
104
|
-
|
|
105
|
-
```
|
|
193
|
+
import { createViewportManager, createVirtualScroller, type PageSource } from "@dvvebond/core";
|
|
106
194
|
|
|
107
|
-
|
|
195
|
+
const scroller = createVirtualScroller(containerElement, {
|
|
196
|
+
totalPages: 100,
|
|
197
|
+
pageHeight: 792,
|
|
198
|
+
pageGap: 10,
|
|
199
|
+
overscan: 2, // Render 2 extra pages above/below viewport
|
|
200
|
+
});
|
|
108
201
|
|
|
109
|
-
|
|
110
|
-
|
|
202
|
+
const viewportManager = createViewportManager({
|
|
203
|
+
pageSource: pdfDocument,
|
|
204
|
+
scroller,
|
|
205
|
+
renderer: createCanvasRenderer(), // not imported in this snippet; assumed to be exported by @dvvebond/core
|
|
206
|
+
});
|
|
207
|
+
|
|
208
|
+
// Pages are automatically loaded/unloaded as user scrolls
|
|
209
|
+
scroller.on("visibleRangeChange", ({ startPage, endPage }) => {
|
|
210
|
+
console.log(`Visible pages: ${startPage} - ${endPage}`);
|
|
211
|
+
});
|
|
212
|
+
```
|
|
111
213
|
|
|
112
|
-
|
|
113
|
-
const page = pdf.addPage({ size: "letter" });
|
|
214
|
+
### Coordinate Transformation
|
|
114
215
|
|
|
115
|
-
|
|
116
|
-
|
|
117
|
-
|
|
118
|
-
|
|
119
|
-
|
|
216
|
+
```typescript
|
|
217
|
+
import {
|
|
218
|
+
createCoordinateTransformer,
|
|
219
|
+
getMousePdfCoordinates,
|
|
220
|
+
transformBoundingBoxes,
|
|
221
|
+
} from "@dvvebond/core";
|
|
222
|
+
|
|
223
|
+
const transformer = createCoordinateTransformer({
|
|
224
|
+
pageWidth: 612,
|
|
225
|
+
pageHeight: 792,
|
|
226
|
+
scale: 1.5,
|
|
227
|
+
rotation: 0,
|
|
120
228
|
});
|
|
121
229
|
|
|
122
|
-
|
|
123
|
-
|
|
124
|
-
|
|
125
|
-
|
|
126
|
-
height: 100,
|
|
127
|
-
color: rgb(0.9, 0.9, 0.9),
|
|
128
|
-
borderColor: rgb(0, 0, 0),
|
|
129
|
-
borderWidth: 1,
|
|
230
|
+
// Handle click events on PDF
|
|
231
|
+
containerElement.addEventListener("click", event => {
|
|
232
|
+
const pdfCoords = getMousePdfCoordinates(event, containerElement, transformer);
|
|
233
|
+
console.log(`Clicked at PDF coordinates: (${pdfCoords.x}, ${pdfCoords.y})`);
|
|
130
234
|
});
|
|
131
235
|
|
|
132
|
-
|
|
236
|
+
// Transform bounding boxes from PDF to screen coordinates
|
|
237
|
+
const screenBoxes = transformBoundingBoxes(pdfBoundingBoxes, transformer);
|
|
133
238
|
```
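For readers unfamiliar with why a transformer is needed: PDF user space conventionally places the origin at the bottom-left of the page with y increasing upward, while screen coordinates start at the top-left with y increasing downward. The sketch below shows the arithmetic for the unrotated case only. It is a simplified illustration, not the library's `createCoordinateTransformer`; `Point`, `PageGeometry`, `pdfToScreen`, and `screenToPdf` are hypothetical names, and whether the library applies exactly this flip is an assumption.

```typescript
interface Point {
  x: number;
  y: number;
}

// Hypothetical geometry object; field names follow the README example.
interface PageGeometry {
  pageWidth: number; // PDF points; not needed when rotation is 0
  pageHeight: number; // PDF points
  scale: number; // screen pixels per PDF point
}

// PDF (bottom-left origin, y up) to screen (top-left origin, y down), rotation 0.
function pdfToScreen(p: Point, g: PageGeometry): Point {
  return { x: p.x * g.scale, y: (g.pageHeight - p.y) * g.scale };
}

// Inverse of pdfToScreen for the same unrotated page.
function screenToPdf(p: Point, g: PageGeometry): Point {
  return { x: p.x / g.scale, y: g.pageHeight - p.y / g.scale };
}

const letterPage: PageGeometry = { pageWidth: 612, pageHeight: 792, scale: 1.5 };

// A point one inch from the left edge and one inch below the top edge.
const onScreen = pdfToScreen({ x: 72, y: 720 }, letterPage);
console.log(onScreen); // { x: 108, y: 108 }
console.log(screenToPdf(onScreen, letterPage)); // { x: 72, y: 720 }
```

Page rotation would add a further step (swapping or negating axes before scaling), which this sketch deliberately leaves out.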
|
|
134
239
|
|
|
135
|
-
|
|
240
|
+
### PDF.js Integration
|
|
136
241
|
|
|
137
|
-
|
|
242
|
+
```typescript
|
|
243
|
+
import {
|
|
244
|
+
initializePDFJS,
|
|
245
|
+
loadPDFJSDocument,
|
|
246
|
+
getPDFJSTextContent,
|
|
247
|
+
createPDFJSRenderer,
|
|
248
|
+
} from "@dvvebond/core";
|
|
249
|
+
|
|
250
|
+
// Initialize PDF.js (call once at app startup)
|
|
251
|
+
await initializePDFJS();
|
|
252
|
+
|
|
253
|
+
// Load document
|
|
254
|
+
const doc = await loadPDFJSDocument(pdfBytes);
|
|
255
|
+
const page = await doc.getPage(1);
|
|
256
|
+
|
|
257
|
+
// Render to canvas
|
|
258
|
+
const renderer = createPDFJSRenderer(canvas, {
|
|
259
|
+
scale: 1.5,
|
|
260
|
+
enableTextLayer: true,
|
|
261
|
+
});
|
|
262
|
+
await renderer.render(page);
|
|
138
263
|
|
|
139
|
-
|
|
140
|
-
|
|
141
|
-
|
|
264
|
+
// Extract text content
|
|
265
|
+
const textContent = await getPDFJSTextContent(page);
|
|
266
|
+
```
|
|
142
267
|
|
|
143
|
-
##
|
|
268
|
+
## React Hooks Reference
|
|
144
269
|
|
|
145
|
-
|
|
270
|
+
### usePDFViewer
|
|
146
271
|
|
|
147
|
-
|
|
148
|
-
| --------------------------- | ---------------- | -------------------------------------- |
|
|
149
|
-
| Signature verification | Not implemented | Signing works; verification is planned |
|
|
150
|
-
| TrueType Collections (.ttc) | Not supported | Extract individual fonts first |
|
|
151
|
-
| JBIG2 image decoding | Passthrough only | Images preserved but not decoded |
|
|
152
|
-
| JPEG2000 (JPX) decoding | Passthrough only | Images preserved but not decoded |
|
|
153
|
-
| Certificate encryption | Not supported | Password encryption works |
|
|
154
|
-
| JavaScript actions | Ignored | Form calculations not executed |
|
|
272
|
+
Main hook for PDF viewer state management.
|
|
155
273
|
|
|
156
|
-
|
|
274
|
+
```tsx
|
|
275
|
+
const { currentPage, totalPages, scale, isLoading, error, goToPage, setScale, zoomIn, zoomOut } =
|
|
276
|
+
usePDFViewer(viewerRef);
|
|
277
|
+
```
|
|
278
|
+
|
|
279
|
+
### usePDFSearch
|
|
157
280
|
|
|
158
|
-
|
|
281
|
+
Hook for search functionality.
|
|
159
282
|
|
|
160
|
-
|
|
283
|
+
```tsx
|
|
284
|
+
const {
|
|
285
|
+
searchState, // { query, matches, currentMatch, totalMatches, status }
|
|
286
|
+
search, // (query: string) => Promise<SearchResult[]>
|
|
287
|
+
nextMatch, // () => void
|
|
288
|
+
prevMatch, // () => void
|
|
289
|
+
clearSearch, // () => void
|
|
290
|
+
} = usePDFSearch(viewerRef);
|
|
291
|
+
```
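The `nextMatch` and `prevMatch` callbacks imply some rule for moving through the match list. One common choice is wrap-around navigation, sketched below. This is an assumption about behavior, not something the README states; `stepMatch` is a hypothetical helper, and it uses a 0-based index with `-1` meaning "no current match", which may not be how `searchState.currentMatch` is represented.

```typescript
// Moves the current match index forward (+1) or backward (-1), wrapping at
// both ends. Returns -1 when there are no matches.
function stepMatch(current: number, total: number, delta: 1 | -1): number {
  if (total <= 0) return -1;
  if (current < 0) {
    // Nothing selected yet: start at the first or last match.
    return delta === 1 ? 0 : total - 1;
  }
  return (current + delta + total) % total;
}

console.log(stepMatch(2, 3, 1)); // 0 (wraps from the last match to the first)
console.log(stepMatch(0, 3, -1)); // 2 (wraps from the first match to the last)
console.log(stepMatch(-1, 3, 1)); // 0 (first step selects the first match)
```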
|
|
161
292
|
|
|
162
|
-
|
|
293
|
+
### useBoundingBoxOverlay
|
|
163
294
|
|
|
164
|
-
|
|
295
|
+
Hook for bounding box visualization.
|
|
165
296
|
|
|
166
|
-
|
|
167
|
-
|
|
297
|
+
```tsx
|
|
298
|
+
const {
|
|
299
|
+
boundingBoxes,
|
|
300
|
+
visibility,
|
|
301
|
+
setVisibility,
|
|
302
|
+
addBoundingBoxes,
|
|
303
|
+
clearBoundingBoxes,
|
|
304
|
+
highlightBox,
|
|
305
|
+
} = useBoundingBoxOverlay(viewerRef);
|
|
306
|
+
```
|
|
168
307
|
|
|
169
|
-
|
|
308
|
+
### useViewport
|
|
170
309
|
|
|
171
|
-
|
|
310
|
+
Hook for viewport information.
|
|
172
311
|
|
|
173
|
-
```
|
|
174
|
-
|
|
312
|
+
```tsx
|
|
313
|
+
const { viewportWidth, viewportHeight, scrollTop, scrollLeft, visiblePages } =
|
|
314
|
+
useViewport(viewerRef);
|
|
175
315
|
```
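A value like `visiblePages` can in principle be derived from `scrollTop` and `viewportHeight` alone when pages share a uniform height. The sketch below shows that arithmetic, reusing the `pageHeight`, `pageGap`, and `overscan` parameters from the virtual scrolling example. It is an illustration under assumptions rather than the library's algorithm: `ScrollLayout` and `visibleRange` are hypothetical names, page indices are 0-based, and all lengths are assumed to be in the same pixel units.

```typescript
// Hypothetical layout description; field names follow the virtual scroller options.
interface ScrollLayout {
  totalPages: number;
  pageHeight: number; // height of every page, in pixels
  pageGap: number; // vertical gap between pages, in pixels
  overscan: number; // extra pages to keep rendered above and below
}

// Returns the inclusive 0-based range of pages to render.
// A viewport edge that falls inside a gap still counts the page above it,
// which is harmless here because overscan widens the range anyway.
function visibleRange(
  scrollTop: number,
  viewportHeight: number,
  layout: ScrollLayout,
): { start: number; end: number } {
  const stride = layout.pageHeight + layout.pageGap;
  const first = Math.floor(scrollTop / stride);
  const last = Math.floor((scrollTop + viewportHeight - 1) / stride);
  return {
    start: Math.max(0, first - layout.overscan),
    end: Math.min(layout.totalPages - 1, last + layout.overscan),
  };
}

const layout: ScrollLayout = { totalPages: 100, pageHeight: 792, pageGap: 10, overscan: 2 };

console.log(visibleRange(0, 900, layout)); // { start: 0, end: 3 }
console.log(visibleRange(4000, 900, layout)); // { start: 2, end: 8 }
```

Only pages inside the returned range need a rendered canvas; everything else can be represented by an empty placeholder of the same height so the scrollbar stays accurate.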
|
|
176
316
|
|
|
177
|
-
|
|
317
|
+
### useScrollPosition
|
|
178
318
|
|
|
179
|
-
|
|
319
|
+
Hook for scroll position tracking.
|
|
180
320
|
|
|
181
|
-
|
|
321
|
+
```tsx
|
|
322
|
+
const { scrollTop, scrollLeft, scrollTo, scrollToPage } = useScrollPosition(viewerRef);
|
|
323
|
+
```
|
|
182
324
|
|
|
183
|
-
##
|
|
325
|
+
## Runtime Support
|
|
184
326
|
|
|
185
|
-
|
|
327
|
+
Works in all modern JavaScript environments:
|
|
186
328
|
|
|
187
|
-
|
|
188
|
-
|
|
189
|
-
|
|
329
|
+
- **Node.js** 20+
|
|
330
|
+
- **Bun**
|
|
331
|
+
- **Browsers** (modern, with Web Crypto)
|
|
190
332
|
|
|
191
|
-
##
|
|
333
|
+
## Migration from react-pdf
|
|
192
334
|
|
|
193
|
-
|
|
335
|
+
See [MIGRATION.md](./MIGRATION.md) for a detailed migration guide from react-pdf to @dvvebond/core.
|
|
194
336
|
|
|
195
|
-
|
|
196
|
-
# Clone the repo
|
|
197
|
-
git clone https://github.com/libpdf/core.git
|
|
198
|
-
cd libpdf
|
|
337
|
+
## API Reference
|
|
199
338
|
|
|
200
|
-
|
|
201
|
-
bun install
|
|
339
|
+
See [API.md](./API.md) for complete API documentation.
|
|
202
340
|
|
|
203
|
-
|
|
204
|
-
bun run test
|
|
341
|
+
## Acknowledgments
|
|
205
342
|
|
|
206
|
-
|
|
207
|
-
bun run typecheck
|
|
208
|
-
```
|
|
343
|
+
This project is a fork of [LibPDF](https://github.com/LibPDF-js/core) by [Documenso](https://documenso.com). The core PDF parsing and generation functionality is their excellent work.
|
|
209
344
|
|
|
210
345
|
## License
|
|
211
346
|
|