@zola_do/document-manipulator 0.2.4 → 0.2.6

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (2)
  1. package/README.md +430 -34
  2. package/package.json +1 -1
package/README.md CHANGED
@@ -1,7 +1,21 @@
1
1
  # @zola_do/document-manipulator
2
2
 
3
+ [![npm version](https://img.shields.io/npm/v/@zola_do/document-manipulator.svg)](https://www.npmjs.com/package/@zola_do/document-manipulator)
4
+ [![npm downloads](https://img.shields.io/npm/dm/@zola_do/document-manipulator.svg)](https://www.npmjs.com/package/@zola_do/document-manipulator)
5
+ [![License: ISC](https://img.shields.io/badge/License-ISC-blue.svg)](https://opensource.org/licenses/ISC)
6
+
3
7
  PDF and DOCX merge, conversion, and template population for NestJS.
4
8
 
9
+ ## Overview
10
+
11
+ `@zola_do/document-manipulator` provides:
12
+
13
+ - **PDF Merging** — Combine multiple PDFs into one
14
+ - **DOCX Merging** — Combine multiple Word documents
15
+ - **Document Conversion** — Convert between formats (DOCX ↔ PDF ↔ HTML)
16
+ - **Template Population** — Fill DOCX templates with data
17
+ - **MinIO Integration** — Store and retrieve documents
18
+
5
19
  ## Installation
6
20
 
7
21
  ```bash
@@ -12,23 +26,30 @@ npm install @zola_do/document-manipulator
12
26
  npm install @zola_do/nestjs-shared
13
27
  ```
14
28
 
15
- ## Dependencies
29
+ ### Dependencies
30
+
31
+ ```bash
32
+ npm install @zola_do/minio
33
+ npm install docx-templates pdf-lib @scholarcy/docx-merger
34
+ npm install axios form-data
35
+ npm install libreoffice-convert
36
+ ```
37
+
38
+ **Note:** For document conversion, LibreOffice must be installed on the system.
16
39
 
17
- This package depends on:
40
+ ## Operations notes
18
41
 
19
- - `@zola_do/minio` Object storage for documents
20
- - `docx-templates` DOCX template filling
21
- - `pdf-lib` — PDF manipulation
22
- - `libreoffice-convert` — Document format conversion (requires LibreOffice installed)
42
+ - **LibreOffice / soffice:** Runs as a subprocess during conversion. In production, enforce **timeouts** on conversion calls in your app, bound **temporary directories**, and avoid running converters as root. On Linux, consider `systemd` `MemoryMax` / `CPUQuota` for the worker process or dedicated job containers.
43
+ - **Cloud conversion:** For serverless or locked-down containers without LibreOffice, route conversions through an external API by implementing a thin wrapper in your app (keep secrets and quotas in your infrastructure, not in this library).
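
The timeout advice in the notes above can be sketched as a small wrapper around any conversion promise. This is a minimal sketch and not part of this package: `withTimeout` is a hypothetical helper name, and the wrapper only stops waiting for the result. It does not terminate the underlying LibreOffice subprocess, so the OS-level limits mentioned above would still be needed.

```typescript
// Hypothetical helper (an assumption for illustration, not an API of
// @zola_do/document-manipulator). Rejects if `work` has not settled
// within `ms` milliseconds; otherwise passes the result through.
function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => {
      reject(new Error(`Operation timed out after ${ms} ms`));
    }, ms);

    work.then(
      (value) => {
        clearTimeout(timer); // do not leave the timer pending
        resolve(value);
      },
      (error) => {
        clearTimeout(timer);
        reject(error);
      },
    );
  });
}
```

In a service, a call such as `withTimeout(this.docService.convertDocument(docxBuffer, '.pdf'), 30000)` would then fail fast instead of hanging; the 30-second value is only an example.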
23
44
 
24
- ## Usage
45
+ ## Quick Start
25
46
 
26
- ### Module Setup
47
+ ### 1. Register Modules
27
48
 
28
49
  ```typescript
29
- import { Module } from '@nestjs/common';
30
- import { DocumentManipulatorModule } from '@zola_do/document-manipulator';
31
- import { MinIoModule } from '@zola_do/minio';
50
+ import { Module } from "@nestjs/common";
51
+ import { DocumentManipulatorModule } from "@zola_do/document-manipulator";
52
+ import { MinIoModule } from "@zola_do/minio";
32
53
 
33
54
  @Module({
34
55
  imports: [MinIoModule, DocumentManipulatorModule],
@@ -36,62 +57,437 @@ import { MinIoModule } from '@zola_do/minio';
36
57
  export class AppModule {}
37
58
  ```
38
59
 
39
- ### Merging PDFs
60
+ ### 2. Use Service
40
61
 
41
62
  ```typescript
42
- import { Injectable } from '@nestjs/common';
43
- import { DocumentManipulatorService } from '@zola_do/document-manipulator';
63
+ import { Injectable } from "@nestjs/common";
64
+ import { DocumentManipulatorService } from "@zola_do/document-manipulator";
44
65
 
45
66
  @Injectable()
46
67
  export class ReportService {
47
68
  constructor(private readonly docService: DocumentManipulatorService) {}
48
69
 
49
- async mergePdfs(pdfBuffers: Buffer[]) {
50
- await this.docService.mergePdf(pdfBuffers);
51
- // Merged PDF uploaded to MinIO
70
+ async mergeInvoices(invoiceIds: string[]) {
71
+ // Get PDF buffers
72
+ const pdfBuffers = await Promise.all(
73
+ invoiceIds.map((id) => this.getInvoicePdf(id)),
74
+ );
75
+
76
+ // Merge PDFs
77
+ const mergedBuffer = await this.docService.mergePdf(pdfBuffers);
78
+
79
+ // Upload to MinIO
80
+ return await this.minioService.uploadBuffer(
81
+ mergedBuffer,
82
+ `merged-invoice-${Date.now()}.pdf`,
83
+ "application/pdf",
84
+ "invoices",
85
+ );
52
86
  }
53
87
  }
54
88
  ```
55
89
 
56
- ### Merging DOCX
90
+ ## PDF Operations
91
+
92
+ ### Merge PDFs
93
+
94
+ Combine multiple PDF buffers into one:
57
95
 
58
96
  ```typescript
59
- await this.docService.mergeDocx(docxBuffers);
97
+ async mergePdfs(pdfBuffers: Buffer[]) {
98
+ const merged = await this.docService.mergePdf(pdfBuffers);
99
+ return merged; // Single PDF Buffer
100
+ }
60
101
  ```
61
102
 
62
- ### Populating Templates
103
+ **Example: Merge invoice PDFs**
104
+
105
+ ```typescript
106
+ async generateMergedInvoice(orderId: string): Promise<Buffer> {
107
+ const order = await this.orderService.findOne(orderId);
108
+
109
+ // Get each item's PDF
110
+ const itemPdfs = await Promise.all(
111
+ order.items.map(item => this.getItemPdf(item.id))
112
+ );
113
+
114
+ // Add cover page
115
+ const coverPage = await this.generateCoverPage(order);
116
+ const allPdfs = [coverPage, ...itemPdfs];
117
+
118
+ return this.docService.mergePdf(allPdfs);
119
+ }
120
+ ```
121
+
122
+ ### Convert Document to PDF
123
+
124
+ Convert DOCX to PDF (requires LibreOffice):
125
+
126
+ ```typescript
127
+ async convertToPdf(docxBuffer: Buffer): Promise<Buffer> {
128
+ const pdfBuffer = await this.docService.convertDocument(docxBuffer, '.pdf');
129
+ return pdfBuffer;
130
+ }
131
+ ```
132
+
133
+ ## DOCX Operations
134
+
135
+ ### Merge DOCX Files
136
+
137
+ Combine multiple Word documents:
138
+
139
+ ```typescript
140
+ async mergeDocx(docxBuffers: Buffer[]) {
141
+ const merged = await this.docService.mergeDocx(docxBuffers);
142
+ return merged; // Single DOCX Buffer
143
+ }
144
+ ```
145
+
146
+ **Example: Merge contract sections**
147
+
148
+ ```typescript
149
+ async generateContract(contractId: string): Promise<Buffer> {
150
+ const contract = await this.contractService.findOne(contractId);
151
+
152
+ const sections = [
153
+ await this.loadTemplate('contracts/header.docx'),
154
+ await this.loadTemplate(`contracts/${contract.type}/body.docx`),
155
+ await this.loadTemplate('contracts/terms.docx'),
156
+ await this.loadTemplate('contracts/signatures.docx'),
157
+ ];
158
+
159
+ return this.docService.mergeDocx(sections);
160
+ }
161
+ ```
162
+
163
+ ### Convert DOCX to HTML
164
+
165
+ ```typescript
166
+ async convertToHtml(docxBuffer: Buffer): Promise<Buffer> {
167
+ const htmlBuffer = await this.docService.convertDocument(docxBuffer, '.html');
168
+ return htmlBuffer;
169
+ }
170
+ ```
171
+
172
+ ### Convert DOCX to PDF
173
+
174
+ ```typescript
175
+ async convertDocxToPdf(docxBuffer: Buffer): Promise<Buffer> {
176
+ const pdfBuffer = await this.docService.convertDocument(docxBuffer, '.pdf');
177
+ return pdfBuffer;
178
+ }
179
+ ```
180
+
181
+ ## Template Population
63
182
 
64
183
  Fill DOCX templates with data:
65
184
 
66
185
  ```typescript
67
- const populatedBuffer = await this.docService.populateTemplate(
68
- templateBuffer,
69
- { public_body: 'Procurement of procedure' },
70
- );
186
+ async populateTemplate(
187
+ templateBuffer: Buffer,
188
+ data: Record<string, any>,
189
+ ): Promise<Buffer> {
190
+ const result = await this.docService.populateTemplate(templateBuffer, data);
191
+ return result;
192
+ }
193
+ ```
194
+
195
+ **Example: Generate personalized document**
196
+
197
+ ```typescript
198
+ async generateOfferLetter(employeeId: string): Promise<Buffer> {
199
+ const employee = await this.employeeService.findOne(employeeId);
200
+ const template = await this.loadTemplate('offer-letter.docx');
201
+
202
+ return this.docService.populateTemplate(template, {
203
+ candidateName: employee.fullName,
204
+ position: employee.position,
205
+ startDate: employee.startDate.toLocaleDateString(),
206
+ salary: employee.salary.toLocaleString(),
207
+ benefits: employee.benefits,
208
+ companyName: 'Acme Corporation',
209
+ hrName: 'Jane Smith',
210
+ offerDate: new Date().toLocaleDateString(),
211
+ });
212
+ }
213
+ ```
214
+
215
+ ## Document Conversion
216
+
217
+ ### Supported Conversions
218
+
219
+ | Input | Output | Method |
220
+ | ----- | ------ | ----------- |
221
+ | DOCX | PDF | LibreOffice |
222
+ | DOCX | HTML | LibreOffice |
223
+ | DOCX | ODT | LibreOffice |
224
+ | DOC | DOCX | LibreOffice |
225
+
226
+ ### Using LibreOffice
227
+
228
+ **Note:** LibreOffice must be installed for conversion to work.
229
+
230
+ ```bash
231
+ # macOS
232
+ brew install libreoffice
233
+
234
+ # Ubuntu/Debian
235
+ sudo apt install libreoffice
236
+
237
+ # Windows
238
+ # Download from https://www.libreoffice.org/download/download/
239
+ ```
240
+
241
+ ### Remote Conversion Service
242
+
243
+ For serverless environments without LibreOffice:
244
+
245
+ ```typescript
246
+ async convertViaRemote(docxBuffer: Buffer): Promise<Buffer> {
247
+ // Configure remote endpoint in environment
248
+ // DOCUMENT_CONVERT_ENDPOINT=https://convert.example.com
249
+
250
+ return await this.docService.convertDocument(docxBuffer, '.pdf');
251
+ }
252
+ ```
253
+
254
+ ## FileHelperService
255
+
256
+ Utility service for file operations:
257
+
258
+ ```typescript
259
+ import { BadRequestException, Injectable } from "@nestjs/common";
260
+ import { DocumentManipulatorService, FileHelperService } from "@zola_do/document-manipulator";
261
+
262
+ @Injectable()
263
+ export class DocumentService {
264
+ constructor(
265
+ private readonly docService: DocumentManipulatorService,
266
+ private readonly fileHelper: FileHelperService,
267
+ ) {}
268
+
269
+ async processDocument(buffer: Buffer, filename: string) {
270
+ // Get file extension
271
+ const ext = this.fileHelper.getFileExtension(filename);
272
+
273
+ // Get MIME type
274
+ const mime = this.fileHelper.getMimeType(filename);
275
+
276
+ // Validate file type
277
+ if (!this.fileHelper.isValidDocumentType(ext)) {
278
+ throw new BadRequestException("Invalid document type");
279
+ }
280
+
281
+ // Get file size
282
+ const size = this.fileHelper.getFileSize(buffer);
283
+
284
+ // ... process document
285
+ }
286
+ }
287
+ ```
288
+
289
+ ### FileHelper Methods
290
+
291
+ ```typescript
292
+ class FileHelperService {
293
+ getFileExtension(filename: string): string;
294
+ getMimeType(filename: string): string;
295
+ isValidDocumentType(extension: string): boolean;
296
+ isValidImageType(extension: string): boolean;
297
+ getFileSize(buffer: Buffer): number;
298
+ sanitizeFilename(filename: string): string;
299
+ }
71
300
  ```
72
301
 
73
- ### Converting Documents
302
+ ## Document Architecture
303
+
304
+ ```
305
+ ┌─────────────────────────────────────────────────────────────────────┐
306
+ │ Document Manipulation Flow │
307
+ ├─────────────────────────────────────────────────────────────────────┤
308
+ │ │
309
+ │ Source Documents │
310
+ │ ┌─────────┐ ┌─────────┐ ┌─────────┐ │
311
+ │ │ DOCX 1 │ │ DOCX 2 │ │ DOCX 3 │ │
312
+ │ └────┬────┘ └────┬────┘ └────┬────┘ │
313
+ │ │ │ │ │
314
+ │ └───────────┼───────────┘ │
315
+ │ │ │
316
+ │ ▼ │
317
+ │ ┌─────────────────┐ │
318
+ │ │ Document │ │
319
+ │ │ Manipulator │ │
320
+ │ │ Service │ │
321
+ │ └────────┬────────┘ │
322
+ │ │ │
323
+ │ ┌─────────┼─────────┐ │
324
+ │ │ │ │ │
325
+ │ ▼ ▼ ▼ │
326
+ │ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
327
+ │ │ merge │ │ populate │ │ convert │ │
328
+ │ │ PDFs │ │ template │ │ formats │ │
329
+ │ └────┬─────┘ └────┬─────┘ └────┬─────┘ │
330
+ │ │ │ │ │
331
+ │ ▼ ▼ ▼ │
332
+ │ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
333
+ │ │ Merged │ │ Filled │ │ Output │ │
334
+ │ │ PDF │ │ Template │ │ Format │ │
335
+ │ └──────────┘ └──────────┘ └──────────┘ │
336
+ │ │
337
+ └─────────────────────────────────────────────────────────────────────┘
338
+ ```
74
339
 
75
- Convert between formats (requires LibreOffice):
340
+ ## Environment Variables
341
+
342
+ | Variable | Description | Required |
343
+ | --------------------------- | ----------------------------- | ----------- |
344
+ | `DOCUMENT_CONVERT_ENDPOINT` | Remote conversion service URL | No |
345
+ | `MINIO_ENDPOINT` | MinIO server endpoint | For storage |
346
+ | `MINIO_PORT` | MinIO port | For storage |
347
+ | `MINIO_ACCESSKEY` | MinIO access key | For storage |
348
+ | `MINIO_SECRETKEY` | MinIO secret key | For storage |
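
One way to use the table above is to check the storage variables once at startup, so a missing value fails early rather than on the first upload. The sketch below is an assumption about how an application might do this: `findMissingEnv` and `STORAGE_ENV_VARS` are hypothetical names, and only the variable names themselves come from the table.

```typescript
// Variable names copied from the table above; everything else here is a
// hypothetical helper, not something exported by this package.
const STORAGE_ENV_VARS: string[] = [
  "MINIO_ENDPOINT",
  "MINIO_PORT",
  "MINIO_ACCESSKEY",
  "MINIO_SECRETKEY",
];

// Returns the names that are unset or blank in the given environment map.
function findMissingEnv(
  env: Record<string, string | undefined>,
  names: string[] = STORAGE_ENV_VARS,
): string[] {
  return names.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === "";
  });
}
```

At bootstrap, an application could call `findMissingEnv(process.env)` and refuse to start if the returned list is not empty.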
349
+
350
+ ## API Reference
351
+
352
+ ### Module
76
353
 
77
354
  ```typescript
78
- const htmlBuffer = await this.docService.convertDocument(
79
- docxBuffer,
80
- '.html',
81
- );
355
+ DocumentManipulatorModule.forRoot(options?: DocumentManipulatorModuleOptions)
82
356
  ```
83
357
 
84
- ## Exports
358
+ ### Service
359
+
360
+ ```typescript
361
+ class DocumentManipulatorService {
362
+ // PDF Operations
363
+ async mergePdf(pdfBuffers: Buffer[]): Promise<Buffer>;
364
+
365
+ // DOCX Operations
366
+ async mergeDocx(docxBuffers: Buffer[]): Promise<Buffer>;
367
+
368
+ // Template Operations
369
+ async populateTemplate(
370
+ templateBuffer: Buffer,
371
+ data: Record<string, any>,
372
+ ): Promise<Buffer>;
373
+
374
+ // Conversion
375
+ async convertDocument(
376
+ buffer: Buffer,
377
+ outputExtension: ".pdf" | ".html" | ".odt",
378
+ ): Promise<Buffer>;
379
+ }
380
+ ```
381
+
382
+ ### FileHelperService
383
+
384
+ ```typescript
385
+ class FileHelperService {
386
+ getFileExtension(filename: string): string;
387
+ getMimeType(filename: string): string;
388
+ isValidDocumentType(extension: string): boolean;
389
+ isValidImageType(extension: string): boolean;
390
+ getFileSize(buffer: Buffer): number;
391
+ sanitizeFilename(filename: string): string;
392
+ }
393
+ ```
394
+
395
+ ## Common Use Cases
396
+
397
+ ### 1. Generate Merged Invoice Report
398
+
399
+ ```typescript
400
+ async generateInvoiceReport(date: Date): Promise<Buffer> {
401
+ const orders = await this.orderService.findByDate(date);
85
402
 
86
- - `DocumentManipulatorModule` Register the module
87
- - `DocumentManipulatorService` — Merge, convert, populate templates
88
- - `FileHelperService` — File utilities
403
+ const orderPdfs = await Promise.all(
404
+ orders.map(order => this.generateOrderPdf(order.id))
405
+ );
406
+
407
+ const header = await this.generateReportHeader(date);
408
+
409
+ return this.docService.mergePdf([header, ...orderPdfs]);
410
+ }
411
+ ```
412
+
413
+ ### 2. Create Package Documents
414
+
415
+ ```typescript
416
+ async createContractPackage(contractId: string): Promise<Buffer> {
417
+ const contract = await this.contractService.findOne(contractId);
418
+
419
+ // Generate each document from template
420
+ const [cover, terms, specs, pricing] = await Promise.all([
421
+ this.populateTemplate('cover.docx', contract),
422
+ this.populateTemplate('terms.docx', contract),
423
+ this.populateTemplate('specs.docx', contract),
424
+ this.populateTemplate('pricing.docx', contract),
425
+ ]);
426
+
427
+ // Merge all DOCX
428
+ return this.docService.mergeDocx([cover, terms, specs, pricing]);
429
+ }
430
+ ```
431
+
432
+ ### 3. Export to Multiple Formats
433
+
434
+ ```typescript
435
+ async exportDocument(documentId: string, format: string): Promise<Buffer> {
436
+ const doc = await this.documentService.findOne(documentId);
437
+
438
+ switch (format) {
439
+ case 'pdf':
440
+ return this.docService.convertDocument(doc.buffer, '.pdf');
441
+ case 'html':
442
+ return this.docService.convertDocument(doc.buffer, '.html');
443
+ default:
444
+ return doc.buffer; // Return original DOCX
445
+ }
446
+ }
447
+ ```
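
Note that the `switch` above returns the original DOCX for any unrecognised `format`, so a typo such as `pfd` would go unnoticed. A stricter variant could resolve the format first and reject unknown values. This is a sketch under assumptions: `resolveOutputExtension` is a hypothetical helper, and the `OutputExtension` union simply mirrors the `outputExtension` type shown in the API Reference section.

```typescript
// Mirrors the union in the README's convertDocument signature.
type OutputExtension = ".pdf" | ".html" | ".odt";

// Hypothetical helper: maps a user-supplied format name to an output
// extension, returns null when no conversion is needed (already DOCX),
// and throws for anything it does not recognise.
function resolveOutputExtension(format: string): OutputExtension | null {
  switch (format.trim().toLowerCase()) {
    case "pdf":
      return ".pdf";
    case "html":
      return ".html";
    case "odt":
      return ".odt";
    case "docx":
      return null; // caller returns the original buffer unchanged
    default:
      throw new Error(`Unsupported export format: ${format}`);
  }
}
```

The caller would then convert only when the result is not `null`, and can translate the thrown error into a 400 response.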
448
+
449
+ ## Troubleshooting
450
+
451
+ ### Q: LibreOffice not found?
452
+
453
+ Ensure LibreOffice is installed and in PATH:
454
+
455
+ ```bash
456
+ which libreoffice
457
+ # or
458
+ which soffice
459
+ ```
460
+
461
+ ### Q: Conversion fails?
462
+
463
+ Check that the input buffer is not empty before converting (an empty buffer is never a valid document):
464
+
465
+ ```typescript
466
+ if (buffer.length === 0) {
467
+ throw new Error("Empty buffer");
468
+ }
469
+ ```
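
A non-empty buffer can still be the wrong kind of file. A slightly stronger check, sketched below, inspects the leading bytes: DOCX files are ZIP containers (starting with `PK\x03\x04`), and legacy DOC files are OLE compound files (starting with `D0 CF 11 E0 A1 B1 1A E1`). The helper names are hypothetical, and a matching signature only shows the container type; it does not prove the document inside is well formed.

```typescript
type SniffedContainer = "zip-container" | "ole-compound" | "unknown";

const ZIP_SIGNATURE = [0x50, 0x4b, 0x03, 0x04]; // "PK\x03\x04"
const OLE_SIGNATURE = [0xd0, 0xcf, 0x11, 0xe0, 0xa1, 0xb1, 0x1a, 0xe1];

// True when `bytes` begins with every byte of `signature`, in order.
function hasSignature(bytes: Uint8Array, signature: number[]): boolean {
  if (bytes.length < signature.length) return false;
  return signature.every((value, index) => bytes[index] === value);
}

// Hypothetical helper: classifies a buffer by its leading bytes.
// A Node.js Buffer is a Uint8Array, so it can be passed directly.
function sniffContainer(bytes: Uint8Array): SniffedContainer {
  if (hasSignature(bytes, ZIP_SIGNATURE)) return "zip-container";
  if (hasSignature(bytes, OLE_SIGNATURE)) return "ole-compound";
  return "unknown";
}
```

Rejecting `"unknown"` before calling `convertDocument` tends to give a clearer error than a failed LibreOffice run.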
470
+
471
+ ### Q: PDF merge produces blank pages?
472
+
473
+ Ensure PDF buffers are valid:
474
+
475
+ ```typescript
476
+ // Verify PDF header
477
+ if (!buffer.slice(0, 4).equals(Buffer.from("%PDF"))) {
478
+ throw new Error("Invalid PDF buffer");
479
+ }
480
+ ```
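
When merging many PDFs, it helps to know which input is the bad one. The sketch below applies the same header idea to a whole array and reports the offending positions. `findInvalidPdfIndexes` is a hypothetical helper, and it checks for the five-byte `%PDF-` prefix, which is slightly stricter than the four-byte check above.

```typescript
const PDF_HEADER = [0x25, 0x50, 0x44, 0x46, 0x2d]; // "%PDF-"

// True when the buffer starts with the "%PDF-" header bytes.
function hasPdfHeader(bytes: Uint8Array): boolean {
  if (bytes.length < PDF_HEADER.length) return false;
  return PDF_HEADER.every((value, index) => bytes[index] === value);
}

// Hypothetical helper: returns the indexes of buffers that do not start
// with a PDF header, so the caller can log or skip them before merging.
function findInvalidPdfIndexes(buffers: Uint8Array[]): number[] {
  const invalid: number[] = [];
  buffers.forEach((buffer, index) => {
    if (!hasPdfHeader(buffer)) invalid.push(index);
  });
  return invalid;
}
```

A caller might run this on `pdfBuffers` before `mergePdf` and throw with the returned indexes when the list is not empty.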
89
481
 
90
482
  ## Related Packages
91
483
 
92
484
  - [@zola_do/docx](../docx) — DOCX template processing
93
485
  - [@zola_do/minio](../minio) — Required for storage
94
486
 
487
+ ## License
488
+
489
+ ISC
490
+
95
491
  ## Community
96
492
 
97
493
  - [Contributing](../../CONTRIBUTING.md)
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "@zola_do/document-manipulator",
3
- "version": "0.2.4",
3
+ "version": "0.2.6",
4
4
  "description": "PDF/DOCX merge, conversion, template population for NestJS",
5
5
  "author": "zolaDO",
6
6
  "license": "ISC",