docxmlater 11.0.4 → 11.0.6
- package/README.md +410 -637
- package/dist/core/DocumentParser.d.ts.map +1 -1
- package/dist/core/DocumentParser.js +3 -0
- package/dist/core/DocumentParser.js.map +1 -1
- package/dist/elements/ImageRun.d.ts.map +1 -1
- package/dist/elements/ImageRun.js +6 -1
- package/dist/elements/ImageRun.js.map +1 -1
- package/dist/esm/core/DocumentParser.js +3 -0
- package/dist/esm/core/DocumentParser.js.map +1 -1
- package/dist/esm/elements/ImageRun.js +6 -1
- package/dist/esm/elements/ImageRun.js.map +1 -1
- package/package.json +16 -5
- package/src/core/DocumentParser.ts +7 -0
- package/src/elements/ImageRun.ts +10 -1
package/README.md
CHANGED
@@ -1,116 +1,89 @@
-#
+# docxmlater
 
-
+**The TypeScript library for editing existing Word documents with full tracked-changes, comment, and bookmark fidelity.**
 
-
+Most DOCX libraries can generate documents from scratch. docxmlater is built for the harder problem: loading an existing `.docx`, modifying it, and saving it back without corrupting it. That includes documents that already contain tracked changes, comments, or bookmarks - which most other libraries silently break on round-trip.
 
-
+[Try it in your browser](https://stackblitz.com/github/ItMeDiaTech/docXMLater/tree/main/playground) | [Why docxmlater](#why-docxmlater) | [Quick Start](#quick-start) | [API Reference](#api-reference)
 
-
-
-
-- Programmatic batch processing of corporate documents
-
-If you only need to **generate documents from scratch** and don't need to load/edit existing files, consider the [docx](https://www.npmjs.com/package/docx) package which has a declarative builder API optimized for document creation.
-
-## Features
-
-### Core Document Operations
-
-- Create DOCX files from scratch
-- Read and modify existing DOCX files
-- Buffer-based operations (load/save from memory)
-- Document properties (core, extended, custom)
-- Memory management with dispose pattern
-- Bookmark pair validation and auto-repair (`validateBookmarkPairs()`)
-- App.xml metadata preservation (HeadingPairs, TotalTime, etc.)
-- Document background color/theme support
-
-### Text & Paragraph Formatting
-
-- Character formatting: bold, italic, underline, strikethrough, subscript, superscript
-- Font properties: family, size, color (RGB and theme colors), highlight
-- Text effects: small caps, all caps, shadow, emboss, engrave
-- Paragraph alignment, indentation, spacing, borders, shading
-- Text search and replace with regex support
-- Custom styles (paragraph, character, table)
-- CJK/East Asian paragraph properties (kinsoku, wordWrap, overflowPunct, topLinePunct)
-- Underline color and theme color attributes
-- Theme font references (asciiTheme, hAnsiTheme, eastAsiaTheme, csTheme)
-
-### Lists & Tables
-
-- Numbered lists (decimal, roman, alpha)
-- Bulleted lists with various bullet styles
-- Multi-level lists with custom numbering and restart control
-- Tables with formatting, borders, shading
-- Cell spanning (merge cells horizontally and vertically)
-- Advanced table properties (margins, widths, alignment)
-- Table navigation helpers (`getFirstParagraph()`, `getLastParagraph()`)
-- Legacy horizontal merge (`hMerge`) support
-- Table layout parsing (`fixed`/`auto`)
-- Table style shading updates (modify styles.xml colors)
-- Cell content management (trailing blank removal with structure preservation)
-
-### Rich Content
-
-- Images (PNG, JPEG, GIF, SVG, EMF, WMF) with positioning, text wrapping, and full ECMA-376 DrawingML attribute coverage
-- Headers & footers (different first page, odd/even pages)
-- Hyperlinks (external URLs, internal bookmarks)
-- Hyperlink defragmentation utility (fixes fragmented links from Google Docs)
-- Hyperlink URL sanitization (strips browser extension prefixes from corrupted URLs)
-- Bookmarks and cross-references
-- Body-level bookmark support (bookmarks between block elements)
-- Shapes and text boxes
+```bash
+npm install docxmlater
+```
 
-
-
-
-
-
-
-
-
-
--
-
-
--
-
-
-
-
--
--
--
--
--
-
-
-
-
-
--
-
-
-
-
-
-
-
--
-
--
-
-
-
-
-
-
-
--
--
--
+---
+
+## Why docxmlater
+
+| | `docx` | `docxtemplater` | **docxmlater** |
+| ------------------------------- | :----: | :--------------------------: | :------------: |
+| Generate documents from scratch | ✓ | ✓ | ✓ |
+| Load and edit existing files | ✗ | partial (templates only) | ✓ |
+| Round-trip XML fidelity | ✗ | partial | ✓ |
+| Tracked-changes preservation | ✗ | ✗ | ✓ |
+| Comments (resolve / unresolve) | ✗ | ✗ | ✓ |
+| Bookmarks (block and inline) | ✗ | partial | ✓ |
+| Compatibility-mode upgrade | ✗ | ✗ | ✓ |
+| Free / open-source | ✓ | partial (commercial modules) | ✓ |
+
+### When docxmlater is the right choice
+
+- You need to **load existing Word documents** and modify any element with full fidelity.
+- You're working with documents that contain **tracked changes**, **comments**, or **bookmarks** that must round-trip cleanly.
+- You programmatically apply formatting on top of someone else's drafted document.
+- You're processing documents from older Word versions and need **compatibility-mode upgrade**.
+- You want a single library with no commercial tier behind features you actually need.
+
+### When you might want something else
+
+- If you only need to **generate a document from scratch** with no edit or round-trip requirement, the `docx` package has a more declarative builder API.
+- If your entire workflow is **template-fill** (placeholders in a designer-authored docx), `docxtemplater` may fit better.
+- If you only need to **convert docx to HTML or Markdown for display**, `mammoth` is purpose-built.
+
+---
+
+## About the Project
+
+docxmlater began in early 2025 as a personal effort to build a TypeScript framework capable of full programmatic interaction with `.docx` files. What started as a focused side project grew into a much larger undertaking as the depth of the OOXML specification revealed itself. The work is implemented directly against the 6,000+ page ECMA-376 standard, with attention paid to round-trip fidelity, schema correctness, and the practical edge cases real-world Word documents introduce.
+
+The library is in active production use on a small team for day-to-day document formatting workflows. The aim is to provide a free, capable alternative to commercial DOCX engines that charge thousands of dollars per year per seat.
+
+What distinguishes docxmlater from existing libraries is its first-class support for revision workflows. Tracked changes, comments, and bookmarks are fully integrated. Documents that already contain tracked changes can be processed without corruption, preserving the existing revision history where required while still applying new formatting on top.
+
+If you encounter a use case that is not yet implemented and would be broadly useful, please open an issue.
+
+---
+
+## Table of Contents
+
+- [Why docxmlater](#why-docxmlater)
+- [Installation](#installation)
+- [Quick Start](#quick-start)
+- [Feature Overview](#feature-overview)
+- [API Reference](#api-reference)
+- [Document](#document)
+- [Paragraph](#paragraph)
+- [Run](#run)
+- [Table](#table)
+- [TableCell](#tablecell)
+- [Section](#section)
+- [Comment & CommentManager](#comment--commentmanager)
+- [Utilities](#utilities)
+- [Advanced Topics](#advanced-topics)
+- [Tracked Changes](#tracked-changes)
+- [Custom Styles](#custom-styles)
+- [Hyperlink Management](#hyperlink-management)
+- [Compatibility Mode](#compatibility-mode)
+- [Templates](#templates)
+- [Document Conversion](#document-conversion)
+- [Performance & Memory Management](#performance--memory-management)
+- [Architecture](#architecture)
+- [Security](#security)
+- [TypeScript Support](#typescript-support)
+- [Requirements](#requirements)
+- [Contributing](#contributing)
+- [License](#license)
+
+---
 
 ## Installation
 
@@ -118,69 +91,51 @@ The following features are preserved as raw XML on round-trip but have no editin
 npm install docxmlater
 ```
 
+Requires Node.js **18.0.0** or higher. TypeScript 5.0+ is recommended for development.
+
+The only runtime dependency is `jszip` for ZIP archive handling.
+
+---
+
 ## Quick Start
 
-###
+### Create a new document
 
 ```typescript
 import { Document } from 'docxmlater';
 
-// Create a new document
 const doc = Document.create();
-
-// Add a paragraph
 const para = doc.createParagraph();
 para.addText('Hello, World!', { bold: true, fontSize: 24 });
 
-// Save to file
 await doc.save('hello.docx');
-
-// Don't forget to dispose
 doc.dispose();
 ```
 
-###
+### Load and modify an existing document
 
 ```typescript
 import { Document } from 'docxmlater';
 
-// Load existing document
 const doc = await Document.load('input.docx');
 
-// Find and replace text
 doc.replaceText(/old text/g, 'new text');
+doc.createParagraph().addText('Added paragraph', { italic: true });
 
-// Add a new paragraph
-const para = doc.createParagraph();
-para.addText('Added paragraph', { italic: true });
-
-// Save modifications
 await doc.save('output.docx');
 doc.dispose();
 ```
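
Both Quick Start snippets above end with `doc.dispose()`, but an exception thrown before that line (for example by `save()`) would skip it. The following is a minimal sketch of a wrapper that guarantees disposal. It assumes only the `save`/`dispose` shape visible in the snippets; `DisposableDoc` and `withDocument` are hypothetical names for illustration and are not part of docxmlater.

```typescript
// Hypothetical structural type: only the two methods the Quick Start relies on.
interface DisposableDoc {
  save(path: string): Promise<void>;
  dispose(): void;
}

// Runs `work` against `doc` and always calls dispose(), even when `work` throws.
async function withDocument<T extends DisposableDoc, R>(
  doc: T,
  work: (doc: T) => Promise<R>,
): Promise<R> {
  try {
    return await work(doc);
  } finally {
    doc.dispose();
  }
}
```

With the real library this would presumably wrap the result of `Document.load(...)`, with the edits and the `save()` call placed inside the callback.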
 
-###
+### Tables
 
 ```typescript
-import { Document } from 'docxmlater';
-
 const doc = Document.create();
-
-// Create a 3x4 table
 const table = doc.createTable(3, 4);
 
-
-
-
-headerRow.getCell(1).addParagraph().addText('Column 2', { bold: true });
-headerRow.getCell(2).addParagraph().addText('Column 3', { bold: true });
-headerRow.getCell(3).addParagraph().addText('Column 4', { bold: true });
+const header = table.getRow(0);
+header.getCell(0).createParagraph().addText('Column 1', { bold: true });
+header.getCell(1).createParagraph().addText('Column 2', { bold: true });
 
-// Add data
-table.getRow(1).getCell(0).addParagraph().addText('Data 1');
-table.getRow(1).getCell(1).addParagraph().addText('Data 2');
-
-// Apply borders
 table.setBorders({
   top: { style: 'single', size: 4, color: '000000' },
   bottom: { style: 'single', size: 4, color: '000000' },
@@ -194,563 +149,397 @@ await doc.save('table.docx');
 doc.dispose();
 ```
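
The `setBorders` call above repeats the same border object per side. As a sketch only, a small helper can build that object from one spec. The `BorderSpec` shape is modelled on the `top`/`bottom` entries visible in the diff. The `left` and `right` keys are an assumption, since the rest of the call falls outside the hunk, and `uniformBorders` is a hypothetical helper, not a docxmlater function.

```typescript
// Hypothetical shape, modelled on the `top`/`bottom` entries shown above.
interface BorderSpec {
  style: string;
  size: number;
  color: string; // hex RGB without '#'
}

// `left` and `right` are assumed side names; only top/bottom appear in the diff.
type Side = 'top' | 'bottom' | 'left' | 'right';

// Builds one border object per requested side from a single spec.
function uniformBorders(
  spec: BorderSpec,
  sides: Side[] = ['top', 'bottom', 'left', 'right'],
): Partial<Record<Side, BorderSpec>> {
  const borders: Partial<Record<Side, BorderSpec>> = {};
  for (const side of sides) {
    borders[side] = { ...spec }; // copy so each side can be adjusted independently
  }
  return borders;
}

const allSides = uniformBorders({ style: 'single', size: 4, color: '000000' });
```

The result could then be passed to `table.setBorders(...)`, provided the library accepts those keys.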
 
-###
+### Images
 
 ```typescript
 import { Document } from 'docxmlater';
 import { readFileSync } from 'fs';
 
 const doc = Document.create();
-
-// Load image from file
 const imageBuffer = readFileSync('photo.jpg');
 
-// Add image to document
 const para = doc.createParagraph();
-await para.addImage(imageBuffer, {
-  width: 400,
-  height: 300,
-  format: 'jpg',
-});
+await para.addImage(imageBuffer, { width: 400, height: 300, format: 'jpg' });
 
 await doc.save('with-image.docx');
 doc.dispose();
 ```
 
-
-
-```typescript
-import { Document } from 'docxmlater';
-
-const doc = await Document.load('document.docx');
-
-// Get all hyperlinks
-const hyperlinks = doc.getHyperlinks();
-console.log(`Found ${hyperlinks.length} hyperlinks`);
-
-// Update URLs in batch (30-50% faster than manual iteration)
-doc.updateHyperlinkUrls('http://old-domain.com', 'https://new-domain.com');
-
-// Fix fragmented hyperlinks from Google Docs
-const mergedCount = doc.defragmentHyperlinks({
-  resetFormatting: true, // Fix corrupted fonts
-});
-console.log(`Merged ${mergedCount} fragmented hyperlinks`);
-
-await doc.save('updated.docx');
-doc.dispose();
-```
-
-### Custom Styles
-
-```typescript
-import { Document, Style } from 'docxmlater';
-
-const doc = Document.create();
-
-// Create custom paragraph style
-const customStyle = new Style('CustomHeading', 'paragraph');
-customStyle.setName('Custom Heading');
-customStyle.setRunFormatting({
-  bold: true,
-  fontSize: 32,
-  color: '0070C0',
-});
-customStyle.setParagraphFormatting({
-  alignment: 'center',
-  spacingAfter: 240,
-});
-
-// Add style to document
-doc.getStylesManager().addStyle(customStyle);
-
-// Apply style to paragraph
-const para = doc.createParagraph();
-para.addText('Styled Heading');
-para.applyStyle('CustomHeading');
-
-await doc.save('styled.docx');
-doc.dispose();
-```
-
-### Compatibility Mode Detection and Upgrade
+---
 
-
-import { Document, CompatibilityMode } from 'docxmlater';
+## Feature Overview
 
-
+### Document Operations
 
-
-console.log(`Mode: ${doc.getCompatibilityMode()}`); // e.g., 12 (Word 2007)
+Create, load, and save documents from files or buffers. Manage core, extended, and custom document properties. Validate and auto-repair bookmark pairs. Preserve `app.xml` metadata (HeadingPairs, TotalTime, etc.). Configurable document background color and theme support.
 
-
-// Get detailed compatibility info
-const info = doc.getCompatibilityInfo();
-console.log(`Legacy flags: ${info.legacyFlags.length}`);
-
-// Upgrade to Word 2013+ mode (equivalent to File > Info > Convert)
-const report = doc.upgradeToModernFormat();
-console.log(`Removed ${report.removedFlags.length} legacy flags`);
-console.log(`Added ${report.addedSettings.length} modern settings`);
-}
-
-await doc.save('modern.docx');
-doc.dispose();
-```
-
-## API Overview
-
-### Document Class
+### Text & Paragraph Formatting
 
-
+- Character formatting: bold, italic, underline, strikethrough, sub/superscript, small caps, all caps, shadow, emboss, engrave
+- Font properties: family, size, color (RGB and theme), highlight, underline color
+- Theme font references (`asciiTheme`, `hAnsiTheme`, `eastAsiaTheme`, `csTheme`)
+- Paragraph alignment, indentation, spacing, borders, shading
+- CJK / East Asian properties (kinsoku, wordWrap, overflowPunct, topLinePunct)
+- Cross-run text search and replace, regex supported
 
-
-- `Document.load(filepath, options?)` - Load from file
-- `Document.loadFromBuffer(buffer, options?)` - Load from memory
+### Lists & Tables
 
-
+- Numbered (decimal, roman, alpha) and bulleted lists
+- Multi-level lists with custom numbering and restart control
+- Tables with borders, shading, alignment, and width control
+- Horizontal and vertical cell merging, including legacy `hMerge`
+- Fixed and auto table layouts
+- Cell content management with structure preservation
 
-
+### Rich Content
 
-
-
-
+- Images: PNG, JPEG, GIF, SVG, EMF, WMF - with positioning, text wrapping, and full DrawingML attribute coverage
+- Headers and footers with first-page and odd/even variants
+- Hyperlinks (external and internal), with defragmentation and URL sanitization utilities
+- Bookmarks (block and inline level) and cross-references
+- Shapes and text boxes
 
-
-const doc = await Document.load('document.docx', {
-  revisionHandling: 'accept' // Accept all changes (default)
-  // OR
-  revisionHandling: 'strip' // Remove all revision markup
-  // OR
-  revisionHandling: 'preserve' // Keep tracked changes (may cause corruption, but should not do so - report errors if found)
-});
-```
+### Revisions & Collaboration
 
-
+- Track changes (insertions, deletions, formatting)
+- Character-level granular revisions via text diffing
+- Comments with resolve/unresolve workflow
+- Run property change tracking (`w:rPrChange`)
+- Paragraph mark revision tracking (`w:del`/`w:ins` in `w:pPr`/`w:rPr`)
+- People.xml auto-registration for revision authors
+- Full round-trip preservation of pre-existing tracked changes
 
-
-- `'strip'`: Removes all revision markup completely
-- `'preserve'`: Keeps tracked changes as-is (may cause Word "unreadable content" errors)
+### Advanced Features
 
-
+- Compatibility mode detection and upgrade (Word 2003 / 2007 / 2010 / 2013+)
+- Table of contents generation with customizable heading levels
+- Fields: merge fields, date/time, page numbers, TOC fields
+- Footnotes and endnotes (full round-trip and dedicated API)
+- Content controls (Structured Document Tags)
+- Form field data preservation (text, checkbox, dropdown per ECMA-376 §17.16)
+- `w14` run effects passthrough (Word 2010+ ligatures, numForm, textOutline)
+- Multiple sections with independent page layouts and orientations
+- Lossless image optimization (PNG re-compression, BMP-to-PNG conversion)
+- Unified shading model with theme color support and inheritance resolution
 
-
+### Document Conversion
 
-
+Export to Markdown, HTML (fragment or full page), Base64, or Data URI. Create documents from Markdown.
 
-
-- `createTable(rows, cols)` - Add table
-- `createSection()` - Add section
-- `getBodyElements()` - Get all body content
+### Preserved (round-trip only)
 
-
+The following features round-trip safely as raw XML but have no editing API:
 
-- `
--
--
--
-- `
--
+- Charts (`c:chartSpace`)
+- SmartArt
+- OLE embedded objects (`w:object`)
+- Math equations
+- Glossary documents (`glossary.xml`)
+- Advanced DrawingML (gradient/pattern fills, group shapes, 3D effects)
 
-
+---
 
-
-- `setAllRunsSize(size)` - Apply font size to all text
-- `setAllRunsColor(color)` - Apply color to all text
-- `getFormattingReport()` - Get document formatting statistics
+## API Reference
 
-
+### Document
 
-
-- `updateHyperlinkUrls(oldUrl, newUrl)` - Batch URL update
-- `defragmentHyperlinks(options?)` - Fix fragmented links
-- `collectAllReferencedHyperlinkIds()` - Comprehensive scan of all hyperlink relationship IDs (includes nested tables, headers/footers, footnotes/endnotes)
+**Creation & Loading**
 
-
+| Method | Description |
+| ------------------------------------------- | --------------------------- |
+| `Document.create(options?)` | Create a new document |
+| `Document.load(path, options?)` | Load from a file path |
+| `Document.loadFromBuffer(buffer, options?)` | Load from a `Buffer` |
+| `Document.fromMarkdown(md)` | Create from Markdown source |
+| `Document.loadFromBase64(b64)` | Load from a Base64 string |
 
-
-- `getCharacterCount(includeSpaces?)` - Count characters
-- `estimateSize()` - Estimate file size
+**Content Management**
 
-
+- `createParagraph()`, `createTable(rows, cols)`, `createSection()`
+- `addHeading(text, level?)`, `addPageBreak()`, `addHorizontalRule(color?, size?)`
+- `addBulletListFromArray(items)`, `addNumberedListFromArray(items)`
+- `createTableFromCSV(csv, delimiter?)`
+- `getBodyElements()`, `clear()`, `clone()`
+- `insertAfter(ref, el)`, `insertBefore(ref, el)`, `replaceElement(old, new)`, `removeElement(el)`
+- `forEachParagraph(cb)`, `forEachTable(cb)`, `extractByHeading(maxLevel?)`, `getElementsBetween(start, end)`
 
-
-- `isCompatibilityMode()` - Check if document targets a legacy Word version
-- `getCompatibilityInfo()` - Get full parsed compat settings
-- `upgradeToModernFormat()` - Upgrade to Word 2013+ mode (removes legacy flags)
+**Search & Replace**
 
-
+- `findText(pattern)`, `replaceText(pattern, replacement)`
+- `findParagraphsByText(pattern)`, `getParagraphsByStyle(styleId)`
+- `getRunsByFont(name)`, `getRunsByColor(color)`
 
-
-- `createEndnote(paragraph, text)` - Add endnote
-- `clearFootnotes()` / `clearEndnotes()` - Remove all notes
-- `getFootnoteManager()` / `getEndnoteManager()` - Access note managers
+**Bulk Formatting**
 
-
+- `setAllRunsFont(name)`, `setAllRunsSize(size)`, `setAllRunsColor(color)`
+- `setDefaultFont(name, size?)`, `setDefaultFontSize(size)`
+- `getFormattingReport()`
 
-
-- `cleanupUnusedNumbering()` - Remove unused numbering definitions (scans body, headers, footers, footnotes, endnotes)
-- `consolidateNumbering(options?)` - Merge duplicate abstract numbering definitions
-- `validateNumberingReferences()` - Fix orphaned numId references
+**Hyperlinks**
 
-
+- `getHyperlinks()`, `updateHyperlinkUrls(oldUrl, newUrl)`
+- `defragmentHyperlinks(options?)`, `collectAllReferencedHyperlinkIds()`
 
-
+**Statistics**
 
-
+- `getWordCount()`, `getCharacterCount(includeSpaces?)`
+- `estimateSize()`, `getStatistics()`
 
-
-- `stripOrphanRSIDs()` - Remove orphan RSIDs from settings.xml
-- `clearDirectSpacingForStyles(styleIds)` - Remove direct spacing overrides from styled paragraphs
+**Compatibility**
 
-
+- `getCompatibilityMode()`, `isCompatibilityMode()`
+- `getCompatibilityInfo()`, `upgradeToModernFormat()`
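
The removed 11.0.4 example logged the mode as a number ("e.g., 12 (Word 2007)"). As a hedged illustration of how such a value might be turned into a label, the sketch below uses the standard OOXML `compatibilityMode` values (11, 12, 14, 15). `describeCompatibilityMode` and `isLegacyMode` are hypothetical helpers, and treating every mode below 15 as "legacy" is an assumption about what `isCompatibilityMode()` reports.

```typescript
// OOXML `compatibilityMode` setting values and the Word version each targets.
const WORD_VERSION_BY_MODE = new Map<number, string>([
  [11, 'Word 2003'],
  [12, 'Word 2007'],
  [14, 'Word 2010'],
  [15, 'Word 2013+'],
]);

// Hypothetical helper: human-readable label for a numeric mode.
function describeCompatibilityMode(mode: number): string {
  return WORD_VERSION_BY_MODE.get(mode) ?? `unknown (${mode})`;
}

// Assumption: any mode below 15 counts as a legacy (compatibility) mode.
function isLegacyMode(mode: number): boolean {
  return mode < 15;
}
```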
 
-
+**Footnotes & Endnotes**
 
-
+- `createFootnote(paragraph, text)`, `createEndnote(paragraph, text)`
+- `clearFootnotes()`, `clearEndnotes()`
+- `getFootnoteManager()`, `getEndnoteManager()`
 
-
-- `addPageBreak()` - Insert page break
-- `addHorizontalRule(color?, size?)` - Insert horizontal line
-- `setDefaultFont(name, size?)` - Set document default font via Normal style
-- `setDefaultFontSize(size)` - Set document default font size
-- `clear()` - Remove all body content (preserves styles/settings)
-- `clone()` - Deep copy document for template batch generation
-- `addBulletListFromArray(items)` - Create bullet list from string array
-- `addNumberedListFromArray(items)` - Create numbered list from string array
-- `createTableFromCSV(csv, delimiter?)` - Create table from CSV data
+**Numbering**
 
-
+- `restartNumbering(numId, level?, startValue?)`
+- `cleanupUnusedNumbering()`, `consolidateNumbering(options?)`
+- `validateNumberingReferences()`
 
-
-- `findAndHighlight(text, color?)` - Highlight all text occurrences
-- `findAndFormat(text, formatting)` - Apply formatting to all text occurrences
+**Sanitization & Optimization**
 
-
+- `flattenFieldCodes()` - strip INCLUDEPICTURE markup, keep images
+- `stripOrphanRSIDs()` - remove unused RSIDs from `settings.xml`
+- `clearDirectSpacingForStyles(ids)` - remove direct spacing on styled paragraphs
+- `optimizeImages()` - lossless PNG re-compression, BMP-to-PNG
 
-
-- `toHTML(options?)` - Export as HTML (fragment or full page)
-- `toBase64()` - Export as base64 string
-- `toDataUri()` - Export as data URI
-- `fromMarkdown(md)` - Create document from Markdown (static)
-- `loadFromBase64(base64)` - Load document from base64 (static)
+**Templates & Highlighting**
 
-
+- `fillTemplate(data, options?)` - replace `{{key}}` placeholders across runs
+- `findAndHighlight(text, color?)`, `findAndFormat(text, formatting)`
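
The phrase "across runs" matters because Word frequently splits one visible placeholder over several runs. The sketch below illustrates the problem on plain strings. It is not docxmlater's implementation: `fillPlaceholders` is a hypothetical helper, and the run texts are made-up sample data.

```typescript
// Replaces {{key}} tokens in a string; unknown keys are left untouched.
function fillPlaceholders(text: string, data: Record<string, string>): string {
  return text.replace(/\{\{(\w+)\}\}/g, (match: string, key: string) => {
    const value = data[key];
    return value === undefined ? match : value;
  });
}

// Sample data: one placeholder split over two runs, as Word often stores it.
const runTexts = ['Dear {{na', 'me}},'];
const templateData = { name: 'Ada' };

// Replacing run by run never sees a complete token, so nothing changes.
const perRun = runTexts.map((t) => fillPlaceholders(t, templateData)).join('');

// Matching against the concatenated run text finds and fills the token.
const joined = fillPlaceholders(runTexts.join(''), templateData);
```

A real implementation would also have to map the replaced text back onto the original runs so their formatting survives, which this sketch does not attempt.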
 
-
-- `insertBefore(reference, element)` - Insert element before reference
-- `replaceElement(old, new)` - Replace body element in-place
-- `removeElement(element)` - Remove body element by reference
-- `extractByHeading(maxLevel?)` - Group content by heading sections
-- `getElementsBetween(start, end)` - Get elements between two references
-- `forEachParagraph(callback)` - Iterate top-level paragraphs
-- `forEachTable(callback)` - Iterate top-level tables
-- `getStatistics()` - Comprehensive document metrics
+**Conversion**
 
-
+- `toMarkdown()`, `toHTML(options?)`, `toPlainText()`
+- `toBase64()`, `toDataUri()`, `getHeadingHierarchy()`
+- `findImagesWithoutAltText()` (accessibility audit)
 
-
-- `toBuffer()` - Save to Buffer
-- `dispose()` - Free resources (important!)
+**Saving**
 
-
+- `save(path)`, `toBuffer()`, `dispose()` - _always call `dispose()` when finished_
 
-
+### Paragraph
 
-
-- `addRun(run)` - Add custom run
-- `addHyperlink(hyperlink)` - Add hyperlink
-- `addImage(buffer, options)` - Add image
+**Content**: `addText(text, formatting?)`, `addRun(run)`, `addHyperlink(link)`, `addImage(buffer, options)`
 
-**Formatting
+**Formatting**: `setAlignment`, `setIndentation`, `setSpacing`, `setBorders`, `setShading`, `applyStyle`, `setKeepNext`, `setKeepLines`, `setPageBreakBefore`, `clearSpacing`
 
-
-- `setIndentation(options)` - First line, hanging, left, right
-- `setSpacing(options)` - Line spacing, before/after
-- `setBorders(borders)` - Paragraph borders
-- `setShading(shading)` - Background color
-- `applyStyle(styleId)` - Apply paragraph style
+**Text manipulation**: `applyFormattingToRange`, `deleteRange`, `truncate`, `wrap`, `splitAt`, `consolidateRuns`, `replaceAll`, `findTextCrossRun`, `getRunAtOffset`, `getFormattingAtOffset`, `contains`, `toJSON` / `fromJSON`
 
-**
+**Numbering**: `setNumbering(numId, level)`
 
-
-- `setKeepLines(value)` - Keep lines together
-- `setPageBreakBefore(value)` - Page break before
-- `clearSpacing()` - Remove direct spacing (inherit from style)
+### Run
 
-**Text
+**Text**: `setText`, `getText`, `getPlainText`, `splitAt`
 
-
-- `deleteRange(start, end)` - Delete character range
-- `truncate(maxLength, suffix?)` - Truncate text with ellipsis
-- `wrap(prefix, suffix, formatting?)` - Wrap content with prefix/suffix
-- `splitAt(offset)` - Split paragraph into two at character position
-- `consolidateRuns()` - Merge adjacent runs with identical formatting
-- `replaceAll(find, replace)` - Cross-run find and replace
-- `findTextCrossRun(find)` - Cross-run text search with offsets
-- `getRunAtOffset(offset)` - Get run at character position
-- `getFormattingAtOffset(offset)` - Get formatting at character position
-- `contains(text, caseSensitive?)` - Check if paragraph contains text
-- `toJSON()` / `fromJSON(data)` - Serialize/deserialize paragraph
+**Character formatting**: `setBold`, `setItalic`, `setUnderline`, `setStrikethrough`, `setFont`, `setFontSize`, `setColor`, `setHighlight`
 
-**
+**Advanced**: `setSubscript`, `setSuperscript`, `setSmallCaps`, `setAllCaps`, `clearMatchingFormatting`, `equals`, `hasSameFormatting`, `clone`
 
-
+### Table
 
-
+**Structure**: `addRow`, `addRowFromArray`, `getRow`, `getCell`, `setCell`, `duplicateRow`, `addSummaryRow`
 
-**
+**Data**: `fromArray` / `toArray`, `fromCSV` / `toCSV`, `toPlainText`, `transpose`, `clone`, `sortRows`
 
-
-- `getText()` - Get run text
-- `getPlainText()` - Get text only (no tabs/breaks)
-- `splitAt(offset)` - Split run at character position
+**Queries**: `getColumnCells`, `getColumnTexts`, `findCell`, `filterRows`, `forEachCell`, `mapColumn`
 
-**
+**Formatting**: `setBorders`, `setAlignment`, `setWidth`, `setLayout`, `applyStyle`
 
-
-- `setItalic(value)` - Italic text
-- `setUnderline(style?)` - Underline
-- `setStrikethrough(value)` - Strikethrough
-- `setFont(name)` - Font family
-- `setFontSize(size)` - Font size in points
-- `setColor(color)` - Text color (hex)
-- `setHighlight(color)` - Highlight color
+**Cleanup**: `removeEmptyRows`, `removeEmptyColumns`
 
-
+### TableCell
 
-
-- `setSuperscript(value)` - Superscript
-- `setSmallCaps(value)` - Small capitals
-- `setAllCaps(value)` - All capitals
|
|
526
|
-
- `clearMatchingFormatting(styleFormatting)` - Remove formatting matching a style (for inheritance)
|
|
527
|
-
- `equals(other)` - Compare text and formatting equality
|
|
528
|
-
- `hasSameFormatting(other)` - Compare formatting only
|
|
529
|
-
- `clone()` - Deep copy run
|
|
359
|
+
**Content**: `addParagraph`, `getParagraphs`, `removeTrailingBlankParagraphs`, `removeParagraph`, `addParagraphAt`
|
|
530
360
|
|
|
531
|
-
|
|
361
|
+
**Formatting**: `setBorders`, `setShading`, `setBackgroundColor` / `getBackgroundColor`, `setVerticalAlignment`, `setWidth`
|
|
532
362
|
|
|
533
|
-
**
|
|
363
|
+
**Spanning**: `setHorizontalMerge`, `setVerticalMerge`
|
|
534
364
|
|
|
535
|
-
|
|
536
|
-
- `addRowFromArray(cells)` - Add row from string array
|
|
537
|
-
- `getRow(index)` - Get row by index
|
|
538
|
-
- `getCell(row, col)` - Get specific cell
|
|
539
|
-
- `setCell(row, col, text)` - Set cell text by coordinates
|
|
540
|
-
- `duplicateRow(index, count?)` - Clone a row in-place
|
|
541
|
-
- `addSummaryRow(options?)` - Add computed totals row
|
|
365
|
+
**Convenience**: `setTextAlignment`, `setAllParagraphsStyle`, `setAllRunsFont`, `setAllRunsSize`, `setAllRunsColor`
|
|
542
366
|
|
|
543
|
-
|
|
367
|
+
### Section
|
|
544
368
|
|
|
545
|
-
|
|
546
|
-
- `fromCSV(csv, delimiter?)` / `toCSV(delimiter?)` - CSV round-trip
|
|
547
|
-
- `toPlainText(colSep?, rowSep?)` - Delimited text export
|
|
548
|
-
- `transpose()` - Swap rows and columns
|
|
549
|
-
- `clone()` - Deep copy table
|
|
369
|
+
**Line numbering**: `setLineNumbering(options)`, `getLineNumbering()`, `clearLineNumbering()`
|
|
550
370
|
|
|
551
|
-
|
|
371
|
+
### Comment & CommentManager
|
|
552
372
|
|
|
553
|
-
|
|
554
|
-
- `getColumnTexts(colIndex)` - Get text values in a column
|
|
555
|
-
- `findCell(predicate)` - Find first matching cell with coordinates
|
|
556
|
-
- `filterRows(predicate)` - Get indices of matching rows
|
|
557
|
-
- `forEachCell(callback)` - Iterate all cells with row/col
|
|
558
|
-
- `mapColumn(colIndex, transform)` - Transform column values
|
|
373
|
+
**Comment**: `resolve()`, `unresolve()`, `isResolved()`
|
|
559
374
|
|
|
560
|
-
**
|
|
375
|
+
**CommentManager**: `getResolvedComments()`, `getUnresolvedComments()`
|
|
561
376
|
|
|
562
|
-
|
|
563
|
-
- `removeEmptyColumns()` - Remove columns with no text
|
|
377
|
+
### Utilities
|
|
564
378
|
|
|
565
|
-
**
|
|
379
|
+
**Unit conversion**
|
|
566
380
|
|
|
567
|
-
|
|
568
|
-
|
|
569
|
-
- `setWidth(width)` - Table width
|
|
570
|
-
- `setLayout(layout)` - Fixed or auto layout
|
|
381
|
+
```typescript
|
|
382
|
+
import { twipsToPoints, inchesToTwips, emusToPixels } from 'docxmlater';
|
|
571
383
|
|
|
572
|
-
|
|
384
|
+
twipsToPoints(240); // 12 points
|
|
385
|
+
inchesToTwips(1); // 1440 twips
|
|
386
|
+
emusToPixels(914400, 96); // 96 pixels at 96 DPI
|
|
387
|
+
```
|
|
573
388
|
|
|
574
|
-
|
|
389
|
+
40+ conversion helpers across twips, EMUs, points, pixels, inches, and centimeters.
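
The arithmetic behind the three conversions shown above can be sketched as follows. This is a standalone illustration, not docxmlater's source: the `*Sketch` function names are hypothetical, and the rounding behavior is an assumption. Only the unit ratios are fixed facts (20 twips per point, 1,440 twips per inch, 914,400 EMUs per inch).

```typescript
// Unit ratios used by Office Open XML measurements.
const TWIPS_PER_POINT = 20;     // a twip is one twentieth of a point
const TWIPS_PER_INCH = 1440;    // 72 points per inch * 20
const EMUS_PER_INCH = 914400;   // English Metric Units per inch

// Hypothetical helper: twips to points.
function twipsToPointsSketch(twips: number): number {
  return twips / TWIPS_PER_POINT;
}

// Hypothetical helper: inches to twips, rounded to a whole twip (assumed).
function inchesToTwipsSketch(inches: number): number {
  return Math.round(inches * TWIPS_PER_INCH);
}

// Hypothetical helper: EMUs to pixels at a given DPI, rounded (assumed).
function emusToPixelsSketch(emus: number, dpi: number = 96): number {
  return Math.round((emus / EMUS_PER_INCH) * dpi);
}
```

With these ratios, 240 twips is 12 points, 1 inch is 1,440 twips, and 914,400 EMUs is 96 pixels at 96 DPI, matching the values in the README example.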
|
|
575
390
|
|
|
576
|
-
|
|
391
|
+
**Validation**
|
|
577
392
|
|
|
578
|
-
|
|
393
|
+
```typescript
|
|
394
|
+
import { validateRunText, cleanXmlFromText } from 'docxmlater';
|
|
579
395
|
|
|
580
|
-
|
|
581
|
-
|
|
396
|
+
const result = validateRunText('Some <w:t>text</w:t>');
|
|
397
|
+
if (result.hasXml) {
|
|
398
|
+
const cleaned = cleanXmlFromText(result.text);
|
|
399
|
+
}
|
|
400
|
+
```
|
|
582
401
|
|
|
583
|
-
**
|
|
402
|
+
**Corruption detection**
|
|
584
403
|
|
|
585
|
-
|
|
586
|
-
|
|
587
|
-
- `setBackgroundColor(hex)` / `getBackgroundColor()` - Simple color shortcut
|
|
588
|
-
- `setVerticalAlignment(alignment)` - Top, center, bottom
|
|
589
|
-
- `setWidth(width)` - Cell width
|
|
404
|
+
```typescript
|
|
405
|
+
import { detectCorruptionInDocument } from 'docxmlater';
|
|
590
406
|
|
|
591
|
-
|
|
407
|
+
const doc = await Document.load('suspect.docx');
|
|
408
|
+
const report = detectCorruptionInDocument(doc);
|
|
592
409
|
|
|
593
|
-
|
|
594
|
-
|
|
410
|
+
if (report.isCorrupted) {
|
|
411
|
+
report.locations.forEach((loc) => {
|
|
412
|
+
console.log(`Line ${loc.lineNumber}: ${loc.issue}`);
|
|
413
|
+
});
|
|
414
|
+
}
|
|
415
|
+
```
|
|
595
416
|
|
|
596
|
-
|
|
417
|
+
---
|
|
597
418
|
|
|
598
|
-
|
|
599
|
-
- `setAllParagraphsStyle(styleId)` - Apply style to all paragraphs
|
|
600
|
-
- `setAllRunsFont(fontName)` - Apply font to all runs
|
|
601
|
-
- `setAllRunsSize(size)` - Apply font size to all runs
|
|
602
|
-
- `setAllRunsColor(color)` - Apply color to all runs
|
|
419
|
+
## Advanced Topics
|
|
603
420
|
|
|
604
|
-
|
|
421
|
+
### Tracked Changes
|
|
605
422
|
|
|
606
|
-
|
|
607
|
-
- `removeParagraph(index)` - Remove paragraph at index (updates nested content positions)
|
|
608
|
-
- `addParagraphAt(index, paragraph)` - Insert paragraph at index (updates nested content positions)
|
|
423
|
+
By default, `Document.load()` accepts all tracked changes during loading. This prevents revision-ID conflicts that can cause Word to report "unreadable content" on round-trip.
|
|
609
424
|
|
|
610
|
-
|
|
425
|
+
```typescript
|
|
426
|
+
const doc = await Document.load('document.docx', {
|
|
427
|
+
revisionHandling: 'accept', // default - keep insertions, drop deletions
|
|
428
|
+
// revisionHandling: 'strip', - remove all revision markup entirely
|
|
429
|
+
// revisionHandling: 'preserve', - keep tracked changes verbatim (advanced)
|
|
430
|
+
});
|
|
431
|
+
```
|
|
611
432
|
|
|
612
|
-
|
|
433
|
+
| Mode | Behavior |
|
|
434
|
+
| ------------------ | ------------------------------------------------------------------------ |
|
|
435
|
+
| `accept` (default) | Removes revision markup, keeps inserted content, removes deleted content |
|
|
436
|
+
| `strip` | Removes all revision markup completely |
|
|
437
|
+
| `preserve` | Keeps tracked changes intact for advanced workflows |
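
As a rough mental model of the `accept` row, consider a paragraph flattened into tagged text segments. The `Segment` type and `acceptRevisions` function below are hypothetical and exist only for illustration; the library operates on revision markup in the document XML, not on a structure like this.

```typescript
// Toy model (assumption): a paragraph as a list of tagged text segments.
type Segment = { kind: 'plain' | 'inserted' | 'deleted'; text: string };

// "Accept" keeps plain and inserted text and drops deleted text.
function acceptRevisions(segments: Segment[]): string {
  return segments
    .filter((s) => s.kind !== 'deleted')
    .map((s) => s.text)
    .join('');
}

const example: Segment[] = [
  { kind: 'plain', text: 'The ' },
  { kind: 'deleted', text: 'quick ' },
  { kind: 'inserted', text: 'slow ' },
  { kind: 'plain', text: 'fox' },
];
const accepted = acceptRevisions(example); // 'The slow fox'
```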
|
|
613
438
|
|
|
614
|
-
|
|
615
|
-
- `updateTableStyleShadingBulk(settings)` - Bulk update table style shading
|
|
616
|
-
- `removeTrailingBlanksInTableCells(options?)` - Remove trailing blanks from all table cells
|
|
439
|
+
### Custom Styles
|
|
617
440
|
|
|
618
|
-
|
|
441
|
+
```typescript
|
|
442
|
+
import { Document, Style } from 'docxmlater';
|
|
619
443
|
|
|
620
|
-
|
|
444
|
+
const doc = Document.create();
|
|
621
445
|
|
|
622
|
-
|
|
446
|
+
const heading = new Style('CustomHeading', 'paragraph');
|
|
447
|
+
heading.setName('Custom Heading');
|
|
448
|
+
heading.setRunFormatting({ bold: true, fontSize: 32, color: '0070C0' });
|
|
449
|
+
heading.setParagraphFormatting({ alignment: 'center', spacingAfter: 240 });
|
|
623
450
|
|
|
624
|
-
|
|
451
|
+
doc.getStylesManager().addStyle(heading);
|
|
625
452
|
|
|
626
|
-
|
|
453
|
+
const para = doc.createParagraph();
|
|
454
|
+
para.addText('Styled Heading');
|
|
455
|
+
para.applyStyle('CustomHeading');
|
|
627
456
|
|
|
628
|
-
|
|
629
|
-
|
|
630
|
-
|
|
457
|
+
await doc.save('styled.docx');
|
|
458
|
+
doc.dispose();
|
|
459
|
+
```
|
|
631
460
|
|
|
632
|
-
###
|
|
461
|
+
### Hyperlink Management
|
|
633
462
|
|
|
634
|
-
|
|
463
|
+
```typescript
|
|
464
|
+
const doc = await Document.load('document.docx');
|
|
635
465
|
|
|
636
|
-
|
|
637
|
-
|
|
638
|
-
- `isResolved()` - Check if comment is resolved
|
|
466
|
+
const links = doc.getHyperlinks();
|
|
467
|
+
console.log(`Found ${links.length} hyperlinks`);
|
|
639
468
|
|
|
640
|
-
|
|
469
|
+
doc.updateHyperlinkUrls('http://old-domain.com', 'https://new-domain.com');
|
|
641
470
|
|
|
642
|
-
|
|
471
|
+
const merged = doc.defragmentHyperlinks({ resetFormatting: true });
|
|
472
|
+
console.log(`Merged ${merged} fragmented hyperlinks`);
|
|
643
473
|
|
|
644
|
-
|
|
645
|
-
|
|
474
|
+
await doc.save('updated.docx');
|
|
475
|
+
doc.dispose();
|
|
476
|
+
```
|
|
646
477
|
|
|
647
|
-
|
|
478
|
+
`defragmentHyperlinks` repairs fragmented links commonly produced by Google Docs exports. Batch URL updates run 30-50% faster than manual iteration.
|
|
648
479
|
|
|
649
|
-
|
|
480
|
+
### Compatibility Mode
|
|
650
481
|
|
|
651
482
|
```typescript
|
|
652
|
-
|
|
653
|
-
|
|
654
|
-
const points = twipsToPoints(240); // 240 twips = 12 points
|
|
655
|
-
const twips = inchesToTwips(1); // 1 inch = 1440 twips
|
|
656
|
-
const pixels = emusToPixels(914400, 96); // 914400 EMUs = 96 pixels at 96 DPI
|
|
657
|
-
```
|
|
483
|
+
const doc = await Document.load('legacy.docx');
|
|
658
484
|
|
|
659
|
-
|
|
485
|
+
console.log(`Mode: ${doc.getCompatibilityMode()}`); // e.g. 12 (Word 2007)
|
|
660
486
|
|
|
661
|
-
|
|
662
|
-
|
|
487
|
+
if (doc.isCompatibilityMode()) {
|
|
488
|
+
const info = doc.getCompatibilityInfo();
|
|
489
|
+
console.log(`Legacy flags: ${info.legacyFlags.length}`);
|
|
663
490
|
|
|
664
|
-
|
|
665
|
-
|
|
666
|
-
|
|
667
|
-
console.warn(result.message);
|
|
668
|
-
const cleaned = cleanXmlFromText(result.text);
|
|
491
|
+
const report = doc.upgradeToModernFormat();
|
|
492
|
+
console.log(`Removed ${report.removedFlags.length} legacy flags`);
|
|
493
|
+
console.log(`Added ${report.addedSettings.length} modern settings`);
|
|
669
494
|
}
|
|
670
|
-
```
|
|
671
|
-
|
|
672
|
-
**Corruption Detection:**
|
|
673
|
-
|
|
674
|
-
```typescript
|
|
675
|
-
import { detectCorruptionInDocument } from 'docxmlater';
|
|
676
|
-
|
|
677
|
-
const doc = await Document.load('suspect.docx');
|
|
678
|
-
const report = detectCorruptionInDocument(doc);
|
|
679
495
|
|
|
680
|
-
|
|
681
|
-
|
|
682
|
-
report.locations.forEach((loc) => {
|
|
683
|
-
console.log(`Line ${loc.lineNumber}: ${loc.issue}`);
|
|
684
|
-
console.log(`Suggested fix: ${loc.suggestedFix}`);
|
|
685
|
-
});
|
|
686
|
-
}
|
|
496
|
+
await doc.save('modern.docx');
|
|
497
|
+
doc.dispose();
|
|
687
498
|
```
|
|
688
499
|
|
|
689
|
-
|
|
500
|
+
`upgradeToModernFormat()` is the programmatic equivalent of _File → Info → Convert_ in Word.
|
|
690
501
|
|
|
691
|
-
|
|
502
|
+
### Templates
|
|
692
503
|
|
|
693
504
|
```typescript
|
|
694
|
-
|
|
695
|
-
Document,
|
|
696
|
-
Paragraph,
|
|
697
|
-
Run,
|
|
698
|
-
Table,
|
|
699
|
-
RunFormatting,
|
|
700
|
-
ParagraphFormatting,
|
|
701
|
-
DocumentProperties,
|
|
702
|
-
} from 'docxmlater';
|
|
505
|
+
const doc = await Document.load('template.docx');
|
|
703
506
|
|
|
704
|
-
|
|
705
|
-
|
|
706
|
-
|
|
707
|
-
|
|
708
|
-
|
|
709
|
-
};
|
|
507
|
+
doc.fillTemplate({
|
|
508
|
+
customer: 'Acme Corp',
|
|
509
|
+
date: '2025-04-25',
|
|
510
|
+
total: '$12,400.00',
|
|
511
|
+
});
|
|
710
512
|
|
|
711
|
-
|
|
712
|
-
|
|
713
|
-
title: 'My Document',
|
|
714
|
-
author: 'John Doe',
|
|
715
|
-
created: new Date(),
|
|
716
|
-
};
|
|
513
|
+
await doc.save('invoice-acme.docx');
|
|
514
|
+
doc.dispose();
|
|
717
515
|
```
|
|
718
516
|
|
|
719
|
-
|
|
517
|
+
Placeholders use `{{key}}` syntax and are replaced safely across run boundaries.
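
Replacing across run boundaries matters because Word often splits a placeholder such as `{{customer}}` over several runs. One way to handle this, sketched below under the assumption that a paragraph's runs can be modeled as an array of strings, is to search the joined text and then map each match back onto the runs it overlaps. `fillAcrossRuns` is a hypothetical name and this is not docxmlater's implementation.

```typescript
// Sketch: replace {{key}} placeholders that may span several runs.
// Runs are modeled as plain strings (assumption); real runs also carry formatting.
function fillAcrossRuns(runs: string[], values: Record<string, string>): string[] {
  const out = [...runs];
  const pattern = /\{\{(\w+)\}\}/g;
  let searchFrom = 0;
  for (;;) {
    const joined = out.join('');
    pattern.lastIndex = searchFrom;
    const m = pattern.exec(joined);
    if (m === null) break;
    const start = m.index;
    const end = start + m[0].length;
    const key = m[1];
    if (!Object.prototype.hasOwnProperty.call(values, key)) {
      searchFrom = end; // unknown key: leave the placeholder untouched
      continue;
    }
    const replacement = values[key];
    let offset = 0;
    let inserted = false;
    for (let i = 0; i < out.length; i++) {
      const runStart = offset;
      const runEnd = offset + out[i].length;
      offset = runEnd;
      if (runEnd <= start || runStart >= end) continue; // run does not overlap the match
      const cutFrom = Math.max(start, runStart) - runStart;
      const cutTo = Math.min(end, runEnd) - runStart;
      // The first overlapping run receives the replacement; later runs only lose text.
      out[i] = out[i].slice(0, cutFrom) + (inserted ? '' : replacement) + out[i].slice(cutTo);
      inserted = true;
    }
    searchFrom = start + replacement.length; // do not rescan the inserted value
  }
  return out;
}

const filled = fillAcrossRuns(
  ['Dear {{cus', 'tomer}}, total: ', '{{total}}'],
  { customer: 'Acme Corp', total: '$12,400.00' },
);
// filled is ['Dear Acme Corp', ', total: ', '$12,400.00']
```

Putting the replacement into the first overlapping run keeps the run count stable, so each remaining run could retain its own formatting.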
|
|
720
518
|
|
|
721
|
-
|
|
519
|
+
### Document Conversion
|
|
722
520
|
|
|
723
|
-
|
|
724
|
-
|
|
725
|
-
## Testing
|
|
726
|
-
|
|
727
|
-
The framework includes comprehensive test coverage:
|
|
728
|
-
|
|
729
|
-
- **4,134 test cases** across 195 test suites
|
|
730
|
-
- Tests cover all phases of implementation
|
|
731
|
-
- Integration tests for complex scenarios
|
|
732
|
-
- Performance benchmarks
|
|
733
|
-
- Edge case validation
|
|
521
|
+
```typescript
|
|
522
|
+
const doc = await Document.load('report.docx');
|
|
734
523
|
|
|
735
|
-
|
|
524
|
+
const md = doc.toMarkdown();
|
|
525
|
+
const html = doc.toHTML({ fullPage: true });
|
|
526
|
+
const base64 = doc.toBase64();
|
|
736
527
|
|
|
737
|
-
|
|
738
|
-
npm test # Run all tests
|
|
739
|
-
npm run test:watch # Watch mode
|
|
740
|
-
npm run test:coverage # Coverage report
|
|
528
|
+
doc.dispose();
|
|
741
529
|
```
|
|
742
530
|
|
|
743
|
-
|
|
531
|
+
---
|
|
744
532
|
|
|
745
|
-
|
|
746
|
-
- Buffer-based operations are faster than file I/O
|
|
747
|
-
- Batch hyperlink updates are 30-50% faster than manual iteration
|
|
748
|
-
- Large documents (1000+ pages) supported with memory management
|
|
749
|
-
- Streaming support for very large files
|
|
533
|
+
## Performance & Memory Management
|
|
750
534
|
|
|
751
|
-
|
|
535
|
+
- **Always call `dispose()`** to release ZIP handles and image buffers
|
|
536
|
+
- Buffer-based I/O (`loadFromBuffer` / `toBuffer`) is 20-30% faster than file-path I/O
|
|
537
|
+
- Default size limits: warn at 50 MB, error at 150 MB (configurable via `LoadOptions.sizeLimits`)
|
|
538
|
+
- Memory footprint: ~2 MB per `Document`, ~2 bytes/character, full buffer per embedded image, ~200 bytes/cell
|
|
539
|
+
- For repeated paragraph access, cache `getAllParagraphs()` rather than calling it inside a loop
|
|
540
|
+
- Large documents (1,000+ pages) are supported
|
|
752
541
|
|
|
753
|
-
|
|
542
|
+
### Recommended Pattern
|
|
754
543
|
|
|
755
544
|
```typescript
|
|
756
545
|
import { Document } from 'docxmlater';
|
|
@@ -767,11 +556,11 @@ try {
|
|
|
767
556
|
}
|
|
768
557
|
```
|
|
769
558
|
|
|
770
|
-
For
|
|
559
|
+
For server-side buffer workflows:
|
|
771
560
|
|
|
772
561
|
```typescript
|
|
773
|
-
async function processDocument(
|
|
774
|
-
const doc = await Document.loadFromBuffer(
|
|
562
|
+
async function processDocument(input: Buffer): Promise<Buffer> {
|
|
563
|
+
const doc = await Document.loadFromBuffer(input);
|
|
775
564
|
try {
|
|
776
565
|
doc.replaceText(/placeholder/g, 'actual value');
|
|
777
566
|
return await doc.toBuffer();
|
|
@@ -781,132 +570,116 @@ async function processDocument(inputBuffer: Buffer): Promise<Buffer> {
|
|
|
781
570
|
}
|
|
782
571
|
```
|
|
783
572
|
|
|
784
|
-
Custom error types are available from `docxmlater/internal
|
|
573
|
+
Custom error types are available from `docxmlater/internal`. These include `DocxError`, `InvalidDocxError`, `CorruptedArchiveError`, and `FileOperationError`.
|
|
785
574
|
|
|
786
|
-
|
|
575
|
+
Logging is configurable via `DOCXMLATER_LOG_LEVEL=debug|info|warn|error`.
|
|
787
576
|
|
|
788
|
-
|
|
789
|
-
- Call `dispose()` promptly to release ZIP handles and image buffers
|
|
790
|
-
- Size limits default to warning at 50MB and error at 150MB (configurable via `LoadOptions.sizeLimits`)
|
|
791
|
-
- Memory usage: ~2MB base per Document, ~2 bytes/char, full buffer per embedded image, ~200 bytes/cell
|
|
792
|
-
- For repeated paragraph access, cache the result of `getAllParagraphs()` rather than calling it in a loop
|
|
577
|
+
---
|
|
793
578
|
|
|
794
579
|
## Architecture
|
|
795
580
|
|
|
796
|
-
The framework follows a modular architecture:
|
|
797
|
-
|
|
798
581
|
```
|
|
799
582
|
src/
|
|
800
|
-
├── core/
|
|
801
|
-
├── elements/
|
|
802
|
-
├── formatting/
|
|
803
|
-
├── managers/
|
|
804
|
-
├──
|
|
805
|
-
├──
|
|
806
|
-
├──
|
|
807
|
-
├──
|
|
808
|
-
├──
|
|
809
|
-
├──
|
|
810
|
-
├──
|
|
811
|
-
└── utils/
|
|
583
|
+
├── core/ Document, Parser, Generator, Validator
|
|
584
|
+
├── elements/ Paragraph, Run, Table, Image, Section, ...
|
|
585
|
+
├── formatting/ Style and Numbering managers
|
|
586
|
+
├── managers/ Drawing, Image, Relationship managers
|
|
587
|
+
├── tracking/ Revision tracking context
|
|
588
|
+
├── validation/ Revision and structural validation
|
|
589
|
+
├── helpers/ Cleanup utilities
|
|
590
|
+
├── xml/ XML generation and parsing (ReDoS-safe)
|
|
591
|
+
├── zip/ ZIP archive handling
|
|
592
|
+
├── constants/ Compatibility flags, limits, schema constants
|
|
593
|
+
├── types/ TypeScript type definitions
|
|
594
|
+
└── utils/ Units, validation, error handling
|
|
812
595
|
```
|
|
813
596
|
|
|
814
|
-
|
|
597
|
+
**Design principles**
|
|
815
598
|
|
|
816
|
-
-
|
|
817
|
-
- Position-based XML parsing (ReDoS
|
|
818
|
-
-
|
|
819
|
-
-
|
|
820
|
-
-
|
|
599
|
+
- Strict adherence to ECMA-376 (Office Open XML)
|
|
600
|
+
- Position-based XML parsing (not regex) to prevent ReDoS
|
|
601
|
+
- Round-trip XML fidelity through `_originalXml` preservation and dirty-flag regeneration
|
|
602
|
+
- Explicit memory management via the `dispose()` pattern
|
|
603
|
+
- Defensive validation with comprehensive type coverage
|
|
821
604
|
|
|
822
|
-
|
|
605
|
+
---
|
|
823
606
|
|
|
824
|
-
|
|
825
|
-
|
|
826
|
-
### ReDoS Prevention
|
|
827
|
-
|
|
828
|
-
The XML parser uses position-based parsing instead of regular expressions, preventing catastrophic backtracking attacks that can cause denial of service.
|
|
829
|
-
|
|
830
|
-
### Input Validation
|
|
831
|
-
|
|
832
|
-
**Size Limits:**
|
|
607
|
+
## Security
|
|
833
608
|
|
|
834
|
-
-
|
|
835
|
-
-
|
|
836
|
-
- XML content
|
|
609
|
+
- **ReDoS protection** - position-based XML parsing eliminates catastrophic backtracking
|
|
610
|
+
- **Path traversal prevention** - DOCX archive entries are validated against `../`, absolute paths, and URL-encoded traversal
|
|
611
|
+
- **XML injection prevention** - all text and attribute content is escaped via `XMLBuilder.escapeXmlText()` and `XMLBuilder.escapeXmlAttribute()`
|
|
612
|
+
- **Size limits** - configurable warning (50 MB) and hard cap (150 MB) on document size
|
|
613
|
+
- **Nesting limits** - XML parser caps nesting depth at 256 levels (configurable) to prevent stack overflow
|
|
614
|
+
- **UTF-8 enforcement** - all text content is explicitly UTF-8 encoded per ECMA-376
|
|
837
615
|
|
|
838
616
|
```typescript
|
|
839
|
-
// Configure size limits
|
|
840
617
|
const doc = await Document.load('large.docx', {
|
|
841
|
-
sizeLimits: {
|
|
842
|
-
warningSizeMB: 100,
|
|
843
|
-
maxSizeMB: 500,
|
|
844
|
-
},
|
|
618
|
+
sizeLimits: { warningSizeMB: 100, maxSizeMB: 500 },
|
|
845
619
|
});
|
|
846
620
|
```
|
|
847
621
|
|
|
848
|
-
**Nesting Depth:**
|
|
849
|
-
|
|
850
|
-
- Maximum XML nesting depth: 256 (configurable)
|
|
851
|
-
- Prevents stack overflow attacks
|
|
852
|
-
|
|
853
622
|
```typescript
|
|
854
623
|
import { XMLParser } from 'docxmlater/internal';
|
|
855
624
|
|
|
856
|
-
|
|
857
|
-
const obj = XMLParser.parseToObject(xml, {
|
|
858
|
-
maxNestingDepth: 512, // Increase if needed
|
|
859
|
-
});
|
|
625
|
+
const obj = XMLParser.parseToObject(xml, { maxNestingDepth: 512 });
|
|
860
626
|
```
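
The path traversal checks listed above (`../`, absolute paths, URL-encoded traversal) can be sketched as a single predicate over an archive entry name. `isSafeArchiveEntry` is a hypothetical helper written for illustration; the library's actual validation may differ in both scope and detail.

```typescript
// Sketch: reject ZIP entry names that could escape the extraction root.
function isSafeArchiveEntry(entryName: string): boolean {
  let decoded: string;
  try {
    // Decode first so that %2e%2e is treated the same as ".."
    decoded = decodeURIComponent(entryName);
  } catch {
    return false; // malformed percent-encoding is treated as unsafe
  }
  const normalized = decoded.replace(/\\/g, '/');
  // Absolute POSIX paths and Windows drive-letter paths are rejected.
  if (normalized.startsWith('/') || /^[a-zA-Z]:/.test(normalized)) {
    return false;
  }
  // Any ".." path segment is rejected.
  return !normalized.split('/').includes('..');
}
```

Decoding before splitting is the design choice that matters here: checking the raw name alone would let an encoded `..` segment through.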
|
|
861
627
|
|
|
862
|
-
|
|
863
|
-
|
|
864
|
-
File paths within DOCX archives are validated to prevent directory traversal attacks:
|
|
628
|
+
---
|
|
865
629
|
|
|
866
|
-
|
|
867
|
-
- Blocks absolute paths
|
|
868
|
-
- Validates URL-encoded path components
|
|
869
|
-
|
|
870
|
-
### XML Injection Prevention
|
|
630
|
+
## TypeScript Support
|
|
871
631
|
|
|
872
|
-
|
|
632
|
+
Full type definitions are bundled with the package:
|
|
873
633
|
|
|
874
|
-
|
|
875
|
-
|
|
634
|
+
```typescript
|
|
635
|
+
import {
|
|
636
|
+
Document,
|
|
637
|
+
Paragraph,
|
|
638
|
+
Run,
|
|
639
|
+
Table,
|
|
640
|
+
RunFormatting,
|
|
641
|
+
ParagraphFormatting,
|
|
642
|
+
DocumentProperties,
|
|
643
|
+
} from 'docxmlater';
|
|
876
644
|
|
|
877
|
-
|
|
645
|
+
const formatting: RunFormatting = {
|
|
646
|
+
bold: true,
|
|
647
|
+
fontSize: 12,
|
|
648
|
+
color: 'FF0000',
|
|
649
|
+
};
|
|
878
650
|
|
|
879
|
-
|
|
651
|
+
const properties: DocumentProperties = {
|
|
652
|
+
title: 'My Document',
|
|
653
|
+
author: 'Jane Doe',
|
|
654
|
+
created: new Date(),
|
|
655
|
+
};
|
|
656
|
+
```
|
|
880
657
|
|
|
881
|
-
|
|
658
|
+
---
|
|
882
659
|
|
|
883
660
|
## Requirements
|
|
884
661
|
|
|
885
662
|
- Node.js 18.0.0 or higher
|
|
886
663
|
- TypeScript 5.0+ (for development)
|
|
887
664
|
|
|
888
|
-
|
|
889
|
-
|
|
890
|
-
- `jszip` - ZIP archive handling
|
|
665
|
+
Single runtime dependency: `jszip`.
|
|
891
666
|
|
|
892
|
-
|
|
893
|
-
|
|
894
|
-
MIT
|
|
667
|
+
---
|
|
895
668
|
|
|
896
669
|
## Contributing
|
|
897
670
|
|
|
898
|
-
Contributions welcome
|
|
671
|
+
Contributions are welcome. Please:
|
|
899
672
|
|
|
900
673
|
1. Fork the repository
|
|
901
674
|
2. Create a feature branch
|
|
902
|
-
3. Add tests for new
|
|
903
|
-
4. Ensure
|
|
904
|
-
5.
|
|
675
|
+
3. Add tests for any new functionality
|
|
676
|
+
4. Ensure the full test suite passes (`npm test`)
|
|
677
|
+
5. Open a pull request
|
|
905
678
|
|
|
906
|
-
|
|
679
|
+
If you have a use case that is not yet supported, opening an issue first is the best way to discuss design before code.
|
|
907
680
|
|
|
908
|
-
|
|
681
|
+
---
|
|
909
682
|
|
|
910
|
-
##
|
|
683
|
+
## License
|
|
911
684
|
|
|
912
|
-
|
|
685
|
+
MIT
|