docxmlater 1.16.0 → 1.17.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (54)
  1. package/README.md +491 -259
  2. package/dist/core/Document.d.ts +13 -2
  3. package/dist/core/Document.d.ts.map +1 -1
  4. package/dist/core/Document.js +155 -7
  5. package/dist/core/Document.js.map +1 -1
  6. package/dist/core/DocumentParser.d.ts.map +1 -1
  7. package/dist/core/DocumentParser.js +41 -42
  8. package/dist/core/DocumentParser.js.map +1 -1
  9. package/dist/core/DocumentValidator.d.ts.map +1 -1
  10. package/dist/core/DocumentValidator.js +4 -3
  11. package/dist/core/DocumentValidator.js.map +1 -1
  12. package/dist/elements/Hyperlink.d.ts +1 -0
  13. package/dist/elements/Hyperlink.d.ts.map +1 -1
  14. package/dist/elements/Hyperlink.js +57 -36
  15. package/dist/elements/Hyperlink.js.map +1 -1
  16. package/dist/elements/ImageManager.d.ts.map +1 -1
  17. package/dist/elements/ImageManager.js +2 -1
  18. package/dist/elements/ImageManager.js.map +1 -1
  19. package/dist/elements/Paragraph.d.ts +3 -0
  20. package/dist/elements/Paragraph.d.ts.map +1 -1
  21. package/dist/elements/Paragraph.js +11 -2
  22. package/dist/elements/Paragraph.js.map +1 -1
  23. package/dist/elements/Run.d.ts.map +1 -1
  24. package/dist/elements/Run.js +6 -4
  25. package/dist/elements/Run.js.map +1 -1
  26. package/dist/elements/StructuredDocumentTag.d.ts +2 -0
  27. package/dist/elements/StructuredDocumentTag.d.ts.map +1 -1
  28. package/dist/elements/StructuredDocumentTag.js +5 -1
  29. package/dist/elements/StructuredDocumentTag.js.map +1 -1
  30. package/dist/elements/Table.d.ts.map +1 -1
  31. package/dist/elements/Table.js +3 -2
  32. package/dist/elements/Table.js.map +1 -1
  33. package/dist/formatting/AbstractNumbering.d.ts.map +1 -1
  34. package/dist/formatting/AbstractNumbering.js +2 -1
  35. package/dist/formatting/AbstractNumbering.js.map +1 -1
  36. package/dist/formatting/NumberingManager.d.ts.map +1 -1
  37. package/dist/formatting/NumberingManager.js +6 -5
  38. package/dist/formatting/NumberingManager.js.map +1 -1
  39. package/dist/formatting/Style.d.ts.map +1 -1
  40. package/dist/formatting/Style.js +2 -1
  41. package/dist/formatting/Style.js.map +1 -1
  42. package/dist/types/styleConfig.d.ts +1 -0
  43. package/dist/types/styleConfig.d.ts.map +1 -1
  44. package/dist/utils/deepClone.d.ts +3 -0
  45. package/dist/utils/deepClone.d.ts.map +1 -0
  46. package/dist/utils/deepClone.js +50 -0
  47. package/dist/utils/deepClone.js.map +1 -0
  48. package/dist/utils/validation.d.ts.map +1 -1
  49. package/dist/utils/validation.js +3 -2
  50. package/dist/utils/validation.js.map +1 -1
  51. package/dist/zip/ZipHandler.d.ts.map +1 -1
  52. package/dist/zip/ZipHandler.js +3 -2
  53. package/dist/zip/ZipHandler.js.map +1 -1
  54. package/package.json +6 -4
package/README.md CHANGED
@@ -1,259 +1,491 @@
1
- # DOCX Header Line Break Processor
2
-
3
- A TypeScript utility using the docXMLater framework to automatically insert line breaks after Header 2 elements within 1x1 tables in Microsoft Word documents.
4
-
5
- ## Understanding Bullet Points in DOCX/XML
6
-
7
- ### Structure Overview
8
-
9
- Bullet points in DOCX files involve two main components:
10
-
11
- 1. **Numbering Definitions** (`word/numbering.xml`)
12
- ```xml
13
- <w:abstractNum w:abstractNumId="1">
14
- <w:lvl w:ilvl="0">
15
- <w:numFmt w:val="bullet"/>
16
- <w:lvlText w:val="•"/>
17
- <w:lvlJc w:val="left"/>
18
- </w:lvl>
19
- </w:abstractNum>
20
- ```
21
-
22
- 2. **Paragraph References** (`word/document.xml`)
23
- ```xml
24
- <w:p>
25
- <w:pPr>
26
- <w:numPr>
27
- <w:ilvl w:val="0"/>
28
- <w:numId w:val="1"/>
29
- </w:numPr>
30
- </w:pPr>
31
- <w:r>
32
- <w:t>Bullet point text</w:t>
33
- </w:r>
34
- </w:p>
35
- ```
36
-
37
- ### Common Windows Bullet Symbols
38
-
39
- | Symbol | Unicode | Name | Usage |
40
- |--------|---------|------|-------|
41
- | • | U+2022 | Bullet | Default bullet |
42
- | ○ | U+25CB | White Circle | Secondary level |
43
- | ▪ | U+25AA | Black Square | Tertiary level |
44
- | ▫ | U+25AB | White Square | Alternative |
45
- | ◆ | U+25C6 | Black Diamond | Emphasis |
46
- | ➢ | U+27A2 | Arrow | Direction/action |
47
- | ✓ | U+2713 | Check Mark | Completed items |
48
-
49
- ## Features
50
-
51
- - Detects Header 2 (Heading2 style) within 1x1 tables
52
- - Checks for existing line breaks between table and next element
53
- - Inserts line break only when needed
54
- - Preserves document structure and formatting
55
- - ✅ Supports both low-level XML manipulation and high-level API
56
-
57
- ## Installation
58
-
59
- ```bash
60
- # Clone or create the project
61
- mkdir docx-processor
62
- cd docx-processor
63
-
64
- # Install dependencies
65
- npm install docxml jszip
66
- npm install -D typescript ts-node @types/node
67
-
68
- # Or using the provided package.json
69
- npm install
70
- ```
71
-
72
- ## Usage
73
-
74
- ### Command Line
75
-
76
- ```bash
77
- # Basic usage
78
- ts-node process-headers-in-tables.ts input.docx
79
-
80
- # With custom output file
81
- ts-node process-headers-in-tables.ts input.docx output.docx
82
-
83
- # Verbose mode
84
- ts-node process-headers-in-tables.ts input.docx output.docx --verbose
85
- ```
86
-
87
- ### As a Module
88
-
89
- ```typescript
90
- import { HeaderTableProcessor } from './process-headers-in-tables';
91
-
92
- const processor = new HeaderTableProcessor({
93
- inputFile: 'document.docx',
94
- outputFile: 'processed.docx',
95
- verbose: true
96
- });
97
-
98
- await processor.process();
99
- ```
100
-
101
- ## How It Works
102
-
103
- ### Detection Logic
104
-
105
- 1. **Table Identification**: Finds all `<w:tbl>` elements
106
- 2. **1x1 Verification**: Counts rows (`<w:tr>`) and cells (`<w:tc>`)
107
- 3. **Header 2 Check**: Looks for `<w:pStyle w:val="Heading2">`
108
- 4. **Gap Analysis**: Examines content after table for existing breaks
109
-
110
- ### Line Break Insertion
111
-
112
- The processor inserts an empty paragraph when:
113
- - Table is exactly 1x1
114
- - Contains Header 2 style
115
- - No line break exists after table
116
- - Next element is not a section break
117
-
118
- ### XML Structure Added
119
-
120
- ```xml
121
- <!-- Empty paragraph for line break -->
122
- <w:p w:rsidR="00AB12CD" w:rsidRDefault="00AB12CD">
123
- <w:pPr>
124
- <w:spacing w:after="0" w:before="0" w:line="240" w:lineRule="auto"/>
125
- </w:pPr>
126
- </w:p>
127
- ```
128
-
129
- ## Architecture
130
-
131
- ### Low-Level API (ZipHandler)
132
- - Direct XML manipulation
133
- - Full control over document structure
134
- - Best for complex transformations
135
-
136
- ### High-Level API (Document)
137
- - Object-oriented approach
138
- - Type-safe operations
139
- - Simpler for basic edits
140
-
141
- ### Hyperlink Defragmentation (v1.15.0+)
142
-
143
- Fix fragmented hyperlinks from Google Docs exports:
144
-
145
- ```typescript
146
- import { Document } from 'docxmlater';
147
-
148
- // Load document with fragmented hyperlinks
149
- const doc = await Document.load('google-docs-export.docx');
150
-
151
- // Basic defragmentation - merges hyperlinks with same URL
152
- const mergedCount = doc.defragmentHyperlinks();
153
- console.log(`Merged ${mergedCount} fragmented hyperlinks`);
154
-
155
- // With formatting reset - fixes corrupted fonts (e.g., Caveat)
156
- const fixedCount = doc.defragmentHyperlinks({
157
- resetFormatting: true
158
- });
159
-
160
- // Reset individual hyperlink formatting
161
- const hyperlinks = doc.getHyperlinks();
162
- for (const { hyperlink } of hyperlinks) {
163
- if (hyperlink.getFormatting().font === 'Caveat') {
164
- hyperlink.resetToStandardFormatting(); // Calibri, blue, underline
165
- }
166
- }
167
-
168
- await doc.save('fixed.docx');
169
- ```
170
-
171
- **Features:**
172
- - Merges non-consecutive hyperlinks with same URL
173
- - Fixes corrupted fonts from Google Docs (Caveat → Calibri)
174
- - Processes hyperlinks in tables and main content
175
- - Optional formatting reset to standard style
176
-
177
- ## Examples
178
-
179
- ### Example 1: Processing Multiple Files
180
-
181
- ```typescript
182
- const files = ['doc1.docx', 'doc2.docx', 'doc3.docx'];
183
-
184
- for (const file of files) {
185
- const processor = new HeaderTableProcessor({
186
- inputFile: file,
187
- verbose: false
188
- });
189
- await processor.process();
190
- console.log(`Processed: ${file}`);
191
- }
192
- ```
193
-
194
- ### Example 2: Custom Processing Logic
195
-
196
- ```typescript
197
- class CustomProcessor extends HeaderTableProcessor {
198
- protected createEmptyParagraph(): string {
199
- // Custom spacing or formatting
200
- return `<w:p>
201
- <w:pPr>
202
- <w:spacing w:after="200" w:before="100"/>
203
- </w:pPr>
204
- </w:p>`;
205
- }
206
- }
207
- ```
208
-
209
- ## Troubleshooting
210
-
211
- ### Document Won't Open
212
- - Validate XML syntax
213
- - Check for unclosed tags
214
- - Verify RSID format (8 hex characters)
215
-
216
- ### Line Breaks Not Appearing
217
- - Confirm Header 2 style name matches exactly
218
- - Check table structure (must be 1x1)
219
- - Verify output file is being saved
220
-
221
- ### Performance Issues
222
- - Use buffer operations for large files
223
- - Process in batches for multiple documents
224
- - Consider streaming for files > 10MB
225
-
226
- ## Testing
227
-
228
- Create a test document with:
229
- 1. Regular paragraphs
230
- 2. 1x1 table with Header 2
231
- 3. 2x2 table with Header 2 (should be ignored)
232
- 4. 1x1 table without Header 2 (should be ignored)
233
-
234
- Run the processor and verify only the 1x1 table with Header 2 gets a line break.
235
-
236
- ## Dependencies
237
-
238
- - `docxml` (docXMLater framework) - TypeScript DOCX manipulation
239
- - `jszip` - ZIP file handling
240
- - `typescript` - TypeScript compiler
241
- - `ts-node` - TypeScript execution
242
-
243
- ## License
244
-
245
- MIT
246
-
247
- ## Contributing
248
-
249
- 1. Fork the repository
250
- 2. Create your feature branch
251
- 3. Test your changes
252
- 4. Submit a pull request
253
-
254
- ## Notes
255
-
256
- - The docXMLater framework is accessed via npm package `docxml`
257
- - Original repository: https://github.com/wvbe/docxml
258
- - This implementation uses low-level ZIP/XML manipulation for precise control
259
- - RSID generation ensures Word tracks changes properly
1
+ # docXMLater
2
+
3
+ A comprehensive, production-ready TypeScript/JavaScript framework for creating, reading, and manipulating Microsoft Word (.docx) documents programmatically.
4
+
5
+ ## Features
6
+
7
+ ### Core Document Operations
8
+ - Create DOCX files from scratch
9
+ - Read and modify existing DOCX files
10
+ - Buffer-based operations (load/save from memory)
11
+ - Document properties (core, extended, custom)
12
+ - Memory management with dispose pattern
13
+
14
+ ### Text & Paragraph Formatting
15
+ - Character formatting: bold, italic, underline, strikethrough, subscript, superscript
16
+ - Font properties: family, size, color (RGB and theme colors), highlight
17
+ - Text effects: small caps, all caps, shadow, emboss, engrave
18
+ - Paragraph alignment, indentation, spacing, borders, shading
19
+ - Text search and replace with regex support
20
+ - Custom styles (paragraph, character, table)
21
+
22
+ ### Lists & Tables
23
+ - Numbered lists (decimal, roman, alpha)
24
+ - Bulleted lists with various bullet styles
25
+ - Multi-level lists with custom numbering
26
+ - Tables with formatting, borders, shading
27
+ - Cell spanning (merge cells horizontally and vertically)
28
+ - Advanced table properties (margins, widths, alignment)
29
+
30
+ ### Rich Content
31
+ - Images (PNG, JPEG, GIF, SVG) with positioning and text wrapping
32
+ - Headers & footers (different first page, odd/even pages)
33
+ - Hyperlinks (external URLs, internal bookmarks)
34
+ - Hyperlink defragmentation utility (fixes fragmented links from Google Docs)
35
+ - Bookmarks and cross-references
36
+ - Shapes and text boxes
37
+
38
+ ### Advanced Features
39
+ - Track changes (revisions for insertions, deletions, formatting)
40
+ - Comments and annotations
41
+ - Table of contents generation with customizable heading levels
42
+ - Fields: merge fields, date/time, page numbers, TOC fields
43
+ - Footnotes and endnotes
44
+ - Content controls (Structured Document Tags)
45
+ - Multiple sections with different page layouts
46
+ - Page orientation, size, and margins
47
+
48
+ ### Developer Tools
49
+ - Complete XML generation and parsing (ReDoS-safe, position-based parser)
50
+ - 40+ unit conversion functions (twips, EMUs, points, pixels, inches, cm)
51
+ - Validation utilities and corruption detection
52
+ - Full TypeScript support with comprehensive type definitions
53
+ - Error handling utilities
54
+ - Logging infrastructure with multiple log levels
55
+
56
+ ## Installation
57
+
58
+ ```bash
59
+ npm install docxmlater
60
+ ```
61
+
62
+ ## Quick Start
63
+
64
+ ### Creating a New Document
65
+
66
+ ```typescript
67
+ import { Document } from 'docxmlater';
68
+
69
+ // Create a new document
70
+ const doc = Document.create();
71
+
72
+ // Add a paragraph
73
+ const para = doc.createParagraph();
74
+ para.addText('Hello, World!', { bold: true, fontSize: 24 });
75
+
76
+ // Save to file
77
+ await doc.save('hello.docx');
78
+
79
+ // Don't forget to dispose
80
+ doc.dispose();
81
+ ```
82
+
83
+ ### Loading and Modifying Documents
84
+
85
+ ```typescript
86
+ import { Document } from 'docxmlater';
87
+
88
+ // Load existing document
89
+ const doc = await Document.load('input.docx');
90
+
91
+ // Find and replace text
92
+ doc.replaceText(/old text/g, 'new text');
93
+
94
+ // Add a new paragraph
95
+ const para = doc.createParagraph();
96
+ para.addText('Added paragraph', { italic: true });
97
+
98
+ // Save modifications
99
+ await doc.save('output.docx');
100
+ doc.dispose();
101
+ ```
102
+
103
+ ### Working with Tables
104
+
105
+ ```typescript
106
+ import { Document } from 'docxmlater';
107
+
108
+ const doc = Document.create();
109
+
110
+ // Create a 3x4 table
111
+ const table = doc.createTable(3, 4);
112
+
113
+ // Set header row
114
+ const headerRow = table.getRow(0);
115
+ headerRow.getCell(0).addParagraph().addText('Column 1', { bold: true });
116
+ headerRow.getCell(1).addParagraph().addText('Column 2', { bold: true });
117
+ headerRow.getCell(2).addParagraph().addText('Column 3', { bold: true });
118
+ headerRow.getCell(3).addParagraph().addText('Column 4', { bold: true });
119
+
120
+ // Add data
121
+ table.getRow(1).getCell(0).addParagraph().addText('Data 1');
122
+ table.getRow(1).getCell(1).addParagraph().addText('Data 2');
123
+
124
+ // Apply borders
125
+ table.setBorders({
126
+ top: { style: 'single', size: 4, color: '000000' },
127
+ bottom: { style: 'single', size: 4, color: '000000' },
128
+ left: { style: 'single', size: 4, color: '000000' },
129
+ right: { style: 'single', size: 4, color: '000000' },
130
+ insideH: { style: 'single', size: 4, color: '000000' },
131
+ insideV: { style: 'single', size: 4, color: '000000' }
132
+ });
133
+
134
+ await doc.save('table.docx');
135
+ doc.dispose();
136
+ ```
137
+
138
+ ### Adding Images
139
+
140
+ ```typescript
141
+ import { Document } from 'docxmlater';
142
+ import { readFileSync } from 'fs';
143
+
144
+ const doc = Document.create();
145
+
146
+ // Load image from file
147
+ const imageBuffer = readFileSync('photo.jpg');
148
+
149
+ // Add image to document
150
+ const para = doc.createParagraph();
151
+ await para.addImage(imageBuffer, {
152
+ width: 400,
153
+ height: 300,
154
+ format: 'jpg'
155
+ });
156
+
157
+ await doc.save('with-image.docx');
158
+ doc.dispose();
159
+ ```
160
+
161
+ ### Hyperlink Management
162
+
163
+ ```typescript
164
+ import { Document } from 'docxmlater';
165
+
166
+ const doc = await Document.load('document.docx');
167
+
168
+ // Get all hyperlinks
169
+ const hyperlinks = doc.getHyperlinks();
170
+ console.log(`Found ${hyperlinks.length} hyperlinks`);
171
+
172
+ // Update URLs in batch (30-50% faster than manual iteration)
173
+ doc.updateHyperlinkUrls('http://old-domain.com', 'https://new-domain.com');
174
+
175
+ // Fix fragmented hyperlinks from Google Docs
176
+ const mergedCount = doc.defragmentHyperlinks({
177
+ resetFormatting: true // Fix corrupted fonts
178
+ });
179
+ console.log(`Merged ${mergedCount} fragmented hyperlinks`);
180
+
181
+ await doc.save('updated.docx');
182
+ doc.dispose();
183
+ ```
184
+
185
+ ### Custom Styles
186
+
187
+ ```typescript
188
+ import { Document, Style } from 'docxmlater';
189
+
190
+ const doc = Document.create();
191
+
192
+ // Create custom paragraph style
193
+ const customStyle = new Style('CustomHeading', 'paragraph');
194
+ customStyle.setName('Custom Heading');
195
+ customStyle.setRunFormatting({
196
+ bold: true,
197
+ fontSize: 32,
198
+ color: '0070C0'
199
+ });
200
+ customStyle.setParagraphFormatting({
201
+ alignment: 'center',
202
+ spacingAfter: 240
203
+ });
204
+
205
+ // Add style to document
206
+ doc.getStylesManager().addStyle(customStyle);
207
+
208
+ // Apply style to paragraph
209
+ const para = doc.createParagraph();
210
+ para.addText('Styled Heading');
211
+ para.applyStyle('CustomHeading');
212
+
213
+ await doc.save('styled.docx');
214
+ doc.dispose();
215
+ ```
216
+
217
+ ## API Overview
218
+
219
+ ### Document Class
220
+
221
+ **Creation & Loading:**
222
+ - `Document.create(options?)` - Create new document
223
+ - `Document.load(filepath)` - Load from file
224
+ - `Document.loadFromBuffer(buffer)` - Load from memory
225
+
226
+ **Content Management:**
227
+ - `createParagraph()` - Add paragraph
228
+ - `createTable(rows, cols)` - Add table
229
+ - `createSection()` - Add section
230
+ - `getBodyElements()` - Get all body content
231
+
232
+ **Search & Replace:**
233
+ - `findText(pattern)` - Find text matches
234
+ - `replaceText(pattern, replacement)` - Replace text
235
+
236
+ **Hyperlinks:**
237
+ - `getHyperlinks()` - Get all hyperlinks
238
+ - `updateHyperlinkUrls(oldUrl, newUrl)` - Batch URL update
239
+ - `defragmentHyperlinks(options?)` - Fix fragmented links
240
+
241
+ **Statistics:**
242
+ - `getWordCount()` - Count words
243
+ - `getCharacterCount(includeSpaces?)` - Count characters
244
+ - `estimateSize()` - Estimate file size
245
+
246
+ **Saving:**
247
+ - `save(filepath)` - Save to file
248
+ - `toBuffer()` - Save to Buffer
249
+ - `dispose()` - Free resources (important!)
250
+
251
+ ### Paragraph Class
252
+
253
+ **Content:**
254
+ - `addText(text, formatting?)` - Add text run
255
+ - `addRun(run)` - Add custom run
256
+ - `addHyperlink(hyperlink)` - Add hyperlink
257
+ - `addImage(buffer, options)` - Add image
258
+
259
+ **Formatting:**
260
+ - `setAlignment(alignment)` - Left, center, right, justify
261
+ - `setIndentation(options)` - First line, hanging, left, right
262
+ - `setSpacing(options)` - Line spacing, before/after
263
+ - `setBorders(borders)` - Paragraph borders
264
+ - `setShading(shading)` - Background color
265
+ - `applyStyle(styleId)` - Apply paragraph style
266
+
267
+ **Properties:**
268
+ - `setKeepNext(value)` - Keep with next paragraph
269
+ - `setKeepLines(value)` - Keep lines together
270
+ - `setPageBreakBefore(value)` - Page break before
271
+
272
+ **Numbering:**
273
+ - `setNumbering(numId, level)` - Apply list numbering
274
+
275
+ ### Run Class
276
+
277
+ **Text:**
278
+ - `setText(text)` - Set run text
279
+ - `getText()` - Get run text
280
+
281
+ **Character Formatting:**
282
+ - `setBold(value)` - Bold text
283
+ - `setItalic(value)` - Italic text
284
+ - `setUnderline(style?)` - Underline
285
+ - `setStrikethrough(value)` - Strikethrough
286
+ - `setFont(name)` - Font family
287
+ - `setFontSize(size)` - Font size in points
288
+ - `setColor(color)` - Text color (hex)
289
+ - `setHighlight(color)` - Highlight color
290
+
291
+ **Advanced:**
292
+ - `setSubscript(value)` - Subscript
293
+ - `setSuperscript(value)` - Superscript
294
+ - `setSmallCaps(value)` - Small capitals
295
+ - `setAllCaps(value)` - All capitals
296
+
297
+ ### Table Class
298
+
299
+ **Structure:**
300
+ - `addRow()` - Add row
301
+ - `getRow(index)` - Get row by index
302
+ - `getCell(row, col)` - Get specific cell
303
+
304
+ **Formatting:**
305
+ - `setBorders(borders)` - Table borders
306
+ - `setAlignment(alignment)` - Table alignment
307
+ - `setWidth(width)` - Table width
308
+ - `setLayout(layout)` - Fixed or auto layout
309
+
310
+ **Style:**
311
+ - `applyStyle(styleId)` - Apply table style
312
+
313
+ ### TableCell Class
314
+
315
+ **Content:**
316
+ - `addParagraph()` - Add paragraph to cell
317
+ - `getParagraphs()` - Get all paragraphs
318
+
319
+ **Formatting:**
320
+ - `setBorders(borders)` - Cell borders
321
+ - `setShading(color)` - Cell background
322
+ - `setVerticalAlignment(alignment)` - Top, center, bottom
323
+ - `setWidth(width)` - Cell width
324
+
325
+ **Spanning:**
326
+ - `setHorizontalMerge(mergeType)` - Horizontal merge
327
+ - `setVerticalMerge(mergeType)` - Vertical merge
328
+
329
+ ### Utilities
330
+
331
+ **Unit Conversions:**
332
+ ```typescript
333
+ import { twipsToPoints, inchesToTwips, emusToPixels } from 'docxmlater';
334
+
335
+ const points = twipsToPoints(240); // 240 twips = 12 points
336
+ const twips = inchesToTwips(1); // 1 inch = 1440 twips
337
+ const pixels = emusToPixels(914400, 96); // 914400 EMUs = 96 pixels at 96 DPI
338
+ ```
339
+
340
+ **Validation:**
341
+ ```typescript
342
+ import { validateRunText, detectXmlInText, cleanXmlFromText } from 'docxmlater';
343
+
344
+ // Detect XML patterns in text
345
+ const result = validateRunText('Some <w:t>text</w:t>');
346
+ if (result.hasXml) {
347
+ console.warn(result.message);
348
+ const cleaned = cleanXmlFromText(result.text);
349
+ }
350
+ ```
351
+
352
+ **Corruption Detection:**
353
+ ```typescript
354
+ import { detectCorruptionInDocument } from 'docxmlater';
355
+
356
+ const doc = await Document.load('suspect.docx');
357
+ const report = detectCorruptionInDocument(doc);
358
+
359
+ if (report.isCorrupted) {
360
+ console.log(`Found ${report.locations.length} corruption issues`);
361
+ report.locations.forEach(loc => {
362
+ console.log(`Line ${loc.lineNumber}: ${loc.issue}`);
363
+ console.log(`Suggested fix: ${loc.suggestedFix}`);
364
+ });
365
+ }
366
+ ```
367
+
368
+ ## TypeScript Support
369
+
370
+ Full TypeScript definitions included:
371
+
372
+ ```typescript
373
+ import {
374
+ Document,
375
+ Paragraph,
376
+ Run,
377
+ Table,
378
+ RunFormatting,
379
+ ParagraphFormatting,
380
+ DocumentProperties
381
+ } from 'docxmlater';
382
+
383
+ // Type-safe formatting
384
+ const formatting: RunFormatting = {
385
+ bold: true,
386
+ fontSize: 12,
387
+ color: 'FF0000'
388
+ };
389
+
390
+ // Type-safe document properties
391
+ const properties: DocumentProperties = {
392
+ title: 'My Document',
393
+ author: 'John Doe',
394
+ created: new Date()
395
+ };
396
+ ```
397
+
398
+ ## Version History
399
+
400
+ **Current Version: 1.16.0**
401
+
402
+ See [CHANGELOG.md](CHANGELOG.md) for detailed version history.
403
+
404
+ ## RAG-CLI Integration (Development Only)
405
+
406
+ This project includes MCP (Model Context Protocol) configuration to allow Claude Code to access docXMLater documentation from Documentation_Hub during development.
407
+
408
+ **Note:** RAG-CLI uses `python-docx` for DOCX indexing, not docXMLater. These are complementary tools:
409
+ - **RAG-CLI**: Index DOCX files for search/retrieval (read-only)
410
+ - **docXMLater**: Create, modify, format DOCX files (read-write)
411
+
412
+ The `.mcp.json` configuration is for development assistance only and does not represent a runtime integration between the two projects.
413
+
414
+ ## Testing
415
+
416
+ The framework includes comprehensive test coverage:
417
+
418
+ - **2073+ test cases** across 59 test files
419
+ - Tests cover all phases of implementation
420
+ - Integration tests for complex scenarios
421
+ - Performance benchmarks
422
+ - Edge case validation
423
+
424
+ Run tests:
425
+
426
+ ```bash
427
+ npm test # Run all tests
428
+ npm run test:watch # Watch mode
429
+ npm run test:coverage # Coverage report
430
+ ```
431
+
432
+ ## Performance Considerations
433
+
434
+ - Use `dispose()` to free resources after document operations
435
+ - Buffer-based operations are faster than file I/O
436
+ - Batch hyperlink updates are 30-50% faster than manual iteration
437
+ - Large documents (1000+ pages) supported with memory management
438
+ - Streaming support for very large files
439
+
440
+ ## Architecture
441
+
442
+ The framework follows a modular architecture:
443
+
444
+ ```
445
+ src/
446
+ ├── core/ # Document, Parser, Generator, Validator
447
+ ├── elements/ # Paragraph, Run, Table, Image, etc.
448
+ ├── formatting/ # Style, Numbering managers
449
+ ├── managers/ # Drawing, Image, Relationship managers
450
+ ├── xml/ # XML generation and parsing
451
+ ├── zip/ # ZIP archive handling
452
+ └── utils/ # Validation, units, error handling
453
+ ```
454
+
455
+ Key design principles:
456
+ - KISS (Keep It Simple, Stupid) - no over-engineering
457
+ - Position-based XML parsing (ReDoS-safe)
458
+ - Defensive programming with comprehensive validation
459
+ - Memory-efficient with explicit disposal pattern
460
+ - Full ECMA-376 (OpenXML) compliance
461
+
462
+ ## Requirements
463
+
464
+ - Node.js 18.0.0 or higher
465
+ - TypeScript 5.0+ (for development)
466
+
467
+ ## Dependencies
468
+
469
+ - `jszip` - ZIP archive handling
470
+
471
+ ## License
472
+
473
+ MIT
474
+
475
+ ## Contributing
476
+
477
+ Contributions welcome! Please:
478
+ 1. Fork the repository
479
+ 2. Create a feature branch
480
+ 3. Add tests for new features
481
+ 4. Ensure all tests pass
482
+ 5. Submit a pull request
483
+
484
+ ## Support
485
+
486
+ - GitHub Issues: https://github.com/ItMeDiaTech/docXMLater/issues
487
+ - Documentation: See CLAUDE.md for detailed implementation notes
488
+
489
+ ## Acknowledgments
490
+
491
+ Built with careful attention to the ECMA-376 Office Open XML specification. Special thanks to the OpenXML community for comprehensive documentation and examples.
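
Note on the new README's "Unit Conversions" example: the values in its comments (240 twips = 12 points, 1 inch = 1440 twips, 914400 EMUs = 96 pixels at 96 DPI) follow from fixed OOXML unit ratios. The sketch below reproduces that arithmetic so it can be checked independently of the package. The function names ending in `Sketch` and the rounding in the pixel conversion are assumptions made for illustration; they are not the docxmlater implementations, which this diff does not show.

```typescript
// Illustrative only: hypothetical helpers, not the docxmlater source.
// Fixed ratios: 20 twips per point, 1440 twips per inch, 914400 EMUs per inch.
const TWIPS_PER_POINT = 20;
const TWIPS_PER_INCH = 1440;
const EMUS_PER_INCH = 914400;

// Twips are twentieths of a point.
function twipsToPointsSketch(twips: number): number {
  return twips / TWIPS_PER_POINT;
}

// One inch is 72 points, hence 72 * 20 = 1440 twips.
function inchesToTwipsSketch(inches: number): number {
  return inches * TWIPS_PER_INCH;
}

// EMUs -> inches -> pixels at the given DPI.
// Rounding to whole pixels is an assumption of this sketch.
function emusToPixelsSketch(emus: number, dpi: number = 96): number {
  return Math.round((emus / EMUS_PER_INCH) * dpi);
}

console.log(twipsToPointsSketch(240));        // 12
console.log(inchesToTwipsSketch(1));          // 1440
console.log(emusToPixelsSketch(914400, 96));  // 96
```

The three printed values match the comments in the README example, so the documented figures are at least internally consistent with the standard ratios.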