otomate 0.2.0 → 0.3.0

@@ -0,0 +1,484 @@
1
+ # HTML → docx → HTML, with tracked changes
2
+
3
+ A complete, end-to-end walkthrough of converting HTML to a Word document with
4
+ tracked changes, reimporting it, accepting the revisions, and confirming the
5
+ round-trip is lossless. Every step here is backed by a real test at
6
+ `packages/otomate/src/__tests__/e2e-tracked-changes.test.ts` — if you copy the
7
+ snippets verbatim they will run.
8
+
9
+ > **Prerequisites.** Node ≥ 20, `otomate` installed. All `writeDocx` /
10
+ > `readDocx` / `writeDiffDocx` calls are **async** — always `await` them.
11
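+ > `readHtml` / `writeHtml` / `renderDiffHtml` / `diff` are synchronous. The snippets
+ > use top-level `await`, so run them as ES modules (for example an `.mts` file, or a
+ > `.ts` file in a package with `"type": "module"`).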
12
+
13
+ ---
14
+
15
+ ## Overview — the 7 stages
16
+
17
+ ```
18
+ HTML → Stage 1 → UDM (oldTree)
19
+ ↓ edit
20
+ HTML' → Stage 2 → UDM (newTree)
21
+
22
+ diff(old, new) → Stage 3 → DiffResult + annotated HTML preview
23
+
24
+ writeDiffDocx → Stage 4 → .docx with <w:ins> / <w:del>
25
+
26
+ readDocx → Stage 5 → UDM + tracked-changes HTML render
27
+ ↓ "accept all"
28
+ writeHtml → Stage 6 → plain HTML, revisions applied
29
+
30
+ assert equal → Stage 7 → writeHtml(newTree) // lossless round-trip
31
+ ```
32
+
33
+ The invariant that proves everything works: **the output of stage 6 must be
34
+ byte-for-byte identical to `writeHtml(newTree)`**. If that equality holds, the
35
+ diff you computed in stage 3 survived the docx round-trip intact, and "accept
36
+ all revisions" recovered exactly the edit you wanted.
37
+
38
+ ---
39
+
40
+ ## Stage 1 — Generate rich HTML and parse it
41
+
42
+ ```typescript
43
+ import { readHtml } from "otomate";
44
+
45
+ const originalHtml = `<style>
46
+ .title { color: #1e3a5f; font-family: Georgia; font-size: 24pt; }
47
+ .callout { background-color: #fff8e1; border: 1pt solid #fbbf24; }
48
+ .critical { color: #dc2626; font-weight: bold; }
49
+ </style>
50
+ <h1 class="title">Q1 2024 Product Roadmap</h1>
51
+ <p>Welcome to the <strong>first quarter</strong> roadmap covering
52
+ our <em>strategic priorities</em>.</p>
53
+ <h2>Initiatives</h2>
54
+ <ul>
55
+ <li>
56
+ <p>Infrastructure improvements</p>
57
+ <ul>
58
+ <li>
59
+ <p>Database migration</p>
60
+ <blockquote>
61
+ <p>The migration must complete before <u>March 15th</u>.</p>
62
+ </blockquote>
63
+ </li>
64
+ </ul>
65
+ </li>
66
+ </ul>
67
+ <table>
68
+ <thead><tr><th>Metric</th><th>Q4</th><th>Q1 Target</th></tr></thead>
69
+ <tbody><tr><td>Revenue</td><td>$1.2M</td><td>$1.5M</td></tr></tbody>
70
+ </table>
71
+ <div class="callout">
72
+ <p class="critical">Risks identified:</p>
73
+ <ul><li><p>Integration delays</p></li></ul>
74
+ </div>`;
75
+
76
+ const oldTree = readHtml(originalHtml);
77
+ ```
78
+
79
+ **What's in `oldTree`:**
80
+
81
+ - The UDM tree — a `root` with block children (`h1`, `p`, `h2`, `ul`, `table`, `div`).
82
+ - `oldTree.data.css.classRules` — auto-extracted from the inline `<style>` block (no need to pass `options.css` explicitly; it merges with whatever you do pass).
83
+ - `classes: string[]` on every element that had an HTML `class` attribute.
84
+ - Nesting depth ≥ 5 (the blockquote path alone goes `ul → li → ul → li → blockquote → p → text`).
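You can check that nesting claim with a small depth helper. This sketch assumes UDM nodes look like unist nodes (a `type` string plus an optional `children` array). That shape is an assumption based on the `root`/children description above, and `UdmNode`, `maxDepth`, and `chainOf` are hypothetical names, not otomate exports.

```typescript
// Assumed, simplified node shape (unist-style); a stand-in for the real UDM type.
interface UdmNode {
  type: string;
  children?: UdmNode[];
}

// Length of the longest parent→child chain below `node` (a leaf has depth 0).
function maxDepth(node: UdmNode): number {
  if (!node.children || node.children.length === 0) return 0;
  return 1 + Math.max(...node.children.map(maxDepth));
}

// Build a single-branch tree, listed from the outermost type to the innermost.
function chainOf(types: string[]): UdmNode {
  let node: UdmNode = { type: types[types.length - 1] ?? "text" };
  for (let i = types.length - 2; i >= 0; i--) {
    node = { type: types[i] ?? "unknown", children: [node] };
  }
  return node;
}

// The blockquote path from the Stage 1 HTML, starting at the root.
const blockquotePath = chainOf(["root", "ul", "li", "ul", "li", "blockquote", "p", "text"]);
console.log(maxDepth(blockquotePath)); // 7
```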
85
+
86
+ ---
87
+
88
+ ## Stage 2 — Modify the document
89
+
90
+ Generate an edited version. For the diff engine to exercise every code path
91
+ you want a mix of edit kinds:
92
+
93
+ | Kind | Effect in docx output |
94
+ |---|---|
95
+ | Text change | `<w:del>` + `<w:ins>` on a per-character/word run |
96
+ | Root-level paragraph insertion | `<w:ins>` wrapping each run of the new paragraph |
97
+ | Root-level node deletion | `<w:del>` wrapping the deleted block's text |
98
+
99
+ ```typescript
100
+ let modifiedHtml = originalHtml;
101
+ modifiedHtml = modifiedHtml.replace(
102
+ "Q1 2024 Product Roadmap",
103
+ "Q1 2024 Product & Engineering Roadmap",
104
+ );
105
+ modifiedHtml = modifiedHtml.replace("March 15th", "March 30th");
106
+ modifiedHtml = modifiedHtml.replace("$1.5M", "$1.8M");
107
+ modifiedHtml = modifiedHtml.replace(
108
+ "</h1>",
109
+ `</h1>\n<p class="summary">This quarter focuses on scale and accessibility.</p>`,
110
+ );
111
+ // Delete the entire callout div (a root-level node)
112
+ modifiedHtml = modifiedHtml.replace(/<div class="callout">[\s\S]*?<\/div>/, "");
113
+
114
+ const newTree = readHtml(modifiedHtml);
115
+ ```
116
+
117
+ > ⚠️ **Only root-level deletes render as `<w:del>` in the docx.**
118
+ > `writeDiffDocx` renders deletes via a Pass 2 that filters for
119
+ > `op.path.length === 1`. A delete nested inside a list or table will survive
120
+ > in the snapshot (so the round-trip still works) but **will not show up as a
121
+ > revision mark** in Word. See troubleshooting #2 below.
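One caveat if you adapt the edit code above: when the second argument of `String.prototype.replace` is a string, JavaScript scans it for `$` patterns such as `$&`, `$$`, and `$1`. `"$1.8M"` happens to come through unchanged because a string search pattern has no capture groups, so `$1` is kept literally. A replacement like `"$$2M"` would not survive, though. Passing a function replacer inserts the text verbatim. `replaceLiteral` is a hypothetical helper, not part of otomate.

```typescript
// Hypothetical helper: replace the first literal occurrence of `find` with
// `replacement`, inserted verbatim. The return value of a replacer function
// is not scanned for `$` patterns, unlike a replacement string.
function replaceLiteral(source: string, find: string, replacement: string): string {
  return source.replace(find, () => replacement);
}

const row = "<td>$1.5M</td>";

// String replacement: "$$" collapses to a single "$".
console.log(row.replace("$1.5M", "$$2M")); // <td>$2M</td>

// Function replacer: the text goes in exactly as written.
console.log(replaceLiteral(row, "$1.5M", "$$2M")); // <td>$$2M</td>
```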
122
+
123
+ ---
124
+
125
+ ## Stage 3 — Diff and render an annotated HTML preview
126
+
127
+ ```typescript
128
+ import { diff, renderDiffHtml } from "otomate";
129
+
130
+ const delta = diff(oldTree, newTree);
131
+ console.log(delta.stats);
132
+ // { nodesAdded: 1, nodesDeleted: 1, nodesMoved: 0, nodesModified: 0, textChanges: 3 }
133
+
134
+ // The signature is (oldTree, newTree, delta) — NOT (delta, oldTree, newTree).
135
+ // See troubleshooting #1 below.
136
+ const diffHtml = renderDiffHtml(oldTree, newTree, delta);
137
+ ```
138
+
139
+ `diffHtml` is a standalone HTML string with `<ins>` and `<del>` markers plus
140
+ `data-diff="insert|delete|update"` attributes. It's the canonical preview
141
+ format — use it to show reviewers what will change before you commit the
142
+ edit to a Word document.
143
+
144
+ **Accepted options:** `{ insClass, delClass, modClass, moveClass, inlineStyles, side }`. `side: "old" | "new" | "merged"` controls whether you render the old view, the new view, or a combined view with both insertions and deletions visible. The default is `"merged"`, which is what you want for a diff preview.
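A cheap sanity check on the preview is to count the `data-diff` markers it contains and compare the totals with what you expect from your edits. This sketch is a plain string scan. It assumes the attribute is written exactly as `data-diff="..."` with double quotes, as described above. `countDiffMarkers` is a hypothetical helper, not an otomate API, and how its counts relate to `delta.stats` depends on how the renderer annotates nested nodes.

```typescript
// Hypothetical helper: tally data-diff="insert|delete|update" markers in a
// rendered preview. Assumes double-quoted values. Attributes such as
// data-diff-id are not counted, because the name must be followed by "=".
type DiffKind = "insert" | "delete" | "update";

function countDiffMarkers(html: string): Record<DiffKind, number> {
  const counts: Record<DiffKind, number> = { insert: 0, delete: 0, update: 0 };
  for (const m of html.matchAll(/\sdata-diff="(insert|delete|update)"/g)) {
    counts[m[1] as DiffKind]++;
  }
  return counts;
}

const preview =
  '<h1 data-diff="update">Roadmap <ins data-diff="insert">Eng</ins></h1>' +
  '<div data-diff="delete">Risks</div>';
const counts = countDiffMarkers(preview);
console.log(counts.insert, counts.delete, counts.update); // 1 1 1
```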
145
+
146
+ ---
147
+
148
+ ## Stage 4 — Write a tracked-changes `.docx`
149
+
150
+ ```typescript
151
+ import { writeDiffDocx } from "otomate";
152
+ import { writeFileSync } from "node:fs";
153
+
154
+ const trackedBuf = await writeDiffDocx(newTree, delta, {
155
+ author: "Jane Editor",
156
+ date: "2024-04-07T12:00:00Z",
157
+ });
158
+ writeFileSync("roadmap-tracked.docx", trackedBuf);
159
+ ```
160
+
161
+ The resulting file opens in Microsoft Word with the changes shown as revisions (nested deletes excepted; see the warning in stage 2). Word's
162
+ **Review → Accept All** and **Review → Reject All** buttons work correctly
163
+ because each `<w:ins>` / `<w:del>` has a unique `w:id` and carries the author
164
+ and date you supplied.
165
+
166
+ **What the writer does internally:**
167
+
168
+ 1. Builds lookup maps of insert/delete/updateText operations indexed by tree path.
169
+ 2. Renders `newTree` with diff-aware converters that wrap inserted runs in `<w:ins>` and split text changes into `<w:del>` (old) + `<w:ins>` (new) segments.
170
+ 3. **Pass 2** splices root-level deleted blocks into their approximate old position, each wrapped in `<w:del>` with the deleted block's text content extracted recursively.
171
+ 4. Embeds the UDM snapshot (bound to the `document.xml` hash) so `readDocx` can round-trip perfectly.
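Step 1 can be sketched in a few lines. The `DiffOp` shape here (a `type` plus a numeric `path` from the root) is an assumption inferred from the `op.path.length === 1` filter quoted above. The real `delta.operations` entries carry more fields, and `indexByPath` is a hypothetical name.

```typescript
// Assumed, simplified operation shape; real ops carry more data.
interface DiffOp {
  type: string; // e.g. "insert", "delete", "updateText"
  path: number[]; // child indices from the root, e.g. [3, 0, 1]
}

// Group operations of one kind by path, so a converter rendering the node
// at a given path can ask "is this an insert / text change?" in O(1).
function indexByPath(ops: readonly DiffOp[], type: string): Map<string, DiffOp[]> {
  const byPath = new Map<string, DiffOp[]>();
  for (const op of ops) {
    if (op.type !== type) continue;
    const key = op.path.join("/");
    const bucket = byPath.get(key);
    if (bucket) bucket.push(op);
    else byPath.set(key, [op]);
  }
  return byPath;
}

const ops: DiffOp[] = [
  { type: "updateText", path: [0, 0] },
  { type: "insert", path: [1] },
  { type: "updateText", path: [4, 0, 1] },
];
const textEdits = indexByPath(ops, "updateText");
console.log(textEdits.has("0/0"), textEdits.has("1")); // true false
```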
172
+
173
+ **Verify the XML actually contains tracked changes:**
174
+
175
+ ```typescript
176
+ import { extractDocx } from "otomate";
177
+
178
+ const parts = await extractDocx(trackedBuf);
179
+ const docXml = parts.document;
180
+
181
+ // Both revision types must be present.
182
+ if (!docXml.includes("<w:ins")) throw new Error("no insertions emitted");
183
+ if (!docXml.includes("<w:del")) throw new Error("no deletions emitted");
184
+
185
+ // Every revision id must be unique.
186
+ const ids = [...docXml.matchAll(/<w:(?:ins|del)\s+w:id="(\d+)"/g)].map(m => m[1]);
187
+ if (new Set(ids).size !== ids.length) throw new Error("duplicate revision ids");
188
+ ```
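The same string-level approach extends to author and date. Every `<w:ins>` / `<w:del>` start tag should carry the `w:author` and `w:date` you passed to `writeDiffDocx`. This is a regex sketch rather than a full XML parse. It assumes double-quoted attributes, like the id check above, and an author string that XML escaping leaves unchanged. `revisionTagsMissing` is a hypothetical name.

```typescript
// Hypothetical check: return every <w:ins>/<w:del> start tag whose w:author
// or w:date differs from the expected values. The \b after the element name
// keeps <w:delText> out of the match. Assumes `author` contains no characters
// that XML escaping would change (such as & or <).
function revisionTagsMissing(docXml: string, author: string, date: string): string[] {
  const bad: string[] = [];
  for (const [tag] of docXml.matchAll(/<w:(?:ins|del)\b[^>]*>/g)) {
    const hasAuthor = tag.includes(`w:author="${author}"`);
    const hasDate = tag.includes(`w:date="${date}"`);
    if (!hasAuthor || !hasDate) bad.push(tag);
  }
  return bad;
}

const sampleXml =
  '<w:ins w:id="1" w:author="Jane Editor" w:date="2024-04-07T12:00:00Z"><w:r><w:t>new</w:t></w:r></w:ins>' +
  '<w:del w:id="2" w:author="Jane Editor" w:date="2024-04-07T12:00:00Z"><w:r><w:delText>old</w:delText></w:r></w:del>';

console.log(revisionTagsMissing(sampleXml, "Jane Editor", "2024-04-07T12:00:00Z")); // []
```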
189
+
190
+ ---
191
+
192
+ ## Stage 5 — Re-import the `.docx` and render tracked HTML
193
+
194
+ ```typescript
195
+ import { readDocx } from "otomate";
196
+
197
+ const rereadTree = await readDocx(trackedBuf);
198
+ ```
199
+
200
+ `readDocx` sees `word/otomate-udm.json` inside the ZIP, validates its
201
+ `__docHash` against the current `document.xml`, and (because we haven't
202
+ touched the file) returns `newTree` back from the snapshot — lossless.
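The snapshot check follows a content-binding pattern. You store a hash of `document.xml` next to the snapshot and trust the snapshot only while that hash still matches. This sketch shows the idea with SHA-256 from `node:crypto`. The digest algorithm, the `BoundSnapshot` shape, and the `bindSnapshot` / `snapshotIsFresh` names are assumptions for illustration. They do not describe otomate's actual `__docHash` format.

```typescript
import { createHash } from "node:crypto";

// Hypothetical snapshot wrapper: the tree plus the hash of the XML it was
// written alongside.
interface BoundSnapshot<T> {
  __docHash: string;
  tree: T;
}

function sha256Hex(text: string): string {
  return createHash("sha256").update(text, "utf8").digest("hex");
}

function bindSnapshot<T>(tree: T, documentXml: string): BoundSnapshot<T> {
  return { __docHash: sha256Hex(documentXml), tree };
}

// Any edit to document.xml (e.g. saving the file in Word) changes the hash,
// so a reader using this scheme would fall back to parsing the OOXML.
function snapshotIsFresh<T>(snap: BoundSnapshot<T>, documentXml: string): boolean {
  return snap.__docHash === sha256Hex(documentXml);
}

const xml = "<w:document><w:body/></w:document>";
const snap = bindSnapshot({ type: "root" }, xml);
console.log(snapshotIsFresh(snap, xml)); // true
console.log(snapshotIsFresh(snap, xml.replace("<w:body/>", "<w:body><w:p/></w:body>"))); // false
```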
203
+
204
+ To get **HTML with the tracked changes rendered**, i.e. a view
205
+ showing both the inserted and the deleted content so a reviewer can see what
206
+ will change — feed `rereadTree` back through `renderDiffHtml` with the
207
+ **same delta** you computed in stage 3:
208
+
209
+ ```typescript
210
+ const rereadTrackedHtml = renderDiffHtml(oldTree, rereadTree, delta);
211
+ // Contains <ins> for insertions, <del> for deletions, data-diff-* markers.
212
+ ```
213
+
214
+ You need both `oldTree` and the delta because the OOXML path loses the
215
+ diff-operation metadata — `<w:ins>` / `<w:del>` tell Word what to render but
216
+ don't encode the structural tree mapping the diff engine needs to reproduce
217
+ the preview. Keep the `delta` object around if you want to re-render the
218
+ tracked-changes view later.
219
+
220
+ ---
221
+
222
+ ## Stage 6 — Accept all changes
223
+
224
+ Accepting all revisions is the same as taking the "new" side of the diff,
225
+ which is exactly what `rereadTree` already is (it came from the snapshot of
226
+ `newTree`). So "accept all" is just:
227
+
228
+ ```typescript
229
+ import { writeHtml } from "otomate";
230
+
231
+ const acceptedHtml = writeHtml(rereadTree);
232
+ ```
233
+
234
+ No `<ins>`, no `<del>`, no `data-diff-*` attributes — just clean HTML with
235
+ the edits applied.
236
+
237
+ If you need to **reject** all changes instead, serialize `oldTree`:
238
+
239
+ ```typescript
240
+ const rejectedHtml = writeHtml(oldTree); // equivalent to "reject all"
241
+ ```
242
+
243
+ For mixed accept/reject (per-revision decisions) you'd need to replay the
244
+ diff operations selectively against `oldTree`. The library doesn't currently
245
+ ship an `applyDiff(tree, opsToApply)` helper; if you need this, walk
246
+ `delta.operations` yourself and reconstruct the tree by picking which ops to
247
+ include.
248
+
249
+ ---
250
+
251
+ ## Stage 7 — Prove the round-trip is lossless
252
+
253
+ ```typescript
254
+ import assert from "node:assert/strict";
255
+
256
+ const expectedAcceptedHtml = writeHtml(newTree);
257
+ assert.equal(
258
+ acceptedHtml,
259
+ expectedAcceptedHtml,
260
+ "round-trip broke: accepted HTML does not match the expected modified HTML",
261
+ );
262
+ ```
263
+
264
+ This is the strongest possible assertion: byte-for-byte equality between
265
+ the accepted state (stage 6) and the direct serialization of `newTree`.
266
+ If the equality holds, every part of the pipeline — diff computation,
267
+ docx tracked-changes serialization, ZIP packing, OOXML re-parsing, snapshot
268
+ validation, HTML serialization — is lossless end to end.
269
+
270
+ **You can also spot-check individual changes:**
271
+
272
+ ```typescript
273
+ // Every insertion made it into the output.
274
+ assert.ok(acceptedHtml.includes("Engineering Roadmap")); // heading edit
275
+ assert.ok(acceptedHtml.includes("March 30th")); // blockquote edit
276
+ assert.ok(acceptedHtml.includes("$1.8M")); // table edit
277
+ assert.ok(acceptedHtml.includes("focuses on scale")); // new paragraph
278
+
279
+ // Every deletion was applied.
280
+ assert.ok(!acceptedHtml.includes("March 15th"));
281
+ assert.ok(!acceptedHtml.includes("$1.5M"));
282
+ assert.ok(!acceptedHtml.includes("Risks identified")); // deleted callout
283
+ ```
284
+
285
+ ---
286
+
287
+ ## Troubleshooting
288
+
289
+ Every item in this section is a trap we hit while building the test that
290
+ this guide is based on, **or** a subtle pitfall you will hit the first time
291
+ you integrate the library. Read the whole list before writing any code.
292
+
293
+ ### 1. `diffResult.operations is not iterable` / `Cannot read property 'type' of undefined` inside `renderDiffHtml`
294
+
295
+ **Cause.** Wrong argument order on `renderDiffHtml`. The correct signature is:
296
+
297
+ ```typescript
298
+ renderDiffHtml(oldTree, newTree, diffResult, options?)
299
+ ```
300
+
301
+ It is **not** `(diffResult, oldTree, newTree)`. An easy way to remember:
302
+ `renderDiffHtml` parallels `diff` in that the trees come first.
303
+
304
+ **Fix.** Pass the arguments in the order `(oldTree, newTree, delta)`.
305
+
306
+ ### 2. Deleted content is missing from the generated docx
307
+
308
+ **Symptom.** The diff engine reports a delete (visible in `delta.stats.nodesDeleted`), but Word opens the file with no `<w:del>` anywhere and the deleted content simply isn't there.
309
+
310
+ **Cause.** `writeDiffDocx` only renders **root-level** deletes as `<w:del>`. Its pass 2 filters for `op.path.length === 1`, so a delete nested inside a list item, table cell, or blockquote will not produce any OOXML revision mark. The tree snapshot still contains the correct state, so `readDocx` will round-trip correctly, but Word won't show a strike-through for that deletion.
311
+
312
+ **Fix.** If you need a deletion to show up as a tracked change in Word, restructure the edit so the deleted node is a direct child of `root`. If that's not possible (e.g. deleting a single list item), you have two options:
313
+
314
+ - Accept the limitation — the change still applies when the user clicks "Accept All", just without a visible revision mark for that specific delete.
315
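+ - Turn the delete into a text edit: keep the node in place and change its text instead of removing the node. The diff engine should then emit an `updateText` op, which *is* rendered as `<w:del>` + `<w:ins>` inside the paragraph. The trade-off is that the node itself (and whatever text you leave in it) survives "Accept All".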
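To find out up front which deletes will be silent in Word, filter the operations with the same `path.length === 1` rule. The op shape here is an assumption (a `type` string and a numeric `path`), so cast `delta.operations` to it or adapt the interface. `silentDeletes` is a hypothetical helper.

```typescript
// Assumed, simplified operation shape.
interface DiffOp {
  type: string;
  path: number[];
}

// Deletes deeper than a direct child of root get no <w:del> mark in Word.
function silentDeletes(ops: readonly DiffOp[]): DiffOp[] {
  return ops.filter((op) => op.type === "delete" && op.path.length > 1);
}

const silent = silentDeletes([
  { type: "delete", path: [6] }, // root-level: rendered as <w:del>
  { type: "delete", path: [3, 0, 1] }, // nested in a list: no revision mark
]);
for (const op of silent) {
  console.warn(`delete at ${op.path.join("/")} will not show as a tracked change`);
}
```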
316
+
317
+ ### 3. Tracked changes show up in the `.docx` but not in my re-rendered HTML preview
318
+
319
+ **Symptom.** Stage 4's XML contains `<w:ins>` and `<w:del>`, but stage 5's `renderDiffHtml` output has no `<ins>` or `<del>` markers.
320
+
321
+ **Cause.** You passed the wrong tree or delta to `renderDiffHtml`. The function needs **the original `oldTree`**, **the re-imported tree** (`rereadTree`, which equals `newTree` via the snapshot), and **the original delta** you computed in stage 3. Passing `(oldTree, rereadTree, delta)`, or equivalently `(oldTree, newTree, delta)`, renders correctly. Passing `(rereadTree, rereadTree, delta)` doesn't, because there's nothing to compare against.
322
+
323
+ **Fix.** Keep `oldTree` and `delta` in scope through the whole pipeline. Don't try to "rediscover" the diff from `rereadTree` alone — the snapshot path gives you a clean `newTree`, not the edit history.
324
+
325
+ ### 4. "File is corrupt" / "Word found unreadable content" dialog
326
+
327
+ If Word complains when opening your `.docx`, one of these usually matches:
328
+
329
+ | Symptom in the dialog | Cause | Fix |
330
+ |---|---|---|
331
+ | Word offers to "repair" the document | Schema-order violation inside `<w:rPr>` or `<w:pPr>`, or empty `<w:tr>` with no `<w:tc>`, or missing `<w:tblGrid>`, or invalid content in text | All of these are fixed inside the library. If you see one, update to the latest version. |
332
+ | "The name in the end tag... must match the start tag" | The text you fed in contained control characters that XML 1.0 forbids (`\x00`–`\x1F`, except tab `\x09`, line feed `\x0A`, and carriage return `\x0D`) | `esc()` strips these automatically via `sanitizeText`; you shouldn't hit this in normal use, but if you're passing binary data as text, clean it first |
333
+ | "Cannot find a part of the document" | You edited the `.docx` ZIP by hand and forgot to update `[Content_Types].xml` | Let the library produce the ZIP; don't unzip/rezip manually |
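If you do need to clean text yourself before handing it to otomate, a single regex can strip the characters that XML 1.0 forbids. This sketch follows the XML 1.0 `Char` production. It is not the library's `sanitizeText`, whose exact behavior may differ. For example, this version also removes lone surrogates and U+FFFE / U+FFFF. `stripInvalidXmlChars` is a hypothetical name.

```typescript
// Characters outside XML 1.0's Char production: C0 controls other than tab,
// LF and CR; lone (unpaired) surrogates; and U+FFFE / U+FFFF. With the `u`
// flag a valid surrogate pair is a single code point, so it is not matched.
const XML10_INVALID = /[\x00-\x08\x0B\x0C\x0E-\x1F\uD800-\uDFFF\uFFFE\uFFFF]/gu;

function stripInvalidXmlChars(text: string): string {
  return text.replace(XML10_INVALID, "");
}

console.log(JSON.stringify(stripInvalidXmlChars("a\u0000b\u0007c\td\ne")));
// "abc\td\ne"
```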
334
+
335
+ ### 5. `acceptedHtml` doesn't match `writeHtml(newTree)` exactly
336
+
337
+ **Symptom.** Stage 7's `assert.equal` fails. The two strings differ in whitespace, attribute order, or similar cosmetic details.
338
+
339
+ **Cause.** Something mutated the tree between stages. Common culprits:
340
+
341
+ - You modified `newTree` after computing `delta` — now `newTree` and the snapshot disagree.
342
+ - You wrote the docx, opened it in Word, saved it, then read it back — the snapshot hash is invalidated and `readDocx` falls back to OOXML parsing, which is lossier than the snapshot path. You'll see `div`/`figure` flattened into paragraphs, `data.html.*` attributes stripped, and custom marks dropped.
343
+ - You passed a different diff to `renderDiffHtml` in stage 5 than you used in stage 4.
344
+
345
+ **Fix.** Treat `oldTree`, `newTree`, and `delta` as immutable once computed. If you must edit in Word before round-tripping back, expect lossy results — the snapshot is only valid for documents otomate wrote and nothing has touched since.
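When the stage 7 assertion fails on two long single-line HTML strings, the failure output can be hard to scan. A small helper that reports the first differing offset, with some context, usually points straight at the culprit. `firstDifference` is a hypothetical name, and the default context width is arbitrary.

```typescript
// Report where two strings first diverge, with a little context on each side.
function firstDifference(
  actual: string,
  expected: string,
  context = 20,
): { index: number; actual: string; expected: string } | null {
  const limit = Math.min(actual.length, expected.length);
  let i = 0;
  while (i < limit && actual[i] === expected[i]) i++;
  if (i === limit && actual.length === expected.length) return null; // identical
  const from = Math.max(0, i - context);
  return {
    index: i,
    actual: actual.slice(from, i + context),
    expected: expected.slice(from, i + context),
  };
}

// A stray space after the opening tag:
const d = firstDifference('<p class="a">x</p>', '<p class="a"> x</p>');
console.log(d?.index); // 13
```

In the test you could call `firstDifference(acceptedHtml, expectedAcceptedHtml)` and include the result in the assertion message.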
346
+
347
+ ### 6. `readHtml` returns a tree with no CSS rules even though my HTML has a `<style>` block
348
+
349
+ **Symptom.** `(tree.data as any)?.css` is `undefined` after calling `readHtml(htmlWithStyle)`.
350
+
351
+ **Cause.** One of:
352
+
353
+ - Your `<style>` element has no class selectors or element selectors — e.g. it only has `@media` queries, pseudo-classes (`:hover`), attribute selectors, or `@keyframes`. The CSS parser only extracts simple class (`.foo`) and element (`h1`) selectors; everything else is silently skipped. This is by design — OOXML has no way to express `:hover` anyway.
354
+ - You're on an older version of `@otomate/html` that doesn't auto-extract inline `<style>` blocks. Before version 0.2, you had to pass the CSS as a string via `readHtml(html, { css: "..." })`. Current versions auto-extract.
355
+
356
+ **Fix.** Check the selectors you're using. If they're all simple class or element selectors, update to a version with auto-extraction. If you need `:hover`-style selectors for some reason, strip them to plain `.class` selectors before passing to `readHtml`.
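To see which selectors in a stylesheet are the simple kind described above, a rough top-level scan is enough. This hypothetical diagnostic uses a deliberately naive parser that does not handle braces or semicolons inside strings. It applies the rule stated in this section literally, and otomate's own parser may treat edge cases differently.

```typescript
// Hypothetical diagnostic: split a stylesheet's top-level selectors into
// "simple" ones (.foo or h1) and everything else.
function classifySelectors(css: string): { simple: string[]; skipped: string[] } {
  const simple: string[] = [];
  const skipped: string[] = [];
  const SIMPLE = /^(?:\.[A-Za-z_-][\w-]*|[A-Za-z][A-Za-z0-9]*)$/;
  const text = css.replace(/\/\*[\s\S]*?\*\//g, ""); // drop comments
  let depth = 0;
  let prelude = "";
  for (const ch of text) {
    if (ch === "{") {
      if (depth === 0) {
        const head = prelude.trim();
        if (head.startsWith("@")) {
          skipped.push(head); // @media, @keyframes, ...: the whole block
        } else {
          for (const sel of head.split(",").map((s) => s.trim()).filter(Boolean)) {
            (SIMPLE.test(sel) ? simple : skipped).push(sel);
          }
        }
        prelude = "";
      }
      depth++;
    } else if (ch === "}") {
      depth = Math.max(0, depth - 1);
    } else if (ch === ";" && depth === 0) {
      prelude = ""; // block-less at-rule such as @import url(x);
    } else if (depth === 0) {
      prelude += ch;
    }
  }
  return { simple, skipped };
}

const report = classifySelectors(`
  .title { color: #1e3a5f; }
  h1, .btn:hover { font-weight: bold; }
  @media print { .callout { border: none; } }
`);
console.log(report.simple); // [ '.title', 'h1' ]
console.log(report.skipped); // [ '.btn:hover', '@media print' ]
```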
357
+
358
+ ### 7. Custom CSS class names produce weird-looking Word styles
359
+
360
+ **Symptom.** A class named `"my-fancy class!"` shows up as `"my-fancyclass"` in the generated Word document, and Word style IDs truncate at 31 characters.
361
+
362
+ **Cause.** Style IDs in OOXML are restricted to `[A-Za-z0-9_\-:]` and a maximum of 31 characters per ECMA-376 §17.7.4.9. The library sanitizes CSS class names (via `sanitizeStyleId`) when mapping them to Word style IDs: disallowed characters are stripped, and names longer than 31 characters are truncated.
363
+
364
+ **Fix.** If you want 1:1 fidelity between CSS class names and Word style IDs, keep your class names alphanumeric (plus `_`, `-`, `:`) and ≤ 31 characters. If you can't, the sanitized name is what you'll see — but the styling will still apply, because the library generates a unique style element per sanitized ID.
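The mapping described in this section can be approximated in two steps: strip the characters outside `[A-Za-z0-9_\-:]`, then truncate to 31 characters. This is not otomate's `sanitizeStyleId`, which may handle collisions or empty results in ways not shown here. `toStyleId` is a hypothetical name.

```typescript
const MAX_STYLE_ID_LENGTH = 31;

// Approximation of the two rules described above, not the library's code.
function toStyleId(className: string): string {
  return className.replace(/[^A-Za-z0-9_\-:]/g, "").slice(0, MAX_STYLE_ID_LENGTH);
}

console.log(toStyleId("my-fancy class!")); // my-fancyclass
console.log(toStyleId("a".repeat(40)).length); // 31
```

Different class names can sanitize to the same ID (`a!b` and `ab`, for instance), so check for collisions if your class names are generated.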
365
+
366
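+ ### 8. `await` was skipped and `writeFileSync` received a `Promise`
367
+
368
+ **Symptom.** `writeFileSync` throws a `TypeError` with code `ERR_INVALID_ARG_TYPE`, saying the `data` argument must be a string or a `Buffer`, `TypedArray`, or `DataView`, but received a `Promise`. (Node versions before 14 instead coerced the value to a string and wrote the literal text `[object Promise]`.)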
369
+
370
+ **Cause.** You forgot to `await` a call to `writeDocx`, `writeDiffDocx`, or `readDocx`. Those three functions are async. `readHtml`, `writeHtml`, `renderDiffHtml`, and `diff` are synchronous. **Always check the return type.**
371
+
372
+ **Fix.**
373
+
374
+ ```typescript
375
+ // Wrong
376
+ writeFileSync("out.docx", writeDocx(tree));
377
+
378
+ // Right
379
+ writeFileSync("out.docx", await writeDocx(tree));
380
+ ```
381
+
382
+ ### 9. `readDocx(filePath)` throws `buffer.slice is not a function`
383
+
384
+ **Symptom.** You passed a string file path to `readDocx` and got a runtime error about `.slice` or `.byteLength`.
385
+
386
+ **Cause.** `readDocx` expects an `ArrayBuffer` or `Uint8Array`, **not** a filesystem path. It has no filesystem access — it's a pure buffer parser so it works the same in Node and the browser.
387
+
388
+ **Fix.**
389
+
390
+ ```typescript
391
+ import { readFileSync } from "node:fs";
392
+ import { readDocx } from "otomate";
393
+
394
+ const buf = readFileSync("input.docx"); // Buffer (subclass of Uint8Array)
395
+ const tree = await readDocx(buf);
396
+ ```
397
+
398
+ ### 10. Tests pass via `tsx` but `tsc --noEmit` fails with `Cannot find name 'Buffer'` or `Cannot find module 'node:crypto'`
399
+
400
+ **Symptom.** Running the package's test suite works fine (because it uses `tsx`), but running `tsc --noEmit` over a file that imports from `@otomate/docx` complains about missing Node globals.
401
+
402
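+ **Cause.** The `@otomate/docx` package uses `node:crypto` and `Buffer` for the snapshot hash and base64 fallback. Those declarations come from `@types/node`. In TypeScript 5.x, `tsc` includes every installed `@types/*` package automatically unless `compilerOptions.types` is set to an explicit list; in that case `"node"` must be in the list (or the file needs a `/// <reference types="node" />` directive). So the error means either `@types/node` isn't installed or your `types` list leaves it out. `tsx` strips types without type-checking, so none of this surfaces when the tests run.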
403
+
404
+ **Fix.** In your consuming project's `tsconfig.json`:
405
+
406
+ ```json
407
+ {
408
+ "compilerOptions": {
409
+ "types": ["node"]
410
+ }
411
+ }
412
+ ```
413
+
414
+ And install `@types/node` as a devDep.
415
+
416
+ ### 11. Hyperlinks I added programmatically collide with existing hyperlinks on round-trip
417
+
418
+ **Symptom.** You read a `.docx`, added hyperlinks to the tree, and wrote it back — now some hyperlinks point to the wrong URL.
419
+
420
+ **Cause.** OOXML hyperlinks are keyed by `r:id` into `word/_rels/document.xml.rels`. If the input file already had hyperlinks using `rId100`, `rId101`, etc., the writer needs to allocate fresh rIds that don't collide. The library handles this automatically via `nextRIdFor`, which scans the existing rels for the highest used rId and seeds past it, **but only if `tree.data.docx.relationships` is preserved on the input tree**. If you stripped `data.docx` from a tree that was read from an existing file, the allocator can't see which rIds are already taken and may hand out one that is.
421
+
422
+ **Fix.** When round-tripping, don't strip `tree.data.docx`. When building from scratch, don't worry — there are no pre-existing rIds to collide with.
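The allocation technique is simple to sketch. Scan the existing relationship IDs, find the highest numeric `rIdN`, and hand out numbers above it. This sketch assumes IDs follow the common `rId<number>` pattern. It is not otomate's `nextRIdFor`, and `makeRIdAllocator` is a hypothetical name.

```typescript
// Returns a function that yields fresh relationship IDs above every numeric
// rIdN already present in a .rels part. Non-numeric IDs are ignored; they
// cannot collide with the rIdN values generated here.
function makeRIdAllocator(relsXml: string): () => string {
  let highest = 0;
  for (const m of relsXml.matchAll(/\bId="rId(\d+)"/g)) {
    highest = Math.max(highest, Number(m[1]));
  }
  return () => `rId${++highest}`;
}

const rels =
  '<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">' +
  '<Relationship Id="rId1" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/styles" Target="styles.xml"/>' +
  '<Relationship Id="rId101" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/hyperlink" Target="https://example.com" TargetMode="External"/>' +
  "</Relationships>";

const nextRId = makeRIdAllocator(rels);
console.log(nextRId(), nextRId()); // rId102 rId103
```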
423
+
424
+ ### 12. Numbered lists in the output all continue from the previous list's counter
425
+
426
+ **Symptom.** You have two `<ol>` elements and the second one starts at "4" instead of "1".
427
+
428
+ **Cause.** Earlier versions of the library shared `numId="2"` across all ordered lists, so Word treated them as one continuous sequence. Current versions allocate a fresh `w:numId` per top-level list (via `allocNumId`) and inject matching `<w:num>` entries into `numbering.xml`, so each list gets its own counter.
429
+
430
+ **Fix.** Update to the latest version. If you're stuck on an old one, insert an explicit `start: 1` into the list node.
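The per-list numbering idea can be sketched as an allocator that gives each top-level list its own `w:num`, each pointing at a shared abstract definition. The element names (`w:num`, `w:abstractNumId`, `w:lvlOverride`, `w:startOverride`) are standard WordprocessingML. The allocator itself, the starting `numId`, and the explicit `startOverride` are assumptions for illustration, not otomate's `allocNumId`.

```typescript
// Hypothetical allocator: each call creates a new numbering instance that
// reuses an existing abstract list definition and restarts at `start`.
function makeNumAllocator(firstFreeNumId: number) {
  let next = firstFreeNumId;
  const numEntries: string[] = [];
  return {
    alloc(abstractNumId: number, start = 1): number {
      const numId = next++;
      numEntries.push(
        `<w:num w:numId="${numId}">` +
          `<w:abstractNumId w:val="${abstractNumId}"/>` +
          // Make the restart explicit instead of relying on the consumer.
          `<w:lvlOverride w:ilvl="0"><w:startOverride w:val="${start}"/></w:lvlOverride>` +
          `</w:num>`,
      );
      return numId;
    },
    // Splice into numbering.xml after all <w:abstractNum> elements.
    xml: () => numEntries.join(""),
  };
}

const nums = makeNumAllocator(3);
const firstList = nums.alloc(1);
const secondList = nums.alloc(1);
console.log(firstList, secondList); // 3 4
```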
431
+
432
+ ### 13. Diff engine drops changes deep inside a large subtree
433
+
434
+ **Symptom.** Editing a few words inside a paragraph nested six levels deep shows no change in the diff result.
435
+
436
+ **Cause.** The diff engine uses a Dice-coefficient threshold (default `0.5`) for bottom-up matching — subtrees that are more than 50% similar are considered "the same" and their differences are merged into a single `updateText` operation rather than generating insert/delete pairs. If the similarity falls just above the threshold, small edits can get lost in the noise.
437
+
438
+ **Fix.** Pass `diff(oldTree, newTree, { diceThreshold: 0.3 })` to make the matcher more sensitive. Lower values catch more fine-grained changes at the cost of more total operations.
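For intuition about the threshold, here is the Dice coefficient itself over two sets, 2·|A ∩ B| / (|A| + |B|). This guide doesn't say what the engine actually feeds into it (matched descendants, tokens, or something else), so the word sets below only illustrate the arithmetic. `dice` and `words` are hypothetical helpers.

```typescript
// Dice coefficient of two sets: 2|A ∩ B| / (|A| + |B|), in [0, 1].
function dice<T>(a: ReadonlySet<T>, b: ReadonlySet<T>): number {
  if (a.size === 0 && b.size === 0) return 1; // two empty sets: treat as identical
  let shared = 0;
  for (const x of a) if (b.has(x)) shared++;
  return (2 * shared) / (a.size + b.size);
}

const words = (s: string) => new Set(s.toLowerCase().split(/\s+/).filter(Boolean));
const before = words("The migration must complete before March 15th");
const after = words("The migration must complete before March 30th");
console.log(dice(before, after).toFixed(3)); // 0.857 (6 of 7 words shared)
```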
439
+
440
+ ### 14. Running the e2e test locally — stale `dist/` trap
441
+
442
+ **Symptom.** A test that imports from `otomate` (the umbrella package) fails with an assertion that looks wrong, e.g. "inline `<style>` should have been extracted into data.css", even though the source code clearly does the extraction.
443
+
444
+ **Cause.** The umbrella package imports from `@otomate/html`, `@otomate/docx`, etc. via their **built `dist/` directories**, not their source. If you made changes to a sub-package's source but haven't rebuilt it, the umbrella will still use the stale dist.
445
+
446
+ **Fix.** Before running tests that exercise the umbrella, rebuild the sub-packages:
447
+
448
+ ```bash
449
+ pnpm --filter @otomate/core --filter @otomate/diff --filter @otomate/css-docx \
450
+ --filter @otomate/inject --filter @otomate/html --filter @otomate/docx \
451
+ build
452
+ ```
453
+
454
+ Or use `tsx` to import directly from the sub-package sources if you're iterating rapidly.
455
+
456
+ ---
457
+
458
+ ## Running the test that backs this guide
459
+
460
+ The canonical reference implementation of this flow is a real test file:
461
+
462
+ ```bash
463
+ cd packages/otomate
464
+ pnpm test # runs every test including e2e-tracked-changes.test.ts
465
+ ```
466
+
467
+ Or to run just the e2e test:
468
+
469
+ ```bash
470
+ cd packages/otomate
471
+ node --test --import tsx src/__tests__/e2e-tracked-changes.test.ts
472
+ ```
473
+
474
+ The test exercises every stage in this guide with explicit assertions, so if
475
+ the library regresses on any part of the pipeline, that test will fail with
476
+ a message like `[stage 4] document.xml is missing any <w:del> elements` —
477
+ pinpointing exactly where the breakage is.
478
+
479
+ ## See also
480
+
481
+ - **`SKILL.md`** (next to this file) — condensed entry-point reference and pattern recipes
482
+ - **`README.md`** (repo root) — architecture overview and diff algorithm details
483
+ - **`packages/docx/src/__tests__/realworld.test.ts`** — end-to-end tests against a dozen real-world HTML fixtures
484
+ - **ECMA-376 §17.13.5** — the OOXML tracked-changes schema (`<w:ins>`, `<w:del>`, `<w:moveFrom>`, etc.) if you want to understand the output format at the byte level
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "otomate",
3
- "version": "0.2.0",
3
+ "version": "0.3.0",
4
4
  "description": "Universal document diffing library — structure-aware, string-level, multi-format",
5
5
  "type": "module",
6
6
  "main": "./dist/otomate.umd.cjs",
@@ -15,11 +15,14 @@
15
15
  },
16
16
  "files": [
17
17
  "dist",
18
- "README.md"
18
+ "README.md",
19
+ "SKILL.md",
20
+ "guides"
19
21
  ],
20
22
  "scripts": {
21
23
  "build": "vite build && tsc --emitDeclarationOnly",
22
- "typecheck": "tsc --noEmit"
24
+ "typecheck": "tsc --noEmit",
25
+ "test": "node --test --import tsx src/__tests__/*.test.ts"
23
26
  },
24
27
  "devDependencies": {
25
28
  "@otomate/core": "workspace:*",
@@ -28,6 +31,8 @@
28
31
  "@otomate/docx": "workspace:*",
29
32
  "@otomate/css-docx": "workspace:*",
30
33
  "@otomate/inject": "workspace:*",
34
+ "@types/node": "^25.5.2",
35
+ "tsx": "^4.19.0",
31
36
  "typescript": "^5.7.0",
32
37
  "vite": "^6.0.0"
33
38
  },