n8n-nodes-adeu 1.15.2 → 1.17.0

package/README.md CHANGED
@@ -5,7 +5,12 @@
 
  An [n8n](https://n8n.io) community node for **[Adeu](https://adeu.ai)** — the AI-native Virtual DOM for Microsoft Word.
 
- > **🆕 New in the upcoming release:** `Apply Edits` now supports a **Dry Run** mode that previews edits without committing them. AI Agents can use it as a self-correction primitive to verify anchors before issuing a real edit. See the [Dry Run section](#-dry-run-mode-self-correction-for-ai-agents) and the [`$fromAI` recipe](#apply-edits) below. **Existing workflows must hand-add the `Dry_Run` binding**, because n8n caches `$fromAI` expressions per workflow and does not retroactively update them on package upgrades.
+ > **🆕 New in this release:**
+ > - **`Extract Outline`** — a new operation returning a token-cheap structural map (headings + page numbers + table flags) for navigating large documents.
+ > - **`Page` parameter on `Extract Markdown`** — fetch only one page of a paginated projection instead of the whole document.
+ > - **`match_mode` and `regex` on `modify` edits** — `Apply Edits` now supports targeted multi-occurrence writes. Set `match_mode: "all"` to replace every occurrence, `"first"` to anchor to the first hit silently, or omit/`"strict"` to fail on ambiguity. Set `regex: true` to interpret `target_text` as an ES2022 RegExp (with `$1`, `$2` capture-group references in `new_text`).
+ >
+ > **Existing workflows must hand-update their `$fromAI` expressions** to expose the new fields — n8n caches `$fromAI` expressions per workflow and does not retroactively update them on package upgrades.
 
  This node bridges the gap between Large Language Models (LLMs) and Microsoft Word. It translates complex OpenXML (`.docx`) files into token-efficient Markdown, allows AI models to reason over legal or technical text, and translates the AI's JSON output back into **native Word Tracked Changes and Comments** — all completely in-process, without your documents ever leaving the n8n runtime.
 
@@ -46,29 +51,82 @@ Restart your n8n instance after installation.
 
  - **CriticMarkup Projection**: Translates existing Word tracked changes into standard Markdown (`{++inserted++}`, `{--deleted--}`).
  - **Semantic Appendix**: Automatically extracts defined terms, cross-references, and potential typos to give LLMs deeper context.
+ - **Structural Outline**: Lightweight headings-and-pages map of any document, with table/footnote flags per section.
+ - **Pagination**: Drill into a single page of a large document instead of blasting the full body into LLM context.
  - **Native Redlining**: Apply `modify`, `accept`, `reject`, and `reply` actions directly to the OOXML tree.
+ - **Targeted Multi-Occurrence Writes**: `match_mode` (`strict`/`first`/`all`) and `regex` support for surgical or sweeping replacements.
  - **Document Sanitization**: Strip metadata, auto-accept markup, and apply read-only locks before sending to counterparties.
 
  ---
 
  ## ⚙️ Operations
 
- The node exposes one resource (**Document**) with five operations:
+ The node exposes one resource (**Document**) with six operations:
 
  ### 1. Extract Markdown
  Projects a `.docx` file into LLM-friendly Markdown.
  - **Input**: `.docx` binary.
- - **Output**: JSON `{ markdown, fileName, cleanView }`.
- - **Clean View toggle**:
+ - **Output**: JSON `{ markdown, fileName, cleanView }` (plus pagination metadata when a `page` is requested).
+ - **Clean View toggle**:
  - `False` (Raw View): Shows all pending tracked changes via CriticMarkup. Best for resolving counterparty edits.
  - `True` (Clean View): Simulates an "Accept All" state, hiding markup. Best for generating net-new redlines on a clean baseline.
+ - **🆕 Page parameter**: Optional 1-based page number. When `0` (default), the full document is returned. When `>= 1`, only that page's content is returned and the JSON includes `{ page, total_pages, has_next, has_prev, tracked_change_count }`. Pages are ~19,000-character chunks of the projected body; the Structural Appendix is appended to every page. Use **Extract Outline** first to discover how many pages exist.
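The pagination contract described above can be sketched as follows. This is a minimal illustration that assumes naive fixed-width slicing of the projected body. The real engine may choose page boundaries differently. `paginate`, `PAGE_SIZE`, and `PageResult` are hypothetical names, not exports of this package. The sketch also leaves out `tracked_change_count` and the `page = 0` (full document) case.

```typescript
// Minimal sketch of the documented pagination contract (assumption: naive
// fixed-width slicing; the real engine may pick different page boundaries).
const PAGE_SIZE = 19_000; // "~19,000-character chunks" per the README

interface PageResult {
  markdown: string;
  page: number;
  total_pages: number;
  has_next: boolean;
  has_prev: boolean;
}

// Hypothetical helper: returns one 1-based page of the projected body.
function paginate(body: string, appendix: string, page: number): PageResult {
  // An empty body still counts as a single page.
  const totalPages = Math.max(1, Math.ceil(body.length / PAGE_SIZE));
  if (!Number.isInteger(page) || page < 1) {
    throw new Error(`Page must be a 1-based integer, got ${page}`);
  }
  if (page > totalPages) {
    throw new Error(`Page ${page} exceeds total_pages (${totalPages})`);
  }
  const start = (page - 1) * PAGE_SIZE;
  const chunk = body.slice(start, start + PAGE_SIZE);
  return {
    // The Structural Appendix is appended to every page, not just the last.
    markdown: appendix ? `${chunk}\n\n${appendix}` : chunk,
    page,
    total_pages: totalPages,
    has_next: page < totalPages,
    has_prev: page > 1,
  };
}
```

A caller would normally read `total_pages` from Extract Outline first, so the out-of-range branch should only fire on a bad guess.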
 
- ### 2. Apply Edits
+ ### 2. Extract Outline 🆕
+ Returns a token-cheap structural map of the document — essentially a table of contents an LLM can use to navigate large files.
+ - **Input**: `.docx` binary.
+ - **Output**: JSON `{ fileName, total_pages, outline: OutlineNode[] }` where each `OutlineNode` is:
+ ```json
+ {
+ "level": 2,
+ "text": "Confidentiality",
+ "page": 1,
+ "style": "Heading 2",
+ "has_table": false,
+ "footnote_ids": ["fn-1", "fn-3"]
+ }
+ ```
+ - `level` (1–6): Heading depth.
+ - `text`: Heading text with markdown/CriticMarkup stripped.
+ - `page`: Which Extract Markdown page this heading lands on.
+ - `style`: Word style name (e.g. `Heading 1`, `Title`) or `(heuristic)` for headings detected purely by typography.
+ - `has_table`: Whether the section directly contains a Word table (does not bubble up to ancestor headings).
+ - `footnote_ids`: Footnote/endnote markers scoped to this section, in document order, e.g. `fn-1`, `en-2`.
+ - **Typical pattern**: Call this first, let the LLM choose a section, then call **Extract Markdown** with the matching `page` to get just that page's content.
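The outline-then-page pattern above can be written as a small lookup over the `outline` array. The `OutlineNode` interface below is transcribed from the field list. `pageForHeading` and `pagesWithTables` are hypothetical helpers shown only for illustration.

```typescript
// One entry of the `outline` array, transcribed from the field list above.
interface OutlineNode {
  level: number; // heading depth, 1 to 6
  text: string; // markdown/CriticMarkup already stripped
  page: number; // Extract Markdown page this heading lands on
  style: string; // e.g. "Heading 1", "Title", or "(heuristic)"
  has_table: boolean;
  footnote_ids: string[]; // e.g. ["fn-1", "en-2"]
}

// Hypothetical helper: find the page holding the first heading whose text
// contains `query` (case-insensitive). Returns undefined if nothing matches.
function pageForHeading(outline: OutlineNode[], query: string): number | undefined {
  const needle = query.toLowerCase();
  const hit = outline.find((node) => node.text.toLowerCase().includes(needle));
  return hit?.page;
}

// Hypothetical helper: distinct pages containing sections with a Word table.
function pagesWithTables(outline: OutlineNode[]): number[] {
  const pages = outline.filter((node) => node.has_table).map((node) => node.page);
  return pages.filter((p, i) => pages.indexOf(p) === i);
}
```

The number returned by `pageForHeading` is what you would feed into the `Page` parameter of Extract Markdown.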
+
+ ### 3. Apply Edits
  Applies a JSON array of `DocumentChange` operations back to the Word document as tracked changes and comments.
  - **Input**: `.docx` binary + a `changes` JSON array (read from an upstream node or defined inline).
- - **Output**: A new redlined `.docx` binary + JSON application stats.
+ - **Output**: A new redlined `.docx` binary + JSON application stats with per-edit reports (status, occurrences modified, heading path, pages affected, CriticMarkup context, post-accept preview).
  - **Atomic Batch Validation**: Adeu pre-validates the *entire* array of edits before touching the document. If even one edit is invalid (e.g., target text not found, ambiguous match), the engine safely rejects the entire batch to prevent partial or corrupted document states.
 
+ #### 🆕 Targeted Multi-Occurrence Writes (`match_mode` + `regex`)
+ The `modify` edit type now supports two optional fields:
+
+ - **`match_mode`** (`"strict"` | `"first"` | `"all"`, default `"strict"`):
+ - `"strict"`: Fails with an actionable ambiguity error if `target_text` matches more than one location. Recommended default — surfaces ambiguity to the LLM so it can self-correct with more context.
+ - `"first"`: Silently anchors to the first occurrence in linear document order. Use only when you have verified that the first occurrence is the one you intend to change.
+ - `"all"`: Applies the same replacement to every occurrence. Returns `occurrences_modified` in the per-edit report. Pages listed in the report cover all modified locations.
+
+ - **`regex`** (boolean, default `false`):
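+ - When `true`, `target_text` is interpreted as an ES2022 `RegExp` pattern, and matching is case-sensitive. Standard ES2022 `RegExp` syntax has no inline flag such as `(?i)`. Unless you have confirmed that the engine handles inline flags, express case variants with character classes instead (e.g. `[Tt]he [Cc]ontractor`).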
+ - `new_text` may reference capture groups via `$1`, `$2`, etc.
+ - Combine with `match_mode: "all"` for global regex-based replacements.
+
+ **Example: relabel all dollar amounts as EUR** (the digits are kept as-is; no exchange-rate conversion is applied):
+ ```json
+ [
+ {
+ "type": "modify",
+ "target_text": "\\$(\\d+)",
+ "new_text": "EUR $1",
+ "match_mode": "all",
+ "regex": true,
+ "comment": "Currency normalization."
+ }
+ ]
+ ```
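The following string-level approximation makes the three `match_mode` values and the `regex` flag concrete. It assumes plain-text matching on a flat string. Adeu itself operates on the OOXML tree and emits tracked changes, so treat `applyModify` and `escapeRegExp` as hypothetical helpers that only model *which* occurrences get replaced.

```typescript
// String-level approximation of the documented match_mode/regex semantics.
// Adeu edits the OOXML tree and emits tracked changes; this only models
// which occurrences of target_text would be replaced.
type MatchMode = "strict" | "first" | "all";

// Escape RegExp metacharacters so literal targets match verbatim.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

function applyModify(
  text: string,
  targetText: string,
  newText: string,
  matchMode: MatchMode = "strict",
  regex = false,
): { result: string; occurrences: number } {
  const source = regex ? targetText : escapeRegExp(targetText);
  // Count every occurrence in linear order (case-sensitive: no "i" flag).
  const occurrences = (text.match(new RegExp(source, "g")) ?? []).length;
  if (occurrences === 0) {
    throw new Error("Target text not found");
  }
  if (matchMode === "strict" && occurrences > 1) {
    throw new Error(`Ambiguous match: ${occurrences} occurrences`);
  }
  // "all" replaces globally; "strict" (single hit) and "first" replace once.
  const pattern = new RegExp(source, matchMode === "all" ? "g" : "");
  const result = regex
    ? text.replace(pattern, newText) // $1, $2 expand to capture groups
    : text.replace(pattern, () => newText); // literal: no $ expansion
  return { result, occurrences: matchMode === "all" ? occurrences : 1 };
}
```

The replacer callback in the literal branch is a deliberate choice. Passing `newText` directly to `String.prototype.replace` would expand `$1` even when `regex` is `false`.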
+
  #### 🔍 Dry Run Mode (Self-Correction for AI Agents)
  `Apply Edits` accepts an optional `Dry Run` boolean (default `false`). When enabled:
  - Every edit is validated and simulated in-memory.
@@ -79,12 +137,12 @@ Applies a JSON array of `DocumentChange` operations back to the Word document as
 
  **When the AI should use it**: as a self-correction primitive for uncertain anchors (long quotes, legal terminology, possible duplicate phrases). The agent dry-runs, inspects the `critic_markup` preview, then re-calls with `Dry_Run=false` to commit. The system prompt in the example workflow tells the LLM explicitly to use dry-run sparingly — every dry run is an extra round trip.
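The dry-run-then-commit loop above reduces to two calls with the same batch. The following is an illustrative sketch of that control flow only. `ApplyEditsFn`, `EditReport`, and `dryRunThenCommit` are hypothetical stand-ins for the Apply Edits tool call. Of these, only the Dry Run flag and the `critic_markup` preview come from the README. The real tool call is asynchronous and its report shape may differ.

```typescript
// Illustrative control flow only. ApplyEditsFn and EditReport are hypothetical
// stand-ins for the Apply Edits tool; only the Dry Run flag and the
// critic_markup preview are taken from the README.
interface EditReport {
  status: string;
  critic_markup?: string;
}
type ApplyEditsFn = (changesJson: string, dryRun: boolean) => EditReport[];

function dryRunThenCommit(
  applyEdits: ApplyEditsFn,
  changesJson: string,
  looksRight: (preview: EditReport[]) => boolean,
): { committed: boolean; reports: EditReport[] } {
  // Round trip 1: validate and simulate in memory; nothing is written.
  const preview = applyEdits(changesJson, true);
  if (!looksRight(preview)) {
    // Stop here so the agent can revise its anchors instead of committing.
    return { committed: false, reports: preview };
  }
  // Round trip 2: commit the same batch for real.
  return { committed: true, reports: applyEdits(changesJson, false) };
}
```

Each dry run costs a full extra round trip, which is why the sketch commits immediately once the preview is accepted.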
 
- ### 3. Generate Diff
+ ### 4. Generate Diff
  Produces a sub-word level `@@ Word Patch @@` diff between two versions of a document.
  - **Input**: Two `.docx` binaries on the same item (e.g., `data` and `data2`).
  - **Output**: JSON `{ diff, originalFileName, modifiedFileName }`.
 
- ### 4. Finalize Document
+ ### 5. Finalize Document
  Prepares a document for signature or external distribution.
  - **Modes**:
  - `Full`: Strips all metadata and requires all tracked changes/comments to be resolved (or auto-accepted).
@@ -92,8 +150,8 @@ Prepares a document for signature or external distribution.
  - `Baseline`: Only strips background noise (RSIDs, proof errors) without touching metadata.
  - **Protection**: Can inject a native Word "Read-Only" lock into the document settings.
 
- ### 5. Hydrate Tool Output (The "Hydration" Note)
- Because n8n's AI Agent tool wrapper intercepts and **strips all binary data** from tool outputs, files generated inside an AI loop cannot reach downstream nodes directly.
+ ### 6. Hydrate Tool Output (The "Hydration" Note)
+ Because n8n's AI Agent tool wrapper intercepts and **strips all binary data** from tool outputs, files generated inside an AI loop cannot reach downstream nodes directly.
  - **What it does**: This operation is placed immediately downstream of the AI Agent on the main workflow execution line. It reads the stashed metadata pointer left by the last execution of `apply_edits`, retrieves the raw file stream directly from n8n's secure binary storage, and attaches a fresh binary buffer onto the outgoing item.
  - **Output Path Construction**: It supports an optional output path template (e.g., `C:\path\to\folder\{baseName}_{timestamp}.docx`) to resolve path strings inside TypeScript. This avoids expression-parsing and escape issues when configuring downstream Write File nodes on Windows.
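Resolving the output path in code, as the bullet above describes, amounts to plain token substitution. The following is a minimal sketch. `resolveOutputPath` is hypothetical. The `YYYYMMDD-HHmmss` UTC timestamp format is an assumption, because the README does not say what `{timestamp}` expands to.

```typescript
// Hypothetical sketch of output-path template resolution. Token names come
// from the README example; the timestamp format (YYYYMMDD-HHmmss, UTC) is an
// assumption, not Adeu's documented format.
function resolveOutputPath(template: string, fileName: string, now: Date): string {
  // "contract.docx" -> "contract"; names without an extension are kept whole.
  const dot = fileName.lastIndexOf(".");
  const baseName = dot > 0 ? fileName.slice(0, dot) : fileName;
  const pad = (n: number) => String(n).padStart(2, "0");
  const timestamp =
    `${now.getUTCFullYear()}${pad(now.getUTCMonth() + 1)}${pad(now.getUTCDate())}` +
    `-${pad(now.getUTCHours())}${pad(now.getUTCMinutes())}${pad(now.getUTCSeconds())}`;
  // split/join instead of replace(): no "$" patterns, and Windows
  // backslashes in the template pass through untouched.
  return template.split("{baseName}").join(baseName).split("{timestamp}").join(timestamp);
}
```

Doing the substitution in one function keeps the backslashes of a Windows path out of n8n's expression parser entirely.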
 
@@ -103,14 +161,14 @@ Because n8n's AI Agent tool wrapper intercepts and **strips all binary data** fr
 
  To use the **Apply Edits** operation, your LLM must output a JSON array of objects matching this schema.
 
- | Type | Required Fields | Description |
- | :--- | :--- | :--- |
- | `modify` | `target_text`, `new_text` | Replaces baseline text. Use the `comment` field to attach a comment bubble. |
- | `accept` | `target_id` | Accepts an existing tracked change (e.g., `Chg:123`). |
- | `reject` | `target_id` | Rejects an existing tracked change. |
- | `reply` | `target_id`, `text` | Replies to an existing comment (e.g., `Com:456`). |
- | `insert_row` | `target_text`, `position`, `cells` | Inserts a new table row `above` or `below` the target cell text. |
- | `delete_row` | `target_text` | Deletes the table row containing the target text. |
+ | Type | Required Fields | Optional Fields | Description |
+ | :--- | :--- | :--- | :--- |
+ | `modify` | `target_text`, `new_text` | `comment`, `match_mode`, `regex` | Replaces baseline text. `match_mode`: `"strict"` (default, fails on ambiguity), `"first"` (silently picks first hit), `"all"` (replaces every occurrence). `regex`: when `true`, `target_text` is an ES2022 RegExp pattern. |
+ | `accept` | `target_id` | `comment` | Accepts an existing tracked change (e.g., `Chg:123`). |
+ | `reject` | `target_id` | `comment` | Rejects an existing tracked change. |
+ | `reply` | `target_id`, `text` | — | Replies to an existing comment (e.g., `Com:456`). |
+ | `insert_row` | `target_text`, `position`, `cells` | — | Inserts a new table row `above` or `below` the target cell text. |
+ | `delete_row` | `target_text` | — | Deletes the table row containing the target text. |
 
  **Example LLM Output:**
  ```json
@@ -125,6 +183,13 @@ To use the **Apply Edits** operation, your LLM must output a JSON array of objec
  "target_text": "within thirty (30) days",
  "new_text": "within forty-five (45) days",
  "comment": "Compromise per our playbook."
+ },
+ {
+ "type": "modify",
+ "target_text": "the Contractor",
+ "new_text": "the Service Provider",
+ "match_mode": "all",
+ "comment": "Term harmonization."
  }
  ]
  ```
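If you generate the batch programmatically rather than from an LLM, the schema table and example above can be mirrored as a TypeScript discriminated union. This is a hypothetical sketch: neither the `DocumentChange` type nor `encodeChanges` is exported by this package. The CriticMarkup pre-check simply enforces the "no CriticMarkup in `new_text`" rule locally, before the engine sees the batch.

```typescript
// Hypothetical TypeScript mirror of the DocumentChange schema table above.
type MatchMode = "strict" | "first" | "all";

type DocumentChange =
  | {
      type: "modify";
      target_text: string;
      new_text: string;
      comment?: string;
      match_mode?: MatchMode; // omitted means "strict"
      regex?: boolean; // omitted means false
    }
  | { type: "accept"; target_id: string; comment?: string }
  | { type: "reject"; target_id: string; comment?: string }
  | { type: "reply"; target_id: string; text: string }
  | { type: "insert_row"; target_text: string; position: "above" | "below"; cells: string[] }
  | { type: "delete_row"; target_text: string };

// Opening CriticMarkup delimiters that must never appear in new_text.
const CRITIC_MARKUP = /\{\+\+|\{--|\{~~|\{>>|\{==/;

// Serializes the whole batch into ONE JSON string (the form the
// Changes_JSON argument expects) after a local CriticMarkup check.
function encodeChanges(changes: DocumentChange[]): string {
  changes.forEach((change, i) => {
    if (change.type === "modify" && CRITIC_MARKUP.test(change.new_text)) {
      throw new Error(`Edit ${i}: new_text must not contain CriticMarkup tags`);
    }
  });
  return JSON.stringify(changes);
}
```

With the union type, the compiler flags a `reply` that is missing `text` or an `insert_row` with an invalid `position` before the batch is ever sent.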
@@ -148,13 +213,16 @@ When an AI Agent applies edits, receives feedback, and needs to make *another* r
  [ Gmail Trigger (Incoming Doc) ]
 
 
- [ Adeu: Extract Markdown ]
+ [ Adeu: Extract Outline ] Cheap structural map for large documents
 
 
- [ AI Node (LLM) ] Outputs a JSON array of `DocumentChange` objects
+ [ Adeu: Extract Markdown ] Optionally page-scoped via Page parameter
 
 
- [ Adeu: Apply Edits ] Pre-validates and writes redlines atomically
+ [ AI Node (LLM) ] Outputs a JSON array of DocumentChange objects
+
+
+ [ Adeu: Apply Edits ] ← Pre-validates and writes redlines atomically
 
 
  [ Gmail: Reply with Doc ]
@@ -167,9 +235,10 @@ When an AI Agent applies edits, receives feedback, and needs to make *another* r
  To achieve the highest batch success rate when prompting models like Gemini, GPT-4o, or Claude to generate edits:
 
  1. **Enforce Exact Matching**: Instruct the LLM: *"The `target_text` must be copied EXACTLY from the source document — including identical punctuation, spacing, and capitalization."*
- 2. **Short but Unique**: Instruct the LLM: *"Keep `target_text` short, but ensure it is unique enough to not match multiple locations in the document."*
+ 2. **Short but Unique**: Instruct the LLM: *"Keep `target_text` short, but ensure it is unique enough to not match multiple locations in the document. If you need to replace the same phrase in many places, use `match_mode: 'all'` instead of writing multiple separate edits."*
  3. **No Fake Markup**: Instruct the LLM: *"Do NOT include CriticMarkup tags like `{++` or `{--` in your `new_text`. The engine will apply the redline tracking automatically."*
  4. **Mind the Overlap Constraint**: Adeu's engine strictly prevents `modify` (text-replace) edits from overlapping with or targeting text that is *already* inside a pending tracked change. Instruct the LLM: *"You cannot `modify` text that is wrapped in counterparty tracking markup. You must `accept` or `reject` their change using its ID."*
+ 5. **Use Outline for Navigation**: For documents longer than ~20 pages, instruct the LLM to call `Extract Outline` first to get a structural map, then call `Extract Markdown` with a specific `Page` number to drill in. This avoids flooding the context window with the full document body.
 
  ---
 
@@ -219,6 +288,25 @@ AI Agents cannot pass binary `.docx` data through JSON arguments anyway — that
  ={{ $fromAI('Clean_View', `Boolean. Set false (default) to surface all pending tracked changes as CriticMarkup tags {++ins++}, {--del--}, {>>comment<<} — use when reviewing counterparty edits or any document with pending markup. Set true to project the document as if all tracked changes were accepted (simulates Accept All) — use only when generating net-new redlines against a clean baseline.`, 'boolean', false) }}
  ```
 
+ **Page** 🆕:
+ ```
+ ={{ $fromAI('Page', `Optional 1-based integer page number to retrieve only one page of the projected document. Set to 0 (default) for the full document body — use 0 for short documents (under ~10 pages). For long documents, call extract_outline first to discover total_pages and which headings live on which page, then call this tool again with Page set to the page you need. Pages are ~19,000-character chunks; the Structural Appendix is appended to every page. If you request a page beyond total_pages the tool will error.`, 'number', 0) }}
+ ```
+
+ ---
+
+ ### Extract Outline 🆕
+
+ **Source Node Name** (when `Document Source` is `From Another Node`):
+ ```
+ ={{ $fromAI('Source_Node_Name', `Exact name of the workflow node that produced the .docx binary (string, case-sensitive, e.g. 'Read Binary File' or 'Gmail Trigger'). Must match the node label in the canvas exactly.`, 'string', 'Read Binary File') }}
+ ```
+
+ **Source Binary ID** (when `Document Source` is `From Another Node`):
+ ```
+ ={{ $fromAI('Source_Binary_Id', `Optional string. If you are inspecting a document that you have already modified during this conversation, pass the 'redlinedBinaryId' from the previous tool output here to view the updated draft outline. Leave empty on the first call to load from the baseline node name.`, 'string', '') }}
+ ```
+
  ---
 
  ### Apply Edits
@@ -240,7 +328,7 @@ AI Agents cannot pass binary `.docx` data through JSON arguments anyway — that
 
  **Changes (JSON):**
  ```
- ={{ $fromAI('Changes_JSON', `JSON-encoded string containing an array of DocumentChange objects. Each object is one of: {"type":"modify","target_text":"<verbatim from source>","new_text":"<replacement>","comment":"<optional>"} | {"type":"accept","target_id":"Chg:12","comment":"<optional>"} | {"type":"reject","target_id":"Chg:12","comment":"<optional>"} | {"type":"reply","target_id":"Com:45","text":"<reply>"} | {"type":"insert_row","target_text":"<cell text anchoring row>","position":"above" or "below","cells":["col1","col2"]} | {"type":"delete_row","target_text":"<cell text anchoring row>"}. RULES: target_text must be copied VERBATIM from the source including punctuation/whitespace/case and must uniquely anchor one location; never include CriticMarkup tags like {++ or {-- in new_text — the engine applies tracking automatically; use Chg:N and Com:N IDs exactly as surfaced by extract_markdown; the entire array must be a single JSON-encoded string. Atomic batch: if any single edit is invalid the whole array is rejected with an error telling you which edit failed — use that to self-correct on the next call.`, 'string') }}
+ ={{ $fromAI('Changes_JSON', `JSON-encoded string containing an array of DocumentChange objects. Each object is one of: {"type":"modify","target_text":"<verbatim from source>","new_text":"<replacement>","comment":"<optional>","match_mode":"<optional 'strict' (default) | 'first' | 'all'>","regex":<optional boolean default false>} | {"type":"accept","target_id":"Chg:12","comment":"<optional>"} | {"type":"reject","target_id":"Chg:12","comment":"<optional>"} | {"type":"reply","target_id":"Com:45","text":"<reply>"} | {"type":"insert_row","target_text":"<cell text anchoring row>","position":"above" or "below","cells":["col1","col2"]} | {"type":"delete_row","target_text":"<cell text anchoring row>"}. MODIFY EXTENDED: set match_mode='all' to replace every occurrence of target_text in linear document order (returns occurrences_modified in the per-edit report); set match_mode='first' to silently anchor to the first hit; omit or use 'strict' (default) to fail on ambiguous matches so you can self-correct with more context. Set regex=true to interpret target_text as an ES2022 RegExp pattern; new_text may reference capture groups via $1, $2 etc. Combine match_mode='all' with regex=true for global pattern-based replacements. RULES: target_text must be copied VERBATIM from the source including punctuation/whitespace/case (unless regex=true) and must uniquely anchor one location under match_mode='strict'; never include CriticMarkup tags like {++ or {-- in new_text — the engine applies tracking automatically; use Chg:N and Com:N IDs exactly as surfaced by extract_markdown; the entire array must be a single JSON-encoded string. Atomic batch: if any single edit is invalid the whole array is rejected with an error telling you which edit failed — use that to self-correct on the next call.`, 'string') }}
  ```
 
  **Return Markdown Output:**
@@ -317,8 +405,9 @@ AI Agents cannot pass binary `.docx` data through JSON arguments anyway — that
  Because Adeu enforces **Atomic Batch Validation**, any error in the LLM's JSON will throw a `NodeApiError` and halt the node. The error message will tell you exactly which edit failed and why.
 
  * **"Target text not found"**: The LLM hallucinated a word, altered the spacing, or the text doesn't exist in the baseline document.
- * **"Ambiguous match"**: The LLM used a `target_text` (like "the Company") that appears multiple times. The error details will show you the exact occurrences. Advise the LLM to include more surrounding context (e.g., "the Company shall indemnify").
+ * **"Ambiguous match"**: The LLM used a `target_text` (like "the Company") that appears multiple times. The error details will show you the exact occurrences. Advise the LLM to either include more surrounding context (e.g., "the Company shall indemnify") or use `match_mode: "all"` if the intent is to replace every occurrence.
  * **"Modification targets an active insertion..."**: The LLM tried to `modify` text that another author is currently tracking. Adeu explicitly blocks this to maintain virtual DOM integrity and clean redline threading. You must `accept` or `reject` that prior change first.
- * **"Read-only elements"**: The LLM tried to modify structural items like cross-references or footnotes.
+ * **"Read-only elements"**: The LLM tried to modify structural items like cross-references or footnotes.
+ * **"Page N exceeds total_pages"**: The LLM requested a page beyond what the document has. Have it call `Extract Outline` first to discover the page count.
 
- **Tip**: If you are running bulk processing workflows, you can enable n8n's **"Continue On Fail"** setting on the `Apply Edits` node. If the LLM generates a flawed batch, n8n will catch the error, output an `{ "error": "..." }` JSON object for that specific document, and continue processing the rest of the files in your queue.
+ **Tip**: If you are running bulk processing workflows, you can enable n8n's **"Continue On Fail"** setting on the `Apply Edits` node. If the LLM generates a flawed batch, n8n will catch the error, output an `{ "error": "..." }` JSON object for that specific document, and continue processing the rest of the files in your queue.