documentation-hub 5.7.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (271)
  1. package/.eslintrc.json +43 -0
  2. package/.github/workflows/build.yml +64 -0
  3. package/.github/workflows/ci.yml +39 -0
  4. package/.vscode/extensions.json +3 -0
  5. package/Current.md +97 -0
  6. package/DocHub_Image.png +0 -0
  7. package/README.md +666 -0
  8. package/USER_GUIDE.md +1173 -0
  9. package/Updater.md +311 -0
  10. package/build/256x256.png +0 -0
  11. package/build/512x512.png +0 -0
  12. package/build/app-update.yml +4 -0
  13. package/build/create-icon.js +208 -0
  14. package/build/icon.ico +0 -0
  15. package/build/icon.png +0 -0
  16. package/build/icon_1024x1024.png +0 -0
  17. package/dist/assets/Analytics-BpsG9895.js +1 -0
  18. package/dist/assets/Card-IAZin8kp.js +1 -0
  19. package/dist/assets/CurrentSession-B-rFkHvf.js +12 -0
  20. package/dist/assets/Dashboard-C_5gMb0q.js +1 -0
  21. package/dist/assets/Documents-CqZ25axS.js +1 -0
  22. package/dist/assets/Input-l89xwXBi.js +1 -0
  23. package/dist/assets/Reporting-DqdHJY_a.js +1 -0
  24. package/dist/assets/Search-XNbu5z_3.js +1 -0
  25. package/dist/assets/SessionManager-lH9hZfzH.js +1 -0
  26. package/dist/assets/Sessions-ClZOPYNc.js +1 -0
  27. package/dist/assets/Settings-DUEHGURa.js +11 -0
  28. package/dist/assets/index-8xUe8ptc.js +24 -0
  29. package/dist/assets/index-RYyJqF7O.css +1 -0
  30. package/dist/assets/path-BkOl0AGO.js +1 -0
  31. package/dist/assets/promises-ID_B9S-h.js +1 -0
  32. package/dist/assets/urlHelpers-TvgahX0r.js +1 -0
  33. package/dist/assets/useToast-yRSO1dkm.js +1 -0
  34. package/dist/assets/vendor-charts-RkGK5ROP.js +36 -0
  35. package/dist/assets/vendor-db-l0sNRNKZ.js +1 -0
  36. package/dist/assets/vendor-react-BVZ_anCF.js +4 -0
  37. package/dist/assets/vendor-search-Dw8P0qyA.js +1 -0
  38. package/dist/assets/vendor-ui-BU7NfluV.js +53 -0
  39. package/dist/electron/PowerAutomateApiService-LfW09ZGr.js +147 -0
  40. package/dist/electron/main-CXkNtyv-.js +19789 -0
  41. package/dist/electron/main.js +5 -0
  42. package/dist/electron/preload.js +1 -0
  43. package/dist/icon.png +0 -0
  44. package/dist/index.html +27 -0
  45. package/docs/CODEBASE_ANALYSIS_REPORT.md +309 -0
  46. package/docs/DEBUG_LOGGING_GUIDE.md +244 -0
  47. package/docs/README.md +115 -0
  48. package/docs/TOC_WIRING_GUIDE.md +344 -0
  49. package/docs/analysis/Bullet_Symbol_Bug_Analysis.md +136 -0
  50. package/docs/analysis/DOCXMLATER_ANALYSIS_SUMMARY.txt +169 -0
  51. package/docs/analysis/Document_Processing_Issues_Analysis.md +704 -0
  52. package/docs/analysis/FIELD_PRESERVATION_ANALYSIS.md +1200 -0
  53. package/docs/analysis/INDENTATION_PRESERVE_ANALYSIS.md +181 -0
  54. package/docs/analysis/INDENTATION_PRESERVE_IMPLEMENTATION.md +207 -0
  55. package/docs/analysis/List_Implementation.md +206 -0
  56. package/docs/analysis/List_Implementation_Accuracy_Report.md +366 -0
  57. package/docs/analysis/PROCESSING_OPTIONS_UI_UPDATES.md +220 -0
  58. package/docs/analysis/RefactorStyles.md +852 -0
  59. package/docs/analysis/STYLE_PARAMETER_ENHANCEMENT.md +143 -0
  60. package/docs/analysis/docxmlater-comparison-todo-2025-11-13.md +636 -0
  61. package/docs/analysis/docxmlater-implementation-analysis-2025-11-13.md +340 -0
  62. package/docs/analysis/docxmlater-template_ui-integration-analysis.md +263 -0
  63. package/docs/analysis/github-issues-to-create.md +237 -0
  64. package/docs/api/API_README.md +538 -0
  65. package/docs/api/API_REFERENCE.md +751 -0
  66. package/docs/api/TYPE_DEFINITIONS.md +869 -0
  67. package/docs/architecture/FONT_EMBEDDING_GUIDE.md +318 -0
  68. package/docs/architecture/docxmlater-functions-and-structure.md +726 -0
  69. package/docs/docxmlater-readme.md +1341 -0
  70. package/docs/fixes/EXECUTION_LOG_TEST_BASE.md +573 -0
  71. package/docs/fixes/HYPERLINK_TEXT_SANITIZATION.md +253 -0
  72. package/docs/fixes/README.md +37 -0
  73. package/docs/github-issues/issue-1-body.md +125 -0
  74. package/docs/github-issues/issue-10-body.md +850 -0
  75. package/docs/github-issues/issue-2-body.md +200 -0
  76. package/docs/github-issues/issue-3-body.md +270 -0
  77. package/docs/github-issues/issue-4-body.md +169 -0
  78. package/docs/github-issues/issue-5-body.md +173 -0
  79. package/docs/github-issues/issue-6-body.md +158 -0
  80. package/docs/github-issues/issue-7-body.md +171 -0
  81. package/docs/github-issues/issue-8-body.md +407 -0
  82. package/docs/github-issues/issue-9-body.md +515 -0
  83. package/docs/github-issues/issue-tracker.md +274 -0
  84. package/docs/github-issues/predictive-analysis-2025-10-18.md +2131 -0
  85. package/docs/implementation/List_Framework_Refactor_Plan.md +336 -0
  86. package/docs/implementation/PRIMARY_TEXT_COLOR_FEATURE.md +217 -0
  87. package/docs/implementation/RELEASE_PLAN_v2.1.0.md +362 -0
  88. package/docs/implementation/RefactorStyles.md +588 -0
  89. package/docs/implementation/implement-plan.md +489 -0
  90. package/docs/implementation/missing-helpers-implementation.md +391 -0
  91. package/docs/implementation/refactor-plan.md +520 -0
  92. package/docs/implementation/session-implementation-complete.md +233 -0
  93. package/docs/implementation/session-management-plan.md +250 -0
  94. package/docs/setup-checklist.md +77 -0
  95. package/docs/versions/changelog.md +345 -0
  96. package/electron/customUpdater.ts +656 -0
  97. package/electron/main.ts +2441 -0
  98. package/electron/memoryConfig.ts +187 -0
  99. package/electron/preload.ts +394 -0
  100. package/electron/proxyConfig.ts +340 -0
  101. package/electron/services/BackupService.ts +452 -0
  102. package/electron/services/DictionaryService.ts +402 -0
  103. package/electron/services/LocalDictionaryLookupService.ts +147 -0
  104. package/electron/services/PowerAutomateApiService.ts +231 -0
  105. package/electron/services/SharePointSyncService.ts +474 -0
  106. package/electron/windowsCertStore.ts +427 -0
  107. package/electron/zscalerConfig.ts +381 -0
  108. package/eslint.config.js +92 -0
  109. package/jest.config.js +52 -0
  110. package/package.json +214 -0
  111. package/postcss.config.mjs +6 -0
  112. package/public/icon.png +0 -0
  113. package/publish-release.ps1 +5 -0
  114. package/renovate.json +30 -0
  115. package/src/App.tsx +216 -0
  116. package/src/__mocks__/p-limit.js +12 -0
  117. package/src/__mocks__/styleMock.js +1 -0
  118. package/src/components/common/BugReportButton.tsx +44 -0
  119. package/src/components/common/BugReportDialog.tsx +193 -0
  120. package/src/components/common/Button.tsx +153 -0
  121. package/src/components/common/Card.tsx +86 -0
  122. package/src/components/common/ColorPickerDialog.tsx +177 -0
  123. package/src/components/common/ConfirmDialog.tsx +96 -0
  124. package/src/components/common/DebugConsole.tsx +275 -0
  125. package/src/components/common/EmptyState.tsx +183 -0
  126. package/src/components/common/ErrorBoundary.tsx +98 -0
  127. package/src/components/common/ErrorDetailsDialog.tsx +153 -0
  128. package/src/components/common/ErrorFallback.tsx +218 -0
  129. package/src/components/common/Input.tsx +109 -0
  130. package/src/components/common/Skeleton.tsx +184 -0
  131. package/src/components/common/SplashScreen.tsx +81 -0
  132. package/src/components/common/Toast.tsx +155 -0
  133. package/src/components/common/Tooltip.tsx +79 -0
  134. package/src/components/common/UpdateNotification.tsx +320 -0
  135. package/src/components/comparison/ComparisonWindow.tsx +374 -0
  136. package/src/components/comparison/SideBySideDiff.tsx +486 -0
  137. package/src/components/comparison/index.ts +8 -0
  138. package/src/components/document/DocumentUploader.tsx +288 -0
  139. package/src/components/document/HyperlinkPreview.tsx +430 -0
  140. package/src/components/document/HyperlinkService.md +1484 -0
  141. package/src/components/document/Hyperlink_Technical_Documentation.md +496 -0
  142. package/src/components/document/InlineChangesView.tsx +707 -0
  143. package/src/components/document/ProcessingProgress.tsx +303 -0
  144. package/src/components/document/ProcessingResults.tsx +256 -0
  145. package/src/components/document/TrackedChangesDetail.tsx +530 -0
  146. package/src/components/document/TrackedChangesPanel.tsx +546 -0
  147. package/src/components/document/VirtualDocumentList.tsx +240 -0
  148. package/src/components/editor/DocumentEditor.tsx +723 -0
  149. package/src/components/editor/DocumentEditorModal.tsx +640 -0
  150. package/src/components/editor/EditorQuickActions.tsx +502 -0
  151. package/src/components/editor/EditorToolbar.tsx +312 -0
  152. package/src/components/editor/TableEditor.tsx +926 -0
  153. package/src/components/editor/index.ts +18 -0
  154. package/src/components/layout/Header.tsx +190 -0
  155. package/src/components/layout/Sidebar.tsx +313 -0
  156. package/src/components/layout/TitleBar.tsx +190 -0
  157. package/src/components/navigation/CommandPalette.tsx +233 -0
  158. package/src/components/navigation/KeyboardShortcutsModal.tsx +173 -0
  159. package/src/components/sessions/ChangeItem.tsx +408 -0
  160. package/src/components/sessions/ChangeViewer.tsx +1155 -0
  161. package/src/components/sessions/DocumentComparisonModal.tsx +314 -0
  162. package/src/components/sessions/ProcessingOptions.tsx +297 -0
  163. package/src/components/sessions/ReplacementsTab.tsx +438 -0
  164. package/src/components/sessions/RevisionHandlingOptions.tsx +87 -0
  165. package/src/components/sessions/SessionManager.tsx +188 -0
  166. package/src/components/sessions/StylesEditor.tsx +1335 -0
  167. package/src/components/sessions/TabContainer.tsx +151 -0
  168. package/src/components/sessions/VirtualSessionList.tsx +157 -0
  169. package/src/components/sessions/sessionToProcessorManager.tsx +420 -0
  170. package/src/components/settings/CertificateManager.tsx +410 -0
  171. package/src/components/settings/SegmentedControl.tsx +88 -0
  172. package/src/components/settings/SettingRow.tsx +52 -0
  173. package/src/contexts/GlobalStatsContext.tsx +396 -0
  174. package/src/contexts/SessionContext.tsx +2129 -0
  175. package/src/contexts/ThemeContext.tsx +428 -0
  176. package/src/contexts/UserSettingsContext.tsx +290 -0
  177. package/src/contexts/__tests__/GlobalStatsContext.test.tsx +390 -0
  178. package/src/global.d.ts +273 -0
  179. package/src/hooks/useDocumentQueue.tsx +210 -0
  180. package/src/hooks/useToast.tsx +55 -0
  181. package/src/main.tsx +10 -0
  182. package/src/pages/Analytics.tsx +386 -0
  183. package/src/pages/CurrentSession.tsx +1174 -0
  184. package/src/pages/Dashboard.tsx +319 -0
  185. package/src/pages/Documents.tsx +317 -0
  186. package/src/pages/Projects.tsx +250 -0
  187. package/src/pages/Reporting.tsx +386 -0
  188. package/src/pages/Search.tsx +349 -0
  189. package/src/pages/Sessions.tsx +285 -0
  190. package/src/pages/Settings.tsx +2662 -0
  191. package/src/services/HyperlinkService.ts +1085 -0
  192. package/src/services/document/DocXMLaterProcessor.ts +617 -0
  193. package/src/services/document/DocumentProcessingComparison.ts +856 -0
  194. package/src/services/document/DocumentSnapshotService.ts +575 -0
  195. package/src/services/document/WordDocumentProcessor.ts +10509 -0
  196. package/src/services/document/__tests__/DocXMLaterProcessor.hyperlinks.test.md +311 -0
  197. package/src/services/document/__tests__/WordDocumentProcessor.integration.test.ts +515 -0
  198. package/src/services/document/__tests__/WordDocumentProcessor.test.ts +812 -0
  199. package/src/services/document/blanklines/BlankLineManager.ts +658 -0
  200. package/src/services/document/blanklines/__tests__/paragraphChecks.test.ts +281 -0
  201. package/src/services/document/blanklines/helpers/blankLineInsertion.ts +87 -0
  202. package/src/services/document/blanklines/helpers/blankLineSnapshot.ts +251 -0
  203. package/src/services/document/blanklines/helpers/clearCustom.ts +121 -0
  204. package/src/services/document/blanklines/helpers/contextChecks.ts +117 -0
  205. package/src/services/document/blanklines/helpers/imageChecks.ts +51 -0
  206. package/src/services/document/blanklines/helpers/paragraphChecks.ts +236 -0
  207. package/src/services/document/blanklines/helpers/removeBlanksBetweenListItems.ts +91 -0
  208. package/src/services/document/blanklines/helpers/removeTrailingBlanks.ts +35 -0
  209. package/src/services/document/blanklines/helpers/tableGuards.ts +21 -0
  210. package/src/services/document/blanklines/index.ts +67 -0
  211. package/src/services/document/blanklines/rules/additionRules.ts +337 -0
  212. package/src/services/document/blanklines/rules/indentationRules.ts +317 -0
  213. package/src/services/document/blanklines/rules/removalRules.ts +362 -0
  214. package/src/services/document/blanklines/rules/ruleTypes.ts +92 -0
  215. package/src/services/document/blanklines/types.ts +29 -0
  216. package/src/services/document/helpers/ImageBorderCropper.ts +377 -0
  217. package/src/services/document/helpers/__tests__/whitespace.test.ts +272 -0
  218. package/src/services/document/helpers/whitespace.ts +117 -0
  219. package/src/services/document/list/ListNormalizer.ts +947 -0
  220. package/src/services/document/list/index.ts +45 -0
  221. package/src/services/document/list/list-detection.ts +275 -0
  222. package/src/services/document/list/list-types.ts +162 -0
  223. package/src/services/document/processors/HyperlinkProcessor.ts +370 -0
  224. package/src/services/document/processors/ListProcessor.ts +257 -0
  225. package/src/services/document/processors/StructureProcessor.ts +176 -0
  226. package/src/services/document/processors/StyleProcessor.ts +389 -0
  227. package/src/services/document/processors/TableProcessor.ts +2238 -0
  228. package/src/services/document/processors/__tests__/HyperlinkProcessor.test.ts +314 -0
  229. package/src/services/document/processors/__tests__/ListProcessor.test.ts +291 -0
  230. package/src/services/document/processors/__tests__/StructureProcessor.test.ts +257 -0
  231. package/src/services/document/processors/__tests__/TableProcessor.hlp-tips-bullets.test.ts +459 -0
  232. package/src/services/document/processors/__tests__/TableProcessor.test.ts +1604 -0
  233. package/src/services/document/processors/index.ts +28 -0
  234. package/src/services/document/types/docx-processing.ts +310 -0
  235. package/src/services/editor/EditorActionHandlers.ts +901 -0
  236. package/src/services/editor/index.ts +13 -0
  237. package/src/setupTests.ts +47 -0
  238. package/src/styles/global.css +782 -0
  239. package/src/types/backup.ts +132 -0
  240. package/src/types/dictionary.ts +125 -0
  241. package/src/types/document-processing.ts +331 -0
  242. package/src/types/docxmlater-augments.d.ts +142 -0
  243. package/src/types/editor.ts +280 -0
  244. package/src/types/electron.ts +340 -0
  245. package/src/types/globalStats.ts +155 -0
  246. package/src/types/hyperlink.ts +471 -0
  247. package/src/types/operations.ts +354 -0
  248. package/src/types/session.ts +427 -0
  249. package/src/types/settings.ts +112 -0
  250. package/src/utils/MemoryMonitor.ts +248 -0
  251. package/src/utils/cn.ts +6 -0
  252. package/src/utils/colorConvert.ts +306 -0
  253. package/src/utils/diffUtils.ts +347 -0
  254. package/src/utils/documentUtils.ts +202 -0
  255. package/src/utils/electronGuard.ts +62 -0
  256. package/src/utils/indexedDB.ts +915 -0
  257. package/src/utils/logger.ts +717 -0
  258. package/src/utils/pathSecurity.ts +232 -0
  259. package/src/utils/pathValidator.ts +236 -0
  260. package/src/utils/processingTimeEstimator.ts +153 -0
  261. package/src/utils/safeJsonParse.ts +62 -0
  262. package/src/utils/textSanitizer.ts +162 -0
  263. package/src/utils/urlHelpers.ts +304 -0
  264. package/src/utils/urlPatterns.ts +198 -0
  265. package/src/utils/urlSanitizer.ts +152 -0
  266. package/src/vite-env.d.ts +11 -0
  267. package/tsconfig.electron.json +19 -0
  268. package/tsconfig.json +36 -0
  269. package/tsconfig.node.json +12 -0
  270. package/typedoc.json +45 -0
  271. package/vite.config.ts +152 -0
@@ -0,0 +1,1200 @@
1
+ # Field Preservation Analysis - Critical Bug Report
2
+
3
+ **Date**: November 14, 2025
4
+ **Analysis Type**: Deep Codebase Investigation
5
+ **Status**: 🚨 **CRITICAL BUG IDENTIFIED**
6
+ **Severity**: HIGH - Data Loss Issue
7
+
8
+ ---
9
+
10
+ ## Executive Summary
11
+
12
+ After thoroughly analyzing both the **docXMLater framework** (document processing library) and **dochub-app application** (main application using the framework), I have identified **critical bugs in field preservation** that explain why fields are inconsistently preserved during DOCX file processing.
13
+
14
+ ### Key Findings
15
+
16
+ 1. ✅ **Simple fields (`<w:fldSimple>`) ARE parsed** - Code exists in DocumentParser
17
+ 2. ❌ **Simple fields MAY NOT be serialized** - Conditional preservation bug
18
+ 3. ❌ **Complex fields (`<w:fldChar>`) are COMPLETELY IGNORED** - Not parsed at all
19
+ 4. ⚠️ **Order preservation is unreliable** - Metadata generation has edge cases
20
+
21
+ ---
22
+
23
+ ## Architecture Overview
24
+
25
+ ### System Flow
26
+
27
+ ```
28
+ dochub-app (Main Application)
29
+
30
+ WordDocumentProcessor.ts
31
+
32
+ DocXMLaterProcessor.ts (thin wrapper)
33
+
34
+ docXMLater Framework
35
+ ├─ DocumentParser.ts (Load: XML → Objects)
36
+ ├─ Document.ts (In-memory document)
37
+ └─ DocumentGenerator.ts (Save: Objects → XML)
38
+ ```
39
+
40
+ ### Document Processing Pipeline
41
+
42
+ ```
43
+ 1. LOAD PHASE
44
+ ┌─────────────────────────────────────┐
45
+ │ Word DOCX File (ZIP archive) │
46
+ └──────────────┬──────────────────────┘
47
+
48
+ ┌─────────────────────────────────────┐
49
+ │ ZipHandler.load() │
50
+ │ - Extracts word/document.xml │
51
+ └──────────────┬──────────────────────┘
52
+
53
+ ┌─────────────────────────────────────┐
54
+ │ DocumentParser.parseDocument() │
55
+ │ - Parses XML to structured objects │
56
+ │ - Creates Paragraph, Run, etc. │
57
+ └──────────────┬──────────────────────┘
58
+
59
+ ┌─────────────────────────────────────┐
60
+ │ Document (in-memory) │
61
+ │ - bodyElements: Paragraph[] │
62
+ └─────────────────────────────────────┘
63
+
64
+ 2. PROCESSING PHASE
65
+ ┌─────────────────────────────────────┐
66
+ │ WordDocumentProcessor │
67
+ │ - Updates hyperlinks │
68
+ │ - Applies styles │
69
+ │ - Formats tables │
70
+ └─────────────────────────────────────┘
71
+
72
+ 3. SAVE PHASE
73
+ ┌─────────────────────────────────────┐
74
+ │ Document.save() │
75
+ │ - Calls Paragraph.toXML() │
76
+ │ - Generates word/document.xml │
77
+ └──────────────┬──────────────────────┘
78
+
79
+ ┌─────────────────────────────────────┐
80
+ │ ZipHandler.save() │
81
+ │ - Writes ZIP archive to disk │
82
+ └─────────────────────────────────────┘
83
+ ```
84
+
85
+ ---
86
+
87
+ ## 🐛 BUG #1: Complex Fields Are Completely Ignored
88
+
89
+ ### Description
90
+
91
+ **Complex fields** (built from `<w:fldChar>` marker runs) are **NOT parsed as fields at all**. No field object is created for them, so the field structure is silently dropped during document loading.
92
+
93
+ ### Field Types in Word Documents
94
+
95
+ Word documents use two field structures:
96
+
97
+ #### 1. Simple Fields (`<w:fldSimple>`)
98
+
99
+ ```xml
100
+ <w:p>
101
+ <w:fldSimple w:instr=" PAGE \* MERGEFORMAT ">
102
+ <w:t>1</w:t>
103
+ </w:fldSimple>
104
+ </w:p>
105
+ ```
106
+
107
+ #### 2. Complex Fields (`<w:fldChar>`) - ❌ **NOT HANDLED**
108
+
109
+ ```xml
110
+ <w:p>
111
+ <w:r><w:fldChar w:fldCharType="begin"/></w:r>
112
+ <w:r><w:instrText> PAGE \* MERGEFORMAT </w:instrText></w:r>
113
+ <w:r><w:fldChar w:fldCharType="separate"/></w:r>
114
+ <w:r><w:t>1</w:t></w:r>
115
+ <w:r><w:fldChar w:fldCharType="end"/></w:r>
116
+ </w:p>
117
+ ```
118
+
119
+ ### Affected Field Types
120
+
121
+ Complex fields are used for:
122
+
123
+ - ✅ Table of Contents (TOC)
124
+ - ✅ Cross-references (REF)
125
+ - ✅ Page numbers (PAGE)
126
+ - ✅ Date/Time fields (DATE, TIME)
127
+ - ✅ Document properties (AUTHOR, TITLE, etc.)
128
+ - ✅ Conditional fields (IF)
129
+ - ✅ Mail merge fields (MERGEFIELD)
130
+
131
+ ### Code Evidence
132
+
133
+ **File**: `docXMLater/src/core/DocumentParser.ts`
134
+
135
+ **Parsing Logic** (lines ~238-310):
136
+
137
+ ```typescript
138
+ // In parseParagraphWithOrder() method:
139
+ if (orderedChildren && orderedChildren.length > 0) {
140
+ for (const childInfo of orderedChildren) {
141
+ const elementType = childInfo.type;
142
+
143
+ if (elementType === 'w:r') {
144
+ // ✅ Runs are parsed
145
+ } else if (elementType === 'w:hyperlink') {
146
+ // ✅ Hyperlinks are parsed
147
+ } else if (elementType === 'w:fldSimple') {
148
+ // ✅ Simple fields are parsed
149
+ }
150
+ // ❌ NO HANDLING FOR w:fldChar - complex fields are ignored!
151
+ }
152
+ }
153
+ ```
154
+
155
+ **Parsing Method** (lines ~580-616):
156
+
157
+ ```typescript
158
+ private parseSimpleFieldFromObject(fieldObj: any): Field | null {
159
+ // ✅ This method exists and works (excerpt: `typeMatch` and `formatting` are set in lines elided here)
160
+ const instruction = fieldObj["@_w:instr"];
161
+ const type = (typeMatch?.[1] || 'PAGE') as FieldType;
162
+ return Field.create({ type, instruction, formatting });
163
+ }
164
+
165
+ // ❌ NO METHOD FOR parseComplexFieldFromObject()
166
+ // ❌ NO CODE TO DETECT w:fldChar elements
167
+ ```
168
+
169
+ **Result**: Complex fields are **silently dropped** as fields during parsing. The XML contains them, but no corresponding field object ever reaches the in-memory Document object.
170
+
171
+ ### Impact
172
+
173
+ **Hit or Miss Behavior**:
174
+
175
+ - Documents with **simple fields** (`<w:fldSimple>`) → ✅ Preserved (sometimes, see Bug #2)
176
+ - Documents with **complex fields** (`<w:fldChar>`) → ❌ Always lost
177
+ - Mixed documents → ⚠️ Partial loss (simple preserved, complex lost)
178
+
179
+ This explains the "hit or miss" nature - it depends on which field structure Word used when creating the field.
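Which structure a given document uses can be checked on the raw `word/document.xml` before processing. A minimal sketch, assuming a hypothetical helper `countFieldStructures` (not part of docXMLater or dochub-app) that counts the two structures with regular expressions instead of a real XML parser:

```typescript
// Hypothetical helper, for illustration only: counts field structures in the
// raw text of word/document.xml so at-risk documents can be flagged up front.
interface FieldCounts {
  simple: number;  // <w:fldSimple ...> opening tags
  complex: number; // <w:fldChar w:fldCharType="begin"/> markers
}

function countFieldStructures(documentXml: string): FieldCounts {
  // Closing tags start with "</w:" and therefore do not match these patterns.
  const simple = (documentXml.match(/<w:fldSimple\b/g) ?? []).length;
  const complex = (
    documentXml.match(/<w:fldChar\b[^>]*w:fldCharType="begin"/g) ?? []
  ).length;
  return { simple, complex };
}

// One simple field followed by one complex field in the same paragraph.
const sampleXml =
  '<w:p><w:fldSimple w:instr=" PAGE "><w:r><w:t>1</w:t></w:r></w:fldSimple>' +
  '<w:r><w:fldChar w:fldCharType="begin"/></w:r>' +
  '<w:r><w:instrText> DATE </w:instrText></w:r>' +
  '<w:r><w:fldChar w:fldCharType="separate"/></w:r>' +
  '<w:r><w:t>today</w:t></w:r>' +
  '<w:r><w:fldChar w:fldCharType="end"/></w:r></w:p>';

const sampleCounts = countFieldStructures(sampleXml);
console.log(sampleCounts);
```

Under this report's findings, a non-zero `complex` count would mark a document as exposed to Bug #1.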
180
+
181
+ ### Why Word Uses Different Structures
182
+
183
+ Word decides between simple and complex fields based on:
184
+
185
+ - **Simple**: Basic fields with no special formatting or nested content
186
+ - **Complex**: Fields with special formatting, nested fields, or complex instructions
187
+
188
+ The choice is **automatic and invisible to users**, which is why the bug appears random.
189
+
190
+ ---
191
+
192
+ ## 🐛 BUG #2: Simple Field Preservation Depends on `_orderedChildren` Metadata
193
+
194
+ ### Description
195
+
196
+ Simple fields (`<w:fldSimple>`) are only preserved if the **XMLParser generates `_orderedChildren` metadata**. This metadata is **conditionally generated**, leading to inconsistent field preservation.
197
+
198
+ ### Root Cause
199
+
200
+ **File**: `docXMLater/src/xml/XMLParser.ts`
201
+
202
+ **Lines ~560-575** (coalesceChildren method):
203
+
204
+ ```typescript
205
+ // Build ordered children metadata to preserve document order
206
+ const orderedChildren: Array<{ type: string; index: number }> = [];
207
+
208
+ // ... build orderedChildren array ...
209
+
210
+ // ❌ BUG: Only adds metadata if multiple child types exist
211
+ if (uniqueTypes.length > 1 && orderedChildren.length > 0) {
212
+ result['_orderedChildren'] = orderedChildren;
213
+ }
214
+ ```
215
+
216
+ ### The Problem
217
+
218
+ **Scenario 1**: Paragraph with runs, hyperlinks, AND fields
219
+
220
+ ```xml
221
+ <w:p>
222
+ <w:r><w:t>Text</w:t></w:r>
223
+ <w:hyperlink><w:r><w:t>Link</w:t></w:r></w:hyperlink>
224
+ <w:fldSimple w:instr="PAGE"><w:t>1</w:t></w:fldSimple>
225
+ </w:p>
226
+ ```
227
+
228
+ - `uniqueTypes.length = 3` (w:r, w:hyperlink, w:fldSimple)
229
+ - ✅ `_orderedChildren` is created
230
+ - ✅ Field is parsed in correct order
231
+ - ✅ **Field PRESERVED**
232
+
233
+ **Scenario 2**: Paragraph with ONLY fields
234
+
235
+ ```xml
236
+ <w:p>
237
+ <w:fldSimple w:instr="PAGE"><w:t>1</w:t></w:fldSimple>
238
+ </w:p>
239
+ ```
240
+
241
+ - `uniqueTypes.length = 1` (only w:fldSimple)
242
+ - ❌ `_orderedChildren` is **NOT created** (fails `uniqueTypes.length > 1` check)
243
+ - ❌ Falls back to non-ordered parsing
244
+ - ⚠️ **Field MAY be lost or misplaced** (depends on the fallback behavior analyzed below)
245
+
246
+ **Scenario 3**: Paragraph with interleaved runs and fields (two child types)
247
+
248
+ ```xml
249
+ <w:p>
250
+ <w:r><w:t>Page </w:t></w:r>
251
+ <w:fldSimple w:instr="PAGE"><w:t>1</w:t></w:fldSimple>
252
+ <w:r><w:t> of </w:t></w:r>
253
+ <w:fldSimple w:instr="NUMPAGES"><w:t>10</w:t></w:fldSimple>
254
+ </w:p>
255
+ ```
256
+
257
+ - `uniqueTypes.length = 2` (w:r, w:fldSimple)
258
+ - ✅ `_orderedChildren` is created
259
+ - ✅ **Fields PRESERVED**
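The three scenarios can be reproduced with a small model of the metadata step. This is a sketch under stated assumptions: `buildOrderedChildren` and its `requireMultipleTypes` flag are hypothetical names, and `index` is assumed to mean the position among siblings of the same type. Only the `uniqueTypes.length > 1` condition is taken from the excerpt above.

```typescript
interface OrderedChild {
  type: string;
  index: number; // assumed meaning: position among siblings of the same type
}

// Hypothetical stand-in for the metadata step. With requireMultipleTypes = true
// it mirrors the reported condition; false models one possible relaxation.
function buildOrderedChildren(
  childTypes: string[],
  requireMultipleTypes = true,
): OrderedChild[] | undefined {
  const perTypeCount: { [type: string]: number } = {};
  const orderedChildren: OrderedChild[] = [];

  for (const type of childTypes) {
    const index = perTypeCount[type] ?? 0;
    perTypeCount[type] = index + 1;
    orderedChildren.push({ type, index });
  }

  const uniqueTypes = Object.keys(perTypeCount);
  const enoughTypes = requireMultipleTypes ? uniqueTypes.length > 1 : true;
  return enoughTypes && orderedChildren.length > 0 ? orderedChildren : undefined;
}

// Scenario 2 (fields only) under the reported condition, then relaxed.
console.log(buildOrderedChildren(['w:fldSimple']));
console.log(buildOrderedChildren(['w:fldSimple'], false));
```

Emitting the metadata whenever any children exist would let the parser take the ordered path in every scenario. Whether that is the right fix for the real `XMLParser` is not established by this report.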
260
+
261
+ ### Fallback Behavior Analysis
262
+
263
+ **File**: `docXMLater/src/core/DocumentParser.ts` (lines ~280-310)
264
+
265
+ ```typescript
266
+ } else {
267
+ // Fallback to sequential processing if no order metadata
268
+ // Handle runs (w:r)
269
+ const runs = pElement["w:r"];
270
+ // ...process runs...
271
+
272
+ // Handle hyperlinks (w:hyperlink)
273
+ const hyperlinks = pElement["w:hyperlink"];
274
+ // ...process hyperlinks...
275
+
276
+ // Handle simple fields (w:fldSimple)
277
+ const fields = pElement["w:fldSimple"];
278
+ const fieldChildren = Array.isArray(fields) ? fields : (fields ? [fields] : []);
279
+
280
+ for (const fieldObj of fieldChildren) {
281
+ const field = this.parseSimpleFieldFromObject(fieldObj);
282
+ if (field) {
283
+ paragraph.addField(field); // ✅ Field IS added in fallback
284
+ }
285
+ }
286
+ }
287
+ ```
288
+
289
+ **Conclusion**: In the fallback path, fields **ARE processed** and added to the paragraph. However, they are processed **AFTER runs and hyperlinks**, which means:
290
+
291
+ - ✅ Fields are preserved
292
+ - ⚠️ Field **ORDER** may be wrong (always appear last instead of in correct position)
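The reordering effect of type-by-type processing can be shown with a toy model. The `Child` type and `fallbackOrder` function are illustrative names, not docXMLater code. The only thing taken from the excerpt is the visiting order (runs, then hyperlinks, then simple fields).

```typescript
type ChildType = 'w:r' | 'w:hyperlink' | 'w:fldSimple';

interface Child {
  type: ChildType;
  label: string; // human-readable stand-in for the element's content
}

// Visits children grouped by element type, in the fixed order the fallback uses.
function fallbackOrder(children: Child[]): Child[] {
  const visitOrder: ChildType[] = ['w:r', 'w:hyperlink', 'w:fldSimple'];
  let result: Child[] = [];
  for (const type of visitOrder) {
    result = result.concat(children.filter((child) => child.type === type));
  }
  return result;
}

// "Page <PAGE> of <NUMPAGES>" in document order.
const footerChildren: Child[] = [
  { type: 'w:r', label: 'Page ' },
  { type: 'w:fldSimple', label: 'PAGE' },
  { type: 'w:r', label: ' of ' },
  { type: 'w:fldSimple', label: 'NUMPAGES' },
];

const reordered = fallbackOrder(footerChildren).map((c) => c.label).join('|');
console.log(reordered);
```

Going by the `uniqueTypes.length > 1` condition quoted earlier, a mixed paragraph like this one would normally carry `_orderedChildren` and skip the fallback. The sketch only shows what grouped processing would do to such content if the fallback were reached.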
293
+
294
+ ### Why This Causes "Hit or Miss"
295
+
296
+ **Working Case** (multi-type paragraph):
297
+
298
+ - Paragraph has runs + fields → `_orderedChildren` created → Fields in correct order ✅
299
+
300
+ **Broken Case** (fields only):
301
+
302
+ - Paragraph has only fields → No `_orderedChildren` → Fallback parsing → Fields at wrong position ⚠️
303
+ - If document structure depends on field order (e.g., TOC), this breaks functionality
304
+
305
+ ---
306
+
307
+ ## 🐛 BUG #3: Field Serialization Does Not Preserve Complex Fields
308
+
309
+ ### Description
310
+
311
+ Even if complex fields WERE parsed (they aren't per Bug #1), the serialization code in `Paragraph.toXML()` cannot properly reconstruct them.
312
+
313
+ ### Code Evidence
314
+
315
+ **File**: `docXMLater/src/elements/Paragraph.ts` (lines ~600-650)
316
+
317
+ ```typescript
318
+ // Add content (runs, fields, hyperlinks, revisions, shapes, textboxes)
319
+ for (let i = 0; i < this.content.length; i++) {
320
+ const item = this.content[i];
321
+
322
+ if (item instanceof Field) {
323
+ // ❌ BUG: Field is emitted as <w:fldSimple> nested inside a <w:r> (the schema places <w:fldSimple> directly under <w:p>)
324
+ paragraphChildren.push(XMLBuilder.w('r', undefined, [item.toXML()]));
325
+ } else if (item instanceof Hyperlink) {
326
+ // ✅ Hyperlinks are standalone elements
327
+ paragraphChildren.push(item.toXML());
328
+ } else if (item) {
329
+ paragraphChildren.push(item.toXML());
330
+ }
331
+ }
332
+ ```
333
+
334
+ **Field.toXML()** from `Field.ts`:
335
+
336
+ ```typescript
337
+ toXML(): XMLElement {
338
+ // ...
339
+ return {
340
+ name: 'w:fldSimple', // ❌ Always generates fldSimple, never fldChar
341
+ attributes: {
342
+ 'w:instr': this.instruction,
343
+ },
344
+ children,
345
+ };
346
+ }
347
+ ```
348
+
349
+ ### The Problem
350
+
351
+ The current code **always serializes fields as `<w:fldSimple>`**, even if they were originally complex fields. This means:
352
+
353
+ 1. Complex fields can't be represented in the object model
354
+ 2. Even if parsing were fixed, serialization would convert them to simple fields
355
+ 3. Word may reject or incorrectly render the simplified fields
356
+
357
+ ### Impact
358
+
359
+ - Complex field formatting is lost
360
+ - Nested fields are flattened
361
+ - TOC fields may not update correctly in Word
362
+ - Cross-references lose their specialized behavior
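For contrast, a serializer that kept the complex structure would emit the five-run sequence shown under Bug #1. A minimal sketch, assuming hypothetical helpers `escapeXml` and `complexFieldToXml` that build the XML as a string. The real framework builds `XMLElement` objects, and this sketch omits run formatting and nested fields.

```typescript
// Escapes the characters that are unsafe in XML text content.
function escapeXml(text: string): string {
  return text.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');
}

// Hypothetical serializer: begin marker, instruction, separator, cached
// result, end marker. Each part lives in its own run, as in Word's output.
function complexFieldToXml(instruction: string, result: string): string {
  const paddedInstruction = ` ${instruction.trim()} `;
  return [
    '<w:r><w:fldChar w:fldCharType="begin"/></w:r>',
    `<w:r><w:instrText xml:space="preserve">${escapeXml(paddedInstruction)}</w:instrText></w:r>`,
    '<w:r><w:fldChar w:fldCharType="separate"/></w:r>',
    `<w:r><w:t xml:space="preserve">${escapeXml(result)}</w:t></w:r>`,
    '<w:r><w:fldChar w:fldCharType="end"/></w:r>',
  ].join('');
}

const pageFieldXml = complexFieldToXml('PAGE', '1');
console.log(pageFieldXml);
```

An object model would also need to remember which structure a field came from, so that simple fields still round-trip as `<w:fldSimple>`.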
363
+
364
+ ---
365
+
366
+ ## 🐛 BUG #4: Runs with `w:fldChar` Elements Are Treated as Regular Text Runs
367
+
368
+ ### Description
369
+
370
+ During parsing, runs that contain `<w:fldChar>` elements (field markers) are processed as **regular text runs**, losing the field structure entirely.
371
+
372
+ ### Code Evidence
373
+
374
+ **File**: `docXMLater/src/core/DocumentParser.ts` (lines ~450-550)
375
+
376
+ ```typescript
377
+ private parseRunFromObject(runObj: any): Run | null {
378
+ // Extract all run content elements (text, tabs, breaks, etc.)
379
+ const content: RunContent[] = [];
380
+
381
+ if (runObj["_orderedChildren"]) {
382
+ for (const child of runObj["_orderedChildren"]) {
383
+ const elementType = child.type;
384
+
385
+ switch (elementType) {
386
+ case 'w:t':
387
+ // ✅ Text is handled
388
+ break;
389
+ case 'w:tab':
390
+ // ✅ Tabs are handled
391
+ break;
392
+ case 'w:br':
393
+ // ✅ Breaks are handled
394
+ break;
395
+ // ❌ NO CASE FOR 'w:fldChar'
396
+ // ❌ NO CASE FOR 'w:instrText'
397
+ }
398
+ }
399
+ }
400
+
401
+ // Create run from content elements - returns a regular Run
402
+ const run = Run.createFromContent(content, { cleanXmlFromText: false });
403
+ return run;
404
+ }
405
+ ```
406
+
407
+ ### What Should Happen
408
+
409
+ When a run contains `<w:fldChar>`, it's part of a **complex field structure**:
410
+
411
+ ```xml
412
+ <!-- Field begin marker -->
413
+ <w:r><w:fldChar w:fldCharType="begin"/></w:r>
414
+
415
+ <!-- Field instruction -->
416
+ <w:r><w:instrText> PAGE \* MERGEFORMAT </w:instrText></w:r>
417
+
418
+ <!-- Field separator -->
419
+ <w:r><w:fldChar w:fldCharType="separate"/></w:r>
420
+
421
+ <!-- Field result (actual displayed value) -->
422
+ <w:r><w:t>1</w:t></w:r>
423
+
424
+ <!-- Field end marker -->
425
+ <w:r><w:fldChar w:fldCharType="end"/></w:r>
426
+ ```
427
+
428
+ These runs should be **grouped together** and converted to a single Field object.
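The grouping can be sketched as a small state machine over a paragraph's runs. The `RunToken` and `ParsedItem` types and the `groupComplexFields` function are illustrative assumptions, not docXMLater's API. Nested fields are deliberately out of scope here, since they would need a stack instead of a single state.

```typescript
// Simplified view of what a single run contributes (illustrative types).
type RunToken =
  | { kind: 'fldChar'; charType: 'begin' | 'separate' | 'end' }
  | { kind: 'instrText'; text: string }
  | { kind: 'text'; text: string };

type ParsedItem =
  | { kind: 'run'; text: string }
  | { kind: 'complexField'; instruction: string; result: string };

function groupComplexFields(tokens: RunToken[]): ParsedItem[] {
  const items: ParsedItem[] = [];
  let state: 'outside' | 'instruction' | 'result' = 'outside';
  let instruction = '';
  let result = '';

  for (const token of tokens) {
    if (token.kind === 'fldChar') {
      if (token.charType === 'begin') {
        state = 'instruction';
        instruction = '';
        result = '';
      } else if (token.charType === 'separate') {
        state = 'result';
      } else {
        // 'end': close the field only if one is currently open.
        if (state !== 'outside') {
          items.push({ kind: 'complexField', instruction: instruction.trim(), result });
        }
        state = 'outside';
      }
    } else if (token.kind === 'instrText') {
      // Instruction text only counts between 'begin' and 'separate'.
      if (state === 'instruction') instruction += token.text;
    } else if (state === 'result') {
      result += token.text; // cached display value of the field
    } else if (state === 'outside') {
      items.push({ kind: 'run', text: token.text });
    }
    // Plain text seen while in 'instruction' state is ignored in this sketch.
  }
  return items;
}

const footerRuns: RunToken[] = [
  { kind: 'text', text: 'Page ' },
  { kind: 'fldChar', charType: 'begin' },
  { kind: 'instrText', text: ' PAGE \\* MERGEFORMAT ' },
  { kind: 'fldChar', charType: 'separate' },
  { kind: 'text', text: '1' },
  { kind: 'fldChar', charType: 'end' },
];

const groupedItems = groupComplexFields(footerRuns);
console.log(groupedItems);
```

Here the five field runs collapse into one `complexField` item, and the leading "Page " run stays an ordinary run.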
429
+
430
+ ### What Actually Happens
431
+
432
+ Each run is parsed **independently**:
433
+
434
+ - Run with `<w:fldChar w:fldCharType="begin"/>` → Parsed as empty Run (no text) → ✅ Preserved as empty run
434
+ - Run with `instrText` → Parsed as Run with text " PAGE \* MERGEFORMAT " → ❌ Shows as literal text
435
+ - Run with `<w:fldChar w:fldCharType="separate"/>` → Parsed as empty Run → ✅ Preserved as empty run
436
+ - Run with actual value → Parsed as Run with text "1" → ✅ Preserved
437
+ - Run with `<w:fldChar w:fldCharType="end"/>` → Parsed as empty Run → ✅ Preserved as empty run
439
+
440
+ **Result**: The field structure is **lost**. The instruction text appears as **literal visible text** in the document instead of being executed as a field.
441
+
442
+ ---
443
+
444
+ ## 🐛 BUG #5: Field Order Can Be Scrambled During Serialization
445
+
446
+ ### Description
447
+
448
+ Even when fields ARE preserved during parsing, they may be serialized in the wrong order relative to runs and hyperlinks.
449
+
450
+ ### Code Evidence
451
+
452
+ **File**: `docXMLater/src/elements/Paragraph.ts` (lines ~600-650)
453
+
454
+ ```typescript
455
+ // Add content (runs, fields, hyperlinks, revisions, shapes, textboxes)
456
+ for (let i = 0; i < this.content.length; i++) {
457
+ const item = this.content[i];
458
+
459
+ if (item instanceof Field) {
460
+ paragraphChildren.push(XMLBuilder.w('r', undefined, [item.toXML()]));
461
+ } else if (item instanceof Hyperlink) {
462
+ paragraphChildren.push(item.toXML());
463
+ } else if (item instanceof Revision) {
464
+ paragraphChildren.push(item.toXML());
465
+ } else if (item instanceof RangeMarker) {
466
+ paragraphChildren.push(item.toXML());
467
+ } else if (item) {
468
+ paragraphChildren.push(item.toXML());
469
+ }
470
+ }
471
+ ```
472
+
473
+ ### The Problem
474
+
475
+ The serialization iterates through `this.content[]` in order, which **should** preserve order. However:
476
+
477
+ 1. **During parsing**, the order depends on whether `_orderedChildren` exists (Bug #2)
478
+ 2. **During processing**, the `content[]` array may be modified by dochub-app operations
479
+ 3. **No validation** ensures the order remains correct
480
+
481
+ ### Potential Scenario
482
+
483
+ ```
484
+ Original: Text [FIELD: PAGE] Link
485
+ Parsed: [Run: "Text"] [Field: PAGE] [Hyperlink: "Link"]
486
+ After processing: [Run: "Text"] [Hyperlink: "Link"] [Field: PAGE]
487
+ Saved: Text Link [FIELD: PAGE] ❌ Wrong order!
488
+ ```
489
+
490
+ While this is **theoretically possible**, the current code doesn't modify `content[]` order during processing, so this is **low risk** compared to Bugs #1-4.
491
+
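One possible shape for the missing validation is a comparison of content order before and after processing. This is a sketch under assumptions: `ContentItem`, its `id` property, and `firstOrderMismatch` are hypothetical and do not exist in docXMLater; it also assumes each content item can be given a stable identifier at parse time.

```typescript
// Hypothetical stand-in for paragraph content items; only kind and id matter here.
type ContentKind = 'run' | 'field' | 'hyperlink' | 'revision';

interface ContentItem {
  kind: ContentKind;
  id: string; // assumed stable identifier assigned at parse time
}

// Returns the index of the first position where the two snapshots differ,
// or -1 if both list the same items in the same order.
function firstOrderMismatch(before: ContentItem[], after: ContentItem[]): number {
  const length = Math.max(before.length, after.length);
  for (let i = 0; i < length; i++) {
    if (before[i]?.id !== after[i]?.id) return i;
  }
  return -1;
}

const parsed: ContentItem[] = [
  { kind: 'run', id: 'r1' },
  { kind: 'field', id: 'f1' },
  { kind: 'hyperlink', id: 'h1' },
];
const processed: ContentItem[] = [
  { kind: 'run', id: 'r1' },
  { kind: 'hyperlink', id: 'h1' },
  { kind: 'field', id: 'f1' },
];

console.log(firstOrderMismatch(parsed, parsed)); // -1
console.log(firstOrderMismatch(parsed, processed)); // 1
```

A check like this could run just before serialization and log a warning rather than fail the save.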
492
+ ---
493
+
494
+ ## 📊 Summary Table: Field Preservation Matrix
495
+
496
+ | Field Structure | Paragraph Content | `_orderedChildren`? | Parsing Result | Serialization | Final Result |
497
+ | -------------------- | --------------------- | ------------------- | ------------------ | ---------------- | ---------------------- |
498
+ | Simple (`fldSimple`) | Runs + Fields + Links | ✅ Yes (3 types) | ✅ Parsed in order | ✅ Correct order | ✅ **PRESERVED** |
499
+ | Simple (`fldSimple`) | Runs + Fields | ✅ Yes (2 types) | ✅ Parsed in order | ✅ Correct order | ✅ **PRESERVED** |
500
+ | Simple (`fldSimple`) | Fields only | ❌ No (1 type) | ⚠️ Fallback path | ⚠️ Wrong order | ⚠️ **ORDER BROKEN** |
501
+ | Complex (`fldChar`) | Any combination | N/A | ❌ Not parsed | ❌ No object | ❌ **COMPLETELY LOST** |
502
+
503
+ ---
504
+
505
+ ## 🔍 Root Cause Analysis
506
+
507
+ ### Why These Bugs Exist
508
+
509
+ #### 1. **Incomplete Implementation**
510
+
511
+ The docXMLater framework has:
512
+
513
+ - ✅ `Field` class defined in `elements/Field.ts`
514
+ - ✅ `ComplexField` class defined in `elements/Field.ts`
515
+ - ❌ **No parsing code** for `ComplexField` in `DocumentParser.ts`
516
+ - ❌ **No detection logic** for `w:fldChar` elements
517
+
518
+ **Evidence**: The `Field.ts` file has complete classes for both simple and complex fields, but `DocumentParser.ts` only has code to parse simple fields.
519
+
520
+ #### 2. **Design Flaw in Order Preservation**
521
+
522
+ The `_orderedChildren` metadata is meant to preserve element order, but the condition `uniqueTypes.length > 1` is **too restrictive**:
523
+
524
+ ```typescript
525
+ // ❌ FLAWED LOGIC: Assumes order only matters with multiple types
526
+ if (uniqueTypes.length > 1 && orderedChildren.length > 0) {
527
+ result['_orderedChildren'] = orderedChildren;
528
+ }
529
+
530
+ // ✅ CORRECT LOGIC: Order always matters
531
+ if (orderedChildren.length > 0) {
532
+ result['_orderedChildren'] = orderedChildren;
533
+ }
534
+ ```
535
+
536
+ The assumption that "single-type content doesn't need ordering" is **FALSE**. Consider:
537
+
538
+ - Multiple fields in sequence → Order matters (e.g., "Page 1 of 10")
539
+ - Multiple runs → Order matters for text flow
540
+
541
+ #### 3. **Architectural Mismatch**
542
+
543
+ The framework was designed with a model in which **each element type is a separate object**:
544
+
545
+ - Paragraphs contain **Runs, Hyperlinks, Fields**
546
+ - Each is a separate object type
547
+
548
+ But Word's XML has a **run-based structure** where **complex fields ARE runs**:
549
+
550
+ - Complex fields are **sequences of special runs**
551
+ - Each `<w:r>` can contain `<w:fldChar>` or `<w:instrText>`
552
+
553
+ The framework needs **stateful parsing** to group these runs into Field objects, but it uses **stateless element-by-element parsing**.
554
+
555
+ ---
556
+
557
+ ## 💡 Impact on User Experience
558
+
559
+ ### Symptoms Users See
560
+
561
+ 1. **"Fields disappear"** after processing
562
+ - User inserts PAGE field in Word
563
+ - Processes document in dochub-app
564
+ - Opens result → `PAGE` shows as literal text or is missing
565
+
566
+ 2. **"Sometimes it works, sometimes it doesn't"**
567
+ - Document A: Simple fields → Works ✅
568
+ - Document B: Complex fields → Fails ❌
569
+ - Same operation, different field types → Appears random
570
+
571
+ 3. **"Table of Contents is broken"**
572
+ - TOC uses complex fields
573
+ - Always lost during processing
574
+ - Document needs manual TOC recreation after processing
575
+
576
+ 4. **"Page numbers disappear"**
577
+ - Header/footer page numbers often use complex fields
578
+ - Lost during document processing
579
+ - Users must manually re-insert fields
580
+
581
+ ### Real-World Scenarios
582
+
583
+ **Scenario A: Legal Documents**
584
+
585
+ ```
586
+ Original: Contract dated [DATE] on page [PAGE]
587
+ After: Contract dated DATE on page PAGE
588
+ ```
589
+
590
+ - Field codes appear as literal text
591
+ - Professional documents look broken
592
+
593
+ **Scenario B: Reports**
594
+
595
+ ```
596
+ Original: [TOC with hyperlinked entries]
597
+ After: [Empty TOC or visible field codes]
598
+ ```
599
+
600
+ - TOC must be manually regenerated
601
+ - Cross-references broken
602
+
603
+ **Scenario C: Templates**
604
+
605
+ ```
606
+ Original: Author: [AUTHOR], Modified: [SAVEDATE]
607
+ After: Author: AUTHOR, Modified: SAVEDATE
608
+ ```
609
+
610
+ - Dynamic fields converted to static text
611
+ - Document loses its template functionality
612
+
613
+ ---
614
+
615
+ ## 🛠️ Recommended Fixes
616
+
617
+ ### Fix Priority
618
+
619
+ 1. **CRITICAL** - BUG #1: Add complex field parsing
620
+ 2. **HIGH** - BUG #4: Detect and group `w:fldChar` runs
621
+ 3. **MEDIUM** - BUG #2: Always generate `_orderedChildren`
622
+ 4. **LOW** - BUG #3: Add ComplexField serialization
623
+ 5. **LOW** - BUG #5: Validate content order preservation
624
+
625
+ ### Fix #1: Add Complex Field Parsing
626
+
627
+ **File**: `docXMLater/src/core/DocumentParser.ts`
628
+
629
+ **Location**: `parseParagraphWithOrder()` method (around line 250)
630
+
631
+ **Current Code**:
632
+
633
+ ```typescript
634
+ } else if (elementType === "w:fldSimple") {
635
+ // Parse simple fields
636
+ }
637
+ ```
638
+
639
+ **Add After**:
640
+
641
+ ```typescript
642
+ } else if (elementType === "w:r") {
643
+ // Check if this run contains field characters
644
+ const run = runArray[elementIndex];
645
+ if (run && (run["w:fldChar"] || run["w:instrText"])) {
646
+ // This is part of a complex field - add to pending field parser
647
+ this.addToComplexField(run);
648
+ } else {
649
+ // Regular run
650
+ const parsedRun = this.parseRunFromObject(run);
651
+ if (parsedRun) paragraph.addRun(parsedRun);
652
+ }
653
+ }
654
+ ```
655
+
656
+ **Add New Method**:
657
+
658
+ ```typescript
659
+ private complexFieldBuffer: any[] = [];
660
+
661
+ private addToComplexField(run: any): void {
662
+ this.complexFieldBuffer.push(run);
663
+
664
+ // Check if we've completed a field (found "end" marker)
665
+ if (run["w:fldChar"]?.["@_w:fldCharType"] === "end") {
666
+ const field = this.parseComplexFieldFromBuffer();
667
+ if (field) {
668
+ // Add field to current paragraph
669
+ }
670
+ this.complexFieldBuffer = [];
671
+ }
672
+ }
673
+
674
+ private parseComplexFieldFromBuffer(): ComplexField | null {
675
+ // Parse the buffered runs into a ComplexField object
676
+ // Extract instruction from w:instrText
677
+ // Extract result from runs between separate and end
678
+ return null; // TODO: build and return the ComplexField instance
679
+ }
680
+ ```
681
+
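One detail the grouping logic has to handle: the result run between `separate` and `end` contains neither `w:fldChar` nor `w:instrText`, so a check on those two keys alone would leave it outside the buffer. The sketch below tracks an open-field depth instead. It is an illustration under assumptions, not the framework's implementation: `FieldRun`, `GroupedItem`, and `groupFieldRuns` are hypothetical names, and nested fields are flattened rather than turned into nested `ComplexField` objects.

```typescript
// Simplified run shape (hypothetical; the real parser works on parsed XML objects).
interface FieldRun {
  fldCharType?: 'begin' | 'separate' | 'end';
  instrText?: string;
  text?: string;
}

interface GroupedField { kind: 'field'; instruction: string; result: string; }
interface GroupedRun { kind: 'run'; text: string; }
type GroupedItem = GroupedField | GroupedRun;

// Groups begin...end run sequences into field items. `depth` counts open fields
// so that a nested field's "end" marker does not close the outer field early.
function groupFieldRuns(runs: FieldRun[]): GroupedItem[] {
  const items: GroupedItem[] = [];
  let depth = 0;
  let inResult = false;
  let instruction = '';
  let result = '';

  for (const run of runs) {
    if (run.fldCharType === 'begin') {
      if (depth === 0) { instruction = ''; result = ''; inResult = false; }
      depth++;
    } else if (depth === 0) {
      items.push({ kind: 'run', text: run.text ?? '' }); // ordinary run outside any field
    } else if (run.fldCharType === 'separate') {
      if (depth === 1) inResult = true; // outer field switches to its result part
    } else if (run.fldCharType === 'end') {
      depth--;
      if (depth === 0) items.push({ kind: 'field', instruction: instruction.trim(), result });
    } else if (inResult) {
      result += run.text ?? ''; // result runs carry plain <w:t> text
    } else if (depth === 1) {
      instruction += run.instrText ?? '';
    }
  }
  return items;
}

const grouped = groupFieldRuns([
  { text: 'Page ' },
  { fldCharType: 'begin' },
  { instrText: ' PAGE \\* MERGEFORMAT ' },
  { fldCharType: 'separate' },
  { text: '1' },
  { fldCharType: 'end' },
]);
console.log(grouped.length); // 2
```

A field that is never closed is silently dropped in this sketch; a production parser would more likely flush the buffered runs back as ordinary runs so that no content disappears.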
682
+ ### Fix #2: Always Generate `_orderedChildren`
683
+
684
+ **File**: `docXMLater/src/xml/XMLParser.ts`
685
+
686
+ **Location**: `coalesceChildren()` method (line ~573)
687
+
688
+ **Current Code**:
689
+
690
+ ```typescript
691
+ // ❌ BUG: Only adds metadata if multiple child types exist
692
+ if (uniqueTypes.length > 1 && orderedChildren.length > 0) {
693
+ result['_orderedChildren'] = orderedChildren;
694
+ }
695
+ ```
696
+
697
+ **Fixed Code**:
698
+
699
+ ```typescript
700
+ // ✅ FIX: Always add metadata to preserve element order
701
+ if (orderedChildren.length > 0) {
702
+ result['_orderedChildren'] = orderedChildren;
703
+ }
704
+ ```
705
+
706
+ **Impact**: This small change makes `_orderedChildren` available even for paragraphs with only one element type, so element order no longer depends on the fallback path.
707
+
708
+ ### Fix #3: Handle `w:fldChar` in Run Parsing
709
+
710
+ **File**: `docXMLater/src/core/DocumentParser.ts`
711
+
712
+ **Location**: `parseRunFromObject()` method (around line 500)
713
+
714
+ **Add to switch statement**:
715
+
716
+ ```typescript
717
+ switch (elementType) {
718
+ case 'w:t':
719
+ // Existing text handling
720
+ break;
721
+
722
+ case 'w:tab':
723
+ // Existing tab handling
724
+ break;
725
+
726
+ case 'w:fldChar':
727
+ // ✅ NEW: Handle field character markers
728
+ const fldChar = runObj['w:fldChar'];
729
+ const fldCharType = fldChar?.['@_w:fldCharType'];
730
+ content.push({
731
+ type: 'fieldCharacter',
732
+ value: fldCharType, // 'begin', 'separate', or 'end'
733
+ });
734
+ break;
735
+
736
+ case 'w:instrText':
737
+ // ✅ NEW: Handle field instruction text
738
+ const instrText = runObj['w:instrText'];
739
+ const instruction =
740
+ typeof instrText === 'object' && instrText !== null
741
+ ? instrText['#text'] || ''
742
+ : instrText || '';
743
+ content.push({
744
+ type: 'fieldInstruction',
745
+ value: instruction,
746
+ });
747
+ break;
748
+ }
749
+ ```
750
+
751
+ ### Fix #4: Update Run to Support Field Elements
752
+
753
+ **File**: `docXMLater/src/elements/Run.ts`
754
+
755
+ **Update RunContentType**:
756
+
757
+ ```typescript
758
+ export type RunContentType =
759
+ | 'text'
760
+ | 'tab'
761
+ | 'break'
762
+ | 'carriageReturn'
763
+ | 'softHyphen'
764
+ | 'noBreakHyphen'
765
+ | 'fieldCharacter' // ✅ NEW: w:fldChar elements
766
+ | 'fieldInstruction'; // ✅ NEW: w:instrText elements
767
+ ```
768
+
769
+ **Update Run.toXML()**:
770
+
771
+ ```typescript
772
+ switch (contentElement.type) {
773
+ case 'fieldCharacter':
774
+ runChildren.push(
775
+ XMLBuilder.wSelf('fldChar', {
776
+ 'w:fldCharType': contentElement.value,
777
+ })
778
+ );
779
+ break;
780
+
781
+ case 'fieldInstruction':
782
+ runChildren.push(
783
+ XMLBuilder.w(
784
+ 'instrText',
785
+ {
786
+ 'xml:space': 'preserve',
787
+ },
788
+ [contentElement.value || '']
789
+ )
790
+ );
791
+ break;
792
+ }
793
+ ```
794
+
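To show what the two new content types are expected to serialize to, here is a string-based sketch that does not depend on `XMLBuilder`. The `RunContent` union, `escapeXml`, `runContentToXml`, and `runToXml` are hypothetical helpers written for this illustration only.

```typescript
// Reduced version of the run content model, limited to the types relevant here.
type RunContent =
  | { type: 'text'; value: string }
  | { type: 'fieldCharacter'; value: 'begin' | 'separate' | 'end' }
  | { type: 'fieldInstruction'; value: string };

// Escapes the characters that are not allowed verbatim in XML text content.
function escapeXml(value: string): string {
  return value.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');
}

function runContentToXml(content: RunContent): string {
  switch (content.type) {
    case 'text':
      return `<w:t xml:space="preserve">${escapeXml(content.value)}</w:t>`;
    case 'fieldCharacter':
      return `<w:fldChar w:fldCharType="${content.value}"/>`;
    case 'fieldInstruction':
      // xml:space="preserve" keeps the leading and trailing spaces of the instruction
      return `<w:instrText xml:space="preserve">${escapeXml(content.value)}</w:instrText>`;
  }
}

function runToXml(contents: RunContent[]): string {
  return `<w:r>${contents.map(runContentToXml).join('')}</w:r>`;
}

console.log(runToXml([{ type: 'fieldCharacter', value: 'begin' }]));
console.log(runToXml([{ type: 'fieldInstruction', value: ' PAGE \\* MERGEFORMAT ' }]));
```

With this approach the field markers survive a load/save cycle as ordinary run content, even before any grouping into `ComplexField` objects is implemented.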
795
+ ---
796
+
797
+ ## 🧪 Testing Strategy
798
+
799
+ ### Test Cases
800
+
801
+ #### Test 1: Simple Field Preservation
802
+
803
+ ```typescript
804
+ // Create document with simple field
805
+ const doc = Document.create();
806
+ const para = new Paragraph();
807
+ para.addField(Field.createPageNumber());
808
+ doc.addParagraph(para);
809
+
810
+ // Save and reload
811
+ await doc.save('test.docx');
812
+ const doc2 = await Document.load('test.docx');
813
+
814
+ // Verify field exists
815
+ const paras = doc2.getParagraphs();
816
+ assert(paras[0].getContent().some((item) => item instanceof Field));
817
+ ```
818
+
819
+ #### Test 2: Complex Field Preservation
820
+
821
+ ```typescript
822
+ // Create document with complex field
823
+ const doc = Document.create();
824
+ const para = new Paragraph();
825
+ const complexField = new ComplexField({
826
+ instruction: ' PAGE \\* MERGEFORMAT ',
827
+ result: '1',
828
+ });
829
+ para.addField(complexField);
830
+ doc.addParagraph(para);
831
+
832
+ // Save and reload
833
+ await doc.save('test-complex.docx');
834
+ const doc2 = await Document.load('test-complex.docx');
835
+
836
+ // Verify complex field preserved
837
+ const content = doc2.getParagraphs()[0].getContent();
838
+ assert(content.some((item) => item instanceof ComplexField));
839
+ ```
840
+
841
+ #### Test 3: Field Order Preservation
842
+
843
+ ```typescript
844
+ // Create paragraph with interleaved content
845
+ const para = new Paragraph();
846
+ para.addText('Page ');
847
+ para.addField(Field.createPageNumber());
848
+ para.addText(' of ');
849
+ para.addField(Field.createTotalPages());
850
+
851
+ // Save and reload
852
+ // Verify order: Run → Field → Run → Field
853
+ ```
854
+
855
+ ### Validation Approach
856
+
857
+ 1. **Unit Tests**: Test each parsing/serialization method independently
858
+ 2. **Integration Tests**: Test full load/save cycle
859
+ 3. **Real Documents**: Test with actual Word documents containing various field types
860
+ 4. **Regression Tests**: Ensure fixes don't break existing functionality
861
+
862
+ ---
863
+
864
+ ## 📝 Additional Observations
865
+
866
+ ### Positive Findings
867
+
868
+ ✅ **The Field classes are well-designed**
869
+
870
+ - `Field.ts` and `FieldHelpers.ts` provide comprehensive field support
871
+ - Both simple and complex fields have complete implementations
872
+ - Field creation helpers exist for common field types
873
+
874
+ ✅ **Paragraph and Run handling is solid**
875
+
876
+ - Order preservation works well for runs and hyperlinks
877
+ - The `_orderedChildren` mechanism is clever
878
+ - Type-safe object model prevents many bugs
879
+
880
+ ✅ **dochub-app integration is clean**
881
+
882
+ - WordDocumentProcessor uses docXMLater APIs correctly
883
+ - Error handling is comprehensive
884
+ - Memory management is excellent
885
+
886
+ ### Areas of Concern
887
+
888
+ ⚠️ **No Field Extraction API**
889
+
890
+ - WordDocumentProcessor has `extractHyperlinks()` method
891
+ - **No equivalent `extractFields()` method** exists
892
+ - Can't enumerate fields in a document programmatically
893
+ - dochub-app can't validate or report on field preservation
894
+
895
+ ⚠️ **No Field Validation**
896
+
897
+ - No checks during save to warn about lost fields
898
+ - Silent data loss - users don't know fields were dropped
899
+ - No diff/comparison showing before/after field counts
900
+
901
+ ⚠️ **Limited Field Support in Paragraph API**
902
+
903
+ - `Paragraph.addField()` exists
904
+ - **No `Paragraph.getFields()` method**
905
+ - **No `Paragraph.removeField()` method**
906
+ - Fields can't be queried or manipulated after being added
907
+
908
+ ---
909
+
910
+ ## 🎯 Critical Path to Resolution
911
+
912
+ ### For Immediate Relief (Quick Fix)
913
+
914
+ **Option A: Document with Warning**
915
+
916
+ 1. Add field count validation before/after processing
917
+ 2. Warn users if fields are lost
918
+ 3. Document limitation in UI: "Complex fields are not preserved"
919
+
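The count validation in Option A can work on the raw `word/document.xml` strings, so it does not depend on the object model that loses the fields. A minimal sketch, assuming both XML strings are already available; `FieldCounts`, `countFields`, and `lostFieldWarnings` are hypothetical names, and the pattern assumes double-quoted attributes as Word writes them.

```typescript
interface FieldCounts {
  simple: number; // <w:fldSimple> elements
  complex: number; // <w:fldChar w:fldCharType="begin"> markers, one per complex field
}

// Counts fields by scanning the XML text; no DOM or object model is involved.
function countFields(documentXml: string): FieldCounts {
  const simple = (documentXml.match(/<w:fldSimple\b/g) ?? []).length;
  const complex = (documentXml.match(/<w:fldChar\b[^>]*w:fldCharType="begin"/g) ?? []).length;
  return { simple, complex };
}

// Produces one warning per field kind whose count dropped during processing.
function lostFieldWarnings(before: FieldCounts, after: FieldCounts): string[] {
  const warnings: string[] = [];
  if (after.simple < before.simple) {
    warnings.push(`${before.simple - after.simple} simple field(s) lost`);
  }
  if (after.complex < before.complex) {
    warnings.push(`${before.complex - after.complex} complex field(s) lost`);
  }
  return warnings;
}

const xmlBefore =
  '<w:p><w:fldSimple w:instr=" PAGE "/>' +
  '<w:r><w:fldChar w:fldCharType="begin"/></w:r>' +
  '<w:r><w:fldChar w:fldCharType="end"/></w:r></w:p>';
const xmlAfter = '<w:p><w:fldSimple w:instr=" PAGE "/></w:p>';

console.log(lostFieldWarnings(countFields(xmlBefore), countFields(xmlAfter)));
```

Headers and footers live in separate parts of the package, so a complete check would apply the same count to those files as well.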
920
+ **Option B: Raw XML Passthrough**
921
+
922
+ 1. Detect complex fields during load
923
+ 2. Store original XML for those paragraphs
924
+ 3. Write back unchanged XML during save
925
+ 4. Only process paragraphs without complex fields
926
+
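The detection step of Option B could look roughly like the following. This is a sketch under assumptions, not a proposal for the final design: `ParagraphSlice` and `splitParagraphs` are hypothetical, the regular expression does not handle paragraphs nested inside text boxes, and a field that spans several paragraphs (such as a TOC) would need every paragraph between its `begin` and `end` markers flagged, which this version does not do.

```typescript
interface ParagraphSlice {
  xml: string; // original paragraph XML, untouched
  passthrough: boolean; // true when the paragraph should be written back unchanged
}

// Splits body XML into <w:p>...</w:p> slices and flags those containing complex
// field markers. Self-closing <w:p/> elements are not matched by the pattern.
function splitParagraphs(bodyXml: string): ParagraphSlice[] {
  const paragraphPattern = /<w:p(?:\s[^>]*[^\/>])?>[\s\S]*?<\/w:p>/g;
  return (bodyXml.match(paragraphPattern) ?? []).map((xml) => ({
    xml,
    passthrough: /<w:fldChar\b/.test(xml),
  }));
}

const sampleBody =
  '<w:p><w:pPr/><w:r><w:t>Hi</w:t></w:r></w:p>' +
  '<w:p w:rsidR="00A1"><w:r><w:fldChar w:fldCharType="begin"/></w:r></w:p>';

const slices = splitParagraphs(sampleBody);
console.log(slices.map((slice) => slice.passthrough)); // [ false, true ]
```

Paragraphs flagged `passthrough` would skip the object model entirely and be written back from `xml` during save.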
927
+ ### For Complete Solution (Full Fix)
928
+
929
+ **Phase 1: Parsing**
930
+
931
+ 1. Implement `parseComplexFieldFromRunSequence()`
932
+ 2. Add `w:fldChar` and `w:instrText` detection to run parser
933
+ 3. Add state machine to group field runs into ComplexField objects
934
+ 4. Update `parseParagraphWithOrder()` to handle complex fields
935
+
936
+ **Phase 2: XMLParser Fix**
937
+
938
+ 1. Remove `uniqueTypes.length > 1` condition in `coalesceChildren()`
939
+ 2. Always generate `_orderedChildren` when elements exist
940
+ 3. Add regression tests for single-type parsing
941
+
942
+ **Phase 3: Serialization**
943
+
944
+ 1. Update `Paragraph.toXML()` to handle `ComplexField` separately
945
+ 2. `ComplexField.toXML()` should return **multiple runs**, not wrapped in single run
946
+ 3. Preserve `<w:fldChar>` elements in run serialization
947
+
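The serialization change in Phase 3 amounts to letting an item contribute several paragraph children instead of exactly one. A minimal sketch, assuming a hypothetical `Serializable` interface whose `toXML()` may return one node or an array; XML nodes are represented as strings here for brevity, which differs from the `XMLElement` type the framework uses.

```typescript
type XmlNode = string; // serialized fragment; stands in for the framework's XMLElement

interface Serializable {
  toXML(): XmlNode | XmlNode[];
}

// Appends each item's output in content order. Items that return an array
// (e.g. a complex field's begin/instr/separate/result/end runs) are spread
// into the paragraph instead of being wrapped in an extra <w:r>.
function serializeParagraphChildren(content: Serializable[]): XmlNode[] {
  const children: XmlNode[] = [];
  for (const item of content) {
    const xml = item.toXML();
    if (Array.isArray(xml)) {
      children.push(...xml);
    } else {
      children.push(xml);
    }
  }
  return children;
}

const plainRun: Serializable = { toXML: () => '<w:r><w:t>Page </w:t></w:r>' };
const complexFieldStub: Serializable = {
  toXML: () => [
    '<w:r><w:fldChar w:fldCharType="begin"/></w:r>',
    '<w:r><w:instrText xml:space="preserve"> PAGE </w:instrText></w:r>',
    '<w:r><w:fldChar w:fldCharType="end"/></w:r>',
  ],
};

console.log(serializeParagraphChildren([plainRun, complexFieldStub]).length); // 4
```

Keeping the spread inside the loop preserves the relative order of fields, runs, and hyperlinks without special-casing each type.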
948
+ **Phase 4: API Enhancement**
949
+
950
+ 1. Add `Paragraph.getFields()` method
951
+ 2. Add `Document.extractFields()` method (like `extractHyperlinks()`)
952
+ 3. Add field validation during save
953
+
954
+ ---
955
+
956
+ ## 📚 References and Evidence
957
+
958
+ ### Files Analyzed
959
+
960
+ #### docXMLater Framework
961
+
962
+ 1. ✅ `src/core/DocumentParser.ts` (1,500+ lines) - Parsing logic
963
+ 2. ✅ `src/core/DocumentGenerator.ts` (400+ lines) - Generation logic
964
+ 3. ✅ `src/core/Document.ts` (2,000+ lines) - Main API
965
+ 4. ✅ `src/elements/Field.ts` (500+ lines) - Field classes
966
+ 5. ✅ `src/elements/FieldHelpers.ts` (200+ lines) - Field utilities
967
+ 6. ✅ `src/elements/Paragraph.ts` (1,300+ lines) - Paragraph class
968
+ 7. ✅ `src/elements/Run.ts` (600+ lines) - Run class
969
+ 8. ✅ `src/xml/XMLParser.ts` (700+ lines) - XML parsing
970
+ 9. ✅ `src/xml/XMLBuilder.ts` (300+ lines) - XML building
971
+
972
+ #### dochub-app Application
973
+
974
+ 1. ✅ `src/services/document/WordDocumentProcessor.ts` (1,800+ lines)
975
+ 2. ✅ `src/services/document/DocXMLaterProcessor.ts` (500+ lines)
976
+ 3. ✅ `docs/architecture/DOCXMLATER_INTEGRATION.md`
977
+ 4. ✅ `FIXES_COMPLETED.md`
978
+ 5. ✅ `TEST_RESULTS_SUMMARY.md`
979
+
980
+ ### Key Code Locations
981
+
982
+ **Field Parsing (Simple)**:
983
+
984
+ - File: `docXMLater/src/core/DocumentParser.ts`
985
+ - Method: `parseSimpleFieldFromObject()`
986
+ - Lines: ~580-616
987
+ - Status: ✅ Working
988
+
989
+ **Field Parsing (Complex)**:
990
+
991
+ - File: `docXMLater/src/core/DocumentParser.ts`
992
+ - Method: **DOES NOT EXIST** ❌
993
+ - Expected: `parseComplexFieldFromRunSequence()`
994
+ - Status: ❌ Missing
995
+
996
+ **Field Detection in Ordered Parsing**:
997
+
998
+ - File: `docXMLater/src/core/DocumentParser.ts`
999
+ - Method: `parseParagraphWithOrder()`
1000
+ - Lines: ~238-310
1001
+ - Issue: ✅ Handles `w:fldSimple`, ❌ Ignores `w:fldChar`
1002
+
1003
+ **Order Metadata Generation**:
1004
+
1005
+ - File: `docXMLater/src/xml/XMLParser.ts`
1006
+ - Method: `coalesceChildren()`
1007
+ - Lines: ~560-575
1008
+ - Issue: ❌ Conditional generation based on `uniqueTypes.length > 1`
1009
+
1010
+ **Field Serialization**:
1011
+
1012
+ - File: `docXMLater/src/elements/Paragraph.ts`
1013
+ - Method: `toXML()`
1014
+ - Lines: ~600-650
1015
+ - Issue: ❌ Wraps fields in run, always generates `fldSimple`
1016
+
1017
+ ---
1018
+
1019
+ ## 🔬 Additional Technical Details
1020
+
1021
+ ### ComplexField Class Design
1022
+
1023
+ The `ComplexField` class in `Field.ts` is **well-designed** and supports:
1024
+
1025
+ ```typescript
1026
+ export class ComplexField {
1027
+ private instruction: string;
1028
+ private result?: string;
1029
+ private instructionFormatting?: RunFormatting;
1030
+ private resultFormatting?: RunFormatting;
1031
+ private nestedFields: ComplexField[];
1032
+ private resultContent: XMLElement[];
1033
+ private multiParagraph: boolean;
1034
+
1035
+ toXML(): XMLElement[] {
1036
+ // Returns ARRAY of run elements (begin, instr, sep, result, end)
1037
+ // ✅ Correctly handles complex field structure
1038
+ }
1039
+ }
1040
+ ```
1041
+
1042
+ **The class is ready to use** - it just needs to be instantiated during parsing!
1043
+
1044
+ ### Why TOC Generation Works But TOC Preservation Doesn't
1045
+
1046
+ **File**: `docXMLater/src/core/Document.ts` (lines ~1100-1200)
1047
+
1048
+ The framework has a `replaceTableOfContents()` method that:
1049
+
1050
+ 1. Reads the saved DOCX file
1051
+ 2. Finds TOC SDT elements in XML
1052
+ 3. **Replaces them directly in the XML string**
1053
+ 4. Saves the modified XML back
1054
+
1055
+ This works because it **bypasses the object model entirely** - it never tries to parse the complex TOC fields into objects. It's pure XML string manipulation.
1056
+
1057
+ **This confirms**: The framework developers **knew** complex fields were problematic and worked around it by using direct XML manipulation instead of object model parsing.
1058
+
1059
+ ---
1060
+
1061
+ ## 🚨 Critical Conclusions
1062
+
1063
+ ### The "Hit or Miss" Behavior Explained
1064
+
1065
+ **Fields are preserved when**:
1066
+ ✅ They use `<w:fldSimple>` structure (simple fields)
1067
+ ✅ AND paragraph has multiple element types (triggers `_orderedChildren`)
1068
+ ✅ AND document isn't heavily modified during processing
1069
+
1070
+ **Fields are lost when**:
1071
+ ❌ They use `<w:fldChar>` structure (complex fields) - **ALWAYS LOST**
1072
+ ❌ OR paragraph has only fields (no `_orderedChildren` → wrong order → may break)
1073
+ ❌ OR processing modifies paragraph content array (rare but possible)
1074
+
1075
+ ### Why It Seems Random
1076
+
1077
+ Users can't see the difference between simple and complex fields in Word, because both render identically. Word picks the structure itself when it saves the file, and the choice is not exposed in the UI. This makes the bug appear non-deterministic from the user's perspective.
1078
+
1079
+ ### Business Impact
1080
+
1081
+ **HIGH SEVERITY** - This affects:
1082
+
1083
+ - 💼 Legal documents (contracts, agreements)
1084
+ - 📊 Report templates (automated fields)
1085
+ - 📚 Technical documentation (cross-references)
1086
+ - 📄 Forms (merge fields)
1087
+ - 📖 Books/manuals (TOC, page numbers, cross-refs)
1088
+
1089
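+ Any document relying on complex fields **will be broken** after processing through dochub-app, and documents that use only simple fields remain at risk whenever the fallback ordering path is taken.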
1090
+
1091
+ ---
1092
+
1093
+ ## ✅ Verification Steps
1094
+
1095
+ To confirm these bugs in your environment:
1096
+
1097
+ ### Step 1: Create Test Document in Word
1098
+
1099
+ 1. Open Microsoft Word
1100
+ 2. Insert → Quick Parts → Field
1101
+ 3. Choose "Page" → OK (this creates a PAGE field)
1102
+ 4. Save as `test-simple-field.docx`
1103
+ 5. Press Alt+F9 to view field codes
1104
+ 6. Check whether the saved XML uses `<w:fldSimple>` or `<w:fldChar>` (see Step 3; Alt+F9 shows the field code, not the XML element)
1105
+
1106
+ ### Step 2: Process Through dochub-app
1107
+
1108
+ 1. Load `test-simple-field.docx` in dochub-app
1109
+ 2. Process with minimal settings (no major modifications)
1110
+ 3. Save result
1111
+ 4. Open result in Word
1112
+ 5. Press Alt+F9 - does the field still exist?
1113
+
1114
+ ### Step 3: Check XML Directly
1115
+
1116
+ ```bash
1117
+ # Extract DOCX (it's a ZIP file)
1118
+ unzip test-simple-field.docx -d test-simple
1119
+ unzip result.docx -d result
1120
+
1121
+ # Compare field presence
1122
+ grep -i "fldSimple\|fldChar\|instrText" test-simple/word/document.xml
1123
+ grep -i "fldSimple\|fldChar\|instrText" result/word/document.xml
1124
+ ```
1125
+
1126
+ ### Step 4: Test Complex Field
1127
+
1128
+ 1. In Word, create a cross-reference (Insert → Cross-reference)
1129
+ 2. Save as `test-complex-field.docx`
1130
+ 3. Process through dochub-app
1131
+ 4. Check if cross-reference still works
1132
+
1133
+ **Expected Result**: Cross-reference will be **broken** (shows literal text or error).
1134
+
1135
+ ---
1136
+
1137
+ ## 📋 Recommendations
1138
+
1139
+ ### Immediate Actions
1140
+
1141
+ 1. **Document the limitation** in dochub-app user guide
1142
+ 2. **Add validation** to warn users when fields will be lost
1143
+ 3. **Consider field count** in processing statistics
1144
+ 4. **Add field type** to processing options (warn about complex fields)
1145
+
1146
+ ### Short-Term (1-2 weeks)
1147
+
1148
+ 1. **Implement Fix #2** (always generate `_orderedChildren`) - Low risk, high impact
1149
+ 2. **Add `extractFields()` API** to DocXMLaterProcessor for visibility
1150
+ 3. **Add unit tests** for simple field preservation
1151
+ 4. **Update documentation** with field preservation status
1152
+
1153
+ ### Long-Term (1-2 months)
1154
+
1155
+ 1. **Implement Fix #1** (complex field parsing) - Requires architecture changes
1156
+ 2. **Add stateful parser** for complex fields
1157
+ 3. **Update Paragraph API** to support ComplexField
1158
+ 4. **Comprehensive testing** with real-world documents
1159
+
1160
+ ### Workaround for Users
1161
+
1162
+ Until fixed, users should:
1163
+
1164
+ 1. **Avoid processing documents with critical fields**
1165
+ 2. **Re-insert fields manually** after processing if needed
1166
+ 3. **Use simple field structure** when possible (convert in Word first)
1167
+ 4. **Keep backups** before processing (dochub-app already does this ✅)
1168
+
1169
+ ---
1170
+
1171
+ ## 📞 Support Information
1172
+
1173
+ ### For Developers
1174
+
1175
+ - This analysis file: `FIELD_PRESERVATION_ANALYSIS.md`
1176
+ - Framework repo: `c:\Users\DiaTech\Pictures\DiaTech\Programs\DocHub\development\docXMLater`
1177
+ - Application repo: `c:\Users\DiaTech\Pictures\DiaTech\Programs\DocHub\development\dochub-app`
1178
+
1179
+ ### For Bug Reports
1180
+
1181
+ Include:
1182
+
1183
+ 1. Sample DOCX with fields that are lost
1184
+ 2. XML diff showing before/after field presence
1185
+ 3. Field type (simple vs complex) from XML inspection
1186
+ 4. Processing options used in dochub-app
1187
+
1188
+ ---
1189
+
1190
+ ## ✍️ Document Metadata
1191
+
1192
+ **Analysis Date**: November 14, 2025
1193
+ **Files Analyzed**: 14 files, ~10,000 lines of code
1194
+ **Bugs Identified**: 5 (1 critical, 1 high, 1 medium, 2 low, per the fix priority list)
1195
+ **Fix Complexity**: Medium-High (requires stateful parsing)
1196
+ **Breaking Changes**: None (fixes are additive)
1197
+
1198
+ ---
1199
+
1200
+ **END OF ANALYSIS**