documentation-hub 5.7.2
- package/.eslintrc.json +43 -0
- package/.github/workflows/build.yml +64 -0
- package/.github/workflows/ci.yml +39 -0
- package/.vscode/extensions.json +3 -0
- package/Current.md +97 -0
- package/DocHub_Image.png +0 -0
- package/README.md +666 -0
- package/USER_GUIDE.md +1173 -0
- package/Updater.md +311 -0
- package/build/256x256.png +0 -0
- package/build/512x512.png +0 -0
- package/build/app-update.yml +4 -0
- package/build/create-icon.js +208 -0
- package/build/icon.ico +0 -0
- package/build/icon.png +0 -0
- package/build/icon_1024x1024.png +0 -0
- package/dist/assets/Analytics-BpsG9895.js +1 -0
- package/dist/assets/Card-IAZin8kp.js +1 -0
- package/dist/assets/CurrentSession-B-rFkHvf.js +12 -0
- package/dist/assets/Dashboard-C_5gMb0q.js +1 -0
- package/dist/assets/Documents-CqZ25axS.js +1 -0
- package/dist/assets/Input-l89xwXBi.js +1 -0
- package/dist/assets/Reporting-DqdHJY_a.js +1 -0
- package/dist/assets/Search-XNbu5z_3.js +1 -0
- package/dist/assets/SessionManager-lH9hZfzH.js +1 -0
- package/dist/assets/Sessions-ClZOPYNc.js +1 -0
- package/dist/assets/Settings-DUEHGURa.js +11 -0
- package/dist/assets/index-8xUe8ptc.js +24 -0
- package/dist/assets/index-RYyJqF7O.css +1 -0
- package/dist/assets/path-BkOl0AGO.js +1 -0
- package/dist/assets/promises-ID_B9S-h.js +1 -0
- package/dist/assets/urlHelpers-TvgahX0r.js +1 -0
- package/dist/assets/useToast-yRSO1dkm.js +1 -0
- package/dist/assets/vendor-charts-RkGK5ROP.js +36 -0
- package/dist/assets/vendor-db-l0sNRNKZ.js +1 -0
- package/dist/assets/vendor-react-BVZ_anCF.js +4 -0
- package/dist/assets/vendor-search-Dw8P0qyA.js +1 -0
- package/dist/assets/vendor-ui-BU7NfluV.js +53 -0
- package/dist/electron/PowerAutomateApiService-LfW09ZGr.js +147 -0
- package/dist/electron/main-CXkNtyv-.js +19789 -0
- package/dist/electron/main.js +5 -0
- package/dist/electron/preload.js +1 -0
- package/dist/icon.png +0 -0
- package/dist/index.html +27 -0
- package/docs/CODEBASE_ANALYSIS_REPORT.md +309 -0
- package/docs/DEBUG_LOGGING_GUIDE.md +244 -0
- package/docs/README.md +115 -0
- package/docs/TOC_WIRING_GUIDE.md +344 -0
- package/docs/analysis/Bullet_Symbol_Bug_Analysis.md +136 -0
- package/docs/analysis/DOCXMLATER_ANALYSIS_SUMMARY.txt +169 -0
- package/docs/analysis/Document_Processing_Issues_Analysis.md +704 -0
- package/docs/analysis/FIELD_PRESERVATION_ANALYSIS.md +1200 -0
- package/docs/analysis/INDENTATION_PRESERVE_ANALYSIS.md +181 -0
- package/docs/analysis/INDENTATION_PRESERVE_IMPLEMENTATION.md +207 -0
- package/docs/analysis/List_Implementation.md +206 -0
- package/docs/analysis/List_Implementation_Accuracy_Report.md +366 -0
- package/docs/analysis/PROCESSING_OPTIONS_UI_UPDATES.md +220 -0
- package/docs/analysis/RefactorStyles.md +852 -0
- package/docs/analysis/STYLE_PARAMETER_ENHANCEMENT.md +143 -0
- package/docs/analysis/docxmlater-comparison-todo-2025-11-13.md +636 -0
- package/docs/analysis/docxmlater-implementation-analysis-2025-11-13.md +340 -0
- package/docs/analysis/docxmlater-template_ui-integration-analysis.md +263 -0
- package/docs/analysis/github-issues-to-create.md +237 -0
- package/docs/api/API_README.md +538 -0
- package/docs/api/API_REFERENCE.md +751 -0
- package/docs/api/TYPE_DEFINITIONS.md +869 -0
- package/docs/architecture/FONT_EMBEDDING_GUIDE.md +318 -0
- package/docs/architecture/docxmlater-functions-and-structure.md +726 -0
- package/docs/docxmlater-readme.md +1341 -0
- package/docs/fixes/EXECUTION_LOG_TEST_BASE.md +573 -0
- package/docs/fixes/HYPERLINK_TEXT_SANITIZATION.md +253 -0
- package/docs/fixes/README.md +37 -0
- package/docs/github-issues/issue-1-body.md +125 -0
- package/docs/github-issues/issue-10-body.md +850 -0
- package/docs/github-issues/issue-2-body.md +200 -0
- package/docs/github-issues/issue-3-body.md +270 -0
- package/docs/github-issues/issue-4-body.md +169 -0
- package/docs/github-issues/issue-5-body.md +173 -0
- package/docs/github-issues/issue-6-body.md +158 -0
- package/docs/github-issues/issue-7-body.md +171 -0
- package/docs/github-issues/issue-8-body.md +407 -0
- package/docs/github-issues/issue-9-body.md +515 -0
- package/docs/github-issues/issue-tracker.md +274 -0
- package/docs/github-issues/predictive-analysis-2025-10-18.md +2131 -0
- package/docs/implementation/List_Framework_Refactor_Plan.md +336 -0
- package/docs/implementation/PRIMARY_TEXT_COLOR_FEATURE.md +217 -0
- package/docs/implementation/RELEASE_PLAN_v2.1.0.md +362 -0
- package/docs/implementation/RefactorStyles.md +588 -0
- package/docs/implementation/implement-plan.md +489 -0
- package/docs/implementation/missing-helpers-implementation.md +391 -0
- package/docs/implementation/refactor-plan.md +520 -0
- package/docs/implementation/session-implementation-complete.md +233 -0
- package/docs/implementation/session-management-plan.md +250 -0
- package/docs/setup-checklist.md +77 -0
- package/docs/versions/changelog.md +345 -0
- package/electron/customUpdater.ts +656 -0
- package/electron/main.ts +2441 -0
- package/electron/memoryConfig.ts +187 -0
- package/electron/preload.ts +394 -0
- package/electron/proxyConfig.ts +340 -0
- package/electron/services/BackupService.ts +452 -0
- package/electron/services/DictionaryService.ts +402 -0
- package/electron/services/LocalDictionaryLookupService.ts +147 -0
- package/electron/services/PowerAutomateApiService.ts +231 -0
- package/electron/services/SharePointSyncService.ts +474 -0
- package/electron/windowsCertStore.ts +427 -0
- package/electron/zscalerConfig.ts +381 -0
- package/eslint.config.js +92 -0
- package/jest.config.js +52 -0
- package/package.json +214 -0
- package/postcss.config.mjs +6 -0
- package/public/icon.png +0 -0
- package/publish-release.ps1 +5 -0
- package/renovate.json +30 -0
- package/src/App.tsx +216 -0
- package/src/__mocks__/p-limit.js +12 -0
- package/src/__mocks__/styleMock.js +1 -0
- package/src/components/common/BugReportButton.tsx +44 -0
- package/src/components/common/BugReportDialog.tsx +193 -0
- package/src/components/common/Button.tsx +153 -0
- package/src/components/common/Card.tsx +86 -0
- package/src/components/common/ColorPickerDialog.tsx +177 -0
- package/src/components/common/ConfirmDialog.tsx +96 -0
- package/src/components/common/DebugConsole.tsx +275 -0
- package/src/components/common/EmptyState.tsx +183 -0
- package/src/components/common/ErrorBoundary.tsx +98 -0
- package/src/components/common/ErrorDetailsDialog.tsx +153 -0
- package/src/components/common/ErrorFallback.tsx +218 -0
- package/src/components/common/Input.tsx +109 -0
- package/src/components/common/Skeleton.tsx +184 -0
- package/src/components/common/SplashScreen.tsx +81 -0
- package/src/components/common/Toast.tsx +155 -0
- package/src/components/common/Tooltip.tsx +79 -0
- package/src/components/common/UpdateNotification.tsx +320 -0
- package/src/components/comparison/ComparisonWindow.tsx +374 -0
- package/src/components/comparison/SideBySideDiff.tsx +486 -0
- package/src/components/comparison/index.ts +8 -0
- package/src/components/document/DocumentUploader.tsx +288 -0
- package/src/components/document/HyperlinkPreview.tsx +430 -0
- package/src/components/document/HyperlinkService.md +1484 -0
- package/src/components/document/Hyperlink_Technical_Documentation.md +496 -0
- package/src/components/document/InlineChangesView.tsx +707 -0
- package/src/components/document/ProcessingProgress.tsx +303 -0
- package/src/components/document/ProcessingResults.tsx +256 -0
- package/src/components/document/TrackedChangesDetail.tsx +530 -0
- package/src/components/document/TrackedChangesPanel.tsx +546 -0
- package/src/components/document/VirtualDocumentList.tsx +240 -0
- package/src/components/editor/DocumentEditor.tsx +723 -0
- package/src/components/editor/DocumentEditorModal.tsx +640 -0
- package/src/components/editor/EditorQuickActions.tsx +502 -0
- package/src/components/editor/EditorToolbar.tsx +312 -0
- package/src/components/editor/TableEditor.tsx +926 -0
- package/src/components/editor/index.ts +18 -0
- package/src/components/layout/Header.tsx +190 -0
- package/src/components/layout/Sidebar.tsx +313 -0
- package/src/components/layout/TitleBar.tsx +190 -0
- package/src/components/navigation/CommandPalette.tsx +233 -0
- package/src/components/navigation/KeyboardShortcutsModal.tsx +173 -0
- package/src/components/sessions/ChangeItem.tsx +408 -0
- package/src/components/sessions/ChangeViewer.tsx +1155 -0
- package/src/components/sessions/DocumentComparisonModal.tsx +314 -0
- package/src/components/sessions/ProcessingOptions.tsx +297 -0
- package/src/components/sessions/ReplacementsTab.tsx +438 -0
- package/src/components/sessions/RevisionHandlingOptions.tsx +87 -0
- package/src/components/sessions/SessionManager.tsx +188 -0
- package/src/components/sessions/StylesEditor.tsx +1335 -0
- package/src/components/sessions/TabContainer.tsx +151 -0
- package/src/components/sessions/VirtualSessionList.tsx +157 -0
- package/src/components/sessions/sessionToProcessorManager.tsx +420 -0
- package/src/components/settings/CertificateManager.tsx +410 -0
- package/src/components/settings/SegmentedControl.tsx +88 -0
- package/src/components/settings/SettingRow.tsx +52 -0
- package/src/contexts/GlobalStatsContext.tsx +396 -0
- package/src/contexts/SessionContext.tsx +2129 -0
- package/src/contexts/ThemeContext.tsx +428 -0
- package/src/contexts/UserSettingsContext.tsx +290 -0
- package/src/contexts/__tests__/GlobalStatsContext.test.tsx +390 -0
- package/src/global.d.ts +273 -0
- package/src/hooks/useDocumentQueue.tsx +210 -0
- package/src/hooks/useToast.tsx +55 -0
- package/src/main.tsx +10 -0
- package/src/pages/Analytics.tsx +386 -0
- package/src/pages/CurrentSession.tsx +1174 -0
- package/src/pages/Dashboard.tsx +319 -0
- package/src/pages/Documents.tsx +317 -0
- package/src/pages/Projects.tsx +250 -0
- package/src/pages/Reporting.tsx +386 -0
- package/src/pages/Search.tsx +349 -0
- package/src/pages/Sessions.tsx +285 -0
- package/src/pages/Settings.tsx +2662 -0
- package/src/services/HyperlinkService.ts +1085 -0
- package/src/services/document/DocXMLaterProcessor.ts +617 -0
- package/src/services/document/DocumentProcessingComparison.ts +856 -0
- package/src/services/document/DocumentSnapshotService.ts +575 -0
- package/src/services/document/WordDocumentProcessor.ts +10509 -0
- package/src/services/document/__tests__/DocXMLaterProcessor.hyperlinks.test.md +311 -0
- package/src/services/document/__tests__/WordDocumentProcessor.integration.test.ts +515 -0
- package/src/services/document/__tests__/WordDocumentProcessor.test.ts +812 -0
- package/src/services/document/blanklines/BlankLineManager.ts +658 -0
- package/src/services/document/blanklines/__tests__/paragraphChecks.test.ts +281 -0
- package/src/services/document/blanklines/helpers/blankLineInsertion.ts +87 -0
- package/src/services/document/blanklines/helpers/blankLineSnapshot.ts +251 -0
- package/src/services/document/blanklines/helpers/clearCustom.ts +121 -0
- package/src/services/document/blanklines/helpers/contextChecks.ts +117 -0
- package/src/services/document/blanklines/helpers/imageChecks.ts +51 -0
- package/src/services/document/blanklines/helpers/paragraphChecks.ts +236 -0
- package/src/services/document/blanklines/helpers/removeBlanksBetweenListItems.ts +91 -0
- package/src/services/document/blanklines/helpers/removeTrailingBlanks.ts +35 -0
- package/src/services/document/blanklines/helpers/tableGuards.ts +21 -0
- package/src/services/document/blanklines/index.ts +67 -0
- package/src/services/document/blanklines/rules/additionRules.ts +337 -0
- package/src/services/document/blanklines/rules/indentationRules.ts +317 -0
- package/src/services/document/blanklines/rules/removalRules.ts +362 -0
- package/src/services/document/blanklines/rules/ruleTypes.ts +92 -0
- package/src/services/document/blanklines/types.ts +29 -0
- package/src/services/document/helpers/ImageBorderCropper.ts +377 -0
- package/src/services/document/helpers/__tests__/whitespace.test.ts +272 -0
- package/src/services/document/helpers/whitespace.ts +117 -0
- package/src/services/document/list/ListNormalizer.ts +947 -0
- package/src/services/document/list/index.ts +45 -0
- package/src/services/document/list/list-detection.ts +275 -0
- package/src/services/document/list/list-types.ts +162 -0
- package/src/services/document/processors/HyperlinkProcessor.ts +370 -0
- package/src/services/document/processors/ListProcessor.ts +257 -0
- package/src/services/document/processors/StructureProcessor.ts +176 -0
- package/src/services/document/processors/StyleProcessor.ts +389 -0
- package/src/services/document/processors/TableProcessor.ts +2238 -0
- package/src/services/document/processors/__tests__/HyperlinkProcessor.test.ts +314 -0
- package/src/services/document/processors/__tests__/ListProcessor.test.ts +291 -0
- package/src/services/document/processors/__tests__/StructureProcessor.test.ts +257 -0
- package/src/services/document/processors/__tests__/TableProcessor.hlp-tips-bullets.test.ts +459 -0
- package/src/services/document/processors/__tests__/TableProcessor.test.ts +1604 -0
- package/src/services/document/processors/index.ts +28 -0
- package/src/services/document/types/docx-processing.ts +310 -0
- package/src/services/editor/EditorActionHandlers.ts +901 -0
- package/src/services/editor/index.ts +13 -0
- package/src/setupTests.ts +47 -0
- package/src/styles/global.css +782 -0
- package/src/types/backup.ts +132 -0
- package/src/types/dictionary.ts +125 -0
- package/src/types/document-processing.ts +331 -0
- package/src/types/docxmlater-augments.d.ts +142 -0
- package/src/types/editor.ts +280 -0
- package/src/types/electron.ts +340 -0
- package/src/types/globalStats.ts +155 -0
- package/src/types/hyperlink.ts +471 -0
- package/src/types/operations.ts +354 -0
- package/src/types/session.ts +427 -0
- package/src/types/settings.ts +112 -0
- package/src/utils/MemoryMonitor.ts +248 -0
- package/src/utils/cn.ts +6 -0
- package/src/utils/colorConvert.ts +306 -0
- package/src/utils/diffUtils.ts +347 -0
- package/src/utils/documentUtils.ts +202 -0
- package/src/utils/electronGuard.ts +62 -0
- package/src/utils/indexedDB.ts +915 -0
- package/src/utils/logger.ts +717 -0
- package/src/utils/pathSecurity.ts +232 -0
- package/src/utils/pathValidator.ts +236 -0
- package/src/utils/processingTimeEstimator.ts +153 -0
- package/src/utils/safeJsonParse.ts +62 -0
- package/src/utils/textSanitizer.ts +162 -0
- package/src/utils/urlHelpers.ts +304 -0
- package/src/utils/urlPatterns.ts +198 -0
- package/src/utils/urlSanitizer.ts +152 -0
- package/src/vite-env.d.ts +11 -0
- package/tsconfig.electron.json +19 -0
- package/tsconfig.json +36 -0
- package/tsconfig.node.json +12 -0
- package/typedoc.json +45 -0
- package/vite.config.ts +152 -0
|
@@ -0,0 +1,1200 @@
|
|
|
1
|
+
# Field Preservation Analysis - Critical Bug Report
|
|
2
|
+
|
|
3
|
+
**Date**: November 14, 2025
|
|
4
|
+
**Analysis Type**: Deep Codebase Investigation
|
|
5
|
+
**Status**: 🚨 **CRITICAL BUG IDENTIFIED**
|
|
6
|
+
**Severity**: HIGH - Data Loss Issue
|
|
7
|
+
|
|
8
|
+
---
|
|
9
|
+
|
|
10
|
+
## Executive Summary
|
|
11
|
+
|
|
12
|
+
After thoroughly analyzing both the **docXMLater framework** (document processing library) and **dochub-app application** (main application using the framework), I have identified **critical bugs in field preservation** that explain why fields are inconsistently preserved during DOCX file processing.
|
|
13
|
+
|
|
14
|
+
### Key Findings
|
|
15
|
+
|
|
16
|
+
1. ✅ **Simple fields (`<w:fldSimple>`) ARE parsed** - Code exists in DocumentParser
|
|
17
|
+
2. ❌ **Simple fields MAY NOT be preserved in place** - Preservation depends on conditionally generated `_orderedChildren` metadata (see Bug #2)
|
|
18
|
+
3. ❌ **Complex fields (`<w:fldChar>`) are COMPLETELY IGNORED** - Not parsed at all
|
|
19
|
+
4. ⚠️ **Order preservation is unreliable** - Metadata generation has edge cases
|
|
20
|
+
|
|
21
|
+
---
|
|
22
|
+
|
|
23
|
+
## Architecture Overview
|
|
24
|
+
|
|
25
|
+
### System Flow
|
|
26
|
+
|
|
27
|
+
```
|
|
28
|
+
dochub-app (Main Application)
|
|
29
|
+
↓
|
|
30
|
+
WordDocumentProcessor.ts
|
|
31
|
+
↓
|
|
32
|
+
DocXMLaterProcessor.ts (thin wrapper)
|
|
33
|
+
↓
|
|
34
|
+
docXMLater Framework
|
|
35
|
+
├─ DocumentParser.ts (Load: XML → Objects)
|
|
36
|
+
├─ Document.ts (In-memory document)
|
|
37
|
+
└─ DocumentGenerator.ts (Save: Objects → XML)
|
|
38
|
+
```
|
|
39
|
+
|
|
40
|
+
### Document Processing Pipeline
|
|
41
|
+
|
|
42
|
+
```
|
|
43
|
+
1. LOAD PHASE
|
|
44
|
+
┌─────────────────────────────────────┐
|
|
45
|
+
│ Word DOCX File (ZIP archive) │
|
|
46
|
+
└──────────────┬──────────────────────┘
|
|
47
|
+
↓
|
|
48
|
+
┌─────────────────────────────────────┐
|
|
49
|
+
│ ZipHandler.load() │
|
|
50
|
+
│ - Extracts word/document.xml │
|
|
51
|
+
└──────────────┬──────────────────────┘
|
|
52
|
+
↓
|
|
53
|
+
┌─────────────────────────────────────┐
|
|
54
|
+
│ DocumentParser.parseDocument() │
|
|
55
|
+
│ - Parses XML to structured objects │
|
|
56
|
+
│ - Creates Paragraph, Run, etc. │
|
|
57
|
+
└──────────────┬──────────────────────┘
|
|
58
|
+
↓
|
|
59
|
+
┌─────────────────────────────────────┐
|
|
60
|
+
│ Document (in-memory) │
|
|
61
|
+
│ - bodyElements: Paragraph[] │
|
|
62
|
+
└─────────────────────────────────────┘
|
|
63
|
+
|
|
64
|
+
2. PROCESSING PHASE
|
|
65
|
+
┌─────────────────────────────────────┐
|
|
66
|
+
│ WordDocumentProcessor │
|
|
67
|
+
│ - Updates hyperlinks │
|
|
68
|
+
│ - Applies styles │
|
|
69
|
+
│ - Formats tables │
|
|
70
|
+
└─────────────────────────────────────┘
|
|
71
|
+
|
|
72
|
+
3. SAVE PHASE
|
|
73
|
+
┌─────────────────────────────────────┐
|
|
74
|
+
│ Document.save() │
|
|
75
|
+
│ - Calls Paragraph.toXML() │
|
|
76
|
+
│ - Generates word/document.xml │
|
|
77
|
+
└──────────────┬──────────────────────┘
|
|
78
|
+
↓
|
|
79
|
+
┌─────────────────────────────────────┐
|
|
80
|
+
│ ZipHandler.save() │
|
|
81
|
+
│ - Writes ZIP archive to disk │
|
|
82
|
+
└─────────────────────────────────────┘
|
|
83
|
+
```
|
|
84
|
+
|
|
85
|
+
---
|
|
86
|
+
|
|
87
|
+
## 🐛 BUG #1: Complex Fields Are Completely Ignored
|
|
88
|
+
|
|
89
|
+
### Description
|
|
90
|
+
|
|
91
|
+
**Complex fields** (using `<w:fldChar>` structure) are **NOT parsed at all**. They are silently dropped during document loading.
|
|
92
|
+
|
|
93
|
+
### Field Types in Word Documents
|
|
94
|
+
|
|
95
|
+
Word documents use two field structures:
|
|
96
|
+
|
|
97
|
+
#### 1. Simple Fields (`<w:fldSimple>`)
|
|
98
|
+
|
|
99
|
+
```xml
|
|
100
|
+
<w:p>
|
|
101
|
+
<w:fldSimple w:instr=" PAGE \* MERGEFORMAT ">
|
|
102
|
+
<w:r><w:t>1</w:t></w:r>
|
|
103
|
+
</w:fldSimple>
|
|
104
|
+
</w:p>
|
|
105
|
+
```
|
|
106
|
+
|
|
107
|
+
#### 2. Complex Fields (`<w:fldChar>`) - ❌ **NOT HANDLED**
|
|
108
|
+
|
|
109
|
+
```xml
|
|
110
|
+
<w:p>
|
|
111
|
+
<w:r><w:fldChar w:fldCharType="begin"/></w:r>
|
|
112
|
+
<w:r><w:instrText> PAGE \* MERGEFORMAT </w:instrText></w:r>
|
|
113
|
+
<w:r><w:fldChar w:fldCharType="separate"/></w:r>
|
|
114
|
+
<w:r><w:t>1</w:t></w:r>
|
|
115
|
+
<w:r><w:fldChar w:fldCharType="end"/></w:r>
|
|
116
|
+
</w:p>
|
|
117
|
+
```
|
|
118
|
+
|
|
119
|
+
### Affected Field Types
|
|
120
|
+
|
|
121
|
+
Complex fields are used for:
|
|
122
|
+
|
|
123
|
+
- ✅ Table of Contents (TOC)
|
|
124
|
+
- ✅ Cross-references (REF)
|
|
125
|
+
- ✅ Page numbers (PAGE)
|
|
126
|
+
- ✅ Date/Time fields (DATE, TIME)
|
|
127
|
+
- ✅ Document properties (AUTHOR, TITLE, etc.)
|
|
128
|
+
- ✅ Conditional fields (IF)
|
|
129
|
+
- ✅ Mail merge fields (MERGEFIELD)
|
|
130
|
+
|
|
131
|
+
### Code Evidence
|
|
132
|
+
|
|
133
|
+
**File**: `docXMLater/src/core/DocumentParser.ts`
|
|
134
|
+
|
|
135
|
+
**Parsing Logic** (lines ~238-310):
|
|
136
|
+
|
|
137
|
+
```typescript
|
|
138
|
+
// In parseParagraphWithOrder() method:
|
|
139
|
+
if (orderedChildren && orderedChildren.length > 0) {
|
|
140
|
+
for (const childInfo of orderedChildren) {
|
|
141
|
+
const elementType = childInfo.type;
|
|
142
|
+
|
|
143
|
+
if (elementType === 'w:r') {
|
|
144
|
+
// ✅ Runs are parsed
|
|
145
|
+
} else if (elementType === 'w:hyperlink') {
|
|
146
|
+
// ✅ Hyperlinks are parsed
|
|
147
|
+
} else if (elementType === 'w:fldSimple') {
|
|
148
|
+
// ✅ Simple fields are parsed
|
|
149
|
+
}
|
|
150
|
+
// ❌ NO HANDLING FOR w:fldChar - complex fields are ignored!
|
|
151
|
+
}
|
|
152
|
+
}
|
|
153
|
+
```
|
|
154
|
+
|
|
155
|
+
**Parsing Method** (lines ~580-616):
|
|
156
|
+
|
|
157
|
+
```typescript
|
|
158
|
+
private parseSimpleFieldFromObject(fieldObj: any): Field | null {
|
|
159
|
+
// ✅ This method exists and works
|
|
160
|
+
const instruction = fieldObj["@_w:instr"];
|
|
161
|
+
const type = (typeMatch?.[1] || 'PAGE') as FieldType;
|
|
162
|
+
return Field.create({ type, instruction, formatting });
|
|
163
|
+
}
|
|
164
|
+
|
|
165
|
+
// ❌ NO METHOD FOR parseComplexFieldFromObject()
|
|
166
|
+
// ❌ NO CODE TO DETECT w:fldChar elements
|
|
167
|
+
```
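The excerpt above references a `typeMatch` value that is not shown. As a minimal sketch of how a field type could be derived from the instruction string, assuming the type is the first token of the instruction (the helper name `extractFieldType` is hypothetical and not part of docXMLater):

```typescript
// Hypothetical helper, not taken from docXMLater.
// Derives the field type from an instruction such as " PAGE \* MERGEFORMAT ".
function extractFieldType(instruction: string): string | null {
  // Assumption: the field type is the first alphabetic token of the instruction.
  const match = instruction.trim().match(/^([A-Za-z]+)/);
  // Return null for unrecognisable instructions instead of guessing a type.
  return match?.[1]?.toUpperCase() ?? null;
}
```

Unlike the quoted code, which falls back to `'PAGE'` when no type is matched, this sketch returns `null` so that an unknown instruction is not silently relabelled as a page-number field.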
|
|
168
|
+
|
|
169
|
+
**Result**: Complex fields are **silently dropped** during parsing. The XML contains them, but they never make it into the in-memory Document object.
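To illustrate the missing detection step, the following sketch classifies a parsed run object by the field markers it contains. The object shape is an assumption: it reuses the `"@_"` attribute prefix seen in `fieldObj["@_w:instr"]`, and the names `ParsedRun`, `FieldMarker`, and `classifyRun` are hypothetical.

```typescript
// Assumed shape of a parsed <w:r> object; not confirmed against docXMLater.
type ParsedRun = Record<string, unknown>;
type FieldMarker = "begin" | "separate" | "end" | "instruction" | null;

// Reports which complex-field role a run plays, or null for an ordinary run.
function classifyRun(runObj: ParsedRun): FieldMarker {
  const fldChar = runObj["w:fldChar"];
  if (fldChar !== null && typeof fldChar === "object") {
    const kind = (fldChar as Record<string, unknown>)["@_w:fldCharType"];
    if (kind === "begin") return "begin";
    if (kind === "separate") return "separate";
    if (kind === "end") return "end";
  }
  // A run carrying w:instrText holds the field instruction, not visible text.
  if ("w:instrText" in runObj) return "instruction";
  return null;
}
```

A parser could call a check like this before treating a run as plain text, so that field markers are at least recognised even if full field reconstruction is deferred.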
|
|
170
|
+
|
|
171
|
+
### Impact
|
|
172
|
+
|
|
173
|
+
**Hit or Miss Behavior**:
|
|
174
|
+
|
|
175
|
+
- Documents with **simple fields** (`<w:fldSimple>`) → ✅ Preserved (sometimes, see Bug #2)
|
|
176
|
+
- Documents with **complex fields** (`<w:fldChar>`) → ❌ Always lost
|
|
177
|
+
- Mixed documents → ⚠️ Partial loss (simple preserved, complex lost)
|
|
178
|
+
|
|
179
|
+
This explains the "hit or miss" nature - it depends on which field structure Word used when creating the field.
|
|
180
|
+
|
|
181
|
+
### Why Word Uses Different Structures
|
|
182
|
+
|
|
183
|
+
Word decides between simple and complex fields based on:
|
|
184
|
+
|
|
185
|
+
- **Simple**: Basic fields with no special formatting or nested content
|
|
186
|
+
- **Complex**: Fields with special formatting, nested fields, or complex instructions
|
|
187
|
+
|
|
188
|
+
The choice is **automatic and invisible to users**, which is why the bug appears random.
|
|
189
|
+
|
|
190
|
+
---
|
|
191
|
+
|
|
192
|
+
## 🐛 BUG #2: Simple Field Preservation Depends on `_orderedChildren` Metadata
|
|
193
|
+
|
|
194
|
+
### Description
|
|
195
|
+
|
|
196
|
+
Simple fields (`<w:fldSimple>`) are only preserved if the **XMLParser generates `_orderedChildren` metadata**. This metadata is **conditionally generated**, leading to inconsistent field preservation.
|
|
197
|
+
|
|
198
|
+
### Root Cause
|
|
199
|
+
|
|
200
|
+
**File**: `docXMLater/src/xml/XMLParser.ts`
|
|
201
|
+
|
|
202
|
+
**Lines ~560-575** (coalesceChildren method):
|
|
203
|
+
|
|
204
|
+
```typescript
|
|
205
|
+
// Build ordered children metadata to preserve document order
|
|
206
|
+
const orderedChildren: Array<{ type: string; index: number }> = [];
|
|
207
|
+
|
|
208
|
+
// ... build orderedChildren array ...
|
|
209
|
+
|
|
210
|
+
// ❌ BUG: Only adds metadata if multiple child types exist
|
|
211
|
+
if (uniqueTypes.length > 1 && orderedChildren.length > 0) {
|
|
212
|
+
result['_orderedChildren'] = orderedChildren;
|
|
213
|
+
}
|
|
214
|
+
```
|
|
215
|
+
|
|
216
|
+
### The Problem
|
|
217
|
+
|
|
218
|
+
**Scenario 1**: Paragraph with runs, hyperlinks, AND fields
|
|
219
|
+
|
|
220
|
+
```xml
|
|
221
|
+
<w:p>
|
|
222
|
+
<w:r><w:t>Text</w:t></w:r>
|
|
223
|
+
<w:hyperlink><w:r><w:t>Link</w:t></w:r></w:hyperlink>
|
|
224
|
+
<w:fldSimple w:instr="PAGE"><w:r><w:t>1</w:t></w:r></w:fldSimple>
|
|
225
|
+
</w:p>
|
|
226
|
+
```
|
|
227
|
+
|
|
228
|
+
- `uniqueTypes.length = 3` (w:r, w:hyperlink, w:fldSimple)
|
|
229
|
+
- ✅ `_orderedChildren` is created
|
|
230
|
+
- ✅ Field is parsed in correct order
|
|
231
|
+
- ✅ **Field PRESERVED**
|
|
232
|
+
|
|
233
|
+
**Scenario 2**: Paragraph with ONLY fields
|
|
234
|
+
|
|
235
|
+
```xml
|
|
236
|
+
<w:p>
|
|
237
|
+
<w:fldSimple w:instr="PAGE"><w:r><w:t>1</w:t></w:r></w:fldSimple>
|
|
238
|
+
</w:p>
|
|
239
|
+
```
|
|
240
|
+
|
|
241
|
+
- `uniqueTypes.length = 1` (only w:fldSimple)
|
|
242
|
+
- ❌ `_orderedChildren` is **NOT created** (fails `uniqueTypes.length > 1` check)
|
|
243
|
+
- ❌ Falls back to non-ordered parsing
|
|
244
|
+
- ⚠️ **Field handling depends on the fallback path** (see Fallback Behavior Analysis: the field is kept, but its position is not guaranteed)
|
|
245
|
+
|
|
246
|
+
**Scenario 3**: Paragraph with runs interleaved with simple fields
|
|
247
|
+
|
|
248
|
+
```xml
|
|
249
|
+
<w:p>
|
|
250
|
+
<w:r><w:t>Page </w:t></w:r>
|
|
251
|
+
<w:fldSimple w:instr="PAGE"><w:r><w:t>1</w:t></w:r></w:fldSimple>
|
|
252
|
+
<w:r><w:t> of </w:t></w:r>
|
|
253
|
+
<w:fldSimple w:instr="NUMPAGES"><w:r><w:t>10</w:t></w:r></w:fldSimple>
|
|
254
|
+
</w:p>
|
|
255
|
+
```
|
|
256
|
+
|
|
257
|
+
- `uniqueTypes.length = 2` (w:r, w:fldSimple)
|
|
258
|
+
- ✅ `_orderedChildren` is created
|
|
259
|
+
- ✅ **Fields PRESERVED**
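The three scenarios can be reproduced with a small sketch of the metadata condition. This is not the actual `coalesceChildren` code: `buildOrderedChildren` and its `requireMixedTypes` flag are hypothetical, and the input is simplified to a list of child element names in document order.

```typescript
type ChildRef = { type: string; index: number };

// Builds order metadata for a paragraph's children.
// requireMixedTypes = true mirrors the quoted `uniqueTypes.length > 1` check;
// false shows one possible relaxation (emit metadata whenever children exist).
function buildOrderedChildren(
  childTypes: string[],
  requireMixedTypes: boolean
): ChildRef[] | undefined {
  const counters: Record<string, number> = {};
  const ordered: ChildRef[] = [];
  for (const type of childTypes) {
    const index = counters[type] ?? 0; // position within elements of the same name
    ordered.push({ type, index });
    counters[type] = index + 1;
  }
  const uniqueTypes = Object.keys(counters).length;
  if (ordered.length === 0) return undefined;
  if (requireMixedTypes && uniqueTypes <= 1) return undefined;
  return ordered;
}
```

With `requireMixedTypes` set to `true`, Scenario 2 (`["w:fldSimple"]`) yields `undefined`, while Scenarios 1 and 3 yield full metadata. Whether relaxing the check is safe for every caller in docXMLater is not established here.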
|
|
260
|
+
|
|
261
|
+
### Fallback Behavior Analysis
|
|
262
|
+
|
|
263
|
+
**File**: `docXMLater/src/core/DocumentParser.ts` (lines ~280-310)
|
|
264
|
+
|
|
265
|
+
```typescript
|
|
266
|
+
} else {
|
|
267
|
+
// Fallback to sequential processing if no order metadata
|
|
268
|
+
// Handle runs (w:r)
|
|
269
|
+
const runs = pElement["w:r"];
|
|
270
|
+
// ...process runs...
|
|
271
|
+
|
|
272
|
+
// Handle hyperlinks (w:hyperlink)
|
|
273
|
+
const hyperlinks = pElement["w:hyperlink"];
|
|
274
|
+
// ...process hyperlinks...
|
|
275
|
+
|
|
276
|
+
// Handle simple fields (w:fldSimple)
|
|
277
|
+
const fields = pElement["w:fldSimple"];
|
|
278
|
+
const fieldChildren = Array.isArray(fields) ? fields : (fields ? [fields] : []);
|
|
279
|
+
|
|
280
|
+
for (const fieldObj of fieldChildren) {
|
|
281
|
+
const field = this.parseSimpleFieldFromObject(fieldObj);
|
|
282
|
+
if (field) {
|
|
283
|
+
paragraph.addField(field); // ✅ Field IS added in fallback
|
|
284
|
+
}
|
|
285
|
+
}
|
|
286
|
+
}
|
|
287
|
+
```
|
|
288
|
+
|
|
289
|
+
**Conclusion**: In the fallback path, fields **ARE processed** and added to the paragraph. However, they are processed **AFTER runs and hyperlinks**, which means:
|
|
290
|
+
|
|
291
|
+
- ✅ Fields are preserved
|
|
292
|
+
- ⚠️ Field **ORDER** may be wrong (always appear last instead of in correct position)
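The ordering effect of type-by-type processing can be shown with a simplified model. The sketch below is an assumption-laden illustration, not the library code: `Child` and `fallbackOrder` are hypothetical, and the input deliberately mixes element types to show what the fallback path would do to interleaved content if it were reached.

```typescript
type Child = { type: "w:r" | "w:hyperlink" | "w:fldSimple"; text: string };

// Mimics the fallback path: one pass per element name, in the order
// runs, hyperlinks, simple fields, ignoring the original interleaving.
function fallbackOrder(children: Child[]): Child[] {
  const passOrder: Child["type"][] = ["w:r", "w:hyperlink", "w:fldSimple"];
  const result: Child[] = [];
  for (const type of passOrder) {
    for (const child of children) {
      if (child.type === type) result.push(child);
    }
  }
  return result;
}
```

For an input ordered as run, field, run, field, the function returns run, run, field, field: nothing is dropped, but every field moves to the end of the paragraph.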
|
|
293
|
+
|
|
294
|
+
### Why This Causes "Hit or Miss"
|
|
295
|
+
|
|
296
|
+
**Working Case** (multi-type paragraph):
|
|
297
|
+
|
|
298
|
+
- Paragraph has runs + fields → `_orderedChildren` created → Fields in correct order ✅
|
|
299
|
+
|
|
300
|
+
**Broken Case** (fields only):
|
|
301
|
+
|
|
302
|
+
- Paragraph has only fields → No `_orderedChildren` → Fallback parsing → Fields at wrong position ⚠️
|
|
303
|
+
- If document structure depends on field order (e.g., TOC), this breaks functionality
|
|
304
|
+
|
|
305
|
+
---
|
|
306
|
+
|
|
307
|
+
## 🐛 BUG #3: Field Serialization Does Not Preserve Complex Fields
|
|
308
|
+
|
|
309
|
+
### Description
|
|
310
|
+
|
|
311
|
+
Even if complex fields WERE parsed (they aren't per Bug #1), the serialization code in `Paragraph.toXML()` cannot properly reconstruct them.
|
|
312
|
+
|
|
313
|
+
### Code Evidence
|
|
314
|
+
|
|
315
|
+
**File**: `docXMLater/src/elements/Paragraph.ts` (lines ~600-650)
|
|
316
|
+
|
|
317
|
+
```typescript
|
|
318
|
+
// Add content (runs, fields, hyperlinks, revisions, shapes, textboxes)
|
|
319
|
+
for (let i = 0; i < this.content.length; i++) {
|
|
320
|
+
const item = this.content[i];
|
|
321
|
+
|
|
322
|
+
if (item instanceof Field) {
|
|
323
|
+
// ❌ BUG: Every Field is wrapped in <w:r>, and Field.toXML() always emits <w:fldSimple>
|
|
324
|
+
paragraphChildren.push(XMLBuilder.w('r', undefined, [item.toXML()]));
|
|
325
|
+
} else if (item instanceof Hyperlink) {
|
|
326
|
+
// ✅ Hyperlinks are standalone elements
|
|
327
|
+
paragraphChildren.push(item.toXML());
|
|
328
|
+
} else if (item) {
|
|
329
|
+
paragraphChildren.push(item.toXML());
|
|
330
|
+
}
|
|
331
|
+
}
|
|
332
|
+
```
|
|
333
|
+
|
|
334
|
+
**Field.toXML()** from `Field.ts`:
|
|
335
|
+
|
|
336
|
+
```typescript
|
|
337
|
+
toXML(): XMLElement {
|
|
338
|
+
// ...
|
|
339
|
+
return {
|
|
340
|
+
name: 'w:fldSimple', // ❌ Always generates fldSimple, never fldChar
|
|
341
|
+
attributes: {
|
|
342
|
+
'w:instr': this.instruction,
|
|
343
|
+
},
|
|
344
|
+
children,
|
|
345
|
+
};
|
|
346
|
+
}
|
|
347
|
+
```
|
|
348
|
+
|
|
349
|
+
### The Problem
|
|
350
|
+
|
|
351
|
+
The current code **always serializes fields as `<w:fldSimple>`**, even if they were originally complex fields. This means:
|
|
352
|
+
|
|
353
|
+
1. Complex fields can't be represented in the object model
|
|
354
|
+
2. Even if parsing were fixed, serialization would convert them to simple fields
|
|
355
|
+
3. Word may reject or incorrectly render the simplified fields
|
|
356
|
+
|
|
357
|
+
### Impact
|
|
358
|
+
|
|
359
|
+
- Complex field formatting is lost
|
|
360
|
+
- Nested fields are flattened
|
|
361
|
+
- TOC fields may not update correctly in Word
|
|
362
|
+
- Cross-references lose their specialized behavior
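For comparison, a complex field has to be serialized as a sequence of runs rather than a single element. The following is a minimal sketch of that output as a plain string; `complexFieldToXml` and `escapeXml` are hypothetical helpers, run formatting is omitted, and nested fields are not handled.

```typescript
// Escapes the characters that are unsafe inside XML text content.
function escapeXml(text: string): string {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Emits the begin / instrText / separate / result / end run sequence
// that makes up one complex field.
function complexFieldToXml(instruction: string, result: string): string {
  return [
    '<w:r><w:fldChar w:fldCharType="begin"/></w:r>',
    `<w:r><w:instrText xml:space="preserve"> ${escapeXml(instruction)} </w:instrText></w:r>`,
    '<w:r><w:fldChar w:fldCharType="separate"/></w:r>',
    `<w:r><w:t xml:space="preserve">${escapeXml(result)}</w:t></w:r>`,
    '<w:r><w:fldChar w:fldCharType="end"/></w:r>',
  ].join("");
}
```

A real implementation would presumably build `XMLElement` objects through `XMLBuilder` instead of concatenating strings; the string form is used here only to keep the example self-contained.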
|
|
363
|
+
|
|
364
|
+
---
|
|
365
|
+
|
|
366
|
+
## 🐛 BUG #4: Runs with `w:fldChar` Elements Are Treated as Regular Text Runs
|
|
367
|
+
|
|
368
|
+
### Description
|
|
369
|
+
|
|
370
|
+
During parsing, runs that contain `<w:fldChar>` elements (field markers) are processed as **regular text runs**, losing the field structure entirely.
|
|
371
|
+
|
|
372
|
+
### Code Evidence
|
|
373
|
+
|
|
374
|
+
**File**: `docXMLater/src/core/DocumentParser.ts` (lines ~450-550)
|
|
375
|
+
|
|
376
|
+
```typescript
|
|
377
|
+
private parseRunFromObject(runObj: any): Run | null {
|
|
378
|
+
// Extract all run content elements (text, tabs, breaks, etc.)
|
|
379
|
+
const content: RunContent[] = [];
|
|
380
|
+
|
|
381
|
+
if (runObj["_orderedChildren"]) {
|
|
382
|
+
for (const child of runObj["_orderedChildren"]) {
|
|
383
|
+
const elementType = child.type;
|
|
384
|
+
|
|
385
|
+
switch (elementType) {
|
|
386
|
+
case 'w:t':
|
|
387
|
+
// ✅ Text is handled
|
|
388
|
+
break;
|
|
389
|
+
case 'w:tab':
|
|
390
|
+
// ✅ Tabs are handled
|
|
391
|
+
break;
|
|
392
|
+
case 'w:br':
|
|
393
|
+
// ✅ Breaks are handled
|
|
394
|
+
break;
|
|
395
|
+
// ❌ NO CASE FOR 'w:fldChar'
|
|
396
|
+
// ❌ NO CASE FOR 'w:instrText'
|
|
397
|
+
}
|
|
398
|
+
}
|
|
399
|
+
}
|
|
400
|
+
|
|
401
|
+
// Create run from content elements - returns a regular Run
|
|
402
|
+
const run = Run.createFromContent(content, { cleanXmlFromText: false });
|
|
403
|
+
return run;
|
|
404
|
+
}
|
|
405
|
+
```
|
|
406
|
+
|
|
407
|
+
### What Should Happen
|
|
408
|
+
|
|
409
|
+
When a run contains `<w:fldChar>`, it's part of a **complex field structure**:
|
|
410
|
+
|
|
411
|
+
```xml
|
|
412
|
+
<!-- Field begin marker -->
|
|
413
|
+
<w:r><w:fldChar w:fldCharType="begin"/></w:r>
|
|
414
|
+
|
|
415
|
+
<!-- Field instruction -->
|
|
416
|
+
<w:r><w:instrText> PAGE \* MERGEFORMAT </w:instrText></w:r>
|
|
417
|
+
|
|
418
|
+
<!-- Field separator -->
|
|
419
|
+
<w:r><w:fldChar w:fldCharType="separate"/></w:r>
|
|
420
|
+
|
|
421
|
+
<!-- Field result (actual displayed value) -->
|
|
422
|
+
<w:r><w:t>1</w:t></w:r>
|
|
423
|
+
|
|
424
|
+
<!-- Field end marker -->
|
|
425
|
+
<w:r><w:fldChar w:fldCharType="end"/></w:r>
|
|
426
|
+
```
|
|
427
|
+
|
|
428
|
+
These runs should be **grouped together** and converted to a single Field object.
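One way to picture the grouping is a small state machine over the runs of a paragraph. The sketch below works on a simplified token stream rather than real parsed objects; `RunToken`, `ParagraphItem`, and `groupComplexFields` are hypothetical names, and nested fields are intentionally not supported.

```typescript
type RunToken =
  | { kind: "begin" | "separate" | "end" }
  | { kind: "instr"; text: string }
  | { kind: "text"; text: string };

type ParagraphItem =
  | { kind: "run"; text: string }
  | { kind: "field"; instruction: string; result: string };

// Collapses begin ... instr ... separate ... result ... end into one field item.
function groupComplexFields(tokens: RunToken[]): ParagraphItem[] {
  const items: ParagraphItem[] = [];
  let open: { instruction: string; result: string; inResult: boolean } | null = null;

  for (const token of tokens) {
    if (token.kind === "begin") {
      open = { instruction: "", result: "", inResult: false };
    } else if (open === null) {
      // Outside a field: keep text runs, ignore stray markers.
      if (token.kind === "text") items.push({ kind: "run", text: token.text });
    } else if (token.kind === "instr") {
      open.instruction += token.text;
    } else if (token.kind === "separate") {
      open.inResult = true;
    } else if (token.kind === "text") {
      // Text after the separator is the cached field result.
      if (open.inResult) open.result += token.text;
    } else {
      // "end" marker: emit the completed field.
      items.push({ kind: "field", instruction: open.instruction.trim(), result: open.result });
      open = null;
    }
  }
  return items;
}
```

Applied to the five runs shown above, this produces a single field item with instruction `PAGE \* MERGEFORMAT` and result `1`, instead of five independent runs.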
|
|
429
|
+
|
|
430
|
+
### What Actually Happens
|
|
431
|
+
|
|
432
|
+
Each run is parsed **independently**:
|
|
433
|
+
|
|
434
|
+
- Run with `fldChar="begin"` → Parsed as empty Run (no text) → ✅ Preserved as empty run
|
|
435
|
+
- Run with `instrText` → Parsed as Run with text " PAGE \* MERGEFORMAT " → ❌ Shows as literal text
|
|
436
|
+
- Run with `fldChar="separate"` → Parsed as empty Run → ✅ Preserved as empty run
|
|
437
|
+
- Run with actual value → Parsed as Run with text "1" → ✅ Preserved
|
|
438
|
+
- Run with `fldChar="end"` → Parsed as empty Run → ✅ Preserved as empty run
|
|
439
|
+
|
|
440
|
+
**Result**: The field structure is **lost**. The instruction text appears as **literal visible text** in the document instead of being executed as a field.
|
|
441
|
+
|
|
442
|
+
---
|
|
443
|
+
|
|
444
|
+
## 🐛 BUG #5: Field Order Can Be Scrambled During Serialization
|
|
445
|
+
|
|
446
|
+
### Description
|
|
447
|
+
|
|
448
|
+
Even when fields ARE preserved during parsing, they may be serialized in the wrong order relative to runs and hyperlinks.
|
|
449
|
+
|
|
450
|
+
### Code Evidence
|
|
451
|
+
|
|
452
|
+
**File**: `docXMLater/src/elements/Paragraph.ts` (lines ~600-650)
|
|
453
|
+
|
|
454
|
+
```typescript
|
|
455
|
+
// Add content (runs, fields, hyperlinks, revisions, shapes, textboxes)
|
|
456
|
+
for (let i = 0; i < this.content.length; i++) {
|
|
457
|
+
const item = this.content[i];
|
|
458
|
+
|
|
459
|
+
if (item instanceof Field) {
|
|
460
|
+
paragraphChildren.push(XMLBuilder.w('r', undefined, [item.toXML()]));
|
|
461
|
+
} else if (item instanceof Hyperlink) {
|
|
462
|
+
paragraphChildren.push(item.toXML());
|
|
463
|
+
} else if (item instanceof Revision) {
|
|
464
|
+
paragraphChildren.push(item.toXML());
|
|
465
|
+
} else if (item instanceof RangeMarker) {
|
|
466
|
+
paragraphChildren.push(item.toXML());
|
|
467
|
+
} else if (item) {
|
|
468
|
+
paragraphChildren.push(item.toXML());
|
|
469
|
+
}
|
|
470
|
+
}
|
|
471
|
+
```
|
|
472
|
+
|
|
473
|
+
### The Problem
|
|
474
|
+
|
|
475
|
+
The serialization iterates through `this.content[]` in order, which **should** preserve order. However:
|
|
476
|
+
|
|
477
|
+
1. **During parsing**, the order depends on whether `_orderedChildren` exists (Bug #2)
|
|
478
|
+
2. **During processing**, the `content[]` array may be modified by dochub-app operations
|
|
479
|
+
3. **No validation** ensures the order remains correct
|
|
480
|
+
|
|
481
|
+
### Potential Scenario
|
|
482
|
+
|
|
483
|
+
```
|
|
484
|
+
Original: Text [FIELD: PAGE] Link
|
|
485
|
+
Parsed: [Run: "Text"] [Field: PAGE] [Hyperlink: "Link"]
|
|
486
|
+
After processing: [Run: "Text"] [Hyperlink: "Link"] [Field: PAGE]
|
|
487
|
+
Saved: Text Link [FIELD: PAGE] ❌ Wrong order!
|
|
488
|
+
```
|
|
489
|
+
|
|
490
|
+
While this is **theoretically possible**, the current code doesn't modify `content[]` order during processing, so this is **low risk** compared to Bugs #1-4.
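
The missing validation from point 3 could look roughly like the following. It is a sketch under assumptions: `orderPreserved` is a hypothetical helper, it compares items by object identity, and it assumes each item appears at most once in a paragraph's content.

```typescript
// Returns true when every item present in both snapshots keeps its
// relative position. Items added or removed in between are tolerated.
function orderPreserved<T>(before: readonly T[], after: readonly T[]): boolean {
  const afterSet = new Set(after);
  const beforeSet = new Set(before);
  const expected = before.filter((item) => afterSet.has(item));
  const actual = after.filter((item) => beforeSet.has(item));
  return (
    expected.length === actual.length &&
    expected.every((item, i) => item === actual[i])
  );
}
```

A caller would snapshot the content array (for example with `[...content]`) before processing and compare it with the array just before serialization.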
|
|
491
|
+
|
|
492
|
+
---
|
|
493
|
+
|
|
494
|
+
## 📊 Summary Table: Field Preservation Matrix
|
|
495
|
+
|
|
496
|
+
| Field Structure      | Paragraph Content     | `_orderedChildren`? | Parsing Result     | Serialization    | Final Result           |
| -------------------- | --------------------- | ------------------- | ------------------ | ---------------- | ---------------------- |
| Simple (`fldSimple`) | Runs + Fields + Links | ✅ Yes (3 types)    | ✅ Parsed in order | ✅ Correct order | ✅ **PRESERVED**       |
| Simple (`fldSimple`) | Runs + Fields         | ✅ Yes (2 types)    | ✅ Parsed in order | ✅ Correct order | ✅ **PRESERVED**       |
| Simple (`fldSimple`) | Fields only           | ❌ No (1 type)      | ⚠️ Fallback path   | ⚠️ Wrong order   | ⚠️ **ORDER BROKEN**    |
| Complex (`fldChar`)  | Any combination       | N/A                 | ❌ Not parsed      | ❌ No object     | ❌ **COMPLETELY LOST** |
|
|
502
|
+
|
|
503
|
+
---
|
|
504
|
+
|
|
505
|
+
## 🔍 Root Cause Analysis
|
|
506
|
+
|
|
507
|
+
### Why These Bugs Exist
|
|
508
|
+
|
|
509
|
+
#### 1. **Incomplete Implementation**
|
|
510
|
+
|
|
511
|
+
The docXMLater framework has:
|
|
512
|
+
|
|
513
|
+
- ✅ `Field` class defined in `elements/Field.ts`
|
|
514
|
+
- ✅ `ComplexField` class defined in `elements/Field.ts`
|
|
515
|
+
- ❌ **No parsing code** for `ComplexField` in `DocumentParser.ts`
|
|
516
|
+
- ❌ **No detection logic** for `w:fldChar` elements
|
|
517
|
+
|
|
518
|
+
**Evidence**: The `Field.ts` file has complete classes for both simple and complex fields, but `DocumentParser.ts` only has code to parse simple fields.
|
|
519
|
+
|
|
520
|
+
#### 2. **Design Flaw in Order Preservation**
|
|
521
|
+
|
|
522
|
+
The `_orderedChildren` metadata is meant to preserve element order, but the condition `uniqueTypes.length > 1` is **too restrictive**:
|
|
523
|
+
|
|
524
|
+
```typescript
|
|
525
|
+
// ❌ FLAWED LOGIC: Assumes order only matters with multiple types
|
|
526
|
+
if (uniqueTypes.length > 1 && orderedChildren.length > 0) {
|
|
527
|
+
result['_orderedChildren'] = orderedChildren;
|
|
528
|
+
}
|
|
529
|
+
|
|
530
|
+
// ✅ CORRECT LOGIC: Order always matters
|
|
531
|
+
if (orderedChildren.length > 0) {
|
|
532
|
+
result['_orderedChildren'] = orderedChildren;
|
|
533
|
+
}
|
|
534
|
+
```
|
|
535
|
+
|
|
536
|
+
The assumption that "single-type content doesn't need ordering" is **FALSE**. Consider:
|
|
537
|
+
|
|
538
|
+
- Multiple fields in sequence → Order matters (e.g., "Page 1 of 10")
|
|
539
|
+
- Multiple runs → Order matters for text flow
|
|
540
|
+
|
|
541
|
+
#### 3. **Architectural Mismatch**
|
|
542
|
+
|
|
543
|
+
The framework's object model gives each kind of paragraph content **its own type**:
|
|
544
|
+
|
|
545
|
+
- Paragraphs contain **Runs, Hyperlinks, Fields**
|
|
546
|
+
- Each is a separate object type
|
|
547
|
+
|
|
548
|
+
But Word's XML has a **run-based structure** where **complex fields ARE runs**:
|
|
549
|
+
|
|
550
|
+
- Complex fields are **sequences of special runs**
|
|
551
|
+
- Each `<w:r>` can contain `<w:fldChar>` or `<w:instrText>`
|
|
552
|
+
|
|
553
|
+
The framework needs **stateful parsing** to group these runs into Field objects, but it uses **stateless element-by-element parsing**.
|
|
554
|
+
|
|
555
|
+
---
|
|
556
|
+
|
|
557
|
+
## 💡 Impact on User Experience
|
|
558
|
+
|
|
559
|
+
### Symptoms Users See
|
|
560
|
+
|
|
561
|
+
1. **"Fields disappear"** after processing
|
|
562
|
+
- User inserts PAGE field in Word
|
|
563
|
+
- Processes document in dochub-app
|
|
564
|
+
- Opens result → `PAGE` shows as literal text or is missing
|
|
565
|
+
|
|
566
|
+
2. **"Sometimes it works, sometimes it doesn't"**
|
|
567
|
+
- Document A: Simple fields → Works ✅
|
|
568
|
+
- Document B: Complex fields → Fails ❌
|
|
569
|
+
- Same operation, different field types → Appears random
|
|
570
|
+
|
|
571
|
+
3. **"Table of Contents is broken"**
|
|
572
|
+
- TOC uses complex fields
|
|
573
|
+
- Always lost during processing
|
|
574
|
+
- Document needs manual TOC recreation after processing
|
|
575
|
+
|
|
576
|
+
4. **"Page numbers disappear"**
|
|
577
|
+
- Header/footer page numbers often use complex fields
|
|
578
|
+
- Lost during document processing
|
|
579
|
+
- Users must manually re-insert fields
|
|
580
|
+
|
|
581
|
+
### Real-World Scenarios
|
|
582
|
+
|
|
583
|
+
**Scenario A: Legal Documents**
|
|
584
|
+
|
|
585
|
+
```
|
|
586
|
+
Original: Contract dated [DATE] on page [PAGE]
|
|
587
|
+
After: Contract dated DATE on page PAGE
|
|
588
|
+
```
|
|
589
|
+
|
|
590
|
+
- Field codes appear as literal text
|
|
591
|
+
- Professional documents look broken
|
|
592
|
+
|
|
593
|
+
**Scenario B: Reports**
|
|
594
|
+
|
|
595
|
+
```
|
|
596
|
+
Original: [TOC with hyperlinked entries]
|
|
597
|
+
After: [Empty TOC or visible field codes]
|
|
598
|
+
```
|
|
599
|
+
|
|
600
|
+
- TOC must be manually regenerated
|
|
601
|
+
- Cross-references broken
|
|
602
|
+
|
|
603
|
+
**Scenario C: Templates**
|
|
604
|
+
|
|
605
|
+
```
|
|
606
|
+
Original: Author: [AUTHOR], Modified: [SAVEDATE]
|
|
607
|
+
After: Author: AUTHOR, Modified: SAVEDATE
|
|
608
|
+
```
|
|
609
|
+
|
|
610
|
+
- Dynamic fields converted to static text
|
|
611
|
+
- Document loses its template functionality
|
|
612
|
+
|
|
613
|
+
---
|
|
614
|
+
|
|
615
|
+
## 🛠️ Recommended Fixes
|
|
616
|
+
|
|
617
|
+
### Fix Priority
|
|
618
|
+
|
|
619
|
+
1. **CRITICAL** - BUG #1: Add complex field parsing
|
|
620
|
+
2. **HIGH** - BUG #4: Detect and group `w:fldChar` runs
|
|
621
|
+
3. **MEDIUM** - BUG #2: Always generate `_orderedChildren`
|
|
622
|
+
4. **LOW** - BUG #3: Add ComplexField serialization
|
|
623
|
+
5. **LOW** - BUG #5: Validate content order preservation
|
|
624
|
+
|
|
625
|
+
### Fix #1: Add Complex Field Parsing
|
|
626
|
+
|
|
627
|
+
**File**: `docXMLater/src/core/DocumentParser.ts`
|
|
628
|
+
|
|
629
|
+
**Location**: `parseParagraphWithOrder()` method (around line 250)
|
|
630
|
+
|
|
631
|
+
**Current Code**:
|
|
632
|
+
|
|
633
|
+
```typescript
|
|
634
|
+
} else if (elementType === "w:fldSimple") {
|
|
635
|
+
// Parse simple fields
|
|
636
|
+
}
|
|
637
|
+
```
|
|
638
|
+
|
|
639
|
+
**Add After**:
|
|
640
|
+
|
|
641
|
+
```typescript
|
|
642
|
+
} else if (elementType === "w:r") {
|
|
643
|
+
// Check if this run contains field characters
|
|
644
|
+
const run = runArray[elementIndex];
|
|
645
|
+
if (run && (run["w:fldChar"] || run["w:instrText"])) {
|
|
646
|
+
// This is part of a complex field - add to pending field parser
|
|
647
|
+
this.addToComplexField(run);
|
|
648
|
+
} else {
|
|
649
|
+
// Regular run
|
|
650
|
+
const parsedRun = this.parseRunFromObject(run);
|
|
651
|
+
if (parsedRun) paragraph.addRun(parsedRun);
|
|
652
|
+
}
|
|
653
|
+
}
|
|
654
|
+
```
|
|
655
|
+
|
|
656
|
+
**Add New Method**:
|
|
657
|
+
|
|
658
|
+
```typescript
private complexFieldBuffer: any[] = [];

private addToComplexField(run: any): void {
  this.complexFieldBuffer.push(run);

  // Check if we've completed a field (found "end" marker)
  if (run["w:fldChar"]?.["@_w:fldCharType"] === "end") {
    const field = this.parseComplexFieldFromBuffer();
    if (field) {
      // Add field to current paragraph
    }
    this.complexFieldBuffer = [];
  }
}

private parseComplexFieldFromBuffer(): ComplexField | null {
  // Parse the buffered runs into a ComplexField object:
  // 1. Extract instruction from w:instrText
  // 2. Extract result from runs between separate and end
  // 3. Return ComplexField instance
  return null; // Placeholder so the outline compiles; replace with the steps above
}
```
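
One gap in the outline above is worth sketching separately. The buffer is flushed at the first `end` marker, which would cut a field short if another field is nested inside it, and the `w:fldChar` / `w:instrText` check does not catch the result runs, which contain only `w:t`. A depth counter addresses both. The `FieldDepthTracker` class below is a hypothetical helper, not part of docXMLater.

```typescript
type FieldCharType = "begin" | "separate" | "end";

// Hypothetical helper: tracks how many complex fields are currently open.
class FieldDepthTracker {
  private depth = 0;

  // Feed one fldCharType value. Returns true only when the outermost
  // field has just been closed, i.e. the buffer can be flushed.
  feed(charType: FieldCharType): boolean {
    if (charType === "begin") {
      this.depth += 1;
      return false;
    }
    if (charType === "end" && this.depth > 0) {
      this.depth -= 1;
      return this.depth === 0;
    }
    return false; // "separate", or a stray "end" with no open field
  }

  // While this is true, every run (including plain text runs that hold
  // the field result) belongs in the buffer, not in the paragraph.
  isInsideField(): boolean {
    return this.depth > 0;
  }
}
```

With this in place, the paragraph loop would buffer a run whenever `isInsideField()` is true or the run carries a `begin` marker, and build the field only when `feed()` returns true.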
|
|
681
|
+
|
|
682
|
+
### Fix #2: Always Generate `_orderedChildren`
|
|
683
|
+
|
|
684
|
+
**File**: `docXMLater/src/xml/XMLParser.ts`
|
|
685
|
+
|
|
686
|
+
**Location**: `coalesceChildren()` method (line ~573)
|
|
687
|
+
|
|
688
|
+
**Current Code**:
|
|
689
|
+
|
|
690
|
+
```typescript
|
|
691
|
+
// ❌ BUG: Only adds metadata if multiple child types exist
|
|
692
|
+
if (uniqueTypes.length > 1 && orderedChildren.length > 0) {
|
|
693
|
+
result['_orderedChildren'] = orderedChildren;
|
|
694
|
+
}
|
|
695
|
+
```
|
|
696
|
+
|
|
697
|
+
**Fixed Code**:
|
|
698
|
+
|
|
699
|
+
```typescript
|
|
700
|
+
// ✅ FIX: Always add metadata to preserve element order
|
|
701
|
+
if (orderedChildren.length > 0) {
|
|
702
|
+
result['_orderedChildren'] = orderedChildren;
|
|
703
|
+
}
|
|
704
|
+
```
|
|
705
|
+
|
|
706
|
+
**Impact**: This single-line change ensures `_orderedChildren` is always generated, so element order survives parsing even for paragraphs with only one element type.
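
To make the role of the metadata concrete, here is a minimal sketch of a coalescing step that always records order. It is not the real `coalesceChildren()`: the `coalesceWithOrder` name, the `ChildNode` input shape, and the `{ type, index }` entry shape are assumptions chosen for illustration.

```typescript
interface ChildNode {
  tag: string;
  value: unknown;
}

interface OrderedChild {
  type: string; // element name, e.g. "w:r"
  index: number; // position inside that element's own array
}

// Groups children by tag name (which loses interleaving) and records
// the original document order alongside, regardless of how many
// distinct tag names occur.
function coalesceWithOrder(children: ChildNode[]): Record<string, unknown> {
  const result: Record<string, unknown> = {};
  const orderedChildren: OrderedChild[] = [];

  for (const child of children) {
    let bucket = result[child.tag] as unknown[] | undefined;
    if (!bucket) {
      bucket = [];
      result[child.tag] = bucket;
    }
    orderedChildren.push({ type: child.tag, index: bucket.length });
    bucket.push(child.value);
  }

  if (orderedChildren.length > 0) {
    result["_orderedChildren"] = orderedChildren;
  }
  return result;
}
```

For a run, a simple field, and another run, the grouped arrays alone cannot say where the field sat; the `_orderedChildren` entries can.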
|
|
707
|
+
|
|
708
|
+
### Fix #3: Handle `w:fldChar` in Run Parsing
|
|
709
|
+
|
|
710
|
+
**File**: `docXMLater/src/core/DocumentParser.ts`
|
|
711
|
+
|
|
712
|
+
**Location**: `parseRunFromObject()` method (around line 500)
|
|
713
|
+
|
|
714
|
+
**Add to switch statement**:
|
|
715
|
+
|
|
716
|
+
```typescript
switch (elementType) {
  case 'w:t':
    // Existing text handling
    break;

  case 'w:tab':
    // Existing tab handling
    break;

  case 'w:fldChar': {
    // ✅ NEW: Handle field character markers
    // (braces keep the const declarations scoped to this case)
    const fldChar = runObj['w:fldChar'];
    const fldCharType = fldChar?.['@_w:fldCharType'];
    content.push({
      type: 'fieldCharacter',
      value: fldCharType, // 'begin', 'separate', or 'end'
    });
    break;
  }

  case 'w:instrText': {
    // ✅ NEW: Handle field instruction text
    const instrText = runObj['w:instrText'];
    const instruction =
      typeof instrText === 'object' && instrText !== null
        ? instrText['#text'] || ''
        : instrText || '';
    content.push({
      type: 'fieldInstruction',
      value: instruction,
    });
    break;
  }
}
```
|
|
750
|
+
|
|
751
|
+
### Fix #4: Update Run to Support Field Elements
|
|
752
|
+
|
|
753
|
+
**File**: `docXMLater/src/elements/Run.ts`
|
|
754
|
+
|
|
755
|
+
**Update RunContentType**:
|
|
756
|
+
|
|
757
|
+
```typescript
|
|
758
|
+
export type RunContentType =
|
|
759
|
+
| 'text'
|
|
760
|
+
| 'tab'
|
|
761
|
+
| 'break'
|
|
762
|
+
| 'carriageReturn'
|
|
763
|
+
| 'softHyphen'
|
|
764
|
+
| 'noBreakHyphen'
|
|
765
|
+
| 'fieldCharacter' // ✅ NEW: w:fldChar elements
|
|
766
|
+
| 'fieldInstruction'; // ✅ NEW: w:instrText elements
|
|
767
|
+
```
|
|
768
|
+
|
|
769
|
+
**Update Run.toXML()**:
|
|
770
|
+
|
|
771
|
+
```typescript
|
|
772
|
+
switch (contentElement.type) {
|
|
773
|
+
case 'fieldCharacter':
|
|
774
|
+
runChildren.push(
|
|
775
|
+
XMLBuilder.wSelf('fldChar', {
|
|
776
|
+
'w:fldCharType': contentElement.value,
|
|
777
|
+
})
|
|
778
|
+
);
|
|
779
|
+
break;
|
|
780
|
+
|
|
781
|
+
case 'fieldInstruction':
|
|
782
|
+
runChildren.push(
|
|
783
|
+
XMLBuilder.w(
|
|
784
|
+
'instrText',
|
|
785
|
+
{
|
|
786
|
+
'xml:space': 'preserve',
|
|
787
|
+
},
|
|
788
|
+
[contentElement.value || '']
|
|
789
|
+
)
|
|
790
|
+
);
|
|
791
|
+
break;
|
|
792
|
+
}
|
|
793
|
+
```
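
For reference, the XML that this serialization should ultimately produce for one complete field can be sketched with plain strings. This does not use `XMLBuilder`; `escapeXml` and `complexFieldToRuns` are hypothetical helpers, and run formatting (`w:rPr`) is left out.

```typescript
// Escapes the characters that are unsafe inside XML text content.
function escapeXml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Builds the five sibling runs of a complex field: begin marker,
// instruction, separator, cached result, end marker.
function complexFieldToRuns(instruction: string, result: string): string[] {
  return [
    '<w:r><w:fldChar w:fldCharType="begin"/></w:r>',
    `<w:r><w:instrText xml:space="preserve">${escapeXml(instruction)}</w:instrText></w:r>`,
    '<w:r><w:fldChar w:fldCharType="separate"/></w:r>',
    `<w:r><w:t xml:space="preserve">${escapeXml(result)}</w:t></w:r>`,
    '<w:r><w:fldChar w:fldCharType="end"/></w:r>',
  ];
}
```

The point of returning an array is that the runs are siblings inside the paragraph; wrapping all five in one outer `<w:r>` would not be a valid field.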
|
|
794
|
+
|
|
795
|
+
---
|
|
796
|
+
|
|
797
|
+
## 🧪 Testing Strategy
|
|
798
|
+
|
|
799
|
+
### Test Cases
|
|
800
|
+
|
|
801
|
+
#### Test 1: Simple Field Preservation
|
|
802
|
+
|
|
803
|
+
```typescript
|
|
804
|
+
// Create document with simple field
|
|
805
|
+
const doc = Document.create();
|
|
806
|
+
const para = new Paragraph();
|
|
807
|
+
para.addField(Field.createPageNumber());
|
|
808
|
+
doc.addParagraph(para);
|
|
809
|
+
|
|
810
|
+
// Save and reload
|
|
811
|
+
await doc.save('test.docx');
|
|
812
|
+
const doc2 = await Document.load('test.docx');
|
|
813
|
+
|
|
814
|
+
// Verify field exists
|
|
815
|
+
const paras = doc2.getParagraphs();
|
|
816
|
+
assert(paras[0].getContent().some((item) => item instanceof Field));
|
|
817
|
+
```
|
|
818
|
+
|
|
819
|
+
#### Test 2: Complex Field Preservation
|
|
820
|
+
|
|
821
|
+
```typescript
|
|
822
|
+
// Create document with complex field
|
|
823
|
+
const doc = Document.create();
|
|
824
|
+
const para = new Paragraph();
|
|
825
|
+
const complexField = new ComplexField({
|
|
826
|
+
instruction: ' PAGE \\* MERGEFORMAT ',
|
|
827
|
+
result: '1',
|
|
828
|
+
});
|
|
829
|
+
para.addField(complexField);
|
|
830
|
+
doc.addParagraph(para);
|
|
831
|
+
|
|
832
|
+
// Save and reload
|
|
833
|
+
await doc.save('test-complex.docx');
|
|
834
|
+
const doc2 = await Document.load('test-complex.docx');
|
|
835
|
+
|
|
836
|
+
// Verify complex field preserved
|
|
837
|
+
const content = doc2.getParagraphs()[0].getContent();
|
|
838
|
+
assert(content.some((item) => item instanceof ComplexField));
|
|
839
|
+
```
|
|
840
|
+
|
|
841
|
+
#### Test 3: Field Order Preservation
|
|
842
|
+
|
|
843
|
+
```typescript
|
|
844
|
+
// Create paragraph with interleaved content
|
|
845
|
+
const para = new Paragraph();
|
|
846
|
+
para.addText('Page ');
|
|
847
|
+
para.addField(Field.createPageNumber());
|
|
848
|
+
para.addText(' of ');
|
|
849
|
+
para.addField(Field.createTotalPages());
|
|
850
|
+
|
|
851
|
+
// Save and reload
|
|
852
|
+
// Verify order: Run → Field → Run → Field
|
|
853
|
+
```
|
|
854
|
+
|
|
855
|
+
### Validation Approach
|
|
856
|
+
|
|
857
|
+
1. **Unit Tests**: Test each parsing/serialization method independently
|
|
858
|
+
2. **Integration Tests**: Test full load/save cycle
|
|
859
|
+
3. **Real Documents**: Test with actual Word documents containing various field types
|
|
860
|
+
4. **Regression Tests**: Ensure fixes don't break existing functionality
|
|
861
|
+
|
|
862
|
+
---
|
|
863
|
+
|
|
864
|
+
## 📝 Additional Observations
|
|
865
|
+
|
|
866
|
+
### Positive Findings
|
|
867
|
+
|
|
868
|
+
✅ **The Field classes are well-designed**
|
|
869
|
+
|
|
870
|
+
- `Field.ts` and `FieldHelpers.ts` provide comprehensive field support
|
|
871
|
+
- Both simple and complex fields have complete implementations
|
|
872
|
+
- Field creation helpers exist for common field types
|
|
873
|
+
|
|
874
|
+
✅ **Paragraph and Run handling is solid**
|
|
875
|
+
|
|
876
|
+
- Order preservation works well for runs and hyperlinks
|
|
877
|
+
- The `_orderedChildren` mechanism is clever
|
|
878
|
+
- Type-safe object model prevents many bugs
|
|
879
|
+
|
|
880
|
+
✅ **dochub-app integration is clean**
|
|
881
|
+
|
|
882
|
+
- WordDocumentProcessor uses docXMLater APIs correctly
|
|
883
|
+
- Error handling is comprehensive
|
|
884
|
+
- Memory management is excellent
|
|
885
|
+
|
|
886
|
+
### Areas of Concern
|
|
887
|
+
|
|
888
|
+
⚠️ **No Field Extraction API**
|
|
889
|
+
|
|
890
|
+
- WordDocumentProcessor has `extractHyperlinks()` method
|
|
891
|
+
- **No equivalent `extractFields()` method** exists
|
|
892
|
+
- Can't enumerate fields in a document programmatically
|
|
893
|
+
- dochub-app can't validate or report on field preservation
|
|
894
|
+
|
|
895
|
+
⚠️ **No Field Validation**
|
|
896
|
+
|
|
897
|
+
- No checks during save to warn about lost fields
|
|
898
|
+
- Silent data loss - users don't know fields were dropped
|
|
899
|
+
- No diff/comparison showing before/after field counts
|
|
900
|
+
|
|
901
|
+
⚠️ **Limited Field Support in Paragraph API**
|
|
902
|
+
|
|
903
|
+
- `Paragraph.addField()` exists
|
|
904
|
+
- **No `Paragraph.getFields()` method**
|
|
905
|
+
- **No `Paragraph.removeField()` method**
|
|
906
|
+
- Fields can't be queried or manipulated after being added
|
|
907
|
+
|
|
908
|
+
---
|
|
909
|
+
|
|
910
|
+
## 🎯 Critical Path to Resolution
|
|
911
|
+
|
|
912
|
+
### For Immediate Relief (Quick Fix)
|
|
913
|
+
|
|
914
|
+
**Option A: Document with Warning**
|
|
915
|
+
|
|
916
|
+
1. Add field count validation before/after processing
|
|
917
|
+
2. Warn users if fields are lost
|
|
918
|
+
3. Document limitation in UI: "Complex fields are not preserved"
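
The count validation in Option A can be approximated without touching the object model, by scanning `word/document.xml` before and after processing. The sketch below makes several assumptions: `countFields` and `lostFields` are hypothetical helpers, attributes are double-quoted as Word writes them, and fields in headers, footers, and other parts are not covered.

```typescript
interface FieldCounts {
  simple: number; // <w:fldSimple> elements
  complex: number; // <w:fldChar w:fldCharType="begin"/> markers
}

// Counts field openers in a raw document.xml string.
function countFields(documentXml: string): FieldCounts {
  const simple = (documentXml.match(/<w:fldSimple[\s>]/g) || []).length;
  const complex = (
    documentXml.match(/<w:fldChar\b[^>]*w:fldCharType="begin"/g) || []
  ).length;
  return { simple, complex };
}

// Number of fields that disappeared between the two snapshots.
function lostFields(before: FieldCounts, after: FieldCounts): number {
  const total = (c: FieldCounts) => c.simple + c.complex;
  return Math.max(0, total(before) - total(after));
}
```

A non-zero `lostFields` result is the condition under which the UI warning in step 2 would be shown.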
|
|
919
|
+
|
|
920
|
+
**Option B: Raw XML Passthrough**
|
|
921
|
+
|
|
922
|
+
1. Detect complex fields during load
|
|
923
|
+
2. Store original XML for those paragraphs
|
|
924
|
+
3. Write back unchanged XML during save
|
|
925
|
+
4. Only process paragraphs without complex fields
|
|
926
|
+
|
|
927
|
+
### For Complete Solution (Full Fix)
|
|
928
|
+
|
|
929
|
+
**Phase 1: Parsing**
|
|
930
|
+
|
|
931
|
+
1. Implement `parseComplexFieldFromRunSequence()`
|
|
932
|
+
2. Add `w:fldChar` and `w:instrText` detection to run parser
|
|
933
|
+
3. Add state machine to group field runs into ComplexField objects
|
|
934
|
+
4. Update `parseParagraphWithOrder()` to handle complex fields
|
|
935
|
+
|
|
936
|
+
**Phase 2: XMLParser Fix**
|
|
937
|
+
|
|
938
|
+
1. Remove `uniqueTypes.length > 1` condition in `coalesceChildren()`
|
|
939
|
+
2. Always generate `_orderedChildren` when elements exist
|
|
940
|
+
3. Add regression tests for single-type parsing
|
|
941
|
+
|
|
942
|
+
**Phase 3: Serialization**
|
|
943
|
+
|
|
944
|
+
1. Update `Paragraph.toXML()` to handle `ComplexField` separately
|
|
945
|
+
2. Emit the **multiple runs** returned by `ComplexField.toXML()` directly, rather than wrapping them in a single run
|
|
946
|
+
3. Preserve `<w:fldChar>` elements in run serialization
|
|
947
|
+
|
|
948
|
+
**Phase 4: API Enhancement**
|
|
949
|
+
|
|
950
|
+
1. Add `Paragraph.getFields()` method
|
|
951
|
+
2. Add `Document.extractFields()` method (like `extractHyperlinks()`)
|
|
952
|
+
3. Add field validation during save
|
|
953
|
+
|
|
954
|
+
---
|
|
955
|
+
|
|
956
|
+
## 📚 References and Evidence
|
|
957
|
+
|
|
958
|
+
### Files Analyzed
|
|
959
|
+
|
|
960
|
+
#### docXMLater Framework
|
|
961
|
+
|
|
962
|
+
1. ✅ `src/core/DocumentParser.ts` (1,500+ lines) - Parsing logic
|
|
963
|
+
2. ✅ `src/core/DocumentGenerator.ts` (400+ lines) - Generation logic
|
|
964
|
+
3. ✅ `src/core/Document.ts` (2,000+ lines) - Main API
|
|
965
|
+
4. ✅ `src/elements/Field.ts` (500+ lines) - Field classes
|
|
966
|
+
5. ✅ `src/elements/FieldHelpers.ts` (200+ lines) - Field utilities
|
|
967
|
+
6. ✅ `src/elements/Paragraph.ts` (1,300+ lines) - Paragraph class
|
|
968
|
+
7. ✅ `src/elements/Run.ts` (600+ lines) - Run class
|
|
969
|
+
8. ✅ `src/xml/XMLParser.ts` (700+ lines) - XML parsing
|
|
970
|
+
9. ✅ `src/xml/XMLBuilder.ts` (300+ lines) - XML building
|
|
971
|
+
|
|
972
|
+
#### dochub-app Application
|
|
973
|
+
|
|
974
|
+
1. ✅ `src/services/document/WordDocumentProcessor.ts` (1,800+ lines)
|
|
975
|
+
2. ✅ `src/services/document/DocXMLaterProcessor.ts` (500+ lines)
|
|
976
|
+
3. ✅ `docs/architecture/DOCXMLATER_INTEGRATION.md`
|
|
977
|
+
4. ✅ `FIXES_COMPLETED.md`
|
|
978
|
+
5. ✅ `TEST_RESULTS_SUMMARY.md`
|
|
979
|
+
|
|
980
|
+
### Key Code Locations
|
|
981
|
+
|
|
982
|
+
**Field Parsing (Simple)**:
|
|
983
|
+
|
|
984
|
+
- File: `docXMLater/src/core/DocumentParser.ts`
|
|
985
|
+
- Method: `parseSimpleFieldFromObject()`
|
|
986
|
+
- Lines: ~580-616
|
|
987
|
+
- Status: ✅ Working
|
|
988
|
+
|
|
989
|
+
**Field Parsing (Complex)**:
|
|
990
|
+
|
|
991
|
+
- File: `docXMLater/src/core/DocumentParser.ts`
|
|
992
|
+
- Method: **DOES NOT EXIST** ❌
|
|
993
|
+
- Expected: `parseComplexFieldFromRunSequence()`
|
|
994
|
+
- Status: ❌ Missing
|
|
995
|
+
|
|
996
|
+
**Field Detection in Ordered Parsing**:
|
|
997
|
+
|
|
998
|
+
- File: `docXMLater/src/core/DocumentParser.ts`
|
|
999
|
+
- Method: `parseParagraphWithOrder()`
|
|
1000
|
+
- Lines: ~238-310
|
|
1001
|
+
- Issue: ✅ Handles `w:fldSimple`, ❌ Ignores `w:fldChar`
|
|
1002
|
+
|
|
1003
|
+
**Order Metadata Generation**:
|
|
1004
|
+
|
|
1005
|
+
- File: `docXMLater/src/xml/XMLParser.ts`
|
|
1006
|
+
- Method: `coalesceChildren()`
|
|
1007
|
+
- Lines: ~560-575
|
|
1008
|
+
- Issue: ❌ Conditional generation based on `uniqueTypes.length > 1`
|
|
1009
|
+
|
|
1010
|
+
**Field Serialization**:
|
|
1011
|
+
|
|
1012
|
+
- File: `docXMLater/src/elements/Paragraph.ts`
|
|
1013
|
+
- Method: `toXML()`
|
|
1014
|
+
- Lines: ~600-650
|
|
1015
|
+
- Issue: ❌ Wraps fields in run, always generates `fldSimple`
|
|
1016
|
+
|
|
1017
|
+
---
|
|
1018
|
+
|
|
1019
|
+
## 🔬 Additional Technical Details
|
|
1020
|
+
|
|
1021
|
+
### ComplexField Class Design
|
|
1022
|
+
|
|
1023
|
+
The `ComplexField` class in `Field.ts` is **well-designed** and supports:
|
|
1024
|
+
|
|
1025
|
+
```typescript
|
|
1026
|
+
export class ComplexField {
|
|
1027
|
+
private instruction: string;
|
|
1028
|
+
private result?: string;
|
|
1029
|
+
private instructionFormatting?: RunFormatting;
|
|
1030
|
+
private resultFormatting?: RunFormatting;
|
|
1031
|
+
private nestedFields: ComplexField[];
|
|
1032
|
+
private resultContent: XMLElement[];
|
|
1033
|
+
private multiParagraph: boolean;
|
|
1034
|
+
|
|
1035
|
+
toXML(): XMLElement[] {
|
|
1036
|
+
// Returns ARRAY of run elements (begin, instr, sep, result, end)
|
|
1037
|
+
// ✅ Correctly handles complex field structure
|
|
1038
|
+
}
|
|
1039
|
+
}
|
|
1040
|
+
```
|
|
1041
|
+
|
|
1042
|
+
**The class is ready to use** - it just needs to be instantiated during parsing!
|
|
1043
|
+
|
|
1044
|
+
### Why TOC Generation Works But TOC Preservation Doesn't
|
|
1045
|
+
|
|
1046
|
+
**File**: `docXMLater/src/core/Document.ts` (lines ~1100-1200)
|
|
1047
|
+
|
|
1048
|
+
The framework has a `replaceTableOfContents()` method that:
|
|
1049
|
+
|
|
1050
|
+
1. Reads the saved DOCX file
|
|
1051
|
+
2. Finds TOC SDT elements in XML
|
|
1052
|
+
3. **Replaces them directly in the XML string**
|
|
1053
|
+
4. Saves the modified XML back
|
|
1054
|
+
|
|
1055
|
+
This works because it **bypasses the object model entirely** - it never tries to parse the complex TOC fields into objects. It's pure XML string manipulation.
|
|
1056
|
+
|
|
1057
|
+
**This confirms**: The framework developers **knew** complex fields were problematic and worked around it by using direct XML manipulation instead of object model parsing.
|
|
1058
|
+
|
|
1059
|
+
---
|
|
1060
|
+
|
|
1061
|
+
## 🚨 Critical Conclusions
|
|
1062
|
+
|
|
1063
|
+
### The "Hit or Miss" Behavior Explained
|
|
1064
|
+
|
|
1065
|
+
**Fields are preserved when**:
|
|
1066
|
+
✅ They use `<w:fldSimple>` structure (simple fields)
|
|
1067
|
+
✅ AND paragraph has multiple element types (triggers `_orderedChildren`)
|
|
1068
|
+
✅ AND document isn't heavily modified during processing
|
|
1069
|
+
|
|
1070
|
+
**Fields are lost when**:
|
|
1071
|
+
❌ They use `<w:fldChar>` structure (complex fields) - **ALWAYS LOST**
|
|
1072
|
+
❌ OR paragraph has only fields (no `_orderedChildren` → wrong order → may break)
|
|
1073
|
+
❌ OR processing modifies paragraph content array (rare but possible)
|
|
1074
|
+
|
|
1075
|
+
### Why It Seems Random
|
|
1076
|
+
|
|
1077
|
+
Users can't see the difference between simple and complex fields in Word - they look identical. Word chooses the structure automatically based on internal complexity heuristics. This makes the bug appear non-deterministic from the user's perspective.
|
|
1078
|
+
|
|
1079
|
+
### Business Impact
|
|
1080
|
+
|
|
1081
|
+
**HIGH SEVERITY** - This affects:
|
|
1082
|
+
|
|
1083
|
+
- 💼 Legal documents (contracts, agreements)
|
|
1084
|
+
- 📊 Report templates (automated fields)
|
|
1085
|
+
- 📚 Technical documentation (cross-references)
|
|
1086
|
+
- 📄 Forms (merge fields)
|
|
1087
|
+
- 📖 Books/manuals (TOC, page numbers, cross-refs)
|
|
1088
|
+
|
|
1089
|
+
Any document relying on dynamic fields **will be broken** after processing through dochub-app.
|
|
1090
|
+
|
|
1091
|
+
---
|
|
1092
|
+
|
|
1093
|
+
## ✅ Verification Steps
|
|
1094
|
+
|
|
1095
|
+
To confirm these bugs in your environment:
|
|
1096
|
+
|
|
1097
|
+
### Step 1: Create Test Document in Word
|
|
1098
|
+
|
|
1099
|
+
1. Open Microsoft Word
|
|
1100
|
+
2. Insert → Quick Parts → Field
|
|
1101
|
+
3. Choose "Page" → OK (this creates a PAGE field)
|
|
1102
|
+
4. Save as `test-simple-field.docx`
|
|
1103
|
+
5. Press Alt+F9 to view field codes
|
|
1104
|
+
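6. Check whether the XML contains `<w:fldSimple>` or `<w:fldChar>` (Alt+F9 only shows the field code; the element type is visible in `word/document.xml`, extracted in Step 3)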
|
|
1105
|
+
|
|
1106
|
+
### Step 2: Process Through dochub-app
|
|
1107
|
+
|
|
1108
|
+
1. Load `test-simple-field.docx` in dochub-app
|
|
1109
|
+
2. Process with minimal settings (no major modifications)
|
|
1110
|
+
3. Save result
|
|
1111
|
+
4. Open result in Word
|
|
1112
|
+
5. Press Alt+F9 - does the field still exist?
|
|
1113
|
+
|
|
1114
|
+
### Step 3: Check XML Directly
|
|
1115
|
+
|
|
1116
|
+
```bash
|
|
1117
|
+
# Extract DOCX (it's a ZIP file)
|
|
1118
|
+
unzip test-simple-field.docx -d test-simple
|
|
1119
|
+
unzip result.docx -d result
|
|
1120
|
+
|
|
1121
|
+
# Compare field presence
|
|
1122
|
+
grep -i "fldSimple\|fldChar\|instrText" test-simple/word/document.xml
|
|
1123
|
+
grep -i "fldSimple\|fldChar\|instrText" result/word/document.xml
|
|
1124
|
+
```
|
|
1125
|
+
|
|
1126
|
+
### Step 4: Test Complex Field
|
|
1127
|
+
|
|
1128
|
+
1. In Word, create a cross-reference (Insert → Cross-reference)
|
|
1129
|
+
2. Save as `test-complex-field.docx`
|
|
1130
|
+
3. Process through dochub-app
|
|
1131
|
+
4. Check if cross-reference still works
|
|
1132
|
+
|
|
1133
|
+
**Expected Result**: Cross-reference will be **broken** (shows literal text or error).
|
|
1134
|
+
|
|
1135
|
+
---
|
|
1136
|
+
|
|
1137
|
+
## 📋 Recommendations
|
|
1138
|
+
|
|
1139
|
+
### Immediate Actions
|
|
1140
|
+
|
|
1141
|
+
1. **Document the limitation** in dochub-app user guide
|
|
1142
|
+
2. **Add validation** to warn users when fields will be lost
|
|
1143
|
+
3. **Consider field count** in processing statistics
|
|
1144
|
+
4. **Add field type** to processing options (warn about complex fields)
|
|
1145
|
+
|
|
1146
|
+
### Short-Term (1-2 weeks)
|
|
1147
|
+
|
|
1148
|
+
1. **Implement Fix #2** (always generate `_orderedChildren`) - Low risk, high impact
|
|
1149
|
+
2. **Add `extractFields()` API** to DocXMLaterProcessor for visibility
|
|
1150
|
+
3. **Add unit tests** for simple field preservation
|
|
1151
|
+
4. **Update documentation** with field preservation status
|
|
1152
|
+
|
|
1153
|
+
### Long-Term (1-2 months)
|
|
1154
|
+
|
|
1155
|
+
1. **Implement Fix #1** (complex field parsing) - Requires architecture changes
|
|
1156
|
+
2. **Add stateful parser** for complex fields
|
|
1157
|
+
3. **Update Paragraph API** to support ComplexField
|
|
1158
|
+
4. **Comprehensive testing** with real-world documents
|
|
1159
|
+
|
|
1160
|
+
### Workaround for Users
|
|
1161
|
+
|
|
1162
|
+
Until fixed, users should:
|
|
1163
|
+
|
|
1164
|
+
1. **Avoid processing documents with critical fields**
|
|
1165
|
+
2. **Re-insert fields manually** after processing if needed
|
|
1166
|
+
3. **Use simple field structure** when possible (convert in Word first)
|
|
1167
|
+
4. **Keep backups** before processing (dochub-app already does this ✅)
|
|
1168
|
+
|
|
1169
|
+
---
|
|
1170
|
+
|
|
1171
|
+
## 📞 Support Information
|
|
1172
|
+
|
|
1173
|
+
### For Developers
|
|
1174
|
+
|
|
1175
|
+
- This analysis file: `FIELD_PRESERVATION_ANALYSIS.md`
|
|
1176
|
+
- Framework repo: `c:\Users\DiaTech\Pictures\DiaTech\Programs\DocHub\development\docXMLater`
|
|
1177
|
+
- Application repo: `c:\Users\DiaTech\Pictures\DiaTech\Programs\DocHub\development\dochub-app`
|
|
1178
|
+
|
|
1179
|
+
### For Bug Reports
|
|
1180
|
+
|
|
1181
|
+
Include:
|
|
1182
|
+
|
|
1183
|
+
1. Sample DOCX with fields that are lost
|
|
1184
|
+
2. XML diff showing before/after field presence
|
|
1185
|
+
3. Field type (simple vs complex) from XML inspection
|
|
1186
|
+
4. Processing options used in dochub-app
|
|
1187
|
+
|
|
1188
|
+
---
|
|
1189
|
+
|
|
1190
|
+
## ✍️ Document Metadata
|
|
1191
|
+
|
|
1192
|
+
**Analysis Date**: November 14, 2025
|
|
1193
|
+
**Files Analyzed**: 14 files, ~10,000 lines of code
|
|
1194
|
+
**Bugs Identified**: 5 (priority ranging from critical to low)
|
|
1195
|
+
**Fix Complexity**: Medium-High (requires stateful parsing)
|
|
1196
|
+
**Breaking Changes**: None (fixes are additive)
|
|
1197
|
+
|
|
1198
|
+
---
|
|
1199
|
+
|
|
1200
|
+
**END OF ANALYSIS**
|