devlyn-cli 0.5.2 → 0.5.4

Files changed (35)
  1. package/bin/devlyn.js +1 -0
  2. package/config/commands/devlyn.team-resolve.md +31 -2
  3. package/optional-skills/dokkit/ANALYSIS.md +198 -0
  4. package/optional-skills/dokkit/COMMANDS.md +365 -0
  5. package/optional-skills/dokkit/DOCX-XML.md +76 -0
  6. package/optional-skills/dokkit/EXPORT.md +102 -0
  7. package/optional-skills/dokkit/FILLING.md +377 -0
  8. package/optional-skills/dokkit/HWPX-XML.md +73 -0
  9. package/optional-skills/dokkit/IMAGE-SOURCING.md +127 -0
  10. package/optional-skills/dokkit/INGESTION.md +65 -0
  11. package/optional-skills/dokkit/SKILL.md +153 -0
  12. package/optional-skills/dokkit/STATE.md +60 -0
  13. package/optional-skills/dokkit/references/docx-field-patterns.md +151 -0
  14. package/optional-skills/dokkit/references/docx-structure.md +58 -0
  15. package/optional-skills/dokkit/references/field-detection-patterns.md +130 -0
  16. package/optional-skills/dokkit/references/hwpx-field-patterns.md +461 -0
  17. package/optional-skills/dokkit/references/hwpx-structure.md +159 -0
  18. package/optional-skills/dokkit/references/image-opportunity-heuristics.md +121 -0
  19. package/optional-skills/dokkit/references/image-xml-patterns.md +338 -0
  20. package/optional-skills/dokkit/references/section-image-interleaving.md +346 -0
  21. package/optional-skills/dokkit/references/section-range-detection.md +118 -0
  22. package/optional-skills/dokkit/references/state-schema.md +143 -0
  23. package/optional-skills/dokkit/references/supported-formats.md +67 -0
  24. package/optional-skills/dokkit/scripts/compile_hwpx.py +134 -0
  25. package/optional-skills/dokkit/scripts/detect_fields.py +301 -0
  26. package/optional-skills/dokkit/scripts/detect_fields_hwpx.py +286 -0
  27. package/optional-skills/dokkit/scripts/export_pdf.py +99 -0
  28. package/optional-skills/dokkit/scripts/parse_hwpx.py +185 -0
  29. package/optional-skills/dokkit/scripts/parse_image_with_gemini.py +159 -0
  30. package/optional-skills/dokkit/scripts/parse_xlsx.py +98 -0
  31. package/optional-skills/dokkit/scripts/source_images.py +365 -0
  32. package/optional-skills/dokkit/scripts/validate_docx.py +142 -0
  33. package/optional-skills/dokkit/scripts/validate_hwpx.py +281 -0
  34. package/optional-skills/dokkit/scripts/validate_state.py +132 -0
  35. package/package.json +1 -1
@@ -0,0 +1,130 @@
1
+ # Field Detection Patterns
2
+
3
+ ## DOCX Detection Heuristics
4
+
5
+ ### Heuristic 1: Curly Brace Placeholders
6
+ ```regex
7
+ \{\{[^}]+\}\}
8
+ ```
9
+ Match text like `{{field_name}}`. High reliability.
10
+
11
+ ### Heuristic 2: Angle Bracket Placeholders
12
+ ```regex
13
+ <<[^>]+>>
14
+ ```
15
+ Match text like `<<field_name>>`. High reliability.
16
+
17
+ ### Heuristic 3: Square Bracket Placeholders
18
+ ```regex
19
+ \[[^\]]+\]
20
+ ```
21
+ Match text like `[field_name]`. Medium reliability (may match references).
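Heuristics 1 to 3 can be applied together as a single scan. The following is a minimal sketch, not the package's actual detector: the names `PLACEHOLDER_PATTERNS` and `find_placeholders` are hypothetical, and the reliability labels are simply carried over from the descriptions above.

```python
import re

# Patterns copied from Heuristics 1-3; reliability labels follow the text above.
PLACEHOLDER_PATTERNS = [
    ("curly", re.compile(r"\{\{[^}]+\}\}"), "high"),
    ("angle", re.compile(r"<<[^>]+>>"), "high"),
    ("square", re.compile(r"\[[^\]]+\]"), "medium"),
]


def find_placeholders(text):
    """Return (kind, matched_text, reliability) tuples, grouped by pattern."""
    hits = []
    for kind, pattern, reliability in PLACEHOLDER_PATTERNS:
        for match in pattern.finditer(text):
            hits.append((kind, match.group(0), reliability))
    return hits
```

Square-bracket hits such as `[1]` are still returned; a caller would be expected to treat the `medium` label as a cue for extra filtering.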
22
+
23
+ ### Heuristic 4: Underline-Only Runs
24
+ A run where:
25
+ - `<w:rPr>` contains `<w:u w:val="single"/>`
26
+ - `<w:t>` contains only spaces, underscores, or is empty
27
+ - Run length > 3 characters
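The underline check above could be sketched as follows, using the standard WordprocessingML namespace. The helper name `is_underline_blank_run` is hypothetical, and the sketch assumes that "run length" means the length of the run's text, so an empty `<w:t>` on its own does not pass the length test.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def is_underline_blank_run(run):
    """True if a <w:r> is single-underlined and holds only spaces/underscores."""
    underline = run.find(f"{{{W}}}rPr/{{{W}}}u")
    if underline is None or underline.get(f"{{{W}}}val") != "single":
        return False
    text = "".join(t.text or "" for t in run.findall(f"{{{W}}}t"))
    # Assumption: the "> 3 characters" rule is measured on the run's text.
    return len(text) > 3 and set(text) <= {" ", "_"}
```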
28
+
29
+ ### Heuristic 5: Empty Table Cells
30
+ A `<w:tc>` that:
31
+ - Contains only `<w:p/>` or `<w:p><w:pPr/></w:p>` (empty paragraph)
32
+ - Is adjacent to a cell containing text (the label)
33
+ - The label cell's text is short (< 50 chars) and not numeric
34
+
35
+ ### Heuristic 6: Instruction Text
36
+ A run where text matches patterns like:
37
+ ```regex
38
+ \(.*?(enter|type|input|write|fill|입력).*?\)
39
+ ```
40
+
41
+ ### Heuristic 7: Content Controls
42
+ Any `<w:sdt>` element with `<w:showingPlcHdr/>` in its properties.
43
+
44
+ ### Heuristic 8: Image Fields
45
+ A field is classified as `image` when any of these conditions hold:
46
+ - A `{{placeholder}}` or `<<placeholder>>` contains an image keyword
47
+ - A table cell contains an existing `<w:drawing>` element (pre-positioned image slot)
48
+ - An empty table cell is adjacent to a cell whose label matches an image keyword
49
+
50
+ **Image keywords** (case-insensitive):
51
+ - Korean: 사진, 증명사진, 여권사진, 로고, 서명, 날인, 도장, 직인
52
+ - English: Photo, Picture, Logo, Signature, Stamp, Seal, Image, Portrait
53
+
54
+ **Image type classification**:
55
+ | Keyword match | `image_type` |
56
+ |---------------|-------------|
57
+ | 사진, 증명사진, 여권사진, photo, picture, portrait, image | `photo` |
58
+ | 로고, logo | `logo` |
59
+ | 서명, 날인, 도장, 직인, signature, stamp, seal | `signature` |
60
+ | (no keyword match) | `figure` |
61
+
62
+ Image fields are **excluded** from the `placeholder_text` and `empty_cell` detectors to prevent double-detection.
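The classification table can be read as a simple lookup. This sketch assumes case-insensitive substring matching and checks rows in table order, so a label containing keywords from two rows gets the earlier row's type. Both choices are assumptions, and `classify_image_type` is a hypothetical name.

```python
# Keyword rows in the same order as the classification table above.
IMAGE_TYPE_KEYWORDS = [
    ("photo", ["사진", "증명사진", "여권사진", "photo", "picture", "portrait", "image"]),
    ("logo", ["로고", "logo"]),
    ("signature", ["서명", "날인", "도장", "직인", "signature", "stamp", "seal"]),
]


def classify_image_type(label):
    """Map a label or placeholder name to an image_type string."""
    lowered = label.lower()
    for image_type, keywords in IMAGE_TYPE_KEYWORDS:
        if any(keyword in lowered for keyword in keywords):
            return image_type
    # No keyword match, e.g. a cell detected only via an existing drawing.
    return "figure"
```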
63
+
64
+ ### Heuristic 9: Tip Box
65
+ A `<w:tbl>` that:
66
+ - Has exactly one row and one cell (1×1 table)
67
+ - `<w:tblBorders>` uses `w:val="dashed"` borders
68
+ - Cell text starts with `※` or contains `작성 팁` / `작성요령`
69
+ - Often has red text color (`<w:color w:val="FF0000"/>`)
70
+
71
+ → `field_type: "tip_box"`, `action: "delete"`
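A rough sketch of the tip box test for DOCX follows. It treats the red text color as optional, as the list above does, and it assumes that one dashed edge inside `<w:tblBorders>` is enough to count as a dashed border; the real detector may be stricter. `is_tip_box` and `TIP_MARKERS` are hypothetical names.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
TIP_MARKERS = ("작성 팁", "작성요령")


def is_tip_box(tbl):
    """True if a <w:tbl> looks like a 1x1 dashed-border guidance box."""
    rows = tbl.findall(f"{{{W}}}tr")
    if len(rows) != 1 or len(rows[0].findall(f"{{{W}}}tc")) != 1:
        return False
    borders = tbl.find(f"{{{W}}}tblPr/{{{W}}}tblBorders")
    # Assumption: any dashed edge qualifies.
    if borders is None or not any(
        edge.get(f"{{{W}}}val") == "dashed" for edge in borders
    ):
        return False
    text = "".join(t.text or "" for t in tbl.iter(f"{{{W}}}t")).strip()
    return text.startswith("※") or any(marker in text for marker in TIP_MARKERS)
```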
72
+
73
+ ## HWPX Detection Heuristics
74
+
75
+ ### Heuristic 1: Empty Adjacent Cells
76
+ Same as DOCX Heuristic 5 (Empty Table Cells), but using `<hp:tc>` and `<hp:t>` elements.
77
+
78
+ ### Heuristic 2: Korean Instruction Text
79
+ ```regex
80
+ \(.*?(입력|기재|작성).*?\)
81
+ ```
82
+
83
+ ### Heuristic 3: Date Component Cells
84
+ Cells immediately before 년/월/일 (year/month/day) markers.
85
+
86
+ ### Heuristic 4: Image Fields
87
+ Same logic as DOCX Heuristic 8, adapted for HWPX elements:
88
+ - `<hp:pic>` instead of `<w:drawing>`
89
+ - `<hp:tc>` / `<hp:t>` instead of `<w:tc>` / `<w:t>`
90
+ - Same image keyword list and type classification
91
+
92
+ ### Heuristic 5: Tip Box
93
+ An `<hp:tbl>` that:
94
+ - Has `rowCnt="1"` and `colCnt="1"` (single-cell table)
95
+ - `borderFillIDRef` resolves to DASH border style in `header.xml`
96
+ - Cell text starts with `※` or contains `작성 팁` / `작성요령` / `작성 요령`
97
+ - May appear standalone or nested inside a `<hp:subList>` within another cell
98
+
99
+ → `field_type: "tip_box"`, `action: "delete"`, `container: "standalone"|"nested"`
100
+
101
+ ### Heuristic 6: Section Header Rows
102
+ Table rows where:
103
+ - First cell spans multiple columns (`hp:cellSpan colSpan > 1`)
104
+ - Text is short and descriptive (section name)
105
+ - Background may be shaded
106
+
107
+ ## HWPX Pre-Fill Sanitization
108
+
109
+ ### Negative Character Spacing
110
+ HWPX templates may define `<hh:charPr>` elements in `header.xml` with negative `<hh:spacing>` values (e.g., `hangul="-3"`). These compress characters closer together, which works for short placeholder text but causes **severe text overlap** when the filler replaces placeholders with longer content.
111
+
112
+ **Rule**: Before filling, scan ALL `<hh:charPr>` definitions in `header.xml` and set any negative spacing attribute values to `"0"`. This applies to all attributes: `hangul`, `latin`, `hanja`, `japanese`, `other`, `symbol`, `user`.
113
+
114
+ **Example fix**:
115
+ ```xml
116
+ <!-- Before (causes overlap) -->
117
+ <hh:spacing hangul="-3" latin="-3" hanja="-3" japanese="-3" other="-3" symbol="-3" user="-3"/>
118
+
119
+ <!-- After (normal spacing) -->
120
+ <hh:spacing hangul="0" latin="0" hanja="0" japanese="0" other="0" symbol="0" user="0"/>
121
+ ```
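The sanitization rule could be implemented along these lines. This is a sketch under the structure shown in the example above (`<hh:spacing>` elements nested under `<hh:charPr>`); `zero_negative_spacing` is a hypothetical helper name, and the namespace URI is the `hh` URI used elsewhere in this skill's reference files.

```python
import xml.etree.ElementTree as ET

HH = "http://www.hancom.co.kr/hwpml/2011/head"
SPACING_ATTRS = ("hangul", "latin", "hanja", "japanese", "other", "symbol", "user")


def zero_negative_spacing(root):
    """Set negative <hh:spacing> attributes under every <hh:charPr> to "0".

    Returns the number of attribute values that were changed.
    """
    changed = 0
    for char_pr in root.iter(f"{{{HH}}}charPr"):
        for spacing in char_pr.iter(f"{{{HH}}}spacing"):
            for attr in SPACING_ATTRS:
                value = spacing.get(attr)
                try:
                    negative = value is not None and int(value) < 0
                except ValueError:
                    negative = False  # leave non-numeric values untouched
                if negative:
                    spacing.set(attr, "0")
                    changed += 1
    return changed
```

A caller would parse `header.xml`, pass the root element in, and write the tree back only when the returned count is non-zero.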
122
+
123
+ ## False Positive Filtering
124
+
125
+ Exclude detected "fields" that are:
126
+ - Part of a header/title row (not fillable)
127
+ - Copyright notices or footer text
128
+ - Page numbers or running headers
129
+ - Table of contents entries
130
+ - Cross-reference markers
@@ -0,0 +1,461 @@
1
+ # HWPX Field Detection Patterns
2
+
3
+ ## Pattern 1: Empty Table Cell
4
+
5
+ Korean forms are heavily table-based. The most common pattern:
6
+
7
+ ```xml
8
+ <hp:tr>
9
+ <hp:tc>
10
+ <!-- Label cell -->
11
+ <hp:p>
12
+ <hp:run>
13
+ <hp:rPr charPrIDRef="1"/>
14
+ <hp:t>성명</hp:t>
15
+ </hp:run>
16
+ </hp:p>
17
+ </hp:tc>
18
+ <hp:tc>
19
+ <!-- Empty value cell → FILL THIS -->
20
+ <hp:p>
21
+ <hp:lineseg/>
22
+ </hp:p>
23
+ </hp:tc>
24
+ </hp:tr>
25
+ ```
26
+
27
+ **Action**: Insert a new `<hp:run>` with `<hp:t>value</hp:t>` into the empty paragraph. Copy `charPrIDRef` from the label cell's run.
28
+
29
+ ## Pattern 2: Placeholder Text in Cell
30
+
31
+ ```xml
32
+ <hp:tc>
33
+ <hp:p>
34
+ <hp:run>
35
+ <hp:t>(이름을 입력하세요)</hp:t> <!-- Instruction text -->
36
+ </hp:run>
37
+ </hp:p>
38
+ </hp:tc>
39
+ ```
40
+
41
+ **Action**: Replace the text in `<hp:t>` with the actual value.
42
+
43
+ ## Pattern 3: Multi-Row Spanning Label
44
+
45
+ Korean forms often have a label cell spanning multiple rows:
46
+
47
+ ```xml
48
+ <hp:tr>
49
+ <hp:tc>
50
+ <hp:cellSpan rowSpan="3"/>
51
+ <hp:p><hp:run><hp:t>학력</hp:t></hp:run></hp:p>
52
+ </hp:tc>
53
+ <hp:tc><hp:p><hp:run><hp:t>학교명</hp:t></hp:run></hp:p></hp:tc>
54
+ <hp:tc><hp:p/></hp:tc> <!-- Empty → fill with school name -->
55
+ </hp:tr>
56
+ ```
57
+
58
+ **Action**: The spanning label ("학력" = Education) is the section. Sub-labels ("학교명" = School Name) identify individual fields.
59
+
60
+ ## Pattern 4: Date Fields
61
+
62
+ ```xml
63
+ <hp:tc>
64
+ <hp:p>
65
+ <hp:run><hp:t>년</hp:t></hp:run> <!-- Year -->
66
+ </hp:p>
67
+ </hp:tc>
68
+ <hp:tc>
69
+ <hp:p>
70
+ <hp:run><hp:t>월</hp:t></hp:run> <!-- Month -->
71
+ </hp:p>
72
+ </hp:tc>
73
+ <hp:tc>
74
+ <hp:p>
75
+ <hp:run><hp:t>일</hp:t></hp:run> <!-- Day -->
76
+ </hp:p>
77
+ </hp:tc>
78
+ ```
79
+
80
+ **Action**: Fill the cells preceding 년/월/일 with the appropriate date components.
81
+
82
+ ## Pattern 5: Writing Tip Box (작성 팁)
83
+
84
+ 1×1 tables with DASH-bordered cells that contain `※` guidance text, either standalone or nested inside another cell. These are NOT fillable fields; they must be **deleted** before or during filling.
85
+
86
+ ```xml
87
+ <hp:tbl rowCnt="1" colCnt="1">
88
+ <hp:tr>
89
+ <hp:tc borderFillIDRef="16">
90
+ <hp:p>
91
+ <hp:run>
92
+ <hp:rPr charPrIDRef="45"/> <!-- Often RED style -->
93
+ <hp:t>※ 작성 팁: 사업의 목적과 필요성을 구체적으로 작성하세요.</hp:t>
94
+ </hp:run>
95
+ </hp:p>
96
+ <hp:p>
97
+ <hp:run>
98
+ <hp:rPr charPrIDRef="45"/>
99
+ <hp:t>※ 관련 법령이나 정책 근거를 제시하면 좋습니다.</hp:t>
100
+ </hp:run>
101
+ </hp:p>
102
+ </hp:tc>
103
+ </hp:tr>
104
+ </hp:tbl>
105
+ ```
106
+
107
+ **Identifying traits**:
108
+ - `rowCnt="1"` and `colCnt="1"` (single-cell table)
109
+ - `borderFillIDRef` resolves to DASH border style in `header.xml`
110
+ - Text starts with `※` or contains `작성 팁`, `작성요령`, `작성 요령`
111
+ - Often appears inside a `<hp:subList>` within another table cell
112
+
113
+ **Two container types**:
114
+ - **Standalone**: Top-level 1×1 table between other content → delete the entire `<hp:tbl>`
115
+ - **Nested**: Inside a `<hp:subList>` within a fill-target cell → delete the `<hp:subList>` element
116
+
117
+ **Action**: Flag as `field_type: "tip_box"`, `action: "delete"`. The filler agent removes these before filling.
118
+
119
+ ## Pattern 6: Character Property Resolution (charPrIDRef)
120
+
121
+ HWPX text formatting is controlled by `charPrIDRef` attributes that reference `<hh:charPr>` entries in `header.xml`.
122
+
123
+ ### How charPrIDRef works
124
+ ```xml
125
+ <!-- In section*.xml — a run references charPr ID 45 -->
126
+ <hp:run>
127
+ <hp:rPr charPrIDRef="45"/>
128
+ <hp:t>Some text</hp:t>
129
+ </hp:run>
130
+
131
+ <!-- In header.xml — charPr ID 45 defines the style -->
132
+ <hh:charPr id="45" height="1000" textColor="#FF0000"
133
+ bold="false" italic="true" spacing="-5"/>
134
+ ```
135
+
136
+ ### Template guide text uses RED styles
137
+ Many templates style guide text, tip boxes, and instructions with red (`#FF0000`) `<hh:charPr>` entries. Red `charPrIDRef` values commonly seen in Korean government templates: 39, 45, 51, 52, 57, 62, 81.
138
+
139
+ **Critical rule**: When filling a field, NEVER copy `charPrIDRef` from guide/tip text. Instead, find or create a black (#000000) charPr.
140
+
141
+ ### Finding a suitable black charPr
142
+ ```python
143
+ import xml.etree.ElementTree as ET
144
+
145
+ def find_black_charpr(header_path):
146
+     """Find a charPrIDRef suitable for filled text (black, normal style)."""
147
+     hns = {"hh": "http://www.hancom.co.kr/hwpml/2011/head"}
148
+     tree = ET.parse(header_path)
149
+     root = tree.getroot()
150
+
151
+     candidates = []
152
+     for cp in root.iter("{%s}charPr" % hns["hh"]):
153
+         color = cp.get("textColor", "#000000")
154
+         bold = cp.get("bold", "false")
155
+         italic = cp.get("italic", "false")
156
+         spacing = int(cp.get("spacing", "0"))
157
+
158
+         # Want: black text, not italic, non-negative spacing
159
+         if color.upper() in ("#000000", "#000000FF", "BLACK") and \
160
+                 italic == "false" and spacing >= 0:
161
+             candidates.append({
162
+                 "id": cp.get("id"),
163
+                 "bold": bold == "true",
164
+                 "height": int(cp.get("height", "1000")),
165
+                 "spacing": spacing,
166
+             })
167
+
168
+     # Keep zero-spacing candidates, split into non-bold and bold
169
+     normal = [c for c in candidates if not c["bold"] and c["spacing"] == 0]
170
+     bold_list = [c for c in candidates if c["bold"] and c["spacing"] == 0]
171
+
172
+     return {
173
+         "normal": normal[0]["id"] if normal else None,
174
+         "bold": bold_list[0]["id"] if bold_list else None,
175
+     }
176
+ ```
177
+
178
+ ### Creating a new charPr if needed
179
+ If no suitable black charPr exists in `header.xml`, create one by appending a new `<hh:charPr>` element with the next available ID, `textColor="#000000"`, `bold="false"`, `italic="false"`, `spacing="0"`.
180
+
181
+ ## Pattern 7: Image Field in Table Cell
182
+
183
+ A label cell containing image-related keywords (사진, 증명사진, 로고, 서명, 직인, 사업자등록증) next to an empty cell indicates an image insertion point.
184
+
185
+ ```xml
186
+ <hp:tr>
187
+ <hp:tc>
188
+ <!-- Label cell with image keyword -->
189
+ <hp:p>
190
+ <hp:run>
191
+ <hp:rPr charPrIDRef="1"/>
192
+ <hp:t>사진</hp:t>
193
+ </hp:run>
194
+ </hp:p>
195
+ </hp:tc>
196
+ <hp:tc>
197
+ <!-- Empty cell → INSERT IMAGE HERE -->
198
+ <hp:p>
199
+ <hp:lineseg/>
200
+ </hp:p>
201
+ </hp:tc>
202
+ </hp:tr>
203
+ ```
204
+
205
+ **Action**: Insert a `<hp:pic>` element INSIDE a `<hp:run>` within the cell's `<hp:p>`. The `<hp:t/>` goes AFTER the pic inside the run.
206
+
207
+ ### Image Paragraph Structure (CRITICAL)
208
+
209
+ ```xml
210
+ <!-- pic must be INSIDE run, t/ AFTER pic (matches real Hancom Office output) -->
211
+ <hp:p id="..." paraPrIDRef="..." styleIDRef="0" pageBreak="0" columnBreak="0" merged="0">
212
+ <hp:linesegarray>
213
+ <hp:lineseg textpos="0" vertpos="0" vertsize="{H}" textheight="{H}"
214
+ baseline="{H*0.85}" spacing="500" .../>
215
+ </hp:linesegarray>
216
+ <hp:run charPrIDRef="0">
217
+ <hp:pic id="{seq_id}" zOrder="{z}" ...>...</hp:pic>
218
+ <hp:t/>
219
+ </hp:run>
220
+ </hp:p>
221
+ ```
222
+
223
+ ### Complete `<hp:pic>` Structure (Hancom Canonical Order)
224
+
225
+ ```xml
226
+ <hp:pic id="{seq_id}" zOrder="{z}" numberingType="PICTURE" textWrap="TOP_AND_BOTTOM"
227
+ textFlow="BOTH_SIDES" lock="0" dropcapstyle="None"
228
+ href="" groupLevel="0" instid="{seq_id}" reverse="0">
229
+ <!-- Group 1: Geometry -->
230
+ <hp:offset x="0" y="0"/>
231
+ <hp:orgSz width="{W}" height="{H}"/>
232
+ <hp:curSz width="{W}" height="{H}"/>
233
+ <hp:flip horizontal="0" vertical="0"/>
234
+ <hp:rotationInfo angle="0" centerX="{W/2}" centerY="{H/2}" rotateimage="1"/>
235
+ <hp:renderingInfo>
236
+ <hc:transMatrix e1="1" e2="0" e3="0" e4="0" e5="1" e6="0"/>
237
+ <hc:scaMatrix e1="1" e2="0" e3="0" e4="0" e5="1" e6="0"/>
238
+ <hc:rotMatrix e1="1" e2="-0" e3="0" e4="0" e5="1" e6="0"/>
239
+ </hp:renderingInfo>
240
+ <!-- Group 2: Image data -->
241
+ <hp:imgRect>
242
+ <hc:pt0 x="0" y="0"/>
243
+ <hc:pt1 x="{W}" y="0"/>
244
+ <hc:pt2 x="{W}" y="{H}"/>
245
+ <hc:pt3 x="0" y="{H}"/>
246
+ </hp:imgRect>
247
+ <hp:imgClip left="0" right="{pixW}" top="0" bottom="{pixH}"/>
248
+ <hp:inMargin left="0" right="0" top="0" bottom="0"/>
249
+ <hp:imgDim dimwidth="{pixW}" dimheight="{pixH}"/>
250
+ <hc:img binaryItemIDRef="{manifest_id}" bright="0" contrast="0" effect="REAL_PIC" alpha="0"/>
251
+ <!-- Group 3: Layout (AFTER hc:img) -->
252
+ <hp:sz width="{W}" widthRelTo="ABSOLUTE" height="{H}" heightRelTo="ABSOLUTE" protect="0"/>
253
+ <hp:pos treatAsChar="1" affectLSpacing="0" flowWithText="0" allowOverlap="0"
254
+ holdAnchorAndSO="0" vertRelTo="PARA" horzRelTo="COLUMN"
255
+ vertAlign="TOP" horzAlign="LEFT" vertOffset="0" horzOffset="0"/>
256
+ <hp:outMargin left="0" right="0" top="0" bottom="0"/>
257
+ </hp:pic>
258
+ ```
259
+
260
+ Where: `{W}/{H}` = HWPML units (1/7200 inch), `{pixW}/{pixH}` = pixel dimensions from PIL, `{manifest_id}` = `id` from `content.hpf`.
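As a worked example of the unit relationship, the sketch below converts pixel dimensions to HWPML units at 7200 units per inch. The 96 DPI default is an assumption, not something the template or this document specifies, and both function names are hypothetical.

```python
HWPML_UNITS_PER_INCH = 7200


def pixels_to_hwpml(pixels, dpi=96):
    """Convert a pixel length to HWPML units (1/7200 inch) at an assumed DPI."""
    return round(pixels * HWPML_UNITS_PER_INCH / dpi)


def pic_dimensions(pix_w, pix_h, dpi=96):
    """Return the {W}, {H}, {pixW}, {pixH} values used in the <hp:pic> template."""
    return {
        "W": pixels_to_hwpml(pix_w, dpi),
        "H": pixels_to_hwpml(pix_h, dpi),
        "pixW": pix_w,
        "pixH": pix_h,
    }
```

In practice the target width is often dictated by the cell or page width, in which case `{W}` would be fixed first and `{H}` derived from the pixel aspect ratio.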
261
+
262
+ ### 9 Critical Rules for `<hp:pic>`
263
+
264
+ 1. **`<img>` uses `hc:` namespace** — `<hc:img>`, NOT `<hp:img>`
265
+ 2. **`<imgRect>` has 4 `<hc:pt>` children** — `<hc:pt0>` through `<hc:pt3>`, NOT inline attributes
266
+ 3. **All required children present** — `offset`, `orgSz`, `curSz`, `flip`, `rotationInfo`, `renderingInfo`, `inMargin`
267
+ 4. **No spurious elements** — Do NOT add `hp:lineShape`, `hp:caption`, `hp:shapeComment`
268
+ 5. **`imgClip` right/bottom = pixel dims** — from `imgDim`, NOT zeros
269
+ 6. **Hancom canonical element order** — offset, orgSz, ..., hc:img, **then** sz, pos, outMargin
270
+ 7. **Register in `content.hpf` manifest only** — Do NOT add `<hh:binDataItems>` to `header.xml`
271
+ 8. **`hp:pos` attributes** — `flowWithText="0"` `horzRelTo="COLUMN"`
272
+ 9. **pic INSIDE run, t AFTER pic** — `<hp:run><hp:pic>...</hp:pic><hp:t/></hp:run>`
273
+
274
+ ## Pattern 8: SubList Cell Wrapping (CRITICAL)
275
+
276
+ In Korean government HWPX templates, ~65% of table cells wrap their content in `<hp:subList>/<hp:p>` rather than having `<hp:p>` as a direct child of `<hp:tc>`. Hancom Office reads content from inside `<hp:subList>` and ignores orphaned direct `<hp:p>` elements.
277
+
278
+ ### Two cell structures
279
+
280
+ **Direct pattern** (~35% of cells):
281
+ ```xml
282
+ <hp:tc>
283
+ <hp:cellAddr .../>
284
+ <hp:cellSpan .../>
285
+ <hp:cellSz .../>
286
+ <hp:p>
287
+ <hp:run><hp:t>Content here</hp:t></hp:run>
288
+ </hp:p>
289
+ </hp:tc>
290
+ ```
291
+
292
+ **SubList pattern** (~65% of cells):
293
+ ```xml
294
+ <hp:tc>
295
+ <hp:cellAddr .../>
296
+ <hp:cellSpan .../>
297
+ <hp:cellSz .../>
298
+ <hp:subList>
299
+ <hp:p>
300
+ <hp:run><hp:t>Content here</hp:t></hp:run>
301
+ </hp:p>
302
+ </hp:subList>
303
+ </hp:tc>
304
+ ```
305
+
306
+ ### Critical rule for filling
307
+
308
+ When writing content into a cell, ALWAYS check for `<hp:subList>` first:
309
+ 1. If `<hp:subList>` exists: write into `<hp:subList>/<hp:p>`, NOT as a direct `<hp:p>` child of `<hp:tc>`
310
+ 2. If no `<hp:subList>`: write as direct `<hp:p>` child of `<hp:tc>` (standard pattern)
311
+
312
+ **Wrong** — creates orphaned paragraphs that Hancom ignores:
313
+ ```python
314
+ # BAD: always writes to cell directly
315
+ p = ET.SubElement(cell, hp_tag("p"))
316
+ ```
317
+
318
+ **Correct** — respects subList wrapper:
319
+ ```python
320
+ # GOOD: check for subList first
321
+ container = cell
322
+ for c in cell:
323
+     if c.tag == hp_tag("subList"):
324
+         container = c
325
+         break
326
+ p = ET.SubElement(container, hp_tag("p"))
327
+ ```
328
+
329
+ This applies to ALL cell operations: `clear_cell_content()`, `fill_cell_text()`, and `insert_cell_image_resolved()`.
330
+
331
+ ## Pattern 9: cellAddr Row Addressing (CRITICAL)
332
+
333
+ Every `<hp:tc>` inside a `<hp:tr>` contains a `<hp:cellAddr>` element with `colAddr` and `rowAddr` attributes. The `rowAddr` MUST equal the **0-based index** of the `<hp:tr>` within its parent `<hp:tbl>`.
334
+
335
+ ### Structure
336
+ ```xml
337
+ <hp:tbl rowCnt="3" colCnt="2">
338
+ <hp:tr> <!-- row index 0 -->
339
+ <hp:tc>
340
+ <hp:cellAddr colAddr="0" rowAddr="0"/> <!-- rowAddr = 0 ✓ -->
341
+ ...
342
+ </hp:tc>
343
+ <hp:tc>
344
+ <hp:cellAddr colAddr="1" rowAddr="0"/> <!-- rowAddr = 0 ✓ -->
345
+ ...
346
+ </hp:tc>
347
+ </hp:tr>
348
+ <hp:tr> <!-- row index 1 -->
349
+ <hp:tc>
350
+ <hp:cellAddr colAddr="0" rowAddr="1"/> <!-- rowAddr = 1 ✓ -->
351
+ ...
352
+ </hp:tc>
353
+ <hp:tc>
354
+ <hp:cellAddr colAddr="1" rowAddr="1"/> <!-- rowAddr = 1 ✓ -->
355
+ ...
356
+ </hp:tc>
357
+ </hp:tr>
358
+ </hp:tbl>
359
+ ```
360
+
361
+ ### Consequence of violation
362
+ If two `<hp:tr>` elements share the same `rowAddr`, Polaris Office **silently hides** the duplicate rows. The table renders with missing data but no error is reported. This is the most common corruption when cloning rows.
363
+
364
+ ### Fix code
365
+ ```python
366
+ HP = "http://www.hancom.co.kr/hwpml/2011/paragraph"
367
+
368
+ def fix_celladdr_rowaddr(tbl):
369
+     """Fix rowAddr values and rowCnt for an HWPX table after row insertion."""
370
+     rows = tbl.findall(f"{{{HP}}}tr")
371
+     for row_idx, tr in enumerate(rows):
372
+         for tc in tr.findall(f"{{{HP}}}tc"):
373
+             cell_addr = tc.find(f"{{{HP}}}cellAddr")
374
+             if cell_addr is not None:
375
+                 cell_addr.set("rowAddr", str(row_idx))
376
+     tbl.set("rowCnt", str(len(rows)))
377
+ ```
378
+
379
+ ### When to apply
380
+ - After cloning a `<hp:tr>` and inserting it into a table
381
+ - After inserting new rows built from `table_content` pipe-delimited data
382
+ - After deleting rows from a table
383
+ - Any time the number or order of `<hp:tr>` children changes
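A read-only check can complement the fix, for example in a validation step that should report problems rather than repair them. This sketch assumes the same structure as above (`<hp:cellAddr>` as a direct child of `<hp:tc>`); `find_rowaddr_mismatches` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

HP = "http://www.hancom.co.kr/hwpml/2011/paragraph"


def find_rowaddr_mismatches(tbl):
    """Return (row_index, colAddr, rowAddr) for cells whose rowAddr is wrong."""
    mismatches = []
    for row_idx, tr in enumerate(tbl.findall(f"{{{HP}}}tr")):
        for tc in tr.findall(f"{{{HP}}}tc"):
            cell_addr = tc.find(f"{{{HP}}}cellAddr")
            if cell_addr is None:
                continue
            if cell_addr.get("rowAddr") != str(row_idx):
                mismatches.append(
                    (row_idx, cell_addr.get("colAddr"), cell_addr.get("rowAddr"))
                )
    return mismatches
```

An empty list means every cell's `rowAddr` already matches its row index.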
384
+
385
+ ## Pattern 10: Image Paragraph Center Alignment
386
+
387
+ Image paragraphs in HWPX should be center-aligned using a `paraPrIDRef` that references a center-aligned `<hh:paraPr>` from `header.xml`.
388
+
389
+ ### Finding center-aligned paraPrIDRef
390
+
391
+ ```python
392
+ def find_center_parapr(header_path):
393
+     """Find first center-aligned paraPr from header.xml for image paragraphs."""
394
+     import xml.etree.ElementTree as ET
395
+     HH = "http://www.hancom.co.kr/hwpml/2011/head"
396
+     tree = ET.parse(header_path)
397
+     for pp in tree.getroot().iter(f"{{{HH}}}paraPr"):
398
+         align = pp.find(f"{{{HH}}}align")
399
+         if align is not None and align.get("horizontal") == "CENTER":
400
+             return pp.get("id")
401
+     return "0"  # fallback to default
402
+ ```
403
+
404
+ ### Usage in image paragraphs
405
+
406
+ ```xml
407
+ <!-- Image paragraph uses center-aligned paraPrIDRef -->
408
+ <hp:p id="..." paraPrIDRef="{CENTER_PARAPR_ID}" styleIDRef="0" pageBreak="0" columnBreak="0" merged="0">
409
+ <hp:linesegarray>
410
+ <hp:lineseg textpos="0" vertpos="0" vertsize="{H}" textheight="{H}" .../>
411
+ </hp:linesegarray>
412
+ <hp:run charPrIDRef="0">
413
+ <hp:pic id="{seq_id}" ...>...</hp:pic>
414
+ <hp:t/>
415
+ </hp:run>
416
+ </hp:p>
417
+ ```
418
+
419
+ ### Why this matters
420
+
421
+ Without center alignment, images default to left-aligned positioning. Korean government document templates expect centered images, particularly for section content images (~77% page width). The `paraPrIDRef` must reference a `<hh:paraPr>` that has `<hh:align horizontal="CENTER"/>`.
422
+
423
+ ### When to apply
424
+ - ALL image paragraphs in section content (from `image_opportunities`)
425
+ - Cell-level images that should be centered within the cell
426
+ - Both standalone and inline image paragraphs
427
+
428
+ ## Safe HWPX Modification
429
+
430
+ ```python
431
+ import xml.etree.ElementTree as ET
432
+
433
+ ns = {
434
+ "hp": "http://www.hancom.co.kr/hwpml/2011/paragraph",
435
+ "hs": "http://www.hancom.co.kr/hwpml/2011/section",
436
+ }
437
+
438
+ # Register namespaces to avoid prefix changes
439
+ for prefix, uri in ns.items():
440
+     ET.register_namespace(prefix, uri)
441
+
442
+ tree = ET.parse("Contents/section0.xml")
443
+ root = tree.getroot()
444
+
445
+ # Find empty cells adjacent to label cells (direct rows/cells only; nested tables are visited separately)
446
+ for tbl in root.iter("{%s}tbl" % ns["hp"]):
447
+     for tr in tbl.findall("{%s}tr" % ns["hp"]):
448
+         cells = tr.findall("{%s}tc" % ns["hp"])
449
+         for i, cell in enumerate(cells):
450
+             # Check if this cell has text (label)
451
+             texts = [t.text for t in cell.iter("{%s}t" % ns["hp"]) if t.text]
452
+             if texts and i + 1 < len(cells):
453
+                 next_cell = cells[i + 1]
454
+                 next_texts = [t.text for t in next_cell.iter("{%s}t" % ns["hp"]) if t.text]
455
+                 if not next_texts:
456
+                     label = "".join(texts)
457
+                     # This is a fillable field with label
458
+                     print(f"Found field: {label}")
459
+
460
+ tree.write("Contents/section0.xml", xml_declaration=True, encoding="UTF-8")
461
+ ```