devlyn-cli 0.5.2 → 0.5.4
- package/bin/devlyn.js +1 -0
- package/config/commands/devlyn.team-resolve.md +31 -2
- package/optional-skills/dokkit/ANALYSIS.md +198 -0
- package/optional-skills/dokkit/COMMANDS.md +365 -0
- package/optional-skills/dokkit/DOCX-XML.md +76 -0
- package/optional-skills/dokkit/EXPORT.md +102 -0
- package/optional-skills/dokkit/FILLING.md +377 -0
- package/optional-skills/dokkit/HWPX-XML.md +73 -0
- package/optional-skills/dokkit/IMAGE-SOURCING.md +127 -0
- package/optional-skills/dokkit/INGESTION.md +65 -0
- package/optional-skills/dokkit/SKILL.md +153 -0
- package/optional-skills/dokkit/STATE.md +60 -0
- package/optional-skills/dokkit/references/docx-field-patterns.md +151 -0
- package/optional-skills/dokkit/references/docx-structure.md +58 -0
- package/optional-skills/dokkit/references/field-detection-patterns.md +130 -0
- package/optional-skills/dokkit/references/hwpx-field-patterns.md +461 -0
- package/optional-skills/dokkit/references/hwpx-structure.md +159 -0
- package/optional-skills/dokkit/references/image-opportunity-heuristics.md +121 -0
- package/optional-skills/dokkit/references/image-xml-patterns.md +338 -0
- package/optional-skills/dokkit/references/section-image-interleaving.md +346 -0
- package/optional-skills/dokkit/references/section-range-detection.md +118 -0
- package/optional-skills/dokkit/references/state-schema.md +143 -0
- package/optional-skills/dokkit/references/supported-formats.md +67 -0
- package/optional-skills/dokkit/scripts/compile_hwpx.py +134 -0
- package/optional-skills/dokkit/scripts/detect_fields.py +301 -0
- package/optional-skills/dokkit/scripts/detect_fields_hwpx.py +286 -0
- package/optional-skills/dokkit/scripts/export_pdf.py +99 -0
- package/optional-skills/dokkit/scripts/parse_hwpx.py +185 -0
- package/optional-skills/dokkit/scripts/parse_image_with_gemini.py +159 -0
- package/optional-skills/dokkit/scripts/parse_xlsx.py +98 -0
- package/optional-skills/dokkit/scripts/source_images.py +365 -0
- package/optional-skills/dokkit/scripts/validate_docx.py +142 -0
- package/optional-skills/dokkit/scripts/validate_hwpx.py +281 -0
- package/optional-skills/dokkit/scripts/validate_state.py +132 -0
- package/package.json +1 -1
@@ -0,0 +1,130 @@
# Field Detection Patterns

## DOCX Detection Heuristics

### Heuristic 1: Curly Brace Placeholders
```regex
\{\{[^}]+\}\}
```
Match text like `{{field_name}}`. High reliability.

### Heuristic 2: Angle Bracket Placeholders
```regex
<<[^>]+>>
```
Match text like `<<field_name>>`. High reliability.

### Heuristic 3: Square Bracket Placeholders
```regex
\[[^\]]+\]
```
Match text like `[field_name]`. Medium reliability (may match references).
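
As a rough illustration, the three placeholder heuristics above could be combined into a single scanner. This is only a sketch and not the package's `detect_fields.py`; the name `find_placeholders` and the shape of the returned tuples are assumptions made for the example.

```python
import re

# Patterns copied from Heuristics 1-3; the labels mirror the reliability notes above.
PLACEHOLDER_PATTERNS = [
    ("curly", re.compile(r"\{\{[^}]+\}\}"), "high"),
    ("angle", re.compile(r"<<[^>]+>>"), "high"),
    ("square", re.compile(r"\[[^\]]+\]"), "medium"),
]


def find_placeholders(text):
    """Return (kind, matched_text, reliability) tuples, grouped in pattern order."""
    hits = []
    for kind, pattern, reliability in PLACEHOLDER_PATTERNS:
        for match in pattern.finditer(text):
            hits.append((kind, match.group(0), reliability))
    return hits
```

A citation marker such as `[1]` is reported as a `square` hit, which is the kind of match the "medium reliability" note warns about.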

### Heuristic 4: Underline-Only Runs
A run where:
- `<w:rPr>` contains `<w:u w:val="single"/>`
- `<w:t>` contains only spaces, underscores, or is empty
- Run length > 3 characters
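
The underline check can be sketched with `xml.etree.ElementTree` as follows. The helper name `is_underline_blank_run` is hypothetical, and the sketch treats the length condition as mandatory, so a run with an empty `<w:t>` does not qualify; that reading of the list above is an assumption.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def is_underline_blank_run(run):
    """True if a <w:r> is single-underlined and holds only spaces/underscores."""
    underline = run.find(f"{{{W}}}rPr/{{{W}}}u")
    if underline is None or underline.get(f"{{{W}}}val") != "single":
        return False
    text = "".join(t.text or "" for t in run.findall(f"{{{W}}}t"))
    # Blank-looking content that is longer than 3 characters
    return len(text) > 3 and set(text) <= {" ", "_"}
```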

### Heuristic 5: Empty Table Cells
A `<w:tc>` that:
- Contains only `<w:p/>` or `<w:p><w:pPr/></w:p>` (empty paragraph)
- Is adjacent to a cell containing text (the label)
- The label cell's text is short (< 50 chars) and not numeric

### Heuristic 6: Instruction Text
A run where text matches patterns like:
```regex
\(.*?(enter|type|input|write|fill|입력).*?\)
```

### Heuristic 7: Content Controls
Any `<w:sdt>` element with `<w:showingPlcHdr/>` in its properties.

### Heuristic 8: Image Fields
A field is classified as `image` when any of these conditions hold:
- A `{{placeholder}}` or `<<placeholder>>` contains an image keyword
- A table cell contains an existing `<w:drawing>` element (pre-positioned image slot)
- An empty table cell is adjacent to a cell whose label matches an image keyword

**Image keywords** (case-insensitive):
- Korean: 사진, 증명사진, 여권사진, 로고, 서명, 날인, 도장, 직인
- English: Photo, Picture, Logo, Signature, Stamp, Seal, Image, Portrait

**Image type classification**:
| Keyword match | `image_type` |
|---------------|-------------|
| 사진, 증명사진, 여권사진, photo, picture, portrait, image | `photo` |
| 로고, logo | `logo` |
| 서명, 날인, 도장, 직인, signature, stamp, seal | `signature` |
| (no keyword match) | `figure` |

Image fields are **excluded** from the `placeholder_text` and `empty_cell` detectors to prevent double-detection.
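
The classification table can be read as a simple keyword lookup. The sketch below assumes case-insensitive substring matching, which the source does not spell out, and the names `IMAGE_TYPE_KEYWORDS` and `classify_image_type` are made up for illustration. It is meant to be called only on fields already identified as image fields.

```python
# Keyword groups copied from the classification table above.
IMAGE_TYPE_KEYWORDS = {
    "photo": ["사진", "증명사진", "여권사진", "photo", "picture", "portrait", "image"],
    "logo": ["로고", "logo"],
    "signature": ["서명", "날인", "도장", "직인", "signature", "stamp", "seal"],
}


def classify_image_type(label):
    """Map a label or placeholder name to an image_type; 'figure' if nothing matches."""
    lowered = label.lower()
    for image_type, keywords in IMAGE_TYPE_KEYWORDS.items():
        # Assumption: a keyword anywhere inside the label counts as a match.
        if any(keyword in lowered for keyword in keywords):
            return image_type
    return "figure"
```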

### Heuristic 9: Tip Box
A `<w:tbl>` that:
- Has exactly one row and one cell (1×1 table)
- `<w:tblBorders>` uses `w:val="dashed"` borders
- Cell text starts with `※` or contains `작성 팁` ("writing tip") / `작성요령` ("writing guidelines")
- Often has red text color (`<w:color w:val="FF0000"/>`)

→ `field_type: "tip_box"`, `action: "delete"`

## HWPX Detection Heuristics

### Heuristic 1: Empty Adjacent Cells
Same as DOCX Heuristic 5, but using `<hp:tc>` and `<hp:t>` elements.

### Heuristic 2: Korean Instruction Text
```regex
\(.*?(입력|기재|작성).*?\)
```
Matches parenthesized text containing 입력 (enter), 기재 (fill in), or 작성 (write).

### Heuristic 3: Date Component Cells
Cells immediately before 년/월/일 (year/month/day) markers.

### Heuristic 4: Image Fields
Same logic as DOCX Heuristic 8, adapted for HWPX elements:
- `<hp:pic>` instead of `<w:drawing>`
- `<hp:tc>` / `<hp:t>` instead of `<w:tc>` / `<w:t>`
- Same image keyword list and type classification

### Heuristic 5: Tip Box
An `<hp:tbl>` that:
- Has `rowCnt="1"` and `colCnt="1"` (single-cell table)
- `borderFillIDRef` resolves to DASH border style in `header.xml`
- Cell text starts with `※` or contains `작성 팁` / `작성요령` / `작성 요령`
- May appear standalone or nested inside a `<hp:subList>` within another cell

→ `field_type: "tip_box"`, `action: "delete"`, `container: "standalone"|"nested"`

### Heuristic 6: Section Header Rows
Table rows where:
- First cell spans multiple columns (`hp:cellSpan colSpan > 1`)
- Text is short and descriptive (section name)
- Background may be shaded

## HWPX Pre-Fill Sanitization

### Negative Character Spacing
HWPX templates may define `<hh:charPr>` elements in `header.xml` with negative `<hh:spacing>` values (e.g., `hangul="-3"`). These compress characters closer together, which works for short placeholder text but causes **severe text overlap** when the filler replaces placeholders with longer content.

**Rule**: Before filling, scan ALL `<hh:charPr>` definitions in `header.xml` and set any negative spacing attribute values to `"0"`. This applies to all attributes: `hangul`, `latin`, `hanja`, `japanese`, `other`, `symbol`, `user`.

**Example fix**:
```xml
<!-- Before (causes overlap) -->
<hh:spacing hangul="-3" latin="-3" hanja="-3" japanese="-3" other="-3" symbol="-3" user="-3"/>

<!-- After (normal spacing) -->
<hh:spacing hangul="0" latin="0" hanja="0" japanese="0" other="0" symbol="0" user="0"/>
```
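
One way the rule could be applied with `xml.etree.ElementTree` is sketched below. It assumes `<hh:spacing>` sits somewhere under each `<hh:charPr>` and uses the `http://www.hancom.co.kr/hwpml/2011/head` namespace; the function name `zero_negative_spacing` is hypothetical.

```python
import xml.etree.ElementTree as ET

HH = "http://www.hancom.co.kr/hwpml/2011/head"


def zero_negative_spacing(root):
    """Set negative <hh:spacing> attribute values under any <hh:charPr> to "0".

    Returns the number of attribute values that were changed.
    """
    changed = 0
    for char_pr in root.iter(f"{{{HH}}}charPr"):
        for spacing in char_pr.iter(f"{{{HH}}}spacing"):
            for name, value in list(spacing.attrib.items()):
                try:
                    is_negative = int(value) < 0
                except ValueError:
                    continue  # leave non-numeric values untouched
                if is_negative:
                    spacing.set(name, "0")
                    changed += 1
    return changed
```

Because every attribute of `<hh:spacing>` is inspected, the sketch does not need a hard-coded list of `hangul`, `latin`, and the other script names.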

## False Positive Filtering

Exclude detected "fields" that are:
- Part of a header/title row (not fillable)
- Copyright notices or footer text
- Page numbers or running headers
- Table of contents entries
- Cross-reference markers

@@ -0,0 +1,461 @@
# HWPX Field Detection Patterns

## Pattern 1: Empty Table Cell

Korean forms are heavily table-based. The most common pattern:

```xml
<hp:tr>
  <hp:tc>
    <!-- Label cell ("성명" = Name) -->
    <hp:p>
      <hp:run>
        <hp:rPr charPrIDRef="1"/>
        <hp:t>성명</hp:t>
      </hp:run>
    </hp:p>
  </hp:tc>
  <hp:tc>
    <!-- Empty value cell → FILL THIS -->
    <hp:p>
      <hp:lineseg/>
    </hp:p>
  </hp:tc>
</hp:tr>
```

**Action**: Insert a new `<hp:run>` with `<hp:t>value</hp:t>` into the empty paragraph. Copy `charPrIDRef` from the label cell's run.
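
A minimal sketch of that action, assuming the run layout shown in the example above (an `<hp:rPr>` child carrying `charPrIDRef`). The helper name `fill_empty_paragraph` is hypothetical and not taken from the package's scripts.

```python
import xml.etree.ElementTree as ET

HP = "http://www.hancom.co.kr/hwpml/2011/paragraph"


def fill_empty_paragraph(paragraph, value, char_pr_id):
    """Append a run with the given text and charPrIDRef to an empty <hp:p>."""
    run = ET.SubElement(paragraph, f"{{{HP}}}run")
    ET.SubElement(run, f"{{{HP}}}rPr", {"charPrIDRef": str(char_pr_id)})
    text = ET.SubElement(run, f"{{{HP}}}t")
    text.text = value
    return run
```

The caller is expected to read `char_pr_id` from the label cell's run before calling the helper.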

## Pattern 2: Placeholder Text in Cell

```xml
<hp:tc>
  <hp:p>
    <hp:run>
      <hp:t>(이름을 입력하세요)</hp:t>  <!-- Instruction text: "(Please enter your name)" -->
    </hp:run>
  </hp:p>
</hp:tc>
```

**Action**: Replace the text in `<hp:t>` with the actual value.

## Pattern 3: Multi-Row Spanning Label

Korean forms often have a label cell spanning multiple rows:

```xml
<hp:tr>
  <hp:tc>
    <hp:cellSpan rowSpan="3"/>
    <hp:p><hp:run><hp:t>학력</hp:t></hp:run></hp:p>
  </hp:tc>
  <hp:tc><hp:p><hp:run><hp:t>학교명</hp:t></hp:run></hp:p></hp:tc>
  <hp:tc><hp:p/></hp:tc>  <!-- Empty → fill with school name -->
</hp:tr>
```

**Action**: The spanning label ("학력" = Education) is the section. Sub-labels ("학교명" = School Name) identify individual fields.

## Pattern 4: Date Fields

```xml
<hp:tc>
  <hp:p>
    <hp:run><hp:t>년</hp:t></hp:run>  <!-- Year -->
  </hp:p>
</hp:tc>
<hp:tc>
  <hp:p>
    <hp:run><hp:t>월</hp:t></hp:run>  <!-- Month -->
  </hp:p>
</hp:tc>
<hp:tc>
  <hp:p>
    <hp:run><hp:t>일</hp:t></hp:run>  <!-- Day -->
  </hp:p>
</hp:tc>
```

**Action**: Fill the cells preceding 년/월/일 with the appropriate date components.

## Pattern 5: Writing Tip Box (작성 팁)

Standalone 1×1 tables with DASH-bordered cells that contain `※` guidance text. These are NOT fillable fields; they must be **deleted** before or during filling.

```xml
<hp:tbl rowCnt="1" colCnt="1">
  <hp:tr>
    <hp:tc borderFillIDRef="16">
      <hp:p>
        <hp:run>
          <hp:rPr charPrIDRef="45"/>  <!-- Often RED style -->
          <!-- "※ Writing tip: Describe the project's purpose and necessity in concrete terms." -->
          <hp:t>※ 작성 팁: 사업의 목적과 필요성을 구체적으로 작성하세요.</hp:t>
        </hp:run>
      </hp:p>
      <hp:p>
        <hp:run>
          <hp:rPr charPrIDRef="45"/>
          <!-- "※ It helps to cite relevant laws or policy grounds." -->
          <hp:t>※ 관련 법령이나 정책 근거를 제시하면 좋습니다.</hp:t>
        </hp:run>
      </hp:p>
    </hp:tc>
  </hp:tr>
</hp:tbl>
```

**Identifying traits**:
- `rowCnt="1"` and `colCnt="1"` (single-cell table)
- `borderFillIDRef` resolves to DASH border style in `header.xml`
- Text starts with `※` or contains `작성 팁`, `작성요령`, `작성 요령`
- Often appears inside a `<hp:subList>` within another table cell

**Two container types**:
- **Standalone**: Top-level 1×1 table between other content → delete the entire `<hp:tbl>`
- **Nested**: Inside a `<hp:subList>` within a fill-target cell → delete the `<hp:subList>` element

**Action**: Flag as `field_type: "tip_box"`, `action: "delete"`. The filler agent removes these before filling.
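
The size and text traits can be checked directly on the table element, as in the sketch below. It deliberately skips the DASH border check, since resolving `borderFillIDRef` needs `header.xml`, so it is weaker than the full heuristic. The name `looks_like_tip_box` is an assumption for the example.

```python
import xml.etree.ElementTree as ET

HP = "http://www.hancom.co.kr/hwpml/2011/paragraph"
TIP_MARKERS = ("작성 팁", "작성요령", "작성 요령")


def looks_like_tip_box(tbl):
    """Check the 1x1 size and text traits of a tip box (border style not checked)."""
    if tbl.get("rowCnt") != "1" or tbl.get("colCnt") != "1":
        return False
    text = "".join(t.text or "" for t in tbl.iter(f"{{{HP}}}t")).strip()
    return text.startswith("※") or any(marker in text for marker in TIP_MARKERS)
```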

## Pattern 6: Character Property Resolution (charPrIDRef)

HWPX text formatting is controlled by `charPrIDRef` attributes that reference `<hh:charPr>` entries in `header.xml`.

### How charPrIDRef works
```xml
<!-- In section*.xml: a run references charPr ID 45 -->
<hp:run>
  <hp:rPr charPrIDRef="45"/>
  <hp:t>Some text</hp:t>
</hp:run>

<!-- In header.xml: charPr ID 45 defines the style -->
<hh:charPr id="45" height="1000" textColor="#FF0000"
           bold="false" italic="true" spacing="-5"/>
```

### Template guide text uses RED styles
Many templates use red (#FF0000) charPrIDRef values for guide text, tip boxes, and instructions. Common red IDs seen in Korean government templates: 39, 45, 51, 52, 57, 62, 81.

**Critical rule**: When filling a field, NEVER copy `charPrIDRef` from guide/tip text. Instead, find or create a black (#000000) charPr.

### Finding a suitable black charPr
```python
import xml.etree.ElementTree as ET

def find_black_charpr(header_path):
    """Find a charPrIDRef suitable for filled text (black, normal style)."""
    hns = {"hh": "http://www.hancom.co.kr/hwpml/2011/head"}
    tree = ET.parse(header_path)
    root = tree.getroot()

    candidates = []
    for cp in root.iter("{%s}charPr" % hns["hh"]):
        # Style flags are read as attributes, as in the simplified example above
        color = cp.get("textColor", "#000000")
        bold = cp.get("bold", "false")
        italic = cp.get("italic", "false")
        spacing = int(cp.get("spacing", "0"))

        # Want: black text, not italic, non-negative spacing.
        # The color is upper-cased first, so the named color must be "BLACK".
        if color.upper() in ("#000000", "#000000FF", "BLACK") and \
           italic == "false" and spacing >= 0:
            candidates.append({
                "id": cp.get("id"),
                "bold": bold == "true",
                "height": int(cp.get("height", "1000")),
                "spacing": spacing,
            })

    # Keep zero-spacing candidates, split into non-bold and bold
    # (height is recorded but not used for ranking)
    normal = [c for c in candidates if not c["bold"] and c["spacing"] == 0]
    bold_list = [c for c in candidates if c["bold"] and c["spacing"] == 0]

    return {
        "normal": normal[0]["id"] if normal else None,
        "bold": bold_list[0]["id"] if bold_list else None,
    }
```

### Creating a new charPr if needed
If no suitable black charPr exists in `header.xml`, create one by appending a new `<hh:charPr>` element with the next available ID, `textColor="#000000"`, `bold="false"`, `italic="false"`, `spacing="0"`.
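
A sketch of that fallback is shown below. It follows the simplified attribute model used in this section (style flags as attributes on `<hh:charPr>`), takes "next available ID" to mean the highest numeric ID plus one, and assumes `container` is whichever element holds the existing `<hh:charPr>` entries. The function name `append_black_charpr` is hypothetical, and if the parent element tracks a count of its entries, that count would need updating separately.

```python
import xml.etree.ElementTree as ET

HH = "http://www.hancom.co.kr/hwpml/2011/head"


def append_black_charpr(container, height="1000"):
    """Append a plain black <hh:charPr> to `container` and return its new id."""
    tag = f"{{{HH}}}charPr"
    used = [
        int(cp.get("id"))
        for cp in container.iter(tag)
        if (cp.get("id") or "").isdigit()
    ]
    # Assumption: the next available ID is one past the highest existing ID.
    new_id = str(max(used) + 1 if used else 0)
    ET.SubElement(container, tag, {
        "id": new_id,
        "height": height,
        "textColor": "#000000",
        "bold": "false",
        "italic": "false",
        "spacing": "0",
    })
    return new_id
```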

## Pattern 7: Image Field in Table Cell

A label cell containing image-related keywords (사진 photo, 증명사진 ID photo, 로고 logo, 서명 signature, 직인 official seal, 사업자등록증 business registration certificate) next to an empty cell indicates an image insertion point.

```xml
<hp:tr>
  <hp:tc>
    <!-- Label cell with image keyword -->
    <hp:p>
      <hp:run>
        <hp:rPr charPrIDRef="1"/>
        <hp:t>사진</hp:t>
      </hp:run>
    </hp:p>
  </hp:tc>
  <hp:tc>
    <!-- Empty cell → INSERT IMAGE HERE -->
    <hp:p>
      <hp:lineseg/>
    </hp:p>
  </hp:tc>
</hp:tr>
```

**Action**: Insert a `<hp:pic>` element INSIDE a `<hp:run>` within the cell's `<hp:p>`. The `<hp:t/>` goes AFTER the pic inside the run.

### Image Paragraph Structure (CRITICAL)

```xml
<!-- pic must be INSIDE run, t/ AFTER pic (matches real Hancom Office output) -->
<hp:p id="..." paraPrIDRef="..." styleIDRef="0" pageBreak="0" columnBreak="0" merged="0">
  <hp:linesegarray>
    <hp:lineseg textpos="0" vertpos="0" vertsize="{H}" textheight="{H}"
                baseline="{H*0.85}" spacing="500" .../>
  </hp:linesegarray>
  <hp:run charPrIDRef="0">
    <hp:pic id="{seq_id}" zOrder="{z}" ...>...</hp:pic>
    <hp:t/>
  </hp:run>
</hp:p>
```

### Complete `<hp:pic>` Structure (Hancom Canonical Order)

```xml
<hp:pic id="{seq_id}" zOrder="{z}" numberingType="PICTURE" textWrap="TOP_AND_BOTTOM"
        textFlow="BOTH_SIDES" lock="0" dropcapstyle="None"
        href="" groupLevel="0" instid="{seq_id}" reverse="0">
  <!-- Group 1: Geometry -->
  <hp:offset x="0" y="0"/>
  <hp:orgSz width="{W}" height="{H}"/>
  <hp:curSz width="{W}" height="{H}"/>
  <hp:flip horizontal="0" vertical="0"/>
  <hp:rotationInfo angle="0" centerX="{W/2}" centerY="{H/2}" rotateimage="1"/>
  <hp:renderingInfo>
    <hc:transMatrix e1="1" e2="0" e3="0" e4="0" e5="1" e6="0"/>
    <hc:scaMatrix e1="1" e2="0" e3="0" e4="0" e5="1" e6="0"/>
    <hc:rotMatrix e1="1" e2="-0" e3="0" e4="0" e5="1" e6="0"/>
  </hp:renderingInfo>
  <!-- Group 2: Image data -->
  <hp:imgRect>
    <hc:pt0 x="0" y="0"/>
    <hc:pt1 x="{W}" y="0"/>
    <hc:pt2 x="{W}" y="{H}"/>
    <hc:pt3 x="0" y="{H}"/>
  </hp:imgRect>
  <hp:imgClip left="0" right="{pixW}" top="0" bottom="{pixH}"/>
  <hp:inMargin left="0" right="0" top="0" bottom="0"/>
  <hp:imgDim dimwidth="{pixW}" dimheight="{pixH}"/>
  <hc:img binaryItemIDRef="{manifest_id}" bright="0" contrast="0" effect="REAL_PIC" alpha="0"/>
  <!-- Group 3: Layout (AFTER hc:img) -->
  <hp:sz width="{W}" widthRelTo="ABSOLUTE" height="{H}" heightRelTo="ABSOLUTE" protect="0"/>
  <hp:pos treatAsChar="1" affectLSpacing="0" flowWithText="0" allowOverlap="0"
          holdAnchorAndSO="0" vertRelTo="PARA" horzRelTo="COLUMN"
          vertAlign="TOP" horzAlign="LEFT" vertOffset="0" horzOffset="0"/>
  <hp:outMargin left="0" right="0" top="0" bottom="0"/>
</hp:pic>
```

Where: `{W}/{H}` = HWPML units (1/7200 inch), `{pixW}/{pixH}` = pixel dimensions from PIL, `{manifest_id}` = `id` from `content.hpf`.
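
Since `{W}/{H}` are in 1/7200-inch units while `{pixW}/{pixH}` are pixels, some conversion is needed. The sketch below assumes a rendering resolution of 96 DPI, which the source does not specify, and both function names are hypothetical.

```python
HWPML_UNITS_PER_INCH = 7200


def pixels_to_hwpml(pix_w, pix_h, dpi=96):
    """Convert pixel dimensions to HWPML units (1/7200 inch) at the given DPI."""
    width = round(pix_w * HWPML_UNITS_PER_INCH / dpi)
    height = round(pix_h * HWPML_UNITS_PER_INCH / dpi)
    return width, height


def fit_to_width(width, height, max_width):
    """Scale (width, height) down proportionally so width does not exceed max_width."""
    if width <= max_width:
        return width, height
    return max_width, round(height * max_width / width)
```

For example, a 960×480 pixel image at the assumed 96 DPI is 10 inches wide, so `pixels_to_hwpml(960, 480)` gives 72000×36000 units, which `fit_to_width` can then shrink to fit a narrower cell.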

### 9 Critical Rules for `<hp:pic>`

1. **`<img>` uses `hc:` namespace**: `<hc:img>`, NOT `<hp:img>`
2. **`<imgRect>` has 4 `<hc:pt>` children**: `<hc:pt0>` through `<hc:pt3>`, NOT inline attributes
3. **All required children present**: `offset`, `orgSz`, `curSz`, `flip`, `rotationInfo`, `renderingInfo`, `inMargin`
4. **No spurious elements**: Do NOT add `hp:lineShape`, `hp:caption`, `hp:shapeComment`
5. **`imgClip` right/bottom = pixel dims**: from `imgDim`, NOT zeros
6. **Hancom canonical element order**: offset, orgSz, ..., hc:img, **then** sz, pos, outMargin
7. **Register in `content.hpf` manifest only**: Do NOT add `<hh:binDataItems>` to `header.xml`
8. **`hp:pos` attributes**: `flowWithText="0"` `horzRelTo="COLUMN"`
9. **pic INSIDE run, t AFTER pic**: `<hp:run><hp:pic>...</hp:pic><hp:t/></hp:run>`

## Pattern 8: SubList Cell Wrapping (CRITICAL)

In Korean government HWPX templates, ~65% of table cells wrap their content in `<hp:subList>/<hp:p>` rather than having `<hp:p>` as a direct child of `<hp:tc>`. Hancom Office reads content from inside `<hp:subList>` and ignores orphaned direct `<hp:p>` elements.

### Two cell structures

**Direct pattern** (~35% of cells):
```xml
<hp:tc>
  <hp:cellAddr .../>
  <hp:cellSpan .../>
  <hp:cellSz .../>
  <hp:p>
    <hp:run><hp:t>Content here</hp:t></hp:run>
  </hp:p>
</hp:tc>
```

**SubList pattern** (~65% of cells):
```xml
<hp:tc>
  <hp:cellAddr .../>
  <hp:cellSpan .../>
  <hp:cellSz .../>
  <hp:subList>
    <hp:p>
      <hp:run><hp:t>Content here</hp:t></hp:run>
    </hp:p>
  </hp:subList>
</hp:tc>
```

### Critical rule for filling

When writing content into a cell, ALWAYS check for `<hp:subList>` first:
1. If `<hp:subList>` exists: write into `<hp:subList>/<hp:p>`, NOT as a direct `<hp:p>` child of `<hp:tc>`
2. If no `<hp:subList>`: write as direct `<hp:p>` child of `<hp:tc>` (standard pattern)

**Wrong**: creates orphaned paragraphs that Hancom ignores:
```python
# BAD: always writes to cell directly
p = ET.SubElement(cell, hp_tag("p"))
```

**Correct**: respects the subList wrapper:
```python
# GOOD: check for subList first
# hp_tag(name) is assumed to return the namespaced tag,
# e.g. "{http://www.hancom.co.kr/hwpml/2011/paragraph}p"
container = cell
for c in cell:
    if c.tag == hp_tag("subList"):
        container = c
        break
p = ET.SubElement(container, hp_tag("p"))
```

This applies to ALL cell operations: `clear_cell_content()`, `fill_cell_text()`, and `insert_cell_image_resolved()`.

## Pattern 9: cellAddr Row Addressing (CRITICAL)

Every `<hp:tc>` inside a `<hp:tr>` contains a `<hp:cellAddr>` element with `colAddr` and `rowAddr` attributes. The `rowAddr` MUST equal the **0-based index** of the `<hp:tr>` within its parent `<hp:tbl>`.

### Structure
```xml
<hp:tbl rowCnt="2" colCnt="2">
  <hp:tr>  <!-- row index 0 -->
    <hp:tc>
      <hp:cellAddr colAddr="0" rowAddr="0"/>  <!-- rowAddr = 0 ✓ -->
      ...
    </hp:tc>
    <hp:tc>
      <hp:cellAddr colAddr="1" rowAddr="0"/>  <!-- rowAddr = 0 ✓ -->
      ...
    </hp:tc>
  </hp:tr>
  <hp:tr>  <!-- row index 1 -->
    <hp:tc>
      <hp:cellAddr colAddr="0" rowAddr="1"/>  <!-- rowAddr = 1 ✓ -->
      ...
    </hp:tc>
    <hp:tc>
      <hp:cellAddr colAddr="1" rowAddr="1"/>  <!-- rowAddr = 1 ✓ -->
      ...
    </hp:tc>
  </hp:tr>
</hp:tbl>
```

### Consequence of violation
If two `<hp:tr>` elements share the same `rowAddr`, Polaris Office **silently hides** the duplicate rows. The table renders with missing data but no error is reported. This is the most common corruption when cloning rows.

### Fix code
```python
HP = "http://www.hancom.co.kr/hwpml/2011/paragraph"

def fix_celladdr_rowaddr(tbl):
    """Fix rowAddr values and rowCnt for an HWPX table after row insertion."""
    rows = tbl.findall(f"{{{HP}}}tr")
    for row_idx, tr in enumerate(rows):
        for tc in tr.findall(f"{{{HP}}}tc"):
            cell_addr = tc.find(f"{{{HP}}}cellAddr")
            if cell_addr is not None:
                cell_addr.set("rowAddr", str(row_idx))
    tbl.set("rowCnt", str(len(rows)))
```

### When to apply
- After cloning a `<hp:tr>` and inserting it into a table
- After inserting new rows built from `table_content` pipe-delimited data
- After deleting rows from a table
- Any time the number or order of `<hp:tr>` children changes
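
Before or after such a fix, it can be useful to check whether a table violates the addressing rule at all. The read-only validator below is a sketch under the same assumptions as the fix code (direct `<hp:tr>` children, one `<hp:cellAddr>` per cell); the name `find_rowaddr_mismatches` is hypothetical.

```python
import xml.etree.ElementTree as ET

HP = "http://www.hancom.co.kr/hwpml/2011/paragraph"


def find_rowaddr_mismatches(tbl):
    """Return (row_index, colAddr, rowAddr) for cells whose rowAddr != row index."""
    mismatches = []
    for row_idx, tr in enumerate(tbl.findall(f"{{{HP}}}tr")):
        for tc in tr.findall(f"{{{HP}}}tc"):
            cell_addr = tc.find(f"{{{HP}}}cellAddr")
            if cell_addr is not None and cell_addr.get("rowAddr") != str(row_idx):
                mismatches.append(
                    (row_idx, cell_addr.get("colAddr"), cell_addr.get("rowAddr"))
                )
    return mismatches
```

An empty list means every cell's `rowAddr` matches the position of its row, so no renumbering is needed.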

## Pattern 10: Image Paragraph Center Alignment

Image paragraphs in HWPX should be center-aligned using a `paraPrIDRef` that references a center-aligned `<hh:paraPr>` from `header.xml`.

### Finding center-aligned paraPrIDRef

```python
import xml.etree.ElementTree as ET

def find_center_parapr(header_path):
    """Find first center-aligned paraPr from header.xml for image paragraphs."""
    HH = "http://www.hancom.co.kr/hwpml/2011/head"
    tree = ET.parse(header_path)
    for pp in tree.getroot().iter(f"{{{HH}}}paraPr"):
        align = pp.find(f"{{{HH}}}align")
        if align is not None and align.get("horizontal") == "CENTER":
            return pp.get("id")
    return "0"  # fallback to default
```

### Usage in image paragraphs

```xml
<!-- Image paragraph uses center-aligned paraPrIDRef -->
<hp:p id="..." paraPrIDRef="{CENTER_PARAPR_ID}" styleIDRef="0" pageBreak="0" columnBreak="0" merged="0">
  <hp:linesegarray>
    <hp:lineseg textpos="0" vertpos="0" vertsize="{H}" textheight="{H}" .../>
  </hp:linesegarray>
  <hp:run charPrIDRef="0">
    <hp:pic id="{seq_id}" ...>...</hp:pic>
    <hp:t/>
  </hp:run>
</hp:p>
```

### Why this matters

Without center alignment, images default to left-aligned positioning. Korean government document templates expect centered images, particularly for section content images (~77% page width). The `paraPrIDRef` must reference a `<hh:paraPr>` that has `<hh:align horizontal="CENTER"/>`.

### When to apply
- ALL image paragraphs in section content (from `image_opportunities`)
- Cell-level images that should be centered within the cell
- Both standalone and inline image paragraphs

## Safe HWPX Modification

```python
import xml.etree.ElementTree as ET

ns = {
    "hp": "http://www.hancom.co.kr/hwpml/2011/paragraph",
    "hs": "http://www.hancom.co.kr/hwpml/2011/section",
}

# Register namespaces to avoid prefix changes
for prefix, uri in ns.items():
    ET.register_namespace(prefix, uri)

tree = ET.parse("Contents/section0.xml")
root = tree.getroot()

# Find empty cells adjacent to label cells in tables
for tbl in root.iter("{%s}tbl" % ns["hp"]):
    for tr in tbl.iter("{%s}tr" % ns["hp"]):
        cells = list(tr.iter("{%s}tc" % ns["hp"]))
        for i, cell in enumerate(cells):
            # Check if this cell has text (label)
            texts = [t.text for t in cell.iter("{%s}t" % ns["hp"]) if t.text]
            if texts and i + 1 < len(cells):
                next_cell = cells[i + 1]
                next_texts = [t.text for t in next_cell.iter("{%s}t" % ns["hp"]) if t.text]
                if not next_texts:
                    label = "".join(texts)
                    # This is a fillable field with label
                    print(f"Found field: {label}")

tree.write("Contents/section0.xml", xml_declaration=True, encoding="UTF-8")
```