devlyn-cli 0.5.4 → 0.5.7

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -1,153 +1,211 @@
  ---
  name: dokkit
  description: >
-   Document template filling system for DOCX and HWPX formats.
-   Ingests source documents, analyzes templates, detects fillable fields,
-   fills them surgically using source data, reviews with confidence scoring,
-   and exports completed documents. Supports Korean and English templates.
-   Subcommands: init, sources, preview, ingest, fill, fill-doc, modify, review, export.
-   Use when user says "fill template", "fill document", "ingest", "dokkit".
+   One-command document template filling. Put source files (company profiles,
+   business materials, images, etc.) in a folder, provide a DOCX/HWPX template, and get a
+   polished, complete document with AI-generated images. Auto-iterates until perfect.
+   Supports Korean government forms: 사업계획서 (business plans), 지원서 (applications),
+   신청서 (request forms).
+   Trigger on: "fill template", "사업계획서 작성", "문서 작성", "dokkit",
+   "fill this form", "템플릿 채워줘", "complete this document", "fill document",
+   "template automation", HWPX files, 한글 templates, document generation,
+   or any task involving filling document templates with source materials.
+   Also trigger when the user drops files and asks to fill or complete a template.
  user-invocable: true
  allowed-tools: Read, Write, Edit, Bash, Glob, Grep, Agent
- argument-hint: "<subcommand> [arguments]"
+ argument-hint: "<template_path> <sources_folder> | improve [instruction]"
  context:
  - type: file
-   path: ${CLAUDE_SKILL_DIR}/COMMANDS.md
+   path: ${CLAUDE_SKILL_DIR}/PIPELINE.md
  ---

- # Dokkit — Document Template Filling System
+ # Dokkit — One-Command Document Filling

- Surgical document filling for DOCX and HWPX templates using ingested source data. One command with 9 subcommands covering the full document filling lifecycle.
+ Source folder + template → finished document. Fully automatic; iterates until perfect.

- ## Subcommands
+ ## Usage

- | Subcommand | Arguments | Type | Description |
- |------------|-----------|------|-------------|
- | `init` | `[--force] [--keep-sources]` | Inline | Initialize or reset workspace |
- | `sources` | — | Inline | Display ingested sources dashboard |
- | `preview` | — | Inline | Generate PDF preview via LibreOffice |
- | `ingest` | `<file1> [file2] ...` | Agent | Parse source documents into workspace |
- | `fill` | `<template.docx\|hwpx>` | Agent | End-to-end: analyze, fill, review, auto-fix, export |
- | `fill-doc` | `<template.docx\|hwpx>` | Agent | Analyze template and fill fields only |
- | `modify` | `"<instruction>"` | Agent | Apply targeted changes to filled document |
- | `review` | `[section\|approve]` | Agent | Review with per-field confidence annotations |
- | `export` | `<docx\|hwpx\|pdf>` | Agent | Export filled document to format |
-
- ## Routing
+ ```
+ /dokkit <template_path> <sources_folder>
+ /dokkit improve ["instruction"]
+ ```

- Parse `$ARGUMENTS` to determine the subcommand:
+ - `template_path`: DOCX or HWPX template file **(required)**
+ - `sources_folder`: Folder with source materials **(required)**

- 1. Extract `$1` as the subcommand name
- 2. Pass remaining arguments (`$2`, `$3`, ...) to the subcommand
- 3. If `$1` is empty or unrecognized, display the subcommand table above with usage examples
+ For a fill run, both arguments are mandatory. If either is missing, show this error (it reads "Please specify both the template and the sources folder.") and stop:
+ ```
+ Error: template과 sources 폴더를 모두 지정해주세요.

- Full workflows for each subcommand are in COMMANDS.md (auto-loaded via context).
+ Usage: /dokkit <template.docx|hwpx> <sources_folder>
+ Example: /dokkit docs/사업계획서_양식.docx docs/sources/김철수/
+ ```
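The argument contract above can be sketched in Python as follows. This is an illustration only, not dokkit's implementation (the skill's routing is performed by the model reading these instructions); `parse_dokkit_args` and its return tuples are hypothetical, and the extension check is an assumption drawn from the `<template.docx|hwpx>` usage line.

```python
from pathlib import Path

USAGE = "Usage: /dokkit <template.docx|hwpx> <sources_folder>"

def parse_dokkit_args(args):
    """Hypothetical sketch: classify /dokkit arguments or raise ValueError."""
    # improve mode: the instruction is optional and may span several words
    if args and args[0] == "improve":
        instruction = " ".join(args[1:]) or None
        return ("improve", instruction)

    # fill mode: both the template and the sources folder are required
    if len(args) < 2:
        raise ValueError("missing template or sources folder\n" + USAGE)
    template, sources = args[0], args[1]
    if Path(template).suffix.lower() not in (".docx", ".hwpx"):
        raise ValueError(f"unsupported template type: {template}\n" + USAGE)
    return ("fill", template, sources)
```

Path existence checks are deliberately left out so the sketch has no filesystem side effects; a real caller would also confirm that the template file and the sources folder exist.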

  <example>
- - `/dokkit ingest docs/resume.pdf docs/transcript.xlsx`: ingest two sources
- - `/dokkit fill docs/template.hwpx`: end-to-end fill pipeline
- - `/dokkit modify "Change the phone number to 010-1234-5678"`: targeted change
- - `/dokkit export pdf`: export as PDF
+ /dokkit docs/사업계획서_양식.docx docs/sources/김철수/
+ /dokkit docs/template.hwpx docs/자료/
+ /dokkit improve # improve overall quality
+ /dokkit improve "이미지를 넣어줘" # improve in a given direction ("add images")
+ /dokkit improve "시장분석 섹션을 더 풍부하게" # strengthen one section ("make the market analysis section richer")
  </example>

- ## Architecture
+ ## Pipeline Overview

- ### Agents
+ Six phases, fully automated. Phases 3-5 loop until the quality gates pass (max 3 iterations).

- | Agent | Model | Role |
- |-------|-------|------|
- | **dokkit-ingestor** | opus | Parse source docs into `.dokkit/sources/` (.md + .json pairs) |
- | **dokkit-analyzer** | opus | Analyze templates, detect fields, map to sources. Writes `analysis.json`. READ-ONLY on templates. |
- | **dokkit-filler** | opus | Surgical XML modification using analysis.json. Three modes: fill, modify, review. |
- | **dokkit-exporter** | sonnet | Repackage ZIP archives, PDF conversion via LibreOffice. |
+ | # | Phase | What Happens |
+ |---|-------|-------------|
+ | 1 | **Prepare** | Parse all source files → structured data |
+ | 2 | **Analyze** | Detect template fields, map structure |
+ | 3 | **Fill** | Generate & insert content **section-by-section** |
+ | 4 | **Images** | Generate via Gemini, insert with **correct aspect ratio** |
+ | 5 | **Review** | Quality gates → auto-fix failures → re-check |
+ | 6 | **Export** | Compile document + PDF preview |

- ### Workspace
+ Progress is reported as: `Phase N/6: description`
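The phase loop described above can be sketched as plain control flow. This is a hypothetical outline, not dokkit's code: in the skill these phases are carried out by agents, and the callables passed to `run_pipeline` are placeholders. The table does not say what happens if gates still fail after the third iteration; because all gates must pass before final export, this sketch assumes the pipeline stops without exporting and reports the failing gates.

```python
MAX_ITERATIONS = 3  # "max 3 iterations" for phases 3-5

def run_pipeline(prepare, analyze, fill, images, review, export, log=print):
    """Hypothetical sketch. `review` returns the list of failed gate IDs (empty = pass)."""
    log("Phase 1/6: Prepare sources")
    prepare()
    log("Phase 2/6: Analyze template")
    analyze()

    failed = []
    for iteration in range(1, MAX_ITERATIONS + 1):
        # First pass fills everything; later passes target the failed gates
        log(f"Phase 3/6: Fill sections (iteration {iteration})")
        fill(failed)
        log(f"Phase 4/6: Images (iteration {iteration})")
        images(failed)
        log(f"Phase 5/6: Review quality gates (iteration {iteration})")
        failed = review()
        if not failed:
            break

    if failed:
        # Assumption: no export while any gate still fails
        log(f"Stopped: gates still failing after {MAX_ITERATIONS} iterations: {failed}")
        return failed

    log("Phase 6/6: Export")
    export()
    return []
```

Passing the failed gate IDs back into `fill` and `images` mirrors the "auto-fix failures → re-check" step: later iterations can focus on what the review flagged instead of regenerating everything.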
+
+ ## Core Design: Section-by-Section Generation
+
+ This is the #1 quality improvement over previous versions.
+
+ **Problem**: Generating all content in one pass gives each section only shallow attention, so the result is worse than writing each section manually with an AI.
+
+ **Solution**: Each template section gets a dedicated generation pass with the full source context, exactly like asking an AI to write one section at a time.
+
+ For each section:
+ 1. Read the section's template tips and writing instructions
+ 2. Load ALL relevant source data into context
+ 3. Generate rich, persuasive, data-driven content for THIS section only
+ 4. Insert while preserving original formatting exactly
+ 5. Verify quality immediately before moving to the next section
+
+ The filler agent generates content AND inserts it — no lossy handoff between separate agents.
+
+ ## Quality Gates

- All agents communicate via the `.dokkit/` filesystem:
+ All gates must pass before final export:

+ | Gate | Criterion | Auto-Fix Strategy |
+ |------|-----------|-------------------|
+ | QG1 | Total text ≥ 7,500 chars | Re-enrich thin sections |
+ | QG2 | Each section_content ≥ 500 chars | Re-generate with more detail |
+ | QG3 | Zero `00.00` date placeholders | Derive from source context |
+ | QG4 | Zero `OO`/`○○` name placeholders (exclude schedule table `O` marks) | Derive from source context |
+ | QG5 | Zero `이미지 영역` ("image area") placeholder text | Remove placeholder text |
+ | QG6 | Image aspect ratios correct (within 5%) | Re-measure with PIL |
+ | QG7 | No red/italic guide text in filled cells | Sanitize styles |
+ | QG8 | ≥ 10 images in document | Generate additional images via `source_images.py` |
+ | QG9 | Zero `XXXXX`/`XXXXXX` fake placeholders | Replace with real values or "해당없음" ("not applicable") |
+ | QG10 | TOC page numbers not `00` | Remove or update TOC entries |
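The text-based gates in the table (QG1-QG5 and QG9) lend themselves to a simple check over the filled plain text. A minimal sketch, assuming the filled text is available per section as a dict; the regular expressions are one reading of the placeholder rules, not dokkit's own patterns. The `OO` pattern requires two or more capital O's outside a Latin word, so single schedule-table `O` marks and words such as `GOOD` are not flagged.

```python
import re

# Assumed placeholder patterns for the text-only gates
PLACEHOLDER_PATTERNS = {
    "QG3": re.compile(r"(?<!\d)00\.00(?!\d)"),                   # 00.00 date placeholders
    "QG4": re.compile(r"(?<![A-Za-z])O{2,}(?![A-Za-z])|○{2,}"),  # OO / ○○ name placeholders
    "QG5": re.compile(r"이미지 영역"),                             # "image area" placeholder
    "QG9": re.compile(r"X{5,}"),                                  # XXXXX / XXXXXX fake values
}

def check_text_gates(sections, min_total=7500, min_section=500):
    """sections: {section_name: filled plain text}. Returns failed gate IDs in order."""
    failed = []
    if sum(len(text) for text in sections.values()) < min_total:
        failed.append("QG1")
    if any(len(text) < min_section for text in sections.values()):
        failed.append("QG2")
    # Join with newlines so a match cannot span two sections
    full_text = "\n".join(sections.values())
    for gate, pattern in PLACEHOLDER_PATTERNS.items():
        if pattern.search(full_text):
            failed.append(gate)
    return failed
```

The returned list uses the table's gate IDs, so it can feed the auto-fix column directly (for example, `QG2` triggers re-generation of the thin section).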
+
+ ## Font & Formatting Rules
+
+ These rules prevent the font corruption issues seen in previous versions:
+
+ 1. **Never copy guide text styling** — Template placeholders often use red (#FF0000) and italic. Strip these unconditionally when creating filled runs.
+ 2. **Use template default body style** — Find the document's standard body text formatting (black, regular weight) and apply it to all filled content.
+ 3. **HWPX charPr spacing** — Before ANY text insertion, scan ALL `<hh:charPr>` in `header.xml` and set negative `spacing` values to `"0"`. Negative spacing causes character overlap.
+ 4. **DOCX rPr sanitization** — When copying run properties from label cells, always remove `<w:color>` when it is red, and always remove `<w:i/>` (italic).
+ 5. **Preserve structural formatting** — Keep paragraph alignment (pPr), indentation, spacing, and table cell properties unchanged. Only modify text content and run-level styles.
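Rules 3 and 4 can be sketched with the standard library's `xml.etree.ElementTree`. This is a minimal illustration under stated assumptions: the HWPX `hh` namespace URI and the `<hh:spacing>` child of `<hh:charPr>` (with per-script attributes such as `hangul` and `latin`) are assumptions about the OWPML schema, and only pure red (`FF0000`) is treated as guide-text color. Both functions edit elements in place; writing the tree back still needs the namespace and declaration fixes listed under Known Pitfalls.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URIs (OWPML head part and WordprocessingML main part)
HH_NS = "http://www.hancom.co.kr/hwpml/2011/head"
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
GUIDE_TEXT_COLORS = {"FF0000"}  # assumption: guide text is pure red

def zero_negative_char_spacing(header_root):
    """Rule 3 sketch: set negative spacing attributes under every <hh:charPr> to "0"."""
    changed = 0
    for char_pr in header_root.iter(f"{{{HH_NS}}}charPr"):
        for spacing in char_pr.iter(f"{{{HH_NS}}}spacing"):
            # Copy the items so attributes can be updated while looping
            for attr, value in list(spacing.attrib.items()):
                if value.strip().startswith("-"):
                    spacing.set(attr, "0")
                    changed += 1
    return changed

def sanitize_run_properties(rpr):
    """Rule 4 sketch: remove red <w:color> and <w:i/> from a copied <w:rPr>."""
    for color in rpr.findall(f"{{{W_NS}}}color"):
        if color.get(f"{{{W_NS}}}val", "").upper() in GUIDE_TEXT_COLORS:
            rpr.remove(color)
    for italic in rpr.findall(f"{{{W_NS}}}i"):
        rpr.remove(italic)
    return rpr
```

Non-red colors are left alone on purpose: the rule targets guide-text styling only, not every colored run in the template.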
+
+ ## Image Rules
+
+ ### Generation — use `source_images.py` exclusively
+
+ **Never write inline Gemini API calls for image generation.** Always use the provided script:
+ ```bash
+ python ${CLAUDE_SKILL_DIR}/scripts/source_images.py generate \
+   --prompt "<prompt>" --preset <preset> --output-dir .dokkit/images/ \
+   --project-dir . --lang ko
  ```
- .dokkit/
- ├── state.json         # Single source of truth for session state
- ├── sources/           # Ingested content (.md + .json pairs)
- ├── analysis.json      # Template analysis output (from analyzer)
- ├── images/            # Sourced images for template filling
- ├── template_work/     # Unpacked template XML (working copy)
- └── output/            # Exported filled documents
+ The script uses `gemini-3-pro-image-preview` (high quality), enforces Korean text, and applies the correct aspect ratio for each preset. Bypassing it produces images with the wrong model, the wrong language, and the wrong dimensions.
+
+ ### Prompt quality
+
+ - Include company/product-specific details; never use generic prompts
+ - For org charts: use actual names and roles from source data
+ - For market charts: include specific numbers from sources
+ - For tech diagrams: name the actual technologies and systems
+
+ ### Sizing — prevent distortion
+
+ 1. **Always measure actual dimensions** — After generating any image, use PIL/Pillow to get the true pixel dimensions.
+ 2. **Preserve aspect ratio** — Calculate a display size that fits within the target cell while maintaining the image's original width:height ratio.
+ 3. **HWPX imgDim** — Must reflect the actual pixel dimensions from PIL, NOT layout constants.
+ 4. **DOCX EMU** — Calculate from actual pixels: `EMU = pixels × 914400 / 96` (914,400 EMU per inch, 96 pixels per inch).
+ 5. **Never stretch** — If the image doesn't fit the cell exactly, scale it to fit within bounds (letterbox, don't fill).
+
+ ```python
+ # Correct image sizing: fit within the target bounds without distortion
+ from PIL import Image
+
+ def fit_within(path, target_w, target_h):
+     """Return (display_w, display_h) inside target_w x target_h, aspect ratio preserved."""
+     with Image.open(path) as img:
+         actual_w, actual_h = img.size  # true pixel dimensions
+
+     # Scale to fit within target bounds: one factor for both axes, taken from the
+     # tighter constraint, so the image is letterboxed rather than stretched
+     scale = min(target_w / actual_w, target_h / actual_h)
+     display_w = int(actual_w * scale)
+     display_h = int(actual_h * scale)
+     return display_w, display_h
  ```
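The unit rules above, together with the HWPML unit rule under Known Pitfalls (1 unit = 1/7200 inch) and gate QG6, reduce to a few lines of arithmetic. A minimal sketch; the function names are hypothetical, and reading QG6's 5% tolerance as a relative difference between aspect ratios is an assumption.

```python
EMU_PER_INCH = 914_400      # DOCX drawing unit
HWPUNIT_PER_INCH = 7_200    # HWPML unit = 1/7200 inch
MM_PER_INCH = 25.4

def px_to_emu(pixels, dpi=96):
    """DOCX: EMU = pixels × 914400 / 96 at the default 96 DPI."""
    return round(pixels * EMU_PER_INCH / dpi)

def mm_to_hwpunit(mm):
    """HWPX: 1 mm ≈ 283.46 HWPML units."""
    return round(mm * HWPUNIT_PER_INCH / MM_PER_INCH)

def aspect_ratio_ok(actual_w, actual_h, display_w, display_h, tolerance=0.05):
    """QG6 sketch: displayed width:height within 5% of the image's true ratio."""
    actual = actual_w / actual_h
    shown = display_w / display_h
    return abs(shown - actual) / actual <= tolerance
```

For example, one pixel at 96 DPI is exactly 9,525 EMU, and the 210 mm width of an A4 page is about 59,528 HWPML units.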

- ### State Protocol
-
- Read `.dokkit/state.json` before any operation. Write state changes atomically: read current → update fields → write back → validate.
+ ## Architecture

+ ### Workspace
  ```
- init → state created (empty)
- ingest → source added to sources[]
- fill/fill-doc → template set, analysis created, filled_document created
- modify → filled_document updated
- review approve → filled_document.status = "finalized"
- export → export entry added to exports[]
+ .dokkit/
+ ├── state.json         # Pipeline state and progress
+ ├── sources/           # Parsed source data (.md + .json pairs)
+ ├── analysis.json      # Template field map (from analyzer)
+ ├── images/            # Generated/sourced images
+ ├── template_work/     # Unpacked template XML (working copy)
+ └── output/            # Final completed documents
  ```

- Validate after every write: `python ${CLAUDE_SKILL_DIR}/scripts/validate_state.py .dokkit/state.json`
+ ### Agents
+
+ | Agent | Model | Role |
+ |-------|-------|------|
+ | **dokkit-ingestor** | opus | Parse source files → `.dokkit/sources/` |
+ | **dokkit-analyzer** | opus | Detect fields & structure → `analysis.json` (NO content generation for sections) |
+ | **dokkit-filler** | opus | Generate content section-by-section + fill XML + insert images + quality review |
+ | **dokkit-exporter** | sonnet | Compile ZIP archives, convert to PDF |

  ### Knowledge Files

- Agent-facing knowledge bases in this skill directory:
-
- | File | Purpose | Agents |
- |------|---------|--------|
- | `STATE.md` | State schema and management protocol | All |
- | `INGESTION.md` | Format routing and parsing strategies | dokkit-ingestor |
- | `ANALYSIS.md` | Field detection, confidence scoring, output schema | dokkit-analyzer |
- | `FILLING.md` | XML surgery rules, matching strategy, image insertion | dokkit-analyzer, dokkit-filler |
- | `DOCX-XML.md` | Open XML structure for DOCX documents | dokkit-analyzer, dokkit-filler |
- | `HWPX-XML.md` | OWPML structure for HWPX documents | dokkit-analyzer, dokkit-filler |
- | `IMAGE-SOURCING.md` | Image generation, search, and insertion patterns | dokkit-filler |
- | `EXPORT.md` | Document compilation and format conversion | dokkit-exporter |
-
- Deep reference material in `references/`:
- - `state-schema.md` — Complete state.json schema
- - `supported-formats.md` — Detailed format specifications
- - `docx-structure.md`, `docx-field-patterns.md` — DOCX patterns
- - `hwpx-structure.md`, `hwpx-field-patterns.md` — HWPX patterns (10 detection patterns)
- - `field-detection-patterns.md` — Advanced heuristics (9 DOCX + 6 HWPX)
- - `section-range-detection.md` — Dynamic range detection for section_content
- - `section-image-interleaving.md` — Image interleaving algorithm
- - `image-opportunity-heuristics.md` — AI image opportunity detection
- - `image-xml-patterns.md` — Image element structures (DOCX + HWPX)
-
- Scripts in `scripts/`:
- - `validate_state.py` — State validation
- - `parse_xlsx.py`, `parse_hwpx.py`, `parse_image_with_gemini.py` — Custom parsers
- - `detect_fields.py`, `detect_fields_hwpx.py` — Field detection
- - `validate_docx.py`, `validate_hwpx.py` — Document validation
- - `compile_hwpx.py` — HWPX repackaging
- - `export_pdf.py` — PDF conversion
+ | File | Purpose | Used By |
+ |------|---------|---------|
+ | `PIPELINE.md` | Detailed pipeline steps (auto-loaded) | Orchestrator |
+ | `STATE.md` | State schema and management | All agents |
+ | `INGESTION.md` | Source file parsing | Ingestor |
+ | `ANALYSIS.md` | Field detection, structure mapping | Analyzer |
+ | `FILLING.md` | XML surgery rules, image insertion | Filler |
+ | `DOCX-XML.md` / `HWPX-XML.md` | XML format structures | Analyzer, Filler |
+ | `IMAGE-SOURCING.md` | Image generation patterns | Filler |
+ | `EXPORT.md` | Compilation and conversion | Exporter |

  ## Rules

- <rules>
- - Display errors clearly with actionable guidance. Never silently fall back to defaults.
- - Original template is never modified; copies go to `.dokkit/template_work/`.
- - Analyzer is read-only on templates. Only the filler modifies XML.
- - Confidence levels: high, medium, low (not numeric scores).
- - Signatures must be user-provided; never auto-generate them.
- - Validate state after every write with `scripts/validate_state.py`.
- - Inline commands (init, sources, preview) execute directly and do NOT spawn agents.
- - Agent-delegated commands spawn the appropriate agent(s) sequentially.
- </rules>
+ 1. **One command does everything**: no manual subcommands needed (except `improve` for post-fill enhancement)
+ 2. **Never modify the original template**: work on copies in `.dokkit/template_work/`
+ 3. **Section-by-section generation**: each section gets full AI attention with all source data
+ 4. **Aspect ratio preservation**: images are never stretched or squashed
+ 5. **Black text only**: never inherit colored/italic guide text styles
+ 6. **Auto-loop**: iterate until ALL quality gates pass (max 3 iterations)
+ 7. **Progress reporting**: show `Phase N/6: description` at each step
+ 8. **Clear errors**: if something fails, show what went wrong with actionable guidance
+ 9. **Gemini API**: if not configured, warn and skip image generation (don't block text filling)

  ## Known Pitfalls

- Critical issues discovered through production use:
+ Critical issues from production experience — these MUST be handled:

- 1. **HWPX namespace stripping**: Python ET strips unused namespace declarations. Restore ALL 14 original xmlns on EVERY root element after any `tree.write()`. Applies to section0.xml, content.hpf, header.xml.
- 2. **HWPX subList cell wrapping**: ~65% of cells wrap content in `<hp:subList>/<hp:p>`. Check for subList before writing content.
- 3. **table_content "Pre-filled" bug**: Never set `mapped_value` to placeholder strings for `table_content` fields. Use `mapped_value: null` with `action: "preserve"`.
- 4. **HWPX cellAddr rowAddr corruption**: After row insert/delete, re-index ALL `rowAddr` values. Duplicate rowAddr causes silent data loss.
- 5. **HWPX `<hp:pic>` inside `<hp:run>`**: Pic as sibling of run renders invisible. Must be `<hp:run><hp:pic>...<hp:t/></hp:run>`.
- 6. **HWPML units**: 1/7200 inch, NOT hundredths of mm. 1mm ~ 283.46 units. A4 text width ~ 46,648 units.
- 7. **rowSpan stripping**: When cloning rows with rowSpan>1, divide cellSz height by rowSpan.
+ 1. **HWPX namespace stripping**: Python ET strips unused namespace declarations. Restore ALL 14 original xmlns on EVERY root element after `tree.write()`.
+ 2. **HWPX subList cell wrapping**: ~65% of cells use `<hp:subList>/<hp:p>`. Always check before writing.
+ 3. **table_content "Pre-filled" bug**: Never set `mapped_value` to placeholder strings. Use `null` with `action: "preserve"`.
+ 4. **HWPX cellAddr rowAddr corruption**: After row insert/delete, re-index ALL `rowAddr` values.
+ 5. **HWPX `<hp:pic>` placement**: Must be `<hp:run><hp:pic>...<hp:t/></hp:run>`, not a sibling of `<hp:run>`.
+ 6. **HWPML units**: 1/7200 inch. 1mm ≈ 283.46 units. A4 text width ≈ 46,648 units.
+ 7. **rowSpan stripping**: When cloning rows, divide cellSz height by rowSpan.
  8. **HWPX pic element order**: offset, orgSz, curSz, flip, rotationInfo, renderingInfo, imgRect, imgClip, inMargin, imgDim, hc:img, sz, pos, outMargin.
- 9. **HWPX post-write safety**: After ET write: (a) restore namespaces, (b) fix XML declaration to double quotes with `standalone="yes"`, (c) remove newline between `?>` and `<root>`.
- 10. **compile_hwpx.py skip .bak**: Backup files must be excluded from ZIP repackaging.
+ 9. **Section content table preservation**: ONLY replace `<w:p>`/`<hp:p>` elements. NEVER remove `<w:tbl>`/`<hp:tbl>`.
+ 10. **Section range detection**: After deleting tips/instructions, ranges are STALE. Recompute them dynamically.
+ 11. **HWPX post-write safety**: Restore namespaces → fix XML declaration (double quotes, `standalone="yes"`) → remove newline between `?>` and root.
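Pitfalls 4 and 11 (and the namespace part of pitfall 1) can be illustrated with a small post-processing sketch using the standard library. It is hypothetical in several ways: the function names are illustrative, the `hp` namespace URI and the `rowCnt` attribute on `<hp:tbl>` are assumptions about OWPML, and `reindex_row_addrs` assumes each `<hp:tr>` lists only the cells that start in that row, so a cell's `rowAddr` equals its row index. The namespace step only re-adds declarations missing from the root start tag; it also assumes the original prefixes were kept by registering them with `ET.register_namespace` before writing.

```python
import re
import xml.etree.ElementTree as ET

HP_NS = "http://www.hancom.co.kr/hwpml/2011/paragraph"  # assumed OWPML URI
DECLARATION = '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'

def post_write_fix(xml_text, original_xmlns):
    """Pitfall 11 sketch. original_xmlns: {prefix: uri} read from the original root."""
    # Drop ET's single-quoted declaration plus the newline before the root element
    body = re.sub(r"^<\?xml[^>]*\?>\s*", "", xml_text)
    # Re-add every xmlns:prefix that ET stripped from the root start tag
    tag_end = body.index(">")
    if body[tag_end - 1] == "/":  # self-closing root: insert before "/>"
        tag_end -= 1
    root_start = body[:tag_end]
    missing = "".join(
        f' xmlns:{prefix}="{uri}"'
        for prefix, uri in original_xmlns.items()
        if f"xmlns:{prefix}=" not in root_start
    )
    return DECLARATION + root_start + missing + body[tag_end:]

def reindex_row_addrs(tbl):
    """Pitfall 4 sketch: after inserting/deleting <hp:tr>, renumber rowAddr by row position."""
    rows = tbl.findall(f"{{{HP_NS}}}tr")
    for row_idx, tr in enumerate(rows):
        for tc in tr.findall(f"{{{HP_NS}}}tc"):  # direct cells only, not nested tables
            cell_addr = tc.find(f"{{{HP_NS}}}cellAddr")
            if cell_addr is not None:
                cell_addr.set("rowAddr", str(row_idx))
    tbl.set("rowCnt", str(len(rows)))  # assumed attribute holding the row count
```

Working on the serialized text for the declaration fix avoids fighting `tree.write()`, which always emits its own single-quoted declaration when `xml_declaration=True`.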
@@ -0,0 +1,147 @@
+ # Section Content Range Detection (DOCX)
+
+ ## Problem
+
+ As in the HWPX version: after deleting instruction text (※ paragraphs) and tip boxes, the child indices from `analysis.json` become stale. Using stale indices destroys tables and other structural elements.
+
+ **Additionally**, DOCX section content ranges may contain embedded `<w:tbl>` elements (schedule tables, budget tables) that must NEVER be replaced during section content filling. Unlike HWPX, where tables are children of the section root, DOCX tables are direct children of `<w:body>`, interspersed with paragraphs.
+
+ ## Solution: Dynamic Range Detection + Table Preservation
+
+ ### Step 1: Recompute ranges after cleanup
+
+ After deleting instruction text and tip boxes, scan the `<w:body>` children for section title markers:
+
+ ```python
+ def find_docx_section_ranges(body, w_ns):
+     """Find section content ranges by locating title markers in w:body.
+
+     Must run AFTER tip/instruction deletion so indices are stable.
+     Returns dict mapping approximate field labels to (start, end) inclusive child index ranges.
+     """
+     children = list(body)
+     markers = {}
+
+     for i, child in enumerate(children):
+         text = ''.join(
+             t.text or '' for t in child.iter(f'{{{w_ns}}}t')
+         ).strip()
+
+         # Section title markers (numbered headings); a later match overwrites an earlier one
+         # 문제 = problem, 필요성 = necessity
+         if '1.' in text and '문제' in text and ('Problem' in text or '필요성' in text):
+             markers['sec1_title'] = i
+         # 실현 = realization, 개발 = development
+         elif '2.' in text and '실현' in text and ('Solution' in text or '개발' in text):
+             markers['sec2_title'] = i
+         # 성장 = growth, 사업화 = commercialization
+         elif '3.' in text and '성장' in text and ('Scale' in text or '사업화' in text):
+             markers['sec3_title'] = i
+         # 팀 = team, 대표자 = representative
+         elif '4.' in text and '팀' in text and ('Team' in text or '대표자' in text):
+             markers['sec4_title'] = i
+
+         # End markers: 사업추진 일정 = project schedule, 협약기간 = agreement period,
+         # 전체 = overall, 팀 구성 = team composition, 협력 기관 = partner organizations
+         elif '사업추진' in text and '일정' in text and '협약기간' in text:
+             markers['schedule1'] = i
+         elif '사업추진' in text and '일정' in text and '전체' in text:
+             markers['schedule2'] = i
+         elif '팀 구성' in text and ('구분' in text or '직위' in text or '안' in text):
+             markers['team_table'] = i
+         elif '협력' in text and '기관' in text:
+             markers['partnership'] = i
+
+     # Build ranges: content starts right after the title (instruction text is
+     # already deleted) and ends just before the next structural marker
+     ranges = {}
+     if 'sec1_title' in markers and 'sec2_title' in markers:
+         ranges['sec1'] = (markers['sec1_title'] + 1, markers['sec2_title'] - 1)
+     if 'sec2_title' in markers and 'schedule1' in markers:
+         ranges['sec2'] = (markers['sec2_title'] + 1, markers['schedule1'] - 1)
+     if 'sec3_title' in markers and 'schedule2' in markers:
+         ranges['sec3'] = (markers['sec3_title'] + 1, markers['schedule2'] - 1)
+     if 'sec4_title' in markers and 'team_table' in markers:
+         ranges['sec4'] = (markers['sec4_title'] + 1, markers['team_table'] - 1)
+
+     return ranges
+ ```
+
+ ### Step 2: CRITICAL — Only replace paragraphs, never tables
+
+ When filling section content within the detected range, ONLY operate on `<w:p>` elements. **Skip all other element types**:
+
+ ```python
+ W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
+
+ def fill_docx_section_content(body, start_idx, end_idx, new_paragraphs):
+     """Replace paragraph content within a section range, preserving tables.
+
+     RULE: Only remove/replace <w:p> elements. NEVER touch <w:tbl>, <w:bookmarkStart>,
+     <w:bookmarkEnd>, <w:sectPr>, or any non-paragraph elements.
+     """
+     children = list(body)
+
+     # Phase 1: Identify which children to remove (paragraphs only)
+     to_remove = []
+     preserved_elements = []  # (index, element) pairs for tables etc.
+
+     for i in range(start_idx, min(end_idx + 1, len(children))):
+         child = children[i]
+
+         # Compare the fully qualified tag; this also skips comments and
+         # processing instructions, whose .tag is not a string
+         if child.tag == f'{{{W_NS}}}p':
+             to_remove.append(child)
+         else:
+             # Tables, bookmarks, sectPr: left in the tree untouched
+             preserved_elements.append((i, child))
+
+     # Phase 2: Remove old paragraphs
+     for elem in to_remove:
+         body.remove(elem)
+
+     # Phase 3: Insert new paragraphs at the start of the range; preserved
+     # elements that were inside the range now follow the new paragraphs
+     insert_point = start_idx
+     for new_p in new_paragraphs:
+         body.insert(insert_point, new_p)
+         insert_point += 1
+
+     return preserved_elements
+ ```
+
+ ### Why Tables Must Be Preserved
+
+ The 예비창업패키지 (Pre-Startup Package) 사업계획서 (business plan) template has this structure within `<w:body>`:
+
+ ```
+ [19] p: "1. 문제 인식 (Problem)..." — section title
+ [20] p: "※ 개발하고자 하는..." — instruction text (delete)
+ [21-60] p: section content paragraphs (replace)
+ [61] p: "2. 실현 가능성 (Solution)..." — section title
+ [62] p: "※ 아이디어를..." — instruction text (delete)
+ [63-82] p: section content paragraphs (replace)
+ [83] p: "< 사업추진 일정(협약기간 내) >" — schedule heading
+ [85] tbl: schedule table ← MUST PRESERVE
+ [91] tbl: budget table 1 ← MUST PRESERVE
+ [96] tbl: budget table 2 ← MUST PRESERVE
+ ```
+
+ If the filler replaces the entire range including tables, the schedule and budget data are destroyed. The tables are handled separately as `table_content` fields.
+
+ ### Form Tables vs Section Content
+
+ The following tables are NOT section content; they are form tables, each filled through its own field type:
+
+ | Body index | Content | Field type |
+ |-----------|---------|------------|
+ | 13 | 일반현황 (general status) table (창업아이템명 = startup item name, 산출물 = deliverables) | `empty_cell` per cell |
+ | 17 | 개요(요약) (overview/summary) table (명칭 = name, 범주 = category, etc.) | `empty_cell` per cell |
+ | 85 | 사업추진 일정 (협약기간) (project schedule, agreement period) | `table_content` |
+ | 91 | 1단계 정부지원사업비 (stage-1 government grant budget) | `table_content` |
+ | 96 | 2단계 정부지원사업비 (stage-2 government grant budget) | `table_content` |
+ | 140 | 사업추진 일정 (전체) (project schedule, overall) | `table_content` |
+ | 160 | 팀 구성 (team composition) table | `table_content` |
+ | 164 | 협력 기관 (partner organizations) table | `table_content` |
+
+ The analyzer must classify these as their specific field types. They must NEVER be included in a `section_content` field's range.
+
+ ## Adapting for Different Templates
+
+ The approach is the same as in the HWPX version: identify section title markers, match them by text content, and map them to field IDs. The key differences for DOCX:
+
+ 1. Body children are direct `<w:p>` and `<w:tbl>` elements (flat structure)
+ 2. Tables are interspersed with paragraphs at the same level
+ 3. The "only replace `<w:p>`" rule is universal and template-independent
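The adaptation steps above (match title markers by text, then derive ranges between them) can be made data-driven, so a new template needs a new marker table rather than a new `elif` chain. A minimal sketch; `MARKER_SPECS`, `find_markers`, and `ranges_between` are hypothetical names, and the table reuses only the first few Step 1 conditions. It works on the plain text of each body child (the text extraction from Step 1 is assumed to have run already), which keeps the matching logic testable without XML.

```python
# Hypothetical marker table: (marker name, substrings that must all appear,
# substrings of which at least one must appear). Strings reuse Step 1's checks.
MARKER_SPECS = [
    ("sec1_title", ["1.", "문제"], ["Problem", "필요성"]),
    ("sec2_title", ["2.", "실현"], ["Solution", "개발"]),
    ("schedule1", ["사업추진", "일정", "협약기간"], []),
]

def find_markers(texts, specs=MARKER_SPECS):
    """texts: plain text of each <w:body> child in order. Returns {marker: index}."""
    markers = {}
    for i, text in enumerate(texts):
        for name, all_of, any_of in specs:
            if all(s in text for s in all_of) and (not any_of or any(s in text for s in any_of)):
                markers[name] = i  # later matches overwrite earlier ones, as in Step 1
                break              # at most one marker per child, like the elif chain
    return markers

def ranges_between(markers, pairs):
    """pairs: (range name, start marker, end marker). Inclusive ranges, as in Step 1."""
    return {
        name: (markers[start] + 1, markers[end] - 1)
        for name, start, end in pairs
        if start in markers and end in markers
    }
```

Because later matches overwrite earlier ones, a table-of-contents line that repeats a section title is superseded by the real heading further down, which is the same behavior as the Step 1 code.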
@@ -113,7 +113,7 @@ Each opportunity is added to the field's `image_opportunities` array:
  - `insertion_point.strategy`: Always `"after_paragraph"` for section content
  - `insertion_point.anchor_text`: Distinctive Korean phrase from the paragraph (used by filler to locate insertion point)
  - `generation_prompt`: English prompt for AI image generation
- - `preset`: Maps to `scripts/source_images.py` preset parameter
+ - `preset`: Maps to `.claude/skills/dokkit/scripts/source_images.py` preset parameter
  - `content_type`: One of `flowchart`, `diagram`, `data`, `concept`, `infographic`
  - `rationale`: Brief explanation of why an image helps here
  - `dimensions`: Default size — filler may adjust based on content_type