devlyn-cli 0.5.4 → 0.5.7
- package/bin/devlyn.js +160 -4
- package/config/agents/evaluator.md +40 -0
- package/config/commands/devlyn.evaluate.md +467 -0
- package/config/commands/devlyn.team-resolve.md +2 -2
- package/optional-skills/dokkit/ANALYSIS.md +32 -1
- package/optional-skills/dokkit/COMMANDS.md +20 -13
- package/optional-skills/dokkit/FILLING.md +19 -0
- package/optional-skills/dokkit/IMAGE-SOURCING.md +2 -2
- package/optional-skills/dokkit/PIPELINE.md +348 -0
- package/optional-skills/dokkit/SKILL.md +169 -111
- package/optional-skills/dokkit/references/docx-section-range-detection.md +147 -0
- package/optional-skills/dokkit/references/image-opportunity-heuristics.md +1 -1
- package/optional-skills/dokkit/scripts/fill_docx.py +819 -0
- package/optional-skills/dokkit/scripts/parse_image_with_gemini.py +3 -3
- package/optional-skills/dokkit/scripts/source_images.py +40 -2
- package/package.json +1 -1
@@ -1,153 +1,211 @@
 ---
 name: dokkit
 description: >
-
-
-
-
-
-
+  One-command document template filling. Put source files (회사소개서, 사업자료,
+  이미지 등) in a folder, provide a DOCX/HWPX template, and get a polished,
+  complete document with AI-generated images. Auto-iterates until perfect.
+  Supports Korean government forms: 사업계획서, 지원서, 신청서.
+  Trigger on: "fill template", "사업계획서 작성", "문서 작성", "dokkit",
+  "fill this form", "템플릿 채워줘", "complete this document", "fill document",
+  "template automation", HWPX files, 한글 templates, document generation,
+  or any task involving filling document templates with source materials.
+  Also trigger when user drops files and asks to fill or complete a template.
 user-invocable: true
 allowed-tools: Read, Write, Edit, Bash, Glob, Grep, Agent
-argument-hint: "<
+argument-hint: "<template_path> <sources_folder> | improve [instruction]"
 context:
   - type: file
-    path: ${CLAUDE_SKILL_DIR}/
+    path: ${CLAUDE_SKILL_DIR}/PIPELINE.md
 ---

-# Dokkit — Document
+# Dokkit — One-Command Document Filling

-
+Source folder + template → finished document. Fully automatic, iterates until perfect.

-##
+## Usage

-
-
-
-
-| `preview` | — | Inline | Generate PDF preview via LibreOffice |
-| `ingest` | `<file1> [file2] ...` | Agent | Parse source documents into workspace |
-| `fill` | `<template.docx\|hwpx>` | Agent | End-to-end: analyze, fill, review, auto-fix, export |
-| `fill-doc` | `<template.docx\|hwpx>` | Agent | Analyze template and fill fields only |
-| `modify` | `"<instruction>"` | Agent | Apply targeted changes to filled document |
-| `review` | `[section\|approve]` | Agent | Review with per-field confidence annotations |
-| `export` | `<docx\|hwpx\|pdf>` | Agent | Export filled document to format |
-
-## Routing
+```
+/dokkit <template_path> <sources_folder>
+/dokkit improve ["instruction"]
+```

-
+- `template_path`: DOCX or HWPX template file **(required)**
+- `sources_folder`: Folder with source materials **(required)**

-
-
-
+Both arguments are mandatory. If either is missing, show this error and stop:
+```
+Error: template과 sources 폴더를 모두 지정해주세요.

-
+Usage: /dokkit <template.docx|hwpx> <sources_folder>
+Example: /dokkit docs/사업계획서_양식.docx docs/sources/김철수/
+```

 <example>
-
-
-
-
+/dokkit docs/사업계획서_양식.docx docs/sources/김철수/
+/dokkit docs/template.hwpx docs/자료/
+/dokkit improve # improve overall quality
+/dokkit improve "이미지를 더 넣어줘" # improve in a specific direction
+/dokkit improve "시장분석 섹션을 더 풍부하게" # strengthen a specific section
 </example>

-##
+## Pipeline Overview

-
+Six phases, fully automated. Phases 3-5 loop until quality gates pass (max 3 iterations).

-|
-
-|
-|
-|
-|
+| # | Phase | What Happens |
+|---|-------|-------------|
+| 1 | **Prepare** | Parse all source files → structured data |
+| 2 | **Analyze** | Detect template fields, map structure |
+| 3 | **Fill** | Generate & insert content **section-by-section** |
+| 4 | **Images** | Generate via Gemini, insert with **correct aspect ratio** |
+| 5 | **Review** | Quality gates → auto-fix failures → re-check |
+| 6 | **Export** | Compile document + PDF preview |

-
+Progress shown as: `Phase N/6: description`
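
The loop over phases 3-5 described in the overview can be pictured with a minimal sketch. It is not taken from the package: `run_quality_loop` and its three callables (`fill`, `add_images`, `review`) are hypothetical names, and `review` is assumed to return the list of failed gate IDs. Only the cap of 3 iterations and the `Phase N/6: description` progress format come from the text above.

```python
from typing import Callable, List, Tuple

MAX_ITERATIONS = 3  # cap stated in the pipeline overview


def run_quality_loop(
    fill: Callable[[], None],
    add_images: Callable[[], None],
    review: Callable[[], List[str]],
) -> Tuple[bool, int]:
    """Repeat phases 3-5 until review() reports no failed gates or the cap is hit.

    Returns (passed, iterations_used). All three callables are hypothetical
    stand-ins for the real phase implementations.
    """
    for iteration in range(1, MAX_ITERATIONS + 1):
        print(f"Phase 3/6: fill (iteration {iteration})")
        fill()
        print(f"Phase 4/6: images (iteration {iteration})")
        add_images()
        print(f"Phase 5/6: review (iteration {iteration})")
        failed_gates = review()
        if not failed_gates:
            return True, iteration
    return False, MAX_ITERATIONS
```

Returning the iteration count alongside the pass flag lets a caller report whether the export is going out with unresolved gates.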
+
+## Core Design: Section-by-Section Generation
+
+This is the #1 quality improvement over previous versions.
+
+**Problem**: Generating all content at once → each section gets shallow attention, quality worse than manual AI.
+
+**Solution**: Each template section gets dedicated AI focus with full source context, exactly like asking AI to write one section at a time manually.
+
+For each section:
+1. Read the section's template tips and writing instructions
+2. Load ALL relevant source data into context
+3. Generate rich, persuasive, data-driven content for THIS section only
+4. Insert while preserving original formatting exactly
+5. Verify quality immediately before moving to next section
+
+The filler agent generates content AND inserts it — no lossy handoff between separate agents.
+
+## Quality Gates

-
+ALL must pass before final export:

+| Gate | Criterion | Auto-Fix Strategy |
+|------|-----------|-------------------|
+| QG1 | Total text ≥ 7,500 chars | Re-enrich thin sections |
+| QG2 | Each section_content ≥ 500 chars | Re-generate with more detail |
+| QG3 | Zero `00.00` date placeholders | Derive from source context |
+| QG4 | Zero `OO`/`○○` name placeholders (exclude schedule table `O` marks) | Derive from source context |
+| QG5 | Zero `이미지 영역` text | Remove placeholder text |
+| QG6 | Images aspect ratio correct (within 5%) | Re-measure with PIL |
+| QG7 | No red/italic guide text in filled cells | Sanitize styles |
+| QG8 | ≥ 10 images in document | Generate additional images via `source_images.py` |
+| QG9 | Zero `XXXXX`/`XXXXXX` fake placeholders | Replace with real values or "해당없음" |
+| QG10 | TOC page numbers not `00` | Remove or update TOC entries |
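
As an illustration of how the purely text-based gates in the table (QG1, QG3, QG4, QG5, QG9) might be checked, here is a minimal sketch. It assumes the document text has already been extracted into one plain string. The function names and the regular expressions are illustrative assumptions, not the package's actual checks. For QG4, requiring two or more consecutive `O` characters that are not part of an ASCII word is one possible way to ignore the single `O` marks used in schedule tables.

```python
import re

MIN_TOTAL_CHARS = 7_500  # QG1 threshold from the table above

# Assumed patterns: a run of 2+ "O" not embedded in an ASCII word, or 2+ "○".
NAME_PLACEHOLDER = re.compile(r"(?<![A-Za-z])O{2,}(?![A-Za-z])|○{2,}")
FAKE_PLACEHOLDER = re.compile(r"X{5,}")  # covers XXXXX and XXXXXX


def check_text_gates(text: str) -> dict:
    """Return {gate_id: passed} for the gates that need only plain text."""
    return {
        "QG1": len(text) >= MIN_TOTAL_CHARS,
        "QG3": "00.00" not in text,
        "QG4": NAME_PLACEHOLDER.search(text) is None,
        "QG5": "이미지 영역" not in text,
        "QG9": FAKE_PLACEHOLDER.search(text) is None,
    }


def failed_gates(text: str) -> list:
    """List the IDs of the gates that did not pass, in table order."""
    return [gate for gate, ok in check_text_gates(text).items() if not ok]
```

The remaining gates (per-section length, image count, aspect ratio, styling, TOC) need the parsed document structure and are outside the scope of this sketch.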
+
+## Font & Formatting Rules
+
+These rules prevent the font corruption issues seen in previous versions:
+
+1. **Never copy guide text styling** — Template placeholders often use red (#FF0000) and italic. Strip these unconditionally when creating filled runs.
+2. **Use template default body style** — Find the document's standard body text formatting (black, regular weight) and apply it to all filled content.
+3. **HWPX charPr spacing** — Before ANY text insertion, scan ALL `<hh:charPr>` in `header.xml` and set negative `spacing` values to `"0"`. Negative spacing causes character overlap.
+4. **DOCX rPr sanitization** — When copying run properties from label cells, always remove `<w:color>` if red and `<w:i/>` (italic).
+5. **Preserve structural formatting** — Keep paragraph alignment (pPr), indentation, spacing, and table cell properties unchanged. Only modify text content and run-level styles.
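
Rule 4 (DOCX rPr sanitization) could look roughly like the sketch below, which uses the standard-library `xml.etree.ElementTree`. The helper name `sanitize_rpr` is hypothetical, and treating only the exact value `FF0000` as "red" is an assumption based on the colour named in rule 1. A real template may use other reds or theme colours.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def sanitize_rpr(rpr: ET.Element) -> ET.Element:
    """Strip guide-text styling from a copied <w:rPr> element, in place.

    Removes <w:i/> unconditionally and <w:color> only when its w:val is
    FF0000 (assumed definition of "red"). Other run properties are kept.
    """
    for child in list(rpr):  # copy the child list so removal is safe
        if child.tag == f"{{{W_NS}}}i":
            rpr.remove(child)
        elif child.tag == f"{{{W_NS}}}color":
            value = child.get(f"{{{W_NS}}}val", "")
            if value.upper() == "FF0000":
                rpr.remove(child)
    return rpr
```

Operating on a copy of the label cell's `<w:rPr>` rather than the original keeps the template's own label formatting untouched.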
+
+## Image Rules
+
+### Generation — use `source_images.py` exclusively
+
+**Never write inline Gemini API calls for image generation.** Always use the provided script:
+```bash
+python ${CLAUDE_SKILL_DIR}/scripts/source_images.py generate \
+--prompt "<prompt>" --preset <preset> --output-dir .dokkit/images/ \
+--project-dir . --lang ko
 ```
-.
-
-
-
-
-
-
+The script uses `gemini-3-pro-image-preview` (high quality), enforces Korean text, and applies correct aspect ratios per preset. Bypassing it results in wrong model, wrong language, wrong dimensions.
+
+### Prompt quality
+
+- Include company/product-specific details — never use generic prompts
+- For org charts: use actual names and roles from source data
+- For market charts: include specific numbers from sources
+- For tech diagrams: name the actual technologies and systems
+
+### Sizing — prevent distortion
+
+1. **Always measure actual dimensions** — After generating any image, use PIL/Pillow to get true pixel dimensions.
+2. **Preserve aspect ratio** — Calculate display size that fits within the target cell while maintaining the image's original width:height ratio.
+3. **HWPX imgDim** — Must reflect actual pixel dimensions from PIL, NOT layout constants.
+4. **DOCX EMU** — Calculate from actual pixels: `EMU = pixels × 914400 / 96`.
+5. **Never stretch** — If the image doesn't fit the cell exactly, scale down to fit within bounds (letterbox, don't fill).
+
+```python
+# Correct image sizing
+from PIL import Image
+img = Image.open(path)
+actual_w, actual_h = img.size
+aspect = actual_w / actual_h
+
+# Scale to fit within target bounds
+scale = min(target_w / actual_w, target_h / actual_h)
+display_w = int(actual_w * scale)
+display_h = int(actual_h * scale)
 ```
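
To connect the fit-within-bounds calculation with the DOCX EMU formula from rule 4, here is a small self-contained sketch. `fit_within` and `px_to_emu` are hypothetical helper names. The 96 DPI default comes from the formula quoted above, and the pixel dimensions are passed in directly, so the sketch does not depend on Pillow.

```python
EMU_PER_INCH = 914_400  # English Metric Units per inch, as in the formula above


def fit_within(actual_w: int, actual_h: int, target_w: int, target_h: int) -> tuple:
    """Scale (actual_w, actual_h) to fit inside the target box, keeping aspect ratio."""
    scale = min(target_w / actual_w, target_h / actual_h)
    return int(actual_w * scale), int(actual_h * scale)


def px_to_emu(pixels: int, dpi: int = 96) -> int:
    """Convert a pixel length to EMU: pixels * 914400 / dpi."""
    return round(pixels * EMU_PER_INCH / dpi)


# Example: a 1600x900 image placed in an 800x800 pixel cell.
display_w, display_h = fit_within(1600, 900, 800, 800)
extent_cx, extent_cy = px_to_emu(display_w), px_to_emu(display_h)
```

Because a single scale factor is applied to both sides, the 16:9 ratio of the source image survives; only one dimension touches the cell boundary.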

-
-
-Read `.dokkit/state.json` before any operation. Write state changes atomically: read current → update fields → write back → validate.
+## Architecture

+### Workspace
 ```
-
-
-
-
-
-
+.dokkit/
+├── state.json # Pipeline state and progress
+├── sources/ # Parsed source data (.md + .json pairs)
+├── analysis.json # Template field map (from analyzer)
+├── images/ # Generated/sourced images
+├── template_work/ # Unpacked template XML (working copy)
+└── output/ # Final completed documents
 ```

-
+### Agents
+
+| Agent | Model | Role |
+|-------|-------|------|
+| **dokkit-ingestor** | opus | Parse source files → `.dokkit/sources/` |
+| **dokkit-analyzer** | opus | Detect fields & structure → `analysis.json` (NO content generation for sections) |
+| **dokkit-filler** | opus | Generate content section-by-section + fill XML + insert images + quality review |
+| **dokkit-exporter** | sonnet | Compile ZIP archives, convert to PDF |

 ### Knowledge Files

-
-
-|
-
-| `
-| `
-| `
-| `
-| `
-| `
-| `IMAGE-SOURCING.md` | Image generation, search, and insertion patterns | dokkit-filler |
-| `EXPORT.md` | Document compilation and format conversion | dokkit-exporter |
-
-Deep reference material in `references/`:
-- `state-schema.md` — Complete state.json schema
-- `supported-formats.md` — Detailed format specifications
-- `docx-structure.md`, `docx-field-patterns.md` — DOCX patterns
-- `hwpx-structure.md`, `hwpx-field-patterns.md` — HWPX patterns (10 detection patterns)
-- `field-detection-patterns.md` — Advanced heuristics (9 DOCX + 6 HWPX)
-- `section-range-detection.md` — Dynamic range detection for section_content
-- `section-image-interleaving.md` — Image interleaving algorithm
-- `image-opportunity-heuristics.md` — AI image opportunity detection
-- `image-xml-patterns.md` — Image element structures (DOCX + HWPX)
-
-Scripts in `scripts/`:
-- `validate_state.py` — State validation
-- `parse_xlsx.py`, `parse_hwpx.py`, `parse_image_with_gemini.py` — Custom parsers
-- `detect_fields.py`, `detect_fields_hwpx.py` — Field detection
-- `validate_docx.py`, `validate_hwpx.py` — Document validation
-- `compile_hwpx.py` — HWPX repackaging
-- `export_pdf.py` — PDF conversion
+| File | Purpose | Used By |
+|------|---------|---------|
+| `PIPELINE.md` | Detailed pipeline steps (auto-loaded) | Orchestrator |
+| `STATE.md` | State schema and management | All agents |
+| `INGESTION.md` | Source file parsing | Ingestor |
+| `ANALYSIS.md` | Field detection, structure mapping | Analyzer |
+| `FILLING.md` | XML surgery rules, image insertion | Filler |
+| `DOCX-XML.md` / `HWPX-XML.md` | XML format structures | Analyzer, Filler |
+| `IMAGE-SOURCING.md` | Image generation patterns | Filler |
+| `EXPORT.md` | Compilation and conversion | Exporter |

 ## Rules

-
-
--
-
-
--
-
-
-
-</rules>
+1. **One command does everything** — no manual subcommands needed (except `improve` for post-fill enhancement)
+2. **Never modify the original template** — work on copies in `.dokkit/template_work/`
+3. **Section-by-section generation** — each section gets full AI attention with all source data
+4. **Aspect ratio preservation** — images never stretched or squashed
+5. **Black text only** — never inherit colored/italic guide text styles
+6. **Auto-loop** — iterate until ALL quality gates pass (max 3 iterations)
+7. **Progress reporting** — show `Phase N/6: description` at each step
+8. **Clear errors** — if something fails, show what went wrong with actionable guidance
+9. **Gemini API** — if not configured, warn and skip image generation (don't block text filling)

 ## Known Pitfalls

-Critical issues
+Critical issues from production experience — these MUST be handled:

-1. **HWPX namespace stripping**: Python ET strips unused namespace declarations. Restore ALL 14 original xmlns on EVERY root element after
-2. **HWPX subList cell wrapping**: ~65% of cells
-3. **table_content "Pre-filled" bug**: Never set `mapped_value` to placeholder strings
-4. **HWPX cellAddr rowAddr corruption**: After row insert/delete, re-index ALL `rowAddr` values.
-5. **HWPX `<hp:pic>`
-6. **HWPML units**: 1/7200 inch
-7. **rowSpan stripping**:
+1. **HWPX namespace stripping**: Python ET strips unused namespace declarations. Restore ALL 14 original xmlns on EVERY root element after `tree.write()`.
+2. **HWPX subList cell wrapping**: ~65% of cells use `<hp:subList>/<hp:p>`. Always check before writing.
+3. **table_content "Pre-filled" bug**: Never set `mapped_value` to placeholder strings. Use `null` with `action: "preserve"`.
+4. **HWPX cellAddr rowAddr corruption**: After row insert/delete, re-index ALL `rowAddr` values.
+5. **HWPX `<hp:pic>` placement**: Must be `<hp:run><hp:pic>...<hp:t/></hp:run>`, not pic as sibling.
+6. **HWPML units**: 1/7200 inch. 1mm ~ 283.46 units. A4 text width ~ 46,648 units.
+7. **rowSpan stripping**: Divide cellSz height by rowSpan when cloning.
 8. **HWPX pic element order**: offset, orgSz, curSz, flip, rotationInfo, renderingInfo, imgRect, imgClip, inMargin, imgDim, hc:img, sz, pos, outMargin.
-9. **
-10. **
+9. **Section content table preservation**: ONLY replace `<w:p>`/`<hp:p>` elements. NEVER remove `<w:tbl>`/`<hp:tbl>`.
+10. **Section range detection**: After deleting tips/instructions, ranges are STALE. Recompute dynamically.
+11. **HWPX post-write safety**: Restore namespaces → fix XML declaration → remove newline between `?>` and root.
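
The unit arithmetic behind pitfall 6 (HWPML units of 1/7200 inch) can be checked with a short sketch. The helper names `mm_to_hwpunit` and `hwpunit_to_mm` are hypothetical; the constants follow from the definition given in that item together with 25.4 mm per inch.

```python
HWPUNITS_PER_INCH = 7200  # one HWPML unit is 1/7200 inch
MM_PER_INCH = 25.4


def mm_to_hwpunit(mm: float) -> float:
    """Convert millimetres to HWPML units."""
    return mm * HWPUNITS_PER_INCH / MM_PER_INCH


def hwpunit_to_mm(units: float) -> float:
    """Convert HWPML units back to millimetres."""
    return units * MM_PER_INCH / HWPUNITS_PER_INCH


one_mm = mm_to_hwpunit(1)            # about 283.46 units, matching the note above
text_width_mm = hwpunit_to_mm(46648)  # about 164.56 mm for the quoted A4 text width
```

Keeping the conversion in one place avoids mixing these units with DOCX EMU values when the same image-sizing logic serves both formats.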
@@ -0,0 +1,147 @@
+# Section Content Range Detection (DOCX)
+
+## Problem
+
+Same as HWPX: after deleting instruction text (※ paragraphs) and tip boxes, the child indices from `analysis.json` become stale. Using stale indices destroys tables and other structural elements.
+
+**Additionally**, DOCX section content ranges may contain embedded `<w:tbl>` elements (schedule tables, budget tables) that must NEVER be replaced during section content filling. Unlike HWPX where tables are children of the section root, DOCX tables are direct children of `<w:body>` interspersed with paragraphs.
+
+## Solution: Dynamic Range Detection + Table Preservation
+
+### Step 1: Recompute ranges after cleanup
+
+After deleting instruction text and tip boxes, scan `<w:body>` children for section title markers:
+
+```python
+def find_docx_section_ranges(body, w_ns):
+    """Find section content ranges by locating title markers in w:body.
+
+    Must run AFTER tip/instruction deletion so indices are stable.
+    Returns dict mapping approximate field labels to (start, end) inclusive child index ranges.
+    """
+    children = list(body)
+    markers = {}
+
+    for i, child in enumerate(children):
+        text = ''.join(
+            t.text or '' for t in child.iter(f'{{{w_ns}}}t')
+        ).strip()
+
+        # Section title markers (numbered headings)
+        if '1.' in text and '문제' in text and ('Problem' in text or '필요성' in text):
+            markers['sec1_title'] = i
+        elif '2.' in text and '실현' in text and ('Solution' in text or '개발' in text):
+            markers['sec2_title'] = i
+        elif '3.' in text and '성장' in text and ('Scale' in text or '사업화' in text):
+            markers['sec3_title'] = i
+        elif '4.' in text and '팀' in text and ('Team' in text or '대표자' in text):
+            markers['sec4_title'] = i
+
+        # End markers
+        elif '사업추진' in text and '일정' in text and '협약기간' in text:
+            markers['schedule1'] = i
+        elif '사업추진' in text and '일정' in text and '전체' in text:
+            markers['schedule2'] = i
+        elif '팀 구성' in text and ('구분' in text or '직위' in text or '안' in text):
+            markers['team_table'] = i
+        elif '협력' in text and '기관' in text:
+            markers['partnership'] = i
+
+    # Build ranges: content starts after title + instruction text, ends before next structural element
+    ranges = {}
+    if 'sec1_title' in markers and 'sec2_title' in markers:
+        ranges['sec1'] = (markers['sec1_title'] + 1, markers['sec2_title'] - 1)
+    if 'sec2_title' in markers and 'schedule1' in markers:
+        ranges['sec2'] = (markers['sec2_title'] + 1, markers['schedule1'] - 1)
+    if 'sec3_title' in markers and 'schedule2' in markers:
+        ranges['sec3'] = (markers['sec3_title'] + 1, markers['schedule2'] - 1)
+    if 'sec4_title' in markers and 'team_table' in markers:
+        ranges['sec4'] = (markers['sec4_title'] + 1, markers['team_table'] - 1)
+
+    return ranges
+```
+
+### Step 2: CRITICAL — Only replace paragraphs, never tables
+
+When filling section content within the detected range, ONLY operate on `<w:p>` elements. **Skip all other element types**:
+
+```python
+W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
+
+def fill_docx_section_content(body, start_idx, end_idx, new_paragraphs):
+    """Replace paragraph content within a section range, preserving tables.
+
+    RULE: Only remove/replace <w:p> elements. NEVER touch <w:tbl>, <w:bookmarkStart>,
+    <w:bookmarkEnd>, <w:sectPr>, or any non-paragraph elements.
+    """
+    children = list(body)
+
+    # Phase 1: Identify which children to remove (paragraphs only)
+    to_remove = []
+    preserved_elements = []  # (index, element) pairs for tables etc.
+
+    for i in range(start_idx, min(end_idx + 1, len(children))):
+        child = children[i]
+        tag = child.tag.split('}')[-1] if '}' in child.tag else child.tag
+
+        if tag == 'p':
+            to_remove.append(child)
+        else:
+            # Tables, bookmarks, sectPr — preserve in their position
+            preserved_elements.append((i, child))
+
+    # Phase 2: Remove old paragraphs
+    for elem in to_remove:
+        body.remove(elem)
+
+    # Phase 3: Insert new paragraphs at the start of the range
+    # (preserved tables remain in place)
+    insert_point = start_idx
+    for new_p in new_paragraphs:
+        body.insert(insert_point, new_p)
+        insert_point += 1
+```
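
A toy run can make the paragraph-only rule concrete. The sketch below is a compact re-statement of the same idea rather than the function from the diff: `replace_paragraphs_only`, `make_p`, and `summarize` are hypothetical helpers, and the five-child body is invented for illustration. It shows that a `<w:tbl>` sitting inside the replaced range is still present afterwards.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def w(tag: str) -> str:
    """Qualify a local tag name with the WordprocessingML namespace."""
    return f"{{{W_NS}}}{tag}"


def make_p(text: str) -> ET.Element:
    """Build a minimal <w:p><w:r><w:t>text</w:t></w:r></w:p> element."""
    p = ET.Element(w("p"))
    ET.SubElement(ET.SubElement(p, w("r")), w("t")).text = text
    return p


def replace_paragraphs_only(body, start_idx, end_idx, new_paragraphs):
    """Remove <w:p> children in [start_idx, end_idx]; insert replacements at start_idx."""
    for child in list(body)[start_idx:end_idx + 1]:
        if child.tag == w("p"):  # tables and other elements are skipped
            body.remove(child)
    for offset, new_p in enumerate(new_paragraphs):
        body.insert(start_idx + offset, new_p)


def summarize(body):
    """Return (local tag, text) pairs for each direct child of body."""
    return [
        (child.tag.split("}")[-1], "".join(t.text or "" for t in child.iter(w("t"))))
        for child in body
    ]


# Invented body: title, old paragraph, table, old paragraph, next title.
body = ET.Element(w("body"))
body.append(make_p("1. Title"))
body.append(make_p("old A"))
body.append(ET.Element(w("tbl")))
body.append(make_p("old B"))
body.append(make_p("2. Next title"))

replace_paragraphs_only(body, 1, 3, [make_p("new A"), make_p("new B")])
print(summarize(body))
```

One consequence worth noting: because all new paragraphs are inserted at the start of the range, they end up before any preserved table, even if the old paragraphs were split around it.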
+
+### Why Tables Must Be Preserved
+
+The 예비창업패키지 사업계획서 template has this structure within `<w:body>`:
+
+```
+[19] p: "1. 문제 인식 (Problem)..." — section title
+[20] p: "※ 개발하고자 하는..." — instruction text (delete)
+[21-60] p: section content paragraphs (replace)
+[61] p: "2. 실현 가능성 (Solution)..." — section title
+[62] p: "※ 아이디어를..." — instruction text (delete)
+[63-82] p: section content paragraphs (replace)
+[83] p: "< 사업추진 일정(협약기간 내) >" — schedule heading
+[85] tbl: schedule table ← MUST PRESERVE
+[91] tbl: budget table 1 ← MUST PRESERVE
+[96] tbl: budget table 2 ← MUST PRESERVE
+```
+
+If the filler replaces the entire range including tables, the schedule and budget data is destroyed. The tables are handled separately as `table_content` fields.
+
+### Form Tables vs Section Content
+
+The following tables are NOT section content — they are form-filling tables with individual cell fields:
+
+| Body index | Content | Field type |
+|-----------|---------|------------|
+| 13 | 일반현황 table (창업아이템명, 산출물) | `empty_cell` per cell |
+| 17 | 개요(요약) table (명칭, 범주, etc.) | `empty_cell` per cell |
+| 85 | 사업추진 일정 (협약기간) | `table_content` |
+| 91 | 1단계 정부지원사업비 | `table_content` |
+| 96 | 2단계 정부지원사업비 | `table_content` |
+| 140 | 사업추진 일정 (전체) | `table_content` |
+| 160 | 팀 구성 table | `table_content` |
+| 164 | 협력 기관 table | `table_content` |
+
+The analyzer must classify these as their specific field types. They should NEVER be included in a `section_content` field's range.
+
+## Adapting for Different Templates
+
+Same as the HWPX version — identify section title markers, match by text content, map to field IDs. The key difference for DOCX:
+
+1. Body children are direct `<w:p>` and `<w:tbl>` elements (flat structure)
+2. Tables are interspersed with paragraphs at the same level
+3. The "only replace `<w:p>`" rule is universal and template-independent
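
One way to adapt the marker scan to another template is to move the hard-coded substring tests into a data table. The sketch below is an assumption about how that could be done, not the package's approach: `find_markers`, `element_text`, `make_p`, and the example `rules` are all hypothetical, and each rule is simply a list of substrings that must all appear in a body child's text.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def element_text(element: ET.Element) -> str:
    """Concatenate the text of all <w:t> descendants of element."""
    return "".join(t.text or "" for t in element.iter(f"{{{W_NS}}}t")).strip()


def find_markers(body: ET.Element, rules: dict) -> dict:
    """Map each rule name to the index of the first body child containing all its substrings."""
    markers = {}
    for index, child in enumerate(body):
        text = element_text(child)
        for name, needles in rules.items():
            if name not in markers and all(needle in text for needle in needles):
                markers[name] = index
    return markers


def make_p(text: str) -> ET.Element:
    """Build a minimal paragraph element holding one text run (test helper)."""
    p = ET.Element(f"{{{W_NS}}}p")
    run = ET.SubElement(p, f"{{{W_NS}}}r")
    ET.SubElement(run, f"{{{W_NS}}}t").text = text
    return p


# Invented template: two numbered headings with one content paragraph between them.
body = ET.Element(f"{{{W_NS}}}body")
for line in ["1. Overview", "Some content", "2. Plan"]:
    body.append(make_p(line))

rules = {"sec1_title": ["1.", "Overview"], "sec2_title": ["2.", "Plan"]}
markers = find_markers(body, rules)
sec1_range = (markers["sec1_title"] + 1, markers["sec2_title"] - 1)
```

With the rules held as data, supporting a new form means editing the table of substrings rather than the scanning code, while the paragraph-only replacement rule stays unchanged.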
@@ -113,7 +113,7 @@ Each opportunity is added to the field's `image_opportunities` array:
 - `insertion_point.strategy`: Always `"after_paragraph"` for section content
 - `insertion_point.anchor_text`: Distinctive Korean phrase from the paragraph (used by filler to locate insertion point)
 - `generation_prompt`: English prompt for AI image generation
-- `preset`: Maps to
+- `preset`: Maps to `.claude/skills/dokkit/scripts/source_images.py` preset parameter
 - `content_type`: One of `flowchart`, `diagram`, `data`, `concept`, `infographic`
 - `rationale`: Brief explanation of why an image helps here
 - `dimensions`: Default size — filler may adjust based on content_type