devlyn-cli 0.5.2 → 0.5.4
- package/bin/devlyn.js +1 -0
- package/config/commands/devlyn.team-resolve.md +31 -2
- package/optional-skills/dokkit/ANALYSIS.md +198 -0
- package/optional-skills/dokkit/COMMANDS.md +365 -0
- package/optional-skills/dokkit/DOCX-XML.md +76 -0
- package/optional-skills/dokkit/EXPORT.md +102 -0
- package/optional-skills/dokkit/FILLING.md +377 -0
- package/optional-skills/dokkit/HWPX-XML.md +73 -0
- package/optional-skills/dokkit/IMAGE-SOURCING.md +127 -0
- package/optional-skills/dokkit/INGESTION.md +65 -0
- package/optional-skills/dokkit/SKILL.md +153 -0
- package/optional-skills/dokkit/STATE.md +60 -0
- package/optional-skills/dokkit/references/docx-field-patterns.md +151 -0
- package/optional-skills/dokkit/references/docx-structure.md +58 -0
- package/optional-skills/dokkit/references/field-detection-patterns.md +130 -0
- package/optional-skills/dokkit/references/hwpx-field-patterns.md +461 -0
- package/optional-skills/dokkit/references/hwpx-structure.md +159 -0
- package/optional-skills/dokkit/references/image-opportunity-heuristics.md +121 -0
- package/optional-skills/dokkit/references/image-xml-patterns.md +338 -0
- package/optional-skills/dokkit/references/section-image-interleaving.md +346 -0
- package/optional-skills/dokkit/references/section-range-detection.md +118 -0
- package/optional-skills/dokkit/references/state-schema.md +143 -0
- package/optional-skills/dokkit/references/supported-formats.md +67 -0
- package/optional-skills/dokkit/scripts/compile_hwpx.py +134 -0
- package/optional-skills/dokkit/scripts/detect_fields.py +301 -0
- package/optional-skills/dokkit/scripts/detect_fields_hwpx.py +286 -0
- package/optional-skills/dokkit/scripts/export_pdf.py +99 -0
- package/optional-skills/dokkit/scripts/parse_hwpx.py +185 -0
- package/optional-skills/dokkit/scripts/parse_image_with_gemini.py +159 -0
- package/optional-skills/dokkit/scripts/parse_xlsx.py +98 -0
- package/optional-skills/dokkit/scripts/source_images.py +365 -0
- package/optional-skills/dokkit/scripts/validate_docx.py +142 -0
- package/optional-skills/dokkit/scripts/validate_hwpx.py +281 -0
- package/optional-skills/dokkit/scripts/validate_state.py +132 -0
- package/package.json +1 -1
|
@@ -0,0 +1,127 @@
|
|
|
1
|
+
# Image Sourcing
|
|
2
|
+
|
|
3
|
+
Strategies for sourcing images to fill template fields requiring photos, logos, signatures, or illustrations.
|
|
4
|
+
|
|
5
|
+
## Image Types
|
|
6
|
+
|
|
7
|
+
| image_type | Description | Auto-generate? |
|
|
8
|
+
|-----------|-------------|----------------|
|
|
9
|
+
| `photo` | ID/profile pictures | No — user-provided only |
|
|
10
|
+
| `logo` | Company logos | No — user-provided only |
|
|
11
|
+
| `signature` | Signature fields | NEVER — must be user-provided |
|
|
12
|
+
| `figure` | Illustrations, diagrams | Yes — auto-generated during fill |
|
|
13
|
+
|
|
14
|
+
## Sourcing Priority
|
|
15
|
+
|
|
16
|
+
### 1. Check Ingested Sources
|
|
17
|
+
Search `.dokkit/sources/` for image files (PNG, JPG, JPEG, BMP, TIFF):
|
|
18
|
+
- Match by field's `image_type` and source metadata
|
|
19
|
+
- Set `image_source: "ingested"` and `image_file` to the path
|
|
20
|
+
|
|
21
|
+
### 2. User-Provided File
|
|
22
|
+
Via `/dokkit modify "use <file>"`:
|
|
23
|
+
- Search `.dokkit/sources/`, then project root
|
|
24
|
+
- Copy to `.dokkit/images/`
|
|
25
|
+
|
|
26
|
+
### 3. AI Generation
|
|
27
|
+
```bash
python scripts/source_images.py generate \
  --prompt "인포그래픽: AI 감정 케어 플랫폼 4단계 로드맵" \
  --preset infographic \
  --output-dir .dokkit/images/ \
  --project-dir . \
  --lang ko
```
|
|
35
|
+
Parse `__RESULT__` JSON from stdout: `{"image_id": "...", "file_path": "...", "source_type": "generated"}`
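
A minimal sketch of that parsing step, under one stated assumption: the script prints the marker and the JSON object on the same line (for example `__RESULT__ {...}`). The source only names the marker, so the exact layout, the helper name `parse_result`, and the tolerated separators are hypothetical.

```python
import json

RESULT_MARKER = "__RESULT__"  # marker named in the text; same-line layout is an assumption


def parse_result(stdout: str) -> dict:
    """Return the JSON object that follows the last __RESULT__ marker in stdout."""
    for line in reversed(stdout.splitlines()):
        idx = line.find(RESULT_MARKER)
        if idx == -1:
            continue
        # Drop the marker plus any separator characters before the JSON payload
        payload = line[idx + len(RESULT_MARKER):].lstrip(" :=\t")
        return json.loads(payload)
    raise ValueError("no __RESULT__ line found in script output")
```

Scanning from the end means progress logs printed before the result line do not interfere; a missing marker raises instead of returning a default, in line with the skill's rule against silent fallbacks.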
|
|
36
|
+
|
|
37
|
+
#### Language Options (`--lang`)
|
|
38
|
+
|
|
39
|
+
| Value | Behavior | Example |
|
|
40
|
+
|---|---|---|
|
|
41
|
+
| `ko` | **Default.** All text in Korean only. English strictly forbidden. | 제목, 라벨, 설명 모두 한국어 |
|
|
42
|
+
| `en` | All text in English only. | Titles, labels, descriptions in English |
|
|
43
|
+
| `ko+en` | Mixed. Titles in Korean, technical terms may use English. | 제목은 한국어, Node.js 등 기술 용어는 영어 허용 |
|
|
44
|
+
| `ja` | All text in Japanese only. | 日本語のみ |
|
|
45
|
+
| `<code>` | Any ISO 639-1 code. | `zh`, `es`, `fr`, `de`, `pt` |
|
|
46
|
+
| `<a>+<b>` | Mixed: primary + secondary language. | `ko+ja`, `en+ko` |
|
|
47
|
+
|
|
48
|
+
#### Presets
|
|
49
|
+
|
|
50
|
+
| Preset | Style | Default Aspect Ratio |
|
|
51
|
+
|---|---|---|
|
|
52
|
+
| `technical_illustration` | Clean diagrams, labeled components | 16:9 |
|
|
53
|
+
| `infographic` | Icon-based, corporate color palette | 16:9 |
|
|
54
|
+
| `photorealistic` | High-quality, natural lighting | 4:3 |
|
|
55
|
+
| `concept` | Abstract/modern, business proposal style | 1:1 |
|
|
56
|
+
| `chart` | Clean data visualization | 16:9 |
|
|
57
|
+
|
|
58
|
+
Use `--aspect-ratio 16:9` to override. Use `--no-enhance` to skip preset style injection (language instruction still applies).
|
|
59
|
+
|
|
60
|
+
**Model**: `gemini-3-pro-image-preview` (nano-banana). Best for accurate text rendering in non-Latin scripts.
|
|
61
|
+
|
|
62
|
+
### 4. Web Search
|
|
63
|
+
```bash
python scripts/source_images.py search \
  --query "company logo example" \
  --output-dir .dokkit/images/
```
|
|
68
|
+
Parse `__RESULT__` JSON: `{"image_id": "...", "file_path": "...", "source_type": "searched"}`
|
|
69
|
+
(Note: the `search` subcommand is not yet implemented; for now it directs the user to provide images manually.)
|
|
70
|
+
|
|
71
|
+
## Prompt Templates by Image Type
|
|
72
|
+
|
|
73
|
+
| image_type | Suggested prompt |
|
|
74
|
+
|-----------|-----------------|
|
|
75
|
+
| photo | "Professional ID photo, white background, formal attire" |
|
|
76
|
+
| logo | "Clean company logo, transparent background, modern design" |
|
|
77
|
+
| signature | **NEVER generate** — signatures must be user-provided |
|
|
78
|
+
| figure | Derive from field label and section context |
|
|
79
|
+
|
|
80
|
+
## Section Content Image Generation
|
|
81
|
+
|
|
82
|
+
For `image_opportunities` in `section_content` fields — auto-generated during `/dokkit fill` (decorative/explanatory, not identity-sensitive).
|
|
83
|
+
|
|
84
|
+
### Prompt Templates by Content Type
|
|
85
|
+
|
|
86
|
+
| content_type | preset | Prompt guidance |
|
|
87
|
+
|---|---|---|
|
|
88
|
+
| diagram | `technical_illustration` | "Technical architecture/system diagram showing [concept]. Clean lines, labeled components." |
|
|
89
|
+
| flowchart | `technical_illustration` | "Process flowchart showing [steps]. Left-to-right flow, clear arrows, numbered steps." |
|
|
90
|
+
| data | `infographic` | "Data visualization showing [metric/trend]. Clean chart style, professional colors." |
|
|
91
|
+
| concept | `technical_illustration` | "Conceptual illustration of [idea]. Abstract/modern style, suitable for business proposal." |
|
|
92
|
+
| infographic | `infographic` | "Infographic comparing [items]. Icon-based, clean layout, corporate color palette." |
|
|
93
|
+
|
|
94
|
+
### Dimension Defaults
|
|
95
|
+
|
|
96
|
+
HWPML units: 1/7200 inch (~283.46 units/mm). ~77% of A4 text width = 36,000 units.
|
|
97
|
+
|
|
98
|
+
| content_type | HWPML w x h | Approx mm | EMU cx x cy |
|
|
99
|
+
|---|---|---|---|
|
|
100
|
+
| diagram | 36,000 x 24,000 | 127x85 | 4,572,000 x 3,048,000 |
|
|
101
|
+
| flowchart | 36,000 x 24,000 | 127x85 | 4,572,000 x 3,048,000 |
|
|
102
|
+
| data | 36,000 x 20,000 | 127x71 | 4,572,000 x 2,540,000 |
|
|
103
|
+
| concept | 28,000 x 28,000 | 99x99 | 3,556,000 x 3,556,000 |
|
|
104
|
+
| infographic | 36,000 x 24,000 | 127x85 | 4,572,000 x 3,048,000 |
|
|
105
|
+
|
|
106
|
+
## Default Cell-Level Dimensions
|
|
107
|
+
|
|
108
|
+
| image_type | Width (mm) | Height (mm) | Width (EMU) | Height (EMU) |
|
|
109
|
+
|-----------|-----------|------------|------------|-------------|
|
|
110
|
+
| photo | 35 | 45 | 1,260,000 | 1,620,000 |
|
|
111
|
+
| logo | 50 | 50 | 1,800,000 | 1,800,000 |
|
|
112
|
+
| signature | 40 | 15 | 1,440,000 | 540,000 |
|
|
113
|
+
| figure | 100 | 75 | 3,600,000 | 2,700,000 |
|
|
114
|
+
|
|
115
|
+
Conversion: 1 mm = 36,000 EMU. For HWPX, use HWPML unit system.
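
The unit relationships used in the tables above can be checked with a short sketch. The constants follow from the stated definitions (1 mm = 36,000 EMU, HWPML unit = 1/7200 inch, 1 inch = 25.4 mm); the function names are illustrative and not part of the dokkit scripts.

```python
EMU_PER_INCH = 914_400     # standard EMU definition; equals 36,000 EMU per mm
EMU_PER_MM = 36_000
HWPML_PER_INCH = 7_200     # HWPML unit = 1/7200 inch
MM_PER_INCH = 25.4


def mm_to_emu(mm: float) -> int:
    """Millimetres to EMU (DOCX drawing units)."""
    return round(mm * EMU_PER_MM)


def mm_to_hwpml(mm: float) -> int:
    """Millimetres to HWPML units (about 283.46 units per mm)."""
    return round(mm * HWPML_PER_INCH / MM_PER_INCH)


def hwpml_to_mm(units: float) -> float:
    """HWPML units back to millimetres."""
    return units * MM_PER_INCH / HWPML_PER_INCH


def hwpml_to_emu(units: float) -> int:
    """HWPML units to EMU; exactly 127 EMU per HWPML unit."""
    return round(units * EMU_PER_INCH / HWPML_PER_INCH)
```

For example, `hwpml_to_emu(36_000)` reproduces the 4,572,000 EMU width listed for diagrams, and `mm_to_emu(35)` reproduces the 1,260,000 EMU photo width.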
|
|
116
|
+
|
|
117
|
+
## Rules
|
|
118
|
+
|
|
119
|
+
- Signatures MUST be user-provided — never generate or search
|
|
120
|
+
- Never auto-generate/download images without user approval EXCEPT section content images (auto-generated during fill)
|
|
121
|
+
- Ingested images can be inserted automatically
|
|
122
|
+
- Prefer user-provided over generated
|
|
123
|
+
- Image format must be PNG or JPG (compatible with both DOCX and HWPX)
|
|
124
|
+
|
|
125
|
+
## References
|
|
126
|
+
|
|
127
|
+
See `references/image-xml-patterns.md` for complete DOCX/HWPX image element structures, registration patterns, and the `build_hwpx_pic_element()` function.
|
|
@@ -0,0 +1,65 @@
|
|
|
1
|
+
# Ingestion Knowledge
|
|
2
|
+
|
|
3
|
+
Parsing strategies and format routing for converting source documents into the dual-file format (Markdown content + JSON sidecar).
|
|
4
|
+
|
|
5
|
+
## Format Routing
|
|
6
|
+
|
|
7
|
+
| Format | Parser | Command |
|
|
8
|
+
|--------|--------|---------|
|
|
9
|
+
| PDF | Docling | `python -m docling <file> --to md` |
|
|
10
|
+
| DOCX | Docling | `python -m docling <file> --to md` |
|
|
11
|
+
| PPTX | Docling | `python -m docling <file> --to md` |
|
|
12
|
+
| HTML | Docling | `python -m docling <file> --to md` |
|
|
13
|
+
| CSV | Docling | `python -m docling <file> --to md` |
|
|
14
|
+
| MD | Direct copy | Read and process as-is |
|
|
15
|
+
| XLSX | Custom | `python .claude/skills/dokkit/scripts/parse_xlsx.py` |
|
|
16
|
+
| HWPX | Custom | `python .claude/skills/dokkit/scripts/parse_hwpx.py` |
|
|
17
|
+
| JSON | Custom | Read, format as structured markdown |
|
|
18
|
+
| TXT | Custom | Read, wrap as markdown |
|
|
19
|
+
| PNG/JPG | Gemini Vision | `python .claude/skills/dokkit/scripts/parse_image_with_gemini.py` |
|
|
20
|
+
|
|
21
|
+
## Docling Usage
|
|
22
|
+
|
|
23
|
+
Primary parser for most formats:
|
|
24
|
+
|
|
25
|
+
```bash
|
|
26
|
+
python -m docling <input-file> --to md --output <output-dir>
|
|
27
|
+
```
|
|
28
|
+
|
|
29
|
+
After Docling runs:
|
|
30
|
+
1. Read the markdown output
|
|
31
|
+
2. Extract key-value pairs from the content
|
|
32
|
+
3. Build the JSON sidecar with metadata
|
|
33
|
+
4. Move files to `.dokkit/sources/`
|
|
34
|
+
|
|
35
|
+
If Docling is not installed, show an explicit error with install instructions: `pip install docling`. Do NOT silently fall back to a different parser.
|
|
36
|
+
|
|
37
|
+
## Custom Parser Output Format
|
|
38
|
+
|
|
39
|
+
All custom parsers output JSON to stdout:
|
|
40
|
+
```json
|
|
41
|
+
{
|
|
42
|
+
"content_md": "# Document Title\n\nExtracted content...",
|
|
43
|
+
"metadata": {
|
|
44
|
+
"file_name": "original.xlsx",
|
|
45
|
+
"file_type": "xlsx",
|
|
46
|
+
"parse_date": "2026-02-07T12:00:00Z",
|
|
47
|
+
"key_value_pairs": { "Name": "John", "Date": "2026-01-15" },
|
|
48
|
+
"sections": ["Sheet1", "Sheet2"]
|
|
49
|
+
}
|
|
50
|
+
}
|
|
51
|
+
```
|
|
52
|
+
|
|
53
|
+
## Key-Value Extraction
|
|
54
|
+
|
|
55
|
+
After parsing, scan content for structured data:
|
|
56
|
+
- Table cells with label-value patterns (e.g., "Name: John Doe")
|
|
57
|
+
- Form fields with values
|
|
58
|
+
- Metadata headers
|
|
59
|
+
- Labeled sections
|
|
60
|
+
|
|
61
|
+
Store in the JSON sidecar's `key_value_pairs` field for fast lookup during template filling.
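
As a rough illustration of the scan described above, the sketch below collects `Label: value` lines and two-column Markdown table rows into a dictionary shaped like `key_value_pairs`. The regular expression, the 40-character label limit, and the rule that a table header row is discarded are assumptions made for the example, not the ingestor's actual heuristics.

```python
import re

# "Label: value" lines, optionally written as a list item; the label length cap is arbitrary
_LABEL_VALUE = re.compile(r"^\s*(?:[-*]\s+)?([^:|#]{1,40}?)\s*:\s+(\S.*)$")


def extract_key_value_pairs(content_md: str) -> dict:
    """Collect label/value pairs from parsed Markdown content."""
    pairs = {}
    last_table_key = None  # key added by the previous table row, if any
    for line in content_md.splitlines():
        stripped = line.strip()
        if stripped.startswith("|"):
            cells = [c.strip() for c in stripped.strip("|").split("|")]
            if all(c and set(c) <= set("-:") for c in cells):
                # Separator row: the row just before it was a header, not data
                if last_table_key is not None:
                    pairs.pop(last_table_key, None)
                last_table_key = None
                continue
            last_table_key = None
            if len(cells) == 2 and all(cells) and cells[0] not in pairs:
                pairs[cells[0]] = cells[1]
                last_table_key = cells[0]
            continue
        last_table_key = None
        match = _LABEL_VALUE.match(line)
        if match:
            label = match.group(1).strip(" *")
            pairs.setdefault(label, match.group(2).strip())
    return pairs
```

The first occurrence of a label wins, so a later duplicate does not silently overwrite an earlier value.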
|
|
62
|
+
|
|
63
|
+
## References
|
|
64
|
+
|
|
65
|
+
See `references/supported-formats.md` for detailed format specifications.
|
|
@@ -0,0 +1,153 @@
|
|
|
1
|
+
---
name: dokkit
description: >
  Document template filling system for DOCX and HWPX formats.
  Ingests source documents, analyzes templates, detects fillable fields,
  fills them surgically using source data, reviews with confidence scoring,
  and exports completed documents. Supports Korean and English templates.
  Subcommands: init, sources, preview, ingest, fill, fill-doc, modify, review, export.
  Use when user says "fill template", "fill document", "ingest", "dokkit".
user-invocable: true
allowed-tools: Read, Write, Edit, Bash, Glob, Grep, Agent
argument-hint: "<subcommand> [arguments]"
context:
  - type: file
    path: ${CLAUDE_SKILL_DIR}/COMMANDS.md
---
|
|
17
|
+
|
|
18
|
+
# Dokkit — Document Template Filling System
|
|
19
|
+
|
|
20
|
+
Surgical document filling for DOCX and HWPX templates using ingested source data. One command with 9 subcommands covering the full document filling lifecycle.
|
|
21
|
+
|
|
22
|
+
## Subcommands
|
|
23
|
+
|
|
24
|
+
| Subcommand | Arguments | Type | Description |
|
|
25
|
+
|------------|-----------|------|-------------|
|
|
26
|
+
| `init` | `[--force] [--keep-sources]` | Inline | Initialize or reset workspace |
|
|
27
|
+
| `sources` | — | Inline | Display ingested sources dashboard |
|
|
28
|
+
| `preview` | — | Inline | Generate PDF preview via LibreOffice |
|
|
29
|
+
| `ingest` | `<file1> [file2] ...` | Agent | Parse source documents into workspace |
|
|
30
|
+
| `fill` | `<template.docx\|hwpx>` | Agent | End-to-end: analyze, fill, review, auto-fix, export |
|
|
31
|
+
| `fill-doc` | `<template.docx\|hwpx>` | Agent | Analyze template and fill fields only |
|
|
32
|
+
| `modify` | `"<instruction>"` | Agent | Apply targeted changes to filled document |
|
|
33
|
+
| `review` | `[section\|approve]` | Agent | Review with per-field confidence annotations |
|
|
34
|
+
| `export` | `<docx\|hwpx\|pdf>` | Agent | Export filled document to format |
|
|
35
|
+
|
|
36
|
+
## Routing
|
|
37
|
+
|
|
38
|
+
Parse `$ARGUMENTS` to determine the subcommand:
|
|
39
|
+
|
|
40
|
+
1. Extract `$1` as the subcommand name
|
|
41
|
+
2. Pass remaining arguments (`$2`, `$3`, ...) to the subcommand
|
|
42
|
+
3. If `$1` is empty or unrecognized, display the subcommand table above with usage examples
|
|
43
|
+
|
|
44
|
+
Full workflows for each subcommand are in COMMANDS.md (auto-loaded via context).
|
|
45
|
+
|
|
46
|
+
<example>
|
|
47
|
+
- `/dokkit ingest docs/resume.pdf docs/transcript.xlsx` — ingest two sources
|
|
48
|
+
- `/dokkit fill docs/template.hwpx` — end-to-end fill pipeline
|
|
49
|
+
- `/dokkit modify "Change the phone number to 010-1234-5678"` — targeted change
|
|
50
|
+
- `/dokkit export pdf` — export as PDF
|
|
51
|
+
</example>
|
|
52
|
+
|
|
53
|
+
## Architecture
|
|
54
|
+
|
|
55
|
+
### Agents
|
|
56
|
+
|
|
57
|
+
| Agent | Model | Role |
|
|
58
|
+
|-------|-------|------|
|
|
59
|
+
| **dokkit-ingestor** | opus | Parse source docs into `.dokkit/sources/` (.md + .json pairs) |
|
|
60
|
+
| **dokkit-analyzer** | opus | Analyze templates, detect fields, map to sources. Writes `analysis.json`. READ-ONLY on templates. |
|
|
61
|
+
| **dokkit-filler** | opus | Surgical XML modification using analysis.json. Three modes: fill, modify, review. |
|
|
62
|
+
| **dokkit-exporter** | sonnet | Repackage ZIP archives, PDF conversion via LibreOffice. |
|
|
63
|
+
|
|
64
|
+
### Workspace
|
|
65
|
+
|
|
66
|
+
All agents communicate via the `.dokkit/` filesystem:
|
|
67
|
+
|
|
68
|
+
```
|
|
69
|
+
.dokkit/
|
|
70
|
+
├── state.json # Single source of truth for session state
|
|
71
|
+
├── sources/ # Ingested content (.md + .json pairs)
|
|
72
|
+
├── analysis.json # Template analysis output (from analyzer)
|
|
73
|
+
├── images/ # Sourced images for template filling
|
|
74
|
+
├── template_work/ # Unpacked template XML (working copy)
|
|
75
|
+
└── output/ # Exported filled documents
|
|
76
|
+
```
|
|
77
|
+
|
|
78
|
+
### State Protocol
|
|
79
|
+
|
|
80
|
+
Read `.dokkit/state.json` before any operation. Write state changes atomically: read current → update fields → write back → validate.
|
|
81
|
+
|
|
82
|
+
```
|
|
83
|
+
init → state created (empty)
|
|
84
|
+
ingest → source added to sources[]
|
|
85
|
+
fill/fill-doc → template set, analysis created, filled_document created
|
|
86
|
+
modify → filled_document updated
|
|
87
|
+
review approve → filled_document.status = "finalized"
|
|
88
|
+
export → export entry added to exports[]
|
|
89
|
+
```
|
|
90
|
+
|
|
91
|
+
Validate after every write: `python ${CLAUDE_SKILL_DIR}/scripts/validate_state.py .dokkit/state.json`
|
|
92
|
+
|
|
93
|
+
### Knowledge Files
|
|
94
|
+
|
|
95
|
+
Agent-facing knowledge bases in this skill directory:
|
|
96
|
+
|
|
97
|
+
| File | Purpose | Agents |
|
|
98
|
+
|------|---------|--------|
|
|
99
|
+
| `STATE.md` | State schema and management protocol | All |
|
|
100
|
+
| `INGESTION.md` | Format routing and parsing strategies | dokkit-ingestor |
|
|
101
|
+
| `ANALYSIS.md` | Field detection, confidence scoring, output schema | dokkit-analyzer |
|
|
102
|
+
| `FILLING.md` | XML surgery rules, matching strategy, image insertion | dokkit-analyzer, dokkit-filler |
|
|
103
|
+
| `DOCX-XML.md` | Open XML structure for DOCX documents | dokkit-analyzer, dokkit-filler |
|
|
104
|
+
| `HWPX-XML.md` | OWPML structure for HWPX documents | dokkit-analyzer, dokkit-filler |
|
|
105
|
+
| `IMAGE-SOURCING.md` | Image generation, search, and insertion patterns | dokkit-filler |
|
|
106
|
+
| `EXPORT.md` | Document compilation and format conversion | dokkit-exporter |
|
|
107
|
+
|
|
108
|
+
Deep reference material in `references/`:
|
|
109
|
+
- `state-schema.md` — Complete state.json schema
|
|
110
|
+
- `supported-formats.md` — Detailed format specifications
|
|
111
|
+
- `docx-structure.md`, `docx-field-patterns.md` — DOCX patterns
|
|
112
|
+
- `hwpx-structure.md`, `hwpx-field-patterns.md` — HWPX patterns (10 detection patterns)
|
|
113
|
+
- `field-detection-patterns.md` — Advanced heuristics (9 DOCX + 6 HWPX)
|
|
114
|
+
- `section-range-detection.md` — Dynamic range detection for section_content
|
|
115
|
+
- `section-image-interleaving.md` — Image interleaving algorithm
|
|
116
|
+
- `image-opportunity-heuristics.md` — AI image opportunity detection
|
|
117
|
+
- `image-xml-patterns.md` — Image element structures (DOCX + HWPX)
|
|
118
|
+
|
|
119
|
+
Scripts in `scripts/`:
|
|
120
|
+
- `validate_state.py` — State validation
|
|
121
|
+
- `parse_xlsx.py`, `parse_hwpx.py`, `parse_image_with_gemini.py` — Custom parsers
|
|
122
|
+
- `detect_fields.py`, `detect_fields_hwpx.py` — Field detection
|
|
123
|
+
- `validate_docx.py`, `validate_hwpx.py` — Document validation
|
|
124
|
+
- `compile_hwpx.py` — HWPX repackaging
|
|
125
|
+
- `export_pdf.py` — PDF conversion
|
|
126
|
+
|
|
127
|
+
## Rules
|
|
128
|
+
|
|
129
|
+
<rules>
|
|
130
|
+
- Display errors clearly with actionable guidance. Never silently fall back to defaults.
|
|
131
|
+
- Original template is never modified — copies go to `.dokkit/template_work/`.
|
|
132
|
+
- Analyzer is read-only on templates. Only the filler modifies XML.
|
|
133
|
+
- Confidence levels: high, medium, low (not numeric scores).
|
|
134
|
+
- Signatures must be user-provided — never auto-generate them.
|
|
135
|
+
- Validate state after every write with `scripts/validate_state.py`.
|
|
136
|
+
- Inline commands (init, sources, preview) execute directly — do NOT spawn agents.
|
|
137
|
+
- Agent-delegated commands spawn the appropriate agent(s) sequentially.
|
|
138
|
+
</rules>
|
|
139
|
+
|
|
140
|
+
## Known Pitfalls
|
|
141
|
+
|
|
142
|
+
Critical issues discovered through production use:
|
|
143
|
+
|
|
144
|
+
1. **HWPX namespace stripping**: Python ET strips unused namespace declarations. Restore ALL 14 original xmlns on EVERY root element after any `tree.write()`. Applies to section0.xml, content.hpf, header.xml.
|
|
145
|
+
2. **HWPX subList cell wrapping**: ~65% of cells wrap content in `<hp:subList>/<hp:p>`. Check for subList before writing content.
|
|
146
|
+
3. **table_content "Pre-filled" bug**: Never set `mapped_value` to placeholder strings for `table_content` fields. Use `mapped_value: null` with `action: "preserve"`.
|
|
147
|
+
4. **HWPX cellAddr rowAddr corruption**: After row insert/delete, re-index ALL `rowAddr` values. Duplicate rowAddr causes silent data loss.
|
|
148
|
+
5. **HWPX `<hp:pic>` inside `<hp:run>`**: Pic as sibling of run renders invisible. Must be `<hp:run><hp:pic>...<hp:t/></hp:run>`.
|
|
149
|
+
6. **HWPML units**: 1/7200 inch, NOT hundredths of mm. 1mm ~ 283.46 units. A4 text width ~ 46,648 units.
|
|
150
|
+
7. **rowSpan stripping**: When cloning rows with rowSpan>1, divide cellSz height by rowSpan.
|
|
151
|
+
8. **HWPX pic element order**: offset, orgSz, curSz, flip, rotationInfo, renderingInfo, imgRect, imgClip, inMargin, imgDim, hc:img, sz, pos, outMargin.
|
|
152
|
+
9. **HWPX post-write safety**: After ET write: (a) restore namespaces, (b) fix XML declaration to double quotes with `standalone="yes"`, (c) remove newline between `?>` and `<root>`.
|
|
153
|
+
10. **compile_hwpx.py skip .bak**: Backup files must be excluded from ZIP repackaging.
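
Pitfalls 1 and 9 can be handled as one text-level pass after `tree.write()`. The sketch below is an illustration under stated assumptions, not the skill's implementation: it assumes the original file's root tag uses double-quoted attributes, that attribute values contain no `>`, and that prefixes were registered so ElementTree keeps their names. The function name `restore_hwpx_header` is hypothetical.

```python
import re

_XMLNS = re.compile(r'\s(xmlns(?::[\w.-]+)?)="([^"]*)"')
_ROOT_TAG = re.compile(r"<(?![?!])[^>]*>")  # first start tag that is not <?...?> or <!...>
_DECLARATION = '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'


def restore_hwpx_header(original_xml: str, written_xml: str) -> str:
    """Re-add dropped xmlns declarations and normalize the XML declaration."""
    original_root = _ROOT_TAG.search(original_xml).group(0)
    declared = _XMLNS.findall(original_root)          # [(name, uri), ...] in original order

    # (b) + (c): drop whatever declaration ET wrote, including the newline after it
    body = re.sub(r"^\s*<\?xml[^>]*\?>\s*", "", written_xml, count=1).lstrip()

    match = _ROOT_TAG.search(body)
    root = match.group(0)
    present = {name for name, _ in _XMLNS.findall(root)}

    # (a): append every declaration that ET stripped from the root element
    missing = "".join(f' {name}="{uri}"' for name, uri in declared if name not in present)
    end = -2 if root.endswith("/>") else -1
    new_root = root[:end] + missing + root[end:]

    return _DECLARATION + body[:match.start()] + new_root + body[match.end():]
```

Working on the serialized text avoids a second parse, which would strip the unused declarations again. The same pass would be applied to each file named in pitfall 1.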
|
|
@@ -0,0 +1,60 @@
|
|
|
1
|
+
# State Management
|
|
2
|
+
|
|
3
|
+
Protocol for reading and writing `.dokkit/state.json`. All agents follow this protocol.
|
|
4
|
+
|
|
5
|
+
## Workspace Structure
|
|
6
|
+
|
|
7
|
+
```
|
|
8
|
+
.dokkit/
|
|
9
|
+
├── state.json # Single source of truth for session state
|
|
10
|
+
├── sources/ # Ingested source content
|
|
11
|
+
│ ├── <name>.md # Extracted content (LLM-optimized markdown)
|
|
12
|
+
│ └── <name>.json # Structured metadata sidecar
|
|
13
|
+
├── analysis.json # Template analysis output (from analyzer)
|
|
14
|
+
├── images/ # Sourced images
|
|
15
|
+
├── template_work/ # Unpacked template XML (working copy)
|
|
16
|
+
│ ├── word/ # (DOCX) or Contents/ (HWPX)
|
|
17
|
+
│ └── ...
|
|
18
|
+
└── output/ # Exported filled documents
|
|
19
|
+
└── filled_<name>.<ext>
|
|
20
|
+
```
|
|
21
|
+
|
|
22
|
+
## Reading State
|
|
23
|
+
|
|
24
|
+
Read `.dokkit/state.json` before any operation. Check:
|
|
25
|
+
- `sources` array for available context
|
|
26
|
+
- `template` for current template info
|
|
27
|
+
- `analysis` for field mapping data
|
|
28
|
+
- `filled_document` for current document status
|
|
29
|
+
|
|
30
|
+
## Writing State
|
|
31
|
+
|
|
32
|
+
After any mutation:
|
|
33
|
+
1. Read current state.json (avoid overwriting concurrent changes)
|
|
34
|
+
2. Update only the relevant fields
|
|
35
|
+
3. Write the full state back
|
|
36
|
+
4. Validate: `python .claude/skills/dokkit/scripts/validate_state.py .dokkit/state.json`
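
Steps 1 to 3 can be sketched as a single helper. This is one possible approach, assuming a temp-file-plus-rename strategy for the write; the source does not say how the write is made atomic, and `update_state` is a hypothetical name.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Callable


def update_state(state_path, mutate: Callable[[dict], None]) -> dict:
    """Read state.json, apply `mutate` in place, and write the full state back."""
    path = Path(state_path)
    state = json.loads(path.read_text(encoding="utf-8"))   # 1. read current state
    mutate(state)                                           # 2. update only relevant fields

    # 3. write the full state to a temp file, then swap it in with one rename
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(state, handle, ensure_ascii=False, indent=2)
        os.replace(tmp_name, path)
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
    return state
```

A caller might use `update_state(".dokkit/state.json", lambda s: s["sources"].append(entry))` and then run the validator command from step 4. The rename keeps readers from seeing a half-written file, but it does not lock against two writers racing between read and write.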
|
|
37
|
+
|
|
38
|
+
## State Transitions
|
|
39
|
+
|
|
40
|
+
```
|
|
41
|
+
/dokkit init → state created (empty)
|
|
42
|
+
/dokkit ingest → source added to sources[]
|
|
43
|
+
/dokkit fill or fill-doc → template set, analysis created, filled_document created
|
|
44
|
+
/dokkit modify → filled_document updated
|
|
45
|
+
/dokkit review approve → filled_document.status = "finalized"
|
|
46
|
+
/dokkit export → export entry added to exports[]
|
|
47
|
+
```
|
|
48
|
+
|
|
49
|
+
## Validation
|
|
50
|
+
|
|
51
|
+
The validator checks:
|
|
52
|
+
- Schema conformance
|
|
53
|
+
- Required fields present
|
|
54
|
+
- Valid status values
|
|
55
|
+
- Source file references exist
|
|
56
|
+
- No orphaned entries
|
|
57
|
+
|
|
58
|
+
## References
|
|
59
|
+
|
|
60
|
+
See `references/state-schema.md` for the complete schema definition.
|
|
@@ -0,0 +1,151 @@
|
|
|
1
|
+
# DOCX Field Detection Patterns
|
|
2
|
+
|
|
3
|
+
## Pattern 1: Placeholder Text
|
|
4
|
+
|
|
5
|
+
```xml
|
|
6
|
+
<!-- Text like {{name}} or <<name>> in a run -->
|
|
7
|
+
<w:r>
|
|
8
|
+
<w:rPr>
|
|
9
|
+
<w:rFonts w:ascii="Arial" w:hAnsi="Arial"/>
|
|
10
|
+
<w:sz w:val="20"/>
|
|
11
|
+
</w:rPr>
|
|
12
|
+
<w:t>{{full_name}}</w:t> <!-- REPLACE this text content -->
|
|
13
|
+
</w:r>
|
|
14
|
+
```
|
|
15
|
+
|
|
16
|
+
**Action**: Replace the text content of `<w:t>` while preserving `<w:rPr>`.
|
|
17
|
+
|
|
18
|
+
## Pattern 2: Empty Table Cell
|
|
19
|
+
|
|
20
|
+
```xml
|
|
21
|
+
<w:tr>
|
|
22
|
+
<w:tc>
|
|
23
|
+
<w:p><w:r><w:t>Name</w:t></w:r></w:p> <!-- Label cell -->
|
|
24
|
+
</w:tc>
|
|
25
|
+
<w:tc>
|
|
26
|
+
<w:p/> <!-- Empty cell → FILL THIS -->
|
|
27
|
+
</w:tc>
|
|
28
|
+
</w:tr>
|
|
29
|
+
```
|
|
30
|
+
|
|
31
|
+
**Action**: Insert `<w:r><w:t>value</w:t></w:r>` into the empty `<w:p>`. Copy `<w:rPr>` from the label cell's run to match formatting.
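
A minimal sketch of this action with `xml.etree.ElementTree`, assuming the caller has already identified the label cell and the empty value cell. The helper name `fill_empty_cell` is illustrative.

```python
import copy
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
ET.register_namespace("w", W)


def fill_empty_cell(label_cell, value_cell, value):
    """Append a run with `value` to the value cell, reusing the label run's formatting."""
    para = value_cell.find(f"{{{W}}}p")
    if para is None:
        raise ValueError("value cell has no <w:p> to fill")
    run = ET.SubElement(para, f"{{{W}}}r")

    # Copy <w:rPr> from the first run of the label cell, if it has one
    for label_run in label_cell.iter(f"{{{W}}}r"):
        rpr = label_run.find(f"{{{W}}}rPr")
        if rpr is not None:
            run.append(copy.deepcopy(rpr))   # deep copy so the label keeps its own rPr
        break

    text = ET.SubElement(run, f"{{{W}}}t")
    text.text = value
    return run
```

The copied `<w:rPr>` is placed before `<w:t>`, which is the order Word expects inside a run. Copied formatting may still need the red-color check described later in this file.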
|
|
32
|
+
|
|
33
|
+
## Pattern 3: Underline Placeholder
|
|
34
|
+
|
|
35
|
+
```xml
|
|
36
|
+
<w:r>
|
|
37
|
+
<w:rPr>
|
|
38
|
+
<w:u w:val="single"/>
|
|
39
|
+
</w:rPr>
|
|
40
|
+
<w:t xml:space="preserve"> </w:t> <!-- Spaces with underline -->
|
|
41
|
+
</w:r>
|
|
42
|
+
```
|
|
43
|
+
|
|
44
|
+
**Action**: Replace the spaces in `<w:t>` with the actual value. Keep `<w:u>` in `<w:rPr>`.
|
|
45
|
+
|
|
46
|
+
## Pattern 4: Content Control
|
|
47
|
+
|
|
48
|
+
```xml
|
|
49
|
+
<w:sdt>
|
|
50
|
+
<w:sdtPr>
|
|
51
|
+
<w:alias w:val="Company Name"/>
|
|
52
|
+
<w:tag w:val="company"/>
|
|
53
|
+
<w:showingPlcHdr/> <!-- Indicates placeholder is showing -->
|
|
54
|
+
</w:sdtPr>
|
|
55
|
+
<w:sdtContent>
|
|
56
|
+
<w:p>
|
|
57
|
+
<w:r>
|
|
58
|
+
<w:rPr><w:rStyle w:val="PlaceholderText"/></w:rPr>
|
|
59
|
+
<w:t>Click here to enter text.</w:t>
|
|
60
|
+
</w:r>
|
|
61
|
+
</w:p>
|
|
62
|
+
</w:sdtContent>
|
|
63
|
+
</w:sdt>
|
|
64
|
+
```
|
|
65
|
+
|
|
66
|
+
**Action**: Replace the run inside `<w:sdtContent>` with a new run containing the value. Remove `<w:showingPlcHdr/>` from `<w:sdtPr>`. Remove the placeholder style from `<w:rPr>`.
|
|
67
|
+
|
|
68
|
+
## Pattern 5: Instruction Text
|
|
69
|
+
|
|
70
|
+
```xml
|
|
71
|
+
<w:r>
|
|
72
|
+
<w:rPr>
|
|
73
|
+
<w:color w:val="808080"/> <!-- Gray text -->
|
|
74
|
+
<w:i/> <!-- Italic -->
|
|
75
|
+
</w:rPr>
|
|
76
|
+
<w:t>(enter your name)</w:t>
|
|
77
|
+
</w:r>
|
|
78
|
+
```
|
|
79
|
+
|
|
80
|
+
**Action**: Replace text content. Change `<w:rPr>` to remove gray color and italic (or copy from a nearby filled field).
|
|
81
|
+
|
|
82
|
+
## Pattern 6: Writing Tip Box (작성 팁)
|
|
83
|
+
|
|
84
|
+
Single-cell tables with dashed borders containing `※` guidance text. These are NOT fillable — they must be **deleted**.
|
|
85
|
+
|
|
86
|
+
```xml
|
|
87
|
+
<w:tbl>
|
|
88
|
+
<w:tblPr>
|
|
89
|
+
<w:tblBorders>
|
|
90
|
+
<w:top w:val="dashed" w:sz="4" w:space="0" w:color="auto"/>
|
|
91
|
+
<w:left w:val="dashed" w:sz="4" w:space="0" w:color="auto"/>
|
|
92
|
+
<w:bottom w:val="dashed" w:sz="4" w:space="0" w:color="auto"/>
|
|
93
|
+
<w:right w:val="dashed" w:sz="4" w:space="0" w:color="auto"/>
|
|
94
|
+
</w:tblBorders>
|
|
95
|
+
</w:tblPr>
|
|
96
|
+
<w:tr>
|
|
97
|
+
<w:tc>
|
|
98
|
+
<w:p>
|
|
99
|
+
<w:r>
|
|
100
|
+
<w:rPr><w:color w:val="FF0000"/></w:rPr>
|
|
101
|
+
<w:t>※ 작성 팁: 구체적인 사업 목표를 기재하세요.</w:t>
|
|
102
|
+
</w:r>
|
|
103
|
+
</w:p>
|
|
104
|
+
</w:tc>
|
|
105
|
+
</w:tr>
|
|
106
|
+
</w:tbl>
|
|
107
|
+
```
|
|
108
|
+
|
|
109
|
+
**Identifying traits**:
|
|
110
|
+
- Single row, single cell (`<w:tr>` has one `<w:tc>`)
|
|
111
|
+
- `<w:tblBorders>` with `w:val="dashed"` on all sides
|
|
112
|
+
- Text starts with `※` or contains `작성 팁`, `작성요령`
|
|
113
|
+
- Often has red `<w:color w:val="FF0000"/>` styling
|
|
114
|
+
|
|
115
|
+
**Action**: Flag as `field_type: "tip_box"`, `action: "delete"`. Delete the entire `<w:tbl>` element.
|
|
116
|
+
|
|
117
|
+
## Color Warning for Copied Formatting
|
|
118
|
+
|
|
119
|
+
When copying `<w:rPr>` from template guide text or instruction text (Patterns 2 and 5), **always check for red color**:
|
|
120
|
+
|
|
121
|
+
```xml
|
|
122
|
+
<!-- DANGER: This rPr has red color from guide text -->
|
|
123
|
+
<w:rPr>
|
|
124
|
+
<w:color w:val="FF0000"/> <!-- REMOVE THIS -->
|
|
125
|
+
<w:i/> <!-- REMOVE THIS (from guide text) -->
|
|
126
|
+
<w:sz w:val="20"/> <!-- KEEP -->
|
|
127
|
+
</w:rPr>
|
|
128
|
+
```
|
|
129
|
+
|
|
130
|
+
**Rule**: After copying rPr from any template text, check for `<w:color>` elements. If the value is `FF0000`, `FF0000FF`, or any red shade, **remove the `<w:color>` element** (defaults to black). Also remove `<w:i/>` if it came from guide text.
|
|
131
|
+
|
|
132
|
+
## Safe Modification Template
|
|
133
|
+
|
|
134
|
+
```python
import re
import xml.etree.ElementTree as ET

ns = {"w": "http://schemas.openxmlformats.org/wordprocessingml/2006/main"}
# Note: register every other prefix the document uses (e.g. r, wp) as well,
# otherwise ElementTree rewrites them as ns0, ns1, ... on write.
ET.register_namespace("w", ns["w"])

# Illustrative mapping of field name -> fill value; replace with real data
field_values = {"full_name": "Jane Doe"}

# Matches {{name}} or <<name>> anywhere inside a text node
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}|<<\s*(\w+)\s*>>")

def fill(match):
    field_name = match.group(1) or match.group(2)
    # Unknown placeholders are left untouched
    return field_values.get(field_name, match.group(0))

tree = ET.parse("word/document.xml")

# Find and replace placeholder text; sibling <w:rPr> elements are not touched
for t_elem in tree.iter("{%s}t" % ns["w"]):
    if t_elem.text:
        t_elem.text = PLACEHOLDER.sub(fill, t_elem.text)

tree.write("word/document.xml", xml_declaration=True, encoding="UTF-8")
```
|
|
@@ -0,0 +1,58 @@
|
|
|
1
|
+
# DOCX XML Structure Reference
|
|
2
|
+
|
|
3
|
+
## Unpacking a DOCX
|
|
4
|
+
|
|
5
|
+
```bash
|
|
6
|
+
# Unzip to inspect
|
|
7
|
+
mkdir -p .dokkit/template_work
|
|
8
|
+
cd .dokkit/template_work
|
|
9
|
+
unzip -o /path/to/template.docx
|
|
10
|
+
```
|
|
11
|
+
|
|
12
|
+
## Reading document.xml
|
|
13
|
+
|
|
14
|
+
The main content is in `word/document.xml`. Parse with any XML parser.
|
|
15
|
+
|
|
16
|
+
### Python Example
|
|
17
|
+
```python
import xml.etree.ElementTree as ET

ns = {"w": "http://schemas.openxmlformats.org/wordprocessingml/2006/main"}
tree = ET.parse("word/document.xml")
root = tree.getroot()

# Find all paragraphs and print the concatenated text of each one
for p in root.iter("{http://schemas.openxmlformats.org/wordprocessingml/2006/main}p"):
    texts = []
    for t in p.iter("{http://schemas.openxmlformats.org/wordprocessingml/2006/main}t"):
        if t.text:
            texts.append(t.text)
    print("".join(texts))
```
|
|
32
|
+
|
|
33
|
+
## Repackaging a DOCX
|
|
34
|
+
|
|
35
|
+
After modifying XML, repackage as a valid DOCX:
|
|
36
|
+
|
|
37
|
+
```python
import zipfile
import os

def repackage_docx(work_dir, output_path):
    """Repackage modified XML files into a valid DOCX."""
    with zipfile.ZipFile(output_path, 'w', zipfile.ZIP_DEFLATED) as zf:
        for root, dirs, files in os.walk(work_dir):
            for file in files:
                file_path = os.path.join(root, file)
                # Store paths relative to work_dir so word/document.xml keeps its place
                arcname = os.path.relpath(file_path, work_dir)
                zf.write(file_path, arcname)
```
|
|
50
|
+
|
|
51
|
+
## Critical Rules for DOCX Surgery
|
|
52
|
+
|
|
53
|
+
1. **Never remove `<w:rPr>` elements** — they contain all formatting
|
|
54
|
+
2. **Preserve `xml:space="preserve"`** on `<w:t>` elements with leading/trailing spaces
|
|
55
|
+
3. **Keep `<w:pPr>` intact** — paragraph formatting must not change
|
|
56
|
+
4. **Maintain bookmark pairs** — `<w:bookmarkStart>` must have matching `<w:bookmarkEnd>`
|
|
57
|
+
5. **Don't modify `<w:sectPr>`** — section properties control page layout
|
|
58
|
+
6. **Preserve table cell merge attributes** — `<w:vMerge>` and `<w:gridSpan>`
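
Rule 2 is easy to miss when writing new text into a `<w:t>`. The sketch below shows one way to keep it, assuming values are written through a single helper; `set_run_text` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"  # serialized as xml:space


def set_run_text(t_elem, value: str) -> None:
    """Set <w:t> text and mark it space-preserving when the value needs it."""
    t_elem.text = value
    if value != value.strip():
        # Leading or trailing whitespace would otherwise be dropped by Word
        t_elem.set(XML_SPACE, "preserve")
    # An existing xml:space attribute is never removed, per rule 2
```

The helper only ever adds the attribute, so a `<w:t>` that already carries `xml:space="preserve"` keeps it even when the new value has no surrounding spaces.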
|