kordoc 2.2.4 → 2.2.6

Files changed (41)
  1. package/README.md +43 -4
  2. package/dist/chunk-FCQEF2ZM.js +457 -0
  3. package/dist/chunk-FCQEF2ZM.js.map +1 -0
  4. package/dist/chunk-HXUCZ2IL.cjs +450 -0
  5. package/dist/chunk-HXUCZ2IL.cjs.map +1 -0
  6. package/dist/chunk-MUOQXDZ4.cjs +33 -0
  7. package/dist/chunk-MUOQXDZ4.cjs.map +1 -0
  8. package/dist/chunk-NL5XLN5R.js +450 -0
  9. package/dist/chunk-NL5XLN5R.js.map +1 -0
  10. package/dist/{chunk-SY2RFVLW.js → chunk-RF6UJXR3.js} +135 -2805
  11. package/dist/chunk-RF6UJXR3.js.map +1 -0
  12. package/dist/chunk-SBVRCJFH.js +33 -0
  13. package/dist/chunk-SBVRCJFH.js.map +1 -0
  14. package/dist/cli.js +12 -7
  15. package/dist/cli.js.map +1 -1
  16. package/dist/index.cjs +294 -3084
  17. package/dist/index.cjs.map +1 -1
  18. package/dist/index.d.cts +1 -1
  19. package/dist/index.d.ts +1 -1
  20. package/dist/index.js +77 -2817
  21. package/dist/index.js.map +1 -1
  22. package/dist/mcp.js +15 -9
  23. package/dist/mcp.js.map +1 -1
  24. package/dist/page-range-3C7UGGEK.cjs +7 -0
  25. package/dist/page-range-3C7UGGEK.cjs.map +1 -0
  26. package/dist/page-range-H35FN3OQ.js +7 -0
  27. package/dist/page-range-H35FN3OQ.js.map +1 -0
  28. package/dist/parser-43IAQ5KE.js +2278 -0
  29. package/dist/parser-43IAQ5KE.js.map +1 -0
  30. package/dist/parser-AMP7MAOH.js +2279 -0
  31. package/dist/parser-AMP7MAOH.js.map +1 -0
  32. package/dist/parser-KOWPTDJU.cjs +2278 -0
  33. package/dist/parser-KOWPTDJU.cjs.map +1 -0
  34. package/dist/provider-WPIYEALY.js +37 -0
  35. package/dist/provider-WPIYEALY.js.map +1 -0
  36. package/dist/provider-YN2SSK4X.cjs +37 -0
  37. package/dist/provider-YN2SSK4X.cjs.map +1 -0
  38. package/dist/{watch-5P7DJ3HG.js → watch-IUQXOXW3.js} +6 -4
  39. package/dist/{watch-5P7DJ3HG.js.map → watch-IUQXOXW3.js.map} +1 -1
  40. package/package.json +1 -1
  41. package/dist/chunk-SY2RFVLW.js.map +0 -1
package/README.md CHANGED
@@ -23,15 +23,27 @@ HWP, HWPX, PDF, XLSX, DOCX — 관공서에서 쏟아지는 모든 문서를 파
23
23
  * **📊 Perfect reproduction of complex tables**: Analyzes the structure of borderless PDF tables and heavily merged HWP tables and restores them as accurate Markdown tables.
24
24
  * **🔍 Automatic old/new comparison tables**: Analyzes the differences between two documents and shows what changed at a glance. (Comparing HWP against HWPX also works!)
25
25
  * **📝 Markdown back to HWPX**: Converts AI-written content back into report format (`HWPX`). Free yourself from copy-paste drudgery.
26
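+ * **✏️ Automatic form filling**: Supply values for an official-document form template (application, report) and the blanks are filled in automatically. The original formatting (font, size, alignment) is preserved 100%.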
26
27
  * **🤖 AI agent integration (MCP)**: Tools such as `Claude` and `Cursor` can call `kordoc` directly to read documents and write code.
27
28
 
28
29
  ---
29
30
 
30
- ## v2.2.1 Changes
31
+ ## v2.2.4 Changes
32
+
33
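+ - **📝 Automatic form filling (Form Filler)**: Automatically fills values into official-document form templates. Supports label-value cell patterns, checkboxes (`□`→`☑`), parenthesized blanks (`일반( )통`→`일반(3)통`), and annotations (`(한자:)`→`(한자:金)`).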
34
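+ - **🏛️ HWPX original-formatting preservation mode**: `fillHwpx()` manipulates the HWPX XML directly, replacing only the values while keeping fonts, sizes, alignment, and other original formatting 100% intact.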
35
+ - **📊 HTML table output for merged cells**: Complex tables with `colspan`/`rowspan` are emitted as an HTML `<table>` instead of GFM so that their structure is preserved.
36
+ - **🔧 Stronger markdownToHwpx formatting**: Greatly improved support for heading/bold/italic/table and other formatting during reverse conversion.
37
+ - **🤖 MCP fill_form tool**: Adds a new MCP tool that lets AI agents fill in forms directly (8 tools in total).
38
+
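The merged-cell bullet above can be illustrated with a minimal sketch that follows the same idea as the `tableToHtml` function bundled in `chunk-FCQEF2ZM.js` later in this diff. The `Cell` and `Table` interfaces and the name `mergedTableToHtml` are assumptions made for this illustration, not kordoc's exported API, and the sketch does not HTML-escape cell text.

```typescript
// Hypothetical shapes, modeled on the fields the bundled tableToHtml reads.
interface Cell { text: string; colSpan: number; rowSpan: number }
interface Table { rows: number; cols: number; cells: Cell[][] }

// Render a table with merged cells as HTML, emitting each merged cell once
// and skipping the grid positions it covers.
function mergedTableToHtml(table: Table): string {
  const covered = new Set<string>();
  const out: string[] = ["<table>"];
  for (let r = 0; r < table.rows; r++) {
    const tag = r === 0 ? "th" : "td"; // first row is treated as the header
    const rowHtml: string[] = [];
    for (let c = 0; c < table.cols; c++) {
      if (covered.has(`${r},${c}`)) continue;
      const cell = table.cells[r]?.[c];
      if (!cell) continue;
      // Mark every position spanned by this cell (except its origin) as covered.
      for (let dr = 0; dr < cell.rowSpan; dr++) {
        for (let dc = 0; dc < cell.colSpan; dc++) {
          if (dr !== 0 || dc !== 0) covered.add(`${r + dr},${c + dc}`);
        }
      }
      const attrs =
        (cell.colSpan > 1 ? ` colspan="${cell.colSpan}"` : "") +
        (cell.rowSpan > 1 ? ` rowspan="${cell.rowSpan}"` : "");
      rowHtml.push(`<${tag}${attrs}>${cell.text.replace(/\n/g, "<br>")}</${tag}>`);
    }
    if (rowHtml.length) out.push(`<tr>${rowHtml.join("")}</tr>`);
  }
  out.push("</table>");
  return out.join("\n");
}

// A 2x2 grid whose first row is a single cell spanning both columns.
const demoTable: Table = {
  rows: 2,
  cols: 2,
  cells: [
    [{ text: "Name", colSpan: 2, rowSpan: 1 }, { text: "", colSpan: 1, rowSpan: 1 }],
    [{ text: "A", colSpan: 1, rowSpan: 1 }, { text: "B", colSpan: 1, rowSpan: 1 }],
  ],
};
const demoHtml = mergedTableToHtml(demoTable);
```

A GFM pipe table has no way to express `colspan` or `rowspan`, which is presumably why the bundle falls back to HTML whenever any cell spans more than one row or column.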
39
+ <details>
40
+ <summary>v2.2.1 Changes</summary>
31
41
 
32
42
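  - **🔧 Markdown rendering improvements**: Escapes the GFM special character (`~`) so it is not misread as strikethrough, escapes `|` inside table cells, and changes the nested-table text separator from `|` to `/` to avoid GFM parser conflicts.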
33
43
  - **📝 Normalized paragraph spacing**: Inserts a blank line between paragraph blocks so that they render as separate paragraphs in Markdown.
34
44
 
45
+ </details>
46
+
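The escaping rules listed in the v2.2.1 notes can be sketched as follows. The tilde replacement mirrors the `escapeGfm` helper bundled in `chunk-FCQEF2ZM.js` in this diff; the names `escapeTilde` and `escapeTableCell` are hypothetical and exist only for this illustration.

```typescript
// Escape "~" so a GFM renderer does not treat "~text~" as strikethrough.
function escapeTilde(text: string): string {
  return text.replace(/~/g, "\\~");
}

// Prepare text for a GFM table cell: escape "~" and "|", and turn line
// breaks into <br> so the cell stays on a single table row.
function escapeTableCell(text: string): string {
  return escapeTilde(text).replace(/\|/g, "\\|").replace(/\n/g, "<br>");
}

const escapedCell = escapeTableCell("10~20|30\nnext");
// escapedCell is the string: 10\~20\|30<br>next
```

An unescaped `|` inside a cell would otherwise be parsed as a column boundary and shift every following column in that row.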
35
47
  <details>
36
48
  <summary>v2.2.0 Changes</summary>
37
49
 
@@ -186,6 +198,26 @@ if (result.success) {
186
198
  }
187
199
  ```
188
200
 
201
+ ### Automatic form filling
202
+
203
+ ```typescript
204
+ import { fillForm } from "kordoc"
205
+ import { readFileSync, writeFileSync } from "fs"
206
+
207
+ const template = readFileSync("신청서.hwpx")
208
+
209
+ // HWPX original-formatting preservation mode: font, size, and alignment are kept 100%
210
+ const result = await fillForm(template.buffer, {
211
+ 성명: "홍길동",
212
+ 주민등록번호: "900101-1234567",
213
+ 주소: "서울특별시 광진구 능동로 120",
214
+ }, { format: "hwpx-preserve" })
215
+
216
+ writeFileSync("신청서_작성완료.hwpx", Buffer.from(result.buffer!))
217
+ // result.filled → [{ label: "성명", value: "홍길동" }, ...]
218
+ // result.unmatched → list of keys that failed to match
219
+ ```
220
+
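The README example above passes `template.buffer` directly. A Node.js `Buffer` (like any `Uint8Array`) can be a view into a larger backing `ArrayBuffer`, in which case `.buffer` exposes more bytes than the view itself. Whether that applies to a given `readFileSync` result is not something this diff establishes, so the following is only a sketch of the defensive copy performed by the `toArrayBuffer` helper bundled in `chunk-FCQEF2ZM.js`; the name `toExactArrayBuffer` is hypothetical.

```typescript
// Return an ArrayBuffer containing exactly the bytes the view covers.
// If the view already spans its whole backing buffer, reuse it without copying.
function toExactArrayBuffer(view: Uint8Array): ArrayBuffer {
  if (view.byteOffset === 0 && view.byteLength === view.buffer.byteLength) {
    return view.buffer as ArrayBuffer;
  }
  return view.buffer.slice(view.byteOffset, view.byteOffset + view.byteLength) as ArrayBuffer;
}

// An 8-byte backing store and a 3-byte view into its middle (bytes 2, 3, 4).
const backingBytes = new Uint8Array([0, 1, 2, 3, 4, 5, 6, 7]);
const partialView = backingBytes.subarray(2, 5);

const backingLength = partialView.buffer.byteLength; // length of the whole backing store
const exactBuffer = toExactArrayBuffer(partialView); // only the viewed bytes
const exactBytes = new Uint8Array(exactBuffer);
```

Passing the raw `.buffer` of a partial view would hand the parser unrelated leading and trailing bytes, so copying out the exact range is the safer default when the origin of the buffer is unknown.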
189
221
  ### HWPX generation (reverse conversion)
190
222
 
191
223
  ```typescript
@@ -220,6 +252,9 @@ npx kordoc 보고서.hwp -o 보고서.md # save to file
220
252
  npx kordoc *.pdf -d ./변환결과/ # batch conversion
221
253
  npx kordoc 검토서.hwpx --format json # JSON (includes blocks + metadata)
222
254
  npx kordoc 보고서.hwpx --pages 1-3 # page range
255
+ npx kordoc fill 신청서.hwpx -f '성명=홍길동,주소=서울' -o 결과.hwpx # fill in a form
256
+ npx kordoc fill 신청서.hwpx -j values.json -o 결과.hwpx # fill from a JSON file
257
+ npx kordoc fill 신청서.hwpx --dry-run # only list the fields
223
258
  npx kordoc watch ./수신함 -d ./변환결과 # folder watch mode
224
259
  npx kordoc watch ./문서 --webhook https://api/hook # webhook notification
225
260
  ```
@@ -231,13 +266,13 @@ npx kordoc watch ./문서 --webhook https://api/hook # webhook notification
231
266
  "mcpServers": {
232
267
  "kordoc": {
233
268
  "command": "npx",
234
- "args": ["-y", "kordoc-mcp"]
269
+ "args": ["-y", "kordoc", "mcp"]
235
270
  }
236
271
  }
237
272
  }
238
273
  ```
239
274
 
240
- **7 tools:**
275
+ **8 tools:**
241
276
 
242
277
  | Tool | Description |
243
278
  |------|------|
@@ -248,6 +283,7 @@ npx kordoc watch ./문서 --webhook https://api/hook # webhook notification
248
283
  | `parse_table` | Extract only the Nth table |
249
284
  | `compare_documents` | Compare two documents (cross-format) |
250
285
  | `parse_form` | Extract form fields as JSON |
286
+ | `fill_form` | Fill values into a form template (preserves original HWPX formatting) |
251
287
 
252
288
  ## API
253
289
 
@@ -269,6 +305,9 @@ npx kordoc watch ./문서 --webhook https://api/hook # webhook notification
269
305
  |------|------|
270
306
  | `compare(bufferA, bufferB, options?)` | IR-level document comparison |
271
307
  | `extractFormFields(blocks)` | Recognize form fields in IRBlock[] |
308
+ | `fillForm(buffer, values, options?)` | Fill values into a form template (markdown/hwpx/hwpx-preserve) |
309
+ | `fillFormFields(blocks, values)` | Replace field values based on IRBlock[] |
310
+ | `fillHwpx(buffer, values)` | Manipulate HWPX XML directly (preserves original formatting) |
272
311
  | `markdownToHwpx(markdown)` | Markdown → HWPX reverse conversion |
273
312
  | `blocksToMarkdown(blocks)` | IRBlock[] → Markdown string |
274
313
 
@@ -280,7 +319,7 @@ import type {
280
319
  IRBlock, IRTable, IRCell, CellContext,
281
320
  DocumentMetadata, ParseOptions, ErrorCode,
282
321
  DiffResult, BlockDiff, CellDiff, DiffChangeType,
283
- FormField, FormResult,
322
+ FormField, FormResult, FillResult, HwpxFillResult, FillOutputFormat,
284
323
  OcrProvider, WatchOptions,
285
324
  } from "kordoc"
286
325
  ```
@@ -0,0 +1,457 @@
1
+ #!/usr/bin/env node
2
+
3
+ // src/utils.ts
4
+ var VERSION = true ? "2.2.6" : "0.0.0-dev";
5
+ function toArrayBuffer(buf) {
6
+ if (buf.byteOffset === 0 && buf.byteLength === buf.buffer.byteLength) {
7
+ return buf.buffer;
8
+ }
9
+ return buf.buffer.slice(buf.byteOffset, buf.byteOffset + buf.byteLength);
10
+ }
11
+ var KordocError = class extends Error {
12
+ constructor(message) {
13
+ super(message);
14
+ this.name = "KordocError";
15
+ }
16
+ };
17
+ function sanitizeError(err) {
18
+ if (err instanceof KordocError) return err.message;
19
+ return "\uBB38\uC11C \uCC98\uB9AC \uC911 \uC624\uB958\uAC00 \uBC1C\uC0DD\uD588\uC2B5\uB2C8\uB2E4";
20
+ }
21
+ function isPathTraversal(name) {
22
+ if (name.includes("\0")) return true;
23
+ const normalized = name.replace(/\\/g, "/");
24
+ const segments = normalized.split("/");
25
+ return segments.some((s) => s === "..") || normalized.startsWith("/") || /^[A-Za-z]:/.test(normalized);
26
+ }
27
+ function precheckZipSize(buffer, maxUncompressedSize = 100 * 1024 * 1024, maxEntries = 500) {
28
+ try {
29
+ const data = new DataView(buffer);
30
+ const len = buffer.byteLength;
31
+ let eocdOffset = -1;
32
+ for (let i = len - 22; i >= Math.max(0, len - 65557); i--) {
33
+ if (data.getUint32(i, true) === 101010256) {
34
+ eocdOffset = i;
35
+ break;
36
+ }
37
+ }
38
+ if (eocdOffset < 0) return { totalUncompressed: 0, entryCount: 0 };
39
+ const entryCount = data.getUint16(eocdOffset + 10, true);
40
+ if (entryCount > maxEntries) {
41
+ throw new KordocError(`ZIP \uC5D4\uD2B8\uB9AC \uC218 \uCD08\uACFC: ${entryCount} (\uCD5C\uB300 ${maxEntries})`);
42
+ }
43
+ const cdSize = data.getUint32(eocdOffset + 12, true);
44
+ const cdOffset = data.getUint32(eocdOffset + 16, true);
45
+ if (cdOffset + cdSize > len) return { totalUncompressed: 0, entryCount };
46
+ let totalUncompressed = 0;
47
+ let pos = cdOffset;
48
+ for (let i = 0; i < entryCount && pos + 46 <= cdOffset + cdSize; i++) {
49
+ if (data.getUint32(pos, true) !== 33639248) break;
50
+ totalUncompressed += data.getUint32(pos + 24, true);
51
+ const nameLen = data.getUint16(pos + 28, true);
52
+ const extraLen = data.getUint16(pos + 30, true);
53
+ const commentLen = data.getUint16(pos + 32, true);
54
+ pos += 46 + nameLen + extraLen + commentLen;
55
+ }
56
+ if (totalUncompressed > maxUncompressedSize) {
57
+ throw new KordocError(`ZIP \uBE44\uC555\uCD95 \uD06C\uAE30 \uCD08\uACFC: ${(totalUncompressed / 1024 / 1024).toFixed(1)}MB (\uCD5C\uB300 ${maxUncompressedSize / 1024 / 1024}MB)`);
58
+ }
59
+ return { totalUncompressed, entryCount };
60
+ } catch (err) {
61
+ if (err instanceof KordocError) throw err;
62
+ return { totalUncompressed: 0, entryCount: 0 };
63
+ }
64
+ }
65
+ function stripDtd(xml) {
66
+ return xml.replace(/<!DOCTYPE\s[^[>]*(\[[\s\S]*?\])?\s*>/gi, "");
67
+ }
68
+ var SAFE_HREF_RE = /^(?:https?:|mailto:|tel:|#)/i;
69
+ function sanitizeHref(href) {
70
+ const trimmed = href.trim();
71
+ if (!trimmed || !SAFE_HREF_RE.test(trimmed)) return null;
72
+ return trimmed;
73
+ }
74
+ function safeMin(arr) {
75
+ let min = Infinity;
76
+ for (let i = 0; i < arr.length; i++) if (arr[i] < min) min = arr[i];
77
+ return min;
78
+ }
79
+ function safeMax(arr) {
80
+ let max = -Infinity;
81
+ for (let i = 0; i < arr.length; i++) if (arr[i] > max) max = arr[i];
82
+ return max;
83
+ }
84
+ function classifyError(err) {
85
+ if (!(err instanceof Error)) return "PARSE_ERROR";
86
+ const msg = err.message;
87
+ if (msg.includes("\uC554\uD638\uD654")) return "ENCRYPTED";
88
+ if (msg.includes("DRM")) return "DRM_PROTECTED";
89
+ if (msg.includes("ZIP bomb") || msg.includes("ZIP \uBE44\uC555\uCD95 \uD06C\uAE30 \uCD08\uACFC") || msg.includes("ZIP \uC5D4\uD2B8\uB9AC \uC218 \uCD08\uACFC")) return "ZIP_BOMB";
90
+ if (msg.includes("bomb") || msg.includes("\uD06C\uAE30 \uCD08\uACFC") || msg.includes("\uC555\uCD95 \uD574\uC81C")) return "DECOMPRESSION_BOMB";
91
+ if (msg.includes("\uC774\uBBF8\uC9C0 \uAE30\uBC18")) return "IMAGE_BASED_PDF";
92
+ if (msg.includes("\uC139\uC158") && (msg.includes("\uCC3E\uC744 \uC218 \uC5C6") || msg.includes("\uC5C6\uC74C"))) return "NO_SECTIONS";
93
+ if (msg.includes("\uC2DC\uADF8\uB2C8\uCC98") || msg.includes("\uBCF5\uAD6C\uD560 \uC218 \uC5C6")) return "CORRUPTED";
94
+ return "PARSE_ERROR";
95
+ }
96
+
97
+ // src/table/builder.ts
98
+ var MAX_COLS = 200;
99
+ var MAX_ROWS = 1e4;
100
+ function buildTable(rows) {
101
+ if (rows.length > MAX_ROWS) rows = rows.slice(0, MAX_ROWS);
102
+ const numRows = rows.length;
103
+ const hasAddr = rows.some((row) => row.some((c) => c.colAddr !== void 0 && c.rowAddr !== void 0));
104
+ if (hasAddr) return buildTableDirect(rows, numRows);
105
+ let maxCols = 0;
106
+ const tempOccupied = Array.from({ length: numRows }, () => []);
107
+ for (let rowIdx = 0; rowIdx < numRows; rowIdx++) {
108
+ let colIdx = 0;
109
+ for (const cell of rows[rowIdx]) {
110
+ while (colIdx < MAX_COLS && tempOccupied[rowIdx][colIdx]) colIdx++;
111
+ if (colIdx >= MAX_COLS) break;
112
+ for (let r = rowIdx; r < Math.min(rowIdx + cell.rowSpan, numRows); r++) {
113
+ for (let c = colIdx; c < Math.min(colIdx + cell.colSpan, MAX_COLS); c++) {
114
+ tempOccupied[r][c] = true;
115
+ }
116
+ }
117
+ colIdx += cell.colSpan;
118
+ if (colIdx > maxCols) maxCols = colIdx;
119
+ }
120
+ }
121
+ if (maxCols === 0) return { rows: 0, cols: 0, cells: [], hasHeader: false };
122
+ const grid = Array.from(
123
+ { length: numRows },
124
+ () => Array.from({ length: maxCols }, () => ({ text: "", colSpan: 1, rowSpan: 1 }))
125
+ );
126
+ const occupied = Array.from({ length: numRows }, () => Array(maxCols).fill(false));
127
+ for (let rowIdx = 0; rowIdx < numRows; rowIdx++) {
128
+ let colIdx = 0;
129
+ let cellIdx = 0;
130
+ while (colIdx < maxCols && cellIdx < rows[rowIdx].length) {
131
+ while (colIdx < maxCols && occupied[rowIdx][colIdx]) colIdx++;
132
+ if (colIdx >= maxCols) break;
133
+ const cell = rows[rowIdx][cellIdx];
134
+ grid[rowIdx][colIdx] = {
135
+ text: cell.text.trim(),
136
+ colSpan: cell.colSpan,
137
+ rowSpan: cell.rowSpan
138
+ };
139
+ for (let r = rowIdx; r < Math.min(rowIdx + cell.rowSpan, numRows); r++) {
140
+ for (let c = colIdx; c < Math.min(colIdx + cell.colSpan, maxCols); c++) {
141
+ occupied[r][c] = true;
142
+ }
143
+ }
144
+ colIdx += cell.colSpan;
145
+ cellIdx++;
146
+ }
147
+ }
148
+ return trimAndReturn(grid, numRows, maxCols);
149
+ }
150
+ function buildTableDirect(rows, numRows) {
151
+ let maxCols = 0;
152
+ for (const row of rows) {
153
+ for (const cell of row) {
154
+ const end = (cell.colAddr ?? 0) + cell.colSpan;
155
+ if (end > maxCols) maxCols = end;
156
+ }
157
+ }
158
+ if (maxCols > MAX_COLS) maxCols = MAX_COLS;
159
+ if (maxCols === 0) return { rows: 0, cols: 0, cells: [], hasHeader: false };
160
+ const grid = Array.from(
161
+ { length: numRows },
162
+ () => Array.from({ length: maxCols }, () => ({ text: "", colSpan: 1, rowSpan: 1 }))
163
+ );
164
+ for (const row of rows) {
165
+ for (const cell of row) {
166
+ const r = cell.rowAddr ?? 0;
167
+ const c = cell.colAddr ?? 0;
168
+ if (r >= numRows || c >= maxCols || r < 0 || c < 0) continue;
169
+ grid[r][c] = { text: cell.text.trim(), colSpan: cell.colSpan, rowSpan: cell.rowSpan };
170
+ for (let dr = 0; dr < cell.rowSpan; dr++) {
171
+ for (let dc = 0; dc < cell.colSpan; dc++) {
172
+ if (dr === 0 && dc === 0) continue;
173
+ if (r + dr < numRows && c + dc < maxCols) {
174
+ grid[r + dr][c + dc] = { text: "", colSpan: 1, rowSpan: 1 };
175
+ }
176
+ }
177
+ }
178
+ }
179
+ }
180
+ return trimAndReturn(grid, numRows, maxCols);
181
+ }
182
+ function trimAndReturn(grid, numRows, maxCols) {
183
+ let effectiveCols = maxCols;
184
+ while (effectiveCols > 0) {
185
+ const colEmpty = grid.every((row) => !row[effectiveCols - 1]?.text?.trim());
186
+ if (!colEmpty) break;
187
+ effectiveCols--;
188
+ }
189
+ if (effectiveCols < maxCols && effectiveCols > 0) {
190
+ const trimmed = grid.map((row) => row.slice(0, effectiveCols));
191
+ return { rows: numRows, cols: effectiveCols, cells: trimmed, hasHeader: numRows > 1 };
192
+ }
193
+ return { rows: numRows, cols: maxCols, cells: grid, hasHeader: numRows > 1 };
194
+ }
195
+ function convertTableToText(rows) {
196
+ return rows.map(
197
+ (row) => row.map((c) => c.text.trim().replace(/\n/g, " ").replace(/\|/g, "\\|")).filter(Boolean).join(" / ")
198
+ ).filter(Boolean).join("\n");
199
+ }
200
+ function escapeGfm(text) {
201
+ return text.replace(/~/g, "\\~");
202
+ }
203
+ var HWP_SHAPE_ALT_TEXT_RE = /(?:모서리가 둥근 |둥근 )?(?:사각형|직사각형|정사각형|원|타원|삼각형|이등변 삼각형|직각 삼각형|선|직선|곡선|화살표|굵은 화살표|이중 화살표|오각형|육각형|팔각형|별|[4-8]점별|십자|십자형|구름|구름형|마름모|도넛|평행사변형|사다리꼴|부채꼴|호|반원|물결|번개|하트|빗금|블록 화살표|수식|표|그림|개체|그리기\s?개체|묶음\s?개체|글상자|수식\s?개체|OLE\s?개체)\s?입니다\.?/g;
204
+ function sanitizeText(text) {
205
+ let result = text.replace(/[\u{F0000}-\u{FFFFD}]/gu, "").replace(HWP_SHAPE_ALT_TEXT_RE, "").replace(/ +/g, " ").trim();
206
+ if (result.length <= 30 && result.includes(" ")) {
207
+ const tokens = result.split(" ");
208
+ const koreanSingleCharCount = tokens.filter((t) => t.length === 1 && /[\uAC00-\uD7AF\u3131-\u318E]/.test(t)).length;
209
+ if (tokens.length >= 3 && koreanSingleCharCount / tokens.length >= 0.7) {
210
+ result = tokens.join("");
211
+ }
212
+ }
213
+ return result;
214
+ }
215
+ function flattenLayoutTables(blocks) {
216
+ const result = [];
217
+ for (const block of blocks) {
218
+ if (block.type !== "table" || !block.table) {
219
+ result.push(block);
220
+ continue;
221
+ }
222
+ const { rows: numRows, cols: numCols, cells } = block.table;
223
+ if (numRows === 1 && numCols === 1) {
224
+ result.push(block);
225
+ continue;
226
+ }
227
+ if (numRows <= 3) {
228
+ let totalNewlines = 0;
229
+ let totalTextLen = 0;
230
+ for (let r = 0; r < numRows; r++) {
231
+ for (let c = 0; c < numCols; c++) {
232
+ const t = cells[r]?.[c]?.text || "";
233
+ totalNewlines += (t.match(/\n/g) || []).length;
234
+ totalTextLen += t.length;
235
+ }
236
+ }
237
+ if (totalNewlines > 5 || numRows <= 2 && totalTextLen > 300) {
238
+ for (let r = 0; r < numRows; r++) {
239
+ for (let c = 0; c < numCols; c++) {
240
+ const cellText = cells[r]?.[c]?.text?.trim();
241
+ if (!cellText) continue;
242
+ for (const line of cellText.split("\n")) {
243
+ const trimmed = line.trim();
244
+ if (!trimmed) continue;
245
+ result.push({ type: "paragraph", text: trimmed, pageNumber: block.pageNumber });
246
+ }
247
+ }
248
+ }
249
+ continue;
250
+ }
251
+ }
252
+ result.push(block);
253
+ }
254
+ return result;
255
+ }
256
+ function blocksToMarkdown(blocks) {
257
+ const lines = [];
258
+ for (let i = 0; i < blocks.length; i++) {
259
+ const block = blocks[i];
260
+ if (block.type === "heading" && block.text) {
261
+ const prefix = "#".repeat(Math.min(block.level || 2, 6));
262
+ const headingText = sanitizeText(block.text);
263
+ if (headingText) lines.push("", `${prefix} ${headingText}`, "");
264
+ continue;
265
+ }
266
+ if (block.type === "image" && block.text) {
267
+ lines.push("", `![image](${block.text})`, "");
268
+ continue;
269
+ }
270
+ if (block.type === "separator") {
271
+ lines.push("", "---", "");
272
+ continue;
273
+ }
274
+ if (block.type === "list" && block.text) {
275
+ const listText = sanitizeText(block.text);
276
+ if (!listText) continue;
277
+ const alreadyNumbered = block.listType === "ordered" && /^\d+\.\s/.test(listText);
278
+ const prefix = alreadyNumbered ? "" : block.listType === "ordered" ? "1. " : "- ";
279
+ lines.push(`${prefix}${listText}`);
280
+ if (block.children) {
281
+ for (const child of block.children) {
282
+ const childPrefix = child.listType === "ordered" ? "1." : "-";
283
+ lines.push(` ${childPrefix} ${child.text || ""}`);
284
+ }
285
+ }
286
+ continue;
287
+ }
288
+ if (block.type === "paragraph" && block.text) {
289
+ let text = sanitizeText(block.text);
290
+ if (!text) continue;
291
+ if (/^\[별표\s*\d+/.test(text)) {
292
+ const nextBlock = blocks[i + 1];
293
+ if (nextBlock?.type === "paragraph" && nextBlock.text && /관련\)?$/.test(nextBlock.text)) {
294
+ lines.push("", `## ${text} ${nextBlock.text}`, "");
295
+ i++;
296
+ } else {
297
+ lines.push("", `## ${text}`, "");
298
+ }
299
+ continue;
300
+ }
301
+ if (/^\([^)]*조[^)]*관련\)$/.test(text)) {
302
+ lines.push(`*${text}*`, "");
303
+ continue;
304
+ }
305
+ if (block.href) {
306
+ const href = sanitizeHref(block.href);
307
+ if (href) text = `[${text}](${href})`;
308
+ }
309
+ if (block.footnoteText) {
310
+ text += ` (\uC8FC: ${block.footnoteText})`;
311
+ }
312
+ lines.push(escapeGfm(text), "");
313
+ } else if (block.type === "table" && block.table) {
314
+ if (lines.length > 0 && lines[lines.length - 1] !== "") {
315
+ lines.push("");
316
+ }
317
+ const tableMd = tableToMarkdown(block.table);
318
+ if (tableMd) {
319
+ lines.push(tableMd);
320
+ lines.push("");
321
+ }
322
+ }
323
+ }
324
+ return lines.join("\n").trim();
325
+ }
326
+ function hasMergedCells(table) {
327
+ for (const row of table.cells) {
328
+ for (const cell of row) {
329
+ if (cell.colSpan > 1 || cell.rowSpan > 1) return true;
330
+ }
331
+ }
332
+ return false;
333
+ }
334
+ function tableToHtml(table) {
335
+ const { cells, rows: numRows, cols: numCols } = table;
336
+ const skip = /* @__PURE__ */ new Set();
337
+ const lines = ["<table>"];
338
+ for (let r = 0; r < numRows; r++) {
339
+ const tag = r === 0 ? "th" : "td";
340
+ const rowHtml = [];
341
+ for (let c = 0; c < numCols; c++) {
342
+ if (skip.has(`${r},${c}`)) continue;
343
+ const cell = cells[r]?.[c];
344
+ if (!cell) continue;
345
+ for (let dr = 0; dr < cell.rowSpan; dr++) {
346
+ for (let dc = 0; dc < cell.colSpan; dc++) {
347
+ if (dr === 0 && dc === 0) continue;
348
+ if (r + dr < numRows && c + dc < numCols) skip.add(`${r + dr},${c + dc}`);
349
+ }
350
+ }
351
+ const text = sanitizeText(cell.text).replace(/\n/g, "<br>");
352
+ const attrs = [];
353
+ if (cell.colSpan > 1) attrs.push(`colspan="${cell.colSpan}"`);
354
+ if (cell.rowSpan > 1) attrs.push(`rowspan="${cell.rowSpan}"`);
355
+ const attrStr = attrs.length ? " " + attrs.join(" ") : "";
356
+ rowHtml.push(`<${tag}${attrStr}>${text}</${tag}>`);
357
+ }
358
+ if (rowHtml.length) lines.push(`<tr>${rowHtml.join("")}</tr>`);
359
+ }
360
+ lines.push("</table>");
361
+ return lines.join("\n");
362
+ }
363
+ function tableToMarkdown(table) {
364
+ if (table.rows === 0 || table.cols === 0) return "";
365
+ const { cells, rows: numRows, cols: numCols } = table;
366
+ if (hasMergedCells(table)) return tableToHtml(table);
367
+ if (numRows === 1 && numCols === 1) {
368
+ const content = sanitizeText(cells[0][0].text);
369
+ if (!content) return "";
370
+ return content.split(/\n/).map((line) => {
371
+ const trimmed = line.trim();
372
+ if (!trimmed) return "";
373
+ if (/^\d+\.\s/.test(trimmed)) return `**${escapeGfm(trimmed)}**`;
374
+ if (/^[가-힣]\.\s/.test(trimmed)) return ` ${escapeGfm(trimmed)}`;
375
+ return escapeGfm(trimmed);
376
+ }).filter(Boolean).join("\n");
377
+ }
378
+ if (numCols === 1 && numRows >= 2) {
379
+ return cells.map((row) => escapeGfm(sanitizeText(row[0].text)).replace(/\n/g, " ")).filter(Boolean).join("\n");
380
+ }
381
+ const display = Array.from({ length: numRows }, () => Array(numCols).fill(""));
382
+ const skip = /* @__PURE__ */ new Set();
383
+ for (let r = 0; r < numRows; r++) {
384
+ for (let c = 0; c < numCols; c++) {
385
+ if (skip.has(`${r},${c}`)) continue;
386
+ const cell = cells[r]?.[c];
387
+ if (!cell) continue;
388
+ display[r][c] = escapeGfm(sanitizeText(cell.text)).replace(/\|/g, "\\|").replace(/\n/g, "<br>");
389
+ for (let dr = 0; dr < cell.rowSpan; dr++) {
390
+ for (let dc = 0; dc < cell.colSpan; dc++) {
391
+ if (dr === 0 && dc === 0) continue;
392
+ if (r + dr < numRows && c + dc < numCols) {
393
+ skip.add(`${r + dr},${c + dc}`);
394
+ }
395
+ }
396
+ }
397
+ c += cell.colSpan - 1;
398
+ }
399
+ }
400
+ const uniqueRows = [];
401
+ let pendingFirstCol = "";
402
+ for (let r = 0; r < display.length; r++) {
403
+ const row = display[r];
404
+ const isEmptyPlaceholder = row.every((cell) => cell === "");
405
+ if (isEmptyPlaceholder) continue;
406
+ const nonEmptyCols = row.filter((cell) => cell !== "");
407
+ const hasSkipInRow = row.some((_, c) => skip.has(`${r},${c}`));
408
+ if (!hasSkipInRow && nonEmptyCols.length === 1 && row[0] !== "" && row.slice(1).every((c) => c === "")) {
409
+ pendingFirstCol = row[0];
410
+ continue;
411
+ }
412
+ if (pendingFirstCol && row[0] === "") {
413
+ row[0] = pendingFirstCol;
414
+ pendingFirstCol = "";
415
+ } else {
416
+ pendingFirstCol = "";
417
+ }
418
+ uniqueRows.push(row);
419
+ }
420
+ if (uniqueRows.length === 0) return "";
421
+ const md = [];
422
+ md.push("| " + uniqueRows[0].join(" | ") + " |");
423
+ md.push("| " + uniqueRows[0].map(() => "---").join(" | ") + " |");
424
+ for (let i = 1; i < uniqueRows.length; i++) {
425
+ md.push("| " + uniqueRows[i].join(" | ") + " |");
426
+ }
427
+ return md.join("\n");
428
+ }
429
+
430
+ // src/types.ts
431
+ var HEADING_RATIO_H1 = 1.5;
432
+ var HEADING_RATIO_H2 = 1.3;
433
+ var HEADING_RATIO_H3 = 1.15;
434
+
435
+ export {
436
+ VERSION,
437
+ toArrayBuffer,
438
+ KordocError,
439
+ sanitizeError,
440
+ isPathTraversal,
441
+ precheckZipSize,
442
+ stripDtd,
443
+ sanitizeHref,
444
+ safeMin,
445
+ safeMax,
446
+ classifyError,
447
+ MAX_COLS,
448
+ MAX_ROWS,
449
+ buildTable,
450
+ convertTableToText,
451
+ flattenLayoutTables,
452
+ blocksToMarkdown,
453
+ HEADING_RATIO_H1,
454
+ HEADING_RATIO_H2,
455
+ HEADING_RATIO_H3
456
+ };
457
+ //# sourceMappingURL=chunk-FCQEF2ZM.js.map
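
The `isPathTraversal` helper in the chunk above rejects archive entry names that could escape an extraction directory. A typed restatement of the same checks, assuming it is applied to entry names before they are used (the call site is not shown in this chunk), might look like the sketch below. The function name `looksLikePathTraversal` and the sample entry names are hypothetical.

```typescript
// Flag entry names that contain a NUL byte, a ".." path segment,
// a leading "/", or a Windows drive prefix such as "C:".
function looksLikePathTraversal(entryName: string): boolean {
  if (entryName.includes("\0")) return true;
  const normalized = entryName.replace(/\\/g, "/"); // treat backslashes as separators
  return (
    normalized.split("/").some((segment) => segment === "..") ||
    normalized.startsWith("/") ||
    /^[A-Za-z]:/.test(normalized)
  );
}

// Illustrative entry names only; none of these come from a real archive.
const sampleEntries = [
  "doc/section0.xml",   // plain relative path: kept
  "../etc/passwd",      // parent-directory segment: rejected
  "C:\\Windows\\x.dll", // drive prefix: rejected
  "/abs/path.xml",      // absolute path: rejected
  "a/..b/c.xml",        // "..b" is an ordinary name, not a ".." segment: kept
];
const acceptedEntries = sampleEntries.filter((entry) => !looksLikePathTraversal(entry));
```

Comparing whole segments against `".."` rather than searching for the substring avoids rejecting legitimate names that merely contain two dots.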