kordoc 2.5.2 → 2.7.1

Files changed (45)
  1. package/README.md +21 -2
  2. package/dist/chunk-5CJGKKMZ.js +266 -0
  3. package/dist/chunk-5CJGKKMZ.js.map +1 -0
  4. package/dist/{chunk-24NKFRB4.js → chunk-GNN6MHH4.js} +14 -3
  5. package/dist/chunk-GNN6MHH4.js.map +1 -0
  6. package/dist/{chunk-NKKLA43G.js → chunk-LA66FVBN.js} +14 -3
  7. package/dist/chunk-LA66FVBN.js.map +1 -0
  8. package/dist/chunk-OBSPVJ6A.js +18947 -0
  9. package/dist/chunk-OBSPVJ6A.js.map +1 -0
  10. package/dist/{chunk-Z65OQP3H.cjs → chunk-RFGEEHI4.cjs} +14 -3
  11. package/dist/{chunk-Z65OQP3H.cjs.map → chunk-RFGEEHI4.cjs.map} +1 -1
  12. package/dist/cli.js +60 -5
  13. package/dist/cli.js.map +1 -1
  14. package/dist/{detect-I7YIS4Q6.js → detect-PJZMUL2Z.js} +6 -2
  15. package/dist/formula-3AQUUIRF.js +1151 -0
  16. package/dist/formula-3AQUUIRF.js.map +1 -0
  17. package/dist/formula-JCNF43NE.js +1153 -0
  18. package/dist/formula-JCNF43NE.js.map +1 -0
  19. package/dist/formula-XGG6ZP42.cjs +1151 -0
  20. package/dist/formula-XGG6ZP42.cjs.map +1 -0
  21. package/dist/index.cjs +14703 -455
  22. package/dist/index.cjs.map +1 -1
  23. package/dist/index.d.cts +73 -2
  24. package/dist/index.d.ts +73 -2
  25. package/dist/index.js +14575 -327
  26. package/dist/index.js.map +1 -1
  27. package/dist/mcp.js +5 -5
  28. package/dist/{parser-FRROKAB7.js → parser-5CJGXQCJ.js} +135 -3
  29. package/dist/parser-5CJGXQCJ.js.map +1 -0
  30. package/dist/{parser-BQKQOIJU.js → parser-6L6DZCOB.js} +135 -3
  31. package/dist/parser-6L6DZCOB.js.map +1 -0
  32. package/dist/{parser-AZYPOKAR.cjs → parser-SRI2TIZX.cjs} +159 -27
  33. package/dist/{parser-AZYPOKAR.cjs.map → parser-SRI2TIZX.cjs.map} +1 -1
  34. package/dist/{watch-ZJAUWUAE.js → watch-7CTGUDQB.js} +4 -4
  35. package/package.json +25 -4
  36. package/dist/chunk-24NKFRB4.js.map +0 -1
  37. package/dist/chunk-2CAJSQK5.js +0 -5052
  38. package/dist/chunk-2CAJSQK5.js.map +0 -1
  39. package/dist/chunk-M3E3C5GS.js +0 -59
  40. package/dist/chunk-M3E3C5GS.js.map +0 -1
  41. package/dist/chunk-NKKLA43G.js.map +0 -1
  42. package/dist/parser-BQKQOIJU.js.map +0 -1
  43. package/dist/parser-FRROKAB7.js.map +0 -1
  44. package/dist/{detect-I7YIS4Q6.js.map → detect-PJZMUL2Z.js.map} +0 -0
  45. package/dist/{watch-ZJAUWUAE.js.map → watch-7CTGUDQB.js.map} +0 -0
package/README.md CHANGED
@@ -7,7 +7,7 @@
 
  > *South Korea's document hell is second to none. Built by a civil servant who survived seven years in it.*
 
- HWP, HWPX, PDF, XLSX, DOCX: parses, compares, analyzes, and generates every kind of document that government offices churn out.
+ HWP 3.x/5.x, HWPX, PDF, XLS, XLSX, DOCX: parses, compares, analyzes, and generates every kind of document that government offices churn out.
 
  [English](./README-EN.md)
 
@@ -37,13 +37,28 @@ Windows 도 자동으로 `cmd /c npx` 래핑. 수동 JSON 편집 불필요. 재
  > npx -y kordoc@latest setup
  > ```
 
+ > **If Windows PowerShell reports `npx.ps1 파일을 로드할 수 없습니다 · PSSecurityException` ("npx.ps1 cannot be loaded")**: this is the standard behavior of PowerShell's default security policy, which blocks unsigned `.ps1` scripts (unrelated to kordoc). Use either of the options below.
+ >
+ > **Option 1: run it from a Command Prompt (cmd) window** (safest)
+ > Windows key → search for `cmd` → Enter → in the black window, run as-is:
+ > ```
+ > npx -y kordoc setup
+ > ```
+ >
+ > **Option 2: relax the PowerShell execution policy once**
+ > In PowerShell with administrator privileges:
+ > ```powershell
+ > Set-ExecutionPolicy -Scope CurrentUser RemoteSigned
+ > ```
+ > Then restart PowerShell → `npx -y kordoc setup` works as-is.
+
  ---
 
  ## 💡 What can you do with kordoc?
 
  Beyond simple text extraction, it automates **the entire workflow for processing official documents**.
 
- * **📄 Any document to Markdown**: instantly converts `HWP`, `HWPX`, `HWPML`, `PDF`, `XLSX`, and `DOCX` files to `Markdown`, the form that is easiest for an AI (LLM) to read and analyze.
+ * **📄 Any document to Markdown**: instantly converts `HWP3` (legacy), `HWP` (5.x), `HWPX`, `HWPML`, `PDF`, `XLS`, `XLSX`, and `DOCX` files to `Markdown`, the form that is easiest for an AI (LLM) to read and analyze.
  * **📊 Complex tables reproduced perfectly**: analyzes the structure of borderless PDF tables and heavily merged HWP tables and restores them as accurate Markdown tables.
  * **🔍 Automatic old-vs-new comparison tables**: analyzes the differences between two documents and shows what changed at a glance. (Comparing HWP against HWPX works too!)
  * **📝 Markdown back to HWPX**: turns AI-written content back into report format (`HWPX`). Free yourself from copy-paste drudgery.
@@ -52,6 +67,10 @@ Windows 도 자동으로 `cmd /c npx` 래핑. 수동 JSON 편집 불필요. 재
 
  ---
 
+ ## v2.7.1 changes
+
+ - **🕰️ HWP 3.0 (legacy) parser added**: text extraction for the single-file binary format Hancom used from 1996 to 2002 (signature `"HWP Document File V3.00"`). Legacy court rulings, official documents, and similar files that kordoc previously rejected can now be indexed for search. Converts johab (상용조합형) to Unicode, with a lookup table of 5,893 Hanja and symbols. Recursively extracts nested paragraphs in table cells, headers, and footnotes. Ported to TypeScript from the Rust implementation in [@edwardkim/rhwp](https://github.com/edwardkim/rhwp).
+
  ## v2.5.0 changes
 
  - **🏛️ HWPX output compatible with Hancom Office for macOS** (#4): fixes HWPX files produced by `markdownToHwpx()` being rejected by Hancom on macOS as "corrupted". The table XML was rewritten from a minimal skeleton to the full-spec form: the 10 required `<hp:tbl>` attributes plus `<hp:sz>`/`<hp:pos>`/`<hp:outMargin>`/`<hp:inMargin>`, an `<hp:subList>` wrapper inside `<hp:tc>` plus `<hp:cellAddr>`/`<hp:cellSpan>`/`<hp:cellSz>`/`<hp:cellMargin>`, and paragraph wrapping. Also adds `Preview/PrvText.txt` and `borderFill` id=1 (SOLID 0.12mm).
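The formats named in the README fall into a few container families that can be told apart from the first bytes of a file. The following is a minimal sketch of that idea, not kordoc's public API: `sniffContainer`, `startsWith`, and the return labels are hypothetical names chosen for illustration, while the byte signatures are the ones checked by `src/detect.ts` in this release.

```typescript
// Hypothetical helper (not part of kordoc): classify a file by its leading bytes.
type Container = "hwp3" | "zip" | "ole2" | "pdf" | "unknown";

// ASCII signature quoted in the v2.7.1 release note.
const HWP3_SIGNATURE: number[] = Array.from("HWP Document File V3.00", (c) => c.charCodeAt(0));

// True when `bytes` begins with every byte of `prefix`.
function startsWith(bytes: Uint8Array, prefix: readonly number[]): boolean {
  if (bytes.length < prefix.length) return false;
  for (let i = 0; i < prefix.length; i++) {
    if (bytes[i] !== prefix[i]) return false;
  }
  return true;
}

function sniffContainer(bytes: Uint8Array): Container {
  if (startsWith(bytes, HWP3_SIGNATURE)) return "hwp3";           // HWP 3.x: plain signature, no container
  if (startsWith(bytes, [0x50, 0x4b, 0x03, 0x04])) return "zip";  // "PK\x03\x04": HWPX, XLSX, DOCX
  if (startsWith(bytes, [0xd0, 0xcf, 0x11, 0xe0])) return "ole2"; // OLE2/CFB: HWP 5.x, XLS
  if (startsWith(bytes, [0x25, 0x50, 0x44, 0x46])) return "pdf";  // "%PDF"
  return "unknown";
}
```

A `zip` or `ole2` result only identifies the container; telling HWPX from XLSX/DOCX, or HWP 5.x from XLS, needs a second look inside the container.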
@@ -0,0 +1,266 @@
+ #!/usr/bin/env node
+
+ // src/detect.ts
+ import JSZip from "jszip";
+
+ // src/hwp5/cfb-lenient.ts
+ var CFB_MAGIC = Buffer.from([208, 207, 17, 224, 161, 177, 26, 225]);
+ var END_OF_CHAIN = 4294967294;
+ var FREE_SECT = 4294967295;
+ var MAX_CHAIN_LENGTH = 1e6;
+ var MAX_DIR_ENTRIES = 1e5;
+ var MAX_STREAM_SIZE = 100 * 1024 * 1024;
+ function parseLenientCfb(data) {
+   if (data.length < 512) throw new Error("CFB \uD30C\uC77C\uC774 \uB108\uBB34 \uC9E7\uC2B5\uB2C8\uB2E4 (\uCD5C\uC18C 512\uBC14\uC774\uD2B8)");
+   if (!data.subarray(0, 8).equals(CFB_MAGIC)) throw new Error("CFB \uB9E4\uC9C1 \uBC14\uC774\uD2B8 \uBD88\uC77C\uCE58");
+   const sectorSizeShift = data.readUInt16LE(30);
+   if (sectorSizeShift < 7 || sectorSizeShift > 16) throw new Error("\uC720\uD6A8\uD558\uC9C0 \uC54A\uC740 \uC139\uD130 \uD06C\uAE30 \uC2DC\uD504\uD2B8: " + sectorSizeShift);
+   const sectorSize = 1 << sectorSizeShift;
+   const miniSectorSizeShift = data.readUInt16LE(32);
+   if (miniSectorSizeShift > 16) throw new Error("\uC720\uD6A8\uD558\uC9C0 \uC54A\uC740 \uBBF8\uB2C8 \uC139\uD130 \uD06C\uAE30 \uC2DC\uD504\uD2B8: " + miniSectorSizeShift);
+   const miniSectorSize = 1 << miniSectorSizeShift;
+   const fatSectorCount = data.readUInt32LE(44);
+   if (fatSectorCount > 1e4) throw new Error("FAT \uC139\uD130 \uC218\uAC00 \uB108\uBB34 \uB9CE\uC2B5\uB2C8\uB2E4: " + fatSectorCount);
+   const firstDirSector = data.readUInt32LE(48);
+   const miniStreamCutoff = data.readUInt32LE(56);
+   const firstMiniFatSector = data.readUInt32LE(60);
+   const miniFatSectorCount = data.readUInt32LE(64);
+   const firstDifatSector = data.readUInt32LE(68);
+   const difatSectorCount = data.readUInt32LE(72);
+   function sectorOffset(id) {
+     return 512 + id * sectorSize;
+   }
+   function readSectorData(id) {
+     const off = sectorOffset(id);
+     if (off + sectorSize > data.length) return Buffer.alloc(0);
+     return data.subarray(off, off + sectorSize);
+   }
+   const fatSectors = [];
+   for (let i = 0; i < 109 && fatSectors.length < fatSectorCount; i++) {
+     const sid = data.readUInt32LE(76 + i * 4);
+     if (sid === FREE_SECT || sid === END_OF_CHAIN) break;
+     fatSectors.push(sid);
+   }
+   let difatSector = firstDifatSector;
+   const visitedDifat = /* @__PURE__ */ new Set();
+   for (let d = 0; d < difatSectorCount && difatSector !== END_OF_CHAIN && difatSector !== FREE_SECT; d++) {
+     if (visitedDifat.has(difatSector)) break;
+     visitedDifat.add(difatSector);
+     const buf = readSectorData(difatSector);
+     const entriesPerSector = sectorSize / 4 - 1;
+     for (let i = 0; i < entriesPerSector && fatSectors.length < fatSectorCount; i++) {
+       const sid = buf.readUInt32LE(i * 4);
+       if (sid === FREE_SECT || sid === END_OF_CHAIN) continue;
+       fatSectors.push(sid);
+     }
+     difatSector = buf.readUInt32LE(entriesPerSector * 4);
+   }
+   const entriesPerFatSector = sectorSize / 4;
+   const fatTable = new Uint32Array(fatSectors.length * entriesPerFatSector);
+   for (let fi = 0; fi < fatSectors.length; fi++) {
+     const buf = readSectorData(fatSectors[fi]);
+     for (let i = 0; i < entriesPerFatSector; i++) {
+       fatTable[fi * entriesPerFatSector + i] = i * 4 + 3 < buf.length ? buf.readUInt32LE(i * 4) : FREE_SECT;
+     }
+   }
+   function readChain(startSector, maxBytes) {
+     if (startSector === END_OF_CHAIN || startSector === FREE_SECT) return Buffer.alloc(0);
+     if (maxBytes > MAX_STREAM_SIZE) throw new Error("\uC2A4\uD2B8\uB9BC\uC774 \uB108\uBB34 \uD07D\uB2C8\uB2E4");
+     const chunks = [];
+     let current = startSector;
+     let totalRead = 0;
+     const visited = /* @__PURE__ */ new Set();
+     while (current !== END_OF_CHAIN && current !== FREE_SECT && totalRead < maxBytes) {
+       if (visited.has(current)) break;
+       if (visited.size > MAX_CHAIN_LENGTH) break;
+       visited.add(current);
+       const buf = readSectorData(current);
+       const remaining = maxBytes - totalRead;
+       chunks.push(remaining < sectorSize ? buf.subarray(0, remaining) : buf);
+       totalRead += Math.min(buf.length, remaining);
+       current = current < fatTable.length ? fatTable[current] : END_OF_CHAIN;
+     }
+     return Buffer.concat(chunks);
+   }
+   let miniFatTable = null;
+   function getMiniFatTable() {
+     if (miniFatTable) return miniFatTable;
+     if (miniFatSectorCount === 0 || firstMiniFatSector === END_OF_CHAIN) {
+       miniFatTable = new Uint32Array(0);
+       return miniFatTable;
+     }
+     const miniFatData = readChain(firstMiniFatSector, miniFatSectorCount * sectorSize);
+     const entries = miniFatData.length / 4;
+     miniFatTable = new Uint32Array(entries);
+     for (let i = 0; i < entries; i++) {
+       miniFatTable[i] = miniFatData.readUInt32LE(i * 4);
+     }
+     return miniFatTable;
+   }
+   const dirData = readChain(firstDirSector, MAX_DIR_ENTRIES * 128);
+   const dirEntries = [];
+   for (let offset = 0; offset + 128 <= dirData.length && dirEntries.length < MAX_DIR_ENTRIES; offset += 128) {
+     const nameLen = dirData.readUInt16LE(offset + 64);
+     if (nameLen <= 0 || nameLen > 64) {
+       dirEntries.push({ name: "", type: 0, startSector: 0, size: 0 });
+       continue;
+     }
+     const nameBytes = nameLen - 2;
+     const name = nameBytes > 0 ? dirData.subarray(offset, offset + nameBytes).toString("utf16le") : "";
+     const type = dirData[offset + 66];
+     const startSector = dirData.readUInt32LE(offset + 116);
+     const size = dirData.readUInt32LE(offset + 120);
+     dirEntries.push({ name, type, startSector, size });
+   }
+   let miniStreamData = null;
+   function getMiniStream() {
+     if (miniStreamData) return miniStreamData;
+     const root = dirEntries[0];
+     if (!root || root.type !== 5) {
+       miniStreamData = Buffer.alloc(0);
+       return miniStreamData;
+     }
+     miniStreamData = readChain(root.startSector, root.size || MAX_STREAM_SIZE);
+     return miniStreamData;
+   }
+   function readMiniStream(startSector, size) {
+     const mft = getMiniFatTable();
+     const ms = getMiniStream();
+     if (mft.length === 0 || ms.length === 0) return Buffer.alloc(0);
+     const chunks = [];
+     let current = startSector;
+     let totalRead = 0;
+     const visited = /* @__PURE__ */ new Set();
+     while (current !== END_OF_CHAIN && current !== FREE_SECT && totalRead < size) {
+       if (visited.has(current)) break;
+       if (visited.size > MAX_CHAIN_LENGTH) break;
+       visited.add(current);
+       const off = current * miniSectorSize;
+       const remaining = size - totalRead;
+       const chunkSize = Math.min(miniSectorSize, remaining);
+       if (off + chunkSize <= ms.length) {
+         chunks.push(ms.subarray(off, off + chunkSize));
+       }
+       totalRead += chunkSize;
+       current = current < mft.length ? mft[current] : END_OF_CHAIN;
+     }
+     return Buffer.concat(chunks);
+   }
+   function readStreamData(entry) {
+     if (entry.size === 0) return Buffer.alloc(0);
+     if (entry.size < miniStreamCutoff) {
+       const miniResult = readMiniStream(entry.startSector, entry.size);
+       if (miniResult.length > 0) return miniResult;
+     }
+     return readChain(entry.startSector, entry.size);
+   }
+   function findEntryByPath(path) {
+     const parts = path.replace(/^\//, "").split("/");
+     if (parts.length === 1) {
+       return dirEntries.find((e) => e.name === parts[0] && e.type === 2) ?? null;
+     }
+     const storageName = parts[0];
+     const streamName = parts.slice(1).join("/");
+     for (const e of dirEntries) {
+       if (e.type === 2 && e.name === streamName) {
+         return e;
+       }
+     }
+     const lastPart = parts[parts.length - 1];
+     return dirEntries.find((e) => e.type === 2 && e.name === lastPart) ?? null;
+   }
+   return {
+     findStream(path) {
+       const normalized = path.replace(/^\//, "");
+       const entry = findEntryByPath(normalized);
+       if (!entry || entry.type !== 2) return null;
+       const stream = readStreamData(entry);
+       return stream.length > 0 ? stream : null;
+     },
+     entries() {
+       return dirEntries.filter((e) => e.type === 2);
+     }
+   };
+ }
+
+ // src/detect.ts
+ function magicBytes(buffer) {
+   return new Uint8Array(buffer, 0, Math.min(4, buffer.byteLength));
+ }
+ function isZipFile(buffer) {
+   const b = magicBytes(buffer);
+   return b[0] === 80 && b[1] === 75 && b[2] === 3 && b[3] === 4;
+ }
+ function isHwpxFile(buffer) {
+   return isZipFile(buffer);
+ }
+ function isOldHwpFile(buffer) {
+   const b = magicBytes(buffer);
+   return b[0] === 208 && b[1] === 207 && b[2] === 17 && b[3] === 224;
+ }
+ var HWP3_PREFIX = new TextEncoder().encode("HWP Document File V3.00");
+ function isHwp3File(buffer) {
+   if (buffer.byteLength < HWP3_PREFIX.length) return false;
+   const head = new Uint8Array(buffer, 0, HWP3_PREFIX.length);
+   for (let i = 0; i < HWP3_PREFIX.length; i++) {
+     if (head[i] !== HWP3_PREFIX[i]) return false;
+   }
+   return true;
+ }
+ function isPdfFile(buffer) {
+   const b = magicBytes(buffer);
+   return b[0] === 37 && b[1] === 80 && b[2] === 68 && b[3] === 70;
+ }
+ function isHwpmlFile(buffer) {
+   const bytes = new Uint8Array(buffer, 0, Math.min(512, buffer.byteLength));
+   const head = new TextDecoder("utf-8", { fatal: false }).decode(bytes).replace(/^\uFEFF/, "");
+   return head.trimStart().startsWith("<?xml") && head.includes("<HWPML");
+ }
+ function detectFormat(buffer) {
+   if (buffer.byteLength < 4) return "unknown";
+   if (isHwp3File(buffer)) return "hwp3";
+   if (isZipFile(buffer)) return "hwpx";
+   if (isOldHwpFile(buffer)) return "hwp";
+   if (isPdfFile(buffer)) return "pdf";
+   if (isHwpmlFile(buffer)) return "hwpml";
+   return "unknown";
+ }
+ function detectOle2Format(buffer) {
+   try {
+     const cfb = parseLenientCfb(Buffer.from(buffer));
+     const names = cfb.entries().map((e) => e.name);
+     if (names.includes("Workbook") || names.includes("Book")) return "xls";
+     if (names.includes("FileHeader")) return "hwp";
+     if (names.some((n) => n === "DocInfo" || n.startsWith("Section"))) return "hwp";
+     return "unknown";
+   } catch {
+     return "unknown";
+   }
+ }
+ async function detectZipFormat(buffer) {
+   try {
+     const zip = await JSZip.loadAsync(buffer);
+     if (zip.file("xl/workbook.xml")) return "xlsx";
+     if (zip.file("word/document.xml")) return "docx";
+     if (zip.file("Contents/content.hpf") || zip.file("mimetype")) return "hwpx";
+     const hasSection = Object.keys(zip.files).some((f) => f.startsWith("Contents/"));
+     if (hasSection) return "hwpx";
+     return "unknown";
+   } catch {
+     return "unknown";
+   }
+ }
+
+ export {
+   parseLenientCfb,
+   isZipFile,
+   isHwpxFile,
+   isOldHwpFile,
+   isHwp3File,
+   isPdfFile,
+   isHwpmlFile,
+   detectFormat,
+   detectOle2Format,
+   detectZipFormat
+ };
+ //# sourceMappingURL=chunk-5CJGKKMZ.js.map
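The chain readers in `parseLenientCfb` follow FAT pointers from sector to sector and stop early when a corrupted file makes the chain loop. The sketch below isolates that traversal, assuming the FAT has already been decoded into a `Uint32Array`; `walkChain` and its `maxSteps` parameter are hypothetical names for illustration and are not part of the package.

```typescript
// Sentinel values used in a Compound File Binary FAT.
const END_OF_CHAIN = 0xfffffffe;
const FREE_SECT = 0xffffffff;

// Hypothetical helper: return the sector ids visited when following a FAT chain.
// Stops on a sentinel, on a pointer outside the table, on a repeated sector,
// or after `maxSteps` sectors.
function walkChain(fat: Uint32Array, start: number, maxSteps = 1_000_000): number[] {
  const order: number[] = [];
  const visited = new Set<number>();
  let current = start;
  while (current !== END_OF_CHAIN && current !== FREE_SECT && order.length < maxSteps) {
    if (visited.has(current)) break; // cycle: the chain points back into itself
    visited.add(current);
    order.push(current);
    // An index past the end of the table yields undefined, treated as end of chain.
    current = fat[current] ?? END_OF_CHAIN;
  }
  return order;
}
```

Tracking visited sectors bounds the loop by the number of distinct sectors, so a looping FAT cannot hang the reader; the step limit is an extra guard in the same spirit as `MAX_CHAIN_LENGTH` in the bundled code.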
@@ -0,0 +1 @@
1
+ {"version":3,"sources":["../src/detect.ts","../src/hwp5/cfb-lenient.ts"],"sourcesContent":["/** 매직 바이트 기반 파일 포맷 감지 */\r\n\r\nimport JSZip from \"jszip\"\r\nimport type { FileType } from \"./types.js\"\r\nimport { parseLenientCfb } from \"./hwp5/cfb-lenient.js\"\r\n\r\n/** 매직 바이트 뷰 생성 (복사 없이 view) */\r\nfunction magicBytes(buffer: ArrayBuffer): Uint8Array {\r\n return new Uint8Array(buffer, 0, Math.min(4, buffer.byteLength))\r\n}\r\n\r\n/** ZIP 파일 여부: PK\\x03\\x04 */\r\nexport function isZipFile(buffer: ArrayBuffer): boolean {\r\n const b = magicBytes(buffer)\r\n return b[0] === 0x50 && b[1] === 0x4b && b[2] === 0x03 && b[3] === 0x04\r\n}\r\n\r\n/** HWPX (ZIP 기반 한컴 문서): PK\\x03\\x04 — 하위 호환용 */\r\nexport function isHwpxFile(buffer: ArrayBuffer): boolean {\r\n return isZipFile(buffer)\r\n}\r\n\r\n/** HWP 5.x (OLE2 바이너리 한컴 문서): \\xD0\\xCF\\x11\\xE0 */\r\nexport function isOldHwpFile(buffer: ArrayBuffer): boolean {\r\n const b = magicBytes(buffer)\r\n return b[0] === 0xd0 && b[1] === 0xcf && b[2] === 0x11 && b[3] === 0xe0\r\n}\r\n\r\n/**\r\n * HWP 3.x (한글 워드프로세서 3.0): \"HWP Document File V3.00 \\x1A\\x01\\x02\\x03\\x04\\x05\" 30 byte.\r\n * CFB(OLE2) 컨테이너 아닌 단일 binary stream — isOldHwpFile 과 magic 이 다르다.\r\n */\r\nconst HWP3_PREFIX = new TextEncoder().encode(\"HWP Document File V3.00\")\r\nexport function isHwp3File(buffer: ArrayBuffer): boolean {\r\n if (buffer.byteLength < HWP3_PREFIX.length) return false\r\n const head = new Uint8Array(buffer, 0, HWP3_PREFIX.length)\r\n for (let i = 0; i < HWP3_PREFIX.length; i++) {\r\n if (head[i] !== HWP3_PREFIX[i]) return false\r\n }\r\n return true\r\n}\r\n\r\n/** PDF 문서: %PDF */\r\nexport function isPdfFile(buffer: ArrayBuffer): boolean {\r\n const b = magicBytes(buffer)\r\n return b[0] === 0x25 && b[1] === 0x50 && b[2] === 0x44 && b[3] === 0x46\r\n}\r\n\r\n/** HWPML (XML 기반 한컴 문서): <?xml ... 
<HWPML */\r\nexport function isHwpmlFile(buffer: ArrayBuffer): boolean {\r\n const bytes = new Uint8Array(buffer, 0, Math.min(512, buffer.byteLength))\r\n const head = new TextDecoder(\"utf-8\", { fatal: false }).decode(bytes).replace(/^\\uFEFF/, \"\")\r\n return head.trimStart().startsWith(\"<?xml\") && head.includes(\"<HWPML\")\r\n}\r\n\r\n/** 동기 포맷 감지 — ZIP은 모두 \"hwpx\"로 반환 (하위 호환) */\r\nexport function detectFormat(buffer: ArrayBuffer): FileType {\r\n if (buffer.byteLength < 4) return \"unknown\"\r\n if (isHwp3File(buffer)) return \"hwp3\"\r\n if (isZipFile(buffer)) return \"hwpx\"\r\n if (isOldHwpFile(buffer)) return \"hwp\"\r\n if (isPdfFile(buffer)) return \"pdf\"\r\n if (isHwpmlFile(buffer)) return \"hwpml\"\r\n return \"unknown\"\r\n}\r\n\r\n/**\r\n * OLE2 컨테이너 내부 스트림 기반 포맷 세분화.\r\n * HWP 5.x, XLS 모두 OLE2이므로 스트림 이름으로 구분.\r\n * - \"Workbook\" 또는 \"Book\" → 'xls'\r\n * - 그 외 (FileHeader 등) → 'hwp'\r\n */\r\nexport function detectOle2Format(buffer: ArrayBuffer): \"hwp\" | \"xls\" | \"unknown\" {\r\n try {\r\n const cfb = parseLenientCfb(Buffer.from(buffer))\r\n const names = cfb.entries().map(e => e.name)\r\n if (names.includes(\"Workbook\") || names.includes(\"Book\")) return \"xls\"\r\n if (names.includes(\"FileHeader\")) return \"hwp\"\r\n // FileHeader 없어도 BodyText/DocInfo 있으면 hwp\r\n if (names.some(n => n === \"DocInfo\" || n.startsWith(\"Section\"))) return \"hwp\"\r\n return \"unknown\"\r\n } catch {\r\n return \"unknown\"\r\n }\r\n}\r\n\r\n/**\r\n * ZIP 내부 구조 기반 포맷 세분화.\r\n * HWPX, XLSX, DOCX 모두 ZIP이므로 내부 파일로 구분.\r\n */\r\nexport async function detectZipFormat(buffer: ArrayBuffer): Promise<\"hwpx\" | \"xlsx\" | \"docx\" | \"unknown\"> {\r\n try {\r\n const zip = await JSZip.loadAsync(buffer)\r\n // XLSX: xl/workbook.xml\r\n if (zip.file(\"xl/workbook.xml\")) return \"xlsx\"\r\n // DOCX: word/document.xml\r\n if (zip.file(\"word/document.xml\")) return \"docx\"\r\n // HWPX: Contents/ 또는 content.hpf 또는 mimetype\r\n if (zip.file(\"Contents/content.hpf\") 
|| zip.file(\"mimetype\")) return \"hwpx\"\r\n // 기타 ZIP 내에 section 파일이 있으면 HWPX로 추정\r\n const hasSection = Object.keys(zip.files).some(f => f.startsWith(\"Contents/\"))\r\n if (hasSection) return \"hwpx\"\r\n return \"unknown\"\r\n } catch {\r\n return \"unknown\"\r\n }\r\n}\r\n","/**\n * Lenient CFB (Compound File Binary / OLE2) 파서.\n *\n * 표준 cfb 모듈이 FAT 검증 실패로 거부하는 손상된 HWP 파일을 열기 위한 폴백.\n * 직접 헤더/FAT/디렉토리를 파싱하여 스트림 데이터를 추출.\n *\n * 참조: rhwp (MIT) src/parser/cfb_reader.rs (LenientCfbReader)\n * 참조: MS-CFB spec (https://learn.microsoft.com/en-us/openspecs/windows_protocols/ms-cfb)\n */\n\nimport { decompressStream } from \"./record.js\"\n\n// ── 상수 ──\n\nconst CFB_MAGIC = Buffer.from([0xd0, 0xcf, 0x11, 0xe0, 0xa1, 0xb1, 0x1a, 0xe1])\nconst END_OF_CHAIN = 0xfffffffe\nconst FREE_SECT = 0xffffffff\n\n/** 순환 감지용 최대 체인 길이 */\nconst MAX_CHAIN_LENGTH = 1_000_000\n/** 최대 디렉토리 엔트리 수 */\nconst MAX_DIR_ENTRIES = 100_000\n/** 최대 스트림 크기 (100MB) */\nconst MAX_STREAM_SIZE = 100 * 1024 * 1024\n\n// ── 디렉토리 엔트리 ──\n\ninterface DirEntry {\n name: string\n type: number // 0=unknown, 1=storage, 2=stream, 5=root\n startSector: number\n size: number\n}\n\n// ── CFB 컨테이너 ──\n\nexport interface LenientCfbContainer {\n /** 이름 기반 스트림 탐색 */\n findStream(path: string): Buffer | null\n /** 디렉토리 엔트리 목록 */\n entries(): DirEntry[]\n}\n\n// ── 구현 ──\n\nexport function parseLenientCfb(data: Buffer): LenientCfbContainer {\n if (data.length < 512) throw new Error(\"CFB 파일이 너무 짧습니다 (최소 512바이트)\")\n if (!data.subarray(0, 8).equals(CFB_MAGIC)) throw new Error(\"CFB 매직 바이트 불일치\")\n\n // ── 헤더 파싱 ──\n\n const sectorSizeShift = data.readUInt16LE(30)\n if (sectorSizeShift < 7 || sectorSizeShift > 16) throw new Error(\"유효하지 않은 섹터 크기 시프트: \" + sectorSizeShift)\n const sectorSize = 1 << sectorSizeShift // 보통 512\n const miniSectorSizeShift = data.readUInt16LE(32)\n if (miniSectorSizeShift > 16) throw new Error(\"유효하지 않은 미니 섹터 크기 시프트: \" + miniSectorSizeShift)\n const miniSectorSize = 1 << miniSectorSizeShift 
// 보통 64\n\n const fatSectorCount = data.readUInt32LE(44)\n if (fatSectorCount > 10000) throw new Error(\"FAT 섹터 수가 너무 많습니다: \" + fatSectorCount)\n const firstDirSector = data.readUInt32LE(48)\n const miniStreamCutoff = data.readUInt32LE(56) // 보통 4096\n const firstMiniFatSector = data.readUInt32LE(60)\n const miniFatSectorCount = data.readUInt32LE(64)\n const firstDifatSector = data.readUInt32LE(68)\n const difatSectorCount = data.readUInt32LE(72)\n\n // ── 유틸 ──\n\n function sectorOffset(id: number): number {\n return 512 + id * sectorSize\n }\n\n function readSectorData(id: number): Buffer {\n const off = sectorOffset(id)\n if (off + sectorSize > data.length) return Buffer.alloc(0)\n return data.subarray(off, off + sectorSize)\n }\n\n // ── DIFAT → FAT 섹터 목록 ──\n\n const fatSectors: number[] = []\n\n // 헤더 내 DIFAT (최대 109개)\n for (let i = 0; i < 109 && fatSectors.length < fatSectorCount; i++) {\n const sid = data.readUInt32LE(76 + i * 4)\n if (sid === FREE_SECT || sid === END_OF_CHAIN) break\n fatSectors.push(sid)\n }\n\n // 추가 DIFAT 섹터 체인\n let difatSector = firstDifatSector\n const visitedDifat = new Set<number>()\n for (let d = 0; d < difatSectorCount && difatSector !== END_OF_CHAIN && difatSector !== FREE_SECT; d++) {\n if (visitedDifat.has(difatSector)) break\n visitedDifat.add(difatSector)\n\n const buf = readSectorData(difatSector)\n const entriesPerSector = (sectorSize / 4) - 1 // 마지막 4바이트는 다음 DIFAT 포인터\n for (let i = 0; i < entriesPerSector && fatSectors.length < fatSectorCount; i++) {\n const sid = buf.readUInt32LE(i * 4)\n if (sid === FREE_SECT || sid === END_OF_CHAIN) continue\n fatSectors.push(sid)\n }\n difatSector = buf.readUInt32LE(entriesPerSector * 4)\n }\n\n // ── FAT 테이블 구축 ──\n\n const entriesPerFatSector = sectorSize / 4\n const fatTable = new Uint32Array(fatSectors.length * entriesPerFatSector)\n\n for (let fi = 0; fi < fatSectors.length; fi++) {\n const buf = readSectorData(fatSectors[fi])\n for (let i = 0; i < entriesPerFatSector; i++) 
{\n fatTable[fi * entriesPerFatSector + i] = i * 4 + 3 < buf.length\n ? buf.readUInt32LE(i * 4)\n : FREE_SECT\n }\n }\n\n // ── 체인 리더 (순환 방지) ──\n\n function readChain(startSector: number, maxBytes: number): Buffer {\n if (startSector === END_OF_CHAIN || startSector === FREE_SECT) return Buffer.alloc(0)\n if (maxBytes > MAX_STREAM_SIZE) throw new Error(\"스트림이 너무 큽니다\")\n\n const chunks: Buffer[] = []\n let current = startSector\n let totalRead = 0\n const visited = new Set<number>()\n\n while (current !== END_OF_CHAIN && current !== FREE_SECT && totalRead < maxBytes) {\n if (visited.has(current)) break // 순환 감지\n if (visited.size > MAX_CHAIN_LENGTH) break\n visited.add(current)\n\n const buf = readSectorData(current)\n const remaining = maxBytes - totalRead\n chunks.push(remaining < sectorSize ? buf.subarray(0, remaining) : buf)\n totalRead += Math.min(buf.length, remaining)\n\n current = current < fatTable.length ? fatTable[current] : END_OF_CHAIN\n }\n\n return Buffer.concat(chunks)\n }\n\n // ── Mini-FAT 테이블 ──\n\n let miniFatTable: Uint32Array | null = null\n\n function getMiniFatTable(): Uint32Array {\n if (miniFatTable) return miniFatTable\n\n if (miniFatSectorCount === 0 || firstMiniFatSector === END_OF_CHAIN) {\n miniFatTable = new Uint32Array(0)\n return miniFatTable\n }\n\n const miniFatData = readChain(firstMiniFatSector, miniFatSectorCount * sectorSize)\n const entries = miniFatData.length / 4\n miniFatTable = new Uint32Array(entries)\n for (let i = 0; i < entries; i++) {\n miniFatTable[i] = miniFatData.readUInt32LE(i * 4)\n }\n return miniFatTable\n }\n\n // ── 디렉토리 엔트리 파싱 ──\n\n const dirData = readChain(firstDirSector, MAX_DIR_ENTRIES * 128)\n const dirEntries: DirEntry[] = []\n\n for (let offset = 0; offset + 128 <= dirData.length && dirEntries.length < MAX_DIR_ENTRIES; offset += 128) {\n const nameLen = dirData.readUInt16LE(offset + 64) // 바이트 수 (null 포함)\n if (nameLen <= 0 || nameLen > 64) {\n dirEntries.push({ name: \"\", type: 0, startSector: 0, 
size: 0 })\n continue\n }\n\n const nameBytes = nameLen - 2 // null terminator 제외\n const name = nameBytes > 0\n ? dirData.subarray(offset, offset + nameBytes).toString(\"utf16le\")\n : \"\"\n\n const type = dirData[offset + 66]\n const startSector = dirData.readUInt32LE(offset + 116)\n // CFBv3에서는 size가 u32 (offset 120), v4에서는 u64\n const size = dirData.readUInt32LE(offset + 120)\n\n dirEntries.push({ name, type, startSector, size })\n }\n\n // ── Root 엔트리에서 미니 스트림 추출 ──\n\n let miniStreamData: Buffer | null = null\n\n function getMiniStream(): Buffer {\n if (miniStreamData) return miniStreamData\n const root = dirEntries[0]\n if (!root || root.type !== 5) {\n miniStreamData = Buffer.alloc(0)\n return miniStreamData\n }\n miniStreamData = readChain(root.startSector, root.size || MAX_STREAM_SIZE)\n return miniStreamData\n }\n\n // ── 미니 스트림에서 읽기 ──\n\n function readMiniStream(startSector: number, size: number): Buffer {\n const mft = getMiniFatTable()\n const ms = getMiniStream()\n if (mft.length === 0 || ms.length === 0) return Buffer.alloc(0)\n\n const chunks: Buffer[] = []\n let current = startSector\n let totalRead = 0\n const visited = new Set<number>()\n\n while (current !== END_OF_CHAIN && current !== FREE_SECT && totalRead < size) {\n if (visited.has(current)) break\n if (visited.size > MAX_CHAIN_LENGTH) break\n visited.add(current)\n\n const off = current * miniSectorSize\n const remaining = size - totalRead\n const chunkSize = Math.min(miniSectorSize, remaining)\n if (off + chunkSize <= ms.length) {\n chunks.push(ms.subarray(off, off + chunkSize))\n }\n totalRead += chunkSize\n\n current = current < mft.length ? 
mft[current] : END_OF_CHAIN\n }\n\n return Buffer.concat(chunks)\n }\n\n // ── 스트림 읽기 (일반/미니 자동 분기) ──\n\n function readStreamData(entry: DirEntry): Buffer {\n if (entry.size === 0) return Buffer.alloc(0)\n if (entry.size < miniStreamCutoff) {\n const miniResult = readMiniStream(entry.startSector, entry.size)\n // 미니스트림이 비어있으면 일반 체인으로 폴백 (lenient)\n if (miniResult.length > 0) return miniResult\n }\n return readChain(entry.startSector, entry.size)\n }\n\n // ── 경로 기반 탐색 ──\n\n // 전체 경로 맵 구축 (간이: 이름 기반 flat lookup)\n // HWP 파일의 디렉토리 구조는 보통 1~2 depth이므로 이름 매칭으로 충분\n function findEntryByPath(path: string): DirEntry | null {\n // \"/FileHeader\" → \"FileHeader\"\n // \"/BodyText/Section0\" → path component matching\n const parts = path.replace(/^\\//, \"\").split(\"/\")\n\n if (parts.length === 1) {\n // 단일 이름 매칭\n return dirEntries.find(e => e.name === parts[0] && e.type === 2) ?? null\n }\n\n // 2-depth: storage/stream\n // HWP 구조: Root/BodyText/Section0, Root/DocInfo, Root/BinData/BIN0001 등\n const storageName = parts[0]\n const streamName = parts.slice(1).join(\"/\")\n\n // 디렉토리 구조 대신 이름 패턴으로 찾기 (lenient)\n for (const e of dirEntries) {\n if (e.type === 2 && e.name === streamName) {\n // 부모 확인은 생략 (lenient) — 중복 이름 시 첫 번째 반환\n return e\n }\n }\n\n // 정확한 이름이 아닌 경우 (ViewText/Section0 등)\n const lastPart = parts[parts.length - 1]\n return dirEntries.find(e => e.type === 2 && e.name === lastPart) ?? null\n }\n\n // ── 공개 API ──\n\n return {\n findStream(path: string): Buffer | null {\n // \\005 prefix 처리 (SummaryInformation)\n const normalized = path.replace(/^\\//, \"\")\n const entry = findEntryByPath(normalized)\n if (!entry || entry.type !== 2) return null\n const stream = readStreamData(entry)\n return stream.length > 0 ? 
stream : null\n },\n\n entries(): DirEntry[] {\n return dirEntries.filter(e => e.type === 2) // stream만\n },\n }\n}\n"],"mappings":";;;AAEA,OAAO,WAAW;;;ACYlB,IAAM,YAAY,OAAO,KAAK,CAAC,KAAM,KAAM,IAAM,KAAM,KAAM,KAAM,IAAM,GAAI,CAAC;AAC9E,IAAM,eAAe;AACrB,IAAM,YAAY;AAGlB,IAAM,mBAAmB;AAEzB,IAAM,kBAAkB;AAExB,IAAM,kBAAkB,MAAM,OAAO;AAsB9B,SAAS,gBAAgB,MAAmC;AACjE,MAAI,KAAK,SAAS,IAAK,OAAM,IAAI,MAAM,mGAA6B;AACpE,MAAI,CAAC,KAAK,SAAS,GAAG,CAAC,EAAE,OAAO,SAAS,EAAG,OAAM,IAAI,MAAM,wDAAgB;AAI5E,QAAM,kBAAkB,KAAK,aAAa,EAAE;AAC5C,MAAI,kBAAkB,KAAK,kBAAkB,GAAI,OAAM,IAAI,MAAM,yFAAwB,eAAe;AACxG,QAAM,aAAa,KAAK;AACxB,QAAM,sBAAsB,KAAK,aAAa,EAAE;AAChD,MAAI,sBAAsB,GAAI,OAAM,IAAI,MAAM,sGAA2B,mBAAmB;AAC5F,QAAM,iBAAiB,KAAK;AAE5B,QAAM,iBAAiB,KAAK,aAAa,EAAE;AAC3C,MAAI,iBAAiB,IAAO,OAAM,IAAI,MAAM,0EAAwB,cAAc;AAClF,QAAM,iBAAiB,KAAK,aAAa,EAAE;AAC3C,QAAM,mBAAmB,KAAK,aAAa,EAAE;AAC7C,QAAM,qBAAqB,KAAK,aAAa,EAAE;AAC/C,QAAM,qBAAqB,KAAK,aAAa,EAAE;AAC/C,QAAM,mBAAmB,KAAK,aAAa,EAAE;AAC7C,QAAM,mBAAmB,KAAK,aAAa,EAAE;AAI7C,WAAS,aAAa,IAAoB;AACxC,WAAO,MAAM,KAAK;AAAA,EACpB;AAEA,WAAS,eAAe,IAAoB;AAC1C,UAAM,MAAM,aAAa,EAAE;AAC3B,QAAI,MAAM,aAAa,KAAK,OAAQ,QAAO,OAAO,MAAM,CAAC;AACzD,WAAO,KAAK,SAAS,KAAK,MAAM,UAAU;AAAA,EAC5C;AAIA,QAAM,aAAuB,CAAC;AAG9B,WAAS,IAAI,GAAG,IAAI,OAAO,WAAW,SAAS,gBAAgB,KAAK;AAClE,UAAM,MAAM,KAAK,aAAa,KAAK,IAAI,CAAC;AACxC,QAAI,QAAQ,aAAa,QAAQ,aAAc;AAC/C,eAAW,KAAK,GAAG;AAAA,EACrB;AAGA,MAAI,cAAc;AAClB,QAAM,eAAe,oBAAI,IAAY;AACrC,WAAS,IAAI,GAAG,IAAI,oBAAoB,gBAAgB,gBAAgB,gBAAgB,WAAW,KAAK;AACtG,QAAI,aAAa,IAAI,WAAW,EAAG;AACnC,iBAAa,IAAI,WAAW;AAE5B,UAAM,MAAM,eAAe,WAAW;AACtC,UAAM,mBAAoB,aAAa,IAAK;AAC5C,aAAS,IAAI,GAAG,IAAI,oBAAoB,WAAW,SAAS,gBAAgB,KAAK;AAC/E,YAAM,MAAM,IAAI,aAAa,IAAI,CAAC;AAClC,UAAI,QAAQ,aAAa,QAAQ,aAAc;AAC/C,iBAAW,KAAK,GAAG;AAAA,IACrB;AACA,kBAAc,IAAI,aAAa,mBAAmB,CAAC;AAAA,EACrD;AAIA,QAAM,sBAAsB,aAAa;AACzC,QAAM,WAAW,IAAI,YAAY,WAAW,SAAS,mBAAmB;AAExE,WAAS,KAAK,GAAG,KAAK,WAAW,QAAQ,MAAM;AAC7C,UAAM,MAAM,eAAe,WAAW,EAAE,CAAC;AACzC,aAAS,IAAI,GAAG,IAAI,qBAAqB,KAAK;AAC5C,eAAS,KAAK,sBAAsB,CAAC,IAAI,IAAI,IAAI,IAAI,IAA
I,SACrD,IAAI,aAAa,IAAI,CAAC,IACtB;AAAA,IACN;AAAA,EACF;AAIA,WAAS,UAAU,aAAqB,UAA0B;AAChE,QAAI,gBAAgB,gBAAgB,gBAAgB,UAAW,QAAO,OAAO,MAAM,CAAC;AACpF,QAAI,WAAW,gBAAiB,OAAM,IAAI,MAAM,0DAAa;AAE7D,UAAM,SAAmB,CAAC;AAC1B,QAAI,UAAU;AACd,QAAI,YAAY;AAChB,UAAM,UAAU,oBAAI,IAAY;AAEhC,WAAO,YAAY,gBAAgB,YAAY,aAAa,YAAY,UAAU;AAChF,UAAI,QAAQ,IAAI,OAAO,EAAG;AAC1B,UAAI,QAAQ,OAAO,iBAAkB;AACrC,cAAQ,IAAI,OAAO;AAEnB,YAAM,MAAM,eAAe,OAAO;AAClC,YAAM,YAAY,WAAW;AAC7B,aAAO,KAAK,YAAY,aAAa,IAAI,SAAS,GAAG,SAAS,IAAI,GAAG;AACrE,mBAAa,KAAK,IAAI,IAAI,QAAQ,SAAS;AAE3C,gBAAU,UAAU,SAAS,SAAS,SAAS,OAAO,IAAI;AAAA,IAC5D;AAEA,WAAO,OAAO,OAAO,MAAM;AAAA,EAC7B;AAIA,MAAI,eAAmC;AAEvC,WAAS,kBAA+B;AACtC,QAAI,aAAc,QAAO;AAEzB,QAAI,uBAAuB,KAAK,uBAAuB,cAAc;AACnE,qBAAe,IAAI,YAAY,CAAC;AAChC,aAAO;AAAA,IACT;AAEA,UAAM,cAAc,UAAU,oBAAoB,qBAAqB,UAAU;AACjF,UAAM,UAAU,YAAY,SAAS;AACrC,mBAAe,IAAI,YAAY,OAAO;AACtC,aAAS,IAAI,GAAG,IAAI,SAAS,KAAK;AAChC,mBAAa,CAAC,IAAI,YAAY,aAAa,IAAI,CAAC;AAAA,IAClD;AACA,WAAO;AAAA,EACT;AAIA,QAAM,UAAU,UAAU,gBAAgB,kBAAkB,GAAG;AAC/D,QAAM,aAAyB,CAAC;AAEhC,WAAS,SAAS,GAAG,SAAS,OAAO,QAAQ,UAAU,WAAW,SAAS,iBAAiB,UAAU,KAAK;AACzG,UAAM,UAAU,QAAQ,aAAa,SAAS,EAAE;AAChD,QAAI,WAAW,KAAK,UAAU,IAAI;AAChC,iBAAW,KAAK,EAAE,MAAM,IAAI,MAAM,GAAG,aAAa,GAAG,MAAM,EAAE,CAAC;AAC9D;AAAA,IACF;AAEA,UAAM,YAAY,UAAU;AAC5B,UAAM,OAAO,YAAY,IACrB,QAAQ,SAAS,QAAQ,SAAS,SAAS,EAAE,SAAS,SAAS,IAC/D;AAEJ,UAAM,OAAO,QAAQ,SAAS,EAAE;AAChC,UAAM,cAAc,QAAQ,aAAa,SAAS,GAAG;AAErD,UAAM,OAAO,QAAQ,aAAa,SAAS,GAAG;AAE9C,eAAW,KAAK,EAAE,MAAM,MAAM,aAAa,KAAK,CAAC;AAAA,EACnD;AAIA,MAAI,iBAAgC;AAEpC,WAAS,gBAAwB;AAC/B,QAAI,eAAgB,QAAO;AAC3B,UAAM,OAAO,WAAW,CAAC;AACzB,QAAI,CAAC,QAAQ,KAAK,SAAS,GAAG;AAC5B,uBAAiB,OAAO,MAAM,CAAC;AAC/B,aAAO;AAAA,IACT;AACA,qBAAiB,UAAU,KAAK,aAAa,KAAK,QAAQ,eAAe;AACzE,WAAO;AAAA,EACT;AAIA,WAAS,eAAe,aAAqB,MAAsB;AACjE,UAAM,MAAM,gBAAgB;AAC5B,UAAM,KAAK,cAAc;AACzB,QAAI,IAAI,WAAW,KAAK,GAAG,WAAW,EAAG,QAAO,OAAO,MAAM,CAAC;AAE9D,UAAM,SAAmB,CAAC;AAC1B,QAAI,UAAU;AACd,QAAI,YAAY;AAChB,UAAM,UAAU,oBAAI,IAAY;AAEhC,WAAO,YAAY,gBAAgB,YAAY,aAAa,YAAY,MAAM;AAC5E,UAAI,QAAQ,IAAI,OAAO,EAAG
;AAC1B,UAAI,QAAQ,OAAO,iBAAkB;AACrC,cAAQ,IAAI,OAAO;AAEnB,YAAM,MAAM,UAAU;AACtB,YAAM,YAAY,OAAO;AACzB,YAAM,YAAY,KAAK,IAAI,gBAAgB,SAAS;AACpD,UAAI,MAAM,aAAa,GAAG,QAAQ;AAChC,eAAO,KAAK,GAAG,SAAS,KAAK,MAAM,SAAS,CAAC;AAAA,MAC/C;AACA,mBAAa;AAEb,gBAAU,UAAU,IAAI,SAAS,IAAI,OAAO,IAAI;AAAA,IAClD;AAEA,WAAO,OAAO,OAAO,MAAM;AAAA,EAC7B;AAIA,WAAS,eAAe,OAAyB;AAC/C,QAAI,MAAM,SAAS,EAAG,QAAO,OAAO,MAAM,CAAC;AAC3C,QAAI,MAAM,OAAO,kBAAkB;AACjC,YAAM,aAAa,eAAe,MAAM,aAAa,MAAM,IAAI;AAE/D,UAAI,WAAW,SAAS,EAAG,QAAO;AAAA,IACpC;AACA,WAAO,UAAU,MAAM,aAAa,MAAM,IAAI;AAAA,EAChD;AAMA,WAAS,gBAAgB,MAA+B;AAGtD,UAAM,QAAQ,KAAK,QAAQ,OAAO,EAAE,EAAE,MAAM,GAAG;AAE/C,QAAI,MAAM,WAAW,GAAG;AAEtB,aAAO,WAAW,KAAK,OAAK,EAAE,SAAS,MAAM,CAAC,KAAK,EAAE,SAAS,CAAC,KAAK;AAAA,IACtE;AAIA,UAAM,cAAc,MAAM,CAAC;AAC3B,UAAM,aAAa,MAAM,MAAM,CAAC,EAAE,KAAK,GAAG;AAG1C,eAAW,KAAK,YAAY;AAC1B,UAAI,EAAE,SAAS,KAAK,EAAE,SAAS,YAAY;AAEzC,eAAO;AAAA,MACT;AAAA,IACF;AAGA,UAAM,WAAW,MAAM,MAAM,SAAS,CAAC;AACvC,WAAO,WAAW,KAAK,OAAK,EAAE,SAAS,KAAK,EAAE,SAAS,QAAQ,KAAK;AAAA,EACtE;AAIA,SAAO;AAAA,IACL,WAAW,MAA6B;AAEtC,YAAM,aAAa,KAAK,QAAQ,OAAO,EAAE;AACzC,YAAM,QAAQ,gBAAgB,UAAU;AACxC,UAAI,CAAC,SAAS,MAAM,SAAS,EAAG,QAAO;AACvC,YAAM,SAAS,eAAe,KAAK;AACnC,aAAO,OAAO,SAAS,IAAI,SAAS;AAAA,IACtC;AAAA,IAEA,UAAsB;AACpB,aAAO,WAAW,OAAO,OAAK,EAAE,SAAS,CAAC;AAAA,IAC5C;AAAA,EACF;AACF;;;ADrSA,SAAS,WAAW,QAAiC;AACnD,SAAO,IAAI,WAAW,QAAQ,GAAG,KAAK,IAAI,GAAG,OAAO,UAAU,CAAC;AACjE;AAGO,SAAS,UAAU,QAA8B;AACtD,QAAM,IAAI,WAAW,MAAM;AAC3B,SAAO,EAAE,CAAC,MAAM,MAAQ,EAAE,CAAC,MAAM,MAAQ,EAAE,CAAC,MAAM,KAAQ,EAAE,CAAC,MAAM;AACrE;AAGO,SAAS,WAAW,QAA8B;AACvD,SAAO,UAAU,MAAM;AACzB;AAGO,SAAS,aAAa,QAA8B;AACzD,QAAM,IAAI,WAAW,MAAM;AAC3B,SAAO,EAAE,CAAC,MAAM,OAAQ,EAAE,CAAC,MAAM,OAAQ,EAAE,CAAC,MAAM,MAAQ,EAAE,CAAC,MAAM;AACrE;AAMA,IAAM,cAAc,IAAI,YAAY,EAAE,OAAO,yBAAyB;AAC/D,SAAS,WAAW,QAA8B;AACvD,MAAI,OAAO,aAAa,YAAY,OAAQ,QAAO;AACnD,QAAM,OAAO,IAAI,WAAW,QAAQ,GAAG,YAAY,MAAM;AACzD,WAAS,IAAI,GAAG,IAAI,YAAY,QAAQ,KAAK;AAC3C,QAAI,KAAK,CAAC,MAAM,YAAY,CAAC,EAAG,QAAO;AAAA,EACzC;AACA,SAAO;AACT;AAGO,SAAS,UAAU,QAA8B;AACtD,QAAM,IAAI,WAAW,M
AAM;AAC3B,SAAO,EAAE,CAAC,MAAM,MAAQ,EAAE,CAAC,MAAM,MAAQ,EAAE,CAAC,MAAM,MAAQ,EAAE,CAAC,MAAM;AACrE;AAGO,SAAS,YAAY,QAA8B;AACxD,QAAM,QAAQ,IAAI,WAAW,QAAQ,GAAG,KAAK,IAAI,KAAK,OAAO,UAAU,CAAC;AACxE,QAAM,OAAO,IAAI,YAAY,SAAS,EAAE,OAAO,MAAM,CAAC,EAAE,OAAO,KAAK,EAAE,QAAQ,WAAW,EAAE;AAC3F,SAAO,KAAK,UAAU,EAAE,WAAW,OAAO,KAAK,KAAK,SAAS,QAAQ;AACvE;AAGO,SAAS,aAAa,QAA+B;AAC1D,MAAI,OAAO,aAAa,EAAG,QAAO;AAClC,MAAI,WAAW,MAAM,EAAG,QAAO;AAC/B,MAAI,UAAU,MAAM,EAAG,QAAO;AAC9B,MAAI,aAAa,MAAM,EAAG,QAAO;AACjC,MAAI,UAAU,MAAM,EAAG,QAAO;AAC9B,MAAI,YAAY,MAAM,EAAG,QAAO;AAChC,SAAO;AACT;AAQO,SAAS,iBAAiB,QAAgD;AAC/E,MAAI;AACF,UAAM,MAAM,gBAAgB,OAAO,KAAK,MAAM,CAAC;AAC/C,UAAM,QAAQ,IAAI,QAAQ,EAAE,IAAI,OAAK,EAAE,IAAI;AAC3C,QAAI,MAAM,SAAS,UAAU,KAAK,MAAM,SAAS,MAAM,EAAG,QAAO;AACjE,QAAI,MAAM,SAAS,YAAY,EAAG,QAAO;AAEzC,QAAI,MAAM,KAAK,OAAK,MAAM,aAAa,EAAE,WAAW,SAAS,CAAC,EAAG,QAAO;AACxE,WAAO;AAAA,EACT,QAAQ;AACN,WAAO;AAAA,EACT;AACF;AAMA,eAAsB,gBAAgB,QAAoE;AACxG,MAAI;AACF,UAAM,MAAM,MAAM,MAAM,UAAU,MAAM;AAExC,QAAI,IAAI,KAAK,iBAAiB,EAAG,QAAO;AAExC,QAAI,IAAI,KAAK,mBAAmB,EAAG,QAAO;AAE1C,QAAI,IAAI,KAAK,sBAAsB,KAAK,IAAI,KAAK,UAAU,EAAG,QAAO;AAErE,UAAM,aAAa,OAAO,KAAK,IAAI,KAAK,EAAE,KAAK,OAAK,EAAE,WAAW,WAAW,CAAC;AAC7E,QAAI,WAAY,QAAO;AACvB,WAAO;AAAA,EACT,QAAQ;AACN,WAAO;AAAA,EACT;AACF;","names":[]}
@@ -1,5 +1,5 @@
  // src/utils.ts
- var VERSION = true ? "2.5.2" : "0.0.0-dev";
+ var VERSION = true ? "2.7.1" : "0.0.0-dev";
  function toArrayBuffer(buf) {
    if (buf.byteOffset === 0 && buf.byteLength === buf.buffer.byteLength) {
      return buf.buffer;
@@ -325,6 +325,17 @@ function hasMergedCells(table) {
    }
    return false;
  }
+ function containsInlineMath(text) {
+   return /(^|[^\\])\$(?=\S)(?:\\.|[^$\n])+?\S\$/.test(text);
+ }
+ function tableContainsInlineMath(table) {
+   for (const row of table.cells) {
+     for (const cell of row) {
+       if (containsInlineMath(cell.text)) return true;
+     }
+   }
+   return false;
+ }
  function tableToHtml(table) {
    const { cells, rows: numRows, cols: numCols } = table;
    const skip = /* @__PURE__ */ new Set();
@@ -357,7 +368,7 @@ function tableToHtml(table) {
  function tableToMarkdown(table) {
    if (table.rows === 0 || table.cols === 0) return "";
    const { cells, rows: numRows, cols: numCols } = table;
-   if (hasMergedCells(table)) return tableToHtml(table);
+   if (hasMergedCells(table) && !tableContainsInlineMath(table)) return tableToHtml(table);
    if (numRows === 1 && numCols === 1) {
      const content = sanitizeText(cells[0][0].text);
      if (!content) return "";
@@ -447,4 +458,4 @@ export {
    HEADING_RATIO_H2,
    HEADING_RATIO_H3
  };
- //# sourceMappingURL=chunk-24NKFRB4.js.map
+ //# sourceMappingURL=chunk-GNN6MHH4.js.map
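Note on the hunks above: the functional change in this chunk is that a table with merged cells is no longer sent to the HTML renderer when any cell contains `$...$` inline math. The sketch below is one way to probe that behavior in isolation. The regular expression is copied from the diff; `pickTableRenderer` is a hypothetical helper written for illustration and is not part of kordoc, and its `"gfm"` result only means "falls through to the Markdown path" (the real `tableToMarkdown` has further special cases for 1x1 and single-column tables).

```javascript
// Pattern copied verbatim from the added containsInlineMath() in the diff.
const INLINE_MATH_RE = /(^|[^\\])\$(?=\S)(?:\\.|[^$\n])+?\S\$/;

function containsInlineMath(text) {
  return INLINE_MATH_RE.test(text);
}

function tableContainsInlineMath(table) {
  return table.cells.some((row) => row.some((cell) => containsInlineMath(cell.text)));
}

// Hypothetical helper (an assumption, not a kordoc API): mirrors only the
// gating condition that changed between 2.5.2 and 2.7.1.
function pickTableRenderer(table) {
  const merged = table.cells.some((row) =>
    row.some((cell) => cell.colSpan > 1 || cell.rowSpan > 1)
  );
  return merged && !tableContainsInlineMath(table) ? "html" : "gfm";
}

console.log(containsInlineMath("area $x^2$ units")); // true
console.log(containsInlineMath("costs $5 and $10")); // false: no closing "$" directly after a non-space character
console.log(containsInlineMath("\\$x^2$"));          // false: the opening "$" is backslash-escaped
console.log(containsInlineMath("$x$"));              // false: the pattern needs at least two characters between the dollars

const mergedWithMath = {
  cells: [[{ text: "$a+b$", colSpan: 2, rowSpan: 1 }, { text: "", colSpan: 1, rowSpan: 1 }]],
};
console.log(pickTableRenderer(mergedWithMath)); // "gfm"
```

One consequence visible in the last `containsInlineMath` call: a single-symbol formula such as `$x$` is not detected by this pattern, so a merged table whose only math is that short would presumably still take the HTML path.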
@@ -0,0 +1 @@
+ {"version":3,"sources":["../src/utils.ts","../src/table/builder.ts","../src/types.ts"],"sourcesContent":["/** kordoc 공용 유틸리티 */\n\n/** 빌드 타임에 tsup define으로 주입되는 버전 */\ndeclare const __KORDOC_VERSION__: string\nexport const VERSION: string = typeof __KORDOC_VERSION__ !== \"undefined\" ? __KORDOC_VERSION__ : \"0.0.0-dev\"\n\n/**\n * Node.js Buffer → ArrayBuffer 변환\n * pool Buffer의 공유 ArrayBuffer 문제를 안전하게 처리.\n * offset=0이고 전체 ArrayBuffer를 차지하면 복사 없이 직접 반환.\n */\nexport function toArrayBuffer(buf: Buffer): ArrayBuffer {\n if (buf.byteOffset === 0 && buf.byteLength === buf.buffer.byteLength) {\n return buf.buffer as ArrayBuffer\n }\n return buf.buffer.slice(buf.byteOffset, buf.byteOffset + buf.byteLength) as ArrayBuffer\n}\n\n/**\n * kordoc 내부 에러 클래스 — 사용자에게 노출해도 안전한 메시지만 포함.\n * MCP 에러 정제에서 instanceof로 판별하여 allowlist 패턴 매칭 없이 안전하게 통과.\n */\nexport class KordocError extends Error {\n constructor(message: string) {\n super(message)\n this.name = \"KordocError\"\n }\n}\n\n/**\n * 에러 메시지 정제 — KordocError는 그대로, 나머지는 일반 메시지로 대체.\n * 파일시스템 경로, 스택 트레이스 등 내부 정보 노출 방지.\n */\nexport function sanitizeError(err: unknown): string {\n if (err instanceof KordocError) return err.message\n return \"문서 처리 중 오류가 발생했습니다\"\n}\n\n/**\n * ZIP 엔트리 경로의 경로 순회 여부 판별.\n * 백슬래시 정규화, .., 절대경로, Windows 드라이브 문자 모두 차단.\n */\nexport function isPathTraversal(name: string): boolean {\n if (name.includes(\"\\x00\")) return true\n const normalized = name.replace(/\\\\/g, \"/\")\n const segments = normalized.split(\"/\")\n return segments.some(s => s === \"..\") || normalized.startsWith(\"/\") || /^[A-Za-z]:/.test(normalized)\n}\n\n// ─── ZIP 안전 로딩 (ZIP bomb 방지) ────────────────────\n\n/**\n * ZIP bomb 사전 검사 — Central Directory에서 비압축 합계와 엔트리 수 확인.\n * HWPX/XLSX/DOCX 등 모든 ZIP 기반 포맷에서 공통 사용.\n */\nexport function precheckZipSize(\n buffer: ArrayBuffer,\n maxUncompressedSize = 100 * 1024 * 1024,\n maxEntries = 500,\n): { totalUncompressed: number; entryCount: number } {\n try {\n const data = new 
DataView(buffer)\n const len = buffer.byteLength\n // EOCD 시그니처 역방향 스캔\n let eocdOffset = -1\n for (let i = len - 22; i >= Math.max(0, len - 65557); i--) {\n if (data.getUint32(i, true) === 0x06054b50) { eocdOffset = i; break }\n }\n if (eocdOffset < 0) return { totalUncompressed: 0, entryCount: 0 }\n\n const entryCount = data.getUint16(eocdOffset + 10, true)\n if (entryCount > maxEntries) {\n throw new KordocError(`ZIP 엔트리 수 초과: ${entryCount} (최대 ${maxEntries})`)\n }\n\n const cdSize = data.getUint32(eocdOffset + 12, true)\n const cdOffset = data.getUint32(eocdOffset + 16, true)\n if (cdOffset + cdSize > len) return { totalUncompressed: 0, entryCount }\n\n let totalUncompressed = 0\n let pos = cdOffset\n for (let i = 0; i < entryCount && pos + 46 <= cdOffset + cdSize; i++) {\n if (data.getUint32(pos, true) !== 0x02014b50) break\n totalUncompressed += data.getUint32(pos + 24, true)\n const nameLen = data.getUint16(pos + 28, true)\n const extraLen = data.getUint16(pos + 30, true)\n const commentLen = data.getUint16(pos + 32, true)\n pos += 46 + nameLen + extraLen + commentLen\n }\n\n if (totalUncompressed > maxUncompressedSize) {\n throw new KordocError(`ZIP 비압축 크기 초과: ${(totalUncompressed / 1024 / 1024).toFixed(1)}MB (최대 ${maxUncompressedSize / 1024 / 1024}MB)`)\n }\n\n return { totalUncompressed, entryCount }\n } catch (err) {\n if (err instanceof KordocError) throw err\n return { totalUncompressed: 0, entryCount: 0 }\n }\n}\n\n/** XXE/Billion Laughs 방지 — DOCTYPE 제거 (내부 DTD 서브셋 포함) */\nexport function stripDtd(xml: string): string {\n return xml.replace(/<!DOCTYPE\\s[^[>]*(\\[[\\s\\S]*?\\])?\\s*>/gi, \"\")\n}\n\n/** 하이퍼링크 URL 살균 — javascript: 등 XSS 위험 스킴 차단 */\nconst SAFE_HREF_RE = /^(?:https?:|mailto:|tel:|#)/i\nexport function sanitizeHref(href: string): string | null {\n const trimmed = href.trim()\n if (!trimmed || !SAFE_HREF_RE.test(trimmed)) return null\n return trimmed\n}\n\n// ─── 안전한 min/max (스택 오버플로 방지) ─────────────\n\n/** Math.min(...arr) 대체 — 대형 배열에서 
스택 오버플로 방지 */\nexport function safeMin(arr: number[]): number {\n let min = Infinity\n for (let i = 0; i < arr.length; i++) if (arr[i] < min) min = arr[i]\n return min\n}\n\n/** Math.max(...arr) 대체 — 대형 배열에서 스택 오버플로 방지 */\nexport function safeMax(arr: number[]): number {\n let max = -Infinity\n for (let i = 0; i < arr.length; i++) if (arr[i] > max) max = arr[i]\n return max\n}\n\n// ─── 에러 분류 ──────────────────────────────────────\n\nimport type { ErrorCode } from \"./types.js\"\n\n/** 에러를 구조화된 ErrorCode로 분류 — KordocError 메시지 패턴 매칭 */\nexport function classifyError(err: unknown): ErrorCode {\n if (!(err instanceof Error)) return \"PARSE_ERROR\"\n const msg = err.message\n if (msg.includes(\"암호화\")) return \"ENCRYPTED\"\n if (msg.includes(\"DRM\")) return \"DRM_PROTECTED\"\n if (msg.includes(\"ZIP bomb\") || msg.includes(\"ZIP 비압축 크기 초과\") || msg.includes(\"ZIP 엔트리 수 초과\")) return \"ZIP_BOMB\"\n if (msg.includes(\"bomb\") || msg.includes(\"크기 초과\") || msg.includes(\"압축 해제\")) return \"DECOMPRESSION_BOMB\"\n if (msg.includes(\"이미지 기반\")) return \"IMAGE_BASED_PDF\"\n if (msg.includes(\"섹션\") && (msg.includes(\"찾을 수 없\") || msg.includes(\"없음\"))) return \"NO_SECTIONS\"\n if (msg.includes(\"시그니처\") || msg.includes(\"복구할 수 없\")) return \"CORRUPTED\"\n return \"PARSE_ERROR\"\n}\n","/** 2-pass colSpan/rowSpan 테이블 빌더 및 Markdown 변환 */\r\n\r\nimport type { CellContext, IRBlock, IRCell, IRTable } from \"../types.js\"\r\nimport { sanitizeHref } from \"../utils.js\"\r\n\r\n/** 테이블 열 수 상한 — 한국 공공문서 기준 충분한 값 */\r\nexport const MAX_COLS = 200\r\n/** 테이블 행 수 상한 — 메모리 폭주 방지 */\r\nexport const MAX_ROWS = 10000\r\n\r\nexport function buildTable(rows: CellContext[][]): IRTable {\r\n if (rows.length > MAX_ROWS) rows = rows.slice(0, MAX_ROWS)\r\n const numRows = rows.length\r\n\r\n // colAddr/rowAddr가 있으면 직접 배치 (HWPX cellAddr, HWP5 colAddr/rowAddr)\r\n const hasAddr = rows.some(row => row.some(c => c.colAddr !== undefined && c.rowAddr !== undefined))\r\n if (hasAddr) return 
buildTableDirect(rows, numRows)\r\n\r\n // Pass 1: maxCols 계산 — 2D 배열 사용 (동적 확장)\r\n let maxCols = 0\r\n const tempOccupied: boolean[][] = Array.from({ length: numRows }, () => [])\r\n\r\n for (let rowIdx = 0; rowIdx < numRows; rowIdx++) {\r\n let colIdx = 0\r\n for (const cell of rows[rowIdx]) {\r\n while (colIdx < MAX_COLS && tempOccupied[rowIdx][colIdx]) colIdx++\r\n if (colIdx >= MAX_COLS) break\r\n\r\n for (let r = rowIdx; r < Math.min(rowIdx + cell.rowSpan, numRows); r++) {\r\n for (let c = colIdx; c < Math.min(colIdx + cell.colSpan, MAX_COLS); c++) {\r\n tempOccupied[r][c] = true\r\n }\r\n }\r\n colIdx += cell.colSpan\r\n if (colIdx > maxCols) maxCols = colIdx\r\n }\r\n }\r\n\r\n if (maxCols === 0) return { rows: 0, cols: 0, cells: [], hasHeader: false }\r\n\r\n // Pass 2: 실제 배치\r\n const grid: IRCell[][] = Array.from({ length: numRows }, () =>\r\n Array.from({ length: maxCols }, () => ({ text: \"\", colSpan: 1, rowSpan: 1 }))\r\n )\r\n const occupied: boolean[][] = Array.from({ length: numRows }, () => Array(maxCols).fill(false))\r\n\r\n for (let rowIdx = 0; rowIdx < numRows; rowIdx++) {\r\n let colIdx = 0\r\n let cellIdx = 0\r\n\r\n while (colIdx < maxCols && cellIdx < rows[rowIdx].length) {\r\n while (colIdx < maxCols && occupied[rowIdx][colIdx]) colIdx++\r\n if (colIdx >= maxCols) break\r\n\r\n const cell = rows[rowIdx][cellIdx]\r\n grid[rowIdx][colIdx] = {\r\n text: cell.text.trim(),\r\n colSpan: cell.colSpan,\r\n rowSpan: cell.rowSpan,\r\n }\r\n\r\n for (let r = rowIdx; r < Math.min(rowIdx + cell.rowSpan, numRows); r++) {\r\n for (let c = colIdx; c < Math.min(colIdx + cell.colSpan, maxCols); c++) {\r\n occupied[r][c] = true\r\n }\r\n }\r\n\r\n colIdx += cell.colSpan\r\n cellIdx++\r\n }\r\n }\r\n\r\n return trimAndReturn(grid, numRows, maxCols)\r\n}\r\n\r\n/** colAddr/rowAddr 절대 좌표 기반 직접 배치 */\r\nfunction buildTableDirect(rows: CellContext[][], numRows: number): IRTable {\r\n // 전체 셀에서 maxCols 계산 (MAX_COLS 상한 적용)\r\n let maxCols = 0\r\n for (const row 
of rows) {\r\n for (const cell of row) {\r\n const end = (cell.colAddr ?? 0) + cell.colSpan\r\n if (end > maxCols) maxCols = end\r\n }\r\n }\r\n if (maxCols > MAX_COLS) maxCols = MAX_COLS\r\n if (maxCols === 0) return { rows: 0, cols: 0, cells: [], hasHeader: false }\r\n\r\n const grid: IRCell[][] = Array.from({ length: numRows }, () =>\r\n Array.from({ length: maxCols }, () => ({ text: \"\", colSpan: 1, rowSpan: 1 }))\r\n )\r\n\r\n for (const row of rows) {\r\n for (const cell of row) {\r\n const r = cell.rowAddr ?? 0\r\n const c = cell.colAddr ?? 0\r\n if (r >= numRows || c >= maxCols || r < 0 || c < 0) continue\r\n\r\n grid[r][c] = { text: cell.text.trim(), colSpan: cell.colSpan, rowSpan: cell.rowSpan }\r\n\r\n // 병합 영역 마킹\r\n for (let dr = 0; dr < cell.rowSpan; dr++) {\r\n for (let dc = 0; dc < cell.colSpan; dc++) {\r\n if (dr === 0 && dc === 0) continue\r\n if (r + dr < numRows && c + dc < maxCols) {\r\n grid[r + dr][c + dc] = { text: \"\", colSpan: 1, rowSpan: 1 }\r\n }\r\n }\r\n }\r\n }\r\n }\r\n\r\n return trimAndReturn(grid, numRows, maxCols)\r\n}\r\n\r\n/** 빈 후행 열 제거 후 IRTable 반환 */\r\nfunction trimAndReturn(grid: IRCell[][], numRows: number, maxCols: number): IRTable {\r\n let effectiveCols = maxCols\r\n while (effectiveCols > 0) {\r\n const colEmpty = grid.every(row => !row[effectiveCols - 1]?.text?.trim())\r\n if (!colEmpty) break\r\n effectiveCols--\r\n }\r\n if (effectiveCols < maxCols && effectiveCols > 0) {\r\n const trimmed = grid.map(row => row.slice(0, effectiveCols))\r\n return { rows: numRows, cols: effectiveCols, cells: trimmed, hasHeader: numRows > 1 }\r\n }\r\n return { rows: numRows, cols: maxCols, cells: grid, hasHeader: numRows > 1 }\r\n}\r\n\r\nexport function convertTableToText(rows: CellContext[][]): string {\r\n return rows\r\n .map(row =>\r\n row\r\n .map(c => c.text.trim().replace(/\\n/g, \" \").replace(/\\|/g, \"\\\\|\"))\r\n .filter(Boolean)\r\n .join(\" / \")\r\n )\r\n .filter(Boolean)\r\n .join(\"\\n\")\r\n}\r\n\r\n/** 마크다운 GFM 
특수문자 이스케이프 — remark-gfm 오해석 방지 */\r\nfunction escapeGfm(text: string): string {\r\n // ~ → \\~ (GFM strikethrough 방지)\r\n return text.replace(/~/g, \"\\\\~\")\r\n}\r\n\r\n/** HWP 자동생성 도형/개체 대체텍스트 정규식 — 한컴오피스가 삽입하는 모든 알려진 패턴 */\r\nconst HWP_SHAPE_ALT_TEXT_RE = /(?:모서리가 둥근 |둥근 )?(?:사각형|직사각형|정사각형|원|타원|삼각형|이등변 삼각형|직각 삼각형|선|직선|곡선|화살표|굵은 화살표|이중 화살표|오각형|육각형|팔각형|별|[4-8]점별|십자|십자형|구름|구름형|마름모|도넛|평행사변형|사다리꼴|부채꼴|호|반원|물결|번개|하트|빗금|블록 화살표|수식|표|그림|개체|그리기\\s?개체|묶음\\s?개체|글상자|수식\\s?개체|OLE\\s?개체)\\s?입니다\\.?/g\r\n\r\n/** HWP PUA 특수문자 및 도형 대체텍스트 제거 — 모든 포맷 공통 */\r\nfunction sanitizeText(text: string): string {\r\n let result = text\r\n // Supplementary Private Use Area (U+F0000-U+FFFFD) — HWP 전용 기호\r\n .replace(/[\\u{F0000}-\\u{FFFFD}]/gu, \"\")\r\n // HWP 도형/개체 자동생성 대체텍스트 제거\r\n .replace(HWP_SHAPE_ALT_TEXT_RE, \"\")\r\n .replace(/ +/g, \" \")\r\n .trim()\r\n // 균등배분 스페이스 정리 (\"현 장 대 응 단 장\" → \"현장대응단장\")\r\n // 짧은 텍스트(30자 이하)에서 70%+ 토큰이 한글 1글자면 균등배분으로 판단\r\n if (result.length <= 30 && result.includes(\" \")) {\r\n const tokens = result.split(\" \")\r\n // 한글 1글자 토큰만 카운트 — ASCII 특수문자(< > & 등)는 균등배분이 아님\r\n const koreanSingleCharCount = tokens.filter(t => t.length === 1 && /[\\uAC00-\\uD7AF\\u3131-\\u318E]/.test(t)).length\r\n if (tokens.length >= 3 && koreanSingleCharCount / tokens.length >= 0.7) {\r\n result = tokens.join(\"\")\r\n }\r\n }\r\n return result\r\n}\r\n\r\n/**\r\n * 레이아웃 테이블 감지 및 해체 — IRBlock 레벨에서 수행\r\n * 적은 행(≤3) + 셀 내 줄바꿈 다량 → table 블록을 paragraph 블록들로 분해\r\n * heading 감지 전에 호출해야 해체된 텍스트에 heading 감지 적용 가능\r\n */\r\nexport function flattenLayoutTables(blocks: IRBlock[]): IRBlock[] {\r\n const result: IRBlock[] = []\r\n\r\n for (const block of blocks) {\r\n if (block.type !== \"table\" || !block.table) {\r\n result.push(block)\r\n continue\r\n }\r\n\r\n const { rows: numRows, cols: numCols, cells } = block.table\r\n\r\n // 1x1 테이블은 기존 로직(tableToMarkdown)에서 처리\r\n if (numRows === 1 && numCols === 1) {\r\n result.push(block)\r\n continue\r\n }\r\n\r\n // 레이아웃 테이블 휴리스틱\r\n if 
(numRows <= 3) {\r\n let totalNewlines = 0\r\n let totalTextLen = 0\r\n for (let r = 0; r < numRows; r++) {\r\n for (let c = 0; c < numCols; c++) {\r\n const t = cells[r]?.[c]?.text || \"\"\r\n totalNewlines += (t.match(/\\n/g) || []).length\r\n totalTextLen += t.length\r\n }\r\n }\r\n\r\n // 레이아웃 테이블 판정: 많은 줄바꿈(>5), 또는 적은 행에 비해 총 텍스트 과다(>300)\r\n if (totalNewlines > 5 || (numRows <= 2 && totalTextLen > 300)) {\r\n // 레이아웃 테이블 → 각 셀을 paragraph 블록으로 분해\r\n for (let r = 0; r < numRows; r++) {\r\n for (let c = 0; c < numCols; c++) {\r\n const cellText = cells[r]?.[c]?.text?.trim()\r\n if (!cellText) continue\r\n // 셀 내 줄바꿈을 별도 paragraph로 분리\r\n for (const line of cellText.split(\"\\n\")) {\r\n const trimmed = line.trim()\r\n if (!trimmed) continue\r\n result.push({ type: \"paragraph\", text: trimmed, pageNumber: block.pageNumber })\r\n }\r\n }\r\n }\r\n continue\r\n }\r\n }\r\n\r\n result.push(block)\r\n }\r\n\r\n return result\r\n}\r\n\r\nexport function blocksToMarkdown(blocks: IRBlock[]): string {\r\n const lines: string[] = []\r\n\r\n for (let i = 0; i < blocks.length; i++) {\r\n const block = blocks[i]\r\n\r\n // 헤딩 블록\r\n if (block.type === \"heading\" && block.text) {\r\n const prefix = \"#\".repeat(Math.min(block.level || 2, 6))\r\n const headingText = sanitizeText(block.text)\r\n if (headingText) lines.push(\"\", `${prefix} ${headingText}`, \"\")\r\n continue\r\n }\r\n\r\n // 이미지 블록 — ![alt](filename) 참조\r\n if (block.type === \"image\" && block.text) {\r\n lines.push(\"\", `![image](${block.text})`, \"\")\r\n continue\r\n }\r\n\r\n // 구분선 블록\r\n if (block.type === \"separator\") {\r\n lines.push(\"\", \"---\", \"\")\r\n continue\r\n }\r\n\r\n // 리스트 블록\r\n if (block.type === \"list\" && block.text) {\r\n const listText = sanitizeText(block.text)\r\n if (!listText) continue\r\n // 텍스트가 이미 번호로 시작하면 그대로 출력 (원래 번호 보존)\r\n const alreadyNumbered = block.listType === \"ordered\" && /^\\d+\\.\\s/.test(listText)\r\n const prefix = alreadyNumbered ? 
\"\" : block.listType === \"ordered\" ? \"1. \" : \"- \"\r\n lines.push(`${prefix}${listText}`)\r\n if (block.children) {\r\n for (const child of block.children) {\r\n const childPrefix = child.listType === \"ordered\" ? \"1.\" : \"-\"\r\n lines.push(` ${childPrefix} ${child.text || \"\"}`)\r\n }\r\n }\r\n continue\r\n }\r\n\r\n if (block.type === \"paragraph\" && block.text) {\r\n let text = sanitizeText(block.text)\r\n if (!text) continue\r\n\r\n // 별표 패턴 (기존 호환)\r\n if (/^\\[별표\\s*\\d+/.test(text)) {\r\n const nextBlock = blocks[i + 1]\r\n if (nextBlock?.type === \"paragraph\" && nextBlock.text && /관련\\)?$/.test(nextBlock.text)) {\r\n lines.push(\"\", `## ${text} ${nextBlock.text}`, \"\")\r\n i++\r\n } else {\r\n lines.push(\"\", `## ${text}`, \"\")\r\n }\r\n continue\r\n }\r\n\r\n if (/^\\([^)]*조[^)]*관련\\)$/.test(text)) {\r\n lines.push(`*${text}*`, \"\")\r\n continue\r\n }\r\n\r\n // 하이퍼링크가 있으면 텍스트에 링크 삽입 (javascript: 등 위험 스킴 제거)\r\n if (block.href) {\r\n const href = sanitizeHref(block.href)\r\n if (href) text = `[${text}](${href})`\r\n }\r\n\r\n // 각주가 있으면 괄호로 인라인 삽입\r\n if (block.footnoteText) {\r\n text += ` (주: ${block.footnoteText})`\r\n }\r\n\r\n lines.push(escapeGfm(text), \"\")\r\n } else if (block.type === \"table\" && block.table) {\r\n // 테이블 앞에 빈 줄 보장 (마크다운 렌더링 필수)\r\n if (lines.length > 0 && lines[lines.length - 1] !== \"\") {\r\n lines.push(\"\")\r\n }\r\n const tableMd = tableToMarkdown(block.table)\r\n if (tableMd) {\r\n lines.push(tableMd)\r\n lines.push(\"\")\r\n }\r\n }\r\n }\r\n\r\n return lines.join(\"\\n\").trim()\r\n}\r\n\r\n/** 병합 셀 존재 여부 확인 */\r\nfunction hasMergedCells(table: IRTable): boolean {\r\n for (const row of table.cells) {\r\n for (const cell of row) {\r\n if (cell.colSpan > 1 || cell.rowSpan > 1) return true\r\n }\r\n }\r\n return false\r\n}\r\n\r\nfunction containsInlineMath(text: string): boolean {\r\n return /(^|[^\\\\])\\$(?=\\S)(?:\\\\.|[^$\\n])+?\\S\\$/.test(text)\r\n}\r\n\r\nfunction tableContainsInlineMath(table: 
IRTable): boolean {\r\n for (const row of table.cells) {\r\n for (const cell of row) {\r\n if (containsInlineMath(cell.text)) return true\r\n }\r\n }\r\n return false\r\n}\r\n\r\n/** 병합 테이블 → HTML <table> 출력 (rowspan/colspan 보존) */\r\nfunction tableToHtml(table: IRTable): string {\r\n const { cells, rows: numRows, cols: numCols } = table\r\n const skip = new Set<string>()\r\n const lines: string[] = [\"<table>\"]\r\n\r\n for (let r = 0; r < numRows; r++) {\r\n const tag = r === 0 ? \"th\" : \"td\"\r\n const rowHtml: string[] = []\r\n for (let c = 0; c < numCols; c++) {\r\n if (skip.has(`${r},${c}`)) continue\r\n const cell = cells[r]?.[c]\r\n if (!cell) continue\r\n\r\n // 병합 영역 skip 마킹\r\n for (let dr = 0; dr < cell.rowSpan; dr++) {\r\n for (let dc = 0; dc < cell.colSpan; dc++) {\r\n if (dr === 0 && dc === 0) continue\r\n if (r + dr < numRows && c + dc < numCols) skip.add(`${r + dr},${c + dc}`)\r\n }\r\n }\r\n\r\n const text = sanitizeText(cell.text).replace(/\\n/g, \"<br>\")\r\n const attrs: string[] = []\r\n if (cell.colSpan > 1) attrs.push(`colspan=\"${cell.colSpan}\"`)\r\n if (cell.rowSpan > 1) attrs.push(`rowspan=\"${cell.rowSpan}\"`)\r\n const attrStr = attrs.length ? 
\" \" + attrs.join(\" \") : \"\"\r\n rowHtml.push(`<${tag}${attrStr}>${text}</${tag}>`)\r\n }\r\n if (rowHtml.length) lines.push(`<tr>${rowHtml.join(\"\")}</tr>`)\r\n }\r\n\r\n lines.push(\"</table>\")\r\n return lines.join(\"\\n\")\r\n}\r\n\r\nfunction tableToMarkdown(table: IRTable): string {\r\n if (table.rows === 0 || table.cols === 0) return \"\"\r\n\r\n const { cells, rows: numRows, cols: numCols } = table\r\n\r\n // 병합 셀이 있으면 HTML 테이블로 출력하되, 수식이 있으면 GFM 표로 출력한다.\r\n // 많은 Markdown 렌더러가 raw HTML table 내부의 $...$를 수식으로 다시 처리하지 않는다.\r\n if (hasMergedCells(table) && !tableContainsInlineMath(table)) return tableToHtml(table)\r\n\r\n // 1행 1열 → 구조화된 텍스트 (빈 셀이면 스킵)\r\n if (numRows === 1 && numCols === 1) {\r\n const content = sanitizeText(cells[0][0].text)\r\n if (!content) return \"\"\r\n return content\r\n .split(/\\n/)\r\n .map(line => {\r\n const trimmed = line.trim()\r\n if (!trimmed) return \"\"\r\n if (/^\\d+\\.\\s/.test(trimmed)) return `**${escapeGfm(trimmed)}**`\r\n if (/^[가-힣]\\.\\s/.test(trimmed)) return ` ${escapeGfm(trimmed)}`\r\n return escapeGfm(trimmed)\r\n })\r\n .filter(Boolean)\r\n .join(\"\\n\")\r\n }\r\n\r\n // 1열 다행 테이블 → 각 행을 별도 라인으로 출력 (목록성 데이터)\r\n if (numCols === 1 && numRows >= 2) {\r\n return cells\r\n .map(row => escapeGfm(sanitizeText(row[0].text)).replace(/\\n/g, \" \"))\r\n .filter(Boolean)\r\n .join(\"\\n\")\r\n }\r\n\r\n // 병합 셀: 행/열 병합된 셀은 빈 칸으로\r\n const display: string[][] = Array.from({ length: numRows }, () => Array(numCols).fill(\"\"))\r\n const skip = new Set<string>()\r\n\r\n for (let r = 0; r < numRows; r++) {\r\n for (let c = 0; c < numCols; c++) {\r\n if (skip.has(`${r},${c}`)) continue\r\n const cell = cells[r]?.[c]\r\n if (!cell) continue\r\n display[r][c] = escapeGfm(sanitizeText(cell.text)).replace(/\\|/g, \"\\\\|\").replace(/\\n/g, \"<br>\")\r\n\r\n // colSpan/rowSpan: 병합된 열은 빈 칸으로 유지 (텍스트 중복 방지)\r\n for (let dr = 0; dr < cell.rowSpan; dr++) {\r\n for (let dc = 0; dc < cell.colSpan; dc++) {\r\n if (dr === 0 && dc 
=== 0) continue\r\n if (r + dr < numRows && c + dc < numCols) {\r\n skip.add(`${r + dr},${c + dc}`)\r\n }\r\n }\r\n }\r\n // colSpan > 1이면 display 열 인덱스를 건너뜀\r\n c += cell.colSpan - 1\r\n }\r\n }\r\n\r\n // rowSpan 잔류 처리:\r\n // 1) 완전 빈 행 제거\r\n // 2) \"첫 열만 값, 나머지 빈\" 행 → 다음 데이터 행의 첫 열에 값을 전파\r\n // 단, colSpan으로 인한 빈 열(skip 셀)은 이 대상이 아님\r\n const uniqueRows: string[][] = []\r\n let pendingFirstCol = \"\"\r\n for (let r = 0; r < display.length; r++) {\r\n const row = display[r]\r\n const isEmptyPlaceholder = row.every(cell => cell === \"\")\r\n if (isEmptyPlaceholder) continue\r\n\r\n // 첫 열만 값이 있고 나머지 모두 빈 행 → 다음 데이터 행의 첫 열에 전파\r\n // 단, colSpan으로 인한 빈 열(skip 셀)은 \"진짜 빈\"이 아니므로 제외\r\n const nonEmptyCols = row.filter(cell => cell !== \"\")\r\n const hasSkipInRow = row.some((_, c) => skip.has(`${r},${c}`))\r\n if (!hasSkipInRow && nonEmptyCols.length === 1 && row[0] !== \"\" && row.slice(1).every(c => c === \"\")) {\r\n pendingFirstCol = row[0]\r\n continue\r\n }\r\n\r\n // 저장된 첫 열 값을 현재 행의 빈 첫 열에 전파\r\n if (pendingFirstCol && row[0] === \"\") {\r\n row[0] = pendingFirstCol\r\n pendingFirstCol = \"\"\r\n } else {\r\n pendingFirstCol = \"\"\r\n }\r\n uniqueRows.push(row)\r\n }\r\n\r\n if (uniqueRows.length === 0) return \"\"\r\n\r\n const md: string[] = []\r\n md.push(\"| \" + uniqueRows[0].join(\" | \") + \" |\")\r\n md.push(\"| \" + uniqueRows[0].map(() => \"---\").join(\" | \") + \" |\")\r\n for (let i = 1; i < uniqueRows.length; i++) {\r\n md.push(\"| \" + uniqueRows[i].join(\" | \") + \" |\")\r\n }\r\n return md.join(\"\\n\")\r\n}\r\n","/** kordoc 공통 타입 정의 */\r\n\r\n// ─── 중간 표현 (Intermediate Representation) ─────────\r\n\r\nexport interface CellContext {\r\n text: string\r\n colSpan: number\r\n rowSpan: number\r\n /** HWP5 셀 열 주소 (0-based) — 병합 테이블 배치용 */\r\n colAddr?: number\r\n /** HWP5 셀 행 주소 (0-based) — 병합 테이블 배치용 */\r\n rowAddr?: number\r\n}\r\n\r\n/** 블록 타입 — v2.0에서 heading, list, image, separator 추가 */\r\nexport type IRBlockType = \"paragraph\" | 
\"table\" | \"heading\" | \"list\" | \"image\" | \"separator\"\r\n\r\nexport interface IRBlock {\r\n type: IRBlockType\r\n text?: string\r\n table?: IRTable\r\n /** 헤딩 레벨 (1-6), type=\"heading\"일 때 사용 */\r\n level?: number\r\n /** 원본 페이지 번호 (1-based) */\r\n pageNumber?: number\r\n /** 바운딩 박스 — PDF에서만 제공 */\r\n bbox?: BoundingBox\r\n /** 텍스트 스타일 정보 (선택) */\r\n style?: InlineStyle\r\n /** 리스트 타입, type=\"list\"일 때 사용 */\r\n listType?: \"ordered\" | \"unordered\"\r\n /** 중첩 리스트 아이템 */\r\n children?: IRBlock[]\r\n /** 하이퍼링크 URL */\r\n href?: string\r\n /** 각주/미주 텍스트 (인라인 삽입용) */\r\n footnoteText?: string\r\n /** 이미지 데이터 (type=\"image\"일 때) */\r\n imageData?: ImageData\r\n}\r\n\r\n/** 추출된 이미지 바이너리 데이터 */\r\nexport interface ImageData {\r\n /** 이미지 바이너리 */\r\n data: Uint8Array\r\n /** MIME 타입 (image/png, image/jpeg, image/gif, image/bmp, image/wmf, image/emf) */\r\n mimeType: string\r\n /** 원본 파일명 (있는 경우) */\r\n filename?: string\r\n}\r\n\r\n/** 바운딩 박스 — PDF 포인트 단위 (72pt = 1인치) */\r\nexport interface BoundingBox {\r\n page: number\r\n x: number\r\n y: number\r\n width: number\r\n height: number\r\n}\r\n\r\n/** 인라인 텍스트 스타일 */\r\nexport interface InlineStyle {\r\n bold?: boolean\r\n italic?: boolean\r\n fontSize?: number\r\n fontName?: string\r\n}\r\n\r\nexport interface IRTable {\r\n rows: number\r\n cols: number\r\n cells: IRCell[][]\r\n /** 첫 행을 헤더로 렌더링할지 여부 (현재: rows > 1이면 true — 의미적 감지가 아닌 레이아웃 힌트) */\r\n hasHeader: boolean\r\n}\r\n\r\nexport interface IRCell {\r\n text: string\r\n colSpan: number\r\n rowSpan: number\r\n}\r\n\r\n// ─── 메타데이터 ─────────────────────────────────────\r\n\r\n/** 문서 메타데이터 — 각 포맷에서 추출 가능한 필드만 채워짐 */\r\nexport interface DocumentMetadata {\r\n /** 문서 제목 */\r\n title?: string\r\n /** 작성자 */\r\n author?: string\r\n /** 작성 프로그램 (예: \"한글 2020\", \"Adobe Acrobat\") */\r\n creator?: string\r\n /** 생성일시 (ISO 8601) */\r\n createdAt?: string\r\n /** 수정일시 (ISO 8601) */\r\n modifiedAt?: string\r\n /** 페이지/섹션 수 */\r\n pageCount?: number\r\n /** 문서 포맷 버전 (예: 
HWP \"5.1.0.1\") */\r\n version?: string\r\n /** 설명 */\r\n description?: string\r\n /** 키워드 */\r\n keywords?: string[]\r\n}\r\n\r\n// ─── 파싱 옵션 ──────────────────────────────────────\r\n\r\n/** 파싱 옵션 — parse() 함수에 전달 */\r\nexport interface ParseOptions {\r\n /**\r\n * 파싱할 페이지/섹션 범위 (1-based).\r\n * - 배열: [1, 2, 3]\r\n * - 문자열: \"1-3\", \"1,3,5-7\"\r\n *\r\n * PDF: 정확한 페이지 단위. HWP/HWPX: 섹션 단위 근사치.\r\n */\r\n pages?: number[] | string\r\n /** 이미지 기반 PDF용 OCR 프로바이더 (선택) */\r\n ocr?: OcrProvider\r\n /** 진행률 콜백 — current: 현재 페이지/섹션, total: 전체 수 */\r\n onProgress?: (current: number, total: number) => void\r\n /** PDF 머리글/바닥글 자동 제거 */\r\n removeHeaderFooter?: boolean\r\n /** 원본 파일 경로 (DRM COM fallback에 필요, 내부 전용) */\r\n filePath?: string\r\n /**\r\n * PDF 수식 OCR 활성화 (기본 false).\r\n *\r\n * 활성화 시 각 PDF 페이지를 이미지로 렌더링 → YOLOv8 기반 수식 영역 검출 →\r\n * TrOCR 기반 LaTeX 인식. 감지된 수식은 `$...$` (inline) / `$$...$$` (display) 로\r\n * 블록 텍스트에 삽입된다.\r\n *\r\n * 필수 optional 의존성: `onnxruntime-node`, `@huggingface/transformers`,\r\n * `@hyzyla/pdfium`, `sharp`. 
미설치 시 parse 에 실패하지 않고 **경고만** 남기고\r\n * 수식 인식은 skip 한다 (일반 텍스트 추출은 정상 동작).\r\n *\r\n * 모델(~155MB) 은 첫 사용 시 HuggingFace 에서 자동 다운로드 되어\r\n * `~/.cache/kordoc/models/pix2text/` 에 SHA-256 검증과 함께 저장된다.\r\n */\r\n formulaOcr?: boolean\r\n}\r\n\r\n// ─── 파싱 경고 ──────────────────────────────────────\r\n\r\n/** 파싱 중 스킵/실패한 요소 보고 */\r\nexport interface ParseWarning {\r\n /** 관련 페이지 번호 (알 수 있는 경우) */\r\n page?: number\r\n /** 경고 메시지 */\r\n message: string\r\n /** 구조화된 경고 코드 */\r\n code: WarningCode\r\n}\r\n\r\nexport type WarningCode =\r\n | \"SKIPPED_IMAGE\"\r\n | \"SKIPPED_OLE\"\r\n | \"TRUNCATED_TABLE\"\r\n | \"OCR_FALLBACK\"\r\n | \"UNSUPPORTED_ELEMENT\"\r\n | \"BROKEN_ZIP_RECOVERY\"\r\n | \"HIDDEN_TEXT_FILTERED\"\r\n | \"MALFORMED_XML\"\r\n | \"PARTIAL_PARSE\"\r\n | \"LENIENT_CFB_RECOVERY\"\r\n\r\n/** 문서 구조 (헤딩 트리) */\r\nexport interface OutlineItem {\r\n level: number\r\n text: string\r\n pageNumber?: number\r\n}\r\n\r\n// ─── 에러 코드 ──────────────────────────────────────\r\n\r\n/** 구조화된 에러 코드 — 프로그래밍적 에러 핸들링용 */\r\nexport type ErrorCode =\r\n | \"EMPTY_INPUT\"\r\n | \"UNSUPPORTED_FORMAT\"\r\n | \"ENCRYPTED\"\r\n | \"DRM_PROTECTED\"\r\n | \"CORRUPTED\"\r\n | \"DECOMPRESSION_BOMB\"\r\n | \"ZIP_BOMB\"\r\n | \"IMAGE_BASED_PDF\"\r\n | \"NO_SECTIONS\"\r\n | \"PARSE_ERROR\"\r\n | \"MISSING_DEPENDENCY\"\r\n\r\n// ─── 파싱 결과 (discriminated union) ────────────────\r\n\r\nexport type FileType = \"hwpx\" | \"hwp\" | \"hwp3\" | \"hwpml\" | \"pdf\" | \"xlsx\" | \"xls\" | \"docx\" | \"unknown\"\r\n\r\ninterface ParseResultBase {\r\n fileType: FileType\r\n /** 페이지/섹션 수 — PDF: 실제 페이지 수, HWP/HWPX: 섹션 수, XLSX: 시트 수 */\r\n pageCount?: number\r\n /** 이미지 기반 PDF 여부 (텍스트 추출 불가) */\r\n isImageBased?: boolean\r\n}\r\n\r\nexport interface ParseSuccess extends ParseResultBase {\r\n success: true\r\n /** 추출된 마크다운 텍스트 */\r\n markdown: string\r\n /** 중간 표현 블록 (구조화된 데이터 접근용) */\r\n blocks: IRBlock[]\r\n /** 문서 메타데이터 */\r\n metadata?: DocumentMetadata\r\n /** 문서 구조 (헤딩 트리) — v2.0 */\r\n outline?: 
OutlineItem[]\r\n /** 파싱 중 발생한 경고 — v2.0 */\r\n warnings?: ParseWarning[]\r\n /** 추출된 이미지 목록 — 마크다운에서 파일명으로 참조됨 */\r\n images?: ExtractedImage[]\r\n}\r\n\r\n/** 추출된 이미지 — ParseSuccess.images에 포함 */\r\nexport interface ExtractedImage {\r\n /** 마크다운에서 참조되는 파일명 (예: image_001.png) */\r\n filename: string\r\n /** 이미지 바이너리 */\r\n data: Uint8Array\r\n /** MIME 타입 */\r\n mimeType: string\r\n}\r\n\r\nexport interface ParseFailure extends ParseResultBase {\r\n success: false\r\n /** 오류 메시지 */\r\n error: string\r\n /** 구조화된 에러 코드 */\r\n code?: ErrorCode\r\n}\r\n\r\nexport type ParseResult = ParseSuccess | ParseFailure\r\n\r\n// ─── 문서 비교 (Diff) ───────────────────────────────\r\n\r\nexport type DiffChangeType = \"added\" | \"removed\" | \"modified\" | \"unchanged\"\r\n\r\nexport interface BlockDiff {\r\n type: DiffChangeType\r\n /** 원본 블록 (added이면 undefined) */\r\n before?: IRBlock\r\n /** 변경 후 블록 (removed이면 undefined) */\r\n after?: IRBlock\r\n /** modified 테이블의 셀 단위 diff */\r\n cellDiffs?: CellDiff[][]\r\n /** 유사도 (0-1) */\r\n similarity?: number\r\n}\r\n\r\nexport interface CellDiff {\r\n type: DiffChangeType\r\n before?: string\r\n after?: string\r\n}\r\n\r\nexport interface DiffResult {\r\n stats: { added: number; removed: number; modified: number; unchanged: number }\r\n diffs: BlockDiff[]\r\n}\r\n\r\n// ─── 양식 인식 ──────────────────────────────────────\r\n\r\nexport interface FormField {\r\n label: string\r\n value: string\r\n /** 0-based 소스 행 */\r\n row: number\r\n /** 0-based 소스 열 */\r\n col: number\r\n}\r\n\r\nexport interface FormResult {\r\n fields: FormField[]\r\n /** 양식 확신도 (0-1) */\r\n confidence: number\r\n}\r\n\r\n// ─── OCR 프로바이더 ─────────────────────────────────\r\n\r\n/** 사용자 제공 OCR 함수 — 페이지 이미지를 받아 텍스트 반환 */\r\nexport type OcrProvider = (\r\n pageImage: Uint8Array,\r\n pageNumber: number,\r\n mimeType: \"image/png\"\r\n) => Promise<string>\r\n\r\n// ─── Watch 모드 ─────────────────────────────────────\r\n\r\nexport interface WatchOptions {\r\n dir: 
string\r\n outDir?: string\r\n webhook?: string\r\n format?: \"markdown\" | \"json\"\r\n pages?: string\r\n silent?: boolean\r\n}\r\n\r\n// ─── 헤딩 감지 공통 임계값 ──────────────────────────\r\n\r\n/** 폰트 크기 비율 → heading level (전 파서 공통) */\r\nexport const HEADING_RATIO_H1 = 1.5\r\nexport const HEADING_RATIO_H2 = 1.3\r\nexport const HEADING_RATIO_H3 = 1.15\r\n\r\n// ─── 내부 파서 반환 타입 ─────────────────────────────\r\n\r\n/** 내부 파서가 index.ts에 반환하는 공통 타입 (HWP5/HWPX/PDF/XLSX/DOCX) */\r\nexport interface InternalParseResult {\r\n markdown: string\r\n blocks: IRBlock[]\r\n metadata?: DocumentMetadata\r\n outline?: OutlineItem[]\r\n warnings?: ParseWarning[]\r\n images?: ExtractedImage[]\r\n /** PDF 전용: 이미지 기반 PDF 여부 */\r\n isImageBased?: boolean\r\n}\r\n"],"mappings":";AAIO,IAAM,UAAkB,OAA4C,UAAqB;AAOzF,SAAS,cAAc,KAA0B;AACtD,MAAI,IAAI,eAAe,KAAK,IAAI,eAAe,IAAI,OAAO,YAAY;AACpE,WAAO,IAAI;AAAA,EACb;AACA,SAAO,IAAI,OAAO,MAAM,IAAI,YAAY,IAAI,aAAa,IAAI,UAAU;AACzE;AAMO,IAAM,cAAN,cAA0B,MAAM;AAAA,EACrC,YAAY,SAAiB;AAC3B,UAAM,OAAO;AACb,SAAK,OAAO;AAAA,EACd;AACF;AAeO,SAAS,gBAAgB,MAAuB;AACrD,MAAI,KAAK,SAAS,IAAM,EAAG,QAAO;AAClC,QAAM,aAAa,KAAK,QAAQ,OAAO,GAAG;AAC1C,QAAM,WAAW,WAAW,MAAM,GAAG;AACrC,SAAO,SAAS,KAAK,OAAK,MAAM,IAAI,KAAK,WAAW,WAAW,GAAG,KAAK,aAAa,KAAK,UAAU;AACrG;AAQO,SAAS,gBACd,QACA,sBAAsB,MAAM,OAAO,MACnC,aAAa,KACsC;AACnD,MAAI;AACF,UAAM,OAAO,IAAI,SAAS,MAAM;AAChC,UAAM,MAAM,OAAO;AAEnB,QAAI,aAAa;AACjB,aAAS,IAAI,MAAM,IAAI,KAAK,KAAK,IAAI,GAAG,MAAM,KAAK,GAAG,KAAK;AACzD,UAAI,KAAK,UAAU,GAAG,IAAI,MAAM,WAAY;AAAE,qBAAa;AAAG;AAAA,MAAM;AAAA,IACtE;AACA,QAAI,aAAa,EAAG,QAAO,EAAE,mBAAmB,GAAG,YAAY,EAAE;AAEjE,UAAM,aAAa,KAAK,UAAU,aAAa,IAAI,IAAI;AACvD,QAAI,aAAa,YAAY;AAC3B,YAAM,IAAI,YAAY,+CAAiB,UAAU,kBAAQ,UAAU,GAAG;AAAA,IACxE;AAEA,UAAM,SAAS,KAAK,UAAU,aAAa,IAAI,IAAI;AACnD,UAAM,WAAW,KAAK,UAAU,aAAa,IAAI,IAAI;AACrD,QAAI,WAAW,SAAS,IAAK,QAAO,EAAE,mBAAmB,GAAG,WAAW;AAEvE,QAAI,oBAAoB;AACxB,QAAI,MAAM;AACV,aAAS,IAAI,GAAG,IAAI,cAAc,MAAM,MAAM,WAAW,QAAQ,KAAK;AACpE,UAAI,KAAK,UAAU,KAAK,IAAI,MAAM,SAAY;AAC9C,2BAAqB,KAAK,UAAU,MAAM,IAA
I,IAAI;AAClD,YAAM,UAAU,KAAK,UAAU,MAAM,IAAI,IAAI;AAC7C,YAAM,WAAW,KAAK,UAAU,MAAM,IAAI,IAAI;AAC9C,YAAM,aAAa,KAAK,UAAU,MAAM,IAAI,IAAI;AAChD,aAAO,KAAK,UAAU,WAAW;AAAA,IACnC;AAEA,QAAI,oBAAoB,qBAAqB;AAC3C,YAAM,IAAI,YAAY,sDAAmB,oBAAoB,OAAO,MAAM,QAAQ,CAAC,CAAC,oBAAU,sBAAsB,OAAO,IAAI,KAAK;AAAA,IACtI;AAEA,WAAO,EAAE,mBAAmB,WAAW;AAAA,EACzC,SAAS,KAAK;AACZ,QAAI,eAAe,YAAa,OAAM;AACtC,WAAO,EAAE,mBAAmB,GAAG,YAAY,EAAE;AAAA,EAC/C;AACF;AAGO,SAAS,SAAS,KAAqB;AAC5C,SAAO,IAAI,QAAQ,0CAA0C,EAAE;AACjE;AAGA,IAAM,eAAe;AACd,SAAS,aAAa,MAA6B;AACxD,QAAM,UAAU,KAAK,KAAK;AAC1B,MAAI,CAAC,WAAW,CAAC,aAAa,KAAK,OAAO,EAAG,QAAO;AACpD,SAAO;AACT;AAKO,SAAS,QAAQ,KAAuB;AAC7C,MAAI,MAAM;AACV,WAAS,IAAI,GAAG,IAAI,IAAI,QAAQ,IAAK,KAAI,IAAI,CAAC,IAAI,IAAK,OAAM,IAAI,CAAC;AAClE,SAAO;AACT;AAGO,SAAS,QAAQ,KAAuB;AAC7C,MAAI,MAAM;AACV,WAAS,IAAI,GAAG,IAAI,IAAI,QAAQ,IAAK,KAAI,IAAI,CAAC,IAAI,IAAK,OAAM,IAAI,CAAC;AAClE,SAAO;AACT;AAOO,SAAS,cAAc,KAAyB;AACrD,MAAI,EAAE,eAAe,OAAQ,QAAO;AACpC,QAAM,MAAM,IAAI;AAChB,MAAI,IAAI,SAAS,oBAAK,EAAG,QAAO;AAChC,MAAI,IAAI,SAAS,KAAK,EAAG,QAAO;AAChC,MAAI,IAAI,SAAS,UAAU,KAAK,IAAI,SAAS,kDAAe,KAAK,IAAI,SAAS,4CAAc,EAAG,QAAO;AACtG,MAAI,IAAI,SAAS,MAAM,KAAK,IAAI,SAAS,2BAAO,KAAK,IAAI,SAAS,2BAAO,EAAG,QAAO;AACnF,MAAI,IAAI,SAAS,iCAAQ,EAAG,QAAO;AACnC,MAAI,IAAI,SAAS,cAAI,MAAM,IAAI,SAAS,4BAAQ,KAAK,IAAI,SAAS,cAAI,GAAI,QAAO;AACjF,MAAI,IAAI,SAAS,0BAAM,KAAK,IAAI,SAAS,kCAAS,EAAG,QAAO;AAC5D,SAAO;AACT;;;AC5IO,IAAM,WAAW;AAEjB,IAAM,WAAW;AAEjB,SAAS,WAAW,MAAgC;AACzD,MAAI,KAAK,SAAS,SAAU,QAAO,KAAK,MAAM,GAAG,QAAQ;AACzD,QAAM,UAAU,KAAK;AAGrB,QAAM,UAAU,KAAK,KAAK,SAAO,IAAI,KAAK,OAAK,EAAE,YAAY,UAAa,EAAE,YAAY,MAAS,CAAC;AAClG,MAAI,QAAS,QAAO,iBAAiB,MAAM,OAAO;AAGlD,MAAI,UAAU;AACd,QAAM,eAA4B,MAAM,KAAK,EAAE,QAAQ,QAAQ,GAAG,MAAM,CAAC,CAAC;AAE1E,WAAS,SAAS,GAAG,SAAS,SAAS,UAAU;AAC/C,QAAI,SAAS;AACb,eAAW,QAAQ,KAAK,MAAM,GAAG;AAC/B,aAAO,SAAS,YAAY,aAAa,MAAM,EAAE,MAAM,EAAG;AAC1D,UAAI,UAAU,SAAU;AAExB,eAAS,IAAI,QAAQ,IAAI,KAAK,IAAI,SAAS,KAAK,SAAS,OAAO,GAAG,KAAK;AACtE,iBAAS,IAAI,QAAQ,IAAI,KAAK,IAAI,SAAS,KAAK,SAAS,QAAQ,GAAG,KAAK;AACvE,uBAAa,CAAC,EAAE,CAAC,IAAI;
AAAA,QACvB;AAAA,MACF;AACA,gBAAU,KAAK;AACf,UAAI,SAAS,QAAS,WAAU;AAAA,IAClC;AAAA,EACF;AAEA,MAAI,YAAY,EAAG,QAAO,EAAE,MAAM,GAAG,MAAM,GAAG,OAAO,CAAC,GAAG,WAAW,MAAM;AAG1E,QAAM,OAAmB,MAAM;AAAA,IAAK,EAAE,QAAQ,QAAQ;AAAA,IAAG,MACvD,MAAM,KAAK,EAAE,QAAQ,QAAQ,GAAG,OAAO,EAAE,MAAM,IAAI,SAAS,GAAG,SAAS,EAAE,EAAE;AAAA,EAC9E;AACA,QAAM,WAAwB,MAAM,KAAK,EAAE,QAAQ,QAAQ,GAAG,MAAM,MAAM,OAAO,EAAE,KAAK,KAAK,CAAC;AAE9F,WAAS,SAAS,GAAG,SAAS,SAAS,UAAU;AAC/C,QAAI,SAAS;AACb,QAAI,UAAU;AAEd,WAAO,SAAS,WAAW,UAAU,KAAK,MAAM,EAAE,QAAQ;AACxD,aAAO,SAAS,WAAW,SAAS,MAAM,EAAE,MAAM,EAAG;AACrD,UAAI,UAAU,QAAS;AAEvB,YAAM,OAAO,KAAK,MAAM,EAAE,OAAO;AACjC,WAAK,MAAM,EAAE,MAAM,IAAI;AAAA,QACrB,MAAM,KAAK,KAAK,KAAK;AAAA,QACrB,SAAS,KAAK;AAAA,QACd,SAAS,KAAK;AAAA,MAChB;AAEA,eAAS,IAAI,QAAQ,IAAI,KAAK,IAAI,SAAS,KAAK,SAAS,OAAO,GAAG,KAAK;AACtE,iBAAS,IAAI,QAAQ,IAAI,KAAK,IAAI,SAAS,KAAK,SAAS,OAAO,GAAG,KAAK;AACtE,mBAAS,CAAC,EAAE,CAAC,IAAI;AAAA,QACnB;AAAA,MACF;AAEA,gBAAU,KAAK;AACf;AAAA,IACF;AAAA,EACF;AAEA,SAAO,cAAc,MAAM,SAAS,OAAO;AAC7C;AAGA,SAAS,iBAAiB,MAAuB,SAA0B;AAEzE,MAAI,UAAU;AACd,aAAW,OAAO,MAAM;AACtB,eAAW,QAAQ,KAAK;AACtB,YAAM,OAAO,KAAK,WAAW,KAAK,KAAK;AACvC,UAAI,MAAM,QAAS,WAAU;AAAA,IAC/B;AAAA,EACF;AACA,MAAI,UAAU,SAAU,WAAU;AAClC,MAAI,YAAY,EAAG,QAAO,EAAE,MAAM,GAAG,MAAM,GAAG,OAAO,CAAC,GAAG,WAAW,MAAM;AAE1E,QAAM,OAAmB,MAAM;AAAA,IAAK,EAAE,QAAQ,QAAQ;AAAA,IAAG,MACvD,MAAM,KAAK,EAAE,QAAQ,QAAQ,GAAG,OAAO,EAAE,MAAM,IAAI,SAAS,GAAG,SAAS,EAAE,EAAE;AAAA,EAC9E;AAEA,aAAW,OAAO,MAAM;AACtB,eAAW,QAAQ,KAAK;AACtB,YAAM,IAAI,KAAK,WAAW;AAC1B,YAAM,IAAI,KAAK,WAAW;AAC1B,UAAI,KAAK,WAAW,KAAK,WAAW,IAAI,KAAK,IAAI,EAAG;AAEpD,WAAK,CAAC,EAAE,CAAC,IAAI,EAAE,MAAM,KAAK,KAAK,KAAK,GAAG,SAAS,KAAK,SAAS,SAAS,KAAK,QAAQ;AAGpF,eAAS,KAAK,GAAG,KAAK,KAAK,SAAS,MAAM;AACxC,iBAAS,KAAK,GAAG,KAAK,KAAK,SAAS,MAAM;AACxC,cAAI,OAAO,KAAK,OAAO,EAAG;AAC1B,cAAI,IAAI,KAAK,WAAW,IAAI,KAAK,SAAS;AACxC,iBAAK,IAAI,EAAE,EAAE,IAAI,EAAE,IAAI,EAAE,MAAM,IAAI,SAAS,GAAG,SAAS,EAAE;AAAA,UAC5D;AAAA,QACF;AAAA,MACF;AAAA,IACF;AAAA,EACF;AAEA,SAAO,cAAc,MAAM,SAAS,OAAO;AAC7C;AAGA,SAAS,cAAc,MAAkB,SAAiB,SAA0B;AAClF,MAAI,gBAA
gB;AACpB,SAAO,gBAAgB,GAAG;AACxB,UAAM,WAAW,KAAK,MAAM,SAAO,CAAC,IAAI,gBAAgB,CAAC,GAAG,MAAM,KAAK,CAAC;AACxE,QAAI,CAAC,SAAU;AACf;AAAA,EACF;AACA,MAAI,gBAAgB,WAAW,gBAAgB,GAAG;AAChD,UAAM,UAAU,KAAK,IAAI,SAAO,IAAI,MAAM,GAAG,aAAa,CAAC;AAC3D,WAAO,EAAE,MAAM,SAAS,MAAM,eAAe,OAAO,SAAS,WAAW,UAAU,EAAE;AAAA,EACtF;AACA,SAAO,EAAE,MAAM,SAAS,MAAM,SAAS,OAAO,MAAM,WAAW,UAAU,EAAE;AAC7E;AAEO,SAAS,mBAAmB,MAA+B;AAChE,SAAO,KACJ;AAAA,IAAI,SACH,IACG,IAAI,OAAK,EAAE,KAAK,KAAK,EAAE,QAAQ,OAAO,GAAG,EAAE,QAAQ,OAAO,KAAK,CAAC,EAChE,OAAO,OAAO,EACd,KAAK,KAAK;AAAA,EACf,EACC,OAAO,OAAO,EACd,KAAK,IAAI;AACd;AAGA,SAAS,UAAU,MAAsB;AAEvC,SAAO,KAAK,QAAQ,MAAM,KAAK;AACjC;AAGA,IAAM,wBAAwB;AAG9B,SAAS,aAAa,MAAsB;AAC1C,MAAI,SAAS,KAEV,QAAQ,2BAA2B,EAAE,EAErC,QAAQ,uBAAuB,EAAE,EACjC,QAAQ,QAAQ,GAAG,EACnB,KAAK;AAGR,MAAI,OAAO,UAAU,MAAM,OAAO,SAAS,GAAG,GAAG;AAC/C,UAAM,SAAS,OAAO,MAAM,GAAG;AAE/B,UAAM,wBAAwB,OAAO,OAAO,OAAK,EAAE,WAAW,KAAK,+BAA+B,KAAK,CAAC,CAAC,EAAE;AAC3G,QAAI,OAAO,UAAU,KAAK,wBAAwB,OAAO,UAAU,KAAK;AACtE,eAAS,OAAO,KAAK,EAAE;AAAA,IACzB;AAAA,EACF;AACA,SAAO;AACT;AAOO,SAAS,oBAAoB,QAA8B;AAChE,QAAM,SAAoB,CAAC;AAE3B,aAAW,SAAS,QAAQ;AAC1B,QAAI,MAAM,SAAS,WAAW,CAAC,MAAM,OAAO;AAC1C,aAAO,KAAK,KAAK;AACjB;AAAA,IACF;AAEA,UAAM,EAAE,MAAM,SAAS,MAAM,SAAS,MAAM,IAAI,MAAM;AAGtD,QAAI,YAAY,KAAK,YAAY,GAAG;AAClC,aAAO,KAAK,KAAK;AACjB;AAAA,IACF;AAGA,QAAI,WAAW,GAAG;AAChB,UAAI,gBAAgB;AACpB,UAAI,eAAe;AACnB,eAAS,IAAI,GAAG,IAAI,SAAS,KAAK;AAChC,iBAAS,IAAI,GAAG,IAAI,SAAS,KAAK;AAChC,gBAAM,IAAI,MAAM,CAAC,IAAI,CAAC,GAAG,QAAQ;AACjC,4BAAkB,EAAE,MAAM,KAAK,KAAK,CAAC,GAAG;AACxC,0BAAgB,EAAE;AAAA,QACpB;AAAA,MACF;AAGA,UAAI,gBAAgB,KAAM,WAAW,KAAK,eAAe,KAAM;AAE7D,iBAAS,IAAI,GAAG,IAAI,SAAS,KAAK;AAChC,mBAAS,IAAI,GAAG,IAAI,SAAS,KAAK;AAChC,kBAAM,WAAW,MAAM,CAAC,IAAI,CAAC,GAAG,MAAM,KAAK;AAC3C,gBAAI,CAAC,SAAU;AAEf,uBAAW,QAAQ,SAAS,MAAM,IAAI,GAAG;AACvC,oBAAM,UAAU,KAAK,KAAK;AAC1B,kBAAI,CAAC,QAAS;AACd,qBAAO,KAAK,EAAE,MAAM,aAAa,MAAM,SAAS,YAAY,MAAM,WAAW,CAAC;AAAA,YAChF;AAAA,UACF;AAAA,QACF;AACA;AAAA,MACF;AAAA,IACF;AAEA,WAAO,KAAK,KAAK;AAAA,EACnB;AAEA,SAAO;AACT;AAEO,SAAS,iBAAiB,QAA2B;AA
C1D,QAAM,QAAkB,CAAC;AAEzB,WAAS,IAAI,GAAG,IAAI,OAAO,QAAQ,KAAK;AACtC,UAAM,QAAQ,OAAO,CAAC;AAGtB,QAAI,MAAM,SAAS,aAAa,MAAM,MAAM;AAC1C,YAAM,SAAS,IAAI,OAAO,KAAK,IAAI,MAAM,SAAS,GAAG,CAAC,CAAC;AACvD,YAAM,cAAc,aAAa,MAAM,IAAI;AAC3C,UAAI,YAAa,OAAM,KAAK,IAAI,GAAG,MAAM,IAAI,WAAW,IAAI,EAAE;AAC9D;AAAA,IACF;AAGA,QAAI,MAAM,SAAS,WAAW,MAAM,MAAM;AACxC,YAAM,KAAK,IAAI,YAAY,MAAM,IAAI,KAAK,EAAE;AAC5C;AAAA,IACF;AAGA,QAAI,MAAM,SAAS,aAAa;AAC9B,YAAM,KAAK,IAAI,OAAO,EAAE;AACxB;AAAA,IACF;AAGA,QAAI,MAAM,SAAS,UAAU,MAAM,MAAM;AACvC,YAAM,WAAW,aAAa,MAAM,IAAI;AACxC,UAAI,CAAC,SAAU;AAEf,YAAM,kBAAkB,MAAM,aAAa,aAAa,WAAW,KAAK,QAAQ;AAChF,YAAM,SAAS,kBAAkB,KAAK,MAAM,aAAa,YAAY,QAAQ;AAC7E,YAAM,KAAK,GAAG,MAAM,GAAG,QAAQ,EAAE;AACjC,UAAI,MAAM,UAAU;AAClB,mBAAW,SAAS,MAAM,UAAU;AAClC,gBAAM,cAAc,MAAM,aAAa,YAAY,OAAO;AAC1D,gBAAM,KAAK,KAAK,WAAW,IAAI,MAAM,QAAQ,EAAE,EAAE;AAAA,QACnD;AAAA,MACF;AACA;AAAA,IACF;AAEA,QAAI,MAAM,SAAS,eAAe,MAAM,MAAM;AAC5C,UAAI,OAAO,aAAa,MAAM,IAAI;AAClC,UAAI,CAAC,KAAM;AAGX,UAAI,cAAc,KAAK,IAAI,GAAG;AAC5B,cAAM,YAAY,OAAO,IAAI,CAAC;AAC9B,YAAI,WAAW,SAAS,eAAe,UAAU,QAAQ,SAAS,KAAK,UAAU,IAAI,GAAG;AACtF,gBAAM,KAAK,IAAI,MAAM,IAAI,IAAI,UAAU,IAAI,IAAI,EAAE;AACjD;AAAA,QACF,OAAO;AACL,gBAAM,KAAK,IAAI,MAAM,IAAI,IAAI,EAAE;AAAA,QACjC;AACA;AAAA,MACF;AAEA,UAAI,sBAAsB,KAAK,IAAI,GAAG;AACpC,cAAM,KAAK,IAAI,IAAI,KAAK,EAAE;AAC1B;AAAA,MACF;AAGA,UAAI,MAAM,MAAM;AACd,cAAM,OAAO,aAAa,MAAM,IAAI;AACpC,YAAI,KAAM,QAAO,IAAI,IAAI,KAAK,IAAI;AAAA,MACpC;AAGA,UAAI,MAAM,cAAc;AACtB,gBAAQ,aAAQ,MAAM,YAAY;AAAA,MACpC;AAEA,YAAM,KAAK,UAAU,IAAI,GAAG,EAAE;AAAA,IAChC,WAAW,MAAM,SAAS,WAAW,MAAM,OAAO;AAEhD,UAAI,MAAM,SAAS,KAAK,MAAM,MAAM,SAAS,CAAC,MAAM,IAAI;AACtD,cAAM,KAAK,EAAE;AAAA,MACf;AACA,YAAM,UAAU,gBAAgB,MAAM,KAAK;AAC3C,UAAI,SAAS;AACX,cAAM,KAAK,OAAO;AAClB,cAAM,KAAK,EAAE;AAAA,MACf;AAAA,IACF;AAAA,EACF;AAEA,SAAO,MAAM,KAAK,IAAI,EAAE,KAAK;AAC/B;AAGA,SAAS,eAAe,OAAyB;AAC/C,aAAW,OAAO,MAAM,OAAO;AAC7B,eAAW,QAAQ,KAAK;AACtB,UAAI,KAAK,UAAU,KAAK,KAAK,UAAU,EAAG,QAAO;AAAA,IACnD;AAAA,EACF;AACA,SAAO;AACT;AAEA,SAAS,mBAAmB,MAAuB;AACjD,SAAO,wCAAwC,KAAK,IAAI;AAC1D;AAEA,SAAS,wBAAw
B,OAAyB;AACxD,aAAW,OAAO,MAAM,OAAO;AAC7B,eAAW,QAAQ,KAAK;AACtB,UAAI,mBAAmB,KAAK,IAAI,EAAG,QAAO;AAAA,IAC5C;AAAA,EACF;AACA,SAAO;AACT;AAGA,SAAS,YAAY,OAAwB;AAC3C,QAAM,EAAE,OAAO,MAAM,SAAS,MAAM,QAAQ,IAAI;AAChD,QAAM,OAAO,oBAAI,IAAY;AAC7B,QAAM,QAAkB,CAAC,SAAS;AAElC,WAAS,IAAI,GAAG,IAAI,SAAS,KAAK;AAChC,UAAM,MAAM,MAAM,IAAI,OAAO;AAC7B,UAAM,UAAoB,CAAC;AAC3B,aAAS,IAAI,GAAG,IAAI,SAAS,KAAK;AAChC,UAAI,KAAK,IAAI,GAAG,CAAC,IAAI,CAAC,EAAE,EAAG;AAC3B,YAAM,OAAO,MAAM,CAAC,IAAI,CAAC;AACzB,UAAI,CAAC,KAAM;AAGX,eAAS,KAAK,GAAG,KAAK,KAAK,SAAS,MAAM;AACxC,iBAAS,KAAK,GAAG,KAAK,KAAK,SAAS,MAAM;AACxC,cAAI,OAAO,KAAK,OAAO,EAAG;AAC1B,cAAI,IAAI,KAAK,WAAW,IAAI,KAAK,QAAS,MAAK,IAAI,GAAG,IAAI,EAAE,IAAI,IAAI,EAAE,EAAE;AAAA,QAC1E;AAAA,MACF;AAEA,YAAM,OAAO,aAAa,KAAK,IAAI,EAAE,QAAQ,OAAO,MAAM;AAC1D,YAAM,QAAkB,CAAC;AACzB,UAAI,KAAK,UAAU,EAAG,OAAM,KAAK,YAAY,KAAK,OAAO,GAAG;AAC5D,UAAI,KAAK,UAAU,EAAG,OAAM,KAAK,YAAY,KAAK,OAAO,GAAG;AAC5D,YAAM,UAAU,MAAM,SAAS,MAAM,MAAM,KAAK,GAAG,IAAI;AACvD,cAAQ,KAAK,IAAI,GAAG,GAAG,OAAO,IAAI,IAAI,KAAK,GAAG,GAAG;AAAA,IACnD;AACA,QAAI,QAAQ,OAAQ,OAAM,KAAK,OAAO,QAAQ,KAAK,EAAE,CAAC,OAAO;AAAA,EAC/D;AAEA,QAAM,KAAK,UAAU;AACrB,SAAO,MAAM,KAAK,IAAI;AACxB;AAEA,SAAS,gBAAgB,OAAwB;AAC/C,MAAI,MAAM,SAAS,KAAK,MAAM,SAAS,EAAG,QAAO;AAEjD,QAAM,EAAE,OAAO,MAAM,SAAS,MAAM,QAAQ,IAAI;AAIhD,MAAI,eAAe,KAAK,KAAK,CAAC,wBAAwB,KAAK,EAAG,QAAO,YAAY,KAAK;AAGtF,MAAI,YAAY,KAAK,YAAY,GAAG;AAClC,UAAM,UAAU,aAAa,MAAM,CAAC,EAAE,CAAC,EAAE,IAAI;AAC7C,QAAI,CAAC,QAAS,QAAO;AACrB,WAAO,QACJ,MAAM,IAAI,EACV,IAAI,UAAQ;AACX,YAAM,UAAU,KAAK,KAAK;AAC1B,UAAI,CAAC,QAAS,QAAO;AACrB,UAAI,WAAW,KAAK,OAAO,EAAG,QAAO,KAAK,UAAU,OAAO,CAAC;AAC5D,UAAI,aAAa,KAAK,OAAO,EAAG,QAAO,KAAK,UAAU,OAAO,CAAC;AAC9D,aAAO,UAAU,OAAO;AAAA,IAC1B,CAAC,EACA,OAAO,OAAO,EACd,KAAK,IAAI;AAAA,EACd;AAGA,MAAI,YAAY,KAAK,WAAW,GAAG;AACjC,WAAO,MACJ,IAAI,SAAO,UAAU,aAAa,IAAI,CAAC,EAAE,IAAI,CAAC,EAAE,QAAQ,OAAO,GAAG,CAAC,EACnE,OAAO,OAAO,EACd,KAAK,IAAI;AAAA,EACd;AAGA,QAAM,UAAsB,MAAM,KAAK,EAAE,QAAQ,QAAQ,GAAG,MAAM,MAAM,OAAO,EAAE,KAAK,EAAE,CAAC;AACzF,QAAM,OAAO,oBAAI,IAAY;AAE7B,WAAS,IAAI,GAAG,IAAI,SAAS,KAAK
;AAChC,aAAS,IAAI,GAAG,IAAI,SAAS,KAAK;AAChC,UAAI,KAAK,IAAI,GAAG,CAAC,IAAI,CAAC,EAAE,EAAG;AAC3B,YAAM,OAAO,MAAM,CAAC,IAAI,CAAC;AACzB,UAAI,CAAC,KAAM;AACX,cAAQ,CAAC,EAAE,CAAC,IAAI,UAAU,aAAa,KAAK,IAAI,CAAC,EAAE,QAAQ,OAAO,KAAK,EAAE,QAAQ,OAAO,MAAM;AAG9F,eAAS,KAAK,GAAG,KAAK,KAAK,SAAS,MAAM;AACxC,iBAAS,KAAK,GAAG,KAAK,KAAK,SAAS,MAAM;AACxC,cAAI,OAAO,KAAK,OAAO,EAAG;AAC1B,cAAI,IAAI,KAAK,WAAW,IAAI,KAAK,SAAS;AACxC,iBAAK,IAAI,GAAG,IAAI,EAAE,IAAI,IAAI,EAAE,EAAE;AAAA,UAChC;AAAA,QACF;AAAA,MACF;AAEA,WAAK,KAAK,UAAU;AAAA,IACtB;AAAA,EACF;AAMA,QAAM,aAAyB,CAAC;AAChC,MAAI,kBAAkB;AACtB,WAAS,IAAI,GAAG,IAAI,QAAQ,QAAQ,KAAK;AACvC,UAAM,MAAM,QAAQ,CAAC;AACrB,UAAM,qBAAqB,IAAI,MAAM,UAAQ,SAAS,EAAE;AACxD,QAAI,mBAAoB;AAIxB,UAAM,eAAe,IAAI,OAAO,UAAQ,SAAS,EAAE;AACnD,UAAM,eAAe,IAAI,KAAK,CAAC,GAAG,MAAM,KAAK,IAAI,GAAG,CAAC,IAAI,CAAC,EAAE,CAAC;AAC7D,QAAI,CAAC,gBAAgB,aAAa,WAAW,KAAK,IAAI,CAAC,MAAM,MAAM,IAAI,MAAM,CAAC,EAAE,MAAM,OAAK,MAAM,EAAE,GAAG;AACpG,wBAAkB,IAAI,CAAC;AACvB;AAAA,IACF;AAGA,QAAI,mBAAmB,IAAI,CAAC,MAAM,IAAI;AACpC,UAAI,CAAC,IAAI;AACT,wBAAkB;AAAA,IACpB,OAAO;AACL,wBAAkB;AAAA,IACpB;AACA,eAAW,KAAK,GAAG;AAAA,EACrB;AAEA,MAAI,WAAW,WAAW,EAAG,QAAO;AAEpC,QAAM,KAAe,CAAC;AACtB,KAAG,KAAK,OAAO,WAAW,CAAC,EAAE,KAAK,KAAK,IAAI,IAAI;AAC/C,KAAG,KAAK,OAAO,WAAW,CAAC,EAAE,IAAI,MAAM,KAAK,EAAE,KAAK,KAAK,IAAI,IAAI;AAChE,WAAS,IAAI,GAAG,IAAI,WAAW,QAAQ,KAAK;AAC1C,OAAG,KAAK,OAAO,WAAW,CAAC,EAAE,KAAK,KAAK,IAAI,IAAI;AAAA,EACjD;AACA,SAAO,GAAG,KAAK,IAAI;AACrB;;;AChLO,IAAM,mBAAmB;AACzB,IAAM,mBAAmB;AACzB,IAAM,mBAAmB;","names":[]}
@@ -1,7 +1,7 @@
  #!/usr/bin/env node

  // src/utils.ts
- var VERSION = true ? "2.5.2" : "0.0.0-dev";
+ var VERSION = true ? "2.7.1" : "0.0.0-dev";
  function toArrayBuffer(buf) {
    if (buf.byteOffset === 0 && buf.byteLength === buf.buffer.byteLength) {
      return buf.buffer;
@@ -331,6 +331,17 @@ function hasMergedCells(table) {
    }
    return false;
  }
+ function containsInlineMath(text) {
+   return /(^|[^\\])\$(?=\S)(?:\\.|[^$\n])+?\S\$/.test(text);
+ }
+ function tableContainsInlineMath(table) {
+   for (const row of table.cells) {
+     for (const cell of row) {
+       if (containsInlineMath(cell.text)) return true;
+     }
+   }
+   return false;
+ }
  function tableToHtml(table) {
    const { cells, rows: numRows, cols: numCols } = table;
    const skip = /* @__PURE__ */ new Set();
@@ -363,7 +374,7 @@ function tableToHtml(table) {
  function tableToMarkdown(table) {
    if (table.rows === 0 || table.cols === 0) return "";
    const { cells, rows: numRows, cols: numCols } = table;
-   if (hasMergedCells(table)) return tableToHtml(table);
+   if (hasMergedCells(table) && !tableContainsInlineMath(table)) return tableToHtml(table);
    if (numRows === 1 && numCols === 1) {
      const content = sanitizeText(cells[0][0].text);
      if (!content) return "";
@@ -454,4 +465,4 @@ export {
    HEADING_RATIO_H2,
    HEADING_RATIO_H3
  };
- //# sourceMappingURL=chunk-NKKLA43G.js.map
+ //# sourceMappingURL=chunk-LA66FVBN.js.map
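
The behavioural change in this chunk is the new guard in `tableToMarkdown`: a table with merged cells is no longer sent to `tableToHtml` when any cell contains inline math. The sketch below is a minimal illustration of that branch, not the package's code. The regex and the `tableContainsInlineMath` traversal are copied from the hunks above; the body of `hasMergedCells`, the `colSpan`/`rowSpan` cell fields, and the `pickRenderer` helper are assumptions introduced only for this example.

```javascript
// Regex copied verbatim from the added containsInlineMath() in the diff.
const INLINE_MATH = /(^|[^\\])\$(?=\S)(?:\\.|[^$\n])+?\S\$/;

function containsInlineMath(text) {
  return INLINE_MATH.test(text);
}

// Same traversal as the added tableContainsInlineMath().
function tableContainsInlineMath(table) {
  for (const row of table.cells) {
    for (const cell of row) {
      if (containsInlineMath(cell.text)) return true;
    }
  }
  return false;
}

// Assumption: a cell counts as merged when colSpan or rowSpan exceeds 1.
// The real hasMergedCells() body is not shown in this diff.
function hasMergedCells(table) {
  return table.cells.some((row) =>
    row.some((cell) => cell.colSpan > 1 || cell.rowSpan > 1)
  );
}

// Hypothetical helper: names the branch tableToMarkdown() takes after 2.7.1.
function pickRenderer(table) {
  return hasMergedCells(table) && !tableContainsInlineMath(table)
    ? "html"
    : "markdown";
}

const merged = { cells: [[{ text: "Total", colSpan: 2, rowSpan: 1 }]] };
const mergedWithMath = {
  cells: [[{ text: "$E = mc^2$", colSpan: 2, rowSpan: 1 }]],
};

console.log(containsInlineMath("$x^2$"));           // true
console.log(containsInlineMath("costs $5 or $10")); // false: no closing "$" directly after a non-space
console.log(containsInlineMath("\\$a+b$"));         // false: the opening "$" is escaped
console.log(pickRenderer(merged));                  // "html"
console.log(pickRenderer(mergedWithMath));          // "markdown"
```

The lookahead `(?=\S)` and the `\S` before the closing `$` require the delimiters to hug the formula, which is what keeps ordinary currency text such as `$5 or $10` from being treated as math. Why merged tables with math are kept out of the HTML path is not stated in the diff; one plausible reading is that `$...$` spans are only useful to a downstream math renderer when they stay in Markdown.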