kordoc 1.6.1 → 1.7.1
- package/README.md +16 -2
- package/dist/{chunk-DYUB34PO.js → chunk-VOMMXHNQ.js} +432 -90
- package/dist/chunk-VOMMXHNQ.js.map +1 -0
- package/dist/cli.js +38 -7
- package/dist/cli.js.map +1 -1
- package/dist/index.cjs +435 -84
- package/dist/index.cjs.map +1 -1
- package/dist/index.d.cts +32 -8
- package/dist/index.d.ts +32 -8
- package/dist/index.js +437 -84
- package/dist/index.js.map +1 -1
- package/dist/mcp.js +25 -21
- package/dist/mcp.js.map +1 -1
- package/dist/{watch-3QVNEAVM.js → watch-YPR56MI6.js} +34 -4
- package/dist/watch-YPR56MI6.js.map +1 -0
- package/package.json +1 -1
- package/dist/chunk-DYUB34PO.js.map +0 -1
- package/dist/watch-3QVNEAVM.js.map +0 -1
package/README.md
CHANGED
@@ -2,7 +2,7 @@
 
 **모두 파싱해버리겠다** — The Korean Document Platform.
 
-[](https://www.npmjs.com/package/kordoc)
 [](https://github.com/chrisryugj/kordoc/blob/main/LICENSE)
 [](https://nodejs.org)
 
@@ -14,11 +14,25 @@
 
 ---
 
-## What's New in v1.
+## What's New in v1.7.0
+
+- **Image Extraction (HWP/HWPX)** — Binary image extraction from ZIP entries and HWP5 BinData streams. Rendered as `` in markdown output.
+- **Partial Parsing (Graceful Degradation)** — Single page failures no longer abort the whole document. Failed pages emit `PARTIAL_PARSE` warnings and parsing continues.
+- **Progress Callbacks** — `onProgress` callback in `ParseOptions`. CLI shows `[3/15 pages]` progress. Batch mode shows `[2/10 files]`.
+- **File Path Input** — `parse("path/to/file.hwp")` string overload. Auto-reads file, detects format, returns result.
+- **PDF Header/Footer Filtering** — `removeHeaderFooter: true` option removes repeated text at page edges. Removed elements recorded in `ParseWarning`.
+- **Security Hardening** — ZIP bomb cumulative-size tracking across all file types, SSRF prevention on webhook URLs, XSS-safe hyperlink rendering (javascript: URLs stripped), null-byte path traversal detection, Levenshtein length guard (O(m×n) DoS prevention), 30s PDF load timeout.
+- **Bug Fixes** — HWPX generator separator logic, XML recursion depth limit (MAX_XML_DEPTH=200), PDF table row merge protection, CLI `--format` validation, variable shadowing in PDF parser.
+- **UX Improvements** — KV table false-positive reduction (time/URL/number patterns excluded), MCP `parse_metadata` uses 50MB limit with header-only format detection, Watch debounce increased to 1000ms with stable-size check.
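Reviewer note: the partial-parsing and progress-callback entries describe a per-page loop that records failures instead of rethrowing. The sketch below illustrates that pattern only; it is not kordoc's implementation. Only the `PARTIAL_PARSE` code and the `onProgress` name come from the changelog. The `parsePages` helper, the callback signature, and the warning shape are assumptions made for illustration.

```typescript
// Hypothetical shapes: only `PARTIAL_PARSE` and `onProgress` appear in the changelog.
interface ParseWarning {
  code: string;
  page: number;
  message: string;
}

interface PageLoopOptions {
  // Assumed signature; the real kordoc callback may take different arguments.
  onProgress?: (current: number, total: number) => void;
}

interface PageLoopResult {
  pages: string[];
  warnings: ParseWarning[];
}

// Parse each page independently so that one failure does not abort the document.
function parsePages(
  rawPages: string[],
  parsePage: (raw: string) => string,
  options: PageLoopOptions = {},
): PageLoopResult {
  const pages: string[] = [];
  const warnings: ParseWarning[] = [];

  rawPages.forEach((raw, index) => {
    const pageNumber = index + 1;
    try {
      pages.push(parsePage(raw));
    } catch (err) {
      // Record the failure as a warning and continue with the next page.
      warnings.push({
        code: "PARTIAL_PARSE",
        page: pageNumber,
        message: err instanceof Error ? err.message : String(err),
      });
    }
    // Report progress after every page, whether it succeeded or failed.
    options.onProgress?.(pageNumber, rawPages.length);
  });

  return { pages, warnings };
}
```

A caller could format the callback arguments as `[${current}/${total} pages]` to reproduce the progress style the changelog mentions for the CLI.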
+
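Reviewer note: the "Levenshtein length guard" entry refers to the fact that edit distance costs O(m×n) time, so unbounded inputs can be used to stall a process. A minimal sketch of such a guard follows. The function name, the cap value, and the choice to return `null` for oversized inputs are all assumptions; the changelog does not state how kordoc sets or enforces the limit.

```typescript
// Hypothetical cap; the changelog does not state the actual limit.
const MAX_COMPARE_LENGTH = 10000;

// Returns the edit distance between two strings (compared by UTF-16 code unit),
// or null when either input exceeds the cap, so no O(m*n) work is attempted.
function guardedLevenshtein(
  a: string,
  b: string,
  maxLength: number = MAX_COMPARE_LENGTH,
): number | null {
  if (a.length > maxLength || b.length > maxLength) return null;

  // Two-row dynamic programming: prev holds distances for the previous row of a.
  let prev: number[] = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1, // deletion
        curr[j - 1] + 1, // insertion
        prev[j - 1] + cost, // substitution or match
      );
    }
    prev = curr;
  }
  return prev[b.length];
}
```

Returning `null` rather than throwing lets a caller treat oversized inputs as "not comparable" and skip them, which is one reasonable design; a real implementation might instead truncate or fall back to a cheaper comparison.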
+<details>
+<summary>v1.6.1 fixes</summary>
 
 - **HWP5 Table Cell Offset Fix** — Fixed critical 2-byte offset misalignment in LIST_HEADER parsing. Row address was incorrectly read as colSpan, causing 3-column tables to explode into 6+ columns with misaligned content. Tables now use colAddr/rowAddr-based direct placement for accurate cell positioning.
 - **HWP5 TAB Control Character Fix** — TAB (0x0009) inline control's 14-byte extension data was not skipped, producing garbage characters (`࣐Ā`) after every tab in the output. Fixed by adding the required 14-byte skip.
 
+</details>
+
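Reviewer note: the TAB fix in the v1.6.1 entry can be pictured as a decoder that reads UTF-16LE code units and, on seeing 0x0009, jumps over the 14 bytes that follow. The sketch below shows only that step, under the byte counts stated in the changelog. The function name is hypothetical, and other HWP5 control characters are deliberately not handled, so this is not a substitute for the library's parser.

```typescript
const TAB_CONTROL = 0x0009;
// Size of the data that follows a TAB control, as stated in the changelog entry.
const TAB_EXTENSION_BYTES = 14;

// Decode UTF-16LE paragraph text, emitting "\t" for the TAB control and
// skipping its extension data. Other control characters are not handled here.
function decodeParaText(bytes: Uint8Array): string {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  let out = "";
  let offset = 0;

  while (offset + 2 <= view.byteLength) {
    const code = view.getUint16(offset, true); // true = little-endian
    offset += 2;

    if (code === TAB_CONTROL) {
      out += "\t";
      // Without this skip, the extension bytes would be decoded as characters.
      offset += TAB_EXTENSION_BYTES;
      continue;
    }
    out += String.fromCharCode(code);
  }
  return out;
}
```

If the skip is removed, the first extension bytes are decoded as ordinary code units, which matches the kind of stray characters the changelog describes appearing after each tab.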
 <details>
 <summary>v1.6.0 features</summary>
 