@heripo/pdf-parser 0.1.8 → 0.1.10

package/README.ko.md CHANGED
@@ -3,7 +3,7 @@
  > PDF 파싱 라이브러리 - Docling SDK를 활용한 OCR 지원
 
  [![npm version](https://img.shields.io/npm/v/@heripo/pdf-parser.svg)](https://www.npmjs.com/package/@heripo/pdf-parser)
- [![Node.js](https://img.shields.io/badge/Node.js-%3E%3D22-339933?logo=node.js&logoColor=white)](https://nodejs.org/)
+ [![Node.js](https://img.shields.io/badge/Node.js-%3E%3D24-339933?logo=node.js&logoColor=white)](https://nodejs.org/)
  [![Python](https://img.shields.io/badge/Python-3.9--3.12-3776AB?logo=python&logoColor=white)](https://www.python.org/)
  ![coverage](https://img.shields.io/badge/coverage-100%25-brightgreen)
  [![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](../../LICENSE)
@@ -46,7 +46,7 @@
 
  ### 필수 의존성
 
- #### 1. Node.js >= 22.0.0
+ #### 1. Node.js >= 24.0.0
 
  ```bash
  brew install node
@@ -72,7 +72,7 @@ python3.11 --version
 
  #### 4. poppler (PDF 텍스트 추출)
 
- OCR 전략 시스템의 텍스트 레이어 사전 검사(`pdftotext`)에 필요합니다.
+ PDF 페이지 확인(`pdfinfo`)과 텍스트 레이어 추출(`pdftotext`)에 필요하며, OCR 전략 시스템의 텍스트 레이어 사전 검사에 사용됩니다.
 
  ```bash
  brew install poppler
@@ -281,12 +281,12 @@ const outputPath = await pdfParser.parse(
 
  `@heripo/pdf-parser`는 다음 시스템 레벨 의존성이 필요합니다:
 
- | 의존성 | 필수 버전 | 설치 방법 | 용도 |
- | ------- | ---------- | -------------------------- | ----------------------------------------- |
- | Python | 3.9 - 3.12 | `brew install python@3.11` | Docling SDK 실행 환경 |
- | poppler | Any | `brew install poppler` | OCR 전략용 텍스트 레이어 추출 (pdftotext) |
- | jq | Any | `brew install jq` | JSON 처리 (변환 결과 파싱) |
- | lsof | Any | macOS 기본 설치됨 | docling-serve 포트 관리 |
+ | 의존성 | 필수 버전 | 설치 방법 | 용도 |
+ | ------- | ---------- | -------------------------- | -------------------------------------------------------------- |
+ | Python | 3.9 - 3.12 | `brew install python@3.11` | Docling SDK 실행 환경 |
+ | poppler | Any | `brew install poppler` | PDF 페이지 수 확인 (pdfinfo) 및 텍스트 레이어 추출 (pdftotext) |
+ | jq | Any | `brew install jq` | JSON 처리 (변환 결과 파싱) |
+ | lsof | Any | macOS 기본 설치됨 | docling-serve 포트 관리 |
 
  > ⚠️ **Python 3.13+는 지원하지 않습니다.** Docling SDK의 일부 의존성이 Python 3.13과 호환되지 않습니다.
 
@@ -411,6 +411,16 @@ const pdfParser = new PDFParser({
  brew install jq
  ```
 
+ ### poppler를 찾을 수 없음
+
+ **증상**: `poppler is not installed. Please install poppler using: brew install poppler`
+
+ **해결**:
+
+ ```bash
+ brew install poppler
+ ```
+
  ### 포트 충돌
 
  **증상**: `Port 5001 is already in use`
package/README.md CHANGED
@@ -3,7 +3,7 @@
  > PDF parsing library - OCR support with Docling SDK
 
  [![npm version](https://img.shields.io/npm/v/@heripo/pdf-parser.svg)](https://www.npmjs.com/package/@heripo/pdf-parser)
- [![Node.js](https://img.shields.io/badge/Node.js-%3E%3D22-339933?logo=node.js&logoColor=white)](https://nodejs.org/)
+ [![Node.js](https://img.shields.io/badge/Node.js-%3E%3D24-339933?logo=node.js&logoColor=white)](https://nodejs.org/)
  [![Python](https://img.shields.io/badge/Python-3.9--3.12-3776AB?logo=python&logoColor=white)](https://www.python.org/)
  ![coverage](https://img.shields.io/badge/coverage-100%25-brightgreen)
  [![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](../../LICENSE)
@@ -46,7 +46,7 @@
 
  ### Required Dependencies
 
- #### 1. Node.js >= 22.0.0
+ #### 1. Node.js >= 24.0.0
 
  ```bash
  brew install node
@@ -72,7 +72,7 @@ python3.11 --version
 
  #### 4. poppler (PDF text extraction)
 
- Required for the OCR strategy system's text layer pre-check (`pdftotext`).
+ Required for PDF page counting (`pdfinfo`) and text layer extraction (`pdftotext`), used by the OCR strategy system's text layer pre-check.
 
  ```bash
  brew install poppler
@@ -281,12 +281,12 @@ Archaeological excavation report PDFs have the following characteristics:
 
  `@heripo/pdf-parser` requires the following system-level dependencies:
 
- | Dependency | Required Version | Installation | Purpose |
- | ---------- | ---------------- | -------------------------- | -------------------------------------------------- |
- | Python | 3.9 - 3.12 | `brew install python@3.11` | Docling SDK runtime |
- | poppler | Any | `brew install poppler` | Text layer extraction for OCR strategy (pdftotext) |
- | jq | Any | `brew install jq` | JSON processing (conversion result parsing) |
- | lsof | Any | Included with macOS | docling-serve port management |
+ | Dependency | Required Version | Installation | Purpose |
+ | ---------- | ---------------- | -------------------------- | ----------------------------------------------------------------- |
+ | Python | 3.9 - 3.12 | `brew install python@3.11` | Docling SDK runtime |
+ | poppler | Any | `brew install poppler` | PDF page counting (pdfinfo) and text layer extraction (pdftotext) |
+ | jq | Any | `brew install jq` | JSON processing (conversion result parsing) |
+ | lsof | Any | Included with macOS | docling-serve port management |
 
  > ⚠️ **Python 3.13+ is not supported.** Some Docling SDK dependencies are not compatible with Python 3.13.
 
@@ -411,6 +411,16 @@ const pdfParser = new PDFParser({
  brew install jq
  ```
 
+ ### poppler Not Found
+
+ **Symptom**: `poppler is not installed. Please install poppler using: brew install poppler`
+
+ **Solution**:
+
+ ```bash
+ brew install poppler
+ ```
+
  ### Port Conflict
 
  **Symptom**: `Port 5001 is already in use`