@heripo/pdf-parser 0.1.8 → 0.1.10
- package/README.ko.md +19 -9
- package/README.md +19 -9
- package/dist/index.cjs +419 -151
- package/dist/index.cjs.map +1 -1
- package/dist/index.d.cts +4 -1
- package/dist/index.d.ts +4 -1
- package/dist/index.js +388 -126
- package/dist/index.js.map +1 -1
- package/package.json +4 -4
package/README.ko.md
CHANGED
@@ -3,7 +3,7 @@
 > PDF parsing library - OCR support with Docling SDK
 
 [](https://www.npmjs.com/package/@heripo/pdf-parser)
-[](https://nodejs.org/)
 [](https://www.python.org/)
 
 [](../../LICENSE)
@@ -46,7 +46,7 @@
 
 ### Required Dependencies
 
-#### 1. Node.js >=
+#### 1. Node.js >= 24.0.0
 
 ```bash
 brew install node
@@ -72,7 +72,7 @@ python3.11 --version
 
 #### 4. poppler (PDF text extraction)
 
-
+Required for PDF page counting (`pdfinfo`) and text layer extraction (`pdftotext`), used by the OCR strategy system's text layer pre-check.
 
 ```bash
 brew install poppler
@@ -281,12 +281,12 @@ const outputPath = await pdfParser.parse(
 
 `@heripo/pdf-parser` requires the following system-level dependencies:
 
-| Dependency | Required Version | Installation | Purpose
-| ------- | ---------- | -------------------------- |
-| Python | 3.9 - 3.12 | `brew install python@3.11` | Docling SDK runtime
-| poppler | Any | `brew install poppler` |
-| jq | Any | `brew install jq` | JSON processing (conversion result parsing)
-| lsof | Any | Included with macOS | docling-serve port management
+| Dependency | Required Version | Installation | Purpose |
+| ------- | ---------- | -------------------------- | -------------------------------------------------------------- |
+| Python | 3.9 - 3.12 | `brew install python@3.11` | Docling SDK runtime |
+| poppler | Any | `brew install poppler` | PDF page counting (pdfinfo) and text layer extraction (pdftotext) |
+| jq | Any | `brew install jq` | JSON processing (conversion result parsing) |
+| lsof | Any | Included with macOS | docling-serve port management |
 
 > ⚠️ **Python 3.13+ is not supported.** Some Docling SDK dependencies are not compatible with Python 3.13.
 
@@ -411,6 +411,16 @@ const pdfParser = new PDFParser({
 brew install jq
 ```
 
+### poppler Not Found
+
+**Symptom**: `poppler is not installed. Please install poppler using: brew install poppler`
+
+**Solution**:
+
+```bash
+brew install poppler
+```
+
 ### Port Conflict
 
 **Symptom**: `Port 5001 is already in use`
package/README.md
CHANGED
@@ -3,7 +3,7 @@
 > PDF parsing library - OCR support with Docling SDK
 
 [](https://www.npmjs.com/package/@heripo/pdf-parser)
-[](https://nodejs.org/)
 [](https://www.python.org/)
 
 [](../../LICENSE)
@@ -46,7 +46,7 @@
 
 ### Required Dependencies
 
-#### 1. Node.js >=
+#### 1. Node.js >= 24.0.0
 
 ```bash
 brew install node
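
Review note on the hunk above: the README now states a concrete minimum, Node.js 24.0.0. A minimal sketch of a startup guard for that requirement could look like the following. The helper names `parseVersion` and `meetsMinimum` are hypothetical and are not part of `@heripo/pdf-parser`; the diff does not show whether the package performs such a check itself.

```typescript
// Hypothetical helpers for comparing dotted version strings such as "24.0.0".
// Assumption: plain numeric major.minor.patch; pre-release tags are ignored.

/** Splits "v24.1.0" or "24.1.0" into numeric parts, treating junk as 0. */
function parseVersion(version: string): number[] {
  return version
    .replace(/^v/, "")
    .split(".")
    .map((part) => Number.parseInt(part, 10) || 0);
}

/** Returns true when `current` is greater than or equal to `minimum`. */
function meetsMinimum(current: string, minimum: string): boolean {
  const a = parseVersion(current);
  const b = parseVersion(minimum);
  const length = Math.max(a.length, b.length);
  for (let i = 0; i < length; i++) {
    const x = a[i] ?? 0; // missing parts count as 0, so "24" equals "24.0.0"
    const y = b[i] ?? 0;
    if (x !== y) {
      return x > y;
    }
  }
  return true; // all parts equal
}
```

In a Node.js process the current version is available as `process.versions.node`, so a guard could call `meetsMinimum(process.versions.node, "24.0.0")` and exit with a clear message when it returns `false`.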
@@ -72,7 +72,7 @@ python3.11 --version
 
 #### 4. poppler (PDF text extraction)
 
-Required for the OCR strategy system's text layer pre-check
+Required for PDF page counting (`pdfinfo`) and text layer extraction (`pdftotext`), used by the OCR strategy system's text layer pre-check.
 
 ```bash
 brew install poppler
@@ -281,12 +281,12 @@ Archaeological excavation report PDFs have the following characteristics:
 
 `@heripo/pdf-parser` requires the following system-level dependencies:
 
-| Dependency | Required Version | Installation | Purpose
-| ---------- | ---------------- | -------------------------- |
-| Python | 3.9 - 3.12 | `brew install python@3.11` | Docling SDK runtime
-| poppler | Any | `brew install poppler` |
-| jq | Any | `brew install jq` | JSON processing (conversion result parsing)
-| lsof | Any | Included with macOS | docling-serve port management
+| Dependency | Required Version | Installation | Purpose |
+| ---------- | ---------------- | -------------------------- | ----------------------------------------------------------------- |
+| Python | 3.9 - 3.12 | `brew install python@3.11` | Docling SDK runtime |
+| poppler | Any | `brew install poppler` | PDF page counting (pdfinfo) and text layer extraction (pdftotext) |
+| jq | Any | `brew install jq` | JSON processing (conversion result parsing) |
+| lsof | Any | Included with macOS | docling-serve port management |
 
 > ⚠️ **Python 3.13+ is not supported.** Some Docling SDK dependencies are not compatible with Python 3.13.
 
@@ -411,6 +411,16 @@ const pdfParser = new PDFParser({
 brew install jq
 ```
 
+### poppler Not Found
+
+**Symptom**: `poppler is not installed. Please install poppler using: brew install poppler`
+
+**Solution**:
+
+```bash
+brew install poppler
+```
+
 ### Port Conflict
 
 **Symptom**: `Port 5001 is already in use`
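
Review note on the poppler changes in this diff: the README now says poppler is used for page counting (`pdfinfo`) and text layer extraction (`pdftotext`), and documents a "poppler is not installed" error. The sketch below shows one way such a pre-check could be wired up. It is not the package's implementation, which this README diff does not show. The function names (`runPoppler`, `parsePageCount`, `hasTextLayer`, `precheckPdf`), the 20-character threshold, and the three-page sample are all assumptions made for illustration; only the error text is taken from the README.

```typescript
import { spawnSync } from "node:child_process";

// Error text copied from the README's "poppler Not Found" symptom.
const POPPLER_MISSING =
  "poppler is not installed. Please install poppler using: brew install poppler";

/** Runs a poppler CLI tool and returns its stdout (hypothetical helper). */
function runPoppler(tool: string, args: string[]): string {
  const result = spawnSync(tool, args, { encoding: "utf8" });
  if (result.error) {
    // spawnSync reports a missing executable as an error with code ENOENT.
    const code = (result.error as Error & { code?: string }).code;
    throw code === "ENOENT" ? new Error(POPPLER_MISSING) : result.error;
  }
  if (result.status !== 0) {
    throw new Error(`${tool} exited with status ${result.status}`);
  }
  return result.stdout;
}

/** Reads the "Pages:" line that `pdfinfo` prints; null if it is absent. */
function parsePageCount(pdfinfoOutput: string): number | null {
  const match = /^Pages:\s+(\d+)\s*$/m.exec(pdfinfoOutput);
  const digits = match?.[1];
  return digits === undefined ? null : Number.parseInt(digits, 10);
}

/** Assumed heuristic: a text layer exists if enough non-whitespace text was extracted. */
function hasTextLayer(extracted: string, minChars = 20): boolean {
  // pdftotext separates pages with form feeds, which \s also strips.
  return extracted.replace(/\s+/g, "").length >= minChars;
}

/** Counts pages, then samples the first few pages for extractable text. */
function precheckPdf(
  pdfPath: string,
  samplePages = 3,
): { pages: number | null; textLayer: boolean } {
  const pages = parsePageCount(runPoppler("pdfinfo", [pdfPath]));
  const lastPage = Math.min(samplePages, pages ?? samplePages);
  // `-f`/`-l` select the page range; "-" sends the text to stdout.
  const text = runPoppler("pdftotext", [
    "-f", "1", "-l", String(lastPage), pdfPath, "-",
  ]);
  return { pages, textLayer: hasTextLayer(text) };
}
```

A caller could use `precheckPdf("report.pdf")` (a placeholder path) to decide whether OCR is needed before handing the file to Docling. Keeping the parsing and the heuristic in pure functions means they can be unit-tested without poppler installed.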