file2md 1.4.34 → 1.4.35
- package/README.ko.md +408 -0
- package/README.md +27 -113
- package/dist/utils/pdf-extractor.d.ts.map +1 -1
- package/dist/utils/pdf-extractor.js +111 -42
- package/dist/utils/pdf-extractor.js.map +1 -1
- package/package.json +1 -1
package/README.ko.md
ADDED
|
@@ -0,0 +1,408 @@
|
|
|
1
|
+
# file2md
|
|
2
|
+
|
|
3
|
+
[](https://badge.fury.io/js/file2md)
|
|
4
|
+
[](https://www.typescriptlang.org/)
|
|
5
|
+
[](https://opensource.org/licenses/MIT)
|
|
6
|
+
|
|
7
|
+
다양한 문서 형식(PDF, DOCX, XLSX, PPTX, HWP, HWPX)을 **고급 레이아웃 보존**, **실제 PDF 이미지 추출**, **차트 변환**, **한국어 문서 지원** 기능과 함께 마크다운으로 변환하는 현대적인 TypeScript 라이브러리입니다.
|
|
8
|
+
|
|
9
|
+
[English](README.md) | **한국어**
|
|
10
|
+
|
|
11
|
+
## ✨ 주요 기능
|
|
12
|
+
|
|
13
|
+
- 🔄 **다양한 형식 지원**: PDF, DOCX, XLSX, PPTX, HWP, HWPX
|
|
14
|
+
- 🎨 **레이아웃 보존**: 문서 구조, 표, 서식 유지
|
|
15
|
+
- 🖼️ **실제 PDF 이미지 추출**: pdf2pic을 사용하여 PDF 페이지를 실제 PNG 이미지로 변환
|
|
16
|
+
- 📊 **차트 변환**: 차트를 마크다운 표로 변환
|
|
17
|
+
- 📝 **목록 및 표 지원**: 중첩된 목록과 복잡한 표 지원
|
|
18
|
+
- 🌏 **한국어 문서 지원**: HWP/HWPX 한국어 문서 형식 완전 지원
|
|
19
|
+
- 🔒 **타입 안전성**: 포괄적인 타입을 제공하는 완전한 TypeScript 지원
|
|
20
|
+
- ⚡ **현대적 ESM**: CommonJS 호환성을 갖춘 ES2022 모듈
|
|
21
|
+
- 🚀 **무설정**: 별도 설정 없이 바로 사용 가능
|
|
22
|
+
- 🎯 **시각적 파싱**: 향상된 PPTX 파싱 및 시각적 레이아웃 분석
|
|
23
|
+
|
|
24
|
+
## 📦 설치
|
|
25
|
+
|
|
26
|
+
```bash
|
|
27
|
+
npm install file2md
|
|
28
|
+
```
|
|
29
|
+
|
|
30
|
+
## 🚀 빠른 시작
|
|
31
|
+
|
|
32
|
+
### TypeScript / ES 모듈
|
|
33
|
+
|
|
34
|
+
```typescript
|
|
35
|
+
import { convert } from 'file2md';
|
|
36
|
+
|
|
37
|
+
// 파일 경로로 변환
|
|
38
|
+
const pdfResult = await convert('./document.pdf');
|
|
39
|
+
console.log(pdfResult.markdown);
|
|
40
|
+
|
|
41
|
+
// 옵션과 함께 변환
|
|
42
|
+
const result = await convert('./presentation.pptx', {
|
|
43
|
+
imageDir: 'extracted-images',
|
|
44
|
+
preserveLayout: true,
|
|
45
|
+
extractCharts: true,
|
|
46
|
+
useVisualParser: true // 향상된 PPTX 파싱
|
|
47
|
+
});
|
|
48
|
+
|
|
49
|
+
console.log(`✅ 변환 완료!`);
|
|
50
|
+
console.log(`📄 마크다운 길이: ${result.markdown.length}`);
|
|
51
|
+
console.log(`🖼️ 추출된 이미지: ${result.images.length}`);
|
|
52
|
+
console.log(`📊 발견된 차트: ${result.charts.length}`);
|
|
53
|
+
console.log(`⏱️ 처리 시간: ${result.metadata.processingTime}ms`);
|
|
54
|
+
```
|
|
55
|
+
|
|
56
|
+
### 한국어 문서 지원 (HWP/HWPX)
|
|
57
|
+
|
|
58
|
+
```typescript
|
|
59
|
+
import { convert } from 'file2md';
|
|
60
|
+
|
|
61
|
+
// 한국어 HWP 문서 변환
|
|
62
|
+
const hwpResult = await convert('./document.hwp', {
|
|
63
|
+
imageDir: 'hwp-images',
|
|
64
|
+
preserveLayout: true,
|
|
65
|
+
extractImages: true
|
|
66
|
+
});
|
|
67
|
+
|
|
68
|
+
// 한국어 HWPX 문서 변환 (XML 기반 형식)
|
|
69
|
+
const hwpxResult = await convert('./document.hwpx', {
|
|
70
|
+
imageDir: 'hwpx-images',
|
|
71
|
+
preserveLayout: true,
|
|
72
|
+
extractImages: true
|
|
73
|
+
});
|
|
74
|
+
|
|
75
|
+
console.log(`🇰🇷 HWP 내용: ${hwpResult.markdown.substring(0, 100)}...`);
|
|
76
|
+
console.log(`📄 HWPX 페이지: ${hwpxResult.metadata.pageCount}`);
|
|
77
|
+
```
|
|
78
|
+
|
|
79
|
+
### CommonJS
|
|
80
|
+
|
|
81
|
+
```javascript
|
|
82
|
+
const { convert } = require('file2md');
|
|
83
|
+
|
|
84
|
+
convert('./document.docx')
|
|
85
|
+
  .then((result) => console.log(result.markdown));
|
|
86
|
+
```
|
|
87
|
+
|
|
88
|
+
### 버퍼에서 변환
|
|
89
|
+
|
|
90
|
+
```typescript
|
|
91
|
+
import { convert } from 'file2md';
|
|
92
|
+
import { readFile } from 'fs/promises';
|
|
93
|
+
|
|
94
|
+
const buffer = await readFile('./document.xlsx');
|
|
95
|
+
const result = await convert(buffer, {
|
|
96
|
+
imageDir: 'spreadsheet-images'
|
|
97
|
+
});
|
|
98
|
+
```
|
|
99
|
+
|
|
100
|
+
## 📋 API 참조
|
|
101
|
+
|
|
102
|
+
### `convert(input, options?)`
|
|
103
|
+
|
|
104
|
+
**매개변수:**
|
|
105
|
+
- `input: string | Buffer` - 파일 경로 또는 문서 데이터가 포함된 버퍼
|
|
106
|
+
- `options?: ConvertOptions` - 변환 옵션
|
|
107
|
+
|
|
108
|
+
**반환값:** `Promise<ConversionResult>`
|
|
109
|
+
|
|
110
|
+
### 옵션
|
|
111
|
+
|
|
112
|
+
```typescript
|
|
113
|
+
interface ConvertOptions {
|
|
114
|
+
imageDir?: string; // 추출된 이미지 디렉터리 (기본값: 'images')
|
|
115
|
+
outputDir?: string; // 슬라이드 스크린샷 출력 디렉터리 (PPTX용, imageDir로 폴백)
|
|
116
|
+
preserveLayout?: boolean; // 문서 레이아웃 유지 (기본값: true)
|
|
117
|
+
extractCharts?: boolean; // 차트를 표로 변환 (기본값: true)
|
|
118
|
+
extractImages?: boolean; // 임베디드 이미지 추출 (기본값: true)
|
|
119
|
+
maxPages?: number; // PDF 최대 페이지 수 (기본값: 무제한)
|
|
120
|
+
useVisualParser?: boolean; // PPTX용 향상된 시각적 파싱 (기본값: true)
|
|
121
|
+
}
|
|
122
|
+
```
|
|
123
|
+
|
|
124
|
+
### 결과
|
|
125
|
+
|
|
126
|
+
```typescript
|
|
127
|
+
interface ConversionResult {
|
|
128
|
+
markdown: string; // 생성된 마크다운 콘텐츠
|
|
129
|
+
images: ImageData[]; // 추출된 이미지 정보
|
|
130
|
+
charts: ChartData[]; // 추출된 차트 데이터
|
|
131
|
+
metadata: DocumentMetadata; // 처리 정보가 포함된 문서 메타데이터
|
|
132
|
+
}
|
|
133
|
+
```
|
|
134
|
+
|
|
135
|
+
## 🎯 형식별 세부 기능
|
|
136
|
+
|
|
137
|
+
### 📄 PDF
|
|
138
|
+
- ✅ **텍스트 추출** 및 레이아웃 향상
|
|
139
|
+
- ✅ **표 감지** 및 서식 지정
|
|
140
|
+
- ✅ **목록 인식** (글머리 기호, 번호)
|
|
141
|
+
- ✅ **제목 감지** (모든 대문자, 콜론)
|
|
142
|
+
- ✅ **실제 이미지 추출** pdf2pic 사용 - PDF 페이지를 PNG 이미지로 변환
|
|
143
|
+
- ✅ **임베디드 이미지 감지** 및 추출
|
|
144
|
+
|
|
145
|
+
### 📝 DOCX
|
|
146
|
+
- ✅ **제목 계층 구조** (H1-H6)
|
|
147
|
+
- ✅ **텍스트 서식** (굵게, 기울임꼴)
|
|
148
|
+
- ✅ **복잡한 표** 병합된 셀 포함
|
|
149
|
+
- ✅ **중첩된 목록** 적절한 들여쓰기 포함
|
|
150
|
+
- ✅ **임베디드 이미지** 및 차트
|
|
151
|
+
- ✅ **셀 스타일링** (정렬, 색상)
|
|
152
|
+
- ✅ **글꼴 크기 보존** 및 서식
|
|
153
|
+
|
|
154
|
+
### 📊 XLSX
|
|
155
|
+
- ✅ **여러 워크시트** 별도 섹션으로 구분
|
|
156
|
+
- ✅ **셀 서식** (굵게, 색상, 정렬)
|
|
157
|
+
- ✅ **데이터 타입 보존**
|
|
158
|
+
- ✅ **차트 추출** 데이터 표로 변환
|
|
159
|
+
- ✅ **조건부 서식** 메모
|
|
160
|
+
- ✅ **공유 문자열** 대용량 파일 처리
|
|
161
|
+
|
|
162
|
+
### 🎬 PPTX
|
|
163
|
+
- ✅ **슬라이드별** 구성
|
|
164
|
+
- ✅ **텍스트 위치 지정** 및 레이아웃
|
|
165
|
+
- ✅ **슬라이드별 이미지 배치**
|
|
166
|
+
- ✅ **슬라이드에서 표 추출**
|
|
167
|
+
- ✅ **다중 열 레이아웃**
|
|
168
|
+
- ✅ **향상된 레이아웃 분석을 통한 시각적 파싱**
|
|
169
|
+
- ✅ **문서 속성에서 제목 추출**
|
|
170
|
+
- ✅ **차트 및 이미지** 인라인 임베딩
|
|
171
|
+
|
|
172
|
+
### 🇰🇷 HWP (한국어)
|
|
173
|
+
- ✅ **바이너리 형식** hwp.js를 사용한 파싱
|
|
174
|
+
- ✅ **한국어 텍스트 추출** 적절한 인코딩 포함
|
|
175
|
+
- ✅ **임베디드 콘텐츠에서 이미지 추출**
|
|
176
|
+
- ✅ **한국어 문서용 레이아웃 보존**
|
|
177
|
+
- ✅ **저작권 메시지 필터링** 깔끔한 출력
|
|
178
|
+
|
|
179
|
+
### 🇰🇷 HWPX (한국어 XML)
|
|
180
|
+
- ✅ **XML 기반 형식** JSZip을 사용한 파싱
|
|
181
|
+
- ✅ **대용량 문서용 다중 섹션 지원**
|
|
182
|
+
- ✅ **이미지 참조용 관계 매핑**
|
|
183
|
+
- ✅ **OWPML 구조** 파싱
|
|
184
|
+
- ✅ **향상된 한국어 텍스트** 처리
|
|
185
|
+
- ✅ **ZIP 아카이브에서 BinData 이미지 추출**
|
|
186
|
+
|
|
187
|
+
## 🖼️ 이미지 처리
|
|
188
|
+
|
|
189
|
+
이미지는 자동으로 추출되어 지정된 디렉터리에 저장됩니다:
|
|
190
|
+
|
|
191
|
+
```typescript
|
|
192
|
+
const result = await convert('./presentation.pptx', {
|
|
193
|
+
imageDir: 'my-images'
|
|
194
|
+
});
|
|
195
|
+
|
|
196
|
+
// 결과 구조:
|
|
197
|
+
// my-images/
|
|
198
|
+
// ├── image_1.png
|
|
199
|
+
// ├── image_2.jpg
|
|
200
|
+
// └── chart_1.png
|
|
201
|
+
|
|
202
|
+
// 마크다운에 포함될 내용:
|
|
203
|
+
// 
|
|
204
|
+
```
|
|
205
|
+
|
|
206
|
+
## 📊 차트 변환
|
|
207
|
+
|
|
208
|
+
차트는 마크다운 표로 변환됩니다:
|
|
209
|
+
|
|
210
|
+
```markdown
|
|
211
|
+
#### Chart 1: 매출 데이터
|
|
212
|
+
|
|
213
|
+
| 카테고리 | 1분기 | 2분기 | 3분기 | 4분기 |
|
|
214
|
+
| --- | --- | --- | --- | --- |
|
|
215
|
+
| 매출 | 100 | 150 | 200 | 250 |
|
|
216
|
+
| 수익 | 20 | 30 | 45 | 60 |
|
|
217
|
+
```
|
|
218
|
+
|
|
219
|
+
## 🛡️ 오류 처리
|
|
220
|
+
|
|
221
|
+
```typescript
|
|
222
|
+
import {
|
|
223
|
+
convert,
|
|
224
|
+
UnsupportedFormatError,
|
|
225
|
+
FileNotFoundError,
|
|
226
|
+
ParseError
|
|
227
|
+
} from 'file2md';
|
|
228
|
+
|
|
229
|
+
try {
|
|
230
|
+
const result = await convert('./document.pdf');
|
|
231
|
+
} catch (error) {
|
|
232
|
+
if (error instanceof UnsupportedFormatError) {
|
|
233
|
+
console.error('지원하지 않는 파일 형식입니다');
|
|
234
|
+
} else if (error instanceof FileNotFoundError) {
|
|
235
|
+
console.error('파일을 찾을 수 없습니다');
|
|
236
|
+
} else if (error instanceof ParseError) {
|
|
237
|
+
console.error('문서 파싱에 실패했습니다:', error.message);
|
|
238
|
+
}
|
|
239
|
+
}
|
|
240
|
+
```
|
|
241
|
+
|
|
242
|
+
## 🧪 고급 사용법
|
|
243
|
+
|
|
244
|
+
### 일괄 처리
|
|
245
|
+
|
|
246
|
+
```typescript
|
|
247
|
+
import { convert } from 'file2md';
|
|
248
|
+
import { readdir } from 'fs/promises';
|
|
249
|
+
|
|
250
|
+
async function convertFolder(folderPath: string) {
|
|
251
|
+
const files = await readdir(folderPath);
|
|
252
|
+
const results = [];
|
|
253
|
+
|
|
254
|
+
for (const file of files) {
|
|
255
|
+
if (file.match(/\.(pdf|docx|xlsx|pptx|hwp|hwpx)$/i)) {
|
|
256
|
+
try {
|
|
257
|
+
const result = await convert(`${folderPath}/${file}`, {
|
|
258
|
+
imageDir: 'batch-images',
|
|
259
|
+
extractImages: true
|
|
260
|
+
});
|
|
261
|
+
results.push({ file, success: true, result });
|
|
262
|
+
} catch (error) {
|
|
263
|
+
results.push({ file, success: false, error });
|
|
264
|
+
}
|
|
265
|
+
}
|
|
266
|
+
}
|
|
267
|
+
|
|
268
|
+
return results;
|
|
269
|
+
}
|
|
270
|
+
```
|
|
271
|
+
|
|
272
|
+
### PDF 이미지 추출 옵션
|
|
273
|
+
|
|
274
|
+
```typescript
|
|
275
|
+
import { convert } from 'file2md';
|
|
276
|
+
|
|
277
|
+
// 이미지 중심 PDF용 (스캔된 문서)
|
|
278
|
+
const result = await convert('./scanned-document.pdf', {
|
|
279
|
+
imageDir: 'pdf-images',
|
|
280
|
+
maxPages: 10, // 대용량 PDF용 페이지 제한
|
|
281
|
+
extractImages: true // PDF-이미지 변환 활성화
|
|
282
|
+
});
|
|
283
|
+
|
|
284
|
+
console.log(`PDF에서 ${result.images.length}개의 페이지 이미지를 추출했습니다`);
|
|
285
|
+
```
|
|
286
|
+
|
|
287
|
+
## 📊 지원 형식
|
|
288
|
+
|
|
289
|
+
| 형식 | 확장자 | 레이아웃 | 이미지 | 차트 | 표 | 목록 |
|
|
290
|
+
|------|-------|---------|-------|------|----|----|
|
|
291
|
+
| PDF | `.pdf` | ✅ | ✅ | ❌ | ✅ | ✅ |
|
|
292
|
+
| Word | `.docx` | ✅ | ✅ | ✅ | ✅ | ✅ |
|
|
293
|
+
| Excel | `.xlsx` | ✅ | ❌ | ✅ | ✅ | ❌ |
|
|
294
|
+
| PowerPoint | `.pptx` | ✅ | ✅ | ✅ | ✅ | ❌ |
|
|
295
|
+
| HWP | `.hwp` | ✅ | ✅ | ❌ | ❌ | ✅ |
|
|
296
|
+
| HWPX | `.hwpx` | ✅ | ✅ | ❌ | ❌ | ✅ |
|
|
297
|
+
|
|
298
|
+
> **PDF 이미지**: pdf2pic 라이브러리를 사용하여 PDF 페이지를 실제 PNG 이미지로 변환
|
|
299
|
+
|
|
300
|
+
## 🌏 한국어 문서 지원
|
|
301
|
+
|
|
302
|
+
file2md는 한국어 문서 형식에 대한 포괄적인 지원을 제공합니다:
|
|
303
|
+
|
|
304
|
+
### HWP (한글)
|
|
305
|
+
- 한글 워드프로세서에서 사용하는 **바이너리 형식**
|
|
306
|
+
- 한국 조직에서 여전히 널리 사용되는 **레거시 형식**
|
|
307
|
+
- 한국어 문자 인코딩을 통한 **완전한 텍스트 추출**
|
|
308
|
+
- **이미지 및 차트** 추출 지원
|
|
309
|
+
|
|
310
|
+
### HWPX (한글 XML)
|
|
311
|
+
- HWP의 후속작인 **현대적인 XML 기반** 형식
|
|
312
|
+
- XML 콘텐츠 파일을 포함한 **ZIP 아카이브 구조**
|
|
313
|
+
- 관계 매핑을 통한 **향상된 파싱**
|
|
314
|
+
- **다중 섹션** 및 복잡한 문서 지원
|
|
315
|
+
|
|
316
|
+
### 사용 예제
|
|
317
|
+
|
|
318
|
+
```typescript
|
|
319
|
+
// 한국어 문서 변환
|
|
320
|
+
const koreanDocs = [
|
|
321
|
+
'report.hwp', // 레거시 바이너리 형식
|
|
322
|
+
'document.hwpx', // 현대적 XML 형식
|
|
323
|
+
'presentation.pptx'
|
|
324
|
+
];
|
|
325
|
+
|
|
326
|
+
for (const doc of koreanDocs) {
|
|
327
|
+
const result = await convert(doc, {
|
|
328
|
+
imageDir: 'korean-docs-images',
|
|
329
|
+
preserveLayout: true
|
|
330
|
+
});
|
|
331
|
+
|
|
332
|
+
console.log(`📄 ${doc}: ${result.markdown.length} 문자`);
|
|
333
|
+
console.log(`🖼️ 이미지: ${result.images.length}`);
|
|
334
|
+
console.log(`⏱️ ${result.metadata.processingTime}ms에 처리 완료`);
|
|
335
|
+
}
|
|
336
|
+
```
|
|
337
|
+
|
|
338
|
+
## 🔧 성능 및 설정
|
|
339
|
+
|
|
340
|
+
```typescript
|
|
341
|
+
import { convert } from 'file2md';
|
|
342
|
+
|
|
343
|
+
// 대용량 문서 최적화
|
|
344
|
+
const result = await convert('./large-document.pdf', {
|
|
345
|
+
maxPages: 50, // PDF 처리 페이지 제한
|
|
346
|
+
extractImages: true, // PDF 이미지 추출 활성화
|
|
347
|
+
preserveLayout: true // 레이아웃 분석 유지
|
|
348
|
+
});
|
|
349
|
+
|
|
350
|
+
// 향상된 PPTX 처리
|
|
351
|
+
const pptxResult = await convert('./presentation.pptx', {
|
|
352
|
+
useVisualParser: true, // 시각적 레이아웃 분석 활성화
|
|
353
|
+
outputDir: 'slides', // 슬라이드용 별도 디렉터리
|
|
354
|
+
extractCharts: true, // 차트 데이터 추출
|
|
355
|
+
extractImages: true // 임베디드 이미지 추출
|
|
356
|
+
});
|
|
357
|
+
|
|
358
|
+
// 메타데이터에서 성능 지표 확인 가능
|
|
359
|
+
console.log('성능 지표:');
|
|
360
|
+
console.log(`- 처리 시간: ${result.metadata.processingTime}ms`);
|
|
361
|
+
console.log(`- 처리된 페이지: ${result.metadata.pageCount}`);
|
|
362
|
+
console.log(`- 추출된 이미지: ${result.metadata.imageCount}`);
|
|
363
|
+
console.log(`- 파일 타입: ${result.metadata.fileType}`);
|
|
364
|
+
```
|
|
365
|
+
|
|
366
|
+
## 🤝 기여하기
|
|
367
|
+
|
|
368
|
+
기여를 환영합니다! 언제든 풀 리퀘스트를 제출해 주세요.
|
|
369
|
+
|
|
370
|
+
1. 저장소를 포크합니다
|
|
371
|
+
2. 기능 브랜치를 생성합니다 (`git checkout -b feature/amazing-feature`)
|
|
372
|
+
3. 변경 사항을 커밋합니다 (`git commit -m 'Add amazing feature'`)
|
|
373
|
+
4. 브랜치에 푸시합니다 (`git push origin feature/amazing-feature`)
|
|
374
|
+
5. 풀 리퀘스트를 엽니다
|
|
375
|
+
|
|
376
|
+
### 개발 환경 설정
|
|
377
|
+
|
|
378
|
+
```bash
|
|
379
|
+
# 저장소 복제
|
|
380
|
+
git clone https://github.com/ricky-clevi/file2md.git
|
|
381
|
+
cd file2md
|
|
382
|
+
|
|
383
|
+
# 의존성 설치
|
|
384
|
+
npm install
|
|
385
|
+
|
|
386
|
+
# 테스트 실행
|
|
387
|
+
npm test
|
|
388
|
+
|
|
389
|
+
# 프로젝트 빌드
|
|
390
|
+
npm run build
|
|
391
|
+
|
|
392
|
+
# 린팅 실행
|
|
393
|
+
npm run lint
|
|
394
|
+
```
|
|
395
|
+
|
|
396
|
+
## 📄 라이센스
|
|
397
|
+
|
|
398
|
+
이 프로젝트는 MIT 라이센스에 따라 라이센스가 부여됩니다. 자세한 내용은 [LICENSE](LICENSE) 파일을 참조하세요.
|
|
399
|
+
|
|
400
|
+
## 🔗 링크
|
|
401
|
+
|
|
402
|
+
- [npm 패키지](https://www.npmjs.com/package/file2md)
|
|
403
|
+
- [GitHub 저장소](https://github.com/ricky-clevi/file2md)
|
|
404
|
+
- [이슈 및 버그 신고](https://github.com/ricky-clevi/file2md/issues)
|
|
405
|
+
|
|
406
|
+
---
|
|
407
|
+
|
|
408
|
+
**❤️와 TypeScript로 제작** • **🖼️ 실제 PDF 이미지 추출 기능 향상** • **🇰🇷 한국어 문서 지원**
|
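The `ConvertOptions` comments in the README above state a default for each field, with `outputDir` falling back to `imageDir`. A minimal sketch of how those documented defaults could be resolved is shown here. `ResolvedOptions` and `resolveOptions` are hypothetical names used for illustration only; the diff does not show how file2md merges options internally.

```typescript
// Mirrors the ConvertOptions shape documented in the README.
interface ConvertOptions {
  imageDir?: string;
  outputDir?: string;
  preserveLayout?: boolean;
  extractCharts?: boolean;
  extractImages?: boolean;
  maxPages?: number;
  useVisualParser?: boolean;
}

// Hypothetical: the same fields with every default filled in.
interface ResolvedOptions {
  imageDir: string;
  outputDir: string;
  preserveLayout: boolean;
  extractCharts: boolean;
  extractImages: boolean;
  maxPages: number;
  useVisualParser: boolean;
}

// Hypothetical helper, not part of file2md's public API.
function resolveOptions(options: ConvertOptions = {}): ResolvedOptions {
  const imageDir = options.imageDir ?? 'images';
  return {
    imageDir,
    // Documented behaviour: outputDir falls back to imageDir.
    outputDir: options.outputDir ?? imageDir,
    preserveLayout: options.preserveLayout ?? true,
    extractCharts: options.extractCharts ?? true,
    extractImages: options.extractImages ?? true,
    // "Unlimited" is modelled here as positive infinity (an assumption).
    maxPages: options.maxPages ?? Number.POSITIVE_INFINITY,
    useVisualParser: options.useVisualParser ?? true,
  };
}

const defaults = resolveOptions();
const customImages = resolveOptions({ imageDir: 'my-images', extractImages: false });
const customSlides = resolveOptions({ outputDir: 'slides' });
```

Using `??` rather than `||` keeps an explicit `false` (for example `extractImages: false`) from being overwritten by the default.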
package/README.md
CHANGED
|
@@ -4,13 +4,15 @@
|
|
|
4
4
|
[](https://www.typescriptlang.org/)
|
|
5
5
|
[](https://opensource.org/licenses/MIT)
|
|
6
6
|
|
|
7
|
-
A modern TypeScript library for converting various document types (PDF, DOCX, XLSX, PPTX, HWP, HWPX) into Markdown with **advanced layout preservation**, **image extraction**, **chart conversion**, and **Korean language support**.
|
|
7
|
+
A modern TypeScript library for converting various document types (PDF, DOCX, XLSX, PPTX, HWP, HWPX) into Markdown with **advanced layout preservation**, **real PDF image extraction**, **chart conversion**, and **Korean language support**.
|
|
8
|
+
|
|
9
|
+
**English** | [한국어](README.ko.md)
|
|
8
10
|
|
|
9
11
|
## ✨ Features
|
|
10
12
|
|
|
11
13
|
- 🔄 **Multiple Format Support**: PDF, DOCX, XLSX, PPTX, HWP, HWPX
|
|
12
14
|
- 🎨 **Layout Preservation**: Maintains document structure, tables, and formatting
|
|
13
|
-
- 🖼️ **Image Extraction**:
|
|
15
|
+
- 🖼️ **Real PDF Image Extraction**: Converts PDF pages to actual PNG images using pdf2pic
|
|
14
16
|
- 📊 **Chart Conversion**: Converts charts to Markdown tables
|
|
15
17
|
- 📝 **List & Table Support**: Proper nested lists and complex tables
|
|
16
18
|
- 🌏 **Korean Language Support**: Full support for HWP/HWPX Korean document formats
|
|
@@ -137,8 +139,8 @@ interface ConversionResult {
|
|
|
137
139
|
- ✅ **Table detection** and formatting
|
|
138
140
|
- ✅ **List recognition** (bullets, numbers)
|
|
139
141
|
- ✅ **Heading detection** (ALL CAPS, colons)
|
|
140
|
-
- ✅ **
|
|
141
|
-
- ✅ **Embedded image
|
|
142
|
+
- ✅ **Real image extraction** using pdf2pic - converts PDF pages to PNG images
|
|
143
|
+
- ✅ **Embedded image detection** and extraction
|
|
142
144
|
|
|
143
145
|
### 📝 DOCX
|
|
144
146
|
- ✅ **Heading hierarchy** (H1-H6)
|
|
@@ -239,23 +241,6 @@ try {
|
|
|
239
241
|
|
|
240
242
|
## 🧪 Advanced Usage
|
|
241
243
|
|
|
242
|
-
### Custom Error Handling
|
|
243
|
-
|
|
244
|
-
```typescript
|
|
245
|
-
import { convert, ConversionError } from 'file2md';
|
|
246
|
-
|
|
247
|
-
try {
|
|
248
|
-
const result = await convert('./complex-document.docx');
|
|
249
|
-
} catch (error) {
|
|
250
|
-
if (error instanceof ConversionError) {
|
|
251
|
-
console.error(`Conversion failed [${error.code}]:`, error.message);
|
|
252
|
-
if (error.originalError) {
|
|
253
|
-
console.error('Original error:', error.originalError);
|
|
254
|
-
}
|
|
255
|
-
}
|
|
256
|
-
}
|
|
257
|
-
```
|
|
258
|
-
|
|
259
244
|
### Batch Processing
|
|
260
245
|
|
|
261
246
|
```typescript
|
|
@@ -267,9 +252,12 @@ async function convertFolder(folderPath: string) {
|
|
|
267
252
|
const results = [];
|
|
268
253
|
|
|
269
254
|
for (const file of files) {
|
|
270
|
-
if (file.match(/\.(pdf|docx|xlsx|pptx)$/i)) {
|
|
255
|
+
if (file.match(/\.(pdf|docx|xlsx|pptx|hwp|hwpx)$/i)) {
|
|
271
256
|
try {
|
|
272
|
-
const result = await convert(`${folderPath}/${file}
|
|
257
|
+
const result = await convert(`${folderPath}/${file}`, {
|
|
258
|
+
imageDir: 'batch-images',
|
|
259
|
+
extractImages: true
|
|
260
|
+
});
|
|
273
261
|
results.push({ file, success: true, result });
|
|
274
262
|
} catch (error) {
|
|
275
263
|
results.push({ file, success: false, error });
|
|
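The batch example in this hunk widens its extension filter to cover `.hwp` and `.hwpx`. A minimal sketch of that filter in isolation follows. The helper name `selectConvertible` is an assumption for illustration and is not part of file2md; only the regular expression is taken from the README.

```typescript
// Same pattern as the updated README: case-insensitive, anchored at the end.
const SUPPORTED_EXTENSIONS = /\.(pdf|docx|xlsx|pptx|hwp|hwpx)$/i;

// Hypothetical helper: keeps only the file names the batch loop would convert.
function selectConvertible(files: readonly string[]): string[] {
  return files.filter((file) => SUPPORTED_EXTENSIONS.test(file));
}

const picked = selectConvertible([
  'report.HWP',      // kept: the match is case-insensitive
  'notes.txt',       // dropped: unsupported extension
  'form.hwpx',       // kept
  'deck.pptx',       // kept
  'archive.hwp.bak', // dropped: the pattern is anchored to the end of the name
]);
```

Because the pattern carries no `g` flag, calling `test` repeatedly on the same regular expression is stateless and safe inside `filter`.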
@@ -281,64 +269,34 @@ async function convertFolder(folderPath: string) {
|
|
|
281
269
|
}
|
|
282
270
|
```
|
|
283
271
|
|
|
284
|
-
|
|
272
|
+
### PDF Image Extraction Options
|
|
285
273
|
|
|
286
|
-
|
|
287
|
-
|
|
288
|
-
```bash
|
|
289
|
-
git clone https://github.com/yourusername/file2md.git
|
|
290
|
-
cd file2md
|
|
291
|
-
npm install
|
|
292
|
-
npm run build
|
|
293
|
-
```
|
|
294
|
-
|
|
295
|
-
### Testing
|
|
296
|
-
|
|
297
|
-
```bash
|
|
298
|
-
npm test # Run tests
|
|
299
|
-
npm run test:watch # Watch mode
|
|
300
|
-
npm run test:coverage # Coverage report
|
|
301
|
-
```
|
|
274
|
+
```typescript
|
|
275
|
+
import { convert } from 'file2md';
|
|
302
276
|
|
|
303
|
-
|
|
277
|
+
// For image-heavy PDFs (scanned documents)
|
|
278
|
+
const result = await convert('./scanned-document.pdf', {
|
|
279
|
+
imageDir: 'pdf-images',
|
|
280
|
+
maxPages: 10, // Limit pages for large PDFs
|
|
281
|
+
extractImages: true // Enable PDF-to-image conversion
|
|
282
|
+
});
|
|
304
283
|
|
|
305
|
-
|
|
306
|
-
npm run lint # Check code style
|
|
307
|
-
npm run lint:fix # Fix issues
|
|
284
|
+
console.log(`Extracted ${result.images.length} page images from PDF`);
|
|
308
285
|
```
|
|
309
286
|
|
|
310
|
-
## 🤝 Contributing
|
|
311
|
-
|
|
312
|
-
Contributions are welcome! Please feel free to submit a Pull Request.
|
|
313
|
-
|
|
314
|
-
1. Fork the repository
|
|
315
|
-
2. Create your feature branch (`git checkout -b feature/amazing-feature`)
|
|
316
|
-
3. Commit your changes (`git commit -m 'Add amazing feature'`)
|
|
317
|
-
4. Push to the branch (`git push origin feature/amazing-feature`)
|
|
318
|
-
5. Open a Pull Request
|
|
319
|
-
|
|
320
|
-
## 📄 License
|
|
321
|
-
|
|
322
|
-
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
|
|
323
|
-
|
|
324
|
-
## 🔗 Links
|
|
325
|
-
|
|
326
|
-
- [npm package](https://www.npmjs.com/package/file2md)
|
|
327
|
-
- [GitHub repository](https://github.com/yourusername/file2md)
|
|
328
|
-
- [Issues & Bug Reports](https://github.com/yourusername/file2md/issues)
|
|
329
287
|
|
|
330
288
|
## 📊 Supported Formats
|
|
331
289
|
|
|
332
290
|
| Format | Extension | Layout | Images | Charts | Tables | Lists |
|
|
333
291
|
|--------|-----------|---------|---------|---------|---------|--------|
|
|
334
|
-
| PDF | `.pdf` | ✅ |
|
|
292
|
+
| PDF | `.pdf` | ✅ | ✅ | ❌ | ✅ | ✅ |
|
|
335
293
|
| Word | `.docx` | ✅ | ✅ | ✅ | ✅ | ✅ |
|
|
336
294
|
| Excel | `.xlsx` | ✅ | ❌ | ✅ | ✅ | ❌ |
|
|
337
295
|
| PowerPoint | `.pptx` | ✅ | ✅ | ✅ | ✅ | ❌ |
|
|
338
296
|
| HWP | `.hwp` | ✅ | ✅ | ❌ | ❌ | ✅ |
|
|
339
297
|
| HWPX | `.hwpx` | ✅ | ✅ | ❌ | ❌ | ✅ |
|
|
340
298
|
|
|
341
|
-
|
|
299
|
+
> **PDF Images**: Converts PDF pages to actual PNG images using the pdf2pic library
|
|
342
300
|
|
|
343
301
|
## 🌏 Korean Document Support
|
|
344
302
|
|
|
@@ -378,9 +336,7 @@ for (const doc of koreanDocs) {
|
|
|
378
336
|
}
|
|
379
337
|
```
|
|
380
338
|
|
|
381
|
-
## 🔧
|
|
382
|
-
|
|
383
|
-
### Performance Optimization
|
|
339
|
+
## 🔧 Performance & Configuration
|
|
384
340
|
|
|
385
341
|
```typescript
|
|
386
342
|
import { convert } from 'file2md';
|
|
@@ -388,7 +344,7 @@ import { convert } from 'file2md';
|
|
|
388
344
|
// Optimize for large documents
|
|
389
345
|
const result = await convert('./large-document.pdf', {
|
|
390
346
|
maxPages: 50, // Limit PDF processing
|
|
391
|
-
extractImages:
|
|
347
|
+
extractImages: true, // Enable PDF image extraction
|
|
392
348
|
preserveLayout: true // Keep layout analysis
|
|
393
349
|
});
|
|
394
350
|
|
|
@@ -399,41 +355,13 @@ const pptxResult = await convert('./presentation.pptx', {
|
|
|
399
355
|
extractCharts: true, // Extract chart data
|
|
400
356
|
extractImages: true // Extract embedded images
|
|
401
357
|
});
|
|
402
|
-
```
|
|
403
|
-
|
|
404
|
-
### Error Handling for Korean Documents
|
|
405
|
-
|
|
406
|
-
```typescript
|
|
407
|
-
import { convert, ParseError } from 'file2md';
|
|
408
|
-
|
|
409
|
-
try {
|
|
410
|
-
const result = await convert('./korean-document.hwp');
|
|
411
|
-
console.log('Korean document converted successfully');
|
|
412
|
-
} catch (error) {
|
|
413
|
-
if (error instanceof ParseError) {
|
|
414
|
-
console.error(`Failed to parse ${error.format} document:`, error.message);
|
|
415
|
-
// Handle Korean-specific parsing errors
|
|
416
|
-
if (error.format === 'HWP' || error.format === 'HWPX') {
|
|
417
|
-
console.log('Try converting to HWPX format for better compatibility');
|
|
418
|
-
}
|
|
419
|
-
}
|
|
420
|
-
}
|
|
421
|
-
```
|
|
422
|
-
|
|
423
|
-
## 📈 Performance Metrics
|
|
424
|
-
|
|
425
|
-
The library provides detailed performance metrics in the metadata:
|
|
426
|
-
|
|
427
|
-
```typescript
|
|
428
|
-
const result = await convert('./document.docx');
|
|
429
358
|
|
|
359
|
+
// Performance metrics are available in metadata
|
|
430
360
|
console.log('Performance Metrics:');
|
|
431
361
|
console.log(`- Processing time: ${result.metadata.processingTime}ms`);
|
|
432
362
|
console.log(`- Pages processed: ${result.metadata.pageCount}`);
|
|
433
363
|
console.log(`- Images extracted: ${result.metadata.imageCount}`);
|
|
434
|
-
console.log(`- Charts found: ${result.metadata.chartCount}`);
|
|
435
364
|
console.log(`- File type: ${result.metadata.fileType}`);
|
|
436
|
-
console.log(`- MIME type: ${result.metadata.mimeType}`);
|
|
437
365
|
```
|
|
438
366
|
|
|
439
367
|
## 🤝 Contributing
|
|
@@ -466,18 +394,6 @@ npm run build
|
|
|
466
394
|
npm run lint
|
|
467
395
|
```
|
|
468
396
|
|
|
469
|
-
### Testing Korean Documents
|
|
470
|
-
|
|
471
|
-
When testing Korean document support:
|
|
472
|
-
|
|
473
|
-
```bash
|
|
474
|
-
# Run specific tests for Korean formats
|
|
475
|
-
npm test -- --testNamePattern="HWP"
|
|
476
|
-
|
|
477
|
-
# Run with coverage for Korean parsers
|
|
478
|
-
npm run test:coverage -- --collectCoverageFrom="src/parsers/hwp-*.ts"
|
|
479
|
-
```
|
|
480
|
-
|
|
481
397
|
## 📄 License
|
|
482
398
|
|
|
483
399
|
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
|
|
@@ -487,9 +403,7 @@ This project is licensed under the MIT License - see the [LICENSE](LICENSE) file
|
|
|
487
403
|
- [npm package](https://www.npmjs.com/package/file2md)
|
|
488
404
|
- [GitHub repository](https://github.com/ricky-clevi/file2md)
|
|
489
405
|
- [Issues & Bug Reports](https://github.com/ricky-clevi/file2md/issues)
|
|
490
|
-
- [Korean Document Format Info](https://www.hancom.com/)
|
|
491
406
|
|
|
492
407
|
---
|
|
493
408
|
|
|
494
|
-
**Made with ❤️ and TypeScript**
|
|
495
|
-
**🇰🇷 Enhanced with Korean document support**
|
|
409
|
+
**Made with ❤️ and TypeScript** • **🖼️ Enhanced with real PDF image extraction** • **🇰🇷 Korean document support**
|
|
package/dist/utils/pdf-extractor.d.ts.map
CHANGED
|
@@ -1 +1 @@
|
|
|
1
|
-
{"version":3,"file":"pdf-extractor.d.ts","sourceRoot":"","sources":["../../src/utils/pdf-extractor.ts"],"names":[],"mappings":"AACA,OAAO,EAAE,MAAM,EAAE,MAAM,aAAa,CAAC;AAErC,OAAO,KAAK,EAAE,QAAQ,EAAE,MAAM,wBAAwB,CAAC;AAEvD,OAAO,KAAK,EAAE,cAAc,EAAE,MAAM,sBAAsB,CAAC;AAW3D,MAAM,WAAW,eAAe;IAC9B,QAAQ,CAAC,QAAQ,CAAC,EAAE,MAAM,CAAC;IAC3B,QAAQ,CAAC,cAAc,CAAC,EAAE,OAAO,CAAC;CACnC;AAED,MAAM,WAAW,cAAc;IAC7B,QAAQ,CAAC,QAAQ,EAAE,MAAM,CAAC;IAC1B,QAAQ,CAAC,MAAM,EAAE,SAAS,OAAO,wBAAwB,EAAE,SAAS,EAAE,CAAC;IACvE,QAAQ,CAAC,SAAS,EAAE,MAAM,CAAC;IAC3B,QAAQ,CAAC,QAAQ,EAAE,MAAM,CAAC,MAAM,EAAE,OAAO,CAAC,CAAC;CAC5C;AAED,qBAAa,YAAY;IACvB,OAAO,CAAC,QAAQ,CAAC,cAAc,CAAiB;IAChD,OAAO,CAAC,WAAW,CAAa;gBAEpB,cAAc,EAAE,cAAc;IAI1C;;OAEG;IACG,oBAAoB,CAAC,MAAM,EAAE,MAAM,GAAG,OAAO,CAAC,SAAS,QAAQ,EAAE,CAAC;
|
|
1
|
+
{"version":3,"file":"pdf-extractor.d.ts","sourceRoot":"","sources":["../../src/utils/pdf-extractor.ts"],"names":[],"mappings":"AACA,OAAO,EAAE,MAAM,EAAE,MAAM,aAAa,CAAC;AAErC,OAAO,KAAK,EAAE,QAAQ,EAAE,MAAM,wBAAwB,CAAC;AAEvD,OAAO,KAAK,EAAE,cAAc,EAAE,MAAM,sBAAsB,CAAC;AAW3D,MAAM,WAAW,eAAe;IAC9B,QAAQ,CAAC,QAAQ,CAAC,EAAE,MAAM,CAAC;IAC3B,QAAQ,CAAC,cAAc,CAAC,EAAE,OAAO,CAAC;CACnC;AAED,MAAM,WAAW,cAAc;IAC7B,QAAQ,CAAC,QAAQ,EAAE,MAAM,CAAC;IAC1B,QAAQ,CAAC,MAAM,EAAE,SAAS,OAAO,wBAAwB,EAAE,SAAS,EAAE,CAAC;IACvE,QAAQ,CAAC,SAAS,EAAE,MAAM,CAAC;IAC3B,QAAQ,CAAC,QAAQ,EAAE,MAAM,CAAC,MAAM,EAAE,OAAO,CAAC,CAAC;CAC5C;AAED,qBAAa,YAAY;IACvB,OAAO,CAAC,QAAQ,CAAC,cAAc,CAAiB;IAChD,OAAO,CAAC,WAAW,CAAa;gBAEpB,cAAc,EAAE,cAAc;IAI1C;;OAEG;IACG,oBAAoB,CAAC,MAAM,EAAE,MAAM,GAAG,OAAO,CAAC,SAAS,QAAQ,EAAE,CAAC;IAgDxE;;OAEG;YACW,kBAAkB;IA4EhC;;OAEG;YACW,kBAAkB;IAqFhC;;OAEG;IACG,qBAAqB,CAAC,IAAI,EAAE,MAAM,EAAE,QAAQ,CAAC,EAAE,OAAO,GAAG,OAAO,CAAC,MAAM,CAAC;IAiE9E,OAAO,CAAC,eAAe;IAoBvB,OAAO,CAAC,qBAAqB;IAO7B,OAAO,CAAC,gBAAgB;IAaxB,OAAO,CAAC,aAAa;IAgBrB,OAAO,CAAC,eAAe;IA+BvB,OAAO,CAAC,UAAU;IAYlB,OAAO,CAAC,cAAc;IAatB;;OAEG;IACG,gBAAgB,CAAC,UAAU,EAAE,SAAS,QAAQ,EAAE,GAAG,OAAO,CAAC,MAAM,CAAC;IAgBxE;;OAEG;IACH,KAAK,IAAI,IAAI;IAIb;;OAEG;IACH,IAAI,gBAAgB,IAAI,MAAM,CAE7B;CACF"}
|
|
package/dist/utils/pdf-extractor.js
CHANGED
|
@@ -30,6 +30,11 @@ export class PDFExtractor {
|
|
|
30
30
|
}
|
|
31
31
|
catch (pdf2picError) {
|
|
32
32
|
console.warn('⚠️ pdf2pic extraction failed:', pdf2picError instanceof Error ? pdf2picError.message : 'Unknown error');
|
|
33
|
+
// Check if the error suggests missing dependencies
|
|
34
|
+
const errorMessage = pdf2picError instanceof Error ? pdf2picError.message : '';
|
|
35
|
+
if (errorMessage.includes('GraphicsMagick') || errorMessage.includes('ImageMagick') || errorMessage.includes('command not found')) {
|
|
36
|
+
console.log('💡 Consider installing GraphicsMagick or ImageMagick for PDF-to-image conversion');
|
|
37
|
+
}
|
|
33
38
|
// Fall back to placeholder creation only if pdf2pic fails
|
|
34
39
|
return await this.createPlaceholders(pdfData.numpages);
|
|
35
40
|
}
|
|
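The hunk above adds a check that looks for renderer-related substrings in the pdf2pic error message before falling back to placeholders. The same decision can be sketched as a small pure function. The names `MISSING_RENDERER_MARKERS` and `suggestsMissingRenderer` are hypothetical; the shipped code inlines the check in the `catch` block.

```typescript
// Substrings the hunk looks for in the pdf2pic error message.
const MISSING_RENDERER_MARKERS: readonly string[] = [
  'GraphicsMagick',
  'ImageMagick',
  'command not found',
];

// Hypothetical helper: returns true when the error text hints that the
// external renderer is not installed.
function suggestsMissingRenderer(error: unknown): boolean {
  // As in the hunk, anything that is not an Error yields an empty message.
  const message = error instanceof Error ? error.message : '';
  return MISSING_RENDERER_MARKERS.some((marker) => message.indexOf(marker) !== -1);
}

const missing = suggestsMissingRenderer(new Error('spawn gm: command not found'));
const unrelated = suggestsMissingRenderer(new Error('Invalid PDF structure'));
const nonError = suggestsMissingRenderer('GraphicsMagick missing');
```

Substring matching on error text is a heuristic: it depends on the wording produced by the underlying tool and the shell, so an unmatched message does not prove the renderer is present.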
@@ -53,37 +58,49 @@ export class PDFExtractor {
|
|
|
53
58
|
try {
|
|
54
59
|
const { fromBuffer } = await import('pdf2pic');
|
|
55
60
|
console.log(`🔄 Converting PDF to images (max ${maxPages} pages)...`);
|
|
61
|
+
// Configure pdf2pic options to save directly to files
|
|
62
|
+
const convertOptions = {
|
|
63
|
+
format: 'png',
|
|
64
|
+
out_dir: this.imageExtractor.imageDirectory,
|
|
65
|
+
out_prefix: 'pdf_page'
|
|
66
|
+
};
|
|
67
|
+
// Convert PDF buffer to images
|
|
68
|
+
const convert = fromBuffer(buffer, convertOptions);
|
|
56
69
|
const extractedPages = [];
|
|
57
|
-
// Convert
|
|
70
|
+
// Convert pages one by one
|
|
58
71
|
for (let pageNumber = 1; pageNumber <= maxPages; pageNumber++) {
|
|
59
72
|
try {
|
|
60
|
-
|
|
61
|
-
|
|
62
|
-
|
|
63
|
-
|
|
64
|
-
|
|
65
|
-
|
|
66
|
-
|
|
67
|
-
|
|
68
|
-
|
|
69
|
-
|
|
70
|
-
|
|
71
|
-
|
|
72
|
-
|
|
73
|
-
|
|
74
|
-
|
|
75
|
-
|
|
76
|
-
|
|
77
|
-
|
|
78
|
-
|
|
79
|
-
|
|
80
|
-
|
|
81
|
-
|
|
82
|
-
|
|
83
|
-
|
|
84
|
-
|
|
85
|
-
|
|
86
|
-
|
|
73
|
+
const result = await convert(pageNumber);
|
|
74
|
+
if (result && 'path' in result && result.path) {
|
|
75
|
+
const filename = path.basename(result.path);
|
|
76
|
+
try {
|
|
77
|
+
// Read the generated image file
|
|
78
|
+
const fs = await import('fs/promises');
|
|
79
|
+
const imageBuffer = await fs.readFile(result.path);
|
|
80
|
+
// Save using image extractor to ensure proper naming and location
|
|
81
|
+
const savedPath = await this.imageExtractor.saveImage(imageBuffer, `pdf_page_${pageNumber}.png`);
|
|
82
|
+
if (savedPath) {
|
|
83
|
+
extractedPages.push({
|
|
84
|
+
pageNumber,
|
|
85
|
+
imagePath: path.basename(savedPath),
|
|
86
|
+
fullPath: savedPath,
|
|
87
|
+
dimensions: {
|
|
88
|
+
width: ('width' in result && typeof result.width === 'number') ? result.width : 800,
|
|
89
|
+
height: ('height' in result && typeof result.height === 'number') ? result.height : 600
|
|
90
|
+
}
|
|
91
|
+
});
|
|
92
|
+
console.log(`✅ Converted page ${pageNumber} to image`);
|
|
93
|
+
}
|
|
94
|
+
// Clean up the temporary file
|
|
95
|
+
try {
|
|
96
|
+
await fs.unlink(result.path);
|
|
97
|
+
}
|
|
98
|
+
catch (unlinkError) {
|
|
99
|
+
console.warn(`⚠️ Failed to clean up temp file ${result.path}`);
|
|
100
|
+
}
|
|
101
|
+
}
|
|
102
|
+
catch (fileError) {
|
|
103
|
+
console.warn(`⚠️ Failed to process image file for page ${pageNumber}:`, fileError instanceof Error ? fileError.message : 'Unknown error');
|
|
87
104
|
}
|
|
88
105
|
}
|
|
89
106
|
}
|
|
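In the hunk above, each converted page records `width` and `height` from the pdf2pic result when they are numbers and otherwise falls back to 800 by 600. A minimal sketch of that fallback as a standalone function follows. `PageDimensions`, `FALLBACK_DIMENSIONS`, and `resolveDimensions` are hypothetical names introduced for illustration.

```typescript
interface PageDimensions {
  width: number;
  height: number;
}

// The fallback size used in the hunk when the result carries no usable numbers.
const FALLBACK_DIMENSIONS: PageDimensions = { width: 800, height: 600 };

// Hypothetical helper: accepts any object that may or may not report a size.
function resolveDimensions(result: { width?: unknown; height?: unknown }): PageDimensions {
  return {
    width: typeof result.width === 'number' ? result.width : FALLBACK_DIMENSIONS.width,
    height: typeof result.height === 'number' ? result.height : FALLBACK_DIMENSIONS.height,
  };
}

const reported = resolveDimensions({ width: 1240, height: 1754 });
const absent = resolveDimensions({});
const partial = resolveDimensions({ width: '1240', height: 900 });
```

Each axis falls back independently, so a result with a string width and a numeric height keeps the real height and only substitutes the width.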
@@ -95,6 +112,7 @@ export class PDFExtractor {
|
|
|
95
112
|
if (extractedPages.length === 0) {
|
|
96
113
|
throw new Error('No pages could be converted to images');
|
|
97
114
|
}
|
|
115
|
+
console.log(`🎉 Successfully converted ${extractedPages.length} pages to images`);
|
|
98
116
|
return extractedPages;
|
|
99
117
|
}
|
|
100
118
|
catch (error) {
|
|
@@ -106,23 +124,74 @@ export class PDFExtractor {
|
|
|
106
124
|
* Create placeholder files as fallback when pdf2pic fails
|
|
107
125
|
*/
|
|
108
126
|
async createPlaceholders(pageCount) {
|
|
109
|
-
console.log('📝 Creating placeholders as fallback...');
|
|
127
|
+
console.log('📝 Creating image placeholders as fallback...');
|
|
110
128
|
const extractedPages = [];
|
|
111
129
|
const maxPages = Math.min(pageCount, 3);
|
|
112
130
|
for (let page = 1; page <= maxPages; page++) {
|
|
113
|
-
|
|
114
|
-
|
|
115
|
-
|
|
116
|
-
|
|
117
|
-
|
|
118
|
-
|
|
119
|
-
|
|
120
|
-
|
|
121
|
-
|
|
122
|
-
|
|
123
|
-
|
|
124
|
-
})
|
|
125
|
-
|
|
131
|
+
try {
|
|
132
|
+
// Create a simple placeholder image using Sharp
|
|
133
|
+
const sharp = await import('sharp');
|
|
134
|
+
// Create a 800x600 placeholder image with text
|
|
135
|
+
const placeholderImage = await sharp.default({
|
|
136
|
+
create: {
|
|
137
|
+
width: 800,
|
|
138
|
+
height: 600,
|
|
139
|
+
channels: 4,
|
|
140
|
+
background: { r: 240, g: 240, b: 240, alpha: 1 }
|
|
141
|
+
}
|
|
142
|
+
})
|
|
143
|
+
.png()
|
|
144
|
+
.composite([
|
|
145
|
+
{
|
|
146
|
+
input: Buffer.from(`<svg width="800" height="600" xmlns="http://www.w3.org/2000/svg">
|
|
147
|
+
<rect width="800" height="600" fill="#f0f0f0" stroke="#ccc" stroke-width="2"/>
|
|
148
|
+
<text x="400" y="250" text-anchor="middle" font-family="Arial" font-size="24" fill="#666">PDF Page ${page}</text>
|
|
149
|
+
<text x="400" y="290" text-anchor="middle" font-family="Arial" font-size="16" fill="#888">Image extraction failed</text>
|
|
150
|
+
<text x="400" y="320" text-anchor="middle" font-family="Arial" font-size="16" fill="#888">Page ${page} of ${pageCount}</text>
|
|
151
|
+
<text x="400" y="360" text-anchor="middle" font-family="Arial" font-size="14" fill="#aaa">Install GraphicsMagick for better PDF support</text>
|
|
152
|
+
</svg>`),
|
|
153
|
+
top: 0,
|
|
154
|
+
left: 0,
|
|
155
|
+
}
|
|
156
|
+
])
|
|
157
|
+
.toBuffer();
|
|
158
|
+
const filename = `pdf_page_${page}_placeholder.png`;
|
|
159
|
+
// Use the image extractor to save the placeholder
|
|
160
|
+
const savedPath = await this.imageExtractor.saveImage(placeholderImage, filename);
|
|
161
|
+
if (savedPath) {
|
|
162
|
+
extractedPages.push({
|
|
163
|
+
pageNumber: page,
|
|
164
|
+
imagePath: path.basename(savedPath),
|
|
165
|
+
fullPath: savedPath,
|
|
166
|
+
dimensions: {
|
|
167
|
+
width: 800,
|
|
168
|
+
height: 600
|
|
169
|
+
}
|
|
170
|
+
});
|
|
171
|
+
console.log(`✅ Created image placeholder for page ${page}`);
|
|
172
|
+
}
|
|
173
|
+
}
|
|
174
|
+
catch (sharpError) {
|
|
175
|
+
console.warn(`⚠️ Failed to create image placeholder for page ${page}:`, sharpError instanceof Error ? sharpError.message : 'Unknown error');
|
|
176
|
+
// Fallback to simple text-based approach without Sharp conversion
|
|
177
|
+
const filename = `pdf_page_${page}_info.txt`;
|
|
178
|
+
const placeholderContent = `PDF Page ${page} - Image extraction failed\n\nPage ${page} of ${pageCount}\n\nInstall GraphicsMagick for better PDF image support.`;
|
|
179
|
+
const placeholderBuffer = Buffer.from(placeholderContent, 'utf-8');
|
|
180
|
+
// Save directly without image conversion
|
|
181
|
+
const fs = await import('fs/promises');
|
|
182
|
+
const fullPath = path.join(this.imageExtractor.imageDirectory, filename);
|
|
183
|
+
try {
|
|
184
|
+
await fs.writeFile(fullPath, placeholderBuffer);
|
|
185
|
+
extractedPages.push({
|
|
186
|
+
pageNumber: page,
|
|
187
|
+
imagePath: filename,
|
|
188
|
+
fullPath: path.resolve(fullPath)
|
|
189
|
+
});
|
|
190
|
+
console.log(`✅ Created text placeholder for page ${page}`);
|
|
191
|
+
}
|
|
192
|
+
catch (writeError) {
|
|
193
|
+
console.warn(`⚠️ Failed to write placeholder for page ${page}:`, writeError instanceof Error ? writeError.message : 'Unknown error');
|
|
194
|
+
}
|
|
126
195
|
}
|
|
127
196
|
}
|
|
128
197
|
return extractedPages;
|
|
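The hunk above replaces the text-only placeholder with a two-stage fallback: for each of the first three pages it tries to render an 800x600 PNG placeholder with Sharp, and if anything in that step throws, it writes a plain-text `_info.txt` file instead. The control flow can be sketched as below. This is a minimal illustration, not the library's code: `createPlaceholders`, `renderPng`, `saveImage` and `writeText` are hypothetical names, and the rendering and saving steps are injected as callbacks so the sketch runs without Sharp or GraphicsMagick installed.

```javascript
// Sketch only: mirrors the try/catch structure shown in the diff.
// renderPng, saveImage and writeText are hypothetical injected helpers,
// standing in for Sharp, imageExtractor.saveImage and fs.writeFile.
async function createPlaceholders(pageCount, { renderPng, saveImage, writeText }) {
  const extractedPages = [];
  const maxPages = Math.min(pageCount, 3); // same three-page cap as the diff

  for (let page = 1; page <= maxPages; page++) {
    try {
      // Stage 1: image placeholder.
      const filename = `pdf_page_${page}_placeholder.png`;
      const buffer = await renderPng(page, pageCount);
      const savedPath = await saveImage(buffer, filename);
      if (savedPath) {
        extractedPages.push({
          pageNumber: page,
          imagePath: filename, // the diff derives this from path.basename(savedPath)
          fullPath: savedPath,
          dimensions: { width: 800, height: 600 },
        });
      }
    } catch (renderError) {
      // Stage 2: text placeholder, used when rendering or saving throws.
      const filename = `pdf_page_${page}_info.txt`;
      const content = `PDF Page ${page} - Image extraction failed\n\nPage ${page} of ${pageCount}`;
      try {
        const fullPath = await writeText(filename, content);
        extractedPages.push({ pageNumber: page, imagePath: filename, fullPath });
      } catch (writeError) {
        // Both stages failed: the page is skipped and the loop continues.
      }
    }
  }
  return extractedPages;
}
```

One consequence of this structure is that the function never rejects because of a single bad page: a page that fails both stages is simply absent from the returned array, so callers should not assume one entry per page.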
@@ -1 +1 @@
|
|
|
1
|
-
{"version":3,"file":"pdf-extractor.js","sourceRoot":"","sources":["../../src/utils/pdf-extractor.ts"],"names":[],"mappings":"AAAA,OAAO,IAAI,MAAM,WAAW,CAAC;AAC7B,OAAO,EAAE,MAAM,EAAE,MAAM,aAAa,CAAC;AA2BrC,MAAM,OAAO,YAAY;IACN,cAAc,CAAiB;IACxC,WAAW,GAAW,CAAC,CAAC;IAEhC,YAAY,cAA8B;QACxC,IAAI,CAAC,cAAc,GAAG,cAAc,CAAC;IACvC,CAAC;IAED;;OAEG;IACH,KAAK,CAAC,oBAAoB,CAAC,MAAc;QACvC,IAAI,CAAC;YACH,OAAO,CAAC,GAAG,CAAC,yDAAyD,CAAC,CAAC;YAEvE,yCAAyC;YACzC,MAAM,QAAQ,GAAG,MAAM,MAAM,CAAC,WAAW,CAAC,CAAC;YAC3C,MAAM,OAAO,GAAG,MAAM,QAAQ,CAAC,OAAO,CAAC,MAAM,CAAC,CAAC;YAE/C,OAAO,CAAC,GAAG,CAAC,4BAA4B,OAAO,CAAC,QAAQ,wBAAwB,OAAO,CAAC,IAAI,EAAE,MAAM,IAAI,CAAC,EAAE,CAAC,CAAC;YAE7G,oEAAoE;YACpE,MAAM,YAAY,GAAG,CAAC,OAAO,CAAC,IAAI,IAAI,OAAO,CAAC,IAAI,CAAC,IAAI,EAAE,CAAC,MAAM,GAAG,GAAG,CAAC;YAEvE,4FAA4F;YAC5F,IAAI,YAAY,IAAI,OAAO,CAAC,QAAQ,IAAI,CAAC,EAAE,CAAC;gBAC1C,OAAO,CAAC,GAAG,CAAC,uEAAuE,CAAC,CAAC;gBAErF,IAAI,CAAC;oBACH,MAAM,cAAc,GAAG,MAAM,IAAI,CAAC,kBAAkB,CAAC,MAAM,EAAE,IAAI,CAAC,GAAG,CAAC,OAAO,CAAC,QAAQ,EAAE,CAAC,CAAC,CAAC,CAAC;oBAC5F,IAAI,cAAc,CAAC,MAAM,GAAG,CAAC,EAAE,CAAC;wBAC9B,OAAO,CAAC,GAAG,CAAC,6BAA6B,cAAc,CAAC,MAAM,cAAc,CAAC,CAAC;wBAC9E,OAAO,cAAc,CAAC;oBACxB,CAAC;gBACH,CAAC;gBAAC,OAAO,YAAqB,EAAE,CAAC;oBAC/B,OAAO,CAAC,IAAI,CAAC,+BAA+B,EAAE,YAAY,YAAY,KAAK,CAAC,CAAC,CAAC,YAAY,CAAC,OAAO,CAAC,CAAC,CAAC,eAAe,CAAC,CAAC;oBACtH,0DAA0D;oBAC1D,OAAO,MAAM,IAAI,CAAC,kBAAkB,CAAC,OAAO,CAAC,QAAQ,CAAC,CAAC;gBACzD,CAAC;YACH,CAAC;iBAAM,CAAC;gBACN,OAAO,CAAC,GAAG,CAAC,8DAA8D,CAAC,CAAC;gBAC5E,OAAO,EAAE,CAAC;YACZ,CAAC;YAED,OAAO,EAAE,CAAC;QACZ,CAAC;QAAC,OAAO,KAAc,EAAE,CAAC;YACxB,OAAO,CAAC,IAAI,CAAC,yBAAyB,EAAE,KAAK,YAAY,KAAK,CAAC,CAAC,CAAC,KAAK,CAAC,OAAO,CAAC,CAAC,CAAC,eAAe,CAAC,CAAC;YAClG,4EAA4E;YAC5E,OAAO,EAAE,CAAC;QACZ,CAAC;IACH,CAAC;IAED;;OAEG;IACK,KAAK,CAAC,kBAAkB,CAAC,MAAc,EAAE,WAAmB,CAAC;QACnE,IAAI,CAAC;YACH,MAAM,EAAE,UAAU,EAAE,GAAG,MAAM,MAAM,CAAC,SAAS,CAAC,CAAC;YAE/C,OAAO,CAAC,GAAG,CAAC,oCAAoC,QAAQ,YAAY,CAAC,CAAC;YAEtE,MAAM,cAAc,GAAe,EAAE,CAAC;YAEtC,sDAAsD;YACtD,KAAK,IAAI,UAAU,GAAG,CAAC,EAAE,UAAU,IAAI,
QAAQ,EAAE,UAAU,EAAE,EAAE,CAAC;gBAC9D,IAAI,CAAC;oBACH,mDAAmD;oBACnD,MAAM,cAAc,GAAG;wBACrB,MAAM,EAAE,KAAc;wBACtB,OAAO,EAAE,IAAI,CAAC,cAAc,CAAC,cAAc;wBAC3C,UAAU,EAAE,YAAY,UAAU,EAAE;wBACpC,IAAI,EAAE,UAAU;qBACjB,CAAC;oBAEF,kCAAkC;oBAClC,MAAM,OAAO,GAAG,UAAU,CAAC,MAAM,EAAE,cAAc,CAAC,CAAC;oBACnD,MAAM,MAAM,GAAG,MAAM,OAAO,CAAC,UAAU,EAAE,IAAI,CAAC,CAAC,CAAC,4BAA4B;oBAE5E,IAAI,MAAM,IAAI,QAAQ,IAAI,MAAM,IAAI,MAAM,CAAC,MAAM,EAAE,CAAC;wBAClD,2BAA2B;wBAC3B,MAAM,WAAW,GAAG,MAAM,CAAC,IAAI,CAAC,MAAM,CAAC,MAAM,EAAE,QAAQ,CAAC,CAAC;wBACzD,MAAM,QAAQ,GAAG,YAAY,UAAU,MAAM,CAAC;wBAE9C,2CAA2C;wBAC3C,MAAM,SAAS,GAAG,MAAM,IAAI,CAAC,cAAc,CAAC,SAAS,CAAC,WAAW,EAAE,QAAQ,CAAC,CAAC;wBAE7E,IAAI,SAAS,EAAE,CAAC;4BACd,cAAc,CAAC,IAAI,CAAC;gCAClB,UAAU;gCACV,SAAS,EAAE,IAAI,CAAC,QAAQ,CAAC,SAAS,CAAC;gCACnC,QAAQ,EAAE,SAAS;gCACnB,UAAU,EAAE;oCACV,KAAK,EAAE,GAAG,EAAE,+DAA+D;oCAC3E,MAAM,EAAE,GAAG;iCACZ;6BACF,CAAC,CAAC;4BACH,OAAO,CAAC,GAAG,CAAC,oBAAoB,UAAU,WAAW,CAAC,CAAC;wBACzD,CAAC;oBACH,CAAC;gBACH,CAAC;gBAAC,OAAO,SAAkB,EAAE,CAAC;oBAC5B,OAAO,CAAC,IAAI,CAAC,6BAA6B,UAAU,GAAG,EAAE,SAAS,YAAY,KAAK,CAAC,CAAC,CAAC,SAAS,CAAC,OAAO,CAAC,CAAC,CAAC,eAAe,CAAC,CAAC;oBAC3H,0BAA0B;gBAC5B,CAAC;YACH,CAAC;YAED,IAAI,cAAc,CAAC,MAAM,KAAK,CAAC,EAAE,CAAC;gBAChC,MAAM,IAAI,KAAK,CAAC,uCAAuC,CAAC,CAAC;YAC3D,CAAC;YAED,OAAO,cAAc,CAAC;QACxB,CAAC;QAAC,OAAO,KAAc,EAAE,CAAC;YACxB,OAAO,CAAC,KAAK,CAAC,8BAA8B,EAAE,KAAK,YAAY,KAAK,CAAC,CAAC,CAAC,KAAK,CAAC,OAAO,CAAC,CAAC,CAAC,eAAe,CAAC,CAAC;YACxG,MAAM,KAAK,CAAC;QACd,CAAC;IACH,CAAC;IAED;;OAEG;IACK,KAAK,CAAC,kBAAkB,CAAC,SAAiB;QAChD,OAAO,CAAC,GAAG,CAAC,yCAAyC,CAAC,CAAC;QAEvD,MAAM,cAAc,GAAe,EAAE,CAAC;QACtC,MAAM,QAAQ,GAAG,IAAI,CAAC,GAAG,CAAC,SAAS,EAAE,CAAC,CAAC,CAAC;QAExC,KAAK,IAAI,IAAI,GAAG,CAAC,EAAE,IAAI,IAAI,QAAQ,EAAE,IAAI,EAAE,EAAE,CAAC;YAC5C,MAAM,kBAAkB,GAAG,YAAY,IAAI,oJAAoJ,IAAI,OAAO,SAAS,EAAE,CAAC;YACtN,MAAM,iBAAiB,GAAG,MAAM,CAAC,IAAI,CAAC,kBAAkB,EAAE,OAAO,CAAC,CAAC;YAEnE,kCAAkC;YAClC,MAAM,QAAQ,GAAG,YAAY,IAAI,kBAAkB,CAAC;YAEpD,kDAAkD;YAClD,MAAM,SAAS,GAAG,MAAM,IAAI,CAAC,cAAc,CAAC,SAAS,CAAC,iBAAiB,EAAE,QAAQ,
CAAC,CAAC;YAEnF,IAAI,SAAS,EAAE,CAAC;gBACd,cAAc,CAAC,IAAI,CAAC;oBAClB,UAAU,EAAE,IAAI;oBAChB,SAAS,EAAE,IAAI,CAAC,QAAQ,CAAC,SAAS,CAAC;oBACnC,QAAQ,EAAE,SAAS;iBACpB,CAAC,CAAC;gBACH,OAAO,CAAC,GAAG,CAAC,kCAAkC,IAAI,EAAE,CAAC,CAAC;YACxD,CAAC;QACH,CAAC;QAED,OAAO,cAAc,CAAC;IACxB,CAAC;IAED;;OAEG;IACH,KAAK,CAAC,qBAAqB,CAAC,IAAY,EAAE,QAAkB;QAC1D,MAAM,KAAK,GAAG,IAAI,CAAC,KAAK,CAAC,IAAI,CAAC,CAAC;QAC/B,IAAI,YAAY,GAAG,EAAE,CAAC;QACtB,IAAI,OAAO,GAAG,KAAK,CAAC;QACpB,IAAI,SAAS,GAAe,EAAE,CAAC;QAE/B,KAAK,IAAI,CAAC,GAAG,CAAC,EAAE,CAAC,GAAG,KAAK,CAAC,MAAM,EAAE,CAAC,EAAE,EAAE,CAAC;YACtC,MAAM,IAAI,GAAG,KAAK,CAAC,CAAC,CAAC,CAAC,IAAI,EAAE,CAAC;YAE7B,IAAI,CAAC,IAAI,EAAE,CAAC;gBACV,qBAAqB;gBACrB,IAAI,OAAO,EAAE,CAAC;oBACZ,YAAY,IAAI,IAAI,CAAC,eAAe,CAAC,SAAS,CAAC,CAAC;oBAChD,SAAS,GAAG,EAAE,CAAC;oBACf,OAAO,GAAG,KAAK,CAAC;gBAClB,CAAC;gBACD,YAAY,IAAI,IAAI,CAAC;gBACrB,SAAS;YACX,CAAC;YAED,iEAAiE;YACjE,IAAI,IAAI,CAAC,eAAe,CAAC,IAAI,EAAE,KAAK,EAAE,CAAC,CAAC,EAAE,CAAC;gBACzC,IAAI,OAAO,EAAE,CAAC;oBACZ,YAAY,IAAI,IAAI,CAAC,eAAe,CAAC,SAAS,CAAC,CAAC;oBAChD,SAAS,GAAG,EAAE,CAAC;oBACf,OAAO,GAAG,KAAK,CAAC;gBAClB,CAAC;gBAED,MAAM,YAAY,GAAG,IAAI,CAAC,qBAAqB,CAAC,IAAI,CAAC,CAAC;gBACtD,YAAY,IAAI,GAAG,GAAG,CAAC,MAAM,CAAC,YAAY,CAAC,IAAI,IAAI,MAAM,CAAC;gBAC1D,SAAS;YACX,CAAC;YAED,4BAA4B;YAC5B,IAAI,IAAI,CAAC,gBAAgB,CAAC,IAAI,CAAC,EAAE,CAAC;gBAChC,IAAI,CAAC,OAAO,EAAE,CAAC;oBACb,OAAO,GAAG,IAAI,CAAC;gBACjB,CAAC;gBACD,SAAS,CAAC,IAAI,CAAC,EAAE,KAAK,EAAE,IAAI,CAAC,aAAa,CAAC,IAAI,CAAC,EAAE,CAAC,CAAC;gBACpD,SAAS;YACX,CAAC;iBAAM,IAAI,OAAO,EAAE,CAAC;gBACnB,eAAe;gBACf,YAAY,IAAI,IAAI,CAAC,eAAe,CAAC,SAAS,CAAC,CAAC;gBAChD,SAAS,GAAG,EAAE,CAAC;gBACf,OAAO,GAAG,KAAK,CAAC;YAClB,CAAC;YAED,eAAe;YACf,IAAI,IAAI,CAAC,UAAU,CAAC,IAAI,CAAC,EAAE,CAAC;gBAC1B,YAAY,IAAI,GAAG,IAAI,CAAC,cAAc,CAAC,IAAI,CAAC,IAAI,CAAC;gBACjD,SAAS;YACX,CAAC;YAED,oBAAoB;YACpB,YAAY,IAAI,GAAG,IAAI,IAAI,CAAC;QAC9B,CAAC;QAED,6BAA6B;QAC7B,IAAI,OAAO,IAAI,SAAS,CAAC,MAAM,GAAG,CAAC,EAAE,CAAC;YACpC,YAAY,IAAI,IAAI,CAAC,eAAe,CAAC,SAAS,CAAC,CAAC;QAClD,CAAC;QAED,OAAO,YAAY,CAAC;IACtB,CAAC;IA
EO,eAAe,CAAC,IAAY,EAAE,QAA2B,EAAE,KAAa;QAC9E,qCAAqC;QACrC,IAAI,IAAI,CAAC,MAAM,GAAG,EAAE;YAAE,OAAO,KAAK,CAAC,CAAC,2BAA2B;QAC/D,IAAI,IAAI,CAAC,MAAM,GAAG,CAAC;YAAE,OAAO,KAAK,CAAC,CAAE,YAAY;QAEhD,+CAA+C;QAC/C,IAAI,IAAI,KAAK,IAAI,CAAC,WAAW,EAAE,IAAI,IAAI,CAAC,MAAM,GAAG,CAAC;YAAE,OAAO,IAAI,CAAC;QAEhE,0CAA0C;QAC1C,MAAM,QAAQ,GAAG,QAAQ,CAAC,KAAK,GAAG,CAAC,CAAC,CAAC;QACrC,IAAI,QAAQ,IAAI,QAAQ,CAAC,IAAI,EAAE,CAAC,MAAM,GAAG,IAAI,CAAC,MAAM,GAAG,GAAG,EAAE,CAAC;YAC3D,OAAO,IAAI,CAAC;QACd,CAAC;QAED,iDAAiD;QACjD,IAAI,IAAI,CAAC,QAAQ,CAAC,GAAG,CAAC;YAAE,OAAO,IAAI,CAAC;QAEpC,OAAO,KAAK,CAAC;IACf,CAAC;IAEO,qBAAqB,CAAC,IAAY;QACxC,IAAI,IAAI,KAAK,IAAI,CAAC,WAAW,EAAE;YAAE,OAAO,CAAC,CAAC,CAAC,2BAA2B;QACtE,IAAI,IAAI,CAAC,QAAQ,CAAC,GAAG,CAAC;YAAE,OAAO,CAAC,CAAC,CAAS,4BAA4B;QACtE,IAAI,IAAI,CAAC,MAAM,GAAG,EAAE;YAAE,OAAO,CAAC,CAAC,CAAW,qBAAqB;QAC/D,OAAO,CAAC,CAAC,CAAC,UAAU;IACtB,CAAC;IAEO,gBAAgB,CAAC,IAAY;QACnC,8CAA8C;QAC9C,MAAM,QAAQ,GAAG;YACf,KAAK;YACL,QAAQ,EAAkB,kBAAkB;YAC5C,IAAI,EAAsB,iBAAiB;YAC3C,WAAW;YACX,cAAc;SACf,CAAC;QAEF,OAAO,QAAQ,CAAC,IAAI,CAAC,OAAO,CAAC,EAAE,CAAC,OAAO,CAAC,IAAI,CAAC,IAAI,CAAC,CAAC,CAAC;IACtD,CAAC;IAEO,aAAa,CAAC,IAAY;QAChC,sDAAsD;QACtD,IAAI,OAAO,GAAa,EAAE,CAAC;QAE3B,IAAI,IAAI,CAAC,QAAQ,CAAC,IAAI,CAAC,EAAE,CAAC;YACxB,OAAO,GAAG,IAAI,CAAC,KAAK,CAAC,IAAI,CAAC,CAAC,GAAG,CAAC,GAAG,CAAC,EAAE,CAAC,GAAG,CAAC,IAAI,EAAE,CAAC,CAAC;QACpD,CAAC;aAAM,IAAI,IAAI,CAAC,QAAQ,CAAC,GAAG,CAAC,EAAE,CAAC;YAC9B,OAAO,GAAG,IAAI,CAAC,KAAK,CAAC,GAAG,CAAC,CAAC,GAAG,CAAC,GAAG,CAAC,EAAE,CAAC,GAAG,CAAC,IAAI,EAAE,CAAC,CAAC;QACnD,CAAC;aAAM,CAAC;YACN,2BAA2B;YAC3B,OAAO,GAAG,IAAI,CAAC,KAAK,CAAC,QAAQ,CAAC,CAAC,GAAG,CAAC,GAAG,CAAC,EAAE,CAAC,GAAG,CAAC,IAAI,EAAE,CAAC,CAAC;QACxD,CAAC;QAED,OAAO,OAAO,CAAC,MAAM,CAAC,GAAG,CAAC,EAAE,CAAC,GAAG,CAAC,MAAM,GAAG,CAAC,CAAC,CAAC;IAC/C,CAAC;IAEO,eAAe,CAAC,IAAyB;QAC/C,IAAI,IAAI,CAAC,MAAM,KAAK,CAAC;YAAE,OAAO,EAAE,CAAC;QAEjC,iCAAiC;QACjC,MAAM,OAAO,GAAG,IAAI,CAAC,GAAG,CAAC,GAAG,IAAI,CAAC,GAAG,CAAC,GAAG,CAAC,EAAE,CAAC,GAAG,CAAC,KAAK,CAAC,MAAM,CAAC,CAAC,CAAC;QAE/D,IAAI,QAAQ,GAAG,EAAE,C
AAC;QAElB,KAAK,MAAM,CAAC,CAAC,EAAE,GAAG,CAAC,IAAI,IAAI,CAAC,OAAO,EAAE,EAAE,CAAC;YACtC,IAAI,WAAW,GAAG,GAAG,CAAC;YAEtB,KAAK,IAAI,CAAC,GAAG,CAAC,EAAE,CAAC,GAAG,OAAO,EAAE,CAAC,EAAE,EAAE,CAAC;gBACjC,MAAM,IAAI,GAAG,GAAG,CAAC,KAAK,CAAC,CAAC,CAAC,IAAI,EAAE,CAAC;gBAChC,WAAW,IAAI,IAAI,IAAI,IAAI,CAAC;YAC9B,CAAC;YAED,QAAQ,IAAI,GAAG,WAAW,IAAI,CAAC;YAE/B,uCAAuC;YACvC,IAAI,CAAC,KAAK,CAAC,EAAE,CAAC;gBACZ,IAAI,SAAS,GAAG,GAAG,CAAC;gBACpB,KAAK,IAAI,CAAC,GAAG,CAAC,EAAE,CAAC,GAAG,OAAO,EAAE,CAAC,EAAE,EAAE,CAAC;oBACjC,SAAS,IAAI,QAAQ,CAAC;gBACxB,CAAC;gBACD,QAAQ,IAAI,GAAG,SAAS,IAAI,CAAC;YAC/B,CAAC;QACH,CAAC;QAED,OAAO,GAAG,QAAQ,IAAI,CAAC;IACzB,CAAC;IAEO,UAAU,CAAC,IAAY;QAC7B,kCAAkC;QAClC,MAAM,YAAY,GAAG;YACnB,cAAc;YACd,cAAc;YACd,mBAAmB;YACnB,kBAAkB;SACnB,CAAC;QAEF,OAAO,YAAY,CAAC,IAAI,CAAC,OAAO,CAAC,EAAE,CAAC,OAAO,CAAC,IAAI,CAAC,IAAI,CAAC,CAAC,CAAC;IAC1D,CAAC;IAEO,cAAc,CAAC,IAAY;QACjC,2CAA2C;QAC3C,IAAI,cAAc,CAAC,IAAI,CAAC,IAAI,CAAC,EAAE,CAAC;YAC9B,OAAO,IAAI,CAAC,OAAO,CAAC,cAAc,EAAE,KAAK,CAAC,CAAC;QAC7C,CAAC;aAAM,IAAI,mBAAmB,CAAC,IAAI,CAAC,IAAI,CAAC,EAAE,CAAC;YAC1C,OAAO,IAAI,CAAC,OAAO,CAAC,mBAAmB,EAAE,IAAI,CAAC,CAAC;QACjD,CAAC;aAAM,IAAI,kBAAkB,CAAC,IAAI,CAAC,IAAI,CAAC,EAAE,CAAC;YACzC,OAAO,IAAI,CAAC,OAAO,CAAC,kBAAkB,EAAE,IAAI,CAAC,CAAC;QAChD,CAAC;aAAM,CAAC;YACN,OAAO,IAAI,CAAC,OAAO,CAAC,cAAc,EAAE,IAAI,CAAC,CAAC;QAC5C,CAAC;IACH,CAAC;IAED;;OAEG;IACH,KAAK,CAAC,gBAAgB,CAAC,UAA+B;QACpD,IAAI,QAAQ,GAAG,EAAE,CAAC;QAElB,KAAK,MAAM,CAAC,CAAC,EAAE,IAAI,CAAC,IAAI,UAAU,CAAC,OAAO,EAAE,EAAE,CAAC;YAC7C,QAAQ,IAAI,WAAW,IAAI,CAAC,UAAU,MAAM,CAAC;YAC7C,QAAQ,IAAI,IAAI,CAAC,cAAc,CAAC,gBAAgB,CAAC,QAAQ,IAAI,CAAC,UAAU,EAAE,EAAE,IAAI,CAAC,SAAS,CAAC,CAAC;YAC5F,QAAQ,IAAI,MAAM,CAAC;YAEnB,IAAI,CAAC,GAAG,UAAU,CAAC,MAAM,GAAG,CAAC,EAAE,CAAC;gBAC9B,QAAQ,IAAI,SAAS,CAAC,CAAC,iBAAiB;YAC1C,CAAC;QACH,CAAC;QAED,OAAO,QAAQ,CAAC;IAClB,CAAC;IAED;;OAEG;IACH,KAAK;QACH,IAAI,CAAC,WAAW,GAAG,CAAC,CAAC;IACvB,CAAC;IAED;;OAEG;IACH,IAAI,gBAAgB;QAClB,OAAO,IAAI,CAAC,WAAW,CAAC;IAC1B,CAAC;CACF"}
|
|
1
|
+
{"version":3,"file":"pdf-extractor.js","sourceRoot":"","sources":["../../src/utils/pdf-extractor.ts"],"names":[],"mappings":"AAAA,OAAO,IAAI,MAAM,WAAW,CAAC;AAC7B,OAAO,EAAE,MAAM,EAAE,MAAM,aAAa,CAAC;AA2BrC,MAAM,OAAO,YAAY;IACN,cAAc,CAAiB;IACxC,WAAW,GAAW,CAAC,CAAC;IAEhC,YAAY,cAA8B;QACxC,IAAI,CAAC,cAAc,GAAG,cAAc,CAAC;IACvC,CAAC;IAED;;OAEG;IACH,KAAK,CAAC,oBAAoB,CAAC,MAAc;QACvC,IAAI,CAAC;YACH,OAAO,CAAC,GAAG,CAAC,yDAAyD,CAAC,CAAC;YAEvE,yCAAyC;YACzC,MAAM,QAAQ,GAAG,MAAM,MAAM,CAAC,WAAW,CAAC,CAAC;YAC3C,MAAM,OAAO,GAAG,MAAM,QAAQ,CAAC,OAAO,CAAC,MAAM,CAAC,CAAC;YAE/C,OAAO,CAAC,GAAG,CAAC,4BAA4B,OAAO,CAAC,QAAQ,wBAAwB,OAAO,CAAC,IAAI,EAAE,MAAM,IAAI,CAAC,EAAE,CAAC,CAAC;YAE7G,oEAAoE;YACpE,MAAM,YAAY,GAAG,CAAC,OAAO,CAAC,IAAI,IAAI,OAAO,CAAC,IAAI,CAAC,IAAI,EAAE,CAAC,MAAM,GAAG,GAAG,CAAC;YAEvE,4FAA4F;YAC5F,IAAI,YAAY,IAAI,OAAO,CAAC,QAAQ,IAAI,CAAC,EAAE,CAAC;gBAC1C,OAAO,CAAC,GAAG,CAAC,uEAAuE,CAAC,CAAC;gBAErF,IAAI,CAAC;oBACH,MAAM,cAAc,GAAG,MAAM,IAAI,CAAC,kBAAkB,CAAC,MAAM,EAAE,IAAI,CAAC,GAAG,CAAC,OAAO,CAAC,QAAQ,EAAE,CAAC,CAAC,CAAC,CAAC;oBAC5F,IAAI,cAAc,CAAC,MAAM,GAAG,CAAC,EAAE,CAAC;wBAC9B,OAAO,CAAC,GAAG,CAAC,6BAA6B,cAAc,CAAC,MAAM,cAAc,CAAC,CAAC;wBAC9E,OAAO,cAAc,CAAC;oBACxB,CAAC;gBACH,CAAC;gBAAC,OAAO,YAAqB,EAAE,CAAC;oBAC/B,OAAO,CAAC,IAAI,CAAC,+BAA+B,EAAE,YAAY,YAAY,KAAK,CAAC,CAAC,CAAC,YAAY,CAAC,OAAO,CAAC,CAAC,CAAC,eAAe,CAAC,CAAC;oBAEtH,mDAAmD;oBACnD,MAAM,YAAY,GAAG,YAAY,YAAY,KAAK,CAAC,CAAC,CAAC,YAAY,CAAC,OAAO,CAAC,CAAC,CAAC,EAAE,CAAC;oBAC/E,IAAI,YAAY,CAAC,QAAQ,CAAC,gBAAgB,CAAC,IAAI,YAAY,CAAC,QAAQ,CAAC,aAAa,CAAC,IAAI,YAAY,CAAC,QAAQ,CAAC,mBAAmB,CAAC,EAAE,CAAC;wBAClI,OAAO,CAAC,GAAG,CAAC,kFAAkF,CAAC,CAAC;oBAClG,CAAC;oBAED,0DAA0D;oBAC1D,OAAO,MAAM,IAAI,CAAC,kBAAkB,CAAC,OAAO,CAAC,QAAQ,CAAC,CAAC;gBACzD,CAAC;YACH,CAAC;iBAAM,CAAC;gBACN,OAAO,CAAC,GAAG,CAAC,8DAA8D,CAAC,CAAC;gBAC5E,OAAO,EAAE,CAAC;YACZ,CAAC;YAED,OAAO,EAAE,CAAC;QACZ,CAAC;QAAC,OAAO,KAAc,EAAE,CAAC;YACxB,OAAO,CAAC,IAAI,CAAC,yBAAyB,EAAE,KAAK,YAAY,KAAK,CAAC,CAAC,CAAC,KAAK,CAAC,OAAO,CAAC,CAAC,CAAC,eAAe,CAAC,CAAC;YAClG,4EAA4E;YAC5E,OAAO,EAAE,CAAC;QACZ,CAAC;I
ACH,CAAC;IAED;;OAEG;IACK,KAAK,CAAC,kBAAkB,CAAC,MAAc,EAAE,WAAmB,CAAC;QACnE,IAAI,CAAC;YACH,MAAM,EAAE,UAAU,EAAE,GAAG,MAAM,MAAM,CAAC,SAAS,CAAC,CAAC;YAE/C,OAAO,CAAC,GAAG,CAAC,oCAAoC,QAAQ,YAAY,CAAC,CAAC;YAEtE,sDAAsD;YACtD,MAAM,cAAc,GAAG;gBACrB,MAAM,EAAE,KAAc;gBACtB,OAAO,EAAE,IAAI,CAAC,cAAc,CAAC,cAAc;gBAC3C,UAAU,EAAE,UAAU;aACvB,CAAC;YAEF,+BAA+B;YAC/B,MAAM,OAAO,GAAG,UAAU,CAAC,MAAM,EAAE,cAAc,CAAC,CAAC;YAEnD,MAAM,cAAc,GAAe,EAAE,CAAC;YAEtC,2BAA2B;YAC3B,KAAK,IAAI,UAAU,GAAG,CAAC,EAAE,UAAU,IAAI,QAAQ,EAAE,UAAU,EAAE,EAAE,CAAC;gBAC9D,IAAI,CAAC;oBACH,MAAM,MAAM,GAAG,MAAM,OAAO,CAAC,UAAU,CAAC,CAAC;oBAEzC,IAAI,MAAM,IAAI,MAAM,IAAI,MAAM,IAAI,MAAM,CAAC,IAAI,EAAE,CAAC;wBAC9C,MAAM,QAAQ,GAAG,IAAI,CAAC,QAAQ,CAAC,MAAM,CAAC,IAAI,CAAC,CAAC;wBAE5C,IAAI,CAAC;4BACH,gCAAgC;4BAChC,MAAM,EAAE,GAAG,MAAM,MAAM,CAAC,aAAa,CAAC,CAAC;4BACvC,MAAM,WAAW,GAAG,MAAM,EAAE,CAAC,QAAQ,CAAC,MAAM,CAAC,IAAI,CAAC,CAAC;4BAEnD,kEAAkE;4BAClE,MAAM,SAAS,GAAG,MAAM,IAAI,CAAC,cAAc,CAAC,SAAS,CAAC,WAAW,EAAE,YAAY,UAAU,MAAM,CAAC,CAAC;4BAEjG,IAAI,SAAS,EAAE,CAAC;gCACd,cAAc,CAAC,IAAI,CAAC;oCAClB,UAAU;oCACV,SAAS,EAAE,IAAI,CAAC,QAAQ,CAAC,SAAS,CAAC;oCACnC,QAAQ,EAAE,SAAS;oCACnB,UAAU,EAAE;wCACV,KAAK,EAAE,CAAC,OAAO,IAAI,MAAM,IAAI,OAAO,MAAM,CAAC,KAAK,KAAK,QAAQ,CAAC,CAAC,CAAC,CAAC,MAAM,CAAC,KAAK,CAAC,CAAC,CAAC,GAAG;wCACnF,MAAM,EAAE,CAAC,QAAQ,IAAI,MAAM,IAAI,OAAO,MAAM,CAAC,MAAM,KAAK,QAAQ,CAAC,CAAC,CAAC,CAAC,MAAM,CAAC,MAAM,CAAC,CAAC,CAAC,GAAG;qCACxF;iCACF,CAAC,CAAC;gCACH,OAAO,CAAC,GAAG,CAAC,oBAAoB,UAAU,WAAW,CAAC,CAAC;4BACzD,CAAC;4BAED,8BAA8B;4BAC9B,IAAI,CAAC;gCACH,MAAM,EAAE,CAAC,MAAM,CAAC,MAAM,CAAC,IAAI,CAAC,CAAC;4BAC/B,CAAC;4BAAC,OAAO,WAAW,EAAE,CAAC;gCACrB,OAAO,CAAC,IAAI,CAAC,mCAAmC,MAAM,CAAC,IAAI,EAAE,CAAC,CAAC;4BACjE,CAAC;wBAEH,CAAC;wBAAC,OAAO,SAAkB,EAAE,CAAC;4BAC5B,OAAO,CAAC,IAAI,CAAC,4CAA4C,UAAU,GAAG,EAAE,SAAS,YAAY,KAAK,CAAC,CAAC,CAAC,SAAS,CAAC,OAAO,CAAC,CAAC,CAAC,eAAe,CAAC,CAAC;wBAC5I,CAAC;oBACH,CAAC;gBACH,CAAC;gBAAC,OAAO,SAAkB,EAAE,CAAC;oBAC5B,OAAO,CAAC,IAAI,CAAC,6BAA6B,UAAU,GAAG,EAAE,SAAS,YAAY,KAAK,CAAC,CAAC,CAAC,SAAS,CAAC,OAAO,CAAC,CAAC,CAAC
,eAAe,CAAC,CAAC;oBAC3H,0BAA0B;gBAC5B,CAAC;YACH,CAAC;YAED,IAAI,cAAc,CAAC,MAAM,KAAK,CAAC,EAAE,CAAC;gBAChC,MAAM,IAAI,KAAK,CAAC,uCAAuC,CAAC,CAAC;YAC3D,CAAC;YAED,OAAO,CAAC,GAAG,CAAC,6BAA6B,cAAc,CAAC,MAAM,kBAAkB,CAAC,CAAC;YAClF,OAAO,cAAc,CAAC;QACxB,CAAC;QAAC,OAAO,KAAc,EAAE,CAAC;YACxB,OAAO,CAAC,KAAK,CAAC,8BAA8B,EAAE,KAAK,YAAY,KAAK,CAAC,CAAC,CAAC,KAAK,CAAC,OAAO,CAAC,CAAC,CAAC,eAAe,CAAC,CAAC;YACxG,MAAM,KAAK,CAAC;QACd,CAAC;IACH,CAAC;IAED;;OAEG;IACK,KAAK,CAAC,kBAAkB,CAAC,SAAiB;QAChD,OAAO,CAAC,GAAG,CAAC,+CAA+C,CAAC,CAAC;QAE7D,MAAM,cAAc,GAAe,EAAE,CAAC;QACtC,MAAM,QAAQ,GAAG,IAAI,CAAC,GAAG,CAAC,SAAS,EAAE,CAAC,CAAC,CAAC;QAExC,KAAK,IAAI,IAAI,GAAG,CAAC,EAAE,IAAI,IAAI,QAAQ,EAAE,IAAI,EAAE,EAAE,CAAC;YAC5C,IAAI,CAAC;gBACH,gDAAgD;gBAChD,MAAM,KAAK,GAAG,MAAM,MAAM,CAAC,OAAO,CAAC,CAAC;gBAEpC,+CAA+C;gBAC/C,MAAM,gBAAgB,GAAG,MAAM,KAAK,CAAC,OAAO,CAAC;oBAC3C,MAAM,EAAE;wBACN,KAAK,EAAE,GAAG;wBACV,MAAM,EAAE,GAAG;wBACX,QAAQ,EAAE,CAAC;wBACX,UAAU,EAAE,EAAE,CAAC,EAAE,GAAG,EAAE,CAAC,EAAE,GAAG,EAAE,CAAC,EAAE,GAAG,EAAE,KAAK,EAAE,CAAC,EAAE;qBACjD;iBACF,CAAC;qBACD,GAAG,EAAE;qBACL,SAAS,CAAC;oBACT;wBACE,KAAK,EAAE,MAAM,CAAC,IAAI,CAChB;;qHAEuG,IAAI;;iHAER,IAAI,OAAO,SAAS;;qBAEhH,CACR;wBACD,GAAG,EAAE,CAAC;wBACN,IAAI,EAAE,CAAC;qBACR;iBACF,CAAC;qBACD,QAAQ,EAAE,CAAC;gBAEZ,MAAM,QAAQ,GAAG,YAAY,IAAI,kBAAkB,CAAC;gBAEpD,kDAAkD;gBAClD,MAAM,SAAS,GAAG,MAAM,IAAI,CAAC,cAAc,CAAC,SAAS,CAAC,gBAAgB,EAAE,QAAQ,CAAC,CAAC;gBAElF,IAAI,SAAS,EAAE,CAAC;oBACd,cAAc,CAAC,IAAI,CAAC;wBAClB,UAAU,EAAE,IAAI;wBAChB,SAAS,EAAE,IAAI,CAAC,QAAQ,CAAC,SAAS,CAAC;wBACnC,QAAQ,EAAE,SAAS;wBACnB,UAAU,EAAE;4BACV,KAAK,EAAE,GAAG;4BACV,MAAM,EAAE,GAAG;yBACZ;qBACF,CAAC,CAAC;oBACH,OAAO,CAAC,GAAG,CAAC,wCAAwC,IAAI,EAAE,CAAC,CAAC;gBAC9D,CAAC;YACH,CAAC;YAAC,OAAO,UAAmB,EAAE,CAAC;gBAC7B,OAAO,CAAC,IAAI,CAAC,kDAAkD,IAAI,GAAG,EAAE,UAAU,YAAY,KAAK,CAAC,CAAC,CAAC,UAAU,CAAC,OAAO,CAAC,CAAC,CAAC,eAAe,CAAC,CAAC;gBAE5I,kEAAkE;gBAClE,MAAM,QAAQ,GAAG,YAAY,IAAI,WAAW,CAAC;gBAC7C,MAAM,kBAAkB,GAAG,YAAY,IAAI,sCAAsC,IAAI,OAAO,SAAS,0DAA0D,CAAC;gBAChK,MAAM,iBAAiB,GAAG,MAAM,CAAC,IAAI,CAAC,kBAAkB,E
AAE,OAAO,CAAC,CAAC;gBAEnE,yCAAyC;gBACzC,MAAM,EAAE,GAAG,MAAM,MAAM,CAAC,aAAa,CAAC,CAAC;gBACvC,MAAM,QAAQ,GAAG,IAAI,CAAC,IAAI,CAAC,IAAI,CAAC,cAAc,CAAC,cAAc,EAAE,QAAQ,CAAC,CAAC;gBAEzE,IAAI,CAAC;oBACH,MAAM,EAAE,CAAC,SAAS,CAAC,QAAQ,EAAE,iBAAiB,CAAC,CAAC;oBAEhD,cAAc,CAAC,IAAI,CAAC;wBAClB,UAAU,EAAE,IAAI;wBAChB,SAAS,EAAE,QAAQ;wBACnB,QAAQ,EAAE,IAAI,CAAC,OAAO,CAAC,QAAQ,CAAC;qBACjC,CAAC,CAAC;oBACH,OAAO,CAAC,GAAG,CAAC,uCAAuC,IAAI,EAAE,CAAC,CAAC;gBAC7D,CAAC;gBAAC,OAAO,UAAmB,EAAE,CAAC;oBAC7B,OAAO,CAAC,IAAI,CAAC,2CAA2C,IAAI,GAAG,EAAE,UAAU,YAAY,KAAK,CAAC,CAAC,CAAC,UAAU,CAAC,OAAO,CAAC,CAAC,CAAC,eAAe,CAAC,CAAC;gBACvI,CAAC;YACH,CAAC;QACH,CAAC;QAED,OAAO,cAAc,CAAC;IACxB,CAAC;IAED;;OAEG;IACH,KAAK,CAAC,qBAAqB,CAAC,IAAY,EAAE,QAAkB;QAC1D,MAAM,KAAK,GAAG,IAAI,CAAC,KAAK,CAAC,IAAI,CAAC,CAAC;QAC/B,IAAI,YAAY,GAAG,EAAE,CAAC;QACtB,IAAI,OAAO,GAAG,KAAK,CAAC;QACpB,IAAI,SAAS,GAAe,EAAE,CAAC;QAE/B,KAAK,IAAI,CAAC,GAAG,CAAC,EAAE,CAAC,GAAG,KAAK,CAAC,MAAM,EAAE,CAAC,EAAE,EAAE,CAAC;YACtC,MAAM,IAAI,GAAG,KAAK,CAAC,CAAC,CAAC,CAAC,IAAI,EAAE,CAAC;YAE7B,IAAI,CAAC,IAAI,EAAE,CAAC;gBACV,qBAAqB;gBACrB,IAAI,OAAO,EAAE,CAAC;oBACZ,YAAY,IAAI,IAAI,CAAC,eAAe,CAAC,SAAS,CAAC,CAAC;oBAChD,SAAS,GAAG,EAAE,CAAC;oBACf,OAAO,GAAG,KAAK,CAAC;gBAClB,CAAC;gBACD,YAAY,IAAI,IAAI,CAAC;gBACrB,SAAS;YACX,CAAC;YAED,iEAAiE;YACjE,IAAI,IAAI,CAAC,eAAe,CAAC,IAAI,EAAE,KAAK,EAAE,CAAC,CAAC,EAAE,CAAC;gBACzC,IAAI,OAAO,EAAE,CAAC;oBACZ,YAAY,IAAI,IAAI,CAAC,eAAe,CAAC,SAAS,CAAC,CAAC;oBAChD,SAAS,GAAG,EAAE,CAAC;oBACf,OAAO,GAAG,KAAK,CAAC;gBAClB,CAAC;gBAED,MAAM,YAAY,GAAG,IAAI,CAAC,qBAAqB,CAAC,IAAI,CAAC,CAAC;gBACtD,YAAY,IAAI,GAAG,GAAG,CAAC,MAAM,CAAC,YAAY,CAAC,IAAI,IAAI,MAAM,CAAC;gBAC1D,SAAS;YACX,CAAC;YAED,4BAA4B;YAC5B,IAAI,IAAI,CAAC,gBAAgB,CAAC,IAAI,CAAC,EAAE,CAAC;gBAChC,IAAI,CAAC,OAAO,EAAE,CAAC;oBACb,OAAO,GAAG,IAAI,CAAC;gBACjB,CAAC;gBACD,SAAS,CAAC,IAAI,CAAC,EAAE,KAAK,EAAE,IAAI,CAAC,aAAa,CAAC,IAAI,CAAC,EAAE,CAAC,CAAC;gBACpD,SAAS;YACX,CAAC;iBAAM,IAAI,OAAO,EAAE,CAAC;gBACnB,eAAe;gBACf,YAAY,IAAI,IAAI,CAAC,eAAe,CAAC,SAAS,CAAC,CAAC;gBAChD,SAAS,GAAG,EAAE,CAAC;gBACf,OAAO,GAAG,K
AAK,CAAC;YAClB,CAAC;YAED,eAAe;YACf,IAAI,IAAI,CAAC,UAAU,CAAC,IAAI,CAAC,EAAE,CAAC;gBAC1B,YAAY,IAAI,GAAG,IAAI,CAAC,cAAc,CAAC,IAAI,CAAC,IAAI,CAAC;gBACjD,SAAS;YACX,CAAC;YAED,oBAAoB;YACpB,YAAY,IAAI,GAAG,IAAI,IAAI,CAAC;QAC9B,CAAC;QAED,6BAA6B;QAC7B,IAAI,OAAO,IAAI,SAAS,CAAC,MAAM,GAAG,CAAC,EAAE,CAAC;YACpC,YAAY,IAAI,IAAI,CAAC,eAAe,CAAC,SAAS,CAAC,CAAC;QAClD,CAAC;QAED,OAAO,YAAY,CAAC;IACtB,CAAC;IAEO,eAAe,CAAC,IAAY,EAAE,QAA2B,EAAE,KAAa;QAC9E,qCAAqC;QACrC,IAAI,IAAI,CAAC,MAAM,GAAG,EAAE;YAAE,OAAO,KAAK,CAAC,CAAC,2BAA2B;QAC/D,IAAI,IAAI,CAAC,MAAM,GAAG,CAAC;YAAE,OAAO,KAAK,CAAC,CAAE,YAAY;QAEhD,+CAA+C;QAC/C,IAAI,IAAI,KAAK,IAAI,CAAC,WAAW,EAAE,IAAI,IAAI,CAAC,MAAM,GAAG,CAAC;YAAE,OAAO,IAAI,CAAC;QAEhE,0CAA0C;QAC1C,MAAM,QAAQ,GAAG,QAAQ,CAAC,KAAK,GAAG,CAAC,CAAC,CAAC;QACrC,IAAI,QAAQ,IAAI,QAAQ,CAAC,IAAI,EAAE,CAAC,MAAM,GAAG,IAAI,CAAC,MAAM,GAAG,GAAG,EAAE,CAAC;YAC3D,OAAO,IAAI,CAAC;QACd,CAAC;QAED,iDAAiD;QACjD,IAAI,IAAI,CAAC,QAAQ,CAAC,GAAG,CAAC;YAAE,OAAO,IAAI,CAAC;QAEpC,OAAO,KAAK,CAAC;IACf,CAAC;IAEO,qBAAqB,CAAC,IAAY;QACxC,IAAI,IAAI,KAAK,IAAI,CAAC,WAAW,EAAE;YAAE,OAAO,CAAC,CAAC,CAAC,2BAA2B;QACtE,IAAI,IAAI,CAAC,QAAQ,CAAC,GAAG,CAAC;YAAE,OAAO,CAAC,CAAC,CAAS,4BAA4B;QACtE,IAAI,IAAI,CAAC,MAAM,GAAG,EAAE;YAAE,OAAO,CAAC,CAAC,CAAW,qBAAqB;QAC/D,OAAO,CAAC,CAAC,CAAC,UAAU;IACtB,CAAC;IAEO,gBAAgB,CAAC,IAAY;QACnC,8CAA8C;QAC9C,MAAM,QAAQ,GAAG;YACf,KAAK;YACL,QAAQ,EAAkB,kBAAkB;YAC5C,IAAI,EAAsB,iBAAiB;YAC3C,WAAW;YACX,cAAc;SACf,CAAC;QAEF,OAAO,QAAQ,CAAC,IAAI,CAAC,OAAO,CAAC,EAAE,CAAC,OAAO,CAAC,IAAI,CAAC,IAAI,CAAC,CAAC,CAAC;IACtD,CAAC;IAEO,aAAa,CAAC,IAAY;QAChC,sDAAsD;QACtD,IAAI,OAAO,GAAa,EAAE,CAAC;QAE3B,IAAI,IAAI,CAAC,QAAQ,CAAC,IAAI,CAAC,EAAE,CAAC;YACxB,OAAO,GAAG,IAAI,CAAC,KAAK,CAAC,IAAI,CAAC,CAAC,GAAG,CAAC,GAAG,CAAC,EAAE,CAAC,GAAG,CAAC,IAAI,EAAE,CAAC,CAAC;QACpD,CAAC;aAAM,IAAI,IAAI,CAAC,QAAQ,CAAC,GAAG,CAAC,EAAE,CAAC;YAC9B,OAAO,GAAG,IAAI,CAAC,KAAK,CAAC,GAAG,CAAC,CAAC,GAAG,CAAC,GAAG,CAAC,EAAE,CAAC,GAAG,CAAC,IAAI,EAAE,CAAC,CAAC;QACnD,CAAC;aAAM,CAAC;YACN,2BAA2B;YAC3B,OAAO,GAAG,IAAI,CAAC,KAAK,CAAC,QAAQ,CAAC,CAAC,GAAG,CAAC,GAAG,CAAC,EAAE
,CAAC,GAAG,CAAC,IAAI,EAAE,CAAC,CAAC;QACxD,CAAC;QAED,OAAO,OAAO,CAAC,MAAM,CAAC,GAAG,CAAC,EAAE,CAAC,GAAG,CAAC,MAAM,GAAG,CAAC,CAAC,CAAC;IAC/C,CAAC;IAEO,eAAe,CAAC,IAAyB;QAC/C,IAAI,IAAI,CAAC,MAAM,KAAK,CAAC;YAAE,OAAO,EAAE,CAAC;QAEjC,iCAAiC;QACjC,MAAM,OAAO,GAAG,IAAI,CAAC,GAAG,CAAC,GAAG,IAAI,CAAC,GAAG,CAAC,GAAG,CAAC,EAAE,CAAC,GAAG,CAAC,KAAK,CAAC,MAAM,CAAC,CAAC,CAAC;QAE/D,IAAI,QAAQ,GAAG,EAAE,CAAC;QAElB,KAAK,MAAM,CAAC,CAAC,EAAE,GAAG,CAAC,IAAI,IAAI,CAAC,OAAO,EAAE,EAAE,CAAC;YACtC,IAAI,WAAW,GAAG,GAAG,CAAC;YAEtB,KAAK,IAAI,CAAC,GAAG,CAAC,EAAE,CAAC,GAAG,OAAO,EAAE,CAAC,EAAE,EAAE,CAAC;gBACjC,MAAM,IAAI,GAAG,GAAG,CAAC,KAAK,CAAC,CAAC,CAAC,IAAI,EAAE,CAAC;gBAChC,WAAW,IAAI,IAAI,IAAI,IAAI,CAAC;YAC9B,CAAC;YAED,QAAQ,IAAI,GAAG,WAAW,IAAI,CAAC;YAE/B,uCAAuC;YACvC,IAAI,CAAC,KAAK,CAAC,EAAE,CAAC;gBACZ,IAAI,SAAS,GAAG,GAAG,CAAC;gBACpB,KAAK,IAAI,CAAC,GAAG,CAAC,EAAE,CAAC,GAAG,OAAO,EAAE,CAAC,EAAE,EAAE,CAAC;oBACjC,SAAS,IAAI,QAAQ,CAAC;gBACxB,CAAC;gBACD,QAAQ,IAAI,GAAG,SAAS,IAAI,CAAC;YAC/B,CAAC;QACH,CAAC;QAED,OAAO,GAAG,QAAQ,IAAI,CAAC;IACzB,CAAC;IAEO,UAAU,CAAC,IAAY;QAC7B,kCAAkC;QAClC,MAAM,YAAY,GAAG;YACnB,cAAc;YACd,cAAc;YACd,mBAAmB;YACnB,kBAAkB;SACnB,CAAC;QAEF,OAAO,YAAY,CAAC,IAAI,CAAC,OAAO,CAAC,EAAE,CAAC,OAAO,CAAC,IAAI,CAAC,IAAI,CAAC,CAAC,CAAC;IAC1D,CAAC;IAEO,cAAc,CAAC,IAAY;QACjC,2CAA2C;QAC3C,IAAI,cAAc,CAAC,IAAI,CAAC,IAAI,CAAC,EAAE,CAAC;YAC9B,OAAO,IAAI,CAAC,OAAO,CAAC,cAAc,EAAE,KAAK,CAAC,CAAC;QAC7C,CAAC;aAAM,IAAI,mBAAmB,CAAC,IAAI,CAAC,IAAI,CAAC,EAAE,CAAC;YAC1C,OAAO,IAAI,CAAC,OAAO,CAAC,mBAAmB,EAAE,IAAI,CAAC,CAAC;QACjD,CAAC;aAAM,IAAI,kBAAkB,CAAC,IAAI,CAAC,IAAI,CAAC,EAAE,CAAC;YACzC,OAAO,IAAI,CAAC,OAAO,CAAC,kBAAkB,EAAE,IAAI,CAAC,CAAC;QAChD,CAAC;aAAM,CAAC;YACN,OAAO,IAAI,CAAC,OAAO,CAAC,cAAc,EAAE,IAAI,CAAC,CAAC;QAC5C,CAAC;IACH,CAAC;IAED;;OAEG;IACH,KAAK,CAAC,gBAAgB,CAAC,UAA+B;QACpD,IAAI,QAAQ,GAAG,EAAE,CAAC;QAElB,KAAK,MAAM,CAAC,CAAC,EAAE,IAAI,CAAC,IAAI,UAAU,CAAC,OAAO,EAAE,EAAE,CAAC;YAC7C,QAAQ,IAAI,WAAW,IAAI,CAAC,UAAU,MAAM,CAAC;YAC7C,QAAQ,IAAI,IAAI,CAAC,cAAc,CAAC,gBAAgB,CAAC,QAAQ,IAAI,CAAC,UAAU,EAAE,EAAE,IAAI,CAAC,SAAS,CA
AC,CAAC;YAC5F,QAAQ,IAAI,MAAM,CAAC;YAEnB,IAAI,CAAC,GAAG,UAAU,CAAC,MAAM,GAAG,CAAC,EAAE,CAAC;gBAC9B,QAAQ,IAAI,SAAS,CAAC,CAAC,iBAAiB;YAC1C,CAAC;QACH,CAAC;QAED,OAAO,QAAQ,CAAC;IAClB,CAAC;IAED;;OAEG;IACH,KAAK;QACH,IAAI,CAAC,WAAW,GAAG,CAAC,CAAC;IACvB,CAAC;IAED;;OAEG;IACH,IAAI,gBAAgB;QAClB,OAAO,IAAI,CAAC,WAAW,CAAC;IAC1B,CAAC;CACF"}
|
package/package.json
CHANGED
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
{
|
|
2
2
|
"name": "file2md",
|
|
3
|
-
"version": "1.4.
|
|
3
|
+
"version": "1.4.35",
|
|
4
4
|
"description": "A TypeScript library for converting various document types (PDF, DOCX, XLSX, PPTX, HWP, HWPX) into Markdown with image and layout preservation",
|
|
5
5
|
"main": "dist/index.js",
|
|
6
6
|
"types": "dist/index.d.ts",
|