file2md 1.4.34 → 1.4.35

package/README.ko.md ADDED
@@ -0,0 +1,408 @@
1
+ # file2md
2
+
3
+ [![npm version](https://badge.fury.io/js/file2md.svg)](https://badge.fury.io/js/file2md)
4
+ [![TypeScript](https://img.shields.io/badge/TypeScript-Ready-blue.svg)](https://www.typescriptlang.org/)
5
+ [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
6
+
7
+ 다양한 문서 형식(PDF, DOCX, XLSX, PPTX, HWP, HWPX)을 **고급 레이아웃 보존**, **실제 PDF 이미지 추출**, **차트 변환**, **한국어 문서 지원** 기능과 함께 마크다운으로 변환하는 현대적인 TypeScript 라이브러리입니다.
8
+
9
+ [English](README.md) | **한국어**
10
+
11
+ ## ✨ 주요 기능
12
+
13
+ - 🔄 **다양한 형식 지원**: PDF, DOCX, XLSX, PPTX, HWP, HWPX
14
+ - 🎨 **레이아웃 보존**: 문서 구조, 표, 서식 유지
15
+ - 🖼️ **실제 PDF 이미지 추출**: pdf2pic을 사용하여 PDF 페이지를 실제 PNG 이미지로 변환
16
+ - 📊 **차트 변환**: 차트를 마크다운 표로 변환
17
+ - 📝 **목록 및 표 지원**: 중첩된 목록과 복잡한 표 지원
18
+ - 🌏 **한국어 문서 지원**: HWP/HWPX 한국어 문서 형식 완전 지원
19
+ - 🔒 **타입 안전성**: 포괄적인 타입을 제공하는 완전한 TypeScript 지원
20
+ - ⚡ **현대적 ESM**: CommonJS 호환성을 갖춘 ES2022 모듈
21
+ - 🚀 **무설정**: 별도 설정 없이 바로 사용 가능
22
+ - 🎯 **시각적 파싱**: 향상된 PPTX 파싱 및 시각적 레이아웃 분석
23
+
24
+ ## 📦 설치
25
+
26
+ ```bash
27
+ npm install file2md
28
+ ```
29
+
30
+ ## 🚀 빠른 시작
31
+
32
+ ### TypeScript / ES 모듈
33
+
34
+ ```typescript
35
+ import { convert } from 'file2md';
36
+
37
+ // 파일 경로로 변환
38
+ const pdfResult = await convert('./document.pdf');
39
+ console.log(pdfResult.markdown);
40
+
41
+ // 옵션과 함께 변환
42
+ const result = await convert('./presentation.pptx', {
43
+ imageDir: 'extracted-images',
44
+ preserveLayout: true,
45
+ extractCharts: true,
46
+ useVisualParser: true // 향상된 PPTX 파싱
47
+ });
48
+
49
+ console.log(`✅ 변환 완료!`);
50
+ console.log(`📄 마크다운 길이: ${result.markdown.length}`);
51
+ console.log(`🖼️ 추출된 이미지: ${result.images.length}`);
52
+ console.log(`📊 발견된 차트: ${result.charts.length}`);
53
+ console.log(`⏱️ 처리 시간: ${result.metadata.processingTime}ms`);
54
+ ```
55
+
56
+ ### 한국어 문서 지원 (HWP/HWPX)
57
+
58
+ ```typescript
59
+ import { convert } from 'file2md';
60
+
61
+ // 한국어 HWP 문서 변환
62
+ const hwpResult = await convert('./document.hwp', {
63
+ imageDir: 'hwp-images',
64
+ preserveLayout: true,
65
+ extractImages: true
66
+ });
67
+
68
+ // 한국어 HWPX 문서 변환 (XML 기반 형식)
69
+ const hwpxResult = await convert('./document.hwpx', {
70
+ imageDir: 'hwpx-images',
71
+ preserveLayout: true,
72
+ extractImages: true
73
+ });
74
+
75
+ console.log(`🇰🇷 HWP 내용: ${hwpResult.markdown.substring(0, 100)}...`);
76
+ console.log(`📄 HWPX 페이지: ${hwpxResult.metadata.pageCount}`);
77
+ ```
78
+
79
+ ### CommonJS
80
+
81
+ ```javascript
82
+ const { convert } = require('file2md');
83
+
84
+ convert('./document.docx')
85
+   .then((result) => console.log(result.markdown));
86
+ ```
87
+
88
+ ### 버퍼에서 변환
89
+
90
+ ```typescript
91
+ import { convert } from 'file2md';
92
+ import { readFile } from 'fs/promises';
93
+
94
+ const buffer = await readFile('./document.xlsx');
95
+ const result = await convert(buffer, {
96
+ imageDir: 'spreadsheet-images'
97
+ });
98
+ ```
99
+
100
+ ## 📋 API 참조
101
+
102
+ ### `convert(input, options?)`
103
+
104
+ **매개변수:**
105
+ - `input: string | Buffer` - 파일 경로 또는 문서 데이터가 포함된 버퍼
106
+ - `options?: ConvertOptions` - 변환 옵션
107
+
108
+ **반환값:** `Promise<ConversionResult>`
109
+
110
+ ### 옵션
111
+
112
+ ```typescript
113
+ interface ConvertOptions {
114
+ imageDir?: string; // 추출된 이미지 디렉터리 (기본값: 'images')
115
+ outputDir?: string; // 슬라이드 스크린샷 출력 디렉터리 (PPTX용, imageDir로 폴백)
116
+ preserveLayout?: boolean; // 문서 레이아웃 유지 (기본값: true)
117
+ extractCharts?: boolean; // 차트를 표로 변환 (기본값: true)
118
+ extractImages?: boolean; // 임베디드 이미지 추출 (기본값: true)
119
+ maxPages?: number; // PDF 최대 페이지 수 (기본값: 무제한)
120
+ useVisualParser?: boolean; // PPTX용 향상된 시각적 파싱 (기본값: true)
121
+ }
122
+ ```
123
+
124
+ ### 결과
125
+
126
+ ```typescript
127
+ interface ConversionResult {
128
+ markdown: string; // 생성된 마크다운 콘텐츠
129
+ images: ImageData[]; // 추출된 이미지 정보
130
+ charts: ChartData[]; // 추출된 차트 데이터
131
+ metadata: DocumentMetadata; // 처리 정보가 포함된 문서 메타데이터
132
+ }
133
+ ```
134
+
135
+ ## 🎯 형식별 세부 기능
136
+
137
+ ### 📄 PDF
138
+ - ✅ **텍스트 추출** 및 레이아웃 향상
139
+ - ✅ **표 감지** 및 서식 지정
140
+ - ✅ **목록 인식** (글머리 기호, 번호)
141
+ - ✅ **제목 감지** (모든 대문자, 콜론)
142
+ - ✅ **실제 이미지 추출** pdf2pic 사용 - PDF 페이지를 PNG 이미지로 변환
143
+ - ✅ **임베디드 이미지 감지** 및 추출
144
+
145
+ ### 📝 DOCX
146
+ - ✅ **제목 계층 구조** (H1-H6)
147
+ - ✅ **텍스트 서식** (굵게, 기울임꼴)
148
+ - ✅ **복잡한 표** 병합된 셀 포함
149
+ - ✅ **중첩된 목록** 적절한 들여쓰기 포함
150
+ - ✅ **임베디드 이미지** 및 차트
151
+ - ✅ **셀 스타일링** (정렬, 색상)
152
+ - ✅ **글꼴 크기 보존** 및 서식
153
+
154
+ ### 📊 XLSX
155
+ - ✅ **여러 워크시트** 별도 섹션으로 구분
156
+ - ✅ **셀 서식** (굵게, 색상, 정렬)
157
+ - ✅ **데이터 타입 보존**
158
+ - ✅ **차트 추출** 데이터 표로 변환
159
+ - ✅ **조건부 서식** 메모
160
+ - ✅ **공유 문자열** 대용량 파일 처리
161
+
162
+ ### 🎬 PPTX
163
+ - ✅ **슬라이드별** 구성
164
+ - ✅ **텍스트 위치 지정** 및 레이아웃
165
+ - ✅ **슬라이드별 이미지 배치**
166
+ - ✅ **슬라이드에서 표 추출**
167
+ - ✅ **다중 열 레이아웃**
168
+ - ✅ **향상된 레이아웃 분석을 통한 시각적 파싱**
169
+ - ✅ **문서 속성에서 제목 추출**
170
+ - ✅ **차트 및 이미지** 인라인 임베딩
171
+
172
+ ### 🇰🇷 HWP (한국어)
173
+ - ✅ **바이너리 형식** hwp.js를 사용한 파싱
174
+ - ✅ **한국어 텍스트 추출** 적절한 인코딩 포함
175
+ - ✅ **임베디드 콘텐츠에서 이미지 추출**
176
+ - ✅ **한국어 문서용 레이아웃 보존**
177
+ - ✅ **저작권 메시지 필터링** 깔끔한 출력
178
+
179
+ ### 🇰🇷 HWPX (한국어 XML)
180
+ - ✅ **XML 기반 형식** JSZip을 사용한 파싱
181
+ - ✅ **대용량 문서용 다중 섹션 지원**
182
+ - ✅ **이미지 참조용 관계 매핑**
183
+ - ✅ **OWPML 구조** 파싱
184
+ - ✅ **향상된 한국어 텍스트** 처리
185
+ - ✅ **ZIP 아카이브에서 BinData 이미지 추출**
186
+
187
+ ## 🖼️ 이미지 처리
188
+
189
+ 이미지는 자동으로 추출되어 지정된 디렉터리에 저장됩니다:
190
+
191
+ ```typescript
192
+ const result = await convert('./presentation.pptx', {
193
+ imageDir: 'my-images'
194
+ });
195
+
196
+ // 결과 구조:
197
+ // my-images/
198
+ // ├── image_1.png
199
+ // ├── image_2.jpg
200
+ // └── chart_1.png
201
+
202
+ // 마크다운에 포함될 내용:
203
+ // ![Slide 1 Image](my-images/image_1.png)
204
+ ```
205
+
206
+ ## 📊 차트 변환
207
+
208
+ 차트는 마크다운 표로 변환됩니다:
209
+
210
+ ```markdown
211
+ #### Chart 1: 매출 데이터
212
+
213
+ | 카테고리 | 1분기 | 2분기 | 3분기 | 4분기 |
214
+ | --- | --- | --- | --- | --- |
215
+ | 매출 | 100 | 150 | 200 | 250 |
216
+ | 수익 | 20 | 30 | 45 | 60 |
217
+ ```
218
+
219
+ ## 🛡️ 오류 처리
220
+
221
+ ```typescript
222
+ import {
223
+ convert,
224
+ UnsupportedFormatError,
225
+ FileNotFoundError,
226
+ ParseError
227
+ } from 'file2md';
228
+
229
+ try {
230
+ const result = await convert('./document.pdf');
231
+ } catch (error) {
232
+ if (error instanceof UnsupportedFormatError) {
233
+ console.error('지원하지 않는 파일 형식입니다');
234
+ } else if (error instanceof FileNotFoundError) {
235
+ console.error('파일을 찾을 수 없습니다');
236
+ } else if (error instanceof ParseError) {
237
+ console.error('문서 파싱에 실패했습니다:', error.message);
238
+ }
239
+ }
240
+ ```
241
+
242
+ ## 🧪 고급 사용법
243
+
244
+ ### 일괄 처리
245
+
246
+ ```typescript
247
+ import { convert } from 'file2md';
248
+ import { readdir } from 'fs/promises';
249
+
250
+ async function convertFolder(folderPath: string) {
251
+ const files = await readdir(folderPath);
252
+ const results = [];
253
+
254
+ for (const file of files) {
255
+ if (file.match(/\.(pdf|docx|xlsx|pptx|hwp|hwpx)$/i)) {
256
+ try {
257
+ const result = await convert(`${folderPath}/${file}`, {
258
+ imageDir: 'batch-images',
259
+ extractImages: true
260
+ });
261
+ results.push({ file, success: true, result });
262
+ } catch (error) {
263
+ results.push({ file, success: false, error });
264
+ }
265
+ }
266
+ }
267
+
268
+ return results;
269
+ }
270
+ ```
271
+
272
+ ### PDF 이미지 추출 옵션
273
+
274
+ ```typescript
275
+ import { convert } from 'file2md';
276
+
277
+ // 이미지 중심 PDF용 (스캔된 문서)
278
+ const result = await convert('./scanned-document.pdf', {
279
+ imageDir: 'pdf-images',
280
+ maxPages: 10, // 대용량 PDF용 페이지 제한
281
+ extractImages: true // PDF-이미지 변환 활성화
282
+ });
283
+
284
+ console.log(`PDF에서 ${result.images.length}개의 페이지 이미지를 추출했습니다`);
285
+ ```
286
+
287
+ ## 📊 지원 형식
288
+
289
+ | 형식 | 확장자 | 레이아웃 | 이미지 | 차트 | 표 | 목록 |
290
+ |------|-------|---------|-------|------|----|----|
291
+ | PDF | `.pdf` | ✅ | ✅ | ❌ | ✅ | ✅ |
292
+ | Word | `.docx` | ✅ | ✅ | ✅ | ✅ | ✅ |
293
+ | Excel | `.xlsx` | ✅ | ❌ | ✅ | ✅ | ❌ |
294
+ | PowerPoint | `.pptx` | ✅ | ✅ | ✅ | ✅ | ❌ |
295
+ | HWP | `.hwp` | ✅ | ✅ | ❌ | ❌ | ✅ |
296
+ | HWPX | `.hwpx` | ✅ | ✅ | ❌ | ❌ | ✅ |
297
+
298
+ > **PDF 이미지**: pdf2pic 라이브러리를 사용하여 PDF 페이지를 실제 PNG 이미지로 변환
299
+
300
+ ## 🌏 한국어 문서 지원
301
+
302
+ file2md는 한국어 문서 형식에 대한 포괄적인 지원을 제공합니다:
303
+
304
+ ### HWP (한글)
305
+ - 한글 워드프로세서에서 사용하는 **바이너리 형식**
306
+ - 한국 조직에서 여전히 널리 사용되는 **레거시 형식**
307
+ - 한국어 문자 인코딩을 통한 **완전한 텍스트 추출**
308
+ - **이미지 및 차트** 추출 지원
309
+
310
+ ### HWPX (한글 XML)
311
+ - HWP의 후속작인 **현대적인 XML 기반** 형식
312
+ - XML 콘텐츠 파일을 포함한 **ZIP 아카이브 구조**
313
+ - 관계 매핑을 통한 **향상된 파싱**
314
+ - **다중 섹션** 및 복잡한 문서 지원
315
+
316
+ ### 사용 예제
317
+
318
+ ```typescript
319
+ // 한국어 문서 변환
320
+ const koreanDocs = [
321
+ 'report.hwp', // 레거시 바이너리 형식
322
+ 'document.hwpx', // 현대적 XML 형식
323
+ 'presentation.pptx'
324
+ ];
325
+
326
+ for (const doc of koreanDocs) {
327
+ const result = await convert(doc, {
328
+ imageDir: 'korean-docs-images',
329
+ preserveLayout: true
330
+ });
331
+
332
+ console.log(`📄 ${doc}: ${result.markdown.length} 문자`);
333
+ console.log(`🖼️ 이미지: ${result.images.length}`);
334
+ console.log(`⏱️ ${result.metadata.processingTime}ms에 처리 완료`);
335
+ }
336
+ ```
337
+
338
+ ## 🔧 성능 및 설정
339
+
340
+ ```typescript
341
+ import { convert } from 'file2md';
342
+
343
+ // 대용량 문서 최적화
344
+ const result = await convert('./large-document.pdf', {
345
+ maxPages: 50, // PDF 처리 페이지 제한
346
+ extractImages: true, // PDF 이미지 추출 활성화
347
+ preserveLayout: true // 레이아웃 분석 유지
348
+ });
349
+
350
+ // 향상된 PPTX 처리
351
+ const pptxResult = await convert('./presentation.pptx', {
352
+ useVisualParser: true, // 시각적 레이아웃 분석 활성화
353
+ outputDir: 'slides', // 슬라이드용 별도 디렉터리
354
+ extractCharts: true, // 차트 데이터 추출
355
+ extractImages: true // 임베디드 이미지 추출
356
+ });
357
+
358
+ // 메타데이터에서 성능 지표 확인 가능
359
+ console.log('성능 지표:');
360
+ console.log(`- 처리 시간: ${result.metadata.processingTime}ms`);
361
+ console.log(`- 처리된 페이지: ${result.metadata.pageCount}`);
362
+ console.log(`- 추출된 이미지: ${result.metadata.imageCount}`);
363
+ console.log(`- 파일 타입: ${result.metadata.fileType}`);
364
+ ```
365
+
366
+ ## 🤝 기여하기
367
+
368
+ 기여를 환영합니다! 언제든 풀 리퀘스트를 제출해 주세요.
369
+
370
+ 1. 저장소를 포크합니다
371
+ 2. 기능 브랜치를 생성합니다 (`git checkout -b feature/amazing-feature`)
372
+ 3. 변경 사항을 커밋합니다 (`git commit -m 'Add amazing feature'`)
373
+ 4. 브랜치에 푸시합니다 (`git push origin feature/amazing-feature`)
374
+ 5. 풀 리퀘스트를 엽니다
375
+
376
+ ### 개발 환경 설정
377
+
378
+ ```bash
379
+ # 저장소 복제
380
+ git clone https://github.com/ricky-clevi/file2md.git
381
+ cd file2md
382
+
383
+ # 의존성 설치
384
+ npm install
385
+
386
+ # 테스트 실행
387
+ npm test
388
+
389
+ # 프로젝트 빌드
390
+ npm run build
391
+
392
+ # 린팅 실행
393
+ npm run lint
394
+ ```
395
+
396
+ ## 📄 라이센스
397
+
398
+ 이 프로젝트는 MIT 라이센스에 따라 라이센스가 부여됩니다. 자세한 내용은 [LICENSE](LICENSE) 파일을 참조하세요.
399
+
400
+ ## 🔗 링크
401
+
402
+ - [npm 패키지](https://www.npmjs.com/package/file2md)
403
+ - [GitHub 저장소](https://github.com/ricky-clevi/file2md)
404
+ - [이슈 및 버그 신고](https://github.com/ricky-clevi/file2md/issues)
405
+
406
+ ---
407
+
408
+ **❤️와 TypeScript로 제작** • **🖼️ 실제 PDF 이미지 추출 기능 향상** • **🇰🇷 한국어 문서 지원**
package/README.md CHANGED
@@ -4,13 +4,15 @@
4
4
  [![TypeScript](https://img.shields.io/badge/TypeScript-Ready-blue.svg)](https://www.typescriptlang.org/)
5
5
  [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
6
6
 
7
- A modern TypeScript library for converting various document types (PDF, DOCX, XLSX, PPTX, HWP, HWPX) into Markdown with **advanced layout preservation**, **image extraction**, **chart conversion**, and **Korean language support**.
7
+ A modern TypeScript library for converting various document types (PDF, DOCX, XLSX, PPTX, HWP, HWPX) into Markdown with **advanced layout preservation**, **real PDF image extraction**, **chart conversion**, and **Korean language support**.
8
+
9
+ **English** | [한국어](README.ko.md)
8
10
 
9
11
  ## ✨ Features
10
12
 
11
13
  - 🔄 **Multiple Format Support**: PDF, DOCX, XLSX, PPTX, HWP, HWPX
12
14
  - 🎨 **Layout Preservation**: Maintains document structure, tables, and formatting
13
- - 🖼️ **Image Extraction**: Automatically extracts and references images
15
+ - 🖼️ **Real PDF Image Extraction**: Convert PDF pages to actual PNG images using pdf2pic
14
16
  - 📊 **Chart Conversion**: Converts charts to Markdown tables
15
17
  - 📝 **List & Table Support**: Proper nested lists and complex tables
16
18
  - 🌏 **Korean Language Support**: Full support for HWP/HWPX Korean document formats
@@ -137,8 +139,8 @@ interface ConversionResult {
137
139
  - ✅ **Table detection** and formatting
138
140
  - ✅ **List recognition** (bullets, numbers)
139
141
  - ✅ **Heading detection** (ALL CAPS, colons)
140
- - ✅ **Page-to-image fallback** for complex layouts
141
- - ✅ **Embedded image extraction** when available
142
+ - ✅ **Real image extraction** using pdf2pic - converts PDF pages to PNG images
143
+ - ✅ **Embedded image detection** and extraction
142
144
 
143
145
  ### 📝 DOCX
144
146
  - ✅ **Heading hierarchy** (H1-H6)
@@ -239,23 +241,6 @@ try {
239
241
 
240
242
  ## 🧪 Advanced Usage
241
243
 
242
- ### Custom Error Handling
243
-
244
- ```typescript
245
- import { convert, ConversionError } from 'file2md';
246
-
247
- try {
248
- const result = await convert('./complex-document.docx');
249
- } catch (error) {
250
- if (error instanceof ConversionError) {
251
- console.error(`Conversion failed [${error.code}]:`, error.message);
252
- if (error.originalError) {
253
- console.error('Original error:', error.originalError);
254
- }
255
- }
256
- }
257
- ```
258
-
259
244
  ### Batch Processing
260
245
 
261
246
  ```typescript
@@ -267,9 +252,12 @@ async function convertFolder(folderPath: string) {
267
252
  const results = [];
268
253
 
269
254
  for (const file of files) {
270
- if (file.match(/\.(pdf|docx|xlsx|pptx)$/i)) {
255
+ if (file.match(/\.(pdf|docx|xlsx|pptx|hwp|hwpx)$/i)) {
271
256
  try {
272
- const result = await convert(`${folderPath}/${file}`);
257
+ const result = await convert(`${folderPath}/${file}`, {
258
+ imageDir: 'batch-images',
259
+ extractImages: true
260
+ });
273
261
  results.push({ file, success: true, result });
274
262
  } catch (error) {
275
263
  results.push({ file, success: false, error });
@@ -281,64 +269,34 @@ async function convertFolder(folderPath: string) {
281
269
  }
282
270
  ```
283
271
 
284
- ## 🏗️ Development
272
+ ### PDF Image Extraction Options
285
273
 
286
- ### Build from Source
287
-
288
- ```bash
289
- git clone https://github.com/yourusername/file2md.git
290
- cd file2md
291
- npm install
292
- npm run build
293
- ```
294
-
295
- ### Testing
296
-
297
- ```bash
298
- npm test # Run tests
299
- npm run test:watch # Watch mode
300
- npm run test:coverage # Coverage report
301
- ```
274
+ ```typescript
275
+ import { convert } from 'file2md';
302
276
 
303
- ### Linting
277
+ // For image-heavy PDFs (scanned documents)
278
+ const result = await convert('./scanned-document.pdf', {
279
+ imageDir: 'pdf-images',
280
+ maxPages: 10, // Limit pages for large PDFs
281
+ extractImages: true // Enable PDF-to-image conversion
282
+ });
304
283
 
305
- ```bash
306
- npm run lint # Check code style
307
- npm run lint:fix # Fix issues
284
+ console.log(`Extracted ${result.images.length} page images from PDF`);
308
285
  ```
309
286
 
310
- ## 🤝 Contributing
311
-
312
- Contributions are welcome! Please feel free to submit a Pull Request.
313
-
314
- 1. Fork the repository
315
- 2. Create your feature branch (`git checkout -b feature/amazing-feature`)
316
- 3. Commit your changes (`git commit -m 'Add amazing feature'`)
317
- 4. Push to the branch (`git push origin feature/amazing-feature`)
318
- 5. Open a Pull Request
319
-
320
- ## 📄 License
321
-
322
- This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
323
-
324
- ## 🔗 Links
325
-
326
- - [npm package](https://www.npmjs.com/package/file2md)
327
- - [GitHub repository](https://github.com/yourusername/file2md)
328
- - [Issues & Bug Reports](https://github.com/yourusername/file2md/issues)
329
287
 
330
288
  ## 📊 Supported Formats
331
289
 
332
290
  | Format | Extension | Layout | Images | Charts | Tables | Lists |
333
291
  |--------|-----------|---------|---------|---------|---------|--------|
334
- | PDF | `.pdf` | ✅ | ✅* | ❌ | ✅ | ✅ |
292
+ | PDF | `.pdf` | ✅ | ✅ | ❌ | ✅ | ✅ |
335
293
  | Word | `.docx` | ✅ | ✅ | ✅ | ✅ | ✅ |
336
294
  | Excel | `.xlsx` | ✅ | ❌ | ✅ | ✅ | ❌ |
337
295
  | PowerPoint | `.pptx` | ✅ | ✅ | ✅ | ✅ | ❌ |
338
296
  | HWP | `.hwp` | ✅ | ✅ | ❌ | ❌ | ✅ |
339
297
  | HWPX | `.hwpx` | ✅ | ✅ | ❌ | ❌ | ✅ |
340
298
 
341
- *PDF images via page-to-image conversion or embedded extraction
299
+ > **PDF Images**: Converts PDF pages to actual PNG images using pdf2pic library
342
300
 
343
301
  ## 🌏 Korean Document Support
344
302
 
@@ -378,9 +336,7 @@ for (const doc of koreanDocs) {
378
336
  }
379
337
  ```
380
338
 
381
- ## 🔧 Advanced Configuration
382
-
383
- ### Performance Optimization
339
+ ## 🔧 Performance & Configuration
384
340
 
385
341
  ```typescript
386
342
  import { convert } from 'file2md';
@@ -388,7 +344,7 @@ import { convert } from 'file2md';
388
344
  // Optimize for large documents
389
345
  const result = await convert('./large-document.pdf', {
390
346
  maxPages: 50, // Limit PDF processing
391
- extractImages: false, // Disable images for speed
347
+ extractImages: true, // Enable PDF image extraction
392
348
  preserveLayout: true // Keep layout analysis
393
349
  });
394
350
 
@@ -399,41 +355,13 @@ const pptxResult = await convert('./presentation.pptx', {
399
355
  extractCharts: true, // Extract chart data
400
356
  extractImages: true // Extract embedded images
401
357
  });
402
- ```
403
-
404
- ### Error Handling for Korean Documents
405
-
406
- ```typescript
407
- import { convert, ParseError } from 'file2md';
408
-
409
- try {
410
- const result = await convert('./korean-document.hwp');
411
- console.log('Korean document converted successfully');
412
- } catch (error) {
413
- if (error instanceof ParseError) {
414
- console.error(`Failed to parse ${error.format} document:`, error.message);
415
- // Handle Korean-specific parsing errors
416
- if (error.format === 'HWP' || error.format === 'HWPX') {
417
- console.log('Try converting to HWPX format for better compatibility');
418
- }
419
- }
420
- }
421
- ```
422
-
423
- ## 📈 Performance Metrics
424
-
425
- The library provides detailed performance metrics in the metadata:
426
-
427
- ```typescript
428
- const result = await convert('./document.docx');
429
358
 
359
+ // Performance metrics are available in metadata
430
360
  console.log('Performance Metrics:');
431
361
  console.log(`- Processing time: ${result.metadata.processingTime}ms`);
432
362
  console.log(`- Pages processed: ${result.metadata.pageCount}`);
433
363
  console.log(`- Images extracted: ${result.metadata.imageCount}`);
434
- console.log(`- Charts found: ${result.metadata.chartCount}`);
435
364
  console.log(`- File type: ${result.metadata.fileType}`);
436
- console.log(`- MIME type: ${result.metadata.mimeType}`);
437
365
  ```
438
366
 
439
367
  ## 🤝 Contributing
@@ -466,18 +394,6 @@ npm run build
466
394
  npm run lint
467
395
  ```
468
396
 
469
- ### Testing Korean Documents
470
-
471
- When testing Korean document support:
472
-
473
- ```bash
474
- # Run specific tests for Korean formats
475
- npm test -- --testNamePattern="HWP"
476
-
477
- # Run with coverage for Korean parsers
478
- npm run test:coverage -- --collectCoverageFrom="src/parsers/hwp-*.ts"
479
- ```
480
-
481
397
  ## 📄 License
482
398
 
483
399
  This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
@@ -487,9 +403,7 @@ This project is licensed under the MIT License - see the [LICENSE](LICENSE) file
487
403
  - [npm package](https://www.npmjs.com/package/file2md)
488
404
  - [GitHub repository](https://github.com/ricky-clevi/file2md)
489
405
  - [Issues & Bug Reports](https://github.com/ricky-clevi/file2md/issues)
490
- - [Korean Document Format Info](https://www.hancom.com/)
491
406
 
492
407
  ---
493
408
 
494
- **Made with ❤️ and TypeScript**
495
- **🇰🇷 Enhanced with Korean document support**
409
+ **Made with ❤️ and TypeScript** • **🖼️ Enhanced with real PDF image extraction** • **🇰🇷 Korean document support**
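Note on the batch-processing change: the README's filter pattern now also matches `.hwp` and `.hwpx`. A minimal sketch of that filter in isolation, assuming nothing about file2md itself; `isSupportedDocument` and `selectConvertible` are hypothetical helper names used only for illustration:

```typescript
// Hypothetical helpers; they are not part of the file2md API.
// The pattern is the one shown in the updated batch-processing example.
const SUPPORTED_EXTENSIONS = /\.(pdf|docx|xlsx|pptx|hwp|hwpx)$/i;

// True when the file name ends with one of the six supported extensions,
// compared case-insensitively.
function isSupportedDocument(fileName: string): boolean {
  return SUPPORTED_EXTENSIONS.test(fileName);
}

// Keeps only the names a batch run would hand to the converter.
function selectConvertible(fileNames: readonly string[]): string[] {
  return fileNames.filter(isSupportedDocument);
}
```

Because the pattern is anchored with `$`, a name such as `report.hwp.bak` is skipped rather than passed to the converter.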
@@ -1 +1 @@
1
- {"version":3,"file":"pdf-extractor.d.ts","sourceRoot":"","sources":["../../src/utils/pdf-extractor.ts"],"names":[],"mappings":"AACA,OAAO,EAAE,MAAM,EAAE,MAAM,aAAa,CAAC;AAErC,OAAO,KAAK,EAAE,QAAQ,EAAE,MAAM,wBAAwB,CAAC;AAEvD,OAAO,KAAK,EAAE,cAAc,EAAE,MAAM,sBAAsB,CAAC;AAW3D,MAAM,WAAW,eAAe;IAC9B,QAAQ,CAAC,QAAQ,CAAC,EAAE,MAAM,CAAC;IAC3B,QAAQ,CAAC,cAAc,CAAC,EAAE,OAAO,CAAC;CACnC;AAED,MAAM,WAAW,cAAc;IAC7B,QAAQ,CAAC,QAAQ,EAAE,MAAM,CAAC;IAC1B,QAAQ,CAAC,MAAM,EAAE,SAAS,OAAO,wBAAwB,EAAE,SAAS,EAAE,CAAC;IACvE,QAAQ,CAAC,SAAS,EAAE,MAAM,CAAC;IAC3B,QAAQ,CAAC,QAAQ,EAAE,MAAM,CAAC,MAAM,EAAE,OAAO,CAAC,CAAC;CAC5C;AAED,qBAAa,YAAY;IACvB,OAAO,CAAC,QAAQ,CAAC,cAAc,CAAiB;IAChD,OAAO,CAAC,WAAW,CAAa;gBAEpB,cAAc,EAAE,cAAc;IAI1C;;OAEG;IACG,oBAAoB,CAAC,MAAM,EAAE,MAAM,GAAG,OAAO,CAAC,SAAS,QAAQ,EAAE,CAAC;IAyCxE;;OAEG;YACW,kBAAkB;IA6DhC;;OAEG;YACW,kBAAkB;IA6BhC;;OAEG;IACG,qBAAqB,CAAC,IAAI,EAAE,MAAM,EAAE,QAAQ,CAAC,EAAE,OAAO,GAAG,OAAO,CAAC,MAAM,CAAC;IAiE9E,OAAO,CAAC,eAAe;IAoBvB,OAAO,CAAC,qBAAqB;IAO7B,OAAO,CAAC,gBAAgB;IAaxB,OAAO,CAAC,aAAa;IAgBrB,OAAO,CAAC,eAAe;IA+BvB,OAAO,CAAC,UAAU;IAYlB,OAAO,CAAC,cAAc;IAatB;;OAEG;IACG,gBAAgB,CAAC,UAAU,EAAE,SAAS,QAAQ,EAAE,GAAG,OAAO,CAAC,MAAM,CAAC;IAgBxE;;OAEG;IACH,KAAK,IAAI,IAAI;IAIb;;OAEG;IACH,IAAI,gBAAgB,IAAI,MAAM,CAE7B;CACF"}
1
+ {"version":3,"file":"pdf-extractor.d.ts","sourceRoot":"","sources":["../../src/utils/pdf-extractor.ts"],"names":[],"mappings":"AACA,OAAO,EAAE,MAAM,EAAE,MAAM,aAAa,CAAC;AAErC,OAAO,KAAK,EAAE,QAAQ,EAAE,MAAM,wBAAwB,CAAC;AAEvD,OAAO,KAAK,EAAE,cAAc,EAAE,MAAM,sBAAsB,CAAC;AAW3D,MAAM,WAAW,eAAe;IAC9B,QAAQ,CAAC,QAAQ,CAAC,EAAE,MAAM,CAAC;IAC3B,QAAQ,CAAC,cAAc,CAAC,EAAE,OAAO,CAAC;CACnC;AAED,MAAM,WAAW,cAAc;IAC7B,QAAQ,CAAC,QAAQ,EAAE,MAAM,CAAC;IAC1B,QAAQ,CAAC,MAAM,EAAE,SAAS,OAAO,wBAAwB,EAAE,SAAS,EAAE,CAAC;IACvE,QAAQ,CAAC,SAAS,EAAE,MAAM,CAAC;IAC3B,QAAQ,CAAC,QAAQ,EAAE,MAAM,CAAC,MAAM,EAAE,OAAO,CAAC,CAAC;CAC5C;AAED,qBAAa,YAAY;IACvB,OAAO,CAAC,QAAQ,CAAC,cAAc,CAAiB;IAChD,OAAO,CAAC,WAAW,CAAa;gBAEpB,cAAc,EAAE,cAAc;IAI1C;;OAEG;IACG,oBAAoB,CAAC,MAAM,EAAE,MAAM,GAAG,OAAO,CAAC,SAAS,QAAQ,EAAE,CAAC;IAgDxE;;OAEG;YACW,kBAAkB;IA4EhC;;OAEG;YACW,kBAAkB;IAqFhC;;OAEG;IACG,qBAAqB,CAAC,IAAI,EAAE,MAAM,EAAE,QAAQ,CAAC,EAAE,OAAO,GAAG,OAAO,CAAC,MAAM,CAAC;IAiE9E,OAAO,CAAC,eAAe;IAoBvB,OAAO,CAAC,qBAAqB;IAO7B,OAAO,CAAC,gBAAgB;IAaxB,OAAO,CAAC,aAAa;IAgBrB,OAAO,CAAC,eAAe;IA+BvB,OAAO,CAAC,UAAU;IAYlB,OAAO,CAAC,cAAc;IAatB;;OAEG;IACG,gBAAgB,CAAC,UAAU,EAAE,SAAS,QAAQ,EAAE,GAAG,OAAO,CAAC,MAAM,CAAC;IAgBxE;;OAEG;IACH,KAAK,IAAI,IAAI;IAIb;;OAEG;IACH,IAAI,gBAAgB,IAAI,MAAM,CAE7B;CACF"}
@@ -30,6 +30,11 @@ export class PDFExtractor {
30
30
  }
31
31
  catch (pdf2picError) {
32
32
  console.warn('⚠️ pdf2pic extraction failed:', pdf2picError instanceof Error ? pdf2picError.message : 'Unknown error');
33
+ // Check if the error suggests missing dependencies
34
+ const errorMessage = pdf2picError instanceof Error ? pdf2picError.message : '';
35
+ if (errorMessage.includes('GraphicsMagick') || errorMessage.includes('ImageMagick') || errorMessage.includes('command not found')) {
36
+ console.log('💡 Consider installing GraphicsMagick or ImageMagick for PDF-to-image conversion');
37
+ }
33
38
  // Fall back to placeholder creation only if pdf2pic fails
34
39
  return await this.createPlaceholders(pdfData.numpages);
35
40
  }
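Note on the hunk above: the new check scans the pdf2pic error message for three substrings before printing an install hint. The same logic can be sketched as a standalone predicate; the function and constant names here are hypothetical and do not appear in the package:

```typescript
// Substrings the diff checks for in the pdf2pic error message.
const DEPENDENCY_MARKERS = ["GraphicsMagick", "ImageMagick", "command not found"] as const;

// Mirrors the diff: non-Error values yield an empty message.
function errorMessageOf(error: unknown): string {
  return error instanceof Error ? error.message : "";
}

// Hypothetical helper: true when the message hints at a missing system dependency.
function suggestsMissingDependency(error: unknown): boolean {
  const message = errorMessageOf(error);
  return DEPENDENCY_MARKERS.some((marker) => message.includes(marker));
}
```

As in the compiled code, a thrown value that is not an `Error` instance never triggers the hint, because its message is treated as empty.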
@@ -53,37 +58,49 @@ export class PDFExtractor {
53
58
  try {
54
59
  const { fromBuffer } = await import('pdf2pic');
55
60
  console.log(`🔄 Converting PDF to images (max ${maxPages} pages)...`);
61
+ // Configure pdf2pic options to save directly to files
62
+ const convertOptions = {
63
+ format: 'png',
64
+ out_dir: this.imageExtractor.imageDirectory,
65
+ out_prefix: 'pdf_page'
66
+ };
67
+ // Convert PDF buffer to images
68
+ const convert = fromBuffer(buffer, convertOptions);
56
69
  const extractedPages = [];
57
- // Convert each page individually to have more control
70
+ // Convert pages one by one
58
71
  for (let pageNumber = 1; pageNumber <= maxPages; pageNumber++) {
59
72
  try {
60
- // Configure pdf2pic options for this specific page
61
- const convertOptions = {
62
- format: 'png',
63
- out_dir: this.imageExtractor.imageDirectory,
64
- out_prefix: `pdf_page_${pageNumber}`,
65
- page: pageNumber
66
- };
67
- // Convert single page from buffer
68
- const convert = fromBuffer(buffer, convertOptions);
69
- const result = await convert(pageNumber, true); // true for returning base64
70
- if (result && 'base64' in result && result.base64) {
71
- // Convert base64 to buffer
72
- const imageBuffer = Buffer.from(result.base64, 'base64');
73
- const filename = `pdf_page_${pageNumber}.png`;
74
- // Save the image using the image extractor
75
- const savedPath = await this.imageExtractor.saveImage(imageBuffer, filename);
76
- if (savedPath) {
77
- extractedPages.push({
78
- pageNumber,
79
- imagePath: path.basename(savedPath),
80
- fullPath: savedPath,
81
- dimensions: {
82
- width: 800, // Default dimensions since pdf2pic doesn't always provide them
83
- height: 600
84
- }
85
- });
86
- console.log(`✅ Converted page ${pageNumber} to image`);
73
+ const result = await convert(pageNumber);
74
+ if (result && 'path' in result && result.path) {
75
+ const filename = path.basename(result.path);
76
+ try {
77
+ // Read the generated image file
78
+ const fs = await import('fs/promises');
79
+ const imageBuffer = await fs.readFile(result.path);
80
+ // Save using image extractor to ensure proper naming and location
81
+ const savedPath = await this.imageExtractor.saveImage(imageBuffer, `pdf_page_${pageNumber}.png`);
82
+ if (savedPath) {
83
+ extractedPages.push({
84
+ pageNumber,
85
+ imagePath: path.basename(savedPath),
86
+ fullPath: savedPath,
87
+ dimensions: {
88
+ width: ('width' in result && typeof result.width === 'number') ? result.width : 800,
89
+ height: ('height' in result && typeof result.height === 'number') ? result.height : 600
90
+ }
91
+ });
92
+ console.log(`✅ Converted page ${pageNumber} to image`);
93
+ }
94
+ // Clean up the temporary file
95
+ try {
96
+ await fs.unlink(result.path);
97
+ }
98
+ catch (unlinkError) {
99
+ console.warn(`⚠️ Failed to clean up temp file ${result.path}`);
100
+ }
101
+ }
102
+ catch (fileError) {
103
+ console.warn(`⚠️ Failed to process image file for page ${pageNumber}:`, fileError instanceof Error ? fileError.message : 'Unknown error');
87
104
  }
88
105
  }
89
106
  }
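Note on the hunk above: page dimensions now come from the conversion result when it reports numeric `width` and `height`, and fall back to 800x600 otherwise. A minimal sketch of that fallback, assuming the result can be treated as a plain record; `resolveDimensions` is a hypothetical name:

```typescript
interface Dimensions {
  width: number;
  height: number;
}

// Fallback values taken from the diff.
const DEFAULT_DIMENSIONS: Dimensions = { width: 800, height: 600 };

// Hypothetical helper: uses a reported dimension only when it is a number,
// checking width and height independently as the compiled code does.
function resolveDimensions(result: Record<string, unknown>): Dimensions {
  const { width, height } = result;
  return {
    width: typeof width === "number" ? width : DEFAULT_DIMENSIONS.width,
    height: typeof height === "number" ? height : DEFAULT_DIMENSIONS.height,
  };
}
```

Each dimension is resolved on its own, so a result with a numeric height but a missing width still keeps the reported height.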
@@ -95,6 +112,7 @@ export class PDFExtractor {
95
112
  if (extractedPages.length === 0) {
96
113
  throw new Error('No pages could be converted to images');
97
114
  }
115
+ console.log(`🎉 Successfully converted ${extractedPages.length} pages to images`);
98
116
  return extractedPages;
99
117
  }
100
118
  catch (error) {
@@ -106,23 +124,74 @@ export class PDFExtractor {
106
124
  * Create placeholder files as fallback when pdf2pic fails
107
125
  */
108
126
  async createPlaceholders(pageCount) {
109
- console.log('📝 Creating placeholders as fallback...');
127
+ console.log('📝 Creating image placeholders as fallback...');
110
128
  const extractedPages = [];
111
129
  const maxPages = Math.min(pageCount, 3);
112
130
  for (let page = 1; page <= maxPages; page++) {
113
- const placeholderContent = `PDF Page ${page} Image Placeholder\n\nThis page appears to contain primarily image content.\nUnable to extract actual image - pdf2pic conversion failed.\n\nPage ${page} of ${pageCount}`;
114
- const placeholderBuffer = Buffer.from(placeholderContent, 'utf-8');
115
- // Save placeholder as a text file
116
- const filename = `pdf_page_${page}_placeholder.txt`;
117
- // Use the image extractor to save the placeholder
118
- const savedPath = await this.imageExtractor.saveImage(placeholderBuffer, filename);
119
- if (savedPath) {
120
- extractedPages.push({
121
- pageNumber: page,
122
- imagePath: path.basename(savedPath),
123
- fullPath: savedPath
124
- });
125
- console.log(`✅ Created placeholder for page ${page}`);
131
+ try {
132
+ // Create a simple placeholder image using Sharp
133
+ const sharp = await import('sharp');
134
+ // Create a 800x600 placeholder image with text
135
+ const placeholderImage = await sharp.default({
136
+ create: {
137
+ width: 800,
138
+ height: 600,
139
+ channels: 4,
140
+ background: { r: 240, g: 240, b: 240, alpha: 1 }
141
+ }
142
+ })
143
+ .png()
144
+ .composite([
145
+ {
146
+ input: Buffer.from(`<svg width="800" height="600" xmlns="http://www.w3.org/2000/svg">
147
+ <rect width="800" height="600" fill="#f0f0f0" stroke="#ccc" stroke-width="2"/>
148
+ <text x="400" y="250" text-anchor="middle" font-family="Arial" font-size="24" fill="#666">PDF Page ${page}</text>
149
+ <text x="400" y="290" text-anchor="middle" font-family="Arial" font-size="16" fill="#888">Image extraction failed</text>
150
+ <text x="400" y="320" text-anchor="middle" font-family="Arial" font-size="16" fill="#888">Page ${page} of ${pageCount}</text>
151
+ <text x="400" y="360" text-anchor="middle" font-family="Arial" font-size="14" fill="#aaa">Install GraphicsMagick for better PDF support</text>
152
+ </svg>`),
153
+ top: 0,
154
+ left: 0,
155
+ }
156
+ ])
157
+ .toBuffer();
158
+ const filename = `pdf_page_${page}_placeholder.png`;
159
+ // Use the image extractor to save the placeholder
160
+ const savedPath = await this.imageExtractor.saveImage(placeholderImage, filename);
161
+ if (savedPath) {
162
+ extractedPages.push({
163
+ pageNumber: page,
164
+ imagePath: path.basename(savedPath),
165
+ fullPath: savedPath,
166
+ dimensions: {
167
+ width: 800,
168
+ height: 600
169
+ }
170
+ });
171
+ console.log(`✅ Created image placeholder for page ${page}`);
172
+ }
173
+ }
174
+ catch (sharpError) {
175
+ console.warn(`⚠️ Failed to create image placeholder for page ${page}:`, sharpError instanceof Error ? sharpError.message : 'Unknown error');
176
+ // Fallback to simple text-based approach without Sharp conversion
177
+ const filename = `pdf_page_${page}_info.txt`;
178
+ const placeholderContent = `PDF Page ${page} - Image extraction failed\n\nPage ${page} of ${pageCount}\n\nInstall GraphicsMagick for better PDF image support.`;
179
+ const placeholderBuffer = Buffer.from(placeholderContent, 'utf-8');
180
+ // Save directly without image conversion
181
+ const fs = await import('fs/promises');
182
+ const fullPath = path.join(this.imageExtractor.imageDirectory, filename);
183
+ try {
184
+ await fs.writeFile(fullPath, placeholderBuffer);
185
+ extractedPages.push({
186
+ pageNumber: page,
187
+ imagePath: filename,
188
+ fullPath: path.resolve(fullPath)
189
+ });
190
+ console.log(`✅ Created text placeholder for page ${page}`);
191
+ }
192
+ catch (writeError) {
193
+ console.warn(`⚠️ Failed to write placeholder for page ${page}:`, writeError instanceof Error ? writeError.message : 'Unknown error');
194
+ }
126
195
  }
127
196
  }
128
197
  return extractedPages;
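Note on the hunk above: the fallback now renders a PNG placeholder from an SVG overlay for at most three pages. The pure parts of that logic (the page cap and the SVG text) can be sketched without Sharp; the helper names below are hypothetical, and the SVG markup is copied from the diff:

```typescript
// The diff caps placeholders with Math.min(pageCount, 3).
const MAX_PLACEHOLDER_PAGES = 3;

// Hypothetical helper: the 1-based page numbers that receive a placeholder.
function placeholderPageNumbers(pageCount: number): number[] {
  const maxPages = Math.min(pageCount, MAX_PLACEHOLDER_PAGES);
  const pages: number[] = [];
  for (let page = 1; page <= maxPages; page++) {
    pages.push(page);
  }
  return pages;
}

// Hypothetical helper: builds the 800x600 SVG overlay text for one page.
function buildPlaceholderSvg(page: number, pageCount: number): string {
  return [
    `<svg width="800" height="600" xmlns="http://www.w3.org/2000/svg">`,
    `<rect width="800" height="600" fill="#f0f0f0" stroke="#ccc" stroke-width="2"/>`,
    `<text x="400" y="250" text-anchor="middle" font-family="Arial" font-size="24" fill="#666">PDF Page ${page}</text>`,
    `<text x="400" y="290" text-anchor="middle" font-family="Arial" font-size="16" fill="#888">Image extraction failed</text>`,
    `<text x="400" y="320" text-anchor="middle" font-family="Arial" font-size="16" fill="#888">Page ${page} of ${pageCount}</text>`,
    `<text x="400" y="360" text-anchor="middle" font-family="Arial" font-size="14" fill="#aaa">Install GraphicsMagick for better PDF support</text>`,
    `</svg>`,
  ].join("\n");
}
```

In the package, the resulting string is wrapped in a buffer and composited onto a blank canvas by Sharp; that step is omitted here so the sketch stays dependency-free.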
@@ -1 +1 @@
1
- {"version":3,"file":"pdf-extractor.js","sourceRoot":"","sources":["../../src/utils/pdf-extractor.ts"],"names":[],"mappings":"AAAA,OAAO,IAAI,MAAM,WAAW,CAAC;AAC7B,OAAO,EAAE,MAAM,EAAE,MAAM,aAAa,CAAC;AA2BrC,MAAM,OAAO,YAAY;IACN,cAAc,CAAiB;IACxC,WAAW,GAAW,CAAC,CAAC;IAEhC,YAAY,cAA8B;QACxC,IAAI,CAAC,cAAc,GAAG,cAAc,CAAC;IACvC,CAAC;IAED;;OAEG;IACH,KAAK,CAAC,oBAAoB,CAAC,MAAc;QACvC,IAAI,CAAC;YACH,OAAO,CAAC,GAAG,CAAC,yDAAyD,CAAC,CAAC;YAEvE,yCAAyC;YACzC,MAAM,QAAQ,GAAG,MAAM,MAAM,CAAC,WAAW,CAAC,CAAC;YAC3C,MAAM,OAAO,GAAG,MAAM,QAAQ,CAAC,OAAO,CAAC,MAAM,CAAC,CAAC;YAE/C,OAAO,CAAC,GAAG,CAAC,4BAA4B,OAAO,CAAC,QAAQ,wBAAwB,OAAO,CAAC,IAAI,EAAE,MAAM,IAAI,CAAC,EAAE,CAAC,CAAC;YAE7G,oEAAoE;YACpE,MAAM,YAAY,GAAG,CAAC,OAAO,CAAC,IAAI,IAAI,OAAO,CAAC,IAAI,CAAC,IAAI,EAAE,CAAC,MAAM,GAAG,GAAG,CAAC;YAEvE,4FAA4F;YAC5F,IAAI,YAAY,IAAI,OAAO,CAAC,QAAQ,IAAI,CAAC,EAAE,CAAC;gBAC1C,OAAO,CAAC,GAAG,CAAC,uEAAuE,CAAC,CAAC;gBAErF,IAAI,CAAC;oBACH,MAAM,cAAc,GAAG,MAAM,IAAI,CAAC,kBAAkB,CAAC,MAAM,EAAE,IAAI,CAAC,GAAG,CAAC,OAAO,CAAC,QAAQ,EAAE,CAAC,CAAC,CAAC,CAAC;oBAC5F,IAAI,cAAc,CAAC,MAAM,GAAG,CAAC,EAAE,CAAC;wBAC9B,OAAO,CAAC,GAAG,CAAC,6BAA6B,cAAc,CAAC,MAAM,cAAc,CAAC,CAAC;wBAC9E,OAAO,cAAc,CAAC;oBACxB,CAAC;gBACH,CAAC;gBAAC,OAAO,YAAqB,EAAE,CAAC;oBAC/B,OAAO,CAAC,IAAI,CAAC,+BAA+B,EAAE,YAAY,YAAY,KAAK,CAAC,CAAC,CAAC,YAAY,CAAC,OAAO,CAAC,CAAC,CAAC,eAAe,CAAC,CAAC;oBACtH,0DAA0D;oBAC1D,OAAO,MAAM,IAAI,CAAC,kBAAkB,CAAC,OAAO,CAAC,QAAQ,CAAC,CAAC;gBACzD,CAAC;YACH,CAAC;iBAAM,CAAC;gBACN,OAAO,CAAC,GAAG,CAAC,8DAA8D,CAAC,CAAC;gBAC5E,OAAO,EAAE,CAAC;YACZ,CAAC;YAED,OAAO,EAAE,CAAC;QACZ,CAAC;QAAC,OAAO,KAAc,EAAE,CAAC;YACxB,OAAO,CAAC,IAAI,CAAC,yBAAyB,EAAE,KAAK,YAAY,KAAK,CAAC,CAAC,CAAC,KAAK,CAAC,OAAO,CAAC,CAAC,CAAC,eAAe,CAAC,CAAC;YAClG,4EAA4E;YAC5E,OAAO,EAAE,CAAC;QACZ,CAAC;IACH,CAAC;IAED;;OAEG;IACK,KAAK,CAAC,kBAAkB,CAAC,MAAc,EAAE,WAAmB,CAAC;QACnE,IAAI,CAAC;YACH,MAAM,EAAE,UAAU,EAAE,GAAG,MAAM,MAAM,CAAC,SAAS,CAAC,CAAC;YAE/C,OAAO,CAAC,GAAG,CAAC,oCAAoC,QAAQ,YAAY,CAAC,CAAC;YAEtE,MAAM,cAAc,GAAe,EAAE,CAAC;YAEtC,sDAAsD;YACtD,KAAK,IAAI,UAAU,GAAG,CAAC,EAAE,UAAU,IAA
I,QAAQ,EAAE,UAAU,EAAE,EAAE,CAAC;gBAC9D,IAAI,CAAC;oBACH,mDAAmD;oBACnD,MAAM,cAAc,GAAG;wBACrB,MAAM,EAAE,KAAc;wBACtB,OAAO,EAAE,IAAI,CAAC,cAAc,CAAC,cAAc;wBAC3C,UAAU,EAAE,YAAY,UAAU,EAAE;wBACpC,IAAI,EAAE,UAAU;qBACjB,CAAC;oBAEF,kCAAkC;oBAClC,MAAM,OAAO,GAAG,UAAU,CAAC,MAAM,EAAE,cAAc,CAAC,CAAC;oBACnD,MAAM,MAAM,GAAG,MAAM,OAAO,CAAC,UAAU,EAAE,IAAI,CAAC,CAAC,CAAC,4BAA4B;oBAE5E,IAAI,MAAM,IAAI,QAAQ,IAAI,MAAM,IAAI,MAAM,CAAC,MAAM,EAAE,CAAC;wBAClD,2BAA2B;wBAC3B,MAAM,WAAW,GAAG,MAAM,CAAC,IAAI,CAAC,MAAM,CAAC,MAAM,EAAE,QAAQ,CAAC,CAAC;wBACzD,MAAM,QAAQ,GAAG,YAAY,UAAU,MAAM,CAAC;wBAE9C,2CAA2C;wBAC3C,MAAM,SAAS,GAAG,MAAM,IAAI,CAAC,cAAc,CAAC,SAAS,CAAC,WAAW,EAAE,QAAQ,CAAC,CAAC;wBAE7E,IAAI,SAAS,EAAE,CAAC;4BACd,cAAc,CAAC,IAAI,CAAC;gCAClB,UAAU;gCACV,SAAS,EAAE,IAAI,CAAC,QAAQ,CAAC,SAAS,CAAC;gCACnC,QAAQ,EAAE,SAAS;gCACnB,UAAU,EAAE;oCACV,KAAK,EAAE,GAAG,EAAE,+DAA+D;oCAC3E,MAAM,EAAE,GAAG;iCACZ;6BACF,CAAC,CAAC;4BACH,OAAO,CAAC,GAAG,CAAC,oBAAoB,UAAU,WAAW,CAAC,CAAC;wBACzD,CAAC;oBACH,CAAC;gBACH,CAAC;gBAAC,OAAO,SAAkB,EAAE,CAAC;oBAC5B,OAAO,CAAC,IAAI,CAAC,6BAA6B,UAAU,GAAG,EAAE,SAAS,YAAY,KAAK,CAAC,CAAC,CAAC,SAAS,CAAC,OAAO,CAAC,CAAC,CAAC,eAAe,CAAC,CAAC;oBAC3H,0BAA0B;gBAC5B,CAAC;YACH,CAAC;YAED,IAAI,cAAc,CAAC,MAAM,KAAK,CAAC,EAAE,CAAC;gBAChC,MAAM,IAAI,KAAK,CAAC,uCAAuC,CAAC,CAAC;YAC3D,CAAC;YAED,OAAO,cAAc,CAAC;QACxB,CAAC;QAAC,OAAO,KAAc,EAAE,CAAC;YACxB,OAAO,CAAC,KAAK,CAAC,8BAA8B,EAAE,KAAK,YAAY,KAAK,CAAC,CAAC,CAAC,KAAK,CAAC,OAAO,CAAC,CAAC,CAAC,eAAe,CAAC,CAAC;YACxG,MAAM,KAAK,CAAC;QACd,CAAC;IACH,CAAC;IAED;;OAEG;IACK,KAAK,CAAC,kBAAkB,CAAC,SAAiB;QAChD,OAAO,CAAC,GAAG,CAAC,yCAAyC,CAAC,CAAC;QAEvD,MAAM,cAAc,GAAe,EAAE,CAAC;QACtC,MAAM,QAAQ,GAAG,IAAI,CAAC,GAAG,CAAC,SAAS,EAAE,CAAC,CAAC,CAAC;QAExC,KAAK,IAAI,IAAI,GAAG,CAAC,EAAE,IAAI,IAAI,QAAQ,EAAE,IAAI,EAAE,EAAE,CAAC;YAC5C,MAAM,kBAAkB,GAAG,YAAY,IAAI,oJAAoJ,IAAI,OAAO,SAAS,EAAE,CAAC;YACtN,MAAM,iBAAiB,GAAG,MAAM,CAAC,IAAI,CAAC,kBAAkB,EAAE,OAAO,CAAC,CAAC;YAEnE,kCAAkC;YAClC,MAAM,QAAQ,GAAG,YAAY,IAAI,kBAAkB,CAAC;YAEpD,kDAAkD;YAClD,MAAM,SAAS,GAAG,MAAM,IAAI,CAAC,cAAc,CAAC,SAAS,CAAC,iBAAiB,EAAE,QAA
Q,CAAC,CAAC;YAEnF,IAAI,SAAS,EAAE,CAAC;gBACd,cAAc,CAAC,IAAI,CAAC;oBAClB,UAAU,EAAE,IAAI;oBAChB,SAAS,EAAE,IAAI,CAAC,QAAQ,CAAC,SAAS,CAAC;oBACnC,QAAQ,EAAE,SAAS;iBACpB,CAAC,CAAC;gBACH,OAAO,CAAC,GAAG,CAAC,kCAAkC,IAAI,EAAE,CAAC,CAAC;YACxD,CAAC;QACH,CAAC;QAED,OAAO,cAAc,CAAC;IACxB,CAAC;IAED;;OAEG;IACH,KAAK,CAAC,qBAAqB,CAAC,IAAY,EAAE,QAAkB;QAC1D,MAAM,KAAK,GAAG,IAAI,CAAC,KAAK,CAAC,IAAI,CAAC,CAAC;QAC/B,IAAI,YAAY,GAAG,EAAE,CAAC;QACtB,IAAI,OAAO,GAAG,KAAK,CAAC;QACpB,IAAI,SAAS,GAAe,EAAE,CAAC;QAE/B,KAAK,IAAI,CAAC,GAAG,CAAC,EAAE,CAAC,GAAG,KAAK,CAAC,MAAM,EAAE,CAAC,EAAE,EAAE,CAAC;YACtC,MAAM,IAAI,GAAG,KAAK,CAAC,CAAC,CAAC,CAAC,IAAI,EAAE,CAAC;YAE7B,IAAI,CAAC,IAAI,EAAE,CAAC;gBACV,qBAAqB;gBACrB,IAAI,OAAO,EAAE,CAAC;oBACZ,YAAY,IAAI,IAAI,CAAC,eAAe,CAAC,SAAS,CAAC,CAAC;oBAChD,SAAS,GAAG,EAAE,CAAC;oBACf,OAAO,GAAG,KAAK,CAAC;gBAClB,CAAC;gBACD,YAAY,IAAI,IAAI,CAAC;gBACrB,SAAS;YACX,CAAC;YAED,iEAAiE;YACjE,IAAI,IAAI,CAAC,eAAe,CAAC,IAAI,EAAE,KAAK,EAAE,CAAC,CAAC,EAAE,CAAC;gBACzC,IAAI,OAAO,EAAE,CAAC;oBACZ,YAAY,IAAI,IAAI,CAAC,eAAe,CAAC,SAAS,CAAC,CAAC;oBAChD,SAAS,GAAG,EAAE,CAAC;oBACf,OAAO,GAAG,KAAK,CAAC;gBAClB,CAAC;gBAED,MAAM,YAAY,GAAG,IAAI,CAAC,qBAAqB,CAAC,IAAI,CAAC,CAAC;gBACtD,YAAY,IAAI,GAAG,GAAG,CAAC,MAAM,CAAC,YAAY,CAAC,IAAI,IAAI,MAAM,CAAC;gBAC1D,SAAS;YACX,CAAC;YAED,4BAA4B;YAC5B,IAAI,IAAI,CAAC,gBAAgB,CAAC,IAAI,CAAC,EAAE,CAAC;gBAChC,IAAI,CAAC,OAAO,EAAE,CAAC;oBACb,OAAO,GAAG,IAAI,CAAC;gBACjB,CAAC;gBACD,SAAS,CAAC,IAAI,CAAC,EAAE,KAAK,EAAE,IAAI,CAAC,aAAa,CAAC,IAAI,CAAC,EAAE,CAAC,CAAC;gBACpD,SAAS;YACX,CAAC;iBAAM,IAAI,OAAO,EAAE,CAAC;gBACnB,eAAe;gBACf,YAAY,IAAI,IAAI,CAAC,eAAe,CAAC,SAAS,CAAC,CAAC;gBAChD,SAAS,GAAG,EAAE,CAAC;gBACf,OAAO,GAAG,KAAK,CAAC;YAClB,CAAC;YAED,eAAe;YACf,IAAI,IAAI,CAAC,UAAU,CAAC,IAAI,CAAC,EAAE,CAAC;gBAC1B,YAAY,IAAI,GAAG,IAAI,CAAC,cAAc,CAAC,IAAI,CAAC,IAAI,CAAC;gBACjD,SAAS;YACX,CAAC;YAED,oBAAoB;YACpB,YAAY,IAAI,GAAG,IAAI,IAAI,CAAC;QAC9B,CAAC;QAED,6BAA6B;QAC7B,IAAI,OAAO,IAAI,SAAS,CAAC,MAAM,GAAG,CAAC,EAAE,CAAC;YACpC,YAAY,IAAI,IAAI,CAAC,eAAe,CAAC,SAAS,CAAC,CAAC;QAClD,CAAC;QAED,OAAO,YAAY,CAAC;IACtB,CAAC;
IAEO,eAAe,CAAC,IAAY,EAAE,QAA2B,EAAE,KAAa;QAC9E,qCAAqC;QACrC,IAAI,IAAI,CAAC,MAAM,GAAG,EAAE;YAAE,OAAO,KAAK,CAAC,CAAC,2BAA2B;QAC/D,IAAI,IAAI,CAAC,MAAM,GAAG,CAAC;YAAE,OAAO,KAAK,CAAC,CAAE,YAAY;QAEhD,+CAA+C;QAC/C,IAAI,IAAI,KAAK,IAAI,CAAC,WAAW,EAAE,IAAI,IAAI,CAAC,MAAM,GAAG,CAAC;YAAE,OAAO,IAAI,CAAC;QAEhE,0CAA0C;QAC1C,MAAM,QAAQ,GAAG,QAAQ,CAAC,KAAK,GAAG,CAAC,CAAC,CAAC;QACrC,IAAI,QAAQ,IAAI,QAAQ,CAAC,IAAI,EAAE,CAAC,MAAM,GAAG,IAAI,CAAC,MAAM,GAAG,GAAG,EAAE,CAAC;YAC3D,OAAO,IAAI,CAAC;QACd,CAAC;QAED,iDAAiD;QACjD,IAAI,IAAI,CAAC,QAAQ,CAAC,GAAG,CAAC;YAAE,OAAO,IAAI,CAAC;QAEpC,OAAO,KAAK,CAAC;IACf,CAAC;IAEO,qBAAqB,CAAC,IAAY;QACxC,IAAI,IAAI,KAAK,IAAI,CAAC,WAAW,EAAE;YAAE,OAAO,CAAC,CAAC,CAAC,2BAA2B;QACtE,IAAI,IAAI,CAAC,QAAQ,CAAC,GAAG,CAAC;YAAE,OAAO,CAAC,CAAC,CAAS,4BAA4B;QACtE,IAAI,IAAI,CAAC,MAAM,GAAG,EAAE;YAAE,OAAO,CAAC,CAAC,CAAW,qBAAqB;QAC/D,OAAO,CAAC,CAAC,CAAC,UAAU;IACtB,CAAC;IAEO,gBAAgB,CAAC,IAAY;QACnC,8CAA8C;QAC9C,MAAM,QAAQ,GAAG;YACf,KAAK;YACL,QAAQ,EAAkB,kBAAkB;YAC5C,IAAI,EAAsB,iBAAiB;YAC3C,WAAW;YACX,cAAc;SACf,CAAC;QAEF,OAAO,QAAQ,CAAC,IAAI,CAAC,OAAO,CAAC,EAAE,CAAC,OAAO,CAAC,IAAI,CAAC,IAAI,CAAC,CAAC,CAAC;IACtD,CAAC;IAEO,aAAa,CAAC,IAAY;QAChC,sDAAsD;QACtD,IAAI,OAAO,GAAa,EAAE,CAAC;QAE3B,IAAI,IAAI,CAAC,QAAQ,CAAC,IAAI,CAAC,EAAE,CAAC;YACxB,OAAO,GAAG,IAAI,CAAC,KAAK,CAAC,IAAI,CAAC,CAAC,GAAG,CAAC,GAAG,CAAC,EAAE,CAAC,GAAG,CAAC,IAAI,EAAE,CAAC,CAAC;QACpD,CAAC;aAAM,IAAI,IAAI,CAAC,QAAQ,CAAC,GAAG,CAAC,EAAE,CAAC;YAC9B,OAAO,GAAG,IAAI,CAAC,KAAK,CAAC,GAAG,CAAC,CAAC,GAAG,CAAC,GAAG,CAAC,EAAE,CAAC,GAAG,CAAC,IAAI,EAAE,CAAC,CAAC;QACnD,CAAC;aAAM,CAAC;YACN,2BAA2B;YAC3B,OAAO,GAAG,IAAI,CAAC,KAAK,CAAC,QAAQ,CAAC,CAAC,GAAG,CAAC,GAAG,CAAC,EAAE,CAAC,GAAG,CAAC,IAAI,EAAE,CAAC,CAAC;QACxD,CAAC;QAED,OAAO,OAAO,CAAC,MAAM,CAAC,GAAG,CAAC,EAAE,CAAC,GAAG,CAAC,MAAM,GAAG,CAAC,CAAC,CAAC;IAC/C,CAAC;IAEO,eAAe,CAAC,IAAyB;QAC/C,IAAI,IAAI,CAAC,MAAM,KAAK,CAAC;YAAE,OAAO,EAAE,CAAC;QAEjC,iCAAiC;QACjC,MAAM,OAAO,GAAG,IAAI,CAAC,GAAG,CAAC,GAAG,IAAI,CAAC,GAAG,CAAC,GAAG,CAAC,EAAE,CAAC,GAAG,CAAC,KAAK,CAAC,MAAM,CAAC,CAAC,CAAC;QAE/D,IAAI,QAAQ,GAAG,EAAE
,CAAC;QAElB,KAAK,MAAM,CAAC,CAAC,EAAE,GAAG,CAAC,IAAI,IAAI,CAAC,OAAO,EAAE,EAAE,CAAC;YACtC,IAAI,WAAW,GAAG,GAAG,CAAC;YAEtB,KAAK,IAAI,CAAC,GAAG,CAAC,EAAE,CAAC,GAAG,OAAO,EAAE,CAAC,EAAE,EAAE,CAAC;gBACjC,MAAM,IAAI,GAAG,GAAG,CAAC,KAAK,CAAC,CAAC,CAAC,IAAI,EAAE,CAAC;gBAChC,WAAW,IAAI,IAAI,IAAI,IAAI,CAAC;YAC9B,CAAC;YAED,QAAQ,IAAI,GAAG,WAAW,IAAI,CAAC;YAE/B,uCAAuC;YACvC,IAAI,CAAC,KAAK,CAAC,EAAE,CAAC;gBACZ,IAAI,SAAS,GAAG,GAAG,CAAC;gBACpB,KAAK,IAAI,CAAC,GAAG,CAAC,EAAE,CAAC,GAAG,OAAO,EAAE,CAAC,EAAE,EAAE,CAAC;oBACjC,SAAS,IAAI,QAAQ,CAAC;gBACxB,CAAC;gBACD,QAAQ,IAAI,GAAG,SAAS,IAAI,CAAC;YAC/B,CAAC;QACH,CAAC;QAED,OAAO,GAAG,QAAQ,IAAI,CAAC;IACzB,CAAC;IAEO,UAAU,CAAC,IAAY;QAC7B,kCAAkC;QAClC,MAAM,YAAY,GAAG;YACnB,cAAc;YACd,cAAc;YACd,mBAAmB;YACnB,kBAAkB;SACnB,CAAC;QAEF,OAAO,YAAY,CAAC,IAAI,CAAC,OAAO,CAAC,EAAE,CAAC,OAAO,CAAC,IAAI,CAAC,IAAI,CAAC,CAAC,CAAC;IAC1D,CAAC;IAEO,cAAc,CAAC,IAAY;QACjC,2CAA2C;QAC3C,IAAI,cAAc,CAAC,IAAI,CAAC,IAAI,CAAC,EAAE,CAAC;YAC9B,OAAO,IAAI,CAAC,OAAO,CAAC,cAAc,EAAE,KAAK,CAAC,CAAC;QAC7C,CAAC;aAAM,IAAI,mBAAmB,CAAC,IAAI,CAAC,IAAI,CAAC,EAAE,CAAC;YAC1C,OAAO,IAAI,CAAC,OAAO,CAAC,mBAAmB,EAAE,IAAI,CAAC,CAAC;QACjD,CAAC;aAAM,IAAI,kBAAkB,CAAC,IAAI,CAAC,IAAI,CAAC,EAAE,CAAC;YACzC,OAAO,IAAI,CAAC,OAAO,CAAC,kBAAkB,EAAE,IAAI,CAAC,CAAC;QAChD,CAAC;aAAM,CAAC;YACN,OAAO,IAAI,CAAC,OAAO,CAAC,cAAc,EAAE,IAAI,CAAC,CAAC;QAC5C,CAAC;IACH,CAAC;IAED;;OAEG;IACH,KAAK,CAAC,gBAAgB,CAAC,UAA+B;QACpD,IAAI,QAAQ,GAAG,EAAE,CAAC;QAElB,KAAK,MAAM,CAAC,CAAC,EAAE,IAAI,CAAC,IAAI,UAAU,CAAC,OAAO,EAAE,EAAE,CAAC;YAC7C,QAAQ,IAAI,WAAW,IAAI,CAAC,UAAU,MAAM,CAAC;YAC7C,QAAQ,IAAI,IAAI,CAAC,cAAc,CAAC,gBAAgB,CAAC,QAAQ,IAAI,CAAC,UAAU,EAAE,EAAE,IAAI,CAAC,SAAS,CAAC,CAAC;YAC5F,QAAQ,IAAI,MAAM,CAAC;YAEnB,IAAI,CAAC,GAAG,UAAU,CAAC,MAAM,GAAG,CAAC,EAAE,CAAC;gBAC9B,QAAQ,IAAI,SAAS,CAAC,CAAC,iBAAiB;YAC1C,CAAC;QACH,CAAC;QAED,OAAO,QAAQ,CAAC;IAClB,CAAC;IAED;;OAEG;IACH,KAAK;QACH,IAAI,CAAC,WAAW,GAAG,CAAC,CAAC;IACvB,CAAC;IAED;;OAEG;IACH,IAAI,gBAAgB;QAClB,OAAO,IAAI,CAAC,WAAW,CAAC;IAC1B,CAAC;CACF"}
1
+ {"version":3,"file":"pdf-extractor.js","sourceRoot":"","sources":["../../src/utils/pdf-extractor.ts"],"names":[],"mappings":"AAAA,OAAO,IAAI,MAAM,WAAW,CAAC;AAC7B,OAAO,EAAE,MAAM,EAAE,MAAM,aAAa,CAAC;AA2BrC,MAAM,OAAO,YAAY;IACN,cAAc,CAAiB;IACxC,WAAW,GAAW,CAAC,CAAC;IAEhC,YAAY,cAA8B;QACxC,IAAI,CAAC,cAAc,GAAG,cAAc,CAAC;IACvC,CAAC;IAED;;OAEG;IACH,KAAK,CAAC,oBAAoB,CAAC,MAAc;QACvC,IAAI,CAAC;YACH,OAAO,CAAC,GAAG,CAAC,yDAAyD,CAAC,CAAC;YAEvE,yCAAyC;YACzC,MAAM,QAAQ,GAAG,MAAM,MAAM,CAAC,WAAW,CAAC,CAAC;YAC3C,MAAM,OAAO,GAAG,MAAM,QAAQ,CAAC,OAAO,CAAC,MAAM,CAAC,CAAC;YAE/C,OAAO,CAAC,GAAG,CAAC,4BAA4B,OAAO,CAAC,QAAQ,wBAAwB,OAAO,CAAC,IAAI,EAAE,MAAM,IAAI,CAAC,EAAE,CAAC,CAAC;YAE7G,oEAAoE;YACpE,MAAM,YAAY,GAAG,CAAC,OAAO,CAAC,IAAI,IAAI,OAAO,CAAC,IAAI,CAAC,IAAI,EAAE,CAAC,MAAM,GAAG,GAAG,CAAC;YAEvE,4FAA4F;YAC5F,IAAI,YAAY,IAAI,OAAO,CAAC,QAAQ,IAAI,CAAC,EAAE,CAAC;gBAC1C,OAAO,CAAC,GAAG,CAAC,uEAAuE,CAAC,CAAC;gBAErF,IAAI,CAAC;oBACH,MAAM,cAAc,GAAG,MAAM,IAAI,CAAC,kBAAkB,CAAC,MAAM,EAAE,IAAI,CAAC,GAAG,CAAC,OAAO,CAAC,QAAQ,EAAE,CAAC,CAAC,CAAC,CAAC;oBAC5F,IAAI,cAAc,CAAC,MAAM,GAAG,CAAC,EAAE,CAAC;wBAC9B,OAAO,CAAC,GAAG,CAAC,6BAA6B,cAAc,CAAC,MAAM,cAAc,CAAC,CAAC;wBAC9E,OAAO,cAAc,CAAC;oBACxB,CAAC;gBACH,CAAC;gBAAC,OAAO,YAAqB,EAAE,CAAC;oBAC/B,OAAO,CAAC,IAAI,CAAC,+BAA+B,EAAE,YAAY,YAAY,KAAK,CAAC,CAAC,CAAC,YAAY,CAAC,OAAO,CAAC,CAAC,CAAC,eAAe,CAAC,CAAC;oBAEtH,mDAAmD;oBACnD,MAAM,YAAY,GAAG,YAAY,YAAY,KAAK,CAAC,CAAC,CAAC,YAAY,CAAC,OAAO,CAAC,CAAC,CAAC,EAAE,CAAC;oBAC/E,IAAI,YAAY,CAAC,QAAQ,CAAC,gBAAgB,CAAC,IAAI,YAAY,CAAC,QAAQ,CAAC,aAAa,CAAC,IAAI,YAAY,CAAC,QAAQ,CAAC,mBAAmB,CAAC,EAAE,CAAC;wBAClI,OAAO,CAAC,GAAG,CAAC,kFAAkF,CAAC,CAAC;oBAClG,CAAC;oBAED,0DAA0D;oBAC1D,OAAO,MAAM,IAAI,CAAC,kBAAkB,CAAC,OAAO,CAAC,QAAQ,CAAC,CAAC;gBACzD,CAAC;YACH,CAAC;iBAAM,CAAC;gBACN,OAAO,CAAC,GAAG,CAAC,8DAA8D,CAAC,CAAC;gBAC5E,OAAO,EAAE,CAAC;YACZ,CAAC;YAED,OAAO,EAAE,CAAC;QACZ,CAAC;QAAC,OAAO,KAAc,EAAE,CAAC;YACxB,OAAO,CAAC,IAAI,CAAC,yBAAyB,EAAE,KAAK,YAAY,KAAK,CAAC,CAAC,CAAC,KAAK,CAAC,OAAO,CAAC,CAAC,CAAC,eAAe,CAAC,CAAC;YAClG,4EAA4E;YAC5E,OAAO,EAAE,CAAC;QACZ,CAAC
;IACH,CAAC;IAED;;OAEG;IACK,KAAK,CAAC,kBAAkB,CAAC,MAAc,EAAE,WAAmB,CAAC;QACnE,IAAI,CAAC;YACH,MAAM,EAAE,UAAU,EAAE,GAAG,MAAM,MAAM,CAAC,SAAS,CAAC,CAAC;YAE/C,OAAO,CAAC,GAAG,CAAC,oCAAoC,QAAQ,YAAY,CAAC,CAAC;YAEtE,sDAAsD;YACtD,MAAM,cAAc,GAAG;gBACrB,MAAM,EAAE,KAAc;gBACtB,OAAO,EAAE,IAAI,CAAC,cAAc,CAAC,cAAc;gBAC3C,UAAU,EAAE,UAAU;aACvB,CAAC;YAEF,+BAA+B;YAC/B,MAAM,OAAO,GAAG,UAAU,CAAC,MAAM,EAAE,cAAc,CAAC,CAAC;YAEnD,MAAM,cAAc,GAAe,EAAE,CAAC;YAEtC,2BAA2B;YAC3B,KAAK,IAAI,UAAU,GAAG,CAAC,EAAE,UAAU,IAAI,QAAQ,EAAE,UAAU,EAAE,EAAE,CAAC;gBAC9D,IAAI,CAAC;oBACH,MAAM,MAAM,GAAG,MAAM,OAAO,CAAC,UAAU,CAAC,CAAC;oBAEzC,IAAI,MAAM,IAAI,MAAM,IAAI,MAAM,IAAI,MAAM,CAAC,IAAI,EAAE,CAAC;wBAC9C,MAAM,QAAQ,GAAG,IAAI,CAAC,QAAQ,CAAC,MAAM,CAAC,IAAI,CAAC,CAAC;wBAE5C,IAAI,CAAC;4BACH,gCAAgC;4BAChC,MAAM,EAAE,GAAG,MAAM,MAAM,CAAC,aAAa,CAAC,CAAC;4BACvC,MAAM,WAAW,GAAG,MAAM,EAAE,CAAC,QAAQ,CAAC,MAAM,CAAC,IAAI,CAAC,CAAC;4BAEnD,kEAAkE;4BAClE,MAAM,SAAS,GAAG,MAAM,IAAI,CAAC,cAAc,CAAC,SAAS,CAAC,WAAW,EAAE,YAAY,UAAU,MAAM,CAAC,CAAC;4BAEjG,IAAI,SAAS,EAAE,CAAC;gCACd,cAAc,CAAC,IAAI,CAAC;oCAClB,UAAU;oCACV,SAAS,EAAE,IAAI,CAAC,QAAQ,CAAC,SAAS,CAAC;oCACnC,QAAQ,EAAE,SAAS;oCACnB,UAAU,EAAE;wCACV,KAAK,EAAE,CAAC,OAAO,IAAI,MAAM,IAAI,OAAO,MAAM,CAAC,KAAK,KAAK,QAAQ,CAAC,CAAC,CAAC,CAAC,MAAM,CAAC,KAAK,CAAC,CAAC,CAAC,GAAG;wCACnF,MAAM,EAAE,CAAC,QAAQ,IAAI,MAAM,IAAI,OAAO,MAAM,CAAC,MAAM,KAAK,QAAQ,CAAC,CAAC,CAAC,CAAC,MAAM,CAAC,MAAM,CAAC,CAAC,CAAC,GAAG;qCACxF;iCACF,CAAC,CAAC;gCACH,OAAO,CAAC,GAAG,CAAC,oBAAoB,UAAU,WAAW,CAAC,CAAC;4BACzD,CAAC;4BAED,8BAA8B;4BAC9B,IAAI,CAAC;gCACH,MAAM,EAAE,CAAC,MAAM,CAAC,MAAM,CAAC,IAAI,CAAC,CAAC;4BAC/B,CAAC;4BAAC,OAAO,WAAW,EAAE,CAAC;gCACrB,OAAO,CAAC,IAAI,CAAC,mCAAmC,MAAM,CAAC,IAAI,EAAE,CAAC,CAAC;4BACjE,CAAC;wBAEH,CAAC;wBAAC,OAAO,SAAkB,EAAE,CAAC;4BAC5B,OAAO,CAAC,IAAI,CAAC,4CAA4C,UAAU,GAAG,EAAE,SAAS,YAAY,KAAK,CAAC,CAAC,CAAC,SAAS,CAAC,OAAO,CAAC,CAAC,CAAC,eAAe,CAAC,CAAC;wBAC5I,CAAC;oBACH,CAAC;gBACH,CAAC;gBAAC,OAAO,SAAkB,EAAE,CAAC;oBAC5B,OAAO,CAAC,IAAI,CAAC,6BAA6B,UAAU,GAAG,EAAE,SAAS,YAAY,KAAK,CAAC,CAAC,CAAC,SAAS,CAAC,OAAO,CAAC,CAAC,CA
AC,eAAe,CAAC,CAAC;oBAC3H,0BAA0B;gBAC5B,CAAC;YACH,CAAC;YAED,IAAI,cAAc,CAAC,MAAM,KAAK,CAAC,EAAE,CAAC;gBAChC,MAAM,IAAI,KAAK,CAAC,uCAAuC,CAAC,CAAC;YAC3D,CAAC;YAED,OAAO,CAAC,GAAG,CAAC,6BAA6B,cAAc,CAAC,MAAM,kBAAkB,CAAC,CAAC;YAClF,OAAO,cAAc,CAAC;QACxB,CAAC;QAAC,OAAO,KAAc,EAAE,CAAC;YACxB,OAAO,CAAC,KAAK,CAAC,8BAA8B,EAAE,KAAK,YAAY,KAAK,CAAC,CAAC,CAAC,KAAK,CAAC,OAAO,CAAC,CAAC,CAAC,eAAe,CAAC,CAAC;YACxG,MAAM,KAAK,CAAC;QACd,CAAC;IACH,CAAC;IAED;;OAEG;IACK,KAAK,CAAC,kBAAkB,CAAC,SAAiB;QAChD,OAAO,CAAC,GAAG,CAAC,+CAA+C,CAAC,CAAC;QAE7D,MAAM,cAAc,GAAe,EAAE,CAAC;QACtC,MAAM,QAAQ,GAAG,IAAI,CAAC,GAAG,CAAC,SAAS,EAAE,CAAC,CAAC,CAAC;QAExC,KAAK,IAAI,IAAI,GAAG,CAAC,EAAE,IAAI,IAAI,QAAQ,EAAE,IAAI,EAAE,EAAE,CAAC;YAC5C,IAAI,CAAC;gBACH,gDAAgD;gBAChD,MAAM,KAAK,GAAG,MAAM,MAAM,CAAC,OAAO,CAAC,CAAC;gBAEpC,+CAA+C;gBAC/C,MAAM,gBAAgB,GAAG,MAAM,KAAK,CAAC,OAAO,CAAC;oBAC3C,MAAM,EAAE;wBACN,KAAK,EAAE,GAAG;wBACV,MAAM,EAAE,GAAG;wBACX,QAAQ,EAAE,CAAC;wBACX,UAAU,EAAE,EAAE,CAAC,EAAE,GAAG,EAAE,CAAC,EAAE,GAAG,EAAE,CAAC,EAAE,GAAG,EAAE,KAAK,EAAE,CAAC,EAAE;qBACjD;iBACF,CAAC;qBACD,GAAG,EAAE;qBACL,SAAS,CAAC;oBACT;wBACE,KAAK,EAAE,MAAM,CAAC,IAAI,CAChB;;qHAEuG,IAAI;;iHAER,IAAI,OAAO,SAAS;;qBAEhH,CACR;wBACD,GAAG,EAAE,CAAC;wBACN,IAAI,EAAE,CAAC;qBACR;iBACF,CAAC;qBACD,QAAQ,EAAE,CAAC;gBAEZ,MAAM,QAAQ,GAAG,YAAY,IAAI,kBAAkB,CAAC;gBAEpD,kDAAkD;gBAClD,MAAM,SAAS,GAAG,MAAM,IAAI,CAAC,cAAc,CAAC,SAAS,CAAC,gBAAgB,EAAE,QAAQ,CAAC,CAAC;gBAElF,IAAI,SAAS,EAAE,CAAC;oBACd,cAAc,CAAC,IAAI,CAAC;wBAClB,UAAU,EAAE,IAAI;wBAChB,SAAS,EAAE,IAAI,CAAC,QAAQ,CAAC,SAAS,CAAC;wBACnC,QAAQ,EAAE,SAAS;wBACnB,UAAU,EAAE;4BACV,KAAK,EAAE,GAAG;4BACV,MAAM,EAAE,GAAG;yBACZ;qBACF,CAAC,CAAC;oBACH,OAAO,CAAC,GAAG,CAAC,wCAAwC,IAAI,EAAE,CAAC,CAAC;gBAC9D,CAAC;YACH,CAAC;YAAC,OAAO,UAAmB,EAAE,CAAC;gBAC7B,OAAO,CAAC,IAAI,CAAC,kDAAkD,IAAI,GAAG,EAAE,UAAU,YAAY,KAAK,CAAC,CAAC,CAAC,UAAU,CAAC,OAAO,CAAC,CAAC,CAAC,eAAe,CAAC,CAAC;gBAE5I,kEAAkE;gBAClE,MAAM,QAAQ,GAAG,YAAY,IAAI,WAAW,CAAC;gBAC7C,MAAM,kBAAkB,GAAG,YAAY,IAAI,sCAAsC,IAAI,OAAO,SAAS,0DAA0D,CAAC;gBAChK,MAAM,iBAAiB,GAAG,MAAM,CAAC,IAAI,CAAC,kBAAkB
,EAAE,OAAO,CAAC,CAAC;gBAEnE,yCAAyC;gBACzC,MAAM,EAAE,GAAG,MAAM,MAAM,CAAC,aAAa,CAAC,CAAC;gBACvC,MAAM,QAAQ,GAAG,IAAI,CAAC,IAAI,CAAC,IAAI,CAAC,cAAc,CAAC,cAAc,EAAE,QAAQ,CAAC,CAAC;gBAEzE,IAAI,CAAC;oBACH,MAAM,EAAE,CAAC,SAAS,CAAC,QAAQ,EAAE,iBAAiB,CAAC,CAAC;oBAEhD,cAAc,CAAC,IAAI,CAAC;wBAClB,UAAU,EAAE,IAAI;wBAChB,SAAS,EAAE,QAAQ;wBACnB,QAAQ,EAAE,IAAI,CAAC,OAAO,CAAC,QAAQ,CAAC;qBACjC,CAAC,CAAC;oBACH,OAAO,CAAC,GAAG,CAAC,uCAAuC,IAAI,EAAE,CAAC,CAAC;gBAC7D,CAAC;gBAAC,OAAO,UAAmB,EAAE,CAAC;oBAC7B,OAAO,CAAC,IAAI,CAAC,2CAA2C,IAAI,GAAG,EAAE,UAAU,YAAY,KAAK,CAAC,CAAC,CAAC,UAAU,CAAC,OAAO,CAAC,CAAC,CAAC,eAAe,CAAC,CAAC;gBACvI,CAAC;YACH,CAAC;QACH,CAAC;QAED,OAAO,cAAc,CAAC;IACxB,CAAC;IAED;;OAEG;IACH,KAAK,CAAC,qBAAqB,CAAC,IAAY,EAAE,QAAkB;QAC1D,MAAM,KAAK,GAAG,IAAI,CAAC,KAAK,CAAC,IAAI,CAAC,CAAC;QAC/B,IAAI,YAAY,GAAG,EAAE,CAAC;QACtB,IAAI,OAAO,GAAG,KAAK,CAAC;QACpB,IAAI,SAAS,GAAe,EAAE,CAAC;QAE/B,KAAK,IAAI,CAAC,GAAG,CAAC,EAAE,CAAC,GAAG,KAAK,CAAC,MAAM,EAAE,CAAC,EAAE,EAAE,CAAC;YACtC,MAAM,IAAI,GAAG,KAAK,CAAC,CAAC,CAAC,CAAC,IAAI,EAAE,CAAC;YAE7B,IAAI,CAAC,IAAI,EAAE,CAAC;gBACV,qBAAqB;gBACrB,IAAI,OAAO,EAAE,CAAC;oBACZ,YAAY,IAAI,IAAI,CAAC,eAAe,CAAC,SAAS,CAAC,CAAC;oBAChD,SAAS,GAAG,EAAE,CAAC;oBACf,OAAO,GAAG,KAAK,CAAC;gBAClB,CAAC;gBACD,YAAY,IAAI,IAAI,CAAC;gBACrB,SAAS;YACX,CAAC;YAED,iEAAiE;YACjE,IAAI,IAAI,CAAC,eAAe,CAAC,IAAI,EAAE,KAAK,EAAE,CAAC,CAAC,EAAE,CAAC;gBACzC,IAAI,OAAO,EAAE,CAAC;oBACZ,YAAY,IAAI,IAAI,CAAC,eAAe,CAAC,SAAS,CAAC,CAAC;oBAChD,SAAS,GAAG,EAAE,CAAC;oBACf,OAAO,GAAG,KAAK,CAAC;gBAClB,CAAC;gBAED,MAAM,YAAY,GAAG,IAAI,CAAC,qBAAqB,CAAC,IAAI,CAAC,CAAC;gBACtD,YAAY,IAAI,GAAG,GAAG,CAAC,MAAM,CAAC,YAAY,CAAC,IAAI,IAAI,MAAM,CAAC;gBAC1D,SAAS;YACX,CAAC;YAED,4BAA4B;YAC5B,IAAI,IAAI,CAAC,gBAAgB,CAAC,IAAI,CAAC,EAAE,CAAC;gBAChC,IAAI,CAAC,OAAO,EAAE,CAAC;oBACb,OAAO,GAAG,IAAI,CAAC;gBACjB,CAAC;gBACD,SAAS,CAAC,IAAI,CAAC,EAAE,KAAK,EAAE,IAAI,CAAC,aAAa,CAAC,IAAI,CAAC,EAAE,CAAC,CAAC;gBACpD,SAAS;YACX,CAAC;iBAAM,IAAI,OAAO,EAAE,CAAC;gBACnB,eAAe;gBACf,YAAY,IAAI,IAAI,CAAC,eAAe,CAAC,SAAS,CAAC,CAAC;gBAChD,SAAS,GAAG,EAAE,CAAC;gBACf,OAAO,GAAG
,KAAK,CAAC;YAClB,CAAC;YAED,eAAe;YACf,IAAI,IAAI,CAAC,UAAU,CAAC,IAAI,CAAC,EAAE,CAAC;gBAC1B,YAAY,IAAI,GAAG,IAAI,CAAC,cAAc,CAAC,IAAI,CAAC,IAAI,CAAC;gBACjD,SAAS;YACX,CAAC;YAED,oBAAoB;YACpB,YAAY,IAAI,GAAG,IAAI,IAAI,CAAC;QAC9B,CAAC;QAED,6BAA6B;QAC7B,IAAI,OAAO,IAAI,SAAS,CAAC,MAAM,GAAG,CAAC,EAAE,CAAC;YACpC,YAAY,IAAI,IAAI,CAAC,eAAe,CAAC,SAAS,CAAC,CAAC;QAClD,CAAC;QAED,OAAO,YAAY,CAAC;IACtB,CAAC;IAEO,eAAe,CAAC,IAAY,EAAE,QAA2B,EAAE,KAAa;QAC9E,qCAAqC;QACrC,IAAI,IAAI,CAAC,MAAM,GAAG,EAAE;YAAE,OAAO,KAAK,CAAC,CAAC,2BAA2B;QAC/D,IAAI,IAAI,CAAC,MAAM,GAAG,CAAC;YAAE,OAAO,KAAK,CAAC,CAAE,YAAY;QAEhD,+CAA+C;QAC/C,IAAI,IAAI,KAAK,IAAI,CAAC,WAAW,EAAE,IAAI,IAAI,CAAC,MAAM,GAAG,CAAC;YAAE,OAAO,IAAI,CAAC;QAEhE,0CAA0C;QAC1C,MAAM,QAAQ,GAAG,QAAQ,CAAC,KAAK,GAAG,CAAC,CAAC,CAAC;QACrC,IAAI,QAAQ,IAAI,QAAQ,CAAC,IAAI,EAAE,CAAC,MAAM,GAAG,IAAI,CAAC,MAAM,GAAG,GAAG,EAAE,CAAC;YAC3D,OAAO,IAAI,CAAC;QACd,CAAC;QAED,iDAAiD;QACjD,IAAI,IAAI,CAAC,QAAQ,CAAC,GAAG,CAAC;YAAE,OAAO,IAAI,CAAC;QAEpC,OAAO,KAAK,CAAC;IACf,CAAC;IAEO,qBAAqB,CAAC,IAAY;QACxC,IAAI,IAAI,KAAK,IAAI,CAAC,WAAW,EAAE;YAAE,OAAO,CAAC,CAAC,CAAC,2BAA2B;QACtE,IAAI,IAAI,CAAC,QAAQ,CAAC,GAAG,CAAC;YAAE,OAAO,CAAC,CAAC,CAAS,4BAA4B;QACtE,IAAI,IAAI,CAAC,MAAM,GAAG,EAAE;YAAE,OAAO,CAAC,CAAC,CAAW,qBAAqB;QAC/D,OAAO,CAAC,CAAC,CAAC,UAAU;IACtB,CAAC;IAEO,gBAAgB,CAAC,IAAY;QACnC,8CAA8C;QAC9C,MAAM,QAAQ,GAAG;YACf,KAAK;YACL,QAAQ,EAAkB,kBAAkB;YAC5C,IAAI,EAAsB,iBAAiB;YAC3C,WAAW;YACX,cAAc;SACf,CAAC;QAEF,OAAO,QAAQ,CAAC,IAAI,CAAC,OAAO,CAAC,EAAE,CAAC,OAAO,CAAC,IAAI,CAAC,IAAI,CAAC,CAAC,CAAC;IACtD,CAAC;IAEO,aAAa,CAAC,IAAY;QAChC,sDAAsD;QACtD,IAAI,OAAO,GAAa,EAAE,CAAC;QAE3B,IAAI,IAAI,CAAC,QAAQ,CAAC,IAAI,CAAC,EAAE,CAAC;YACxB,OAAO,GAAG,IAAI,CAAC,KAAK,CAAC,IAAI,CAAC,CAAC,GAAG,CAAC,GAAG,CAAC,EAAE,CAAC,GAAG,CAAC,IAAI,EAAE,CAAC,CAAC;QACpD,CAAC;aAAM,IAAI,IAAI,CAAC,QAAQ,CAAC,GAAG,CAAC,EAAE,CAAC;YAC9B,OAAO,GAAG,IAAI,CAAC,KAAK,CAAC,GAAG,CAAC,CAAC,GAAG,CAAC,GAAG,CAAC,EAAE,CAAC,GAAG,CAAC,IAAI,EAAE,CAAC,CAAC;QACnD,CAAC;aAAM,CAAC;YACN,2BAA2B;YAC3B,OAAO,GAAG,IAAI,CAAC,KAAK,CAAC,QAAQ,CAAC,CAAC,GAAG,CAAC,GAAG,CAAC,EA
AE,CAAC,GAAG,CAAC,IAAI,EAAE,CAAC,CAAC;QACxD,CAAC;QAED,OAAO,OAAO,CAAC,MAAM,CAAC,GAAG,CAAC,EAAE,CAAC,GAAG,CAAC,MAAM,GAAG,CAAC,CAAC,CAAC;IAC/C,CAAC;IAEO,eAAe,CAAC,IAAyB;QAC/C,IAAI,IAAI,CAAC,MAAM,KAAK,CAAC;YAAE,OAAO,EAAE,CAAC;QAEjC,iCAAiC;QACjC,MAAM,OAAO,GAAG,IAAI,CAAC,GAAG,CAAC,GAAG,IAAI,CAAC,GAAG,CAAC,GAAG,CAAC,EAAE,CAAC,GAAG,CAAC,KAAK,CAAC,MAAM,CAAC,CAAC,CAAC;QAE/D,IAAI,QAAQ,GAAG,EAAE,CAAC;QAElB,KAAK,MAAM,CAAC,CAAC,EAAE,GAAG,CAAC,IAAI,IAAI,CAAC,OAAO,EAAE,EAAE,CAAC;YACtC,IAAI,WAAW,GAAG,GAAG,CAAC;YAEtB,KAAK,IAAI,CAAC,GAAG,CAAC,EAAE,CAAC,GAAG,OAAO,EAAE,CAAC,EAAE,EAAE,CAAC;gBACjC,MAAM,IAAI,GAAG,GAAG,CAAC,KAAK,CAAC,CAAC,CAAC,IAAI,EAAE,CAAC;gBAChC,WAAW,IAAI,IAAI,IAAI,IAAI,CAAC;YAC9B,CAAC;YAED,QAAQ,IAAI,GAAG,WAAW,IAAI,CAAC;YAE/B,uCAAuC;YACvC,IAAI,CAAC,KAAK,CAAC,EAAE,CAAC;gBACZ,IAAI,SAAS,GAAG,GAAG,CAAC;gBACpB,KAAK,IAAI,CAAC,GAAG,CAAC,EAAE,CAAC,GAAG,OAAO,EAAE,CAAC,EAAE,EAAE,CAAC;oBACjC,SAAS,IAAI,QAAQ,CAAC;gBACxB,CAAC;gBACD,QAAQ,IAAI,GAAG,SAAS,IAAI,CAAC;YAC/B,CAAC;QACH,CAAC;QAED,OAAO,GAAG,QAAQ,IAAI,CAAC;IACzB,CAAC;IAEO,UAAU,CAAC,IAAY;QAC7B,kCAAkC;QAClC,MAAM,YAAY,GAAG;YACnB,cAAc;YACd,cAAc;YACd,mBAAmB;YACnB,kBAAkB;SACnB,CAAC;QAEF,OAAO,YAAY,CAAC,IAAI,CAAC,OAAO,CAAC,EAAE,CAAC,OAAO,CAAC,IAAI,CAAC,IAAI,CAAC,CAAC,CAAC;IAC1D,CAAC;IAEO,cAAc,CAAC,IAAY;QACjC,2CAA2C;QAC3C,IAAI,cAAc,CAAC,IAAI,CAAC,IAAI,CAAC,EAAE,CAAC;YAC9B,OAAO,IAAI,CAAC,OAAO,CAAC,cAAc,EAAE,KAAK,CAAC,CAAC;QAC7C,CAAC;aAAM,IAAI,mBAAmB,CAAC,IAAI,CAAC,IAAI,CAAC,EAAE,CAAC;YAC1C,OAAO,IAAI,CAAC,OAAO,CAAC,mBAAmB,EAAE,IAAI,CAAC,CAAC;QACjD,CAAC;aAAM,IAAI,kBAAkB,CAAC,IAAI,CAAC,IAAI,CAAC,EAAE,CAAC;YACzC,OAAO,IAAI,CAAC,OAAO,CAAC,kBAAkB,EAAE,IAAI,CAAC,CAAC;QAChD,CAAC;aAAM,CAAC;YACN,OAAO,IAAI,CAAC,OAAO,CAAC,cAAc,EAAE,IAAI,CAAC,CAAC;QAC5C,CAAC;IACH,CAAC;IAED;;OAEG;IACH,KAAK,CAAC,gBAAgB,CAAC,UAA+B;QACpD,IAAI,QAAQ,GAAG,EAAE,CAAC;QAElB,KAAK,MAAM,CAAC,CAAC,EAAE,IAAI,CAAC,IAAI,UAAU,CAAC,OAAO,EAAE,EAAE,CAAC;YAC7C,QAAQ,IAAI,WAAW,IAAI,CAAC,UAAU,MAAM,CAAC;YAC7C,QAAQ,IAAI,IAAI,CAAC,cAAc,CAAC,gBAAgB,CAAC,QAAQ,IAAI,CAAC,UAAU,EAAE,EAAE,IAAI,CAAC,SAAS,
CAAC,CAAC;YAC5F,QAAQ,IAAI,MAAM,CAAC;YAEnB,IAAI,CAAC,GAAG,UAAU,CAAC,MAAM,GAAG,CAAC,EAAE,CAAC;gBAC9B,QAAQ,IAAI,SAAS,CAAC,CAAC,iBAAiB;YAC1C,CAAC;QACH,CAAC;QAED,OAAO,QAAQ,CAAC;IAClB,CAAC;IAED;;OAEG;IACH,KAAK;QACH,IAAI,CAAC,WAAW,GAAG,CAAC,CAAC;IACvB,CAAC;IAED;;OAEG;IACH,IAAI,gBAAgB;QAClB,OAAO,IAAI,CAAC,WAAW,CAAC;IAC1B,CAAC;CACF"}
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "file2md",
3
- "version": "1.4.34",
3
+ "version": "1.4.35",
4
4
  "description": "A TypeScript library for converting various document types (PDF, DOCX, XLSX, PPTX, HWP, HWPX) into Markdown with image and layout preservation",
5
5
  "main": "dist/index.js",
6
6
  "types": "dist/index.d.ts",