vault-fetch 0.2.0 → 0.3.1

package/README.ja.md ADDED
@@ -0,0 +1,157 @@
1
+ # vault-fetch
2
+
3
+ Obsidian Clipper では取得できない、JavaScript レンダリングや認証が必要な Web ページおよび PDF ファイルを Playwright で取得し、Markdown に変換して Obsidian Vault に保存する CLI ツール。
4
+
5
+ ## 特徴
6
+
7
+ - Playwright (Chromium) による JS レンダリング後のページ取得
8
+ - **PDF から Markdown への変換**(Content-Type で自動判定)
9
+ - Readability.js による記事本文の抽出(広告・ナビゲーション除去)、`--raw` モードでフルページ変換も可能
10
+ - リソースブロッキング(画像・フォント・メディア)による高速フェッチ
11
+ - Chrome User-Agent 偽装によるボット対策回避
12
+ - Obsidian Clipper 互換のフロントマター(title, source, author, published, created, description, tags)
13
+ - セッション管理(`storageState`)によるログイン済みページの取得
14
+ - 設定の 3 層解決(CLI オプション > 環境変数 > 設定ファイル)
15
+
16
+ ## インストール
17
+
18
+ ```bash
19
+ # グローバルインストール
20
+ npm install -g vault-fetch
21
+
22
+ # Playwright のブラウザも必要
23
+ npx playwright install chromium
24
+ ```
25
+
26
+ ## 使い方
27
+
28
+ `npx` でインストールなしでも実行できます:
29
+
30
+ ```bash
31
+ npx vault-fetch fetch https://example.com/article --dry-run --dest /tmp
32
+ ```
33
+
34
+ ### ページの取得・保存
35
+
36
+ ```bash
37
+ # Obsidian Vault に保存
38
+ vault-fetch fetch https://example.com/article --dest ~/Documents/Obsidian/Clippings
39
+
40
+ # 標準出力に出力(保存しない)
41
+ vault-fetch fetch https://example.com/article --dry-run --dest /tmp
42
+
43
+ # headed モードで実行(デバッグ用)
44
+ vault-fetch fetch https://example.com/article --dest ~/Documents/Obsidian/Clippings --headed
45
+
46
+ # 特定の CSS セレクタのみ抽出(Readability をスキップ)
47
+ vault-fetch fetch https://example.com/article --dest ~/Documents/Obsidian/Clippings --selector "article"
48
+
49
+ # タグを追加
50
+ vault-fetch fetch https://example.com/article --dest ~/Documents/Obsidian/Clippings --tag tech --tag ai
51
+
52
+ # 非記事ページをフルページ変換(Readability をスキップ)
53
+ vault-fetch fetch https://example.com/table-page --dest ~/Documents/Obsidian/Clippings --raw
54
+
55
+ # 画像を含めてフェッチ(デフォルトではブロック)
56
+ vault-fetch fetch https://example.com/article --dest ~/Documents/Obsidian/Clippings --no-block-images
57
+
58
+ # PDF を取得して Markdown に変換(自動判定)
59
+ vault-fetch fetch https://example.com/report.pdf --dest ~/Documents/Obsidian/Clippings
60
+ ```
61
+
62
+ ### PDF 対応
63
+
64
+ サーバーが `Content-Type: application/pdf` を返す場合、vault-fetch は自動的に PDF をダウンロードし、[pdf2md](https://github.com/opendocsg/pdf2md) で Markdown に変換します。追加のフラグは不要です。
65
+
66
+ - タイトルは優先順位に従って抽出: PDF メタデータ(`dc:title` / `info.Title`)> Markdown の最初の `#` 見出し > URL のファイル名
67
+ - 著者・公開日も PDF メタデータから抽出(存在する場合)
68
+ - 自動抽出が不正確な場合は `--title` で手動指定可能
69
+ - `--selector` および `--raw` オプションは PDF URL と併用不可
70
+ - セッション機能により認証付き PDF のダウンロードにも対応
71
+
72
+ ### ログイン(セッション保存)
73
+
74
+ 認証が必要なサイトの場合、事前にログインしてセッションを保存できます。
75
+
76
+ ```bash
77
+ vault-fetch login https://note.com
78
+ # → ブラウザが開く → 手動でログイン → ターミナルで Enter を押す
79
+ ```
80
+
81
+ 以降の `fetch` でそのドメインのセッションが自動的に使用されます。
82
+
83
+ ### fetch オプション
84
+
85
+ | オプション | 説明 |
86
+ |---|---|
87
+ | `--dest <path>` | 保存先ディレクトリ(必須) |
88
+ | `--title <text>` | タイトル/ファイル名を手動指定 |
89
+ | `--headed` | ブラウザを表示して実行 |
90
+ | `--selector <css>` | CSS セレクタで要素を抽出 |
91
+ | `--timeout <sec>` | タイムアウト秒数(デフォルト: 30) |
92
+ | `--tag <name>` | タグ追加(複数指定可) |
93
+ | `--wait-until <event>` | 待機条件: `load` / `domcontentloaded` / `networkidle`(デフォルト: `networkidle`) |
94
+ | `--skip-session` | 保存済みセッションを使わない |
95
+ | `--dry-run` | 保存せず標準出力に出力 |
96
+ | `--raw` | Readability をスキップし、フルページ HTML を直接変換 |
97
+ | `--no-block-images` | 画像リクエストのブロックを無効化 |
98
+ | `--no-block-fonts` | フォントリクエストのブロックを無効化 |
99
+ | `--no-block-media` | メディアリクエストのブロックを無効化 |
100
+
101
+ ## 設定
102
+
103
+ ### 設定ファイル
104
+
105
+ `~/.config/vault-fetch/config.yaml`:
106
+
107
+ ```yaml
108
+ # Obsidian Vault の保存先
109
+ dest: ~/Documents/Obsidian/Clippings
110
+
111
+ # デフォルトタグ
112
+ tags:
113
+ - clippings
114
+
115
+ timeout: 30
116
+ ```
117
+
118
+ ### 環境変数
119
+
120
+ | 変数 | 説明 |
121
+ |---|---|
122
+ | `VAULT_FETCH_DEST` | 保存先ディレクトリ |
123
+ | `VAULT_FETCH_TIMEOUT` | タイムアウト秒数 |
124
+
125
+ ### 優先順位
126
+
127
+ CLI オプション > 環境変数 > 設定ファイル > デフォルト値
128
+
129
+ ## 出力例
130
+
131
+ ```yaml
132
+ ---
133
+ title: ADHDの自分が毎日クッソ集中できるようになった習慣
134
+ source: https://note.com/simplearchitect/n/n8389e1b4fbde
135
+ author:
136
+ - "[[牛尾 剛]]"
137
+ published: 2025-06-14
138
+ created: 2025-07-03
139
+ description: 自分はADHDですので、もちろん集中力は暗黒です...
140
+ tags:
141
+ - clippings
142
+ ---
143
+
144
+ 記事の本文が Markdown で続きます...
145
+ ```
146
+
147
+ ## 開発
148
+
149
+ ```bash
150
+ npm run build # tsup でビルド
151
+ npm test # vitest でテスト実行
152
+ npm run typecheck # 型チェック
153
+ ```
154
+
155
+ ## ライセンス
156
+
157
+ MIT
package/README.md CHANGED
@@ -1,142 +1,157 @@
1
1
  # vault-fetch
2
2
 
3
- Obsidian Clipper では取得できない、JavaScript レンダリングや認証が必要な Web ページを Playwright で取得し、Markdown に変換して Obsidian Vault に保存する CLI ツール。
3
+ A CLI tool that uses Playwright to fetch web pages and PDF files that Obsidian Clipper cannot handle (pages that need JavaScript rendering or authentication), converts them to Markdown, and saves them to your Obsidian Vault.
4
4
 
5
- ## 特徴
5
+ ## Features
6
6
 
7
- - Playwright (Chromium) による JS レンダリング後のページ取得
8
- - Readability.js による記事本文の抽出(広告・ナビゲーション除去)、`--raw` モードでフルページ変換も可能
9
- - リソースブロッキング(画像・フォント・メディア)による高速フェッチ
10
- - Chrome User-Agent 偽装によるボット対策回避
11
- - Obsidian Clipper 互換のフロントマター(title, source, author, published, created, description, tags)
12
- - セッション管理(`storageState`)によるログイン済みページの取得
13
- - 設定の 3 層解決(CLI オプション > 環境変数 > 設定ファイル)
7
+ - Page fetching with JS rendering via Playwright (Chromium)
8
+ - **PDF to Markdown conversion** (auto-detected via Content-Type)
9
+ - Article content extraction using Readability.js (removes ads and navigation), with `--raw` mode for full-page conversion
10
+ - Resource blocking (images, fonts, media) for faster fetching
11
+ - Chrome User-Agent spoofing to bypass bot detection
12
+ - Obsidian Clipper-compatible frontmatter (title, source, author, published, created, description, tags)
13
+ - Session management (`storageState`) for fetching authenticated pages
14
+ - 3-layer configuration resolution (CLI options > environment variables > config file)
14
15
 
15
- ## インストール
16
+ ## Installation
16
17
 
17
18
  ```bash
18
- # グローバルインストール
19
+ # Global install
19
20
  npm install -g vault-fetch
20
21
 
21
- # Playwright のブラウザも必要
22
+ # Playwright browser is also required
22
23
  npx playwright install chromium
23
24
  ```
24
25
 
25
- ## 使い方
26
+ ## Usage
26
27
 
27
- `npx` でインストールなしでも実行できます:
28
+ You can run it without installation using `npx`:
28
29
 
29
30
  ```bash
30
31
  npx vault-fetch fetch https://example.com/article --dry-run --dest /tmp
31
32
  ```
32
33
 
33
- ### ページの取得・保存
34
+ ### Fetching and Saving Pages
34
35
 
35
36
  ```bash
36
- # Obsidian Vault に保存
37
+ # Save to Obsidian Vault
37
38
  vault-fetch fetch https://example.com/article --dest ~/Documents/Obsidian/Clippings
38
39
 
39
- # 標準出力に出力(保存しない)
40
+ # Output to stdout (without saving)
40
41
  vault-fetch fetch https://example.com/article --dry-run --dest /tmp
41
42
 
42
- # headed モードで実行(デバッグ用)
43
+ # Run in headed mode (for debugging)
43
44
  vault-fetch fetch https://example.com/article --dest ~/Documents/Obsidian/Clippings --headed
44
45
 
45
- # 特定の CSS セレクタのみ抽出(Readability をスキップ)
46
+ # Extract only a specific CSS selector (skips Readability)
46
47
  vault-fetch fetch https://example.com/article --dest ~/Documents/Obsidian/Clippings --selector "article"
47
48
 
48
- # タグを追加
49
+ # Add tags
49
50
  vault-fetch fetch https://example.com/article --dest ~/Documents/Obsidian/Clippings --tag tech --tag ai
50
51
 
51
- # 非記事ページをフルページ変換(Readability をスキップ)
52
+ # Full-page conversion for non-article pages (skips Readability)
52
53
  vault-fetch fetch https://example.com/table-page --dest ~/Documents/Obsidian/Clippings --raw
53
54
 
54
- # 画像を含めてフェッチ(デフォルトではブロック)
55
+ # Fetch with images (blocked by default)
55
56
  vault-fetch fetch https://example.com/article --dest ~/Documents/Obsidian/Clippings --no-block-images
57
+
58
+ # Fetch a PDF and convert to Markdown (auto-detected)
59
+ vault-fetch fetch https://example.com/report.pdf --dest ~/Documents/Obsidian/Clippings
56
60
  ```
57
61
 
58
- ### ログイン(セッション保存)
62
+ ### PDF Support
63
+
64
+ When the server returns `Content-Type: application/pdf`, vault-fetch automatically downloads the PDF and converts it to Markdown using [pdf2md](https://github.com/opendocsg/pdf2md). No additional flags are needed.
65
+
66
+ - Title is extracted in priority order: PDF metadata (`dc:title` / `info.Title`) > first `#` heading in converted Markdown > URL filename
67
+ - Author and published date are also extracted from PDF metadata when available
68
+ - Use `--title` to manually override the title if automatic extraction is inaccurate
69
+ - `--selector` and `--raw` options cannot be used with PDF URLs
70
+ - Session support works with authenticated PDF downloads
71
+
72
+ ### Login (Session Storage)
59
73
 
60
- 認証が必要なサイトの場合、事前にログインしてセッションを保存できます。
74
+ For sites that require authentication, you can log in and save the session beforehand.
61
75
 
62
76
  ```bash
63
77
  vault-fetch login https://note.com
64
- # → ブラウザが開く → 手動でログイン → ターミナルで Enter を押す
78
+ # → Browser opens → Log in manually → Press Enter in terminal
65
79
  ```
66
80
 
67
- 以降の `fetch` でそのドメインのセッションが自動的に使用されます。
81
+ Subsequent `fetch` commands will automatically use the saved session for that domain.
68
82
 
69
- ### fetch オプション
83
+ ### Fetch Options
70
84
 
71
- | オプション | 説明 |
85
+ | Option | Description |
72
86
  |---|---|
73
- | `--dest <path>` | 保存先ディレクトリ(必須) |
74
- | `--headed` | ブラウザを表示して実行 |
75
- | `--selector <css>` | CSS セレクタで要素を抽出 |
76
- | `--timeout <sec>` | タイムアウト秒数(デフォルト: 30) |
77
- | `--tag <name>` | タグ追加(複数指定可) |
78
- | `--wait-until <event>` | 待機条件: `load` / `domcontentloaded` / `networkidle`(デフォルト: `networkidle`) |
79
- | `--skip-session` | 保存済みセッションを使わない |
80
- | `--dry-run` | 保存せず標準出力に出力 |
81
- | `--raw` | Readability をスキップし、フルページ HTML を直接変換 |
82
- | `--no-block-images` | 画像リクエストのブロックを無効化 |
83
- | `--no-block-fonts` | フォントリクエストのブロックを無効化 |
84
- | `--no-block-media` | メディアリクエストのブロックを無効化 |
85
-
86
- ## 設定
87
-
88
- ### 設定ファイル
87
+ | `--dest <path>` | Destination directory (required) |
88
+ | `--title <text>` | Manually set the title (frontmatter `title` and output filename) |
89
+ | `--headed` | Run with browser visible |
90
+ | `--selector <css>` | Extract elements by CSS selector |
91
+ | `--timeout <sec>` | Timeout in seconds (default: 30) |
92
+ | `--tag <name>` | Add tags (can be specified multiple times) |
93
+ | `--wait-until <event>` | Wait condition: `load` / `domcontentloaded` / `networkidle` (default: `networkidle`) |
94
+ | `--skip-session` | Do not use saved sessions |
95
+ | `--dry-run` | Output to stdout without saving |
96
+ | `--raw` | Skip Readability and convert full-page HTML directly |
97
+ | `--no-block-images` | Disable image request blocking |
98
+ | `--no-block-fonts` | Disable font request blocking |
99
+ | `--no-block-media` | Disable media request blocking |
100
+
101
+ ## Configuration
102
+
103
+ ### Config File
89
104
 
90
105
  `~/.config/vault-fetch/config.yaml`:
91
106
 
92
107
  ```yaml
93
- # Obsidian Vault の保存先
108
+ # Obsidian Vault destination
94
109
  dest: ~/Documents/Obsidian/Clippings
95
110
 
96
- # デフォルトタグ
111
+ # Default tags
97
112
  tags:
98
113
  - clippings
99
114
 
100
115
  timeout: 30
101
116
  ```
102
117
 
103
- ### 環境変数
118
+ ### Environment Variables
104
119
 
105
- | 変数 | 説明 |
120
+ | Variable | Description |
106
121
  |---|---|
107
- | `VAULT_FETCH_DEST` | 保存先ディレクトリ |
108
- | `VAULT_FETCH_TIMEOUT` | タイムアウト秒数 |
122
+ | `VAULT_FETCH_DEST` | Destination directory |
123
+ | `VAULT_FETCH_TIMEOUT` | Timeout in seconds |
109
124
 
110
- ### 優先順位
125
+ ### Priority
111
126
 
112
- CLI オプション > 環境変数 > 設定ファイル > デフォルト値
127
+ CLI options > Environment variables > Config file > Default values
113
128
 
114
- ## 出力例
129
+ ## Output Example
115
130
 
116
131
  ```yaml
117
132
  ---
118
- title: ADHDの自分が毎日クッソ集中できるようになった習慣
119
- source: https://note.com/simplearchitect/n/n8389e1b4fbde
133
+ title: "Thinking, Fast and Slow: Lessons for Software Engineers"
134
+ source: https://medium.com/@example/thinking-fast-and-slow-lessons-for-engineers-abc123
120
135
  author:
121
- - "[[牛尾 剛]]"
136
+ - "[[Jane Smith]]"
122
137
  published: 2025-06-14
123
138
  created: 2025-07-03
124
- description: 自分はADHDですので、もちろん集中力は暗黒です...
139
+ description: How cognitive biases from Kahneman's research apply to everyday engineering decisions...
125
140
  tags:
126
141
  - clippings
127
142
  ---
128
143
 
129
- 記事の本文が Markdown で続きます...
144
+ The article body continues in Markdown...
130
145
  ```
131
146
 
132
- ## 開発
147
+ ## Development
133
148
 
134
149
  ```bash
135
- npm run build # tsup でビルド
136
- npm test # vitest でテスト実行
137
- npm run typecheck # 型チェック
150
+ npm run build # Build with tsup
151
+ npm test # Run tests with vitest
152
+ npm run typecheck # Type checking
138
153
  ```
139
154
 
140
- ## ライセンス
155
+ ## License
141
156
 
142
157
  MIT
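Review note on the README's PDF Support section: it lists a title priority of PDF metadata, then the first `#` heading in the converted Markdown, then the URL filename. The converter that implements this lives in a separate chunk that is not part of this diff, so the following is only a minimal sketch of that priority chain. The name `resolvePdfTitle`, the `"untitled"` fallback, and the stripping of the `.pdf` extension are assumptions made for illustration.

```javascript
// Hypothetical helper, not taken from the vault-fetch source.
// Priority: PDF metadata title > first "# " heading in the Markdown > URL filename.
function resolvePdfTitle(metadataTitle, markdown, url) {
  const fromMetadata = (metadataTitle ?? "").trim();
  if (fromMetadata) return fromMetadata;

  // Match only a level-1 heading ("# Title"), not "## Subtitle".
  const heading = markdown.match(/^#[ \t]+(.+)$/m);
  if (heading) return heading[1].trim();

  // Fall back to the last path segment; stripping ".pdf" is an assumption.
  const lastSegment = new URL(url).pathname.split("/").filter(Boolean).pop() ?? "";
  return lastSegment.replace(/\.pdf$/i, "") || "untitled";
}

console.log(resolvePdfTitle("Annual Report", "# Heading", "https://example.com/report.pdf"));
console.log(resolvePdfTitle(null, "intro\n# Heading\n", "https://example.com/report.pdf"));
console.log(resolvePdfTitle("", "no heading here", "https://example.com/docs/report.pdf"));
```

When none of the three sources yields a usable title, `--title` remains the documented manual override.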
package/dist/cli.js CHANGED
@@ -96,6 +96,7 @@ function resolveConfig(cliOptions, configPath) {
96
96
  waitUntil,
97
97
  headed: cliOptions.headed ?? false,
98
98
  selector: cliOptions.selector ?? null,
99
+ title: cliOptions.title ?? null,
99
100
  noSession: cliOptions.noSession ?? false,
100
101
  dryRun: cliOptions.dryRun ?? false,
101
102
  blockImages: cliOptions.blockImages ?? true,
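Review note on `resolveConfig`: the new `title` field has no environment-variable or config-file layer, so it collapses to "CLI value or `null`", while fields such as `timeout` walk the full chain the README describes (CLI options > environment variables > config file > default values). The sketch below mirrors that behaviour in isolation. It is modeled on the `resolveConfig` source embedded in the source map, but `resolveTimeout` and `resolveTitle` are illustrative names, and the environment value is passed in as an argument rather than read from `process.env`.

```javascript
const DEFAULT_TIMEOUT = 30;

// Illustrative helper: CLI option > environment variable > config file > default.
function resolveTimeout(cliTimeout, envTimeout, fileTimeout) {
  if (cliTimeout !== undefined) return cliTimeout;
  if (envTimeout !== undefined) {
    // Environment variables arrive as strings and must be parsed.
    const parsed = Number(envTimeout);
    if (Number.isNaN(parsed)) {
      throw new Error(`Invalid VAULT_FETCH_TIMEOUT value: ${envTimeout}`);
    }
    return parsed;
  }
  return fileTimeout ?? DEFAULT_TIMEOUT;
}

// Options without an env or file layer reduce to "CLI value or null".
function resolveTitle(cliTitle) {
  return cliTitle ?? null;
}

console.log(resolveTimeout(10, "20", 40));
console.log(resolveTimeout(undefined, "20", 40));
console.log(resolveTimeout(undefined, undefined, 40));
console.log(resolveTimeout(undefined, undefined, undefined));
console.log(resolveTitle(undefined));
```

Normalizing a missing title to `null` lets the caller use a single `config.title !== null` check before overriding the extracted metadata.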
@@ -137,6 +138,9 @@ function ensureSessionDir(sessionsDir) {
137
138
 
138
139
  // src/fetcher.ts
139
140
  var CHROME_USER_AGENT = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/134.0.0.0 Safari/537.36";
141
+ function isPdfContentType(contentType) {
142
+ return contentType.toLowerCase().includes("application/pdf");
143
+ }
140
144
  function buildBlockedResourceTypes(options) {
141
145
  const blocked = /* @__PURE__ */ new Set();
142
146
  if (options.blockImages) blocked.add("image");
@@ -144,6 +148,29 @@ function buildBlockedResourceTypes(options) {
144
148
  if (options.blockMedia) blocked.add("media");
145
149
  return blocked;
146
150
  }
151
+ var PDF_MAGIC_BYTES = "%PDF";
152
+ function validatePdfBuffer(pdfBuffer, sourceUrl) {
153
+ if (pdfBuffer.length === 0) {
154
+ throw new Error(`Empty PDF response received from ${sourceUrl}`);
155
+ }
156
+ const header = pdfBuffer.subarray(0, PDF_MAGIC_BYTES.length).toString("ascii");
157
+ if (!header.startsWith(PDF_MAGIC_BYTES)) {
158
+ throw new Error(
159
+ `Response Content-Type is application/pdf but body is not valid PDF data from ${sourceUrl}`
160
+ );
161
+ }
162
+ }
163
+ async function downloadPdf(context, url, timeoutMs) {
164
+ const apiResponse = await context.request.get(url, { timeout: timeoutMs });
165
+ const status = apiResponse.status();
166
+ if (status >= 400) {
167
+ throw new Error(`HTTP ${status} received when downloading PDF from ${url}`);
168
+ }
169
+ const pdfBuffer = Buffer.from(await apiResponse.body());
170
+ const finalUrl = apiResponse.url();
171
+ validatePdfBuffer(pdfBuffer, finalUrl);
172
+ return { pdfBuffer, finalUrl };
173
+ }
147
174
  async function fetchPage(url, config, sessionsDir) {
148
175
  const browser = await chromium.launch({
149
176
  headless: !config.headed
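Review note on the PDF checks: `isPdfContentType` and `validatePdfBuffer` together guard against servers that label a response `application/pdf` but return something else, such as an HTML login page. The two functions are copied from the hunks above so they can be exercised on their own; the sample buffers and URLs are invented for illustration.

```javascript
const PDF_MAGIC_BYTES = "%PDF";

// Case-insensitive substring match, so parameters like "; charset=binary" are tolerated.
function isPdfContentType(contentType) {
  return contentType.toLowerCase().includes("application/pdf");
}

// Rejects empty bodies and bodies that do not start with the "%PDF" signature.
function validatePdfBuffer(pdfBuffer, sourceUrl) {
  if (pdfBuffer.length === 0) {
    throw new Error(`Empty PDF response received from ${sourceUrl}`);
  }
  const header = pdfBuffer.subarray(0, PDF_MAGIC_BYTES.length).toString("ascii");
  if (!header.startsWith(PDF_MAGIC_BYTES)) {
    throw new Error(
      `Response Content-Type is application/pdf but body is not valid PDF data from ${sourceUrl}`
    );
  }
}

// Made-up sample inputs.
const goodPdf = Buffer.from("%PDF-1.7\n...", "ascii");
const htmlErrorPage = Buffer.from("<!DOCTYPE html><html>login required</html>", "ascii");

console.log(isPdfContentType("Application/PDF; charset=binary"));
validatePdfBuffer(goodPdf, "https://example.com/report.pdf"); // passes silently
try {
  validatePdfBuffer(htmlErrorPage, "https://example.com/report.pdf");
} catch (error) {
  console.log(error.message);
}
```

The signature check only inspects the first four bytes, so it catches mislabeled responses but does not prove the file is a well-formed PDF.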
@@ -169,10 +196,20 @@ async function fetchPage(url, config, sessionsDir) {
169
196
  });
170
197
  }
171
198
  const timeoutMs = config.timeout * 1e3;
172
- const response = await page.goto(url, {
173
- waitUntil: config.waitUntil,
174
- timeout: timeoutMs
175
- });
199
+ let response;
200
+ try {
201
+ response = await page.goto(url, {
202
+ waitUntil: config.waitUntil,
203
+ timeout: timeoutMs
204
+ });
205
+ } catch (error) {
206
+ if (error instanceof Error && error.message.includes("Download is starting")) {
207
+ const result = await downloadPdf(context, url, timeoutMs);
208
+ await context.close();
209
+ return { kind: "pdf", pdfBuffer: result.pdfBuffer, url, finalUrl: result.finalUrl };
210
+ }
211
+ throw error;
212
+ }
176
213
  if (!response) {
177
214
  throw new Error(`No response received from ${url}`);
178
215
  }
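Review note on the `page.goto` fallback: the hunk above catches a navigation error whose message contains "Download is starting" and retries the URL through `downloadPdf`, rethrowing anything else. The control flow can be sketched without a browser as follows. `navigateOrDownload` is a hypothetical wrapper, and `navigate` and `download` are stubs standing in for `page.goto` and `downloadPdf`; the error strings used here are assumptions about what the real calls produce.

```javascript
// Hypothetical wrapper, not part of the vault-fetch source.
async function navigateOrDownload(navigate, download) {
  try {
    return { kind: "html", response: await navigate() };
  } catch (error) {
    // Only a download-triggering navigation is recoverable; everything else propagates.
    if (error instanceof Error && error.message.includes("Download is starting")) {
      return { kind: "pdf", ...(await download()) };
    }
    throw error;
  }
}

// Stubs that imitate a navigation which turns into a file download.
const failingNavigate = async () => {
  throw new Error("page.goto: Download is starting");
};
const fakeDownload = async () => ({
  pdfBuffer: Buffer.from("%PDF-1.7"),
  finalUrl: "https://example.com/report.pdf",
});

navigateOrDownload(failingNavigate, fakeDownload).then((result) => {
  console.log(result.kind, result.finalUrl);
});
```

Matching on message text ties the fallback to the wording of the underlying error, so this branch is worth re-checking whenever the Playwright dependency is upgraded.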
@@ -181,6 +218,19 @@ async function fetchPage(url, config, sessionsDir) {
181
218
  throw new Error(`HTTP ${status} received from ${response.url()}`);
182
219
  }
183
220
  const finalUrl = response.url();
221
+ const contentType = response.headers()["content-type"] ?? "";
222
+ if (isPdfContentType(contentType)) {
223
+ const body = await response.body();
224
+ try {
225
+ validatePdfBuffer(body, finalUrl);
226
+ await context.close();
227
+ return { kind: "pdf", pdfBuffer: body, url, finalUrl };
228
+ } catch {
229
+ const result = await downloadPdf(context, finalUrl, timeoutMs);
230
+ await context.close();
231
+ return { kind: "pdf", pdfBuffer: result.pdfBuffer, url, finalUrl: result.finalUrl };
232
+ }
233
+ }
184
234
  const fullHtml = await page.content();
185
235
  let html;
186
236
  if (config.selector) {
@@ -193,7 +243,7 @@ async function fetchPage(url, config, sessionsDir) {
193
243
  html = fullHtml;
194
244
  }
195
245
  await context.close();
196
- return { html, fullHtml, url, finalUrl };
246
+ return { kind: "html", html, fullHtml, url, finalUrl };
197
247
  } finally {
198
248
  await browser.close();
199
249
  }
@@ -349,14 +399,14 @@ var CONFIG_PATH = join3(homedir3(), ".config", "vault-fetch", "config.yaml");
349
399
  var program = new Command();
350
400
  program.name("vault-fetch").description(
351
401
  "Fetch JS-rendered web pages and save as Markdown to Obsidian Vault"
352
- ).version("0.2.0");
402
+ ).version("0.3.0");
353
403
  program.command("fetch").description("Fetch a page and save as Markdown").argument("<url>", "URL to fetch").option("--dest <path>", "Destination directory").option("--headed", "Run browser in headed mode").option("--selector <css>", "CSS selector to extract").option("--timeout <seconds>", "Timeout in seconds", parseInt).option("--tag <name>", "Add tag (repeatable)", (val, acc) => {
354
404
  acc.push(val);
355
405
  return acc;
356
406
  }, []).option(
357
407
  "--wait-until <event>",
358
408
  "Wait condition: load, domcontentloaded, networkidle"
359
- ).option("--skip-session", "Do not use saved session").option("--dry-run", "Output to stdout instead of saving").option("--no-block-images", "Do not block image requests").option("--no-block-fonts", "Do not block font requests").option("--no-block-media", "Do not block media requests").option("--raw", "Convert full page HTML without Readability extraction").action(async (url, options) => {
409
+ ).option("--skip-session", "Do not use saved session").option("--dry-run", "Output to stdout instead of saving").option("--no-block-images", "Do not block image requests").option("--no-block-fonts", "Do not block font requests").option("--no-block-media", "Do not block media requests").option("--raw", "Convert full page HTML without Readability extraction").option("--title <text>", "Override the page title for the output filename").action(async (url, options) => {
360
410
  try {
361
411
  const configPath = existsSync2(CONFIG_PATH) ? CONFIG_PATH : void 0;
362
412
  const config = resolveConfig(
@@ -367,6 +417,7 @@ program.command("fetch").description("Fetch a page and save as Markdown").argume
367
417
  waitUntil: options.waitUntil,
368
418
  headed: options.headed,
369
419
  selector: options.selector,
420
+ title: options.title,
370
421
  noSession: options.skipSession,
371
422
  dryRun: options.dryRun,
372
423
  blockImages: options.blockImages,
@@ -384,20 +435,36 @@ program.command("fetch").description("Fetch a page and save as Markdown").argume
384
435
  }
385
436
  const sessionsDir = getSessionDir();
386
437
  const fetchResult = await fetchPage(url, config, sessionsDir);
387
- let contentHtml;
438
+ let markdown;
388
439
  let metadata;
389
- if (config.selector) {
390
- contentHtml = fetchResult.html;
440
+ if (fetchResult.kind === "pdf") {
441
+ if (config.selector) {
442
+ throw new Error("--selector cannot be used with PDF URLs.");
443
+ }
444
+ if (config.raw) {
445
+ throw new Error("--raw cannot be used with PDF URLs.");
446
+ }
447
+ const { convertPdfToMarkdown } = await import("./pdf-converter-FM6DCBO5.js");
448
+ const pdfResult = await convertPdfToMarkdown(
449
+ fetchResult.pdfBuffer,
450
+ fetchResult.finalUrl
451
+ );
452
+ markdown = pdfResult.markdown;
453
+ metadata = pdfResult.metadata;
454
+ } else if (config.selector) {
391
455
  metadata = extractMetadata(fetchResult.fullHtml, fetchResult.finalUrl);
456
+ markdown = convertToMarkdown(fetchResult.html);
392
457
  } else if (config.raw) {
393
- contentHtml = fetchResult.fullHtml;
394
458
  metadata = extractMetadata(fetchResult.fullHtml, fetchResult.finalUrl);
459
+ markdown = convertToMarkdown(fetchResult.fullHtml);
395
460
  } else {
396
461
  const result = extract(fetchResult.html, fetchResult.finalUrl);
397
462
  metadata = result.metadata;
398
- contentHtml = result.content;
463
+ markdown = convertToMarkdown(result.content);
464
+ }
465
+ if (config.title !== null) {
466
+ metadata = { ...metadata, title: config.title };
399
467
  }
400
- const markdown = convertToMarkdown(contentHtml);
401
468
  if (config.dryRun) {
402
469
  const frontmatter = buildFrontmatter(metadata, config.tags);
403
470
  process.stdout.write(`${frontmatter}
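Review note on the `fetch` action: `fetchPage` now returns a result tagged with `kind`, and the action branches on it before applying the `--title` override. The sketch below condenses that flow. `toMarkdown` and the `converters` object are hypothetical stand-ins for `convertPdfToMarkdown`, `convertToMarkdown`, and the extractors, and the three HTML branches of the real code are collapsed into one here.

```javascript
// Hypothetical condensation of the action body; converters are injected stubs.
function toMarkdown(fetchResult, config, converters) {
  let markdown;
  let metadata;
  if (fetchResult.kind === "pdf") {
    // These options only make sense for HTML, so they are rejected for PDFs.
    if (config.selector) throw new Error("--selector cannot be used with PDF URLs.");
    if (config.raw) throw new Error("--raw cannot be used with PDF URLs.");
    ({ markdown, metadata } = converters.pdf(fetchResult.pdfBuffer, fetchResult.finalUrl));
  } else {
    ({ markdown, metadata } = converters.html(fetchResult.html, fetchResult.finalUrl));
  }
  // A manual title replaces whatever the converter extracted.
  if (config.title !== null) {
    metadata = { ...metadata, title: config.title };
  }
  return { markdown, metadata };
}

// Stub converters with made-up output.
const converters = {
  pdf: (buffer, url) => ({ markdown: "# From PDF", metadata: { title: "PDF title", source: url } }),
  html: (html, url) => ({ markdown: "# From HTML", metadata: { title: "HTML title", source: url } }),
};

const pdfResult = { kind: "pdf", pdfBuffer: Buffer.from("%PDF-1.7"), finalUrl: "https://example.com/report.pdf" };
const baseConfig = { selector: null, raw: false, title: null };

console.log(toMarkdown(pdfResult, baseConfig, converters).metadata.title);
console.log(toMarkdown(pdfResult, { ...baseConfig, title: "Manual" }, converters).metadata.title);
```

Because the PDF check depends on the response, the `--selector` and `--raw` errors can only surface after the fetch has completed.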
package/dist/cli.js.map CHANGED
@@ -1 +1 @@
1
- {"version":3,"sources":["../src/cli.ts","../src/config.ts","../src/fetcher.ts","../src/session.ts","../src/extractor.ts","../src/converter.ts","../src/writer.ts"],"sourcesContent":["import { Command } from \"commander\";\nimport { once } from \"node:events\";\nimport { existsSync } from \"node:fs\";\nimport { homedir } from \"node:os\";\nimport { join } from \"node:path\";\nimport { resolveConfig } from \"./config.js\";\nimport { fetchPage } from \"./fetcher.js\";\nimport { extract, extractMetadata } from \"./extractor.js\";\nimport { convertToMarkdown } from \"./converter.js\";\nimport { writeMarkdownFile, buildFrontmatter } from \"./writer.js\";\nimport {\n getSessionDir,\n getSessionPath,\n ensureSessionDir,\n} from \"./session.js\";\nimport type { WaitUntilOption } from \"./types.js\";\n\nconst CONFIG_PATH = join(homedir(), \".config\", \"vault-fetch\", \"config.yaml\");\n\nconst program = new Command();\n\nprogram\n .name(\"vault-fetch\")\n .description(\n \"Fetch JS-rendered web pages and save as Markdown to Obsidian Vault\",\n )\n .version(\"0.2.0\");\n\nprogram\n .command(\"fetch\")\n .description(\"Fetch a page and save as Markdown\")\n .argument(\"<url>\", \"URL to fetch\")\n .option(\"--dest <path>\", \"Destination directory\")\n .option(\"--headed\", \"Run browser in headed mode\")\n .option(\"--selector <css>\", \"CSS selector to extract\")\n .option(\"--timeout <seconds>\", \"Timeout in seconds\", parseInt)\n .option(\"--tag <name>\", \"Add tag (repeatable)\", (val: string, acc: string[]) => {\n acc.push(val);\n return acc;\n }, [] as string[])\n .option(\n \"--wait-until <event>\",\n \"Wait condition: load, domcontentloaded, networkidle\",\n )\n .option(\"--skip-session\", \"Do not use saved session\")\n .option(\"--dry-run\", \"Output to stdout instead of saving\")\n .option(\"--no-block-images\", \"Do not block image requests\")\n .option(\"--no-block-fonts\", \"Do not block font requests\")\n .option(\"--no-block-media\", \"Do not block media 
requests\")\n .option(\"--raw\", \"Convert full page HTML without Readability extraction\")\n .action(async (url: string, options: Record<string, unknown>) => {\n try {\n const configPath = existsSync(CONFIG_PATH) ? CONFIG_PATH : undefined;\n const config = resolveConfig(\n {\n dest: options.dest as string | undefined,\n tags: options.tag as string[] | undefined,\n timeout: options.timeout as number | undefined,\n waitUntil: options.waitUntil as WaitUntilOption | undefined,\n headed: options.headed as boolean | undefined,\n selector: options.selector as string | undefined,\n noSession: options.skipSession as boolean | undefined,\n dryRun: options.dryRun as boolean | undefined,\n blockImages: options.blockImages as boolean | undefined,\n blockFonts: options.blockFonts as boolean | undefined,\n blockMedia: options.blockMedia as boolean | undefined,\n raw: options.raw as boolean | undefined,\n },\n configPath,\n );\n\n if (config.raw && config.selector) {\n throw new Error(\"--raw and --selector cannot be used together.\");\n }\n\n // Validate dest directory exists\n if (!config.dryRun && !existsSync(config.dest)) {\n throw new Error(`Destination directory does not exist: ${config.dest}`);\n }\n\n const sessionsDir = getSessionDir();\n const fetchResult = await fetchPage(url, config, sessionsDir);\n\n let contentHtml: string;\n let metadata;\n\n if (config.selector) {\n // --selector mode: skip Readability, extract metadata from full page\n contentHtml = fetchResult.html;\n metadata = extractMetadata(fetchResult.fullHtml, fetchResult.finalUrl);\n } else if (config.raw) {\n // --raw mode: skip Readability, convert full page HTML directly\n contentHtml = fetchResult.fullHtml;\n metadata = extractMetadata(fetchResult.fullHtml, fetchResult.finalUrl);\n } else {\n const result = extract(fetchResult.html, fetchResult.finalUrl);\n metadata = result.metadata;\n contentHtml = result.content;\n }\n\n const markdown = convertToMarkdown(contentHtml);\n\n if (config.dryRun) {\n 
const frontmatter = buildFrontmatter(metadata, config.tags);\n process.stdout.write(`${frontmatter}\\n\\n${markdown}\\n`);\n } else {\n const filePath = writeMarkdownFile(\n config.dest,\n metadata,\n markdown,\n config.tags,\n );\n console.error(`Saved: ${filePath}`);\n }\n } catch (error) {\n const message = error instanceof Error ? error.message : String(error);\n console.error(`Error: ${message}`);\n process.exit(1);\n }\n });\n\nprogram\n .command(\"login\")\n .description(\"Login to a site and save session\")\n .argument(\"<url>\", \"URL to login\")\n .option(\"--timeout <seconds>\", \"Login timeout in seconds\", parseInt)\n .action(async (url: string, options: Record<string, unknown>) => {\n const { chromium } = await import(\"playwright\");\n const sessionsDir = getSessionDir();\n ensureSessionDir(sessionsDir);\n\n const timeoutSec = (options.timeout as number | undefined) ?? 300;\n const browser = await chromium.launch({ headless: false });\n\n try {\n const context = await browser.newContext();\n const page = await context.newPage();\n\n await page.goto(url, { waitUntil: \"networkidle\", timeout: timeoutSec * 1000 });\n\n console.error(\"Browser opened. Log in manually, then press Enter here to save session.\");\n\n process.stdin.resume();\n await once(process.stdin, \"data\");\n process.stdin.pause();\n process.stdin.unref();\n\n const sessionPath = getSessionPath(url, sessionsDir);\n await context.storageState({ path: sessionPath });\n console.error(`Session saved: ${sessionPath}`);\n } catch (error) {\n const message = error instanceof Error ? 
error.message : String(error);\n console.error(`Error: ${message}`);\n process.exit(1);\n } finally {\n await browser.close();\n }\n });\n\nprogram.parse();\n","import { readFileSync } from \"node:fs\";\nimport { homedir } from \"node:os\";\nimport { resolve } from \"node:path\";\nimport yaml from \"js-yaml\";\nimport type { ResolvedConfig, WaitUntilOption } from \"./types.js\";\n\nconst DEFAULT_TIMEOUT = 30;\nconst DEFAULT_WAIT_UNTIL: WaitUntilOption = \"networkidle\";\nconst REQUIRED_TAG = \"clippings\";\n\ninterface FileConfig {\n dest?: string;\n tags?: string[];\n timeout?: number;\n waitUntil?: WaitUntilOption;\n}\n\ninterface CliOptions {\n dest?: string;\n tags?: string[];\n timeout?: number;\n waitUntil?: WaitUntilOption;\n headed?: boolean;\n selector?: string;\n noSession?: boolean;\n dryRun?: boolean;\n blockImages?: boolean;\n blockFonts?: boolean;\n blockMedia?: boolean;\n raw?: boolean;\n}\n\nfunction expandTilde(filePath: string): string {\n if (filePath.startsWith(\"~/\")) {\n return resolve(homedir(), filePath.slice(2));\n }\n return filePath;\n}\n\nconst VALID_WAIT_UNTIL: readonly string[] = [\"load\", \"domcontentloaded\", \"networkidle\"];\n\nfunction validateWaitUntil(value: string): WaitUntilOption {\n if (!VALID_WAIT_UNTIL.includes(value)) {\n throw new Error(\n `Invalid waitUntil value: \"${value}\". 
Must be one of: ${VALID_WAIT_UNTIL.join(\", \")}`,\n );\n }\n return value as WaitUntilOption;\n}\n\nfunction loadConfigFile(configPath: string): FileConfig {\n const content = readFileSync(configPath, \"utf-8\");\n const parsed = yaml.load(content);\n if (parsed === null || typeof parsed !== \"object\") {\n throw new Error(`Invalid config file: ${configPath}`);\n }\n const config = parsed as Record<string, unknown>;\n\n if (config.timeout !== undefined && typeof config.timeout !== \"number\") {\n throw new Error(`Invalid timeout in config file: expected number, got ${typeof config.timeout}`);\n }\n if (config.dest !== undefined && typeof config.dest !== \"string\") {\n throw new Error(`Invalid dest in config file: expected string, got ${typeof config.dest}`);\n }\n if (config.waitUntil !== undefined) {\n if (typeof config.waitUntil !== \"string\") {\n throw new Error(`Invalid waitUntil in config file: expected string, got ${typeof config.waitUntil}`);\n }\n validateWaitUntil(config.waitUntil);\n }\n if (config.tags !== undefined) {\n if (!Array.isArray(config.tags) || !config.tags.every((t: unknown) => typeof t === \"string\")) {\n throw new Error(\"Invalid tags in config file: expected array of strings\");\n }\n }\n\n return config as FileConfig;\n}\n\nexport function resolveConfig(\n cliOptions: CliOptions,\n configPath: string | undefined,\n): ResolvedConfig {\n // Layer 1: Config file\n let fileConfig: FileConfig = {};\n if (configPath) {\n fileConfig = loadConfigFile(configPath);\n }\n\n // Layer 2: Environment variables\n const envDest = process.env.VAULT_FETCH_DEST;\n const envTimeout = process.env.VAULT_FETCH_TIMEOUT;\n\n // Resolve each field: CLI > env > file > default\n const dest = cliOptions.dest ?? envDest ?? fileConfig.dest;\n if (dest === undefined) {\n throw new Error(\n \"dest is required. 
Set via --dest, VAULT_FETCH_DEST, or config file.\",\n );\n }\n\n let timeout: number;\n if (cliOptions.timeout !== undefined) {\n timeout = cliOptions.timeout;\n } else if (envTimeout !== undefined) {\n const parsed = Number(envTimeout);\n if (Number.isNaN(parsed)) {\n throw new Error(`Invalid VAULT_FETCH_TIMEOUT value: ${envTimeout}`);\n }\n timeout = parsed;\n } else {\n timeout = fileConfig.timeout ?? DEFAULT_TIMEOUT;\n }\n\n const rawWaitUntil = cliOptions.waitUntil ?? fileConfig.waitUntil ?? DEFAULT_WAIT_UNTIL;\n const waitUntil = validateWaitUntil(rawWaitUntil);\n\n // Merge tags: file tags + CLI tags + always clippings\n const allTags = [\n ...(fileConfig.tags ?? []),\n ...(cliOptions.tags ?? []),\n REQUIRED_TAG,\n ];\n const tags = [...new Set(allTags)];\n\n return {\n dest: expandTilde(dest),\n tags,\n timeout,\n waitUntil,\n headed: cliOptions.headed ?? false,\n selector: cliOptions.selector ?? null,\n noSession: cliOptions.noSession ?? false,\n dryRun: cliOptions.dryRun ?? false,\n blockImages: cliOptions.blockImages ?? true,\n blockFonts: cliOptions.blockFonts ?? true,\n blockMedia: cliOptions.blockMedia ?? true,\n raw: cliOptions.raw ?? 
false,\n };\n}\n","import { chromium, type BrowserContext } from \"playwright\";\nimport type { FetchResult, ResolvedConfig } from \"./types.js\";\nimport { getSessionPath, sessionExists } from \"./session.js\";\n\nexport const CHROME_USER_AGENT =\n \"Mozilla/5.0 (Windows NT 10.0; Win64; x64) \" +\n \"AppleWebKit/537.36 (KHTML, like Gecko) \" +\n \"Chrome/134.0.0.0 Safari/537.36\";\n\ninterface BlockingOptions {\n blockImages: boolean;\n blockFonts: boolean;\n blockMedia: boolean;\n}\n\nexport function buildBlockedResourceTypes(options: BlockingOptions): Set<string> {\n const blocked = new Set<string>();\n if (options.blockImages) blocked.add(\"image\");\n if (options.blockFonts) blocked.add(\"font\");\n if (options.blockMedia) blocked.add(\"media\");\n return blocked;\n}\n\nexport async function fetchPage(\n url: string,\n config: ResolvedConfig,\n sessionsDir: string,\n): Promise<FetchResult> {\n const browser = await chromium.launch({\n headless: !config.headed,\n });\n\n try {\n const contextOptions: Parameters<typeof browser.newContext>[0] = {\n userAgent: CHROME_USER_AGENT,\n };\n\n // Load session if available and not disabled\n if (!config.noSession && sessionExists(url, sessionsDir)) {\n const sessionPath = getSessionPath(url, sessionsDir);\n contextOptions.storageState = sessionPath;\n }\n\n const context: BrowserContext = await browser.newContext(contextOptions);\n const page = await context.newPage();\n\n // Block specified resource types for faster loading\n const blockedTypes = buildBlockedResourceTypes(config);\n if (blockedTypes.size > 0) {\n await page.route(\"**/*\", async (route) => {\n if (blockedTypes.has(route.request().resourceType())) {\n await route.abort();\n } else {\n await route.continue();\n }\n });\n }\n\n const timeoutMs = config.timeout * 1000;\n const response = await page.goto(url, {\n waitUntil: config.waitUntil,\n timeout: timeoutMs,\n });\n\n if (!response) {\n throw new Error(`No response received from ${url}`);\n }\n\n const 
status = response.status();\n if (status >= 400) {\n throw new Error(`HTTP ${status} received from ${response.url()}`);\n }\n\n const finalUrl = response.url();\n const fullHtml = await page.content();\n let html: string;\n\n if (config.selector) {\n const element = await page.$(config.selector);\n if (!element) {\n throw new Error(`Selector not found: ${config.selector}`);\n }\n html = await element.innerHTML();\n } else {\n html = fullHtml;\n }\n\n await context.close();\n\n return { html, fullHtml, url, finalUrl };\n } finally {\n await browser.close();\n }\n}\n","import { existsSync, mkdirSync } from \"node:fs\";\nimport { join } from \"node:path\";\nimport { homedir } from \"node:os\";\n\nconst CONFIG_DIR = join(homedir(), \".config\", \"vault-fetch\");\nconst SESSIONS_DIR = join(CONFIG_DIR, \"sessions\");\n\nexport function getSessionDir(): string {\n return SESSIONS_DIR;\n}\n\nfunction extractDomain(url: string): string {\n const parsed = new URL(url);\n return parsed.hostname ?? \"\";\n}\n\nexport function getSessionPath(url: string, sessionsDir: string): string {\n const domain = extractDomain(url);\n return join(sessionsDir, `${domain}.json`);\n}\n\nexport function sessionExists(url: string, sessionsDir: string): boolean {\n const sessionPath = getSessionPath(url, sessionsDir);\n return existsSync(sessionPath);\n}\n\nexport function ensureSessionDir(sessionsDir: string): void {\n if (!existsSync(sessionsDir)) {\n mkdirSync(sessionsDir, { recursive: true });\n }\n}\n","import { Readability } from \"@mozilla/readability\";\nimport { JSDOM } from \"jsdom\";\nimport type { Metadata } from \"./types.js\";\n\nfunction getMetaContent(doc: Document, selector: string): string | null {\n const el = doc.querySelector(selector);\n return el?.getAttribute(\"content\") ?? 
null;\n}\n\nfunction formatAuthor(raw: string): string {\n return `[[${raw.trim()}]]`;\n}\n\nfunction extractPublishedDate(doc: Document): string | null {\n const published =\n getMetaContent(doc, 'meta[property=\"article:published_time\"]') ??\n getMetaContent(doc, 'meta[name=\"datePublished\"]');\n\n if (!published) {\n const jsonLd = doc.querySelector('script[type=\"application/ld+json\"]');\n if (jsonLd?.textContent) {\n try {\n const data = JSON.parse(jsonLd.textContent) as Record<string, unknown>;\n if (typeof data.datePublished === \"string\") {\n return data.datePublished.split(\"T\")[0];\n }\n } catch {\n // JSON-LD parse failed\n }\n }\n return null;\n }\n\n return published.split(\"T\")[0];\n}\n\nfunction extractAuthors(\n doc: Document,\n readabilityByline: string | null,\n): string[] {\n const articleAuthors = doc.querySelectorAll('meta[property=\"article:author\"]');\n if (articleAuthors.length > 0) {\n return Array.from(articleAuthors)\n .map((el) => el.getAttribute(\"content\"))\n .filter((v): v is string => v !== null)\n .map(formatAuthor);\n }\n\n const ogAuthor = getMetaContent(doc, 'meta[property=\"og:author\"]');\n if (ogAuthor) {\n return [formatAuthor(ogAuthor)];\n }\n\n if (readabilityByline) {\n return [formatAuthor(readabilityByline)];\n }\n\n return [];\n}\n\nexport interface ExtractResult {\n metadata: Metadata;\n content: string;\n}\n\ninterface ReadabilityArticle {\n title: string;\n byline: string | null;\n excerpt: string;\n content: string;\n}\n\nfunction buildMetadata(\n doc: Document,\n article: ReadabilityArticle | null,\n finalUrl: string,\n): Metadata {\n const title = article?.title ?? doc.title;\n const authors = extractAuthors(doc, article?.byline ?? null);\n const published = extractPublishedDate(doc);\n\n const description =\n getMetaContent(doc, 'meta[property=\"og:description\"]') ??\n getMetaContent(doc, 'meta[name=\"description\"]') ??\n (article?.excerpt ?? 
null);\n\n const today = new Date().toISOString().split(\"T\")[0];\n\n return {\n title,\n source: finalUrl,\n author: authors,\n published,\n created: today,\n description,\n };\n}\n\nfunction parseWithReadability(html: string, url: string): ReadabilityArticle | null {\n const dom = new JSDOM(html, { url });\n const reader = new Readability(dom.window.document);\n return reader.parse() as ReadabilityArticle | null;\n}\n\nexport function extract(html: string, finalUrl: string): ExtractResult {\n const metaDom = new JSDOM(html, { url: finalUrl });\n const doc = metaDom.window.document;\n\n const article = parseWithReadability(html, finalUrl);\n\n if (!article) {\n throw new Error(\n \"Readability failed to extract content from the page. \" +\n \"Try --raw to convert the full page, or --selector <css> to target specific content.\",\n );\n }\n\n if (!article.content) {\n throw new Error(\"Readability returned empty content for the page\");\n }\n\n return {\n metadata: buildMetadata(doc, article, finalUrl),\n content: article.content,\n };\n}\n\nexport function extractMetadata(html: string, finalUrl: string): Metadata {\n const metaDom = new JSDOM(html, { url: finalUrl });\n const doc = metaDom.window.document;\n\n const article = parseWithReadability(html, finalUrl);\n\n return buildMetadata(doc, article, finalUrl);\n}\n","import TurndownService from \"turndown\";\n\nexport function convertToMarkdown(html: string): string {\n const turndown = new TurndownService({\n headingStyle: \"atx\",\n codeBlockStyle: \"fenced\",\n bulletListMarker: \"-\",\n });\n\n return turndown.turndown(html);\n}\n","import { writeFileSync } from \"node:fs\";\nimport { join } from \"node:path\";\nimport yaml from \"js-yaml\";\nimport type { Metadata } from \"./types.js\";\n\nconst UNSAFE_CHARS = /[/\\\\:*?\"<>|]/g;\nconst CONTROL_CHARS = /[\\x00-\\x1f\\x7f]/g;\nconst MAX_FILENAME_LENGTH = 200;\n\nexport function sanitizeFilename(title: string): string {\n const sanitized = title\n 
.replace(CONTROL_CHARS, \"\")\n .replace(UNSAFE_CHARS, \"\")\n .replace(/\\s+/g, \" \")\n .trim();\n const base = sanitized.slice(0, MAX_FILENAME_LENGTH) || \"Untitled\";\n return `${base}.md`;\n}\n\nexport function buildFrontmatter(metadata: Metadata, tags: string[]): string {\n const data: Record<string, unknown> = {\n title: metadata.title,\n source: metadata.source,\n };\n\n if (metadata.author.length > 0) {\n data.author = metadata.author;\n }\n\n if (metadata.published) {\n data.published = metadata.published;\n }\n\n data.created = metadata.created;\n\n if (metadata.description) {\n data.description = metadata.description;\n }\n\n data.tags = tags;\n\n const yamlStr = yaml.dump(data, {\n quotingType: '\"',\n forceQuotes: false,\n lineWidth: -1,\n sortKeys: false,\n });\n\n return `---\\n${yamlStr}---`;\n}\n\nexport function writeMarkdownFile(\n dest: string,\n metadata: Metadata,\n markdownContent: string,\n tags: string[],\n): string {\n const filename = sanitizeFilename(metadata.title);\n const filePath = join(dest, filename);\n const frontmatter = buildFrontmatter(metadata, tags);\n const fullContent = `${frontmatter}\\n\\n${markdownContent}\\n`;\n\n writeFileSync(filePath, fullContent, \"utf-8\");\n\n return 
filePath;\n}\n"],"mappings":";;;AAAA,SAAS,eAAe;AACxB,SAAS,YAAY;AACrB,SAAS,cAAAA,mBAAkB;AAC3B,SAAS,WAAAC,gBAAe;AACxB,SAAS,QAAAC,aAAY;;;ACJrB,SAAS,oBAAoB;AAC7B,SAAS,eAAe;AACxB,SAAS,eAAe;AACxB,OAAO,UAAU;AAGjB,IAAM,kBAAkB;AACxB,IAAM,qBAAsC;AAC5C,IAAM,eAAe;AAwBrB,SAAS,YAAY,UAA0B;AAC7C,MAAI,SAAS,WAAW,IAAI,GAAG;AAC7B,WAAO,QAAQ,QAAQ,GAAG,SAAS,MAAM,CAAC,CAAC;AAAA,EAC7C;AACA,SAAO;AACT;AAEA,IAAM,mBAAsC,CAAC,QAAQ,oBAAoB,aAAa;AAEtF,SAAS,kBAAkB,OAAgC;AACzD,MAAI,CAAC,iBAAiB,SAAS,KAAK,GAAG;AACrC,UAAM,IAAI;AAAA,MACR,6BAA6B,KAAK,sBAAsB,iBAAiB,KAAK,IAAI,CAAC;AAAA,IACrF;AAAA,EACF;AACA,SAAO;AACT;AAEA,SAAS,eAAe,YAAgC;AACtD,QAAM,UAAU,aAAa,YAAY,OAAO;AAChD,QAAM,SAAS,KAAK,KAAK,OAAO;AAChC,MAAI,WAAW,QAAQ,OAAO,WAAW,UAAU;AACjD,UAAM,IAAI,MAAM,wBAAwB,UAAU,EAAE;AAAA,EACtD;AACA,QAAM,SAAS;AAEf,MAAI,OAAO,YAAY,UAAa,OAAO,OAAO,YAAY,UAAU;AACtE,UAAM,IAAI,MAAM,wDAAwD,OAAO,OAAO,OAAO,EAAE;AAAA,EACjG;AACA,MAAI,OAAO,SAAS,UAAa,OAAO,OAAO,SAAS,UAAU;AAChE,UAAM,IAAI,MAAM,qDAAqD,OAAO,OAAO,IAAI,EAAE;AAAA,EAC3F;AACA,MAAI,OAAO,cAAc,QAAW;AAClC,QAAI,OAAO,OAAO,cAAc,UAAU;AACxC,YAAM,IAAI,MAAM,0DAA0D,OAAO,OAAO,SAAS,EAAE;AAAA,IACrG;AACA,sBAAkB,OAAO,SAAS;AAAA,EACpC;AACA,MAAI,OAAO,SAAS,QAAW;AAC7B,QAAI,CAAC,MAAM,QAAQ,OAAO,IAAI,KAAK,CAAC,OAAO,KAAK,MAAM,CAAC,MAAe,OAAO,MAAM,QAAQ,GAAG;AAC5F,YAAM,IAAI,MAAM,wDAAwD;AAAA,IAC1E;AAAA,EACF;AAEA,SAAO;AACT;AAEO,SAAS,cACd,YACA,YACgB;AAEhB,MAAI,aAAyB,CAAC;AAC9B,MAAI,YAAY;AACd,iBAAa,eAAe,UAAU;AAAA,EACxC;AAGA,QAAM,UAAU,QAAQ,IAAI;AAC5B,QAAM,aAAa,QAAQ,IAAI;AAG/B,QAAM,OAAO,WAAW,QAAQ,WAAW,WAAW;AACtD,MAAI,SAAS,QAAW;AACtB,UAAM,IAAI;AAAA,MACR;AAAA,IACF;AAAA,EACF;AAEA,MAAI;AACJ,MAAI,WAAW,YAAY,QAAW;AACpC,cAAU,WAAW;AAAA,EACvB,WAAW,eAAe,QAAW;AACnC,UAAM,SAAS,OAAO,UAAU;AAChC,QAAI,OAAO,MAAM,MAAM,GAAG;AACxB,YAAM,IAAI,MAAM,sCAAsC,UAAU,EAAE;AAAA,IACpE;AACA,cAAU;AAAA,EACZ,OAAO;AACL,cAAU,WAAW,WAAW;AAAA,EAClC;AAEA,QAAM,eAAe,WAAW,aAAa,WAAW,aAAa;AACrE,QAAM,YAAY,kBAAkB,YAAY;AAGhD,QAAM,UAAU;AAAA,IACd,GAAI,WAAW,QAAQ,CAAC;AAAA,IACxB,GAAI,WAAW,QAAQ,CAAC;AAAA,IACxB;AAAA,EACF;AACA,QAAM,OAAO,CAAC,GAAG,IAAI,IAAI,OAAO,CAAC;AAEj
C,SAAO;AAAA,IACL,MAAM,YAAY,IAAI;AAAA,IACtB;AAAA,IACA;AAAA,IACA;AAAA,IACA,QAAQ,WAAW,UAAU;AAAA,IAC7B,UAAU,WAAW,YAAY;AAAA,IACjC,WAAW,WAAW,aAAa;AAAA,IACnC,QAAQ,WAAW,UAAU;AAAA,IAC7B,aAAa,WAAW,eAAe;AAAA,IACvC,YAAY,WAAW,cAAc;AAAA,IACrC,YAAY,WAAW,cAAc;AAAA,IACrC,KAAK,WAAW,OAAO;AAAA,EACzB;AACF;;;AC3IA,SAAS,gBAAqC;;;ACA9C,SAAS,YAAY,iBAAiB;AACtC,SAAS,YAAY;AACrB,SAAS,WAAAC,gBAAe;AAExB,IAAM,aAAa,KAAKA,SAAQ,GAAG,WAAW,aAAa;AAC3D,IAAM,eAAe,KAAK,YAAY,UAAU;AAEzC,SAAS,gBAAwB;AACtC,SAAO;AACT;AAEA,SAAS,cAAc,KAAqB;AAC1C,QAAM,SAAS,IAAI,IAAI,GAAG;AAC1B,SAAO,OAAO,YAAY;AAC5B;AAEO,SAAS,eAAe,KAAa,aAA6B;AACvE,QAAM,SAAS,cAAc,GAAG;AAChC,SAAO,KAAK,aAAa,GAAG,MAAM,OAAO;AAC3C;AAEO,SAAS,cAAc,KAAa,aAA8B;AACvE,QAAM,cAAc,eAAe,KAAK,WAAW;AACnD,SAAO,WAAW,WAAW;AAC/B;AAEO,SAAS,iBAAiB,aAA2B;AAC1D,MAAI,CAAC,WAAW,WAAW,GAAG;AAC5B,cAAU,aAAa,EAAE,WAAW,KAAK,CAAC;AAAA,EAC5C;AACF;;;AD1BO,IAAM,oBACX;AAUK,SAAS,0BAA0B,SAAuC;AAC/E,QAAM,UAAU,oBAAI,IAAY;AAChC,MAAI,QAAQ,YAAa,SAAQ,IAAI,OAAO;AAC5C,MAAI,QAAQ,WAAY,SAAQ,IAAI,MAAM;AAC1C,MAAI,QAAQ,WAAY,SAAQ,IAAI,OAAO;AAC3C,SAAO;AACT;AAEA,eAAsB,UACpB,KACA,QACA,aACsB;AACtB,QAAM,UAAU,MAAM,SAAS,OAAO;AAAA,IACpC,UAAU,CAAC,OAAO;AAAA,EACpB,CAAC;AAED,MAAI;AACF,UAAM,iBAA2D;AAAA,MAC/D,WAAW;AAAA,IACb;AAGA,QAAI,CAAC,OAAO,aAAa,cAAc,KAAK,WAAW,GAAG;AACxD,YAAM,cAAc,eAAe,KAAK,WAAW;AACnD,qBAAe,eAAe;AAAA,IAChC;AAEA,UAAM,UAA0B,MAAM,QAAQ,WAAW,cAAc;AACvE,UAAM,OAAO,MAAM,QAAQ,QAAQ;AAGnC,UAAM,eAAe,0BAA0B,MAAM;AACrD,QAAI,aAAa,OAAO,GAAG;AACzB,YAAM,KAAK,MAAM,QAAQ,OAAO,UAAU;AACxC,YAAI,aAAa,IAAI,MAAM,QAAQ,EAAE,aAAa,CAAC,GAAG;AACpD,gBAAM,MAAM,MAAM;AAAA,QACpB,OAAO;AACL,gBAAM,MAAM,SAAS;AAAA,QACvB;AAAA,MACF,CAAC;AAAA,IACH;AAEA,UAAM,YAAY,OAAO,UAAU;AACnC,UAAM,WAAW,MAAM,KAAK,KAAK,KAAK;AAAA,MACpC,WAAW,OAAO;AAAA,MAClB,SAAS;AAAA,IACX,CAAC;AAED,QAAI,CAAC,UAAU;AACb,YAAM,IAAI,MAAM,6BAA6B,GAAG,EAAE;AAAA,IACpD;AAEA,UAAM,SAAS,SAAS,OAAO;AAC/B,QAAI,UAAU,KAAK;AACjB,YAAM,IAAI,MAAM,QAAQ,MAAM,kBAAkB,SAAS,IAAI,CAAC,EAAE;AAAA,IAClE;AAEA,UAAM,WAAW,SAAS,IAAI;AAC9B,UAAM,WAAW,MAAM,KAAK,QAAQ;AACpC,QAAI;AAEJ,QAAI,OAAO,UAAU;AACnB,YAAM,UAAU,MA
AM,KAAK,EAAE,OAAO,QAAQ;AAC5C,UAAI,CAAC,SAAS;AACZ,cAAM,IAAI,MAAM,uBAAuB,OAAO,QAAQ,EAAE;AAAA,MAC1D;AACA,aAAO,MAAM,QAAQ,UAAU;AAAA,IACjC,OAAO;AACL,aAAO;AAAA,IACT;AAEA,UAAM,QAAQ,MAAM;AAEpB,WAAO,EAAE,MAAM,UAAU,KAAK,SAAS;AAAA,EACzC,UAAE;AACA,UAAM,QAAQ,MAAM;AAAA,EACtB;AACF;;;AE7FA,SAAS,mBAAmB;AAC5B,SAAS,aAAa;AAGtB,SAAS,eAAe,KAAe,UAAiC;AACtE,QAAM,KAAK,IAAI,cAAc,QAAQ;AACrC,SAAO,IAAI,aAAa,SAAS,KAAK;AACxC;AAEA,SAAS,aAAa,KAAqB;AACzC,SAAO,KAAK,IAAI,KAAK,CAAC;AACxB;AAEA,SAAS,qBAAqB,KAA8B;AAC1D,QAAM,YACJ,eAAe,KAAK,yCAAyC,KAC7D,eAAe,KAAK,4BAA4B;AAElD,MAAI,CAAC,WAAW;AACd,UAAM,SAAS,IAAI,cAAc,oCAAoC;AACrE,QAAI,QAAQ,aAAa;AACvB,UAAI;AACF,cAAM,OAAO,KAAK,MAAM,OAAO,WAAW;AAC1C,YAAI,OAAO,KAAK,kBAAkB,UAAU;AAC1C,iBAAO,KAAK,cAAc,MAAM,GAAG,EAAE,CAAC;AAAA,QACxC;AAAA,MACF,QAAQ;AAAA,MAER;AAAA,IACF;AACA,WAAO;AAAA,EACT;AAEA,SAAO,UAAU,MAAM,GAAG,EAAE,CAAC;AAC/B;AAEA,SAAS,eACP,KACA,mBACU;AACV,QAAM,iBAAiB,IAAI,iBAAiB,iCAAiC;AAC7E,MAAI,eAAe,SAAS,GAAG;AAC7B,WAAO,MAAM,KAAK,cAAc,EAC7B,IAAI,CAAC,OAAO,GAAG,aAAa,SAAS,CAAC,EACtC,OAAO,CAAC,MAAmB,MAAM,IAAI,EACrC,IAAI,YAAY;AAAA,EACrB;AAEA,QAAM,WAAW,eAAe,KAAK,4BAA4B;AACjE,MAAI,UAAU;AACZ,WAAO,CAAC,aAAa,QAAQ,CAAC;AAAA,EAChC;AAEA,MAAI,mBAAmB;AACrB,WAAO,CAAC,aAAa,iBAAiB,CAAC;AAAA,EACzC;AAEA,SAAO,CAAC;AACV;AAcA,SAAS,cACP,KACA,SACA,UACU;AACV,QAAM,QAAQ,SAAS,SAAS,IAAI;AACpC,QAAM,UAAU,eAAe,KAAK,SAAS,UAAU,IAAI;AAC3D,QAAM,YAAY,qBAAqB,GAAG;AAE1C,QAAM,cACJ,eAAe,KAAK,iCAAiC,KACrD,eAAe,KAAK,0BAA0B,MAC7C,SAAS,WAAW;AAEvB,QAAM,SAAQ,oBAAI,KAAK,GAAE,YAAY,EAAE,MAAM,GAAG,EAAE,CAAC;AAEnD,SAAO;AAAA,IACL;AAAA,IACA,QAAQ;AAAA,IACR,QAAQ;AAAA,IACR;AAAA,IACA,SAAS;AAAA,IACT;AAAA,EACF;AACF;AAEA,SAAS,qBAAqB,MAAc,KAAwC;AAClF,QAAM,MAAM,IAAI,MAAM,MAAM,EAAE,IAAI,CAAC;AACnC,QAAM,SAAS,IAAI,YAAY,IAAI,OAAO,QAAQ;AAClD,SAAO,OAAO,MAAM;AACtB;AAEO,SAAS,QAAQ,MAAc,UAAiC;AACrE,QAAM,UAAU,IAAI,MAAM,MAAM,EAAE,KAAK,SAAS,CAAC;AACjD,QAAM,MAAM,QAAQ,OAAO;AAE3B,QAAM,UAAU,qBAAqB,MAAM,QAAQ;AAEnD,MAAI,CAAC,SAAS;AACZ,UAAM,IAAI;AAAA,MACR;AAAA,IAEF;AAAA,EACF;AAEA,MAAI,CAAC,QAAQ,SAAS;AACpB,UAAM,IAAI,MAAM,iDAAiD;AAAA,EACnE;AAEA,SAA
O;AAAA,IACL,UAAU,cAAc,KAAK,SAAS,QAAQ;AAAA,IAC9C,SAAS,QAAQ;AAAA,EACnB;AACF;AAEO,SAAS,gBAAgB,MAAc,UAA4B;AACxE,QAAM,UAAU,IAAI,MAAM,MAAM,EAAE,KAAK,SAAS,CAAC;AACjD,QAAM,MAAM,QAAQ,OAAO;AAE3B,QAAM,UAAU,qBAAqB,MAAM,QAAQ;AAEnD,SAAO,cAAc,KAAK,SAAS,QAAQ;AAC7C;;;ACtIA,OAAO,qBAAqB;AAErB,SAAS,kBAAkB,MAAsB;AACtD,QAAM,WAAW,IAAI,gBAAgB;AAAA,IACnC,cAAc;AAAA,IACd,gBAAgB;AAAA,IAChB,kBAAkB;AAAA,EACpB,CAAC;AAED,SAAO,SAAS,SAAS,IAAI;AAC/B;;;ACVA,SAAS,qBAAqB;AAC9B,SAAS,QAAAC,aAAY;AACrB,OAAOC,WAAU;AAGjB,IAAM,eAAe;AACrB,IAAM,gBAAgB;AACtB,IAAM,sBAAsB;AAErB,SAAS,iBAAiB,OAAuB;AACtD,QAAM,YAAY,MACf,QAAQ,eAAe,EAAE,EACzB,QAAQ,cAAc,EAAE,EACxB,QAAQ,QAAQ,GAAG,EACnB,KAAK;AACR,QAAM,OAAO,UAAU,MAAM,GAAG,mBAAmB,KAAK;AACxD,SAAO,GAAG,IAAI;AAChB;AAEO,SAAS,iBAAiB,UAAoB,MAAwB;AAC3E,QAAM,OAAgC;AAAA,IACpC,OAAO,SAAS;AAAA,IAChB,QAAQ,SAAS;AAAA,EACnB;AAEA,MAAI,SAAS,OAAO,SAAS,GAAG;AAC9B,SAAK,SAAS,SAAS;AAAA,EACzB;AAEA,MAAI,SAAS,WAAW;AACtB,SAAK,YAAY,SAAS;AAAA,EAC5B;AAEA,OAAK,UAAU,SAAS;AAExB,MAAI,SAAS,aAAa;AACxB,SAAK,cAAc,SAAS;AAAA,EAC9B;AAEA,OAAK,OAAO;AAEZ,QAAM,UAAUA,MAAK,KAAK,MAAM;AAAA,IAC9B,aAAa;AAAA,IACb,aAAa;AAAA,IACb,WAAW;AAAA,IACX,UAAU;AAAA,EACZ,CAAC;AAED,SAAO;AAAA,EAAQ,OAAO;AACxB;AAEO,SAAS,kBACd,MACA,UACA,iBACA,MACQ;AACR,QAAM,WAAW,iBAAiB,SAAS,KAAK;AAChD,QAAM,WAAWD,MAAK,MAAM,QAAQ;AACpC,QAAM,cAAc,iBAAiB,UAAU,IAAI;AACnD,QAAM,cAAc,GAAG,WAAW;AAAA;AAAA,EAAO,eAAe;AAAA;AAExD,gBAAc,UAAU,aAAa,OAAO;AAE5C,SAAO;AACT;;;ANhDA,IAAM,cAAcE,MAAKC,SAAQ,GAAG,WAAW,eAAe,aAAa;AAE3E,IAAM,UAAU,IAAI,QAAQ;AAE5B,QACG,KAAK,aAAa,EAClB;AAAA,EACC;AACF,EACC,QAAQ,OAAO;AAElB,QACG,QAAQ,OAAO,EACf,YAAY,mCAAmC,EAC/C,SAAS,SAAS,cAAc,EAChC,OAAO,iBAAiB,uBAAuB,EAC/C,OAAO,YAAY,4BAA4B,EAC/C,OAAO,oBAAoB,yBAAyB,EACpD,OAAO,uBAAuB,sBAAsB,QAAQ,EAC5D,OAAO,gBAAgB,wBAAwB,CAAC,KAAa,QAAkB;AAC9E,MAAI,KAAK,GAAG;AACZ,SAAO;AACT,GAAG,CAAC,CAAa,EAChB;AAAA,EACC;AAAA,EACA;AACF,EACC,OAAO,kBAAkB,0BAA0B,EACnD,OAAO,aAAa,oCAAoC,EACxD,OAAO,qBAAqB,6BAA6B,EACzD,OAAO,oBAAoB,4BAA4B,EACvD,OAAO,oBAAoB,6BAA6B,EACxD,OAAO,SAAS,uDAAuD,EACvE,OAAO,OAAO,KAAa,YAAqC;AAC/D,MAAI;AACF,UAAM,aAAaC,YAAW,
WAAW,IAAI,cAAc;AAC3D,UAAM,SAAS;AAAA,MACb;AAAA,QACE,MAAM,QAAQ;AAAA,QACd,MAAM,QAAQ;AAAA,QACd,SAAS,QAAQ;AAAA,QACjB,WAAW,QAAQ;AAAA,QACnB,QAAQ,QAAQ;AAAA,QAChB,UAAU,QAAQ;AAAA,QAClB,WAAW,QAAQ;AAAA,QACnB,QAAQ,QAAQ;AAAA,QAChB,aAAa,QAAQ;AAAA,QACrB,YAAY,QAAQ;AAAA,QACpB,YAAY,QAAQ;AAAA,QACpB,KAAK,QAAQ;AAAA,MACf;AAAA,MACA;AAAA,IACF;AAEA,QAAI,OAAO,OAAO,OAAO,UAAU;AACjC,YAAM,IAAI,MAAM,+CAA+C;AAAA,IACjE;AAGA,QAAI,CAAC,OAAO,UAAU,CAACA,YAAW,OAAO,IAAI,GAAG;AAC9C,YAAM,IAAI,MAAM,yCAAyC,OAAO,IAAI,EAAE;AAAA,IACxE;AAEA,UAAM,cAAc,cAAc;AAClC,UAAM,cAAc,MAAM,UAAU,KAAK,QAAQ,WAAW;AAE5D,QAAI;AACJ,QAAI;AAEJ,QAAI,OAAO,UAAU;AAEnB,oBAAc,YAAY;AAC1B,iBAAW,gBAAgB,YAAY,UAAU,YAAY,QAAQ;AAAA,IACvE,WAAW,OAAO,KAAK;AAErB,oBAAc,YAAY;AAC1B,iBAAW,gBAAgB,YAAY,UAAU,YAAY,QAAQ;AAAA,IACvE,OAAO;AACL,YAAM,SAAS,QAAQ,YAAY,MAAM,YAAY,QAAQ;AAC7D,iBAAW,OAAO;AAClB,oBAAc,OAAO;AAAA,IACvB;AAEA,UAAM,WAAW,kBAAkB,WAAW;AAE9C,QAAI,OAAO,QAAQ;AACjB,YAAM,cAAc,iBAAiB,UAAU,OAAO,IAAI;AAC1D,cAAQ,OAAO,MAAM,GAAG,WAAW;AAAA;AAAA,EAAO,QAAQ;AAAA,CAAI;AAAA,IACxD,OAAO;AACL,YAAM,WAAW;AAAA,QACf,OAAO;AAAA,QACP;AAAA,QACA;AAAA,QACA,OAAO;AAAA,MACT;AACA,cAAQ,MAAM,UAAU,QAAQ,EAAE;AAAA,IACpC;AAAA,EACF,SAAS,OAAO;AACd,UAAM,UAAU,iBAAiB,QAAQ,MAAM,UAAU,OAAO,KAAK;AACrE,YAAQ,MAAM,UAAU,OAAO,EAAE;AACjC,YAAQ,KAAK,CAAC;AAAA,EAChB;AACF,CAAC;AAEH,QACG,QAAQ,OAAO,EACf,YAAY,kCAAkC,EAC9C,SAAS,SAAS,cAAc,EAChC,OAAO,uBAAuB,4BAA4B,QAAQ,EAClE,OAAO,OAAO,KAAa,YAAqC;AAC/D,QAAM,EAAE,UAAAC,UAAS,IAAI,MAAM,OAAO,YAAY;AAC9C,QAAM,cAAc,cAAc;AAClC,mBAAiB,WAAW;AAE5B,QAAM,aAAc,QAAQ,WAAkC;AAC9D,QAAM,UAAU,MAAMA,UAAS,OAAO,EAAE,UAAU,MAAM,CAAC;AAEzD,MAAI;AACF,UAAM,UAAU,MAAM,QAAQ,WAAW;AACzC,UAAM,OAAO,MAAM,QAAQ,QAAQ;AAEnC,UAAM,KAAK,KAAK,KAAK,EAAE,WAAW,eAAe,SAAS,aAAa,IAAK,CAAC;AAE7E,YAAQ,MAAM,yEAAyE;AAEvF,YAAQ,MAAM,OAAO;AACrB,UAAM,KAAK,QAAQ,OAAO,MAAM;AAChC,YAAQ,MAAM,MAAM;AACpB,YAAQ,MAAM,MAAM;AAEpB,UAAM,cAAc,eAAe,KAAK,WAAW;AACnD,UAAM,QAAQ,aAAa,EAAE,MAAM,YAAY,CAAC;AAChD,YAAQ,MAAM,kBAAkB,WAAW,EAAE;AAAA,EAC/C,SAAS,OAAO;AACd,UAAM,UAAU,iBAAiB,QAAQ,MAAM,UAAU,OAAO,KAAK;AACrE,YAAQ,MAAM,UAAU,OAAO,EAAE;AACjC,YAAQ,K
AAK,CAAC;AAAA,EAChB,UAAE;AACA,UAAM,QAAQ,MAAM;AAAA,EACtB;AACF,CAAC;AAEH,QAAQ,MAAM;","names":["existsSync","homedir","join","homedir","join","yaml","join","homedir","existsSync","chromium"]}
1
+ {"version":3,"sources":["../src/cli.ts","../src/config.ts","../src/fetcher.ts","../src/session.ts","../src/extractor.ts","../src/converter.ts","../src/writer.ts"],"sourcesContent":["import { Command } from \"commander\";\nimport { once } from \"node:events\";\nimport { existsSync } from \"node:fs\";\nimport { homedir } from \"node:os\";\nimport { join } from \"node:path\";\nimport { resolveConfig } from \"./config.js\";\nimport { fetchPage } from \"./fetcher.js\";\nimport { extract, extractMetadata } from \"./extractor.js\";\nimport { convertToMarkdown } from \"./converter.js\";\nimport { writeMarkdownFile, buildFrontmatter } from \"./writer.js\";\nimport {\n getSessionDir,\n getSessionPath,\n ensureSessionDir,\n} from \"./session.js\";\nimport type { Metadata, WaitUntilOption } from \"./types.js\";\n\nconst CONFIG_PATH = join(homedir(), \".config\", \"vault-fetch\", \"config.yaml\");\n\nconst program = new Command();\n\nprogram\n .name(\"vault-fetch\")\n .description(\n \"Fetch JS-rendered web pages and save as Markdown to Obsidian Vault\",\n )\n .version(\"0.3.0\");\n\nprogram\n .command(\"fetch\")\n .description(\"Fetch a page and save as Markdown\")\n .argument(\"<url>\", \"URL to fetch\")\n .option(\"--dest <path>\", \"Destination directory\")\n .option(\"--headed\", \"Run browser in headed mode\")\n .option(\"--selector <css>\", \"CSS selector to extract\")\n .option(\"--timeout <seconds>\", \"Timeout in seconds\", parseInt)\n .option(\"--tag <name>\", \"Add tag (repeatable)\", (val: string, acc: string[]) => {\n acc.push(val);\n return acc;\n }, [] as string[])\n .option(\n \"--wait-until <event>\",\n \"Wait condition: load, domcontentloaded, networkidle\",\n )\n .option(\"--skip-session\", \"Do not use saved session\")\n .option(\"--dry-run\", \"Output to stdout instead of saving\")\n .option(\"--no-block-images\", \"Do not block image requests\")\n .option(\"--no-block-fonts\", \"Do not block font requests\")\n .option(\"--no-block-media\", \"Do not 
block media requests\")\n .option(\"--raw\", \"Convert full page HTML without Readability extraction\")\n .option(\"--title <text>\", \"Override the page title for the output filename\")\n .action(async (url: string, options: Record<string, unknown>) => {\n try {\n const configPath = existsSync(CONFIG_PATH) ? CONFIG_PATH : undefined;\n const config = resolveConfig(\n {\n dest: options.dest as string | undefined,\n tags: options.tag as string[] | undefined,\n timeout: options.timeout as number | undefined,\n waitUntil: options.waitUntil as WaitUntilOption | undefined,\n headed: options.headed as boolean | undefined,\n selector: options.selector as string | undefined,\n title: options.title as string | undefined,\n noSession: options.skipSession as boolean | undefined,\n dryRun: options.dryRun as boolean | undefined,\n blockImages: options.blockImages as boolean | undefined,\n blockFonts: options.blockFonts as boolean | undefined,\n blockMedia: options.blockMedia as boolean | undefined,\n raw: options.raw as boolean | undefined,\n },\n configPath,\n );\n\n if (config.raw && config.selector) {\n throw new Error(\"--raw and --selector cannot be used together.\");\n }\n\n // Validate dest directory exists\n if (!config.dryRun && !existsSync(config.dest)) {\n throw new Error(`Destination directory does not exist: ${config.dest}`);\n }\n\n const sessionsDir = getSessionDir();\n const fetchResult = await fetchPage(url, config, sessionsDir);\n\n let markdown: string;\n let metadata: Metadata;\n\n if (fetchResult.kind === \"pdf\") {\n if (config.selector) {\n throw new Error(\"--selector cannot be used with PDF URLs.\");\n }\n if (config.raw) {\n throw new Error(\"--raw cannot be used with PDF URLs.\");\n }\n const { convertPdfToMarkdown } = await import(\"./pdf-converter.js\");\n const pdfResult = await convertPdfToMarkdown(\n fetchResult.pdfBuffer,\n fetchResult.finalUrl,\n );\n markdown = pdfResult.markdown;\n metadata = pdfResult.metadata;\n } else if (config.selector) 
{\n // --selector mode: skip Readability, extract metadata from full page\n metadata = extractMetadata(fetchResult.fullHtml, fetchResult.finalUrl);\n markdown = convertToMarkdown(fetchResult.html);\n } else if (config.raw) {\n // --raw mode: skip Readability, convert full page HTML directly\n metadata = extractMetadata(fetchResult.fullHtml, fetchResult.finalUrl);\n markdown = convertToMarkdown(fetchResult.fullHtml);\n } else {\n const result = extract(fetchResult.html, fetchResult.finalUrl);\n metadata = result.metadata;\n markdown = convertToMarkdown(result.content);\n }\n\n if (config.title !== null) {\n metadata = { ...metadata, title: config.title };\n }\n\n if (config.dryRun) {\n const frontmatter = buildFrontmatter(metadata, config.tags);\n process.stdout.write(`${frontmatter}\\n\\n${markdown}\\n`);\n } else {\n const filePath = writeMarkdownFile(\n config.dest,\n metadata,\n markdown,\n config.tags,\n );\n console.error(`Saved: ${filePath}`);\n }\n } catch (error) {\n const message = error instanceof Error ? error.message : String(error);\n console.error(`Error: ${message}`);\n process.exit(1);\n }\n });\n\nprogram\n .command(\"login\")\n .description(\"Login to a site and save session\")\n .argument(\"<url>\", \"URL to login\")\n .option(\"--timeout <seconds>\", \"Login timeout in seconds\", parseInt)\n .action(async (url: string, options: Record<string, unknown>) => {\n const { chromium } = await import(\"playwright\");\n const sessionsDir = getSessionDir();\n ensureSessionDir(sessionsDir);\n\n const timeoutSec = (options.timeout as number | undefined) ?? 300;\n const browser = await chromium.launch({ headless: false });\n\n try {\n const context = await browser.newContext();\n const page = await context.newPage();\n\n await page.goto(url, { waitUntil: \"networkidle\", timeout: timeoutSec * 1000 });\n\n console.error(\"Browser opened. 
Log in manually, then press Enter here to save session.\");\n\n process.stdin.resume();\n await once(process.stdin, \"data\");\n process.stdin.pause();\n process.stdin.unref();\n\n const sessionPath = getSessionPath(url, sessionsDir);\n await context.storageState({ path: sessionPath });\n console.error(`Session saved: ${sessionPath}`);\n } catch (error) {\n const message = error instanceof Error ? error.message : String(error);\n console.error(`Error: ${message}`);\n process.exit(1);\n } finally {\n await browser.close();\n }\n });\n\nprogram.parse();\n","import { readFileSync } from \"node:fs\";\nimport { homedir } from \"node:os\";\nimport { resolve } from \"node:path\";\nimport yaml from \"js-yaml\";\nimport type { ResolvedConfig, WaitUntilOption } from \"./types.js\";\n\nconst DEFAULT_TIMEOUT = 30;\nconst DEFAULT_WAIT_UNTIL: WaitUntilOption = \"networkidle\";\nconst REQUIRED_TAG = \"clippings\";\n\ninterface FileConfig {\n dest?: string;\n tags?: string[];\n timeout?: number;\n waitUntil?: WaitUntilOption;\n}\n\ninterface CliOptions {\n dest?: string;\n tags?: string[];\n timeout?: number;\n waitUntil?: WaitUntilOption;\n headed?: boolean;\n selector?: string;\n title?: string;\n noSession?: boolean;\n dryRun?: boolean;\n blockImages?: boolean;\n blockFonts?: boolean;\n blockMedia?: boolean;\n raw?: boolean;\n}\n\nfunction expandTilde(filePath: string): string {\n if (filePath.startsWith(\"~/\")) {\n return resolve(homedir(), filePath.slice(2));\n }\n return filePath;\n}\n\nconst VALID_WAIT_UNTIL: readonly string[] = [\"load\", \"domcontentloaded\", \"networkidle\"];\n\nfunction validateWaitUntil(value: string): WaitUntilOption {\n if (!VALID_WAIT_UNTIL.includes(value)) {\n throw new Error(\n `Invalid waitUntil value: \"${value}\". 
Must be one of: ${VALID_WAIT_UNTIL.join(\", \")}`,\n );\n }\n return value as WaitUntilOption;\n}\n\nfunction loadConfigFile(configPath: string): FileConfig {\n const content = readFileSync(configPath, \"utf-8\");\n const parsed = yaml.load(content);\n if (parsed === null || typeof parsed !== \"object\") {\n throw new Error(`Invalid config file: ${configPath}`);\n }\n const config = parsed as Record<string, unknown>;\n\n if (config.timeout !== undefined && typeof config.timeout !== \"number\") {\n throw new Error(`Invalid timeout in config file: expected number, got ${typeof config.timeout}`);\n }\n if (config.dest !== undefined && typeof config.dest !== \"string\") {\n throw new Error(`Invalid dest in config file: expected string, got ${typeof config.dest}`);\n }\n if (config.waitUntil !== undefined) {\n if (typeof config.waitUntil !== \"string\") {\n throw new Error(`Invalid waitUntil in config file: expected string, got ${typeof config.waitUntil}`);\n }\n validateWaitUntil(config.waitUntil);\n }\n if (config.tags !== undefined) {\n if (!Array.isArray(config.tags) || !config.tags.every((t: unknown) => typeof t === \"string\")) {\n throw new Error(\"Invalid tags in config file: expected array of strings\");\n }\n }\n\n return config as FileConfig;\n}\n\nexport function resolveConfig(\n cliOptions: CliOptions,\n configPath: string | undefined,\n): ResolvedConfig {\n // Layer 1: Config file\n let fileConfig: FileConfig = {};\n if (configPath) {\n fileConfig = loadConfigFile(configPath);\n }\n\n // Layer 2: Environment variables\n const envDest = process.env.VAULT_FETCH_DEST;\n const envTimeout = process.env.VAULT_FETCH_TIMEOUT;\n\n // Resolve each field: CLI > env > file > default\n const dest = cliOptions.dest ?? envDest ?? fileConfig.dest;\n if (dest === undefined) {\n throw new Error(\n \"dest is required. 
Set via --dest, VAULT_FETCH_DEST, or config file.\",\n );\n }\n\n let timeout: number;\n if (cliOptions.timeout !== undefined) {\n timeout = cliOptions.timeout;\n } else if (envTimeout !== undefined) {\n const parsed = Number(envTimeout);\n if (Number.isNaN(parsed)) {\n throw new Error(`Invalid VAULT_FETCH_TIMEOUT value: ${envTimeout}`);\n }\n timeout = parsed;\n } else {\n timeout = fileConfig.timeout ?? DEFAULT_TIMEOUT;\n }\n\n const rawWaitUntil = cliOptions.waitUntil ?? fileConfig.waitUntil ?? DEFAULT_WAIT_UNTIL;\n const waitUntil = validateWaitUntil(rawWaitUntil);\n\n // Merge tags: file tags + CLI tags + always clippings\n const allTags = [\n ...(fileConfig.tags ?? []),\n ...(cliOptions.tags ?? []),\n REQUIRED_TAG,\n ];\n const tags = [...new Set(allTags)];\n\n return {\n dest: expandTilde(dest),\n tags,\n timeout,\n waitUntil,\n headed: cliOptions.headed ?? false,\n selector: cliOptions.selector ?? null,\n title: cliOptions.title ?? null,\n noSession: cliOptions.noSession ?? false,\n dryRun: cliOptions.dryRun ?? false,\n blockImages: cliOptions.blockImages ?? true,\n blockFonts: cliOptions.blockFonts ?? true,\n blockMedia: cliOptions.blockMedia ?? true,\n raw: cliOptions.raw ?? 
false,\n };\n}\n","import { chromium, type BrowserContext } from \"playwright\";\nimport type { FetchResult, ResolvedConfig } from \"./types.js\";\nimport { getSessionPath, sessionExists } from \"./session.js\";\n\nexport const CHROME_USER_AGENT =\n \"Mozilla/5.0 (Windows NT 10.0; Win64; x64) \" +\n \"AppleWebKit/537.36 (KHTML, like Gecko) \" +\n \"Chrome/134.0.0.0 Safari/537.36\";\n\ninterface BlockingOptions {\n blockImages: boolean;\n blockFonts: boolean;\n blockMedia: boolean;\n}\n\nexport function isPdfContentType(contentType: string): boolean {\n return contentType.toLowerCase().includes(\"application/pdf\");\n}\n\nexport function buildBlockedResourceTypes(options: BlockingOptions): Set<string> {\n const blocked = new Set<string>();\n if (options.blockImages) blocked.add(\"image\");\n if (options.blockFonts) blocked.add(\"font\");\n if (options.blockMedia) blocked.add(\"media\");\n return blocked;\n}\n\nconst PDF_MAGIC_BYTES = \"%PDF\";\n\nexport function validatePdfBuffer(pdfBuffer: Buffer, sourceUrl: string): void {\n if (pdfBuffer.length === 0) {\n throw new Error(`Empty PDF response received from ${sourceUrl}`);\n }\n const header = pdfBuffer.subarray(0, PDF_MAGIC_BYTES.length).toString(\"ascii\");\n if (!header.startsWith(PDF_MAGIC_BYTES)) {\n throw new Error(\n `Response Content-Type is application/pdf but body is not valid PDF data from ${sourceUrl}`,\n );\n }\n}\n\nasync function downloadPdf(\n context: BrowserContext,\n url: string,\n timeoutMs: number,\n): Promise<{ pdfBuffer: Buffer; finalUrl: string }> {\n const apiResponse = await context.request.get(url, { timeout: timeoutMs });\n const status = apiResponse.status();\n if (status >= 400) {\n throw new Error(`HTTP ${status} received when downloading PDF from ${url}`);\n }\n const pdfBuffer = Buffer.from(await apiResponse.body());\n const finalUrl = apiResponse.url();\n validatePdfBuffer(pdfBuffer, finalUrl);\n return { pdfBuffer, finalUrl };\n}\n\nexport async function fetchPage(\n url: string,\n 
config: ResolvedConfig,\n sessionsDir: string,\n): Promise<FetchResult> {\n const browser = await chromium.launch({\n headless: !config.headed,\n });\n\n try {\n const contextOptions: Parameters<typeof browser.newContext>[0] = {\n userAgent: CHROME_USER_AGENT,\n };\n\n // Load session if available and not disabled\n if (!config.noSession && sessionExists(url, sessionsDir)) {\n const sessionPath = getSessionPath(url, sessionsDir);\n contextOptions.storageState = sessionPath;\n }\n\n const context: BrowserContext = await browser.newContext(contextOptions);\n const page = await context.newPage();\n\n // Block specified resource types for faster loading\n const blockedTypes = buildBlockedResourceTypes(config);\n if (blockedTypes.size > 0) {\n await page.route(\"**/*\", async (route) => {\n if (blockedTypes.has(route.request().resourceType())) {\n await route.abort();\n } else {\n await route.continue();\n }\n });\n }\n\n const timeoutMs = config.timeout * 1000;\n\n // page.goto throws \"Download is starting\" when the server returns\n // Content-Disposition: attachment (common for PDF downloads).\n // Catch this and download the PDF via the context's HTTP client.\n let response;\n try {\n response = await page.goto(url, {\n waitUntil: config.waitUntil,\n timeout: timeoutMs,\n });\n } catch (error) {\n if (\n error instanceof Error &&\n error.message.includes(\"Download is starting\")\n ) {\n const result = await downloadPdf(context, url, timeoutMs);\n await context.close();\n return { kind: \"pdf\", pdfBuffer: result.pdfBuffer, url, finalUrl: result.finalUrl };\n }\n throw error;\n }\n\n if (!response) {\n throw new Error(`No response received from ${url}`);\n }\n\n const status = response.status();\n if (status >= 400) {\n throw new Error(`HTTP ${status} received from ${response.url()}`);\n }\n\n const finalUrl = response.url();\n const contentType = response.headers()[\"content-type\"] ?? 
\"\";\n\n // Inline PDF (Content-Disposition: inline or absent).\n // Try response.body() first; fall back to context.request if the\n // browser returned its PDF viewer HTML instead of the actual bytes.\n if (isPdfContentType(contentType)) {\n const body = await response.body();\n try {\n validatePdfBuffer(body, finalUrl);\n await context.close();\n return { kind: \"pdf\", pdfBuffer: body, url, finalUrl };\n } catch {\n // response.body() returned PDF viewer HTML; re-download via API\n const result = await downloadPdf(context, finalUrl, timeoutMs);\n await context.close();\n return { kind: \"pdf\", pdfBuffer: result.pdfBuffer, url, finalUrl: result.finalUrl };\n }\n }\n\n const fullHtml = await page.content();\n let html: string;\n\n if (config.selector) {\n const element = await page.$(config.selector);\n if (!element) {\n throw new Error(`Selector not found: ${config.selector}`);\n }\n html = await element.innerHTML();\n } else {\n html = fullHtml;\n }\n\n await context.close();\n\n return { kind: \"html\", html, fullHtml, url, finalUrl };\n } finally {\n await browser.close();\n }\n}\n","import { existsSync, mkdirSync } from \"node:fs\";\nimport { join } from \"node:path\";\nimport { homedir } from \"node:os\";\n\nconst CONFIG_DIR = join(homedir(), \".config\", \"vault-fetch\");\nconst SESSIONS_DIR = join(CONFIG_DIR, \"sessions\");\n\nexport function getSessionDir(): string {\n return SESSIONS_DIR;\n}\n\nfunction extractDomain(url: string): string {\n const parsed = new URL(url);\n return parsed.hostname ?? 
\"\";\n}\n\nexport function getSessionPath(url: string, sessionsDir: string): string {\n const domain = extractDomain(url);\n return join(sessionsDir, `${domain}.json`);\n}\n\nexport function sessionExists(url: string, sessionsDir: string): boolean {\n const sessionPath = getSessionPath(url, sessionsDir);\n return existsSync(sessionPath);\n}\n\nexport function ensureSessionDir(sessionsDir: string): void {\n if (!existsSync(sessionsDir)) {\n mkdirSync(sessionsDir, { recursive: true });\n }\n}\n","import { Readability } from \"@mozilla/readability\";\nimport { JSDOM } from \"jsdom\";\nimport type { Metadata } from \"./types.js\";\n\nfunction getMetaContent(doc: Document, selector: string): string | null {\n const el = doc.querySelector(selector);\n return el?.getAttribute(\"content\") ?? null;\n}\n\nfunction formatAuthor(raw: string): string {\n return `[[${raw.trim()}]]`;\n}\n\nfunction extractPublishedDate(doc: Document): string | null {\n const published =\n getMetaContent(doc, 'meta[property=\"article:published_time\"]') ??\n getMetaContent(doc, 'meta[name=\"datePublished\"]');\n\n if (!published) {\n const jsonLd = doc.querySelector('script[type=\"application/ld+json\"]');\n if (jsonLd?.textContent) {\n try {\n const data = JSON.parse(jsonLd.textContent) as Record<string, unknown>;\n if (typeof data.datePublished === \"string\") {\n return data.datePublished.split(\"T\")[0];\n }\n } catch {\n // JSON-LD parse failed\n }\n }\n return null;\n }\n\n return published.split(\"T\")[0];\n}\n\nfunction extractAuthors(\n doc: Document,\n readabilityByline: string | null,\n): string[] {\n const articleAuthors = doc.querySelectorAll('meta[property=\"article:author\"]');\n if (articleAuthors.length > 0) {\n return Array.from(articleAuthors)\n .map((el) => el.getAttribute(\"content\"))\n .filter((v): v is string => v !== null)\n .map(formatAuthor);\n }\n\n const ogAuthor = getMetaContent(doc, 'meta[property=\"og:author\"]');\n if (ogAuthor) {\n return 
[formatAuthor(ogAuthor)];\n }\n\n if (readabilityByline) {\n return [formatAuthor(readabilityByline)];\n }\n\n return [];\n}\n\nexport interface ExtractResult {\n metadata: Metadata;\n content: string;\n}\n\ninterface ReadabilityArticle {\n title: string;\n byline: string | null;\n excerpt: string;\n content: string;\n}\n\nfunction buildMetadata(\n doc: Document,\n article: ReadabilityArticle | null,\n finalUrl: string,\n): Metadata {\n const title = article?.title ?? doc.title;\n const authors = extractAuthors(doc, article?.byline ?? null);\n const published = extractPublishedDate(doc);\n\n const description =\n getMetaContent(doc, 'meta[property=\"og:description\"]') ??\n getMetaContent(doc, 'meta[name=\"description\"]') ??\n (article?.excerpt ?? null);\n\n const today = new Date().toISOString().split(\"T\")[0];\n\n return {\n title,\n source: finalUrl,\n author: authors,\n published,\n created: today,\n description,\n };\n}\n\nfunction parseWithReadability(html: string, url: string): ReadabilityArticle | null {\n const dom = new JSDOM(html, { url });\n const reader = new Readability(dom.window.document);\n return reader.parse() as ReadabilityArticle | null;\n}\n\nexport function extract(html: string, finalUrl: string): ExtractResult {\n const metaDom = new JSDOM(html, { url: finalUrl });\n const doc = metaDom.window.document;\n\n const article = parseWithReadability(html, finalUrl);\n\n if (!article) {\n throw new Error(\n \"Readability failed to extract content from the page. 
\" +\n \"Try --raw to convert the full page, or --selector <css> to target specific content.\",\n );\n }\n\n if (!article.content) {\n throw new Error(\"Readability returned empty content for the page\");\n }\n\n return {\n metadata: buildMetadata(doc, article, finalUrl),\n content: article.content,\n };\n}\n\nexport function extractMetadata(html: string, finalUrl: string): Metadata {\n const metaDom = new JSDOM(html, { url: finalUrl });\n const doc = metaDom.window.document;\n\n const article = parseWithReadability(html, finalUrl);\n\n return buildMetadata(doc, article, finalUrl);\n}\n","import TurndownService from \"turndown\";\n\nexport function convertToMarkdown(html: string): string {\n const turndown = new TurndownService({\n headingStyle: \"atx\",\n codeBlockStyle: \"fenced\",\n bulletListMarker: \"-\",\n });\n\n return turndown.turndown(html);\n}\n","import { writeFileSync } from \"node:fs\";\nimport { join } from \"node:path\";\nimport yaml from \"js-yaml\";\nimport type { Metadata } from \"./types.js\";\n\nconst UNSAFE_CHARS = /[/\\\\:*?\"<>|]/g;\nconst CONTROL_CHARS = /[\\x00-\\x1f\\x7f]/g;\nconst MAX_FILENAME_LENGTH = 200;\n\nexport function sanitizeFilename(title: string): string {\n const sanitized = title\n .replace(CONTROL_CHARS, \"\")\n .replace(UNSAFE_CHARS, \"\")\n .replace(/\\s+/g, \" \")\n .trim();\n const base = sanitized.slice(0, MAX_FILENAME_LENGTH) || \"Untitled\";\n return `${base}.md`;\n}\n\nexport function buildFrontmatter(metadata: Metadata, tags: string[]): string {\n const data: Record<string, unknown> = {\n title: metadata.title,\n source: metadata.source,\n };\n\n if (metadata.author.length > 0) {\n data.author = metadata.author;\n }\n\n if (metadata.published) {\n data.published = metadata.published;\n }\n\n data.created = metadata.created;\n\n if (metadata.description) {\n data.description = metadata.description;\n }\n\n data.tags = tags;\n\n const yamlStr = yaml.dump(data, {\n quotingType: '\"',\n forceQuotes: false,\n lineWidth: 
-1,\n sortKeys: false,\n });\n\n return `---\\n${yamlStr}---`;\n}\n\nexport function writeMarkdownFile(\n dest: string,\n metadata: Metadata,\n markdownContent: string,\n tags: string[],\n): string {\n const filename = sanitizeFilename(metadata.title);\n const filePath = join(dest, filename);\n const frontmatter = buildFrontmatter(metadata, tags);\n const fullContent = `${frontmatter}\\n\\n${markdownContent}\\n`;\n\n writeFileSync(filePath, fullContent, \"utf-8\");\n\n return filePath;\n}\n"],"mappings":";;;AAAA,SAAS,eAAe;AACxB,SAAS,YAAY;AACrB,SAAS,cAAAA,mBAAkB;AAC3B,SAAS,WAAAC,gBAAe;AACxB,SAAS,QAAAC,aAAY;;;ACJrB,SAAS,oBAAoB;AAC7B,SAAS,eAAe;AACxB,SAAS,eAAe;AACxB,OAAO,UAAU;AAGjB,IAAM,kBAAkB;AACxB,IAAM,qBAAsC;AAC5C,IAAM,eAAe;AAyBrB,SAAS,YAAY,UAA0B;AAC7C,MAAI,SAAS,WAAW,IAAI,GAAG;AAC7B,WAAO,QAAQ,QAAQ,GAAG,SAAS,MAAM,CAAC,CAAC;AAAA,EAC7C;AACA,SAAO;AACT;AAEA,IAAM,mBAAsC,CAAC,QAAQ,oBAAoB,aAAa;AAEtF,SAAS,kBAAkB,OAAgC;AACzD,MAAI,CAAC,iBAAiB,SAAS,KAAK,GAAG;AACrC,UAAM,IAAI;AAAA,MACR,6BAA6B,KAAK,sBAAsB,iBAAiB,KAAK,IAAI,CAAC;AAAA,IACrF;AAAA,EACF;AACA,SAAO;AACT;AAEA,SAAS,eAAe,YAAgC;AACtD,QAAM,UAAU,aAAa,YAAY,OAAO;AAChD,QAAM,SAAS,KAAK,KAAK,OAAO;AAChC,MAAI,WAAW,QAAQ,OAAO,WAAW,UAAU;AACjD,UAAM,IAAI,MAAM,wBAAwB,UAAU,EAAE;AAAA,EACtD;AACA,QAAM,SAAS;AAEf,MAAI,OAAO,YAAY,UAAa,OAAO,OAAO,YAAY,UAAU;AACtE,UAAM,IAAI,MAAM,wDAAwD,OAAO,OAAO,OAAO,EAAE;AAAA,EACjG;AACA,MAAI,OAAO,SAAS,UAAa,OAAO,OAAO,SAAS,UAAU;AAChE,UAAM,IAAI,MAAM,qDAAqD,OAAO,OAAO,IAAI,EAAE;AAAA,EAC3F;AACA,MAAI,OAAO,cAAc,QAAW;AAClC,QAAI,OAAO,OAAO,cAAc,UAAU;AACxC,YAAM,IAAI,MAAM,0DAA0D,OAAO,OAAO,SAAS,EAAE;AAAA,IACrG;AACA,sBAAkB,OAAO,SAAS;AAAA,EACpC;AACA,MAAI,OAAO,SAAS,QAAW;AAC7B,QAAI,CAAC,MAAM,QAAQ,OAAO,IAAI,KAAK,CAAC,OAAO,KAAK,MAAM,CAAC,MAAe,OAAO,MAAM,QAAQ,GAAG;AAC5F,YAAM,IAAI,MAAM,wDAAwD;AAAA,IAC1E;AAAA,EACF;AAEA,SAAO;AACT;AAEO,SAAS,cACd,YACA,YACgB;AAEhB,MAAI,aAAyB,CAAC;AAC9B,MAAI,YAAY;AACd,iBAAa,eAAe,UAAU;AAAA,EACxC;AAGA,QAAM,UAAU,QAAQ,IAAI;AAC5B,QAAM,aAAa,QAAQ,IAAI;AAG/B,QAAM,OAAO,WAAW,QAAQ,WAAW,WAAW;AACtD,MAAI,SAAS,QAAW;AACtB,UAAM,IAA
I;AAAA,MACR;AAAA,IACF;AAAA,EACF;AAEA,MAAI;AACJ,MAAI,WAAW,YAAY,QAAW;AACpC,cAAU,WAAW;AAAA,EACvB,WAAW,eAAe,QAAW;AACnC,UAAM,SAAS,OAAO,UAAU;AAChC,QAAI,OAAO,MAAM,MAAM,GAAG;AACxB,YAAM,IAAI,MAAM,sCAAsC,UAAU,EAAE;AAAA,IACpE;AACA,cAAU;AAAA,EACZ,OAAO;AACL,cAAU,WAAW,WAAW;AAAA,EAClC;AAEA,QAAM,eAAe,WAAW,aAAa,WAAW,aAAa;AACrE,QAAM,YAAY,kBAAkB,YAAY;AAGhD,QAAM,UAAU;AAAA,IACd,GAAI,WAAW,QAAQ,CAAC;AAAA,IACxB,GAAI,WAAW,QAAQ,CAAC;AAAA,IACxB;AAAA,EACF;AACA,QAAM,OAAO,CAAC,GAAG,IAAI,IAAI,OAAO,CAAC;AAEjC,SAAO;AAAA,IACL,MAAM,YAAY,IAAI;AAAA,IACtB;AAAA,IACA;AAAA,IACA;AAAA,IACA,QAAQ,WAAW,UAAU;AAAA,IAC7B,UAAU,WAAW,YAAY;AAAA,IACjC,OAAO,WAAW,SAAS;AAAA,IAC3B,WAAW,WAAW,aAAa;AAAA,IACnC,QAAQ,WAAW,UAAU;AAAA,IAC7B,aAAa,WAAW,eAAe;AAAA,IACvC,YAAY,WAAW,cAAc;AAAA,IACrC,YAAY,WAAW,cAAc;AAAA,IACrC,KAAK,WAAW,OAAO;AAAA,EACzB;AACF;;;AC7IA,SAAS,gBAAqC;;;ACA9C,SAAS,YAAY,iBAAiB;AACtC,SAAS,YAAY;AACrB,SAAS,WAAAC,gBAAe;AAExB,IAAM,aAAa,KAAKA,SAAQ,GAAG,WAAW,aAAa;AAC3D,IAAM,eAAe,KAAK,YAAY,UAAU;AAEzC,SAAS,gBAAwB;AACtC,SAAO;AACT;AAEA,SAAS,cAAc,KAAqB;AAC1C,QAAM,SAAS,IAAI,IAAI,GAAG;AAC1B,SAAO,OAAO,YAAY;AAC5B;AAEO,SAAS,eAAe,KAAa,aAA6B;AACvE,QAAM,SAAS,cAAc,GAAG;AAChC,SAAO,KAAK,aAAa,GAAG,MAAM,OAAO;AAC3C;AAEO,SAAS,cAAc,KAAa,aAA8B;AACvE,QAAM,cAAc,eAAe,KAAK,WAAW;AACnD,SAAO,WAAW,WAAW;AAC/B;AAEO,SAAS,iBAAiB,aAA2B;AAC1D,MAAI,CAAC,WAAW,WAAW,GAAG;AAC5B,cAAU,aAAa,EAAE,WAAW,KAAK,CAAC;AAAA,EAC5C;AACF;;;AD1BO,IAAM,oBACX;AAUK,SAAS,iBAAiB,aAA8B;AAC7D,SAAO,YAAY,YAAY,EAAE,SAAS,iBAAiB;AAC7D;AAEO,SAAS,0BAA0B,SAAuC;AAC/E,QAAM,UAAU,oBAAI,IAAY;AAChC,MAAI,QAAQ,YAAa,SAAQ,IAAI,OAAO;AAC5C,MAAI,QAAQ,WAAY,SAAQ,IAAI,MAAM;AAC1C,MAAI,QAAQ,WAAY,SAAQ,IAAI,OAAO;AAC3C,SAAO;AACT;AAEA,IAAM,kBAAkB;AAEjB,SAAS,kBAAkB,WAAmB,WAAyB;AAC5E,MAAI,UAAU,WAAW,GAAG;AAC1B,UAAM,IAAI,MAAM,oCAAoC,SAAS,EAAE;AAAA,EACjE;AACA,QAAM,SAAS,UAAU,SAAS,GAAG,gBAAgB,MAAM,EAAE,SAAS,OAAO;AAC7E,MAAI,CAAC,OAAO,WAAW,eAAe,GAAG;AACvC,UAAM,IAAI;AAAA,MACR,gFAAgF,SAAS;AAAA,IAC3F;AAAA,EACF;AACF;AAEA,eAAe,YACb,SACA,KACA,WACkD;AAClD,QAAM,cAAc,MAAM,QAAQ,QAAQ,IAAI,KAAK,EAAE,SAAS,UAAU,CAAC;AACzE,QAAM,SAAS,YAAY,O
AAO;AAClC,MAAI,UAAU,KAAK;AACjB,UAAM,IAAI,MAAM,QAAQ,MAAM,uCAAuC,GAAG,EAAE;AAAA,EAC5E;AACA,QAAM,YAAY,OAAO,KAAK,MAAM,YAAY,KAAK,CAAC;AACtD,QAAM,WAAW,YAAY,IAAI;AACjC,oBAAkB,WAAW,QAAQ;AACrC,SAAO,EAAE,WAAW,SAAS;AAC/B;AAEA,eAAsB,UACpB,KACA,QACA,aACsB;AACtB,QAAM,UAAU,MAAM,SAAS,OAAO;AAAA,IACpC,UAAU,CAAC,OAAO;AAAA,EACpB,CAAC;AAED,MAAI;AACF,UAAM,iBAA2D;AAAA,MAC/D,WAAW;AAAA,IACb;AAGA,QAAI,CAAC,OAAO,aAAa,cAAc,KAAK,WAAW,GAAG;AACxD,YAAM,cAAc,eAAe,KAAK,WAAW;AACnD,qBAAe,eAAe;AAAA,IAChC;AAEA,UAAM,UAA0B,MAAM,QAAQ,WAAW,cAAc;AACvE,UAAM,OAAO,MAAM,QAAQ,QAAQ;AAGnC,UAAM,eAAe,0BAA0B,MAAM;AACrD,QAAI,aAAa,OAAO,GAAG;AACzB,YAAM,KAAK,MAAM,QAAQ,OAAO,UAAU;AACxC,YAAI,aAAa,IAAI,MAAM,QAAQ,EAAE,aAAa,CAAC,GAAG;AACpD,gBAAM,MAAM,MAAM;AAAA,QACpB,OAAO;AACL,gBAAM,MAAM,SAAS;AAAA,QACvB;AAAA,MACF,CAAC;AAAA,IACH;AAEA,UAAM,YAAY,OAAO,UAAU;AAKnC,QAAI;AACJ,QAAI;AACF,iBAAW,MAAM,KAAK,KAAK,KAAK;AAAA,QAC9B,WAAW,OAAO;AAAA,QAClB,SAAS;AAAA,MACX,CAAC;AAAA,IACH,SAAS,OAAO;AACd,UACE,iBAAiB,SACjB,MAAM,QAAQ,SAAS,sBAAsB,GAC7C;AACA,cAAM,SAAS,MAAM,YAAY,SAAS,KAAK,SAAS;AACxD,cAAM,QAAQ,MAAM;AACpB,eAAO,EAAE,MAAM,OAAO,WAAW,OAAO,WAAW,KAAK,UAAU,OAAO,SAAS;AAAA,MACpF;AACA,YAAM;AAAA,IACR;AAEA,QAAI,CAAC,UAAU;AACb,YAAM,IAAI,MAAM,6BAA6B,GAAG,EAAE;AAAA,IACpD;AAEA,UAAM,SAAS,SAAS,OAAO;AAC/B,QAAI,UAAU,KAAK;AACjB,YAAM,IAAI,MAAM,QAAQ,MAAM,kBAAkB,SAAS,IAAI,CAAC,EAAE;AAAA,IAClE;AAEA,UAAM,WAAW,SAAS,IAAI;AAC9B,UAAM,cAAc,SAAS,QAAQ,EAAE,cAAc,KAAK;AAK1D,QAAI,iBAAiB,WAAW,GAAG;AACjC,YAAM,OAAO,MAAM,SAAS,KAAK;AACjC,UAAI;AACF,0BAAkB,MAAM,QAAQ;AAChC,cAAM,QAAQ,MAAM;AACpB,eAAO,EAAE,MAAM,OAAO,WAAW,MAAM,KAAK,SAAS;AAAA,MACvD,QAAQ;AAEN,cAAM,SAAS,MAAM,YAAY,SAAS,UAAU,SAAS;AAC7D,cAAM,QAAQ,MAAM;AACpB,eAAO,EAAE,MAAM,OAAO,WAAW,OAAO,WAAW,KAAK,UAAU,OAAO,SAAS;AAAA,MACpF;AAAA,IACF;AAEA,UAAM,WAAW,MAAM,KAAK,QAAQ;AACpC,QAAI;AAEJ,QAAI,OAAO,UAAU;AACnB,YAAM,UAAU,MAAM,KAAK,EAAE,OAAO,QAAQ;AAC5C,UAAI,CAAC,SAAS;AACZ,cAAM,IAAI,MAAM,uBAAuB,OAAO,QAAQ,EAAE;AAAA,MAC1D;AACA,aAAO,MAAM,QAAQ,UAAU;AAAA,IACjC,OAAO;AACL,aAAO;AAAA,IACT;AAEA,UAAM,QAAQ,MAAM;AAEpB,WAAO,EAAE,MAAM,QAAQ,MAAM,UAAU,KAAK,SAAS;A
AAA,EACvD,UAAE;AACA,UAAM,QAAQ,MAAM;AAAA,EACtB;AACF;;;AEnKA,SAAS,mBAAmB;AAC5B,SAAS,aAAa;AAGtB,SAAS,eAAe,KAAe,UAAiC;AACtE,QAAM,KAAK,IAAI,cAAc,QAAQ;AACrC,SAAO,IAAI,aAAa,SAAS,KAAK;AACxC;AAEA,SAAS,aAAa,KAAqB;AACzC,SAAO,KAAK,IAAI,KAAK,CAAC;AACxB;AAEA,SAAS,qBAAqB,KAA8B;AAC1D,QAAM,YACJ,eAAe,KAAK,yCAAyC,KAC7D,eAAe,KAAK,4BAA4B;AAElD,MAAI,CAAC,WAAW;AACd,UAAM,SAAS,IAAI,cAAc,oCAAoC;AACrE,QAAI,QAAQ,aAAa;AACvB,UAAI;AACF,cAAM,OAAO,KAAK,MAAM,OAAO,WAAW;AAC1C,YAAI,OAAO,KAAK,kBAAkB,UAAU;AAC1C,iBAAO,KAAK,cAAc,MAAM,GAAG,EAAE,CAAC;AAAA,QACxC;AAAA,MACF,QAAQ;AAAA,MAER;AAAA,IACF;AACA,WAAO;AAAA,EACT;AAEA,SAAO,UAAU,MAAM,GAAG,EAAE,CAAC;AAC/B;AAEA,SAAS,eACP,KACA,mBACU;AACV,QAAM,iBAAiB,IAAI,iBAAiB,iCAAiC;AAC7E,MAAI,eAAe,SAAS,GAAG;AAC7B,WAAO,MAAM,KAAK,cAAc,EAC7B,IAAI,CAAC,OAAO,GAAG,aAAa,SAAS,CAAC,EACtC,OAAO,CAAC,MAAmB,MAAM,IAAI,EACrC,IAAI,YAAY;AAAA,EACrB;AAEA,QAAM,WAAW,eAAe,KAAK,4BAA4B;AACjE,MAAI,UAAU;AACZ,WAAO,CAAC,aAAa,QAAQ,CAAC;AAAA,EAChC;AAEA,MAAI,mBAAmB;AACrB,WAAO,CAAC,aAAa,iBAAiB,CAAC;AAAA,EACzC;AAEA,SAAO,CAAC;AACV;AAcA,SAAS,cACP,KACA,SACA,UACU;AACV,QAAM,QAAQ,SAAS,SAAS,IAAI;AACpC,QAAM,UAAU,eAAe,KAAK,SAAS,UAAU,IAAI;AAC3D,QAAM,YAAY,qBAAqB,GAAG;AAE1C,QAAM,cACJ,eAAe,KAAK,iCAAiC,KACrD,eAAe,KAAK,0BAA0B,MAC7C,SAAS,WAAW;AAEvB,QAAM,SAAQ,oBAAI,KAAK,GAAE,YAAY,EAAE,MAAM,GAAG,EAAE,CAAC;AAEnD,SAAO;AAAA,IACL;AAAA,IACA,QAAQ;AAAA,IACR,QAAQ;AAAA,IACR;AAAA,IACA,SAAS;AAAA,IACT;AAAA,EACF;AACF;AAEA,SAAS,qBAAqB,MAAc,KAAwC;AAClF,QAAM,MAAM,IAAI,MAAM,MAAM,EAAE,IAAI,CAAC;AACnC,QAAM,SAAS,IAAI,YAAY,IAAI,OAAO,QAAQ;AAClD,SAAO,OAAO,MAAM;AACtB;AAEO,SAAS,QAAQ,MAAc,UAAiC;AACrE,QAAM,UAAU,IAAI,MAAM,MAAM,EAAE,KAAK,SAAS,CAAC;AACjD,QAAM,MAAM,QAAQ,OAAO;AAE3B,QAAM,UAAU,qBAAqB,MAAM,QAAQ;AAEnD,MAAI,CAAC,SAAS;AACZ,UAAM,IAAI;AAAA,MACR;AAAA,IAEF;AAAA,EACF;AAEA,MAAI,CAAC,QAAQ,SAAS;AACpB,UAAM,IAAI,MAAM,iDAAiD;AAAA,EACnE;AAEA,SAAO;AAAA,IACL,UAAU,cAAc,KAAK,SAAS,QAAQ;AAAA,IAC9C,SAAS,QAAQ;AAAA,EACnB;AACF;AAEO,SAAS,gBAAgB,MAAc,UAA4B;AACxE,QAAM,UAAU,IAAI,MAAM,MAAM,EAAE,KAAK,SAAS,CAAC;AACjD,QAAM,MAAM,QAAQ,OAAO;AAE3B,QAAM,UAAU,qBAAqB,MAAM,QAAQ;AAE
nD,SAAO,cAAc,KAAK,SAAS,QAAQ;AAC7C;;;ACtIA,OAAO,qBAAqB;AAErB,SAAS,kBAAkB,MAAsB;AACtD,QAAM,WAAW,IAAI,gBAAgB;AAAA,IACnC,cAAc;AAAA,IACd,gBAAgB;AAAA,IAChB,kBAAkB;AAAA,EACpB,CAAC;AAED,SAAO,SAAS,SAAS,IAAI;AAC/B;;;ACVA,SAAS,qBAAqB;AAC9B,SAAS,QAAAC,aAAY;AACrB,OAAOC,WAAU;AAGjB,IAAM,eAAe;AACrB,IAAM,gBAAgB;AACtB,IAAM,sBAAsB;AAErB,SAAS,iBAAiB,OAAuB;AACtD,QAAM,YAAY,MACf,QAAQ,eAAe,EAAE,EACzB,QAAQ,cAAc,EAAE,EACxB,QAAQ,QAAQ,GAAG,EACnB,KAAK;AACR,QAAM,OAAO,UAAU,MAAM,GAAG,mBAAmB,KAAK;AACxD,SAAO,GAAG,IAAI;AAChB;AAEO,SAAS,iBAAiB,UAAoB,MAAwB;AAC3E,QAAM,OAAgC;AAAA,IACpC,OAAO,SAAS;AAAA,IAChB,QAAQ,SAAS;AAAA,EACnB;AAEA,MAAI,SAAS,OAAO,SAAS,GAAG;AAC9B,SAAK,SAAS,SAAS;AAAA,EACzB;AAEA,MAAI,SAAS,WAAW;AACtB,SAAK,YAAY,SAAS;AAAA,EAC5B;AAEA,OAAK,UAAU,SAAS;AAExB,MAAI,SAAS,aAAa;AACxB,SAAK,cAAc,SAAS;AAAA,EAC9B;AAEA,OAAK,OAAO;AAEZ,QAAM,UAAUA,MAAK,KAAK,MAAM;AAAA,IAC9B,aAAa;AAAA,IACb,aAAa;AAAA,IACb,WAAW;AAAA,IACX,UAAU;AAAA,EACZ,CAAC;AAED,SAAO;AAAA,EAAQ,OAAO;AACxB;AAEO,SAAS,kBACd,MACA,UACA,iBACA,MACQ;AACR,QAAM,WAAW,iBAAiB,SAAS,KAAK;AAChD,QAAM,WAAWD,MAAK,MAAM,QAAQ;AACpC,QAAM,cAAc,iBAAiB,UAAU,IAAI;AACnD,QAAM,cAAc,GAAG,WAAW;AAAA;AAAA,EAAO,eAAe;AAAA;AAExD,gBAAc,UAAU,aAAa,OAAO;AAE5C,SAAO;AACT;;;ANhDA,IAAM,cAAcE,MAAKC,SAAQ,GAAG,WAAW,eAAe,aAAa;AAE3E,IAAM,UAAU,IAAI,QAAQ;AAE5B,QACG,KAAK,aAAa,EAClB;AAAA,EACC;AACF,EACC,QAAQ,OAAO;AAElB,QACG,QAAQ,OAAO,EACf,YAAY,mCAAmC,EAC/C,SAAS,SAAS,cAAc,EAChC,OAAO,iBAAiB,uBAAuB,EAC/C,OAAO,YAAY,4BAA4B,EAC/C,OAAO,oBAAoB,yBAAyB,EACpD,OAAO,uBAAuB,sBAAsB,QAAQ,EAC5D,OAAO,gBAAgB,wBAAwB,CAAC,KAAa,QAAkB;AAC9E,MAAI,KAAK,GAAG;AACZ,SAAO;AACT,GAAG,CAAC,CAAa,EAChB;AAAA,EACC;AAAA,EACA;AACF,EACC,OAAO,kBAAkB,0BAA0B,EACnD,OAAO,aAAa,oCAAoC,EACxD,OAAO,qBAAqB,6BAA6B,EACzD,OAAO,oBAAoB,4BAA4B,EACvD,OAAO,oBAAoB,6BAA6B,EACxD,OAAO,SAAS,uDAAuD,EACvE,OAAO,kBAAkB,iDAAiD,EAC1E,OAAO,OAAO,KAAa,YAAqC;AAC/D,MAAI;AACF,UAAM,aAAaC,YAAW,WAAW,IAAI,cAAc;AAC3D,UAAM,SAAS;AAAA,MACb;AAAA,QACE,MAAM,QAAQ;AAAA,QACd,MAAM,QAAQ;AAAA,QACd,SAAS,QAAQ;AAAA,QACjB,WAAW,QAAQ;AAAA,QACnB,QAAQ,QAAQ;AAAA,QAChB,UAAU,QAAQ;AAAA,QAClB,OAAO,QAAQ;AAAA,
QACf,WAAW,QAAQ;AAAA,QACnB,QAAQ,QAAQ;AAAA,QAChB,aAAa,QAAQ;AAAA,QACrB,YAAY,QAAQ;AAAA,QACpB,YAAY,QAAQ;AAAA,QACpB,KAAK,QAAQ;AAAA,MACf;AAAA,MACA;AAAA,IACF;AAEA,QAAI,OAAO,OAAO,OAAO,UAAU;AACjC,YAAM,IAAI,MAAM,+CAA+C;AAAA,IACjE;AAGA,QAAI,CAAC,OAAO,UAAU,CAACA,YAAW,OAAO,IAAI,GAAG;AAC9C,YAAM,IAAI,MAAM,yCAAyC,OAAO,IAAI,EAAE;AAAA,IACxE;AAEA,UAAM,cAAc,cAAc;AAClC,UAAM,cAAc,MAAM,UAAU,KAAK,QAAQ,WAAW;AAE5D,QAAI;AACJ,QAAI;AAEJ,QAAI,YAAY,SAAS,OAAO;AAC9B,UAAI,OAAO,UAAU;AACnB,cAAM,IAAI,MAAM,0CAA0C;AAAA,MAC5D;AACA,UAAI,OAAO,KAAK;AACd,cAAM,IAAI,MAAM,qCAAqC;AAAA,MACvD;AACA,YAAM,EAAE,qBAAqB,IAAI,MAAM,OAAO,6BAAoB;AAClE,YAAM,YAAY,MAAM;AAAA,QACtB,YAAY;AAAA,QACZ,YAAY;AAAA,MACd;AACA,iBAAW,UAAU;AACrB,iBAAW,UAAU;AAAA,IACvB,WAAW,OAAO,UAAU;AAE1B,iBAAW,gBAAgB,YAAY,UAAU,YAAY,QAAQ;AACrE,iBAAW,kBAAkB,YAAY,IAAI;AAAA,IAC/C,WAAW,OAAO,KAAK;AAErB,iBAAW,gBAAgB,YAAY,UAAU,YAAY,QAAQ;AACrE,iBAAW,kBAAkB,YAAY,QAAQ;AAAA,IACnD,OAAO;AACL,YAAM,SAAS,QAAQ,YAAY,MAAM,YAAY,QAAQ;AAC7D,iBAAW,OAAO;AAClB,iBAAW,kBAAkB,OAAO,OAAO;AAAA,IAC7C;AAEA,QAAI,OAAO,UAAU,MAAM;AACzB,iBAAW,EAAE,GAAG,UAAU,OAAO,OAAO,MAAM;AAAA,IAChD;AAEA,QAAI,OAAO,QAAQ;AACjB,YAAM,cAAc,iBAAiB,UAAU,OAAO,IAAI;AAC1D,cAAQ,OAAO,MAAM,GAAG,WAAW;AAAA;AAAA,EAAO,QAAQ;AAAA,CAAI;AAAA,IACxD,OAAO;AACL,YAAM,WAAW;AAAA,QACf,OAAO;AAAA,QACP;AAAA,QACA;AAAA,QACA,OAAO;AAAA,MACT;AACA,cAAQ,MAAM,UAAU,QAAQ,EAAE;AAAA,IACpC;AAAA,EACF,SAAS,OAAO;AACd,UAAM,UAAU,iBAAiB,QAAQ,MAAM,UAAU,OAAO,KAAK;AACrE,YAAQ,MAAM,UAAU,OAAO,EAAE;AACjC,YAAQ,KAAK,CAAC;AAAA,EAChB;AACF,CAAC;AAEH,QACG,QAAQ,OAAO,EACf,YAAY,kCAAkC,EAC9C,SAAS,SAAS,cAAc,EAChC,OAAO,uBAAuB,4BAA4B,QAAQ,EAClE,OAAO,OAAO,KAAa,YAAqC;AAC/D,QAAM,EAAE,UAAAC,UAAS,IAAI,MAAM,OAAO,YAAY;AAC9C,QAAM,cAAc,cAAc;AAClC,mBAAiB,WAAW;AAE5B,QAAM,aAAc,QAAQ,WAAkC;AAC9D,QAAM,UAAU,MAAMA,UAAS,OAAO,EAAE,UAAU,MAAM,CAAC;AAEzD,MAAI;AACF,UAAM,UAAU,MAAM,QAAQ,WAAW;AACzC,UAAM,OAAO,MAAM,QAAQ,QAAQ;AAEnC,UAAM,KAAK,KAAK,KAAK,EAAE,WAAW,eAAe,SAAS,aAAa,IAAK,CAAC;AAE7E,YAAQ,MAAM,yEAAyE;AAEvF,YAAQ,MAAM,OAAO;AACrB,UAAM,KAAK,QAAQ,OAAO,MAAM;AAChC,YAAQ,MAAM,MAAM;AACpB,YAAQ,MAAM,MAAM;AAEpB,UAAM,c
AAc,eAAe,KAAK,WAAW;AACnD,UAAM,QAAQ,aAAa,EAAE,MAAM,YAAY,CAAC;AAChD,YAAQ,MAAM,kBAAkB,WAAW,EAAE;AAAA,EAC/C,SAAS,OAAO;AACd,UAAM,UAAU,iBAAiB,QAAQ,MAAM,UAAU,OAAO,KAAK;AACrE,YAAQ,MAAM,UAAU,OAAO,EAAE;AACjC,YAAQ,KAAK,CAAC;AAAA,EAChB,UAAE;AACA,UAAM,QAAQ,MAAM;AAAA,EACtB;AACF,CAAC;AAEH,QAAQ,MAAM;","names":["existsSync","homedir","join","homedir","join","yaml","join","homedir","existsSync","chromium"]}
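Reviewer note: the `sourcesContent` embedded in the map above includes the writer module's `sanitizeFilename`, which decides the name of the saved note. The sketch below re-states that function in plain JavaScript so its behavior can be checked in isolation. The constants and logic are copied from the embedded source; the sample titles are made up for illustration, and this is a reading aid rather than the package's shipped bundle.

```javascript
// Constants as they appear in the embedded writer source.
const UNSAFE_CHARS = /[/\\:*?"<>|]/g;      // characters rejected by common filesystems
const CONTROL_CHARS = /[\x00-\x1f\x7f]/g;  // ASCII control characters, including tab and newline
const MAX_FILENAME_LENGTH = 200;

function sanitizeFilename(title) {
  const sanitized = title
    .replace(CONTROL_CHARS, "") // control characters are dropped, not turned into spaces
    .replace(UNSAFE_CHARS, "")
    .replace(/\s+/g, " ")       // collapse remaining whitespace runs
    .trim();
  // Fall back to "Untitled" when nothing survives sanitization.
  const base = sanitized.slice(0, MAX_FILENAME_LENGTH) || "Untitled";
  return `${base}.md`;
}

// Illustrative inputs (not taken from the package's tests).
console.log(sanitizeFilename('Notes: "A/B" testing?'));
console.log(sanitizeFilename("???"));
console.log(sanitizeFilename("a\tb"));
```

Because control characters are removed before whitespace is collapsed, a title containing a tab or newline has the surrounding words joined rather than separated by a space. The 200-character limit applies to the base name only, so the final filename can be up to 203 characters long.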
@@ -0,0 +1,69 @@
1
+ #!/usr/bin/env node
2
+
3
+ // src/pdf-converter.ts
4
+ import pdf2md from "@opendocsg/pdf2md";
5
+ var PDF_DATE_PATTERN = /^D:(\d{4})(\d{2})(\d{2})/;
6
+ function parsePdfDate(dateStr) {
7
+ const match = dateStr.match(PDF_DATE_PATTERN);
8
+ if (!match) return null;
9
+ const [, year, month, day] = match;
10
+ return `${year}-${month}-${day}`;
11
+ }
12
+ var AUTHOR_SEPARATOR = /\s*(?:;|\s+and\s+|&)\s*/;
13
+ function formatAuthor(raw) {
14
+ const sanitized = raw.trim().replace(/[[\]]/g, "");
15
+ return `[[${sanitized}]]`;
16
+ }
17
+ function parseAuthors(raw) {
18
+ return raw.split(AUTHOR_SEPARATOR).map((s) => s.trim()).filter(Boolean).map(formatAuthor);
19
+ }
20
+ async function convertPdfToMarkdown(pdfBuffer, sourceUrl) {
21
+ let pdfMeta = null;
22
+ const markdown = await pdf2md(pdfBuffer, {
23
+ metadataParsed: (metadata) => {
24
+ pdfMeta = metadata;
25
+ }
26
+ });
27
+ if (!markdown.trim()) {
28
+ throw new Error("pdf2md returned empty content from the PDF");
29
+ }
30
+ const xmpTitle = pdfMeta?.metadata?.get("dc:title");
31
+ const infoTitle = pdfMeta?.info.Title;
32
+ const title = (xmpTitle && xmpTitle.trim() ? xmpTitle.trim() : null) ?? (infoTitle && infoTitle.trim() ? infoTitle.trim() : null) ?? extractTitleFromMarkdown(markdown) ?? extractTitleFromUrl(sourceUrl);
33
+ const rawAuthor = pdfMeta?.info.Author;
34
+ const author = rawAuthor && rawAuthor.trim() ? parseAuthors(rawAuthor) : [];
35
+ const rawDate = pdfMeta?.info.CreationDate;
36
+ const published = rawDate ? parsePdfDate(rawDate) : null;
37
+ const today = (/* @__PURE__ */ new Date()).toISOString().split("T")[0];
38
+ return {
39
+ markdown,
40
+ metadata: {
41
+ title,
42
+ source: sourceUrl,
43
+ author,
44
+ published,
45
+ created: today,
46
+ description: null
47
+ }
48
+ };
49
+ }
50
+ function extractTitleFromMarkdown(markdown) {
51
+ const match = markdown.match(/^#\s+(.+?)(?:\s+#+)?$/m);
52
+ if (!match) return null;
53
+ const trimmed = match[1].trim();
54
+ return trimmed || null;
55
+ }
56
+ function extractTitleFromUrl(url) {
57
+ const pathname = new URL(url).pathname;
58
+ const lastSegment = pathname.split("/").filter(Boolean).pop();
59
+ if (!lastSegment) {
60
+ throw new Error(`Cannot extract filename from URL: ${url}`);
61
+ }
62
+ return decodeURIComponent(lastSegment.replace(/\.pdf$/i, ""));
63
+ }
64
+ export {
65
+ convertPdfToMarkdown,
66
+ extractTitleFromUrl,
67
+ parsePdfDate
68
+ };
69
+ //# sourceMappingURL=pdf-converter-FM6DCBO5.js.map
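Reviewer note: the new `pdf-converter` chunk above derives note metadata from three sources: the PDF `CreationDate`, the `Author` string, and a title fallback chain (XMP `dc:title`, then `info.Title`, then the first Markdown H1, then the last URL path segment). The sketch below copies the helper functions from the hunk so they can be run without `@opendocsg/pdf2md`. `pickTitle` is a hypothetical wrapper introduced here only to isolate the fallback expression from `convertPdfToMarkdown`, and all sample inputs are invented for illustration.

```javascript
const PDF_DATE_PATTERN = /^D:(\d{4})(\d{2})(\d{2})/;
function parsePdfDate(dateStr) {
  const match = dateStr.match(PDF_DATE_PATTERN);
  if (!match) return null;
  const [, year, month, day] = match;
  return `${year}-${month}-${day}`;
}

const AUTHOR_SEPARATOR = /\s*(?:;|\s+and\s+|&)\s*/;
function formatAuthor(raw) {
  // Strip square brackets so the name cannot break the [[wikilink]] wrapper.
  const sanitized = raw.trim().replace(/[[\]]/g, "");
  return `[[${sanitized}]]`;
}
function parseAuthors(raw) {
  return raw.split(AUTHOR_SEPARATOR).map((s) => s.trim()).filter(Boolean).map(formatAuthor);
}

function extractTitleFromMarkdown(markdown) {
  // First ATX level-1 heading; an optional closing "#" run is ignored.
  const match = markdown.match(/^#\s+(.+?)(?:\s+#+)?$/m);
  if (!match) return null;
  const trimmed = match[1].trim();
  return trimmed || null;
}
function extractTitleFromUrl(url) {
  const pathname = new URL(url).pathname;
  const lastSegment = pathname.split("/").filter(Boolean).pop();
  if (!lastSegment) {
    throw new Error(`Cannot extract filename from URL: ${url}`);
  }
  return decodeURIComponent(lastSegment.replace(/\.pdf$/i, ""));
}

// Hypothetical helper (not in the package): the same fallback expression
// that convertPdfToMarkdown uses inline, pulled out for testing.
function pickTitle(xmpTitle, infoTitle, markdown, sourceUrl) {
  const clean = (s) => (s && s.trim() ? s.trim() : null);
  return clean(xmpTitle) ?? clean(infoTitle) ?? extractTitleFromMarkdown(markdown) ?? extractTitleFromUrl(sourceUrl);
}

// Illustrative inputs.
console.log(parsePdfDate("D:20240115093000+09'00'"));
console.log(parseAuthors("Alice Smith; Bob Jones and Carol [Lee] & Dan"));
console.log(pickTitle(null, "  ", "intro\n# Annual Report ##\nbody", "https://example.com/docs/report.pdf"));
console.log(pickTitle(null, null, "no heading", "https://example.com/papers/My%20Paper.PDF?download=1"));
```

Two properties of the code as written may be worth keeping in mind when reviewing. `PDF_DATE_PATTERN` checks digit counts only, so an out-of-range value such as `D:20241399` passes through as `2024-13-99`, and a year-only date such as `D:2024` yields `null`. `extractTitleFromUrl` throws when the URL has no path segment, which means a PDF served from a bare origin with no embedded title and no H1 fails the conversion rather than falling back to a default name.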
@@ -0,0 +1 @@
1
+ {"version":3,"sources":["../src/pdf-converter.ts"],"sourcesContent":["import pdf2md from \"@opendocsg/pdf2md\";\nimport type { Metadata } from \"./types.js\";\n\nexport interface PdfConvertResult {\n markdown: string;\n metadata: Metadata;\n}\n\ninterface PdfMetadataInfo {\n Title?: string;\n Author?: string;\n Creator?: string;\n CreationDate?: string;\n}\n\ninterface PdfRawMetadata {\n info: PdfMetadataInfo;\n metadata: { get: (name: string) => string | null } | null;\n}\n\nconst PDF_DATE_PATTERN = /^D:(\\d{4})(\\d{2})(\\d{2})/;\n\nexport function parsePdfDate(dateStr: string): string | null {\n const match = dateStr.match(PDF_DATE_PATTERN);\n if (!match) return null;\n const [, year, month, day] = match;\n return `${year}-${month}-${day}`;\n}\n\nconst AUTHOR_SEPARATOR = /\\s*(?:;|\\s+and\\s+|&)\\s*/;\n\nfunction formatAuthor(raw: string): string {\n const sanitized = raw.trim().replace(/[[\\]]/g, \"\");\n return `[[${sanitized}]]`;\n}\n\nfunction parseAuthors(raw: string): string[] {\n return raw\n .split(AUTHOR_SEPARATOR)\n .map((s) => s.trim())\n .filter(Boolean)\n .map(formatAuthor);\n}\n\nexport async function convertPdfToMarkdown(\n pdfBuffer: Buffer,\n sourceUrl: string,\n): Promise<PdfConvertResult> {\n // TypeScript narrows this to `null` without the assertion because\n // it cannot see that metadataParsed is called synchronously inside pdf2md.\n let pdfMeta = null as PdfRawMetadata | null;\n\n const markdown = await pdf2md(pdfBuffer, {\n metadataParsed: (metadata) => {\n pdfMeta = metadata as PdfRawMetadata;\n },\n });\n\n if (!markdown.trim()) {\n throw new Error(\"pdf2md returned empty content from the PDF\");\n }\n\n // Title priority: XMP dc:title > info.Title > Markdown H1 > URL segment\n const xmpTitle = pdfMeta?.metadata?.get(\"dc:title\");\n const infoTitle = pdfMeta?.info.Title;\n const title =\n (xmpTitle && xmpTitle.trim() ? xmpTitle.trim() : null) ??\n (infoTitle && infoTitle.trim() ? 
infoTitle.trim() : null) ??\n extractTitleFromMarkdown(markdown) ??\n extractTitleFromUrl(sourceUrl);\n\n // Author\n const rawAuthor = pdfMeta?.info.Author;\n const author = rawAuthor && rawAuthor.trim() ? parseAuthors(rawAuthor) : [];\n\n // Published\n const rawDate = pdfMeta?.info.CreationDate;\n const published = rawDate ? parsePdfDate(rawDate) : null;\n\n const today = new Date().toISOString().split(\"T\")[0];\n\n return {\n markdown,\n metadata: {\n title,\n source: sourceUrl,\n author,\n published,\n created: today,\n description: null,\n },\n };\n}\n\nfunction extractTitleFromMarkdown(markdown: string): string | null {\n const match = markdown.match(/^#\\s+(.+?)(?:\\s+#+)?$/m);\n if (!match) return null;\n const trimmed = match[1].trim();\n return trimmed || null;\n}\n\nexport function extractTitleFromUrl(url: string): string {\n const pathname = new URL(url).pathname;\n const lastSegment = pathname.split(\"/\").filter(Boolean).pop();\n if (!lastSegment) {\n throw new Error(`Cannot extract filename from URL: ${url}`);\n }\n return decodeURIComponent(lastSegment.replace(/\\.pdf$/i, 
\"\"));\n}\n"],"mappings":";;;AAAA,OAAO,YAAY;AAoBnB,IAAM,mBAAmB;AAElB,SAAS,aAAa,SAAgC;AAC3D,QAAM,QAAQ,QAAQ,MAAM,gBAAgB;AAC5C,MAAI,CAAC,MAAO,QAAO;AACnB,QAAM,CAAC,EAAE,MAAM,OAAO,GAAG,IAAI;AAC7B,SAAO,GAAG,IAAI,IAAI,KAAK,IAAI,GAAG;AAChC;AAEA,IAAM,mBAAmB;AAEzB,SAAS,aAAa,KAAqB;AACzC,QAAM,YAAY,IAAI,KAAK,EAAE,QAAQ,UAAU,EAAE;AACjD,SAAO,KAAK,SAAS;AACvB;AAEA,SAAS,aAAa,KAAuB;AAC3C,SAAO,IACJ,MAAM,gBAAgB,EACtB,IAAI,CAAC,MAAM,EAAE,KAAK,CAAC,EACnB,OAAO,OAAO,EACd,IAAI,YAAY;AACrB;AAEA,eAAsB,qBACpB,WACA,WAC2B;AAG3B,MAAI,UAAU;AAEd,QAAM,WAAW,MAAM,OAAO,WAAW;AAAA,IACvC,gBAAgB,CAAC,aAAa;AAC5B,gBAAU;AAAA,IACZ;AAAA,EACF,CAAC;AAED,MAAI,CAAC,SAAS,KAAK,GAAG;AACpB,UAAM,IAAI,MAAM,4CAA4C;AAAA,EAC9D;AAGA,QAAM,WAAW,SAAS,UAAU,IAAI,UAAU;AAClD,QAAM,YAAY,SAAS,KAAK;AAChC,QAAM,SACH,YAAY,SAAS,KAAK,IAAI,SAAS,KAAK,IAAI,UAChD,aAAa,UAAU,KAAK,IAAI,UAAU,KAAK,IAAI,SACpD,yBAAyB,QAAQ,KACjC,oBAAoB,SAAS;AAG/B,QAAM,YAAY,SAAS,KAAK;AAChC,QAAM,SAAS,aAAa,UAAU,KAAK,IAAI,aAAa,SAAS,IAAI,CAAC;AAG1E,QAAM,UAAU,SAAS,KAAK;AAC9B,QAAM,YAAY,UAAU,aAAa,OAAO,IAAI;AAEpD,QAAM,SAAQ,oBAAI,KAAK,GAAE,YAAY,EAAE,MAAM,GAAG,EAAE,CAAC;AAEnD,SAAO;AAAA,IACL;AAAA,IACA,UAAU;AAAA,MACR;AAAA,MACA,QAAQ;AAAA,MACR;AAAA,MACA;AAAA,MACA,SAAS;AAAA,MACT,aAAa;AAAA,IACf;AAAA,EACF;AACF;AAEA,SAAS,yBAAyB,UAAiC;AACjE,QAAM,QAAQ,SAAS,MAAM,wBAAwB;AACrD,MAAI,CAAC,MAAO,QAAO;AACnB,QAAM,UAAU,MAAM,CAAC,EAAE,KAAK;AAC9B,SAAO,WAAW;AACpB;AAEO,SAAS,oBAAoB,KAAqB;AACvD,QAAM,WAAW,IAAI,IAAI,GAAG,EAAE;AAC9B,QAAM,cAAc,SAAS,MAAM,GAAG,EAAE,OAAO,OAAO,EAAE,IAAI;AAC5D,MAAI,CAAC,aAAa;AAChB,UAAM,IAAI,MAAM,qCAAqC,GAAG,EAAE;AAAA,EAC5D;AACA,SAAO,mBAAmB,YAAY,QAAQ,WAAW,EAAE,CAAC;AAC9D;","names":[]}
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "vault-fetch",
3
- "version": "0.2.0",
3
+ "version": "0.3.1",
4
4
  "description": "Fetch JS-rendered web pages with Playwright and save as Markdown to Obsidian Vault",
5
5
  "type": "module",
6
6
  "license": "MIT",
@@ -35,6 +35,7 @@
35
35
  },
36
36
  "dependencies": {
37
37
  "@mozilla/readability": "^0.6.0",
38
+ "@opendocsg/pdf2md": "^0.2.5",
38
39
  "commander": "^14.0.3",
39
40
  "js-yaml": "^4.1.1",
40
41
  "jsdom": "^29.0.1",