vault-fetch 0.1.0 → 0.3.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.ja.md ADDED
@@ -0,0 +1,154 @@
+ # vault-fetch
+
+ Obsidian Clipper では取得できない、JavaScript レンダリングや認証が必要な Web ページおよび PDF ファイルを Playwright で取得し、Markdown に変換して Obsidian Vault に保存する CLI ツール。
+
+ ## 特徴
+
+ - Playwright (Chromium) による JS レンダリング後のページ取得
+ - **PDF から Markdown への変換**(Content-Type で自動判定)
+ - Readability.js による記事本文の抽出(広告・ナビゲーション除去)、`--raw` モードでフルページ変換も可能
+ - リソースブロッキング(画像・フォント・メディア)による高速フェッチ
+ - Chrome User-Agent 偽装によるボット対策回避
+ - Obsidian Clipper 互換のフロントマター(title, source, author, published, created, description, tags)
+ - セッション管理(`storageState`)によるログイン済みページの取得
+ - 設定の 3 層解決(CLI オプション > 環境変数 > 設定ファイル)
+
+ ## インストール
+
+ ```bash
+ # グローバルインストール
+ npm install -g vault-fetch
+
+ # Playwright のブラウザも必要
+ npx playwright install chromium
+ ```
+
+ ## 使い方
+
+ `npx` でインストールなしでも実行できます:
+
+ ```bash
+ npx vault-fetch fetch https://example.com/article --dry-run --dest /tmp
+ ```
+
+ ### ページの取得・保存
+
+ ```bash
+ # Obsidian Vault に保存
+ vault-fetch fetch https://example.com/article --dest ~/Documents/Obsidian/Clippings
+
+ # 標準出力に出力(保存しない)
+ vault-fetch fetch https://example.com/article --dry-run --dest /tmp
+
+ # headed モードで実行(デバッグ用)
+ vault-fetch fetch https://example.com/article --dest ~/Documents/Obsidian/Clippings --headed
+
+ # 特定の CSS セレクタのみ抽出(Readability をスキップ)
+ vault-fetch fetch https://example.com/article --dest ~/Documents/Obsidian/Clippings --selector "article"
+
+ # タグを追加
+ vault-fetch fetch https://example.com/article --dest ~/Documents/Obsidian/Clippings --tag tech --tag ai
+
+ # 非記事ページをフルページ変換(Readability をスキップ)
+ vault-fetch fetch https://example.com/table-page --dest ~/Documents/Obsidian/Clippings --raw
+
+ # 画像を含めてフェッチ(デフォルトではブロック)
+ vault-fetch fetch https://example.com/article --dest ~/Documents/Obsidian/Clippings --no-block-images
+
+ # PDF を取得して Markdown に変換(自動判定)
+ vault-fetch fetch https://example.com/report.pdf --dest ~/Documents/Obsidian/Clippings
+ ```
+
+ ### PDF 対応
+
+ サーバーが `Content-Type: application/pdf` を返す場合、vault-fetch は自動的に PDF をダウンロードし、[pdf2md](https://github.com/opendocsg/pdf2md) で Markdown に変換します。追加のフラグは不要です。
+
+ - タイトルは変換された Markdown の最初の `#` 見出しから抽出(なければ URL のファイル名を使用)
+ - `--selector` および `--raw` オプションは PDF URL と併用不可
+ - セッション機能により認証付き PDF のダウンロードにも対応
+
+ ### ログイン(セッション保存)
+
+ 認証が必要なサイトの場合、事前にログインしてセッションを保存できます。
+
+ ```bash
+ vault-fetch login https://note.com
+ # → ブラウザが開く → 手動でログイン → ターミナルで Enter を押す
+ ```
+
+ 以降の `fetch` でそのドメインのセッションが自動的に使用されます。
+
+ ### fetch オプション
+
+ | オプション | 説明 |
+ |---|---|
+ | `--dest <path>` | 保存先ディレクトリ(必須) |
+ | `--headed` | ブラウザを表示して実行 |
+ | `--selector <css>` | CSS セレクタで要素を抽出 |
+ | `--timeout <sec>` | タイムアウト秒数(デフォルト: 30) |
+ | `--tag <name>` | タグ追加(複数指定可) |
+ | `--wait-until <event>` | 待機条件: `load` / `domcontentloaded` / `networkidle`(デフォルト: `networkidle`) |
+ | `--skip-session` | 保存済みセッションを使わない |
+ | `--dry-run` | 保存せず標準出力に出力 |
+ | `--raw` | Readability をスキップし、フルページ HTML を直接変換 |
+ | `--no-block-images` | 画像リクエストのブロックを無効化 |
+ | `--no-block-fonts` | フォントリクエストのブロックを無効化 |
+ | `--no-block-media` | メディアリクエストのブロックを無効化 |
+
+ ## 設定
+
+ ### 設定ファイル
+
+ `~/.config/vault-fetch/config.yaml`:
+
+ ```yaml
+ # Obsidian Vault の保存先
+ dest: ~/Documents/Obsidian/Clippings
+
+ # デフォルトタグ
+ tags:
+ - clippings
+
+ timeout: 30
+ ```
+
+ ### 環境変数
+
+ | 変数 | 説明 |
+ |---|---|
+ | `VAULT_FETCH_DEST` | 保存先ディレクトリ |
+ | `VAULT_FETCH_TIMEOUT` | タイムアウト秒数 |
+
+ ### 優先順位
+
+ CLI オプション > 環境変数 > 設定ファイル > デフォルト値
+
+ ## 出力例
+
+ ```yaml
+ ---
+ title: ADHDの自分が毎日クッソ集中できるようになった習慣
+ source: https://note.com/simplearchitect/n/n8389e1b4fbde
+ author:
+ - "[[牛尾 剛]]"
+ published: 2025-06-14
+ created: 2025-07-03
+ description: 自分はADHDですので、もちろん集中力は暗黒です...
+ tags:
+ - clippings
+ ---
+
+ 記事の本文が Markdown で続きます...
+ ```
+
+ ## 開発
+
+ ```bash
+ npm run build # tsup でビルド
+ npm test # vitest でテスト実行
+ npm run typecheck # 型チェック
+ ```
+
+ ## ライセンス
+
+ MIT
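The three-layer configuration resolution described above (CLI option > environment variable > config file > default) can be sketched with nullish coalescing, the same precedence pattern visible in the bundled CLI. This is an illustrative sketch, not the package source: `resolveTimeout` and its `env`/`fileConfig` parameters are hypothetical names introduced here for demonstration.

```javascript
// Hypothetical sketch of the documented precedence:
// CLI option > environment variable > config file > default.
const DEFAULT_TIMEOUT = 30;

function resolveTimeout(cliTimeout, env, fileConfig) {
  // Environment variables arrive as strings, so convert when present.
  const envTimeout =
    env.VAULT_FETCH_TIMEOUT !== undefined ? Number(env.VAULT_FETCH_TIMEOUT) : undefined;
  // `??` falls through only when the higher-priority layer is absent.
  return cliTimeout ?? envTimeout ?? fileConfig.timeout ?? DEFAULT_TIMEOUT;
}

console.log(resolveTimeout(10, { VAULT_FETCH_TIMEOUT: "60" }, { timeout: 45 })); // 10
console.log(resolveTimeout(undefined, { VAULT_FETCH_TIMEOUT: "60" }, { timeout: 45 })); // 60
console.log(resolveTimeout(undefined, {}, {})); // 30
```

Because `??` (unlike `||`) only falls through on `null`/`undefined`, an explicit `--timeout 0` on the CLI would still win over the lower layers.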
package/README.md CHANGED
@@ -1,128 +1,154 @@
  # vault-fetch
 
- Obsidian Clipper では取得できない、JavaScript レンダリングや認証が必要な Web ページを Playwright で取得し、Markdown に変換して Obsidian Vault に保存する CLI ツール。
+ A CLI tool that uses Playwright to fetch web pages and PDF files that Obsidian Clipper cannot handle (pages that require JavaScript rendering or authentication), converts them to Markdown, and saves them to your Obsidian Vault.
 
- ## 特徴
+ ## Features
 
- - Playwright (Chromium) による JS レンダリング後のページ取得
- - Readability.js による記事本文の抽出(広告・ナビゲーション除去)
- - Obsidian Clipper 互換のフロントマター(title, source, author, published, created, description, tags)
- - セッション管理(`storageState`)によるログイン済みページの取得
- - 設定の 3 層解決(CLI オプション > 環境変数 > 設定ファイル)
+ - Page fetching with JS rendering via Playwright (Chromium)
+ - **PDF to Markdown conversion** (auto-detected via Content-Type)
+ - Article content extraction using Readability.js (removes ads and navigation), with `--raw` mode for full-page conversion
+ - Resource blocking (images, fonts, media) for faster fetching
+ - Chrome User-Agent spoofing to bypass bot detection
+ - Obsidian Clipper-compatible frontmatter (title, source, author, published, created, description, tags)
+ - Session management (`storageState`) for fetching authenticated pages
+ - 3-layer configuration resolution (CLI options > environment variables > config file)
 
- ## セットアップ
+ ## Installation
 
  ```bash
- npm install
+ # Global install
+ npm install -g vault-fetch
+
+ # Playwright browser is also required
  npx playwright install chromium
  ```
 
- グローバルインストール:
+ ## Usage
+
+ You can run it without installation using `npx`:
 
  ```bash
- npm install -g vault-fetch
- npx playwright install chromium
+ npx vault-fetch fetch https://example.com/article --dry-run --dest /tmp
  ```
 
- ## 使い方
-
- ### ページの取得・保存
+ ### Fetching and Saving Pages
 
  ```bash
- # Obsidian Vault に保存
+ # Save to Obsidian Vault
  vault-fetch fetch https://example.com/article --dest ~/Documents/Obsidian/Clippings
 
- # 標準出力に出力(保存しない)
+ # Output to stdout (without saving)
  vault-fetch fetch https://example.com/article --dry-run --dest /tmp
 
- # headed モードで実行(デバッグ用)
+ # Run in headed mode (for debugging)
  vault-fetch fetch https://example.com/article --dest ~/Documents/Obsidian/Clippings --headed
 
- # 特定の CSS セレクタのみ抽出(Readability をスキップ)
+ # Extract only a specific CSS selector (skips Readability)
  vault-fetch fetch https://example.com/article --dest ~/Documents/Obsidian/Clippings --selector "article"
 
- # タグを追加
+ # Add tags
  vault-fetch fetch https://example.com/article --dest ~/Documents/Obsidian/Clippings --tag tech --tag ai
+
+ # Full-page conversion for non-article pages (skips Readability)
+ vault-fetch fetch https://example.com/table-page --dest ~/Documents/Obsidian/Clippings --raw
+
+ # Fetch with images (blocked by default)
+ vault-fetch fetch https://example.com/article --dest ~/Documents/Obsidian/Clippings --no-block-images
+
+ # Fetch a PDF and convert to Markdown (auto-detected)
+ vault-fetch fetch https://example.com/report.pdf --dest ~/Documents/Obsidian/Clippings
  ```
 
- ### ログイン(セッション保存)
+ ### PDF Support
+
+ When the server returns `Content-Type: application/pdf`, vault-fetch automatically downloads the PDF and converts it to Markdown using [pdf2md](https://github.com/opendocsg/pdf2md). No additional flags are needed.
+
+ - Title is extracted from the first `#` heading in the converted Markdown, or from the URL filename
+ - `--selector` and `--raw` options cannot be used with PDF URLs
+ - Session support works with authenticated PDF downloads
 
- 認証が必要なサイトの場合、事前にログインしてセッションを保存できます。
+ ### Login (Session Storage)
+
+ For sites that require authentication, you can log in and save the session beforehand.
 
  ```bash
  vault-fetch login https://note.com
- # → ブラウザが開く → 手動でログイン → ターミナルで Enter を押す
+ # → Browser opens → Log in manually → Press Enter in terminal
  ```
 
- 以降の `fetch` でそのドメインのセッションが自動的に使用されます。
+ Subsequent `fetch` commands will automatically use the saved session for that domain.
 
- ### fetch オプション
+ ### Fetch Options
 
- | オプション | 説明 |
+ | Option | Description |
  |---|---|
- | `--dest <path>` | 保存先ディレクトリ(必須) |
- | `--headed` | ブラウザを表示して実行 |
- | `--selector <css>` | CSS セレクタで要素を抽出 |
- | `--timeout <sec>` | タイムアウト秒数(デフォルト: 30) |
- | `--tag <name>` | タグ追加(複数指定可) |
- | `--wait-until <event>` | 待機条件: `load` / `domcontentloaded` / `networkidle`(デフォルト: `networkidle`) |
- | `--skip-session` | 保存済みセッションを使わない |
- | `--dry-run` | 保存せず標準出力に出力 |
-
- ## 設定
-
- ### 設定ファイル
+ | `--dest <path>` | Destination directory (required) |
+ | `--headed` | Run with browser visible |
+ | `--selector <css>` | Extract elements by CSS selector |
+ | `--timeout <sec>` | Timeout in seconds (default: 30) |
+ | `--tag <name>` | Add tags (can be specified multiple times) |
+ | `--wait-until <event>` | Wait condition: `load` / `domcontentloaded` / `networkidle` (default: `networkidle`) |
+ | `--skip-session` | Do not use saved sessions |
+ | `--dry-run` | Output to stdout without saving |
+ | `--raw` | Skip Readability and convert full-page HTML directly |
+ | `--no-block-images` | Disable image request blocking |
+ | `--no-block-fonts` | Disable font request blocking |
+ | `--no-block-media` | Disable media request blocking |
+
+ ## Configuration
+
+ ### Config File
 
  `~/.config/vault-fetch/config.yaml`:
 
  ```yaml
- # Obsidian Vault の保存先
+ # Obsidian Vault destination
  dest: ~/Documents/Obsidian/Clippings
 
- # デフォルトタグ
+ # Default tags
  tags:
  - clippings
 
  timeout: 30
  ```
 
- ### 環境変数
+ ### Environment Variables
 
- | 変数 | 説明 |
+ | Variable | Description |
  |---|---|
- | `VAULT_FETCH_DEST` | 保存先ディレクトリ |
- | `VAULT_FETCH_TIMEOUT` | タイムアウト秒数 |
+ | `VAULT_FETCH_DEST` | Destination directory |
+ | `VAULT_FETCH_TIMEOUT` | Timeout in seconds |
 
- ### 優先順位
+ ### Priority
 
- CLI オプション > 環境変数 > 設定ファイル > デフォルト値
+ CLI options > Environment variables > Config file > Default values
 
- ## 出力例
+ ## Output Example
 
  ```yaml
  ---
- title: ADHDの自分が毎日クッソ集中できるようになった習慣
- source: https://note.com/simplearchitect/n/n8389e1b4fbde
+ title: "Thinking, Fast and Slow: Lessons for Software Engineers"
+ source: https://medium.com/@example/thinking-fast-and-slow-lessons-for-engineers-abc123
  author:
- - "[[牛尾 剛]]"
+ - "[[Jane Smith]]"
  published: 2025-06-14
  created: 2025-07-03
- description: 自分はADHDですので、もちろん集中力は暗黒です...
+ description: How cognitive biases from Kahneman's research apply to everyday engineering decisions...
  tags:
  - clippings
  ---
 
- 記事の本文が Markdown で続きます...
+ The article body continues in Markdown...
  ```
 
- ## 開発
+ ## Development
 
  ```bash
- npm run build # tsup でビルド
- npm test # vitest でテスト実行
- npm run typecheck # 型チェック
+ npm run build # Build with tsup
+ npm test # Run tests with vitest
+ npm run typecheck # Type checking
  ```
 
- ## ライセンス
+ ## License
 
  MIT
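The Obsidian Clipper-compatible frontmatter shown in the output example above can be sketched as a plain string builder. This is an illustration of the output shape only: the actual tool serializes frontmatter with js-yaml, and the `buildFrontmatter` below is a hand-rolled stand-in for this sketch, so quoting and escaping details may differ from the real output.

```javascript
// Hypothetical sketch: assemble Clipper-style YAML frontmatter by hand.
// The real CLI uses js-yaml; this only demonstrates the documented shape.
function buildFrontmatter(metadata, tags) {
  const lines = ["---"];
  lines.push(`title: ${JSON.stringify(metadata.title)}`);
  lines.push(`source: ${metadata.source}`);
  lines.push("author:");
  for (const name of metadata.author) lines.push(`  - "[[${name}]]"`); // Obsidian wikilink
  lines.push(`published: ${metadata.published}`);
  lines.push(`created: ${metadata.created}`);
  lines.push(`description: ${metadata.description}`);
  lines.push("tags:");
  for (const tag of tags) lines.push(`  - ${tag}`);
  lines.push("---");
  return lines.join("\n");
}

const fm = buildFrontmatter(
  {
    title: "Example Article",
    source: "https://example.com/article",
    author: ["Jane Smith"],
    published: "2025-06-14",
    created: "2025-07-03",
    description: "A short summary...",
  },
  ["clippings"]
);
console.log(fm);
```

Note how authors are wrapped as `"[[...]]"` so Obsidian treats each one as a link to a note, matching the example above.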
package/dist/cli.js CHANGED
@@ -2,6 +2,7 @@
 
  // src/cli.ts
  import { Command } from "commander";
+ import { once } from "events";
  import { existsSync as existsSync2 } from "fs";
  import { homedir as homedir3 } from "os";
  import { join as join3 } from "path";
@@ -20,13 +21,40 @@ function expandTilde(filePath) {
  }
  return filePath;
  }
+ var VALID_WAIT_UNTIL = ["load", "domcontentloaded", "networkidle"];
+ function validateWaitUntil(value) {
+ if (!VALID_WAIT_UNTIL.includes(value)) {
+ throw new Error(
+ `Invalid waitUntil value: "${value}". Must be one of: ${VALID_WAIT_UNTIL.join(", ")}`
+ );
+ }
+ return value;
+ }
  function loadConfigFile(configPath) {
  const content = readFileSync(configPath, "utf-8");
  const parsed = yaml.load(content);
  if (parsed === null || typeof parsed !== "object") {
  throw new Error(`Invalid config file: ${configPath}`);
  }
- return parsed;
+ const config = parsed;
+ if (config.timeout !== void 0 && typeof config.timeout !== "number") {
+ throw new Error(`Invalid timeout in config file: expected number, got ${typeof config.timeout}`);
+ }
+ if (config.dest !== void 0 && typeof config.dest !== "string") {
+ throw new Error(`Invalid dest in config file: expected string, got ${typeof config.dest}`);
+ }
+ if (config.waitUntil !== void 0) {
+ if (typeof config.waitUntil !== "string") {
+ throw new Error(`Invalid waitUntil in config file: expected string, got ${typeof config.waitUntil}`);
+ }
+ validateWaitUntil(config.waitUntil);
+ }
+ if (config.tags !== void 0) {
+ if (!Array.isArray(config.tags) || !config.tags.every((t) => typeof t === "string")) {
+ throw new Error("Invalid tags in config file: expected array of strings");
+ }
+ }
+ return config;
  }
  function resolveConfig(cliOptions, configPath) {
  let fileConfig = {};
@@ -53,7 +81,8 @@ function resolveConfig(cliOptions, configPath) {
  } else {
  timeout = fileConfig.timeout ?? DEFAULT_TIMEOUT;
  }
- const waitUntil = cliOptions.waitUntil ?? fileConfig.waitUntil ?? DEFAULT_WAIT_UNTIL;
+ const rawWaitUntil = cliOptions.waitUntil ?? fileConfig.waitUntil ?? DEFAULT_WAIT_UNTIL;
+ const waitUntil = validateWaitUntil(rawWaitUntil);
  const allTags = [
  ...fileConfig.tags ?? [],
  ...cliOptions.tags ?? [],
@@ -68,7 +97,11 @@ function resolveConfig(cliOptions, configPath) {
  headed: cliOptions.headed ?? false,
  selector: cliOptions.selector ?? null,
  noSession: cliOptions.noSession ?? false,
- dryRun: cliOptions.dryRun ?? false
+ dryRun: cliOptions.dryRun ?? false,
+ blockImages: cliOptions.blockImages ?? true,
+ blockFonts: cliOptions.blockFonts ?? true,
+ blockMedia: cliOptions.blockMedia ?? true,
+ raw: cliOptions.raw ?? false
  };
  }
 
@@ -103,23 +136,79 @@ function ensureSessionDir(sessionsDir) {
  }
 
  // src/fetcher.ts
+ var CHROME_USER_AGENT = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/134.0.0.0 Safari/537.36";
+ function isPdfContentType(contentType) {
+ return contentType.toLowerCase().includes("application/pdf");
+ }
+ function buildBlockedResourceTypes(options) {
+ const blocked = /* @__PURE__ */ new Set();
+ if (options.blockImages) blocked.add("image");
+ if (options.blockFonts) blocked.add("font");
+ if (options.blockMedia) blocked.add("media");
+ return blocked;
+ }
+ var PDF_MAGIC_BYTES = "%PDF";
+ function validatePdfBuffer(pdfBuffer, sourceUrl) {
+ if (pdfBuffer.length === 0) {
+ throw new Error(`Empty PDF response received from ${sourceUrl}`);
+ }
+ const header = pdfBuffer.subarray(0, PDF_MAGIC_BYTES.length).toString("ascii");
+ if (!header.startsWith(PDF_MAGIC_BYTES)) {
+ throw new Error(
+ `Response Content-Type is application/pdf but body is not valid PDF data from ${sourceUrl}`
+ );
+ }
+ }
+ async function downloadPdf(context, url, timeoutMs) {
+ const apiResponse = await context.request.get(url, { timeout: timeoutMs });
+ const status = apiResponse.status();
+ if (status >= 400) {
+ throw new Error(`HTTP ${status} received when downloading PDF from ${url}`);
+ }
+ const pdfBuffer = Buffer.from(await apiResponse.body());
+ const finalUrl = apiResponse.url();
+ validatePdfBuffer(pdfBuffer, finalUrl);
+ return { pdfBuffer, finalUrl };
+ }
  async function fetchPage(url, config, sessionsDir) {
  const browser = await chromium.launch({
  headless: !config.headed
  });
  try {
- const contextOptions = {};
+ const contextOptions = {
+ userAgent: CHROME_USER_AGENT
+ };
  if (!config.noSession && sessionExists(url, sessionsDir)) {
  const sessionPath = getSessionPath(url, sessionsDir);
  contextOptions.storageState = sessionPath;
  }
  const context = await browser.newContext(contextOptions);
  const page = await context.newPage();
+ const blockedTypes = buildBlockedResourceTypes(config);
+ if (blockedTypes.size > 0) {
+ await page.route("**/*", async (route) => {
+ if (blockedTypes.has(route.request().resourceType())) {
+ await route.abort();
+ } else {
+ await route.continue();
+ }
+ });
+ }
  const timeoutMs = config.timeout * 1e3;
- const response = await page.goto(url, {
- waitUntil: config.waitUntil,
- timeout: timeoutMs
- });
+ let response;
+ try {
+ response = await page.goto(url, {
+ waitUntil: config.waitUntil,
+ timeout: timeoutMs
+ });
+ } catch (error) {
+ if (error instanceof Error && error.message.includes("Download is starting")) {
+ const result = await downloadPdf(context, url, timeoutMs);
+ await context.close();
+ return { kind: "pdf", pdfBuffer: result.pdfBuffer, url, finalUrl: result.finalUrl };
+ }
+ throw error;
+ }
  if (!response) {
  throw new Error(`No response received from ${url}`);
  }
@@ -128,6 +217,19 @@ async function fetchPage(url, config, sessionsDir) {
  throw new Error(`HTTP ${status} received from ${response.url()}`);
  }
  const finalUrl = response.url();
+ const contentType = response.headers()["content-type"] ?? "";
+ if (isPdfContentType(contentType)) {
+ const body = await response.body();
+ try {
+ validatePdfBuffer(body, finalUrl);
+ await context.close();
+ return { kind: "pdf", pdfBuffer: body, url, finalUrl };
+ } catch {
+ const result = await downloadPdf(context, finalUrl, timeoutMs);
+ await context.close();
+ return { kind: "pdf", pdfBuffer: result.pdfBuffer, url, finalUrl: result.finalUrl };
+ }
+ }
  const fullHtml = await page.content();
  let html;
  if (config.selector) {
@@ -140,7 +242,7 @@ async function fetchPage(url, config, sessionsDir) {
  html = fullHtml;
  }
  await context.close();
- return { html, fullHtml, url, finalUrl };
+ return { kind: "html", html, fullHtml, url, finalUrl };
  } finally {
  await browser.close();
  }
@@ -187,54 +289,48 @@ function extractAuthors(doc, readabilityByline) {
  }
  return [];
  }
+ function buildMetadata(doc, article, finalUrl) {
+ const title = article?.title ?? doc.title;
+ const authors = extractAuthors(doc, article?.byline ?? null);
+ const published = extractPublishedDate(doc);
+ const description = getMetaContent(doc, 'meta[property="og:description"]') ?? getMetaContent(doc, 'meta[name="description"]') ?? (article?.excerpt ?? null);
+ const today = (/* @__PURE__ */ new Date()).toISOString().split("T")[0];
+ return {
+ title,
+ source: finalUrl,
+ author: authors,
+ published,
+ created: today,
+ description
+ };
+ }
+ function parseWithReadability(html, url) {
+ const dom = new JSDOM(html, { url });
+ const reader = new Readability(dom.window.document);
+ return reader.parse();
+ }
  function extract(html, finalUrl) {
  const metaDom = new JSDOM(html, { url: finalUrl });
  const doc = metaDom.window.document;
- const readabilityDom = new JSDOM(html, { url: finalUrl });
- const reader = new Readability(readabilityDom.window.document);
- const article = reader.parse();
+ const article = parseWithReadability(html, finalUrl);
  if (!article) {
- throw new Error("Readability failed to extract content from the page");
+ throw new Error(
+ "Readability failed to extract content from the page. Try --raw to convert the full page, or --selector <css> to target specific content."
+ );
  }
  if (!article.content) {
  throw new Error("Readability returned empty content for the page");
  }
- const title = article.title ?? doc.title;
- const authors = extractAuthors(doc, article.byline ?? null);
- const published = extractPublishedDate(doc);
- const description = getMetaContent(doc, 'meta[property="og:description"]') ?? getMetaContent(doc, 'meta[name="description"]') ?? (article.excerpt ?? null);
- const today = (/* @__PURE__ */ new Date()).toISOString().split("T")[0];
  return {
- metadata: {
- title,
- source: finalUrl,
- author: authors,
- published,
- created: today,
- description
- },
+ metadata: buildMetadata(doc, article, finalUrl),
  content: article.content
  };
  }
  function extractMetadata(html, finalUrl) {
  const metaDom = new JSDOM(html, { url: finalUrl });
  const doc = metaDom.window.document;
- const readabilityDom = new JSDOM(html, { url: finalUrl });
- const reader = new Readability(readabilityDom.window.document);
- const article = reader.parse();
- const title = article?.title ?? doc.title;
- const authors = extractAuthors(doc, article?.byline ?? null);
- const published = extractPublishedDate(doc);
- const description = getMetaContent(doc, 'meta[property="og:description"]') ?? getMetaContent(doc, 'meta[name="description"]') ?? (article?.excerpt ?? null);
- const today = (/* @__PURE__ */ new Date()).toISOString().split("T")[0];
- return {
- title,
- source: finalUrl,
- author: authors,
- published,
- created: today,
- description
- };
+ const article = parseWithReadability(html, finalUrl);
+ return buildMetadata(doc, article, finalUrl);
  }
 
  // src/converter.ts
@@ -253,9 +349,10 @@ import { writeFileSync } from "fs";
  import { join as join2 } from "path";
  import yaml2 from "js-yaml";
  var UNSAFE_CHARS = /[/\\:*?"<>|]/g;
+ var CONTROL_CHARS = /[\x00-\x1f\x7f]/g;
  var MAX_FILENAME_LENGTH = 200;
  function sanitizeFilename(title) {
- const sanitized = title.replace(UNSAFE_CHARS, "").replace(/\s+/g, " ").trim();
+ const sanitized = title.replace(CONTROL_CHARS, "").replace(UNSAFE_CHARS, "").replace(/\s+/g, " ").trim();
  const base = sanitized.slice(0, MAX_FILENAME_LENGTH) || "Untitled";
  return `${base}.md`;
  }
@@ -301,14 +398,14 @@ var CONFIG_PATH = join3(homedir3(), ".config", "vault-fetch", "config.yaml");
  var program = new Command();
  program.name("vault-fetch").description(
  "Fetch JS-rendered web pages and save as Markdown to Obsidian Vault"
- ).version("0.1.0");
+ ).version("0.3.0");
  program.command("fetch").description("Fetch a page and save as Markdown").argument("<url>", "URL to fetch").option("--dest <path>", "Destination directory").option("--headed", "Run browser in headed mode").option("--selector <css>", "CSS selector to extract").option("--timeout <seconds>", "Timeout in seconds", parseInt).option("--tag <name>", "Add tag (repeatable)", (val, acc) => {
  acc.push(val);
  return acc;
  }, []).option(
  "--wait-until <event>",
  "Wait condition: load, domcontentloaded, networkidle"
- ).option("--skip-session", "Do not use saved session").option("--dry-run", "Output to stdout instead of saving").action(async (url, options) => {
+ ).option("--skip-session", "Do not use saved session").option("--dry-run", "Output to stdout instead of saving").option("--no-block-images", "Do not block image requests").option("--no-block-fonts", "Do not block font requests").option("--no-block-media", "Do not block media requests").option("--raw", "Convert full page HTML without Readability extraction").action(async (url, options) => {
  try {
  const configPath = existsSync2(CONFIG_PATH) ? CONFIG_PATH : void 0;
  const config = resolveConfig(
@@ -320,26 +417,49 @@ program.command("fetch").description("Fetch a page and save as Markdown").argume
  headed: options.headed,
  selector: options.selector,
  noSession: options.skipSession,
- dryRun: options.dryRun
+ dryRun: options.dryRun,
+ blockImages: options.blockImages,
+ blockFonts: options.blockFonts,
+ blockMedia: options.blockMedia,
+ raw: options.raw
  },
  configPath
  );
+ if (config.raw && config.selector) {
+ throw new Error("--raw and --selector cannot be used together.");
+ }
  if (!config.dryRun && !existsSync2(config.dest)) {
  throw new Error(`Destination directory does not exist: ${config.dest}`);
  }
  const sessionsDir = getSessionDir();
  const fetchResult = await fetchPage(url, config, sessionsDir);
- let contentHtml;
+ let markdown;
  let metadata;
- if (config.selector) {
- contentHtml = fetchResult.html;
+ if (fetchResult.kind === "pdf") {
+ if (config.selector) {
+ throw new Error("--selector cannot be used with PDF URLs.");
+ }
+ if (config.raw) {
+ throw new Error("--raw cannot be used with PDF URLs.");
+ }
+ const { convertPdfToMarkdown } = await import("./pdf-converter-U3SFA2HY.js");
+ const pdfResult = await convertPdfToMarkdown(
+ fetchResult.pdfBuffer,
+ fetchResult.finalUrl
+ );
+ markdown = pdfResult.markdown;
+ metadata = pdfResult.metadata;
+ } else if (config.selector) {
  metadata = extractMetadata(fetchResult.fullHtml, fetchResult.finalUrl);
+ markdown = convertToMarkdown(fetchResult.html);
+ } else if (config.raw) {
+ metadata = extractMetadata(fetchResult.fullHtml, fetchResult.finalUrl);
+ markdown = convertToMarkdown(fetchResult.fullHtml);
  } else {
  const result = extract(fetchResult.html, fetchResult.finalUrl);
  metadata = result.metadata;
- contentHtml = result.content;
+ markdown = convertToMarkdown(result.content);
  }
- const markdown = convertToMarkdown(contentHtml);
  if (config.dryRun) {
  const frontmatter = buildFrontmatter(metadata, config.tags);
  process.stdout.write(`${frontmatter}
@@ -372,11 +492,10 @@ program.command("login").description("Login to a site and save session").argumen
  const page = await context.newPage();
  await page.goto(url, { waitUntil: "networkidle", timeout: timeoutSec * 1e3 });
  console.error("Browser opened. Log in manually, then press Enter here to save session.");
- await new Promise((resolve2) => {
- process.stdin.once("data", () => {
- resolve2();
- });
- });
+ process.stdin.resume();
+ await once(process.stdin, "data");
+ process.stdin.pause();
+ process.stdin.unref();
  const sessionPath = getSessionPath(url, sessionsDir);
  await context.storageState({ path: sessionPath });
  console.error(`Session saved: ${sessionPath}`);
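The minified bundle above detects PDFs in two steps: check the response's `Content-Type` header, then confirm the body begins with the `%PDF` magic bytes before converting. A readable sketch of that decision follows; `looksLikePdf` is an illustrative name introduced here (the bundle keeps the two checks separate as `isPdfContentType` and `validatePdfBuffer`).

```javascript
// Sketch of the two-step PDF detection used by the bundle above.
const PDF_MAGIC_BYTES = "%PDF";

// Step 1: header check. "application/pdf" may carry parameters,
// e.g. "application/pdf; charset=binary", so use a substring match.
function isPdfContentType(contentType) {
  return contentType.toLowerCase().includes("application/pdf");
}

// Step 2: body check. Real PDF files start with the ASCII bytes "%PDF";
// a server that lies in its Content-Type header fails this check.
function looksLikePdf(contentType, body) {
  if (!isPdfContentType(contentType)) return false;
  if (body.length === 0) return false;
  const header = body.subarray(0, PDF_MAGIC_BYTES.length).toString("ascii");
  return header.startsWith(PDF_MAGIC_BYTES);
}

console.log(looksLikePdf("application/pdf", Buffer.from("%PDF-1.7\n..."))); // true
console.log(looksLikePdf("application/pdf", Buffer.from("<html>")));        // false
console.log(looksLikePdf("text/html", Buffer.from("%PDF-1.7")));            // false
```

Checking the magic bytes as well as the header is what lets the bundle fall back to a fresh `downloadPdf` when the navigation response body turns out not to be a valid PDF.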
package/dist/cli.js.map CHANGED
@@ -1 +1 @@
- {"version":3,"sources":["../src/cli.ts","../src/config.ts","../src/fetcher.ts","../src/session.ts","../src/extractor.ts","../src/converter.ts","../src/writer.ts"],"sourcesContent":["import { Command } from \"commander\";\nimport { existsSync } from \"node:fs\";\nimport { homedir } from \"node:os\";\nimport { join } from \"node:path\";\nimport { resolveConfig } from \"./config.js\";\nimport { fetchPage } from \"./fetcher.js\";\nimport { extract, extractMetadata } from \"./extractor.js\";\nimport { convertToMarkdown } from \"./converter.js\";\nimport { writeMarkdownFile, buildFrontmatter } from \"./writer.js\";\nimport {\n getSessionDir,\n getSessionPath,\n ensureSessionDir,\n} from \"./session.js\";\nimport type { WaitUntilOption } from \"./types.js\";\n\nconst CONFIG_PATH = join(homedir(), \".config\", \"vault-fetch\", \"config.yaml\");\n\nconst program = new Command();\n\nprogram\n .name(\"vault-fetch\")\n .description(\n \"Fetch JS-rendered web pages and save as Markdown to Obsidian Vault\",\n )\n .version(\"0.1.0\");\n\nprogram\n .command(\"fetch\")\n .description(\"Fetch a page and save as Markdown\")\n .argument(\"<url>\", \"URL to fetch\")\n .option(\"--dest <path>\", \"Destination directory\")\n .option(\"--headed\", \"Run browser in headed mode\")\n .option(\"--selector <css>\", \"CSS selector to extract\")\n .option(\"--timeout <seconds>\", \"Timeout in seconds\", parseInt)\n .option(\"--tag <name>\", \"Add tag (repeatable)\", (val: string, acc: string[]) => {\n acc.push(val);\n return acc;\n }, [] as string[])\n .option(\n \"--wait-until <event>\",\n \"Wait condition: load, domcontentloaded, networkidle\",\n )\n .option(\"--skip-session\", \"Do not use saved session\")\n .option(\"--dry-run\", \"Output to stdout instead of saving\")\n .action(async (url: string, options: Record<string, unknown>) => {\n try {\n const configPath = existsSync(CONFIG_PATH) ? 
CONFIG_PATH : undefined;\n const config = resolveConfig(\n {\n dest: options.dest as string | undefined,\n tags: options.tag as string[] | undefined,\n timeout: options.timeout as number | undefined,\n waitUntil: options.waitUntil as WaitUntilOption | undefined,\n headed: options.headed as boolean | undefined,\n selector: options.selector as string | undefined,\n noSession: options.skipSession as boolean | undefined,\n dryRun: options.dryRun as boolean | undefined,\n },\n configPath,\n );\n\n // Validate dest directory exists\n if (!config.dryRun && !existsSync(config.dest)) {\n throw new Error(`Destination directory does not exist: ${config.dest}`);\n }\n\n const sessionsDir = getSessionDir();\n const fetchResult = await fetchPage(url, config, sessionsDir);\n\n let contentHtml: string;\n let metadata;\n\n if (config.selector) {\n // --selector mode: skip Readability, extract metadata from full page\n contentHtml = fetchResult.html;\n metadata = extractMetadata(fetchResult.fullHtml, fetchResult.finalUrl);\n } else {\n const result = extract(fetchResult.html, fetchResult.finalUrl);\n metadata = result.metadata;\n contentHtml = result.content;\n }\n\n const markdown = convertToMarkdown(contentHtml);\n\n if (config.dryRun) {\n const frontmatter = buildFrontmatter(metadata, config.tags);\n process.stdout.write(`${frontmatter}\\n\\n${markdown}\\n`);\n } else {\n const filePath = writeMarkdownFile(\n config.dest,\n metadata,\n markdown,\n config.tags,\n );\n console.error(`Saved: ${filePath}`);\n }\n } catch (error) {\n const message = error instanceof Error ? 
error.message : String(error);\n console.error(`Error: ${message}`);\n process.exit(1);\n }\n });\n\nprogram\n .command(\"login\")\n .description(\"Login to a site and save session\")\n .argument(\"<url>\", \"URL to login\")\n .option(\"--timeout <seconds>\", \"Login timeout in seconds\", parseInt)\n .action(async (url: string, options: Record<string, unknown>) => {\n const { chromium } = await import(\"playwright\");\n const sessionsDir = getSessionDir();\n ensureSessionDir(sessionsDir);\n\n const timeoutSec = (options.timeout as number | undefined) ?? 300;\n const browser = await chromium.launch({ headless: false });\n\n try {\n const context = await browser.newContext();\n const page = await context.newPage();\n\n await page.goto(url, { waitUntil: \"networkidle\", timeout: timeoutSec * 1000 });\n\n console.error(\"Browser opened. Log in manually, then press Enter here to save session.\");\n\n await new Promise<void>((resolve) => {\n process.stdin.once(\"data\", () => {\n resolve();\n });\n });\n\n const sessionPath = getSessionPath(url, sessionsDir);\n await context.storageState({ path: sessionPath });\n console.error(`Session saved: ${sessionPath}`);\n } catch (error) {\n const message = error instanceof Error ? 
error.message : String(error);\n console.error(`Error: ${message}`);\n process.exit(1);\n } finally {\n await browser.close();\n }\n });\n\nprogram.parse();\n","import { readFileSync } from \"node:fs\";\nimport { homedir } from \"node:os\";\nimport { resolve } from \"node:path\";\nimport yaml from \"js-yaml\";\nimport type { ResolvedConfig, WaitUntilOption } from \"./types.js\";\n\nconst DEFAULT_TIMEOUT = 30;\nconst DEFAULT_WAIT_UNTIL: WaitUntilOption = \"networkidle\";\nconst REQUIRED_TAG = \"clippings\";\n\ninterface FileConfig {\n dest?: string;\n tags?: string[];\n timeout?: number;\n waitUntil?: WaitUntilOption;\n}\n\ninterface CliOptions {\n dest?: string;\n tags?: string[];\n timeout?: number;\n waitUntil?: WaitUntilOption;\n headed?: boolean;\n selector?: string;\n noSession?: boolean;\n dryRun?: boolean;\n}\n\nfunction expandTilde(filePath: string): string {\n if (filePath.startsWith(\"~/\")) {\n return resolve(homedir(), filePath.slice(2));\n }\n return filePath;\n}\n\nfunction loadConfigFile(configPath: string): FileConfig {\n const content = readFileSync(configPath, \"utf-8\");\n const parsed = yaml.load(content);\n if (parsed === null || typeof parsed !== \"object\") {\n throw new Error(`Invalid config file: ${configPath}`);\n }\n return parsed as FileConfig;\n}\n\nexport function resolveConfig(\n cliOptions: CliOptions,\n configPath: string | undefined,\n): ResolvedConfig {\n // Layer 1: Config file\n let fileConfig: FileConfig = {};\n if (configPath) {\n fileConfig = loadConfigFile(configPath);\n }\n\n // Layer 2: Environment variables\n const envDest = process.env.VAULT_FETCH_DEST;\n const envTimeout = process.env.VAULT_FETCH_TIMEOUT;\n\n // Resolve each field: CLI > env > file > default\n const dest = cliOptions.dest ?? envDest ?? fileConfig.dest;\n if (dest === undefined) {\n throw new Error(\n \"dest is required. 
Set via --dest, VAULT_FETCH_DEST, or config file.\",\n );\n }\n\n let timeout: number;\n if (cliOptions.timeout !== undefined) {\n timeout = cliOptions.timeout;\n } else if (envTimeout !== undefined) {\n const parsed = Number(envTimeout);\n if (Number.isNaN(parsed)) {\n throw new Error(`Invalid VAULT_FETCH_TIMEOUT value: ${envTimeout}`);\n }\n timeout = parsed;\n } else {\n timeout = fileConfig.timeout ?? DEFAULT_TIMEOUT;\n }\n\n const waitUntil =\n cliOptions.waitUntil ?? fileConfig.waitUntil ?? DEFAULT_WAIT_UNTIL;\n\n // Merge tags: file tags + CLI tags + always clippings\n const allTags = [\n ...(fileConfig.tags ?? []),\n ...(cliOptions.tags ?? []),\n REQUIRED_TAG,\n ];\n const tags = [...new Set(allTags)];\n\n return {\n dest: expandTilde(dest),\n tags,\n timeout,\n waitUntil,\n headed: cliOptions.headed ?? false,\n selector: cliOptions.selector ?? null,\n noSession: cliOptions.noSession ?? false,\n dryRun: cliOptions.dryRun ?? false,\n };\n}\n","import { chromium, type BrowserContext } from \"playwright\";\nimport type { FetchResult, ResolvedConfig } from \"./types.js\";\nimport { getSessionPath, sessionExists } from \"./session.js\";\n\nexport async function fetchPage(\n url: string,\n config: ResolvedConfig,\n sessionsDir: string,\n): Promise<FetchResult> {\n const browser = await chromium.launch({\n headless: !config.headed,\n });\n\n try {\n const contextOptions: Parameters<typeof browser.newContext>[0] = {};\n\n // Load session if available and not disabled\n if (!config.noSession && sessionExists(url, sessionsDir)) {\n const sessionPath = getSessionPath(url, sessionsDir);\n contextOptions.storageState = sessionPath;\n }\n\n const context: BrowserContext = await browser.newContext(contextOptions);\n const page = await context.newPage();\n\n const timeoutMs = config.timeout * 1000;\n const response = await page.goto(url, {\n waitUntil: config.waitUntil,\n timeout: timeoutMs,\n });\n\n if (!response) {\n throw new Error(`No response received from 
${url}`);\n }\n\n const status = response.status();\n if (status >= 400) {\n throw new Error(`HTTP ${status} received from ${response.url()}`);\n }\n\n const finalUrl = response.url();\n const fullHtml = await page.content();\n let html: string;\n\n if (config.selector) {\n const element = await page.$(config.selector);\n if (!element) {\n throw new Error(`Selector not found: ${config.selector}`);\n }\n html = await element.innerHTML();\n } else {\n html = fullHtml;\n }\n\n await context.close();\n\n return { html, fullHtml, url, finalUrl };\n } finally {\n await browser.close();\n }\n}\n","import { existsSync, mkdirSync } from \"node:fs\";\nimport { join } from \"node:path\";\nimport { homedir } from \"node:os\";\n\nconst CONFIG_DIR = join(homedir(), \".config\", \"vault-fetch\");\nconst SESSIONS_DIR = join(CONFIG_DIR, \"sessions\");\n\nexport function getSessionDir(): string {\n return SESSIONS_DIR;\n}\n\nfunction extractDomain(url: string): string {\n const parsed = new URL(url);\n return parsed.hostname ?? \"\";\n}\n\nexport function getSessionPath(url: string, sessionsDir: string): string {\n const domain = extractDomain(url);\n return join(sessionsDir, `${domain}.json`);\n}\n\nexport function sessionExists(url: string, sessionsDir: string): boolean {\n const sessionPath = getSessionPath(url, sessionsDir);\n return existsSync(sessionPath);\n}\n\nexport function ensureSessionDir(sessionsDir: string): void {\n if (!existsSync(sessionsDir)) {\n mkdirSync(sessionsDir, { recursive: true });\n }\n}\n","import { Readability } from \"@mozilla/readability\";\nimport { JSDOM } from \"jsdom\";\nimport type { Metadata } from \"./types.js\";\n\nfunction getMetaContent(doc: Document, selector: string): string | null {\n const el = doc.querySelector(selector);\n return el?.getAttribute(\"content\") ?? 
null;\n}\n\nfunction formatAuthor(raw: string): string {\n return `[[${raw.trim()}]]`;\n}\n\nfunction extractPublishedDate(doc: Document): string | null {\n const published =\n getMetaContent(doc, 'meta[property=\"article:published_time\"]') ??\n getMetaContent(doc, 'meta[name=\"datePublished\"]');\n\n if (!published) {\n const jsonLd = doc.querySelector('script[type=\"application/ld+json\"]');\n if (jsonLd?.textContent) {\n try {\n const data = JSON.parse(jsonLd.textContent) as Record<string, unknown>;\n if (typeof data.datePublished === \"string\") {\n return data.datePublished.split(\"T\")[0];\n }\n } catch {\n // JSON-LD parse failed\n }\n }\n return null;\n }\n\n return published.split(\"T\")[0];\n}\n\nfunction extractAuthors(\n doc: Document,\n readabilityByline: string | null,\n): string[] {\n const articleAuthors = doc.querySelectorAll('meta[property=\"article:author\"]');\n if (articleAuthors.length > 0) {\n return Array.from(articleAuthors)\n .map((el) => el.getAttribute(\"content\"))\n .filter((v): v is string => v !== null)\n .map(formatAuthor);\n }\n\n const ogAuthor = getMetaContent(doc, 'meta[property=\"og:author\"]');\n if (ogAuthor) {\n return [formatAuthor(ogAuthor)];\n }\n\n if (readabilityByline) {\n return [formatAuthor(readabilityByline)];\n }\n\n return [];\n}\n\nexport interface ExtractResult {\n metadata: Metadata;\n content: string;\n}\n\nexport function extract(html: string, finalUrl: string): ExtractResult {\n // One JSDOM for metadata DOM queries\n const metaDom = new JSDOM(html, { url: finalUrl });\n const doc = metaDom.window.document;\n\n // One JSDOM for Readability (which mutates the DOM)\n const readabilityDom = new JSDOM(html, { url: finalUrl });\n const reader = new Readability(readabilityDom.window.document);\n const article = reader.parse();\n\n if (!article) {\n throw new Error(\"Readability failed to extract content from the page\");\n }\n\n if (!article.content) {\n throw new Error(\"Readability returned empty content for 
the page\");\n }\n\n const title = article.title ?? doc.title;\n const authors = extractAuthors(doc, article.byline ?? null);\n const published = extractPublishedDate(doc);\n\n const description =\n getMetaContent(doc, 'meta[property=\"og:description\"]') ??\n getMetaContent(doc, 'meta[name=\"description\"]') ??\n (article.excerpt ?? null);\n\n const today = new Date().toISOString().split(\"T\")[0];\n\n return {\n metadata: {\n title,\n source: finalUrl,\n author: authors,\n published,\n created: today,\n description,\n },\n content: article.content,\n };\n}\n\nexport function extractMetadata(html: string, finalUrl: string): Metadata {\n const metaDom = new JSDOM(html, { url: finalUrl });\n const doc = metaDom.window.document;\n\n const readabilityDom = new JSDOM(html, { url: finalUrl });\n const reader = new Readability(readabilityDom.window.document);\n const article = reader.parse();\n\n const title = article?.title ?? doc.title;\n const authors = extractAuthors(doc, article?.byline ?? null);\n const published = extractPublishedDate(doc);\n\n const description =\n getMetaContent(doc, 'meta[property=\"og:description\"]') ??\n getMetaContent(doc, 'meta[name=\"description\"]') ??\n (article?.excerpt ?? 
null);\n\n const today = new Date().toISOString().split(\"T\")[0];\n\n return {\n title,\n source: finalUrl,\n author: authors,\n published,\n created: today,\n description,\n };\n}\n","import TurndownService from \"turndown\";\n\nexport function convertToMarkdown(html: string): string {\n const turndown = new TurndownService({\n headingStyle: \"atx\",\n codeBlockStyle: \"fenced\",\n bulletListMarker: \"-\",\n });\n\n return turndown.turndown(html);\n}\n","import { writeFileSync } from \"node:fs\";\nimport { join } from \"node:path\";\nimport yaml from \"js-yaml\";\nimport type { Metadata } from \"./types.js\";\n\nconst UNSAFE_CHARS = /[/\\\\:*?\"<>|]/g;\nconst MAX_FILENAME_LENGTH = 200;\n\nexport function sanitizeFilename(title: string): string {\n const sanitized = title\n .replace(UNSAFE_CHARS, \"\")\n .replace(/\\s+/g, \" \")\n .trim();\n const base = sanitized.slice(0, MAX_FILENAME_LENGTH) || \"Untitled\";\n return `${base}.md`;\n}\n\nexport function buildFrontmatter(metadata: Metadata, tags: string[]): string {\n const data: Record<string, unknown> = {\n title: metadata.title,\n source: metadata.source,\n };\n\n if (metadata.author.length > 0) {\n data.author = metadata.author;\n }\n\n if (metadata.published) {\n data.published = metadata.published;\n }\n\n data.created = metadata.created;\n\n if (metadata.description) {\n data.description = metadata.description;\n }\n\n data.tags = tags;\n\n const yamlStr = yaml.dump(data, {\n quotingType: '\"',\n forceQuotes: false,\n lineWidth: -1,\n sortKeys: false,\n });\n\n return `---\\n${yamlStr}---`;\n}\n\nexport function writeMarkdownFile(\n dest: string,\n metadata: Metadata,\n markdownContent: string,\n tags: string[],\n): string {\n const filename = sanitizeFilename(metadata.title);\n const filePath = join(dest, filename);\n const frontmatter = buildFrontmatter(metadata, tags);\n const fullContent = `${frontmatter}\\n\\n${markdownContent}\\n`;\n\n writeFileSync(filePath, fullContent, \"utf-8\");\n\n return 
filePath;\n}\n"],"mappings":";;;AAAA,SAAS,eAAe;AACxB,SAAS,cAAAA,mBAAkB;AAC3B,SAAS,WAAAC,gBAAe;AACxB,SAAS,QAAAC,aAAY;;;ACHrB,SAAS,oBAAoB;AAC7B,SAAS,eAAe;AACxB,SAAS,eAAe;AACxB,OAAO,UAAU;AAGjB,IAAM,kBAAkB;AACxB,IAAM,qBAAsC;AAC5C,IAAM,eAAe;AAoBrB,SAAS,YAAY,UAA0B;AAC7C,MAAI,SAAS,WAAW,IAAI,GAAG;AAC7B,WAAO,QAAQ,QAAQ,GAAG,SAAS,MAAM,CAAC,CAAC;AAAA,EAC7C;AACA,SAAO;AACT;AAEA,SAAS,eAAe,YAAgC;AACtD,QAAM,UAAU,aAAa,YAAY,OAAO;AAChD,QAAM,SAAS,KAAK,KAAK,OAAO;AAChC,MAAI,WAAW,QAAQ,OAAO,WAAW,UAAU;AACjD,UAAM,IAAI,MAAM,wBAAwB,UAAU,EAAE;AAAA,EACtD;AACA,SAAO;AACT;AAEO,SAAS,cACd,YACA,YACgB;AAEhB,MAAI,aAAyB,CAAC;AAC9B,MAAI,YAAY;AACd,iBAAa,eAAe,UAAU;AAAA,EACxC;AAGA,QAAM,UAAU,QAAQ,IAAI;AAC5B,QAAM,aAAa,QAAQ,IAAI;AAG/B,QAAM,OAAO,WAAW,QAAQ,WAAW,WAAW;AACtD,MAAI,SAAS,QAAW;AACtB,UAAM,IAAI;AAAA,MACR;AAAA,IACF;AAAA,EACF;AAEA,MAAI;AACJ,MAAI,WAAW,YAAY,QAAW;AACpC,cAAU,WAAW;AAAA,EACvB,WAAW,eAAe,QAAW;AACnC,UAAM,SAAS,OAAO,UAAU;AAChC,QAAI,OAAO,MAAM,MAAM,GAAG;AACxB,YAAM,IAAI,MAAM,sCAAsC,UAAU,EAAE;AAAA,IACpE;AACA,cAAU;AAAA,EACZ,OAAO;AACL,cAAU,WAAW,WAAW;AAAA,EAClC;AAEA,QAAM,YACJ,WAAW,aAAa,WAAW,aAAa;AAGlD,QAAM,UAAU;AAAA,IACd,GAAI,WAAW,QAAQ,CAAC;AAAA,IACxB,GAAI,WAAW,QAAQ,CAAC;AAAA,IACxB;AAAA,EACF;AACA,QAAM,OAAO,CAAC,GAAG,IAAI,IAAI,OAAO,CAAC;AAEjC,SAAO;AAAA,IACL,MAAM,YAAY,IAAI;AAAA,IACtB;AAAA,IACA;AAAA,IACA;AAAA,IACA,QAAQ,WAAW,UAAU;AAAA,IAC7B,UAAU,WAAW,YAAY;AAAA,IACjC,WAAW,WAAW,aAAa;AAAA,IACnC,QAAQ,WAAW,UAAU;AAAA,EAC/B;AACF;;;ACpGA,SAAS,gBAAqC;;;ACA9C,SAAS,YAAY,iBAAiB;AACtC,SAAS,YAAY;AACrB,SAAS,WAAAC,gBAAe;AAExB,IAAM,aAAa,KAAKA,SAAQ,GAAG,WAAW,aAAa;AAC3D,IAAM,eAAe,KAAK,YAAY,UAAU;AAEzC,SAAS,gBAAwB;AACtC,SAAO;AACT;AAEA,SAAS,cAAc,KAAqB;AAC1C,QAAM,SAAS,IAAI,IAAI,GAAG;AAC1B,SAAO,OAAO,YAAY;AAC5B;AAEO,SAAS,eAAe,KAAa,aAA6B;AACvE,QAAM,SAAS,cAAc,GAAG;AAChC,SAAO,KAAK,aAAa,GAAG,MAAM,OAAO;AAC3C;AAEO,SAAS,cAAc,KAAa,aAA8B;AACvE,QAAM,cAAc,eAAe,KAAK,WAAW;AACnD,SAAO,WAAW,WAAW;AAC/B;AAEO,SAAS,iBAAiB,aAA2B;AAC1D,MAAI,CAAC,WAAW,WAAW,GAAG;AAC5B,cAAU,aAAa,EAAE,WAAW,KAAK,CAAC;AAAA,EAC5C;AACF;;;AD1BA,eAAsB,UACpB,KACA,QACA,aACsB;AACtB,QAAM,U
AAU,MAAM,SAAS,OAAO;AAAA,IACpC,UAAU,CAAC,OAAO;AAAA,EACpB,CAAC;AAED,MAAI;AACF,UAAM,iBAA2D,CAAC;AAGlE,QAAI,CAAC,OAAO,aAAa,cAAc,KAAK,WAAW,GAAG;AACxD,YAAM,cAAc,eAAe,KAAK,WAAW;AACnD,qBAAe,eAAe;AAAA,IAChC;AAEA,UAAM,UAA0B,MAAM,QAAQ,WAAW,cAAc;AACvE,UAAM,OAAO,MAAM,QAAQ,QAAQ;AAEnC,UAAM,YAAY,OAAO,UAAU;AACnC,UAAM,WAAW,MAAM,KAAK,KAAK,KAAK;AAAA,MACpC,WAAW,OAAO;AAAA,MAClB,SAAS;AAAA,IACX,CAAC;AAED,QAAI,CAAC,UAAU;AACb,YAAM,IAAI,MAAM,6BAA6B,GAAG,EAAE;AAAA,IACpD;AAEA,UAAM,SAAS,SAAS,OAAO;AAC/B,QAAI,UAAU,KAAK;AACjB,YAAM,IAAI,MAAM,QAAQ,MAAM,kBAAkB,SAAS,IAAI,CAAC,EAAE;AAAA,IAClE;AAEA,UAAM,WAAW,SAAS,IAAI;AAC9B,UAAM,WAAW,MAAM,KAAK,QAAQ;AACpC,QAAI;AAEJ,QAAI,OAAO,UAAU;AACnB,YAAM,UAAU,MAAM,KAAK,EAAE,OAAO,QAAQ;AAC5C,UAAI,CAAC,SAAS;AACZ,cAAM,IAAI,MAAM,uBAAuB,OAAO,QAAQ,EAAE;AAAA,MAC1D;AACA,aAAO,MAAM,QAAQ,UAAU;AAAA,IACjC,OAAO;AACL,aAAO;AAAA,IACT;AAEA,UAAM,QAAQ,MAAM;AAEpB,WAAO,EAAE,MAAM,UAAU,KAAK,SAAS;AAAA,EACzC,UAAE;AACA,UAAM,QAAQ,MAAM;AAAA,EACtB;AACF;;;AE5DA,SAAS,mBAAmB;AAC5B,SAAS,aAAa;AAGtB,SAAS,eAAe,KAAe,UAAiC;AACtE,QAAM,KAAK,IAAI,cAAc,QAAQ;AACrC,SAAO,IAAI,aAAa,SAAS,KAAK;AACxC;AAEA,SAAS,aAAa,KAAqB;AACzC,SAAO,KAAK,IAAI,KAAK,CAAC;AACxB;AAEA,SAAS,qBAAqB,KAA8B;AAC1D,QAAM,YACJ,eAAe,KAAK,yCAAyC,KAC7D,eAAe,KAAK,4BAA4B;AAElD,MAAI,CAAC,WAAW;AACd,UAAM,SAAS,IAAI,cAAc,oCAAoC;AACrE,QAAI,QAAQ,aAAa;AACvB,UAAI;AACF,cAAM,OAAO,KAAK,MAAM,OAAO,WAAW;AAC1C,YAAI,OAAO,KAAK,kBAAkB,UAAU;AAC1C,iBAAO,KAAK,cAAc,MAAM,GAAG,EAAE,CAAC;AAAA,QACxC;AAAA,MACF,QAAQ;AAAA,MAER;AAAA,IACF;AACA,WAAO;AAAA,EACT;AAEA,SAAO,UAAU,MAAM,GAAG,EAAE,CAAC;AAC/B;AAEA,SAAS,eACP,KACA,mBACU;AACV,QAAM,iBAAiB,IAAI,iBAAiB,iCAAiC;AAC7E,MAAI,eAAe,SAAS,GAAG;AAC7B,WAAO,MAAM,KAAK,cAAc,EAC7B,IAAI,CAAC,OAAO,GAAG,aAAa,SAAS,CAAC,EACtC,OAAO,CAAC,MAAmB,MAAM,IAAI,EACrC,IAAI,YAAY;AAAA,EACrB;AAEA,QAAM,WAAW,eAAe,KAAK,4BAA4B;AACjE,MAAI,UAAU;AACZ,WAAO,CAAC,aAAa,QAAQ,CAAC;AAAA,EAChC;AAEA,MAAI,mBAAmB;AACrB,WAAO,CAAC,aAAa,iBAAiB,CAAC;AAAA,EACzC;AAEA,SAAO,CAAC;AACV;AAOO,SAAS,QAAQ,MAAc,UAAiC;AAErE,QAAM,UAAU,IAAI,MAAM,MAAM,EAAE,KAAK,SAAS,CAAC;AACjD,QAAM,MAAM,QAAQ,OAAO;AAG3B,QAAM
,iBAAiB,IAAI,MAAM,MAAM,EAAE,KAAK,SAAS,CAAC;AACxD,QAAM,SAAS,IAAI,YAAY,eAAe,OAAO,QAAQ;AAC7D,QAAM,UAAU,OAAO,MAAM;AAE7B,MAAI,CAAC,SAAS;AACZ,UAAM,IAAI,MAAM,qDAAqD;AAAA,EACvE;AAEA,MAAI,CAAC,QAAQ,SAAS;AACpB,UAAM,IAAI,MAAM,iDAAiD;AAAA,EACnE;AAEA,QAAM,QAAQ,QAAQ,SAAS,IAAI;AACnC,QAAM,UAAU,eAAe,KAAK,QAAQ,UAAU,IAAI;AAC1D,QAAM,YAAY,qBAAqB,GAAG;AAE1C,QAAM,cACJ,eAAe,KAAK,iCAAiC,KACrD,eAAe,KAAK,0BAA0B,MAC7C,QAAQ,WAAW;AAEtB,QAAM,SAAQ,oBAAI,KAAK,GAAE,YAAY,EAAE,MAAM,GAAG,EAAE,CAAC;AAEnD,SAAO;AAAA,IACL,UAAU;AAAA,MACR;AAAA,MACA,QAAQ;AAAA,MACR,QAAQ;AAAA,MACR;AAAA,MACA,SAAS;AAAA,MACT;AAAA,IACF;AAAA,IACA,SAAS,QAAQ;AAAA,EACnB;AACF;AAEO,SAAS,gBAAgB,MAAc,UAA4B;AACxE,QAAM,UAAU,IAAI,MAAM,MAAM,EAAE,KAAK,SAAS,CAAC;AACjD,QAAM,MAAM,QAAQ,OAAO;AAE3B,QAAM,iBAAiB,IAAI,MAAM,MAAM,EAAE,KAAK,SAAS,CAAC;AACxD,QAAM,SAAS,IAAI,YAAY,eAAe,OAAO,QAAQ;AAC7D,QAAM,UAAU,OAAO,MAAM;AAE7B,QAAM,QAAQ,SAAS,SAAS,IAAI;AACpC,QAAM,UAAU,eAAe,KAAK,SAAS,UAAU,IAAI;AAC3D,QAAM,YAAY,qBAAqB,GAAG;AAE1C,QAAM,cACJ,eAAe,KAAK,iCAAiC,KACrD,eAAe,KAAK,0BAA0B,MAC7C,SAAS,WAAW;AAEvB,QAAM,SAAQ,oBAAI,KAAK,GAAE,YAAY,EAAE,MAAM,GAAG,EAAE,CAAC;AAEnD,SAAO;AAAA,IACL;AAAA,IACA,QAAQ;AAAA,IACR,QAAQ;AAAA,IACR;AAAA,IACA,SAAS;AAAA,IACT;AAAA,EACF;AACF;;;ACtIA,OAAO,qBAAqB;AAErB,SAAS,kBAAkB,MAAsB;AACtD,QAAM,WAAW,IAAI,gBAAgB;AAAA,IACnC,cAAc;AAAA,IACd,gBAAgB;AAAA,IAChB,kBAAkB;AAAA,EACpB,CAAC;AAED,SAAO,SAAS,SAAS,IAAI;AAC/B;;;ACVA,SAAS,qBAAqB;AAC9B,SAAS,QAAAC,aAAY;AACrB,OAAOC,WAAU;AAGjB,IAAM,eAAe;AACrB,IAAM,sBAAsB;AAErB,SAAS,iBAAiB,OAAuB;AACtD,QAAM,YAAY,MACf,QAAQ,cAAc,EAAE,EACxB,QAAQ,QAAQ,GAAG,EACnB,KAAK;AACR,QAAM,OAAO,UAAU,MAAM,GAAG,mBAAmB,KAAK;AACxD,SAAO,GAAG,IAAI;AAChB;AAEO,SAAS,iBAAiB,UAAoB,MAAwB;AAC3E,QAAM,OAAgC;AAAA,IACpC,OAAO,SAAS;AAAA,IAChB,QAAQ,SAAS;AAAA,EACnB;AAEA,MAAI,SAAS,OAAO,SAAS,GAAG;AAC9B,SAAK,SAAS,SAAS;AAAA,EACzB;AAEA,MAAI,SAAS,WAAW;AACtB,SAAK,YAAY,SAAS;AAAA,EAC5B;AAEA,OAAK,UAAU,SAAS;AAExB,MAAI,SAAS,aAAa;AACxB,SAAK,cAAc,SAAS;AAAA,EAC9B;AAEA,OAAK,OAAO;AAEZ,QAAM,UAAUA,MAAK,KAAK,MAAM;AAAA,IAC9B,aAAa;AAAA,IACb,aAAa;AAAA,IACb,WAAW;AAAA,IACX,UAAU;AAAA,EACZ,C
AAC;AAED,SAAO;AAAA,EAAQ,OAAO;AACxB;AAEO,SAAS,kBACd,MACA,UACA,iBACA,MACQ;AACR,QAAM,WAAW,iBAAiB,SAAS,KAAK;AAChD,QAAM,WAAWD,MAAK,MAAM,QAAQ;AACpC,QAAM,cAAc,iBAAiB,UAAU,IAAI;AACnD,QAAM,cAAc,GAAG,WAAW;AAAA;AAAA,EAAO,eAAe;AAAA;AAExD,gBAAc,UAAU,aAAa,OAAO;AAE5C,SAAO;AACT;;;AN/CA,IAAM,cAAcE,MAAKC,SAAQ,GAAG,WAAW,eAAe,aAAa;AAE3E,IAAM,UAAU,IAAI,QAAQ;AAE5B,QACG,KAAK,aAAa,EAClB;AAAA,EACC;AACF,EACC,QAAQ,OAAO;AAElB,QACG,QAAQ,OAAO,EACf,YAAY,mCAAmC,EAC/C,SAAS,SAAS,cAAc,EAChC,OAAO,iBAAiB,uBAAuB,EAC/C,OAAO,YAAY,4BAA4B,EAC/C,OAAO,oBAAoB,yBAAyB,EACpD,OAAO,uBAAuB,sBAAsB,QAAQ,EAC5D,OAAO,gBAAgB,wBAAwB,CAAC,KAAa,QAAkB;AAC9E,MAAI,KAAK,GAAG;AACZ,SAAO;AACT,GAAG,CAAC,CAAa,EAChB;AAAA,EACC;AAAA,EACA;AACF,EACC,OAAO,kBAAkB,0BAA0B,EACnD,OAAO,aAAa,oCAAoC,EACxD,OAAO,OAAO,KAAa,YAAqC;AAC/D,MAAI;AACF,UAAM,aAAaC,YAAW,WAAW,IAAI,cAAc;AAC3D,UAAM,SAAS;AAAA,MACb;AAAA,QACE,MAAM,QAAQ;AAAA,QACd,MAAM,QAAQ;AAAA,QACd,SAAS,QAAQ;AAAA,QACjB,WAAW,QAAQ;AAAA,QACnB,QAAQ,QAAQ;AAAA,QAChB,UAAU,QAAQ;AAAA,QAClB,WAAW,QAAQ;AAAA,QACnB,QAAQ,QAAQ;AAAA,MAClB;AAAA,MACA;AAAA,IACF;AAGA,QAAI,CAAC,OAAO,UAAU,CAACA,YAAW,OAAO,IAAI,GAAG;AAC9C,YAAM,IAAI,MAAM,yCAAyC,OAAO,IAAI,EAAE;AAAA,IACxE;AAEA,UAAM,cAAc,cAAc;AAClC,UAAM,cAAc,MAAM,UAAU,KAAK,QAAQ,WAAW;AAE5D,QAAI;AACJ,QAAI;AAEJ,QAAI,OAAO,UAAU;AAEnB,oBAAc,YAAY;AAC1B,iBAAW,gBAAgB,YAAY,UAAU,YAAY,QAAQ;AAAA,IACvE,OAAO;AACL,YAAM,SAAS,QAAQ,YAAY,MAAM,YAAY,QAAQ;AAC7D,iBAAW,OAAO;AAClB,oBAAc,OAAO;AAAA,IACvB;AAEA,UAAM,WAAW,kBAAkB,WAAW;AAE9C,QAAI,OAAO,QAAQ;AACjB,YAAM,cAAc,iBAAiB,UAAU,OAAO,IAAI;AAC1D,cAAQ,OAAO,MAAM,GAAG,WAAW;AAAA;AAAA,EAAO,QAAQ;AAAA,CAAI;AAAA,IACxD,OAAO;AACL,YAAM,WAAW;AAAA,QACf,OAAO;AAAA,QACP;AAAA,QACA;AAAA,QACA,OAAO;AAAA,MACT;AACA,cAAQ,MAAM,UAAU,QAAQ,EAAE;AAAA,IACpC;AAAA,EACF,SAAS,OAAO;AACd,UAAM,UAAU,iBAAiB,QAAQ,MAAM,UAAU,OAAO,KAAK;AACrE,YAAQ,MAAM,UAAU,OAAO,EAAE;AACjC,YAAQ,KAAK,CAAC;AAAA,EAChB;AACF,CAAC;AAEH,QACG,QAAQ,OAAO,EACf,YAAY,kCAAkC,EAC9C,SAAS,SAAS,cAAc,EAChC,OAAO,uBAAuB,4BAA4B,QAAQ,EAClE,OAAO,OAAO,KAAa,YAAqC;AAC/D,QAAM,EAAE,UAAAC,UAAS,IAAI,MAAM,OAAO,YAAY;AAC9C,QAAM,cAAc,cAAc;AAClC,mBAA
iB,WAAW;AAE5B,QAAM,aAAc,QAAQ,WAAkC;AAC9D,QAAM,UAAU,MAAMA,UAAS,OAAO,EAAE,UAAU,MAAM,CAAC;AAEzD,MAAI;AACF,UAAM,UAAU,MAAM,QAAQ,WAAW;AACzC,UAAM,OAAO,MAAM,QAAQ,QAAQ;AAEnC,UAAM,KAAK,KAAK,KAAK,EAAE,WAAW,eAAe,SAAS,aAAa,IAAK,CAAC;AAE7E,YAAQ,MAAM,yEAAyE;AAEvF,UAAM,IAAI,QAAc,CAACC,aAAY;AACnC,cAAQ,MAAM,KAAK,QAAQ,MAAM;AAC/B,QAAAA,SAAQ;AAAA,MACV,CAAC;AAAA,IACH,CAAC;AAED,UAAM,cAAc,eAAe,KAAK,WAAW;AACnD,UAAM,QAAQ,aAAa,EAAE,MAAM,YAAY,CAAC;AAChD,YAAQ,MAAM,kBAAkB,WAAW,EAAE;AAAA,EAC/C,SAAS,OAAO;AACd,UAAM,UAAU,iBAAiB,QAAQ,MAAM,UAAU,OAAO,KAAK;AACrE,YAAQ,MAAM,UAAU,OAAO,EAAE;AACjC,YAAQ,KAAK,CAAC;AAAA,EAChB,UAAE;AACA,UAAM,QAAQ,MAAM;AAAA,EACtB;AACF,CAAC;AAEH,QAAQ,MAAM;","names":["existsSync","homedir","join","homedir","join","yaml","join","homedir","existsSync","chromium","resolve"]}
1
+ {"version":3,"sources":["../src/cli.ts","../src/config.ts","../src/fetcher.ts","../src/session.ts","../src/extractor.ts","../src/converter.ts","../src/writer.ts"],"sourcesContent":["import { Command } from \"commander\";\nimport { once } from \"node:events\";\nimport { existsSync } from \"node:fs\";\nimport { homedir } from \"node:os\";\nimport { join } from \"node:path\";\nimport { resolveConfig } from \"./config.js\";\nimport { fetchPage } from \"./fetcher.js\";\nimport { extract, extractMetadata } from \"./extractor.js\";\nimport { convertToMarkdown } from \"./converter.js\";\nimport { writeMarkdownFile, buildFrontmatter } from \"./writer.js\";\nimport {\n getSessionDir,\n getSessionPath,\n ensureSessionDir,\n} from \"./session.js\";\nimport type { Metadata, WaitUntilOption } from \"./types.js\";\n\nconst CONFIG_PATH = join(homedir(), \".config\", \"vault-fetch\", \"config.yaml\");\n\nconst program = new Command();\n\nprogram\n .name(\"vault-fetch\")\n .description(\n \"Fetch JS-rendered web pages and save as Markdown to Obsidian Vault\",\n )\n .version(\"0.3.0\");\n\nprogram\n .command(\"fetch\")\n .description(\"Fetch a page and save as Markdown\")\n .argument(\"<url>\", \"URL to fetch\")\n .option(\"--dest <path>\", \"Destination directory\")\n .option(\"--headed\", \"Run browser in headed mode\")\n .option(\"--selector <css>\", \"CSS selector to extract\")\n .option(\"--timeout <seconds>\", \"Timeout in seconds\", parseInt)\n .option(\"--tag <name>\", \"Add tag (repeatable)\", (val: string, acc: string[]) => {\n acc.push(val);\n return acc;\n }, [] as string[])\n .option(\n \"--wait-until <event>\",\n \"Wait condition: load, domcontentloaded, networkidle\",\n )\n .option(\"--skip-session\", \"Do not use saved session\")\n .option(\"--dry-run\", \"Output to stdout instead of saving\")\n .option(\"--no-block-images\", \"Do not block image requests\")\n .option(\"--no-block-fonts\", \"Do not block font requests\")\n .option(\"--no-block-media\", \"Do not 
block media requests\")\n .option(\"--raw\", \"Convert full page HTML without Readability extraction\")\n .action(async (url: string, options: Record<string, unknown>) => {\n try {\n const configPath = existsSync(CONFIG_PATH) ? CONFIG_PATH : undefined;\n const config = resolveConfig(\n {\n dest: options.dest as string | undefined,\n tags: options.tag as string[] | undefined,\n timeout: options.timeout as number | undefined,\n waitUntil: options.waitUntil as WaitUntilOption | undefined,\n headed: options.headed as boolean | undefined,\n selector: options.selector as string | undefined,\n noSession: options.skipSession as boolean | undefined,\n dryRun: options.dryRun as boolean | undefined,\n blockImages: options.blockImages as boolean | undefined,\n blockFonts: options.blockFonts as boolean | undefined,\n blockMedia: options.blockMedia as boolean | undefined,\n raw: options.raw as boolean | undefined,\n },\n configPath,\n );\n\n if (config.raw && config.selector) {\n throw new Error(\"--raw and --selector cannot be used together.\");\n }\n\n // Validate dest directory exists\n if (!config.dryRun && !existsSync(config.dest)) {\n throw new Error(`Destination directory does not exist: ${config.dest}`);\n }\n\n const sessionsDir = getSessionDir();\n const fetchResult = await fetchPage(url, config, sessionsDir);\n\n let markdown: string;\n let metadata: Metadata;\n\n if (fetchResult.kind === \"pdf\") {\n if (config.selector) {\n throw new Error(\"--selector cannot be used with PDF URLs.\");\n }\n if (config.raw) {\n throw new Error(\"--raw cannot be used with PDF URLs.\");\n }\n const { convertPdfToMarkdown } = await import(\"./pdf-converter.js\");\n const pdfResult = await convertPdfToMarkdown(\n fetchResult.pdfBuffer,\n fetchResult.finalUrl,\n );\n markdown = pdfResult.markdown;\n metadata = pdfResult.metadata;\n } else if (config.selector) {\n // --selector mode: skip Readability, extract metadata from full page\n metadata = extractMetadata(fetchResult.fullHtml, 
fetchResult.finalUrl);\n markdown = convertToMarkdown(fetchResult.html);\n } else if (config.raw) {\n // --raw mode: skip Readability, convert full page HTML directly\n metadata = extractMetadata(fetchResult.fullHtml, fetchResult.finalUrl);\n markdown = convertToMarkdown(fetchResult.fullHtml);\n } else {\n const result = extract(fetchResult.html, fetchResult.finalUrl);\n metadata = result.metadata;\n markdown = convertToMarkdown(result.content);\n }\n\n if (config.dryRun) {\n const frontmatter = buildFrontmatter(metadata, config.tags);\n process.stdout.write(`${frontmatter}\\n\\n${markdown}\\n`);\n } else {\n const filePath = writeMarkdownFile(\n config.dest,\n metadata,\n markdown,\n config.tags,\n );\n console.error(`Saved: ${filePath}`);\n }\n } catch (error) {\n const message = error instanceof Error ? error.message : String(error);\n console.error(`Error: ${message}`);\n process.exit(1);\n }\n });\n\nprogram\n .command(\"login\")\n .description(\"Login to a site and save session\")\n .argument(\"<url>\", \"URL to login\")\n .option(\"--timeout <seconds>\", \"Login timeout in seconds\", parseInt)\n .action(async (url: string, options: Record<string, unknown>) => {\n const { chromium } = await import(\"playwright\");\n const sessionsDir = getSessionDir();\n ensureSessionDir(sessionsDir);\n\n const timeoutSec = (options.timeout as number | undefined) ?? 300;\n const browser = await chromium.launch({ headless: false });\n\n try {\n const context = await browser.newContext();\n const page = await context.newPage();\n\n await page.goto(url, { waitUntil: \"networkidle\", timeout: timeoutSec * 1000 });\n\n console.error(\"Browser opened. 
Log in manually, then press Enter here to save session.\");\n\n process.stdin.resume();\n await once(process.stdin, \"data\");\n process.stdin.pause();\n process.stdin.unref();\n\n const sessionPath = getSessionPath(url, sessionsDir);\n await context.storageState({ path: sessionPath });\n console.error(`Session saved: ${sessionPath}`);\n } catch (error) {\n const message = error instanceof Error ? error.message : String(error);\n console.error(`Error: ${message}`);\n process.exit(1);\n } finally {\n await browser.close();\n }\n });\n\nprogram.parse();\n","import { readFileSync } from \"node:fs\";\nimport { homedir } from \"node:os\";\nimport { resolve } from \"node:path\";\nimport yaml from \"js-yaml\";\nimport type { ResolvedConfig, WaitUntilOption } from \"./types.js\";\n\nconst DEFAULT_TIMEOUT = 30;\nconst DEFAULT_WAIT_UNTIL: WaitUntilOption = \"networkidle\";\nconst REQUIRED_TAG = \"clippings\";\n\ninterface FileConfig {\n dest?: string;\n tags?: string[];\n timeout?: number;\n waitUntil?: WaitUntilOption;\n}\n\ninterface CliOptions {\n dest?: string;\n tags?: string[];\n timeout?: number;\n waitUntil?: WaitUntilOption;\n headed?: boolean;\n selector?: string;\n noSession?: boolean;\n dryRun?: boolean;\n blockImages?: boolean;\n blockFonts?: boolean;\n blockMedia?: boolean;\n raw?: boolean;\n}\n\nfunction expandTilde(filePath: string): string {\n if (filePath.startsWith(\"~/\")) {\n return resolve(homedir(), filePath.slice(2));\n }\n return filePath;\n}\n\nconst VALID_WAIT_UNTIL: readonly string[] = [\"load\", \"domcontentloaded\", \"networkidle\"];\n\nfunction validateWaitUntil(value: string): WaitUntilOption {\n if (!VALID_WAIT_UNTIL.includes(value)) {\n throw new Error(\n `Invalid waitUntil value: \"${value}\". 
Must be one of: ${VALID_WAIT_UNTIL.join(\", \")}`,\n );\n }\n return value as WaitUntilOption;\n}\n\nfunction loadConfigFile(configPath: string): FileConfig {\n const content = readFileSync(configPath, \"utf-8\");\n const parsed = yaml.load(content);\n if (parsed === null || typeof parsed !== \"object\") {\n throw new Error(`Invalid config file: ${configPath}`);\n }\n const config = parsed as Record<string, unknown>;\n\n if (config.timeout !== undefined && typeof config.timeout !== \"number\") {\n throw new Error(`Invalid timeout in config file: expected number, got ${typeof config.timeout}`);\n }\n if (config.dest !== undefined && typeof config.dest !== \"string\") {\n throw new Error(`Invalid dest in config file: expected string, got ${typeof config.dest}`);\n }\n if (config.waitUntil !== undefined) {\n if (typeof config.waitUntil !== \"string\") {\n throw new Error(`Invalid waitUntil in config file: expected string, got ${typeof config.waitUntil}`);\n }\n validateWaitUntil(config.waitUntil);\n }\n if (config.tags !== undefined) {\n if (!Array.isArray(config.tags) || !config.tags.every((t: unknown) => typeof t === \"string\")) {\n throw new Error(\"Invalid tags in config file: expected array of strings\");\n }\n }\n\n return config as FileConfig;\n}\n\nexport function resolveConfig(\n cliOptions: CliOptions,\n configPath: string | undefined,\n): ResolvedConfig {\n // Layer 1: Config file\n let fileConfig: FileConfig = {};\n if (configPath) {\n fileConfig = loadConfigFile(configPath);\n }\n\n // Layer 2: Environment variables\n const envDest = process.env.VAULT_FETCH_DEST;\n const envTimeout = process.env.VAULT_FETCH_TIMEOUT;\n\n // Resolve each field: CLI > env > file > default\n const dest = cliOptions.dest ?? envDest ?? fileConfig.dest;\n if (dest === undefined) {\n throw new Error(\n \"dest is required. 
Set via --dest, VAULT_FETCH_DEST, or config file.\",\n );\n }\n\n let timeout: number;\n if (cliOptions.timeout !== undefined) {\n timeout = cliOptions.timeout;\n } else if (envTimeout !== undefined) {\n const parsed = Number(envTimeout);\n if (Number.isNaN(parsed)) {\n throw new Error(`Invalid VAULT_FETCH_TIMEOUT value: ${envTimeout}`);\n }\n timeout = parsed;\n } else {\n timeout = fileConfig.timeout ?? DEFAULT_TIMEOUT;\n }\n\n const rawWaitUntil = cliOptions.waitUntil ?? fileConfig.waitUntil ?? DEFAULT_WAIT_UNTIL;\n const waitUntil = validateWaitUntil(rawWaitUntil);\n\n // Merge tags: file tags + CLI tags + always clippings\n const allTags = [\n ...(fileConfig.tags ?? []),\n ...(cliOptions.tags ?? []),\n REQUIRED_TAG,\n ];\n const tags = [...new Set(allTags)];\n\n return {\n dest: expandTilde(dest),\n tags,\n timeout,\n waitUntil,\n headed: cliOptions.headed ?? false,\n selector: cliOptions.selector ?? null,\n noSession: cliOptions.noSession ?? false,\n dryRun: cliOptions.dryRun ?? false,\n blockImages: cliOptions.blockImages ?? true,\n blockFonts: cliOptions.blockFonts ?? true,\n blockMedia: cliOptions.blockMedia ?? true,\n raw: cliOptions.raw ?? 
false,\n };\n}\n","import { chromium, type BrowserContext } from \"playwright\";\nimport type { FetchResult, ResolvedConfig } from \"./types.js\";\nimport { getSessionPath, sessionExists } from \"./session.js\";\n\nexport const CHROME_USER_AGENT =\n \"Mozilla/5.0 (Windows NT 10.0; Win64; x64) \" +\n \"AppleWebKit/537.36 (KHTML, like Gecko) \" +\n \"Chrome/134.0.0.0 Safari/537.36\";\n\ninterface BlockingOptions {\n blockImages: boolean;\n blockFonts: boolean;\n blockMedia: boolean;\n}\n\nexport function isPdfContentType(contentType: string): boolean {\n return contentType.toLowerCase().includes(\"application/pdf\");\n}\n\nexport function buildBlockedResourceTypes(options: BlockingOptions): Set<string> {\n const blocked = new Set<string>();\n if (options.blockImages) blocked.add(\"image\");\n if (options.blockFonts) blocked.add(\"font\");\n if (options.blockMedia) blocked.add(\"media\");\n return blocked;\n}\n\nconst PDF_MAGIC_BYTES = \"%PDF\";\n\nexport function validatePdfBuffer(pdfBuffer: Buffer, sourceUrl: string): void {\n if (pdfBuffer.length === 0) {\n throw new Error(`Empty PDF response received from ${sourceUrl}`);\n }\n const header = pdfBuffer.subarray(0, PDF_MAGIC_BYTES.length).toString(\"ascii\");\n if (!header.startsWith(PDF_MAGIC_BYTES)) {\n throw new Error(\n `Response Content-Type is application/pdf but body is not valid PDF data from ${sourceUrl}`,\n );\n }\n}\n\nasync function downloadPdf(\n context: BrowserContext,\n url: string,\n timeoutMs: number,\n): Promise<{ pdfBuffer: Buffer; finalUrl: string }> {\n const apiResponse = await context.request.get(url, { timeout: timeoutMs });\n const status = apiResponse.status();\n if (status >= 400) {\n throw new Error(`HTTP ${status} received when downloading PDF from ${url}`);\n }\n const pdfBuffer = Buffer.from(await apiResponse.body());\n const finalUrl = apiResponse.url();\n validatePdfBuffer(pdfBuffer, finalUrl);\n return { pdfBuffer, finalUrl };\n}\n\nexport async function fetchPage(\n url: string,\n 
config: ResolvedConfig,\n sessionsDir: string,\n): Promise<FetchResult> {\n const browser = await chromium.launch({\n headless: !config.headed,\n });\n\n try {\n const contextOptions: Parameters<typeof browser.newContext>[0] = {\n userAgent: CHROME_USER_AGENT,\n };\n\n // Load session if available and not disabled\n if (!config.noSession && sessionExists(url, sessionsDir)) {\n const sessionPath = getSessionPath(url, sessionsDir);\n contextOptions.storageState = sessionPath;\n }\n\n const context: BrowserContext = await browser.newContext(contextOptions);\n const page = await context.newPage();\n\n // Block specified resource types for faster loading\n const blockedTypes = buildBlockedResourceTypes(config);\n if (blockedTypes.size > 0) {\n await page.route(\"**/*\", async (route) => {\n if (blockedTypes.has(route.request().resourceType())) {\n await route.abort();\n } else {\n await route.continue();\n }\n });\n }\n\n const timeoutMs = config.timeout * 1000;\n\n // page.goto throws \"Download is starting\" when the server returns\n // Content-Disposition: attachment (common for PDF downloads).\n // Catch this and download the PDF via the context's HTTP client.\n let response;\n try {\n response = await page.goto(url, {\n waitUntil: config.waitUntil,\n timeout: timeoutMs,\n });\n } catch (error) {\n if (\n error instanceof Error &&\n error.message.includes(\"Download is starting\")\n ) {\n const result = await downloadPdf(context, url, timeoutMs);\n await context.close();\n return { kind: \"pdf\", pdfBuffer: result.pdfBuffer, url, finalUrl: result.finalUrl };\n }\n throw error;\n }\n\n if (!response) {\n throw new Error(`No response received from ${url}`);\n }\n\n const status = response.status();\n if (status >= 400) {\n throw new Error(`HTTP ${status} received from ${response.url()}`);\n }\n\n const finalUrl = response.url();\n const contentType = response.headers()[\"content-type\"] ?? 
\"\";\n\n // Inline PDF (Content-Disposition: inline or absent).\n // Try response.body() first; fall back to context.request if the\n // browser returned its PDF viewer HTML instead of the actual bytes.\n if (isPdfContentType(contentType)) {\n const body = await response.body();\n try {\n validatePdfBuffer(body, finalUrl);\n await context.close();\n return { kind: \"pdf\", pdfBuffer: body, url, finalUrl };\n } catch {\n // response.body() returned PDF viewer HTML; re-download via API\n const result = await downloadPdf(context, finalUrl, timeoutMs);\n await context.close();\n return { kind: \"pdf\", pdfBuffer: result.pdfBuffer, url, finalUrl: result.finalUrl };\n }\n }\n\n const fullHtml = await page.content();\n let html: string;\n\n if (config.selector) {\n const element = await page.$(config.selector);\n if (!element) {\n throw new Error(`Selector not found: ${config.selector}`);\n }\n html = await element.innerHTML();\n } else {\n html = fullHtml;\n }\n\n await context.close();\n\n return { kind: \"html\", html, fullHtml, url, finalUrl };\n } finally {\n await browser.close();\n }\n}\n","import { existsSync, mkdirSync } from \"node:fs\";\nimport { join } from \"node:path\";\nimport { homedir } from \"node:os\";\n\nconst CONFIG_DIR = join(homedir(), \".config\", \"vault-fetch\");\nconst SESSIONS_DIR = join(CONFIG_DIR, \"sessions\");\n\nexport function getSessionDir(): string {\n return SESSIONS_DIR;\n}\n\nfunction extractDomain(url: string): string {\n const parsed = new URL(url);\n return parsed.hostname ?? 
\"\";\n}\n\nexport function getSessionPath(url: string, sessionsDir: string): string {\n const domain = extractDomain(url);\n return join(sessionsDir, `${domain}.json`);\n}\n\nexport function sessionExists(url: string, sessionsDir: string): boolean {\n const sessionPath = getSessionPath(url, sessionsDir);\n return existsSync(sessionPath);\n}\n\nexport function ensureSessionDir(sessionsDir: string): void {\n if (!existsSync(sessionsDir)) {\n mkdirSync(sessionsDir, { recursive: true });\n }\n}\n","import { Readability } from \"@mozilla/readability\";\nimport { JSDOM } from \"jsdom\";\nimport type { Metadata } from \"./types.js\";\n\nfunction getMetaContent(doc: Document, selector: string): string | null {\n const el = doc.querySelector(selector);\n return el?.getAttribute(\"content\") ?? null;\n}\n\nfunction formatAuthor(raw: string): string {\n return `[[${raw.trim()}]]`;\n}\n\nfunction extractPublishedDate(doc: Document): string | null {\n const published =\n getMetaContent(doc, 'meta[property=\"article:published_time\"]') ??\n getMetaContent(doc, 'meta[name=\"datePublished\"]');\n\n if (!published) {\n const jsonLd = doc.querySelector('script[type=\"application/ld+json\"]');\n if (jsonLd?.textContent) {\n try {\n const data = JSON.parse(jsonLd.textContent) as Record<string, unknown>;\n if (typeof data.datePublished === \"string\") {\n return data.datePublished.split(\"T\")[0];\n }\n } catch {\n // JSON-LD parse failed\n }\n }\n return null;\n }\n\n return published.split(\"T\")[0];\n}\n\nfunction extractAuthors(\n doc: Document,\n readabilityByline: string | null,\n): string[] {\n const articleAuthors = doc.querySelectorAll('meta[property=\"article:author\"]');\n if (articleAuthors.length > 0) {\n return Array.from(articleAuthors)\n .map((el) => el.getAttribute(\"content\"))\n .filter((v): v is string => v !== null)\n .map(formatAuthor);\n }\n\n const ogAuthor = getMetaContent(doc, 'meta[property=\"og:author\"]');\n if (ogAuthor) {\n return 
[formatAuthor(ogAuthor)];\n }\n\n if (readabilityByline) {\n return [formatAuthor(readabilityByline)];\n }\n\n return [];\n}\n\nexport interface ExtractResult {\n metadata: Metadata;\n content: string;\n}\n\ninterface ReadabilityArticle {\n title: string;\n byline: string | null;\n excerpt: string;\n content: string;\n}\n\nfunction buildMetadata(\n doc: Document,\n article: ReadabilityArticle | null,\n finalUrl: string,\n): Metadata {\n const title = article?.title ?? doc.title;\n const authors = extractAuthors(doc, article?.byline ?? null);\n const published = extractPublishedDate(doc);\n\n const description =\n getMetaContent(doc, 'meta[property=\"og:description\"]') ??\n getMetaContent(doc, 'meta[name=\"description\"]') ??\n (article?.excerpt ?? null);\n\n const today = new Date().toISOString().split(\"T\")[0];\n\n return {\n title,\n source: finalUrl,\n author: authors,\n published,\n created: today,\n description,\n };\n}\n\nfunction parseWithReadability(html: string, url: string): ReadabilityArticle | null {\n const dom = new JSDOM(html, { url });\n const reader = new Readability(dom.window.document);\n return reader.parse() as ReadabilityArticle | null;\n}\n\nexport function extract(html: string, finalUrl: string): ExtractResult {\n const metaDom = new JSDOM(html, { url: finalUrl });\n const doc = metaDom.window.document;\n\n const article = parseWithReadability(html, finalUrl);\n\n if (!article) {\n throw new Error(\n \"Readability failed to extract content from the page. 
\" +\n \"Try --raw to convert the full page, or --selector <css> to target specific content.\",\n );\n }\n\n if (!article.content) {\n throw new Error(\"Readability returned empty content for the page\");\n }\n\n return {\n metadata: buildMetadata(doc, article, finalUrl),\n content: article.content,\n };\n}\n\nexport function extractMetadata(html: string, finalUrl: string): Metadata {\n const metaDom = new JSDOM(html, { url: finalUrl });\n const doc = metaDom.window.document;\n\n const article = parseWithReadability(html, finalUrl);\n\n return buildMetadata(doc, article, finalUrl);\n}\n","import TurndownService from \"turndown\";\n\nexport function convertToMarkdown(html: string): string {\n const turndown = new TurndownService({\n headingStyle: \"atx\",\n codeBlockStyle: \"fenced\",\n bulletListMarker: \"-\",\n });\n\n return turndown.turndown(html);\n}\n","import { writeFileSync } from \"node:fs\";\nimport { join } from \"node:path\";\nimport yaml from \"js-yaml\";\nimport type { Metadata } from \"./types.js\";\n\nconst UNSAFE_CHARS = /[/\\\\:*?\"<>|]/g;\nconst CONTROL_CHARS = /[\\x00-\\x1f\\x7f]/g;\nconst MAX_FILENAME_LENGTH = 200;\n\nexport function sanitizeFilename(title: string): string {\n const sanitized = title\n .replace(CONTROL_CHARS, \"\")\n .replace(UNSAFE_CHARS, \"\")\n .replace(/\\s+/g, \" \")\n .trim();\n const base = sanitized.slice(0, MAX_FILENAME_LENGTH) || \"Untitled\";\n return `${base}.md`;\n}\n\nexport function buildFrontmatter(metadata: Metadata, tags: string[]): string {\n const data: Record<string, unknown> = {\n title: metadata.title,\n source: metadata.source,\n };\n\n if (metadata.author.length > 0) {\n data.author = metadata.author;\n }\n\n if (metadata.published) {\n data.published = metadata.published;\n }\n\n data.created = metadata.created;\n\n if (metadata.description) {\n data.description = metadata.description;\n }\n\n data.tags = tags;\n\n const yamlStr = yaml.dump(data, {\n quotingType: '\"',\n forceQuotes: false,\n lineWidth: 
-1,\n sortKeys: false,\n });\n\n return `---\\n${yamlStr}---`;\n}\n\nexport function writeMarkdownFile(\n dest: string,\n metadata: Metadata,\n markdownContent: string,\n tags: string[],\n): string {\n const filename = sanitizeFilename(metadata.title);\n const filePath = join(dest, filename);\n const frontmatter = buildFrontmatter(metadata, tags);\n const fullContent = `${frontmatter}\\n\\n${markdownContent}\\n`;\n\n writeFileSync(filePath, fullContent, \"utf-8\");\n\n return filePath;\n}\n"],"mappings":";;;AAAA,SAAS,eAAe;AACxB,SAAS,YAAY;AACrB,SAAS,cAAAA,mBAAkB;AAC3B,SAAS,WAAAC,gBAAe;AACxB,SAAS,QAAAC,aAAY;;;ACJrB,SAAS,oBAAoB;AAC7B,SAAS,eAAe;AACxB,SAAS,eAAe;AACxB,OAAO,UAAU;AAGjB,IAAM,kBAAkB;AACxB,IAAM,qBAAsC;AAC5C,IAAM,eAAe;AAwBrB,SAAS,YAAY,UAA0B;AAC7C,MAAI,SAAS,WAAW,IAAI,GAAG;AAC7B,WAAO,QAAQ,QAAQ,GAAG,SAAS,MAAM,CAAC,CAAC;AAAA,EAC7C;AACA,SAAO;AACT;AAEA,IAAM,mBAAsC,CAAC,QAAQ,oBAAoB,aAAa;AAEtF,SAAS,kBAAkB,OAAgC;AACzD,MAAI,CAAC,iBAAiB,SAAS,KAAK,GAAG;AACrC,UAAM,IAAI;AAAA,MACR,6BAA6B,KAAK,sBAAsB,iBAAiB,KAAK,IAAI,CAAC;AAAA,IACrF;AAAA,EACF;AACA,SAAO;AACT;AAEA,SAAS,eAAe,YAAgC;AACtD,QAAM,UAAU,aAAa,YAAY,OAAO;AAChD,QAAM,SAAS,KAAK,KAAK,OAAO;AAChC,MAAI,WAAW,QAAQ,OAAO,WAAW,UAAU;AACjD,UAAM,IAAI,MAAM,wBAAwB,UAAU,EAAE;AAAA,EACtD;AACA,QAAM,SAAS;AAEf,MAAI,OAAO,YAAY,UAAa,OAAO,OAAO,YAAY,UAAU;AACtE,UAAM,IAAI,MAAM,wDAAwD,OAAO,OAAO,OAAO,EAAE;AAAA,EACjG;AACA,MAAI,OAAO,SAAS,UAAa,OAAO,OAAO,SAAS,UAAU;AAChE,UAAM,IAAI,MAAM,qDAAqD,OAAO,OAAO,IAAI,EAAE;AAAA,EAC3F;AACA,MAAI,OAAO,cAAc,QAAW;AAClC,QAAI,OAAO,OAAO,cAAc,UAAU;AACxC,YAAM,IAAI,MAAM,0DAA0D,OAAO,OAAO,SAAS,EAAE;AAAA,IACrG;AACA,sBAAkB,OAAO,SAAS;AAAA,EACpC;AACA,MAAI,OAAO,SAAS,QAAW;AAC7B,QAAI,CAAC,MAAM,QAAQ,OAAO,IAAI,KAAK,CAAC,OAAO,KAAK,MAAM,CAAC,MAAe,OAAO,MAAM,QAAQ,GAAG;AAC5F,YAAM,IAAI,MAAM,wDAAwD;AAAA,IAC1E;AAAA,EACF;AAEA,SAAO;AACT;AAEO,SAAS,cACd,YACA,YACgB;AAEhB,MAAI,aAAyB,CAAC;AAC9B,MAAI,YAAY;AACd,iBAAa,eAAe,UAAU;AAAA,EACxC;AAGA,QAAM,UAAU,QAAQ,IAAI;AAC5B,QAAM,aAAa,QAAQ,IAAI;AAG/B,QAAM,OAAO,WAAW,QAAQ,WAAW,WAAW;AACtD,MAAI,SAAS,QAAW;AACtB,UAAM,IAA
I;AAAA,MACR;AAAA,IACF;AAAA,EACF;AAEA,MAAI;AACJ,MAAI,WAAW,YAAY,QAAW;AACpC,cAAU,WAAW;AAAA,EACvB,WAAW,eAAe,QAAW;AACnC,UAAM,SAAS,OAAO,UAAU;AAChC,QAAI,OAAO,MAAM,MAAM,GAAG;AACxB,YAAM,IAAI,MAAM,sCAAsC,UAAU,EAAE;AAAA,IACpE;AACA,cAAU;AAAA,EACZ,OAAO;AACL,cAAU,WAAW,WAAW;AAAA,EAClC;AAEA,QAAM,eAAe,WAAW,aAAa,WAAW,aAAa;AACrE,QAAM,YAAY,kBAAkB,YAAY;AAGhD,QAAM,UAAU;AAAA,IACd,GAAI,WAAW,QAAQ,CAAC;AAAA,IACxB,GAAI,WAAW,QAAQ,CAAC;AAAA,IACxB;AAAA,EACF;AACA,QAAM,OAAO,CAAC,GAAG,IAAI,IAAI,OAAO,CAAC;AAEjC,SAAO;AAAA,IACL,MAAM,YAAY,IAAI;AAAA,IACtB;AAAA,IACA;AAAA,IACA;AAAA,IACA,QAAQ,WAAW,UAAU;AAAA,IAC7B,UAAU,WAAW,YAAY;AAAA,IACjC,WAAW,WAAW,aAAa;AAAA,IACnC,QAAQ,WAAW,UAAU;AAAA,IAC7B,aAAa,WAAW,eAAe;AAAA,IACvC,YAAY,WAAW,cAAc;AAAA,IACrC,YAAY,WAAW,cAAc;AAAA,IACrC,KAAK,WAAW,OAAO;AAAA,EACzB;AACF;;;AC3IA,SAAS,gBAAqC;;;ACA9C,SAAS,YAAY,iBAAiB;AACtC,SAAS,YAAY;AACrB,SAAS,WAAAC,gBAAe;AAExB,IAAM,aAAa,KAAKA,SAAQ,GAAG,WAAW,aAAa;AAC3D,IAAM,eAAe,KAAK,YAAY,UAAU;AAEzC,SAAS,gBAAwB;AACtC,SAAO;AACT;AAEA,SAAS,cAAc,KAAqB;AAC1C,QAAM,SAAS,IAAI,IAAI,GAAG;AAC1B,SAAO,OAAO,YAAY;AAC5B;AAEO,SAAS,eAAe,KAAa,aAA6B;AACvE,QAAM,SAAS,cAAc,GAAG;AAChC,SAAO,KAAK,aAAa,GAAG,MAAM,OAAO;AAC3C;AAEO,SAAS,cAAc,KAAa,aAA8B;AACvE,QAAM,cAAc,eAAe,KAAK,WAAW;AACnD,SAAO,WAAW,WAAW;AAC/B;AAEO,SAAS,iBAAiB,aAA2B;AAC1D,MAAI,CAAC,WAAW,WAAW,GAAG;AAC5B,cAAU,aAAa,EAAE,WAAW,KAAK,CAAC;AAAA,EAC5C;AACF;;;AD1BO,IAAM,oBACX;AAUK,SAAS,iBAAiB,aAA8B;AAC7D,SAAO,YAAY,YAAY,EAAE,SAAS,iBAAiB;AAC7D;AAEO,SAAS,0BAA0B,SAAuC;AAC/E,QAAM,UAAU,oBAAI,IAAY;AAChC,MAAI,QAAQ,YAAa,SAAQ,IAAI,OAAO;AAC5C,MAAI,QAAQ,WAAY,SAAQ,IAAI,MAAM;AAC1C,MAAI,QAAQ,WAAY,SAAQ,IAAI,OAAO;AAC3C,SAAO;AACT;AAEA,IAAM,kBAAkB;AAEjB,SAAS,kBAAkB,WAAmB,WAAyB;AAC5E,MAAI,UAAU,WAAW,GAAG;AAC1B,UAAM,IAAI,MAAM,oCAAoC,SAAS,EAAE;AAAA,EACjE;AACA,QAAM,SAAS,UAAU,SAAS,GAAG,gBAAgB,MAAM,EAAE,SAAS,OAAO;AAC7E,MAAI,CAAC,OAAO,WAAW,eAAe,GAAG;AACvC,UAAM,IAAI;AAAA,MACR,gFAAgF,SAAS;AAAA,IAC3F;AAAA,EACF;AACF;AAEA,eAAe,YACb,SACA,KACA,WACkD;AAClD,QAAM,cAAc,MAAM,QAAQ,QAAQ,IAAI,KAAK,EAAE,SAAS,UAAU,CAAC;AACzE,QAAM,SAAS,YAAY,OAAO;AAClC,MAAI,UAAU,KAAK;A
ACjB,UAAM,IAAI,MAAM,QAAQ,MAAM,uCAAuC,GAAG,EAAE;AAAA,EAC5E;AACA,QAAM,YAAY,OAAO,KAAK,MAAM,YAAY,KAAK,CAAC;AACtD,QAAM,WAAW,YAAY,IAAI;AACjC,oBAAkB,WAAW,QAAQ;AACrC,SAAO,EAAE,WAAW,SAAS;AAC/B;AAEA,eAAsB,UACpB,KACA,QACA,aACsB;AACtB,QAAM,UAAU,MAAM,SAAS,OAAO;AAAA,IACpC,UAAU,CAAC,OAAO;AAAA,EACpB,CAAC;AAED,MAAI;AACF,UAAM,iBAA2D;AAAA,MAC/D,WAAW;AAAA,IACb;AAGA,QAAI,CAAC,OAAO,aAAa,cAAc,KAAK,WAAW,GAAG;AACxD,YAAM,cAAc,eAAe,KAAK,WAAW;AACnD,qBAAe,eAAe;AAAA,IAChC;AAEA,UAAM,UAA0B,MAAM,QAAQ,WAAW,cAAc;AACvE,UAAM,OAAO,MAAM,QAAQ,QAAQ;AAGnC,UAAM,eAAe,0BAA0B,MAAM;AACrD,QAAI,aAAa,OAAO,GAAG;AACzB,YAAM,KAAK,MAAM,QAAQ,OAAO,UAAU;AACxC,YAAI,aAAa,IAAI,MAAM,QAAQ,EAAE,aAAa,CAAC,GAAG;AACpD,gBAAM,MAAM,MAAM;AAAA,QACpB,OAAO;AACL,gBAAM,MAAM,SAAS;AAAA,QACvB;AAAA,MACF,CAAC;AAAA,IACH;AAEA,UAAM,YAAY,OAAO,UAAU;AAKnC,QAAI;AACJ,QAAI;AACF,iBAAW,MAAM,KAAK,KAAK,KAAK;AAAA,QAC9B,WAAW,OAAO;AAAA,QAClB,SAAS;AAAA,MACX,CAAC;AAAA,IACH,SAAS,OAAO;AACd,UACE,iBAAiB,SACjB,MAAM,QAAQ,SAAS,sBAAsB,GAC7C;AACA,cAAM,SAAS,MAAM,YAAY,SAAS,KAAK,SAAS;AACxD,cAAM,QAAQ,MAAM;AACpB,eAAO,EAAE,MAAM,OAAO,WAAW,OAAO,WAAW,KAAK,UAAU,OAAO,SAAS;AAAA,MACpF;AACA,YAAM;AAAA,IACR;AAEA,QAAI,CAAC,UAAU;AACb,YAAM,IAAI,MAAM,6BAA6B,GAAG,EAAE;AAAA,IACpD;AAEA,UAAM,SAAS,SAAS,OAAO;AAC/B,QAAI,UAAU,KAAK;AACjB,YAAM,IAAI,MAAM,QAAQ,MAAM,kBAAkB,SAAS,IAAI,CAAC,EAAE;AAAA,IAClE;AAEA,UAAM,WAAW,SAAS,IAAI;AAC9B,UAAM,cAAc,SAAS,QAAQ,EAAE,cAAc,KAAK;AAK1D,QAAI,iBAAiB,WAAW,GAAG;AACjC,YAAM,OAAO,MAAM,SAAS,KAAK;AACjC,UAAI;AACF,0BAAkB,MAAM,QAAQ;AAChC,cAAM,QAAQ,MAAM;AACpB,eAAO,EAAE,MAAM,OAAO,WAAW,MAAM,KAAK,SAAS;AAAA,MACvD,QAAQ;AAEN,cAAM,SAAS,MAAM,YAAY,SAAS,UAAU,SAAS;AAC7D,cAAM,QAAQ,MAAM;AACpB,eAAO,EAAE,MAAM,OAAO,WAAW,OAAO,WAAW,KAAK,UAAU,OAAO,SAAS;AAAA,MACpF;AAAA,IACF;AAEA,UAAM,WAAW,MAAM,KAAK,QAAQ;AACpC,QAAI;AAEJ,QAAI,OAAO,UAAU;AACnB,YAAM,UAAU,MAAM,KAAK,EAAE,OAAO,QAAQ;AAC5C,UAAI,CAAC,SAAS;AACZ,cAAM,IAAI,MAAM,uBAAuB,OAAO,QAAQ,EAAE;AAAA,MAC1D;AACA,aAAO,MAAM,QAAQ,UAAU;AAAA,IACjC,OAAO;AACL,aAAO;AAAA,IACT;AAEA,UAAM,QAAQ,MAAM;AAEpB,WAAO,EAAE,MAAM,QAAQ,MAAM,UAAU,KAAK,SAAS;AAAA,EACvD,UAAE;AACA,UAAM,Q
AAQ,MAAM;AAAA,EACtB;AACF;;;AEnKA,SAAS,mBAAmB;AAC5B,SAAS,aAAa;AAGtB,SAAS,eAAe,KAAe,UAAiC;AACtE,QAAM,KAAK,IAAI,cAAc,QAAQ;AACrC,SAAO,IAAI,aAAa,SAAS,KAAK;AACxC;AAEA,SAAS,aAAa,KAAqB;AACzC,SAAO,KAAK,IAAI,KAAK,CAAC;AACxB;AAEA,SAAS,qBAAqB,KAA8B;AAC1D,QAAM,YACJ,eAAe,KAAK,yCAAyC,KAC7D,eAAe,KAAK,4BAA4B;AAElD,MAAI,CAAC,WAAW;AACd,UAAM,SAAS,IAAI,cAAc,oCAAoC;AACrE,QAAI,QAAQ,aAAa;AACvB,UAAI;AACF,cAAM,OAAO,KAAK,MAAM,OAAO,WAAW;AAC1C,YAAI,OAAO,KAAK,kBAAkB,UAAU;AAC1C,iBAAO,KAAK,cAAc,MAAM,GAAG,EAAE,CAAC;AAAA,QACxC;AAAA,MACF,QAAQ;AAAA,MAER;AAAA,IACF;AACA,WAAO;AAAA,EACT;AAEA,SAAO,UAAU,MAAM,GAAG,EAAE,CAAC;AAC/B;AAEA,SAAS,eACP,KACA,mBACU;AACV,QAAM,iBAAiB,IAAI,iBAAiB,iCAAiC;AAC7E,MAAI,eAAe,SAAS,GAAG;AAC7B,WAAO,MAAM,KAAK,cAAc,EAC7B,IAAI,CAAC,OAAO,GAAG,aAAa,SAAS,CAAC,EACtC,OAAO,CAAC,MAAmB,MAAM,IAAI,EACrC,IAAI,YAAY;AAAA,EACrB;AAEA,QAAM,WAAW,eAAe,KAAK,4BAA4B;AACjE,MAAI,UAAU;AACZ,WAAO,CAAC,aAAa,QAAQ,CAAC;AAAA,EAChC;AAEA,MAAI,mBAAmB;AACrB,WAAO,CAAC,aAAa,iBAAiB,CAAC;AAAA,EACzC;AAEA,SAAO,CAAC;AACV;AAcA,SAAS,cACP,KACA,SACA,UACU;AACV,QAAM,QAAQ,SAAS,SAAS,IAAI;AACpC,QAAM,UAAU,eAAe,KAAK,SAAS,UAAU,IAAI;AAC3D,QAAM,YAAY,qBAAqB,GAAG;AAE1C,QAAM,cACJ,eAAe,KAAK,iCAAiC,KACrD,eAAe,KAAK,0BAA0B,MAC7C,SAAS,WAAW;AAEvB,QAAM,SAAQ,oBAAI,KAAK,GAAE,YAAY,EAAE,MAAM,GAAG,EAAE,CAAC;AAEnD,SAAO;AAAA,IACL;AAAA,IACA,QAAQ;AAAA,IACR,QAAQ;AAAA,IACR;AAAA,IACA,SAAS;AAAA,IACT;AAAA,EACF;AACF;AAEA,SAAS,qBAAqB,MAAc,KAAwC;AAClF,QAAM,MAAM,IAAI,MAAM,MAAM,EAAE,IAAI,CAAC;AACnC,QAAM,SAAS,IAAI,YAAY,IAAI,OAAO,QAAQ;AAClD,SAAO,OAAO,MAAM;AACtB;AAEO,SAAS,QAAQ,MAAc,UAAiC;AACrE,QAAM,UAAU,IAAI,MAAM,MAAM,EAAE,KAAK,SAAS,CAAC;AACjD,QAAM,MAAM,QAAQ,OAAO;AAE3B,QAAM,UAAU,qBAAqB,MAAM,QAAQ;AAEnD,MAAI,CAAC,SAAS;AACZ,UAAM,IAAI;AAAA,MACR;AAAA,IAEF;AAAA,EACF;AAEA,MAAI,CAAC,QAAQ,SAAS;AACpB,UAAM,IAAI,MAAM,iDAAiD;AAAA,EACnE;AAEA,SAAO;AAAA,IACL,UAAU,cAAc,KAAK,SAAS,QAAQ;AAAA,IAC9C,SAAS,QAAQ;AAAA,EACnB;AACF;AAEO,SAAS,gBAAgB,MAAc,UAA4B;AACxE,QAAM,UAAU,IAAI,MAAM,MAAM,EAAE,KAAK,SAAS,CAAC;AACjD,QAAM,MAAM,QAAQ,OAAO;AAE3B,QAAM,UAAU,qBAAqB,MAAM,QAAQ;AAEnD,SAAO,cAAc,KAAK,SAAS,QAA
Q;AAC7C;;;ACtIA,OAAO,qBAAqB;AAErB,SAAS,kBAAkB,MAAsB;AACtD,QAAM,WAAW,IAAI,gBAAgB;AAAA,IACnC,cAAc;AAAA,IACd,gBAAgB;AAAA,IAChB,kBAAkB;AAAA,EACpB,CAAC;AAED,SAAO,SAAS,SAAS,IAAI;AAC/B;;;ACVA,SAAS,qBAAqB;AAC9B,SAAS,QAAAC,aAAY;AACrB,OAAOC,WAAU;AAGjB,IAAM,eAAe;AACrB,IAAM,gBAAgB;AACtB,IAAM,sBAAsB;AAErB,SAAS,iBAAiB,OAAuB;AACtD,QAAM,YAAY,MACf,QAAQ,eAAe,EAAE,EACzB,QAAQ,cAAc,EAAE,EACxB,QAAQ,QAAQ,GAAG,EACnB,KAAK;AACR,QAAM,OAAO,UAAU,MAAM,GAAG,mBAAmB,KAAK;AACxD,SAAO,GAAG,IAAI;AAChB;AAEO,SAAS,iBAAiB,UAAoB,MAAwB;AAC3E,QAAM,OAAgC;AAAA,IACpC,OAAO,SAAS;AAAA,IAChB,QAAQ,SAAS;AAAA,EACnB;AAEA,MAAI,SAAS,OAAO,SAAS,GAAG;AAC9B,SAAK,SAAS,SAAS;AAAA,EACzB;AAEA,MAAI,SAAS,WAAW;AACtB,SAAK,YAAY,SAAS;AAAA,EAC5B;AAEA,OAAK,UAAU,SAAS;AAExB,MAAI,SAAS,aAAa;AACxB,SAAK,cAAc,SAAS;AAAA,EAC9B;AAEA,OAAK,OAAO;AAEZ,QAAM,UAAUA,MAAK,KAAK,MAAM;AAAA,IAC9B,aAAa;AAAA,IACb,aAAa;AAAA,IACb,WAAW;AAAA,IACX,UAAU;AAAA,EACZ,CAAC;AAED,SAAO;AAAA,EAAQ,OAAO;AACxB;AAEO,SAAS,kBACd,MACA,UACA,iBACA,MACQ;AACR,QAAM,WAAW,iBAAiB,SAAS,KAAK;AAChD,QAAM,WAAWD,MAAK,MAAM,QAAQ;AACpC,QAAM,cAAc,iBAAiB,UAAU,IAAI;AACnD,QAAM,cAAc,GAAG,WAAW;AAAA;AAAA,EAAO,eAAe;AAAA;AAExD,gBAAc,UAAU,aAAa,OAAO;AAE5C,SAAO;AACT;;;ANhDA,IAAM,cAAcE,MAAKC,SAAQ,GAAG,WAAW,eAAe,aAAa;AAE3E,IAAM,UAAU,IAAI,QAAQ;AAE5B,QACG,KAAK,aAAa,EAClB;AAAA,EACC;AACF,EACC,QAAQ,OAAO;AAElB,QACG,QAAQ,OAAO,EACf,YAAY,mCAAmC,EAC/C,SAAS,SAAS,cAAc,EAChC,OAAO,iBAAiB,uBAAuB,EAC/C,OAAO,YAAY,4BAA4B,EAC/C,OAAO,oBAAoB,yBAAyB,EACpD,OAAO,uBAAuB,sBAAsB,QAAQ,EAC5D,OAAO,gBAAgB,wBAAwB,CAAC,KAAa,QAAkB;AAC9E,MAAI,KAAK,GAAG;AACZ,SAAO;AACT,GAAG,CAAC,CAAa,EAChB;AAAA,EACC;AAAA,EACA;AACF,EACC,OAAO,kBAAkB,0BAA0B,EACnD,OAAO,aAAa,oCAAoC,EACxD,OAAO,qBAAqB,6BAA6B,EACzD,OAAO,oBAAoB,4BAA4B,EACvD,OAAO,oBAAoB,6BAA6B,EACxD,OAAO,SAAS,uDAAuD,EACvE,OAAO,OAAO,KAAa,YAAqC;AAC/D,MAAI;AACF,UAAM,aAAaC,YAAW,WAAW,IAAI,cAAc;AAC3D,UAAM,SAAS;AAAA,MACb;AAAA,QACE,MAAM,QAAQ;AAAA,QACd,MAAM,QAAQ;AAAA,QACd,SAAS,QAAQ;AAAA,QACjB,WAAW,QAAQ;AAAA,QACnB,QAAQ,QAAQ;AAAA,QAChB,UAAU,QAAQ;AAAA,QAClB,WAAW,QAAQ;AAAA,QACnB,QAAQ,QAAQ;AAAA,QAChB,aAAa,QAAQ;AAAA,QACrB,YAA
Y,QAAQ;AAAA,QACpB,YAAY,QAAQ;AAAA,QACpB,KAAK,QAAQ;AAAA,MACf;AAAA,MACA;AAAA,IACF;AAEA,QAAI,OAAO,OAAO,OAAO,UAAU;AACjC,YAAM,IAAI,MAAM,+CAA+C;AAAA,IACjE;AAGA,QAAI,CAAC,OAAO,UAAU,CAACA,YAAW,OAAO,IAAI,GAAG;AAC9C,YAAM,IAAI,MAAM,yCAAyC,OAAO,IAAI,EAAE;AAAA,IACxE;AAEA,UAAM,cAAc,cAAc;AAClC,UAAM,cAAc,MAAM,UAAU,KAAK,QAAQ,WAAW;AAE5D,QAAI;AACJ,QAAI;AAEJ,QAAI,YAAY,SAAS,OAAO;AAC9B,UAAI,OAAO,UAAU;AACnB,cAAM,IAAI,MAAM,0CAA0C;AAAA,MAC5D;AACA,UAAI,OAAO,KAAK;AACd,cAAM,IAAI,MAAM,qCAAqC;AAAA,MACvD;AACA,YAAM,EAAE,qBAAqB,IAAI,MAAM,OAAO,6BAAoB;AAClE,YAAM,YAAY,MAAM;AAAA,QACtB,YAAY;AAAA,QACZ,YAAY;AAAA,MACd;AACA,iBAAW,UAAU;AACrB,iBAAW,UAAU;AAAA,IACvB,WAAW,OAAO,UAAU;AAE1B,iBAAW,gBAAgB,YAAY,UAAU,YAAY,QAAQ;AACrE,iBAAW,kBAAkB,YAAY,IAAI;AAAA,IAC/C,WAAW,OAAO,KAAK;AAErB,iBAAW,gBAAgB,YAAY,UAAU,YAAY,QAAQ;AACrE,iBAAW,kBAAkB,YAAY,QAAQ;AAAA,IACnD,OAAO;AACL,YAAM,SAAS,QAAQ,YAAY,MAAM,YAAY,QAAQ;AAC7D,iBAAW,OAAO;AAClB,iBAAW,kBAAkB,OAAO,OAAO;AAAA,IAC7C;AAEA,QAAI,OAAO,QAAQ;AACjB,YAAM,cAAc,iBAAiB,UAAU,OAAO,IAAI;AAC1D,cAAQ,OAAO,MAAM,GAAG,WAAW;AAAA;AAAA,EAAO,QAAQ;AAAA,CAAI;AAAA,IACxD,OAAO;AACL,YAAM,WAAW;AAAA,QACf,OAAO;AAAA,QACP;AAAA,QACA;AAAA,QACA,OAAO;AAAA,MACT;AACA,cAAQ,MAAM,UAAU,QAAQ,EAAE;AAAA,IACpC;AAAA,EACF,SAAS,OAAO;AACd,UAAM,UAAU,iBAAiB,QAAQ,MAAM,UAAU,OAAO,KAAK;AACrE,YAAQ,MAAM,UAAU,OAAO,EAAE;AACjC,YAAQ,KAAK,CAAC;AAAA,EAChB;AACF,CAAC;AAEH,QACG,QAAQ,OAAO,EACf,YAAY,kCAAkC,EAC9C,SAAS,SAAS,cAAc,EAChC,OAAO,uBAAuB,4BAA4B,QAAQ,EAClE,OAAO,OAAO,KAAa,YAAqC;AAC/D,QAAM,EAAE,UAAAC,UAAS,IAAI,MAAM,OAAO,YAAY;AAC9C,QAAM,cAAc,cAAc;AAClC,mBAAiB,WAAW;AAE5B,QAAM,aAAc,QAAQ,WAAkC;AAC9D,QAAM,UAAU,MAAMA,UAAS,OAAO,EAAE,UAAU,MAAM,CAAC;AAEzD,MAAI;AACF,UAAM,UAAU,MAAM,QAAQ,WAAW;AACzC,UAAM,OAAO,MAAM,QAAQ,QAAQ;AAEnC,UAAM,KAAK,KAAK,KAAK,EAAE,WAAW,eAAe,SAAS,aAAa,IAAK,CAAC;AAE7E,YAAQ,MAAM,yEAAyE;AAEvF,YAAQ,MAAM,OAAO;AACrB,UAAM,KAAK,QAAQ,OAAO,MAAM;AAChC,YAAQ,MAAM,MAAM;AACpB,YAAQ,MAAM,MAAM;AAEpB,UAAM,cAAc,eAAe,KAAK,WAAW;AACnD,UAAM,QAAQ,aAAa,EAAE,MAAM,YAAY,CAAC;AAChD,YAAQ,MAAM,kBAAkB,WAAW,EAAE;AAAA,EAC/C,SAAS,OAAO;AACd,UAAM,UAAU,iBAAiB,QAAQ,MAAM,UAA
U,OAAO,KAAK;AACrE,YAAQ,MAAM,UAAU,OAAO,EAAE;AACjC,YAAQ,KAAK,CAAC;AAAA,EAChB,UAAE;AACA,UAAM,QAAQ,MAAM;AAAA,EACtB;AACF,CAAC;AAEH,QAAQ,MAAM;","names":["existsSync","homedir","join","homedir","join","yaml","join","homedir","existsSync","chromium"]}
@@ -0,0 +1,42 @@
+ #!/usr/bin/env node
+
+ // src/pdf-converter.ts
+ import pdf2md from "@opendocsg/pdf2md";
+ async function convertPdfToMarkdown(pdfBuffer, sourceUrl) {
+   const markdown = await pdf2md(pdfBuffer);
+   if (!markdown.trim()) {
+     throw new Error("pdf2md returned empty content from the PDF");
+   }
+   const title = extractTitleFromMarkdown(markdown) ?? extractTitleFromUrl(sourceUrl);
+   const today = (/* @__PURE__ */ new Date()).toISOString().split("T")[0];
+   return {
+     markdown,
+     metadata: {
+       title,
+       source: sourceUrl,
+       author: [],
+       published: null,
+       created: today,
+       description: null
+     }
+   };
+ }
+ function extractTitleFromMarkdown(markdown) {
+   const match = markdown.match(/^#\s+(.+?)(?:\s+#+)?$/m);
+   if (!match) return null;
+   const trimmed = match[1].trim();
+   return trimmed || null;
+ }
+ function extractTitleFromUrl(url) {
+   const pathname = new URL(url).pathname;
+   const lastSegment = pathname.split("/").filter(Boolean).pop();
+   if (!lastSegment) {
+     throw new Error(`Cannot extract filename from URL: ${url}`);
+   }
+   return decodeURIComponent(lastSegment.replace(/\.pdf$/i, ""));
+ }
+ export {
+   convertPdfToMarkdown,
+   extractTitleFromUrl
+ };
+ //# sourceMappingURL=pdf-converter-U3SFA2HY.js.map
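The module above chooses a note title by preferring the PDF's first Markdown heading and falling back to the URL's last path segment. The two helpers can be exercised standalone (logic copied from the compiled source above; no `pdf2md` dependency, and the sample inputs are invented):

```typescript
// First ATX heading, tolerating trailing closing hashes ("# Title ##").
function extractTitleFromMarkdown(markdown: string): string | null {
  const match = markdown.match(/^#\s+(.+?)(?:\s+#+)?$/m);
  if (!match) return null;
  const trimmed = match[1].trim();
  return trimmed || null;
}

// Last path segment, percent-decoded, with any ".pdf" extension stripped.
function extractTitleFromUrl(url: string): string {
  const pathname = new URL(url).pathname;
  const lastSegment = pathname.split("/").filter(Boolean).pop();
  if (!lastSegment) {
    throw new Error(`Cannot extract filename from URL: ${url}`);
  }
  return decodeURIComponent(lastSegment.replace(/\.pdf$/i, ""));
}

// A heading in the converted Markdown wins; otherwise the URL decides.
const fromHeading = extractTitleFromMarkdown("# Annual Report ##\n\nBody text");
const fromUrl = extractTitleFromUrl("https://example.com/docs/Annual%20Report.pdf");
console.log(fromHeading); // "Annual Report"
console.log(fromUrl);     // "Annual Report"
```

Note that both fallbacks normalize to the same title here, so a PDF without headings still gets a readable filename in the vault.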
@@ -0,0 +1 @@
+ {"version":3,"sources":["../src/pdf-converter.ts"],"sourcesContent":["import pdf2md from \"@opendocsg/pdf2md\";\nimport type { Metadata } from \"./types.js\";\n\nexport interface PdfConvertResult {\n markdown: string;\n metadata: Metadata;\n}\n\nexport async function convertPdfToMarkdown(\n pdfBuffer: Buffer,\n sourceUrl: string,\n): Promise<PdfConvertResult> {\n const markdown = await pdf2md(pdfBuffer);\n if (!markdown.trim()) {\n throw new Error(\"pdf2md returned empty content from the PDF\");\n }\n const title = extractTitleFromMarkdown(markdown) ?? extractTitleFromUrl(sourceUrl);\n const today = new Date().toISOString().split(\"T\")[0];\n\n return {\n markdown,\n metadata: {\n title,\n source: sourceUrl,\n author: [],\n published: null,\n created: today,\n description: null,\n },\n };\n}\n\nfunction extractTitleFromMarkdown(markdown: string): string | null {\n const match = markdown.match(/^#\\s+(.+?)(?:\\s+#+)?$/m);\n if (!match) return null;\n const trimmed = match[1].trim();\n return trimmed || null;\n}\n\nexport function extractTitleFromUrl(url: string): string {\n const pathname = new URL(url).pathname;\n const lastSegment = pathname.split(\"/\").filter(Boolean).pop();\n if (!lastSegment) {\n throw new Error(`Cannot extract filename from URL: ${url}`);\n }\n return decodeURIComponent(lastSegment.replace(/\\.pdf$/i, 
\"\"));\n}\n"],"mappings":";;;AAAA,OAAO,YAAY;AAQnB,eAAsB,qBACpB,WACA,WAC2B;AAC3B,QAAM,WAAW,MAAM,OAAO,SAAS;AACvC,MAAI,CAAC,SAAS,KAAK,GAAG;AACpB,UAAM,IAAI,MAAM,4CAA4C;AAAA,EAC9D;AACA,QAAM,QAAQ,yBAAyB,QAAQ,KAAK,oBAAoB,SAAS;AACjF,QAAM,SAAQ,oBAAI,KAAK,GAAE,YAAY,EAAE,MAAM,GAAG,EAAE,CAAC;AAEnD,SAAO;AAAA,IACL;AAAA,IACA,UAAU;AAAA,MACR;AAAA,MACA,QAAQ;AAAA,MACR,QAAQ,CAAC;AAAA,MACT,WAAW;AAAA,MACX,SAAS;AAAA,MACT,aAAa;AAAA,IACf;AAAA,EACF;AACF;AAEA,SAAS,yBAAyB,UAAiC;AACjE,QAAM,QAAQ,SAAS,MAAM,wBAAwB;AACrD,MAAI,CAAC,MAAO,QAAO;AACnB,QAAM,UAAU,MAAM,CAAC,EAAE,KAAK;AAC9B,SAAO,WAAW;AACpB;AAEO,SAAS,oBAAoB,KAAqB;AACvD,QAAM,WAAW,IAAI,IAAI,GAAG,EAAE;AAC9B,QAAM,cAAc,SAAS,MAAM,GAAG,EAAE,OAAO,OAAO,EAAE,IAAI;AAC5D,MAAI,CAAC,aAAa;AAChB,UAAM,IAAI,MAAM,qCAAqC,GAAG,EAAE;AAAA,EAC5D;AACA,SAAO,mBAAmB,YAAY,QAAQ,WAAW,EAAE,CAAC;AAC9D;","names":[]}
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
    "name": "vault-fetch",
-   "version": "0.1.0",
+   "version": "0.3.0",
    "description": "Fetch JS-rendered web pages with Playwright and save as Markdown to Obsidian Vault",
    "type": "module",
    "license": "MIT",
@@ -35,6 +35,7 @@
    },
    "dependencies": {
      "@mozilla/readability": "^0.6.0",
+     "@opendocsg/pdf2md": "^0.2.5",
      "commander": "^14.0.3",
      "js-yaml": "^4.1.1",
      "jsdom": "^29.0.1",