@steipete/summarize 0.2.0 → 0.4.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/CHANGELOG.md +33 -3
- package/README.md +41 -9
- package/dist/cli.cjs +5209 -740
- package/dist/cli.cjs.map +4 -4
- package/dist/esm/content/link-preview/client.js +6 -0
- package/dist/esm/content/link-preview/client.js.map +1 -1
- package/dist/esm/content/link-preview/transcript/index.js +6 -0
- package/dist/esm/content/link-preview/transcript/index.js.map +1 -1
- package/dist/esm/content/link-preview/transcript/providers/youtube/yt-dlp.js +213 -0
- package/dist/esm/content/link-preview/transcript/providers/youtube/yt-dlp.js.map +1 -0
- package/dist/esm/content/link-preview/transcript/providers/youtube.js +40 -2
- package/dist/esm/content/link-preview/transcript/providers/youtube.js.map +1 -1
- package/dist/esm/flags.js +20 -1
- package/dist/esm/flags.js.map +1 -1
- package/dist/esm/llm/generate-text.js +51 -14
- package/dist/esm/llm/generate-text.js.map +1 -1
- package/dist/esm/llm/html-to-markdown.js +3 -2
- package/dist/esm/llm/html-to-markdown.js.map +1 -1
- package/dist/esm/markitdown.js +54 -0
- package/dist/esm/markitdown.js.map +1 -0
- package/dist/esm/prompts/file.js +19 -0
- package/dist/esm/prompts/file.js.map +1 -1
- package/dist/esm/prompts/index.js +1 -1
- package/dist/esm/prompts/index.js.map +1 -1
- package/dist/esm/run.js +302 -44
- package/dist/esm/run.js.map +1 -1
- package/dist/esm/version.js +1 -1
- package/dist/types/content/link-preview/client.d.ts +3 -0
- package/dist/types/content/link-preview/content/types.d.ts +1 -1
- package/dist/types/content/link-preview/deps.d.ts +3 -0
- package/dist/types/content/link-preview/transcript/providers/youtube/yt-dlp.d.ts +15 -0
- package/dist/types/content/link-preview/transcript/types.d.ts +4 -0
- package/dist/types/flags.d.ts +5 -1
- package/dist/types/llm/generate-text.d.ts +8 -2
- package/dist/types/llm/html-to-markdown.d.ts +4 -1
- package/dist/types/markitdown.d.ts +10 -0
- package/dist/types/prompts/file.d.ts +7 -0
- package/dist/types/prompts/index.d.ts +1 -1
- package/dist/types/run.d.ts +3 -1
- package/dist/types/version.d.ts +1 -1
- package/docs/README.md +1 -1
- package/docs/extract-only.md +10 -7
- package/docs/firecrawl.md +2 -2
- package/docs/site/docs/config.html +3 -3
- package/docs/site/docs/extract-only.html +7 -5
- package/docs/site/docs/firecrawl.html +6 -6
- package/docs/site/docs/index.html +2 -2
- package/docs/site/docs/llm.html +2 -2
- package/docs/site/docs/openai.html +2 -2
- package/docs/site/docs/website.html +7 -4
- package/docs/site/docs/youtube.html +2 -2
- package/docs/site/index.html +1 -1
- package/docs/website.md +10 -7
- package/docs/youtube.md +6 -3
- package/package.json +5 -1
package/CHANGELOG.md
CHANGED
@@ -1,10 +1,40 @@
 # Changelog
 
+## 0.4.0 - 2025-12-21
+
+### Changes
+
+- Add URL extraction mode via `--extract` (deprecated alias: `--extract-only`) with `--format md|text`.
+- Rename HTML→Markdown conversion flag to `--markdown-mode` (deprecated alias: `--markdown`).
+- Add `--preprocess off|auto|always` and a `uvx markitdown` fallback for Markdown extraction and unsupported file attachments (when `--format md` is used).
+
+## 0.3.0 - 2025-12-20
+### Changes
+
+- Add yt-dlp audio transcription fallback for YouTube; prefer OpenAI Whisper with FAL fallback. Thanks @dougvk.
+- Add `--no-playlist` to yt-dlp downloads to avoid transcript mismatches.
+- Run yt-dlp after web + Apify in `--youtube auto`, and error early for missing keys in `--youtube yt-dlp`.
+- Require Node 22+.
+- Respect `OPENAI_BASE_URL` when set, even with OpenRouter keys.
+- Apply OpenRouter provider ordering headers to HTML→Markdown conversion.
+- Add OpenRouter configuration tests. Thanks @dougvk for the initial OpenRouter support.
+- Build and ship a Bun bytecode arm64 binary for Homebrew.
+
+### Tests
+
+- Add coverage for yt-dlp ordering, missing-key errors, and helper paths.
+- Add live coverage for yt-dlp transcript mode and missing-caption YouTube pages.
+
+### Dev
+
+- Add `Dockerfile.test` for containerized yt-dlp testing.
+
 ## 0.2.0 - 2025-12-20
 
 ### Changes
 
--
+- Add native OpenRouter support via `OPENROUTER_API_KEY` with optional provider ordering (`OPENROUTER_PROVIDERS`).
+- Remove map-reduce summarization; reject inputs that exceed the model's context window.
 - Preflight text prompts with the GPT tokenizer and the model’s max input tokens.
 - Reject text files over 10 MB before tokenization.
 - Reject too-small numeric `--length` and `--max-output-tokens` values.
@@ -71,7 +101,7 @@ First public release.
 - `--max-output-tokens <count>` (optional hard cap)
 - `--timeout <duration>` (default `2m`)
 - `--stream auto|on|off`, `--render auto|md-live|md|plain`
-- `--extract
+- `--extract` (URLs only; no summary; deprecated alias: `--extract-only`)
 - `--json` (structured output incl. input config, prompt, extracted content, LLM metadata, and metrics)
 - `--metrics off|on|detailed` (default `on`)
 - `--verbose`
@@ -80,7 +110,7 @@ First public release.
 
 - Websites: fetch + extract “article-ish” content + normalization for prompts.
 - Firecrawl fallback for blocked/thin sites (`--firecrawl off|auto|always`, via `FIRECRAWL_API_KEY`).
-- Markdown extraction for websites in `--extract
+- Markdown extraction for websites in `--extract` mode (`--format md|text`, `--markdown-mode off|auto|llm`).
 - YouTube (`--youtube auto|web|apify`):
 - best-effort transcript endpoints
 - optional Apify fallback (requires `APIFY_API_TOKEN`; single actor `faVsWy9VTSNVIhWpR`)
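The 0.3.0 entries describe a yt-dlp transcription path that prefers OpenAI Whisper over FAL, errors early when keys are missing, and passes `--no-playlist` to yt-dlp. The TypeScript sketch below shows one way such a preflight could look. It is not summarize's code: `planYtDlpTranscription`, its return shape, and the error messages are hypothetical. The environment variable names come from the README, and `--no-playlist`, `-f bestaudio`, and `-o` are standard yt-dlp options.

```typescript
// Hypothetical sketch, NOT summarize's implementation.
// Env var names (YT_DLP_PATH, OPENAI_API_KEY, FAL_KEY) are taken from the README;
// the helper name, return shape, and error text are illustrative assumptions.
type Transcriber = "openai-whisper" | "fal-whisper";

type Env = Record<string, string | undefined>;

interface YtDlpPlan {
  binary: string;
  args: string[];
  transcriber: Transcriber;
}

function planYtDlpTranscription(url: string, outputPath: string, env: Env): YtDlpPlan {
  const binary = env.YT_DLP_PATH;
  // Fail fast, before any download, when the yt-dlp binary is not configured.
  if (!binary) {
    throw new Error("yt-dlp mode requires YT_DLP_PATH");
  }

  // Prefer OpenAI Whisper; use FAL only when no OpenAI key is present.
  let transcriber: Transcriber;
  if (env.OPENAI_API_KEY) {
    transcriber = "openai-whisper";
  } else if (env.FAL_KEY) {
    transcriber = "fal-whisper";
  } else {
    throw new Error("yt-dlp mode requires OPENAI_API_KEY or FAL_KEY");
  }

  const args = [
    "--no-playlist",   // only the linked video, even if the URL also names a playlist
    "-f", "bestaudio", // select an audio-only format
    "-o", outputPath,  // yt-dlp output template
    url,
  ];
  return { binary, args, transcriber };
}

// Usage: a watch URL that also carries a playlist id.
const plan = planYtDlpTranscription(
  "https://www.youtube.com/watch?v=dQw4w9WgXcQ&list=PL123",
  "/tmp/audio.%(ext)s",
  { YT_DLP_PATH: "/usr/local/bin/yt-dlp", FAL_KEY: "fal-key" },
);
console.log(plan.transcriber); // prints: fal-whisper
console.log([plan.binary, ...plan.args].join(" "));
```

Checking keys before spawning yt-dlp keeps the failure cheap: nothing is downloaded when no transcriber could process the audio.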
package/README.md
CHANGED
@@ -11,6 +11,8 @@ It streams output by default on TTY and renders Markdown to ANSI (via `markdansi
 
 ## Install
 
+Requires Node 22+.
+
 - npx (no install):
 
 ```bash
@@ -23,6 +25,8 @@ npx -y @steipete/summarize "https://example.com" --model google/gemini-3-flash-p
 brew install steipete/tap/summarize
 ```
 
+Apple Silicon only (arm64).
+
 ## Quickstart
 
 ```bash
@@ -108,7 +112,10 @@ npx -y @steipete/summarize <input> [flags]
 - `--max-output-tokens <count>`: hard cap for LLM output tokens (optional)
 - `--stream auto|on|off`: stream LLM output (`auto` = TTY only; disabled in `--json` mode)
 - `--render auto|md-live|md|plain`: Markdown rendering (`auto` = best default for TTY)
-- `--
+- `--format md|text`: website/file content format (default `text`)
+- `--preprocess off|auto|always`: preprocess files (only with `--format md`) for model compatibility (default `auto`)
+- `--extract`: print extracted content and exit (no summary) — only for URLs
+- Deprecated alias: `--extract-only`
 - `--json`: machine-readable output with diagnostics, prompt, `metrics`, and optional summary
 - `--verbose`: debug/diagnostics on stderr
 - `--metrics off|on|detailed`: metrics output (default `on`; `detailed` prints a breakdown to stderr)
@@ -118,14 +125,23 @@ npx -y @steipete/summarize <input> [flags]
 Non-YouTube URLs go through a “fetch → extract” pipeline. When the direct fetch/extraction is blocked or too thin, `--firecrawl auto` can fall back to Firecrawl (if configured).
 
 - `--firecrawl off|auto|always` (default `auto`)
-- `--
--
+- `--extract --format md|text` (default `text`)
+- `--markdown-mode off|auto|llm` (default `auto`; only affects `--format md` for non-YouTube URLs)
+- Plain-text mode: use `--format text`.
+
+## YouTube transcripts
+
+`--youtube auto` tries best-effort web transcript endpoints first. When captions aren't available, it falls back to:
 
-
+1. **Apify** (if `APIFY_API_TOKEN` is set): Uses a scraping actor (`faVsWy9VTSNVIhWpR`)
+2. **yt-dlp + Whisper** (if `YT_DLP_PATH` is set): Downloads audio via yt-dlp, transcribes with OpenAI Whisper if `OPENAI_API_KEY` is set, otherwise falls back to FAL (`FAL_KEY`)
 
-
+Environment variables for yt-dlp mode:
+- `YT_DLP_PATH` - path to yt-dlp binary
+- `OPENAI_API_KEY` - OpenAI Whisper transcription (preferred)
+- `FAL_KEY` - FAL AI Whisper fallback
 
-Apify
+Apify costs money but tends to be more reliable when captions exist.
 
 ## Configuration
 
@@ -160,13 +176,29 @@ Set the key matching your chosen `--model`:
 
 OpenRouter (OpenAI-compatible):
 
-- Set `
--
--
+- Set `OPENROUTER_API_KEY=...` to route `openai/...` models through OpenRouter
+- Use OpenRouter models via the `openai/...` prefix, e.g. `--model openai/openai/gpt-oss-20b`
+- Optional: `OPENROUTER_PROVIDERS=...` to specify provider fallback order (e.g. `groq,google-vertex`)
+
+Example:
+
+```bash
+OPENROUTER_API_KEY=sk-or-... summarize "https://example.com" --model openai/openai/gpt-oss-20b
+```
+
+With provider ordering (falls back through providers in order):
+
+```bash
+OPENROUTER_API_KEY=sk-or-... OPENROUTER_PROVIDERS="groq,google-vertex" summarize "https://example.com"
+```
+
+Legacy: `OPENAI_BASE_URL=https://openrouter.ai/api/v1` with `OPENAI_API_KEY` also works.
 
 Optional services:
 
 - `FIRECRAWL_API_KEY` (website extraction fallback)
+- `YT_DLP_PATH` (path to yt-dlp binary for audio extraction)
+- `FAL_KEY` (FAL AI API key for audio transcription via Whisper)
 - `APIFY_API_TOKEN` (YouTube transcript fallback)
 
 ## Model limits
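The README's `OPENROUTER_PROVIDERS` setting takes a comma-separated provider list such as `groq,google-vertex`. Below is a minimal sketch of parsing that value and attaching it to a chat request, assuming OpenRouter's documented `provider.order` request-body field as the destination. The 0.3.0 changelog speaks of provider ordering *headers*, so summarize may transmit the order differently; `parseProviderOrder` and `withProviderOrder` are hypothetical names.

```typescript
// Hypothetical helpers, NOT summarize's code.

// Split a comma-separated provider list, trimming whitespace and dropping empty entries.
function parseProviderOrder(raw: string | undefined): string[] {
  if (!raw) return [];
  return raw
    .split(",")
    .map((name) => name.trim())
    .filter((name) => name.length > 0);
}

// Chat-completions request body, reduced to the fields used in this sketch.
interface ChatRequest {
  model: string;
  messages: { role: "system" | "user" | "assistant"; content: string }[];
  // OpenRouter provider routing: try providers in this order.
  provider?: { order: string[]; allow_fallbacks?: boolean };
}

function withProviderOrder(req: ChatRequest, raw: string | undefined): ChatRequest {
  const order = parseProviderOrder(raw);
  // With no configured order, leave routing to OpenRouter's defaults.
  if (order.length === 0) return req;
  return { ...req, provider: { order, allow_fallbacks: true } };
}

// Assumption: OpenRouter receives the model id without the CLI's own `openai/` prefix.
const body = withProviderOrder(
  { model: "openai/gpt-oss-20b", messages: [{ role: "user", content: "Summarize this page." }] },
  " groq, google-vertex ,",
);
console.log(JSON.stringify(body.provider));
// prints: {"order":["groq","google-vertex"],"allow_fallbacks":true}
```

Trimming and dropping empty entries makes a sloppy value like `" groq, google-vertex ,"` behave the same as the tidy form.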