@steipete/summarize 0.1.2 → 0.3.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/CHANGELOG.md +66 -3
- package/README.md +40 -6
- package/dist/cli.cjs +6502 -634
- package/dist/cli.cjs.map +4 -4
- package/dist/esm/content/asset.js +18 -0
- package/dist/esm/content/asset.js.map +1 -1
- package/dist/esm/content/link-preview/client.js +8 -0
- package/dist/esm/content/link-preview/client.js.map +1 -1
- package/dist/esm/content/link-preview/content/article.js +15 -1
- package/dist/esm/content/link-preview/content/article.js.map +1 -1
- package/dist/esm/content/link-preview/content/index.js +151 -4
- package/dist/esm/content/link-preview/content/index.js.map +1 -1
- package/dist/esm/content/link-preview/transcript/index.js +6 -0
- package/dist/esm/content/link-preview/transcript/index.js.map +1 -1
- package/dist/esm/content/link-preview/transcript/providers/youtube/yt-dlp.js +213 -0
- package/dist/esm/content/link-preview/transcript/providers/youtube/yt-dlp.js.map +1 -0
- package/dist/esm/content/link-preview/transcript/providers/youtube.js +40 -2
- package/dist/esm/content/link-preview/transcript/providers/youtube.js.map +1 -1
- package/dist/esm/flags.js +14 -2
- package/dist/esm/flags.js.map +1 -1
- package/dist/esm/llm/generate-text.js +125 -21
- package/dist/esm/llm/generate-text.js.map +1 -1
- package/dist/esm/llm/html-to-markdown.js +3 -2
- package/dist/esm/llm/html-to-markdown.js.map +1 -1
- package/dist/esm/pricing/litellm.js +4 -1
- package/dist/esm/pricing/litellm.js.map +1 -1
- package/dist/esm/prompts/file.js +15 -4
- package/dist/esm/prompts/file.js.map +1 -1
- package/dist/esm/prompts/link-summary.js +20 -6
- package/dist/esm/prompts/link-summary.js.map +1 -1
- package/dist/esm/run.js +545 -407
- package/dist/esm/run.js.map +1 -1
- package/dist/esm/version.js +1 -1
- package/dist/types/content/link-preview/client.d.ts +5 -1
- package/dist/types/content/link-preview/content/types.d.ts +1 -1
- package/dist/types/content/link-preview/deps.d.ts +33 -0
- package/dist/types/content/link-preview/transcript/providers/youtube/yt-dlp.d.ts +15 -0
- package/dist/types/content/link-preview/transcript/types.d.ts +4 -0
- package/dist/types/content/link-preview/types.d.ts +1 -1
- package/dist/types/costs.d.ts +1 -1
- package/dist/types/flags.d.ts +1 -1
- package/dist/types/llm/generate-text.d.ts +8 -2
- package/dist/types/llm/html-to-markdown.d.ts +4 -1
- package/dist/types/pricing/litellm.d.ts +1 -0
- package/dist/types/prompts/file.d.ts +2 -1
- package/dist/types/version.d.ts +1 -1
- package/docs/extract-only.md +1 -1
- package/docs/firecrawl.md +2 -2
- package/docs/llm.md +7 -0
- package/docs/site/docs/config.html +1 -1
- package/docs/site/docs/firecrawl.html +1 -1
- package/docs/website.md +3 -3
- package/docs/youtube.md +5 -2
- package/package.json +7 -2
package/CHANGELOG.md
CHANGED
@@ -1,12 +1,75 @@
 # Changelog
 
-
+## 0.3.0 - 2025-12-20
 
-
+### Changes
+
+- Add yt-dlp audio transcription fallback for YouTube; prefer OpenAI Whisper with FAL fallback. Thanks @dougvk.
+- Add `--no-playlist` to yt-dlp downloads to avoid transcript mismatches.
+- Run yt-dlp after web + Apify in `--youtube auto`, and error early for missing keys in `--youtube yt-dlp`.
+- Require Node 22+.
+- Respect `OPENAI_BASE_URL` when set, even with OpenRouter keys.
+- Apply OpenRouter provider ordering headers to HTML→Markdown conversion.
+- Add OpenRouter configuration tests. Thanks @dougvk for the initial OpenRouter support.
+- Build and ship a Bun bytecode arm64 binary for Homebrew.
+
+### Tests
+
+- Add coverage for yt-dlp ordering, missing-key errors, and helper paths.
+- Add live coverage for yt-dlp transcript mode and missing-caption YouTube pages.
+
+### Dev
+
+- Add `Dockerfile.test` for containerized yt-dlp testing.
+
+## 0.2.0 - 2025-12-20
+
+### Changes
+
+- Add native OpenRouter support via `OPENROUTER_API_KEY` with optional provider ordering (`OPENROUTER_PROVIDERS`).
+- Remove map-reduce summarization; reject inputs that exceed the model's context window.
+- Preflight text prompts with the GPT tokenizer and the model’s max input tokens.
+- Reject text files over 10 MB before tokenization.
+- Reject too-small numeric `--length` and `--max-output-tokens` values.
+- Cap summaries to the extracted content length when a requested size is larger.
+- Skip summarization for tweets when extracted content is already below the requested length.
+- Use bird CLI for tweet extraction when available and surface it in the status line.
+- Fall back to Nitter for tweet extraction when bird fails; report a clear error when tweet data is unavailable.
+- Compute cost totals via tokentally’s tally helpers.
+- Improve fetch spinner with elapsed time and throughput updates.
+- Show Firecrawl fallback status and reason when scraping kicks in.
+- Enforce a hard deadline for stalled streaming LLM responses.
+- Merge cumulative streaming chunks correctly and keep stream-merge for streaming output.
+- Fall back to non-streaming when streaming requests time out.
+- Preserve parentheses in URL paths when resolving inputs.
+- Stop forcing Firecrawl for --extract-only; only use it as a fallback.
+- Avoid Firecrawl fallback when block keywords only appear in scripts/styles.
+
+### Tests
+
+- Add CLI + live coverage for prompt length capping.
+- Add coverage for cumulative stream merge handling.
+- Add coverage for streaming timeout fallback.
+- Add live coverage for Wikipedia URLs with parentheses.
+- Add coverage for tweet summaries that bypass the LLM when short.
+- Add coverage for content budget paths and TOKENTALLY cache dir overrides.
+
+### Docs
+
+- Update release checklist to all-in-one flow.
+- Fix release script quoting.
+- Document input limits and minimum length/token values.
+
+### Dev
+
+- Add a tokenization benchmark script.
 
 ### Fixes
 
--
+- Preserve balanced parentheses/brackets in URL paths (e.g. Wikipedia titles).
+- Avoid Firecrawl fallback when block keywords only appear in scripts/styles.
+- Add a Bird install tip when Twitter/X fetch fails without bird installed.
+- Graceful error when tweet extraction fails after bird + Nitter fallback.
 
 ## 0.1.1 - 2025-12-19
 
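The 0.2.0 entry about merging cumulative streaming chunks is easier to follow with an example. Some streaming APIs send deltas (only the new text), while others resend the full text so far with every chunk; appending the second kind blindly duplicates output. The sketch below is a minimal illustration of one way to handle both, not the package's actual code. The names `mergeStreamChunk` and `mergeStream` are hypothetical, and the prefix check is a heuristic that assumes a delta never happens to begin with the entire text received so far.

```typescript
// Hypothetical helper (not taken from the package): fold one streamed chunk
// into the text accumulated so far.
function mergeStreamChunk(accumulated: string, chunk: string): string {
  if (chunk.length === 0) {
    return accumulated;
  }
  // Cumulative chunk: it restates everything received so far, so it
  // replaces the buffer instead of being appended to it.
  if (chunk.startsWith(accumulated)) {
    return chunk;
  }
  // Otherwise treat the chunk as a plain delta.
  return accumulated + chunk;
}

// Fold a whole sequence of chunks into the final text.
function mergeStream(chunks: string[]): string {
  let text = "";
  for (const chunk of chunks) {
    text = mergeStreamChunk(text, chunk);
  }
  return text;
}

// Both styles of stream end up with the same text.
const fromCumulative = mergeStream(["Hel", "Hello", "Hello wor"]);
const fromDeltas = mergeStream(["Hel", "lo", " wor"]);
```

Here `fromCumulative` and `fromDeltas` are both `"Hello wor"`.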
package/README.md
CHANGED
@@ -11,6 +11,8 @@ It streams output by default on TTY and renders Markdown to ANSI (via `markdansi
 
 ## Install
 
+Requires Node 22+.
+
 - npx (no install):
 
 ```bash
@@ -23,6 +25,8 @@ npx -y @steipete/summarize "https://example.com" --model google/gemini-3-flash-p
 brew install steipete/tap/summarize
 ```
 
+Apple Silicon only (arm64).
+
 ## Quickstart
 
 ```bash
@@ -89,6 +93,12 @@ npx -y @steipete/summarize "https://example.com" --length 20k
 - Character targets: `1500`, `20k`, `20000`
 - Optional hard cap: `--max-output-tokens <count>` (e.g. `2000`, `2k`)
 - Provider/model APIs still enforce their own maximum output limits.
+- Minimums: `--length` numeric values must be ≥ 50 chars; `--max-output-tokens` must be ≥ 16.
+
+## Limits
+
+- Text inputs over 10 MB are rejected before tokenization.
+- Text prompts are preflighted against the model’s input limit (LiteLLM catalog), using a GPT tokenizer.
 
 ## Common flags
 
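The minimums added in the hunk above (numeric `--length` values of at least 50 characters, `--max-output-tokens` of at least 16) can be illustrated with a small validator. This is a sketch under stated assumptions, not the CLI's real parser: `parseCount` is a hypothetical name, only plain integers and a `k` suffix are handled (matching the documented examples `1500`, `20k`, `2k`), and the error wording is invented.

```typescript
// Hypothetical validator (not taken from the package): parse a count such as
// "1500" or "20k" and enforce a minimum value.
function parseCount(raw: string, minimum: number, flag: string): number {
  const match = /^(\d+)(k)?$/i.exec(raw.trim());
  if (match === null) {
    throw new Error(`${flag}: expected a number like 1500 or 20k, got "${raw}"`);
  }
  // A trailing "k" multiplies the value by 1000.
  const value = Number(match[1]) * (match[2] ? 1000 : 1);
  if (value < minimum) {
    throw new Error(`${flag} must be at least ${minimum}, got ${value}`);
  }
  return value;
}

// Minimums taken from the README text in the hunk above.
const MIN_LENGTH_CHARS = 50;
const MIN_OUTPUT_TOKENS = 16;

const lengthTarget = parseCount("20k", MIN_LENGTH_CHARS, "--length");
const outputCap = parseCount("2k", MIN_OUTPUT_TOKENS, "--max-output-tokens");
```

With these inputs `lengthTarget` is `20000` and `outputCap` is `2000`, while a call such as `parseCount("10", MIN_LENGTH_CHARS, "--length")` throws.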
@@ -115,11 +125,19 @@ Non-YouTube URLs go through a “fetch → extract” pipeline. When the direct
 - `--markdown off|auto|llm` (default `auto`; only affects `--extract-only` for non-YouTube URLs)
 - Plain-text mode: use `--firecrawl off --markdown off`.
 
-## YouTube transcripts
+## YouTube transcripts
 
-`--youtube auto` tries best-effort web transcript endpoints first,
+`--youtube auto` tries best-effort web transcript endpoints first. When captions aren't available, it falls back to:
 
-Apify
+1. **Apify** (if `APIFY_API_TOKEN` is set): Uses a scraping actor (`faVsWy9VTSNVIhWpR`)
+2. **yt-dlp + Whisper** (if `YT_DLP_PATH` is set): Downloads audio via yt-dlp, transcribes with OpenAI Whisper if `OPENAI_API_KEY` is set, otherwise falls back to FAL (`FAL_KEY`)
+
+Environment variables for yt-dlp mode:
+- `YT_DLP_PATH` - path to yt-dlp binary
+- `OPENAI_API_KEY` - OpenAI Whisper transcription (preferred)
+- `FAL_KEY` - FAL AI Whisper fallback
+
+Apify costs money but tends to be more reliable when captions exist.
 
 ## Configuration
 
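The `--youtube auto` fallback order described in the hunk above (web endpoints first, then Apify, then yt-dlp with Whisper) can be sketched as a function that plans which providers to try from the environment. This is an illustrative sketch, not the package's implementation: `planYoutubeAuto`, the `Env` type, and the step labels are hypothetical, and skipping yt-dlp when no transcription key is present is an assumption about `auto` mode (the changelog only states that `--youtube yt-dlp` errors early on missing keys).

```typescript
// Hypothetical planner (not taken from the package): decide which transcript
// providers `--youtube auto` would try, in order, given the environment.
type Env = Record<string, string | undefined>;

function planYoutubeAuto(env: Env): string[] {
  // Best-effort web transcript endpoints always come first.
  const steps: string[] = ["web"];

  // Apify is only attempted when a token is configured.
  if (env.APIFY_API_TOKEN) {
    steps.push("apify");
  }

  // yt-dlp needs the binary path plus a transcription key;
  // OpenAI Whisper is preferred, FAL is the fallback.
  if (env.YT_DLP_PATH) {
    if (env.OPENAI_API_KEY) {
      steps.push("yt-dlp+openai-whisper");
    } else if (env.FAL_KEY) {
      steps.push("yt-dlp+fal-whisper");
    }
  }
  return steps;
}

const fullPlan = planYoutubeAuto({
  APIFY_API_TOKEN: "token",
  YT_DLP_PATH: "/usr/local/bin/yt-dlp",
  OPENAI_API_KEY: "sk-...",
  FAL_KEY: "fal-...",
});
```

With every variable set, `fullPlan` is `["web", "apify", "yt-dlp+openai-whisper"]`; with only `YT_DLP_PATH` and `FAL_KEY` set, the plan becomes `["web", "yt-dlp+fal-whisper"]`.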
@@ -154,13 +172,29 @@ Set the key matching your chosen `--model`:
 
 OpenRouter (OpenAI-compatible):
 
-- Set `
--
--
+- Set `OPENROUTER_API_KEY=...` to route `openai/...` models through OpenRouter
+- Use OpenRouter models via the `openai/...` prefix, e.g. `--model openai/openai/gpt-oss-20b`
+- Optional: `OPENROUTER_PROVIDERS=...` to specify provider fallback order (e.g. `groq,google-vertex`)
+
+Example:
+
+```bash
+OPENROUTER_API_KEY=sk-or-... summarize "https://example.com" --model openai/openai/gpt-oss-20b
+```
+
+With provider ordering (falls back through providers in order):
+
+```bash
+OPENROUTER_API_KEY=sk-or-... OPENROUTER_PROVIDERS="groq,google-vertex" summarize "https://example.com"
+```
+
+Legacy: `OPENAI_BASE_URL=https://openrouter.ai/api/v1` with `OPENAI_API_KEY` also works.
 
 Optional services:
 
 - `FIRECRAWL_API_KEY` (website extraction fallback)
+- `YT_DLP_PATH` (path to yt-dlp binary for audio extraction)
+- `FAL_KEY` (FAL AI API key for audio transcription via Whisper)
 - `APIFY_API_TOKEN` (YouTube transcript fallback)
 
 ## Model limits
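The OpenRouter settings in the hunk above interact in a few ways: `OPENROUTER_API_KEY` routes `openai/...` models through OpenRouter, `OPENROUTER_PROVIDERS` supplies a comma-separated fallback order, and the changelog notes that `OPENAI_BASE_URL` is respected when set. A minimal sketch of how such a resolution could look follows. The function `resolveOpenAiCompat`, its return shape, and the precedence between the two API keys are assumptions made for illustration; only the environment variable names and the OpenRouter base URL come from the README text.

```typescript
// Hypothetical resolver (not taken from the package): pick the endpoint, key,
// and provider order for OpenAI-compatible requests.
type EnvVars = Record<string, string | undefined>;

interface OpenAiCompatConfig {
  baseUrl: string | undefined; // undefined means "use the SDK default"
  apiKey: string | undefined;
  providerOrder: string[];
}

const OPENROUTER_BASE_URL = "https://openrouter.ai/api/v1";

function resolveOpenAiCompat(env: EnvVars): OpenAiCompatConfig {
  // An explicit base URL wins, even when an OpenRouter key is present.
  const explicitBaseUrl = env.OPENAI_BASE_URL || undefined;

  if (env.OPENROUTER_API_KEY) {
    // "groq, google-vertex" -> ["groq", "google-vertex"]
    const providerOrder = (env.OPENROUTER_PROVIDERS ?? "")
      .split(",")
      .map((name) => name.trim())
      .filter((name) => name.length > 0);
    return {
      baseUrl: explicitBaseUrl ?? OPENROUTER_BASE_URL,
      apiKey: env.OPENROUTER_API_KEY,
      providerOrder,
    };
  }

  // Legacy path: OPENAI_API_KEY, optionally pointed at OpenRouter by URL.
  return { baseUrl: explicitBaseUrl, apiKey: env.OPENAI_API_KEY, providerOrder: [] };
}

const routed = resolveOpenAiCompat({
  OPENROUTER_API_KEY: "sk-or-...",
  OPENROUTER_PROVIDERS: "groq, google-vertex",
});
```

In this example `routed.baseUrl` is the OpenRouter URL and `routed.providerOrder` is `["groq", "google-vertex"]`.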