@steipete/summarize 0.3.0 → 0.4.0
- package/CHANGELOG.md +10 -3
- package/README.md +7 -3
- package/dist/cli.cjs +451 -133
- package/dist/cli.cjs.map +4 -4
- package/dist/esm/flags.js +18 -1
- package/dist/esm/flags.js.map +1 -1
- package/dist/esm/markitdown.js +54 -0
- package/dist/esm/markitdown.js.map +1 -0
- package/dist/esm/prompts/file.js +19 -0
- package/dist/esm/prompts/file.js.map +1 -1
- package/dist/esm/prompts/index.js +1 -1
- package/dist/esm/prompts/index.js.map +1 -1
- package/dist/esm/run.js +262 -35
- package/dist/esm/run.js.map +1 -1
- package/dist/esm/version.js +1 -1
- package/dist/types/flags.d.ts +4 -0
- package/dist/types/markitdown.d.ts +10 -0
- package/dist/types/prompts/file.d.ts +7 -0
- package/dist/types/prompts/index.d.ts +1 -1
- package/dist/types/run.d.ts +3 -1
- package/dist/types/version.d.ts +1 -1
- package/docs/README.md +1 -1
- package/docs/extract-only.md +10 -7
- package/docs/firecrawl.md +2 -2
- package/docs/site/docs/config.html +3 -3
- package/docs/site/docs/extract-only.html +7 -5
- package/docs/site/docs/firecrawl.html +6 -6
- package/docs/site/docs/index.html +2 -2
- package/docs/site/docs/llm.html +2 -2
- package/docs/site/docs/openai.html +2 -2
- package/docs/site/docs/website.html +7 -4
- package/docs/site/docs/youtube.html +2 -2
- package/docs/site/index.html +1 -1
- package/docs/website.md +10 -7
- package/docs/youtube.md +1 -1
- package/package.json +1 -1
package/CHANGELOG.md
CHANGED
@@ -1,7 +1,14 @@
 # Changelog
 
-## 0.
+## 0.4.0 - 2025-12-21
+
+### Changes
 
+- Add URL extraction mode via `--extract` (deprecated alias: `--extract-only`) with `--format md|text`.
+- Rename HTML→Markdown conversion flag to `--markdown-mode` (deprecated alias: `--markdown`).
+- Add `--preprocess off|auto|always` and a `uvx markitdown` fallback for Markdown extraction and unsupported file attachments (when `--format md` is used).
+
+## 0.3.0 - 2025-12-20
 ### Changes
 
 - Add yt-dlp audio transcription fallback for YouTube; prefer OpenAI Whisper with FAL fallback. Thanks @dougvk.
@@ -94,7 +101,7 @@ First public release.
 - `--max-output-tokens <count>` (optional hard cap)
 - `--timeout <duration>` (default `2m`)
 - `--stream auto|on|off`, `--render auto|md-live|md|plain`
-- `--extract
+- `--extract` (URLs only; no summary; deprecated alias: `--extract-only`)
 - `--json` (structured output incl. input config, prompt, extracted content, LLM metadata, and metrics)
 - `--metrics off|on|detailed` (default `on`)
 - `--verbose`
@@ -103,7 +110,7 @@ First public release.
 
 - Websites: fetch + extract “article-ish” content + normalization for prompts.
 - Firecrawl fallback for blocked/thin sites (`--firecrawl off|auto|always`, via `FIRECRAWL_API_KEY`).
-- Markdown extraction for websites in `--extract
+- Markdown extraction for websites in `--extract` mode (`--format md|text`, `--markdown-mode off|auto|llm`).
 - YouTube (`--youtube auto|web|apify`):
 - best-effort transcript endpoints
 - optional Apify fallback (requires `APIFY_API_TOKEN`; single actor `faVsWy9VTSNVIhWpR`)
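The changelog entries above keep two old flag names alive as deprecated aliases (`--extract-only` for `--extract`, `--markdown` for `--markdown-mode`). One way such aliases could be handled is to rewrite them before normal argument parsing. The following is only a sketch under that assumption: `normalizeDeprecatedFlags` and `DEPRECATED_ALIASES` are hypothetical names, not taken from the package, and the actual implementation in `dist/esm/run.js` may work differently.

```typescript
// Hypothetical alias table. The flag names come from the changelog;
// the table and the function below are illustrative only.
const DEPRECATED_ALIASES = new Map<string, string>([
  ["--extract-only", "--extract"],
  ["--markdown", "--markdown-mode"],
]);

interface NormalizedArgs {
  argv: string[];     // arguments with deprecated names rewritten
  warnings: string[]; // one message per deprecated flag encountered
}

function normalizeDeprecatedFlags(argv: string[]): NormalizedArgs {
  const warnings: string[] = [];
  const rewritten = argv.map((arg) => {
    // Handle both `--flag value` and `--flag=value` spellings.
    const eq = arg.indexOf("=");
    const name = eq === -1 ? arg : arg.slice(0, eq);
    const rest = eq === -1 ? "" : arg.slice(eq);
    const replacement = DEPRECATED_ALIASES.get(name);
    if (replacement === undefined) return arg;
    warnings.push(`${name} is deprecated; use ${replacement}`);
    return replacement + rest;
  });
  return { argv: rewritten, warnings };
}

const result = normalizeDeprecatedFlags([
  "https://example.com",
  "--extract-only",
  "--markdown=llm",
]);
console.log(result.argv.join(" "));
console.log(result.warnings.join("\n"));
```

Rewriting up front keeps the rest of the parser unaware of the old names, so the aliases can later be dropped by deleting two table entries.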
package/README.md
CHANGED
@@ -112,7 +112,10 @@ npx -y @steipete/summarize <input> [flags]
 - `--max-output-tokens <count>`: hard cap for LLM output tokens (optional)
 - `--stream auto|on|off`: stream LLM output (`auto` = TTY only; disabled in `--json` mode)
 - `--render auto|md-live|md|plain`: Markdown rendering (`auto` = best default for TTY)
-- `--
+- `--format md|text`: website/file content format (default `text`)
+- `--preprocess off|auto|always`: preprocess files (only with `--format md`) for model compatibility (default `auto`)
+- `--extract`: print extracted content and exit (no summary) — only for URLs
+- Deprecated alias: `--extract-only`
 - `--json`: machine-readable output with diagnostics, prompt, `metrics`, and optional summary
 - `--verbose`: debug/diagnostics on stderr
 - `--metrics off|on|detailed`: metrics output (default `on`; `detailed` prints a breakdown to stderr)
@@ -122,8 +125,9 @@ npx -y @steipete/summarize <input> [flags]
 Non-YouTube URLs go through a “fetch → extract” pipeline. When the direct fetch/extraction is blocked or too thin, `--firecrawl auto` can fall back to Firecrawl (if configured).
 
 - `--firecrawl off|auto|always` (default `auto`)
-- `--
--
+- `--extract --format md|text` (default `text`)
+- `--markdown-mode off|auto|llm` (default `auto`; only affects `--format md` for non-YouTube URLs)
+- Plain-text mode: use `--format text`.
 
 ## YouTube transcripts
 
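The README lines above describe how three flags interact: `--format` defaults to `text`, `--preprocess` defaults to `auto` but only applies with `--format md`, and `--markdown-mode` defaults to `auto` but only affects `--format md` for non-YouTube URLs. A minimal sketch of that defaulting logic follows. The types and `resolveOptions` are hypothetical names introduced for illustration, and representing "not applicable" as `null` is an assumption, not something the package documents.

```typescript
type Format = "md" | "text";
type Preprocess = "off" | "auto" | "always";
type MarkdownMode = "off" | "auto" | "llm";

// Hypothetical shapes; the real CLI may model its flags differently.
interface CliFlags {
  format?: Format;
  preprocess?: Preprocess;
  markdownMode?: MarkdownMode;
}

interface InputInfo {
  kind: "url" | "file";
  isYouTube: boolean;
}

interface EffectiveOptions {
  format: Format;
  preprocess: Preprocess | null;     // null: flag has no effect here
  markdownMode: MarkdownMode | null; // null: flag has no effect here
}

function resolveOptions(flags: CliFlags, input: InputInfo): EffectiveOptions {
  const format: Format = flags.format ?? "text";
  const wantsMarkdown = format === "md";
  // Per the README, markdown-mode only matters for non-YouTube URLs.
  const markdownApplies =
    wantsMarkdown && input.kind === "url" && !input.isYouTube;
  return {
    format,
    preprocess: wantsMarkdown ? (flags.preprocess ?? "auto") : null,
    markdownMode: markdownApplies ? (flags.markdownMode ?? "auto") : null,
  };
}

console.log(resolveOptions({ format: "md" }, { kind: "url", isYouTube: false }));
console.log(resolveOptions({}, { kind: "url", isYouTube: false }));
```

Resolving everything in one place makes it easy to see which flags are silently ignored for a given input, which is the behavior the README's parenthetical notes describe.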