@steipete/summarize 0.3.0 → 0.4.0

package/CHANGELOG.md CHANGED
@@ -1,7 +1,14 @@
  # Changelog
 
- ## 0.3.0 - 2025-12-20
+ ## 0.4.0 - 2025-12-21
+
+ ### Changes
 
+ - Add URL extraction mode via `--extract` (deprecated alias: `--extract-only`) with `--format md|text`.
+ - Rename HTML→Markdown conversion flag to `--markdown-mode` (deprecated alias: `--markdown`).
+ - Add `--preprocess off|auto|always` and a `uvx markitdown` fallback for Markdown extraction and unsupported file attachments (when `--format md` is used).
+
+ ## 0.3.0 - 2025-12-20
  ### Changes
 
  - Add yt-dlp audio transcription fallback for YouTube; prefer OpenAI Whisper with FAL fallback. Thanks @dougvk.
@@ -94,7 +101,7 @@ First public release.
  - `--max-output-tokens <count>` (optional hard cap)
  - `--timeout <duration>` (default `2m`)
  - `--stream auto|on|off`, `--render auto|md-live|md|plain`
- - `--extract-only` (URLs only; no summary)
+ - `--extract` (URLs only; no summary; deprecated alias: `--extract-only`)
  - `--json` (structured output incl. input config, prompt, extracted content, LLM metadata, and metrics)
  - `--metrics off|on|detailed` (default `on`)
  - `--verbose`
@@ -103,7 +110,7 @@ First public release.
 
  - Websites: fetch + extract “article-ish” content + normalization for prompts.
  - Firecrawl fallback for blocked/thin sites (`--firecrawl off|auto|always`, via `FIRECRAWL_API_KEY`).
- - Markdown extraction for websites in `--extract-only` mode (`--markdown off|auto|llm`).
+ - Markdown extraction for websites in `--extract` mode (`--format md|text`, `--markdown-mode off|auto|llm`).
  - YouTube (`--youtube auto|web|apify`):
  - best-effort transcript endpoints
  - optional Apify fallback (requires `APIFY_API_TOKEN`; single actor `faVsWy9VTSNVIhWpR`)
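
Note on the renames above: 0.4.0 keeps `--extract-only` and `--markdown` working as deprecated aliases of `--extract` and `--markdown-mode`. For wrapper scripts that still pass the old names, the mapping can be sketched as below. This is only an illustration of the alias table from the changelog, not the package's own argument parsing; `DEPRECATED_ALIASES` and `normalize_flags` are hypothetical names, and the sketch assumes flags arrive as separate tokens.

```python
import sys

# Deprecated flag -> current flag, taken from the 0.4.0 changelog entry.
# The dict and function names here are hypothetical, not part of the package.
DEPRECATED_ALIASES = {
    "--extract-only": "--extract",
    "--markdown": "--markdown-mode",
}


def normalize_flags(argv):
    """Return a copy of argv with deprecated flag names replaced by current ones."""
    normalized = []
    for arg in argv:
        current = DEPRECATED_ALIASES.get(arg)
        if current is None:
            # Not a deprecated flag: pass the token through unchanged.
            normalized.append(arg)
            continue
        # Tell the caller which name to migrate to, without touching stdout.
        print(f"{arg} is deprecated; use {current}", file=sys.stderr)
        normalized.append(current)
    return normalized


print(normalize_flags(["https://example.com", "--extract-only", "--markdown", "llm"]))
# prints ['https://example.com', '--extract', '--markdown-mode', 'llm']
```

Exact token matching keeps `--markdown-mode` from being rewritten a second time, since only the whole string `--markdown` is a key in the table.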
package/README.md CHANGED
@@ -112,7 +112,10 @@ npx -y @steipete/summarize <input> [flags]
  - `--max-output-tokens <count>`: hard cap for LLM output tokens (optional)
  - `--stream auto|on|off`: stream LLM output (`auto` = TTY only; disabled in `--json` mode)
  - `--render auto|md-live|md|plain`: Markdown rendering (`auto` = best default for TTY)
- - `--extract-only`: print extracted content and exit (no summary) — only for URLs
+ - `--format md|text`: website/file content format (default `text`)
+ - `--preprocess off|auto|always`: preprocess files (only with `--format md`) for model compatibility (default `auto`)
+ - `--extract`: print extracted content and exit (no summary) — only for URLs
+ - Deprecated alias: `--extract-only`
  - `--json`: machine-readable output with diagnostics, prompt, `metrics`, and optional summary
  - `--verbose`: debug/diagnostics on stderr
  - `--metrics off|on|detailed`: metrics output (default `on`; `detailed` prints a breakdown to stderr)
@@ -122,8 +125,9 @@ npx -y @steipete/summarize <input> [flags]
  Non-YouTube URLs go through a “fetch → extract” pipeline. When the direct fetch/extraction is blocked or too thin, `--firecrawl auto` can fall back to Firecrawl (if configured).
 
  - `--firecrawl off|auto|always` (default `auto`)
- - `--markdown off|auto|llm` (default `auto`; only affects `--extract-only` for non-YouTube URLs)
- - Plain-text mode: use `--firecrawl off --markdown off`.
+ - `--extract --format md|text` (default `text`)
+ - `--markdown-mode off|auto|llm` (default `auto`; only affects `--format md` for non-YouTube URLs)
+ - Plain-text mode: use `--format text`.
 
  ## YouTube transcripts