@steipete/summarize 0.3.0 → 0.4.0
- package/CHANGELOG.md +10 -3
- package/README.md +7 -3
- package/dist/cli.cjs +451 -133
- package/dist/cli.cjs.map +4 -4
- package/dist/esm/flags.js +18 -1
- package/dist/esm/flags.js.map +1 -1
- package/dist/esm/markitdown.js +54 -0
- package/dist/esm/markitdown.js.map +1 -0
- package/dist/esm/prompts/file.js +19 -0
- package/dist/esm/prompts/file.js.map +1 -1
- package/dist/esm/prompts/index.js +1 -1
- package/dist/esm/prompts/index.js.map +1 -1
- package/dist/esm/run.js +262 -35
- package/dist/esm/run.js.map +1 -1
- package/dist/esm/version.js +1 -1
- package/dist/types/flags.d.ts +4 -0
- package/dist/types/markitdown.d.ts +10 -0
- package/dist/types/prompts/file.d.ts +7 -0
- package/dist/types/prompts/index.d.ts +1 -1
- package/dist/types/run.d.ts +3 -1
- package/dist/types/version.d.ts +1 -1
- package/docs/README.md +1 -1
- package/docs/extract-only.md +10 -7
- package/docs/firecrawl.md +2 -2
- package/docs/site/docs/config.html +3 -3
- package/docs/site/docs/extract-only.html +7 -5
- package/docs/site/docs/firecrawl.html +6 -6
- package/docs/site/docs/index.html +2 -2
- package/docs/site/docs/llm.html +2 -2
- package/docs/site/docs/openai.html +2 -2
- package/docs/site/docs/website.html +7 -4
- package/docs/site/docs/youtube.html +2 -2
- package/docs/site/index.html +1 -1
- package/docs/website.md +10 -7
- package/docs/youtube.md +1 -1
- package/package.json +1 -1
@@ -31,7 +31,7 @@
 <a href="./index.html">Overview</a>
 <a href="./website.html">Website mode</a>
 <a href="./youtube.html">YouTube mode</a>
-<a href="./extract-only.html">Extract
+<a href="./extract-only.html">Extract</a>
 <a href="./llm.html">LLM</a>
 <a href="./openai.html">OpenAI</a>
 <a href="./firecrawl.html">Firecrawl</a>
@@ -45,7 +45,7 @@

 <h2>Notes</h2>
 <ul>
-<li>Some modes (like <code>--extract
+<li>Some modes (like <code>--extract</code>) don’t need an LLM at all.</li>
 <li>When output is used downstream, prefer <code>--json</code> and pin <code>--model</code>.</li>
 </ul>

@@ -31,7 +31,7 @@
 <a href="./index.html">Overview</a>
 <a href="./website.html">Website mode</a>
 <a href="./youtube.html">YouTube mode</a>
-<a href="./extract-only.html">Extract
+<a href="./extract-only.html">Extract</a>
 <a href="./llm.html">LLM</a>
 <a href="./openai.html">OpenAI</a>
 <a href="./firecrawl.html">Firecrawl</a>
@@ -41,20 +41,23 @@
 <article class="doc reveal">
 <p class="kicker">mode</p>
 <h1>Website mode</h1>
-<p>Fetch HTML → extract “article-ish” content → normalize to clean text. If extraction looks blocked or too thin, retry via Firecrawl Markdown (optional)
+<p>Fetch HTML → extract “article-ish” content → normalize to clean text. If extraction looks blocked or too thin, retry via Firecrawl Markdown (optional). With <code>--format md</code>, the CLI prefers Firecrawl Markdown when configured and can also convert HTML → Markdown via <code>--markdown-mode</code> (LLM) or <code>uvx markitdown</code>.</p>

 <h2>Flags</h2>
 <ul>
 <li><code>--firecrawl off|auto|always</code></li>
 <li><code>--timeout 30s|2m|5000ms</code> (default <code>2m</code>)</li>
-<li><code>--extract
+<li><code>--extract</code> (print extracted content; no summary call)</li>
+<li><code>--format md|text</code> (default <code>text</code>)</li>
+<li><code>--markdown-mode off|auto|llm</code> (only with <code>--format md</code>)</li>
+<li><code>--preprocess off|auto|always</code> (default <code>auto</code>; controls markitdown usage)</li>
 <li><code>--json</code> (emit a single JSON object)</li>
 <li><code>--verbose</code> (progress + which extractor was used)</li>
 <li><code>--metrics off|on|detailed</code></li>
 </ul>

 <div class="note">
-Plain-text mode: <code>--
+Plain-text mode: <code>--extract --format text</code>.
 </div>
 </article>
 </section>
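The `--timeout` values listed in the hunk above (`30s`, `2m`, `5000ms`) suggest a small duration grammar. The following is a minimal sketch of how such strings could be turned into milliseconds. The helper name `parseTimeoutMs` is hypothetical and is not taken from the package. Bare numbers are deliberately rejected here because the diff does not state their unit.

```typescript
// Hypothetical helper (not from the package): converts a duration string
// such as "30s", "2m", or "5000ms" into milliseconds.
// Returns null for anything outside those three suffixed forms.
function parseTimeoutMs(input: string): number | null {
  const match = /^(\d+)(ms|s|m)$/.exec(input.trim());
  if (match === null) {
    return null;
  }
  const amount = Number(match[1]);
  const unit = match[2];
  if (unit === "ms") {
    return amount;
  }
  if (unit === "s") {
    return amount * 1000;
  }
  // The only remaining unit the pattern allows is "m" (minutes).
  return amount * 60000;
}
```

Under this sketch the documented default of `2m` corresponds to `parseTimeoutMs("2m")`, which evaluates to `120000`.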
@@ -31,7 +31,7 @@
 <a href="./index.html">Overview</a>
 <a href="./website.html">Website mode</a>
 <a href="./youtube.html">YouTube mode</a>
-<a href="./extract-only.html">Extract
+<a href="./extract-only.html">Extract</a>
 <a href="./llm.html">LLM</a>
 <a href="./openai.html">OpenAI</a>
 <a href="./firecrawl.html">Firecrawl</a>
@@ -45,7 +45,7 @@

 <h2>Tip</h2>
 <ul>
-<li>If you only want the transcript: use <code>--extract
+<li>If you only want the transcript: use <code>--extract</code>.</li>
 <li>For pipelines: add <code>--json</code>.</li>
 </ul>
 </article>
package/docs/site/index.html
CHANGED
@@ -104,7 +104,7 @@
 </div>
 <div class="card reveal">
 <h2>Built for pipelines</h2>
-<p><code>--extract
+<p><code>--extract</code>, <code>--json</code>, and <code>--metrics</code> make it scriptable.</p>
 <div class="small">Compose it with your own tools</div>
 </div>
 <div class="card reveal">
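To illustrate the "Built for pipelines" card, here is a rough sketch of assembling an argument list from the flags named in this diff (`--extract`, `--json`, `--metrics`, `--model`). The function `buildSummarizeArgs`, the `PipelineOptions` shape, and the choice to put the URL first are all assumptions made for illustration. The diff confirms neither the argument order nor the name of the installed binary.

```typescript
// Hypothetical options shape; only the flag names come from the docs in this diff.
interface PipelineOptions {
  url: string;
  extract?: boolean;
  json?: boolean;
  metrics?: "off" | "on" | "detailed";
  model?: string;
}

// Builds an argument array suitable for passing to a process-spawning API.
// Assumption: the URL is a positional argument placed before the flags.
function buildSummarizeArgs(options: PipelineOptions): string[] {
  const args: string[] = [options.url];
  if (options.extract) {
    args.push("--extract");
  }
  if (options.json) {
    args.push("--json");
  }
  if (options.metrics !== undefined) {
    args.push("--metrics", options.metrics);
  }
  if (options.model !== undefined) {
    args.push("--model", options.model);
  }
  return args;
}
```

Passing the arguments as an array rather than one concatenated string avoids shell-quoting problems when the URL contains characters such as `&` or `?`.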
package/docs/website.md
CHANGED
@@ -7,21 +7,24 @@ Use this for non-YouTube URLs.
 - Fetches the page HTML.
 - Extracts “article-ish” content and normalizes it into clean text.
 - If extraction looks blocked or too thin, it can retry via Firecrawl (Markdown).
--
--
+- With `--format md`, the CLI prefers Firecrawl Markdown by default when `FIRECRAWL_API_KEY` is configured.
+- With `--format md`, `--markdown-mode auto|llm` can also convert HTML → Markdown via an LLM using the configured `--model` (no provider fallback).
+- With `--format md`, `--markdown-mode auto` may fall back to `uvx markitdown` when available (disable with `--preprocess off`).

 ## Flags

 - `--firecrawl off|auto|always`
-- `--
--
+- `--format md|text` (default: `text`)
+- `--markdown-mode off|auto|llm` (default: `auto`; only affects `--format md` for non-YouTube URLs)
+- `--preprocess off|auto|always` (default: `auto`; controls markitdown usage; `always` only affects file inputs)
+- Plain-text mode: use `--format text`.
 - `--timeout 30s|30|2m|5000ms` (default: `2m`)
-- `--extract
+- `--extract` (print extracted content; no summary LLM call)
 - `--json` (emit a single JSON object)
 - `--verbose` (progress + which extractor was used)
 - `--metrics off|on|detailed` (default: `on`; `detailed` prints a breakdown to stderr)

 ## API keys

-- Optional: `FIRECRAWL_API_KEY` (for the Firecrawl fallback)
-- Optional: `XAI_API_KEY` / `OPENAI_API_KEY` / `GEMINI_API_KEY` (also accepts `GOOGLE_GENERATIVE_AI_API_KEY` / `GOOGLE_API_KEY`) (required only when `--markdown llm` is used, or when `--markdown auto` falls back to LLM conversion)
+- Optional: `FIRECRAWL_API_KEY` (for the Firecrawl fallback / preferred Markdown output)
+- Optional: `XAI_API_KEY` / `OPENAI_API_KEY` / `GEMINI_API_KEY` (also accepts `GOOGLE_GENERATIVE_AI_API_KEY` / `GOOGLE_API_KEY`) (required only when `--markdown-mode llm` is used, or when `--markdown-mode auto` falls back to LLM conversion)
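The Markdown rules added to `website.md` above can be read as a preference order. The sketch below is one possible reading, not the package's actual logic. The names `MarkdownInputs` and `pickMarkdownStrategy` are hypothetical, and so are the exact precedence between the LLM and markitdown paths and the handling of a missing LLM key.

```typescript
type Strategy = "text" | "firecrawl" | "llm" | "markitdown" | "none";

// Hypothetical input shape; flag names and values mirror the documented flags.
interface MarkdownInputs {
  format: "md" | "text";
  firecrawl: "off" | "auto" | "always";
  markdownMode: "off" | "auto" | "llm";
  preprocess: "off" | "auto" | "always";
  hasFirecrawlKey: boolean;
  hasLlmKey: boolean;
  markitdownAvailable: boolean;
}

// One possible reading of the documented rules (ordering is an assumption).
function pickMarkdownStrategy(inputs: MarkdownInputs): Strategy {
  // Markdown conversion only applies with --format md.
  if (inputs.format === "text") {
    return "text";
  }
  // Firecrawl Markdown is preferred when FIRECRAWL_API_KEY is configured.
  if (inputs.firecrawl !== "off" && inputs.hasFirecrawlKey) {
    return "firecrawl";
  }
  // --markdown-mode llm requires an LLM key; there is no provider fallback.
  if (inputs.markdownMode === "llm") {
    return inputs.hasLlmKey ? "llm" : "none";
  }
  if (inputs.markdownMode === "auto") {
    if (inputs.hasLlmKey) {
      return "llm";
    }
    // auto may fall back to uvx markitdown unless --preprocess off is set.
    if (inputs.preprocess !== "off" && inputs.markitdownAvailable) {
      return "markitdown";
    }
  }
  return "none";
}
```

Keeping the decision in a pure function like this makes each documented rule easy to check in isolation, independent of network access or installed tools.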
package/docs/youtube.md
CHANGED