@steipete/summarize 0.1.2 → 0.2.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (39)
  1. package/CHANGELOG.md +44 -4
  2. package/README.md +6 -0
  3. package/dist/cli.cjs +2727 -1010
  4. package/dist/cli.cjs.map +4 -4
  5. package/dist/esm/content/asset.js +18 -0
  6. package/dist/esm/content/asset.js.map +1 -1
  7. package/dist/esm/content/link-preview/client.js +2 -0
  8. package/dist/esm/content/link-preview/client.js.map +1 -1
  9. package/dist/esm/content/link-preview/content/article.js +15 -1
  10. package/dist/esm/content/link-preview/content/article.js.map +1 -1
  11. package/dist/esm/content/link-preview/content/index.js +151 -4
  12. package/dist/esm/content/link-preview/content/index.js.map +1 -1
  13. package/dist/esm/flags.js +12 -2
  14. package/dist/esm/flags.js.map +1 -1
  15. package/dist/esm/llm/generate-text.js +74 -7
  16. package/dist/esm/llm/generate-text.js.map +1 -1
  17. package/dist/esm/pricing/litellm.js +4 -1
  18. package/dist/esm/pricing/litellm.js.map +1 -1
  19. package/dist/esm/prompts/file.js +15 -4
  20. package/dist/esm/prompts/file.js.map +1 -1
  21. package/dist/esm/prompts/link-summary.js +20 -6
  22. package/dist/esm/prompts/link-summary.js.map +1 -1
  23. package/dist/esm/run.js +505 -398
  24. package/dist/esm/run.js.map +1 -1
  25. package/dist/esm/version.js +1 -1
  26. package/dist/types/content/link-preview/client.d.ts +2 -1
  27. package/dist/types/content/link-preview/deps.d.ts +30 -0
  28. package/dist/types/content/link-preview/types.d.ts +1 -1
  29. package/dist/types/costs.d.ts +1 -1
  30. package/dist/types/pricing/litellm.d.ts +1 -0
  31. package/dist/types/prompts/file.d.ts +2 -1
  32. package/dist/types/version.d.ts +1 -1
  33. package/docs/extract-only.md +1 -1
  34. package/docs/firecrawl.md +2 -2
  35. package/docs/llm.md +7 -0
  36. package/docs/site/docs/config.html +1 -1
  37. package/docs/site/docs/firecrawl.html +1 -1
  38. package/docs/website.md +3 -3
  39. package/package.json +3 -2
package/CHANGELOG.md CHANGED
@@ -1,12 +1,52 @@
  # Changelog
 
- All notable changes to this project are documented here.
-
- ## 0.1.2 - 2025-12-20
+ ## 0.2.0 - 2025-12-20
+
+ ### Changes
+
+ - Remove map-reduce summarization; reject inputs that exceed the model’s context window.
+ - Preflight text prompts with the GPT tokenizer and the model’s max input tokens.
+ - Reject text files over 10 MB before tokenization.
+ - Reject too-small numeric `--length` and `--max-output-tokens` values.
+ - Cap summaries to the extracted content length when a requested size is larger.
+ - Skip summarization for tweets when extracted content is already below the requested length.
+ - Use bird CLI for tweet extraction when available and surface it in the status line.
+ - Fall back to Nitter for tweet extraction when bird fails; report a clear error when tweet data is unavailable.
+ - Compute cost totals via tokentally’s tally helpers.
+ - Improve fetch spinner with elapsed time and throughput updates.
+ - Show Firecrawl fallback status and reason when scraping kicks in.
+ - Enforce a hard deadline for stalled streaming LLM responses.
+ - Merge cumulative streaming chunks correctly and keep stream-merge for streaming output.
+ - Fall back to non-streaming when streaming requests time out.
+ - Preserve parentheses in URL paths when resolving inputs.
+ - Stop forcing Firecrawl for --extract-only; only use it as a fallback.
+ - Avoid Firecrawl fallback when block keywords only appear in scripts/styles.
+
+ ### Tests
+
+ - Add CLI + live coverage for prompt length capping.
+ - Add coverage for cumulative stream merge handling.
+ - Add coverage for streaming timeout fallback.
+ - Add live coverage for Wikipedia URLs with parentheses.
+ - Add coverage for tweet summaries that bypass the LLM when short.
+ - Add coverage for content budget paths and TOKENTALLY cache dir overrides.
+
+ ### Docs
+
+ - Update release checklist to all-in-one flow.
+ - Fix release script quoting.
+ - Document input limits and minimum length/token values.
+
+ ### Dev
+
+ - Add a tokenization benchmark script.
 
  ### Fixes
 
- - Avoid duplicate streamed output when providers emit cumulative chunks instead of deltas.
+ - Preserve balanced parentheses/brackets in URL paths (e.g. Wikipedia titles).
+ - Avoid Firecrawl fallback when block keywords only appear in scripts/styles.
+ - Add a Bird install tip when Twitter/X fetch fails without bird installed.
+ - Graceful error when tweet extraction fails after bird + Nitter fallback.
 
  ## 0.1.1 - 2025-12-19
 
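The 0.2.0 entry "Merge cumulative streaming chunks correctly" and the removed 0.1.2 fix about duplicate streamed output both refer to providers that resend the full text so far instead of only the new delta. The TypeScript below is a minimal sketch of one way to normalize both styles, assuming that a chunk beginning with everything already received is cumulative. `mergeStreamChunk` and `render` are hypothetical names, not the package's implementation.

```typescript
// Hypothetical helper, not taken from @steipete/summarize.
// Given the text accumulated so far and a new chunk, returns the updated
// accumulated text and the portion that is actually new (safe to print).
function mergeStreamChunk(
  accumulated: string,
  chunk: string,
): { text: string; emit: string } {
  // Assumption: a chunk that starts with everything received so far is
  // cumulative, so only the suffix after that prefix is new output.
  // (With an empty accumulator this also covers the very first chunk.)
  if (chunk.startsWith(accumulated)) {
    return { text: chunk, emit: chunk.slice(accumulated.length) };
  }
  // Otherwise treat the chunk as a plain delta and append it.
  return { text: accumulated + chunk, emit: chunk };
}

// Feeds a sequence of chunks through the merger and collects printed output.
function render(chunks: string[]): string {
  let accumulated = "";
  let printed = "";
  for (const chunk of chunks) {
    const { text, emit } = mergeStreamChunk(accumulated, chunk);
    accumulated = text;
    printed += emit;
  }
  return printed;
}

console.log(render(["Hel", "Hello", "Hello, world"])); // Hello, world
console.log(render(["Hel", "lo", ", world"])); // Hello, world
```

The prefix test is a heuristic: a genuine delta that happens to repeat the accumulated text would be misread as cumulative, so a real client may need per-provider knowledge of which chunk style it receives.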
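The URL entries (preserving balanced parentheses/brackets in URL paths, with Wikipedia titles as the example) describe a common problem when a URL is pulled out of surrounding text: a trailing `)` may close the prose or may be part of the path. A minimal sketch of the usual balancing rule follows, assuming a trailing `)` or `]` is kept only when the URL contains a matching opener. `trimUrlPunctuation` is a hypothetical helper, and the package's own resolver may work differently.

```typescript
// Hypothetical helper, not the package's URL resolver.
// Trims characters that likely belong to the surrounding prose from the end
// of a URL, but keeps a closing bracket when it balances an opener in the URL.
const CLOSER_TO_OPENER: { [closer: string]: string | undefined } = {
  ")": "(",
  "]": "[",
};

function countChar(text: string, ch: string): number {
  return text.split(ch).length - 1;
}

function trimUrlPunctuation(raw: string): string {
  let url = raw;
  while (url.length > 0) {
    const last = url.charAt(url.length - 1);
    if (".,;:!?".includes(last)) {
      url = url.slice(0, -1); // sentence punctuation after the URL
      continue;
    }
    const opener = CLOSER_TO_OPENER[last];
    if (opener === undefined) break; // ordinary URL character
    if (countChar(url, last) > countChar(url, opener)) {
      url = url.slice(0, -1); // unbalanced closer belongs to the prose
      continue;
    }
    break; // balanced closer is part of the path, e.g. Mercury_(planet)
  }
  return url;
}

console.log(trimUrlPunctuation("https://en.wikipedia.org/wiki/Mercury_(planet)"));
// https://en.wikipedia.org/wiki/Mercury_(planet)
console.log(trimUrlPunctuation("https://en.wikipedia.org/wiki/Mercury_(planet))."));
// https://en.wikipedia.org/wiki/Mercury_(planet)
console.log(trimUrlPunctuation("https://example.com/page)"));
// https://example.com/page
```

Counting brackets instead of always stripping keeps titles like `Mercury_(planet)` intact while still removing the `)` left over from text such as "(see https://example.com/page)". The sketch does not handle percent-encoded brackets.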
package/README.md CHANGED
@@ -89,6 +89,12 @@ npx -y @steipete/summarize "https://example.com" --length 20k
  - Character targets: `1500`, `20k`, `20000`
  - Optional hard cap: `--max-output-tokens <count>` (e.g. `2000`, `2k`)
  - Provider/model APIs still enforce their own maximum output limits.
+ - Minimums: `--length` numeric values must be ≥ 50 chars; `--max-output-tokens` must be ≥ 16.
+
+ ## Limits
+
+ - Text inputs over 10 MB are rejected before tokenization.
+ - Text prompts are preflighted against the model’s input limit (LiteLLM catalog), using a GPT tokenizer.
 
  ## Common flags
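The README hunk lists the accepted numeric forms (`1500`, `20k`, `20000`, `2k`) together with the new minimums and the 10 MB text-input limit. The TypeScript below is a minimal sketch of validating those values. `parseCount`, `parseLength`, `parseMaxOutputTokens`, `assertTextInputSize` and the constants are hypothetical names. The sketch also assumes that a case-insensitive `k` suffix means ×1000 and that "10 MB" is 10 × 1024 × 1024 bytes. Any non-numeric `--length` values the CLI may accept are out of scope.

```typescript
// Hypothetical validation helpers modeled on the README text; not the CLI's code.
const MIN_LENGTH_CHARS = 50; // README: numeric --length must be >= 50 chars
const MIN_MAX_OUTPUT_TOKENS = 16; // README: --max-output-tokens must be >= 16
// Assumption: "10 MB" read as 10 MiB; the package may use a different constant.
const MAX_TEXT_INPUT_BYTES = 10 * 1024 * 1024;

// Parses "1500", "20000", "20k", "2k" (assumed: k/K multiplies by 1000).
function parseCount(raw: string): number {
  const match = /^(\d+)(k)?$/i.exec(raw.trim());
  if (!match) throw new Error(`Unsupported count: ${raw}`);
  return Number(match[1]) * (match[2] ? 1000 : 1);
}

function parseLength(raw: string): number {
  const chars = parseCount(raw);
  if (chars < MIN_LENGTH_CHARS) {
    throw new Error(`--length must be at least ${MIN_LENGTH_CHARS} chars (got ${chars})`);
  }
  return chars;
}

function parseMaxOutputTokens(raw: string): number {
  const tokens = parseCount(raw);
  if (tokens < MIN_MAX_OUTPUT_TOKENS) {
    throw new Error(`--max-output-tokens must be at least ${MIN_MAX_OUTPUT_TOKENS} (got ${tokens})`);
  }
  return tokens;
}

// Checks the raw byte count so oversized input never reaches the tokenizer.
function assertTextInputSize(byteLength: number): void {
  if (byteLength > MAX_TEXT_INPUT_BYTES) {
    throw new Error(`Text input is ${byteLength} bytes; limit is ${MAX_TEXT_INPUT_BYTES}`);
  }
}

console.log(parseLength("20k")); // 20000
console.log(parseMaxOutputTokens("2k")); // 2000
```

Checking the byte length first is the cheap guard; the token-count preflight against the model's input limit would only run on input that passes it.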