@steipete/summarize 0.9.0 → 0.11.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (398)
  1. package/CHANGELOG.md +121 -0
  2. package/LICENSE +1 -1
  3. package/README.md +391 -183
  4. package/dist/cli.js +1 -1
  5. package/dist/esm/cache.js +134 -64
  6. package/dist/esm/cache.js.map +1 -1
  7. package/dist/esm/cli-main.js +27 -27
  8. package/dist/esm/cli-main.js.map +1 -1
  9. package/dist/esm/cli.js +2 -2
  10. package/dist/esm/cli.js.map +1 -1
  11. package/dist/esm/config.js +396 -126
  12. package/dist/esm/config.js.map +1 -1
  13. package/dist/esm/content/asset.js +53 -50
  14. package/dist/esm/content/asset.js.map +1 -1
  15. package/dist/esm/content/index.js +1 -1
  16. package/dist/esm/content/index.js.map +1 -1
  17. package/dist/esm/costs.js +1 -1
  18. package/dist/esm/costs.js.map +1 -1
  19. package/dist/esm/daemon/agent.js +548 -0
  20. package/dist/esm/daemon/agent.js.map +1 -0
  21. package/dist/esm/daemon/auto-mode.js +3 -3
  22. package/dist/esm/daemon/auto-mode.js.map +1 -1
  23. package/dist/esm/daemon/chat.js +88 -178
  24. package/dist/esm/daemon/chat.js.map +1 -1
  25. package/dist/esm/daemon/cli-entrypoint.js +72 -0
  26. package/dist/esm/daemon/cli-entrypoint.js.map +1 -0
  27. package/dist/esm/daemon/cli.js +91 -83
  28. package/dist/esm/daemon/cli.js.map +1 -1
  29. package/dist/esm/daemon/config.js +15 -15
  30. package/dist/esm/daemon/config.js.map +1 -1
  31. package/dist/esm/daemon/constants.js +6 -6
  32. package/dist/esm/daemon/constants.js.map +1 -1
  33. package/dist/esm/daemon/env-merge.js.map +1 -1
  34. package/dist/esm/daemon/env-snapshot.js +36 -28
  35. package/dist/esm/daemon/env-snapshot.js.map +1 -1
  36. package/dist/esm/daemon/flow-context.js +86 -32
  37. package/dist/esm/daemon/flow-context.js.map +1 -1
  38. package/dist/esm/daemon/launchd.js +119 -47
  39. package/dist/esm/daemon/launchd.js.map +1 -1
  40. package/dist/esm/daemon/meta.js +5 -5
  41. package/dist/esm/daemon/meta.js.map +1 -1
  42. package/dist/esm/daemon/models.js +54 -31
  43. package/dist/esm/daemon/models.js.map +1 -1
  44. package/dist/esm/daemon/process-registry.js +206 -0
  45. package/dist/esm/daemon/process-registry.js.map +1 -0
  46. package/dist/esm/daemon/schtasks.js +96 -32
  47. package/dist/esm/daemon/schtasks.js.map +1 -1
  48. package/dist/esm/daemon/server.js +832 -158
  49. package/dist/esm/daemon/server.js.map +1 -1
  50. package/dist/esm/daemon/summarize-progress.js +11 -11
  51. package/dist/esm/daemon/summarize-progress.js.map +1 -1
  52. package/dist/esm/daemon/summarize.js +61 -32
  53. package/dist/esm/daemon/summarize.js.map +1 -1
  54. package/dist/esm/daemon/systemd.js +96 -35
  55. package/dist/esm/daemon/systemd.js.map +1 -1
  56. package/dist/esm/firecrawl.js +12 -12
  57. package/dist/esm/firecrawl.js.map +1 -1
  58. package/dist/esm/flags.js +55 -31
  59. package/dist/esm/flags.js.map +1 -1
  60. package/dist/esm/index.js +3 -3
  61. package/dist/esm/index.js.map +1 -1
  62. package/dist/esm/language.js +1 -1
  63. package/dist/esm/language.js.map +1 -1
  64. package/dist/esm/llm/cli.js +128 -64
  65. package/dist/esm/llm/cli.js.map +1 -1
  66. package/dist/esm/llm/errors.js +1 -1
  67. package/dist/esm/llm/errors.js.map +1 -1
  68. package/dist/esm/llm/generate-text.js +107 -98
  69. package/dist/esm/llm/generate-text.js.map +1 -1
  70. package/dist/esm/llm/google-models.js +17 -17
  71. package/dist/esm/llm/google-models.js.map +1 -1
  72. package/dist/esm/llm/html-to-markdown.js +3 -3
  73. package/dist/esm/llm/html-to-markdown.js.map +1 -1
  74. package/dist/esm/llm/model-id.js +38 -16
  75. package/dist/esm/llm/model-id.js.map +1 -1
  76. package/dist/esm/llm/prompt.js +5 -5
  77. package/dist/esm/llm/prompt.js.map +1 -1
  78. package/dist/esm/llm/providers/anthropic.js +33 -33
  79. package/dist/esm/llm/providers/anthropic.js.map +1 -1
  80. package/dist/esm/llm/providers/google.js +19 -19
  81. package/dist/esm/llm/providers/google.js.map +1 -1
  82. package/dist/esm/llm/providers/models.js +30 -30
  83. package/dist/esm/llm/providers/models.js.map +1 -1
  84. package/dist/esm/llm/providers/openai.js +36 -35
  85. package/dist/esm/llm/providers/openai.js.map +1 -1
  86. package/dist/esm/llm/providers/shared.js +8 -8
  87. package/dist/esm/llm/providers/shared.js.map +1 -1
  88. package/dist/esm/llm/transcript-to-markdown.js +9 -5
  89. package/dist/esm/llm/transcript-to-markdown.js.map +1 -1
  90. package/dist/esm/llm/usage.js +18 -18
  91. package/dist/esm/llm/usage.js.map +1 -1
  92. package/dist/esm/logging/daemon.js +21 -21
  93. package/dist/esm/logging/daemon.js.map +1 -1
  94. package/dist/esm/logging/ring-file.js +5 -5
  95. package/dist/esm/logging/ring-file.js.map +1 -1
  96. package/dist/esm/markitdown.js +21 -19
  97. package/dist/esm/markitdown.js.map +1 -1
  98. package/dist/esm/media-cache.js +251 -0
  99. package/dist/esm/media-cache.js.map +1 -0
  100. package/dist/esm/model-auto.js +175 -106
  101. package/dist/esm/model-auto.js.map +1 -1
  102. package/dist/esm/model-spec.js +52 -42
  103. package/dist/esm/model-spec.js.map +1 -1
  104. package/dist/esm/pricing/litellm.js +4 -4
  105. package/dist/esm/pricing/litellm.js.map +1 -1
  106. package/dist/esm/processes.js +2 -0
  107. package/dist/esm/processes.js.map +1 -0
  108. package/dist/esm/prompts/index.js +1 -1
  109. package/dist/esm/prompts/index.js.map +1 -1
  110. package/dist/esm/refresh-free.js +81 -81
  111. package/dist/esm/refresh-free.js.map +1 -1
  112. package/dist/esm/run/attachments.js +47 -44
  113. package/dist/esm/run/attachments.js.map +1 -1
  114. package/dist/esm/run/bird.js +125 -12
  115. package/dist/esm/run/bird.js.map +1 -1
  116. package/dist/esm/run/cache-state.js +7 -7
  117. package/dist/esm/run/cache-state.js.map +1 -1
  118. package/dist/esm/run/cli-fallback-state.js +45 -0
  119. package/dist/esm/run/cli-fallback-state.js.map +1 -0
  120. package/dist/esm/run/cli-preflight.js +40 -22
  121. package/dist/esm/run/cli-preflight.js.map +1 -1
  122. package/dist/esm/run/constants.js +12 -12
  123. package/dist/esm/run/constants.js.map +1 -1
  124. package/dist/esm/run/cookies/twitter.js +47 -47
  125. package/dist/esm/run/cookies/twitter.js.map +1 -1
  126. package/dist/esm/run/env.js +21 -15
  127. package/dist/esm/run/env.js.map +1 -1
  128. package/dist/esm/run/fetch-with-timeout.js +4 -4
  129. package/dist/esm/run/fetch-with-timeout.js.map +1 -1
  130. package/dist/esm/run/finish-line.js +78 -71
  131. package/dist/esm/run/finish-line.js.map +1 -1
  132. package/dist/esm/run/flows/asset/extract.js +70 -0
  133. package/dist/esm/run/flows/asset/extract.js.map +1 -0
  134. package/dist/esm/run/flows/asset/input.js +202 -37
  135. package/dist/esm/run/flows/asset/input.js.map +1 -1
  136. package/dist/esm/run/flows/asset/media-policy.js +3 -0
  137. package/dist/esm/run/flows/asset/media-policy.js.map +1 -0
  138. package/dist/esm/run/flows/asset/media.js +233 -0
  139. package/dist/esm/run/flows/asset/media.js.map +1 -0
  140. package/dist/esm/run/flows/asset/output.js +98 -0
  141. package/dist/esm/run/flows/asset/output.js.map +1 -0
  142. package/dist/esm/run/flows/asset/preprocess.js +79 -44
  143. package/dist/esm/run/flows/asset/preprocess.js.map +1 -1
  144. package/dist/esm/run/flows/asset/summary.js +306 -89
  145. package/dist/esm/run/flows/asset/summary.js.map +1 -1
  146. package/dist/esm/run/flows/url/extract.js +31 -31
  147. package/dist/esm/run/flows/url/extract.js.map +1 -1
  148. package/dist/esm/run/flows/url/flow.js +388 -82
  149. package/dist/esm/run/flows/url/flow.js.map +1 -1
  150. package/dist/esm/run/flows/url/markdown.js +61 -56
  151. package/dist/esm/run/flows/url/markdown.js.map +1 -1
  152. package/dist/esm/run/flows/url/slides-output.js +487 -0
  153. package/dist/esm/run/flows/url/slides-output.js.map +1 -0
  154. package/dist/esm/run/flows/url/slides-text.js +628 -0
  155. package/dist/esm/run/flows/url/slides-text.js.map +1 -0
  156. package/dist/esm/run/flows/url/summary.js +493 -152
  157. package/dist/esm/run/flows/url/summary.js.map +1 -1
  158. package/dist/esm/run/format.js +10 -10
  159. package/dist/esm/run/format.js.map +1 -1
  160. package/dist/esm/run/help.js +179 -84
  161. package/dist/esm/run/help.js.map +1 -1
  162. package/dist/esm/run/logging.js +20 -12
  163. package/dist/esm/run/logging.js.map +1 -1
  164. package/dist/esm/run/markdown.js +12 -12
  165. package/dist/esm/run/markdown.js.map +1 -1
  166. package/dist/esm/run/media-cache-state.js +33 -0
  167. package/dist/esm/run/media-cache-state.js.map +1 -0
  168. package/dist/esm/run/model-attempts.js.map +1 -1
  169. package/dist/esm/run/openrouter.js +11 -11
  170. package/dist/esm/run/openrouter.js.map +1 -1
  171. package/dist/esm/run/progress.js +19 -1
  172. package/dist/esm/run/progress.js.map +1 -1
  173. package/dist/esm/run/run-config.js +16 -16
  174. package/dist/esm/run/run-config.js.map +1 -1
  175. package/dist/esm/run/run-context.js +2 -2
  176. package/dist/esm/run/run-context.js.map +1 -1
  177. package/dist/esm/run/run-env.js +55 -54
  178. package/dist/esm/run/run-env.js.map +1 -1
  179. package/dist/esm/run/run-input.js +3 -3
  180. package/dist/esm/run/run-input.js.map +1 -1
  181. package/dist/esm/run/run-metrics.js +16 -16
  182. package/dist/esm/run/run-metrics.js.map +1 -1
  183. package/dist/esm/run/run-models.js +28 -23
  184. package/dist/esm/run/run-models.js.map +1 -1
  185. package/dist/esm/run/run-output.js +3 -3
  186. package/dist/esm/run/run-output.js.map +1 -1
  187. package/dist/esm/run/run-settings.js +108 -21
  188. package/dist/esm/run/run-settings.js.map +1 -1
  189. package/dist/esm/run/run-stream.js +4 -4
  190. package/dist/esm/run/run-stream.js.map +1 -1
  191. package/dist/esm/run/runner.js +327 -100
  192. package/dist/esm/run/runner.js.map +1 -1
  193. package/dist/esm/run/slides-cli.js +226 -0
  194. package/dist/esm/run/slides-cli.js.map +1 -0
  195. package/dist/esm/run/slides-render.js +163 -0
  196. package/dist/esm/run/slides-render.js.map +1 -0
  197. package/dist/esm/run/stdin-temp-file.js +77 -0
  198. package/dist/esm/run/stdin-temp-file.js.map +1 -0
  199. package/dist/esm/run/stream-output.js +17 -10
  200. package/dist/esm/run/stream-output.js.map +1 -1
  201. package/dist/esm/run/streaming.js +16 -16
  202. package/dist/esm/run/streaming.js.map +1 -1
  203. package/dist/esm/run/summary-engine.js +89 -57
  204. package/dist/esm/run/summary-engine.js.map +1 -1
  205. package/dist/esm/run/summary-llm.js +3 -3
  206. package/dist/esm/run/summary-llm.js.map +1 -1
  207. package/dist/esm/run/terminal.js +4 -4
  208. package/dist/esm/run/terminal.js.map +1 -1
  209. package/dist/esm/run/tips.js +2 -2
  210. package/dist/esm/run/tips.js.map +1 -1
  211. package/dist/esm/run/transcriber-cli.js +148 -0
  212. package/dist/esm/run/transcriber-cli.js.map +1 -0
  213. package/dist/esm/run.js +1 -1
  214. package/dist/esm/run.js.map +1 -1
  215. package/dist/esm/shared/contracts.js +1 -1
  216. package/dist/esm/shared/contracts.js.map +1 -1
  217. package/dist/esm/shared/sse-events.js +16 -12
  218. package/dist/esm/shared/sse-events.js.map +1 -1
  219. package/dist/esm/shared/streaming-merge.js +3 -3
  220. package/dist/esm/shared/streaming-merge.js.map +1 -1
  221. package/dist/esm/slides/extract.js +1951 -0
  222. package/dist/esm/slides/extract.js.map +1 -0
  223. package/dist/esm/slides/index.js +4 -0
  224. package/dist/esm/slides/index.js.map +1 -0
  225. package/dist/esm/slides/settings.js +73 -0
  226. package/dist/esm/slides/settings.js.map +1 -0
  227. package/dist/esm/slides/store.js +111 -0
  228. package/dist/esm/slides/store.js.map +1 -0
  229. package/dist/esm/slides/types.js +2 -0
  230. package/dist/esm/slides/types.js.map +1 -0
  231. package/dist/esm/tty/format.js +13 -13
  232. package/dist/esm/tty/format.js.map +1 -1
  233. package/dist/esm/tty/osc-progress.js +22 -2
  234. package/dist/esm/tty/osc-progress.js.map +1 -1
  235. package/dist/esm/tty/progress/fetch-html.js +20 -16
  236. package/dist/esm/tty/progress/fetch-html.js.map +1 -1
  237. package/dist/esm/tty/progress/transcript.js +127 -68
  238. package/dist/esm/tty/progress/transcript.js.map +1 -1
  239. package/dist/esm/tty/spinner.js +21 -10
  240. package/dist/esm/tty/spinner.js.map +1 -1
  241. package/dist/esm/tty/theme.js +189 -0
  242. package/dist/esm/tty/theme.js.map +1 -0
  243. package/dist/esm/tty/website-progress.js +38 -34
  244. package/dist/esm/tty/website-progress.js.map +1 -1
  245. package/dist/esm/version.js +29 -29
  246. package/dist/esm/version.js.map +1 -1
  247. package/dist/types/cache.d.ts +19 -7
  248. package/dist/types/config.d.ts +71 -6
  249. package/dist/types/content/asset.d.ts +8 -6
  250. package/dist/types/content/index.d.ts +1 -1
  251. package/dist/types/costs.d.ts +3 -3
  252. package/dist/types/daemon/agent.d.ts +25 -0
  253. package/dist/types/daemon/auto-mode.d.ts +3 -3
  254. package/dist/types/daemon/chat.d.ts +10 -18
  255. package/dist/types/daemon/cli-entrypoint.d.ts +2 -0
  256. package/dist/types/daemon/config.d.ts +2 -2
  257. package/dist/types/daemon/env-merge.d.ts +1 -1
  258. package/dist/types/daemon/env-snapshot.d.ts +1 -1
  259. package/dist/types/daemon/flow-context.d.ts +24 -4
  260. package/dist/types/daemon/launchd.d.ts +12 -0
  261. package/dist/types/daemon/models.d.ts +6 -2
  262. package/dist/types/daemon/process-registry.d.ts +73 -0
  263. package/dist/types/daemon/schtasks.d.ts +4 -0
  264. package/dist/types/daemon/server.d.ts +2 -2
  265. package/dist/types/daemon/summarize-progress.d.ts +1 -1
  266. package/dist/types/daemon/summarize.d.ts +38 -7
  267. package/dist/types/daemon/systemd.d.ts +4 -0
  268. package/dist/types/firecrawl.d.ts +1 -1
  269. package/dist/types/flags.d.ts +12 -11
  270. package/dist/types/index.d.ts +4 -4
  271. package/dist/types/language.d.ts +1 -1
  272. package/dist/types/llm/attachments.d.ts +1 -1
  273. package/dist/types/llm/cli.d.ts +3 -3
  274. package/dist/types/llm/generate-text.d.ts +7 -7
  275. package/dist/types/llm/html-to-markdown.d.ts +3 -3
  276. package/dist/types/llm/model-id.d.ts +1 -1
  277. package/dist/types/llm/prompt.d.ts +2 -2
  278. package/dist/types/llm/providers/anthropic.d.ts +3 -3
  279. package/dist/types/llm/providers/google.d.ts +3 -3
  280. package/dist/types/llm/providers/models.d.ts +2 -2
  281. package/dist/types/llm/providers/openai.d.ts +4 -4
  282. package/dist/types/llm/providers/shared.d.ts +2 -2
  283. package/dist/types/llm/transcript-to-markdown.d.ts +4 -2
  284. package/dist/types/llm/usage.d.ts +1 -1
  285. package/dist/types/logging/daemon.d.ts +4 -4
  286. package/dist/types/markitdown.d.ts +1 -1
  287. package/dist/types/media-cache.d.ts +22 -0
  288. package/dist/types/model-auto.d.ts +14 -4
  289. package/dist/types/model-spec.d.ts +10 -10
  290. package/dist/types/pricing/litellm.d.ts +1 -1
  291. package/dist/types/processes.d.ts +1 -0
  292. package/dist/types/prompts/index.d.ts +1 -1
  293. package/dist/types/run/attachments.d.ts +7 -7
  294. package/dist/types/run/bird.d.ts +7 -0
  295. package/dist/types/run/cache-state.d.ts +2 -2
  296. package/dist/types/run/cli-fallback-state.d.ts +6 -0
  297. package/dist/types/run/constants.d.ts +1 -1
  298. package/dist/types/run/cookies/twitter.d.ts +1 -1
  299. package/dist/types/run/env.d.ts +1 -1
  300. package/dist/types/run/finish-line.d.ts +7 -6
  301. package/dist/types/run/flows/asset/extract.d.ts +18 -0
  302. package/dist/types/run/flows/asset/input.d.ts +19 -3
  303. package/dist/types/run/flows/asset/media-policy.d.ts +2 -0
  304. package/dist/types/run/flows/asset/media.d.ts +21 -0
  305. package/dist/types/run/flows/asset/output.d.ts +42 -0
  306. package/dist/types/run/flows/asset/preprocess.d.ts +23 -17
  307. package/dist/types/run/flows/asset/summary.d.ts +24 -16
  308. package/dist/types/run/flows/url/extract.d.ts +3 -2
  309. package/dist/types/run/flows/url/flow.d.ts +1 -1
  310. package/dist/types/run/flows/url/markdown.d.ts +6 -6
  311. package/dist/types/run/flows/url/slides-output.d.ts +66 -0
  312. package/dist/types/run/flows/url/slides-text.d.ts +87 -0
  313. package/dist/types/run/flows/url/summary.d.ts +18 -10
  314. package/dist/types/run/flows/url/types.d.ts +52 -21
  315. package/dist/types/run/format.d.ts +3 -3
  316. package/dist/types/run/help.d.ts +4 -1
  317. package/dist/types/run/logging.d.ts +3 -2
  318. package/dist/types/run/media-cache-state.d.ts +7 -0
  319. package/dist/types/run/model-attempts.d.ts +1 -1
  320. package/dist/types/run/progress.d.ts +2 -1
  321. package/dist/types/run/run-config.d.ts +4 -4
  322. package/dist/types/run/run-context.d.ts +3 -1
  323. package/dist/types/run/run-env.d.ts +3 -1
  324. package/dist/types/run/run-input.d.ts +2 -2
  325. package/dist/types/run/run-metrics.d.ts +3 -3
  326. package/dist/types/run/run-models.d.ts +3 -2
  327. package/dist/types/run/run-output.d.ts +1 -1
  328. package/dist/types/run/run-settings.d.ts +20 -5
  329. package/dist/types/run/run-stream.d.ts +2 -2
  330. package/dist/types/run/runner.d.ts +3 -2
  331. package/dist/types/run/slides-cli.d.ts +9 -0
  332. package/dist/types/run/slides-render.d.ts +30 -0
  333. package/dist/types/run/stdin-temp-file.d.ts +9 -0
  334. package/dist/types/run/stream-output.d.ts +3 -2
  335. package/dist/types/run/streaming.d.ts +4 -4
  336. package/dist/types/run/summary-engine.d.ts +22 -12
  337. package/dist/types/run/summary-llm.d.ts +5 -5
  338. package/dist/types/run/transcriber-cli.d.ts +8 -0
  339. package/dist/types/run/types.d.ts +4 -4
  340. package/dist/types/run.d.ts +1 -1
  341. package/dist/types/shared/contracts.d.ts +2 -2
  342. package/dist/types/shared/sse-events.d.ts +26 -6
  343. package/dist/types/slides/extract.d.ts +43 -0
  344. package/dist/types/slides/index.d.ts +5 -0
  345. package/dist/types/slides/settings.d.ts +20 -0
  346. package/dist/types/slides/store.d.ts +15 -0
  347. package/dist/types/slides/types.d.ts +40 -0
  348. package/dist/types/tty/osc-progress.d.ts +5 -5
  349. package/dist/types/tty/progress/fetch-html.d.ts +5 -3
  350. package/dist/types/tty/progress/transcript.d.ts +5 -3
  351. package/dist/types/tty/spinner.d.ts +3 -1
  352. package/dist/types/tty/theme.d.ts +44 -0
  353. package/dist/types/tty/website-progress.d.ts +5 -3
  354. package/dist/types/version.d.ts +1 -1
  355. package/docs/README.md +1 -1
  356. package/docs/_config.yml +26 -0
  357. package/docs/_layouts/default.html +60 -0
  358. package/docs/agent.md +367 -0
  359. package/docs/assets/site.css +748 -0
  360. package/docs/assets/site.js +72 -0
  361. package/docs/assets/summarize-cli.png +0 -0
  362. package/docs/assets/summarize-extension.png +0 -0
  363. package/docs/assets/youtube-slides.png +0 -0
  364. package/docs/cache.md +29 -3
  365. package/docs/chrome-extension.md +72 -16
  366. package/docs/cli.md +59 -13
  367. package/docs/config.md +109 -12
  368. package/docs/extract-only.md +10 -0
  369. package/docs/index.html +224 -0
  370. package/docs/index.md +25 -0
  371. package/docs/llm.md +18 -5
  372. package/docs/manual-tests.md +2 -0
  373. package/docs/media.md +6 -2
  374. package/docs/model-auto.md +3 -2
  375. package/docs/nvidia-onnx-transcription.md +55 -0
  376. package/docs/openai.md +1 -1
  377. package/docs/releasing.md +3 -0
  378. package/docs/site/404.html +4 -1
  379. package/docs/site/assets/site.css +399 -228
  380. package/docs/site/assets/site.js +46 -46
  381. package/docs/site/assets/summarize-cli.png +0 -0
  382. package/docs/site/assets/summarize-extension.png +0 -0
  383. package/docs/site/docs/chrome-extension.html +101 -0
  384. package/docs/site/docs/config.html +30 -8
  385. package/docs/site/docs/extract-only.html +17 -4
  386. package/docs/site/docs/firecrawl.html +13 -3
  387. package/docs/site/docs/index.html +40 -6
  388. package/docs/site/docs/llm.html +20 -5
  389. package/docs/site/docs/openai.html +19 -5
  390. package/docs/site/docs/website.html +30 -9
  391. package/docs/site/docs/youtube.html +13 -3
  392. package/docs/site/index.html +168 -85
  393. package/docs/slides.md +82 -0
  394. package/docs/smoketest.md +29 -20
  395. package/docs/timestamps.md +124 -0
  396. package/docs/website.md +13 -0
  397. package/docs/youtube.md +20 -0
  398. package/package.json +57 -48
package/README.md CHANGED
@@ -1,17 +1,98 @@
1
- # Summarize 👉 Point at any URL or file. Get the gist.
1
+ # Summarize 📝 Chrome Side Panel + CLI
2
2
 
3
- Fast CLI for summarizing *anything you can point at*:
3
+ ![GitHub Repo Banner](https://ghrb.waren.build/banner?header=Summarize%F0%9F%93%9D&subheader=Chrome+Side+Panel+%2B+CLI&bg=f3f4f6&color=1f2937&support=true)
4
4
 
5
- - Web pages (article extraction; Firecrawl fallback if sites block agents)
6
- - YouTube links (best-effort transcripts; can fall back to audio transcription)
7
- - Podcasts (Apple Podcasts / Spotify / RSS; prefers published transcripts when available; otherwise transcribes full episodes)
8
- - Any audio/video (local files or direct media URLs; transcribe via Whisper, then summarize)
9
- - Remote files (PDFs/images/audio/video via URL — downloaded and forwarded to the model)
10
- - Local files (PDFs/images/audio/video/text — forwarded or inlined; support depends on provider/model)
5
+ <!-- Created with GitHub Repo Banner by Waren Gonzaga: https://ghrb.waren.build -->
11
6
 
12
- It streams output by default on TTY and renders Markdown to ANSI (via `markdansi`) using scrollback-safe hybrid streaming (line-by-line, but buffers fenced code blocks and tables as blocks). At the end it prints a single “Finished in …” line with timing, token usage, and a best-effort cost estimate (when pricing is available).
7
+ Fast summaries from URLs, files, and media. Works in the terminal, a Chrome Side Panel and Firefox Sidebar.
13
8
 
14
- ## Install
9
+ **0.11.0 preview (unreleased):** this README reflects the upcoming release.
10
+
11
+ ## 0.11.0 preview highlights (most interesting first)
12
+
13
+ - Chrome Side Panel **chat** (streaming agent + history) inside the sidebar.
14
+ - **YouTube slides**: screenshots + OCR + transcript cards, timestamped seek, OCR/Transcript toggle.
15
+ - Media-aware summaries: auto‑detect video/audio vs page content.
16
+ - Streaming Markdown + metrics + cache‑aware status.
17
+ - CLI supports URLs, files, podcasts, YouTube, audio/video, PDFs.
18
+
19
+ ## Feature overview
20
+
21
+ - URLs, files, and media: web pages, PDFs, images, audio/video, YouTube, podcasts, RSS.
22
+ - Slide extraction for video sources (YouTube/direct media) with OCR + timestamped cards.
23
+ - Transcript-first media flow: published transcripts when available, Whisper fallback when not.
24
+ - Streaming output with Markdown rendering, metrics, and cache-aware status.
25
+ - Local, paid, and free models: OpenAI‑compatible local endpoints, paid providers, plus an OpenRouter free preset.
26
+ - Output modes: Markdown/text, JSON diagnostics, extract-only, metrics, timing, and cost estimates.
27
+ - Smart default: if content is shorter than the requested length, we return it as-is (use `--force-summary` to override).
28
+
29
+ ## Get the extension (recommended)
30
+
31
+ ![Summarize extension screenshot](docs/assets/summarize-extension.png)
32
+
33
+ One‑click summarizer for the current tab. Chrome Side Panel + Firefox Sidebar + local daemon for streaming Markdown.
34
+
35
+ **Chrome Web Store:** [Summarize Side Panel](https://chromewebstore.google.com/detail/summarize/cejgnmmhbbpdmjnfppjdfkocebngehfg)
36
+
37
+ YouTube slide screenshots (from the browser):
38
+
39
+ ![Summarize YouTube slide screenshots](docs/assets/youtube-slides.png)
40
+
41
+ ### Beginner quickstart (extension)
42
+
43
+ 1. Install the CLI (choose one):
44
+ - **npm** (cross‑platform): `npm i -g @steipete/summarize`
45
+ - **Homebrew** (macOS arm64): `brew install steipete/tap/summarize`
46
+ 2. Install the extension (Chrome Web Store link above) and open the Side Panel.
47
+ 3. The panel shows a token + install command. Run it in Terminal:
48
+ - `summarize daemon install --token <TOKEN>`
49
+
50
+ Why a daemon/service?
51
+
52
+ - The extension can’t run heavy extraction inside the browser. It talks to a local background service on `127.0.0.1` for fast streaming and media tools (yt‑dlp, ffmpeg, OCR, transcription).
53
+ - The service autostarts (launchd/systemd/Scheduled Task) so the Side Panel is always ready.
54
+
55
+ If you only want the **CLI**, you can skip the daemon install entirely.
56
+
57
+ Notes:
58
+
59
+ - Summarization only runs when the Side Panel is open.
60
+ - Auto mode summarizes on navigation (incl. SPAs); otherwise use the button.
61
+ - Daemon is localhost-only and requires a shared token.
62
+ - Autostart: macOS (launchd), Linux (systemd user), Windows (Scheduled Task).
63
+ - Tip: configure `free` via `summarize refresh-free` (needs `OPENROUTER_API_KEY`). Add `--set-default` to set model=`free`.
64
+
65
+ More:
66
+
67
+ - Step-by-step install: [apps/chrome-extension/README.md](apps/chrome-extension/README.md)
68
+ - Architecture + troubleshooting: [docs/chrome-extension.md](docs/chrome-extension.md)
69
+ - Firefox compatibility notes: [apps/chrome-extension/docs/firefox.md](apps/chrome-extension/docs/firefox.md)
70
+
71
+ ### Slides (extension)
72
+
73
+ - Select **Video + Slides** in the Summarize picker.
74
+ - Slides render at the top; expand to full‑width cards with timestamps.
75
+ - Click a slide to seek the video; toggle **Transcript/OCR** when OCR is significant.
76
+ - Requirements: `yt-dlp` + `ffmpeg` for extraction; `tesseract` for OCR. Missing tools show an in‑panel notice.
77
+
78
+ ### Advanced (unpacked / dev)
79
+
80
+ 1. Build + load the extension (unpacked):
81
+ - Chrome: `pnpm -C apps/chrome-extension build`
82
+ - `chrome://extensions` → Developer mode → Load unpacked
83
+ - Pick: `apps/chrome-extension/.output/chrome-mv3`
84
+ - Firefox: `pnpm -C apps/chrome-extension build:firefox`
85
+ - `about:debugging#/runtime/this-firefox` → Load Temporary Add-on
86
+ - Pick: `apps/chrome-extension/.output/firefox-mv3/manifest.json`
87
+ 2. Open Side Panel/Sidebar → copy token.
88
+ 3. Install daemon in dev mode:
89
+ - `pnpm summarize daemon install --token <TOKEN> --dev`
90
+
91
+ ## CLI
92
+
93
+ ![Summarize CLI screenshot](docs/assets/summarize-cli.png)
94
+
95
+ ### Install
15
96
 
16
97
  Requires Node 22+.
17
98
 
@@ -21,7 +102,7 @@ Requires Node 22+.
21
102
  npx -y @steipete/summarize "https://example.com"
22
103
  ```
23
104
 
24
- - npm (global install):
105
+ - npm (global):
25
106
 
26
107
  ```bash
27
108
  npm i -g @steipete/summarize
@@ -34,7 +115,7 @@ npm i @steipete/summarize-core
34
115
  ```
35
116
 
36
117
  ```ts
37
- import { createLinkPreviewClient } from '@steipete/summarize-core/content'
118
+ import { createLinkPreviewClient } from "@steipete/summarize-core/content";
38
119
  ```
39
120
 
40
121
  - Homebrew (custom tap):
@@ -45,110 +126,112 @@ brew install steipete/tap/summarize
45
126
 
46
127
  Apple Silicon only (arm64).
47
128
 
48
- ## Quickstart
129
+ ### CLI vs extension
130
+
131
+ - **CLI only:** just install via npm/Homebrew and run `summarize ...` (no daemon needed).
132
+ - **Chrome/Firefox extension:** install the CLI **and** run `summarize daemon install --token <TOKEN>` so the Side Panel can stream results and use local tools.
133
+
134
+ ### Quickstart
49
135
 
50
136
  ```bash
51
137
  summarize "https://example.com"
52
138
  ```
53
139
 
54
- ## Chrome Extension (Side Panel)
140
+ ### Inputs
55
141
 
56
- Want a one-click “always-on” summarizer in Chrome (real Side Panel, not injected UI)?
142
+ URLs or local paths:
57
143
 
58
- This is a **Chrome extension** + a tiny local **daemon** (autostart service) that streams Markdown summaries for the **currently visible tab** into the Side Panel.
59
-
60
- - Step-by-step install (Chrome + daemon): `apps/chrome-extension/README.md`
61
- - Architecture + troubleshooting: `docs/chrome-extension.md`
62
-
63
- Quickstart (local daemon):
64
-
65
- 1) Install summarize (choose one):
66
- - `npm i -g @steipete/summarize`
67
- - `brew install steipete/tap/summarize` (macOS arm64)
68
- 2) Build + load the extension (unpacked):
69
- - `pnpm -C apps/chrome-extension build`
70
- - Chrome → `chrome://extensions` → Developer mode → “Load unpacked”
71
- - Pick: `apps/chrome-extension/.output/chrome-mv3`
72
- 3) Open the Side Panel → it shows a token + install command.
73
- 4) Run the install command in Terminal:
74
- - Installed binary: `summarize daemon install --token <TOKEN>`
75
- - Repo/dev checkout: `pnpm summarize daemon install --token <TOKEN> --dev`
76
- 5) Verify / debug:
77
- - `summarize daemon status`
78
- - `summarize daemon restart`
144
+ ```bash
145
+ summarize "/path/to/file.pdf" --model google/gemini-3-flash-preview
146
+ summarize "https://example.com/report.pdf" --model google/gemini-3-flash-preview
147
+ summarize "/path/to/audio.mp3"
148
+ summarize "/path/to/video.mp4"
149
+ ```
79
150
 
80
- Notes:
151
+ Stdin (pipe content using `-`):
81
152
 
82
- - Summarization only runs when the Side Panel is open.
83
- - “Auto” mode summarizes on navigation (incl. SPAs); otherwise use the button.
84
- - The daemon is localhost-only and requires a shared token.
85
- - Daemon autostart: macOS (launchd), Linux (systemd user), Windows (Scheduled Task).
86
- - Tip: configure `free` via `summarize refresh-free` (requires `OPENROUTER_API_KEY`). Add `--set-default` to also set model=`free`, then set Model to `free` in extension settings.
153
+ ```bash
154
+ echo "content" | summarize -
155
+ pbpaste | summarize -
156
+ # binary stdin also works (PDF/image/audio/video bytes)
157
+ cat /path/to/file.pdf | summarize -
158
+ ```
87
159
 
88
- Troubleshooting:
160
+ **Notes:**
89
161
 
90
- - **“Receiving end does not exist”**: Chrome didn’t inject the content script yet.
91
- - Extension details “Site access” → set to “On all sites” (or allow this domain)
92
- - Reload the tab once.
93
- - **“Failed to fetch” / daemon unreachable**:
94
- - Run `summarize daemon status`
95
- - Check logs: `~/.summarize/logs/daemon.err.log`
162
+ - Stdin has a 50MB size limit
163
+ - The `-` argument tells summarize to read from standard input
164
+ - Text stdin is treated as UTF-8 text (whitespace-only input is rejected as empty)
165
+ - Binary stdin is preserved as raw bytes and file type is auto-detected when possible
166
+ - Useful for piping clipboard content or command output
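
The stdin notes added in this hunk (size limit, UTF-8 text vs. raw bytes, whitespace-only input rejected) can be illustrated with a minimal sketch. This is not code from the package: `classifyStdin`, `StdinInput`, and `MAX_STDIN_BYTES` are hypothetical names, the exact byte count behind "50MB" is assumed to be 50 × 1024 × 1024, and the text/binary heuristic (invalid UTF-8 or a NUL character means binary) is an assumption standing in for whatever file-type detection the CLI actually performs.

```ts
// Hypothetical sketch of the stdin rules described in the README notes.
// Names, the byte limit, and the binary heuristic are assumptions.
const MAX_STDIN_BYTES = 50 * 1024 * 1024; // README says "50MB"; exact value assumed

type StdinInput =
  | { kind: "text"; text: string }
  | { kind: "binary"; bytes: Uint8Array };

function classifyStdin(bytes: Uint8Array): StdinInput {
  if (bytes.byteLength > MAX_STDIN_BYTES) {
    throw new Error("stdin exceeds the 50MB limit");
  }

  let text: string;
  try {
    // fatal: true makes invalid UTF-8 throw instead of inserting U+FFFD.
    text = new TextDecoder("utf-8", { fatal: true }).decode(bytes);
  } catch {
    // Not valid UTF-8: keep the raw bytes untouched.
    return { kind: "binary", bytes };
  }

  // Assumed heuristic: a NUL character marks the payload as binary.
  if (text.includes("\u0000")) {
    return { kind: "binary", bytes };
  }

  // Whitespace-only (or zero-length) text is rejected as empty.
  if (text.trim().length === 0) {
    throw new Error("stdin is empty");
  }

  return { kind: "text", text };
}
```

In a real CLI the bytes would come from reading `process.stdin` to completion; the function is kept pure here so the three rules are easy to check in isolation.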
96
167
 
97
- Input can be a URL or a local file path:
168
+ YouTube (supports `youtube.com` and `youtu.be`):
98
169
 
99
170
  ```bash
100
- npx -y @steipete/summarize "/path/to/file.pdf" --model google/gemini-3-flash-preview
101
- npx -y @steipete/summarize "/path/to/image.jpeg" --model google/gemini-3-flash-preview
171
+ summarize "https://youtu.be/dQw4w9WgXcQ" --youtube auto
102
172
  ```
103
173
 
104
- Remote file URLs work the same (best-effort; the file is downloaded and passed to the model):
174
+ Podcast RSS (transcribes latest enclosure):
105
175
 
106
176
  ```bash
107
- npx -y @steipete/summarize "https://example.com/report.pdf" --model google/gemini-3-flash-preview
177
+ summarize "https://feeds.npr.org/500005/podcast.xml"
108
178
  ```
109
179
 
110
- YouTube (supports `youtube.com` and `youtu.be`):
180
+ Apple Podcasts episode page:
111
181
 
112
182
  ```bash
113
- npx -y @steipete/summarize "https://youtu.be/dQw4w9WgXcQ" --youtube auto
183
+ summarize "https://podcasts.apple.com/us/podcast/2424-jelly-roll/id360084272?i=1000740717432"
114
184
  ```
115
185
 
116
- Podcast RSS feed (transcribes latest episode enclosure):
186
+ Spotify episode page (best-effort; may fail for exclusives):
117
187
 
118
188
  ```bash
119
- npx -y @steipete/summarize "https://feeds.npr.org/500005/podcast.xml"
189
+ summarize "https://open.spotify.com/episode/5auotqWAXhhKyb9ymCuBJY"
120
190
  ```
121
191
 
122
- Apple Podcasts episode page (extracts stream URL, transcribes via Whisper):
123
-
124
- ```bash
125
- npx -y @steipete/summarize "https://podcasts.apple.com/us/podcast/2424-jelly-roll/id360084272?i=1000740717432"
126
- ```
192
+ ### Output length
127
193
 
128
- Spotify episode page (best-effort; resolves to full episode via iTunes/RSS enclosure when available — not preview clips; may fail for Spotify-exclusive shows):
194
+ `--length` controls how much output we ask for (guideline), not a hard cap.
129
195
 
130
196
  ```bash
131
- npx -y @steipete/summarize "https://open.spotify.com/episode/5auotqWAXhhKyb9ymCuBJY"
197
+ summarize "https://example.com" --length long
198
+ summarize "https://example.com" --length 20k
132
199
  ```
133
200
 
134
- ## What file types work?
201
+ - Presets: `short|medium|long|xl|xxl`
202
+ - Character targets: `1500`, `20k`, `20000`
203
+ - Optional hard cap: `--max-output-tokens <count>` (e.g. `2000`, `2k`)
204
+ - Provider/model APIs still enforce their own maximum output limits.
205
+ - If omitted, no max token parameter is sent (provider default).
206
+ - Prefer `--length` unless you need a hard cap.
207
+ - Short content: when extracted content is shorter than the requested length, the CLI returns the content as-is.
208
+ - Override with `--force-summary` to always run the LLM.
209
+ - Minimums: `--length` numeric values must be >= 50 chars; `--max-output-tokens` must be >= 16.
210
+ - Preset targets (source of truth: `packages/core/src/prompts/summary-lengths.ts`):
211
+ - short: target ~900 chars (range 600-1,200)
212
+ - medium: target ~1,800 chars (range 1,200-2,500)
213
+ - long: target ~4,200 chars (range 2,500-6,000)
214
+ - xl: target ~9,000 chars (range 6,000-14,000)
215
+ - xxl: target ~17,000 chars (range 14,000-22,000)
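Reviewer sketch (not part of the package diff): the `--length` rules above can be read as a small parser. The following is a minimal TypeScript illustration, assuming `k` means "times 1,000" and that only whole numbers are accepted. The names `parseLength`, `PRESET_TARGETS`, and `LengthTarget` are hypothetical and do not come from the package source; the short aliases `s|m|l` are deliberately left out.

```typescript
// Illustrative only: names and shapes are assumptions, not the package's real API.
type LengthPreset = "short" | "medium" | "long" | "xl" | "xxl";

interface LengthTarget {
  targetChars: number;
  source: "preset" | "chars";
}

// Preset targets as listed in the README text above.
const PRESET_TARGETS: Record<LengthPreset, number> = {
  short: 900,
  medium: 1800,
  long: 4200,
  xl: 9000,
  xxl: 17000,
};

// README: numeric --length values must be >= 50 chars.
const MIN_LENGTH_CHARS = 50;

function isPreset(value: string): value is LengthPreset {
  return Object.prototype.hasOwnProperty.call(PRESET_TARGETS, value);
}

function parseLength(raw: string): LengthTarget {
  const value = raw.trim().toLowerCase();
  if (isPreset(value)) {
    return { targetChars: PRESET_TARGETS[value], source: "preset" };
  }
  // Accept plain counts ("1500", "20000") and a "k" suffix ("20k").
  // Assumption: "k" multiplies by 1,000.
  const match = /^(\d+)(k?)$/.exec(value);
  if (!match) {
    throw new Error(`Unsupported --length value: ${raw}`);
  }
  const chars = Number(match[1]) * (match[2] === "k" ? 1000 : 1);
  if (chars < MIN_LENGTH_CHARS) {
    throw new Error(`--length must be >= ${MIN_LENGTH_CHARS} chars`);
  }
  return { targetChars: chars, source: "chars" };
}
```

The result is a guideline for the prompt, matching the README's point that `--length` is not a hard cap; a hard cap would go through `--max-output-tokens` instead.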
216
+
217
+ ### What file types work?
135
218
 
136
- This is “best effort” and depends on what your selected model/provider accepts. In practice these usually work well:
219
+ Best effort and provider-dependent. These usually work well:
137
220
 
138
- - `text/*` and common structured text (`.txt`, `.md`, `.json`, `.yaml`, `.xml`, ...)
139
- - text-like files are **inlined into the prompt** (instead of attached as a file part) for better provider compatibility
140
- - PDFs: `application/pdf` (provider support varies; Google is the most reliable in this CLI right now)
221
+ - `text/*` and common structured text (`.txt`, `.md`, `.json`, `.yaml`, `.xml`, ...)
222
+ - Text-like files are inlined into the prompt for better provider compatibility.
223
+ - PDFs: `application/pdf` (provider support varies; Google is the most reliable here)
141
224
  - Images: `image/jpeg`, `image/png`, `image/webp`, `image/gif`
142
- - Audio/Video: `audio/*`, `video/*` (when supported by the model)
225
+ - Audio/Video: `audio/*`, `video/*` (local MP3/WAV/M4A/OGG/FLAC/MP4/MOV/WEBM files are transcribed automatically, when supported by the model)
143
226
 
144
227
  Notes:
145
228
 
146
- - If a provider rejects a media type, the CLI fails fast with a friendly message (no “mystery stack traces”).
147
- - xAI models currently don’t support attaching generic files (like PDFs) via the AI SDK; use a Google/OpenAI/Anthropic model for those.
229
+ - If a provider rejects a media type, the CLI fails fast with a friendly message.
230
+ - xAI models do not support attaching generic files (like PDFs) via the AI SDK; use Google/OpenAI/Anthropic for those.
148
231
 
149
- ## Model ids
232
+ ### Model ids
150
233
 
151
- Use gateway-style ids: `<provider>/<model>`.
234
+ Use gateway-style ids: `<provider>/<model>`.
152
235
 
153
236
  Examples:
154
237
 
@@ -159,112 +242,110 @@ Examples:
159
242
  - `zai/glm-4.7`
160
243
  - `openrouter/openai/gpt-5-mini` (force OpenRouter)
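Reviewer sketch (not part of the package diff): a gateway-style id splits at the first `/`, so `openrouter/openai/gpt-5-mini` keeps `openai/gpt-5-mini` as the model part. The TypeScript below is a minimal illustration under that assumption; `parseModelId` and `ModelRef` are hypothetical names, and special values such as `auto`, `free`, or config-defined names are not gateway ids and are rejected here.

```typescript
// Illustrative only: not the package's real parser.
interface ModelRef {
  provider: string;
  model: string;
}

function parseModelId(id: string): ModelRef {
  // Split at the FIRST slash so nested ids (e.g. OpenRouter's
  // "<author>/<slug>") stay intact in the model part.
  const slash = id.indexOf("/");
  if (slash <= 0 || slash === id.length - 1) {
    throw new Error(`Expected <provider>/<model>, got: ${id}`);
  }
  return { provider: id.slice(0, slash), model: id.slice(slash + 1) };
}
```

For example, `parseModelId("zai/glm-4.7")` yields provider `zai` and model `glm-4.7`.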
161
244
 
162
- Note: some models/providers don’t support streaming or certain file media types. When that happens, the CLI prints a friendly error (or auto-disables streaming for that model when supported by the provider).
163
-
164
- ## Output length
165
-
166
- `--length` controls *how much output we ask for* (guideline), not a hard truncation.
167
-
168
- ```bash
169
- npx -y @steipete/summarize "https://example.com" --length long
170
- npx -y @steipete/summarize "https://example.com" --length 20k
171
- ```
172
-
173
- - Presets: `short|medium|long|xl|xxl`
174
- - Character targets: `1500`, `20k`, `20000`
175
- - Optional hard cap: `--max-output-tokens <count>` (e.g. `2000`, `2k`)
176
- - Provider/model APIs still enforce their own maximum output limits.
177
- - If omitted, no max token parameter is sent (provider default).
178
- - Prefer `--length` unless you need a hard cap (some providers count “reasoning” into the cap).
179
- - Minimums: `--length` numeric values must be ≥ 50 chars; `--max-output-tokens` must be ≥ 16.
180
- - Preset targets (source of truth: `packages/core/src/prompts/summary-lengths.ts`):
181
- - short: target ~900 chars (range 600-1,200)
182
- - medium: target ~1,800 chars (range 1,200-2,500)
183
- - long: target ~4,200 chars (range 2,500-6,000)
184
- - xl: target ~9,000 chars (range 6,000-14,000)
185
- - xxl: target ~17,000 chars (range 14,000-22,000)
245
+ Note: some models/providers do not support streaming or certain file media types. When that happens, the CLI prints a friendly error (or auto-disables streaming for that model when supported by the provider).
186
246
 
187
- ## Limits
247
+ ### Limits
188
248
 
189
249
  - Text inputs over 10 MB are rejected before tokenization.
190
- - Text prompts are preflighted against the model’s input limit (LiteLLM catalog), using a GPT tokenizer.
250
+ - Text prompts are preflighted against the model input limit (LiteLLM catalog), using a GPT tokenizer.
191
251
 
192
- ## Common flags
252
+ ### Common flags
193
253
 
194
254
  ```bash
195
- npx -y @steipete/summarize <input> [flags]
255
+ summarize <input> [flags]
196
256
  ```
197
257
 
198
258
  Use `summarize --help` or `summarize help` for the full help text.
199
259
 
200
260
  - `--model <provider/model>`: which model to use (defaults to `auto`)
201
261
  - `--model auto`: automatic model selection + fallback (default)
202
- - `--model <name>`: use a config-defined model (see Configuration)
262
+ - `--model <name>`: use a config-defined model (see Configuration)
203
263
  - `--timeout <duration>`: `30s`, `2m`, `5000ms` (default `2m`)
204
264
  - `--retries <count>`: LLM retry attempts on timeout (default `1`)
205
265
  - `--length short|medium|long|xl|xxl|s|m|l|<chars>`
206
- - `--language, --lang <language>`: output language (`auto` = match source; or `en`, `de`, `english`, `german`, ...)
207
- - `--max-output-tokens <count>`: hard cap for LLM output tokens (optional; only sent when set)
208
- - `--cli [provider]`: use a CLI provider (case-insensitive; equivalent to `--model cli/<provider>`). If omitted, uses auto selection with CLI enabled.
266
+ - `--language, --lang <language>`: output language (`auto` = match source)
267
+ - `--max-output-tokens <count>`: hard cap for LLM output tokens
268
+ - `--cli [provider]`: use a CLI provider (`--model cli/<provider>`). Supports `claude`, `gemini`, `codex`, `agent`. If omitted, uses auto selection with CLI enabled.
209
269
  - `--stream auto|on|off`: stream LLM output (`auto` = TTY only; disabled in `--json` mode)
210
- - `--plain`: Keep raw output (no ANSI/OSC Markdown rendering)
270
+ - `--plain`: keep raw output (no ANSI/OSC Markdown rendering)
211
271
  - `--no-color`: disable ANSI colors
272
+ - `--theme <name>`: CLI theme (`aurora`, `ember`, `moss`, `mono`)
212
273
  - `--format md|text`: website/file content format (default `text`)
213
- - `--markdown-mode off|auto|llm|readability`: Markdown conversion mode (default `readability`). For websites: HTML→Markdown conversion. For YouTube transcripts: `llm` formats the raw transcript into clean Markdown (headings/paragraphs).
214
- - `--preprocess off|auto|always`: controls `uvx markitdown` usage (default `auto`; `always` forces file preprocessing)
274
+ - `--markdown-mode off|auto|llm|readability`: HTML -> Markdown mode (default `readability`)
275
+ - `--preprocess off|auto|always`: controls `uvx markitdown` usage (default `auto`)
215
276
  - Install `uvx`: `brew install uv` (or https://astral.sh/uv/)
216
- - `--extract`: print extracted content and exit (no summary) only for URLs
277
+ - `--extract`: print extracted content and exit (URLs only; stdin `-` is not supported)
217
278
  - Deprecated alias: `--extract-only`
279
+ - `--slides`: extract slides for YouTube/direct video URLs and render them inline in the summary narrative (in supported terminals)
280
+ - `--slides-ocr`: run OCR on extracted slides (requires `tesseract`)
281
+ - `--slides-dir <dir>`: base output dir for slide images (default `./slides`)
282
+ - `--slides-scene-threshold <value>`: scene detection threshold (0.1-1.0)
283
+ - `--slides-max <count>`: maximum slides to extract (default `6`)
284
+ - `--slides-min-duration <seconds>`: minimum seconds between slides
218
285
  - `--json`: machine-readable output with diagnostics, prompt, `metrics`, and optional summary
219
286
  - `--verbose`: debug/diagnostics on stderr
220
- - `--metrics off|on|detailed`: metrics output (default `on`; `detailed` adds a compact 2nd-line breakdown on stderr)
287
+ - `--metrics off|on|detailed`: metrics output (default `on`)
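Reviewer sketch (not part of the package diff): the `--timeout` flag above accepts values such as `30s`, `2m`, and `5000ms`. A minimal TypeScript illustration of converting those three documented forms to milliseconds follows. `parseDurationMs` is a hypothetical name, and rejecting every other unit is an assumption; the real CLI may accept more.

```typescript
// Illustrative only: handles just the three forms the README lists.
function parseDurationMs(raw: string): number {
  const match = /^(\d+)(ms|s|m)$/.exec(raw.trim());
  if (!match) {
    throw new Error(`Unsupported duration: ${raw}`);
  }
  const amount = Number(match[1]);
  const unit = match[2];
  if (unit === "ms") return amount;
  if (unit === "s") return amount * 1000;
  return amount * 60000; // unit === "m"
}
```

Under this reading, the documented default of `2m` corresponds to 120,000 ms.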
221
288
 
222
- ## Verified podcast services (2025-12-25)
289
+ ### Coding CLIs (Codex, Claude, Gemini, Agent)
223
290
 
224
- Run: `summarize <url>`
291
+ Summarize can use common coding CLIs as local model backends:
225
292
 
226
- - Apple Podcasts
227
- - Spotify
228
- - Amazon Music / Audible podcast pages
229
- - Podbean
230
- - Podchaser
231
- - RSS feeds (Podcasting 2.0 transcripts when available)
232
- - Embedded YouTube podcast pages (e.g. JREPodcast)
293
+ - `codex` -> `--cli codex` / `--model cli/codex/<model>`
294
+ - `claude` -> `--cli claude` / `--model cli/claude/<model>`
295
+ - `gemini` -> `--cli gemini` / `--model cli/gemini/<model>`
296
+ - `agent` (Cursor Agent CLI) -> `--cli agent` / `--model cli/agent/<model>`
233
297
 
234
- Transcription: prefers local `whisper.cpp` when installed; otherwise uses OpenAI Whisper or FAL when keys are set.
298
+ Requirements:
235
299
 
236
- ## Translation paths
300
+ - Binary installed and on `PATH` (or set `CODEX_PATH`, `CLAUDE_PATH`, `GEMINI_PATH`, `AGENT_PATH`)
301
+ - Provider authenticated (`codex login`, `claude auth`, `gemini` login flow, `agent login` or `CURSOR_API_KEY`)
237
302
 
238
- `--language/--lang` controls the *output language* of the summary (and other LLM-generated text). Default is `auto` (match source language).
303
+ Quick smoke test:
239
304
 
240
- When the input is audio/video, the CLI needs a transcript first. The transcript comes from one of these paths:
305
+ ```bash
306
+ printf "Summarize CLI smoke input.\nOne short paragraph. Reply can be brief.\n" >/tmp/summarize-cli-smoke.txt
241
307
 
242
- 1. **Existing transcript (preferred)**
243
- - YouTube: uses `youtubei` / `captionTracks` when available.
244
- - Podcasts: uses Podcasting 2.0 RSS `<podcast:transcript>` (JSON/VTT) when the feed publishes it.
245
- 2. **Whisper transcription (fallback)**
246
- - YouTube: falls back to `yt-dlp` (audio download) + Whisper transcription when configured; Apify is a last-last resort (requires `APIFY_API_TOKEN`).
247
- - Prefers local `whisper.cpp` when installed + model available.
248
- - Otherwise uses cloud Whisper (OpenAI `OPENAI_API_KEY`) or FAL (`FAL_KEY`) depending on configuration.
308
+ summarize --cli codex --plain --timeout 2m /tmp/summarize-cli-smoke.txt
309
+ summarize --cli claude --plain --timeout 2m /tmp/summarize-cli-smoke.txt
310
+ summarize --cli gemini --plain --timeout 2m /tmp/summarize-cli-smoke.txt
311
+ summarize --cli agent --plain --timeout 2m /tmp/summarize-cli-smoke.txt
312
+ ```
249
313
 
250
- For “any video/audio file” (local path or direct media URL), use `--video-mode transcript` to force “transcribe → summarize”:
314
+ Set explicit CLI allowlist/order:
251
315
 
252
- ```bash
253
- summarize /path/to/file.mp4 --video-mode transcript --lang en
316
+ ```json
317
+ {
318
+ "cli": { "enabled": ["codex", "claude", "gemini", "agent"] }
319
+ }
320
+ ```
321
+
322
+ Configure implicit auto CLI fallback:
323
+
324
+ ```json
325
+ {
326
+ "cli": {
327
+ "autoFallback": {
328
+ "enabled": true,
329
+ "onlyWhenNoApiKeys": true,
330
+ "order": ["claude", "gemini", "codex", "agent"]
331
+ }
332
+ }
333
+ }
254
334
  ```
255
335
 
256
- ## Auto model ordering
336
+ More details: [`docs/cli.md`](docs/cli.md)
337
+
338
+ ### Auto model ordering
257
339
 
258
340
  `--model auto` builds candidate attempts from built-in rules (or your `model.rules` overrides).
259
- CLI tools are **not** used in auto mode unless you explicitly enable them via `cli.enabled` in config.
260
- Why: CLI adds ~4s latency per attempt and higher variance.
261
- Shortcut: `--cli` (with no provider) uses auto selection with CLI enabled.
341
+ CLI attempts are prepended when:
262
342
 
263
- When enabled, auto prepends CLI attempts in the order listed in `cli.enabled`
264
- (recommended: `["gemini"]`), then tries the native provider candidates
265
- (with OpenRouter fallbacks when configured).
343
+ - `cli.enabled` is set (explicit allowlist/order), or
344
+ - implicit auto selection is active and `cli.autoFallback` is enabled.
266
345
 
267
- Enable CLI attempts:
346
+ Default fallback behavior: CLI fallback runs only when no API keys are configured, tries providers in the order `claude, gemini, codex, agent`, and remembers and prioritizes the last successful provider (`~/.summarize/cli-state.json`).
347
+
348
+ Set explicit CLI attempts:
268
349
 
269
350
  ```json
270
351
  {
@@ -272,56 +353,125 @@ Enable CLI attempts:
272
353
  }
273
354
  ```
274
355
 
275
- Disable CLI attempts:
356
+ Disable implicit auto CLI fallback:
276
357
 
277
358
  ```json
278
359
  {
279
- "cli": { "enabled": [] }
360
+ "cli": { "autoFallback": { "enabled": false } }
280
361
  }
281
362
  ```
282
363
 
283
- Note: when `cli.enabled` is set, it’s also an allowlist for explicit `--cli` / `--model cli/...`.
364
+ Note: explicit `--model auto` does not trigger implicit auto CLI fallback unless `cli.enabled` is set.
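Reviewer sketch (not part of the package diff): one way to read the CLI ordering rules in this section, written as TypeScript. All names here (`cliAttempts`, `CliConfig`, `AutoContext`) are hypothetical. The sketch assumes that `cli.autoFallback` is on by default, that `onlyWhenNoApiKeys` defaults to `true`, that "implicit auto" means no `--model` flag was passed, and that the remembered provider only reorders the fallback list, not an explicit `cli.enabled` list.

```typescript
// Illustrative only: a reading of the README rules, not the package's code.
type CliProvider = "claude" | "gemini" | "codex" | "agent";

interface CliConfig {
  enabled?: CliProvider[];
  autoFallback?: {
    enabled?: boolean;
    onlyWhenNoApiKeys?: boolean;
    order?: CliProvider[];
  };
}

interface AutoContext {
  implicitAuto: boolean; // assumed: true when no --model flag was passed
  hasApiKeys: boolean;
  lastSuccessful?: CliProvider; // assumed to come from cli-state.json
}

const DEFAULT_ORDER: CliProvider[] = ["claude", "gemini", "codex", "agent"];

function cliAttempts(config: CliConfig, ctx: AutoContext): CliProvider[] {
  // An explicit allowlist/order always wins, even for explicit `--model auto`.
  if (config.enabled !== undefined) return [...config.enabled];

  const fallback = config.autoFallback ?? {};
  // Implicit fallback needs implicit auto selection and must not be disabled.
  if (!ctx.implicitAuto || fallback.enabled === false) return [];
  if ((fallback.onlyWhenNoApiKeys ?? true) && ctx.hasApiKeys) return [];

  const order = fallback.order ?? DEFAULT_ORDER;
  const last = ctx.lastSuccessful;
  // Move the last successful provider to the front when it is in the list.
  if (last !== undefined && order.indexOf(last) !== -1) {
    return [last, ...order.filter((p) => p !== last)];
  }
  return [...order];
}
```

The returned list would be prepended to the native provider candidates that `--model auto` builds from its rules.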
284
365
 
285
- ## Website extraction (Firecrawl + Markdown)
366
+ ### Website extraction (Firecrawl + Markdown)
286
367
 
287
- Non-YouTube URLs go through a fetch → extract pipeline. When the direct fetch/extraction is blocked or too thin, `--firecrawl auto` can fall back to Firecrawl (if configured).
368
+ Non-YouTube URLs go through a fetch -> extract pipeline. When direct fetch/extraction is blocked or too thin,
369
+ `--firecrawl auto` can fall back to Firecrawl (if configured).
288
370
 
289
371
  - `--firecrawl off|auto|always` (default `auto`)
290
372
  - `--extract --format md|text` (default `text`; if `--format` is omitted, `--extract` defaults to `md` for non-YouTube URLs)
291
- - `--markdown-mode off|auto|llm|readability` (default `readability`; for non-YouTube URLs this controls HTML→Markdown conversion)
373
+ - `--markdown-mode off|auto|llm|readability` (default `readability`)
292
374
  - `auto`: use an LLM converter when configured; may fall back to `uvx markitdown`
293
375
  - `llm`: force LLM conversion (requires a configured model key)
294
376
  - `off`: disable LLM conversion (still may return Firecrawl Markdown when configured)
295
377
  - Plain-text mode: use `--format text`.
296
378
 
297
- ## YouTube transcripts
379
+ ### YouTube transcripts
298
380
 
299
- `--youtube auto` tries best-effort web transcript endpoints first. When captions aren't available, it falls back to:
381
+ `--youtube auto` tries best-effort web transcript endpoints first. When captions are not available, it falls back to:
300
382
 
301
- 1. **Apify** (if `APIFY_API_TOKEN` is set): Uses a scraping actor (`faVsWy9VTSNVIhWpR`)
302
- 2. **yt-dlp + Whisper** (if `yt-dlp` is available): Downloads audio via yt-dlp, transcribes with local `whisper.cpp` when installed (preferred), otherwise falls back to OpenAI (`OPENAI_API_KEY`) or FAL (`FAL_KEY`)
383
+ 1. Apify (if `APIFY_API_TOKEN` is set): uses a scraping actor (`faVsWy9VTSNVIhWpR`)
384
+ 2. yt-dlp + Whisper (if `yt-dlp` is available): downloads audio, then transcribes with local `whisper.cpp` when installed
385
+ (preferred), otherwise falls back to OpenAI (`OPENAI_API_KEY`) or FAL (`FAL_KEY`)
303
386
 
304
387
  Environment variables for yt-dlp mode:
388
+
305
389
  - `YT_DLP_PATH` - optional path to yt-dlp binary (otherwise `yt-dlp` is resolved via `PATH`)
306
390
  - `SUMMARIZE_WHISPER_CPP_MODEL_PATH` - optional override for the local `whisper.cpp` model file
307
391
  - `SUMMARIZE_WHISPER_CPP_BINARY` - optional override for the local binary (default: `whisper-cli`)
308
392
  - `SUMMARIZE_DISABLE_LOCAL_WHISPER_CPP=1` - disable local whisper.cpp (force remote)
309
393
  - `OPENAI_API_KEY` - OpenAI Whisper transcription
394
+ - `OPENAI_WHISPER_BASE_URL` - optional OpenAI-compatible Whisper endpoint override
310
395
  - `FAL_KEY` - FAL AI Whisper fallback
311
396
 
312
397
  Apify costs money but tends to be more reliable when captions exist.
313
398
 
399
+ ### Slide extraction (YouTube + direct video URLs)
400
+
401
+ Extract slide screenshots (scene detection via `ffmpeg`) and optional OCR:
402
+
403
+ ```bash
404
+ summarize "https://www.youtube.com/watch?v=..." --slides
405
+ summarize "https://www.youtube.com/watch?v=..." --slides --slides-ocr
406
+ ```
407
+
408
+ Outputs are written under `./slides/<sourceId>/` (or `--slides-dir`). OCR results are included in JSON output
409
+ (`--json`) and stored in `slides.json` inside the slide directory. When scene detection is too sparse, the
410
+ extractor also samples at a fixed interval to improve coverage.
411
+ When using `--slides`, supported terminals (kitty/iTerm/Konsole) render inline thumbnails automatically inside the
412
+ summary narrative (the model inserts `[slide:N]` markers). Timestamp links are clickable when the terminal supports
413
+ OSC-8 (YouTube/Vimeo/Loom/Dropbox). If inline images are unsupported, Summarize prints a note with the on-disk
414
+ slide directory.
415
+
416
+ Use `--slides --extract` to print the full timed transcript and insert slide images inline at matching timestamps.
417
+
314
418
  Format the extracted transcript as Markdown (headings + paragraphs) via an LLM:
315
419
 
316
420
  ```bash
317
421
  summarize "https://www.youtube.com/watch?v=..." --extract --format md --markdown-mode llm
318
422
  ```
319
423
 
320
- ## Media transcription (Whisper)
424
+ ### Media transcription (Whisper)
425
+
426
+ Local audio/video files are transcribed first, then summarized. `--video-mode transcript` forces
427
+ direct media URLs (and embedded media) through Whisper first. Prefers local `whisper.cpp` when available; otherwise requires
428
+ `OPENAI_API_KEY` or `FAL_KEY`.
321
429
 
322
- `--video-mode transcript` forces audio/video inputs (local files or direct media URLs) through Whisper first, then summarizes the transcript text. Prefers local `whisper.cpp` when available; otherwise requires `OPENAI_API_KEY` or `FAL_KEY`.
430
+ ### Local ONNX transcription (Parakeet/Canary)
323
431
 
324
- ## Configuration
432
+ Summarize can use NVIDIA Parakeet/Canary ONNX models via a local CLI you provide. Auto selection (default) prefers ONNX when configured.
433
+
434
+ - Setup helper: `summarize transcriber setup`
435
+ - Install `sherpa-onnx` from upstream binaries/build (Homebrew may not have a formula)
436
+ - Auto selection: set `SUMMARIZE_ONNX_PARAKEET_CMD` or `SUMMARIZE_ONNX_CANARY_CMD` (no flag needed)
437
+ - Force a model: `--transcriber parakeet|canary|whisper|auto`
438
+ - Docs: `docs/nvidia-onnx-transcription.md`
439
+
440
+ ### Verified podcast services (2025-12-25)
441
+
442
+ Run: `summarize <url>`
443
+
444
+ - Apple Podcasts
445
+ - Spotify
446
+ - Amazon Music / Audible podcast pages
447
+ - Podbean
448
+ - Podchaser
449
+ - RSS feeds (Podcasting 2.0 transcripts when available)
450
+ - Embedded YouTube podcast pages (e.g. JREPodcast)
451
+
452
+ Transcription: prefers local `whisper.cpp` when installed; otherwise uses OpenAI Whisper or FAL when keys are set.
453
+
454
+ ### Translation paths
455
+
456
+ `--language/--lang` controls the output language of the summary (and other LLM-generated text). Default is `auto`.
457
+
458
+ When the input is audio/video, the CLI needs a transcript first. The transcript comes from one of these paths:
459
+
460
+ 1. Existing transcript (preferred)
461
+ - YouTube: uses `youtubei` / `captionTracks` when available.
462
+ - Podcasts: uses Podcasting 2.0 RSS `<podcast:transcript>` (JSON/VTT) when the feed publishes it.
463
+ 2. Whisper transcription (fallback)
464
+ - YouTube: falls back to yt-dlp (audio download) + Whisper transcription when configured; Apify is a last resort.
465
+ - Prefers local `whisper.cpp` when installed + model available.
466
+ - Otherwise uses cloud Whisper (OpenAI `OPENAI_API_KEY`) or FAL (`FAL_KEY`).
467
+
468
+ For direct media URLs, use `--video-mode transcript` to force transcribe -> summarize:
469
+
470
+ ```bash
471
+ summarize https://example.com/file.mp4 --video-mode transcript --lang en
472
+ ```
473
+
474
+ ### Configuration
325
475
 
326
476
  Single config location:
327
477
 
@@ -331,7 +481,9 @@ Supported keys today:
331
481
 
332
482
  ```json
333
483
  {
334
- "model": { "id": "openai/gpt-5-mini" }
484
+ "model": { "id": "openai/gpt-5-mini" },
485
+ "env": { "OPENAI_API_KEY": "sk-..." },
486
+ "ui": { "theme": "ember" }
335
487
  }
336
488
  ```
337
489
 
@@ -345,67 +497,102 @@ Shorthand (equivalent):
345
497
 
346
498
  Also supported:
347
499
 
348
- - `model: { "mode": "auto" }` (automatic model selection + fallback; see `docs/model-auto.md`)
500
+ - `model: { "mode": "auto" }` (automatic model selection + fallback; see [docs/model-auto.md](docs/model-auto.md))
349
501
  - `model.rules` (customize candidates / ordering)
350
502
  - `models` (define presets selectable via `--model <preset>`)
503
+ - `env` (generic env var defaults; process env still wins)
504
+ - `apiKeys` (legacy shortcut, mapped to env names; prefer `env` for new configs)
505
+ - `cache.media` (media download cache: TTL 7 days, 2048 MB cap by default; `--no-media-cache` disables)
351
506
  - `media.videoMode: "auto"|"transcript"|"understand"`
507
+ - `slides.enabled` / `slides.max` / `slides.ocr` / `slides.dir` (defaults for `--slides`)
508
+ - `ui.theme: "aurora"|"ember"|"moss"|"mono"`
352
509
  - `openai.useChatCompletions: true` (force OpenAI-compatible chat completions)
353
510
 
354
- Note: the config is parsed leniently (JSON5), but **comments are not allowed**.
355
- Unknown keys are ignored.
511
+ Note: the config is parsed leniently (JSON5), but comments are not allowed. Unknown keys are ignored.
512
+
513
+ Media cache defaults:
514
+
515
+ ```json
516
+ {
517
+ "cache": {
518
+ "media": { "enabled": true, "ttlDays": 7, "maxMb": 2048, "verify": "size" }
519
+ }
520
+ }
521
+ ```
522
+
523
+ Note: `--no-cache` bypasses summary caching only (LLM output). Extract/transcript caches still apply. Use `--no-media-cache` to skip the media download cache.
356
524
 
357
525
  Precedence:
358
526
 
359
- 1) `--model`
360
- 2) `SUMMARIZE_MODEL`
361
- 3) `~/.summarize/config.json`
362
- 4) default (`auto`)
527
+ 1. `--model`
528
+ 2. `SUMMARIZE_MODEL`
529
+ 3. `~/.summarize/config.json`
530
+ 4. default (`auto`)
531
+
532
+ Theme precedence:
363
533
 
364
- ## Environment variables
534
+ 1. `--theme`
535
+ 2. `SUMMARIZE_THEME`
536
+ 3. `~/.summarize/config.json` (`ui.theme`)
537
+ 4. default (`aurora`)
538
+
539
+ Environment variable precedence:
540
+
541
+ 1. process env
542
+ 2. `~/.summarize/config.json` (`env`)
543
+ 3. `~/.summarize/config.json` (`apiKeys`, legacy)
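Reviewer sketch (not part of the package diff): the three precedence lists above all follow a "first defined source wins" pattern. A minimal TypeScript illustration is below. `resolveSetting`, `resolveEnv`, and the `legacyApiKeysAsEnv` field are hypothetical; the sketch assumes the legacy `apiKeys` entries have already been mapped to env variable names, and it treats an empty string as "set", which the real CLI may not.

```typescript
// Illustrative only: names and shapes are assumptions.
interface SettingSources {
  flag?: string; // e.g. --model or --theme
  envVar?: string; // e.g. SUMMARIZE_MODEL or SUMMARIZE_THEME
  configValue?: string; // from ~/.summarize/config.json
}

// Flag > environment variable > config file > built-in default.
function resolveSetting(sources: SettingSources, fallback: string): string {
  return sources.flag ?? sources.envVar ?? sources.configValue ?? fallback;
}

interface EnvConfig {
  env?: Record<string, string>;
  // Assumed: legacy `apiKeys` already translated to env variable names.
  legacyApiKeysAsEnv?: Record<string, string>;
}

// Process env > config `env` > config `apiKeys` (legacy).
function resolveEnv(
  name: string,
  processEnv: Record<string, string | undefined>,
  config: EnvConfig,
): string | undefined {
  return processEnv[name] ?? config.env?.[name] ?? config.legacyApiKeysAsEnv?.[name];
}
```

For instance, `resolveSetting({ configValue: "ember" }, "aurora")` picks `ember`, while an empty source object falls through to the default.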
544
+
545
+ ### Environment variables
365
546
 
366
547
  Set the key matching your chosen `--model`:
367
548
 
549
+ - Optional fallback defaults can be stored in config:
550
+ - `~/.summarize/config.json` -> `"env": { "OPENAI_API_KEY": "sk-..." }`
551
+ - process env always takes precedence
552
+ - legacy `"apiKeys"` still works (mapped to env names)
553
+
368
554
  - `OPENAI_API_KEY` (for `openai/...`)
369
555
  - `ANTHROPIC_API_KEY` (for `anthropic/...`)
370
556
  - `XAI_API_KEY` (for `xai/...`)
371
557
  - `Z_AI_API_KEY` (for `zai/...`; supports `ZAI_API_KEY` alias)
372
- - `GEMINI_API_KEY` (for `google/...`)
558
+ - `GEMINI_API_KEY` (for `google/...`)
373
559
  - also accepts `GOOGLE_GENERATIVE_AI_API_KEY` and `GOOGLE_API_KEY` as aliases
374
560
 
375
561
  OpenAI-compatible chat completions toggle:
376
562
 
377
563
  - `OPENAI_USE_CHAT_COMPLETIONS=1` (or set `openai.useChatCompletions` in config)
378
564
 
565
+ UI theme:
566
+
567
+ - `SUMMARIZE_THEME=aurora|ember|moss|mono`
568
+ - `SUMMARIZE_TRUECOLOR=1` (force 24-bit ANSI)
569
+ - `SUMMARIZE_NO_TRUECOLOR=1` (disable 24-bit ANSI)
570
+
379
571
  OpenRouter (OpenAI-compatible):
380
572
 
381
573
  - Set `OPENROUTER_API_KEY=...`
382
- - Prefer forcing OpenRouter per model id: `--model openrouter/<author>/<slug>` (e.g. `openrouter/meta-llama/llama-3.1-8b-instruct:free`)
383
- - Built-in preset: `--model free` (uses a default set of OpenRouter `:free` models).
574
+ - Prefer forcing OpenRouter per model id: `--model openrouter/<author>/<slug>`
575
+ - Built-in preset: `--model free` (uses a default set of OpenRouter `:free` models)
384
576
 
385
577
  ### `summarize refresh-free`
386
578
 
387
579
  Quick start: make free the default (keep `auto` available)
388
580
 
389
581
  ```bash
390
- # writes ~/.summarize/config.json (models.free) and sets model="free"
391
582
  summarize refresh-free --set-default
392
-
393
- # now this defaults to free models
394
583
  summarize "https://example.com"
395
-
396
- # whenever you want best quality instead
397
584
  summarize "https://example.com" --model auto
398
585
  ```
399
586
 
400
- Regenerates the `free` preset (writes `models.free` into `~/.summarize/config.json`) by:
587
+ Regenerates the `free` preset (`models.free` in `~/.summarize/config.json`) by:
401
588
 
402
589
  - Fetching OpenRouter `/models`, filtering `:free`
403
- - Skipping models that look very small (<27B by default) based on the model id/name (best-effort heuristic)
590
+ - Skipping models that look very small (<27B by default) based on the model id/name
404
591
  - Testing which ones return non-empty text (concurrency 4, timeout 10s)
405
- - Picking a mix of smart-ish (bigger `context_length` / output cap) and fast models
406
- - Refining timings for the final selection and writing the sorted list back
592
+ - Picking a mix of smart-ish (bigger `context_length` / output cap) and fast models
593
+ - Refining timings and writing the sorted list back
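Reviewer sketch (not part of the package diff): the first two steps above (keep `:free` models, skip ones that look smaller than 27B based on the id) could look roughly like this in TypeScript. The regular expression and the names `inferParamsBillions` and `keepCandidate` are hypothetical, and keeping models whose size cannot be inferred is an assumption, since the README does not say how that case is handled.

```typescript
// Illustrative only: a guess at an id-based size heuristic.
function inferParamsBillions(idOrName: string): number | undefined {
  // Look for a number followed by "b" that stands alone, e.g. "-8b-" or "-27b-".
  const match = /(?:^|[^a-z0-9.])(\d+(?:\.\d+)?)b(?![a-z0-9])/i.exec(idOrName);
  return match ? Number(match[1]) : undefined;
}

function keepCandidate(id: string, minParamsB: number = 27): boolean {
  // Only OpenRouter ":free" variants are considered.
  if (!id.endsWith(":free")) return false;
  const size = inferParamsBillions(id);
  // Assumption: keep models whose size cannot be inferred from the id.
  return size === undefined || size >= minParamsB;
}
```

With the default threshold, an id like `meta-llama/llama-3.1-8b-instruct:free` would be skipped; the later steps (probing for non-empty output, timing runs, sorting) are not modeled here.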
407
594
 
408
- If `--model free` stops working (rate limits, allowed-provider restrictions, models removed), run:
595
+ If `--model free` stops working, run:
409
596
 
410
597
  ```bash
411
598
  summarize refresh-free
@@ -414,7 +601,7 @@ summarize refresh-free
414
601
  Flags:
415
602
 
416
603
  - `--runs 2` (default): extra timing runs per selected model (total runs = 1 + runs)
417
- - `--smart 3` (default): how many smart-first picks (rest filled by fastest)
604
+ - `--smart 3` (default): how many smart-first picks (rest filled by fastest)
418
605
  - `--min-params 27b` (default): ignore models with inferred size smaller than N billion parameters
419
606
  - `--max-age-days 180` (default): ignore models older than N days (set 0 to disable)
420
607
  - `--set-default`: also sets `"model": "free"` in `~/.summarize/config.json`
@@ -426,7 +613,7 @@ OPENROUTER_API_KEY=sk-or-... summarize "https://example.com" --model openrouter/
426
613
  ```
427
614
 
428
615
  If your OpenRouter account enforces an allowed-provider list, make sure at least one provider
429
- is allowed for the selected model. (When routing fails, `summarize` prints the exact providers to allow.)
616
+ is allowed for the selected model. When routing fails, `summarize` prints the exact providers to allow.
430
617
 
431
618
  Legacy: `OPENAI_BASE_URL=https://openrouter.ai/api/v1` (and either `OPENAI_API_KEY` or `OPENROUTER_API_KEY`) also works.
432
619
 
@@ -442,14 +629,14 @@ Optional services:
442
629
  - `FAL_KEY` (FAL AI API key for audio transcription via Whisper)
443
630
  - `APIFY_API_TOKEN` (YouTube transcript fallback)
444
631
 
445
- ## Model limits
632
+ ### Model limits
446
633
 
447
634
  The CLI uses the LiteLLM model catalog for model limits (like max output tokens):
448
635
 
449
636
  - Downloaded from: `https://raw.githubusercontent.com/BerriAI/litellm/main/model_prices_and_context_window.json`
450
637
  - Cached at: `~/.summarize/cache/`
451
638
 
452
- ## Library usage (optional)
639
+ ### Library usage (optional)
453
640
 
454
641
  Recommended (minimal deps):
455
642
 
@@ -461,9 +648,30 @@ Compatibility (pulls in CLI deps):
461
648
  - `@steipete/summarize/content`
462
649
  - `@steipete/summarize/prompts`
463
650
 
464
- ## Development
651
+ ### Development
465
652
 
466
653
  ```bash
467
654
  pnpm install
468
655
  pnpm check
469
656
  ```
657
+
658
+ ## More
659
+
660
+ - Docs index: [docs/README.md](docs/README.md)
661
+ - CLI providers and config: [docs/cli.md](docs/cli.md)
662
+ - Auto model rules: [docs/model-auto.md](docs/model-auto.md)
663
+ - Website extraction: [docs/website.md](docs/website.md)
664
+ - YouTube handling: [docs/youtube.md](docs/youtube.md)
665
+ - Media pipeline: [docs/media.md](docs/media.md)
666
+ - Config schema and precedence: [docs/config.md](docs/config.md)
667
+
668
+ ## Troubleshooting
669
+
670
+ - "Receiving end does not exist": Chrome did not inject the content script yet.
671
+ - Extension details -> Site access -> On all sites (or allow this domain)
672
+ - Reload the tab once.
673
+ - "Failed to fetch" / daemon unreachable:
674
+ - `summarize daemon status`
675
+ - Logs: `~/.summarize/logs/daemon.err.log`
676
+
677
+ License: MIT