summarize-ruby 0.1.0 (RubyGems)

Files changed (12)

checksums.yaml +7 -0
data/CHANGELOG.md +22 -0
data/LICENSE +21 -0
data/README.md +208 -0
data/lib/summarize/client.rb +160 -0
data/lib/summarize/configuration.rb +33 -0
data/lib/summarize/errors.rb +31 -0
data/lib/summarize/options.rb +72 -0
data/lib/summarize/result.rb +101 -0
data/lib/summarize/version.rb +5 -0
data/lib/summarize.rb +50 -0
metadata +74 -0

checksums.yaml ADDED

@@ -0,0 +1,7 @@
+---
+SHA256:
+  metadata.gz: 4dd0f9c27d9fb176432ad18feb629895241b2b10bf284729855e27707e5c928b
+  data.tar.gz: e302af7631889e3543f4540b7359be06b6ca99d8ee30c7b87fc69ec5ec3ff2d3
+SHA512:
+  metadata.gz: 886c4ac651c3adf6a9df027460ef03c0ee7fb1deb25b6ad46a5b812380158c7edc5da968a2806df820fd96d32419c0ac268480b01fd7f8a324e08f244b361c50
+  data.tar.gz: d41851b8bf030bab1c2856950cfac67554b1eecc15419f7fd714c11bfcab8ee91c50eb41c572379169f0ca4f10293636cb120931690b5a0482483e4a3f185307
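
The digests above can be checked locally once the `.gem` archive has been unpacked. The following is a minimal sketch, assuming `checksums.yaml` has already been decompressed into the same directory as `metadata.gz` and `data.tar.gz`; `checksum_mismatches` is a hypothetical helper name, not something shipped by the gem or by RubyGems.

```ruby
require "digest"
require "yaml"

# Hypothetical helper: returns the names of files whose SHA256 digest does
# not match the entry recorded in checksums.yaml inside +dir+.
def checksum_mismatches(dir)
  expected = YAML.safe_load(File.read(File.join(dir, "checksums.yaml")))
  mismatched = expected.fetch("SHA256").reject do |name, hex|
    Digest::SHA256.file(File.join(dir, name)).hexdigest == hex.to_s
  end
  mismatched.keys
end
```

An empty array means every listed file matches; the same loop could be repeated for the `SHA512` section with `Digest::SHA512`.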

data/CHANGELOG.md ADDED

@@ -0,0 +1,22 @@
+# Changelog
+All notable changes to this project will be documented in this file.
+The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
+and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
+## [0.1.0] - 2026-02-19
+### Added
+- Ruby wrapper for the `summarize` CLI tool
+- `Summarize.call` for summarizing URLs and file paths
+- `Summarize.from_text` for summarizing text content
+- `Summarize.extract` for content extraction without LLM summarization
+- Streaming support via block syntax
+- Global configuration with `Summarize.configure`
+- Support for all `summarize` CLI options including model, length, language, format, video mode, slides, and more
+- `Summarize::Result` object with accessors for summary, extracted content, LLM metadata, and token metrics
+- Custom error hierarchy: `BinaryNotFoundError`, `TimeoutError`, `CommandError`, etc.
+- Environment variable passthrough for API keys
+- Automatic binary detection

data/LICENSE ADDED

@@ -0,0 +1,21 @@
+MIT License
+Copyright (c) 2026 Martiano
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.

data/README.md ADDED

@@ -0,0 +1,208 @@
+# summarize-ruby
+Ruby wrapper for the [`summarize`](https://github.com/steipete/summarize) CLI tool. Summarize web pages, files, videos, and text using LLMs from OpenAI, Anthropic, Google, xAI, and more.
+## Installation
+First, install the `summarize` CLI:
+```bash
+npm i -g @steipete/summarize
+```
+Then add the gem to your Gemfile:
+```ruby
+gem "summarize-ruby"
+```
+Or install directly:
+```bash
+gem install summarize-ruby
+```
+## Usage
+```ruby
+require "summarize"
+# Summarize a URL
+result = Summarize.call("https://example.com/article")
+puts result.summary
+# Summarize a local file
+result = Summarize.call("/path/to/document.pdf")
+# Summarize text
+result = Summarize.from_text("Long article text here...")
+# Extract content without summarization
+result = Summarize.extract("https://example.com", format: :md)
+puts result.content
+```
+### Options
+Pass any option as a keyword argument:
+```ruby
+result = Summarize.call("https://example.com",
+  model: "anthropic/claude-sonnet-4-5",
+  length: :short,
+  language: "es",
+  prompt: "Focus on technical details"
+)
+```
+All supported options:
+| Ruby option | CLI flag | Example values |
+|---|---|---|
+| `model` | `--model` | `"openai/gpt-5-mini"`, `"anthropic/claude-sonnet-4-5"`, `"auto"` |
+| `length` | `--length` | `:short`, `:medium`, `:long`, `:xl`, `:xxl`, `5000` |
+| `language` | `--lang` | `"en"`, `"es"`, `"de"`, `"auto"` |
+| `prompt` | `--prompt` | `"Focus on key takeaways"` |
+| `prompt_file` | `--prompt-file` | `"/path/to/prompt.txt"` |
+| `format` | `--format` | `:text`, `:md` |
+| `timeout` | `--timeout` | `"3m"`, `"30s"` |
+| `retries` | `--retries` | `2` |
+| `cli` | `--cli` | `"claude"`, `"gemini"`, `"codex"` |
+| `video_mode` | `--video-mode` | `:auto`, `:transcript`, `:understand` |
+| `markdown_mode` | `--markdown-mode` | `:off`, `:auto`, `:llm`, `:readability` |
+| `max_output_tokens` | `--max-output-tokens` | `2000` |
+| `max_extract_characters` | `--max-extract-characters` | `10000` |
+| `youtube` | `--youtube` | `"auto"`, `"web"`, `"yt-dlp"` |
+| `transcriber` | `--transcriber` | `"auto"`, `"whisper"`, `"parakeet"` |
+| `firecrawl` | `--firecrawl` | `"off"`, `"auto"`, `"always"` |
+| `preprocess` | `--preprocess` | `"off"`, `"auto"`, `"always"` |
+| `theme` | `--theme` | `"aurora"`, `"ember"`, `"moss"`, `"mono"` |
+| `metrics` | `--metrics` | `"off"`, `"on"`, `"detailed"` |
+| `slides_dir` | `--slides-dir` | `"./my-slides"` |
+| `slides_max` | `--slides-max` | `10` |
+| `slides_min_duration` | `--slides-min-duration` | `5` |
+| `slides_scene_threshold` | `--slides-scene-threshold` | `0.5` |
+Boolean flags (pass `true` to enable):
+| Ruby option | CLI flag |
+|---|---|
+| `force_summary` | `--force-summary` |
+| `timestamps` | `--timestamps` |
+| `no_cache` | `--no-cache` |
+| `no_media_cache` | `--no-media-cache` |
+| `slides` | `--slides` |
+| `slides_debug` | `--slides-debug` |
+| `slides_ocr` | `--slides-ocr` |
+| `verbose` | `--verbose` |
+| `debug` | `--debug` |
+| `no_color` | `--no-color` |
+| `plain` | `--plain` |
+### Streaming
+Pass a block to stream output as it arrives:
+```ruby
+Summarize.call("https://example.com") do |chunk|
+  print chunk
+end
+```
+### Result object
+The `Result` object provides structured access to the response:
+```ruby
+result = Summarize.call("https://example.com")
+# Summary
+result.summary       # => "## Key Points\n..."
+result.success?      # => true
+# Extracted content
+result.title         # => "Article Title"
+result.description   # => "Article description"
+result.content       # => "Full extracted content..."
+result.site_name     # => "Example"
+result.media_type    # => "text/html"
+# LLM info
+result.model         # => "gpt-5-mini"
+result.provider      # => "openai"
+# Token usage
+result.total_tokens      # => 1550
+result.prompt_tokens     # => 1200
+result.completion_tokens # => 350
+# Raw JSON
+result.to_h          # => { "summary" => "...", "extracted" => { ... }, ... }
+```
+### Configuration
+Set global defaults:
+```ruby
+Summarize.configure do |c|
+  c.default_model = "anthropic/claude-sonnet-4-5"
+  c.default_length = :medium
+  c.default_language = "en"
+  c.timeout = "3m"
+  c.retries = 2
+  c.default_cli = "claude"
+  # Pass API keys to the CLI process
+  c.env = {
+    "ANTHROPIC_API_KEY" => ENV["ANTHROPIC_API_KEY"],
+    "OPENAI_API_KEY" => ENV["OPENAI_API_KEY"]
+  }
+  # Custom binary path (auto-detected by default)
+  c.binary_path = "/usr/local/bin/summarize"
+end
+```
+Per-call options override configuration defaults:
+```ruby
+Summarize.configure { |c| c.default_model = "anthropic/claude-sonnet-4-5" }
+# This uses gpt-5-mini, not the configured default
+result = Summarize.call("https://example.com", model: "openai/gpt-5-mini")
+```
+### Error handling
+```ruby
+begin
+  result = Summarize.call("https://example.com")
+rescue Summarize::BinaryNotFoundError
+  # summarize CLI not installed
+rescue Summarize::CommandError => e
+  e.exit_code  # => 1
+  e.stderr     # => "error message"
+rescue Summarize::SummarizationError => e
+  # JSON parsing failed
+rescue Summarize::Error => e
+  # catch-all for any summarize error
+end
+```
+## Requirements
+- Ruby >= 3.1
+- [`summarize`](https://github.com/steipete/summarize) CLI (`npm i -g @steipete/summarize`)
+- At least one LLM provider API key (OpenAI, Anthropic, Google, etc.)
+## Development
+```bash
+bundle install
+bundle exec rspec
+```
+## License
+MIT

data/lib/summarize/client.rb ADDED

@@ -0,0 +1,160 @@
+# frozen_string_literal: true
+require "open3"
+require "json"
+require "tempfile"
+module Summarize
+  class Client
+    attr_reader :config
+    def initialize(config = Summarize.configuration)
+      @config = config
+    end
+    # Summarize a URL or file path.
+    #
+    #   client.call("https://example.com", length: :short, model: "openai/gpt-5-mini")
+    #   client.call("/path/to/file.pdf", language: "es")
+    #
+    # With a block, streams chunks as they arrive:
+    #
+    #   client.call("https://example.com") { |chunk| print chunk }
+    #
+    def call(input, **opts, &block)
+      if block_given?
+        stream(input, **opts, &block)
+      else
+        run_json(input, **opts)
+      end
+    end
+    # Summarize text content by writing to a temp file.
+    #
+    #   client.from_text("Long article text...", length: :medium)
+    #
+    def from_text(text, **opts, &block)
+      with_temp_file(text) do |path|
+        if block_given?
+          stream(path, **opts, &block)
+        else
+          run_json(path, **opts)
+        end
+      end
+    end
+    # Extract content without LLM summarization.
+    #
+    #   result = client.extract("https://example.com", format: :md)
+    #   result.content  # => extracted markdown
+    #
+    def extract(input, **opts)
+      run_json(input, extract: true, **opts)
+    end
+    private
+    def run_json(input, extract: false, **opts)
+      args = build_args(input, extract: extract, stream: false, json: true, **opts)
+      stdout, stderr, status = execute(args)
+      handle_error!(status, stderr) unless status.success?
+      parsed = JSON.parse(stdout)
+      Result.new(parsed)
+    rescue JSON::ParserError => e
+      raise SummarizationError, "Failed to parse JSON output: #{e.message}\nOutput: #{stdout&.slice(0, 500)}"
+    end
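+    def stream(input, **opts, &block)
+      args = build_args(input, stream: true, json: false, **opts)
+      # Check the binary here as well, so a missing CLI raises BinaryNotFoundError
+      # in streaming mode just as it does for run_json (via execute).
+      validate_binary!
+      full_output = +""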
+      Open3.popen3(command_env, *args) do |stdin, stdout, stderr, wait_thread|
+        stdin.close
+        stdout.each_line do |line|
+          full_output << line
+          block.call(line)
+        end
+        status = wait_thread.value
+        handle_error!(status, stderr.read) unless status.success?
+      end
+      full_output
+    end
+    def with_temp_file(text)
+      file = Tempfile.new(["summarize-input", ".txt"])
+      file.write(text)
+      file.flush
+      file.close
+      yield file.path
+    ensure
+      file&.unlink
+    end
+    def build_args(input, extract: false, stream: nil, json: false, **opts)
+      merged = apply_defaults(opts)
+      args = [config.binary_path]
+      args << input
+      args << "--json" if json
+      args << "--stream" << "off" if stream == false
+      args << "--stream" << "on" if stream == true
+      args << "--extract" if extract
+      args << "--metrics" << "on" if json
+      args.concat(Options.new(merged).to_args)
+      args
+    end
+    def apply_defaults(opts)
+      defaults = {}
+      defaults[:model] = config.default_model if config.default_model && config.default_model != "auto"
+      defaults[:cli] = config.default_cli if config.default_cli
+      defaults[:length] = config.default_length if config.default_length
+      defaults[:language] = config.default_language if config.default_language
+      defaults[:timeout] = config.timeout if config.timeout
+      defaults[:retries] = config.retries if config.retries
+      defaults.merge(opts)
+    end
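+    def command_env
+      env = {}
+      # Skip nil values so an unset key (e.g. ENV["OPENAI_API_KEY"] => nil)
+      # is not handed to the child process as an empty string.
+      config.env.each { |k, v| env[k.to_s] = v.to_s unless v.nil? }
+      env
+    end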
+    def execute(args)
+      validate_binary!
+      Open3.capture3(command_env, *args)
+    end
+    def validate_binary!
+      path = config.binary_path
+      return if path == "summarize" # rely on PATH
+      return if File.executable?(path)
+      raise BinaryNotFoundError, path
+    end
+    def handle_error!(status, stderr)
+      case status.exitstatus
+      when 130
+        raise Error, "Interrupted (SIGINT)"
+      when 143
+        raise Error, "Terminated (SIGTERM)"
+      else
+        raise CommandError.new(status.exitstatus, stderr&.strip || "")
+      end
+    end
+  end
+end
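
The `stream` method above reads stdout to the end before it touches stderr. If the child process writes a large amount to stderr in the meantime, that pipe can fill up and both sides may block. One possible hardening, sketched below, drains stderr on a background thread while stdout is streamed. `stream_lines` is a hypothetical standalone helper, not part of the released gem, and it assumes the caller supplies the environment hash and the full command.

```ruby
require "open3"
require "rbconfig"

# Hypothetical helper: yields each stdout line as it arrives while a
# background thread collects stderr, so neither pipe can fill up.
# Returns [Process::Status, stderr_string].
def stream_lines(env, *cmd)
  Open3.popen3(env, *cmd) do |stdin, stdout, stderr, wait_thread|
    stdin.close
    err_reader = Thread.new { stderr.read }
    stdout.each_line { |line| yield line }
    [wait_thread.value, err_reader.value]
  end
end
```

For example, `stream_lines({}, RbConfig.ruby, "-e", "puts 1") { |line| print line }` streams the output of a child Ruby process; the returned status and stderr text could then be passed to something like `handle_error!`.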

data/lib/summarize/configuration.rb ADDED

@@ -0,0 +1,33 @@
+# frozen_string_literal: true
+module Summarize
+  class Configuration
+    attr_accessor :default_model, :default_length, :default_language,
+                  :default_cli, :timeout, :retries, :env
+    attr_writer :binary_path
+    def initialize
+      @binary_path = nil
+      @default_model = "auto"
+      @default_cli = nil
+      @default_length = nil
+      @default_language = nil
+      @timeout = nil
+      @retries = nil
+      @env = {}
+    end
+    def binary_path
+      @binary_path ||= find_binary
+    end
+    private
+    def find_binary
+      path = `which summarize 2>/dev/null`.strip
+      return path unless path.empty?
+      ["/usr/local/bin/summarize", "/opt/homebrew/bin/summarize"].find { |p| File.executable?(p) } || "summarize"
+    end
+  end
+end
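
`find_binary` shells out to `which`, which may not be available on every platform. As an illustration only, the same lookup can be done in plain Ruby by walking the `PATH` entries. `find_on_path` is a hypothetical name, and the sketch assumes a Unix-style executable without a file extension.

```ruby
# Hypothetical helper: returns the first executable file called +name+
# found in the directories listed in +path+, or nil if none exists.
def find_on_path(name, path = ENV.fetch("PATH", ""))
  path.split(File::PATH_SEPARATOR).each do |dir|
    next if dir.empty?

    candidate = File.join(dir, name)
    return candidate if File.file?(candidate) && File.executable?(candidate)
  end
  nil
end
```

A `nil` result could fall back to the hard-coded locations and finally to the bare `"summarize"` string, as the released code does.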

data/lib/summarize/errors.rb ADDED

@@ -0,0 +1,31 @@
+# frozen_string_literal: true
+module Summarize
+  class Error < StandardError; end
+  class BinaryNotFoundError < Error
+    def initialize(path)
+      super("summarize binary not found at '#{path}'. Install via: npm i -g @steipete/summarize")
+    end
+  end
+  class TimeoutError < Error
+    def initialize(timeout)
+      super("summarize timed out after #{timeout}")
+    end
+  end
+  class ExtractionError < Error; end
+  class SummarizationError < Error; end
+  class CommandError < Error
+    attr_reader :exit_code, :stderr
+    def initialize(exit_code, stderr)
+      @exit_code = exit_code
+      @stderr = stderr
+      super("summarize exited with code #{exit_code}: #{stderr}")
+    end
+  end
+end
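
Because every class above inherits from `Summarize::Error`, the order of `rescue` clauses matters: the more specific class has to come first. The sketch below copies a trimmed version of two of the classes so that it runs without the gem installed; `describe_failure` is a hypothetical helper used only for illustration.

```ruby
# Trimmed copy of two classes from errors.rb, so the sketch is self-contained.
module Summarize
  class Error < StandardError; end

  class CommandError < Error
    attr_reader :exit_code, :stderr

    def initialize(exit_code, stderr)
      @exit_code = exit_code
      @stderr = stderr
      super("summarize exited with code #{exit_code}: #{stderr}")
    end
  end
end

# Hypothetical helper: runs the block and turns any Summarize error into a
# short label. CommandError is rescued before the generic Error catch-all.
def describe_failure
  yield
  "ok"
rescue Summarize::CommandError => e
  "command failed (#{e.exit_code}): #{e.stderr}"
rescue Summarize::Error => e
  "summarize error: #{e.message}"
end
```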

data/lib/summarize/options.rb ADDED

@@ -0,0 +1,72 @@
+# frozen_string_literal: true
+module Summarize
+  class Options
+    LENGTHS = %i[short medium long xl xxl s m l].freeze
+    VIDEO_MODES = %i[auto transcript understand].freeze
+    FORMATS = %i[text md].freeze
+    MARKDOWN_MODES = %i[off auto llm readability].freeze
+    METRICS_MODES = %i[off on detailed].freeze
+    OPTION_MAP = {
+      model: "--model",
+      length: "--length",
+      language: "--lang",
+      timeout: "--timeout",
+      retries: "--retries",
+      prompt: "--prompt",
+      prompt_file: "--prompt-file",
+      format: "--format",
+      video_mode: "--video-mode",
+      markdown_mode: "--markdown-mode",
+      max_output_tokens: "--max-output-tokens",
+      max_extract_characters: "--max-extract-characters",
+      youtube: "--youtube",
+      transcriber: "--transcriber",
+      firecrawl: "--firecrawl",
+      preprocess: "--preprocess",
+      theme: "--theme",
+      metrics: "--metrics",
+      cli: "--cli",
+      slides_dir: "--slides-dir",
+      slides_scene_threshold: "--slides-scene-threshold",
+      slides_max: "--slides-max",
+      slides_min_duration: "--slides-min-duration"
+    }.freeze
+    BOOLEAN_FLAGS = {
+      force_summary: "--force-summary",
+      timestamps: "--timestamps",
+      no_cache: "--no-cache",
+      no_media_cache: "--no-media-cache",
+      verbose: "--verbose",
+      debug: "--debug",
+      no_color: "--no-color",
+      plain: "--plain",
+      slides: "--slides",
+      slides_debug: "--slides-debug",
+      slides_ocr: "--slides-ocr"
+    }.freeze
+    def initialize(opts = {})
+      @opts = opts
+    end
+    def to_args
+      args = []
+      OPTION_MAP.each do |key, flag|
+        value = @opts[key]
+        next if value.nil?
+        args << flag << value.to_s
+      end
+      BOOLEAN_FLAGS.each do |key, flag|
+        args << flag if @opts[key]
+      end
+      args
+    end
+  end
+end
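
The constants `LENGTHS`, `VIDEO_MODES`, `FORMATS`, `MARKDOWN_MODES`, and `METRICS_MODES` are declared above, but `to_args` does not consult them, so any value is passed through to the CLI unchanged. A minimal sketch of what validation for `length` might look like follows. `normalize_length` is a hypothetical helper, and the accepted set (the `LENGTHS` symbols plus positive integers, as suggested by the `5000` example in the README table) is an assumption, not confirmed CLI behaviour.

```ruby
# Copied from options.rb so the sketch is self-contained.
LENGTHS = %i[short medium long xl xxl s m l].freeze

# Hypothetical helper: returns the string to pass after --length, or raises
# ArgumentError for values outside the assumed accepted set.
def normalize_length(value)
  return value.to_s if value.is_a?(Integer) && value.positive?

  sym = value.to_s.to_sym
  return sym.to_s if LENGTHS.include?(sym)

  raise ArgumentError, "unsupported length: #{value.inspect}"
end
```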

data/lib/summarize/result.rb ADDED

@@ -0,0 +1,101 @@
+# frozen_string_literal: true
+module Summarize
+  class Result
+    attr_reader :raw
+    def initialize(json)
+      @raw = json
+    end
+    def summary
+      raw["summary"]
+    end
+    def title
+      dig("extracted", "title")
+    end
+    def description
+      dig("extracted", "description")
+    end
+    def site_name
+      dig("extracted", "siteName")
+    end
+    def content
+      dig("extracted", "content")
+    end
+    def content_length
+      dig("extracted", "contentLength")
+    end
+    def media_type
+      dig("extracted", "mediaType")
+    end
+    def source
+      dig("extracted", "source")
+    end
+    def model
+      dig("llm", "model")
+    end
+    def provider
+      dig("llm", "provider")
+    end
+    def prompt
+      raw["prompt"]
+    end
+    def metrics
+      raw["metrics"]
+    end
+    def llm_metrics
+      dig("metrics", "llm") || []
+    end
+    def total_tokens
+      llm_metrics.sum { |m| m["totalTokens"] || 0 }
+    end
+    def prompt_tokens
+      llm_metrics.sum { |m| m["promptTokens"] || 0 }
+    end
+    def completion_tokens
+      llm_metrics.sum { |m| m["completionTokens"] || 0 }
+    end
+    def input_kind
+      dig("input", "kind")
+    end
+    def slides
+      raw["slides"]
+    end
+    def success?
+      !summary.nil?
+    end
+    def extract_only?
+      summary.nil? && !content.nil?
+    end
+    def to_h
+      raw
+    end
+    private
+    def dig(*keys)
+      keys.reduce(raw) { |hash, key| hash.is_a?(Hash) ? hash[key] : nil }
+    end
+  end
+end
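
The token accessors above sum one field across every entry in `metrics.llm`, which matters when a single run makes more than one LLM call. The sketch below reproduces that arithmetic on a hypothetical payload; the key names come from the accessors in `result.rb`, but the sample values and the two-entry shape are made up for illustration. It uses the built-in `Hash#dig` rather than the private `reduce`-based helper.

```ruby
# Hypothetical payload with two LLM calls; values are illustrative only.
SAMPLE_PAYLOAD = {
  "summary" => "## Key Points",
  "metrics" => {
    "llm" => [
      { "promptTokens" => 1000, "completionTokens" => 300, "totalTokens" => 1300 },
      { "promptTokens" => 200, "completionTokens" => 50, "totalTokens" => 250 }
    ]
  }
}.freeze

# Hypothetical helper: sums +key+ over all metrics.llm entries, treating a
# missing list or a missing field as zero.
def sum_llm_metric(raw, key)
  entries = raw.dig("metrics", "llm") || []
  entries.sum { |m| m[key] || 0 }
end
```

With this payload the prompt, completion, and total sums come to 1200, 350, and 1550, the same figures the README uses in its `Result` example.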

data/lib/summarize/version.rb ADDED

@@ -0,0 +1,5 @@
+# frozen_string_literal: true
+module Summarize
+  VERSION = "0.1.0"
+end

data/lib/summarize.rb ADDED

@@ -0,0 +1,50 @@
+# frozen_string_literal: true
+require_relative "summarize/version"
+require_relative "summarize/configuration"
+require_relative "summarize/errors"
+require_relative "summarize/options"
+require_relative "summarize/result"
+require_relative "summarize/client"
+module Summarize
+  class << self
+    attr_writer :configuration
+    def configuration
+      @configuration ||= Configuration.new
+    end
+    def configure
+      yield(configuration)
+    end
+    def reset_configuration!
+      @configuration = Configuration.new
+    end
+    # Convenience method: summarize a URL or file path.
+    #
+    #   Summarize.call("https://example.com", length: :short)
+    #
+    def call(input, **opts, &block)
+      Client.new.call(input, **opts, &block)
+    end
+    # Convenience method: summarize text content.
+    #
+    #   Summarize.from_text("Long text...", length: :medium)
+    #
+    def from_text(text, **opts, &block)
+      Client.new.from_text(text, **opts, &block)
+    end
+    # Convenience method: extract content without summarization.
+    #
+    #   Summarize.extract("https://example.com", format: :md)
+    #
+    def extract(input, **opts)
+      Client.new.extract(input, **opts)
+    end
+  end
+end

metadata ADDED

@@ -0,0 +1,74 @@
+--- !ruby/object:Gem::Specification
+name: summarize-ruby
+version: !ruby/object:Gem::Version
+  version: 0.1.0
+platform: ruby
+authors:
+- Martiano
+autorequire:
+bindir: bin
+cert_chain: []
+date: 2026-02-19 00:00:00.000000000 Z
+dependencies:
+- !ruby/object:Gem::Dependency
+  name: json
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '2.0'
+  type: :runtime
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '2.0'
+description: A Ruby gem that wraps the summarize CLI tool, providing a clean Ruby
+  API for summarizing URLs, files, and text using various LLM providers.
+email:
+- hello@martiano.com
+executables: []
+extensions: []
+extra_rdoc_files: []
+files:
+- CHANGELOG.md
+- LICENSE
+- README.md
+- lib/summarize.rb
+- lib/summarize/client.rb
+- lib/summarize/configuration.rb
+- lib/summarize/errors.rb
+- lib/summarize/options.rb
+- lib/summarize/result.rb
+- lib/summarize/version.rb
+homepage: https://github.com/martiano/summarize-ruby
+licenses:
+- MIT
+metadata:
+  homepage_uri: https://github.com/martiano/summarize-ruby
+  source_code_uri: https://github.com/martiano/summarize-ruby
+  changelog_uri: https://github.com/martiano/summarize-ruby/blob/main/CHANGELOG.md
+  bug_tracker_uri: https://github.com/martiano/summarize-ruby/issues
+  documentation_uri: https://rubydoc.info/gems/summarize-ruby
+  rubygems_mfa_required: 'true'
+post_install_message:
+rdoc_options: []
+require_paths:
+- lib
+required_ruby_version: !ruby/object:Gem::Requirement
+  requirements:
+  - - ">="
+    - !ruby/object:Gem::Version
+      version: 3.1.0
+required_rubygems_version: !ruby/object:Gem::Requirement
+  requirements:
+  - - ">="
+    - !ruby/object:Gem::Version
+      version: '0'
+requirements: []
+rubygems_version: 3.3.27
+signing_key:
+specification_version: 4
+summary: Ruby wrapper for the summarize CLI
+test_files: []
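
The `required_ruby_version` entry above is a standard RubyGems requirement, so it can be evaluated with the `Gem::Requirement` and `Gem::Version` classes that ship with Ruby. A small sketch, where `ruby_supported?` is a hypothetical helper name:

```ruby
require "rubygems"

# The constraint recorded in the gem metadata.
REQUIRED_RUBY = Gem::Requirement.new(">= 3.1.0")

# Hypothetical helper: true when +version_string+ satisfies the constraint.
def ruby_supported?(version_string)
  REQUIRED_RUBY.satisfied_by?(Gem::Version.new(version_string))
end
```

Calling `ruby_supported?(RUBY_VERSION)` checks the running interpreter against the same constraint RubyGems applies at install time.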