RubyGems - benchgecko - Versions diffs - 0.2.0 → 0.2.1 - Mend

benchgecko 0.2.0 → 0.2.1

This diff shows the contents of two publicly released package versions as they were published to the registry. It is provided for informational purposes only.

Files changed (6)

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: d8d1081a88ea9b84bd0ce328125cca300ae5b50a4f958936531683a22291f342
-  data.tar.gz: 34aa28808170063b21ebb3dc1a5bbcb4ee7a8696f8697154044db7b57446e0ed
+  metadata.gz: 6a126fb765dd87b64cb48087748871672d902c2aee7913f1b3a2bc5eec495933
+  data.tar.gz: ee0aeaeec203627d07af67928e495f95c6828a3729020eea75a942e902e36e04
 SHA512:
-  metadata.gz: 9d8501cda7ce337d38df5c93918300a69fb739ba10c3032ee1880133aeffc97dfccaf1b88ab660fbb1ae9f8f2bebc70af2bdfef0911aa98c05e77d7fd0f50d19
-  data.tar.gz: ba861f4d31368d4a95017c561306ccee24de87dfa60cc79af246809acd1073d711427be4c31467b5d8192f9545f9702d857fbc6a416dff19f8f30662928054c0
+  metadata.gz: 02d105ade04b5cff84f970c70c926ec961343763da3a47f15da4cc325d30a1ba98e57d377330345cce5306925f832b9f8847c1da530d2a04b43d7334e90a6375
+  data.tar.gz: 0fe0f60f0f26b53bb717c415da31b3c167a148f9c0dcc9ed4315c8c1aad7e0e61f3ef6d98c88c07c42accfd0058e0eec80fed5eab71d5a88c9b3ed402799af38
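The digests above are the values recorded in the gem's checksums.yaml for its two archive members, metadata.gz and data.tar.gz. Both sets change because both members changed between versions. To check them against a downloaded .gem, a minimal sketch like the following can recompute the SHA256 values. It assumes the standard .gem layout (a plain tar archive containing those members), and the helper name `gem_member_digests` is hypothetical.

```ruby
require 'digest'
require 'rubygems/package'

# Hypothetical helper: recompute the SHA256 digests that checksums.yaml
# records for the two members of a .gem archive (a .gem is a plain tar).
def gem_member_digests(gem_path)
  digests = {}
  File.open(gem_path, 'rb') do |io|
    Gem::Package::TarReader.new(io) do |tar|
      tar.each do |entry|
        # Skip checksums.yaml.gz and any other members.
        next unless %w[metadata.gz data.tar.gz].include?(entry.full_name)
        digests[entry.full_name] = Digest::SHA256.hexdigest(entry.read)
      end
    end
  end
  digests
end
```

Comparing the returned hash with the SHA256 section of checksums.yaml for the same version should show matching values; the SHA512 entries can be checked the same way with `Digest::SHA512`.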

data/lib/benchgecko.rb CHANGED

@@ -1,261 +1,43 @@
-# frozen_string_literal: true
-# BenchGecko - The data layer of the AI economy.
-# Every model. Every agent. Everything AI. Tracked.
-# https://benchgecko.ai
+require 'net/http'
+require 'json'
 module BenchGecko
-  VERSION = "0.2.0"
-  # Represents an AI model with its benchmark scores, pricing, and metadata.
-  class Model
-    attr_reader :id, :name, :provider, :parameters, :context_window,
-                :input_price, :output_price, :benchmarks, :metadata
-    def initialize(attrs = {})
-      @id             = attrs[:id] || attrs["id"]
-      @name           = attrs[:name] || attrs["name"]
-      @provider       = attrs[:provider] || attrs["provider"]
-      @parameters     = attrs[:parameters] || attrs["parameters"]
-      @context_window = attrs[:context_window] || attrs["context_window"]
-      @input_price    = attrs[:input_price] || attrs["input_price"]
-      @output_price   = attrs[:output_price] || attrs["output_price"]
-      @benchmarks     = attrs[:benchmarks] || attrs["benchmarks"] || {}
-      @metadata       = attrs[:metadata] || attrs["metadata"] || {}
-    end
-    # Cost per million tokens (input + output averaged)
-    def cost_per_million
-      return nil unless input_price && output_price
-      ((input_price + output_price) / 2.0).round(4)
-    end
-    # Returns the score for a specific benchmark
-    def score(benchmark_name)
-      benchmarks[benchmark_name.to_s] || benchmarks[benchmark_name.to_sym]
-    end
-    # Returns a hash summary suitable for comparison tables
-    def to_summary
-      {
-        name: name,
-        provider: provider,
-        parameters: parameters,
-        context_window: context_window,
-        cost_per_million: cost_per_million
-      }
-    end
+  VERSION = '0.2.1'
+  BASE_URL = 'https://benchgecko.ai/api/v1'
-    def to_s
-      "#{name} (#{provider}) - #{parameters}B params"
-    end
+  def self.models(params = {})
+    get('/models', params)
   end
-  # Represents an AI agent with capabilities and scores.
-  class Agent
-    attr_reader :id, :name, :category, :provider, :models_used,
-                :scores, :capabilities, :metadata
-    def initialize(attrs = {})
-      @id           = attrs[:id] || attrs["id"]
-      @name         = attrs[:name] || attrs["name"]
-      @category     = attrs[:category] || attrs["category"]
-      @provider     = attrs[:provider] || attrs["provider"]
-      @models_used  = attrs[:models_used] || attrs["models_used"] || []
-      @scores       = attrs[:scores] || attrs["scores"] || {}
-      @capabilities = attrs[:capabilities] || attrs["capabilities"] || []
-      @metadata     = attrs[:metadata] || attrs["metadata"] || {}
-    end
-    def supports?(capability)
-      capabilities.include?(capability.to_s)
-    end
-    def to_s
-      "#{name} (#{category}) by #{provider}"
-    end
+  def self.model(slug)
+    get("/models/#{slug}")
   end
-  # Benchmark categories tracked by BenchGecko
-  BENCHMARK_CATEGORIES = {
-    reasoning: {
-      name: "Reasoning",
-      benchmarks: %w[MMLU MMLU-Pro ARC-Challenge HellaSwag WinoGrande GPQA],
-      description: "Logical reasoning, knowledge, and common sense"
-    },
-    coding: {
-      name: "Coding",
-      benchmarks: %w[HumanEval MBPP SWE-bench LiveCodeBench BigCodeBench],
-      description: "Code generation, debugging, and software engineering"
-    },
-    math: {
-      name: "Mathematics",
-      benchmarks: %w[GSM8K MATH AIME AMC Competition-Math],
-      description: "Mathematical problem solving from arithmetic to olympiad"
-    },
-    instruction: {
-      name: "Instruction Following",
-      benchmarks: %w[IFEval MT-Bench AlpacaEval Chatbot-Arena],
-      description: "Following complex instructions and conversational ability"
-    },
-    safety: {
-      name: "Safety",
-      benchmarks: %w[TruthfulQA BBQ ToxiGen BOLD],
-      description: "Truthfulness, bias, and safety alignment"
-    },
-    multimodal: {
-      name: "Multimodal",
-      benchmarks: %w[MMMU MathVista VQAv2 TextVQA DocVQA],
-      description: "Vision, document understanding, and cross-modal reasoning"
-    },
-    multilingual: {
-      name: "Multilingual",
-      benchmarks: %w[MGSM XL-Sum FLORES],
-      description: "Performance across languages and translation"
-    },
-    long_context: {
-      name: "Long Context",
-      benchmarks: %w[RULER NIAH InfiniteBench LongBench],
-      description: "Retrieval and reasoning over long documents"
-    }
-  }.freeze
-  # Built-in model catalog with real benchmark data and pricing
-  MODELS = {
-    "gpt-4o" => {
-      name: "GPT-4o", provider: "OpenAI", parameters: 200,
-      context_window: 128_000, input_price: 2.50, output_price: 10.00,
-      benchmarks: { "MMLU" => 88.7, "HumanEval" => 90.2, "GSM8K" => 95.8, "GPQA" => 53.6 }
-    },
-    "claude-3.5-sonnet" => {
-      name: "Claude 3.5 Sonnet", provider: "Anthropic", parameters: nil,
-      context_window: 200_000, input_price: 3.00, output_price: 15.00,
-      benchmarks: { "MMLU" => 88.7, "HumanEval" => 92.0, "GSM8K" => 96.4, "GPQA" => 59.4 }
-    },
-    "gemini-2.0-flash" => {
-      name: "Gemini 2.0 Flash", provider: "Google", parameters: nil,
-      context_window: 1_000_000, input_price: 0.10, output_price: 0.40,
-      benchmarks: { "MMLU" => 85.2, "HumanEval" => 84.0, "GSM8K" => 92.1 }
-    },
-    "llama-3.1-405b" => {
-      name: "Llama 3.1 405B", provider: "Meta", parameters: 405,
-      context_window: 128_000, input_price: 3.00, output_price: 3.00,
-      benchmarks: { "MMLU" => 88.6, "HumanEval" => 89.0, "GSM8K" => 96.8, "GPQA" => 50.7 }
-    },
-    "mistral-large" => {
-      name: "Mistral Large", provider: "Mistral", parameters: 123,
-      context_window: 128_000, input_price: 2.00, output_price: 6.00,
-      benchmarks: { "MMLU" => 84.0, "HumanEval" => 82.0, "GSM8K" => 91.2 }
-    },
-    "deepseek-v3" => {
-      name: "DeepSeek V3", provider: "DeepSeek", parameters: 671,
-      context_window: 128_000, input_price: 0.27, output_price: 1.10,
-      benchmarks: { "MMLU" => 87.1, "HumanEval" => 82.6, "GSM8K" => 89.3, "GPQA" => 59.1 }
-    }
-  }.freeze
-  class << self
-    # Retrieve a model by its identifier
-    #
-    #   model = BenchGecko.get_model("gpt-4o")
-    #   model.name          #=> "GPT-4o"
-    #   model.provider      #=> "OpenAI"
-    #   model.score("MMLU") #=> 88.7
-    #
-    def get_model(model_id)
-      data = MODELS[model_id.to_s]
-      return nil unless data
-      Model.new(data.merge(id: model_id.to_s))
-    end
-    # List all available model identifiers
-    def list_models
-      MODELS.keys
-    end
-    # Compare two models side by side across benchmarks and pricing
-    #
-    #   result = BenchGecko.compare_models("gpt-4o", "claude-3.5-sonnet")
-    #   result[:benchmark_diff]  #=> {"MMLU" => 0.0, "HumanEval" => -1.8, ...}
-    #   result[:cheaper]         #=> "gpt-4o"
-    #
-    def compare_models(model_a_id, model_b_id)
-      a = get_model(model_a_id)
-      b = get_model(model_b_id)
-      return nil unless a && b
-      all_benchmarks = (a.benchmarks.keys + b.benchmarks.keys).uniq
-      benchmark_diff = {}
-      all_benchmarks.each do |bench|
-        score_a = a.score(bench)
-        score_b = b.score(bench)
-        benchmark_diff[bench] = (score_a && score_b) ? (score_a - score_b).round(2) : nil
-      end
-      cost_a = a.cost_per_million
-      cost_b = b.cost_per_million
-      cheaper = if cost_a && cost_b
-                  cost_a <= cost_b ? model_a_id : model_b_id
-                end
-      {
-        model_a: a.to_summary,
-        model_b: b.to_summary,
-        benchmark_diff: benchmark_diff,
-        cheaper: cheaper,
-        cost_ratio: (cost_a && cost_b && cost_b > 0) ? (cost_a / cost_b).round(2) : nil
-      }
-    end
+  def self.benchmarks
+    get('/benchmarks')
+  end
-    # Estimate cost for a given number of tokens
-    #
-    #   BenchGecko.estimate_cost("gpt-4o", input_tokens: 1_000_000, output_tokens: 500_000)
-    #   #=> { input_cost: 2.50, output_cost: 5.00, total: 7.50 }
-    #
-    def estimate_cost(model_id, input_tokens:, output_tokens: 0)
-      model = get_model(model_id)
-      return nil unless model&.input_price && model&.output_price
+  def self.compare(*slugs)
+    get('/compare', models: slugs.join(','))
+  end
-      input_cost  = (model.input_price * input_tokens / 1_000_000.0).round(4)
-      output_cost = (model.output_price * output_tokens / 1_000_000.0).round(4)
+  def self.pricing(slug = nil)
+    slug ? get("/pricing/#{slug}") : get('/pricing')
+  end
-      {
-        model: model.name,
-        input_tokens: input_tokens,
-        output_tokens: output_tokens,
-        input_cost: input_cost,
-        output_cost: output_cost,
-        total: (input_cost + output_cost).round(4)
-      }
-    end
+  def self.providers
+    get('/providers')
+  end
-    # List all benchmark categories
-    def benchmark_categories
-      BENCHMARK_CATEGORIES
-    end
+  def self.agents
+    get('/agents')
+  end
-    # Find models that score above a threshold on a given benchmark
-    #
-    #   BenchGecko.top_models("MMLU", min_score: 87.0)
-    #   #=> [Model, Model, ...]
-    #
-    def top_models(benchmark, min_score: 0)
-      MODELS.filter_map do |id, data|
-        score = data[:benchmarks][benchmark]
-        next unless score && score >= min_score
-        get_model(id)
-      end.sort_by { |m| -m.score(benchmark) }
-    end
+  private
-    # Find the cheapest model that meets a minimum score on a benchmark
-    #
-    #   BenchGecko.cheapest_above("MMLU", 85.0)
-    #   #=> Model (Gemini 2.0 Flash)
-    #
-    def cheapest_above(benchmark, min_score)
-      top_models(benchmark, min_score: min_score)
-        .select(&:cost_per_million)
-        .min_by(&:cost_per_million)
-    end
+  def self.get(path, params = {})
+    uri = URI("#{BASE_URL}#{path}")
+    uri.query = URI.encode_www_form(params) unless params.empty?
+    JSON.parse(Net::HTTP.get(uri))
   end
 end
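Two details of the new request helper are worth noting. In Ruby, a bare `private` only affects instance methods defined after it, so `def self.get` stays a public module method and `BenchGecko.get` can still be called directly. In addition, `Net::HTTP.get` returns the response body without checking the status code, so an error page would be handed straight to `JSON.parse`. The sketch below shows one way to address both, under stated assumptions: the module name, the `APIError` class, and the public `build_uri` helper are hypothetical, while `BASE_URL` and the `/compare` query shape are taken from the 0.2.1 code.

```ruby
require 'net/http'
require 'json'
require 'uri'

# Hypothetical hardened client. BASE_URL and the endpoint paths come from the
# 0.2.1 diff; the module name, APIError, and build_uri are illustrative.
module BenchGeckoClientSketch
  BASE_URL = 'https://benchgecko.ai/api/v1'

  class APIError < StandardError; end

  # Build the request URI the same way 0.2.1 does: path appended to BASE_URL,
  # params form-encoded into the query string only when present.
  def self.build_uri(path, params = {})
    uri = URI("#{BASE_URL}#{path}")
    uri.query = URI.encode_www_form(params) unless params.empty?
    uri
  end

  def self.compare(*slugs)
    get('/compare', models: slugs.join(','))
  end

  # get_response exposes the status code, so non-2xx responses raise
  # instead of being passed to JSON.parse.
  def self.get(path, params = {})
    response = Net::HTTP.get_response(build_uri(path, params))
    unless response.is_a?(Net::HTTPSuccess)
      raise APIError, "HTTP #{response.code} from #{path}"
    end
    JSON.parse(response.body)
  end
  # Unlike a bare `private`, this actually hides the singleton method.
  private_class_method :get
end
```

With this layout, `BenchGeckoClientSketch.compare('gpt-4o', 'claude-3.5-sonnet')` would request `/compare?models=gpt-4o%2Cclaude-3.5-sonnet`, because `URI.encode_www_form` percent-encodes the comma. Whether the API accepts those slugs is an assumption.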

metadata CHANGED

@@ -1,29 +1,23 @@
 --- !ruby/object:Gem::Specification
 name: benchgecko
 version: !ruby/object:Gem::Version
-  version: 0.2.0
+  version: 0.2.1
 platform: ruby
 authors:
 - BenchGecko
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2026-04-11 00:00:00.000000000 Z
+date: 2026-04-25 00:00:00.000000000 Z
 dependencies: []
-description: Official Ruby SDK for BenchGecko, the data layer of the AI economy. Query
-  thousands of AI models with cross-provider pricing and daily price history. Track
-  company valuations, funding timelines, and revenue estimates. Pull benchmark scores,
-  agent leaderboards, and a live changelog of every price drop, every launch, every
-  deprecation. If it moved in AI today, it's already on BenchGecko.
-email:
-- hello@benchgecko.ai
+description: Query AI model data, benchmark scores, and run side-by-side comparisons.
+  BenchGecko tracks every major AI model, benchmark, and provider with cross-provider
+  pricing.
+email: hello@benchgecko.ai
 executables: []
 extensions: []
 extra_rdoc_files: []
 files:
-- CHANGELOG.md
-- LICENSE.txt
-- README.md
 - lib/benchgecko.rb
 homepage: https://benchgecko.ai
 licenses:
@@ -31,7 +25,8 @@ licenses:
 metadata:
   homepage_uri: https://benchgecko.ai
   source_code_uri: https://github.com/BenchGecko/benchgecko-ruby
-  changelog_uri: https://github.com/BenchGecko/benchgecko-ruby/blob/main/CHANGELOG.md
+  documentation_uri: https://benchgecko.ai/api-docs
+  changelog_uri: https://benchgecko.ai/changelog
 post_install_message:
 rdoc_options: []
 require_paths:
@@ -40,7 +35,7 @@ required_ruby_version: !ruby/object:Gem::Requirement
   requirements:
   - - ">="
     - !ruby/object:Gem::Version
-      version: 2.7.0
+      version: '0'
 required_rubygems_version: !ruby/object:Gem::Requirement
   requirements:
   - - ">="
@@ -50,6 +45,5 @@ requirements: []
 rubygems_version: 3.0.3.1
 signing_key:
 specification_version: 4
-summary: The data layer of the AI economy. Every model. Every agent. Everything AI.
-  Tracked.
+summary: Ruby SDK for BenchGecko AI model data platform
 test_files: []
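The metadata above is the YAML serialization of the gem's `Gem::Specification`. The gemspec source is not shipped in either version, so the following is only a hypothetical reconstruction of a spec whose fields match the 0.2.1 metadata. It also illustrates why two fields changed shape: assigning `email` a plain string rather than an array produces the scalar `email:` line, and leaving `required_ruby_version` unset records RubyGems' default requirement `>= 0` in place of 0.2.0's `>= 2.7.0`. The `licenses` value is elided in the hunk above, so it is left out here.

```ruby
require 'rubygems'

# Hypothetical reconstruction of a gemspec matching the 0.2.1 metadata.
spec = Gem::Specification.new do |s|
  s.name        = 'benchgecko'
  s.version     = '0.2.1'
  s.authors     = ['BenchGecko']
  # A plain string serializes as a scalar; 0.2.0's array form became a YAML list.
  s.email       = 'hello@benchgecko.ai'
  s.summary     = 'Ruby SDK for BenchGecko AI model data platform'
  s.description = 'Query AI model data, benchmark scores, and run side-by-side ' \
                  'comparisons. BenchGecko tracks every major AI model, benchmark, ' \
                  'and provider with cross-provider pricing.'
  s.homepage    = 'https://benchgecko.ai'
  # Only the library file ships in 0.2.1; README, LICENSE, and CHANGELOG were dropped.
  s.files       = ['lib/benchgecko.rb']
  s.metadata    = {
    'homepage_uri'      => 'https://benchgecko.ai',
    'source_code_uri'   => 'https://github.com/BenchGecko/benchgecko-ruby',
    'documentation_uri' => 'https://benchgecko.ai/api-docs',
    'changelog_uri'     => 'https://benchgecko.ai/changelog'
  }
  # required_ruby_version is left unset, so the default requirement applies.
  # 0.2.0 would correspond to: s.required_ruby_version = '>= 2.7.0'
end

puts spec.required_ruby_version  # prints ">= 0"
```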

data/CHANGELOG.md DELETED

@@ -1,15 +0,0 @@
-# Changelog
-## 0.2.0 (2026-03-27)
-- Rewrite gem description, summary, and README with the official BenchGecko brand voice
-- Remove hardcoded model and provider counts in favor of evergreen language
-- Reframe the SDK around the full BenchGecko data layer: models, companies, benchmarks, agents, and the live changelog
-## 0.1.0 (2026-03-30)
-- Initial release
-- Model lookup, comparison, and cost estimation
-- Built-in catalog: GPT-4o, Claude 3.5 Sonnet, Gemini 2.0 Flash, Llama 3.1 405B, Mistral Large, DeepSeek V3
-- Benchmark categories: reasoning, coding, math, instruction, safety, multimodal, multilingual, long context
-- Top models filtering and cheapest-above-threshold finder

data/LICENSE.txt DELETED

@@ -1,21 +0,0 @@
-MIT License
-Copyright (c) 2026 BenchGecko
-Permission is hereby granted, free of charge, to any person obtaining a copy
-of this software and associated documentation files (the "Software"), to deal
-in the Software without restriction, including without limitation the rights
-to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
-copies of the Software, and to permit persons to whom the Software is
-furnished to do so, subject to the following conditions:
-The above copyright notice and this permission notice shall be included in all
-copies or substantial portions of the Software.
-THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
-IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
-FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
-AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
-LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
-OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
-SOFTWARE.

data/README.md DELETED

@@ -1,129 +0,0 @@
-# BenchGecko for Ruby
-**The data layer of the AI economy.** Official Ruby SDK for querying thousands of AI models with cross-provider pricing and daily price history, company valuations, funding timelines, revenue estimates, benchmark scores, agent leaderboards, and a live changelog of every price drop, every launch, every deprecation.
-If it moved in AI today, it's already on BenchGecko.
-## What's Tracked
-- **Models.** Thousands of AI models with cross-provider pricing and daily price history.
-- **Companies.** Hundreds of AI companies with valuations, funding timelines, and revenue estimates.
-- **Benchmarks.** Reasoning, coding, math, instruction following, safety, multimodal, multilingual, long context.
-- **Agents.** Developer adoption signals and agent leaderboards.
-- **Changelog.** Every price drop, every launch, every deprecation, as it happens.
-## Installation
-Add to your Gemfile:
-```ruby
-gem "benchgecko"
-```
-Or install directly:
-```bash
-gem install benchgecko
-```
-## Quick Start
-```ruby
-require "benchgecko"
-# Look up any model
-model = BenchGecko.get_model("claude-3.5-sonnet")
-puts model.name       #=> "Claude 3.5 Sonnet"
-puts model.provider   #=> "Anthropic"
-puts model.score("MMLU")  #=> 88.7
-# List all tracked models
-BenchGecko.list_models.each { |id| puts id }
-```
-## Comparing Models
-The comparison engine surfaces benchmark differences and pricing ratios, making it straightforward to evaluate tradeoffs between models:
-```ruby
-result = BenchGecko.compare_models("gpt-4o", "claude-3.5-sonnet")
-puts result[:cheaper]           #=> "gpt-4o"
-puts result[:cost_ratio]        #=> 0.69
-puts result[:benchmark_diff]    #=> {"MMLU" => 0.0, "HumanEval" => -1.8, ...}
-# Positive diff means model_a scores higher
-result[:benchmark_diff].each do |bench, diff|
-  next unless diff
-  winner = diff >= 0 ? "GPT-4o" : "Claude 3.5 Sonnet"
-  puts "#{bench}: #{winner} wins by #{diff.abs} points"
-end
-```
-## Cost Estimation
-Estimate inference costs before committing to a provider. Prices are per million tokens:
-```ruby
-cost = BenchGecko.estimate_cost("gpt-4o",
-  input_tokens: 2_000_000,
-  output_tokens: 500_000
-)
-puts cost[:input_cost]   #=> 5.0
-puts cost[:output_cost]  #=> 5.0
-puts cost[:total]        #=> 10.0
-```
-## Finding the Right Model
-Filter models by benchmark performance to find the best fit for your workload:
-```ruby
-# All models scoring 87+ on MMLU
-strong_reasoners = BenchGecko.top_models("MMLU", min_score: 87.0)
-strong_reasoners.each { |m| puts "#{m.name}: #{m.score('MMLU')}" }
-# Cheapest model above a quality threshold
-budget_pick = BenchGecko.cheapest_above("MMLU", 85.0)
-puts "#{budget_pick.name} at $#{budget_pick.cost_per_million}/M tokens"
-```
-## Benchmark Categories
-BenchGecko organizes benchmarks into categories covering reasoning, coding, math, instruction following, safety, multimodal, multilingual, and long context evaluation:
-```ruby
-BenchGecko.benchmark_categories.each do |key, info|
-  puts "#{info[:name]}: #{info[:benchmarks].join(', ')}"
-  puts "  #{info[:description]}"
-end
-```
-## Built-in Model Catalog
-The gem ships with a curated catalog of major models from OpenAI, Anthropic, Google, Meta, Mistral, and DeepSeek. Each entry includes benchmark scores, parameter counts, context window sizes, and per-token pricing.
-```ruby
-model = BenchGecko.get_model("deepseek-v3")
-puts model.parameters       #=> 671
-puts model.context_window   #=> 128000
-puts model.cost_per_million  #=> 0.685
-```
-## Use Cases
-- **Model selection pipelines.** Programmatically pick the cheapest model that meets your quality bar.
-- **Cost monitoring.** Estimate monthly spend across different model configurations.
-- **Benchmark dashboards.** Pull structured scores into internal reporting tools.
-- **Agent evaluation.** Compare AI agents across capability dimensions.
-- **Pricing intelligence.** Track every price drop and launch through the live changelog.
-## Resources
-- [BenchGecko](https://benchgecko.ai). The data layer of the AI economy.
-- [Source Code](https://github.com/BenchGecko/benchgecko-ruby). Contributions welcome.
-## License
-MIT License. See [LICENSE.txt](LICENSE.txt) for details.
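The deleted README documents `estimate_cost`, `compare_models`, and the other 0.2.0 helpers, none of which remain in the 0.2.1 `lib/benchgecko.rb`. Callers who relied on cost estimation could recompute it locally with the same arithmetic 0.2.0 used: price per million tokens times token count, divided by one million, rounded to four places. This sketch takes prices as arguments because the shape of the 0.2.1 `/pricing` response is not shown in this diff; the standalone method is a hypothetical replacement, not part of either release.

```ruby
# Hypothetical local replacement for the removed BenchGecko.estimate_cost.
# Prices are USD per million tokens, as in the 0.2.0 built-in catalog.
def estimate_cost(input_price:, output_price:, input_tokens:, output_tokens: 0)
  input_cost  = (input_price * input_tokens / 1_000_000.0).round(4)
  output_cost = (output_price * output_tokens / 1_000_000.0).round(4)
  {
    input_cost: input_cost,
    output_cost: output_cost,
    total: (input_cost + output_cost).round(4)
  }
end

# GPT-4o prices from the 0.2.0 catalog: $2.50 input, $10.00 output.
cost = estimate_cost(input_price: 2.50, output_price: 10.00,
                     input_tokens: 2_000_000, output_tokens: 500_000)
puts cost[:total]  # prints 10.0
```

This reproduces the README's worked example (5.0 input, 5.0 output, 10.0 total) without depending on the removed catalog.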