simple_inference 0.1.3 → 0.1.4

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: 50a8ea07e0e30771b6d42f2bacf12f12b379f1151bb44947bb5febe3cac70cf9
-  data.tar.gz: c655b141ea39e518c5cdcc30cc28bac249ef23acfae8c05f4cf852e199a98a80
+  metadata.gz: 8d8b01060969cbab2df30a38e16b7952a877188e89bd720209c15b57f9f79687
+  data.tar.gz: e278f52f76cf6f7bd3f74e567731bbdec016769b2b720161e9907348fd9b54c3
 SHA512:
-  metadata.gz: '0483fda13365abb75bde99643ada2fc95017e4883e07d823d8b86912e9553764befe7c6a92222aeca3ee05d20e8f6b1e1a23a4a74decf45383bc4cc9c055d357'
-  data.tar.gz: 4d115195078c198b2c2c1b4bc1307a05e337884449b19393488374cd1710efaf6186e68c7060417141e5477933ea3fafc5223896adc06c1d1063263437254d56
+  metadata.gz: cc6724a0fbe640d7af0d6bb35bfee81e6b95d501b23734f2874dfddbb2f71dcb7ae59557b742427bb9322804fbca632cbe95abe68f9ea26709303fea86550605
+  data.tar.gz: 871b06d6e585bac84cf38ac3abef77b3940dd41f4868c76e08b19c317c2b35c93f81adde9a0ec73e9c20a689062cade65c0115d6e82afab86444d253f9964688
data/README.md CHANGED
@@ -1,13 +1,24 @@
-## simple_inference Ruby SDK
+# SimpleInference
 
-Fiber-friendly Ruby client for the Simple Inference Server APIs (chat, embeddings, audio, rerank, health), designed to work well inside Rails apps and background jobs.
+A lightweight, Fiber-friendly Ruby client for OpenAI-compatible LLM APIs. Works seamlessly with OpenAI, Azure OpenAI, 火山引擎 (Volcengine), DeepSeek, Groq, Together AI, and any other provider that implements the OpenAI API specification.
 
-### Installation
+Designed for simplicity and compatibility – no heavy dependencies, just pure Ruby with `Net::HTTP`.
 
-Add the gem to your Rails application's `Gemfile`, pointing at this repository path:
+## Features
+
+- 🔌 **Universal compatibility** – Works with any OpenAI-compatible API provider
+- 🌊 **Streaming support** – Native SSE streaming for chat completions
+- 🧵 **Fiber-friendly** – Compatible with the Ruby 3 Fiber scheduler; works great with Falcon
+- 🔧 **Flexible configuration** – Customizable API prefix for non-standard endpoints
+- 🎯 **Simple interface** – Receive-an-Object / Return-an-Object style API
+- 📦 **Zero runtime dependencies** – Uses only the Ruby standard library
+
+## Installation
+
+Add to your Gemfile:
 
 ```ruby
-gem "simple_inference", path: "sdks/ruby"
+gem "simple_inference"
 ```
 
 Then run:
@@ -16,231 +27,378 @@ Then run:
 ```
 bundle install
 ```
 
-### Configuration
-
-You can configure the client via environment variables:
-
-- `SIMPLE_INFERENCE_BASE_URL`: e.g. `http://localhost:8000`
-- `SIMPLE_INFERENCE_API_KEY`: optional, if your deployment requires auth (sent as `Authorization: Bearer <token>`).
-- `SIMPLE_INFERENCE_TIMEOUT`, `SIMPLE_INFERENCE_OPEN_TIMEOUT`, `SIMPLE_INFERENCE_READ_TIMEOUT` (seconds).
-- `SIMPLE_INFERENCE_RAISE_ON_ERROR`: `true`/`false` (default `true`).
-
-Or explicitly when constructing a client:
+## Quick Start
 
 ```ruby
+require "simple_inference"
+
+# Connect to OpenAI
 client = SimpleInference::Client.new(
-  base_url: "http://localhost:8000",
-  api_key: ENV["SIMPLE_INFERENCE_API_KEY"],
-  timeout: 30.0
+  base_url: "https://api.openai.com",
+  api_key: ENV["OPENAI_API_KEY"]
+)
+
+response = client.chat_completions(
+  model: "gpt-4o-mini",
+  messages: [{ "role" => "user", "content" => "Hello!" }]
 )
+
+puts response[:body]["choices"][0]["message"]["content"]
 ```
 
-For convenience, you can also use the module constructor:
+## Configuration
+
+### Options
+
+| Option | Env Variable | Default | Description |
+|--------|--------------|---------|-------------|
+| `base_url` | `SIMPLE_INFERENCE_BASE_URL` | `http://localhost:8000` | API base URL |
+| `api_key` | `SIMPLE_INFERENCE_API_KEY` | `nil` | API key (sent as `Authorization: Bearer <token>`) |
+| `api_prefix` | `SIMPLE_INFERENCE_API_PREFIX` | `/v1` | API path prefix (e.g., `/v1`, or an empty string for some providers) |
+| `timeout` | `SIMPLE_INFERENCE_TIMEOUT` | `nil` | Request timeout in seconds |
+| `open_timeout` | `SIMPLE_INFERENCE_OPEN_TIMEOUT` | `nil` | Connection open timeout in seconds |
+| `read_timeout` | `SIMPLE_INFERENCE_READ_TIMEOUT` | `nil` | Read timeout in seconds |
+| `raise_on_error` | `SIMPLE_INFERENCE_RAISE_ON_ERROR` | `true` | Raise exceptions on HTTP errors |
+| `headers` | – | `{}` | Additional headers to send with requests |
+| `adapter` | – | `Default` | HTTP adapter (see [Adapters](#http-adapters)) |
+
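+The same settings can come from the environment. A minimal sketch, assuming a client built without explicit options falls back to the `SIMPLE_INFERENCE_*` variables listed above:
+
+```ruby
+ENV["SIMPLE_INFERENCE_BASE_URL"] = "https://api.openai.com"
+ENV["SIMPLE_INFERENCE_API_KEY"] = "sk-..." # placeholder key
+
+client = SimpleInference::Client.new # picks up SIMPLE_INFERENCE_* settings
+```
+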
+### Provider Examples
+
+#### OpenAI
 
 ```ruby
-client = SimpleInference.new(base_url: "http://localhost:8000")
+client = SimpleInference::Client.new(
+  base_url: "https://api.openai.com",
+  api_key: ENV["OPENAI_API_KEY"]
+)
 ```
 
-### Rails integration example
+#### 火山引擎 (Volcengine / ByteDance)
 
-Create an initializer, for example `config/initializers/simple_inference.rb`:
+Volcengine's API paths do not include the `/v1` prefix, so set `api_prefix: ""`:
 
 ```ruby
-SIMPLE_INFERENCE_CLIENT = SimpleInference::Client.new(
-  base_url: ENV.fetch("SIMPLE_INFERENCE_BASE_URL", "http://localhost:8000"),
-  api_key: ENV["SIMPLE_INFERENCE_API_KEY"]
+client = SimpleInference::Client.new(
+  base_url: "https://ark.cn-beijing.volces.com/api/v3",
+  api_key: ENV["ARK_API_KEY"],
+  api_prefix: "" # Important: Volcengine does not use the /v1 prefix
+)
+
+response = client.chat_completions(
+  model: "deepseek-v3-250324",
+  messages: [
+    { "role" => "system", "content" => "You are an AI assistant" },
+    { "role" => "user", "content" => "Hello" }
+  ]
 )
 ```
 
-Then in a controller:
+#### DeepSeek
 
 ```ruby
-class ChatsController < ApplicationController
-  def create
-    result = SIMPLE_INFERENCE_CLIENT.chat_completions(
-      model: "local-llm",
-      messages: [
-        { "role" => "user", "content" => params[:prompt] }
-      ]
-    )
-
-    render json: result[:body], status: result[:status]
-  end
-end
+client = SimpleInference::Client.new(
+  base_url: "https://api.deepseek.com",
+  api_key: ENV["DEEPSEEK_API_KEY"]
+)
 ```
 
-You can also use the client in background jobs:
+#### Groq
 
 ```ruby
-class EmbedJob < ApplicationJob
-  queue_as :default
+client = SimpleInference::Client.new(
+  base_url: "https://api.groq.com/openai",
+  api_key: ENV["GROQ_API_KEY"]
+)
+```
 
-  def perform(text)
-    result = SIMPLE_INFERENCE_CLIENT.embeddings(
-      model: "bge-m3",
-      input: text
-    )
+#### Together AI
 
-    vector = result[:body]["data"].first["embedding"]
-    # TODO: persist the vector (e.g. in DB or a vector store)
-  end
-end
+```ruby
+client = SimpleInference::Client.new(
+  base_url: "https://api.together.xyz",
+  api_key: ENV["TOGETHER_API_KEY"]
+)
 ```
 
-And for health checks / maintenance tasks:
+#### Local inference servers (Ollama, vLLM, etc.)
 
 ```ruby
-if SIMPLE_INFERENCE_CLIENT.healthy?
-  Rails.logger.info("Inference server is healthy")
-else
-  Rails.logger.warn("Inference server is unhealthy")
-end
+# Ollama
+client = SimpleInference::Client.new(
+  base_url: "http://localhost:11434"
+)
 
-models = SIMPLE_INFERENCE_CLIENT.list_models
-Rails.logger.info("Available models: #{models[:body].inspect}")
+# vLLM
+client = SimpleInference::Client.new(
+  base_url: "http://localhost:8000"
+)
 ```
 
-### API methods
+#### Custom authentication header
 
-- `client.chat_completions(params)` → `POST /v1/chat/completions`
-- `client.embeddings(params)` → `POST /v1/embeddings`
-- `client.rerank(params)` → `POST /v1/rerank`
-- `client.list_models` → `GET /v1/models`
-- `client.health` → `GET /health`
-- `client.healthy?` → boolean helper based on `/health`
-- `client.audio_transcriptions(params)` → `POST /v1/audio/transcriptions`
-- `client.audio_translations(params)` → `POST /v1/audio/translations`
+Some providers use non-standard authentication headers:
 
-All methods follow a Receive-an-Object / Return-an-Object style:
+```ruby
+client = SimpleInference::Client.new(
+  base_url: "https://my-service.example.com",
+  api_prefix: "/v1",
+  headers: {
+    "x-api-key" => ENV["MY_SERVICE_KEY"]
+  }
+)
+```
 
-- Input: a Ruby `Hash` (keys can be strings or symbols).
-- Output: a `Hash` with keys:
-  - `:status` – HTTP status code
-  - `:headers` – response headers (lowercased keys)
-  - `:body` – parsed JSON (Ruby `Hash`) when the response is JSON, or a `String` for text bodies.
+## API Methods
 
-### Error handling
+### Chat Completions
 
-By default (`raise_on_error: true`) non-2xx HTTP responses raise:
+```ruby
+response = client.chat_completions(
+  model: "gpt-4o-mini",
+  messages: [
+    { "role" => "system", "content" => "You are a helpful assistant." },
+    { "role" => "user", "content" => "Hello!" }
+  ],
+  temperature: 0.7,
+  max_tokens: 1000
+)
 
-- `SimpleInference::Errors::HTTPError` – wraps status, headers and raw body.
+puts response[:body]["choices"][0]["message"]["content"]
+```
 
-Network and parsing errors are mapped to:
+### Streaming Chat Completions
 
-- `SimpleInference::Errors::TimeoutError`
-- `SimpleInference::Errors::ConnectionError`
-- `SimpleInference::Errors::DecodeError`
+```ruby
+client.chat_completions_stream(
+  model: "gpt-4o-mini",
+  messages: [{ "role" => "user", "content" => "Tell me a story" }]
+) do |event|
+  delta = event.dig("choices", 0, "delta", "content")
+  print delta if delta
+end
+puts
+```
 
-If you prefer to handle HTTP error codes manually, disable raising:
+Or use it as an Enumerator:
 
 ```ruby
-client = SimpleInference::Client.new(
-  base_url: "http://localhost:8000",
-  raise_on_error: false
+stream = client.chat_completions_stream(
+  model: "gpt-4o-mini",
+  messages: [{ "role" => "user", "content" => "Hello" }]
 )
 
-response = client.embeddings(model: "local-embed", input: "hello")
-if response[:status] == 200
-  # happy path
-else
-  Rails.logger.warn("Embedding call failed: #{response[:status]} #{response[:body].inspect}")
+stream.each do |event|
+  # process each parsed SSE event (a Hash)
 end
 ```
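+
+For example, to collect the streamed deltas into a single string (a minimal sketch using only the streaming API shown above):
+
+```ruby
+full_text = +""
+
+client.chat_completions_stream(
+  model: "gpt-4o-mini",
+  messages: [{ "role" => "user", "content" => "Tell me a story" }]
+) do |event|
+  delta = event.dig("choices", 0, "delta", "content")
+  full_text << delta if delta
+end
+
+puts full_text
+```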
 
-### Using with OpenAI and compatible services
+### Embeddings
+
+```ruby
+response = client.embeddings(
+  model: "text-embedding-3-small",
+  input: "Hello, world!"
+)
 
-Because this SDK follows the OpenAI-style HTTP paths (`/v1/chat/completions`, `/v1/embeddings`, etc.), you can also point it directly at OpenAI or other compatible inference services.
+vector = response[:body]["data"][0]["embedding"]
+```
 
-#### Connect to OpenAI
+### Rerank
 
 ```ruby
-client = SimpleInference::Client.new(
-  base_url: "https://api.openai.com",
-  api_key: ENV["OPENAI_API_KEY"]
+response = client.rerank(
+  model: "bge-reranker-v2-m3",
+  query: "What is machine learning?",
+  documents: [
+    "Machine learning is a subset of AI...",
+    "The weather today is sunny...",
+    "Deep learning uses neural networks..."
+  ]
 )
+```
 
-response = client.chat_completions(
-  model: "gpt-4.1-mini",
-  messages: [{ "role" => "user", "content" => "Hello" }]
+### Audio Transcription
+
+```ruby
+response = client.audio_transcriptions(
+  model: "whisper-1",
+  file: File.open("audio.mp3", "rb")
 )
 
-pp response[:body]
+puts response[:body]["text"]
 ```
 
-#### Streaming chat completions (SSE)
+### Audio Translation
+
+```ruby
+response = client.audio_translations(
+  model: "whisper-1",
+  file: File.open("audio.mp3", "rb")
+)
+```
 
-For OpenAI-style streaming (`text/event-stream`), use `chat_completions_stream`. It yields parsed JSON events (Ruby `Hash`), so you can consume deltas incrementally:
+### List Models
 
 ```ruby
-client.chat_completions_stream(
-  model: "gpt-4.1-mini",
-  messages: [{ "role" => "user", "content" => "Hello" }]
-) do |event|
-  delta = event.dig("choices", 0, "delta", "content")
-  print delta if delta
-end
-puts
+response = client.list_models
+models = response[:body]["data"]
 ```
 
-If you prefer, it also returns an Enumerator:
+### Health Check
 
 ```ruby
-client.chat_completions_stream(model: "gpt-4.1-mini", messages: [...]).each do |event|
-  # ...
+# Returns the full response
+response = client.health
+
+# Returns a boolean
+if client.healthy?
+  puts "Service is up!"
 end
 ```
 
-Fallback behavior:
+## Response Format
+
+All methods return a Hash with:
+
+```ruby
+{
+  status: 200,  # HTTP status code
+  headers: { "content-type" => "application/json", ... }, # Response headers (lowercase keys)
+  body: { ... } # Parsed JSON body (Hash) or raw String
+}
+```
+
+## Error Handling
+
+By default, non-2xx responses raise exceptions:
+
+```ruby
+begin
+  client.chat_completions(model: "invalid", messages: [])
+rescue SimpleInference::Errors::HTTPError => e
+  puts "HTTP #{e.status}: #{e.message}"
+  puts e.body # raw response body
+end
+```
 
-- If the upstream service does **not** support streaming (for example, this repo's server currently returns `400` with `{"detail":"Streaming responses are not supported yet"}`), the SDK will **retry non-streaming** and yield a **single synthetic chunk** so your streaming consumer code can still run.
+Other exception types:
 
-#### Connect to any OpenAI-compatible endpoint
+- `SimpleInference::Errors::TimeoutError` – Request timed out
+- `SimpleInference::Errors::ConnectionError` – Network error
+- `SimpleInference::Errors::DecodeError` – JSON parsing failed
+- `SimpleInference::Errors::ConfigurationError` – Invalid configuration
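+
+Transient failures can be rescued separately from HTTP errors; a minimal sketch using only the classes listed above:
+
+```ruby
+begin
+  client.chat_completions(model: "gpt-4o-mini", messages: [...])
+rescue SimpleInference::Errors::TimeoutError, SimpleInference::Errors::ConnectionError => e
+  # retry with backoff, or surface a friendly message
+  warn "Transient failure: #{e.class}"
+end
+```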
 
-For services that expose an OpenAI-compatible API (same paths and payloads), point `base_url` at that service and provide the correct token:
+To handle errors manually:
 
 ```ruby
 client = SimpleInference::Client.new(
-  base_url: "https://my-openai-compatible.example.com",
-  api_key: ENV["MY_SERVICE_TOKEN"]
+  base_url: "https://api.openai.com",
+  api_key: ENV["OPENAI_API_KEY"],
+  raise_on_error: false
 )
+
+response = client.chat_completions(model: "gpt-4o-mini", messages: [...])
+
+if response[:status] == 200
+  # success
+else
+  puts "Error: #{response[:status]} - #{response[:body]}"
+end
 ```
 
-If the service uses a non-standard header instead of `Authorization: Bearer`, you can omit `api_key` and pass headers explicitly:
+## HTTP Adapters
+
+### Default (Net::HTTP)
+
+The default adapter uses Ruby's built-in `Net::HTTP`. It is thread-safe and compatible with the Ruby 3 Fiber scheduler.
+
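+If you want to construct the default adapter explicitly, a minimal sketch (assuming `Default.new` accepts the same timeout options as the HTTPX adapter below; this signature is an assumption, not documented):
+
+```ruby
+adapter = SimpleInference::HTTPAdapters::Default.new(timeout: 30.0) # assumed signature
+
+client = SimpleInference::Client.new(
+  base_url: "http://localhost:8000",
+  adapter: adapter
+)
+```
+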
+### HTTPX Adapter
+
+For better performance or in async environments, use the optional HTTPX adapter:
+
+```ruby
+# Gemfile
+gem "httpx"
+```
 
 ```ruby
+adapter = SimpleInference::HTTPAdapters::HTTPX.new(timeout: 30.0)
+
 client = SimpleInference::Client.new(
-  base_url: "https://my-service.example.com",
-  headers: {
-    "x-api-key" => ENV["MY_SERVICE_KEY"]
-  }
+  base_url: "https://api.openai.com",
+  api_key: ENV["OPENAI_API_KEY"],
+  adapter: adapter
 )
 ```
 
-### Puma vs Falcon (Fiber / Async) usage
+### Custom Adapter
 
-The default HTTP adapter uses Ruby's `Net::HTTP` and is safe to use under Puma's multithreaded model:
+Implement your own adapter by subclassing `SimpleInference::HTTPAdapter`:
 
-- No global mutable state
-- Per-client configuration only
-- Blocking IO that integrates with Ruby 3 Fiber scheduler
+```ruby
+class MyAdapter < SimpleInference::HTTPAdapter
+  def call(request)
+    # request keys: :method, :url, :headers, :body, :timeout, :open_timeout, :read_timeout
+    # Must return: { status: Integer, headers: Hash, body: String }
+  end
+
+  def call_stream(request, &block)
+    # For streaming support (optional)
+    # Yield raw chunks to the block for SSE responses
+  end
+end
+```
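+
+Then pass an instance to the client, just as with the HTTPX adapter above:
+
+```ruby
+client = SimpleInference::Client.new(
+  base_url: "http://localhost:8000",
+  adapter: MyAdapter.new
+)
+```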
 
-If you don't pass an adapter, `SimpleInference::Client` uses `SimpleInference::HTTPAdapters::Default` (Net::HTTP).
+## Rails Integration
 
-For Falcon / async environments, you can keep the default adapter, or use the optional HTTPX adapter (requires the `httpx` gem):
+Create an initializer `config/initializers/simple_inference.rb`:
 
 ```ruby
-gem "httpx" # optional, only required when using the HTTPX adapter
+INFERENCE_CLIENT = SimpleInference::Client.new(
+  base_url: ENV.fetch("INFERENCE_BASE_URL", "https://api.openai.com"),
+  api_key: ENV["INFERENCE_API_KEY"]
+)
 ```
 
-You can then use the optional HTTPX adapter shipped with this gem:
+Use in controllers:
 
 ```ruby
-adapter = SimpleInference::HTTPAdapters::HTTPX.new(timeout: 30.0)
+class ChatsController < ApplicationController
+  def create
+    response = INFERENCE_CLIENT.chat_completions(
+      model: "gpt-4o-mini",
+      messages: [{ "role" => "user", "content" => params[:prompt] }]
+    )
 
-SIMPLE_INFERENCE_CLIENT =
-  SimpleInference::Client.new(
-    base_url: ENV.fetch("SIMPLE_INFERENCE_BASE_URL", "http://localhost:8000"),
-    api_key: ENV["SIMPLE_INFERENCE_API_KEY"],
-    adapter: adapter
-  )
+    render json: response[:body]
+  end
+end
+```
+
+Use in background jobs:
+
+```ruby
+class EmbedJob < ApplicationJob
+  def perform(text)
+    response = INFERENCE_CLIENT.embeddings(
+      model: "text-embedding-3-small",
+      input: text
+    )
+
+    vector = response[:body]["data"][0]["embedding"]
+    # Store vector...
+  end
+end
 ```
+
+## Thread Safety
+
+The client is thread-safe:
+
+- No global mutable state
+- Per-client configuration only
+- Each request uses its own HTTP connection
+
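+A minimal sketch of sharing one client across threads, using only the documented API:
+
+```ruby
+client = SimpleInference::Client.new(base_url: "http://localhost:8000")
+
+threads = 4.times.map do |i|
+  Thread.new do
+    client.chat_completions(
+      model: "gpt-4o-mini",
+      messages: [{ "role" => "user", "content" => "Request #{i}" }]
+    )
+  end
+end
+
+responses = threads.map(&:value)
+```
+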
+## License
+
+MIT License. See [LICENSE](LICENSE.txt) for details.
data/lib/simple_inference/client.rb CHANGED
@@ -23,7 +23,7 @@ module SimpleInference
     # POST /v1/chat/completions
     # params: { model: "model-name", messages: [...], ... }
     def chat_completions(params)
-      post_json("/v1/chat/completions", params)
+      post_json(api_path("/chat/completions"), params)
     end
 
     # POST /v1/chat/completions (streaming)
@@ -43,7 +43,7 @@ module SimpleInference
       body.delete("stream")
       body["stream"] = true
 
-      response = post_json_stream("/v1/chat/completions", body) do |event|
+      response = post_json_stream(api_path("/chat/completions"), body) do |event|
         yield event
       end
 
@@ -60,7 +60,7 @@ module SimpleInference
       fallback_body.delete(:stream)
       fallback_body.delete("stream")
 
-      fallback_response = post_json("/v1/chat/completions", fallback_body)
+      fallback_response = post_json(api_path("/chat/completions"), fallback_body)
       chunk = synthesize_chat_completion_chunk(fallback_response[:body])
       yield chunk if chunk
       return fallback_response
@@ -78,17 +78,17 @@ module SimpleInference
 
     # POST /v1/embeddings
     def embeddings(params)
-      post_json("/v1/embeddings", params)
+      post_json(api_path("/embeddings"), params)
     end
 
     # POST /v1/rerank
     def rerank(params)
-      post_json("/v1/rerank", params)
+      post_json(api_path("/rerank"), params)
     end
 
     # GET /v1/models
     def list_models
-      get_json("/v1/models")
+      get_json(api_path("/models"))
     end
 
     # GET /health
@@ -109,12 +109,12 @@ module SimpleInference
     # POST /v1/audio/transcriptions
     # params: { file: io_or_hash, model: "model-name", **audio_options }
    def audio_transcriptions(params)
-      post_multipart("/v1/audio/transcriptions", params)
+      post_multipart(api_path("/audio/transcriptions"), params)
     end
 
     # POST /v1/audio/translations
     def audio_translations(params)
-      post_multipart("/v1/audio/translations", params)
+      post_multipart(api_path("/audio/translations"), params)
     end
 
     private
@@ -123,6 +123,10 @@ module SimpleInference
       config.base_url
     end
 
+    def api_path(endpoint)
+      "#{config.api_prefix}#{endpoint}"
+    end
+
     def get_json(path, params: nil, raise_on_http_error: nil)
       full_path = with_query(path, params)
       request_json(
data/lib/simple_inference/config.rb CHANGED
@@ -4,6 +4,7 @@ module SimpleInference
   class Config
     attr_reader :base_url,
                 :api_key,
+                :api_prefix,
                 :timeout,
                 :open_timeout,
                 :read_timeout,
@@ -19,6 +20,10 @@ module SimpleInference
       @api_key = (opts[:api_key] || ENV["SIMPLE_INFERENCE_API_KEY"]).to_s
       @api_key = nil if @api_key.empty?
 
+      @api_prefix = normalize_api_prefix(
+        opts.key?(:api_prefix) ? opts[:api_prefix] : ENV.fetch("SIMPLE_INFERENCE_API_PREFIX", "/v1")
+      )
+
       @timeout = to_float_or_nil(opts[:timeout] || ENV["SIMPLE_INFERENCE_TIMEOUT"])
       @open_timeout = to_float_or_nil(opts[:open_timeout] || ENV["SIMPLE_INFERENCE_OPEN_TIMEOUT"])
       @read_timeout = to_float_or_nil(opts[:read_timeout] || ENV["SIMPLE_INFERENCE_READ_TIMEOUT"])
@@ -46,6 +51,17 @@ module SimpleInference
       url.chomp("/")
     end
 
+    def normalize_api_prefix(value)
+      return "" if value.nil?
+
+      prefix = value.to_s.strip
+      return "" if prefix.empty?
+
+      # Ensure it starts with / and does not end with /
+      prefix = "/#{prefix}" unless prefix.start_with?("/")
+      prefix.chomp("/")
+    end
+
     def to_float_or_nil(value)
       return nil if value.nil? || value == ""
 
data/lib/simple_inference/version.rb CHANGED
@@ -1,5 +1,5 @@
 # frozen_string_literal: true
 
 module SimpleInference
-  VERSION = "0.1.3"
+  VERSION = "0.1.4"
 end
data/sig/simple_inference.rbs CHANGED
@@ -1,4 +1,3 @@
 module SimpleInference
   VERSION: String
 end
-
metadata CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: simple_inference
 version: !ruby/object:Gem::Version
-  version: 0.1.3
+  version: 0.1.4
 platform: ruby
 authors:
 - jasl
@@ -9,8 +9,8 @@ bindir: exe
 cert_chain: []
 date: 1980-01-02 00:00:00.000000000 Z
 dependencies: []
-description: Fiber-friendly Ruby client for Simple Inference Server APIs (chat, embeddings,
-  audio, rerank, health).
+description: A lightweight, Fiber-friendly Ruby client for OpenAI-compatible LLM APIs
+  (chat, embeddings, audio, rerank, health).
 email:
 - jasl9187@hotmail.com
 executables: []
@@ -29,13 +29,12 @@ files:
 - lib/simple_inference/http_adapters/httpx.rb
 - lib/simple_inference/version.rb
 - sig/simple_inference.rbs
-homepage: https://github.com/jasl/simple_inference_server/tree/main/sdks/ruby
+homepage: https://github.com/jasl/simple_inference.rb
 licenses:
 - MIT
 metadata:
   allowed_push_host: https://rubygems.org
-  homepage_uri: https://github.com/jasl/simple_inference_server/tree/main/sdks/ruby
-  source_code_uri: https://github.com/jasl/simple_inference_server
+  homepage_uri: https://github.com/jasl/simple_inference.rb
   rubygems_mfa_required: 'true'
 rdoc_options: []
 require_paths:
@@ -51,7 +50,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
 - !ruby/object:Gem::Version
   version: '0'
 requirements: []
-rubygems_version: 4.0.1
+rubygems_version: 4.0.3
 specification_version: 4
-summary: Fiber-friendly Ruby client for the Simple Inference Server (OpenAI-compatible).
+summary: A lightweight, Fiber-friendly Ruby client for OpenAI-compatible LLM APIs.
 test_files: []