simple_inference 0.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml ADDED
@@ -0,0 +1,7 @@
+ ---
+ SHA256:
+ metadata.gz: 324b5bd8986fee7b2130229b1bb7ad277b1c476c71ed07c5a350845f91ae23a5
+ data.tar.gz: b746716fe0e9364bf1085acc4ac2a09dfe6ca5703831562869a96fdf43ea29a2
+ SHA512:
+ metadata.gz: 25baf84b1d22f57ed1ccb4b0413acf79cad258efee5e2a31b33d6a0247a87b175492cf293a67c6c7c2503c7d3c28e32283e21dba1aeb1bcbbe621203975e7972
+ data.tar.gz: ba78fc2c974118d4046119cc378202556e3d50611514df0cb4a9affa41412c5bb1a77aeccaef34071de15792a63b17bb5f484cafec8f506f7372660ba09fc5fa
data/LICENSE.txt ADDED
@@ -0,0 +1,21 @@
+ The MIT License (MIT)
+
+ Copyright (c) 2025 jasl
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in
+ all copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ THE SOFTWARE.
data/README.md ADDED
@@ -0,0 +1,244 @@
+ ## simple_inference Ruby SDK
+
+ Fiber-friendly Ruby client for the Simple Inference Server APIs (chat, embeddings, audio, rerank, health), designed to work well inside Rails apps and background jobs.
+
+ ### Installation
+
+ Add the gem to your Rails application's `Gemfile`, pointing at this repository path:
+
+ ```ruby
+ gem "simple_inference", path: "sdks/ruby"
+ ```
+
+ Then run:
+
+ ```bash
+ bundle install
+ ```
+
+ ### Configuration
+
+ You can configure the client via environment variables:
+
+ - `SIMPLE_INFERENCE_BASE_URL`: e.g. `http://localhost:8000`
+ - `SIMPLE_INFERENCE_API_KEY`: optional, if your deployment requires auth (sent as `Authorization: Bearer <token>`).
+ - `SIMPLE_INFERENCE_TIMEOUT`, `SIMPLE_INFERENCE_OPEN_TIMEOUT`, `SIMPLE_INFERENCE_READ_TIMEOUT` (seconds).
+ - `SIMPLE_INFERENCE_RAISE_ON_ERROR`: `true`/`false` (default `true`).
+
+ Or explicitly when constructing a client:
+
+ ```ruby
+ client = SimpleInference::Client.new(
+   base_url: "http://localhost:8000",
+   api_key: ENV["SIMPLE_INFERENCE_API_KEY"],
+   timeout: 30.0
+ )
+ ```
+
+ For convenience, you can also use the module constructor:
+
+ ```ruby
+ client = SimpleInference.new(base_url: "http://localhost:8000")
+ ```
+
+ ### Rails integration example
+
+ Create an initializer, for example `config/initializers/simple_inference.rb`:
+
+ ```ruby
+ SIMPLE_INFERENCE_CLIENT = SimpleInference::Client.new(
+   base_url: ENV.fetch("SIMPLE_INFERENCE_BASE_URL", "http://localhost:8000"),
+   api_key: ENV["SIMPLE_INFERENCE_API_KEY"]
+ )
+ ```
+
+ Then in a controller:
+
+ ```ruby
+ class ChatsController < ApplicationController
+   def create
+     result = SIMPLE_INFERENCE_CLIENT.chat_completions(
+       model: "local-llm",
+       messages: [
+         { "role" => "user", "content" => params[:prompt] }
+       ]
+     )
+
+     render json: result[:body], status: result[:status]
+   end
+ end
+ ```
+
+ You can also use the client in background jobs:
+
+ ```ruby
+ class EmbedJob < ApplicationJob
+   queue_as :default
+
+   def perform(text)
+     result = SIMPLE_INFERENCE_CLIENT.embeddings(
+       model: "bge-m3",
+       input: text
+     )
+
+     vector = result[:body]["data"].first["embedding"]
+     # TODO: persist the vector (e.g. in DB or a vector store)
+   end
+ end
+ ```
+
+ And for health checks / maintenance tasks:
+
+ ```ruby
+ if SIMPLE_INFERENCE_CLIENT.healthy?
+   Rails.logger.info("Inference server is healthy")
+ else
+   Rails.logger.warn("Inference server is unhealthy")
+ end
+
+ models = SIMPLE_INFERENCE_CLIENT.list_models
+ Rails.logger.info("Available models: #{models[:body].inspect}")
+ ```
+
+ ### API methods
+
+ - `client.chat_completions(params)` → `POST /v1/chat/completions`
+ - `client.embeddings(params)` → `POST /v1/embeddings`
+ - `client.rerank(params)` → `POST /v1/rerank`
+ - `client.list_models` → `GET /v1/models`
+ - `client.health` → `GET /health`
+ - `client.healthy?` → boolean helper based on `/health`
+ - `client.audio_transcriptions(params)` → `POST /v1/audio/transcriptions`
+ - `client.audio_translations(params)` → `POST /v1/audio/translations`
+
+ All methods follow a Receive-an-Object / Return-an-Object style:
+
+ - Input: a Ruby `Hash` (keys can be strings or symbols).
+ - Output: a `Hash` with keys:
+   - `:status` – HTTP status code
+   - `:headers` – response headers (lowercased keys)
+   - `:body` – parsed JSON (Ruby `Hash`) when the response is JSON, or a `String` for text bodies.
+
+ ### Error handling
+
+ By default (`raise_on_error: true`) non-2xx HTTP responses raise:
+
+ - `SimpleInference::Errors::HTTPError` – wraps status, headers and raw body.
+
+ Network and parsing errors are mapped to:
+
+ - `SimpleInference::Errors::TimeoutError`
+ - `SimpleInference::Errors::ConnectionError`
+ - `SimpleInference::Errors::DecodeError`
+
+ If you prefer to handle HTTP error codes manually, disable raising:
+
+ ```ruby
+ client = SimpleInference::Client.new(
+   base_url: "http://localhost:8000",
+   raise_on_error: false
+ )
+
+ response = client.embeddings(model: "local-embed", input: "hello")
+ if response[:status] == 200
+   # happy path
+ else
+   Rails.logger.warn("Embedding call failed: #{response[:status]} #{response[:body].inspect}")
+ end
+ ```
+
+ ### Using with OpenAI and compatible services
+
+ Because this SDK follows the OpenAI-style HTTP paths (`/v1/chat/completions`, `/v1/embeddings`, etc.), you can also point it directly at OpenAI or other compatible inference services.
+
+ #### Connect to OpenAI
+
+ ```ruby
+ client = SimpleInference::Client.new(
+   base_url: "https://api.openai.com",
+   api_key: ENV["OPENAI_API_KEY"]
+ )
+
+ response = client.chat_completions(
+   model: "gpt-4.1-mini",
+   messages: [{ "role" => "user", "content" => "Hello" }]
+ )
+
+ pp response[:body]
+ ```
+
+ #### Streaming chat completions (SSE)
+
+ For OpenAI-style streaming (`text/event-stream`), use `chat_completions_stream`. It yields parsed JSON events (Ruby `Hash`), so you can consume deltas incrementally:
+
+ ```ruby
+ client.chat_completions_stream(
+   model: "gpt-4.1-mini",
+   messages: [{ "role" => "user", "content" => "Hello" }]
+ ) do |event|
+   delta = event.dig("choices", 0, "delta", "content")
+   print delta if delta
+ end
+ puts
+ ```
+
+ Called without a block, it returns an Enumerator:
+
+ ```ruby
+ client.chat_completions_stream(model: "gpt-4.1-mini", messages: [...]).each do |event|
+   # ...
+ end
+ ```
+
+ Fallback behavior:
+
+ - If the upstream service does **not** support streaming (for example, this repo's server currently returns `400` with `{"detail":"Streaming responses are not supported yet"}`), the SDK will **retry non-streaming** and yield a **single synthetic chunk** so your streaming consumer code can still run.
+
+ #### Connect to any OpenAI-compatible endpoint
+
+ For services that expose an OpenAI-compatible API (same paths and payloads), point `base_url` at that service and provide the correct token:
+
+ ```ruby
+ client = SimpleInference::Client.new(
+   base_url: "https://my-openai-compatible.example.com",
+   api_key: ENV["MY_SERVICE_TOKEN"]
+ )
+ ```
+
+ If the service uses a non-standard header instead of `Authorization: Bearer`, you can omit `api_key` and pass headers explicitly:
+
+ ```ruby
+ client = SimpleInference::Client.new(
+   base_url: "https://my-service.example.com",
+   headers: {
+     "x-api-key" => ENV["MY_SERVICE_KEY"]
+   }
+ )
+ ```
+
+ ### Puma vs Falcon (Fiber / Async) usage
+
+ The default HTTP adapter uses Ruby's `Net::HTTP` and is safe to use under Puma's multithreaded model:
+
+ - No global mutable state
+ - Per-client configuration only
+ - Blocking IO that integrates with the Ruby 3 Fiber scheduler
+
+ For Falcon / async environments, you can keep the default adapter, or use the optional HTTPX adapter (requires the `httpx` gem):
+
+ ```ruby
+ gem "httpx" # optional, only required when using the HTTPX adapter
+ ```
+
+ You can then use the optional HTTPX adapter shipped with this gem:
+
+ ```ruby
+ adapter = SimpleInference::HTTPAdapter::HTTPX.new(timeout: 30.0)
+
+ SIMPLE_INFERENCE_CLIENT =
+   SimpleInference::Client.new(
+     base_url: ENV.fetch("SIMPLE_INFERENCE_BASE_URL", "http://localhost:8000"),
+     api_key: ENV["SIMPLE_INFERENCE_API_KEY"],
+     adapter: adapter
+   )
+ ```
data/Rakefile ADDED
@@ -0,0 +1,12 @@
+ # frozen_string_literal: true
+
+ require "bundler/gem_tasks"
+ require "minitest/test_task"
+
+ Minitest::TestTask.create
+
+ require "rubocop/rake_task"
+
+ RuboCop::RakeTask.new
+
+ task default: %i[test rubocop]
@@ -0,0 +1,573 @@
+ # frozen_string_literal: true
+
+ require "json"
+ require "securerandom"
+ require "uri"
+ require "timeout"
+ require "socket"
+
+ module SimpleInference
+   class Client
+     attr_reader :config, :adapter
+
+     def initialize(options = {})
+       @config = Config.new(options || {})
+       @adapter = @config.adapter || HTTPAdapter::Default.new
+     end
+
+     # POST /v1/chat/completions
+     # params: { model: "model-name", messages: [...], ... }
+     def chat_completions(params)
+       post_json("/v1/chat/completions", params)
+     end
+
+     # POST /v1/chat/completions (streaming)
+     #
+     # Yields parsed JSON events from an OpenAI-style SSE stream (`text/event-stream`).
+     #
+     # If no block is given, returns an Enumerator.
+     def chat_completions_stream(params)
+       return enum_for(:chat_completions_stream, params) unless block_given?
+
+       unless params.is_a?(Hash)
+         raise Errors::ConfigurationError, "params must be a Hash"
+       end
+
+       body = params.dup
+       body.delete(:stream)
+       body.delete("stream")
+       body["stream"] = true
+
+       response = post_json_stream("/v1/chat/completions", body) do |event|
+         yield event
+       end
+
+       content_type = response.dig(:headers, "content-type").to_s
+
+       # Streaming case: we already yielded events from the SSE stream.
+       if response[:status].to_i >= 200 && response[:status].to_i < 300 && content_type.include?("text/event-stream")
+         return response
+       end
+
+       # Fallback when upstream does not support streaming (this repo's server).
+       if streaming_unsupported_error?(response[:status], response[:body])
+         fallback_body = params.dup
+         fallback_body.delete(:stream)
+         fallback_body.delete("stream")
+
+         fallback_response = post_json("/v1/chat/completions", fallback_body)
+         chunk = synthesize_chat_completion_chunk(fallback_response[:body])
+         yield chunk if chunk
+         return fallback_response
+       end
+
+       # If we got a non-streaming success response (JSON), convert it into a single
+       # chunk so streaming consumers can share the same code path.
+       if response[:status].to_i >= 200 && response[:status].to_i < 300
+         chunk = synthesize_chat_completion_chunk(response[:body])
+         yield chunk if chunk
+       end
+
+       response
+     end
+
+     # POST /v1/embeddings
+     def embeddings(params)
+       post_json("/v1/embeddings", params)
+     end
+
+     # POST /v1/rerank
+     def rerank(params)
+       post_json("/v1/rerank", params)
+     end
+
+     # GET /v1/models
+     def list_models
+       get_json("/v1/models")
+     end
+
+     # GET /health
+     def health
+       get_json("/health")
+     end
+
+     # Returns true when service is healthy, false otherwise.
+     def healthy?
+       response = get_json("/health", raise_on_http_error: false)
+       status_ok = response[:status] == 200
+       body_status_ok = response.dig(:body, "status") == "ok"
+       status_ok && body_status_ok
+     rescue Errors::Error
+       false
+     end
+
+     # POST /v1/audio/transcriptions
+     # params: { file: io_or_hash, model: "model-name", **audio_options }
+     def audio_transcriptions(params)
+       post_multipart("/v1/audio/transcriptions", params)
+     end
+
+     # POST /v1/audio/translations
+     def audio_translations(params)
+       post_multipart("/v1/audio/translations", params)
+     end
+
+     private
+
+     def base_url
+       config.base_url
+     end
+
+     def get_json(path, params: nil, raise_on_http_error: nil)
+       full_path = with_query(path, params)
+       request_json(
+         method: :get,
+         path: full_path,
+         body: nil,
+         expect_json: true,
+         raise_on_http_error: raise_on_http_error
+       )
+     end
+
+     def post_json(path, body, raise_on_http_error: nil)
+       request_json(
+         method: :post,
+         path: path,
+         body: body,
+         expect_json: true,
+         raise_on_http_error: raise_on_http_error
+       )
+     end
+
+     def post_json_stream(path, body, raise_on_http_error: nil, &on_event)
+       if base_url.nil? || base_url.empty?
+         raise Errors::ConfigurationError, "base_url is required"
+       end
+
+       url = "#{base_url}#{path}"
+
+       headers = config.headers.merge(
+         "Content-Type" => "application/json",
+         "Accept" => "text/event-stream, application/json"
+       )
+       payload = body.nil? ? nil : JSON.generate(body)
+
+       request_env = {
+         method: :post,
+         url: url,
+         headers: headers,
+         body: payload,
+         timeout: config.timeout,
+         open_timeout: config.open_timeout,
+         read_timeout: config.read_timeout,
+       }
+
+       handle_stream_response(request_env, raise_on_http_error: raise_on_http_error, &on_event)
+     end
+
+     def handle_stream_response(request_env, raise_on_http_error:, &on_event)
+       sse_buffer = +""
+       sse_done = false
+       used_streaming_adapter = false
+
+       raw_response =
+         if @adapter.respond_to?(:call_stream)
+           used_streaming_adapter = true
+           @adapter.call_stream(request_env) do |chunk|
+             next if sse_done
+
+             sse_buffer << chunk.to_s
+             extract_sse_blocks!(sse_buffer).each do |block|
+               data = sse_data_from_block(block)
+               next if data.nil?
+
+               payload = data.strip
+               next if payload.empty?
+               if payload == "[DONE]"
+                 sse_done = true
+                 sse_buffer.clear
+                 break
+               end
+
+               on_event&.call(parse_json_event(payload))
+             end
+           end
+         else
+           @adapter.call(request_env)
+         end
+
+       status = raw_response[:status]
+       headers = (raw_response[:headers] || {}).transform_keys { |k| k.to_s.downcase }
+       body = raw_response[:body]
+       body_str = body.nil? ? "" : body.to_s
+
+       content_type = headers["content-type"].to_s
+
+       # Streaming case.
+       if status >= 200 && status < 300 && content_type.include?("text/event-stream")
+         # If we couldn't stream incrementally, best-effort parse the full SSE body.
+         unless used_streaming_adapter
+           buffer = body_str.dup
+           extract_sse_blocks!(buffer).each do |block|
+             data = sse_data_from_block(block)
+             next if data.nil?
+
+             payload = data.strip
+             next if payload.empty?
+             break if payload == "[DONE]"
+
+             on_event&.call(parse_json_event(payload))
+           end
+         end
+
+         return {
+           status: status,
+           headers: headers,
+           body: nil,
+         }
+       end
+
+       # Non-streaming response path (adapter doesn't support streaming or server returned JSON).
+       should_parse_json = content_type.include?("json")
+       parsed_body = should_parse_json ? parse_json(body_str) : body_str
+
+       raise_on =
+         if raise_on_http_error.nil?
+           config.raise_on_error
+         else
+           !!raise_on_http_error
+         end
+
+       if raise_on && (status < 200 || status >= 300)
+         # Do not raise for the known "streaming unsupported" case; the caller will
+         # perform a non-streaming retry fallback.
+         unless streaming_unsupported_error?(status, parsed_body)
+           message = "HTTP #{status}"
+           begin
+             error_body = JSON.parse(body_str)
+             error_field = error_body["error"]
+             message =
+               if error_field.is_a?(Hash)
+                 error_field["message"] || error_body["message"] || message
+               else
+                 error_field || error_body["message"] || message
+               end
+           rescue JSON::ParserError
+             # fall back to generic message
+           end
+
+           raise Errors::HTTPError.new(
+             message,
+             status: status,
+             headers: headers,
+             body: body_str
+           )
+         end
+       end
+
+       {
+         status: status,
+         headers: headers,
+         body: parsed_body,
+       }
+     rescue Timeout::Error => e
+       raise Errors::TimeoutError, e.message
+     rescue SocketError, SystemCallError => e
+       raise Errors::ConnectionError, e.message
+     end
+
+     def extract_sse_blocks!(buffer)
+       blocks = []
+
+       loop do
+         idx_lf = buffer.index("\n\n")
+         idx_crlf = buffer.index("\r\n\r\n")
+
+         idx = [idx_lf, idx_crlf].compact.min
+         break if idx.nil?
+
+         sep_len = (idx == idx_crlf) ? 4 : 2
+         blocks << buffer.slice!(0, idx)
+         buffer.slice!(0, sep_len)
+       end
+
+       blocks
+     end
+
+     def sse_data_from_block(block)
+       return nil if block.nil? || block.empty?
+
+       data_lines = []
+       block.split(/\r?\n/).each do |line|
+         next if line.nil? || line.empty?
+         next if line.start_with?(":")
+         next unless line.start_with?("data:")
+
+         data_lines << (line[5..]&.lstrip).to_s
+       end
+
+       return nil if data_lines.empty?
+
+       data_lines.join("\n")
+     end
+
+     def parse_json_event(payload)
+       JSON.parse(payload)
+     rescue JSON::ParserError => e
+       raise Errors::DecodeError, "Failed to parse SSE JSON event: #{e.message}"
+     end
+
+     def streaming_unsupported_error?(status, body)
+       return false unless status.to_i == 400
+       return false unless body.is_a?(Hash)
+
+       body["detail"].to_s.strip == "Streaming responses are not supported yet"
+     end
+
+     def synthesize_chat_completion_chunk(body)
+       return nil unless body.is_a?(Hash)
+
+       id = body["id"]
+       created = body["created"]
+       model = body["model"]
+
+       choices = body["choices"]
+       return nil unless choices.is_a?(Array) && !choices.empty?
+
+       choice0 = choices[0]
+       return nil unless choice0.is_a?(Hash)
+
+       message = choice0["message"]
+       return nil unless message.is_a?(Hash)
+
+       role = message["role"] || "assistant"
+       content = message["content"]
+
+       {
+         "id" => id,
+         "object" => "chat.completion.chunk",
+         "created" => created,
+         "model" => model,
+         "choices" => [
+           {
+             "index" => choice0["index"] || 0,
+             "delta" => {
+               "role" => role,
+               "content" => content,
+             },
+             "finish_reason" => choice0["finish_reason"],
+           },
+         ],
+       }
+     end
+
+     def request_json(method:, path:, body:, expect_json:, raise_on_http_error:)
+       if base_url.nil? || base_url.empty?
+         raise Errors::ConfigurationError, "base_url is required"
+       end
+
+       url = "#{base_url}#{path}"
+
+       headers = config.headers.merge("Content-Type" => "application/json")
+       payload = body.nil? ? nil : JSON.generate(body)
+
+       request_env = {
+         method: method,
+         url: url,
+         headers: headers,
+         body: payload,
+         timeout: config.timeout,
+         open_timeout: config.open_timeout,
+         read_timeout: config.read_timeout,
+       }
+
+       handle_response(
+         request_env,
+         expect_json: expect_json,
+         raise_on_http_error: raise_on_http_error
+       )
+     end
+
+     def with_query(path, params)
+       return path if params.nil? || params.empty?
+
+       query = URI.encode_www_form(params)
+       separator = path.include?("?") ? "&" : "?"
+       "#{path}#{separator}#{query}"
+     end
+
+     def post_multipart(path, params)
+       file_value = params[:file] || params["file"]
+       model = params[:model] || params["model"]
+
+       raise Errors::ConfigurationError, "file is required" if file_value.nil?
+       raise Errors::ConfigurationError, "model is required" if model.nil? || model.to_s.empty?
+
+       io, filename = normalize_upload(file_value)
+
+       form_fields = {
+         "model" => model.to_s,
+       }
+
+       # Optional scalar fields
+       %i[language prompt response_format temperature].each do |key|
+         value = params[key] || params[key.to_s]
+         next if value.nil?
+
+         form_fields[key.to_s] = value.to_s
+       end
+
+       # timestamp_granularities can be an array or single value
+       tgs = params[:timestamp_granularities] || params["timestamp_granularities"]
+       if tgs && !tgs.empty?
+         Array(tgs).each_with_index do |value, index|
+           form_fields["timestamp_granularities[#{index}]"] = value.to_s
+         end
+       end
+
+       body, headers = build_multipart_body(io, filename, form_fields)
+
+       request_env = {
+         method: :post,
+         url: "#{base_url}#{path}",
+         headers: config.headers.merge(headers),
+         body: body,
+         timeout: config.timeout,
+         open_timeout: config.open_timeout,
+         read_timeout: config.read_timeout,
+       }
+
+       handle_response(
+         request_env,
+         expect_json: nil, # auto-detect based on Content-Type
+         raise_on_http_error: nil
+       )
+     ensure
+       if io && io.respond_to?(:close)
+         begin
+           io.close unless io.closed?
+         rescue StandardError
+           # ignore close errors
+         end
+       end
+     end
+
+     def normalize_upload(file)
+       if file.is_a?(Hash)
+         io = file[:io] || file["io"]
+         filename = file[:filename] || file["filename"] || "audio.wav"
+       elsif file.respond_to?(:read)
+         io = file
+         filename =
+           if file.respond_to?(:path) && file.path
+             File.basename(file.path)
+           else
+             "audio.wav"
+           end
+       else
+         raise Errors::ConfigurationError,
+               "file must be an IO object or a hash with :io and :filename keys"
+       end
+
+       raise Errors::ConfigurationError, "file IO is required" if io.nil?
+
+       [io, filename]
+     end
+
+     def build_multipart_body(io, filename, fields)
+       boundary = "simple-inference-ruby-#{SecureRandom.hex(12)}"
+
+       headers = {
+         "Content-Type" => "multipart/form-data; boundary=#{boundary}",
+       }
+
+       body = +""
+
+       fields.each do |name, value|
+         body << "--#{boundary}\r\n"
+         body << %(Content-Disposition: form-data; name="#{name}"\r\n\r\n)
+         body << value.to_s
+         body << "\r\n"
+       end
+
+       body << "--#{boundary}\r\n"
+       body << %(Content-Disposition: form-data; name="file"; filename="#{filename}"\r\n)
+       body << "Content-Type: application/octet-stream\r\n\r\n"
+
+       while (chunk = io.read(16_384))
+         body << chunk
+       end
+
+       body << "\r\n--#{boundary}--\r\n"
+
+       [body, headers]
+     end
+
+     def handle_response(request_env, expect_json:, raise_on_http_error:)
+       response = @adapter.call(request_env)
+
+       status = response[:status]
+       headers = (response[:headers] || {}).transform_keys { |k| k.to_s.downcase }
+       body = response[:body].to_s
+
+       # Decide whether to raise on HTTP errors
+       raise_on =
+         if raise_on_http_error.nil?
+           config.raise_on_error
+         else
+           !!raise_on_http_error
+         end
+
+       if raise_on && (status < 200 || status >= 300)
+         message = "HTTP #{status}"
+
+         begin
+           error_body = JSON.parse(body)
+           message = error_body["error"] || error_body["message"] || message
+         rescue JSON::ParserError
+           # fall back to generic message
+         end
+
+         raise Errors::HTTPError.new(
+           message,
+           status: status,
+           headers: headers,
+           body: body
+         )
+       end
+
+       should_parse_json =
+         if expect_json.nil?
+           content_type = headers["content-type"]
+           content_type && content_type.include?("json")
+         else
+           expect_json
+         end
+
+       parsed_body =
+         if should_parse_json
+           parse_json(body)
+         else
+           body
+         end
+
+       {
+         status: status,
+         headers: headers,
+         body: parsed_body,
+       }
+     rescue Timeout::Error => e
+       raise Errors::TimeoutError, e.message
+     rescue SocketError, SystemCallError => e
+       raise Errors::ConnectionError, e.message
+     end
+
+     def parse_json(body)
+       return nil if body.nil? || body.empty?
+
+       JSON.parse(body)
+     rescue JSON::ParserError => e
+       raise Errors::DecodeError, "Failed to parse JSON response: #{e.message}"
+     end
+   end
+ end
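
The client above frames the SSE byte stream into blank-line-separated blocks, pulls out `data:` payloads, and stops at the `[DONE]` sentinel. A simplified, self-contained sketch of that framing logic (an illustrative re-implementation handling only `\n\n` separators, not the gem's actual API):

```ruby
require "json"

# Split an SSE buffer into complete blocks (separated by a blank line),
# mutating the buffer so a partial trailing block is kept for the next chunk.
def extract_sse_blocks!(buffer)
  blocks = []
  while (idx = buffer.index("\n\n"))
    blocks << buffer.slice!(0, idx)
    buffer.slice!(0, 2) # drop the "\n\n" separator
  end
  blocks
end

# Collect the JSON payloads from "data:" lines, stopping at the [DONE] sentinel.
def sse_events(buffer)
  events = []
  extract_sse_blocks!(buffer).each do |block|
    data = block.lines.grep(/\Adata:/).map { |l| l.sub(/\Adata:\s*/, "").chomp }.join("\n")
    next if data.empty?
    break if data == "[DONE]"
    events << JSON.parse(data)
  end
  events
end

stream = +"data: {\"delta\":\"Hel\"}\n\ndata: {\"delta\":\"lo\"}\n\ndata: [DONE]\n\n"
events = sse_events(stream)
# events => [{"delta"=>"Hel"}, {"delta"=>"lo"}]
```

Because the buffer is mutated in place, an incremental caller can append each network chunk and re-run extraction without losing a block split across chunk boundaries.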
@@ -0,0 +1,88 @@
+ # frozen_string_literal: true
+
+ module SimpleInference
+   class Config
+     attr_reader :base_url,
+                 :api_key,
+                 :timeout,
+                 :open_timeout,
+                 :read_timeout,
+                 :adapter,
+                 :raise_on_error
+
+     def initialize(options = {})
+       opts = symbolize_keys(options || {})
+
+       @base_url = normalize_base_url(
+         opts[:base_url] || ENV["SIMPLE_INFERENCE_BASE_URL"] || "http://localhost:8000"
+       )
+       @api_key = (opts[:api_key] || ENV["SIMPLE_INFERENCE_API_KEY"]).to_s
+       @api_key = nil if @api_key.empty?
+
+       @timeout = to_float_or_nil(opts[:timeout] || ENV["SIMPLE_INFERENCE_TIMEOUT"])
+       @open_timeout = to_float_or_nil(opts[:open_timeout] || ENV["SIMPLE_INFERENCE_OPEN_TIMEOUT"])
+       @read_timeout = to_float_or_nil(opts[:read_timeout] || ENV["SIMPLE_INFERENCE_READ_TIMEOUT"])
+
+       @adapter = opts[:adapter]
+
+       @raise_on_error = boolean_option(
+         explicit: opts.fetch(:raise_on_error, nil),
+         env_name: "SIMPLE_INFERENCE_RAISE_ON_ERROR",
+         default: true
+       )
+
+       @default_headers = build_default_headers(opts[:headers] || {})
+     end
+
+     def headers
+       @default_headers.dup
+     end
+
+     private
+
+     def normalize_base_url(value)
+       url = value.to_s.strip
+       url = "http://localhost:8000" if url.empty?
+       url.chomp("/")
+     end
+
+     def to_float_or_nil(value)
+       return nil if value.nil? || value == ""
+
+       Float(value)
+     rescue ArgumentError, TypeError
+       nil
+     end
+
+     def boolean_option(explicit:, env_name:, default:)
+       return !!explicit unless explicit.nil?
+
+       env_value = ENV[env_name]
+       return default if env_value.nil?
+
+       %w[1 true yes on].include?(env_value.to_s.strip.downcase)
+     end
+
+     def build_default_headers(extra_headers)
+       headers = {
+         "Accept" => "application/json",
+       }
+
+       headers["Authorization"] = "Bearer #{@api_key}" if @api_key
+
+       headers.merge(stringify_keys(extra_headers))
+     end
+
+     def symbolize_keys(hash)
+       hash.each_with_object({}) do |(key, value), out|
+         out[key.to_sym] = value
+       end
+     end
+
+     def stringify_keys(hash)
+       hash.each_with_object({}) do |(key, value), out|
+         out[key.to_s] = value
+       end
+     end
+   end
+ end
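
`Config#boolean_option` treats an explicit option as authoritative and otherwise reads a truthy string from the environment. A standalone sketch of that parsing rule (illustrative helper, not part of the gem's public API):

```ruby
# Mirrors the truthy-string handling used by Config#boolean_option:
# "1", "true", "yes", "on" (case-insensitive, whitespace-tolerant) read as
# true; any other string reads as false; a missing value yields the default.
def truthy_env?(value, default: true)
  return default if value.nil?

  %w[1 true yes on].include?(value.to_s.strip.downcase)
end

truthy_env?("TRUE")              # => true
truthy_env?("0")                 # => false
truthy_env?(nil)                 # => true (falls back to the default)
truthy_env?(nil, default: false) # => false
```

Note the asymmetry: once a value is set, anything outside the truthy list (including `"false"`, `"off"`, or a typo) disables the flag, so `SIMPLE_INFERENCE_RAISE_ON_ERROR=no` turns raising off.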
@@ -0,0 +1,24 @@
+ # frozen_string_literal: true
+
+ module SimpleInference
+   module Errors
+     class Error < StandardError; end
+
+     class ConfigurationError < Error; end
+
+     class HTTPError < Error
+       attr_reader :status, :headers, :body
+
+       def initialize(message, status:, headers:, body:)
+         super(message)
+         @status = status
+         @headers = headers
+         @body = body
+       end
+     end
+
+     class TimeoutError < Error; end
+     class ConnectionError < Error; end
+     class DecodeError < Error; end
+   end
+ end
@@ -0,0 +1,72 @@
+ # frozen_string_literal: true
+
+ begin
+   require "httpx"
+ rescue LoadError => e
+   raise LoadError,
+         "httpx gem is required for SimpleInference::HTTPAdapter::HTTPX (add `gem \"httpx\"`)",
+         cause: e
+ end
+
+ module SimpleInference
+   module HTTPAdapter
+     # Fiber-friendly HTTP adapter built on HTTPX.
+     #
+     # NOTE: This adapter intentionally does NOT implement `#call_stream`.
+     # Streaming consumers will still work via the SDK's full-body SSE parsing
+     # fallback path (see SimpleInference::Client#handle_stream_response).
+     class HTTPX
+       def initialize(timeout: nil)
+         @timeout = timeout
+       end
+
+       def call(request)
+         method = request.fetch(:method).to_s.downcase.to_sym
+         url = request.fetch(:url)
+         headers = request[:headers] || {}
+         body = request[:body]
+
+         client = ::HTTPX
+
+         # Mirror the SDK's timeout semantics:
+         # - `:timeout` is the overall request deadline (maps to HTTPX `request_timeout`)
+         # - `:open_timeout` and `:read_timeout` override connect/read deadlines
+         timeout = request[:timeout] || @timeout
+         open_timeout = request[:open_timeout] || timeout
+         read_timeout = request[:read_timeout] || timeout
+
+         timeout_opts = {}
+         timeout_opts[:request_timeout] = timeout.to_f if timeout
+         timeout_opts[:connect_timeout] = open_timeout.to_f if open_timeout
+         timeout_opts[:read_timeout] = read_timeout.to_f if read_timeout
+
+         unless timeout_opts.empty?
+           client = client.with(timeout: timeout_opts)
+         end
+
+         response = client.request(method, url, headers: headers, body: body)
+
+         # HTTPX may return an error response object instead of raising.
+         if response.respond_to?(:status) && response.status.to_i == 0
+           err = response.respond_to?(:error) ? response.error : nil
+           raise Errors::ConnectionError, (err ? err.message : "HTTPX request failed")
+         end
+
+         response_headers =
+           response.headers.to_h.each_with_object({}) do |(k, v), out|
+             out[k.to_s] = v.is_a?(Array) ? v.join(", ") : v.to_s
+           end
+
+         {
+           status: response.status.to_i,
+           headers: response_headers,
+           body: response.body.to_s,
+         }
+       rescue ::HTTPX::TimeoutError => e
+         raise Errors::TimeoutError, e.message
+       rescue ::HTTPX::Error, IOError, SystemCallError => e
+         raise Errors::ConnectionError, e.message
+       end
+     end
+   end
+ end
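
The adapter's timeout mapping (overall deadline plus connect/read overrides that fall back to it) can be isolated as a pure function. A hedged sketch of just that mapping, using the option names the adapter's own comments describe (`httpx_timeout_opts` is a hypothetical helper for illustration):

```ruby
# Build an HTTPX-style timeout options hash from the SDK's request env.
# :timeout is the overall deadline; :open_timeout/:read_timeout fall back
# to it when unset, mirroring the adapter code above.
def httpx_timeout_opts(request, default_timeout = nil)
  timeout      = request[:timeout] || default_timeout
  open_timeout = request[:open_timeout] || timeout
  read_timeout = request[:read_timeout] || timeout

  opts = {}
  opts[:request_timeout] = timeout.to_f if timeout
  opts[:connect_timeout] = open_timeout.to_f if open_timeout
  opts[:read_timeout]    = read_timeout.to_f if read_timeout
  opts
end

httpx_timeout_opts({ timeout: 30 })
# => { request_timeout: 30.0, connect_timeout: 30.0, read_timeout: 30.0 }
```

When no timeout is configured at all, the hash stays empty and the adapter skips `client.with(timeout: ...)` entirely, leaving HTTPX's own defaults in effect.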
@@ -0,0 +1,123 @@
+ # frozen_string_literal: true
+
+ require "net/http"
+ require "uri"
+
+ module SimpleInference
+ module HTTPAdapter
+ # Optional adapters are lazily loaded so the SDK has no hard runtime deps.
+ autoload :HTTPX, "simple_inference/http_adapter/httpx"
+
+ # Default synchronous HTTP adapter built on Net::HTTP.
+ # It is compatible with the Ruby 3 Fiber scheduler and keeps the interface
+ # minimal so it can be swapped out for custom adapters (HTTPX, async-http, etc.).
+ class Default
+ def call(request)
+ uri = URI.parse(request.fetch(:url))
+
+ http = Net::HTTP.new(uri.host, uri.port)
+ http.use_ssl = uri.scheme == "https"
+
+ timeout = request[:timeout]
+ open_timeout = request[:open_timeout] || timeout
+ read_timeout = request[:read_timeout] || timeout
+
+ http.open_timeout = open_timeout if open_timeout
+ http.read_timeout = read_timeout if read_timeout
+
+ klass = http_class_for(request[:method])
+ req = klass.new(uri.request_uri)
+
+ headers = request[:headers] || {}
+ headers.each do |key, value|
+ req[key.to_s] = value
+ end
+
+ body = request[:body]
+ req.body = body if body
+
+ response = http.request(req)
+
+ {
+ status: Integer(response.code),
+ headers: response.each_header.to_h,
+ body: response.body.to_s,
+ }
+ end
+
+ # Streaming-capable request helper.
+ #
+ # When the response is `text/event-stream` (and 2xx), it yields raw body chunks
+ # as they arrive via the given block, and returns a response hash with `body: nil`.
+ #
+ # For non-streaming responses, it behaves like `#call` and returns the full body.
+ def call_stream(request)
+ return call(request) unless block_given?
+
+ uri = URI.parse(request.fetch(:url))
+
+ http = Net::HTTP.new(uri.host, uri.port)
+ http.use_ssl = uri.scheme == "https"
+
+ timeout = request[:timeout]
+ open_timeout = request[:open_timeout] || timeout
+ read_timeout = request[:read_timeout] || timeout
+
+ http.open_timeout = open_timeout if open_timeout
+ http.read_timeout = read_timeout if read_timeout
+
+ klass = http_class_for(request[:method])
+ req = klass.new(uri.request_uri)
+
+ headers = request[:headers] || {}
+ headers.each do |key, value|
+ req[key.to_s] = value
+ end
+
+ body = request[:body]
+ req.body = body if body
+
+ status = nil
+ response_headers = {}
+ response_body = +""
+
+ http.request(req) do |response|
+ status = Integer(response.code)
+ response_headers = response.each_header.to_h
+
+ headers_lc = response_headers.transform_keys { |k| k.to_s.downcase }
+ content_type = headers_lc["content-type"]
+
+ if status >= 200 && status < 300 && content_type&.include?("text/event-stream")
+ response.read_body do |chunk|
+ yield chunk
+ end
+ response_body = nil
+ else
+ response_body = response.body.to_s
+ end
+ end
+
+ {
+ status: Integer(status),
+ headers: response_headers,
+ body: response_body,
+ }
+ end
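`call_stream` yields raw body chunks; assembling them into server-sent events is left to the caller. One possible consumer, sketched here with a hypothetical `SSEBuffer` class that is not part of the gem:

```ruby
# Illustrative SSE reassembly for raw chunks yielded by call_stream.
# Buffers partial data across chunk boundaries and emits the payload of
# each complete event (events are separated by a blank line).
class SSEBuffer
  def initialize
    @buffer = +""
  end

  # Feed one raw chunk; returns the data payloads of any complete events.
  def feed(chunk)
    @buffer << chunk
    events = []
    while (idx = @buffer.index("\n\n"))
      raw = @buffer.slice!(0, idx + 2)
      data = raw.lines.filter_map { |l| l[6..].chomp if l.start_with?("data: ") }
      events << data.join("\n") unless data.empty?
    end
    events
  end
end

sse = SSEBuffer.new
sse.feed("data: {\"delta\":\"He")    # => [] (incomplete event, buffered)
sse.feed("llo\"}\n\ndata: [DONE]\n\n")
# => ["{\"delta\":\"Hello\"}", "[DONE]"]
```

Because the adapter yields chunks exactly as the socket delivers them, a buffer like this is needed whenever an event may straddle two chunks.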
+
+ private
+
+ def http_class_for(method)
+ case method.to_s.upcase
+ when "GET" then Net::HTTP::Get
+ when "POST" then Net::HTTP::Post
+ when "PUT" then Net::HTTP::Put
+ when "PATCH" then Net::HTTP::Patch
+ when "DELETE" then Net::HTTP::Delete
+ else
+ raise ArgumentError, "Unsupported HTTP method: #{method.inspect}"
+ end
+ end
+ end
+ end
+ end
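Any object responding to `call(request)` and returning `{ status:, headers:, body: }` satisfies the adapter interface that `Default` implements, which is what makes adapters swappable. A hypothetical stub adapter for tests (not part of the gem) might look like:

```ruby
# Stub adapter for tests: records the request it receives and returns a
# canned response hash matching the adapter contract
# ({ status:, headers:, body: }). Hypothetical; not shipped with the gem.
class StubAdapter
  attr_reader :last_request

  def initialize(status: 200, headers: { "content-type" => "application/json" }, body: "{}")
    @response = { status: status, headers: headers, body: body }
  end

  def call(request)
    @last_request = request
    @response
  end
end

adapter = StubAdapter.new(body: '{"ok":true}')
response = adapter.call(method: :post, url: "https://example.test/v1/chat", body: "{}")
response[:status]           # => 200
adapter.last_request[:url]  # => "https://example.test/v1/chat"
```

Since the stub never opens a socket, specs exercising client behavior (request shaping, error mapping) can run without a live server.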
@@ -0,0 +1,5 @@
+ # frozen_string_literal: true
+
+ module SimpleInference
+ VERSION = "0.1.0"
+ end
@@ -0,0 +1,19 @@
+ # frozen_string_literal: true
+
+ require_relative "simple_inference/version"
+ require_relative "simple_inference/config"
+ require_relative "simple_inference/errors"
+ require_relative "simple_inference/http_adapter"
+ require_relative "simple_inference/client"
+
+ module SimpleInference
+ class << self
+ # Convenience constructor using RORO-style options hash.
+ #
+ # Example:
+ # client = SimpleInference.new(base_url: "...", api_key: "...")
+ def new(options = {})
+ Client.new(options)
+ end
+ end
+ end
@@ -0,0 +1,4 @@
+ module SimpleInference
+ VERSION: String
+ end
+
metadata ADDED
@@ -0,0 +1,56 @@
+ --- !ruby/object:Gem::Specification
+ name: simple_inference
+ version: !ruby/object:Gem::Version
+ version: 0.1.0
+ platform: ruby
+ authors:
+ - jasl
+ bindir: exe
+ cert_chain: []
+ date: 1980-01-02 00:00:00.000000000 Z
+ dependencies: []
+ description: Fiber-friendly Ruby client for Simple Inference Server APIs (chat, embeddings,
+ audio, rerank, health).
+ email:
+ - jasl9187@hotmail.com
+ executables: []
+ extensions: []
+ extra_rdoc_files: []
+ files:
+ - LICENSE.txt
+ - README.md
+ - Rakefile
+ - lib/simple_inference.rb
+ - lib/simple_inference/client.rb
+ - lib/simple_inference/config.rb
+ - lib/simple_inference/errors.rb
+ - lib/simple_inference/http_adapter.rb
+ - lib/simple_inference/http_adapter/httpx.rb
+ - lib/simple_inference/version.rb
+ - sig/simple_inference.rbs
+ homepage: https://github.com/jasl/simple_inference_server/tree/main/sdks/ruby
+ licenses:
+ - MIT
+ metadata:
+ allowed_push_host: https://rubygems.org
+ homepage_uri: https://github.com/jasl/simple_inference_server/tree/main/sdks/ruby
+ source_code_uri: https://github.com/jasl/simple_inference_server
+ rubygems_mfa_required: 'true'
+ rdoc_options: []
+ require_paths:
+ - lib
+ required_ruby_version: !ruby/object:Gem::Requirement
+ requirements:
+ - - ">="
+ - !ruby/object:Gem::Version
+ version: 3.2.0
+ required_rubygems_version: !ruby/object:Gem::Requirement
+ requirements:
+ - - ">="
+ - !ruby/object:Gem::Version
+ version: '0'
+ requirements: []
+ rubygems_version: 4.0.1
+ specification_version: 4
+ summary: Fiber-friendly Ruby client for the Simple Inference Server (OpenAI-compatible).
+ test_files: []