ruby_llm-semantic_cache 0.1.0

checksums.yaml ADDED
@@ -0,0 +1,7 @@
+ ---
+ SHA256:
+   metadata.gz: 2e58e01f30b45eb4e64c138ebd569fd5c357ccca04a1bc5ed583d7a959ddd2de
+   data.tar.gz: 3e7f516a0a4f5651725e75b84054425e7907483434c28b03fc255e079ad8d1dc
+ SHA512:
+   metadata.gz: a114ddf65e4f38f64baf1fe08cca50b6394dfc6cbb03684c7af788777149fd3ffac229de28d09f791bc71050551aa96f8480f8f74a75281d7b7e445002f27cef
+   data.tar.gz: d0682d532b699449a051b3213053a9dceea44ac92da4b8ccde2e4e7c47f85690a53d8158a18628f119012bd04886975351fdaf0379fe886449651c4ab6906c7a
data/.rspec ADDED
@@ -0,0 +1,3 @@
+ --format documentation
+ --color
+ --require spec_helper
data/README.md ADDED
@@ -0,0 +1,151 @@
+ # RubyLLM::SemanticCache
+
+ Semantic caching for [RubyLLM](https://github.com/crmne/ruby_llm). Cache responses based on meaning, not exact strings.
+
+ ```
+ "What's the capital of France?" → Cache MISS, call LLM
+ "What is France's capital?"    → Cache HIT (92% similar)
+ ```
+
+ Embedding models cost ~1000x less than chat models, so every cache hit saves money.
+
+ ## Installation
+
+ ```ruby
+ gem 'ruby_llm-semantic_cache'
+ ```
+
+ ## Quick Start
+
+ ```ruby
+ # Wrap any RubyLLM chat - caching is automatic
+ chat = RubyLLM::SemanticCache.wrap(RubyLLM.chat(model: "gpt-5.2"))
+ chat.ask("What is Ruby?") # Calls API, caches response
+
+ # New conversation, same question = cache hit
+ chat2 = RubyLLM::SemanticCache.wrap(RubyLLM.chat(model: "gpt-5.2"))
+ chat2.ask("What is Ruby?") # Returns cached response instantly
+ ```
+
+ Or use the fetch API for one-off queries:
+
+ ```ruby
+ response = RubyLLM::SemanticCache.fetch("What is Ruby?") do
+   RubyLLM.chat.ask("What is Ruby?")
+ end
+ ```
+
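At its core, `fetch` is a compute-on-miss pattern: return the stored response if one matches, otherwise run the block and cache its result. A minimal standalone sketch of that pattern (exact string keys here; the gem matches on embedding similarity instead, and `fetch` is a hypothetical top-level method, not the gem's own):

```ruby
# Compute-on-miss fetch: return the cached value if present,
# otherwise run the block and store its result under the key.
def fetch(cache, key)
  return cache[key] if cache.key?(key)

  cache[key] = yield
end

cache = {}
fetch(cache, "What is Ruby?") { "Ruby is a programming language." } # miss: block runs
fetch(cache, "What is Ruby?") { raise "not called on a hit" }       # hit: served from cache
```

The semantic version replaces the exact `cache.key?(key)` lookup with a nearest-neighbor search over query embeddings.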
+ ## How Caching Works
+
+ By default, only the **first message** of each conversation is cached. Follow-up messages go directly to the LLM because they depend on conversation context.
+
+ ```ruby
+ chat = RubyLLM::SemanticCache.wrap(RubyLLM.chat)
+ chat.ask("What is Ruby?")   # Cached
+ chat.ask("Who created it?") # NOT cached (context-dependent)
+ ```
+
+ Cache keys include: **model + system prompt + message**. Different models or instructions get separate cache entries.
+
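The README doesn't show the exact key derivation; a plausible sketch of combining the three components (`Digest::SHA256`, the separator, and the `cache_key` helper are all assumptions for illustration). In practice the message portion is matched by embedding similarity rather than an exact digest; the point here is only that a different model or system prompt partitions the cache:

```ruby
require "digest"

# Hypothetical sketch: any change to model, system prompt, or message
# yields a different key, so entries never leak across configurations.
def cache_key(model:, system_prompt:, message:)
  Digest::SHA256.hexdigest([model, system_prompt, message].join("\x1F"))
end

a = cache_key(model: "gpt-5.2", system_prompt: "Be terse", message: "What is Ruby?")
b = cache_key(model: "gpt-4o",  system_prompt: "Be terse", message: "What is Ruby?")
a == b # => false: different models get separate cache entries
```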
+ ## Configuration
+
+ ```ruby
+ RubyLLM::SemanticCache.configure do |config|
+   # Storage (default: :memory; use :redis for production)
+   config.vector_store = :redis
+   config.cache_store = :redis
+   config.redis_url = ENV["REDIS_URL"]
+
+   # Similarity threshold: 0.92 recommended; higher = stricter
+   config.similarity_threshold = 0.92
+
+   # Cache expiration in seconds (default: nil = never expire)
+   config.ttl = 24 * 60 * 60
+
+   # Embedding model
+   config.embedding_model = "text-embedding-3-small"
+   config.embedding_dimensions = 1536
+ end
+ ```
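The threshold compares embedding vectors, typically by cosine similarity. A self-contained illustration with toy 3-dimensional vectors (real embeddings have 1536 dimensions; the numbers here are made up to show how the 0.92 cutoff behaves):

```ruby
# Cosine similarity between two vectors: dot(a, b) / (|a| * |b|).
# Values near 1.0 mean the vectors point in nearly the same direction.
def cosine_similarity(a, b)
  dot = a.zip(b).sum { |x, y| x * y }
  dot / (Math.sqrt(a.sum { |x| x * x }) * Math.sqrt(b.sum { |x| x * x }))
end

paraphrase = cosine_similarity([0.9, 0.1, 0.2], [0.85, 0.15, 0.25])
unrelated  = cosine_similarity([0.9, 0.1, 0.2], [0.1, 0.9, 0.3])

paraphrase >= 0.92 # => true:  close enough to serve from cache
unrelated  >= 0.92 # => false: falls through to the LLM
```

Raising the threshold toward 1.0 trades hit rate for precision; lowering it risks serving a cached answer to a question that only looks similar.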
+ ## Wrapper Options
+
+ ```ruby
+ RubyLLM::SemanticCache.wrap(chat,
+   threshold: 0.95,          # Override similarity threshold
+   ttl: 3600,                # Override TTL (seconds)
+   max_messages: :unlimited, # Cache all messages, not just the first (default: 1)
+   # Also accepts false (same as :unlimited) or an Integer for a custom limit
+   on_cache_hit: ->(chat, msg, resp) { log("Cache hit!") }
+ )
+ ```
+ ## Multi-Turn Caching
+
+ To cache entire conversation flows (not just first messages):
+
+ ```ruby
+ chat = RubyLLM::SemanticCache.wrap(RubyLLM.chat, max_messages: :unlimited)
+
+ # Conversation 1
+ chat.ask("What is Ruby?")
+ chat.ask("Who created it?")
+
+ # Conversation 2 - identical flow hits cache
+ chat2 = RubyLLM::SemanticCache.wrap(RubyLLM.chat, max_messages: :unlimited)
+ chat2.ask("What is Ruby?")   # Cache HIT
+ chat2.ask("Who created it?") # Cache HIT (same context)
+ ```
+
+ ## Rails Integration
+
+ ```ruby
+ # config/initializers/semantic_cache.rb
+ RubyLLM::SemanticCache.configure do |config|
+   config.vector_store = :redis
+   config.cache_store = :redis
+   config.redis_url = ENV["REDIS_URL"]
+   config.namespace = Rails.env
+ end
+ ```
+
+ ## Additional APIs
+
+ ```ruby
+ # Manual store
+ RubyLLM::SemanticCache.store(query: "What is Ruby?", response: message)
+
+ # Search similar
+ RubyLLM::SemanticCache.search("Tell me about Ruby", limit: 5)
+
+ # Check/delete
+ RubyLLM::SemanticCache.exists?("What is Ruby?")
+ RubyLLM::SemanticCache.delete("What is Ruby?")
+
+ # Stats
+ RubyLLM::SemanticCache.stats # => { hits: 150, misses: 20, hit_rate: 0.88 }
+
+ # Scoped caches (for multi-tenant apps)
+ support = RubyLLM::SemanticCache::Scoped.new(namespace: "support")
+ sales = RubyLLM::SemanticCache::Scoped.new(namespace: "sales")
+ ```
+
+ ## Requirements
+
+ - Ruby >= 2.7
+ - [RubyLLM](https://github.com/crmne/ruby_llm) >= 1.0
+ - Redis 8+ with [neighbor-redis](https://github.com/ankane/neighbor-redis) (for production)
+
+ ## Roadmap
+
+ - [x] Basic semantic caching
+ - [x] Configurable similarity threshold
+ - [x] Multi-turn caching
+ - [x] Redis vector store
+ - [ ] Advanced eviction policies
+ - [ ] Web dashboard for cache stats?
+ - [ ] Support for more vector stores?
+
+ ## License
+
+ MIT
data/Rakefile ADDED
@@ -0,0 +1,6 @@
+ require "bundler/gem_tasks"
+ require "rspec/core/rake_task"
+
+ RSpec::Core::RakeTask.new(:spec)
+
+ task :default => :spec
@@ -0,0 +1,32 @@
+ services:
+   redis:
+     image: redis:8-alpine
+     ports:
+       - "6379:6379"
+     volumes:
+       - redis_data:/data
+     healthcheck:
+       test: ["CMD", "redis-cli", "ping"]
+       interval: 5s
+       timeout: 3s
+       retries: 5
+
+   postgresql:
+     image: pgvector/pgvector:pg17
+     ports:
+       - "5432:5432"
+     environment:
+       POSTGRES_USER: llm_cache
+       POSTGRES_PASSWORD: llm_cache
+       POSTGRES_DB: llm_cache_dev
+     volumes:
+       - postgresql_data:/var/lib/postgresql/data
+     healthcheck:
+       test: ["CMD-SHELL", "pg_isready -U llm_cache"]
+       interval: 5s
+       timeout: 3s
+       retries: 5
+
+ volumes:
+   redis_data:
+   postgresql_data:
@@ -0,0 +1,49 @@
+ # frozen_string_literal: true
+
+ module RubyLLM
+   module SemanticCache
+     module CacheStores
+       class Base
+         def initialize(config)
+           @config = config
+         end
+
+         # Get a cached entry by ID
+         # @param id [String] unique identifier
+         # @return [Hash, nil] the cached entry or nil if not found
+         def get(id)
+           raise NotImplementedError
+         end
+
+         # Store an entry
+         # @param id [String] unique identifier
+         # @param data [Hash] the data to store
+         # @param ttl [Integer, nil] time-to-live in seconds
+         def set(id, data, ttl: nil)
+           raise NotImplementedError
+         end
+
+         # Delete an entry by ID
+         # @param id [String] unique identifier
+         def delete(id)
+           raise NotImplementedError
+         end
+
+         # Clear all entries
+         def clear!
+           raise NotImplementedError
+         end
+
+         # Check if the store is empty
+         def empty?
+           raise NotImplementedError
+         end
+
+         # Get the number of entries stored
+         def size
+           raise NotImplementedError
+         end
+       end
+     end
+   end
+ end
@@ -0,0 +1,86 @@
+ # frozen_string_literal: true
+
+ require_relative "base"
+
+ module RubyLLM
+   module SemanticCache
+     module CacheStores
+       class Memory < Base
+         CacheEntry = Struct.new(:data, :expires_at, keyword_init: true)
+
+         def initialize(config)
+           super
+           @store = {}
+           @mutex = Mutex.new
+         end
+
+         def get(id)
+           @mutex.synchronize do
+             entry = @store[id]
+             return nil unless entry
+
+             if entry.expires_at && Time.now > entry.expires_at
+               @store.delete(id)
+               return nil
+             end
+
+             entry.data
+           end
+         end
+
+         def set(id, data, ttl: nil)
+           @mutex.synchronize do
+             expires_at = ttl ? Time.now + ttl : nil
+             @store[id] = CacheEntry.new(data: data, expires_at: expires_at)
+           end
+         end
+
+         def delete(id)
+           @mutex.synchronize do
+             @store.delete(id)
+           end
+         end
+
+         def clear!
+           @mutex.synchronize do
+             @store.clear
+           end
+         end
+
+         def empty?
+           @mutex.synchronize do
+             cleanup_expired
+             @store.empty?
+           end
+         end
+
+         def size
+           @mutex.synchronize do
+             cleanup_expired
+             @store.size
+           end
+         end
+
+         # Iterate over all entries (for invalidation)
+         # @yield [id, data] each entry
+         def each
+           @mutex.synchronize do
+             cleanup_expired
+             @store.each do |id, entry|
+               yield(id, entry.data)
+             end
+           end
+         end
+
+         private
+
+         def cleanup_expired
+           now = Time.now
+           @store.delete_if do |_id, entry|
+             entry.expires_at && now > entry.expires_at
+           end
+         end
+       end
+     end
+   end
+ end
1
+ # frozen_string_literal: true
2
+
3
+ require "json"
4
+ require_relative "base"
5
+
6
+ module RubyLLM
7
+ module SemanticCache
8
+ module CacheStores
9
+ class Redis < Base
10
+ def initialize(config)
11
+ super
12
+ setup_client
13
+ end
14
+
15
+ def get(id)
16
+ key = cache_key(id)
17
+ data = @client.call("GET", key)
18
+ return nil unless data
19
+
20
+ JSON.parse(data, symbolize_names: true)
21
+ rescue JSON::ParserError
22
+ nil
23
+ end
24
+
25
+ def set(id, data, ttl: nil)
26
+ key = cache_key(id)
27
+ json = JSON.generate(data)
28
+
29
+ if ttl
30
+ @client.call("SETEX", key, ttl.to_i, json)
31
+ else
32
+ @client.call("SET", key, json)
33
+ end
34
+ end
35
+
36
+ def delete(id)
37
+ key = cache_key(id)
38
+ @client.call("DEL", key)
39
+ end
40
+
41
+ def clear!
42
+ pattern = cache_key("*")
43
+ cursor = "0"
44
+
45
+ loop do
46
+ cursor, keys = @client.call("SCAN", cursor, "MATCH", pattern, "COUNT", 100)
47
+ @client.call("DEL", *keys) unless keys.empty?
48
+ break if cursor == "0"
49
+ end
50
+ end
51
+
52
+ def empty?
53
+ size.zero?
54
+ end
55
+
56
+ def size
57
+ pattern = cache_key("*")
58
+ stats_key = cache_key("__semantic_cache_stats__")
59
+ count = 0
60
+ cursor = "0"
61
+
62
+ loop do
63
+ cursor, keys = @client.call("SCAN", cursor, "MATCH", pattern, "COUNT", 100)
64
+ # Exclude the stats key from the count
65
+ count += keys.reject { |k| k == stats_key }.size
66
+ break if cursor == "0"
67
+ end
68
+
69
+ count
70
+ end
71
+
72
+ private
73
+
74
+ def setup_client
75
+ require "redis-client"
76
+
77
+ @client = if @config.redis_client
78
+ @config.redis_client
79
+ elsif @config.redis_url
80
+ RedisClient.config(url: @config.redis_url).new_client
81
+ else
82
+ RedisClient.config.new_client
83
+ end
84
+ end
85
+
86
+ def cache_key(id)
87
+ "#{@config.namespace}:cache:#{id}"
88
+ end
89
+ end
90
+ end
91
+ end
92
+ end
@@ -0,0 +1,131 @@
1
+ # frozen_string_literal: true
2
+
3
+ module RubyLLM
4
+ module SemanticCache
5
+ # Defined here to avoid circular dependency - Error is defined in semantic_cache.rb
6
+ # but configuration.rb is loaded first
7
+ class ConfigurationError < StandardError; end
8
+
9
+ class Configuration
10
+ VALID_VECTOR_STORES = %i[memory redis].freeze
11
+ VALID_CACHE_STORES = %i[memory redis].freeze
12
+
13
+ # Vector store backend: :redis, :memory
14
+ attr_accessor :vector_store
15
+
16
+ # Cache store backend: :redis, :memory
17
+ attr_accessor :cache_store
18
+
19
+ # Redis connection URL (if using Redis backend)
20
+ attr_accessor :redis_url
21
+
22
+ # Redis client instance (alternative to redis_url)
23
+ attr_accessor :redis_client
24
+
25
+ # Embedding model to use (default: text-embedding-3-small)
26
+ attr_accessor :embedding_model
27
+
28
+ # Embedding dimensions (default: 1536 for text-embedding-3-small)
29
+ attr_accessor :embedding_dimensions
30
+
31
+ # Similarity threshold (0.0 to 1.0)
32
+ # Higher = stricter matching, fewer cache hits
33
+ # Lower = looser matching, more cache hits but potential mismatches
34
+ attr_accessor :similarity_threshold
35
+
36
+ # TTL for cached entries in seconds (nil = no expiration)
37
+ attr_accessor :ttl
38
+
39
+ # Namespace for cache keys (useful for multi-tenant apps)
40
+ attr_accessor :namespace
41
+
42
+ # Instrumentation callback for metrics/observability
43
+ # Called with event_name and payload hash
44
+ attr_accessor :instrumentation_callback
45
+
46
+ # Maximum conversation messages before skipping cache (excluding system messages)
47
+ # - Integer: skip cache after N messages (default: 1, only first message cached)
48
+ # - :unlimited or false: cache all messages regardless of conversation length
49
+ attr_accessor :max_messages
50
+
51
+ def initialize
52
+ @vector_store = :memory
53
+ @cache_store = :memory
54
+ @redis_url = nil
55
+ @redis_client = nil
56
+ @embedding_model = "text-embedding-3-small"
57
+ @embedding_dimensions = 1536
58
+ @similarity_threshold = 0.92
59
+ @ttl = nil
60
+ @namespace = "ruby_llm_semantic_cache"
61
+ @instrumentation_callback = nil
62
+ @max_messages = 1
63
+ end
64
+
65
+ def ttl_seconds
66
+ return nil if @ttl.nil?
67
+
68
+ case @ttl
69
+ when Numeric then @ttl.to_i
70
+ when ->(t) { t.respond_to?(:to_i) } then @ttl.to_i
71
+ else nil
72
+ end
73
+ end
74
+
75
+ # Validate the configuration and raise errors for invalid settings
76
+ # @raise [ConfigurationError] if configuration is invalid
77
+ def validate!
78
+ validate_stores!
79
+ validate_threshold!
80
+ validate_dimensions!
81
+ validate_redis_config!
82
+ true
83
+ end
84
+
85
+ # Check if configuration is valid without raising
86
+ # @return [Boolean]
87
+ def valid?
88
+ validate!
89
+ true
90
+ rescue ConfigurationError
91
+ false
92
+ end
93
+
94
+ private
95
+
96
+ def validate_stores!
97
+ unless VALID_VECTOR_STORES.include?(@vector_store)
98
+ raise ConfigurationError,
99
+ "Invalid vector_store: #{@vector_store}. Valid options: #{VALID_VECTOR_STORES.join(', ')}"
100
+ end
101
+
102
+ unless VALID_CACHE_STORES.include?(@cache_store)
103
+ raise ConfigurationError,
104
+ "Invalid cache_store: #{@cache_store}. Valid options: #{VALID_CACHE_STORES.join(', ')}"
105
+ end
106
+ end
107
+
108
+ def validate_threshold!
109
+ unless @similarity_threshold.is_a?(Numeric) && (0.0..1.0).cover?(@similarity_threshold)
110
+ raise ConfigurationError,
111
+ "similarity_threshold must be a number between 0.0 and 1.0, got: #{@similarity_threshold.inspect}"
112
+ end
113
+ end
114
+
115
+ def validate_dimensions!
116
+ unless @embedding_dimensions.is_a?(Integer) && @embedding_dimensions.positive?
117
+ raise ConfigurationError,
118
+ "embedding_dimensions must be a positive integer, got: #{@embedding_dimensions.inspect}"
119
+ end
120
+ end
121
+
122
+ def validate_redis_config!
123
+ return unless @vector_store == :redis || @cache_store == :redis
124
+ return if @redis_url || @redis_client
125
+
126
+ raise ConfigurationError,
127
+ "redis_url or redis_client required when using Redis backend"
128
+ end
129
+ end
130
+ end
131
+ end
@@ -0,0 +1,24 @@
1
+ # frozen_string_literal: true
2
+
3
+ module RubyLLM
4
+ module SemanticCache
5
+ class Embedding
6
+ def initialize(config)
7
+ @config = config
8
+ end
9
+
10
+ def generate(text)
11
+ result = RubyLLM.embed(text, model: @config.embedding_model)
12
+
13
+ # RubyLLM.embed returns vectors as array (single text) or array of arrays (multiple texts)
14
+ vectors = result.vectors
15
+ vectors.is_a?(Array) && vectors.first.is_a?(Array) ? vectors.first : vectors
16
+ end
17
+
18
+ def generate_batch(texts)
19
+ result = RubyLLM.embed(texts, model: @config.embedding_model)
20
+ result.vectors
21
+ end
22
+ end
23
+ end
24
+ end
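`generate` has to cope with `RubyLLM.embed` returning either a flat vector (single input) or an array of vectors (batch input). The unwrapping line can be exercised on its own (`normalize` is a local stand-in for that one expression):

```ruby
# Same shape normalization as Embedding#generate: unwrap a one-element
# batch ([[...]]) to a flat vector; pass flat vectors through unchanged.
def normalize(vectors)
  vectors.is_a?(Array) && vectors.first.is_a?(Array) ? vectors.first : vectors
end

normalize([0.1, 0.2, 0.3])   # => [0.1, 0.2, 0.3]
normalize([[0.1, 0.2, 0.3]]) # => [0.1, 0.2, 0.3]
```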
@@ -0,0 +1,51 @@
+ # frozen_string_literal: true
+
+ require "securerandom"
+ require "time"
+
+ module RubyLLM
+   module SemanticCache
+     class Entry
+       attr_reader :id, :query, :response, :embedding, :metadata, :created_at
+
+       def initialize(query:, response:, embedding:, metadata: {}, id: nil, created_at: nil)
+         @id = id || SecureRandom.uuid
+         @query = query
+         @response = response
+         @embedding = embedding
+         @metadata = metadata
+         @created_at = created_at || Time.now
+       end
+
+       def to_h
+         {
+           id: @id,
+           query: @query,
+           response: @response,
+           metadata: @metadata,
+           created_at: @created_at.iso8601
+         }
+       end
+
+       def self.from_h(hash)
+         new(
+           id: hash[:id] || hash["id"],
+           query: hash[:query] || hash["query"],
+           response: hash[:response] || hash["response"],
+           embedding: hash[:embedding] || hash["embedding"],
+           metadata: hash[:metadata] || hash["metadata"] || {},
+           created_at: parse_time(hash[:created_at] || hash["created_at"])
+         )
+       end
+
+       def self.parse_time(value)
+         case value
+         when Time then value
+         when String then Time.parse(value)
+         when nil then Time.now
+         else value
+         end
+       end
+     end
+   end
+ end
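`to_h` serializes `created_at` as an ISO 8601 string so entries survive JSON storage, and `parse_time` reverses it on load. The round-trip drops sub-second precision (the default `Time#iso8601` emits no fractional digits) but keeps the timestamp stable to the second:

```ruby
require "time"

# Round-trip a timestamp the way Entry#to_h / Entry.from_h do:
# serialize to an ISO 8601 string, then parse it back with Time.parse.
original = Time.now
restored = Time.parse(original.iso8601)

restored.to_i == original.to_i # => true: second precision survives the trip
```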