ruby_llm-semantic_cache 0.1.0

checksums.yaml ADDED
@@ -0,0 +1,7 @@
+ ---
+ SHA256:
+   metadata.gz: 2e58e01f30b45eb4e64c138ebd569fd5c357ccca04a1bc5ed583d7a959ddd2de
+   data.tar.gz: 3e7f516a0a4f5651725e75b84054425e7907483434c28b03fc255e079ad8d1dc
+ SHA512:
+   metadata.gz: a114ddf65e4f38f64baf1fe08cca50b6394dfc6cbb03684c7af788777149fd3ffac229de28d09f791bc71050551aa96f8480f8f74a75281d7b7e445002f27cef
+   data.tar.gz: d0682d532b699449a051b3213053a9dceea44ac92da4b8ccde2e4e7c47f85690a53d8158a18628f119012bd04886975351fdaf0379fe886449651c4ab6906c7a
data/.rspec ADDED
@@ -0,0 +1,3 @@
+ --format documentation
+ --color
+ --require spec_helper
data/README.md ADDED
@@ -0,0 +1,151 @@
+ # RubyLLM::SemanticCache
+
+ Semantic caching for [RubyLLM](https://github.com/crmne/ruby_llm). Cache responses based on meaning, not exact strings.
+
+ ```
+ "What's the capital of France?" → Cache MISS, call LLM
+ "What is France's capital?"    → Cache HIT (92% similar)
+ ```
+
+ Embedding models cost ~1000x less than chat models, so every cache hit saves money.
+
+ ## Installation
+
+ ```ruby
+ gem 'ruby_llm-semantic_cache'
+ ```
+
+ ## Quick Start
+
+ ```ruby
+ # Wrap any RubyLLM chat - caching is automatic
+ chat = RubyLLM::SemanticCache.wrap(RubyLLM.chat(model: "gpt-5.2"))
+ chat.ask("What is Ruby?") # Calls API, caches response
+
+ # New conversation, same question = cache hit
+ chat2 = RubyLLM::SemanticCache.wrap(RubyLLM.chat(model: "gpt-5.2"))
+ chat2.ask("What is Ruby?") # Returns cached response instantly
+ ```
+
+ Or use the fetch API for one-off queries:
+
+ ```ruby
+ response = RubyLLM::SemanticCache.fetch("What is Ruby?") do
+   RubyLLM.chat.ask("What is Ruby?")
+ end
+ ```
+
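At its core, `fetch` is a compute-on-miss pattern: return the stored response if one matches, otherwise run the block and cache its result. A minimal standalone sketch of that pattern (exact string keys here; the gem matches on embedding similarity instead, and `fetch` is a hypothetical top-level method, not the gem's own):

```ruby
# Compute-on-miss fetch: return the cached value if present,
# otherwise run the block and store its result under the key.
def fetch(cache, key)
  return cache[key] if cache.key?(key)

  cache[key] = yield
end

cache = {}
fetch(cache, "What is Ruby?") { "Ruby is a programming language." } # miss: block runs
fetch(cache, "What is Ruby?") { raise "not called on a hit" }       # hit: served from cache
```

The semantic version replaces the exact `cache.key?(key)` lookup with a nearest-neighbor search over query embeddings.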
+ ## How Caching Works
+
+ By default, only the **first message** of each conversation is cached. Follow-up messages go directly to the LLM because they depend on conversation context.
+
+ ```ruby
+ chat = RubyLLM::SemanticCache.wrap(RubyLLM.chat)
+ chat.ask("What is Ruby?")   # Cached
+ chat.ask("Who created it?") # NOT cached (context-dependent)
+ ```
+
+ Cache keys include: **model + system prompt + message**. Different models or instructions get separate cache entries.
+
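The README doesn't show the exact key derivation; a plausible sketch of combining the three components (`Digest::SHA256`, the separator, and the `cache_key` helper are all assumptions for illustration). In practice the message portion is matched by embedding similarity rather than an exact digest; the point here is only that a different model or system prompt partitions the cache:

```ruby
require "digest"

# Hypothetical sketch: any change to model, system prompt, or message
# yields a different key, so entries never leak across configurations.
def cache_key(model:, system_prompt:, message:)
  Digest::SHA256.hexdigest([model, system_prompt, message].join("\x1F"))
end

a = cache_key(model: "gpt-5.2", system_prompt: "Be terse", message: "What is Ruby?")
b = cache_key(model: "gpt-4o",  system_prompt: "Be terse", message: "What is Ruby?")
a == b # => false: different models get separate cache entries
```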
+ ## Configuration
+
+ ```ruby
+ RubyLLM::SemanticCache.configure do |config|
+   # Storage (default: :memory; use :redis for production)
+   config.vector_store = :redis
+   config.cache_store = :redis
+   config.redis_url = ENV["REDIS_URL"]
+
+   # Similarity threshold: 0.92 recommended; higher = stricter
+   config.similarity_threshold = 0.92
+
+   # Cache expiration in seconds (default: nil = never expire)
+   config.ttl = 24 * 60 * 60
+
+   # Embedding model
+   config.embedding_model = "text-embedding-3-small"
+   config.embedding_dimensions = 1536
+ end
+ ```
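The threshold compares embedding vectors, typically by cosine similarity. A self-contained illustration with toy 3-dimensional vectors (real embeddings have 1536 dimensions; the numbers here are made up to show how the 0.92 cutoff behaves):

```ruby
# Cosine similarity between two vectors: dot(a, b) / (|a| * |b|).
# Values near 1.0 mean the vectors point in nearly the same direction.
def cosine_similarity(a, b)
  dot = a.zip(b).sum { |x, y| x * y }
  dot / (Math.sqrt(a.sum { |x| x * x }) * Math.sqrt(b.sum { |x| x * x }))
end

paraphrase = cosine_similarity([0.9, 0.1, 0.2], [0.85, 0.15, 0.25])
unrelated  = cosine_similarity([0.9, 0.1, 0.2], [0.1, 0.9, 0.3])

paraphrase >= 0.92 # => true:  close enough to serve from cache
unrelated  >= 0.92 # => false: falls through to the LLM
```

Raising the threshold toward 1.0 trades hit rate for precision; lowering it risks serving a cached answer to a question that only looks similar.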
+ ## Wrapper Options
+
+ ```ruby
+ RubyLLM::SemanticCache.wrap(chat,
+   threshold: 0.95,          # Override similarity threshold
+   ttl: 3600,                # Override TTL (seconds)
+   max_messages: :unlimited, # Cache all messages, not just the first (default: 1)
+   # Also accepts false (same as :unlimited) or an Integer for a custom limit
+   on_cache_hit: ->(chat, msg, resp) { log("Cache hit!") }
+ )
+ ```
+ ## Multi-Turn Caching
+
+ To cache entire conversation flows (not just first messages):
+
+ ```ruby
+ chat = RubyLLM::SemanticCache.wrap(RubyLLM.chat, max_messages: :unlimited)
+
+ # Conversation 1
+ chat.ask("What is Ruby?")
+ chat.ask("Who created it?")
+
+ # Conversation 2 - identical flow hits cache
+ chat2 = RubyLLM::SemanticCache.wrap(RubyLLM.chat, max_messages: :unlimited)
+ chat2.ask("What is Ruby?")   # Cache HIT
+ chat2.ask("Who created it?") # Cache HIT (same context)
+ ```
+
+ ## Rails Integration
+
+ ```ruby
+ # config/initializers/semantic_cache.rb
+ RubyLLM::SemanticCache.configure do |config|
+   config.vector_store = :redis
+   config.cache_store = :redis
+   config.redis_url = ENV["REDIS_URL"]
+   config.namespace = Rails.env
+ end
+ ```
+
+ ## Additional APIs
+
+ ```ruby
+ # Manual store
+ RubyLLM::SemanticCache.store(query: "What is Ruby?", response: message)
+
+ # Search similar
+ RubyLLM::SemanticCache.search("Tell me about Ruby", limit: 5)
+
+ # Check/delete
+ RubyLLM::SemanticCache.exists?("What is Ruby?")
+ RubyLLM::SemanticCache.delete("What is Ruby?")
+
+ # Stats
+ RubyLLM::SemanticCache.stats # => { hits: 150, misses: 20, hit_rate: 0.88 }
+
+ # Scoped caches (for multi-tenant apps)
+ support = RubyLLM::SemanticCache::Scoped.new(namespace: "support")
+ sales = RubyLLM::SemanticCache::Scoped.new(namespace: "sales")
+ ```
+
+ ## Requirements
+
+ - Ruby >= 2.7
+ - [RubyLLM](https://github.com/crmne/ruby_llm) >= 1.0
+ - Redis 8+ with [neighbor-redis](https://github.com/ankane/neighbor-redis) (for production)
+
+ ## Roadmap
+
+ - [x] Basic semantic caching
+ - [x] Configurable similarity threshold
+ - [x] Multi-turn caching
+ - [x] Redis vector store
+ - [ ] Advanced eviction policies
+ - [ ] Web dashboard for cache stats?
+ - [ ] Support for more vector stores?
+
+ ## License
+
+ MIT
data/Rakefile ADDED
@@ -0,0 +1,6 @@
+ require "bundler/gem_tasks"
+ require "rspec/core/rake_task"
+
+ RSpec::Core::RakeTask.new(:spec)
+
+ task :default => :spec
@@ -0,0 +1,32 @@
+ services:
+   redis:
+     image: redis:8-alpine
+     ports:
+       - "6379:6379"
+     volumes:
+       - redis_data:/data
+     healthcheck:
+       test: ["CMD", "redis-cli", "ping"]
+       interval: 5s
+       timeout: 3s
+       retries: 5
+
+   postgresql:
+     image: pgvector/pgvector:pg17
+     ports:
+       - "5432:5432"
+     environment:
+       POSTGRES_USER: llm_cache
+       POSTGRES_PASSWORD: llm_cache
+       POSTGRES_DB: llm_cache_dev
+     volumes:
+       - postgresql_data:/var/lib/postgresql/data
+     healthcheck:
+       test: ["CMD-SHELL", "pg_isready -U llm_cache"]
+       interval: 5s
+       timeout: 3s
+       retries: 5
+
+ volumes:
+   redis_data:
+   postgresql_data:
@@ -0,0 +1,49 @@
+ # frozen_string_literal: true
+
+ module RubyLLM
+   module SemanticCache
+     module CacheStores
+       class Base
+         def initialize(config)
+           @config = config
+         end
+
+         # Get a cached entry by ID
+         # @param id [String] unique identifier
+         # @return [Hash, nil] the cached entry or nil if not found
+         def get(id)
+           raise NotImplementedError
+         end
+
+         # Store an entry
+         # @param id [String] unique identifier
+         # @param data [Hash] the data to store
+         # @param ttl [Integer, nil] time-to-live in seconds
+         def set(id, data, ttl: nil)
+           raise NotImplementedError
+         end
+
+         # Delete an entry by ID
+         # @param id [String] unique identifier
+         def delete(id)
+           raise NotImplementedError
+         end
+
+         # Clear all entries
+         def clear!
+           raise NotImplementedError
+         end
+
+         # Check if the store is empty
+         def empty?
+           raise NotImplementedError
+         end
+
+         # Get the number of entries stored
+         def size
+           raise NotImplementedError
+         end
+       end
+     end
+   end
+ end
@@ -0,0 +1,86 @@
+ # frozen_string_literal: true
+
+ require_relative "base"
+
+ module RubyLLM
+   module SemanticCache
+     module CacheStores
+       class Memory < Base
+         CacheEntry = Struct.new(:data, :expires_at, keyword_init: true)
+
+         def initialize(config)
+           super
+           @store = {}
+           @mutex = Mutex.new
+         end
+
+         def get(id)
+           @mutex.synchronize do
+             entry = @store[id]
+             return nil unless entry
+
+             if entry.expires_at && Time.now > entry.expires_at
+               @store.delete(id)
+               return nil
+             end
+
+             entry.data
+           end
+         end
+
+         def set(id, data, ttl: nil)
+           @mutex.synchronize do
+             expires_at = ttl ? Time.now + ttl : nil
+             @store[id] = CacheEntry.new(data: data, expires_at: expires_at)
+           end
+         end
+
+         def delete(id)
+           @mutex.synchronize do
+             @store.delete(id)
+           end
+         end
+
+         def clear!
+           @mutex.synchronize do
+             @store.clear
+           end
+         end
+
+         def empty?
+           @mutex.synchronize do
+             cleanup_expired
+             @store.empty?
+           end
+         end
+
+         def size
+           @mutex.synchronize do
+             cleanup_expired
+             @store.size
+           end
+         end
+
+         # Iterate over all entries (for invalidation)
+         # @yield [id, data] each entry
+         def each
+           @mutex.synchronize do
+             cleanup_expired
+             @store.each do |id, entry|
+               yield(id, entry.data)
+             end
+           end
+         end
+
+         private
+
+         def cleanup_expired
+           now = Time.now
+           @store.delete_if do |_id, entry|
+             entry.expires_at && now > entry.expires_at
+           end
+         end
+       end
+     end
+   end
+ end
1
+ # frozen_string_literal: true
2
+
3
+ require "json"
4
+ require_relative "base"
5
+
6
+ module RubyLLM
7
+ module SemanticCache
8
+ module CacheStores
9
+ class Redis < Base
10
+ def initialize(config)
11
+ super
12
+ setup_client
13
+ end
14
+
15
+ def get(id)
16
+ key = cache_key(id)
17
+ data = @client.call("GET", key)
18
+ return nil unless data
19
+
20
+ JSON.parse(data, symbolize_names: true)
21
+ rescue JSON::ParserError
22
+ nil
23
+ end
24
+
25
+ def set(id, data, ttl: nil)
26
+ key = cache_key(id)
27
+ json = JSON.generate(data)
28
+
29
+ if ttl
30
+ @client.call("SETEX", key, ttl.to_i, json)
31
+ else
32
+ @client.call("SET", key, json)
33
+ end
34
+ end
35
+
36
+ def delete(id)
37
+ key = cache_key(id)
38
+ @client.call("DEL", key)
39
+ end
40
+
41
+ def clear!
42
+ pattern = cache_key("*")
43
+ cursor = "0"
44
+
45
+ loop do
46
+ cursor, keys = @client.call("SCAN", cursor, "MATCH", pattern, "COUNT", 100)
47
+ @client.call("DEL", *keys) unless keys.empty?
48
+ break if cursor == "0"
49
+ end
50
+ end
51
+
52
+ def empty?
53
+ size.zero?
54
+ end
55
+
56
+ def size
57
+ pattern = cache_key("*")
58
+ stats_key = cache_key("__semantic_cache_stats__")
59
+ count = 0
60
+ cursor = "0"
61
+
62
+ loop do
63
+ cursor, keys = @client.call("SCAN", cursor, "MATCH", pattern, "COUNT", 100)
64
+ # Exclude the stats key from the count
65
+ count += keys.reject { |k| k == stats_key }.size
66
+ break if cursor == "0"
67
+ end
68
+
69
+ count
70
+ end
71
+
72
+ private
73
+
74
+ def setup_client
75
+ require "redis-client"
76
+
77
+ @client = if @config.redis_client
78
+ @config.redis_client
79
+ elsif @config.redis_url
80
+ RedisClient.config(url: @config.redis_url).new_client
81
+ else
82
+ RedisClient.config.new_client
83
+ end
84
+ end
85
+
86
+ def cache_key(id)
87
+ "#{@config.namespace}:cache:#{id}"
88
+ end
89
+ end
90
+ end
91
+ end
92
+ end
@@ -0,0 +1,131 @@
1
+ # frozen_string_literal: true
2
+
3
+ module RubyLLM
4
+ module SemanticCache
5
+ # Defined here to avoid circular dependency - Error is defined in semantic_cache.rb
6
+ # but configuration.rb is loaded first
7
+ class ConfigurationError < StandardError; end
8
+
9
+ class Configuration
10
+ VALID_VECTOR_STORES = %i[memory redis].freeze
11
+ VALID_CACHE_STORES = %i[memory redis].freeze
12
+
13
+ # Vector store backend: :redis, :memory
14
+ attr_accessor :vector_store
15
+
16
+ # Cache store backend: :redis, :memory
17
+ attr_accessor :cache_store
18
+
19
+ # Redis connection URL (if using Redis backend)
20
+ attr_accessor :redis_url
21
+
22
+ # Redis client instance (alternative to redis_url)
23
+ attr_accessor :redis_client
24
+
25
+ # Embedding model to use (default: text-embedding-3-small)
26
+ attr_accessor :embedding_model
27
+
28
+ # Embedding dimensions (default: 1536 for text-embedding-3-small)
29
+ attr_accessor :embedding_dimensions
30
+
31
+ # Similarity threshold (0.0 to 1.0)
32
+ # Higher = stricter matching, fewer cache hits
33
+ # Lower = looser matching, more cache hits but potential mismatches
34
+ attr_accessor :similarity_threshold
35
+
36
+ # TTL for cached entries in seconds (nil = no expiration)
37
+ attr_accessor :ttl
38
+
39
+ # Namespace for cache keys (useful for multi-tenant apps)
40
+ attr_accessor :namespace
41
+
42
+ # Instrumentation callback for metrics/observability
43
+ # Called with event_name and payload hash
44
+ attr_accessor :instrumentation_callback
45
+
46
+ # Maximum conversation messages before skipping cache (excluding system messages)
47
+ # - Integer: skip cache after N messages (default: 1, only first message cached)
48
+ # - :unlimited or false: cache all messages regardless of conversation length
49
+ attr_accessor :max_messages
50
+
51
+ def initialize
52
+ @vector_store = :memory
53
+ @cache_store = :memory
54
+ @redis_url = nil
55
+ @redis_client = nil
56
+ @embedding_model = "text-embedding-3-small"
57
+ @embedding_dimensions = 1536
58
+ @similarity_threshold = 0.92
59
+ @ttl = nil
60
+ @namespace = "ruby_llm_semantic_cache"
61
+ @instrumentation_callback = nil
62
+ @max_messages = 1
63
+ end
64
+
65
+ def ttl_seconds
66
+ return nil if @ttl.nil?
67
+
68
+ case @ttl
69
+ when Numeric then @ttl.to_i
70
+ when ->(t) { t.respond_to?(:to_i) } then @ttl.to_i
71
+ else nil
72
+ end
73
+ end
74
+
75
+ # Validate the configuration and raise errors for invalid settings
76
+ # @raise [ConfigurationError] if configuration is invalid
77
+ def validate!
78
+ validate_stores!
79
+ validate_threshold!
80
+ validate_dimensions!
81
+ validate_redis_config!
82
+ true
83
+ end
84
+
85
+ # Check if configuration is valid without raising
86
+ # @return [Boolean]
87
+ def valid?
88
+ validate!
89
+ true
90
+ rescue ConfigurationError
91
+ false
92
+ end
93
+
94
+ private
95
+
96
+ def validate_stores!
97
+ unless VALID_VECTOR_STORES.include?(@vector_store)
98
+ raise ConfigurationError,
99
+ "Invalid vector_store: #{@vector_store}. Valid options: #{VALID_VECTOR_STORES.join(', ')}"
100
+ end
101
+
102
+ unless VALID_CACHE_STORES.include?(@cache_store)
103
+ raise ConfigurationError,
104
+ "Invalid cache_store: #{@cache_store}. Valid options: #{VALID_CACHE_STORES.join(', ')}"
105
+ end
106
+ end
107
+
108
+ def validate_threshold!
109
+ unless @similarity_threshold.is_a?(Numeric) && (0.0..1.0).cover?(@similarity_threshold)
110
+ raise ConfigurationError,
111
+ "similarity_threshold must be a number between 0.0 and 1.0, got: #{@similarity_threshold.inspect}"
112
+ end
113
+ end
114
+
115
+ def validate_dimensions!
116
+ unless @embedding_dimensions.is_a?(Integer) && @embedding_dimensions.positive?
117
+ raise ConfigurationError,
118
+ "embedding_dimensions must be a positive integer, got: #{@embedding_dimensions.inspect}"
119
+ end
120
+ end
121
+
122
+ def validate_redis_config!
123
+ return unless @vector_store == :redis || @cache_store == :redis
124
+ return if @redis_url || @redis_client
125
+
126
+ raise ConfigurationError,
127
+ "redis_url or redis_client required when using Redis backend"
128
+ end
129
+ end
130
+ end
131
+ end
@@ -0,0 +1,24 @@
1
+ # frozen_string_literal: true
2
+
3
+ module RubyLLM
4
+ module SemanticCache
5
+ class Embedding
6
+ def initialize(config)
7
+ @config = config
8
+ end
9
+
10
+ def generate(text)
11
+ result = RubyLLM.embed(text, model: @config.embedding_model)
12
+
13
+ # RubyLLM.embed returns vectors as array (single text) or array of arrays (multiple texts)
14
+ vectors = result.vectors
15
+ vectors.is_a?(Array) && vectors.first.is_a?(Array) ? vectors.first : vectors
16
+ end
17
+
18
+ def generate_batch(texts)
19
+ result = RubyLLM.embed(texts, model: @config.embedding_model)
20
+ result.vectors
21
+ end
22
+ end
23
+ end
24
+ end
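`generate` has to cope with `RubyLLM.embed` returning either a flat vector (single input) or an array of vectors (batch input). The unwrapping line can be exercised on its own (`normalize` is a local stand-in for that one expression):

```ruby
# Same shape normalization as Embedding#generate: unwrap a one-element
# batch ([[...]]) to a flat vector; pass flat vectors through unchanged.
def normalize(vectors)
  vectors.is_a?(Array) && vectors.first.is_a?(Array) ? vectors.first : vectors
end

normalize([0.1, 0.2, 0.3])   # => [0.1, 0.2, 0.3]
normalize([[0.1, 0.2, 0.3]]) # => [0.1, 0.2, 0.3]
```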
@@ -0,0 +1,51 @@
+ # frozen_string_literal: true
+
+ require "securerandom"
+ require "time"
+
+ module RubyLLM
+   module SemanticCache
+     class Entry
+       attr_reader :id, :query, :response, :embedding, :metadata, :created_at
+
+       def initialize(query:, response:, embedding:, metadata: {}, id: nil, created_at: nil)
+         @id = id || SecureRandom.uuid
+         @query = query
+         @response = response
+         @embedding = embedding
+         @metadata = metadata
+         @created_at = created_at || Time.now
+       end
+
+       def to_h
+         {
+           id: @id,
+           query: @query,
+           response: @response,
+           metadata: @metadata,
+           created_at: @created_at.iso8601
+         }
+       end
+
+       def self.from_h(hash)
+         new(
+           id: hash[:id] || hash["id"],
+           query: hash[:query] || hash["query"],
+           response: hash[:response] || hash["response"],
+           embedding: hash[:embedding] || hash["embedding"],
+           metadata: hash[:metadata] || hash["metadata"] || {},
+           created_at: parse_time(hash[:created_at] || hash["created_at"])
+         )
+       end
+
+       def self.parse_time(value)
+         case value
+         when Time then value
+         when String then Time.parse(value)
+         when nil then Time.now
+         else value
+         end
+       end
+     end
+   end
+ end
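`to_h` serializes `created_at` as an ISO 8601 string so entries survive JSON storage, and `parse_time` reverses it on load. The round-trip drops sub-second precision (the default `Time#iso8601` emits no fractional digits) but keeps the timestamp stable to the second:

```ruby
require "time"

# Round-trip a timestamp the way Entry#to_h / Entry.from_h do:
# serialize to an ISO 8601 string, then parse it back with Time.parse.
original = Time.now
restored = Time.parse(original.iso8601)

restored.to_i == original.to_i # => true: second precision survives the trip
```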