llm_optimizer 0.1.4 → 0.1.5

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: c6903d2d4c2163d93ffe8d0d5ad9708d64a8472a430ed9f266c9237e468c8585
4
- data.tar.gz: c7270f4717ece6778976f46f1601f9e5d45939e3e7926ea7e3ed05b3b641f413
3
+ metadata.gz: b5f6d0b99af3e0801e77df0316ac767e0e10e0d4e7bba9dc19623797681a2961
4
+ data.tar.gz: d8644df814cb0c7f219a51620d3cd409e1bb5822228278245b2572dbaf666fdc
5
5
  SHA512:
6
- metadata.gz: 858cad7443f7adcbe42b3d5ce62b4e815081d2238b7711066276ee2a7c0fb6a506d267ccb48dbe611a2ed08b2eab29139057dcddc2d033155561499a0d6f5421
7
- data.tar.gz: b3afc392e8fb2ef5b7baa468f74f9def34a15db9f6df898fd738503638d32f5dda9b04a6c8f2e005cd94aa893eca864111f3be0f2e8bfa1cc0aeef6391e0ae2c
6
+ metadata.gz: 1396d95f7e3f498e600cf6e3b99627ee2f746692a1f002be989ce1b13859f5a1af8f50656a82ff0fa853d3e42b0c219de49ac152baa2107d9c9529fc82bf63e4
7
+ data.tar.gz: 6b45bae664e4d43fd54fe47c2c8c9ebdeea2b4442f78bf22daee56d730095a471f58faf0e6432c7e5ef18a58cfc20a9279dc8706c9f41e4ca55db0cb441df1e8
data/CHANGELOG.md CHANGED
@@ -7,6 +7,28 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
7
7
 
8
8
  ## [Unreleased]
9
9
 
10
+ ## [0.1.5] - 2026-04-22
11
+
12
+ ### Added
13
+ - `ConversationStore` — Redis-backed conversation persistence under the `llm_optimizer:conversation:<id>` namespace; handles load, save, TTL, and debug logging
14
+ - `conversation_id` option on `LlmOptimizer.optimize` — pass a stable ID and the gem automatically loads history from Redis, calls the LLM with full context, and saves the updated history back; no manual message management required
15
+ - `messages_caller` config option — injectable lambda `(messages, model:) -> String` for LLM providers that accept a full message array (OpenAI chat, Anthropic messages, etc.); takes priority over `llm_caller` when conversation history is present
16
+ - `system_prompt` config option — seeded as the opening exchange when a new conversation is created via `conversation_id`
17
+ - `conversation_ttl` config option — TTL in seconds for Redis conversation keys (default `86400`; `0` for no expiry)
18
+ - `LlmOptimizer.clear_conversation(conversation_id)` — deletes a conversation key from Redis; returns `true` if deleted, `false` if not found
19
+ - `pipeline#load_conversation` and `pipeline#persist_conversation` — internal helpers wiring `ConversationStore` into the optimize pipeline
20
+ - `pipeline#apply_history_manager` — applies `HistoryManager` sliding-window summarization to loaded conversation history when `manage_history: true`
21
+
22
+ ### Changed
23
+ - `HistoryManager` now receives an internal `llm_caller` lambda that routes through `raw_llm_call`, so it correctly uses `messages_caller` when available instead of always requiring `llm_caller`
24
+ - `raw_llm_call` updated to prefer `messages_caller` over `llm_caller` when a non-empty messages array is present
25
+ - `ModelRouter` classifier response matching now uses word-boundary regex (`/\bsimple\b/`, `/\bcomplex\b/`) to handle decorated responses like `"simple."`, `"**complex**"`, or `"the answer is simple"` — previously only exact string match was used
26
+ - `ModelRouter` classifier failures (any `StandardError`) and unrecognized responses both fall through silently to the word-count heuristic; no exception is raised to the caller
27
+ - `validate_conversation_options!` raises `ConfigurationError` if both `conversation_id` and `messages:` are supplied, or if `conversation_id` is used without `redis_url`
28
+
29
+ ### Fixed
30
+ - `HistoryManager` summarization raised `ConfigurationError: No llm_caller configured` when called inside the pipeline without a bound config — internal lambda now correctly captures `call_config`
31
+
10
32
  ## [0.1.4] - 2026-04-13
11
33
 
12
34
  ### Fixed
@@ -79,7 +101,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
79
101
  - `OptimizeResult` struct with `response`, `model`, `model_tier`, `cache_status`, `original_tokens`, `compressed_tokens`, `latency_ms`, `messages`
80
102
  - Unit test suite covering all components with positive and negative scenarios using Minitest + Mocha
81
103
 
82
- [Unreleased]: https://github.com/arunkumarry/llm_optimizer/compare/v0.1.4...HEAD
104
+ [Unreleased]: https://github.com/arunkumarry/llm_optimizer/compare/v0.1.5...HEAD
105
+ [0.1.5]: https://github.com/arunkumarry/llm_optimizer/compare/v0.1.4...v0.1.5
83
106
  [0.1.4]: https://github.com/arunkumarry/llm_optimizer/compare/v0.1.3...v0.1.4
84
107
  [0.1.3]: https://github.com/arunkumarry/llm_optimizer/compare/v0.1.2...v0.1.3
85
108
  [0.1.2]: https://github.com/arunkumarry/llm_optimizer/compare/v0.1.1...v0.1.2
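The 0.1.5 entries above center on the Redis-backed conversation flow. A minimal sketch of inspecting what `ConversationStore` persists, assuming the key namespace and JSON message format described in this changelog (the conversation ID and Redis URL are illustrative):

```ruby
require "json"
require "redis"

# Each conversation lives at llm_optimizer:conversation:<id> as a JSON array
# of { role:, content: } hashes, expiring after conversation_ttl seconds.
redis   = Redis.new(url: ENV["REDIS_URL"])
raw     = redis.get("llm_optimizer:conversation:user-42")
history = raw ? JSON.parse(raw, symbolize_names: true) : []
history.each { |m| puts "#{m[:role]}: #{m[:content]}" }
```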
data/README.md CHANGED
@@ -21,8 +21,8 @@ Stores prompt embeddings in Redis. On subsequent calls, computes cosine similari
21
21
 
22
22
  Classifies each prompt and routes it to the appropriate model tier:
23
23
 
24
- - **Simple** → cheaper/faster model (e.g. `gpt-4o-mini`, `amazon.nova-micro`)
25
- - **Complex** → premium model (e.g. `claude-3-5-sonnet`, `gpt-4o`)
24
+ - **Simple** → cheaper/faster model (e.g. `llama3`, `gemini-2.5-flash-lite`)
25
+ - **Complex** → premium model (e.g. `claude-haiku-4-5-20251001`, `gemini-3.0-pro`)
26
26
 
27
27
  Routing uses a three-layer decision chain:
28
28
 
@@ -50,7 +50,7 @@ If `classifier_caller` is not set, the router falls back to the word-count heuri
50
50
  Removes common English stop words from prompts before sending to the LLM. Preserves fenced code block content unchanged. Typically reduces token count by 10–20%.
51
51
 
52
52
  ### 4. Conversation History Sliding Window
53
- When a conversation history exceeds the configured token budget, summarizes the oldest messages using the simple model and replaces them with a single system summary message.
53
+ When a conversation history exceeds the configured token budget, summarizes the oldest messages using the simple model and replaces them with a single system summary message. Conversation history is stored in Redis for fast retrieval and summarization.
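As a rough sketch, the sliding window can be exercised per call by lowering the token budget (option keys mirror the configuration keys; the variable and values below are illustrative):

```ruby
# Force summarization sooner by shrinking the per-call token budget
result = LlmOptimizer.optimize(
  "Continue the analysis",
  messages: long_history,   # prior turns as [{ role:, content: }, ...]
  manage_history: true,
  token_budget: 1000
)
result.messages # oldest turns collapsed into a single system summary message
```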
54
54
 
55
55
  ## Installation
56
56
 
@@ -99,7 +99,7 @@ result = LlmOptimizer.optimize("What is Redis?")
99
99
  puts result.response # => "Redis is an in-memory data store..."
100
100
  puts result.cache_status # => :hit or :miss
101
101
  puts result.model_tier # => :simple or :complex
102
- puts result.model # => "gpt-4o-mini"
102
+ puts result.model # => "gemini-2.5-flash-lite"
103
103
  puts result.original_tokens # => 5
104
104
  puts result.compressed_tokens # => 4
105
105
  puts result.latency_ms # => 12.4
@@ -110,39 +110,50 @@ puts result.latency_ms # => 12.4
110
110
  ### Rails initializer
111
111
 
112
112
  ```ruby
113
+ # config/initializers/llm_optimizer.rb
114
+ require "llm_optimizer"
115
+
113
116
  LlmOptimizer.configure do |config|
114
- # Feature flags all off by default
117
+ # --- Feature flags (all off by default) ---
115
118
  config.compress_prompt = true # strip stop words before sending to LLM
116
119
  config.use_semantic_cache = true # cache responses by vector similarity
117
120
  config.manage_history = true # summarize old messages when over token budget
118
121
 
119
- # Model routing
120
- config.route_to = :auto # :auto | :simple | :complex
121
- config.simple_model = "gpt-4o-mini" # model used for simple prompts
122
- config.complex_model = "claude-3-5-sonnet-20241022" # model used for complex prompts
122
+ # --- Model routing ---
123
+ config.route_to = :auto # :auto, :simple, or :complex
124
+ config.simple_model = "gemini-2.5-flash-lite" # used for simple prompts
125
+ config.complex_model = "claude-haiku-4-5-20251001" # used for complex prompts
123
126
 
124
- # Redis (required if use_semantic_cache: true)
127
+ # --- Redis (required if use_semantic_cache: true) ---
125
128
  config.redis_url = ENV["REDIS_URL"]
126
129
 
127
- # Tuning
128
- config.similarity_threshold = 0.96 # cosine similarity cutoff for cache hit (0.0–1.0)
129
- config.token_budget = 4000 # token limit before history summarization
130
- config.cache_ttl = 86400 # cache TTL in seconds (default: 24h)
130
+ # --- Token / cache settings ---
131
+ config.similarity_threshold = 0.96 # cosine similarity cutoff for cache hit
132
+ config.token_budget = 4000 # max tokens before history summarization
133
+ config.cache_ttl = 86400 # cache TTL in seconds (24h)
131
134
  config.timeout_seconds = 5 # timeout for external API calls
132
135
 
133
- # Logging
136
+ # --- Logging ---
134
137
  config.logger = Rails.logger
135
- config.debug_logging = Rails.env.development? # logs full prompt+response at DEBUG level
138
+ config.debug_logging = Rails.env.development? # logs full prompt+response in dev
136
139
 
137
- # LLM caller wire to your existing LLM client (required)
140
+ # --- Wire up your app's LLM client ---
141
+ # Replace the body with however your app calls the LLM
138
142
  config.llm_caller = ->(prompt, model:) {
139
- RubyLLM.chat(model: model, assume_model_exists: true).ask(prompt).content
143
+ model ||= "claude-haiku-4-5-20251001"
144
+ provider = if model.include?("claude") then :anthropic
145
+ elsif model.include?("gpt") then :openai
146
+ elsif model.include?("gemini") then :gemini
147
+ else :ollama
148
+ end
149
+ chat = RubyLLM.chat(model: model, provider: provider, assume_model_exists: true)
150
+ chat.ask(prompt).content
140
151
  }
141
152
 
142
153
  # Embeddings caller — wire to your embeddings provider (required if use_semantic_cache: true)
143
- # Falls back to OpenAI via ENV["OPENAI_API_KEY"] if not set
144
154
  config.embedding_caller = ->(text) {
145
- MyEmbeddingService.embed(text)
155
+ response = RubyLLM.embed(text, provider: :gemini, model: 'gemini-embedding-001')
156
+ response.vectors
146
157
  }
147
158
 
148
159
  # Classifier caller — optional, improves routing accuracy for ambiguous prompts
@@ -151,7 +162,18 @@ LlmOptimizer.configure do |config|
151
162
  RubyLLM.chat(model: "amazon.nova-micro-v1:0", provider: :bedrock, assume_model_exists: true)
152
163
  .ask(prompt).content.strip.downcase
153
164
  }
165
+
166
+ # Messages caller - optional, used by the history manager and conversation summarization.
167
+ config.system_prompt = "You are a sarcastic comedian who gives witty responses in a non-harmful way. If a serious question is asked, handle it calmly."
168
+
169
+ config.messages_caller = ->(messages, model:) {
170
+ chat = RubyLLM.chat(model: model)
171
+ messages[0..-2].each { |m| chat.add_message(role: m[:role], content: m[:content]) }
172
+ response = chat.ask(messages.last[:content])
173
+ response.content
174
+ }
154
175
  end
176
+
155
177
  ```
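With `system_prompt` and `messages_caller` wired as above, multi-turn calls can use the `conversation_id` option added in 0.1.5. A sketch (the ID value is illustrative):

```ruby
# The first turn creates the conversation in Redis, seeded with system_prompt
LlmOptimizer.optimize("Recommend a sci-fi book", conversation_id: "user-42")

# Later turns with the same ID load the stored history, call the LLM with
# full context via messages_caller, and save the updated history back
result = LlmOptimizer.optimize("Something shorter, please", conversation_id: "user-42")
puts result.response

# Drop the stored history when the session ends
LlmOptimizer.clear_conversation("user-42") # => true if a key was deleted
```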
156
178
 
157
179
  ### Configuration reference
@@ -162,19 +184,22 @@ end
162
184
  | `use_semantic_cache` | Boolean | `false` | Enable Redis-backed semantic cache |
163
185
  | `manage_history` | Boolean | `false` | Enable conversation history summarization |
164
186
  | `route_to` | Symbol | `:auto` | `:auto`, `:simple`, or `:complex` |
165
- | `simple_model` | String | `"gpt-4o-mini"` | Model for simple prompts |
166
- | `complex_model` | String | `"claude-3-5-sonnet-20241022"` | Model for complex prompts |
187
+ | `simple_model` | String | `"gemini-2.5-flash-lite"` | Model for simple prompts |
188
+ | `complex_model` | String | `"claude-haiku-4-5-20251001"` | Model for complex prompts |
167
189
  | `similarity_threshold` | Float | `0.96` | Minimum cosine similarity for cache hit |
168
190
  | `token_budget` | Integer | `4000` | Token limit before history summarization |
169
191
  | `cache_ttl` | Integer | `86400` | Cache entry TTL in seconds |
170
192
  | `timeout_seconds` | Integer | `5` | Timeout for external API calls |
171
193
  | `redis_url` | String | `nil` | Redis connection URL |
172
- | `embedding_model` | String | `"text-embedding-3-small"` | Embedding model name (OpenAI fallback) |
194
+ | `embedding_model` | String | `"gemini-embedding-001"` | Embedding model name passed to the embedding client |
173
195
  | `logger` | Logger | `Logger.new($stdout)` | Any Logger-compatible object |
174
196
  | `debug_logging` | Boolean | `false` | Log full prompt and response at DEBUG level |
175
197
  | `llm_caller` | Lambda | `nil` | `(prompt, model:) -> String` |
176
198
  | `embedding_caller` | Lambda | `nil` | `(text) -> Array<Float>` |
177
199
  | `classifier_caller` | Lambda | `nil` | `(prompt) -> "simple" or "complex"` |
200
+ | `messages_caller` | Lambda | `nil` | `(messages, model:) -> String` — used when `conversation_id` is present; receives full history including current user turn |
201
+ | `system_prompt` | String | `nil` | Seeded as the first system message when a new conversation is created via `conversation_id` |
202
+ | `conversation_ttl` | Integer | `86400` | TTL in seconds for Redis-backed conversation history (`0` for no expiry) |
178
203
 
179
204
  ## Per-call configuration
180
205
 
@@ -200,19 +225,6 @@ messages = [
200
225
 
201
226
  result = LlmOptimizer.optimize("What else can it do?", messages: messages)
202
227
 
203
- # result.messages contains the (possibly summarized) messages array
204
- ```
205
-
206
- ## Opt-in client wrapping
207
-
208
- Transparently wrap an existing LLM client class so all calls through it are automatically optimized:
209
-
210
- ```ruby
211
- LlmOptimizer.wrap_client(OpenAI::Client)
212
- ```
213
-
214
- This prepends the optimization pipeline into the client's `chat` method. Safe to call multiple times idempotent.
215
-
216
228
  ## OptimizeResult
217
229
 
218
230
  Every call returns an `OptimizeResult` struct:
@@ -226,20 +238,9 @@ Every call returns an `OptimizeResult` struct:
226
238
  | `original_tokens` | Integer | Estimated token count before compression |
227
239
  | `compressed_tokens` | Integer | Estimated token count after compression (`nil` if not compressed) |
228
240
  | `latency_ms` | Float | Total wall-clock time for the optimize call |
229
- | `messages` | Array | Final messages array (for history management) |
230
-
231
- ## Error handling
232
-
233
- The gem defines a hierarchy of errors, all inheriting from `LlmOptimizer::Error`:
234
-
235
- ```
236
- LlmOptimizer::Error
237
- ├── LlmOptimizer::ConfigurationError # unknown config key, missing llm_caller
238
- ├── LlmOptimizer::EmbeddingError # embedding API failure
239
- └── LlmOptimizer::TimeoutError # network timeout exceeded
240
- ```
241
+ | `messages` | Array | Final messages array sent to the LLM, after history management and conversation hydration (`nil` on a cache hit) |
241
242
 
242
- The gateway catches all component failures and falls through to a raw LLM call with the original prompt. Your app's core functionality is never blocked by the optimizer.
243
+ The `messages` field reflects the actual array passed to `messages_caller` (or built from `conversation_id`), including any summarization applied by the history manager. You can pass it back as `options[:messages]` on the next call to continue a stateless conversation.
243
244
 
244
245
  ## Resilience
245
246
 
@@ -249,7 +250,9 @@ The gateway catches all component failures and falls through to a raw LLM call w
249
250
  | Redis unavailable (write) | Log warning, return LLM result normally |
250
251
  | Embedding API failure | Treat as cache miss, continue |
251
252
  | Any component exception | Log error, fall through to raw LLM call |
252
- | History summarization failure | Log error, return original messages unchanged |
253
+ | History summarization failure | Log warning, return original messages unchanged |
254
+ | Conversation load failure | Log warning, proceed without history |
255
+ | Conversation save failure | Log warning, return result with pre-save messages |
253
256
 
254
257
  ## Development
255
258
 
@@ -15,8 +15,8 @@ LlmOptimizer.configure do |config|
15
15
  # --- Model routing ---
16
16
  # :auto classifies each prompt; :simple or :complex forces a tier
17
17
  config.route_to = :auto
18
- config.simple_model = "gpt-4o-mini"
19
- config.complex_model = "gpt-4o"
18
+ config.simple_model = "gemini-1.5-flash"
19
+ config.complex_model = "claude-haiku-4-5"
20
20
 
21
21
  # --- Redis (required only if use_semantic_cache: true) ---
22
22
  config.redis_url = ENV.fetch("REDIS_URL", nil)
@@ -76,4 +76,42 @@ LlmOptimizer.configure do |config|
76
76
  # }
77
77
  #
78
78
  # config.classifier_caller = nil
79
+
80
+ # --- Messages caller (optional) ---
81
+ # Used by the history manager and for conversation summarization.
82
+ # config.system_prompt = "You are a helpful person who gives responses in a non-harmful way. " \
83
+ #                        "If a serious question is asked, handle it effectively."
84
+ # OpenAI implementation -
85
+ # config.messages_caller = ->(messages, model:) {
86
+ # response = $openai.chat(
87
+ # parameters: {
88
+ # model: model,
89
+ # messages: messages.map { |m| { role: m[:role], content: m[:content] } }
90
+ # }
91
+ # )
92
+ # response.dig("choices", 0, "message", "content")
93
+ # }
94
+
95
+ # RubyLLM implementation -
96
+ # config.messages_caller = ->(messages, model:) {
97
+ # chat = RubyLLM.chat(model: model)
98
+ # messages[0..-2].each { |m| chat.add_message(role: m[:role], content: m[:content]) }
99
+ # chat.ask(messages.last[:content]).content
100
+ # }
101
+
102
+ # Anthropic implementation -
103
+ # config.messages_caller = ->(messages, model:) {
104
+ # # Anthropic separates system messages from the messages array
105
+ # system_msg = messages.find { |m| m[:role] == "system" }&.dig(:content)
106
+ # chat_msgs = messages.reject { |m| m[:role] == "system" }
107
+ # .map { |m| { role: m[:role], content: m[:content] } }
108
+
109
+ # response = $anthropic.messages(
110
+ # model: model,
111
+ # max_tokens: 1024,
112
+ # system: system_msg,
113
+ # messages: chat_msgs
114
+ # )
115
+ # response["content"].first["text"]
116
+ # }
79
117
  end
@@ -22,6 +22,9 @@ module LlmOptimizer
22
22
  llm_caller
23
23
  embedding_caller
24
24
  classifier_caller
25
+ conversation_ttl
26
+ system_prompt
27
+ messages_caller
25
28
  ].freeze
26
29
 
27
30
  # Define readers for all known keys (setters below track explicit sets)
@@ -47,6 +50,8 @@ module LlmOptimizer
47
50
  @llm_caller = nil
48
51
  @embedding_caller = nil
49
52
  @classifier_caller = nil
53
+ @conversation_ttl = 86_400
54
+ @system_prompt = nil
50
55
  end
51
56
 
52
57
  # Copies only explicitly set keys from other_config without resetting unmentioned keys.
@@ -0,0 +1,83 @@
1
+ # frozen_string_literal: true
2
+
3
+ module LlmOptimizer
4
+ class ConversationStore
5
+ KEY_NAMESPACE = "llm_optimizer:conversation:"
6
+
7
+ def initialize(redis_client, ttl:, logger:, debug_logging: false, system_prompt: nil)
8
+ @redis = redis_client
9
+ @ttl = ttl
10
+ @logger = logger
11
+ @debug_logging = debug_logging
12
+ @system_prompt = system_prompt
13
+ end
14
+
15
+ # Loads and returns the messages array for conversation_id.
16
+ # Returns seed messages (or []) when no key exists; returns [] on Redis error (logs a warning).
17
+ def load(conversation_id)
18
+ key = redis_key(conversation_id)
19
+ raw = @redis.get(key)
20
+
21
+ if raw.nil?
22
+ messages = seed_messages
23
+ @logger.info("[llm_optimizer] ConversationStore load: conversation_id=#{conversation_id}, count=#{messages.size}")
24
+ log_debug_history(conversation_id, messages)
25
+ return messages
26
+ end
27
+
28
+ messages = JSON.parse(raw, symbolize_names: true)
29
+ @logger.info("[llm_optimizer] ConversationStore load: conversation_id=#{conversation_id}, count=#{messages.size}")
30
+ log_debug_history(conversation_id, messages)
31
+ messages
32
+ rescue Redis::BaseError => e
33
+ @logger.warn("[llm_optimizer] ConversationStore load failed: conversation_id=#{conversation_id}, error=#{e.message}")
34
+ []
35
+ end
36
+
37
+ # Appends user + assistant messages to history and persists to Redis.
38
+ # Logs a warning and returns nil on Redis error; never raises.
39
+ def save(conversation_id, messages, prompt, response)
40
+ updated_messages = messages + [
41
+ { role: "user", content: prompt },
42
+ { role: "assistant", content: response }
43
+ ]
44
+
45
+ key = redis_key(conversation_id)
46
+ json = JSON.generate(updated_messages)
47
+
48
+ if @ttl.zero?
49
+ @redis.set(key, json)
50
+ else
51
+ @redis.set(key, json, ex: @ttl)
52
+ end
53
+
54
+ @logger.info("[llm_optimizer] ConversationStore save: conversation_id=#{conversation_id}, count=#{updated_messages.size}")
55
+ log_debug_history(conversation_id, updated_messages)
56
+ updated_messages
57
+ rescue Redis::BaseError => e
58
+ @logger.warn("[llm_optimizer] ConversationStore save failed: conversation_id=#{conversation_id}, error=#{e.message}")
59
+ nil
60
+ end
61
+
62
+ private
63
+
64
+ def redis_key(conversation_id)
65
+ "#{KEY_NAMESPACE}#{conversation_id}"
66
+ end
67
+
68
+ def seed_messages
69
+ return [] unless @system_prompt
70
+
71
+ [
72
+ { role: "user", content: @system_prompt },
73
+ { role: "assistant", content: "Got it!" }
74
+ ]
75
+ end
76
+
77
+ def log_debug_history(conversation_id, messages)
78
+ return unless @debug_logging
79
+
80
+ @logger.debug("[llm_optimizer] ConversationStore history: conversation_id=#{conversation_id}, messages=#{messages.inspect}")
81
+ end
82
+ end
83
+ end
@@ -10,10 +10,12 @@ module LlmOptimizer
10
10
  Classify the following prompt as either 'simple' or 'complex'.
11
11
 
12
12
  Rules:
13
- - simple: factual questions, basic lookups, short explanations, greetings
13
+ - simple: factual questions, basic lookups, short explanations, greetings, chitchat, general statements, simple arithmetic (addition, subtraction, multiplication, division)
14
+ Example - Hello, Bye, You are funny, how are you?, what is the capital of France, tell me about yourself, what is 2 + 3 - 1 * 10 / 2 etc.
14
15
  - complex: code generation, debugging, architecture, multi-step reasoning, analysis
16
+ Example - how does pandas extract my information, debug this code, why do RAG apps consume more tokens, give me Python code to print a star pattern, etc.
15
17
 
16
- Reply with exactly one word: simple or complex
18
+ Reply with exactly one word, no punctuation: simple or complex
17
19
 
18
20
  Prompt: %<prompt>s
19
21
  PROMPT
@@ -48,9 +50,12 @@ module LlmOptimizer
48
50
  def classify_with_llm(prompt)
49
51
  classifier_prompt = format(CLASSIFIER_PROMPT, prompt: prompt)
50
52
  response = @config.classifier_caller.call(classifier_prompt)
51
- normalized = response.to_s.strip.downcase.gsub(/[^a-z]/, "")
52
- return :simple if normalized == "simple"
53
- return :complex if normalized == "complex"
53
+ normalized = response.to_s.strip.downcase
54
+
55
+ # Check for word boundary match to handle responses like
56
+ # "simple." / "**simple**" / "the answer is simple"
57
+ return :simple if normalized.match?(/\bsimple\b/)
58
+ return :complex if normalized.match?(/\bcomplex\b/)
54
59
 
55
60
  nil # unrecognized response — fall through to heuristic
56
61
  rescue StandardError
@@ -0,0 +1,173 @@
1
+ # frozen_string_literal: true
2
+
3
+ module LlmOptimizer
4
+ # Internal pipeline helpers — not part of the public API.
5
+ # Extended into LlmOptimizer as private class methods.
6
+ module Pipeline
7
+ private
8
+
9
+ def build_call_config(options, &block)
10
+ cfg = Configuration.new
11
+ cfg.merge!(configuration)
12
+ options.each do |k, v|
13
+ next unless Configuration::KNOWN_KEYS.include?(k.to_sym)
14
+
15
+ cfg.public_send(:"#{k}=", v)
16
+ end
17
+ block&.call(cfg)
18
+ cfg
19
+ end
20
+
21
+ def validate_conversation_options!(conversation_id, options, call_config)
22
+ if conversation_id && options[:messages]
23
+ raise ConfigurationError,
24
+ "conversation_id and messages: are mutually exclusive — pass one or the other"
25
+ end
26
+
27
+ return unless conversation_id && call_config.redis_url.nil?
28
+
29
+ raise ConfigurationError,
30
+ "redis_url must be configured to use conversation_id"
31
+ end
32
+
33
+ def compress(prompt, config)
34
+ return [prompt, nil] unless config.compress_prompt
35
+
36
+ compressed = Compressor.new.compress(prompt)
37
+ [compressed, Compressor.new.estimate_tokens(compressed)]
38
+ end
39
+
40
+ def route(prompt, config)
41
+ router = ModelRouter.new(config)
42
+ model_tier = router.route(prompt)
43
+ model = model_tier == :simple ? config.simple_model : config.complex_model
44
+ [model_tier, model]
45
+ end
46
+
47
+ def semantic_cache_lookup(prompt, model, model_tier, original_tokens,
48
+ compressed_tokens, original_prompt, start, config)
49
+ return [nil, nil] unless config.use_semantic_cache
50
+
51
+ emb_client = EmbeddingClient.new(
52
+ model: config.embedding_model,
53
+ timeout_seconds: config.timeout_seconds,
54
+ embedding_caller: config.embedding_caller
55
+ )
56
+ embedding = emb_client.embed(prompt)
57
+ embedding, result = check_cache_hit(embedding, prompt, model, model_tier,
58
+ original_tokens, compressed_tokens,
59
+ original_prompt, start, config)
60
+ [embedding, result]
61
+ rescue EmbeddingError => e
62
+ config.logger.warn("[llm_optimizer] EmbeddingError (treating as cache miss): #{e.message}")
63
+ [nil, nil]
64
+ end
65
+
66
+ def load_conversation(conversation_id, options, config)
67
+ return [options[:messages], nil] unless conversation_id
68
+
69
+ redis = build_redis(config.redis_url)
70
+ store = ConversationStore.new(redis,
71
+ ttl: config.conversation_ttl,
72
+ logger: config.logger,
73
+ debug_logging: config.debug_logging,
74
+ system_prompt: config.system_prompt)
75
+ [store.load(conversation_id), store]
76
+ end
77
+
78
+ def apply_history_manager(messages, config)
79
+ return messages unless config.manage_history && messages
80
+
81
+ llm_caller = ->(p, model:) { raw_llm_call(p, model: model, config: config) }
82
+ history_mgr = HistoryManager.new(
83
+ llm_caller: llm_caller,
84
+ simple_model: config.simple_model,
85
+ token_budget: config.token_budget
86
+ )
87
+ history_mgr.process(messages)
88
+ end
89
+
90
+ def persist_conversation(store, conversation_id, messages, prompt, response)
91
+ return messages unless store && conversation_id
92
+
93
+ store.save(conversation_id, messages, prompt, response) || messages
94
+ end
95
+
96
+ def store_in_cache(embedding, response, config)
97
+ return unless config.use_semantic_cache && embedding && config.redis_url
98
+
99
+ redis = build_redis(config.redis_url)
100
+ cache = SemanticCache.new(redis, threshold: config.similarity_threshold, ttl: config.cache_ttl)
101
+ cache.store(embedding, response)
102
+ rescue StandardError => e
103
+ config.logger.warn("[llm_optimizer] SemanticCache store failed: #{e.message}")
104
+ end
105
+
106
+ def build_result(response, model, model_tier, cache_status,
107
+ original_tokens, compressed_tokens, latency_ms, messages)
108
+ OptimizeResult.new(
109
+ response: response, model: model, model_tier: model_tier,
110
+ cache_status: cache_status, original_tokens: original_tokens,
111
+ compressed_tokens: compressed_tokens, latency_ms: latency_ms,
112
+ messages: messages
113
+ )
114
+ end
115
+
116
+ def fallback_result(original_prompt, original_tokens, options, start)
117
+ latency_ms = elapsed_ms(start)
118
+ response = raw_llm_call(original_prompt, model: nil, config: configuration)
119
+ build_result(response, nil, nil, :miss, original_tokens || 0, nil,
120
+ latency_ms, options[:messages])
121
+ end
122
+
123
+ def raw_llm_call(prompt, model:, messages: nil, config: nil)
124
+ if messages && !messages.empty? && config&.messages_caller
125
+ config.messages_caller.call(messages + [{ role: "user", content: prompt }], model: model)
126
+ else
127
+ llm = config&.llm_caller || @_current_llm_caller
128
+ raise ConfigurationError, "No llm_caller configured." unless llm
129
+
130
+ llm.call(prompt, model: model)
131
+ end
132
+ end
133
+
134
+ def elapsed_ms(start)
135
+ ((Process.clock_gettime(Process::CLOCK_MONOTONIC) - start) * 1000).round(2)
136
+ end
137
+
138
+ def emit_log(logger, config, cache_status:, model_tier:, original_tokens:,
139
+ compressed_tokens:, latency_ms:, prompt:, response:)
140
+ logger.info(
141
+ "[llm_optimizer] { cache_status: #{cache_status.inspect}, " \
142
+ "model_tier: #{model_tier.inspect}, " \
143
+ "original_tokens: #{original_tokens.inspect}, " \
144
+ "compressed_tokens: #{compressed_tokens.inspect}, " \
145
+ "latency_ms: #{latency_ms.inspect} }"
146
+ )
147
+ logger.debug("[llm_optimizer] prompt=#{prompt.inspect} response=#{response.inspect}") if config.debug_logging
148
+ end
149
+
150
+ def build_redis(redis_url)
151
+ require "redis"
152
+ Redis.new(url: redis_url)
153
+ end
154
+
155
+ def check_cache_hit(embedding, _prompt, model, model_tier, original_tokens,
156
+ compressed_tokens, original_prompt, start, config)
157
+ return [embedding, nil] unless config.redis_url
158
+
159
+ redis = build_redis(config.redis_url)
160
+ cache = SemanticCache.new(redis, threshold: config.similarity_threshold, ttl: config.cache_ttl)
161
+ cached = cache.lookup(embedding)
162
+ return [embedding, nil] unless cached
163
+
164
+ latency_ms = elapsed_ms(start)
165
+ emit_log(config.logger, config,
166
+ cache_status: :hit, model_tier: model_tier,
167
+ original_tokens: original_tokens, compressed_tokens: compressed_tokens,
168
+ latency_ms: latency_ms, prompt: original_prompt, response: cached)
169
+ [embedding, build_result(cached, model, model_tier, :hit,
170
+ original_tokens, compressed_tokens, latency_ms, nil)]
171
+ end
172
+ end
173
+ end
@@ -1,5 +1,5 @@
1
1
  # frozen_string_literal: true
2
2
 
3
3
  module LlmOptimizer
4
- VERSION = "0.1.4"
4
+ VERSION = "0.1.5"
5
5
  end
data/lib/llm_optimizer.rb CHANGED
@@ -8,26 +8,21 @@ require_relative "llm_optimizer/model_router"
8
8
  require_relative "llm_optimizer/embedding_client"
9
9
  require_relative "llm_optimizer/semantic_cache"
10
10
  require_relative "llm_optimizer/history_manager"
11
+ require_relative "llm_optimizer/conversation_store"
12
+ require_relative "llm_optimizer/pipeline"
11
13
 
12
14
  require "llm_optimizer/railtie" if defined?(Rails)
13
15
 
14
16
  module LlmOptimizer
15
- # Base error class for all gem-specific exceptions
16
17
  class Error < StandardError; end
17
-
18
- # Raised when an unrecognized configuration key is set
19
18
  class ConfigurationError < Error; end
20
-
21
- # Raised when the embedding API call fails
22
19
  class EmbeddingError < Error; end
23
-
24
- # Raised when a network timeout is exceeded
25
20
  class TimeoutError < Error; end
26
21
 
27
- # Global configuration
28
22
  @configuration = nil
29
23
 
30
- # Yields a Configuration instance; merges it into the global config.
24
+ extend Pipeline
25
+
31
26
  def self.configure
32
27
  temp = Configuration.new
33
28
  yield temp
@@ -35,7 +30,6 @@ module LlmOptimizer
35
30
  validate_configuration!(configuration)
36
31
  end
37
32
 
38
- # Warns about misconfigured options rather than failing silently at call time.
39
33
  def self.validate_configuration!(config)
40
34
  return unless config.use_semantic_cache && config.embedding_caller.nil?
41
35
 
@@ -46,36 +40,32 @@ module LlmOptimizer
46
40
  config.use_semantic_cache = false
47
41
  end
48
42
 
49
- # Returns the current global Configuration, lazy-initializing if nil.
50
43
  def self.configuration
51
44
  @configuration ||= Configuration.new
52
45
  end
53
46
 
54
- # Replaces the global config with a fresh default Configuration.
55
- # Useful in tests to avoid state leakage.
56
47
  def self.reset_configuration!
57
48
  @configuration = Configuration.new
58
49
  end
59
50
 
60
- # Opt-in client wrapping
61
- # WrapperModule intercepts `chat` on the wrapped client, runs the pre-call
62
- # optimization pipeline (compress, route, cache lookup), and delegates the
63
- # actual LLM call to the original client via `super` — so llm_caller is NOT
64
- # required when using wrap_client.
51
+ def self.clear_conversation(conversation_id)
52
+ raise ConfigurationError, "redis_url must be configured to use clear_conversation" unless configuration.redis_url
53
+
54
+ redis = build_redis(configuration.redis_url)
55
+ key = "#{ConversationStore::KEY_NAMESPACE}#{conversation_id}"
56
+ deleted = redis.del(key)
57
+ deleted.positive?
58
+ rescue ::Redis::BaseError => e
59
+ raise LlmOptimizer::Error, "Redis error in clear_conversation: #{e.message}"
60
+ end
61
+
65
62
  module WrapperModule
66
- def chat(params, &block)
63
+ def chat(params, &)
67
64
  config = LlmOptimizer.configuration
68
65
  prompt = params[:messages] || params[:prompt]
69
-
70
- # Run pre-call pipeline: compress, route, cache lookup
71
66
  result = LlmOptimizer.optimize_pre_call(prompt, config)
67
+ return result[:response] if result[:cache_status] == :hit
72
68
 
73
- # Cache hit — return immediately without calling the LLM
74
- if result[:cache_status] == :hit
75
- return result[:response]
76
- end
77
-
78
- # Apply compressed prompt and routed model, then delegate to original client
79
69
  optimized_params = params.merge(model: result[:model])
80
70
  if params[:messages]
81
71
  optimized_params = optimized_params.merge(messages: result[:prompt])
@@ -83,264 +73,79 @@ module LlmOptimizer
83
73
  optimized_params = optimized_params.merge(prompt: result[:prompt])
84
74
  end
85
75
 
86
- response = super(optimized_params, &block)
87
-
88
- # Store in cache after successful LLM call
76
+ response = super(optimized_params, &)
89
77
  LlmOptimizer.optimize_post_call(result, response, config)
90
-
91
78
  response
92
79
  end
93
80
  end
94
81
 
95
- # Prepends WrapperModule into client_class; idempotent — safe to call N times.
96
82
  def self.wrap_client(client_class)
97
83
  return if client_class.ancestors.include?(WrapperModule)
98
84
 
99
85
  client_class.prepend(WrapperModule)
100
86
  end
101
87
 
102
- # Primary entry point
103
- # Runs the optimization pipeline and returns an OptimizeResult.
88
+ def self.optimize(prompt, options = {}, &)
89
+ start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
90
+ call_config = build_call_config(options, &)
91
+ conversation_id = options[:conversation_id]
92
+ validate_conversation_options!(conversation_id, options, call_config)
104
93
 
105
- # options hash keys mirror Configuration attr_accessors and are merged over
106
- # the global config for this call only. An optional block is yielded a
107
- # per-call Configuration for fine-grained control.
108
- def self.optimize(prompt, options = {})
109
- start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
110
-
111
- # Resolve per-call configuration — only pass known config keys
112
- call_config = Configuration.new
113
- call_config.merge!(configuration)
114
- options.each do |k, v|
115
- next unless LlmOptimizer::Configuration::KNOWN_KEYS.include?(k.to_sym)
116
-
117
- call_config.public_send(:"#{k}=", v)
118
- end
119
- yield call_config if block_given?
94
+ original_prompt = prompt
95
+ original_tokens = Compressor.new.estimate_tokens(prompt)
96
+ prompt, compressed_tokens = compress(prompt, call_config)
97
+ model_tier, model = route(prompt, call_config)
120
98
 
121
- logger = call_config.logger
99
+ embedding, cached_result = semantic_cache_lookup(prompt, model, model_tier,
100
+ original_tokens, compressed_tokens,
101
+ original_prompt, start, call_config)
102
+ return cached_result if cached_result
122
103
 
123
- # Keep a reference to the original prompt for fallback use
124
- original_prompt = prompt
104
+ messages, store = load_conversation(conversation_id, options, call_config)
105
+ messages = apply_history_manager(messages, call_config)
106
+ response = raw_llm_call(prompt, messages: messages, model: model, config: call_config)
107
+ messages = persist_conversation(store, conversation_id, messages, prompt, response)
108
+ store_in_cache(embedding, response, call_config)
125
109
 
126
- # Compression
127
- compressor = Compressor.new
128
- original_tokens = compressor.estimate_tokens(prompt)
129
- compressed_tokens = nil
130
-
131
- if call_config.compress_prompt
132
- prompt = compressor.compress(prompt)
133
- compressed_tokens = compressor.estimate_tokens(prompt)
134
- end
135
-
136
- # Model routing
137
- router = ModelRouter.new(call_config)
138
- model_tier = router.route(prompt)
139
- model = model_tier == :simple ? call_config.simple_model : call_config.complex_model
140
-
141
- # Semantic cache lookup
142
- embedding = nil
143
-
144
- if call_config.use_semantic_cache
145
- begin
146
- emb_client = EmbeddingClient.new(
147
- model: call_config.embedding_model,
148
- timeout_seconds: call_config.timeout_seconds,
149
- embedding_caller: call_config.embedding_caller
150
- )
151
- embedding = emb_client.embed(prompt)
152
-
153
- if call_config.redis_url
154
- redis = build_redis(call_config.redis_url)
155
- cache = SemanticCache.new(redis, threshold: call_config.similarity_threshold, ttl: call_config.cache_ttl)
156
- cached = cache.lookup(embedding)
157
-
158
- if cached
159
- latency_ms = elapsed_ms(start)
160
- emit_log(logger, call_config,
161
- cache_status: :hit, model_tier: model_tier,
162
- original_tokens: original_tokens, compressed_tokens: compressed_tokens,
163
- latency_ms: latency_ms, prompt: original_prompt, response: cached)
164
- return OptimizeResult.new(
165
- response: cached,
166
- model: model,
167
- model_tier: model_tier,
168
- cache_status: :hit,
169
- original_tokens: original_tokens,
170
- compressed_tokens: compressed_tokens,
171
- latency_ms: latency_ms,
172
- messages: options[:messages]
173
- )
174
- end
175
- end
176
- rescue EmbeddingError => e
177
- logger.warn("[llm_optimizer] EmbeddingError (treating as cache miss): #{e.message}")
178
- embedding = nil
179
- # continue pipeline as cache miss
180
- end
181
- end
182
-
183
- # History management
184
- messages = options[:messages]
185
- if call_config.manage_history && messages
186
- llm_caller = ->(p, model:) { raw_llm_call(p, model: model, config: call_config) }
187
- history_mgr = HistoryManager.new(
188
- llm_caller: llm_caller,
189
- simple_model: call_config.simple_model,
190
- token_budget: call_config.token_budget
191
- )
192
- messages = history_mgr.process(messages)
193
- end
194
-
195
- # Raw LLM call
196
- response = raw_llm_call(prompt, model: model, config: call_config)
197
-
198
- # Cache store
199
- if call_config.use_semantic_cache && embedding && call_config.redis_url
200
- begin
201
- redis = build_redis(call_config.redis_url)
202
- cache = SemanticCache.new(redis, threshold: call_config.similarity_threshold, ttl: call_config.cache_ttl)
203
- cache.store(embedding, response)
204
- rescue StandardError => e
205
- logger.warn("[llm_optimizer] SemanticCache store failed: #{e.message}")
206
- end
207
- end
208
-
209
- # Build result
210
110
  latency_ms = elapsed_ms(start)
211
- emit_log(logger, call_config,
111
+ emit_log(call_config.logger, call_config,
212
112
  cache_status: :miss, model_tier: model_tier,
213
113
  original_tokens: original_tokens, compressed_tokens: compressed_tokens,
214
114
  latency_ms: latency_ms, prompt: original_prompt, response: response)
215
-
216
- OptimizeResult.new(
217
- response: response,
218
- model: model,
219
- model_tier: model_tier,
220
- cache_status: :miss,
221
- original_tokens: original_tokens,
222
- compressed_tokens: compressed_tokens,
223
- latency_ms: latency_ms,
224
- messages: messages
225
- )
115
+ build_result(response, model, model_tier, :miss, original_tokens, compressed_tokens,
116
+ latency_ms, messages)
226
117
  rescue EmbeddingError => e
227
- # Treat embedding failures as cache miss — continue to raw LLM call
228
- logger = configuration.logger
229
- logger.warn("[llm_optimizer] EmbeddingError (outer rescue, treating as cache miss): #{e.message}")
230
- latency_ms = elapsed_ms(start)
231
- response = raw_llm_call(original_prompt, model: nil, config: configuration)
232
- OptimizeResult.new(
233
- response: response,
234
- model: nil,
235
- model_tier: nil,
236
- cache_status: :miss,
237
- original_tokens: original_tokens || 0,
238
- compressed_tokens: nil,
239
- latency_ms: latency_ms,
240
- messages: options[:messages]
241
- )
118
+ configuration.logger.warn("[llm_optimizer] EmbeddingError (outer rescue): #{e.message}")
119
+ fallback_result(original_prompt, original_tokens, options, start)
120
+ rescue ConfigurationError
121
+ raise
242
122
  rescue LlmOptimizer::Error, StandardError => e
243
- logger = configuration.logger
244
- logger.error("[llm_optimizer] #{e.class}: #{e.message}\n#{e.backtrace&.first(5)&.join("\n")}")
245
- latency_ms = elapsed_ms(start)
246
- response = raw_llm_call(original_prompt, model: nil, config: configuration)
247
- OptimizeResult.new(
248
- response: response,
249
- model: nil,
250
- model_tier: nil,
251
- cache_status: :miss,
252
- original_tokens: original_tokens || 0,
253
- compressed_tokens: nil,
254
- latency_ms: latency_ms,
255
- messages: options[:messages]
256
- )
123
+ configuration.logger.error("[llm_optimizer] #{e.class}: #{e.message}\n#{e.backtrace&.first(5)&.join("\n")}")
124
+ fallback_result(original_prompt, original_tokens, options, start)
257
125
  end
258
126
 
259
- # Pre-call pipeline for wrap_client: compress, route, cache lookup.
260
- # Returns a hash with :prompt, :model, :model_tier, :embedding, :cache_status, :response.
261
- # Does NOT make an LLM call — the wrapped client handles that via super.
262
127
  def self.optimize_pre_call(prompt, config = configuration)
263
- compressor = Compressor.new
264
- prompt = compressor.compress(prompt) if config.compress_prompt
265
-
266
- router = ModelRouter.new(config)
267
- model_tier = router.route(prompt)
128
+ prompt = Compressor.new.compress(prompt) if config.compress_prompt
129
+ model_tier = ModelRouter.new(config).route(prompt)
268
130
  model = model_tier == :simple ? config.simple_model : config.complex_model
269
131
 
270
- embedding = nil
271
- if config.use_semantic_cache && config.redis_url
272
- begin
273
- emb_client = EmbeddingClient.new(
274
- model: config.embedding_model,
275
- timeout_seconds: config.timeout_seconds,
276
- embedding_caller: config.embedding_caller
277
- )
278
- embedding = emb_client.embed(prompt)
279
- redis = build_redis(config.redis_url)
280
- cache = SemanticCache.new(redis, threshold: config.similarity_threshold, ttl: config.cache_ttl)
281
- cached = cache.lookup(embedding)
282
- return { prompt: prompt, model: model, model_tier: model_tier,
283
- embedding: embedding, cache_status: :hit, response: cached } if cached
284
- rescue EmbeddingError => e
285
- config.logger.warn("[llm_optimizer] wrap_client EmbeddingError (cache miss): #{e.message}")
286
- embedding = nil
287
- end
132
+ unless config.use_semantic_cache && config.redis_url
133
+ return { prompt: prompt, model: model, model_tier: model_tier,
134
+ embedding: nil, cache_status: :miss, response: nil }
135
+ end
136
+
137
+ embedding, result = semantic_cache_lookup(prompt, model, model_tier, nil, nil,
138
+ prompt, Process.clock_gettime(Process::CLOCK_MONOTONIC), config)
139
+ if result
140
+ return { prompt: prompt, model: model, model_tier: model_tier,
141
+ embedding: embedding, cache_status: :hit, response: result.response }
288
142
  end
289
143
 
290
144
  { prompt: prompt, model: model, model_tier: model_tier,
291
145
  embedding: embedding, cache_status: :miss, response: nil }
292
146
  end
293
147
 
294
- # Post-call: store the LLM response in the semantic cache if applicable.
295
148
  def self.optimize_post_call(pre_call_result, response, config = configuration)
296
- return unless config.use_semantic_cache && config.redis_url
297
- return unless pre_call_result[:embedding]
298
-
299
- redis = build_redis(config.redis_url)
300
- cache = SemanticCache.new(redis, threshold: config.similarity_threshold, ttl: config.cache_ttl)
301
- cache.store(pre_call_result[:embedding], response)
302
- rescue StandardError => e
303
- config.logger.warn("[llm_optimizer] wrap_client cache store failed: #{e.message}")
304
- end
305
-
306
- # Private helpers
307
-
308
- class << self
309
- private
310
-
311
- def raw_llm_call(prompt, model:, config: nil)
312
- caller = config&.llm_caller || @_current_llm_caller
313
- unless caller
314
- raise ConfigurationError,
315
- "No llm_caller configured. " \
316
- "Set it via LlmOptimizer.configure { |c| c.llm_caller = ->(prompt, model:) { ... } }"
317
- end
318
-
319
- caller.call(prompt, model: model)
320
- end
321
-
322
- def elapsed_ms(start)
323
- ((Process.clock_gettime(Process::CLOCK_MONOTONIC) - start) * 1000).round(2)
324
- end
325
-
326
- def emit_log(logger, config, cache_status:, model_tier:, original_tokens:,
327
- compressed_tokens:, latency_ms:, prompt:, response:)
328
- logger.info(
329
- "[llm_optimizer] { cache_status: #{cache_status.inspect}, " \
330
- "model_tier: #{model_tier.inspect}, " \
331
- "original_tokens: #{original_tokens.inspect}, " \
332
- "compressed_tokens: #{compressed_tokens.inspect}, " \
333
- "latency_ms: #{latency_ms.inspect} }"
334
- )
335
-
336
- return unless config.debug_logging
337
-
338
- logger.debug("[llm_optimizer] prompt=#{prompt.inspect} response=#{response.inspect}")
339
- end
340
-
341
- def build_redis(redis_url)
342
- require "redis"
343
- Redis.new(url: redis_url)
344
- end
149
+ store_in_cache(pre_call_result[:embedding], response, config)
345
150
  end
346
151
  end
metadata CHANGED
@@ -1,7 +1,7 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: llm_optimizer
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.1.4
4
+ version: 0.1.5
5
5
  platform: ruby
6
6
  authors:
7
7
  - arun kumar
@@ -100,10 +100,12 @@ files:
100
100
  - lib/llm_optimizer.rb
101
101
  - lib/llm_optimizer/compressor.rb
102
102
  - lib/llm_optimizer/configuration.rb
103
+ - lib/llm_optimizer/conversation_store.rb
103
104
  - lib/llm_optimizer/embedding_client.rb
104
105
  - lib/llm_optimizer/history_manager.rb
105
106
  - lib/llm_optimizer/model_router.rb
106
107
  - lib/llm_optimizer/optimize_result.rb
108
+ - lib/llm_optimizer/pipeline.rb
107
109
  - lib/llm_optimizer/railtie.rb
108
110
  - lib/llm_optimizer/semantic_cache.rb
109
111
  - lib/llm_optimizer/version.rb