llm_optimizer 0.1.3 → 0.1.5

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: ef2fcae7f3d39043f476a555b980685670c65e266f8fc3f9ca4309081d51c066
4
- data.tar.gz: c5fb255ad280afba780ea3c417b377ae406dc828178609bf7e21c0bb4f1ba048
3
+ metadata.gz: b5f6d0b99af3e0801e77df0316ac767e0e10e0d4e7bba9dc19623797681a2961
4
+ data.tar.gz: d8644df814cb0c7f219a51620d3cd409e1bb5822228278245b2572dbaf666fdc
5
5
  SHA512:
6
- metadata.gz: f84eba0ae06cd7541616c44c8630618eb09f3f8b1d1fe5b588eae285be6dd6a2fcc88f0868a00cbfb91e00b491f56232c0c592b3bbbea579748232a89e8aff1e
7
- data.tar.gz: 80fd56954cfa497f2d7c16be68b4c41c6cd01128f3df1e2b1054c3d1005cb869b70317e60ae847dbae2d2f270119812d00d175a4dfa564c447791c2195bc7672
6
+ metadata.gz: 1396d95f7e3f498e600cf6e3b99627ee2f746692a1f002be989ce1b13859f5a1af8f50656a82ff0fa853d3e42b0c219de49ac152baa2107d9c9529fc82bf63e4
7
+ data.tar.gz: 6b45bae664e4d43fd54fe47c2c8c9ebdeea2b4442f78bf22daee56d730095a471f58faf0e6432c7e5ef18a58cfc20a9279dc8706c9f41e4ca55db0cb441df1e8
data/CHANGELOG.md CHANGED
@@ -7,6 +7,37 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
7
7
 
8
8
  ## [Unreleased]
9
9
 
10
+ ## [0.1.5] - 2026-04-22
11
+
12
+ ### Added
13
+ - `ConversationStore` — Redis-backed conversation persistence under the `llm_optimizer:conversation:<id>` namespace; handles load, save, TTL, and debug logging
14
+ - `conversation_id` option on `LlmOptimizer.optimize` — pass a stable ID and the gem automatically loads history from Redis, calls the LLM with full context, and saves the updated history back; no manual message management required
15
+ - `messages_caller` config option — injectable lambda `(messages, model:) -> String` for LLM providers that accept a full message array (OpenAI chat, Anthropic messages, etc.); takes priority over `llm_caller` when conversation history is present
16
+ - `system_prompt` config option — seeded as the opening exchange when a new conversation is created via `conversation_id`
17
+ - `conversation_ttl` config option — TTL in seconds for Redis conversation keys (default `86400`; `0` for no expiry)
18
+ - `LlmOptimizer.clear_conversation(conversation_id)` — deletes a conversation key from Redis; returns `true` if deleted, `false` if not found
19
+ - `pipeline#load_conversation` and `pipeline#persist_conversation` — internal helpers wiring `ConversationStore` into the optimize pipeline
20
+ - `pipeline#apply_history_manager` — applies `HistoryManager` sliding-window summarization to loaded conversation history when `manage_history: true`
21
+
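The `ConversationStore` load/save flow described above can be sketched with an in-memory Hash standing in for Redis. This is a hypothetical simplification for illustration only: the real class serializes to JSON, sets a TTL, and takes a Redis client; `MemoryConversationStore` is not part of the gem.

```ruby
# Minimal in-memory sketch of the ConversationStore pattern.
# A Hash stands in for Redis; the real store uses JSON + TTL on a Redis key.
class MemoryConversationStore
  KEY_NAMESPACE = "llm_optimizer:conversation:"

  def initialize(system_prompt: nil)
    @data = {}
    @system_prompt = system_prompt
  end

  # Returns existing history, or a seeded opening exchange for new IDs.
  def load(conversation_id)
    @data.fetch(key(conversation_id)) { seed_messages }
  end

  # Appends the user turn and assistant reply, then persists.
  def save(conversation_id, messages, prompt, response)
    updated = messages + [
      { role: "user", content: prompt },
      { role: "assistant", content: response }
    ]
    @data[key(conversation_id)] = updated
    updated
  end

  private

  def key(id)
    "#{KEY_NAMESPACE}#{id}"
  end

  def seed_messages
    return [] unless @system_prompt

    [{ role: "user", content: @system_prompt },
     { role: "assistant", content: "Got it!" }]
  end
end

store = MemoryConversationStore.new(system_prompt: "Be brief.")
history = store.load("abc")                         # seeded opening exchange
history = store.save("abc", history, "Hi", "Hello!")
```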
22
+ ### Changed
23
+ - `HistoryManager` now receives an internal `llm_caller` lambda that routes through `raw_llm_call`, so it correctly uses `messages_caller` when available instead of always requiring `llm_caller`
24
+ - `raw_llm_call` updated to prefer `messages_caller` over `llm_caller` when a non-empty messages array is present
25
+ - `ModelRouter` classifier response matching now uses word-boundary regex (`/\bsimple\b/`, `/\bcomplex\b/`) to handle decorated responses like `"simple."`, `"**complex**"`, or `"the answer is simple"` — previously only exact string match was used
26
+ - `ModelRouter` classifier failures (any `StandardError`) and unrecognized responses both fall through silently to the word-count heuristic; no exception is raised to the caller
27
+ - `validate_conversation_options!` raises `ConfigurationError` if both `conversation_id` and `messages:` are supplied, or if `conversation_id` is used without `redis_url`
28
+
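The word-boundary matching change for `ModelRouter` can be illustrated in plain Ruby. This is a sketch of the matching logic only, not the gem's exact method:

```ruby
# Word-boundary regexes accept decorated classifier responses
# ("simple.", "**complex**") that an exact string match would reject.
def classify_response(response)
  normalized = response.to_s.strip.downcase
  return :simple  if normalized.match?(/\bsimple\b/)
  return :complex if normalized.match?(/\bcomplex\b/)

  nil # unrecognized; fall through to the word-count heuristic
end

classify_response("simple.")              # => :simple
classify_response("**complex**")          # => :complex
classify_response("the answer is simple") # => :simple
classify_response("dunno")                # => nil
```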
29
+ ### Fixed
30
+ - `HistoryManager` summarization raised `ConfigurationError: No llm_caller configured` when called inside the pipeline without a bound config — internal lambda now correctly captures `call_config`
31
+
32
+ ## [0.1.4] - 2026-04-13
33
+
34
+ ### Fixed
35
+ - `WrapperModule#chat` (used by `wrap_client`) incorrectly called `LlmOptimizer.optimize` internally which required `llm_caller` to be configured — causing `ConfigurationError` for users who only called `wrap_client`. Refactored into `optimize_pre_call` / `optimize_post_call` so the wrapped client handles the actual LLM call via `super`. `llm_caller` is no longer needed when using `wrap_client`
36
+
37
+ ### Added
38
+ - `LlmOptimizer.optimize_pre_call(prompt, config)` — runs compress → route → cache lookup without making an LLM call; used internally by `WrapperModule` and available for advanced integrations
39
+ - `LlmOptimizer.optimize_post_call(pre_call_result, response, config)` — stores a response in the semantic cache after an LLM call; used internally by `WrapperModule`
40
+
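The `wrap_client` refactor described above follows Ruby's `Module#prepend` pattern: pre-call and post-call steps surround the client's own LLM call via `super`. A hedged sketch with stand-in names (`WrapperSketch`, `FakeClient`, and the trivial pre/post steps are illustrative, not the gem's actual code):

```ruby
# Prepend-based wrapping: the wrapped client still makes the real LLM call
# through `super`; the optimizer only runs before and after it.
module WrapperSketch
  def chat(prompt)
    optimized_prompt = prompt.strip     # stands in for optimize_pre_call
    response = super(optimized_prompt)  # the client's own LLM call
    "#{response}!"                      # stands in for optimize_post_call
  end
end

class FakeClient
  def chat(prompt)
    "echo: #{prompt}"
  end
end

FakeClient.prepend(WrapperSketch)
FakeClient.new.chat("  hi  ") # => "echo: hi!"
```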
10
41
  ## [0.1.3] - 2026-04-10
11
42
 
12
43
  ### Added
@@ -70,7 +101,10 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
70
101
  - `OptimizeResult` struct with `response`, `model`, `model_tier`, `cache_status`, `original_tokens`, `compressed_tokens`, `latency_ms`, `messages`
71
102
  - Unit test suite covering all components with positive and negative scenarios using Minitest + Mocha
72
103
 
73
- [Unreleased]: https://github.com/arunkumarry/llm_optimizer/compare/v0.1.2...HEAD
104
+ [Unreleased]: https://github.com/arunkumarry/llm_optimizer/compare/v0.1.5...HEAD
105
+ [0.1.5]: https://github.com/arunkumarry/llm_optimizer/compare/v0.1.4...v0.1.5
106
+ [0.1.4]: https://github.com/arunkumarry/llm_optimizer/compare/v0.1.3...v0.1.4
107
+ [0.1.3]: https://github.com/arunkumarry/llm_optimizer/compare/v0.1.2...v0.1.3
74
108
  [0.1.2]: https://github.com/arunkumarry/llm_optimizer/compare/v0.1.1...v0.1.2
75
109
  [0.1.1]: https://github.com/arunkumarry/llm_optimizer/compare/v0.1.0...v0.1.1
76
110
  [0.1.0]: https://github.com/arunkumarry/llm_optimizer/releases/tag/v0.1.0
data/README.md CHANGED
@@ -21,8 +21,8 @@ Stores prompt embeddings in Redis. On subsequent calls, computes cosine similari
21
21
 
22
22
  Classifies each prompt and routes it to the appropriate model tier:
23
23
 
24
- - **Simple** → cheaper/faster model (e.g. `gpt-4o-mini`, `amazon.nova-micro`)
25
- - **Complex** → premium model (e.g. `claude-3-5-sonnet`, `gpt-4o`)
24
+ - **Simple** → cheaper/faster model (e.g. `llama3`, `gemini-2.5-flash-lite`)
25
+ - **Complex** → premium model (e.g. `claude-haiku-4-5-20251001`, `gemini-3.0-pro`)
26
26
 
27
27
  Routing uses a three-layer decision chain:
28
28
 
@@ -31,9 +31,9 @@ Routing uses a three-layer decision chain:
31
31
  3. **LLM classifier** (optional) — for ambiguous prompts, calls a cheap model with a classification prompt; falls back to word-count heuristic if not configured or if the call fails
32
32
 
33
33
  This hybrid approach fixes the core weakness of pure heuristics:
34
- - `"Fix this bug"` → 3 words but `:complex` via classifier
35
- - `"Explain Ruby blocks simply"` → long but `:simple` via classifier
36
- - `"analyze this code"` → keyword fast-path → `:complex` instantly (no classifier call)
34
+ - `"Fix this bug"` → 3 words but `:complex` via classifier
35
+ - `"Explain Ruby blocks simply"` → long but `:simple` via classifier
36
+ - `"analyze this code"` → keyword fast-path → `:complex` instantly (no classifier call)
37
37
 
38
38
  Configure the classifier with any cheap model your app already uses:
39
39
 
@@ -50,7 +50,7 @@ If `classifier_caller` is not set, the router falls back to the word-count heuri
50
50
  Removes common English stop words from prompts before sending to the LLM. Preserves fenced code block content unchanged. Typically reduces token count by 10–20%.
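The idea can be sketched with a tiny stop-word list; the gem's actual list and tokenization differ, so treat this as a toy illustration:

```ruby
# Toy stop-word compressor: strips a few stop words from prose segments
# while passing fenced code blocks through unchanged.
STOP_WORDS_RE = /\b(?:the|a|an|is|are|of|to|and)\b ?/i

def compress(prompt)
  prompt.split(/(```.*?```)/m).map do |segment|
    segment.start_with?("```") ? segment : segment.gsub(STOP_WORDS_RE, "")
  end.join
end

compress("What is the capital of France?") # => "What capital France?"
```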
51
51
 
52
52
  ### 4. Conversation History Sliding Window
53
- When a conversation history exceeds the configured token budget, summarizes the oldest messages using the simple model and replaces them with a single system summary message.
53
+ When a conversation history exceeds the configured token budget, summarizes the oldest messages using the simple model and replaces them with a single system summary message. Conversation history is stored in Redis for fast retrieval and summarization.
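The sliding-window behavior can be sketched with a crude word-count token estimate and an injected summarizer lambda. Both are stand-ins for the gem's internals, which route summarization through the simple model:

```ruby
# Toy sliding window: when the estimated token count exceeds the budget,
# all but the newest two messages are replaced by one system summary message.
def apply_window(messages, budget:, summarizer:)
  estimate = ->(msgs) { msgs.sum { |m| m[:content].split.size } }
  return messages if estimate.call(messages) <= budget

  keep = messages.last(2)
  old  = messages[0...-2]
  summary = summarizer.call(old.map { |m| m[:content] }.join(" "))
  [{ role: "system", content: "Summary: #{summary}" }] + keep
end

# Stand-in summarizer; the gem would call the simple model here.
summarizer = ->(text) { text.split.first(3).join(" ") + "..." }
history = [
  { role: "user", content: "tell me about redis persistence options" },
  { role: "assistant", content: "RDB snapshots and AOF logs are the two modes" },
  { role: "user", content: "which one is faster" },
  { role: "assistant", content: "RDB restores faster" }
]
apply_window(history, budget: 10, summarizer: summarizer)
```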
54
54
 
55
55
  ## Installation
56
56
 
@@ -99,7 +99,7 @@ result = LlmOptimizer.optimize("What is Redis?")
99
99
  puts result.response # => "Redis is an in-memory data store..."
100
100
  puts result.cache_status # => :hit or :miss
101
101
  puts result.model_tier # => :simple or :complex
102
- puts result.model # => "gpt-4o-mini"
102
+ puts result.model # => "gemini-2.5-flash-lite"
103
103
  puts result.original_tokens # => 5
104
104
  puts result.compressed_tokens # => 4
105
105
  puts result.latency_ms # => 12.4
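The `cache_status` above is decided by comparing prompt embeddings with cosine similarity against `similarity_threshold` (0.96 by default). A minimal sketch of that comparison:

```ruby
# Cosine similarity between two embedding vectors; a cache hit requires
# similarity >= the configured similarity_threshold.
def cosine_similarity(a, b)
  dot   = a.zip(b).sum { |x, y| x * y }
  mag_a = Math.sqrt(a.sum { |x| x * x })
  mag_b = Math.sqrt(b.sum { |x| x * x })
  dot / (mag_a * mag_b)
end

cosine_similarity([1.0, 0.0], [1.0, 0.0]) # => 1.0 (identical direction)
cosine_similarity([1.0, 0.0], [0.0, 1.0]) # => 0.0 (orthogonal)
```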
@@ -110,39 +110,50 @@ puts result.latency_ms # => 12.4
110
110
  ### Rails initializer
111
111
 
112
112
  ```ruby
113
+ # config/initializers/llm_optimizer.rb
114
+ require "llm_optimizer"
115
+
113
116
  LlmOptimizer.configure do |config|
114
- # Feature flags all off by default
117
+ # --- Feature flags (all off by default) ---
115
118
  config.compress_prompt = true # strip stop words before sending to LLM
116
119
  config.use_semantic_cache = true # cache responses by vector similarity
117
120
  config.manage_history = true # summarize old messages when over token budget
118
121
 
119
- # Model routing
120
- config.route_to = :auto # :auto | :simple | :complex
121
- config.simple_model = "gpt-4o-mini" # model used for simple prompts
122
- config.complex_model = "claude-3-5-sonnet-20241022" # model used for complex prompts
122
+ # --- Model routing ---
123
+ config.route_to = :auto # :auto, :simple, or :complex
124
+ config.simple_model = "gemini-2.5-flash-lite" # used for simple prompts
125
+ config.complex_model = "claude-haiku-4-5-20251001" # used for complex prompts
123
126
 
124
- # Redis (required if use_semantic_cache: true)
127
+ # --- Redis (required if use_semantic_cache: true) ---
125
128
  config.redis_url = ENV["REDIS_URL"]
126
129
 
127
- # Tuning
128
- config.similarity_threshold = 0.96 # cosine similarity cutoff for cache hit (0.0–1.0)
129
- config.token_budget = 4000 # token limit before history summarization
130
- config.cache_ttl = 86400 # cache TTL in seconds (default: 24h)
130
+ # --- Token / cache settings ---
131
+ config.similarity_threshold = 0.96 # cosine similarity cutoff for cache hit
132
+ config.token_budget = 4000 # max tokens before history summarization
133
+ config.cache_ttl = 86400 # cache TTL in seconds (24h)
131
134
  config.timeout_seconds = 5 # timeout for external API calls
132
135
 
133
- # Logging
136
+ # --- Logging ---
134
137
  config.logger = Rails.logger
135
- config.debug_logging = Rails.env.development? # logs full prompt+response at DEBUG level
138
+ config.debug_logging = Rails.env.development? # logs full prompt+response in dev
136
139
 
137
- # LLM caller wire to your existing LLM client (required)
140
+ # --- Wire up your app's LLM client ---
141
+ # Replace the body with however your app calls the LLM
138
142
  config.llm_caller = ->(prompt, model:) {
139
- RubyLLM.chat(model: model, assume_model_exists: true).ask(prompt).content
143
+ model ||= "claude-haiku-4-5-20251001"
144
+ provider = if model.include?("claude") then :anthropic
145
+ elsif model.include?("gpt") then :openai
146
+ elsif model.include?("gemini") then :gemini
147
+ else :ollama
148
+ end
149
+ chat = RubyLLM.chat(model: model, provider: provider, assume_model_exists: true)
150
+ chat.ask(prompt).content
140
151
  }
141
152
 
142
153
  # Embeddings caller — wire to your embeddings provider (required if use_semantic_cache: true)
143
- # Falls back to OpenAI via ENV["OPENAI_API_KEY"] if not set
144
154
  config.embedding_caller = ->(text) {
145
- MyEmbeddingService.embed(text)
155
+ response = RubyLLM.embed(text, provider: :gemini, model: 'gemini-embedding-001')
156
+ response.vectors
146
157
  }
147
158
 
148
159
  # Classifier caller — optional, improves routing accuracy for ambiguous prompts
@@ -151,7 +162,18 @@ LlmOptimizer.configure do |config|
151
162
  RubyLLM.chat(model: "amazon.nova-micro-v1:0", provider: :bedrock, assume_model_exists: true)
152
163
  .ask(prompt).content.strip.downcase
153
164
  }
165
+
166
+ # Messages caller (optional) - used by the history manager and conversation summaries.
167
+ config.system_prompt = "You are a sarcastic comic person who gives witty responses in a non-harmful way. If any serious question is asked, handle it in a calm way."
168
+
169
+ config.messages_caller = ->(messages, model:) {
170
+ chat = RubyLLM.chat(model: model)
171
+ messages[0..-2].each { |m| chat.add_message(role: m[:role], content: m[:content]) }
172
+ response = chat.ask(messages.last[:content])
173
+ response.content
174
+ }
154
175
  end
176
+
155
177
  ```
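The dispatch rule stated in the changelog (prefer `messages_caller` over `llm_caller` when a non-empty messages array is present) can be sketched with plain lambdas. The `dispatch` helper below is illustrative, not a gem API:

```ruby
# Sketch of the caller-priority rule: messages_caller wins when a
# non-empty history exists; otherwise the single-prompt llm_caller is used.
def dispatch(prompt, model:, llm_caller:, messages: nil, messages_caller: nil)
  if messages && !messages.empty? && messages_caller
    messages_caller.call(messages + [{ role: "user", content: prompt }], model: model)
  else
    llm_caller.call(prompt, model: model)
  end
end

llm_caller      = ->(prompt, model:) { "single:#{prompt}" }
messages_caller = ->(messages, model:) { "history:#{messages.size}" }

dispatch("hi", model: "m", llm_caller: llm_caller)
# => "single:hi"
dispatch("hi", model: "m", messages: [{ role: "user", content: "x" }],
         llm_caller: llm_caller, messages_caller: messages_caller)
# => "history:2"
```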
156
178
 
157
179
  ### Configuration reference
@@ -162,19 +184,22 @@ end
162
184
  | `use_semantic_cache` | Boolean | `false` | Enable Redis-backed semantic cache |
163
185
  | `manage_history` | Boolean | `false` | Enable conversation history summarization |
164
186
  | `route_to` | Symbol | `:auto` | `:auto`, `:simple`, or `:complex` |
165
- | `simple_model` | String | `"gpt-4o-mini"` | Model for simple prompts |
166
- | `complex_model` | String | `"claude-3-5-sonnet-20241022"` | Model for complex prompts |
187
+ | `simple_model` | String | `"gemini-2.5-flash-lite"` | Model for simple prompts |
188
+ | `complex_model` | String | `"claude-haiku-4-5-20251001"` | Model for complex prompts |
167
189
  | `similarity_threshold` | Float | `0.96` | Minimum cosine similarity for cache hit |
168
190
  | `token_budget` | Integer | `4000` | Token limit before history summarization |
169
191
  | `cache_ttl` | Integer | `86400` | Cache entry TTL in seconds |
170
192
  | `timeout_seconds` | Integer | `5` | Timeout for external API calls |
171
193
  | `redis_url` | String | `nil` | Redis connection URL |
172
- | `embedding_model` | String | `"text-embedding-3-small"` | Embedding model name (OpenAI fallback) |
194
+ | `embedding_model` | String | `"gemini-embedding-001"` | Embedding model name |
173
195
  | `logger` | Logger | `Logger.new($stdout)` | Any Logger-compatible object |
174
196
  | `debug_logging` | Boolean | `false` | Log full prompt and response at DEBUG level |
175
197
  | `llm_caller` | Lambda | `nil` | `(prompt, model:) -> String` |
176
198
  | `embedding_caller` | Lambda | `nil` | `(text) -> Array<Float>` |
177
199
  | `classifier_caller` | Lambda | `nil` | `(prompt) -> "simple" or "complex"` |
200
+ | `messages_caller` | Lambda | `nil` | `(messages, model:) -> String` — used when `conversation_id` is present; receives full history including current user turn |
201
+ | `system_prompt` | String | `nil` | Seeded as the first system message when a new conversation is created via `conversation_id` |
202
+ | `conversation_ttl` | Integer | `86400` | TTL in seconds for Redis-backed conversation history (`0` for no expiry) |
178
203
 
179
204
  ## Per-call configuration
180
205
 
@@ -200,19 +225,6 @@ messages = [
200
225
 
201
226
  result = LlmOptimizer.optimize("What else can it do?", messages: messages)
202
227
 
203
- # result.messages contains the (possibly summarized) messages array
204
- ```
205
-
206
- ## Opt-in client wrapping
207
-
208
- Transparently wrap an existing LLM client class so all calls through it are automatically optimized:
209
-
210
- ```ruby
211
- LlmOptimizer.wrap_client(OpenAI::Client)
212
- ```
213
-
214
- This prepends the optimization pipeline into the client's `chat` method. Safe to call multiple times idempotent.
215
-
216
228
  ## OptimizeResult
217
229
 
218
230
  Every call returns an `OptimizeResult` struct:
@@ -226,20 +238,9 @@ Every call returns an `OptimizeResult` struct:
226
238
  | `original_tokens` | Integer | Estimated token count before compression |
227
239
  | `compressed_tokens` | Integer | Estimated token count after compression (`nil` if not compressed) |
228
240
  | `latency_ms` | Float | Total wall-clock time for the optimize call |
229
- | `messages` | Array | Final messages array (for history management) |
230
-
231
- ## Error handling
232
-
233
- The gem defines a hierarchy of errors, all inheriting from `LlmOptimizer::Error`:
234
-
235
- ```
236
- LlmOptimizer::Error
237
- ├── LlmOptimizer::ConfigurationError # unknown config key, missing llm_caller
238
- ├── LlmOptimizer::EmbeddingError # embedding API failure
239
- └── LlmOptimizer::TimeoutError # network timeout exceeded
240
- ```
241
+ | `messages` | Array | Final messages array sent to the LLM, after history management and conversation hydration (`nil` on a cache hit) |
241
242
 
242
- The gateway catches all component failures and falls through to a raw LLM call with the original prompt. Your app's core functionality is never blocked by the optimizer.
243
+ The `messages` field reflects the actual array passed to `messages_caller` (or built from `conversation_id`), including any summarization applied by the history manager. You can pass it back as `options[:messages]` on the next call to continue a stateless conversation.
243
244
 
244
245
  ## Resilience
245
246
 
@@ -249,7 +250,9 @@ The gateway catches all component failures and falls through to a raw LLM call w
249
250
  | Redis unavailable (write) | Log warning, return LLM result normally |
250
251
  | Embedding API failure | Treat as cache miss, continue |
251
252
  | Any component exception | Log error, fall through to raw LLM call |
252
- | History summarization failure | Log error, return original messages unchanged |
253
+ | History summarization failure | Log warning, return original messages unchanged |
254
+ | Conversation load failure | Log warning, proceed without history |
255
+ | Conversation save failure | Log warning, return result with pre-save messages |
253
256
 
254
257
  ## Development
255
258
 
@@ -15,8 +15,8 @@ LlmOptimizer.configure do |config|
15
15
  # --- Model routing ---
16
16
  # :auto classifies each prompt; :simple or :complex forces a tier
17
17
  config.route_to = :auto
18
- config.simple_model = "gpt-4o-mini"
19
- config.complex_model = "gpt-4o"
18
+ config.simple_model = "gemini-1.5-flash"
19
+ config.complex_model = "claude-haiku-4-5"
20
20
 
21
21
  # --- Redis (required only if use_semantic_cache: true) ---
22
22
  config.redis_url = ENV.fetch("REDIS_URL", nil)
@@ -76,4 +76,42 @@ LlmOptimizer.configure do |config|
76
76
  # }
77
77
  #
78
78
  # config.classifier_caller = nil
79
+
80
+ # --- Messages caller (optional) ---
81
+ # Wires a full-message-array caller used by the history manager and conversation summaries.
82
+ # config.system_prompt = "You are a helpful person who gives responses in a non-harmful way. " \
83
+ # "If any serious question is asked, handle it effectively."
84
+ # OpenAI implementation -
85
+ # config.messages_caller = ->(messages, model:) {
86
+ # response = $openai.chat(
87
+ # parameters: {
88
+ # model: model,
89
+ # messages: messages.map { |m| { role: m[:role], content: m[:content] } }
90
+ # }
91
+ # )
92
+ # response.dig("choices", 0, "message", "content")
93
+ # }
94
+
95
+ # RubyLLM implementation -
96
+ # config.messages_caller = ->(messages, model:) {
97
+ # chat = RubyLLM.chat(model: model)
98
+ # messages[0..-2].each { |m| chat.add_message(role: m[:role], content: m[:content]) }
99
+ # chat.ask(messages.last[:content]).content
100
+ # }
101
+
102
+ # Anthropic implementation -
103
+ # config.messages_caller = ->(messages, model:) {
104
+ # # Anthropic separates system messages from the messages array
105
+ # system_msg = messages.find { |m| m[:role] == "system" }&.dig(:content)
106
+ # chat_msgs = messages.reject { |m| m[:role] == "system" }
107
+ # .map { |m| { role: m[:role], content: m[:content] } }
108
+
109
+ # response = $anthropic.messages(
110
+ # model: model,
111
+ # max_tokens: 1024,
112
+ # system: system_msg,
113
+ # messages: chat_msgs
114
+ # )
115
+ # response["content"].first["text"]
116
+ # }
79
117
  end
@@ -22,6 +22,9 @@ module LlmOptimizer
22
22
  llm_caller
23
23
  embedding_caller
24
24
  classifier_caller
25
+ conversation_ttl
26
+ system_prompt
27
+ messages_caller
25
28
  ].freeze
26
29
 
27
30
  # Define readers for all known keys (setters below track explicit sets)
@@ -47,6 +50,8 @@ module LlmOptimizer
47
50
  @llm_caller = nil
48
51
  @embedding_caller = nil
49
52
  @classifier_caller = nil
53
+ @conversation_ttl = 86_400
54
+ @system_prompt = nil
50
55
  end
51
56
 
52
57
  # Copies only explicitly set keys from other_config without resetting unmentioned keys.
@@ -0,0 +1,83 @@
1
+ # frozen_string_literal: true
2
+
3
+ module LlmOptimizer
4
+ class ConversationStore
5
+ KEY_NAMESPACE = "llm_optimizer:conversation:"
6
+
7
+ def initialize(redis_client, ttl:, logger:, debug_logging: false, system_prompt: nil)
8
+ @redis = redis_client
9
+ @ttl = ttl
10
+ @logger = logger
11
+ @debug_logging = debug_logging
12
+ @system_prompt = system_prompt
13
+ end
14
+
15
+ # Loads and returns the messages array for conversation_id.
16
+ # Returns [] if no key exists or on Redis error (logs warning).
17
+ def load(conversation_id)
18
+ key = redis_key(conversation_id)
19
+ raw = @redis.get(key)
20
+
21
+ if raw.nil?
22
+ messages = seed_messages
23
+ @logger.info("[llm_optimizer] ConversationStore load: conversation_id=#{conversation_id}, count=#{messages.size}")
24
+ log_debug_history(conversation_id, messages)
25
+ return messages
26
+ end
27
+
28
+ messages = JSON.parse(raw, symbolize_names: true)
29
+ @logger.info("[llm_optimizer] ConversationStore load: conversation_id=#{conversation_id}, count=#{messages.size}")
30
+ log_debug_history(conversation_id, messages)
31
+ messages
32
+ rescue Redis::BaseError => e
33
+ @logger.warn("[llm_optimizer] ConversationStore load failed: conversation_id=#{conversation_id}, error=#{e.message}")
34
+ []
35
+ end
36
+
37
+ # Appends user + assistant messages to history and persists to Redis.
38
+ # Silently logs warning on Redis error; never raises.
39
+ def save(conversation_id, messages, prompt, response)
40
+ updated_messages = messages + [
41
+ { role: "user", content: prompt },
42
+ { role: "assistant", content: response }
43
+ ]
44
+
45
+ key = redis_key(conversation_id)
46
+ json = JSON.generate(updated_messages)
47
+
48
+ if @ttl.zero?
49
+ @redis.set(key, json)
50
+ else
51
+ @redis.set(key, json, ex: @ttl)
52
+ end
53
+
54
+ @logger.info("[llm_optimizer] ConversationStore save: conversation_id=#{conversation_id}, count=#{updated_messages.size}")
55
+ log_debug_history(conversation_id, updated_messages)
56
+ updated_messages
57
+ rescue Redis::BaseError => e
58
+ @logger.warn("[llm_optimizer] ConversationStore save failed: conversation_id=#{conversation_id}, error=#{e.message}")
59
+ nil
60
+ end
61
+
62
+ private
63
+
64
+ def redis_key(conversation_id)
65
+ "#{KEY_NAMESPACE}#{conversation_id}"
66
+ end
67
+
68
+ def seed_messages
69
+ return [] unless @system_prompt
70
+
71
+ [
72
+ { role: "user", content: @system_prompt },
73
+ { role: "assistant", content: "Got it!" }
74
+ ]
75
+ end
76
+
77
+ def log_debug_history(conversation_id, messages)
78
+ return unless @debug_logging
79
+
80
+ @logger.debug("[llm_optimizer] ConversationStore history: conversation_id=#{conversation_id}, messages=#{messages.inspect}")
81
+ end
82
+ end
83
+ end
@@ -10,10 +10,12 @@ module LlmOptimizer
10
10
  Classify the following prompt as either 'simple' or 'complex'.
11
11
 
12
12
  Rules:
13
- - simple: factual questions, basic lookups, short explanations, greetings
13
+ - simple: factual questions, basic lookups, short explanations, greetings, chitchat, general statements, simple mathematical calculations with additions, subtractions, multiplications and divisions
14
+ Example - Hello, Bye, You are funny, how are you?, what is the capital of France, tell me about yourself, what is 2 + 3 - 1 * 10 / 2 etc.
14
15
  - complex: code generation, debugging, architecture, multi-step reasoning, analysis
16
+ Example - how does pandas extract my information, debug this code, why do RAG apps consume more tokens, give me code to print a star pattern in Python etc.
15
17
 
16
- Reply with exactly one word: simple or complex
18
+ Reply with exactly one word, no punctuation: simple or complex
17
19
 
18
20
  Prompt: %<prompt>s
19
21
  PROMPT
@@ -48,9 +50,12 @@ module LlmOptimizer
48
50
  def classify_with_llm(prompt)
49
51
  classifier_prompt = format(CLASSIFIER_PROMPT, prompt: prompt)
50
52
  response = @config.classifier_caller.call(classifier_prompt)
51
- normalized = response.to_s.strip.downcase.gsub(/[^a-z]/, "")
52
- return :simple if normalized == "simple"
53
- return :complex if normalized == "complex"
53
+ normalized = response.to_s.strip.downcase
54
+
55
+ # Check for word boundary match to handle responses like
56
+ # "simple." / "**simple**" / "the answer is simple"
57
+ return :simple if normalized.match?(/\bsimple\b/)
58
+ return :complex if normalized.match?(/\bcomplex\b/)
54
59
 
55
60
  nil # unrecognized response — fall through to heuristic
56
61
  rescue StandardError
@@ -0,0 +1,173 @@
1
+ # frozen_string_literal: true
2
+
3
+ module LlmOptimizer
4
+ # Internal pipeline helpers — not part of the public API.
5
+ # Extended into LlmOptimizer as private class methods.
6
+ module Pipeline
7
+ private
8
+
9
+ def build_call_config(options, &block)
10
+ cfg = Configuration.new
11
+ cfg.merge!(configuration)
12
+ options.each do |k, v|
13
+ next unless Configuration::KNOWN_KEYS.include?(k.to_sym)
14
+
15
+ cfg.public_send(:"#{k}=", v)
16
+ end
17
+ block&.call(cfg)
18
+ cfg
19
+ end
20
+
21
+ def validate_conversation_options!(conversation_id, options, call_config)
22
+ if conversation_id && options[:messages]
23
+ raise ConfigurationError,
24
+ "conversation_id and messages: are mutually exclusive — pass one or the other"
25
+ end
26
+
27
+ return unless conversation_id && call_config.redis_url.nil?
28
+
29
+ raise ConfigurationError,
30
+ "redis_url must be configured to use conversation_id"
31
+ end
32
+
33
+ def compress(prompt, config)
34
+ return [prompt, nil] unless config.compress_prompt
35
+
36
+ compressed = Compressor.new.compress(prompt)
37
+ [compressed, Compressor.new.estimate_tokens(compressed)]
38
+ end
39
+
40
+ def route(prompt, config)
41
+ router = ModelRouter.new(config)
42
+ model_tier = router.route(prompt)
43
+ model = model_tier == :simple ? config.simple_model : config.complex_model
44
+ [model_tier, model]
45
+ end
46
+
47
+ def semantic_cache_lookup(prompt, model, model_tier, original_tokens,
48
+ compressed_tokens, original_prompt, start, config)
49
+ return [nil, nil] unless config.use_semantic_cache
50
+
51
+ emb_client = EmbeddingClient.new(
52
+ model: config.embedding_model,
53
+ timeout_seconds: config.timeout_seconds,
54
+ embedding_caller: config.embedding_caller
55
+ )
56
+ embedding = emb_client.embed(prompt)
57
+ embedding, result = check_cache_hit(embedding, prompt, model, model_tier,
58
+ original_tokens, compressed_tokens,
59
+ original_prompt, start, config)
60
+ [embedding, result]
61
+ rescue EmbeddingError => e
62
+ config.logger.warn("[llm_optimizer] EmbeddingError (treating as cache miss): #{e.message}")
63
+ [nil, nil]
64
+ end
65
+
66
+ def load_conversation(conversation_id, options, config)
67
+ return [options[:messages], nil] unless conversation_id
68
+
69
+ redis = build_redis(config.redis_url)
70
+ store = ConversationStore.new(redis,
71
+ ttl: config.conversation_ttl,
72
+ logger: config.logger,
73
+ debug_logging: config.debug_logging,
74
+ system_prompt: config.system_prompt)
75
+ [store.load(conversation_id), store]
76
+ end
77
+
78
+ def apply_history_manager(messages, config)
79
+ return messages unless config.manage_history && messages
80
+
81
+ llm_caller = ->(p, model:) { raw_llm_call(p, model: model, config: config) }
82
+ history_mgr = HistoryManager.new(
83
+ llm_caller: llm_caller,
84
+ simple_model: config.simple_model,
85
+ token_budget: config.token_budget
86
+ )
87
+ history_mgr.process(messages)
88
+ end
89
+
90
+ def persist_conversation(store, conversation_id, messages, prompt, response)
91
+ return messages unless store && conversation_id
92
+
93
+ store.save(conversation_id, messages, prompt, response) || messages
94
+ end
95
+
96
+ def store_in_cache(embedding, response, config)
97
+ return unless config.use_semantic_cache && embedding && config.redis_url
98
+
99
+ redis = build_redis(config.redis_url)
100
+ cache = SemanticCache.new(redis, threshold: config.similarity_threshold, ttl: config.cache_ttl)
101
+ cache.store(embedding, response)
102
+ rescue StandardError => e
103
+ config.logger.warn("[llm_optimizer] SemanticCache store failed: #{e.message}")
104
+ end
105
+
106
+ def build_result(response, model, model_tier, cache_status,
107
+ original_tokens, compressed_tokens, latency_ms, messages)
108
+ OptimizeResult.new(
109
+ response: response, model: model, model_tier: model_tier,
110
+ cache_status: cache_status, original_tokens: original_tokens,
111
+ compressed_tokens: compressed_tokens, latency_ms: latency_ms,
112
+ messages: messages
113
+ )
114
+ end
115
+
116
+ def fallback_result(original_prompt, original_tokens, options, start)
117
+ latency_ms = elapsed_ms(start)
118
+ response = raw_llm_call(original_prompt, model: nil, config: configuration)
119
+ build_result(response, nil, nil, :miss, original_tokens || 0, nil,
120
+ latency_ms, options[:messages])
121
+ end
122
+
123
+ def raw_llm_call(prompt, model:, messages: nil, config: nil)
124
+ if messages && !messages.empty? && config&.messages_caller
125
+ config.messages_caller.call(messages + [{ role: "user", content: prompt }], model: model)
126
+ else
127
+ llm = config&.llm_caller || @_current_llm_caller
128
+ raise ConfigurationError, "No llm_caller configured." unless llm
129
+
130
+ llm.call(prompt, model: model)
131
+ end
132
+ end
133
+
134
+ def elapsed_ms(start)
135
+ ((Process.clock_gettime(Process::CLOCK_MONOTONIC) - start) * 1000).round(2)
136
+ end
137
+
138
+ def emit_log(logger, config, cache_status:, model_tier:, original_tokens:,
139
+ compressed_tokens:, latency_ms:, prompt:, response:)
140
+ logger.info(
141
+ "[llm_optimizer] { cache_status: #{cache_status.inspect}, " \
142
+ "model_tier: #{model_tier.inspect}, " \
143
+ "original_tokens: #{original_tokens.inspect}, " \
144
+ "compressed_tokens: #{compressed_tokens.inspect}, " \
145
+ "latency_ms: #{latency_ms.inspect} }"
146
+ )
147
+ logger.debug("[llm_optimizer] prompt=#{prompt.inspect} response=#{response.inspect}") if config.debug_logging
148
+ end
149
+
150
+ def build_redis(redis_url)
151
+ require "redis"
152
+ Redis.new(url: redis_url)
153
+ end
154
+
155
+ def check_cache_hit(embedding, _prompt, model, model_tier, original_tokens,
156
+ compressed_tokens, original_prompt, start, config)
157
+ return [embedding, nil] unless config.redis_url
158
+
159
+ redis = build_redis(config.redis_url)
160
+ cache = SemanticCache.new(redis, threshold: config.similarity_threshold, ttl: config.cache_ttl)
161
+ cached = cache.lookup(embedding)
162
+ return [embedding, nil] unless cached
163
+
164
+ latency_ms = elapsed_ms(start)
165
+ emit_log(config.logger, config,
166
+ cache_status: :hit, model_tier: model_tier,
167
+ original_tokens: original_tokens, compressed_tokens: compressed_tokens,
168
+ latency_ms: latency_ms, prompt: original_prompt, response: cached)
169
+ [embedding, build_result(cached, model, model_tier, :hit,
170
+ original_tokens, compressed_tokens, latency_ms, nil)]
171
+ end
172
+ end
173
+ end
@@ -1,5 +1,5 @@
1
1
  # frozen_string_literal: true
2
2
 
3
3
  module LlmOptimizer
4
- VERSION = "0.1.3"
4
+ VERSION = "0.1.5"
5
5
  end
data/lib/llm_optimizer.rb CHANGED
@@ -8,26 +8,21 @@ require_relative "llm_optimizer/model_router"
 require_relative "llm_optimizer/embedding_client"
 require_relative "llm_optimizer/semantic_cache"
 require_relative "llm_optimizer/history_manager"
+require_relative "llm_optimizer/conversation_store"
+require_relative "llm_optimizer/pipeline"
 
 require "llm_optimizer/railtie" if defined?(Rails)
 
 module LlmOptimizer
-  # Base error class for all gem-specific exceptions
   class Error < StandardError; end
-
-  # Raised when an unrecognized configuration key is set
   class ConfigurationError < Error; end
-
-  # Raised when the embedding API call fails
   class EmbeddingError < Error; end
-
-  # Raised when a network timeout is exceeded
   class TimeoutError < Error; end
 
-  # Global configuration
   @configuration = nil
 
-  # Yields a Configuration instance; merges it into the global config.
+  extend Pipeline
+
   def self.configure
     temp = Configuration.new
     yield temp
@@ -35,7 +30,6 @@ module LlmOptimizer
     validate_configuration!(configuration)
   end
 
-  # Warns about misconfigured options rather than failing silently at call time.
   def self.validate_configuration!(config)
     return unless config.use_semantic_cache && config.embedding_caller.nil?
 
@@ -46,229 +40,112 @@
     config.use_semantic_cache = false
   end
 
-  # Returns the current global Configuration, lazy-initializing if nil.
   def self.configuration
     @configuration ||= Configuration.new
   end
 
-  # Replaces the global config with a fresh default Configuration.
-  # Useful in tests to avoid state leakage.
   def self.reset_configuration!
     @configuration = Configuration.new
   end
 
-  # Opt-in client wrapping
+  def self.clear_conversation(conversation_id)
+    raise ConfigurationError, "redis_url must be configured to use clear_conversation" unless configuration.redis_url
+
+    redis = build_redis(configuration.redis_url)
+    key = "#{ConversationStore::KEY_NAMESPACE}#{conversation_id}"
+    deleted = redis.del(key)
+    deleted.positive?
+  rescue ::Redis::BaseError => e
+    raise LlmOptimizer::Error, "Redis error in clear_conversation: #{e.message}"
+  end
+
   module WrapperModule
     def chat(params, &)
+      config = LlmOptimizer.configuration
       prompt = params[:messages] || params[:prompt]
-      optimized = LlmOptimizer.optimize(prompt)
-      params = params.merge(messages: optimized.messages, model: optimized.model)
-      super
+      result = LlmOptimizer.optimize_pre_call(prompt, config)
+      return result[:response] if result[:cache_status] == :hit
+
+      optimized_params = params.merge(model: result[:model])
+      if params[:messages]
+        optimized_params = optimized_params.merge(messages: result[:prompt])
+      elsif params[:prompt]
+        optimized_params = optimized_params.merge(prompt: result[:prompt])
+      end
+
+      response = super(optimized_params, &)
+      LlmOptimizer.optimize_post_call(result, response, config)
+      response
     end
   end
 
-  # Prepends WrapperModule into client_class; idempotent — safe to call N times.
   def self.wrap_client(client_class)
    return if client_class.ancestors.include?(WrapperModule)
 
    client_class.prepend(WrapperModule)
  end
 
-  # Primary entry point
-  # Runs the optimization pipeline and returns an OptimizeResult.
-
-  # options hash keys mirror Configuration attr_accessors and are merged over
-  # the global config for this call only. An optional block is yielded a
-  # per-call Configuration for fine-grained control.
-  def self.optimize(prompt, options = {})
-    start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
-
-    # Resolve per-call configuration — only pass known config keys
-    call_config = Configuration.new
-    call_config.merge!(configuration)
-    options.each do |k, v|
-      next unless LlmOptimizer::Configuration::KNOWN_KEYS.include?(k.to_sym)
-
-      call_config.public_send(:"#{k}=", v)
-    end
-    yield call_config if block_given?
-
-    logger = call_config.logger
-
-    # Keep a reference to the original prompt for fallback use
-    original_prompt = prompt
-
-    # Compression
-    compressor = Compressor.new
-    original_tokens = compressor.estimate_tokens(prompt)
-    compressed_tokens = nil
-
-    if call_config.compress_prompt
-      prompt = compressor.compress(prompt)
-      compressed_tokens = compressor.estimate_tokens(prompt)
-    end
-
-    # Model routing
-    router = ModelRouter.new(call_config)
-    model_tier = router.route(prompt)
-    model = model_tier == :simple ? call_config.simple_model : call_config.complex_model
-
-    # Semantic cache lookup
-    embedding = nil
-
-    if call_config.use_semantic_cache
-      begin
-        emb_client = EmbeddingClient.new(
-          model: call_config.embedding_model,
-          timeout_seconds: call_config.timeout_seconds,
-          embedding_caller: call_config.embedding_caller
-        )
-        embedding = emb_client.embed(prompt)
+  def self.optimize(prompt, options = {}, &)
+    start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
+    call_config = build_call_config(options, &)
+    conversation_id = options[:conversation_id]
+    validate_conversation_options!(conversation_id, options, call_config)
 
-        if call_config.redis_url
-          redis = build_redis(call_config.redis_url)
-          cache = SemanticCache.new(redis, threshold: call_config.similarity_threshold, ttl: call_config.cache_ttl)
-          cached = cache.lookup(embedding)
+    original_prompt = prompt
+    original_tokens = Compressor.new.estimate_tokens(prompt)
+    prompt, compressed_tokens = compress(prompt, call_config)
+    model_tier, model = route(prompt, call_config)
 
-          if cached
-            latency_ms = elapsed_ms(start)
-            emit_log(logger, call_config,
-                     cache_status: :hit, model_tier: model_tier,
-                     original_tokens: original_tokens, compressed_tokens: compressed_tokens,
-                     latency_ms: latency_ms, prompt: original_prompt, response: cached)
-            return OptimizeResult.new(
-              response: cached,
-              model: model,
-              model_tier: model_tier,
-              cache_status: :hit,
-              original_tokens: original_tokens,
-              compressed_tokens: compressed_tokens,
-              latency_ms: latency_ms,
-              messages: options[:messages]
-            )
-          end
-        end
-      rescue EmbeddingError => e
-        logger.warn("[llm_optimizer] EmbeddingError (treating as cache miss): #{e.message}")
-        embedding = nil
-        # continue pipeline as cache miss
-      end
-    end
-
-    # History management
-    messages = options[:messages]
-    if call_config.manage_history && messages
-      llm_caller = ->(p, model:) { raw_llm_call(p, model: model, config: call_config) }
-      history_mgr = HistoryManager.new(
-        llm_caller: llm_caller,
-        simple_model: call_config.simple_model,
-        token_budget: call_config.token_budget
-      )
-      messages = history_mgr.process(messages)
-    end
+    embedding, cached_result = semantic_cache_lookup(prompt, model, model_tier,
+                                                     original_tokens, compressed_tokens,
+                                                     original_prompt, start, call_config)
+    return cached_result if cached_result
 
-    # Raw LLM call
-    response = raw_llm_call(prompt, model: model, config: call_config)
+    messages, store = load_conversation(conversation_id, options, call_config)
+    messages = apply_history_manager(messages, call_config)
+    response = raw_llm_call(prompt, messages: messages, model: model, config: call_config)
+    messages = persist_conversation(store, conversation_id, messages, prompt, response)
+    store_in_cache(embedding, response, call_config)
 
-    # Cache store
-    if call_config.use_semantic_cache && embedding && call_config.redis_url
-      begin
-        redis = build_redis(call_config.redis_url)
-        cache = SemanticCache.new(redis, threshold: call_config.similarity_threshold, ttl: call_config.cache_ttl)
-        cache.store(embedding, response)
-      rescue StandardError => e
-        logger.warn("[llm_optimizer] SemanticCache store failed: #{e.message}")
-      end
-    end
-
-    # Build result
     latency_ms = elapsed_ms(start)
-    emit_log(logger, call_config,
+    emit_log(call_config.logger, call_config,
             cache_status: :miss, model_tier: model_tier,
             original_tokens: original_tokens, compressed_tokens: compressed_tokens,
             latency_ms: latency_ms, prompt: original_prompt, response: response)
-
-    OptimizeResult.new(
-      response: response,
-      model: model,
-      model_tier: model_tier,
-      cache_status: :miss,
-      original_tokens: original_tokens,
-      compressed_tokens: compressed_tokens,
-      latency_ms: latency_ms,
-      messages: messages
-    )
+    build_result(response, model, model_tier, :miss, original_tokens, compressed_tokens,
+                 latency_ms, messages)
   rescue EmbeddingError => e
-    # Treat embedding failures as cache miss — continue to raw LLM call
-    logger = configuration.logger
-    logger.warn("[llm_optimizer] EmbeddingError (outer rescue, treating as cache miss): #{e.message}")
-    latency_ms = elapsed_ms(start)
-    response = raw_llm_call(original_prompt, model: nil, config: configuration)
-    OptimizeResult.new(
-      response: response,
-      model: nil,
-      model_tier: nil,
-      cache_status: :miss,
-      original_tokens: original_tokens || 0,
-      compressed_tokens: nil,
-      latency_ms: latency_ms,
-      messages: options[:messages]
-    )
+    configuration.logger.warn("[llm_optimizer] EmbeddingError (outer rescue): #{e.message}")
+    fallback_result(original_prompt, original_tokens, options, start)
+  rescue ConfigurationError
+    raise
   rescue LlmOptimizer::Error, StandardError => e
-    logger = configuration.logger
-    logger.error("[llm_optimizer] #{e.class}: #{e.message}\n#{e.backtrace&.first(5)&.join("\n")}")
-    latency_ms = elapsed_ms(start)
-    response = raw_llm_call(original_prompt, model: nil, config: configuration)
-    OptimizeResult.new(
-      response: response,
-      model: nil,
-      model_tier: nil,
-      cache_status: :miss,
-      original_tokens: original_tokens || 0,
-      compressed_tokens: nil,
-      latency_ms: latency_ms,
-      messages: options[:messages]
-    )
+    configuration.logger.error("[llm_optimizer] #{e.class}: #{e.message}\n#{e.backtrace&.first(5)&.join("\n")}")
+    fallback_result(original_prompt, original_tokens, options, start)
   end
 
-  # Private helpers
-
-  class << self
-    private
-
-    def raw_llm_call(prompt, model:, config: nil)
-      caller = config&.llm_caller || @_current_llm_caller
-      unless caller
-        raise ConfigurationError,
-              "No llm_caller configured. " \
-              "Set it via LlmOptimizer.configure { |c| c.llm_caller = ->(prompt, model:) { ... } }"
-      end
+  def self.optimize_pre_call(prompt, config = configuration)
+    prompt = Compressor.new.compress(prompt) if config.compress_prompt
+    model_tier = ModelRouter.new(config).route(prompt)
+    model = model_tier == :simple ? config.simple_model : config.complex_model
 
-      caller.call(prompt, model: model)
+    unless config.use_semantic_cache && config.redis_url
+      return { prompt: prompt, model: model, model_tier: model_tier,
+               embedding: nil, cache_status: :miss, response: nil }
     end
 
-    def elapsed_ms(start)
-      ((Process.clock_gettime(Process::CLOCK_MONOTONIC) - start) * 1000).round(2)
+    embedding, result = semantic_cache_lookup(prompt, model, model_tier, nil, nil,
+                                              prompt, Process.clock_gettime(Process::CLOCK_MONOTONIC), config)
+    if result
+      return { prompt: prompt, model: model, model_tier: model_tier,
+               embedding: embedding, cache_status: :hit, response: result.response }
     end
 
-    def emit_log(logger, config, cache_status:, model_tier:, original_tokens:,
-                 compressed_tokens:, latency_ms:, prompt:, response:)
-      logger.info(
-        "[llm_optimizer] { cache_status: #{cache_status.inspect}, " \
-        "model_tier: #{model_tier.inspect}, " \
-        "original_tokens: #{original_tokens.inspect}, " \
-        "compressed_tokens: #{compressed_tokens.inspect}, " \
-        "latency_ms: #{latency_ms.inspect} }"
-      )
-
-      return unless config.debug_logging
-
-      logger.debug("[llm_optimizer] prompt=#{prompt.inspect} response=#{response.inspect}")
-    end
+    { prompt: prompt, model: model, model_tier: model_tier,
+      embedding: embedding, cache_status: :miss, response: nil }
+  end
 
-    def build_redis(redis_url)
-      require "redis"
-      Redis.new(url: redis_url)
-    end
+  def self.optimize_post_call(pre_call_result, response, config = configuration)
+    store_in_cache(pre_call_result[:embedding], response, config)
   end
 end
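The `WrapperModule` change above splits optimization into a pre-call step (routing, cache lookup, possible short-circuit) and a post-call step (cache store) around `super`. A minimal, self-contained sketch of that `Module#prepend` interception pattern, with a plain Hash standing in for the semantic cache (all names below are illustrative, not the gem's classes):

```ruby
# Prepending a module puts its #chat ahead of the class's own #chat in
# the method lookup chain, so it can short-circuit on a cache hit or
# rewrite params before delegating with `super`.
module Wrapper
  CACHE = {} # hypothetical cache keyed by prompt text

  def chat(params)
    if (hit = CACHE[params[:prompt]])
      return hit # pre-call step found a cached response: skip the client
    end
    response = super(params.merge(model: "cheap-model")) # rewritten params
    CACHE[params[:prompt]] = response # post-call step: store the response
    response
  end
end

class Client
  def chat(params)
    "response to #{params[:prompt]} via #{params[:model]}"
  end
end

# Idempotent wrapping, mirroring the wrap_client guard in the diff above
Client.prepend(Wrapper) unless Client.ancestors.include?(Wrapper)

client = Client.new
first = client.chat(prompt: "hi", model: "big-model")   # miss: hits Client#chat
second = client.chat(prompt: "hi", model: "big-model")  # hit: served from CACHE
puts first
puts second
```

Because `prepend` (unlike `include`) places the module before the class itself, no monkey-patching of `Client#chat` is needed and `super` still reaches the original method.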
metadata CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: llm_optimizer
 version: !ruby/object:Gem::Version
-  version: 0.1.3
+  version: 0.1.5
 platform: ruby
 authors:
 - arun kumar
@@ -100,10 +100,12 @@ files:
 - lib/llm_optimizer.rb
 - lib/llm_optimizer/compressor.rb
 - lib/llm_optimizer/configuration.rb
+- lib/llm_optimizer/conversation_store.rb
 - lib/llm_optimizer/embedding_client.rb
 - lib/llm_optimizer/history_manager.rb
 - lib/llm_optimizer/model_router.rb
 - lib/llm_optimizer/optimize_result.rb
+- lib/llm_optimizer/pipeline.rb
 - lib/llm_optimizer/railtie.rb
 - lib/llm_optimizer/semantic_cache.rb
 - lib/llm_optimizer/version.rb