llm_optimizer 0.1.0

checksums.yaml ADDED
@@ -0,0 +1,7 @@
+ ---
+ SHA256:
+   metadata.gz: e68575c7d9fba996b2efeb9dd559635bba72316322b75a1f43b0e6b2a5e28fce
+   data.tar.gz: b8b3f8e06da0d860af65a192b3a4fb16bfe8110fd3eca69d328fd5ef71471571
+ SHA512:
+   metadata.gz: 6679cbc09844d71e3c42e74e313d5366bcafbfdeb7e6625a6f0ad591bb8fc98687ba3040a6c54703388bc81be614fd9095d5f71eb0b3eac80fdf0c43299445ee
+   data.tar.gz: '095e4ac8ef5f45240f9d0d5068cf462d66900258765d184f1e25b2e9233c782d5363a248dc178167ad83055fa0d6a006bb2631a83e94c482fcd03fed947c41d4'
data/CHANGELOG.md ADDED
@@ -0,0 +1,33 @@
+ # Changelog
+ 
+ All notable changes to this project will be documented in this file.
+ 
+ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
+ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
+ 
+ ## [Unreleased]
+ 
+ ## [0.1.0] - 2026-04-10
+ 
+ ### Added
+ 
+ - `LlmOptimizer.optimize(prompt, options = {}, &block)` — primary entry point returning an `OptimizeResult`
+ - `LlmOptimizer.configure` — global configuration with merge semantics (multiple calls merge without resetting)
+ - `LlmOptimizer.reset_configuration!` — resets global config to defaults (useful in tests)
+ - `LlmOptimizer.wrap_client(client_class)` — opt-in idempotent client wrapping via module prepend
+ - **Semantic Caching** — Redis-backed vector similarity cache using cosine similarity; configurable threshold and TTL
+ - **Intelligent Model Routing** — heuristic classifier routing prompts to `:simple` or `:complex` model tier based on word count, code blocks, and keywords
+ - **Token Pruning / Compressor** — English stop-word removal with fenced code block preservation; `estimate_tokens` helper
+ - **Conversation History Sliding Window** — summarizes oldest messages when token budget is exceeded; falls back to original messages on LLM failure
+ - **EmbeddingClient** — injectable `embedding_caller` lambda with OpenAI fallback via `OPENAI_API_KEY`
+ - **`llm_caller`** — injectable lambda to wire any LLM provider (RubyLLM, ruby-openai, Anthropic, Bedrock, etc.)
+ - **Rails generator** — `rails generate llm_optimizer:install` creates a pre-filled initializer
+ - **Railtie** — auto-loads generator when used in a Rails app
+ - **Structured logging** — INFO log per optimize call (no prompt content); DEBUG log with full prompt/response when `debug_logging: true`
+ - **Resilience** — all component failures fall through to raw LLM call; `EmbeddingError` treated as cache miss
+ - Full exception hierarchy: `LlmOptimizer::Error`, `ConfigurationError`, `EmbeddingError`, `TimeoutError`
+ - `OptimizeResult` struct with `response`, `model`, `model_tier`, `cache_status`, `original_tokens`, `compressed_tokens`, `latency_ms`, `messages`
+ - Unit test suite covering all components with positive and negative scenarios using Minitest + Mocha
+ 
+ [Unreleased]: https://github.com/arunkumarry/llm_optimizer/compare/v0.1.0...HEAD
+ [0.1.0]: https://github.com/arunkumarry/llm_optimizer/releases/tag/v0.1.0
data/CODE_OF_CONDUCT.md ADDED
@@ -0,0 +1,132 @@
+ # Contributor Covenant Code of Conduct
+ 
+ ## Our Pledge
+ 
+ We as members, contributors, and leaders pledge to make participation in our
+ community a harassment-free experience for everyone, regardless of age, body
+ size, visible or invisible disability, ethnicity, sex characteristics, gender
+ identity and expression, level of experience, education, socio-economic status,
+ nationality, personal appearance, race, caste, color, religion, or sexual
+ identity and orientation.
+ 
+ We pledge to act and interact in ways that contribute to an open, welcoming,
+ diverse, inclusive, and healthy community.
+ 
+ ## Our Standards
+ 
+ Examples of behavior that contributes to a positive environment for our
+ community include:
+ 
+ * Demonstrating empathy and kindness toward other people
+ * Being respectful of differing opinions, viewpoints, and experiences
+ * Giving and gracefully accepting constructive feedback
+ * Accepting responsibility and apologizing to those affected by our mistakes,
+   and learning from the experience
+ * Focusing on what is best not just for us as individuals, but for the overall
+   community
+ 
+ Examples of unacceptable behavior include:
+ 
+ * The use of sexualized language or imagery, and sexual attention or advances of
+   any kind
+ * Trolling, insulting or derogatory comments, and personal or political attacks
+ * Public or private harassment
+ * Publishing others' private information, such as a physical or email address,
+   without their explicit permission
+ * Other conduct which could reasonably be considered inappropriate in a
+   professional setting
+ 
+ ## Enforcement Responsibilities
+ 
+ Community leaders are responsible for clarifying and enforcing our standards of
+ acceptable behavior and will take appropriate and fair corrective action in
+ response to any behavior that they deem inappropriate, threatening, offensive,
+ or harmful.
+ 
+ Community leaders have the right and responsibility to remove, edit, or reject
+ comments, commits, code, wiki edits, issues, and other contributions that are
+ not aligned to this Code of Conduct, and will communicate reasons for moderation
+ decisions when appropriate.
+ 
+ ## Scope
+ 
+ This Code of Conduct applies within all community spaces, and also applies when
+ an individual is officially representing the community in public spaces.
+ Examples of representing our community include using an official email address,
+ posting via an official social media account, or acting as an appointed
+ representative at an online or offline event.
+ 
+ ## Enforcement
+ 
+ Instances of abusive, harassing, or otherwise unacceptable behavior may be
+ reported to the community leaders responsible for enforcement at
+ [INSERT CONTACT METHOD].
+ All complaints will be reviewed and investigated promptly and fairly.
+ 
+ All community leaders are obligated to respect the privacy and security of the
+ reporter of any incident.
+ 
+ ## Enforcement Guidelines
+ 
+ Community leaders will follow these Community Impact Guidelines in determining
+ the consequences for any action they deem in violation of this Code of Conduct:
+ 
+ ### 1. Correction
+ 
+ **Community Impact**: Use of inappropriate language or other behavior deemed
+ unprofessional or unwelcome in the community.
+ 
+ **Consequence**: A private, written warning from community leaders, providing
+ clarity around the nature of the violation and an explanation of why the
+ behavior was inappropriate. A public apology may be requested.
+ 
+ ### 2. Warning
+ 
+ **Community Impact**: A violation through a single incident or series of
+ actions.
+ 
+ **Consequence**: A warning with consequences for continued behavior. No
+ interaction with the people involved, including unsolicited interaction with
+ those enforcing the Code of Conduct, for a specified period of time. This
+ includes avoiding interactions in community spaces as well as external channels
+ like social media. Violating these terms may lead to a temporary or permanent
+ ban.
+ 
+ ### 3. Temporary Ban
+ 
+ **Community Impact**: A serious violation of community standards, including
+ sustained inappropriate behavior.
+ 
+ **Consequence**: A temporary ban from any sort of interaction or public
+ communication with the community for a specified period of time. No public or
+ private interaction with the people involved, including unsolicited interaction
+ with those enforcing the Code of Conduct, is allowed during this period.
+ Violating these terms may lead to a permanent ban.
+ 
+ ### 4. Permanent Ban
+ 
+ **Community Impact**: Demonstrating a pattern of violation of community
+ standards, including sustained inappropriate behavior, harassment of an
+ individual, or aggression toward or disparagement of classes of individuals.
+ 
+ **Consequence**: A permanent ban from any sort of public interaction within the
+ community.
+ 
+ ## Attribution
+ 
+ This Code of Conduct is adapted from the [Contributor Covenant][homepage],
+ version 2.1, available at
+ [https://www.contributor-covenant.org/version/2/1/code_of_conduct.html][v2.1].
+ 
+ Community Impact Guidelines were inspired by
+ [Mozilla's code of conduct enforcement ladder][Mozilla CoC].
+ 
+ For answers to common questions about this code of conduct, see the FAQ at
+ [https://www.contributor-covenant.org/faq][FAQ]. Translations are available at
+ [https://www.contributor-covenant.org/translations][translations].
+ 
+ [homepage]: https://www.contributor-covenant.org
+ [v2.1]: https://www.contributor-covenant.org/version/2/1/code_of_conduct.html
+ [Mozilla CoC]: https://github.com/mozilla/diversity
+ [FAQ]: https://www.contributor-covenant.org/faq
+ [translations]: https://www.contributor-covenant.org/translations
data/LICENSE.txt ADDED
@@ -0,0 +1,21 @@
+ The MIT License (MIT)
+ 
+ Copyright (c) 2026 arun kumar
+ 
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+ 
+ The above copyright notice and this permission notice shall be included in
+ all copies or substantial portions of the Software.
+ 
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ THE SOFTWARE.
data/README.md ADDED
@@ -0,0 +1,243 @@
+ # llm_optimizer
+ 
+ A Smart Gateway for LLM API calls in Ruby and Rails applications. Reduces token usage and API costs through four composable optimizations — all opt-in, all independently configurable.
+ 
+ ## How it works
+ 
+ Every call to `LlmOptimizer.optimize` passes through an ordered pipeline:
+ 
+ ```
+ prompt → Compressor → ModelRouter → SemanticCache lookup → HistoryManager → LLM call → SemanticCache store → OptimizeResult
+ ```
+ 
+ Each stage is independently enabled via configuration flags. If any stage fails, the gem falls through to a raw LLM call — your app never breaks because of the optimizer.
+ 
+ ## Optimizations
+ 
+ ### 1. Semantic Caching
+ Stores prompt embeddings in Redis. On subsequent calls, computes cosine similarity against stored embeddings. If similarity ≥ threshold, returns the cached response instantly — no LLM call made.
+ 
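+ A minimal sketch of the similarity math (toy 3-dimensional vectors; real embeddings have hundreds of dimensions; see `SemanticCache#cosine_similarity` later in this diff):
+ 
+ ```ruby
+ cached = [0.12, 0.30, 0.58]
+ query  = [0.11, 0.31, 0.57]
+ 
+ dot   = query.zip(cached).sum { |a, b| a * b }
+ mag   = ->(v) { Math.sqrt(v.sum { |x| x * x }) }
+ score = dot / (mag.call(query) * mag.call(cached))
+ 
+ score >= 0.96 # => true here: a hit at the default similarity_threshold
+ ```
+ 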
+ ### 2. Intelligent Model Routing
+ Classifies each prompt using a heuristic and routes it to the appropriate model tier (see the example after this list):
+ - **Simple** — short prompts (< 20 words), no code blocks, no complex keywords → cheaper/faster model
+ - **Complex** — code blocks, keywords like `analyze`, `refactor`, `debug`, `architect`, `explain in detail` → premium model
+ 
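+ A quick illustration, calling the router directly with the default `:auto` setting:
+ 
+ ```ruby
+ router = LlmOptimizer::ModelRouter.new(LlmOptimizer.configuration)
+ 
+ router.route("What is Redis?")                 # => :simple  (3 words, no keywords)
+ router.route("Refactor this service object")   # => :complex (keyword "refactor")
+ router.route("Explain in detail how GC works") # => :complex (phrase match)
+ ```
+ 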
+ ### 3. Token Pruning
+ Removes common English stop words from prompts before sending to the LLM. Preserves fenced code block content unchanged. Typically reduces token count by 10–20%.
+ 
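+ For example, with the gem's rough 4-characters-per-token estimate:
+ 
+ ```ruby
+ compressor = LlmOptimizer::Compressor.new
+ 
+ before = "What is the purpose of a semaphore in an operating system?"
+ after  = compressor.compress(before)
+ # => "What purpose semaphore operating system?"
+ 
+ compressor.estimate_tokens(before) # => 15
+ compressor.estimate_tokens(after)  # => 10
+ ```
+ 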
+ ### 4. Conversation History Sliding Window
+ When a conversation history exceeds the configured token budget, summarizes the oldest messages using the simple model and replaces them with a single system summary message.
+ 
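+ A self-contained illustration using the history manager directly, with a stubbed `llm_caller` (the summary text is made up):
+ 
+ ```ruby
+ history = Array.new(12) { |i| { role: "user", content: "message #{i} " * 50 } }
+ 
+ manager = LlmOptimizer::HistoryManager.new(
+   llm_caller: ->(prompt, model:) { "Summary of the first ten messages." },
+   simple_model: "gpt-4o-mini",
+   token_budget: 100
+ )
+ 
+ trimmed = manager.process(history)
+ trimmed.first  # => { role: "system", content: "Summary of the first ten messages." }
+ trimmed.length # => 3 (one summary message plus the two newest messages)
+ ```
+ 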
+ ## Installation
+ 
+ Add to your Gemfile:
+ 
+ ```ruby
+ gem "llm_optimizer"
+ ```
+ 
+ Then run:
+ 
+ ```bash
+ bundle install
+ ```
+ 
+ For Rails apps, generate the initializer:
+ 
+ ```bash
+ rails generate llm_optimizer:install
+ ```
+ 
+ This creates `config/initializers/llm_optimizer.rb` with all options pre-filled and commented.
+ 
+ ## Quick Start
+ 
+ ```ruby
+ LlmOptimizer.configure do |config|
+   config.compress_prompt = true
+   config.use_semantic_cache = true
+   config.redis_url = ENV["REDIS_URL"]
+ 
+   # Wire up your app's LLM client
+   config.llm_caller = ->(prompt, model:) {
+     # Use whatever LLM client your app already has
+     MyLlmService.chat(prompt, model: model)
+   }
+ 
+   # Wire up your embeddings provider (required if use_semantic_cache: true)
+   config.embedding_caller = ->(text) {
+     MyEmbeddingService.embed(text)
+   }
+ end
+ 
+ result = LlmOptimizer.optimize("What is Redis?")
+ 
+ puts result.response # => "Redis is an in-memory data store..."
+ puts result.cache_status # => :hit or :miss
+ puts result.model_tier # => :simple or :complex
+ puts result.model # => "gpt-4o-mini"
+ puts result.original_tokens # => 5
+ puts result.compressed_tokens # => 4
+ puts result.latency_ms # => 12.4
+ ```
+ 
+ ## Configuration
+ 
+ ### Rails initializer
+ 
+ ```ruby
+ LlmOptimizer.configure do |config|
+   # Feature flags — all off by default
+   config.compress_prompt = true # strip stop words before sending to LLM
+   config.use_semantic_cache = true # cache responses by vector similarity
+   config.manage_history = true # summarize old messages when over token budget
+ 
+   # Model routing
+   config.route_to = :auto # :auto | :simple | :complex
+   config.simple_model = "gpt-4o-mini" # model used for simple prompts
+   config.complex_model = "claude-3-5-sonnet-20241022" # model used for complex prompts
+ 
+   # Redis (required if use_semantic_cache: true)
+   config.redis_url = ENV["REDIS_URL"]
+ 
+   # Tuning
+   config.similarity_threshold = 0.96 # cosine similarity cutoff for cache hit (0.0–1.0)
+   config.token_budget = 4000 # token limit before history summarization
+   config.cache_ttl = 86400 # cache TTL in seconds (default: 24h)
+   config.timeout_seconds = 5 # timeout for external API calls
+ 
+   # Logging
+   config.logger = Rails.logger
+   config.debug_logging = Rails.env.development? # logs full prompt+response at DEBUG level
+ 
+   # LLM caller — wire to your existing LLM client (required)
+   config.llm_caller = ->(prompt, model:) {
+     RubyLLM.chat(model: model, assume_model_exists: true).ask(prompt).content
+   }
+ 
+   # Embeddings caller — wire to your embeddings provider (required if use_semantic_cache: true)
+   # Falls back to OpenAI via ENV["OPENAI_API_KEY"] if not set
+   config.embedding_caller = ->(text) {
+     MyEmbeddingService.embed(text)
+   }
+ end
+ ```
+ 
+ ### Configuration reference
+ 
+ | Key | Type | Default | Description |
+ |---|---|---|---|
+ | `compress_prompt` | Boolean | `false` | Strip stop words before sending to LLM |
+ | `use_semantic_cache` | Boolean | `false` | Enable Redis-backed semantic cache |
+ | `manage_history` | Boolean | `false` | Enable conversation history summarization |
+ | `route_to` | Symbol | `:auto` | `:auto`, `:simple`, or `:complex` |
+ | `simple_model` | String | `"gpt-4o-mini"` | Model for simple prompts |
+ | `complex_model` | String | `"claude-3-5-sonnet-20241022"` | Model for complex prompts |
+ | `similarity_threshold` | Float | `0.96` | Minimum cosine similarity for cache hit |
+ | `token_budget` | Integer | `4000` | Token limit before history summarization |
+ | `cache_ttl` | Integer | `86400` | Cache entry TTL in seconds |
+ | `timeout_seconds` | Integer | `5` | Timeout for external API calls |
+ | `redis_url` | String | `nil` | Redis connection URL |
+ | `embedding_model` | String | `"text-embedding-3-small"` | Embedding model name (OpenAI fallback) |
+ | `logger` | Logger | `Logger.new($stdout)` | Any Logger-compatible object |
+ | `debug_logging` | Boolean | `false` | Log full prompt and response at DEBUG level |
+ | `llm_caller` | Lambda | `nil` | `(prompt, model:) -> String` |
+ | `embedding_caller` | Lambda | `nil` | `(text) -> Array<Float>` |
+ 
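+ Setting a key outside this table fails fast at configure time, which catches typos early:
+ 
+ ```ruby
+ LlmOptimizer.configure { |c| c.semantic_cache = true } # typo: the key is use_semantic_cache
+ # => raises LlmOptimizer::ConfigurationError: Unknown configuration key: semantic_cache
+ ```
+ 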
+ ## Per-call configuration
+ 
+ Override global config for a single call using a block:
+ 
+ ```ruby
+ result = LlmOptimizer.optimize(prompt) do |config|
+   config.route_to = :simple
+   config.compress_prompt = false
+ end
+ ```
+ 
+ ## Conversation history
+ 
+ Pass a `messages` array (with `manage_history` enabled) to activate history management:
+ 
+ ```ruby
+ messages = [
+   { role: "user", content: "Tell me about Redis" },
+   { role: "assistant", content: "Redis is an in-memory data store..." },
+   # ... more messages
+ ]
+ 
+ result = LlmOptimizer.optimize("What else can it do?", messages: messages)
+ 
+ # result.messages contains the (possibly summarized) messages array
+ ```
+ 
+ ## Opt-in client wrapping
+ 
+ Transparently wrap an existing LLM client class so all calls through it are automatically optimized:
+ 
+ ```ruby
+ LlmOptimizer.wrap_client(OpenAI::Client)
+ ```
+ 
+ This prepends the optimization pipeline into the client's `chat` method. Safe to call multiple times — idempotent.
+ 
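+ Wrapping is a module prepend under the hood, so repeat calls are no-ops:
+ 
+ ```ruby
+ LlmOptimizer.wrap_client(OpenAI::Client)
+ LlmOptimizer.wrap_client(OpenAI::Client) # no-op: the wrapper is already in the ancestor chain
+ 
+ # Assuming nothing else has been prepended onto the class:
+ OpenAI::Client.ancestors.first # => LlmOptimizer::WrapperModule
+ ```
+ 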
+ ## OptimizeResult
+ 
+ Every call returns an `OptimizeResult` struct:
+ 
+ | Field | Type | Description |
+ |---|---|---|
+ | `response` | String | The LLM response text |
+ | `model` | String | Model name actually used |
+ | `model_tier` | Symbol | `:simple` or `:complex` |
+ | `cache_status` | Symbol | `:hit` or `:miss` |
+ | `original_tokens` | Integer | Estimated token count before compression |
+ | `compressed_tokens` | Integer | Estimated token count after compression (`nil` if not compressed) |
+ | `latency_ms` | Float | Total wall-clock time for the optimize call |
+ | `messages` | Array | Final messages array (for history management) |
+ 
+ ## Error handling
+ 
+ The gem defines a hierarchy of errors, all inheriting from `LlmOptimizer::Error`:
+ 
+ ```
+ LlmOptimizer::Error
+ ├── LlmOptimizer::ConfigurationError # unknown config key, missing llm_caller
+ ├── LlmOptimizer::EmbeddingError # embedding API failure
+ └── LlmOptimizer::TimeoutError # network timeout exceeded
+ ```
+ 
+ The gateway catches all component failures and falls through to a raw LLM call with the original prompt. Your app's core functionality is never blocked by the optimizer.
+ 
+ ## Resilience
+ 
+ | Failure | Behavior |
+ |---|---|
+ | Redis unavailable (read) | Treat as cache miss, continue |
+ | Redis unavailable (write) | Log warning, return LLM result normally |
+ | Embedding API failure | Treat as cache miss, continue |
+ | Any component exception | Log error, fall through to raw LLM call |
+ | History summarization failure | Log error, return original messages unchanged |
+ 
+ ## Development
+ 
+ ```bash
+ bundle install
+ bundle exec rake test # run tests
+ bundle exec rake rubocop # lint
+ bundle exec rake # test + lint
+ ```
+ 
+ Generate the Rails initializer in a target app:
+ 
+ ```bash
+ rails generate llm_optimizer:install
+ ```
+ 
+ ## License
+ 
+ MIT
+ 
+ ---
+ 
+ [GitHub](https://github.com/arunkumarry/llm_optimizer) · [RubyGems](https://rubygems.org/gems/llm_optimizer) · [Changelog](https://github.com/arunkumarry/llm_optimizer/blob/main/CHANGELOG.md)
data/Rakefile ADDED
@@ -0,0 +1,15 @@
+ # frozen_string_literal: true
+ 
+ require "bundler/gem_tasks"
+ require "minitest/test_task"
+ 
+ Minitest::TestTask.create(:test) do |t|
+   t.libs << "test"
+   t.test_globs = ["test/test_*.rb", "test/unit/test_*.rb"]
+ end
+ 
+ require "rubocop/rake_task"
+ 
+ RuboCop::RakeTask.new
+ 
+ task default: %i[test rubocop]
data/lib/generators/llm_optimizer/install_generator.rb ADDED
@@ -0,0 +1,17 @@
+ # frozen_string_literal: true
+ 
+ require "rails/generators"
+ 
+ module LlmOptimizer
+   module Generators
+     class InstallGenerator < Rails::Generators::Base
+       source_root File.expand_path("templates", __dir__)
+ 
+       desc "Creates a LlmOptimizer initializer in your Rails app"
+ 
+       def copy_initializer
+         template "initializer.rb", "config/initializers/llm_optimizer.rb"
+       end
+     end
+   end
+ end
data/lib/generators/llm_optimizer/templates/initializer.rb ADDED
@@ -0,0 +1,68 @@
+ # frozen_string_literal: true
+ 
+ # LlmOptimizer initializer
+ # Run `rails generate llm_optimizer:install` to regenerate this file.
+ #
+ # Docs: https://github.com/arunkumarry/llm_optimizer
+ 
+ LlmOptimizer.configure do |config|
+   # --- Feature flags ---
+   # All optimizations are off by default. Enable what you need.
+   config.compress_prompt = false # strip stop words before sending to LLM
+   config.use_semantic_cache = false # cache responses by vector similarity in Redis
+   config.manage_history = false # summarize old messages when over token budget
+ 
+   # --- Model routing ---
+   # :auto classifies each prompt; :simple or :complex forces a tier
+   config.route_to = :auto
+   config.simple_model = "gpt-4o-mini"
+   config.complex_model = "gpt-4o"
+ 
+   # --- Redis (required only if use_semantic_cache: true) ---
+   config.redis_url = ENV.fetch("REDIS_URL", nil)
+ 
+   # --- Tuning ---
+   config.similarity_threshold = 0.96 # cosine similarity cutoff for a cache hit
+   config.token_budget = 4000 # token limit before history summarization kicks in
+   config.cache_ttl = 86400 # cache entry TTL in seconds (default: 24h)
+   config.timeout_seconds = 5 # timeout for embedding / external API calls
+ 
+   # --- Logging ---
+   config.logger = Rails.logger
+   config.debug_logging = Rails.env.development?
+ 
+   # --- LLM caller (required) ---
+   # Wire this up to however your app already calls the LLM.
+   #
+   # Example with ruby-openai:
+   #   config.llm_caller = ->(prompt, model:) {
+   #     OpenAI::Client.new(access_token: ENV["OPENAI_API_KEY"])
+   #       .chat(parameters: { model: model, messages: [{ role: "user", content: prompt }] })
+   #       .dig("choices", 0, "message", "content")
+   #   }
+   #
+   # Example with a shared service object:
+   #   config.llm_caller = ->(prompt, model:) {
+   #     provider = if model.include?("claude") then :anthropic
+   #                elsif model.include?("gpt") then :openai
+   #                elsif model.include?("gemini") then :gemini
+   #                elsif model.include?("nova") || model.include?("amazon") then :bedrock
+   #                else :ollama
+   #                end
+   #     RubyLLM.chat(model: model, provider: provider, assume_model_exists: true).ask(prompt).content
+   #   }
+   #
+   config.llm_caller = ->(prompt, model:) {
+     raise NotImplementedError, "[llm_optimizer] llm_caller is not configured. " \
+                                "Edit config/initializers/llm_optimizer.rb and wire it to your LLM client."
+   }
+ 
+   # --- Embeddings caller (optional) ---
+   # Only needed if use_semantic_cache: true.
+   # If omitted, falls back to OpenAI via ENV["OPENAI_API_KEY"].
+   #
+   # Example:
+   #   config.embedding_caller = ->(text) { EmbeddingService.embed(text) }
+   #
+   # config.embedding_caller = nil
+ end
data/lib/llm_optimizer/compressor.rb ADDED
@@ -0,0 +1,47 @@
+ # frozen_string_literal: true
+ 
+ require "set"
+ 
+ module LlmOptimizer
+   class Compressor
+     STOP_WORDS = %w[
+       the a an is are was were be been being
+       of in to for on at by with from as into
+       through during before after above below
+       between out off over under again further
+       then once
+     ].freeze
+ 
+     STOP_SET = STOP_WORDS.to_set.freeze
+ 
+     FENCE_RE = /(```[\s\S]*?```|~~~[\s\S]*?~~~)/
+ 
+     def initialize(slm_client: nil)
+       @slm_client = slm_client
+     end
+ 
+     def compress(prompt)
+       segments = prompt.split(FENCE_RE)
+ 
+       processed = segments.map.with_index do |segment, i|
+         # Odd-indexed segments are fenced code blocks (captured group);
+         # leave them untouched so code content is preserved exactly.
+         if i.odd?
+           segment
+         else
+           remove_stop_words(segment)
+         end
+       end
+ 
+       # Whitespace in prose segments is already normalized by remove_stop_words;
+       # collapsing it here on the joined string would mangle code blocks.
+       processed.join.strip
+     end
+ 
+     def estimate_tokens(text)
+       # Rough heuristic: ~4 characters per token for English text.
+       (text.length / 4.0).ceil
+     end
+ 
+     private
+ 
+     def remove_stop_words(text)
+       # split(" ") splits on any whitespace run, so this also normalizes spacing.
+       text.split(" ").reject { |w| STOP_SET.include?(w.downcase) }.join(" ")
+     end
+   end
+ end
data/lib/llm_optimizer/configuration.rb ADDED
@@ -0,0 +1,79 @@
+ # frozen_string_literal: true
+ 
+ require "logger"
+ require "set"
+ 
+ module LlmOptimizer
+   class Configuration
+     KNOWN_KEYS = %i[
+       use_semantic_cache
+       compress_prompt
+       manage_history
+       route_to
+       similarity_threshold
+       token_budget
+       redis_url
+       embedding_model
+       simple_model
+       complex_model
+       logger
+       debug_logging
+       timeout_seconds
+       cache_ttl
+       llm_caller
+       embedding_caller
+     ].freeze
+ 
+     # Define readers for all known keys (setters below track explicit sets)
+     KNOWN_KEYS.each { |key| define_method(key) { instance_variable_get(:"@#{key}") } }
+ 
+     def initialize
+       @explicitly_set = Set.new
+ 
+       @use_semantic_cache = false
+       @compress_prompt = false
+       @manage_history = false
+       @route_to = :auto
+       @similarity_threshold = 0.96
+       @token_budget = 4000
+       @redis_url = nil
+       @embedding_model = "text-embedding-3-small"
+       @simple_model = "gpt-4o-mini"
+       @complex_model = "claude-3-5-sonnet-20241022"
+       @logger = Logger.new($stdout)
+       @debug_logging = false
+       @timeout_seconds = 5
+       @cache_ttl = 86400
+       @llm_caller = nil
+       @embedding_caller = nil
+     end
+ 
+     # Copies only explicitly set keys from other_config without resetting unmentioned keys.
+     def merge!(other_config)
+       other_config.instance_variable_get(:@explicitly_set).each do |key|
+         public_send(:"#{key}=", other_config.public_send(key))
+       end
+       self
+     end
+ 
+     # Any getter or setter outside KNOWN_KEYS is a configuration typo; fail loudly.
+     def method_missing(name, *args, &block)
+       key = name.to_s.chomp("=").to_sym
+       raise ConfigurationError, "Unknown configuration key: #{key}" unless KNOWN_KEYS.include?(key)
+ 
+       super
+     end
+ 
+     def respond_to_missing?(name, include_private = false)
+       key = name.to_s.chomp("=").to_sym
+       KNOWN_KEYS.include?(key) || super
+     end
+ 
+     # Setters for all known keys; each records that the key was explicitly set
+     # so merge! can copy only the keys a configure block actually touched.
+     KNOWN_KEYS.each do |key|
+       define_method(:"#{key}=") do |value|
+         @explicitly_set << key
+         instance_variable_set(:"@#{key}", value)
+       end
+     end
+   end
+ end
data/lib/llm_optimizer/embedding_client.rb ADDED
@@ -0,0 +1,61 @@
+ # frozen_string_literal: true
+ 
+ require "net/http"
+ require "uri"
+ require "json"
+ 
+ module LlmOptimizer
+   class EmbeddingClient
+     OPENAI_ENDPOINT = "https://api.openai.com/v1/embeddings"
+ 
+     def initialize(model:, timeout_seconds:, embedding_caller: nil)
+       @model = model
+       @timeout_seconds = timeout_seconds
+       @embedding_caller = embedding_caller
+     end
+ 
+     def embed(text)
+       if @embedding_caller
+         @embedding_caller.call(text)
+       else
+         embed_via_openai(text)
+       end
+     rescue EmbeddingError
+       raise
+     rescue StandardError => e
+       raise EmbeddingError, "Embedding request failed: #{e.message}"
+     end
+ 
+     private
+ 
+     def embed_via_openai(text)
+       api_key = ENV["OPENAI_API_KEY"]
+       raise EmbeddingError, "OPENAI_API_KEY is not set and no embedding_caller configured" if api_key.nil? || api_key.empty?
+ 
+       uri = URI(OPENAI_ENDPOINT)
+       body = JSON.generate({ model: @model, input: text })
+ 
+       http = Net::HTTP.new(uri.host, uri.port)
+       http.use_ssl = true
+       http.open_timeout = @timeout_seconds
+       http.read_timeout = @timeout_seconds
+ 
+       request = Net::HTTP::Post.new(uri.path)
+       request["Content-Type"] = "application/json"
+       request["Authorization"] = "Bearer #{api_key}"
+       request.body = body
+ 
+       response = http.request(request)
+ 
+       unless response.is_a?(Net::HTTPSuccess)
+         raise EmbeddingError, "OpenAI embeddings API returned #{response.code}: #{response.body}"
+       end
+ 
+       parsed = JSON.parse(response.body)
+       parsed.dig("data", 0, "embedding") or
+         raise EmbeddingError, "Unexpected response shape: #{response.body}"
+     rescue Net::OpenTimeout, Net::ReadTimeout => e
+       raise EmbeddingError, "Embedding request timed out: #{e.message}"
+     end
+   end
+ end
data/lib/llm_optimizer/history_manager.rb ADDED
@@ -0,0 +1,43 @@
+ # frozen_string_literal: true
+ 
+ module LlmOptimizer
+   class HistoryManager
+     SUMMARIZE_COUNT = 10
+ 
+     def initialize(llm_caller:, simple_model:, token_budget:)
+       @llm_caller = llm_caller
+       @simple_model = simple_model
+       @token_budget = token_budget
+     end
+ 
+     def estimate_tokens(messages)
+       total_chars = messages.sum { |m| (m[:content] || m["content"] || "").length }
+       total_chars / 4
+     end
+ 
+     def process(messages)
+       return messages if estimate_tokens(messages) <= @token_budget
+ 
+       count = [SUMMARIZE_COUNT, messages.length].min
+       to_summarize = messages.first(count)
+       remainder = messages.drop(count)
+ 
+       summary = summarize(to_summarize)
+       return messages if summary.nil?
+ 
+       [{ role: "system", content: summary }] + remainder
+     end
+ 
+     private
+ 
+     def summarize(messages)
+       conversation = messages.map { |m| "#{m[:role] || m["role"]}: #{m[:content] || m["content"]}" }.join("\n")
+       prompt = "Summarize the following conversation history concisely, preserving key facts and decisions:\n\n#{conversation}"
+ 
+       @llm_caller.call(prompt, model: @simple_model)
+     rescue StandardError => e
+       warn "[llm_optimizer] HistoryManager summarization failed: #{e.message}"
+       nil
+     end
+   end
+ end
data/lib/llm_optimizer/model_router.rb ADDED
@@ -0,0 +1,32 @@
+ # frozen_string_literal: true
+ 
+ module LlmOptimizer
+   class ModelRouter
+     COMPLEX_KEYWORDS = %w[analyze refactor debug architect].freeze
+     COMPLEX_PHRASES = ["explain in detail"].freeze
+     CODE_BLOCK_RE = /```|~~~/
+ 
+     def initialize(config)
+       @config = config
+     end
+ 
+     def route(prompt)
+       # explicit override
+       return @config.route_to if @config.route_to == :simple || @config.route_to == :complex
+ 
+       # fenced code block
+       return :complex if CODE_BLOCK_RE.match?(prompt)
+ 
+       # complex keywords or phrases
+       lower = prompt.downcase
+       return :complex if COMPLEX_KEYWORDS.any? { |kw| lower.include?(kw) }
+       return :complex if COMPLEX_PHRASES.any? { |ph| lower.include?(ph) }
+ 
+       # short prompt
+       return :simple if prompt.split.length < 20
+ 
+       # default
+       :complex
+     end
+   end
+ end
data/lib/llm_optimizer/optimize_result.rb ADDED
@@ -0,0 +1,9 @@
+ # frozen_string_literal: true
+ 
+ module LlmOptimizer
+   OptimizeResult = Struct.new(
+     :response, :model, :model_tier, :cache_status,
+     :original_tokens, :compressed_tokens, :latency_ms, :messages,
+     keyword_init: true
+   )
+ end
data/lib/llm_optimizer/railtie.rb ADDED
@@ -0,0 +1,11 @@
+ # frozen_string_literal: true
+ 
+ require "rails/railtie"
+ 
+ module LlmOptimizer
+   class Railtie < Rails::Railtie
+     generators do
+       require "generators/llm_optimizer/install_generator"
+     end
+   end
+ end
data/lib/llm_optimizer/semantic_cache.rb ADDED
@@ -0,0 +1,66 @@
+ # frozen_string_literal: true
+ 
+ require "digest"
+ require "msgpack"
+ 
+ module LlmOptimizer
+   class SemanticCache
+     KEY_NAMESPACE = "llm_optimizer:cache:"
+ 
+     def initialize(redis_client, threshold:, ttl:)
+       @redis = redis_client
+       @threshold = threshold
+       @ttl = ttl
+     end
+ 
+     def store(embedding, response)
+       key = cache_key(embedding)
+       payload = MessagePack.pack({ "embedding" => embedding, "response" => response })
+       @redis.set(key, payload, ex: @ttl)
+     rescue ::Redis::BaseError => e
+       warn "[llm_optimizer] SemanticCache store failed: #{e.message}"
+     end
+ 
+     # Linear scan: fetches every cached entry and keeps the highest-scoring one.
+     def lookup(embedding)
+       keys = @redis.keys("#{KEY_NAMESPACE}*")
+       return nil if keys.empty?
+ 
+       best_score = -Float::INFINITY
+       best_response = nil
+ 
+       keys.each do |key|
+         raw = @redis.get(key)
+         next unless raw
+ 
+         entry = MessagePack.unpack(raw)
+         stored_embedding = entry["embedding"]
+         score = cosine_similarity(embedding, stored_embedding)
+ 
+         if score > best_score
+           best_score = score
+           best_response = entry["response"]
+         end
+       end
+ 
+       best_score >= @threshold ? best_response : nil
+     rescue ::Redis::BaseError => e
+       warn "[llm_optimizer] SemanticCache lookup failed: #{e.message}"
+       nil
+     end
+ 
+     def cosine_similarity(vec_a, vec_b)
+       dot = vec_a.zip(vec_b).sum { |a, b| a * b }
+       mag_a = Math.sqrt(vec_a.sum { |x| x * x })
+       mag_b = Math.sqrt(vec_b.sum { |x| x * x })
+       return 0.0 if mag_a.zero? || mag_b.zero?
+ 
+       dot / (mag_a * mag_b)
+     end
+ 
+     private
+ 
+     def cache_key(embedding)
+       KEY_NAMESPACE + Digest::SHA256.hexdigest(embedding.pack("f*"))
+     end
+   end
+ end
data/lib/llm_optimizer/version.rb ADDED
@@ -0,0 +1,5 @@
+ # frozen_string_literal: true
+ 
+ module LlmOptimizer
+   VERSION = "0.1.0"
+ end
data/lib/llm_optimizer.rb ADDED
@@ -0,0 +1,273 @@
+ # frozen_string_literal: true
+ 
+ require_relative "llm_optimizer/version"
+ require_relative "llm_optimizer/configuration"
+ require_relative "llm_optimizer/optimize_result"
+ require_relative "llm_optimizer/compressor"
+ require_relative "llm_optimizer/model_router"
+ require_relative "llm_optimizer/embedding_client"
+ require_relative "llm_optimizer/semantic_cache"
+ require_relative "llm_optimizer/history_manager"
+ 
+ require "llm_optimizer/railtie" if defined?(Rails)
+ 
+ module LlmOptimizer
+   # Base error class for all gem-specific exceptions
+   class Error < StandardError; end
+ 
+   # Raised when an unrecognized configuration key is set
+   class ConfigurationError < Error; end
+ 
+   # Raised when the embedding API call fails
+   class EmbeddingError < Error; end
+ 
+   # Raised when a network timeout is exceeded
+   class TimeoutError < Error; end
+ 
+   # Global configuration
+   @configuration = nil
+ 
+   # Yields a fresh Configuration and merges the keys it sets into the global
+   # config, so repeated configure calls accumulate instead of resetting.
+   def self.configure
+     temp = Configuration.new
+     yield temp
+     configuration.merge!(temp)
+     validate_configuration!(configuration)
+   end
+ 
+   # Warns about misconfigured options rather than failing silently at call time.
+   def self.validate_configuration!(config)
+     if config.use_semantic_cache && config.embedding_caller.nil?
+       config.logger.warn(
+         "[llm_optimizer] use_semantic_cache is true but no embedding_caller is configured. " \
+         "Semantic caching will be skipped. Set config.embedding_caller to enable it."
+       )
+       config.use_semantic_cache = false
+     end
+   end
+ 
+   # Returns the current global Configuration, lazy-initializing if nil.
+   def self.configuration
+     @configuration ||= Configuration.new
+   end
+ 
+   # Replaces the global config with a fresh default Configuration.
+   # Useful in tests to avoid state leakage.
+   def self.reset_configuration!
+     @configuration = Configuration.new
+   end
+ 
+   # Opt-in client wrapping: prepended over a client's #chat so the pipeline
+   # runs first, then the original method is called with the optimized params.
+   module WrapperModule
+     def chat(params, &block)
+       prompt = params[:messages] || params[:prompt]
+       optimized = LlmOptimizer.optimize(prompt)
+       params = params.merge(messages: optimized.messages, model: optimized.model)
+       super(params, &block)
+     end
+   end
+ 
+   # Prepends WrapperModule into client_class; idempotent — safe to call N times.
+   def self.wrap_client(client_class)
+     return if client_class.ancestors.include?(WrapperModule)
+ 
+     client_class.prepend(WrapperModule)
+   end
+ 
+   # Primary entry point. Runs the optimization pipeline and returns an
+   # OptimizeResult. options hash keys mirror Configuration attributes and are
+   # merged over the global config for this call only. An optional block is
+   # yielded a per-call Configuration for fine-grained control.
+   def self.optimize(prompt, options = {}, &block)
+     start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
+ 
+     # Resolve per-call configuration — only pass known config keys
+     call_config = Configuration.new
+     call_config.merge!(configuration)
+     options.each do |k, v|
+       next unless LlmOptimizer::Configuration::KNOWN_KEYS.include?(k.to_sym)
+ 
+       call_config.public_send(:"#{k}=", v)
+     end
+     yield call_config if block_given?
+ 
+     logger = call_config.logger
+ 
+     # Keep a reference to the original prompt for fallback use
+     original_prompt = prompt
+ 
+     # Compression
+     compressor = Compressor.new
+     original_tokens = compressor.estimate_tokens(prompt)
+     compressed_tokens = nil
+ 
+     if call_config.compress_prompt
+       prompt = compressor.compress(prompt)
+       compressed_tokens = compressor.estimate_tokens(prompt)
+     end
+ 
+     # Model routing
+     router = ModelRouter.new(call_config)
+     model_tier = router.route(prompt)
+     model = model_tier == :simple ? call_config.simple_model : call_config.complex_model
+ 
+     # Semantic cache lookup
+     embedding = nil
+ 
+     if call_config.use_semantic_cache
+       begin
+         emb_client = EmbeddingClient.new(
+           model: call_config.embedding_model,
+           timeout_seconds: call_config.timeout_seconds,
+           embedding_caller: call_config.embedding_caller
+         )
+         embedding = emb_client.embed(prompt)
+ 
+         if call_config.redis_url
+           redis = build_redis(call_config.redis_url)
+           cache = SemanticCache.new(redis, threshold: call_config.similarity_threshold, ttl: call_config.cache_ttl)
+           cached = cache.lookup(embedding)
+ 
+           if cached
+             latency_ms = elapsed_ms(start)
+             emit_log(logger, call_config,
+                      cache_status: :hit, model_tier: model_tier,
+                      original_tokens: original_tokens, compressed_tokens: compressed_tokens,
+                      latency_ms: latency_ms, prompt: original_prompt, response: cached)
+             return OptimizeResult.new(
+               response: cached,
+               model: model,
+               model_tier: model_tier,
+               cache_status: :hit,
+               original_tokens: original_tokens,
+               compressed_tokens: compressed_tokens,
+               latency_ms: latency_ms,
+               messages: options[:messages]
+             )
+           end
+         end
+       rescue EmbeddingError => e
+         logger.warn("[llm_optimizer] EmbeddingError (treating as cache miss): #{e.message}")
+         embedding = nil
+         # continue pipeline as cache miss
+       end
+     end
+ 
+     # History management
+     messages = options[:messages]
+     if call_config.manage_history && messages
+       # Pass the per-call config through so summarization uses the same llm_caller.
+       llm_caller = ->(p, model:) { raw_llm_call(p, model: model, config: call_config) }
+       history_mgr = HistoryManager.new(
+         llm_caller: llm_caller,
+         simple_model: call_config.simple_model,
+         token_budget: call_config.token_budget
+       )
+       messages = history_mgr.process(messages)
+     end
+ 
+     # Raw LLM call
+     response = raw_llm_call(prompt, model: model, config: call_config)
+ 
+     # Cache store
+     if call_config.use_semantic_cache && embedding && call_config.redis_url
+       begin
+         redis = build_redis(call_config.redis_url)
+         cache = SemanticCache.new(redis, threshold: call_config.similarity_threshold, ttl: call_config.cache_ttl)
+         cache.store(embedding, response)
+       rescue StandardError => e
+         logger.warn("[llm_optimizer] SemanticCache store failed: #{e.message}")
+       end
+     end
+ 
+     # Build result
+     latency_ms = elapsed_ms(start)
+     emit_log(logger, call_config,
+              cache_status: :miss, model_tier: model_tier,
+              original_tokens: original_tokens, compressed_tokens: compressed_tokens,
+              latency_ms: latency_ms, prompt: original_prompt, response: response)
+ 
+     OptimizeResult.new(
+       response: response,
+       model: model,
+       model_tier: model_tier,
+       cache_status: :miss,
+       original_tokens: original_tokens,
+       compressed_tokens: compressed_tokens,
+       latency_ms: latency_ms,
+       messages: messages
+     )
+   rescue EmbeddingError => e
+     # Treat embedding failures as cache miss — continue to raw LLM call
+     logger = configuration.logger
+     logger.warn("[llm_optimizer] EmbeddingError (outer rescue, treating as cache miss): #{e.message}")
+     latency_ms = elapsed_ms(start)
+     response = raw_llm_call(original_prompt, model: nil, config: configuration)
+     OptimizeResult.new(
+       response: response,
+       model: nil,
+       model_tier: nil,
+       cache_status: :miss,
+       original_tokens: original_tokens || 0,
+       compressed_tokens: nil,
+       latency_ms: latency_ms,
+       messages: options[:messages]
+     )
+   rescue StandardError => e
+     # Any other component failure falls through to a raw LLM call with the original prompt.
+     logger = configuration.logger
+     logger.error("[llm_optimizer] #{e.class}: #{e.message}\n#{e.backtrace&.first(5)&.join("\n")}")
+     latency_ms = elapsed_ms(start)
+     response = raw_llm_call(original_prompt, model: nil, config: configuration)
+     OptimizeResult.new(
+       response: response,
+       model: nil,
+       model_tier: nil,
+       cache_status: :miss,
+       original_tokens: original_tokens || 0,
+       compressed_tokens: nil,
+       latency_ms: latency_ms,
+       messages: options[:messages]
+     )
+   end
+ 
+   # Private helpers
+ 
+   class << self
+     private
+ 
+     def raw_llm_call(prompt, model:, config: nil)
+       # Prefer the per-call config's llm_caller, falling back to the global config.
+       callable = config&.llm_caller || configuration.llm_caller
+       unless callable
+         raise ConfigurationError,
+               "No llm_caller configured. Set it via LlmOptimizer.configure { |c| c.llm_caller = ->(prompt, model:) { ... } }"
+       end
+ 
+       callable.call(prompt, model: model)
+     end
+ 
+     def elapsed_ms(start)
+       ((Process.clock_gettime(Process::CLOCK_MONOTONIC) - start) * 1000).round(2)
+     end
+ 
+     def emit_log(logger, config, cache_status:, model_tier:, original_tokens:,
+                  compressed_tokens:, latency_ms:, prompt:, response:)
+       logger.info(
+         "[llm_optimizer] { cache_status: #{cache_status.inspect}, " \
+         "model_tier: #{model_tier.inspect}, " \
+         "original_tokens: #{original_tokens.inspect}, " \
+         "compressed_tokens: #{compressed_tokens.inspect}, " \
+         "latency_ms: #{latency_ms.inspect} }"
+       )
+ 
+       if config.debug_logging
+         logger.debug("[llm_optimizer] prompt=#{prompt.inspect} response=#{response.inspect}")
+       end
+     end
+ 
+     def build_redis(redis_url)
+       require "redis"
+       Redis.new(url: redis_url)
+     end
+   end
+ end
data/sig/llm_optimizer.rbs ADDED
@@ -0,0 +1,4 @@
+ module LlmOptimizer
+   VERSION: String
+   # See the writing guide of rbs: https://github.com/ruby/rbs#guides
+ end
metadata ADDED
@@ -0,0 +1,135 @@
+ --- !ruby/object:Gem::Specification
+ name: llm_optimizer
+ version: !ruby/object:Gem::Version
+   version: 0.1.0
+ platform: ruby
+ authors:
+ - arun kumar
+ bindir: exe
+ cert_chain: []
+ date: 1980-01-02 00:00:00.000000000 Z
+ dependencies:
+ - !ruby/object:Gem::Dependency
+   name: redis
+   requirement: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: '5.0'
+   type: :runtime
+   prerelease: false
+   version_requirements: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: '5.0'
+ - !ruby/object:Gem::Dependency
+   name: msgpack
+   requirement: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: '1.7'
+   type: :runtime
+   prerelease: false
+   version_requirements: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: '1.7'
+ - !ruby/object:Gem::Dependency
+   name: logger
+   requirement: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: '1.6'
+   type: :runtime
+   prerelease: false
+   version_requirements: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: '1.6'
+ - !ruby/object:Gem::Dependency
+   name: prop_check
+   requirement: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: '1.0'
+   type: :development
+   prerelease: false
+   version_requirements: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: '1.0'
+ - !ruby/object:Gem::Dependency
+   name: mocha
+   requirement: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: '2.0'
+   type: :development
+   prerelease: false
+   version_requirements: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: '2.0'
+ description: llm_optimizer reduces LLM API costs by up to 80% through semantic caching
+   (Redis + vector similarity), intelligent model routing, token pruning, and conversation
+   history summarization. Strictly opt-in and non-invasive.
+ email:
+ - arunr.rubydev@gmail.com
+ executables: []
+ extensions: []
+ extra_rdoc_files: []
+ files:
+ - CHANGELOG.md
+ - CODE_OF_CONDUCT.md
+ - LICENSE.txt
+ - README.md
+ - Rakefile
+ - lib/generators/llm_optimizer/install_generator.rb
+ - lib/generators/llm_optimizer/templates/initializer.rb
+ - lib/llm_optimizer.rb
+ - lib/llm_optimizer/compressor.rb
+ - lib/llm_optimizer/configuration.rb
+ - lib/llm_optimizer/embedding_client.rb
+ - lib/llm_optimizer/history_manager.rb
+ - lib/llm_optimizer/model_router.rb
+ - lib/llm_optimizer/optimize_result.rb
+ - lib/llm_optimizer/railtie.rb
+ - lib/llm_optimizer/semantic_cache.rb
+ - lib/llm_optimizer/version.rb
+ - sig/llm_optimizer.rbs
+ homepage: https://github.com/arunkumarry/llm_optimizer
+ licenses:
+ - MIT
+ metadata:
+   allowed_push_host: https://rubygems.org
+   homepage_uri: https://github.com/arunkumarry/llm_optimizer
+   source_code_uri: https://github.com/arunkumarry/llm_optimizer/tree/main
+   changelog_uri: https://github.com/arunkumarry/llm_optimizer/blob/main/CHANGELOG.md
+ rdoc_options: []
+ require_paths:
+ - lib
+ required_ruby_version: !ruby/object:Gem::Requirement
+   requirements:
+   - - ">="
+     - !ruby/object:Gem::Version
+       version: 3.2.0
+ required_rubygems_version: !ruby/object:Gem::Requirement
+   requirements:
+   - - ">="
+     - !ruby/object:Gem::Version
+       version: '0'
+ requirements: []
+ rubygems_version: 3.6.9
+ specification_version: 4
+ summary: Smart Gateway for LLM calls — semantic caching, model routing, token pruning,
+   and history management.
+ test_files: []