RubyGems - llm_optimizer - Versions diffs - 0.1.3 → 0.1.4 - Mend

llm_optimizer 0.1.3 → 0.1.4

Files changed (6)

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: ef2fcae7f3d39043f476a555b980685670c65e266f8fc3f9ca4309081d51c066
-  data.tar.gz: c5fb255ad280afba780ea3c417b377ae406dc828178609bf7e21c0bb4f1ba048
+  metadata.gz: c6903d2d4c2163d93ffe8d0d5ad9708d64a8472a430ed9f266c9237e468c8585
+  data.tar.gz: c7270f4717ece6778976f46f1601f9e5d45939e3e7926ea7e3ed05b3b641f413
 SHA512:
-  metadata.gz: f84eba0ae06cd7541616c44c8630618eb09f3f8b1d1fe5b588eae285be6dd6a2fcc88f0868a00cbfb91e00b491f56232c0c592b3bbbea579748232a89e8aff1e
-  data.tar.gz: 80fd56954cfa497f2d7c16be68b4c41c6cd01128f3df1e2b1054c3d1005cb869b70317e60ae847dbae2d2f270119812d00d175a4dfa564c447791c2195bc7672
+  metadata.gz: 858cad7443f7adcbe42b3d5ce62b4e815081d2238b7711066276ee2a7c0fb6a506d267ccb48dbe611a2ed08b2eab29139057dcddc2d033155561499a0d6f5421
+  data.tar.gz: b3afc392e8fb2ef5b7baa468f74f9def34a15db9f6df898fd738503638d32f5dda9b04a6c8f2e005cd94aa893eca864111f3be0f2e8bfa1cc0aeef6391e0ae2c
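
The values above are hex digests of the two archives packed inside the `.gem` file. As a minimal sketch of how digests of this form can be produced with Ruby's standard `Digest` library: the helper name `archive_digests` is hypothetical, and the input here is an in-memory string rather than a real `metadata.gz` or `data.tar.gz`.

```ruby
require "digest"

# Hypothetical helper (not part of RubyGems or llm_optimizer): returns the
# SHA256 and SHA512 hex digests of a blob of bytes, in the same hex form that
# checksums.yaml records for each archive.
def archive_digests(bytes)
  {
    "SHA256" => Digest::SHA256.hexdigest(bytes),
    "SHA512" => Digest::SHA512.hexdigest(bytes)
  }
end

digests = archive_digests("abc")
puts digests["SHA256"]
puts digests["SHA512"].length
```

To check a downloaded archive, one would pass its bytes (for example from `File.binread`) and compare the result against the corresponding `checksums.yaml` entry.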

data/CHANGELOG.md CHANGED

@@ -7,6 +7,15 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 ## [Unreleased]
+## [0.1.4] - 2026-04-13
+### Fixed
+- `WrapperModule#chat` (used by `wrap_client`) incorrectly called `LlmOptimizer.optimize` internally which required `llm_caller` to be configured — causing `ConfigurationError` for users who only called `wrap_client`. Refactored into `optimize_pre_call` / `optimize_post_call` so the wrapped client handles the actual LLM call via `super`. `llm_caller` is no longer needed when using `wrap_client`
+### Added
+- `LlmOptimizer.optimize_pre_call(prompt, config)` — runs compress → route → cache lookup without making an LLM call; used internally by `WrapperModule` and available for advanced integrations
+- `LlmOptimizer.optimize_post_call(pre_call_result, response, config)` — stores a response in the semantic cache after an LLM call; used internally by `WrapperModule`
 ## [0.1.3] - 2026-04-10
 ### Added
@@ -70,7 +79,9 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - `OptimizeResult` struct with `response`, `model`, `model_tier`, `cache_status`, `original_tokens`, `compressed_tokens`, `latency_ms`, `messages`
 - Unit test suite covering all components with positive and negative scenarios using Minitest + Mocha
-[Unreleased]: https://github.com/arunkumarry/llm_optimizer/compare/v0.1.2...HEAD
+[Unreleased]: https://github.com/arunkumarry/llm_optimizer/compare/v0.1.4...HEAD
+[0.1.4]: https://github.com/arunkumarry/llm_optimizer/compare/v0.1.3...v0.1.4
+[0.1.3]: https://github.com/arunkumarry/llm_optimizer/compare/v0.1.2...v0.1.3
 [0.1.2]: https://github.com/arunkumarry/llm_optimizer/compare/v0.1.1...v0.1.2
 [0.1.1]: https://github.com/arunkumarry/llm_optimizer/compare/v0.1.0...v0.1.1
 [0.1.0]: https://github.com/arunkumarry/llm_optimizer/releases/tag/v0.1.0
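
The 0.1.4 fix relies on the wrapper adjusting the request and then handing the actual call to the wrapped client through `super`. The sketch below illustrates that pattern in isolation. It assumes the wrapper is attached with `singleton_class.prepend`, which this diff does not show; `FakeClient`, `SketchWrapper`, and the `"small-model"` name are all hypothetical stand-ins, not the gem's code.

```ruby
# Stand-in for a real LLM client; records what it was asked to send.
class FakeClient
  attr_reader :calls

  def initialize
    @calls = []
  end

  def chat(params)
    @calls << params
    "echo: #{params[:prompt]}"
  end
end

# Hypothetical wrapper in the style of WrapperModule: it rewrites the params
# and lets the original client make the call, so no separate caller is needed.
module SketchWrapper
  def chat(params, &block)
    # Toy "compression": collapse repeated spaces. Toy "routing": fixed model.
    optimized = params.merge(
      prompt: params[:prompt].to_s.strip.squeeze(" "),
      model: "small-model"
    )
    super(optimized, &block)
  end
end

client = FakeClient.new
client.singleton_class.prepend(SketchWrapper) # assumed attachment mechanism
puts client.chat(prompt: "  hello   world  ")
puts client.calls.last[:model]
```

Because the module is prepended, its `chat` runs first and `super` reaches the client's own `chat`, which is why the wrapped path no longer depends on a configured `llm_caller`.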

data/README.md CHANGED

@@ -31,9 +31,9 @@ Routing uses a three-layer decision chain:
 3. **LLM classifier** (optional) — for ambiguous prompts, calls a cheap model with a classification prompt; falls back to word-count heuristic if not configured or if the call fails
 This hybrid approach fixes the core weakness of pure heuristics:
-- `"Fix this bug"` → 3 words but `:complex` via classifier ✓
-- `"Explain Ruby blocks simply"` → long but `:simple` via classifier ✓
-- `"analyze this code"` → keyword fast-path → `:complex` instantly (no classifier call) ✓
+- `"Fix this bug"` → 3 words but `:complex` via classifier
+- `"Explain Ruby blocks simply"` → long but `:simple` via classifier
+- `"analyze this code"` → keyword fast-path → `:complex` instantly (no classifier call)
 Configure the classifier with any cheap model your app already uses:

data/lib/llm_optimizer/version.rb CHANGED

@@ -1,5 +1,5 @@
 # frozen_string_literal: true
 module LlmOptimizer
-  VERSION = "0.1.3"
+  VERSION = "0.1.4"
 end

data/lib/llm_optimizer.rb CHANGED

@@ -58,12 +58,37 @@ module LlmOptimizer
   end
   # Opt-in client wrapping
+  # WrapperModule intercepts `chat` on the wrapped client, runs the pre-call
+  # optimization pipeline (compress, route, cache lookup), and delegates the
+  # actual LLM call to the original client via `super` — so llm_caller is NOT
+  # required when using wrap_client.
   module WrapperModule
-    def chat(params, &)
+    def chat(params, &block)
+      config = LlmOptimizer.configuration
       prompt = params[:messages] || params[:prompt]
-      optimized = LlmOptimizer.optimize(prompt)
-      params = params.merge(messages: optimized.messages, model: optimized.model)
-      super
+      # Run pre-call pipeline: compress, route, cache lookup
+      result = LlmOptimizer.optimize_pre_call(prompt, config)
+      # Cache hit — return immediately without calling the LLM
+      if result[:cache_status] == :hit
+        return result[:response]
+      end
+      # Apply compressed prompt and routed model, then delegate to original client
+      optimized_params = params.merge(model: result[:model])
+      if params[:messages]
+        optimized_params = optimized_params.merge(messages: result[:prompt])
+      elsif params[:prompt]
+        optimized_params = optimized_params.merge(prompt: result[:prompt])
+      end
+      response = super(optimized_params, &block)
+      # Store in cache after successful LLM call
+      LlmOptimizer.optimize_post_call(result, response, config)
+      response
     end
   end
@@ -231,6 +256,53 @@ module LlmOptimizer
     )
   end
+  # Pre-call pipeline for wrap_client: compress, route, cache lookup.
+  # Returns a hash with :prompt, :model, :model_tier, :embedding, :cache_status, :response.
+  # Does NOT make an LLM call — the wrapped client handles that via super.
+  def self.optimize_pre_call(prompt, config = configuration)
+    compressor = Compressor.new
+    prompt     = compressor.compress(prompt) if config.compress_prompt
+    router     = ModelRouter.new(config)
+    model_tier = router.route(prompt)
+    model      = model_tier == :simple ? config.simple_model : config.complex_model
+    embedding = nil
+    if config.use_semantic_cache && config.redis_url
+      begin
+        emb_client = EmbeddingClient.new(
+          model: config.embedding_model,
+          timeout_seconds: config.timeout_seconds,
+          embedding_caller: config.embedding_caller
+        )
+        embedding = emb_client.embed(prompt)
+        redis  = build_redis(config.redis_url)
+        cache  = SemanticCache.new(redis, threshold: config.similarity_threshold, ttl: config.cache_ttl)
+        cached = cache.lookup(embedding)
+        return { prompt: prompt, model: model, model_tier: model_tier,
+                 embedding: embedding, cache_status: :hit, response: cached } if cached
+      rescue EmbeddingError => e
+        config.logger.warn("[llm_optimizer] wrap_client EmbeddingError (cache miss): #{e.message}")
+        embedding = nil
+      end
+    end
+    { prompt: prompt, model: model, model_tier: model_tier,
+      embedding: embedding, cache_status: :miss, response: nil }
+  end
+  # Post-call: store the LLM response in the semantic cache if applicable.
+  def self.optimize_post_call(pre_call_result, response, config = configuration)
+    return unless config.use_semantic_cache && config.redis_url
+    return unless pre_call_result[:embedding]
+    redis = build_redis(config.redis_url)
+    cache = SemanticCache.new(redis, threshold: config.similarity_threshold, ttl: config.cache_ttl)
+    cache.store(pre_call_result[:embedding], response)
+  rescue StandardError => e
+    config.logger.warn("[llm_optimizer] wrap_client cache store failed: #{e.message}")
+  end
   # Private helpers
   class << self
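
The pre-call / post-call split added in this file can be exercised without Redis or an embedding service. The following is a rough sketch under stated assumptions: `SketchCache` is an in-memory stand-in for `SemanticCache`, cosine similarity is assumed as the similarity measure (the diff does not show what the gem uses), and `toy_embed`, `pre_call`, and `post_call` are hypothetical names that only mirror the shape of the result hashes above.

```ruby
# In-memory stand-in for a semantic cache keyed by embedding vectors.
class SketchCache
  def initialize(threshold:)
    @threshold = threshold
    @entries = [] # [embedding, response] pairs
  end

  # Returns the stored response most similar to the query, or nil when
  # nothing reaches the similarity threshold.
  def lookup(embedding)
    best = @entries.max_by { |emb, _| cosine(emb, embedding) }
    return nil unless best && cosine(best[0], embedding) >= @threshold

    best[1]
  end

  def store(embedding, response)
    @entries << [embedding, response]
  end

  private

  def cosine(a, b)
    dot = a.zip(b).sum { |x, y| x * y }
    norm = Math.sqrt(a.sum { |x| x * x }) * Math.sqrt(b.sum { |x| x * x })
    norm.zero? ? 0.0 : dot / norm
  end
end

# Toy embedding (assumption, illustration only): a letter-frequency vector.
def toy_embed(text)
  counts = Array.new(26, 0.0)
  text.downcase.each_char { |c| counts[c.ord - 97] += 1 if c.between?("a", "z") }
  counts
end

# Cache lookup before the LLM call; never calls an LLM itself.
def pre_call(prompt, cache)
  embedding = toy_embed(prompt)
  cached = cache.lookup(embedding)
  status = cached ? :hit : :miss
  { prompt: prompt, embedding: embedding, cache_status: status, response: cached }
end

# Cache store after the LLM call; skipped when no embedding is available.
def post_call(pre_call_result, response, cache)
  return unless pre_call_result[:embedding]

  cache.store(pre_call_result[:embedding], response)
end

cache = SketchCache.new(threshold: 0.99)
first = pre_call("What is a Ruby block?", cache)
post_call(first, "A chunk of code passed to a method.", cache) if first[:cache_status] == :miss
second = pre_call("What is a Ruby block?", cache)
puts first[:cache_status]
puts second[:cache_status]
```

The caller sits between the two steps, which matches how `WrapperModule#chat` returns early on a hit and stores the response only after the wrapped client has answered.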

metadata CHANGED

@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: llm_optimizer
 version: !ruby/object:Gem::Version
-  version: 0.1.3
+  version: 0.1.4
 platform: ruby
 authors:
 - arun kumar