llm_optimizer 0.1.1 → 0.1.3
- checksums.yaml +4 -4
- data/CHANGELOG.md +27 -1
- data/README.md +39 -7
- data/lib/generators/llm_optimizer/templates/initializer.rb +12 -1
- data/lib/llm_optimizer/configuration.rb +2 -0
- data/lib/llm_optimizer/model_router.rb +36 -8
- data/lib/llm_optimizer/semantic_cache.rb +13 -3
- data/lib/llm_optimizer/version.rb +1 -1
- data/lib/llm_optimizer.rb +1 -1
- metadata +1 -15
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: ef2fcae7f3d39043f476a555b980685670c65e266f8fc3f9ca4309081d51c066
+  data.tar.gz: c5fb255ad280afba780ea3c417b377ae406dc828178609bf7e21c0bb4f1ba048
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: f84eba0ae06cd7541616c44c8630618eb09f3f8b1d1fe5b588eae285be6dd6a2fcc88f0868a00cbfb91e00b491f56232c0c592b3bbbea579748232a89e8aff1e
+  data.tar.gz: 80fd56954cfa497f2d7c16be68b4c41c6cd01128f3df1e2b1054c3d1005cb869b70317e60ae847dbae2d2f270119812d00d175a4dfa564c447791c2195bc7672
data/CHANGELOG.md
CHANGED
@@ -7,6 +7,30 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ## [Unreleased]
 
+## [0.1.3] - 2026-04-10
+
+### Added
+- `classifier_caller` config option — injectable lambda for LLM-based prompt classification
+- Hybrid routing in `ModelRouter`: fast-path signals (code blocks, keywords) → LLM classifier → word-count heuristic fallback
+- Fixes misclassification of short-but-complex prompts (e.g. "Fix this bug") and long-but-simple prompts
+- Classifier failures (network errors, missing model, unexpected response) automatically fall through to heuristic — no app impact
+- Tests for classifier integration, failure fallback, and fast-path bypass
+
+### Changed
+- `ModelRouter` routing logic now uses a three-layer decision chain instead of pure heuristics
+- README updated with classifier documentation and routing decision flow
+
+## [0.1.2] - 2026-04-10
+
+### Fixed
+- `SemanticCache` used `pack("f*")` (32-bit) for both the Redis key hash and embedding serialization, causing precision loss on round-trip through MessagePack. Switched to `pack("G*")` / `unpack("G*")` (64-bit IEEE 754) — self-similarity is now exactly `1.0` and cache lookups work correctly with real embedding providers (Voyage AI, OpenAI, Cohere, etc.)
+- `HistoryManager` summarization failed with `ConfigurationError: No llm_caller configured` when invoked through the gateway pipeline. The internal `raw_llm_call` lambda was missing `config: call_config`, so it couldn't resolve the user's configured `llm_caller`
+- Updated `test/unit/test_gateway.rb` mock Redis helper to use `pack("G*")` to match the corrected `SemanticCache` key format
+
+### Added
+- `bin/test_semantic_cache.rb` — runnable smoke test for semantic cache using Voyage AI embeddings + Anthropic Claude
+- `bin/test_history_manager.rb` — runnable smoke test for history manager sliding window using Anthropic Claude
+
 ## [0.1.1] - 2026-04-10
 
 ### Fixed
@@ -46,5 +70,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - `OptimizeResult` struct with `response`, `model`, `model_tier`, `cache_status`, `original_tokens`, `compressed_tokens`, `latency_ms`, `messages`
 - Unit test suite covering all components with positive and negative scenarios using Minitest + Mocha
 
-[Unreleased]: https://github.com/arunkumarry/llm_optimizer/compare/v0.1.
+[Unreleased]: https://github.com/arunkumarry/llm_optimizer/compare/v0.1.2...HEAD
+[0.1.2]: https://github.com/arunkumarry/llm_optimizer/compare/v0.1.1...v0.1.2
+[0.1.1]: https://github.com/arunkumarry/llm_optimizer/compare/v0.1.0...v0.1.1
 [0.1.0]: https://github.com/arunkumarry/llm_optimizer/releases/tag/v0.1.0
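The 0.1.2 `pack("f*")` fix described above is easy to reproduce in plain Ruby. A minimal sketch (not from the gem; `cosine_similarity` here is a hypothetical helper) showing that a 32-bit round trip perturbs the floats while the 64-bit `"G*"` round trip is lossless:

```ruby
# Cosine similarity between two equal-length vectors (hypothetical helper,
# not the gem's implementation).
def cosine_similarity(a, b)
  dot = a.zip(b).sum { |x, y| x * y }
  dot / (Math.sqrt(a.sum { |x| x * x }) * Math.sqrt(b.sum { |x| x * x }))
end

embedding = [0.12345678901234567, -0.98765432109876543, 0.55555555555555555]

f64 = embedding.pack("G*").unpack("G*") # 64-bit big-endian doubles: lossless
f32 = embedding.pack("f*").unpack("f*") # 32-bit floats: precision is lost

f64 == embedding # => true
f32 == embedding # => false
```

Because the 64-bit round trip returns bit-identical floats, similarity against a deserialized embedding matches self-similarity, which is what the changelog entry relies on.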
data/README.md
CHANGED
@@ -1,6 +1,6 @@
 # llm_optimizer
 
-A Smart Gateway for LLM API calls in Ruby and Rails applications. Reduces token usage and API costs through four composable optimizations
+A Smart Gateway for LLM API calls in Ruby and Rails applications. Reduces token usage and API costs through four composable optimizations: all opt-in, all independently configurable.
 
 ## How it works
 
@@ -10,17 +10,41 @@ Every call to `LlmOptimizer.optimize` passes through an ordered pipeline:
 prompt → Compressor → ModelRouter → SemanticCache lookup → HistoryManager → LLM call → SemanticCache store → OptimizeResult
 ```
 
-Each stage is independently enabled via configuration flags. If any stage fails, the gem falls through to a raw LLM call
+Each stage is independently enabled via configuration flags. If any stage fails, the gem falls through to a raw LLM call; your app never breaks because of the optimizer.
 
 ## Optimizations
 
 ### 1. Semantic Caching
-Stores prompt embeddings in Redis. On subsequent calls, computes cosine similarity against stored embeddings. If similarity ≥ threshold, returns the cached response instantly
+Stores prompt embeddings in Redis. On subsequent calls, computes cosine similarity against stored embeddings. If similarity ≥ threshold, returns the cached response instantly; no LLM call made.
 
 ### 2. Intelligent Model Routing
-
-
-
+
+Classifies each prompt and routes it to the appropriate model tier:
+
+- **Simple** → cheaper/faster model (e.g. `gpt-4o-mini`, `amazon.nova-micro`)
+- **Complex** → premium model (e.g. `claude-3-5-sonnet`, `gpt-4o`)
+
+Routing uses a three-layer decision chain:
+
+1. **Explicit override** — if `route_to: :simple` or `:complex` is set, always use that
+2. **Fast-path signals** — code blocks (` ``` `, `~~~`) and keywords (`analyze`, `refactor`, `debug`, `architect`, `explain in detail`) → instantly `:complex`, no LLM call
+3. **LLM classifier** (optional) — for ambiguous prompts, calls a cheap model with a classification prompt; falls back to word-count heuristic if not configured or if the call fails
+
+This hybrid approach fixes the core weakness of pure heuristics:
+- `"Fix this bug"` → 3 words but `:complex` via classifier ✓
+- `"Explain Ruby blocks simply"` → long but `:simple` via classifier ✓
+- `"analyze this code"` → keyword fast-path → `:complex` instantly (no classifier call) ✓
+
+Configure the classifier with any cheap model your app already uses:
+
+```ruby
+config.classifier_caller = ->(prompt) {
+  RubyLLM.chat(model: "amazon.nova-micro-v1:0", provider: :bedrock, assume_model_exists: true)
+    .ask(prompt).content.strip.downcase
+}
+```
+
+If `classifier_caller` is not set, the router falls back to the word-count heuristic (< 20 words → `:simple`).
 
 ### 3. Token Pruning
 Removes common English stop words from prompts before sending to the LLM. Preserves fenced code block content unchanged. Typically reduces token count by 10–20%.
@@ -120,6 +144,13 @@ LlmOptimizer.configure do |config|
   config.embedding_caller = ->(text) {
     MyEmbeddingService.embed(text)
   }
+
+  # Classifier caller — optional, improves routing accuracy for ambiguous prompts
+  # Falls back to word-count heuristic if not set or if the call fails
+  config.classifier_caller = ->(prompt) {
+    RubyLLM.chat(model: "amazon.nova-micro-v1:0", provider: :bedrock, assume_model_exists: true)
+      .ask(prompt).content.strip.downcase
+  }
 end
 ```
 
@@ -143,6 +174,7 @@ end
 | `debug_logging` | Boolean | `false` | Log full prompt and response at DEBUG level |
 | `llm_caller` | Lambda | `nil` | `(prompt, model:) -> String` |
 | `embedding_caller` | Lambda | `nil` | `(text) -> Array<Float>` |
+| `classifier_caller` | Lambda | `nil` | `(prompt) -> "simple" or "complex"` |
 
 ## Per-call configuration
 
@@ -179,7 +211,7 @@ Transparently wrap an existing LLM client class so all calls through it are auto
 LlmOptimizer.wrap_client(OpenAI::Client)
 ```
 
-This prepends the optimization pipeline into the client's `chat` method. Safe to call multiple times
+This prepends the optimization pipeline into the client's `chat` method. Safe to call multiple times; idempotent.
 
 ## OptimizeResult
 
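Because the `classifier_caller` contract documented in the README is just a callable from a prompt string to `"simple"` or `"complex"`, tests (or apps without a cheap LLM handy) can satisfy it with a plain lambda. A hypothetical stub, not part of the gem:

```ruby
# Hypothetical stub classifier: keyword match instead of an LLM call.
# Satisfies the (prompt) -> "simple" or "complex" contract.
stub_classifier = ->(prompt) do
  prompt.match?(/\b(bug|debug|refactor|architecture|code)\b/i) ? "complex" : "simple"
end

stub_classifier.call("Fix this bug")            # => "complex"
stub_classifier.call("What time is it in UTC?") # => "simple"
```

When wired into `config.classifier_caller`, the router would still apply its fast-path and fallback layers around whatever the lambda returns.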
data/lib/generators/llm_optimizer/templates/initializer.rb
CHANGED
@@ -64,5 +64,16 @@ LlmOptimizer.configure do |config|
   # Example:
   # config.embedding_caller = ->(text) { EmbeddingService.embed(text) }
   #
-  #
+  # --- Routing classifier (optional) ---
+  # When set, ambiguous prompts are classified by a cheap LLM instead of
+  # falling back to the word-count heuristic. Unambiguous signals (code blocks,
+  # keywords) still bypass the classifier for speed.
+  #
+  # Example:
+  # config.classifier_caller = ->(prompt) {
+  #   RubyLLM.chat(model: "amazon.nova-micro-v1:0", assume_model_exists: true)
+  #     .ask(prompt).content.strip.downcase
+  # }
+  #
+  # config.classifier_caller = nil
 end
data/lib/llm_optimizer/configuration.rb
CHANGED
@@ -21,6 +21,7 @@ module LlmOptimizer
       cache_ttl
       llm_caller
       embedding_caller
+      classifier_caller
     ].freeze
 
     # Define readers for all known keys (setters below track explicit sets)
@@ -45,6 +46,7 @@
       @cache_ttl = 86_400
       @llm_caller = nil
       @embedding_caller = nil
+      @classifier_caller = nil
     end
 
     # Copies only explicitly set keys from other_config without resetting unmentioned keys.
data/lib/llm_optimizer/model_router.rb
CHANGED
@@ -6,27 +6,55 @@ module LlmOptimizer
     COMPLEX_PHRASES = ["explain in detail"].freeze
     CODE_BLOCK_RE = /```|~~~/
 
+    CLASSIFIER_PROMPT = <<~PROMPT
+      Classify the following prompt as either 'simple' or 'complex'.
+
+      Rules:
+      - simple: factual questions, basic lookups, short explanations, greetings
+      - complex: code generation, debugging, architecture, multi-step reasoning, analysis
+
+      Reply with exactly one word: simple or complex
+
+      Prompt: %<prompt>s
+    PROMPT
+
     def initialize(config)
       @config = config
     end
 
     def route(prompt)
-      #
+      # Explicit override — always wins
       return @config.route_to if %i[simple complex].include?(@config.route_to)
 
-      #
+      # Unambiguous fast-path signals (no LLM call needed)
       return :complex if CODE_BLOCK_RE.match?(prompt)
 
-      # complex keywords or phrases
       lower = prompt.downcase
       return :complex if COMPLEX_KEYWORDS.any? { |kw| lower.include?(kw) }
-      return :complex if COMPLEX_PHRASES.any?
+      return :complex if COMPLEX_PHRASES.any? { |ph| lower.include?(ph) }
+
+      # LLM classifier for ambiguous prompts
+      if @config.classifier_caller
+        result = classify_with_llm(prompt)
+        return result if result
+      end
+
+      # Fallback heuristic
+      prompt.split.length < 20 ? :simple : :complex
+    end
+
+    private
 
-
-
+    def classify_with_llm(prompt)
+      classifier_prompt = format(CLASSIFIER_PROMPT, prompt: prompt)
+      response = @config.classifier_caller.call(classifier_prompt)
+      normalized = response.to_s.strip.downcase.gsub(/[^a-z]/, "")
+      return :simple if normalized == "simple"
+      return :complex if normalized == "complex"
 
-      #
-
+      nil # unrecognized response — fall through to heuristic
+    rescue StandardError
+      nil # classifier failure — fall through to heuristic
     end
   end
 end
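The `to_s.strip.downcase.gsub(/[^a-z]/, "")` chain in `classify_with_llm` is what makes the router tolerant of noisy classifier output, including `nil`. The same chain run standalone:

```ruby
# Same normalization chain classify_with_llm applies to the raw classifier response.
normalize = ->(response) { response.to_s.strip.downcase.gsub(/[^a-z]/, "") }

normalize.call("Complex.")     # => "complex" (trailing punctuation stripped)
normalize.call("  SIMPLE\n")   # => "simple" (whitespace and case handled)
normalize.call(nil)            # => "" (nil-safe; router falls back to heuristic)
normalize.call("It's complex") # => "itscomplex" (neither word; also falls back)
```

The last case shows why the method returns `nil` rather than guessing: anything that doesn't normalize to exactly `simple` or `complex` drops through to the word-count heuristic.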
data/lib/llm_optimizer/semantic_cache.rb
CHANGED
@@ -15,7 +15,13 @@ module LlmOptimizer
 
     def store(embedding, response)
       key = cache_key(embedding)
-
+      # Serialize embedding as raw 64-bit big-endian doubles to preserve full
+      # Float precision. MessagePack silently downcasts Ruby Float to 32-bit,
+      # which corrupts cosine similarity on deserialization.
+      payload = MessagePack.pack({
+        "embedding" => embedding.pack("G*"), # binary string, lossless
+        "response" => response
+      })
       @redis.set(key, payload, ex: @ttl)
     rescue ::Redis::BaseError => e
       warn "[llm_optimizer] SemanticCache store failed: #{e.message}"
@@ -33,7 +39,8 @@
       next unless raw
 
       entry = MessagePack.unpack(raw)
-
+      # Unpack the binary string back to 64-bit doubles
+      stored_embedding = entry["embedding"].unpack("G*")
       score = cosine_similarity(embedding, stored_embedding)
 
       if score > best_score
@@ -60,7 +67,10 @@
     private
 
     def cache_key(embedding)
-
+      # Use "G*" (64-bit big-endian double) to match Ruby's native Float precision.
+      # "f*" (32-bit) truncates precision and produces inconsistent hashes for the
+      # same embedding across serialize/deserialize round trips.
+      KEY_NAMESPACE + Digest::SHA256.hexdigest(embedding.pack("G*"))
    end
  end
end
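The `cache_key` comment can be checked with only the stdlib. The namespace below is a placeholder (the gem's actual `KEY_NAMESPACE` value is not shown in this diff); the point is that hashing the `"G*"` bytes yields the same key before and after a lossless round trip, while floats that went through a 32-bit round trip hash to a different key:

```ruby
require "digest"

# Placeholder namespace: the gem's real KEY_NAMESPACE is not shown in the diff.
NAMESPACE = "llm_optimizer:"
key_of = ->(embedding) { NAMESPACE + Digest::SHA256.hexdigest(embedding.pack("G*")) }

embedding = [0.12345678901234567, -0.98765432109876543]

exact = embedding.pack("G*").unpack("G*") # lossless 64-bit round trip
lossy = embedding.pack("f*").unpack("f*") # 32-bit round trip loses precision

key_of.call(exact) == key_of.call(embedding) # => true: key is stable
key_of.call(lossy) == key_of.call(embedding) # => false: key drifts
```

This is the "inconsistent hashes" failure mode the comment describes: once a 32-bit serialization touches the vector, the same logical embedding no longer maps to the same Redis key.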
data/lib/llm_optimizer.rb
CHANGED
@@ -158,7 +158,7 @@ module LlmOptimizer
       # History management
       messages = options[:messages]
       if call_config.manage_history && messages
-        llm_caller = ->(p, model:) { raw_llm_call(p, model: model) }
+        llm_caller = ->(p, model:) { raw_llm_call(p, model: model, config: call_config) }
         history_mgr = HistoryManager.new(
           llm_caller: llm_caller,
           simple_model: call_config.simple_model,
metadata
CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: llm_optimizer
 version: !ruby/object:Gem::Version
-  version: 0.1.
+  version: 0.1.3
 platform: ruby
 authors:
 - arun kumar
@@ -79,20 +79,6 @@ dependencies:
   - - "~>"
     - !ruby/object:Gem::Version
       version: '0.65'
-- !ruby/object:Gem::Dependency
-  name: prop_check
-  requirement: !ruby/object:Gem::Requirement
-    requirements:
-    - - "~>"
-      - !ruby/object:Gem::Version
-        version: '1.0'
-  type: :development
-  prerelease: false
-  version_requirements: !ruby/object:Gem::Requirement
-    requirements:
-    - - "~>"
-    - !ruby/object:Gem::Version
-      version: '1.0'
 description: llm_optimizer reduces LLM API costs by up to 80% through semantic caching,
   intelligent model routing, token pruning, and conversation history summarization.
   Strictly opt-in and non-invasive.