llm_optimizer 0.1.4 → 0.1.6

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: c6903d2d4c2163d93ffe8d0d5ad9708d64a8472a430ed9f266c9237e468c8585
-  data.tar.gz: c7270f4717ece6778976f46f1601f9e5d45939e3e7926ea7e3ed05b3b641f413
+  metadata.gz: 3a0ec4bdfa750f16155927a3e00c9fe2c1c39da7e85866eb6c65855ac6eebaef
+  data.tar.gz: 0e5820f0503fbef14dc1ad858dfaa7527e3dba278fbf7640df377d82fbc61ad7
 SHA512:
-  metadata.gz: 858cad7443f7adcbe42b3d5ce62b4e815081d2238b7711066276ee2a7c0fb6a506d267ccb48dbe611a2ed08b2eab29139057dcddc2d033155561499a0d6f5421
-  data.tar.gz: b3afc392e8fb2ef5b7baa468f74f9def34a15db9f6df898fd738503638d32f5dda9b04a6c8f2e005cd94aa893eca864111f3be0f2e8bfa1cc0aeef6391e0ae2c
+  metadata.gz: 8c2f376e324a7678063e66a89b6ad89e476bd699fd3a816c7c91a79b16ba40e09111cfdfacb1206946e2d111122e63cf70babc09a0467821723b2b286eda235a
+  data.tar.gz: 5bba8c343627f230c13f0671cd8b1374ab0405f6c6369457b92e9093ac1cd2f780797797a26fabdc865981ca1e131b6dc80ae4a97342f3de2f3297255d8e13c9
data/CHANGELOG.md CHANGED
@@ -7,6 +7,44 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ## [Unreleased]
 
+## [0.1.6] - 2026-05-04
+
+### Added
+- `with_tools` configuration option (aliased as `tools`) — allows passing function/tool definitions to LLM calls via the `optimize` method
+- Tool support for both `llm_caller` and `messages_caller` — `tools:` keyword argument is now passed to all underlying LLM callers
+- `with_tools` examples in the README and Rails initializer template
+- `cache_scope` configuration option — isolates semantic cache entries into separate namespaces; useful for ensuring cache hits only occur within specific contexts (e.g., user IDs, account types, or dynamic categories)
+
+### Changed
+- `Pipeline#raw_llm_call` refactored to handle global and per-call tools consistently
+- Refactored `Pipeline` to remove duplicate internal method definitions (`semantic_cache_lookup`, `store_in_cache`)
+- `SemanticCache#lookup` return format updated to `[response, token_info]` to support better metadata tracking
+
+### Fixed
+- RuboCop `Metrics/ParameterLists` offense in `OptimizeResult#initialize` by adding targeted override for the necessary result fields
+
+## [0.1.5] - 2026-04-22
+
+### Added
+- `ConversationStore` — Redis-backed conversation persistence under the `llm_optimizer:conversation:<id>` namespace; handles load, save, TTL, and debug logging
+- `conversation_id` option on `LlmOptimizer.optimize` — pass a stable ID and the gem automatically loads history from Redis, calls the LLM with full context, and saves the updated history back; no manual message management required
+- `messages_caller` config option — injectable lambda `(messages, model:) -> String` for LLM providers that accept a full message array (OpenAI chat, Anthropic messages, etc.); takes priority over `llm_caller` when conversation history is present
+- `system_prompt` config option — seeded as the opening exchange when a new conversation is created via `conversation_id`
+- `conversation_ttl` config option — TTL in seconds for Redis conversation keys (default `86400`; `0` for no expiry)
+- `LlmOptimizer.clear_conversation(conversation_id)` — deletes a conversation key from Redis; returns `true` if deleted, `false` if not found
+- `pipeline#load_conversation` and `pipeline#persist_conversation` — internal helpers wiring `ConversationStore` into the optimize pipeline
+- `pipeline#apply_history_manager` — applies `HistoryManager` sliding-window summarization to loaded conversation history when `manage_history: true`
+
+### Changed
+- `HistoryManager` now receives an internal `llm_caller` lambda that routes through `raw_llm_call`, so it correctly uses `messages_caller` when available instead of always requiring `llm_caller`
+- `raw_llm_call` updated to prefer `messages_caller` over `llm_caller` when a non-empty messages array is present
+- `ModelRouter` classifier response matching now uses word-boundary regex (`/\bsimple\b/`, `/\bcomplex\b/`) to handle decorated responses like `"simple."`, `"**complex**"`, or `"the answer is simple"` — previously only exact string match was used
+- `ModelRouter` classifier failures (any `StandardError`) and unrecognized responses both fall through silently to the word-count heuristic; no exception is raised to the caller
+- `validate_conversation_options!` raises `ConfigurationError` if both `conversation_id` and `messages:` are supplied, or if `conversation_id` is used without `redis_url`
+
+### Fixed
+- `HistoryManager` summarization raised `ConfigurationError: No llm_caller configured` when called inside the pipeline without a bound config — internal lambda now correctly captures `call_config`
+
 ## [0.1.4] - 2026-04-13
 
 ### Fixed
@@ -79,7 +117,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - `OptimizeResult` struct with `response`, `model`, `model_tier`, `cache_status`, `original_tokens`, `compressed_tokens`, `latency_ms`, `messages`
 - Unit test suite covering all components with positive and negative scenarios using Minitest + Mocha
 
-[Unreleased]: https://github.com/arunkumarry/llm_optimizer/compare/v0.1.4...HEAD
+[Unreleased]: https://github.com/arunkumarry/llm_optimizer/compare/v0.1.5...HEAD
+[0.1.5]: https://github.com/arunkumarry/llm_optimizer/compare/v0.1.4...v0.1.5
 [0.1.4]: https://github.com/arunkumarry/llm_optimizer/compare/v0.1.3...v0.1.4
 [0.1.3]: https://github.com/arunkumarry/llm_optimizer/compare/v0.1.2...v0.1.3
 [0.1.2]: https://github.com/arunkumarry/llm_optimizer/compare/v0.1.1...v0.1.2
data/README.md CHANGED
@@ -21,8 +21,8 @@ Stores prompt embeddings in Redis. On subsequent calls, computes cosine similari
 
 Classifies each prompt and routes it to the appropriate model tier:
 
-- **Simple** → cheaper/faster model (e.g. `gpt-4o-mini`, `amazon.nova-micro`)
-- **Complex** → premium model (e.g. `claude-3-5-sonnet`, `gpt-4o`)
+- **Simple** → cheaper/faster model (e.g. `llama3`, `gemini-2.5-flash-lite`)
+- **Complex** → premium model (e.g. `claude-haiku-4-5-20251001`, `gemini-3.0-pro`)
 
 Routing uses a three-layer decision chain:
 
@@ -50,7 +50,7 @@ If `classifier_caller` is not set, the router falls back to the word-count heuri
 Removes common English stop words from prompts before sending to the LLM. Preserves fenced code block content unchanged. Typically reduces token count by 10–20%.
 
 ### 4. Conversation History Sliding Window
-When a conversation history exceeds the configured token budget, summarizes the oldest messages using the simple model and replaces them with a single system summary message.
+When a conversation history exceeds the configured token budget, summarizes the oldest messages using the simple model and replaces them with a single system summary message. Conversation history is stored in Redis for fast retrieval and summarization.
 
 ## Installation
 
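The sliding-window behavior described above can be sketched outside the gem. This is a minimal, hypothetical re-implementation, not the gem's `HistoryManager`: `slide_window`, the word-count token estimate, and the `summarize` lambda (standing in for the simple-model LLM call) are all invented names for illustration.

```ruby
# Hypothetical sketch of the sliding window: when estimated tokens exceed
# the budget, the oldest messages collapse into one system summary message.
def slide_window(messages, token_budget:, summarize:)
  estimate = ->(msgs) { msgs.sum { |m| m[:content].split.size } }
  return messages if estimate.call(messages) <= token_budget

  # Keep the most recent messages that still fit within the budget.
  kept = []
  messages.reverse_each do |m|
    break if estimate.call(kept + [m]) > token_budget

    kept.unshift(m)
  end

  oldest  = messages[0...(messages.size - kept.size)]
  summary = summarize.call(oldest) # the gem would call the simple model here
  [{ role: "system", content: summary }] + kept
end
```

The gem's actual summarization prompt, token estimator, and window boundaries may differ; this only shows the shape of the transformation.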
@@ -99,7 +99,7 @@ result = LlmOptimizer.optimize("What is Redis?")
 puts result.response          # => "Redis is an in-memory data store..."
 puts result.cache_status      # => :hit or :miss
 puts result.model_tier        # => :simple or :complex
-puts result.model             # => "gpt-4o-mini"
+puts result.model             # => "gemini-2.5-flash-lite"
 puts result.original_tokens   # => 5
 puts result.compressed_tokens # => 4
 puts result.latency_ms        # => 12.4
@@ -110,39 +110,50 @@ puts result.latency_ms # => 12.4
 
 ### Rails initializer
 
 ```ruby
+# config/initializers/llm_optimizer.rb
+require "llm_optimizer"
+
 LlmOptimizer.configure do |config|
-  # Feature flags all off by default
+  # --- Feature flags (all off by default) ---
   config.compress_prompt = true    # strip stop words before sending to LLM
   config.use_semantic_cache = true # cache responses by vector similarity
   config.manage_history = true     # summarize old messages when over token budget
 
-  # Model routing
-  config.route_to = :auto                             # :auto | :simple | :complex
-  config.simple_model = "gpt-4o-mini"                 # model used for simple prompts
-  config.complex_model = "claude-3-5-sonnet-20241022" # model used for complex prompts
+  # --- Model routing ---
+  config.route_to = :auto                            # :auto, :simple, or :complex
+  config.simple_model = "gemini-2.5-flash-lite"      # used for simple prompts
+  config.complex_model = "claude-haiku-4-5-20251001" # used for complex prompts
 
-  # Redis (required if use_semantic_cache: true)
+  # --- Redis (required if use_semantic_cache: true) ---
   config.redis_url = ENV["REDIS_URL"]
 
-  # Tuning
-  config.similarity_threshold = 0.96 # cosine similarity cutoff for cache hit (0.0–1.0)
-  config.token_budget = 4000         # token limit before history summarization
-  config.cache_ttl = 86400           # cache TTL in seconds (default: 24h)
+  # --- Token / cache settings ---
+  config.similarity_threshold = 0.96 # cosine similarity cutoff for cache hit
+  config.token_budget = 4000         # max tokens before history summarization
+  config.cache_ttl = 86400           # cache TTL in seconds (24h)
   config.timeout_seconds = 5         # timeout for external API calls
 
-  # Logging
+  # --- Logging ---
   config.logger = Rails.logger
-  config.debug_logging = Rails.env.development? # logs full prompt+response at DEBUG level
+  config.debug_logging = Rails.env.development? # logs full prompt+response in dev
 
-  # LLM caller wire to your existing LLM client (required)
+  # --- Wire up your app's LLM client ---
+  # Replace the body with however your app calls the LLM
   config.llm_caller = ->(prompt, model:) {
-    RubyLLM.chat(model: model, assume_model_exists: true).ask(prompt).content
+    model ||= "claude-haiku-4-5-20251001"
+    provider = if model.include?("claude") then :anthropic
+               elsif model.include?("gpt") then :openai
+               elsif model.include?("gemini") then :gemini
+               else :ollama
+               end
+    chat = RubyLLM.chat(model: model, provider: provider, assume_model_exists: true)
+    chat.ask(prompt).content
   }
 
   # Embeddings caller — wire to your embeddings provider (required if use_semantic_cache: true)
-  # Falls back to OpenAI via ENV["OPENAI_API_KEY"] if not set
   config.embedding_caller = ->(text) {
-    MyEmbeddingService.embed(text)
+    response = RubyLLM.embed(text, provider: :gemini, model: 'gemini-embedding-001')
+    response.vectors
   }
 
   # Classifier caller — optional, improves routing accuracy for ambiguous prompts
@@ -151,7 +162,18 @@ LlmOptimizer.configure do |config|
     RubyLLM.chat(model: "amazon.nova-micro-v1:0", provider: :bedrock, assume_model_exists: true)
            .ask(prompt).content.strip.downcase
   }
+
+  # Messages caller (optional): handles conversation summaries and the history manager.
+  config.system_prompt = "You are a sarcastic comic person who gives witty responses in a non harmful way. If any serious question is asked, handle it in a calm way."
+
+  config.messages_caller = ->(messages, model:) {
+    chat = RubyLLM.chat(model: model)
+    messages[0..-2].each { |m| chat.add_message(role: m[:role], content: m[:content]) }
+    response = chat.ask(messages.last[:content])
+    response.content
+  }
 end
+
 ```
 
 ### Configuration reference
@@ -162,19 +184,23 @@ end
 | `use_semantic_cache` | Boolean | `false` | Enable Redis-backed semantic cache |
 | `manage_history` | Boolean | `false` | Enable conversation history summarization |
 | `route_to` | Symbol | `:auto` | `:auto`, `:simple`, or `:complex` |
-| `simple_model` | String | `"gpt-4o-mini"` | Model for simple prompts |
-| `complex_model` | String | `"claude-3-5-sonnet-20241022"` | Model for complex prompts |
+| `simple_model` | String | `"gemini-2.5-flash-lite"` | Model for simple prompts |
+| `complex_model` | String | `"claude-haiku-4-5-20251001"` | Model for complex prompts |
 | `similarity_threshold` | Float | `0.96` | Minimum cosine similarity for cache hit |
 | `token_budget` | Integer | `4000` | Token limit before history summarization |
 | `cache_ttl` | Integer | `86400` | Cache entry TTL in seconds |
 | `timeout_seconds` | Integer | `5` | Timeout for external API calls |
 | `redis_url` | String | `nil` | Redis connection URL |
-| `embedding_model` | String | `"text-embedding-3-small"` | Embedding model name (OpenAI fallback) |
+| `embedding_model` | String | `"gemini-embedding-001"` | Embedding model name |
 | `logger` | Logger | `Logger.new($stdout)` | Any Logger-compatible object |
 | `debug_logging` | Boolean | `false` | Log full prompt and response at DEBUG level |
 | `llm_caller` | Lambda | `nil` | `(prompt, model:) -> String` |
 | `embedding_caller` | Lambda | `nil` | `(text) -> Array<Float>` |
 | `classifier_caller` | Lambda | `nil` | `(prompt) -> "simple" or "complex"` |
+| `messages_caller` | Lambda | `nil` | `(messages, model:) -> String` — used when `conversation_id` is present; receives full history including current user turn |
+| `system_prompt` | String | `nil` | Seeded as the first system message when a new conversation is created via `conversation_id` |
+| `conversation_ttl` | Integer | `86400` | TTL in seconds for Redis-backed conversation history (`0` for no expiry) |
+| `with_tools` | Array | `nil` | Tools (functions) available to the LLM; passed as `tools:` keyword to callers |
 
 ## Per-call configuration
 
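The `with_tools` entry in the table above takes an array of tool definitions. As a hedged illustration only: the hash below uses the OpenAI function-calling shape mentioned in the gem's initializer template, and `get_weather` with its `city` parameter is an invented example, not part of the gem.

```ruby
# Hypothetical tool definition in the OpenAI function-calling shape.
# The exact schema your provider expects may differ.
weather_tool = {
  type: "function",
  function: {
    name: "get_weather",
    description: "Look up the current weather for a city",
    parameters: {
      type: "object",
      properties: { city: { type: "string" } },
      required: ["city"]
    }
  }
}

tools = [weather_tool]
```

Such an array could then be set globally (`config.with_tools = tools`) or, per the README's per-call configuration section, inside the block passed to `optimize`.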
@@ -187,32 +213,6 @@ result = LlmOptimizer.optimize(prompt) do |config|
 end
 ```
 
-## Conversation history
-
-Pass a `messages` array to enable history management:
-
-```ruby
-messages = [
-  { role: "user", content: "Tell me about Redis" },
-  { role: "assistant", content: "Redis is an in-memory data store..." },
-  # ... more messages
-]
-
-result = LlmOptimizer.optimize("What else can it do?", messages: messages)
-
-# result.messages contains the (possibly summarized) messages array
-```
-
-## Opt-in client wrapping
-
-Transparently wrap an existing LLM client class so all calls through it are automatically optimized:
-
-```ruby
-LlmOptimizer.wrap_client(OpenAI::Client)
-```
-
-This prepends the optimization pipeline into the client's `chat` method. Safe to call multiple times idempotent.
-
 ## OptimizeResult
 
 Every call returns an `OptimizeResult` struct:
@@ -226,20 +226,9 @@ Every call returns an `OptimizeResult` struct:
 | `original_tokens` | Integer | Estimated token count before compression |
 | `compressed_tokens` | Integer | Estimated token count after compression (`nil` if not compressed) |
 | `latency_ms` | Float | Total wall-clock time for the optimize call |
-| `messages` | Array | Final messages array (for history management) |
-
-## Error handling
-
-The gem defines a hierarchy of errors, all inheriting from `LlmOptimizer::Error`:
-
-```
-LlmOptimizer::Error
-├── LlmOptimizer::ConfigurationError # unknown config key, missing llm_caller
-├── LlmOptimizer::EmbeddingError     # embedding API failure
-└── LlmOptimizer::TimeoutError       # network timeout exceeded
-```
+| `messages` | Array | Final messages array sent to the LLM, after history management and conversation hydration (`nil` on a cache hit) |
 
-The gateway catches all component failures and falls through to a raw LLM call with the original prompt. Your app's core functionality is never blocked by the optimizer.
+The `messages` field reflects the actual array passed to `messages_caller` (or built from `conversation_id`), including any summarization applied by the history manager. You can pass it back as `options[:messages]` on the next call to continue a stateless conversation.
 
 ## Resilience
 
@@ -249,7 +238,9 @@ The gateway catches all component failures and falls through to a raw LLM call w
 | Redis unavailable (write) | Log warning, return LLM result normally |
 | Embedding API failure | Treat as cache miss, continue |
 | Any component exception | Log error, fall through to raw LLM call |
-| History summarization failure | Log error, return original messages unchanged |
+| History summarization failure | Log warning, return original messages unchanged |
+| Conversation load failure | Log warning, proceed without history |
+| Conversation save failure | Log warning, return result with pre-save messages |
 
 ## Development
 
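The fall-through row in the resilience table above ("any component exception → raw LLM call") can be sketched in isolation. This is an invented, minimal stand-in for the gateway's behavior, not the gem's actual code; every name here (`optimize_with_fallback`, the `pipeline:`, `raw_llm_call:`, and `logger:` lambdas) is hypothetical.

```ruby
# Minimal sketch of the fall-through pattern: a failing optimization
# pipeline is logged and the original prompt goes straight to the raw call.
def optimize_with_fallback(prompt, pipeline:, raw_llm_call:, logger:)
  pipeline.call(prompt)
rescue StandardError => e
  logger.call("[llm_optimizer] pipeline failed: #{e.message}")
  raw_llm_call.call(prompt)
end
```

The point of the design is that a Redis outage or embedding failure degrades to a plain LLM call rather than surfacing an exception to the host app.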
@@ -15,8 +15,8 @@ LlmOptimizer.configure do |config|
   # --- Model routing ---
   # :auto classifies each prompt; :simple or :complex forces a tier
   config.route_to = :auto
-  config.simple_model = "gpt-4o-mini"
-  config.complex_model = "gpt-4o"
+  config.simple_model = "gemini-1.5-flash"
+  config.complex_model = "claude-haiku-4-5"
 
   # --- Redis (required only if use_semantic_cache: true) ---
   config.redis_url = ENV.fetch("REDIS_URL", nil)
@@ -27,6 +27,9 @@ LlmOptimizer.configure do |config|
   config.cache_ttl = 86_400  # cache entry TTL in seconds (default: 24h)
   config.timeout_seconds = 5 # timeout for embedding / external API calls
 
+  # --- Tools ---
+  # config.with_tools = [] # Array of tool definitions (OpenAI/Anthropic format)
+
   # --- Logging ---
   config.logger = Rails.logger
   config.debug_logging = Rails.env.development?
@@ -76,4 +79,45 @@ LlmOptimizer.configure do |config|
   # }
   #
   # config.classifier_caller = nil
+
+  # --- Messages caller (optional) ---
+  # Handles conversation history and summarization.
+  # config.system_prompt = "You are a helpful person who gives responses in a non harmful way. " \
+  #                        "If any serious question is asked, handle it effectively."
+
+  # OpenAI implementation:
+  # config.messages_caller = ->(messages, model:, tools: nil) {
+  #   parameters = {
+  #     model: model,
+  #     messages: messages.map { |m| { role: m[:role], content: m[:content] } }
+  #   }
+  #   parameters[:tools] = tools if tools&.any?
+  #
+  #   response = $openai.chat(parameters: parameters)
+  #   response.dig("choices", 0, "message", "content")
+  # }
+
+  # RubyLLM implementation:
+  # config.messages_caller = ->(messages, model:, tools: nil) {
+  #   chat = RubyLLM.chat(model: model)
+  #   chat.with_tools(*tools) if tools&.any?
+  #   messages[0..-2].each { |m| chat.add_message(role: m[:role], content: m[:content]) }
+  #   chat.ask(messages.last[:content]).content
+  # }
+
+  # Anthropic implementation:
+  # config.messages_caller = ->(messages, model:, tools: nil) {
+  #   # Anthropic separates system messages from the messages array
+  #   system_msg = messages.find { |m| m[:role] == "system" }&.dig(:content)
+  #   chat_msgs = messages.reject { |m| m[:role] == "system" }
+  #                       .map { |m| { role: m[:role], content: m[:content] } }
+  #
+  #   response = $anthropic.messages(
+  #     model: model,
+  #     max_tokens: 1024,
+  #     system: system_msg,
+  #     messages: chat_msgs,
+  #     tools: tools
+  #   )
+  #   response["content"].first["text"]
+  # }
 end
@@ -22,6 +22,13 @@ module LlmOptimizer
     llm_caller
     embedding_caller
     classifier_caller
+    conversation_ttl
+    system_prompt
+    messages_caller
+    cache_scope
+    tools
+    with_tools
+    tools_caller
   ].freeze
 
   # Define readers for all known keys (setters below track explicit sets)
@@ -47,6 +54,9 @@ module LlmOptimizer
     @llm_caller = nil
     @embedding_caller = nil
     @classifier_caller = nil
+    @conversation_ttl = 86_400
+    @system_prompt = nil
+    @with_tools = nil
   end
 
   # Copies only explicitly set keys from other_config without resetting unmentioned keys.
@@ -0,0 +1,83 @@
+# frozen_string_literal: true
+
+module LlmOptimizer
+  class ConversationStore
+    KEY_NAMESPACE = "llm_optimizer:conversation:"
+
+    def initialize(redis_client, ttl:, logger:, debug_logging: false, system_prompt: nil)
+      @redis = redis_client
+      @ttl = ttl
+      @logger = logger
+      @debug_logging = debug_logging
+      @system_prompt = system_prompt
+    end
+
+    # Loads and returns the messages array for conversation_id.
+    # Returns [] if no key exists or on Redis error (logs warning).
+    def load(conversation_id)
+      key = redis_key(conversation_id)
+      raw = @redis.get(key)
+
+      if raw.nil?
+        messages = seed_messages
+        @logger.info("[llm_optimizer] ConversationStore load: conversation_id=#{conversation_id}, count=#{messages.size}")
+        log_debug_history(conversation_id, messages)
+        return messages
+      end
+
+      messages = JSON.parse(raw, symbolize_names: true)
+      @logger.info("[llm_optimizer] ConversationStore load: conversation_id=#{conversation_id}, count=#{messages.size}")
+      log_debug_history(conversation_id, messages)
+      messages
+    rescue Redis::BaseError => e
+      @logger.warn("[llm_optimizer] ConversationStore load failed: conversation_id=#{conversation_id}, error=#{e.message}")
+      []
+    end
+
+    # Appends user + assistant messages to history and persists to Redis.
+    # Silently logs warning on Redis error; never raises.
+    def save(conversation_id, messages, prompt, response)
+      updated_messages = messages + [
+        { role: "user", content: prompt },
+        { role: "assistant", content: response }
+      ]
+
+      key = redis_key(conversation_id)
+      json = JSON.generate(updated_messages)
+
+      if @ttl.zero?
+        @redis.set(key, json)
+      else
+        @redis.set(key, json, ex: @ttl)
+      end
+
+      @logger.info("[llm_optimizer] ConversationStore save: conversation_id=#{conversation_id}, count=#{updated_messages.size}")
+      log_debug_history(conversation_id, updated_messages)
+      updated_messages
+    rescue Redis::BaseError => e
+      @logger.warn("[llm_optimizer] ConversationStore save failed: conversation_id=#{conversation_id}, error=#{e.message}")
+      nil
+    end
+
+    private
+
+    def redis_key(conversation_id)
+      "#{KEY_NAMESPACE}#{conversation_id}"
+    end
+
+    def seed_messages
+      return [] unless @system_prompt
+
+      [
+        { role: "user", content: @system_prompt },
+        { role: "assistant", content: "Got it!" }
+      ]
+    end
+
+    def log_debug_history(conversation_id, messages)
+      return unless @debug_logging
+
+      @logger.debug("[llm_optimizer] ConversationStore history: conversation_id=#{conversation_id}, messages=#{messages.inspect}")
+    end
+  end
+end
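The load/save round trip of the `ConversationStore` added above can be illustrated with an in-memory stand-in for the Redis client. `FakeRedis` below is invented for demonstration only; the JSON serialization, symbolized keys, and key namespace mirror what the new file does.

```ruby
require "json"

# Invented in-memory stand-in for the Redis client, illustrating the
# ConversationStore round trip (get/set with an `ex:` TTL option).
class FakeRedis
  def initialize
    @store = {}
  end

  def get(key)
    @store[key]
  end

  def set(key, value, ex: nil)
    @store[key] = value
  end
end

redis = FakeRedis.new
key = "llm_optimizer:conversation:user-42"

history = [{ role: "user", content: "Tell me about Redis" },
           { role: "assistant", content: "Redis is an in-memory data store..." }]

# save-side: append the new user/assistant turn and persist as JSON with a TTL
updated = history + [{ role: "user", content: "What else can it do?" },
                     { role: "assistant", content: "Pub/sub, streams, Lua scripting..." }]
redis.set(key, JSON.generate(updated), ex: 86_400)

# load-side: parse back with symbolized keys, as the store does
loaded = JSON.parse(redis.get(key), symbolize_names: true)
```

Each optimize call with a `conversation_id` repeats this cycle, so the key always holds the full (possibly summarized) history.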
@@ -10,10 +10,12 @@ module LlmOptimizer
   Classify the following prompt as either 'simple' or 'complex'.
 
   Rules:
-  - simple: factual questions, basic lookups, short explanations, greetings
+  - simple: factual questions, basic lookups, short explanations, greetings, chitchat, general statements, simple mathematical calculations with additions, subtractions, multiplications and divisions
+    Example - Hello, Bye, You are funny, how are you?, what is the capital of France, tell me about yourself, what is 2 + 3 - 1 * 10 / 2 etc.
   - complex: code generation, debugging, architecture, multi-step reasoning, analysis
+    Example - how does pandas extract my information, debug this code, why is rag apps consume more tokens, give me code to print star in python etc.
 
-  Reply with exactly one word: simple or complex
+  Reply with exactly one word, no punctuation: simple or complex
 
   Prompt: %<prompt>s
 PROMPT
@@ -48,9 +50,12 @@ module LlmOptimizer
     def classify_with_llm(prompt)
       classifier_prompt = format(CLASSIFIER_PROMPT, prompt: prompt)
       response = @config.classifier_caller.call(classifier_prompt)
-      normalized = response.to_s.strip.downcase.gsub(/[^a-z]/, "")
-      return :simple if normalized == "simple"
-      return :complex if normalized == "complex"
+      normalized = response.to_s.strip.downcase
+
+      # Check for word boundary match to handle responses like
+      # "simple." / "**simple**" / "the answer is simple"
+      return :simple if normalized.match?(/\bsimple\b/)
+      return :complex if normalized.match?(/\bcomplex\b/)
 
       nil # unrecognized response — fall through to heuristic
     rescue StandardError
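The effect of the word-boundary change above can be checked in isolation. This is a standalone re-implementation of just the matching logic for demonstration (`classify_response` is an invented name, not the gem's method):

```ruby
# Standalone re-implementation of the classifier-response matching above:
# word-boundary regexes accept decorated responses the old exact match rejected.
def classify_response(response)
  normalized = response.to_s.strip.downcase
  return :simple  if normalized.match?(/\bsimple\b/)
  return :complex if normalized.match?(/\bcomplex\b/)

  nil # unrecognized — the caller falls through to the word-count heuristic
end
```

Note that `\b` treats `*`, `.`, and spaces as non-word characters, which is exactly why `"**complex**"` and `"simple."` now match.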
@@ -1,8 +1,43 @@
 # frozen_string_literal: true
 
 module LlmOptimizer
-  OptimizeResult = Struct.new(
-    :response, :model, :model_tier, :cache_status,
-    :original_tokens, :compressed_tokens, :latency_ms, :messages
-  )
+  class OptimizeResult
+    attr_accessor :response, :model, :model_tier, :cache_status,
+                  :original_tokens, :compressed_tokens, :input_tokens,
+                  :output_tokens, :cached_tokens, :latency_ms, :messages
+
+    # rubocop:disable Metrics/ParameterLists
+    def initialize(response: nil, model: nil, model_tier: nil, cache_status: nil,
+                   original_tokens: 0, compressed_tokens: 0, input_tokens: 0,
+                   output_tokens: 0, cached_tokens: 0, latency_ms: 0, messages: [])
+      @response = response
+      @model = model
+      @model_tier = model_tier
+      @cache_status = cache_status
+      @original_tokens = original_tokens
+      @compressed_tokens = compressed_tokens
+      @input_tokens = input_tokens
+      @output_tokens = output_tokens
+      @cached_tokens = cached_tokens
+      @latency_ms = latency_ms
+      @messages = messages
+    end
+    # rubocop:enable Metrics/ParameterLists
+
+    def to_h
+      {
+        response: @response,
+        model: @model,
+        model_tier: @model_tier,
+        cache_status: @cache_status,
+        original_tokens: @original_tokens,
+        compressed_tokens: @compressed_tokens,
+        input_tokens: @input_tokens,
+        output_tokens: @output_tokens,
+        cached_tokens: @cached_tokens,
+        latency_ms: @latency_ms,
+        messages: @messages
+      }
+    end
+  end
 end
@@ -0,0 +1,174 @@
1
+ # frozen_string_literal: true
2
+
3
+ module LlmOptimizer
4
+ # Internal pipeline helpers — not part of the public API.
5
+ # Extended into LlmOptimizer as private class methods.
6
+ module Pipeline
7
+ private
8
+
9
+ def build_call_config(options, &block)
10
+ cfg = Configuration.new
11
+ cfg.merge!(configuration)
12
+ options.each do |k, v|
13
+ next unless Configuration::KNOWN_KEYS.include?(k.to_sym)
14
+
15
+ cfg.public_send(:"#{k}=", v)
16
+ end
17
+ block&.call(cfg)
18
+ cfg
19
+ end
20
+
21
+ def validate_conversation_options!(conversation_id, options, call_config)
22
+ if conversation_id && options[:messages]
23
+ raise ConfigurationError,
24
+ "conversation_id and messages: are mutually exclusive — pass one or the other"
25
+ end
26
+
27
+ return unless conversation_id && call_config.redis_url.nil?
28
+
29
+ raise ConfigurationError,
30
+ "redis_url must be configured to use conversation_id"
31
+ end
32
+
33
+ def compress(prompt, config)
34
+ return [prompt, nil] unless config.compress_prompt
35
+
36
+ compressed = Compressor.new.compress(prompt)
37
+ [compressed, Compressor.new.estimate_tokens(compressed)]
38
+ end
39
+
40
+ def route(prompt, config)
41
+ router = ModelRouter.new(config)
42
+ model_tier = router.route(prompt)
43
+ model = model_tier == :simple ? config.simple_model : config.complex_model
44
+ [model_tier, model]
45
+ end
46
+
47
+ def load_conversation(conversation_id, options, config)
48
+ return [options[:messages], nil] unless conversation_id
49
+
50
+ redis = build_redis(config.redis_url)
51
+ store = ConversationStore.new(redis,
52
+ ttl: config.conversation_ttl,
53
+ logger: config.logger,
54
+ debug_logging: config.debug_logging,
55
+ system_prompt: config.system_prompt)
56
+ [store.load(conversation_id), store]
57
+ end
58
+
59
+ def apply_history_manager(messages, config)
60
+ return messages unless config.manage_history && messages
61
+
62
+ llm_caller = ->(p, model:) { raw_llm_call(p, model: model, config: config) }
63
+ history_mgr = HistoryManager.new(
64
+ llm_caller: llm_caller,
65
+ simple_model: config.simple_model,
66
+ token_budget: config.token_budget
67
+ )
68
+ history_mgr.process(messages)
69
+ end
70
+
71
+ def persist_conversation(store, conversation_id, messages, prompt, response)
72
+ return messages unless store && conversation_id
73
+
74
+ store.save(conversation_id, messages, prompt, response) || messages
75
+ end
76
+
77
+ def build_result(response, model, model_tier, cache_status,
78
+ original_tokens, compressed_tokens, latency_ms, messages, token_info = {})
79
+ OptimizeResult.new(
80
+ response: response, model: model, model_tier: model_tier,
81
+ cache_status: cache_status, original_tokens: original_tokens,
82
+ compressed_tokens: compressed_tokens,
83
+ input_tokens: token_info[:input_tokens] || compressed_tokens || original_tokens,
84
+ output_tokens: token_info[:output_tokens],
85
+        cached_tokens: token_info[:cached_tokens],
+        latency_ms: latency_ms,
+        messages: messages
+      )
+    end
+
+    def fallback_result(original_prompt, original_tokens, options, start)
+      latency_ms = elapsed_ms(start)
+      response, _token_info = raw_llm_call(original_prompt, model: nil, config: configuration)
+      build_result(response, nil, nil, :miss, original_tokens || 0, nil,
+                   latency_ms, options[:messages])
+    end
+
+    def raw_llm_call(prompt, model:, messages: nil, config: nil)
+      tools = config&.with_tools || config&.tools
+      result = if messages && !messages.empty? && config&.messages_caller
+                 config.messages_caller.call(messages + [{ role: "user", content: prompt }], model: model, tools: tools)
+               else
+                 llm = config&.llm_caller || @_current_llm_caller
+                 raise ConfigurationError, "No llm_caller configured." unless llm
+
+                 llm.call(prompt, model: model, tools: tools)
+               end
+
+      if result.is_a?(Hash)
+        [result[:content], result]
+      else
+        [result, {}]
+      end
+    end
+
+    def elapsed_ms(start)
+      ((Process.clock_gettime(Process::CLOCK_MONOTONIC) - start) * 1000).round(2)
+    end
+
+    def emit_log(logger, config, cache_status:, model_tier:, original_tokens:,
+                 compressed_tokens:, latency_ms:, prompt:, response:)
+      logger.info(
+        "[llm_optimizer] { cache_status: #{cache_status.inspect}, " \
+        "model_tier: #{model_tier.inspect}, " \
+        "original_tokens: #{original_tokens.inspect}, " \
+        "compressed_tokens: #{compressed_tokens.inspect}, " \
+        "latency_ms: #{latency_ms.inspect} }"
+      )
+      logger.debug("[llm_optimizer] prompt=#{prompt.inspect} response=#{response.inspect}") if config.debug_logging
+    end
+
+    def build_redis(redis_url)
+      require "redis"
+      Redis.new(url: redis_url)
+    end
+
+    def semantic_cache_lookup(prompt, model, model_tier, original_tokens,
+                              compressed_tokens, original_prompt, start, config)
+      return [nil, nil] unless config.use_semantic_cache
+
+      embedding = config.embedding_caller.call(prompt)
+      cache = SemanticCache.new(build_redis(config.redis_url),
+                                threshold: config.similarity_threshold,
+                                ttl: config.cache_ttl,
+                                cache_scope: config.cache_scope)
+      cached, token_info = cache.lookup(embedding)
+
+      if cached
+        latency_ms = elapsed_ms(start)
+        emit_log(config.logger, config,
+                 cache_status: :hit, model_tier: model_tier,
+                 original_tokens: original_tokens, compressed_tokens: compressed_tokens,
+                 latency_ms: latency_ms, prompt: original_prompt, response: cached)
+
+        [embedding, build_result(cached, model, model_tier, :hit,
+                                 original_tokens, compressed_tokens, latency_ms, nil, token_info)]
+      else
+        [embedding, nil]
+      end
+    rescue StandardError => e
+      config.logger.warn("[llm_optimizer] semantic_cache_lookup failed: #{e.message}")
+      [nil, nil]
+    end
+
+    def store_in_cache(embedding, response, config, token_info = {})
+      return unless config.use_semantic_cache && embedding
+
+      SemanticCache.new(build_redis(config.redis_url),
+                        threshold: config.similarity_threshold,
+                        ttl: config.cache_ttl,
+                        cache_scope: config.cache_scope).store(embedding, response, token_info)
+    end
+  end
+end
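The new `raw_llm_call` helper above accepts two return shapes from a configured caller: a plain String, or a Hash carrying `:content` plus token metadata. A standalone sketch of that normalization (function and variable names here are illustrative, not the gem's API):

```ruby
# Illustrative re-implementation of the normalization inside raw_llm_call:
# a caller may return a plain String, or a Hash with :content plus token info.
def normalize_llm_result(result)
  if result.is_a?(Hash)
    [result[:content], result]  # keep the whole Hash as token_info
  else
    [result, {}]                # bare String: no token metadata
  end
end

# A caller that returns rich metadata; tools: mirrors the new pass-through.
hash_caller = ->(_prompt, model:, tools: nil) { { content: "rich answer", cached_tokens: 12 } }

content, token_info = normalize_llm_result(hash_caller.call("hi", model: "some-model"))
puts content                     # "rich answer"
puts token_info[:cached_tokens]  # 12
```

Either way the pipeline gets a uniform `[content, token_info]` pair, so cache entries can carry token metadata without forcing every caller to return a Hash.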
@@ -7,20 +7,19 @@ module LlmOptimizer
   class SemanticCache
     KEY_NAMESPACE = "llm_optimizer:cache:"
 
-    def initialize(redis_client, threshold:, ttl:)
-      @redis = redis_client
-      @threshold = threshold
-      @ttl = ttl
+    def initialize(redis_client, threshold:, ttl:, cache_scope: nil)
+      @redis = redis_client
+      @threshold = threshold
+      @ttl = ttl
+      @cache_scope = cache_scope
     end
 
-    def store(embedding, response)
+    def store(embedding, response, token_info = {})
       key = cache_key(embedding)
-      # Serialize embedding as raw 64-bit big-endian doubles to preserve full
-      # Float precision. MessagePack silently downcasts Ruby Float to 32-bit,
-      # which corrupts cosine similarity on deserialization.
       payload = MessagePack.pack({
-        "embedding" => embedding.pack("G*"), # binary string, lossless
-        "response" => response
+        "embedding" => embedding.pack("G*"),
+        "response" => response,
+        "token_info" => token_info
       })
       @redis.set(key, payload, ex: @ttl)
     rescue ::Redis::BaseError => e
@@ -28,28 +27,32 @@ module LlmOptimizer
     end
 
     def lookup(embedding)
-      keys = @redis.keys("#{KEY_NAMESPACE}*")
+      prefix = KEY_NAMESPACE
+      prefix += "#{@cache_scope}:" if @cache_scope
+      keys = @redis.keys("#{prefix}*")
+
+      keys.reject! { |k| k.count(":") > 2 } unless @cache_scope
+
       return nil if keys.empty?
 
       best_score = -Float::INFINITY
-      best_response = nil
+      best_entry = nil
 
       keys.each do |key|
         raw = @redis.get(key)
         next unless raw
 
         entry = MessagePack.unpack(raw)
-        # Unpack the binary string back to 64-bit doubles
         stored_embedding = entry["embedding"].unpack("G*")
         score = cosine_similarity(embedding, stored_embedding)
 
         if score > best_score
           best_score = score
-          best_response = entry["response"]
+          best_entry = entry
         end
       end
 
-      best_score >= @threshold ? best_response : nil
+      [best_entry["response"], best_entry["token_info"] || {}] if best_score >= @threshold
     rescue ::Redis::BaseError => e
       warn "[llm_optimizer] SemanticCache lookup failed: #{e.message}"
       nil
@@ -70,7 +73,9 @@ module LlmOptimizer
       # Use "G*" (64-bit big-endian double) to match Ruby's native Float precision.
       # "f*" (32-bit) truncates precision and produces inconsistent hashes for the
       # same embedding across serialize/deserialize round trips.
-      KEY_NAMESPACE + Digest::SHA256.hexdigest(embedding.pack("G*"))
+      prefix = KEY_NAMESPACE
+      prefix += "#{@cache_scope}:" if @cache_scope
+      prefix + Digest::SHA256.hexdigest(embedding.pack("G*"))
     end
   end
 end
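Two mechanics in this file are worth seeing in isolation: the lossless `pack("G*")` embedding round trip that the comments describe, and the new `cache_scope` segment that lands between the key namespace and the SHA256 digest. A standalone sketch (the helper name is illustrative, not the gem's API):

```ruby
require "digest"

embedding = [0.12345678901234567, -0.9876543210987654]

# "G*" stores each Float as a 64-bit big-endian double, so the round trip
# is exact; "f*" truncates to 32 bits and changes the values.
round_trip = embedding.pack("G*").unpack("G*")
lossy      = embedding.pack("f*").unpack("f*")

# Illustrative version of cache_key: the optional scope is inserted between
# the fixed namespace and the digest, so scoped entries never collide with
# unscoped ones (scoped keys carry one extra ":").
def scoped_cache_key(embedding, cache_scope = nil)
  prefix = "llm_optimizer:cache:"
  prefix += "#{cache_scope}:" if cache_scope
  prefix + Digest::SHA256.hexdigest(embedding.pack("G*"))
end

puts round_trip == embedding                 # true
puts scoped_cache_key(embedding, "user_42")  # llm_optimizer:cache:user_42:<sha256>
```

The extra-colon layout is also what the `lookup` filter relies on: without a scope it rejects keys containing more than two colons, so unscoped lookups skip every scoped entry.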
@@ -1,5 +1,5 @@
 # frozen_string_literal: true
 
 module LlmOptimizer
-  VERSION = "0.1.4"
+  VERSION = "0.1.6"
 end
data/lib/llm_optimizer.rb CHANGED
@@ -8,26 +8,21 @@ require_relative "llm_optimizer/model_router"
 require_relative "llm_optimizer/embedding_client"
 require_relative "llm_optimizer/semantic_cache"
 require_relative "llm_optimizer/history_manager"
+require_relative "llm_optimizer/conversation_store"
+require_relative "llm_optimizer/pipeline"
 
 require "llm_optimizer/railtie" if defined?(Rails)
 
 module LlmOptimizer
-  # Base error class for all gem-specific exceptions
   class Error < StandardError; end
-
-  # Raised when an unrecognized configuration key is set
   class ConfigurationError < Error; end
-
-  # Raised when the embedding API call fails
   class EmbeddingError < Error; end
-
-  # Raised when a network timeout is exceeded
   class TimeoutError < Error; end
 
-  # Global configuration
   @configuration = nil
 
-  # Yields a Configuration instance; merges it into the global config.
+  extend Pipeline
+
   def self.configure
     temp = Configuration.new
     yield temp
@@ -35,7 +30,6 @@ module LlmOptimizer
     validate_configuration!(configuration)
   end
 
-  # Warns about misconfigured options rather than failing silently at call time.
   def self.validate_configuration!(config)
     return unless config.use_semantic_cache && config.embedding_caller.nil?
 
@@ -46,36 +40,32 @@ module LlmOptimizer
     config.use_semantic_cache = false
   end
 
-  # Returns the current global Configuration, lazy-initializing if nil.
   def self.configuration
     @configuration ||= Configuration.new
   end
 
-  # Replaces the global config with a fresh default Configuration.
-  # Useful in tests to avoid state leakage.
   def self.reset_configuration!
     @configuration = Configuration.new
   end
 
-  # Opt-in client wrapping
-  # WrapperModule intercepts `chat` on the wrapped client, runs the pre-call
-  # optimization pipeline (compress, route, cache lookup), and delegates the
-  # actual LLM call to the original client via `super` — so llm_caller is NOT
-  # required when using wrap_client.
+  def self.clear_conversation(conversation_id)
+    raise ConfigurationError, "redis_url must be configured to use clear_conversation" unless configuration.redis_url
+
+    redis = build_redis(configuration.redis_url)
+    key = "#{ConversationStore::KEY_NAMESPACE}#{conversation_id}"
+    deleted = redis.del(key)
+    deleted.positive?
+  rescue ::Redis::BaseError => e
+    raise LlmOptimizer::Error, "Redis error in clear_conversation: #{e.message}"
+  end
+
   module WrapperModule
-    def chat(params, &block)
+    def chat(params, &)
       config = LlmOptimizer.configuration
       prompt = params[:messages] || params[:prompt]
-
-      # Run pre-call pipeline: compress, route, cache lookup
       result = LlmOptimizer.optimize_pre_call(prompt, config)
+      return result[:response] if result[:cache_status] == :hit
 
-      # Cache hit — return immediately without calling the LLM
-      if result[:cache_status] == :hit
-        return result[:response]
-      end
-
-      # Apply compressed prompt and routed model, then delegate to original client
      optimized_params = params.merge(model: result[:model])
      if params[:messages]
        optimized_params = optimized_params.merge(messages: result[:prompt])
@@ -83,264 +73,80 @@ module LlmOptimizer
        optimized_params = optimized_params.merge(prompt: result[:prompt])
      end
 
-      response = super(optimized_params, &block)
-
-      # Store in cache after successful LLM call
+      response = super(optimized_params, &)
      LlmOptimizer.optimize_post_call(result, response, config)
-
      response
    end
  end
 
-  # Prepends WrapperModule into client_class; idempotent — safe to call N times.
  def self.wrap_client(client_class)
    return if client_class.ancestors.include?(WrapperModule)
 
    client_class.prepend(WrapperModule)
  end
 
-  # Primary entry point
-  # Runs the optimization pipeline and returns an OptimizeResult.
-
-  # options hash keys mirror Configuration attr_accessors and are merged over
-  # the global config for this call only. An optional block is yielded a
-  # per-call Configuration for fine-grained control.
-  def self.optimize(prompt, options = {})
-    start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
-
-    # Resolve per-call configuration — only pass known config keys
-    call_config = Configuration.new
-    call_config.merge!(configuration)
-    options.each do |k, v|
-      next unless LlmOptimizer::Configuration::KNOWN_KEYS.include?(k.to_sym)
-
-      call_config.public_send(:"#{k}=", v)
-    end
-    yield call_config if block_given?
-
-    logger = call_config.logger
-
-    # Keep a reference to the original prompt for fallback use
-    original_prompt = prompt
-
-    # Compression
-    compressor = Compressor.new
-    original_tokens = compressor.estimate_tokens(prompt)
-    compressed_tokens = nil
-
-    if call_config.compress_prompt
-      prompt = compressor.compress(prompt)
-      compressed_tokens = compressor.estimate_tokens(prompt)
-    end
-
-    # Model routing
-    router = ModelRouter.new(call_config)
-    model_tier = router.route(prompt)
-    model = model_tier == :simple ? call_config.simple_model : call_config.complex_model
-
-    # Semantic cache lookup
-    embedding = nil
-
-    if call_config.use_semantic_cache
-      begin
-        emb_client = EmbeddingClient.new(
-          model: call_config.embedding_model,
-          timeout_seconds: call_config.timeout_seconds,
-          embedding_caller: call_config.embedding_caller
-        )
-        embedding = emb_client.embed(prompt)
-
-        if call_config.redis_url
-          redis = build_redis(call_config.redis_url)
-          cache = SemanticCache.new(redis, threshold: call_config.similarity_threshold, ttl: call_config.cache_ttl)
-          cached = cache.lookup(embedding)
-
-          if cached
-            latency_ms = elapsed_ms(start)
-            emit_log(logger, call_config,
-                     cache_status: :hit, model_tier: model_tier,
-                     original_tokens: original_tokens, compressed_tokens: compressed_tokens,
-                     latency_ms: latency_ms, prompt: original_prompt, response: cached)
-            return OptimizeResult.new(
-              response: cached,
-              model: model,
-              model_tier: model_tier,
-              cache_status: :hit,
-              original_tokens: original_tokens,
-              compressed_tokens: compressed_tokens,
-              latency_ms: latency_ms,
-              messages: options[:messages]
-            )
-          end
-        end
-      rescue EmbeddingError => e
-        logger.warn("[llm_optimizer] EmbeddingError (treating as cache miss): #{e.message}")
-        embedding = nil
-        # continue pipeline as cache miss
-      end
-    end
+  def self.optimize(prompt, options = {}, &)
+    start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
+    call_config = build_call_config(options, &)
+    conversation_id = options[:conversation_id]
+    validate_conversation_options!(conversation_id, options, call_config)
 
-    # History management
-    messages = options[:messages]
-    if call_config.manage_history && messages
-      llm_caller = ->(p, model:) { raw_llm_call(p, model: model, config: call_config) }
-      history_mgr = HistoryManager.new(
-        llm_caller: llm_caller,
-        simple_model: call_config.simple_model,
-        token_budget: call_config.token_budget
-      )
-      messages = history_mgr.process(messages)
-    end
+    original_prompt = prompt
+    original_tokens = Compressor.new.estimate_tokens(prompt)
+    prompt, compressed_tokens = compress(prompt, call_config)
+    model_tier, model = route(prompt, call_config)
 
-    # Raw LLM call
-    response = raw_llm_call(prompt, model: model, config: call_config)
+    embedding, cached_result = semantic_cache_lookup(prompt, model, model_tier,
+                                                     original_tokens, compressed_tokens,
+                                                     original_prompt, start, call_config)
+    return cached_result if cached_result
 
-    # Cache store
-    if call_config.use_semantic_cache && embedding && call_config.redis_url
-      begin
-        redis = build_redis(call_config.redis_url)
-        cache = SemanticCache.new(redis, threshold: call_config.similarity_threshold, ttl: call_config.cache_ttl)
-        cache.store(embedding, response)
-      rescue StandardError => e
-        logger.warn("[llm_optimizer] SemanticCache store failed: #{e.message}")
-      end
-    end
+    messages, store = load_conversation(conversation_id, options, call_config)
+    messages = apply_history_manager(messages, call_config)
+    response, token_info = raw_llm_call(prompt, messages: messages, model: model, config: call_config)
+    messages = persist_conversation(store, conversation_id, messages, prompt, response)
+    store_in_cache(embedding, response, call_config, token_info)
 
-    # Build result
     latency_ms = elapsed_ms(start)
-    emit_log(logger, call_config,
+    emit_log(call_config.logger, call_config,
             cache_status: :miss, model_tier: model_tier,
             original_tokens: original_tokens, compressed_tokens: compressed_tokens,
             latency_ms: latency_ms, prompt: original_prompt, response: response)
 
-    OptimizeResult.new(
-      response: response,
-      model: model,
-      model_tier: model_tier,
-      cache_status: :miss,
-      original_tokens: original_tokens,
-      compressed_tokens: compressed_tokens,
-      latency_ms: latency_ms,
-      messages: messages
-    )
+    build_result(response, model, model_tier, :miss, original_tokens, compressed_tokens,
+                 latency_ms, messages, token_info)
  rescue EmbeddingError => e
-    # Treat embedding failures as cache miss — continue to raw LLM call
-    logger = configuration.logger
-    logger.warn("[llm_optimizer] EmbeddingError (outer rescue, treating as cache miss): #{e.message}")
-    latency_ms = elapsed_ms(start)
-    response = raw_llm_call(original_prompt, model: nil, config: configuration)
-    OptimizeResult.new(
-      response: response,
-      model: nil,
-      model_tier: nil,
-      cache_status: :miss,
-      original_tokens: original_tokens || 0,
-      compressed_tokens: nil,
-      latency_ms: latency_ms,
-      messages: options[:messages]
-    )
+    configuration.logger.warn("[llm_optimizer] EmbeddingError (outer rescue): #{e.message}")
+    fallback_result(original_prompt, original_tokens, options, start)
+  rescue ConfigurationError
+    raise
  rescue LlmOptimizer::Error, StandardError => e
-    logger = configuration.logger
-    logger.error("[llm_optimizer] #{e.class}: #{e.message}\n#{e.backtrace&.first(5)&.join("\n")}")
-    latency_ms = elapsed_ms(start)
-    response = raw_llm_call(original_prompt, model: nil, config: configuration)
-    OptimizeResult.new(
-      response: response,
-      model: nil,
-      model_tier: nil,
-      cache_status: :miss,
-      original_tokens: original_tokens || 0,
-      compressed_tokens: nil,
-      latency_ms: latency_ms,
-      messages: options[:messages]
-    )
+    configuration.logger.error("[llm_optimizer] #{e.class}: #{e.message}\n#{e.backtrace&.first(5)&.join("\n")}")
+    fallback_result(original_prompt, original_tokens, options, start)
  end
 
-  # Pre-call pipeline for wrap_client: compress, route, cache lookup.
-  # Returns a hash with :prompt, :model, :model_tier, :embedding, :cache_status, :response.
-  # Does NOT make an LLM call — the wrapped client handles that via super.
  def self.optimize_pre_call(prompt, config = configuration)
-    compressor = Compressor.new
-    prompt = compressor.compress(prompt) if config.compress_prompt
-
-    router = ModelRouter.new(config)
-    model_tier = router.route(prompt)
+    prompt = Compressor.new.compress(prompt) if config.compress_prompt
+    model_tier = ModelRouter.new(config).route(prompt)
    model = model_tier == :simple ? config.simple_model : config.complex_model
 
-    embedding = nil
-    if config.use_semantic_cache && config.redis_url
-      begin
-        emb_client = EmbeddingClient.new(
-          model: config.embedding_model,
-          timeout_seconds: config.timeout_seconds,
-          embedding_caller: config.embedding_caller
-        )
-        embedding = emb_client.embed(prompt)
-        redis = build_redis(config.redis_url)
-        cache = SemanticCache.new(redis, threshold: config.similarity_threshold, ttl: config.cache_ttl)
-        cached = cache.lookup(embedding)
-        return { prompt: prompt, model: model, model_tier: model_tier,
-                 embedding: embedding, cache_status: :hit, response: cached } if cached
-      rescue EmbeddingError => e
-        config.logger.warn("[llm_optimizer] wrap_client EmbeddingError (cache miss): #{e.message}")
-        embedding = nil
-      end
+    unless config.use_semantic_cache && config.redis_url
+      return { prompt: prompt, model: model, model_tier: model_tier,
+               embedding: nil, cache_status: :miss, response: nil }
+    end
+
+    embedding, result = semantic_cache_lookup(prompt, model, model_tier, nil, nil,
+                                              prompt, Process.clock_gettime(Process::CLOCK_MONOTONIC), config)
+    if result
+      return { prompt: prompt, model: model, model_tier: model_tier,
+               embedding: embedding, cache_status: :hit, response: result.response }
    end
 
    { prompt: prompt, model: model, model_tier: model_tier,
      embedding: embedding, cache_status: :miss, response: nil }
  end
 
-  # Post-call: store the LLM response in the semantic cache if applicable.
  def self.optimize_post_call(pre_call_result, response, config = configuration)
-    return unless config.use_semantic_cache && config.redis_url
-    return unless pre_call_result[:embedding]
-
-    redis = build_redis(config.redis_url)
-    cache = SemanticCache.new(redis, threshold: config.similarity_threshold, ttl: config.cache_ttl)
-    cache.store(pre_call_result[:embedding], response)
-  rescue StandardError => e
-    config.logger.warn("[llm_optimizer] wrap_client cache store failed: #{e.message}")
-  end
-
-  # Private helpers
-
-  class << self
-    private
-
-    def raw_llm_call(prompt, model:, config: nil)
-      caller = config&.llm_caller || @_current_llm_caller
-      unless caller
-        raise ConfigurationError,
-              "No llm_caller configured. " \
-              "Set it via LlmOptimizer.configure { |c| c.llm_caller = ->(prompt, model:) { ... } }"
-      end
-
-      caller.call(prompt, model: model)
-    end
-
-    def elapsed_ms(start)
-      ((Process.clock_gettime(Process::CLOCK_MONOTONIC) - start) * 1000).round(2)
-    end
-
-    def emit_log(logger, config, cache_status:, model_tier:, original_tokens:,
-                 compressed_tokens:, latency_ms:, prompt:, response:)
-      logger.info(
-        "[llm_optimizer] { cache_status: #{cache_status.inspect}, " \
-        "model_tier: #{model_tier.inspect}, " \
-        "original_tokens: #{original_tokens.inspect}, " \
-        "compressed_tokens: #{compressed_tokens.inspect}, " \
-        "latency_ms: #{latency_ms.inspect} }"
-      )
-
-      return unless config.debug_logging
-
-      logger.debug("[llm_optimizer] prompt=#{prompt.inspect} response=#{response.inspect}")
-    end
-
-    def build_redis(redis_url)
-      require "redis"
-      Redis.new(url: redis_url)
-    end
+    store_in_cache(pre_call_result[:embedding], response, config)
  end
end
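Per the changelog, `with_tools` (aliased as `tools`) and `cache_scope` are plain configuration options, and tool definitions are forwarded verbatim to the configured caller. A hedged usage sketch, not the gem's documented setup: `MyLLM`, the Redis URL, the scope string, and the tool schema are all placeholders.

```ruby
LlmOptimizer.configure do |c|
  c.redis_url = "redis://localhost:6379/0"
  c.llm_caller = lambda do |prompt, model:, tools: nil|
    # Call your LLM SDK here; return a String, or a Hash with :content
    # plus token metadata. MyLLM is a stand-in for a real client.
    MyLLM.chat(prompt, model: model, tools: tools)
  end
  # Tool definitions pass through unchanged to the caller above,
  # so any schema your SDK accepts should work.
  c.with_tools = [{ name: "get_weather", description: "Look up current weather" }]
  # Cache hits now only occur within this namespace.
  c.cache_scope = "billing_tenant"
end
```

Since `optimize` merges recognized option keys over the global config per call, passing `cache_scope:` directly to `optimize` should also work, assuming it is among the merged keys.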
metadata CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: llm_optimizer
 version: !ruby/object:Gem::Version
-  version: 0.1.4
+  version: 0.1.6
 platform: ruby
 authors:
 - arun kumar
@@ -100,10 +100,12 @@ files:
 - lib/llm_optimizer.rb
 - lib/llm_optimizer/compressor.rb
 - lib/llm_optimizer/configuration.rb
+- lib/llm_optimizer/conversation_store.rb
 - lib/llm_optimizer/embedding_client.rb
 - lib/llm_optimizer/history_manager.rb
 - lib/llm_optimizer/model_router.rb
 - lib/llm_optimizer/optimize_result.rb
+- lib/llm_optimizer/pipeline.rb
 - lib/llm_optimizer/railtie.rb
 - lib/llm_optimizer/semantic_cache.rb
 - lib/llm_optimizer/version.rb