llm_optimizer 0.1.2 → 0.1.3

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: 6a0351ff5590228acf939201d0c7eee71e33ee39a0cd20df33e76187c827ab34
- data.tar.gz: 6bc7df5aa71407be80ecd07104e1dbff9a25a9ecab7a95cb390261f97fa212e8
+ metadata.gz: ef2fcae7f3d39043f476a555b980685670c65e266f8fc3f9ca4309081d51c066
+ data.tar.gz: c5fb255ad280afba780ea3c417b377ae406dc828178609bf7e21c0bb4f1ba048
  SHA512:
- metadata.gz: e6822ea254300a957c8aa5953d267695ebc2ae4c1e2fa478492d175534bc990eac9bb5bb39127b8418e7306a3f5de9e8124832f9aeac9f723979e92a68babdec
- data.tar.gz: 4248569dc2a969518142ac2749b6b6e9dc6defedec71e67eeec2480ade3a1aea05261736a2438c994906c05a1d1a1a7991055d48eab9ac0cc33f1344a88a221c
+ metadata.gz: f84eba0ae06cd7541616c44c8630618eb09f3f8b1d1fe5b588eae285be6dd6a2fcc88f0868a00cbfb91e00b491f56232c0c592b3bbbea579748232a89e8aff1e
+ data.tar.gz: 80fd56954cfa497f2d7c16be68b4c41c6cd01128f3df1e2b1054c3d1005cb869b70317e60ae847dbae2d2f270119812d00d175a4dfa564c447791c2195bc7672
data/CHANGELOG.md CHANGED
@@ -7,6 +7,19 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
  ## [Unreleased]
 
+ ## [0.1.3] - 2026-04-10
+
+ ### Added
+ - `classifier_caller` config option — injectable lambda for LLM-based prompt classification
+ - Hybrid routing in `ModelRouter`: fast-path signals (code blocks, keywords) → LLM classifier → word-count heuristic fallback
+   - Fixes misclassification of short-but-complex prompts (e.g. "Fix this bug") and long-but-simple prompts
+   - Classifier failures (network errors, missing model, unexpected response) automatically fall through to the heuristic — no app impact
+ - Tests for classifier integration, failure fallback, and fast-path bypass
+
+ ### Changed
+ - `ModelRouter` routing logic now uses a three-layer decision chain instead of pure heuristics
+ - README updated with classifier documentation and routing decision flow
+
  ## [0.1.2] - 2026-04-10
 
  ### Fixed
data/README.md CHANGED
@@ -1,6 +1,6 @@
  # llm_optimizer
 
- A Smart Gateway for LLM API calls in Ruby and Rails applications. Reduces token usage and API costs through four composable optimizations — all opt-in, all independently configurable.
+ A Smart Gateway for LLM API calls in Ruby and Rails applications. Reduces token usage and API costs through four composable optimizations — all opt-in, all independently configurable.
 
  ## How it works
 
@@ -10,17 +10,41 @@ Every call to `LlmOptimizer.optimize` passes through an ordered pipeline:
  prompt → Compressor → ModelRouter → SemanticCache lookup → HistoryManager → LLM call → SemanticCache store → OptimizeResult
  ```
 
- Each stage is independently enabled via configuration flags. If any stage fails, the gem falls through to a raw LLM call — your app never breaks because of the optimizer.
+ Each stage is independently enabled via configuration flags. If any stage fails, the gem falls through to a raw LLM call — your app never breaks because of the optimizer.
 
  ## Optimizations
 
  ### 1. Semantic Caching
- Stores prompt embeddings in Redis. On subsequent calls, computes cosine similarity against stored embeddings. If similarity ≥ threshold, returns the cached response instantly — no LLM call made.
+ Stores prompt embeddings in Redis. On subsequent calls, computes cosine similarity against stored embeddings. If similarity ≥ threshold, returns the cached response instantly — no LLM call made.
 
  ### 2. Intelligent Model Routing
- Classifies each prompt using a heuristic and routes it to the appropriate model tier:
- - **Simple** — short prompts (< 20 words), no code blocks, no complex keywords → cheaper/faster model
- - **Complex** — code blocks, keywords like `analyze`, `refactor`, `debug`, `architect`, `explain in detail` → premium model
+
+ Classifies each prompt and routes it to the appropriate model tier:
+
+ - **Simple** → cheaper/faster model (e.g. `gpt-4o-mini`, `amazon.nova-micro`)
+ - **Complex** → premium model (e.g. `claude-3-5-sonnet`, `gpt-4o`)
+
+ Routing uses a three-layer decision chain:
+
+ 1. **Explicit override** — if `route_to: :simple` or `:complex` is set, always use that
+ 2. **Fast-path signals** — code blocks (` ``` `, `~~~`) and keywords (`analyze`, `refactor`, `debug`, `architect`, `explain in detail`) → instantly `:complex`, no LLM call
+ 3. **LLM classifier** (optional) — for ambiguous prompts, calls a cheap model with a classification prompt; falls back to the word-count heuristic if not configured or if the call fails
+
+ This hybrid approach fixes the core weakness of pure heuristics:
+
+ - `"Fix this bug"` → 3 words but `:complex` via classifier ✓
+ - `"Explain Ruby blocks simply"` → an explanation request, but `:simple` via classifier ✓
+ - `"analyze this code"` → keyword fast-path → `:complex` instantly (no classifier call) ✓
+
+ Configure the classifier with any cheap model your app already uses:
+
+ ```ruby
+ config.classifier_caller = ->(prompt) {
+   RubyLLM.chat(model: "amazon.nova-micro-v1:0", provider: :bedrock, assume_model_exists: true)
+     .ask(prompt).content.strip.downcase
+ }
+ ```
+
+ If `classifier_caller` is not set, the router falls back to the word-count heuristic (< 20 words → `:simple`).
 
  ### 3. Token Pruning
  Removes common English stop words from prompts before sending to the LLM. Preserves fenced code block content unchanged. Typically reduces token count by 10–20%.
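The three-layer chain documented above can be exercised standalone. The sketch below mirrors the described behavior under stated assumptions: the `route` helper, its keyword list, and the `classifier:` parameter are illustrative stand-ins, not the gem's API.

```ruby
# Minimal sketch of the routing chain: override → fast-path → classifier → heuristic.
CODE_BLOCK_RE = /```|~~~/
COMPLEX_TERMS = ["analyze", "refactor", "debug", "architect", "explain in detail"].freeze

def route(prompt, override: nil, classifier: nil)
  # Layer 1: explicit override always wins
  return override if %i[simple complex].include?(override)

  # Layer 2: unambiguous fast-path signals, no LLM call
  lower = prompt.downcase
  return :complex if CODE_BLOCK_RE.match?(prompt)
  return :complex if COMPLEX_TERMS.any? { |t| lower.include?(t) }

  # Layer 3: optional LLM classifier, falling back to word count on failure
  if classifier
    begin
      answer = classifier.call(prompt).to_s.strip.downcase
      return answer.to_sym if %w[simple complex].include?(answer)
    rescue StandardError
      # swallow classifier errors; the heuristic below takes over
    end
  end
  prompt.split.length < 20 ? :simple : :complex
end

route("analyze this code")                              # => :complex (keyword fast-path)
route("Fix this bug", classifier: ->(_) { "complex" })  # => :complex (classifier)
route("What is 2 + 2?")                                 # => :simple  (heuristic)
```

Note that a raising classifier is indistinguishable from an unset one: both end at the word-count heuristic, which is the no-app-impact guarantee the changelog mentions.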
@@ -120,6 +144,13 @@ LlmOptimizer.configure do |config|
    config.embedding_caller = ->(text) {
      MyEmbeddingService.embed(text)
    }
+
+   # Classifier caller — optional, improves routing accuracy for ambiguous prompts
+   # Falls back to the word-count heuristic if not set or if the call fails
+   config.classifier_caller = ->(prompt) {
+     RubyLLM.chat(model: "amazon.nova-micro-v1:0", provider: :bedrock, assume_model_exists: true)
+       .ask(prompt).content.strip.downcase
+   }
  end
  ```
 
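Since `classifier_caller` is just a lambda with the contract `(prompt) -> "simple" | "complex"`, tests and local development can inject a deterministic stub instead of a network-backed model. A sketch under that assumption; `stub_classifier` and its regex are illustrative, not part of the gem:

```ruby
# Deterministic stand-in for config.classifier_caller: no network, no model.
# The router passes its formatted classification prompt; we answer on a crude signal.
stub_classifier = ->(classification_prompt) {
  classification_prompt.match?(/debug|refactor|architect/i) ? "complex" : "simple"
}

stub_classifier.call("Prompt: please refactor my service object")  # => "complex"
stub_classifier.call("Prompt: what's the capital of France?")      # => "simple"
```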
@@ -143,6 +174,7 @@ end
  | `debug_logging` | Boolean | `false` | Log full prompt and response at DEBUG level |
  | `llm_caller` | Lambda | `nil` | `(prompt, model:) -> String` |
  | `embedding_caller` | Lambda | `nil` | `(text) -> Array<Float>` |
+ | `classifier_caller` | Lambda | `nil` | `(prompt) -> "simple" or "complex"` |
 
  ## Per-call configuration
 
@@ -179,7 +211,7 @@ Transparently wrap an existing LLM client class so all calls through it are auto
  LlmOptimizer.wrap_client(OpenAI::Client)
  ```
 
- This prepends the optimization pipeline into the client's `chat` method. Safe to call multiple times — idempotent.
+ This prepends the optimization pipeline into the client's `chat` method. Safe to call multiple times — idempotent.
 
  ## OptimizeResult
 
@@ -64,5 +64,16 @@ LlmOptimizer.configure do |config|
  # Example:
  # config.embedding_caller = ->(text) { EmbeddingService.embed(text) }
  #
- # config.embedding_caller = nil
+ # --- Routing classifier (optional) ---
+ # When set, ambiguous prompts are classified by a cheap LLM instead of
+ # falling back to the word-count heuristic. Unambiguous signals (code blocks,
+ # keywords) still bypass the classifier for speed.
+ #
+ # Example:
+ # config.classifier_caller = ->(prompt) {
+ #   RubyLLM.chat(model: "amazon.nova-micro-v1:0", assume_model_exists: true)
+ #     .ask(prompt).content.strip.downcase
+ # }
+ #
+ # config.classifier_caller = nil
  end
@@ -21,6 +21,7 @@ module LlmOptimizer
      cache_ttl
      llm_caller
      embedding_caller
+     classifier_caller
    ].freeze
 
    # Define readers for all known keys (setters below track explicit sets)
@@ -45,6 +46,7 @@ module LlmOptimizer
      @cache_ttl = 86_400
      @llm_caller = nil
      @embedding_caller = nil
+     @classifier_caller = nil
    end
 
    # Copies only explicitly set keys from other_config without resetting unmentioned keys.
@@ -6,27 +6,55 @@ module LlmOptimizer
    COMPLEX_PHRASES = ["explain in detail"].freeze
    CODE_BLOCK_RE = /```|~~~/
 
+   CLASSIFIER_PROMPT = <<~PROMPT
+     Classify the following prompt as either 'simple' or 'complex'.
+
+     Rules:
+     - simple: factual questions, basic lookups, short explanations, greetings
+     - complex: code generation, debugging, architecture, multi-step reasoning, analysis
+
+     Reply with exactly one word: simple or complex
+
+     Prompt: %<prompt>s
+   PROMPT
+
    def initialize(config)
      @config = config
    end
 
    def route(prompt)
-     # explicit override
+     # Explicit override — always wins
      return @config.route_to if %i[simple complex].include?(@config.route_to)
 
-     # fenced code block
+     # Unambiguous fast-path signals (no LLM call needed)
      return :complex if CODE_BLOCK_RE.match?(prompt)
 
-     # complex keywords or phrases
      lower = prompt.downcase
      return :complex if COMPLEX_KEYWORDS.any? { |kw| lower.include?(kw) }
-     return :complex if COMPLEX_PHRASES.any? { |ph| lower.include?(ph) }
+     return :complex if COMPLEX_PHRASES.any? { |ph| lower.include?(ph) }
+
+     # LLM classifier for ambiguous prompts
+     if @config.classifier_caller
+       result = classify_with_llm(prompt)
+       return result if result
+     end
+
+     # Fallback heuristic
+     prompt.split.length < 20 ? :simple : :complex
+   end
+
+   private
 
-     # short prompt
-     return :simple if prompt.split.length < 20
+   def classify_with_llm(prompt)
+     classifier_prompt = format(CLASSIFIER_PROMPT, prompt: prompt)
+     response = @config.classifier_caller.call(classifier_prompt)
+     normalized = response.to_s.strip.downcase.gsub(/[^a-z]/, "")
+     return :simple if normalized == "simple"
+     return :complex if normalized == "complex"
 
-     # default
-     :complex
+     nil # unrecognized response — fall through to heuristic
+   rescue StandardError
+     nil # classifier failure — fall through to heuristic
    end
  end
  end
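The normalization in `classify_with_llm` carries real weight: small models rarely answer with a bare word. The sketch below isolates that step, mirroring the `gsub(/[^a-z]/, "")` logic from the diff; `normalize_label` is an illustrative name, not a gem method.

```ruby
# Strip everything but letters before comparing, so "Complex." or a quoted
# "simple" still parse; anything chattier returns nil, and the router then
# falls back to the word-count heuristic.
def normalize_label(response)
  normalized = response.to_s.strip.downcase.gsub(/[^a-z]/, "")
  return :simple  if normalized == "simple"
  return :complex if normalized == "complex"

  nil
end

normalize_label("Complex.")                 # => :complex
normalize_label("\"simple\"\n")             # => :simple
normalize_label("It looks complex to me.")  # => nil (chatty answer)
normalize_label(nil)                        # => nil (nil-safe via to_s)
```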
@@ -1,5 +1,5 @@
  # frozen_string_literal: true
 
  module LlmOptimizer
-   VERSION = "0.1.2"
+   VERSION = "0.1.3"
  end
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: llm_optimizer
  version: !ruby/object:Gem::Version
-   version: 0.1.2
+   version: 0.1.3
  platform: ruby
  authors:
  - arun kumar