llm_optimizer 0.1.1 → 0.1.2
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +4 -4
- data/CHANGELOG.md +14 -1
- data/lib/llm_optimizer/semantic_cache.rb +13 -3
- data/lib/llm_optimizer/version.rb +1 -1
- data/lib/llm_optimizer.rb +1 -1
- metadata +1 -15
checksums.yaml
CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 6a0351ff5590228acf939201d0c7eee71e33ee39a0cd20df33e76187c827ab34
+  data.tar.gz: 6bc7df5aa71407be80ecd07104e1dbff9a25a9ecab7a95cb390261f97fa212e8
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: e6822ea254300a957c8aa5953d267695ebc2ae4c1e2fa478492d175534bc990eac9bb5bb39127b8418e7306a3f5de9e8124832f9aeac9f723979e92a68babdec
+  data.tar.gz: 4248569dc2a969518142ac2749b6b6e9dc6defedec71e67eeec2480ade3a1aea05261736a2438c994906c05a1d1a1a7991055d48eab9ac0cc33f1344a88a221c
data/CHANGELOG.md
CHANGED
@@ -7,6 +7,17 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ## [Unreleased]
 
+## [0.1.2] - 2026-04-10
+
+### Fixed
+- `SemanticCache` used `pack("f*")` (32-bit) for both the Redis key hash and embedding serialization, causing precision loss on round-trip through MessagePack. Switched to `pack("G*")` / `unpack("G*")` (64-bit IEEE 754) — self-similarity is now exactly `1.0` and cache lookups work correctly with real embedding providers (Voyage AI, OpenAI, Cohere, etc.)
+- `HistoryManager` summarization failed with `ConfigurationError: No llm_caller configured` when invoked through the gateway pipeline. The internal `raw_llm_call` lambda was missing `config: call_config`, so it couldn't resolve the user's configured `llm_caller`
+- Updated `test/unit/test_gateway.rb` mock Redis helper to use `pack("G*")` to match the corrected `SemanticCache` key format
+
+### Added
+- `bin/test_semantic_cache.rb` — runnable smoke test for semantic cache using Voyage AI embeddings + Anthropic Claude
+- `bin/test_history_manager.rb` — runnable smoke test for history manager sliding window using Anthropic Claude
+
 ## [0.1.1] - 2026-04-10
 
 ### Fixed

@@ -46,5 +57,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - `OptimizeResult` struct with `response`, `model`, `model_tier`, `cache_status`, `original_tokens`, `compressed_tokens`, `latency_ms`, `messages`
 - Unit test suite covering all components with positive and negative scenarios using Minitest + Mocha
 
-[Unreleased]: https://github.com/arunkumarry/llm_optimizer/compare/v0.1.
+[Unreleased]: https://github.com/arunkumarry/llm_optimizer/compare/v0.1.2...HEAD
+[0.1.2]: https://github.com/arunkumarry/llm_optimizer/compare/v0.1.1...v0.1.2
+[0.1.1]: https://github.com/arunkumarry/llm_optimizer/compare/v0.1.0...v0.1.1
 [0.1.0]: https://github.com/arunkumarry/llm_optimizer/releases/tag/v0.1.0
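The precision bug described in the 0.1.2 Fixed entry comes down to Ruby's `Array#pack` directives: `"f*"` packs each Float into 4 bytes (single precision), `"G*"` into 8 bytes (big-endian double, matching Ruby's native Float). A minimal round-trip sketch, with arbitrary illustrative values rather than real embedding components:

```ruby
# Round-trip demo of the packing fix: "f*" discards low-order bits,
# "G*" preserves the full 64-bit Float. Values are made up for illustration.
v = [0.12345678901234567, -0.9876543210987654]

lossy    = v.pack("f*").unpack("f*") # 32-bit floats: precision truncated
lossless = v.pack("G*").unpack("G*") # 64-bit big-endian doubles: exact

lossy == v     # => false
lossless == v  # => true
```

Because `"f*"` does not round-trip bit-for-bit, an embedding hashed before serialization and again after deserialization produced different digests, so cache lookups missed.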
data/lib/llm_optimizer/semantic_cache.rb
CHANGED

@@ -15,7 +15,13 @@ module LlmOptimizer
 
     def store(embedding, response)
       key = cache_key(embedding)
-
+      # Serialize embedding as raw 64-bit big-endian doubles to preserve full
+      # Float precision. MessagePack silently downcasts Ruby Float to 32-bit,
+      # which corrupts cosine similarity on deserialization.
+      payload = MessagePack.pack({
+        "embedding" => embedding.pack("G*"), # binary string, lossless
+        "response" => response
+      })
       @redis.set(key, payload, ex: @ttl)
     rescue ::Redis::BaseError => e
       warn "[llm_optimizer] SemanticCache store failed: #{e.message}"

@@ -33,7 +39,8 @@ module LlmOptimizer
         next unless raw
 
         entry = MessagePack.unpack(raw)
-
+        # Unpack the binary string back to 64-bit doubles
+        stored_embedding = entry["embedding"].unpack("G*")
         score = cosine_similarity(embedding, stored_embedding)
 
         if score > best_score

@@ -60,7 +67,10 @@ module LlmOptimizer
     private
 
     def cache_key(embedding)
-
+      # Use "G*" (64-bit big-endian double) to match Ruby's native Float precision.
+      # "f*" (32-bit) truncates precision and produces inconsistent hashes for the
+      # same embedding across serialize/deserialize round trips.
+      KEY_NAMESPACE + Digest::SHA256.hexdigest(embedding.pack("G*"))
    end
  end
end
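The corrected `cache_key` works because `"G*"` packing is both deterministic and lossless: an embedding that has been through a serialize/deserialize round trip hashes to the same key it was stored under. A standalone sketch, with `KEY_NAMESPACE` and `cosine_similarity` as stand-ins for the gem's internals (the real values and signatures may differ):

```ruby
require "digest"

# Stand-in for the gem's namespace constant; the real value may differ.
KEY_NAMESPACE = "llm_optimizer:cache:"

def cache_key(embedding)
  # 64-bit big-endian packing is canonical, so equal embeddings always
  # produce equal SHA-256 digests.
  KEY_NAMESPACE + Digest::SHA256.hexdigest(embedding.pack("G*"))
end

# Plain-Ruby cosine similarity, mirroring the private helper in the diff.
def cosine_similarity(a, b)
  dot = a.zip(b).sum { |x, y| x * y }
  dot / (Math.sqrt(a.sum { |x| x * x }) * Math.sqrt(b.sum { |x| x * x }))
end

embedding  = [0.25, -0.5, 0.75]
round_trip = embedding.pack("G*").unpack("G*")

cache_key(round_trip) == cache_key(embedding) # => true: lookup hits the stored key
```

With `"f*"` the round-tripped array differs in its low-order bits, so the two keys diverge and every lookup of a previously stored embedding misses.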
data/lib/llm_optimizer.rb
CHANGED

@@ -158,7 +158,7 @@ module LlmOptimizer
       # History management
       messages = options[:messages]
       if call_config.manage_history && messages
-        llm_caller = ->(p, model:) { raw_llm_call(p, model: model) }
+        llm_caller = ->(p, model:) { raw_llm_call(p, model: model, config: call_config) }
         history_mgr = HistoryManager.new(
           llm_caller: llm_caller,
           simple_model: call_config.simple_model,
CHANGED
|
@@ -1,7 +1,7 @@
|
|
|
1
1
|
--- !ruby/object:Gem::Specification
|
|
2
2
|
name: llm_optimizer
|
|
3
3
|
version: !ruby/object:Gem::Version
|
|
4
|
-
version: 0.1.
|
|
4
|
+
version: 0.1.2
|
|
5
5
|
platform: ruby
|
|
6
6
|
authors:
|
|
7
7
|
- arun kumar
|
|
@@ -79,20 +79,6 @@ dependencies:
|
|
|
79
79
|
- - "~>"
|
|
80
80
|
- !ruby/object:Gem::Version
|
|
81
81
|
version: '0.65'
|
|
82
|
-
- !ruby/object:Gem::Dependency
|
|
83
|
-
name: prop_check
|
|
84
|
-
requirement: !ruby/object:Gem::Requirement
|
|
85
|
-
requirements:
|
|
86
|
-
- - "~>"
|
|
87
|
-
- !ruby/object:Gem::Version
|
|
88
|
-
version: '1.0'
|
|
89
|
-
type: :development
|
|
90
|
-
prerelease: false
|
|
91
|
-
version_requirements: !ruby/object:Gem::Requirement
|
|
92
|
-
requirements:
|
|
93
|
-
- - "~>"
|
|
94
|
-
- !ruby/object:Gem::Version
|
|
95
|
-
version: '1.0'
|
|
96
82
|
description: llm_optimizer reduces LLM API costs by up to 80% through semantic caching,
|
|
97
83
|
intelligent model routing, token pruning, and conversation history summarization.
|
|
98
84
|
Strictly opt-in and non-invasive.
|