RubyGems - mini_embed - Versions diffs - 0.1.1 → 0.2.1 - Mend

mini_embed 0.1.1 → 0.2.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (5)

checksums.yaml +4 -4
data/README.md +9 -5
data/ext/mini_embed/mini_embed.c +788 -603
data/lib/mini_embed.rb +14 -0
metadata +1 -1

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: fd4a9fa127d0882eef7443594736c7ed633bf4728f75cfcf49a2987a515b3e8e
-  data.tar.gz: 632a8f4cdd9f2a218dc025b47e6f4c19c03ee81db67f5f50c8184cf14809b05e
+  metadata.gz: '038f53048205e4db0def9faa8fa718580f9e089a2eb057ca64ee86a9794f8fa6'
+  data.tar.gz: d5d37dd58c4bb3671053acb280db02ebb2ef78722d9c115f57f2594ad3a9ab50
 SHA512:
-  metadata.gz: 1a3bf50d26e8d53a560e97f1b3125797b1f6de94c773a344d1acb96ab1ef6a8b7a731707f104e4d9cdd856e3df6e1579b9510dafc959d890c6623441d470fa95
-  data.tar.gz: ac6f937aafff0dd9dc93193ac85eae4293eb0fa51dbb56897e4ad25c60cf784b9c7896032dc48607b4d702bf81cbb0524b8f3dddbc90d190c2eacdee63200dfb
+  metadata.gz: 9af0cca4fe5cf57f8ac43f1b410f37faac267090e7cb54aa52aecc990343c283899b81675d62314a0982574756027e1d367b3ab180196ad8a5a68e4cd6d0cc2e
+  data.tar.gz: f5bb3db889b9c51348daed59c3fbab9496237c3e9a64cb908ef386a1093e5e678531a5ad10eb051d0614dbe1fb9217d93a32049e6a5b8392b053d2474d6e9606
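
checksums.yaml stores SHA256 and SHA512 digests of the two gem members, metadata.gz and data.tar.gz. That is why all four values differ between 0.1.1 and 0.2.1. Below is a minimal sketch of checking members against such a file. It assumes the member bytes have already been extracted from the `.gem` archive (a tar file), which is not shown. `verify_gem_checksums` is a hypothetical helper, not a RubyGems API.

```ruby
require 'digest/sha2'
require 'yaml'

# Hypothetical helper (not a RubyGems API): compares gem member files against
# the digests listed in a checksums.yaml document.
# `files` maps member names such as "metadata.gz" to their raw bytes.
# Returns a Hash like { "SHA256:data.tar.gz" => true, ... }.
def verify_gem_checksums(checksums_yaml, files)
  expected = YAML.safe_load(checksums_yaml)
  digesters = { 'SHA256' => Digest::SHA256, 'SHA512' => Digest::SHA512 }

  results = {}
  digesters.each do |algo, digester|
    (expected[algo] || {}).each do |member, hex|
      bytes = files[member]
      # A member missing from `files` counts as a mismatch.
      results["#{algo}:#{member}"] = !bytes.nil? && digester.hexdigest(bytes) == hex
    end
  end
  results
end
```

Each result key combines the algorithm and the member name, so a single tampered member shows up as two `false` entries (one per algorithm) while the other member stays `true`.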

data/README.md CHANGED

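The README change below is a behavior change for callers. In 0.1.1, `embeddings(text:)` returns a binary String of little-endian 32-bit floats. In 0.2.1 it returns an `Array<Float>` by default, and `type: :binary` restores the String. Code that must run against either version could normalize the result. The following is a minimal sketch: `normalize_embedding` and `to_binary_embedding` are hypothetical helpers, not part of mini_embed, and the binary layout is assumed to be exactly what the README describes.

```ruby
# Hypothetical helper, not part of mini_embed: accepts whatever `embeddings`
# returned and always hands back an Array<Float>.
#   0.1.1, or 0.2.1 with type: :binary -> String of little-endian float32s
#   0.2.1 default (type: :vector)      -> Array<Float>
def normalize_embedding(result)
  case result
  when String
    # Each float32 occupies 4 bytes; anything else indicates a truncated buffer.
    unless (result.bytesize % 4).zero?
      raise ArgumentError, "binary embedding length #{result.bytesize} is not a multiple of 4"
    end
    result.unpack('e*') # 'e' = single-precision float, little-endian
  when Array
    result
  else
    raise TypeError, "unexpected embedding type: #{result.class}"
  end
end

# Hypothetical inverse: packs an Array<Float> into the little-endian float32
# layout that type: :binary is documented to return.
def to_binary_embedding(vector)
  vector.pack('e*')
end

# Usage without loading a model: a packed String stands in for a
# type: :binary result.
binary = [0.5, -0.25, 1.0].pack('e*')
p normalize_embedding(binary)
```

With a loaded model, `normalize_embedding(model.embeddings(text: 'hello world'))` should yield an Array on either version. Values decoded from float32 are exactly representable as Ruby Floats, so packing them again with `'e*'` reproduces the original bytes.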
@@ -52,15 +52,19 @@ require 'mini_embed'
 # Load a GGUF model (F32, F16, Q8_0, Q4_K, etc. are all supported)
 model = MiniEmbed.new(model: '/path/to/gte-small.Q8_0.gguf')
-# Get the raw binary string (little‑endian 32‑bit floats)
-binary = model.embeddings(text: 'hello world')
-# Get an embedding as an array of floats
-embedding = binary.unpack('e*')
+# Get embedding as an array of floats (default)
+embedding = model.embeddings(text: 'hello world')
 puts embedding.size   # e.g. 384
 puts embedding[0..4]  # e.g. [0.0123, -0.0456, ...]
+# Or get the raw binary string (little‑endian 32‑bit floats)
+binary = model.embeddings(text: 'hello world', type: :binary)
+embedding_from_binary = binary.unpack('e*')
 ```
+Note: The type parameter is optional – it defaults to :vector which returns a Ruby `Array<Float>`. Use `type: :binary` to get the raw binary string (compatible with the original C extension).
 ## Simple tokenization note
 MiniEmbed uses a naive space‑based tokenizer. This means it splits input on spaces and looks up each token exactly in the model's vocabulary. For models trained with subword tokenization (like BERT), this will not work for out‑of‑vocabulary words.
 If you need proper subword tokenization, you can: