mini_embed 0.1.1 → 0.2.1

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: fd4a9fa127d0882eef7443594736c7ed633bf4728f75cfcf49a2987a515b3e8e
- data.tar.gz: 632a8f4cdd9f2a218dc025b47e6f4c19c03ee81db67f5f50c8184cf14809b05e
+ metadata.gz: '038f53048205e4db0def9faa8fa718580f9e089a2eb057ca64ee86a9794f8fa6'
+ data.tar.gz: d5d37dd58c4bb3671053acb280db02ebb2ef78722d9c115f57f2594ad3a9ab50
  SHA512:
- metadata.gz: 1a3bf50d26e8d53a560e97f1b3125797b1f6de94c773a344d1acb96ab1ef6a8b7a731707f104e4d9cdd856e3df6e1579b9510dafc959d890c6623441d470fa95
- data.tar.gz: ac6f937aafff0dd9dc93193ac85eae4293eb0fa51dbb56897e4ad25c60cf784b9c7896032dc48607b4d702bf81cbb0524b8f3dddbc90d190c2eacdee63200dfb
+ metadata.gz: 9af0cca4fe5cf57f8ac43f1b410f37faac267090e7cb54aa52aecc990343c283899b81675d62314a0982574756027e1d367b3ab180196ad8a5a68e4cd6d0cc2e
+ data.tar.gz: f5bb3db889b9c51348daed59c3fbab9496237c3e9a64cb908ef386a1093e5e678531a5ad10eb051d0614dbe1fb9217d93a32049e6a5b8392b053d2474d6e9606
data/README.md CHANGED
@@ -52,15 +52,19 @@ require 'mini_embed'
  # Load a GGUF model (F32, F16, Q8_0, Q4_K, etc. are all supported)
  model = MiniEmbed.new(model: '/path/to/gte-small.Q8_0.gguf')
 
- # Get the raw binary string (little‑endian 32‑bit floats)
- binary = model.embeddings(text: 'hello world')
-
- # Get an embedding as an array of floats
- embedding = binary.unpack('e*')
+ # Get embedding as an array of floats (default)
+ embedding = model.embeddings(text: 'hello world')
  puts embedding.size # e.g. 384
  puts embedding[0..4] # e.g. [0.0123, -0.0456, ...]
+
+ # Or get the raw binary string (little‑endian 32‑bit floats)
+ binary = model.embeddings(text: 'hello world', type: :binary)
+ embedding_from_binary = binary.unpack('e*')
  ```
 
+ Note: The type parameter is optional – it defaults to :vector which returns a Ruby `Array<Float>`. Use `type: :binary` to get the raw binary string (compatible with the original C extension).
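
To illustrate how the two return types in the note above relate, here is a minimal sketch that round-trips a hand-written vector through Ruby's `pack`/`unpack` with the `e*` directive (little-endian 32-bit floats). It does not call mini_embed: `vector` and `binary` are stand-ins chosen for illustration, assuming the binary form is simply the packed floats as the README comment describes.

```ruby
# Stand-in for the Array<Float> that `type: :vector` is described as returning.
# The values are exactly representable as 32-bit floats, so the round trip is lossless.
vector = [0.5, -1.25, 2.0, 0.0]

# Stand-in for a `type: :binary` result: each float packed as 4 little-endian bytes.
binary = vector.pack('e*')

puts binary.bytesize   # 16 (4 floats * 4 bytes)

# Decoding the byte string with the same directive recovers the floats.
decoded = binary.unpack('e*')
puts decoded == vector # true
```

Values that are not exactly representable in single precision would come back rounded to the nearest 32-bit float, so an exact `==` comparison is only safe for inputs like the ones above.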
66
+
67
+
64
68
  ## Simple tokenization note
65
69
  MiniEmbed uses a naive space‑based tokenizer. This means it splits input on spaces and looks up each token exactly in the model's vocabulary. For models trained with subword tokenization (like BERT), this will not work for out‑of‑vocabulary words.
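
The behaviour described above can be sketched in a few lines of Ruby. This is not mini_embed's implementation: `vocab` and `naive_tokenize` are hypothetical names, and returning `nil` for an unknown token is an assumption made here only to show where exact lookup fails.

```ruby
# Hypothetical vocabulary mapping token strings to ids (not taken from a real model).
vocab = { 'hello' => 0, 'world' => 1 }

# Split on whitespace and look each token up exactly.
# Note: String#split(' ') splits on runs of whitespace, not only single spaces.
# Assumption: unknown tokens map to nil; the real library may behave differently.
def naive_tokenize(text, vocab)
  text.split(' ').map { |token| vocab[token] }
end

ids = naive_tokenize('hello world embeddings', vocab)
p ids # [0, 1, nil]
```

A subword tokenizer would instead break `embeddings` into smaller pieces that do exist in the vocabulary, which is why exact lookup is a poor fit for BERT-style models.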
  If you need proper subword tokenization, you can: