red-candle 0.0.3 → 0.0.4

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: aced5f3b8f49c525244d1f464af0063d71785b883f54b4a2fe4996c1d6f0f8ea
- data.tar.gz: 934333c1d6fd74aca84a36086a4dcd157722af1be127494da225446a2fd0a193
+ metadata.gz: 5ec591f2ace2a1706864c5ee80d9a55b2832e493459180eaaa00a18b892d2276
+ data.tar.gz: 8807b8b426f6778e71876f3662a4881ae429e170551f2f2cae573feebd623a11
  SHA512:
- metadata.gz: 9779d1e2c244c477eae0a706f1f89f98cd6d95714a46792ac43395a4e7c5db19fa005bea072011ae15e573435288eb161c236459378faf0765293373ec12adf2
- data.tar.gz: 9687c8596dca420c96fee69c0614dc275d31aa8d898868e74407798ca8eb0b343dcd9d3936116f68768fb21da3071b6975789efe531876fd663af873723dd60f
+ metadata.gz: 3358efa6942ee8e0fca86051f3469878ecce217b3e5d6fd72c26f63cba3eb90d4b6a4bd19a06c1d83736b883a6465a95fe002b289b7ff5ad3f7eb0685c5f6342
+ data.tar.gz: 3abf5cf27f34b72c5690cfc0b8dd8314964c4b8bf62cd190fd90bbb28eae8b9b2a4a86f4604a8f9d188221d9cc29c6bcd8951e74fa9500978af2132d7d58acac
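A downloaded artifact can be checked against the published digests above. A minimal Ruby sketch using the standard `digest` library (the path and helper name here are illustrative, not part of the gem):

```ruby
require 'digest'

# Returns true when the file at `path` hashes to `expected_hex`.
# This covers the SHA256 entries; swap in Digest::SHA512 for the SHA512 ones.
def checksum_ok?(path, expected_hex)
  Digest::SHA256.file(path).hexdigest == expected_hex
end
```

For example, `checksum_ok?("data.tar.gz", "8807b8b426f6778e71876f3662a4881ae429e170551f2f2cae573feebd623a11")` should return true for the 0.0.4 data archive.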
data/README.md CHANGED
@@ -18,6 +18,50 @@ x = x.reshape([3, 2])
  # Tensor[[3, 2], f32]
  ```
 
+ ```ruby
+ require 'candle'
+ model = Candle::Model.new
+ embedding = model.embedding("Hi there!")
+ ```
+
+ ## A note on memory usage
+ The `Candle::Model` defaults to the `jinaai/jina-embeddings-v2-base-en` model with the `sentence-transformers/all-MiniLM-L6-v2` tokenizer (both from [HuggingFace](https://huggingface.co)). With this configuration the model takes a little more than 3GB of memory on my Mac. The memory belongs to the instantiated `Candle::Model` object: each additional instance uses that much more memory, and letting an instance go out of scope and running the garbage collector frees its share. For example:
+
+ ```ruby
+ > require 'candle'
+ # Ruby memory = 25.9 MB
+ > model = Candle::Model.new
+ # Ruby memory = 3.50 GB
+ > model2 = Candle::Model.new
+ # Ruby memory = 7.04 GB
+ > model2 = nil
+ > GC.start
+ # Ruby memory = 3.56 GB
+ > model = nil
+ > GC.start
+ # Ruby memory = 55.2 MB
+ ```
+
+ ## A note on returned embeddings
+
+ The returned embeddings should match those generated by the Python `transformers` library. For instance, locally I was able to generate the same embedding for the text "Hi there!" using this Python code:
+
+ ```python
+ from transformers import AutoModel
+ model = AutoModel.from_pretrained('jinaai/jina-embeddings-v2-base-en', trust_remote_code=True)
+ sentence = ['Hi there!']
+ embedding = model.encode(sentence)
+ print(embedding)
+ ```
+
+ And the following Ruby:
+
+ ```ruby
+ require 'candle'
+ model = Candle::Model.new
+ embedding = model.embedding("Hi there!")
+ ```
+
  ## Development
 
  FORK IT!
@@ -29,6 +73,7 @@ bundle
  bundle exec rake compile
  ```
 
+
  Implemented with [Magnus](https://github.com/matsadler/magnus), with reference to [Polars Ruby](https://github.com/ankane/polars-ruby)
 
  Policies
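One way to check the README's claim that the Python and Ruby snippets produce the same embedding is to compare the two vectors by cosine similarity, which is 1.0 for identical embeddings. A pure-Ruby sketch (the `cosine_similarity` helper is hypothetical, not part of the gem; the arguments would be the flat float arrays returned by `model.embedding` and `model.encode`):

```ruby
# Cosine similarity of two equal-length float vectors.
# 1.0 means identical direction; noticeably lower values suggest
# the two pipelines disagree.
def cosine_similarity(a, b)
  dot = a.zip(b).sum { |x, y| x * y }
  mag = Math.sqrt(a.sum { |x| x * x }) * Math.sqrt(b.sum { |x| x * x })
  dot / mag
end
```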
@@ -7,7 +7,13 @@ edition = "2021"
  crate-type = ["cdylib"]
 
  [dependencies]
- candle-core = "0.2"
- candle-nn = "0.2"
+ candle-core = "0.4.1"
+ candle-nn = "0.4.1"
+ candle-transformers = "0.4.1"
+ tokenizers = { version = "0.15.0", default-features = true, features = ["fancy-regex"], exclude = ["onig"] }
+ hf-hub = "0.3.0"
  half = "2"
  magnus = "0.6"
+
+ [profile.test]
+ opt-level = 3
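A note on the `tokenizers` line added above: Cargo dependency tables have no `exclude` key, so that part of the entry is ignored, and `default-features = true` leaves the default `onig` backend enabled regardless. If the intent was to build only the `fancy-regex` backend, the usual shape would be the following sketch (check the feature list of the tokenizers version you pin before relying on it):

```toml
[dependencies]
tokenizers = { version = "0.15.0", default-features = false, features = ["fancy-regex"] }
```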