red-candle 0.0.3 → 0.0.5

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
data/README.md CHANGED
@@ -1,6 +1,6 @@
  # red-candle
 
- [![build](https://github.com/kojix2/red-candle/actions/workflows/build.yml/badge.svg)](https://github.com/kojix2/red-candle/actions/workflows/build.yml)
+ [![build](https://github.com/assaydepot/red-candle/actions/workflows/build.yml/badge.svg)](https://github.com/assaydepot/red-candle/actions/workflows/build.yml)
  [![Gem Version](https://badge.fury.io/rb/red-candle.svg)](https://badge.fury.io/rb/red-candle)
 
  🕯️ [candle](https://github.com/huggingface/candle) - Minimalist ML framework - for Ruby
@@ -18,6 +18,50 @@ x = x.reshape([3, 2])
  # Tensor[[3, 2], f32]
  ```
 
+ ```ruby
+ require 'candle'
+ model = Candle::Model.new
+ embedding = model.embedding("Hi there!")
+ ```
+
+ ## A note on memory usage
+
+ The `Candle::Model` defaults to the `jinaai/jina-embeddings-v2-base-en` model with the `sentence-transformers/all-MiniLM-L6-v2` tokenizer (both from [HuggingFace](https://huggingface.co)). With this configuration the model takes a little more than 3GB of memory running on my Mac. The memory stays with the instantiated `Candle::Model` instance; if you instantiate more than one, you'll use more memory. Likewise, if you let an instance go out of scope and the garbage collector runs, its memory is freed. For example:
+
+ ```ruby
+ > require 'candle'
+ # Ruby memory = 25.9 MB
+ > model = Candle::Model.new
+ # Ruby memory = 3.50 GB
+ > model2 = Candle::Model.new
+ # Ruby memory = 7.04 GB
+ > model2 = nil
+ > GC.start
+ # Ruby memory = 3.56 GB
+ > model = nil
+ > GC.start
+ # Ruby memory = 55.2 MB
+ ```
+
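The release pattern shown above is ordinary Ruby object lifetime, not something specific to red-candle: memory is reclaimed once the last reference is dropped and the garbage collector runs. A minimal sketch of the same pattern, using a large plain String as a stand-in for a loaded model (the size here is illustrative only):

```ruby
require "objspace"

# A ~50 MB String stands in for a loaded model. The principle is the same:
# memory belongs to the live object, so dropping the last reference and
# running a GC cycle makes the backing store eligible for release.
model_stand_in = "x" * 50_000_000

# Report how much memory the object itself holds (at least 50_000_000 bytes).
puts ObjectSpace.memsize_of(model_stand_in)

model_stand_in = nil
GC.start # after this, the ~50 MB backing store can be reclaimed
```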
+ ## A note on returned embeddings
+
+ The embeddings should match those generated by the Python `transformers` library. For instance, locally I was able to generate the same embedding for the text "Hi there!" using the following Python code:
+
+ ```python
+ from transformers import AutoModel
+ model = AutoModel.from_pretrained('jinaai/jina-embeddings-v2-base-en', trust_remote_code=True)
+ sentence = ['Hi there!']
+ embedding = model.encode(sentence)
+ print(embedding)
+ ```
+
+ And the following Ruby:
+
+ ```ruby
+ require 'candle'
+ model = Candle::Model.new
+ embedding = model.embedding("Hi there!")
+ ```
+
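One way to confirm that two pipelines produce the same embedding is cosine similarity, which is 1.0 for vectors pointing in the same direction. A pure-Ruby sketch, using short made-up vectors in place of real 768-dimensional jina-embeddings-v2 outputs:

```ruby
# Cosine similarity: dot(a, b) / (|a| * |b|); 1.0 means identical direction.
def cosine_similarity(a, b)
  dot    = a.zip(b).sum { |x, y| x * y }
  norm_a = Math.sqrt(a.sum { |x| x * x })
  norm_b = Math.sqrt(b.sum { |x| x * x })
  dot / (norm_a * norm_b)
end

# Hypothetical truncated embeddings for "Hi there!" from the two pipelines.
ruby_embedding   = [0.12, -0.48, 0.33, 0.90]
python_embedding = [0.12, -0.48, 0.33, 0.90]

puts cosine_similarity(ruby_embedding, python_embedding) # ≈ 1.0
```

If the two pipelines truly agree, the similarity should be 1.0 up to floating-point error; values noticeably below that indicate a tokenizer or model mismatch.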
  ## Development
 
  FORK IT!
@@ -31,15 +75,8 @@ bundle exec rake compile
 
  Implemented with [Magnus](https://github.com/matsadler/magnus), with reference to [Polars Ruby](https://github.com/ankane/polars-ruby)
 
- Policies
- - The less code, the better.
- - Ideally, the PyPO3 code should work as is.
- - Error handling is minimal.
-
  Pull requests are welcome.
 
- kojix2 started this project to learn Rust, but does not necessarily have enough time to maintain this library. If you are interested in becoming a project owner or committer, please send me a pull request.
-
  ### See Also
 
  - [Numo::NArray](https://github.com/ruby-numo/numo-narray)
Cargo.toml CHANGED
@@ -7,7 +7,13 @@ edition = "2021"
  crate-type = ["cdylib"]
 
  [dependencies]
- candle-core = "0.2"
- candle-nn = "0.2"
+ candle-core = "0.4.1"
+ candle-nn = "0.4.1"
+ candle-transformers = "0.4.1"
+ tokenizers = { version = "0.15.0", default-features = true, features = ["fancy-regex"], exclude = ["onig"] }
+ hf-hub = "0.3.0"
  half = "2"
  magnus = "0.6"
+
+ [profile.test]
+ opt-level = 3
extconf.rb CHANGED
@@ -1,4 +1,4 @@
- require 'mkmf'
- require 'rb_sys/mkmf'
+ require "mkmf"
+ require "rb_sys/mkmf"
 
- create_rust_makefile('candle/candle')
+ create_rust_makefile("candle/candle")