red-candle 1.0.0 → 1.0.2

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: 829a937851c782dfd58b8fb724dc7b08d524d26400047e9f5fc7a5bd0de9cb4b
- data.tar.gz: e8a9420fc310e977968aa396a47e5e5269d107eb8cb7246ca9e2f980a0a28f4d
+ metadata.gz: 7405c9911d6088106dd7a19e96312f12b86e9a80087c3d7745cd3911e263890a
+ data.tar.gz: a88b75152708e72e019aba9acfeabd899f1ae1d1c567562ded6d2c6aa8eae8d0
  SHA512:
- metadata.gz: 020e23df61d5679612a7892bdbfc7dbcf2d28055df9fc6b9199a8e09933e74d03a0a0fb97d6ddc998f8f8e70e856d63d7eb758638e072cd492734b21681267b7
- data.tar.gz: 5e10f888c2bd74dfdf01c97ca18f6666440edb000d524784ad0ac0676208d9485b8d708fa72d8fe73672f018c038b3526109153540ce5bbc53a81e4e30deccd1
+ metadata.gz: ce1cc52dc1223968f3398ab0972283a6309a80d306c14193f23336cd36ed55c8fa5eaaaf05d756f76c88e442abe19b0d82d2742a49199930ef6effcffd6d4482
+ data.tar.gz: 8c30f3c0c096f8186b219a9a5d0fe92928621126f545eaabae26ddedc843515b7da8c45890a1f24a7d519c0c68fcd55b0553ea73f977644258ff30c5e5ccd2f1
data/README.md CHANGED
@@ -1,9 +1,85 @@
- # red-candle
+ # `red-candle` Native LLMs for Ruby 🚀

  [![build](https://github.com/assaydepot/red-candle/actions/workflows/build.yml/badge.svg)](https://github.com/assaydepot/red-candle/actions/workflows/build.yml)
  [![Gem Version](https://badge.fury.io/rb/red-candle.svg)](https://badge.fury.io/rb/red-candle)

- [candle](https://github.com/huggingface/candle) - Minimalist ML framework - for Ruby
+ Run state-of-the-art **language models directly from Ruby**. No Python, no APIs, no external services - just Ruby with blazing-fast Rust under the hood. Hardware accelerated with **Metal (Mac)** and **CUDA (NVIDIA)**.
+
+ ## Install & Chat in 30 Seconds
+
+ [![red-candle quickstart](https://img.youtube.com/vi/hbyFCyh8esk/0.jpg)](https://www.youtube.com/watch?v=hbyFCyh8esk)
+
+ ```bash
+ # Install the gem
+ gem install red-candle
+ ```
+
+ ```ruby
+ require 'candle'
+
+ # Download a model (one-time, ~650MB) - Mistral, Llama3, Gemma all work!
+ llm = Candle::LLM.from_pretrained("TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF",
+                                   gguf_file: "tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf")
+
+ # Chat with it - no API calls, running locally in your Ruby process!
+ messages = [
+   { role: "user", content: "Explain Ruby in one sentence" }
+ ]
+
+ puts llm.chat(messages)
+ # => "Ruby is a dynamic, object-oriented programming language known for its
+ #     simplicity, elegance, and productivity, often used for web development
+ #     with frameworks like Rails."
+ ```
+
+ ## What Just Happened?
+
+ You just ran a 1.1-billion-parameter AI model inside Ruby. The model lives in your process memory, runs on your hardware (CPU/GPU), and responds instantly without network latency.
+
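The whole call is an ordinary in-process method call, so you can measure it with nothing but the standard library. A minimal sketch using the `llm` and `messages` objects from the quickstart above:

```ruby
require 'benchmark'

# Time a full local generation - no network round-trip is involved.
elapsed = Benchmark.realtime { llm.chat(messages) }
puts "generated locally in #{elapsed.round(2)}s"
```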
+ ## Stream Responses Like a Pro
+
+ ```ruby
+ # Watch the AI think in real time
+ llm.chat_stream(messages) do |token|
+   print token
+ end
+ ```
+
+ ## Why This Matters
+
+ - **Privacy**: Your data never leaves your machine
+ - **Speed**: No network overhead, direct memory access
+ - **Control**: Fine-tune generation parameters and access raw tokens (see the sketch after this list)
+ - **Integration**: It's just Ruby objects - use it anywhere Ruby runs
+
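To make the "raw tokens" point concrete, here is a minimal sketch that captures the stream instead of printing it, using only the `chat_stream` API shown above:

```ruby
# Collect streamed tokens for logging or token-level post-processing
# rather than printing them as they arrive.
tokens = []
llm.chat_stream(messages) { |token| tokens << token }

puts "generated #{tokens.length} tokens"
puts tokens.join
```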
+ ## Supported Features
+
+ - **Tokenizers**: Access the tokenizer directly
+ - **EmbeddingModel**: Generate embeddings for text (sketched below)
+ - **Reranker**: Rerank documents based on relevance
+ - **NER**: Named Entity Recognition directly from Ruby (sketched below)
+ - **LLM**: Chat with Large Language Models (e.g., Llama, Mistral, Gemma)
+
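As a taste of the non-LLM classes, here is a hedged sketch of embedding and NER calls. The class names come from the list above, but the model IDs and the method names (`embedding`, `extract_entities`) are assumptions modeled on the LLM API in this README - check the Usage section below for the exact signatures.

```ruby
require 'candle'

# Hypothetical model IDs and method names - verify against the Usage docs.
embedder = Candle::EmbeddingModel.from_pretrained("jinaai/jina-embeddings-v2-base-en")
vector = embedder.embedding("Ruby is a joy to write")   # => a Candle::Tensor (assumed)

ner = Candle::NER.from_pretrained("Babelscape/wikineural-multilingual-ner")
entities = ner.extract_entities("Matz created Ruby in Japan")
entities.each { |e| puts "#{e['text']}: #{e['label']}" }  # assumed hash keys
```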
+ ## Model Storage
+
+ Models are automatically downloaded and cached the first time you use them:
+ - **Location**: `~/.cache/huggingface/hub/`
+ - **Size**: Models range from ~100MB (embeddings) to several GB (LLMs)
+ - **Reuse**: Models are downloaded once and reused across sessions
+
+ To inspect or manage the cache:
+ ```bash
+ # View cache contents
+ ls -la ~/.cache/huggingface/hub/
+
+ # Check total cache size
+ du -sh ~/.cache/huggingface/
+
+ # Clear cache if needed (removes all downloaded models)
+ rm -rf ~/.cache/huggingface/hub/
+ ```
+
+ ----

  ## Usage

@@ -137,7 +213,7 @@ llm = Candle::LLM.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0", device:
  # Metal
  device = Candle::Device.metal

- # CUDA support (for NVIDIA GPUs COMING SOON)
+ # CUDA support (for NVIDIA GPUs)
  device = Candle::Device.cuda # Linux/Windows with NVIDIA GPU
  ```

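Since the best device differs per machine, a small hedged helper can pick one at runtime. `Device.metal`, `Device.cuda`, and the `device:` keyword all appear in this README; `Candle::Device.cpu` and the raise-on-unsupported-platform behavior are assumptions.

```ruby
# A sketch, assuming the Device constructors raise on unsupported
# platforms: prefer Metal, then CUDA, then fall back to CPU.
def best_device
  Candle::Device.metal
rescue StandardError
  begin
    Candle::Device.cuda
  rescue StandardError
    Candle::Device.cpu
  end
end

llm = Candle::LLM.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0",
                                  device: best_device)
```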
@@ -671,7 +747,7 @@ All NER methods return entities in a consistent format:

  ## Common Runtime Errors

- ### 1. Weight is negative, too large or not a valid number
+ ### Weight is negative, too large or not a valid number

  **Error:**
  ```
@@ -688,13 +764,12 @@ All NER methods return entities in a consistent format:
  - Q3_K_M (3-bit) - Minimum recommended quantization

  ```ruby
- # Instead of Q2_K:
  llm = Candle::LLM.from_pretrained("TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF",
                                    device: device,
                                    gguf_file: "tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf")
  ```

- ### 2. Cannot find tensor model.embed_tokens.weight
+ ### Cannot find tensor model.embed_tokens.weight

  **Error:**
  ```
@@ -713,7 +788,7 @@ Failed to load quantized model: cannot find tensor model.embed_tokens.weight (Ru
  ```
  3. If the error persists, the GGUF file may use an unsupported architecture or format

- ### 3. No GGUF file found in repository
+ ### No GGUF file found in repository

  **Error:**
  ```
@@ -730,7 +805,7 @@ llm = Candle::LLM.from_pretrained("TheBloke/Llama-2-7B-Chat-GGUF",
                                    gguf_file: "llama-2-7b-chat.Q4_K_M.gguf")
  ```

- ### 4. Failed to download tokenizer
+ ### Failed to download tokenizer

  **Error:**
  ```
@@ -741,7 +816,7 @@ Failed to load quantized model: Failed to download tokenizer: request error: HTT

  **Solution:** The code now includes fallback tokenizer loading. If you still encounter this error, ensure you're using the latest version of red-candle.

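If the fallback still fails, one hedged workaround sketch is to point the loader at a tokenizer repo explicitly. The `tokenizer:` keyword is an assumption modeled on the `gguf_file:` keyword used throughout this README - verify it against the current API before relying on it.

```ruby
# Hypothetical: supply the tokenizer source directly (tokenizer: is an
# assumed keyword argument, mirroring gguf_file:).
llm = Candle::LLM.from_pretrained("TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF",
                                  gguf_file: "tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf",
                                  tokenizer: "TinyLlama/TinyLlama-1.1B-Chat-v1.0")
```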
- ### 5. Missing metadata in GGUF file
+ ### Missing metadata in GGUF file

  **Error:**
  ```
@@ -770,17 +845,24 @@ Failed to load GGUF model: cannot find llama.attention.head_count in metadata (R
  FORK IT!

  ```
- git clone https://github.com/your_name/red-candle
+ git clone https://github.com/assaydepot/red-candle
  cd red-candle
  bundle
  bundle exec rake compile
  ```

- Implemented with [Magnus](https://github.com/matsadler/magnus), with reference to [Polars Ruby](https://github.com/ankane/polars-ruby)
-
  Pull requests are welcome.

- ### See Also
+ ## Release
+
+ 1. Update the version number in `lib/candle/version.rb` and commit.
+ 2. `bundle exec rake build`
+ 3. `git tag VERSION_NUMBER`
+ 4. `git push --follow-tags`
+ 5. `gem push pkg/red-candle-VERSION_NUMBER.gem`
+
+ ## See Also

- - [Numo::NArray](https://github.com/ruby-numo/numo-narray)
- - [Cumo](https://github.com/sonots/cumo)
+ - [Candle](https://github.com/huggingface/candle)
+ - [Magnus](https://github.com/matsadler/magnus)
+ - [Outlines-core](https://github.com/dottxt-ai/outlines-core)
data/ext/candle/build.rs CHANGED
@@ -16,6 +16,7 @@ fn main() {
      println!("cargo:rerun-if-env-changed=CUDA_PATH");
      println!("cargo:rerun-if-env-changed=CANDLE_FEATURES");
      println!("cargo:rerun-if-env-changed=CANDLE_ENABLE_CUDA");
+     println!("cargo:rerun-if-env-changed=CANDLE_DISABLE_CUDA");

      // Check if we should force CPU only
      if env::var("CANDLE_FORCE_CPU").is_ok() {
@@ -26,13 +27,13 @@ fn main() {

      // Detect CUDA availability
      let cuda_available = detect_cuda();
-     let cuda_enabled = env::var("CANDLE_ENABLE_CUDA").is_ok();
+     let cuda_disabled = env::var("CANDLE_DISABLE_CUDA").is_ok();

-     if cuda_available && cuda_enabled {
+     if cuda_available && !cuda_disabled {
          println!("cargo:rustc-cfg=has_cuda");
-         println!("cargo:warning=CUDA detected and enabled via CANDLE_ENABLE_CUDA");
-     } else if cuda_available && !cuda_enabled {
-         println!("cargo:warning=CUDA detected but not enabled. To enable CUDA support (coming soon), set CANDLE_ENABLE_CUDA=1");
+         println!("cargo:warning=CUDA detected, CUDA acceleration will be available");
+     } else if cuda_available && cuda_disabled {
+         println!("cargo:warning=CUDA detected but disabled via CANDLE_DISABLE_CUDA");
      }

      // Detect Metal availability (macOS only)
@@ -15,10 +15,10 @@ else
           (File.exist?('C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA') ||
            File.exist?('C:\CUDA')))

-   cuda_enabled = ENV['CANDLE_ENABLE_CUDA']
+   cuda_disabled = ENV['CANDLE_DISABLE_CUDA']

-   if cuda_available && cuda_enabled
-     puts "CUDA detected and enabled via CANDLE_ENABLE_CUDA"
+   if cuda_available && !cuda_disabled
+     puts "CUDA detected, enabling CUDA support"
      features << 'cuda'

      # Check if CUDNN should be enabled
@@ -26,10 +26,9 @@ else
        puts "CUDNN support enabled"
        features << 'cudnn'
      end
-   elsif cuda_available && !cuda_enabled
+   elsif cuda_available && cuda_disabled
      puts "=" * 80
-     puts "CUDA detected but not enabled."
-     puts "To enable CUDA support (coming soon), set CANDLE_ENABLE_CUDA=1"
+     puts "CUDA detected but disabled via CANDLE_DISABLE_CUDA"
      puts "=" * 80
    end

@@ -15,8 +15,8 @@ module Candle
      if cuda_potentially_available
        warn "=" * 80
        warn "Red Candle: CUDA detected on system but not enabled in build."
-       warn "To enable CUDA support (experimental), reinstall with:"
-       warn "  CANDLE_ENABLE_CUDA=1 gem install red-candle"
+       warn "This may be due to CANDLE_DISABLE_CUDA being set during installation."
+       warn "To enable CUDA support, reinstall without CANDLE_DISABLE_CUDA set."
        warn "=" * 80
      end
      # :nocov:
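Taken together, the hunks above flip CUDA from opt-in (`CANDLE_ENABLE_CUDA`) to opt-out (`CANDLE_DISABLE_CUDA`). A quick sketch of the install-time switches - the environment variable names come directly from the code above; the `gem install` invocation is standard RubyGems:

```bash
# 1.0.2 default: CUDA is enabled automatically when detected
gem install red-candle

# Opt out of CUDA even on a CUDA-capable machine
CANDLE_DISABLE_CUDA=1 gem install red-candle

# Force a CPU-only build (checked in build.rs above)
CANDLE_FORCE_CPU=1 gem install red-candle
```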
data/lib/candle/version.rb CHANGED
@@ -1,5 +1,5 @@
  # :nocov:
  module Candle
-   VERSION = "1.0.0"
+   VERSION = "1.0.2"
  end
  # :nocov:
data/lib/red-candle.rb ADDED
@@ -0,0 +1 @@
+ require 'candle'
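This new one-line file makes the require name match the gem name. A usage note, grounded in the file itself:

```ruby
# Both entry points now load the gem; lib/red-candle.rb simply
# delegates to the existing candle entry point.
require 'red-candle'  # new in this release, matches the gem name
# ...or, as before:
require 'candle'
```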
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: red-candle
  version: !ruby/object:Gem::Version
-   version: 1.0.0
+   version: 1.0.2
  platform: ruby
  authors:
  - Christopher Petersen
@@ -9,7 +9,7 @@ authors:
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2025-07-19 00:00:00.000000000 Z
+ date: 2025-07-22 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
    name: rb_sys
@@ -197,6 +197,7 @@ files:
  - lib/candle/tensor.rb
  - lib/candle/tokenizer.rb
  - lib/candle/version.rb
+ - lib/red-candle.rb
  homepage: https://github.com/assaydepot/red-candle
  licenses:
  - MIT
@@ -216,9 +217,9 @@ required_rubygems_version: !ruby/object:Gem::Requirement
  - !ruby/object:Gem::Version
    version: 3.3.26
  requirements:
- - Rust >= 1.61
+ - Rust >= 1.65
  rubygems_version: 3.5.3
  signing_key:
  specification_version: 4
- summary: huggingface/candle for ruby
+ summary: huggingface/candle for Ruby
  test_files: []