red-candle 1.0.0 → 1.0.2
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +4 -4
- data/README.md +97 -15
- data/ext/candle/build.rs +6 -5
- data/ext/candle/extconf.rb +5 -6
- data/lib/candle/build_info.rb +2 -2
- data/lib/candle/version.rb +1 -1
- data/lib/red-candle.rb +1 -0
- metadata +5 -4
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 7405c9911d6088106dd7a19e96312f12b86e9a80087c3d7745cd3911e263890a
+  data.tar.gz: a88b75152708e72e019aba9acfeabd899f1ae1d1c567562ded6d2c6aa8eae8d0
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: ce1cc52dc1223968f3398ab0972283a6309a80d306c14193f23336cd36ed55c8fa5eaaaf05d756f76c88e442abe19b0d82d2742a49199930ef6effcffd6d4482
+  data.tar.gz: 8c30f3c0c096f8186b219a9a5d0fe92928621126f545eaabae26ddedc843515b7da8c45890a1f24a7d519c0c68fcd55b0553ea73f977644258ff30c5e5ccd2f1

data/README.md
CHANGED
@@ -1,9 +1,85 @@
-# red-candle
+# `red-candle` Native LLMs for Ruby 🚀
 
 [](https://github.com/assaydepot/red-candle/actions/workflows/build.yml)
 [](https://badge.fury.io/rb/red-candle)
 
-
+Run state-of-the-art **language models directly from Ruby**. No Python, no APIs, no external services - just Ruby with blazing-fast Rust under the hood. Hardware accelerated with **Metal (Mac)** and **CUDA (NVIDIA).**
+
+## Install & Chat in 30 Seconds
+
+[](https://www.youtube.com/watch?v=hbyFCyh8esk)
+
+```bash
+# Install the gem
+gem install red-candle
+```
+
+```ruby
+require 'candle'
+
+# Download a model (one-time, ~650MB) - Mistral, Llama3, Gemma all work!
+llm = Candle::LLM.from_pretrained("TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF",
+                                  gguf_file: "tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf")
+
+# Chat with it - no API calls, running locally in your Ruby process!
+messages = [
+  { role: "user", content: "Explain Ruby in one sentence" }
+]
+
+puts llm.chat(messages)
+# => "Ruby is a dynamic, object-oriented programming language known for its
+#     simplicity, elegance, and productivity, often used for web development
+#     with frameworks like Rails."
+```
+
+## What Just Happened?
+
+You just ran a 1.1-billion parameter AI model inside Ruby. The model lives in your process memory, runs on your hardware (CPU/GPU), and responds instantly without network latency.
+
+## Stream Responses Like a Pro
+
+```ruby
+# Watch the AI think in real-time
+llm.chat_stream(messages) do |token|
+  print token
+end
+```
+
+## Why This Matters
+
+- **Privacy**: Your data never leaves your machine
+- **Speed**: No network overhead, direct memory access
+- **Control**: Fine-tune generation parameters, access raw tokens
+- **Integration**: It's just Ruby objects - use it anywhere Ruby runs
+
+## Supports
+
+- **Tokenizers**: Access the tokenizer directly
+- **EmbeddingModel**: Generate embeddings for text
+- **Reranker**: Rerank documents based on relevance
+- **NER**: Named Entity Recognition directly from Ruby
+- **LLM**: Chat with Large Language Models (e.g., Llama, Mistral, Gemma)
+
+## Model Storage
+
+Models are automatically downloaded and cached when you first use them. They are stored in:
+- **Location**: `~/.cache/huggingface/hub/`
+- **Size**: Models range from ~100MB (embeddings) to several GB (LLMs)
+- **Reuse**: Models are downloaded once and reused across sessions
+
+To check your cache or manage storage:
+```bash
+# View cache contents
+ls -la ~/.cache/huggingface/hub/
+
+# Check total cache size
+du -sh ~/.cache/huggingface/
+
+# Clear cache if needed (removes all downloaded models)
+rm -rf ~/.cache/huggingface/hub/
+```
+
+----
 
 ## Usage
 
@@ -137,7 +213,7 @@ llm = Candle::LLM.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0", device:
 # Metal
 device = Candle::Device.metal
 
-# CUDA support (for NVIDIA GPUs
+# CUDA support (for NVIDIA GPUs)
 device = Candle::Device.cuda # Linux/Windows with NVIDIA GPU
 ```
 
@@ -671,7 +747,7 @@ All NER methods return entities in a consistent format:
 
 ## Common Runtime Errors
 
-###
+### Weight is negative, too large or not a valid number
 
 **Error:**
 ```
@@ -688,13 +764,12 @@ All NER methods return entities in a consistent format:
 - Q3_K_M (3-bit) - Minimum recommended quantization
 
 ```ruby
-# Instead of Q2_K:
 llm = Candle::LLM.from_pretrained("TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF",
                                   device: device,
                                   gguf_file: "tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf")
 ```
 
-###
+### Cannot find tensor model.embed_tokens.weight
 
 **Error:**
 ```
@@ -713,7 +788,7 @@ Failed to load quantized model: cannot find tensor model.embed_tokens.weight (Ru
 ```
 3. If the error persists, the GGUF file may use an unsupported architecture or format
 
-###
+### No GGUF file found in repository
 
 **Error:**
 ```
@@ -730,7 +805,7 @@ llm = Candle::LLM.from_pretrained("TheBloke/Llama-2-7B-Chat-GGUF",
                                   gguf_file: "llama-2-7b-chat.Q4_K_M.gguf")
 ```
 
-###
+### Failed to download tokenizer
 
 **Error:**
 ```
@@ -741,7 +816,7 @@ Failed to load quantized model: Failed to download tokenizer: request error: HTT
 
 **Solution:** The code now includes fallback tokenizer loading. If you still encounter this error, ensure you're using the latest version of red-candle.
 
-###
+### Missing metadata in GGUF file
 
 **Error:**
 ```
@@ -770,17 +845,24 @@ Failed to load GGUF model: cannot find llama.attention.head_count in metadata (R
 FORK IT!
 
 ```
-git clone https://github.com/
+git clone https://github.com/assaydepot/red-candle
 cd red-candle
 bundle
 bundle exec rake compile
 ```
 
-Implemented with [Magnus](https://github.com/matsadler/magnus), with reference to [Polars Ruby](https://github.com/ankane/polars-ruby)
-
 Pull requests are welcome.
 
-
+## Release
+
+1. Update version number in `lib/candle/version.rb` and commit.
+2. `bundle exec rake build`
+3. `git tag VERSION_NUMBER`
+4. `git push --follow-tags`
+5. `gem push pkg/red-candle-VERSION_NUMBER.gem`
+
+## See Also
 
-- [
-- [
+- [Candle](https://github.com/huggingface/candle)
+- [Magnus](https://github.com/matsadler/magnus)
+- [Outlines-core](https://github.com/dottxt-ai/outlines-core)

data/ext/candle/build.rs
CHANGED
@@ -16,6 +16,7 @@ fn main() {
     println!("cargo:rerun-if-env-changed=CUDA_PATH");
     println!("cargo:rerun-if-env-changed=CANDLE_FEATURES");
    println!("cargo:rerun-if-env-changed=CANDLE_ENABLE_CUDA");
+    println!("cargo:rerun-if-env-changed=CANDLE_DISABLE_CUDA");
 
     // Check if we should force CPU only
     if env::var("CANDLE_FORCE_CPU").is_ok() {
@@ -26,13 +27,13 @@ fn main() {
 
     // Detect CUDA availability
     let cuda_available = detect_cuda();
-    let
+    let cuda_disabled = env::var("CANDLE_DISABLE_CUDA").is_ok();
 
-    if cuda_available &&
+    if cuda_available && !cuda_disabled {
         println!("cargo:rustc-cfg=has_cuda");
-        println!("cargo:warning=CUDA detected
-    } else if cuda_available &&
-        println!("cargo:warning=CUDA detected but
+        println!("cargo:warning=CUDA detected, CUDA acceleration will be available");
+    } else if cuda_available && cuda_disabled {
+        println!("cargo:warning=CUDA detected but disabled via CANDLE_DISABLE_CUDA");
     }
 
     // Detect Metal availability (macOS only)

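With this change, build.rs compiles CUDA support in by default whenever a toolkit is detected, and the new `CANDLE_DISABLE_CUDA` variable (also registered with `cargo:rerun-if-env-changed`) is the opt-out. A minimal sketch of opting out during a local source build, assuming a checkout of the repository; the `bundle exec rake compile` task is the one shown in the README, and the environment-variable usage is illustrative:

```bash
# Rebuild the native extension without CUDA even though a CUDA toolkit is present.
# build.rs checks only whether CANDLE_DISABLE_CUDA is set, not its value.
CANDLE_DISABLE_CUDA=1 bundle exec rake compile
```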
data/ext/candle/extconf.rb
CHANGED
@@ -15,10 +15,10 @@ else
                      (File.exist?('C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA') ||
                       File.exist?('C:\CUDA')))
 
-
+  cuda_disabled = ENV['CANDLE_DISABLE_CUDA']
 
-  if cuda_available &&
-    puts "CUDA detected
+  if cuda_available && !cuda_disabled
+    puts "CUDA detected, enabling CUDA support"
     features << 'cuda'
 
     # Check if CUDNN should be enabled
@@ -26,10 +26,9 @@ else
       puts "CUDNN support enabled"
       features << 'cudnn'
     end
-  elsif cuda_available &&
+  elsif cuda_available && cuda_disabled
     puts "=" * 80
-    puts "CUDA detected but
-    puts "To enable CUDA support (coming soon), set CANDLE_ENABLE_CUDA=1"
+    puts "CUDA detected but disabled via CANDLE_DISABLE_CUDA"
     puts "=" * 80
   end
 
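extconf.rb applies the same default at gem-install time: when a CUDA toolkit is found, the `cuda` (and optionally `cudnn`) features are enabled automatically, and `CANDLE_DISABLE_CUDA` replaces the old `CANDLE_ENABLE_CUDA` opt-in as the way to turn CUDA off. A hedged sketch of the install-time opt-out; the variable name comes from this diff, while the exact invocation is illustrative:

```bash
# Install the gem with CUDA disabled on a machine where the toolkit is detected.
# extconf.rb reads ENV['CANDLE_DISABLE_CUDA'], so any set value disables the feature.
CANDLE_DISABLE_CUDA=1 gem install red-candle
```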
data/lib/candle/build_info.rb
CHANGED
@@ -15,8 +15,8 @@ module Candle
      if cuda_potentially_available
        warn "=" * 80
        warn "Red Candle: CUDA detected on system but not enabled in build."
-        warn "
-        warn "
+        warn "This may be due to CANDLE_DISABLE_CUDA being set during installation."
+        warn "To enable CUDA support, reinstall without CANDLE_DISABLE_CUDA set."
        warn "=" * 80
      end
      # :nocov:
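The runtime warning now points users at `CANDLE_DISABLE_CUDA` rather than the old opt-in flag. A sketch of the reinstall it suggests, assuming a standard RubyGems setup (commands are illustrative):

```bash
# Reinstall in an environment where CANDLE_DISABLE_CUDA is not set
# so the extension is rebuilt with CUDA support.
unset CANDLE_DISABLE_CUDA
gem uninstall red-candle
gem install red-candle
```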
data/lib/candle/version.rb
CHANGED
data/lib/red-candle.rb
ADDED
@@ -0,0 +1 @@
+require 'candle'

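The added `lib/red-candle.rb` is a one-line shim, so a `require` matching the gem name resolves in addition to `require 'candle'`. A quick smoke test, assuming the gem is installed (the one-liners are illustrative):

```bash
# Both entry points load the same library; lib/red-candle.rb simply requires 'candle'.
ruby -e "require 'candle'"
ruby -e "require 'red-candle'"
```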
metadata
CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: red-candle
 version: !ruby/object:Gem::Version
-  version: 1.0.
+  version: 1.0.2
 platform: ruby
 authors:
 - Christopher Petersen
@@ -9,7 +9,7 @@ authors:
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2025-07-
+date: 2025-07-22 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: rb_sys
@@ -197,6 +197,7 @@ files:
 - lib/candle/tensor.rb
 - lib/candle/tokenizer.rb
 - lib/candle/version.rb
+- lib/red-candle.rb
 homepage: https://github.com/assaydepot/red-candle
 licenses:
 - MIT
@@ -216,9 +217,9 @@ required_rubygems_version: !ruby/object:Gem::Requirement
 - !ruby/object:Gem::Version
   version: 3.3.26
 requirements:
-- Rust >= 1.
+- Rust >= 1.65
 rubygems_version: 3.5.3
 signing_key:
 specification_version: 4
-summary: huggingface/candle for
+summary: huggingface/candle for Ruby
 test_files: []