red-candle 1.2.1 → 1.2.3
- checksums.yaml +4 -4
- data/Cargo.lock +2 -2
- data/LICENSE +1 -1
- data/README.md +77 -2
- data/ext/candle/src/ruby/reranker.rs +40 -38
- data/lib/candle/build_info.rb +6 -7
- data/lib/candle/device_utils.rb +1 -1
- data/lib/candle/llm.rb +2 -2
- data/lib/candle/logger.rb +149 -0
- data/lib/candle/ner.rb +1 -1
- data/lib/candle/reranker.rb +6 -4
- data/lib/candle/version.rb +1 -1
- data/lib/candle.rb +1 -0
- metadata +4 -3
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
|
|
1
1
|
---
|
2
2
|
SHA256:
|
3
|
-
metadata.gz:
|
4
|
-
data.tar.gz:
|
3
|
+
metadata.gz: 2d5cfca9bb05ab3dd9e8b1f1d92db4d94190b9acf473a41c7eb8a35c51a1a94a
|
4
|
+
data.tar.gz: aa2028e08fde8be4d9fd55b9719beab8ee8afabdc240900f176d88f7888bae6f
|
5
5
|
SHA512:
|
6
|
-
metadata.gz:
|
7
|
-
data.tar.gz:
|
6
|
+
metadata.gz: 16426f08f0cf7fd5ec1353adb139702c662df714a9d6fa7c3ed4d8feac81ba0313dcb58874c13b76c47ef6d8ca6acf6e22c8818888bf1affbcbc3d3b3a6fde96
|
7
|
+
data.tar.gz: 82f161767e7214ed97f36127d979da634507584f3c8110aa4672876d78edd6c9af0494f473e1a22f83311f7ebdf7e7ab8c41ec6451e5cf433b6a5acdc2ea603f
|
data/Cargo.lock
CHANGED
@@ -2999,9 +2999,9 @@ checksum = "0fda2ff0d084019ba4d7c6f371c95d8fd75ce3524c3cb8fb653a3023f6323e64"
|
|
2999
2999
|
|
3000
3000
|
[[package]]
|
3001
3001
|
name = "slab"
|
3002
|
-
version = "0.4.
|
3002
|
+
version = "0.4.11"
|
3003
3003
|
source = "registry+https://github.com/rust-lang/crates.io-index"
|
3004
|
-
checksum = "
|
3004
|
+
checksum = "7a2ae44ef20feb57a68b23d846850f861394c2e02dc425a50098ae8c90267589"
|
3005
3005
|
|
3006
3006
|
[[package]]
|
3007
3007
|
name = "smallvec"
|
data/LICENSE
CHANGED
@@ -1,7 +1,7 @@
|
|
1
1
|
MIT License
|
2
2
|
|
3
3
|
Copyright (c) 2023 kojix2
|
4
|
-
Copyright (c) 2024 Christopher Petersen
|
4
|
+
Copyright (c) 2024, 2025 Christopher Petersen
|
5
5
|
|
6
6
|
Permission is hereby granted, free of charge, to any person obtaining a copy
|
7
7
|
of this software and associated documentation files (the "Software"), to deal
|
data/README.md
CHANGED
@@ -1,6 +1,6 @@
|
|
1
1
|
<img src="/docs/assets/logo-title.png" alt="red-candle" height="80px">
|
2
2
|
|
3
|
-
[](https://github.com/scientist-labs/red-candle/actions/workflows/build.yml)
|
4
4
|
[](https://badge.fury.io/rb/red-candle)
|
5
5
|
|
6
6
|
Run state-of-the-art **language models directly from Ruby**. No Python, no APIs, no external services - just Ruby with blazing-fast Rust under the hood. Hardware accelerated with **Metal (Mac)** and **CUDA (NVIDIA).** Red candle leverages the Rust ecosystem, notably [Candle](https://github.com/huggingface/candle) and [Magnus](https://github.com/matsadler/magnus), to provide a fast and efficient way to run LLMs in Ruby. See [Dependencies](#dependencies) for more.
|
@@ -363,6 +363,12 @@ require 'candle'
|
|
363
363
|
# Initialize the reranker with a cross-encoder model
|
364
364
|
reranker = Candle::Reranker.from_pretrained("cross-encoder/ms-marco-MiniLM-L-12-v2")
|
365
365
|
|
366
|
+
# Or with custom max_length for truncation (default is 512)
|
367
|
+
reranker = Candle::Reranker.from_pretrained(
|
368
|
+
"cross-encoder/ms-marco-MiniLM-L-12-v2",
|
369
|
+
max_length: 256 # Faster processing with less context
|
370
|
+
)
|
371
|
+
|
366
372
|
# Define your query and candidate documents
|
367
373
|
query = "How many people live in London?"
|
368
374
|
documents = [
|
@@ -469,6 +475,75 @@ The reranker uses a BERT-based architecture that:
|
|
469
475
|
|
470
476
|
This joint processing allows cross-encoders to capture subtle semantic relationships between queries and documents, making them more accurate for reranking tasks, though at the cost of higher computational requirements.
|
471
477
|
|
478
|
+
### Performance Considerations
|
479
|
+
|
480
|
+
**Important**: The Reranker automatically truncates documents to ensure stable performance. The default maximum is 512 tokens, but this is configurable.
|
481
|
+
|
482
|
+
#### Configurable Truncation
|
483
|
+
|
484
|
+
You can adjust the `max_length` parameter to balance performance and context:
|
485
|
+
|
486
|
+
```ruby
|
487
|
+
# Default: 512 tokens (maximum context, ~300ms per doc on CPU)
|
488
|
+
reranker = Candle::Reranker.from_pretrained(model_id)
|
489
|
+
|
490
|
+
# Faster: 256 tokens (~60% faster, ~120ms per doc on CPU)
|
491
|
+
reranker = Candle::Reranker.from_pretrained(model_id, max_length: 256)
|
492
|
+
|
493
|
+
# Fastest: 128 tokens (~80% faster, ~60ms per doc on CPU)
|
494
|
+
reranker = Candle::Reranker.from_pretrained(model_id, max_length: 128)
|
495
|
+
```
|
496
|
+
|
497
|
+
Choose based on your needs:
|
498
|
+
- **512 tokens**: Maximum context for complex queries (default)
|
499
|
+
- **256 tokens**: Good balance of speed and context
|
500
|
+
- **128 tokens**: Fast processing for simple matching
|
501
|
+
|
502
|
+
#### Performance Guidelines
|
503
|
+
|
504
|
+
1. **Document Length**: Documents longer than ~400 words will be truncated
|
505
|
+
- The first 512 tokens (roughly 300-400 words) are used
|
506
|
+
- Consider splitting very long documents into chunks if full coverage is needed
|
507
|
+
|
508
|
+
2. **Batch Size**: Process multiple documents in one call for efficiency
|
509
|
+
```ruby
|
510
|
+
# Good: Single call with multiple documents
|
511
|
+
results = reranker.rerank(query, documents)
|
512
|
+
|
513
|
+
# Less efficient: Multiple calls
|
514
|
+
documents.map { |doc| reranker.rerank(query, [doc]) }
|
515
|
+
```
|
516
|
+
|
517
|
+
3. **Expected Performance**:
|
518
|
+
- **CPU**: ~0.3-0.5s per query-document pair
|
519
|
+
- **GPU (Metal/CUDA)**: ~0.05-0.1s per query-document pair
|
520
|
+
- Performance is consistent regardless of document length due to truncation
|
521
|
+
|
522
|
+
4. **Chunking Strategy** for long documents:
|
523
|
+
```ruby
|
524
|
+
def rerank_long_document(query, long_text, chunk_size: 300)
|
525
|
+
# Split into overlapping chunks
|
526
|
+
words = long_text.split
|
527
|
+
chunks = []
|
528
|
+
|
529
|
+
(0...words.length).step(chunk_size - 50) do |i|
|
530
|
+
chunk = words[i...(i + chunk_size)].join(" ")
|
531
|
+
chunks << chunk
|
532
|
+
end
|
533
|
+
|
534
|
+
# Rerank chunks
|
535
|
+
results = reranker.rerank(query, chunks)
|
536
|
+
|
537
|
+
# Return best chunk
|
538
|
+
results.max_by { |r| r[:score] }
|
539
|
+
end
|
540
|
+
```
|
541
|
+
|
542
|
+
5. **Memory Usage**:
|
543
|
+
- Model size: ~125MB
|
544
|
+
- Each batch processes all documents simultaneously
|
545
|
+
- Consider batching if you have many documents
|
546
|
+
|
472
547
|
## Tokenizer
|
473
548
|
|
474
549
|
Red-Candle provides direct access to tokenizers for text preprocessing and analysis. This is useful for understanding how models process text, debugging issues, and building custom NLP pipelines.
|
@@ -874,7 +949,7 @@ Failed to load GGUF model: cannot find llama.attention.head_count in metadata (R
|
|
874
949
|
FORK IT!
|
875
950
|
|
876
951
|
```
|
877
|
-
git clone https://github.com/
|
952
|
+
git clone https://github.com/scientist-labs/red-candle
|
878
953
|
cd red-candle
|
879
954
|
bundle
|
880
955
|
bundle exec rake compile
|
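Note on the README chunking helper above: it walks the word list in steps of `chunk_size - 50`, so consecutive chunks share 50 words. The splitting step can be sketched on its own, with no model involved. The helper name `overlapping_chunks` and the `overlap:` keyword are illustrative assumptions, not part of red-candle.

```ruby
# Split text into chunks of at most chunk_size words, where each chunk
# starts `overlap` words before the previous chunk ends.
# Illustrative helper only; not part of the red-candle API.
def overlapping_chunks(text, chunk_size: 300, overlap: 50)
  raise ArgumentError, "overlap must be smaller than chunk_size" unless overlap < chunk_size

  words = text.split
  chunks = []
  (0...words.length).step(chunk_size - overlap) do |start|
    chunks << words[start...(start + chunk_size)].join(" ")
  end
  chunks
end

# 700 synthetic words: w1 w2 ... w700
sample = (1..700).map { |n| "w#{n}" }.join(" ")
chunks = overlapping_chunks(sample)
puts chunks.length
puts chunks.map { |c| c.split.length }.inspect
```

With this stepping, a short tail can produce a final chunk whose words are already fully covered by the previous one, so callers who care about cost may want to drop it before reranking.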
data/ext/candle/src/ruby/reranker.rs
CHANGED
@@ -18,46 +18,48 @@ pub struct Reranker {
|
|
18
18
|
}
|
19
19
|
|
20
20
|
impl Reranker {
|
21
|
-
pub fn new(model_id: String, device: Option<Device>) -> Result<Self> {
|
21
|
+
pub fn new(model_id: String, device: Option<Device>, max_length: Option<usize>) -> Result<Self> {
|
22
22
|
let device = device.unwrap_or(Device::best()).as_device()?;
|
23
|
-
|
23
|
+
let max_length = max_length.unwrap_or(512); // Default to 512
|
24
|
+
Self::new_with_core_device(model_id, device, max_length)
|
24
25
|
}
|
25
|
-
|
26
|
-
fn new_with_core_device(model_id: String, device: CoreDevice) -> std::result::Result<Self, Error> {
|
26
|
+
|
27
|
+
fn new_with_core_device(model_id: String, device: CoreDevice, max_length: usize) -> std::result::Result<Self, Error> {
|
27
28
|
let result = (|| -> std::result::Result<(BertModel, TokenizerWrapper, Linear, Linear), Box<dyn std::error::Error + Send + Sync>> {
|
28
29
|
let api = Api::new()?;
|
29
30
|
let repo = api.repo(Repo::new(model_id.clone(), RepoType::Model));
|
30
|
-
|
31
|
+
|
31
32
|
// Download model files
|
32
33
|
let config_filename = repo.get("config.json")?;
|
33
34
|
let tokenizer_filename = repo.get("tokenizer.json")?;
|
34
35
|
let weights_filename = repo.get("model.safetensors")?;
|
35
|
-
|
36
|
+
|
36
37
|
// Load config
|
37
38
|
let config = std::fs::read_to_string(config_filename)?;
|
38
39
|
let config: Config = serde_json::from_str(&config)?;
|
39
40
|
|
40
|
-
// Setup tokenizer with padding
|
41
|
+
// Setup tokenizer with padding AND truncation
|
41
42
|
let tokenizer = Tokenizer::from_file(tokenizer_filename)?;
|
42
43
|
let tokenizer = TokenizerLoader::with_padding(tokenizer, None);
|
43
|
-
|
44
|
+
let tokenizer = TokenizerLoader::with_truncation(tokenizer, max_length);
|
45
|
+
|
44
46
|
// Load model weights
|
45
47
|
let vb = unsafe {
|
46
48
|
VarBuilder::from_mmaped_safetensors(&[weights_filename], DType::F32, &device)?
|
47
49
|
};
|
48
|
-
|
50
|
+
|
49
51
|
// Load BERT model
|
50
52
|
let model = BertModel::load(vb.pp("bert"), &config)?;
|
51
|
-
|
53
|
+
|
52
54
|
// Load pooler layer (dense + tanh activation)
|
53
55
|
let pooler = candle_nn::linear(config.hidden_size, config.hidden_size, vb.pp("bert.pooler.dense"))?;
|
54
|
-
|
56
|
+
|
55
57
|
// Load classifier layer for cross-encoder (single output score)
|
56
58
|
let classifier = candle_nn::linear(config.hidden_size, 1, vb.pp("classifier"))?;
|
57
|
-
|
59
|
+
|
58
60
|
Ok((model, TokenizerWrapper::new(tokenizer), pooler, classifier))
|
59
61
|
})();
|
60
|
-
|
62
|
+
|
61
63
|
match result {
|
62
64
|
Ok((model, tokenizer, pooler, classifier)) => {
|
63
65
|
Ok(Self { model, tokenizer, pooler, classifier, device, model_id })
|
@@ -65,18 +67,18 @@ impl Reranker {
|
|
65
67
|
Err(e) => Err(Error::new(magnus::exception::runtime_error(), format!("Failed to load model: {}", e))),
|
66
68
|
}
|
67
69
|
}
|
68
|
-
|
70
|
+
|
69
71
|
/// Extract CLS embeddings from the model output, handling Metal device workarounds
|
70
72
|
fn extract_cls_embeddings(&self, embeddings: &Tensor) -> std::result::Result<Tensor, Error> {
|
71
73
|
let cls_embeddings = if self.device.is_metal() {
|
72
74
|
// Metal has issues with tensor indexing, use a different approach
|
73
75
|
let (batch_size, seq_len, hidden_size) = embeddings.dims3()
|
74
76
|
.map_err(|e| Error::new(magnus::exception::runtime_error(), format!("Failed to get dims: {}", e)))?;
|
75
|
-
|
77
|
+
|
76
78
|
// Reshape to [batch * seq_len, hidden] then take first hidden vectors for each batch
|
77
79
|
let reshaped = embeddings.reshape((batch_size * seq_len, hidden_size))
|
78
80
|
.map_err(|e| Error::new(magnus::exception::runtime_error(), format!("Failed to reshape: {}", e)))?;
|
79
|
-
|
81
|
+
|
80
82
|
// Extract CLS tokens (first token of each sequence)
|
81
83
|
let mut cls_vecs = Vec::new();
|
82
84
|
for i in 0..batch_size {
|
@@ -85,7 +87,7 @@ impl Reranker {
|
|
85
87
|
.map_err(|e| Error::new(magnus::exception::runtime_error(), format!("Failed to extract CLS: {}", e)))?;
|
86
88
|
cls_vecs.push(cls_vec);
|
87
89
|
}
|
88
|
-
|
90
|
+
|
89
91
|
// Stack the CLS vectors
|
90
92
|
Tensor::cat(&cls_vecs, 0)
|
91
93
|
.map_err(|e| Error::new(magnus::exception::runtime_error(), format!("Failed to cat CLS tokens: {}", e)))?
|
@@ -93,39 +95,39 @@ impl Reranker {
|
|
93
95
|
embeddings.i((.., 0))
|
94
96
|
.map_err(|e| Error::new(magnus::exception::runtime_error(), format!("Failed to extract CLS token: {}", e)))?
|
95
97
|
};
|
96
|
-
|
98
|
+
|
97
99
|
// Ensure tensor is contiguous for downstream operations
|
98
100
|
cls_embeddings.contiguous()
|
99
101
|
.map_err(|e| Error::new(magnus::exception::runtime_error(), format!("Failed to make CLS embeddings contiguous: {}", e)))
|
100
102
|
}
|
101
|
-
|
103
|
+
|
102
104
|
pub fn debug_tokenization(&self, query: String, document: String) -> std::result::Result<magnus::RHash, Error> {
|
103
105
|
// Create query-document pair for cross-encoder
|
104
106
|
let query_doc_pair: EncodeInput = (query.clone(), document.clone()).into();
|
105
|
-
|
107
|
+
|
106
108
|
// Tokenize using the inner tokenizer for detailed info
|
107
109
|
let encoding = self.tokenizer.inner().encode(query_doc_pair, true)
|
108
110
|
.map_err(|e| Error::new(magnus::exception::runtime_error(), format!("Tokenization failed: {}", e)))?;
|
109
|
-
|
111
|
+
|
110
112
|
// Get token information
|
111
113
|
let token_ids = encoding.get_ids().to_vec();
|
112
114
|
let token_type_ids = encoding.get_type_ids().to_vec();
|
113
115
|
let attention_mask = encoding.get_attention_mask().to_vec();
|
114
116
|
let tokens = encoding.get_tokens().iter().map(|t| t.to_string()).collect::<Vec<_>>();
|
115
|
-
|
117
|
+
|
116
118
|
// Create result hash
|
117
119
|
let result = magnus::RHash::new();
|
118
120
|
result.aset("token_ids", RArray::from_vec(token_ids.iter().map(|&id| id as i64).collect::<Vec<_>>()))?;
|
119
121
|
result.aset("token_type_ids", RArray::from_vec(token_type_ids.iter().map(|&id| id as i64).collect::<Vec<_>>()))?;
|
120
122
|
result.aset("attention_mask", RArray::from_vec(attention_mask.iter().map(|&mask| mask as i64).collect::<Vec<_>>()))?;
|
121
123
|
result.aset("tokens", RArray::from_vec(tokens))?;
|
122
|
-
|
124
|
+
|
123
125
|
Ok(result)
|
124
126
|
}
|
125
|
-
|
127
|
+
|
126
128
|
pub fn rerank_with_options(&self, query: String, documents: RArray, pooling_method: String, apply_sigmoid: bool) -> std::result::Result<RArray, Error> {
|
127
129
|
let documents: Vec<String> = documents.to_vec()?;
|
128
|
-
|
130
|
+
|
129
131
|
// Create query-document pairs for cross-encoder
|
130
132
|
let query_and_docs: Vec<EncodeInput> = documents
|
131
133
|
.iter()
|
@@ -135,13 +137,13 @@ impl Reranker {
|
|
135
137
|
// Tokenize batch using inner tokenizer for access to token type IDs
|
136
138
|
let encodings = self.tokenizer.inner().encode_batch(query_and_docs, true)
|
137
139
|
.map_err(|e| Error::new(magnus::exception::runtime_error(), format!("Tokenization failed: {}", e)))?;
|
138
|
-
|
140
|
+
|
139
141
|
// Convert to tensors
|
140
142
|
let token_ids = encodings
|
141
143
|
.iter()
|
142
144
|
.map(|e| e.get_ids().to_vec())
|
143
145
|
.collect::<Vec<_>>();
|
144
|
-
|
146
|
+
|
145
147
|
let token_type_ids = encodings
|
146
148
|
.iter()
|
147
149
|
.map(|e| e.get_type_ids().to_vec())
|
@@ -153,11 +155,11 @@ impl Reranker {
|
|
153
155
|
.map_err(|e| Error::new(magnus::exception::runtime_error(), format!("Failed to create token type ids tensor: {}", e)))?;
|
154
156
|
let attention_mask = token_ids.ne(0u32)
|
155
157
|
.map_err(|e| Error::new(magnus::exception::runtime_error(), format!("Failed to create attention mask: {}", e)))?;
|
156
|
-
|
158
|
+
|
157
159
|
// Forward pass through BERT
|
158
160
|
let embeddings = self.model.forward(&token_ids, &token_type_ids, Some(&attention_mask))
|
159
161
|
.map_err(|e| Error::new(magnus::exception::runtime_error(), format!("Model forward pass failed: {}", e)))?;
|
160
|
-
|
162
|
+
|
161
163
|
// Apply pooling based on the specified method
|
162
164
|
let pooled_embeddings = match pooling_method.as_str() {
|
163
165
|
"pooler" => {
|
@@ -181,10 +183,10 @@ impl Reranker {
|
|
181
183
|
(sum / (seq_len as f64))
|
182
184
|
.map_err(|e| Error::new(magnus::exception::runtime_error(), format!("Failed to compute mean: {}", e)))?
|
183
185
|
},
|
184
|
-
_ => return Err(Error::new(magnus::exception::runtime_error(),
|
186
|
+
_ => return Err(Error::new(magnus::exception::runtime_error(),
|
185
187
|
format!("Unknown pooling method: {}. Use 'pooler', 'cls', or 'mean'", pooling_method)))
|
186
188
|
};
|
187
|
-
|
189
|
+
|
188
190
|
// Apply classifier to get relevance scores (raw logits)
|
189
191
|
// Ensure tensor is contiguous before linear layer
|
190
192
|
let pooled_embeddings = pooled_embeddings.contiguous()
|
@@ -193,7 +195,7 @@ impl Reranker {
|
|
193
195
|
.map_err(|e| Error::new(magnus::exception::runtime_error(), format!("Classifier forward failed: {}", e)))?;
|
194
196
|
let scores = logits.squeeze(1)
|
195
197
|
.map_err(|e| Error::new(magnus::exception::runtime_error(), format!("Failed to squeeze tensor: {}", e)))?;
|
196
|
-
|
198
|
+
|
197
199
|
// Optionally apply sigmoid activation
|
198
200
|
let scores = if apply_sigmoid {
|
199
201
|
sigmoid(&scores)
|
@@ -201,7 +203,7 @@ impl Reranker {
|
|
201
203
|
} else {
|
202
204
|
scores
|
203
205
|
};
|
204
|
-
|
206
|
+
|
205
207
|
let scores_vec: Vec<f32> = scores.to_vec1()
|
206
208
|
.map_err(|e| Error::new(magnus::exception::runtime_error(), format!("Failed to convert scores to vec: {}", e)))?;
|
207
209
|
|
@@ -212,7 +214,7 @@ impl Reranker {
|
|
212
214
|
.enumerate()
|
213
215
|
.map(|(idx, (doc, score))| (doc, score, idx))
|
214
216
|
.collect();
|
215
|
-
|
217
|
+
|
216
218
|
// Sort documents by relevance score (descending)
|
217
219
|
ranked_docs.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap_or(std::cmp::Ordering::Equal));
|
218
220
|
|
@@ -232,17 +234,17 @@ impl Reranker {
|
|
232
234
|
pub fn tokenizer(&self) -> std::result::Result<crate::ruby::tokenizer::Tokenizer, Error> {
|
233
235
|
Ok(crate::ruby::tokenizer::Tokenizer(self.tokenizer.clone()))
|
234
236
|
}
|
235
|
-
|
237
|
+
|
236
238
|
/// Get the model_id
|
237
239
|
pub fn model_id(&self) -> String {
|
238
240
|
self.model_id.clone()
|
239
241
|
}
|
240
|
-
|
242
|
+
|
241
243
|
/// Get the device
|
242
244
|
pub fn device(&self) -> Device {
|
243
245
|
Device::from_device(&self.device)
|
244
246
|
}
|
245
|
-
|
247
|
+
|
246
248
|
/// Get all options as a hash
|
247
249
|
pub fn options(&self) -> std::result::Result<magnus::RHash, Error> {
|
248
250
|
let hash = magnus::RHash::new();
|
@@ -254,7 +256,7 @@ impl Reranker {
|
|
254
256
|
|
255
257
|
pub fn init(rb_candle: RModule) -> std::result::Result<(), Error> {
|
256
258
|
let c_reranker = rb_candle.define_class("Reranker", class::object())?;
|
257
|
-
c_reranker.define_singleton_method("_create", function!(Reranker::new,
|
259
|
+
c_reranker.define_singleton_method("_create", function!(Reranker::new, 3))?;
|
258
260
|
c_reranker.define_method("rerank_with_options", method!(Reranker::rerank_with_options, 4))?;
|
259
261
|
c_reranker.define_method("debug_tokenization", method!(Reranker::debug_tokenization, 2))?;
|
260
262
|
c_reranker.define_method("tokenizer", method!(Reranker::tokenizer, 0))?;
|
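Note on the reranker.rs change above: the tokenizer is now configured with truncation to `max_length` in addition to padding, and the attention mask is still derived from `token_ids.ne(0u32)`. A toy model in plain Ruby arrays may make the combined effect easier to see. This is only a sketch under stated assumptions: the function names are hypothetical, the pad id of 0 is inferred from that mask expression, padding is assumed to extend each sequence to the longest one in the batch, and a real tokenizer also keeps special tokens when truncating, which this does not model.

```ruby
# Toy illustration only: plain integer arrays stand in for token ids.
PAD_ID = 0 # assumption: pad id 0, matching the `ne(0u32)` mask in the diff

# Cut every sequence to max_length, then right-pad to the longest one left.
def truncate_and_pad(batch, max_length: 512)
  truncated = batch.map { |ids| ids.first(max_length) }
  width = truncated.map(&:length).max.to_i
  truncated.map { |ids| ids + [PAD_ID] * (width - ids.length) }
end

# 1 for real tokens, 0 for padding.
def attention_mask(padded)
  padded.map { |ids| ids.map { |id| id == PAD_ID ? 0 : 1 } }
end

batch = [[101, 7, 8, 9, 10, 102], [101, 7, 102]]
padded = truncate_and_pad(batch, max_length: 4)
p padded                 # [[101, 7, 8, 9], [101, 7, 102, 0]]
p attention_mask(padded) # [[1, 1, 1, 1], [1, 1, 1, 0]]
```

Because every sequence is capped before the forward pass, the tensor width is bounded by `max_length` regardless of how long the input documents are.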
data/lib/candle/build_info.rb
CHANGED
@@ -3,8 +3,7 @@ module Candle
|
|
3
3
|
def self.display_cuda_info
|
4
4
|
info = Candle.build_info
|
5
5
|
|
6
|
-
#
|
7
|
-
return unless ENV['CANDLE_VERBOSE'] || ENV['CANDLE_DEBUG'] || $DEBUG
|
6
|
+
# CUDA info is now controlled by logger level
|
8
7
|
|
9
8
|
if info["cuda_available"] == false
|
10
9
|
# :nocov:
|
@@ -13,11 +12,11 @@ module Candle
|
|
13
12
|
File.exist?('/usr/local/cuda') || File.exist?('/opt/cuda')
|
14
13
|
|
15
14
|
if cuda_potentially_available
|
16
|
-
warn "=" * 80
|
17
|
-
warn "Red Candle: CUDA detected on system but not enabled in build."
|
18
|
-
warn "This may be due to CANDLE_DISABLE_CUDA being set during installation."
|
19
|
-
warn "To enable CUDA support, reinstall without CANDLE_DISABLE_CUDA set."
|
20
|
-
warn "=" * 80
|
15
|
+
Candle.logger.warn "=" * 80
|
16
|
+
Candle.logger.warn "Red Candle: CUDA detected on system but not enabled in build."
|
17
|
+
Candle.logger.warn "This may be due to CANDLE_DISABLE_CUDA being set during installation."
|
18
|
+
Candle.logger.warn "To enable CUDA support, reinstall without CANDLE_DISABLE_CUDA set."
|
19
|
+
Candle.logger.warn "=" * 80
|
21
20
|
end
|
22
21
|
# :nocov:
|
23
22
|
end
|
data/lib/candle/device_utils.rb
CHANGED
@@ -3,7 +3,7 @@ module Candle
|
|
3
3
|
# @deprecated Use {Candle::Device.best} instead
|
4
4
|
# Get the best available device (Metal > CUDA > CPU)
|
5
5
|
def self.best_device
|
6
|
-
warn "[DEPRECATION] `DeviceUtils.best_device` is deprecated. Please use `Device.best` instead."
|
6
|
+
Candle.logger.warn "[DEPRECATION] `DeviceUtils.best_device` is deprecated. Please use `Device.best` instead."
|
7
7
|
Device.best
|
8
8
|
end
|
9
9
|
end
|
data/lib/candle/llm.rb
CHANGED
@@ -78,7 +78,7 @@ module Candle
|
|
78
78
|
JSON.parse(json_content)
|
79
79
|
rescue JSON::ParserError => e
|
80
80
|
# Return the raw string if parsing fails
|
81
|
-
warn "
|
81
|
+
Candle.logger.warn "Generated output is not valid JSON: #{e.message}" if options[:warn_on_parse_error]
|
82
82
|
result
|
83
83
|
end
|
84
84
|
end
|
@@ -261,7 +261,7 @@ module Candle
|
|
261
261
|
if e.message.include?("No tokenizer found")
|
262
262
|
# Auto-detect tokenizer
|
263
263
|
detected_tokenizer = guess_tokenizer(model_id)
|
264
|
-
|
264
|
+
Candle.logger.info "No tokenizer found in GGUF repo. Using tokenizer from: #{detected_tokenizer}"
|
265
265
|
model_str = "#{model_str}@@#{detected_tokenizer}"
|
266
266
|
_from_pretrained(model_str, device)
|
267
267
|
else
|
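Note on the llm.rb change above: in the new version the JSON parse failure is reported through `Candle.logger.warn`, and only when `options[:warn_on_parse_error]` is set; the raw string is returned either way. A minimal standalone sketch of that pattern follows. The method name `parse_or_raw` and its keyword arguments are hypothetical stand-ins, and a plain stdlib `Logger` replaces `Candle.logger`.

```ruby
require 'json'
require 'logger'
require 'stringio'

# Hypothetical stand-in for the gem's parsing step: return parsed JSON, or
# the raw string when parsing fails, warning only when asked to.
def parse_or_raw(raw, logger:, warn_on_parse_error: false)
  JSON.parse(raw)
rescue JSON::ParserError => e
  logger.warn "Generated output is not valid JSON: #{e.message}" if warn_on_parse_error
  raw
end

log = StringIO.new
logger = Logger.new(log)

parsed = parse_or_raw('{"answer": 42}', logger: logger)
quiet  = parse_or_raw('not json', logger: logger)
noisy  = parse_or_raw('not json', logger: logger, warn_on_parse_error: true)
```

Here only the last call writes to the log; the first returns a Hash and the second silently falls back to the raw string.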
data/lib/candle/logger.rb
ADDED
@@ -0,0 +1,149 @@
|
|
1
|
+
require 'logger'
|
2
|
+
|
3
|
+
module Candle
|
4
|
+
# Logging functionality for the Red Candle gem
|
5
|
+
class << self
|
6
|
+
# Get the current logger instance
|
7
|
+
# @return [Logger] The logger instance
|
8
|
+
def logger
|
9
|
+
@logger ||= create_default_logger
|
10
|
+
end
|
11
|
+
|
12
|
+
# Set a custom logger instance
|
13
|
+
# @param custom_logger [Logger] A custom logger instance
|
14
|
+
def logger=(custom_logger)
|
15
|
+
@logger = custom_logger
|
16
|
+
end
|
17
|
+
|
18
|
+
# Configure logging with a block
|
19
|
+
# @yield [config] Configuration object
|
20
|
+
def configure_logging
|
21
|
+
config = LoggerConfig.new
|
22
|
+
yield config if block_given?
|
23
|
+
@logger = config.build_logger
|
24
|
+
end
|
25
|
+
|
26
|
+
private
|
27
|
+
|
28
|
+
# Create the default logger with CLI-friendly settings
|
29
|
+
# @return [Logger] Configured logger instance
|
30
|
+
def create_default_logger
|
31
|
+
logger = Logger.new($stderr)
|
32
|
+
logger.level = default_log_level
|
33
|
+
logger.formatter = cli_friendly_formatter
|
34
|
+
logger
|
35
|
+
end
|
36
|
+
|
37
|
+
# Determine default log level based on environment variables
|
38
|
+
# @return [Integer] Logger level constant
|
39
|
+
def default_log_level
|
40
|
+
# Support legacy CANDLE_VERBOSE for backward compatibility, but prefer explicit configuration
|
41
|
+
return Logger::DEBUG if ENV['CANDLE_VERBOSE']
|
42
|
+
Logger::WARN # CLI-friendly: only show warnings/errors by default
|
43
|
+
end
|
44
|
+
|
45
|
+
# CLI-friendly formatter that outputs just the message
|
46
|
+
# @return [Proc] Formatter proc
|
47
|
+
def cli_friendly_formatter
|
48
|
+
proc { |severity, datetime, progname, msg| "#{msg}\n" }
|
49
|
+
end
|
50
|
+
end
|
51
|
+
|
52
|
+
# Configuration helper for logger setup
|
53
|
+
class LoggerConfig
|
54
|
+
attr_accessor :level, :output, :formatter
|
55
|
+
|
56
|
+
def initialize
|
57
|
+
@level = :warn
|
58
|
+
@output = $stderr
|
59
|
+
@formatter = :simple
|
60
|
+
end
|
61
|
+
|
62
|
+
# Build a logger from the configuration
|
63
|
+
# @return [Logger] Configured logger
|
64
|
+
def build_logger
|
65
|
+
logger = Logger.new(@output)
|
66
|
+
logger.level = normalize_level(@level)
|
67
|
+
logger.formatter = build_formatter(@formatter)
|
68
|
+
logger
|
69
|
+
end
|
70
|
+
|
71
|
+
# Set log level to debug (verbose output)
|
72
|
+
def verbose!
|
73
|
+
@level = :debug
|
74
|
+
end
|
75
|
+
|
76
|
+
# Set log level to info
|
77
|
+
def info!
|
78
|
+
@level = :info
|
79
|
+
end
|
80
|
+
|
81
|
+
# Set log level to warn (default)
|
82
|
+
def quiet!
|
83
|
+
@level = :warn
|
84
|
+
end
|
85
|
+
|
86
|
+
# Set log level to error (minimal output)
|
87
|
+
def silent!
|
88
|
+
@level = :error
|
89
|
+
end
|
90
|
+
|
91
|
+
# Log to stdout instead of stderr
|
92
|
+
def log_to_stdout!
|
93
|
+
@output = $stdout
|
94
|
+
end
|
95
|
+
|
96
|
+
# Log to a file
|
97
|
+
# @param file_path [String] Path to log file
|
98
|
+
def log_to_file!(file_path)
|
99
|
+
@output = file_path
|
100
|
+
end
|
101
|
+
|
102
|
+
# Disable logging completely
|
103
|
+
def disable!
|
104
|
+
@output = File::NULL
|
105
|
+
end
|
106
|
+
|
107
|
+
private
|
108
|
+
|
109
|
+
# Convert symbol/string level to Logger constant
|
110
|
+
# @param level [Symbol, String, Integer] Log level
|
111
|
+
# @return [Integer] Logger level constant
|
112
|
+
def normalize_level(level)
|
113
|
+
case level.to_s.downcase
|
114
|
+
when 'debug' then Logger::DEBUG
|
115
|
+
when 'info' then Logger::INFO
|
116
|
+
when 'warn', 'warning' then Logger::WARN
|
117
|
+
when 'error' then Logger::ERROR
|
118
|
+
when 'fatal' then Logger::FATAL
|
119
|
+
else Logger::WARN
|
120
|
+
end
|
121
|
+
end
|
122
|
+
|
123
|
+
# Build formatter based on type
|
124
|
+
# @param formatter_type [Symbol] Type of formatter
|
125
|
+
# @return [Proc] Formatter proc
|
126
|
+
def build_formatter(formatter_type)
|
127
|
+
case formatter_type
|
128
|
+
when :simple, :cli
|
129
|
+
proc { |severity, datetime, progname, msg| "#{msg}\n" }
|
130
|
+
when :detailed
|
131
|
+
proc do |severity, datetime, progname, msg|
|
132
|
+
"[#{datetime.strftime('%Y-%m-%d %H:%M:%S')}] #{severity}: #{msg}\n"
|
133
|
+
end
|
134
|
+
when :json
|
135
|
+
require 'json'
|
136
|
+
proc do |severity, datetime, progname, msg|
|
137
|
+
JSON.generate({
|
138
|
+
timestamp: datetime.iso8601,
|
139
|
+
level: severity,
|
140
|
+
message: msg,
|
141
|
+
program: progname
|
142
|
+
}) + "\n"
|
143
|
+
end
|
144
|
+
else
|
145
|
+
proc { |severity, datetime, progname, msg| "#{msg}\n" }
|
146
|
+
end
|
147
|
+
end
|
148
|
+
end
|
149
|
+
end
|
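Note on the new logger file above (the `@@ -0,0 +1,149 @@` hunk): the default logger writes to `$stderr` at WARN level with a message-only formatter, and `LoggerConfig` also offers `:detailed` and `:json` formatters. The behaviour can be reproduced with the stdlib `Logger` alone, which is what the following sketch does. It does not load red-candle; `sketch_logger`, `SIMPLE`, and `JSON_LINES` are illustrative names. The sketch requires `time` explicitly because `Time#iso8601` is provided by that library on Ruby versions before 3.4.

```ruby
require 'json'
require 'logger'
require 'stringio'
require 'time' # Time#iso8601 comes from the time stdlib on Rubies before 3.4

# Message-only formatter, like the default and :simple cases in the diff.
SIMPLE = proc { |_severity, _datetime, _progname, msg| "#{msg}\n" }

# One JSON object per line, like the :json case in the diff.
JSON_LINES = proc do |severity, datetime, progname, msg|
  JSON.generate({ timestamp: datetime.iso8601, level: severity, message: msg, program: progname }) + "\n"
end

# Build a stdlib Logger with the given formatter; WARN mirrors the default level.
def sketch_logger(io, formatter, level: Logger::WARN)
  logger = Logger.new(io)
  logger.level = level
  logger.formatter = formatter
  logger
end

plain = StringIO.new
logger = sketch_logger(plain, SIMPLE)
logger.info "suppressed at the WARN default"
logger.warn "CUDA detected on system but not enabled in build."

structured = StringIO.new
json_logger = sketch_logger(structured, JSON_LINES, level: Logger::INFO)
json_logger.info "tokenizer auto-detected"
record = JSON.parse(structured.string)
```

With the gem loaded, the same choices would go through the API shown in the diff, for example `Candle.configure_logging { |config| config.formatter = :json }` or assigning any `Logger`-compatible object to `Candle.logger=`.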
data/lib/candle/ner.rb
CHANGED
@@ -196,7 +196,7 @@ module Candle
|
|
196
196
|
# This is especially important for Ruby < 3.2
|
197
197
|
max_length = 1_000_000 # 1MB of text
|
198
198
|
if text.length > max_length
|
199
|
-
warn "PatternEntityRecognizer: Text truncated from #{text.length} to #{max_length} chars for safety"
|
199
|
+
Candle.logger.warn "PatternEntityRecognizer: Text truncated from #{text.length} to #{max_length} chars for safety"
|
200
200
|
text = text[0...max_length]
|
201
201
|
end
|
202
202
|
|
data/lib/candle/reranker.rb
CHANGED
@@ -6,18 +6,20 @@ module Candle
|
|
6
6
|
# Load a pre-trained reranker model from HuggingFace
|
7
7
|
# @param model_id [String] HuggingFace model ID (defaults to cross-encoder/ms-marco-MiniLM-L-12-v2)
|
8
8
|
# @param device [Candle::Device] The device to use for computation (defaults to best available)
|
9
|
+
# @param max_length [Integer] Maximum sequence length for truncation (defaults to 512)
|
9
10
|
# @return [Reranker] A new Reranker instance
|
10
|
-
def self.from_pretrained(model_id = DEFAULT_MODEL_PATH, device: Candle::Device.best)
|
11
|
-
_create(model_id, device)
|
11
|
+
def self.from_pretrained(model_id = DEFAULT_MODEL_PATH, device: Candle::Device.best, max_length: 512)
|
12
|
+
_create(model_id, device, max_length)
|
12
13
|
end
|
13
14
|
|
14
15
|
# Constructor for creating a new Reranker with optional parameters
|
15
16
|
# @deprecated Use {.from_pretrained} instead
|
16
17
|
# @param model_path [String, nil] The path to the model on Hugging Face
|
17
18
|
# @param device [Candle::Device, Candle::Device.cpu] The device to use for computation
|
18
|
-
|
19
|
+
# @param max_length [Integer] Maximum sequence length for truncation (defaults to 512)
|
20
|
+
def self.new(model_path: DEFAULT_MODEL_PATH, device: Candle::Device.best, max_length: 512)
|
19
21
|
$stderr.puts "[DEPRECATION] `Reranker.new` is deprecated. Please use `Reranker.from_pretrained` instead."
|
20
|
-
_create(model_path, device)
|
22
|
+
_create(model_path, device, max_length)
|
21
23
|
end
|
22
24
|
|
23
25
|
# Returns documents ranked by relevance using the specified pooling method.
|
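Note on the reranker.rb change above: `from_pretrained` gains a `max_length:` keyword defaulting to 512 and forwards it as a third positional argument to `_create`, which matches the arity change to 3 in the native binding. The forwarding can be sketched with a stub class. `SketchReranker` is hypothetical, its `_create` only records arguments instead of loading a model, and the `:cpu` symbol is a placeholder for `Candle::Device.best`.

```ruby
# Hypothetical stand-in for Candle::Reranker; no native extension involved.
class SketchReranker
  DEFAULT_MODEL_PATH = "cross-encoder/ms-marco-MiniLM-L-12-v2"

  # Stub for the native `_create`: just echo back what it received.
  def self._create(model_id, device, max_length)
    { model_id: model_id, device: device, max_length: max_length }
  end

  # Same signature shape as the updated from_pretrained in the diff.
  def self.from_pretrained(model_id = DEFAULT_MODEL_PATH, device: :cpu, max_length: 512)
    _create(model_id, device, max_length)
  end
end

defaults = SketchReranker.from_pretrained
shorter  = SketchReranker.from_pretrained(max_length: 256)
```

Since the Ruby wrapper always supplies a value, the `unwrap_or(512)` fallback on the Rust side would presumably only apply if a caller passed `max_length: nil` explicitly.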
data/lib/candle/version.rb
CHANGED
data/lib/candle.rb
CHANGED
metadata
CHANGED
@@ -1,7 +1,7 @@
|
|
1
1
|
--- !ruby/object:Gem::Specification
|
2
2
|
name: red-candle
|
3
3
|
version: !ruby/object:Gem::Version
|
4
|
-
version: 1.2.
|
4
|
+
version: 1.2.3
|
5
5
|
platform: ruby
|
6
6
|
authors:
|
7
7
|
- Christopher Petersen
|
@@ -9,7 +9,7 @@ authors:
|
|
9
9
|
autorequire:
|
10
10
|
bindir: bin
|
11
11
|
cert_chain: []
|
12
|
-
date: 2025-
|
12
|
+
date: 2025-09-07 00:00:00.000000000 Z
|
13
13
|
dependencies:
|
14
14
|
- !ruby/object:Gem::Dependency
|
15
15
|
name: rb_sys
|
@@ -218,13 +218,14 @@ files:
|
|
218
218
|
- lib/candle/embedding_model.rb
|
219
219
|
- lib/candle/embedding_model_type.rb
|
220
220
|
- lib/candle/llm.rb
|
221
|
+
- lib/candle/logger.rb
|
221
222
|
- lib/candle/ner.rb
|
222
223
|
- lib/candle/reranker.rb
|
223
224
|
- lib/candle/tensor.rb
|
224
225
|
- lib/candle/tokenizer.rb
|
225
226
|
- lib/candle/version.rb
|
226
227
|
- lib/red-candle.rb
|
227
|
-
homepage: https://github.com/
|
228
|
+
homepage: https://github.com/scientist-labs/red-candle
|
228
229
|
licenses:
|
229
230
|
- MIT
|
230
231
|
metadata: {}
|